Key Responsibilities
- Own QA for Spark/Scala data pipelines on AWS EMR.
- Design and execute test plans, test cases, and data validations for large datasets.
- Perform source-to-target data reconciliation using SQL and Spark SQL.
- Build and maintain automation suites for data pipeline regression.
- Validate pipeline failure scenarios, retries, backfills, schema changes, and data quality rules.
- Collaborate with Data Engineering and DevOps teams in an Agile, CI/CD-driven environment.
Required Skills
- 5+ years of QA experience with strong Big Data testing exposure
- Hands-on knowledge of Apache Spark and Scala (code-level understanding)
- Experience with AWS EMR and S3, including log analysis and job monitoring
- Strong SQL / Spark SQL for data validation
- Test automation using Python / Scala / Java
- Experience with ETL/ELT pipeline testing and regression strategies
- Familiarity with Git and CI/CD tools