Summary: The ETL Engineer role requires a seasoned professional with over 12 years of experience to design and implement scalable ETL/ELT data pipelines using PySpark on AWS. The position involves building and maintaining data lakes and warehouses, optimizing Spark jobs, and collaborating with data teams to support analytics initiatives. The engineer will also implement data governance practices and ensure compliance with data security regulations.
Key Responsibilities:
- Design and implement scalable ETL/ELT data pipelines using PySpark on AWS cloud infrastructure (a minimal illustrative sketch follows this list).
- Build and maintain data lakes and data warehouses using AWS services such as S3, Redshift, Glue, and EMR.
- Optimize Spark jobs for performance and cost-efficiency.
- Manage data ingestion from multiple sources, ensuring data quality, reliability, and consistency.
- Automate workflows using Apache Airflow, AWS Step Functions, or similar orchestration tools.
- Collaborate with data scientists, analysts, and other engineers to support analytics and machine learning initiatives.
- Implement data governance practices, including metadata management, data cataloging, and access control.
- Monitor and troubleshoot pipeline failures and performance bottlenecks.
- Ensure compliance with data security and privacy regulations.
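To illustrate the kind of PySpark pipeline work described in the responsibilities above, here is a minimal sketch (not part of the posting) of an extract-transform-load job on AWS. The bucket names, paths, and column names are hypothetical placeholders, not details from the role.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders-etl").getOrCreate()

# Extract: read raw JSON from a hypothetical S3 landing zone.
raw = spark.read.json("s3a://example-landing-zone/orders/")

# Transform: deduplicate and apply basic quality filters (non-null keys, positive amounts).
clean = (
    raw.dropDuplicates(["order_id"])
       .filter(F.col("order_id").isNotNull() & (F.col("amount") > 0))
       .withColumn("order_date", F.to_date("order_ts"))
)

# Load: write partitioned Parquet to a hypothetical curated data-lake prefix.
(clean.write
      .mode("overwrite")
      .partitionBy("order_date")
      .parquet("s3a://example-curated-zone/orders/"))

spark.stop()
```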
Key Skills:
- 12+ years of experience in ETL/ELT processes.
- Proficiency in PySpark and AWS services (S3, Redshift, Glue, EMR).
- Experience with data lakes and data warehouses.
- Strong optimization skills for Spark jobs.
- Knowledge of data ingestion and data quality management.
- Experience with workflow automation and orchestration tools such as Apache Airflow or AWS Step Functions (an orchestration sketch follows this list).
- Understanding of data governance practices.
- Ability to monitor and troubleshoot data pipelines.
- Familiarity with data security and privacy regulations.
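As an illustration of the orchestration experience listed above, the following is a minimal, hypothetical Airflow DAG that triggers an AWS Glue job running a PySpark transform. It assumes Airflow 2.x with the Amazon provider package installed; the DAG id, schedule, and Glue job name are placeholders, not details from the role.

```python
from datetime import datetime
from airflow import DAG
from airflow.providers.amazon.aws.operators.glue import GlueJobOperator

with DAG(
    dag_id="daily_orders_etl",          # hypothetical DAG id
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    # Trigger a hypothetical AWS Glue job that runs the PySpark transform.
    run_glue_job = GlueJobOperator(
        task_id="run_orders_glue_job",
        job_name="orders-etl",          # hypothetical Glue job name
        wait_for_completion=True,
    )
```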
Salary (Rate): Negotiable
City: undetermined
Country: USA
Working Arrangements: remote
IR35 Status: outside IR35
Seniority Level: undetermined
Industry: IT