Summary: The Lead PySpark Engineer will spearhead a significant data modernisation project, focusing on migrating legacy data workflows to an AWS cloud environment. This hands-on role centres on converting SAS code into PySpark pipelines within the financial services sector and demands strong technical skills in pipeline engineering, performance tuning, and quality assurance. The ideal candidate will have over five years of hands-on PySpark experience and a solid understanding of AWS data services.
Salary (Rate): undetermined
City: undetermined
Country: United Kingdom
Working Arrangements: remote
IR35 Status: undetermined
Seniority Level: Senior
Industry: Financial Services
Lead PySpark Engineer (Cloud Migration)
Role Type: 5-Month Contract
Location: Remote (UK-Based)
Experience Level: Lead / Senior (5+ years PySpark)
Role Overview
We are seeking a Lead PySpark Engineer to drive a large-scale data modernisation project, transitioning legacy data workflows into a high-performance AWS cloud environment. This is a hands-on technical role focused on converting legacy SAS code into production-ready PySpark pipelines within a complex financial services landscape.
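To give a flavour of the conversion work, the minimal sketch below shows the kind of Base SAS logic involved (reproduced as comments) alongside a PySpark equivalent. The dataset, column names, and S3 paths are illustrative assumptions rather than project specifics.

```python
# Minimal SAS-to-PySpark conversion sketch. Dataset, columns, and paths are
# illustrative only, not taken from the actual project.
#
# Legacy Base SAS, roughly:
#   data work.active_policies;
#       set src.policies;
#       where status = 'ACTIVE';
#       annual_premium = premium * 12;
#   run;
#   proc summary data=work.active_policies nway;
#       class region;
#       var annual_premium;
#       output out=work.premium_by_region sum=total_premium;
#   run;

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("sas_conversion_sketch").getOrCreate()

policies = spark.read.parquet("s3://example-bucket/policies/")  # hypothetical source

premium_by_region = (
    policies
    .filter(F.col("status") == "ACTIVE")                      # WHERE status = 'ACTIVE'
    .withColumn("annual_premium", F.col("premium") * 12)      # derived column
    .groupBy("region")                                        # CLASS region
    .agg(F.sum("annual_premium").alias("total_premium"))      # sum=total_premium
)

premium_by_region.write.mode("overwrite").parquet("s3://example-bucket/marts/premium_by_region/")
```

In practice the conversion scope also covers macro logic and DI Studio jobs, where automated tooling such as SAS2PY provides a first pass that is then manually refactored.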
Key Responsibilities
- Code Conversion: Lead the end-to-end migration of SAS code (Base SAS, Macros, DI Studio) to PySpark using automated tools (SAS2PY) and manual refactoring.
- Pipeline Engineering: Design, build, and troubleshoot complex ETL/ELT workflows and data marts on AWS.
- Performance Tuning: Optimise Spark workloads for execution efficiency, partitioning, and cost-effectiveness.
- Quality Assurance: Implement clean coding principles, modular design, and robust unit/comparative testing to ensure data accuracy throughout the migration (see the comparative-testing sketch after this list).
- Engineering Excellence: Maintain Git-based workflows, CI/CD integration, and comprehensive technical documentation.
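As referenced in the Quality Assurance item above, comparative testing during the migration might follow a pattern like this minimal sketch, which reconciles legacy SAS output (already extracted to S3) against the migrated PySpark output. The paths, business key, and numeric tolerance are assumptions for illustration.

```python
# Comparative-testing sketch: reconcile legacy SAS output against the migrated
# PySpark output. Paths, key column, and tolerance are illustrative only.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("comparative_test_sketch").getOrCreate()

legacy = spark.read.parquet("s3://example-bucket/legacy/premium_by_region/")
migrated = spark.read.parquet("s3://example-bucket/marts/premium_by_region/")

# Row counts should match before any column-level comparison.
assert legacy.count() == migrated.count(), "Row count mismatch between legacy and migrated output"

# Full outer join on the business key, then flag differences beyond a small
# tolerance (allows for floating-point noise between engines).
comparison = (
    legacy.alias("l")
    .join(migrated.alias("m"), on="region", how="full_outer")
    .withColumn("premium_diff", F.abs(F.col("l.total_premium") - F.col("m.total_premium")))
)

mismatches = comparison.filter(
    F.col("l.total_premium").isNull()
    | F.col("m.total_premium").isNull()
    | (F.col("premium_diff") > 0.01)
)

mismatch_count = mismatches.count()
assert mismatch_count == 0, f"{mismatch_count} mismatching rows between legacy and migrated output"
```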
Technical Requirements
- PySpark (P3): 5+ years of hands-on experience writing scalable, production-grade PySpark/Spark SQL.
- AWS Data Stack (P3): Strong proficiency in EMR, Glue, S3, Athena, and Glue Workflows.
- SAS Knowledge (P1): Solid SAS foundation, sufficient to understand and debug legacy logic during conversion.
- Data Modeling: Expertise in ETL/ELT, dimensions, facts, slowly changing dimensions (SCDs), and data mart architecture.
- Engineering Quality: Experience with parameterisation, exception handling, and modular Python design.
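As an indicative sketch of the engineering quality described in the last two requirements, a parameterised, modular PySpark load step with explicit exception handling might be structured as follows; the configuration fields, column names, and paths are hypothetical.

```python
# Parameterised, modular load step with explicit exception handling.
# All names and paths are hypothetical, for illustration only.
import logging
from dataclasses import dataclass

from pyspark.sql import DataFrame, SparkSession, functions as F

logger = logging.getLogger(__name__)


@dataclass
class LoadConfig:
    """Runtime parameters for a single mart load."""
    source_path: str
    target_path: str
    partition_col: str = "load_date"


def transform(df: DataFrame) -> DataFrame:
    """Pure transformation step, kept separate so it can be unit-tested in isolation."""
    return (
        df.filter(F.col("status") == "ACTIVE")
          .withColumn("annual_premium", F.col("premium") * 12)
          .withColumn("load_date", F.current_date())
    )


def run_load(spark: SparkSession, cfg: LoadConfig) -> None:
    """Read, transform, and write one mart, surfacing failures with context."""
    try:
        source = spark.read.parquet(cfg.source_path)
        result = transform(source)
        (result.write
               .mode("overwrite")
               .partitionBy(cfg.partition_col)
               .parquet(cfg.target_path))
        logger.info("Load complete: %s -> %s", cfg.source_path, cfg.target_path)
    except Exception:
        logger.exception("Load failed for source %s", cfg.source_path)
        raise


if __name__ == "__main__":
    spark = SparkSession.builder.appName("mart_load_sketch").getOrCreate()
    run_load(spark, LoadConfig(
        source_path="s3://example-bucket/policies/",
        target_path="s3://example-bucket/marts/active_policies/",
    ))
```

Keeping the transformation pure and separate from the I/O orchestration is what makes the unit and comparative testing described above straightforward.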
Additional Details
Industry: Financial Services experience is highly desirable.
Working Pattern: Fully remote with internal team collaboration days.
Benefits: 33 days holiday entitlement (pro-rata).