PySpark Lead Engineer

Posted Today by Randstad Digital

Negotiable
London Area, United Kingdom

Summary: The PySpark Lead Engineer will spearhead the migration of legacy SAS analytics to a cloud-native PySpark ecosystem on AWS, refactoring complex procedural logic into scalable pipelines for a Tier-1 financial services environment. The role combines engineering leadership in designing ETL/ELT pipelines with legacy modernisation, performance optimisation, and data governance. It calls for deep expertise in Python, PySpark, and core AWS data services, and the successful candidate will implement rigorous testing and CI/CD frameworks to maintain high data accuracy.

Key Responsibilities:

  • Design and develop complex ETL/ELT pipelines and Data Marts using PySpark, EMR, and Glue.
  • Architect the conversion of SAS Base/Macros into modular, testable Python code using SAS2PY and manual refactoring.
  • Optimise Spark execution (partitioning, shuffling, caching) for cost-efficient processing of massive financial datasets.
  • Implement rigorous CI/CD, unit testing, and data reconciliation frameworks to ensure "penny-perfect" accuracy.
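To illustrate the kind of manual refactoring the SAS conversion involves, a simple SAS data step driven by a macro variable can be rewritten as a small, unit-testable Python function. The SAS snippet and the function below are hypothetical examples, not taken from any actual codebase:

```python
# Hypothetical example: a SAS data step such as
#
#   %let threshold = 1000;
#   data high_value;
#     set transactions;
#     if amount > &threshold then flag = "HIGH";
#     else flag = "LOW";
#   run;
#
# can be refactored into a pure function, with the macro variable
# becoming an ordinary keyword argument.

def flag_high_value(transactions, threshold=1000):
    """Label each transaction HIGH or LOW relative to a threshold."""
    return [
        {**row, "flag": "HIGH" if row["amount"] > threshold else "LOW"}
        for row in transactions
    ]

rows = [{"id": 1, "amount": 2500}, {"id": 2, "amount": 400}]
print(flag_high_value(rows))
```

Because the function has no hidden state, it can be covered by plain unit tests before being lifted into a PySpark UDF or DataFrame expression.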

Key Skills:

  • Expertise in PySpark and Python (Clean Code/SOLID principles).
  • Experience with AWS services: EMR, Glue, S3, Athena, IAM, Lambda.
  • Proficiency in data modeling: SCD Type 2, Fact/Dimension tables, Data Vault/Star Schema.
  • Ability to read/debug SAS (Base, Macros, DI Studio).
  • Familiarity with DevOps tools: Git-based workflows, Jenkins/GitLab CI, Terraform.
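For candidates less familiar with the SCD Type 2 pattern named above, the core idea is to close out a changed dimension row and append a new current version rather than overwrite it. A minimal, library-free sketch (all field names are illustrative):

```python
from datetime import date

HIGH_DATE = date(9999, 12, 31)  # conventional open-ended end date

def apply_scd2(dimension, incoming, today):
    """Apply SCD Type 2: expire changed rows, append new versions.

    dimension: list of dicts with keys id, attr, valid_from, valid_to
    incoming:  dict mapping id -> latest attr value
    """
    result, seen = [], set()
    for row in dimension:
        rid = row["id"]
        if row["valid_to"] == HIGH_DATE and rid in incoming:
            seen.add(rid)
            if incoming[rid] != row["attr"]:
                # expire the current version...
                result.append({**row, "valid_to": today})
                # ...and open a new one effective today
                result.append({"id": rid, "attr": incoming[rid],
                               "valid_from": today, "valid_to": HIGH_DATE})
                continue
        result.append(row)
    # ids with no current (open) row get a fresh version
    for rid, attr in incoming.items():
        if rid not in seen:
            result.append({"id": rid, "attr": attr,
                           "valid_from": today, "valid_to": HIGH_DATE})
    return result

dim = [{"id": 1, "attr": "A", "valid_from": date(2020, 1, 1), "valid_to": HIGH_DATE}]
updated = apply_scd2(dim, {1: "B"}, date(2024, 6, 1))
# old version closed at 2024-06-01; new current version for id 1 appended
```

In production this logic would typically be expressed as a PySpark merge against the dimension table, but the versioning rules are the same.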

Salary (Rate): undetermined

City: London Area

Country: United Kingdom

Working Arrangements: undetermined

IR35 Status: undetermined

Seniority Level: undetermined

Industry: IT

Detailed Description From Employer:

PySpark Engineer Lead (Contract)

As the Technical Lead, you will drive the high-stakes migration of legacy SAS analytics to a modern, cloud-native PySpark ecosystem on AWS. This isn't just a lift and shift: you will refactor complex procedural logic into scalable, production-ready distributed pipelines for a Tier-1 financial services environment.

Core Responsibilities

  • Engineering Leadership: Design and develop complex ETL/ELT pipelines and Data Marts using PySpark, EMR, and Glue.
  • Legacy Modernisation: Architect the conversion of SAS Base/Macros into modular, testable Python code using SAS2PY and manual refactoring.
  • Performance Tuning: Optimise Spark execution (partitioning, shuffling, caching) to ensure cost-efficient processing of massive financial datasets.
  • Quality & Governance: Implement rigorous CI/CD, unit testing, and data reconciliation frameworks to ensure "penny-perfect" accuracy.
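One common way to enforce "penny-perfect" reconciliation is to compare control totals between the legacy and migrated outputs using exact decimal arithmetic rather than floats. A minimal sketch under that assumption (function and field names are illustrative):

```python
from decimal import Decimal

def reconcile(legacy_rows, migrated_rows, key="account", amount="amount"):
    """Return per-key total differences; an empty dict means a perfect match."""
    def totals(rows):
        acc = {}
        for r in rows:
            # Decimal from str avoids binary-float rounding entirely
            acc[r[key]] = acc.get(r[key], Decimal("0")) + Decimal(str(r[amount]))
        return acc
    legacy, migrated = totals(legacy_rows), totals(migrated_rows)
    return {
        k: (legacy.get(k, Decimal("0")), migrated.get(k, Decimal("0")))
        for k in set(legacy) | set(migrated)
        if legacy.get(k, Decimal("0")) != migrated.get(k, Decimal("0"))
    }

sas_out = [{"account": "A1", "amount": "10.01"}, {"account": "A1", "amount": "0.02"}]
py_out  = [{"account": "A1", "amount": "10.03"}]
print(reconcile(sas_out, py_out))  # -> {} (totals agree to the penny)
```

A check like this can run as a CI gate after each converted job, failing the pipeline on any non-empty difference.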

Technical Stack

  • Engine: PySpark (Expert), Python (Clean Code/SOLID principles).
  • AWS: EMR, Glue, S3, Athena, IAM, Lambda.
  • Data Modeling: SCD Type 2, Fact/Dimension tables, Data Vault/Star Schema.
  • Legacy: Proficiency in reading/debugging SAS (Base, Macros, DI Studio).
  • DevOps: Git-based workflows, Jenkins/GitLab CI, Terraform.
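For context, the Spark-level tuning referred to under Performance Tuning is typically applied through job configuration as well as code. The flags below are standard Spark settings, but the values are placeholders, not recommendations for any specific workload:

```shell
# Illustrative spark-submit flags for an EMR batch job; appropriate
# values depend entirely on data volume and cluster size.
#   spark.sql.shuffle.partitions - shuffle parallelism
#   spark.sql.adaptive.enabled   - adaptive query execution (AQE)
spark-submit \
  --conf spark.sql.shuffle.partitions=400 \
  --conf spark.sql.adaptive.enabled=true \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  job.py
```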