£450 per day
Inside IR35
Remote
England, United Kingdom
Summary: The role of a PySpark Data Engineer involves supporting the development of a modern, scalable data lake for a strategic programme, focusing on replacing legacy reporting solutions. This position offers the opportunity to shape a high-impact platform from the ground up, utilizing technologies such as PySpark, Databricks, and Delta Lake. The engineer will work collaboratively in Agile workflows and apply best practices in data engineering and testing. Active SC clearance is essential for applicants.
PySpark Data Engineer | up to £450/day Inside IR35 | Remote with occasional London travel
We are seeking a PySpark Data Engineer to support the development of a modern, scalable data lake for a new strategic programme. This is a greenfield initiative to replace fragmented legacy reporting solutions, offering the opportunity to shape a long-term, high-impact platform from the ground up.
Key Responsibilities:
- Design, build, and maintain scalable data pipelines using PySpark 3/4 and Python 3.
- Contribute to the creation of a unified data lake following medallion architecture principles (a minimal illustrative sketch follows this list).
- Leverage Databricks and Delta Lake (Parquet format) for efficient, reliable data processing.
- Apply BDD testing practices using Python Behave and ensure code quality with Python Coverage.
- Collaborate with cross-functional teams and participate in Agile delivery workflows.
- Manage configurations and workflows using YAML, Git, and Azure DevOps.
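For a flavour of the pipeline work involved, below is a minimal sketch of a bronze-to-silver step in PySpark with Delta Lake, in the spirit of medallion architecture. It assumes a Databricks or Delta-enabled Spark runtime; the paths, table names, and column names are hypothetical and stand in for the programme's real schemas.

# Minimal sketch of a bronze -> silver step in a medallion-style pipeline.
# Paths, table names, and columns are illustrative only.
from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder
    .appName("bronze-to-silver-example")
    .getOrCreate()
)

# Read raw ingested records from the bronze Delta table (Parquet-backed).
bronze = spark.read.format("delta").load("/mnt/lake/bronze/transactions")

# Cleanse and conform: drop duplicates, normalise types, filter bad rows.
silver = (
    bronze
    .dropDuplicates(["transaction_id"])
    .withColumn("amount", F.col("amount").cast("decimal(18,2)"))
    .withColumn("event_date", F.to_date("event_timestamp"))
    .filter(F.col("transaction_id").isNotNull())
)

# Write the curated silver table, partitioned by event date.
(
    silver.write
    .format("delta")
    .mode("overwrite")
    .partitionBy("event_date")
    .save("/mnt/lake/silver/transactions")
)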
Required Skills & Experience:
- Proven expertise in PySpark 3/4 and Python 3 for large-scale data engineering.
- Hands-on experience with Databricks, Delta Lake, and medallion architecture.
- Familiarity with Python Behave for Behaviour Driven Development (an illustrative step-definition sketch follows this list).
- Strong understanding of YAML, code quality tools (e.g. Python Coverage), and CI/CD pipelines.
- Knowledge of Azure DevOps and Git best practices.
- Active SC clearance is essential; applicants without it cannot be considered.
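As a flavour of the BDD approach mentioned above, Behave pairs a Gherkin feature file with Python step definitions. The feature text, step wording, and the dedupe_transactions helper below are hypothetical, used only to show the shape of a step module rather than the programme's actual test suite.

# features/steps/dedupe_steps.py - illustrative Behave step definitions.
# Matching feature file (features/dedupe.feature), shown here as a comment:
#
#   Feature: Transaction de-duplication
#     Scenario: Duplicate transaction ids are removed
#       Given a batch containing duplicate transaction ids
#       When the de-duplication step runs
#       Then each transaction id appears exactly once
from behave import given, when, then

def dedupe_transactions(rows):
    """Hypothetical helper standing in for the pipeline's real transform."""
    seen, result = set(), []
    for row in rows:
        if row["transaction_id"] not in seen:
            seen.add(row["transaction_id"])
            result.append(row)
    return result

@given("a batch containing duplicate transaction ids")
def step_given_duplicates(context):
    context.batch = [
        {"transaction_id": 1, "amount": 10.0},
        {"transaction_id": 1, "amount": 10.0},
        {"transaction_id": 2, "amount": 5.0},
    ]

@when("the de-duplication step runs")
def step_run_dedupe(context):
    context.result = dedupe_transactions(context.batch)

@then("each transaction id appears exactly once")
def step_assert_unique(context):
    ids = [row["transaction_id"] for row in context.result]
    assert len(ids) == len(set(ids))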
Contract Details:
- 6-month initial contract with long-term extension potential (multi-year programme).
- Inside IR35.
This is an excellent opportunity to join a high-profile programme at its inception and help build a critical data platform from the ground up. If you are a mission-driven engineer with a passion for scalable data solutions and secure environments, we'd love to hear from you.