PySpark Data Engineer

Posted 4 days ago by Damia Group LTD

£630 Per day
Inside
Remote
England, UK

Summary: The PySpark Data Engineer will develop and optimise PySpark batch pipelines that process Parquet data and use Delta Lake for all input/output, implementing validation and billing logic directly in code, tuning performance, and integrating with orchestrators and CI/CD pipelines. The role is remote and inside IR35, with a contract duration of two months or more. Candidates should ideally hold CTC clearance and have relevant industry knowledge of Azure technologies.

Key Responsibilities:

  • Develop and optimise PySpark batch pipelines that process Parquet data and use Delta Lake for all IO, applying validation, enrichment, and billing calculation logic directly in PySpark code.
  • Build reliable PySpark jobs that read/write Delta tables on ADLS Gen2.
  • Implement in-code validations (schema, null/range/value checks, referential lookups), routing rejects to dedicated Delta "quarantine" tables (see the first sketch after this list).
  • Design and implement billing logic: tariff/charge models, tiered pricing, pro-rata handling, VAT/discounts, adjustments, and full auditability.
  • Externalise billing and validation rules via versioned JSON configs, ensuring deterministic, idempotent re-runs (see the second sketch after this list).
  • Optimise Delta operations (MERGE, OPTIMIZE, Z-ORDER, VACUUM) and incremental/CDC merges into Azure SQL (see the third sketch after this list).
  • Tune performance (partitioning, caching, broadcast joins) and maintain robust retries, checkpoints, and structured logging (see the fourth sketch after this list).
  • Integrate with orchestrators (ADF or Container App Orchestrator) and CI/CD pipelines (GitHub Actions).
  • Operate securely within private-network Azure environments (Managed Identity, RBAC, Private Endpoints).
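
The bullets above lend themselves to short illustrations. First, a minimal sketch of the in-code validation pattern, assuming hypothetical ADLS Gen2 paths and column names (meter_id, consumption_kwh) and a Spark session with Delta Lake configured:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()

    # Hypothetical source path -- substitute the real ADLS Gen2 location.
    raw = spark.read.parquet("abfss://raw@example.dfs.core.windows.net/usage/")

    # Example in-code checks: non-null key plus a range check on the reading.
    is_valid = F.col("meter_id").isNotNull() & F.col("consumption_kwh").between(0, 100000)
    flagged = raw.withColumn("is_valid", is_valid)

    # Valid rows continue through the pipeline; rejects are appended to a
    # dedicated Delta "quarantine" table with a timestamp for triage.
    valid = flagged.filter("is_valid").drop("is_valid")
    rejects = (flagged.filter(~F.col("is_valid"))
                      .withColumn("rejected_at", F.current_timestamp()))

    valid.write.format("delta").mode("append") \
         .save("abfss://clean@example.dfs.core.windows.net/usage")
    rejects.write.format("delta").mode("append") \
           .save("abfss://quarantine@example.dfs.core.windows.net/usage")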
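
Second, a sketch of config-driven billing. The tariff below is a made-up versioned JSON config, and the tier arithmetic is one plausible way to express tiered pricing as pure column expressions so that re-runs stay deterministic:

    import json
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()

    # Hypothetical versioned tariff config; in practice this would be read
    # from a versioned JSON file held in source control.
    tariff = json.loads("""
    {
      "version": "2024-01",
      "tiers": [{"up_to": 100, "rate": 0.30}, {"up_to": null, "rate": 0.25}],
      "vat_rate": 0.05
    }
    """)

    def tiered_charge(usage, tiers):
        # Sum of (units consumed within each tier) * (tier rate).
        charge, lower = F.lit(0.0), F.lit(0.0)
        for t in tiers:
            upper = F.lit(float(t["up_to"])) if t["up_to"] is not None else usage
            in_tier = F.greatest(F.least(usage, upper) - lower, F.lit(0.0))
            charge = charge + in_tier * F.lit(t["rate"])
            lower = upper
        return charge

    df = spark.createDataFrame([("m1", 250.0)], ["meter_id", "consumption_kwh"])
    billed = (df.withColumn("net_charge", tiered_charge(F.col("consumption_kwh"), tariff["tiers"]))
                .withColumn("vat", F.col("net_charge") * F.lit(tariff["vat_rate"]))
                .withColumn("tariff_version", F.lit(tariff["version"])))  # for auditability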
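
Third, the Delta maintenance operations map onto the standard delta-spark APIs; the paths and join key (invoice_id) here are placeholders:

    from delta.tables import DeltaTable
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()  # assumes Delta Lake is configured

    path = "abfss://curated@example.dfs.core.windows.net/billing"
    target = DeltaTable.forPath(spark, path)
    updates = spark.read.format("delta") \
                   .load("abfss://staging@example.dfs.core.windows.net/billing_updates")

    # Idempotent upsert: re-running the same batch converges on the same state.
    (target.alias("t")
           .merge(updates.alias("u"), "t.invoice_id = u.invoice_id")
           .whenMatchedUpdateAll()
           .whenNotMatchedInsertAll()
           .execute())

    # Compact small files, co-locate rows on a common filter column, and
    # clean up files outside the retention window.
    spark.sql(f"OPTIMIZE delta.`{path}` ZORDER BY (customer_id)")
    spark.sql(f"VACUUM delta.`{path}` RETAIN 168 HOURS")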
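
Fourth, the two most common tuning levers named above, assuming a small tariffs dimension table and a billing_period partition column:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()

    facts = spark.read.format("delta").load("abfss://clean@example.dfs.core.windows.net/usage")
    tariffs = spark.read.format("delta").load("abfss://ref@example.dfs.core.windows.net/tariffs")

    # Broadcast the small dimension so the join avoids shuffling the fact table.
    joined = facts.join(F.broadcast(tariffs), "tariff_code")

    # Partition output by billing period so downstream reads can prune files.
    (joined.repartition("billing_period")
           .write.format("delta")
           .partitionBy("billing_period")
           .mode("overwrite")
           .save("abfss://curated@example.dfs.core.windows.net/billed_usage"))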

Key Skills:

  • PySpark with Delta Lake (structured APIs, MERGE, schema evolution - see the sketch after this list).
  • Solid knowledge of Azure Synapse Spark pools or Databricks, ADLS Gen2, and Azure SQL.
  • Strong engineering discipline: observability, retries, cost and performance optimisation.
  • Great Expectations (for supplementary data-quality checks).
  • Familiarity with ADF orchestration and containerised Spark workloads.
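
On the schema-evolution point, a minimal sketch of letting an append evolve a Delta table's schema (column names hypothetical); the equivalent switch for MERGE is the spark.databricks.delta.schema.autoMerge.enabled session configuration:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    df = spark.createDataFrame([("m1", 42.0, "2024-01")],
                               ["meter_id", "consumption_kwh", "billing_period"])

    # mergeSchema lets this append add billing_period to an existing table
    # instead of failing on the schema mismatch.
    (df.write.format("delta")
       .option("mergeSchema", "true")
       .mode("append")
       .save("abfss://clean@example.dfs.core.windows.net/usage"))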

Salary (Rate): £630 daily

City: undetermined

Country: UK

Working Arrangements: remote

IR35 Status: inside IR35

Seniority Level: undetermined

Industry: IT

Detailed Description From Employer:

PySpark Data Engineer - 2 months+ - £600-630pd - Inside IR35 - Remote

Ideally looking for someone who is CTC cleared.

Damia Group Limited acts as an employment agency for permanent recruitment and employment business for the supply of temporary workers. By applying for this job you accept our Data Protection Policy which can be found on our website.

Please note that no terminology in this advert is intended to discriminate on the grounds of a person's gender, marital status, race, religion, colour, age, disability or sexual orientation. Every candidate will be assessed only in accordance with their merits, qualifications and ability to perform the duties of the job.

Damia Group is acting as an Employment Business in relation to this vacancy and in accordance to Conduct Regulations 2003.