Lead Data Engineer - Near Real-Time Ingestion (GCP)

Posted Today by fortice

£500 per day
Inside IR35
Hybrid - Birmingham, UK

Summary: The Lead Data Engineer will be responsible for designing and developing near real-time (NRT) data ingestion platforms on Google Cloud and for ensuring their operational excellence. The role involves architecting low-latency ingestion pipelines and mentoring engineers while enforcing best practices. The position requires a strong technical leader to shape the roadmap and ensure the platform is secure, reliable, and scalable. The role is hybrid, based in Birmingham, UK, and is classified as inside IR35.

Key Responsibilities:

  • Design and build near real-time ingestion frameworks using Pub/Sub, Kafka, Dataflow (Beam), GCS, BigQuery, and Cloud SQL.
  • Develop and optimize streaming pipelines leveraging Apache Beam in Java or Python, deployed on Dataflow.
  • Implement CDC ingestion patterns using Datastream, ensuring resilience to schema drift and delivering low-latency updates.
  • Apply advanced BigQuery optimization techniques, including the Storage Write API, partitioning, clustering, and materialized views.
  • Tune streaming workloads for latency and throughput, including backpressure management, windowing, and watermarks.
  • Set up robust dead-letter queues, replay strategies, retries, and error-handling frameworks.
  • Establish enterprise-grade observability with Cloud Monitoring, Logging, and Trace.
  • Build highly available, self-healing ingestion pipelines with strong SLAs.
  • Implement strong security controls across IAM, KMS, VPC Service Controls, Secret Manager, and DLP.
  • Enforce best practices in data governance, quality validation, and auditability.
  • Lead the NRT ingestion roadmap and partner closely with architecture, platform, and analytics teams.
  • Mentor data engineers and promote engineering excellence, reusable patterns, and automation.
  • Define SLAs/SLOs, ensure compliance, and champion operational maturity.

Key Skills:

  • Deep hands-on expertise with Google Cloud Platform services including Pub/Sub, Dataflow, BigQuery, GCS, and Cloud Composer.
  • Strong experience with CDC technologies, especially Datastream.
  • Proficiency in Apache Beam (Java or Python) and SQL.
  • Solid knowledge of Terraform, CI/CD/GitOps, and container orchestration (e.g., GKE).
  • Strong understanding of distributed systems, streaming design patterns, and cloud security controls.
  • Proven experience leading teams, driving technical strategy, and delivering complex ingestion systems at scale.

Salary (Rate): £500/day

City: Birmingham

Country: UK

Working Arrangements: hybrid

IR35 Status: inside IR35

Seniority Level: Senior

Industry: IT

Detailed Description From Employer:

Lead Data Engineer - Near Real-Time Ingestion (GCP)

Location: Birmingham (Hybrid)
Role Type: Full-time
Seniority: Lead/Principal Engineer

Inside IR35

About the Role

We are seeking an experienced Lead Data Engineer to own and drive the design, development, and operational excellence of our near real-time (NRT) data ingestion platforms on Google Cloud. In this role, you will architect and deliver low-latency ingestion pipelines, enabling mission-critical, high-throughput data flows from diverse source systems into BigQuery and Cloud SQL using cutting-edge CDC and streaming technologies.

As a technical leader, you will shape the roadmap, mentor engineers, enforce best practices, and ensure our ingestion platform is secure, reliable, observable, and scalable.

Key Responsibilities

Architecture & Development

  • Design and build near real-time ingestion frameworks using Pub/Sub, Kafka, Dataflow (Beam), GCS, BigQuery, and Cloud SQL.
  • Develop and optimize streaming pipelines leveraging Apache Beam in Java or Python, deployed on Dataflow.
  • Implement CDC ingestion patterns using Datastream, ensuring resilience to schema drift and delivering low-latency updates.
  • Apply advanced BigQuery optimization techniques, including the Storage Write API, partitioning, clustering, and materialized views.

Performance, Reliability & Operations

  • Tune streaming workloads for latency and throughput, including backpressure management, windowing, and watermarks.
  • Set up robust dead-letter queues, replay strategies, retries, and error-handling frameworks.
  • Establish enterprise-grade observability with Cloud Monitoring, Logging, and Trace.
  • Build highly available, self-healing ingestion pipelines with strong SLAs.

Security & Governance

  • Implement strong security controls across IAM, KMS, VPC Service Controls, Secret Manager, and DLP.
  • Enforce best practices in data governance, quality validation, and auditability.

Leadership & Strategy

  • Lead the NRT ingestion roadmap and partner closely with architecture, platform, and analytics teams.
  • Mentor data engineers and promote engineering excellence, reusable patterns, and automation.
  • Define SLAs/SLOs, ensure compliance, and champion operational maturity.

Required Qualifications

  • Deep hands-on expertise with Google Cloud Platform services, including:
      • Pub/Sub
      • Dataflow
      • BigQuery
      • GCS
      • Cloud Composer
  • Strong experience with CDC technologies, especially Datastream.
  • Proficiency in Apache Beam (Java or Python) and SQL.
  • Solid knowledge of Terraform, CI/CD/GitOps, and container orchestration (e.g., GKE).
  • Strong understanding of distributed systems, streaming design patterns, and cloud security controls.
  • Proven experience leading teams, driving technical strategy, and delivering complex ingestion systems at scale.

Key Performance Indicators (KPIs)

Data ingestion latency (p95): < X seconds

Platform availability: ≥ 99.9%

Data quality/accuracy: ≥ 99.5%