MLOps Data Engineer (GCP)

Posted 1 day ago by Xcede

Negotiable
Outside IR35
Hybrid
Greater London, England, United Kingdom

Summary: The MLOps Data Engineer role focuses on building and maintaining reliable data pipelines and ML pipelines in production, enabling data science through high-quality datasets and robust deployments. The position requires close collaboration with Data Scientists to standardise workflows and ensure reproducibility, while implementing observability and best practices for data and ML quality. The role calls for strong technical skills in Python, SQL, and cloud environments, particularly GCP. The position is hybrid, requiring two to three days per week in London.

Salary (Rate): Negotiable

City: Greater London

Country: United Kingdom

Working Arrangements: hybrid

IR35 Status: outside IR35

Seniority Level: undetermined

Industry: IT

Detailed Description From Employer:

MLOps Data Engineer | Hybrid, 2-3 days per week in London | Outside IR35

We’re looking for a Data Engineer with strong MLOps ownership—someone who builds reliable data pipelines and designs, runs, and improves ML pipelines in production. You won’t be training models day-to-day like a Data Scientist; instead, you’ll enable Data Science by delivering high-quality datasets, reproducible training pipelines, robust deployments, and monitoring that keeps ML systems healthy and trustworthy.

What you’ll do

  • Design, build, and operate scalable data pipelines for ingestion, transformation, and distribution
  • Develop and maintain ML pipelines end-to-end: data preparation, feature generation, training orchestration, packaging, deployment, and retraining
  • Partner closely with Data Scientists to productionize models: standardise workflows, ensure reproducibility, and reduce time-to-production
  • Build and maintain MLOps automation: CI/CD for ML, environment management, artefact handling, versioning of data/models/code
  • Implement observability for ML systems: monitoring, alerting, logging, dashboards, and incident response for data + model health
  • Establish best practices for data quality and ML quality: validation checks, pipeline tests, lineage, documentation, and SLAs/SLOs (a minimal validation sketch follows this list)
  • Optimise cost and performance across data processing and training workflows (e.g., Spark tuning, BigQuery optimisation, compute autoscaling)
  • Ensure secure, compliant handling of data and models, including access controls, auditability, and governance practices
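
To make the validation-check responsibility concrete, here is a minimal sketch in plain Python/pandas. The `orders` dataset, column names, and thresholds are all hypothetical; in practice the checks would come from each pipeline's data contract, and a production setup would more likely use a framework such as Great Expectations (mentioned under nice-to-haves below).

```python
import pandas as pd

def validate_orders(df: pd.DataFrame) -> list[str]:
    """Return human-readable failures; an empty list means the batch passes.

    Hypothetical checks for an illustrative `orders` dataset -- real
    expectations would be defined by the pipeline's data contract.
    """
    failures: list[str] = []

    # Schema check: required columns must be present.
    required = {"order_id", "customer_id", "amount", "created_at"}
    missing = required - set(df.columns)
    if missing:
        failures.append(f"missing columns: {sorted(missing)}")
        return failures  # later checks assume these columns exist

    # Uniqueness: order_id is the primary key.
    if df["order_id"].duplicated().any():
        failures.append("duplicate order_id values")

    # Completeness: cap the tolerated null rate (1% threshold is illustrative).
    null_rate = df["amount"].isna().mean()
    if null_rate > 0.01:
        failures.append(f"amount null rate {null_rate:.2%} exceeds 1%")

    # Plausibility: negative amounts suggest upstream corruption.
    if (df["amount"].dropna() < 0).any():
        failures.append("negative amount values")

    return failures

if __name__ == "__main__":
    batch = pd.DataFrame({
        "order_id": [1, 2, 2],
        "customer_id": [10, 11, 12],
        "amount": [9.99, -5.0, 20.0],
        "created_at": pd.to_datetime(["2024-01-01"] * 3),
    })
    for problem in validate_orders(batch):
        print("FAIL:", problem)  # a real pipeline would alert and halt here
```

Wiring a check like this into the orchestrator as a hard gate (fail the task, alert on repeated failures) is what turns a script into the SLO-backed practice the bullet describes.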

What makes you a great fit

  • 4+ years of experience as a Data Engineer (or ML Platform / MLOps Engineer with strong DE foundations) shipping production pipelines
  • Strong Python and SQL skills; ability to write maintainable, testable, production-grade code (see the sketch after this list)
  • Solid understanding of MLOps fundamentals: model lifecycle, reproducibility, deployment patterns, and monitoring needs
  • Hands-on experience with orchestration and distributed processing in a cloud environment
  • Experience with data modelling and ETL/ELT patterns; ability to deliver analysis-ready datasets
  • Familiarity with containerization and deployment workflows (Docker, CI/CD, basic Kubernetes/serverless concepts)
  • Strong GCP experience with services such as Vertex AI, BigQuery, Composer, Dataproc, Cloud Run, Dataplex, and Cloud Storage, or equivalent depth in at least one major cloud provider (GCP, AWS, or Azure)
  • Strong troubleshooting mindset: ability to debug issues across data, infra, pipelines, and deployments
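
As a rough illustration of "maintainable, testable, production-grade" in this context: transformation logic written as pure functions with I/O pushed to the edges, so it can be unit-tested locally without a warehouse or cluster. All names here (`Event`, `daily_totals`) are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    user_id: str
    value: float

def daily_totals(events: list[Event]) -> dict[str, float]:
    """Aggregate event values per user; pure function, trivially testable."""
    totals: dict[str, float] = {}
    for e in events:
        totals[e.user_id] = totals.get(e.user_id, 0.0) + e.value
    return totals

# pytest-style unit test: runs in milliseconds with no infrastructure.
def test_daily_totals_groups_by_user():
    events = [Event("a", 1.0), Event("b", 2.0), Event("a", 3.0)]
    assert daily_totals(events) == {"a": 4.0, "b": 2.0}
```

The same pattern applies at scale: keep the BigQuery or Spark I/O in thin, integration-tested wrappers, and keep the business logic in code that a plain test runner can exercise.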

Nice to have / big advantage

  • Experience with ML tooling such as MLflow (tracking/registry), Vertex AI / SageMaker / Azure ML, or similar platforms (a minimal tracking sketch follows this list)
  • Experience building and maintaining feature stores (e.g., Feast, Vertex Feature Store)
  • Experience with data/model validation tools (e.g., Great Expectations, TensorFlow Data Validation, Evidently)
  • Knowledge of model monitoring concepts: drift, data quality issues, performance degradation, bias checks, and alerting strategies
  • Infrastructure-as-Code (Terraform) and secrets management / IAM best practices
  • Familiarity with governance/compliance standards and audit requirements
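
For the experiment-tracking side of the MLflow bullet, a minimal sketch of what tracking a training run looks like. The experiment name, hyperparameters, and toy classifier are placeholders; the point is that params, metrics, and the serialised model are versioned together per run, which underpins the reproducibility and auditability this role emphasises.

```python
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical experiment name; in practice this maps to a team or project.
mlflow.set_experiment("churn-model-demo")

X, y = make_classification(n_samples=500, n_features=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

with mlflow.start_run():
    params = {"C": 0.5, "max_iter": 200}
    model = LogisticRegression(**params).fit(X_train, y_train)

    # Params, metrics, and the model artefact are logged against one run ID,
    # so any deployed model can be traced back to its exact training inputs.
    mlflow.log_params(params)
    mlflow.log_metric("test_accuracy",
                      accuracy_score(y_test, model.predict(X_test)))
    mlflow.sklearn.log_model(model, "model")
```

The registry side (promoting a run's model through staging to production) builds on the same run metadata; Vertex AI and SageMaker offer equivalent tracking and registry primitives.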