Summary: The Data & Analytics Engineer role focuses on designing, building, and optimizing data pipelines and analytical datasets on the Databricks Lakehouse. The position requires collaboration with business stakeholders to gather requirements and deliver scalable solutions that enhance business value. The ideal candidate will have a strong technical background in data engineering and analytics, ensuring data quality and governance throughout the process. This role is pivotal during a transformational journey to enhance global data capabilities.
Salary (Rate): undetermined
City: Staffordshire
Country: United Kingdom
Working Arrangements: undetermined
IR35 Status: undetermined
Seniority Level: undetermined
Industry: IT
Data & Analytics Engineer (Databricks Lakehouse)
This requirement calls for a self-motivated, results-driven Data & Analytics Engineer to join our client at an exciting time, as they embark on a transformational journey with their global data capabilities. You will own the design, build, and optimisation of reliable data pipelines and well-governed analytical datasets on the Databricks Lakehouse. You will acquire and transform raw data (bronze) into high-quality, BI-ready models (gold), ensure data quality and lineage, and enable fast, secure analytics through SQL Warehouses and Unity Catalog. The ideal candidate will work closely with key business stakeholders and our SI partners to gather requirements, optimise business processes, and deliver scalable solutions that drive business value.
Responsibilities
- Data Engineering
- Design and implement batch/streaming pipelines (PySpark, Spark SQL, DLT, Auto Loader) across bronze, silver & gold layers.
- Implement CDC patterns and incremental loads; tune partitioning, Z-ordering, and OPTIMIZE/VACUUM maintenance.
- Build robust tests (unit/integration), data expectations, and observability (alerts/metrics, runbooks).
- Orchestrate using Databricks Jobs.
- Collaborate on CI/CD with Platform (Repos, Git, deployment pipelines, Terraform Jobs/Infrastructure as Code).
- Analytics Engineering
- Model curated gold datasets, dimensional models, and semantic layers for BI (Tableau).
- Optimise SQL Warehouses for performance, concurrency, and cost; maintain versioned metric definitions.
- Document tables with catalog cards (owner, SLA, lineage, data contracts); enable discoverability.
- Governance & Security
- Implement access controls with Unity Catalog.
- Adhere to naming conventions, tagging standards (owner, environment, cost centre, classification).
- Participate in access reviews, audit remediation, and privacy/compliance processes.
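
The CDC responsibilities above are typically realised with Delta Lake's MERGE INTO. As a minimal sketch of the underlying upsert semantics — written in plain Python with hypothetical record shapes, so it runs without a Spark cluster — the logic looks like this:

```python
# Minimal sketch of the upsert semantics behind Delta Lake's MERGE INTO,
# in plain Python so it runs without Spark. The record shape and the
# "op" field are hypothetical, not taken from the job specification.

def apply_cdc(target: dict, changes: list[dict], key: str = "id") -> dict:
    """Apply a batch of change events to a keyed target table.

    target  -- current table state, keyed by primary key
    changes -- CDC events, each carrying an "op" of insert/update/delete
    """
    for change in changes:
        op = change["op"]
        row = change["row"]
        k = row[key]
        if op == "delete":
            # WHEN MATCHED AND op = 'delete' THEN DELETE
            target.pop(k, None)
        elif k in target:
            # WHEN MATCHED THEN UPDATE SET *
            target[k] = row
        else:
            # WHEN NOT MATCHED THEN INSERT *
            target[k] = row
    return target


if __name__ == "__main__":
    table = {1: {"id": 1, "name": "alpha"}, 2: {"id": 2, "name": "beta"}}
    events = [
        {"op": "update", "row": {"id": 1, "name": "alpha-v2"}},
        {"op": "insert", "row": {"id": 3, "name": "gamma"}},
        {"op": "delete", "row": {"id": 2}},
    ]
    print(apply_cdc(table, events))
    # {1: {'id': 1, 'name': 'alpha-v2'}, 3: {'id': 3, 'name': 'gamma'}}
```

On Databricks this whole batch would be a single `MERGE INTO target USING changes ...` statement; the three branches above correspond to its WHEN MATCHED / WHEN NOT MATCHED clauses.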
Qualifications
Core Technical
- Databricks (Workspaces, Clusters/Policies, Pools, Jobs, SQL Warehouses, Repos).
- Spark / PySpark / Spark SQL with performance tuning (partitioning, caching, joins, shuffle optimisation).
- Delta Lake (ACID transactions, time travel, OPTIMIZE, ZORDER, VACUUM, MERGE INTO for CDC).
- Delta Live Tables (DLT) and/or Structured Streaming for incremental/real-time pipelines.
- Unity Catalog: permissions model, lineage, data discovery, RLS/CLS patterns.
- CI/CD: Git (branching/PR flow), build/test pipelines, environment promotion; familiarity with Terraform (Databricks provider).
- Data Quality: expectations, validation frameworks, test automation.
Required Skills
- Dimensional modelling, star/snowflake schemas, semantic versioning.
- SQL performance on analytical workloads; familiarity with Tableau (live connections and extracts) and other reporting tools.
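
As an illustration of the dimensional-modelling skills listed above, a minimal star schema — one fact table joined to two dimensions — can be sketched with Python's standard-library sqlite3 module (all table and column names here are hypothetical):

```python
# Minimal star-schema sketch: one fact table joined to two dimensions.
# Uses stdlib sqlite3 so it runs anywhere; names are illustrative only.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, year INTEGER);
    CREATE TABLE fact_sales  (
        product_key INTEGER REFERENCES dim_product(product_key),
        date_key    INTEGER REFERENCES dim_date(date_key),
        amount      REAL
    );
""")
cur.executemany("INSERT INTO dim_product VALUES (?, ?)",
                [(1, "widget"), (2, "gadget")])
cur.executemany("INSERT INTO dim_date VALUES (?, ?)",
                [(20240101, 2024), (20250101, 2025)])
cur.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                [(1, 20240101, 10.0), (1, 20250101, 5.0), (2, 20250101, 7.5)])

# A typical BI query: aggregate the fact table, sliced by dimension attributes.
rows = cur.execute("""
    SELECT p.name, d.year, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON p.product_key = f.product_key
    JOIN dim_date    d ON d.date_key    = f.date_key
    GROUP BY p.name, d.year
    ORDER BY p.name, d.year
""").fetchall()
print(rows)
# [('gadget', 2025, 7.5), ('widget', 2024, 10.0), ('widget', 2025, 5.0)]
```

The same shape carries over to gold-layer Delta tables: narrow, keyed dimensions joined to an additive fact, with BI tools querying the aggregate through a SQL Warehouse.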
Equal Opportunity Statement
We are committed to diversity and inclusivity in the workplace.