Summary: This Databricks Data Engineer role centers on developing and optimizing large-scale data engineering solutions on the Databricks Data Intelligence Platform. The engineer will focus on workflow orchestration, performance optimization, and data governance, using PySpark, Delta Lake, and Azure services, and will collaborate with cloud architects and data analysts to design end-to-end workflows for analytics and machine learning. The position requires a strong technical background and experience managing data pipelines and governance practices.
Salary (Rate): undetermined
City: undetermined
Country: United Kingdom
Working Arrangements: undetermined
IR35 Status: undetermined
Seniority Level: undetermined
Industry: IT
We are looking for a Databricks Data Engineer with strong expertise in developing and optimizing large-scale data engineering solutions within the Databricks Data Intelligence Platform. The ideal candidate will have practical experience in workflow orchestration, performance optimization, and data governance, alongside broad proficiency in PySpark, Delta Lake, and Azure services.
Key Responsibilities:
- Design, build, and maintain robust data pipelines using Databricks notebooks, Jobs, and Workflows for batch and streaming data processing.
- Optimize Spark and Delta Lake performance on Databricks clusters through efficient cluster configuration, adaptive query execution, and caching strategies.
- Conduct performance testing and cluster tuning to ensure cost-efficient and high-performing workloads.
- Implement data quality, lineage tracking, and access control policies aligned with Databricks Unity Catalog and data governance best practices (a grant example follows this list).
- Develop PySpark applications for ETL, data transformation, and analytical use cases, adhering to modular and reusable design principles.
- Create and manage Delta Lake tables with a focus on ACID compliance, schema evolution, and time travel for versioned data management (this and the previous item are sketched in the first example after this list).
- Integrate Databricks solutions with Azure services including Azure Data Lake Storage, Key Vault, and Azure Functions.
- Collaborate with cloud architects and data analysts to design end-to-end workflows supporting analytics, machine learning, and reporting use cases.
- Support CI/CD deployment of Databricks assets using Azure DevOps or similar automation frameworks.
- Maintain detailed technical documentation on architecture, performance benchmarks, and governance configurations.
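To make the PySpark and Delta Lake items above concrete, here is a minimal sketch, assuming a Databricks notebook where `spark` is preconfigured and Delta is the default table format; the ADLS path, catalog objects, and column names are hypothetical:

```python
# Batch ETL into a Delta table with schema evolution, then a time-travel
# read. Assumes a Databricks runtime; all names below are illustrative.
from pyspark.sql import functions as F

# Extract: raw JSON from a (hypothetical) ADLS Gen2 landing zone.
raw = spark.read.json("abfss://landing@examplelake.dfs.core.windows.net/orders/")

# Transform: small, composable steps keep the pipeline modular and testable.
cleaned = (
    raw.filter(F.col("order_id").isNotNull())
       .withColumn("order_date", F.to_date("order_ts"))
)

# Load: append to a governed table; mergeSchema permits additive schema
# evolution when the source gains new columns.
(cleaned.write.format("delta")
        .mode("append")
        .option("mergeSchema", "true")
        .saveAsTable("main.sales.orders"))

# Time travel: read the table as of an earlier version, e.g. for audits
# or reproducible backfills; Delta's transaction log keeps this ACID-safe.
v0 = spark.read.option("versionAsOf", 0).table("main.sales.orders")
```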
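Similarly, the Unity Catalog governance item might translate into grants like these; the `data_analysts` group and the securables are assumptions, and in practice such grants are usually managed as code:

```python
# Illustrative Unity Catalog access-control grants (hypothetical names).
spark.sql("GRANT USE CATALOG ON CATALOG main TO `data_analysts`")
spark.sql("GRANT USE SCHEMA ON SCHEMA main.sales TO `data_analysts`")
spark.sql("GRANT SELECT ON TABLE main.sales.orders TO `data_analysts`")
# Unity Catalog records lineage automatically for queries on governed
# tables, so no extra code is needed for lineage tracking itself.
```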
Required Skills and Experience:
- In-depth knowledge of Databricks Data Intelligence Platform and multi-cloud ecosystem integration.
- Experience configuring, scheduling, and monitoring Databricks Jobs and Workflows.
- Strong proficiency in PySpark, including advanced data transformation, schema management, and optimization techniques.
- Solid understanding of Delta Lake architecture, transactional processing, and incremental data pipeline design.
- Proven ability to conduct Spark performance tuning and cluster optimization based on workload profiles (see the tuning sketch after this list).
- Experience implementing fine-grained data governance with Unity Catalog, access policies, and data lineage tracking.
- Hands-on experience with Azure Cloud components such as Data Lake Storage (Gen2), Key Vault, and Azure Functions.
- Familiarity with CI/CD frameworks for Databricks asset deployment and environment automation.
- Strong analytical and troubleshooting skills in distributed data environments.
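As a rough illustration of the tuning skill above: adaptive query execution and shuffle settings are session configuration, caching serves repeated reads, and file compaction speeds Delta scans. Table and column names are hypothetical, and AQE is already on by default in recent Databricks runtimes:

```python
# Common Spark/Delta tuning knobs (illustrative; verify against the
# runtime in use, since defaults differ between versions).
spark.conf.set("spark.sql.adaptive.enabled", "true")                     # adaptive query execution
spark.conf.set("spark.sql.adaptive.coalescePartitions.enabled", "true")  # merge small shuffle partitions
# "auto" is Databricks-specific; open-source Spark expects an integer here.
spark.conf.set("spark.sql.shuffle.partitions", "auto")

# Cache a DataFrame that several downstream steps re-read.
hot = spark.table("main.sales.orders").where("order_date >= '2024-01-01'")
hot.cache()
hot.count()  # action to materialize the cache

# Compact small files and co-locate rows on a frequent filter column.
spark.sql("OPTIMIZE main.sales.orders ZORDER BY (customer_id)")
```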
Preferred Qualifications:
- Experience supporting enterprise-scale Databricks environments with multiple workspaces and governed catalogs.
- Knowledge of Azure Synapse, Power BI, or related analytics services.
- Understanding of cost optimization strategies for data compute on Databricks clusters (a sample cluster spec follows below).
- Excellent problem-solving skills, technical communication, and cross-functional collaboration abilities.
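On the cost-optimization point, one common lever is the job cluster definition itself: autoscaling caps cluster size while Azure spot capacity with on-demand fallback lowers compute cost. Below is a sketch of a Jobs API `new_cluster` payload expressed as a Python dict; the runtime version, VM size, and worker counts are examples to adapt per workload:

```python
# Sketch of a cost-aware Databricks job cluster spec (example values only).
job_cluster = {
    "spark_version": "15.4.x-scala2.12",                 # example LTS runtime
    "node_type_id": "Standard_D4ds_v5",                  # example Azure VM size
    "autoscale": {"min_workers": 2, "max_workers": 8},   # scale with load, cap spend
    "azure_attributes": {
        "first_on_demand": 1,                            # keep the driver on-demand
        "availability": "SPOT_WITH_FALLBACK_AZURE",      # spot workers with fallback
    },
}
```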