Summary: The Data Engineer role focuses on building and maintaining data pipelines in PySpark and Python, with the specific task of migrating a Jupyter Notebook to an in-house PySpark framework for AWS Glue jobs. The position requires strong documentation skills and the ability to perform reconciliation checks to ensure data integrity. Candidates should have a solid background in data engineering concepts and experience with AWS and modern data warehouse platforms.
Salary (Rate): Negotiable
City: London Area
Country: United Kingdom
Working Arrangements: undetermined
IR35 Status: undetermined
Seniority Level: undetermined
Industry: IT
Job Responsibilities
- Build PySpark and Python data pipelines.
- Write and maintain documentation of technical architecture.
- Identify areas for quick wins to improve the experience of end users.
- Migrate an existing Jupyter Notebook from OmniAI to the firm’s in-house PySpark framework for orchestrating AWS Glue jobs, within an estimated three months.
- Ensure the final output of the new pipeline matches that of the existing pipeline.
- Perform reconciliation checks, then identify and resolve any differences (see the sketch after this list).
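For illustration only, a reconciliation check along these lines could be sketched in PySpark as below; the S3 paths and DataFrame names are hypothetical placeholders, not details from this posting.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("reconciliation-check").getOrCreate()

    # Hypothetical locations of the two pipeline outputs (placeholders).
    old_df = spark.read.parquet("s3://example-bucket/old-pipeline-output/")
    new_df = spark.read.parquet("s3://example-bucket/new-pipeline-output/")

    # Cheapest first signal of divergence: compare row counts.
    old_count, new_count = old_df.count(), new_df.count()
    print(f"rows: old={old_count} new={new_count} diff={old_count - new_count}")

    # Row-level comparison: exceptAll keeps duplicate rows, so it also
    # surfaces cardinality drift between the two outputs.
    only_in_old = old_df.exceptAll(new_df)
    only_in_new = new_df.exceptAll(old_df)
    print(f"only in old: {only_in_old.count()} | only in new: {only_in_new.count()}")

Matching row counts plus empty exceptAll results on both sides is a reasonable signal that the migrated pipeline reproduces the existing output; column-level tolerances for floating-point fields would be a natural extension.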
Required Qualifications, Capabilities, and Skills
- Formal training or certification on data engineering concepts and 3+ years applied experience.
- Proficiency in one or more programming languages such as Python and Java.
- Ability to design and implement scalable data pipelines for batch and real-time data processing (a minimal AWS Glue job sketch follows this list).
- Experience with AWS.
- Experience working with modern data warehouse platforms like Amazon Redshift.
- Experience in developing, debugging, and maintaining code in a large corporate environment.
- Overall knowledge of the Software Development Life Cycle.
- Solid understanding of agile methodologies such as CI/CD, Application Resiliency, and Security.
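As a rough sketch of the kind of AWS Glue PySpark job the migration work above targets, the skeleton below uses the standard awsglue entry points; the database, table, and output path are hypothetical placeholders, not details from this posting.

    import sys
    from awsglue.utils import getResolvedOptions
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from pyspark.context import SparkContext

    # Standard Glue job setup: resolve arguments, create contexts.
    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    glue_context = GlueContext(SparkContext())
    spark = glue_context.spark_session
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # Read a source table from the Glue Data Catalog (placeholder names).
    dyf = glue_context.create_dynamic_frame.from_catalog(
        database="example_db", table_name="example_table"
    )

    # Transform with plain PySpark, then write the result (placeholder path).
    df = dyf.toDF().dropDuplicates()
    df.write.mode("overwrite").parquet("s3://example-bucket/curated/")

    job.commit()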
Preferred Qualifications, Capabilities, and Skills
- Certifications in relevant technologies or platforms, such as AWS Certified Big Data Engineer Associate, can be advantageous.
- Relevant industry experience, preferably in a data engineering role.