Data Engineer AWS

Posted 3 days ago by Athsai

Negotiable

Outside

Remote

England, United Kingdom

Airflow, Amazon CloudWatch, Amazon Redshift, Amazon S3, Amazon Web Services (AWS), Apache Airflow, Apache Kafka, Apache Spark, API Gateway, Application Programming Interface (API), AWS Elastic MapReduce (EMR), AWS Glue, AWS Identity And Access Management (IAM), AWS Kinesis, AWS Lambda, Azure Kubernetes Service, Batch Processing, Cloud Computing, Cloud Services, Cloud Technologies, Cloud Technology, Containerisation, Continuous Integration and Continuous Delivery, Data Architecture, Data Engineering, Data Governance, Data Processing, Data Quality, Data Streaming, Dependency Management, Docker (Software), Extract Transform Load (ETL), GitHub, Infrastructure as Code (IaC), Kubernetes, Machine Learning, Management, PySpark, Python (Programming Language), Scalability, Scheduling, SQL (Programming Language), Terraform, Workflows

Summary: We are seeking a Senior Data Engineer with expertise in PySpark and AWS to join our data engineering team. The role involves designing and optimizing large-scale data pipelines and collaborating with data scientists and analysts to ensure data reliability and availability. The ideal candidate will have extensive experience in cloud technologies and data processing. This position is fully remote within the UK.

Key Responsibilities:

Design, build, and maintain scalable data pipelines using PySpark and Python for high-volume, high-velocity data processing.
Develop and manage ETL/ELT workflows, ensuring data accuracy, consistency, and performance.
Orchestrate complex workflows using Apache Airflow, including scheduling, dependency management, and failure handling.
Architect and implement cloud-native data solutions on AWS, following best practices for performance, scalability, and security.
Work extensively with AWS services such as API Gateway, AWS Lambda, Amazon Redshift, AWS Glue, Amazon CloudWatch, Amazon S3, EMR, and IAM.
Use Terraform to provision and manage AWS infrastructure as code, ensuring reproducible and reliable environments.
Build and maintain CI/CD pipelines using GitHub Actions to automate testing, deployment, and infrastructure changes.
Optimize Spark jobs, tune performance, and troubleshoot production issues across distributed systems.
Collaborate with cross-functional teams to define data architecture, governance, and best practices.

Key Skills:

6+ years of hands-on experience in data engineering or related roles.
Strong expertise in Python, PySpark, and SQL with experience in writing optimized, production-grade code.
In-depth knowledge of Apache Spark internals and Apache Airflow.
Proven experience designing and implementing ETL pipelines for large-scale data platforms.
Strong hands-on experience with AWS cloud services, especially API Gateway, Lambda, Redshift, Glue, CloudWatch, S3, and EMR.
Experience provisioning infrastructure using Terraform.
Practical experience building CI/CD pipelines using GitHub Actions.
Experience with real-time data streaming using Kafka, Kinesis, or similar technologies (preferred).
Familiarity with containerization tools such as Docker and Kubernetes (preferred).
Knowledge of data governance, data quality frameworks, and monitoring strategies (preferred).

Salary (Rate): undetermined

City: undetermined

Country: United Kingdom

Working Arrangements: remote

IR35 Status: outside IR35

Seniority Level: Senior

Industry: IT

Detailed Description From Employer:

We Are Hiring – Senior Data Engineer (PySpark & AWS)

Salary: 300 outside IR35

Location: Remote in UK

We are looking for an experienced and highly skilled Senior Data Engineer to join our growing data engineering team. This role is ideal for a passionate engineer who thrives in building scalable data platforms, designing robust pipelines, and working with cutting-edge cloud technologies.

About The Role

As a Senior Data Engineer, you will be responsible for designing, developing, and optimizing large-scale data pipelines that power analytics, reporting, and machine learning initiatives. You will work closely with data scientists, analysts, and platform teams to ensure data is reliable, secure, and available in both real-time and batch processing environments.

Why Join Us?

Work on large-scale, high-impact data platforms.
Opportunity to shape modern data architecture in a cloud-first environment.
Collaborative, innovative, and growth-focused culture.
Competitive compensation and benefits.
