Data Engineer AWS

Posted 3 days ago by Athsai

Negotiable
Outside
Remote
England, United Kingdom

Summary: We are seeking a Senior Data Engineer with expertise in PySpark and AWS to join our data engineering team. The role involves designing and optimizing large-scale data pipelines and collaborating with data scientists and analysts to ensure data reliability and availability. The ideal candidate will have extensive experience in cloud technologies and data processing. This position is fully remote within the UK.

Salary (Rate): undetermined

City: undetermined

Country: United Kingdom

Working Arrangements: remote

IR35 Status: outside IR35

Seniority Level: Senior

Industry: IT

Detailed Description From Employer:

We Are Hiring – Senior Data Engineer (PySpark & AWS)

Salary: 300 outside IR35

Location: Remote in UK

We are looking for an experienced and highly skilled Senior Data Engineer to join our growing data engineering team. This role is ideal for a passionate engineer who thrives in building scalable data platforms, designing robust pipelines, and working with cutting-edge cloud technologies.

About The Role

As a Senior Data Engineer, you will be responsible for designing, developing, and optimizing large-scale data pipelines that power analytics, reporting, and machine learning initiatives. You will work closely with data scientists, analysts, and platform teams to ensure data is reliable, secure, and available in both real-time and batch processing environments.

Key Responsibilities

  • Design, build, and maintain scalable data pipelines using PySpark and Python for high-volume, high-velocity data processing.
  • Develop and manage ETL/ELT workflows, ensuring data accuracy, consistency, and performance.
  • Orchestrate complex workflows using Apache Airflow, including scheduling, dependency management, and failure handling.
  • Architect and implement cloud-native data solutions on AWS, following best practices for performance, scalability, and security.
  • Work extensively with AWS services such as Amazon API Gateway, AWS Lambda, Amazon Redshift, AWS Glue, Amazon CloudWatch, Amazon S3, Amazon EMR, and IAM.
  • Use Terraform to provision and manage AWS infrastructure as code, ensuring reproducible and reliable environments.
  • Build and maintain CI/CD pipelines using GitHub Actions to automate testing, deployment, and infrastructure changes.
  • Optimize Spark jobs, tune performance, and troubleshoot production issues across distributed systems.
  • Collaborate with cross-functional teams to define data architecture, governance, and best practices.

Required Qualifications

  • 6+ years of hands-on experience in data engineering or related roles.
  • Strong expertise in Python, PySpark, and SQL with experience in writing optimized, production-grade code.
  • In-depth knowledge of Apache Spark internals and Apache Airflow.
  • Proven experience designing and implementing ETL pipelines for large-scale data platforms.
  • Strong hands-on experience with AWS cloud services, especially API Gateway, Lambda, Redshift, Glue, CloudWatch, S3, and EMR.
  • Experience provisioning infrastructure using Terraform.
  • Practical experience building CI/CD pipelines using GitHub Actions.

Preferred Qualifications

  • Experience with real-time data streaming using Kafka, Kinesis, or similar technologies.
  • Familiarity with containerization tools such as Docker and Kubernetes.
  • Knowledge of data governance, data quality frameworks, and monitoring strategies.

Why Join Us?

  • Work on large-scale, high-impact data platforms.
  • Opportunity to shape modern data architecture in a cloud-first environment.
  • Collaborative, innovative, and growth-focused culture.
  • Competitive compensation and benefits.