Big Data Engineer - Scala, Java

Posted 7 days ago

Negotiable
Outside
Remote
USA

Summary: The Big Data Engineer role focuses on developing and maintaining scalable data processing pipelines in Scala and Java using Apache Flink and Apache Spark. The position requires expertise in AWS cloud services and containerization tools, and places a strong emphasis on both real-time and batch data processing. The ideal candidate will collaborate with cross-functional teams to ensure data quality and governance across the platform.

Salary (Rate): undetermined

City: undetermined

Country: USA

Working Arrangements: remote

IR35 Status: outside IR35

Seniority Level: undetermined

Industry: IT

Detailed Description From Employer:
Job Title: Big Data Engineer - Scala, Java
Location: Remote
Experience: 10+ years
About the Role:
We are looking for an experienced Big Data Engineer with a strong background in Scala, Java, and a deep understanding of streaming and batch processing frameworks such as Apache Flink, Apache Spark, and Apache Kafka. The ideal candidate should have hands-on experience with AWS cloud services (EMR, Lambda, SQS, S3, EC2), containerization tools like Docker, orchestration using Kubernetes, and exposure to CI/CD pipelines with GitLab.
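
To make the streaming stack concrete, here is a minimal Scala sketch of the kind of Flink job the role describes: it consumes a Kafka topic and maintains a running count per key. This is an illustration under stated assumptions, not the employer's code; the broker address, topic name, and consumer group are placeholders.

    import org.apache.flink.api.common.eventtime.WatermarkStrategy
    import org.apache.flink.api.common.serialization.SimpleStringSchema
    import org.apache.flink.connector.kafka.source.KafkaSource
    import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer
    import org.apache.flink.streaming.api.scala._

    object EventCounter {
      def main(args: Array[String]): Unit = {
        val env = StreamExecutionEnvironment.getExecutionEnvironment

        // Kafka source; broker, topic, and group id are illustrative placeholders.
        val source = KafkaSource.builder[String]()
          .setBootstrapServers("localhost:9092")
          .setTopics("events")
          .setGroupId("event-counter")
          .setStartingOffsets(OffsetsInitializer.earliest())
          .setValueOnlyDeserializer(new SimpleStringSchema())
          .build()

        env
          .fromSource(source, WatermarkStrategy.noWatermarks[String](), "kafka-events")
          .map(line => (line.split(",")(0), 1)) // key on the first CSV field
          .keyBy(_._1)
          .sum(1)                               // running count per key
          .print()

        env.execute("event-counter")
      }
    }

In a real pipeline the job would write to a sink (another Kafka topic, S3, a database) rather than print, and would run on a Flink cluster such as one hosted on EMR.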

Key Responsibilities:
  • Design, develop, and maintain scalable data processing pipelines using Apache Flink and Apache Spark.
  • Work on real-time and batch data processing using Kafka and Flink.
  • Write clean, efficient, and testable code in Scala and Java.
  • Implement data integration solutions leveraging Kafka topics and streaming APIs.
  • Deploy, manage, and monitor Big Data applications on AWS (EMR, S3, Lambda, EC2, SQS); a minimal Spark-on-S3 batch sketch follows this list.
  • Use Docker and Kubernetes (k8s) for containerization and orchestration of microservices and batch jobs.
  • Set up CI/CD pipelines with GitLab for automated testing and deployment.
  • Monitor and troubleshoot systems using Splunk and other observability tools.
  • Ensure data quality, security, and governance across the platform.
  • Collaborate with cross-functional teams including data scientists, DevOps, and product stakeholders.
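
As a sketch of the batch side of this work, the following Spark job reads Parquet data from S3 and writes per-user counts back. The bucket and path names are invented for illustration; on EMR the s3:// scheme resolves through EMRFS and credentials come from the cluster configuration.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object DailyAggregates {
      def main(args: Array[String]): Unit = {
        // On EMR the master and AWS credentials come from the cluster config.
        val spark = SparkSession.builder()
          .appName("daily-aggregates")
          .getOrCreate()

        // Hypothetical input path, partitioned by date.
        val events = spark.read.parquet("s3://example-bucket/events/dt=2024-01-01/")

        events
          .groupBy(col("userId"))
          .agg(count("*").as("eventCount"))
          .write
          .mode("overwrite")
          .parquet("s3://example-bucket/aggregates/dt=2024-01-01/")

        spark.stop()
      }
    }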

Required Skills & Qualifications:
  • Strong programming experience in Scala and Java.
  • Hands-on experience with Apache Flink, Apache Spark, and Apache Kafka (see the Kafka producer sketch at the end of this posting).
  • Deep understanding of stream processing and event-driven architectures.
  • Proficient with AWS services such as EMR, Lambda, SQS, S3, EC2.
  • Working knowledge of containerization (Docker) and orchestration with Kubernetes.
  • Experience in monitoring and log analysis using Splunk or equivalent tools.
  • Good understanding of CI/CD processes using GitLab or similar tools.
  • Strong problem-solving skills, with the ability to debug and optimize complex systems.
  • Excellent communication and documentation skills.
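
For the data integration side, here is a minimal Scala sketch of publishing keyed records to a Kafka topic with the standard Java client. The broker address, topic, key, and payload are placeholders chosen for illustration.

    import java.util.Properties
    import org.apache.kafka.clients.producer.{KafkaProducer, ProducerConfig, ProducerRecord}
    import org.apache.kafka.common.serialization.StringSerializer

    object EventProducer {
      def main(args: Array[String]): Unit = {
        val props = new Properties()
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, classOf[StringSerializer].getName)
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, classOf[StringSerializer].getName)
        props.put(ProducerConfig.ACKS_CONFIG, "all") // favour durability over latency

        val producer = new KafkaProducer[String, String](props)
        try {
          // Keyed records preserve per-key ordering within a partition.
          producer.send(new ProducerRecord("events", "user-42", """{"action":"click"}"""))
          producer.flush()
        } finally {
          producer.close()
        }
      }
    }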