Summary: The Big Data Engineer role focuses on developing and maintaining scalable data processing pipelines using Scala, Java, Apache Flink, and Apache Spark. The position requires expertise in AWS cloud services and containerization tools, with a strong emphasis on both real-time and batch data processing. The ideal candidate will collaborate with cross-functional teams to ensure data quality and governance across the platform.
Key Responsibilities:
- Design, develop, and maintain scalable data processing pipelines using Apache Flink and Apache Spark.
- Work on real-time and batch data processing using Kafka and Flink.
- Write clean, efficient, and testable code in Scala and Java.
- Implement data integration solutions leveraging Kafka topics and streaming APIs.
- Deploy, manage, and monitor Big Data applications on AWS (EMR, S3, Lambda, EC2, SQS).
- Use Docker and Kubernetes (k8s) for containerization and orchestration of microservices and batch jobs.
- Set up CI/CD pipelines with GitLab for automated testing and deployment.
- Monitor and troubleshoot systems using Splunk and other observability tools.
- Ensure data quality, security, and governance across the platform.
- Collaborate with cross-functional teams including data scientists, DevOps, and product stakeholders.
Key Skills:
- Strong programming experience in Scala and Java.
- Hands-on experience with Apache Flink, Apache Spark, and Apache Kafka.
- Deep understanding of stream processing and event-driven architectures.
- Proficient with AWS services such as EMR, Lambda, SQS, S3, and EC2.
- Working knowledge of containerization (Docker) and orchestration with Kubernetes.
- Experience in monitoring and log analysis using Splunk or equivalent tools.
- Good understanding of CI/CD processes using GitLab or similar tools.
- Strong problem-solving skills, with the ability to debug and optimize complex systems.
- Excellent communication and documentation skills.
Salary (Rate): undetermined
City: undetermined
Country: USA
Working Arrangements: remote
IR35 Status: outside IR35
Seniority Level: undetermined
Industry: IT