Summary: The Hadoop Engineer role focuses on leveraging extensive experience with Hadoop and its components to develop and manage big data solutions on the Open Data Platform. The position requires proficiency in Python and Apache Airflow, along with expertise in real-time data processing using Apache Spark Streaming. The engineer will also be responsible for building scalable ETL pipelines and analyzing operational data to derive insights.
Key Responsibilities:
- Develop and manage big data solutions using Hadoop and related components.
- Utilize Python for scripting, data manipulation, and pipeline development.
- Implement workflow orchestration and scheduling with Apache Airflow (see the DAG sketch after this list).
- Process real-time data and perform stream analytics using Apache Spark Streaming.
- Build scalable ETL pipelines and data ingestion frameworks.
- Analyze telemetry logs and performance data from infrastructure systems.
- Derive insights from operational data, including server metrics and network logs.
- Understand data schemas, partitioning strategies, and optimization techniques for large-scale datasets.
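For illustration only, the sketch below shows how a daily ingest-and-transform workflow of the kind described above might be orchestrated with Apache Airflow. The DAG id, task commands, HDFS paths, and schedule are hypothetical placeholders rather than part of this specification, and the parameter names follow Airflow 2.4+.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

# Default task arguments; the retry policy is illustrative.
default_args = {
    "owner": "data-engineering",
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
}

# Daily pipeline: land raw telemetry in HDFS, then run a Spark transformation over it.
# The dag_id, paths, and spark-submit target are hypothetical placeholders.
with DAG(
    dag_id="telemetry_ingest",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # "schedule" is the Airflow 2.4+ name for schedule_interval
    catchup=False,
    default_args=default_args,
) as dag:
    ingest = BashOperator(
        task_id="ingest_to_hdfs",
        bash_command="hdfs dfs -put -f /data/raw/telemetry /warehouse/raw/telemetry",
    )
    transform = BashOperator(
        task_id="spark_transform",
        bash_command="spark-submit /opt/jobs/transform_telemetry.py",
    )
    ingest >> transform  # ingestion must finish before the Spark job runs
```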
Key Skills:
- Minimum 5 years of hands-on experience with Hadoop and related components (HDFS, YARN, MapReduce).
- Experience with the Open Data Platform (ODP) stack and enterprise-grade big data platforms.
- Proficient in Python for scripting and data manipulation.
- Hands-on experience with Apache Airflow for workflow orchestration and scheduling.
- Strong expertise in Apache Spark Streaming for real-time data processing (see the streaming sketch after this list).
- Familiarity with additional big data tools such as Hive, HBase, Pig, and Kafka (an added advantage).
- Experience in building scalable ETL pipelines and data ingestion frameworks.
- Ability to analyze telemetry logs and performance data.
- Understanding of data schemas, partitioning strategies, and optimization techniques for large-scale datasets.
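As a minimal sketch of the streaming skill above, the following PySpark Structured Streaming example consumes telemetry events from Kafka and computes windowed event counts. The broker address, topic name, and aggregation are assumptions rather than requirements of the role, and Structured Streaming is used here as the current form of Spark stream processing.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, window

# Requires the spark-sql-kafka connector package on the classpath (e.g. via --packages).
spark = SparkSession.builder.appName("telemetry-stream").getOrCreate()

# Read a stream of telemetry events from a Kafka topic (broker and topic are placeholders).
events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "server-metrics")
    .load()
)

# Kafka delivers key/value as binary; cast the payload to string for downstream parsing.
metrics = events.selectExpr("CAST(value AS STRING) AS raw", "timestamp")

# A simple stream analytic: count events per 1-minute window.
counts = metrics.groupBy(window(col("timestamp"), "1 minute")).count()

# Write the running counts to the console; a real pipeline would target HDFS, Hive, etc.
query = (
    counts.writeStream
    .outputMode("complete")
    .format("console")
    .start()
)
query.awaitTermination()
```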
Salary (Rate): Negotiable
City: undetermined
Country: United Kingdom
Working Arrangements: undetermined
IR35 Status: undetermined
Seniority Level: undetermined
Industry: IT