Summary: The Tech Lead Data Engineer at Andela is a pivotal role focused on leading data engineering efforts within the AI Tribe, emphasizing the development of scalable data systems for machine learning. This hands-on leadership position requires expertise in Cloudera and big data technologies, along with mentoring junior engineers and collaborating with data science teams. The role includes significant travel to Cairo, Egypt, for client requirements, with all travel costs covered by Andela. Candidates should have extensive experience in data engineering and a strong understanding of the machine learning lifecycle.
Key Responsibilities:
- Serve as the primary technical authority and mentor for a team of data engineers.
- Design and oversee the implementation of scalable data pipelines using Informatica PowerCenter and Spark/PySpark.
- Act as a liaison between Data Engineering and Data Science leads to translate modeling requirements into technical solutions.
- Champion data quality and integrity across the AI tribe, implementing automated monitoring and validation checks.
- Troubleshoot complex data-related issues in the big data ecosystem.
Key Skills:
- 8+ years of professional experience in data engineering, including at least 3 years in a senior or lead capacity.
- Deep expertise in Cloudera Data Platform (CDP) and its core components.
- Proficiency in Apache Spark and PySpark, including performance tuning for large-scale data processing.
- Experience with Informatica PowerCenter for ETL workflows.
- Excellent programming skills in Python for data processing and automation.
- Strong communication and mentoring skills.
Industry: IT
Tech Lead Data Engineer (Cloudera, Big Data) at Andela
Location Requirement: Africa, Middle East, Europe preferred
About Andela
Andela exists to connect brilliance and opportunity. Since 2014, we have been dedicated to breaking down global barriers and accelerating the future of work for both technologists and organizations around the world. For technologists, Andela offers competitive long-term career opportunities with leading organizations, access to a global community of professionals, and educational opportunities with leading technology providers. At Andela, we’re deeply passionate about creating long-lasting and transformative growth opportunities for all - and doing it in an E.P.I.C. way! We’re excited to continue building our remote-first team with incredible people like you.

After applying for this role, you will join our Andela Community of brilliant technologists by passing a technical screening and live interview. As a community member, you’ll have access to many exclusive technologist roles. Join Andela today to access this opportunity and more in our global marketplace! Our roles are typically filled at lightning speed, so if you’re considering applying, get your application in quickly!
Andela's Benefits:
- 100% payment in USD.
Important: This role requires travel to Cairo, Egypt, for two weeks every month, based on client requirements. Travel and accommodation costs are covered by Andela.
About the role
We are seeking an exceptional and highly motivated Data Engineering Technical Lead to serve as the cornerstone of our AI Tribe's data capabilities. This is a critical, hands-on leadership role for an individual who is passionate about building robust, scalable data systems for machine learning within a sophisticated on-premise environment. You will be the Subject Matter Expert (SME) for all data engineering activities within the tribe, providing technical guidance, mentorship, and architectural vision. You will work closely with Data Science leads to bridge the gap between data preparation and model development, solving complex challenges in data pipelines, feature engineering, and data quality. Your expertise will be instrumental in elevating the skills of our junior data engineers and establishing best practices that ensure our AI models are built on a foundation of trust, quality, and efficiency.
Key Responsibilities
- Technical Leadership & Mentorship: Serve as the primary technical authority and go-to expert for a team of data engineers working across multiple AI use cases on our on-premise infrastructure. Mentor and guide junior data engineers, fostering their growth through code reviews, paired programming, and knowledge sharing. Establish and enforce best practices for data engineering, including coding standards, design patterns, and development processes specific to our tech stack.
- Architecture & Design: Design, architect, and oversee the implementation of scalable, production-grade data pipelines using Informatica PowerCenter for ETL and Spark/PySpark for large-scale data transformation. Lead the strategy and development of a robust feature engineering framework on our Cloudera platform, enabling data scientists to efficiently create and productionize features for ML models. Take ownership of the data architecture within our on-premise Cloudera Data Platform (CDP), ensuring it is optimized for performance, security, and reliability for all AI/ML workloads.
- Collaboration & Stakeholder Management: Act as the key liaison between Data Engineering and Data Science leads to translate modeling requirements into technical data solutions. Define and implement clear "data contracts" between data producers (pipelines) and data consumers (models), specifying schema, SLAs, quality metrics, and update frequency. Work with Product Owners and MLOps engineers to ensure seamless integration of data pipelines into the end-to-end machine learning lifecycle.
- Data Quality & Governance: Champion a culture of data quality and integrity across the AI tribe. Design and implement automated data quality monitoring and validation checks within Informatica workflows and Spark jobs. Troubleshoot and resolve complex data-related issues in our big data ecosystem, ensuring minimal disruption to AI development and deployment.
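To give a flavor of the automated quality checks described above, here is a minimal, framework-free Python sketch; in practice this logic would live inside a Spark job or Informatica workflow rather than plain Python. All names (`EXPECTED_SCHEMA`, `validate_batch`) and thresholds are illustrative assumptions, not part of the client's stack.

```python
# Illustrative data-quality gate: schema/type and null-rate checks on a batch.
# Field names and the 1% null threshold are hypothetical examples.
EXPECTED_SCHEMA = {"customer_id": int, "event_ts": str, "amount": float}
MAX_NULL_RATE = 0.01

def validate_batch(rows):
    """Return (ok, issues) for a batch of dict records."""
    issues = []
    for field, ftype in EXPECTED_SCHEMA.items():
        nulls = sum(1 for r in rows if r.get(field) is None)
        if rows and nulls / len(rows) > MAX_NULL_RATE:
            issues.append(f"{field}: null rate {nulls/len(rows):.1%} exceeds threshold")
        bad_type = sum(1 for r in rows
                       if r.get(field) is not None and not isinstance(r[field], ftype))
        if bad_type:
            issues.append(f"{field}: {bad_type} value(s) of wrong type")
    return (not issues, issues)

batch = [
    {"customer_id": 1, "event_ts": "2024-05-01T10:00:00", "amount": 19.99},
    {"customer_id": 2, "event_ts": None, "amount": 5.00},
]
ok, issues = validate_batch(batch)
# event_ts has a 50% null rate, so this batch fails the gate
```

The same checks translate directly to PySpark aggregations or Informatica validation transformations; the point is that quality rules are explicit, versioned, and enforced automatically rather than applied ad hoc.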
Essential Qualifications & Experience
- 8+ years of professional experience in data engineering, including at least 3 years in a senior or lead capacity within an on-premise environment.
- Deep, hands-on expertise managing and developing on the Cloudera Data Platform (CDP) Private Cloud Base. This must include strong experience with core components like HDFS, YARN, Hive, Impala, and Oozie.
- Expert, hands-on proficiency with Apache Spark and PySpark, including deep knowledge of performance tuning and optimization for large-scale data processing (TB+) on a Cloudera cluster (CML & CDP).
- Extensive, proven experience developing, deploying, and managing complex ETL workflows using Informatica PowerCenter.
- Excellent programming skills in Python for data processing and automation.
- Previous experience working closely with data science teams and a strong understanding of the machine learning lifecycle, particularly feature engineering and data preparation for modeling.
- Excellent communication, leadership, and mentoring skills, with a proven ability to guide and influence technical teams.
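As a sense of the tuning work the Spark requirement above implies, a typical back-of-the-envelope step is sizing the partition count for a TB-scale job. The figures below are illustrative assumptions, not client numbers.

```python
# Rough partition sizing for a large Spark job (illustrative numbers only).
dataset_bytes = 2 * 1024**4             # assume a 2 TiB input
target_partition_bytes = 128 * 1024**2  # ~128 MiB per partition, a common rule of thumb
num_partitions = dataset_bytes // target_partition_bytes
# -> 16384 partitions as a starting point before measuring skew and shuffle size
```

In practice a lead would refine this starting point against executor memory, data skew, and shuffle behavior observed in the Spark UI.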
Preferred Qualifications
- Experience with streaming technologies like Kafka within the Cloudera ecosystem.
- Experience building or using a Feature Store.
- Proficiency with workflow orchestration tools like Airflow to manage dependencies between Informatica and Spark jobs.
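The orchestration described in the last bullet can be sketched with the standard library alone; here `graphlib` stands in for Airflow's scheduler, and all task names are hypothetical examples of an Informatica-then-Spark dependency chain.

```python
# Stdlib sketch of the task dependencies an Airflow DAG would encode.
# Task names are hypothetical; graphlib stands in for the scheduler.
from graphlib import TopologicalSorter

# task -> set of upstream tasks it depends on
dag = {
    "informatica_extract": set(),
    "spark_transform": {"informatica_extract"},
    "spark_feature_build": {"spark_transform"},
    "data_quality_checks": {"spark_transform"},
    "publish_features": {"spark_feature_build", "data_quality_checks"},
}

order = list(TopologicalSorter(dag).static_order())
# extract runs first, publish runs last, and both downstream
# branches wait on the transform step
```

In Airflow the same shape would be expressed with operators and `>>` dependencies; the value a lead adds is keeping these cross-tool dependencies explicit instead of relying on timing.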
Interview Process with the client:
- 1st round: technical and behavioral interview
- 2nd round: to be confirmed
Contract: 6 months, possibly renewable
Full-time contractor role (8 hours/day)
Work schedule: Sunday to Thursday
Device: Bring your own device
Required skills
- Data Engineering (7 - 9 yrs)
- SQL (7 - 9 yrs)
- Cloudera (7 - 9 yrs)
- Big Data (7 - 9 yrs)
- Python (7 - 9 yrs)
- Apache Spark (7 - 9 yrs)
- Informatica PowerCenter (4 - 6 yrs)
Optional skills
- Machine Learning (4 - 6 yrs)
- Apache Airflow (4 - 6 yrs)
- Kafka (4 - 6 yrs)
Must-Haves: Travel to Egypt required two weeks per month. Cloudera CDP, Spark/PySpark, Informatica PowerCenter (or equivalent ETL tools), Python, on-premise infrastructure experience, data engineering leadership.
Nice-to-have: Kafka, Airflow, MLOps exposure, experience with Machine Learning workflows.
Main job time zone: Spain, CET (Central European Time)
Time zone overlap requirements: 6+ hours with UTC+3
Location Requirements: Africa, Middle East, Europe preferred
Offer Estimated Duration: 6-month contract (renewable)
Full-time dedication (40 hours/week)
At Andela, we know our strengths lie in our diverse community whose talents, perspectives, backgrounds, and orientations we take pride in. Andela is committed to nurturing a work environment where all individuals are treated with respect and dignity. Everyone has the right to work in a professional atmosphere that promotes equal employment opportunities and prohibits discriminatory practices. Andela provides equal employment opportunities to all employees and applicants without regard to factors including but not limited to race, color, religion, gender, sexual orientation, gender identity, national origin, age, disability, pregnancy (including breastfeeding), genetic information, HIV/AIDS or any other medical status, family or parental status, marital status, amnesty or status as a covered veteran in accordance with applicable federal, state and local laws. This commitment applies to all terms and conditions of employment, including but not limited to hiring, placement, promotion, termination, layoff, recall, transfer, leaves of absence, compensation, and training. Our policies expressly prohibit any form of harassment and/or discrimination, as stated above. Andela is home for all. Come as you are.