Negotiable
Outside
Remote
USA
Summary: The Cloudera Data Engineer role involves supporting the migration of a Medicaid Data Warehouse Implementation within an AWS environment. The engineer will ensure a seamless transition of a Cloudera/Hive/Scala-based data pipeline, focusing on operational stability and data integrity. Collaboration with the project team on AWS infrastructure is essential for successful migration and ongoing operations. This position requires a strong background in data engineering and Cloudera platform management.
Key Responsibilities:
- Replicate and configure the existing Cloudera cluster (HDFS, YARN, Hive, Spark) in the new AWS account.
- Coordinate with the project team to ensure proper infrastructure provisioning (EC2, security groups, IAM roles, and networking).
- Reconfigure cluster connectivity and job dependencies for the new environment.
- Migrate and validate metadata stores (Hive Metastore, job configs, dependencies).
- Validate job execution and data outputs for parity with the existing environment.
- Deploy, test, and operate existing Hive and Spark (Scala) jobs post-migration.
- Maintain job schedules, dependencies, and runtime configurations.
- Monitor job performance, identify bottlenecks, and apply tuning or code-level optimizations.
- Troubleshoot failures and implement automated recovery or alerting where applicable.
- Monitor Cloudera Manager dashboards, cluster health, and resource utilization.
- Manage user roles and access within the Cloudera environment.
- Implement periodic data cleanup, archiving, and housekeeping processes.
- Document configurations, migration steps, and operational runbooks.
Key Skills:
- Bachelor's degree in Computer Science, Information Systems, or a related field.
- 7+ years of experience in data engineering or big data development.
- 4+ years of experience with the Cloudera platform (HDFS, YARN, Hive, Spark, Oozie).
- Experience deploying and operating Cloudera workloads on AWS (EC2, S3, IAM, CloudWatch).
- Strong proficiency in Scala, Java, and HiveQL; Python or Bash scripting experience preferred.
- Strong proficiency in Apache Spark and Scala programming for data processing and transformation.
- Hands-on experience with Cloudera distribution of Hadoop.
- Hands-on experience implementing business-rules processing using Drools.
- Able to work with infrastructure, DevOps, and data governance teams in a multi-disciplinary environment.
Salary (Rate): undetermined
City: undetermined
Country: USA
Working Arrangements: remote
IR35 Status: outside IR35
Seniority Level: undetermined
Industry: IT
*************** DIRECT CLIENT REQUIREMENT ***************