Cloudera Data Engineer

Posted 2 days ago by 1762591316

Negotiable

Outside

Remote

USA

Amazon Elastic Compute Cloud, Amazon Web Services (AWS), Apache Hadoop, Apache Hive, Apache Oozie, Apache Spark, Apache Yarn, AWS Identity And Access Management (IAM), Bash (Scripting Language), Big Data, Certified Data Professional (CDP), Cisco Discovery Protocol, Cloudera Manager, Computer Science, Data Engineering, Data Governance, Data Integrity, Data Pipeline, Data Processing, Data Warehousing, DevOps, Drools, Email Archiving, Generic Programming, Information Systems, Java (Programming Language), Metadata, Python (Programming Language), Scala (Programming Language), Scripting, Software Development

Summary: The Cloudera Data Engineer will support the migration of a Medicaid Data Warehouse Implementation in an AWS environment, focusing on the transition of a Cloudera/Hive/Scala-based data pipeline. This role is responsible for ensuring data integrity, job performance, and operational stability post-migration while collaborating with the project team on AWS infrastructure. The engineer will handle cluster migration, data pipeline reconfiguration, and maintain reliable daily operations. The position requires extensive experience in data engineering and proficiency in Cloudera technologies.

Key Responsibilities:

Replicate and configure existing Cloudera cluster (HDFS, YARN, Hive, Spark) in the new AWS account.
Coordinate with project team to ensure proper infrastructure provisioning (EC2, security groups, IAM roles, and networking).
Reconfigure cluster connectivity and job dependencies for the new environment.
Migrate and validate metadata stores (Hive Metastore, job configs, dependencies).
Validate job execution and data outputs for parity with existing environment.
Deploy, test, and operate existing Hive, Spark (Scala) jobs post-migration.
Maintain job schedules, dependencies, and runtime configurations.
Monitor job performance, identify bottlenecks, and apply tuning or code-level optimizations.
Troubleshoot failures and implement automated recovery or alerting where applicable.
Monitor Cloudera Manager dashboards, cluster health, and resource utilization.
Manage user roles and access within Cloudera environment.
Implement periodic data cleanup, archiving, and housekeeping processes.
Document configurations, migration steps, and operational runbooks.

Key Skills:

Bachelor's degree in Computer Science, Information Systems, or a related field.
7+ years of experience in data engineering or big data development.
4+ years of experience with the Cloudera platform (HDFS, YARN, Hive, Spark, Oozie).
Experience deploying and operating Cloudera workloads on AWS (EC2, S3, IAM, CloudWatch).
Strong proficiency in Scala, Java and HiveQL; Python or Bash scripting experience preferred.
Strong proficiency in Apache Spark & Scala programming for data processing and transformation.
Hands-on experience with Cloudera distribution of Hadoop.
Hands-on experience implementing business-rules processing using Drools.
Able to work with infrastructure, DevOps, and data governance teams in a multi-disciplinary environment.

Salary (Rate): undetermined

City: undetermined

Country: USA

Working Arrangements: remote

IR35 Status: outside IR35

Seniority Level: undetermined

Industry: IT

Detailed Description From Employer:

Cloudera Data Engineer
Job Summary
We are seeking a Cloudera Data Engineer to support the migration of a Medicaid Data Warehouse Implementation in an AWS environment. The resource will support the migration and continued operation of an existing Cloudera/Hive/Scala-based data pipeline environment from one AWS account to another.
This position is responsible for ensuring a seamless transition, validating data integrity and job performance, and maintaining reliable daily operations post-migration.
The role will work closely with the existing project team for the underlying AWS infrastructure (VPC, IAM, S3, EC2, networking). The resource will focus on Cloudera cluster migration, data pipeline reconfiguration, and operational stability.

Required Skills and Experience:

Bachelor's degree in Computer Science, Information Systems, or a related field.
7+ years of experience in data engineering or big data development.
4+ years of experience with the Cloudera platform (HDFS, YARN, Hive, Spark, Oozie).
Experience deploying and operating Cloudera workloads on AWS (EC2, S3, IAM, CloudWatch).
Strong proficiency in Scala, Java, and HiveQL; Python or Bash scripting experience preferred.
Strong proficiency in Apache Spark & Scala programming for data processing and transformation.
Hands-on experience with the Cloudera distribution of Hadoop.
Hands-on experience implementing business-rules processing using Drools.
Able to work with infrastructure, DevOps, and data governance teams in a multi-disciplinary environment.

Preferred Qualifications:

Cloudera certification (e.g., CDP Data Engineer or Cloudera Administrator).
Experience with Cloudera version upgrades or AWS-to-AWS environment migrations.
Experience in public-sector or large enterprise data environments.
