Summary: The DataStage Engineer role focuses on building data pipelines on Google Cloud Platform, migrating legacy DataStage workloads to a modern stack centered on SnapLogic, within a healthcare context. Candidates are expected to have hands-on experience developing ETL processes from scratch rather than merely maintaining existing workflows. The position emphasizes proactive communication and problem-solving skills, as well as familiarity with data warehousing and orchestration tools. This is a remote contract position with potential for conversion to hire after 4-6 months.
Key Responsibilities:
- Analyze existing ETL pipelines and jobs, migrating them to a modern stack including SnapLogic, Python, Spark, and Dataflow.
- Develop new data ingestion and ETL pipelines from scratch, primarily using SnapLogic, along with Python, SQL, Dataflow, and Spark.
- Support data modeling efforts without owning them, requiring some experience in data modeling and data warehousing fundamentals.
- Utilize Airflow or Cloud Composer for orchestration and for developing new DAGs (a minimal DAG sketch follows this list).
- Communicate proactively and solve problems, contributing suggestions and asking questions.
- Work with a variety of technologies in the environment, with familiarity in Kafka, Java, Apache Beam, and Alteryx as a plus.
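Since building DAGs from scratch is called out, here is a minimal sketch of what that looks like, assuming Airflow 2.4+ (which Cloud Composer 2 provides). The DAG name, tasks, and data are hypothetical placeholders, not details of this role's actual pipelines.

```python
from datetime import datetime

from airflow.decorators import dag, task


@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def example_ingestion_dag():
    @task
    def extract():
        # Hypothetical source pull; a real task would query a source system.
        return [{"id": 1, "value": "a"}, {"id": 2, "value": "b"}]

    @task
    def transform(records):
        # Hypothetical cleanup step before loading.
        return [{**r, "value": r["value"].upper()} for r in records]

    @task
    def load(records):
        # Hypothetical load; in this stack a real task might write to BigQuery.
        print(f"loaded {len(records)} records")

    # TaskFlow API wiring: extract -> transform -> load.
    load(transform(extract()))


example_ingestion_dag()
```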
Key Skills:
- Experience in DataStage and SnapLogic.
- Proficiency in Python, SQL, Dataflow, and Spark (a Dataflow-style pipeline sketch follows this list).
- Knowledge of data warehousing and Google BigQuery.
- Experience with Airflow or Cloud Composer for orchestration.
- Strong communication and problem-solving skills.
- Familiarity with additional technologies such as Kafka, Java, Apache Beam, and Alteryx is a plus.
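To make the Dataflow and BigQuery expectations concrete, below is a minimal Apache Beam pipeline of the kind that runs on Dataflow. The bucket, table, and schema are hypothetical; a real pipeline would point at the team's own sources and datasets.

```python
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def run():
    # Swap DirectRunner for DataflowRunner (plus project/region/temp_location
    # options) to execute the same pipeline on Google Cloud Dataflow.
    options = PipelineOptions(runner="DirectRunner")
    with beam.Pipeline(options=options) as p:
        (
            p
            | "Read" >> beam.io.ReadFromText("gs://example-bucket/events/*.json")
            | "Parse" >> beam.Map(json.loads)
            | "Write" >> beam.io.WriteToBigQuery(
                "example-project:analytics.events",  # hypothetical table
                schema="id:INTEGER,value:STRING",
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
                create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
            )
        )


if __name__ == "__main__":
    run()
```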
Salary (Rate): Negotiable
City: undetermined
Country: USA
Working Arrangements: remote
IR35 Status: outside IR35
Seniority Level: undetermined
Industry: IT
DataStage Engineer (Google Cloud Platform), Remote, 4-6 month contract-to-hire
DataStage, Google Cloud Platform, and SnapLogic required. Healthcare background preferred. Pain points: candidates who lack the correct focus area (building data pipelines), and candidates who have not built anything from scratch, i.e. they maintain existing data workflows or processes in production rather than build new ones.
- The team needs 2 developers with experience in DataStage; they have legacy ETL pipelines which they are migrating. This role will analyze the existing ETL pipelines and jobs and replace them in the modern stack: SnapLogic, Python, some Spark, some Dataflow (a PySpark sketch follows this list).
- These 2 developers need prior experience in DataStage specifically (upskilling candidates from Informatica, Talend, etc. has not been successful).
- Experience in SnapLogic is strongly preferred; they have found this to be more amenable to upskilling.
- Needs experience with Airflow or Cloud Composer orchestration, including development of new DAGs from scratch.
- Development of data ingestion and ETL pipelines from scratch, using SnapLogic primarily for data pipelines and integrations, but also Python, SQL, Dataflow, and Spark.
- Needs experience in data warehousing and Google BigQuery.
- Not responsible for building out visualizations; another team handles that.
- Will support data modeling but not own it. Should have some experience with data modeling and data warehousing fundamentals.
- Understanding of analytics as a whole: how data moves from source to warehouse, through the semantic or reporting layer and models, and into reporting/BI. The hands-on focus, however, will be on building data pipelines and orchestration.
- Proactive communicators, inquisitive people, and problem-solvers who are unafraid to make suggestions and ask questions. "Order taker" and "heads down" types of engineers will not be a culture fit for the team.
- Other technologies exist in their environment in smaller, more dispersed amounts; experience with any of them would be a "nice to have": e.g. Kafka, Java, Apache Beam, Alteryx.
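As a rough illustration of the migration work described above, here is a minimal PySpark sketch of the kind of job a legacy DataStage transform might be rewritten into. The paths, columns, and business rule are hypothetical placeholders, not details from this engagement.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("legacy_etl_rewrite").getOrCreate()

# Read the raw extract a legacy DataStage stage might have consumed.
claims = spark.read.option("header", True).csv("gs://example-bucket/raw/claims/")

# Re-express a typical DataStage transformer stage: filter, derive, aggregate.
cleaned = (
    claims
    .filter(F.col("status") == "APPROVED")
    .withColumn("amount", F.col("amount").cast("double"))
    .groupBy("member_id")
    .agg(F.sum("amount").alias("total_approved"))
)

# Land the result where a downstream warehouse load (e.g. BigQuery) can pick it up.
cleaned.write.mode("overwrite").parquet("gs://example-bucket/curated/claims_totals/")

spark.stop()
```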