Data Automation Engineer

Posted Today by Protos IT

Negotiable

Undetermined

Remote

Agile Methodology, Amazon DynamoDB, Amazon Web Services (AWS), Anomaly Detection, Apache Flume, Apache Kafka, Apache Solr, Apache Spark, Application Programming Interface (API), Artificial Intelligence, Automation, AWS Elastic MapReduce (EMR), AWS Glue, AWS Identity And Access Management (IAM), AWS Key Management Service (KMS), AWS Lambda, Azure Databricks, Azure DevOps, Bash (Scripting Language), Batch Processing, Cloud Computing, Cloud Services, Cloud Technology, Computer Science, Constituent Relationship Management, Continuous Integration and Continuous Delivery, Customer Relationship Management (CRM) Software, Databricks, Data Encryption, Data Engineering, Data Ingestion, Data Integration, Data Pipeline, Data Quality, Device Tracking Software, DevOps, Encryption, Extract Transform Load (ETL), File Servers, GitHub, Indexing, Jenkins, JIRA, Lifecycle Management, Management, Metadata, Microsoft 365, Microsoft Azure, Microsoft BizTalk Servers, Microsoft SQL Servers, Multi-Cloud, Open Source Development, Operational Reporting, Profiling (Computer Programming), Python (Programming Language), Query Optimisation, RESTful API, Role-Based Access Control (RBAC), Software Coding, SQL (Programming Language), Unstructured Data, Virtual Private Cloud, Workflows

Summary: The GenAI Data Automation Engineer role involves designing and implementing AI-driven automation solutions across AWS and Azure environments. The position requires building intelligent data pipelines and automations that integrate cloud services and Generative AI for analytics and reporting. Candidates should possess a mission-focused mindset and critical thinking skills to solve technical challenges. Applicants with an active Public Trust clearance are encouraged to apply.

Key Responsibilities:

Design and maintain data pipelines in AWS using S3, RDS/SQL Server, Glue, Lambda, EMR, DynamoDB, and Step Functions.
Develop ETL/ELT processes to move data from multiple data systems including DynamoDB and SQL Server (AWS) and between AWS and Azure SQL systems.
Integrate AWS Connect CRM data into the enterprise data pipeline for analytics and operational reporting.
Engineer and enhance ingestion pipelines with Apache Spark, Flume, and Kafka for real-time and batch processing into Apache Solr and AWS OpenSearch platforms.
Create automated processes for vector generation and embeddings from unstructured data using Generative AI services and Frameworks (AWS Bedrock, Amazon Q, Azure OpenAI, Hugging Face, LangChain).
Automate data quality checks, metadata tagging, and lineage tracking.
Enhance ingestion/ETL with LLM-assisted transformation and anomaly detection.
Build conversational BI interfaces that allow natural language access to Solr and SQL data.
Develop AI-powered copilots for pipeline monitoring and automated troubleshooting.
Implement SQL Server stored procedures, indexing, query optimization, profiling, and execution plan tuning to maximize performance.
Apply CI/CD best practices using GitHub, Jenkins, or Azure DevOps for both data pipelines and GenAI model integration.
Ensure security and compliance through IAM, KMS encryption, VPC isolation, RBAC, and firewalls.
Support Agile DevOps processes with sprint-based delivery of pipeline and AI-enabled features.

Key Skills:

BS in Computer Science or a related field with 2+ years of data engineering and automation experience.
Hands-on experience with SQL, SSIS, Python, Spark, Bash, PowerShell, and the AWS/Azure CLIs.
Experience with AWS services like S3, RDS/SQL Server, Glue, Lambda, EMR, DynamoDB.
Familiarity with Apache Flume, Kafka, Solr for large-scale data ingestion and search.
Familiarity with LLM and GenAI frameworks using AWS Bedrock, Azure OpenAI, or open-source platforms and tools.
Experience with integrating REST API calls in data pipelines and workflows.
Familiarity with JIRA, GitHub / Azure DevOps / Jenkins for SDLC and CI/CD automation.
Strong troubleshooting and performance optimization skills in SQL, Spark or other data engineering solutions.
Experience operationalizing Generative AI (GenAI Ops) pipelines, including model deployment, monitoring, retraining, and lifecycle management for LLMs and AI-enabled data workflows.
Good communication and presentation skills.
Ability to obtain Federal government Public Trust clearance.

Salary (Rate): £46.00 hourly

City: undetermined

Country: undetermined

Working Arrangements: remote

IR35 Status: undetermined

Seniority Level: undetermined

Industry: IT

Detailed Description From Employer:

Job Title: GenAI Data Automation Engineer

Location: Remote

Duration: 6+ Months Contract

Candidates with an active Public Trust clearance are encouraged to apply for this position.

Our client is seeking a Data Automation Engineer to design and implement innovative, AI-driven automation solutions across AWS and Azure hybrid environments. You will be responsible for building intelligent, scalable data pipelines and automations that integrate cloud services, enterprise tools, and Generative AI to support mission-critical analytics, reporting, and customer engagement platforms. The ideal candidate is mission-focused and delivery-oriented, and applies critical thinking to create innovative functions and solve technical issues.

In this role, you will:

Design and maintain data pipelines in AWS using S3, RDS/SQL Server, Glue, Lambda, EMR, DynamoDB, and Step Functions.
Develop ETL/ELT processes to move data from multiple data systems, including DynamoDB and SQL Server (AWS), and between AWS and Azure SQL systems.
Integrate AWS Connect CRM data into the enterprise data pipeline for analytics and operational reporting.
Engineer and enhance ingestion pipelines with Apache Spark, Flume, and Kafka for real-time and batch processing into Apache Solr and AWS OpenSearch platforms.
Leverage Generative AI services and Frameworks (AWS Bedrock, Amazon Q, Azure OpenAI, Hugging Face, LangChain) to:

o Create automated processes for vector generation and embeddings from unstructured data.

o Automate data quality checks, metadata tagging, and lineage tracking.

o Enhance ingestion/ETL with LLM-assisted transformation and anomaly detection.

o Build conversational BI interfaces that allow natural language access to Solr and SQL data.

Develop AI-powered copilots for pipeline monitoring and automated troubleshooting.
Implement SQL Server stored procedures, indexing, query optimization, profiling, and execution plan tuning to maximize performance.
Apply CI/CD best practices using GitHub, Jenkins, or Azure DevOps for both data pipelines and GenAI model integration.
Ensure security and compliance through IAM, KMS encryption, VPC isolation, RBAC, and firewalls.
Support Agile DevOps processes with sprint-based delivery of pipeline and AI-enabled features.

Required Qualifications:

BS in Computer Science or a related field with 2+ years of data engineering and automation experience.
Hands-on experience with SQL, SSIS, Python, Spark, Bash, PowerShell, and the AWS/Azure CLIs.
Experience with AWS services like S3, RDS/SQL Server, Glue, Lambda, EMR, DynamoDB.
Familiarity with Apache Flume, Kafka, Solr for large-scale data ingestion and search.
Familiarity with LLM and GenAI frameworks using AWS Bedrock, Azure OpenAI, or open-source platforms and tools.
Experience with integrating REST API calls in data pipelines and workflows.
Familiarity with JIRA, GitHub / Azure DevOps / Jenkins for SDLC and CI/CD automation.
Strong troubleshooting and performance optimization skills in SQL, Spark or other data engineering solutions.
Experience operationalizing Generative AI (GenAI Ops) pipelines, including model deployment, monitoring, retraining, and lifecycle management for LLMs and AI-enabled data workflows.
Good communication and presentation skills.
Ability to obtain Federal government Public Trust clearance.

Preferred (plus):

Certifications: AWS Data Engineer, AWS AI/ML Specialty, Azure AI Engineer, Databricks Certified Data Engineer.
Experience implementing RAG pipelines, embeddings, and vector search with Solr, OpenSearch, FAISS, Pinecone, or pgvector/SQL Server vector types.
Experience with GenAI-powered coding tools such as Claude Code, OpenAI Codex, VS Code.
Experience with multi-cloud data integration (AWS and Azure SQL).
Familiarity with Microsoft BizTalk and SSIS for SQL Server ETL workflows.
Knowledge of data lineage/governance tools (Purview, Unity Catalog, AWS Glue Catalog).
Familiarity with Infrastructure-as-Code (Terraform/CloudFormation, Bicep) for automated deployments.
Experience with compliance frameworks (FedRAMP, PCI-DSS, HIPAA).
