Senior SDET / AI LLM || Remote || W2 Contract

Posted Today by Integrass

Summary: This Senior Software Development Engineer in Test (SDET) role centers on test automation, backend systems testing, and AI/LLM validation. It is a hands-on position: testing LLM-powered applications across the enterprise, building LLM-driven evaluation workflows, and defining quality standards for generative AI systems. The engineer works closely with engineering teams to improve testability and reliability while advocating best practices in AI quality engineering. The role is fully remote and requires strong Python skills and hands-on experience with ML or LLM systems.

Salary (Rate): £34.50 hourly

City: Undetermined

Country: Undetermined

Working Arrangements: Remote

IR35 Status: Undetermined

Seniority Level: Undetermined

Industry: IT

Detailed Description From Employer:

We are seeking a Senior Software Development Engineer in Test (SDET) with a strong background in test automation, backend systems testing, and AI/LLM validation.

This is a hands-on, highly influential role responsible for:

  • Testing LLM-powered applications used across the enterprise

  • Building LLM-driven testing and evaluation workflows

  • Defining organization-wide standards for GenAI quality, reliability, and release readiness


Key Responsibilities

LLM Testing & Evaluation

  • Design and implement test strategies for LLM-powered systems, including:

    • Prompt and response validation

    • Regression testing across model, prompt, and data changes

    • Evaluation of accuracy, consistency, hallucinations, bias, and safety

  • Build and maintain LLM-based evaluation frameworks using tools such as DeepEval, MLflow, LangChain, and Langflow

  • Develop synthetic and real-world test datasets in collaboration with the Data Engineer

  • Define quality thresholds, scoring mechanisms, benchmarks, and pass/fail criteria for GenAI systems (a minimal sketch follows this list)
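
To make the thresholds and pass/fail idea concrete, here is a minimal sketch using DeepEval's test-case API. The application under test (my_llm_app), the prompt, the context, and the thresholds are all hypothetical, and DeepEval's judge-based metrics call out to an LLM provider at run time (an API key is required).

```python
# Minimal DeepEval sketch: score one prompt/response pair and fail the
# test if either metric misses its threshold. The app under test,
# prompt, context, and thresholds are all illustrative.
from deepeval import assert_test
from deepeval.test_case import LLMTestCase
from deepeval.metrics import AnswerRelevancyMetric, HallucinationMetric

def my_llm_app(prompt: str) -> str:
    # Stand-in for the real LLM-powered application under test.
    return "Go to Settings > Security and choose 'Reset password'."

def test_password_reset_answer():
    prompt = "How do I reset my password?"
    test_case = LLMTestCase(
        input=prompt,
        actual_output=my_llm_app(prompt),
        context=["Passwords are reset from Settings > Security."],
    )
    assert_test(test_case, [
        AnswerRelevancyMetric(threshold=0.7),
        HallucinationMetric(threshold=0.5),
    ])
```

In practice, evaluations like this run over batches of test cases with aggregate scoring, since individual judge calls are themselves non-deterministic.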


Test Automation & Framework Development

  • Build and maintain automated test frameworks for:

    • LLM APIs and services

    • Agentic workflows and RAG pipelines

    • Data ingestion and inference pipelines

  • Integrate LLM testing and evaluation into CI/CD pipelines, enforcing quality gates prior to production release (see the sketch after this list)

  • Partner with engineering teams to improve testability, reliability, and observability of AI systems

  • Perform root-cause analysis for failures related to model behavior, data quality, or orchestration logic
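
To show how such a gate can work in practice, here is a pytest sketch of a contract check against a hypothetical internal endpoint; the URL, payload shape, and latency budget are assumptions, not a real service contract. Because it asserts structure and budgets rather than exact wording, the check stays stable even though generated text is non-deterministic, and a failure in CI blocks the release.

```python
# Hypothetical CI quality gate for an LLM service endpoint: schema and
# latency checks that tolerate non-deterministic output text. A failure
# under pytest fails the pipeline and blocks the release.
import time

import pytest
import requests

LLM_API = "https://llm.internal.example/v1/generate"  # placeholder endpoint

@pytest.mark.parametrize("prompt", [
    "Summarize this ticket in one sentence.",
    "Classify the sentiment of this review.",
])
def test_generate_contract(prompt):
    start = time.monotonic()
    resp = requests.post(LLM_API, json={"prompt": prompt}, timeout=30)
    elapsed = time.monotonic() - start

    assert resp.status_code == 200
    body = resp.json()
    # Assert structure, not exact wording: output text varies run to run.
    assert isinstance(body.get("text"), str) and body["text"].strip()
    assert elapsed < 5.0, f"latency budget exceeded: {elapsed:.2f}s"
```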


Observability & Monitoring

  • Instrument LLM applications using Datadog LLM Observability (sketched after this list) to track:

    • Latency, token usage, errors, and cost

    • Quality regressions, drift, and performance anomalies

  • Build dashboards and alerting focused on LLM quality and reliability

  • Use production telemetry to continuously refine test coverage and evaluation strategies
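
As a rough illustration of that instrumentation, a sketch using Datadog's ddtrace LLM Observability SDK; the app name, model, token counts, and call_model stub are placeholders, and the exact SDK surface should be confirmed against current Datadog documentation.

```python
# Sketch: trace an LLM call with Datadog LLM Observability (ddtrace).
# DD_API_KEY / DD_SITE are read from the environment; names are placeholders.
from ddtrace.llmobs import LLMObs
from ddtrace.llmobs.decorators import llm

LLMObs.enable(ml_app="genai-quality")

def call_model(prompt: str) -> str:
    # Stand-in for the real model client.
    return "One-sentence summary of the input."

@llm(model_name="gpt-4o", model_provider="openai", name="summarize")
def summarize(prompt: str) -> str:
    output = call_model(prompt)
    # Attach I/O and token counts so latency, errors, usage, and cost
    # roll up in LLM Observability dashboards and monitors.
    LLMObs.annotate(
        input_data=prompt,
        output_data=output,
        metrics={"input_tokens": 120, "output_tokens": 48},  # illustrative counts
    )
    return output
```

Dashboards and monitors built on these spans then feed the telemetry-driven refinement of test coverage described above.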


Shared Services & Collaboration

  • Act as a consultative partner to product, platform, and data teams adopting LLM technologies

  • Provide guidance on:

    • Generative AI test strategies

    • Prompt engineering and workflow validation

    • Release readiness and AI risk assessment

  • Contribute to organization-wide standards and best practices for testing, explainability, and monitoring of AI systems

  • Participate in architecture and design reviews from a quality-first perspective


Engineering Excellence

  • Advocate for automation-first testing, infrastructure as code, and continuous monitoring

  • Drive adoption of Agile, DevOps, and CI/CD best practices within AI quality engineering

  • Conduct code reviews and promote secure, maintainable, and scalable test frameworks

  • Continuously improve internal tooling and frameworks within the QA Center of Excellence


Required Skills & Experience

  • Strong Python development skills

  • Experience testing backend systems, APIs, microservices, or distributed platforms

  • Proven experience building and maintaining automation frameworks

  • Ability to work effectively with ambiguous, non-deterministic systems


AI / LLM Experience

  • Hands-on experience testing or validating ML- or LLM-based systems

  • Familiarity with LLM orchestration and evaluation tools, including:

    • LangChain, Langflow

    • DeepEval, MLflow

  • Strong understanding of challenges unique to testing generative AI systems


Nice to Have

  • Experience with Datadog, especially LLM Observability

  • Exposure to Hugging Face, PyTorch, or TensorFlow (usage-level)

  • Experience testing RAG pipelines, Vector Databases, or data-driven platforms

  • Background working in platform teams, shared services, or QA Centers of Excellence

  • Experience collaborating closely with Data Engineering or ML Platform teams