Negotiable
London Area, United Kingdom
Summary: The Lead AI Red Teaming & QA Engineer will design and implement automated adversarial testing for enterprise Agentic AI platforms, focusing on building continuous safety pipelines. This role requires a shift from traditional software QA to ensure the security and compliance of non-deterministic LLM agents and related systems before production release. The engineer will also develop metrics for AI risks and produce regulatory compliance evidence. The position demands a strong technical background in regulated finance and AI security fundamentals.
Key Responsibilities:
- Build and integrate automated red teaming suites into CI/CD pipelines using frameworks like Garak, PyRIT, and AgentDojo.
- Develop metrics and continuous testing for core AI risks, including hallucinations, memorisation, algorithmic bias, uncertainty, and model drift.
- Map threat models (OWASP LLM Top 10, Agentic threats) to automated test cases and produce technical testing evidence required by EU AI Act Article 15, DORA, and FCA Operational Resilience guidelines.
- Own the enterprise AI Bill of Materials (AI-BOM), tracking model lineages, dataset versions, and signed artifacts as a centralised evaluation service.
Key Skills:
- Proven experience testing software within FCA, DORA, or EU AI Act frameworks.
- Hands-on experience configuring, testing, and bypassing AWS Bedrock Guardrails, Agents, and Knowledge Bases (RAG).
- Solid understanding of Foundation Models, tool use (function calling), OWASP LLM Top 10, and NIST AI RMF.
- Strong Python development skills and experience with AI eval tools (Garak, PyRIT, Ragas).
- Experience building complex CI/CD test pipelines.
Salary (Rate): undetermined
City: London Area
Country: United Kingdom
Working Arrangements: undetermined
IR35 Status: undetermined
Seniority Level: undetermined
Industry: IT
We are seeking a Lead AI Red Teaming & QA Engineer to design and execute automated adversarial testing for our enterprise Agentic AI platforms. You will move beyond traditional software QA to build continuous safety pipelines, ensuring our non-deterministic LLM agents, RAG systems, and tool integrations are secure, resilient, and compliant before production release.
Key Responsibilities
- Automated Adversarial Testing: Build and integrate automated red teaming suites into CI/CD pipelines using frameworks like Garak, PyRIT, and AgentDojo to enforce strict safety release gates.
- AI Evaluation Frameworks: Develop metrics and continuous testing for core AI risks, including hallucinations, memorisation, algorithmic bias, uncertainty, and model drift.
- Regulatory Compliance Evidence: Map threat models (OWASP LLM Top 10, Agentic threats) to automated test cases. Produce the technical testing evidence required by EU AI Act Article 15, DORA, and FCA Operational Resilience guidelines.
- Centralised AI-BOM Platform: Own the enterprise AI Bill of Materials (AI-BOM), tracking model lineages, dataset versions, and signed artifacts as a centralised evaluation service.
Required Technical Skills
- Regulated Finance: Proven experience testing software within FCA, DORA, or EU AI Act frameworks.
- AWS Bedrock Ecosystem: Hands-on experience configuring, testing, and bypassing Bedrock Guardrails, Agents, and Knowledge Bases (RAG).
- AI Security & Fundamentals: Solid understanding of Foundation Models, tool use (function calling), OWASP LLM Top 10, and NIST AI RMF.
- Automation Stack: Strong Python development skills, experience with AI eval tools (Garak, PyRIT, Ragas), and building complex CI/CD test pipelines.