AI Evaluation Engineer (Remote) (Manchester)

Posted Today by Outlier AI

Negotiable
Undetermined
Remote
Stretford, England, United Kingdom

Summary: The AI Evaluation Engineer role at Outlier involves enhancing AI agents through human feedback, focusing on training Large Language Models for complex architectural workflows. Candidates should have a background in backend engineering or AI automation, with a strong emphasis on building production-grade software. The position offers remote working options and seeks individuals passionate about shaping the future of autonomous agents. Ideal candidates will possess technical expertise and attention to detail in system interactions.

Key Responsibilities:

  • Collaborate with AI organizations to train Large Language Models (LLMs).
  • Build and maintain production-grade software with modular separation.
  • Provide technical feedback on complex system behaviors.
  • Integrate agents with live tools and APIs to solve real-world problems.
  • Implement persistent state and session discovery for tracking agent progress.
  • Identify and address subtle failures in system interactions.

Key Skills:

  • 2+ years of experience in backend engineering, AI automation, or complex systems integration.
  • Proficiency in at least two major programming languages (e.g., Python, JavaScript, Go, Java).
  • Experience with SQL databases and building for live environments.
  • Outstanding attention to detail and ability to provide clear technical feedback.
  • Expertise in multi-stage coordination tasks and integrating agents with live tools.
  • Experience identifying privacy leaks and authority escalation issues.

Salary (Rate): undetermined

City: Stretford

Country: United Kingdom

Working Arrangements: remote

IR35 Status: undetermined

Seniority Level: undetermined

Industry: IT

Detailed Description From Employer:

About The Project: Outlier helps the world's most innovative companies improve their AI agents by providing human feedback. Do you want to shape the future of autonomous agents like OpenClaw? We collaborate with leading AI organizations to train Large Language Models (LLMs) to function as proactive, multi-step agents. Our projects focus on teaching these systems how to design, coordinate, and optimize complex, real-world architectural workflows. Whether you are a passionate orchestration guru or an experienced software developer, we want you to help us train the world's most advanced generative systems.

Ideal Qualifications:

  • 2+ years of experience in backend engineering, AI automation, or complex systems integration.
  • Proven ability to build and maintain production-grade software with modular separation (e.g., distinct services for data parsing, logic processing, and reporting).
  • Strong command of at least two major languages (e.g., Python, JavaScript, Go, or Java) and experience working with SQL databases.
  • Practical experience building for live, non-mocked environments and handling multi-turn system interactions.
  • Outstanding attention to detail and the ability to provide clear, high-density technical feedback on complex system behaviors.

Nice To Have:

  • Expertise building multi-stage coordination tasks where data acquisition leads to reasoned output.
  • Hands-on experience integrating agents with live tools such as Supabase, Gmail, and various APIs to solve real-world problems.
  • High level of comfort implementing persistent state and session discovery to track agent progress.
  • Experience identifying subtle failures like privacy leaks, authority escalation, or indirect prompt injections.

Remote working/work at home options are available for this role.