Summary: The Software Engineer – AI Model Evaluator role applies hands-on engineering expertise to assess and improve how advanced AI systems generate and reason about code. This fully remote contract position offers flexible work on complex software engineering tasks: identifying bugs and reliability issues in AI-generated code, providing detailed written feedback, and working across multiple programming languages to ensure the robustness of AI tools. The position suits engineers who enjoy tackling challenging problems and want to make a tangible impact in the AI field.
Salary (Rate): £38.46 hourly
City: London
Country: United Kingdom
Working Arrangements: remote
IR35 Status: undetermined
Seniority Level: undetermined
Industry: IT
About The Role
What if your years of engineering experience could directly influence how the world's most advanced AI systems write and reason about code? We're looking for experienced software engineers to evaluate frontier AI models — hunting down bugs, exposing failure modes, and helping ensure that AI-generated code actually holds up under real-world scrutiny. This is a fully remote, flexible contract role built for engineers who love digging into hard problems. You set your own schedule, work across cutting-edge projects, and make a tangible impact on the AI tools that millions of developers will rely on.
Organization: Alignerr
Type: Hourly Contract
Location: Remote
Commitment: 10–40 hours/week
What You'll Do
- Evaluate the performance of frontier language models on complex, real-world software engineering tasks
- Identify bugs, logical errors, hallucinations, and reliability issues in AI-generated code and reasoning
- Design and review prompts, test cases, and evaluation scenarios that stress-test advanced coding workflows
- Provide precise, well-reasoned written feedback explaining model strengths, weaknesses, and edge cases
- Work across multiple programming languages and codebases to assess generalization, correctness, and robustness
- Think critically about model behavior — not just whether code runs, but whether it's right
Who You Are
- 3+ years of professional software engineering experience
- Strong proficiency in at least one of: TypeScript, Ruby, Java, or C++
- Sharp debugger — you spot non-obvious issues and can articulate exactly why something is broken
- Excellent written and spoken English; you communicate technical findings clearly and precisely
- Comfortable reasoning about complex systems, edge cases, and unexpected failure modes
- Familiarity with modern development tooling — Git, CLI workflows, testing frameworks, and similar
- You critically evaluate outputs rather than taking them at face value
Nice to Have
- Experience across multiple programming languages or paradigms
- Background in QA, code review, or software reliability engineering
- Familiarity with AI or LLM tools and how they generate code
- Interest in AI safety, alignment, or model evaluation research
Why Join Us
- Work on cutting-edge AI projects alongside leading research labs
- Fully remote and flexible — work when and where it suits you
- Freelance autonomy with the structure of meaningful, high-impact technical work
- Make a direct, tangible impact on how AI writes, reasons about, and understands code
- Potential for ongoing work and contract extension as new projects launch