Summary: The Site Reliability Engineer (SRE) at Mercor is responsible for building and automating systems to keep the platform reliable, scalable, and fast. The role involves mentoring engineers, leading incident response, and improving the load-testing, disaster-recovery, and chaos-engineering programs. Candidates should have a background in SRE, proficiency in Terraform, Python, and Go, and experience working with AWS. The position is full-time and based in San Francisco, with remote work flexibility.
Key Responsibilities:
- Mentor engineers on best practices for observability, alert management, and instrumentation.
- Lead incident response from triage through post-mortem and remediation.
- Own and improve load-testing, disaster-recovery, and chaos-engineering programs.
- Automate reliability checks, capacity planning, and service-level monitoring.
- Partner with product and platform teams to design for reliability and scalability from the start.
Key Skills:
- Background in SRE
- Proficiency in Terraform, Python, Go
- Experience working with AWS
- Experience with RDBMS (MySQL)
- Experience with document storage systems (MongoDB)
- Experience with caching systems (Redis)
- Exposure to data warehousing (Snowflake)
- Previous work in a high-growth startup environment
Salary (Rate): £300,000.00/hr
City: San Francisco
Country: United States
Working Arrangements: remote
IR35 Status: undetermined
Seniority Level: undetermined
Industry: IT
Company Introduction
Mercor connects elite creative and technical talent with leading AI research labs. We are headquartered in San Francisco, and our investors include Benchmark, General Catalyst, Peter Thiel, Adam D'Angelo, Larry Summers, and Jack Dorsey.
Role Overview
Position: Site Reliability Engineer (SRE) – Full-Time, San Francisco
Commitment: 40 hours per week
As an SRE at Mercor, you’ll build and automate systems to keep our platform reliable, scalable, and fast. You will work across every layer of the stack to drive measurable reliability improvements.
Responsibilities
- Mentor engineers on best practices for observability, alert management, and instrumentation.
- Lead incident response from triage through post-mortem and remediation.
- Own and improve load-testing, disaster-recovery, and chaos-engineering programs.
- Automate reliability checks, capacity planning, and service-level monitoring.
- Partner with product and platform teams to design for reliability and scalability from the start.
Requirements / Qualifications
Must-Have Qualifications
- Background in SRE
- Proficiency in Terraform, Python, Go
- Experience working with AWS
Preferred Qualifications
- Experience with RDBMS (MySQL)
- Experience with document storage systems (MongoDB)
- Experience with caching systems (Redis)
- Exposure to data warehousing (Snowflake)
- Previous work in a high-growth startup environment
Engagement Details
- Full-Time position
- Location: San Francisco
- Remote work flexibility
- Competitive compensation
Application Process (Takes 20-30 mins to complete)
- Upload resume
- AI interview based on your resume
- Submit form
Resources & Support
For details about the interview process and platform information, please check: https://talent.docs.mercor.com/welcome/welcome
For any help or support, reach out to: support@mercor.com
PS: Our team reviews applications daily. Please complete your AI interview and application steps to be considered for this opportunity.