Principal SRE - Site Reliability

Posted 4 days ago by CBSbutler

£500-£520 per day
Inside IR35
Hybrid
Wokingham

Summary: The Principal Site Reliability Engineer (Platform / DevOps) leads initiatives that improve the scalability, reliability, and performance of distributed systems. The position requires a strong DevOps background and the ability to design and maintain resilient cloud-based infrastructure. The role is hybrid (60% remote / 40% onsite) and requires active SC clearance. The contract runs for 6 months or more at £500-£520 per day.

Key Responsibilities:

  • Lead platform-first engineering initiatives to enhance scalability and reliability
  • Design, build, and maintain resilient infrastructure for distributed systems
  • Implement monitoring and alerting solutions to ensure high availability
  • Collaborate with engineering teams to improve system reliability and mitigate risks
  • Develop and maintain CI/CD pipelines to support efficient deployments
  • Recommend ongoing improvements to platform architecture and processes
  • Ensure compliance with security, governance, and regulatory standards

Key Skills:

  • Strong background in software engineering for large-scale distributed systems
  • Proficiency in Golang, Java, or Python
  • Hands-on experience with AWS, Azure, or GCP
  • Deep knowledge of Kubernetes and container orchestration
  • Proven experience with CI/CD and infrastructure automation
  • Excellent troubleshooting and communication skills

Salary (Rate): £520 per day

City: Wokingham

Country: United Kingdom

Working Arrangements: Hybrid

IR35 Status: Inside IR35

Seniority Level: Senior

Industry: IT

Role: Principal Site Reliability Engineer (Platform / DevOps)


Location: Wokingham (Reading) - Hybrid (60% remote / 40% onsite)
Duration: 6 Months+
Rate: £500-£520 per day
Clearance: Active SC Clearance required (mandatory)

Overview
We are seeking an experienced Principal SRE / Platform Engineer to lead platform-first initiatives focused on scalability, reliability, and performance across distributed systems. This role requires strong DevOps expertise and the ability to design and maintain resilient cloud-based infrastructure.
