Site Reliability Engineer

Posted Today by Haystack

£62 per hour
Inside IR35
Remote
London, England, United Kingdom

Summary: The Site Reliability Engineer (SRE) role involves architecting and maintaining high-performance observability platforms for a global technology company. The position requires expertise in distributed systems and automation tools to ensure seamless performance across millions of connected devices. The role offers 100% remote working flexibility and an initial 12-month contract with potential for extension. Candidates should have extensive experience in Site Reliability Engineering or DevOps within enterprise-scale cloud environments.

Key Responsibilities:

  • Design, deploy, and scale high-performance observability platforms and Prometheus monitoring systems.
  • Architect and maintain massive Elasticsearch clusters and robust data pipelines leveraging Kafka.
  • Drive "Infrastructure as Code" (IaC) initiatives by automating complex cloud environments using Terraform and Ansible.
  • Build custom internal tools and sophisticated automation scripts using Python, Go, or Ruby.
  • Optimize Linux systems (Debian/Ubuntu) and participate in a collaborative on-call rotation.

Key Skills:

  • 5+ years of experience in Site Reliability Engineering (SRE) or DevOps.
  • Mastery of the Observability stack, specifically Prometheus, Grafana, and the full ELK Stack.
  • Expert-level Linux systems administration skills.
  • Deep knowledge of distributed systems architecture and Kafka messaging.
  • Hands-on proficiency with automation and configuration tools, including Terraform and Ansible.
  • Programming skills in Python or Golang.
  • Ability to thrive in a fast-paced environment.

Salary (Rate): £55 - £62 per hour

City: London

Country: United Kingdom

Working Arrangements: Remote

IR35 Status: Inside IR35

Seniority Level: Senior

Industry: IT

Detailed Description From Employer:

SRE - Site Reliability Engineer | £55 - £62 per hour

We're working with a global technology powerhouse that supports millions of connected devices. Step into a high-impact Senior SRE role where you will be the architect of reliability for a massive distributed systems landscape. You will take the lead on scaling mission-critical observability and monitoring platforms, using a stack that includes Prometheus, Kafka, and the ELK Stack, to ensure seamless performance for a global user base.

The Role

  • Design, deploy, and scale high-performance observability platforms and Prometheus monitoring systems to support millions of global devices.
  • Architect and maintain massive Elasticsearch clusters and robust data pipelines leveraging Kafka for real-time streaming.
  • Drive "Infrastructure as Code" (IaC) initiatives by automating complex cloud environments using Terraform and Ansible.
  • Build custom internal tools and sophisticated automation scripts using Python, Go, or Ruby to eliminate toil and boost system performance.
  • Optimize Linux systems (Debian/Ubuntu) and participate in a collaborative on-call rotation to maintain 24/7 service availability.

What You'll Need

  • 5+ years of battle-tested experience in Site Reliability Engineering (SRE) or DevOps within enterprise-scale cloud environments.
  • Mastery of the Observability stack, specifically Prometheus, Grafana, and the full ELK Stack (Elasticsearch, Logstash, Kibana).
  • Expert-level Linux systems administration skills and deep knowledge of distributed systems architecture and Kafka messaging.
  • Hands-on proficiency with automation and configuration tools, including Terraform and Ansible, plus programming skills in Python or Golang.
  • The ability to thrive in a fast-paced environment, tackling complex scaling challenges for high-traffic cloud services.

What's On Offer

  • Competitive rate of £55 - £62 per hour (Inside IR35).
  • Long-term stability with an initial 12-month contract and high potential for extension.
  • 100% remote working flexibility while supporting a premier London-based technology hub.
  • Opportunity to work on a truly global scale, impacting the experience of millions of daily active users.

Apply via Haystack today!