£62 Per hour
Inside
Remote
London, London
Summary: The Senior Site Reliability Engineer (Observability) role is focused on enhancing the performance and reliability of large-scale cloud infrastructure through the development and management of observability platforms. The position requires expertise in monitoring, logging, and alerting systems to support millions of devices globally. The role is contract-based for an initial period of 12 months and is classified as inside IR35. The position allows for remote work from London, UK.
Key Responsibilities:
- Design, deploy and scale observability platforms
- Manage and scale Prometheus monitoring systems
- Deploy and maintain large Elasticsearch clusters
- Build and maintain data pipelines using Kafka
- Develop alerting and monitoring frameworks
- Automate infrastructure using Terraform and Ansible
- Develop tools and scripts using Python, Go, Ruby or Bash
- Work with Linux systems (Debian/Ubuntu)
- Participate in on-call rotation
- Improve system reliability, performance and scalability
Key Skills:
- 5+ years' experience in Site Reliability Engineering / DevOps
- Strong Linux systems experience
- Experience with observability and monitoring tools
- Prometheus, Grafana, ELK Stack (Elasticsearch, Logstash, Kibana)
- Kafka
- Terraform / Infrastructure as Code
- Ansible / Configuration Management
- Programming experience (Python, Go, Ruby or Bash)
- Distributed systems and cloud infrastructure experience
Salary (Rate): £62.00 per hour
City: London
Country: UK
Working Arrangements: remote
IR35 Status: inside IR35
Seniority Level: Senior
Industry: IT
Senior Site Reliability Engineer (Observability)
Location: London/UK (Remote)
Contract: 12 Months Initial
Rate: £55 to £62 per hour, inside IR35
Job Overview
We are looking for a Senior Site Reliability Engineer with strong experience in observability, monitoring and distributed systems to support large-scale cloud infrastructure serving millions of devices globally. The role focuses on building and scaling monitoring, logging and alerting platforms to ensure the high availability and performance of cloud services.
This is an urgent vacancy, and the hiring manager is shortlisting for interviews immediately. Please apply with a copy of your CV, or send it to khushboo.pandey@randstad.co.uk.
Randstad Technologies is acting as an Employment Business in relation to this vacancy.