SRE - Site Reliability Engineer

Posted 3 days ago by 1774514576

£62 Per hour
Inside
Remote
London, London

Summary: The Senior Site Reliability Engineer (Observability) role is focused on enhancing the performance and reliability of large-scale cloud infrastructure through the development and management of observability platforms. The position requires expertise in monitoring, logging, and alerting systems to support millions of devices globally. The role is contract-based for an initial period of 12 months and is classified as inside IR35. The position allows for remote work from London, UK.

Key Responsibilities:

  • Design, deploy and scale observability platforms
  • Manage and scale Prometheus monitoring systems
  • Deploy and maintain large Elasticsearch clusters
  • Build and maintain data pipelines using Kafka
  • Develop alerting and monitoring frameworks
  • Automate infrastructure using Terraform and Ansible
  • Develop tools and scripts using Python, Go, Ruby or Bash
  • Work with Linux systems (Debian/Ubuntu)
  • Participate in on-call rotation
  • Improve system reliability, performance and scalability

Key Skills:

  • 5+ years' experience in Site Reliability Engineering / DevOps
  • Strong Linux systems experience
  • Observability and Monitoring tools experience
  • Prometheus, Grafana, ELK Stack (Elasticsearch, Logstash, Kibana)
  • Kafka
  • Terraform / Infrastructure as Code
  • Ansible / Configuration Management
  • Programming experience (Python, Go, Ruby or Bash)
  • Distributed systems and cloud infrastructure experience

Salary (Rate): £62.00 per hour

City: London

Country: UK

Working Arrangements: remote

IR35 Status: inside IR35

Seniority Level: Senior

Industry: IT

Detailed Description From Employer:

Senior Site Reliability Engineer (Observability)

Location: London/UK (Remote)

Contract: 12 Months Initial

Rate: £55 to £62 per hour (inside IR35)

Job Overview

We are looking for a Senior Site Reliability Engineer with strong experience in observability, monitoring and distributed systems to support large-scale cloud infrastructure serving millions of devices globally. The role focuses on building and scaling monitoring, logging and alerting platforms to ensure high availability and performance of cloud services.

This is an urgent vacancy, and the hiring manager is shortlisting for interviews immediately. Please apply with a copy of your CV or send it to khushboo.pandey@randstad.co.uk

Randstad Technologies is acting as an Employment Business in relation to this vacancy.