Cloud Infrastructure Site Reliability Engineer

Posted 1 week ago by Red - The Global SAP Solutions Provider

Negotiable
Undetermined
Hybrid
Sheffield, South Yorkshire, UK

Summary: This Azure SRE/Cloud Platform Engineer role sits in a high-impact team working on large-scale, cloud-native infrastructure across Azure and GCP. It is a hands-on position: you will build, automate, and run highly available platforms while applying modern SRE principles. The ideal candidate enjoys solving complex problems and improving reliability across cloud, DevOps, and data platforms.

Key Responsibilities:

  • Build, operate, and support scalable Azure and GCP cloud infrastructure
  • Apply SRE principles to improve reliability, performance, and automation
  • Create and maintain Infrastructure as Code (Terraform/ARM)
  • Develop scripts and tooling (PowerShell, Bash, Python)
  • Monitor systems using tools like Azure Monitor, Prometheus, Grafana
  • Troubleshoot complex, cross-platform production issues
  • Collaborate across teams to deliver resilient, production-grade services
  • Continuously improve systems, processes, and deployment pipelines

Key Skills:

  • Strong experience in Azure + DevOps/SRE environments
  • Solid scripting skills (PowerShell, Bash, Python)
  • Experience with CI/CD, Git, automation pipelines
  • Good understanding of Linux, networking, and cloud architecture
  • Hands-on with Terraform or similar IaC tools
  • Proven ability to troubleshoot and optimise production systems

Salary (Rate): undetermined

City: Sheffield

Country: UK

Working Arrangements: hybrid

IR35 Status: undetermined

Seniority Level: undetermined

Industry: IT

Detailed Description From Employer:

Azure SRE/Cloud Platform Engineer

Sheffield (Hybrid - 3 days onsite)
6-Month Contract (Likely Extension)

We're looking for a GCP/Azure-focused Site Reliability Engineer/Cloud Platform Engineer to join a high-impact team working on large-scale, cloud-native infrastructure.

This is a hands-on role where you'll build, automate, and run highly available platforms, applying modern SRE principles to real-world systems at scale. You should be an advocate of SRE principles (Google model) for building and operating resilient, large-scale cloud systems.

If you enjoy solving complex problems, improving reliability, and working across cloud, DevOps, and data platforms, this is for you.
