Cloud Infrastructure Site Reliability Engineer
Posted 1 week ago by Red - The Global SAP Solutions Provider
Negotiable
Undetermined
Hybrid
Sheffield, South Yorkshire, UK
Summary: The role of Azure SRE/Cloud Platform Engineer involves joining a high-impact team to work on large-scale, cloud-native infrastructure with a focus on Azure and GCP. This hands-on position requires building, automating, and running highly available platforms while applying modern SRE principles. The ideal candidate will enjoy solving complex problems and improving reliability across cloud, DevOps, and data platforms.
Key Responsibilities:
- Build, operate, and support scalable Azure and GCP cloud infrastructure
- Apply SRE principles to improve reliability, performance, and automation
- Create and maintain Infrastructure as Code (Terraform/ARM)
- Develop scripts and tooling (PowerShell, Bash, Python)
- Monitor systems using tools like Azure Monitor, Prometheus, Grafana
- Troubleshoot complex, cross-platform production issues
- Collaborate across teams to deliver resilient, production-grade services
- Continuously improve systems, processes, and deployment pipelines
Key Skills:
- Strong experience in Azure + DevOps/SRE environments
- Solid scripting skills (PowerShell, Bash, Python)
- Experience with CI/CD, Git, automation pipelines
- Good understanding of Linux, networking, and cloud architecture
- Hands-on with Terraform or similar IaC tools
- Proven ability to troubleshoot and optimise production systems
Salary (Rate): undetermined
City: Sheffield
Country: UK
Working Arrangements: hybrid
IR35 Status: undetermined
Seniority Level: undetermined
Industry: IT
Azure SRE/Cloud Platform Engineer
Sheffield (Hybrid - 3 days onsite)
6-Month Contract (Likely Extension)
We're looking for a GCP/Azure-focused Site Reliability Engineer/Cloud Platform Engineer to join a high-impact team working on large-scale, cloud-native infrastructure.
This is a hands-on role where you'll build, automate, and run highly available platforms, applying modern SRE principles to real-world systems at scale. You should be an advocate of SRE principles (the Google model) for building and operating resilient, large-scale cloud systems.
If you enjoy solving complex problems, improving reliability, and working across cloud, DevOps, and data platforms, this is for you.
What You'll Be Doing:
- Build, operate, and support scalable Azure and GCP cloud infrastructure
- Apply SRE principles to improve reliability, performance, and automation
- Create and maintain Infrastructure as Code (Terraform/ARM)
- Develop scripts and tooling (PowerShell, Bash, Python)
- Monitor systems using tools like Azure Monitor, Prometheus, Grafana
- Troubleshoot complex, cross-platform production issues
- Collaborate across teams to deliver resilient, production-grade services
- Continuously improve systems, processes, and deployment pipelines
Key Skills:
- Strong experience in Azure + DevOps/SRE environments
- Solid scripting skills (PowerShell, Bash, Python)
- Experience with CI/CD, Git, automation pipelines
- Good understanding of Linux, networking, and cloud architecture
- Hands-on with Terraform or similar IaC tools
- Proven ability to troubleshoot and optimise production systems