Site Reliability Engineer - AWS & Azure

Posted 1 day ago by Square One Resources

£605 Per day
Inside
Undetermined
England, UK

Summary: We are looking for a Site Reliability Engineer with expertise in Azure and AWS to lead the migration of an on-prem HPC solution to the cloud. The role focuses on enhancing cloud infrastructure reliability, scalability, and performance through automation and software engineering practices. The ideal candidate will integrate development and operations, applying a software engineering mindset to IT infrastructure. This position is a 12-month contract based in the UK.

Key Responsibilities:

  • Work with existing solutions already in place in the US to redefine, implement, and maintain scalable, reliable cloud infrastructure across Azure and AWS for the UK business as a similar but separate entity.
  • Develop automation scripts and tools to streamline operational tasks such as log analysis, environment testing, and incident response.
  • Collaborate with development and operations teams to ensure seamless deployment and performance of applications and services.
  • Monitor system performance and availability, proactively identifying and resolving issues.
  • Apply software engineering principles to infrastructure management, improving efficiency and reducing manual effort.
  • Deliver value by monitoring spending, optimizing resource usage through right-sizing and automation, and implementing governance through tagging strategies and budget alerts.
  • Document the solution and deliver knowledge transfer and training to existing team members.

Key Skills:

  • Strong understanding of cloud-native architectures and services in Azure and AWS, including AKS/EKS and their automation.
  • Experience with infrastructure-as-code tools (e.g., Terraform).
  • Familiarity with CI/CD pipelines, containerization (Docker, Kubernetes), and monitoring tools.
  • Knowledge of data processing and configuration design.
  • Experience with IT infrastructure and monitoring systems.
  • Bachelor's degree in Computer Science, Computer Engineering, Information Technology, or a related field.
  • Extensive experience in site reliability engineering, DevOps, or cloud infrastructure roles.

Salary (Rate): £605 per day

City: undetermined

Country: UK

Working Arrangements: undetermined

IR35 Status: inside IR35

Seniority Level: undetermined

Industry: IT

Detailed Description From Employer:

Job Title: Site Reliability Engineer - Azure & AWS
Location: UK
Salary/Rate: Up to £605 per day (Inside IR35)
Start Date: Jan 2026
Job Type: 12-month contract

Role overview:

We are seeking a highly skilled Site Reliability Engineer (SRE) with expertise in both Azure and AWS cloud platforms. The role takes the lead in migrating an existing on-prem HPC solution to the cloud, and in enhancing the reliability, scalability, and performance of that cloud infrastructure through automation, software engineering practices, and proactive system management. The ideal candidate will bridge the gap between development and operations, applying a software engineering mindset to IT operations and infrastructure.

Required Skills/Experience:
The ideal candidate will have the following:

  • Strong understanding of cloud-native architectures and services in Azure and AWS, including AKS/EKS and their automation.
  • Experience with infrastructure-as-code tools (e.g., Terraform).
  • Familiarity with CI/CD pipelines, containerization (Docker, Kubernetes), and monitoring tools.
  • Knowledge of data processing and configuration design.
  • Experience with IT infrastructure and monitoring systems.

Job Responsibilities/Objectives:

  • Work with existing solutions already in place in the US to redefine, implement, and maintain scalable, reliable cloud infrastructure across Azure and AWS for the UK business as a similar but separate entity.
  • Develop automation scripts and tools to streamline operational tasks such as log analysis, environment testing, and incident response.
  • Collaborate with development and operations teams to ensure seamless deployment and performance of applications and services.
  • Monitor system performance and availability, proactively identifying and resolving issues.
  • Apply software engineering principles to infrastructure management, improving efficiency and reducing manual effort.
  • Deliver value by monitoring spending, optimizing resource usage through right-sizing and automation, and implementing governance through tagging strategies and budget alerts.
  • Document the solution and deliver knowledge transfer and training to existing team members.

Education & Experience:

  • Bachelor's degree in Computer Science, Computer Engineering, Information Technology, or a related field.
  • Extensive experience in site reliability engineering, DevOps, or cloud infrastructure roles.

If you are interested in this opportunity, please apply now with your updated CV in Microsoft Word/PDF format.

Disclaimer
Notwithstanding any guidelines given on the level of experience sought, we will consider candidates from outside this range if they can demonstrate the necessary competencies.
Square One is acting as both an employment agency and an employment business, and is an equal opportunities recruitment business. Square One embraces diversity and will treat everyone equally. Please see our website for our full diversity statement.