Senior Site Reliability Engineer

Posted 1 week ago by SixteenFifty

Negotiable

Undetermined

London Area, United Kingdom

Amazon Web Services (AWS), Automation, Azure DevOps, Azure Kubernetes Service, Bash (Scripting Language), Change Management, Cloud Computing, Cloud Engineering, Cloud Infrastructure, Cloud Services, Cloud Technology, Containerisation, Continuous Integration and Continuous Delivery, Datadog, DevOps, Docker (Software), Generic Programming, GitHub, GitLab, Infrastructure as Code (IaC), Java (Programming Language), Jenkins, Kubernetes, Management, Microsoft Azure, Operational Excellence, Python (Programming Language), Root Cause Analysis, Scalability, Scripting, Site Reliability Engineering, Software Development, Splunk, Stakeholder Management, Terraform

Summary: The role of Contract Site Reliability Engineer Team Lead involves providing technical leadership and mentorship to a team of Site Reliability Engineers (SREs) while remaining hands-on with cloud infrastructure and operational excellence. The position requires a strong technical background in AWS and a focus on driving reliability and performance improvements across critical production systems. Occasional travel to the client's offices in Central London is expected. The ideal candidate will champion Infrastructure as Code practices and collaborate closely with various teams to enhance system reliability.

Salary (Rate): undetermined

City: London

Country: United Kingdom

Working Arrangements: undetermined

IR35 Status: undetermined

Seniority Level: undetermined

Industry: IT

Detailed Description From Employer:

My client is seeking an experienced Contract Site Reliability Engineer Team Lead to join their team. This is a hands-on leadership role requiring a strong technical background alongside the ability to lead, mentor, and develop a high-performing SRE function. There will be occasional travel to the client's offices in Central London.

Responsibilities

Provide technical leadership for the Site Reliability Engineering function, driving reliability, scalability, and performance improvements across critical production systems.
Lead, mentor, and coach a team of SREs and engineers, fostering a culture of operational excellence, collaboration, and continuous improvement.
Remain hands-on with the design, implementation, and support of cloud infrastructure, automation, observability, and platform reliability initiatives.
Define, implement, and govern SLOs, SLIs, and error budgets, ensuring alignment between engineering priorities and business objectives.
Architect, maintain, and optimise highly available, distributed systems within an AWS cloud environment.
Drive change management initiatives across infrastructure, platforms, and operational processes, ensuring smooth adoption of new technologies and ways of working.
Champion Infrastructure as Code (IaC) and automation practices, reducing manual operational effort through tools such as Terraform and CloudFormation.
Collaborate closely with development, platform, and operational teams to embed reliability and resilience best practices throughout the software development lifecycle.
Lead incident management, root cause analysis, and continuous service improvement activities.
Establish and enhance monitoring, alerting, and observability capabilities across the technology estate.

Required Skills & Experience

Proven experience in a Site Reliability Engineering, DevOps, Cloud Engineering, or Infrastructure Engineering role, with experience leading or mentoring technical teams.
Demonstrable hands-on technical expertise alongside leadership responsibilities.
Strong experience delivering and managing change within complex technology environments.
Extensive experience working with AWS cloud services and architectures, as the client's platform is hosted within AWS.
Strong Linux/Unix systems administration knowledge.
Proficiency in one or more scripting or programming languages such as Python, Bash, Go, or Java.
Strong experience with Infrastructure as Code tools, including Terraform and/or CloudFormation.
Experience with containerisation and orchestration technologies, including Docker and Kubernetes.
Familiarity with CI/CD tooling such as Jenkins, GitHub Actions, GitLab CI, or Azure DevOps.
Essential experience with observability and monitoring platforms, including Datadog and Splunk.
Strong understanding of distributed systems, networking, security principles, and cloud-native architectures.
Excellent troubleshooting, problem-solving, and stakeholder management skills.

Desirable Experience

Operating within large-scale, mission-critical production environments.
Previous experience establishing or maturing SRE practices and operating models.
Relevant AWS, Kubernetes, or cloud certifications.

Please apply for immediate consideration.
