Senior Site Reliability Engineer (SRE)

Posted Today by 1773918608

Negotiable

Inside

Remote

London

Amazon Elastic Kubernetes Service, Amazon Web Services (AWS), Ansible, Automation, AWS Identity And Access Management (IAM), AWS Lambda, Azure Kubernetes Service, Balancing (Ledger/Billing), Containerisation, Continuous Integration and Continuous Delivery, Cyber Incident Response, Docker (Software), Domain Name System (DNS) Servers, Firewall, Infrastructure as Code (IaC), Infrastructure Automation, Kubernetes, Load Balancing, OpenShift, Operational Efficiency, Service Catalog, TCP/IP, Virtual Private Cloud

Summary: The Senior Site Reliability Engineer (SRE) role involves ensuring the reliability of high-traffic platforms in the video game industry. The position focuses on improving architecture, platform resiliency, and service performance while leading incident response and mentoring teams. This is a remote, 12-month contract with a high chance of extension. The role requires extensive experience in AWS and Kubernetes, among other technical skills.

Key Responsibilities:

Lead incident response and troubleshooting for production systems, resolving high-severity issues and driving post-incident improvements.
Influence architecture to improve platform-wide reliability, resiliency, and operational efficiency, ensuring services remain available under heavy load.
Drive containerisation best practices and manage Kubernetes-based workloads at scale.
Build and maintain event-driven architectures that scale globally while ensuring fault-tolerance and high availability.
Automate infrastructure provisioning, deployment, and monitoring using Infrastructure as Code (Terraform, CloudFormation, Ansible, CDK).
Collaborate with engineering, product, and security teams to define SLOs, SLIs, and error budgets across services.
Provide mentorship, advocate SRE best practices, and ensure teams are empowered to deliver resilient, reliable systems.

Key Skills:

Extensive experience in AWS and AWS-managed services (EC2, Lambda, S3, VPC, CloudWatch, CloudTrail, IAM, EKS, Service Catalog, multi-account environments).
Strong Kubernetes / container orchestration experience, including EKS, OpenShift, Docker, and service mesh.
Deep understanding of networking fundamentals: DNS, VPCs, routing, load balancing, TCP/IP, firewall policies.
Proven track record in incident response and troubleshooting at scale.
Hands-on experience with infrastructure automation and CI/CD pipelines.
Experience designing event-driven architectures and resilient systems.
High level of autonomy, able to influence platform-wide decisions and architect for reliability across services.
Ability and desire to mentor junior staff.
Bonus: experience in gaming, interactive entertainment, or other high-traffic, global-scale platforms.

Salary (Rate): undetermined

City: London

Country: UK

Working Arrangements: remote

IR35 Status: inside IR35

Seniority Level: undetermined

Industry: IT

Detailed Description From Employer:

Senior Site Reliability Engineer (SRE)
Remote

12-month contract (high chance of extension)

Job Description
Join a global pioneer in the video game industry and own the reliability of high-traffic, revenue-critical platforms used by millions worldwide. As a Senior SRE, you'll shape the architecture, improve platform-wide resiliency, and ensure services stay performant, scalable, and secure. This isn't just about maintaining a single system: you'll influence reliability across multiple services, driving improvements that touch the entire ecosystem.

If you are interested in this role, please feel free to submit your CV.
