Negotiable
Undetermined
Hybrid
London Area, United Kingdom
Summary: The SRE Transformation Lead for Global Banking & Payments is responsible for leading the transition from traditional L2 production support to a Site Reliability Engineering (SRE) operating model within a highly regulated banking environment. This senior role involves defining and implementing SRE practices to enhance reliability, reduce manual toil, and improve service visibility across critical banking services. The successful candidate will leverage their extensive experience in SRE to influence various stakeholders and drive measurable outcomes in reliability and operational efficiency.
Key Responsibilities:
- Lead the design and execution of the SRE adoption approach across Global Banking & Payments.
- Establish engagement patterns between SRE, application teams, and platform teams.
- Drive adoption of Critical User Journeys, SLIs, SLOs, and error budgets for priority services.
- Implement error-budget-based decision-making that balances reliability, delivery velocity, and operational risk.
- Identify operational toil and lead initiatives to eliminate it through automation and tooling improvements.
- Improve production outcomes through strong incident response practices and preventive engineering actions.
- Establish practical observability standards to reduce noise and improve signal quality.
- Influence leaders across operations, engineering, and product to adopt SRE principles.
Key Skills:
- Significant experience in Site Reliability Engineering.
- Demonstrated experience leading SRE transformation in a corporate banking environment.
- Proven ability to implement and scale SLO/SLI and error budget approaches.
- Strong engineering background with a focus on automation.
- Deep knowledge of incident response and operational resilience practices.
- Strong stakeholder management and communication skills.
Salary (Rate): undetermined
City: London
Country: United Kingdom
Working Arrangements: hybrid
IR35 Status: undetermined
Seniority Level: undetermined
Industry: IT
Job Title: SRE Transformation Lead (Global Banking & Payments)
Contract: 12 Months
Location: Bromley / London (3 days a week)
Working pattern: Full time
Role Summary:
Global Banking & Payments is building and scaling Site Reliability Engineering (SRE) across a large, highly regulated banking environment. We are seeking a senior SRE practitioner to lead and accelerate the transformation from traditional L2 production support toward an SRE operating model. This role will help define, implement, and embed SRE practices across critical payment and banking services, enabling measurable reliability outcomes, reduced manual toil, stronger automation, and improved service visibility. The successful candidate will bring proven, hands-on experience implementing SRE in a large corporate bank and will be able to influence operations, engineering, and product partners to institutionalize SRE practices at scale.
Required Qualifications:
- Significant experience in Site Reliability Engineering and implementing SRE practices across large-scale, complex services is essential.
- Demonstrated experience leading an SRE transformation in a corporate banking environment (or similarly regulated financial services enterprise).
- Proven ability to implement and scale SLO/SLI and error budget approaches, and to operationalize them across multiple teams and services.
- Strong engineering background with the ability to drive automation and reduce manual toil through code, tooling, and process redesign.
- Deep knowledge of incident response, problem management, root cause analysis, and operational resilience practices in mission-critical environments.
- Strong stakeholder management skills, able to influence technology and business partners and communicate effectively at senior levels.
What You Will Do (Key Responsibilities):
SRE Operating Model and Transformation:
- Lead the design and execution of the SRE adoption approach across Global Banking & Payments, including the transition path from traditional L2 support to reliability engineering.
- Establish practical engagement patterns between SRE, application teams, and platform teams, and help teams adopt a consistent way of working.
Reliability Measurement and Decisioning:
- Drive adoption of Critical User Journeys, Service Level Indicators (SLIs), Service Level Objectives (SLOs), and error budgets for priority services, ensuring metrics reflect user experience and business outcomes.
- Help teams implement error-budget-based decisioning that balances reliability, delivery velocity, and operational risk.
Toil Reduction, Automation, and Engineering Excellence:
- Identify operational toil and lead initiatives to eliminate it through automation, self-healing patterns, runbook automation, and operational tooling improvements.
- Establish and implement a model to partner with engineering teams to build reliability into services through design improvements, improved instrumentation, and resilience patterns.
Incident and Problem Management Excellence:
- Improve production outcomes through strong incident response practices, including major incident triage support, root cause analysis, post-incident reviews, and preventive engineering actions.
- Strengthen problem management with a focus on reducing repeat incidents, technical debt risk, and manual intervention.
Observability and Tooling Enablement:
- Establish practical observability standards across logs, metrics, traces, dashboards, and alerting to reduce noise, improve signal quality, and shorten time to detect and restore service.
- Partner across platform, tooling, and service management teams to align SRE needs to enterprise tooling and processes.
- Work with tools like Splunk, Dynatrace, and OTEL to instrument end-to-end observability for services, ensuring teams are able to adopt and use the platforms.
Stakeholder Management and Change Leadership:
- Influence leaders across operations, engineering, and product to adopt SRE principles and measurable reliability goals.
- Communicate clearly with senior stakeholders, including executive updates on progress, adoption, and outcomes.
Key Competencies:
- Transformation leadership in complex, matrixed environments.
- Strong engineering judgment and pragmatic problem solving.
- Ability to simplify, standardize, and scale operating practices.
- Calm and effective leadership during production events.
- Excellent written and verbal communication.
Preferred Qualifications:
- Experience with high-availability banking platforms and 24x7 operational expectations.
- Familiarity with observability tools and building SRE communities of practice.
Why Join Us?
- Be a Pioneer: Lead the charge in transforming how reliability engineering is approached in the banking sector.
- Collaborative Environment: Work with a diverse team that values innovation, teamwork, and excellence.
- Professional Growth: Take on a pivotal role that will challenge and expand your skills in a dynamic and fast-paced industry.
Are you ready to take the next step in your career and make a lasting impact? If you have the expertise and enthusiasm for driving SRE transformation, we want to hear from you! Apply now and join our client in revolutionising the Global Banking & Payments landscape. Your journey toward making a difference starts here!
Pontoon is an employment consultancy. We put expertise, energy, and enthusiasm into improving everyone’s chance of being part of the workplace. We respect and appreciate people of all ethnicities, generations, religious beliefs, sexual orientations, gender identities, and more. We do this by showcasing their talents, skills, and unique experience in an inclusive environment that helps them thrive. If you require reasonable adjustments at any stage, please let us know and we will be happy to support you.