Summary: We are looking for a Senior Network Site Reliability Engineer (SRE) to join our global network operations team, focusing on the reliability, scalability, and performance of network infrastructure. The role involves leading incident responses, troubleshooting complex issues, and driving automation initiatives. This position requires extensive experience in network engineering and operations, along with strong technical leadership skills. The ideal candidate will have a deep understanding of multi-vendor environments and a commitment to operational excellence.
Salary (Rate): undetermined
City: London
Country: United Kingdom
Working Arrangements: undetermined
IR35 Status: undetermined
Seniority Level: undetermined
Industry: IT
Job Description: Senior Network SRE
Role Overview:
We are seeking a highly experienced Senior Network Site Reliability Engineer (SRE) to join our global network operations team. This role is critical in ensuring the reliability, scalability, and performance of our network infrastructure. You will lead incident responses, troubleshoot complex issues, and drive automation initiatives to maintain world-class network services.
Required Skills:
- Minimum 10 years' hands-on experience in network engineering and operations.
- Deep expertise in routing, switching, firewalling, and wireless across multiple vendors.
- Strong troubleshooting skills, including overlay/underlay network understanding.
- Proficiency in Linux/Unix environments.
- Experience with automation and monitoring platforms.
- Ability to work independently, set technical direction, and mentor others.
- Tools:
  - NetBox/Nautobot
  - Prometheus/VictoriaMetrics
  - Salt
- Networking (either of the following):
  - EVPN Segment routing
  - Significant MPLS depth (accepted as an alternative)
Key Responsibilities:
- Lead Incident Management: Own and resolve critical network incidents, manage outages, and provide expert guidance during high-pressure situations.
- Advanced Troubleshooting: Diagnose and resolve complex issues across routing, switching, firewalling, and wireless domains.
- Technical Leadership: Set technical direction, mentor junior engineers, and foster a culture of operational excellence.
- 24/7 Operations: Participate in a shift-based model to ensure continuous availability of critical network services.
- Multi-Vendor Expertise: Operate across diverse environments including Arista, Cisco, Cumulus, Spectrum Ethernet, InfiniBand, Palo Alto, Check Point, Mist, Aruba, A10, NetScaler, and F5.
- Security & Segmentation: Support network segmentation, policy enforcement, and VPN solutions (GlobalProtect, AnyConnect).
- Automation & Observability: Use tools such as Grafana, BigPanda, ServiceNow, ITMP, syslog, Splunk, Salt, Ansible, and Prometheus to improve monitoring and automation.
- Innovation Projects: Collaborate on wireless design and AI cluster deployments to support cutting-edge initiatives.
Preferred Skills:
- Experience with InfiniBand and AI cluster deployments.
- Familiarity with network asset management systems (e.g., Nautobot).
- Wireless design experience with Cisco, Mist, Aruba.