Summary: The role of ITSM Service Delivery focuses on managing Infrastructure Operations, particularly in Service Level Management and Availability Management. The ideal candidate will possess extensive experience in IT Operations and demonstrate strong technical and communication skills to interface with various stakeholders. Responsibilities include developing frameworks for service management, monitoring infrastructure health, and driving proactive measures to ensure service reliability. The position requires a blend of technical expertise and operational rigor to enhance service delivery in a fast-paced environment.
Salary (Rate): undetermined
City: undetermined
Country: USA
Working Arrangements: remote
IR35 Status: outside IR35
Seniority Level: undetermined
Industry: IT
Overview:
The Global Hosting Service Delivery team is responsible for managing Infrastructure Operations, including Major Incident Management, Problem (RCA) Management, Enterprise Change Management, and PagerDuty. Additionally, we are building new service offerings around Service Level Management and Availability Management. The ideal candidate will have strong Infrastructure, Cloud, and Operations experience in enterprise environments and possess deep subject matter expertise in Service Level Management and Availability Management. The role requires strong technical capabilities and confident communication skills. The candidate must be able to multitask in a fast-paced environment with short timelines and high visibility from our clients and internal customers, and will interface with Infrastructure Architects, Application Development teams within the Business Units, and Senior Leadership.
The ideal candidate will be comfortable communicating at all levels and have a broad technical understanding as well as specific, in-depth knowledge of implementing Service Level and Availability Management. This person should be able to gather requirements and ask appropriate questions, and should have above-average communication, project management, and presentation skills. Strong PowerBI skills are required. This person must be able to conceptualize and articulate their vision and progress quickly to implementation.
This team member will be responsible for creating the framework and implementing Service Level Management and/or Availability Management from the ground up, including gathering requirements, building appropriate dashboards, and engaging with stakeholders. This role blends technical depth with operational rigor: driving proactive measures to prevent outages, managing high-stakes incident response, and collaborating across business and IT to ensure resilient, always-on service delivery.
Key Responsibilities:
- Serve as the primary point of accountability for end-to-end service availability and/or Service Level Management.
- Monitor critical infrastructure and application health, leveraging (and in some cases, creating) advanced analytics and real-time dashboards to detect early warning signs and eliminate single points of failure.
- Apply SME-level experience to identify relevant trends and outliers and provide executive-level insights.
- Develop the Service Level Management framework, implement SLAs for all OneOps teams in ServiceNow, and ensure these SLAs/SLOs are included in the appropriate PowerBI dashboards.
- Partner with Architects, DevOps, SRE, and Application teams to drive awareness and alignment with Service Level and/or Availability and move towards a unified IT Operations model across the enterprise.
- Develop and maintain Service Availability Plans, incorporating business priorities, technical dependencies, and risk mitigation strategies.
- Own and evolve metrics for service uptime, reliability, MTTR/MTTI, and user-impacting events. Present trends and recommendations to both technical staff and executive leadership.
- Embed availability practices into Change Management, Release, and Problem Management workflows, ensuring risks are surfaced and planned for up front.
- Mentor team members in proactive monitoring and resilience engineering. Foster a culture of continuous improvement and transparency.
Required Skills & Experience:
- Bachelor's degree or equivalent practical experience in IT, Computer Science, Engineering, or a related field.
- 5+ years of hands-on experience in IT Operations, SRE, Service Level Management and Availability Management within enterprise-scale environments.
- Proven track record managing Service Level Management/Availability Management and in-depth experience implementing these services in alignment with ITIL.
- Deep understanding of IT infrastructure (compute, storage, network), cloud platforms (AWS/Azure), and modern application architectures.
- Strong PowerBI experience, including ingesting data from various sources (such as ServiceNow).
- Experience with monitoring, alerting, and analytics tools (e.g., ServiceNow, PagerDuty, PowerBI, Datadog, Splunk).
- Exceptional written and verbal communication skills; able to translate technical details for senior leaders and non-technical stakeholders.
- Analytical mindset: able to spot trends, correlate data, and identify improvement opportunities independently.
- Executive presence and the confidence to lead discussions, challenge assumptions, and drive decisions in high-visibility scenarios.
- Must be able to work independently with little oversight and progress quickly.
Preferred:
- ITIL, AWS/Azure, or related certifications (preference will be given to these candidates).
- Experience with automation and orchestration tools.
- Programming/scripting ability (Python, PowerShell, etc.) is a plus.
- Familiarity with DevOps/DevSecOps, SRE, and Monitoring/Observability platforms.