Negotiable
Outside
Remote
USA
Summary: The Senior Production Support Engineer is responsible for leading and coordinating high-severity incident calls, ensuring effective communication and resolution. This role involves conducting post-incident reviews, managing problem records, and collaborating with cross-functional teams to implement improvements. The engineer will also engage with client stakeholders to align on reliability goals and maintain reporting dashboards for service health.
Key Responsibilities:
- Lead and coordinate SEV1/SEV2 incident calls, ensuring timely resolution and clear communication.
- Conduct thorough post-incident reviews and drive root cause analysis (RCA).
- Manage Problem Records (PRBs) and associated tasks to eliminate recurring issues.
- Collaborate with cross-functional teams to implement corrective and preventive actions.
- Engage with client stakeholders to negotiate priorities, communicate impact, and align on reliability goals.
- Identify and drive continuous improvement initiatives across systems, processes, and tooling.
- Maintain dashboards and reporting for incident trends, problem management, and service health.
Key Skills:
- Experience in incident management and problem resolution.
- Strong analytical and root cause analysis skills.
- Ability to collaborate with cross-functional teams.
- Excellent communication and negotiation skills.
- Proficiency in maintaining dashboards and reporting tools.
- Knowledge of continuous improvement methodologies.
Salary (Rate): undetermined
City: undetermined
Country: USA
Working Arrangements: remote
IR35 Status: outside IR35
Seniority Level: undetermined
Industry: Other