Negotiable
Outside
Remote
USA
Summary: This Monitoring Engineer role supports the Technical Operations Center (TOC) at CareFirst by monitoring enterprise-wide systems and applications. The engineer will use tools such as Dynatrace and Splunk to identify anomalies and improve processes while ensuring high availability of critical business processes. The position requires shift work and is fully remote. Proficiency in scripting and coding is a significant advantage.
Key Responsibilities:
- Provide eyes-on-glass monitoring using Dynatrace and other monitoring tools
- Support a 24x7 system monitoring service to proactively identify and assess problems
- Provide oversight, coordination, and visibility for critical business processes
- Perform system health checks, both manual and automated
- Identify, investigate, verify, report, communicate, and escalate critical events
- Review device logs, with documentation and analysis where applicable
- Develop runbooks and manage documentation for repeatable processes (Lifecycle Management)
- Follow basic triage steps, monitor production systems, and ensure their high availability
- Facilitate and coordinate the necessary IT response to system problems
- Continuously analyze events and eliminate noise and non-actionable event trends (Continual Service Improvement)
- Provide event management support to service owners and IT managers
- Author reports on trends and anomalies in KPIs (Key Performance Indicators) for Event Management and Monitoring
- Communicate with stakeholders, and support and facilitate open communication among all stakeholders
Key Skills:
- Associate of Arts/Associate of Science and 3+ years of experience, or an equivalent combination such as a bachelor's degree and 2+ years of experience, or no degree and at least 3 years in NOC/TOC or Command Center roles
- 3+ years of IT experience and an understanding of performance monitoring tools
- 3+ years of Dynatrace monitoring experience
- 2+ years operating in a command center in an Incident Management, Event Monitoring, or Event Management role
- Ability to assess monitoring events and respond or escalate accordingly
- Knowledge of and experience with system and network infrastructure, such as LAN and WAN network technologies, server virtualization, enterprise storage area network (SAN) and backup, and database technologies
Salary (Rate): undetermined
City: Baltimore
Country: USA
Working Arrangements: remote
IR35 Status: outside IR35
Seniority Level: undetermined
Industry: IT