Negotiable
Outside
Remote
USA
Summary: The Lead Systems Engineer will support the Client's Systems Monitoring initiatives, focusing on monitoring tools like DataDog and Linux platforms. This role encompasses the full lifecycle of monitoring tools administration, including implementation, scripting, and dashboard creation. The ideal candidate will collaborate across functions to enhance monitoring capabilities for various Statements of Work (SOWs) in 2025 and beyond.
Key Responsibilities:
- Administer and maintain monitoring tools, primarily DataDog, on Linux platforms.
- Configure infrastructure, network, and application monitoring, including centralized logging and SNMP-based monitoring.
- Instrument Java-based applications (e.g., running on Tomcat) with DataDog for Application Performance Monitoring (APM).
- Create and manage dashboards and visualizations in DataDog.
- Administer related monitoring platforms such as ELK Stack (Elasticsearch, Logstash, Kibana) and CloudBeat for synthetic monitoring.
- Write automation scripts using Shell, Python, or Ansible.
- Support logging configurations from various platforms including WebSphere, Tomcat, and AIX.
- Set up Browser Real User Monitoring (RUM) and Synthetic Monitoring using Selenium and CloudBeat.
- Troubleshoot production performance issues, correlate cross-platform data, and provide root cause analysis.
- Collaborate with architecture and development teams to integrate monitoring early in the SDLC.
- Document tool usage, configurations, and procedures, and provide internal training as needed.
Key Skills:
- 5-8 years of IT experience in distributed environments (Windows, Linux/Unix, VMware, SQL Server, network infrastructure).
- Minimum 3 years of hands-on experience with DataDog administration or equivalent experience with ELK Stack.
- Proficient in Shell scripting, Python, and Selenium; VuGen is a plus.
- Experience configuring SSL certificates and encryption on Linux systems.
- Understanding of F5 Load Balancers, WebSEAL, SNMP, Palo Alto, Gigamon, and network monitoring tools.
- Comfortable with setting up monitoring in cloud and hybrid environments.
- Experience with alerting, dashboarding, and reporting in DataDog or similar platforms.
- Strong documentation skills, including SOPs, training material, and user guides.
- Familiarity with service level management (SLAs, SLRs, etc.).
- Bachelor's degree in Computer Science, Engineering, or a related technical field (or equivalent experience).
- Experience with Agile/SAFe methodologies.
- Exposure to both Waterfall and Agile SDLC environments.
Salary (Rate): undetermined
City: undetermined
Country: USA
Working Arrangements: remote
IR35 Status: outside IR35
Seniority Level: undetermined
Industry: IT
Certifications:
- ITIL Foundations v3 (must be obtained within 180 days if not currently held).
- SAFe Certification.