Lead Systems Engineer

Posted 4 days ago by 1750753650

Negotiable
Outside
Remote
USA

Summary: The Lead Systems Engineer will support the Client's Systems Monitoring initiatives, focusing on monitoring tools like DataDog and Linux platforms. This role encompasses the full lifecycle of monitoring tools administration, including implementation, scripting, and dashboard creation. The ideal candidate will collaborate across functions to enhance monitoring capabilities for various Statements of Work (SOWs) in 2025 and beyond.

Key Responsibilities:

  • Administer and maintain monitoring tools, primarily DataDog, on Linux platforms.
  • Configure infrastructure, network, and application monitoring, including centralized logging and SNMP-based monitoring.
  • Instrument Java-based applications (e.g., running on Tomcat) with DataDog for Application Performance Monitoring (APM).
  • Create and manage dashboards and visualizations in DataDog.
  • Administer related monitoring platforms such as ELK Stack (Elasticsearch, Logstash, Kibana) and CloudBeat for synthetic monitoring.
  • Write automation scripts using Shell, Python, or Ansible.
  • Support logging configurations from various platforms including WebSphere, Tomcat, and AIX.
  • Set up Browser Real User Monitoring (RUM) and Synthetic Monitoring using Selenium and CloudBeat.
  • Troubleshoot production performance issues, correlate cross-platform data, and provide root cause analysis.
  • Collaborate with architecture and development teams to integrate monitoring early in the SDLC.
  • Document tool usage, configurations, and procedures, and provide internal training as needed.

Key Skills:

  • 5-8 years of IT experience in distributed environments (Windows, Linux/Unix, VMware, SQL Server, network infrastructure).
  • Minimum 3 years of hands-on experience with DataDog administration or equivalent experience with ELK Stack.
  • Proficient in Shell scripting, Python, and Selenium; VuGen is a plus.
  • Experience configuring SSL certs and encryption on Linux systems.
  • Understanding of F5 Load Balancers, WebSeal, SNMP, Palo Alto, Gigamon, and network monitoring tools.
  • Comfortable with setting up monitoring in cloud and hybrid environments.
  • Experience with alerting, dashboarding, and reporting in DataDog or similar platforms.
  • Strong documentation skills, including SOPs, training material, and user guides.
  • Familiarity with service level management (SLAs, SLRs, etc.).
  • Bachelor's degree in Computer Science, Engineering, or a related technical field (or equivalent experience).
  • Experience with Agile/SAFe methodologies.
  • Exposure to both Waterfall and Agile SDLC environments.

Salary (Rate): undetermined

City: undetermined

Country: USA

Working Arrangements: remote

IR35 Status: outside IR35

Seniority Level: undetermined

Industry: IT

Detailed Description From Employer:
Position Title: Lead Systems Engineer
Location: Washington, DC / Remote
Duration: 9 Months
W2 Only
Job Overview:
We are seeking a Lead Systems Engineer to support the Client's Systems Monitoring initiatives for several Statements of Work (SOWs) in 2025 and beyond. The ideal candidate will bring deep expertise in monitoring tools, particularly DataDog, and possess strong experience on the Linux platform. This role involves the full lifecycle of monitoring tools administration, including implementation, scripting, dashboard creation, and cross-functional collaboration.
Key Responsibilities:
  • Administer and maintain monitoring tools, primarily DataDog, on Linux platforms.
  • Configure infrastructure, network, and application monitoring, including centralized logging and SNMP-based monitoring.
  • Instrument Java-based applications (e.g., running on Tomcat) with DataDog for Application Performance Monitoring (APM).
  • Create and manage dashboards and visualizations in DataDog.
  • Administer related monitoring platforms such as ELK Stack (Elasticsearch, Logstash, Kibana) and CloudBeat for synthetic monitoring.
  • Write automation scripts using Shell, Python, or Ansible.
  • Support logging configurations from various platforms including WebSphere, Tomcat, and AIX.
  • Set up Browser Real User Monitoring (RUM) and Synthetic Monitoring using Selenium and CloudBeat.
  • Troubleshoot production performance issues, correlate cross-platform data, and provide root cause analysis.
  • Collaborate with architecture and development teams to integrate monitoring early in the SDLC.
  • Document tool usage, configurations, and procedures, and provide internal training as needed.
Required Skills and Experience:
  • 5-8 years of IT experience in distributed environments (Windows, Linux/Unix, VMware, SQL Server, network infrastructure).
  • Minimum 3 years of hands-on experience with DataDog administration or equivalent experience with ELK Stack.
  • Proficient in Shell scripting, Python, and Selenium; VuGen is a plus.
  • Experience configuring SSL certs and encryption on Linux systems.
  • Understanding of F5 Load Balancers, WebSeal, SNMP, Palo Alto, Gigamon, and network monitoring tools.
  • Comfortable with setting up monitoring in cloud and hybrid environments.
  • Experience with alerting, dashboarding, and reporting in DataDog or similar platforms.
  • Strong documentation skills, including SOPs, training material, and user guides.
  • Familiarity with service level management (SLAs, SLRs, etc.).
  • Bachelor's degree in Computer Science, Engineering, or a related technical field (or equivalent experience).
  • Experience with Agile/SAFe methodologies.
  • Exposure to both Waterfall and Agile SDLC environments.
Preferred Certifications:
  • ITIL Foundations v3 (must be obtained within 180 days if not currently held)
  • SAFe Certification