System Monitoring & Observability Engineer (Prometheus-Grafana)

Posted 1 week ago by SRT Marine Systems plc

Rate: Negotiable
IR35: Outside
Working arrangement: Hybrid
Cardiff, Wales, United Kingdom

Summary: The System Monitoring & Observability Engineer at SRT Marine Systems will focus on building end-user observability solutions with Prometheus and Grafana. The role involves designing and maintaining monitoring, alerting and dashboards that give end-users clearer operational insight. The position is a six-month contract on a hybrid basis, with one day a week in the Cardiff office. The engineer will work alongside experienced teams to improve the performance and reliability of a system deployed across sites in several countries.

Key Responsibilities:

  • Design, configure, and maintain Prometheus-based monitoring solutions.
  • Develop and manage metric exporters for application and system-level data.
  • Define and maintain alert rules based on SLIs/SLOs and performance baselines.
  • Design and maintain Grafana dashboards for real-time operational insights.
  • Monitor infrastructure for uptime, latency, and throughput.
  • Keep the platform maintainable, easily configurable, and fully automatable.

Key Skills:

  • Proven experience with Prometheus (including PromQL) and Grafana in production environments.
  • Strong knowledge of Linux-based systems.
  • Experience writing and optimizing PromQL queries for alerts and dashboards.
  • Familiarity with Prometheus exporters (e.g. node_exporter, blackbox_exporter).
  • Understanding of Alertmanager configuration and routing.
  • Proficiency with Grafana dashboard creation and templating.
  • Strong troubleshooting skills for infrastructure and application issues.
  • Familiarity with containers (Docker).
  • Scripting skills (Bash, Python, or Go) for automation.

Salary (Rate): undetermined

City: Cardiff

Country: United Kingdom

Working Arrangements: hybrid

IR35 Status: outside IR35

Seniority Level: undetermined

Industry: IT

Detailed Description From Employer:

Join Us at SRT Marine Systems as a System Monitoring & Observability Engineer (Prometheus / Grafana)

Job Title: System Monitoring & Observability Engineer (Prometheus / Grafana)

Location: Cardiff (1 day per week in the office)

Job Type: Contract, Hybrid, Full-Time

Duration: 6 months

Status and Rate: Outside of IR35, day rate.

SRT Marine Systems plc (SRT) is a market leader in international marine surveillance technology and systems. We are a respected, established and ambitious multinational company, headquartered in the UK, with a global customer base. SRT has a global impact in the marine domain, leading the next generation of maritime domain awareness technologies, products and systems that significantly enhance security, safety, environmental protection and sustainability. Our customers are worldwide and range from the largest national coast guards to individual vessel owners.

SRT is an exciting company where high-quality results are rewarded. We are ambitious and constantly seek to innovate to deliver better products and services to our customers. We strive to make SRT a rewarding and challenging place to work, where talented, hard-working individuals have the opportunity to make a real impact across the marine world.

About The Role

We are seeking a skilled engineer to implement end-user observability visualisations. We already have observability dashboards in place for our engineers, built on Prometheus for metrics collection and Grafana for visualisation. This initiative is expected to build on that stack to provide a more user-friendly observability solution for the end-users of our system. Our clients are in different countries around the world with varying WAN capabilities, and our system is deployed on-premises in each client's country, physically distributed across several sites. You will be supported by a wealth of experienced engineers, including UX designers; our lead observability engineer and a UX expert will provide guidance as needed.

What You’ll Be Doing

  • Monitoring & Metrics Collection
      ◦ Design, configure, and maintain Prometheus-based monitoring solutions.
      ◦ Develop and manage metric exporters for application and system-level data.
      ◦ Optimize Prometheus scraping configurations and retention policies.
  • Alerting & Incident Response
      ◦ Define and maintain alert rules based on SLIs/SLOs and performance baselines.
      ◦ Ensure alerts are actionable, with minimal false positives.
      ◦ Participate in (though not necessarily lead) on-call rotations and incident postmortems.
  • Observability Dashboards
      ◦ Design and maintain Grafana dashboards for real-time operational insights.
      ◦ Collaborate with engineering and product teams to create tailored visualisations.
      ◦ Provide self-service dashboard capabilities for end-users.
  • System Performance & Reliability
      ◦ Monitor infrastructure (servers, containers, databases, services) for uptime, latency, and throughput.
      ◦ Identify bottlenecks and recommend improvements.
  • Platform Maintenance & Automation
      ◦ Keep the platform maintainable, easily configurable, and fully automatable.
      ◦ Enable simple redeployments and configuration changes with minimal effort.

What You’ll Bring

  • Proven experience with Prometheus (including PromQL) and Grafana in production environments.
  • Strong knowledge of Linux-based systems.
  • Experience writing and optimizing PromQL queries for alerts and dashboards.
  • Familiarity with Prometheus exporters (e.g. node_exporter, blackbox_exporter, custom exporters).
  • Understanding of Alertmanager configuration and routing.
  • Proficiency with Grafana dashboard creation and templating.
  • Strong troubleshooting skills for infrastructure and application issues.
  • Strong ability and motivation to quickly learn and master new technologies and frameworks.
  • Familiarity with containers (Docker).
  • Scripting skills (Bash, Python, or Go) for automation.

Our Values at SRT Marine

  • Ambition – Aspiring to lead in maritime domain management.
  • Innovation – Driving improvement through creativity and forward-thinking.
  • Quality – Committing to high standards in performance and reliability.
  • Responsibility – Being individually accountable and team-driven.
  • Team – Collaborating openly with colleagues, partners, and customers.

Why Join Us?

  • Work on mission-critical maritime surveillance systems used worldwide.
  • Be part of an ambitious, innovative, and supportive team.
  • Make a direct impact on global maritime safety and sustainability.
  • Enjoy flexible hybrid working.
  • Competitive salary and benefits, including:
      ◦ Matched pension contributions up to 5%
      ◦ 25 days annual leave (rising to 28 with service)
      ◦ Private health care
      ◦ Flexible working opportunities
      ◦ Development and training programmes

SRT Marine Systems plc is an equal opportunity employer. We are committed to creating an inclusive environment for all employees and welcome applications from all backgrounds.