Linux Site Reliability Engineer

Posted 1 day ago by NP Group

Negotiable
Inside
Hybrid
Glasgow, Scotland, UK

Summary: The Linux Site Reliability Engineer (SRE) will join an infrastructure support team to improve platform reliability in a large-scale enterprise environment. The engineer will focus on resolving hardware and platform-related incidents, drawing on strong Linux systems expertise and physical server troubleshooting skills. A proactive approach to operational improvement and automation is essential. The role is hybrid, requiring three days on-site in Glasgow, and is inside IR35, engaged via an umbrella solution.

Key Responsibilities:

  • Resolve hardware and platform-related incidents escalated from the L3 support team.
  • Apply strong Linux administration and troubleshooting skills, including hands-on physical server troubleshooting.
  • Manage incidents end-to-end, including triage, mitigation, and resolution.
  • Document procedures and contribute to knowledge bases.
  • Participate in post-incident reviews and root cause analysis.

Key Skills:

  • Strong Linux administration and troubleshooting skills.
  • Understanding of server hardware and peripherals.
  • Experience with out-of-band management technologies.
  • Knowledge of SRE operational practices and metrics.
  • Strong communication and documentation skills.
  • Scripting and automation skills (Bash, Python).
  • Familiarity with virtualization and containerization concepts.
  • Experience with monitoring and observability workflows.

Salary (Rate): undetermined

City: Glasgow

Country: UK

Working Arrangements: hybrid

IR35 Status: inside IR35

Seniority Level: undetermined

Industry: IT

Detailed Description From Employer:

Contract: Site Reliability Engineer (Linux Administration & Server Hardware)
Location: Glasgow (hybrid, 3 days on-site)
Duration: 6 months
Day Rate: Negotiable (Inside IR35 via umbrella solution)


Reference: 20460

We are looking for an experienced Linux Site Reliability Engineer (SRE) to join a high-performing infrastructure support team focused on maintaining and improving critical platform reliability within a large-scale enterprise environment.

This position will focus on resolving hardware and platform-related incidents escalated from the L3 support team. The successful candidate will have strong Linux systems expertise, hands-on physical server troubleshooting experience, and a proactive approach to operational improvement, automation, and incident reduction.

Essential Skills/Requirements

  • Strong Linux administration and troubleshooting skills (process, networking basics, logs, package/service management).
  • Solid understanding of server hardware and peripherals (disks, RAID/HBA, NICs, firmware) and how failures present at OS level.
  • Experience with out-of-band management/lights-out technologies (eg, iDRAC, iLO, IPMI/Redfish) for remote troubleshooting and recovery.
  • Proven ability to own incidents end-to-end: triage, identify mitigations/workarounds, coordinate with L3/engineering, communicate status, and drive to resolution.
  • Understanding of SRE operational practices and metrics (eg, SLO/SLI concepts, error budgets, MTTD/MTTR) and a continuous-improvement mindset.
  • Strong communication skills (written and verbal): clear incident updates, customer/stakeholder management, and effective escalation and handoffs.
  • Strong documentation skills: writing clear runbooks/procedures, contributing to knowledge bases, and participating in post-incident reviews/root cause analysis.

Nice to Have/Desired Skills

  • Scripting and automation skills (eg, Bash, Python) to build small tools, checks, and workflow automation that reduce toil.
  • Familiarity with virtualization and containerization concepts/operations (eg, VMware/KVM, Docker, Kubernetes) and using automation to support these environments.
  • Experience with monitoring/observability and alerting workflows (dashboards, log analysis, alert tuning) and translating signals into actionable response steps.

Networking People (UK) is acting as an Employment Business in relation to this vacancy.