Splunk and OpenShift Observability Engineer

Posted 5 days ago by CBSbutler Holdings Limited trading as CBSbutler

£490 Per day
Birmingham, West Midlands

Summary: The Splunk & OpenShift Observability Engineer will design, deploy, and optimize enterprise-grade monitoring solutions across hybrid Kubernetes and OpenShift environments. This role focuses on shaping observability strategy, enhancing service intelligence, and ensuring platform reliability at scale. The engineer will transform raw telemetry into actionable insights, influencing reliability strategy and improving operational maturity within a cloud-native estate.

Key Responsibilities:

  • Design, deploy, and operate Splunk Enterprise and ITSI across hybrid Kubernetes/OpenShift platforms.
  • Onboard and normalize data at scale (HEC, Universal Forwarder, Deployment Server), aligning to CIM standards.
  • Build and optimize ITSI service models: service trees, KPIs, adaptive thresholds, NEAP policies, glass tables, deep dives, and health scoring.
  • Deliver OpenShift-focused executive and operational dashboards, including cluster/API/etcd health, node readiness and resource pressure, pod restart trends, network and storage error visibility, and capacity analysis.
  • Optimize search and platform performance (workload rules, DMA, summary indexing, scheduling hygiene, concurrency tuning).
  • Implement intelligent alerting and automated routing into ITSM and ChatOps platforms.
  • Govern data ingestion and security controls (RBAC, retention, PII handling, TLS, token governance).
  • Integrate telemetry pipelines including OpenTelemetry, Prometheus, Fluentd/Fluent Bit/Vector, Kafka, CMDB, and AIOps/ML solutions.
  • Drive SLO/KPI alignment, golden signal monitoring, rollout/rollback health validation, and executive reporting.

Key Skills:

  • Deep expertise in Splunk Enterprise (SPL mastery, CIM alignment, saved searches, macros, KV stores, index/retention/RBAC design, performance tuning).
  • Strong experience with Splunk ITSI (service trees, KPIs, adaptive/time-based thresholds, NEAP tuning, Service Analyzer configuration).
  • Proven OpenShift/Kubernetes observability experience across control-plane metrics, events, logs, workload correlation, and capacity management.
  • Hands-on experience with telemetry pipelines (OpenTelemetry/OTLP, Prometheus exporters, Fluentd/Fluent Bit/Vector, Kafka with TLS, HEC/UF/DS onboarding).
  • Strong understanding of reliability engineering principles (golden signals, SLO design, namespace/application KPI mapping).
  • Experience optimizing performance and licensing costs using workload rules, DMA, and summary indexing.
  • Solid security and compliance knowledge (TLS/mTLS, certificate/token hygiene, PII controls, auditability, role/index mapping).
  • Automation and integration expertise across ITSM, ChatOps, webhooks, CMDB enrichment, and AIOps tooling.

Salary (Rate): £490/day

City: Birmingham

Country: United Kingdom

Working Arrangements: Undetermined

IR35 Status: Undetermined

Seniority Level: Undetermined

Industry: IT

Detailed Description From Employer:

We're looking for a Splunk & OpenShift Observability Engineer to design, deploy, and optimise enterprise-grade monitoring across hybrid Kubernetes and OpenShift environments.

This is a high-impact role where you'll shape observability strategy, enhance service intelligence, and ensure platform reliability at scale, balancing performance, cost efficiency, and security governance.

You'll work at the intersection of platform engineering, observability, and service intelligence, helping to transform raw telemetry into actionable insight. This is an opportunity to influence reliability strategy, improve operational maturity, and deliver measurable value across a modern cloud-native estate.

What You'll Be Doing

  • Design, deploy, and operate Splunk Enterprise and ITSI across hybrid Kubernetes/OpenShift platforms
  • Onboard and normalise data at scale (HEC, Universal Forwarder, Deployment Server), aligning to CIM standards
  • Build and optimise ITSI service models: service trees, KPIs, adaptive thresholds, NEAP policies, glass tables, deep dives, and health scoring
  • Deliver OpenShift-focused executive and operational dashboards, including:
    • Cluster/API/etcd health
    • Node readiness and resource pressure
    • Pod restart trends and noisy-neighbour detection
    • Network and storage error visibility
    • Capacity, quota, and burst analysis
  • Optimise search and platform performance (workload rules, DMA, summary indexing, scheduling hygiene, concurrency tuning)
  • Implement intelligent alerting and automated routing into ITSM and ChatOps platforms, including enrichment, suppression windows, and maintenance scheduling
  • Govern data ingestion and security controls (RBAC, retention, PII handling, TLS, token governance, index and role mapping)
  • Integrate telemetry pipelines including OpenTelemetry, Prometheus, Fluentd/Fluent Bit/Vector, Kafka, CMDB and AIOps/ML solutions
  • Drive SLO/KPI alignment, golden signal monitoring, rollout/rollback health validation, and executive reporting

What You'll Bring

  • Deep expertise in Splunk Enterprise (SPL mastery, CIM alignment, saved searches, macros, KV stores, index/retention/RBAC design, performance tuning)
  • Strong experience with Splunk ITSI (service trees, KPIs, adaptive/time-based thresholds, NEAP tuning, Service Analyzer configuration)
  • Proven OpenShift/Kubernetes observability experience across control-plane metrics, events, logs, workload correlation, and capacity management
  • Hands-on experience with telemetry pipelines (OpenTelemetry/OTLP, Prometheus exporters, Fluentd/Fluent Bit/Vector, Kafka with TLS, HEC/UF/DS onboarding)
  • Strong understanding of reliability engineering principles (golden signals, SLO design, namespace/application KPI mapping)
  • Experience optimising performance and licensing costs using workload rules, DMA, and summary indexing
  • Solid security and compliance knowledge (TLS/mTLS, certificate/token hygiene, PII controls, auditability, role/index mapping)
  • Automation and integration expertise across ITSM, ChatOps, webhooks, CMDB enrichment, and AIOps tooling