Sr. Site Reliability Engineer - W2 only

Posted 2 weeks ago by 1752650984

Negotiable
Outside
Remote
USA

Summary: The Site Reliability Engineer (SRE) role requires a seasoned professional with over 6 years of experience in implementing monitoring solutions for both on-premises and cloud platforms, specifically Azure. The SRE will focus on building observability and resilience capabilities, collaborating with various engineering teams to enhance application performance and infrastructure monitoring. Key responsibilities include configuring alerts, creating dashboards, and supporting resilience engineering efforts. The position is remote and classified as outside IR35.

Key Responsibilities:

  • Build and configure alerts, tracing, telemetry, and instrumentation for Infrastructure Monitoring and Application Performance Management.
  • Implement dashboards to monitor and share Observability at various levels (engineering teams, portfolio, senior management).
  • Support resilience engineering (application and infrastructure resilience) to meet availability requirements.
  • Collaborate with development engineers, cloud engineers, product teams, and support engineers to gather requirements and evolve observability and resilience solutions.

Key Skills:

  • Good knowledge of Observability and Application Performance Monitoring best practices and KPIs/metrics on cloud platforms.
  • Experience with monitoring tools such as Splunk, Dynatrace, Prometheus, CloudWatch, Azure Monitor, New Relic, and other open-source tools.
  • Experience building monitoring solutions for a variety of workloads such as Microservices (Java / Spring Boot desirable), databases, Kafka, Kubernetes.
  • Experience in resilience engineering and implementing high availability solutions.
  • Experience creating Monitoring dashboards using tools such as Grafana (Preferred), Splunk, Kibana, Power BI.
  • Ability to work in a fast-paced and agile environment.

Salary (Rate): undetermined

City: undetermined

Country: USA

Working Arrangements: remote

IR35 Status: outside IR35

Seniority Level: undetermined

Industry: IT

Detailed Description From Employer:

Site Reliability Engineer
- At least 6 years of experience defining and implementing monitoring solutions (alerts, telemetry, and instrumentation) for on-premises and cloud platforms for large enterprises
- The Site Reliability Engineer will play a key role in building Observability and Resilience capabilities on the cloud platform (Azure).

Responsibilities of the SRE will be:
- Build and configure alerts, tracing, telemetry, and instrumentation required for Infrastructure Monitoring and Application Performance Management.
- Implement dashboards to monitor and share Observability at various levels (engineering teams, portfolio, senior management).
- Support resilience engineering (application and infrastructure resilience) to meet availability requirements.
- Work with development engineers, cloud engineers, product teams, and support engineers to gather requirements, implement, and evolve observability and resilience solutions.

Key Skillsets:
- Good knowledge of Observability and Application Performance Monitoring best practices and KPIs/metrics on cloud platforms
- Experience with monitoring tools such as Splunk, Dynatrace, Prometheus, CloudWatch, Azure Monitor, New Relic, and other open-source tools
- Experience building monitoring solutions for a variety of workloads such as microservices (Java / Spring Boot desirable), databases, Kafka, Kubernetes
- Experience in resilience engineering and implementing high availability solutions
- Experience creating monitoring dashboards using tools such as Grafana (preferred), Splunk, Kibana, Power BI
- Ability to work in a fast-paced and agile environment