Site Reliability Engineer

Posted 1 week ago

Negotiable
Undetermined
Undetermined
Riverwoods, United States

Summary: We are seeking a Senior Site Reliability/DevOps Engineer to enhance the reliability, scalability, and security of our financial platforms. The ideal candidate will have extensive experience in automating infrastructure and optimizing deployments in a regulated environment. This role involves leading the design of resilient infrastructure and mentoring junior engineers. A strong background in AWS, Kubernetes, and Linux systems is essential for success in this position.

Key Responsibilities:

  • Lead the design and implementation of resilient, scalable infrastructure using Infrastructure as Code (Terraform, CloudFormation, etc.)
  • Own and optimize CI/CD pipelines and deployment strategies
  • Proactively monitor, troubleshoot, and resolve system issues to minimize downtime
  • Develop and maintain comprehensive observability solutions—logging, metrics, tracing, and alerting—to ensure full visibility into system performance and reliability
  • Support and optimize AWS EMR clusters for data processing workloads, ensuring stability, cost-efficiency, and integration with data pipelines
  • Champion automation and DevOps best practices across teams
  • Collaborate with security and compliance teams to meet regulatory requirements
  • Mentor junior engineers and contribute to architectural decisions

Key Skills:

  • 8+ years in SRE, DevOps, or infrastructure engineering roles
  • Expert-level knowledge of AWS (including EMR), Kubernetes, and Linux systems
  • Strong experience with Docker, Terraform, CI/CD tools (e.g., Jenkins, GitLab CI), and scripting (Python, Bash)
  • Proven track record managing mission-critical systems in financial or similarly regulated industries
  • Deep understanding of observability tools and practices (e.g., Prometheus, Grafana, ELK, OpenTelemetry)
  • Hands-on experience deploying, tuning, and managing AWS EMR clusters in production environments

Salary (Rate): undetermined

City: undetermined

Country: United States

Working Arrangements: undetermined

IR35 Status: undetermined

Seniority Level: Senior

Industry: IT

Detailed Description From Employer:
Ref: #67936

Site Reliability Engineer

  • Practice: Cloud & Infrastructure
  • Technologies: Infrastructure & Cloud
  • Location: Riverwoods, United States
  • Type: Contract

Job Description:

We are hiring a Senior Site Reliability/DevOps Engineer to drive the reliability, scalability, and security of our financial platforms. This role is ideal for a seasoned engineer with deep experience in automating infrastructure, optimizing deployments, and building fault-tolerant systems in a regulated, high-stakes environment.

Preferred:

  • Experience with SOC 2, PCI, or other compliance frameworks
  • Relevant certifications (AWS, Kubernetes, etc.)