Riverwoods, United States
Summary: We are seeking a Senior Site Reliability/DevOps Engineer to enhance the reliability, scalability, and security of our financial platforms. The ideal candidate will have extensive experience in automating infrastructure and optimizing deployments in a regulated environment. This role involves leading the design of resilient infrastructure and mentoring junior engineers. A strong background in AWS, Kubernetes, and Linux systems is essential for success in this position.
Country: United States
Seniority Level: Senior
Industry: IT
Site Reliability Engineer
Job Description:
We are hiring a Senior Site Reliability/DevOps Engineer to drive the reliability, scalability, and security of our financial platforms. This role is ideal for a seasoned engineer with deep experience in automating infrastructure, optimizing deployments, and building fault-tolerant systems in a regulated, high-stakes environment.
Key Responsibilities:
- Lead the design and implementation of resilient, scalable infrastructure using Infrastructure as Code (Terraform, CloudFormation, etc.)
- Own and optimize CI/CD pipelines and deployment strategies
- Proactively monitor, troubleshoot, and resolve system issues to minimize downtime
- Develop and maintain comprehensive observability solutions—logging, metrics, tracing, and alerting—to ensure full visibility into system performance and reliability
- Support and optimize AWS EMR clusters for data processing workloads, ensuring stability, cost-efficiency, and integration with data pipelines
- Champion automation and DevOps best practices across teams
- Collaborate with security and compliance teams to meet regulatory requirements
- Mentor junior engineers and contribute to architectural decisions
Requirements:
- 8+ years of experience in SRE, DevOps, or infrastructure engineering roles
- Expert-level knowledge of AWS (including EMR), Kubernetes, and Linux systems
- Strong experience with Docker, Terraform, CI/CD tools (e.g., Jenkins, GitLab CI), and scripting (Python, Bash)
- Proven track record managing mission-critical systems in financial or similarly regulated industries
- Deep understanding of observability tools and practices (e.g., Prometheus, Grafana, ELK, OpenTelemetry)
- Hands-on experience deploying, tuning, and managing AWS EMR clusters in production environments
Preferred:
- Experience with SOC 2, PCI DSS, or other compliance frameworks
- Relevant certifications (AWS, Kubernetes, etc.)