RH Advanced Cluster Management for Kubernetes Principal Consultant

Posted Today by Daddy Recruiter LLP

Negotiable
Undetermined
Remote

Summary: The role of Principal Consultant for RH Advanced Cluster Management involves leading the strategic implementation of Red Hat Advanced Cluster Management at Truist Bank. The consultant will act as the primary architect and delivery lead, focusing on complex deployment topologies and establishing standards for observability and governance. This position requires a high level of autonomy and the ability to deliver project requirements with minimal oversight. The consultant will also engage with stakeholders to create tailored solutions for performance management and automation.

Key Responsibilities:

  • Architecture Validation & Strategy: Review and finalize ACM architecture, optimize infrastructure management, and define performance specs.
  • Observability & Performance Management: Lead sessions to build Grafana dashboards, design alerting frameworks, and identify resource optimization opportunities.
  • GitOps & Governance Automation: Transition operations to ArgoCD, establish governance processes, and integrate policies into automation pipelines.

Key Skills:

  • Deep expertise in Advanced Cluster Management (ACM) and Multi-cluster Observability (MCO).
  • Proven experience with GitOps and ArgoCD for cluster configurations and automated drift mitigation.
  • Expertise in Grafana and Prometheus for alerting frameworks.
  • Strong proficiency in Ansible for infrastructure automation.
  • Experience in defining and deploying ACM Policies for security and compliance.

Salary (Rate): undetermined

City: undetermined

Country: undetermined

Working Arrangements: remote

IR35 Status: undetermined

Seniority Level: undetermined

Industry: IT

Detailed Description From Employer:

RH Advanced Cluster Management for Kubernetes Principal Consultant
Remote
6-Month Contract

Role Overview
As a Principal Consultant, you will lead the strategic implementation of Red Hat Advanced Cluster Management (RHACM) to transform platform operations for Truist Bank. You will serve as the primary architect and delivery lead, responsible for validating complex deployment topologies and establishing a gold standard for multi-cluster observability, automated governance, and resource optimization. This is a high-autonomy role that requires the ability to deliver requirements and project scope with limited oversight.

Key Responsibilities
1. Architecture Validation & Strategy
Design Authority: Review and finalize ACM architecture, ensuring it supports diverse deployment topologies and critical Disaster Recovery (DR) requirements (Active/Passive configurations).
Infrastructure Synergy: Optimize the co-location of infrastructure management and ArgoCD to ensure a seamless "single pane of glass" for the platform.
Performance Engineering: Define storage and performance specs required to support high-throughput multi-cluster observability and alerting frameworks.
2. Observability & Performance Management
Data-Driven Insights: Lead stakeholder sessions to define and build custom Grafana dashboards that provide actionable data on capacity, network traffic, and workload scaling.
Alerting Framework: Design and implement a performant alerting framework that filters noise and provides SRE teams with discrete, actionable notifications.
Right-Sizing Initiatives: Utilize Multi-cluster Observability (MCO) and auto-scalers (HPA/VPA) to identify over-requested resources and automate application density optimization.
3. GitOps & Governance Automation
Configuration Drift Mitigation: Transition Day-2 operations to ArgoCD, ensuring all cluster configurations (RBAC, network policies, operator installs) are managed as code and automatically reverted if manual drift occurs.
Policy-as-Code: Establish a GitOps-based governance process. Create and roll out ACM Policy Sets to monitor cluster health and security compliance across the entire fleet.
Automation Integration: Integrate ACM Policies and Day-2 configurations into existing Ansible automation pipelines for full lifecycle orchestration.

Core Technical Requirements
Advanced Cluster Management (ACM): Deep expertise in implementing and configuring ACM Multi-cluster Observability (MCO), including managing Multi-cluster Hubs and Spoke clusters.
GitOps & Continuous Delivery: Proven experience using ArgoCD for Day-2 cluster configurations, operator installations, and automated drift mitigation.
Observability Stack: Expert-level capability in Grafana dashboard development and Prometheus Alertmanager for creating actionable, noise-reduced alerting frameworks.
Infrastructure Automation: Strong proficiency in Ansible for automating platform deployments and managing infrastructure-as-code (IaC) workflows.
Policy & Governance: Experience defining and deploying ACM Policies and Policy Sets to enforce security, compliance, and configuration consistency across multiple clusters.



Success Criteria
Independence: Able to translate high-level business goals into a documented, validated implementation plan without day-to-day technical direction.
Customer Centricity: Strong ability to interface with various personas (SRE, Platform, Stakeholders) to extract requirements and build tailored dashboard/alerting solutions.

Additional Requirements
US-based resource; background checks will be required.
Duration: 6 months, 40 hours per week