Site-Reliability Engineer Remote Location

Posted 1 day ago by 1753951458

Negotiable
Outside
Remote
USA

Summary: The Site-Reliability Engineer role requires a professional with 3-5 years of experience in service reliability operations for high-performance applications in hybrid environments. The position emphasizes automation, cloud transition, and containerization, alongside proficiency in various programming languages and databases. Candidates should also possess strong debugging skills and experience with cloud observability tools. This role is remote and classified as outside IR35.

Key Responsibilities:

  • Manage service reliability operations for large-scale applications in hybrid environments.
  • Write automation scripts and build dashboards for Application Performance Management.
  • Transition platforms to the cloud and implement containerization solutions.
  • Maintain containerized applications in GKE environments.
  • Implement cloud observability using OTEL for real-time monitoring and incident resolution.
  • Utilize networking protocols to troubleshoot issues in high-pressure situations.
  • Manage application availability and improve processes for high availability platforms.
  • Monitor and troubleshoot HashiCorp Vault environments.
  • Work with various monitoring tools and CI/CD extenders.

Key Skills:

  • 3-5 years of service reliability operations experience.
  • Experience with automation scripting and Application Performance Management.
  • Proficiency in programming languages such as Go, Python, Java, and Rust.
  • Knowledge of databases including Oracle, SQL Server, and MongoDB.
  • Experience with cloud platforms like Google Cloud Platform and AWS.
  • Familiarity with containerization and GKE environments.
  • Experience with cloud observability tools and monitoring solutions.
  • Strong debugging skills across integrated technical platforms.
  • Hands-on experience with in-memory caching solutions, particularly Redis.

Salary (Rate): undetermined

City: undetermined

Country: USA

Working Arrangements: remote

IR35 Status: outside IR35

Seniority Level: undetermined

Industry: IT

Detailed Description From Employer:

Phoenix, AZ

Job Description:

>> Minimum 3-5 years of service reliability operations experience running large-scale, high-performance applications in a hybrid environment (on-prem and cloud).

>> Minimum 3-5 years of experience writing automation scripts and building dashboards for Application Performance Management to manage transaction journeys.

>> 2-4 years of experience working with programming languages such as Go, Python, Java, and Rust.

>> Working knowledge of one or more databases: Oracle, PL/SQL, SQL Server, Redis, ClickHouse, PostgreSQL, MongoDB, or any time-series database.

>> At least 2 years of experience transitioning platforms to the cloud and containerization: Google Cloud Platform, AWS, and Rancher (or CloudFormation, Azure, and OpenShift).

>> Experience maintaining containerized applications in GKE/RKE/AKE environments.

>> Experience implementing cloud observability using OTEL to enable real-time monitoring, distributed tracing, and incident resolution.

>> Experience working with a specific GraphQL framework (Apollo, Prisma, Hasura, etc.).

>> Experience applying knowledge of networking protocols and concepts such as TCP/IP, HTTP, DNS, load balancing, and service mesh to troubleshoot issues in high-pressure situations.

>> Proven experience managing application availability, building creative solutions to manage repetitive activities, and improving gating and detection for applications at every touchpoint for a 24x7 high-availability platform exposed to critical clients and customers.

>> Working knowledge of monitoring tools: Splunk, AppDynamics, Grafana/Prometheus, and Dynatrace.

>> Experience with tools like Rally, Confluence, and other CI/CD extenders.

>> Hands-on experience implementing in-memory caching solutions. Experience with Redis is a plus.

>> Excellent debugging skills across a variety of integrated technical platforms on API gateway.

>> Hands-on with GCS, Cloud SQL, PL/SQL, and Spanner.

>> Monitor and troubleshoot HashiCorp Vault environments, ensuring minimal downtime and rapid recovery from incidents.

>> Working knowledge of Vertex AI, Gen AI, and BigQuery.