Negotiable
Outside
Remote
USA
Summary: The role of Google Cloud Platform Infra SME focuses on leveraging hands-on experience with High-Performance Computing (HPC) within the Google Cloud environment. The position requires expertise in various Google Cloud products and technologies, particularly in infrastructure management and optimization. Candidates must possess a Google Cloud Professional Architect Certification and demonstrate proficiency in tools such as Terraform and GKE. The role is remote and emphasizes practical experience with large-scale GKE clusters and machine learning products.
Key Responsibilities:
- Apply the expertise behind the Google Cloud Professional Architect Certification to guide infrastructure projects.
- Implement and manage HPC solutions on Google Cloud Platform.
- Optimize and troubleshoot large GKE clusters with thousands of nodes.
- Work with GPU/TPU hardware and ML-specific Google Cloud products.
- Develop infrastructure as code (IaC) using Terraform.
- Contribute hands-on to multiple Google Cloud Platform projects.
Key Skills:
- Google Cloud Professional Architect Certification.
- Hands-on experience with HPC and Google Cloud Platform.
- Proficiency in Terraform and GKE.
- Knowledge of networking and storage solutions.
- Experience with Python and libraries such as NumPy, pandas, PyTorch, and JAX.
- Familiarity with Nvidia and/or Google TPU hardware.
- Experience with ML-specific Google Cloud products like Parallelstore and Hyperdisk ML.
Salary (Rate): undetermined
City: undetermined
Country: USA
Working Arrangements: remote
IR35 Status: outside IR35
Seniority Level: undetermined
Industry: IT
Title: Google Cloud Platform Infra SME (HPC Experience)
Location: Remote, USA
Google Cloud Platform Infra SME with hands-on HPC experience:
- Google Cloud Professional Architect Certification (mandatory)
- Google Cloud Platform SME with hands-on HPC infrastructure experience
- GPU/TPU experience
- Hands-on IaC experience is a must
- Terraform
- GKE
- Networking
- Storage
- Python
- Familiarity with NumPy, pandas, PyTorch, and JAX, including optimization
- Experience with Nvidia and/or Google TPU hardware in GCE and GKE
- ML-specific Google Cloud Platform products: Parallelstore, Hyperdisk ML, TCP Direct
- Troubleshooting and optimization of large GKE clusters (thousands of nodes)
Prior experience working with Google PSO is a plus. Hands-on experience on more than three Google Cloud Platform projects involving the infrastructure background described above.