Rate: Negotiable
IR35: Outside
Remote, USA
Summary: The role is for a Google Cloud Platform Infrastructure Subject Matter Expert (SME) with hands-on experience in High-Performance Computing (HPC). The position requires extensive knowledge of Google Cloud services and infrastructure, particularly in managing large-scale Google Kubernetes Engine (GKE) clusters. The candidate must hold the Google Cloud Professional Cloud Architect certification and have practical experience with cloud infrastructure tools and technologies such as Terraform, GPU/TPU hardware, networking, and storage.
Key Responsibilities:
- Provide expertise as a Google Cloud Platform SME with hands-on HPC experience.
- Manage and optimize large GKE clusters, including troubleshooting and performance tuning.
- Utilize GPU/TPU hardware effectively within Google Cloud environments.
- Implement Infrastructure as Code (IaC) practices using tools like Terraform.
- Work with networking, storage, and Python libraries relevant to machine learning.
- Take a hands-on infrastructure role across multiple Google Cloud Platform projects.
- Collaborate with the Google Professional Services Organization (PSO) when applicable.
Key Skills:
- Google Cloud Professional Cloud Architect certification.
- Hands-on experience with Google Cloud Platform and HPC.
- Proficiency in GPU/TPU technologies.
- Familiarity with Infrastructure as Code (IaC) tools, particularly Terraform.
- Experience with Google Kubernetes Engine (GKE).
- Strong networking and storage knowledge.
- Proficiency in Python and libraries such as NumPy, pandas, PyTorch, and JAX, including optimization.
- Experience with ML-specific Google Cloud products such as Parallelstore, Hyperdisk ML, and TCP Direct.
- Ability to troubleshoot and optimize large-scale GKE clusters.
- Prior experience with Google PSO is a plus.
Salary (Rate): undetermined
City: undetermined
Country: USA
Working Arrangements: remote
IR35 Status: outside IR35
Seniority Level: undetermined
Industry: IT
Role: Google Cloud Platform Infra SME with hands-on HPC experience
Location: US (Remote)
Experience: 10 years
What Is Needed:
- Google Cloud Professional Cloud Architect certification is mandatory
- Google Cloud Platform SME with hands-on HPC experience (infrastructure)
- GPU/TPU experience
- Hands-on familiarity with Infrastructure as Code (IaC) is a must
- Terraform
- GKE
- Networking
- Storage
- Python
- Library familiarity with NumPy, pandas, PyTorch, and JAX, including optimization
- Experience with NVIDIA GPU and/or Google TPU hardware in Google Compute Engine (GCE) and GKE
- ML-specific Google Cloud Platform products: Parallelstore, Hyperdisk ML, TCP Direct
- Troubleshooting and optimization of large (1000s of nodes) GKE clusters
- Prior experience working with Google PSO is a plus.
- More than three Google Cloud Platform projects with hands-on experience in the infrastructure areas listed above.