MLOps Architect

Posted 6 days ago by 1751525454

Negotiable
Outside
Remote
USA

Summary: The role of Lead MLOps Engineer involves leading the operationalization of Machine Learning workloads, focusing on designing, building, and maintaining the necessary infrastructure for efficient development and deployment. The position requires close collaboration with data scientists to ensure model reliability and performance, alongside expertise in automating ML workflows and continuous integration. The ideal candidate will have extensive experience in MLOps, particularly within AWS environments. This role is remote and emphasizes a strong understanding of MLOps tools and methodologies.

Key Responsibilities:

  • Architect scalable, cost-efficient, reliable, and secure MLOps solutions.
  • Design, implement, and deploy MLOps solutions in AWS.
  • Select and justify appropriate ML technologies within AWS, and identify the AWS services needed to implement MLOps solutions.
  • Design, build, and maintain infrastructure required for efficient development, deployment, and monitoring of machine learning models.
  • Implement CI/CD pipelines for machine learning applications to ensure smooth development and deployment processes.
  • Collaborate with data scientists to understand and implement requirements for model serving, versioning, and reproducibility.
  • Monitor and optimize model performance in production, identifying and resolving issues proactively to ensure optimal results.
  • Automate repetitive tasks to improve efficiency and reduce the risk of human error in MLOps workflows.
  • Maintain documentation and provide training to team members on MLOps best practices, ensuring knowledge sharing and collaboration within the team.
  • Stay updated with the latest developments in MLOps tools, technologies, and methodologies to remain current and effective in your role.

Key Skills:

  • 10+ years of experience in MLOps with a proven track record of successful deployments.
  • In-depth working knowledge of MLOps tools and platforms (Kubernetes, Docker, Jenkins, Git, MLflow, JupyterHub, LLM-specific tooling).
  • In-depth working knowledge of AWS and infrastructure as code (IaC) principles.
  • Strong experience with DevOps methodologies and CI/CD tools such as GitHub Actions.
  • Strong understanding of machine learning pipelines, model training frameworks, and monitoring techniques.
  • Strong programming skills in Python.
  • Experience with ML frameworks such as TensorFlow, PyTorch, and/or scikit-learn.
  • Strong understanding of the machine learning lifecycle, including data preprocessing, model training, evaluation, and deployment.
  • Experience with large language models (LLMs) and their unique operational considerations is a plus.
  • Excellent communication, collaboration, and problem-solving skills.
  • The ability to translate technical concepts into clear and concise language.
  • A passion for innovation and a drive to optimize ML and LLM workflows.
  • 12+ years of experience in MLOps, DevOps, or related fields.
  • Hands-on experience with AWS.
  • Familiarity with containerization and orchestration tools like Docker and Kubernetes.
  • In-depth knowledge of infrastructure-as-code tools such as AWS CDK and CloudFormation.
  • Excellent problem-solving skills and the ability to work independently as well as part of a team.
  • Strong communication skills and the ability to explain complex technical concepts to non-technical stakeholders.

Salary (Rate): undetermined

City: New York

Country: USA

Working Arrangements: remote

IR35 Status: outside IR35

Seniority Level: undetermined

Industry: IT

Detailed Description From Employer:
  • We're seeking an experienced Lead MLOps Engineer to lead the operationalization of our Machine Learning workloads.
  • As a key team member, you'll be responsible for designing, building, and maintaining the infrastructure required for efficient development, deployment, and monitoring of machine learning workloads. Your close collaboration with data scientists will ensure that our models are reliable, scalable, and performing optimally.
  • This role requires expertise in automating ML workflows, enhancing model reproducibility, and ensuring continuous integration and delivery.

Location: -

  • New York, NY (Remote)

Educational Qualifications: -

  • Bachelor's or master's degree in Computer Science, Engineering, or a related field.

Preferred Qualifications: -

  • AWS Certified Machine Learning Specialty
  • Experience with A/B testing and model performance monitoring