Data Scientist

Posted 2 weeks ago by 1755586494

Negotiable
Outside
Remote
USA

Summary: The Data Scientist role is focused on leveraging expertise in data science, machine learning, and large language models to build and optimize solutions within the Databricks and Azure ecosystems. The position requires hands-on experience with PySpark and involves collaboration with various stakeholders to integrate ML models into production workflows. The ideal candidate will stay updated on advancements in AI/ML and document methodologies for knowledge sharing. This is a long-term contract position that allows for remote work.

Salary (Rate): undetermined

City: undetermined

Country: USA

Working Arrangements: remote

IR35 Status: outside IR35

Seniority Level: undetermined

Industry: IT

Detailed Description From Employer:

Job Title: Data Scientist

Location: Atlanta, GA (Remote)

Job Type: Long-Term Contract

About the Role

We are seeking a highly motivated and skilled Data Scientist with strong expertise in data science fundamentals, machine learning (ML), and large language models (LLMs). The ideal candidate will have hands-on experience working with Databricks and Azure ecosystems, including PySpark for data processing and LLM tuning within Databricks. This role involves building and optimizing data science solutions that leverage cloud-based technologies to deliver business value.

Key Responsibilities

  • Design, develop, and deploy data science and ML solutions on Databricks (Azure environment).
  • Work on the end-to-end ML lifecycle, from data preparation and feature engineering to model training, evaluation, and deployment.
  • Apply LLM fine-tuning and optimization techniques within Databricks for domain-specific use cases.
  • Utilize PySpark for distributed data processing, cleaning, and transformation.
  • Collaborate with data engineers, cloud architects, and business stakeholders to ensure seamless integration of ML models into production workflows.
  • Conduct exploratory data analysis (EDA), statistical modeling, and hypothesis testing to extract insights from structured and unstructured data.
  • Stay updated on the latest advancements in AI/ML, LLMs, and Databricks capabilities to bring innovative solutions.
  • Document methodologies, experiments, and best practices for knowledge sharing.

Required Skills & Qualifications

  • Bachelor's/Master's degree in Computer Science, Data Science, Statistics, AI/ML, or a related field.
  • Proven experience as a Data Scientist with exposure to ML and NLP projects.
  • Strong hands-on experience with Databricks on Azure (MLflow, Delta Lake, Databricks ML).
  • Proficiency in PySpark for large-scale data processing.
  • Experience in training, fine-tuning, and deploying LLMs within the Databricks environment.
  • Strong programming skills in Python and familiarity with ML frameworks (TensorFlow, PyTorch, Scikit-learn, Hugging Face).
  • Solid understanding of data science workflows: data wrangling, feature engineering, model development, and evaluation.
  • Working knowledge of Azure cloud services (Azure Data Lake, Azure Synapse, Azure ML).
  • Strong problem-solving, analytical thinking, and communication skills.

Good-to-Have Skills

  • Experience with MLOps practices and tools (CI/CD for ML, MLflow).
  • Knowledge of vector databases and LLM deployment pipelines.
  • Familiarity with prompt engineering and RAG (Retrieval-Augmented Generation) techniques.
  • Exposure to generative AI projects on cloud platforms.