Summary: The Data Scientist role is focused on leveraging expertise in data science, machine learning, and large language models to build and optimize solutions within the Databricks and Azure ecosystems. The position requires hands-on experience with PySpark and involves collaboration with various stakeholders to integrate ML models into production workflows. The ideal candidate will stay updated on advancements in AI/ML and document methodologies for knowledge sharing. This is a long-term contract position that allows for remote work.
Salary (Rate): Negotiable
City: Atlanta
Country: USA
Working Arrangements: remote
IR35 Status: outside IR35
Seniority Level: undetermined
Industry: IT
Job Title: Data Scientist
Location: Atlanta, GA (Remote)
Job Type: Long-Term Contract
About the Role
We are seeking a highly motivated and skilled Data Scientist with strong expertise in data science fundamentals, machine learning (ML), and large language models (LLMs). The ideal candidate will have hands-on experience working with Databricks and Azure ecosystems, including PySpark for data processing and LLM tuning within Databricks. This role involves building and optimizing data science solutions that leverage cloud-based technologies to deliver business value.
Key Responsibilities
- Design, develop, and deploy data science and ML solutions on Databricks (Azure environment).
- Work across the end-to-end ML lifecycle, from data preparation and feature engineering to model training, evaluation, and deployment.
- Apply LLM fine-tuning and optimization techniques within Databricks for domain-specific use cases.
- Utilize PySpark for distributed data processing, cleaning, and transformation (see the sketch after this list).
- Collaborate with data engineers, cloud architects, and business stakeholders to ensure seamless integration of ML models into production workflows.
- Conduct exploratory data analysis (EDA), statistical modeling, and hypothesis testing to extract insights from structured and unstructured data.
- Stay current with the latest advancements in AI/ML, LLMs, and Databricks capabilities to propose innovative solutions.
- Document methodologies, experiments, and best practices for knowledge sharing.
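As a minimal sketch of the kind of PySpark data preparation described above: the snippet below reads a hypothetical transactions dataset, cleans it, derives simple aggregate features, and writes them back as a Delta table. The paths and column names (transaction_id, customer_id, amount, event_ts) are illustrative assumptions, not details from this posting.

```python
# Minimal PySpark sketch: clean and transform a hypothetical transactions dataset.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("feature-prep").getOrCreate()

raw = spark.read.parquet("/mnt/datalake/transactions")  # hypothetical source path

features = (
    raw
    .dropDuplicates(["transaction_id"])                  # remove duplicate events
    .filter(F.col("amount").isNotNull())                 # drop rows missing the amount
    .withColumn("event_date", F.to_date("event_ts"))     # normalize timestamp to date
    .groupBy("customer_id")
    .agg(
        F.sum("amount").alias("total_spend"),            # simple aggregate features
        F.countDistinct("event_date").alias("active_days"),
    )
)

# Persist the feature table in Delta format (hypothetical destination path).
features.write.mode("overwrite").format("delta").save("/mnt/datalake/features/customer_spend")
```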
Required Skills & Qualifications
- Bachelor's/Master's degree in Computer Science, Data Science, Statistics, AI/ML, or a related field.
- Proven experience as a Data Scientist with exposure to ML and NLP projects.
- Strong hands-on experience with Databricks on Azure (MLflow, Delta Lake, Databricks ML); a brief MLflow tracking sketch follows this list.
- Proficiency in PySpark for large-scale data processing.
- Experience in training, fine-tuning, and deploying LLMs within the Databricks environment.
- Strong programming skills in Python and familiarity with ML frameworks (TensorFlow, PyTorch, Scikit-learn, Hugging Face).
- Solid understanding of data science workflows: data wrangling, feature engineering, model development, and evaluation.
- Working knowledge of Azure cloud services (Azure Data Lake, Azure Synapse, Azure ML).
- Strong problem-solving, analytical thinking, and communication skills.
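As a rough illustration of the MLflow experience listed above, the sketch below trains a scikit-learn model and logs its parameters, metric, and artifact with MLflow, as is commonly done on Databricks. The synthetic dataset and hyperparameters are placeholders, not requirements of the role.

```python
# Minimal MLflow experiment-tracking sketch with a placeholder dataset.
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1_000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

with mlflow.start_run(run_name="rf-baseline"):
    params = {"n_estimators": 200, "max_depth": 8}
    model = RandomForestClassifier(**params, random_state=42).fit(X_train, y_train)

    mlflow.log_params(params)                                # record hyperparameters
    mlflow.log_metric("accuracy", accuracy_score(y_test, model.predict(X_test)))
    mlflow.sklearn.log_model(model, artifact_path="model")   # persist the fitted model
```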
Good-to-Have Skills
- Experience with MLOps practices and tools (CI/CD for ML, MLflow).
- Knowledge of vector databases and LLM deployment pipelines.
- Familiarity with prompt engineering and RAG (Retrieval-Augmented Generation) techniques; a minimal retrieval sketch follows this list.
- Exposure to generative AI projects on cloud platforms.
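To illustrate the RAG familiarity mentioned above, the following minimal sketch shows only the retrieval step: embedding documents and a query, ranking by cosine similarity, and assembling a prompt context. The embed() function here is a hypothetical stand-in for a real embedding model, and the documents are made up for the example.

```python
# Minimal RAG retrieval sketch: rank documents against a query, then build a prompt context.
import numpy as np

def embed(texts: list[str]) -> np.ndarray:
    """Hypothetical embedding function; replace with a real embedding model."""
    rng = np.random.default_rng(abs(hash(tuple(texts))) % (2**32))
    return rng.normal(size=(len(texts), 384))

documents = [
    "Delta Lake tables can store ML features on Databricks.",
    "MLflow tracks experiments, parameters, and model artifacts.",
    "PySpark distributes data transformations across a cluster.",
]
question = "How do I track model experiments?"

doc_vecs = embed(documents)
query_vec = embed([question])[0]

# Cosine similarity between the query and each document, then keep the top matches.
scores = doc_vecs @ query_vec / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec))
top_k = np.argsort(scores)[::-1][:2]

context = "\n".join(documents[i] for i in top_k)
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
print(prompt)
```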