Rate: Negotiable
IR35 Status: Inside
Working Arrangements: Remote
Country: United Kingdom
Summary: An experienced Contract Data Scientist/ML Engineer is sought for a significant Generative AI project within a secure public-sector environment. The position focuses on transforming complex datasets for AI applications and requires active SC clearance. It is fully remote with occasional travel within the UK and is classified as inside IR35.
Key Responsibilities:
- Analyse, structure, and transform complex, messy datasets into machine-readable formats suitable for LLMs.
- Design and optimise RAG datasets, embeddings pipelines, and retrieval strategies.
- Implement and evaluate embeddings-based search using vector databases.
- Conduct robust EDA, data quality assessment, and anomaly detection.
- Translate manual human processes into clear, machine-interpretable logic for GenAI integration.
- Deliver modular, production-ready Python code with minimal oversight.
- Evaluate LLM and RAG system performance using modern metrics and techniques.
Key Skills:
- Strong Python engineering skills (exploratory + production-ready).
- Comprehensive EDA and data analysis capability.
- Expertise in LLM data preparation, including:
  - Prompt engineering fundamentals.
  - Embeddings & vector databases (FAISS, Weaviate, Chroma).
  - RAG dataset design & retrieval optimisation (chunking strategies, hybrid search, re-ranking).
  - Evaluation techniques for RAG (retrieval scoring, LLM-as-a-judge, hallucination checks).
- Ability to convert unstructured, ambiguous data into structured, validated datasets.
- Strong understanding of data quality, validation, and documenting assumptions.
- Clear communication of technical findings to both technical and non-technical audiences.
- Familiarity with AWS is beneficial.
Salary (Rate): undetermined
City: undetermined
Country: United Kingdom
Working Arrangements: remote
IR35 Status: inside IR35
Seniority Level: undetermined
Industry: IT
I'm partnering with a specialist UK technology consultancy to support the hire of an experienced Contract Data Scientist/ML Engineer for a major Generative AI project within a secure public-sector environment.
This is an opportunity to work on high-impact AI initiatives, helping to redesign complex human-driven processes through LLMs and advanced retrieval systems. The work is fully remote with periodic UK travel, and active SC clearance is essential.
This role is inside IR35 and fully remote.
Key Responsibilities
- Analyse, structure, and transform complex, messy datasets into machine-readable formats suitable for LLMs.
- Design and optimise RAG datasets, embeddings pipelines, and retrieval strategies.
- Implement and evaluate embeddings-based search using vector databases.
- Conduct robust EDA, data quality assessment, and anomaly detection.
- Translate manual human processes into clear, machine-interpretable logic for GenAI integration.
- Deliver modular, production-ready Python code with minimal oversight.
- Evaluate LLM and RAG system performance using modern metrics and techniques.
Technical Skills
- Strong Python engineering skills (exploratory + production-ready).
- Comprehensive EDA and data analysis capability.
- Expertise in LLM data preparation, including:
  - Prompt engineering fundamentals.
  - Embeddings & vector databases (FAISS, Weaviate, Chroma).
  - RAG dataset design & retrieval optimisation (chunking strategies, hybrid search, re-ranking).
  - Evaluation techniques for RAG (retrieval scoring, LLM-as-a-judge, hallucination checks).
- Ability to convert unstructured, ambiguous data into structured, validated datasets.
- Strong understanding of data quality, validation, and documenting assumptions.
- Clear communication of technical findings to both technical and non-technical audiences.
- Familiarity with AWS is beneficial.