Rate: Negotiable
IR35: Outside
Working arrangement: Remote
Location: Manchester, England, United Kingdom
Summary: The Senior Machine Learning Engineer (Speech / Voice AI) will develop an in-house voice generation and audio delivery system for a technology-enabled wellbeing platform that supports neurodiverse users and individuals with disabilities. The engineer will build a text-to-speech (TTS) capability that improves accessibility and emotional engagement, and will implement multilingual support and dynamic personalization. The position is a fully remote, 3-month contract classified as outside IR35. The ideal candidate has a strong background in machine learning and audio processing, with experience deploying modern TTS models.
Senior Machine Learning Engineer - Speech / Voice AI (remote)
Contract length: 3 months
IR35 determination: Outside
Location: Fully remote
Our client is a technology-enabled wellbeing platform that supports neurodiverse users and individuals with disabilities to thrive in education, work, and everyday life. They offer businesses a personal productivity app featuring tools for task breakdown, priority-setting, and structured support to manage anxiety, procrastination, and executive dysfunction. The platform combines tailored learning resources, assistive technology guidance, and mental health content in one accessible space. It serves both students and professionals, helping them build resilience, independence, and sustainable wellbeing through behaviour-change frameworks.
Our client is looking for someone to:
- Develop an in-house voice generation and audio delivery system to enhance accessibility and emotional engagement.
- Build a text-to-speech capability that produces natural, empathetic voices for guided exercises and wellbeing content.
- Implement multilingual functionality and customizable voice tones to support diverse user needs.
- Enable dynamic personalization so users receive content in voices and styles suited to their preferences.
- Integrate the audio system seamlessly with the existing app and backend for real-time playback and consistency across devices.
- Create an inclusive, emotionally intelligent audio experience that deepens user connection and supports lasting behavioural wellbeing.
Required Skills
- Strong background in Machine Learning / Deep Learning with hands-on experience in speech or audio processing.
- Experience fine-tuning or deploying modern TTS models (e.g. VITS, Bark, or FastSpeech2).
- Proficiency in PyTorch (or a similar framework) and comfort optimizing GPU inference.
- Experience deploying ML models to production and integrating via APIs.
- Familiarity with AWS, GCP, or Azure for scalable deployment.
Desirable
- Understanding of speaker cloning or emotional prosody control.
- Experience with multilingual TTS or phoneme alignment.
- Interest in ethical AI and accessible, emotionally sensitive applications.
This is an exciting opportunity to help shape an inclusive AI experience that brings empathy and accessibility to users around the world.
Robert Walters Operations Limited is an employment business and employment agency and welcomes applications from all candidates.