Synthesia: Senior Research Engineer - Voice

About the role

Synthesia

Synthesia is the world’s leading AI video platform for business, used by over 90% of the Fortune 100. Founded in 2017, the company is headquartered in London, with offices and teams across Europe and the US.

As AI continues to shape the way we live and work, Synthesia develops products to enhance visual communication and enterprise skill development, helping people work better and stay at the center of successful organizations.

Following our recent Series E funding round, where we raised $200 million, our valuation stands at $4 billion. Our total funding exceeds $530 million from premier investors including Accel, NVentures (Nvidia's VC arm), Kleiner Perkins, GV, and Evantic Capital, alongside the founders and operators of Stripe, Datadog, Miro, and Webflow.

What you'll do at Synthesia

As a Research Engineer you will join a team of 40+ Researchers and Engineers within the R&D Department working on cutting-edge challenges in the Generative AI space, with a focus on creating high-quality, expressive and real-time synthetic voices. Within the team you’ll have the opportunity to work on the applied side of our research efforts and directly impact our solutions that are used worldwide by over 60,000 businesses.

If you are an expert in ML, LLMs, speech generation, or conversational models, this is your chance to make a global impact. You will join our Audio Post-Training Team, which works on generative speech and voice synthesis, ensuring our in-house voice models reach production-level quality, speed, and robustness. Typical projects include:

  • Develop and evaluate streaming and speech-to-speech systems, enabling low-latency, interactive voice synthesis.

  • Adapt models for new conditioning inputs (emotion, speed, prosody, speaker control, etc.).

  • Implement post-training optimization techniques (quantization, pruning, distillation) to improve efficiency and latency in real-time speech generation.

  • Integrate and test novel architectures, such as neural codecs, diffusion, or flow-matching models, to enhance realism and responsiveness.

  • Contribute to defining new evaluation metrics for conversational speech, including latency-aware and online MOS prediction systems.

  • Stay updated with the latest research in audio diffusion, autoregressive models, neural codecs, and multimodal LLMs.

  • Apply DPO (Direct Preference Optimization) and distillation to fine-tune large-scale speech models.

What we're looking for:

  • Strong understanding of generative modeling, ideally applied to sequential or multimodal data.

  • Hands-on experience with large language models (LLMs) or similar transformer-based architectures.

  • High proficiency in PyTorch, including experience with distributed training and model optimization.

  • Solid grasp of time-series modeling and tokenization, preferably in the context of audio or speech.

  • Demonstrated ability to prototype quickly, test hypotheses, and iterate efficiently.

  • Proven experience in training deep learning models end-to-end, from data preparation to evaluation.

  • Strong general software engineering skills, enabling contributions to a large, shared research infrastructure.

Nice to have:

  • Experience with real-time or streaming architectures is a big plus.

  • Familiarity with state-of-the-art architectures in audio and speech generation (e.g., diffusion models, neural codecs, flow-matching models, autoregressive decoders).

  • Experience with speech-to-speech or text-to-speech (TTS) systems.

  • Evidence of original research contributions, such as open-source work or publications at top-tier venues (e.g., ICASSP, Interspeech, NeurIPS, ICML).
