Multimodal ML Engineer at White Circle

About the role

White Circle · Hybrid

TL;DR: We're hiring a Multimodal ML Engineer to train and ship vision, audio, video, and speech models for an AI safety platform that handles 100M+ API calls per month.

About us

White Circle is an AI Safety company building the safety, reliability, and optimization layer for AI systems. At the core of our platform are policies – simple natural-language rules that define what an AI model should and shouldn’t do. We automatically test, enforce, and continuously improve these policies at scale.

  • We’ve raised $11M from top funds, founders, and senior leaders at OpenAI, Anthropic, HuggingFace, Mistral, DeepMind, Datadog, Sentry, and others

  • We process over 100M API calls every month

  • We fine-tune and train our own LLMs so they run faster and cheaper than any open or proprietary model

We’re a small, highly focused team. If you want to work deeply on hard problems, see your work ship to production quickly, and influence how AI safety is actually built – you’re the one we need.

You will

  • Train and fine-tune large-scale multimodal models (vision-language, audio, speech) from scratch and from pretrained checkpoints

  • Extend models across modalities: image understanding, video temporal modeling, long-context processing, and streaming audio

  • Design and run experiments: architecture changes, data mixes, training recipes

  • Build and maintain multimodal data pipelines — from raw images, video, and audio recordings to training-ready datasets, including synthetic data generation

  • Train and optimize MoE architectures for efficient multimodal inference

  • Build alignment pipelines: SFT, DPO, GRPO, reward modeling — across modalities, not just text

  • Optimize models for production: quantization, distillation, batching, streaming and low-latency serving

  • Deploy models end-to-end: from research checkpoint to production serving

  • Define evaluation metrics and benchmarks that actually matter for the product: visual QA, spatial reasoning, video comprehension, speech and audio understanding

You’ll fit right in if you have

  • 3+ years of experience training large-scale deep learning models in multimodal domains (vision-language, audio, speech, or acoustic)

  • Strong PyTorch skills with hands-on distributed training experience (DeepSpeed, FSDP, or similar)

  • Deep experience with multimodal architectures: you understand how vision/audio encoders, projectors, and LLMs fit together (LLaVA, Qwen-VL, InternVL, Audio Flamingo, Qwen-Omni, Qwen-Audio, Whisper, HuBERT, Conformer, or similar)

  • Hands-on experience with RLHF and alignment for multimodal models (GRPO, DPO, reward modeling), not just for text

  • Experience with video and/or audio sequence modeling: temporal modeling, long-context processing, efficient attention, streaming inference

  • Track record of shipping models to production: you've hit latency targets and optimized inference, not just reported benchmark scores

  • Comfort with large-scale multimodal dataset curation: image-text pairs, video-instruction data, audio preprocessing, augmentation, synthetic data generation

  • Familiarity with MoE architectures and their tradeoffs for multimodal workloads

  • Strong engineering fundamentals: clean code, version control, testing, documentation

A big plus:

  • Understanding of audio signal processing fundamentals (spectrograms, mel features, noise reduction)

Why White Circle

  • Paid time off in line with your local regulations, no matter where you work from

  • Work from Paris (hybrid) with a relocation package available, or work from London (note: we are unable to provide relocation support for London-based roles)

  • Comprehensive medical insurance for our France-based team (please note that we are in the process of setting up our UK office and therefore cannot offer medical insurance for London-based roles yet)

  • All the hardware, tools, and services you need

  • Covered subscriptions for AI agents and IDEs

  • Team off-sites twice a year: we’ve recently been to the Alps and to Saint-Tropez


How we hire

  1. Introductory call with HR (25 min)

  2. Take-home test task

  3. Technical interview with our Head of Applied Research (60 min)

  4. Final conversation with our CEO (45 min)
