Machine Learning Engineer, Inference & Serving (Speech LLM) - San Francisco

About the role

Plaud · Hybrid

About Plaud Inc.

Plaud is building the world's most trusted AI work companion for professionals to elevate productivity and performance through note-taking solutions, loved by over 1,500,000 users worldwide since 2023. With a mission to amplify human intelligence, Plaud is building the next-generation intelligence infrastructure and interfaces to capture, extract, and utilize what you say, hear, see, and think.

Plaud Inc. is a Delaware-incorporated, San Francisco-based company pushing the boundary of human–AI intelligence through a hardware–software combination. With SOC 2, HIPAA, GDPR, ISO27001, ISO27701, and EN18031 compliance, Plaud is committed to the highest standards of data security and privacy protection.

To learn more about Plaud, please visit https://www.Plaud.ai and follow along on Instagram, X, Facebook, LinkedIn, and YouTube.

Why You Should Join Us

Plaud is building the next-generation intelligence infrastructure and interfaces to capture, extract, and utilize intelligence from what people say, hear, see, and think.

  • Plaud is a bootstrapped, skyrocketing, profitable company with a $250M revenue run rate achieved in just three years.

  • Define the next-gen paradigm for human-AI interaction.

  • Gain exposure to cutting-edge AI for Pro tools and play a direct role in our global expansion.

  • Work with passionate teammates who value innovation, collaboration, and customer success.

  • Grow your career in a culture that champions continuous learning and fast career development.

  • Market-competitive compensation, global exposure, and a vibrant, creativity-fueled work atmosphere.

You may be a good fit if you:

  • Have hands-on experience building and deploying high-throughput, ultra-low-latency inference engines for large language models or foundational speech models.

  • Understand the intricate tradeoffs between latency, throughput, and Time-To-First-Token (or Time-To-First-Audio) in real-time streaming environments.

  • Have practical experience with continuous batching, KV cache management (e.g., PagedAttention), and stateful connections necessary for real-time conversational AI.

  • Possess a deep understanding of GPU architectures (NVIDIA Ampere/Hopper) and the memory hierarchy, allowing you to identify and eliminate hardware bottlenecks.

  • Communicate clearly and collaborate effectively, as you will sit at the critical intersection between the core ML training team and the backend infrastructure team.

  • Thrive in fast-moving environments and genuinely enjoy the systems-engineering challenge of squeezing every last drop of performance out of a cluster of GPUs.

  • Are obsessed with building AI systems that natively understand and generate speech, ultimately creating a hardware-software AI companion that amplifies human productivity.

Strong candidates may also have experience with:

  • Frontier Serving Frameworks: Deep, under-the-hood familiarity with modern LLM serving frameworks like vLLM, TensorRT-LLM, SGLang, or NVIDIA Triton Inference Server (bonus points for active open-source contributions to these repositories).

  • Real-Time Audio Streaming: Experience handling continuous audio streams over WebSockets or WebRTC, deploying neural audio codecs, and managing chunked audio generation to minimize conversational latency.

  • Advanced Inference Techniques: Implementing cutting-edge decoding and scheduling techniques such as speculative decoding, lookahead decoding, or chunked prefill.

  • Model Compression & Quantization: Hands-on experience with post-training quantization (PTQ), deploying models in FP8 or INT8 precision or quantized with methods such as AWQ or GPTQ, without degrading audio naturalness or ASR accuracy.

  • Large-Scale Distributed Systems: Deploying multi-GPU (Tensor Parallelism) and multi-node inference pipelines, and managing autoscaling infrastructure using Kubernetes.

What We Offer

  • Founding Team Initiative: Opportunity to be an early, foundational member of our core SpeechLLM lab, with meaningful ownership and impact on a fast-growing startup.

  • Competitive Compensation: $200K - $540K base salary + performance bonus + equity.

  • Comprehensive Benefits: Top-tier healthcare for employees and dependents, including dental and vision, and a generous employer subsidy.

  • Retirement Planning: 401(k) plan for full-time employees with company matching.

  • Paid Time Off: Unlimited PTO, plus 13 paid holidays.

  • New Parent Leave: 12 weeks of paid time off to spend time with your new family, regardless of gender.

  • Hybrid Office: Minimum of 3 days in the office per week to foster highly collaborative, fast-paced research.

  • Gear & Perks: Choice of top-of-the-line laptops/workstations, annual offsites, and a fully stocked office.

Plaud is and will continue to be an equal opportunity employer. We do not discriminate on the basis of race, religion, color, national origin, sex, sexual orientation, age, veteran status, disability, genetic information, or other applicable legally protected characteristics.
