Companies Embedding VC Member of Technical Staff - ML Infrastructure & Performance

About the role

Embedding VC

Introducing Moonlake, AI for creating real-time interactive content

Mission: Improve Throughput, Latency, & Cost - deploying our models 2–10× faster & cheaper without quality regressions.

Scope of Work:

- GPU performance: CUDA/Triton kernels, FlashAttention family, paged attention, CUDA Graphs.

- Serving stack: TensorRT-LLM/Triton Inference Server, vLLM/TGI; continuous batching; on-GPU KV reuse; speculative decoding/medusa; mixture-of-agents routing.

- Parallelism: FSDP/ZeRO, TP/PP/expert parallel; NCCL tuning.

- Quantization/PEFT: AWQ/GPTQ/FP8; LoRA/DoRA serving.

- Systems: Ray/k8s/Argo, observability (Prom/Grafana/OpenTelemetry), autoscaling, A/B infra, canary + rollback.

Tech signals:

Previous experience at Infra-heavy startups such as Databricks, Roblox

We are committed to being an on-site, in-person team currently based in San Mateo

Ready to apply to Embedding VC?
Apply to Embedding VC

Similar jobs

Sign up for suggestions tailored to the jobs you open and the searches you save.

Apply now
🤖

Whoa — hold up

JobsRadar was built for real people having a rough time in their job search — not for automated requests. You're clicking way too fast and you're now temporarily blocked.

Come back later. If you're genuinely job hunting, we've got your back — just act like a human.

Catch your next role the second it’s posted.

Create a free account and we’ll watch the boards for you — the instant a job matches your search, it lands in your inbox or Telegram. No digging, no refreshing.

Create free account

Free forever · takes 30 seconds · already have one?

Get the worldwide-remote edge.

Join our Telegram channel for the stuff that helps you land the role — salary benchmarks, the weekly market pulse, and new-feature drops. No spam, just signal.

Join the channel — it's free