About the role

Embedding VC

Introducing Moonlake, AI for creating real-time interactive content

Mission: Improve throughput, latency, and cost by deploying our models 2–10× faster and cheaper without quality regressions.

Scope of Work:

- GPU performance: CUDA/Triton kernels, FlashAttention family, paged attention, CUDA Graphs.

- Serving stack: TensorRT-LLM/Triton Inference Server, vLLM/TGI; continuous batching; on-GPU KV-cache reuse; speculative decoding/Medusa; mixture-of-agents routing.

- Parallelism: FSDP/ZeRO, TP/PP/expert parallel; NCCL tuning.

- Quantization/PEFT: AWQ/GPTQ/FP8; LoRA/DoRA serving.

- Systems: Ray/Kubernetes/Argo, observability (Prometheus/Grafana/OpenTelemetry), autoscaling, A/B infra, canary + rollback.

Tech signals:

Previous experience at infra-heavy startups such as Databricks or Roblox.

We are committed to being an on-site, in-person team, currently based in San Mateo.


