Wizard: Senior Machine Learning Engineer (Inference Platform)

About Wizard AI

At Wizard AI, we’re building the top-performing AI Shopping Agent that delivers the best products from across the web with unmatched accuracy, quality, and trust. Our ML models power the core of our platform, and we’re looking for a Senior Machine Learning Engineer to own how those models run in production: reliably, efficiently, and at scale.

The Role

As a Senior ML Engineer on our Inference Platform, you’ll own the end-to-end lifecycle of production ML serving systems, from model packaging and deployment to monitoring, optimization, and scaling. This is not a traditional MLOps role focused solely on pipelines and tooling. You’ll be responsible for the inference infrastructure powering a live conversational shopping agent, operating multiple specialized serving engines under real-world production load.

You’ll own critical decisions around serving architecture, performance, reliability, and scalability, working closely with ML Engineers, Data teams, Product, and DevOps to ensure models move seamlessly from experimentation into high-performance production systems.

What You'll Do

  • Own and evolve our multi-engine inference platform, supporting a variety of model types and serving requirements.
  • Build and improve production ML pipelines — taking models from experimentation to reliable, high-throughput serving.
  • Define and implement model versioning, rollout, rollback, and lifecycle management strategies that ensure reproducibility and operational reliability.
  • Define and enforce serving-layer SLAs, including latency, availability, GPU utilization, Time-to-First-Token (TTFT), and Inter-Token Latency (ITL).
  • Build observability, monitoring, alerting, and operational tooling for production inference systems.
  • Apply software engineering best practices, including testing, CI/CD integration, and reproducibility across ML workflows.
  • Optimize inference performance through efficient resource utilization, hardware-aware serving strategies, and cost-conscious infrastructure design.
  • Ensure ML serving systems are secure, scalable, and operationally resilient.
  • Partner with ML, Data, Product, and DevOps teams to turn ideas into production systems, driving the technical decisions on serving and scale.

What We're Looking For

  • Bachelor's or Master's degree in Computer Science, Data Science, Engineering, or a related field, or equivalent practical experience.
  • 5–8+ years of experience in Software Engineering, ML Engineering, Platform Engineering, or Infrastructure Engineering, with direct ownership of production ML serving systems.
  • Hands-on experience running an LLM serving engine (vLLM, TGI, TensorRT-LLM, or SGLang) in production under real load — not just managed or hosted endpoints.
  • Strong Python skills and software engineering fundamentals, combined with deep systems and infrastructure knowledge.
  • Experience with cloud platforms such as AWS, GCP, or Azure, and familiarity with ML lifecycle tooling, experimentation platforms, and model registries.
  • Strong grasp of inference performance — continuous batching, KV-cache and GPU-memory behavior, quantization, and CPU-versus-GPU bottlenecks — with the instinct to profile before tuning.
  • Experience serving heterogeneous workloads, including LLMs, embedding models, and extraction models, each with distinct latency, throughput, and scaling requirements.
  • Demonstrated ability to balance latency, throughput, reliability, and infrastructure cost while operating production-scale ML systems.
  • Experience in high-growth startup environments and comfort operating in fast-moving, evolving technical landscapes.

What Success Looks Like

Reliable, Scalable Inference Systems

Production serving infrastructure operates with clear SLAs, strong observability, and minimal downtime. Latency, availability, throughput, and GPU utilization are actively measured and optimized as platform demands grow.

End-to-End Ownership

You own the complete serving lifecycle — from deployment and release management through monitoring, optimization, and scaling — enabling ML engineers to ship quickly while maintaining reliability and reproducibility.

Technical Leadership and Impact

You shape the future of Wizard's inference platform, driving key architectural decisions that improve performance, reduce infrastructure costs, and support the next generation of AI-powered shopping experiences.

Similar jobs

Anduril Industries
Senior Machine Learning Engineer, Sentry Tower
Irvine, California, United Sta... · location restricted · $220,000–$330,000

Block
Staff Applied Machine Learning Engineer - Fraud & Abuse
Bay Area, CA, United States of... · Onsite · $276,800–$415,200

Block
Staff Applied Machine Learning Engineer - Intelligent Data, Signals & Systems
Bay Area, CA, United States of... · Onsite · $276,800–$415,200

Nebius
ML Infrastructure Engineer
Amsterdam, Netherlands; Remote... · location restricted

Nebius
Senior Applied ML Engineer (Agentic Search)
Amsterdam, Netherlands; London... · location restricted

Nebius
Senior ML Engineer (AI Research)
Amsterdam, Netherlands; Israel... · location restricted

Reddit
Machine Learning Engineer, Ads Optimization & Ads Marketplace Quality
Remote - United States · location restricted · $185,800–$303,400

Reddit
Machine Learning Systems Engineer, Ads ML Platform
Remote - The Netherlands · location restricted

Reddit
Machine Learning Systems Engineer, Ads ML Platform
Remote - United Kingdom · location restricted
