Machine Learning Engineer - ML Training Platform

About the role

Pluralis Research

Overview

Pluralis Research is pioneering Protocol Learning, a fully decentralised way to train and deploy AI models that opens this layer to individuals rather than only well-resourced corporations. By pooling compute from many participants, incentivising their efforts, and preventing any single party from controlling a model’s full weights, we’re creating a genuinely open, collaborative path to frontier-scale AI.

We’re looking for an ML Training Platform Engineer to architect, build, and scale the foundational infrastructure powering our decentralized ML training platform. You will own core systems spanning infrastructure orchestration, distributed compute, and services integration, enabling continuous experimentation and large-scale model training.

Responsibilities

  • Multi-Cloud Infrastructure: Design resource-management systems that provision and orchestrate compute across AWS, GCP, and Azure using infrastructure-as-code (Pulumi/Terraform). Handle dynamic scaling, state synchronization, and concurrent operations across hundreds of heterogeneous nodes.

  • Distributed Training Systems: Architect fault-tolerant infrastructure for distributed ML: GPU clusters, the NVIDIA runtime, S3 checkpointing, large-dataset management and streaming, health monitoring, and resilient retry strategies.

  • Real-World Networking: Build systems that simulate and handle real-world network conditions (bandwidth shaping, latency injection, packet loss) while managing dynamic node churn and ensuring efficient data flow across workers with heterogeneous connectivity. Our training happens on consumer nodes and non-co-located infrastructure, not in a datacenter.

What You’ll Bring

Ideally, you’ll have 5+ years of professional experience, with depth in:

  • Infrastructure & Platform Engineering: Production experience with infrastructure-as-code (Pulumi/Terraform/CloudFormation) managing multi-cloud deployments, lifecycle orchestration, self-healing systems, Docker/Kubernetes (EKS), GPU workloads, and heterogeneous clusters at scale.

  • Distributed Systems & ML Infrastructure: Deep understanding of distributed training workflows, checkpointing, data sharding, model versioning, long-running job orchestration, decentralized networking (P2P, NAT traversal, traffic shaping), and real-world bandwidth constraints.

  • Systems Programming & Reliability: Strong Python engineering (asyncio, concurrency, retry logic, cloud SDKs, CLI tooling) with hands-on experience in observability, SRE practices, monitoring (Prometheus/Grafana), performance profiling, and incident response.

What we’re looking for

  • Experience in a startup environment with an emphasis on microservices orchestration, or a big-tech background

  • Deep understanding of multi-cloud infra & distributed training systems

  • A team player with high attention to detail

  • A strong desire to join us

Backed by Union Square Ventures and other tier-1 investors, we’re a world-class, deeply technical team of ML researchers. Pluralis is unapologetically ideological. We believe the world will be a better place if we succeed in what we are attempting, and we see Protocol Learning as the only plausible approach to preventing a handful of massive corporations from monopolising model development, access, and release, and from achieving massive economic capture. If this resonates, please apply.
