Deeproute.ai Member of Technical Staff (MTS) - Multimodal Foundation Models

About the role

Deeproute.ai

Focus

Multimodal Foundation Models · Representation Learning · Method Innovation

We are looking for strong technical builders and researchers who understand foundation models and representation learning deeply, beyond simply applying existing frameworks.

Ideal candidates should have:

  • Strong experimental rigor
  • Solid systems and modeling intuition
  • Hands-on engineering ability
  • Interest in scalable multimodal AI systems for real-world autonomy

We value people who can bridge research and production, and who care about robustness, scalability, efficiency, and practical deployment in large-scale autonomous driving systems.

Responsibilities

1. Large-Scale Foundation Model Pretraining

  • Develop scalable pretraining pipelines for large-scale multimodal driving data
  • Design and optimize training strategies for:
    • Vision-language-action models
    • Video foundation models
    • Long-context temporal modeling
    • Multimodal representation alignment
  • Improve:
    • Training stability
    • Data efficiency
    • Scaling efficiency
    • Representation robustness
  • Work on distributed training systems and large-scale model optimization using frameworks such as:
    • PyTorch Distributed
    • DeepSpeed
    • Megatron-LM

2. Representation Learning & Method Innovation

  • Design and improve self-supervised and multimodal learning methods for real-world autonomous driving systems
  • Conduct architecture-level research on:
    • Vision Transformers (ViT)
    • Video / temporal architectures
    • Multimodal fusion and alignment
    • Embedding and retrieval systems
    • Long-context and memory-efficient architectures
  • Explore and improve:
    • Pretraining objectives
    • Loss functions
    • Training paradigms
    • Generalization and robustness
  • Analyze model behavior through:
    • Rigorous ablation studies
    • Failure case analysis
    • Representation probing and evaluation

3. Efficient Foundation Models & Scalable Deployment

  • Improve the efficiency, scalability, and deployability of large multimodal foundation models for real-world autonomous driving systems
  • Work on areas such as:
    • Model quantization
    • Knowledge distillation
    • Efficient attention mechanisms
    • Sparse architectures and Mixture-of-Experts (MoE)
    • Long-context and memory-efficient modeling
    • Inference acceleration and serving optimization
    • Training and inference system efficiency
  • Optimize model throughput, latency, memory usage, and deployment performance for large-scale production environments

Requirements

  1. MS or PhD in:
      • Computer Vision
      • Machine Learning
      • Robotics
      • Computer Science
      • Related fields
  2. Strong understanding of:
      • Foundation models
      • Self-supervised learning
      • Representation learning
      • Multimodal learning
      • Large-scale pretraining
  3. Hands-on experience with methods such as:
      • CLIP
      • DINO / DINOv2
      • MAE
      • Contrastive learning
      • Masked modeling
      • MoE or scalable transformer architectures
  4. Experience with one or more of the following is highly valued:
      • Video foundation models
      • Long-context modeling
      • Retrieval systems
      • Efficient inference
      • Distributed training
      • Model compression and deployment optimization
  5. Strong publication record in top-tier venues is preferred:
      • CVPR
      • ICCV
      • ECCV
      • NeurIPS
      • ICLR
      • ICML