About this Senior ML Research Engineer, Marengo role at TwelveLabs

TwelveLabs · Hybrid · Seoul, South Korea

Who we are

At TwelveLabs, we are pioneering cutting-edge multimodal foundation models that comprehend videos the way humans do. Our models have redefined the standards in video-language modeling, enabling more intuitive and far-reaching capabilities and fundamentally transforming the way we interact with and analyze various forms of media.

With $110+ million in Seed and Series A funding, our company is backed by top-tier venture capital firms such as NVIDIA’s NVentures, NEA, Radical Ventures, and Index Ventures, and by prominent AI visionaries and founders such as Fei-Fei Li, Silvio Savarese, Alexandr Wang, and more. Headquartered in San Francisco, with an influential APAC presence in Seoul, our global footprint underscores our commitment to driving worldwide innovation.

Our partnership with NVIDIA and AWS gives us access to the most advanced chips, including B300s, enabling us to push the boundaries of what's possible in video AI.

We are a global company that values the uniqueness of each person’s journey. It is the differences in our cultural, educational, and life experiences that allow us to constantly challenge the status quo. We are looking for individuals who are motivated by our mission and eager to make an impact as we push the bounds of technology to transform the world. Join us as we revolutionize video understanding and multimodal AI.

About the Team

This team owns the research and development of Marengo, TwelveLabs’ multimodal embedding model. We develop foundation models that bring video, audio, and text into a shared embedding space, powering state-of-the-art multimodal understanding and retrieval.

End-to-end model development: We work across a broad range of research areas, including contrastive learning, temporal video understanding, and multimodal representation learning. The team owns the entire model development lifecycle—from building large-scale training datasets and designing model architectures to optimizing distributed training and developing robust evaluation frameworks.

Research at scale: With access to world-class compute infrastructure, including NVIDIA B300 GPUs, we rapidly iterate on large-scale experiments, enabling fast progress on ambitious research problems.

Research with real-world impact: The path from research to production is exceptionally short. We work closely with the Search, Product, and Infrastructure teams to continuously improve the models that power multimodal search and understanding for thousands of customers worldwide.

About the Role

As a Senior ML Research Engineer on the Marengo team, you will drive the research and development of TwelveLabs' multimodal embedding models, from data strategy and training pipeline optimization to model architecture experimentation and evaluation.

This is a research-heavy engineering role at the intersection of multimodal representation learning, large-scale distributed training, and data engineering. We're looking for a strong engineer-researcher who can take well-scoped research problems with moderate ambiguity, design rigorous experiments, and deliver reproducible results that ship to production.

In this role, you will

Design and execute experiments to improve multimodal embedding model quality, spanning model architecture, training methodology, data composition, and evaluation
Build and optimize large-scale distributed training pipelines (multi-node, multi-GPU) for contrastive and representation learning
Develop and improve data curation, filtering, and quality assessment pipelines at scale
Conduct ablation studies to systematically evaluate design choices and communicate findings to guide technical direction
Implement evaluation frameworks and benchmarks that rigorously measure embedding model quality
Collaborate with the search/serving team to ensure model improvements translate to end-to-end retrieval quality gains

Even if you don't check every box, we encourage you to apply.

If you're a zero-to-one achiever, a ferocious learner, and a kind team player who motivates others, you'll find a home at TwelveLabs.

You may be a good fit if you have

4–7 years of industry experience in computer vision, NLP, or multimodal learning, with a track record of shipping ML systems to production
Strong proficiency in Python and PyTorch, with hands-on experience in distributed model training
Experience in contrastive learning, representation learning, or embedding models, demonstrated through shipped products, publications, or open-source contributions
End-to-end ownership experience: taking a model from research idea through training to production deployment, not just running experiments in isolation
Ability to independently drive research projects from problem definition through experiment design to conclusions
Effective communication skills for collaborating with colleagues from diverse backgrounds

We evaluate based on relevant technical skills and industry impact rather than degrees alone. This role is typically a strong fit for engineers with an MS and meaningful industry experience building ML systems at scale.

Preferred Qualifications

Experience with temporal video understanding (segmentation, boundary detection, temporal grounding)
Experience with large-scale data curation (filtering, deduplication, quality scoring) for model training
Experience with training infrastructure optimization (mixed precision, gradient checkpointing, communication backends)
Familiarity with experiment tracking and reproducibility tools
Experience with petabyte-scale data processing

What makes this role unique

The gap between research and production is remarkably short here. Models you build will be used by thousands of companies worldwide within months. We work as a unified team toward the broader goal of video understanding, rather than solving isolated problems. Our research philosophy balances rigorous experimentation with real-world application: we aim to build multimodal systems that are powerful, trustworthy, and genuinely useful.

Others

Work Location: Seoul Itaewon office + Pangyo satellite office
Additional Info: Enrollment or transfer as Technical Research Personnel (전문연구요원, alternative military service) is possible.

Hiring Process

Application Review → Recruiter Interview (remote, 30 min) → Loop Interview [Hiring Manager Interview & Live Coding Test Interview] (in person, approx. 90 min) → System Design Interview (in person, approx. 60 min) → Final Round Interview (remote, approx. 30 min) → Reference Check → Offer

Benefits and Perks

Growth & Tools
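- A global team that grows alongside global B2B customers
- Hybrid work that combines autonomy and collaboration
- Latest MacBook and home-office equipment worth KRW 700,000 provided, with equipment replaced by the latest models every 3 years
- Tokens never sleep: unlimited LLM tokens for tech roles
- Self-development allowance worth KRW 1.4 million per year, usable for courses, conferences, memberships, and more
- English education program and global buddy program
- Taxi fare support for late-night and weekend commutes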
Meal & Snack
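- Corporate card worth KRW 7.2 million per year, freely usable for meals, transportation, and more
- In-office snack bar (snacks, coffee, seasonal fruit, and more)
- Dinner allowance after 7 p.m. when working in the office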
Wellness & Family
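- Annual health checkup for you and one family member
- Group insurance enrollment (choose one: accident insurance, dental insurance, or family accident insurance)
- Flu vaccination cost support
- Two-week paid Holiday Break at year-end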
