Companies Ifm Us Data Engineer

About the role

About the Institute of Foundation Models
We are a dedicated research lab for building, understanding, using, and risk-managing foundation models. Our mandate is to advance research, nurture the next generation of AI builders, and drive transformative contributions to a knowledge-driven economy.

As part of our team, you’ll have the opportunity to work on the core of cutting-edge foundation model training, alongside world-class researchers, data scientists, and engineers, tackling the most fundamental and impactful challenges in AI development. You will participate in the development of groundbreaking AI solutions that have the potential to reshape entire industries. Strategic and innovative problem-solving skills will be instrumental in establishing MBZUAI as a global hub for high-performance computing in deep learning, driving impactful discoveries that inspire the next generation of AI pioneers.



The Role
 
As a Data Engineer specializing in Natural Language Processing (NLP) and large-scale data processing, you will quickly and effectively gather, curate, and prepare high-quality datasets to support cutting-edge NLP research. Your role will be instrumental in enabling researchers by delivering essential data through efficient and scalable engineering practices, including web crawling, LLM-generated content refinement, and robust data pipelines, primarily leveraging Python and related technologies.

Key Responsibilities

  • Rapidly collect, curate, and preprocess datasets based on detailed specifications provided by NLP researchers, delivering data within tight timelines.
  • Develop and maintain efficient web crawling solutions, APIs, and automated workflows to continuously improve data collection processes.
  • Refine and evaluate outputs from Large Language Models (LLMs) to generate structured datasets suitable for model training and benchmarking.
  • Implement scalable data pipelines, ensuring efficient data processing, storage, retrieval, and distribution to research teams.
  • Collaborate closely with researchers and engineers to ensure collected data meets specified quality and relevance criteria.
  • Document data collection methodologies, dataset characteristics, and pipeline architecture clearly and effectively.
  • Engage with peer teams and participate in technical reviews to uphold best practices and data quality standards.
  • Represent MBZUAI at industry and research forums, showcasing technical capabilities in large-scale data processing and AI data infrastructure.

Academic Qualifications

  • Bachelor's degree in Computer Science, Data Science, Engineering, or a related technical field required.
  • Master's degree, PhD, or equivalent experience in Computer Science, Data Engineering, or a related technical field preferred.

Professional Experience - Required

  • Extensive experience in data engineering, data processing, and automation using Python.
  • Demonstrated proficiency in designing and deploying web crawling solutions, automated data extraction, and processing pipelines.
  • Strong understanding of data structures, algorithms, databases, SQL, and performance optimization.
  • Experience working with cloud infrastructure and distributed data processing frameworks (e.g., AWS, Spark, Kafka, Kubernetes).
  • Excellent problem-solving abilities, attention to detail, and the capability to rapidly address technical challenges.
  • Strong communication and collaboration skills with cross-functional teams.

Professional Experience - Preferred

  • Proven track record of supporting NLP or AI research teams with rapid and reliable data delivery.
  • Experience working with large language models, including evaluation, efficient inference, and prompt engineering.
  • Experience with refining outputs from large-scale AI models, such as LLM-generated data.
  • Contributions to open-source projects, coding competitions, or high visibility in coding communities (e.g., GitHub, Stack Overflow).
  • Familiarity with the latest advancements in NLP data processing and large language model technologies.

Visa Sponsorship

This position is eligible for visa sponsorship.

Benefits Include

  • Comprehensive medical, dental, and vision benefits
  • Bonus
  • 401(k) plan
  • Generous paid time off, sick leave, and holidays
  • Paid parental leave
  • Employee Assistance Program
  • Life insurance and disability

