Inference Optimization Intern – Performance Modeling

About the role

Ifm Us · Onsite

About the Institute of Foundation Models

The Institute of Foundation Models is dedicated to advancing the science and engineering of large-scale AI systems. Our researchers and engineers develop cutting-edge foundation models while pushing the limits of high-performance computing and efficient AI inference. By combining deep expertise in machine learning, systems engineering, and hardware optimization, we build scalable AI solutions that drive scientific discovery and real-world impact.
As part of the team, interns work alongside world-class researchers and performance engineers to optimize the execution of large-scale foundation models on next-generation NVIDIA GPU architectures. This internship provides hands-on experience in low-level GPU performance analysis, kernel optimization, and hardware-aware inference acceleration.

Key Responsibilities

This intensive internship offers the opportunity to contribute to the development of a simulator and profiling framework for foundation model inference on NVIDIA GPUs.
Responsibilities include:
  • Develop analytical performance models for GPU kernels and inference workloads.
  • Build and validate a simulator to estimate theoretical hardware performance limits.
  • Compare measured kernel performance against architectural peak throughput.
  • Identify performance bottlenecks in compute, memory, communication, and scheduling.
  • Analyze GPU execution using NVIDIA Nsight Systems and Nsight Compute.
  • Investigate PTX and SASS code generation to understand low-level execution behavior.
  • Collaborate with researchers and engineers to optimize inference kernels for transformer-based models.
  • Evaluate utilization of Tensor Cores, memory bandwidth, caches, and instruction pipelines.
  • Design profiling methodologies for Hopper and Blackwell architectures.
  • Document findings and provide actionable recommendations for performance improvements.

Academic Qualifications

Currently pursuing a degree in Computer Science, Computer Engineering, Electrical Engineering, Artificial Intelligence, High-Performance Computing, or a related quantitative discipline.

Preferred Qualifications

  • Experience with CUDA programming and GPU kernel development.
  • Understanding of NVIDIA GPU architecture and memory hierarchy.
  • Familiarity with performance profiling tools such as Nsight Systems and Nsight Compute.
  • Knowledge of PTX, SASS, and low-level GPU execution.
  • Experience optimizing CUDA kernels for throughput and latency.
  • Understanding of roofline analysis, performance modeling, and hardware utilization metrics.
  • Experience with deep learning frameworks such as PyTorch or TensorFlow.
  • Strong programming skills in C++, CUDA, and Python.

Desired Skills

  • Performance engineering mindset.
  • Strong analytical and debugging abilities.
  • Interest in AI systems, inference optimization, and hardware-software co-design.
  • Ability to work independently on research and engineering challenges.
  • Excellent written and verbal communication skills.