About the role
This role is for one of Weekday’s clients
Salary range: Rs 2000000 - Rs 10000000 (i.e., INR 20 - 100 LPA)
Min Experience: 3+ years
Location: India
JobType: full-time
We are looking for a highly skilled GPU Compute, MLIR Compiler, and Kernel Optimization Engineer with deep expertise in MLIR-based code generation and end-to-end performance optimization for AI workloads. In this role, you will design, optimize, and deploy high-performance GPU compute kernels, build and extend MLIR compiler backends, and collaborate closely with ML, runtime, and hardware teams to push the limits of performance on modern GPU architectures.
Requirements
Key Responsibilities
- Develop and optimize GPU compute kernels targeting OpenCL and Vulkan compute backends for high-throughput AI/ML workloads.
- Design, build, and extend MLIR dialects across multiple abstraction levels—including frontend dialects, graph-level IR, tensor IR (e.g., Linalg, Tensor, TOSA), and runtime/low-level dialects—to enable efficient end-to-end model compilation.
- Implement and maintain MLIR-based compiler passes and transformations, including tiling, fusion, bufferization, vectorization, and lowering pipelines targeting OpenCL and Vulkan GPU backends.
- Conduct profiling and bottleneck analysis of compiled kernels using GPU counters and vendor-specific profilers, and drive performance improvements through compiler-level optimizations.
- Build and maintain GPU runtime infrastructure for both OpenCL and Vulkan, including memory management, pipeline setup, command buffer orchestration, and resource scheduling.
- Develop and extend code generation pipelines, enabling automatic lowering from tensor IR through MLIR to efficient OpenCL and Vulkan GPU kernels.
- Implement performance-critical schedules—including tiling, loop fusion, parallelism, and caching strategies—within MLIR-based backends targeting OpenCL and Vulkan runtimes.
- Collaborate with framework teams to optimize end-to-end model lowering for computer vision and LLM workloads using MLIR compilation stacks.
- Design and implement robust compiler and runtime components using modern C/C++, leveraging advanced programming paradigms for high-performance systems.
Required Qualifications
- Strong hands-on experience with the MLIR framework, including authoring and extending custom dialects, writing compiler passes, and building end-to-end lowering pipelines.
- Deep expertise across MLIR abstraction levels:
  - Frontend dialects: ingestion and representation of ML models (e.g., TOSA, StableHLO, ONNX-MLIR)
  - Graph-level IR: high-level operation fusion, shape inference, and graph transformations
  - Tensor IR level: structured operation representation using Linalg, Tensor, and Vector dialects; tiling and fusion strategies
  - Runtime/low-level dialects: Bufferization, MemRef, SCF, GPU, and LLVM dialects for final code generation
- Strong hands-on experience in OpenCL programming, including kernel development, memory model, work-group/work-item optimization, and OpenCL runtime management.
- Solid understanding of Vulkan compute programming, including descriptor management, compute pipelines, synchronization primitives, and Vulkan runtime internals.
- Strong understanding of GPU architecture, memory hierarchies, and asynchronous compute.
- Proficiency in C/C++ for system-level development.
- Experience with kernel profiling and bottleneck analysis on GPU platforms.
- Strong background in machine learning fundamentals, covering both Computer Vision (CV) and Large Language Model (LLM) workloads.
Must-have skills
GPU computing, MLIR, C/C++
Good-to-have skills
Vulkan, Kernel, OpenCL