About the role

Weekday AI · Remote

This role is for one of Weekday’s clients.
Salary range: INR 2,000,000 - 10,000,000 per annum (i.e., INR 20 - 100 LPA)

Min Experience: 3 years
Location: India
Job Type: Full-time

We are looking for a GPU Compute, MLIR Compiler, and Kernel Optimization Engineer with deep expertise in MLIR-based code generation and end-to-end performance optimization for AI workloads. In this role, you will design, optimize, and deploy high-performance GPU compute kernels, build and extend MLIR compiler backends, and work closely with ML, runtime, and hardware teams to maximize performance on modern GPU architectures.

Requirements

Key Responsibilities

Develop and optimize GPU compute kernels targeting OpenCL and Vulkan compute backends for high-throughput AI/ML workloads.
Design, build, and extend MLIR dialects across multiple abstraction levels, including frontend dialects, graph-level IR, tensor IR (e.g., Linalg, Tensor, Vector), and runtime/low-level dialects, to enable efficient end-to-end model compilation.
Implement and maintain MLIR-based compiler passes and transformations, including tiling, fusion, bufferization, vectorization, and lowering pipelines targeting OpenCL and Vulkan GPU backends.
Conduct profiling and bottleneck analysis of compiled kernels using GPU counters and vendor-specific profilers, and drive performance improvements through compiler-level optimizations.
Build and maintain GPU runtime infrastructure for both OpenCL and Vulkan, including memory management, pipeline setup, command buffer orchestration, and resource scheduling.
Develop and extend code generation pipelines that automatically lower tensor-level IR through MLIR's lower-level dialects to efficient OpenCL and Vulkan GPU kernels.
Implement performance-critical schedules (tiling, loop fusion, parallelization, and caching strategies) within MLIR-based backends targeting OpenCL and Vulkan runtimes.
Collaborate with framework teams to optimize end-to-end model lowering for computer vision and LLM workloads using MLIR compilation stacks.
Design and implement robust compiler and runtime components using modern C/C++, leveraging advanced programming paradigms for high-performance systems.

Required Qualifications

Strong hands-on experience with the MLIR framework, including authoring and extending custom dialects, writing compiler passes, and building end-to-end lowering pipelines.
Deep expertise across MLIR abstraction levels:
  Frontend dialects – ingestion and representation of ML models (e.g., TOSA, StableHLO, ONNX-MLIR)
  Graph-level IR – high-level operation fusion, shape inference, and graph transformations
  Tensor IR level – structured operation representation using Linalg, Tensor, and Vector dialects; tiling and fusion strategies
  Runtime/low-level dialects – Bufferization, MemRef, SCF, GPU, and LLVM dialects for final code generation
Strong hands-on experience in OpenCL programming, including kernel development, the OpenCL memory model, work-group/work-item optimization, and OpenCL runtime management.
Solid understanding of Vulkan compute programming, including descriptor management, compute pipelines, synchronization primitives, and Vulkan runtime internals.
Strong understanding of GPU architecture, memory hierarchies, and asynchronous compute.
Proficiency in C/C++ for system-level development.
Experience with kernel profiling and bottleneck analysis on GPU platforms.
Strong background in machine learning fundamentals, covering both Computer Vision (CV) and Large Language Model (LLM) workloads.

Must-have skills

GPU computing, MLIR, C/C++

Good-to-have skills

Vulkan, kernel optimization, OpenCL
