Member of Technical Staff (Software Engineer)

Cerebras Systems · Sunnyvale, CA

Software · Headquarters/Sunnyvale Office · Posted May 8, 2026

Cerebras Systems builds the world's largest AI chip, 56 times larger than GPUs. Our novel wafer-scale architecture provides the AI compute power of dozens of GPUs on a single chip, with the programming simplicity of a single device. This approach allows Cerebras to deliver industry-leading training and inference speeds and empowers machine learning users to effortlessly run large-scale ML applications, without the hassle of managing hundreds of GPUs or TPUs.

Cerebras' current customers include top model labs, global enterprises, and cutting-edge AI-native startups. OpenAI recently announced a multi-year partnership with Cerebras to deploy 750 megawatts of scale, transforming key workloads with ultra-high-speed inference.

Thanks to the groundbreaking wafer-scale architecture, Cerebras Inference offers the fastest Generative AI inference solution in the world, over 10 times faster than GPU-based hyperscale cloud inference services. This order of magnitude increase in speed is transforming the user experience of AI applications, unlocking real-time iteration and increasing intelligence via additional agentic computation.

Cerebras Systems Inc. has multiple openings for Member of Technical Staff (Software Engineer).

Title: Member of Technical Staff (Software Engineer)

Job Duties:

Implement infrastructure to support high-performance, low-latency inference service.
Deploy and configure Kubernetes services to ensure scalability and reliability of inference workloads.
Optimize resource allocation and auto-scaling policies to handle variable inference demand while minimizing operational costs.
Integrate inference services with containerized environments using Docker and Kubernetes for orchestration.
Ensure high availability and fault tolerance by implementing multi-region deployments and disaster recovery strategies.
Develop Python-based scripts and APIs to streamline data preprocessing, inference execution, and post-processing for real-time inference tasks.
Collaborate with machine learning engineers to validate inference accuracy and performance against functional and latency requirements.
Triage and resolve defects in the service by analyzing logs, metrics, and distributed traces.
Debug issues related to model deployment, container orchestration, or networking configurations, documenting steps to reproduce and root-cause defects.
Collaborate with cross-functional teams to address performance regressions, scalability issues, or integration failures in the inference pipeline.
Develop automated scripts to detect and mitigate common failure modes, improving system reliability.
Author detailed technical documentation for infrastructure configurations, inference workflows, and APIs, ensuring clarity for internal teams and external customers.
Work with product management and user experience teams to define requirements for inference service interfaces, including configuration, monitoring, and event logging.
Document and track defects, enhancements, and release notes using tools like Jira and Git, ensuring version control and traceability.
Participate in release planning and prioritization discussions to align infrastructure development with customer needs and business objectives.

Minimum Requirements:

Master’s degree or foreign equivalent degree in Computer Science, or a related field and 1 year of experience as Software Developer, Student/Intern (Software Developer), Member of Technical Staff (Software Engineer), Software Engineer, or a related occupation required. Employer accepts full-time or equivalent part-time experience gained before, during or after graduate studies.

Required Skills:

Docker and Kubernetes;
Java or C++;
ActiveMQ and Kafka;
Python or Groovy;
JavaScript or TypeScript;
Linux;
SQL, OracleDB, and Redis; and
Git

Additional Information:

Employer’s name: Cerebras Systems Inc.

Job site: 1237 E Arques Avenue, Sunnyvale, CA 94085

Telecommuting permitted.

Salary Range: $169,600.00 per year to $175,000.00 per year

If you are interested in applying for this position, please apply online on this web page or mail resume to HR at Cerebras Systems Inc., 1237 E Arques Avenue, Sunnyvale, CA 94085. Please reference Job # 146 on resume or cover letter.

Why Join Cerebras

People who are serious about software make their own hardware. At Cerebras we have built a breakthrough architecture that is unlocking new opportunities for the AI industry. With dozens of model releases and rapid growth, we’ve reached an inflection point in our business. Members of our team tell us there are five main reasons they joined Cerebras:

Build a breakthrough AI platform beyond the constraints of the GPU.
Publish and open source their cutting-edge AI research.
Work on one of the fastest AI supercomputers in the world.
Enjoy job stability with startup vitality.
Our simple, non-corporate work culture that respects individual beliefs.

Read our blog: Five Reasons to Join Cerebras in 2026.

Apply today and become part of the forefront of groundbreaking advancements in AI!

Cerebras Systems is committed to creating an equal and diverse environment and is proud to be an equal opportunity employer. We celebrate different backgrounds, perspectives, and skills. We believe inclusive teams build better products and companies. We try every day to build a work environment that empowers people to do their best work through continuous learning, growth and support of those around them.

This website or its third-party tools process personal data. For more details, click here to review our CCPA disclosure notice.

Sr. Technical Staff

Cerebras Systems · Sunnyvale, CA

Software · Headquarters/Sunnyvale Office · Posted May 8, 2026

Cerebras Systems Inc. has multiple openings for Sr. Technical Staff.

Title: Sr. Technical Staff

Job Duties:

Post-silicon validation of Cerebras Wafer Scale Engines. Test and debug issues on new silicon.
Test, analyze, and characterize high-speed serial interfaces to verify compliance with hardware specifications, record performance data, and recommend design modifications to optimize functionality.
Work with the silicon and operations teams to test, bring up, and run burn-in on wafer-scale systems.
Support manufacturing operations in using the wafer bring-up flow. Perform wafer bring-ups, and diagnose and debug problems encountered.
Develop and implement hardware to ensure compliance with design specifications.
Collaborate with hardware design engineers and system software engineers to review specifications and to recommend changes that will improve the quality and verifiability of the hardware designs.
Create and maintain automated regression test scripts, using Python and/or bash, that ensure that all tests are run and pass after each change to the design, testbench, tests, or reference model.
Work with system team members to diagnose system-related failures. Understand the key system interfaces to FPGAs, power, and cooling, and apply that knowledge to the debug of silicon features.
Development of debug tools in Python to program and analyze the behavior of the Wafer Scale Engine.
Development of a wafer bring-up flow utilizing Python and shell scripts to capture the steps required to bring up a wafer in a logical, easy-to-use flow.
Documentation of issues found, tools and flow.

Minimum Requirements:

Master’s degree or foreign equivalent degree in Electrical Engineering, Computer Engineering, or a related field and 3 years of experience as Application Engineer, Sr. Technical Staff, Hardware Engineer, or a related occupation required.

Required Skills:

Electrical Signal Integrity Analysis;
Hardware Bring-up & Debug;
Functional and Electrical characterization;
Test automation using scripting language; and
High Speed Interfaces & Protocols including Ethernet, CPRI, or Interlaken.

Additional Information:

Employer’s name: Cerebras Systems Inc.

Job site: 1237 E Arques Avenue, Sunnyvale, CA 94085

Telecommuting permitted.

Salary Range: $250,000.00 per year to $275,000.00 per year

If you are interested in applying for this position, please apply online on this web page or mail resume to HR at Cerebras Systems Inc., 1237 E Arques Avenue, Sunnyvale, CA 94085. Please reference Job # 145 on resume or cover letter.

Sr. Member of Technical Staff

Cerebras Systems · Sunnyvale, CA

Software · Headquarters/Sunnyvale Office · Posted May 8, 2026

Cerebras Systems Inc. has multiple openings for Sr. Member of Technical Staff.

Title: Sr. Member of Technical Staff

Job Duties:

Design and develop software features that support system resiliency and high availability, including automated recovery mechanisms and fault-tolerant architecture across distributed environments.
Develop and maintain cloud-based deployment workflows for AI inference software using AWS tools and services to support low-latency and scalable system performance.
Develop Python-based scripts and APIs to streamline data preprocessing, inference execution, and post-processing for real-time inference tasks.
Use parallel programming techniques (e.g., multi-threading, asynchronous processing) to maximize resource efficiency on AWS compute instances.
Develop software components to support visualization and analysis of system performance metrics, enhancing the monitoring and usability of inference services.
Develop inference software in Docker containers and define Kubernetes orchestration strategies that ensure software reliability and efficient scaling.
Develop automated scripts to detect and mitigate common failure modes, improving software system reliability.
Debug issues related to model deployment, container orchestration, or networking configurations, documenting steps to reproduce and root-cause defects.
Triage and resolve defects in the software service by analyzing logs, metrics, and distributed traces using tools like AWS CloudWatch, Grafana, or custom Python scripts.
Work with product management and user experience teams to define requirements for inference service interfaces, including configuration, monitoring, and event logging.
Author detailed technical documentation for infrastructure configurations, inference workflows, and APIs, ensuring clarity for internal teams and external customers.
Document and track defects, enhancements, and release notes using tools like Jira and Git, ensuring version control and traceability.

Minimum Requirements:

Master’s degree or foreign equivalent degree in Computer Science, or a related field and 18 months of experience as Information Security Analyst, Software Engineer, Sr. Member of Technical Staff, IT Senior Applications Engineer, or a related occupation required.

The required experience must include 18 months of experience with the following:

Infrastructure-as-Code and deployment automation: Terraform, AWS CloudFormation, AWS CDK, and Ansible;
Containerization and orchestration: Docker, Kubernetes, AWS EKS, AWS Elastic Container Service (ECS), AWS Fargate, and Helm;
Compute and serverless services: AWS EC2, AWS Lambda functions, and Auto Scaling Groups;
Monitoring, logging, and distributed tracing: AWS CloudWatch, AWS X-Ray, ELK (Elasticsearch, Logstash, Kibana), Prometheus, and Grafana;
Programming languages and frameworks: Python, Node.js, JavaScript, and Flask;
Data storage and caching: PostgreSQL, Redis, and NFS; and
CI/CD and version control: Jenkins and Git

Additional Information:

Employer’s name: Cerebras Systems Inc.

Job site: 1237 E Arques Avenue, Sunnyvale, CA 94085

Telecommuting permitted.

Salary Range: $230,000.00 per year to $250,000.00 per year

If you are interested in applying for this position, please apply online on this web page or mail resume to HR at Cerebras Systems Inc., 1237 E Arques Avenue, Sunnyvale, CA 94085. Please reference Job # 142 on resume or cover letter.

ML Performance Benchmarking Engineer

Cerebras Systems · Toronto, Ontario, Canada

Software · Toronto Office · Posted May 7, 2026

About The Role

The Inference Core Platform group is at the heart of Cerebras' mission to deliver the world’s fastest AI inference. Our team builds the foundational software and hardware infrastructure that powers low-latency, high-speed, high-throughput deployment on the Cerebras Wafer-Scale Engine (WSE). We are responsible for the full stack—from model compilation and scheduling down to custom hardware kernels and driver development.

The ML Performance Benchmarking team plays a pivotal role in shaping the performance and scalability of AI inference on one of the most advanced computing systems ever built. We drive the bring-up of core inference capabilities and deliver performance improvements at every stage of development – from early prototyping to production deployment.

We're looking for passionate engineers to join us in redefining the limits of AI inference. If you thrive on building systems that measure, analyze, and optimize performance at scale, this is your opportunity to make a transformative impact on the future of AI.

Scope of the team includes:

Core Inference Observability – Design and implement end-to-end telemetry systems across the software stack, providing deep visibility into inference performance and enabling rapid iteration before and after deployment.
Benchmarking Infrastructure – Architect, build, and scale the automation that generates, analyzes, and visualizes performance data used to inform business decisions across engineering and leadership.
Performance Analysis – Dive deep into system behavior, dissect performance bottlenecks, and deliver actionable insights that directly influence which features ship and how they evolve.
Feature Integration – Partner closely with Core Platform teams to define rigorous testing methodologies that validate inference features for peak performance.

Skills & Qualifications

Bachelor’s or Master’s degree in Computer Engineering, Systems Engineering, or a related field.
Proficiency in Python and/or C++ programming.
Proven experience in building and scaling automated infrastructure.
Strong background in throughput and performance optimization techniques, especially in complex, large-scale systems.
Excellent problem-solving skills and a strong analytical mindset.
Demonstrated ability to dive deep into new domains.
Ability to work in a fast-paced, ambiguous, and collaborative environment.

Preferred Skills & Qualifications

Familiarity with problem-solving at the intersection of hardware and software.
Hands-on experience with AI workloads and architectures is a plus.

Location

On-site or hybrid at our Toronto office

Senior Performance Engineer, Inference

Cerebras Systems · Sunnyvale, CA

Software · Headquarters/Sunnyvale Office · Posted May 7, 2026

About The Role

We are hiring a Senior Performance Engineer to join our Product team. You are an expert on state-of-the-art inference performance and will serve as our resident expert on how Cerebras stacks up against alternative inference providers on both price and performance. This role sits at the intersection of performance benchmarking from first principles and competitive intelligence. The role has two core pillars:

Performance Benchmarking

You will build, run, and maintain reproducible benchmarks that measure Cerebras inference performance for real customer workloads. This includes metrics like tokens per second, time to first token, latency under concurrency, and total cost of ownership (TCO).

Competitive Pricing Intelligence

You will build and maintain a living model of competitor pricing across the AI inference landscape, including cloud providers, custom silicon vendors, and inference API platforms. You will work directly with our Sales and Product teams to translate this intelligence into pricing recommendations for enterprise contracts, ensuring Cerebras offers a compelling value proposition for every customer.
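
For readers unfamiliar with the benchmarking metrics named above, here is a minimal sketch of how time to first token, output tokens per second, and a rough cost per million tokens might be derived from per-token arrival timestamps. The `RequestTrace` structure, the function names, and the sample numbers are all hypothetical and chosen only for illustration; they do not describe Cerebras tooling or pricing.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RequestTrace:
    """Timing for one streamed request (hypothetical structure for illustration)."""
    sent_at: float            # seconds, when the request was sent
    token_times: List[float]  # seconds, arrival time of each output token


def time_to_first_token(trace: RequestTrace) -> float:
    """Delay between sending the request and receiving the first token."""
    return trace.token_times[0] - trace.sent_at


def output_tokens_per_second(trace: RequestTrace) -> float:
    """Decode throughput, measured from the first token to the last."""
    decode_time = trace.token_times[-1] - trace.token_times[0]
    if decode_time <= 0:
        raise ValueError("need at least two tokens with increasing timestamps")
    return (len(trace.token_times) - 1) / decode_time


def cost_per_million_tokens(hourly_cost: float, tokens_per_second: float) -> float:
    """Rough serving cost per 1M tokens for an assumed hourly system cost."""
    tokens_per_hour = tokens_per_second * 3600
    return hourly_cost / tokens_per_hour * 1_000_000


# Made-up sample: five tokens arriving 0.25 s apart.
trace = RequestTrace(sent_at=0.0, token_times=[0.25, 0.5, 0.75, 1.0, 1.25])
ttft = time_to_first_token(trace)
tps = output_tokens_per_second(trace)
```

Measuring throughput from the first token onward keeps prefill latency (captured by time to first token) separate from decode speed, which is one common way to report the two; other conventions exist.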

This role requires deep, hands-on fluency with open-source inference stacks (vLLM, SGLang, TensorRT-LLM), GPU kernel-level optimization toolchains (CUDA, Triton), and an intuitive understanding of how transformer architecture decisions—attention mechanisms, model sizing, quantization, KV-cache strategies—interact with the realities of GPU memory hierarchies and compute budgets.

Responsibilities

Design standardized benchmark suites for inference workloads (code generation, summarization, multi-turn conversation, agentic tool use) that enable fair, reproducible comparisons.
Stay current with GPU optimization communities (CUDA, Triton, TensorRT) and evaluate how new kernel fusions, flash-attention variants, and quantization techniques shift performance ceilings.
Build and continuously update a competitive pricing model covering token-based pricing, throughput-based pricing, and enterprise contract structures across major inference providers.
Monitor industry announcements, pricing changes, and new product launches. Synthesize findings into actionable briefs for the Sales and Product teams.
Partner with Sales to build deal-specific competitive analyses showing total cost of ownership and performance advantages for enterprise prospects.
Collaborate with Product and Engineering to identify where competitors are closing gaps or where Cerebras has underappreciated advantages.
Track third-party benchmarking sources (Artificial Analysis, InferenceX) and ensure Cerebras is well-represented and accurately measured.

Skills & Qualifications

Required

Deep practical experience with state-of-the-art open-source inference frameworks like vLLM, SGLang, or TensorRT-LLM.
5+ years of experience in ML systems, ML research engineering, or high-performance computing.
Strong understanding of LLM inference economics: tokens, throughput, latency, batch sizes, precision trade-offs, and how these translate to customer cost.
Strong understanding of transformer model architecture internals such as attention mechanisms (MHA, MQA, GQA, MLA, DSA) and KV-cache management, and how each affects memory and compute profiles.
Self-directed and resourceful.
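
As a rough illustration of how the attention variants listed above change the memory profile, the sketch below estimates KV-cache size for a standard key/value cache, where the only difference between MHA, GQA, and MQA is the number of key/value heads. The model dimensions are hypothetical, and the formula does not cover schemes such as MLA that compress the cache differently.

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, batch_size: int, bytes_per_elem: int = 2) -> int:
    """Size of a plain KV cache in bytes.

    The leading factor of 2 accounts for storing both keys and values;
    bytes_per_elem=2 assumes 16-bit storage.
    """
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_elem)


# Hypothetical model: 32 layers, 32 query heads of dimension 128,
# one sequence of 4096 tokens, 16-bit cache entries.
mha = kv_cache_bytes(32, num_kv_heads=32, head_dim=128, seq_len=4096, batch_size=1)
gqa = kv_cache_bytes(32, num_kv_heads=8, head_dim=128, seq_len=4096, batch_size=1)
mqa = kv_cache_bytes(32, num_kv_heads=1, head_dim=128, seq_len=4096, batch_size=1)

for name, size in [("MHA", mha), ("GQA (8 KV heads)", gqa), ("MQA", mqa)]:
    print(f"{name}: {size / 2**20:.0f} MiB")
```

Under these assumptions the cache shrinks in direct proportion to the number of KV heads, which is why the choice of attention variant matters for batch size and context length on memory-limited hardware.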

Preferred

Background in ML research (publications or significant open-source contributions) with a systems or efficiency focus.
Contributions to open-source inference or kernel optimization projects.
Excellent communication skills. You will collaborate with executives, write for engineers, and create materials for sales leaders.

ML Systems Performance Engineer

Cerebras Systems · Sunnyvale, CA or Toronto, Canada

Software · Toronto Office · Posted Apr 15, 2026

About The Role

Engineers on the inference performance team operate at the intersection of hardware and software, driving end-to-end model inference speed and throughput. Their work spans low-level kernel performance debugging and optimization, system-level performance analysis, performance modeling and estimation, and the development of tooling for performance projection and diagnostics.

Responsibilities

Build performance models (kernel-level, end-to-end) to estimate the performance of state-of-the-art and customer ML models.
Optimize and debug our kernel microcode and compiler algorithms to elevate ML model inference speed, throughput, and compute utilization on the Cerebras WSE.
Debug and understand runtime performance on the system and cluster.
Develop tools and infrastructure to help visualize performance data collected from the Wafer Scale Engine and our compute cluster.

Requirements

Bachelor’s / Master’s / PhD in Electrical Engineering or Computer Science.
Strong background in computer architecture.
Exposure to and understanding of low-level deep learning / LLM math.
Strong analytical and problem-solving mindset.
3+ years of experience in a relevant domain (Computer Architecture, CPU/GPU Performance, Kernel Optimization, HPC).
Experience working on CPU/GPU simulators.
Exposure to performance profiling and debug on any system pipeline.
Comfort with C++ and Python.

Why Join Cerebras

People who are serious about software make their own hardware. At Cerebras we have built a breakthrough architecture that is unlocking new opportunities for the AI industry. With dozens of model releases and rapid growth, we’ve reached an inflection point in our business. Members of our team tell us there are five main reasons they joined Cerebras:

Build a breakthrough AI platform beyond the constraints of the GPU.
Publish and open-source cutting-edge AI research.
Work on one of the fastest AI supercomputers in the world.
Enjoy job stability with startup vitality.
Enjoy a simple, non-corporate work culture that respects individual beliefs.

Read our blog: Five Reasons to Join Cerebras in 2026.

Apply today and become part of the forefront of groundbreaking advancements in AI!

Cerebras Systems is committed to creating an equal and diverse environment and is proud to be an equal opportunity employer. We celebrate different backgrounds, perspectives, and skills. We believe inclusive teams build better products and companies. We try every day to build a work environment that empowers people to do their best work through continuous learning, growth and support of those around them.

Full Stack LLM Engineer

Cerebras Systems · Toronto, Ontario, Canada

Software Toronto Office Posted Apr 15, 2026

About the Role

We are seeking a versatile and experienced engineer to join our Inference Core Model Bringup team. This team is responsible for rapidly bringing up state-of-the-art open-source models (such as LLaMA and Qwen) or customer-provided proprietary models on our Cerebras CSX systems. Success in this role requires a systems-minded generalist who thrives in fast-paced bring-up environments and is comfortable working across the entire Cerebras software stack.
Your work will play a critical role in achieving unprecedented levels of performance, efficiency, and scalability for AI applications.

Responsibilities

Contribute to the end-to-end bring-up of ML models on Cerebras CSX systems.
Work across the stack: model architecture translation, graph lowering, compiler optimizations, runtime integration, and performance tuning.
Debug performance and correctness issues spanning model code, compiler IRs, runtime behavior, and hardware utilization.
Propose and prototype improvements across tools, APIs, or automation flows to accelerate future bring-ups.

Skills & Qualifications

Bachelor’s, Master’s, or PhD in Computer Science, Engineering, or a related field.
Comfort navigating the full AI toolchain: Python modeling code, compiler IRs, performance profiling, etc.
Strong debugging skills across performance, numerical accuracy, and runtime integration.
Experience with deep learning frameworks (e.g., PyTorch, TensorFlow) and familiarity with model internals (e.g., attention, MoE, diffusion).
Proficiency in C/C++ programming and experience with low-level optimization.
Proven experience in compiler development, particularly with LLVM and/or MLIR.
Strong background in optimization techniques, particularly those involving NP-hard problems.

What We Offer

Competitive salary and benefits package.
Opportunities for professional growth and career advancement.
A dynamic and innovative work environment.
The chance to work on cutting-edge technologies and make a significant impact on the future of AI.

Kernel Engineer

Cerebras Systems · Sunnyvale CA or Toronto Canada

Software Headquarters/Sunnyvale Office Toronto Office Posted Apr 15, 2026

About The Role

As a Kernel Engineer on our team, you will work at the intersection of hardware and software, developing high-performance software for cutting-edge AI and HPC workloads. Your focus will be on implementing, optimizing, and scaling deep learning operations to fully leverage our custom, massively parallel processor architecture.

You will be part of a world-class team responsible for the design, performance tuning, and validation of foundational ML and HPC kernels. This includes building a library of parallel and distributed algorithms that maximize compute utilization and push the boundaries of training efficiency for state-of-the-art AI models. Your work will be critical to unlocking the full potential of our hardware and accelerating the pace of AI innovation.

Responsibilities

Develop design specifications for new machine learning and linear algebra kernels and map them to the Cerebras WSE system using various parallel programming algorithms.
Develop and debug a kernel library of highly optimized routines in low-level assembly and a custom C-like language (CSL), implementing algorithms optimized for the Cerebras hardware system.
Use mathematical models and analysis to measure software performance and inform design decisions.
Develop and integrate unit and system testing methodologies to verify correct functionality and performance of kernel libraries.
Study emerging trends in machine learning applications and help evolve the kernel library architecture to address the computational challenges of state-of-the-art neural networks.
Interact with chip and system architects to optimize instruction sets, microarchitecture, and IO of next generation systems.

Skills And Qualifications

Bachelor’s, Master’s, PhD or foreign equivalents in Computer Science, Computer Engineering, Mathematics, or related fields.
Understanding of hardware architecture concepts — must be comfortable learning the details of a new hardware architecture.
Skilled in C++ and Python programming languages.
Good knowledge of library and/or API development best practices.
Strong debugging skills and experience debugging complex software stacks.

Preferred Skills And Qualifications

Experience in kernel development and/or testing.
Familiarity with parallel algorithms and distributed memory systems.
Experience in programming accelerators such as GPUs and FPGAs.
Familiarity with Machine Learning neural networks and frameworks such as TensorFlow and PyTorch.
Familiarity with HPC kernels and their optimization.

Network Architect

Cerebras Systems · Sunnyvale, CA

Software Headquarters/Sunnyvale Office Posted Apr 15, 2026

About The Role

As a Network Architect on the Cluster Architecture Team, you will work closely with vendors, internal networking teams, and industry peers to develop best-in-class front-end datacenter and interconnect architecture for current and future generations of Cerebras AI clusters. You will be responsible for developing proofs of concept for new network designs and features that enable resilient, reliable networking for AI workloads. The role requires cross-functional collaboration and interaction with diverse hardware components (e.g., network devices and the Wafer-Scale Engine) as well as software at several layers of the stack, from host-side networking to cluster-level coordination. The role also requires an understanding of network monitoring systems and network debugging methodologies.

Responsibilities

Design and architect front-end network fabrics for AI/ML and HPC systems.
Identify and resolve performance and efficiency bottlenecks, ensuring high resource utilization, low latency, and high-throughput communication.
Lead cross-functional technical projects spanning multiple teams and integrating diverse software and hardware components to deliver advanced networking technologies.
Foster clear and effective communication across teams and stakeholders.
Collaborate with vendors and industry partners to shape network hardware and feature roadmaps.
Represent Cerebras in industry forums and technical communities.
Serve as the central point of contact for network reliability issues.

Skills & Qualifications

Ph.D. in Computer Science or Electrical Engineering + 10 years industry experience or Master’s in CS or EE + 15 years industry experience.
8+ years of experience in large-scale network design in datacenter and cloud environments.
Extensive experience debugging networking issues in large distributed systems environments with multiple networking platforms and protocols.
Experience managing and leading multi-phase, multi-team projects.
Experience with networking platforms such as Juniper, Arista, Cisco, and open-box architectures (SONiC, FBOSS).
Experience with networking protocols and technologies such as VXLAN, EVPN, RoCE, BGP, DCQCN, PFC, and streaming telemetry.
Familiarity with automation languages such as Python or Go.
Familiarity with network visibility and management systems.
Prior experience in hyperscalers or cloud service providers is strongly preferred.

ML Research Engineer (Inference)

Cerebras Systems · Bengaluru, Karnataka, India

Software India Office Posted Apr 15, 2026

About The Role

As a Research Engineer on the Inference ML team at Cerebras Systems, you will adapt today's most advanced language and vision models to run efficiently on our flagship Cerebras architecture. You'll work alongside ML researchers and engineers to design, prototype, validate, and optimize models, gaining end-to-end exposure to cutting-edge inference research on the world's fastest AI accelerator.

You will focus on pushing the frontier of speculative decoding, large-model pruning and compression, sparse attention, and sparsity-driven techniques to deliver low-latency, high-throughput inference at scale.

Responsibilities

Implement and adapt transformer-based models (NLP and/or vision) to run on Cerebras hardware
Assist in optimizing models for inference performance (latency, throughput)
Run experiments, analyze results, and support model improvements
Help bring up and validate models on the Cerebras system
Debug and troubleshoot model or system issues with guidance from senior team members
Support profiling and performance analysis using internal tools
Collaborate with cross-functional teams (ML, software, hardware) on model integration

Minimum Qualifications

Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field
1–3 years of experience in software engineering or machine learning in a similar capacity (internships count)
Experience with Python and at least one ML framework (e.g., PyTorch, Transformers, vLLM or SGLang)
Understanding of deep learning concepts (e.g., neural networks, transformers)
Experience with Generative AI and Machine Learning systems
Strong programming skills in Python and/or C++

Preferred Qualifications

Experience with speculative decoding, neural network pruning and compression, sparse attention, quantization, sparsity, post-training techniques, and inference-focused evaluations.
Exposure to large language models or computer vision models
Experience running experiments or tuning models
Familiarity with tools like PyTorch, Hugging Face Transformers, or similar
Basic understanding of performance concepts (e.g., latency, throughput)
Experience working in Linux environments

QA Lead (ML Integration and Quality)

Cerebras Systems · Bengaluru, Karnataka, India

Software India Office Posted Apr 15, 2026

About The Role

As an ML QA Lead, you will ensure the quality of Cerebras software across all supported ML workloads and workflows. You will be part of the MIQ (ML Integration and Quality) team, which focuses on feature testing of software components, ML training accuracy and performance, pre-deployment and production validation, and validation of customer workloads and workflows.

In this role, you will promote sound testing practices, good debugging methodology, and effective cross-team communication, and advocate for world-class products.

Responsibilities

Drive the quality of the software and hardware components of the Cerebras solution to ensure the accuracy, performance, and usability of model training.
Bring good testing methodology, effective communication, and strong debugging skills to the team.
Demand the highest quality from all components within the Cerebras environment.
Automate workflows, set up testbeds, and build tools to effectively monitor and debug issues.
Implement creative ways to break Cerebras software and identify potential problems.
Break down complex tasks into smaller tasks. Be a problem solver. Be a thought leader.
Work in a fast-paced environment, making the prioritization and judgment calls that affect productivity at a company level.

Skills & Qualifications

8 years of relevant industry experience in software quality and testing.
Experience testing AI/ML models and evaluating model quality.
Strong automation and programming skills in one or more languages such as Python, C++, or Go.
Experience in testing compute/machine learning/networking/storage systems within a large-scale enterprise environment.
Experience debugging issues across scale-out deployments.
Experience putting together thorough test plans.
Experience working effectively across teams, including product development, product management, customer operations, and field teams.

Preferred Skills & Qualifications

Knowledge of ML workflows and frameworks.
Knowledge of basic storage and networking protocols.
Hands-on experience with training LLMs.
Hands-on experience working with containers and Kubernetes.

Compute Server Platform Architect

Cerebras Systems · Sunnyvale CA or Toronto Canada

Software US and Canada Offices Posted Apr 15, 2026

About The Role

As a Compute / Server Platform Architect on the Cluster Architecture Team, you will own the server-side platform architecture that enables Cerebras CS-3-based AI clusters (training and inference) to deliver predictable performance, scalability, and reliability. Our accelerators are network-attached, so the x86 server fleet is a first-class part of the end-to-end system: it runs critical-path runtime functions (for example, orchestration, prompt caching, and IO/control services) and must be co-designed with software for token-level latency, throughput, and cost efficiency. You will translate workload behavior into CPU, memory, IO, PCIe, and host-networking requirements, drive platform evaluations with vendors, and provide technical leadership through qualification and production adoption in close partnership with other function leaders and TPMs.

Responsibilities

Own the architecture for all server roles in Cerebras clusters, including definitions of server types, configurations, and lifecycle strategy.
Define and maintain server formulas (counts and ratios per CS-3 count, cluster size, and workload type) including capacity planning and headroom policy.
Specify platform configurations: CPU SKU and core strategy, our vendor roadmap (e.g., AMD, Intel, ARM), memory topology (channels, DIMM type, capacity), PCIe topology and lane budgeting, NIC selection/placement, and local NVMe policy where applicable.
Translate software and runtime flows into measurable hardware requirements (CPU utilization, memory bandwidth/latency, bursty IO patterns, queueing and concurrency limits) and communicate clear guardrails back to software teams.
Develop performance and scaling models; validate with microbenchmarks and workload-level experiments; identify bottlenecks and drive cross-stack fixes.
Define the OS, BIOS, firmware, and driver baseline for each server type; other teams follow these recommendations and apply them to our fleet.
Stay current on emerging server technologies (CPU generations, new memory technologies, CXL, NVMe evolutions, SmartNIC/DPU capabilities where relevant) and run proof-of-concept evaluations to determine when to adopt.
Lead technical vendor engagements (OEM/ODM and component vendors): influence roadmap, request platform knobs, and drive joint debugging on performance or reliability issues.
Define qualification and acceptance criteria (performance, stability, operability) and partner with the Infrastructure Hardware TPM to execute qualification plans and land changes cleanly into production.
Support bring-up and rare deployment debugging in lab and staging environments; drive root-cause analysis for regressions spanning firmware, drivers, OS, and runtime behavior.

Skills and Qualifications

PhD in Computer Science or Electrical/Computer Engineering + 8 years of industry experience, or Master’s/Bachelor’s in CS or EE + 10 years of industry experience.
5+ years of experience in server platform architecture, systems performance engineering, or large-scale infrastructure design for AI/ML, HPC, or performance-sensitive distributed systems.
Deep understanding of x86 server architecture: CPU microarchitecture basics, cache hierarchies, NUMA, memory controllers/channels, and memory bandwidth vs latency tradeoffs.
Strong Linux systems knowledge: profiling and performance analysis, scheduling and syscall overheads, memory management behavior, and practical tuning methodology.
Experience reasoning about high-performance IO paths, including NIC behavior at a systems level, RDMA/RoCE concepts, and NVMe performance characteristics.
Proven ability to create capacity and performance models and validate them empirically with a rigorous benchmarking plan.
Experience working directly with vendors/partners to evaluate platforms, drive issue resolution, and influence roadmaps.
Strong cross-functional communication skills and ability to drive technical decisions through clear tradeoff documents and reviews.
Familiarity with application and system software (C, C++, Python).

Distributed Systems Cluster Security Software – Engineering Lead

Cerebras Systems · Sunnyvale, CA

Software Headquarters/Sunnyvale Office Posted Apr 15, 2026

About The Role

In this role, you will be the security czar for Cerebras’s AI cluster product. These AI clusters have hundreds of wafer-scale accelerator systems, thousands of high-end servers, and several thousand networking ports, including switches, plus network-attached storage, all in a large-scale datacenter.

You will ensure that Cerebras’s large-scale AI clusters are secured through first-principles, best-practice, security-first engineering. A Cerebras cluster involves complex HW components, networking, and a vertically integrated cluster management software stack, all the way from a bare-metal deployment that brings up an operational cluster to a suite of cluster management software that enables multi-tenant higher-level training and inference services to be hosted on such large clusters.

Your role will be to ensure both the end-to-end security and the privacy of such cluster use cases. You will develop security engineering solutions that provide the necessary network access controls, user access controls, and a world-class multi-tenancy solution.

Responsibilities

Be the primary engineering face and owner of cluster security.
Provide strong technical leadership in cluster security for the company.
Work actively with corporate security and customers to identify and define needed security enhancements.
Build engineering-driven software that provides guardrails, detection solutions, and response tools for vulnerabilities at all layers of the vertical stack (including HW and SW).
Collaborate cross-functionally, both vertically and horizontally, to ensure the end-to-end cluster software is secure.
Develop, maintain, and execute the roadmap for the cluster security product.
Build an outstanding engineering team to deliver a world-class security solution.

Skills & Qualifications

3+ years in a demonstrated engineering leadership/management role in distributed systems security.
Proven track record of delivering products and of launching and deploying secure distributed solutions to customers.
Excellent communication, articulation, and collaboration skills, and the ability to act as a stakeholder.
Ability to make tough decisions backed by data and trade-off analysis.
Outstanding sense for product and user journeys; out-of-the-box thinker.
Outstanding roadmap and schedule execution skills under tight timelines and budgets.
Strong background in multi-tenancy of large scale clusters is necessary.
Strong technical experience in computer and cluster networks is necessary.
Strong technical background in distributed systems software development (K8s and its ecosystem) is preferred.
Technical experience with bare metal cluster management software and related monitoring is preferred.

The salary range for this position is $140,000 - $240,000 annually. Actual compensation will be determined based on factors such as experience, skills, qualifications, and location.

Software Engineer, Kernel Reliability

Cerebras Systems · Sunnyvale CA or Toronto Canada

Software Headquarters/Sunnyvale Office Toronto Office Posted Apr 15, 2026

About The Role

We're looking for a deeply technical, hands-on software engineer to join our on-field Kernel Reliability team. You'll help tackle a critical challenge: improving the reliability of our advanced compute clusters and the underlying inference, training, and internal production services. In this role, you'll work close to the code and design solutions that will scale with our rapidly growing system production and software service offerings. If you have strong fundamentals in systems, debugging, and failure analysis—and enjoy building tools and solving hard reliability problems—we want to hear from you. New college graduates are welcome.

Responsibilities

Contribute to the technical roadmap and execution for kernel-centric reliability of our internal and customer-facing systems.
Partner with System and Cluster Operations teams to reduce system and service downtime after failure through tooling, analysis, and hands-on debugging support.
Work with the Debug Team to enhance debug tools with the goal of speeding up failure analysis.
Collaborate with software teams to improve the software stack—including kernels—to improve on-field debugging and failure analysis.
Work with ASIC and hardware architecture teams to co-design next-generation architectures with reliability and ease of debug in mind.
Participate in incident response, root-cause analysis, and post-mortems; drive follow-ups that measurably improve reliability over time.

Skills & Qualifications

We recognize great engineers come from different backgrounds. If you're excited about the role, we encourage you to apply even if you don't meet every qualification.

Required (or demonstrated through projects/internships/coursework):

Strong programming skills in C/C++ and Python.
Solid foundations in operating systems, computer architecture, and systems programming fundamentals.
Ability to debug complex issues using logs, traces, and standard debugging workflows; interest in root-cause analysis.

Nice to have:

Exposure to parallel and distributed programming (message passing, multicore, GPU, embedded, etc.).
Experience building or using debug/diagnostic tools (debuggers, core dump handling, tracing, sanitizers, profilers, etc.).
Familiarity with debugging distributed and parallel applications (deadlocks, livelocks, race conditions, etc.).
Knowledge of computer architecture concepts (instruction pipelining, multithreading, networking, memory systems, etc.).
Operations & Monitoring: familiarity with monitoring, incident response, and post-mortem culture.

LLM Inference Performance & Evals Engineer

Cerebras Systems · Toronto, Ontario, Canada

Software Toronto Office Posted Apr 15, 2026

About The Role

Join the inference model team dedicated to bringing up state-of-the-art models and to numerically validating and accelerating new model ideas on wafer-scale hardware. You will prototype architectural tweaks, build performance-eval pipelines, and turn hard numbers into changes that land in production.

Key Responsibilities

Prototype and benchmark cutting-edge ideas: new attention mechanisms, MoE, speculative decoding, and other innovations as they emerge.
Develop agent-driven automation that designs experiments, schedules runs, triages regressions, and drafts pull requests.
Work closely with compiler, runtime, and silicon teams: a unique opportunity to experience the full stack of software/hardware innovation.
Keep pace with the latest open- and closed-source models; run them first on wafer scale to expose new optimization opportunities.

Skills And Qualifications

3+ years building high-performance ML or systems software.
Solid grounding in Transformer math—attention scaling, KV-cache, quantisation—or clear evidence you learn this material rapidly.
Comfort navigating the full AI toolchain: Python modeling code, compiler IRs, performance profiling, etc.
Strong debugging skills across performance, numerical accuracy, and runtime integration.
Prior experience in modeling, compilers, or crafting benchmarks and performance studies, not just black-box QA tests.
Strong passion for leveraging AI agents or workflow orchestration tools to boost personal productivity.

Assets

Hands-on with flash-attention, Triton kernels, linear-attention, or sparsity research.
Performance-tuning experience on custom silicon, GPUs, or FPGAs.
Proficiency in C/C++ programming and experience with low-level optimization.
Proven experience in compiler development, particularly with LLVM and/or MLIR.
Publications, repos, or blog posts dissecting model speed-ups.
Contributions to open-source agent frameworks.

ML Software Tool Development Engineer

Cerebras Systems · Sunnyvale CA or Toronto Canada

Software US and Canada Offices Posted Apr 15, 2026

Responsibilities:

Lead the design and implementation of system-level debugging, validation, and observability platforms.
Develop automated systems for collecting and analyzing numerical and execution anomalies.
Create visualization and analysis tools to enable efficient root-cause investigation.
Build frameworks for failure classification, regression detection, and anomaly monitoring.
Extend compilers, runtimes, and programming interfaces to support advanced profiling and instrumentation.
Improve system bring-up, low-level debug, and validation workflows.
Partner cross-functionally with compiler, hardware, firmware, runtime, and infrastructure teams.
Establish best practices for debuggability, reliability, and operational excellence.
Lead high-impact initiatives.
Support incident response and drive long-term corrective actions.

Qualifications:

Strong proficiency in C++ and Python, with a track record of building reliable, high-performance systems and tooling.
Demonstrated experience debugging complex hardware/software systems and driving issues to root cause.
Experience analyzing system-level data structures, execution graphs, or dependency networks for diagnostics and validation.
Proven ability to design and build intuitive visualization and analysis tools for complex technical data.
Experience with compiler internals, custom hardware interfaces, or low-level protocol design.
Strong written and verbal communication skills, with the ability to explain technical concepts to diverse stakeholders.
Ability to work independently and lead complex technical projects end-to-end.

Preferred Skills & Qualifications

Familiarity with machine learning training and inference pipelines, especially distributed training and large-model scaling.
Prior work on high-performance clusters, HPC systems, or custom hardware/software co-design.

Staff Python / PyTorch Developer — Frontend Inference Compiler – Dubai

Cerebras Systems · Europe; Remote, California, United States; UAE

Software UAE Posted Apr 15, 2026

About the Role:

Would you like to help create the fastest generative model inference in the world? Join the Cerebras Inference Team to help develop a unique combination of software and hardware that delivers the best inference characteristics in the market while running the largest models available.

The Cerebras wafer-scale inference platform runs generative models at unprecedented speed thanks to a unique hardware architecture that provides the fastest access to local memory, an ultra-fast interconnect, and a huge amount of available compute.

You will be part of the team that works with the latest open and closed generative AI models to optimize them for the Cerebras inference platform. Your responsibilities will include working on the model representation, optimization, and compilation stack to produce the best results on current and future Cerebras platforms.

Responsibilities:

Analyze new models from the generative AI field and understand their impact on the compilation stack.
Develop and maintain a model definition framework consisting of building blocks that represent large language models, based on PyTorch and Cerebras dialects, ready to be deployed on Cerebras hardware.
Develop and maintain the frontend compiler infrastructure that ingests PyTorch models and produces an intermediate representation (IR).
Extend and optimize PyTorch FX / TorchScript / TorchDynamo-based tooling for graph capture, transformation, and analysis.
Collaborate with other teams throughout feature implementation.
Research new methods for model optimization to improve Cerebras inference.

Qualifications:

Degree in Engineering, Computer Science, or equivalent experience, with evidence of exceptional ability.
Strong Python programming skills and in-depth experience with PyTorch internals (e.g., TorchScript, FX, or Dynamo).
Solid understanding of computational graphs, tensor operations, and model tracing.
Experience building or extending compilers, interpreters, or ML graph optimization frameworks.
Experience working with PyTorch and the HuggingFace Transformers library.
Knowledge of and experience working with Large Language Models (understanding Transformer architecture variations, generation cycle, etc.).
Strong C++ programming skills.
Knowledge of MLIR-based compilation stacks.

Preferred Qualifications

Prior experience contributing to PyTorch, TensorFlow XLA, TVM, ONNX RT, or similar compiler stacks.
Knowledge of hardware accelerators, quantization, or runtime scheduling.
Experience with multi-target inference compilation (e.g., CPU, GPU, custom ASICs).
Understanding of numerical precision trade-offs and operator lowering.
Contributions to open-source ML compiler projects.

Performance Engineer

Cerebras Systems · Remote, California, United States; UAE

Software UAE Posted Apr 15, 2026

About The Role

As a Kernel Engineer on our team, you will develop high-performance software at the intersection of hardware and software for cutting-edge AI and HPC workloads. Your focus will be on implementing, optimizing, and scaling deep learning operations to fully leverage our custom, massively parallel processor architecture.

You will be part of a world-class team responsible for the design, performance tuning, and validation of foundational ML and HPC kernels. This includes building a library of parallel and distributed algorithms that maximize compute utilization and push the boundaries of training efficiency for state-of-the-art AI models. Your work will be critical to unlocking the full potential of our hardware and accelerating the pace of AI innovation.

Responsibilities

Develop design specifications for new machine learning and linear algebra kernels and map them to the Cerebras WSE System using various parallel programming algorithms.
Develop and debug a kernel library of highly optimized routines, written in low-level assembly and a custom C-like domain-specific language (CSL), that implement algorithms targeting the Cerebras hardware system.
Use mathematical models and analysis to measure software performance and inform design decisions.
Develop and integrate unit and system testing methodologies to verify correct functionality and performance of kernel libraries.
Study emerging trends in machine learning applications and help evolve the kernel library architecture to address the computational challenges of state-of-the-art neural networks.
Interact with chip and system architects to optimize instruction sets, microarchitecture, and IO of next generation systems.

Skills And Qualifications

Bachelor’s, Master’s, PhD or foreign equivalents in Computer Science, Computer Engineering, Mathematics, or related fields.
Understanding of hardware architecture concepts — must be comfortable learning the details of a new hardware architecture.
Skilled in C++ and Python programming languages.
Good knowledge of library and/or API development best practices.
Strong debugging skills and experience debugging complex software stacks.

Preferred Skills And Qualifications

Experience in kernel development and/or testing.
Familiarity with parallel algorithms and distributed memory systems.
Experience in programming accelerators such as GPUs and FPGAs.
Familiarity with Machine Learning neural networks and frameworks such as TensorFlow and PyTorch.
Familiarity with HPC kernels and their optimization.

Distributed Software Engineer

Cerebras Systems · Bengaluru, Karnataka, India; Sunnyvale CA or Toronto Canada

Software Headquarters/Sunnyvale Office Toronto Office India Office Posted Apr 15, 2026

About The Role

Cerebras Systems is a pioneer in large-scale AI supercomputers. These multi-exaflop supercomputers are deployed in some of the biggest datacenters and are built using our Wafer-Scale Cluster technology, a cluster of several Wafer-Scale Engine (WSE) chips. The Cluster engineering team is responsible for delivering all cluster-related software.

Responsibilities

Automate bare-metal configuration of networking, OS, and application software in large clusters of Cerebras WSE, servers, and switches.
Additional push-button workflows for cluster upgrades, downgrades, and security patching, with key metrics to minimize downtime on clusters.
An orchestration and scheduler system for resource allocation, job submission, and placement in a multi-user environment on a cluster.
Seamless support for both on-premise and cloud mode deployment and operations.
A robust system for monitoring, detecting and handling failures for a variety of resources on the clusters (including High Availability of clusters).
Broad cluster and job monitoring and visualization capabilities, along with alerting systems.
User facing tools to monitor the status of jobs and collect metrics.
Administrator facing tools to manage and operate large clusters.

Skills & Qualifications

Strong track record of software architecture, system design and development.
Strong track record of development on distributed clusters.
Strong understanding of Kubernetes (K8s) software ecosystem, Prometheus and Grafana.
Strong development skills in GoLang, Python, bash.
Strong debugging skills with distributed systems.
Strong skills in developing tests for new features and regression tests for existing features.


Kernel Engineer

Cerebras Systems · Bengaluru, Karnataka, India

Software India Office Posted Apr 15, 2026

About The Role

As a Kernel Engineer on our team, you will work at the intersection of hardware and software, developing high-performance software for cutting-edge AI and HPC workloads. Your focus will be on implementing, optimizing, and scaling deep learning operations to fully leverage our custom, massively parallel processor architecture.

You will be part of a world-class team responsible for the design, performance tuning, and validation of foundational ML and HPC kernels. This includes building a library of parallel and distributed algorithms that maximize compute utilization and push the boundaries of training efficiency for state-of-the-art AI models. Your work will be critical to unlocking the full potential of our hardware and accelerating the pace of AI innovation.

Responsibilities

Develop design specifications for new machine learning and linear algebra kernels and map them to the Cerebras WSE system using various parallel programming algorithms.
Develop and debug a kernel library of highly optimized routines in low-level assembly and a custom C-like domain-specific language (CSL), implementing algorithms optimized for the Cerebras hardware system.
Use mathematical models and analysis to measure software performance and inform design decisions.
Develop and integrate unit and system testing methodologies to verify correct functionality and performance of kernel libraries.
Study emerging trends in machine learning applications and help evolve the kernel library architecture to address the computational challenges of state-of-the-art neural networks.
Interact with chip and system architects to optimize instruction sets, microarchitecture, and IO of next generation systems.

Skills & Qualifications

Bachelor’s, Master’s, PhD, or foreign equivalent in Computer Science, Computer Engineering, Mathematics, or a related field.
Proven experience leading technical teams, including mentoring engineers, setting technical direction, and driving execution.
Strong understanding of hardware architecture concepts and willingness to dive into new system architectures.
Proficiency in C++ and Python; experience with low-level systems programming.
Familiarity with library/API development best practices and performance optimization.
Excellent debugging skills across complex, layered software stacks.

Preferred Skills & Qualifications

Experience leading teams in kernel development, performance optimization, or low-level systems programming.
Strong background in parallel algorithms and distributed memory systems.
Hands-on experience with accelerators such as GPUs, FPGAs, or other custom hardware.
Familiarity with machine learning workloads and frameworks like TensorFlow and PyTorch.
Understanding of HPC kernels and strategies for optimizing them on modern architectures.


Infrastructure Hardware Technical Program Manager (Server and Network Systems)

Cerebras Systems · Sunnyvale CA or Toronto Canada

Software US and Canada Offices Posted Apr 15, 2026

As an Infrastructure Hardware Technical Program Manager (Server and Network Systems) on the Cluster Architecture Team, you will drive end-to-end delivery of server and network platform programs across Cerebras CS-3–based AI clusters — from requirements and vendor selection through lab bring-up, qualification, and production rollout. You will be the execution owner for multi-team programs spanning OEM/ODM partners, component vendors, internal software/runtime teams and architects, validation/QA, and deployment/operations.

This role is intentionally technical: you must understand server, network, and system-level trade-offs well enough to run effective technical reviews, keep programs grounded in real constraints, and maintain a crisp decision trail, while partnering closely with the Compute / Server / Network Platform Architects for detailed technical direction and sign-off. You will also build shared understanding with our rack/elevations and physical datacenter design partners so that server and network changes land smoothly in real deployments (without owning physical DC design).

Responsibilities

Own end-to-end program execution for server systems and network equipment in Cerebras clusters, including new platforms, refreshes, and major component/config changes.
Drive requirements gathering and convert inputs into executable plans with clear milestones, readiness gates, and cross-functional deliverables.
Represent Cluster Architecture in executive reviews, OKR cycles, and leadership/customer forums as needed.
Build and manage integrated schedules across vendors and internal teams, track dependencies, critical path, and risks.
Manage OEM/ODM and switch/vendor engagements (RFI/RFP, samples, escalations, roadmap alignment).
Partner with Compute / Server Platform / Network Architects to turn architectural decisions into qualification plans, acceptance criteria, and rollout strategies.
Lead qualification and release readiness (lab/staging validation, regression tracking, go/no-go decisions).
Own risk and change management into production, including versioning, rollout sequencing, and stakeholder communication.
Ensure operational readiness with deployment and fleet teams and maintain alignment with rack/physical DC owners on power, cooling, space, and cabling constraints.

Skills and Qualifications

B.S. or M.S. in Computer Science, Electrical/Computer Engineering, or equivalent experience.
8+ years in Technical Program Management (or similar delivery leadership) for server, network, or infrastructure platforms from concept through production.
Experience coordinating complex server and/or datacenter network programs across OEM/ODMs, switch vendors, and internal engineering teams.
Working knowledge of server architecture (CPU/NUMA, memory bandwidth, PCIe, NIC and storage IO) and enough networking fundamentals (leaf-spine fabrics, switch platforms, high-performance interconnects) to run effective technical reviews.
Familiarity with Linux server fleet management (provisioning, firmware/BIOS, drivers, field triage).
Strong multi-team program execution skills: integrated plans, risk management, dependency tracking, and executive-level communication.
Ability to operate in ambiguity and keep parallel server and network workstreams aligned.
Experience with AI/ML, HPC, or performance-sensitive distributed infrastructure is a plus.


Senior Runtime Engineer

Cerebras Systems · Sunnyvale CA or Toronto Canada

Software Headquarters/Sunnyvale Office Toronto Office Posted Apr 15, 2026

About The Role

We are building the next generation of large-scale AI systems that power training and inference workloads at unprecedented scale and efficiency.

You will design and develop high-performance distributed software that orchestrates massive compute and data pipelines across heterogeneous clusters. Your work will push the limits of concurrency, throughput, and scalability—enabling efficient execution of models at massive scale. This role sits at the intersection of systems engineering and machine learning performance, demanding both architectural depth and low-level implementation skills. You will help shape how models are executed and optimized end-to-end, from data ingestion to distributed execution, across cutting-edge hardware platforms.

We’re hiring for runtime roles across both Training and Inference.

Responsibilities

Design and implement distributed runtime components to efficiently manage large-scale execution workloads.
Develop and optimize high-performance data and communication pipelines that fully utilize CPU, memory, storage, and network resources.
Enable scalable execution across multiple compute nodes, ensuring high concurrency and minimal bottlenecks.
Collaborate closely with ML and compiler teams to integrate new model architectures, training regimes, and hardware-specific optimizations.
Diagnose and resolve complex performance issues across the software stack using profiling and instrumentation tools.
Contribute to overall system design, architecture reviews, and roadmap planning for large-scale AI workloads.

Skills & Qualifications

3+ years of experience developing high-performance or distributed system software.
Strong programming skills in C/C++, with expertise in multi-threading, memory management, and performance optimization.
Experience with distributed systems, networking, or inter-process communication.
Solid understanding of data structures, concurrency, and system-level resource management (CPU, I/O, and memory).
Proven ability to debug, profile, and optimize code across scales—from threads to clusters.
Bachelor’s, Master’s, or equivalent experience in Computer Science, Electrical Engineering, or related field.

Preferred Skills & Qualifications

Familiarity with machine learning training or inference pipelines, especially distributed training and large-model scaling.
Exposure to Python and PyTorch, particularly in the context of model training or performance tuning.
Experience with compiler internals, custom hardware interfaces, or low-level protocol design.
Prior work on high-performance clusters, HPC systems, or custom hardware/software co-design.
Deep curiosity about how to unlock new levels of performance for large-scale AI workloads.


Engineering Manager, Inference ML Runtime

Cerebras Systems · Sunnyvale CA or Toronto Canada

Software Headquarters/Sunnyvale Office Toronto Office Posted Apr 15, 2026

About the Role

The Inference ML Engineering team at Cerebras builds the runtime, APIs, and systems that power the fastest generative AI inference platform in the world.

As an Engineering Manager, Inference ML Runtime, you will lead a team responsible for designing and scaling the systems that enable seamless execution of state-of-the-art AI models on Cerebras hardware. You will operate at the intersection of machine learning, distributed systems, and high-performance runtime engineering, translating cutting-edge research into production-ready infrastructure to serve a variety of text-only and multimodal models.

This role combines technical leadership, people management, and execution ownership, with direct impact on Cerebras’ core inference platform.

What You’ll Do

Technical Leadership

Own the architecture and evolution of the ML inference runtime and serving systems.
Guide the design of:

high-throughput, low-latency inference pipelines;
multimodal model execution (text, image, audio, video);
scalable serving infrastructure for concurrent workloads.

Partner with cloud, compiler, core runtime, hardware, and ML teams to optimize end-to-end performance.

Team Leadership

Build, manage, and grow a team of ML systems and infrastructure engineers.
Provide technical direction, mentorship, and career development.
Foster a culture of ownership, velocity, and engineering excellence.
Recruit top talent in ML systems, distributed systems, and runtime engineering.

Execution & Delivery

Drive execution of complex, cross-functional initiatives across:

ML engineering;
compiler/runtime teams;
cloud and infrastructure teams.

Own delivery of features such as:

advanced inference capabilities (structured outputs, sampling strategies);
heterogeneous model types, including text and multimodal;
performance optimization (latency, throughput, memory efficiency);
observability and reliability across the inference stack.

Ensure high-quality releases through strong testing, validation, and operational rigor.

Platform & Performance Ownership

Scale Cerebras’ inference platform to handle large volumes of concurrent requests at very high speed.
Drive improvements in:

latency;
throughput;
compute efficiency.

Identify and prioritize technical debt and system bottlenecks.
Maintain Cerebras’ industry-leading inference speed advantage.

Cross-Functional Collaboration

Partner with:

ML researchers (model enablement);
compiler teams (model execution optimization);
cloud/platform teams (deployment and scaling).

Act as a bridge between research, infrastructure, and production systems.

What You Bring

Required

8+ years of experience in:

large-scale software engineering;
ML systems or distributed systems.

2+ years of engineering management experience.
Strong programming skills in:

Python (production systems);
C++ (performance-critical systems).

Experience building and scaling large-scale inference systems (LLMs or multimodal).
Experience working with cloud infrastructure and following best practices for building scalable microservices and applications.

Preferred

Experience with:

LLM serving frameworks (e.g., vLLM, TensorRT-LLM, SGLang);
PyTorch and deep learning frameworks;
distributed systems and high-performance computing.

Familiarity with:

ML runtime systems;
model execution pipelines;
performance optimization for AI workloads.

Why This Role Matters

This team is central to Cerebras’ mission of delivering the fastest AI inference in the world. Your work will directly enable real-time AI applications and unlock new capabilities across enterprise and frontier AI use cases.


Senior ML Systems Engineer

Cerebras Systems · Sunnyvale CA or Toronto Canada

Software US and Canada Offices Posted Apr 15, 2026

About the Role
We are seeking a versatile and experienced engineer to join our SOTA Training Platform team. This team is responsible for rapidly bringing up state-of-the-art open-source models (such as LLaMA and Qwen) or customer-provided proprietary models on our Cerebras CSX systems. Success in this role requires a systems-minded generalist who thrives in fast-paced bring-up environments and is comfortable working across the entire Cerebras software stack.
Your work will play a critical role in achieving unprecedented levels of performance, efficiency, and scalability for AI applications.

Responsibilities

Contribute to the end-to-end bring-up of ML models on Cerebras CSX systems.
Work across the stack: model architecture translation, graph lowering, compiler optimizations, runtime integration, and performance tuning.
Debug performance and correctness issues spanning model code, compiler IRs, runtime behavior, and hardware utilization.
Propose and prototype improvements across tools, APIs, or automation flows to accelerate future bring-ups.
Study emerging training and post-training algorithms and map them to the Cerebras software architecture and hardware.

Skills & Qualifications

Bachelor’s, Master’s, or PhD in Computer Science, Engineering, or a related field.
5+ years of relevant industry experience (internship/co-op experience included).
Comfort navigating the full AI toolchain: Python modeling code, compiler IRs, performance profiling, etc.
Strong debugging skills across performance, numerical accuracy, and runtime integration.
Experience with deep learning frameworks (e.g., PyTorch, TensorFlow) and familiarity with model internals (e.g., attention, MoE, diffusion).
Proficiency in C/C++ programming and experience with low-level optimization.
Proven experience in compiler development, particularly with LLVM and/or MLIR.
Strong background in optimization techniques, particularly those involving NP-hard problems.
Familiarity with large-scale ML systems and state-of-the-art algorithms, including model training and reinforcement learning.

What We Offer

Competitive salary and benefits package.
Opportunities for professional growth and career advancement.
A dynamic and innovative work environment.
The chance to work on cutting-edge technologies and make a significant impact on the future of AI.

This website or its third-party tools process personal data. For more details, click here to review our CCPA disclosure notice.

Senior ML Software Engineer - Integration & Quality

Cerebras Systems · Sunnyvale CA or Toronto Canada

Software Headquarters/Sunnyvale Office Toronto Office Posted Apr 15, 2026

About the Role

We are looking for a Software Engineer to join the ML Integration and Quality team at Cerebras. This team sits at the intersection of machine learning infrastructure, distributed systems, and hardware/software co-design.

In this role, you will help integrate and validate the software stack that powers the Cerebras AI platform, ensuring large-scale ML workloads run reliably and efficiently across our systems. You will work closely with engineers across runtime, compiler, kernel, and hardware teams to debug complex issues, improve automation, and strengthen the reliability of our AI infrastructure.

This is an excellent opportunity for engineers who enjoy working across the stack, debugging complex systems, and improving the reliability of large-scale AI platforms.

Responsibilities

Integrate and validate software components across the Cerebras AI platform.
Collaborate with engineers across ML runtime, compiler, kernel, and hardware teams to ensure reliable feature integration.
Investigate and debug complex issues across distributed systems and large-scale ML workloads.
Build automation tools and infrastructure to support integration testing, system validation, and debugging workflows.
Develop and maintain testbeds used to validate system performance and reliability.
Identify system bottlenecks, failure points, and edge cases that impact ML workload performance.
Contribute to test plans and validation strategies for new features and platform capabilities.
Improve observability, diagnostics, and debugging workflows across the ML software stack.
Work with product and engineering teams to ensure high-quality releases of the Cerebras inference platform.

Minimum Skills & Qualifications

~5 years of experience in software engineering, systems engineering, or infrastructure development.
Strong programming skills in Python, C++, Go, or similar languages.
Experience debugging complex systems or distributed software environments.
Familiarity with systems-level development, infrastructure tooling, or platform integration.
Experience building automation tools, testing frameworks, or internal developer tooling.
Strong problem-solving skills and the ability to investigate issues across multiple system layers.
Excellent communication and collaboration skills.

Preferred Skills

Experience working with machine learning infrastructure or ML model deployment.
Familiarity with LLM or multimodal model workloads.
Experience with distributed systems, cloud infrastructure, or large-scale compute clusters.
Exposure to performance debugging, profiling, or system observability tools.
Experience with microservices, containerized environments, or cluster orchestration.
Exposure to hardware accelerators, compilers, or ML frameworks.

Location

This role follows a hybrid schedule, requiring in-office presence 3 days per week. Please note, fully remote is not an option.
Office locations: Sunnyvale, CA or Toronto, ON.

Senior WAN Network Engineer

Cerebras Systems · Sunnyvale, CA

Software Headquarters/Sunnyvale Office Posted Apr 15, 2026

About The Role

We are seeking a highly skilled WAN Network Engineer to design, implement, manage, and optimize global connectivity. The ideal candidate will have strong experience with carrier networks, routing protocols, and network security, and will play a critical role in ensuring high availability, performance, and reliability of global network services.

Responsibilities

Design, deploy, and maintain the WAN across leased lines and dark fiber for low latency and 99.999% availability.
Collaborate with telecom providers and ISPs for circuit provisioning, upgrades, and issue resolution.
Configure, troubleshoot, and optimize security and routing protocols (IPSec Tunnels, MACsec, BGP, VXLAN, EVPN).
Monitor WAN performance, latency, packet loss, and capacity utilization. Analyze traffic patterns to predict growth and trigger circuit upgrades or hardware refreshes before bottlenecks occur.
Implement redundancy, failover, QoS, and traffic engineering to ensure business continuity.
Participate in network modernization and cloud connectivity projects (AWS, Azure, GCP).
Provide Tier 3 support for WAN-related incidents and root cause analysis.
Develop and maintain network documentation, diagrams, and standard operating procedures.
Use Python, Ansible, or Terraform to deploy configurations and manage network state at scale.
Support network security initiatives including site-to-site VPNs and perimeter connectivity. Ensure compliance with security, governance, and operational best practices.

Requirements

Bachelor’s degree in Computer Science, Electrical Engineering, or Computer Engineering. Master’s degree is preferred.
6+ years of experience in WAN network engineering, service provider networks, or hyperscale data center environments.
Industry certifications such as CCIE or JNCIE.
Strong knowledge of BGP routing protocols and WAN technologies.
Experience with major network vendors such as Arista, Cisco, Juniper, or Palo Alto.
Strong troubleshooting and analytical skills, and excellent communication and documentation skills.
Experience with cloud networking and hybrid network architectures.
Expertise with network automation tools (Python, Ansible, Terraform).
Knowledge of network monitoring tools (SolarWinds, Kentik, ThousandEyes, or similar).
Ability to participate in on-call rotation and after-hours maintenance.

Cluster UI Full Stack, Engineering Lead

Cerebras Systems · Bengaluru, Karnataka, India; Toronto, Ontario, Canada

Software Toronto Office India Office Posted Apr 15, 2026

About the Role

In this role, you will build a world-class, UI-based management portal for large-scale clusters. This portal will act as a one-stop shop for all operations and maintenance of Cerebras clusters, such as cluster bring-up and deployment (day 0/1/2), job management, and health management, to name a few. Cerebras AI clusters may have thousands of wafer-scale accelerator systems, several thousand high-end servers, and several thousand networking ports, including switches.

Responsibilities

Be the primary engineering face and owner of the UI and its integration with the backend, following standard best practices.
Partner closely with product management and end users to build a world-class tool.
Provide strong technical leadership for the development of this tool.
Work actively with the various engineering teams whose systems the backend interacts with.
Build a UI experience that is cohesive and seamless across all operations and maintenance activities.
Build and mentor a small team of engineers for this tool.

Skills & Qualifications

6+ years of demonstrated technical excellence in UI development and backend integration.
5+ years of professional software engineering experience with modern front-end frameworks such as React, Angular, or Vue.
5+ years of technical engineering experience with coding in languages including, but not limited to, C++, TypeScript, JavaScript, or Python.
2+ years of back-end development experience using technologies like Node.js, Python or Go with a proven track record of designing scalable APIs and microservices.
Expertise in HTML, CSS, JavaScript/TypeScript, and responsive design principles to deliver polished, accessible, and high-performance user interfaces.
Experience with cloud platforms such as AWS, Azure, or GCP.
Experience in CI/CD pipelines for deploying and maintaining production-grade applications.
Proven track record of delivering products and of launching and deploying solutions in production.
Excellent communication, articulation, collaboration, and stakeholder management skills.
Ability to make tough decisions backed by data and trade-off analysis.
Outstanding sense for product and user journeys; an out-of-the-box thinker.
Outstanding roadmap and schedule execution skills under tight timelines and budgets.

Staff FE Engineer - Inference

Cerebras Systems · Sunnyvale CA or Toronto Canada

Software Headquarters/Sunnyvale Office Toronto Office Posted Apr 15, 2026

About The Role

We’re hiring a staff level full-stack Technical Lead (L6/L7) to own and scale critical parts of the Cerebras Developer Console — the primary interface developers and enterprises use to run and manage inference workloads.

This is a deeply technical, end-to-end role. You’ll build high-quality frontend systems (Next.js, TypeScript) and design backend services (GraphQL, Postgres, Redis) that power usage tracking, billing, quotas, and observability. The systems you build will operate at high scale, require careful data modeling, and balance real-time and batch processing. You’ll be expected to make strong architectural decisions and move quickly from idea to production.

You’ll join an existing, high-velocity team and take ownership of major platform areas such as billing, request logs, and metrics. This is not a “ticket execution” role — you’ll define problems, drive technical direction, and lead execution across the stack. The work directly impacts customer experience and revenue, and the expectations are correspondingly high.

As a Technical Lead, you’ll set the bar for engineering quality and execution. You’ll mentor engineers, drive design reviews, and push the team toward simple, scalable solutions. We’re looking for someone who thrives in fast-moving environments, operates with urgency, and is comfortable navigating ambiguity while shipping high-quality systems.

What You’ll Do

Own a major area of the platform — take end-to-end responsibility for systems such as billing, usage tracking, quotas, request logs, or metrics.
Build and evolve core systems — design and implement APIs, services, and UI that power the Developer Console and scale with growing customer usage.
Make architectural decisions — define system boundaries, data models, and tradeoffs across real-time vs batch processing, performance, and cost.
Drive projects from 0 → 1 → scale — take ambiguous problems, define solutions, and deliver them to production.
Lead technical execution — break down work, align engineers, and ensure high-quality delivery across the stack.
Improve system reliability and visibility — ensure systems are observable, debuggable, and production-ready.
Partner with product and design — shape developer-facing workflows and experiences.

What We’re Looking For

Track record of ownership — you’ve led and delivered complex systems end-to-end, not just contributed components.
Technical depth in backend systems — strong fundamentals in APIs, data modeling, and distributed systems; experience with high-scale or real-time systems is a plus.
Full-stack capability — comfortable working across frontend and backend, with the ability to make pragmatic tradeoffs.
Strong technical judgment — you make sound architectural decisions and know when to optimize vs move fast.
Ability to operate without structure — you bring clarity to ambiguous problems and drive execution independently.
High standards for quality — you care about correctness, maintainability, and long-term system health.
Influence and mentorship — you elevate the team through design reviews, guidance, and leading by example.
Bias for action — you move quickly, unblock yourself and others, and focus on impact.
Experience in fast-moving environments — comfortable with shifting priorities and evolving scope.
Experience — typically 8+ years of industry experience building and operating production systems.
Education — Bachelor’s or master's in computer science (or equivalent practical experience).

Why Cerebras

Cerebras is redefining the speed and scale of AI inference. The systems we build power real-world production workloads, not demos.

Work on systems that matter — you’ll build core platform infrastructure that directly impacts how customers run and scale AI workloads in production.
Own meaningful surface area — this is not a narrow role. You’ll take ownership of critical systems (billing, usage, observability) that sit at the heart of the platform.
Operate at the intersection of product and infrastructure — the work spans developer experience, distributed systems, and real-time data, requiring both strong engineering and product thinking.
High impact, low bureaucracy — you’ll work in a small, high-performing team where decisions are made quickly and engineers have real influence on direction.
Grow with the platform — as we expand from cloud offerings into broader infrastructure and cluster management, the scope and technical challenges will continue to grow.
