Member of Technical Staff, Cloud Infrastructure

Fireworks AI · New York, NY; San Mateo, CA

Engineering New York San Mateo Posted May 3, 2026

About Us:

At Fireworks, we’re building the future of generative AI infrastructure. Our platform delivers the highest-quality models with the fastest and most scalable inference in the industry. We’ve been independently benchmarked as the leader in LLM inference speed and are driving cutting-edge innovation through projects like our own function calling and multimodal models. Fireworks is a Series C company valued at $4 billion and backed by top investors including Benchmark, Sequoia, Lightspeed, Index, and Evantic. We’re an ambitious, collaborative team of builders, founded by veterans of Meta PyTorch and Google Vertex AI.

The Role:

As a Software Engineer on our Cloud Infrastructure team, you'll architect and build the foundational systems that power Fireworks AI's generative AI platform. You'll spearhead the creation of one of the world's first virtual clouds, serving AI workloads across the globe and every cloud provider. Your mission: to deliver reliability, efficiency, and scalability for the world's most innovative AI products.

This is a highly technical role requiring deep expertise in distributed systems, cloud-native infrastructure, and machine learning platforms. You’ll work closely with engineering partners, product teams, and infrastructure stakeholders to design solutions that balance performance, cost-efficiency, and operational simplicity across compute, storage, and networking layers.

Key Responsibilities:

Architect and build scalable, resilient, and high-performance backend infrastructure to support distributed training, inference, and data processing pipelines.
Lead technical design discussions, mentor other engineers, and establish best practices for building and operating large-scale ML infrastructure.
Design and implement core backend services (e.g., job schedulers, resource managers, autoscalers, model serving layers) with a focus on efficiency and low latency.
Drive infrastructure optimization initiatives, including compute cost reduction, storage lifecycle management, and network performance tuning.
Collaborate cross-functionally with ML, DevOps, and product teams to translate research and product needs into robust infrastructure solutions.
Continuously evaluate and integrate cloud-native and open-source technologies (e.g., Kubernetes, Kubeflow, MLFlow) to enhance our platform’s capabilities and reliability.
Own end-to-end systems from design to deployment and observability, with a strong emphasis on reliability, fault tolerance, and operational excellence.

Minimum qualifications:

Bachelor’s degree in Computer Science, Engineering, or a related technical field (or equivalent practical experience).
5+ years of experience designing and building backend infrastructure in cloud environments (e.g., AWS, GCP, Azure).
Proven experience in ML infrastructure and tooling (e.g., PyTorch, TensorFlow, Vertex AI, SageMaker, Kubernetes, etc.).
Strong software development skills in languages like Python or C++.
Deep understanding of distributed systems fundamentals: scheduling, orchestration, storage, networking, and compute optimization.

Preferred qualifications:

Master’s or PhD in Computer Science or related field.
Experience leading infrastructure projects supporting large-scale ML/AI workloads or high-throughput systems.
Familiarity with infrastructure-as-code and CI/CD tooling (e.g., Terraform, ArgoCD, GitOps).
Track record of driving system performance, reliability, and cost-efficiency improvements.
Contributions to open-source cloud or ML infrastructure projects a plus.

Total compensation for this role also includes meaningful equity in a fast-growing startup, along with a competitive salary and comprehensive benefits package. Base salary is determined by a range of factors including individual qualifications, experience, skills, interview performance, market data, and work location. The listed salary range is intended as a guideline and may be adjusted.

Base Pay Range (Plus Equity)

$175,000—$220,000 USD

Why Fireworks AI?

Solve Hard Problems: Tackle challenges at the forefront of AI infrastructure, from low-latency inference to scalable model serving.
Build What’s Next: Work with bleeding-edge technology that impacts how businesses and developers harness AI globally.
Ownership & Impact: Join a fast-growing, passionate team where your work directly shapes the future of AI—no bureaucracy, just results.
Learn from the Best: Collaborate with world-class engineers and AI researchers who thrive on curiosity and innovation.

Fireworks AI is an equal-opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all innovators.

Member of Technical Staff, Software Engineer

Fireworks AI · San Mateo, CA

Engineering San Mateo Posted May 2, 2026

The Role:

You’ll be a core builder of the backend systems that power Fireworks:

Our main web application
Model and fine-tuning job orchestration
Billing and enterprise features
Accounts, org management, access controls
Policy enforcement and governance
APIs and developer tooling
And many other cool things!

This is platform engineering with product impact. Your systems will directly shape how customers build on top of AI. You’ll work closely with product, frontend, infra, and GTM to ship end-to-end features — not just tickets.

What You’ll Do

Design and build scalable backend services
Own major product surfaces from architecture to production
Improve reliability, performance, and developer experience
Work directly with customers to understand pain points
Ship enterprise-grade features without enterprise slowness
Use AI tooling aggressively — we expect you to automate yourself

You Might Be a Fit If

You like building real products, not just infrastructure for infrastructure’s sake
You enjoy driving initiatives across teams to get things done
You think in terms of systems, tradeoffs, and business impact
You care about developer experience and clean abstractions
You’re curious about AI and want to build where the future is going
You want ownership, not task lists

Minimum Qualifications

Comfortable collaborating with both humans and AI systems (Yes, We’re Serious)
5+ years of experience helping humans solve real problems
2+ years of working with AI — agents, bots, chats, or building on top of models

IT Engineer

Fireworks AI · San Mateo, CA

Engineering San Mateo Posted Apr 16, 2026

Role Description:

We’re looking for an IT Engineer who is obsessed with the AI movement and is always looking for opportunities to automate. In this role, you’ll handle technical support requests and help drive employee satisfaction by resolving issues quickly and effectively. You’ll play a key part in identifying automation opportunities, addressing IT issues and inquiries, and collaborating with department heads to catalog challenges that could be automated. This role is ideal for someone who thrives in a fast-paced, user-facing environment and enjoys learning something new every day.

Key Responsibilities:

Own and drive resolution for employee support requests, including managing incidents, and responding to inquiries in a timely, helpful manner.
Investigate and troubleshoot issues when reported by employees or triggered by a system alert by looking into logs and other internal tooling.
Triage technical issues to the Head of IT and engineering team.
Contribute to a knowledge base for the broader support team.
Proactively explore and implement process improvements, including opportunities for automation using AI tools.
Evaluate user needs, define technical problems, and work with various departments to escalate solutions in a timely manner.
Provide walk-up assistance to customers who may need in-person support.
Develop documentation to help improve the Service Management ecosystem.
Deploy new or refreshed computers and assist customers with post-install configurations.
Deploy end-user operating system and other third-party software updates to end-user computing systems.
Cross-train with teams at the next levels to develop specializations.
Assist with the management and maintenance of IT inventory and asset tracking.
Perform installs, moves, adds, and changes of all approved end-user hardware, software, and applications for local and remote personnel.
Participate in projects, including technology refresh, deployments, and office build-outs.
Provide remote support to resolve issues reported from other locations.
Support the Infrastructure team with onsite troubleshooting for network and infrastructure-related issues.
Support and troubleshoot meeting room technology and AV equipment, ensuring smooth operation for meetings and presentations.
Provide support for printing-related issues, coordinating with the print service provider to ensure reliable and seamless printing services for end users.
Provide after-hours support as needed for high-severity situations.
Perform other duties assigned by leadership.

Minimum Qualifications:

5+ years of startup experience building and supporting IT systems.
3+ years of experience as a power user in the Identity and Access Management space (e.g., Okta, Ping).
3+ years of experience in Google Workspace, Microsoft Entra, or similar.
3+ years of experience in client-side engineering tools such as Jamf, Intune, Kandji, JumpCloud, or similar.
Basic scripting experience.
Strong passion for the AI field and basic knowledge of how modern AI tools are used in IT.
Strong technical acumen with the ability to understand, investigate, and resolve customer issues.
Experience performing technical troubleshooting, including collecting debugging information and triaging problems.
Excellent communication skills and high degree of customer empathy.
Comfort working in a fast-paced startup environment with evolving processes and responsibilities.

Preferred Qualifications:

Experience supporting AI platforms.
5+ years of experience supporting complex IT systems.
Strong technical and computer skills to resolve software and hardware issues.
Strong customer service, interpersonal skills and the ability to interact with all levels of staff.
Strong work ethic and eagerness to produce high quality, accurate results.
Ability to hold sensitive information with a high level of confidentiality and integrity.
Ability to communicate and present ideas in a clear, concise and professional manner both verbally and in writing.
Ability to proactively solve problems and apply innovative solutions.
Ability to work and collaborate in a team environment, and ability to work independently and prioritize work.
Ability to work on multiple projects at the same time.
Ability to effectively meet deadlines at expected quality.
Ability to lift IT equipment such as monitors and servers, weighing up to 20 pounds.
Ability to bend and/or stoop to access cables under desk or in server racks.
Ability to read small print, recognize color-coded cables, and distinguish sounds or alarms.
Travel may be required.

Solutions Architect

Fireworks AI · New York, NY; San Mateo, CA

Engineering New York San Mateo Posted Apr 15, 2026

In the last few months alone we've launched the Fireworks Training Platform, partnered with Microsoft Azure Foundry, and published research straight from our production systems which is helping scale some of the most innovative companies and products of our generation.

As an SA you'll be close to all of it. The customer conversations you lead directly feed our roadmap, and the work you do shows up in what we build and publish next. A few examples of what that looks like in practice:

Frontier RL is cheaper than the mega-cluster narrative suggests — we ran cross-region rollouts using 98% sparse weight deltas and published what we learned
Training-inference parity in MoE models — kernel fusions that are mathematically equivalent can still drift numerically; we shipped the fixes across Kimi K2.5 and Qwen3.5-MoE
The fine-tuning bottleneck isn't the algorithm — integration friction and iteration speed are what actually stall teams; we documented the patterns across dozens of customer engagements

If you want to work on hard infrastructure problems, be close to the customers pushing the frontier, and actually see your work ship, come work with us!

The Role:

Solutions Architects at Fireworks are the technical and strategic owners of the customer relationship from the first discovery call through to production. You'll work with some of the most ambitious engineering teams in the world, translating complex business problems into concrete AI solutions built on the Fireworks platform.

This is a role that demands both technical depth and strong people skills. You'll need to earn the trust of ML engineers and VPs in the same meeting, scope and execute POCs without losing sight of the customer's definition of success, and know enough about inference, fine-tuning, and model architecture to make credible recommendations under pressure.

We hire SAs across two tracks. Both require strong technical grounding and sharp customer instincts; the difference is where each track places its emphasis.

Enterprise SA Track

Works with digital native and large organizations — navigating multiple stakeholders, procurement cycles, and executive relationships
Heavy emphasis on executive presence: equally comfortable presenting to a CTO and debugging a latency issue with an ML engineer
Leads complex technical sales: discovery, solution design, POC execution, commercial negotiation
Owns the account relationship end-to-end, including expansion and renewal
Strong commercial instincts: understands how to build a business case and close large deals

Applied AI Track

Works with high-velocity accounts and technology partners: startups, ISVs, and hyperscaler ecosystems
Heavier emphasis on technical execution: more time in the code, building integrations, running enablements
Faster iteration cycles with less org navigation, focused on shipping working solutions quickly
Embeds with partner engineering teams to enable their AI practices and build joint solutions
Comfortable operating across engineering, partnerships, and sales simultaneously

What You'll Work On:

Regardless of track, SAs at Fireworks own a consistent set of responsibilities:

Technical Discovery & Solution Design

Lead structured discovery conversations to unpack customer pain points, constraints, and success criteria before proposing solutions
Design end-to-end architectures for GenAI applications covering model selection, inference configuration, RAG design, and fine-tuning strategy

POC Scoping & Execution

Define what a minimal, compelling proof-of-concept looks like and own it through to delivery. Prioritize and stack rank opportunities: manage scope creep, set realistic timelines, and keep the customer aligned on what success looks like
Work alongside product and engineering teams to execute technically rigorous POCs

Performance Engineering

Run inference sweeps and establish performance baselines for customer workloads
Create and configure deployments tuned to specific latency, throughput, and cost targets

Fine-Tuning & Model Recommendations

Guide customers on fine-tuning strategy and model recommendations: when to use SFT, DPO, or RFT, and which model family fits their use case
Build and run fine-tuning pipelines directly for customers
Evaluate model quality and help customers build robust eval pipelines

Account Ownership & Stakeholder Management

Own the technical relationship across the account: from champion to executive sponsor
Navigate complex organizations, build trust at multiple levels, and maintain momentum through long sales cycles
Feed customer signal (deployment patterns, pain points, feature gaps) back into the product roadmap

What We're Looking For

5+ years in a technical, customer-facing role: Solutions Architect, Sales Engineer, Forward Deployed Engineer, customer-facing AI Engineer / Data Scientist, or equivalent
Hands-on experience with the LLM stack: inference trade-offs, fine-tuning methodologies (SFT, RFT, DPO), and deploying models at scale
Strong Python skills: comfortable reading, writing, and debugging production code
Exceptional communication: able to run a sharp discovery call, present to a VP, and explain reinforcement learning to an ML engineer in the same afternoon
Experience with cloud infrastructure (AWS, Azure, GCP) and model serving at scale

Member of Technical Staff, Data Platform Engineer

Fireworks AI · San Mateo, CA

Engineering San Mateo Posted Apr 7, 2026

The Role

We are looking for a Data Platform Engineer who specializes in Order-to-Cash (OTC) Revenue Transformation and AI Application Enablement to own and evolve the end-to-end billing, revenue, and business data pipeline, from usage metering and invoice generation through revenue recognition and financial reporting. You will sit at the intersection of Engineering, Finance, and Data, ensuring every dollar of usage across our five revenue streams is accurately captured, billed, recognized, and reconciled.

This is a high-impact, cross-functional role. You will work hands-on with our billing platform (Orb, etc.), accounting systems, data warehouse (BigQuery), and cloud marketplaces (AWS, GCP), and ultimately help design AI-enabled workflow agents that automate reconciliation, anomaly detection, and revenue operations once the core data infrastructure is hardened.

What You'll Do

Phase 1 – Platform & Data Foundation

Own and enhance billing infrastructure: pricing models, usage ingestion, invoicing, and revenue workflows.
Resolve key platform gaps (pricing flexibility, account hierarchy, overages, prepaid/credit logic).
Integrate billing with ERP, payments, and cloud marketplaces for an automated invoice-to-ledger pipeline.
Implement deferred revenue and prepaid amortization across all billing models.
Build and maintain an end-to-end OTC data pipeline (usage → billing → payments → revenue → GL → reporting).
Establish authoritative data models and ensure transaction-level reconciliation across all systems.
Implement data quality, auditability, and SOX-ready controls.
Strengthen CRM → Billing → ERP integration as a single source of truth.
Automate journal entries, AR sub-ledger, and revenue postings; integrate payments and marketplace settlements.

Phase 2 – Autonomous Intelligence

Build and deploy autonomous enterprise agents to automate and augment OTC operations, including anomaly detection, reconciliation, revenue recognition, collections, contract interpretation, and forecasting.

What We're Looking For

Preferred

5+ years in billing engineering, revenue systems, or order-to-cash operations at a SaaS or usage-based platform company.
Experience with billing systems, ERP, AWS, Kubernetes, etc.
Strong SQL and BigQuery proficiency — you can design schemas, write complex analytical queries, build dbt models, and maintain production data pipelines.
Working knowledge of accounting systems (QuickBooks or NetSuite) and the ability to map billing events to GL journal entries, manage sub-ledger reconciliation, and support month-end close.
Experience with payment platforms including payment processing, dunning, refunds, and cash application.
Familiarity with cloud marketplace billing — AWS Marketplace CPPO/SaaS contracts, GCP Marketplace, or Azure Marketplace private offers and settlement reporting.
Proficiency in Python or Node.js for building integrations, data transforms, and automation scripts.

Experience building LLM-powered agents or automation workflows — using frameworks like LangChain, or custom tool-calling architectures.
Background in GPU compute or AI infrastructure billing — understanding of compute-hour metering, token-based pricing, and capacity reservation models.
Experience with ERP migration projects (e.g., QuickBooks to NetSuite).

Total compensation for this role also includes meaningful equity in a fast-growing startup, along with a competitive salary and comprehensive benefits package. Base salary is determined by a range of factors including individual qualifications, experience, skills, interview performance, market data, and work location. The listed salary range is intended as a guideline and may be adjusted.

Base Pay Range (Plus Equity)

$175,000—$220,000 USD

Member of Technical Staff, Evals & Post-Training Product

Fireworks AI · San Mateo, CA

Engineering San Mateo Posted Mar 24, 2026

We are seeking a Member of Technical Staff, Evals & Post-Training Product to help define how developers improve models on Fireworks. This role sits at the intersection of product engineering, developer experience, and model quality.

You will build the products and workflows that connect evaluation and post-training into a continuous loop: helping internal teams run evals at scale, enabling external developers through our open-source Eval Protocol SDK, and owning key product experiences for fine-tuning custom models on Fireworks.

You will work across the stack—from APIs, SDKs, and backend systems to user-facing product surfaces in the web app—to make it easier for users to author evals, understand results, fine-tune models, and iterate quickly. You will also work directly with customers and internal teams to identify friction, support real-world use cases, and turn repeated pain points into reusable product capabilities.

Key Responsibilities:

Build internal eval workflows: Design and scale evaluation tooling used by internal teams to measure model quality, compare model changes, and inform post-training decisions.
Own fine-tuning product experiences: Build and improve user-facing product workflows for post-training, including fine-tuning experiences across SFT, RFT, and related model-improvement capabilities.
Work closely with users: Partner with customers and internal stakeholders to understand evaluation and fine-tuning needs, support high-priority engagements, triage issues, and convert bespoke workflows into productized solutions.

Minimum Requirements:

1-7 years of software engineering experience (we are hiring at multiple levels for this role).
Hands-on experience with LLM evaluations and/or post-training methods: How to design useful evals and use their results to guide model improvement.
Product Engineering Skills: The ability to work across backend systems and developer-facing product surfaces. Comfortable shipping full-stack features when needed.
Understanding of the GenAI Lifecycle: You understand the end-to-end workflow—from prompting a base model to curating a dataset, fine-tuning, and productionizing agents—and how these steps interconnect.
User-Centric Mindset: Willing to talk to users, triage GitHub issues for open-source projects, and build products from scratch to serve emerging needs.

Preferred Qualifications:

3+ years of software engineering experience.
Domain-Specific Evaluation Experience: Strong familiarity with designing and running evaluations for domain-specific use cases (e.g. medical, legal, coding, or custom internal datasets).
Open Source Contributions: Prior contributions to developer tools or AI/ML repositories.
Inference & Hardware Knowledge: Interest in the hardware side of AI—understanding GPU constraints, inference optimization techniques, and how they relate to model performance.
Startup DNA: Experience in fast-paced environments where you own features end-to-end.

Member of Technical Staff, Performance Optimization

Fireworks AI · San Mateo, CA

Engineering San Mateo Posted Mar 5, 2026

The Role:

We're looking for a Software Engineer focused on Performance Optimization to help push the boundaries of speed and efficiency across our AI infrastructure. In this role, you'll take ownership of optimizing performance at every layer of the stack—from low-level GPU kernels to large-scale distributed systems. A key focus will be maximizing the performance of our most demanding workloads, including large language models (LLMs), vision-language models (VLMs), and next-generation video models.

You’ll work closely with teams across research, infrastructure, and systems to identify performance bottlenecks, implement cutting-edge optimizations, and scale our AI systems to meet the demands of real-world production use cases. Your work will directly impact the speed, scalability, and cost-effectiveness of some of the most advanced generative AI models in the world.

Key Responsibilities:

Optimize system and GPU performance for high-throughput AI workloads across training and inference
Analyze and improve latency, throughput, memory usage, and compute efficiency
Profile system performance to detect and resolve GPU- and kernel-level bottlenecks
Implement low-level optimizations using CUDA, Triton, and other performance tooling
Drive improvements in execution speed and resource utilization for large-scale model workloads (LLMs, VLMs, and video models)
Collaborate with ML researchers to co-design and tune model architectures for hardware efficiency
Improve support for mixed precision, quantization, and model graph optimization
Build and maintain performance benchmarking and monitoring infrastructure
Scale inference and training systems across multi-GPU, multi-node environments
Evaluate and integrate optimizations for emerging hardware accelerators and specialized runtimes

Minimum Qualifications:

Bachelor’s degree in Computer Science, Computer Engineering, Electrical Engineering, or equivalent practical experience
5+ years of experience working on performance optimization or high-performance computing systems
Proficiency in CUDA or ROCm and experience with GPU profiling tools (e.g., Nsight, nvprof, CUPTI)
Familiarity with PyTorch and performance-critical model execution
Experience with distributed system debugging and optimization in multi-GPU environments
Deep understanding of GPU architecture, parallel programming models, and compute kernels

Preferred Qualifications:

Master’s or PhD in Computer Science, Electrical Engineering, or a related field
Experience optimizing large models for training and inference (LLMs, VLMs, or video models)
Knowledge of compiler stacks or ML compilers (e.g., torch.compile, Triton, XLA)
Contributions to open-source ML or HPC infrastructure
Familiarity with cloud-scale AI infrastructure and orchestration tools (e.g., Kubernetes)
Background in ML systems engineering or hardware-aware model design

Example projects:

Implement fully asynchronous low-latency sampling for large language models integrated with structured outputs
Implement GPU kernels for a new low-precision scheme and run experiments to find the optimal speed-quality tradeoff
Build a distributed router with a custom load-balancing algorithm to optimize LLM cache efficiency
Define metrics and build a harness for finding the optimal performance configuration (e.g., sharding, precision) for a given class of models
Determine and implement in PyTorch an optimal sharding scheme for a novel attention variant
Optimize communication patterns in RDMA networks (InfiniBand, RoCE)
Debug numerical instabilities that affect a small portion of requests for a given model when deployed at scale

Total compensation for this role also includes meaningful equity in a fast-growing startup, along with a competitive salary and comprehensive benefits package. Base salary is determined by a range of factors including individual qualifications, experience, skills, interview performance, market data, and work location. The listed salary range is intended as a guideline and may be adjusted.

Base Pay Range (Plus Equity)

$175,000—$220,000 USD

Member of Technical Staff, Research

Fireworks AI · San Mateo, CA

Engineering San Mateo Posted Mar 5, 2026

The Role:

As a Member of Technical Staff on the Research team, you’ll push the boundaries of generative AI, advancing LLMs and multimodal systems through foundational research. Your work will enhance model efficiency, accuracy, and scalability, directly shaping our high-performance AI infrastructure. You'll collaborate with top experts in deep learning, distributed systems, and optimization to bring cutting-edge research into real-world applications. You'll also have the opportunity to shape how some of the world’s leading companies build and deploy AI through the models and tools you help create.

Key Responsibilities:

Conduct foundational research to advance the capabilities, efficiency, and reliability of LLMs and multimodal systems
Design, implement, and evaluate novel model architectures, training methods, and optimization techniques
Collaborate with engineering teams to transition research prototypes into production-grade systems
Analyze empirical results, identify performance bottlenecks, and iterate quickly to improve model quality
Contribute to internal research strategy by identifying high-impact opportunities and emerging trends in AI

Minimum Qualifications:

Research background in Artificial Intelligence, Machine Learning, Physics, or similar field
Experience solving analytical problems using quantitative approaches
Experience communicating research to audiences with different backgrounds
Experience coding in C/C++, Python, or other similar languages

Preferred Qualifications:

PhD degree in Computer Science, Computational Physics, Mathematics, or a similar field
Research and engineering experience demonstrated via grants, fellowships, patents, internships, work experience, and/or coding competitions
Experience having first-authored publications at peer-reviewed conferences or journals
Experience working with ML frameworks such as PyTorch, TensorFlow, or JAX

Total compensation for this role also includes meaningful equity in a fast-growing startup, along with a competitive salary and comprehensive benefits package. Base salary is determined by a range of factors including individual qualifications, experience, skills, interview performance, market data, and work location. The listed salary range is intended as a guideline and may be adjusted.

Base Pay Range (Plus Equity)

$175,000—$240,000 USD

Software Engineer, AI Infrastructure

Fireworks AI · New York, NY; San Mateo, CA

Engineering San Mateo Posted Mar 5, 2026

The Role:

As a Software Engineer on our AI Infrastructure team, you will help design the core systems that power Fireworks AI’s generative AI platform. You will help build infrastructure and tools that ensure the reliability, performance, quality, and availability of our AI system.

Our mission is to make Fireworks AI the most reliable and user-friendly generative AI platform in the world. You will partner closely with our cloud infrastructure team, product team, and performance team to deliver infrastructure that bridges the gap between our customers and the ultra-performant proprietary Fireworks inference engine.

Key Responsibilities:

Contribute to the design and development of scalable backend infrastructure that supports distributed training, inference, and data pipelines
Build and maintain core backend services such as the LLM CI/CD pipeline, control plane, and model serving systems
Support performance optimization, cost efficiency, and reliability improvements across compute, storage, and networking layers
Build frameworks and safeguards to ensure Fireworks AI has the best model quality in the industry
Collaborate with performance, training, and product teams to translate research and product needs into infrastructure solutions
Participate in code reviews, technical discussions, and continuous integration and deployment processes

Minimum Qualifications:

Bachelor’s degree in Computer Science, Engineering, or a related technical field (or equivalent practical experience)
3 years of experience in software engineering, with a focus on infrastructure or machine learning systems
Strong programming skills in Python, Go, or a similar language
Proven experience in ML infrastructure and tooling (e.g., PyTorch, MLflow, Vertex AI, SageMaker, Kubernetes)
Basic understanding of LLM concepts (e.g., context length, disaggregated prefill, KV cache memory estimation)

Preferred Qualifications:

5+ years of experience in software engineering, with a focus on infrastructure or machine learning systems
Experience with open-source inference engines such as vLLM, SGLang, or TRT-LLM
Contributions to open-source infrastructure or ML projects
Experience in building large-scale ML/MLOps infrastructure

Member of Technical Staff, AI Training Infrastructure

Fireworks AI · San Mateo, CA

Engineering New York San Mateo Posted Mar 5, 2026

The Role:

As a Training Infrastructure Engineer, you'll design, build, and optimize the infrastructure that powers our large-scale model training operations. Your work will be essential to developing high-performance AI training infrastructure. You'll collaborate with AI researchers and engineers to create robust training pipelines, optimize distributed training workloads, and ensure reliable model development.

Key Responsibilities:

Design and implement scalable infrastructure for large-scale model training workloads
Develop and maintain distributed training pipelines for LLMs and multimodal models
Optimize training performance across multiple GPUs, nodes, and data centers
Implement monitoring, logging, and debugging tools for training operations
Architect and maintain data storage solutions for large-scale training datasets
Automate infrastructure provisioning, scaling, and orchestration for model training
Collaborate with researchers to implement and optimize training methodologies
Analyze and improve efficiency, scalability, and cost-effectiveness of training systems
Troubleshoot complex performance issues in distributed training environments

Minimum Qualifications:

Bachelor's degree in Computer Science, Computer Engineering, or related field, or equivalent practical experience
3+ years of experience with distributed systems and ML infrastructure
Experience with PyTorch
Proficiency in cloud platforms (AWS, GCP, Azure)
Experience with containerization and orchestration (Docker, Kubernetes)
Knowledge of distributed training techniques (data parallelism, model parallelism, FSDP)

Preferred Qualifications:

Master's or PhD in Computer Science or related field
Experience training large language models or multimodal AI systems
Experience with ML workflow orchestration tools
Background in optimizing high-performance distributed computing systems
Familiarity with ML DevOps practices
Contributions to open-source ML infrastructure or related projects

Total compensation for this role also includes meaningful equity in a fast-growing startup, along with a competitive salary and comprehensive benefits package. Base salary is determined by a range of factors including individual qualifications, experience, skills, interview performance, market data, and work location. The listed salary range is intended as a guideline and may be adjusted.

Base Pay Range (Plus Equity)

$175,000—$220,000 USD

Applied Machine Learning Engineer

Fireworks AI · New York, NY; San Mateo, CA

Engineering New York San Mateo Posted Mar 5, 2026

The Role:

As an Applied Machine Learning Engineer, you will serve as a vital bridge between cutting-edge AI research and practical, real-world applications. Your work will focus on developing, fine-tuning, and operationalizing machine learning models that drive business value and enhance user experiences. This is a hands-on engineering role that combines deep technical expertise with a strong customer focus to deliver scalable AI solutions.

Key Responsibilities:

Customer Success: Collaborate directly with the GTM team (Account Executives and Solutions Architects) to ensure smooth integration and successful deployment of ML solutions.
Demo / Proof of Concept (PoC): Build and present compelling PoCs that demonstrate the capabilities of our AI technology.
Application Build: Design, develop, and deploy end-to-end AI-powered applications tailored to customer needs.
Platform Features / Bug Fixes: Contribute to the internal ML platform, including adding features and resolving issues.
New Model Enablements: Integrate and enable new machine learning models into the existing platform or client environments.
Performance Optimizations: Improve system performance, efficiency, and scalability of deployed models and applications.
Partnership Enablement: Work closely with partners to enable joint AI solutions and ensure seamless collaboration.

Minimum Qualifications:

Bachelor’s degree in Computer Science, Engineering, or a related technical field.
5+ years of experience in a software engineering role, with a strong preference for customer-facing roles.
Strong coding skills, preferably in Python.
Demonstrated ability to lead and execute complex technical projects with a focus on customer success.
Strong interpersonal and communication skills; ability to thrive in dynamic, cross-functional teams.

Preferred Qualifications:

Master’s degree in Computer Science, Engineering, or a related technical field.
Experience working in a startup or fast-paced environment.
Hands-on experience fine-tuning machine learning models, including supervised fine-tuning (SFT), reinforcement learning from human feedback (RLHF), or reinforcement fine-tuning (RFT).
Solid understanding of generative AI, machine learning principles, and enterprise infrastructure.

Total compensation for this role also includes meaningful equity in a fast-growing startup, along with a competitive salary and comprehensive benefits package. Base salary is determined by a range of factors including individual qualifications, experience, skills, interview performance, market data, and work location. The listed salary range is intended as a guideline and may be adjusted.

Base Pay Range (Plus Equity)

$170,000—$240,000 USD

Support Engineer

Fireworks AI · San Mateo, CA

Engineering San Mateo Posted Mar 5, 2026

Role Description:

We’re looking for a Technical Support & Community Engineer to be the frontline connection between our platform and its users. In this role, you’ll handle technical support requests, manage our developer community (including Discord), and help drive customer satisfaction by resolving issues quickly and effectively. You’ll play a key part in identifying sales and product opportunities, addressing customer issues and inquiries, and collating feedback for our product and engineering teams. This role blends technical troubleshooting with community management and is ideal for someone who thrives in a fast-paced, user-facing environment.

Key Responsibilities:

Own and drive resolution for customer support requests, including managing the Discord community, handling customer communication around active issues and incidents, and responding to inquiries in a timely, helpful manner.
Identify and surface sales, partnership, and product opportunities discovered through support conversations when appropriate.
Investigate and troubleshoot customer-reported issues using logs and other internal tooling.
Triage product issues to the product and engineering teams.
Contribute to a knowledge base for the broader support team.
Coordinate access to HIPAA, GDPR and other compliance-related documentation as needed.
Proactively explore and implement process improvements, including opportunities for automation using AI tools.

Minimum Qualifications:

2+ years of experience in a forward-deployed or customer-facing technical role.
Basic Python and scripting experience.
Strong passion for the AI field and basic knowledge of how modern AI systems are built.
Strong technical acumen with the ability to understand, investigate, and resolve customer issues.
Experience performing technical troubleshooting, including collecting debugging information and triaging problems.
Excellent communication skills and a high degree of customer empathy.
Comfort working in a fast-paced startup environment with evolving processes and responsibilities.

Preferred Qualifications:

Bachelor's degree in computer science or equivalent
Experience supporting AI/ML platforms
Prior experience in community management (e.g., Discord, GitHub, or forums)

Security Engineer

Fireworks AI · San Mateo, CA

Engineering San Mateo Posted Mar 5, 2026

The Role:

Security is the foundation of trust in AI systems. As a Security Engineer at Fireworks AI, you will play a key role in designing, implementing, and operating security controls across AI infrastructure, AI platforms, and internal systems. You will work closely with multiple teams to strengthen our security posture and support our rapid growth. As more organizations rely on large language models and cloud-native AI services, ensuring the confidentiality, integrity, and availability of data, models, and infrastructure is paramount. This role plays a critical part in building that trust by designing and embedding security across the layers of our technology stack.

Key Responsibilities:

Design and build security-focused software and platform capabilities to protect customer data, models, and services across our multi-cloud infrastructure, including encryption, identity and access management, secure API gateways, secure model execution, and sandboxing strategies.
Perform security reviews of cloud-native architectures—including Kubernetes clusters, multi-cloud workloads, and distributed data stores—and build integrated systems for continuous security monitoring, anomaly detection, and automated response.
Embed security into CI/CD pipelines using a DevSecOps approach, implementing automated scanning, policy enforcement, and secure-by-default build and deployment workflows.
Apply a build-over-buy philosophy by designing and developing in-house security tooling and automation where it provides better control, scalability, and integration than off-the-shelf solutions.
Build and operate a comprehensive vulnerability management program, partnering with various teams to remediate risks across applications, containers, cloud infrastructure, and dependencies.
Operate and continuously improve security operations, including detection engineering, alert triage, incident response, and continuous improvement through post-incident reviews.
Participate in red/blue team exercises, tabletop simulations, and post-incident root cause analysis to strengthen security resilience.
Embed compliance and regulatory controls into infrastructure and product layers (e.g., SOC 2, ISO 27001, ISO 42001, HIPAA, PCI-DSS, GDPR).

Minimum Qualifications:

3 to 7 years of experience in software engineering or security engineering with a strong focus on security, infrastructure, or cloud-native systems.
Proficient in Python and/or Go with experience in designing production-grade systems.
Strong understanding of cloud-native architectures using GCP, particularly in the areas of network segregation, authentication, authorization, encryption, data protection, intrusion detection, and cloud-specific security benchmarks.
Hands-on experience with Kubernetes, Docker, and containerized production environments; deep knowledge of Kubernetes internals and native security controls is a strong plus.
Familiarity with security tooling in managed CI/CD environments (e.g., GitHub Actions, Harness, CircleCI).
Solid experience working in Linux environments, including system administration, debugging, and automation via command-line tooling.
Familiarity with modern identity and access controls (SAML, OAuth, OIDC, SSO, RBAC/ABAC).

Preferred Qualifications:

Experience designing secure multi-cloud deployments and zero-trust architectures.
Experience designing, operating, and securing large-scale Kubernetes platforms, including control plane security, node hardening, and multi-tenant isolation.
Experience designing, operating, and securing large-scale multi-cloud platforms across AWS, GCP, Azure, Oracle Cloud, and GPU-as-a-service cloud providers.
Proficiency with infrastructure-as-code using Terraform and Python, including experience building modular policy-as-code frameworks.
Strong understanding of data protection techniques, including encryption at rest/in transit, tokenization, key management, and confidential computing.
Experience integrating security into microservice architectures, service meshes, and distributed systems.
Hands-on experience securing LLM/ML platforms, model inference infrastructure, GPU clusters, or data labeling pipelines.
Experience designing detection engineering pipelines across cloud audit logs, network telemetry, and application signals.
Experience building large-scale IAM and PAM platforms using least-privilege, workload identity, and just-in-time access.
Familiarity with container image security and vulnerability remediation, SBOM generation, and software supply chain security.
Experience building, implementing and operating security automation platforms for incident response and security operations.
Familiarity with compliance tooling and frameworks (e.g., Vanta, SOC 2, ISO 27001, ISO 42001, PCI-DSS).

Member of Technical Staff, Cluster Management

Fireworks AI · San Mateo, CA

Engineering San Mateo Posted Mar 5, 2026

The Role:

As a Member of Technical Staff, Cluster Management at Fireworks AI, you will play a critical role in making our world-scale virtual AI cloud reliable, performant, and efficient. You will apply your expertise in large-scale distributed systems, cloud infrastructure, and operational excellence. You will partner closely with world-class software engineers and AI experts to scale cutting-edge AI platforms to meet fast-growing demand and ever-evolving application paradigms. This role is for someone passionate about operating highly robust, observable, and automated systems and enabling customer success.

Key Responsibilities:

Ensuring System Reliability: Ensure systems are designed and implemented with high availability, scalability, and performance. Focus on fault tolerance, disaster recovery, identifying and removing scaling bottlenecks, and performance optimization across our multi-cloud infrastructure.
Incident Management & Response: Lead efforts in incident detection, response, and resolution for critical production issues. Drive post-mortems to identify root causes and implement preventative measures to improve system reliability.
Observability & Monitoring: Develop, implement, and maintain comprehensive monitoring, alerting, logging, and tracing solutions to provide deep insights into system health and performance.
Automation & Toil Reduction: Identify and automate repetitive operational tasks to reduce toil and improve operational efficiency. Develop tools and scripts to streamline deployments, scaling, and system management.
Capacity Planning & Performance Tuning: Work proactively on capacity planning to ensure our infrastructure can gracefully handle growth and peak loads. Optimize system performance and resource utilization.
Reliability Best Practices: Collaborate with software engineers to embed reliability principles (e.g., SLOs, SLIs, error budgets) into the development lifecycle, promoting a culture of operational excellence.
On-call Rotation: Participate in a periodic on-call rotation to support our production environment and respond to critical alerts.

Minimum qualifications:

Bachelor's degree in Computer Science or a related technical field, or equivalent practical experience.
5+ years of experience in Site Reliability Engineering, DevOps, or a similar role focused on large-scale production systems.
Deep expertise in SRE principles and practices, including SLOs, SLIs, operational automation, incident management, and post-mortems.
Extensive hands-on experience with public cloud platforms (AWS, GCP, Azure), including compute, networking, storage, and database services.
Strong experience with containerization technologies (Docker) and orchestration platforms (Kubernetes).
Proficiency in designing and implementing robust monitoring, logging, alerting, and distributed tracing systems using tools such as Prometheus, Grafana, and the ELK stack.
Solid programming/scripting skills in at least one language (e.g., Python, Go) for automation and tool development.
In-depth knowledge of Linux operating systems, networking fundamentals, and system debugging.
Proven ability to troubleshoot complex issues across the entire stack.
Excellent communication, collaboration, and problem-solving skills.
Willingness to participate in on-call rotations.

Preferred qualifications:

Experience managing data-center-grade GPU clusters, including monitoring, troubleshooting, and repairing GPUs and peripherals such as HBM and RDMA-enabled networking.
Experience with machine learning infrastructure, model serving, or distributed AI frameworks.
Hands-on experience in security and data protection.

Why Fireworks AI?

Solve Hard Problems: Tackle challenges at the forefront of AI infrastructure, from low-latency inference to scalable model serving.
Build What’s Next: Work with bleeding-edge technology that impacts how businesses and developers harness AI globally.
Ownership & Impact: Join a fast-growing, passionate team where your work directly shapes the future of AI—no bureaucracy, just results.
Learn from the Best: Collaborate with world-class engineers and AI researchers who thrive on curiosity and innovation.

Fireworks AI is an equal-opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all innovators.

If you’re excited to solve problems that redefine how AI is built and deployed, we’d love to hear from you.
