All active MLOps roles based in Singapore.
About Nebius:
Nebius is leading a new era in cloud infrastructure for the global AI economy. We are building a full-stack AI cloud platform that supports developers and enterprises from data and model training through to production deployment, without the cost and complexity of building large in-house AI/ML infrastructure.
Built by engineers, for engineers. From large-scale GPU orchestration to inference optimization, we own the hard problems across compute, storage, networking and applied AI.
Listed on Nasdaq (NBIS) and headquartered in Amsterdam, we have a global footprint with R&D hubs across Europe, the UK, North America and Israel. Our team of 1,500+ includes hundreds of engineers with deep expertise across hardware, software and AI R&D.
We seek an experienced Senior ML Solutions Architect to support customers leveraging Nebius Token Factory's serverless inference platform for open-source LLMs across multiple modalities. In this role, you will collaborate with clients to design and implement customized LLM-based solutions, architect scalable AI applications using our served models, and work with our backend team to improve our platform to match clients' needs.
You’re welcome to work remotely from Singapore.
Your responsibilities will include:
We expect you to have:
It would be an added bonus if you have:
Preferred technical stack:
Benefits & Perks:
What's it like to work at Nebius:
Fast moving - Bold thinking - Constant growth - Meaningful impact - Trust and real ownership - Opportunity to shape the future of AI
Equal Opportunity Statement:
Nebius is an equal opportunity employer. We are committed to fostering an inclusive and diverse workplace and to providing equal employment opportunities in all aspects of employment. We do not discriminate on the basis of race, color, religion, sex (including pregnancy), national origin, ancestry, age, disability, genetic information, marital status, veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by applicable law.
Applicants must be authorized to work in the country in which they apply and will be required to provide proof of employment eligibility as a condition of hire.
If you need accommodations during the application process, please let us know.
Ready to apply?
Apply to Nebius
HPC & Cloud Infrastructure Engineer
Important Information
Location: Singapore
12 months contract
Job Summary
We’re hiring an HPC & Cloud Infrastructure Engineer to design, deploy, and optimize high-performance computing environments across on-prem and cloud. You’ll manage HPC clusters, interconnects, and job schedulers, and enable AI/ML workloads at scale while driving automation and cost efficiency.
Job Description
Architect, deploy, and manage HPC clusters with job schedulers, parallel file systems, and cluster management tools
Design, configure, and troubleshoot InfiniBand high-throughput, low-latency interconnects for HPC/distributed workloads
Own PBS Professional scheduling: deployment, queue optimization, custom job submission scripts, workload management
Administer RHEL-based systems: performance tuning, package management, security hardening, patching via Red Hat Satellite and Ansible
Build and maintain cloud HPC environments on AWS, Azure, and GCP – provisioning, hybrid setups, migrations, and cost optimization
Implement Infrastructure as Code using Terraform/Ansible and integrate with CI/CD pipelines for reproducible infrastructure
Enable GPU & AI/ML workloads: containers, TensorFlow, PyTorch, scikit-learn, Keras, MXNet; support MLOps pipelines for training and deployment
Optimize parallel applications using MPI and OpenMP; debug and scale distributed/shared memory workloads
Drive monitoring, logging, and alerting for cluster health, job efficiency, and resource utilization
Required Skills and Experience
High-Performance Computing
Hands-on experience managing HPC clusters with job schedulers, cluster management tools, parallel programming libraries, and parallel filesystems.
Knowledge of resource scheduling and job optimization for efficient workload management
InfiniBand (Networking)
Hands-on experience with high-throughput, low-latency interconnect technologies such as InfiniBand.
Ability to design, configure, and troubleshoot interconnects in HPC or distributed environments.
Operating Systems and Environments
Administration and configuration of RHEL-based systems.
Performance tuning, package management, and security hardening.
Knowledge of Red Hat Satellite and Ansible for automation.
Job Scheduling with PBS Professional
Experience in deploying and managing PBS Professional for scheduling and workload management in HPC environments.
Customizing job submission scripts and optimizing job queues.
Parallel Programming Libraries
MPI (Message Passing Interface) and OpenMP (Open Multi-Processing):
Proficiency in writing, debugging, and optimizing parallelized code.
Experience with scaling applications across HPC systems.
Understanding of distributed memory (MPI) and shared memory (OpenMP) paradigms.
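The two paradigms named above can be mirrored with Python's standard library, purely as an illustrative sketch (not part of the posting's stack): threads share one address space and coordinate through a lock, as in OpenMP, while separate processes hold no shared state and exchange partial results as messages, as in MPI.

```python
# Illustrative sketch: shared-memory vs. message-passing parallel sums.
import threading
import multiprocessing as mp

def shared_memory_sum(values, n_threads=4):
    """Threads mutate one shared accumulator under a lock (OpenMP-style)."""
    total = 0
    lock = threading.Lock()

    def worker(chunk):
        nonlocal total
        partial = sum(chunk)      # independent local work
        with lock:                # critical section, like '#pragma omp critical'
            total += partial

    chunks = [values[i::n_threads] for i in range(n_threads)]
    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total

def _rank_sum(chunk, queue):
    # Each "rank" sends its partial result back, like an MPI reduce.
    queue.put(sum(chunk))

def message_passing_sum(values, n_procs=4):
    """Processes share nothing; partial sums travel over a queue (MPI-style)."""
    queue = mp.Queue()
    chunks = [values[i::n_procs] for i in range(n_procs)]
    procs = [mp.Process(target=_rank_sum, args=(c, queue)) for c in chunks]
    for p in procs:
        p.start()
    total = sum(queue.get() for _ in procs)  # gather and reduce at the root
    for p in procs:
        p.join()
    return total

if __name__ == "__main__":
    data = list(range(1, 101))
    print(shared_memory_sum(data))    # 5050
    print(message_passing_sum(data))  # 5050
```

The trade-off the bullet points at: shared memory makes data access cheap but requires synchronization around mutable state, while message passing avoids shared mutable state at the cost of explicit communication.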
Cloud Platforms
AWS, Azure, Google Cloud:
Expertise in provisioning, configuring, and managing services on all three platforms.
Cross-platform migration and hybrid cloud solutions knowledge.
Proficiency in managing high-performance computing (HPC) clusters on the cloud.
Deep understanding of cost optimization, security, and cloud native development tools (e.g., Kubernetes, Terraform).
Infrastructure as Code (IaC)
Ability to design, deploy, and maintain infrastructure using automation and configuration management tools.
CI/CD pipeline integration for IaC workflows.
GPU & AI Libraries and Tools
Hands-on experience with container technologies.
Hands-on experience with TensorFlow, PyTorch, scikit-learn, Keras, or MXNet.
Familiarity with AI/ML pipelines, model training, and optimization.
Knowledge of MLOps tools for deploying and monitoring models
About Encora
Encora is a global company that offers Software and Digital Engineering solutions. Our practices include Cloud Services, Product Engineering & Application Modernization, Data & Analytics, Digital Experience & Design Services, DevSecOps, Cybersecurity, Quality Engineering, AI & LLM Engineering, among others.
At Encora, we hire professionals based solely on their skills and do not discriminate based on age, disability, religion, gender, sexual orientation, socioeconomic status, or nationality.
Ready to apply?
Apply to Encora
Workato delivers enterprise infrastructure for the agentic era, redefining iPaaS and helping enterprises unify data, applications, processes, and AI into a single, governed platform. A leader in Enterprise MCP and trusted by 50% of the Fortune 500, Workato’s cloud-native architecture connects every application, data source, and process to power real-time orchestration at scale. With enterprise-grade security and continuous innovation at its core, Workato provides the trusted foundation for organizations to automate with confidence and operationalize AI across the business. To learn more, visit www.workato.com.
Ultimately, Workato believes in fostering a flexible, trust-oriented culture that empowers everyone to take full ownership of their roles. We are driven by innovation and looking for team players who want to actively build our company.
But, we also believe in balancing productivity with self-care. That’s why we offer all of our employees a vibrant and dynamic work environment along with a multitude of benefits they can enjoy inside and outside of their work lives.
If this sounds right up your alley, please submit an application. We look forward to getting to know you!
Also, feel free to check out why:
Business Insider named us an “enterprise startup to bet your career on”
Forbes’ Cloud 100 recognized us as one of the top 100 private cloud companies in the world
Deloitte Tech Fast 500 ranked us as the 17th fastest growing tech company in the Bay Area, and 96th in North America
Quartz ranked us the #1 best company for remote workers
We are looking for an exceptional AI Researcher to join our growing AI team. In this role, you will design, build, deploy, and improve ML/LLM-powered services and features that power intelligent automation and AI-driven product experiences across the Workato platform. You will work closely with our Engineering, Product, and Design teams to define and track product metrics and evaluation strategies, design customer-facing experiments, and dive deep to provide actionable insights. This role is ideal for someone who combines strong ML/LLM intuition, software engineering skills, and a practical mindset for shipping reliable, scalable AI systems.
Build and improve AI services using LLMs and custom machine learning models for production use cases.
Design, develop, and operate ML/LLM systems end-to-end, from prototyping to deployment and monitoring.
Write high-quality Python code that is testable, maintainable, and efficient.
Improve validation, observability, and performance monitoring for ML services (quality, latency, reliability, cost).
Partner cross-functionally with product managers, platform engineers, and other stakeholders to ship AI-powered product capabilities.
Evaluate and improve existing implementations by identifying bottlenecks, bugs, and opportunities for optimization.
Design controlled experiments to test features for our AI-based products and analyze the results deeply to surface actionable insights.
Contribute to technical design and code reviews, helping raise engineering quality across the team.
Experiment and iterate on model behavior, prompting, retrieval, tool use, or orchestration strategies to improve user outcomes.
Experience with tool-use agents or workflow-aware AI systems.
Experience building AI products in enterprise SaaS environments.
Experience with A/B testing and statistical significance techniques.
Experience with LLMOps/MLOps tooling and practices (monitoring, evaluation pipelines, model rollout, CI/CD).
Experience working with modern data warehouses such as Amazon Redshift or Snowflake.
Job Req ID: 2724
Ready to apply?
Apply to Workato
Mission Summary:
We are seeking an experienced and visionary Principal-level Tech Lead Manager to build and lead our new Machine Learning (ML) Acceleration team. This pivotal role will drive the strategy, development, and execution of initiatives aimed at significantly accelerating ML model training. The ultimate goal is to drastically reduce the development cycle for new ML models and enable rapid hot-patching for issues within our deployed autonomous vehicle services.
You will be a hands-on leader, blending deep technical expertise in ML systems and performance optimization with strong leadership and people management skills. You will recruit, mentor, and grow a high-performing team of engineers, fostering a culture of innovation, collaboration, and continuous improvement.
What you'll be doing:
What we're looking for:
Motional is a driverless technology company making autonomous vehicles a safe, reliable, and accessible reality. We’re driven by something more.
Our journey is always people first.
We aren't just developing driverless cars; we're creating safer roadways, more equitable transportation options, and making our communities better places to live, work, and connect. Our team is made up of engineers, researchers, innovators, dreamers and doers, who are creating a technology with the potential to transform the way we move.
Higher purpose, greater impact.
We’re creating first-of-its-kind technology that will transform transportation. To do so successfully, we must design for everyone in our cities and on our roads. We believe in building a great place to work through a progressive, global culture that is diverse, inclusive, and ensures people feel valued at every level of the organization. Diversity helps us to see the world differently; it’s not only good for our business, it’s the right thing to do.
Scale up, not starting up.
Our team is behind some of the industry's largest leaps forward, including the first fully-autonomous cross-country drive in the U.S., the launch of the world's first robotaxi pilot, and operation of the world's longest-standing public robotaxi fleet. We’re driven to scale; we’re moving towards commercialization of our technology, and we need team members who are ready to embrace change and challenges.
Formed as a joint venture between Hyundai Motor Group and Aptiv, Motional is fundamentally changing how people move through their lives. Headquartered in Boston, Motional has operations in the U.S. and Asia. For more information, visit www.Motional.com and follow us on Twitter, LinkedIn, Instagram and YouTube.
Motional AD Inc. is an EOE. We celebrate diversity and are committed to creating an inclusive environment for all employees. To comply with Federal Law, we participate in E-Verify. All newly-hired employees are queried through this electronic system established by the DHS and the SSA to verify their identity and employment eligibility.
Ready to apply?
Apply to Motional
About WPP Media
WPP is the trusted growth partner for the world’s leading brands. With exceptional talent, trusted data and intelligence, and world-class partnerships – all united by our pioneering agentic marketing platform, WPP Open – we help clients navigate change, capture opportunity, and deliver transformational growth.
WPP Media is WPP's AI-driven media operating unit, bringing together media, data, and partnerships to deliver creative personalisation at scale. Connected through WPP Open and powered by Open Intelligence, clients see exactly where, how, and why their media investment is working.
For more information, visit wppmedia.com.
Role Summary and Impact
The Media Futures Group AI Squad is a small, agile, and autonomous innovation team dedicated to charting the course for AI transformation across APAC for one of WPP’s most important and strategic global technology clients.
By seamlessly integrating AI across media, creative, and production, this cross-functional squad delivers cutting-edge innovation and pioneering work to deliver more effective and efficient business outcomes.
The team’s mission is to solve specific business challenges by delivering demonstrable proof of AI-enabled marketing improvements, providing strategic guideposts for the broader organization.
Utilizing best-in-class tools like WPP Open and Gemini, you will innovate and deliver award-winning creative ideas while building, testing, and scaling solutions that enhance effectiveness, efficiency, and execution for our clients.
The AI Solutions Engineer is the master builder who executes that construction. This highly technical role provides the dedicated, hands-on development support required to move architectural visions out of isolated testing environments and into secure, Google-approved, enterprise-grade production infrastructure. The AI Solutions Engineer bridges the critical gap between a fragile, locally hosted prototype and a scalable, robust organizational tool capable of withstanding massive global traffic.
Responsibilities
Skills and Experience
Key Competencies
Life at WPP Media & Benefits
Our passion for shaping the next era of media is powered by our commitment to Be Extraordinary, investing in our employees to inspire transformational creativity. We also Lead Optimistically, firmly believing in and Championing Growth and Development for every individual. This commitment allows WPP Media employees to leverage the extensive global WPP Media & WPP networks to pursue their passions, build vital professional connections, and learn at the cutting edge of marketing and advertising.
We Create an Open environment built on trust and respect, where everyone feels they belong and has opportunities to progress. This inclusive culture is fostered through a variety of employee resource groups and frequent in-office events showcasing team wins, sharing thought leadership, and celebrating holidays and milestone events. Our comprehensive benefits package reflects this commitment, including competitive medical, vision, and dental insurance, significant paid time off, preferential partner discounts, and employee mental health awareness days.
WPP Media is an equal opportunity employer and considers applicants for all positions without discrimination or regard to characteristics. We believe the best work happens when we're together, fostering creativity, collaboration, and connection in this open and supportive environment. That's why we’ve adopted a hybrid approach, with teams in the office around four days a week. If you require accommodations or flexibility, please discuss this with the hiring team during the interview process.
Please note that while our philosophy is the same across WPP, benefits may vary by office/country.
Please read our Privacy Notice for more information on how we process the information you provide.
Ready to apply?
Apply to WPP Media
SimplifyNext is a fast-growing consulting and technology firm founded by veterans from top-tier consulting companies, focused on AI, Automation, and Application Platforms. Our mission is to drive business transformation across industries by combining strategic insight with deep technical expertise.
We work with leading enterprises and public sector organisations across Singapore and the Asia Pacific region to design, build, and operate scalable digital and automation platforms — delivering impactful transformations for global and local organisations alike.
Built as an agile practice, we mentor and grow the next generation of consulting and technology experts. We invest heavily in structured training and enablement programmes that help our teams expand across Intelligent Automation, Test Automation, AI-powered workflows, and Agentic AI solutions.
Recognised as one of the fastest-growing companies in Singapore and Asia Pacific, SimplifyNext is positioned as one of the most credible and ambitious digital transformation teams in the region.
We’re not hiring someone to run models. We’re hiring someone who builds systems that think.
At SimplifyNext, our AI Engineers are core to how we deliver transformation — designing and deploying intelligent systems that genuinely change how organisations operate. You won’t be a supporting act to another team. You’ll be the one building the agents, pipelines, and infrastructure that make our AI products real.
We work across public sector and enterprise, at the intersection of AI, automation, and product-led transformation. If you’re energised by hard engineering problems, care about production outcomes - not just research benchmarks - and want your work to reach real users at scale, read on.
Must-Have
Good to Have
This role is not for you if…
We partner with governments and enterprises to shift from project delivery to product thinking. That means working on problems that genuinely matter — healthcare access, business licensing, workforce development — and being held accountable for outcomes, not just deliverables.
High-impact problem spaces: Public sector and enterprise transformation, AI, and automation at scale across ASEAN and Asia Pacific.
Engineering-first culture: You’ll work alongside world-class architects, developers, and AI practitioners who set a high bar.
End-to-end ownership: You own problems fully — from architecture decisions to production operations — not just one slice.
Learning environment: Full certification sponsorship, structured learning paths, and direct mentorship from day one.
At SimplifyNext, we’re committed to building a team of curious, driven, and forward-thinking individuals who care deeply about creating meaningful impact through technology. If you’re excited by the opportunity to grow, collaborate, and shape the future of digital transformation across the region, we’d be happy to hear from you.
Ready to apply?
Apply to SimplifyNext
We are seeking a skilled MLOps & Agentic Platform Engineer. This role involves managing model registries, developing continuous training loops, and implementing A/B testing infrastructure. The ideal candidate will have a strong DevOps/MLOps background and be adept at deploying scalable microservices and building observability dashboards.
Responsibilities:
Qualifications:
Ready to apply?
Apply to Hyphen Connect Limited
Step into a career with ASM, where cutting edge technology meets collaborative culture.
For over 55 years ASM has been ahead of what’s next, at the forefront of innovation and what’s technologically possible. With more than 4,500 ASMers representing 70 nationalities, our people and our advanced semiconductor devices are playing a crucial role in trends such as 5G, cloud computing, AI, and autonomous driving. But we’re more than just a tech company. We value diversity, inclusion and sustainability as we strive to make a positive impact on the world. Our development programs help support your growth, shaping your future and pushing the boundaries of innovation to unleash potential.
Job's mission
As a Senior Specialist in AI/ML within ASM’s Operations Intelligence function, you will play a pivotal role in reimagining how data, artificial intelligence, and intelligent automation transform global operations. You will design, build, and deploy scalable AI and machine learning solutions that optimize semiconductor supply chain, manufacturing, and logistics performance. By translating complex operational challenges into impactful AI-driven solutions, you will help demonstrate the power of agentic AI, large language models, and advanced analytics—driving measurable business outcomes and shaping the future of smart manufacturing at scale.
What you will be working on
What we are looking for
What sets you apart
Apply today to be part of what’s next.
We make the tech that enables the chips in devices which improve lives around the world. We do this with an eye to the future, pushing the boundaries of what’s possible through cutting-edge innovation, and driving the next wave of technological breakthroughs that shape how we live, work, and connect.
To learn more about ASM, find us at asm.com and on LinkedIn, Facebook, Instagram, X and YouTube.
ASM is an equal opportunity employer and considers qualified applicants for employment without regard to race, color, religion, age, nationality, social or ethnic origin, sexual orientation, gender, gender identity or expression, marital status, pregnancy, political affiliation, disability, genetic information, veteran status, or any other characteristic protected by law.
Ready to apply?
Apply to ASM
Few compliance analytics roles offer this combination: genuinely novel problems, global scale, and the freedom to build rather than maintain. At a global crypto exchange, the financial crime data landscape is more varied, more real-time, and more analytically rich than almost anywhere in traditional finance. The regulatory environment is evolving quickly, the typologies are new territory, and the analytical work has direct impact on how the organisation detects and responds to financial crime. This role sits within the Product organisation and is dedicated entirely to that space, covering AML, sanctions, KYC/KYB, transaction monitoring, and beyond.
You will join a collaborative team of data scientists, data engineers, and business analysts, working closely with compliance stakeholders across the full range of financial crime domains. The role sits at the point where product engineering culture meets compliance depth, and you will have room to contribute across both. It is a role well suited to someone who enjoys varied, substantive work and values being part of a team that takes the quality of its output seriously.
The team is actively building toward an AI-native way of working. LLM-assisted coding, automated analytical pipelines, and AI-augmented investigation tools are already part of how the team operates or in active development. For someone who has wanted to apply AI seriously in a compliance context without cutting corners on rigour or auditability, this is an environment where that work is already underway and genuinely valued.
Ready to apply?
Apply to OKX
Building ML systems for compliance at a crypto exchange is a different kind of problem from most ML engineering work. The data spans on-chain transactions, fiat flows, KYC records, and behavioural signals that very few organisations have in one place. The problems are genuinely unsolved, the stakes are high, and the work has direct bearing on how a global exchange detects and responds to financial crime. For someone who wants their engineering work to matter beyond model accuracy metrics, this is an interesting place to be.
This role sits within a team of data scientists, analytics engineers, and compliance specialists who are building the analytical and AI infrastructure that powers the compliance function. You will work across the full ML lifecycle, from feature pipelines and model development through to deployment and monitoring, with close involvement from the domain experts who understand what the models need to do in practice.
AI-assisted development is how this team works. LLM-assisted coding, automated analytical pipelines, and AI-powered investigation tooling are part of the daily workflow. We are looking for engineers who already operate this way and who can raise the bar for what that looks like in a production compliance environment.
Ready to apply?
Apply to OKX
About the role
We are seeking an experienced Machine Learning Lead to helm our Machine Learning team.
In this pivotal role, you will be the engineering architect behind Vulcan’s core AI capabilities. You will act as the nexus between Research, Platform, and Product. Your mission is to translate cutting-edge findings on GenAI threats into robust, production-ready machine learning models that power our GenAI Security Guardrails (Blue Team) and Automated Vulnerability Assessment (Red Team).
Crucially, you will serve as the bridge between deep tech and business strategy, articulating technical constraints (like FLOPS and latency) to leadership and clients while guiding the engineering direction.
2. MLOps & Data Infrastructure:
3. Cross-Functional Implementation & Leadership:
4. Technical Strategy & Stakeholder Management:
Qualifications
Ready to apply?
Apply to AIFT
At HeyGen, our mission is to make visual storytelling accessible to all. Over the last decade, visual content has become the preferred method of information creation, consumption, and retention. But the ability to create such content, in particular videos, continues to be costly and challenging to scale. Our ambition is to build technology that equips more people with the power to reach, captivate, and inspire audiences.
Learn more at www.heygen.com. Visit our Mission and Culture doc here.
We are seeking a seasoned Technical Leader to build and scale the foundational compute infrastructure that powers our state-of-the-art AI models—from multimodal training data pipelines to high-throughput, low-latency video generation.
You will be the core engineer responsible for building the robust, efficient, and scalable platform that enables our research and production teams to rapidly iterate on HeyGen's generative video models. Your contributions will directly impact model performance, developer productivity, and the final quality of every AI-generated video.
Optimize GPU Utilization: Design and implement mechanisms to aggressively optimize GPU and cluster utilization across thousands of devices for inference, training, data processing, and large-scale deployment of our state-of-the-art video generation models.
Develop Large-Scale AI Job Framework: Build highly scalable, reliable frameworks for launching and managing massive, heterogeneous compute jobs, including multi-modal high-volume data ingestion/processing, distributed model training, and continuous evaluation/benchmarking.
Enhance Observability: Develop world-class observability, tracing, and visualization tools for our compute cluster to ensure reliability, diagnose performance bottlenecks (e.g., memory, bandwidth, communication).
Accelerate Pipelines: Collaborate closely with AI researchers and AI engineers to integrate innovative acceleration techniques (e.g., custom CUDA kernels, distributed training libraries) into production-ready, scalable training and inference pipelines.
Infrastructure Management: Champion the adoption and optimization of modern cloud and container technologies (Kubernetes, Ray) for elastic, cost-efficient scaling of our distributed systems.
We are looking for a highly motivated engineer with deep experience operating and optimizing AI infrastructure at scale.
Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical experience.
5+ years of full-time industry experience in large-scale MLOps, AI infrastructure, or HPC systems.
Experience with data frameworks and standards such as Ray, Apache Spark, and LanceDB.
Strong proficiency in Python and a high-performance language such as C++ for developing core infrastructure components.
Deep understanding and hands-on experience with modern orchestration and distributed computing frameworks such as Kubernetes and Ray.
Experience with core ML frameworks such as PyTorch, TensorFlow, or JAX.
Master's or PhD in Computer Science or a related technical field.
Demonstrated Tech Lead experience, driving projects from conceptual design through to production deployment across cross-functional teams.
Prior experience building infrastructure specifically for Generative AI models (e.g., diffusion models, GANs, or large language models) where cost and latency are critical.
Proven background in building and operating large-scale data infrastructure (e.g., Ray, Apache Spark) to manage petabytes of multi-modal data (video, audio, text).
HeyGen is an Equal Opportunity Employer. We celebrate diversity and are committed to creating an inclusive environment for all employees.
Ready to apply?
Apply to HeyGen
At HeyGen, our mission is to make visual storytelling accessible to all. Over the last decade, visual content has become the preferred method of information creation, consumption, and retention. But the ability to create such content, in particular videos, continues to be costly and challenging to scale. Our ambition is to build technology that equips more people with the power to reach, captivate, and inspire audiences.
Learn more at www.heygen.com. Visit our Mission and Culture doc here.
We are seeking a seasoned Software Engineer to build and scale the foundational compute infrastructure that powers our state-of-the-art AI models—from multimodal training data pipelines to high-throughput, low-latency video generation.
You will be the core engineer responsible for building the robust, efficient, and scalable platform that enables our research and production teams to rapidly iterate on HeyGen's generative video models. Your contributions will directly impact model performance, developer productivity, and the final quality of every AI-generated video.
Optimize GPU Utilization: Design and implement mechanisms to aggressively optimize GPU and cluster utilization across thousands of devices for inference, training, data processing, and large-scale deployment of our state-of-the-art video generation models.
Develop Large-Scale AI Job Framework: Build highly scalable, reliable frameworks for launching and managing massive, heterogeneous compute jobs, including multi-modal high-volume data ingestion/processing, distributed model training, and continuous evaluation/benchmarking.
Enhance Observability: Develop world-class observability, tracing, and visualization tools for our compute cluster to ensure reliability, diagnose performance bottlenecks (e.g., memory, bandwidth, communication).
Accelerate Pipelines: Collaborate closely with AI researchers and AI engineers to integrate innovative acceleration techniques (e.g., custom CUDA kernels, distributed training libraries) into production-ready, scalable training and inference pipelines.
Infrastructure Management: Champion the adoption and optimization of modern cloud and container technologies (Kubernetes, Ray) for elastic, cost-efficient scaling of our distributed systems.
We are looking for a highly motivated engineer with deep experience operating and optimizing AI infrastructure at scale.
Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical experience.
5+ years of full-time industry experience in large-scale MLOps, AI infrastructure, or HPC systems.
Experience with data frameworks and standards such as Ray, Apache Spark, and LanceDB.
Strong proficiency in Python and a high-performance language such as C++ for developing core infrastructure components.
Deep understanding and hands-on experience with modern orchestration and distributed computing frameworks such as Kubernetes and Ray.
Experience with core ML frameworks such as PyTorch, TensorFlow, or JAX.
Master's or PhD in Computer Science or a related technical field.
Demonstrated Tech Lead experience, driving projects from conceptual design through to production deployment across cross-functional teams.
Prior experience building infrastructure specifically for Generative AI models (e.g., diffusion models, GANs, or large language models) where cost and latency are critical.
Proven background in building and operating large-scale data infrastructure (e.g., Ray, Apache Spark) to manage petabytes of multi-modal data (video, audio, text).
HeyGen is an Equal Opportunity Employer. We celebrate diversity and are committed to creating an inclusive environment for all employees.
Ready to apply?
Apply to HeyGen
We are seeking highly motivated and curious individuals to join our Machine Learning team at Kronos Research. In this role, you will bridge the gap between advanced deep learning and financial markets, designing robust models for medium and high-frequency systematic trading strategies. You will manage the full ML lifecycle, from researching novel architectures to deploying scalable, low-latency models that directly drive trading revenue.
Key Responsibilities
Qualifications
Ready to apply?
Apply to Kronos Research