About the role

Cube · Onsite

As one of our early AI Engineering hires, you'll help define what AI at Cube looks like. You'll build the AI features people actually use, from our self-hosted chat interface and MCP server to retrieval pipelines, prompts, evaluations, and integrations with internal systems. You'll work closely with our Infrastructure and Data Engineering teams to design architecture, connect systems, and turn emerging AI capabilities into practical products and tools that solve real problems every day.

Maintain and tune our self-hosted chat interface, including model connections, MCP integration, and RAG/knowledge base setup
Build the RAG pipeline: ingestion, chunking, embeddings, vector store, retrieval, reranking, and evaluation
Integrate LiteLLM or OpenRouter as the gateway; handle routing, fallbacks, rate limits, and cost tracking
Maintain and configure the MCP server and the tools it exposes to the model
Write prompts and evaluations, and iterate on them based on real usage and failure cases
Monitor the logging, tracing, and guardrails of our AI platforms and models
Good to have: exposure to working with an MLOps/Platform team to deploy self-hosted models (vLLM, TGI, Ollama) and keep them healthy
Ship features end-to-end: API, retrieval, prompt, evaluation, and rollout

Requirements

4+ years of software engineering experience
Familiarity with containerized technologies and orchestration platforms such as Kubernetes
Strong interest in AI, LLMs, and the rapidly evolving model ecosystem
1+ years of experience building, deploying, or supporting production LLM systems (RAG, agents, or fine-tuned models)
Experience deploying and configuring self-hosted LLM chat interfaces (Open WebUI preferred; similar platforms are acceptable)
Hands-on experience with retrieval and RAG systems, including embeddings, vector databases, chunking strategies, hybrid search, and evaluation methodologies
Experience working with LLM gateways or routing layers such as LiteLLM, OpenRouter, Portkey, or similar solutions
Experience serving open-weight models using tools such as vLLM, TGI, or SGLang
Experience designing and implementing secure integrations between LLMs and internal business systems
Nice to have: Experience with or understanding of MCP servers, agent frameworks, or tool-calling architectures
Nice to have: Experience with or understanding of LLM observability and monitoring platforms such as LangSmith, Langfuse, or similar tools
