Senior AI Engineer - Harness Engineering (Kimchi)

Cast AI · Bulgaria; Croatia; Estonia; Greece; Hungary; Latvia; Lithuania; Poland; Romania; Slovakia; Slovenia; Ukraine

Technology · European Union · Posted May 7, 2026

Why Kimchi?

Kimchi is the AI platform inside Cast AI. We started by helping companies run LLMs on their own Kubernetes clusters; now we're building the execution layer where agents do real work.

Our infrastructure today: multi-model inference (MiniMax, Kimi, GLM-5, Nemotron, DeepSeek) with intelligent routing, an OpenAI-compatible API, and flexible deployment, from our GPUs to your VPC. The inference layer is the foundation. What we're hiring for sits on top of it: coding agents, agent runtimes, orchestration systems, and the reliability engineering that makes them actually finish things.

Tech Stack: TypeScript, Go, Kubernetes, AWS/GCP/Azure, MCP, Prometheus/Grafana/Loki, GitLab CI, ArgoCD.

Why harness engineering matters here
OpenAI and Anthropic ship models. They also ship one harness each - the scaffolding that turns a raw model into something that can plan, execute, recover, and complete work. We ship a different kind of harness: one built for cost-conscious, long-horizon autonomy, running on inference infrastructure we control end-to-end.
A decent model with a great harness beats a great model with a bad harness. We've watched this play out. The gap between what today's models can do and what you see them doing is largely a harness gap - and that gap is where we operate.

What you'll build
The ratchet.
Every time our agent makes a mistake, we engineer a solution so it never makes that mistake again. That means hooks that enforce constraints the model "knows" but forgets: pre-commit lint checks, permission gates, and context compaction before the window fills. Success is silent; failures are verbose.

Long-horizon execution.
Our harness is built around spec-driven autonomy: meta-prompting, fresh context per task, worktree-per-slice git strategy, automatic replanning, crash recovery, stuck detection. We're implementing Ralph loops - when the model tries to exit, we intercept and reinject the goal into a fresh context. The agent reads state from disk and continues. Multi-session, multi-day work, without context rot.

Planner/executor splits.
Planning with a reasoning model, executing with a fast one, evaluating with a third. Separating generation from evaluation beats self-verification because agents reliably skew positive when grading their own work.

The harness surface.
CLI, TUI, MCP integration, sandboxed execution, telemetry. Our AGENTS.md is short - every line traces to a specific thing that went wrong. TypeScript on the surface, Go where it matters.

Memory and context.
Moving agents off laptops, giving them state that survives across sessions, managing context so information lands where it's actionable. Compaction, tool-call offloading, progressive skill disclosure.

What makes this different (with receipts)
You've seen the pitch: "we route to the best model." Everyone says that. Here's what we actually have:

GPU infrastructure we own. Not just an API reseller. From GPU placement across clouds to the inference endpoint your agent calls - we control the cost curve.
A harness-first thesis. We treat agent failures as configuration problems, not model problems. When we moved from a stock harness to our own, completion rates on internal benchmarks improved by 40%+ on the same model.
An AGENTS.md that earns every line. No brainstormed rules: every constraint in our system prompt traces to a real failure we saw and fixed.

Requirements:

You've used AI coding agents in anger. Not demos - real work. You have opinions about Claude Code, Codex, OpenCode, Cursor. You know what it feels like when an agent gets stuck and why.
Strong TypeScript or Go in production. Comfort moving between them. Our surface is TypeScript; our core is Go.
You think in harness terms. You read "the agent hallucinated" and your first instinct is to ask what context it was missing, what hook should have caught it, what constraint should exist.
You drive features end-to-end. Design → build → ship → measure → iterate. We don't have layers that absorb ambiguity for you.

Responsibilities:

Build and evolve the agent harness: ship hooks, permission gates, and context compaction. Every constraint you add to AGENTS.md traces to a failure you personally diagnosed.
Own long-horizon execution - multi-session task completion via spec-driven prompting, worktree-per-slice git, Ralph loop recovery, and stuck detection. Completion rate is your metric.
Architect planner/executor/evaluator pipelines - planning with a reasoning model, execution with a fast one, evaluation with a third. No self-verification.
Manage agent memory and context - state persistence across sessions, context compaction, tool-call offloading. Zero context rot on multi-day work.
Own the harness surface - CLI, TUI, MCP integrations, sandboxed execution, telemetry. TypeScript on the surface, Go where it matters.

What success looks like (after 6 months):

You've shipped at least one major harness feature end-to-end: designed it, built it, measured it, iterated.
You've added constraints to our AGENTS.md based on failures you personally observed and diagnosed.
You've improved a measurable reliability metric - completion rate, context efficiency, or cost per successful task.
You've formed strong opinions about where our harness is load-bearing and where it's dead weight.

What’s in it for you?

Competitive salary (€6,500 - €9,000 gross, depending on the level of experience).
Enjoy a flexible, remote-first global environment.
Collaborate with a global team of cloud experts and innovators, passionate about pushing the boundaries of Kubernetes technology.
Equity options.
Get quick feedback with a fast-paced workflow. Most feature projects are completed in 1 to 4 weeks.
Spend 10% of your work time on personal projects or self-improvement.
Learning budget for professional and personal development - including access to international conferences and courses that elevate your skills.
Annual hackathon to spark new ideas and strengthen team bonds.
Team-building budget and company events to connect with your colleagues.
Equipment budget to ensure you have everything you need.
Extra days off to help maintain a healthy work-life balance.

This is a location-specific opportunity. We are currently accepting applications from candidates residing in the following European countries: Bulgaria, Croatia, Estonia, Greece, Hungary, Latvia, Lithuania, Poland, Romania, Slovakia, Slovenia, and Ukraine.

*As part of our standard hiring process, we would like to inform you that a background check may be conducted at the final stage of recruitment through our third-party provider, Checkr.
*Please note that Cast AI does not provide any form of visa sponsorship or work permit.

Senior Full Stack Engineer

Duetto Research · Croatia

Engineering · Croatia · Posted Apr 23, 2026

If you love building software that actually matters — and you're energised by the idea of shipping production-grade features across the full stack while working in a team where AI is a genuine daily collaborator, not a buzzword — this role is for you. You'll own end-to-end delivery on Duetto's core revenue platform, drive a major migration from monolith to microservices, and help shape how the team engineers at scale.

What Makes Us Different?

Duetto is the hospitality industry's leading revenue management platform, founded in 2012 by former Wynn Resorts executives who knew the industry needed better technology. We built the world's first Revenue & Profit Operating System — a suite of tools (GameChanger, ScoreBoard, BlockBuster, Advance and more) that goes beyond room pricing to give hotels, resorts and casinos a complete picture of their revenue and profitability. Trusted by clients ranging from independent boutique hotels to global chains, we've been named the #1 Revenue Management Software by HotelTechAwards four years running and the #1 Best Place to Work in Hotel Tech in 2025. Backed by GrowthCurve Capital since 2024, we're accelerating our investment in AI — and we're genuinely passionate about the industry we serve. We build products we're proud of, for customers we care about.

What You'll Be Doing

You'll own full-stack feature delivery end-to-end — from requirements through to production — across Java/Spring Boot microservices and TypeScript/React frontends, including monitoring and troubleshooting in live environments.
You'll make independent architectural decisions and convert prototypes into scalable, maintainable production systems, collaborating closely with Product, Design, and Engineering to hold a high bar for end-user quality.
You'll drive the migration of our legacy monolith, applying Domain-Driven Design principles, event-driven architecture patterns, and structured decomposition strategies to modernise software at scale.
You'll champion test quality by writing automated end-to-end tests in Cypress or Playwright and embedding data-driven testing practices across the team.
You'll mentor peers on prompt engineering, AI-assisted development, and code review — operating confidently in a generate-and-review model where 50–70% of code is AI-generated.
You'll contribute to and improve AI-augmented engineering workflows, building and refining custom skills, agents, and agentic pipelines that accelerate the whole team's delivery velocity.

What We're Looking For

You may be a good fit if you have:

4–6 years of full-stack engineering experience with production depth across both backend and frontend
Strong proficiency in Java and Spring Boot for enterprise backend development
Strong proficiency in TypeScript and React for modern web applications
Experience building and maintaining GraphQL APIs
Solid working knowledge of SQL and NoSQL databases, particularly MongoDB
Hands-on experience with end-to-end testing frameworks — Cypress or Playwright
A working understanding of microservices architecture and event-driven integration patterns
Demonstrated experience with Claude Code CLI or a comparable AI code generation tool — you're comfortable in a generate-and-review workflow, not just curious about it

Strong candidates may also have:

Experience in or exposure to the hospitality technology sector
A background in legacy modernisation — monolith decomposition, migration planning, or similar
Familiarity with Domain-Driven Design (DDD) in an enterprise context
Experience with AI code review tools such as CodeRabbit or Augment
Working knowledge of AWS, Kubernetes, and CI/CD pipeline management

Why Duetto?

AI isn't a side project here — it's how we work. With 50–70% of code AI-generated and a team actively building custom agents and agentic pipelines, you'll be operating at the frontier of how software gets built.
Real architectural ownership. This is a high-autonomy IC role — you'll make independent technical decisions that shape the platform, not just implement tickets.
Work that ships. We move fast, hold a high bar, and care deeply about what we put in front of customers. If you want to see your work in production, you will.
A mission worth caring about. Hospitality is a people industry, and we help the people running it do their jobs better. That's something to feel good about.
Fully remote from Split, Croatia, with a global team that takes collaboration seriously.

The Details

Location: Split, Croatia
Work model: Remote

Duetto is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees. All qualified applicants will receive consideration for employment without regard to race, colour, religion, sex, sexual orientation, gender identity, national origin, disability, veteran status, or any other characteristic protected by applicable law.

Sound like you?

You don't need to tick every box — if this role excites you and you're strong across most of what we're looking for, we'd love to hear from you. Apply and tell us what you'd bring.