Makro PRO: DevOps / SRE Engineer - AI Platform

About the role

Makro PRO

The DevOps / SRE Engineer owns the operational substrate of an AI-native retail decisioning platform: infrastructure, CI / CD, observability, cost metering, and incident response for a system whose production agents take real business actions. The role builds on the enterprise Terraform standard, CI / CD spine, and FinOps tagging policy rather than reinventing parallel infrastructure.

Remote candidates outside of Thailand are welcome to apply.

Key Responsibilities:

    • Adopt the enterprise Terraform standard and module library for all platform infrastructure; author platform-specific modules where needed (agent runtime, vector DB, knowledge graph); run drift detection weekly. 
    • Build platform-specific CI / CD pipelines on the enterprise spine for service and agent deploys; integrate eval gates so that no agent reaches production without passing its evaluations.
    • Operate rollback orchestration with sub-15-minute recovery, and run quarterly game days.
    • Own the platform observability stack — OpenTelemetry, Langfuse for LLM traces, custom dashboards for per-agent cost. 
    • Implement the per-agent cost meter end-to-end — token counts, vector queries, model inference, downstream LLM Gateway costs; surface cost data to the enterprise GenAI cost dashboard. 
    • Stand up the platform on-call rotation; author runbooks for every production agent and service; lead incident response with measurable corrective actions. 
    • Implement platform cost-tagging policy consistent with the enterprise standard (team, domain, environment, project, agent, suite, persona); report monthly to Cost Review. 
    • Drive cost optimisation — right-sizing, caching, model routing decisions, reserved compute. 

Requirements

    • Bachelor's or Master's degree in Computer Science, Engineering, or a related discipline. 
    • 5+ years of SRE / DevOps experience with production ownership.
    • Terraform at scale: modules, state, drift, environment promotion.
    • CI / CD for data and ML / AI services (GitLab CI / CD or comparable).
    • Cloud platform (Azure preferred; AWS / GCP transferable). 
    • Observability — OpenTelemetry, Langfuse (or comparable LLM traces), custom dashboards. 
    • FinOps — tagging policies, attribution, optimisation. 
    • Incident response — on-call, post-mortems, runbook authorship. 

Preferred Qualifications

    • AI / agent platform SRE experience; experience building or operating cost-meter / chargeback systems.
    • Multi-cloud production experience; open-source contributions to IaC / observability tooling.
    • Observability instrumentation for AI / ML / agent systems (LLM cost, agent cost, eval scores).
    • Vendor certifications such as HashiCorp Terraform Associate / Professional, Azure Solutions Architect Associate, or Databricks Data Engineer Professional.