Senior AI Engineer - AI Systems Evaluation Team, REAL DEV INC

About the role

REAL DEV INC

REAL is building an AI Execution Platform for real estate organizations.

Today, the data required to run real estate is scattered across fragmented systems, leading to missed insights and preventable financial leakage.

REAL transforms this complexity into connected intelligence and automated execution, enabling enterprises to operate with greater precision and confidence.

REAL Values

  • Ownership: We take responsibility and move decisively.
  • Clarity: We simplify complexity to deliver meaningful impact.
  • Accuracy: Precision matters in everything we build.
  • Velocity: We work with urgency and intent.
  • Partnership: We collaborate closely with customers and teammates.

Role Overview

  • Own the systems that define, measure, and enforce AI quality at REAL.
  • Translate ambiguous model behavior into measurable signals, automated tests, and release gates.
  • Operate across evaluation design, tooling, and production integration.

What You'll Do

  • Design evaluation architectures (benchmarks, regression suites, coverage)
  • Build automated pipelines to run and score evals across models and prompts
  • Implement scoring systems (LLM-as-judge, rubrics, hybrid approaches)
  • Create and maintain golden datasets and edge-case suites
  • Develop internal tools for prompt testing, dataset generation, and experiment tracking
  • Instrument systems for traces, outputs, and debugging
  • Detect regressions and enforce quality gates in CI/CD
  • Monitor model performance in production
  • Close the loop between eval insights and product improvements

Requirements

What We're Looking For

  • 3-6 years of experience building production software, internal platforms, ML/data infrastructure, experimentation systems, or AI tooling
  • Strong backend and systems engineering fundamentals with hands-on applied AI experience
  • Strong Python skills and production-level systems experience
  • Experience building testing frameworks or validation systems end-to-end
  • Hands-on experience with LLMs, RAG, and agent workflows
  • Understanding of evaluation methods (benchmarking, A/B testing, LLM-as-judge, human-in-the-loop (HITL))
  • Experience with observability, logging, and experiment tracking
  • Strong systems thinking (coverage, reliability, reproducibility)
  • Comfort with non-deterministic systems

Nice to Have

  • Experience with eval, tracing, observability, or experimentation tooling, such as LangSmith, Braintrust, Phoenix, MLflow, OpenTelemetry, PostHog, or custom eval stacks
  • Familiarity with dataset/versioning workflows, HITL systems, and production AI observability systems
  • CI/CD integration for model evaluation
  • Background in search, retrieval, or document systems
  • Experience building internal platforms or developer tools
  • Experience working in startups and business-driven environments
