Harper is an AI-native commercial insurance company in San Francisco. We're not bolting AI onto insurance — we're rebuilding the entire business as software, on a simple bet: turning expert human judgment into compute is one of the largest transitions left to make, and a trillion-dollar industry still run 90% by hand is the place to prove it. We've grown ~100x in the last year and we move at that speed — on-site, in person, long days, very high standards. Almost no one joins Harper for insurance; they join to build the company that replaces how it works.
Turning judgment into compute only compounds if the company can tell whether the compute is getting better. Today that's mostly vibes: an engineer ships a prompt change, a tool change, or a new model and judges it by feel — "seems better," "the demo passed." Vibes don't survive Series B, and they definitely don't survive an agent that's quoting real coverage for real businesses. Your job is to turn agent quality from a vibe into a number. Harper's agents handle intake, sales, service, voice, and submission packaging; every one needs to be evaluated, regression-tested, and monitored in production. You'll work alongside the engineer setting AI-quality direction and own a specific agent surface end-to-end — so that when the agent improves we know, and when it regresses we know before the customer does. That's how we scale judgment without scaling headcount.
Build capability + regression eval suites for your assigned agents — intake, submissions, placements, renewals, CRM, or voice.
Curate golden datasets from real failure modes: real transcripts, real underwriter back-and-forth, real call recordings. 20–50 sharp cases per agent, not thousands of synthetic ones.
Design graders. Deterministic first (string match, state check, tool-call assertions); LLM-as-judge where deterministic checks can't decide; human calibration on samples.
Ship pre-merge eval gates. Every PR touching an agent, prompt, or tool runs the relevant suite in CI. If the suite scores below threshold, the PR is blocked.
Wire production trajectory monitoring. Online evaluators score live trajectories; drift gets caught within hours.
Turn ops findings into permanent tests. Every flagged failure becomes a regression case; every repeat issue becomes a test that catches it forever.
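As a rough illustration of the grading order and pre-merge gate described above, here is a minimal Python sketch. Everything in it is hypothetical: the `GoldenCase` and `AgentRun` shapes, the `judge` hook (a stand-in for an LLM-as-judge call), and the threshold value are assumptions for illustration, not Harper's tooling or any specific eval framework's API.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class GoldenCase:
    """Hypothetical curated case: what a correct agent run must do."""
    name: str
    expected_tool_calls: list[str] = field(default_factory=list)
    must_contain: list[str] = field(default_factory=list)


@dataclass
class AgentRun:
    """Hypothetical record of what the agent actually did on that case."""
    transcript: str
    tool_calls: list[str]


# Hypothetical LLM-as-judge hook: takes the case and run, returns pass/fail.
Judge = Callable[[GoldenCase, AgentRun], bool]


def deterministic_grade(case: GoldenCase, run: AgentRun) -> Optional[bool]:
    """Cheap checks first. Returns None when no deterministic check applies."""
    if not case.expected_tool_calls and not case.must_contain:
        return None
    # Tool-call assertion: exact sequence match.
    if case.expected_tool_calls and run.tool_calls != case.expected_tool_calls:
        return False
    # String-match assertion: every required phrase appears in the transcript.
    return all(s in run.transcript for s in case.must_contain)


def grade(case: GoldenCase, run: AgentRun, judge: Optional[Judge]) -> bool:
    """Use the deterministic verdict when there is one; otherwise defer to the judge."""
    verdict = deterministic_grade(case, run)
    if verdict is not None:
        return verdict
    if judge is None:
        raise ValueError(f"{case.name}: no deterministic check and no judge")
    return judge(case, run)


def gate(results: list[bool], threshold: float) -> bool:
    """Pre-merge gate: True lets the PR merge, False blocks it."""
    if not results:
        raise ValueError("empty eval suite")
    pass_rate = sum(results) / len(results)
    return pass_rate >= threshold


if __name__ == "__main__":
    cases = [
        (GoldenCase("quote_flow", expected_tool_calls=["lookup_policy", "create_quote"]),
         AgentRun("Quote created.", ["lookup_policy", "create_quote"])),
        (GoldenCase("mentions_deductible", must_contain=["deductible"]),
         AgentRun("Your premium is $1,200.", [])),
        (GoldenCase("tone_check"), AgentRun("Happy to help!", [])),
    ]
    # Stub judge that always passes; a real one would call a model with a rubric.
    results = [grade(c, r, judge=lambda c, r: True) for c, r in cases]
    print(results)                         # [True, False, True]
    print(gate(results, threshold=0.9))    # False: 2/3 is below 0.9
```

The third case has no deterministic check, so it falls through to the judge; the second fails on a plain string match without ever reaching a model. Keeping the judge as the fallback rather than the default is what keeps most of the suite cheap and reproducible.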
3–6 years building software, with hands-on production LLM/agent eval experience — capability + regression suite design, LLM-as-judge graders, golden datasets.
You can describe a specific regression that an eval suite you built caught, and exactly how it would have leaked otherwise.
You've designed an LLM-as-judge rubric that survived human calibration, and you debug a hallucination by reading transcripts, not aggregate dashboards.
Familiar with at least one major eval framework; strong written communication (rubric docs, failure-mode taxonomies).
You write code with AI daily and have real opinions on which agent behaviors actually matter.
Bonus: open-source eval-framework contributions; red-team/adversarial testing; voice eval (latency, interruption, transcription accuracy); ML eval/observability background.
On-site in San Francisco, in person, long days, high standards. AI quality is the discipline that decides whether the whole bet holds, which means the work is scrutinized and the bar is high — your evals are what let everyone else ship fast without flying blind. The right person wants that leverage and that pace.
Compensation (OTE): $176,000–$253,000 cash (base + target performance bonus), plus competitive equity.
Location: San Francisco, in-office. Based here or willing to relocate.
Benefits: Uber commuter benefits; breakfast, lunch, and dinner provided; snacks and coffee stocked; free gym membership; health, dental, and vision.
Process: Founder call (15 min) → Tech Lead deep-dive (60 min, eval architecture and real failure modes) → Super Day on-site → founder + Tech Lead offer. No committee. Best offer, first.
To apply: If you've turned vibes into a number — built an eval suite that caught a regression a model upgrade silently introduced — send your resume, the framework, and a transcript of a failure you found that nobody else did.