About the role

Crogl · Remote

Crogl: Fully Autonomous AI Team Member for Your SOC

Crogl is building AI-powered systems that help security teams investigate, understand, and respond to threats faster. We combine advances in large language models, agent architectures, and security expertise to create intelligent systems that solve real-world problems for security practitioners.

We’re looking for AI Engineers who are excited about building practical AI products. You’ll help design, evaluate, and improve agentic systems that operate in production, working closely with customers, engineers, and product teams to push the boundaries of what’s possible with modern AI.

This role is ideal for early-career and mid-level engineers who are passionate about AI, enjoy shipping products, and want to work at the forefront of LLMs and agent systems.

What you’ll be doing:

Build LLM-powered features, workflows, and agentic systems that solve real customer problems.
Design and implement evaluation frameworks to measure agent quality, reliability, and business impact.
Create automated benchmarks, regression tests, and datasets for evaluating AI behavior.
Investigate agent failures and develop systematic approaches to improve performance.
Experiment with prompting, tool use, retrieval, memory, planning, and reasoning strategies.
Build infrastructure that supports rapid experimentation, evaluation, deployment, and monitoring.
Work closely with customers and internal teams to understand workflows and identify opportunities for AI automation.
Contribute to engineering best practices for testing, observability, and production reliability.
Stay current with advances in LLMs, agents, evaluation methodologies, and AI engineering.

A core part of this role: evaluating AI systems

Building agents is only half the challenge. Understanding whether they are actually improving is equally important. A significant portion of this role involves designing and maintaining evaluation systems for AI agents operating in real-world security workflows.

You’ll help answer questions like:

Is the agent producing accurate investigations?
Are changes to prompts, tools, or models actually improving outcomes?
How do we detect regressions before customers experience them?
Which failure modes matter most?
How do we measure reliability, trustworthiness, and business impact?

You’ll build benchmarks, datasets, automated evaluations, regression testing pipelines, and observability systems that help us continuously improve agent performance.

What you’ll bring to the team:

Strong programming skills, preferably in Python.
Solid software engineering fundamentals, including testing, debugging, and system design.
Experience building applications, projects, or products using LLMs and modern AI tools.
Ability to design experiments, interpret results, and make data-driven decisions.
Strong communication skills and willingness to collaborate across disciplines.
Curiosity, ownership, and a desire to learn quickly.

What makes you stand out from others:

Experience building AI agents, copilots, or workflow automation systems.
Experience designing evaluations, benchmarks, or testing frameworks for AI systems.
Familiarity with OpenAI, Anthropic, Gemini, or open-source LLM ecosystems.
Experience with retrieval systems, vector databases, and RAG architectures.
Familiarity with LangGraph, OpenAI Agents SDK, MCP, or similar agent frameworks.
Experience with observability, tracing, and production monitoring for AI systems.
Exposure to cybersecurity, security operations, or developer tooling.
Open-source contributions, research projects, or personal AI products.

Example projects:

Building evaluation suites that measure the quality of AI-driven security investigations.
Creating automated regression tests to detect model and prompt regressions.
Improving agent reasoning, planning, and tool-use capabilities.
Developing datasets and benchmarks that reflect real customer workflows.
Building systems that automatically identify and categorize agent failures.
Designing feedback loops that continuously improve production agent performance.

Who thrives at Crogl:

The strongest candidates for this role are usually builders.

You may have experience as a software engineer, ML engineer, researcher, AI engineer, or founder. What matters most is your ability to learn quickly, work independently, and ship impactful systems. We’re particularly excited about candidates who have:

Built and deployed AI agents.
Created evaluation frameworks for LLM applications.
Published AI projects, demos, or open-source contributions.
Developed tools that other people actively use.
Formed strong opinions about what makes AI systems reliable and useful.

If you’ve spent nights and weekends building AI products because you genuinely enjoy it, you’ll probably fit in well here.

What we value:

We care more about demonstrated ability than specific credentials.

You might be a great fit if you’ve:

Built AI products, agents, or developer tools that people actually use.
Created evaluation frameworks, benchmarks, or testing systems for LLM applications.
Contributed to open-source AI projects.
Conducted independent research or published technical writing.
Built ambitious side projects and iterated on them based on real-world feedback.
Shown exceptional curiosity and a track record of learning quickly.

We’re especially interested in candidates who can show us what they’ve built. GitHub repositories, demos, blog posts, open-source contributions, benchmarks, evaluation frameworks, AI products, and side projects are all valuable signals.

We don’t expect candidates to check every box. If you’re excited about the role and believe you can contribute, we’d encourage you to apply.

What success looks like:

Within your first six months, you will:

Ship improvements to production AI systems used by customers.
Help establish rigorous evaluation practices across the company.
Contribute new ideas that improve agent performance and reliability.
Develop a strong understanding of customer workflows and security investigations.
Become a trusted contributor across engineering, product, and AI initiatives.

Why Crogl?

Work on some of the most challenging problems in applied AI.
Help define how agentic systems are evaluated and deployed in production.
Join a small, highly collaborative team with significant ownership and impact.
Learn quickly while working alongside experienced engineers, researchers, and security experts.
Shape the future of AI-powered security operations.
