Gauntlet is a two-agent adversarial loop that infers software correctness by observing how code behaves under sustained, targeted attack. It's designed as quality control for a dark factory environment — where code is written by bots and verified by attack.
The name comes from "running the gauntlet": a challenge where you must survive a sustained barrage from all sides. Here, the Inspector drives the system under test through escalating tiers of adversarial pressure until hidden failure modes become detectable — then gates promotion on whether any signal came through.
AI-written code can look correct — following conventions, passing linting, reading plausibly — while hiding behavioral failures that only surface under real use. Traditional tests don't catch this because the same agent that wrote the code also wrote the tests, sharing the same blind spots. Gauntlet is built for this: the Attacker generates plans the code author never considered, the Inspector assumes the code is broken until proven otherwise, and the blockers in each Weapon are never shown to the Attacker, preserving a real train/test split: the agent cannot write code that passes simply by knowing what the tests check.
An Attacker uses a Weapon aimed at a Target to generate Plans. A Drone executes those Plans as a User. An Inspector watches and surfaces Findings. Hidden Vitals — externally observable truths about expected system behavior — are checked independently to produce a Clearance.
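The vocabulary above can be pictured as a minimal data model. This is an illustrative sketch with hypothetical names and fields, not Gauntlet's actual internal types:

```python
from dataclasses import dataclass, field

# Hypothetical data model illustrating the roles above --
# the names and fields are assumptions, not Gauntlet internals.

@dataclass
class Target:
    """The API surface a Weapon is aimed at."""
    title: str
    endpoints: list[str]

@dataclass
class Weapon:
    """A reusable attack strategy; its blockers are the hidden Vitals."""
    title: str
    description: str
    blockers: list[str]  # never shown to the Attacker

@dataclass
class Plan:
    """One adversarial scenario for the Drone to execute as a User."""
    user: str
    steps: list[str]

@dataclass
class Finding:
    """Something suspicious the Inspector surfaced from execution."""
    severity: str
    detail: str

@dataclass
class Clearance:
    """Verdict produced by checking the hidden Vitals independently."""
    confidence_score: float
    findings: list[Finding] = field(default_factory=list)
```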
Set your LLM credentials, then point Gauntlet at a running API:
```shell
export GAUNTLET_ATTACKER_TYPE=openai
export GAUNTLET_ATTACKER_KEY=sk-...
export GAUNTLET_INSPECTOR_TYPE=anthropic
export GAUNTLET_INSPECTOR_KEY=sk-ant-...

git clone git@github.com:coilysiren/gauntlet.git
cd gauntlet
docker compose run --rm demo
```

That starts the demo API and runs the full adversarial loop against it.
```shell
pip install gauntlet
# or: uv add gauntlet
```

For workflow guidance (when to run, how to integrate, how to act on results), see docs/usage.md.
Gauntlet requires one LLM for the Attacker role and one for the Inspector role. Configure each with a pair of environment variables:
| Variable | Description |
|---|---|
| `GAUNTLET_ATTACKER_TYPE` | LLM provider for the Attacker: `openai` or `anthropic` |
| `GAUNTLET_ATTACKER_KEY` | API key for the Attacker's provider |
| `GAUNTLET_INSPECTOR_TYPE` | LLM provider for the Inspector: `openai` or `anthropic` |
| `GAUNTLET_INSPECTOR_KEY` | API key for the Inspector's provider |
The default models are gpt-4o for OpenAI and claude-opus-4-5 for Anthropic.
Using different providers for each role is intentional — model diversity reduces blind spots.
```shell
gauntlet [url] [--config FILE] [--arsenal FILE] [--weapon FILE_OR_DIR] [--target FILE_OR_DIR] [--openapi FILE] [--users FILE] [--threshold N] [--no-fail-fast]
```
| Argument | Default | Description |
|---|---|---|
| `url` | from config, or required | Base URL of the running API |
| `--config` | `.gauntlet/config.yaml` | Path to a YAML config file; CLI flags override config values |
| `--arsenal` | none | Path to an Arsenal YAML file (a named collection of weapons) |
| `--weapon` | `.gauntlet/weapons` | Path to a Weapon YAML file, or a directory of YAML files (one weapon per file) |
| `--target` | `.gauntlet/targets` | Path to a Target YAML file, or a directory of YAML files (one target per file) |
| `--openapi` | none | Path to an OpenAPI 3.x YAML/JSON spec; auto-generates Target objects |
| `--users` | `.gauntlet/users.yaml` | Path to a users YAML file |
| `--threshold` | `0.90` | Holdout satisfaction score required to recommend merge |
| `--fail-fast` / `--no-fail-fast` | enabled | Stop at the first critical finding; use `--no-fail-fast` to run all iterations |
```shell
gauntlet http://localhost:8000
gauntlet http://localhost:8000 --no-fail-fast
gauntlet http://localhost:8000 --openapi openapi.yaml
gauntlet http://localhost:8000 --arsenal .gauntlet/arsenal.yaml
gauntlet --config .gauntlet/config.yaml
```

Output is YAML:
```yaml
risk_report:
  confidence_score: 0.06
  risk_level: critical
  confirmed_failures:
    - unauthorized_cross_user_modification # userB rewrote userA's task
  coverage:
    - GET /tasks/42
    - PATCH /tasks/42
    - POST /tasks
  conclusion: >-
    System fails under adversarial pressure and should not be promoted
    without remediation.
```

Place your Gauntlet config files in a .gauntlet/ directory at the root of your project.
The CLI discovers them automatically — no flags needed for the common case:
```text
your-project/
├── .gauntlet/
│   ├── weapons/              # one YAML file per Weapon — all loaded automatically
│   │   ├── task_ownership.yaml
│   │   └── task_read_isolation.yaml
│   ├── targets/              # one YAML file per Target — all loaded automatically
│   │   └── task_endpoints.yaml
│   └── users.yaml            # User auth — loaded automatically if present
└── ...
```
Override any path with --weapon FILE_OR_DIR, --target FILE_OR_DIR, or --users FILE.
A Weapon defines a reusable attack strategy. The blockers are the Weapon's Vitals — externally observable truths about expected system behavior — never shown to the Attacker, preserving the train/test separation.
```yaml
# .gauntlet/weapons/task_ownership.yaml
title: Users cannot modify each other's tasks
description: >
  The task API must enforce resource ownership. A user who did not create
  a task must not be able to modify or delete it.
blockers:
  - A PATCH request by a non-owner is rejected with 403
  - The task body is unchanged after an unauthorized PATCH attempt
  - A GET by the owner after an unauthorized PATCH returns the original data
```

A Target defines the API surface a Weapon is tested against. One target per YAML file. Point multiple targets at the same weapon to test the same attack across different API surfaces.
```yaml
# .gauntlet/targets/task_endpoints.yaml
title: Task ownership endpoints
endpoints:
  - POST /tasks
  - PATCH /tasks/{id}
  - GET /tasks/{id}
```

Create .gauntlet/users.yaml to provide per-user credentials. Secret values are
never stored in the file — each entry names an environment variable that holds the
actual credential. Users omitted from the file fall back to the default X-User: <name> header.
```yaml
# .gauntlet/users.yaml
users:
  alice:
    type: bearer
    token_env: ALICE_TOKEN   # export ALICE_TOKEN=eyJ...
  bob:
    type: api_key
    header: X-API-Key
    key_env: BOB_API_KEY     # export BOB_API_KEY=sk-...
```

Supported authentication types:
| Type | Fields | Header sent |
|---|---|---|
| `bearer` | `token_env` | `Authorization: Bearer <$token_env>` |
| `api_key` | `header`, `key_env` | `<header>: <$key_env>` |
Gauntlet treats code change correctness as a problem of behavioral observation while under attack.
- Code is assumed to be untrusted: possibly written by a human, but designed to be written by a bot
- Tests are generated dynamically
- Confidence emerges from what survives adversarial probing
It asks: "How hard did we try to break this, and what happened when we did?"
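One way to picture that question as arithmetic: score how much of the hidden evidence survived probing, then gate against the merge threshold. This is an illustrative formula tied to the `--threshold` default of 0.90, not Gauntlet's actual scoring:

```python
def confidence(blockers_held: int, blockers_total: int,
               threshold: float = 0.90) -> tuple[float, bool]:
    """Illustrative scoring sketch (not Gauntlet's real formula):
    the fraction of hidden Vitals that survived adversarial probing,
    gated against the threshold required to recommend merge."""
    score = blockers_held / blockers_total
    return score, score >= threshold
```

Under this sketch, a single violated blocker out of three would yield a score of about 0.67 and block promotion.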
Explores the execution space
- Constructs plausible, production-like plans
- Simulates how the system will actually be used (and misused)
- Explores workflows, edge cases, and state transitions
- Adapts based on what has already been tested
The Attacker is not trying to prove correctness. It is trying to create situations where correctness might fail.
Applies intelligent pressure
- Analyzes execution results for weaknesses
- Identifies suspicious passes and untested assumptions
- Forms hypotheses about hidden failure modes
- Forces the next round of plans toward likely breakpoints
The Inspector assumes "This system is broken. I just haven't proven it yet."
- The Attacker explores
- The Inspector sharpens
- Execution grounds both
Together, they perform a form of guided adversarial search over the space of possible failures.
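The shape of that search can be sketched as a loop skeleton. The callables stand in for the LLM-driven roles and real HTTP execution; all names here are hypothetical, not Gauntlet's implementation:

```python
def run_gauntlet(attacker_generate, drone_execute, inspector_review,
                 check_vitals, iterations=5, fail_fast=True):
    """Skeleton of the adversarial loop (illustrative sketch):
    the Attacker proposes plans, the Drone executes them, and the
    Inspector sharpens the next round from what execution revealed."""
    history, findings = [], []
    for _ in range(iterations):
        # Attacker explores: new plans, adapted to what was already tried
        for plan in attacker_generate(history):
            history.append((plan, drone_execute(plan)))  # execution grounds both roles
        # Inspector applies pressure: flag suspicious passes, steer the next round
        new_findings = inspector_review(history)
        findings.extend(new_findings)
        if fail_fast and any(f["severity"] == "critical" for f in new_findings):
            break  # mirrors the --fail-fast default
    # Hidden Vitals are checked independently; the Attacker never sees them
    return check_vitals(history), findings
```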
Gauntlet is not:
- a test runner
- a code reviewer
- a fuzzing tool
It is an adversarial inference engine for software correctness.
It combines:
- dynamic plan generation (like red teaming)
- execution grounding (like CI)
- adversarial refinement (like security testing)
These projects occupy the same space — adversarial testing of running services.
Stateful REST API fuzzer from Microsoft Research. RESTler generates and executes sequences of HTTP requests against a live service, inferring producer-consumer dependencies between endpoints from the OpenAPI spec to explore deep service states.
Shared ground: attacks a running HTTP server with multi-step request sequences, finds bugs that only manifest through specific request orderings, and checks for both security and reliability failures.
Architectural divergence: RESTler uses grammar-based fuzzing derived from the OpenAPI spec, not LLM reasoning. Validation is hardcoded checkers (status codes, schema conformance), not an Inspector that reasons about what looks suspicious. There is no train/test split — all validation rules are visible to the generation logic. Output is boolean pass/fail per sequence, not a probabilistic confidence score.
Property-based API testing built on the Hypothesis framework. Generates thousands of test cases from OpenAPI/GraphQL schemas and executes them against a live API to find crashes, schema violations, and stateful workflow bugs.
Shared ground: tests a live running API, supports stateful multi-step workflows where earlier requests create resources consumed by later ones, and is deliberately adversarial — generating edge cases, boundary conditions, and invalid inputs to break the API.
Architectural divergence: generation is algorithmic (property-based testing), not LLM-driven. There is no Attacker/Inspector separation — generation and validation are unified. No hidden blockers or train/test split. Results are deterministic pass/fail, not probabilistic confidence.
LLM-powered fuzzer from ETH Zurich that generates natural-language test prompts and executes them against LLM agent tools, detecting both runtime crashes and semantic correctness failures.
Shared ground: uses LLMs to generate adversarial inputs and has separate generation and evaluation phases — prompts are generated, executed against the target, and then an LLM judges whether outputs are semantically correct. This is the closest architectural parallel to the Attacker/Drone/Inspector pipeline.
Architectural divergence: targets LLM agent tools (LangChain, Composio) rather than arbitrary HTTP APIs. No hidden blockers or train/test split — the evaluator sees all context. Attacks are individual prompts, not multi-step chained API call sequences. No probabilistic confidence scoring.