Gauntlet is a two-agent adversarial loop that infers software correctness by observing how code behaves under sustained, targeted attack. It's designed as quality control for a dark factory environment — where code is written by bots and verified by attack.
The name comes from "running the gauntlet": a challenge where you must survive a sustained barrage from all sides. Here, the Inspector drives the system under test through escalating tiers of adversarial pressure until hidden failure modes become detectable — then gates promotion on whether any signal came through.
AI-written code can look correct — following conventions, passing linting, reading plausibly — while hiding behavioral failures that only surface under real use. Traditional tests don't catch this because the same agent that wrote the code also wrote the tests, sharing the same blind spots. Gauntlet is built for this: the Attacker generates plans the code author never considered, the Inspector assumes the code is broken until proven otherwise, and the blockers in each Weapon are never shown to the Attacker, preserving a real train/test split: the agent cannot write code that passes simply by knowing what the tests check.
An Attacker uses a Weapon aimed at a Target to generate Plans. A Drone executes those Plans as a User. An Inspector watches and surfaces Findings. Hidden Vitals — externally observable truths about expected system behavior — are checked independently to produce a Clearance.
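The vocabulary above can be pictured as a minimal data model. This is an illustrative sketch with hypothetical names and fields, not Gauntlet's actual internal types:

```python
from dataclasses import dataclass, field

# Hypothetical data model illustrating the roles above --
# the names and fields are assumptions, not Gauntlet internals.

@dataclass
class Target:
    """The API surface a Weapon is aimed at."""
    title: str
    endpoints: list[str]

@dataclass
class Weapon:
    """A reusable attack strategy; its blockers are the hidden Vitals."""
    title: str
    description: str
    blockers: list[str]  # never shown to the Attacker

@dataclass
class Plan:
    """One adversarial scenario for the Drone to execute as a User."""
    user: str
    steps: list[str]

@dataclass
class Finding:
    """Something suspicious the Inspector surfaced from execution."""
    severity: str
    detail: str

@dataclass
class Clearance:
    """Verdict produced by checking the hidden Vitals independently."""
    confidence_score: float
    findings: list[Finding] = field(default_factory=list)
```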
Set your LLM credentials, then point Gauntlet at a running API:
```shell
export GAUNTLET_ATTACKER_TYPE=openai
export GAUNTLET_ATTACKER_KEY=sk-...
export GAUNTLET_INSPECTOR_TYPE=anthropic
export GAUNTLET_INSPECTOR_KEY=sk-ant-...

git clone git@github.com:coilysiren/gauntlet.git
cd gauntlet
docker compose run --rm demo
```

That starts the demo API and runs the full adversarial loop against it.
```shell
pip install gauntlet
# or: uv add gauntlet
```

For workflow guidance (when to run, how to integrate, how to act on results), see docs/usage.md.
Gauntlet requires one LLM for the Attacker role and one for the Inspector role. Configure each with a pair of environment variables:
| Variable | Description |
|---|---|
| `GAUNTLET_ATTACKER_TYPE` | LLM provider for the Attacker: `openai` or `anthropic` |
| `GAUNTLET_ATTACKER_KEY` | API key for the Attacker's provider |
| `GAUNTLET_INSPECTOR_TYPE` | LLM provider for the Inspector: `openai` or `anthropic` |
| `GAUNTLET_INSPECTOR_KEY` | API key for the Inspector's provider |
The default models are gpt-4o for OpenAI and claude-opus-4-5 for Anthropic.
Using different providers for each role is intentional — model diversity reduces blind spots.
```shell
gauntlet [url] [--config FILE] [--arsenal FILE] [--weapon FILE_OR_DIR] [--target FILE_OR_DIR] [--openapi FILE] [--users FILE] [--threshold N] [--no-fail-fast]
```
| Argument | Default | Description |
|---|---|---|
| `url` | from config, or required | Base URL of the running API |
| `--config` | `.gauntlet/config.yaml` | Path to a YAML config file; CLI flags override config values |
| `--arsenal` | none | Path to an Arsenal YAML file (a named collection of weapons) |
| `--weapon` | `.gauntlet/weapons` | Path to a Weapon YAML file, or a directory of YAML files (one weapon per file) |
| `--target` | `.gauntlet/targets` | Path to a Target YAML file, or a directory of YAML files (one target per file) |
| `--openapi` | none | Path to an OpenAPI 3.x YAML/JSON spec; auto-generates Target objects |
| `--users` | `.gauntlet/users.yaml` | Path to a users YAML file |
| `--threshold` | `0.90` | Holdout satisfaction score required to recommend merge |
| `--fail-fast` / `--no-fail-fast` | enabled | Stop at the first critical finding; use `--no-fail-fast` to run all iterations |
```shell
gauntlet http://localhost:8000
gauntlet http://localhost:8000 --no-fail-fast
gauntlet http://localhost:8000 --openapi openapi.yaml
gauntlet http://localhost:8000 --arsenal .gauntlet/arsenal.yaml
gauntlet --config .gauntlet/config.yaml
```

Output is YAML:
```yaml
risk_report:
  confidence_score: 0.06
  risk_level: critical
  confirmed_failures:
    - unauthorized_cross_user_modification # userB rewrote userA's task
  coverage:
    - GET /tasks/42
    - PATCH /tasks/42
    - POST /tasks
  conclusion: >-
    System fails under adversarial pressure and should not be promoted
    without remediation.
```

Place your Gauntlet config files in a .gauntlet/ directory at the root of your project.
The CLI discovers them automatically — no flags needed for the common case:
```text
your-project/
├── .gauntlet/
│   ├── weapons/              # one YAML file per Weapon — all loaded automatically
│   │   ├── task_ownership.yaml
│   │   └── task_read_isolation.yaml
│   ├── targets/              # one YAML file per Target — all loaded automatically
│   │   └── task_endpoints.yaml
│   └── users.yaml            # User auth — loaded automatically if present
└── ...
```
Override any path with --weapon FILE_OR_DIR, --target FILE_OR_DIR, or --users FILE.
A Weapon defines a reusable attack strategy. The blockers are the Weapon's Vitals — externally observable truths about expected system behavior — never shown to the Attacker, preserving the train/test separation.
```yaml
# .gauntlet/weapons/task_ownership.yaml
title: Users cannot modify each other's tasks
description: >
  The task API must enforce resource ownership. A user who did not create
  a task must not be able to modify or delete it.
blockers:
  - A PATCH request by a non-owner is rejected with 403
  - The task body is unchanged after an unauthorized PATCH attempt
  - A GET by the owner after an unauthorized PATCH returns the original data
```

A Target defines the API surface a Weapon is tested against. One target per YAML file. Point multiple targets at the same weapon to test the same attack across different API surfaces.
```yaml
# .gauntlet/targets/task_endpoints.yaml
title: Task ownership endpoints
endpoints:
  - POST /tasks
  - PATCH /tasks/{id}
  - GET /tasks/{id}
```

Create .gauntlet/users.yaml to provide per-user credentials. Secret values are
never stored in the file — each entry names an environment variable that holds the
actual credential. Users omitted from the file fall back to the default X-User: <name> header.
```yaml
# .gauntlet/users.yaml
users:
  alice:
    type: bearer
    token_env: ALICE_TOKEN   # export ALICE_TOKEN=eyJ...
  bob:
    type: api_key
    header: X-API-Key
    key_env: BOB_API_KEY     # export BOB_API_KEY=sk-...
```

Supported authentication types:
| Type | Fields | Header sent |
|---|---|---|
| `bearer` | `token_env` | `Authorization: Bearer <$token_env>` |
| `api_key` | `header`, `key_env` | `<header>: <$key_env>` |
Gauntlet treats code change correctness as a problem of behavioral observation while under attack.
- Code is assumed to be untrusted: possibly written by a human, but designed to be written by a bot
- Tests are generated dynamically
- Confidence emerges from what survives adversarial probing
It asks: "How hard did we try to break this, and what happened when we did?"
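One way to picture that question as arithmetic: score how much of the hidden evidence survived probing, then gate against the merge threshold. This is an illustrative formula tied to the `--threshold` default of 0.90, not Gauntlet's actual scoring:

```python
def confidence(blockers_held: int, blockers_total: int,
               threshold: float = 0.90) -> tuple[float, bool]:
    """Illustrative scoring sketch (not Gauntlet's real formula):
    the fraction of hidden Vitals that survived adversarial probing,
    gated against the threshold required to recommend merge."""
    score = blockers_held / blockers_total
    return score, score >= threshold
```

Under this sketch, a single violated blocker out of three would yield a score of about 0.67 and block promotion.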
Explores the execution space
- Constructs plausible, production-like plans
- Simulates how the system will actually be used (and misused)
- Explores workflows, edge cases, and state transitions
- Adapts based on what has already been tested
The Attacker is not trying to prove correctness. It is trying to create situations where correctness might fail.
Applies intelligent pressure
- Analyzes execution results for weaknesses
- Identifies suspicious passes and untested assumptions
- Forms hypotheses about hidden failure modes
- Forces the next round of plans toward likely breakpoints
The Inspector assumes "This system is broken. I just haven't proven it yet."
- The Attacker explores
- The Inspector sharpens
- Execution grounds both
Together, they perform a form of guided adversarial search over the space of possible failures.
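The shape of that search can be sketched as a loop skeleton. The callables stand in for the LLM-driven roles and real HTTP execution; all names here are hypothetical, not Gauntlet's implementation:

```python
def run_gauntlet(attacker_generate, drone_execute, inspector_review,
                 check_vitals, iterations=5, fail_fast=True):
    """Skeleton of the adversarial loop (illustrative sketch):
    the Attacker proposes plans, the Drone executes them, and the
    Inspector sharpens the next round from what execution revealed."""
    history, findings = [], []
    for _ in range(iterations):
        # Attacker explores: new plans, adapted to what was already tried
        for plan in attacker_generate(history):
            history.append((plan, drone_execute(plan)))  # execution grounds both roles
        # Inspector applies pressure: flag suspicious passes, steer the next round
        new_findings = inspector_review(history)
        findings.extend(new_findings)
        if fail_fast and any(f["severity"] == "critical" for f in new_findings):
            break  # mirrors the --fail-fast default
    # Hidden Vitals are checked independently; the Attacker never sees them
    return check_vitals(history), findings
```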
Gauntlet is not:
- a test runner
- a code reviewer
- a fuzzing tool
It is an adversarial inference engine for software correctness.
It combines:
- dynamic plan generation (like red teaming)
- execution grounding (like CI)
- adversarial refinement (like security testing)
These projects occupy the same space — adversarial testing of running services.
Stateful REST API fuzzer from Microsoft Research. RESTler generates and executes sequences of HTTP requests against a live service, inferring producer-consumer dependencies between endpoints from the OpenAPI spec to explore deep service states.
Shared ground: attacks a running HTTP server with multi-step request sequences, finds bugs that only manifest through specific request orderings, and checks for both security and reliability failures.
Architectural divergence: RESTler uses grammar-based fuzzing derived from the OpenAPI spec, not LLM reasoning. Validation is hardcoded checkers (status codes, schema conformance), not an Inspector that reasons about what looks suspicious. There is no train/test split — all validation rules are visible to the generation logic. Output is boolean pass/fail per sequence, not a probabilistic confidence score.
Property-based API testing built on the Hypothesis framework. Generates thousands of test cases from OpenAPI/GraphQL schemas and executes them against a live API to find crashes, schema violations, and stateful workflow bugs.
Shared ground: tests a live running API, supports stateful multi-step workflows where earlier requests create resources consumed by later ones, and is deliberately adversarial — generating edge cases, boundary conditions, and invalid inputs to break the API.
Architectural divergence: generation is algorithmic (property-based testing), not LLM-driven. There is no Attacker/Inspector separation — generation and validation are unified. No hidden blockers or train/test split. Results are deterministic pass/fail, not probabilistic confidence.
LLM-powered fuzzer from ETH Zurich that generates natural-language test prompts and executes them against LLM agent tools, detecting both runtime crashes and semantic correctness failures.
Shared ground: uses LLMs to generate adversarial inputs and has separate generation and evaluation phases — prompts are generated, executed against the target, and then an LLM judges whether outputs are semantically correct. This is the closest architectural parallel to the Attacker/Drone/Inspector pipeline.
Architectural divergence: targets LLM agent tools (LangChain, Composio) rather than arbitrary HTTP APIs. No hidden blockers or train/test split — the evaluator sees all context. Attacks are individual prompts, not multi-step chained API call sequences. No probabilistic confidence scoring.