Introduction

AgentStateCrucible is an agent testing and validation framework built on AgentStateGraph (ASG) primitives: plans, policies, tasks, decision commits, blame, and sealed epochs.

Crucible runs the same scenario against multiple agents, captures every decision as an auditable graph, and uses an independent judge agent to score the runs side by side.
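As a rough illustration of that loop, here is a minimal Python sketch. It is not Crucible's API: `Scenario`, `Run`, `run_crucible`, and the agent and judge call signatures are all hypothetical names chosen for this example. It only shows the shape of the idea: each candidate gets an identical, isolated copy of the scenario, and a separate judge scores the collected runs.

```python
import copy
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Scenario:
    # Hypothetical container: the inputs every candidate agent receives.
    starting_state: dict
    task: str
    policy: dict


@dataclass
class Run:
    # Hypothetical record of one agent's attempt at the scenario.
    agent_name: str
    decisions: list = field(default_factory=list)
    final_state: dict = field(default_factory=dict)


# Assumed signatures: an agent maps a scenario to (decisions, final_state);
# a judge maps the scenario plus all runs to a score per agent name.
Agent = Callable[[Scenario], tuple[list, dict]]
Judge = Callable[[Scenario, list[Run]], dict[str, float]]


def run_crucible(scenario: Scenario, agents: dict[str, Agent], judge: Judge):
    """Run every agent on its own deep copy of the scenario, then judge the runs."""
    runs = []
    for name, agent in agents.items():
        # Deep-copy so one agent's mutations cannot leak into the next agent's inputs.
        isolated = copy.deepcopy(scenario)
        decisions, final_state = agent(isolated)
        runs.append(Run(name, decisions, final_state))
    # The judge sees all runs at once so it can compare them against each other.
    scores = judge(scenario, runs)
    return runs, scores
```

The deep copy is the important design choice in this sketch: without it, the second agent could start from state the first agent left behind, and the comparison would no longer be like for like.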

Why a crucible?

Picking an agent for a job is a credentialing problem. Today the answer is vibes — a few demo runs, a leaderboard scraped from someone else’s benchmark, a hunch. Crucible replaces vibes with side-by-side judgment over auditable runs:

Same scenario. Same starting state, same task, same policy. Every candidate agent gets the same inputs.
Every decision a commit. ASG decision commits capture intent, reasoning, confidence, alternatives, and authority. No “trust me, it worked.”
An independent judge agent. An LLM judge scores each run on correctness, reasoning quality, and the blast radius of its effects. The judge is pluggable: bring your own.
A sealed epoch. The entire run is bundled into a tamper-evident Merkle-rooted epoch. Hand it to an auditor and they can verify it without trusting the harness.
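To make the last two ideas concrete, the sketch below models a decision commit with the five fields named above and seals a list of commits under a single Merkle root. It is an assumption-laden illustration, not ASG's actual schema or epoch format: the `DecisionCommit` class, the JSON canonicalization, the SHA-256 hashing, and the `seal_epoch` helper are all hypothetical choices made for this example.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class DecisionCommit:
    # Hypothetical schema: only the field names come from the text above.
    intent: str
    reasoning: str
    confidence: float
    alternatives: tuple
    authority: str


def leaf_hash(commit: DecisionCommit) -> bytes:
    # Canonical JSON (sorted keys) so identical commits always hash identically.
    payload = json.dumps(asdict(commit), sort_keys=True).encode("utf-8")
    # The 0x00 prefix marks a leaf; interior nodes use 0x01 below.
    return hashlib.sha256(b"\x00" + payload).digest()


def merkle_root(leaves: list[bytes]) -> bytes:
    if not leaves:
        return hashlib.sha256(b"").digest()
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2 == 1:
            # Simplification: duplicate the last node on odd-sized levels.
            level.append(level[-1])
        level = [
            hashlib.sha256(b"\x01" + level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0]


def seal_epoch(commits: list[DecisionCommit]) -> str:
    """Return a hex Merkle root covering every commit, in order."""
    return merkle_root([leaf_hash(c) for c in commits]).hex()
```

Under these assumptions, an auditor who holds the commits and the published root can recompute `seal_epoch` independently; any edited field or reordered commit yields a different root. Duplicating the last node on odd levels keeps the sketch short, but a production design would likely handle odd levels differently to rule out ambiguous trees.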

Sister projects

CTXone — underlying state graph store (used as backing store)
AgentStateDeveloper — code-level ledger/effects
AgentStateRouter — agent routing
AgentStateGraph demo — the demo that birthed the validation concept

Status

Bootstrap. See CRUCIBLE.md in the repo for the v0 plan.