Introduction
AgentStateCrucible is an agent testing and validation framework built on AgentStateGraph (ASG) primitives: plans, policies, tasks, decision commits, blame, and sealed epochs.
Crucible runs the same scenario against multiple agents, records every decision in an auditable graph, and uses a third-party judge agent to score the runs side by side.
Why a crucible?
Picking an agent for a job is a credentialing problem. Today the answer is vibes: a few demo runs, a leaderboard scraped from someone else’s benchmark, a hunch. Crucible replaces vibes with side-by-side judgment over auditable runs:
- Same scenario. Same starting state, same task, same policy. Every candidate agent gets the same inputs.
- Every decision a commit. ASG decision commits capture intent, reasoning, confidence, alternatives, and authority. No “trust me, it worked.”
- A third judge agent. An LLM judge scores runs on correctness, reasoning quality, and effect blast radius. Pluggable — bring your own judge.
- A sealed epoch. The entire run is bundled into a tamper-evident Merkle-rooted epoch. Hand it to an auditor and they can verify it without trusting the harness.
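To make the "sealed epoch" idea concrete, here is a minimal sketch of how a run's decision commits could be hashed into a single Merkle root that an auditor recomputes independently. The `DecisionCommit` fields mirror the list above, but the class itself, `seal_epoch`, `verify_epoch`, and the hashing layout are illustrative assumptions, not ASG's or Crucible's actual schema, API, or epoch format.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class DecisionCommit:
    """Hypothetical record shape; not ASG's real decision-commit schema."""
    intent: str
    reasoning: str
    confidence: float
    alternatives: tuple[str, ...]
    authority: str


def leaf_hash(commit: DecisionCommit) -> bytes:
    # Canonical JSON (sorted keys) so an identical commit always hashes identically.
    payload = json.dumps(asdict(commit), sort_keys=True).encode("utf-8")
    # The 0x00 prefix keeps leaf hashes distinct from interior-node hashes.
    return hashlib.sha256(b"\x00" + payload).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    # The 0x01 prefix marks an interior node.
    return hashlib.sha256(b"\x01" + left + right).digest()


def merkle_root(leaves: list[bytes]) -> bytes:
    if not leaves:
        return hashlib.sha256(b"").digest()
    level = list(leaves)
    while len(level) > 1:
        # Hash adjacent pairs; an unpaired last node is promoted unchanged.
        nxt = [node_hash(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            nxt.append(level[-1])
        level = nxt
    return level[0]


def seal_epoch(commits: list[DecisionCommit]) -> str:
    """Return the hex Merkle root over all commits in run order."""
    return merkle_root([leaf_hash(c) for c in commits]).hex()


def verify_epoch(commits: list[DecisionCommit], sealed_root: str) -> bool:
    """Recompute the root from the commits alone; no trust in the harness needed."""
    return seal_epoch(commits) == sealed_root


# Usage: seal a two-commit run, then show that editing one field is detected.
commits = [
    DecisionCommit("open ticket", "user reported a bug", 0.8, ("ignore",), "policy:support"),
    DecisionCommit("apply patch", "tests reproduce the bug", 0.6, ("escalate",), "policy:dev"),
]
root = seal_epoch(commits)
assert verify_epoch(commits, root)

tampered = [replace(commits[0], confidence=0.99), commits[1]]
assert not verify_epoch(tampered, root)
```

Because the root depends on every field of every commit and on their order, changing a single confidence value after the fact yields a different root. A real epoch format would likely also bind metadata such as the scenario and policy identifiers; that is left out of this sketch.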
Sister projects
- CTXone — underlying state graph store (used as backing store)
- AgentStateDeveloper — code-level ledger/effects
- AgentStateRouter — agent routing
- AgentStateGraph demo — the demo that birthed the validation concept
Status
Bootstrap. See CRUCIBLE.md in the repo for the v0 plan.