Quick Start

Install

cd AgentStateCrucible
pip install -e .

Define a scenario

scenarios/billing-refactor.yaml:

scenario: billing-refactor
starting_state:
  repo: ./fixtures/billing-repo
task: |
  Refactor the proration calculator to handle mid-cycle upgrades.
  The current implementation is correct but unreadable.
policy:
  allowed_effects: [io.fs.read, io.fs.write]
  forbidden_effects: [io.net.out]
expectations:
  - all_existing_tests_pass
  - no_new_io_categories
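
To make the policy block concrete, here is a minimal Python sketch of how a run's observed effects could be checked against `allowed_effects` and `forbidden_effects`. This is not Crucible's implementation: the `Policy` class and `violations` helper are hypothetical names, and the sketch assumes allow-list semantics (any effect not explicitly allowed counts as a violation), which the scenario file itself does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Hypothetical in-memory mirror of the scenario's `policy` block."""
    allowed_effects: frozenset
    forbidden_effects: frozenset


def violations(policy, observed_effects):
    """Return, sorted, the observed effects that the policy does not permit.

    Assumption: an effect is a violation if it is explicitly forbidden
    or if it is absent from the allow-list.
    """
    observed = set(observed_effects)
    bad = {
        effect
        for effect in observed
        if effect in policy.forbidden_effects
        or effect not in policy.allowed_effects
    }
    return sorted(bad)


# Values copied from scenarios/billing-refactor.yaml above.
policy = Policy(
    allowed_effects=frozenset({"io.fs.read", "io.fs.write"}),
    forbidden_effects=frozenset({"io.net.out"}),
)
```

Under these assumptions, `violations(policy, ["io.fs.read", "io.net.out"])` reports only the outbound network effect, while a run that stays within file reads and writes reports nothing.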

Run the bake-off

crucible run scenarios/billing-refactor.yaml \
  --agents agent-a,agent-b,agent-c \
  --epoch crucible-2026-05-30

Each agent receives the same scenario. Crucible records every decision an agent makes as a decision commit on that agent's own branch of the ASG store.

Judge

crucible judge \
  --epoch crucible-2026-05-30 \
  --judge opus-judge-v2

The judge agent reads each run, scores it on correctness, reasoning, and blast radius, and writes its rulings back as ratification commits.

Seal the epoch

crucible seal --epoch crucible-2026-05-30 --sign craig@

The epoch is now tamper-evident. Export it as a self-contained bundle to hand to a stakeholder or auditor:

crucible export --epoch crucible-2026-05-30 --out crucible-2026-05-30.json
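
As an illustration of what "tamper-evident" can mean in practice, the sketch below chains SHA-256 digests over a list of commits so that editing, dropping, or reordering any commit changes the final digest. It is a generic hash-chain example, not the actual seal or bundle format: the commit fields and the `chain_digest` and `verify` helpers are hypothetical, and a real seal would also involve the signature requested with `--sign`, which this sketch omits.

```python
import hashlib
import json


def chain_digest(commits):
    """Fold a list of JSON-serializable commits into one SHA-256 hex digest.

    Each step hashes the previous digest together with the canonical JSON
    of the next commit, so the result depends on every commit and on order.
    """
    digest = hashlib.sha256(b"").hexdigest()  # fixed seed for an empty chain
    for commit in commits:
        payload = json.dumps(commit, sort_keys=True).encode("utf-8")
        digest = hashlib.sha256(digest.encode("ascii") + payload).hexdigest()
    return digest


def verify(commits, sealed_digest):
    """Return True if the commits still hash to the digest recorded at seal time."""
    return chain_digest(commits) == sealed_digest


# Hypothetical commit records, for illustration only.
commits = [
    {"agent": "agent-a", "kind": "decision", "note": "extract helper"},
    {"agent": "agent-a", "kind": "ratification", "note": "accepted"},
]
seal = chain_digest(commits)
```

An auditor holding the sealed digest can recompute it from the exported commits; any mismatch means the contents changed after sealing.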

Replay against a new agent version

crucible diff --epoch q1-2026 --against crucible-2026-05-30

Any divergence (different reasoning, a different effect set, or a different outcome) is flagged for review, even if the test status didn’t change.
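
The comparison described above could be sketched as follows. This is a minimal illustration under stated assumptions, not the logic behind `crucible diff`: `RunSummary` and `divergences` are hypothetical names, and reasoning is compared as plain text, which a real tool would likely handle more loosely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunSummary:
    """Hypothetical summary of one agent run within an epoch."""
    reasoning: str
    effects: frozenset
    outcome: str
    tests_passed: bool


def divergences(baseline, candidate):
    """List the dimensions on which two runs differ.

    Test status is deliberately not consulted: two runs that both pass
    can still diverge in reasoning, effect set, or outcome.
    """
    flagged = []
    if baseline.reasoning != candidate.reasoning:
        flagged.append("reasoning")
    if baseline.effects != candidate.effects:
        flagged.append("effects")
    if baseline.outcome != candidate.outcome:
        flagged.append("outcome")
    return flagged


old_run = RunSummary(
    reasoning="split proration into helpers",
    effects=frozenset({"io.fs.read", "io.fs.write"}),
    outcome="refactored",
    tests_passed=True,
)
new_run = RunSummary(
    reasoning="split proration into helpers",
    effects=frozenset({"io.fs.read"}),
    outcome="refactored",
    tests_passed=True,
)
```

Here both runs pass their tests, yet `divergences(old_run, new_run)` still flags the changed effect set for review.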