Quick Start
Install

    git clone https://gitlab.agentstatelabs.com/AgentStateLabs/AgentStateCrucible
    cd AgentStateCrucible
    pip install -e .

Define a scenario
scenarios/billing-refactor.yaml:

    scenario: billing-refactor
    starting_state:
      repo: ./fixtures/billing-repo
    task: |
      Refactor the proration calculator to handle mid-cycle upgrades.
      The current implementation is correct but unreadable.
    policy:
      allowed_effects: [io.fs.read, io.fs.write]
      forbidden_effects: [io.net.out]
    expectations:
      - all_existing_tests_pass
      - no_new_io_categories

Run the bake-off

    crucible run scenarios/billing-refactor.yaml \
      --agents agent-a,agent-b,agent-c \
      --epoch crucible-2026-05-30

Each agent receives the same scenario. Crucible records every decision each agent makes as a decision commit on its own branch of the ASG store.

    crucible judge \
      --epoch crucible-2026-05-30 \
      --judge opus-judge-v2

The judge agent reads each run, scores it on correctness, reasoning, and blast radius, and writes its rulings back as ratification commits.
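
To picture the recording model described above, here is a minimal Python sketch of per-agent branches that hold decision commits followed by a ratification commit. It is not Crucible's implementation or API: the `Store` and `Commit` classes, the `commit_id` helper, the payload fields, and the score values are all hypothetical, and content-addressing commits by hash is an assumption borrowed from git-style stores.

```python
from __future__ import annotations

import hashlib
import json
from dataclasses import dataclass, field


def commit_id(parent: str | None, kind: str, payload: dict) -> str:
    """Derive a commit id from the parent id, the kind, and the payload (assumed scheme)."""
    body = json.dumps({"parent": parent, "kind": kind, "payload": payload}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


@dataclass
class Commit:
    id: str
    parent: str | None
    kind: str        # "decision" or "ratification" (hypothetical labels)
    payload: dict


@dataclass
class Store:
    """Hypothetical in-memory stand-in for the ASG store: one branch per agent."""
    branches: dict[str, list[Commit]] = field(default_factory=dict)

    def append(self, branch: str, kind: str, payload: dict) -> Commit:
        history = self.branches.setdefault(branch, [])
        parent = history[-1].id if history else None
        commit = Commit(commit_id(parent, kind, payload), parent, kind, payload)
        history.append(commit)
        return commit


store = Store()

# Every agent gets the same scenario; each decision lands on that agent's branch.
for agent in ("agent-a", "agent-b", "agent-c"):
    store.append(agent, "decision", {"agent": agent, "step": "read calculator", "effect": "io.fs.read"})
    store.append(agent, "decision", {"agent": agent, "step": "rewrite calculator", "effect": "io.fs.write"})

# The judge writes its ruling back onto the branch it reviewed (scores are made up).
ruling = store.append(
    "agent-a",
    "ratification",
    {"judge": "opus-judge-v2", "scores": {"correctness": 1.0, "reasoning": 0.8, "blast_radius": 0.9}},
)
```

Because each id in this sketch depends on its parent's id, a ratification commit pins the exact sequence of decisions it ruled on.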
Seal the epoch

    crucible seal --epoch crucible-2026-05-30 --sign craig@

The epoch is now tamper-evident. Export it as a self-contained bundle to hand to a stakeholder or auditor:

    crucible export --epoch crucible-2026-05-30 --out crucible-2026-05-30.json

Replay against a new agent version

    crucible diff --epoch q1-2026 --against crucible-2026-05-30

Any divergence (different reasoning, a different effect set, or a different outcome) is flagged for review, even if the test status didn't change.
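
The comparison described above can be pictured with a small Python sketch. It shows one way to flag divergence between two runs of the same scenario without consulting the test status. The `RunSummary` type, its fields, and the `divergences` function are hypothetical illustrations, not part of Crucible, and the sample values are invented.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class RunSummary:
    """Hypothetical summary of one agent run within an epoch."""
    reasoning: tuple[str, ...]   # ordered decision descriptions
    effects: frozenset[str]      # effect categories such as "io.fs.read"
    outcome: str                 # for example, a digest of the final repo state
    tests_passed: bool


def divergences(baseline: RunSummary, candidate: RunSummary) -> list[str]:
    """Return the aspects on which two runs differ; test status is deliberately ignored."""
    flagged = []
    if baseline.reasoning != candidate.reasoning:
        flagged.append("reasoning")
    if baseline.effects != candidate.effects:
        flagged.append("effects")
    if baseline.outcome != candidate.outcome:
        flagged.append("outcome")
    return flagged


old = RunSummary(
    reasoning=("read calculator", "extract helper", "run tests"),
    effects=frozenset({"io.fs.read", "io.fs.write"}),
    outcome="tree-1",
    tests_passed=True,
)
new = RunSummary(
    reasoning=("read calculator", "rewrite in place", "run tests"),
    effects=frozenset({"io.fs.read", "io.fs.write"}),
    outcome="tree-2",
    tests_passed=True,
)

flags = divergences(old, new)
print(flags)  # ['reasoning', 'outcome']
```

Both runs pass their tests here, yet the comparison still reports two divergences, which is the case the diff step is meant to surface.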