Judge Agents
Judge agents are pluggable: the LLM behind a judge can be swapped. The default judge is a high-quality model (e.g. opus-judge-v2). The judge reads each run, scores it on three criteria (correctness, reasoning quality, and the blast radius of its effects), and writes its ruling back as a ratification commit.
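As a rough illustration of what a pluggable judge could look like, here is a minimal sketch. Everything named here is hypothetical and not taken from the actual system: the `Judge` protocol, the `Ruling` fields, the 0.0 to 1.0 score range, the `KeywordJudge` stand-in (which uses string checks in place of a model call), and the commit message layout.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Ruling:
    """Hypothetical ruling record; the field names are illustrative."""
    judge: str
    correctness: float        # assumed range: 0.0 to 1.0
    reasoning_quality: float  # assumed range: 0.0 to 1.0
    blast_radius: float       # assumed: 0.0 (no side effects) to 1.0 (wide impact)


class Judge(Protocol):
    """Any object with a `name` and a matching `score` method can be plugged in."""
    name: str

    def score(self, run_log: str) -> Ruling: ...


class KeywordJudge:
    """Stand-in judge that uses string checks instead of a model call."""
    name = "keyword-judge"

    def score(self, run_log: str) -> Ruling:
        return Ruling(
            judge=self.name,
            correctness=1.0 if "tests passed" in run_log else 0.0,
            reasoning_quality=0.5,  # placeholder; a real judge would ask its model
            blast_radius=1.0 if "rm -rf" in run_log else 0.0,
        )


def ratification_message(run_id: str, ruling: Ruling) -> str:
    """Format a ruling as commit message text (the layout is an assumption)."""
    return (
        f"ratify {run_id} (judge: {ruling.judge})\n\n"
        f"correctness: {ruling.correctness:.2f}\n"
        f"reasoning_quality: {ruling.reasoning_quality:.2f}\n"
        f"blast_radius: {ruling.blast_radius:.2f}\n"
    )
```

Because `Judge` is a structural protocol, a model-backed judge would only need to expose the same `name` attribute and `score` method to be swapped in for the stand-in.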
Multiple judges can run in parallel on the same run, so their rulings can be cross-checked against one another (triangulation).
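One way parallel judging might be wired up is sketched below. It rests on assumptions that the text does not confirm: each judge is modeled as a plain callable returning a single score in [0, 1], threads are used on the guess that judge calls are I/O-bound model requests, and the median is just one possible way to combine scores. The `triangulate` name is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import median
from typing import Callable, Dict, Tuple

# Assumption: a judge is any callable that maps a run log to a score in [0, 1].
JudgeFn = Callable[[str], float]


def triangulate(
    run_log: str, judges: Dict[str, JudgeFn]
) -> Tuple[Dict[str, float], float]:
    """Run every judge concurrently; return per-judge scores and their median."""
    if not judges:
        raise ValueError("at least one judge is required")
    with ThreadPoolExecutor(max_workers=len(judges)) as pool:
        # Submit all judges first so they run concurrently, then collect results.
        futures = {name: pool.submit(fn, run_log) for name, fn in judges.items()}
        scores = {name: future.result() for name, future in futures.items()}
    return scores, median(scores.values())
```

Returning the per-judge scores alongside the combined value keeps disagreements between judges visible instead of hiding them behind a single number.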