Interactive dashboard
These panels use representative stress-test values from the mechanism stack. Read the interface as a plausible coordination primitive and evaluation layer, not just a benchmark UI.
A live interface for the settlement layer: public constitutions, influence-free scoring, sparse audits, and credit for complementary evidence. The push is not another agent wrapper, but a candidate coordination primitive for evaluating multi-agent systems under adversarial pressure.
These panels use representative stress-test values from the mechanism stack. Read the interface as a plausible coordination primitive and evaluation layer, not just a benchmark UI.
The right read is not "look, agents do a task." It is "here is a plausible scoring and settlement layer an arena could actually use, critique, and reuse."
If the scoring rule can be steered, the system trains manipulation. Influence-free settlement is a first principle, not an optional patch.
Real deployments rarely observe the true objective every round. Sparse higher-fidelity audits are how you stop the proxy from drifting too far.
Useful evidence is often only decisive in combination. If the reward layer cannot see that, it underpays exactly the contributors you need.