@voybio/ace-swarm 0.1.0
- package/CHANGELOG.md +109 -0
- package/LICENSE +186 -0
- package/README.md +229 -0
- package/assets/.agents/ACE/ACE-Init/AGENTS.md +210 -0
- package/assets/.agents/ACE/ACE-Init/instructions.md +118 -0
- package/assets/.agents/ACE/ACE_coders/AGENTS.md +154 -0
- package/assets/.agents/ACE/ACE_coders/INSTRUCTIONS.md +216 -0
- package/assets/.agents/ACE/AGENT_REGISTRY.md +70 -0
- package/assets/.agents/ACE/AGENT_REGISTRY_7.md +9 -0
- package/assets/.agents/ACE/DIRECTIVE_KERNEL.md +234 -0
- package/assets/.agents/ACE/UI/AGENTS.md +115 -0
- package/assets/.agents/ACE/UI/instructions.md +178 -0
- package/assets/.agents/ACE/VOS/ACE_VOS_MISSING_INFO_MATRIX.md +42 -0
- package/assets/.agents/ACE/VOS/AGENTS.md +72 -0
- package/assets/.agents/ACE/VOS/instructions.md +211 -0
- package/assets/.agents/ACE/agent-astgrep/AGENTS.md +123 -0
- package/assets/.agents/ACE/agent-astgrep/instructions.md +91 -0
- package/assets/.agents/ACE/agent-builder/AGENTS.md +172 -0
- package/assets/.agents/ACE/agent-builder/instructions.md +137 -0
- package/assets/.agents/ACE/agent-docs/AGENTS.md +159 -0
- package/assets/.agents/ACE/agent-docs/instructions.md +133 -0
- package/assets/.agents/ACE/agent-eval/AGENTS.md +46 -0
- package/assets/.agents/ACE/agent-eval/instructions.md +56 -0
- package/assets/.agents/ACE/agent-memory/AGENTS.md +49 -0
- package/assets/.agents/ACE/agent-memory/instructions.md +50 -0
- package/assets/.agents/ACE/agent-observability/AGENTS.md +46 -0
- package/assets/.agents/ACE/agent-observability/instructions.md +50 -0
- package/assets/.agents/ACE/agent-ops/AGENTS.md +201 -0
- package/assets/.agents/ACE/agent-ops/instructions.md +136 -0
- package/assets/.agents/ACE/agent-qa/AGENTS.md +189 -0
- package/assets/.agents/ACE/agent-qa/instructions.md +121 -0
- package/assets/.agents/ACE/agent-release/AGENTS.md +48 -0
- package/assets/.agents/ACE/agent-release/instructions.md +49 -0
- package/assets/.agents/ACE/agent-research/AGENTS.md +160 -0
- package/assets/.agents/ACE/agent-research/instructions.md +118 -0
- package/assets/.agents/ACE/agent-security/AGENTS.md +48 -0
- package/assets/.agents/ACE/agent-security/instructions.md +50 -0
- package/assets/.agents/ACE/agent-skeptic/AGENTS.md +178 -0
- package/assets/.agents/ACE/agent-skeptic/instructions.md +196 -0
- package/assets/.agents/ACE/agent-spec/AGENTS.md +169 -0
- package/assets/.agents/ACE/agent-spec/instructions.md +116 -0
- package/assets/.agents/ACE/orchestrator/AGENTS.md +365 -0
- package/assets/.agents/ACE/orchestrator/instructions.md +231 -0
- package/assets/.agents/skills/ace-orchestrator/SKILL.md +63 -0
- package/assets/.agents/skills/ace-orchestrator/references/engineering-bootstrap-playbook.md +360 -0
- package/assets/.agents/skills/astgrep-index/SKILL.md +58 -0
- package/assets/.agents/skills/codemunch/SKILL.md +65 -0
- package/assets/.agents/skills/codemunch/references/ast-driven-protocol.md +543 -0
- package/assets/.agents/skills/codesnipe/SKILL.md +64 -0
- package/assets/.agents/skills/codesnipe/references/dual-codebase-playbook.md +671 -0
- package/assets/.agents/skills/eval-harness/SKILL.md +203 -0
- package/assets/.agents/skills/handoff-lint/SKILL.md +164 -0
- package/assets/.agents/skills/incident-commander/SKILL.md +174 -0
- package/assets/.agents/skills/landing-review-watcher/SKILL.md +68 -0
- package/assets/.agents/skills/memory-curator/SKILL.md +179 -0
- package/assets/.agents/skills/problem-triage/SKILL.md +57 -0
- package/assets/.agents/skills/problem-triage/agents/openai.yaml +3 -0
- package/assets/.agents/skills/release-sentry/SKILL.md +189 -0
- package/assets/.agents/skills/risk-quant/SKILL.md +190 -0
- package/assets/.agents/skills/schema-forge/SKILL.md +174 -0
- package/assets/.agents/skills/skill-auditor/SKILL.md +52 -0
- package/assets/.agents/skills/state-auditor/SKILL.md +182 -0
- package/assets/.github/hooks/ace-copilot.json +68 -0
- package/assets/agent-state/ACE_WORKFLOW.md +131 -0
- package/assets/agent-state/ARTIFACT_MANIFEST.json +5 -0
- package/assets/agent-state/AST_GREP_COMMANDS.md +121 -0
- package/assets/agent-state/AST_GREP_INDEX.json +13 -0
- package/assets/agent-state/AST_GREP_INDEX.md +15 -0
- package/assets/agent-state/DECISIONS.md +7 -0
- package/assets/agent-state/EVIDENCE_LOG.md +7 -0
- package/assets/agent-state/HANDOFF.json +24 -0
- package/assets/agent-state/INTERFACE_REGISTRY.md +75 -0
- package/assets/agent-state/MODULES/gates/gate-autonomy.json +7 -0
- package/assets/agent-state/MODULES/gates/gate-completeness.json +7 -0
- package/assets/agent-state/MODULES/gates/gate-correctness.json +7 -0
- package/assets/agent-state/MODULES/gates/gate-evaluation.json +7 -0
- package/assets/agent-state/MODULES/gates/gate-operability.json +7 -0
- package/assets/agent-state/MODULES/gates/gate-security.json +7 -0
- package/assets/agent-state/MODULES/gates/gate-typescript-public-surface.json +7 -0
- package/assets/agent-state/MODULES/registry.json +41 -0
- package/assets/agent-state/MODULES/roles/capability-astgrep.json +49 -0
- package/assets/agent-state/MODULES/roles/capability-build.json +39 -0
- package/assets/agent-state/MODULES/roles/capability-docs.json +38 -0
- package/assets/agent-state/MODULES/roles/capability-eval.json +20 -0
- package/assets/agent-state/MODULES/roles/capability-memory.json +20 -0
- package/assets/agent-state/MODULES/roles/capability-observability.json +20 -0
- package/assets/agent-state/MODULES/roles/capability-ops.json +45 -0
- package/assets/agent-state/MODULES/roles/capability-qa.json +40 -0
- package/assets/agent-state/MODULES/roles/capability-release.json +21 -0
- package/assets/agent-state/MODULES/roles/capability-research.json +44 -0
- package/assets/agent-state/MODULES/roles/capability-security.json +21 -0
- package/assets/agent-state/MODULES/roles/capability-skeptic.json +48 -0
- package/assets/agent-state/MODULES/roles/capability-spec.json +42 -0
- package/assets/agent-state/MODULES/schemas/ACE_RUNTIME_PROFILE.schema.json +289 -0
- package/assets/agent-state/MODULES/schemas/ARTIFACT_MANIFEST.schema.json +185 -0
- package/assets/agent-state/MODULES/schemas/HANDOFF.agent-state.schema.json +124 -0
- package/assets/agent-state/MODULES/schemas/HANDOFF.schema.json +55 -0
- package/assets/agent-state/MODULES/schemas/RUNTIME_EXECUTOR_SESSION_REGISTRY.schema.json +290 -0
- package/assets/agent-state/MODULES/schemas/RUNTIME_TOOL_SPEC_REGISTRY.schema.json +144 -0
- package/assets/agent-state/MODULES/schemas/STATUS_EVENT.schema.json +84 -0
- package/assets/agent-state/MODULES/schemas/SWARM_HANDOFF.schema.json +138 -0
- package/assets/agent-state/MODULES/schemas/TRACKER_SNAPSHOT.schema.json +134 -0
- package/assets/agent-state/MODULES/schemas/VERICIFY_BRIDGE_SNAPSHOT.schema.json +157 -0
- package/assets/agent-state/MODULES/schemas/VERICIFY_PROCESS_POST_LOG.schema.json +93 -0
- package/assets/agent-state/MODULES/schemas/WORKSPACE_SESSION_REGISTRY.schema.json +133 -0
- package/assets/agent-state/PROVENANCE_LOG.md +28 -0
- package/assets/agent-state/QUALITY_GATES.md +15 -0
- package/assets/agent-state/RISKS.md +8 -0
- package/assets/agent-state/SCOPE.md +20 -0
- package/assets/agent-state/SKILL_CATALOG.md +48 -0
- package/assets/agent-state/STATUS.md +8 -0
- package/assets/agent-state/STATUS_EVENTS.ndjson +1 -0
- package/assets/agent-state/TASK.md +18 -0
- package/assets/agent-state/TEAL_CONFIG.md +117 -0
- package/assets/agent-state/handoff-registry.json +5 -0
- package/assets/agent-state/index-fingerprints.json +7 -0
- package/assets/agent-state/index.json +32 -0
- package/assets/agent-state/run-ledger.json +5 -0
- package/assets/agent-state/runtime-executor-sessions.json +5 -0
- package/assets/agent-state/runtime-tool-specs.json +5 -0
- package/assets/agent-state/runtime-workspaces.json +5 -0
- package/assets/agent-state/todo-state.json +7 -0
- package/assets/agent-state/tracker-snapshot.json +7 -0
- package/assets/agent-state/vericify/ace-bridge.json +60 -0
- package/assets/agent-state/vericify/process-posts.json +5 -0
- package/assets/instructions/ACE.instructions.md +187 -0
- package/assets/instructions/ACE_Coder.instructions.md +146 -0
- package/assets/instructions/ACE_UI.instructions.md +178 -0
- package/assets/instructions/ACE_VOS.instructions.md +211 -0
- package/assets/scripts/ace-hook-dispatch.mjs +538 -0
- package/assets/scripts/bootstrap-workspace.sh +27 -0
- package/assets/scripts/copilot-hook-dispatch.mjs +3 -0
- package/assets/scripts/eval-harness.sh +68 -0
- package/assets/scripts/render-mcp-configs.sh +396 -0
- package/assets/tasks/README.md +48 -0
- package/assets/tasks/SWARM_HANDOFF.example.json +53 -0
- package/assets/tasks/SWARM_HANDOFF.example_ui_to_coders.json +55 -0
- package/assets/tasks/SWARM_HANDOFF.example_vos_to_ui.json +55 -0
- package/assets/tasks/SWARM_HANDOFF.template.json +52 -0
- package/assets/tasks/cli_work_split.md +22 -0
- package/assets/tasks/lessons.md +17 -0
- package/assets/tasks/role_tasks.md +206 -0
- package/assets/tasks/todo.md +23 -0
- package/dist/ace-autonomy.d.ts +137 -0
- package/dist/ace-autonomy.js +472 -0
- package/dist/ace-context.d.ts +29 -0
- package/dist/ace-context.js +240 -0
- package/dist/ace-internal-tools.d.ts +8 -0
- package/dist/ace-internal-tools.js +76 -0
- package/dist/ace-server-instructions.d.ts +12 -0
- package/dist/ace-server-instructions.js +324 -0
- package/dist/agent-runtime/role-adapters.d.ts +29 -0
- package/dist/agent-runtime/role-adapters.js +573 -0
- package/dist/astgrep-index.d.ts +24 -0
- package/dist/astgrep-index.js +476 -0
- package/dist/cli.d.ts +3 -0
- package/dist/cli.js +591 -0
- package/dist/git-ops.d.ts +53 -0
- package/dist/git-ops.js +238 -0
- package/dist/handoff-registry.d.ts +71 -0
- package/dist/handoff-registry.js +422 -0
- package/dist/helpers.d.ts +126 -0
- package/dist/helpers.js +1687 -0
- package/dist/index-store.d.ts +51 -0
- package/dist/index-store.js +328 -0
- package/dist/index.d.ts +3 -0
- package/dist/index.js +7 -0
- package/dist/internal-tool-runtime.d.ts +21 -0
- package/dist/internal-tool-runtime.js +136 -0
- package/dist/job-scheduler.d.ts +175 -0
- package/dist/job-scheduler.js +1217 -0
- package/dist/kanban.d.ts +27 -0
- package/dist/kanban.js +339 -0
- package/dist/local-model-runtime.d.ts +40 -0
- package/dist/local-model-runtime.js +174 -0
- package/dist/model-bridge.d.ts +54 -0
- package/dist/model-bridge.js +587 -0
- package/dist/orchestrator-supervisor.d.ts +100 -0
- package/dist/orchestrator-supervisor.js +399 -0
- package/dist/problem-triage.d.ts +23 -0
- package/dist/problem-triage.js +448 -0
- package/dist/prompts.d.ts +7 -0
- package/dist/prompts.js +628 -0
- package/dist/public-surface.d.ts +30 -0
- package/dist/public-surface.js +316 -0
- package/dist/resources.d.ts +7 -0
- package/dist/resources.js +545 -0
- package/dist/run-ledger.d.ts +36 -0
- package/dist/run-ledger.js +257 -0
- package/dist/runtime-command.d.ts +18 -0
- package/dist/runtime-command.js +76 -0
- package/dist/runtime-executor.d.ts +104 -0
- package/dist/runtime-executor.js +985 -0
- package/dist/runtime-profile.d.ts +116 -0
- package/dist/runtime-profile.js +532 -0
- package/dist/runtime-tool-specs.d.ts +68 -0
- package/dist/runtime-tool-specs.js +527 -0
- package/dist/safe-edit.d.ts +52 -0
- package/dist/safe-edit.js +255 -0
- package/dist/schemas.d.ts +44 -0
- package/dist/schemas.js +830 -0
- package/dist/semantic-cache.d.ts +147 -0
- package/dist/semantic-cache.js +552 -0
- package/dist/semantic-hash.d.ts +83 -0
- package/dist/semantic-hash.js +346 -0
- package/dist/server.d.ts +10 -0
- package/dist/server.js +46 -0
- package/dist/shared.d.ts +136 -0
- package/dist/shared.js +269 -0
- package/dist/skill-auditor.d.ts +26 -0
- package/dist/skill-auditor.js +184 -0
- package/dist/skill-catalog.d.ts +60 -0
- package/dist/skill-catalog.js +305 -0
- package/dist/status-events.d.ts +40 -0
- package/dist/status-events.js +269 -0
- package/dist/store/ace-packed-store.d.ts +69 -0
- package/dist/store/ace-packed-store.js +434 -0
- package/dist/store/bootstrap-store.d.ts +46 -0
- package/dist/store/bootstrap-store.js +242 -0
- package/dist/store/catalog-builder.d.ts +21 -0
- package/dist/store/catalog-builder.js +68 -0
- package/dist/store/importer.d.ts +19 -0
- package/dist/store/importer.js +157 -0
- package/dist/store/knowledge-bake.d.ts +59 -0
- package/dist/store/knowledge-bake.js +339 -0
- package/dist/store/materializers/hook-context-materializer.d.ts +25 -0
- package/dist/store/materializers/hook-context-materializer.js +100 -0
- package/dist/store/materializers/host-file-materializer.d.ts +37 -0
- package/dist/store/materializers/host-file-materializer.js +271 -0
- package/dist/store/materializers/todo-syncer.d.ts +30 -0
- package/dist/store/materializers/todo-syncer.js +140 -0
- package/dist/store/materializers/vericify-projector.d.ts +38 -0
- package/dist/store/materializers/vericify-projector.js +239 -0
- package/dist/store/repositories/discovery-repository.d.ts +24 -0
- package/dist/store/repositories/discovery-repository.js +58 -0
- package/dist/store/repositories/handoff-repository.d.ts +31 -0
- package/dist/store/repositories/handoff-repository.js +67 -0
- package/dist/store/repositories/ledger-repository.d.ts +26 -0
- package/dist/store/repositories/ledger-repository.js +49 -0
- package/dist/store/repositories/runtime-kv-repository.d.ts +16 -0
- package/dist/store/repositories/runtime-kv-repository.js +36 -0
- package/dist/store/repositories/scheduler-repository.d.ts +50 -0
- package/dist/store/repositories/scheduler-repository.js +123 -0
- package/dist/store/repositories/session-repository.d.ts +33 -0
- package/dist/store/repositories/session-repository.js +82 -0
- package/dist/store/repositories/todo-repository.d.ts +31 -0
- package/dist/store/repositories/todo-repository.js +77 -0
- package/dist/store/repositories/tracker-repository.d.ts +25 -0
- package/dist/store/repositories/tracker-repository.js +43 -0
- package/dist/store/repositories/vericify-repository.d.ts +32 -0
- package/dist/store/repositories/vericify-repository.js +58 -0
- package/dist/store/skills-install.d.ts +28 -0
- package/dist/store/skills-install.js +86 -0
- package/dist/store/state-reader.d.ts +49 -0
- package/dist/store/state-reader.js +111 -0
- package/dist/store/store-artifacts.d.ts +12 -0
- package/dist/store/store-artifacts.js +138 -0
- package/dist/store/store-snapshot.d.ts +19 -0
- package/dist/store/store-snapshot.js +140 -0
- package/dist/store/topology-bake.d.ts +15 -0
- package/dist/store/topology-bake.js +215 -0
- package/dist/store/types.d.ts +155 -0
- package/dist/store/types.js +35 -0
- package/dist/store/workspace-snapshot.d.ts +26 -0
- package/dist/store/workspace-snapshot.js +107 -0
- package/dist/store/write-queue.d.ts +7 -0
- package/dist/store/write-queue.js +26 -0
- package/dist/todo-state.d.ts +41 -0
- package/dist/todo-state.js +399 -0
- package/dist/tools-agent.d.ts +7 -0
- package/dist/tools-agent.js +1542 -0
- package/dist/tools-discovery.d.ts +6 -0
- package/dist/tools-discovery.js +178 -0
- package/dist/tools-drift.d.ts +13 -0
- package/dist/tools-drift.js +357 -0
- package/dist/tools-files.d.ts +6 -0
- package/dist/tools-files.js +679 -0
- package/dist/tools-framework.d.ts +7 -0
- package/dist/tools-framework.js +1414 -0
- package/dist/tools-git.d.ts +6 -0
- package/dist/tools-git.js +183 -0
- package/dist/tools-handoff.d.ts +32 -0
- package/dist/tools-handoff.js +489 -0
- package/dist/tools-lifecycle.d.ts +6 -0
- package/dist/tools-lifecycle.js +205 -0
- package/dist/tools-memory.d.ts +6 -0
- package/dist/tools-memory.js +260 -0
- package/dist/tools-scheduler.d.ts +6 -0
- package/dist/tools-scheduler.js +228 -0
- package/dist/tools-skills.d.ts +3 -0
- package/dist/tools-skills.js +104 -0
- package/dist/tools-todo.d.ts +6 -0
- package/dist/tools-todo.js +154 -0
- package/dist/tools.d.ts +9 -0
- package/dist/tools.js +33 -0
- package/dist/tracker-adapters.d.ts +74 -0
- package/dist/tracker-adapters.js +776 -0
- package/dist/tracker-sync.d.ts +10 -0
- package/dist/tracker-sync.js +84 -0
- package/dist/tui/agent-runner.d.ts +137 -0
- package/dist/tui/agent-runner.js +466 -0
- package/dist/tui/agent-worker.d.ts +10 -0
- package/dist/tui/agent-worker.js +347 -0
- package/dist/tui/chat.d.ts +84 -0
- package/dist/tui/chat.js +368 -0
- package/dist/tui/commands.d.ts +57 -0
- package/dist/tui/commands.js +432 -0
- package/dist/tui/dashboard.d.ts +24 -0
- package/dist/tui/dashboard.js +110 -0
- package/dist/tui/index.d.ts +114 -0
- package/dist/tui/index.js +1059 -0
- package/dist/tui/input.d.ts +49 -0
- package/dist/tui/input.js +336 -0
- package/dist/tui/layout.d.ts +116 -0
- package/dist/tui/layout.js +367 -0
- package/dist/tui/ollama.d.ts +116 -0
- package/dist/tui/ollama.js +192 -0
- package/dist/tui/openai-compatible.d.ts +63 -0
- package/dist/tui/openai-compatible.js +370 -0
- package/dist/tui/provider-discovery.d.ts +59 -0
- package/dist/tui/provider-discovery.js +530 -0
- package/dist/tui/renderer.d.ts +166 -0
- package/dist/tui/renderer.js +304 -0
- package/dist/tui/tabs.d.ts +70 -0
- package/dist/tui/tabs.js +208 -0
- package/dist/tui/telemetry.d.ts +56 -0
- package/dist/tui/telemetry.js +106 -0
- package/dist/vericify-bridge.d.ts +146 -0
- package/dist/vericify-bridge.js +571 -0
- package/dist/vericify-context.d.ts +10 -0
- package/dist/vericify-context.js +72 -0
- package/dist/workspace-manager.d.ts +107 -0
- package/dist/workspace-manager.js +636 -0
- package/package.json +83 -0

@@ -0,0 +1,203 @@

---
name: eval-harness
description:
  Run deterministic autonomy evaluations for schemas, routing, completeness, and regressions. Use when contracts change, a release candidate is being promoted, or post-incident verification is required.
---

# Eval Harness

## Purpose

Provide repeatable quality evidence for autonomy behavior, not just code behavior.
Testing without baselines is hope. Testing with baselines and regression detection is engineering.

## Canonical Use Cases

1. A schema, handoff, or routing contract changed and the team needs deterministic conformance evidence before continuing.
2. A release candidate is being promoted and someone needs a repeatable quality gate across schema, routing, completeness, and behavioral suites.
3. An incident or routing anomaly just closed and the team wants proof that the failure mode no longer reproduces.

---

## MICE Boundaries

| Rule | Enforcement |
|---|---|
| **Modular** | Runs evaluations; does NOT fix failures or change schemas. |
| **Interoperable** | Eval results follow canonical report schema below. |
| **Customizable** | Pass thresholds adapt to `TEAL_CONFIG.md` quality targets. |
| **Extensible** | New eval suites added as suite definitions, not ad-hoc checks. |

---

## MICE/TEAL Alignment

- Modular verification harness independent of any single builder instance.
- TEAL-aware suites validate route correctness for configured topologies.

---

## Trigger Conditions

Use when:

- schema contracts changed
- release candidate is being promoted
- quality claims need objective proof
- post-incident verification required
- new agent or skill added to topology
- regression suspected from failure pattern

---

## Inputs

- schema files (`agent-state/MODULES/schemas/*`)
- handoff/event samples (from `HANDOFF_HISTORY/`, `STATUS_EVENTS.ndjson`)
- prior evaluation baseline (`agent-state/EVAL_BASELINE.json`)
- `agent-state/TEAL_CONFIG.md` (for route topology)
- `agent-state/QUALITY_GATES.md` (for pass criteria)

---

## Eval Suite Definition Schema

Each eval suite is defined as:

```markdown
### SUITE-<NNN>: <name>
- **Category:** <schema | routing | completeness | regression | behavioral>
- **Target:** <what is being evaluated>
- **Pass Threshold:** <numeric or boolean criterion>
- **Inputs:** <what data is needed>
- **Method:** <how evaluation is performed>
- **Baseline:** <reference to prior result for regression comparison>
```
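
A suite definition in this shape can be read into a typed record. The sketch below is illustrative only: `SuiteDefinition` and `parse_suite` are hypothetical names, not part of the package, and the sample suite text is invented. It assumes each field sits on its own `- **Key:** value` line, exactly as in the template.

```python
import re
from dataclasses import dataclass

# Categories listed in the suite template.
CATEGORIES = {"schema", "routing", "completeness", "regression", "behavioral"}


@dataclass
class SuiteDefinition:
    suite_id: str
    name: str
    attributes: dict  # lower-cased field name -> raw value, e.g. "pass threshold"


def parse_suite(markdown: str) -> SuiteDefinition:
    """Parse one '### SUITE-<NNN>: <name>' block (hypothetical helper)."""
    header = re.search(r"^### (SUITE-\d+): (.+)$", markdown, re.MULTILINE)
    if header is None:
        raise ValueError("missing '### SUITE-<NNN>: <name>' header")
    attributes = {
        key.strip().lower(): value.strip()
        for key, value in re.findall(r"^- \*\*(.+?):\*\* (.*)$", markdown, re.MULTILINE)
    }
    if attributes.get("category") not in CATEGORIES:
        raise ValueError(f"unknown category: {attributes.get('category')!r}")
    return SuiteDefinition(header.group(1), header.group(2).strip(), attributes)


# Invented sample suite, used only to exercise the parser.
example = """### SUITE-001: handoff schema conformance
- **Category:** schema
- **Target:** handoff payloads
- **Pass Threshold:** 100% of samples validate
- **Inputs:** HANDOFF_HISTORY/ samples
- **Method:** validate each sample against its JSON schema
- **Baseline:** none (first run)
"""
suite = parse_suite(example)
print(suite.suite_id, suite.attributes["category"])
```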

---

## Standard Eval Suite Categories

| Category | What It Tests | Example Checks |
|---|---|---|
| **Schema** | JSON schema validity of artifacts | All handoffs validate; all events validate; manifest validates |
| **Routing** | Correct `from → to` transitions | Routes match TEAL topology; no orphan transitions |
| **Completeness** | Required fields and evidence presence | Handoffs have evidence refs; decisions have rationale |
| **Regression** | Comparison against baseline | No metric worse than baseline; no new failures absent from baseline |
| **Behavioral** | Agent protocol compliance | Clarity Protocol headers present; MICE boundaries respected |

---

## Confidence Scoring

| Score | Label | Meaning |
|---|---|---|
| 95-100% | `HIGH` | All suites pass; no regression; full baseline coverage |
| 80-94% | `MEDIUM` | Minor failures; no regression in critical suites |
| 60-79% | `LOW` | Failures present; some regression detected |
| < 60% | `CRITICAL` | Major failures; regression threshold exceeded; block release |
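
The bands translate directly into a lookup. A minimal sketch, assuming each band's lower bound is inclusive so that a fractional score such as 94.9 falls into `MEDIUM`; the table lists whole percentages only, so that reading is an assumption, and `confidence_label` is a hypothetical name.

```python
def confidence_label(score: float) -> str:
    """Map a 0-100 confidence score to the label bands in the table."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 95:
        return "HIGH"
    if score >= 80:
        return "MEDIUM"
    if score >= 60:
        return "LOW"
    return "CRITICAL"  # below 60: release is blocked


for value in (100, 94.9, 60, 59.9):
    print(value, confidence_label(value))
```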

---

## Workflow: Mapped to Clarity Protocol

### `[STATE_ANALYSIS]`
1. Identify what triggered evaluation (schema change, release, post-incident).
2. Load applicable eval suites from `EVAL_SUITES.md`.
3. Load baseline from `EVAL_BASELINE.json` if available.

### `[STRATEGY_SELECTOR]`
4. Select suites relevant to trigger:
   - Schema change → schema + routing suites.
   - Release promotion → all suites.
   - Post-incident → regression + behavioral suites.
5. Plan execution order: schema first, then routing, then completeness, then regression.
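
Suite selection and ordering can be expressed as a small lookup. This is a sketch under stated assumptions: the trigger keys (`schema_change`, `release_promotion`, `post_incident`) and the `plan_suites` helper are hypothetical, and behavioral suites are placed last because step 5 does not say where they run.

```python
from __future__ import annotations

# Order from step 5; "behavioral" is appended last as an assumption.
EXECUTION_ORDER = ["schema", "routing", "completeness", "regression", "behavioral"]

# Trigger -> suite categories from step 4 (trigger keys are hypothetical).
SUITES_BY_TRIGGER = {
    "schema_change": {"schema", "routing"},
    "release_promotion": set(EXECUTION_ORDER),
    "post_incident": {"regression", "behavioral"},
}


def plan_suites(trigger: str) -> list[str]:
    """Return the selected suite categories in execution order."""
    selected = SUITES_BY_TRIGGER[trigger]
    return [category for category in EXECUTION_ORDER if category in selected]


print(plan_suites("schema_change"))
print(plan_suites("post_incident"))
```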

### `[EXECUTION_LOG]`
6. Run each eval suite:
   - Schema: validate sample payloads against schemas.
   - Routing: check all transitions in `HANDOFF_HISTORY/` against `TEAL_CONFIG.md`.
   - Completeness: verify required fields in all state artifacts.
   - Regression: compare current metrics against baseline.
   - Behavioral: check for Clarity Protocol headers and MICE compliance markers.
7. Record pass/fail for each check with evidence.
8. Compute confidence score.
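
The routing check reduces to set membership once the topology is in memory. The sketch below assumes handoffs are dicts with `from` and `to` keys and that the topology has already been extracted from `TEAL_CONFIG.md` into a set of `(from, to)` pairs. The format of that file is not shown here, so the representation, the `find_invalid_routes` helper, and the sample routes are all illustrative.

```python
def find_invalid_routes(handoffs, allowed_routes):
    """Return every transition that is absent from the allowed topology."""
    invalid = []
    for index, handoff in enumerate(handoffs):
        route = (handoff.get("from"), handoff.get("to"))
        if route not in allowed_routes:
            # Keep the position so the report can cite the offending sample.
            invalid.append({"index": index, "route": route})
    return invalid


# Invented topology and history, for illustration only.
allowed = {("agent-spec", "agent-builder"), ("agent-builder", "agent-qa")}
history = [
    {"from": "agent-spec", "to": "agent-builder"},
    {"from": "agent-builder", "to": "agent-release"},
]
invalid = find_invalid_routes(history, allowed)
print(invalid)
```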

### `[ARTIFACT_UPDATE]`
9. Write `agent-state/EVAL_REPORT.md`:

```markdown
# Evaluation Report
Generated: <ISO8601>
Trigger: <what caused this evaluation>
Confidence: <score>% (<label>)

## Suite Results
| Suite | Category | Pass/Fail | Detail |
|---|---|---|---|
| SUITE-001 | schema | PASS/FAIL | <detail> |
| ... | ... | ... | ... |

## Regression Analysis
| Metric | Baseline | Current | Delta | Status |
|---|---|---|---|---|
| Schema validity | <n>% | <n>% | <delta> | OK/REGRESSION |
| ... | ... | ... | ... | ... |

## Actionable Failures
| Failure | Owner | Required Action |
|---|---|---|
| <failure> | <agent-role> | <what must happen> |

## Residual Risk
- <risks that remain even if all failures are fixed>
```

10. Update `agent-state/EVAL_BASELINE.json` with new baseline (if confidence >= MEDIUM).
11. Write/update `agent-state/EVAL_SUITES.md` with suite definitions.
12. Append evaluation evidence to `EVIDENCE_LOG.md`.
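
The Regression Analysis rows and the baseline-update rule in step 10 could be computed along these lines. This is a sketch, not the package's implementation: it assumes metrics are percentages where higher is better, compares only metrics present in both runs, and uses the hypothetical names `regression_rows` and `should_update_baseline`. The metric names and values are invented.

```python
def regression_rows(baseline: dict, current: dict) -> list:
    """Build (metric, baseline, current, delta, status) rows for shared metrics."""
    rows = []
    for metric in sorted(baseline.keys() & current.keys()):
        delta = round(current[metric] - baseline[metric], 2)
        status = "REGRESSION" if delta < 0 else "OK"
        rows.append((metric, baseline[metric], current[metric], delta, status))
    return rows


def should_update_baseline(label: str) -> bool:
    """Step 10: promote a new baseline only at MEDIUM confidence or better."""
    return label in {"MEDIUM", "HIGH"}


rows = regression_rows(
    {"schema_validity": 100.0, "routing": 90.0},
    {"schema_validity": 98.5, "routing": 92.0},
)
for row in rows:
    print(row)
```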

### `[VERIFICATION]`
13. Every suite produced a deterministic pass/fail result.
14. Confidence score is mathematically consistent with suite results.
15. Actionable failures have owner + required action.
16. Regression analysis compares against actual baseline (not fabricated).

---

## Output Contract

- `agent-state/EVAL_SUITES.md` (suite definitions)
- `agent-state/EVAL_REPORT.md` (results, regression analysis, actionable failures)
- `agent-state/EVAL_BASELINE.json` (updated baseline if criteria met)

---

## Anti-Patterns

| Anti-Pattern | Correct Behavior |
|---|---|
| "All tests pass" without evidence | Every pass/fail has detail and evidence ref |
| Regression comparison without baseline | First run establishes baseline; no regression claim possible |
| Skipping behavioral suites | Autonomy behavior is tested, not just code |
| Confidence score without math | Score = (passing checks / total checks) with category weights |
| Fixing failures during evaluation | Report failures; do NOT fix them (route to owner) |
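
The score formula is stated only as "(passing checks / total checks) with category weights", so the exact weighting is not specified. One plausible reading, shown here as an assumption, is a weighted average of per-category pass rates. `weighted_confidence` and the weights are hypothetical.

```python
def weighted_confidence(results: dict, weights: dict) -> float:
    """results: category -> (passed, total); weights: category -> weight.

    Returns a 0-100 score as the weighted mean of per-category pass rates.
    Categories without an explicit weight default to 1.0.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for category, (passed, total) in results.items():
        if total == 0:
            continue  # no checks ran in this category; leave it out
        weight = weights.get(category, 1.0)
        weighted_sum += weight * (passed / total)
        weight_total += weight
    if weight_total == 0:
        raise ValueError("no checks were run")
    return round(100 * weighted_sum / weight_total, 1)


# Invented results: schema weighted twice as heavily as routing.
score = weighted_confidence(
    {"schema": (10, 10), "routing": (8, 10)},
    {"schema": 2.0, "routing": 1.0},
)
print(score)
```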

---

## Wrong-Stuff Protocol

| Finding | Classification | Route To |
|---|---|---|
| Schema validation failure | `schema_violation` | `schema-forge` |
| Routing topology mismatch | `routing_error` | `agent-ops` |
| Missing evidence/completeness | `documentation_drift` | source agent |
| Regression detected | `quality_regression` | `agent-qa` then `agent-builder` |
| Behavioral non-compliance | `protocol_violation` | violating agent |

---

## Failure Policy

If regression threshold is exceeded (confidence < 60%), emit `GATE_FAILED` and block release transition.
Route each actionable failure to its owner via Wrong-Stuff Protocol.
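
The gate and the routing table can be combined into one decision step. A minimal sketch, assuming each failure is a dict with hypothetical `classification` and `source` keys; "source agent" and "violating agent" are both resolved to the agent that produced the finding. No event name is given for a passing gate, so the sketch returns `None` in that case.

```python
# Fixed owners from the Wrong-Stuff Protocol table.
STATIC_ROUTES = {
    "schema_violation": ["schema-forge"],
    "routing_error": ["agent-ops"],
    "quality_regression": ["agent-qa", "agent-builder"],  # QA first, then builder
}
# Classifications routed back to whichever agent produced the finding.
ROUTE_TO_SOURCE = {"documentation_drift", "protocol_violation"}


def route_failure(classification: str, source_agent: str) -> list:
    """Return the ordered list of owners for one finding."""
    if classification in ROUTE_TO_SOURCE:
        return [source_agent]
    return list(STATIC_ROUTES[classification])


def gate_decision(confidence: float, failures: list) -> dict:
    """Apply the failure policy: block release when confidence is below 60%."""
    blocked = confidence < 60
    return {
        "event": "GATE_FAILED" if blocked else None,
        "block_release": blocked,
        "routes": [route_failure(f["classification"], f["source"]) for f in failures],
    }


decision = gate_decision(
    55.0,
    [
        {"classification": "quality_regression", "source": "agent-builder"},
        {"classification": "protocol_violation", "source": "agent-docs"},
    ],
)
print(decision)
```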

@@ -0,0 +1,164 @@

---
name: handoff-lint
description:
  Enforce complete, schema-valid, evidence-linked handoffs before any module transition is persisted.
---

# Handoff Lint

## Purpose

Prevent malformed or context-losing transitions by applying strict handoff validation rules.
A handoff without evidence is a broken telephone. This skill is the dial tone check.

## Canonical Use Cases

1. An agent is about to write `agent-state/HANDOFF.json` and needs a blocking validation pass first.
2. A swarm handoff payload was generated after a complex routing cycle and someone needs to verify schema, evidence refs, and topology before persistence.
3. Post-incident forensics found suspicious transitions and the team needs to replay handoff validity checks deterministically.

---

## MICE Boundaries

| Rule | Enforcement |
|---|---|
| **Modular** | Validates handoff payloads; does NOT generate content or make routing decisions. |
| **Interoperable** | Validates against canonical schemas in `agent-state/MODULES/schemas/`. |
| **Customizable** | Lint rules adapt to handoff type (`swarm` vs `agent-state`). |
| **Extensible** | New validation rules added as lint-rule rows, not prose. |

---

## MICE/TEAL Alignment

- Interoperability guardrail for modular handoffs.
- Enforces TEAL dependency integrity by blocking invalid route transitions.

---

## Trigger Conditions

Use before writing:

- `agent-state/HANDOFF.json`
- `agent-state/HANDOFF_HISTORY/*.json`
- `tasks/SWARM_HANDOFF.*.json`

Also use when:

- any agent emits a transition event
- ops detects routing anomaly
- post-incident handoff forensics

---

## Inputs

- candidate handoff payload (JSON)
- `agent-state/STATUS.md`
- `agent-state/EVIDENCE_LOG.md`
- schema files in `agent-state/MODULES/schemas/`
- `agent-state/TEAL_CONFIG.md` (for valid route topology)

---

## Lint Rules Table

| # | Rule | Check | On Fail |
|---|---|---|---|
| 1 | **Schema Validity** | Payload validates against applicable JSON schema | Reject with field-level errors |
| 2 | **Required Fields** | `from`, `to`, `timestamp`, `context`, `evidence_refs` all present | Reject; list missing fields |
| 3 | **Route Legitimacy** | `from` → `to` transition exists in `TEAL_CONFIG.md` topology | Reject; log invalid route |
| 4 | **Evidence Freshness** | All `evidence_refs` point to entries < 24h old (or current sprint) | Warn if stale; reject if > 48h |
| 5 | **Evidence Existence** | Every `evidence_ref` resolves to an actual entry in `EVIDENCE_LOG.md` | Reject; list dangling refs |
| 6 | **Context Completeness** | `context.objective`, `context.blockers`, `context.artifacts` present | Reject; list missing context |
| 7 | **Checksum Integrity** | If `checksum` field present, payload hash matches | Reject; integrity violation |
| 8 | **Duplicate Detection** | No identical handoff in `HANDOFF_HISTORY/` within last 5 minutes | Warn; likely re-emission |
| 9 | **Owner Consistency** | `from` matches current owner in `STATUS.md` | Warn; possible stale status |
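
Rule 4 has three outcomes, which a small classifier makes explicit. The sketch below assumes each evidence entry carries a timezone-aware timestamp and ignores the "current sprint" alternative, since sprint boundaries are not defined here. `freshness_status` is a hypothetical name and the dates are invented.

```python
from datetime import datetime, timedelta, timezone


def freshness_status(entry_time: datetime, now: datetime) -> str:
    """Classify one evidence entry by age: PASS under 24h, FAIL over 48h, else WARN."""
    age = now - entry_time
    if age > timedelta(hours=48):
        return "FAIL"  # reject: older than 48h
    if age >= timedelta(hours=24):
        return "WARN"  # stale, but not yet rejected
    return "PASS"


now = datetime(2025, 1, 3, 12, 0, tzinfo=timezone.utc)
print(freshness_status(datetime(2025, 1, 3, 0, 0, tzinfo=timezone.utc), now))
print(freshness_status(datetime(2025, 1, 2, 6, 0, tzinfo=timezone.utc), now))
print(freshness_status(datetime(2025, 1, 1, 0, 0, tzinfo=timezone.utc), now))
```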

---

## Workflow: Mapped to Clarity Protocol

### `[STATE_ANALYSIS]`
1. Receive candidate handoff payload.
2. Determine handoff schema mode (`swarm` vs `agent-state`) from payload structure.
3. Load applicable schema from `agent-state/MODULES/schemas/`.

### `[STRATEGY_SELECTOR]`
4. Plan lint pass: run all 9 rules in order.
5. Classify each rule result as `PASS`, `WARN`, or `FAIL`.

### `[EXECUTION_LOG]`
6. Execute each lint rule against the payload.
7. For schema validation: validate every field against JSON schema.
8. For route legitimacy: check `TEAL_CONFIG.md` for valid `from → to` path.
9. For evidence checks: resolve each `evidence_ref` to actual `EVIDENCE_LOG.md` entry.
10. Collect all results.
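
The required-field, context-completeness, and evidence-existence checks are simple lookups once the payload is parsed. A sketch under stated assumptions: evidence entries are identified by IDs already extracted from `EVIDENCE_LOG.md` into a set. The `EV-...` ID format, the helper names, and the sample payload are hypothetical.

```python
REQUIRED_FIELDS = ("from", "to", "timestamp", "context", "evidence_refs")
REQUIRED_CONTEXT = ("objective", "blockers", "artifacts")


def missing_fields(payload: dict) -> list:
    """List absent top-level fields and absent context.* fields."""
    missing = [name for name in REQUIRED_FIELDS if name not in payload]
    context = payload.get("context")
    if isinstance(context, dict):
        missing += [f"context.{name}" for name in REQUIRED_CONTEXT if name not in context]
    return missing


def dangling_refs(payload: dict, known_evidence_ids: set) -> list:
    """List evidence refs that do not resolve to a known evidence entry."""
    return [ref for ref in payload.get("evidence_refs", []) if ref not in known_evidence_ids]


# Invented payload: no timestamp, no context.blockers, one unresolved ref.
payload = {
    "from": "agent-builder",
    "to": "agent-qa",
    "context": {"objective": "ship parser fix", "artifacts": []},
    "evidence_refs": ["EV-001", "EV-404"],
}
print(missing_fields(payload))
print(dangling_refs(payload, {"EV-001"}))
```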
|
|
99
|
+
|
|
100
|
+
### `[ARTIFACT_UPDATE]`
|
|
101
|
+
11. Produce lint report:
|
|
102
|
+
|
|
103
|
+
```json
|
|
104
|
+
{
|
|
105
|
+
"handoff_lint": {
|
|
106
|
+
"status": "PASS | WARN | FAIL",
|
|
107
|
+
"timestamp": "<ISO8601>",
|
|
108
|
+
"rules": [
|
|
109
|
+
{"id": 1, "name": "schema_validity", "status": "PASS|WARN|FAIL", "detail": "..."},
|
|
110
|
+
{"id": 2, "name": "required_fields", "status": "PASS|WARN|FAIL", "detail": "..."}
|
|
111
|
+
],
|
|
112
|
+
"blocking_errors": [],
|
|
113
|
+
"warnings": []
|
|
114
|
+
}
|
|
115
|
+
}
|
|
116
|
+
```
|
|
117
|
+
|
|
118
|
+
12. If PASS: allow handoff write to proceed.
|
|
119
|
+
13. If FAIL: block handoff write; attach lint errors to `EVIDENCE_LOG.md`.
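
The aggregation in steps 10 to 13 can be sketched roughly as follows. This is only an illustration under stated assumptions: the helper names are hypothetical, the string format of `blocking_errors` and `warnings` entries is invented (the report above shows them only as empty lists), and treating an overall `WARN` as non-blocking is an assumption, since the steps only define behavior for `PASS` and `FAIL`.

```python
def build_lint_report(rule_results, timestamp):
    """Fold per-rule results into the handoff_lint report shape shown above."""
    blocking = [r for r in rule_results if r["status"] == "FAIL"]
    warnings = [r for r in rule_results if r["status"] == "WARN"]
    if blocking:
        overall = "FAIL"
    elif warnings:
        overall = "WARN"
    else:
        overall = "PASS"
    return {
        "handoff_lint": {
            "status": overall,
            "timestamp": timestamp,
            "rules": rule_results,
            # Entry format "<name>: <detail>" is an assumption.
            "blocking_errors": [f'{r["name"]}: {r["detail"]}' for r in blocking],
            "warnings": [f'{r["name"]}: {r["detail"]}' for r in warnings],
        }
    }


def may_write_handoff(report):
    """Block the write only on FAIL (assumption: WARN does not block)."""
    return report["handoff_lint"]["status"] != "FAIL"
```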
|
|
120
|
+
|
|
121
|
+
### `[VERIFICATION]`
|
|
122
|
+
14. Every lint rule produced a deterministic result.
|
|
123
|
+
15. No FAIL-status handoff was written to disk.
|
|
124
|
+
16. Lint report is attached to the evidence trail.
|
|
125
|
+
|
|
126
|
+
---
|
|
127
|
+
|
|
128
|
+
## Output Contract
|
|
129
|
+
|
|
130
|
+
- On pass: `{ "status": "PASS", "rules": [...] }` — handoff write proceeds
|
|
131
|
+
- On fail: `{ "status": "FAIL", "blocking_errors": [...] }` — handoff write blocked, `GATE_FAILED` emitted, lint errors appended to `EVIDENCE_LOG.md`
|
|
132
|
+
|
|
133
|
+
---
|
|
134
|
+
|
|
135
|
+
## Anti-Patterns
|
|
136
|
+
|
|
137
|
+
| Anti-Pattern | Correct Behavior |
|
|
138
|
+
|---|---|
|
|
139
|
+
| Writing handoff without running lint | Every handoff write is lint-gated |
|
|
140
|
+
| Ignoring WARN results | Warnings must be acknowledged; repeated warnings escalate to FAIL |
|
|
141
|
+
| Evidence refs to chat context | Evidence refs must point to durable state files, not ephemeral conversation |
|
|
142
|
+
| "from: unknown" | `from` must match a real agent role in the registry |
|
|
143
|
+
|
|
144
|
+
---
|
|
145
|
+
|
|
146
|
+
## Wrong-Stuff Protocol
|
|
147
|
+
|
|
148
|
+
| Finding | Classification | Route To |
|
|
149
|
+
|---|---|---|
|
|
150
|
+
| Schema violation | `schema_violation` | `schema-forge` |
|
|
151
|
+
| Invalid route topology | `routing_error` | `agent-ops` |
|
|
152
|
+
| Dangling evidence refs | `documentation_drift` | source agent that created handoff |
|
|
153
|
+
| Stale evidence | `evidence_decay` | `agent-research` or source agent |
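
The routing table above could be encoded as a small lookup, sketched below. The function name is hypothetical; where the table names two possible targets ("`agent-research` or source agent"), the sketch simply returns both as ordered candidates and leaves the final choice to the caller.

```python
def route_finding(classification, source_agent):
    """Return candidate route targets for a Wrong-Stuff classification.

    Hypothetical helper mirroring the table above; source_agent is the
    agent that created the handoff.
    """
    routes = {
        "schema_violation": ["schema-forge"],
        "routing_error": ["agent-ops"],
        "documentation_drift": [source_agent],
        "evidence_decay": ["agent-research", source_agent],
    }
    if classification not in routes:
        raise ValueError(f"unknown classification: {classification}")
    return routes[classification]
```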
|
|
154
|
+
|
|
155
|
+
---
|
|
156
|
+
|
|
157
|
+
## Failure Policy
|
|
158
|
+
|
|
159
|
+
On failure:
|
|
160
|
+
|
|
161
|
+
- do not write handoff payload
|
|
162
|
+
- emit `GATE_FAILED` with `failure_type: handoff_lint_failure`
|
|
163
|
+
- attach lint errors to `EVIDENCE_LOG.md`
|
|
164
|
+
- block downstream transition until all FAIL rules are resolved
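
A minimal sketch of the `GATE_FAILED` emission, assuming events are serialized as one JSON object per line (NDJSON). Only `GATE_FAILED` and `failure_type: handoff_lint_failure` come from the policy above; the `event_type`, `timestamp`, and `blocking_errors` field names are assumptions for illustration.

```python
import json


def gate_failed_event(blocking_errors, timestamp):
    """Build a GATE_FAILED event for a failed handoff lint (field names assumed)."""
    return {
        "event_type": "GATE_FAILED",
        "failure_type": "handoff_lint_failure",
        "timestamp": timestamp,
        "blocking_errors": list(blocking_errors),
    }


def to_ndjson_line(event):
    """Serialize one event as a single newline-terminated JSON line."""
    return json.dumps(event, sort_keys=True) + "\n"
```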
|
|
@@ -0,0 +1,174 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: incident-commander
|
|
3
|
+
description:
|
|
4
|
+
Incident response protocol for autonomy failures: severity declaration, owner assignment, timeline reconstruction, and closure discipline.
|
|
5
|
+
---
|
|
6
|
+
|
|
7
|
+
# Incident Commander
|
|
8
|
+
|
|
9
|
+
## Purpose
|
|
10
|
+
|
|
11
|
+
Restore control during autonomy incidents while preserving forensic traceability and post-incident learning.
|
|
12
|
+
An incident without a commander is chaos. An incident with a commander is a learning opportunity.
|
|
13
|
+
|
|
14
|
+
## Canonical Use Cases
|
|
15
|
+
|
|
16
|
+
1. Two or more consecutive gate failures indicate the normal recovery loop is not containing the issue.
|
|
17
|
+
2. Multiple blockers or contradictory state files are stalling several agents and someone needs a single accountable incident owner plus timeline.
|
|
18
|
+
3. Releases must remain paused after an incident until mitigation, blast radius, and closure evidence are explicit.
|
|
19
|
+
|
|
20
|
+
---
|
|
21
|
+
|
|
22
|
+
## MICE Boundaries
|
|
23
|
+
|
|
24
|
+
| Rule | Enforcement |
|
|
25
|
+
|---|---|
|
|
26
|
+
| **Modular** | Coordinates incident response; does NOT fix code, rewrite specs, or change scope. |
|
|
27
|
+
| **Interoperable** | Incident records follow canonical schema below; events validate against `STATUS_EVENT.schema.json`. |
|
|
28
|
+
| **Customizable** | Severity thresholds adapt to `TEAL_CONFIG.md` pipeline criticality. |
|
|
29
|
+
| **Extensible** | New incident categories added via schema, not ad-hoc prose. |
|
|
30
|
+
|
|
31
|
+
---
|
|
32
|
+
|
|
33
|
+
## MICE/TEAL Alignment
|
|
34
|
+
|
|
35
|
+
- Modular incident layer that coordinates but does not rewrite role contracts.
|
|
36
|
+
- TEAL-aware incident routing maps failures to the correct dependency owner.
|
|
37
|
+
|
|
38
|
+
---
|
|
39
|
+
|
|
40
|
+
## Trigger Conditions
|
|
41
|
+
|
|
42
|
+
Use when:
|
|
43
|
+
|
|
44
|
+
- repeated gate failures occur (>= 2 consecutive `GATE_FAILED` events)
|
|
45
|
+
- blocker storms spread across dependencies (>= 3 active blockers)
|
|
46
|
+
- reliability SLOs are breached
|
|
47
|
+
- agent-ops circuit breaker has opened
|
|
48
|
+
- contradictory state detected by state-auditor
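
Two of the triggers above are numeric and can be checked mechanically. The sketch below assumes each line of `STATUS_EVENTS.ndjson` is a JSON object with an `event_type` field (an assumed field name) and that any other event type breaks a run of consecutive failures; the remaining triggers (SLO breach, circuit breaker, contradictory state) are not modeled.

```python
import json


def consecutive_gate_failures(ndjson_text):
    """Length of the run of GATE_FAILED events at the end of the stream."""
    run = 0
    for line in ndjson_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        event = json.loads(line)
        if event.get("event_type") == "GATE_FAILED":
            run += 1
        else:
            run = 0  # assumption: any other event resets the run
    return run


def should_declare_incident(ndjson_text, active_blockers):
    """True if >= 2 consecutive gate failures or >= 3 active blockers."""
    return consecutive_gate_failures(ndjson_text) >= 2 or active_blockers >= 3
```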
|
|
49
|
+
|
|
50
|
+
---
|
|
51
|
+
|
|
52
|
+
## Inputs
|
|
53
|
+
|
|
54
|
+
- `agent-state/STATUS_EVENTS.ndjson`
|
|
55
|
+
- `agent-state/STATUS.md`
|
|
56
|
+
- `agent-state/EVIDENCE_LOG.md`
|
|
57
|
+
- `agent-state/RISKS.md`
|
|
58
|
+
- open handoffs (current `HANDOFF.json`)
|
|
59
|
+
- `agent-state/STATE_AUDIT_REPORT.md` (if available)
|
|
60
|
+
|
|
61
|
+
---
|
|
62
|
+
|
|
63
|
+
## Severity Classification Matrix
|
|
64
|
+
|
|
65
|
+
| Severity | Criteria | Response SLA | Blast Radius |
|
|
66
|
+
|---|---|---|---|
|
|
67
|
+
| `SEV-1` | Objective failure; thesis invalidation; data loss; all pipelines blocked | Immediate; all other work paused | Full swarm |
|
|
68
|
+
| `SEV-2` | Sprint-level delay; architecture change required; multi-agent blocker | Within current cycle | Multiple agents |
|
|
69
|
+
| `SEV-3` | Single-agent blocker; quality regression detected | Within 2 cycles | Single agent + downstream |
|
|
70
|
+
| `SEV-4` | Non-blocking quality concern; process deviation | Next available cycle | Informational |
|
|
71
|
+
|
|
72
|
+
---
|
|
73
|
+
|
|
74
|
+
## Incident Record Schema
|
|
75
|
+
|
|
76
|
+
```markdown
|
|
77
|
+
# INCIDENT-<NNN>
|
|
78
|
+
- **Severity:** <SEV-1 | SEV-2 | SEV-3 | SEV-4>
|
|
79
|
+
- **Declared:** <ISO8601>
|
|
80
|
+
- **Status:** <open | investigating | mitigating | resolved | closed>
|
|
81
|
+
- **Blast Radius:** <affected agents/modules>
|
|
82
|
+
- **Owner:** <single accountable agent-role>
|
|
83
|
+
- **Summary:** <1-2 sentence description>
|
|
84
|
+
- **Root Cause:** <determined | investigating | unknown>
|
|
85
|
+
- **Root Cause Detail:** <when determined>
|
|
86
|
+
- **Trigger Event:** <STATUS_EVENTS.ndjson entry ref>
|
|
87
|
+
- **Resolution:** <what fixed it>
|
|
88
|
+
- **Closed:** <ISO8601 or N/A>
|
|
89
|
+
- **Lessons:** <what changes to prevent recurrence>
|
|
90
|
+
```
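
One way to keep records consistent with the schema above is to render them from a typed structure. The following is a sketch only: the class and field names are hypothetical, `<NNN>` is assumed to be a zero-padded three-digit number, and defaulting unset fields to `N/A` is an assumption (the schema mentions `N/A` only for **Closed**).

```python
from dataclasses import dataclass

SEVERITIES = ("SEV-1", "SEV-2", "SEV-3", "SEV-4")
STATUSES = ("open", "investigating", "mitigating", "resolved", "closed")


@dataclass
class IncidentRecord:
    number: int
    severity: str
    declared: str
    status: str
    blast_radius: str
    owner: str
    summary: str
    trigger_event: str
    root_cause: str = "unknown"
    root_cause_detail: str = "N/A"
    resolution: str = "N/A"
    closed: str = "N/A"
    lessons: str = "N/A"

    def __post_init__(self):
        # Reject values outside the enumerations listed in the schema.
        if self.severity not in SEVERITIES:
            raise ValueError(f"invalid severity: {self.severity}")
        if self.status not in STATUSES:
            raise ValueError(f"invalid status: {self.status}")

    def to_markdown(self):
        """Render the record in the field order of the schema above."""
        return "\n".join([
            f"# INCIDENT-{self.number:03d}",
            f"- **Severity:** {self.severity}",
            f"- **Declared:** {self.declared}",
            f"- **Status:** {self.status}",
            f"- **Blast Radius:** {self.blast_radius}",
            f"- **Owner:** {self.owner}",
            f"- **Summary:** {self.summary}",
            f"- **Root Cause:** {self.root_cause}",
            f"- **Root Cause Detail:** {self.root_cause_detail}",
            f"- **Trigger Event:** {self.trigger_event}",
            f"- **Resolution:** {self.resolution}",
            f"- **Closed:** {self.closed}",
            f"- **Lessons:** {self.lessons}",
        ])
```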
|
|
91
|
+
|
|
92
|
+
---
|
|
93
|
+
|
|
94
|
+
## Workflow — Mapped to Clarity Protocol
|
|
95
|
+
|
|
96
|
+
### `[STATE_ANALYSIS]`
|
|
97
|
+
1. Assess trigger condition. Determine severity using classification matrix.
|
|
98
|
+
2. Identify blast radius: which agents/modules are affected or blocked.
|
|
99
|
+
3. Read `STATUS_EVENTS.ndjson` for recent failure pattern.
|
|
100
|
+
|
|
101
|
+
### `[STRATEGY_SELECTOR]`
|
|
102
|
+
4. Assign single accountable owner (the agent closest to root cause).
|
|
103
|
+
5. Choose response strategy:
|
|
104
|
+
- SEV-1/SEV-2: Immediate timeline reconstruction + corrective routing.
|
|
105
|
+
- SEV-3: Owner-routed fix with deadline.
|
|
106
|
+
- SEV-4: Log and monitor.
|
|
107
|
+
6. Determine if circuit breaker should remain open or can be closed.
|
|
108
|
+
|
|
109
|
+
### `[EXECUTION_LOG]`
|
|
110
|
+
7. Declare incident ID and record in `global-state/INCIDENTS.md`.
|
|
111
|
+
8. Build timeline from `STATUS_EVENTS.ndjson` and `EVIDENCE_LOG.md`:
|
|
112
|
+
|
|
113
|
+
```markdown
|
|
114
|
+
## Timeline: INCIDENT-<NNN>
|
|
115
|
+
| Time | Event | Source | Detail |
|
|
116
|
+
|---|---|---|---|
|
|
117
|
+
| <ISO8601> | <event_type> | <agent> | <detail> |
|
|
118
|
+
```
|
|
119
|
+
|
|
120
|
+
9. Route corrective actions with due conditions to owner.
|
|
121
|
+
10. For SEV-1/SEV-2: notify all affected agents via status event emission.
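
Step 8 can be sketched as a pure function from event lines to the timeline table shown above. The event field names (`timestamp`, `event_type`, `agent`, `detail`) are assumptions about the `STATUS_EVENTS.ndjson` shape, and sorting by the raw timestamp string is only correct if all timestamps share one ISO8601 format and timezone.

```python
import json


def build_timeline(incident_id, ndjson_text):
    """Render a Markdown timeline table from NDJSON status events."""
    events = [json.loads(line) for line in ndjson_text.splitlines() if line.strip()]
    # Assumption: uniform ISO8601 strings, so lexicographic order is time order.
    events.sort(key=lambda e: e["timestamp"])
    rows = [
        f"## Timeline: {incident_id}",
        "| Time | Event | Source | Detail |",
        "|---|---|---|---|",
    ]
    for e in events:
        detail = e.get("detail", "")
        rows.append(f'| {e["timestamp"]} | {e["event_type"]} | {e["agent"]} | {detail} |')
    return "\n".join(rows)
```

Building the table only from logged events keeps the timeline tied to the event log and evidence rather than to recollection.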
|
|
122
|
+
|
|
123
|
+
### `[ARTIFACT_UPDATE]`
|
|
124
|
+
11. Write/update `global-state/INCIDENTS.md` with incident record.
|
|
125
|
+
12. Write `agent-state/INCIDENT_TIMELINE.md` with full timeline reconstruction.
|
|
126
|
+
13. Append incident declaration and resolution to `EVIDENCE_LOG.md`.
|
|
127
|
+
|
|
128
|
+
### `[VERIFICATION]`
|
|
129
|
+
14. Owner has acknowledged and is working on corrective action.
|
|
130
|
+
15. Timeline reconstruction is evidence-linked (every row has source ref).
|
|
131
|
+
16. Blast radius assessment matches actual blocked agents.
|
|
132
|
+
17. Severity classification is justified by criteria matrix.
|
|
133
|
+
|
|
134
|
+
---
|
|
135
|
+
|
|
136
|
+
## Closure Protocol
|
|
137
|
+
|
|
138
|
+
No incident closure without ALL of:
|
|
139
|
+
|
|
140
|
+
| # | Requirement | Verification |
|
|
141
|
+
|---|---|---|
|
|
142
|
+
| 1 | Owner sign-off | Owner confirms fix is deployed/applied |
|
|
143
|
+
| 2 | Verified mitigation evidence | Evidence entry with before/after state |
|
|
144
|
+
| 3 | Regression prevention | Action item to prevent recurrence recorded |
|
|
145
|
+
| 4 | Post-incident review | Lessons documented in incident record |
|
|
146
|
+
| 5 | Circuit breaker status | Confirmed closed or explicitly left open with reason |
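
The five closure requirements lend themselves to a simple checklist gate. This is a sketch with hypothetical key names; how each item is actually verified (the Verification column) is outside its scope.

```python
# Keys are hypothetical names for the five rows of the closure table.
CLOSURE_REQUIREMENTS = (
    "owner_sign_off",
    "mitigation_evidence",
    "regression_prevention",
    "post_incident_review",
    "circuit_breaker_status",
)


def missing_closure_requirements(checklist):
    """Return the requirements that are absent or falsy in the checklist."""
    return [key for key in CLOSURE_REQUIREMENTS if not checklist.get(key)]


def can_close(checklist):
    """An incident may close only when all five requirements are met."""
    return not missing_closure_requirements(checklist)
```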
|
|
147
|
+
|
|
148
|
+
---
|
|
149
|
+
|
|
150
|
+
## Output Contract
|
|
151
|
+
|
|
152
|
+
- `global-state/INCIDENTS.md` (incident record with full lifecycle)
|
|
153
|
+
- `agent-state/INCIDENT_TIMELINE.md` (forensic timeline)
|
|
154
|
+
- closure entry in `agent-state/EVIDENCE_LOG.md`
|
|
155
|
+
- status events: `INCIDENT_DECLARED`, `INCIDENT_MITIGATING`, `INCIDENT_RESOLVED`, `INCIDENT_CLOSED`
|
|
156
|
+
|
|
157
|
+
---
|
|
158
|
+
|
|
159
|
+
## Anti-Patterns
|
|
160
|
+
|
|
161
|
+
| Anti-Pattern | Correct Behavior |
|
|
162
|
+
|---|---|
|
|
163
|
+
| Closing incident without lessons | Every closure includes prevention action |
|
|
164
|
+
| Multiple owners | Single accountable owner; others are contributors |
|
|
165
|
+
| Timeline from memory | Timeline from event logs and evidence only |
|
|
166
|
+
| SEV-1 without pausing other work | SEV-1 means full swarm focuses on resolution |
|
|
167
|
+
| Skipping post-incident review | No closure without review, even for SEV-4 |
|
|
168
|
+
|
|
169
|
+
---
|
|
170
|
+
|
|
171
|
+
## Failure Policy
|
|
172
|
+
|
|
173
|
+
No incident closure without owner sign-off, verified mitigation evidence, and regression-prevention action item.
|
|
174
|
+
If the root cause cannot be determined, the incident stays open (status `investigating`) and escalates to the next-higher severity after 2 cycles.
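
The escalation rule can be sketched as below. The function name and the `cycles_without_root_cause` counter are hypothetical; the sketch escalates by one level once the threshold is reached and caps at `SEV-1`, and it does not decide whether escalation repeats on later cycles.

```python
# Ordered from lowest to highest severity.
SEVERITY_ORDER = ["SEV-4", "SEV-3", "SEV-2", "SEV-1"]


def escalate(severity, cycles_without_root_cause, threshold=2):
    """Return the severity after applying the 2-cycle escalation rule."""
    if cycles_without_root_cause < threshold:
        return severity
    idx = SEVERITY_ORDER.index(severity)
    return SEVERITY_ORDER[min(idx + 1, len(SEVERITY_ORDER) - 1)]
```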
|
|
@@ -0,0 +1,68 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: landing-review-watcher
|
|
3
|
+
description:
|
|
4
|
+
Watch review, CI, and landing loops for a change. Use when the user needs a durable merge-readiness procedure rather than a one-off PR check.
|
|
5
|
+
---
|
|
6
|
+
|
|
7
|
+
# Landing Review Watcher
|
|
8
|
+
|
|
9
|
+
## Purpose
|
|
10
|
+
|
|
11
|
+
Make code-review and landing work observable, repeatable, and operator-safe.
|
|
12
|
+
This skill does not own the release decision itself; it owns the watch loop that turns "waiting on review/CI/merge" into a deterministic procedure with artifacts.
|
|
13
|
+
|
|
14
|
+
## Canonical Use Cases
|
|
15
|
+
|
|
16
|
+
1. A change is open for review and someone needs a stable loop for comments, CI state, and merge blockers.
|
|
17
|
+
2. A release candidate is waiting to land and the team wants a written handoff-safe landing procedure.
|
|
18
|
+
3. A long-running review thread needs explicit acknowledgement, blocker tracking, and final landing evidence.
|
|
19
|
+
|
|
20
|
+
## Inputs
|
|
21
|
+
|
|
22
|
+
- active branch / change identifier
|
|
23
|
+
- review comments or requested changes
|
|
24
|
+
- CI status / failing checks
|
|
25
|
+
- merge policy constraints
|
|
26
|
+
- rollout or release artifact pointers when applicable
|
|
27
|
+
|
|
28
|
+
## Workflow
|
|
29
|
+
|
|
30
|
+
1. Define the watch target:
|
|
31
|
+
branch, PR, merge queue item, or review thread.
|
|
32
|
+
2. Capture the current state:
|
|
33
|
+
reviewers, open comments, requested changes, CI state, merge policy, and known blockers.
|
|
34
|
+
3. Classify the loop:
|
|
35
|
+
`reviewing`, `changes_requested`, `waiting_on_ci`, `ready_to_land`, or `blocked`.
|
|
36
|
+
4. Write or update `agent-state/LANDING_REVIEW_WATCH.md` with:
|
|
37
|
+
owner, blockers, next checks, evidence refs, and explicit exit criteria.
|
|
38
|
+
5. On every cycle, acknowledge new review input explicitly:
|
|
39
|
+
accepted, rebutted, deferred, or blocked with reason.
|
|
40
|
+
6. If CI fails, route the failure to the responsible owner and record the remediation checkpoint.
|
|
41
|
+
7. If merge is safe, hand off to `release-sentry` for the actual approve/hold decision.
|
|
42
|
+
8. After landing, record the final result, landing timestamp, and rollback pointer if one exists.
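
Step 3 (classifying the loop) can be sketched as a small decision function. The input names and the `ci_state` values (`"passed"`, `"failed"`, `"pending"`) are hypothetical, and the precedence order (blocked, then requested changes, then CI, then review) is an assumption; the workflow itself only lists the state names.

```python
def classify_loop(*, blocked, changes_requested, ci_state, review_pending):
    """Map the captured review/CI state to one watch-loop state."""
    if blocked:
        return "blocked"
    if changes_requested:
        return "changes_requested"
    if ci_state != "passed":
        return "waiting_on_ci"  # covers both pending and failed runs
    if review_pending:
        return "reviewing"
    return "ready_to_land"
```

With this ordering, a change is never reported as `ready_to_land` while requested changes or non-passing CI remain.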
|
|
43
|
+
|
|
44
|
+
## Outputs
|
|
45
|
+
|
|
46
|
+
- `agent-state/LANDING_REVIEW_WATCH.md`
|
|
47
|
+
- updated evidence pointer in `agent-state/EVIDENCE_LOG.md`
|
|
48
|
+
- optional release routing note to `release-sentry`
|
|
49
|
+
|
|
50
|
+
## Validation
|
|
51
|
+
|
|
52
|
+
- Verify every open reviewer/blocker has an owner or explicit waiting reason.
|
|
53
|
+
- Verify CI state is current and tied to a specific run/check reference.
|
|
54
|
+
- Verify landing state is one of:
|
|
55
|
+
`reviewing`, `changes_requested`, `waiting_on_ci`, `ready_to_land`, `blocked`, `landed`.
|
|
56
|
+
- Verify the watch artifact names the next action instead of only summarizing history.
|
|
57
|
+
|
|
58
|
+
## Compatibility
|
|
59
|
+
|
|
60
|
+
- `SKILL.md` is the portable source of truth for this workflow.
|
|
61
|
+
- The skill remains useful without provider-specific PR adapters; branch names, plain-text review notes, and CI summaries are enough.
|
|
62
|
+
- Client overlays may add launch shortcuts, but must not redefine the watch-state model.
|
|
63
|
+
|
|
64
|
+
## Failure Policy
|
|
65
|
+
|
|
66
|
+
- Do not claim a change is ready to land if requested changes or failing CI remain unresolved.
|
|
67
|
+
- Do not collapse review feedback into a vague summary; every new blocker or accepted fix must be acknowledged explicitly.
|
|
68
|
+
- If the change cannot be tied to a concrete watch target, stop and create the target definition first.
|