@event4u/agent-config 2.9.0 → 2.11.0
- package/.agent-src/commands/agents.md +1 -0
- package/.agent-src/commands/challenge-me.md +1 -0
- package/.agent-src/commands/chat-history.md +1 -0
- package/.agent-src/commands/context.md +1 -0
- package/.agent-src/commands/council.md +1 -0
- package/.agent-src/commands/feature.md +1 -0
- package/.agent-src/commands/fix.md +1 -0
- package/.agent-src/commands/grill-me.md +1 -0
- package/.agent-src/commands/judge.md +1 -0
- package/.agent-src/commands/memory.md +1 -0
- package/.agent-src/commands/module.md +1 -0
- package/.agent-src/commands/onboard.md +32 -4
- package/.agent-src/commands/optimize.md +1 -0
- package/.agent-src/commands/override.md +1 -0
- package/.agent-src/commands/roadmap.md +1 -0
- package/.agent-src/commands/tests.md +1 -0
- package/.agent-src/rules/no-roadmap-references.md +19 -0
- package/.agent-src/skills/nextjs-patterns/SKILL.md +203 -0
- package/.agent-src/skills/symfony-workflow/SKILL.md +173 -0
- package/.agent-src/templates/scripts/work_engine/hook_bootstrap.py +4 -0
- package/.agent-src/templates/scripts/work_engine/hooks/builtin/__init__.py +3 -0
- package/.agent-src/templates/scripts/work_engine/hooks/builtin/decision_gate.py +162 -0
- package/.agent-src/templates/scripts/work_engine/hooks/builtin/memory_visibility.py +32 -3
- package/.agent-src/templates/scripts/work_engine/hooks/settings.py +24 -6
- package/.agent-src/templates/scripts/work_engine/scoring/decision_engine.py +351 -0
- package/.agent-src/templates/scripts/work_engine/scoring/memory_visibility.py +147 -1
- package/.claude-plugin/marketplace.json +3 -1
- package/CHANGELOG.md +65 -0
- package/README.md +66 -17
- package/config/agent-settings.template.yml +85 -0
- package/docs/architecture.md +1 -1
- package/docs/contracts/STABILITY.md +16 -0
- package/docs/contracts/adr-chat-history-split.md +1 -0
- package/docs/contracts/adr-forecast-construction-shape.md +1 -0
- package/docs/contracts/adr-gtm-context-spine.md +1 -0
- package/docs/contracts/adr-level-6-productization.md +147 -0
- package/docs/contracts/adr-settings-sync-engine.md +1 -0
- package/docs/contracts/adr-wing4-context-spine.md +1 -0
- package/docs/contracts/agent-memory-contract.md +1 -0
- package/docs/contracts/agents-md-tech-stack.md +1 -0
- package/docs/contracts/audit-log-v1.md +1 -0
- package/docs/contracts/command-clusters.md +1 -0
- package/docs/contracts/command-surface-tiers.md +1 -0
- package/docs/contracts/context-paths.md +1 -0
- package/docs/contracts/cost-profile-defaults.md +105 -0
- package/docs/contracts/cross-wing-handoff.md +1 -0
- package/docs/contracts/decision-engine-gates.md +115 -0
- package/docs/contracts/decision-trace-v1.md +31 -0
- package/docs/contracts/file-ownership-matrix.md +1 -0
- package/docs/contracts/hook-architecture-v1.md +47 -0
- package/docs/contracts/implement-ticket-flow.md +1 -0
- package/docs/contracts/installed-tools-lockfile.md +1 -0
- package/docs/contracts/kernel-membership.md +1 -0
- package/docs/contracts/linear-ai-rules-inclusion.md +1 -0
- package/docs/contracts/linear-ai-three-layers.md +1 -0
- package/docs/contracts/linter-structural-model.md +1 -0
- package/docs/contracts/load-context-budget-model.md +1 -0
- package/docs/contracts/load-context-schema.md +1 -0
- package/docs/contracts/memory-visibility-v1.md +34 -0
- package/docs/contracts/one-off-script-lifecycle.md +1 -0
- package/docs/contracts/orchestration-dsl-v1.md +1 -0
- package/docs/contracts/package-self-orientation.md +1 -0
- package/docs/contracts/persona-schema.md +1 -0
- package/docs/contracts/release-trunk-sync.md +104 -0
- package/docs/contracts/roadmap-complexity-standard.md +1 -0
- package/docs/contracts/rule-classification.md +1 -0
- package/docs/contracts/rule-interactions.md +26 -0
- package/docs/contracts/rule-priority-hierarchy.md +1 -0
- package/docs/contracts/rule-router.md +1 -0
- package/docs/contracts/settings-sync-yaml-subset.md +139 -0
- package/docs/contracts/skill-domains.md +1 -0
- package/docs/contracts/tier-3-contrib-plugin.md +1 -0
- package/docs/contracts/ui-stack-extension.md +1 -0
- package/docs/contracts/ui-track-flow.md +1 -0
- package/docs/customization.md +1 -1
- package/docs/getting-started.md +3 -1
- package/docs/installation.md +8 -6
- package/docs/readme-split-plan.md +102 -0
- package/package.json +1 -1
- package/scripts/_cli/cmd_settings_check.py +171 -0
- package/scripts/agent-config +40 -0
- package/scripts/chat_history.py +19 -0
- package/scripts/check_beta_review_markers.py +127 -0
- package/scripts/check_council_references.py +46 -5
- package/scripts/check_release_trunk_sync.py +152 -0
- package/scripts/hooks/dispatch_hook.py +5 -1
- package/scripts/hooks/replay_hook.py +144 -0
- package/scripts/hooks/state_io.py +24 -1
- package/scripts/hooks_doctor.py +184 -0
- package/scripts/install.py +3 -3
- package/scripts/lint_hook_concern_budget.py +203 -0
- package/scripts/roadmap_progress_hook.py +11 -0
- package/scripts/schemas/command.schema.json +5 -0
- package/scripts/skill_linter.py +11 -2
- package/scripts/smoke_quickstart.py +134 -0
- package/scripts/validate_decision_engine.py +124 -0
@@ -0,0 +1,147 @@
+---
+stability: stable
+---
+
+# ADR — Level-6 Productization Closure
+
+> **Status:** Decided · 2026-05-14
+> **Context:** Closure record for `road-to-productization.md` and its
+> two sibling roadmaps (`road-to-proof-not-features.md`,
+> `road-to-better-skills-and-profiles.md` Block A). PR #43 lifted the
+> package from Level-4 (execution engine) to Level-5 (observable
+> decision system); this roadmap was the Level-5 → Level-6 jump:
+> **steerable + provable + onboardable**.
+> **Cross-links:**
+> [`road-to-productization.md`](../../agents/roadmaps/road-to-productization.md) ·
+> [`road-to-proof-not-features.md`](../../agents/roadmaps/archive/road-to-proof-not-features.md) ·
+> [`road-to-better-skills-and-profiles.md`](../../agents/roadmaps/archive/road-to-better-skills-and-profiles.md).
+
+## What shipped
+
+### Decision-Engine steerability (Phase 2)
+
+- [`decision-engine-gates.md`](decision-engine-gates.md) — additive
+`decision_engine:` block in `.agent-settings.yml` with
+`min_confidence`, `block_on_risk`, `require_memory_hits`, `on_block`,
+`ask_timeout_seconds`, `on_block_fallback`. Absent block = unchanged
+observe-only behaviour.
+- Gate-conflict resolution matrix (P2.1a) + non-TTY timeout fallback
+(P2.1b) shipped before the gates themselves; the engine refuses to
+evaluate downstream gates after the first rejection and falls back
+to `on_block_fallback` in non-interactive contexts.
+- Confidence-band gate (P2.2) and risk-class gate (P2.3) wired into
+the scoring path. Memory-required policy (P2.4) unblocks on P6.2
+shipping (`affected` keys in the decision trace).
+
+### UX simplification (Phase 3)
+
+- README "Quickstart" block — install → `/onboard` → `/work "first
+real task"`, contributor detail moved below the `## For contributors`
+fold.
+- Default `cost_profile` flipped from `minimal` to `balanced`;
+rationale in [`cost-profile-defaults.md`](cost-profile-defaults.md).
+- `/onboard` step 11 prints the Quickstart command list inline.
+- CI gate: `task smoke-quickstart` runs the installer into a tmpdir
+and validates the documented default surface deterministically.
+
+### Multi-stack skill depth (Phase 4)
+
+- `symfony-workflow` skill (~8.6 KB) — DI, Doctrine, Messenger,
+voters, Twig, console.
+- `nextjs-patterns` skill (~9.9 KB) — App Router, RSC boundaries,
+Server Actions, caching, route handlers, 14.x↔15.x deltas.
+- README stack table now separates Symfony / Next.js / Zend-Laminas
+rows; "Deepest reference stack" paragraph names the workflow-grade
+second tier explicitly.
+
+### Architecture cleanup (Phase 5)
+
+- Auto-rules (`non-destructive-by-default`, `scope-control-policy`)
+audited: already refactored to trigger + Iron Law + pointer shape;
+bound by the kernel-budget linter at 4 000-char override ceiling
+(P5.1).
+- Rule-Interaction matrix marked rule-only by design;
+[`rule-interactions.md`](rule-interactions.md) § "Out of scope —
+orchestration surfaces" points at `decision-engine-gates`,
+`decision-trace-v1`, `agent-memory-contract`, `memory-visibility-v1`,
+and the `ai-council` skill for Council × Memory × Work-Engine
+interactions (P5.2).
+- `type: orchestrator` frontmatter tag exempts cluster routers from
+the `command_missing_skill_references` linter check; 15 commands
+carry the tag (P5.3).
+- Beta-review marker protocol shipped in [`STABILITY.md`](STABILITY.md)
+§ Beta-review markers; `scripts/check_beta_review_markers.py` wired
+into `task ci`; 39 beta contracts back-filled (P5.4).
+- Test-redundancy audit produced
+[`road-to-test-cleanup.md`](../../agents/roadmaps/road-to-test-cleanup.md)
+— audit-only, no deletions (P5.5).
+
+### Release-trunk discipline (Phase 1)
+
+- [`release-trunk-sync.md`](release-trunk-sync.md) protocol; CI gate
+fails the release-prep branch when `main` is more than one tagged
+release behind (P1.3).
+
+### Proof + cognition layers (Phases 6 + 7)
+
+- Memory-consequence in the trace: `affected` keys in
+[`decision-trace-v1.md`](decision-trace-v1.md) (sibling P2.1a–c).
+- README three-audience split (sibling P2.2a–c).
+- Hook doctor (sibling P2.3).
+- Persona spine: Core-tier 5-section + Specialist-tier 7-section
+spines locked in [`persona-schema.md`](persona-schema.md) (sibling
+Block A).
+
+## What got cancelled
+
+- **P6.1 — Three real showcase sessions** (sibling P1.1–P1.4).
+Cancelled upstream — capturing real host-agent sessions requires a
+hosted-LLM runner that is out of scope for this roadmap. P1.0
+pre-flight shipped; the capture surface is ready when a runner
+exists. Reopen as `road-to-showcase-capture.md` once a runner is
+on the table.
+- **P8.1 — End-to-end Level-6 smoke** — same gating as P6.1.
+Structural coverage (`task smoke-quickstart` + decision-engine
+schema validator + gate-evaluator unit tests) covers the
+configuration surface deterministically; the live smoke remains
+the manual pre-tag gate.
+
+## What stayed beta
+
+39 contracts carry `keep-beta-until: 2026-08-12` (next audit
+deadline). None met the 30-day promotion floor at audit time.
+First-commit age range: 0–12 days. Audit cap is 90 days from the
+audit date; CI rejects undated betas, multiple markers, and
+keep-beta-until dates beyond the window.
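
The three rejection rules in the paragraph above can be sketched as a small check. This is an illustration only, not the logic of `scripts/check_beta_review_markers.py` (its source is not part of this hunk). The function name, the input shape, and the error strings are hypothetical, and the audit date is assumed to be the ADR's decision date, 2026-05-14, which is consistent with the marker: 90 days after that date is 2026-08-12.

```python
from datetime import date, timedelta
from typing import List, Optional

AUDIT_CAP_DAYS = 90  # audit cap stated in the ADR


def check_beta_marker(markers: List[str], audit_date: date) -> Optional[str]:
    """Return an error string for one beta contract, or None if it passes.

    `markers` holds every `keep-beta-until` value found in the contract's
    frontmatter (hypothetical input shape, chosen for this sketch).
    """
    if not markers:
        return "undated beta"
    if len(markers) > 1:
        return "multiple markers"
    try:
        until = date.fromisoformat(markers[0])
    except ValueError:
        return "malformed date"
    # Dates later than audit date + 90 days fall outside the window.
    if until > audit_date + timedelta(days=AUDIT_CAP_DAYS):
        return "date beyond the audit window"
    return None
```

Under that assumed audit date, `check_beta_marker(["2026-08-12"], date(2026, 5, 14))` passes, while a marker one day later is rejected.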
+
+## What got deferred to siblings
+
+- **Showcase capture** → future `road-to-showcase-capture.md` when a
+hosted-LLM runner is on the table.
+- **Test-suite deletion** →
+[`road-to-test-cleanup.md`](../../agents/roadmaps/road-to-test-cleanup.md)
+(audit-only sibling spawned by P5.5; non-destructive by default).
+- **Persona Block B** (Architect / Risk-Officer extension) —
+anti-recommended per the sibling closure decision; not deferred,
+closed.
+- **Distribution / adoption** →
+`road-to-distribution-and-adoption.md`, gated on this roadmap
+closing (which this ADR records).
+- **MCP server work** — own strand, out of scope.
+
+## Consequences
+
+- **Steerable:** the Decision Engine now gates on configurable
+thresholds; the configuration surface is documented and CI-tested.
+- **Provable:** memory hits/misses surface as `affected` keys in the
+decision trace; the trace shape is contract-stable.
+- **Onboardable:** a fresh user can land at a working `/work`
+invocation in three Quickstart steps without scrolling past the
+fold.
+- **Multi-stack credible:** Laravel stays the deepest reference;
+Symfony and Next.js shipped at workflow-grade depth; other stacks
+remain project-analysis-only with the honest delta language in the
+README.
+- **Architecturally tidy:** orchestrator commands no longer warn,
+beta contracts cannot rot undated, and the contract surface itself
+carries a periodic review obligation.
@@ -0,0 +1,105 @@
+# Cost-Profile Defaults — Contract
+
+> **Status:** beta · **Owner:** package maintainer · **Last reviewed:** 2026-05-14
+>
+> Normative contract for the **default `cost_profile`** new installs receive.
+> Profile semantics themselves are documented in
+> [`docs/customization.md` § cost_profile](../customization.md) and
+> [`docs/contracts/rule-router.md`](rule-router.md); this file owns only the
+> **default-selection decision** and the rationale behind it.
+
+## Decision
+
+```
+DEFAULT_PROFILE = "balanced"
+```
+
+`scripts/install.py` and `npx @event4u/agent-config init` write
+`cost_profile: balanced` into `.agent-settings.yml` for fresh installs
+unless the user passes `--profile=minimal` or `--profile=full`.
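
The default-selection rule above can be sketched with `argparse`. This is a minimal illustration, not the code in `scripts/install.py`: the function names and the `init-sketch` program name are hypothetical, and the accepted choices are assumed from the two flags the contract names plus the default itself.

```python
import argparse
from typing import List, Optional

DEFAULT_PROFILE = "balanced"  # the contract's documented default
# Assumed choices: the two flags named in the contract plus the default.
PROFILE_CHOICES = ("minimal", "balanced", "full")


def resolve_profile(argv: Optional[List[str]] = None) -> str:
    """Return the profile a fresh install would record for these CLI args."""
    parser = argparse.ArgumentParser(prog="init-sketch")
    parser.add_argument("--profile", choices=PROFILE_CHOICES, default=DEFAULT_PROFILE)
    return parser.parse_args(argv).profile


def settings_line(profile: str) -> str:
    """Render the single YAML line written into `.agent-settings.yml`."""
    return f"cost_profile: {profile}"
```

With no flag, `settings_line(resolve_profile([]))` yields `cost_profile: balanced`; passing `["--profile=minimal"]` yields the opt-out value instead.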
+
+## Profile table
+
+| Profile | Contents | Token footprint | Use when |
+|---|---|---|---|
+| `minimal` | Kernel only (9 always-loaded Iron-Law rules, ≤ 26 k chars) | Lowest | Token-constrained agents (small context windows, free-tier models) or projects that opt out of routing |
+| **`balanced`** *(default)* | Kernel + tier-1 auto-rules (workflow + safety floor) | Medium | Every productized install — the documented "current behaviour superset" |
+| `full` | Kernel + tier-1 + tier-2 (every rule, every guideline-cited skill) | Highest | Teams running large-context models (Opus 4, GPT-5) that want maximum guardrail coverage |
+| `custom` | Ignore profile; every matrix value set explicitly | Variable | Power users tuning per-rule load decisions |
+
+## Why `balanced`, not `minimal`
+
+The kernel-only `minimal` profile predates the tier-1 router. It was the
+correct default while tier-1 was experimental, but four signals now point
+at `balanced`:
+
+1. **Documented intent already says so.** Both
+`config/agent-settings.template.yml` (the source the installer projects
+from) and `docs/customization.md` describe `balanced` as
+"default — current behaviour superset". The code default of `minimal`
+was a drift artifact, not a deliberate stance.
+2. **Productization (Level-6) demands sensible-default-out-of-the-box.**
+A fresh `npx init` followed immediately by `/work` should engage the
+full workflow guardrail set — `developer-like-execution`,
+`verify-before-complete`, `minimal-safe-diff`, `scope-control`.
+These live in tier-1, not the kernel. With `minimal`, the
+work-engine runs unanchored against most quality guardrails.
+3. **Decision-engine gates assume tier-1 is present.** The P2.x gates
+(`min_confidence`, `block_on_risk`, `require_memory_hits`) are
+harmless under `minimal` but only reach their documented behaviour
+under `balanced` and above — because the confidence model and
+risk-classification rules they read live in tier-1.
+4. **Opt-out is cheap, opt-in is invisible.** A team that wants the
+`minimal` floor flips one YAML value. A team that doesn't know
+tier-1 exists never finds it. The default should err toward
+guardrail coverage.
+
+## Opt-out path
+
+Token-budget pressure → flip in `.agent-settings.yml`:
+
+```yaml
+cost_profile: minimal
+```
+
+…or pass `--profile=minimal` to `npx @event4u/agent-config init`.
+No migration is required: removing tier-1 rules from a session has no
+state-machine impact because the kernel carries the Iron-Law floor.
+
+## Drift detection
+
+CI must keep three surfaces in sync:
+
+- `scripts/install.py` — `DEFAULT_PROFILE` constant.
+- `config/agent-settings.template.yml` — comment block on the
+`cost_profile:` key.
+- `docs/customization.md` — cost-profile table default column.
+
+Reviewer guidance: a PR that changes any one of these must touch the
+other two **plus** this file's `Last reviewed:` field. The
+`docs-sync` rule enforces the cross-reference check; a missing update
+trips it.
+
+## Re-review schedule
+
+`re-review: 2026-11-14` (six months out). Triggers for earlier
+re-review:
+
+- Tier-1 rule count drops below 5 (the router would carry too little
+to justify the load cost).
+- Median `npx init` token cost grows past 40 k for a fresh agent
+session (then re-evaluate `minimal` as the default).
+- A consumer-project tally shows ≥ 80 % of installs override the
+default within seven days (the default is wrong for the population).
+
+## Non-goals
+
+- This contract does **not** dictate what tier-1 contains. That belongs
+to [`rule-router.md`](rule-router.md) and the `kernel-membership.md`
+contract.
+- It does **not** add a fourth profile. `custom` covers the
+per-tenant-tuning case; no new tier needed.
+- It does **not** auto-migrate existing installs. Projects already
+pinned to `minimal` keep `minimal` until a developer edits the file
+or runs `npx @event4u/agent-config migrate` (which preserves
+user-set values per [`migration/v1-to-v2.md`](../migration/v1-to-v2.md)).
@@ -0,0 +1,115 @@
+# Decision-engine gates (v1)
+
+**Status:** beta — landed 2026-05-14 via `road-to-productization.md` Phase 2.
+**Owners:** `work_engine` maintainers.
+**Scope:** the optional `decision_engine:` block in `.agent-settings.yml`.
+
+## Purpose
+
+Cross the package from **observable** (Level-5) to **controllable**
+(Level-6). The engine has scored confidence-bands, risk-classes, and
+memory-hits since Phase 4 of `road-to-decision-trace`; this contract
+turns those signals into refusal gates the user opts into.
+
+Absent block = unchanged behaviour. Enforcement is opt-in only; the
+engine never silently halts on a signal the user did not configure.
+
+## Schema
+
+All keys optional. Unknown keys are rejected hard by
+`scripts/validate_decision_engine.py` and by
+`work_engine.scoring.decision_engine.parse`.
+
+| Key | Type | Default | Notes |
+|------------------------|-----------------|---------|-------|
+| `surface_traces` | bool | `false` | Mirrored to `DecisionTraceHook`. Predates the gates; lives here so the block has one schema. |
+| `min_confidence` | enum | `off` | `low` \| `medium` \| `high` \| `off`. Phase=Plan floor. |
+| `block_on_risk` | enum | `off` | `low` \| `medium` \| `high` \| `off`. Phase=Implement ceiling. |
+| `require_memory_hits` | bool | `false` | Phase=Refine demands `memory_hits >= 1`. |
+| `on_block` | enum | `stop` | `stop` \| `ask` \| `warn`. Action when a gate fires. |
+| `ask_timeout_seconds` | int (>= 0) | `30` | Non-TTY wait before applying `on_block_fallback`. |
+| `on_block_fallback` | enum | `stop` | `stop` \| `warn`. Resolution after `ask_timeout`. |
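
The schema table above can be mirrored in a small parser to show what "all keys optional, unknown keys rejected hard" means in practice. This is a sketch under stated assumptions, not the real `work_engine.scoring.decision_engine.parse`: `DecisionEngineConfig` and `parse_config` are hypothetical names, the input is assumed to be an already-loaded mapping, and only the keys, types, and defaults come from the table.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict

LEVELS = ("low", "medium", "high", "off")


@dataclass(frozen=True)
class DecisionEngineConfig:
    # Defaults copied from the schema table; every key is optional.
    surface_traces: bool = False
    min_confidence: str = "off"
    block_on_risk: str = "off"
    require_memory_hits: bool = False
    on_block: str = "stop"
    ask_timeout_seconds: int = 30
    on_block_fallback: str = "stop"


_ENUMS = {
    "min_confidence": LEVELS,
    "block_on_risk": LEVELS,
    "on_block": ("stop", "ask", "warn"),
    "on_block_fallback": ("stop", "warn"),
}


def parse_config(block: Dict[str, Any]) -> DecisionEngineConfig:
    """Validate a loaded `decision_engine:` mapping and fill in defaults."""
    known = {f.name for f in fields(DecisionEngineConfig)}
    unknown = sorted(set(block) - known)
    if unknown:
        raise ValueError(f"unknown decision_engine keys: {unknown}")
    for key, value in block.items():
        if key in _ENUMS:
            if value not in _ENUMS[key]:
                raise ValueError(f"{key}: expected one of {_ENUMS[key]}")
        elif key == "ask_timeout_seconds":
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, int) or value < 0:
                raise ValueError(f"{key}: expected an int >= 0")
        elif not isinstance(value, bool):
            raise ValueError(f"{key}: expected a bool")
    return DecisionEngineConfig(**block)
```

An empty mapping returns the all-defaults configuration, which corresponds to the "absent block = unchanged behaviour" rule.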
+
+## Gate-to-phase mapping
+
+Each gate fires on exactly one phase. The dispatcher emits gate
+decisions on `AFTER_STEP` for that phase only.
+
+| Gate | Phase | Signal compared | Fires when |
+|-----------------------|-----------|-----------------------------|-------------------------------------|
+| `min_confidence` | Plan | `confidence_band` | actual < floor |
+| `require_memory_hits` | Refine | `state.memory.hits` | hits < 1 |
+| `block_on_risk` | Implement | `risk_class` | actual >= ceiling |
+
+`low` < `medium` < `high` for both confidence and risk. `off` disables
+the gate.
+
+## Conflict matrix
+
+Only one gate fires per phase, so cross-phase conflicts are impossible
+by construction. Within a phase, **only the highest-impact gate
+applies**; downstream gates are evaluated against the same phase but
+skipped if a higher-priority gate already fired.
+
+Priority (highest → lowest):
+
+1. `block_on_risk` (Implement)
+2. `require_memory_hits` (Refine)
+3. `min_confidence` (Plan)
+
+This priority surfaces only when a future schema adds gates that
+overlap on the same phase; today each gate owns a unique phase and the
+priority is documentary. The order is locked so future additions
+inherit the contract.
+
+### Worked examples
+
+| Config | Phase | confidence | risk | hits | Outcome |
+|---------------------------------------------------------------------------------------|-----------|------------|----------|------|----------------------------------|
+| `min_confidence: medium` | Plan | `low` | - | - | `min_confidence` fires, action=stop |
+| `min_confidence: medium` | Plan | `high` | - | - | no fire — band at/above floor |
+| `block_on_risk: medium` | Implement | - | `high` | - | `block_on_risk` fires, action=stop |
+| `block_on_risk: high` | Implement | - | `medium` | - | no fire — below ceiling |
+| `require_memory_hits: true` | Refine | - | - | 0 | `require_memory_hits` fires |
+| `require_memory_hits: true` | Refine | - | - | 2 | no fire |
+| `min_confidence: high, block_on_risk: low, require_memory_hits: true` (all on) | Plan | `low` | `low` | 0 | `min_confidence` fires (Plan-owning gate) — Refine/Implement gates inert this phase |
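
The gate-to-phase mapping and the worked examples can be reproduced with a few lines of comparison logic. This is a sketch, assuming a plain `dict` config and lowercase phase names; `fired_gate` is a hypothetical helper and not the engine's evaluator.

```python
from typing import Optional

# Ordering stated in the contract: low < medium < high.
ORDER = {"low": 0, "medium": 1, "high": 2}


def fired_gate(config: dict, phase: str, confidence: str = "low",
               risk: str = "low", hits: int = 0) -> Optional[str]:
    """Return the name of the gate that fires for `phase`, or None."""
    if phase == "plan":
        floor = config.get("min_confidence", "off")
        # Fires when the actual band is strictly below the floor.
        if floor != "off" and ORDER[confidence] < ORDER[floor]:
            return "min_confidence"
    elif phase == "refine":
        if config.get("require_memory_hits", False) and hits < 1:
            return "require_memory_hits"
    elif phase == "implement":
        ceiling = config.get("block_on_risk", "off")
        # Fires when the actual risk is at or above the ceiling.
        if ceiling != "off" and ORDER[risk] >= ORDER[ceiling]:
            return "block_on_risk"
    return None
```

Because each phase consults exactly one gate, the all-on row of the table reduces to the Plan branch: only `min_confidence` can fire there, whatever the other two keys say.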
+
+## Non-TTY timeout protocol
+
+`on_block=ask` is interactive. In a non-interactive context the
+engine cannot block waiting for keystrokes that will never arrive.
+Detection follows two signals (either disables interactivity):
+
+- environment variable `CI` set to `1`, `true`, `yes` (case-insensitive)
+- `sys.stdin.isatty()` or `sys.stdout.isatty()` returns false
+
+When non-interactive, `on_block=ask` collapses to action `ask_timeout`.
+The consumer (CLI / dispatcher) is expected to:
+
+1. wait `ask_timeout_seconds` for a stdin response;
+2. apply `on_block_fallback` (`stop` or `warn`) when the timeout
+elapses or stdin is closed;
+3. surface `block_reason=ask_timeout` on the decision trace so the
+reason is replay-visible.
+
+Default fallback is `stop` (fail-safe). Flip to `warn` only when CI
+explicitly wants advisory gates.
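
The two detection signals and the `ask` collapse can be sketched as follows. The function names and the injectable parameters are hypothetical, added here so the logic can be exercised without a real terminal; only the `CI` values and the `isatty()` checks come from the contract.

```python
import os
import sys
from typing import Mapping, Optional


def is_interactive(env: Optional[Mapping[str, str]] = None,
                   stdin_tty: Optional[bool] = None,
                   stdout_tty: Optional[bool] = None) -> bool:
    """Return False when either documented signal disables interactivity."""
    env = os.environ if env is None else env
    # Signal 1: CI set to 1 / true / yes, compared case-insensitively.
    if env.get("CI", "").lower() in {"1", "true", "yes"}:
        return False
    # Signal 2: either stream is not attached to a terminal.
    if stdin_tty is None:
        stdin_tty = sys.stdin.isatty()
    if stdout_tty is None:
        stdout_tty = sys.stdout.isatty()
    return stdin_tty and stdout_tty


def resolve_action(on_block: str, interactive: bool) -> str:
    """Collapse `ask` to `ask_timeout` when nobody can answer."""
    if on_block == "ask" and not interactive:
        return "ask_timeout"
    return on_block
```

`stop` and `warn` pass through unchanged, so only `ask` ever depends on the environment.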
+
+## Rollback
+
+The block is config-only. Remove the `decision_engine:` block and
+the engine reverts to observe-only behaviour — no migration, no DB
+state, no schema lock. Per-key removal also works (each key has a
+safe default).
+
+## Test surface
+
+Coverage lives in `tests/work_engine/scoring/test_decision_engine.py`:
+
+- schema parser: defaults, unknown-key rejection, bad-type rejection;
+- gate evaluation: per-phase, per-signal, conflict isolation;
+- TTY detection: env-var detection, fallback to `ask_timeout`;
+- action resolution: `stop` / `warn` short-circuit interactivity.
+
+Wiring tests (dispatcher + hook) live in
+`tests/work_engine/test_decision_gate_hook.py`.
@@ -1,5 +1,6 @@
 ---
 stability: beta
+keep-beta-until: 2026-08-12
 ---
 
 # Decision-trace v1
@@ -113,6 +114,36 @@ the trace inherits the **maximum** risk class across all files the
 phase touched. If no files were touched (pure planning phase), risk
 is `low`.
 
+## Memory consequence keys
+
+**Purpose.** Bound the surface area where a memory hit can be said
+to have *changed* an outcome. Closed list, not open — without this
+bound, every memory call risks the "memory affected everything"
+failure mode (Risk register row 2 of
+[`agents/roadmaps/road-to-proof-not-features.md`](../../agents/roadmaps/road-to-proof-not-features.md)).
+
+**Closed list (v1).** Exactly four keys. Adding a fifth requires a
+schema bump + entry under `### Breaking` in `CHANGELOG.md`.
+
+| Key | Source | Diff semantics |
+|---|---|---|
+| `confidence_band` | Top-level envelope field. | String inequality (`high` ≠ `medium` ≠ `low`). |
+| `risk_class` | Top-level envelope field. | String inequality. |
+| `applied_rules` | Derived: sorted list of `rules[].rule_id` where `applied == true`. | Set inequality. |
+| `test_plan` | Derived: sorted list of test paths captured in the Plan-phase `state.plan.tests` slice. May be `null` when the phase is not `plan` or no Plan-phase tests were captured. | Set inequality; `null` on either side suppresses the key from the diff. |
+
+**Diff semantics.** The producer renders two traces for the same
+phase: one **with** the memory entry consulted, one **without**
+(re-running the heuristic against `memory.hits` decremented by the
+entry's contribution). The `affected` field is the sorted list of
+keys above whose values differ between the two traces. Empty list
+means "consulted but no key diverged" — the call was informational,
+not load-bearing.
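
The diff semantics above amount to a key-by-key comparison of two trace projections. A minimal sketch, assuming each trace has already been reduced to a flat `dict` holding the four closed-list keys (with `applied_rules` and `test_plan` pre-derived); `affected_keys` is a hypothetical name, not the producer's function.

```python
from typing import Any, Dict, List

# Closed list (v1), grouped by how the table says each key is compared.
_STRING_KEYS = {"confidence_band", "risk_class"}
_SET_KEYS = {"applied_rules", "test_plan"}


def affected_keys(with_memory: Dict[str, Any],
                  without_memory: Dict[str, Any]) -> List[str]:
    """Return the sorted list of closed-list keys whose values differ."""
    affected = []
    for key in sorted(_STRING_KEYS | _SET_KEYS):
        a, b = with_memory.get(key), without_memory.get(key)
        if key == "test_plan" and (a is None or b is None):
            continue  # null on either side suppresses the key from the diff
        if key in _SET_KEYS:
            differs = set(a or []) != set(b or [])  # order is irrelevant
        else:
            differs = a != b
        if differs:
            affected.append(key)
    return affected
```

An empty result corresponds to the "consulted but no key diverged" case the contract describes.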
+
+**Out of scope for v1.** Gradations beyond binary key-diverged /
+not-diverged (overridden, combined, filtered). Tracked as a Phase-1-
+gated revisit in the same Risk register.
+
 ## Privacy floor
 
 - `memory.ids` carries opaque ids only — no entry bodies, no secrets.
@@ -1,5 +1,6 @@
 ---
 stability: beta
+keep-beta-until: 2026-08-12
 ---
 
 # Hook architecture v1
@@ -205,6 +206,50 @@ that:
 The dispatcher silently no-ops when called with `--platform copilot`;
 the fallback is consumed by reading the rule, not by hook invocation.
 
+## Fixture corpus — `tests/fixtures/hooks/`
+
+Replay-safe, platform-native payloads. One JSON file per event in the
+agent-config event vocabulary. Consumed by `./agent-config hooks:replay`
+and by the dispatcher replay tests
+(`tests/hooks/test_hooks_replay.py` — Phase 2.4c).
+
+```
+tests/fixtures/hooks/
+session_start.json · session_end.json · user_prompt_submit.json
+pre_tool_use.json · post_tool_use.json · stop.json
+pre_compact.json · agent_error.json
+README.md — corpus contract + platform-shape table
+```
+
+Each fixture is a **stdin payload** — the dispatcher wraps it via
+`_build_envelope` before handing it to a concern. Required keys:
+
+- Valid JSON object at the top level.
+- `session_id` — string, non-empty (drives feedback dir naming).
+- Event-specific fields realistic enough that the bound concerns
+(`chat-history`, `roadmap-progress`, `context-hygiene`,
+`verify-before-complete`, `minimal-safe-diff`) run without raising
+— primarily `tool_name` (for `*_tool_use`), `prompt` (for
+`user_prompt_submit`).
+- No real user content. Committed alongside source; the redaction
+workflow in [`hook-payload-capture`](../hook-payload-capture.md)
+applies to **captured** payloads, not committed fixtures.
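
The mechanically checkable parts of the required-keys list can be sketched as a validator. This is an illustration only: `fixture_problems` and its messages are hypothetical, the event name is assumed to be the fixture's file stem, and the "no real user content" rule is left out because it cannot be verified by code like this.

```python
import json
from typing import List


def fixture_problems(event: str, raw: str) -> List[str]:
    """Return contract violations for one fixture's raw JSON text."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return ["not valid JSON"]
    if not isinstance(payload, dict):
        return ["top level is not a JSON object"]
    problems = []
    session_id = payload.get("session_id")
    if not isinstance(session_id, str) or not session_id:
        problems.append("session_id must be a non-empty string")
    # Event-specific fields named in the contract.
    if event.endswith("_tool_use") and "tool_name" not in payload:
        problems.append("tool_name missing")
    if event == "user_prompt_submit" and "prompt" not in payload:
        problems.append("prompt missing")
    return problems
```

An empty list means the fixture satisfies the structural requirements checked here.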
+
+The corpus is platform-shape-representative, not platform-exhaustive
+— multi-platform shape coverage lives in
+`tests/hooks/test_event_shape_contract.py`. The replay test asserts
+1:1 mapping between `EVENT_VOCABULARY` and this directory.
+
+## Replay mode — `AGENT_CONFIG_REPLAY=1`
+
+Concerns that write under `agents/state/` MUST honor the
+`AGENT_CONFIG_REPLAY` env var: when set to `1`, skip all state
+mutations and run as read-only. The dispatcher passes the env var
+through to subprocess concerns unchanged. Concerns that do not honor
+the flag are listed by `./agent-config hooks:doctor` as not
+replay-safe; replay tests assert no `agents/state/` mutation
+post-invocation.
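
A concern can honor the flag with a single guard in front of every write. A minimal sketch, assuming hypothetical helper names (`replay_enabled`, `write_state`); it is not taken from `scripts/hooks/state_io.py`, whose contents are not shown in this hunk.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def replay_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """True when AGENT_CONFIG_REPLAY is set to exactly "1"."""
    env = os.environ if env is None else env
    return env.get("AGENT_CONFIG_REPLAY") == "1"


def write_state(path: Path, text: str,
                env: Optional[Mapping[str, str]] = None) -> bool:
    """Write `text` to `path` unless replay mode is on; True if written."""
    if replay_enabled(env):
        return False  # read-only run: skip every state mutation
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")
    return True
```

Routing all writes through one helper like this keeps the no-mutation assertion in the replay tests easy to satisfy.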
+
 ## Stability
 
 Beta. Breaking changes between v1 and v2 are allowed in a minor
@@ -218,3 +263,5 @@ majors.
 operational how-to for capturing redacted live payloads to upgrade
 a platform's chat-history extractor from `docs-verified` to
 `payload-verified`.
+- [`tests/fixtures/hooks/README.md`](../../tests/fixtures/hooks/README.md)
+— fixture corpus contract.