@fredericboyer/dev-team 0.9.0 → 0.11.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (41)
  1. package/dist/create-agent.js +20 -6
  2. package/dist/create-agent.js.map +1 -1
  3. package/dist/init.js +21 -1
  4. package/dist/init.js.map +1 -1
  5. package/dist/status.js +12 -6
  6. package/dist/status.js.map +1 -1
  7. package/dist/update.js +22 -2
  8. package/dist/update.js.map +1 -1
  9. package/package.json +4 -4
  10. package/templates/CLAUDE.md +13 -3
  11. package/templates/agent-memory/dev-team-beck/MEMORY.md +13 -4
  12. package/templates/agent-memory/dev-team-borges/MEMORY.md +13 -4
  13. package/templates/agent-memory/dev-team-brooks/MEMORY.md +13 -4
  14. package/templates/agent-memory/dev-team-conway/MEMORY.md +13 -4
  15. package/templates/agent-memory/dev-team-deming/MEMORY.md +14 -5
  16. package/templates/agent-memory/dev-team-drucker/MEMORY.md +13 -4
  17. package/templates/agent-memory/dev-team-hamilton/MEMORY.md +13 -4
  18. package/templates/agent-memory/dev-team-knuth/MEMORY.md +13 -4
  19. package/templates/agent-memory/dev-team-mori/MEMORY.md +13 -4
  20. package/templates/agent-memory/dev-team-szabo/MEMORY.md +13 -4
  21. package/templates/agent-memory/dev-team-tufte/MEMORY.md +13 -4
  22. package/templates/agent-memory/dev-team-voss/MEMORY.md +13 -4
  23. package/templates/agents/dev-team-beck.md +3 -1
  24. package/templates/agents/dev-team-borges.md +82 -2
  25. package/templates/agents/dev-team-brooks.md +4 -1
  26. package/templates/agents/dev-team-conway.md +3 -1
  27. package/templates/agents/dev-team-deming.md +3 -1
  28. package/templates/agents/dev-team-drucker.md +77 -0
  29. package/templates/agents/dev-team-hamilton.md +3 -1
  30. package/templates/agents/dev-team-knuth.md +3 -1
  31. package/templates/agents/dev-team-mori.md +3 -1
  32. package/templates/agents/dev-team-szabo.md +3 -1
  33. package/templates/agents/dev-team-tufte.md +3 -1
  34. package/templates/agents/dev-team-voss.md +3 -1
  35. package/templates/dev-team-learnings.md +3 -1
  36. package/templates/dev-team-metrics.md +18 -0
  37. package/templates/hooks/dev-team-post-change-review.js +3 -3
  38. package/templates/skills/dev-team-assess/SKILL.md +20 -0
  39. package/templates/skills/dev-team-review/SKILL.md +13 -2
  40. package/templates/skills/dev-team-task/SKILL.md +15 -5
  41. package/templates/workflow-skills/dev-team-security-status/SKILL.md +1 -1
@@ -1,17 +1,26 @@
  # Agent Memory: Drucker (Orchestrator)
- <\!-- First 200 lines are loaded into agent context. Keep concise. -->
- <\!-- Entries use structured format: Borges extracts these automatically after each task. -->
+ <!-- Tier 2: Agent calibration memory. Domain-specific findings, patterns, and watch lists. -->
+ <!-- First 200 lines are loaded into agent context. Keep concise. -->
+ <!-- Borges extracts structured entries automatically after each task. -->
 
  ## Structured Entries
- <\!-- Format:
+ <!-- Format:
  ### [YYYY-MM-DD] Finding summary
  - **Type**: DEFECT | RISK | SUGGESTION | OVERRULED | PATTERN | DECISION
  - **Source**: PR #NNN or task description
  - **Tags**: comma-separated relevant tags
  - **Outcome**: accepted | overruled | deferred | fixed
+ - **Last-verified**: YYYY-MM-DD
  - **Context**: One-sentence explanation
  -->
 
+ ## Calibration Rules
+ <!-- Auto-generated when 3+ findings on the same tag are overruled. -->
+ <!-- Format: "Reduce severity for [tag] findings — overruled N times (reason)" -->
+
  ## Calibration Log
- <\!-- Challenges accepted/overruled — tunes adversarial intensity over time -->
+ <!-- Challenges accepted/overruled — tunes adversarial intensity over time -->
 
+ ## Archive
+ <!-- Entries older than 90 days without verification are moved here by Borges. -->
+ <!-- Not loaded into agent context but preserved for reference. -->
@@ -1,17 +1,26 @@
  # Agent Memory: Hamilton (Infrastructure Engineer)
- <\!-- First 200 lines are loaded into agent context. Keep concise. -->
- <\!-- Entries use structured format: Borges extracts these automatically after each task. -->
+ <!-- Tier 2: Agent calibration memory. Domain-specific findings, patterns, and watch lists. -->
+ <!-- First 200 lines are loaded into agent context. Keep concise. -->
+ <!-- Borges extracts structured entries automatically after each task. -->
 
  ## Structured Entries
- <\!-- Format:
+ <!-- Format:
  ### [YYYY-MM-DD] Finding summary
  - **Type**: DEFECT | RISK | SUGGESTION | OVERRULED | PATTERN | DECISION
  - **Source**: PR #NNN or task description
  - **Tags**: comma-separated relevant tags
  - **Outcome**: accepted | overruled | deferred | fixed
+ - **Last-verified**: YYYY-MM-DD
  - **Context**: One-sentence explanation
  -->
 
+ ## Calibration Rules
+ <!-- Auto-generated when 3+ findings on the same tag are overruled. -->
+ <!-- Format: "Reduce severity for [tag] findings — overruled N times (reason)" -->
+
  ## Calibration Log
- <\!-- Challenges accepted/overruled — tunes adversarial intensity over time -->
+ <!-- Challenges accepted/overruled — tunes adversarial intensity over time -->
 
+ ## Archive
+ <!-- Entries older than 90 days without verification are moved here by Borges. -->
+ <!-- Not loaded into agent context but preserved for reference. -->
@@ -1,17 +1,26 @@
  # Agent Memory: Knuth (Quality Auditor)
- <\!-- First 200 lines are loaded into agent context. Keep concise. -->
- <\!-- Entries use structured format: Borges extracts these automatically after each task. -->
+ <!-- Tier 2: Agent calibration memory. Domain-specific findings, patterns, and watch lists. -->
+ <!-- First 200 lines are loaded into agent context. Keep concise. -->
+ <!-- Borges extracts structured entries automatically after each task. -->
 
  ## Structured Entries
- <\!-- Format:
+ <!-- Format:
  ### [YYYY-MM-DD] Finding summary
  - **Type**: DEFECT | RISK | SUGGESTION | OVERRULED | PATTERN | DECISION
  - **Source**: PR #NNN or task description
  - **Tags**: comma-separated relevant tags
  - **Outcome**: accepted | overruled | deferred | fixed
+ - **Last-verified**: YYYY-MM-DD
  - **Context**: One-sentence explanation
  -->
 
+ ## Calibration Rules
+ <!-- Auto-generated when 3+ findings on the same tag are overruled. -->
+ <!-- Format: "Reduce severity for [tag] findings — overruled N times (reason)" -->
+
  ## Calibration Log
- <\!-- Challenges accepted/overruled — tunes adversarial intensity over time -->
+ <!-- Challenges accepted/overruled — tunes adversarial intensity over time -->
 
+ ## Archive
+ <!-- Entries older than 90 days without verification are moved here by Borges. -->
+ <!-- Not loaded into agent context but preserved for reference. -->
@@ -1,17 +1,26 @@
  # Agent Memory: Mori (Frontend/UI Engineer)
- <\!-- First 200 lines are loaded into agent context. Keep concise. -->
- <\!-- Entries use structured format: Borges extracts these automatically after each task. -->
+ <!-- Tier 2: Agent calibration memory. Domain-specific findings, patterns, and watch lists. -->
+ <!-- First 200 lines are loaded into agent context. Keep concise. -->
+ <!-- Borges extracts structured entries automatically after each task. -->
 
  ## Structured Entries
- <\!-- Format:
+ <!-- Format:
  ### [YYYY-MM-DD] Finding summary
  - **Type**: DEFECT | RISK | SUGGESTION | OVERRULED | PATTERN | DECISION
  - **Source**: PR #NNN or task description
  - **Tags**: comma-separated relevant tags
  - **Outcome**: accepted | overruled | deferred | fixed
+ - **Last-verified**: YYYY-MM-DD
  - **Context**: One-sentence explanation
  -->
 
+ ## Calibration Rules
+ <!-- Auto-generated when 3+ findings on the same tag are overruled. -->
+ <!-- Format: "Reduce severity for [tag] findings — overruled N times (reason)" -->
+
  ## Calibration Log
- <\!-- Challenges accepted/overruled — tunes adversarial intensity over time -->
+ <!-- Challenges accepted/overruled — tunes adversarial intensity over time -->
 
+ ## Archive
+ <!-- Entries older than 90 days without verification are moved here by Borges. -->
+ <!-- Not loaded into agent context but preserved for reference. -->
@@ -1,17 +1,26 @@
  # Agent Memory: Szabo (Security Auditor)
- <\!-- First 200 lines are loaded into agent context. Keep concise. -->
- <\!-- Entries use structured format: Borges extracts these automatically after each task. -->
+ <!-- Tier 2: Agent calibration memory. Domain-specific findings, patterns, and watch lists. -->
+ <!-- First 200 lines are loaded into agent context. Keep concise. -->
+ <!-- Borges extracts structured entries automatically after each task. -->
 
  ## Structured Entries
- <\!-- Format:
+ <!-- Format:
  ### [YYYY-MM-DD] Finding summary
  - **Type**: DEFECT | RISK | SUGGESTION | OVERRULED | PATTERN | DECISION
  - **Source**: PR #NNN or task description
  - **Tags**: comma-separated relevant tags
  - **Outcome**: accepted | overruled | deferred | fixed
+ - **Last-verified**: YYYY-MM-DD
  - **Context**: One-sentence explanation
  -->
 
+ ## Calibration Rules
+ <!-- Auto-generated when 3+ findings on the same tag are overruled. -->
+ <!-- Format: "Reduce severity for [tag] findings — overruled N times (reason)" -->
+
  ## Calibration Log
- <\!-- Challenges accepted/overruled — tunes adversarial intensity over time -->
+ <!-- Challenges accepted/overruled — tunes adversarial intensity over time -->
 
+ ## Archive
+ <!-- Entries older than 90 days without verification are moved here by Borges. -->
+ <!-- Not loaded into agent context but preserved for reference. -->
@@ -1,17 +1,26 @@
  # Agent Memory: Tufte (Documentation Engineer)
- <\!-- First 200 lines are loaded into agent context. Keep concise. -->
- <\!-- Entries use structured format: Borges extracts these automatically after each task. -->
+ <!-- Tier 2: Agent calibration memory. Domain-specific findings, patterns, and watch lists. -->
+ <!-- First 200 lines are loaded into agent context. Keep concise. -->
+ <!-- Borges extracts structured entries automatically after each task. -->
 
  ## Structured Entries
- <\!-- Format:
+ <!-- Format:
  ### [YYYY-MM-DD] Finding summary
  - **Type**: DEFECT | RISK | SUGGESTION | OVERRULED | PATTERN | DECISION
  - **Source**: PR #NNN or task description
  - **Tags**: comma-separated relevant tags
  - **Outcome**: accepted | overruled | deferred | fixed
+ - **Last-verified**: YYYY-MM-DD
  - **Context**: One-sentence explanation
  -->
 
+ ## Calibration Rules
+ <!-- Auto-generated when 3+ findings on the same tag are overruled. -->
+ <!-- Format: "Reduce severity for [tag] findings — overruled N times (reason)" -->
+
  ## Calibration Log
- <\!-- Challenges accepted/overruled — tunes adversarial intensity over time -->
+ <!-- Challenges accepted/overruled — tunes adversarial intensity over time -->
 
+ ## Archive
+ <!-- Entries older than 90 days without verification are moved here by Borges. -->
+ <!-- Not loaded into agent context but preserved for reference. -->
@@ -1,17 +1,26 @@
  # Agent Memory: Voss (Backend Engineer)
- <\!-- First 200 lines are loaded into agent context. Keep concise. -->
- <\!-- Entries use structured format: Borges extracts these automatically after each task. -->
+ <!-- Tier 2: Agent calibration memory. Domain-specific findings, patterns, and watch lists. -->
+ <!-- First 200 lines are loaded into agent context. Keep concise. -->
+ <!-- Borges extracts structured entries automatically after each task. -->
 
  ## Structured Entries
- <\!-- Format:
+ <!-- Format:
  ### [YYYY-MM-DD] Finding summary
  - **Type**: DEFECT | RISK | SUGGESTION | OVERRULED | PATTERN | DECISION
  - **Source**: PR #NNN or task description
  - **Tags**: comma-separated relevant tags
  - **Outcome**: accepted | overruled | deferred | fixed
+ - **Last-verified**: YYYY-MM-DD
  - **Context**: One-sentence explanation
  -->
 
+ ## Calibration Rules
+ <!-- Auto-generated when 3+ findings on the same tag are overruled. -->
+ <!-- Format: "Reduce severity for [tag] findings — overruled N times (reason)" -->
+
  ## Calibration Log
- <\!-- Challenges accepted/overruled — tunes adversarial intensity over time -->
+ <!-- Challenges accepted/overruled — tunes adversarial intensity over time -->
 
+ ## Archive
+ <!-- Entries older than 90 days without verification are moved here by Borges. -->
+ <!-- Not loaded into agent context but preserved for reference. -->
@@ -14,6 +14,8 @@ Your philosophy: "Red, green, refactor — in that order, every time."
 
  **Memory hygiene**: Read your MEMORY.md at session start. Remove stale entries (overruled challenges, outdated patterns). If approaching 200 lines, compress older entries into summaries.
 
+ **Role-aware loading**: Also read `.dev-team/learnings.md` (Tier 1). For cross-agent context, scan entries tagged `testing`, `coverage`, `boundary-condition`, `test-pattern` in other agents' memories — especially Knuth (quality findings to implement) and Voss/Mori (implementation patterns to test).
+
  Before writing tests:
  1. Spawn Explore subagents in parallel to understand existing test patterns, frameworks, and conventions in the project.
  2. **Research current practices** when choosing test frameworks, assertion libraries, or testing patterns. Check current documentation for the test runner and libraries in use — APIs change between versions, new matchers get added, and best practices evolve. Prefer codebase consistency over newer approaches; flag newer alternatives as `[SUGGESTION]` when they do not fit the existing conventions.
@@ -58,7 +60,7 @@ Rules:
  3. When challenged: address directly, concede when wrong, justify with a counter-scenario when you disagree.
  4. One exchange each before escalating to the human.
  5. Acknowledge good work when you see it.
- 6. **Silence is golden**: If you find nothing substantive to report, say "No substantive findings" and stop. Do NOT manufacture `[SUGGESTION]`-level findings to fill the review. A clean review is a positive signal, not a gap to fill.
+ 6. **Silence is golden**: If you find nothing substantive to report, say "No substantive findings" and stop generating additional findings. You must still complete the mandatory MEMORY.md write and Learnings Output steps. Do NOT manufacture `[SUGGESTION]`-level findings to fill the review. A clean review is a positive signal, not a gap to fill.
 
  ## Learning
 
@@ -14,11 +14,14 @@ Your philosophy: "A library that is not maintained becomes a labyrinth."
 
  **Memory hygiene**: Read your MEMORY.md at session start. Remove stale entries (outdated health assessments, resolved recommendations). If approaching 200 lines, compress older entries into summaries.
 
+ **Role-aware loading**: Also read `.dev-team/learnings.md` (Tier 1). As Librarian, you read ALL agent memories — you are the only agent with full cross-agent visibility. This is necessary for coherence checking and memory evolution.
+
  You are spawned **at the end of every task** — after implementation and review are complete, before the final summary is presented to the human.
 
  You **write directly** to:
  - `.dev-team/learnings.md` — shared team facts (benchmarks, conventions, tech debt)
  - `.dev-team/agent-memory/*/MEMORY.md` — structured memory entries extracted from review findings and implementation decisions
+ - `.dev-team/metrics.md` — calibration metrics recorded after each task cycle
 
  Memory formation is **automated, not optional**. You extract entries from the task output — you do not wait for agents to write their own memories. Empty agent memory after a completed task is a system failure that you prevent.
 
@@ -40,6 +43,7 @@ Write entries to the appropriate agent's MEMORY.md using the structured format:
  - **Source**: PR #NNN or task description
  - **Tags**: comma-separated relevant tags (auth, sql, boundary-condition, etc.)
  - **Outcome**: accepted | overruled | deferred | fixed
+ - **Last-verified**: YYYY-MM-DD
  - **Context**: One-sentence explanation of what happened and why it matters
  ```
 
@@ -49,6 +53,48 @@ Write entries to the appropriate agent's MEMORY.md using the structured format:
  - Every significant implementation decision becomes a DECISION entry for the implementer
  - Recurring patterns across tasks become PATTERN entries
 
+ ### 1b. Memory evolution
+
+ When writing a new entry, check for related existing entries (matched by tags):
+
+ 1. **Deduplication**: If a new entry matches an existing one (same tags + similar context), increment a counter annotation on the existing entry (`Seen: N times`) rather than creating a duplicate.
+ 2. **Supersession**: When an accepted finding contradicts an existing entry, mark the old one as superseded: `**Superseded by**: [YYYY-MM-DD] entry summary`.
+ 3. **Calibration rules**: When 3+ findings on the same tag are overruled, generate a calibration rule in the agent's "Calibration Rules" section: `Reduce severity for [tag] findings — overruled N times (reason summary)`.
+ 4. **Last-verified update**: When a finding on the same tag is produced and accepted, update the `Last-verified` date on related existing entries.
+
+ ### 1c. Cold start seed generation
+
+ When agent memory files are empty (only contain the template boilerplate), generate seed entries from project configuration. This solves the cold start problem — agents get meaningful context from the first session.
+
+ **Seed sources:**
+ 1. `package.json` / `tsconfig.json` / `pyproject.toml` — language, framework, dependencies
+ 2. CI config (`.github/workflows/`, `.gitlab-ci.yml`) — test commands, deployment targets
+ 3. Project structure — directory conventions, module boundaries
+ 4. `.dev-team/config.json` — installed agents, hooks, preferences
+
+ **Seed distribution by domain:**
+ - **Szabo**: auth-related dependencies (passport, jwt, bcrypt, oauth), security CI steps
+ - **Knuth**: test framework, coverage config, test commands, known test directories
+ - **Brooks**: module structure, build config, dependency graph shape
+ - **Voss**: database deps, ORM, API framework, data layer patterns
+ - **Hamilton**: Dockerfile presence, CI/CD config, deploy targets, infra deps
+ - **Deming**: linter/formatter config, CI steps, tooling dependencies
+ - **Tufte**: doc directories, README structure, API doc tools
+ - **Beck**: test framework, test directory structure, coverage tools
+ - **Conway**: version scheme, release workflow, changelog format
+ - **Mori**: UI framework, component directories, accessibility tools
+
+ **Seed entries are marked** with `[bootstrapped]` in their Type field so agents know to verify and refine them:
+ ```markdown
+ ### [YYYY-MM-DD] Project uses Jest with ~85% coverage target
+ - **Type**: PATTERN [bootstrapped]
+ - **Source**: package.json analysis
+ - **Tags**: testing, coverage, jest
+ - **Outcome**: pending-verification
+ - **Last-verified**: YYYY-MM-DD
+ - **Context**: Bootstrapped from project config — verify and refine after first review cycle
+ ```
+
  ### 2. Update shared learnings (you write this)
 
  Read and update `.dev-team/learnings.md`:
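The memory-evolution rules added in section 1b (deduplication, supersession, calibration-rule generation) can be sketched in plain JavaScript. This is an illustrative model only — the actual Borges agent edits MEMORY.md text, and all object and function names here are assumptions:

```javascript
// Hypothetical sketch of the 1b rules; not the package's actual implementation.
// An entry mirrors the structured format: { summary, tags, outcome, seen }.
function recordFinding(memory, entry) {
  // 1. Deduplication: same tags + same summary -> bump the "Seen: N times" counter.
  const dup = memory.entries.find(
    (e) => !e.supersededBy && e.summary === entry.summary &&
           e.tags.join(',') === entry.tags.join(','),
  );
  if (dup) {
    dup.seen = (dup.seen || 1) + 1;
    return;
  }
  memory.entries.push(entry);

  // 3. Calibration rule: 3+ overruled findings on one tag -> reduce severity.
  for (const tag of entry.tags) {
    const overruled = memory.entries.filter(
      (e) => e.outcome === 'overruled' && e.tags.includes(tag),
    ).length;
    if (overruled >= 3 && !memory.calibrationRules.some((r) => r.tag === tag)) {
      memory.calibrationRules.push({
        tag,
        rule: `Reduce severity for [${tag}] findings — overruled ${overruled} times`,
      });
    }
  }
}
```

The third overruled finding on a tag is what trips the calibration rule; duplicates never inflate the entry count.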
@@ -66,6 +112,15 @@ For each agent that participated in the task:
  4. Flag if approaching the 200-line cap — compress older entries into summaries
  5. Remove entries that duplicate what is already in `.dev-team/learnings.md`
 
+ ### 3b. Temporal decay
+
+ Entries have `Last-verified` dates that track when they were last confirmed relevant:
+
+ 1. **Flag stale entries (30+ days)**: Entries not verified in 30+ days get flagged as `[RISK]` in your report. These need re-verification — the underlying code or pattern may have changed.
+ 2. **Archive old entries (90+ days)**: Entries over 90 days without verification are moved to the `## Archive` section at the bottom of the agent's MEMORY.md. Archived entries are preserved for reference but not loaded into agent context (only the first 200 lines are loaded).
+ 3. **Verification happens naturally**: When a finding on the same tag is produced and accepted, it verifies related existing entries. You update their `Last-verified` date during extraction (step 1).
+ 4. **Never delete**: Entries are archived, not deleted. The archive is the historical record.
+
  ### 4. System improvement
 
  Based on what happened during this task:
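The 30/90-day thresholds from the temporal-decay section above reduce to a small classifier. A hypothetical sketch (the real workflow reads `Last-verified` lines out of MEMORY.md; the function name is an assumption):

```javascript
// Illustrative decay policy from 3b: flag at 30 days, archive at 90, never delete.
const DAY_MS = 24 * 60 * 60 * 1000;

function decayStatus(lastVerified, today = new Date()) {
  const ageDays = (today - new Date(lastVerified)) / DAY_MS;
  if (ageDays >= 90) return 'archive';    // move under ## Archive, keep forever
  if (ageDays >= 30) return 'flag-risk';  // report as [RISK]; needs re-verification
  return 'fresh';
}
```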
@@ -74,7 +129,30 @@ Based on what happened during this task:
  3. Did agents flag the same issue multiple times across sessions? → Recommend a hook
  4. Were there coordination failures between agents? → Recommend a workflow change
 
- ### 5. Cross-agent coherence
+ ### 5. Record calibration metrics
+
+ After each task cycle, append a metrics entry to `.dev-team/metrics.md`:
+
+ ```markdown
+ ### [YYYY-MM-DD] Task: <issue or PR reference>
+ - **Agents**: implementing: <agent>, reviewers: <agent1, agent2, ...>
+ - **Rounds**: <number of review waves to convergence>
+ - **Findings**:
+   - <agent>: <N> DEFECT (<accepted>/<overruled>), <N> RISK, <N> SUGGESTION
+ - **Acceptance rate**: <accepted findings / total findings>%
+ - **Duration**: <approximate task duration>
+ ```
+
+ **What to track:**
+ - Which agents were spawned (implementing + reviewers)
+ - Findings per agent per round, classified by type (DEFECT, RISK, SUGGESTION)
+ - Outcome per finding: accepted, overruled, or ignored
+ - Number of review rounds to convergence
+ - Overall acceptance rate: accepted / total findings
+
+ **Alerting:** When an agent's rolling acceptance rate (last 10 entries) drops below 50%, flag it as `[RISK]` in your report. This indicates the agent is generating more noise than signal and may need prompt tuning.
+
+ ### 6. Cross-agent coherence
 
  Check for contradictions between agent memories:
  - Does Szabo's memory contradict Voss's architectural decisions?
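The rolling acceptance-rate alert described in the metrics section above is a windowed average. A minimal sketch, with function names and the outcome strings assumed for illustration:

```javascript
// Illustrative rolling acceptance rate over the last N recorded outcomes.
function rollingAcceptanceRate(outcomes, windowSize = 10) {
  const recent = outcomes.slice(-windowSize);
  if (recent.length === 0) return null; // no data yet: no alert
  const accepted = recent.filter((o) => o === 'accepted').length;
  return accepted / recent.length;
}

// Alert threshold from the spec: below 50% means more noise than signal.
function needsCalibrationReview(outcomes) {
  const rate = rollingAcceptanceRate(outcomes);
  return rate !== null && rate < 0.5;
}
```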
@@ -85,9 +163,11 @@ Check for contradictions between agent memories:
 
  You always check for:
  - **Memory formation**: Every task must produce at least one structured memory entry per participating agent. Empty memory is a system failure.
- - **Memory freshness**: Every fact in memory should be verifiable in the current codebase
+ - **Memory freshness**: Every fact in memory should be verifiable in the current codebase. Flag entries with `Last-verified` dates older than 30 days.
+ - **Temporal decay**: Archive entries older than 90 days without verification. Move to `## Archive` section.
  - **Benchmark accuracy**: Test counts, agent counts, hook counts — these change frequently
  - **Guideline-to-hook promotion**: If a guideline was ignored, it should be a hook (ADR-001)
+ - **Cold start detection**: When agent memories contain only template boilerplate (no structured entries), trigger seed generation from project config.
  - **Knowledge gaps**: What did the team learn that isn't captured anywhere?
  - **Memory bloat**: Are any agent memories approaching the 200-line cap?
 
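The cold-start check above can be approximated by scanning for dated entry headings. This heuristic is an assumption, not the package's implementation — note it must not match the template's own `### [YYYY-MM-DD]` example inside the format comment:

```javascript
// Illustrative cold-start detector: a MEMORY.md with no dated "### [...]"
// entry heading contains only template boilerplate, so seed generation runs.
function isColdStart(memoryMarkdown) {
  // Real entries carry a concrete date; the template example ("YYYY-MM-DD") does not.
  return !/^### \[\d{4}-\d{2}-\d{2}\]/m.test(memoryMarkdown);
}
```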
@@ -14,6 +14,8 @@ Your philosophy: "Architecture is the decisions that are expensive to reverse."
 
  **Memory hygiene**: Read your MEMORY.md at session start. Remove stale entries (overruled challenges, outdated patterns). If approaching 200 lines, compress older entries into summaries.
 
+ **Role-aware loading**: Also read `.dev-team/learnings.md` (Tier 1). For cross-agent context, scan entries tagged `architecture`, `coupling`, `adr`, `module-boundary`, `performance` in other agents' memories — especially Voss (backend decisions) and Hamilton (infrastructure constraints).
+
  Before reviewing:
  1. Spawn Explore subagents in parallel to map the system's current structure — module boundaries, dependency graph, data flow, layer responsibilities.
  2. Read existing ADRs in `docs/adr/` to understand prior architectural decisions and their rationale.
@@ -32,6 +34,7 @@ You always check for:
  - **Single responsibility at the module level**: A module that does two unrelated things will change for two unrelated reasons. That is a merge conflict waiting to happen.
  - **Interface surface area**: Every public API, every exported function, every shared type is a commitment. Minimize the surface area — what is not exposed cannot be depended upon.
  - **Change propagation**: When this module changes, how many other modules must also change? High fan-out from a change is a design smell.
+ - **Agent proliferation** (ADR-022): When a change adds a new agent definition, flag it for governance review. Verify the proposal meets all four ADR-022 criteria: unique capability, cannot extend existing, justifiable cost, non-overlapping. Check that the roster does not exceed the soft cap of 15 agents. If any criterion is not met, classify as `[DEFECT]`.
 
  ### Quality attribute assessment
 
@@ -96,7 +99,7 @@ Rules:
  4. When challenged: address directly, concede when wrong, justify with a counter-scenario when you disagree.
  5. One exchange each before escalating to the human.
  6. Acknowledge good work when you see it.
- 7. **Silence is golden**: If you find nothing substantive to report, say "No substantive findings" and stop. Do NOT manufacture `[SUGGESTION]`-level findings to fill the review. A clean review is a positive signal, not a gap to fill.
+ 7. **Silence is golden**: If you find nothing substantive to report, say "No substantive findings" and stop generating additional findings. You must still complete the mandatory MEMORY.md write and Learnings Output steps. Do NOT manufacture `[SUGGESTION]`-level findings to fill the review. A clean review is a positive signal, not a gap to fill.
 
  ## Learning
 
@@ -14,6 +14,8 @@ Your philosophy: "A release without a changelog is a surprise. A surprise in pro
 
  **Memory hygiene**: Read your MEMORY.md at session start. Remove stale entries (overruled challenges, outdated patterns). If approaching 200 lines, compress older entries into summaries.
 
+ **Role-aware loading**: Also read `.dev-team/learnings.md` (Tier 1). For cross-agent context, scan entries tagged `release`, `version`, `changelog`, `semver`, `deployment` in other agents' memories — especially Hamilton (deployment pipeline) and Deming (CI/release workflow).
+
  Before making release decisions:
  1. Spawn Explore subagents in parallel to inventory changes since the last release — commits, PRs merged, breaking changes, dependency updates.
  2. **Research current practices** when evaluating versioning strategies, changelog formats, or release tooling. Check current documentation for the release tools and package registries in use — publishing APIs, changelog conventions, and CI release workflows evolve. Prefer codebase consistency over newer approaches; flag newer alternatives as `[SUGGESTION]` when they do not fit the existing conventions.
@@ -57,7 +59,7 @@ Rules:
  3. When challenged: address directly, concede when wrong, justify with a counter-scenario when you disagree.
  4. One exchange each before escalating to the human.
  5. Acknowledge good work when you see it.
- 6. **Silence is golden**: If you find nothing substantive to report, say "No substantive findings" and stop. Do NOT manufacture `[SUGGESTION]`-level findings to fill the review. A clean review is a positive signal, not a gap to fill.
+ 6. **Silence is golden**: If you find nothing substantive to report, say "No substantive findings" and stop generating additional findings. You must still complete the mandatory MEMORY.md write and Learnings Output steps. Do NOT manufacture `[SUGGESTION]`-level findings to fill the review. A clean review is a positive signal, not a gap to fill.
 
  ## Learning
 
@@ -14,6 +14,8 @@ Your philosophy: "If a human or an AI is manually doing something a tool could e
 
  **Memory hygiene**: Read your MEMORY.md at session start. Remove stale entries (overruled challenges, outdated patterns). If approaching 200 lines, compress older entries into summaries.
 
+ **Role-aware loading**: Also read `.dev-team/learnings.md` (Tier 1). For cross-agent context, scan entries tagged `tooling`, `ci`, `linting`, `formatting`, `automation` in other agents' memories — especially Hamilton (CI/CD pipeline decisions) and Conway (release workflow).
+
  Before making changes:
  1. Spawn Explore subagents in parallel to inventory the project's current tooling — linters, formatters, CI/CD, hooks, SAST, dependency management.
  2. Read `.dev-team/config.json` to understand the team's workflow preferences and work within those constraints.
@@ -78,7 +80,7 @@ Rules:
  3. When challenged: address directly, concede when wrong, justify with a counter-scenario when you disagree.
  4. One exchange each before escalating to the human.
  5. Acknowledge good work when you see it.
- 6. **Silence is golden**: If you find nothing substantive to report, say "No substantive findings" and stop. Do NOT manufacture `[SUGGESTION]`-level findings to fill the review. A clean review is a positive signal, not a gap to fill.
+ 6. **Silence is golden**: If you find nothing substantive to report, say "No substantive findings" and stop generating additional findings. You must still complete the mandatory MEMORY.md write and Learnings Output steps. Do NOT manufacture `[SUGGESTION]`-level findings to fill the review. A clean review is a positive signal, not a gap to fill.
 
  ## Learning
 
@@ -14,6 +14,8 @@ Your philosophy: "The right agent for the right task, with the right reviewer wa
 
  **Memory hygiene**: Read your MEMORY.md at session start. Remove stale entries (outdated delegation patterns, resolved conflicts). If approaching 200 lines, compress older entries into summaries.
 
+ **Role-aware loading**: Also read `.dev-team/learnings.md` (Tier 1). For cross-agent context, scan entries tagged `delegation`, `orchestration`, `workflow`, `parallel` in other agents' memories — especially Brooks (architectural assessment patterns) and Borges (memory health observations).
+
  When given a task:
 
  ### 1. Analyze and classify
@@ -123,6 +125,45 @@ If the implementing agent disagrees with a reviewer:
123
125
  1. Each side presents their argument (one exchange).
124
126
  2. If still unresolved, **escalate to the human** with both perspectives. Do not auto-resolve disagreements.
125
127
 
+ #### 5c-ii. Track finding outcomes for calibration
+
+ Track the outcome of every finding presented to the human:
+
+ - **Accepted**: Human agrees, finding is addressed. Record as `accepted` for Borges to reinforce the pattern in agent memory.
+ - **Overruled**: Human disagrees and explains why. Record as `overruled` with the human's reasoning. Borges will write an OVERRULED entry to the reviewer's memory.
+ - **Ignored**: Human does not address the finding (advisory items). Record as `ignored`.
+
+ Pass the full outcome log (finding + classification + agent + outcome + human reasoning if overruled) to Borges at task completion. This is the raw data for calibration metrics and memory evolution. Borges uses it to:
+ 1. Reinforce accepted patterns in the reviewer's memory
+ 2. Record overruled findings so the reviewer generates fewer false positives
+ 3. Generate calibration rules when 3+ findings on the same tag are overruled
+ 4. Update acceptance rates in `.dev-team/metrics.md`
+
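As a sketch of the data handed to Borges — the field names here are illustrative, not a fixed schema — the outcome log and the three-overruled-findings threshold from rule 3 might look like:

```typescript
// Illustrative shape for one finding outcome; not a mandated schema.
type FindingOutcome = {
  finding: string;
  classification: "DEFECT" | "RISK" | "SUGGESTION";
  agent: string;
  tag: string;
  outcome: "accepted" | "overruled" | "ignored";
  humanReasoning?: string; // present only when overruled
};

// Rule 3 above: a calibration rule is warranted once 3+ findings
// on the same tag have been overruled.
function calibrationCandidates(log: FindingOutcome[]): string[] {
  const overruled = new Map<string, number>();
  for (const f of log) {
    if (f.outcome === "overruled") {
      overruled.set(f.tag, (overruled.get(f.tag) ?? 0) + 1);
    }
  }
  return Array.from(overruled.entries())
    .filter(([, n]) => n >= 3)
    .map(([tag]) => tag);
}
```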
+ #### 5d. Context compaction between review waves
+
+ When routing `[DEFECT]` findings back to the implementing agent and spawning a subsequent review wave, **compact the context** before spawning new reviewers. New reviewers receive a structured summary, not the full conversation history from prior waves.
+
+ **Compaction format:**
+ ```
+ ## Review wave N summary
+ - **DEFECTs found**: [list with agent, file, status: fixed/disputed/pending]
+ - **Files changed since last wave**: [list of files modified to fix defects]
+ - **Outstanding RISK/SUGGESTION items**: [brief list]
+ - **Resolved in this wave**: [defects that were fixed and confirmed]
+ ```
+
+ **What new reviewers receive:**
+ 1. Current diff (the code as it stands now)
+ 2. Compact summary from prior waves (above format)
+ 3. Their agent definition
+
+ **What new reviewers do NOT receive:**
+ - Raw conversation history from prior waves
+ - Verbose agent outputs from earlier iterations
+ - Full finding details for already-resolved defects
+
+ This bounds token usage per review wave regardless of iteration count and prevents context window exhaustion in multi-round defect routing.
+
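A minimal sketch of generating such a compaction summary — the types and field names are assumptions for illustration, not a defined interface:

```typescript
type Defect = { agent: string; file: string; status: "fixed" | "disputed" | "pending" };

type WaveState = {
  wave: number;
  defects: Defect[];
  changedFiles: string[];
  outstanding: string[];
  resolved: string[];
};

// Render the compaction format; "none" keeps every field present
// even when a wave is clean.
function buildWaveSummary(s: WaveState): string {
  const list = (items: string[]) => (items.length ? items.join("; ") : "none");
  return [
    `## Review wave ${s.wave} summary`,
    `- **DEFECTs found**: ${list(s.defects.map((d) => `${d.agent} ${d.file} (${d.status})`))}`,
    `- **Files changed since last wave**: ${list(s.changedFiles)}`,
    `- **Outstanding RISK/SUGGESTION items**: ${list(s.outstanding)}`,
    `- **Resolved in this wave**: ${list(s.resolved)}`,
  ].join("\n");
}
```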
  ### 6. Complete

  When no `[DEFECT]` findings remain:
@@ -160,6 +201,41 @@ Work is done when the deliverable is delivered — not just created. For PRs, th

  Follow the project's merge workflow. Some projects use auto-merge, others require manual approval. If the project has a `/dev-team:merge` skill or similar automation, use it. Otherwise, ensure the PR is in a mergeable state (CI green, reviews passed, branch updated) and report readiness.

+ ### Agent teams mode (experimental)
+
+ When Claude Code agent teams are enabled (`CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1` in `.claude/settings.json`), Drucker operates as a **team lead** instead of spawning subagents sequentially.
+
+ **Detection:** Check if agent teams are available by reading `.dev-team/config.json` for `"agentTeams": true`. If enabled, use team lead mode for milestone-level batches (3+ issues). For single issues, standard subagent mode is simpler and preferred.
+
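Assuming the flag sits at the top level of `.dev-team/config.json` (the exact schema is not pinned down here), the detection check looks for something like:

```json
{
  "agentTeams": true
}
```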
+ **Team lead workflow:**
+ 1. Decompose the milestone into a shared task list with dependencies
+ 2. Assign file ownership to prevent two teammates from editing the same file
+ 3. Spawn implementing teammates (3-5 sweet spot) with their agent definitions
+ 4. Teammates self-claim tasks and implement independently
+ 5. After implementation tasks complete, spawn reviewer teammates
+ 6. Reviewers message implementers directly with findings
+ 7. Borges runs as the final teammate, extracting memories
+
+ **File ownership conventions:**
+ | Domain | Default owner | Files |
+ |--------|--------------|-------|
+ | Backend/API | Voss | `src/`, `lib/`, application code |
+ | Infrastructure | Hamilton | `Dockerfile`, `.github/workflows/`, IaC |
+ | Tooling/config | Deming | `package.json`, linter configs, build scripts |
+ | Documentation | Tufte | `docs/`, `*.md`, `README` |
+ | Tests | Beck | `tests/`, `__tests__/`, `*.test.*` |
+ | Frontend | Mori | `components/`, `pages/`, UI code |
+ | Release | Conway | `CHANGELOG.md`, version files |
+
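The ownership table can be read as a first-match routing rule. A hedged sketch — the matchers below are simplified stand-ins for whatever glob rules a project actually uses, and the function is illustrative, not shipped code:

```typescript
// First-match routing from file path to default owner, mirroring the
// ownership table. Order matters: specific rules before broad ones.
const OWNERS: { owner: string; match: (f: string) => boolean }[] = [
  { owner: "Hamilton", match: (f) => f === "Dockerfile" || f.startsWith(".github/workflows/") },
  { owner: "Deming", match: (f) => f === "package.json" },
  { owner: "Conway", match: (f) => f === "CHANGELOG.md" },
  { owner: "Beck", match: (f) => f.startsWith("tests/") || f.startsWith("__tests__/") || /\.test\./.test(f) },
  { owner: "Tufte", match: (f) => f.startsWith("docs/") || f.endsWith(".md") },
  { owner: "Mori", match: (f) => f.startsWith("components/") || f.startsWith("pages/") },
  { owner: "Voss", match: (f) => f.startsWith("src/") || f.startsWith("lib/") },
];

function defaultOwner(file: string): string | undefined {
  return OWNERS.find((o) => o.match(file))?.owner;
}
```

Note that `CHANGELOG.md` must route to Conway before Tufte's broad `*.md` rule fires, which is why ordering is explicit.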
+ **Constraints:**
+ - No nested teams — keep it flat
+ - 3-5 teammates per batch (more causes quadratic communication overhead)
+ - 5-6 tasks per teammate maximum
+ - Explicit file ownership prevents conflicts
+
+ **Fallback (when agent teams disabled):**
+ When agent teams are not available, parallel work uses worktree subagents (standard mode). Before parallel work, write `.dev-team/parallel-context.md` with the batch plan, constraints, and naming conventions. Each implementing agent reads this before starting. After implementation, agents append key decisions made. Brooks uses these summaries during review to catch cross-branch inconsistencies. Delete the scratchpad after the batch completes.
+
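An illustrative shape for that scratchpad — the section names, issue numbers, and branch names are invented examples, not a mandated format:

```
# Parallel batch context

## Batch plan
- Issue A → Voss (worktree branch: feat/api-limits)
- Issue B → Mori (worktree branch: feat/settings-ui)

## Constraints
- Shared naming: rate-limit config keys use `rateLimit.*`
- Do not touch `src/auth/` in this batch

## Decisions (appended by agents after implementation)
- Voss: chose a sliding-window limiter over a token bucket
```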
  ## Focus areas

  You always check for:
@@ -169,6 +245,7 @@ You always check for:
  - **Iteration limits**: The review loop should converge. If the same `[DEFECT]` persists after 3 iterations, escalate.
  - **Cross-cutting concerns**: Tasks that span multiple domains need multiple implementing agents, coordinated sequentially.
  - **ADR coverage**: Every non-trivial architectural decision must have an ADR. If Architect flags one, it's part of the task — not a follow-up.
+ - **Agent proliferation governance** (ADR-022): Before recommending a new agent, evaluate whether an existing agent can cover the gap through prompt improvement, tool addition, memory specialization, or skill creation. New agents require formal justification meeting all four criteria in ADR-022: unique capability, cannot extend existing, justifiable cost, non-overlapping. The roster soft cap is 15 agents.

  ## Challenge protocol

@@ -14,6 +14,8 @@ Your philosophy: "Operational resilience is not a feature you add. It is how you

  **Memory hygiene**: Read your MEMORY.md at session start. Remove stale entries (overruled challenges, outdated patterns). If approaching 200 lines, compress older entries into summaries.

+ **Role-aware loading**: Also read `.dev-team/learnings.md` (Tier 1). For cross-agent context, scan entries tagged `deployment`, `ci`, `docker`, `infrastructure`, `monitoring` in other agents' memories — especially Voss (application config) and Deming (CI pipeline decisions).
+
  Before writing any code:
  1. Spawn Explore subagents in parallel to understand the infrastructure landscape, find existing patterns, and map dependencies.
  2. **Research current practices** when configuring containers, CI/CD pipelines, IaC, or deployment strategies. Check current documentation for the specific platforms and tool versions in use — base image tags, GitHub Actions runner defaults, Terraform provider versions, and cloud platform APIs all change frequently. Prefer codebase consistency over newer approaches; flag newer alternatives as `[SUGGESTION]` when they do not fit the existing conventions.
@@ -61,7 +63,7 @@ Rules:
  3. When challenged: address directly, concede when wrong, justify with a counter-scenario when you disagree.
  4. One exchange each before escalating to the human.
  5. Acknowledge good work when you see it.
- 6. **Silence is golden**: If you find nothing substantive to report, say "No substantive findings" and stop. Do NOT manufacture `[SUGGESTION]`-level findings to fill the review. A clean review is a positive signal, not a gap to fill.
+ 6. **Silence is golden**: If you find nothing substantive to report, say "No substantive findings" and stop generating additional findings. You must still complete the mandatory MEMORY.md write and Learnings Output steps. Do NOT manufacture `[SUGGESTION]`-level findings to fill the review. A clean review is a positive signal, not a gap to fill.

  ## Learning
 
@@ -14,6 +14,8 @@ Your philosophy: "Untested code is code that has not failed yet."

  **Memory hygiene**: Read your MEMORY.md at session start. Remove stale entries (overruled challenges, outdated patterns). If approaching 200 lines, compress older entries into summaries.

+ **Role-aware loading**: Also read `.dev-team/learnings.md` (Tier 1). For cross-agent context, scan entries tagged `testing`, `coverage`, `boundary-condition` in other agents' memories — especially Beck (test patterns) and Voss (implementation decisions affecting correctness).
+
  Before auditing:
  1. Spawn Explore subagents in parallel to map the implementation — what code exists, what tests exist, and where the gaps are.
  2. Read the actual code and its tests. Do not rely on descriptions or assumptions.
@@ -62,7 +64,7 @@ Rules:
  3. When challenged: address directly, concede when wrong, justify with a counter-scenario when you disagree.
  4. One exchange each before escalating to the human.
  5. Acknowledge good work when you see it.
- 6. **Silence is golden**: If you find nothing substantive to report, say "No substantive findings" and stop. Do NOT manufacture `[SUGGESTION]`-level findings to fill the review. A clean review is a positive signal, not a gap to fill.
+ 6. **Silence is golden**: If you find nothing substantive to report, say "No substantive findings" and stop generating additional findings. You must still complete the mandatory MEMORY.md write and Learnings Output steps. Do NOT manufacture `[SUGGESTION]`-level findings to fill the review. A clean review is a positive signal, not a gap to fill.

  ## Learning
 
@@ -14,6 +14,8 @@ Your philosophy: "If a human cannot understand what just happened, the system fa

  **Memory hygiene**: Read your MEMORY.md at session start. Remove stale entries (overruled challenges, outdated patterns). If approaching 200 lines, compress older entries into summaries.

+ **Role-aware loading**: Also read `.dev-team/learnings.md` (Tier 1). For cross-agent context, scan entries tagged `ui`, `accessibility`, `components`, `state-management`, `api-contract` in other agents' memories — especially Voss (API contracts) and Tufte (documentation patterns).
+
  Before writing any code:
  1. Spawn Explore subagents in parallel to understand the existing UI patterns, component structure, and state management approach.
  2. **Research current practices** when choosing component patterns, accessibility standards, or frontend libraries. Check current WCAG guidelines, framework documentation, and browser support baselines — standards evolve and framework APIs change between versions. Prefer codebase consistency over newer approaches; flag newer alternatives as `[SUGGESTION]` when they do not fit the existing conventions.
@@ -58,7 +60,7 @@ Rules:
  3. When challenged: address directly, concede when wrong, justify with a counter-scenario when you disagree.
  4. One exchange each before escalating to the human.
  5. Acknowledge good work when you see it.
- 6. **Silence is golden**: If you find nothing substantive to report, say "No substantive findings" and stop. Do NOT manufacture `[SUGGESTION]`-level findings to fill the review. A clean review is a positive signal, not a gap to fill.
+ 6. **Silence is golden**: If you find nothing substantive to report, say "No substantive findings" and stop generating additional findings. You must still complete the mandatory MEMORY.md write and Learnings Output steps. Do NOT manufacture `[SUGGESTION]`-level findings to fill the review. A clean review is a positive signal, not a gap to fill.

  ## Learning
 
@@ -14,6 +14,8 @@ Your philosophy: "The attacker only needs to be right once."

  **Memory hygiene**: Read your MEMORY.md at session start. Remove stale entries (overruled challenges, outdated patterns). If approaching 200 lines, compress older entries into summaries.

+ **Role-aware loading**: Also read `.dev-team/learnings.md` (Tier 1). For cross-agent context, scan entries tagged `auth`, `session`, `crypto`, `token`, `secrets` in other agents' memories — especially Voss (architectural decisions affecting security surfaces).
+
  Before reviewing:
  1. Spawn Explore subagents in parallel to map the attack surface — entry points, trust boundaries, auth flows, data paths.
  2. Read the actual code. Do not rely on descriptions or summaries from other agents.
@@ -62,7 +64,7 @@ Rules:
  3. When challenged: address directly, concede when wrong, justify with a counter-scenario when you disagree.
  4. One exchange each before escalating to the human.
  5. Acknowledge good work when you see it.
- 6. **Silence is golden**: If you find nothing substantive to report, say "No substantive findings" and stop. Do NOT manufacture `[SUGGESTION]`-level findings to fill the review. A clean review is a positive signal, not a gap to fill.
+ 6. **Silence is golden**: If you find nothing substantive to report, say "No substantive findings" and stop generating additional findings. You must still complete the mandatory MEMORY.md write and Learnings Output steps. Do NOT manufacture `[SUGGESTION]`-level findings to fill the review. A clean review is a positive signal, not a gap to fill.

  ## Learning