@fredericboyer/dev-team 0.9.0 → 0.11.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (41)
  1. package/dist/create-agent.js +20 -6
  2. package/dist/create-agent.js.map +1 -1
  3. package/dist/init.js +21 -1
  4. package/dist/init.js.map +1 -1
  5. package/dist/status.js +12 -6
  6. package/dist/status.js.map +1 -1
  7. package/dist/update.js +22 -2
  8. package/dist/update.js.map +1 -1
  9. package/package.json +4 -4
  10. package/templates/CLAUDE.md +13 -3
  11. package/templates/agent-memory/dev-team-beck/MEMORY.md +13 -4
  12. package/templates/agent-memory/dev-team-borges/MEMORY.md +13 -4
  13. package/templates/agent-memory/dev-team-brooks/MEMORY.md +13 -4
  14. package/templates/agent-memory/dev-team-conway/MEMORY.md +13 -4
  15. package/templates/agent-memory/dev-team-deming/MEMORY.md +14 -5
  16. package/templates/agent-memory/dev-team-drucker/MEMORY.md +13 -4
  17. package/templates/agent-memory/dev-team-hamilton/MEMORY.md +13 -4
  18. package/templates/agent-memory/dev-team-knuth/MEMORY.md +13 -4
  19. package/templates/agent-memory/dev-team-mori/MEMORY.md +13 -4
  20. package/templates/agent-memory/dev-team-szabo/MEMORY.md +13 -4
  21. package/templates/agent-memory/dev-team-tufte/MEMORY.md +13 -4
  22. package/templates/agent-memory/dev-team-voss/MEMORY.md +13 -4
  23. package/templates/agents/dev-team-beck.md +3 -1
  24. package/templates/agents/dev-team-borges.md +82 -2
  25. package/templates/agents/dev-team-brooks.md +4 -1
  26. package/templates/agents/dev-team-conway.md +3 -1
  27. package/templates/agents/dev-team-deming.md +3 -1
  28. package/templates/agents/dev-team-drucker.md +77 -0
  29. package/templates/agents/dev-team-hamilton.md +3 -1
  30. package/templates/agents/dev-team-knuth.md +3 -1
  31. package/templates/agents/dev-team-mori.md +3 -1
  32. package/templates/agents/dev-team-szabo.md +3 -1
  33. package/templates/agents/dev-team-tufte.md +3 -1
  34. package/templates/agents/dev-team-voss.md +3 -1
  35. package/templates/dev-team-learnings.md +3 -1
  36. package/templates/dev-team-metrics.md +18 -0
  37. package/templates/hooks/dev-team-post-change-review.js +3 -3
  38. package/templates/skills/dev-team-assess/SKILL.md +20 -0
  39. package/templates/skills/dev-team-review/SKILL.md +13 -2
  40. package/templates/skills/dev-team-task/SKILL.md +15 -5
  41. package/templates/workflow-skills/dev-team-security-status/SKILL.md +1 -1
@@ -14,6 +14,8 @@ Your philosophy: "If the docs say one thing and the code does another, both are
 
  **Memory hygiene**: Read your MEMORY.md at session start. Remove stale entries (overruled challenges, outdated patterns). If approaching 200 lines, compress older entries into summaries.
 
+ **Role-aware loading**: Also read `.dev-team/learnings.md` (Tier 1). For cross-agent context, scan entries tagged `documentation`, `api-docs`, `readme`, `doc-code-sync` in other agents' memories — especially Voss (API changes) and Mori (UI documentation needs).
+
  Before reviewing or writing documentation:
  1. Spawn Explore subagents in parallel to map the actual behavior — read the implementation, trace the call graph, run the code if needed.
  2. **Research current practices** when recommending documentation tooling, formats, or patterns. Check current documentation standards and toolchain versions — static site generators, API doc generators, and markup formats evolve. Prefer codebase consistency over newer approaches; flag newer alternatives as `[SUGGESTION]` when they do not fit the existing conventions.
@@ -73,7 +75,7 @@ Rules:
  3. When challenged: address directly, concede when wrong, justify with a counter-scenario when you disagree.
  4. One exchange each before escalating to the human.
  5. Acknowledge good work when you see it.
- 6. **Silence is golden**: If you find nothing substantive to report, say "No substantive findings" and stop. Do NOT manufacture `[SUGGESTION]`-level findings to fill the review. A clean review is a positive signal, not a gap to fill.
+ 6. **Silence is golden**: If you find nothing substantive to report, say "No substantive findings" and stop generating additional findings. You must still complete the mandatory MEMORY.md write and Learnings Output steps. Do NOT manufacture `[SUGGESTION]`-level findings to fill the review. A clean review is a positive signal, not a gap to fill.
 
  ## Learning
 
@@ -14,6 +14,8 @@ Your philosophy: "Build as if the next developer inherits your mistakes at 3 AM
 
  **Memory hygiene**: Read your MEMORY.md at session start. Remove stale entries (overruled challenges, outdated patterns). If approaching 200 lines, compress older entries into summaries.
 
+ **Role-aware loading**: Also read `.dev-team/learnings.md` (Tier 1). For cross-agent context, scan entries tagged `api`, `database`, `migration`, `config`, `architecture` in other agents' memories — especially Brooks (architectural decisions) and Hamilton (deployment constraints).
+
  Before writing any code:
  1. Spawn Explore subagents in parallel to understand the codebase area, find existing patterns, and map dependencies.
  2. **Research current practices** when making framework, library, or architectural pattern choices. Check current documentation for the libraries and runtime versions in use — APIs deprecate, defaults change, and best practices evolve. Prefer codebase consistency over newer approaches; flag newer alternatives as `[SUGGESTION]` when they do not fit the existing conventions.
@@ -59,7 +61,7 @@ Rules:
  3. When challenged: address directly, concede when wrong, justify with a counter-scenario when you disagree.
  4. One exchange each before escalating to the human.
  5. Acknowledge good work when you see it.
- 6. **Silence is golden**: If you find nothing substantive to report, say "No substantive findings" and stop. Do NOT manufacture `[SUGGESTION]`-level findings to fill the review. A clean review is a positive signal, not a gap to fill.
+ 6. **Silence is golden**: If you find nothing substantive to report, say "No substantive findings" and stop generating additional findings. You must still complete the mandatory MEMORY.md write and Learnings Output steps. Do NOT manufacture `[SUGGESTION]`-level findings to fill the review. A clean review is a positive signal, not a gap to fill.
 
  ## Learning
 
@@ -1,5 +1,6 @@
  # Shared Team Learnings
- <!-- Read by all agents at session start. Keep under 200 lines. -->
+ <!-- Tier 1: Shared team memory. Project facts, conventions, and cross-agent decisions. -->
+ <!-- Read by ALL agents at session start. Keep under 200 lines. -->
  <!-- For formal decisions, use ADRs instead. This file captures organic learnings. -->
 
  ## Coding Conventions
@@ -13,4 +14,5 @@
 
  ## Overruled Challenges
  <!-- When the human overrules an agent, record why — prevents re-flagging -->
+ <!-- Format: "[YYYY-MM-DD] Agent: finding summary — overruled because: reason" -->
 
@@ -0,0 +1,18 @@
+ # Agent Calibration Metrics
+ <!-- Appendable log of per-task agent performance metrics. -->
+ <!-- Borges records an entry after each task cycle. -->
+ <!-- Used by /dev-team:assess to track acceptance rates and signal quality over time. -->
+
+ ## Format
+ <!-- Each entry follows this structure:
+ ### [YYYY-MM-DD] Task: <issue or PR reference>
+ - **Agents**: implementing: <agent>, reviewers: <agent1, agent2, ...>
+ - **Rounds**: <number of review waves to convergence>
+ - **Findings**:
+   - <agent>: <N> DEFECT (<accepted>/<overruled>), <N> RISK, <N> SUGGESTION
+ - **Acceptance rate**: <accepted findings / total findings>%
+ - **Duration**: <approximate task duration>
+ -->
+
+ ## Entries
+
@@ -241,11 +241,11 @@ function scoreComplexity(toolInput, filePath) {
  let score = 0;
 
  // Lines changed
- const oldStr = toolInput.old_string || "";
- const newStr = toolInput.new_string || toolInput.content || "";
+ const oldStr = toolInput.old_string ?? "";
+ const newStr = toolInput.new_string ?? toolInput.content ?? "";
  const oldLines = oldStr ? oldStr.split("\n").length : 0;
  const newLines = newStr ? newStr.split("\n").length : 0;
- const linesChanged = Math.abs(newLines - oldLines) + Math.min(oldLines, newLines);
+ const linesChanged = oldLines + newLines;
  score += Math.min(linesChanged, 50); // Cap at 50 to avoid single large file dominating
 
  // Complexity indicators in the new content
@@ -22,6 +22,7 @@ This skill audits **only update-safe files** — files that survive `dev-team up
  - All `.dev-team/agent-memory/*/MEMORY.md` files (use Glob to discover them)
  - The project's `CLAUDE.md` (root of repo)
  - `.dev-team/config.json` (to know which agents are installed)
+ - `.dev-team/metrics.md` (if it exists — calibration metrics log)
 
  2. If `$ARGUMENTS` specifies a focus area (e.g., "learnings", "memory", "claude.md"), scope the audit to that area only. Otherwise, audit all three.
 
@@ -91,6 +92,24 @@ Check the project's `CLAUDE.md` for:
  ### Learnings promotion
  - Mature learnings that have been stable for multiple sessions and should be promoted to `CLAUDE.md` instructions
 
+ ## Phase 4: Calibration metrics audit (`.dev-team/metrics.md`)
+
+ If `.dev-team/metrics.md` exists and contains entries, analyze:
+
+ ### Acceptance rates per agent
+ - Calculate rolling acceptance rate (last 10 entries) for each reviewer agent
+ - Flag agents with acceptance rate below 50% — they may be generating more noise than signal
+ - Identify trend direction: improving, stable, or degrading
+
+ ### Signal quality
+ - Are DEFECT findings being overruled frequently? This suggests over-flagging
+ - Are SUGGESTION findings dominating? This suggests agents are not calibrated to the project's conventions
+ - Are review rounds consistently high (3+)? This suggests systemic quality issues or miscalibrated reviewers
+
+ ### Delegation patterns
+ - Which implementing agents are used most frequently?
+ - Are reviewers consistently finding issues in specific domains? This may indicate an implementing agent needs calibration
+
  ## Report
 
  Produce a structured health report:
@@ -145,6 +164,7 @@ Provide a simple health score:
  | Learnings | healthy / needs attention / unhealthy | count by severity |
  | Agent Memory | healthy / needs attention / unhealthy | count by severity |
  | CLAUDE.md | healthy / needs attention / unhealthy | count by severity |
+ | Metrics | healthy / needs attention / unhealthy | count by severity |
  | **Overall** | **status** | **total** |
 
  Thresholds:
@@ -48,7 +48,7 @@ Before spawning reviewers, verify the changes are reviewable:
  ## Filter findings (judge pass)
 
  Before producing the report, filter raw findings to maximize signal quality:
- 1. **Remove contradictions**: Drop findings that contradict existing ADRs, learnings, or agent memory
+ 1. **Remove contradictions**: Drop findings that contradict existing ADRs (`docs/adr/`), learnings (`.dev-team/learnings.md`), or agent memory (`.dev-team/agent-memory/*/MEMORY.md`)
  2. **Deduplicate**: When multiple agents flag the same issue, keep the most specific finding
  3. **Consolidate suggestions**: Group `[SUGGESTION]`-level items into a single summary block
  4. **Suppress generated file findings**: Skip findings on generated, vendored, or build artifacts
@@ -78,6 +78,14 @@ Group by severity:
  - **[QUESTION]** — decisions needing justification
  - **[SUGGESTION]** — specific improvements
 
+ ### Filtered
+
+ List findings removed during the judge pass, with the reason for filtering:
+ ```
+ **Filtered** @agent-name — reason (contradicts ADR-NNN / duplicate of above / no concrete scenario / generated file)
+ Original finding summary.
+ ```
+
  ### Verdict
 
  - **Approve** — No `[DEFECT]` findings. Advisory items noted.
@@ -92,8 +100,11 @@ Before starting the review, check for open security alerts: run `/dev-team:secur
  ### Completion
 
  After the review report is delivered:
- 1. You MUST spawn **@dev-team-borges** (Librarian) as the final step. Borges will:
+ 1. You MUST spawn **@dev-team-borges** (Librarian) as the final step. Pass Borges the **finding outcome log**: every finding with its classification, source agent, and outcome (accepted/overruled/ignored), including reasoning for overrules. Borges will:
  - **Extract structured memory entries** from the review findings (each classified finding becomes a memory entry for the reviewer who produced it)
+ - **Reinforce accepted patterns** and **record overruled findings** for reviewer calibration
+ - **Generate calibration rules** when 3+ findings on the same tag are overruled
+ - **Record metrics** to `.dev-team/metrics.md`
  - Write entries to each participating agent's MEMORY.md using the structured format
  - Update shared learnings in `.dev-team/learnings.md`
  - Check cross-agent coherence
@@ -46,9 +46,13 @@ Track iterations in conversation context (no state files). For each iteration:
  - If validation fails, route back to implementer with specific failure reason. If it fails twice, escalate to human.
  3. After validation passes, spawn review agents in parallel as background tasks.
  4. Collect classified challenges from reviewers.
- 5. If any `[DEFECT]` challenges exist, address them in the next iteration.
- 6. If no `[DEFECT]` remains, output DONE to exit the loop.
- 7. If max iterations reached without convergence, report remaining defects and exit.
+ 5. If any `[DEFECT]` challenges exist, **compact the context** before the next iteration:
+    - Produce a structured summary: DEFECTs found (agent, file, status), files changed, outstanding items
+    - New reviewers in subsequent waves receive: current diff + compact summary + agent definition
+    - They do NOT receive raw conversation history from prior waves
+ 6. Address defects in the next iteration.
+ 7. If no `[DEFECT]` remains, output DONE to exit the loop.
+ 8. If max iterations reached without convergence, report remaining defects and exit.
 
  The convergence check happens in conversation context: count iterations, check for `[DEFECT]` findings, and decide whether to continue or exit.
 
@@ -56,6 +60,8 @@ The convergence check happens in conversation context: count iterations, check f
 
  When multiple issues are being addressed in a single session, the task loop switches to parallel orchestration (see ADR-019). Drucker coordinates all phases in conversation context.
 
+ **Mode selection:** If agent teams are enabled (check `.dev-team/config.json` for `"agentTeams": true`), use team lead mode for batches of 3+ issues. Otherwise, use standard worktree subagent mode. For single issues, always use standard mode.
+
  ### Phase 0: Brooks pre-assessment (batch)
  Spawn @dev-team-brooks once with all issues. Brooks identifies:
  - **File independence**: which issues touch overlapping files (conflict groups that must run sequentially)
@@ -71,7 +77,7 @@ Drucker spawns one implementing agent per independent issue, each on its own bra
  Reviews do **not** start until **all** implementation agents have completed (Agent tool provides completion notifications as the sync barrier). Once all are done, spawn review agents (Szabo + Knuth, plus conditional reviewers) in parallel across all branches simultaneously. Each reviewer receives the diff for one specific branch and produces classified findings scoped to that branch.
 
  ### Phase 3: Defect routing
- Collect all findings. Route `[DEFECT]` items back to the original implementing agent for each branch. Agents fix defects on their own branch. After fixes, another review wave runs. Continue until no `[DEFECT]` findings remain or the per-branch iteration limit is reached.
+ Collect all findings. Route `[DEFECT]` items back to the original implementing agent for each branch. Agents fix defects on their own branch. Before spawning the next review wave, **compact context**: produce a structured summary of prior findings, their status (fixed/disputed/pending), and files changed. New reviewers receive current diff + compact summary only — not full conversation history from prior waves. Continue until no `[DEFECT]` findings remain or the per-branch iteration limit is reached.
 
  ### Phase 4: Borges completion
  Borges runs **once** across all branches after the final review wave clears. This ensures cross-branch coherence: memory files are consistent, learnings are not duplicated, and system improvement recommendations consider the full batch.
@@ -90,8 +96,12 @@ Before starting work, check for open security alerts: run `/dev-team:security-st
  When the loop exits:
  1. **Deliver the work**: If changes are on a feature branch, create the PR (body must include `Closes #<issue>`). Ensure the PR is ready to merge: CI green, reviews passed, branch up to date. Then follow the project's merge workflow — use `/dev-team:merge` if the project has it configured, otherwise report readiness. If merge fails (CI failures, merge conflicts, branch protection), report the blocker to the human rather than leaving work unattended.
  2. **Clean up worktree**: If the work was done in a worktree, clean it up after the branch is pushed and the PR is created. Do not wait for merge to clean the worktree.
- 3. You MUST spawn **@dev-team-borges** (Librarian) as the final step. Borges will:
+ 3. You MUST spawn **@dev-team-borges** (Librarian) as the final step. Pass Borges the **finding outcome log**: every finding with its classification, source agent, and outcome (accepted/overruled/ignored), including the human's reasoning for overrules. Borges will:
  - **Extract structured memory entries** from review findings and implementation decisions
+ - **Reinforce accepted patterns** in the reviewer's memory (calibration feedback)
+ - **Record overruled findings** with context so reviewers generate fewer false positives
+ - **Generate calibration rules** when 3+ findings on the same tag are overruled
+ - **Record metrics** to `.dev-team/metrics.md` (acceptance rates, rounds to convergence)
  - Write entries to each participating agent's MEMORY.md using the structured format
  - Update shared learnings in `.dev-team/learnings.md`
  - Check cross-agent coherence
@@ -1,5 +1,5 @@
  ---
- name: security-status
+ name: dev-team:security-status
  description: Check GitHub security signals — code scanning, Dependabot, secret scanning, and compliance status. Use at session start and before releases.
  user_invocable: true
  ---