@fredericboyer/dev-team 0.9.0 → 0.11.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/dist/create-agent.js +20 -6
- package/dist/create-agent.js.map +1 -1
- package/dist/init.js +21 -1
- package/dist/init.js.map +1 -1
- package/dist/status.js +12 -6
- package/dist/status.js.map +1 -1
- package/dist/update.js +22 -2
- package/dist/update.js.map +1 -1
- package/package.json +4 -4
- package/templates/CLAUDE.md +13 -3
- package/templates/agent-memory/dev-team-beck/MEMORY.md +13 -4
- package/templates/agent-memory/dev-team-borges/MEMORY.md +13 -4
- package/templates/agent-memory/dev-team-brooks/MEMORY.md +13 -4
- package/templates/agent-memory/dev-team-conway/MEMORY.md +13 -4
- package/templates/agent-memory/dev-team-deming/MEMORY.md +14 -5
- package/templates/agent-memory/dev-team-drucker/MEMORY.md +13 -4
- package/templates/agent-memory/dev-team-hamilton/MEMORY.md +13 -4
- package/templates/agent-memory/dev-team-knuth/MEMORY.md +13 -4
- package/templates/agent-memory/dev-team-mori/MEMORY.md +13 -4
- package/templates/agent-memory/dev-team-szabo/MEMORY.md +13 -4
- package/templates/agent-memory/dev-team-tufte/MEMORY.md +13 -4
- package/templates/agent-memory/dev-team-voss/MEMORY.md +13 -4
- package/templates/agents/dev-team-beck.md +3 -1
- package/templates/agents/dev-team-borges.md +82 -2
- package/templates/agents/dev-team-brooks.md +4 -1
- package/templates/agents/dev-team-conway.md +3 -1
- package/templates/agents/dev-team-deming.md +3 -1
- package/templates/agents/dev-team-drucker.md +77 -0
- package/templates/agents/dev-team-hamilton.md +3 -1
- package/templates/agents/dev-team-knuth.md +3 -1
- package/templates/agents/dev-team-mori.md +3 -1
- package/templates/agents/dev-team-szabo.md +3 -1
- package/templates/agents/dev-team-tufte.md +3 -1
- package/templates/agents/dev-team-voss.md +3 -1
- package/templates/dev-team-learnings.md +3 -1
- package/templates/dev-team-metrics.md +18 -0
- package/templates/hooks/dev-team-post-change-review.js +3 -3
- package/templates/skills/dev-team-assess/SKILL.md +20 -0
- package/templates/skills/dev-team-review/SKILL.md +13 -2
- package/templates/skills/dev-team-task/SKILL.md +15 -5
- package/templates/workflow-skills/dev-team-security-status/SKILL.md +1 -1
@@ -14,6 +14,8 @@ Your philosophy: "If the docs say one thing and the code does another, both are
 
 **Memory hygiene**: Read your MEMORY.md at session start. Remove stale entries (overruled challenges, outdated patterns). If approaching 200 lines, compress older entries into summaries.
 
+**Role-aware loading**: Also read `.dev-team/learnings.md` (Tier 1). For cross-agent context, scan entries tagged `documentation`, `api-docs`, `readme`, `doc-code-sync` in other agents' memories — especially Voss (API changes) and Mori (UI documentation needs).
+
 Before reviewing or writing documentation:
 1. Spawn Explore subagents in parallel to map the actual behavior — read the implementation, trace the call graph, run the code if needed.
 2. **Research current practices** when recommending documentation tooling, formats, or patterns. Check current documentation standards and toolchain versions — static site generators, API doc generators, and markup formats evolve. Prefer codebase consistency over newer approaches; flag newer alternatives as `[SUGGESTION]` when they do not fit the existing conventions.
@@ -73,7 +75,7 @@ Rules:
 3. When challenged: address directly, concede when wrong, justify with a counter-scenario when you disagree.
 4. One exchange each before escalating to the human.
 5. Acknowledge good work when you see it.
-6. **Silence is golden**: If you find nothing substantive to report, say "No substantive findings" and stop. Do NOT manufacture `[SUGGESTION]`-level findings to fill the review. A clean review is a positive signal, not a gap to fill.
+6. **Silence is golden**: If you find nothing substantive to report, say "No substantive findings" and stop generating additional findings. You must still complete the mandatory MEMORY.md write and Learnings Output steps. Do NOT manufacture `[SUGGESTION]`-level findings to fill the review. A clean review is a positive signal, not a gap to fill.
 
 ## Learning
 
@@ -14,6 +14,8 @@ Your philosophy: "Build as if the next developer inherits your mistakes at 3 AM
 
 **Memory hygiene**: Read your MEMORY.md at session start. Remove stale entries (overruled challenges, outdated patterns). If approaching 200 lines, compress older entries into summaries.
 
+**Role-aware loading**: Also read `.dev-team/learnings.md` (Tier 1). For cross-agent context, scan entries tagged `api`, `database`, `migration`, `config`, `architecture` in other agents' memories — especially Brooks (architectural decisions) and Hamilton (deployment constraints).
+
 Before writing any code:
 1. Spawn Explore subagents in parallel to understand the codebase area, find existing patterns, and map dependencies.
 2. **Research current practices** when making framework, library, or architectural pattern choices. Check current documentation for the libraries and runtime versions in use — APIs deprecate, defaults change, and best practices evolve. Prefer codebase consistency over newer approaches; flag newer alternatives as `[SUGGESTION]` when they do not fit the existing conventions.
@@ -59,7 +61,7 @@ Rules:
 3. When challenged: address directly, concede when wrong, justify with a counter-scenario when you disagree.
 4. One exchange each before escalating to the human.
 5. Acknowledge good work when you see it.
-6. **Silence is golden**: If you find nothing substantive to report, say "No substantive findings" and stop. Do NOT manufacture `[SUGGESTION]`-level findings to fill the review. A clean review is a positive signal, not a gap to fill.
+6. **Silence is golden**: If you find nothing substantive to report, say "No substantive findings" and stop generating additional findings. You must still complete the mandatory MEMORY.md write and Learnings Output steps. Do NOT manufacture `[SUGGESTION]`-level findings to fill the review. A clean review is a positive signal, not a gap to fill.
 
 ## Learning
 
@@ -1,5 +1,6 @@
 # Shared Team Learnings
-<!--
+<!-- Tier 1: Shared team memory. Project facts, conventions, and cross-agent decisions. -->
+<!-- Read by ALL agents at session start. Keep under 200 lines. -->
 <!-- For formal decisions, use ADRs instead. This file captures organic learnings. -->
 
 ## Coding Conventions
@@ -13,4 +14,5 @@
 
 ## Overruled Challenges
 <!-- When the human overrules an agent, record why — prevents re-flagging -->
+<!-- Format: "[YYYY-MM-DD] Agent: finding summary — overruled because: reason" -->
 
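The entry format added in this hunk can be produced mechanically. The following is a minimal sketch, not part of the package: the helper name `formatOverruledEntry` and its input fields are hypothetical, and the sample agent and finding are invented for illustration.

```javascript
// Hypothetical helper (not in the package): formats one "Overruled Challenges"
// entry using the template from the comment added in this hunk.
function formatOverruledEntry({ date, agent, summary, reason }) {
  // toISOString() returns UTC "YYYY-MM-DDTHH:mm:ss.sssZ"; keep the date part.
  const day = date.toISOString().slice(0, 10);
  // \u2014 is the dash character used in the documented format string.
  return `[${day}] ${agent}: ${summary} \u2014 overruled because: ${reason}`;
}

// Invented sample data, for illustration only.
const entry = formatOverruledEntry({
  date: new Date(Date.UTC(2025, 0, 15)),
  agent: "Szabo",
  summary: "missing rate limit on internal endpoint",
  reason: "endpoint is only reachable from the private network",
});
console.log(entry);
```

Using the UTC date keeps entries stable regardless of the machine's time zone; a project that prefers local dates would need a different formatter.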
@@ -0,0 +1,18 @@
+# Agent Calibration Metrics
+<!-- Appendable log of per-task agent performance metrics. -->
+<!-- Borges records an entry after each task cycle. -->
+<!-- Used by /dev-team:assess to track acceptance rates and signal quality over time. -->
+
+## Format
+<!-- Each entry follows this structure:
+### [YYYY-MM-DD] Task: <issue or PR reference>
+- **Agents**: implementing: <agent>, reviewers: <agent1, agent2, ...>
+- **Rounds**: <number of review waves to convergence>
+- **Findings**:
+- <agent>: <N> DEFECT (<accepted>/<overruled>), <N> RISK, <N> SUGGESTION
+- **Acceptance rate**: <accepted findings / total findings>%
+- **Duration**: <approximate task duration>
+-->
+
+## Entries
+
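To make the entry structure above concrete, here is a rough sketch that renders one entry from a list of findings. It is an illustration under stated assumptions, not the package's code: the function `renderMetricsEntry`, the finding fields (`agent`, `classification`, `outcome`), the two-space nesting of per-agent lines, and the sample task reference are all hypothetical.

```javascript
// Hypothetical renderer for one metrics entry, following the template above.
// Assumption: each finding is { agent, classification, outcome }, where
// outcome is "accepted", "overruled", or "ignored".
function renderMetricsEntry({ date, task, implementer, reviewers, rounds, findings, duration }) {
  const accepted = findings.filter((f) => f.outcome === "accepted").length;
  // Guard against division by zero when a review produced no findings.
  const rate = findings.length === 0
    ? "n/a"
    : `${Math.round((accepted / findings.length) * 100)}%`;

  const perAgent = reviewers.map((agent) => {
    const own = findings.filter((f) => f.agent === agent);
    const count = (cls) => own.filter((f) => f.classification === cls).length;
    const defects = own.filter((f) => f.classification === "DEFECT");
    const ok = defects.filter((f) => f.outcome === "accepted").length;
    const no = defects.filter((f) => f.outcome === "overruled").length;
    // Assumption: per-agent lines are nested under "Findings" with two spaces.
    return `  - ${agent}: ${defects.length} DEFECT (${ok}/${no}), ${count("RISK")} RISK, ${count("SUGGESTION")} SUGGESTION`;
  });

  return [
    `### [${date}] Task: ${task}`,
    `- **Agents**: implementing: ${implementer}, reviewers: ${reviewers.join(", ")}`,
    `- **Rounds**: ${rounds}`,
    `- **Findings**:`,
    ...perAgent,
    `- **Acceptance rate**: ${rate}`,
    `- **Duration**: ${duration}`,
  ].join("\n");
}

// Invented sample data, for illustration only.
const metricsEntry = renderMetricsEntry({
  date: "2025-01-15",
  task: "#123",
  implementer: "Voss",
  reviewers: ["Szabo", "Knuth"],
  rounds: 2,
  findings: [
    { agent: "Szabo", classification: "DEFECT", outcome: "accepted" },
    { agent: "Szabo", classification: "DEFECT", outcome: "overruled" },
    { agent: "Knuth", classification: "RISK", outcome: "accepted" },
    { agent: "Knuth", classification: "SUGGESTION", outcome: "ignored" },
  ],
  duration: "~40 min",
});
console.log(metricsEntry);
```

The template does not say whether ignored findings count toward the denominator; this sketch counts every finding in the total.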
@@ -241,11 +241,11 @@ function scoreComplexity(toolInput, filePath) {
 let score = 0;
 
 // Lines changed
-const oldStr = toolInput.old_string
-const newStr = toolInput.new_string
+const oldStr = toolInput.old_string ?? "";
+const newStr = toolInput.new_string ?? toolInput.content ?? "";
 const oldLines = oldStr ? oldStr.split("\n").length : 0;
 const newLines = newStr ? newStr.split("\n").length : 0;
-const linesChanged =
+const linesChanged = oldLines + newLines;
 score += Math.min(linesChanged, 50); // Cap at 50 to avoid single large file dominating
 
 // Complexity indicators in the new content
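The effect of the new fallbacks is easiest to see in isolation. The sketch below copies only the line-count portion of the hunk into a standalone function; the name `linesChangedScore` is hypothetical, and the reading that `content` carries a whole-file write payload (while `old_string`/`new_string` carry an edit) is an assumption based on the field names.

```javascript
// Standalone sketch of the "Lines changed" portion of scoreComplexity.
// Hypothetical function name; the body mirrors the added lines in the hunk.
function linesChangedScore(toolInput) {
  // ?? falls back only on null/undefined, so a missing field becomes "".
  const oldStr = toolInput.old_string ?? "";
  // Assumption: `content` is present for whole-file writes that have no new_string.
  const newStr = toolInput.new_string ?? toolInput.content ?? "";
  const oldLines = oldStr ? oldStr.split("\n").length : 0;
  const newLines = newStr ? newStr.split("\n").length : 0;
  // Cap at 50, as in the hunk, so one large file does not dominate the score.
  return Math.min(oldLines + newLines, 50);
}

console.log(linesChangedScore({ old_string: "a\nb", new_string: "a\nb\nc" })); // 5
console.log(linesChangedScore({ content: "x\ny\nz\n" })); // 4 (trailing newline adds an empty element)
console.log(linesChangedScore({})); // 0
console.log(linesChangedScore({ content: "line\n".repeat(200) })); // 50 (capped)
```

Note that the count is the sum of old and new line counts, so an edit that replaces 10 lines with 10 lines scores 20 rather than 10.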
@@ -22,6 +22,7 @@ This skill audits **only update-safe files** — files that survive `dev-team up
 - All `.dev-team/agent-memory/*/MEMORY.md` files (use Glob to discover them)
 - The project's `CLAUDE.md` (root of repo)
 - `.dev-team/config.json` (to know which agents are installed)
+- `.dev-team/metrics.md` (if it exists — calibration metrics log)
 
 2. If `$ARGUMENTS` specifies a focus area (e.g., "learnings", "memory", "claude.md"), scope the audit to that area only. Otherwise, audit all three.
 
@@ -91,6 +92,24 @@ Check the project's `CLAUDE.md` for:
 ### Learnings promotion
 - Mature learnings that have been stable for multiple sessions and should be promoted to `CLAUDE.md` instructions
 
+## Phase 4: Calibration metrics audit (`.dev-team/metrics.md`)
+
+If `.dev-team/metrics.md` exists and contains entries, analyze:
+
+### Acceptance rates per agent
+- Calculate rolling acceptance rate (last 10 entries) for each reviewer agent
+- Flag agents with acceptance rate below 50% — they may be generating more noise than signal
+- Identify trend direction: improving, stable, or degrading
+
+### Signal quality
+- Are DEFECT findings being overruled frequently? This suggests over-flagging
+- Are SUGGESTION findings dominating? This suggests agents are not calibrated to the project's conventions
+- Are review rounds consistently high (3+)? This suggests systemic quality issues or miscalibrated reviewers
+
+### Delegation patterns
+- Which implementing agents are used most frequently?
+- Are reviewers consistently finding issues in specific domains? This may indicate an implementing agent needs calibration
+
 ## Report
 
 Produce a structured health report:
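The rolling acceptance-rate check in the Phase 4 text of this hunk could look roughly like the sketch below. The skill itself is executed by an agent reading markdown, so this is only an illustration: the function `rollingAcceptance` and the parsed entry shape (`reviewers` mapping an agent name to `{ accepted, total }`) are hypothetical, and "last 10 entries" is interpreted here as the last 10 entries in which that agent appears.

```javascript
// Hypothetical sketch of the per-agent rolling acceptance rate.
// Assumption: entries are in chronological order and already parsed into
// { reviewers: { <agent>: { accepted, total } } }.
function rollingAcceptance(entries, windowSize = 10, threshold = 0.5) {
  const history = new Map();
  for (const entry of entries) {
    for (const [agent, counts] of Object.entries(entry.reviewers)) {
      if (!history.has(agent)) history.set(agent, []);
      history.get(agent).push(counts);
    }
  }

  const report = {};
  for (const [agent, records] of history) {
    // Keep only the most recent `windowSize` records for this agent.
    const recent = records.slice(-windowSize);
    const accepted = recent.reduce((sum, c) => sum + c.accepted, 0);
    const total = recent.reduce((sum, c) => sum + c.total, 0);
    // A reviewer with no findings has no meaningful rate and is not flagged.
    const rate = total === 0 ? null : accepted / total;
    report[agent] = { rate, flagged: rate !== null && rate < threshold };
  }
  return report;
}

// Invented sample data, for illustration only.
const acceptanceReport = rollingAcceptance([
  { reviewers: { Szabo: { accepted: 1, total: 4 }, Knuth: { accepted: 3, total: 4 } } },
  { reviewers: { Szabo: { accepted: 0, total: 2 } } },
]);
console.log(acceptanceReport);
```

Trend direction (improving, stable, degrading) is left out; one option would be to compare the rate over the older half of the window with the newer half.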
@@ -145,6 +164,7 @@ Provide a simple health score:
 | Learnings | healthy / needs attention / unhealthy | count by severity |
 | Agent Memory | healthy / needs attention / unhealthy | count by severity |
 | CLAUDE.md | healthy / needs attention / unhealthy | count by severity |
+| Metrics | healthy / needs attention / unhealthy | count by severity |
 | **Overall** | **status** | **total** |
 
 Thresholds:
@@ -48,7 +48,7 @@ Before spawning reviewers, verify the changes are reviewable:
 ## Filter findings (judge pass)
 
 Before producing the report, filter raw findings to maximize signal quality:
-1. **Remove contradictions**: Drop findings that contradict existing ADRs, learnings, or agent memory
+1. **Remove contradictions**: Drop findings that contradict existing ADRs (`docs/adr/`), learnings (`.dev-team/learnings.md`), or agent memory (`.dev-team/agent-memory/*/MEMORY.md`)
 2. **Deduplicate**: When multiple agents flag the same issue, keep the most specific finding
 3. **Consolidate suggestions**: Group `[SUGGESTION]`-level items into a single summary block
 4. **Suppress generated file findings**: Skip findings on generated, vendored, or build artifacts
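Steps 2 to 4 of the judge pass shown in this hunk can be sketched as a pure function. This is an illustration under several assumptions, not the skill's implementation: the finding shape (`agent`, `file`, `issue`, `severity`, optional `line`), the path patterns treated as generated, and the rule that a finding with a line number is "more specific" are all hypothetical. Step 1 (contradiction removal) needs the ADR and memory contents and is omitted.

```javascript
// Hypothetical path patterns for generated, vendored, or build artifacts.
const GENERATED_PATTERNS = [/^dist\//, /^node_modules\//, /^vendor\//, /\.map$/];

// Sketch of judge-pass steps 2-4: deduplicate, consolidate suggestions,
// and suppress findings on generated files. Filtered items keep a reason.
function judgePass(findings) {
  const kept = new Map();
  const filtered = [];

  for (const finding of findings) {
    if (GENERATED_PATTERNS.some((re) => re.test(finding.file))) {
      filtered.push({ ...finding, reason: "generated file" });
      continue;
    }
    // Assumption: two findings are "the same issue" if file and issue match.
    const key = `${finding.file}::${finding.issue}`;
    const previous = kept.get(key);
    if (!previous) {
      kept.set(key, finding);
    } else if (previous.line === undefined && finding.line !== undefined) {
      // Assumption: a finding that names a line is the more specific one.
      filtered.push({ ...previous, reason: "duplicate" });
      kept.set(key, finding);
    } else {
      filtered.push({ ...finding, reason: "duplicate" });
    }
  }

  const survivors = [...kept.values()];
  return {
    findings: survivors.filter((f) => f.severity !== "SUGGESTION"),
    suggestions: survivors.filter((f) => f.severity === "SUGGESTION"),
    filtered,
  };
}

// Invented sample data, for illustration only.
const judged = judgePass([
  { agent: "szabo", file: "src/auth.js", issue: "unvalidated-input", severity: "DEFECT" },
  { agent: "knuth", file: "src/auth.js", issue: "unvalidated-input", severity: "DEFECT", line: 42 },
  { agent: "knuth", file: "dist/init.js", issue: "unused-var", severity: "SUGGESTION" },
  { agent: "tufte", file: "README.md", issue: "stale-example", severity: "SUGGESTION" },
]);
console.log(judged.findings.length, judged.suggestions.length, judged.filtered.length);
```

Returning the filtered items with their reasons, rather than silently dropping them, keeps the pass auditable.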
@@ -78,6 +78,14 @@ Group by severity:
 - **[QUESTION]** — decisions needing justification
 - **[SUGGESTION]** — specific improvements
 
+### Filtered
+
+List findings removed during the judge pass, with the reason for filtering:
+```
+**Filtered** @agent-name — reason (contradicts ADR-NNN / duplicate of above / no concrete scenario / generated file)
+Original finding summary.
+```
+
 ### Verdict
 
 - **Approve** — No `[DEFECT]` findings. Advisory items noted.
@@ -92,8 +100,11 @@ Before starting the review, check for open security alerts: run `/dev-team:secur
 ### Completion
 
 After the review report is delivered:
-1. You MUST spawn **@dev-team-borges** (Librarian) as the final step. Borges will:
+1. You MUST spawn **@dev-team-borges** (Librarian) as the final step. Pass Borges the **finding outcome log**: every finding with its classification, source agent, and outcome (accepted/overruled/ignored), including reasoning for overrules. Borges will:
 - **Extract structured memory entries** from the review findings (each classified finding becomes a memory entry for the reviewer who produced it)
+- **Reinforce accepted patterns** and **record overruled findings** for reviewer calibration
+- **Generate calibration rules** when 3+ findings on the same tag are overruled
+- **Record metrics** to `.dev-team/metrics.md`
 - Write entries to each participating agent's MEMORY.md using the structured format
 - Update shared learnings in `.dev-team/learnings.md`
 - Check cross-agent coherence
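The "3+ overruled findings on the same tag" trigger from this hunk can be expressed as a small counting function. This is a sketch only: the function `tagsNeedingCalibration` and the `tags` array on each outcome-log record are assumptions, and the hunk does not say whether the count is per agent or across agents (this version counts across agents).

```javascript
// Hypothetical sketch: find tags with enough overruled findings to warrant
// a calibration rule. Assumption: each log record is
// { agent, classification, outcome, tags: string[] }.
function tagsNeedingCalibration(outcomeLog, minOverruled = 3) {
  const overruledByTag = new Map();
  for (const finding of outcomeLog) {
    if (finding.outcome !== "overruled") continue;
    for (const tag of finding.tags) {
      overruledByTag.set(tag, (overruledByTag.get(tag) ?? 0) + 1);
    }
  }
  return [...overruledByTag]
    .filter(([, count]) => count >= minOverruled)
    .map(([tag]) => tag);
}

// Invented sample data, for illustration only.
const calibrationTags = tagsNeedingCalibration([
  { agent: "knuth", classification: "SUGGESTION", outcome: "overruled", tags: ["naming"] },
  { agent: "knuth", classification: "RISK", outcome: "overruled", tags: ["naming", "api"] },
  { agent: "szabo", classification: "DEFECT", outcome: "overruled", tags: ["naming"] },
  { agent: "szabo", classification: "DEFECT", outcome: "overruled", tags: ["api"] },
  { agent: "szabo", classification: "DEFECT", outcome: "accepted", tags: ["naming"] },
]);
console.log(calibrationTags); // [ 'naming' ]
```

Accepted and ignored findings do not count toward the threshold in this sketch; only explicit overrules do.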
@@ -46,9 +46,13 @@ Track iterations in conversation context (no state files). For each iteration:
 - If validation fails, route back to implementer with specific failure reason. If it fails twice, escalate to human.
 3. After validation passes, spawn review agents in parallel as background tasks.
 4. Collect classified challenges from reviewers.
-5. If any `[DEFECT]` challenges exist,
-
-
+5. If any `[DEFECT]` challenges exist, **compact the context** before the next iteration:
+- Produce a structured summary: DEFECTs found (agent, file, status), files changed, outstanding items
+- New reviewers in subsequent waves receive: current diff + compact summary + agent definition
+- They do NOT receive raw conversation history from prior waves
+6. Address defects in the next iteration.
+7. If no `[DEFECT]` remains, output DONE to exit the loop.
+8. If max iterations reached without convergence, report remaining defects and exit.
 
 The convergence check happens in conversation context: count iterations, check for `[DEFECT]` findings, and decide whether to continue or exit.
 
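The loop in this hunk is carried out by an agent in conversation context, not by code, but its control flow can be illustrated as follows. Everything here is hypothetical: the function names `compactContext` and `runTaskLoop`, the synchronous `implement` and `review` callbacks, and the finding fields are assumptions made for the sketch.

```javascript
// Hypothetical sketch of the compact summary described in step 5.
function compactContext(findings, filesChanged) {
  return {
    defects: findings
      .filter((f) => f.classification === "DEFECT")
      .map(({ agent, file, status }) => ({ agent, file, status })),
    filesChanged,
    outstanding: findings.filter((f) => f.status !== "fixed").map((f) => f.summary),
  };
}

// Hypothetical sketch of the iterate-until-no-DEFECT loop with a cap.
function runTaskLoop({ implement, review, maxIterations }) {
  let summary = null; // the first wave has no prior context
  let remaining = [];
  for (let iteration = 1; iteration <= maxIterations; iteration++) {
    const { diff, filesChanged } = implement(summary);
    // Reviewers get the current diff and the compact summary only,
    // never the raw history of earlier waves.
    const findings = review({ diff, summary });
    remaining = findings.filter((f) => f.classification === "DEFECT");
    if (remaining.length === 0) {
      return { status: "DONE", iterations: iteration, remaining };
    }
    summary = compactContext(findings, filesChanged);
  }
  return { status: "MAX_ITERATIONS", iterations: maxIterations, remaining };
}

// Invented stubs: the first review wave reports one defect, the second none.
const seenSummaries = [];
const loopResult = runTaskLoop({
  maxIterations: 3,
  implement: () => ({ diff: "(current diff)", filesChanged: ["src/auth.js"] }),
  review: ({ summary }) => {
    seenSummaries.push(summary);
    return seenSummaries.length === 1
      ? [{ agent: "szabo", file: "src/auth.js", classification: "DEFECT", status: "pending", summary: "token not validated" }]
      : [];
  },
});
console.log(loopResult.status, loopResult.iterations);
```

Passing only the summary into each wave is what keeps the context from growing with every iteration.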
@@ -56,6 +60,8 @@ The convergence check happens in conversation context: count iterations, check f
 
 When multiple issues are being addressed in a single session, the task loop switches to parallel orchestration (see ADR-019). Drucker coordinates all phases in conversation context.
 
+**Mode selection:** If agent teams are enabled (check `.dev-team/config.json` for `"agentTeams": true`), use team lead mode for batches of 3+ issues. Otherwise, use standard worktree subagent mode. For single issues, always use standard mode.
+
 ### Phase 0: Brooks pre-assessment (batch)
 Spawn @dev-team-brooks once with all issues. Brooks identifies:
 - **File independence**: which issues touch overlapping files (conflict groups that must run sequentially)
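The mode-selection rule added in this hunk reduces to a small decision function. The sketch below is illustrative only: `selectMode` and the mode labels `"team-lead"` and `"standard"` are hypothetical names, and the config is passed in as an already parsed object instead of being read from `.dev-team/config.json`. The rule does not explicitly cover a batch of two issues with agent teams enabled; this sketch treats that case as standard mode.

```javascript
// Hypothetical sketch of the mode-selection rule.
// `config` is assumed to be the parsed contents of .dev-team/config.json.
function selectMode(config, issueCount) {
  // Single issues always use standard mode.
  if (issueCount <= 1) return "standard";
  // Team lead mode needs agent teams enabled and a batch of 3 or more issues.
  return config.agentTeams === true && issueCount >= 3 ? "team-lead" : "standard";
}

console.log(selectMode({ agentTeams: true }, 3)); // team-lead
console.log(selectMode({ agentTeams: true }, 2)); // standard
console.log(selectMode({ agentTeams: false }, 5)); // standard
console.log(selectMode({}, 1)); // standard
```

The strict `=== true` comparison means a missing or non-boolean `agentTeams` value falls back to standard mode.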
@@ -71,7 +77,7 @@ Drucker spawns one implementing agent per independent issue, each on its own bra
 Reviews do **not** start until **all** implementation agents have completed (Agent tool provides completion notifications as the sync barrier). Once all are done, spawn review agents (Szabo + Knuth, plus conditional reviewers) in parallel across all branches simultaneously. Each reviewer receives the diff for one specific branch and produces classified findings scoped to that branch.
 
 ### Phase 3: Defect routing
-Collect all findings. Route `[DEFECT]` items back to the original implementing agent for each branch. Agents fix defects on their own branch.
+Collect all findings. Route `[DEFECT]` items back to the original implementing agent for each branch. Agents fix defects on their own branch. Before spawning the next review wave, **compact context**: produce a structured summary of prior findings, their status (fixed/disputed/pending), and files changed. New reviewers receive current diff + compact summary only — not full conversation history from prior waves. Continue until no `[DEFECT]` findings remain or the per-branch iteration limit is reached.
 
 ### Phase 4: Borges completion
 Borges runs **once** across all branches after the final review wave clears. This ensures cross-branch coherence: memory files are consistent, learnings are not duplicated, and system improvement recommendations consider the full batch.
@@ -90,8 +96,12 @@ Before starting work, check for open security alerts: run `/dev-team:security-st
 When the loop exits:
 1. **Deliver the work**: If changes are on a feature branch, create the PR (body must include `Closes #<issue>`). Ensure the PR is ready to merge: CI green, reviews passed, branch up to date. Then follow the project's merge workflow — use `/dev-team:merge` if the project has it configured, otherwise report readiness. If merge fails (CI failures, merge conflicts, branch protection), report the blocker to the human rather than leaving work unattended.
 2. **Clean up worktree**: If the work was done in a worktree, clean it up after the branch is pushed and the PR is created. Do not wait for merge to clean the worktree.
-3. You MUST spawn **@dev-team-borges** (Librarian) as the final step. Borges will:
+3. You MUST spawn **@dev-team-borges** (Librarian) as the final step. Pass Borges the **finding outcome log**: every finding with its classification, source agent, and outcome (accepted/overruled/ignored), including the human's reasoning for overrules. Borges will:
 - **Extract structured memory entries** from review findings and implementation decisions
+- **Reinforce accepted patterns** in the reviewer's memory (calibration feedback)
+- **Record overruled findings** with context so reviewers generate fewer false positives
+- **Generate calibration rules** when 3+ findings on the same tag are overruled
+- **Record metrics** to `.dev-team/metrics.md` (acceptance rates, rounds to convergence)
 - Write entries to each participating agent's MEMORY.md using the structured format
 - Update shared learnings in `.dev-team/learnings.md`
 - Check cross-agent coherence