@fredericboyer/dev-team 0.8.1 → 0.9.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (38)
  1. package/dist/init.d.ts +8 -1
  2. package/dist/init.js +50 -4
  3. package/dist/init.js.map +1 -1
  4. package/dist/update.d.ts +6 -0
  5. package/dist/update.js +87 -0
  6. package/dist/update.js.map +1 -1
  7. package/package.json +1 -1
  8. package/templates/CLAUDE.md +12 -8
  9. package/templates/agent-memory/dev-team-beck/MEMORY.md +12 -7
  10. package/templates/agent-memory/dev-team-borges/MEMORY.md +12 -7
  11. package/templates/agent-memory/dev-team-brooks/MEMORY.md +12 -7
  12. package/templates/agent-memory/dev-team-conway/MEMORY.md +12 -7
  13. package/templates/agent-memory/dev-team-deming/MEMORY.md +12 -7
  14. package/templates/agent-memory/dev-team-drucker/MEMORY.md +12 -7
  15. package/templates/agent-memory/dev-team-hamilton/MEMORY.md +12 -7
  16. package/templates/agent-memory/dev-team-knuth/MEMORY.md +12 -7
  17. package/templates/agent-memory/dev-team-mori/MEMORY.md +12 -7
  18. package/templates/agent-memory/dev-team-szabo/MEMORY.md +12 -7
  19. package/templates/agent-memory/dev-team-tufte/MEMORY.md +12 -7
  20. package/templates/agent-memory/dev-team-voss/MEMORY.md +12 -7
  21. package/templates/agents/dev-team-beck.md +1 -0
  22. package/templates/agents/dev-team-borges.md +38 -10
  23. package/templates/agents/dev-team-brooks.md +8 -0
  24. package/templates/agents/dev-team-conway.md +1 -0
  25. package/templates/agents/dev-team-deming.md +1 -0
  26. package/templates/agents/dev-team-drucker.md +38 -2
  27. package/templates/agents/dev-team-hamilton.md +1 -0
  28. package/templates/agents/dev-team-knuth.md +8 -0
  29. package/templates/agents/dev-team-mori.md +1 -0
  30. package/templates/agents/dev-team-szabo.md +8 -0
  31. package/templates/agents/dev-team-tufte.md +1 -0
  32. package/templates/agents/dev-team-voss.md +1 -0
  33. package/templates/hooks/dev-team-post-change-review.js +71 -0
  34. package/templates/skills/dev-team-audit/SKILL.md +1 -1
  35. package/templates/skills/dev-team-review/SKILL.md +25 -3
  36. package/templates/skills/dev-team-task/SKILL.md +19 -9
  37. package/templates/{skills → workflow-skills}/dev-team-merge/SKILL.md +0 -0
  38. package/templates/{skills → workflow-skills}/dev-team-security-status/SKILL.md +0 -0
@@ -1,12 +1,17 @@
  # Agent Memory: Knuth (Quality Auditor)
- <!-- First 200 lines are loaded into agent context. Keep concise. -->
-
- ## Coverage Gaps Identified
-
-
- ## Recurring Boundary Conditions
+ <\!-- First 200 lines are loaded into agent context. Keep concise. -->
+ <\!-- Entries use structured format: Borges extracts these automatically after each task. -->

+ ## Structured Entries
+ <\!-- Format:
+ ### [YYYY-MM-DD] Finding summary
+ - **Type**: DEFECT | RISK | SUGGESTION | OVERRULED | PATTERN | DECISION
+ - **Source**: PR #NNN or task description
+ - **Tags**: comma-separated relevant tags
+ - **Outcome**: accepted | overruled | deferred | fixed
+ - **Context**: One-sentence explanation
+ -->

  ## Calibration Log
- <!-- Challenges accepted/overruled — tunes adversarial intensity over time -->
+ <\!-- Challenges accepted/overruled — tunes adversarial intensity over time -->

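For illustration, a filled-in entry under this template might look like the following — the date, PR number, and finding are invented, not taken from the package:

```markdown
### [2025-06-12] Off-by-one in pagination boundary
- **Type**: DEFECT
- **Source**: PR #123 (hypothetical)
- **Tags**: pagination, boundary-condition
- **Outcome**: fixed
- **Context**: Last page dropped the final item when the total was an exact multiple of page size.
```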
@@ -1,12 +1,17 @@
  # Agent Memory: Mori (Frontend/UI Engineer)
- <!-- First 200 lines are loaded into agent context. Keep concise. -->
-
- ## Project Conventions
-
-
- ## Patterns to Watch For
+ <\!-- First 200 lines are loaded into agent context. Keep concise. -->
+ <\!-- Entries use structured format: Borges extracts these automatically after each task. -->

+ ## Structured Entries
+ <\!-- Format:
+ ### [YYYY-MM-DD] Finding summary
+ - **Type**: DEFECT | RISK | SUGGESTION | OVERRULED | PATTERN | DECISION
+ - **Source**: PR #NNN or task description
+ - **Tags**: comma-separated relevant tags
+ - **Outcome**: accepted | overruled | deferred | fixed
+ - **Context**: One-sentence explanation
+ -->

  ## Calibration Log
- <!-- Challenges accepted/overruled — tunes adversarial intensity over time -->
+ <\!-- Challenges accepted/overruled — tunes adversarial intensity over time -->

@@ -1,12 +1,17 @@
  # Agent Memory: Szabo (Security Auditor)
- <!-- First 200 lines are loaded into agent context. Keep concise. -->
-
- ## Trust Boundaries Mapped
-
-
- ## Known Attack Surfaces
+ <\!-- First 200 lines are loaded into agent context. Keep concise. -->
+ <\!-- Entries use structured format: Borges extracts these automatically after each task. -->

+ ## Structured Entries
+ <\!-- Format:
+ ### [YYYY-MM-DD] Finding summary
+ - **Type**: DEFECT | RISK | SUGGESTION | OVERRULED | PATTERN | DECISION
+ - **Source**: PR #NNN or task description
+ - **Tags**: comma-separated relevant tags
+ - **Outcome**: accepted | overruled | deferred | fixed
+ - **Context**: One-sentence explanation
+ -->

  ## Calibration Log
- <!-- Challenges accepted/overruled — tunes adversarial intensity over time -->
+ <\!-- Challenges accepted/overruled — tunes adversarial intensity over time -->

@@ -1,12 +1,17 @@
  # Agent Memory: Tufte (Documentation Engineer)
- <!-- First 200 lines are loaded into agent context. Keep concise. -->
-
- ## Project Conventions
-
-
- ## Patterns to Watch For
+ <\!-- First 200 lines are loaded into agent context. Keep concise. -->
+ <\!-- Entries use structured format: Borges extracts these automatically after each task. -->

+ ## Structured Entries
+ <\!-- Format:
+ ### [YYYY-MM-DD] Finding summary
+ - **Type**: DEFECT | RISK | SUGGESTION | OVERRULED | PATTERN | DECISION
+ - **Source**: PR #NNN or task description
+ - **Tags**: comma-separated relevant tags
+ - **Outcome**: accepted | overruled | deferred | fixed
+ - **Context**: One-sentence explanation
+ -->

  ## Calibration Log
- <!-- Challenges accepted/overruled — tunes adversarial intensity over time -->
+ <\!-- Challenges accepted/overruled — tunes adversarial intensity over time -->

@@ -1,12 +1,17 @@
  # Agent Memory: Voss (Backend Engineer)
- <!-- First 200 lines are loaded into agent context. Keep concise. -->
-
- ## Project Conventions
-
-
- ## Patterns to Watch For
+ <\!-- First 200 lines are loaded into agent context. Keep concise. -->
+ <\!-- Entries use structured format: Borges extracts these automatically after each task. -->

+ ## Structured Entries
+ <\!-- Format:
+ ### [YYYY-MM-DD] Finding summary
+ - **Type**: DEFECT | RISK | SUGGESTION | OVERRULED | PATTERN | DECISION
+ - **Source**: PR #NNN or task description
+ - **Tags**: comma-separated relevant tags
+ - **Outcome**: accepted | overruled | deferred | fixed
+ - **Context**: One-sentence explanation
+ -->

  ## Calibration Log
- <!-- Challenges accepted/overruled — tunes adversarial intensity over time -->
+ <\!-- Challenges accepted/overruled — tunes adversarial intensity over time -->

@@ -58,6 +58,7 @@ Rules:
  3. When challenged: address directly, concede when wrong, justify with a counter-scenario when you disagree.
  4. One exchange each before escalating to the human.
  5. Acknowledge good work when you see it.
+ 6. **Silence is golden**: If you find nothing substantive to report, say "No substantive findings" and stop. Do NOT manufacture `[SUGGESTION]`-level findings to fill the review. A clean review is a positive signal, not a gap to fill.

  ## Learning

@@ -1,6 +1,6 @@
  ---
  name: dev-team-borges
- description: Librarian. Always spawned at end of every task to review memory freshness, cross-agent coherence, shared learnings, and system improvement opportunities. Writes to shared learnings directly; audits agent memories and directs agents to update their own.
+ description: Librarian. Always spawned at end of every task to extract structured memory entries from review findings, update shared learnings, ensure cross-agent coherence, and identify system improvement opportunities. Writes to both shared learnings and agent memories.
  tools: Read, Edit, Write, Bash, Grep, Glob, Agent
  model: opus
  memory: project
@@ -16,13 +16,40 @@ Your philosophy: "A library that is not maintained becomes a labyrinth."

  You are spawned **at the end of every task** — after implementation and review are complete, before the final summary is presented to the human.

- You **write directly** to `.dev-team/learnings.md` — shared team facts (benchmarks, conventions, tech debt) that require no domain expertise.
+ You **write directly** to:
+ - `.dev-team/learnings.md` — shared team facts (benchmarks, conventions, tech debt)
+ - `.dev-team/agent-memory/*/MEMORY.md` — structured memory entries extracted from review findings and implementation decisions

- For individual agent memories (`.dev-team/agent-memory/*/MEMORY.md`), you **audit and direct** but do not write. Flag stale entries, contradictions, and gaps — then instruct the domain agent to update its own memory. Only the domain expert should write to its own calibration file. This prevents cross-domain miscalibration.
+ Memory formation is **automated, not optional**. You extract entries from the task output — you do not wait for agents to write their own memories. Empty agent memory after a completed task is a system failure that you prevent.

  You do **not** modify code, agent definitions, hooks, or configuration.

- ### 1. Update shared learnings (you write this)
+ ### 1. Extract structured memory entries (automated)
+
+ After every task or review, extract memory entries from:
+ - **Classified findings** from reviewers (DEFECT, RISK, SUGGESTION)
+ - **Key implementation decisions** made by the implementing agent
+ - **Human overrules** — when the human overrules a finding, record the overrule
+ - **Patterns discovered** — recurring issues, architectural patterns, boundary conditions
+
+ Write entries to the appropriate agent's MEMORY.md using the structured format:
+
+ ```markdown
+ ### [YYYY-MM-DD] Finding summary
+ - **Type**: DEFECT | RISK | SUGGESTION | OVERRULED | PATTERN | DECISION
+ - **Source**: PR #NNN or task description
+ - **Tags**: comma-separated relevant tags (auth, sql, boundary-condition, etc.)
+ - **Outcome**: accepted | overruled | deferred | fixed
+ - **Context**: One-sentence explanation of what happened and why it matters
+ ```
+
+ **Extraction rules:**
+ - Every accepted DEFECT becomes a memory entry for the reviewer who found it (reinforcement)
+ - Every overruled finding becomes an OVERRULED entry for the reviewer (calibration)
+ - Every significant implementation decision becomes a DECISION entry for the implementer
+ - Recurring patterns across tasks become PATTERN entries
+
+ ### 2. Update shared learnings (you write this)

  Read and update `.dev-team/learnings.md`:
  1. Are quality benchmarks current (test count, agent count, hook count)? Update them.
@@ -30,16 +57,16 @@ Read and update `.dev-team/learnings.md`:
  3. Are known tech debt items still open or were they resolved? Update status.
  4. Should any new learnings from this task be added? Add them.

- ### 2. Audit agent memories (you direct, agents write)
+ ### 3. Audit existing agent memories

  For each agent that participated in the task:
  1. Read their `MEMORY.md` in `.dev-team/agent-memory/<agent>/`
- 2. Check: are learnings from this task captured? Are old entries still accurate?
+ 2. Check: are existing entries still accurate? Has the codebase changed in ways that invalidate them?
  3. Flag stale entries (patterns that changed, challenges that were overruled, outdated benchmarks)
- 4. Flag if approaching the 200-line cap — recommend compression
- 5. **Direct the agent** to update its own memory with specific instructions
+ 4. Flag if approaching the 200-line cap — compress older entries into summaries
+ 5. Remove entries that duplicate what is already in `.dev-team/learnings.md`

- ### 3. System improvement
+ ### 4. System improvement

  Based on what happened during this task:
  1. Were any CLAUDE.md directives ignored or worked around? → Recommend making them hooks
@@ -47,7 +74,7 @@ Based on what happened during this task:
  3. Did agents flag the same issue multiple times across sessions? → Recommend a hook
  4. Were there coordination failures between agents? → Recommend a workflow change

- ### 4. Cross-agent coherence
+ ### 5. Cross-agent coherence

  Check for contradictions between agent memories:
  - Does Szabo's memory contradict Voss's architectural decisions?
@@ -57,6 +84,7 @@ Check for contradictions between agent memories:
  ## Focus areas

  You always check for:
+ - **Memory formation**: Every task must produce at least one structured memory entry per participating agent. Empty memory is a system failure.
  - **Memory freshness**: Every fact in memory should be verifiable in the current codebase
  - **Benchmark accuracy**: Test counts, agent counts, hook counts — these change frequently
  - **Guideline-to-hook promotion**: If a guideline was ignored, it should be a hook (ADR-001)
@@ -64,6 +64,13 @@ These quality attributes are owned by other agents — do not assess them:
  - **Availability** — owned by Hamilton (health checks, graceful degradation, deployment quality)
  - **Portability** — owned by Deming

+ ## Review depth levels
+
+ When spawned with a review depth directive from the post-change-review hook:
+ - **LIGHT**: Advisory only. Report observations as `[SUGGESTION]` or `[RISK]`. Do not classify anything as `[DEFECT]`. Keep analysis brief — this is a low-complexity change.
+ - **STANDARD**: Full review with all classification levels. Default behavior.
+ - **DEEP**: Expanded analysis. Trace dependency chains further. Assess scalability at higher load multiples. Check for hidden coupling through shared state. This is a high-complexity change.
+
  ## Challenge style

  You analyze structural consequences over time:
@@ -89,6 +96,7 @@ Rules:
  4. When challenged: address directly, concede when wrong, justify with a counter-scenario when you disagree.
  5. One exchange each before escalating to the human.
  6. Acknowledge good work when you see it.
+ 7. **Silence is golden**: If you find nothing substantive to report, say "No substantive findings" and stop. Do NOT manufacture `[SUGGESTION]`-level findings to fill the review. A clean review is a positive signal, not a gap to fill.

  ## Learning

@@ -57,6 +57,7 @@ Rules:
  3. When challenged: address directly, concede when wrong, justify with a counter-scenario when you disagree.
  4. One exchange each before escalating to the human.
  5. Acknowledge good work when you see it.
+ 6. **Silence is golden**: If you find nothing substantive to report, say "No substantive findings" and stop. Do NOT manufacture `[SUGGESTION]`-level findings to fill the review. A clean review is a positive signal, not a gap to fill.

  ## Learning

@@ -78,6 +78,7 @@ Rules:
  3. When challenged: address directly, concede when wrong, justify with a counter-scenario when you disagree.
  4. One exchange each before escalating to the human.
  5. Acknowledge good work when you see it.
+ 6. **Silence is golden**: If you find nothing substantive to report, say "No substantive findings" and stop. Do NOT manufacture `[SUGGESTION]`-level findings to fill the review. A clean review is a positive signal, not a gap to fill.

  ## Learning

@@ -73,13 +73,49 @@ If Architect determines no ADR is needed, proceed directly to delegation.
  ### 4. Delegate

  1. Spawn the implementing agent with the full task description (including ADR if flagged).
- 2. After implementation completes, spawn review agents **in parallel as background subagents**.
+ 2. After implementation completes, **validate the output** before spawning reviewers (see step 4b).
  3. Each reviewer uses their agent definition from `.dev-team/agents/`.

+ ### 4b. Validate implementation output
+
+ Before routing implementation output to reviewers, verify minimum quality thresholds. This catches silent failures before they waste reviewer tokens.
+
+ **Validation checks:**
+ 1. **Non-empty diff**: `git diff` shows actual changes on the branch. An implementation that produces no changes is a silent failure.
+ 2. **Tests pass**: The project's test command was executed and exited successfully. If tests were not run, route back to the implementer.
+ 3. **Relevance**: Changed files relate to the stated issue. If the implementer modified unrelated files without explanation, flag it.
+ 4. **Clean working tree**: No uncommitted debris left behind. All changes should be committed.
+
+ **On validation failure:**
+ - Route back to the implementing agent with the specific failure reason and ask them to fix it.
+ - If validation fails twice for the same check, **escalate to the human** with what went wrong. Do not retry indefinitely.
+
+ **On validation success:**
+ - Proceed to spawn review agents in parallel as background subagents.
+
  ### 5. Manage the review loop

- Collect classified findings from all reviewers:
+ Collect classified findings from all reviewers, then **filter before presenting to the human**.
+
+ #### 5a. Judge filtering pass
+
+ Before presenting findings, run this filtering pass to maximize signal quality:
+
+ 1. **Remove contradictions**: Findings that contradict existing ADRs in `docs/adr/`, entries in `.dev-team/learnings.md`, or agent memory entries. These represent things the team has already decided.
+ 2. **Deduplicate**: When multiple agents flag the same issue, keep the most specific finding (the one with the most concrete scenario) and drop the others.
+ 3. **Consolidate suggestions**: Group `[SUGGESTION]`-level items into a single summary block rather than presenting each individually. Suggestions should not dominate the review output.
+ 4. **Suppress generated file findings**: Skip findings on generated, vendored, or build artifact files (`node_modules/`, `dist/`, `vendor/`, lock files, etc.).
+ 5. **Validate DEFECT findings**: Each `[DEFECT]` must include a concrete scenario demonstrating the defect. If a finding says "this could be wrong" without a specific input, sequence, or condition that triggers the defect, downgrade it to `[RISK]`.
+
+ **Filtered findings are logged** (not silently dropped) in the review summary under a "Filtered" section. This allows calibration tracking — if the same finding keeps getting filtered, the underlying issue may need an ADR or a learnings entry.
+
+ #### 5b. Handle "No substantive findings"
+
+ When a reviewer reports "No substantive findings", treat this as a **valid, positive signal**. Do not request that the reviewer try harder or look again. Silence from a reviewer means they found nothing worth reporting — this is the expected outcome for well-written code.
+
+ #### 5c. Route findings

+ After filtering:
  - **`[DEFECT]`** — must be fixed. Send back to the implementing agent with the specific finding.
  - **`[RISK]`**, **`[QUESTION]`**, **`[SUGGESTION]`** — advisory. Collect and report.

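The validate-then-route flow of step 4b can be sketched as pure routing logic. The check names and the two-strikes escalation rule are taken from the text above; the function, its argument shapes, and the check ordering are illustrative assumptions, not part of the package:

```javascript
// Sketch of the step-4b validation gate. Check results would come from running
// git/test commands; they are passed in here so only the routing rule is shown.
function validateImplementation(checks, failCounts = {}) {
  const order = ["nonEmptyDiff", "testsPass", "relevantFiles", "cleanTree"];
  for (const name of order) {
    if (!checks[name]) {
      failCounts[name] = (failCounts[name] || 0) + 1;
      // Second failure of the same check: stop retrying, escalate to the human
      if (failCounts[name] >= 2) return { action: "escalate", check: name };
      return { action: "route-back", check: name };
    }
  }
  return { action: "spawn-reviewers" };
}

const counts = {};
// First failure: route back to the implementer with the failing check named
console.log(validateImplementation({ nonEmptyDiff: false }, counts).action);
// Same check fails again: escalate instead of retrying indefinitely
console.log(validateImplementation({ nonEmptyDiff: false }, counts).action);
// All checks pass: proceed to reviewers
console.log(
  validateImplementation({
    nonEmptyDiff: true,
    testsPass: true,
    relevantFiles: true,
    cleanTree: true,
  }).action,
);
```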
@@ -61,6 +61,7 @@ Rules:
  3. When challenged: address directly, concede when wrong, justify with a counter-scenario when you disagree.
  4. One exchange each before escalating to the human.
  5. Acknowledge good work when you see it.
+ 6. **Silence is golden**: If you find nothing substantive to report, say "No substantive findings" and stop. Do NOT manufacture `[SUGGESTION]`-level findings to fill the review. A clean review is a positive signal, not a gap to fill.

  ## Learning

@@ -31,6 +31,13 @@ You always check for:
  - **Regression risks**: Every bug fix without a corresponding test is a bug that will return.
  - **Test-to-implementation traceability**: Can you trace from each requirement to a test that verifies it? Where does the chain break?

+ ## Review depth levels
+
+ When spawned with a review depth directive from the post-change-review hook:
+ - **LIGHT**: Advisory only. Report observations as `[SUGGESTION]` or `[RISK]`. Do not classify anything as `[DEFECT]`. Keep analysis brief — this is a low-complexity change.
+ - **STANDARD**: Full review with all classification levels. Default behavior.
+ - **DEEP**: Expanded analysis. Check all boundary conditions, not just the obvious ones. Trace every code path. Construct edge-case inputs. This is a high-complexity change.
+
  ## Challenge style

  You identify what is missing or unproven. You construct specific inputs that expose gaps:
@@ -55,6 +62,7 @@ Rules:
  3. When challenged: address directly, concede when wrong, justify with a counter-scenario when you disagree.
  4. One exchange each before escalating to the human.
  5. Acknowledge good work when you see it.
+ 6. **Silence is golden**: If you find nothing substantive to report, say "No substantive findings" and stop. Do NOT manufacture `[SUGGESTION]`-level findings to fill the review. A clean review is a positive signal, not a gap to fill.

  ## Learning

@@ -58,6 +58,7 @@ Rules:
  3. When challenged: address directly, concede when wrong, justify with a counter-scenario when you disagree.
  4. One exchange each before escalating to the human.
  5. Acknowledge good work when you see it.
+ 6. **Silence is golden**: If you find nothing substantive to report, say "No substantive findings" and stop. Do NOT manufacture `[SUGGESTION]`-level findings to fill the review. A clean review is a positive signal, not a gap to fill.

  ## Learning

@@ -32,6 +32,13 @@ You always check for:
  - **Cryptographic hygiene**: No custom crypto. No deprecated algorithms. Proper key management.
  - **Supply chain risk**: Every dependency is an attack surface. Known vulnerabilities in transitive dependencies are your vulnerabilities.

+ ## Review depth levels
+
+ When spawned with a review depth directive from the post-change-review hook:
+ - **LIGHT**: Advisory only. Report observations as `[SUGGESTION]` or `[RISK]`. Do not classify anything as `[DEFECT]`. Keep analysis brief — this is a low-complexity change.
+ - **STANDARD**: Full review with all classification levels. Default behavior.
+ - **DEEP**: Expanded analysis. Map the full attack surface. Construct more attack scenarios. Check transitive dependencies. This is a high-complexity or security-sensitive change.
+
  ## Challenge style

  You construct specific attack paths against the actual code, not generic checklists:
@@ -55,6 +62,7 @@ Rules:
  3. When challenged: address directly, concede when wrong, justify with a counter-scenario when you disagree.
  4. One exchange each before escalating to the human.
  5. Acknowledge good work when you see it.
+ 6. **Silence is golden**: If you find nothing substantive to report, say "No substantive findings" and stop. Do NOT manufacture `[SUGGESTION]`-level findings to fill the review. A clean review is a positive signal, not a gap to fill.

  ## Learning

@@ -73,6 +73,7 @@ Rules:
  3. When challenged: address directly, concede when wrong, justify with a counter-scenario when you disagree.
  4. One exchange each before escalating to the human.
  5. Acknowledge good work when you see it.
+ 6. **Silence is golden**: If you find nothing substantive to report, say "No substantive findings" and stop. Do NOT manufacture `[SUGGESTION]`-level findings to fill the review. A clean review is a positive signal, not a gap to fill.

  ## Learning

@@ -59,6 +59,7 @@ Rules:
  3. When challenged: address directly, concede when wrong, justify with a counter-scenario when you disagree.
  4. One exchange each before escalating to the human.
  5. Acknowledge good work when you see it.
+ 6. **Silence is golden**: If you find nothing substantive to report, say "No substantive findings" and stop. Do NOT manufacture `[SUGGESTION]`-level findings to fill the review. A clean review is a positive signal, not a gap to fill.

  ## Learning

@@ -233,8 +233,79 @@ if (flags.length === 0) {
    process.exit(0);
  }

+ // ─── Complexity-based triage ─────────────────────────────────────────────────
+ // Score the change to determine review depth: LIGHT, STANDARD, or DEEP.
+ // Uses available tool_input data (old_string/new_string for Edit, content for Write).
+
+ function scoreComplexity(toolInput, filePath) {
+   let score = 0;
+
+   // Lines changed
+   const oldStr = toolInput.old_string || "";
+   const newStr = toolInput.new_string || toolInput.content || "";
+   const oldLines = oldStr ? oldStr.split("\n").length : 0;
+   const newLines = newStr ? newStr.split("\n").length : 0;
+   const linesChanged = Math.abs(newLines - oldLines) + Math.min(oldLines, newLines);
+   score += Math.min(linesChanged, 50); // Cap at 50 to avoid single large file dominating
+
+   // Complexity indicators in the new content
+   const complexityPatterns = [
+     /\bfunction\b/g, // new functions
+     /\bclass\b/g, // new classes
+     /\bif\b.*\belse\b/g, // control flow
+     /\bcatch\b/g, // error handling
+     /\bthrow\b/g, // error throwing
+     /\basync\b/g, // async operations
+     /\bawait\b/g, // async operations
+     /\bexport\b/g, // API surface changes
+   ];
+
+   for (const pattern of complexityPatterns) {
+     const matches = newStr.match(pattern);
+     if (matches) score += matches.length * 2;
+   }
+
+   // Security-sensitive files get a boost
+   if (SECURITY_PATTERNS.some((p) => p.test(filePath))) {
+     score += 20;
+   }
+
+   return score;
+ }
+
+ // Read configurable thresholds from config.json, or use defaults
+ let lightThreshold = 10;
+ let deepThreshold = 40;
+ try {
+   const fs = require("fs");
+   const configPath = path.join(process.cwd(), ".dev-team", "config.json");
+   const config = JSON.parse(fs.readFileSync(configPath, "utf-8"));
+   if (config.reviewThresholds) {
+     lightThreshold = config.reviewThresholds.light || lightThreshold;
+     deepThreshold = config.reviewThresholds.deep || deepThreshold;
+   }
+ } catch {
+   // Use defaults
+ }
+
+ const complexityScore = scoreComplexity(input.tool_input || {}, fullPath);
+ let reviewDepth = "STANDARD";
+ if (complexityScore < lightThreshold) {
+   reviewDepth = "LIGHT";
+ } else if (complexityScore >= deepThreshold) {
+   reviewDepth = "DEEP";
+ }
+
  // Output as a DIRECTIVE, not a suggestion. CLAUDE.md instructs the LLM to act on this.
  console.log(`[dev-team] ACTION REQUIRED — spawn these agents as background reviewers:`);
+ console.log(`[dev-team] Review depth: ${reviewDepth} (complexity score: ${complexityScore})`);
+ if (reviewDepth === "LIGHT") {
+   console.log(`[dev-team] LIGHT review: findings are advisory only — do not classify as [DEFECT].`);
+ } else if (reviewDepth === "DEEP") {
+   console.log(
+     `[dev-team] DEEP review: high complexity — request thorough analysis from all reviewers.`,
+   );
+ }
  for (const flag of flags) {
    console.log(`  → ${flag}`);
  }
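The triage logic in this hunk can be exercised standalone. In the sketch below, `SECURITY_PATTERNS`, the threshold values, `depthFor`, and both sample inputs are assumptions for illustration — the real hook defines its own patterns and reads thresholds from `.dev-team/config.json`:

```javascript
// Standalone sketch of the hook's complexity triage (pattern list abridged).
const SECURITY_PATTERNS = [/auth/i, /crypto/i, /token/i]; // assumed for this sketch

function scoreComplexity(toolInput, filePath) {
  let score = 0;
  const oldStr = toolInput.old_string || "";
  const newStr = toolInput.new_string || toolInput.content || "";
  const oldLines = oldStr ? oldStr.split("\n").length : 0;
  const newLines = newStr ? newStr.split("\n").length : 0;
  // Rough change size: net growth plus the overlapping region, capped at 50
  score += Math.min(Math.abs(newLines - oldLines) + Math.min(oldLines, newLines), 50);
  for (const pattern of [/\bfunction\b/g, /\bclass\b/g, /\bcatch\b/g, /\basync\b/g]) {
    const matches = newStr.match(pattern);
    if (matches) score += matches.length * 2;
  }
  if (SECURITY_PATTERNS.some((p) => p.test(filePath))) score += 20;
  return score;
}

// Map a score onto a depth using the default thresholds (light=10, deep=40)
function depthFor(score, light = 10, deep = 40) {
  return score < light ? "LIGHT" : score >= deep ? "DEEP" : "STANDARD";
}

// A one-character tweak to a README scores low
const small = scoreComplexity({ old_string: "a", new_string: "b" }, "README.md");
// Many new async functions in an auth module score high
const big = scoreComplexity(
  { content: "async function login() {\n".repeat(15) + "}" },
  "src/auth/login.js",
);
console.log(depthFor(small), depthFor(big)); // LIGHT DEEP
```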
@@ -86,7 +86,7 @@ Numbered list of concrete actions, ordered by priority. Each action should refer

  ### Security preamble

- Before starting the audit, check for open security alerts: run `/dev-team:security-status` if available, or check `gh api repos/{owner}/{repo}/code-scanning/alerts?state=open` and `gh api repos/{owner}/{repo}/dependabot/alerts?state=open`. Include these in the audit scope.
+ Before starting the audit, check for open security alerts: run `/dev-team:security-status` if available, or use the project's security monitoring tools. Include these in the audit scope.

  ### Completion

@@ -27,6 +27,12 @@ Run a multi-agent parallel review of: $ARGUMENTS

  3. Always include @dev-team-szabo and @dev-team-knuth — they review all code changes.

+ ## Pre-review validation
+
+ Before spawning reviewers, verify the changes are reviewable:
+ 1. **Non-empty diff**: The diff contains actual changes to review. If empty, report "nothing to review" and stop.
+ 2. **Tests pass**: If the project has a test command, confirm tests pass. Flag test failures in the review report header.
+
  ## Execution

  1. Spawn each selected agent as a **parallel background subagent** using the Agent tool with `subagent_type: "general-purpose"`.
@@ -39,6 +45,18 @@ Run a multi-agent parallel review of: $ARGUMENTS

  3. Wait for all agents to complete.

+ ## Filter findings (judge pass)
+
+ Before producing the report, filter raw findings to maximize signal quality:
+ 1. **Remove contradictions**: Drop findings that contradict existing ADRs, learnings, or agent memory
+ 2. **Deduplicate**: When multiple agents flag the same issue, keep the most specific finding
+ 3. **Consolidate suggestions**: Group `[SUGGESTION]`-level items into a single summary block
+ 4. **Suppress generated file findings**: Skip findings on generated, vendored, or build artifacts
+ 5. **Validate DEFECTs**: Each `[DEFECT]` must include a concrete scenario — downgrade to `[RISK]` if not
+ 6. **Accept silence**: "No substantive findings" from a reviewer is a valid positive signal — do not request re-review
+
+ Log filtered findings in a "Filtered" section for calibration tracking.
+
  ## Report

  Produce a unified review summary:
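The filtering rules above — deduplicating by specificity, downgrading scenario-less DEFECTs, and logging rather than silently dropping — can be sketched as a small judge function. The finding shape (`issue`, `level`, `scenario`) is an assumption for illustration, not the package's actual data model:

```javascript
// Sketch of the judge filtering pass over reviewer findings.
function filterFindings(findings) {
  const kept = [];
  const filtered = []; // logged for calibration tracking, never silently dropped
  const seen = new Map(); // issue key → most specific finding kept so far
  for (const f of findings) {
    // Rule 5: a [DEFECT] without a concrete scenario is downgraded to [RISK]
    const finding = f.level === "DEFECT" && !f.scenario ? { ...f, level: "RISK" } : f;
    // Rule 2: deduplicate — keep the finding with the more specific scenario
    const prev = seen.get(finding.issue);
    if (prev && (prev.scenario || "").length >= (finding.scenario || "").length) {
      filtered.push(finding);
      continue;
    }
    if (prev) {
      kept.splice(kept.indexOf(prev), 1);
      filtered.push(prev);
    }
    seen.set(finding.issue, finding);
    kept.push(finding);
  }
  return { kept, filtered };
}

const { kept, filtered } = filterFindings([
  { issue: "sql-injection", level: "DEFECT", scenario: "id='1 OR 1=1' bypasses filter" },
  { issue: "sql-injection", level: "DEFECT", scenario: "unsafe query" }, // duplicate, vaguer
  { issue: "null-deref", level: "DEFECT" }, // no scenario → downgraded
]);
console.log(kept.map((f) => f.level)); // [ 'DEFECT', 'RISK' ]
```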
@@ -69,12 +87,16 @@ State the verdict clearly. List what must be fixed for approval if requesting ch
 
  ### Security preamble
 
- Before starting the review, check for open security alerts: run `/dev-team:security-status` if available, or check `gh api repos/{owner}/{repo}/code-scanning/alerts?state=open` and `gh api repos/{owner}/{repo}/dependabot/alerts?state=open`. Flag any critical findings in the review report.
+ Before starting the review, check for open security alerts: run `/dev-team:security-status` if available, or use the project's security monitoring tools. Flag any critical findings in the review report.
 
  ### Completion
 
  After the review report is delivered:
- 1. You MUST spawn **@dev-team-borges** (Librarian) as the final step to review memory freshness and capture any learnings from the review findings. Do NOT skip this.
+ 1. You MUST spawn **@dev-team-borges** (Librarian) as the final step. Borges will:
+ - **Extract structured memory entries** from the review findings (each classified finding becomes a memory entry for the reviewer who produced it)
+ - Write entries to each participating agent's MEMORY.md using the structured format
+ - Update shared learnings in `.dev-team/learnings.md`
+ - Check cross-agent coherence
  2. If Borges was not spawned, the review is INCOMPLETE.
- 3. **Borges memory gate**: If Borges reports that any participating agent's MEMORY.md is empty or contains only boilerplate, this is a **[DEFECT]** that blocks review completion. The agent must write substantive learnings before the review can be marked done.
+ 3. **Memory formation gate**: After Borges runs, verify that each participating reviewer's MEMORY.md contains at least one new structured entry from this review.
  4. Include Borges's recommendations in the final report.
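The `gh api` calls dropped from the security preamble remain a useful fallback where the GitHub CLI is available; a sketch, assuming `gh` and `jq` are installed and the repo has code scanning and Dependabot enabled (`{owner}`/`{repo}` placeholders are expanded by `gh` from the current repository):

```shell
# Count open critical alerts from both sources; skip silently if gh is absent
if command -v gh >/dev/null 2>&1; then
  critical_scanning=$(gh api "repos/{owner}/{repo}/code-scanning/alerts?state=open" \
    --jq '[.[] | select(.rule.security_severity_level == "critical")] | length')
  critical_dependabot=$(gh api "repos/{owner}/{repo}/dependabot/alerts?state=open" \
    --jq '[.[] | select(.security_advisory.severity == "critical")] | length')
  echo "open critical alerts: scanning=${critical_scanning:-?} dependabot=${critical_dependabot:-?}"
fi
```

Any nonzero critical count should be flagged in the review report before proceeding.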
@@ -38,11 +38,17 @@ Before the first iteration, the implementing agent should research current best
  Track iterations in conversation context (no state files). For each iteration:
 
  1. The implementing agent works on the task.
- 2. After implementation, spawn review agents in parallel as background tasks.
- 3. Collect classified challenges from reviewers.
- 4. If any `[DEFECT]` challenges exist, address them in the next iteration.
- 5. If no `[DEFECT]` remains, output DONE to exit the loop.
- 6. If max iterations reached without convergence, report remaining defects and exit.
+ 2. **Validate implementation output** before spawning reviewers:
+ - Non-empty diff: `git diff` shows actual changes
+ - Tests pass: test command executed with exit code 0
+ - Relevance: changed files relate to the stated issue
+ - Clean working tree: no uncommitted debris
+ - If validation fails, route back to implementer with specific failure reason. If it fails twice, escalate to human.
+ 3. After validation passes, spawn review agents in parallel as background tasks.
+ 4. Collect classified challenges from reviewers.
+ 5. If any `[DEFECT]` challenges exist, address them in the next iteration.
+ 6. If no `[DEFECT]` remains, output DONE to exit the loop.
+ 7. If max iterations reached without convergence, report remaining defects and exit.
 
  The convergence check happens in conversation context: count iterations, check for `[DEFECT]` findings, and decide whether to continue or exit.
 
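The diff, tests, and working-tree checks from the validation gate can be enforced mechanically before any reviewer is spawned; a sketch, where the test command is a parameter (the default is an assumption, and the relevance check still requires judgment):

```shell
# Pre-review validation gate (sketch). The test command varies per project,
# so it is passed in; relevance checking is left to the orchestrator.
validate_impl() {
  TEST_CMD=${1:-"npm test"}   # default is an assumption
  # Non-empty diff: the implementer must have produced actual changes
  if git diff --quiet && git diff --cached --quiet; then
    echo "FAIL: empty diff"; return 1
  fi
  # Tests pass: the configured test command must exit 0
  sh -c "$TEST_CMD" || { echo "FAIL: tests failing"; return 1; }
  # Clean working tree: no untracked debris left behind
  if git status --porcelain | grep -q '^??'; then
    echo "FAIL: untracked debris"; return 1
  fi
  echo "OK"
}
```

On failure, the gate's message is what gets routed back to the implementer with the specific failure reason.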
@@ -77,16 +83,20 @@ Parallel mode is complete when:
 
  ## Security preamble
 
- Before starting work, check for open security alerts: run `/dev-team:security-status` if available, or check `gh api repos/{owner}/{repo}/code-scanning/alerts?state=open` and `gh api repos/{owner}/{repo}/dependabot/alerts?state=open`. Flag any critical findings before proceeding.
+ Before starting work, check for open security alerts: run `/dev-team:security-status` if available, or use the project's security monitoring tools. Flag any critical findings before proceeding.
 
  ## Completion
 
  When the loop exits:
  1. **Deliver the work**: If changes are on a feature branch, create the PR (body must include `Closes #<issue>`). Ensure the PR is ready to merge: CI green, reviews passed, branch up to date. Then follow the project's merge workflow — use `/dev-team:merge` if the project has it configured, otherwise report readiness. If merge fails (CI failures, merge conflicts, branch protection), report the blocker to the human rather than leaving work unattended.
  2. **Clean up worktree**: If the work was done in a worktree, clean it up after the branch is pushed and the PR is created. Do not wait for merge to clean the worktree.
- 3. You MUST spawn **@dev-team-borges** (Librarian) as the final step to review memory freshness, cross-agent coherence, and system improvement opportunities. Do NOT skip this.
+ 3. You MUST spawn **@dev-team-borges** (Librarian) as the final step. Borges will:
+ - **Extract structured memory entries** from review findings and implementation decisions
+ - Write entries to each participating agent's MEMORY.md using the structured format
+ - Update shared learnings in `.dev-team/learnings.md`
+ - Check cross-agent coherence
+ - Report system improvement opportunities
  4. If Borges was not spawned, the task is INCOMPLETE.
- 5. **Borges memory gate**: If Borges reports that any implementing agent's MEMORY.md is empty or contains only boilerplate after a task, this is a **[DEFECT]** that blocks task completion. The implementing agent must write substantive learnings before the task can be marked done. Empty agent memory after a task means the enforcement pipeline failed.
+ 5. **Memory formation gate**: After Borges runs, verify that each participating agent's MEMORY.md contains at least one new structured entry from this task. Empty agent memory after a completed task is a system failure; Borges prevents this by automating extraction.
  6. Summarize what was accomplished across all iterations.
  7. Report any remaining `[RISK]` or `[SUGGESTION]` items, including Borges's recommendations.
- 8. Write key learnings to agent MEMORY.md files.
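The memory formation gate can be spot-checked with a diff against the pre-task revision; a sketch, where the base ref and the memory file paths are assumptions supplied by the caller:

```shell
# Memory formation gate (sketch): each listed MEMORY.md must differ from its
# state at the base revision, i.e. it gained entries during this task.
memory_gate() {
  base=$1; shift
  status=0
  for f in "$@"; do
    if git diff --quiet "$base" -- "$f"; then
      echo "GATE FAILED: no new memory entries in $f"
      status=1
    fi
  done
  return $status
}
# hypothetical usage: memory_gate origin/main .claude/agent-memory/*/MEMORY.md
```

A nonzero exit means at least one participating agent produced no new memory entries, which should be reported alongside the task summary.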