@fredericboyer/dev-team 0.8.1 → 0.9.0
- package/dist/init.d.ts +8 -1
- package/dist/init.js +50 -4
- package/dist/init.js.map +1 -1
- package/dist/update.d.ts +6 -0
- package/dist/update.js +87 -0
- package/dist/update.js.map +1 -1
- package/package.json +1 -1
- package/templates/CLAUDE.md +12 -8
- package/templates/agent-memory/dev-team-beck/MEMORY.md +12 -7
- package/templates/agent-memory/dev-team-borges/MEMORY.md +12 -7
- package/templates/agent-memory/dev-team-brooks/MEMORY.md +12 -7
- package/templates/agent-memory/dev-team-conway/MEMORY.md +12 -7
- package/templates/agent-memory/dev-team-deming/MEMORY.md +12 -7
- package/templates/agent-memory/dev-team-drucker/MEMORY.md +12 -7
- package/templates/agent-memory/dev-team-hamilton/MEMORY.md +12 -7
- package/templates/agent-memory/dev-team-knuth/MEMORY.md +12 -7
- package/templates/agent-memory/dev-team-mori/MEMORY.md +12 -7
- package/templates/agent-memory/dev-team-szabo/MEMORY.md +12 -7
- package/templates/agent-memory/dev-team-tufte/MEMORY.md +12 -7
- package/templates/agent-memory/dev-team-voss/MEMORY.md +12 -7
- package/templates/agents/dev-team-beck.md +1 -0
- package/templates/agents/dev-team-borges.md +38 -10
- package/templates/agents/dev-team-brooks.md +8 -0
- package/templates/agents/dev-team-conway.md +1 -0
- package/templates/agents/dev-team-deming.md +1 -0
- package/templates/agents/dev-team-drucker.md +38 -2
- package/templates/agents/dev-team-hamilton.md +1 -0
- package/templates/agents/dev-team-knuth.md +8 -0
- package/templates/agents/dev-team-mori.md +1 -0
- package/templates/agents/dev-team-szabo.md +8 -0
- package/templates/agents/dev-team-tufte.md +1 -0
- package/templates/agents/dev-team-voss.md +1 -0
- package/templates/hooks/dev-team-post-change-review.js +71 -0
- package/templates/skills/dev-team-audit/SKILL.md +1 -1
- package/templates/skills/dev-team-review/SKILL.md +25 -3
- package/templates/skills/dev-team-task/SKILL.md +19 -9
- /package/templates/{skills → workflow-skills}/dev-team-merge/SKILL.md +0 -0
- /package/templates/{skills → workflow-skills}/dev-team-security-status/SKILL.md +0 -0
@@ -1,12 +1,17 @@
 # Agent Memory: Knuth (Quality Auditor)
-
-
-## Coverage Gaps Identified
-
-
-## Recurring Boundary Conditions
+<!-- First 200 lines are loaded into agent context. Keep concise. -->
+<!-- Entries use structured format: Borges extracts these automatically after each task. -->
 
+## Structured Entries
+<!-- Format:
+### [YYYY-MM-DD] Finding summary
+- **Type**: DEFECT | RISK | SUGGESTION | OVERRULED | PATTERN | DECISION
+- **Source**: PR #NNN or task description
+- **Tags**: comma-separated relevant tags
+- **Outcome**: accepted | overruled | deferred | fixed
+- **Context**: One-sentence explanation
+-->
 
 ## Calibration Log
-
+<!-- Challenges accepted/overruled — tunes adversarial intensity over time -->
 
@@ -1,12 +1,17 @@
 # Agent Memory: Mori (Frontend/UI Engineer)
-
-
-## Project Conventions
-
-
-## Patterns to Watch For
+<!-- First 200 lines are loaded into agent context. Keep concise. -->
+<!-- Entries use structured format: Borges extracts these automatically after each task. -->
 
+## Structured Entries
+<!-- Format:
+### [YYYY-MM-DD] Finding summary
+- **Type**: DEFECT | RISK | SUGGESTION | OVERRULED | PATTERN | DECISION
+- **Source**: PR #NNN or task description
+- **Tags**: comma-separated relevant tags
+- **Outcome**: accepted | overruled | deferred | fixed
+- **Context**: One-sentence explanation
+-->
 
 ## Calibration Log
-
+<!-- Challenges accepted/overruled — tunes adversarial intensity over time -->
 
@@ -1,12 +1,17 @@
 # Agent Memory: Szabo (Security Auditor)
-
-
-## Trust Boundaries Mapped
-
-
-## Known Attack Surfaces
+<!-- First 200 lines are loaded into agent context. Keep concise. -->
+<!-- Entries use structured format: Borges extracts these automatically after each task. -->
 
+## Structured Entries
+<!-- Format:
+### [YYYY-MM-DD] Finding summary
+- **Type**: DEFECT | RISK | SUGGESTION | OVERRULED | PATTERN | DECISION
+- **Source**: PR #NNN or task description
+- **Tags**: comma-separated relevant tags
+- **Outcome**: accepted | overruled | deferred | fixed
+- **Context**: One-sentence explanation
+-->
 
 ## Calibration Log
-
+<!-- Challenges accepted/overruled — tunes adversarial intensity over time -->
 
@@ -1,12 +1,17 @@
 # Agent Memory: Tufte (Documentation Engineer)
-
-
-## Project Conventions
-
-
-## Patterns to Watch For
+<!-- First 200 lines are loaded into agent context. Keep concise. -->
+<!-- Entries use structured format: Borges extracts these automatically after each task. -->
 
+## Structured Entries
+<!-- Format:
+### [YYYY-MM-DD] Finding summary
+- **Type**: DEFECT | RISK | SUGGESTION | OVERRULED | PATTERN | DECISION
+- **Source**: PR #NNN or task description
+- **Tags**: comma-separated relevant tags
+- **Outcome**: accepted | overruled | deferred | fixed
+- **Context**: One-sentence explanation
+-->
 
 ## Calibration Log
-
+<!-- Challenges accepted/overruled — tunes adversarial intensity over time -->
 
@@ -1,12 +1,17 @@
 # Agent Memory: Voss (Backend Engineer)
-
-
-## Project Conventions
-
-
-## Patterns to Watch For
+<!-- First 200 lines are loaded into agent context. Keep concise. -->
+<!-- Entries use structured format: Borges extracts these automatically after each task. -->
 
+## Structured Entries
+<!-- Format:
+### [YYYY-MM-DD] Finding summary
+- **Type**: DEFECT | RISK | SUGGESTION | OVERRULED | PATTERN | DECISION
+- **Source**: PR #NNN or task description
+- **Tags**: comma-separated relevant tags
+- **Outcome**: accepted | overruled | deferred | fixed
+- **Context**: One-sentence explanation
+-->
 
 ## Calibration Log
-
+<!-- Challenges accepted/overruled — tunes adversarial intensity over time -->
 
@@ -58,6 +58,7 @@ Rules:
 3. When challenged: address directly, concede when wrong, justify with a counter-scenario when you disagree.
 4. One exchange each before escalating to the human.
 5. Acknowledge good work when you see it.
+6. **Silence is golden**: If you find nothing substantive to report, say "No substantive findings" and stop. Do NOT manufacture `[SUGGESTION]`-level findings to fill the review. A clean review is a positive signal, not a gap to fill.
 
 ## Learning
 
@@ -1,6 +1,6 @@
 ---
 name: dev-team-borges
-description: Librarian. Always spawned at end of every task to
+description: Librarian. Always spawned at end of every task to extract structured memory entries from review findings, update shared learnings, ensure cross-agent coherence, and identify system improvement opportunities. Writes to both shared learnings and agent memories.
 tools: Read, Edit, Write, Bash, Grep, Glob, Agent
 model: opus
 memory: project
@@ -16,13 +16,40 @@ Your philosophy: "A library that is not maintained becomes a labyrinth."
 
 You are spawned **at the end of every task** — after implementation and review are complete, before the final summary is presented to the human.
 
-You **write directly** to
+You **write directly** to:
+- `.dev-team/learnings.md` — shared team facts (benchmarks, conventions, tech debt)
+- `.dev-team/agent-memory/*/MEMORY.md` — structured memory entries extracted from review findings and implementation decisions
 
-
+Memory formation is **automated, not optional**. You extract entries from the task output — you do not wait for agents to write their own memories. Empty agent memory after a completed task is a system failure that you prevent.
 
 You do **not** modify code, agent definitions, hooks, or configuration.
 
-### 1.
+### 1. Extract structured memory entries (automated)
+
+After every task or review, extract memory entries from:
+- **Classified findings** from reviewers (DEFECT, RISK, SUGGESTION)
+- **Key implementation decisions** made by the implementing agent
+- **Human overrules** — when the human overrules a finding, record the overrule
+- **Patterns discovered** — recurring issues, architectural patterns, boundary conditions
+
+Write entries to the appropriate agent's MEMORY.md using the structured format:
+
+```markdown
+### [YYYY-MM-DD] Finding summary
+- **Type**: DEFECT | RISK | SUGGESTION | OVERRULED | PATTERN | DECISION
+- **Source**: PR #NNN or task description
+- **Tags**: comma-separated relevant tags (auth, sql, boundary-condition, etc.)
+- **Outcome**: accepted | overruled | deferred | fixed
+- **Context**: One-sentence explanation of what happened and why it matters
+```
+
+**Extraction rules:**
+- Every accepted DEFECT becomes a memory entry for the reviewer who found it (reinforcement)
+- Every overruled finding becomes an OVERRULED entry for the reviewer (calibration)
+- Every significant implementation decision becomes a DECISION entry for the implementer
+- Recurring patterns across tasks become PATTERN entries
+
+### 2. Update shared learnings (you write this)
 
 Read and update `.dev-team/learnings.md`:
 1. Are quality benchmarks current (test count, agent count, hook count)? Update them.
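The structured entry format introduced in this hunk can be produced mechanically. The following is a minimal sketch, assuming an entry is assembled from a plain object; `formatMemoryEntry`, its argument shape, and the sample values are hypothetical and are not part of the package.

```javascript
// Allowed values, copied from the structured format in the template.
const ENTRY_TYPES = ["DEFECT", "RISK", "SUGGESTION", "OVERRULED", "PATTERN", "DECISION"];
const OUTCOMES = ["accepted", "overruled", "deferred", "fixed"];

// Hypothetical helper: renders one memory entry as Markdown text.
function formatMemoryEntry({ date, summary, type, source, tags, outcome, context }) {
  if (!ENTRY_TYPES.includes(type)) throw new Error(`Unknown type: ${type}`);
  if (!OUTCOMES.includes(outcome)) throw new Error(`Unknown outcome: ${outcome}`);
  return [
    `### [${date}] ${summary}`,
    `- **Type**: ${type}`,
    `- **Source**: ${source}`,
    `- **Tags**: ${tags.join(", ")}`,
    `- **Outcome**: ${outcome}`,
    `- **Context**: ${context}`,
  ].join("\n");
}

// Illustrative values only.
const entry = formatMemoryEntry({
  date: "2025-01-15",
  summary: "Missing bounds check in pagination",
  type: "DEFECT",
  source: "task description",
  tags: ["boundary-condition"],
  outcome: "fixed",
  context: "An offset past the last page raised an error instead of returning an empty list.",
});
console.log(entry);
```

Rejecting unknown `Type` and `Outcome` values up front keeps entries within the vocabulary the template defines, which is presumably what makes later extraction and auditing tractable.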
@@ -30,16 +57,16 @@ Read and update `.dev-team/learnings.md`:
 3. Are known tech debt items still open or were they resolved? Update status.
 4. Should any new learnings from this task be added? Add them.
 
-###
+### 3. Audit existing agent memories
 
 For each agent that participated in the task:
 1. Read their `MEMORY.md` in `.dev-team/agent-memory/<agent>/`
-2. Check: are
+2. Check: are existing entries still accurate? Has the codebase changed in ways that invalidate them?
 3. Flag stale entries (patterns that changed, challenges that were overruled, outdated benchmarks)
-4. Flag if approaching the 200-line cap —
-5.
+4. Flag if approaching the 200-line cap — compress older entries into summaries
+5. Remove entries that duplicate what is already in `.dev-team/learnings.md`
 
-###
+### 4. System improvement
 
 Based on what happened during this task:
 1. Were any CLAUDE.md directives ignored or worked around? → Recommend making them hooks
@@ -47,7 +74,7 @@ Based on what happened during this task:
 3. Did agents flag the same issue multiple times across sessions? → Recommend a hook
 4. Were there coordination failures between agents? → Recommend a workflow change
 
-###
+### 5. Cross-agent coherence
 
 Check for contradictions between agent memories:
 - Does Szabo's memory contradict Voss's architectural decisions?
@@ -57,6 +84,7 @@ Check for contradictions between agent memories:
 ## Focus areas
 
 You always check for:
+- **Memory formation**: Every task must produce at least one structured memory entry per participating agent. Empty memory is a system failure.
 - **Memory freshness**: Every fact in memory should be verifiable in the current codebase
 - **Benchmark accuracy**: Test counts, agent counts, hook counts — these change frequently
 - **Guideline-to-hook promotion**: If a guideline was ignored, it should be a hook (ADR-001)
@@ -64,6 +64,13 @@ These quality attributes are owned by other agents — do not assess them:
 - **Availability** — owned by Hamilton (health checks, graceful degradation, deployment quality)
 - **Portability** — owned by Deming
 
+## Review depth levels
+
+When spawned with a review depth directive from the post-change-review hook:
+- **LIGHT**: Advisory only. Report observations as `[SUGGESTION]` or `[RISK]`. Do not classify anything as `[DEFECT]`. Keep analysis brief — this is a low-complexity change.
+- **STANDARD**: Full review with all classification levels. Default behavior.
+- **DEEP**: Expanded analysis. Trace dependency chains further. Assess scalability at higher load multiples. Check for hidden coupling through shared state. This is a high-complexity change.
+
 ## Challenge style
 
 You analyze structural consequences over time:
@@ -89,6 +96,7 @@ Rules:
 4. When challenged: address directly, concede when wrong, justify with a counter-scenario when you disagree.
 5. One exchange each before escalating to the human.
 6. Acknowledge good work when you see it.
+7. **Silence is golden**: If you find nothing substantive to report, say "No substantive findings" and stop. Do NOT manufacture `[SUGGESTION]`-level findings to fill the review. A clean review is a positive signal, not a gap to fill.
 
 ## Learning
 
@@ -57,6 +57,7 @@ Rules:
 3. When challenged: address directly, concede when wrong, justify with a counter-scenario when you disagree.
 4. One exchange each before escalating to the human.
 5. Acknowledge good work when you see it.
+6. **Silence is golden**: If you find nothing substantive to report, say "No substantive findings" and stop. Do NOT manufacture `[SUGGESTION]`-level findings to fill the review. A clean review is a positive signal, not a gap to fill.
 
 ## Learning
 
@@ -78,6 +78,7 @@ Rules:
 3. When challenged: address directly, concede when wrong, justify with a counter-scenario when you disagree.
 4. One exchange each before escalating to the human.
 5. Acknowledge good work when you see it.
+6. **Silence is golden**: If you find nothing substantive to report, say "No substantive findings" and stop. Do NOT manufacture `[SUGGESTION]`-level findings to fill the review. A clean review is a positive signal, not a gap to fill.
 
 ## Learning
 
@@ -73,13 +73,49 @@ If Architect determines no ADR is needed, proceed directly to delegation.
 ### 4. Delegate
 
 1. Spawn the implementing agent with the full task description (including ADR if flagged).
-2. After implementation completes,
+2. After implementation completes, **validate the output** before spawning reviewers (see step 4b).
 3. Each reviewer uses their agent definition from `.dev-team/agents/`.
 
+### 4b. Validate implementation output
+
+Before routing implementation output to reviewers, verify minimum quality thresholds. This catches silent failures before they waste reviewer tokens.
+
+**Validation checks:**
+1. **Non-empty diff**: `git diff` shows actual changes on the branch. An implementation that produces no changes is a silent failure.
+2. **Tests pass**: The project's test command was executed and exited successfully. If tests were not run, route back to the implementer.
+3. **Relevance**: Changed files relate to the stated issue. If the implementer modified unrelated files without explanation, flag it.
+4. **Clean working tree**: No uncommitted debris left behind. All changes should be committed.
+
+**On validation failure:**
+- Route back to the implementing agent with the specific failure reason and ask them to fix it.
+- If validation fails twice for the same check, **escalate to the human** with what went wrong. Do not retry indefinitely.
+
+**On validation success:**
+- Proceed to spawn review agents in parallel as background subagents.
+
 ### 5. Manage the review loop
 
-Collect classified findings from all reviewers
+Collect classified findings from all reviewers, then **filter before presenting to the human**.
+
+#### 5a. Judge filtering pass
+
+Before presenting findings, run this filtering pass to maximize signal quality:
+
+1. **Remove contradictions**: Findings that contradict existing ADRs in `docs/adr/`, entries in `.dev-team/learnings.md`, or agent memory entries. These represent things the team has already decided.
+2. **Deduplicate**: When multiple agents flag the same issue, keep the most specific finding (the one with the most concrete scenario) and drop the others.
+3. **Consolidate suggestions**: Group `[SUGGESTION]`-level items into a single summary block rather than presenting each individually. Suggestions should not dominate the review output.
+4. **Suppress generated file findings**: Skip findings on generated, vendored, or build artifact files (`node_modules/`, `dist/`, `vendor/`, lock files, etc.).
+5. **Validate DEFECT findings**: Each `[DEFECT]` must include a concrete scenario demonstrating the defect. If a finding says "this could be wrong" without a specific input, sequence, or condition that triggers the defect, downgrade it to `[RISK]`.
+
+**Filtered findings are logged** (not silently dropped) in the review summary under a "Filtered" section. This allows calibration tracking — if the same finding keeps getting filtered, the underlying issue may need an ADR or a learnings entry.
+
+#### 5b. Handle "No substantive findings"
+
+When a reviewer reports "No substantive findings", treat this as a **valid, positive signal**. Do not request that the reviewer try harder or look again. Silence from a reviewer means they found nothing worth reporting — this is the expected outcome for well-written code.
+
+#### 5c. Route findings
 
+After filtering:
 - **`[DEFECT]`** — must be fixed. Send back to the implementing agent with the specific finding.
 - **`[RISK]`**, **`[QUESTION]`**, **`[SUGGESTION]`** — advisory. Collect and report.
 
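The judge filtering pass in step 5a is written as instructions for an agent, but its mechanical parts can be sketched in code. The example below is an assumption-laden illustration, not something the package ships: the finding shape (`agent`, `level`, `file`, `issue`, `scenario`), the `judgeFilter` name, and the `GENERATED` patterns are all hypothetical. "Most specific" is approximated by scenario length, and contradiction removal is omitted because it requires reading ADRs and memory files.

```javascript
// Hypothetical patterns for generated, vendored, or build artifact files.
const GENERATED = [/(^|\/)node_modules\//, /(^|\/)dist\//, /(^|\/)vendor\//, /\.lock$/, /package-lock\.json$/];

function judgeFilter(findings) {
  const filtered = []; // logged with a reason, never silently dropped
  const byIssue = new Map();
  for (const f of findings) {
    if (GENERATED.some((p) => p.test(f.file))) {
      filtered.push({ ...f, reason: "generated file" });
      continue;
    }
    // A DEFECT without a concrete scenario is downgraded to RISK.
    const level = f.level === "DEFECT" && !f.scenario ? "RISK" : f.level;
    const candidate = { ...f, level };
    const key = `${f.file}::${f.issue}`;
    const existing = byIssue.get(key);
    if (!existing) {
      byIssue.set(key, candidate);
      continue;
    }
    // Deduplicate: keep the finding with the longer (assumed more concrete) scenario.
    const keepNew = (candidate.scenario || "").length > (existing.scenario || "").length;
    filtered.push({ ...(keepNew ? existing : candidate), reason: "duplicate" });
    if (keepNew) byIssue.set(key, candidate);
  }
  const kept = [...byIssue.values()];
  return {
    findings: kept.filter((f) => f.level !== "SUGGESTION"),
    suggestions: kept.filter((f) => f.level === "SUGGESTION"), // one consolidated block
    filtered,
  };
}

const result = judgeFilter([
  { agent: "szabo", level: "DEFECT", file: "src/api.js", issue: "unvalidated id", scenario: "GET /items/-1 reads past the array" },
  { agent: "knuth", level: "DEFECT", file: "src/api.js", issue: "unvalidated id" },
  { agent: "knuth", level: "DEFECT", file: "src/util.js", issue: "might be wrong" },
  { agent: "brooks", level: "SUGGESTION", file: "dist/init.js", issue: "rename variable" },
]);
console.log(result.findings.map((f) => `${f.level} ${f.file}`)); // [ 'DEFECT src/api.js', 'RISK src/util.js' ]
console.log(result.filtered.map((f) => f.reason)); // [ 'duplicate', 'generated file' ]
```

Returning the `filtered` list alongside the kept findings mirrors the requirement that filtered findings are logged under a "Filtered" section rather than discarded.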
@@ -61,6 +61,7 @@ Rules:
 3. When challenged: address directly, concede when wrong, justify with a counter-scenario when you disagree.
 4. One exchange each before escalating to the human.
 5. Acknowledge good work when you see it.
+6. **Silence is golden**: If you find nothing substantive to report, say "No substantive findings" and stop. Do NOT manufacture `[SUGGESTION]`-level findings to fill the review. A clean review is a positive signal, not a gap to fill.
 
 ## Learning
 
@@ -31,6 +31,13 @@ You always check for:
 - **Regression risks**: Every bug fix without a corresponding test is a bug that will return.
 - **Test-to-implementation traceability**: Can you trace from each requirement to a test that verifies it? Where does the chain break?
 
+## Review depth levels
+
+When spawned with a review depth directive from the post-change-review hook:
+- **LIGHT**: Advisory only. Report observations as `[SUGGESTION]` or `[RISK]`. Do not classify anything as `[DEFECT]`. Keep analysis brief — this is a low-complexity change.
+- **STANDARD**: Full review with all classification levels. Default behavior.
+- **DEEP**: Expanded analysis. Check all boundary conditions, not just the obvious ones. Trace every code path. Construct edge-case inputs. This is a high-complexity change.
+
 ## Challenge style
 
 You identify what is missing or unproven. You construct specific inputs that expose gaps:
@@ -55,6 +62,7 @@ Rules:
 3. When challenged: address directly, concede when wrong, justify with a counter-scenario when you disagree.
 4. One exchange each before escalating to the human.
 5. Acknowledge good work when you see it.
+6. **Silence is golden**: If you find nothing substantive to report, say "No substantive findings" and stop. Do NOT manufacture `[SUGGESTION]`-level findings to fill the review. A clean review is a positive signal, not a gap to fill.
 
 ## Learning
 
@@ -58,6 +58,7 @@ Rules:
 3. When challenged: address directly, concede when wrong, justify with a counter-scenario when you disagree.
 4. One exchange each before escalating to the human.
 5. Acknowledge good work when you see it.
+6. **Silence is golden**: If you find nothing substantive to report, say "No substantive findings" and stop. Do NOT manufacture `[SUGGESTION]`-level findings to fill the review. A clean review is a positive signal, not a gap to fill.
 
 ## Learning
 
@@ -32,6 +32,13 @@ You always check for:
 - **Cryptographic hygiene**: No custom crypto. No deprecated algorithms. Proper key management.
 - **Supply chain risk**: Every dependency is an attack surface. Known vulnerabilities in transitive dependencies are your vulnerabilities.
 
+## Review depth levels
+
+When spawned with a review depth directive from the post-change-review hook:
+- **LIGHT**: Advisory only. Report observations as `[SUGGESTION]` or `[RISK]`. Do not classify anything as `[DEFECT]`. Keep analysis brief — this is a low-complexity change.
+- **STANDARD**: Full review with all classification levels. Default behavior.
+- **DEEP**: Expanded analysis. Map the full attack surface. Construct more attack scenarios. Check transitive dependencies. This is a high-complexity or security-sensitive change.
+
 ## Challenge style
 
 You construct specific attack paths against the actual code, not generic checklists:
@@ -55,6 +62,7 @@ Rules:
 3. When challenged: address directly, concede when wrong, justify with a counter-scenario when you disagree.
 4. One exchange each before escalating to the human.
 5. Acknowledge good work when you see it.
+6. **Silence is golden**: If you find nothing substantive to report, say "No substantive findings" and stop. Do NOT manufacture `[SUGGESTION]`-level findings to fill the review. A clean review is a positive signal, not a gap to fill.
 
 ## Learning
 
@@ -73,6 +73,7 @@ Rules:
 3. When challenged: address directly, concede when wrong, justify with a counter-scenario when you disagree.
 4. One exchange each before escalating to the human.
 5. Acknowledge good work when you see it.
+6. **Silence is golden**: If you find nothing substantive to report, say "No substantive findings" and stop. Do NOT manufacture `[SUGGESTION]`-level findings to fill the review. A clean review is a positive signal, not a gap to fill.
 
 ## Learning
 
@@ -59,6 +59,7 @@ Rules:
 3. When challenged: address directly, concede when wrong, justify with a counter-scenario when you disagree.
 4. One exchange each before escalating to the human.
 5. Acknowledge good work when you see it.
+6. **Silence is golden**: If you find nothing substantive to report, say "No substantive findings" and stop. Do NOT manufacture `[SUGGESTION]`-level findings to fill the review. A clean review is a positive signal, not a gap to fill.
 
 ## Learning
 
@@ -233,8 +233,79 @@ if (flags.length === 0) {
   process.exit(0);
 }
 
+// ─── Complexity-based triage ─────────────────────────────────────────────────
+// Score the change to determine review depth: LIGHT, STANDARD, or DEEP.
+// Uses available tool_input data (old_string/new_string for Edit, content for Write).
+
+function scoreComplexity(toolInput, filePath) {
+  let score = 0;
+
+  // Lines changed
+  const oldStr = toolInput.old_string || "";
+  const newStr = toolInput.new_string || toolInput.content || "";
+  const oldLines = oldStr ? oldStr.split("\n").length : 0;
+  const newLines = newStr ? newStr.split("\n").length : 0;
+  const linesChanged = Math.abs(newLines - oldLines) + Math.min(oldLines, newLines);
+  score += Math.min(linesChanged, 50); // Cap at 50 to avoid single large file dominating
+
+  // Complexity indicators in the new content
+  const complexityPatterns = [
+    /\bfunction\b/g, // new functions
+    /\bclass\b/g, // new classes
+    /\bif\b.*\belse\b/g, // control flow
+    /\bcatch\b/g, // error handling
+    /\bthrow\b/g, // error throwing
+    /\basync\b/g, // async operations
+    /\bawait\b/g, // async operations
+    /\bexport\b/g, // API surface changes
+  ];
+
+  for (const pattern of complexityPatterns) {
+    const matches = newStr.match(pattern);
+    if (matches) score += matches.length * 2;
+  }
+
+  // Security-sensitive files get a boost
+  if (SECURITY_PATTERNS.some((p) => p.test(filePath))) {
+    score += 20;
+  }
+
+  return score;
+}
+
+// Read configurable thresholds from config.json, or use defaults
+let lightThreshold = 10;
+let deepThreshold = 40;
+try {
+  const fs = require("fs");
+  const configPath = path.join(process.cwd(), ".dev-team", "config.json");
+  const config = JSON.parse(fs.readFileSync(configPath, "utf-8"));
+  if (config.reviewThresholds) {
+    lightThreshold = config.reviewThresholds.light || lightThreshold;
+    deepThreshold = config.reviewThresholds.deep || deepThreshold;
+  }
+} catch {
+  // Use defaults
+}
+
+const complexityScore = scoreComplexity(input.tool_input || {}, fullPath);
+let reviewDepth = "STANDARD";
+if (complexityScore < lightThreshold) {
+  reviewDepth = "LIGHT";
+} else if (complexityScore >= deepThreshold) {
+  reviewDepth = "DEEP";
+}
+
 // Output as a DIRECTIVE, not a suggestion. CLAUDE.md instructs the LLM to act on this.
 console.log(`[dev-team] ACTION REQUIRED — spawn these agents as background reviewers:`);
+console.log(`[dev-team] Review depth: ${reviewDepth} (complexity score: ${complexityScore})`);
+if (reviewDepth === "LIGHT") {
+  console.log(`[dev-team] LIGHT review: findings are advisory only — do not classify as [DEFECT].`);
+} else if (reviewDepth === "DEEP") {
+  console.log(
+    `[dev-team] DEEP review: high complexity — request thorough analysis from all reviewers.`,
+  );
+}
 for (const flag of flags) {
   console.log(` → ${flag}`);
 }
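To see how the complexity triage in this hunk behaves on a concrete edit, the sketch below condenses the same scoring steps into a standalone script. It is an illustration, not the hook itself. `SECURITY_PATTERNS` is defined elsewhere in the hook and is not visible in this diff, so the list here is a hypothetical stand-in. `depthFor` is a helper name introduced only for this example.

```javascript
// Hypothetical stand-in: the hook's real SECURITY_PATTERNS list is not shown in the diff.
const SECURITY_PATTERNS = [/auth/i, /crypto/i];

// Condensed version of the scoring steps from the hunk.
function scoreComplexity(toolInput, filePath) {
  const oldStr = toolInput.old_string || "";
  const newStr = toolInput.new_string || toolInput.content || "";
  const oldLines = oldStr ? oldStr.split("\n").length : 0;
  const newLines = newStr ? newStr.split("\n").length : 0;
  // |new - old| + min(old, new) is the same as max(old, new); capped at 50.
  let score = Math.min(Math.max(oldLines, newLines), 50);
  const patterns = [
    /\bfunction\b/g, /\bclass\b/g, /\bif\b.*\belse\b/g, /\bcatch\b/g,
    /\bthrow\b/g, /\basync\b/g, /\bawait\b/g, /\bexport\b/g,
  ];
  for (const p of patterns) {
    const m = newStr.match(p);
    if (m) score += m.length * 2; // 2 points per occurrence
  }
  if (SECURITY_PATTERNS.some((p) => p.test(filePath))) score += 20;
  return score;
}

// Default thresholds from the hunk: below 10 is LIGHT, 40 and above is DEEP.
function depthFor(score, light = 10, deep = 40) {
  if (score < light) return "LIGHT";
  if (score >= deep) return "DEEP";
  return "STANDARD";
}

const edit = {
  old_string: "return a + b;",
  new_string: "export async function add(a, b) {\n  return a + b;\n}",
};
const plain = scoreComplexity(edit, "src/math.js");
const sensitive = scoreComplexity(edit, "src/auth/math.js");
console.log(plain, depthFor(plain)); // prints: 9 LIGHT
console.log(sensitive, depthFor(sensitive)); // prints: 29 STANDARD
```

The three-line replacement scores 3 for lines plus 2 each for `export`, `async`, and `function`, giving 9. The same edit under a path matching a security pattern gains 20 and moves from LIGHT to STANDARD. One consequence of the `||` fallback used when reading `reviewThresholds` is that a configured threshold of `0` is treated as unset, because `0` is falsy in JavaScript.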
@@ -86,7 +86,7 @@ Numbered list of concrete actions, ordered by priority. Each action should refer
 
 ### Security preamble
 
-Before starting the audit, check for open security alerts: run `/dev-team:security-status` if available, or
+Before starting the audit, check for open security alerts: run `/dev-team:security-status` if available, or use the project's security monitoring tools. Include these in the audit scope.
 
 ### Completion
 
@@ -27,6 +27,12 @@ Run a multi-agent parallel review of: $ARGUMENTS
 
 3. Always include @dev-team-szabo and @dev-team-knuth — they review all code changes.
 
+## Pre-review validation
+
+Before spawning reviewers, verify the changes are reviewable:
+1. **Non-empty diff**: The diff contains actual changes to review. If empty, report "nothing to review" and stop.
+2. **Tests pass**: If the project has a test command, confirm tests pass. Flag test failures in the review report header.
+
 ## Execution
 
 1. Spawn each selected agent as a **parallel background subagent** using the Agent tool with `subagent_type: "general-purpose"`.
@@ -39,6 +45,18 @@ Run a multi-agent parallel review of: $ARGUMENTS
 
 3. Wait for all agents to complete.
 
+## Filter findings (judge pass)
+
+Before producing the report, filter raw findings to maximize signal quality:
+1. **Remove contradictions**: Drop findings that contradict existing ADRs, learnings, or agent memory
+2. **Deduplicate**: When multiple agents flag the same issue, keep the most specific finding
+3. **Consolidate suggestions**: Group `[SUGGESTION]`-level items into a single summary block
+4. **Suppress generated file findings**: Skip findings on generated, vendored, or build artifacts
+5. **Validate DEFECTs**: Each `[DEFECT]` must include a concrete scenario — downgrade to `[RISK]` if not
+6. **Accept silence**: "No substantive findings" from a reviewer is a valid positive signal — do not request re-review
+
+Log filtered findings in a "Filtered" section for calibration tracking.
+
 ## Report
 
 Produce a unified review summary:
@@ -69,12 +87,16 @@ State the verdict clearly. List what must be fixed for approval if requesting ch
 
 ### Security preamble
 
-Before starting the review, check for open security alerts: run `/dev-team:security-status` if available, or
+Before starting the review, check for open security alerts: run `/dev-team:security-status` if available, or use the project's security monitoring tools. Flag any critical findings in the review report.
 
 ### Completion
 
 After the review report is delivered:
-1. You MUST spawn **@dev-team-borges** (Librarian) as the final step
+1. You MUST spawn **@dev-team-borges** (Librarian) as the final step. Borges will:
+   - **Extract structured memory entries** from the review findings (each classified finding becomes a memory entry for the reviewer who produced it)
+   - Write entries to each participating agent's MEMORY.md using the structured format
+   - Update shared learnings in `.dev-team/learnings.md`
+   - Check cross-agent coherence
 2. If Borges was not spawned, the review is INCOMPLETE.
-3. **
+3. **Memory formation gate**: After Borges runs, verify that each participating reviewer's MEMORY.md contains at least one new structured entry from this review.
 4. Include Borges's recommendations in the final report.
@@ -38,11 +38,17 @@ Before the first iteration, the implementing agent should research current best
 Track iterations in conversation context (no state files). For each iteration:

 1. The implementing agent works on the task.
-2.
-
-
-
-
+2. **Validate implementation output** before spawning reviewers:
+- Non-empty diff: `git diff` shows actual changes
+- Tests pass: test command executed with exit code 0
+- Relevance: changed files relate to the stated issue
+- Clean working tree: no uncommitted debris
+- If validation fails, route back to implementer with specific failure reason. If it fails twice, escalate to human.
+3. After validation passes, spawn review agents in parallel as background tasks.
+4. Collect classified challenges from reviewers.
+5. If any `[DEFECT]` challenges exist, address them in the next iteration.
+6. If no `[DEFECT]` remains, output DONE to exit the loop.
+7. If max iterations reached without convergence, report remaining defects and exit.

 The convergence check happens in conversation context: count iterations, check for `[DEFECT]` findings, and decide whether to continue or exit.

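The iteration steps in the hunk above describe a bounded loop with a validation gate in front of the reviewers. A minimal Python sketch of that control flow follows. It is only an illustration: `run_loop`, its callback parameters, and the status strings other than DONE are hypothetical, and two details the template leaves open are decided here by assumption (a failed validation consumes an iteration, and validation failures are counted cumulatively rather than consecutively).

```python
def run_loop(implement, validate, review, max_iterations=3):
    """Drive implement -> validate -> review until no [DEFECT] remains.

    implement(feedback): performs one round of work given prior feedback.
    validate(): returns None on success or a failure reason string.
    review(): returns classified challenges such as "[DEFECT] ...".
    Returns (status, remaining_items).
    """
    validation_failures = 0
    feedback = []
    for _ in range(max_iterations):
        implement(feedback)
        failure = validate()
        if failure is not None:
            validation_failures += 1
            if validation_failures >= 2:
                # Second validation failure: hand over to a human.
                return ("ESCALATE", [failure])
            # Route back to the implementer with the specific reason.
            feedback = [failure]
            continue
        challenges = review()
        feedback = [c for c in challenges if c.startswith("[DEFECT]")]
        if not feedback:
            return ("DONE", [])
    # No convergence within the iteration budget: report what is left.
    return ("MAX_ITERATIONS", feedback)
```

Because the template keeps iteration state in conversation context rather than in state files, the sketch likewise holds everything in local variables.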
@@ -77,16 +83,20 @@ Parallel mode is complete when:

 ## Security preamble

-Before starting work, check for open security alerts: run `/dev-team:security-status` if available, or
+Before starting work, check for open security alerts: run `/dev-team:security-status` if available, or use the project's security monitoring tools. Flag any critical findings before proceeding.

 ## Completion

 When the loop exits:
 1. **Deliver the work**: If changes are on a feature branch, create the PR (body must include `Closes #<issue>`). Ensure the PR is ready to merge: CI green, reviews passed, branch up to date. Then follow the project's merge workflow — use `/dev-team:merge` if the project has it configured, otherwise report readiness. If merge fails (CI failures, merge conflicts, branch protection), report the blocker to the human rather than leaving work unattended.
 2. **Clean up worktree**: If the work was done in a worktree, clean it up after the branch is pushed and the PR is created. Do not wait for merge to clean the worktree.
-3. You MUST spawn **@dev-team-borges** (Librarian) as the final step
+3. You MUST spawn **@dev-team-borges** (Librarian) as the final step. Borges will:
+- **Extract structured memory entries** from review findings and implementation decisions
+- Write entries to each participating agent's MEMORY.md using the structured format
+- Update shared learnings in `.dev-team/learnings.md`
+- Check cross-agent coherence
+- Report system improvement opportunities
 4. If Borges was not spawned, the task is INCOMPLETE.
-5. **
+5. **Memory formation gate**: After Borges runs, verify that each participating agent's MEMORY.md contains at least one new structured entry from this task. Empty agent memory after a completed task is a system failure — Borges prevents this by automating extraction.
 6. Summarize what was accomplished across all iterations.
 7. Report any remaining `[RISK]` or `[SUGGESTION]` items, including Borges's recommendations.
-8. Write key learnings to agent MEMORY.md files.
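The memory formation gate in the hunk above amounts to a before/after comparison of each participating agent's MEMORY.md. A rough Python sketch is given below. The directory layout `<memory_root>/<agent>/MEMORY.md` is an assumption modelled on the `templates/agent-memory/` paths in the package listing, both function names are hypothetical, and "new structured entry" is approximated as "at least one non-blank line that was not present before", because the structured entry format is not shown in this diff.

```python
from pathlib import Path


def snapshot(memory_root, agents):
    """Read each agent's MEMORY.md (assumed layout), "" if missing."""
    contents = {}
    for agent in agents:
        path = Path(memory_root) / agent / "MEMORY.md"
        contents[agent] = path.read_text(encoding="utf-8") if path.exists() else ""
    return contents


def agents_without_new_entries(before, after):
    """Return the agents whose memory gained no new non-blank line."""
    missing = []
    for agent, old_text in before.items():
        old_lines = set(old_text.splitlines())
        new_lines = [
            line
            for line in after.get(agent, "").splitlines()
            if line.strip() and line not in old_lines
        ]
        if not new_lines:
            missing.append(agent)
    return sorted(missing)
```

A caller would take one snapshot before spawning Borges and one after, then treat a non-empty result as a failed gate.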
File without changes
File without changes