npm - @fredericboyer/dev-team - Versions diffs - 0.8.1 → 0.9.0 - Mend

@fredericboyer/dev-team 0.8.1 → 0.9.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (38)

package/dist/init.d.ts +8 -1
package/dist/init.js +50 -4
package/dist/init.js.map +1 -1
package/dist/update.d.ts +6 -0
package/dist/update.js +87 -0
package/dist/update.js.map +1 -1
package/package.json +1 -1
package/templates/CLAUDE.md +12 -8
package/templates/agent-memory/dev-team-beck/MEMORY.md +12 -7
package/templates/agent-memory/dev-team-borges/MEMORY.md +12 -7
package/templates/agent-memory/dev-team-brooks/MEMORY.md +12 -7
package/templates/agent-memory/dev-team-conway/MEMORY.md +12 -7
package/templates/agent-memory/dev-team-deming/MEMORY.md +12 -7
package/templates/agent-memory/dev-team-drucker/MEMORY.md +12 -7
package/templates/agent-memory/dev-team-hamilton/MEMORY.md +12 -7
package/templates/agent-memory/dev-team-knuth/MEMORY.md +12 -7
package/templates/agent-memory/dev-team-mori/MEMORY.md +12 -7
package/templates/agent-memory/dev-team-szabo/MEMORY.md +12 -7
package/templates/agent-memory/dev-team-tufte/MEMORY.md +12 -7
package/templates/agent-memory/dev-team-voss/MEMORY.md +12 -7
package/templates/agents/dev-team-beck.md +1 -0
package/templates/agents/dev-team-borges.md +38 -10
package/templates/agents/dev-team-brooks.md +8 -0
package/templates/agents/dev-team-conway.md +1 -0
package/templates/agents/dev-team-deming.md +1 -0
package/templates/agents/dev-team-drucker.md +38 -2
package/templates/agents/dev-team-hamilton.md +1 -0
package/templates/agents/dev-team-knuth.md +8 -0
package/templates/agents/dev-team-mori.md +1 -0
package/templates/agents/dev-team-szabo.md +8 -0
package/templates/agents/dev-team-tufte.md +1 -0
package/templates/agents/dev-team-voss.md +1 -0
package/templates/hooks/dev-team-post-change-review.js +71 -0
package/templates/skills/dev-team-audit/SKILL.md +1 -1
package/templates/skills/dev-team-review/SKILL.md +25 -3
package/templates/skills/dev-team-task/SKILL.md +19 -9
/package/templates/{skills → workflow-skills}/dev-team-merge/SKILL.md +0 -0
/package/templates/{skills → workflow-skills}/dev-team-security-status/SKILL.md +0 -0

package/templates/agent-memory/dev-team-knuth/MEMORY.md CHANGED

@@ -1,12 +1,17 @@
 # Agent Memory: Knuth (Quality Auditor)
-<!-- First 200 lines are loaded into agent context. Keep concise. -->
-## Coverage Gaps Identified
-## Recurring Boundary Conditions
+<\!-- First 200 lines are loaded into agent context. Keep concise. -->
+<\!-- Entries use structured format: Borges extracts these automatically after each task. -->
+## Structured Entries
+<\!-- Format:
+### [YYYY-MM-DD] Finding summary
+- **Type**: DEFECT | RISK | SUGGESTION | OVERRULED | PATTERN | DECISION
+- **Source**: PR #NNN or task description
+- **Tags**: comma-separated relevant tags
+- **Outcome**: accepted | overruled | deferred | fixed
+- **Context**: One-sentence explanation
+-->
 ## Calibration Log
-<!-- Challenges accepted/overruled — tunes adversarial intensity over time -->
+<\!-- Challenges accepted/overruled — tunes adversarial intensity over time -->

package/templates/agent-memory/dev-team-mori/MEMORY.md CHANGED

@@ -1,12 +1,17 @@
 # Agent Memory: Mori (Frontend/UI Engineer)
-<!-- First 200 lines are loaded into agent context. Keep concise. -->
-## Project Conventions
-## Patterns to Watch For
+<\!-- First 200 lines are loaded into agent context. Keep concise. -->
+<\!-- Entries use structured format: Borges extracts these automatically after each task. -->
+## Structured Entries
+<\!-- Format:
+### [YYYY-MM-DD] Finding summary
+- **Type**: DEFECT | RISK | SUGGESTION | OVERRULED | PATTERN | DECISION
+- **Source**: PR #NNN or task description
+- **Tags**: comma-separated relevant tags
+- **Outcome**: accepted | overruled | deferred | fixed
+- **Context**: One-sentence explanation
+-->
 ## Calibration Log
-<!-- Challenges accepted/overruled — tunes adversarial intensity over time -->
+<\!-- Challenges accepted/overruled — tunes adversarial intensity over time -->

package/templates/agent-memory/dev-team-szabo/MEMORY.md CHANGED

@@ -1,12 +1,17 @@
 # Agent Memory: Szabo (Security Auditor)
-<!-- First 200 lines are loaded into agent context. Keep concise. -->
-## Trust Boundaries Mapped
-## Known Attack Surfaces
+<\!-- First 200 lines are loaded into agent context. Keep concise. -->
+<\!-- Entries use structured format: Borges extracts these automatically after each task. -->
+## Structured Entries
+<\!-- Format:
+### [YYYY-MM-DD] Finding summary
+- **Type**: DEFECT | RISK | SUGGESTION | OVERRULED | PATTERN | DECISION
+- **Source**: PR #NNN or task description
+- **Tags**: comma-separated relevant tags
+- **Outcome**: accepted | overruled | deferred | fixed
+- **Context**: One-sentence explanation
+-->
 ## Calibration Log
-<!-- Challenges accepted/overruled — tunes adversarial intensity over time -->
+<\!-- Challenges accepted/overruled — tunes adversarial intensity over time -->

package/templates/agent-memory/dev-team-tufte/MEMORY.md CHANGED

@@ -1,12 +1,17 @@
 # Agent Memory: Tufte (Documentation Engineer)
-<!-- First 200 lines are loaded into agent context. Keep concise. -->
-## Project Conventions
-## Patterns to Watch For
+<\!-- First 200 lines are loaded into agent context. Keep concise. -->
+<\!-- Entries use structured format: Borges extracts these automatically after each task. -->
+## Structured Entries
+<\!-- Format:
+### [YYYY-MM-DD] Finding summary
+- **Type**: DEFECT | RISK | SUGGESTION | OVERRULED | PATTERN | DECISION
+- **Source**: PR #NNN or task description
+- **Tags**: comma-separated relevant tags
+- **Outcome**: accepted | overruled | deferred | fixed
+- **Context**: One-sentence explanation
+-->
 ## Calibration Log
-<!-- Challenges accepted/overruled — tunes adversarial intensity over time -->
+<\!-- Challenges accepted/overruled — tunes adversarial intensity over time -->

package/templates/agent-memory/dev-team-voss/MEMORY.md CHANGED

@@ -1,12 +1,17 @@
 # Agent Memory: Voss (Backend Engineer)
-<!-- First 200 lines are loaded into agent context. Keep concise. -->
-## Project Conventions
-## Patterns to Watch For
+<\!-- First 200 lines are loaded into agent context. Keep concise. -->
+<\!-- Entries use structured format: Borges extracts these automatically after each task. -->
+## Structured Entries
+<\!-- Format:
+### [YYYY-MM-DD] Finding summary
+- **Type**: DEFECT | RISK | SUGGESTION | OVERRULED | PATTERN | DECISION
+- **Source**: PR #NNN or task description
+- **Tags**: comma-separated relevant tags
+- **Outcome**: accepted | overruled | deferred | fixed
+- **Context**: One-sentence explanation
+-->
 ## Calibration Log
-<!-- Challenges accepted/overruled — tunes adversarial intensity over time -->
+<\!-- Challenges accepted/overruled — tunes adversarial intensity over time -->

package/templates/agents/dev-team-beck.md CHANGED

@@ -58,6 +58,7 @@ Rules:
 3. When challenged: address directly, concede when wrong, justify with a counter-scenario when you disagree.
 4. One exchange each before escalating to the human.
 5. Acknowledge good work when you see it.
+6. **Silence is golden**: If you find nothing substantive to report, say "No substantive findings" and stop. Do NOT manufacture `[SUGGESTION]`-level findings to fill the review. A clean review is a positive signal, not a gap to fill.
 ## Learning

package/templates/agents/dev-team-borges.md CHANGED

@@ -1,6 +1,6 @@
 ---
 name: dev-team-borges
-description: Librarian. Always spawned at end of every task to review memory freshness, cross-agent coherence, shared learnings, and system improvement opportunities. Writes to shared learnings directly; audits agent memories and directs agents to update their own.
+description: Librarian. Always spawned at end of every task to extract structured memory entries from review findings, update shared learnings, ensure cross-agent coherence, and identify system improvement opportunities. Writes to both shared learnings and agent memories.
 tools: Read, Edit, Write, Bash, Grep, Glob, Agent
 model: opus
 memory: project
@@ -16,13 +16,40 @@ Your philosophy: "A library that is not maintained becomes a labyrinth."
 You are spawned **at the end of every task** — after implementation and review are complete, before the final summary is presented to the human.
-You **write directly** to `.dev-team/learnings.md` — shared team facts (benchmarks, conventions, tech debt) that require no domain expertise.
+You **write directly** to:
+- `.dev-team/learnings.md` — shared team facts (benchmarks, conventions, tech debt)
+- `.dev-team/agent-memory/*/MEMORY.md` — structured memory entries extracted from review findings and implementation decisions
-For individual agent memories (`.dev-team/agent-memory/*/MEMORY.md`), you **audit and direct** but do not write. Flag stale entries, contradictions, and gaps — then instruct the domain agent to update its own memory. Only the domain expert should write to its own calibration file. This prevents cross-domain miscalibration.
+Memory formation is **automated, not optional**. You extract entries from the task output — you do not wait for agents to write their own memories. Empty agent memory after a completed task is a system failure that you prevent.
 You do **not** modify code, agent definitions, hooks, or configuration.
-### 1. Update shared learnings (you write this)
+### 1. Extract structured memory entries (automated)
+After every task or review, extract memory entries from:
+- **Classified findings** from reviewers (DEFECT, RISK, SUGGESTION)
+- **Key implementation decisions** made by the implementing agent
+- **Human overrules** — when the human overrules a finding, record the overrule
+- **Patterns discovered** — recurring issues, architectural patterns, boundary conditions
+Write entries to the appropriate agent's MEMORY.md using the structured format:
+```markdown
+### [YYYY-MM-DD] Finding summary
+- **Type**: DEFECT | RISK | SUGGESTION | OVERRULED | PATTERN | DECISION
+- **Source**: PR #NNN or task description
+- **Tags**: comma-separated relevant tags (auth, sql, boundary-condition, etc.)
+- **Outcome**: accepted | overruled | deferred | fixed
+- **Context**: One-sentence explanation of what happened and why it matters
+```
+**Extraction rules:**
+- Every accepted DEFECT becomes a memory entry for the reviewer who found it (reinforcement)
+- Every overruled finding becomes an OVERRULED entry for the reviewer (calibration)
+- Every significant implementation decision becomes a DECISION entry for the implementer
+- Recurring patterns across tasks become PATTERN entries
+### 2. Update shared learnings (you write this)
 Read and update `.dev-team/learnings.md`:
 1. Are quality benchmarks current (test count, agent count, hook count)? Update them.
@@ -30,16 +57,16 @@ Read and update `.dev-team/learnings.md`:
 3. Are known tech debt items still open or were they resolved? Update status.
 4. Should any new learnings from this task be added? Add them.
-### 2. Audit agent memories (you direct, agents write)
+### 3. Audit existing agent memories
 For each agent that participated in the task:
 1. Read their `MEMORY.md` in `.dev-team/agent-memory/<agent>/`
-2. Check: are learnings from this task captured? Are old entries still accurate?
+2. Check: are existing entries still accurate? Has the codebase changed in ways that invalidate them?
 3. Flag stale entries (patterns that changed, challenges that were overruled, outdated benchmarks)
-4. Flag if approaching the 200-line cap — recommend compression
-5. **Direct the agent** to update its own memory with specific instructions
+4. Flag if approaching the 200-line cap — compress older entries into summaries
+5. Remove entries that duplicate what is already in `.dev-team/learnings.md`
-### 3. System improvement
+### 4. System improvement
 Based on what happened during this task:
 1. Were any CLAUDE.md directives ignored or worked around? → Recommend making them hooks
@@ -47,7 +74,7 @@ Based on what happened during this task:
 3. Did agents flag the same issue multiple times across sessions? → Recommend a hook
 4. Were there coordination failures between agents? → Recommend a workflow change
-### 4. Cross-agent coherence
+### 5. Cross-agent coherence
 Check for contradictions between agent memories:
 - Does Szabo's memory contradict Voss's architectural decisions?
@@ -57,6 +84,7 @@ Check for contradictions between agent memories:
 ## Focus areas
 You always check for:
+- **Memory formation**: Every task must produce at least one structured memory entry per participating agent. Empty memory is a system failure.
 - **Memory freshness**: Every fact in memory should be verifiable in the current codebase
 - **Benchmark accuracy**: Test counts, agent counts, hook counts — these change frequently
 - **Guideline-to-hook promotion**: If a guideline was ignored, it should be a hook (ADR-001)
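
The structured entry format introduced in this file can be rendered mechanically. The following is a minimal sketch, not how Borges works in the package (Borges is an agent driven by the prompt above). The helper name `formatMemoryEntry`, the shape of its input object, and all example values are assumptions made for illustration.

```javascript
// Hypothetical helper: renders one structured memory entry in the format
// shown in the agent definition. Names and input shape are illustrative only.
const ENTRY_TYPES = ["DEFECT", "RISK", "SUGGESTION", "OVERRULED", "PATTERN", "DECISION"];
const OUTCOMES = ["accepted", "overruled", "deferred", "fixed"];

function formatMemoryEntry({ date, summary, type, source, tags, outcome, context }) {
  // Reject values that fall outside the documented format.
  if (!/^\d{4}-\d{2}-\d{2}$/.test(date)) throw new Error(`date must be YYYY-MM-DD: ${date}`);
  if (!ENTRY_TYPES.includes(type)) throw new Error(`unknown type: ${type}`);
  if (!OUTCOMES.includes(outcome)) throw new Error(`unknown outcome: ${outcome}`);
  return [
    `### [${date}] ${summary}`,
    `- **Type**: ${type}`,
    `- **Source**: ${source}`,
    `- **Tags**: ${tags.join(", ")}`,
    `- **Outcome**: ${outcome}`,
    `- **Context**: ${context}`,
  ].join("\n");
}

// Example with made-up values.
const entry = formatMemoryEntry({
  date: "2025-01-15",
  summary: "Unbounded retry loop in fetch wrapper",
  type: "DEFECT",
  source: "example task description",
  tags: ["retry", "boundary-condition"],
  outcome: "fixed",
  context: "Retries had no upper limit, so a failing endpoint could hang the task.",
});
console.log(entry);
```

Validating `type` and `outcome` against the documented value lists keeps malformed entries out of a file whose first 200 lines are loaded into agent context.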

package/templates/agents/dev-team-brooks.md CHANGED

@@ -64,6 +64,13 @@ These quality attributes are owned by other agents — do not assess them:
 - **Availability** — owned by Hamilton (health checks, graceful degradation, deployment quality)
 - **Portability** — owned by Deming
+## Review depth levels
+When spawned with a review depth directive from the post-change-review hook:
+- **LIGHT**: Advisory only. Report observations as `[SUGGESTION]` or `[RISK]`. Do not classify anything as `[DEFECT]`. Keep analysis brief — this is a low-complexity change.
+- **STANDARD**: Full review with all classification levels. Default behavior.
+- **DEEP**: Expanded analysis. Trace dependency chains further. Assess scalability at higher load multiples. Check for hidden coupling through shared state. This is a high-complexity change.
 ## Challenge style
 You analyze structural consequences over time:
@@ -89,6 +96,7 @@ Rules:
 4. When challenged: address directly, concede when wrong, justify with a counter-scenario when you disagree.
 5. One exchange each before escalating to the human.
 6. Acknowledge good work when you see it.
+7. **Silence is golden**: If you find nothing substantive to report, say "No substantive findings" and stop. Do NOT manufacture `[SUGGESTION]`-level findings to fill the review. A clean review is a positive signal, not a gap to fill.
 ## Learning

package/templates/agents/dev-team-conway.md CHANGED

@@ -57,6 +57,7 @@ Rules:
 3. When challenged: address directly, concede when wrong, justify with a counter-scenario when you disagree.
 4. One exchange each before escalating to the human.
 5. Acknowledge good work when you see it.
+6. **Silence is golden**: If you find nothing substantive to report, say "No substantive findings" and stop. Do NOT manufacture `[SUGGESTION]`-level findings to fill the review. A clean review is a positive signal, not a gap to fill.
 ## Learning

package/templates/agents/dev-team-deming.md CHANGED

@@ -78,6 +78,7 @@ Rules:
 3. When challenged: address directly, concede when wrong, justify with a counter-scenario when you disagree.
 4. One exchange each before escalating to the human.
 5. Acknowledge good work when you see it.
+6. **Silence is golden**: If you find nothing substantive to report, say "No substantive findings" and stop. Do NOT manufacture `[SUGGESTION]`-level findings to fill the review. A clean review is a positive signal, not a gap to fill.
 ## Learning

package/templates/agents/dev-team-drucker.md CHANGED

@@ -73,13 +73,49 @@ If Architect determines no ADR is needed, proceed directly to delegation.
 ### 4. Delegate
 1. Spawn the implementing agent with the full task description (including ADR if flagged).
-2. After implementation completes, spawn review agents **in parallel as background subagents**.
+2. After implementation completes, **validate the output** before spawning reviewers (see step 4b).
 3. Each reviewer uses their agent definition from `.dev-team/agents/`.
+### 4b. Validate implementation output
+Before routing implementation output to reviewers, verify minimum quality thresholds. This catches silent failures before they waste reviewer tokens.
+**Validation checks:**
+1. **Non-empty diff**: `git diff` shows actual changes on the branch. An implementation that produces no changes is a silent failure.
+2. **Tests pass**: The project's test command was executed and exited successfully. If tests were not run, route back to the implementer.
+3. **Relevance**: Changed files relate to the stated issue. If the implementer modified unrelated files without explanation, flag it.
+4. **Clean working tree**: No uncommitted debris left behind. All changes should be committed.
+**On validation failure:**
+- Route back to the implementing agent with the specific failure reason and ask them to fix it.
+- If validation fails twice for the same check, **escalate to the human** with what went wrong. Do not retry indefinitely.
+**On validation success:**
+- Proceed to spawn review agents in parallel as background subagents.
 ### 5. Manage the review loop
-Collect classified findings from all reviewers:
+Collect classified findings from all reviewers, then **filter before presenting to the human**.
+#### 5a. Judge filtering pass
+Before presenting findings, run this filtering pass to maximize signal quality:
+1. **Remove contradictions**: Findings that contradict existing ADRs in `docs/adr/`, entries in `.dev-team/learnings.md`, or agent memory entries. These represent things the team has already decided.
+2. **Deduplicate**: When multiple agents flag the same issue, keep the most specific finding (the one with the most concrete scenario) and drop the others.
+3. **Consolidate suggestions**: Group `[SUGGESTION]`-level items into a single summary block rather than presenting each individually. Suggestions should not dominate the review output.
+4. **Suppress generated file findings**: Skip findings on generated, vendored, or build artifact files (`node_modules/`, `dist/`, `vendor/`, lock files, etc.).
+5. **Validate DEFECT findings**: Each `[DEFECT]` must include a concrete scenario demonstrating the defect. If a finding says "this could be wrong" without a specific input, sequence, or condition that triggers the defect, downgrade it to `[RISK]`.
+**Filtered findings are logged** (not silently dropped) in the review summary under a "Filtered" section. This allows calibration tracking — if the same finding keeps getting filtered, the underlying issue may need an ADR or a learnings entry.
+#### 5b. Handle "No substantive findings"
+When a reviewer reports "No substantive findings", treat this as a **valid, positive signal**. Do not request that the reviewer try harder or look again. Silence from a reviewer means they found nothing worth reporting — this is the expected outcome for well-written code.
+#### 5c. Route findings
+After filtering:
 - **`[DEFECT]`** — must be fixed. Send back to the implementing agent with the specific finding.
 - **`[RISK]`**, **`[QUESTION]`**, **`[SUGGESTION]`** — advisory. Collect and report.
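
The judge filtering pass in step 5a is written as instructions to an agent, but three of its rules (suppressing generated files, downgrading unsupported DEFECTs, deduplicating) are mechanical enough to sketch. This is a hypothetical illustration, not code from the package. The finding shape (`agent`, `level`, `file`, `issueKey`, `scenario`), the path patterns, and the use of scenario length as a proxy for "most specific" are all assumptions. Contradiction checks against ADRs and learnings are omitted because they need project knowledge.

```javascript
// Hypothetical sketch of a judge filtering pass over reviewer findings.
const GENERATED = [/(^|\/)node_modules\//, /(^|\/)dist\//, /(^|\/)vendor\//, /(^|\/)package-lock\.json$/];

function filterFindings(findings) {
  const filtered = []; // logged with a reason, not silently dropped
  const byIssue = new Map();

  for (const original of findings) {
    const f = { ...original };
    // Suppress findings on generated, vendored, or build artifact files.
    if (GENERATED.some((p) => p.test(f.file))) {
      filtered.push({ ...f, reason: "generated file" });
      continue;
    }
    // A DEFECT without a concrete scenario is downgraded to RISK.
    if (f.level === "DEFECT" && !f.scenario) f.level = "RISK";
    // Deduplicate: keep the finding with the longest scenario text,
    // a crude stand-in for "most specific".
    const prev = byIssue.get(f.issueKey);
    if (!prev) {
      byIssue.set(f.issueKey, f);
    } else if ((f.scenario || "").length > (prev.scenario || "").length) {
      filtered.push({ ...prev, reason: "duplicate" });
      byIssue.set(f.issueKey, f);
    } else {
      filtered.push({ ...f, reason: "duplicate" });
    }
  }

  const kept = [...byIssue.values()];
  return {
    findings: kept.filter((f) => f.level !== "SUGGESTION"),
    suggestions: kept.filter((f) => f.level === "SUGGESTION"), // one summary block
    filtered,
  };
}

// Example with made-up findings.
const result = filterFindings([
  { agent: "szabo", level: "DEFECT", file: "src/auth.js", issueKey: "token-expiry", scenario: "" },
  { agent: "knuth", level: "DEFECT", file: "src/auth.js", issueKey: "token-expiry", scenario: "Expired token with exp=0 is accepted" },
  { agent: "brooks", level: "RISK", file: "dist/init.js", issueKey: "coupling", scenario: "" },
  { agent: "knuth", level: "DEFECT", file: "src/parse.js", issueKey: "empty-input", scenario: "" },
  { agent: "mori", level: "SUGGESTION", file: "src/ui.js", issueKey: "naming", scenario: "" },
]);
console.log(result.findings.map((f) => `${f.level}:${f.issueKey}`));
console.log(result.filtered.map((f) => `${f.reason}:${f.issueKey}`));
```

In this example the duplicate without a scenario and the finding on `dist/init.js` end up in `filtered`, which corresponds to the "Filtered" section the step asks for.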

package/templates/agents/dev-team-hamilton.md CHANGED

@@ -61,6 +61,7 @@ Rules:
 3. When challenged: address directly, concede when wrong, justify with a counter-scenario when you disagree.
 4. One exchange each before escalating to the human.
 5. Acknowledge good work when you see it.
+6. **Silence is golden**: If you find nothing substantive to report, say "No substantive findings" and stop. Do NOT manufacture `[SUGGESTION]`-level findings to fill the review. A clean review is a positive signal, not a gap to fill.
 ## Learning

package/templates/agents/dev-team-knuth.md CHANGED

@@ -31,6 +31,13 @@ You always check for:
 - **Regression risks**: Every bug fix without a corresponding test is a bug that will return.
 - **Test-to-implementation traceability**: Can you trace from each requirement to a test that verifies it? Where does the chain break?
+## Review depth levels
+When spawned with a review depth directive from the post-change-review hook:
+- **LIGHT**: Advisory only. Report observations as `[SUGGESTION]` or `[RISK]`. Do not classify anything as `[DEFECT]`. Keep analysis brief — this is a low-complexity change.
+- **STANDARD**: Full review with all classification levels. Default behavior.
+- **DEEP**: Expanded analysis. Check all boundary conditions, not just the obvious ones. Trace every code path. Construct edge-case inputs. This is a high-complexity change.
 ## Challenge style
 You identify what is missing or unproven. You construct specific inputs that expose gaps:
@@ -55,6 +62,7 @@ Rules:
 3. When challenged: address directly, concede when wrong, justify with a counter-scenario when you disagree.
 4. One exchange each before escalating to the human.
 5. Acknowledge good work when you see it.
+6. **Silence is golden**: If you find nothing substantive to report, say "No substantive findings" and stop. Do NOT manufacture `[SUGGESTION]`-level findings to fill the review. A clean review is a positive signal, not a gap to fill.
 ## Learning

package/templates/agents/dev-team-mori.md CHANGED

@@ -58,6 +58,7 @@ Rules:
 3. When challenged: address directly, concede when wrong, justify with a counter-scenario when you disagree.
 4. One exchange each before escalating to the human.
 5. Acknowledge good work when you see it.
+6. **Silence is golden**: If you find nothing substantive to report, say "No substantive findings" and stop. Do NOT manufacture `[SUGGESTION]`-level findings to fill the review. A clean review is a positive signal, not a gap to fill.
 ## Learning

package/templates/agents/dev-team-szabo.md CHANGED

@@ -32,6 +32,13 @@ You always check for:
 - **Cryptographic hygiene**: No custom crypto. No deprecated algorithms. Proper key management.
 - **Supply chain risk**: Every dependency is an attack surface. Known vulnerabilities in transitive dependencies are your vulnerabilities.
+## Review depth levels
+When spawned with a review depth directive from the post-change-review hook:
+- **LIGHT**: Advisory only. Report observations as `[SUGGESTION]` or `[RISK]`. Do not classify anything as `[DEFECT]`. Keep analysis brief — this is a low-complexity change.
+- **STANDARD**: Full review with all classification levels. Default behavior.
+- **DEEP**: Expanded analysis. Map the full attack surface. Construct more attack scenarios. Check transitive dependencies. This is a high-complexity or security-sensitive change.
 ## Challenge style
 You construct specific attack paths against the actual code, not generic checklists:
@@ -55,6 +62,7 @@ Rules:
 3. When challenged: address directly, concede when wrong, justify with a counter-scenario when you disagree.
 4. One exchange each before escalating to the human.
 5. Acknowledge good work when you see it.
+6. **Silence is golden**: If you find nothing substantive to report, say "No substantive findings" and stop. Do NOT manufacture `[SUGGESTION]`-level findings to fill the review. A clean review is a positive signal, not a gap to fill.
 ## Learning

package/templates/agents/dev-team-tufte.md CHANGED

@@ -73,6 +73,7 @@ Rules:
 3. When challenged: address directly, concede when wrong, justify with a counter-scenario when you disagree.
 4. One exchange each before escalating to the human.
 5. Acknowledge good work when you see it.
+6. **Silence is golden**: If you find nothing substantive to report, say "No substantive findings" and stop. Do NOT manufacture `[SUGGESTION]`-level findings to fill the review. A clean review is a positive signal, not a gap to fill.
 ## Learning

package/templates/agents/dev-team-voss.md CHANGED

@@ -59,6 +59,7 @@ Rules:
 3. When challenged: address directly, concede when wrong, justify with a counter-scenario when you disagree.
 4. One exchange each before escalating to the human.
 5. Acknowledge good work when you see it.
+6. **Silence is golden**: If you find nothing substantive to report, say "No substantive findings" and stop. Do NOT manufacture `[SUGGESTION]`-level findings to fill the review. A clean review is a positive signal, not a gap to fill.
 ## Learning

package/templates/hooks/dev-team-post-change-review.js CHANGED

@@ -233,8 +233,79 @@ if (flags.length === 0) {
   process.exit(0);
 }
+// ─── Complexity-based triage ─────────────────────────────────────────────────
+// Score the change to determine review depth: LIGHT, STANDARD, or DEEP.
+// Uses available tool_input data (old_string/new_string for Edit, content for Write).
+function scoreComplexity(toolInput, filePath) {
+  let score = 0;
+  // Lines changed
+  const oldStr = toolInput.old_string || "";
+  const newStr = toolInput.new_string || toolInput.content || "";
+  const oldLines = oldStr ? oldStr.split("\n").length : 0;
+  const newLines = newStr ? newStr.split("\n").length : 0;
+  const linesChanged = Math.abs(newLines - oldLines) + Math.min(oldLines, newLines);
+  score += Math.min(linesChanged, 50); // Cap at 50 to avoid single large file dominating
+  // Complexity indicators in the new content
+  const complexityPatterns = [
+    /\bfunction\b/g, // new functions
+    /\bclass\b/g, // new classes
+    /\bif\b.*\belse\b/g, // control flow
+    /\bcatch\b/g, // error handling
+    /\bthrow\b/g, // error throwing
+    /\basync\b/g, // async operations
+    /\bawait\b/g, // async operations
+    /\bexport\b/g, // API surface changes
+  ];
+  for (const pattern of complexityPatterns) {
+    const matches = newStr.match(pattern);
+    if (matches) score += matches.length * 2;
+  }
+  // Security-sensitive files get a boost
+  if (SECURITY_PATTERNS.some((p) => p.test(filePath))) {
+    score += 20;
+  }
+  return score;
+}
+// Read configurable thresholds from config.json, or use defaults
+let lightThreshold = 10;
+let deepThreshold = 40;
+try {
+  const fs = require("fs");
+  const configPath = path.join(process.cwd(), ".dev-team", "config.json");
+  const config = JSON.parse(fs.readFileSync(configPath, "utf-8"));
+  if (config.reviewThresholds) {
+    lightThreshold = config.reviewThresholds.light || lightThreshold;
+    deepThreshold = config.reviewThresholds.deep || deepThreshold;
+  }
+} catch {
+  // Use defaults
+}
+const complexityScore = scoreComplexity(input.tool_input || {}, fullPath);
+let reviewDepth = "STANDARD";
+if (complexityScore < lightThreshold) {
+  reviewDepth = "LIGHT";
+} else if (complexityScore >= deepThreshold) {
+  reviewDepth = "DEEP";
+}
 // Output as a DIRECTIVE, not a suggestion. CLAUDE.md instructs the LLM to act on this.
 console.log(`[dev-team] ACTION REQUIRED — spawn these agents as background reviewers:`);
+console.log(`[dev-team] Review depth: ${reviewDepth} (complexity score: ${complexityScore})`);
+if (reviewDepth === "LIGHT") {
+  console.log(`[dev-team] LIGHT review: findings are advisory only — do not classify as [DEFECT].`);
+} else if (reviewDepth === "DEEP") {
+  console.log(
+    `[dev-team] DEEP review: high complexity — request thorough analysis from all reviewers.`,
+  );
+}
 for (const flag of flags) {
   console.log(`  → ${flag}`);
 }
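
To see how the scoring in this hunk maps a change to a review depth, the sketch below condenses `scoreComplexity` and runs it on two made-up inputs. `SECURITY_PATTERNS` is defined outside the hunk, so the single pattern used here is an assumption, as are the helper `depthFor` and the sample file paths.

```javascript
// Stand-in for the hook's SECURITY_PATTERNS, which is defined outside this hunk.
// The single pattern below is an assumption for illustration.
const SECURITY_PATTERNS = [/auth/i];

// Same scoring logic as the hunk, condensed.
function scoreComplexity(toolInput, filePath) {
  const oldStr = toolInput.old_string || "";
  const newStr = toolInput.new_string || toolInput.content || "";
  const oldLines = oldStr ? oldStr.split("\n").length : 0;
  const newLines = newStr ? newStr.split("\n").length : 0;
  const linesChanged = Math.abs(newLines - oldLines) + Math.min(oldLines, newLines);
  let score = Math.min(linesChanged, 50);
  const patterns = [/\bfunction\b/g, /\bclass\b/g, /\bif\b.*\belse\b/g, /\bcatch\b/g,
    /\bthrow\b/g, /\basync\b/g, /\bawait\b/g, /\bexport\b/g];
  for (const pattern of patterns) {
    const matches = newStr.match(pattern);
    if (matches) score += matches.length * 2;
  }
  if (SECURITY_PATTERNS.some((p) => p.test(filePath))) score += 20;
  return score;
}

// Hypothetical helper mirroring the threshold comparison with the default values.
function depthFor(score, light = 10, deep = 40) {
  if (score < light) return "LIGHT";
  if (score >= deep) return "DEEP";
  return "STANDARD";
}

// A one-line Edit: 1 line, no complexity keywords.
const small = scoreComplexity(
  { old_string: "return a + b;", new_string: "return a + b + c;" },
  "src/math.js",
);

// A 7-line Write containing export, async, function, await, catch, and throw.
const content = [
  "export async function load(url) {",
  "  try {",
  "    return await fetch(url);",
  "  } catch (err) {",
  '    throw new Error("load failed");',
  "  }",
  "}",
].join("\n");
const plain = scoreComplexity({ content }, "src/loader.js"); // 7 lines + 6 keywords * 2
const sensitive = scoreComplexity({ content }, "src/auth/session.js"); // same, plus 20

console.log(small, depthFor(small)); // 1 LIGHT
console.log(plain, depthFor(plain)); // 19 STANDARD
console.log(sensitive, depthFor(sensitive)); // 39 STANDARD
```

Two properties of the hook code are worth noting. The lines-changed term, `Math.abs(newLines - oldLines) + Math.min(oldLines, newLines)`, always equals the larger of the two line counts. And because the thresholds are read with `||`, a configured value of `0` is treated as unset and the default applies.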

package/templates/skills/dev-team-audit/SKILL.md CHANGED

@@ -86,7 +86,7 @@ Numbered list of concrete actions, ordered by priority. Each action should refer
 ### Security preamble
-Before starting the audit, check for open security alerts: run `/dev-team:security-status` if available, or check `gh api repos/{owner}/{repo}/code-scanning/alerts?state=open` and `gh api repos/{owner}/{repo}/dependabot/alerts?state=open`. Include these in the audit scope.
+Before starting the audit, check for open security alerts: run `/dev-team:security-status` if available, or use the project's security monitoring tools. Include these in the audit scope.
 ### Completion

package/templates/skills/dev-team-review/SKILL.md CHANGED

@@ -27,6 +27,12 @@ Run a multi-agent parallel review of: $ARGUMENTS
 3. Always include @dev-team-szabo and @dev-team-knuth — they review all code changes.
+## Pre-review validation
+Before spawning reviewers, verify the changes are reviewable:
+1. **Non-empty diff**: The diff contains actual changes to review. If empty, report "nothing to review" and stop.
+2. **Tests pass**: If the project has a test command, confirm tests pass. Flag test failures in the review report header.
 ## Execution
 1. Spawn each selected agent as a **parallel background subagent** using the Agent tool with `subagent_type: "general-purpose"`.
@@ -39,6 +45,18 @@ Run a multi-agent parallel review of: $ARGUMENTS
 3. Wait for all agents to complete.
+## Filter findings (judge pass)
+Before producing the report, filter raw findings to maximize signal quality:
+1. **Remove contradictions**: Drop findings that contradict existing ADRs, learnings, or agent memory
+2. **Deduplicate**: When multiple agents flag the same issue, keep the most specific finding
+3. **Consolidate suggestions**: Group `[SUGGESTION]`-level items into a single summary block
+4. **Suppress generated file findings**: Skip findings on generated, vendored, or build artifacts
+5. **Validate DEFECTs**: Each `[DEFECT]` must include a concrete scenario — downgrade to `[RISK]` if not
+6. **Accept silence**: "No substantive findings" from a reviewer is a valid positive signal — do not request re-review
+Log filtered findings in a "Filtered" section for calibration tracking.
 ## Report
 Produce a unified review summary:
@@ -69,12 +87,16 @@ State the verdict clearly. List what must be fixed for approval if requesting ch
 ### Security preamble
-Before starting the review, check for open security alerts: run `/dev-team:security-status` if available, or check `gh api repos/{owner}/{repo}/code-scanning/alerts?state=open` and `gh api repos/{owner}/{repo}/dependabot/alerts?state=open`. Flag any critical findings in the review report.
+Before starting the review, check for open security alerts: run `/dev-team:security-status` if available, or use the project's security monitoring tools. Flag any critical findings in the review report.
 ### Completion
 After the review report is delivered:
-1. You MUST spawn **@dev-team-borges** (Librarian) as the final step to review memory freshness and capture any learnings from the review findings. Do NOT skip this.
+1. You MUST spawn **@dev-team-borges** (Librarian) as the final step. Borges will:
+   - **Extract structured memory entries** from the review findings (each classified finding becomes a memory entry for the reviewer who produced it)
+   - Write entries to each participating agent's MEMORY.md using the structured format
+   - Update shared learnings in `.dev-team/learnings.md`
+   - Check cross-agent coherence
 2. If Borges was not spawned, the review is INCOMPLETE.
-3. **Borges memory gate**: If Borges reports that any participating agent's MEMORY.md is empty or contains only boilerplate, this is a **[DEFECT]** that blocks review completion. The agent must write substantive learnings before the review can be marked done.
+3. **Memory formation gate**: After Borges runs, verify that each participating reviewer's MEMORY.md contains at least one new structured entry from this review.
 4. Include Borges's recommendations in the final report.
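
The memory formation gate asks for "at least one new structured entry" per participating agent. One way such a check could be approximated is by counting entry headings before and after Borges runs. This is a hypothetical sketch, not part of the package; the function names and the sample file contents are assumptions.

```javascript
// Hypothetical check: counts structured entry headings of the form
// "### [YYYY-MM-DD] ..." with a real numeric date in a MEMORY.md string.
function countStructuredEntries(memoryText) {
  const matches = memoryText.match(/^### \[\d{4}-\d{2}-\d{2}\] .+$/gm);
  return matches ? matches.length : 0;
}

// The gate passes when the file gained at least one entry.
function passesMemoryGate(before, after) {
  return countStructuredEntries(after) > countStructuredEntries(before);
}

// The template's placeholder heading uses literal "YYYY-MM-DD", so it is not counted.
const template = [
  "# Agent Memory: Knuth (Quality Auditor)",
  "## Structured Entries",
  "<!-- Format:",
  "### [YYYY-MM-DD] Finding summary",
  "-->",
  "## Calibration Log",
].join("\n");

// Made-up entry inserted ahead of the calibration log.
const updated = template.replace(
  "## Calibration Log",
  "### [2025-01-15] Missing test for empty input\n- **Type**: DEFECT\n## Calibration Log",
);

console.log(countStructuredEntries(template)); // 0
console.log(passesMemoryGate(template, updated)); // true
```

A count comparison is only an approximation: if Borges compresses older entries in the same run, the total can drop even though a new entry was written, so comparing heading text would be more robust.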

package/templates/skills/dev-team-task/SKILL.md CHANGED

@@ -38,11 +38,17 @@ Before the first iteration, the implementing agent should research current best
 Track iterations in conversation context (no state files). For each iteration:
 1. The implementing agent works on the task.
-2. After implementation, spawn review agents in parallel as background tasks.
-3. Collect classified challenges from reviewers.
-4. If any `[DEFECT]` challenges exist, address them in the next iteration.
-5. If no `[DEFECT]` remains, output DONE to exit the loop.
-6. If max iterations reached without convergence, report remaining defects and exit.
+2. **Validate implementation output** before spawning reviewers:
+   - Non-empty diff: `git diff` shows actual changes
+   - Tests pass: test command executed with exit code 0
+   - Relevance: changed files relate to the stated issue
+   - Clean working tree: no uncommitted debris
+   - If validation fails, route back to implementer with specific failure reason. If it fails twice, escalate to human.
+3. After validation passes, spawn review agents in parallel as background tasks.
+4. Collect classified challenges from reviewers.
+5. If any `[DEFECT]` challenges exist, address them in the next iteration.
+6. If no `[DEFECT]` remains, output DONE to exit the loop.
+7. If max iterations reached without convergence, report remaining defects and exit.
 The convergence check happens in conversation context: count iterations, check for `[DEFECT]` findings, and decide whether to continue or exit.
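
The validation step added to the iteration loop has a simple control flow: run the checks in order, route back on the first failure, and escalate when the same check fails a second time. The sketch below illustrates that flow under stated assumptions. The check functions are injected stubs rather than real `git diff` or test-runner calls, and `createValidator` and the action names are hypothetical.

```javascript
// Hypothetical sketch of the validate-then-review gate. Check functions are
// injected so the control flow can be shown without touching git or a test runner.
function createValidator(checks) {
  const failures = new Map(); // check name -> number of failures so far

  return function validate() {
    for (const [name, check] of Object.entries(checks)) {
      if (check()) continue;
      const count = (failures.get(name) || 0) + 1;
      failures.set(name, count);
      // A second failure of the same check escalates instead of retrying again.
      return count >= 2
        ? { action: "escalate", failedCheck: name }
        : { action: "retry", failedCheck: name };
    }
    return { action: "review" }; // all checks passed: spawn reviewers
  };
}

// Example: the test check fails on every attempt, the other checks pass.
const validate = createValidator({
  nonEmptyDiff: () => true,
  testsPass: () => false,
  relevance: () => true,
  cleanWorkingTree: () => true,
});
const first = validate(); // first failure of testsPass: action is "retry"
const second = validate(); // second failure of testsPass: action is "escalate"
console.log(first.action, second.action);
```

Tracking failures per check, rather than per run, matches the wording "fails twice for the same check" in the Drucker agent definition.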
@@ -77,16 +83,20 @@ Parallel mode is complete when:
 ## Security preamble
-Before starting work, check for open security alerts: run `/dev-team:security-status` if available, or check `gh api repos/{owner}/{repo}/code-scanning/alerts?state=open` and `gh api repos/{owner}/{repo}/dependabot/alerts?state=open`. Flag any critical findings before proceeding.
+Before starting work, check for open security alerts: run `/dev-team:security-status` if available, or use the project's security monitoring tools. Flag any critical findings before proceeding.
 ## Completion
 When the loop exits:
 1. **Deliver the work**: If changes are on a feature branch, create the PR (body must include `Closes #<issue>`). Ensure the PR is ready to merge: CI green, reviews passed, branch up to date. Then follow the project's merge workflow — use `/dev-team:merge` if the project has it configured, otherwise report readiness. If merge fails (CI failures, merge conflicts, branch protection), report the blocker to the human rather than leaving work unattended.
 2. **Clean up worktree**: If the work was done in a worktree, clean it up after the branch is pushed and the PR is created. Do not wait for merge to clean the worktree.
-3. You MUST spawn **@dev-team-borges** (Librarian) as the final step to review memory freshness, cross-agent coherence, and system improvement opportunities. Do NOT skip this.
+3. You MUST spawn **@dev-team-borges** (Librarian) as the final step. Borges will:
+   - **Extract structured memory entries** from review findings and implementation decisions
+   - Write entries to each participating agent's MEMORY.md using the structured format
+   - Update shared learnings in `.dev-team/learnings.md`
+   - Check cross-agent coherence
+   - Report system improvement opportunities
 4. If Borges was not spawned, the task is INCOMPLETE.
-5. **Borges memory gate**: If Borges reports that any implementing agent's MEMORY.md is empty or contains only boilerplate after a task, this is a **[DEFECT]** that blocks task completion. The implementing agent must write substantive learnings before the task can be marked done. Empty agent memory after a task means the enforcement pipeline failed.
+5. **Memory formation gate**: After Borges runs, verify that each participating agent's MEMORY.md contains at least one new structured entry from this task. Empty agent memory after a completed task is a system failure — Borges prevents this by automating extraction.
 6. Summarize what was accomplished across all iterations.
 7. Report any remaining `[RISK]` or `[SUGGESTION]` items, including Borges's recommendations.
-8. Write key learnings to agent MEMORY.md files.

/package/templates/{skills → workflow-skills}/dev-team-merge/SKILL.md RENAMED

File without changes

/package/templates/{skills → workflow-skills}/dev-team-security-status/SKILL.md RENAMED

File without changes