devflow-kit 1.4.0 → 1.5.0

Files changed (45)
  1. package/CHANGELOG.md +14 -0
  2. package/README.md +2 -1
  3. package/dist/commands/init.js +29 -0
  4. package/dist/commands/list.d.ts +21 -0
  5. package/dist/commands/list.js +71 -3
  6. package/dist/plugins.js +1 -1
  7. package/dist/utils/manifest.d.ts +45 -0
  8. package/dist/utils/manifest.js +100 -0
  9. package/package.json +1 -1
  10. package/plugins/devflow-accessibility/.claude-plugin/plugin.json +1 -1
  11. package/plugins/devflow-ambient/.claude-plugin/plugin.json +1 -1
  12. package/plugins/devflow-ambient/skills/ambient-router/SKILL.md +1 -1
  13. package/plugins/devflow-ambient/skills/ambient-router/references/skill-catalog.md +1 -0
  14. package/plugins/devflow-audit-claude/.claude-plugin/plugin.json +1 -1
  15. package/plugins/devflow-code-review/.claude-plugin/plugin.json +1 -1
  16. package/plugins/devflow-code-review/agents/reviewer.md +42 -5
  17. package/plugins/devflow-code-review/agents/synthesizer.md +12 -5
  18. package/plugins/devflow-code-review/commands/code-review.md +4 -1
  19. package/plugins/devflow-core-skills/.claude-plugin/plugin.json +2 -1
  20. package/plugins/devflow-core-skills/skills/search-first/SKILL.md +133 -0
  21. package/plugins/devflow-core-skills/skills/search-first/references/evaluation-criteria.md +101 -0
  22. package/plugins/devflow-debug/.claude-plugin/plugin.json +1 -1
  23. package/plugins/devflow-frontend-design/.claude-plugin/plugin.json +1 -1
  24. package/plugins/devflow-go/.claude-plugin/plugin.json +1 -1
  25. package/plugins/devflow-implement/.claude-plugin/plugin.json +1 -1
  26. package/plugins/devflow-implement/agents/coder.md +16 -13
  27. package/plugins/devflow-implement/agents/synthesizer.md +12 -5
  28. package/plugins/devflow-implement/commands/implement-teams.md +1 -5
  29. package/plugins/devflow-implement/commands/implement.md +1 -5
  30. package/plugins/devflow-java/.claude-plugin/plugin.json +1 -1
  31. package/plugins/devflow-python/.claude-plugin/plugin.json +1 -1
  32. package/plugins/devflow-react/.claude-plugin/plugin.json +1 -1
  33. package/plugins/devflow-resolve/.claude-plugin/plugin.json +1 -1
  34. package/plugins/devflow-rust/.claude-plugin/plugin.json +1 -1
  35. package/plugins/devflow-self-review/.claude-plugin/plugin.json +1 -1
  36. package/plugins/devflow-specify/.claude-plugin/plugin.json +1 -1
  37. package/plugins/devflow-specify/agents/synthesizer.md +12 -5
  38. package/plugins/devflow-typescript/.claude-plugin/plugin.json +1 -1
  39. package/shared/agents/coder.md +16 -13
  40. package/shared/agents/reviewer.md +42 -5
  41. package/shared/agents/synthesizer.md +12 -5
  42. package/shared/skills/ambient-router/SKILL.md +1 -1
  43. package/shared/skills/ambient-router/references/skill-catalog.md +1 -0
  44. package/shared/skills/search-first/SKILL.md +133 -0
  45. package/shared/skills/search-first/references/evaluation-criteria.md +101 -0
@@ -0,0 +1,133 @@
1
+ ---
2
+ name: search-first
3
+ description: >-
4
+ This skill should be used when the user asks to "add a utility", "create a helper",
5
+ "implement parsing", "build a wrapper", or writes infrastructure/utility code that
6
+ may already exist as a well-maintained package. Enforces research before building.
7
+ user-invocable: false
8
+ allowed-tools: Read, Grep, Glob
9
+ ---
10
+
11
+ # Search-First
12
+
13
+ Research before building. Check if a battle-tested solution exists before writing custom utility code.
14
+
15
+ ## Iron Law
16
+
17
+ > **RESEARCH BEFORE BUILDING**
18
+ >
19
+ > Never write custom utility code without first checking if a battle-tested solution
20
+ > exists. The best code is code you don't write. A maintained package with thousands
21
+ > of users will usually beat a hand-rolled utility in reliability, edge-case coverage, and
22
+ > long-term maintenance.
23
+
24
+ ## When This Skill Activates
25
+
26
+ **Triggers** — creating or modifying code that:
27
+ - Parses, formats, or validates data (dates, URLs, emails, UUIDs, etc.)
28
+ - Implements common algorithms (sorting, diffing, hashing, encoding)
29
+ - Wraps HTTP clients, retries, rate limiting, caching
30
+ - Handles file system operations beyond basic read/write
31
+ - Implements CLI argument parsing, logging, or configuration
32
+ - Creates test utilities (mocking, fixtures, assertions)
33
+
34
+ **Does NOT trigger** for:
35
+ - Domain-specific business logic unique to this project
36
+ - Glue code connecting existing components
37
+ - Trivial operations (< 5 lines, single-use)
38
+ - Code that intentionally avoids external dependencies (e.g., zero-dep libraries)
39
+
40
+ ---
41
+
42
+ ## Research Process
43
+
44
+ ### Phase 1: Need Analysis
45
+
46
+ Before searching, define what you actually need:
47
+
48
+ ```
49
+ Need: {one-sentence description of the capability}
50
+ Constraints: {runtime, bundle size, license, zero-dep requirement}
51
+ Must-haves: {non-negotiable requirements}
52
+ Nice-to-haves: {optional features}
53
+ ```
54
+
55
+ ### Phase 2: Search
56
+
57
+ Delegate research to an Explore subagent to keep the main session context clean.
58
+
59
+ **Spawn an Explore agent** with this prompt template:
60
+
61
+ ```
62
+ Task(subagent_type="Explore"):
63
+ "Research existing solutions for: {need description}
64
+
65
+ Search for:
66
+ 1. npm/PyPI/crates packages that solve this (check package.json/requirements.txt for ecosystem)
67
+ 2. Existing utilities in this codebase (grep for related function names)
68
+ 3. Framework built-ins that already handle this
69
+
70
+ For each candidate, find:
71
+ - Package name and weekly downloads (if applicable)
72
+ - Last publish date and maintenance status
73
+ - Bundle size / dependency count
74
+ - API surface relevant to our need
75
+ - License compatibility
76
+
77
+ Return top 3 candidates with pros/cons, or confirm nothing suitable exists."
78
+ ```
79
+
80
+ ### Phase 3: Evaluate
81
+
82
+ Score each candidate against evaluation criteria. See `references/evaluation-criteria.md` for the full matrix.
83
+
84
+ Quick checklist:
85
+ - [ ] Last published within 12 months
86
+ - [ ] Weekly downloads > 1,000 (npm) or equivalent traction
87
+ - [ ] No known vulnerabilities (check Snyk/npm audit)
88
+ - [ ] API fits the use case without heavy wrapping
89
+ - [ ] License compatible with project (MIT/Apache/BSD preferred)
90
+ - [ ] Bundle size acceptable for the project context
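
The quick checklist above can be read as six independent pass/fail checks. The following is a minimal sketch of that reading and is not part of the package: the `CandidateFacts` shape, the `maxBundleKb` budget, and the list of preferred SPDX identifiers are all assumptions introduced for illustration.

```typescript
// Hypothetical record of the facts gathered during Phase 2 (field names are assumptions).
interface CandidateFacts {
  name: string;
  lastPublished: Date;
  weeklyDownloads: number;
  knownVulnerabilities: number;
  needsHeavyWrapping: boolean;
  license: string;
  bundleSizeKb: number;
}

// Assumed SPDX identifiers standing in for "MIT/Apache/BSD preferred".
const PREFERRED_LICENSES = ["MIT", "Apache-2.0", "BSD-2-Clause", "BSD-3-Clause"];

// Returns the checklist items the candidate fails; an empty array means all six pass.
function quickChecklistFailures(
  candidate: CandidateFacts,
  now: Date,
  maxBundleKb: number, // project-specific budget, not a number taken from the skill
): string[] {
  const failures: string[] = [];

  // "Last published within 12 months": compare against the same date one year earlier.
  const cutoff = new Date(now.getTime());
  cutoff.setUTCFullYear(cutoff.getUTCFullYear() - 1);
  if (candidate.lastPublished.getTime() < cutoff.getTime()) {
    failures.push("not published within the last 12 months");
  }
  if (candidate.weeklyDownloads <= 1000) {
    failures.push("weekly downloads not above 1,000");
  }
  if (candidate.knownVulnerabilities > 0) {
    failures.push("has known vulnerabilities");
  }
  if (candidate.needsHeavyWrapping) {
    failures.push("API needs heavy wrapping");
  }
  if (PREFERRED_LICENSES.indexOf(candidate.license) === -1) {
    failures.push("license is not on the preferred list; review manually");
  }
  if (candidate.bundleSizeKb > maxBundleKb) {
    failures.push("bundle size exceeds the project budget");
  }
  return failures;
}
```

Returning the list of failed items rather than a single boolean keeps the reason for a rejection visible, which is useful when the decision is later documented at the usage site.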
91
+
92
+ ### Phase 4: Decide
93
+
94
+ Choose one of four outcomes:
95
+
96
+ | Decision | When | Action |
97
+ |----------|------|--------|
98
+ | **Adopt** | Exact match, well-maintained, good API | Install and use directly |
99
+ | **Extend** | Partial match, needs thin wrapper | Install + write minimal adapter |
100
+ | **Compose** | No single package fits, but 2-3 small ones combine well | Install multiple, write glue code |
101
+ | **Build** | Nothing fits, or dependency cost exceeds value | Write custom, document why |
102
+
103
+ **Document the decision** in a code comment at the usage site:
104
+
105
+ ```typescript
106
+ // search-first: Adopted date-fns for date formatting (2M weekly downloads, 30KB)
107
+ // search-first: Built custom — no package handles our specific wire format
108
+ ```
109
+
110
+ ---
111
+
112
+ ## Anti-Patterns
113
+
114
+ | Anti-Pattern | Correct Approach |
115
+ |-------------|-----------------|
116
+ | Adding a dependency for 5 lines of trivial code | Build — dependency overhead exceeds value |
117
+ | Choosing the most popular package without checking fit | Evaluate API fit, not just popularity |
118
+ | Wrapping a package so heavily it obscures the original | If wrapping > 50% of original API, reconsider |
119
+ | Skipping research because "I know how to build this" | Research anyway — maintenance cost matters more than initial build |
120
+ | Installing a massive framework for one utility function | Look for focused, single-purpose packages |
121
+
122
+ ## Scope Limiter
123
+
124
+ This skill concerns **utility and infrastructure code** only:
125
+ - Data transformation, validation, formatting
126
+ - Network operations, retries, caching
127
+ - CLI tooling, logging, configuration
128
+ - Test utilities and helpers
129
+
130
+ It does NOT apply to **domain-specific business logic** where:
131
+ - The logic encodes unique business rules
132
+ - No generic solution could exist
133
+ - The code is inherently project-specific
@@ -0,0 +1,101 @@
1
+ # Search-First — Evaluation Criteria
2
+
3
+ Detailed package evaluation criteria and decision matrix for the 4-outcome model.
4
+
5
+ ## Evaluation Matrix
6
+
7
+ Score each candidate on these axes (1-5 scale). License is a pass/fail gate, so the five scored axes give the totals out of 25 used below:
8
+
9
+ | Criterion | Weight | 1 (Poor) | 3 (Acceptable) | 5 (Excellent) |
10
+ |-----------|--------|-----------|-----------------|----------------|
11
+ | **Maintenance** | High | No commits in 2+ years | Active, yearly releases | Regular releases, responsive maintainer |
12
+ | **Adoption** | Medium | < 100 weekly downloads | 1K-10K weekly downloads | > 100K weekly downloads |
13
+ | **API Fit** | High | Needs heavy wrapping | Partial fit, thin adapter needed | Direct use, clean API |
14
+ | **Bundle Size** | Medium | > 500KB | 50-500KB | < 50KB |
15
+ | **Security** | High | Known vulnerabilities | No known issues, few dependencies | Audited, zero/minimal dependencies |
16
+ | **License** | Required | GPL/AGPL (restrictive) | LGPL (conditional) | MIT/Apache/BSD (permissive) |
17
+
18
+ **Minimum thresholds**: License must be compatible. Security must be ≥ 3. All others are trade-offs.
19
+
20
+ ## Decision Matrix
21
+
22
+ ### Adopt (score ≥ 20/25, API Fit ≥ 4)
23
+
24
+ The package directly solves the problem with minimal integration code.
25
+
26
+ **Example**: Using `zod` for schema validation — exact fit, massive adoption, tiny bundle.
27
+
28
+ ```
29
+ ✅ Adopt: zod v3.22
30
+ - Maintenance: 5 (monthly releases)
31
+ - Adoption: 5 (4M weekly downloads)
32
+ - API Fit: 5 (direct use for all validation)
33
+ - Bundle Size: 4 (57KB)
34
+ - Security: 5 (zero dependencies)
35
+ - Total: 24/25
36
+ ```
37
+
38
+ ### Extend (score ≥ 15/25, API Fit ≥ 2)
39
+
40
+ The package handles 60-80% of the need. Write a thin adapter for the rest.
41
+
42
+ **Example**: Using `got` for HTTP but wrapping it with project-specific retry and auth logic.
43
+
44
+ ```
45
+ ✅ Extend: got v14
46
+ - Maintenance: 4 (active)
47
+ - Adoption: 5 (8M weekly downloads)
48
+ - API Fit: 3 (need custom retry wrapper)
49
+ - Bundle Size: 3 (150KB)
50
+ - Security: 4 (minimal deps)
51
+ - Total: 19/25
52
+ Adapter: ~30 lines wrapping retry + auth headers
53
+ ```
54
+
55
+ ### Compose (no single package fits, but small packages combine)
56
+
57
+ Two or three focused packages together solve the problem better than one large framework.
58
+
59
+ **Example**: `ms` (time parsing) + `p-retry` (retry logic) + `quick-lru` (caching) instead of a monolithic HTTP client framework.
60
+
61
+ **Rules for Compose**:
62
+ - Maximum 3 packages in a composition
63
+ - Each package must be focused (single responsibility)
64
+ - Total combined bundle < what a monolithic alternative would cost
65
+ - Glue code should be < 50 lines
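
Three of the four Compose rules above are mechanical and can be checked in a few lines. The sketch below is illustrative only and not part of the package: `ComposedPackage`, `composeViolations`, and its parameters are hypothetical names, and the "focused, single responsibility" rule is left to human judgment.

```typescript
// Hypothetical description of one package in a proposed composition.
interface ComposedPackage {
  name: string;
  bundleSizeKb: number;
}

// Returns the Compose rules a proposal violates (empty array = acceptable).
// The "single responsibility" rule is a judgment call and is not checked here.
function composeViolations(
  packages: ComposedPackage[],
  glueCodeLines: number,
  monolithBundleKb: number,
): string[] {
  const violations: string[] = [];

  if (packages.length > 3) {
    violations.push("more than 3 packages in the composition");
  }

  const combinedKb = packages.reduce((sum, p) => sum + p.bundleSizeKb, 0);
  if (combinedKb >= monolithBundleKb) {
    violations.push("combined bundle is not smaller than the monolithic alternative");
  }

  if (glueCodeLines >= 50) {
    violations.push("glue code is 50 lines or more");
  }

  return violations;
}
```

Any non-empty result suggests reconsidering the composition, for example by falling back to Extend with a single package or to Build.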
66
+
67
+ ### Build (nothing fits, or dependency cost > value)
68
+
69
+ Write custom code when:
70
+ - No package scores ≥ 15/25
71
+ - The code is < 50 lines and trivial
72
+ - Zero-dependency constraint is explicit
73
+ - The domain is too specific for generic packages
74
+
75
+ **Required**: Document why Build was chosen:
76
+
77
+ ```typescript
78
+ // search-first: Built custom — our wire format uses non-standard
79
+ // ISO-8601 extensions that no date library handles correctly.
80
+ // Evaluated: date-fns (no custom format support), luxon (500KB overhead),
81
+ // dayjs (close but missing timezone edge case).
82
+ ```
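
The Adopt and Extend thresholds, together with the minimum thresholds from the Evaluation Matrix, can be sketched as a small classifier. This is an illustration under stated assumptions, not part of the package: it assumes the total out of 25 is the sum of the five scored axes with License handled as a separate pass/fail gate (which is what the zod and got examples imply), and the names `Scores`, `Verdict`, `totalScore`, and `classify` are hypothetical.

```typescript
// The five scored axes from the Evaluation Matrix, each on a 1-5 scale.
interface Scores {
  maintenance: number;
  adoption: number;
  apiFit: number;
  bundleSize: number;
  security: number;
}

type Verdict = "adopt" | "extend" | "reject";

function totalScore(s: Scores): number {
  return s.maintenance + s.adoption + s.apiFit + s.bundleSize + s.security;
}

// Classifies a single candidate. "reject" means this package alone does not
// qualify; choosing between Compose and Build is left to the reviewer.
function classify(s: Scores, licenseCompatible: boolean): Verdict {
  // Minimum thresholds: compatible license and Security >= 3.
  if (!licenseCompatible || s.security < 3) return "reject";

  const total = totalScore(s);
  if (total >= 20 && s.apiFit >= 4) return "adopt";
  if (total >= 15 && s.apiFit >= 2) return "extend";
  return "reject";
}

// Scores copied from the zod and got examples above.
const zod: Scores = { maintenance: 5, adoption: 5, apiFit: 5, bundleSize: 4, security: 5 };
const got: Scores = { maintenance: 4, adoption: 5, apiFit: 3, bundleSize: 3, security: 4 };
```

With these inputs, `classify(zod, true)` yields `"adopt"` (24/25, API Fit 5) and `classify(got, true)` yields `"extend"` (19/25, API Fit 3), matching the two worked examples.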
83
+
84
+ ## Ecosystem-Specific Hints
85
+
86
+ ### Node.js / TypeScript
87
+ - Check npm: `https://www.npmjs.com/package/{name}`
88
+ - Bundle size: `https://bundlephobia.com/package/{name}`
89
+ - Check if Node.js built-ins handle it (`node:crypto`, `node:url`, `node:path`)
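
As an illustration of the built-ins hint above, the sketch below covers four common utility needs using only `node:crypto`, `node:url`, and `node:path`. It is not part of the package; the wrapper function names are hypothetical, and `randomUUID` assumes a reasonably recent Node.js release.

```typescript
import { createHash, randomUUID } from "node:crypto";
import { extname } from "node:path";
import { URL } from "node:url";

// Unique identifier without a UUID package.
function newId(): string {
  return randomUUID();
}

// Query-string lookup without a URL-parsing package; returns null if the key is absent.
function queryParam(rawUrl: string, key: string): string | null {
  return new URL(rawUrl).searchParams.get(key);
}

// File extension (including the leading dot) without a path-handling package.
function fileExtension(filePath: string): string {
  return extname(filePath);
}

// SHA-256 digest as lowercase hex without a hashing package.
function sha256Hex(input: string): string {
  return createHash("sha256").update(input).digest("hex");
}
```

When a need is fully covered this way, the Build outcome costs only a few lines and adds no dependency.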
90
+
91
+ ### Python
92
+ - Check PyPI: `https://pypi.org/project/{name}`
93
+ - Check if stdlib handles it (`urllib`, `json`, `pathlib`, `dataclasses`)
94
+
95
+ ### Rust
96
+ - Check crates.io: `https://crates.io/crates/{name}`
97
+ - Check if std handles it
98
+
99
+ ### Go
100
+ - Check pkg.go.dev
101
+ - Go standard library is extensive — check stdlib first
@@ -4,7 +4,7 @@
4
4
  "author": {
5
5
  "name": "Dean0x"
6
6
  },
7
- "version": "1.4.0",
7
+ "version": "1.5.0",
8
8
  "homepage": "https://github.com/dean0x/devflow",
9
9
  "repository": "https://github.com/dean0x/devflow",
10
10
  "license": "MIT",
@@ -4,7 +4,7 @@
4
4
  "author": {
5
5
  "name": "Dean0x"
6
6
  },
7
- "version": "1.4.0",
7
+ "version": "1.5.0",
8
8
  "homepage": "https://github.com/dean0x/devflow",
9
9
  "repository": "https://github.com/dean0x/devflow",
10
10
  "license": "MIT",
@@ -4,7 +4,7 @@
4
4
  "author": {
5
5
  "name": "Dean0x"
6
6
  },
7
- "version": "1.4.0",
7
+ "version": "1.5.0",
8
8
  "homepage": "https://github.com/dean0x/devflow",
9
9
  "repository": "https://github.com/dean0x/devflow",
10
10
  "license": "MIT",
@@ -4,7 +4,7 @@
4
4
  "author": {
5
5
  "name": "Dean0x"
6
6
  },
7
- "version": "1.4.0",
7
+ "version": "1.5.0",
8
8
  "homepage": "https://github.com/dean0x/devflow",
9
9
  "repository": "https://github.com/dean0x/devflow",
10
10
  "license": "MIT",
@@ -29,11 +29,14 @@ You receive from orchestrator:
29
29
 
30
30
  ## Responsibilities
31
31
 
32
- 1. **Orient on branch state**: Check git log for commits from previous Coders (if sequential). Read files created by prior phases - **do not trust summaries alone**. Identify patterns from actual code: naming conventions, error handling approach, testing style.
33
-
34
- 2. **Reference handoff** (if PRIOR_PHASE_SUMMARY provided): Use summary to validate your understanding of prior work, not as the sole source of truth. The actual code is authoritative.
35
-
36
- 3. **Load domain skills**: Based on DOMAIN hint and files in scope, dynamically load relevant language/ecosystem skills by reading their SKILL.md. Only load skills that are installed:
32
+ 1. **Orient on branch state** (always, before any implementation):
33
+ - Run `git log --oneline --stat -n 10` to scan recent commit history on this branch
34
+ - Run `git status`, `git diff --stat`, and `git diff --cached --stat` to see uncommitted work (unstaged and staged)
35
+ - Cross-reference changed files against EXECUTION_PLAN to identify what's relevant to your task
36
+ - Read those relevant files to understand interfaces, types, naming conventions, error handling, and testing patterns established by prior work
37
+ - If PRIOR_PHASE_SUMMARY is provided, use it to validate your understanding — actual code is authoritative, summaries are supplementary
38
+
39
+ 2. **Load domain skills**: Based on DOMAIN hint and files in scope, dynamically load relevant language/ecosystem skills by reading their SKILL.md. Only load skills that are installed:
37
40
  - `backend` (TypeScript): Read `~/.claude/skills/typescript/SKILL.md`, `~/.claude/skills/input-validation/SKILL.md`
38
41
  - `backend` (Go): Read `~/.claude/skills/go/SKILL.md`
39
42
  - `backend` (Java): Read `~/.claude/skills/java/SKILL.md`
@@ -44,22 +47,22 @@ You receive from orchestrator:
44
47
  - `fullstack`: Combine backend + frontend skills
45
48
  - If a Read fails (skill not installed), skip it silently and continue.
46
49
 
47
- 4. **Implement the plan**: Work through execution steps systematically, creating and modifying files. Follow existing patterns. Type everything. Use Result types if codebase uses them.
50
+ 3. **Implement the plan**: Work through execution steps systematically, creating and modifying files. Follow existing patterns. Type everything. Use Result types if codebase uses them.
48
51
 
49
- 5. **Write tests**: Add tests for new functionality. Cover happy path, error cases, and edge cases. Follow existing test patterns.
52
+ 4. **Write tests**: Add tests for new functionality. Cover happy path, error cases, and edge cases. Follow existing test patterns.
50
53
 
51
- 6. **Run tests**: Execute the test suite. Fix any failures. All tests must pass before proceeding.
54
+ 5. **Run tests**: Execute the test suite. Fix any failures. All tests must pass before proceeding.
52
55
 
53
- 7. **Commit and push**: Create atomic commits with clear messages. Reference TASK_ID. Push to remote.
56
+ 6. **Commit and push**: Create atomic commits with clear messages. Reference TASK_ID. Push to remote.
54
57
 
55
- 8. **Create PR** (if CREATE_PR=true): Create pull request against BASE_BRANCH with summary and testing notes.
58
+ 7. **Create PR** (if CREATE_PR=true): Create pull request against BASE_BRANCH with summary and testing notes.
56
59
 
57
- 9. **Generate handoff** (if HANDOFF_REQUIRED=true): Include implementation summary for next Coder (see Output section).
60
+ 8. **Generate handoff** (if HANDOFF_REQUIRED=true): Include implementation summary for next Coder (see Output section).
58
61
 
59
62
  ## Principles
60
63
 
61
64
  1. **Work on feature branch** - All operations happen on the current feature branch
62
- 2. **Branch orientation first** - In sequential execution, read actual files before trusting handoff summaries
65
+ 2. **Branch orientation first** - Always orient on branch state before writing code; actual code is authoritative over summaries
63
66
  3. **Pattern discovery first** - Before writing code, find similar implementations and match their conventions
64
67
  4. **Be decisive** - Make confident implementation choices. Don't present alternatives or ask permission for tactical decisions
65
68
  5. **Follow existing patterns** - Match codebase style, don't invent new conventions
@@ -124,4 +127,4 @@ Return structured completion status:
124
127
  - Switch branches during implementation
125
128
  - Push to branches other than your feature branch
126
129
  - Merge PRs (orchestrator handles this)
127
- - Trust handoff summaries without reading actual code (in sequential execution)
130
+ - Trust handoff summaries without reading actual code
@@ -128,10 +128,14 @@ Analyze 3 axes to determine strategy:
128
128
  Synthesize outputs from multiple Reviewer agents. Apply strict merge rules.
129
129
 
130
130
  **Process:**
131
- 1. Read all review reports from `${REVIEW_BASE_DIR}/*-report.*.md`
132
- 2. Categorize issues into 3 buckets (from review-methodology)
133
- 3. Count by severity (CRITICAL, HIGH, MEDIUM, LOW)
134
- 4. Determine merge recommendation based on blocking issues
131
+ 1. Read all review reports from `${REVIEW_BASE_DIR}/*.md` (exclude your own output `review-summary.*.md`)
132
+ 2. Extract confidence percentages from each finding
133
+ 3. Apply confidence-aware aggregation: when multiple reviewers flag the same file:line, boost confidence by 10% per additional reviewer (cap at 100%)
134
+ <!-- Confidence threshold also in: shared/agents/reviewer.md, plugins/devflow-code-review/commands/code-review.md -->
135
+ 4. Maintain ≥80% confidence threshold in final output
136
+ 5. Categorize issues into 3 buckets (from review-methodology)
137
+ 6. Count by severity (CRITICAL, HIGH, MEDIUM, LOW)
138
+ 7. Determine merge recommendation based on blocking issues
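
Steps 2 to 4 of the process above can be sketched as a single aggregation pass. This is one possible reading and not part of the package: it assumes "10%" means 10 percentage points, that the boost starts from the highest individual confidence at a location, and that the 60-79% band (capped at 5 items, as in the report format) is computed after boosting. The type and function names are hypothetical.

```typescript
// Hypothetical shape of one finding parsed from a reviewer report.
interface ReviewerFinding {
  reviewer: string;
  file: string;
  line: number;
  title: string;
  confidence: number; // 0-100
}

interface MergedFinding {
  file: string;
  line: number;
  title: string;
  confidence: number;
  reviewers: string[];
}

interface SynthesizedFindings {
  reported: MergedFinding[]; // >= 80% after boosting
  suggestions: MergedFinding[]; // 60-79%, at most 5
}

function aggregateFindings(findings: ReviewerFinding[]): SynthesizedFindings {
  // Group findings that point at the same file:line.
  const groups: Record<string, ReviewerFinding[]> = {};
  for (const finding of findings) {
    const key = `${finding.file}:${finding.line}`;
    (groups[key] = groups[key] || []).push(finding);
  }

  const merged: MergedFinding[] = Object.keys(groups).map((key) => {
    const group = groups[key];
    const reviewers = group
      .map((f) => f.reviewer)
      .filter((name, index, all) => all.indexOf(name) === index);
    // Assumption: start from the highest individual confidence in the group.
    const strongest = group.reduce((a, b) => (b.confidence > a.confidence ? b : a));
    // Assumption: "10%" means 10 percentage points per additional reviewer, capped at 100.
    const confidence = Math.min(100, strongest.confidence + 10 * (reviewers.length - 1));
    return { file: strongest.file, line: strongest.line, title: strongest.title, confidence, reviewers };
  });

  merged.sort((a, b) => b.confidence - a.confidence);
  return {
    reported: merged.filter((m) => m.confidence >= 80),
    suggestions: merged.filter((m) => m.confidence >= 60 && m.confidence < 80).slice(0, 5),
  };
}
```

Under these assumptions, two reviewers flagging the same line at 75% and 72% produce one merged finding at 85%, which crosses the reporting threshold even though neither individual finding did.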
135
139
 
136
140
  **Issue Categories:**
137
141
  - **Blocking** (Category 1): Issues in YOUR changes - CRITICAL/HIGH must block
@@ -172,7 +176,10 @@ Report format:
172
176
  | Pre-existing | - | - | {n} | {n} | {n} |
173
177
 
174
178
  ## Blocking Issues
175
- {List with file:line and suggested fix}
179
+ {List with file:line, confidence %, and suggested fix}
180
+
181
+ ## Suggestions (Lower Confidence)
182
+ {Max 5 items across all reviewers with 60-79% confidence. Brief descriptions only.}
176
183
 
177
184
  ## Action Plan
178
185
  1. {Priority fix}
@@ -318,11 +318,7 @@ FILES_FROM_PRIOR_PHASE: {list of files created}
318
318
  HANDOFF_REQUIRED: {true if not last phase}"
319
319
  ```
320
320
 
321
- **Handoff Protocol**: Each sequential Coder receives the prior Coder's implementation summary. The receiving Coder MUST:
322
- 1. Check git log to see commits from previous phases
323
- 2. Read actual files created - do not trust summary alone
324
- 3. Identify patterns from actual code (naming, error handling, testing)
325
- 4. Reference handoff summary to validate understanding
321
+ **Handoff Protocol**: Each sequential Coder receives the prior Coder's implementation summary via PRIOR_PHASE_SUMMARY and FILES_FROM_PRIOR_PHASE. The Coder's built-in branch orientation step handles git log scanning, file reading, and pattern discovery automatically.
326
322
 
327
323
  ---
328
324
 
@@ -191,11 +191,7 @@ FILES_FROM_PRIOR_PHASE: {list of files created}
191
191
  HANDOFF_REQUIRED: {true if not last phase}"
192
192
  ```
193
193
 
194
- **Handoff Protocol**: Each sequential Coder receives the prior Coder's implementation summary. The receiving Coder MUST:
195
- 1. Check git log to see commits from previous phases
196
- 2. Read actual files created - do not trust summary alone
197
- 3. Identify patterns from actual code (naming, error handling, testing)
198
- 4. Reference handoff summary to validate understanding
194
+ **Handoff Protocol**: Each sequential Coder receives the prior Coder's implementation summary via PRIOR_PHASE_SUMMARY and FILES_FROM_PRIOR_PHASE. The Coder's built-in branch orientation step handles git log scanning, file reading, and pattern discovery automatically.
199
195
 
200
196
  ---
201
197
 
@@ -4,7 +4,7 @@
4
4
  "author": {
5
5
  "name": "Dean0x"
6
6
  },
7
- "version": "1.4.0",
7
+ "version": "1.5.0",
8
8
  "homepage": "https://github.com/dean0x/devflow",
9
9
  "repository": "https://github.com/dean0x/devflow",
10
10
  "license": "MIT",
@@ -4,7 +4,7 @@
4
4
  "author": {
5
5
  "name": "Dean0x"
6
6
  },
7
- "version": "1.4.0",
7
+ "version": "1.5.0",
8
8
  "homepage": "https://github.com/dean0x/devflow",
9
9
  "repository": "https://github.com/dean0x/devflow",
10
10
  "license": "MIT",
@@ -4,7 +4,7 @@
4
4
  "author": {
5
5
  "name": "Dean0x"
6
6
  },
7
- "version": "1.4.0",
7
+ "version": "1.5.0",
8
8
  "homepage": "https://github.com/dean0x/devflow",
9
9
  "repository": "https://github.com/dean0x/devflow",
10
10
  "license": "MIT",
@@ -4,7 +4,7 @@
4
4
  "author": {
5
5
  "name": "Dean0x"
6
6
  },
7
- "version": "1.4.0",
7
+ "version": "1.5.0",
8
8
  "homepage": "https://github.com/dean0x/devflow",
9
9
  "repository": "https://github.com/dean0x/devflow",
10
10
  "license": "MIT",
@@ -4,7 +4,7 @@
4
4
  "author": {
5
5
  "name": "Dean0x"
6
6
  },
7
- "version": "1.4.0",
7
+ "version": "1.5.0",
8
8
  "homepage": "https://github.com/dean0x/devflow",
9
9
  "repository": "https://github.com/dean0x/devflow",
10
10
  "license": "MIT",
@@ -4,7 +4,7 @@
4
4
  "author": {
5
5
  "name": "Dean0x"
6
6
  },
7
- "version": "1.4.0",
7
+ "version": "1.5.0",
8
8
  "agents": [
9
9
  "simplifier",
10
10
  "scrutinizer",
@@ -4,7 +4,7 @@
4
4
  "author": {
5
5
  "name": "Dean0x"
6
6
  },
7
- "version": "1.4.0",
7
+ "version": "1.5.0",
8
8
  "homepage": "https://github.com/dean0x/devflow",
9
9
  "repository": "https://github.com/dean0x/devflow",
10
10
  "license": "MIT",
@@ -128,10 +128,14 @@ Analyze 3 axes to determine strategy:
128
128
  Synthesize outputs from multiple Reviewer agents. Apply strict merge rules.
129
129
 
130
130
  **Process:**
131
- 1. Read all review reports from `${REVIEW_BASE_DIR}/*-report.*.md`
132
- 2. Categorize issues into 3 buckets (from review-methodology)
133
- 3. Count by severity (CRITICAL, HIGH, MEDIUM, LOW)
134
- 4. Determine merge recommendation based on blocking issues
131
+ 1. Read all review reports from `${REVIEW_BASE_DIR}/*.md` (exclude your own output `review-summary.*.md`)
132
+ 2. Extract confidence percentages from each finding
133
+ 3. Apply confidence-aware aggregation: when multiple reviewers flag the same file:line, boost confidence by 10% per additional reviewer (cap at 100%)
134
+ <!-- Confidence threshold also in: shared/agents/reviewer.md, plugins/devflow-code-review/commands/code-review.md -->
135
+ 4. Maintain ≥80% confidence threshold in final output
136
+ 5. Categorize issues into 3 buckets (from review-methodology)
137
+ 6. Count by severity (CRITICAL, HIGH, MEDIUM, LOW)
138
+ 7. Determine merge recommendation based on blocking issues
135
139
 
136
140
  **Issue Categories:**
137
141
  - **Blocking** (Category 1): Issues in YOUR changes - CRITICAL/HIGH must block
@@ -172,7 +176,10 @@ Report format:
172
176
  | Pre-existing | - | - | {n} | {n} | {n} |
173
177
 
174
178
  ## Blocking Issues
175
- {List with file:line and suggested fix}
179
+ {List with file:line, confidence %, and suggested fix}
180
+
181
+ ## Suggestions (Lower Confidence)
182
+ {Max 5 items across all reviewers with 60-79% confidence. Brief descriptions only.}
176
183
 
177
184
  ## Action Plan
178
185
  1. {Priority fix}
@@ -4,7 +4,7 @@
4
4
  "author": {
5
5
  "name": "Dean0x"
6
6
  },
7
- "version": "1.4.0",
7
+ "version": "1.5.0",
8
8
  "homepage": "https://github.com/dean0x/devflow",
9
9
  "repository": "https://github.com/dean0x/devflow",
10
10
  "license": "MIT",
@@ -29,11 +29,14 @@ You receive from orchestrator:
29
29
 
30
30
  ## Responsibilities
31
31
 
32
- 1. **Orient on branch state**: Check git log for commits from previous Coders (if sequential). Read files created by prior phases - **do not trust summaries alone**. Identify patterns from actual code: naming conventions, error handling approach, testing style.
33
-
34
- 2. **Reference handoff** (if PRIOR_PHASE_SUMMARY provided): Use summary to validate your understanding of prior work, not as the sole source of truth. The actual code is authoritative.
35
-
36
- 3. **Load domain skills**: Based on DOMAIN hint and files in scope, dynamically load relevant language/ecosystem skills by reading their SKILL.md. Only load skills that are installed:
32
+ 1. **Orient on branch state** (always, before any implementation):
33
+ - Run `git log --oneline --stat -n 10` to scan recent commit history on this branch
34
+ - Run `git status`, `git diff --stat`, and `git diff --cached --stat` to see uncommitted work (unstaged and staged)
35
+ - Cross-reference changed files against EXECUTION_PLAN to identify what's relevant to your task
36
+ - Read those relevant files to understand interfaces, types, naming conventions, error handling, and testing patterns established by prior work
37
+ - If PRIOR_PHASE_SUMMARY is provided, use it to validate your understanding — actual code is authoritative, summaries are supplementary
38
+
39
+ 2. **Load domain skills**: Based on DOMAIN hint and files in scope, dynamically load relevant language/ecosystem skills by reading their SKILL.md. Only load skills that are installed:
37
40
  - `backend` (TypeScript): Read `~/.claude/skills/typescript/SKILL.md`, `~/.claude/skills/input-validation/SKILL.md`
38
41
  - `backend` (Go): Read `~/.claude/skills/go/SKILL.md`
39
42
  - `backend` (Java): Read `~/.claude/skills/java/SKILL.md`
@@ -44,22 +47,22 @@ You receive from orchestrator:
44
47
  - `fullstack`: Combine backend + frontend skills
45
48
  - If a Read fails (skill not installed), skip it silently and continue.
46
49
 
47
- 4. **Implement the plan**: Work through execution steps systematically, creating and modifying files. Follow existing patterns. Type everything. Use Result types if codebase uses them.
50
+ 3. **Implement the plan**: Work through execution steps systematically, creating and modifying files. Follow existing patterns. Type everything. Use Result types if codebase uses them.
48
51
 
49
- 5. **Write tests**: Add tests for new functionality. Cover happy path, error cases, and edge cases. Follow existing test patterns.
52
+ 4. **Write tests**: Add tests for new functionality. Cover happy path, error cases, and edge cases. Follow existing test patterns.
50
53
 
51
- 6. **Run tests**: Execute the test suite. Fix any failures. All tests must pass before proceeding.
54
+ 5. **Run tests**: Execute the test suite. Fix any failures. All tests must pass before proceeding.
52
55
 
53
- 7. **Commit and push**: Create atomic commits with clear messages. Reference TASK_ID. Push to remote.
56
+ 6. **Commit and push**: Create atomic commits with clear messages. Reference TASK_ID. Push to remote.
54
57
 
55
- 8. **Create PR** (if CREATE_PR=true): Create pull request against BASE_BRANCH with summary and testing notes.
58
+ 7. **Create PR** (if CREATE_PR=true): Create pull request against BASE_BRANCH with summary and testing notes.
56
59
 
57
- 9. **Generate handoff** (if HANDOFF_REQUIRED=true): Include implementation summary for next Coder (see Output section).
60
+ 8. **Generate handoff** (if HANDOFF_REQUIRED=true): Include implementation summary for next Coder (see Output section).
58
61
 
59
62
  ## Principles
60
63
 
61
64
  1. **Work on feature branch** - All operations happen on the current feature branch
62
- 2. **Branch orientation first** - In sequential execution, read actual files before trusting handoff summaries
65
+ 2. **Branch orientation first** - Always orient on branch state before writing code; actual code is authoritative over summaries
63
66
  3. **Pattern discovery first** - Before writing code, find similar implementations and match their conventions
64
67
  4. **Be decisive** - Make confident implementation choices. Don't present alternatives or ask permission for tactical decisions
65
68
  5. **Follow existing patterns** - Match codebase style, don't invent new conventions
@@ -124,4 +127,4 @@ Return structured completion status:
124
127
  - Switch branches during implementation
125
128
  - Push to branches other than your feature branch
126
129
  - Merge PRs (orchestrator handles this)
127
- - Trust handoff summaries without reading actual code (in sequential execution)
130
+ - Trust handoff summaries without reading actual code
@@ -46,8 +46,33 @@ The orchestrator provides:
46
46
  3. **Apply 3-category classification** - Sort issues by where they occur
47
47
  4. **Apply focus-specific analysis** - Use pattern skill detection rules from the loaded skill file
48
48
  5. **Assign severity** - CRITICAL, HIGH, MEDIUM, LOW based on impact
49
- 6. **Generate report** - File:line references with suggested fixes
50
- 7. **Determine merge recommendation** - Based on blocking issues
49
+ 6. **Assess confidence** - Assign 0-100% confidence to each finding (see Confidence Scale below)
50
+ 7. **Filter by confidence** - Only report findings ≥80% in main sections; lower-confidence items go to Suggestions
51
+ 8. **Consolidate similar issues** - Group related findings to reduce noise (see Consolidation Rules)
52
+ 9. **Generate report** - File:line references with suggested fixes
53
+ 10. **Determine merge recommendation** - Based on blocking issues
54
+
55
+ ## Confidence Scale
56
+
57
+ Assess how certain you are that each finding is a real issue (not a false positive):
58
+
59
+ | Range | Label | Meaning |
60
+ |-------|-------|---------|
61
+ | 90-100% | Certain | Clearly a bug, vulnerability, or violation — no ambiguity |
62
+ | 80-89% | High | Very likely an issue, but minor chance of false positive |
63
+ | 60-79% | Medium | Plausible issue, but depends on context you may not fully see |
64
+ | < 60% | Low | Possible concern, but likely a matter of style or interpretation |
65
+
66
+ <!-- Confidence threshold also in: shared/agents/synthesizer.md, plugins/devflow-code-review/commands/code-review.md -->
67
+ **Threshold**: Only report findings with ≥80% confidence in Blocking, Should-Fix, and Pre-existing sections. Findings with 60-79% confidence go to the Suggestions section. Findings < 60% are dropped entirely.
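
The scale and threshold above reduce to two small lookups: one from a percentage to its label, and one from a percentage to where the finding lands in the report. A minimal sketch, with hypothetical type and function names that are not part of the package:

```typescript
type ConfidenceLabel = "Certain" | "High" | "Medium" | "Low";
type ReportPlacement = "main" | "suggestions" | "dropped";

// Maps a 0-100 confidence value to the label from the Confidence Scale table.
function confidenceLabel(confidence: number): ConfidenceLabel {
  if (confidence >= 90) return "Certain";
  if (confidence >= 80) return "High";
  if (confidence >= 60) return "Medium";
  return "Low";
}

// Applies the threshold rule: >= 80 goes to the main sections (Blocking,
// Should-Fix, Pre-existing), 60-79 goes to Suggestions, below 60 is dropped.
function placement(confidence: number): ReportPlacement {
  if (confidence >= 80) return "main";
  if (confidence >= 60) return "suggestions";
  return "dropped";
}
```

The boundaries are inclusive at the lower end of each band, so a finding at exactly 80% is reported in a main section and one at 79% becomes a suggestion.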
68
+
69
+ ## Consolidation Rules
70
+
71
+ Before writing your report, apply these noise reduction rules:
72
+
73
+ 1. **Group similar issues** — If 3+ instances of the same pattern appear (e.g., "missing error handling" in multiple functions), consolidate into 1 finding listing all locations rather than N separate findings
74
+ 2. **Skip stylistic preferences** — Do not flag formatting, naming style, or code organization choices unless they violate explicit project conventions found in CLAUDE.md, .editorconfig, or linter configs
75
+ 3. **Skip issues in unchanged code** — Pre-existing issues in lines you did NOT change should only be reported if CRITICAL severity (security vulnerabilities, data loss risks)
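
Rule 1 above is the only one of the three that is purely mechanical. The sketch below shows one way to apply it and is not part of the package: it assumes each raw finding already carries a normalized `pattern` string, and `RawFinding`, `ConsolidatedFinding`, and `consolidate` are hypothetical names.

```typescript
// Hypothetical raw finding: `pattern` names the shared problem, `location` is file:line.
interface RawFinding {
  pattern: string;
  location: string;
}

interface ConsolidatedFinding {
  pattern: string;
  locations: string[];
}

// Patterns seen 3+ times collapse into one finding listing every location;
// patterns seen once or twice stay as separate findings.
function consolidate(findings: RawFinding[]): ConsolidatedFinding[] {
  // Map preserves insertion order, so output follows first appearance of each pattern.
  const byPattern = new Map<string, string[]>();
  for (const finding of findings) {
    const locations = byPattern.get(finding.pattern) ?? [];
    locations.push(finding.location);
    byPattern.set(finding.pattern, locations);
  }

  const result: ConsolidatedFinding[] = [];
  byPattern.forEach((locations, pattern) => {
    if (locations.length >= 3) {
      result.push({ pattern, locations });
    } else {
      for (const location of locations) {
        result.push({ pattern, locations: [location] });
      }
    }
  });
  return result;
}
```

A consolidated entry maps directly onto the "({N} occurrences)" form in the report format, with `locations.length` as N.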
51
76
 
52
77
  ## Issue Categories (from review-methodology)
53
78
 
@@ -76,17 +101,29 @@ Report format for `{output_path}`:
76
101
 
77
102
  ### CRITICAL
78
103
  **{Issue}** - `file.ts:123`
104
+ **Confidence**: {n}%
79
105
  - Problem: {description}
80
106
  - Fix: {suggestion with code}
81
107
 
108
+ **{Issue Title} ({N} occurrences)** — Confidence: {n}%
109
+ - `file1.ts:12`, `file2.ts:45`, `file3.ts:89`
110
+ - Problem: {description of the shared pattern}
111
+ - Fix: {suggestion that applies to all occurrences}
112
+
82
113
  ### HIGH
83
- {issues...}
114
+ {issues with **Confidence**: {n}% each...}
84
115
 
85
116
  ## Issues in Code You Touched (Should Fix)
86
- {issues with file:line...}
117
+ {issues with file:line and **Confidence**: {n}% each...}
87
118
 
88
119
  ## Pre-existing Issues (Not Blocking)
89
- {informational issues...}
120
+ {informational issues with **Confidence**: {n}% each...}
121
+
122
+ ## Suggestions (Lower Confidence)
123
+
124
+ {Max 3 items with 60-79% confidence. Brief description only — no code fixes.}
125
+
126
+ - **{Issue}** - `file.ts:456` (Confidence: {n}%) — {brief description}
90
127
 
91
128
  ## Summary
92
129
  | Category | CRITICAL | HIGH | MEDIUM | LOW |