npm - devflow-kit - Versions diffs - 1.3.3 → 1.5.0 - Mend

devflow-kit 1.3.3 → 1.5.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (51) hide show

package/plugins/devflow-react/.claude-plugin/plugin.json CHANGED Viewed

@@ -2,10 +2,9 @@
   "name": "devflow-react",
   "description": "React framework patterns - hooks, state management, composition, performance",
   "author": {
-    "name": "DevFlow Contributors",
-    "email": "dean@keren.dev"
+    "name": "Dean0x"
   },
-  "version": "1.3.3",
+  "version": "1.5.0",
   "homepage": "https://github.com/dean0x/devflow",
   "repository": "https://github.com/dean0x/devflow",
   "license": "MIT",

package/plugins/devflow-resolve/.claude-plugin/plugin.json CHANGED Viewed

@@ -2,10 +2,9 @@
   "name": "devflow-resolve",
   "description": "Process and fix code review issues with risk assessment - decides FIX vs TECH_DEBT based on impact",
   "author": {
-    "name": "DevFlow Contributors",
-    "email": "dean@keren.dev"
+    "name": "Dean0x"
   },
-  "version": "1.3.3",
+  "version": "1.5.0",
   "homepage": "https://github.com/dean0x/devflow",
   "repository": "https://github.com/dean0x/devflow",
   "license": "MIT",

package/plugins/devflow-resolve/agents/git.md CHANGED Viewed

@@ -21,7 +21,7 @@ The orchestrator provides:
 |-----------|---------|----------------|
 | `ensure-pr-ready` | Pre-flight for /review: commit, push, create PR | - |
 | `validate-branch` | Pre-flight for /resolve: check branch state | - |
-| `setup-task` | Create feature branch and fetch issue | `TASK_ID`, `BASE_BRANCH`, `ISSUE_INPUT` (optional) |
+| `setup-task` | Create feature branch and fetch issue | `BASE_BRANCH`, `ISSUE_INPUT` (optional), `TASK_DESCRIPTION` (optional) |
 | `fetch-issue` | Fetch GitHub issue for implementation | `ISSUE_INPUT` (number or search term) |
 | `comment-pr` | Create PR inline comments for review findings | `PR_NUMBER`, `REVIEW_BASE_DIR`, `TIMESTAMP` |
 | `manage-debt` | Update tech debt backlog with pre-existing issues | `REVIEW_DIR`, `TIMESTAMP` |
@@ -101,25 +101,30 @@ Pre-flight validation for `/resolve`. Checks branch state without modifications.
 ## Operation: setup-task
-Set up task environment: create feature branch and optionally fetch issue.
+Set up task environment: derive branch name, create feature branch, and optionally fetch issue.
 **Input:**
-- `TASK_ID`: Unique task identifier (becomes branch name)
 - `BASE_BRANCH`: Branch to create from (track this for PR target)
 - `ISSUE_INPUT` (optional): Issue number to fetch
+- `TASK_DESCRIPTION` (optional): Free-text task description (when no issue)
 **Process:**
 1. Record current branch as BASE_BRANCH for later PR targeting
-2. Create and checkout feature branch: `git checkout -b {TASK_ID}`
-3. If ISSUE_INPUT provided, fetch issue details via GitHub API
-4. Return setup summary with BASE_BRANCH recorded
+2. **Derive branch name:**
+   - If `ISSUE_INPUT` provided: fetch issue via GitHub API first, then derive branch name as `{type}/{number}-{slug}` where:
+     - `type` is inferred from issue labels: `bug` → `fix`, `documentation` or `docs` → `docs`, `refactor` → `refactor`, `chore` or `maintenance` → `chore`, default → `feature`
+     - `slug` is the issue title: lowercased, non-alphanumeric replaced with hyphens, consecutive hyphens collapsed, trimmed, max 40 characters
+   - If `TASK_DESCRIPTION` provided (no issue): infer type from description keywords (e.g., "fix login bug" → `fix`, "refactor auth" → `refactor`, "add JWT" → `feature`, "update docs" → `docs`, "chore: cleanup" → `chore`), then slugify description as `{type}/{slug}` (max 40 chars)
+   - If neither: fallback to `task-{YYYY-MM-DD_HHMM}`
+3. Create and checkout feature branch: `git checkout -b {derived-branch-name}`
+4. Return setup summary with branch name and BASE_BRANCH recorded
 **Output:**
 ```markdown
-## Task Setup: {TASK_ID}
+## Task Setup: {branch-name}
 ### Branch
-- **Feature branch**: {TASK_ID}
+- **Branch name**: {derived-branch-name}
 - **Base branch**: {BASE_BRANCH} (PR target)
 ### Issue (if fetched)

package/plugins/devflow-rust/.claude-plugin/plugin.json CHANGED Viewed

@@ -2,10 +2,9 @@
   "name": "devflow-rust",
   "description": "Rust language patterns - ownership, borrowing, error handling, type-driven design",
   "author": {
-    "name": "DevFlow Contributors",
-    "email": "dean@keren.dev"
+    "name": "Dean0x"
   },
-  "version": "1.3.3",
+  "version": "1.5.0",
   "homepage": "https://github.com/dean0x/devflow",
   "repository": "https://github.com/dean0x/devflow",
   "license": "MIT",

package/plugins/devflow-self-review/.claude-plugin/plugin.json CHANGED Viewed

@@ -1,7 +1,10 @@
 {
   "name": "devflow-self-review",
   "description": "Self-review workflow: Simplifier + Scrutinizer for code quality",
-  "version": "1.3.3",
+  "author": {
+    "name": "Dean0x"
+  },
+  "version": "1.5.0",
   "agents": [
     "simplifier",
     "scrutinizer",

package/plugins/devflow-specify/.claude-plugin/plugin.json CHANGED Viewed

@@ -2,10 +2,9 @@
   "name": "devflow-specify",
   "description": "Interactive feature specification - creates well-defined GitHub issues through requirements exploration and clarification",
   "author": {
-    "name": "DevFlow Contributors",
-    "email": "dean@keren.dev"
+    "name": "Dean0x"
   },
-  "version": "1.3.3",
+  "version": "1.5.0",
   "homepage": "https://github.com/dean0x/devflow",
   "repository": "https://github.com/dean0x/devflow",
   "license": "MIT",

package/plugins/devflow-specify/agents/synthesizer.md CHANGED Viewed

@@ -128,10 +128,14 @@ Analyze 3 axes to determine strategy:
 Synthesize outputs from multiple Reviewer agents. Apply strict merge rules.
 **Process:**
-1. Read all review reports from `${REVIEW_BASE_DIR}/*-report.*.md`
-2. Categorize issues into 3 buckets (from review-methodology)
-3. Count by severity (CRITICAL, HIGH, MEDIUM, LOW)
-4. Determine merge recommendation based on blocking issues
+1. Read all review reports from `${REVIEW_BASE_DIR}/*.md` (exclude your own output `review-summary.*.md`)
+2. Extract confidence percentages from each finding
+3. Apply confidence-aware aggregation: when multiple reviewers flag the same file:line, boost confidence by 10% per additional reviewer (cap at 100%)
+<!-- Confidence threshold also in: shared/agents/reviewer.md, plugins/devflow-code-review/commands/code-review.md -->
+4. Maintain ≥80% confidence threshold in final output
+5. Categorize issues into 3 buckets (from review-methodology)
+6. Count by severity (CRITICAL, HIGH, MEDIUM, LOW)
+7. Determine merge recommendation based on blocking issues
 **Issue Categories:**
 - **Blocking** (Category 1): Issues in YOUR changes - CRITICAL/HIGH must block
@@ -172,7 +176,10 @@ Report format:
 | Pre-existing | - | - | {n} | {n} | {n} |
 ## Blocking Issues
-{List with file:line and suggested fix}
+{List with file:line, confidence %, and suggested fix}
+## Suggestions (Lower Confidence)
+{Max 5 items across all reviewers with 60-79% confidence. Brief descriptions only.}
 ## Action Plan
 1. {Priority fix}

package/plugins/devflow-typescript/.claude-plugin/plugin.json CHANGED Viewed

@@ -2,10 +2,9 @@
   "name": "devflow-typescript",
   "description": "TypeScript language patterns - type safety, generics, utility types, type guards",
   "author": {
-    "name": "DevFlow Contributors",
-    "email": "dean@keren.dev"
+    "name": "Dean0x"
   },
-  "version": "1.3.3",
+  "version": "1.5.0",
   "homepage": "https://github.com/dean0x/devflow",
   "repository": "https://github.com/dean0x/devflow",
   "license": "MIT",

package/shared/agents/coder.md CHANGED Viewed

@@ -29,11 +29,14 @@ You receive from orchestrator:
 ## Responsibilities
-1. **Orient on branch state**: Check git log for commits from previous Coders (if sequential). Read files created by prior phases - **do not trust summaries alone**. Identify patterns from actual code: naming conventions, error handling approach, testing style.
-2. **Reference handoff** (if PRIOR_PHASE_SUMMARY provided): Use summary to validate your understanding of prior work, not as the sole source of truth. The actual code is authoritative.
-3. **Load domain skills**: Based on DOMAIN hint and files in scope, dynamically load relevant language/ecosystem skills by reading their SKILL.md. Only load skills that are installed:
+1. **Orient on branch state** (always, before any implementation):
+   - Run `git log --oneline --stat -n 10` to scan recent commit history on this branch
+   - Run `git status` and `git diff --stat` and `git diff --cached --stat` to see uncommitted/unstaged work
+   - Cross-reference changed files against EXECUTION_PLAN to identify what's relevant to your task
+   - Read those relevant files to understand interfaces, types, naming conventions, error handling, and testing patterns established by prior work
+   - If PRIOR_PHASE_SUMMARY is provided, use it to validate your understanding — actual code is authoritative, summaries are supplementary
+2. **Load domain skills**: Based on DOMAIN hint and files in scope, dynamically load relevant language/ecosystem skills by reading their SKILL.md. Only load skills that are installed:
    - `backend` (TypeScript): Read `~/.claude/skills/typescript/SKILL.md`, `~/.claude/skills/input-validation/SKILL.md`
    - `backend` (Go): Read `~/.claude/skills/go/SKILL.md`
    - `backend` (Java): Read `~/.claude/skills/java/SKILL.md`
@@ -44,22 +47,22 @@ You receive from orchestrator:
    - `fullstack`: Combine backend + frontend skills
    - If a Read fails (skill not installed), skip it silently and continue.
-4. **Implement the plan**: Work through execution steps systematically, creating and modifying files. Follow existing patterns. Type everything. Use Result types if codebase uses them.
+3. **Implement the plan**: Work through execution steps systematically, creating and modifying files. Follow existing patterns. Type everything. Use Result types if codebase uses them.
-5. **Write tests**: Add tests for new functionality. Cover happy path, error cases, and edge cases. Follow existing test patterns.
+4. **Write tests**: Add tests for new functionality. Cover happy path, error cases, and edge cases. Follow existing test patterns.
-6. **Run tests**: Execute the test suite. Fix any failures. All tests must pass before proceeding.
+5. **Run tests**: Execute the test suite. Fix any failures. All tests must pass before proceeding.
-7. **Commit and push**: Create atomic commits with clear messages. Reference TASK_ID. Push to remote.
+6. **Commit and push**: Create atomic commits with clear messages. Reference TASK_ID. Push to remote.
-8. **Create PR** (if CREATE_PR=true): Create pull request against BASE_BRANCH with summary and testing notes.
+7. **Create PR** (if CREATE_PR=true): Create pull request against BASE_BRANCH with summary and testing notes.
-9. **Generate handoff** (if HANDOFF_REQUIRED=true): Include implementation summary for next Coder (see Output section).
+8. **Generate handoff** (if HANDOFF_REQUIRED=true): Include implementation summary for next Coder (see Output section).
 ## Principles
 1. **Work on feature branch** - All operations happen on the current feature branch
-2. **Branch orientation first** - In sequential execution, read actual files before trusting handoff summaries
+2. **Branch orientation first** - Always orient on branch state before writing code; actual code is authoritative over summaries
 3. **Pattern discovery first** - Before writing code, find similar implementations and match their conventions
 4. **Be decisive** - Make confident implementation choices. Don't present alternatives or ask permission for tactical decisions
 5. **Follow existing patterns** - Match codebase style, don't invent new conventions
@@ -124,4 +127,4 @@ Return structured completion status:
 - Switch branches during implementation
 - Push to branches other than your feature branch
 - Merge PRs (orchestrator handles this)
-- Trust handoff summaries without reading actual code (in sequential execution)
+- Trust handoff summaries without reading actual code

package/shared/agents/git.md CHANGED Viewed

@@ -21,7 +21,7 @@ The orchestrator provides:
 |-----------|---------|----------------|
 | `ensure-pr-ready` | Pre-flight for /review: commit, push, create PR | - |
 | `validate-branch` | Pre-flight for /resolve: check branch state | - |
-| `setup-task` | Create feature branch and fetch issue | `TASK_ID`, `BASE_BRANCH`, `ISSUE_INPUT` (optional) |
+| `setup-task` | Create feature branch and fetch issue | `BASE_BRANCH`, `ISSUE_INPUT` (optional), `TASK_DESCRIPTION` (optional) |
 | `fetch-issue` | Fetch GitHub issue for implementation | `ISSUE_INPUT` (number or search term) |
 | `comment-pr` | Create PR inline comments for review findings | `PR_NUMBER`, `REVIEW_BASE_DIR`, `TIMESTAMP` |
 | `manage-debt` | Update tech debt backlog with pre-existing issues | `REVIEW_DIR`, `TIMESTAMP` |
@@ -101,25 +101,30 @@ Pre-flight validation for `/resolve`. Checks branch state without modifications.
 ## Operation: setup-task
-Set up task environment: create feature branch and optionally fetch issue.
+Set up task environment: derive branch name, create feature branch, and optionally fetch issue.
 **Input:**
-- `TASK_ID`: Unique task identifier (becomes branch name)
 - `BASE_BRANCH`: Branch to create from (track this for PR target)
 - `ISSUE_INPUT` (optional): Issue number to fetch
+- `TASK_DESCRIPTION` (optional): Free-text task description (when no issue)
 **Process:**
 1. Record current branch as BASE_BRANCH for later PR targeting
-2. Create and checkout feature branch: `git checkout -b {TASK_ID}`
-3. If ISSUE_INPUT provided, fetch issue details via GitHub API
-4. Return setup summary with BASE_BRANCH recorded
+2. **Derive branch name:**
+   - If `ISSUE_INPUT` provided: fetch issue via GitHub API first, then derive branch name as `{type}/{number}-{slug}` where:
+     - `type` is inferred from issue labels: `bug` → `fix`, `documentation` or `docs` → `docs`, `refactor` → `refactor`, `chore` or `maintenance` → `chore`, default → `feature`
+     - `slug` is the issue title: lowercased, non-alphanumeric replaced with hyphens, consecutive hyphens collapsed, trimmed, max 40 characters
+   - If `TASK_DESCRIPTION` provided (no issue): infer type from description keywords (e.g., "fix login bug" → `fix`, "refactor auth" → `refactor`, "add JWT" → `feature`, "update docs" → `docs`, "chore: cleanup" → `chore`), then slugify description as `{type}/{slug}` (max 40 chars)
+   - If neither: fallback to `task-{YYYY-MM-DD_HHMM}`
+3. Create and checkout feature branch: `git checkout -b {derived-branch-name}`
+4. Return setup summary with branch name and BASE_BRANCH recorded
 **Output:**
 ```markdown
-## Task Setup: {TASK_ID}
+## Task Setup: {branch-name}
 ### Branch
-- **Feature branch**: {TASK_ID}
+- **Branch name**: {derived-branch-name}
 - **Base branch**: {BASE_BRANCH} (PR target)
 ### Issue (if fetched)

package/shared/agents/reviewer.md CHANGED Viewed

@@ -46,8 +46,33 @@ The orchestrator provides:
 3. **Apply 3-category classification** - Sort issues by where they occur
 4. **Apply focus-specific analysis** - Use pattern skill detection rules from the loaded skill file
 5. **Assign severity** - CRITICAL, HIGH, MEDIUM, LOW based on impact
-6. **Generate report** - File:line references with suggested fixes
-7. **Determine merge recommendation** - Based on blocking issues
+6. **Assess confidence** - Assign 0-100% confidence to each finding (see Confidence Scale below)
+7. **Filter by confidence** - Only report findings ≥80% in main sections; lower-confidence items go to Suggestions
+8. **Consolidate similar issues** - Group related findings to reduce noise (see Consolidation Rules)
+9. **Generate report** - File:line references with suggested fixes
+10. **Determine merge recommendation** - Based on blocking issues
+## Confidence Scale
+Assess how certain you are that each finding is a real issue (not a false positive):
+| Range | Label | Meaning |
+|-------|-------|---------|
+| 90-100% | Certain | Clearly a bug, vulnerability, or violation — no ambiguity |
+| 80-89% | High | Very likely an issue, but minor chance of false positive |
+| 60-79% | Medium | Plausible issue, but depends on context you may not fully see |
+| < 60% | Low | Possible concern, but likely a matter of style or interpretation |
+<!-- Confidence threshold also in: shared/agents/synthesizer.md, plugins/devflow-code-review/commands/code-review.md -->
+**Threshold**: Only report findings with ≥80% confidence in Blocking, Should-Fix, and Pre-existing sections. Findings with 60-79% confidence go to the Suggestions section. Findings < 60% are dropped entirely.
+## Consolidation Rules
+Before writing your report, apply these noise reduction rules:
+1. **Group similar issues** — If 3+ instances of the same pattern appear (e.g., "missing error handling" in multiple functions), consolidate into 1 finding listing all locations rather than N separate findings
+2. **Skip stylistic preferences** — Do not flag formatting, naming style, or code organization choices unless they violate explicit project conventions found in CLAUDE.md, .editorconfig, or linter configs
+3. **Skip issues in unchanged code** — Pre-existing issues in lines you did NOT change should only be reported if CRITICAL severity (security vulnerabilities, data loss risks)
 ## Issue Categories (from review-methodology)
@@ -76,17 +101,29 @@ Report format for `{output_path}`:
 ### CRITICAL
 **{Issue}** - `file.ts:123`
+**Confidence**: {n}%
 - Problem: {description}
 - Fix: {suggestion with code}
+**{Issue Title} ({N} occurrences)** — Confidence: {n}%
+- `file1.ts:12`, `file2.ts:45`, `file3.ts:89`
+- Problem: {description of the shared pattern}
+- Fix: {suggestion that applies to all occurrences}
 ### HIGH
-{issues...}
+{issues with **Confidence**: {n}% each...}
 ## Issues in Code You Touched (Should Fix)
-{issues with file:line...}
+{issues with file:line and **Confidence**: {n}% each...}
 ## Pre-existing Issues (Not Blocking)
-{informational issues...}
+{informational issues with **Confidence**: {n}% each...}
+## Suggestions (Lower Confidence)
+{Max 3 items with 60-79% confidence. Brief description only — no code fixes.}
+- **{Issue}** - `file.ts:456` (Confidence: {n}%) — {brief description}
 ## Summary
 | Category | CRITICAL | HIGH | MEDIUM | LOW |

package/shared/agents/synthesizer.md CHANGED Viewed

@@ -128,10 +128,14 @@ Analyze 3 axes to determine strategy:
 Synthesize outputs from multiple Reviewer agents. Apply strict merge rules.
 **Process:**
-1. Read all review reports from `${REVIEW_BASE_DIR}/*-report.*.md`
-2. Categorize issues into 3 buckets (from review-methodology)
-3. Count by severity (CRITICAL, HIGH, MEDIUM, LOW)
-4. Determine merge recommendation based on blocking issues
+1. Read all review reports from `${REVIEW_BASE_DIR}/*.md` (exclude your own output `review-summary.*.md`)
+2. Extract confidence percentages from each finding
+3. Apply confidence-aware aggregation: when multiple reviewers flag the same file:line, boost confidence by 10% per additional reviewer (cap at 100%)
+<!-- Confidence threshold also in: shared/agents/reviewer.md, plugins/devflow-code-review/commands/code-review.md -->
+4. Maintain ≥80% confidence threshold in final output
+5. Categorize issues into 3 buckets (from review-methodology)
+6. Count by severity (CRITICAL, HIGH, MEDIUM, LOW)
+7. Determine merge recommendation based on blocking issues
 **Issue Categories:**
 - **Blocking** (Category 1): Issues in YOUR changes - CRITICAL/HIGH must block
@@ -172,7 +176,10 @@ Report format:
 | Pre-existing | - | - | {n} | {n} | {n} |
 ## Blocking Issues
-{List with file:line and suggested fix}
+{List with file:line, confidence %, and suggested fix}
+## Suggestions (Lower Confidence)
+{Max 5 items across all reviewers with 60-79% confidence. Brief descriptions only.}
 ## Action Plan
 1. {Priority fix}

package/shared/skills/ambient-router/SKILL.md CHANGED Viewed

@@ -54,7 +54,7 @@ Based on classified intent, read the following skills to inform your response.
 | Intent | Primary Skills | Secondary (if file type matches) |
 |--------|---------------|----------------------------------|
-| **BUILD** | test-driven-development, implementation-patterns | typescript (.ts), react (.tsx/.jsx), go (.go), java (.java), python (.py), rust (.rs), frontend-design (CSS/UI), input-validation (forms/API), security-patterns (auth/crypto) |
+| **BUILD** | test-driven-development, implementation-patterns, search-first | typescript (.ts), react (.tsx/.jsx), go (.go), java (.java), python (.py), rust (.rs), frontend-design (CSS/UI), input-validation (forms/API), security-patterns (auth/crypto) |
 | **DEBUG** | test-patterns, core-patterns | git-safety (if git operations involved) |
 | **REVIEW** | self-review, core-patterns | test-patterns |
 | **PLAN** | implementation-patterns | core-patterns |

package/shared/skills/ambient-router/references/skill-catalog.md CHANGED Viewed

@@ -12,6 +12,7 @@ These skills may be loaded during GUIDED-depth ambient routing.
 |-------|-------------|---------------|
 | test-driven-development | Always for BUILD | `*.ts`, `*.tsx`, `*.js`, `*.jsx`, `*.py` |
 | implementation-patterns | Always for BUILD | Any code file |
+| search-first | Always for BUILD | Any code file |
 | typescript | TypeScript files in scope | `*.ts`, `*.tsx` |
 | react | React components in scope | `*.tsx`, `*.jsx` |
 | frontend-design | UI/styling work | `*.css`, `*.scss`, `*.tsx` with styling keywords |

package/shared/skills/search-first/SKILL.md ADDED Viewed

@@ -0,0 +1,133 @@
+---
+name: search-first
+description: >-
+  This skill should be used when the user asks to "add a utility", "create a helper",
+  "implement parsing", "build a wrapper", or writes infrastructure/utility code that
+  may already exist as a well-maintained package. Enforces research before building.
+user-invocable: false
+allowed-tools: Read, Grep, Glob
+---
+# Search-First
+Research before building. Check if a battle-tested solution exists before writing custom utility code.
+## Iron Law
+> **RESEARCH BEFORE BUILDING**
+>
+> Never write custom utility code without first checking if a battle-tested solution
+> exists. The best code is code you don't write. A maintained package with thousands
+> of users will always beat a hand-rolled utility in reliability, edge cases, and
+> long-term maintenance.
+## When This Skill Activates
+**Triggers** — creating or modifying code that:
+- Parses, formats, or validates data (dates, URLs, emails, UUIDs, etc.)
+- Implements common algorithms (sorting, diffing, hashing, encoding)
+- Wraps HTTP clients, retries, rate limiting, caching
+- Handles file system operations beyond basic read/write
+- Implements CLI argument parsing, logging, or configuration
+- Creates test utilities (mocking, fixtures, assertions)
+**Does NOT trigger** for:
+- Domain-specific business logic unique to this project
+- Glue code connecting existing components
+- Trivial operations (< 5 lines, single-use)
+- Code that intentionally avoids external dependencies (e.g., zero-dep libraries)
+---
+## Research Process
+### Phase 1: Need Analysis
+Before searching, define what you actually need:
+```
+Need: {one-sentence description of the capability}
+Constraints: {runtime, bundle size, license, zero-dep requirement}
+Must-haves: {non-negotiable requirements}
+Nice-to-haves: {optional features}
+```
+### Phase 2: Search
+Delegate research to an Explore subagent to keep main session context clean.
+**Spawn an Explore agent** with this prompt template:
+```
+Task(subagent_type="Explore"):
+"Research existing solutions for: {need description}
+Search for:
+1. npm/PyPI/crates packages that solve this (check package.json/requirements.txt for ecosystem)
+2. Existing utilities in this codebase (grep for related function names)
+3. Framework built-ins that already handle this
+For each candidate, find:
+- Package name and weekly downloads (if applicable)
+- Last publish date and maintenance status
+- Bundle size / dependency count
+- API surface relevant to our need
+- License compatibility
+Return top 3 candidates with pros/cons, or confirm nothing suitable exists."
+```
+### Phase 3: Evaluate
+Score each candidate against evaluation criteria. See `references/evaluation-criteria.md` for the full matrix.
+Quick checklist:
+- [ ] Last published within 12 months
+- [ ] Weekly downloads > 1,000 (npm) or equivalent traction
+- [ ] No known vulnerabilities (check Snyk/npm audit)
+- [ ] API fits the use case without heavy wrapping
+- [ ] License compatible with project (MIT/Apache/BSD preferred)
+- [ ] Bundle size acceptable for the project context
+### Phase 4: Decide
+Choose one of four outcomes:
+| Decision | When | Action |
+|----------|------|--------|
+| **Adopt** | Exact match, well-maintained, good API | Install and use directly |
+| **Extend** | Partial match, needs thin wrapper | Install + write minimal adapter |
+| **Compose** | No single package fits, but 2-3 small ones combine well | Install multiple, write glue code |
+| **Build** | Nothing fits, or dependency cost exceeds value | Write custom, document why |
+**Document the decision** in a code comment at the usage site:
+```typescript
+// search-first: Adopted date-fns for date formatting (2M weekly downloads, 30KB)
+// search-first: Built custom — no package handles our specific wire format
+```
+---
+## Anti-Patterns
+| Anti-Pattern | Correct Approach |
+|-------------|-----------------|
+| Adding a dependency for 5 lines of trivial code | Build — dependency overhead exceeds value |
+| Choosing the most popular package without checking fit | Evaluate API fit, not just popularity |
+| Wrapping a package so heavily it obscures the original | If wrapping > 50% of original API, reconsider |
+| Skipping research because "I know how to build this" | Research anyway — maintenance cost matters more than initial build |
+| Installing a massive framework for one utility function | Look for focused, single-purpose packages |
+## Scope Limiter
+This skill concerns **utility and infrastructure code** only:
+- Data transformation, validation, formatting
+- Network operations, retries, caching
+- CLI tooling, logging, configuration
+- Test utilities and helpers
+It does NOT apply to **domain-specific business logic** where:
+- The logic encodes unique business rules
+- No generic solution could exist
+- The code is inherently project-specific

package/shared/skills/search-first/references/evaluation-criteria.md ADDED Viewed

@@ -0,0 +1,101 @@
+# Search-First — Evaluation Criteria
+Detailed package evaluation criteria and decision matrix for the 4-outcome model.
+## Evaluation Matrix
+Score each candidate on these axes (1-5 scale):
+| Criterion | Weight | 1 (Poor) | 3 (Acceptable) | 5 (Excellent) |
+|-----------|--------|-----------|-----------------|----------------|
+| **Maintenance** | High | No commits in 2+ years | Active, yearly releases | Regular releases, responsive maintainer |
+| **Adoption** | Medium | < 100 weekly downloads | 1K-10K weekly downloads | > 100K weekly downloads |
+| **API Fit** | High | Needs heavy wrapping | Partial fit, thin adapter needed | Direct use, clean API |
+| **Bundle Size** | Medium | > 500KB | 50-500KB | < 50KB |
+| **Security** | High | Known vulnerabilities | No known issues, few dependencies | Audited, zero/minimal dependencies |
+| **License** | Required | GPL/AGPL (restrictive) | LGPL (conditional) | MIT/Apache/BSD (permissive) |
+**Minimum thresholds**: License must be compatible. Security must be ≥ 3. All others are trade-offs.
+## Decision Matrix
+### Adopt (score ≥ 20/25, API Fit ≥ 4)
+The package directly solves the problem with minimal integration code.
+**Example**: Using `zod` for schema validation — exact fit, massive adoption, tiny bundle.
+```
+✅ Adopt: zod v3.22
+- Maintenance: 5 (monthly releases)
+- Adoption: 5 (4M weekly downloads)
+- API Fit: 5 (direct use for all validation)
+- Bundle Size: 4 (57KB)
+- Security: 5 (zero dependencies)
+- Total: 24/25
+```
+### Extend (score ≥ 15/25, API Fit ≥ 2)
+The package handles 60-80% of the need. Write a thin adapter for the rest.
+**Example**: Using `got` for HTTP but wrapping it with project-specific retry and auth logic.
+```
+✅ Extend: got v14
+- Maintenance: 4 (active)
+- Adoption: 5 (8M weekly downloads)
+- API Fit: 3 (need custom retry wrapper)
+- Bundle Size: 3 (150KB)
+- Security: 4 (minimal deps)
+- Total: 19/25
+Adapter: ~30 lines wrapping retry + auth headers
+```
+### Compose (no single package fits, but small packages combine)
+Two or three focused packages together solve the problem better than one large framework.
+**Example**: `ms` (time parsing) + `p-retry` (retry logic) + `quick-lru` (caching) instead of a monolithic HTTP client framework.
+**Rules for Compose**:
+- Maximum 3 packages in a composition
+- Each package must be focused (single responsibility)
+- Total combined bundle < what a monolithic alternative would cost
+- Glue code should be < 50 lines
+### Build (nothing fits, or dependency cost > value)
+Write custom code when:
+- No package scores ≥ 15/25
+- The code is < 50 lines and trivial
+- Zero-dependency constraint is explicit
+- The domain is too specific for generic packages
+**Required**: Document why Build was chosen:
+```typescript
+// search-first: Built custom — our wire format uses non-standard
+// ISO-8601 extensions that no date library handles correctly.
+// Evaluated: date-fns (no custom format support), luxon (500KB overhead),
+// dayjs (close but missing timezone edge case).
+```
+## Ecosystem-Specific Hints
+### Node.js / TypeScript
+- Check npm: `https://www.npmjs.com/package/{name}`
+- Bundle size: `https://bundlephobia.com/package/{name}`
+- Check if Node.js built-ins handle it (`node:crypto`, `node:url`, `node:path`)
+### Python
+- Check PyPI: `https://pypi.org/project/{name}`
+- Check if stdlib handles it (`urllib`, `json`, `pathlib`, `dataclasses`)
+### Rust
+- Check crates.io: `https://crates.io/crates/{name}`
+- Check if std handles it
+### Go
+- Check pkg.go.dev
+- Go standard library is extensive — check stdlib first