npm - specweave - Versions diffs - 1.0.577 → 1.0.579 - Mend

specweave 1.0.577 → 1.0.579

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (94) hide show

package/plugins/specweave/skills/grill/SKILL.md CHANGED Viewed

@@ -8,6 +8,48 @@ model: opus
 # Code Grill Expert
+## Tool-Use Rationale
+- **Read**: Load the increment's `spec.md`, `rubric.md`, `tasks.md`, and the implementation files being interrogated so findings have real evidence.
+- **Grep**: Search for AC IDs, try/catch sites, TODO markers, and patterns cited during grilling.
+- **Glob**: Enumerate implementation and test files within the increment's scope to spot untested paths.
+- **Bash**: Run `npx vitest run` (and related commands) to confirm whether a suspected bug actually trips a test.
+## Model Configuration
+**Default effort**: `xhigh` — recommended for all review tasks per Opus 4.7 conventions.
+**Opt-in max**: `--effort max` enables maximum effort with a warning: "max effort risks overthinking on straightforward problems."
+**Legacy mode**: Set `quality.thinkingBudget: "legacy"` in config to pass a fixed `thinking` parameter (for pre-4.7 models only).
+## Prompt Caching
+`sw:grill` uses Anthropic's ephemeral prompt caching to keep stable context hot between invocations. This cuts cost and latency when running grill back-to-back on the same increment (e.g. during the `sw:done` fix-loop).
+**Files cached by default** (via `static-context-loader`):
+- `CLAUDE.md` (project root)
+- `.specweave/config.json`
+- The active increment's `spec.md`
+- The active increment's `rubric.md` (if present)
+**Cache window**: 5-minute TTL (Anthropic's `cache_control: { type: "ephemeral" }` breakpoint). A second grill invocation within 5 minutes reads the cached prefix and only pays tokens for the dynamic tail.
+**Extending the list**: Add paths to `cache.staticContextFiles` in `.specweave/config.json`:
+```json
+{
+  "cache": {
+    "staticContextFiles": [
+      "CLAUDE.md",
+      ".specweave/config.json",
+      ".specweave/docs/internal/specs/custom-rubric.md"
+    ]
+  }
+}
+```
+**Disable caching**: Set `cache.staticContextFiles: []` in `.specweave/config.json`. Grill will still run, but without the prefix cache (full prompt tokens every call).
+See `.specweave/docs/internal/specs/config-reference.md` and `opus-47-migration.md` for the full caching setup.
 ## Project Overrides
 **Skill Memories**: If `.specweave/skill-memories/grill.md` exists, read and apply its learnings.
@@ -46,6 +88,10 @@ I approach code like a demanding tech lead:
 ## Grill Process
+> **think carefully and step-by-step — this evaluation is harder than it looks**
+Apply this adaptive-thinking prompt hint throughout every grill phase. On Opus 4.7 we no longer pass a `thinking` API parameter; the hint above triggers the model to reason deeply where it matters. When a finding looks obvious, look again — the hardest bugs are the ones hiding in plain sight.
 ### Phase 0: Spec Compliance Interrogation (ALWAYS RUNS)
 **Rubric Integration**: If `rubric.md` exists in the increment directory:
@@ -157,17 +203,18 @@ Every finding from the grill process MUST be scored for confidence. This reduces
 ### Scoring System
 - Each finding receives a confidence score from 0 to 100
-- Only findings with confidence >= 70 are surfaced by default
+- Only findings with confidence >= **50** are surfaced by default
 - Findings below the threshold are silently dropped (they create noise, not value)
 - Categories: **correctness** (bugs), **performance**, **security**, **maintainability**, **edge-case**
+- Override via `quality.grillConfidenceThreshold` in `.specweave/config.json`. Pass `--threshold N` on the CLI for one-off overrides.
 ### Confidence Guidelines
 | Score | Meaning | Action |
 |-------|---------|--------|
 | 90-100 | Certain bug/issue — reproducible or provably wrong | MUST fix before shipping |
-| 70-89 | Very likely issue — strong evidence but not 100% confirmed | SHOULD fix, review recommended |
-| 50-69 | Possible issue — circumstantial evidence | Consider fixing, low priority |
+| 75-89 | Very likely issue — strong evidence but not 100% confirmed | SHOULD fix, review recommended |
+| 50-74 | Possible issue — circumstantial evidence | Consider fixing — surfaced by default at the lowered threshold |
 | <50 | Speculative — gut feeling, no hard evidence | Don't report (noise reduction) |
 **How to score**: Base confidence on concrete evidence. Reading the code and seeing a null dereference path = 95. Suspecting a performance issue without profiling data = 60. "This might be a problem someday" = 30 (don't report).
@@ -192,8 +239,8 @@ Each finding in the grill report MUST use this structured format:
 | Confidence Finding | Legacy Severity |
 |---|---|
 | critical (90-100 confidence) | BLOCKER / CRITICAL |
-| high (70-89 confidence) | MAJOR |
-| medium (50-69 confidence) | MINOR (only if explicitly requested) |
+| high (75-89 confidence) | MAJOR |
+| medium (50-74 confidence) | MINOR (surfaced by default at the lowered 50-point threshold) |
 | low (<50 confidence) | Not reported |
 ### Aggregated Summary
@@ -209,8 +256,8 @@ Total findings: {X} (above threshold)
 Suppressed: {Y} (below confidence threshold)
   Critical (must-fix, confidence 90+): {X}
-  High (should-fix, confidence 70-89): {X}
-  Medium (consider, confidence 50-69): {X} (only shown with --verbose)
+  High (should-fix, confidence 75-89): {X}
+  Medium (consider, confidence 50-74): {X} (shown by default; previously --verbose only)
 Ship readiness: READY | NOT READY | NEEDS REVIEW
@@ -225,11 +272,11 @@ Ship readiness: READY | NOT READY | NEEDS REVIEW
 To see all findings including low-confidence ones:
 ```
-sw:grill 0042 --verbose       # Show findings with confidence >= 50
+sw:grill 0042 --verbose       # Show findings with confidence >= 30 (includes speculative)
 sw:grill 0042 --threshold 30  # Show findings with confidence >= 30
 ```
-Default threshold is 70. Lowering it is useful when debugging a specific area or doing a thorough pre-release review.
+Default threshold is **50** (lowered in SpecWeave 1.1.0). Override via `quality.grillConfidenceThreshold` in `.specweave/config.json` or `--threshold N` on the CLI. Raise it for strictly actionable findings on a focused PR, or lower it further for a thorough pre-release review.
 ---

package/plugins/specweave/skills/help/SKILL.md CHANGED Viewed

@@ -24,10 +24,41 @@ STATUS_JSON=$(specweave status --json 2>/dev/null || echo '{"increments":[]}')
 # Get usage stats (if initialized)
 ANALYTICS_JSON=$(specweave analytics --since 30d --json 2>/dev/null || echo '{}')
+# Parse flags
+SHOW_DEPRECATED=$(echo "$ARGUMENTS" | grep -c -- "--deprecated" || echo "0")
+# Load marketplace.json so deprecated skills can be filtered
+MARKETPLACE_JSON=$(cat "$(specweave root 2>/dev/null)/plugins/specweave/marketplace.json" 2>/dev/null || echo '{"skills":[]}')
 ```
 If any command fails, skip that section gracefully — never show errors to the user.
+## Deprecated skill filtering (v1.1.0+)
+By default, `sw:help` **HIDES** skills with `"deprecated": true` in `plugins/specweave/marketplace.json` from the workflow-stage listing in Step 2 / Section C.
+- Default invocation: `sw:help` → deprecated skills are **not listed**
+- Opt-in invocation: `sw:help --deprecated` → deprecated skills are listed in a dedicated "DEPRECATED" section with their migration notes extracted from each SKILL.md
+**Filtering logic**:
+1. Parse `marketplace.json` to build the set of deprecated skill names: `deprecated = {s.name for s in marketplace.skills if s.deprecated == true}`
+2. When rendering Section C (Skills by Workflow Stage), skip any skill whose name is in `deprecated` unless `--deprecated` was passed.
+3. When `--deprecated` is passed, after Section C render an extra section:
+   ```
+   DEPRECATED — Scheduled for removal
+     sw:github-sync       → Use sw-github:sync-spec (removal: v1.3.0)
+     sw:jira-sync         → Use sw-jira:push / sw-jira:pull (removal: v1.3.0)
+     sw:ado-sync          → Use sw-ado:push / sw-ado:pull (removal: v1.3.0)
+     sw:tdd-red           → Use sw:tdd-cycle --phase red (removal: v1.3.0)
+     sw:tdd-green         → Use sw:tdd-cycle --phase green (removal: v1.3.0)
+     sw:tdd-refactor      → Use sw:tdd-cycle --phase refactor (removal: v1.3.0)
+     sw:github-issue-standard → See .specweave/docs/internal/specs/github-issue-standard.md
+   ```
+4. Deprecated skills are still invokable directly (alias-routed in marketplace.json) — the filter only affects discovery listing, not invocation.
+See `.specweave/docs/internal/specs/skill-deprecation-policy.md` for the full lifecycle policy.
 ## Step 2: Display Help
 ### If NOT initialized (no `.specweave/` directory)
@@ -90,10 +121,7 @@ IMPLEMENT — Build it
   sw:do            Execute tasks step by step
   sw:auto          Autonomous execution (unattended)
   sw:team-lead     Parallel multi-agent orchestration
-  sw:tdd-cycle     Test-driven development (red-green-refactor)
-  sw:tdd-red       Write failing tests first
-  sw:tdd-green     Make failing tests pass
-  sw:tdd-refactor  Refactor with test safety net
+  sw:tdd-cycle     Test-driven development (red-green-refactor; use --phase red|green|refactor for single phase)
 VERIFY — Check quality
   sw:validate      130+ rule-based checks + AI quality assessment

package/plugins/specweave/skills/increment/SKILL.md CHANGED Viewed

@@ -2,10 +2,19 @@
 description: Plan and create SpecWeave increments with PM and Architect agent collaboration. Use when starting new features, hotfixes, bugs, or any development work that needs specification and task breakdown. Creates spec.md, plan.md, tasks.md with proper AC-IDs and living docs integration.
 argument-hint: "<feature-description>"
 model: opus
+effort: xhigh
 ---
+**Effort**: `xhigh` (Opus 4.7 default for planning). Use `--effort max` for unusually complex architecture, accepting the overthinking risk.
 # Plan Product Increment
+## Tool-Use Rationale
+- **Read**: Load `.specweave/config.json`, existing increments, and referenced living docs to inform scope and AC-IDs.
+- **Write**: Produce the four increment artifacts (`metadata.json`, `spec.md`, `plan.md`, `tasks.md`) inside the increment directory.
+- **Edit**: Refine AC-IDs, user-story numbering, and task dependencies after the single-agent draft is complete.
 ## CRITICAL: Plan Mode Required (BLOCKING)
 **You MUST be in plan mode before proceeding.** If not, call `EnterPlanMode` now and wait for confirmation before continuing to Step 0A.
@@ -184,6 +193,18 @@ mkdir -p .specweave/increments/XXXX-name
 Create files in order: metadata.json FIRST, then spec.md, plan.md, tasks.md.
+## Flags
+| Flag | Description | Default |
+|------|-------------|---------|
+| `--regenerate-plan` | Regenerate `plan.md` and `tasks.md` for an existing increment without re-running the PM interview. Useful when architecture changes after spec is finalized. Supersedes the deprecated standalone `sw:plan` skill. | false |
+Example:
+```bash
+sw:increment --regenerate-plan 0014-checkout-flow
+```
 ## Quick Reference
 ### Increment Types
@@ -251,23 +272,35 @@ The PM agent will:
 **After PM agent returns**, read the interview state file to confirm all categories are covered
 before proceeding to spec.md creation (especially when `enforcement: "strict"`).
-## Step 4: Direct Specification Writing (Universal — works with ALL AI tools)
+## Step 4: Single-Agent Planning (DEFAULT — 0669 AC-US4-01, AC-US4-02)
-**After increment folder + metadata.json are created, write the spec files using CLI commands and templates.**
+**Default path: one agent writes spec.md + plan.md + tasks.md + rubric.md sequentially.**
-This is the default path. It works with Claude Code, Cursor, OpenCode, Copilot, Aider, and any other AI tool.
+This is now the default for ALL increments — no fan-out, no team-creation overhead, faster planning for the typical small-to-medium increment. It works with Claude Code, Cursor, OpenCode, Copilot, Aider, and any other AI tool.
 1. Create the increment: `specweave create-increment --auto-id --name "feature-name" --title "Title" --description "Desc" --project "my-app"`
+   - Add `--parallel` to opt into 3-agent fan-out planning (see Step 4a).
 2. Write `spec.md` with user stories and acceptance criteria (use the User Story Format above)
-3. Write `plan.md` with architecture decisions and ADR references
-4. Write `tasks.md` with BDD test plans (Given/When/Then) for each AC
-5. Run: `specweave sync-living-docs {increment-id}`
+3. Write `plan.md` with architecture decisions and ADR references (must contain `## Design` and `## Rationale` headings)
+4. Write `tasks.md` with BDD test plans (Given/When/Then) for each AC (`### T-NN` entries)
+5. Write `rubric.md` with the per-increment quality contract (`## Quality Contract` heading)
+6. Run: `specweave sync-living-docs {increment-id}`
 Proceed to Step 5 after writing all files.
-### Step 4a: Enhanced — Team-Based Delegation (Optional, Claude Code only)
+**Parity contract (enforced by `tests/integration/increment-single-agent-parity.test.ts`):** single-agent output MUST match the top-level structure of the 3-agent path — `spec.md` with an `Acceptance Criteria` section and `AC-US*-NN` IDs, `plan.md` with `Design` and `Rationale` headings, `tasks.md` with `### T-NN` tasks, and `rubric.md` with a `Quality Contract` heading.
+### Step 4a: Opt-In — Team-Based 3-Agent Fan-Out (Parallel Planning)
+**Use the 3-agent fan-out ONLY when one of these gates fires:**
+1. **Explicit flag** — user invoked with `--parallel` (maps to `parallel: true` on the create-increment options).
+2. **Large scope** — user-story count is ≥ 10 in the feature description or an existing draft.
+3. **Keyword trigger** — the feature description contains any of: `parallel`, `team lead`, `fan out`.
+If NONE of these fire, STOP — use Step 4 (single-agent) instead. Do not spawn a planning team for small/medium increments by default.
-**If TeamCreate is available**, use team-based delegation for better quality. This provides isolated context, persistent memory, resumability, auto-compaction, and tmux pane visibility for each agent.
+**When the gate fires and TeamCreate is available**, use team-based delegation for better quality. This provides isolated context, persistent memory, resumability, auto-compaction, and tmux pane visibility for each agent.
 **Team lifecycle:**
 1. `TeamCreate({ team_name: "plan-XXXX-name", description: "Planning: <feature>" })`

package/plugins/specweave/skills/jira-sync/SKILL.md CHANGED Viewed

@@ -1,9 +1,24 @@
 ---
-description: Sync guidance for SpecWeave increments with JIRA epics/stories (content SpecWeave→JIRA, status JIRA→SpecWeave). Use when asking about JIRA integration setup or troubleshooting sync. For actual syncing, use sw-jira:sync command instead.
+description: "[DEPRECATED] Sync guidance for SpecWeave increments with JIRA epics/stories (content SpecWeave→JIRA, status JIRA→SpecWeave). Use when asking about JIRA integration setup or troubleshooting sync. For actual syncing, use sw-jira:push or sw-jira:pull command instead."
 user-invokable: false
+deprecated: true
 allowed-tools: Read, Task
 ---
+> ⚠️ DEPRECATED: Use `sw-jira:push` / `sw-jira:pull` instead. This skill will be removed in v1.3.0.
+## Migration
+This skill has been deprecated as part of the Opus 4.7 framework alignment (increment 0669).
+- **Use instead**: `sw-jira:push` (content SpecWeave→JIRA) and `sw-jira:pull` (status JIRA→SpecWeave)
+- **Removal**: Scheduled for v1.3.0 (2 minor releases after v1.1.0)
+- **Why**: Consolidated sync logic moved to the `sw-jira:*` command family.
+For the migration policy, see `.specweave/docs/internal/specs/skill-deprecation-policy.md`.
+---
 # JIRA Sync Skill
 Coordinates JIRA synchronization by delegating to `jira-mapper` agent.

package/plugins/specweave/skills/judge-llm/SKILL.md CHANGED Viewed

@@ -1,11 +1,35 @@
 ---
-description: Ultrathink LLM-as-Judge validation of completed work. Uses extended thinking and Opus model for thorough, independent evaluation. Use when saying "judge my code", "judge-llm", "deep validate", or as part of sw:done closure.
+description: Adaptive-thinking LLM-as-Judge validation of completed work. Uses the Opus model and an adaptive-thinking prompt hint for thorough, independent evaluation. Use when saying "judge my code", "judge-llm", "deep validate", or as part of sw:done closure.
 allowed-tools: Read, Grep, Glob, Bash
 ---
-# Ultrathink LLM-as-Judge Validation
+# Adaptive-Thinking LLM-as-Judge Validation
-**ULTRATHINK BY DEFAULT** - Validate completed work using extended thinking and the LLM-as-Judge pattern. Provides an independent second opinion separate from `sw:grill`.
+Validate completed work using the adaptive-thinking LLM-as-Judge pattern. Provides an independent second opinion separate from `sw:grill`.
+## Tool-Use Rationale
+- **Read**: Load spec.md, tasks.md, rubric.md, and the files under review to build evaluation context.
+- **Grep**: Search for AC patterns, test assertions, and implementation markers across the codebase.
+- **Glob**: Discover test files and implementation files matching the increment's scope.
+- **Bash**: Run `npx vitest run` to verify test pass rates; check file existence.
+## Adaptive-Thinking Prompt Hint
+> **think carefully and step-by-step — this evaluation is harder than it looks**
+With Opus 4.7, we no longer pass a `thinking` API parameter. Instead, we rely on adaptive thinking triggered by the prompt hint above. The model decides how much reasoning each evaluation requires.
+## Model Configuration
+**Default effort**: `xhigh` — recommended for all evaluation tasks per Opus 4.7 conventions.
+**Opt-in max**: `--effort max` enables maximum effort with a warning: "max effort risks overthinking on straightforward problems."
+**Legacy mode**: Set `quality.thinkingBudget: "legacy"` in config to pass a fixed `thinking` parameter (for pre-4.7 models only).
+## Effort Level
+- **Default**: `xhigh` effort — the judge runs with the highest reasoning effort level by default.
+- **Opt-in**: `--effort max` — elevates to maximum effort for exceptionally complex or high-stakes reviews.
 ## How It Differs from sw:grill
@@ -13,7 +37,7 @@ allowed-tools: Read, Grep, Glob, Bash
 |--------|-------------|-----------------|
 | Execution | In-session (same context) | **Separate Opus API call** |
 | Context | Shares conversation context | **Fresh context (no bias)** |
-| Thinking | Standard reasoning | **Extended thinking / ultrathink** |
+| Reasoning | Standard reasoning | **Adaptive thinking via prompt hint (`xhigh` default, `--effort max` opt-in)** |
 | Output | Confidence-scored findings | Structured verdict + score |
 | Domain | Generic code review | **Built-in domain criteria** |
@@ -23,6 +47,7 @@ allowed-tools: Read, Grep, Glob, Bash
 **TypeScript**: `src/core/skills/skill-judge.ts`
 - Uses Anthropic SDK with user's `ANTHROPIC_API_KEY`
+- Model-version guard: omits the `thinking` API parameter on `claude-opus-4-7*` and newer models; falls back to adaptive-thinking prompt hint
 - AbortController-based timeout to prevent stuck states (default: 60s)
 - Progress logging to `.specweave/logs/judge-llm.log`
 - Fallback to basic pattern matching if no API key
@@ -31,11 +56,11 @@ allowed-tools: Read, Grep, Glob, Bash
 ## Usage
 ```bash
-# DEFAULT: Ultrathink validation (recommended)
+# DEFAULT: Adaptive-thinking validation at xhigh effort
 sw:judge-llm src/file.ts
 sw:judge-llm "src/**/*.ts"
-# Validate git changes (ultrathink by default)
+# Validate git changes (adaptive thinking by default)
 sw:judge-llm --staged           # Staged changes
 sw:judge-llm --last-commit      # Last commit
 sw:judge-llm --diff main        # Diff vs branch
@@ -43,6 +68,9 @@ sw:judge-llm --diff main        # Diff vs branch
 # Quick mode (ONLY if you need speed over thoroughness)
 sw:judge-llm src/file.ts --quick
+# Maximum reasoning effort (opt-in)
+sw:judge-llm src/file.ts --effort max
 # Timeout control (default: 60s)
 sw:judge-llm src/file.ts --timeout 120000
@@ -61,13 +89,13 @@ sw:judge-llm src/file.ts --verbose  # Show progress to console
 1. Read `.specweave/config.json` → check `externalModels.consent` field
 2. If `"always-allow"` → proceed silently
-3. If `"never"` → skip API call, use in-session ultrathink evaluation instead
+3. If `"never"` → skip API call, use in-session adaptive-thinking evaluation instead
 4. If `"ask"` (default):
    - Check if `"anthropic"` is in `externalModels.allowedProviders`
    - If YES → proceed silently (standing permission)
    - If NO → **ASK USER**: "Judge-LLM will call the Anthropic API using your ANTHROPIC_API_KEY. This costs ~$0.01-0.05 per evaluation. Proceed? (yes/no/always)"
      - "yes" → proceed this time only
-     - "no" → skip API call, use in-session ultrathink instead
+     - "no" → skip API call, use in-session adaptive-thinking evaluation instead
      - "always" → run: `grantStandingConsent('anthropic', projectRoot)` from `src/core/llm/consent.ts`, then proceed
 5. No `ANTHROPIC_API_KEY` set → falls back to pattern matching automatically (no cost, no consent needed)
@@ -82,12 +110,16 @@ Determine what to validate:
 - If `--diff <branch>`: get diff against branch
 - If no args: validate recent work in conversation context
-### Step 2: Ultrathink Analysis (Default)
+### Step 2: Adaptive-Thinking Analysis (Default)
+Prompt hint prefix (include verbatim at the start of the judge prompt):
+> **think carefully and step-by-step — this evaluation is harder than it looks**
-Use extended thinking for deep LLM-as-Judge evaluation via the Opus model:
+Use adaptive thinking (triggered by the hint) for deep LLM-as-Judge evaluation via the Opus 4.7 model at `xhigh` effort by default (`--effort max` opt-in for exceptional cases):
 ```
-Claude MUST use ultrathink/extended thinking to:
+Claude MUST think carefully and step-by-step to:
 1. DEEP READ: Thoroughly understand all code, context, and intent
 2. MULTI-DIMENSIONAL ANALYSIS: Evaluate across ALL dimensions:
@@ -109,12 +141,12 @@ Claude MUST use ultrathink/extended thinking to:
 JUDGE-LLM VERDICT: APPROVED | CONCERNS | REJECTED
 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
-Mode: ULTRATHINK (extended thinking)
+Mode: ADAPTIVE-THINKING (xhigh effort)
 Confidence: 0.XX
 Files Analyzed: N
 REASONING:
-[Detailed chain-of-thought from extended thinking]
+[Detailed chain-of-thought from adaptive thinking]
 ISSUES (if any):
   CRITICAL: [title]
@@ -161,7 +193,7 @@ After evaluation (including consent-denied fallback), you **MUST** write a JSON
   "timestamp": "<ISO-8601>",
   "verdict": "APPROVED|CONCERNS|REJECTED",
   "score": 87,
-  "mode": "ultrathink|quick|pattern-match",
+  "mode": "adaptive-thinking|quick|pattern-match",
   "timedOut": false,
   "duration_ms": 45000,
   "consentStatus": "granted",