npm - wogiflow - Versions diffs - 2.16.0 → 2.17.5 - Mend

wogiflow 2.16.0 → 2.17.5

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (112)

package/.claude/commands/wogi-audit.md +212 -17
package/.claude/commands/wogi-research.md +37 -0
package/.claude/commands/wogi-review.md +200 -22
package/.claude/commands/wogi-start.md +45 -0
package/.claude/docs/intent-grounded-review.md +209 -0
package/.workflow/agents/logic-adversary.md +8 -0
package/.workflow/templates/claude-md.hbs +18 -0
package/lib/installer.js +1 -0
package/lib/utils.js +29 -3
package/lib/workspace-changelog.js +2 -1
package/lib/workspace-channel-server.js +4 -6
package/lib/workspace-contracts.js +5 -4
package/lib/workspace-events.js +8 -7
package/lib/workspace-gates.js +4 -3
package/lib/workspace-integration-tests.js +2 -1
package/lib/workspace-intelligence.js +3 -2
package/lib/workspace-locks.js +2 -1
package/lib/workspace-messages.js +7 -6
package/lib/workspace-routing.js +14 -26
package/lib/workspace-session.js +7 -6
package/lib/workspace-sync.js +9 -8
package/lib/workspace.js +45 -2
package/package.json +4 -2
package/scripts/base-workflow-step.js +1 -1
package/scripts/flow +22 -0
package/scripts/flow-adaptive-learning.js +1 -1
package/scripts/flow-aggregate.js +2 -1
package/scripts/flow-architect-pass.js +3 -3
package/scripts/flow-archive-runs.js +372 -0
package/scripts/flow-ask.js +121 -0
package/scripts/flow-ast-grep.js +216 -0
package/scripts/flow-audit-gates.js +1 -1
package/scripts/flow-auto-learn.js +8 -11
package/scripts/flow-bug.js +2 -2
package/scripts/flow-capture-gate.js +644 -0
package/scripts/flow-capture.js +4 -3
package/scripts/flow-cli-flags.js +95 -0
package/scripts/flow-community-sync.js +2 -1
package/scripts/flow-community.js +6 -6
package/scripts/flow-conclusion-classifier.js +310 -0
package/scripts/flow-config-defaults.js +13 -3
package/scripts/flow-constants.js +11 -12
package/scripts/flow-context-scoring.js +1 -0
package/scripts/flow-correction-detector.js +344 -3
package/scripts/flow-damage-control.js +1 -1
package/scripts/flow-decisions-merge.js +1 -0
package/scripts/flow-done-gates.js +20 -0
package/scripts/flow-done-report.js +2 -2
package/scripts/flow-done.js +4 -4
package/scripts/flow-epics.js +5 -11
package/scripts/flow-id.js +92 -0
package/scripts/flow-io.js +15 -5
package/scripts/flow-knowledge-router.js +2 -1
package/scripts/flow-links.js +1 -1
package/scripts/flow-log-manager.js +2 -1
package/scripts/flow-logic-adversary.js +4 -4
package/scripts/flow-long-input-cli.js +6 -0
package/scripts/flow-long-input-stories.js +1 -1
package/scripts/flow-loop-retry-learning.js +1 -1
package/scripts/flow-mcp-capabilities.js +2 -3
package/scripts/flow-mcp-docs.js +2 -1
package/scripts/flow-memory-blocks.js +2 -1
package/scripts/flow-memory-sync.js +1 -1
package/scripts/flow-memory.js +767 -0
package/scripts/flow-migrate-igr.js +1 -1
package/scripts/flow-migrate.js +2 -1
package/scripts/flow-model-adapter.js +1 -1
package/scripts/flow-model-config.js +5 -1
package/scripts/flow-model-profile.js +2 -1
package/scripts/flow-orchestrate.js +3 -3
package/scripts/flow-output.js +29 -0
package/scripts/flow-parallel.js +10 -9
package/scripts/flow-pattern-enforcer.js +2 -1
package/scripts/flow-permissions-audit.js +124 -0
package/scripts/flow-plugin-registry.js +2 -2
package/scripts/flow-progress.js +5 -1
package/scripts/flow-project-analyzer.js +1 -1
package/scripts/flow-promote.js +510 -0
package/scripts/flow-registries.js +86 -0
package/scripts/flow-request-log.js +133 -0
package/scripts/flow-research-protocol.js +0 -1
package/scripts/flow-revision-tracker.js +2 -1
package/scripts/flow-roadmap.js +2 -1
package/scripts/flow-rules-sync.js +3 -7
package/scripts/flow-session-end.js +3 -1
package/scripts/flow-session-learning.js +6 -13
package/scripts/flow-session-state.js +2 -2
package/scripts/flow-setup-hooks.js +2 -1
package/scripts/flow-skill-create.js +1 -1
package/scripts/flow-skill-freshness.js +6 -7
package/scripts/flow-skill-learn.js +1 -1
package/scripts/flow-step-coverage.js +1 -1
package/scripts/flow-step-security.js +1 -1
package/scripts/flow-story.js +58 -10
package/scripts/flow-sys.js +204 -0
package/scripts/flow-task-hierarchy.js +88 -0
package/scripts/flow-tech-debt.js +2 -1
package/scripts/flow-test-api.js +1 -1
package/scripts/flow-utils.js +60 -890
package/scripts/hooks/core/bugfix-scope-gate.js +5 -4
package/scripts/hooks/core/deploy-gate.js +1 -1
package/scripts/hooks/core/pre-tool-helpers.js +72 -0
package/scripts/hooks/core/pre-tool-orchestrator.js +442 -0
package/scripts/hooks/core/routing-gate.js +8 -0
package/scripts/hooks/core/session-context.js +35 -0
package/scripts/hooks/core/session-end.js +28 -0
package/scripts/hooks/core/task-boundary-reset.js +10 -0
package/scripts/hooks/entry/claude-code/pre-tool-use.js +48 -492
package/scripts/hooks/entry/claude-code/user-prompt-submit.js +12 -0
package/scripts/hooks/entry/shared/hook-runner.js +1 -1
package/scripts/registries/schema-registry.js +1 -1
package/scripts/registries/service-registry.js +1 -1

package/.claude/commands/wogi-review.md CHANGED

@@ -29,7 +29,7 @@ Auto-detects when to use multi-pass (4 sequential passes) vs parallel (3 agents)
 At each phase checkpoint, display a progress bar AND update the progress state file:
 ```bash
-node node_modules/wogiflow/scripts/flow-progress-tracker.js update '{"taskId":"wf-XXX","command":"/wogi-review","phase":"AI Review","phaseNum":2,"totalPhases":5,"step":"Agent 3/6 complete","stepNum":3,"totalSteps":6}'
+node node_modules/wogiflow/scripts/flow-progress-tracker.js update '{"taskId":"wf-XXX","command":"/wogi-review","phase":"AI Review","phaseNum":2,"totalPhases":7,"step":"Agent 3/6 complete","stepNum":3,"totalSteps":6}'
 ```
 **Standard format for each checkpoint:**
@@ -38,34 +38,49 @@ node node_modules/wogiflow/scripts/flow-progress-tracker.js update '{"taskId":"w
   Agent 3/6 complete
 ```
-**Phase mapping for /wogi-review:**
+**Phase mapping for /wogi-review (v6.0 — IGR-hardened):**
 | phaseNum | Phase | Description |
 |-------|----------|-------------|
+| 0 | Review Framing | Scope + assumptions (IGR v6.0) |
 | 1 | Verification Gates | Syntax, lint, tests |
 | 2 | AI Review | N agents (sub-steps = agents) |
+| 2.5 | Git-Verified Claims | Cross-reference spec vs diff |
+| 2.8 | Findings Adversary | Different-model critique (IGR v6.0) |
 | 3 | Standards + Promotion | Compliance check + pattern learning |
 | 4 | Optimization | Solution suggestions |
-| 5 | Post-Review | Fix routing, learning, archive |
+| 5 | Post-Review | Fix routing, truth gate, archive |
+Note: pass `totalPhases: 7` to the progress tracker. There are 8 named phases overall (0, 1, 2, 2.5, 2.8, 3, 4, 5); Phase 0 is counted as slot 0, so the checkpoint counters run from [0/7] to [7/7].
 On review completion, clear progress: `node node_modules/wogiflow/scripts/flow-progress-tracker.js clear`
-## Review Phases (v5.0)
+## Review Phases (v6.0 — IGR-hardened)
 ```
 ┌─────────────────────────────────────────────────────────────┐
 │  /wogi-review                                                │
 ├─────────────────────────────────────────────────────────────┤
+│  Phase 0: Review Framing Pass (IGR v6.0)                     │
+│     → Interpret what the user asked to review                │
+│     → Surface scope (in/out) + review-model assumptions      │
+│     → Item reconciliation (anti-deferral guard)              │
+│                                                              │
 │  Phase 1: Verification Gates                                 │
 │     → Spec verification, lint, typecheck, tests              │
 │                                                              │
 │  Phase 2: AI Review (multi-pass or parallel)                 │
 │     → Code/Logic, Security, Architecture analysis            │
 │     → Adversarial mode: min findings per agent (v5.0)        │
+│     → Evidence tiers required on every finding (IGR v6.0)    │
 │                                                              │
 │  Phase 2.5: Git-Verified Claim Checking (v5.0)               │
 │     → Cross-reference spec claims vs actual git diff         │
 │     → BLOCKS if spec promises files not in git diff          │
 │                                                              │
+│  Phase 2.8: Findings Adversary Critique (IGR v6.0)           │
+│     → Different-model review of the findings themselves      │
+│     → Flags false positives, severity inflation, missed bugs │
+│                                                              │
 │  Phase 3: Standards Compliance [STRICT]                      │
 │     → decisions.md, app-map.md, naming-conventions.md        │
 │     → MUST_FIX violations block sign-off in Phase 5          │
@@ -74,8 +89,9 @@ On review completion, clear progress: `node node_modules/wogiflow/scripts/flow-p
 │     → Technical alternatives, UX improvements                │
 │     → Suggestions only - not violations                      │
 │                                                              │
-│  Phase 5: Post-Review Workflow                               │
+│  Phase 5: Post-Review Workflow + Completion Truth Gate       │
 │     → Fix loop, learning, task creation                      │
+│     → "Fixed" claims require INTERACTIVE evidence (IGR v6.0) │
 └─────────────────────────────────────────────────────────────┘
 ```
@@ -112,9 +128,17 @@ Multi-pass advantages:
 The review system has **two layers**:
 1. **Runtime scripts** (`flow-review.js`, `flow-standards-checker.js`, `flow-solution-optimizer.js`) — perform automated pre-flight checks (verification gates, standards, optimization). These are helper tools, NOT the full review.
-2. **AI instructions** (this document) — describe the complete 5-phase review loop, agent spawning, and post-review workflow. The AI model executes the full 5-phase loop, using runtime script output as input to specific phases.
+2. **AI instructions** (this document) — describe the complete 7-phase review loop, agent spawning, and post-review workflow. The AI model executes the full 7-phase loop, using runtime script output as input to specific phases.
+**The runtime script does NOT execute all 7 phases.** It handles pre-flight only. You (the AI) are responsible for orchestrating the complete review.
+### IGR v6.0 — Config Enforcement + Adversary Model Rule (concise)
-**The runtime script does NOT execute all 5 phases.** It handles pre-flight only. You (the AI) are responsible for orchestrating the complete review.
+All `config.review.*` toggles are **AI-honored, not runtime-enforced**. Load config first, print toggle states, honor them. Matches `/wogi-audit`'s docs-driven model.
+`adversaryPass.adversaryModel` is a mapping. **Override-always rule**: the adversary MUST run on a different model than the review agents (same-model = rubber-stamp). If the resolved value equals the agent model, pick a different model regardless.
+Full reference: [intent-grounded-review.md → Config Enforcement Model](../docs/intent-grounded-review.md#config-enforcement-model--reference-detail).
 ## Step 0: Scope Resolution (Natural Language Scoping)
@@ -182,7 +206,7 @@ The resolved file list replaces the default git diff in Phase 1. All subsequent
 ## How It Works (MANDATORY 5-PHASE SEQUENTIAL EXECUTION)
-**CRITICAL: You MUST execute ALL 5 phases sequentially. Do NOT stop after Phase 2.**
+**CRITICAL: You MUST execute ALL 7 phases sequentially (0 → 1 → 2 → 2.5 → 2.8 → 3 → 4 → 5). Do NOT stop after Phase 2.**
 ```
 ┌─────────────────────────────────────────────────────────────┐
@@ -221,7 +245,7 @@ The resolved file list replaces the default git diff in Phase 1. All subsequent
 │     → Persist findings, present fix options to user          │
 │     → If user chooses fix: convert to todos, fix loop        │
 │     → Learning capture: corrections, pattern promotion       │
-│     → Display "Phases: 5/5 executed"                         │
+│     → Display "Phases: 7/7 executed"                         │
 │     ✓ CHECKPOINT: "Phase 5 complete - Review done"           │
 │                                                              │
 └─────────────────────────────────────────────────────────────┘
@@ -535,6 +559,48 @@ Track phases completed: start at 0/5, increment after each phase checkpoint.
 ---
+### PHASE 0: Review Framing Pass (IGR v6.0)
+**Config toggle**: `config.review.framingPass.enabled` (default `true`). Reference: [intent-grounded-review.md → Phase 0](../docs/intent-grounded-review.md#phase-0-review-framing-pass--reference-detail).
+**Procedure**:
+1. Interpret the review request into a **Framing Artifact** with 5 fields: `interpretation`, `scopeIn`, `scopeOut`, `assumptions`, `posture` (`pre-ship` | `session-review` | `security-focused` | `exploratory`).
+2. Write the artifact to `.workflow/state/review-framing/{timestamp}.md` (with PIN markers).
+3. Display a short summary:
+   ```
+   ━━━ REVIEW FRAMING ━━━
+   Interpretation: [one sentence]
+   Scope (in):  [list]
+   Scope (out): [list]
+   Assumptions:
+     - [assumption 1]
+     - [assumption 2]
+   Posture: [pre-ship | session-review | security-focused | exploratory]
+   Proceeding with N-agent analysis on this scope.
+   ━━━━━━━━━━━━━━━━━━━━━━
+   ```
+4. **Item reconciliation (MANDATORY anti-deferral guard)**: if the user's request enumerated multiple items, each MUST appear in `scopeIn`. If the count shrank, framing FAILS — require user confirmation before proceeding.
+5. **Posture adjusts agent weighting** — see the reference doc for the full table.
+**Display Phase 0 results**:
+```
+═══════════════════════════════════════
+PHASE 0: REVIEW FRAMING [0/7]
+═══════════════════════════════════════
+[Framing artifact summary]
+✓ Phase 0 complete. Proceeding to Phase 1...
+```
+Config toggles: `review.framingPass.enabled` (default true), `review.framingPass.itemReconciliation` (default true), `review.framingPass.adversaryInExploratory` (default false).
+---
 ### PHASE 1: Verification Gates
 **1.1. Get changed files**:
@@ -554,7 +620,7 @@ git diff --name-only HEAD~N HEAD  # If --commits N specified
 **1.3. Display Phase 1 results**:
 ```
 ═══════════════════════════════════════
-PHASE 1: VERIFICATION GATES [1/5]
+PHASE 1: VERIFICATION GATES [1/7]
 ═══════════════════════════════════════
 ✓ Spec: N/N deliverables exist
 ✓ Lint: passed
@@ -613,9 +679,9 @@ Agent Lineup (N agents):
   Total: N (max: 6)
 ```
-**2.3. Append adversarial minimum findings suffix to EVERY agent prompt**:
+**2.3. Append adversarial minimum findings suffix + evidence tier requirement to EVERY agent prompt**:
-Read `config.review.minFindings` (default: 3). Append this to every agent's prompt:
+Read `config.review.minFindings` (default: 3) and `config.review.evidenceTiers.enabled` (default: true). Append this to every agent's prompt:
 ```
 IMPORTANT: Adversarial Review Mode
@@ -623,8 +689,37 @@ You MUST find at least [minFindings] findings. If you genuinely cannot find
 [minFindings] issues, you MUST provide a "clean code justification" as a
 special finding with type "clean-justification" explaining WHY the code is
 clean. Generic praise like "looks good" is NOT acceptable.
+IMPORTANT: Evidence Tier Requirement (IGR v6.0)
+Every finding MUST carry two additional fields:
+  evidenceTier: integer 0–4
+    0 = STATIC      — inferred from source alone (weakest)
+    1 = STRUCTURAL  — grepped / globbed / counted instances
+    2 = OBSERVATIONAL — ran a tool (lint, typecheck, npm audit) and read output
+    3 = INTERACTIVE — executed code/tests and observed behavior
+    4 = AUTOMATED   — deterministic check in a quality gate / test suite
+  evidenceNote: one-line string citing what produced the evidence
+    examples: "grep 'JSON\\.parse' returned 7 matches in src/api/"
+              "ran require.resolve() — path resolves correctly"
+              "executed tests/foo.test.js and observed assertion failure"
+SEVERITY IS CAPPED BY TIER:
+  - Tier 0: severity MUST be LOW (and will be flagged UNVERIFIED in the report)
+  - Tier 1: severity capped at MEDIUM (unless grep returned >=5 instances → HIGH allowed)
+  - Tier 2+: severity stands as you assign it
+Also respect the FRAMING ARTIFACT from Phase 0 — only report findings within
+`scopeIn`. Findings that fall within `scopeOut` will be moved to an appendix by the
+orchestrator.
 ```
+**Why evidence tiers matter**: During this project's own self-review (session logs), a `code-reviewer` agent reported an F1 finding as "Critical — broken require path" without citing evidence. Manual verification via `require.resolve()` showed the path was correct — the agent's path math was flawed. With tier enforcement, F1 would have been Tier 0 (no grep, no execution), capped at LOW, and flagged UNVERIFIED — alerting the reader to verify before acting.
+**Config toggles**: `review.evidenceTiers.enabled` (default true), `review.evidenceTiers.capByTier` (default true — enforce severity caps).
 **2.4. Launch ALL agents in parallel** (single message with N Task tool calls, subagent_type=Explore)
 **2.5. Wait for all agents to complete**
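The severity caps in the step 2.3 prompt can be sketched as a small function. This is a minimal illustration, not code from the package: the `capSeverity` name, the `grepMatchCount` parameter, the `unverified` flag, and the LOW < MEDIUM < HIGH < CRITICAL ordering are all assumptions, and so is the reading that a Tier 1 finding with 5 or more matches is capped at HIGH rather than left uncapped.

```javascript
// Hypothetical helper illustrating the "SEVERITY IS CAPPED BY TIER" rules.
// The ordering below is an assumption based on the severities named in the
// review summary (critical, high, medium, low).
const SEVERITY_ORDER = ["LOW", "MEDIUM", "HIGH", "CRITICAL"];

function capSeverity(finding, grepMatchCount = 0) {
  const { evidenceTier, severity } = finding;

  let cap;
  if (evidenceTier === 0) {
    cap = "LOW"; // Tier 0: severity must be LOW
  } else if (evidenceTier === 1) {
    // Tier 1: MEDIUM, or HIGH when grep returned >= 5 instances
    cap = grepMatchCount >= 5 ? "HIGH" : "MEDIUM";
  } else {
    cap = "CRITICAL"; // Tier 2+: severity stands as assigned
  }

  const exceedsCap =
    SEVERITY_ORDER.indexOf(severity) > SEVERITY_ORDER.indexOf(cap);

  return {
    ...finding,
    severity: exceedsCap ? cap : severity,
    unverified: evidenceTier === 0, // Tier 0 findings are flagged UNVERIFIED
  };
}
```

Applied to the F1 incident described above, a Tier 0 finding reported as CRITICAL would come back as LOW with `unverified: true`.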
@@ -657,7 +752,7 @@ clean. Generic praise like "looks good" is NOT acceptable.
 **2.7. Display Phase 2 results (per-agent sections)**:
 ```
 ═══════════════════════════════════════
-PHASE 2: AI REVIEW [2/5]
+PHASE 2: AI REVIEW [2/7]
 ═══════════════════════════════════════
 Agents: N launched (3 core + 1 optional + 2 project-rules)
@@ -713,7 +808,7 @@ git diff --name-only               # For unstaged changes
 **2.5.5. Display Phase 2.5 results**:
 ```
 ═══════════════════════════════════════
-PHASE 2.5: GIT-VERIFIED CLAIMS [2.5/5]
+PHASE 2.5: GIT-VERIFIED CLAIMS [3/7]
 ═══════════════════════════════════════
 Spec: .workflow/changes/wf-XXXXXXXX.md
@@ -733,6 +828,52 @@ Summary: X verified, Y missing, Z unplanned
 ---
+### PHASE 2.8: Findings Adversary Critique (IGR v6.0)
+**Config toggle**: `config.review.adversaryPass.enabled` (default `true`; MANDATORY when framing posture is `pre-ship`). Reference: [intent-grounded-review.md → Phase 2.8](../docs/intent-grounded-review.md#phase-28-findings-adversary-critique--reference-detail).
+**Procedure**:
+1. **Collect inputs**: the framing artifact + all Phase 2 findings (with `evidenceTier` + `evidenceNote`) + Phase 2.5 git-claim results.
+2. **Launch ONE Agent sub-agent** (`subagent_type=Explore`, READ-ONLY) on a DIFFERENT model than the review agents. Resolve via `config.review.adversaryPass.adversaryModel` mapping: agents on Sonnet → adversary on Opus; agents on Opus → adversary on Sonnet; agents on Haiku → adversary on Sonnet. **Override-always rule**: if the resolved value equals the agent model, pick a different model anyway.
+3. **Adversary prompt** — produce JSON with: `falsePositives[]`, `missedIssues[]`, `severityAdjustments[]`, `scopeDrift[]`, `evidenceChallenges[]`, `overallVerdict` (`ACCEPT | ACCEPT_WITH_ADJUSTMENTS | REVISE_SCOPE | BLOCK`).
+    HUNT specifically for: (a) `evidenceTier=0` + severity ≥ HIGH, (b) line-number claims without code quotes, (c) "broken require path" / "missing import" / "wrong type" without `require.resolve` / `tsc` / `grep` verification, (d) findings contradicting `scopeIn`/`scopeOut`.
+    Forbid "I think" / "might" / "could" — require evidence. Full prompt template in the reference doc.
+4. **Parse + apply adjustments**: `severityAdjustments` rewrite severity (mark `[ADVERSARY-ADJUSTED]`); `scopeDrift` moves to appendix; `falsePositives` marked `[DISPUTED]` (not removed); `missedIssues` appended as `[ADVERSARY-FOUND]` Tier-0; `evidenceChallenges` downgrade tier and re-apply severity cap.
+5. **Archive** run to `.workflow/state/adversary-runs/review-{timestamp}.json` for the pattern-promotion pipeline.
+6. **Display Phase 2.8 results**:
+```
+═══════════════════════════════════════
+PHASE 2.8: FINDINGS ADVERSARY [4/7]
+═══════════════════════════════════════
+Adversary model: [model]  (agents: [agent-model])
+Verdict:         [ACCEPT | ACCEPT_WITH_ADJUSTMENTS | REVISE_SCOPE | BLOCK]
+False positives:       N  (marked [DISPUTED])
+Severity adjustments:  N  (marked [ADVERSARY-ADJUSTED])
+Missed issues found:   N  (appended as [ADVERSARY-FOUND] Tier-0 findings)
+Scope drift:           N  (moved to Out-of-Scope appendix)
+Evidence challenges:   N  (tier downgraded, severity re-capped)
+[For each item, one-line summary with finding ID + reason]
+✓ Phase 2.8 complete. Proceeding to Phase 3...
+```
+**One pass only**: no iteration loop. If the adversary returns a `BLOCK` verdict, display the block reason prominently and require the user to acknowledge before proceeding to Phase 3, or to retry the review with adjusted scope.
+**Config toggles**: `review.adversaryPass.enabled` (default true), `review.adversaryPass.adversaryModel` (mapping object — see "Adversary Model Selection Rule" in the Architecture Note; resolve at runtime based on agent model, override-always rule applies), `review.adversaryPass.applySeverityAdjustments` (default true), `review.adversaryPass.applyScopeDrift` (default true), `review.adversaryPass.blockOnBlockVerdict` (default true).
+---
 ### PHASE 3: Standards Compliance [STRICT]
 **This phase BLOCKS review completion if MUST_FIX violations are found.**
@@ -766,7 +907,7 @@ After running the standards check, feed any violations through the pattern promo
 **3.4. Display Phase 3 results**:
 ```
 ═══════════════════════════════════════
-PHASE 3: STANDARDS COMPLIANCE [3/5]
+PHASE 3: STANDARDS COMPLIANCE [5/7]
 ═══════════════════════════════════════
 ✓ decisions.md: passed
@@ -809,7 +950,7 @@ Or if the runtime script is not available, manually analyze changed files for:
 **4.3. Display Phase 4 results**:
 ```
 ═══════════════════════════════════════
-PHASE 4: SOLUTION OPTIMIZATION [4/5]
+PHASE 4: SOLUTION OPTIMIZATION [6/7]
 ═══════════════════════════════════════
 Technical (N):
@@ -850,7 +991,7 @@ Phase Results:
 Total Findings: N (X critical, Y high, Z medium, W low)
 Pattern Learning: P patterns tracked, M promoted, G enforcement gaps
-Phases: 5/5 executed
+Phases: 7/7 executed
 ```
 **5.2. Present severity-aware fix options to user** (use AskUserQuestion):
@@ -990,14 +1131,51 @@ This ensures that patterns discovered during code review feed into the same prom
 - Save review report to `.workflow/reviews/YYYY-MM-DD-HHMMSS-review.md`
 - Include: date, files reviewed, mode, all findings with status (fixed/task-created/dismissed), summary
-**5.6. Sign-off gate**:
+**5.6. Completion Truth Gate (IGR v6.0)** — runs BEFORE sign-off:
+**Config toggle**: `config.review.completionTruthGate.enabled` (default `true`).
+**Problem this solves**: A review's "fixed" claim is only as good as the evidence behind it. A finding marked `fixed` because the AI applied an edit is NOT the same as a finding verified to work. Without a truth gate, the sign-off rubber-stamps whatever the agent says.
+**Procedure** — for every finding now marked `status: fixed`:
+1. **Check evidence tier of the fix**:
+   - Did the fix come with an executed test (`tier ≥ 3 INTERACTIVE`)?
+   - Or an automated gate confirming the fix (`tier 4 AUTOMATED`)?
+   - Or just an edit + lint pass (`tier 2 OBSERVATIONAL`)?
+   - Or just an edit (`tier 0 STATIC`)?
+2. **Downgrade rule**:
+   - `tier ≥ 3` → status stays `fixed` (INTERACTIVE evidence is sufficient)
+   - `tier 2` → status downgraded to `fixed-unverified` (lint/typecheck passed but behavior not exercised)
+   - `tier ≤ 1` → status downgraded to `implemented-unverified` (edit applied, no evidence of correctness)
+3. **Display the downgrade in the final summary**:
+```
+━━━ COMPLETION TRUTH GATE ━━━
+  Findings marked "fixed":         N
+  Tier ≥ 3 (INTERACTIVE):          M  → status stands
+  Tier 2 (OBSERVATIONAL):          K  → downgraded to "fixed-unverified"
+  Tier ≤ 1 (STATIC/STRUCTURAL):    J  → downgraded to "implemented-unverified"
+  ⚠ K + J findings lack runtime proof of fix.
+  To upgrade: run the relevant tests / smoke-test / browser check and re-verify.
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+```
+4. **Persist downgraded statuses** to `last-review.json`. Do NOT silently mark everything as complete.
+**Config toggles**: `review.completionTruthGate.enabled` (default true), `review.completionTruthGate.requireInteractiveForFixed` (default true — when false, Tier 2 counts as fully fixed).
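The downgrade rule in step 5.6 could look roughly like the sketch below. It is an illustration under stated assumptions: `applyTruthGate` and the `fixEvidenceTier` field are hypothetical names, and treating a missing tier as Tier 0 is a guess, since the source does not say how an untiered fix is handled.

```javascript
// Hypothetical sketch of the Completion Truth Gate downgrade rule.
// `fixEvidenceTier` is an assumed field name for the tier of the fix evidence.
function applyTruthGate(findings, requireInteractiveForFixed = true) {
  return findings.map((finding) => {
    if (finding.status !== "fixed") return finding; // gate only touches "fixed"

    const tier = finding.fixEvidenceTier ?? 0; // assumption: no tier means Tier 0

    if (tier >= 3) return finding; // INTERACTIVE or AUTOMATED: status stands
    if (tier === 2) {
      // OBSERVATIONAL: downgraded unless the toggle allows Tier 2 as fixed
      return requireInteractiveForFixed
        ? { ...finding, status: "fixed-unverified" }
        : finding;
    }
    // Tier 0 or 1: edit applied, no evidence of correctness
    return { ...finding, status: "implemented-unverified" };
  });
}
```

The second parameter mirrors `review.completionTruthGate.requireInteractiveForFixed`: when it is `false`, Tier 2 fixes keep the `fixed` status.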
+**5.7. Sign-off gate**:
 - Present summary to user and ask for confirmation that the review is complete
-- If user requests additional fixes, return to step 5.3
+- Display the truth-gate downgrade counts prominently — the user should consciously accept unverified fixes, not have them hidden
+- If user requests additional fixes or verification, return to step 5.3
-**5.7. Display final checkpoint**:
+**5.8. Display final checkpoint**:
 ```
 ═══════════════════════════════════════
-PHASE 5: POST-REVIEW COMPLETE [5/5]
+PHASE 5: POST-REVIEW COMPLETE [7/7]
 ═══════════════════════════════════════
 Findings: N total
@@ -1010,7 +1188,7 @@ Pattern Learning:
 Run /wogi-review-fix --pending to batch-process deferred items.
-Phases: 5/5 executed
+Phases: 7/7 executed
 Review complete.
 ```

package/.claude/commands/wogi-start.md CHANGED

@@ -101,6 +101,51 @@ When a local `/wogi-*` CLI command fails (error in output, "Unknown skill", comm
 - After `/wogi-start` classifies as conversation: Read, Glob, Grep, WebSearch, WebFetch (read-only). No Edit/Write/state modifications.
 - Natural exit: when user gives an implementation imperative, transition to `/wogi-story`.
+**Research Reasoning Gate** (applies inside Conversation mode when `config.researchReasoningGate.enabled` — default ON): classify the question into a tier based on structural markers. Do NOT self-classify the question's complexity — use the markers below mechanically. When ambiguous, default to Tier 2.
+| Tier | Marker phrases | What you do |
+|------|---------------|-------------|
+| **Tier 1 — Factual** | "what is", "how many", "show me", "list all", "which file", "where does" | Answer directly from code/docs. No gate. |
+| **Tier 2 — Domain** (default for ambiguous) | "what should", "how should", "recommend", "which approach", "what do you think about", "is it better to" | **Surface assumptions, then WAIT.** |
+| **Tier 3 — Architecture** | "should we restructure", "what's the right architecture", "design a schema", "how to migrate", "should we split / merge / replace" | Tier 2 flow + spawn adversary on a different model after recommendation. |
+**Tier 2 flow — the user is the adversary**:
+1. Before any analysis, identify the domain-model assumptions your answer will depend on (typically 2–5).
+2. Present them in a fenced block and STOP:
+   ```
+   ━━━ ASSUMPTIONS (confirm before I analyze) ━━━
+   My analysis will depend on these domain model assumptions:
+   1. <assumption 1>
+   2. <assumption 2>
+   3. <assumption 3>
+   Do these match your understanding? [confirm / correct]
+   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+   ```
+3. WAIT for the user to confirm or correct. Do not analyze while waiting.
+4. When confirmed (or corrected), ground the analysis in the user's stated model — not your original guess.
+**Tier 3 flow** — after steps 1–4 above, also:
+5. Produce the recommendation.
+6. Spawn an Agent sub-agent on a DIFFERENT model (config-controlled, default `sonnet`) with: the user's confirmed assumptions + your recommendation + the original question. Ask: "Does this recommendation follow from these assumptions? What's the strongest counterargument? List 1–3 specific concerns with line/file citations where possible."
+7. Present both the recommendation AND the adversary critique to the user in a single response:
+   ```
+   ━━━ RECOMMENDATION ━━━
+   <your recommendation>
+   ━━━ ADVERSARY CRITIQUE (reviewed by a different model) ━━━
+   <sub-agent output>
+   ```
+8. One pass only — this is conversation, not implementation. No iteration loop.
+**Config toggles**:
+- `researchReasoningGate.enabled` — master switch
+- `researchReasoningGate.tier2.enabled` — assumption surfacing
+- `researchReasoningGate.tier3.enabled` — spawn adversary
+- `researchReasoningGate.tier3.adversaryModel` — model for the critique agent (default `sonnet`)
+**Why this works** (from spec wf-6dbc0b2a): same-model self-critique is a known rubber-stamp. The USER is the effective adversary — you surface assumptions so they can validate the domain model before you build recommendations on invisible guesses.
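The mechanical marker-phrase tiering of the Research Reasoning Gate can be approximated with plain substring matching. The sketch below is hypothetical: `classifyQuestion` and `TIER_MARKERS` are illustrative names, "should we split / merge / replace" is expanded into three phrases, and the choice to let the highest matching tier win is an assumption, since the source only specifies that ambiguous questions default to Tier 2.

```javascript
// Hypothetical marker table copied from the Research Reasoning Gate tiers.
const TIER_MARKERS = {
  1: ["what is", "how many", "show me", "list all", "which file", "where does"],
  2: ["what should", "how should", "recommend", "which approach",
      "what do you think about", "is it better to"],
  3: ["should we restructure", "what's the right architecture",
      "design a schema", "how to migrate",
      "should we split", "should we merge", "should we replace"],
};

function classifyQuestion(question) {
  const text = question.toLowerCase();
  // Assumption: when several tiers match, the highest tier wins.
  for (const tier of [3, 2, 1]) {
    if (TIER_MARKERS[tier].some((marker) => text.includes(marker))) {
      return tier;
    }
  }
  return 2; // no marker matched: ambiguous questions default to Tier 2
}
```

For example, `classifyQuestion("Which file defines the routing gate?")` yields Tier 1, while a question with no marker phrase falls through to Tier 2.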
 **Everything else**: Route to best command from catalog. Zero exemptions.
 ### Examples

package/.claude/docs/intent-grounded-review.md ADDED

@@ -0,0 +1,209 @@
+# Intent-Grounded Review (IGR v6.0) — Reference
+Detailed reference for the IGR phases added to `/wogi-review` in v2.17.3. The main `/wogi-review` skill file contains the mandatory execution flow; this doc holds the full rationale, examples, and background each phase depends on.
+Read this when:
+- You're debugging unexpected review-skill behavior
+- You're tuning `config.review.*` and want to understand each key's impact
+- You're extending the review pipeline (adding a new phase, new adversary, new tier)
+- A review surfaced a false-positive finding and you want to know which phase should have caught it
+The four IGR additions to the review pipeline:
+| Phase | What it adds |
+|-------|---|
+| Phase 0 — Framing | Scope + assumptions surfaced BEFORE agents launch |
+| Phase 2 — Evidence Tiers | Every finding carries `evidenceTier` + `evidenceNote`; severity capped by tier |
+| Phase 2.8 — Findings Adversary | Different-model critique of the findings themselves |
+| Phase 5 — Completion Truth Gate | "Fixed" claims require INTERACTIVE evidence |
+---
+## Phase 0: Review Framing Pass — Reference Detail
+**Problem this solves**: "Review" means different things in different invocations. "Review what we just did" is bounded to the session diff; "review the auth flow" is bounded to a module; "review before ship" expects a final-sign-off posture. Without explicit framing, the AI picks its own scope and grades findings against its own mental rubric, producing a different answer than the user asked for.
+### The Five Framing Fields
+| Field | What it captures |
+|---|---|
+| `interpretation` | One sentence: "I understand this as: review X with posture Y" |
+| `scopeIn` | Explicit list: which files, commits, or modules are in scope |
+| `scopeOut` | Explicit list: what this review will NOT cover (by design, not omission) |
+| `assumptions` | 2–5 review-model assumptions (e.g., "a refactor review must verify behavior preservation, not just test pass") |
+| `posture` | `pre-ship` / `session-review` / `security-focused` / `exploratory` — adjusts agent emphasis |
+### Posture → Agent Weight Adjustment
+- `pre-ship` → boost security + integration agents, require Phase 2.8 adversary pass
+- `session-review` → balanced across all agents
+- `security-focused` → security agent mandatory, injection/authn checks emphasized
+- `exploratory` → logic + architecture agents; adversary pass OPTIONAL (via `config.review.framingPass.adversaryInExploratory`)
+### Item Reconciliation (Anti-Deferral Guard)
+When the user's request enumerated multiple focus areas ("review X, Y, Z"), each named item MUST appear in `scopeIn`. If the count shrank (user named 5, framing has 3), the framing pass FAILS — display which items were dropped and require the user to confirm before proceeding.
+Ported from `/wogi-start`'s anti-deferral rule. The AI cannot silently drop items.
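As a rough sketch, the reconciliation check amounts to a set difference between the items the user named and the framing artifact's `scopeIn`. The `reconcileItems` name and the case-insensitive exact matching are assumptions made for illustration; in practice the matching is done by the AI and is likely fuzzier.

```javascript
// Hypothetical item-reconciliation check: every item named in the request
// must appear in scopeIn, otherwise the framing pass fails.
function reconcileItems(requestedItems, scopeIn) {
  const normalize = (s) => s.trim().toLowerCase();
  const inScope = new Set(scopeIn.map(normalize));

  // Items the user named that the framing artifact silently dropped
  const dropped = requestedItems.filter(
    (item) => !inScope.has(normalize(item))
  );

  return { passed: dropped.length === 0, dropped };
}
```

A failing result carries the `dropped` list, which is what would be shown to the user before asking for confirmation.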
+---
+## Phase 2: Evidence Tiers — Reference Detail
+Every finding returned by any review agent MUST carry two additional fields: `evidenceTier` (0–4) and `evidenceNote` (one-line string citing what produced the evidence).
+### Tier Definitions
+| Tier | Name | What it means for findings |
+|------|------|----------------------------|
+| 0 | STATIC | AI inferred from the source alone — no grep, no execution. Weakest. |
+| 1 | STRUCTURAL | AI grepped / globbed / counted instances across the codebase. |
+| 2 | OBSERVATIONAL | AI ran a tool (lint, typecheck, npm audit) and read its output. |
+| 3 | INTERACTIVE | AI executed code or tests and observed the behavior. |
+| 4 | AUTOMATED | A quality gate or test suite produces this finding deterministically on every run. |
+### Severity Cap Rules
+- **Tier 0** findings: severity MUST be LOW (and flagged UNVERIFIED in the report).
+- **Tier 1** findings: severity capped at MEDIUM unless grep returned ≥5 instances.
+- **Tier 2+** findings: severity stands as the agent assigned.
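As a rough sketch, the three cap rules could be applied like this. The severity scale, the `grepInstances` field, and the `unverified` flag are assumptions made for illustration. The sketch also assumes that a Tier 1 finding with 5 or more grep instances keeps the severity the agent assigned, which the rule implies but does not state.

```javascript
// Assumed severity scale, lowest to highest.
const SEVERITY_ORDER = ['LOW', 'MEDIUM', 'HIGH', 'CRITICAL'];

// Hypothetical helper: `grepInstances` and `unverified` are illustrative
// field names, not documented wogiflow fields.
function capSeverity(finding) {
  const { evidenceTier, severity, grepInstances = 0 } = finding;

  let cap = 'CRITICAL'; // Tier 2+: severity stands as assigned
  if (evidenceTier === 0) {
    cap = 'LOW'; // Tier 0: must be LOW
  } else if (evidenceTier === 1 && grepInstances < 5) {
    cap = 'MEDIUM'; // Tier 1: capped unless grep found >= 5 instances
  }

  const overCap =
    SEVERITY_ORDER.indexOf(severity) > SEVERITY_ORDER.indexOf(cap);

  return {
    ...finding,
    severity: overCap ? cap : severity,
    unverified: evidenceTier === 0, // Tier 0 is flagged UNVERIFIED
  };
}
```

For example, `capSeverity({ evidenceTier: 0, severity: 'CRITICAL' })` yields a LOW finding flagged as unverified, which is how the F1 incident described below would have been handled.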
+### Why Tiers Matter (Real Incident)
+During the v2.17.3 self-review (session 2026-04-15), a `code-reviewer` agent reported an F1 finding as "Critical — broken require path" without citing evidence. Manual verification via `require.resolve()` showed the path was correct — the agent's path math was flawed.
+With tier enforcement, F1 would have been Tier 0 (no grep, no execution), capped at LOW, and flagged UNVERIFIED — alerting the reader to verify before acting. The evidence-tier requirement is the single most powerful rubber-stamp-prevention mechanism in the IGR toolkit.
+---
+## Phase 2.8: Findings Adversary Critique — Reference Detail
+This is the review analogue of the `/wogi-audit` Adversary Pass (Step 3.5) and the IGR Logic Adversary (wf-3975a001). Same pattern: different model, separate context, looking for specific defect classes.
+### Adversary Model Selection Rule (CRITICAL)
+The `adversaryPass.adversaryModel` is a mapping, NOT a static string. The AI resolves it at runtime by inspecting which model the review agents used.
+```json
+"adversaryModel": {
+  "whenAgentOnSonnet": "opus",
+  "whenAgentOnOpus": "sonnet",
+  "whenAgentOnHaiku": "sonnet",
+  "default": "sonnet"
+}
+```
+**Override-always rule**: if the resolved value equals the agent model (e.g., legacy plain-string config set to `sonnet` when agents ran on Sonnet), pick a different model instead. Same-model adversary = rubber-stamp, which defeats the entire purpose of the adversary pass.
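One way to picture the resolution logic, including the override-always rule, is the sketch below. Since the document says the AI resolves the mapping at runtime, this is illustration rather than shipped code: `resolveAdversaryModel` is a hypothetical name, and the fallback choice on a same-model collision (Opus for Sonnet agents, Sonnet otherwise) is an assumption.

```javascript
// Hypothetical resolver: not a wogiflow function. Model names are the
// lowercase strings used in the config mapping above.
function resolveAdversaryModel(adversaryModel, agentModel) {
  const keyByAgent = {
    sonnet: 'whenAgentOnSonnet',
    opus: 'whenAgentOnOpus',
    haiku: 'whenAgentOnHaiku',
  };

  let resolved;
  if (typeof adversaryModel === 'string') {
    // Legacy plain-string config.
    resolved = adversaryModel;
  } else {
    // Unknown agent models fall through to `default`.
    resolved = adversaryModel[keyByAgent[agentModel]] ?? adversaryModel.default;
  }

  // Override-always rule: never let the adversary share the agent's model.
  // The specific replacement chosen here is an assumption.
  if (resolved === agentModel) {
    resolved = agentModel === 'sonnet' ? 'opus' : 'sonnet';
  }
  return resolved;
}
```

With the mapping above, `resolveAdversaryModel(mapping, 'sonnet')` returns `'opus'`, and a legacy config of `'sonnet'` with Sonnet agents is also redirected to `'opus'`.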
+### Specific Defect Classes to Hunt
+The adversary prompt includes HUNT instructions for these patterns:
+1. Findings where `evidenceTier=0` but severity ≥ HIGH
+2. Findings that cite line numbers without quoting the surrounding code
+3. "Broken require path" / "missing import" / "wrong type" claims without `require.resolve` / `tsc` / `grep` verification
+4. Findings that contradict the framing's `scopeIn` / `scopeOut` declarations
+### Applied Adjustments
+The orchestrator applies the adversary's recommendations automatically:
+- `severityAdjustments` rewrite findings' severity in the consolidated report (mark `[ADVERSARY-ADJUSTED]`)
+- `scopeDrift` moves findings out of the main report into an "Out-of-Scope Findings" appendix (not dropped — user still sees them)
+- `falsePositives` get marked `[DISPUTED]` in the report body (not removed — user sees both the finding AND the dispute)
+- `missedIssues` get appended as new Tier-0 findings labeled `[ADVERSARY-FOUND]`
+- `evidenceChallenges` downgrade the `evidenceTier` on challenged findings and re-apply the severity cap
+### Verdict Semantics
+- `ACCEPT` — no adjustments needed; findings are well-grounded
+- `ACCEPT_WITH_ADJUSTMENTS` — severity caps/scope fixes applied, report still ships
+- `REVISE_SCOPE` — framing was wrong; reviewer should restart with corrected scope
+- `BLOCK` — adversary found a critical false positive or missed issue that makes the report untrustworthy; user must acknowledge before Phase 3 proceeds
+**One pass only**: there is no iteration loop. If the adversary returns `BLOCK`, the user makes the call, and any re-review runs with adjusted scope.
+### Archival
+Every adversary run is archived to `.workflow/state/adversary-runs/review-{timestamp}.json` — same directory as IGR + audit adversary runs. This feeds the `flow promote` pipeline: recurring review-adversary findings graduate to `feedback-patterns.md`.
+---
+## Phase 5: Completion Truth Gate — Reference Detail
+**Problem this solves**: A review's "fixed" claim is only as good as the evidence behind it. A finding marked `fixed` because the AI applied an edit is NOT the same as a finding verified to work. Without a truth gate, the sign-off rubber-stamps whatever the agent says.
+### Downgrade Rules
+For every finding now marked `status: fixed`:
+| Fix evidence tier | New status |
+|---|---|
+| Tier 3 (INTERACTIVE) | stays `fixed` |
+| Tier 4 (AUTOMATED quality gate) | stays `fixed` |
+| Tier 2 (OBSERVATIONAL — lint/typecheck pass only) | downgraded to `fixed-unverified` |
+| Tier ≤ 1 (STATIC / STRUCTURAL) | downgraded to `implemented-unverified` |
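The downgrade table maps directly onto a small function. The sketch below assumes each finding carries a numeric `fixEvidenceTier` field. That name is hypothetical; the document does not say where the tier of the fix evidence is stored.

```javascript
// Hypothetical helper: `fixEvidenceTier` is an assumed field name.
function applyTruthGate(finding) {
  // Only findings currently claimed as fixed are re-evaluated.
  if (finding.status !== 'fixed') return finding;

  const tier = finding.fixEvidenceTier;
  let status;
  if (tier >= 3) {
    status = 'fixed'; // INTERACTIVE or AUTOMATED evidence
  } else if (tier === 2) {
    status = 'fixed-unverified'; // lint/typecheck pass only
  } else {
    status = 'implemented-unverified'; // STATIC or STRUCTURAL
  }
  return { ...finding, status };
}
```

A missing `fixEvidenceTier` falls into the last branch here, so a fix with no recorded evidence is treated as `implemented-unverified`. That is a conservative design choice in this sketch, not something the document specifies.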
+### Persistence
+Downgraded statuses are persisted to `last-review.json` — NOT silently dropped back to `fixed`. The user should consciously accept unverified fixes, not have them hidden.
+### Self-Incident (v2.17.4)
+In v2.17.4 I claimed to "fix all review findings." The truth gate (applied manually) caught:
+- F1, F2, F3 — fixed with Tier 2+ evidence (OK)
+- F4 — doc update only, Tier 0, should have been `implemented-unverified`
+- M1 — deferred, but the release notes said "fix all" — promise/delivery mismatch
+- M3 — dropped entirely, never mentioned in the commit
+User correction: "You're not supposed to defer any fixes. It's up to the user to defer, not you." → Anti-Deferral Guard added to feedback-patterns + decisions.
+---
+## Config Enforcement Model — Reference Detail
+All `config.review.*` toggles are AI-honored, not runtime-enforced. No JavaScript reads `config.review.framingPass`, `config.review.evidenceTiers`, `config.review.adversaryPass`, or `config.review.completionTruthGate`.
+The AI executing `/wogi-review` is responsible for reading these keys via `getConfig()` and honoring them. This matches `/wogi-audit`'s docs-driven model.
+**Practical implication**: a user who sets `review.adversaryPass.enabled: false` will have the pass skipped ONLY if the AI respects the config. As a reviewer, always load config first and print the toggle states before launching phases.
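Printing the toggle states before launching phases might look like the sketch below. It takes an already-loaded config object and does not call `getConfig()` itself, since that function's signature is not shown here. `summarizeReviewToggles` is a hypothetical name, and reporting a missing key as `unset` is an assumption.

```javascript
// Hypothetical helper: reports the `enabled` flag of each review phase
// named in this document. A missing key is reported as "unset" rather
// than guessed.
function summarizeReviewToggles(config) {
  const review = (config && config.review) || {};
  const phases = [
    'framingPass',
    'evidenceTiers',
    'adversaryPass',
    'completionTruthGate',
  ];

  return phases.map((name) => {
    const enabled = review[name] ? review[name].enabled : undefined;
    const state = enabled === true ? 'on' : enabled === false ? 'off' : 'unset';
    return `review.${name}.enabled: ${state}`;
  });
}

// Example: one phase disabled, one enabled, the rest absent.
summarizeReviewToggles({
  review: { adversaryPass: { enabled: false }, framingPass: { enabled: true } },
}).forEach((line) => console.log(line));
```

Surfacing the states up front makes a skipped pass visible to the user, which matters because nothing at runtime enforces these keys.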
+---
+## Config Reference (all IGR keys)
+```json
+{
+  "review": {
+    "minFindings": 3,
+    "requireJustificationIfClean": true,
+    "framingPass": {
+      "enabled": true,
+      "itemReconciliation": true,
+      "adversaryInExploratory": false
+    },
+    "evidenceTiers": {
+      "enabled": true,
+      "capByTier": true
+    },
+    "adversaryPass": {
+      "enabled": true,
+      "adversaryModel": {
+        "whenAgentOnSonnet": "opus",
+        "whenAgentOnOpus": "sonnet",
+        "whenAgentOnHaiku": "sonnet",
+        "default": "sonnet"
+      },
+      "applySeverityAdjustments": true,
+      "applyScopeDrift": true,
+      "blockOnBlockVerdict": true,
+      "archiveRuns": true
+    },
+    "completionTruthGate": {
+      "enabled": true,
+      "requireInteractiveForFixed": true
+    }
+  }
+}
+```

package/.workflow/agents/logic-adversary.md CHANGED

@@ -35,6 +35,14 @@ Patterns that produce logic failures in practice — seen in real agent session
 **P11.3 — Also check for EXISTING WOGIFLOW FEATURES that touch the same domain.** Before shipping any new mechanism (hook, wrapper, CLI entry, state file, config key, skill), enumerate the sibling surface: (S1) `grep -r "execSync\|spawn.*claude" lib/ scripts/`, check `.claude/commands/`, check `scripts/flow-constants.js`, check `lib/workspace.js` — does an existing feature already touch this domain? (S2) Show how the new mechanism composes, conflicts, or integrates with each sibling. "Orthogonal" is OK but must be asserted. (S3) If integration work is needed (e.g., the new wrapper needs to be injected into workspace's `execSync('claude')` call), include it in scope OR explicitly file a follow-up story. Silent omission of sibling integration = FAIL. Example violation caught live: `wogi-claude` wrapper initially missed that `lib/workspace.js:1612` spawns claude directly, so workspace-mode workers weren't restart-capable.
+**P11.4 — Generative edge-case taxonomy (5 buckets, always run for any new mechanism).** Go through EACH bucket and demand a sentence of acknowledgment — "addressed by X", "N/A because Y", or "accepted limitation Z documented in spec". Blank buckets = FAIL. The buckets: **B1 Interleaving/concurrency** (TOCTOU, two instances at once, hook-in-hook races), **B2 Partial failure** (step 1 ok, step 2 fails — is the half-done state acceptable?), **B3 Boundary counts** (0x, 1x, 1000x — accumulation, caps, restart-storm), **B4 Execution-environment portability** (which OS/browser/runtime/deploy-target variants does this run in? Which are in scope vs explicitly unsupported?), **B5 Silent-failure observability** (if it breaks silently, will anyone notice? — is there a health-check surface, log line, or telemetry event?). Distinct from P11.1-P11.3 because it's GENERATIVE (force-enumeration) not REACTIVE (critique what the plan says).
+**Stack-agnostic**: the buckets are universal. Substitute examples with whatever your plan's stack uses. "Windows / non-bash / filesystem" applies to CLI tools; for web frontends B4 means "Safari vs Chrome vs mobile WebView, SSR vs CSR, accessibility modes"; for mobile apps it means "iOS / Android versions, phone vs tablet"; for backends it means "Node/Python/Go runtime spread, containerized vs bare-metal, serverless cold-start". See rubric P11.4 for the full stack-specific mapping.
+Cost: ~50-100 words added per plan. Value: catches architectural gaps at plan-time that otherwise surface as post-ship fires.
+Reflex (stack-agnostic): *"for this mechanism — can 2 run at once? can a step half-fail? what happens at 0x and 1000x? which execution-environment variants does this run on? if this breaks silently, what surfaces it?"*
 ### What you are NOT looking for
 - Code style, lint, naming — other gates handle these.