wogiflow 2.16.0 → 2.17.5

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (112)
  1. package/.claude/commands/wogi-audit.md +212 -17
  2. package/.claude/commands/wogi-research.md +37 -0
  3. package/.claude/commands/wogi-review.md +200 -22
  4. package/.claude/commands/wogi-start.md +45 -0
  5. package/.claude/docs/intent-grounded-review.md +209 -0
  6. package/.workflow/agents/logic-adversary.md +8 -0
  7. package/.workflow/templates/claude-md.hbs +18 -0
  8. package/lib/installer.js +1 -0
  9. package/lib/utils.js +29 -3
  10. package/lib/workspace-changelog.js +2 -1
  11. package/lib/workspace-channel-server.js +4 -6
  12. package/lib/workspace-contracts.js +5 -4
  13. package/lib/workspace-events.js +8 -7
  14. package/lib/workspace-gates.js +4 -3
  15. package/lib/workspace-integration-tests.js +2 -1
  16. package/lib/workspace-intelligence.js +3 -2
  17. package/lib/workspace-locks.js +2 -1
  18. package/lib/workspace-messages.js +7 -6
  19. package/lib/workspace-routing.js +14 -26
  20. package/lib/workspace-session.js +7 -6
  21. package/lib/workspace-sync.js +9 -8
  22. package/lib/workspace.js +45 -2
  23. package/package.json +4 -2
  24. package/scripts/base-workflow-step.js +1 -1
  25. package/scripts/flow +22 -0
  26. package/scripts/flow-adaptive-learning.js +1 -1
  27. package/scripts/flow-aggregate.js +2 -1
  28. package/scripts/flow-architect-pass.js +3 -3
  29. package/scripts/flow-archive-runs.js +372 -0
  30. package/scripts/flow-ask.js +121 -0
  31. package/scripts/flow-ast-grep.js +216 -0
  32. package/scripts/flow-audit-gates.js +1 -1
  33. package/scripts/flow-auto-learn.js +8 -11
  34. package/scripts/flow-bug.js +2 -2
  35. package/scripts/flow-capture-gate.js +644 -0
  36. package/scripts/flow-capture.js +4 -3
  37. package/scripts/flow-cli-flags.js +95 -0
  38. package/scripts/flow-community-sync.js +2 -1
  39. package/scripts/flow-community.js +6 -6
  40. package/scripts/flow-conclusion-classifier.js +310 -0
  41. package/scripts/flow-config-defaults.js +13 -3
  42. package/scripts/flow-constants.js +11 -12
  43. package/scripts/flow-context-scoring.js +1 -0
  44. package/scripts/flow-correction-detector.js +344 -3
  45. package/scripts/flow-damage-control.js +1 -1
  46. package/scripts/flow-decisions-merge.js +1 -0
  47. package/scripts/flow-done-gates.js +20 -0
  48. package/scripts/flow-done-report.js +2 -2
  49. package/scripts/flow-done.js +4 -4
  50. package/scripts/flow-epics.js +5 -11
  51. package/scripts/flow-id.js +92 -0
  52. package/scripts/flow-io.js +15 -5
  53. package/scripts/flow-knowledge-router.js +2 -1
  54. package/scripts/flow-links.js +1 -1
  55. package/scripts/flow-log-manager.js +2 -1
  56. package/scripts/flow-logic-adversary.js +4 -4
  57. package/scripts/flow-long-input-cli.js +6 -0
  58. package/scripts/flow-long-input-stories.js +1 -1
  59. package/scripts/flow-loop-retry-learning.js +1 -1
  60. package/scripts/flow-mcp-capabilities.js +2 -3
  61. package/scripts/flow-mcp-docs.js +2 -1
  62. package/scripts/flow-memory-blocks.js +2 -1
  63. package/scripts/flow-memory-sync.js +1 -1
  64. package/scripts/flow-memory.js +767 -0
  65. package/scripts/flow-migrate-igr.js +1 -1
  66. package/scripts/flow-migrate.js +2 -1
  67. package/scripts/flow-model-adapter.js +1 -1
  68. package/scripts/flow-model-config.js +5 -1
  69. package/scripts/flow-model-profile.js +2 -1
  70. package/scripts/flow-orchestrate.js +3 -3
  71. package/scripts/flow-output.js +29 -0
  72. package/scripts/flow-parallel.js +10 -9
  73. package/scripts/flow-pattern-enforcer.js +2 -1
  74. package/scripts/flow-permissions-audit.js +124 -0
  75. package/scripts/flow-plugin-registry.js +2 -2
  76. package/scripts/flow-progress.js +5 -1
  77. package/scripts/flow-project-analyzer.js +1 -1
  78. package/scripts/flow-promote.js +510 -0
  79. package/scripts/flow-registries.js +86 -0
  80. package/scripts/flow-request-log.js +133 -0
  81. package/scripts/flow-research-protocol.js +0 -1
  82. package/scripts/flow-revision-tracker.js +2 -1
  83. package/scripts/flow-roadmap.js +2 -1
  84. package/scripts/flow-rules-sync.js +3 -7
  85. package/scripts/flow-session-end.js +3 -1
  86. package/scripts/flow-session-learning.js +6 -13
  87. package/scripts/flow-session-state.js +2 -2
  88. package/scripts/flow-setup-hooks.js +2 -1
  89. package/scripts/flow-skill-create.js +1 -1
  90. package/scripts/flow-skill-freshness.js +6 -7
  91. package/scripts/flow-skill-learn.js +1 -1
  92. package/scripts/flow-step-coverage.js +1 -1
  93. package/scripts/flow-step-security.js +1 -1
  94. package/scripts/flow-story.js +58 -10
  95. package/scripts/flow-sys.js +204 -0
  96. package/scripts/flow-task-hierarchy.js +88 -0
  97. package/scripts/flow-tech-debt.js +2 -1
  98. package/scripts/flow-test-api.js +1 -1
  99. package/scripts/flow-utils.js +60 -890
  100. package/scripts/hooks/core/bugfix-scope-gate.js +5 -4
  101. package/scripts/hooks/core/deploy-gate.js +1 -1
  102. package/scripts/hooks/core/pre-tool-helpers.js +72 -0
  103. package/scripts/hooks/core/pre-tool-orchestrator.js +442 -0
  104. package/scripts/hooks/core/routing-gate.js +8 -0
  105. package/scripts/hooks/core/session-context.js +35 -0
  106. package/scripts/hooks/core/session-end.js +28 -0
  107. package/scripts/hooks/core/task-boundary-reset.js +10 -0
  108. package/scripts/hooks/entry/claude-code/pre-tool-use.js +48 -492
  109. package/scripts/hooks/entry/claude-code/user-prompt-submit.js +12 -0
  110. package/scripts/hooks/entry/shared/hook-runner.js +1 -1
  111. package/scripts/registries/schema-registry.js +1 -1
  112. package/scripts/registries/service-registry.js +1 -1
@@ -41,15 +41,19 @@ node node_modules/wogiflow/scripts/flow-progress-tracker.js update '{"taskId":"a
  ```
 
  **Phase mapping for /wogi-audit:**
- | Phase | phaseNum | Description |
- |-------|----------|-------------|
- | 1 | Gather Files | Scan project files |
- | 1.5 | Gate 0 | Pre-agent baseline checks (build, typecheck, lint, config integrity) |
- | 2 | Agents | 7 parallel agents (sub-steps = agents) |
- | 3 | Consolidate | Score calculation + Gate 0 cap |
- | 4 | Pattern Promotion | AI clustering + cross-reference + gaps |
- | 5 | Report | Display formatted report with Gate 0 baseline |
- | 6 | Persist | Save to last-audit.json (includes Gate 0 data + trend) |
+ | Step | Name | Description |
+ |------|------|-------------|
+ | 0 | Framing | Interpret scope, surface assumptions, item reconciliation |
+ | 1 | Gather Files | Scan project files |
+ | 1.5 | Gate 0 | Pre-agent baseline checks (build, typecheck, lint, config integrity) |
+ | 1.8 | Evidence Tiers | Brief agents on required evidence grading (0–4) |
+ | 2 | Agents | 7 parallel agents (sub-steps = agents) |
+ | 3 | Consolidate | Score calculation + Gate 0 cap |
+ | 3.5 | Adversary | Different-model critique of findings (false positives, missed issues, severity) |
+ | 4 | Pattern Promotion | AI clustering + cross-reference + enforcement-gap detection |
+ | 5 | Display Report | Formatted report with Gate 0 baseline + adversary block + promotions |
+ | 6 | Post-Audit Actions | User chooses follow-up (create tasks, apply promotions, etc.) |
+ | 7 | Persist | Save to last-audit.json (includes Gate 0 data + adversary run + framing + trend) |
 
  **Display at each agent completion:**
  ```
@@ -61,6 +65,43 @@ On audit completion, clear progress: `node node_modules/wogiflow/scripts/flow-pr
 
  ## How It Works
 
+ ### Step 0: Framing Pass (MANDATORY when `config.audit.framingPass.enabled`, default ON)
+
+ **Problem this solves**: "Audit" means different things in different invocations. "Audit what we did this epic" is bounded to ~20 files; "audit the project" is bounded to the whole repo; "audit our auth flow" is bounded to a module. Without explicit framing, the AI picks its own scope and the user may get an answer to a different question than the one they asked.
+
+ **This is NOT a clarifying-questions step** (no user round-trip). It's a self-reflective interpretation: the AI writes down what it thinks the user asked, what scope bounds that implies, and what's explicitly out of scope — BEFORE launching any agents. The user sees the framing before agents run and can correct it.
+
+ **Procedure**:
+ 1. Interpret the user's audit request into a **Framing Artifact** with 5 fields:
+    - `interpretation` — one sentence: "I understand this as: audit X for Y purpose"
+    - `scopeIn` — explicit list: which files / directories / epics / time windows are in scope
+    - `scopeOut` — explicit list: what this audit will NOT cover (out of scope by design, not by omission)
+    - `assumptions` — 2–5 domain assumptions the audit rests on (e.g., "an audit must verify test coverage" or "the epic-episodic-memory stories were shipped in the last 30 days")
+    - `dimensionWeights` — any adjustment to the 7-dimension balance based on the request (e.g., "user asked for token-saving validation → weight performance + tech-debt higher")
+
+ 2. Write the artifact to `.workflow/state/audit-framing/{timestamp}.md` (with PIN markers for future queryability).
+
+ 3. Display a short summary to the user:
+    ```
+    ━━━ AUDIT FRAMING ━━━
+    Interpretation: [one sentence]
+    Scope (in): [list]
+    Scope (out): [list]
+    Assumptions:
+    - [assumption 1]
+    - [assumption 2]
+
+    Dimension weights: [any adjustments from default]
+    Proceeding with 7-agent analysis on this scope.
+    ━━━━━━━━━━━━━━━━━━━━━━
+    ```
+
+ 4. **Item reconciliation** (when the user's request enumerated multiple focus areas, e.g., "audit X, Y, and Z"): each named item MUST appear in `scopeIn`. If the count shrank (user named 5, framing has 3), the framing pass FAILS — display which items were dropped and require the user to confirm before proceeding. This is the anti-deferral guard from `/wogi-start` ported to audit; see the sketch after this list.
+
+ 5. **Conversation-mode tier check** (shared with the Research Reasoning Gate): "What should we do about X?" in audit context → Tier 2 (surface assumptions). A plain audit request = Tier 1 factual.
+
+ Config toggles: `audit.framingPass.enabled` (default true), `audit.framingPass.itemReconciliation` (default true).
+
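+ A minimal sketch of the item-reconciliation check, under stated assumptions: `reconcileFramingItems` is a hypothetical helper (not a shipped wogiflow script), and the matching rule — case-insensitive substring containment against `scopeIn` entries — is illustrative:
+
+ ```js
+ // Hypothetical sketch — not part of the shipped scripts.
+ // Checks that every focus area the user named survives into the framing's scopeIn.
+ function reconcileFramingItems(requestedItems, framing) {
+   const dropped = requestedItems.filter(
+     (item) => !framing.scopeIn.some((entry) => entry.toLowerCase().includes(item.toLowerCase()))
+   );
+   // Anti-deferral guard: any dropped item fails the framing pass.
+   return { ok: dropped.length === 0, dropped };
+ }
+
+ // User asked to "audit auth, billing, and logging" but the framing kept only two:
+ const check = reconcileFramingItems(
+   ['auth', 'billing', 'logging'],
+   { scopeIn: ['src/auth/', 'src/billing/'], scopeOut: ['docs/'] }
+ );
+ // check => { ok: false, dropped: ['logging'] } → name the dropped items, require confirmation
+ ```
+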
  ### Step 1: Gather Project Files
 
  ```bash
@@ -131,12 +172,71 @@ Trend: typecheck errors 939 → 412 (-527) ↑
 
  The framework checks are appended to the existing agent prompts — they don't replace the universal checks.
 
+ ### Step 1.8: Finding Evidence Tiers (MANDATORY when `config.audit.evidenceTiers.enabled`, default ON)
+
+ **Problem this solves**: Today's audit findings say "[HIGH] Missing error handling in X" without telling the reader WHY the AI is confident. Readers end up rubber-stamping a "HIGH" that is actually speculative and dismissing a "LOW" that was verified by grep.
+
+ **Tier system** (shared with the IGR Completion Truth Gate — same constants from `flow-runtime-verification.js`):
+
+ | Tier | Name | What it means for audit findings |
+ |------|------|----------------------------------|
+ | 0 | STATIC | AI inferred from the source alone — no grep, no execution. Weakest. |
+ | 1 | STRUCTURAL | AI grepped / globbed / counted instances across the codebase. |
+ | 2 | OBSERVATIONAL | AI ran a tool (lint, typecheck, npm audit) and read its output. |
+ | 3 | INTERACTIVE | AI executed code or tests and observed the behavior. |
+ | 4 | AUTOMATED | A quality gate or test suite produces this finding deterministically on every run. |
+
+ **Agent instructions update** (applies to all 7 agents + new ones): every finding MUST carry an `evidenceTier` 0–4 and a one-line `evidenceNote` citing what produced the evidence (filename, tool name, test ID, command run). A finding at Tier 0 with severity HIGH is suspect and should be flagged in the Adversary pass.
+
+ **Severity/tier interaction rule** (see the sketch below):
+ - Tier ≥ 2 findings: severity stands as the agent assigned it.
+ - Tier 1 findings: severity capped at MEDIUM unless grep returned ≥5 instances.
+ - Tier 0 findings: severity capped at LOW and must be flagged "UNVERIFIED" in the report.
+
+ Config toggle: `audit.evidenceTiers.enabled` (default true).
+
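+ A minimal sketch of the capping rule, under stated assumptions: `capSeverity` and the `grepMatches` field are illustrative (the shipped tier constants live in `flow-runtime-verification.js`, whose actual API isn't reproduced here):
+
+ ```js
+ // Illustrative implementation of the severity/tier interaction rule — not shipped code.
+ const SEVERITY_RANK = { LOW: 0, MEDIUM: 1, HIGH: 2 };
+
+ function capSeverity({ evidenceTier, severity, grepMatches = 0 }) {
+   if (evidenceTier >= 2) return severity;            // Tier 2+: stands as assigned
+   if (evidenceTier === 1) {
+     if (grepMatches >= 5) return severity;           // enough instances: HIGH allowed
+     return SEVERITY_RANK[severity] > SEVERITY_RANK.MEDIUM ? 'MEDIUM' : severity;
+   }
+   return 'LOW';                                      // Tier 0: LOW + UNVERIFIED flag in report
+ }
+
+ capSeverity({ evidenceTier: 0, severity: 'HIGH' });                  // => 'LOW'
+ capSeverity({ evidenceTier: 1, severity: 'HIGH', grepMatches: 7 });  // => 'HIGH'
+ ```
+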
  ### Step 2: Launch 7 Parallel Agents
 
  Launch ALL enabled agents as parallel `Task` calls in a single message. Each agent uses `subagent_type=Explore` and `model="sonnet"` (per decisions.md: use Sonnet for routine exploration).
 
  **Agent configuration** is in `config.audit.agents` — skip any agent set to `false`.
 
+ **Shared agent preamble (prepend to every agent prompt when `config.audit.evidenceTiers.enabled`)**:
+
+ ```
+ IMPORTANT — EVIDENCE TIER REQUIREMENT (wogi-audit evidence tiers):
+
+ Every finding you return MUST carry two additional fields:
+
+ evidenceTier: integer 0–4
+   0 = STATIC — inferred from source alone (weakest)
+   1 = STRUCTURAL — grepped / globbed / counted instances
+   2 = OBSERVATIONAL — ran a tool (lint, typecheck, npm audit) and read output
+   3 = INTERACTIVE — executed code/tests and observed behavior
+   4 = AUTOMATED — deterministic check in a quality gate / test suite
+
+ evidenceNote: one-line string citing what produced the evidence
+   examples: "grep 'JSON\\.parse' returned 7 matches in src/api/"
+             "npm audit reports 3 high-severity CVEs in package X"
+             "Agent 1 file scan: 12 files over 300 LOC"
+
+ SEVERITY IS CAPPED BY TIER:
+ - Tier 0: severity MUST be LOW (and will be flagged UNVERIFIED in the report)
+ - Tier 1: severity capped at MEDIUM (unless grep returned >=5 instances, then HIGH allowed)
+ - Tier 2+: severity stands as you assign it
+
+ Return each finding in this shape:
+ {
+   "severity": "HIGH|MEDIUM|LOW",
+   "description": "...",
+   "files": ["..."],
+   "evidenceTier": 0|1|2|3|4,
+   "evidenceNote": "..."
+ }
+
+ Also respect the FRAMING ARTIFACT — only report findings within `scopeIn`. Findings in `scopeOut` will be removed by the orchestrator.
+ ```
+
  ---
 
  #### Agent 1: Architecture Analyzer
@@ -418,7 +518,81 @@ Final score = min(gate0_cap, weighted_agent_score - gate0_penalties)
  **3.4. Trend delta (if previous audit exists):**
  Compare current metrics with `last-audit.json`. Show improvement/regression arrows.
 
- ### Step 4: Display Report
+ ### Step 3.5: Adversary Critique Pass (MANDATORY when `config.audit.adversaryPass.enabled`, default ON)
+
+ **Problem this solves**: Agent findings are the single most important output of an audit, and they're also the most likely to contain false positives ("this is HIGH") and false negatives (missed real issues) when no one challenges them. Without an adversary, the audit report rubber-stamps whatever the agents produced.
+
+ **This is the audit analogue of the IGR Logic Adversary pass** (wf-3975a001). Same pattern: different model, separate context, looking for specific defect classes.
+
+ **Procedure**:
+ 1. Collect: the framing artifact + ALL agent findings (with evidence tiers) + the consolidated score.
+ 2. Launch ONE Agent sub-agent with `subagent_type=Explore` (READ-ONLY) and `model=<config.audit.adversaryPass.adversaryModel>` (default `opus` when the main audit ran on Sonnet; `sonnet` when the audit ran on Opus — it must be DIFFERENT from the agent model).
+ 3. Prompt structure:
+    ```
+    You are the Audit Adversary. Critique the audit report below.
+
+    FRAMING: [framing artifact]
+    FINDINGS: [all findings from 7+ agents, each with evidenceTier]
+    SCORE: [consolidated score + cap]
+
+    Your job — produce a JSON object with these fields:
+
+    {
+      "falsePositives": [
+        { "findingId": "...", "reason": "why this isn't actually HIGH/a real issue",
+          "evidenceContradicting": "file:line or command that refutes it" }
+      ],
+      "missedIssues": [
+        { "category": "<dimension>", "issue": "...", "whyMissed": "why the scan likely skipped it",
+          "evidenceFor": "file:line or pattern" }
+      ],
+      "severityAdjustments": [
+        { "findingId": "...", "from": "HIGH", "to": "MEDIUM",
+          "reason": "Tier 0 evidence cannot support HIGH" }
+      ],
+      "scopeDrift": [
+        { "findingId": "...", "reason": "out of declared scopeIn per framing" }
+      ],
+      "frameAssumptionChallenges": [
+        { "assumption": "...from framing", "challenge": "why it may not hold" }
+      ],
+      "overallVerdict": "ACCEPT | ACCEPT_WITH_ADJUSTMENTS | REVISE_SCORE | REVISE_SCOPE"
+    }
+
+    Ground every item in a file path, a line number, a grep pattern, a tool output, or a test ID.
+    Do NOT invent issues. "I think" / "might" / "could" are FORBIDDEN — require evidence.
+    ```
+ 4. Parse the adversary response. If parsing fails, log a warning and continue with unmodified findings.
+ 5. **Apply automatic adjustments** (see the sketch after this section):
+    - Each `severityAdjustments` item rewrites the finding's severity in the consolidated report (and marks it `[ADVERSARY-ADJUSTED]`).
+    - Each `scopeDrift` item moves the finding out of the main report into an "Out-of-Scope Findings" appendix (not dropped — the user still sees them).
+    - `falsePositives` get marked `[DISPUTED]` in the report body (not removed — the user sees both the finding and the dispute).
+    - `missedIssues` get appended as new Tier-0 findings labeled `[ADVERSARY-FOUND]` — the user can escalate them with follow-up.
+ 6. **Recompute the score** if `overallVerdict` is `REVISE_SCORE` (e.g., false-positive removal can lift a score by one tier).
+ 7. **Archive the adversary run** to `.workflow/state/adversary-runs/audit-{timestamp}.json` — the same directory as IGR adversary runs. This feeds the `flow promote` promotion pipeline (wf-6a352aae): recurring audit-adversary findings graduate to feedback-patterns.md.
+ 8. **Display a summary block** in the final report:
+    ```
+    ━━━ ADVERSARY CRITIQUE (different model) ━━━
+    Verdict: [ACCEPT | ACCEPT_WITH_ADJUSTMENTS | REVISE_SCORE | REVISE_SCOPE]
+    False positives: N (marked [DISPUTED] in findings)
+    Severity adjustments: N (marked [ADVERSARY-ADJUSTED])
+    Missed issues found: N (appended as [ADVERSARY-FOUND] Tier-0 findings)
+    Scope drift: N (moved to Out-of-Scope appendix)
+
+    [For each item, show one line with the finding ID + reason]
+    ```
+
+ **One pass only** — no iteration loop. This is analysis, not implementation. If the adversary finds a serious issue, the user calls it out and we re-audit with adjusted scope.
+
+ Config toggles: `audit.adversaryPass.enabled` (default true), `audit.adversaryPass.adversaryModel` (default: the opposite of the agent model, per step 2 above), `audit.adversaryPass.applySeverityAdjustments` (default true), `audit.adversaryPass.applyScopeDrift` (default true).
+
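+ A minimal sketch of step 5, under stated assumptions: `applyAdversaryAdjustments`, the `id`/`tags` fields on findings, and the return shape are all illustrative — only the tag names and the appendix behavior come from the procedure above:
+
+ ```js
+ // Illustrative sketch — not the shipped orchestrator code.
+ function applyAdversaryAdjustments(findings, critique) {
+   const { severityAdjustments = [], falsePositives = [],
+           scopeDrift = [], missedIssues = [] } = critique;
+   const byId = new Map(findings.map((f) => [f.id, f]));
+   const tag = (f, t) => { f.tags = [...(f.tags || []), t]; };
+
+   for (const adj of severityAdjustments) {
+     const f = byId.get(adj.findingId);
+     if (f) { f.severity = adj.to; tag(f, 'ADVERSARY-ADJUSTED'); }
+   }
+   for (const fp of falsePositives) {
+     const f = byId.get(fp.findingId);
+     if (f) tag(f, 'DISPUTED');                       // kept in the body, marked disputed
+   }
+   const drifted = new Set(scopeDrift.map((d) => d.findingId));
+   const appendix = findings.filter((f) => drifted.has(f.id));  // moved, never dropped
+   const main = findings.filter((f) => !drifted.has(f.id));
+
+   for (const m of missedIssues) {                    // enter as unverified Tier-0 findings
+     main.push({ severity: 'LOW', description: m.issue, evidenceTier: 0,
+                 evidenceNote: m.evidenceFor, tags: ['ADVERSARY-FOUND'] });
+   }
+   return { main, appendix };
+ }
+ ```
+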
+ ### Step 4: Pattern Promotion Analysis (MANDATORY)
+
+ _Moved from the former "Step 4.5" — pattern promotion must run BEFORE Display Report so the report includes promotion outcomes. The phase table (L43-56) now matches the step numbering. The Adversary caught this mismatch on 2026-04-15 during the audit of epic-episodic-memory; see `.workflow/state/adversary-runs/audit-2026-04-15-epic-episodic-memory.json`._
+
+ After the adversary pass consolidates findings, run pattern promotion BEFORE displaying the final report. This ensures promotion outcomes (enforcement gaps, newly promoted rules, recurring patterns) are visible in the report itself. This step has 3 phases.
+
+ ### Step 5: Display Report
 
  ```
  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
@@ -474,10 +648,6 @@ Top 5 Quick Wins (highest impact, lowest effort):
  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  ```
 
- ### Step 4.5: Pattern Promotion Analysis (MANDATORY)
-
- After displaying the report, run pattern promotion analysis **before** offering post-audit actions. This step has 3 phases.
-
 
  #### Phase 1: AI Semantic Clustering
  Launch a single Agent (`subagent_type=Explore`, `model="sonnet"`) with ALL findings from the 7 audit agents:
@@ -604,7 +774,7 @@ Display investigation results:
  - Add patterns to standards gate for programmatic enforcement
  ```
 
- ### Step 5: Post-Audit Actions
+ ### Step 6: Post-Audit Actions
 
  After displaying the report and promotion summary, offer these options using AskUserQuestion:
 
@@ -615,9 +785,34 @@ After displaying the report and promotion summary, offer these options using Ask
  5. **Investigate enforcement gaps** — Run Phase 3 investigation for all `ENFORCEMENT_GAP` patterns
  6. **Apply all promotions** — Batch-confirm all auto-promoted rules (already written by Phase 2)
 
- ### Step 6: Persist Report
+ ### Step 7: Persist Report
+
+ Regardless of user choice, always save the audit results to `.workflow/state/last-audit.json`. Include the new framing + adversary sections when those passes ran:
+
+ ```json
+ {
+   "framing": {
+     "interpretation": "...",
+     "scopeIn": [...],
+     "scopeOut": [...],
+     "assumptions": [...],
+     "dimensionWeights": {...},
+     "artifactPath": ".workflow/state/audit-framing/<timestamp>.md"
+   },
+   "adversary": {
+     "ran": true,
+     "overallVerdict": "ACCEPT_WITH_ADJUSTMENTS",
+     "falsePositives": N,
+     "severityAdjustments": N,
+     "missedIssues": N,
+     "scopeDrift": N,
+     "archivePath": ".workflow/state/adversary-runs/audit-<timestamp>.json"
+   },
+   ...
+ }
+ ```
814
 
620
- Regardless of user choice, always save the audit results to `.workflow/state/last-audit.json`:
815
+ Full persisted shape:
621
816
 
622
817
  ```json
623
818
  {
@@ -339,6 +339,43 @@ Before presenting ANY research report, verify ALL of these are present. If any i
 
  If the report is missing any required section, DO NOT present it — add the missing section first.
 
+ ## Research Reasoning Gate (wf-6dbc0b2a)
+
+ When `config.researchReasoningGate.enabled` (default: true), classify the research question into a tier by **structural markers**, NOT by your own judgment. When ambiguous, default to Tier 2. A classification sketch follows the table below.
+
+ | Tier | Markers | Behavior |
+ |------|---------|----------|
+ | 1 — Factual | "what is", "how many", "show me", "list all", "which file", "where does" | Run the zero-trust research protocol and answer. No assumption gate. |
+ | 2 — Domain (default for ambiguous) | "what should", "how should", "recommend", "which approach", "what do you think about", "is it better to" | **Before analyzing**, surface the domain-model assumptions your recommendation will depend on. WAIT for user confirmation. |
+ | 3 — Architecture | "should we restructure", "what's the right architecture", "design a schema", "how to migrate", "should we split / merge / replace" | Tier 2 flow + after producing the recommendation, spawn an Agent on a DIFFERENT model (config `researchReasoningGate.tier3.adversaryModel`, default `sonnet`) to critique it. Show both perspectives. |
+
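+ A minimal sketch of marker-based classification, under stated assumptions: the marker lists are copied from the table, but `classifyResearchQuestion` itself is illustrative, not a shipped script. Higher tiers are checked first so an architecture marker wins over a generic domain marker:
+
+ ```js
+ // Illustrative tier classifier — structural markers only, no model judgment.
+ const TIER_MARKERS = [
+   { tier: 3, markers: ['should we restructure', "what's the right architecture",
+                        'design a schema', 'how to migrate', 'should we split',
+                        'should we merge', 'should we replace'] },
+   { tier: 2, markers: ['what should', 'how should', 'recommend', 'which approach',
+                        'what do you think about', 'is it better to'] },
+   { tier: 1, markers: ['what is', 'how many', 'show me', 'list all',
+                        'which file', 'where does'] },
+ ];
+
+ function classifyResearchQuestion(question) {
+   const q = question.toLowerCase();
+   for (const { tier, markers } of TIER_MARKERS) {
+     if (markers.some((m) => q.includes(m))) return tier;
+   }
+   return 2; // ambiguous → default to Tier 2
+ }
+
+ classifyResearchQuestion('Which file registers the hooks?');        // => 1
+ classifyResearchQuestion('Should we split the workspace module?');  // => 3
+ ```
+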
+ **Tier 2 assumption-surfacing format** (BEFORE any analysis):
+ ```
+ ━━━ ASSUMPTIONS (confirm before I analyze) ━━━
+ My analysis will depend on these domain-model assumptions:
+ 1. <assumption 1>
+ 2. <assumption 2>
+ 3. <assumption 3>
+
+ Do these match your understanding? [confirm / correct]
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+ ```
+
+ Do NOT produce the research report while waiting. When the user confirms or corrects, ground the report in the user's domain model — not your original guess.
+
+ **Tier 3 adversary-critique format** (AFTER recommendation):
+ ```
+ ━━━ RECOMMENDATION ━━━
+ <research report>
+
+ ━━━ ADVERSARY CRITIQUE (reviewed by a different model) ━━━
+ <sub-agent output — 1-3 specific concerns with citations>
+ ```
+
+ **Why this is here** (and not left to AI self-reflection): same-model self-critique is a known rubber stamp. The USER is the effective adversary at Tier 2 — surfacing assumptions lets them validate the domain model before you build recommendations on invisible guesses. At Tier 3, a different-model agent catches failures of reasoning the original model cannot see.
+
+ Tier toggles: `researchReasoningGate.tier2.enabled` / `researchReasoningGate.tier3.enabled` — independent. Both default ON.
+
  ## CLI Compatibility
 
  This command currently supports Claude Code only.