wogiflow 2.24.0 → 2.25.1
- package/.claude/commands/wogi-debug-hypothesis.md +1 -1
- package/.claude/commands/wogi-decide.md +40 -0
- package/.claude/commands/wogi-init.md +29 -0
- package/.claude/commands/wogi-learn.md +46 -0
- package/.claude/commands/wogi-onboard.md +26 -0
- package/.claude/commands/wogi-peer-review.md +5 -3
- package/.claude/commands/wogi-triage.md +57 -0
- package/package.json +2 -2
- package/scripts/flow-completion-truth-gate.js +130 -0
- package/scripts/flow-config-defaults.js +25 -0
- package/scripts/flow-extraction-review.js +18 -3
- package/scripts/flow-morning.js +12 -3
- package/scripts/flow-session-end.js +12 -3
package/.claude/commands/wogi-debug-hypothesis.md
CHANGED
|
@@ -176,7 +176,7 @@ After all agents complete, display the consolidated results:
|
|
|
176
176
|
|
|
177
177
|
### Step 4: Hypothesis Adversary (v2.23.0+ — MANDATORY unless `--no-adversary`)
|
|
178
178
|
|
|
179
|
-
After consolidation, spawn a single Agent (
|
|
179
|
+
After consolidation, spawn a single Agent on a DIFFERENT model (default `sonnet` via `config.researchReasoningGate.tier3.adversaryModel` — canonical cross-command adversary key, same as `/wogi-peer-review`, `/wogi-learn`, `/wogi-decide`) with this prompt:
|
|
180
180
|
|
|
181
181
|
```
|
|
182
182
|
You are the hypothesis adversary.
|
package/.claude/commands/wogi-decide.md
CHANGED
|
@@ -304,6 +304,46 @@ In `config.json`:
|
|
|
304
304
|
}
|
|
305
305
|
```
|
|
306
306
|
|
|
307
|
+
## Rule-Creation Adversary (v2.25.0+ — OPTIONAL but recommended for ambiguous rules)
|
|
308
|
+
|
|
309
|
+
When creating a non-trivial rule (anything beyond pure preference-setting like "always use semicolons"), spawn an adversary on a different model (default `sonnet` via `config.researchReasoningGate.tier3.adversaryModel`) to stress-test the proposed rule BEFORE it lands in `decisions.md`.
|
|
310
|
+
|
|
311
|
+
```
|
|
312
|
+
Spawn Agent (subagent_type: general-purpose, model: <adversaryModel>):
|
|
313
|
+
|
|
314
|
+
Input:
|
|
315
|
+
Proposed rule title: <title>
|
|
316
|
+
Proposed rule body: <body>
|
|
317
|
+
User's original phrasing: <literal request>
|
|
318
|
+
|
|
319
|
+
Prompt:
|
|
320
|
+
You are the rule-creation adversary.
|
|
321
|
+
1. Edge cases: name 3 situations where following this rule would produce
|
|
322
|
+
worse outcomes than NOT following it.
|
|
323
|
+
2. Interpretation: are there 2+ reasonable interpretations? If yes, list
|
|
324
|
+
them and pick the one the user most likely meant.
|
|
325
|
+
3. Scope creep: could this rule be over-applied to situations the user
|
|
326
|
+
didn't intend? Suggest scope qualifiers.
|
|
327
|
+
4. Verdict:
|
|
328
|
+
- ACCEPT — ship as-is
|
|
329
|
+
- CLARIFY — multiple interpretations; ask user
|
|
330
|
+
- NARROW — over-application risk; add scope qualifiers
|
|
331
|
+
- REJECT — edge cases dominate; more harm than good
|
|
332
|
+
|
|
333
|
+
Output JSON: {
|
|
334
|
+
"verdict", "edge_cases", "interpretations",
|
|
335
|
+
"scope_qualifiers", "suggested_revision"
|
|
336
|
+
}
|
|
337
|
+
```
|
|
338
|
+
|
|
339
|
+
Process:
|
|
340
|
+
- **ACCEPT** → proceed with rule creation
|
|
341
|
+
- **CLARIFY** → ask user to pick interpretation
|
|
342
|
+
- **NARROW** → show scope qualifier; ask user to approve
|
|
343
|
+
- **REJECT** → surface edge cases; require explicit override
|
|
344
|
+
|
|
345
|
+
Fail-open: adversary unavailable → proceed with standard flow. User confirmation is still present.
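The verdict handling above could be wired up roughly as follows. This is a minimal sketch, assuming the adversary returns the JSON shape listed in the prompt; `nextActionForRuleVerdict` and the action names are hypothetical and not part of wogiflow. Treating an unknown verdict like an unavailable adversary is also an assumption.

```javascript
// Hypothetical helper: maps a rule-creation adversary result to the next step.
// Verdict names come from the prompt above; action names are illustrative only.
function nextActionForRuleVerdict(result) {
  // Fail-open: adversary unavailable or output unusable, so use the standard flow.
  if (!result || typeof result.verdict !== 'string') {
    return { action: 'proceed', needsUser: false, failOpen: true };
  }
  switch (result.verdict) {
    case 'ACCEPT':
      return { action: 'proceed', needsUser: false, failOpen: false };
    case 'CLARIFY':
      // Multiple interpretations: the user picks one.
      return { action: 'ask-interpretation', needsUser: true, failOpen: false,
               options: result.interpretations || [] };
    case 'NARROW':
      // Over-application risk: the user approves the scope qualifiers.
      return { action: 'approve-scope', needsUser: true, failOpen: false,
               options: result.scope_qualifiers || [] };
    case 'REJECT':
      // Edge cases dominate: surface them and require an explicit override.
      return { action: 'require-override', needsUser: true, failOpen: false,
               options: result.edge_cases || [] };
    default:
      // Assumption: an unrecognized verdict is handled like a missing adversary.
      return { action: 'proceed', needsUser: false, failOpen: true };
  }
}
```

In every branch the normal user confirmation for rule creation would still follow, as the fail-open note states.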
|
|
346
|
+
|
|
307
347
|
## Files
|
|
308
348
|
|
|
309
349
|
| Action | File |
|
package/.claude/commands/wogi-init.md
CHANGED
|
@@ -1209,3 +1209,32 @@ Say "show me the rules" or "what patterns are we using?" anytime.
|
|
|
1209
1209
|
### If user cancels mid-wizard
|
|
1210
1210
|
- Save progress to `.workflow/state/setup-progress.json`
|
|
1211
1211
|
- Next run can offer to resume
|
|
1212
|
+
|
|
1213
|
+
## v2.25.0+ — Modern Config Scaffolding (MANDATORY)
|
|
1214
|
+
|
|
1215
|
+
New projects MUST be initialized with the following modern-stack config blocks explicitly written to `.workflow/config.json` so users can see + tune them (defaults-only inheritance is fine for behavior, but visibility matters for learning):
|
|
1216
|
+
|
|
1217
|
+
```json
|
|
1218
|
+
{
|
|
1219
|
+
"intentGroundedReasoning": { "enabled": true },
|
|
1220
|
+
"taskBoundaryReset": {
|
|
1221
|
+
"enabled": true,
|
|
1222
|
+
"maxRestartsPerSession": 50
|
|
1223
|
+
},
|
|
1224
|
+
"storyFlow": {
|
|
1225
|
+
"consumerImpactAnalysis": { "enabled": true, "breakingThreshold": 5 },
|
|
1226
|
+
"scopeConfidenceAudit": { "enabled": true },
|
|
1227
|
+
"itemReconciliation": { "enabled": true, "minItems": 3 }
|
|
1228
|
+
},
|
|
1229
|
+
"longInputGate": { "enabled": true, "lineThreshold": 40 },
|
|
1230
|
+
"researchReasoningGate": {
|
|
1231
|
+
"enabled": true,
|
|
1232
|
+
"tier2": { "enabled": true },
|
|
1233
|
+
"tier3": { "enabled": true, "adversaryModel": "sonnet" }
|
|
1234
|
+
}
|
|
1235
|
+
}
|
|
1236
|
+
```
|
|
1237
|
+
|
|
1238
|
+
These capabilities (IGR, task-boundary restart, P0 story gates, long-input routing, research reasoning gate) have proven out in 2.22+ releases. New users should NOT have to manually enable them via `flow migrate-igr` or equivalent — they are active from the first session.
|
|
1239
|
+
|
|
1240
|
+
If onboarding a workspace (multi-repo), also ensure `workspace.autoPickupChannelDispatches: true` and the 2.22.x restart-handoff settings are present.
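One way the scaffolding step could be implemented is a deep merge that writes the blocks above into the config object while leaving any value the user already set untouched. This is only a sketch: `scaffoldConfig` and `MODERN_BLOCKS` are hypothetical names, and the "existing values win" policy is an assumption, since the text only covers fresh projects.

```javascript
// Blocks copied from the JSON listed above.
const MODERN_BLOCKS = {
  intentGroundedReasoning: { enabled: true },
  taskBoundaryReset: { enabled: true, maxRestartsPerSession: 50 },
  storyFlow: {
    consumerImpactAnalysis: { enabled: true, breakingThreshold: 5 },
    scopeConfidenceAudit: { enabled: true },
    itemReconciliation: { enabled: true, minItems: 3 }
  },
  longInputGate: { enabled: true, lineThreshold: 40 },
  researchReasoningGate: {
    enabled: true,
    tier2: { enabled: true },
    tier3: { enabled: true, adversaryModel: 'sonnet' }
  }
};

function isPlainObject(value) {
  return value !== null && typeof value === 'object' && !Array.isArray(value);
}

// Hypothetical helper: returns a new config with missing keys filled in.
// Assumption: values already present in `existing` are never overwritten.
function scaffoldConfig(existing, defaults = MODERN_BLOCKS) {
  const out = isPlainObject(existing) ? { ...existing } : {};
  for (const [key, def] of Object.entries(defaults)) {
    const present = Object.prototype.hasOwnProperty.call(out, key);
    if (!present) {
      // Missing key: copy the default (deep-copied for nested blocks).
      out[key] = isPlainObject(def) ? scaffoldConfig({}, def) : def;
    } else if (isPlainObject(def) && isPlainObject(out[key])) {
      // Both sides are objects: fill in only the missing nested keys.
      out[key] = scaffoldConfig(out[key], def);
    }
    // Otherwise keep the user's value as-is.
  }
  return out;
}
```

Calling `scaffoldConfig({})` yields the full block set, so the result can be written to `.workflow/config.json` where users can see and tune it.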
|
package/.claude/commands/wogi-learn.md
CHANGED
|
@@ -275,6 +275,52 @@ In `config.json`:
|
|
|
275
275
|
}
|
|
276
276
|
```
|
|
277
277
|
|
|
278
|
+
## Promotion Adversary (v2.25.0+ — MANDATORY)
|
|
279
|
+
|
|
280
|
+
Before promoting a pattern from `feedback-patterns.md` to `decisions.md`, run a **Promotion Adversary** on a different model. Rationale: same-model self-critique rubber-stamps. The adversary checks whether the N events that triggered promotion share an actual root cause (genuine recurrence) vs. superficial similarity with different underlying causes (false recurrence — common when the pattern detector just matched keywords).
|
|
281
|
+
|
|
282
|
+
```
|
|
283
|
+
Spawn Agent (subagent_type: general-purpose,
|
|
284
|
+
model: config.researchReasoningGate.tier3.adversaryModel, default 'sonnet'):
|
|
285
|
+
|
|
286
|
+
Input:
|
|
287
|
+
Proposed rule: <title + body>
|
|
288
|
+
Triggering events: [
|
|
289
|
+
{ date, request, correction },
|
|
290
|
+
{ date, request, correction },
|
|
291
|
+
{ date, request, correction }
|
|
292
|
+
]
|
|
293
|
+
|
|
294
|
+
Prompt:
|
|
295
|
+
You are the rule-promotion adversary.
|
|
296
|
+
Do these N events actually share a root cause, or are they superficially
|
|
297
|
+
similar events with different underlying issues?
|
|
298
|
+
|
|
299
|
+
1. For each event: describe the root cause in your own words.
|
|
300
|
+
2. List what's common to all N root causes.
|
|
301
|
+
3. List what's different between them.
|
|
302
|
+
4. Verdict:
|
|
303
|
+
- SAME_PATTERN — genuine recurrence; rule is well-founded
|
|
304
|
+
- MIXED — N-1 match but one event has a different root cause
|
|
305
|
+
- DIFFERENT — surface-similar only; no unifying pattern
|
|
306
|
+
|
|
307
|
+
Output JSON:
|
|
308
|
+
{
|
|
309
|
+
"verdict": "SAME_PATTERN" | "MIXED" | "DIFFERENT",
|
|
310
|
+
"root_causes": [...],
|
|
311
|
+
"commonalities": [...],
|
|
312
|
+
"differences": [...],
|
|
313
|
+
"suggested_rule_scope": "as_proposed" | "narrower" | "split_into_N"
|
|
314
|
+
}
|
|
315
|
+
```
|
|
316
|
+
|
|
317
|
+
Process the verdict:
|
|
318
|
+
- **SAME_PATTERN** → proceed with promotion as-is
|
|
319
|
+
- **MIXED** → ask the user: "Adversary flags event #X as different root cause. Promote rule anyway, narrow scope, or split into multiple rules?"
|
|
320
|
+
- **DIFFERENT** → DO NOT auto-promote. Surface adversary output; require explicit user confirmation.
|
|
321
|
+
|
|
322
|
+
Fail-open: if adversary cannot be spawned (missing API key, network), proceed with standard promotion and log a warning. The threshold check + user confirmation still apply.
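Because the adversary's reply arrives as free text, a caller would probably validate it before acting on the verdict. The sketch below shows one possible shape, assuming the JSON fields listed in the prompt; `parsePromotionVerdict` and `needsExtraUserDecision` are hypothetical helpers, not wogiflow functions. An unusable reply is mapped to `null`, which models the fail-open path.

```javascript
const PROMOTION_VERDICTS = ['SAME_PATTERN', 'MIXED', 'DIFFERENT'];

// Hypothetical: parse raw adversary output; return null when it is unusable.
function parsePromotionVerdict(rawText) {
  let parsed;
  try {
    parsed = JSON.parse(rawText);
  } catch (_err) {
    return null; // not JSON at all
  }
  if (!parsed || typeof parsed !== 'object' || !PROMOTION_VERDICTS.includes(parsed.verdict)) {
    return null; // JSON, but not the expected shape
  }
  return {
    verdict: parsed.verdict,
    rootCauses: Array.isArray(parsed.root_causes) ? parsed.root_causes : [],
    differences: Array.isArray(parsed.differences) ? parsed.differences : [],
    suggestedScope: typeof parsed.suggested_rule_scope === 'string'
      ? parsed.suggested_rule_scope
      : 'as_proposed'
  };
}

// MIXED and DIFFERENT need an extra user decision; SAME_PATTERN and the
// fail-open case (null) continue with the standard promotion flow.
function needsExtraUserDecision(result) {
  return result !== null && result.verdict !== 'SAME_PATTERN';
}
```

The threshold check and the regular user confirmation would still run after either outcome.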
|
|
323
|
+
|
|
278
324
|
## Files
|
|
279
325
|
|
|
280
326
|
| Action | File |
|
package/.claude/commands/wogi-onboard.md
CHANGED
|
@@ -1077,3 +1077,29 @@ AskUserQuestion({
|
|
|
1077
1077
|
}]
|
|
1078
1078
|
});
|
|
1079
1079
|
```
|
|
1080
|
+
|
|
1081
|
+
## v2.25.0+ — Modern Config Scaffolding (MANDATORY)
|
|
1082
|
+
|
|
1083
|
+
When generating `.workflow/config.json` for a fresh project, include these 2.22+ capability blocks so new users inherit the current-best defaults:
|
|
1084
|
+
|
|
1085
|
+
```json
|
|
1086
|
+
{
|
|
1087
|
+
"intentGroundedReasoning": { "enabled": true },
|
|
1088
|
+
"taskBoundaryReset": { "enabled": true, "maxRestartsPerSession": 50 },
|
|
1089
|
+
"storyFlow": {
|
|
1090
|
+
"consumerImpactAnalysis": { "enabled": true, "breakingThreshold": 5 },
|
|
1091
|
+
"scopeConfidenceAudit": { "enabled": true },
|
|
1092
|
+
"itemReconciliation": { "enabled": true, "minItems": 3 }
|
|
1093
|
+
},
|
|
1094
|
+
"longInputGate": { "enabled": true, "lineThreshold": 40 },
|
|
1095
|
+
"researchReasoningGate": {
|
|
1096
|
+
"enabled": true,
|
|
1097
|
+
"tier2": { "enabled": true },
|
|
1098
|
+
"tier3": { "enabled": true, "adversaryModel": "sonnet" }
|
|
1099
|
+
}
|
|
1100
|
+
}
|
|
1101
|
+
```
|
|
1102
|
+
|
|
1103
|
+
These drive IGR (Architect + Adversary + Truth Gate), task-boundary context reset via the `wogi-claude` wrapper, `/wogi-story` P0 spec gates, auto-routing of long inputs to `/wogi-extract-review`, and the research reasoning gate's assumption-surfacing + cross-model adversary. All have proven out across the 2.22.x release series; new users should not have to discover them one at a time.
|
|
1104
|
+
|
|
1105
|
+
For multi-repo workspaces, also scaffold `workspace.autoPickupChannelDispatches: true` and leave the other `workspace.*` defaults intact — they include the 2.22.2 restart-handoff protocol.
|
package/.claude/commands/wogi-peer-review.md
CHANGED
|
@@ -47,9 +47,11 @@ Models are selected once per session and remembered for subsequent runs.
|
|
|
47
47
|
├─────────────────────────────────────────────────────────┤
|
|
48
48
|
│ 1. Collect code changes (git diff or specified files) │
|
|
49
49
|
│ 2. Classify change size → effort tier: │
|
|
50
|
-
│ L0/L1 (>10 files) → opus
|
|
50
|
+
│ L0/L1 (>10 files) → opus (latest) xhigh │
|
|
51
51
|
│ L2 (3-10 files) → sonnet medium │
|
|
52
52
|
│ L3 (<3 files) → haiku medium │
|
|
53
|
+
│ (Model IDs resolve from config.models — avoid │
|
|
54
|
+
│ hardcoding model version in this doc.) │
|
|
53
55
|
│ 3. Generate improvement-focused prompt │
|
|
54
56
|
│ 4. If includeClaude enabled: │
|
|
55
57
|
│ - Launch Claude review (Task agent, Explore type) │
|
|
@@ -96,7 +98,7 @@ analysis, EACH carrying an explicit evidence tier.
|
|
|
96
98
|
|
|
97
99
|
## Synthesis Adversary (v2.23.0+ — MANDATORY unless `--no-adversary`)
|
|
98
100
|
|
|
99
|
-
After initial synthesis, spawn a single adversary agent on a DIFFERENT model from the synthesizer (default
|
|
101
|
+
After initial synthesis, spawn a single adversary agent on a DIFFERENT model from the synthesizer (default `sonnet`; override via the canonical `config.researchReasoningGate.tier3.adversaryModel` — same key used by `/wogi-debug-hypothesis`, `/wogi-learn`, `/wogi-decide`). Prompt:
|
|
100
102
|
|
|
101
103
|
```
|
|
102
104
|
You are the synthesis adversary.
|
|
@@ -199,7 +201,7 @@ For manual review (no API keys needed): `/wogi-peer-review --manual`
|
|
|
199
201
|
| `--verbose` | Show full model responses |
|
|
200
202
|
| `--create-tasks` | Auto-create tasks for strong agreements |
|
|
201
203
|
| `--no-adversary` | Skip the v2.23.0 synthesis adversary (not recommended for L0/L1 diffs) |
|
|
202
|
-
| `--adversary-model <id>` | Override adversary model (default:
|
|
204
|
+
| `--adversary-model <id>` | Override adversary model (default: `config.researchReasoningGate.tier3.adversaryModel`, usually `sonnet`) |
|
|
203
205
|
| `--effort <level>` | Override effort tier (low/medium/high/xhigh/max) — otherwise derived from diff size |
|
|
204
206
|
|
|
205
207
|
ARGUMENTS: {args}
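The size-to-tier mapping from step 2 of the flow diagram, combined with the `--effort` override from the options table, can be sketched as a small function. `classifyReview` is a hypothetical name, and the model values here are plain aliases; per the diagram note, real model IDs would resolve from `config.models`.

```javascript
// Effort levels accepted by --effort, as listed in the options table.
const EFFORT_LEVELS = ['low', 'medium', 'high', 'xhigh', 'max'];

// Hypothetical helper: derive tier, model alias and effort from the file count.
function classifyReview(fileCount, effortOverride) {
  if (!Number.isInteger(fileCount) || fileCount < 0) {
    throw new TypeError('fileCount must be a non-negative integer');
  }
  let tier;
  if (fileCount > 10) {
    tier = { level: 'L0/L1', model: 'opus', effort: 'xhigh' };
  } else if (fileCount >= 3) {
    tier = { level: 'L2', model: 'sonnet', effort: 'medium' };
  } else {
    tier = { level: 'L3', model: 'haiku', effort: 'medium' };
  }
  if (effortOverride !== undefined) {
    if (!EFFORT_LEVELS.includes(effortOverride)) {
      throw new RangeError(`unknown effort level: ${effortOverride}`);
    }
    tier.effort = effortOverride; // explicit override wins over the derived effort
  }
  return tier;
}
```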
|
package/.claude/commands/wogi-triage.md
CHANGED
|
@@ -356,3 +356,60 @@ Each finding is displayed using these fields from `last-review.json`:
|
|
|
356
356
|
| File | `finding.file` + `finding.line` | "src/api.ts:45" |
|
|
357
357
|
| Issue | `finding.issue` | "Raw JSON.parse without try-catch" |
|
|
358
358
|
| Recommendation | `finding.recommendation` | "Use safeJsonParse from flow-utils.js" |
|
|
359
|
+
|
|
360
|
+
## Anti-Deferral Enforcement (v2.25.0+ — two layers)
|
|
361
|
+
|
|
362
|
+
The **Review-Findings Anti-Deferral Rule** (`.workflow/state/decisions.md`, 2026-04-15) gets two complementary enforcement layers. One mechanical (an actual gate in the codebase), one AI-followed (a protocol documented here that the triage flow honors).
|
|
363
|
+
|
|
364
|
+
### Layer 1 — Mechanical gate (v2.25.1+)
|
|
365
|
+
|
|
366
|
+
`scripts/flow-completion-truth-gate.js` exports `parseCommitMessageClaims()` and `verifyCommitMessageAgainstDiff()`. Callers pass a commit message and the staged diff (or changed-files list); the function parses finding IDs (`F1`/`M1`/`SEC-001`), task IDs (`wf-XXXXXXXX` after fix/close/resolve verbs), and file-path mentions, then checks each against the diff. Any unverified claim surfaces as a blocking prompt with three remediation options. This is real code, callable from pre-commit hooks, `flow-done.js`, or the triage flow itself.
|
|
367
|
+
|
|
368
|
+
Example usage:
|
|
369
|
+
```javascript
|
|
370
|
+
const { verifyCommitMessageAgainstDiff, formatMissingClaimsMessage } =
|
|
371
|
+
require('wogiflow/scripts/flow-completion-truth-gate');
|
|
372
|
+
|
|
373
|
+
const result = verifyCommitMessageAgainstDiff(commitMsg, { diffText, changedFiles });
|
|
374
|
+
if (!result.ok) {
|
|
375
|
+
console.error(formatMissingClaimsMessage(result));
|
|
376
|
+
// Block + remediate
|
|
377
|
+
}
|
|
378
|
+
```
|
|
379
|
+
|
|
380
|
+
### Layer 2 — AI-followed protocol (documentation)
|
|
381
|
+
|
|
382
|
+
The rest of the triage flow is a protocol the AI follows. It is NOT automatically enforced by a hook — the historical v2.17.4 incident showed that doc-only protocols can be violated. The mechanical gate above closes the most damaging failure mode (commit message / diff mismatch). The AI-followed rules below cover the earlier stages:
|
|
383
|
+
|
|
384
|
+
1. **Defer requires explicit user confirmation + reason.** The triage flow prompts when proposing to defer:
|
|
385
|
+
```
|
|
386
|
+
Defer finding wf-review-XXXX?
|
|
387
|
+
Severity: HIGH
|
|
388
|
+
Reason required: [user input]
|
|
389
|
+
[Confirm defer] [Cancel — fix now]
|
|
390
|
+
```
|
|
391
|
+
Auto-defer without reason is forbidden by this protocol.
|
|
392
|
+
|
|
393
|
+
2. **"Fix all" / "Option 1" means fix ALL.** If the user requests bulk processing:
|
|
394
|
+
- Ship a fix for every finding with evidence-tier ≥ 1
|
|
395
|
+
- If any finding is too large, STOP and ask: "Finding X requires ~Y minutes of work. Ship now, split to its own release, or defer (needs reason)?"
|
|
396
|
+
- Never silently convert a finding to "deferred" in commit messages or release notes
|
|
397
|
+
|
|
398
|
+
3. **Triage output includes a Deferral Audit Trail**:
|
|
399
|
+
```
|
|
400
|
+
━━━ TRIAGE SUMMARY ━━━
|
|
401
|
+
Fixed: 12
|
|
402
|
+
Deferred (with reasons): 2
|
|
403
|
+
• M3 — "requires restructure, tracked as wf-XXXXXXXX" (user-confirmed)
|
|
404
|
+
• L5 — "out of scope for current release" (user-confirmed)
|
|
405
|
+
Silently dropped: 0 ← MUST be 0
|
|
406
|
+
━━━━━━━━━━━━━━━━━━━━━━
|
|
407
|
+
```
|
|
408
|
+
|
|
409
|
+
### Honest tradeoff
|
|
410
|
+
|
|
411
|
+
Layer 1 is genuinely mechanical — impossible for an AI to bypass without explicitly disabling the gate. Layer 2 is a protocol the AI can fail to follow if prompted poorly, distracted, or confused about priorities. Both matter; calling the whole system "architecturally impossible to bypass" would be inaccurate. The mechanical gate at least ensures that WHEN the AI writes a commit message, claimed fixes must actually appear in the diff.
|
|
412
|
+
|
|
413
|
+
Historical incident (v2.17.4 release, 2026-04-15): commit claimed "fix all findings" but M1 and M3 were silently dropped. Layer 1 would have caught that — the commit message mentioned M1 + M3 but the diff didn't. Layer 2 is the human-protocol reinforcement.
|
|
414
|
+
|
|
415
|
+
Skip via `config.triage.antiDeferralEnforcement.enabled: false` — note that this is currently a surface flag only (read by AI-followed protocol, not by the Layer 1 gate); to disable Layer 1 set `config.commitClaimsGate.enabled: false`.
|
package/package.json
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "wogiflow",
|
|
3
|
-
"version": "2.
|
|
3
|
+
"version": "2.25.1",
|
|
4
4
|
"description": "AI-powered development workflow management system with multi-model support",
|
|
5
5
|
"main": "lib/index.js",
|
|
6
6
|
"bin": {
|
|
@@ -10,7 +10,7 @@
|
|
|
10
10
|
},
|
|
11
11
|
"scripts": {
|
|
12
12
|
"flow": "./scripts/flow",
|
|
13
|
-
"test": "NODE_ENV=test node --test tests/auto-compact-prompt.test.js tests/flow-paths.test.js tests/flow-io.test.js tests/flow-config-loader.test.js tests/flow-damage-control.test.js tests/flow-output.test.js tests/flow-constants.test.js tests/flow-session-state.test.js tests/flow-hooks-integration.test.js tests/flow-utils.test.js tests/flow-security.test.js tests/flow-memory-db.test.js tests/flow-durable-session.test.js tests/flow-skill-matcher.test.js tests/flow-bridge.test.js tests/flow-proactive-compact.test.js tests/flow-cascade-completion.test.js tests/flow-capture-gate.test.js tests/flow-correction-detector-hybrid.test.js tests/flow-promote.test.js tests/flow-archive-runs.test.js tests/flow-memory.test.js tests/flow-hooks-pre-tool-helpers.test.js tests/flow-hooks-bugfix-scope-gate.test.js tests/flow-hooks-routing-gate.test.js tests/flow-hooks-phase-read-gate.test.js tests/flow-hooks-commit-log-gate.test.js tests/flow-hooks-deploy-gate.test.js tests/flow-hooks-todowrite-gate.test.js tests/flow-hooks-git-safety-gate.test.js tests/flow-hooks-scope-mutation-gate.test.js tests/flow-hooks-strike-gate.test.js tests/flow-hooks-component-check.test.js tests/flow-hooks-scope-gate.test.js tests/flow-hooks-implementation-gate.test.js tests/flow-hooks-research-gate.test.js tests/flow-hooks-loop-check.test.js tests/flow-hooks-manager-boundary-gate.test.js tests/flow-hooks-phase-gate.test.js tests/flow-hooks-pre-tool-orchestrator.test.js tests/flow-hooks-observation-capture.test.js tests/flow-hooks-task-gate.test.js tests/flow-durable-session-suspension.test.js tests/flow-health-mcp-scopes.test.js tests/flow-lean-config.test.js tests/flow-workspace-autopickup.test.js tests/flow-worker-boundary-gate.test.js tests/flow-worker-question-classifier.test.js tests/flow-completion-truth-gate-contradictions.test.js tests/flow-structure-sensor.test.js tests/flow-workspace-dispatch-tracking.test.js tests/flow-story-gates.test.js tests/flow-workspace-restart-handoff.test.js 
tests/flow-wogi-claude-wrapper.test.js tests/flow-wave1-integrations.test.js tests/flow-wave2-integrations.test.js && NODE_ENV=test node tests/run-quality-gates.test.js",
|
|
13
|
+
"test": "NODE_ENV=test node --test tests/auto-compact-prompt.test.js tests/flow-paths.test.js tests/flow-io.test.js tests/flow-config-loader.test.js tests/flow-damage-control.test.js tests/flow-output.test.js tests/flow-constants.test.js tests/flow-session-state.test.js tests/flow-hooks-integration.test.js tests/flow-utils.test.js tests/flow-security.test.js tests/flow-memory-db.test.js tests/flow-durable-session.test.js tests/flow-skill-matcher.test.js tests/flow-bridge.test.js tests/flow-proactive-compact.test.js tests/flow-cascade-completion.test.js tests/flow-capture-gate.test.js tests/flow-correction-detector-hybrid.test.js tests/flow-promote.test.js tests/flow-archive-runs.test.js tests/flow-memory.test.js tests/flow-hooks-pre-tool-helpers.test.js tests/flow-hooks-bugfix-scope-gate.test.js tests/flow-hooks-routing-gate.test.js tests/flow-hooks-phase-read-gate.test.js tests/flow-hooks-commit-log-gate.test.js tests/flow-hooks-deploy-gate.test.js tests/flow-hooks-todowrite-gate.test.js tests/flow-hooks-git-safety-gate.test.js tests/flow-hooks-scope-mutation-gate.test.js tests/flow-hooks-strike-gate.test.js tests/flow-hooks-component-check.test.js tests/flow-hooks-scope-gate.test.js tests/flow-hooks-implementation-gate.test.js tests/flow-hooks-research-gate.test.js tests/flow-hooks-loop-check.test.js tests/flow-hooks-manager-boundary-gate.test.js tests/flow-hooks-phase-gate.test.js tests/flow-hooks-pre-tool-orchestrator.test.js tests/flow-hooks-observation-capture.test.js tests/flow-hooks-task-gate.test.js tests/flow-durable-session-suspension.test.js tests/flow-health-mcp-scopes.test.js tests/flow-lean-config.test.js tests/flow-workspace-autopickup.test.js tests/flow-worker-boundary-gate.test.js tests/flow-worker-question-classifier.test.js tests/flow-completion-truth-gate-contradictions.test.js tests/flow-structure-sensor.test.js tests/flow-workspace-dispatch-tracking.test.js tests/flow-story-gates.test.js tests/flow-workspace-restart-handoff.test.js 
tests/flow-wogi-claude-wrapper.test.js tests/flow-wave1-integrations.test.js tests/flow-wave2-integrations.test.js tests/flow-wave3-integrations.test.js tests/flow-commit-claims-gate.test.js && NODE_ENV=test node tests/run-quality-gates.test.js",
|
|
14
14
|
"test:syntax": "find scripts/ lib/ -name '*.js' -not -path '*/node_modules/*' -exec node --check {} +",
|
|
15
15
|
"lint": "eslint scripts/ lib/ tests/",
|
|
16
16
|
"lint:ci": "eslint scripts/ lib/ tests/ --max-warnings 0",
|
package/scripts/flow-completion-truth-gate.js
CHANGED
|
@@ -614,6 +614,133 @@ function collectArrayEntries(obj, keys) {
|
|
|
614
614
|
return out;
|
|
615
615
|
}
|
|
616
616
|
|
|
617
|
+
// ============================================================
|
|
618
|
+
// Commit-vs-diff consistency scanner (v2.25.1 — H2b from Waves 1-3 review)
|
|
619
|
+
// ============================================================
|
|
620
|
+
|
|
621
|
+
/**
|
|
622
|
+
* Parse a commit message for "fixes X" / "closes X" / "F1, F2, M1" style claims
|
|
623
|
+
* that should be verifiable against the diff.
|
|
624
|
+
*
|
|
625
|
+
* Heuristics — conservative to avoid false positives:
|
|
626
|
+
* 1. Bracketed finding IDs: `F1`, `F2`, `M1`, `H3`, `L5`, or `SEC-001`/`PERF-002`
|
|
627
|
+
* 2. Task IDs: `wf-XXXXXXXX` that appear as "fixes wf-...", "closes wf-...", etc.
|
|
628
|
+
* 3. File paths mentioned in fix-context: "fixes `path/to/file.js`"
|
|
629
|
+
*
|
|
630
|
+
* Returns the structured claims a diff-consistency check can verify.
|
|
631
|
+
*
|
|
632
|
+
* @param {string} commitMessage
|
|
633
|
+
* @returns {{claims: Array<{kind: 'finding-id'|'task-id'|'file', value: string, raw: string}>}}
|
|
634
|
+
*/
|
|
635
|
+
function parseCommitMessageClaims(commitMessage) {
|
|
636
|
+
const claims = [];
|
|
637
|
+
if (typeof commitMessage !== 'string' || commitMessage.trim().length === 0) {
|
|
638
|
+
return { claims };
|
|
639
|
+
}
|
|
640
|
+
|
|
641
|
+
// Finding IDs: F1, F2, M1, H3, L5, SEC-001, PERF-002, etc.
|
|
642
|
+
// - Single-letter + digits: match on word boundary
|
|
643
|
+
// - ALLCAPS-dashnum: SEC-001, PERF-002
|
|
644
|
+
const findingRe = /\b(?:F\d+|H\d+|M\d+|L\d+|[A-Z]{2,6}-\d+)\b/g;
|
|
645
|
+
for (const m of commitMessage.matchAll(findingRe)) {
|
|
646
|
+
claims.push({ kind: 'finding-id', value: m[0], raw: m[0] });
|
|
647
|
+
}
|
|
648
|
+
|
|
649
|
+
// Task IDs (wf-XXXXXXXX) — only count if preceded by fix/close/resolve verb
|
|
650
|
+
const taskRe = /\b(?:fix(?:es|ed)?|clos(?:es|ed)?|resolv(?:es|ed)?|address(?:es|ed)?)\s+(wf-[0-9a-f]{8})\b/gi;
|
|
651
|
+
for (const m of commitMessage.matchAll(taskRe)) {
|
|
652
|
+
claims.push({ kind: 'task-id', value: m[1], raw: m[0] });
|
|
653
|
+
}
|
|
654
|
+
|
|
655
|
+
// File paths in backticks after fix/address verbs: `fixes \`path/to/file.js\``
|
|
656
|
+
const fileRe = /(?:fix(?:es|ed)?|address(?:es|ed)?|updat(?:es|ed)?)\s+`([^`\n]{3,120})`/gi;
|
|
657
|
+
for (const m of commitMessage.matchAll(fileRe)) {
|
|
658
|
+
// Only count values that look like file paths (have an extension or a slash)
|
|
659
|
+
const val = m[1];
|
|
660
|
+
if (/[./]/.test(val) && !val.includes(' ')) {
|
|
661
|
+
claims.push({ kind: 'file', value: val, raw: m[0] });
|
|
662
|
+
}
|
|
663
|
+
}
|
|
664
|
+
|
|
665
|
+
// Dedup
|
|
666
|
+
const seen = new Set();
|
|
667
|
+
return {
|
|
668
|
+
claims: claims.filter(c => {
|
|
669
|
+
const k = `${c.kind}::${c.value.toLowerCase()}`;
|
|
670
|
+
if (seen.has(k)) return false;
|
|
671
|
+
seen.add(k);
|
|
672
|
+
return true;
|
|
673
|
+
})
|
|
674
|
+
};
|
|
675
|
+
}
|
|
676
|
+
|
|
677
|
+
/**
|
|
678
|
+
* Check commit message claims against the staged diff. Each claim must appear
|
|
679
|
+
* somewhere in the diff (a file path in the changed-files list OR the token
|
|
680
|
+
* appearing as-is in the diff body).
|
|
681
|
+
*
|
|
682
|
+
* @param {string} commitMessage
|
|
683
|
+
* @param {Object} [opts]
|
|
684
|
+
* @param {string} [opts.diffText] — raw `git diff --staged` output
|
|
685
|
+
* @param {string[]} [opts.changedFiles] — staged file list (alternative input)
|
|
686
|
+
* @returns {{ok: boolean, totalClaims: number, missingClaims: Array, verifiedClaims: Array}}
|
|
687
|
+
*/
|
|
688
|
+
function verifyCommitMessageAgainstDiff(commitMessage, opts = {}) {
|
|
689
|
+
const { claims } = parseCommitMessageClaims(commitMessage);
|
|
690
|
+
if (claims.length === 0) return { ok: true, totalClaims: 0, missingClaims: [], verifiedClaims: [] };
|
|
691
|
+
|
|
692
|
+
const diffText = typeof opts.diffText === 'string' ? opts.diffText : '';
|
|
693
|
+
const changedFiles = Array.isArray(opts.changedFiles) ? opts.changedFiles : [];
|
|
694
|
+
const haystack = [diffText, ...changedFiles].join('\n');
|
|
695
|
+
|
|
696
|
+
const missingClaims = [];
|
|
697
|
+
const verifiedClaims = [];
|
|
698
|
+
|
|
699
|
+
for (const claim of claims) {
|
|
700
|
+
let found = false;
|
|
701
|
+
if (claim.kind === 'file') {
|
|
702
|
+
// File claims verify by exact path match (or suffix) in changed-files list
|
|
703
|
+
found = changedFiles.some(f => f === claim.value || f.endsWith('/' + claim.value) || f.endsWith(claim.value));
|
|
704
|
+
if (!found) found = diffText.includes(claim.value);
|
|
705
|
+
} else {
|
|
706
|
+
// finding-id + task-id: plain substring search in the haystack
|
|
707
|
+
found = haystack.includes(claim.value);
|
|
708
|
+
}
|
|
709
|
+
(found ? verifiedClaims : missingClaims).push(claim);
|
|
710
|
+
}
|
|
711
|
+
|
|
712
|
+
return {
|
|
713
|
+
ok: missingClaims.length === 0,
|
|
714
|
+
totalClaims: claims.length,
|
|
715
|
+
missingClaims,
|
|
716
|
+
verifiedClaims
|
|
717
|
+
};
|
|
718
|
+
}
|
|
719
|
+
|
|
720
|
+
/**
|
|
721
|
+
* Human-readable message when claims are missing from the diff.
|
|
722
|
+
*
|
|
723
|
+
* @param {Object} result — from verifyCommitMessageAgainstDiff
|
|
724
|
+
* @returns {string|null}
|
|
725
|
+
*/
|
|
726
|
+
function formatMissingClaimsMessage(result) {
|
|
727
|
+
if (!result || result.ok || !Array.isArray(result.missingClaims) || result.missingClaims.length === 0) {
|
|
728
|
+
return null;
|
|
729
|
+
}
|
|
730
|
+
const lines = [
|
|
731
|
+
`Commit message claims ${result.missingClaims.length} item(s) that do not appear in the staged diff:`
|
|
732
|
+
];
|
|
733
|
+
for (const c of result.missingClaims) {
|
|
734
|
+
lines.push(` • ${c.kind === 'finding-id' ? 'Finding' : c.kind === 'task-id' ? 'Task' : 'File'} "${c.value}" — not found`);
|
|
735
|
+
}
|
|
736
|
+
lines.push('');
|
|
737
|
+
lines.push('Options:');
|
|
738
|
+
lines.push(' 1. Add the missing fix to the commit now (git add + amend)');
|
|
739
|
+
lines.push(' 2. Remove the unverified claim from the commit message');
|
|
740
|
+
lines.push(' 3. Acknowledge + proceed (use --force-commit-claims if blocking from a gate)');
|
|
741
|
+
return lines.join('\n');
|
|
742
|
+
}
|
|
743
|
+
|
|
617
744
|
// ============================================================
|
|
618
745
|
// Exports
|
|
619
746
|
// ============================================================
|
|
@@ -627,6 +754,9 @@ module.exports = {
|
|
|
627
754
|
isTruthGateDisabled,
|
|
628
755
|
getMinTierForDone,
|
|
629
756
|
scanForClaimContradictions,
|
|
757
|
+
parseCommitMessageClaims,
|
|
758
|
+
verifyCommitMessageAgainstDiff,
|
|
759
|
+
formatMissingClaimsMessage,
|
|
630
760
|
TIER_NAMES,
|
|
631
761
|
DONE_WORDS,
|
|
632
762
|
DISAGREEMENT_WORDS,
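To see how the claim heuristics behave on a concrete message, the sketch below copies the finding-ID and task-ID regexes from the hunk and runs them standalone. The sample commit message is made up, and `findingIds` and `taskIds` are illustrative helper names rather than exports of the package.

```javascript
// Regexes copied verbatim from parseCommitMessageClaims in the hunk above.
const findingRe = /\b(?:F\d+|H\d+|M\d+|L\d+|[A-Z]{2,6}-\d+)\b/g;
const taskRe = /\b(?:fix(?:es|ed)?|clos(?:es|ed)?|resolv(?:es|ed)?|address(?:es|ed)?)\s+(wf-[0-9a-f]{8})\b/gi;

// Illustrative helpers: list the raw matches for each claim kind.
function findingIds(message) {
  return [...message.matchAll(findingRe)].map(m => m[0]);
}

function taskIds(message) {
  return [...message.matchAll(taskRe)].map(m => m[1]);
}

const sample = 'Fixes M1 and M3, closes wf-1a2b3c4d; encode as UTF-8';

console.log(findingIds(sample));            // finding-style tokens, including UTF-8
console.log(taskIds(sample));               // task IDs that follow a matching verb
console.log(taskIds('close wf-1a2b3c4d'));  // bare "close" is not matched
console.log('M10 follow-up'.includes('M1')); // substring check used for verification
```

Three behaviours are worth keeping in mind when relying on the gate. The verb alternations accept `closes`/`closed` and `resolves`/`resolved` but not the bare `close` or `resolve`. Any all-caps token followed by a dash and digits, such as `UTF-8`, counts as a finding ID. And because finding and task IDs are verified by substring search, a diff containing `M10` also satisfies a claim about `M1`.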
|
package/scripts/flow-config-defaults.js
CHANGED
|
@@ -818,6 +818,31 @@ const CONFIG_DEFAULTS = {
|
|
|
818
818
|
// --- Gate Confidence ---
|
|
819
819
|
gateConfidence: { enabled: false },
|
|
820
820
|
|
|
821
|
+
// --- Intent-Grounded Reasoning (IGR) ---
|
|
822
|
+
// Master flag for the IGR pipeline: Intent Framing (Step 1.15), Architect
|
|
823
|
+
// Pass (Step 1.55), Logic Adversary (Step 1.57), Scope-Confidence Audit
|
|
824
|
+
// (Step 1.45), Completion Truth Gate (Step 3.9). Default-on so new projects
|
|
825
|
+
// inherit the full reasoning pipeline. See .claude/docs/intent-grounded-reasoning.md.
|
|
826
|
+
intentGroundedReasoning: {
|
|
827
|
+
enabled: true,
|
|
828
|
+
_comment: 'IGR pipeline: architect + logic adversary + truth gate. See .claude/docs/intent-grounded-reasoning.md'
|
|
829
|
+
},
|
|
830
|
+
|
|
831
|
+
// --- Research Reasoning Gate ---
|
|
832
|
+
// Tiered classification for conversation-mode questions. Tier 1 = factual,
|
|
833
|
+
// direct answer. Tier 2 = domain/recommendation, surface assumptions and
|
|
834
|
+
// wait for user confirmation. Tier 3 = architecture, tier 2 flow + spawn
|
|
835
|
+
// cross-model adversary. See wogi-start.md § Research Reasoning Gate.
|
|
836
|
+
researchReasoningGate: {
|
|
837
|
+
enabled: true,
|
|
838
|
+
tier2: { enabled: true },
|
|
839
|
+
tier3: {
|
|
840
|
+
enabled: true,
|
|
841
|
+
adversaryModel: 'sonnet',
|
|
842
|
+
_comment_adversaryModel: 'Model used for Tier-3 cross-model adversary. Reused by /wogi-peer-review, /wogi-debug-hypothesis, /wogi-learn, /wogi-decide — single canonical key.'
|
|
843
|
+
}
|
|
844
|
+
},
|
|
845
|
+
|
|
821
846
|
// --- Long Input Gate ---
|
|
822
847
|
longInputGate: {
|
|
823
848
|
enabled: true,
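Since several commands share the `adversaryModel` key added above, a single lookup with a fallback keeps them consistent. The following is a minimal sketch, assuming a command-line override such as `--adversary-model` takes precedence over the config value; `resolveAdversaryModel` is a hypothetical helper, not a wogiflow export.

```javascript
// Hypothetical helper: CLI override first, then config, then the 'sonnet' default.
function resolveAdversaryModel(config, cliOverride) {
  if (typeof cliOverride === 'string' && cliOverride.trim() !== '') {
    return cliOverride.trim();
  }
  // Optional chaining tolerates configs that lack the researchReasoningGate block.
  const configured = config?.researchReasoningGate?.tier3?.adversaryModel;
  if (typeof configured === 'string' && configured.trim() !== '') {
    return configured.trim();
  }
  return 'sonnet';
}
```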
|
package/scripts/flow-extraction-review.js
CHANGED
|
@@ -104,8 +104,13 @@ function loadReviewSession() {
|
|
|
104
104
|
return null;
|
|
105
105
|
}
|
|
106
106
|
|
|
107
|
-
// Check for prototype pollution keys
|
|
108
|
-
|
|
107
|
+
// Check for prototype pollution keys. Use Object.prototype.hasOwnProperty
|
|
108
|
+
// rather than `key in parsed` — the latter also returns true for inherited
|
|
109
|
+
// properties, and EVERY plain object inherits `constructor` from
|
|
110
|
+
// Object.prototype, which made this guard falsely trip on every valid
|
|
111
|
+
// session file (pre-existing bug, found via v2.25.1 wave2 test).
|
|
112
|
+
const hasOwn = Object.prototype.hasOwnProperty;
|
|
113
|
+
if (hasOwn.call(parsed, '__proto__') || hasOwn.call(parsed, 'constructor') || hasOwn.call(parsed, 'prototype')) {
|
|
109
114
|
console.error('Review session file contains unsafe keys');
|
|
110
115
|
return null;
|
|
111
116
|
}
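The difference between the two guards described in the comment can be reproduced in isolation. In the sketch below, `hasUnsafeKeysViaIn` is a reconstruction of the old behaviour based only on that comment (the removed line itself is not visible in this diff), and the session payloads are made-up examples.

```javascript
const hasOwn = Object.prototype.hasOwnProperty;
const UNSAFE_KEYS = ['__proto__', 'constructor', 'prototype'];

// Assumed shape of the old guard: `in` also sees inherited properties.
function hasUnsafeKeysViaIn(obj) {
  return UNSAFE_KEYS.some(key => key in obj);
}

// Guard as in the fixed code: only the object's own keys count.
function hasUnsafeKeysViaHasOwn(obj) {
  return UNSAFE_KEYS.some(key => hasOwn.call(obj, key));
}

const validSession = JSON.parse('{"items": []}');
const pollutedSession = JSON.parse('{"__proto__": {"admin": true}}');

console.log(hasUnsafeKeysViaIn(validSession));        // true: inherited `constructor` trips it
console.log(hasUnsafeKeysViaHasOwn(validSession));    // false: valid file passes
console.log(hasUnsafeKeysViaHasOwn(pollutedSession)); // true: own `__proto__` key is caught
```

`JSON.parse` creates `__proto__` as an ordinary own property, which is why the `hasOwnProperty` check still catches a crafted payload.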
|
|
@@ -414,11 +419,21 @@ function exportAsItemManifest() {
|
|
|
414
419
|
// Coordinate with Intent Bootstrap (see flow-story-gates.coordinateIntentBootstrap)
|
|
415
420
|
// so /wogi-start doesn't re-prompt if the user already scheduled bootstrap via
|
|
416
421
|
// /wogi-story during this session.
|
|
422
|
+
//
|
|
423
|
+
// v2.25.1: Semantics corrected (nit from Waves 1-3 review). The flag
|
|
424
|
+
// represents "is IGR bootstrap active/scheduled for this session?", NOT
|
|
425
|
+
// "did THIS call schedule it?". `result.active` is true when IGR is enabled
|
|
426
|
+
// and bootstrap has been scheduled — whether by this call or a prior one.
|
|
417
427
|
let intentBootstrapScheduled = false;
|
|
418
428
|
try {
|
|
419
429
|
const gates = require('./flow-story-gates');
|
|
420
430
|
const result = gates.coordinateIntentBootstrap();
|
|
421
|
-
|
|
431
|
+
if (result && result.active) {
|
|
432
|
+
// Scheduled in this call OR already-scheduled from a prior call = active
|
|
433
|
+
intentBootstrapScheduled = result.scheduled === true ||
|
|
434
|
+
result.reason === 'already-scheduled' ||
|
|
435
|
+
result.reason === 'artifacts-exist';
|
|
436
|
+
}
|
|
422
437
|
} catch (_err) { /* non-critical */ }
|
|
423
438
|
|
|
424
439
|
return {
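The corrected flag semantics can be expressed as a pure function, which makes the cases easy to check. This sketch assumes `coordinateIntentBootstrap()` returns an object with `active`, `scheduled` and `reason` fields, as the hunk implies; `isIntentBootstrapActive` is a hypothetical name and the `'disabled'` reason in the usage below is invented for illustration.

```javascript
// Hypothetical helper mirroring the v2.25.1 logic: the flag means
// "bootstrap is active or scheduled for this session", not "this call scheduled it".
function isIntentBootstrapActive(result) {
  if (!result || !result.active) return false;
  return result.scheduled === true ||
    result.reason === 'already-scheduled' ||
    result.reason === 'artifacts-exist';
}

console.log(isIntentBootstrapActive({ active: true, scheduled: true }));
console.log(isIntentBootstrapActive({ active: true, scheduled: false, reason: 'already-scheduled' }));
console.log(isIntentBootstrapActive({ active: false, reason: 'disabled' }));
```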
|
package/scripts/flow-morning.js
CHANGED
|
@@ -385,22 +385,31 @@ function collectBriefingData() {
|
|
|
385
385
|
// v2.23.0 — Workspace dispatch surfacing (manager mode only).
|
|
386
386
|
// If the user is working inside a workspace manager session, surface any
|
|
387
387
|
// overdue or restart-gap-lost dispatches so the morning briefing catches
|
|
388
|
-
// what the last manager turn would have caught. Fail-open.
|
|
388
|
+
// what the last manager turn would have caught. Fail-open; DEBUG-logged.
|
|
389
389
|
try {
|
|
390
390
|
if (process.env.WOGI_WORKSPACE_ROOT) {
|
|
391
391
|
const { buildOverdueContext } = require('./hooks/core/overdue-dispatches');
|
|
392
392
|
const ctx = buildOverdueContext();
|
|
393
393
|
if (ctx) briefing.workspaceOverdue = ctx;
|
|
394
394
|
}
|
|
395
|
-
} catch (
|
|
395
|
+
} catch (err) {
|
|
396
|
+
if (process.env.DEBUG) {
|
|
397
|
+
console.error(`[morning] Workspace overdue check failed (fail-open): ${err.message}`);
|
|
398
|
+
}
|
|
399
|
+
}
|
|
396
400
|
|
|
397
401
|
// v2.23.0 — Completion-claim honesty scan.
|
|
398
402
|
// Catches done-word-in-notes-while-status-partial and similar
|
|
399
403
|
// contradictions across ready.json (uses the honesty-infra from 2026-04-16).
|
|
404
|
+
// Fail-open; DEBUG-logged.
|
|
400
405
|
try {
|
|
401
406
|
const { checkCompletionClaimHonesty } = require('./flow-health');
|
|
402
407
|
briefing.honestyHits = checkCompletionClaimHonesty();
|
|
403
|
-
} catch (
|
|
408
|
+
} catch (err) {
|
|
409
|
+
if (process.env.DEBUG) {
|
|
410
|
+
console.error(`[morning] Honesty scan failed (fail-open): ${err.message}`);
|
|
411
|
+
}
|
|
412
|
+
}
|
|
404
413
|
|
|
405
414
|
// Generate suggested prompt if enabled
|
|
406
415
|
if (morningConfig.generatePrompt !== false) {
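Both catch blocks in this hunk follow the same fail-open pattern: swallow the error, but log it when `DEBUG` is set. If the pattern spreads further, it could be factored into a wrapper along these lines. `failOpen` is a hypothetical helper, not part of the package; the `env` and `log` parameters exist only so the behaviour can be exercised without touching the real environment.

```javascript
// Hypothetical wrapper: run fn, return fallback on error, log only under DEBUG.
function failOpen(label, fn, fallback = undefined, env = process.env, log = console.error) {
  try {
    return fn();
  } catch (err) {
    if (env.DEBUG) {
      const detail = err && err.message ? err.message : String(err);
      log(`[${label}] failed (fail-open): ${detail}`);
    }
    return fallback;
  }
}

// Usage: a failing check yields the fallback instead of aborting the briefing.
const hits = failOpen('morning', () => { throw new Error('scan unavailable'); }, []);
console.log(hits.length);
```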
|
package/scripts/flow-session-end.js
CHANGED
|
@@ -596,9 +596,18 @@ function writeWorkspaceSessionEndMessage() {
|
|
|
596
596
|
const workspaceRoot = process.env.WOGI_WORKSPACE_ROOT;
|
|
597
597
|
if (!workspaceRoot) return;
|
|
598
598
|
const repo = process.env.WOGI_REPO_NAME;
|
|
599
|
-
// Only
|
|
600
|
-
//
|
|
601
|
-
|
|
599
|
+
// Only emit this signal from EXPLICIT manager-mode sessions.
|
|
600
|
+
// v2.25.1 (M2 from Waves 1-3 review): tightened to require
|
|
601
|
+
// WOGI_REPO_NAME === 'manager' explicitly. Previously we let
|
|
602
|
+
// unset-repo sessions fall through, which could emit a spurious
|
|
603
|
+
// "manager session ended" broadcast from a mis-env'd worker shell.
|
|
604
|
+
// Workers use their own Stop-hook worker-stopped message.
|
|
605
|
+
if (repo !== 'manager') {
|
|
606
|
+
if (repo && process.env.DEBUG) {
|
|
607
|
+
console.error(`[session-end] Skipping workspace message — WOGI_REPO_NAME is '${repo}', not 'manager'`);
|
|
608
|
+
}
|
|
609
|
+
return;
|
|
610
|
+
}
|
|
602
611
|
|
|
603
612
|
try {
|
|
604
613
|
const messagesLib = path.resolve(__dirname, '..', 'lib', 'workspace-messages.js');
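The tightened guard reduces to a predicate over two environment variables. The sketch below isolates that decision so the cases from the comment (unset repo name, worker repo name, explicit manager) can be checked directly; `shouldEmitManagerSessionEnd` is a hypothetical name and the sample values are made up.

```javascript
// Hypothetical predicate: emit the "manager session ended" message only for
// sessions inside a workspace whose WOGI_REPO_NAME is exactly 'manager'.
function shouldEmitManagerSessionEnd(env) {
  if (!env.WOGI_WORKSPACE_ROOT) return false; // not a workspace session
  return env.WOGI_REPO_NAME === 'manager';    // unset or worker names are skipped
}

console.log(shouldEmitManagerSessionEnd({ WOGI_WORKSPACE_ROOT: '/ws', WOGI_REPO_NAME: 'manager' }));
console.log(shouldEmitManagerSessionEnd({ WOGI_WORKSPACE_ROOT: '/ws' }));
console.log(shouldEmitManagerSessionEnd({ WOGI_WORKSPACE_ROOT: '/ws', WOGI_REPO_NAME: 'api' }));
```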
|