wogiflow 2.29.5 → 2.29.7
- package/.claude/docs/claude-code-compatibility.md +2 -1
- package/.workflow/templates/claude-md.hbs +18 -34
- package/.workflow/templates/partials/methodology-rules.hbs +96 -105
- package/README.md +1 -1
- package/lib/wogi-claude +34 -3
- package/lib/wogi-claude-expect.exp +30 -5
- package/package.json +2 -2
- package/scripts/flow-defer-auth.js +103 -0
- package/scripts/flow-utils.js +52 -0
- package/scripts/hooks/core/deferral-classifier.js +129 -0
- package/scripts/hooks/core/deferral-gate.js +379 -0
- package/scripts/hooks/core/pre-tool-orchestrator.js +58 -0
- package/scripts/hooks/core/research-evidence-gate.js +11 -1
- package/scripts/hooks/core/research-required-classifier.js +205 -0
- package/scripts/hooks/core/research-required-gate.js +235 -0
- package/scripts/hooks/core/session-context.js +21 -0
- package/scripts/hooks/core/task-boundary-reset.js +132 -1
- package/scripts/hooks/entry/claude-code/stop.js +26 -0
- package/scripts/hooks/entry/claude-code/user-prompt-submit.js +39 -0
|
@@ -77,6 +77,7 @@ flow parallel check # See available parallel tasks
|
|
|
77
77
|
| 2.18.0+ | 2.1.108+ | ENABLE_PROMPT_CACHING_1H guidance, /recap awareness, /doctor MCP duplicate-scope mirror in `/wogi-health` |
|
|
78
78
|
| 2.27.0+ | 2.1.116+ | Sandbox dangerous-path safety on auto-allow, agent frontmatter hooks for `--agent`, `/resume` large-session speedup, MCP stdio concurrent startup |
|
|
79
79
|
| 2.27.0+ | 2.1.117+ | Native bfs/ugrep via Bash (hook audit documented), Opus 4.7 /context fix (estimator already percentage-based), Pro/Max effort default shift (advisory delta documented), agent frontmatter `mcpServers` for `--agent`, subagent model-mismatch malware-warning fix, managed-settings plugin marketplace enforcement |
|
|
80
|
+
| 2.29.6+ | 2.1.132+ | Statusline `context_window` token-count accuracy fix (release notes: was reporting cumulative session totals — may have affected `wogi-statusline-setup` percentage presets if percentage was derived from cumulative tokens), Bedrock/Vertex `ENABLE_PROMPT_CACHING_1H` 400-error fix (recommendation now safe on those providers), `CLAUDE_CODE_SESSION_ID` available in Bash subprocess env |
|
|
80
81
|
|
|
81
82
|
### Environment Variables (2.1.19+)
|
|
82
83
|
|
|
@@ -368,7 +369,7 @@ await cancelTask('wf-123', 'superseded', false);
|
|
|
368
369
|
|
|
369
370
|
### Features in 2.1.108+
|
|
370
371
|
|
|
371
|
-
- **`ENABLE_PROMPT_CACHING_1H` env var (RECOMMENDED for non-subscribers)**: Opts into **1-hour prompt-cache TTL** on **API key, Bedrock, Vertex, and Foundry** providers. Subscribers (Claude Pro, Max, Team, Enterprise via claude.ai OAuth) already get 1h TTL by default — this flag is a **no-op for them**. The complementary `FORCE_PROMPT_CACHING_5M` pins to 5min, and the older `ENABLE_PROMPT_CACHING_1H_BEDROCK` is deprecated but still honored. **Impact on WogiFlow (HIGH)**: WogiFlow sessions load a large, stable prefix every turn — CLAUDE.md (~300 lines), state files (`ready.json`, `decisions.md`, `app-map.md`), phase files, and pinned spec context. At the default 5min TTL, any pause longer than 5 minutes (user thinking, a long `flow` CLI run, a meeting mid-session) invalidates the cache and the next turn pays the full input-token cost again. At 1h TTL, the same prefix stays cached across those pauses, yielding **substantial token-cost reduction** on typical multi-hour WogiFlow work. **Action for API-key / Bedrock / Vertex / Foundry users**: `export ENABLE_PROMPT_CACHING_1H=1` in your shell profile. **Action for subscribers**: none (already enabled). **Risk**: none — if set on a subscriber account it is ignored; if set when not supported, it silently falls back.
|
|
372
|
+
- **`ENABLE_PROMPT_CACHING_1H` env var (RECOMMENDED for non-subscribers)**: Opts into **1-hour prompt-cache TTL** on **API key, Bedrock, Vertex, and Foundry** providers. Subscribers (Claude Pro, Max, Team, Enterprise via claude.ai OAuth) already get 1h TTL by default — this flag is a **no-op for them**. The complementary `FORCE_PROMPT_CACHING_5M` pins to 5min, and the older `ENABLE_PROMPT_CACHING_1H_BEDROCK` is deprecated but still honored. **Impact on WogiFlow (HIGH)**: WogiFlow sessions load a large, stable prefix every turn — CLAUDE.md (~300 lines), state files (`ready.json`, `decisions.md`, `app-map.md`), phase files, and pinned spec context. At the default 5min TTL, any pause longer than 5 minutes (user thinking, a long `flow` CLI run, a meeting mid-session) invalidates the cache and the next turn pays the full input-token cost again. At 1h TTL, the same prefix stays cached across those pauses, yielding **substantial token-cost reduction** on typical multi-hour WogiFlow work. **Action for API-key / Bedrock / Vertex / Foundry users**: `export ENABLE_PROMPT_CACHING_1H=1` in your shell profile. **Action for subscribers**: none (already enabled). **Risk**: none — if set on a subscriber account it is ignored; if set when not supported, it silently falls back. **Bedrock/Vertex caveat**: Some Claude Code versions before 2.1.132 returned 400 errors when this flag was set on Bedrock/Vertex (per the 2.1.132 release notes). Fixed in **2.1.132+** — Bedrock/Vertex users on older Claude Code should upgrade before setting the flag.
|
|
372
373
|
|
|
373
374
|
- **`/recap` command and session recap feature**: Provides context when returning to a session. Configurable in `/config` and manually invocable with `/recap`. For users with telemetry disabled (Bedrock/Vertex/Foundry/`DISABLE_TELEMETRY`), recap is still enabled by default; opt out via `/config` or `CLAUDE_CODE_ENABLE_AWAY_SUMMARY=0`. **Overlap with WogiFlow**: `/wogi-morning`, `/wogi-session-end`, and `/wogi-pre-compact` already provide durable recap via state files. `/recap` is ephemeral (summarizes the current session); WogiFlow's state survives session exit. Use both: `/recap` for intra-session context, `/wogi-morning` for cross-session pickup.
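The Bedrock/Vertex caveat on `ENABLE_PROMPT_CACHING_1H` above amounts to a version check before setting the flag. A minimal sketch, assuming hypothetical provider labels (`'bedrock'`, `'vertex'`, `'api-key'`) and a helper name that does not exist in WogiFlow or Claude Code:

```javascript
// Hypothetical helper for illustration; not part of WogiFlow or Claude Code.

// Compare dotted numeric versions; returns -1, 0, or 1.
function compareVersions(a, b) {
  const pa = a.split('.').map(Number);
  const pb = b.split('.').map(Number);
  const len = Math.max(pa.length, pb.length);
  for (let i = 0; i < len; i++) {
    const diff = (pa[i] || 0) - (pb[i] || 0);
    if (diff !== 0) return Math.sign(diff);
  }
  return 0;
}

// Provider labels are assumed names for this sketch only.
// Returns false when the documented 400-error risk may apply.
function isCaching1hFlagSafe(provider, claudeCodeVersion) {
  const affectedProviders = ['bedrock', 'vertex'];
  if (!affectedProviders.includes(provider)) return true;
  return compareVersions(claudeCodeVersion, '2.1.132') >= 0;
}
```

A `false` result for a Bedrock or Vertex user would mean "upgrade Claude Code first, then export the flag".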
|
|
374
375
|
|
|
@@ -148,53 +148,37 @@ When in doubt, route through `/wogi-start` which will classify correctly.
|
|
|
148
148
|
|
|
149
149
|
### Anti-Deferral Rule (MANDATORY — ZERO TOLERANCE)
|
|
150
150
|
|
|
151
|
-
**You MUST NEVER autonomously defer, skip, deprioritize, or drop items from the user's input.**
|
|
151
|
+
**You MUST NEVER autonomously defer, skip, deprioritize, or drop items from the user's input.** If the user provides N items, ALL N become tracked work items. No judgment calls about "important" vs. "enhancement" vs. "long-term."
|
|
152
152
|
|
|
153
|
-
|
|
153
|
+
**Deferral-specific traps** (in addition to master Anti-Rationalization Checklist above):
|
|
154
|
+
- "Items 6-9 are enhancements, I'll focus on fixes first" → WRONG. Create tasks for ALL items.
|
|
155
|
+
- "I already created the important ones" → WRONG. Important is not your call.
|
|
156
|
+
- "I'll defer these as lower priority" → WRONG. Suggest priority; every item still gets a task.
|
|
157
|
+
- "The ready queue would be too large" → WRONG. A large queue is correct; a filtered queue is data loss.
|
|
158
|
+
- "This one was labeled 'long-term'" → WRONG. The user decides when to execute, not you.
|
|
154
159
|
|
|
155
|
-
**
|
|
156
|
-
- "Items 6-9 are enhancements, I'll focus on the fixes first" → WRONG. Create tasks for ALL items.
|
|
157
|
-
- "This one was labeled 'long-term' by the team" → WRONG. Track it. The user decides when to execute, not you.
|
|
158
|
-
- "I'll defer these as lower priority" → WRONG. You may SUGGEST a priority order, but every item must be a tracked task.
|
|
159
|
-
- "The ready queue would be too large" → WRONG. A large queue is correct. A filtered queue is data loss.
|
|
160
|
-
- "I already created the important ones" → WRONG. Important is not your call. Create ALL of them.
|
|
160
|
+
**MAY**: suggest priority order (P0/P1/P2/P3); group related items into stories (every item appears as a criterion in ≥1 story); ask the user to confirm scope.
|
|
161
161
|
|
|
162
|
-
**
|
|
163
|
-
- Suggest a priority order (P0/P1/P2/P3) — but ALL items get tasks regardless of priority
|
|
164
|
-
- Group related items into stories — but every item must appear as a criterion in at least one story
|
|
165
|
-
- Ask the user to confirm scope — but do NOT preemptively filter
|
|
162
|
+
**MUST NEVER**: silently drop items based on AI judgment; create tasks for a subset without explicit user approval to defer the rest; use words like "deferred"/"skipped"/"not created" for user-provided items.
|
|
166
163
|
|
|
167
|
-
|
|
168
|
-
- Silently drop items because you judged them as "enhancements" or "nice-to-haves"
|
|
169
|
-
- Create tasks for only a subset of items without explicit user approval to defer the rest
|
|
170
|
-
- Use words like "deferred", "skipped", or "not created" for items the user provided
|
|
164
|
+
Applies to `/wogi-start`, `/wogi-story`, `/wogi-epics`, `/wogi-extract-review`, and any command converting user input into tracked work.
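The "every item appears as a criterion in ≥1 story" condition can be read as a coverage check. A rough sketch, assuming hypothetical shapes for items (`{ id }`) and stories (`{ criteria: [{ itemId }] }`); the real data model is not shown in this diff:

```javascript
// Hypothetical shapes: items are { id }, stories are { criteria: [{ itemId }] }.
// Returns the user-provided items that no story criterion covers.
function findUnmappedItems(items, stories) {
  const covered = new Set();
  for (const story of stories) {
    for (const criterion of story.criteria) {
      covered.add(criterion.itemId);
    }
  }
  return items.filter((item) => !covered.has(item.id));
}
```

Under the rule above, a non-empty result would mean the grouping silently dropped something and must be corrected before tasks are created.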
|
|
171
165
|
|
|
172
|
-
|
|
166
|
+
### Mid-Execution Anti-Deferral (AFTER TASKS ARE CREATED)
|
|
173
167
|
|
|
174
|
-
|
|
168
|
+
**Reordering is permitted. Deferring is not.** Once work is tracked, you MUST NOT propose to skip, postpone, drop, or "deprioritize to later" — regardless of risk, cost, or token-weight. Only sequence changes. "Revisit later" and "deprioritize" are soft-defer euphemisms.
|
|
175
169
|
|
|
176
|
-
|
|
170
|
+
When genuinely unsure work is still needed: ask explicitly — "Do you still want wf-XXXX to ship this epic, or drop it?" User decides.
|
|
177
171
|
|
|
178
|
-
|
|
179
|
-
|
|
180
|
-
**MAY do after tasks are tracked**: propose sequence/parallelization/prerequisites; flag risks without using them to drop scope.
|
|
181
|
-
|
|
182
|
-
**MUST NEVER do**: propose to "defer", skip based on AI judgment, present a plan that silently omits tracked work.
|
|
183
|
-
|
|
184
|
-
**When genuinely unsure work is still needed**: ask explicitly — "Do you still want wf-XXXX to ship this epic, or drop it?" User decides, not you.
|
|
185
|
-
|
|
186
|
-
### Review-Findings Anti-Deferral (MANDATORY — INCIDENT-DRIVEN)
|
|
172
|
+
### Review-Findings Anti-Deferral
|
|
187
173
|
|
|
188
174
|
Extends Mid-Execution Anti-Deferral to `/wogi-review`, `/wogi-audit`, `/wogi-triage` findings. When the user says "fix all findings" / "option 1" / any variant meaning "address everything":
|
|
189
175
|
|
|
190
176
|
1. Ship a fix for every finding at evidence tier ≥ 1, regardless of effort estimate.
|
|
191
|
-
2. Never silently convert a finding to "deferred" in commit/release notes without the user
|
|
192
|
-
3. If
|
|
193
|
-
4. Never list a finding in release notes without actually fixing it.
|
|
194
|
-
|
|
195
|
-
Transparency ≠ permission. "Low-risk can wait" and "restructure warrants separate release" are AI judgment calls — they're the user's to make. Apply the master Anti-Rationalization Checklist above.
|
|
177
|
+
2. Never silently convert a finding to "deferred" in commit/release notes without the user saying "defer X."
|
|
178
|
+
3. If too large for the current release → STOP and ask: "Finding X requires ~Y min. Ship / split / defer?"
|
|
179
|
+
4. Never list a finding in release notes without actually fixing it.
|
|
196
180
|
|
|
197
|
-
|
|
181
|
+
"Low-risk can wait" and "restructure warrants separate release" are AI judgment calls — the user's to make.
|
|
198
182
|
|
|
199
183
|
### Task ID Format (MANDATORY)
|
|
200
184
|
|
|
@@ -1,24 +1,20 @@
|
|
|
1
1
|
## WogiFlow Methodology Rules
|
|
2
2
|
|
|
3
|
-
|
|
3
|
+
Rules below are enforced by shipped hooks; the prose is so Claude understands the contract. Apply the master Anti-Rationalization Checklist (top of CLAUDE.md) to any rule that doesn't list its own.
|
|
4
4
|
|
|
5
5
|
---
|
|
6
6
|
|
|
7
7
|
### Research Before Propose
|
|
8
8
|
|
|
9
|
-
Before proposing a fix, plan, or spec, read 2+ files from `.workflow/state/`, `.workflow/changes/`, `.workflow/specs/`, or `.workflow/epics
|
|
9
|
+
Before proposing a fix, plan, or spec, read 2+ files from `.workflow/state/`, `.workflow/changes/`, `.workflow/specs/`, or `.workflow/epics/`. Clarifying questions are a valid escape; proposing without evidence is not.
|
|
10
10
|
|
|
11
|
-
|
|
12
|
-
|
|
13
|
-
Enforced by: `research-evidence-gate.js` (blocks `→ spec_review` / `→ coding` transitions and spec-file writes until threshold met; cleared at task start, session-end, and post-compact). Config: `hooks.rules.researchEvidenceGate.{enabled,minEvidence}` (defaults `true`, `2`).
|
|
11
|
+
Enforced by: `research-evidence-gate.js` (blocks `→ spec_review` / `→ coding` and spec-file writes until threshold met). Config: `hooks.rules.researchEvidenceGate.{enabled,minEvidence}` (defaults `true`, `2`).
|
|
14
12
|
|
|
15
13
|
---
|
|
16
14
|
|
|
17
15
|
### Completion-Claim Honesty Scan
|
|
18
16
|
|
|
19
|
-
At session-end and `flow health`, `ready.json` entries are scanned (surfaced, not blocked) for
|
|
20
|
-
- **Status-mismatch** — free-text says "done/completed/shipped" while `status` is partial/blocked/failed.
|
|
21
|
-
- **Negation-vs-evidence** — free-text says "no outages / 0 regressions" while `hotfixes[]` / `incidents[]` / `regressions[]` is non-empty.
|
|
17
|
+
At session-end and `flow health`, `ready.json` entries are scanned (surfaced, not blocked) for status-mismatch (free-text says "done" while `status` is partial/blocked) and negation-vs-evidence (free-text says "no outages" while `hotfixes[]`/`incidents[]`/`regressions[]` is non-empty).
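The two contradiction checks could look roughly like the sketch below. It is not the shipped `scanForClaimContradictions()`; the free-text field name `notes` and the exact phrase patterns are assumptions, while `status`, `hotfixes`, `incidents`, and `regressions` come from the description above.

```javascript
// Illustrative only; the free-text field name `notes` is an assumption.
const DONE_CLAIM = /\b(?:done|completed|shipped)\b/i;
const NEGATION_CLAIM = /\b(?:no outages|0 regressions)\b/i;
const INCOMPLETE_STATUSES = ['partial', 'blocked', 'failed'];
const EVIDENCE_FIELDS = ['hotfixes', 'incidents', 'regressions'];

// Returns a list of contradiction labels for one ready.json entry.
function scanEntry(entry) {
  const text = entry.notes || '';
  const findings = [];
  if (DONE_CLAIM.test(text) && INCOMPLETE_STATUSES.includes(entry.status)) {
    findings.push('status-mismatch');
  }
  const hasEvidence = EVIDENCE_FIELDS.some(
    (field) => Array.isArray(entry[field]) && entry[field].length > 0
  );
  if (NEGATION_CLAIM.test(text) && hasEvidence) {
    findings.push('negation-vs-evidence');
  }
  return findings;
}
```

Because the scan surfaces rather than blocks, a caller would presumably print the labels and continue.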
|
|
22
18
|
|
|
23
19
|
Enforced by: `flow-completion-truth-gate.js → scanForClaimContradictions()`.
|
|
24
20
|
|
|
@@ -26,7 +22,7 @@ Enforced by: `flow-completion-truth-gate.js → scanForClaimContradictions()`.
|
|
|
26
22
|
|
|
27
23
|
### Merge-Plan Artifact Gate
|
|
28
24
|
|
|
29
|
-
`/wogi-finalize` requires `.workflow/scratch/merge-plan.md` for merges >5 commits or any cross-repo merge. Every commit in `git log <base>..<branch>` must map to `port | adapt | skip-style | superseded | skip-with-reason`; SHA-line count
|
|
25
|
+
`/wogi-finalize` requires `.workflow/scratch/merge-plan.md` for merges >5 commits or any cross-repo merge. Every commit in `git log <base>..<branch>` must map to `port | adapt | skip-style | superseded | skip-with-reason`; SHA-line count = commit count. ≥20% restructure-pattern files biases affected commits toward `adapt`.
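The mapping and count requirements can be sketched as a validator. The line format assumed here (`<sha> <disposition> ...`) is hypothetical, since the diff does not show the actual layout of `merge-plan.md`; only the five disposition names are taken from the text.

```javascript
// Assumed plan format: one "<sha> <disposition> ..." line per commit.
const DISPOSITIONS = new Set([
  'port', 'adapt', 'skip-style', 'superseded', 'skip-with-reason',
]);

// commitShas: SHAs from `git log <base>..<branch>`, in the same abbreviation as the plan.
function validateMergePlan(planText, commitShas) {
  const mapped = new Map();
  let shaLineCount = 0;
  for (const line of planText.split('\n')) {
    const match = line.trim().match(/^([0-9a-f]{7,40})\s+(\S+)/);
    if (!match) continue; // headings, blank lines, prose
    shaLineCount += 1;
    mapped.set(match[1], match[2]);
  }
  const errors = [];
  for (const sha of commitShas) {
    const disposition = mapped.get(sha);
    if (disposition === undefined) {
      errors.push(`unmapped commit ${sha}`);
    } else if (!DISPOSITIONS.has(disposition)) {
      errors.push(`invalid disposition for ${sha}: ${disposition}`);
    }
  }
  if (shaLineCount !== commitShas.length) {
    errors.push(`SHA-line count ${shaLineCount} != commit count ${commitShas.length}`);
  }
  return errors;
}
```

An empty array means every commit is mapped and the counts agree.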
|
|
30
26
|
|
|
31
27
|
Enforced by: `flow-structure-sensor.js`, `.claude/commands/wogi-finalize.md` Step 2.5.
|
|
32
28
|
|
|
@@ -34,58 +30,46 @@ Enforced by: `flow-structure-sensor.js`, `.claude/commands/wogi-finalize.md` Ste
|
|
|
34
30
|
|
|
35
31
|
### Story Creation Quality Gates
|
|
36
32
|
|
|
37
|
-
`/wogi-story` runs 5 P0 spec-quality gates at creation time
|
|
33
|
+
`/wogi-story` runs 5 P0 spec-quality gates at creation time:
|
|
38
34
|
|
|
39
|
-
1. **Long Input** — ≥40 lines or ≥5
|
|
40
|
-
2. **Item Reconciliation** — ≥3 items → enumerated Item Manifest; unmapped items
|
|
41
|
-
3. **Consumer Impact Analysis** — refactoring keywords trigger `git grep
|
|
42
|
-
4. **Scope-Confidence Audit** — assumption patterns
|
|
43
|
-
5. **Intent Bootstrap Coordination** — schedules IGR
|
|
35
|
+
1. **Long Input** — ≥40 lines or ≥5 items → route to `/wogi-extract-review`.
|
|
36
|
+
2. **Item Reconciliation** — ≥3 items → enumerated Item Manifest; unmapped items warn.
|
|
37
|
+
3. **Consumer Impact Analysis** — refactoring keywords trigger `git grep`; ≥5 breaking → phased migration.
|
|
38
|
+
4. **Scope-Confidence Audit** — assumption patterns verified against codebase; findings → Pending Clarifications.
|
|
39
|
+
5. **Intent Bootstrap Coordination** — schedules IGR bootstrap once.
|
|
44
40
|
|
|
45
|
-
All
|
|
41
|
+
All fail-open. Bypass for tests via `--skip-gates`. Config: `storyFlow.*`.
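The thresholds for gates 1 and 2 can be sketched as below. How WogiFlow actually counts "items" is not shown in this diff; treating bullet and numbered lines as items is an assumption, as are all function names.

```javascript
// Thresholds from the gate descriptions; everything else is a sketch.
const LONG_INPUT_MIN_LINES = 40;
const LONG_INPUT_MIN_ITEMS = 5;
const MANIFEST_MIN_ITEMS = 3;

// Assumption: an item is a line that starts with a bullet or a number.
function countItems(input) {
  return input
    .split('\n')
    .filter((line) => /^\s*(?:[-*]|\d+[.)])\s+/.test(line)).length;
}

function evaluateIntakeGates(input) {
  try {
    const lineCount = input.split('\n').length;
    const itemCount = countItems(input);
    return {
      routeToExtractReview:
        lineCount >= LONG_INPUT_MIN_LINES || itemCount >= LONG_INPUT_MIN_ITEMS,
      needsItemManifest: itemCount >= MANIFEST_MIN_ITEMS,
    };
  } catch (err) {
    // Fail-open: a gate error must not block story creation.
    return { routeToExtractReview: false, needsItemManifest: false };
  }
}
```

The `try`/`catch` mirrors the fail-open behaviour: a broken gate degrades to "no gate" rather than an error.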
|
|
46
42
|
|
|
47
43
|
---
|
|
48
44
|
|
|
49
45
|
### Workspace Worker Contract
|
|
50
46
|
|
|
51
|
-
*
|
|
52
|
-
|
|
53
|
-
**Tool-First Turn**: Every turn after `UserPromptSubmit` must contain ≥1 tool call. In strict mode (default), the first assistant content block must be `tool_use`, not text. Pure-text responses are invisible to the user (they only see the manager terminal) and disqualify the worker from the three-state contract below.
|
|
54
|
-
|
|
55
|
-
**Three-State End-of-Turn**: Exactly one of:
|
|
56
|
-
1. **ACTION** — start next pre-approved channel dispatch via `/wogi-start <nextId>`.
|
|
57
|
-
2. **ESCALATION** — channel-dispatch `## QUESTION: ...` to the manager.
|
|
58
|
-
3. **IDLE** — zero pending dispatches AND zero in-progress tasks.
|
|
59
|
-
|
|
60
|
-
Hedging phrases ("awaiting your signal", "let me know", "standing by", "should I continue") are mechanically forbidden — visibility is NOT a substitute for action; the manager already pre-approved the dispatch by queuing it.
|
|
47
|
+
*Workspace worker mode only (`WOGI_WORKSPACE_ROOT` set + `WOGI_REPO_NAME !== 'manager'`). Skip in solo sessions.*
|
|
61
48
|
|
|
62
|
-
**
|
|
49
|
+
- **Tool-First Turn**: every turn after `UserPromptSubmit` must contain ≥1 tool call. In strict mode (default), the first content block must be `tool_use`. Pure-text responses are invisible to the user.
|
|
50
|
+
- **Three-State End-of-Turn**: exactly one of ACTION (`/wogi-start <nextId>`), ESCALATION (channel-dispatch `## QUESTION:`), or IDLE.
|
|
51
|
+
- **Hedging forbidden**: "awaiting your signal", "let me know", "standing by", "should I continue".
|
|
52
|
+
- **No direct user prompts**: `AskUserQuestion` is blocked; questions go through channel dispatch.
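The tool-first and hedging rules above could be checked roughly as follows. This is not the code in `worker-tool-first-gate.js`; the content-block shape (`{ type, text }`) and the violation labels are assumptions made for the sketch.

```javascript
// Phrases taken from the hedging rule; block shape is assumed.
const HEDGING_PHRASES = [
  'awaiting your signal',
  'let me know',
  'standing by',
  'should i continue',
];

// contentBlocks: assistant turn as [{ type: 'text', text }, { type: 'tool_use' }, ...]
function checkWorkerTurn(contentBlocks, { strict = true } = {}) {
  const violations = [];
  const hasToolCall = contentBlocks.some((block) => block.type === 'tool_use');
  if (!hasToolCall) {
    violations.push('no-tool-call');
  } else if (strict && contentBlocks[0].type !== 'tool_use') {
    violations.push('first-block-not-tool-use');
  }
  const text = contentBlocks
    .filter((block) => block.type === 'text')
    .map((block) => (block.text || '').toLowerCase())
    .join('\n');
  for (const phrase of HEDGING_PHRASES) {
    if (text.includes(phrase)) violations.push(`hedging: ${phrase}`);
  }
  return violations;
}
```

Plain substring matching is used for brevity; a real gate would likely need to avoid false positives inside quoted text.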
|
|
63
53
|
|
|
64
|
-
|
|
65
|
-
|
|
66
|
-
Enforced by: `worker-tool-first-gate.js` (G1/G4/Gap B), `worker-boundary-gate.js`, `flow-worker-question-classifier.js`. Config: `workspace.toolFirstTurnGate.{enabled,strict}`, `workspace.blockAskUserQuestionInWorker`, `workspace.aiWorkerQuestionClassifier.*`, `workspace.autoPickupChannelDispatches`.
|
|
54
|
+
Enforced by: `worker-tool-first-gate.js` (G1/G4/Gap B), `worker-boundary-gate.js`, `flow-worker-question-classifier.js`. Config: `workspace.toolFirstTurnGate.{enabled,strict}`, `workspace.blockAskUserQuestionInWorker`, `workspace.aiWorkerQuestionClassifier.*`. Long-form: `.claude/rules/_internal/worker-tool-first-turn.md`.
|
|
67
55
|
|
|
68
56
|
---
|
|
69
57
|
|
|
70
58
|
### Workspace Manager Silent-Halt Detection
|
|
71
59
|
|
|
72
|
-
*
|
|
73
|
-
|
|
74
|
-
Every manager→worker dispatch is tracked. A pending dispatch past its `expectedDeadline` with no `task-complete` or `worker-stopped` message = silent death, surfaced on the manager's next turn via `UserPromptSubmit` `additionalContext`. Default `expectedDurationMs = 30min`; callers override per-dispatch for long tasks.
|
|
60
|
+
*Workspace manager mode only.*
|
|
75
61
|
|
|
76
|
-
Three terminal states:
|
|
62
|
+
Every manager→worker dispatch is tracked. A pending dispatch past `expectedDeadline` with no `task-complete`/`worker-stopped` = silent halt, surfaced on next turn via `UserPromptSubmit` `additionalContext`. Default `expectedDurationMs = 30min`. Three terminal states: Completed / Graceful-stop / Silent-halt.
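A sketch of how a tracked dispatch might be classified. The record shape (`dispatchedAt`, `expectedDeadline`, `messages`) is hypothetical, and mapping `task-complete` to Completed and `worker-stopped` to Graceful-stop is inferred from the names rather than confirmed by the diff.

```javascript
// Default from the text above: expectedDurationMs = 30min.
const DEFAULT_EXPECTED_DURATION_MS = 30 * 60 * 1000;

// dispatch: { dispatchedAt, expectedDeadline?, messages: string[] } (assumed shape)
function classifyDispatch(dispatch, nowMs) {
  if (dispatch.messages.includes('task-complete')) return 'completed';
  if (dispatch.messages.includes('worker-stopped')) return 'graceful-stop';
  const deadline =
    dispatch.expectedDeadline ?? dispatch.dispatchedAt + DEFAULT_EXPECTED_DURATION_MS;
  // Still pending: past the deadline with no terminal message is a silent halt.
  return nowMs > deadline ? 'silent-halt' : 'pending';
}
```

Only `silent-halt` results would need surfacing to the manager on its next turn.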
|
|
77
63
|
|
|
78
|
-
Enforced by: `lib/workspace-dispatch-tracking.js`, `.workspace/state/dispatched-tasks.json` (ring buffer, last 100
|
|
64
|
+
Enforced by: `lib/workspace-dispatch-tracking.js`, `.workspace/state/dispatched-tasks.json` (ring buffer, last 100).
|
|
79
65
|
|
|
80
66
|
---
|
|
81
67
|
|
|
82
68
|
### Main-Mode Question Classifier
|
|
83
69
|
|
|
84
|
-
*
|
|
85
|
-
|
|
86
|
-
Before the Stop hook fires SIGTERM for task-boundary restart, a Haiku classifier inspects the final assistant message. If the AI ended the turn with an open user-facing question AND `pending-question.json` is absent, the classifier writes the marker and defers the restart — the user's reply then lands in the same session context. Fail-open throughout.
|
|
70
|
+
*Solo sessions with `taskBoundaryReset.enabled: true`.*
|
|
87
71
|
|
|
88
|
-
|
|
72
|
+
Before Stop hook fires SIGTERM, a Haiku classifier inspects the final assistant message. Open user-facing question + no `pending-question.json` → write marker, defer restart. Prefer explicit `flow ask "<question>"` (writes marker directly, short-circuits the classifier). Fail-open throughout.
|
|
89
73
|
|
|
90
74
|
Enforced by: `task-boundary-reset.js → consumeAndTriggerRestart()`. Config: `mainModeQuestionClassifier.{enabled,minConfidence,model}`.
|
|
91
75
|
|
|
@@ -93,123 +77,130 @@ Enforced by: `task-boundary-reset.js → consumeAndTriggerRestart()`. Config: `m
|
|
|
93
77
|
|
|
94
78
|
### Main-Mode Auto-Pickup After Clean Restart
|
|
95
79
|
|
|
96
|
-
*
|
|
80
|
+
*Solo sessions with `taskBoundaryReset.enabled: true` AND `autoPickupNextTask: true` (default).*
|
|
97
81
|
|
|
98
|
-
After a task-boundary restart
|
|
82
|
+
After a clean-completion task-boundary restart, SessionStart context injects `AUTO-PICKUP MODE ACTIVE` with the next ready task ID. First user message → invoke `Skill(skill="wogi-start", args="<nextReadyId>")` immediately, regardless of message content.
|
|
99
83
|
|
|
100
|
-
|
|
84
|
+
Precedence: `pending-question.json` wins. Skip conditions (any disables): pending-question exists, ready empty, autoPickup off, marker absent.
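The precedence and skip conditions reduce to a short guard chain. A minimal sketch, assuming a hypothetical `state` object whose field names are invented here for illustration:

```javascript
// state fields are hypothetical names for the four skip conditions.
// Returns the task ID to pick up, or null when auto-pickup must not run.
function nextAutoPickupTask(state) {
  if (state.pendingQuestionExists) return null; // pending-question.json wins
  if (!state.autoPickupNextTask) return null; // autoPickup off
  if (!state.cleanCompletionMarkerExists) return null; // marker absent
  if (state.readyTaskIds.length === 0) return null; // ready empty
  return state.readyTaskIds[0];
}
```

Any single skip condition is enough to disable pickup, so the order of the guards does not change the result.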
|
|
101
85
|
|
|
102
|
-
|
|
103
|
-
|
|
104
|
-
Enforced by: `task-boundary-reset.js → writeCleanCompletionMarker()` + `session-context.js → formatContextForInjection()`. Marker: `.workflow/state/task-boundary-clean-completion.json` (single-use).
|
|
86
|
+
Enforced by: `task-boundary-reset.js → writeCleanCompletionMarker()` + `session-context.js`. Marker: `.workflow/state/task-boundary-clean-completion.json` (single-use).
|
|
105
87
|
|
|
106
88
|
---
|
|
107
89
|
|
|
108
|
-
### Code Quality Patterns
|
|
90
|
+
### Code Quality Patterns
|
|
109
91
|
|
|
110
|
-
1. **Single
|
|
111
|
-
2. **Named
|
|
92
|
+
1. **Single source of truth for constants** — import from one canonical location.
|
|
93
|
+
2. **Named constants for magic numbers** — define thresholds as named constants; don't inline literals.
|
|
112
94
|
|
|
113
95
|
---
|
|
114
96
|
|
|
115
97
|
### Regression Discipline
|
|
116
98
|
|
|
117
|
-
Typecheck/lint/build
|
|
99
|
+
Typecheck/lint/build catches code errors, not behavior drift. For critical user-facing flows (login, submit, approve, delete, invite):
|
|
118
100
|
|
|
119
|
-
1.
|
|
120
|
-
2.
|
|
121
|
-
3.
|
|
122
|
-
4.
|
|
101
|
+
1. Executable scripts at `regression-suite/<flow>.<ext>`, not test-plan documents.
|
|
102
|
+
2. Living feature inventory: `Feature | Last Verified | Commit | Regression Script | Known Issues`.
|
|
103
|
+
3. Change-touch rule: task modifying a file mapped to a regression script must pass that script before close.
|
|
104
|
+
4. Audit-seeded inventory via `/wogi-audit`, then human-reviewed.
|
|
123
105
|
|
|
124
|
-
|
|
106
|
+
"Confident my fix won't break it" is not evidence.
|
|
125
107
|
|
|
126
108
|
---
|
|
127
109
|
|
|
128
110
|
### Memory-First Clarification
|
|
129
111
|
|
|
130
|
-
Before asking
|
|
131
|
-
|
|
132
|
-
When you must ask, cite what you checked: *"I read domain-model.md §Roles; it says X — does this apply to Y too?"* — not *"what's Y?"*
|
|
133
|
-
|
|
134
|
-
If artifacts don't exist yet, run `node scripts/flow-intent-bootstrap.js bootstrap` (or trigger via `/wogi-start` on any IGR-enabled task). A project without `domain-model.md` is a project where every domain question will be re-asked every session.
|
|
112
|
+
Before asking a product-domain question, check `.workflow/state/{product,domain-model,user-journeys,glossary}.md`. When you must ask, cite what you read: *"I read domain-model.md §Roles; it says X — does this apply to Y too?"* not *"what's Y?"*. If artifacts don't exist, run `node scripts/flow-intent-bootstrap.js bootstrap`.
|
|
135
113
|
|
|
136
114
|
---
|
|
137
115
|
|
|
138
116
|
### Source Fidelity Rule (Verbatim Source Preservation)
|
|
139
117
|
|
|
140
|
-
When a long-form user request becomes a spec
|
|
118
|
+
When a long-form user request becomes a spec or channel-dispatch, the **verbatim source MUST be preserved alongside the structured derivation**. The lossy step is at the spec-authoring layer (manager summarizing user input); downstream actors then build the summary, missing items the user named.
|
|
141
119
|
|
|
142
|
-
|
|
120
|
+
Mandatory structure for any spec/dispatch derived from a long user prompt (>40 lines OR ≥5 items):
|
|
143
121
|
|
|
144
|
-
|
|
122
|
+
1. **`## Original Request (verbatim)`** — user's prompt unmodified, top of spec body.
|
|
123
|
+
2. **`## Item Manifest`** — enumerated list reconciling every source item to a specific AC OR an explicit `defer-with-reason: <user-cited reason>`. AI-judged "low priority" is not a valid reason.
|
|
124
|
+
3. **Channel-dispatch links the spec, not summarizes it** — manager-to-worker messages MUST include verbatim source OR a path to a saved spec containing it. Bare "summary contracts" are forbidden.
|
|
145
125
|
|
|
146
|
-
|
|
126
|
+
Enforced by: Logic Constitution v3 sub-principle 11.6. Adversary blocks specs missing the block when source qualifies. Verifier: `node scripts/flow-source-fidelity.js check <spec-file>`. Worker fallback: `scripts/hooks/core/long-input-enforcement.js`.
|
|
147
127
|
|
|
148
|
-
|
|
149
|
-
|
|
150
|
-
|
|
128
|
+
---
|
|
129
|
+
|
|
130
|
+
### Cross-Story Integration Tier-3 Rule
|
|
151
131
|
|
|
152
|
-
|
|
132
|
+
When Story B layers on infrastructure shipped by Story A, Story B's IGR pass MUST treat that infrastructure as an audited dependency. Within-module unit tests don't verify Story A's contract holds for Story B's usage.
|
|
153
133
|
|
|
154
|
-
|
|
134
|
+
Mandatory for layering stories:
|
|
155
135
|
|
|
156
|
-
**
|
|
157
|
-
|
|
158
|
-
|
|
159
|
-
-
|
|
160
|
-
- *"This is just an internal manager message; the user won't see it"* → WRONG. That's exactly when the lossy step happens; verbatim preservation is more important here, not less.
|
|
161
|
-
- *"The long-input gate already extracted the items"* → WRONG IF you don't pin its output as canonical and reconcile every spec against it.
|
|
136
|
+
1. **Architect names upstream dependencies** — "Dependencies" section listing prior stories/commits + the specific contract relied on (interface, file format, transport, invariant). Quote the contract.
|
|
137
|
+
2. **Adversary challenges the dependency** — "What if Story A's invariant doesn't hold? What evidence proves the contract is intact for THIS usage?"
|
|
138
|
+
3. **At least one Tier-3 integration test** exercises the chain end-to-end. Mark `// regression-tier3`.
|
|
139
|
+
4. **Pre-release gate** verifies stacked coverage. Missing Tier-3 + stacked stories → block release.
|
|
162
140
|
|
|
163
|
-
|
|
141
|
+
Apply: `git log --oneline <prior-N-commits>` to identify dependencies; for each, write the contract; `grep -r "<interface>"` to verify HEAD; write the Tier-3 test BEFORE Story B's code.
|
|
142
|
+
|
|
143
|
+
Enforced by: Logic Constitution v3 sub-principle 11.5. Pre-release gate consumes this signal before tagging.
|
|
164
144
|
|
|
165
145
|
---
|
|
166
146
|
|
|
167
|
-
###
|
|
147
|
+
### Autonomous Walk-Away Mode
|
|
148
|
+
|
|
149
|
+
User says "go until you finish" / "autonomous mode" / "run this autonomously" / "don't bother me, just do it" → flag activates, AI runs without interruption. While active:
|
|
150
|
+
|
|
151
|
+
- **productBehavior / ux** → append to `.workflow/state/question-queue.json` (do NOT ask). Render in end-of-run summary.
|
|
152
|
+
- **engineering / naming / implementation** → decide autonomously, report in summary.
|
|
153
|
+
- **infrastructure / performance** → decide autonomously, report after.
|
|
154
|
+
- **security** → auto-fix-report-after.
|
|
155
|
+
- **low-confidence technical decisions** → self-adversarial challenge to ≥90% confidence; queue if cap hit. Counter shared with IGR Architect-Adversary loop (default cap 30, `autonomousMode.maxAdversaryInvocations`).
|
|
156
|
+
- **Blocking errors** → fix autonomously; surface only if fundamentally un-fixable.
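The per-category routing above can be pictured as a lookup table. This sketch is not `flow-decision-authority.js`: the route labels other than `queue-for-review`, the `ask-user` result outside autonomous mode, and the default for unknown categories are all assumptions.

```javascript
// Category names follow the bullets above; route labels are illustrative.
const AUTONOMOUS_ROUTES = new Map([
  ['productBehavior', 'queue-for-review'],
  ['ux', 'queue-for-review'],
  ['engineering', 'decide-and-report'],
  ['naming', 'decide-and-report'],
  ['implementation', 'decide-and-report'],
  ['infrastructure', 'decide-and-report'],
  ['performance', 'decide-and-report'],
  ['security', 'auto-fix-report-after'],
]);

function routeDecision(category, { autonomous }) {
  // Assumption: outside autonomous mode the question goes to the user.
  if (!autonomous) return 'ask-user';
  // Assumption: unknown categories are queued rather than decided silently.
  return AUTONOMOUS_ROUTES.get(category) ?? 'queue-for-review';
}
```

Queuing unknown categories is the conservative choice in this sketch, since a queued question still reaches the user in the end-of-run summary.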
|
|
168
157
|
|
|
169
|
-
|
|
158
|
+
Persistence: flag in `session-state.json`, survives task-boundary SIGTERM via SessionStart re-hydration. Staleness threshold (`autonomousMode.stalenessThresholdMs`, default 1h) — stale flags don't auto-resume.
|
|
170
159
|
|
|
171
|
-
|
|
160
|
+
Anti-hedging while active: "let me know if", "should I continue", "awaiting your signal", "standing by", "would you like me to" are forbidden.
|
|
172
161
|
|
|
173
|
-
|
|
162
|
+
Exit: ready drains, user types "stop"/"pause", or fatal error. On exit, render completion summary (`.workflow/state/autonomous-run-summary-<runId>.json`) and clear flag.
|
|
174
163
|
|
|
175
|
-
|
|
164
|
+
Enforced by: `flow-autonomous-detector.js`, `flow-question-queue.js`, `flow-decision-authority.js` (autonomous param + `queue-for-review` + `adversary-loop` buckets), `flow-completion-summary.js`, SessionStart context in `session-context.js`.
|
|
176
165
|
|
|
177
|
-
|
|
166
|
+
---
|
|
167
|
+
|
|
168
|
+
### Mechanical Deferral Authorization Gate (wf-f9912af6)
|
|
169
|
+
|
|
170
|
+
The Review-Findings Anti-Deferral rule is enforced mechanically. The PreToolUse hook intercepts every Write/Edit/Bash that targets `.workflow/state/last-review.json` or `last-audit.json` and BLOCKS the write when:
|
|
178
171
|
|
|
179
|
-
|
|
172
|
+
1. New content introduces a finding with `status` matching `/^deferred(?:[-_].*)?$|^wont-?fix$|^skipped$/i`, AND
|
|
173
|
+
2. No valid auth marker at `.workflow/state/deferral-authorization.json`, AND
|
|
174
|
+
3. `no-defer-pin.json` is not active.
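Condition 1 can be exercised on its own. The regex below is copied from the rule text; the finding shape (`{ id, status }`) and the helper name are assumptions, and the sketch deliberately leaves out the auth-marker and pin checks.

```javascript
// Regex copied from condition 1; finding shape { id, status } is assumed.
const DEFERRAL_STATUS = /^deferred(?:[-_].*)?$|^wont-?fix$|^skipped$/i;

function isDeferralStatus(status) {
  return DEFERRAL_STATUS.test(status);
}

// Returns findings whose status is a deferral in the new content
// but was not already a deferral in the old content.
function introducedDeferrals(oldFindings, newFindings) {
  const alreadyDeferred = new Set(
    oldFindings.filter((f) => isDeferralStatus(f.status)).map((f) => f.id)
  );
  return newFindings.filter(
    (f) => isDeferralStatus(f.status) && !alreadyDeferred.has(f.id)
  );
}
```

Note that `dismissed` does not match this status pattern; it appears only in the separate Bash-command keyword list.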
|
|
180
175
|
|
|
181
|
-
**
|
|
176
|
+
**Authorization sources**:
|
|
177
|
+
- **User-prompt classifier** — regex-detects defer phrases ("defer X", "fix critical only", "ship as-is", "option 2/4"). Auth TTL 10min.
|
|
178
|
+
- **Explicit CLI** — `node scripts/flow-defer-auth.js grant --scope=all --reason="<verbatim user phrase>"`.
|
|
182
179
|
|
|
183
|
-
**
|
|
184
|
-
- *"The upstream story has its own tests"* → WRONG. Their tests pin THEIR contract. Your Tier-3 test pins YOUR usage of their contract.
|
|
185
|
-
- *"It's expensive to set up an integration test"* → WRONG. The 2026-04-26 incident cost a v2.29.1 hot-fix release. Set up time amortizes; regression cost compounds.
|
|
186
|
-
- *"Self-IGR is enough; we don't need the actual adversary subagent"* → WRONG. Self-IGR pattern-matches on the same model that wrote the plan; the cross-story dependency is exactly the blind spot a different-model adversary catches.
|
|
180
|
+
**Negative intent overrides positive**: "fix everything", "no deferrals", "I don't want tech debt" delete auth and write `no-defer-pin.json` (~30min hard-block).
|
|
187
181
|
|
|
188
|
-
**
|
|
189
|
-
- `git log --oneline <prior-N-commits>` — which earlier work does this story sit on?
|
|
190
|
-
- For each, write the contract you're relying on: "Story A delivers X via Y."
|
|
191
|
-
- `grep -r "<Story A's interface>"` — is the contract still intact in HEAD?
|
|
192
|
-
- Write the Tier-3 test BEFORE writing Story B's code. If the test cannot be written without first standing up infrastructure that makes the integration verifiable, that's a signal the architecture needs that infrastructure too.
|
|
182
|
+
**Bash-mutating commands** that target review/audit files AND mention `deferred|wont-fix|skipped|dismissed` are blocked when no auth is active. Reads (cat/jq/grep) pass.
|
|
193
183
|
|
|
194
|
-
|
|
184
|
+
Audit trail: `.workflow/state/deferral-block-log.json` (last 100). Config: `deferralGate.{enabled,authTtlSeconds,classifyUserPrompts}` (defaults true / 600 / true).
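With `authTtlSeconds` defaulting to 600, the validity of an auth marker comes down to an age check. A sketch under a stated assumption: the marker field `grantedAtMs` is invented here, since the diff does not show the schema of `deferral-authorization.json`.

```javascript
// Default from deferralGate.authTtlSeconds; marker schema is assumed.
const DEFAULT_AUTH_TTL_SECONDS = 600;

// marker: { grantedAtMs: number } (hypothetical field name)
function isAuthActive(marker, nowMs, ttlSeconds = DEFAULT_AUTH_TTL_SECONDS) {
  if (!marker || typeof marker.grantedAtMs !== 'number') return false;
  const ageMs = nowMs - marker.grantedAtMs;
  // Reject markers from the future as well as expired ones.
  return ageMs >= 0 && ageMs < ttlSeconds * 1000;
}
```

A missing or malformed marker is treated as "no authorization" in this sketch, which is the stricter reading for a gate whose purpose is to block unauthorized deferrals.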
|
|
185
|
+
|
|
186
|
+
Enforced by: `scripts/hooks/core/deferral-gate.js`, `deferral-classifier.js`, `scripts/flow-defer-auth.js`, wired into `pre-tool-orchestrator.js` and `user-prompt-submit.js`.
|
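A rough sketch of the Bash-command check described above, under stated assumptions. The diff does not show how `deferral-gate.js` recognizes review/audit files or read-only commands, so the `REVIEW_FILE` and `READ_ONLY` heuristics, the `auth` object shape, and the name `shouldBlockBash` are hypothetical.

```javascript
// Hypothetical sketch; the real deferral-gate.js heuristics are not shown in this diff.
const READ_ONLY = /^\s*(cat|jq|grep)\b/;  // reads named in the text; prefix match is a simplification
const REVIEW_FILE = /(review|audit)/i;    // assumed way to spot review/audit files
const DEFER_WORDS = /\b(deferred|wont-fix|skipped|dismissed)\b/;

function shouldBlockBash(command, auth, now = Date.now()) {
  // An unexpired authorization lets the command through.
  const authActive = Boolean(auth) && auth.expiresAt > now;
  if (authActive) return false;
  // Plain reads pass regardless of what they mention.
  if (READ_ONLY.test(command)) return false;
  // Block only when BOTH conditions hold: review/audit target AND a deferral word.
  return REVIEW_FILE.test(command) && DEFER_WORDS.test(command);
}
```

Note that a prefix test like `READ_ONLY` would also pass a `jq ... > file` redirect; a real gate would presumably need to look at redirections and pipelines as well.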
|
195
187
|
|
|
196
188
|
---
|
|
197
189
|
|
|
198
|
-
###
|
|
190
|
+
### Mechanical Research-Required Gate (wf-5cd71b1f)
|
|
199
191
|
|
|
200
|
-
|
|
192
|
+
Diagnostic prompts are intercepted at UserPromptSubmit and re-prompted at Stop hook if the assistant turn produced text without enough Read calls against evidence paths.
|
|
201
193
|
|
|
202
|
-
|
|
203
|
-
- **engineering / naming / implementation** → decide autonomously, report in the summary.
|
|
204
|
-
- **infrastructure / performance** → decide autonomously, report after.
|
|
205
|
-
- **security** → auto-fix-report-after (existing).
|
|
206
|
-
- **low-confidence technical decisions** → self-adversarial challenge to ≥90% confidence; queue if cap hit. Counter is shared with the IGR Architect-Adversary loop (default cap 30 per run, configurable via `autonomousMode.maxAdversaryInvocations`).
|
|
207
|
-
- **Blocking errors (typecheck/test/conflict)** → fix autonomously; only surface if fundamentally un-fixable.
|
|
194
|
+
Flow:
|
|
208
195
|
|
|
209
|
-
**
|
|
196
|
+
1. **Classifier** (`research-required-classifier.js`) classifies each prompt: `command` / `factual` / `diagnostic` / `none`. Diagnostic markers: "why", "should I", "what do you think", "is this correct", "explain why", "did you fix". On diagnostic → write `.workflow/state/research-required-this-turn.json` with `{requiredEvidence: 2, attemptCount: 0}`.
|
|
197
|
+
2. **Override**: prompt prefix `!` skips the gate.
|
|
198
|
+
3. **Stop-hook gate** (`research-required-gate.js`) parses the JSONL transcript, counts Read against evidence prefixes, Bash with `cat|head|tail|grep|rg|jq|less|view|awk|sed` against evidence paths, and any Glob/Grep.
|
|
199
|
+
4. **count < required** → `{continue: true, stopReason: <message>}` forces redo. After `maxAttempts` (default 3) → hard-stop visible to user.
|
|
200
|
+
5. **count ≥ required** → marker consumed, Stop proceeds.
|
|
210
201
|
|
|
211
|
-
|
|
202
|
+
Evidence prefixes: `.workflow/state/`, `.workflow/changes/`, `.workflow/specs/`, `.workflow/epics/`, `lib/`, `scripts/`, `src/`, `tests/`, `app/`.
|
|
212
203
|
|
|
213
|
-
|
|
204
|
+
Config: `researchRequiredGate.{enabled,requiredEvidence,maxAttempts}` (defaults true / 2 / 3). Override prefix `!` is hard-coded.
|
|
214
205
|
|
|
215
|
-
Enforced by: `
|
|
206
|
+
Enforced by: `research-required-classifier.js` (UserPromptSubmit), `research-required-gate.js` (Stop), wired into `user-prompt-submit.js` and `stop.js`.
|
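The flow above can be sketched as three small functions. This is an illustration under assumptions, not the package's code: the tool-call shape (`{ tool, path, command }`), the function names, the outcome labels, and the exact point at which `maxAttempts` triggers the hard stop are hypothetical. The markers, evidence prefixes, read commands, and defaults are taken from the text; the `command` and `factual` classes are omitted because the diff does not say how they are detected.

```javascript
// Hypothetical sketch; data shapes and names are assumptions.
const DIAGNOSTIC_MARKERS = [
  'why', 'should i', 'what do you think', 'is this correct', 'explain why', 'did you fix',
];
const EVIDENCE_PREFIXES = [
  '.workflow/state/', '.workflow/changes/', '.workflow/specs/', '.workflow/epics/',
  'lib/', 'scripts/', 'src/', 'tests/', 'app/',
];
const READ_COMMANDS = /^\s*(cat|head|tail|grep|rg|jq|less|view|awk|sed)\b/;

// Step 1 and 2: diagnostic detection, with the hard-coded "!" override.
function needsResearch(prompt) {
  if (prompt.startsWith('!')) return false;
  const text = prompt.toLowerCase();
  return DIAGNOSTIC_MARKERS.some((m) => new RegExp(`\\b${m}\\b`).test(text));
}

// Step 3: count tool calls that qualify as evidence.
function countEvidence(toolCalls) {
  const hitsPrefix = (s) => EVIDENCE_PREFIXES.some((p) => s.includes(p));
  return toolCalls.filter((call) => {
    if (call.tool === 'Glob' || call.tool === 'Grep') return true;
    if (call.tool === 'Read') return EVIDENCE_PREFIXES.some((p) => call.path.startsWith(p));
    if (call.tool === 'Bash') return READ_COMMANDS.test(call.command) && hitsPrefix(call.command);
    return false;
  }).length;
}

// Steps 4 and 5: decide what the Stop hook should do.
function stopDecision(marker, toolCalls, maxAttempts = 3) {
  if (countEvidence(toolCalls) >= marker.requiredEvidence) return { outcome: 'proceed' };
  if (marker.attemptCount >= maxAttempts) return { outcome: 'hard-stop' };
  return { outcome: 'redo', attemptCount: marker.attemptCount + 1 };
}
```

With the default marker `{requiredEvidence: 2, attemptCount: 0}`, a turn with no qualifying tool calls yields `redo`, while any two qualifying calls (for example one Glob and one Grep) let the Stop proceed.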
package/README.md
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
# WogiFlow
|
|
2
2
|
|
|
3
|
-
A self-improving AI development workflow that learns from your feedback. Currently supports **Claude Code 2.1.33+**.
|
|
3
|
+
A self-improving AI development workflow that learns from your feedback. Currently supports **Claude Code 2.1.33+**. Claude Code **2.1.121+** is recommended for native MCP startup retries (transient channel-server failures auto-recover), Bash resilience to deleted CWDs (worktree cleanup is safe mid-session), and `Always allow` permission persistence across worker restarts.
|
|
4
4
|
|
|
5
5
|
```bash
|
|
6
6
|
npm install -D wogiflow
|
package/lib/wogi-claude
CHANGED
|
@@ -21,6 +21,12 @@
|
|
|
21
21
|
# WOGI_MAX_RESTARTS — safety cap, default 50 (prevents runaway restart storms)
|
|
22
22
|
# WOGI_WRAPPER_PID — exported to child; hook checks this to confirm wrapper is present
|
|
23
23
|
# WOGI_CLAUDE_BIN — override path to claude binary (default: found via PATH)
|
|
24
|
+
# WOGI_BASH_BIN — (wf-ee4e343b cleanup) override the bash binary used
|
|
25
|
+
# by the PID-alignment subshell trick. Defaults to
|
|
26
|
+
# `bash` on PATH. Useful on minimal containers
|
|
27
|
+
# (Alpine, distroless) where bash lives at a
|
|
28
|
+
# non-standard path or where the shell wrapping the
|
|
29
|
+
# claude CLI is not bash-by-default.
|
|
24
30
|
# WOGI_USE_EXPECT — (EXPERIMENTAL, v2.22.4+) set to 1 to opt IN to the
|
|
25
31
|
# expect-based auto-dismiss of the "Loading development
|
|
26
32
|
# channels" dialog. OFF BY DEFAULT because Ink's
|
|
@@ -99,7 +105,7 @@ if [ "$__wogi_is_worker" -eq 1 ]; then
|
|
|
99
105
|
try {
|
|
100
106
|
const cfg = require(process.cwd() + "/.workflow/config.json");
|
|
101
107
|
process.stdout.write(String(!!(cfg.workspace && cfg.workspace.inheritClaudeAiMcpIntegrations)));
|
|
102
|
-
} catch (
|
|
108
|
+
} catch (_err) { process.stdout.write("false"); }
|
|
103
109
|
' 2>/dev/null)"
|
|
104
110
|
if [ "$__wogi_config_inherit" = "true" ]; then
|
|
105
111
|
__wogi_strip_mcp=0
|
|
@@ -205,7 +211,7 @@ if [ "$__wogi_strip_mcp" -eq 1 ]; then
|
|
|
205
211
|
const ws = cfg && cfg.mcpServers && cfg.mcpServers["wogi-workspace-channel"];
|
|
206
212
|
if (ws) channelEntry = ws;
|
|
207
213
|
}
|
|
208
|
-
} catch (
|
|
214
|
+
} catch (_err) {}
|
|
209
215
|
const payload = channelEntry
|
|
210
216
|
? { mcpServers: { "wogi-workspace-channel": channelEntry } }
|
|
211
217
|
: { mcpServers: {} };
|
|
@@ -284,12 +290,37 @@ __wogi_build_argv() {
|
|
|
284
290
|
|
|
285
291
|
# run_claude — invoke claude, routing through expect when we can auto-dismiss
|
|
286
292
|
# the dev-channels dialog. Preserves stdin/stdout/stderr exactly.
|
|
293
|
+
#
|
|
294
|
+
# wf-ee4e343b: PID-alignment via bash-c-exec trick. The Stop hook's SEC-006
|
|
295
|
+
# check (task-boundary-reset.js:200-206) requires WOGI_WRAPPER_PID === process.ppid
|
|
296
|
+
# in any hook running under claude. Plain `"$CLAUDE_BIN" ...` without `exec`
|
|
297
|
+
# causes bash to fork: claude gets a NEW PID that does not match $$ (this bash
|
|
298
|
+
# wrapper's PID). The check then fails silently, breaking auto-restart for
|
|
299
|
+
# everyone since 2026-04-26.
|
|
300
|
+
#
|
|
301
|
+
# Fix: spawn claude through `bash -c '...'` which forks a fresh bash with its
|
|
302
|
+
# OWN $$, sets WOGI_WRAPPER_PID to that $$, then `exec` replaces the new bash
|
|
303
|
+
# with claude — preserving the same PID. Result: claude's PID equals the
|
|
304
|
+
# WOGI_WRAPPER_PID it inherits, and process.ppid in any hook child of claude
|
|
305
|
+
# equals that same value. The strict-equality SEC-006 check now holds.
|
|
306
|
+
#
|
|
307
|
+
# Why `bash -c` and not a `( ... )` subshell: in bash 3.x (macOS system bash),
|
|
308
|
+
# `$$` inside a `( ... )` subshell returns the OUTER shell's PID, not the
|
|
309
|
+
# subshell's own PID. Bash 4+ adds $BASHPID for that purpose, but we cannot
|
|
310
|
+
# rely on bash 4+ being installed. `bash -c` always returns its own PID via
|
|
311
|
+
# `$$`, regardless of version.
|
|
312
|
+
#
|
|
313
|
+
# Bash -c argv form: `bash -c COMMAND COMMAND_NAME ARG1 ARG2 ...` — COMMAND_NAME
|
|
314
|
+
# becomes $0 inside the script and ARG1..N become $1..$N, so `exec "$0" "$@"`
|
|
315
|
+
# invokes claude with all original args without quoting hazards.
|
|
316
|
+
#
|
|
317
|
+
# For expect mode, the same alignment is performed inside wogi-claude-expect.exp.
|
|
287
318
|
run_claude() {
|
|
288
319
|
__wogi_build_argv "$@"
|
|
289
320
|
if [ "$__wogi_use_expect" -eq 1 ]; then
|
|
290
321
|
expect "$WOGI_EXPECT_SCRIPT" "$CLAUDE_BIN" "${__wogi_claude_argv[@]+"${__wogi_claude_argv[@]}"}"
|
|
291
322
|
else
|
|
292
|
-
"$CLAUDE_BIN" "${__wogi_claude_argv[@]+"${__wogi_claude_argv[@]}"}"
|
|
323
|
+
"${WOGI_BASH_BIN:-bash}" -c 'export WOGI_WRAPPER_PID=$$; exec "$0" "$@"' "$CLAUDE_BIN" "${__wogi_claude_argv[@]+"${__wogi_claude_argv[@]}"}"
|
|
293
324
|
fi
|
|
294
325
|
}
|
|
295
326
|
|
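The PID-alignment trick in `run_claude` can be checked in isolation with a short Node script. This is a sketch, assuming a POSIX system with `bash` on PATH (or `WOGI_BASH_BIN` set); the helper name `checkPidAlignment` is hypothetical, and a throwaway Node process stands in for the claude binary.

```javascript
const { spawnSync } = require('child_process');

// Same command string as the wrapper: bash -c COMMAND COMMAND_NAME ARG1 ARG2 ...
const script = 'export WOGI_WRAPPER_PID=$$; exec "$0" "$@"';

// Stand-in for claude: reports its own PID and the variable it inherited.
const probe =
  'console.log(JSON.stringify({ pid: process.pid, wrapper: Number(process.env.WOGI_WRAPPER_PID) }))';

function checkPidAlignment(bashBin = process.env.WOGI_BASH_BIN || 'bash') {
  // $0 becomes the Node binary and "$@" becomes ["-e", probe].
  const result = spawnSync(bashBin, ['-c', script, process.execPath, '-e', probe], {
    encoding: 'utf8',
  });
  if (result.error) throw result.error;
  return JSON.parse(result.stdout);
}

const { pid, wrapper } = checkPidAlignment();
console.log(pid === wrapper ? 'aligned' : 'mismatch', { pid, wrapper });
```

Because `exec` replaces the fresh bash process without forking, the stand-in reports a `pid` equal to the `WOGI_WRAPPER_PID` it inherited. A hook spawned by that process would then see the same value as `process.ppid`, which is the condition the SEC-006 check compares against.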
package/lib/wogi-claude-expect.exp
CHANGED
|
@@ -90,12 +90,37 @@ set claude_bin [lindex $argv 0]
|
|
|
90
90
|
set claude_args [lrange $argv 1 end]
|
|
91
91
|
|
|
92
92
|
# Spawn claude in a pseudo-TTY so its Ink UI renders normally.
|
|
93
|
-
#
|
|
94
|
-
#
|
|
95
|
-
# (
|
|
96
|
-
#
|
|
93
|
+
#
|
|
94
|
+
# wf-ee4e343b PID-alignment: spawn claude through `bash -c` so we can
|
|
95
|
+
# re-export WOGI_WRAPPER_PID=$$ (the subshell's PID) before exec — this
|
|
96
|
+
# makes claude inherit a WOGI_WRAPPER_PID equal to its own PID, satisfying
|
|
97
|
+
# the SEC-006 strict-equality check in task-boundary-reset.js. Without this,
|
|
98
|
+
# expect's spawn gives claude a PID different from WOGI_WRAPPER_PID and the
|
|
99
|
+
# Stop-hook restart trigger silently fails.
|
|
100
|
+
#
|
|
101
|
+
# The bash -c form `bash -c COMMAND COMMAND_NAME ARG1 ARG2 ...` makes
|
|
102
|
+
# COMMAND_NAME = $0 and the remaining args = $1..$N, so we use `exec "$0" "$@"`
|
|
103
|
+
# to invoke claude with all original args — no quoting hazards.
|
|
97
104
|
_wogi_boot_mark "before spawn"
|
|
98
|
-
|
|
105
|
+
|
|
106
|
+
# F4 fix (wf-ee4e343b cleanup): defensive list construction. `lrange $argv 1
|
|
107
|
+
# end` already returns a clean Tcl list, and `{*}$claude_args` splices it
|
|
108
|
+
# without re-parsing element contents — so brace-containing args are
|
|
109
|
+
# preserved as single elements. The Sonnet review flagged this as a quoting
|
|
110
|
+
# hazard; the Opus adversary verified it's safe. We rebuild the list
|
|
111
|
+
# explicitly via [list ...] anyway for defense-in-depth and to make the
|
|
112
|
+
# safety property obvious to future readers (no implicit dependency on
|
|
113
|
+
# lrange's return contract).
|
|
114
|
+
set claude_args_safe [list]
|
|
115
|
+
foreach _arg $claude_args { lappend claude_args_safe $_arg }
|
|
116
|
+
|
|
117
|
+
# Honor WOGI_BASH_BIN (wf-ee4e343b cleanup) for non-standard shell layouts.
|
|
118
|
+
set _wogi_bash_bin "bash"
|
|
119
|
+
if {[info exists env(WOGI_BASH_BIN)] && $env(WOGI_BASH_BIN) ne ""} {
|
|
120
|
+
set _wogi_bash_bin $env(WOGI_BASH_BIN)
|
|
121
|
+
}
|
|
122
|
+
|
|
123
|
+
spawn $_wogi_bash_bin -c "export WOGI_WRAPPER_PID=\$\$; exec \"\$0\" \"\$@\"" $claude_bin {*}$claude_args_safe
|
|
99
124
|
_wogi_boot_mark "after spawn (pid=$spawn_id)"
|
|
100
125
|
|
|
101
126
|
# ============================================================================
|