wogiflow 2.15.0 → 2.16.0
- package/.claude/commands/wogi-challenge.md +4 -4
- package/.claude/commands/wogi-gate-stats.md +1 -1
- package/.claude/commands/wogi-start-continuation.md +10 -10
- package/.claude/commands/wogi-start.md +2 -0
- package/.claude/docs/intent-grounded-reasoning.md +1 -1
- package/.claude/docs/knowledge-base/02-task-execution/02-execution-loop.md +8 -0
- package/.claude/docs/knowledge-base/02-task-execution/03-verification.md +110 -10
- package/.claude/docs/knowledge-base/02-task-execution/README.md +10 -0
- package/.claude/docs/knowledge-base/02-task-execution/decision-authority.md +110 -0
- package/.claude/docs/knowledge-base/02-task-execution/workspace-mode.md +176 -0
- package/.claude/docs/knowledge-base/04-memory-context/context-management.md +40 -0
- package/.claude/docs/knowledge-base/04-memory-context/memory-systems.md +12 -1
- package/.claude/docs/knowledge-base/06-safety-guardrails/README.md +1 -0
- package/.claude/docs/knowledge-base/06-safety-guardrails/mechanical-gates.md +150 -0
- package/.claude/docs/knowledge-base/wogiflow-enterprise-showcase.md +423 -0
- package/.claude/docs/phases/02-spec.md +2 -2
- package/.claude/docs/phases/04-verify.md +1 -1
- package/.workflow/agents/logic-adversary.md +7 -2
- package/.workflow/templates/claude-md.hbs +27 -0
- package/lib/wogi-claude +87 -0
- package/package.json +3 -2
- package/scripts/flow-architect-pass.js +3 -3
- package/scripts/flow-config-defaults.js +51 -0
- package/scripts/flow-constants.js +3 -1
- package/scripts/flow-correct.js +1 -0
- package/scripts/flow-done.js +16 -0
- package/scripts/flow-hook-status.js +6 -2
- package/scripts/flow-logic-adversary.js +4 -4
- package/scripts/flow-migrate-igr.js +1 -1
- package/scripts/hooks/core/phase-read-gate.js +52 -9
- package/scripts/hooks/core/post-compact.js +18 -0
- package/scripts/hooks/core/session-context.js +26 -0
- package/scripts/hooks/core/session-end.js +10 -0
- package/scripts/hooks/core/session-history.js +116 -0
- package/scripts/hooks/core/task-boundary-reset.js +249 -0
- package/scripts/hooks/core/task-completed.js +35 -0
- package/scripts/hooks/entry/claude-code/pre-tool-use.js +10 -5
- package/scripts/hooks/entry/claude-code/stop.js +63 -0
|
@@ -0,0 +1,150 @@
|
|
|
1
|
+
# Mechanical Enforcement Gates
|
|
2
|
+
|
|
3
|
+
WogiFlow enforces workflow rules through PreToolUse hook gates — real JavaScript code that intercepts every tool call and blocks violations before they happen. These are not prompt suggestions; they are physical blocks that the AI cannot bypass.
|
|
4
|
+
|
|
5
|
+
---
|
|
6
|
+
|
|
7
|
+
## How It Works
|
|
8
|
+
|
|
9
|
+
Every time Claude Code calls a tool (Edit, Write, Bash, Read, Grep, etc.), the PreToolUse hook fires first. It runs through a chain of gates. If any gate returns `blocked: true`, the tool call is rejected with an error message telling the AI what to do instead.
|
|
10
|
+
|
|
11
|
+
```
Claude Code → tool call → PreToolUse hook → [Gate chain] → allowed / blocked
```
|
|
14
|
+
|
|
15
|
+
Gates are fail-open by default (errors don't block work) unless noted. The routing gate is fail-closed (errors block everything until routing completes).
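The difference between fail-open and fail-closed can be illustrated with a small sketch. This is not WogiFlow's actual hook code: the gate shape (`name`, `failClosed`, `check`) and the `runGateChain` function are hypothetical names chosen for illustration.

```javascript
// Hypothetical gate shape: { name, failClosed, check(toolCall) -> { blocked, message } }
function runGateChain(gates, toolCall) {
  for (const gate of gates) {
    try {
      const result = gate.check(toolCall);
      if (result && result.blocked) {
        // The first gate that blocks ends the chain.
        return { allowed: false, gate: gate.name, message: result.message };
      }
    } catch (err) {
      // A fail-closed gate turns its own error into a block;
      // a fail-open gate swallows the error and the chain continues.
      if (gate.failClosed) {
        return { allowed: false, gate: gate.name, message: `gate error: ${err.message}` };
      }
    }
  }
  return { allowed: true };
}

// Example chain: a fail-closed routing gate followed by a fail-open gate that errors.
const gates = [
  {
    name: "routing",
    failClosed: true,
    check: (call) => ({ blocked: !call.routed, message: "Run /wogi-start first" }),
  },
  {
    name: "flaky",
    failClosed: false,
    check: () => { throw new Error("state file unreadable"); },
  },
];
```

In this sketch, an unrouted call is rejected by the routing gate, while a routed call passes even though the second gate throws, because that gate is fail-open.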
|
|
16
|
+
|
|
17
|
+
---
|
|
18
|
+
|
|
19
|
+
## Gate Reference
|
|
20
|
+
|
|
21
|
+
### Routing Gate
|
|
22
|
+
**Enforces**: Every user message must go through `/wogi-start` before any tool can run.
|
|
23
|
+
**Blocks**: Read, Glob, Grep, Edit, Write, Bash, Agent, WebSearch, WebFetch, EnterPlanMode
|
|
24
|
+
**Fail mode**: Fail-CLOSED (if the gate errors, tools are blocked)
|
|
25
|
+
**Config**: `hooks.rules.routingGate.enabled`
|
|
26
|
+
**Exceptions**: Read-only git commands (`git status`, `git log`, `git diff`), subagents with an active parent task
|
|
27
|
+
|
|
28
|
+
### Phase Gate
|
|
29
|
+
**Enforces**: Tools are restricted based on the current workflow phase.
|
|
30
|
+
**Rules**:
|
|
31
|
+
- `routing` phase: Edit, Write, Bash blocked
|
|
32
|
+
- `exploring` phase: Edit, Write blocked (research is read-only)
|
|
33
|
+
- `spec_review` phase: Edit, Write, Bash blocked (reviewing, not coding)
|
|
34
|
+
- `coding` phase: All tools allowed
|
|
35
|
+
- `validating` phase: Edit, Write blocked (verifying, not changing)
|
|
36
|
+
- `completing` phase: All tools allowed (for logs, maps, commits)
|
|
37
|
+
**Config**: `hooks.rules.phaseGate.enabled`
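The phase rules above amount to a lookup table from phase to blocked tools. A minimal sketch, assuming a hypothetical `phaseGate(phase, tool)` function; the behavior for an unknown phase (allow the call) is an assumption, not something the documentation states.

```javascript
// Phase -> tools blocked in that phase, taken from the rules listed above.
const BLOCKED_BY_PHASE = new Map([
  ["routing", ["Edit", "Write", "Bash"]],
  ["exploring", ["Edit", "Write"]],
  ["spec_review", ["Edit", "Write", "Bash"]],
  ["coding", []],
  ["validating", ["Edit", "Write"]],
  ["completing", []],
]);

function phaseGate(phase, tool) {
  const blockedTools = BLOCKED_BY_PHASE.get(phase);
  // Assumption: an unknown phase does not block (fail-open).
  if (!blockedTools) return { blocked: false };
  if (blockedTools.includes(tool)) {
    return { blocked: true, message: `${tool} is not allowed during the ${phase} phase` };
  }
  return { blocked: false };
}
```

For example, `phaseGate("exploring", "Edit")` reports a block, while `phaseGate("exploring", "Bash")` does not, matching the read-only research rule.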
|
|
38
|
+
|
|
39
|
+
### Phase-Read Gate
|
|
40
|
+
**Enforces**: Must read the phase instruction file before using mutation tools.
|
|
41
|
+
**Blocks**: Edit, Write, Bash — until the current phase's file in `.claude/docs/phases/` is read
|
|
42
|
+
**Purpose**: Enables on-demand loading of pipeline instructions (about 79% token savings for conversations that never reach the coding phase)
|
|
43
|
+
**State file**: `.workflow/state/phase-reads.json`
|
|
44
|
+
**Config**: Respects `hooks.rules.phaseGate.enabled` (same toggle)
|
|
45
|
+
|
|
46
|
+
### Scope Gate
|
|
47
|
+
**Enforces**: Edits must be within the task's declared file scope.
|
|
48
|
+
**Blocks**: Edit, Write on files not listed in the task spec
|
|
49
|
+
**Config**: `hooks.rules.scopeGating.enabled`
|
|
50
|
+
|
|
51
|
+
### Bugfix Scope Gate
|
|
52
|
+
**Enforces**: L3 bugfix tasks are limited in how many files they can touch.
|
|
53
|
+
**Behavior**: Warns after 3 unique file edits, blocks at configurable threshold
|
|
54
|
+
**Purpose**: Prevents scope creep — a "quick fix" that touches 15 files should be an L2 task
|
|
55
|
+
**Config**: `enforcement.bugfixScope.enabled`
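The warn-then-block behavior can be sketched as a counter over unique file paths. The function name, the `warnAfter` and `blockAt` options, and the default block threshold of 5 are hypothetical; the documentation only says that the warning starts after 3 unique files and that the block threshold is configurable.

```javascript
// Tracks unique files edited by one bugfix task.
// warnAfter = 3 comes from the text above; blockAt = 5 is an illustrative default.
function createBugfixScopeTracker({ warnAfter = 3, blockAt = 5 } = {}) {
  const editedFiles = new Set();
  return function recordEdit(filePath) {
    const alreadyEdited = editedFiles.has(filePath);
    const nextCount = alreadyEdited ? editedFiles.size : editedFiles.size + 1;
    // Block only when a NEW file would reach the threshold;
    // re-editing an already touched file is always allowed.
    if (!alreadyEdited && nextCount >= blockAt) {
      return {
        blocked: true,
        uniqueFiles: editedFiles.size,
        message: `Bugfix would touch ${nextCount} files; consider reclassifying the task`,
      };
    }
    editedFiles.add(filePath);
    return { blocked: false, warning: nextCount > warnAfter, uniqueFiles: nextCount };
  };
}
```

With the defaults used here, the first three files pass silently, the fourth produces a warning, and the fifth new file is blocked.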
|
|
56
|
+
|
|
57
|
+
### Scope Mutation Gate
|
|
58
|
+
**Enforces**: Fix tasks cannot create new files; tasks cannot delete pre-existing files via Bash.
|
|
59
|
+
**Blocks**: Write (new file creation in fix tasks), Bash (rm commands on tracked files)
|
|
60
|
+
**Config**: `enforcement.scopeMutation.enabled`
|
|
61
|
+
|
|
62
|
+
### Strike Gate
|
|
63
|
+
**Enforces**: After repeated verification failures, blocks further edits.
|
|
64
|
+
**Behavior**: Tracks consecutive failures per task. After 3 strikes, blocks Edit/Write/Bash.
|
|
65
|
+
**Purpose**: Prevents infinite retry loops where the AI tries the same broken approach repeatedly
|
|
66
|
+
**State file**: `.workflow/state/strike-tracker.json`
|
|
67
|
+
**Config**: `enforcement.strikeEscalation.enabled`
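A strike tracker of this kind could look roughly like the following. This sketch keeps state in memory rather than in `strike-tracker.json`, and it assumes that a passing verification resets the count (implied by "consecutive" but not stated). All function names are hypothetical.

```javascript
// Counts consecutive verification failures per task and blocks mutation tools at the limit.
function createStrikeTracker(maxStrikes = 3) {
  const strikes = new Map(); // taskId -> consecutive failure count
  const guardedTools = ["Edit", "Write", "Bash"];
  return {
    recordVerification(taskId, passed) {
      // Assumption: a pass resets the streak; a failure extends it.
      const next = passed ? 0 : (strikes.get(taskId) || 0) + 1;
      strikes.set(taskId, next);
      return next;
    },
    check(taskId, tool) {
      const count = strikes.get(taskId) || 0;
      return {
        blocked: guardedTools.includes(tool) && count >= maxStrikes,
        strikes: count,
      };
    },
  };
}
```

Read-only tools stay available after the third strike in this sketch, so the cause of the failures can still be investigated.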
|
|
68
|
+
|
|
69
|
+
### Deploy Gate
|
|
70
|
+
**Enforces**: Cannot deploy without a verification artifact. Cannot write/edit verification artifacts directly (anti-forgery).
|
|
71
|
+
**Blocks**: Bash (deploy commands without artifact), Write/Edit (on verification artifact files)
|
|
72
|
+
**Config**: `enforcement.deployGate.enabled`
|
|
73
|
+
|
|
74
|
+
### Git Safety Gate
|
|
75
|
+
**Enforces**: Creates automatic backup before destructive git operations.
|
|
76
|
+
**Triggers on**: `git reset --hard`, `git checkout -- .`, `git restore .`, `git clean -f`
|
|
77
|
+
**Behavior**: Creates a backup branch before allowing the destructive operation
|
|
78
|
+
**Config**: `enforcement.gitSafety.enabled`
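Detecting the trigger commands is essentially pattern matching on the Bash command string. The sketch below covers only the four commands listed above; the regular expressions, the `isDestructiveGitCommand` helper, and the backup branch naming scheme are illustrative assumptions, not WogiFlow's implementation.

```javascript
// One pattern per trigger command listed above.
const DESTRUCTIVE_GIT_PATTERNS = [
  /^git\s+reset\s+.*--hard\b/,       // git reset --hard [ref]
  /^git\s+checkout\s+--\s+\.(\s|$)/, // git checkout -- .
  /^git\s+restore\s+\.(\s|$)/,       // git restore .
  /^git\s+clean\b.*\s-[a-zA-Z]*f/,   // git clean -f, -fd, -df, ...
];

function isDestructiveGitCommand(command) {
  const normalized = command.trim();
  return DESTRUCTIVE_GIT_PATTERNS.some((pattern) => pattern.test(normalized));
}

// Hypothetical naming scheme for the backup branch created before the operation runs.
function backupBranchName(now = new Date()) {
  return `wogi-backup-${now.toISOString().replace(/[:.]/g, "-")}`;
}
```

A gate built on this would create the backup branch first and only then let the original command through.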
|
|
79
|
+
|
|
80
|
+
### Commit-Log Gate
|
|
81
|
+
**Enforces**: Commits must have corresponding request-log entries.
|
|
82
|
+
**Blocks**: Bash (git commit) when request-log.md wasn't updated for the current task
|
|
83
|
+
**Config**: `hooks.rules.commitLogGate.enabled`
|
|
84
|
+
|
|
85
|
+
### Manager Boundary Gate
|
|
86
|
+
**Enforces**: Manager repos cannot modify worker repo source code (workspace mode).
|
|
87
|
+
**Blocks**: Edit/Write on any file inside member repos; Bash except allowlisted read-only commands
|
|
88
|
+
**Allows**: Read of metadata files (api-map, app-map, config, state files)
|
|
89
|
+
**Active when**: `WOGI_REPO_NAME === 'manager'`
|
|
90
|
+
|
|
91
|
+
### Component Reuse Gate
|
|
92
|
+
**Enforces**: Must check existing components before creating new ones.
|
|
93
|
+
**Behavior**: When Write creates a new component file, checks app-map for similar existing components
|
|
94
|
+
**Output**: Warning with reuse candidates (name, path, similarity score)
|
|
95
|
+
**Config**: `componentReuse.enabled`
|
|
96
|
+
|
|
97
|
+
### Standards Compliance Gate
|
|
98
|
+
**Enforces**: Naming conventions, security patterns, decisions.md rules.
|
|
99
|
+
**Runs at**: Task completion (part of quality gates)
|
|
100
|
+
**Checks scoped by task type**: component → naming/components/security, API → naming/api/security, feature → all
|
|
101
|
+
**Config**: Part of `qualityGates.<taskType>.require` array
|
|
102
|
+
|
|
103
|
+
### Damage Control
|
|
104
|
+
**Enforces**: Configurable blocklist for dangerous commands and protected files.
|
|
105
|
+
**Behavior**: Block or ask-before-execute based on pattern matching
|
|
106
|
+
**Config**: `damageControl.enabled` with custom patterns in `.workflow/damage-control.yaml`
|
|
107
|
+
|
|
108
|
+
---
|
|
109
|
+
|
|
110
|
+
## Configuration
|
|
111
|
+
|
|
112
|
+
All gates can be toggled independently:
|
|
113
|
+
|
|
114
|
+
```json
{
  "hooks": {
    "rules": {
      "routingGate": { "enabled": true },
      "phaseGate": { "enabled": true },
      "scopeGating": { "enabled": true },
      "commitLogGate": { "enabled": true }
    }
  },
  "enforcement": {
    "strictMode": true,
    "bugfixScope": { "enabled": true },
    "scopeMutation": { "enabled": true },
    "strikeEscalation": { "enabled": true },
    "deployGate": { "enabled": true },
    "gitSafety": { "enabled": true }
  },
  "componentReuse": { "enabled": true },
  "damageControl": { "enabled": false }
}
```
|
|
136
|
+
|
|
137
|
+
---
|
|
138
|
+
|
|
139
|
+
## Fast-Path Optimization
|
|
140
|
+
|
|
141
|
+
The hook checks pre-computed status from `.workflow/state/hook-status.json`. If all gates are disabled, the entire hook exits in <1ms. Individual gates are skipped via the status cache without loading their modules.
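The fast path can be pictured as filtering gates by a cached status map before any gate module is loaded. The status shape (`{ gates: { name: boolean } }`) and the loader functions below are assumptions for illustration; the real layout of `hook-status.json` is not shown in this document.

```javascript
// Hypothetical status shape: { gates: { routingGate: true, phaseGate: false } }
function selectActiveGates(status, gateLoaders) {
  const enabledNames = Object.keys(gateLoaders).filter(
    (name) => status.gates[name] === true
  );
  // Fast path: with nothing enabled, return before any gate module is loaded.
  if (enabledNames.length === 0) return [];
  // Only enabled gates pay the cost of loading their module.
  return enabledNames.map((name) => ({ name, check: gateLoaders[name]() }));
}

// Loaders stand in for lazy require() calls; the counter shows which ones ran.
let loadCount = 0;
const gateLoaders = {
  routingGate: () => { loadCount += 1; return () => ({ blocked: false }); },
  phaseGate: () => { loadCount += 1; return () => ({ blocked: false }); },
};
```

Because loading is deferred behind the status check, a disabled gate costs a single property lookup in this sketch.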
|
|
142
|
+
|
|
143
|
+
---
|
|
144
|
+
|
|
145
|
+
## Related
|
|
146
|
+
|
|
147
|
+
- [Damage Control](./damage-control.md) — Configurable pattern-based protection
|
|
148
|
+
- [Commit Gates](./commit-gates.md) — Approval workflow for commits
|
|
149
|
+
- [Task Execution](../02-task-execution/) — Where gates enforce workflow phases
|
|
150
|
+
- [Verification](../02-task-execution/03-verification.md) — Quality gates at completion
|
|
@@ -0,0 +1,423 @@
|
|
|
1
|
+
# WogiFlow: Enterprise AI Development Workflow
|
|
2
|
+
|
|
3
|
+
**The only AI coding workflow with mechanical enforcement.**
|
|
4
|
+
|
|
5
|
+
Every other tool in the Claude Code ecosystem gives the AI _suggestions_. WogiFlow gives it _rules it physically cannot break_. This is the difference between a coding assistant that sometimes follows best practices and one that is architecturally incapable of skipping them.
|
|
6
|
+
|
|
7
|
+
---
|
|
8
|
+
|
|
9
|
+
## The Core Problem WogiFlow Solves
|
|
10
|
+
|
|
11
|
+
AI coding agents are powerful but undisciplined. Without structure, they:
|
|
12
|
+
|
|
13
|
+
- Skip verification and claim "done" based on "the code looks correct"
|
|
14
|
+
- Make untracked changes that nobody can audit
|
|
15
|
+
- Forget project conventions between sessions
|
|
16
|
+
- Hallucinate completion — static checks pass, but the feature doesn't actually work
|
|
17
|
+
- Make autonomous decisions about things the team should decide
|
|
18
|
+
- Lose context mid-task and produce inconsistent output
|
|
19
|
+
|
|
20
|
+
These problems multiply with team size. One developer with a loose AI assistant creates tech debt. Ten developers with loose AI assistants create chaos.
|
|
21
|
+
|
|
22
|
+
**WogiFlow eliminates these failure modes through mechanical enforcement** — hook-based gates that physically intercept every tool call and block violations before they happen. Not prompts asking nicely. Real code that runs before every file edit, every bash command, every tool invocation.
|
|
23
|
+
|
|
24
|
+
---
|
|
25
|
+
|
|
26
|
+
## How It Works: The /wogi-start Pipeline
|
|
27
|
+
|
|
28
|
+
Everything in WogiFlow begins with `/wogi-start`. This is the universal entry point — every user message, every task, every question must pass through it. The routing gate mechanically blocks all tools (Edit, Write, Bash, Read, Grep) until routing completes.
|
|
29
|
+
|
|
30
|
+
### What Happens When You Say "Add dark mode toggle"
|
|
31
|
+
|
|
32
|
+
```
User message
              │
              ▼
┌─────────────────────────────┐
│  ROUTING GATE (mechanical)  │ ← All tools blocked until /wogi-start runs
│  PreToolUse hook intercepts │
│  every tool call            │
└─────────────┬───────────────┘
              │
              ▼
┌─────────────────────────────┐
│  /wogi-start TRIAGE         │ ← Classifies: story, bug, review, conversation?
│  Determines task level:     │   Routes to appropriate pipeline
│    L0 Epic (15+ files)      │
│    L1 Story (5-15 files)    │
│    L2 Task (1-5 files)      │
│    L3 Subtask (1 file)      │
└─────────────┬───────────────┘
              │
              ▼
```
|
|
54
|
+
|
|
55
|
+
### The Pipeline: What Runs Depends on Task Size
|
|
56
|
+
|
|
57
|
+
| Phase | L3 Subtask | L2 Task | L1 Story | L0 Epic |
|-------|-----------|---------|----------|---------|
| **Routing + Context** | Yes | Yes | Yes | Yes |
| **Multi-Agent Research** (6 parallel agents) | Skip | Yes | Yes | Yes |
| **Reuse Gate** (check existing components) | Skip | Yes | Yes | Yes |
| **Scope-Confidence Audit** | Skip | Skip | Yes | Yes |
| **Architect Pass** (IGR) | Skip | Skip | Yes | Yes |
| **Logic Adversary** (different model critiques plan) | Skip | Skip | Yes | Yes |
| **Spec Generation + Approval** | Skip | Conditional | Yes (blocks for approval) | Yes (blocks for approval) |
| **Implementation Loop** | Yes | Yes | Yes | Yes |
| **Skeptical Evaluator** (separate agent grades work) | Skip | Yes | Yes | Yes |
| **Runtime Verification** (auto-generated tests) | Yes | Yes | Yes | Yes |
| **Wiring Validation** | Yes | Yes | Yes | Yes |
| **Standards Compliance** | Yes | Yes | Yes | Yes |
| **Quality Gates** | Yes | Yes | Yes | Yes |
|
|
72
|
+
|
|
73
|
+
A trivial L3 fix takes seconds. An L1 story goes through the full pipeline — research, planning, adversarial review, implementation, independent verification, quality gates. The AI doesn't choose what to skip; the pipeline enforces it based on task classification.
|
|
74
|
+
|
|
75
|
+
### Phase-Loaded Architecture
|
|
76
|
+
|
|
77
|
+
The pipeline instructions are split into 5 phase files loaded on-demand. A conversation that never reaches the coding phase never loads coding instructions — saving ~79% of prompt tokens. The PreToolUse hook blocks Edit/Write/Bash until the current phase's instruction file is read, ensuring the AI always has the right instructions loaded.
|
|
78
|
+
|
|
79
|
+
---
|
|
80
|
+
|
|
81
|
+
## Mechanical Enforcement: 12+ Gates That Cannot Be Bypassed
|
|
82
|
+
|
|
83
|
+
Every tool call in WogiFlow passes through the PreToolUse hook. This is real JavaScript code that executes before Claude Code processes any Edit, Write, Bash, Read, or Grep command. The gates are not suggestions — they are physical blocks.
|
|
84
|
+
|
|
85
|
+
| Gate | What It Enforces | Action on Violation |
|------|-----------------|---------------------|
| **Routing Gate** | Every message must go through /wogi-start first | Block all tools |
| **Phase Gate** | Tools restricted per workflow phase (no editing during research) | Block Edit/Write |
| **Phase-Read Gate** | Must read phase instructions before working | Block Edit/Write/Bash |
| **Scope Gate** | Edits must be within the task's declared file scope | Block Edit/Write |
| **Bugfix Scope Gate** | L3 bugfixes warn after 3 files, then block at a configurable threshold | Warn then block |
| **Scope Mutation Gate** | Fix tasks can't create new files; can't delete pre-existing files | Block Write/Bash |
| **Strike Gate** | After 3 failed verification attempts, blocks further edits | Block Edit/Write/Bash |
| **Deploy Gate** | Can't deploy without verification artifact | Block Bash |
| **Git Safety Gate** | Auto-backup before destructive git operations | Block or backup |
| **Commit-Log Gate** | Commits must have request-log entries | Block Bash (git commit) |
| **Manager Boundary Gate** | Manager repos can't modify worker repo source code | Block Edit/Write |
| **Component Reuse Gate** | Must check existing components before creating new ones | Warn on Write |
| **Standards Compliance** | Naming, security, decisions.md rules enforced | Block task completion |
| **Damage Control** | Configurable blocklist for dangerous commands/files | Block or ask |
|
|
101
|
+
|
|
102
|
+
**Example**: A developer asks Claude to "quickly fix the login bug." Without WogiFlow, Claude edits 8 files, skips tests, and says "done." With WogiFlow:
|
|
103
|
+
|
|
104
|
+
1. Routing gate forces `/wogi-start` classification → L3 bugfix
|
|
105
|
+
2. Bugfix scope gate warns after 3 files, blocks after threshold
|
|
106
|
+
3. Phase gate prevents editing during the research phase
|
|
107
|
+
4. Strike gate blocks further changes after 3 consecutive failed verification attempts
|
|
108
|
+
5. Standards gate verifies the fix follows project conventions
|
|
109
|
+
6. Commit-log gate ensures the change is logged before commit
|
|
110
|
+
|
|
111
|
+
---
|
|
112
|
+
|
|
113
|
+
## Intent-Grounded Reasoning (IGR)
|
|
114
|
+
|
|
115
|
+
For L1+ tasks, WogiFlow adds a reasoning layer that catches logic failures _before_ code is written.
|
|
116
|
+
|
|
117
|
+
### The Architect + Adversary Pattern
|
|
118
|
+
|
|
119
|
+
1. **Intent Framing**: The AI explicitly interprets the task — resolving ambiguous terms, identifying affected user journeys, surfacing assumptions
|
|
120
|
+
2. **Architect Pass**: A read-only sub-agent produces an 8-section plan (approach, data model, risks, alternatives, dependencies)
|
|
121
|
+
3. **Logic Adversary**: A _different model_ critiques the plan against a 10-principle Logic Constitution. Same-model self-critique is a known rubber-stamp failure mode — WogiFlow uses Sonnet to critique Opus plans (or vice versa)
|
|
122
|
+
4. **Iteration Loop**: If the Adversary finds issues, the plan goes back to the Architect. Max 3 rounds. If it still fails → task is blocked and surfaced to the user
|
|
123
|
+
|
|
124
|
+
This catches architectural mistakes, missing edge cases, and flawed assumptions before a single line of code is written.
|
|
125
|
+
|
|
126
|
+
### Completion Truth Gate
|
|
127
|
+
|
|
128
|
+
When the AI claims a task is "done," the Truth Gate audits every acceptance criterion against evidence tiers:
|
|
129
|
+
|
|
130
|
+
| Tier | Name | Counts as Done? |
|------|------|----------------|
| 0 | STATIC (compiles, lints) | Never |
| 1 | STRUCTURAL (file exists, imported) | Never |
| 2 | OBSERVATIONAL (page loads, renders) | Display-only criteria |
| 3 | INTERACTIVE (click → result persists) | Yes |
| 4 | AUTOMATED (test passes) | Yes (strongest) |
|
|
137
|
+
|
|
138
|
+
If the AI claims "done" with only Tier 0 evidence (it compiles), the gate downgrades the claim to "implemented (unverified)" and blocks task completion.
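The tier table can be read as a threshold check per acceptance criterion. The sketch below is one way to express it; the criterion shape (`id`, `displayOnly`, `evidence`) and the function names are hypothetical, and the actual Truth Gate may weigh evidence differently.

```javascript
// Evidence tiers from the table above.
const TIERS = { STATIC: 0, STRUCTURAL: 1, OBSERVATIONAL: 2, INTERACTIVE: 3, AUTOMATED: 4 };

// Hypothetical criterion shape: { id, displayOnly, evidence: ["STATIC", "AUTOMATED", ...] }
function criterionIsVerified(criterion) {
  const best = Math.max(-1, ...criterion.evidence.map((name) => TIERS[name]));
  // Display-only criteria accept Tier 2; everything else needs Tier 3 or higher.
  const required = criterion.displayOnly ? TIERS.OBSERVATIONAL : TIERS.INTERACTIVE;
  return best >= required;
}

function auditCompletion(criteria) {
  const unverified = criteria.filter((c) => !criterionIsVerified(c)).map((c) => c.id);
  return unverified.length === 0
    ? { status: "done", unverified }
    : { status: "implemented (unverified)", unverified };
}
```

A claim backed only by `STATIC` evidence comes back as "implemented (unverified)" along with the ids of the criteria that still need stronger evidence.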
|
|
139
|
+
|
|
140
|
+
---
|
|
141
|
+
|
|
142
|
+
## Verification: The Skeptical Evaluator
|
|
143
|
+
|
|
144
|
+
After implementation, WogiFlow spawns a separate sub-agent specifically tuned for skepticism:
|
|
145
|
+
|
|
146
|
+
- **Different model** than the implementer (prevents self-congratulation bias)
|
|
147
|
+
- **Prompted to find problems**, not praise
|
|
148
|
+
- **Grades each criterion**: PASS / PARTIAL / FAIL with file:line evidence
|
|
149
|
+
- **Iteration loop**: If issues found → fix → re-evaluate (max 3 rounds)
|
|
150
|
+
|
|
151
|
+
This is based on Anthropic's own harness design research: _"Separating the agent doing the work from the agent judging it is a strong lever"_ and _"tuning standalone evaluators toward skepticism is far more tractable than making a generator critical of its own work."_
|
|
152
|
+
|
|
153
|
+
---
|
|
154
|
+
|
|
155
|
+
## Enforced Testing: Auto-Generated Verification Tests
|
|
156
|
+
|
|
157
|
+
WogiFlow doesn't trust "it compiles" as proof that a feature works. For every task that changes code, the Runtime Verification Gate automatically generates and runs tests — frontend and backend — as part of the execution loop. This is ON by default, not opt-in.
|
|
158
|
+
|
|
159
|
+
### How It Works
|
|
160
|
+
|
|
161
|
+
For each acceptance criterion in the task spec:
|
|
162
|
+
1. **Classify**: Is this a UI behavior, API behavior, or internal logic?
|
|
163
|
+
2. **Generate**: Write a test that exercises the specific criterion
|
|
164
|
+
3. **Implement**: Write the actual code
|
|
165
|
+
4. **Run**: Execute the test — it must pass
|
|
166
|
+
5. **If fail** → debug, fix, re-run (max 5 retries)
|
|
167
|
+
6. **Persist**: Test stays in `tests/verification/` as a permanent regression guard
|
|
168
|
+
|
|
169
|
+
Over time, this builds an automated test suite from the actual use cases that were implemented — not hypothetical test cases, but tests that directly verify the features your team shipped.
|
|
170
|
+
|
|
171
|
+
### Frontend: Browser Verification
|
|
172
|
+
|
|
173
|
+
When changed files include UI code (`.tsx`, `.jsx`, `.vue`, `.svelte`, `.css`), WogiFlow generates browser-level tests:
|
|
174
|
+
|
|
175
|
+
**Method 1 — WebMCP (preferred)**: If a browser MCP server is connected, WogiFlow drives the actual browser:
|
|
176
|
+
1. Navigate to the affected page
|
|
177
|
+
2. Screenshot BEFORE the change
|
|
178
|
+
3. Perform the user action (click, type, submit)
|
|
179
|
+
4. Wait for async updates
|
|
180
|
+
5. Screenshot AFTER
|
|
181
|
+
6. Assert DOM state matches expected behavior
|
|
182
|
+
7. For state mutations: reload the page and verify changes persisted
|
|
183
|
+
|
|
184
|
+
**Method 2 — Playwright**: Auto-generates a Playwright test to `tests/verification/verify-{taskId}.spec.ts`. Persists as a CI-ready regression guard.
|
|
185
|
+
|
|
186
|
+
**Method 3 — User Checklist (fallback)**: When no browser automation is available, generates a specific checklist and blocks task completion until the user replies "verified."
|
|
187
|
+
|
|
188
|
+
**High-risk mutation detection**: When code contains `useMutation`, `invalidateQueries`, or `onMutate`, WogiFlow adds extra verification — wait 3 seconds after all criteria pass, reload the page, and confirm state survived the refetch cycle. This catches the #1 frontend false-completion: "it looked right but the server didn't persist it."
|
|
189
|
+
|
|
190
|
+
### Backend: API Integration Tests
|
|
191
|
+
|
|
192
|
+
When changed files include API code (`.controller.*`, `.service.*`, `.resolver.*`, `/routes/`, `/api/`, `.dto.*`, `.guard.*`, `.middleware.*`), WogiFlow generates integration tests:
|
|
193
|
+
|
|
194
|
+
- Makes actual HTTP requests to the running dev server
|
|
195
|
+
- Asserts status codes, response shapes, and field values
|
|
196
|
+
- For mutations (POST/PUT/PATCH/DELETE): re-fetches the resource to verify persistence
|
|
197
|
+
- Auth-protected endpoints get the auth token included automatically
|
|
198
|
+
- Tests persist in `tests/verification/api-verify-{taskId}.test.js`
|
|
199
|
+
|
|
200
|
+
### Fullstack: Boundary Verification
|
|
201
|
+
|
|
202
|
+
When both UI and API files change, WogiFlow generates BOTH browser and API tests and validates the boundary:
|
|
203
|
+
- The API test verifies the server accepts the payload shape the frontend sends
|
|
204
|
+
- The browser test verifies the UI correctly displays the response shape the server returns
|
|
205
|
+
- If either side fails → the frontend/backend contract is broken
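Choosing between frontend, backend, and fullstack verification comes down to classifying the changed file paths. A minimal sketch using the extensions and path markers listed in the sections above; the `classifyChange` function and its return labels are hypothetical.

```javascript
// UI extensions and API path markers, as listed in the frontend and backend sections.
const UI_EXTENSIONS = [".tsx", ".jsx", ".vue", ".svelte", ".css"];
const API_MARKERS = [
  ".controller.", ".service.", ".resolver.", "/routes/",
  "/api/", ".dto.", ".guard.", ".middleware.",
];

function classifyChange(changedFiles) {
  const hasUi = changedFiles.some((file) =>
    UI_EXTENSIONS.some((ext) => file.endsWith(ext))
  );
  const hasApi = changedFiles.some((file) =>
    API_MARKERS.some((marker) => file.includes(marker))
  );
  if (hasUi && hasApi) return "fullstack"; // both browser and API tests
  if (hasUi) return "frontend";            // browser tests only
  if (hasApi) return "backend";            // API integration tests only
  return "other";
}
```

Simple substring matching like this can misclassify unusual paths, so a real implementation would likely need stricter rules.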
|
|
206
|
+
|
|
207
|
+
### Banned Verification Methods
|
|
208
|
+
|
|
209
|
+
WogiFlow explicitly bans methods that give false confidence for UI tasks:
|
|
210
|
+
|
|
211
|
+
| Banned Method | Why It's Insufficient |
|---|---|
| `tsc --noEmit` passes | Type-correct code can have wrong runtime behavior |
| `vite build` succeeds | Build success says nothing about UX |
| `grep` deployed bundle | Function may exist but never execute |
| "I read the code and it's correct" | Author is the worst judge of own work |
|
|
217
|
+
|
|
218
|
+
### The Repeat Failure Protocol
|
|
219
|
+
|
|
220
|
+
When the same issue appears in 2+ consecutive attempts:
|
|
221
|
+
|
|
222
|
+
| Strike | What Happens |
|--------|-------------|
| 1 | Normal fix + evidence log |
| 2 | Mandatory root cause analysis BEFORE coding. Must change approach. Tier 3+ evidence required. |
| 3 | Hard block: Cannot mark done without screenshot/console evidence. Must explain what's different this time. |
| 4+ | Escalation: Acknowledge inability, suggest pair debugging with the developer. |
|
|
228
|
+
|
|
229
|
+
This prevents the AI from trying the same broken approach in a loop — a pattern that wastes hours on real projects.
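The escalation table maps a strike count to a required response. A small sketch of that mapping follows; the function name and the `level` and `requires` fields are hypothetical labels for the rows above.

```javascript
// Maps a strike count (1, 2, 3, 4+) to the response described in the table above.
function repeatFailureAction(strike) {
  if (!Number.isInteger(strike) || strike < 1) {
    throw new RangeError("strike must be a positive integer");
  }
  if (strike === 1) {
    return { level: "normal", requires: ["evidence log"] };
  }
  if (strike === 2) {
    return {
      level: "root-cause",
      requires: ["root cause analysis before coding", "changed approach", "tier 3+ evidence"],
    };
  }
  if (strike === 3) {
    return {
      level: "hard-block",
      requires: ["screenshot or console evidence", "explanation of what is different"],
    };
  }
  // Strike 4 and beyond: stop retrying and hand the problem back to a human.
  return { level: "escalate", requires: ["acknowledge inability", "suggest pair debugging"] };
}
```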
|
|
230
|
+
|
|
231
|
+
---
|
|
232
|
+
|
|
233
|
+
## Hybrid Mode: Smart Model Routing
|
|
234
|
+
|
|
235
|
+
Not every task needs the most expensive model. WogiFlow's hybrid mode lets Opus plan while cheaper models execute:
|
|
236
|
+
|
|
237
|
+
| Task Complexity | Executor | Token Savings |
|----------------|----------|---------------|
| Typo fix, config edit | Haiku / GPT-4o-mini | ~75% |
| New function, component | Sonnet / GPT-4o | ~60% |
| Documentation | Haiku | ~80% |
| Complex refactoring | Opus only | 0% (not delegated) |
|
|
243
|
+
|
|
244
|
+
**How it works:**
|
|
245
|
+
1. Opus analyzes the task and creates a detailed execution plan
|
|
246
|
+
2. You review and approve (or edit via `/wogi-hybrid-edit`)
|
|
247
|
+
3. The executor model (Haiku, Sonnet, Ollama, etc.) runs each step
|
|
248
|
+
4. Opus validates results — lint, typecheck, standards checks
|
|
249
|
+
5. Opus handles failures with escalation back to itself if needed
|
|
250
|
+
|
|
251
|
+
**Supports cloud and local models**: Haiku, Sonnet, GPT-4o, GPT-4o-mini, Gemini Flash/Pro, Ollama (Qwen3-Coder, DeepSeek Coder, Nemotron). Local models = free execution.
|
|
252
|
+
|
|
253
|
+
---
|
|
254
|
+
|
|
255
|
+
## Code Review: /wogi-review
|
|
256
|
+
|
|
257
|
+
WogiFlow's review is not "look at the code and give suggestions." It's a 5-phase verification pipeline:
|
|
258
|
+
|
|
259
|
+
1. **Verification Gates**: Spec verification, lint, typecheck, test execution
|
|
260
|
+
2. **AI Analysis**: Multi-pass or parallel code/logic/security/architecture review with adversarial minimum findings (the reviewer must find at least N issues — prevents rubber-stamping)
|
|
261
|
+
3. **Git-Verified Claim Checking**: Cross-references spec claims against actual git diff. If the spec promises a file that isn't in the diff → BLOCKED
|
|
262
|
+
4. **Standards Compliance [STRICT]**: Every finding checked against decisions.md, app-map.md, naming conventions. MUST_FIX violations block sign-off
|
|
263
|
+
5. **Post-Review Workflow**: Fix loop with automatic task creation for findings
|
|
264
|
+
|
|
265
|
+
The review produces a structured report with findings classified by severity (critical, high, medium, low) and type (MUST_FIX, SHOULD_FIX, SUGGESTION). Critical/high findings block the next release.
|
|
266
|
+
|
|
267
|
+
---
|
|
268
|
+
|
|
269
|
+
## Morning Briefing: /wogi-morning
|
|
270
|
+
|
|
271
|
+
Start every day with instant situational awareness:
|
|
272
|
+
|
|
273
|
+
- **Where you left off**: Current task, status, files touched
|
|
274
|
+
- **What happened since**: New commits, issues, changes by teammates
|
|
275
|
+
- **Rule violations**: Auto-promoted patterns awaiting enforcement decisions
|
|
276
|
+
- **Stale skills**: Documentation older than 90 days flagged for refresh
|
|
277
|
+
- **Recommended next tasks**: Top 3 by priority from the backlog
|
|
278
|
+
- **Ready-to-use prompt**: Copy-paste continuation prompt to resume immediately
|
|
279
|
+
|
|
280
|
+
This eliminates the 10-15 minute "where was I?" ramp-up that costs teams hours per week.
|
|
281
|
+
|
|
282
|
+
---
|
|
283
|
+
|
|
284
|
+
## Session End: /wogi-session-end
|
|
285
|
+
|
|
286
|
+
When you stop working, WogiFlow preserves everything:
|
|
287
|
+
|
|
288
|
+
- **Structured handoff notes**: What's completed, what's in progress, what's next
|
|
289
|
+
- **Cross-session pattern detection**: Identifies requests repeated 3+ times across sessions and suggests promoting them to permanent rules
|
|
290
|
+
- **State persistence**: All workflow state committed to git — survives restarts, crashes, and machine changes
|
|
291
|
+
- **Request log verification**: Ensures all changes are logged before closing
|
|
292
|
+
- **Component registry check**: Verifies new components are registered in app-map
|
|
293
|
+
|
|
294
|
+
The next session (or the next developer on the same repo) picks up exactly where you left off.
|
|
295
|
+
|
|
296
|
+
---
|
|
297
|
+
|
|
298
|
+
## Workspace Mode: Multi-Repo Orchestration
|
|
299
|
+
|
|
300
|
+
This is the game changer for enterprise teams. WogiFlow can manage multiple repositories from a single orchestrator.
|
|
301
|
+
|
|
302
|
+
### The Manager-Worker Architecture
|
|
303
|
+
|
|
304
|
+
```
Manager (workspace root — orchestrates, never codes)
│
├── Backend repo (provider — APIs, database)
├── Frontend repo (consumer — UI, pages)
├── Shared repo (library — types, utils)
└── Mobile repo (consumer — native app)
```
|
|
312
|
+
|
|
313
|
+
**How it works:**
|
|
314
|
+
1. You tell the manager: "Add user profile editing"
|
|
315
|
+
2. Manager reads metadata from all repos (API maps, component registries, schemas — never source code)
|
|
316
|
+
3. Manager analyzes which repos are affected and in what order
|
|
317
|
+
4. Manager creates phased execution plan: library → provider → consumer
|
|
318
|
+
5. Each worker repo gets a task dispatched via HTTP channel
|
|
319
|
+
6. Workers execute independently in their own Claude Code sessions
|
|
320
|
+
7. Workers communicate via message bus: contract changes, questions, completion signals
|
|
321
|
+
8. Manager validates cross-repo integration
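The phased plan in step 4 (library, then provider, then consumer) can be sketched as grouping repos by role in a fixed order. The repo shape (`name`, `role`) and the `planPhases` function are assumptions for illustration; how the manager actually derives roles and ordering is not described here.

```javascript
// Execution order from step 4: library first, then provider, then consumer.
const ROLE_ORDER = ["library", "provider", "consumer"];

// Hypothetical repo shape: { name, role }
function planPhases(affectedRepos) {
  return ROLE_ORDER
    .map((role) => ({
      role,
      repos: affectedRepos.filter((repo) => repo.role === role).map((repo) => repo.name),
    }))
    // Skip phases with no affected repos.
    .filter((phase) => phase.repos.length > 0);
}

// Example using the four repos from the architecture diagram.
const examplePlan = planPhases([
  { name: "backend", role: "provider" },
  { name: "frontend", role: "consumer" },
  { name: "shared", role: "library" },
  { name: "mobile", role: "consumer" },
]);
```

Repos within the same phase have no ordering between them in this sketch, so both consumers could be dispatched in parallel.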
|
|
322
|
+
|
|
323
|
+
### Mechanical Boundary Enforcement
|
|
324
|
+
|
|
325
|
+
The manager physically cannot modify worker repo source code. The Manager Boundary Gate blocks all Edit/Write operations on files inside member repos. The manager can only:
|
|
326
|
+
- Read metadata (api-map, app-map, schema-map, config)
|
|
327
|
+
- Dispatch tasks to workers
|
|
328
|
+
- Read messages from the message bus
|
|
329
|
+
- Validate cross-repo contracts
|
|
330
|
+
|
|
331
|
+
This prevents the orchestrator from becoming a bottleneck or making unauthorized changes.

### Cross-Repo Quality Gates

When workspace mode is active, additional gates enforce integration quality:
- **Contract Compliance**: Changes must comply with declared API contracts
- **Peer Notification**: Affected repos are automatically notified of changes
- **Cascade Verification**: Library changes trigger verification in all consumer repos
- **Impact Query**: Before implementing, workers can ask peers "will my change break you?"

### Agent-to-Agent Communication

Workers communicate through 11 message types, including:
- `contract-change` — "I changed an API endpoint"
- `question` — "Does your side handle X?"
- `impact-query` / `impact-response` — Pre-implementation impact assessment
- `lock-acquired` / `lock-released` — Shared interface edit coordination
- `verification-request` — "Please verify your integrations"
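
To make the message list concrete, here is a minimal sketch of a message envelope. Only the seven type names listed above come from the source; the remaining types are not shown there, and the envelope fields (`from`, `to`, `payload`, `sentAt`) and the `createMessage` helper are assumptions for illustration.

```javascript
// The seven message types named above; the source mentions more but does not list them.
const KNOWN_TYPES = new Set([
  "contract-change",
  "question",
  "impact-query",
  "impact-response",
  "lock-acquired",
  "lock-released",
  "verification-request",
]);

// Hypothetical envelope builder: rejects types that are not in the known set.
function createMessage(type, from, to, payload) {
  if (!KNOWN_TYPES.has(type)) {
    throw new Error(`unknown message type: ${type}`);
  }
  return { type, from, to, payload, sentAt: new Date().toISOString() };
}

const query = createMessage("impact-query", "frontend", "backend", {
  question: "Will renaming the profile endpoint break you?",
});
console.log(query.type, "from", query.from, "to", query.to);
```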

---

## Self-Improving Workflow

WogiFlow learns from corrections and promotes patterns to permanent rules:

1. **Feedback Patterns**: When you correct the AI, it records the pattern in `feedback-patterns.md`
2. **Promotion Threshold**: When a pattern occurs 3+ times, it's promoted to `decisions.md` (permanent project rules)
3. **Gate Telemetry**: Every gate tracks pass/catch/miss rates. A gate that consistently passes work that later needs correction has a high "miss rate" — revealing rubber-stamping
4. **Correction Memory**: The IGR system cross-references corrections back to gates that previously approved the flawed work
5. **Decision Authority**: The AI learns which decisions it can make autonomously vs which need human approval, calibrated per category
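
The promotion threshold in item 2 amounts to a counter per pattern. The following is a minimal sketch, assuming an in-memory `Map` and a hypothetical `recordCorrection` helper; how WogiFlow actually stores and matches patterns in `feedback-patterns.md` is not shown in the source.

```javascript
// "3+ times" per the list above.
const PROMOTION_THRESHOLD = 3;

// Hypothetical helper: count one more occurrence of a correction pattern
// and report whether it has reached the promotion threshold.
function recordCorrection(counts, pattern) {
  const occurrences = (counts.get(pattern) ?? 0) + 1;
  counts.set(pattern, occurrences);
  return { pattern, occurrences, promote: occurrences >= PROMOTION_THRESHOLD };
}

const counts = new Map();
const first = recordCorrection(counts, "use kebab-case file names");
const second = recordCorrection(counts, "use kebab-case file names");
const third = recordCorrection(counts, "use kebab-case file names");
console.log(first.promote, second.promote, third.promote);
```

A real implementation would also need to decide when two corrections count as the same pattern, which this sketch sidesteps by comparing exact strings.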

This means the more you use WogiFlow, the better it gets. Week 1 catches 60% of issues. Week 8 catches 90% — because the rules that caught the other 30% were learned from your corrections.

---

## Memory System

WogiFlow maintains persistent memory across sessions using SQLite with semantic search:

- **SQLite database** with HuggingFace Transformers embeddings (all-MiniLM-L6-v2)
- **Cosine similarity search**: "Find decisions related to authentication" works without exact keywords
- **Relevance decay**: Facts accessed frequently stay hot; unused facts decay over 30 days
- **Cold retention**: Facts not accessed in 90 days are archived
- **Auto-promotion**: Facts accessed 3+ times with high relevance are promoted to permanent knowledge
- **Structured registries**: app-map, function-map, api-map, schema-map, service-map — deterministic lookup for components, functions, endpoints, schemas
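
Cosine similarity search, as named in the list above, ranks stored facts by the angle between their embedding vectors and the query vector. The sketch below uses tiny hand-written vectors in place of real all-MiniLM-L6-v2 embeddings; `rankFacts`, the fact texts, and the vectors are illustrative assumptions, and the SQLite storage layer is left out.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Treat a zero vector as having no similarity to anything.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical ranking helper: score every fact and keep the topK best matches.
function rankFacts(queryVec, facts, topK = 3) {
  return facts
    .map((f) => ({ text: f.text, score: cosineSimilarity(queryVec, f.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}

// Toy three-dimensional "embeddings"; real ones would come from the embedding model.
const facts = [
  { text: "Use JWT for login sessions", embedding: [1, 0, 0] },
  { text: "Prefer kebab-case file names", embedding: [0, 1, 0] },
  { text: "Refresh tokens rotate on every use", embedding: [0.6, 0.8, 0] },
];

const ranked = rankFacts([0.9, 0.1, 0], facts, 2);
console.log(ranked.map((r) => r.text));
```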

---

## Why This Matters for Enterprise

### The Cost of Undisciplined AI Coding

| Problem | Without WogiFlow | With WogiFlow |
|---------|-----------------|---------------|
| Untracked changes | AI edits files without logging | Every change logged, tagged, auditable |
| Skipped verification | "It compiles" = "it works" | 5-tier evidence system, skeptical evaluator |
| Convention drift | Each session forgets project rules | Rules mechanically enforced, self-improving |
| Scope creep | Fix one bug, touch 15 files | Bugfix scope gate limits blast radius |
| Knowledge loss | Context lost between sessions | SQLite memory + structured handoffs |
| Unsafe deployments | Deploy without testing | Deploy gate blocks without verification |
| Team inconsistency | 10 developers, 10 styles | Standards gate enforces unified conventions |
| Wasted tokens | Full pipeline for typo fixes | Phase-loaded router: 79% savings for small tasks |
| Multi-repo chaos | Manual coordination across repos | Workspace orchestration with boundary enforcement |

### What Companies Get

1. **Auditability**: Every task is tracked from request through implementation to verification. The request log, git history, and verification artifacts create a complete audit trail.

2. **Consistency**: Mechanical enforcement means the AI follows the same process whether it's Monday morning or Friday 5pm, whether the developer is senior or junior.

3. **Scalability**: Hybrid mode reduces token costs by 60-75%. Workspace mode enables multi-repo orchestration. Phase loading optimizes prompt costs.

4. **Safety**: 12+ mechanical gates prevent unauthorized changes, scope creep, unsafe deployments, and convention violations. The AI literally cannot bypass them.

5. **Continuous improvement**: The self-learning system means the workflow gets better over time without manual rule maintenance. Corrections become permanent rules automatically.

6. **Knowledge retention**: Project knowledge persists in structured state files, semantic memory, and cross-session handoffs. New team members inherit the full learning history.

---

## Quick Start

```bash
npm install -D wogiflow
npx flow onboard
```

Onboarding analyzes the existing project, detects the tech stack, indexes components, and configures the workflow. First task can start in under 5 minutes.

---

*WogiFlow v2.15.0 — AGPL-3.0 Licensed*
*Teams/enterprise features available via @wogiflow/teams*
@@ -6,7 +6,7 @@ Instructions for the spec/approval phase. Loaded on-demand when phase transition

 **Conditional** — runs for L1+ tasks when IGR on. L3 skip. L2 runs only on ultrathink auto-bump.

-Spawn a **read-only sub-agent** (Explore subagent_type, with Read/Grep/Glob only — no Edit/Write/Bash) on a model chosen per `config.intentGroundedReasoning.architectPass.modelOverride`. Input: Framing Artifact from Step 1.15 + explore findings from Step 1.3 + scope-confidence audit from Step 1.45 + the Logic Constitution
+Spawn a **read-only sub-agent** (Explore subagent_type, with Read/Grep/Glob only — no Edit/Write/Bash) on a model chosen per `config.intentGroundedReasoning.architectPass.modelOverride`. Input: Framing Artifact from Step 1.15 + explore findings from Step 1.3 + scope-confidence audit from Step 1.45 + the Logic Constitution v2 rubric (so the Architect anticipates the Adversary's checks, including Principle 11 — Platform Capability Grounding, which demands citation + enforcement-preservation + alternative-ruled-out + fallback for every platform-capability claim).

 Build the prompt via `node scripts/flow-architect-pass.js prompt <task>`. Invoke via Agent tool. Output: an 8-section plan at `.workflow/plans/{taskId}.md` (PINs: approach, data-model, journey-impact, net-new, alternatives, risks, reversibility, dependencies). Parse via `parsePlanArtifact()`; if structural FAIL, re-prompt.

@@ -20,7 +20,7 @@ When IGR flag is OFF: SKIPPED. Pipeline proceeds from Step 1.45 directly to Step

 Spawn a **separate sub-agent on a different model** than the Architect (Sonnet when Architect is Opus; Opus when Architect is Sonnet — per `modelSeparation: different-from-architect`). Per Anthropic harness research, same-model self-critique is a known rubber-stamp failure mode.

-Build the prompt via `node scripts/flow-logic-adversary.js prompt .workflow/plans/{taskId}.md`. The Adversary critiques the plan against the
+Build the prompt via `node scripts/flow-logic-adversary.js prompt .workflow/plans/{taskId}.md`. The Adversary critiques the plan against the 11-principle Logic Constitution v2 with few-shot calibration examples from `.workflow/state/adversary-calibration.json`. Principle 11 (Platform Capability Grounding) always runs — every claim about hook/tool/API/platform behavior must be cited, enforcement must be preserved, an alternative must be named, and a fallback must be specified.

 Iteration loop (max 3 rounds by default):
 - `overallVerdict: PASS` or `PASS_WITH_CONCERNS` → proceed to Step 1.5. Concerns surface at approval gate (Step 1.6).
@@ -473,7 +473,7 @@ Run `node node_modules/wogiflow/scripts/flow-standards-gate.js wf-XXXXXXXX [chan

 Checks scoped by task type: component → naming/components/security. Utility → naming/functions/security. API → naming/api/security. Bugfix → naming/security. Feature → all. Refactor/migration → all + consumer-impact verification.

-**Consumer impact check** (ALL L1+ tasks):
+**Consumer impact check** (ALL L1+ tasks): The blast-radius analysis ran in the explore phase (Agent 6: Consumer Impact) and wrote results to `.workflow/state/blast-radius-{taskId}.json`. That file contains an array of consumer entries, each classified as BREAKING (must update), NEEDS-UPDATE (review), or SAFE (no change needed). Read the file and, for each entry with `classification: "BREAKING"`, verify the file listed in `path` was actually modified in this task's changeset (check `git diff --name-only`). If any BREAKING consumer is NOT in the diff → BLOCK task completion and surface to user.

**Reuse candidate check** (AI-as-Judge): Standards gate returns similar items from all registries. AI reasons about PURPOSE overlap (not just name). If purpose overlaps → ask user (use existing / extend / create new). If purpose clearly differs → proceed silently.
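
The consumer impact check added in this hunk reduces to a set comparison between BREAKING consumers and the changed files. The sketch below is one possible reading, not the shipped gate: `consumerImpactVerdict` and the sample data are hypothetical, and it assumes the consumer entries and the `git diff --name-only` output have already been loaded into arrays.

```javascript
// Hypothetical helper: list BREAKING consumers whose files are absent from the changeset.
function findUnmodifiedBreakingConsumers(consumers, changedFiles) {
  const changed = new Set(changedFiles);
  return consumers
    .filter((c) => c.classification === "BREAKING")
    .filter((c) => !changed.has(c.path))
    .map((c) => c.path);
}

// BLOCK when any BREAKING consumer was left untouched, otherwise PASS.
function consumerImpactVerdict(consumers, changedFiles) {
  const missing = findUnmodifiedBreakingConsumers(consumers, changedFiles);
  return { status: missing.length === 0 ? "PASS" : "BLOCK", missing };
}

// Sample data only; real entries would come from the blast-radius file,
// and changedFiles from the lines printed by `git diff --name-only`.
const consumers = [
  { path: "src/pages/Profile.tsx", classification: "BREAKING" },
  { path: "src/pages/Settings.tsx", classification: "BREAKING" },
  { path: "src/components/Avatar.tsx", classification: "NEEDS-UPDATE" },
  { path: "src/utils/format.ts", classification: "SAFE" },
];
const changedFiles = ["src/api/profile.ts", "src/pages/Profile.tsx"];

const verdict = consumerImpactVerdict(consumers, changedFiles);
console.log(verdict.status, verdict.missing);
```

Only BREAKING entries can block in this sketch; NEEDS-UPDATE and SAFE entries are ignored, which follows the wording of the added paragraph.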
@@ -11,9 +11,9 @@

 You are the **Logic Adversary** for WogiFlow's Intent-Grounded Reasoning layer.

-Your job is to find logic problems in a plan BEFORE any code is written. You are not critiquing code. You are not checking style, library choice, or syntax — other gates handle those. You are reasoning about whether this plan is **logically right** for the project it's proposed for.
+Your job is to find logic problems in a plan BEFORE any code is written. You are not critiquing code. You are not checking style, library choice, or syntax — other gates handle those. You are reasoning about whether this plan is **logically right** for the project it's proposed for AND whether its claims about the target platform (hooks, tool APIs, subagent model, MCP, etc.) are actually true.

-You have a specific rubric: the Logic Constitution (currently
+You have a specific rubric: the Logic Constitution (currently v2). You will receive it as input. For each of the 11 principles, you produce a verdict: PASS, CONCERN, FAIL, or SKIP. You cite specific evidence for every verdict. Verdicts without evidence are themselves failures of your job.

 ### What you are looking for

@@ -29,6 +29,11 @@ Patterns that produce logic failures in practice — seen in real agent session
 8. **Implicit-requirement blindness** — happy path only, no edge cases.
 9. **User-journey orphans** — dead-end screens, unreachable features.
 10. **Undocumented irreversibility** — destructive ops without confirmation.
+11. **Ungrounded platform-capability claims** — plans that rely on a hook, tool API, subagent behavior, MCP feature, or slash command working a certain way WITHOUT citation, WITHOUT enforcement-preservation evidence, WITHOUT a ruled-out alternative, or WITHOUT a capability-unavailable fallback. For every platform-capability claim, demand all four: citation, enforcement walk-through, alternative, fallback. Missing any of the four = FAIL. **Additionally, for runtime-behavior claims (hooks firing, tools returning specific shapes, signals being handled, events being emitted), hearsay-level citations — code comments or docs claiming "X does Y" — are NOT sufficient. Demand either O1 (a captured observation: log, telemetry, trace, test result) OR O2 (a named live-test plan that produces O1 before downstream code is built). See P11.1 in the rubric. A comment saying "the hook fires" is not evidence the hook fires; a log line showing it firing is.**
+
+**P11.2 — The same discipline applies to the PROJECT'S OWN RULES**, not just platform capabilities. For every artifact a plan produces (task IDs, file names, config values, state-file entries, spec structures, commit messages), demand: (E1) which rule from `decisions.md`, `feedback-patterns.md`, `.claude/rules/`, a schema, or a validator function applies? (E2) show the artifact satisfying the rule — run the validator, show the format side-by-side with the rule, paste the passing check — *not* "I followed it." (E3) what's the failure mode when violated? Examples of P11.2 violations: (a) "task ID `wf-test0001` follows WogiFlow convention" — no, the convention requires hex, this fails `validateTaskId()`; (b) "config key `taskBoundaryReset` is valid" without being in `flow-constants.js`'s known-keys list; (c) "file name `flowFoo.js` follows kebab-case" — it doesn't. Reflex: *what's the artifact? what rule governs it? is satisfaction SHOWN, not just claimed?*
+
+**P11.3 — Also check for EXISTING WOGIFLOW FEATURES that touch the same domain.** Before shipping any new mechanism (hook, wrapper, CLI entry, state file, config key, skill), enumerate the sibling surface: (S1) `grep -r "execSync\|spawn.*claude" lib/ scripts/`, check `.claude/commands/`, check `scripts/flow-constants.js`, check `lib/workspace.js` — does an existing feature already touch this domain? (S2) Show how the new mechanism composes, conflicts, or integrates with each sibling. "Orthogonal" is OK but must be asserted. (S3) If integration work is needed (e.g., the new wrapper needs to be injected into workspace's `execSync('claude')` call), include it in scope OR explicitly file a follow-up story. Silent omission of sibling integration = FAIL. Example violation caught live: `wogi-claude` wrapper initially missed that `lib/workspace.js:1612` spawns claude directly, so workspace-mode workers weren't restart-capable.

 ### What you are NOT looking for
