wogiflow 2.7.1 → 2.8.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.claude/commands/wogi-audit.md +156 -8
- package/.claude/commands/wogi-start.md +276 -0
- package/lib/workspace-channel-server.js +19 -4
- package/lib/workspace-routing.js +15 -5
- package/lib/workspace.js +10 -0
- package/package.json +1 -1
- package/scripts/flow-audit-gates.js +766 -0
- package/scripts/flow-runtime-verification.js +782 -0
package/.claude/commands/wogi-audit.md
CHANGED

@@ -44,11 +44,12 @@ node node_modules/wogiflow/scripts/flow-progress-tracker.js update '{"taskId":"a
 | Phase | phaseNum | Description |
 |-------|----------|-------------|
 | 1 | Gather Files | Scan project files |
+| 1.5 | Gate 0 | Pre-agent baseline checks (build, typecheck, lint, config integrity) |
 | 2 | Agents | 7 parallel agents (sub-steps = agents) |
-| 3 | Consolidate | Score calculation |
+| 3 | Consolidate | Score calculation + Gate 0 cap |
 | 4 | Pattern Promotion | AI clustering + cross-reference + gaps |
-| 5 | Report | Display formatted report |
-| 6 | Persist | Save to last-audit.json |
+| 5 | Report | Display formatted report with Gate 0 baseline |
+| 6 | Persist | Save to last-audit.json (includes Gate 0 data + trend) |
 
 **Display at each agent completion:**
 ```
@@ -68,6 +69,68 @@ node node_modules/wogiflow/scripts/flow-audit.js files
 
 This returns all tracked project files (excluding node_modules, dist, .workflow/state/, etc.). Use this as the base file set for all agents.
 
+### Step 1.5: Gate 0 — Pre-Agent Baseline Checks (MANDATORY)
+
+**Run BEFORE launching any analysis agents.** These are hard, verifiable checks — not AI judgment. They produce quantitative metrics that cap the final audit score.
+
+**Principle**: If the project doesn't build, doesn't pass typecheck, or has hundreds of linter errors — the score CANNOT be higher than D+, regardless of how elegant the architecture is. The foundation is broken.
+
+```bash
+node node_modules/wogiflow/scripts/flow-audit-gates.js run
+```
+
+This returns JSON with all gate results. Parse and display:
+
+```
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+GATE 0: PROJECT HEALTH BASELINE
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+
+BUILD:       ✓ passes | ✗ FAILS (cap: D)
+TYPECHECK:   ✓ 0 errors | ✗ N errors (cap: C/D+/D)
+LINT:        ✓ 0 errors, M warnings | ✗ N errors (cap: C)
+LINT CONFIG: ✓ no downgraded rules | ✗ N rules downgraded (-N pts)
+TESTS:       ✓ pass | ✗ FAIL | ○ no test script
+SCRIPTS:     ✓ all present | ✗ missing: build, test
+
+Extended:
+  eslint-disable comments: N (across M files)
+  Framework: React 18.x + TypeScript (monorepo)
+  Git health: 45 commits/30d, conventional commits: yes
+  Env hygiene: .env.example ✓, CI ✓
+
+Score cap: [GRADE] (reasons: ...)
+Trend: typecheck errors 939 → 412 (-527) ↑
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+```
+
+**Gate results feed into scoring (Step 3)**:
+- `gate0.cap.scoreCap` — maximum score the project can achieve
+- `gate0.cap.penalties` — points deducted from the agent-derived score
+- `gate0.eslintDisables` — passed to Consistency agent as context
+- `gate0.framework` — used to load framework-specific agent prompts
+- `gate0.trend` — shown in the final report for improvement tracking
+
+**If Gate 0 reveals critical issues** (build fails, >500 type errors), display a prominent warning before proceeding to agents:
+```
+⚠ CRITICAL BASELINE ISSUES DETECTED
+The project has fundamental health problems. Agent analysis will proceed
+but the overall score is capped at [GRADE] due to Gate 0 failures.
+```
+
+**Framework-specific agent prompts**: When `gate0.framework` detects a known framework, inject framework-specific checks into the relevant agents:
+
+| Framework | Agent | Additional Checks |
+|-----------|-------|-------------------|
+| **React** | Performance | useState count per component (>5 = re-render risk), React.memo usage ratio, inline objects in JSX .map(), useEffect without cleanup |
+| **React** | Architecture | God components (>1000 LOC), prop drilling depth, context provider nesting |
+| **Next.js** | Performance | Page bundle sizes, dynamic imports usage, ISR/SSR appropriate usage |
+| **Next.js** | Architecture | API route structure, middleware usage, server/client boundary |
+| **NestJS** | Architecture | Module structure, circular module deps, guard/interceptor coverage |
+| **NestJS** | Performance | Eager-loaded modules, missing caching decorators |
+
+The framework checks are appended to the existing agent prompts — they don't replace the universal checks.
+
 ### Step 2: Launch 7 Parallel Agents
 
 Launch ALL enabled agents as parallel `Task` calls in a single message. Each agent uses `subagent_type=Explore` and `model="sonnet"` (per decisions.md: use Sonnet for routine exploration).
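Editor's note — the cap derivation Gate 0 performs can be sketched as a pure function. This is an illustration only, not the actual implementation in `scripts/flow-audit-gates.js` (which this diff adds but does not show in full); the thresholds mirror the cap table given later in Step 3.

```javascript
// Sketch: derive the Gate 0 score cap and penalties from raw gate metrics.
// Thresholds follow the Step 3 table: build fail → 63 (D), typecheck >500 → 67 (D+),
// >100 → 73 (C), >50 → 77 (C+); lint >50 errors → 73 (C); -3 pts per downgraded rule (max -15).
function deriveScoreCap(gate) {
  let cap = 100;
  if (!gate.buildPasses) cap = Math.min(cap, 63);
  if (gate.typecheckErrors > 500) cap = Math.min(cap, 67);
  else if (gate.typecheckErrors > 100) cap = Math.min(cap, 73);
  else if (gate.typecheckErrors > 50) cap = Math.min(cap, 77);
  if (gate.lintErrors > 50) cap = Math.min(cap, 73);
  const penalties = Math.min(gate.downgradedRules.length * 3, 15);
  return { scoreCap: cap, penalties };
}
```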
@@ -91,8 +154,13 @@ Analyze the architecture of this project.
 - Route handlers containing business logic (>50 LOC)
 - Utility files importing domain-specific modules
 4. Find god files (files with >300 LOC or >10 exported functions)
-5. Check for circular dependencies between modules
+5. Check for circular dependencies between modules (import cycles)
 6. Identify missing abstractions (repeated patterns that could be extracted)
+7. **Dead export scan**: For every exported function/component/type, grep for importers.
+   Report exports with ZERO importers — these are dead code at the module boundary.
+   Count total dead exports and list the top 10 by file.
+8. **If React detected** (from Gate 0 framework): Flag components with >5 useState as
+   re-render risks, check React.memo usage ratio, identify prop drilling depth >3
 
 Return a structured report with:
 - Strengths (good patterns found)
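Editor's note — the dead-export scan in step 7 reduces to a set difference. A minimal sketch, assuming hypothetical pre-collected inputs (`exportsByFile`, `importedNames`); the real agent greps the repository instead:

```javascript
// Sketch: report exports with zero importers (dead code at the module boundary).
function findDeadExports(exportsByFile, importedNames) {
  const used = new Set(importedNames);
  const dead = [];
  for (const [file, names] of Object.entries(exportsByFile)) {
    for (const name of names) {
      // No file anywhere imports this name → dead export
      if (!used.has(name)) dead.push({ file, name });
    }
  }
  return dead;
}
```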
@@ -117,9 +185,14 @@ Audit the project's dependencies.
 4. Check for known security vulnerabilities:
    - Run: node node_modules/wogiflow/scripts/flow-audit.js audit
    → This runs npm audit and returns structured results
+5. **Dependency health** (enhanced):
+   - Major versions behind: packages that are 2+ majors behind (HIGH)
+   - License risk: GPL/AGPL in commercial projects, or UNLICENSED packages
+   - Bundle size outliers: dependencies >500KB that could be replaced with lighter alternatives
+   - Duplicate packages: same package at multiple versions in the tree
 
 Return:
-- Dependencies summary (total, outdated, vulnerable)
+- Dependencies summary (total, outdated, vulnerable, deprecated, license issues)
 - Each finding tagged [HIGH/MED/LOW]
 - Score: A through F
 ```
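Editor's note — the "2+ majors behind" check in step 5 can be sketched as below. The `deps` shape is a hypothetical pre-parsed form (e.g. derived from `npm outdated --json`), not an API this package defines:

```javascript
// Sketch: flag dependencies that are 2 or more major versions behind (HIGH severity).
function majorsBehind(current, latest) {
  // Strip range prefixes like ^ or ~ before reading the major version
  const major = (v) => parseInt(v.replace(/^[^0-9]*/, ''), 10);
  return major(latest) - major(current);
}

function flagOutdated(deps) {
  // deps: [{ name, current, latest }] → names that are 2+ majors behind
  return deps.filter((d) => majorsBehind(d.current, d.latest) >= 2).map((d) => d.name);
}
```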
@@ -201,10 +274,20 @@ Audit consistency of patterns across the project.
 5. Configuration patterns:
    - Are config values accessed consistently?
    - Any hardcoded values that should be configurable?
+6. **eslint-disable comment census** (from Gate 0 data):
+   - Gate 0 provides the total count and top files
+   - Each eslint-disable is a suppressed violation — a high count (>50) indicates
+     hidden technical debt through suppression
+   - Flag files with >5 eslint-disable comments as consistency violations
+7. **Lint config integrity** (from Gate 0 data):
+   - If Gate 0 detected downgraded rules, include them as [HIGH] consistency findings
+   - This is "configuration-level debt masking" — making the project appear clean
+     by lowering standards instead of fixing code
 
 Return:
 - Consistency findings, each tagged [HIGH/MED/LOW]
 - Dominant patterns vs outliers
+- eslint-disable count and top offenders
 - Score: A through F
 ```
 
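Editor's note — the eslint-disable census in step 6 amounts to counting suppression comments per file. A sketch over in-memory sources (Gate 0 computes the real numbers by scanning the repo):

```javascript
// Sketch: count eslint-disable comments per file and flag heavy offenders.
function eslintDisableCensus(sources) {
  // sources: { [file]: contents } → { total, perFile, flagged }
  const perFile = {};
  for (const [file, text] of Object.entries(sources)) {
    const matches = text.match(/eslint-disable(-next-line|-line)?/g) || [];
    if (matches.length > 0) perFile[file] = matches.length;
  }
  const total = Object.values(perFile).reduce((a, b) => a + b, 0);
  // >5 suppressions in one file = consistency violation per the rule above
  const flagged = Object.keys(perFile).filter((f) => perFile[f] > 5);
  return { total, perFile, flagged };
}
```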
@@ -252,11 +335,26 @@ Catalog technical debt in this project.
 5. Cross-reference with existing tech debt:
    - Read .workflow/state/tech-debt.json if it exists
    - Identify new debt vs already-tracked debt
+6. **Test coverage reality check** (from Gate 0 data):
+   - Test file ratio: N test files / M source files (ideal: >30%)
+   - If coverage report is available: line/branch coverage %
+   - 0% test coverage + complex business logic = [HIGH] tech debt
+7. **Git health indicators** (from Gate 0 data):
+   - Commit frequency: active/inactive
+   - Stale branches (unmerged >30 days)
+   - Commit message quality (conventional commits?)
+   - Large uncommitted changes count
+8. **Environment/config hygiene** (from Gate 0 data):
+   - .env.example missing when .env exists
+   - No CI configuration = no automated quality enforcement
+   - Secrets patterns in tracked files
 
 Return:
 - Tech debt items, each tagged [HIGH/MED/LOW]
 - Summary: TODOs count, FIXMEs count, HACKs count
 - Commented-out code blocks count
+- Test coverage metrics
+- Git health summary
 - Score: A through F
 ```
 
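Editor's note — the test-file ratio in step 6 is a simple count. A sketch assuming `.test.`/`.spec.` naming conventions (the real metric comes from Gate 0's file scan):

```javascript
// Sketch: compute the test-file ratio and compare against the >30% ideal.
function testFileRatio(files) {
  const isTest = (f) => /\.(test|spec)\.[jt]sx?$/.test(f) || f.includes('/__tests__/');
  const testFiles = files.filter(isTest).length;
  const sourceFiles = files.length - testFiles;
  const ratio = sourceFiles === 0 ? 0 : (testFiles / sourceFiles) * 100;
  return { testFiles, sourceFiles, ratio: `${ratio.toFixed(1)}%`, ideal: ratio > 30 };
}
```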
@@ -286,11 +384,39 @@ Return:
 - Score: A through F
 ```
 
-### Step 3: Consolidate Results
+### Step 3: Consolidate Results + Apply Score Cap
 
 After all agents complete, consolidate into a single report.
 
-**
+**3.1. Calculate weighted agent score:**
+```bash
+node node_modules/wogiflow/scripts/flow-audit.js score '{"architecture":"B+","dependencies":"A-",...}'
+```
+
+**3.2. Apply Gate 0 score cap:**
+```
+Final score = min(gate0_cap, weighted_agent_score - gate0_penalties)
+```
+
+| Gate 0 Result | Score Cap |
+|--------------|-----------|
+| Build fails | max D (63) |
+| Typecheck >500 errors | max D+ (67) |
+| Typecheck >100 errors | max C (73) |
+| Typecheck >50 errors | max C+ (77) |
+| Lint >50 errors | max C (73) |
+| Lint config manipulation | -3 pts per downgraded rule (max -15) |
+
+**Example**: Agents score B (83), but build fails → capped at D (63). Agents score B+ (87), but lint config has 4 downgraded rules → 87 - 12 = 75 → C+.
+
+**3.3. Include extended metrics in the report:**
+- eslint-disable comment count (from Gate 0)
+- Dead export count (from agent scan)
+- Test file ratio (from Gate 0)
+- Git health indicators (from Gate 0)
+
+**3.4. Trend delta (if previous audit exists):**
+Compare current metrics with `last-audit.json`. Show improvement/regression arrows.
 
 ### Step 4: Display Report
 
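Editor's note — the 3.2 formula is easy to misread, so here it is as code. The numeric-to-letter bands are an assumption (floor bands inferred from the cap table values: D=63, D+=67, C=73, C+=77, ...); `flow-audit.js` may round differently, e.g. the example above reports 75 as C+:

```javascript
// Sketch: combine the agent-derived score with the Gate 0 cap and penalties.
const GRADES = [
  [97, 'A+'], [93, 'A'], [90, 'A-'], [87, 'B+'], [83, 'B'], [80, 'B-'],
  [77, 'C+'], [73, 'C'], [70, 'C-'], [67, 'D+'], [63, 'D'], [0, 'F']
];
const toGrade = (n) => GRADES.find(([min]) => n >= min)[1];

function finalScore(agentScore, gate0) {
  // Final score = min(gate0_cap, weighted_agent_score - gate0_penalties)
  const capped = Math.min(gate0.scoreCap, agentScore - gate0.penalties);
  return {
    numeric: capped,
    grade: toGrade(capped),
    cappedByGate0: capped < agentScore - gate0.penalties
  };
}
```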
@@ -301,7 +427,11 @@ PROJECT AUDIT REPORT
 
 Project: [name] | Files scanned: N | Date: YYYY-MM-DD
 
-
+GATE 0 BASELINE:
+  Build: ✓/✗ | Typecheck: N errors | Lint: N errors, M warnings
+  Score cap: [GRADE] | Penalties: -N pts | Framework: [detected]
+
+HEALTH SCORE: [A/B/C/D/F] (capped by Gate 0 from agent score of [X])
 
 ━━━ ARCHITECTURE (score: X) ━━━
 Strengths:
@@ -493,6 +623,24 @@ Regardless of user choice, always save the audit results to `.workflow/state/las
 {
   "date": "YYYY-MM-DD",
   "overallScore": "B+",
+  "gate0": {
+    "buildPasses": true,
+    "typecheckErrors": 0,
+    "lintErrors": 0,
+    "lintWarnings": 12,
+    "downgradedRules": [],
+    "testsPassing": true,
+    "missingScripts": [],
+    "eslintDisableCount": 23,
+    "scoreCap": 100,
+    "penalties": 0,
+    "framework": { "name": "react", "version": "18.2.0" },
+    "gitHealth": { "recentCommits": 45, "staleBranches": 2, "conventionalCommits": true },
+    "envHygiene": { "envExample": true, "ciConfigured": true },
+    "testCoverage": { "testFiles": 34, "sourceFiles": 120, "ratio": "28.3%" }
+  },
+  "agentScore": "B+",
+  "scoreCappedBy": null,
   "scores": {
     "architecture": "B+",
     "dependencies": "A-",
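Editor's note — the `Trend:` line in the Gate 0 report can be derived by diffing the new `gate0` block against the previous `last-audit.json`. A sketch (field shapes assumed from the JSON above; note the arrow points up when error counts drop, since fewer errors is an improvement):

```javascript
// Sketch: render one trend line, e.g. "typecheck errors: 939 → 412 (-527) ↑".
function trendLine(metric, prev, curr) {
  if (prev === undefined || prev === null) return `${metric}: ${curr} (no previous audit)`;
  const delta = curr - prev;
  const arrow = delta < 0 ? '↑' : delta > 0 ? '↓' : '→'; // fewer errors = improvement
  return `${metric}: ${prev} → ${curr} (${delta >= 0 ? '+' : ''}${delta}) ${arrow}`;
}
```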
package/.claude/commands/wogi-start.md
CHANGED

@@ -560,6 +560,282 @@ After implementing all scenarios, BEFORE quality gates:
 
 **Why this works**: The evaluator has NO emotional investment in the code. It reads the spec and the diff cold. It's explicitly prompted to be skeptical. And because it's a separate sub-agent, it has a fresh context — no accumulated "I already know this works" bias from the implementation phase.
 
+### Step 3.58: Runtime Verification Gate — Auto-Test Generation (MANDATORY)
+
+**Activates when**: ANY code file is changed. This is the DEFAULT — not optional.
+
+Run detection: `node node_modules/wogiflow/scripts/flow-runtime-verification.js task-type [changed-files...]`
+
+This returns the task type: `frontend`, `backend`, `fullstack`, or `other`. For `frontend` and `fullstack`, UI browser tests are generated. For `backend` and `fullstack`, API integration tests are generated. For `other`, standard static verification applies.
+
+**The problem this solves**: AI workers mark tasks as "done" based on static evidence (TypeScript compiles, build succeeds) without verifying the feature actually works end-to-end. This leads to repeated failed iterations. Auto-generated tests catch these failures BEFORE the user does.
+
+**DEFAULT BEHAVIOR**: For every task, WogiFlow auto-generates and runs verification tests as part of the execution loop. Tests are written to `tests/verification/` and persist as regression guards. This is ON by default — disable with `config.runtimeVerification.enabled: false`.
+
+#### Auto-Test Generation Flow
+
+```
+For EACH acceptance criterion in the spec:
+  1. Classify: Is this a UI behavior, API behavior, or internal logic?
+  2. Generate: Write a test that exercises the criterion
+  3. Implement: Write the actual code
+  4. Run: Execute the test — it MUST pass
+  5. If FAIL → debug, fix, re-run (max 5 retries)
+  6. Persist: Test file stays in tests/verification/ as regression guard
+```
+
+**This is NOT TDD** (where tests come first and must fail initially). This is **post-implementation verification** — the test is generated from the criterion, the code is written, then the test validates the code works. The key difference: TDD tests are written before code; verification tests are written alongside code and run after.
+
+---
+
+#### FRONTEND: Browser Test Generation (Playwright + WebMCP)
+
+**Activates when**: Changed files match `*.tsx`, `*.jsx`, `*.vue`, `*.svelte`, `*.css`, `*.styled.*`
+
+**The problem this solves**: AI workers mark UI tasks as "done" based on static evidence without ever opening a browser. (See: Pipeline Rules case study — 5 failed iterations, same bug.)
+
+**BANNED verification methods** — these NEVER count as evidence for UI tasks:
+
+| Banned Method | What it proves | Why it's insufficient |
+|---|---|---|
+| `grep` deployed bundle for function names | Code included in build | Function may never execute or render wrong |
+| `tsc --noEmit` passes | Types are correct | Type-correct code can have wrong runtime behavior |
+| `vite build` succeeds | Modules resolve | Build success says nothing about UX |
+| "I read the code and it's logically correct" | Nothing | Author is worst possible judge of own work |
+| `aws s3 sync` completes | Files hosted | Hosting ≠ functioning |
+
+**Evidence Tiers** — every verification claim must be classified:
+
+| Tier | Name | Sufficient alone? |
+|---|---|---|
+| 0 | STATIC (compile, build, lint) | NEVER |
+| 1 | STRUCTURAL (file exists, imported, route registered) | NEVER |
+| 2 | OBSERVATIONAL (page loads, feature renders) | Yes (display-only) |
+| 3 | INTERACTIVE (click/type/submit → observed result persists) | Yes (behavioral) |
+| 4 | AUTOMATED (Playwright/WebMCP test passes) | Yes (strongest) |
+
+**Minimum: Tier 2 for display criteria, Tier 3 for behavioral criteria.**
+
+#### Verification Method Selection
+
+Run: `node node_modules/wogiflow/scripts/flow-runtime-verification.js method`
+
+**Priority order** (use the first available):
+
+**1. WebMCP Browser Verification (DEFAULT — preferred)**
+
+When `config.webmcp.enabled` or a browser MCP server is detected in `.mcp.json`:
+
+For EACH acceptance criterion:
+1. Navigate to the affected page via `mcp_browser_navigate`
+2. Screenshot BEFORE: `mcp_browser_screenshot()`
+3. Perform the user action (click, type, select, submit)
+4. Wait 2-3 seconds for async updates
+5. Screenshot AFTER: `mcp_browser_screenshot()`
+6. Assert DOM state: `mcp_browser_evaluate("document.querySelector(...)")`
+7. Record in Behavioral Evidence Log
+
+**High-risk tasks** (state mutation detected — useMutation, invalidateQueries, onMutate):
+- After all criteria verified, wait 3 seconds
+- Screenshot again — check state persisted after refetch
+- Reload page: `mcp_browser_navigate` to same URL
+- Wait for networkidle
+- Screenshot — check state survived page reload
+- If state reverted → the server didn't persist, or refetch overwrote it → FAIL
+
+**2. Playwright Test Generation (secondary)**
+
+When Playwright/Puppeteer is in dependencies but no WebMCP:
+
+1. Auto-generate a Playwright test from acceptance criteria
+2. Write test to `tests/verification/verify-{taskId}.spec.ts`
+3. Instruct the user: "Run `npx playwright test tests/verification/verify-{taskId}.spec.ts --headed` to verify"
+4. If the project has CI, the test persists as a regression guard
+
+**3. User Verification Checklist (fallback — always available)**
+
+When neither WebMCP nor Playwright is available:
+
+Present a checklist to the user:
+```
+━━━ USER VERIFICATION CHECKLIST ━━━
+I cannot verify UI behavior from the CLI. Please check:
+
+□ 1. Navigate to [page]
+□ 2. [criterion 1 — specific action + expected result]
+□ 3. [criterion 2 — specific action + expected result]
+□ Wait 3 seconds after each action
+□ Refresh the page and verify changes persisted
+
+Reply "verified" when all checks pass, or describe what's broken.
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+```
+
+**CRITICAL**: The agent MUST wait for the user's "verified" response before marking the task complete. Do NOT proceed to quality gates without verification.
+
+#### Behavioral Evidence Log (BEL)
+
+Before marking ANY UI task complete, produce a BEL:
+
+```
+━━━ BEHAVIORAL EVIDENCE LOG ━━━
+Task: wf-XXXXXXXX
+Method: WEBMCP / PLAYWRIGHT / USER_CHECKLIST
+Verified on: localhost:5173
+
+CRITERION: "[text]"
+ACTION: Clicked "Route To" cell, selected "Design Department"
+EXPECTED: Cell updates to show "Design DEPARTMENT"
+OBSERVED: Cell shows "Design DEPARTMENT" with blue icon
+WAIT: 3 seconds — state persisted after refetch
+VERDICT: PASS
+EVIDENCE: Tier 3 (INTERACTIVE)
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+```
+
+The OBSERVED field MUST describe what was SEEN, not what the code theoretically produces.
+
+#### Pre-Implementation "See Before You Touch" (modification tasks)
+
+For tasks modifying existing UI (not greenfield):
+1. Start dev server if not running
+2. Navigate to the affected page
+3. Screenshot/observe current state (BEFORE)
+4. Document the baseline
+5. Then implement changes
+6. After implementation, compare BEFORE vs AFTER
+
+#### Repeat Failure Protocol (Groundhog Day Detector)
+
+When the SAME issue is reported in 2+ consecutive dispatches:
+
+| Strike | Action |
+|--------|--------|
+| 1 | Normal fix + BEL |
+| 2 | MANDATORY root cause analysis BEFORE coding. Change approach. Add console.log tracing. Tier 3+ evidence required. |
+| 3 | HARD BLOCK: Cannot mark done without screenshot/console evidence. Must state what's DIFFERENT this time. |
+| 4+ | ESCALATION: Acknowledge inability, suggest pair debugging with user. |
+
+Run: `node node_modules/wogiflow/scripts/flow-runtime-verification.js repeat wf-XXXXXXXX`
+
+#### Devil's Advocate Prompt
+
+Before marking ANY task complete (frontend or backend), ask yourself:
+
+> "Assume this is broken. What are the 3 most likely ways it could fail?"
+
+Then CHECK each one:
+1. Does the API actually accept these fields? (curl it or check the DTO)
+2. Does the response include the fields I'm reading? (log the response)
+3. Does the UI update persist after refetch/re-render? (wait 3 seconds and look again)
+4. Is the request payload shape what the server expects? (compare DTO with frontend fetch)
+
+If ANY is plausible and not verified → investigate before marking done.
+
+---
+
+#### BACKEND: API Integration Test Generation
+
+**Activates when**: Changed files match `*.controller.*`, `*.service.*`, `*.resolver.*`, `/routes/`, `/api/`, `*.dto.*`, `*.guard.*`, `*.middleware.*`
+
+Run detection: `node node_modules/wogiflow/scripts/flow-runtime-verification.js api-detect [changed-files...]`
+
+**For EACH acceptance criterion that involves an API endpoint**:
+
+1. **Identify the endpoint**: method (GET/POST/PUT/PATCH/DELETE), path, expected request/response shape
+2. **Generate an integration test** that:
+   - Makes the actual HTTP request to the running dev server
+   - Asserts the status code matches expected
+   - Asserts the response body contains expected fields
+   - For mutations (POST/PUT/PATCH/DELETE): re-fetches the resource to verify persistence
+   - For auth-protected endpoints: includes the auth token
+3. **Write the test** to `tests/verification/api-verify-{taskId}.test.js`
+4. **Run the test**: `node --test tests/verification/api-verify-{taskId}.test.js`
+5. **If test fails** → debug, fix the implementation, re-run (max 5 retries)
+6. **Test persists** as a regression guard
+
+**API Test Template** (generated per criterion):
+
+```javascript
+it('POST /api/pipeline-rules — creates a rule with correct fields', async () => {
+  const res = await apiRequest('POST', '/api/pipeline-rules', {
+    tagPattern: 'animation',
+    routeTo: { type: 'department', id: 'dept-123' },
+    mode: 'CLAIMABLE'
+  });
+
+  // Status check
+  assert.equal(res.status, 201);
+
+  // Response shape check
+  assert.ok(res.data.id, 'Response missing field: id');
+  assert.equal(res.data.tagPattern, 'animation');
+  assert.equal(res.data.mode, 'CLAIMABLE');
+
+  // Persistence check: re-fetch and verify stored
+  const verify = await apiRequest('GET', `/api/pipeline-rules/${res.data.id}`);
+  assert.equal(verify.status, 200);
+  assert.equal(verify.data.tagPattern, 'animation');
+});
+```
+
+**Boundary verification** (frontend↔backend):
+When the task is `fullstack` (both UI and API files changed):
+1. Generate BOTH browser tests AND API tests
+2. The API test verifies the server accepts the payload shape the frontend sends
+3. The browser test verifies the UI correctly displays the response shape the server returns
+4. If either fails → the boundary contract is broken
+
+**Quick verification via curl** (for manual checking):
+The AI can also generate and run curl commands directly:
+```bash
+# Create a rule
+curl -s -X POST http://localhost:3000/api/pipeline-rules \
+  -H "Content-Type: application/json" \
+  -d '{"tagPattern":"animation","routeTo":{"type":"department","id":"dept-123"},"mode":"CLAIMABLE"}'
+
+# Verify it was stored
+curl -s http://localhost:3000/api/pipeline-rules | jq '.[-1]'
+```
+
+---
+
+#### Configuration
+
+```json
+{
+  "runtimeVerification": {
+    "enabled": true,
+    "autoGenerateTests": true,
+    "frontend": {
+      "method": "webmcp",
+      "fallback": ["playwright", "checklist"],
+      "devServerUrl": "http://localhost:5173"
+    },
+    "backend": {
+      "method": "api-test",
+      "fallback": ["curl", "checklist"],
+      "baseUrl": "http://localhost:3000"
+    },
+    "testOutput": "tests/verification",
+    "persistTests": true,
+    "blockOnFailure": true
+  }
+}
+```
+
+**`autoGenerateTests: true`** (default) — Tests are generated for EVERY task. This is the core behavioral change: verification is not an afterthought, it's built into the execution loop.
+
+**`persistTests: true`** (default) — Generated tests stay in `tests/verification/` as permanent regression guards. Over time, this builds an automated test suite from the actual use cases that were implemented.
+
+**`blockOnFailure: true`** (default) — If generated tests fail, the task is NOT complete. The agent must fix the implementation until tests pass.
+
+#### Skip Conditions
+
+- `config.runtimeVerification.enabled: false` → skip entirely (not recommended)
+- Task has NO code files in changed set (docs-only, config-only) → skip
+- Task is L3 trivial AND no UI/API files → skip
+
 ### Step 3.6: Integration Wiring Validation (MANDATORY)
 
 Run `node node_modules/wogiflow/scripts/flow-wiring-verifier.js wf-XXXXXXXX`
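Editor's note — the task-type detection at the top of Step 3.58 can be sketched from the file patterns the section lists. This is an approximation of what `flow-runtime-verification.js task-type` returns, not its actual source (which this diff does not show):

```javascript
// Sketch: classify a change set as frontend / backend / fullstack / other,
// using the UI and API file patterns named in Step 3.58.
const UI_RE = /\.(tsx|jsx|vue|svelte|css)$|\.styled\./;
const API_RE = /\.(controller|service|resolver|dto|guard|middleware)\.|\/routes\/|\/api\//;

function detectTaskType(changedFiles) {
  const ui = changedFiles.some((f) => UI_RE.test(f));
  const api = changedFiles.some((f) => API_RE.test(f));
  if (ui && api) return 'fullstack';
  if (ui) return 'frontend';
  if (api) return 'backend';
  return 'other';
}
```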
package/lib/workspace-channel-server.js
CHANGED

@@ -462,17 +462,32 @@ const server = http.createServer(async (req, res) => {
     return;
   }
 
-  // Determine sender from header or default
-  const
+  // Determine sender from header or default (validate against name pattern)
+  const rawFrom = req.headers['x-wogi-from'] || '';
+  const from = VALID_NAME_PATTERN.test(rawFrom) ? rawFrom : 'workspace-manager';
+
+  // Parse effort level prefix: [effort:high] /wogi-start wf-xxx
+  let effortLevel = '';
+  let cleanBody = body;
+  const effortMatch = body.match(/^\[effort:(low|medium|high)\]\s*/);
+  if (effortMatch) {
+    effortLevel = effortMatch[1];
+    cleanBody = body.substring(effortMatch[0].length);
+  }
 
   // Forward as channel notification to Claude Code
   const meta = {
     from,
     port: String(PORT),
     repo: REPO_NAME,
-    receivedAt: new Date().toISOString()
+    receivedAt: new Date().toISOString(),
+    ...(effortLevel && { effortLevel })
   };
-
+  // Send the clean body (without effort prefix) but include effort in meta
+  const notificationBody = effortLevel
+    ? `${cleanBody}\n\n[System: Apply reasoning effort level "${effortLevel}" to this task — propagated from workspace manager]`
+    : cleanBody;
+  sendChannelNotification(notificationBody, meta);
 
   // Also broadcast to SSE subscribers
   if (sseClients.size > 0) {
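Editor's note — the `[effort:...]` prefix protocol in this release has two halves: the dispatcher in workspace-routing.js builds the prefix, and the channel server above strips it into `meta`. A self-contained sketch of the round trip, extracted from the diff:

```javascript
// Sketch: dispatcher side — prepend a validated effort level to the channel body.
function buildDispatchBody(taskId, effortLevel) {
  const VALID = new Set(['low', 'medium', 'high']);
  const prefix = VALID.has(effortLevel) ? `[effort:${effortLevel}] ` : '';
  return `${prefix}/wogi-start ${taskId}`;
}

// Sketch: server side — strip the prefix back out and surface it as metadata.
function parseEffort(body) {
  const m = body.match(/^\[effort:(low|medium|high)\]\s*/);
  return m
    ? { effortLevel: m[1], cleanBody: body.slice(m[0].length) }
    : { effortLevel: '', cleanBody: body };
}
```

The round trip is lossless: `parseEffort(buildDispatchBody(id, level))` recovers both the level and the original `/wogi-start` command.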
package/lib/workspace-routing.js
CHANGED
@@ -222,7 +222,9 @@ After completing the task:
     agentConfig: {
       description: `${repoName}: ${task.substring(0, 50)}...`,
       // Named subagent (2.1.88+): shows in @ mention typeahead as @repoName
-      name: repoName
+      name: repoName,
+      // Propagate reasoning effort from manager to worker
+      ...(process.env.WOGI_EFFORT_LEVEL && { model_options: { effort: process.env.WOGI_EFFORT_LEVEL } })
     }
   };
 }
@@ -704,9 +706,11 @@ function checkWorkerHealth(port) {
  * @param {string} workspaceRoot
  * @param {string} repoName — target repo
  * @param {string} taskId — task ID to start
+ * @param {Object} [opts] — dispatch options
+ * @param {string} [opts.effortLevel] — reasoning effort to propagate ('low'|'medium'|'high')
  * @returns {Promise<{ ok: boolean, message: string }>}
  */
-async function dispatchToChannel(workspaceRoot, repoName, taskId) {
+async function dispatchToChannel(workspaceRoot, repoName, taskId, opts = {}) {
   // Validate taskId format to prevent injection into channel body
   if (!/^wf-[0-9a-f]{8}$/i.test(taskId)) {
     return { ok: false, message: `Invalid task ID format: "${taskId}" — expected wf-XXXXXXXX` };
@@ -738,10 +742,16 @@ async function dispatchToChannel(workspaceRoot, repoName, taskId) {
     };
   }
 
-  // Dispatch the task
-
+  // Dispatch the task with effort level propagation
+  // When the manager uses ultrathink/high effort, workers should too
+  const VALID_EFFORTS = new Set(['low', 'medium', 'high']);
+  const rawEffort = opts.effortLevel || process.env.WOGI_EFFORT_LEVEL || '';
+  const effortLevel = VALID_EFFORTS.has(rawEffort) ? rawEffort : '';
+  const effortPrefix = effortLevel ? `[effort:${effortLevel}] ` : '';
+  const dispatchBody = `${effortPrefix}/wogi-start ${taskId}`;
+  const result = await httpPost('127.0.0.1', port, dispatchBody);
   if (result.ok) {
-    return { ok: true, message: `Dispatched /wogi-start ${taskId} to ${repoName} (port ${port})` };
+    return { ok: true, message: `Dispatched /wogi-start ${taskId} to ${repoName} (port ${port})${effortLevel ? ` [effort: ${effortLevel}]` : ''}` };
   }
 
   return { ok: false, message: `Dispatch failed: HTTP ${result.status} — ${result.body}` };
package/lib/workspace.js
CHANGED
@@ -645,6 +645,16 @@ ${Object.keys(config.channels?.members || {}).map(name => `- **@${name}-investig
 
 When spawning agents for delegation, always include the \`name\` field in the Agent config to enable @mention addressing.
 
+## Reasoning Effort Propagation
+
+When the user requests high reasoning effort (e.g., "ultrathink"), propagate it to workers:
+- Set \`WOGI_EFFORT_LEVEL=high\` environment variable before dispatching
+- The dispatch system automatically prefixes channel messages with \`[effort:high]\`
+- Workers receiving \`[effort:high]\` should apply the same reasoning level to their work
+- This ensures workers don't use lower effort than what the user requested for the manager
+
+**In curl dispatches**, prefix the message: \`curl -s -X POST http://localhost:{port} -d "[effort:high] /wogi-start {taskId}"\`
+
 ## Waiting for Worker Results (CRITICAL — Automatic Return Path)
 
 Workers **automatically write results** to \`.workspace/messages/\` when they complete a task. You do NOT need to ask them to report back — the task-completed hook writes a \`task-complete\` message automatically.