wogiflow 2.7.1 → 2.9.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -44,11 +44,12 @@ node node_modules/wogiflow/scripts/flow-progress-tracker.js update '{"taskId":"a
  | Phase | phaseNum | Description |
  |-------|----------|-------------|
  | 1 | Gather Files | Scan project files |
+ | 1.5 | Gate 0 | Pre-agent baseline checks (build, typecheck, lint, config integrity) |
  | 2 | Agents | 7 parallel agents (sub-steps = agents) |
- | 3 | Consolidate | Score calculation |
+ | 3 | Consolidate | Score calculation + Gate 0 cap |
  | 4 | Pattern Promotion | AI clustering + cross-reference + gaps |
- | 5 | Report | Display formatted report |
- | 6 | Persist | Save to last-audit.json |
+ | 5 | Report | Display formatted report with Gate 0 baseline |
+ | 6 | Persist | Save to last-audit.json (includes Gate 0 data + trend) |

  **Display at each agent completion:**
  ```
@@ -68,6 +69,68 @@ node node_modules/wogiflow/scripts/flow-audit.js files

  This returns all tracked project files (excluding node_modules, dist, .workflow/state/, etc.). Use this as the base file set for all agents.

+ ### Step 1.5: Gate 0 — Pre-Agent Baseline Checks (MANDATORY)
+
+ **Run BEFORE launching any analysis agents.** These are hard, verifiable checks — not AI judgment. They produce quantitative metrics that cap the final audit score.
+
+ **Principle**: If the project doesn't build, doesn't pass typecheck, or has hundreds of linter errors — the score CANNOT be higher than D+, regardless of how elegant the architecture is. The foundation is broken.
+
+ ```bash
+ node node_modules/wogiflow/scripts/flow-audit-gates.js run
+ ```
+
+ This returns JSON with all gate results. Parse and display:
+
+ ```
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+ GATE 0: PROJECT HEALTH BASELINE
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+
+ BUILD: ✓ passes | ✗ FAILS (cap: D)
+ TYPECHECK: ✓ 0 errors | ✗ N errors (cap: C/D+/D)
+ LINT: ✓ 0 errors, M warnings | ✗ N errors (cap: C)
+ LINT CONFIG: ✓ no downgraded rules | ✗ N rules downgraded (-N pts)
+ TESTS: ✓ pass | ✗ FAIL | ○ no test script
+ SCRIPTS: ✓ all present | ✗ missing: build, test
+
+ Extended:
+   eslint-disable comments: N (across M files)
+   Framework: React 18.x + TypeScript (monorepo)
+   Git health: 45 commits/30d, conventional commits: yes
+   Env hygiene: .env.example ✓, CI ✓
+
+ Score cap: [GRADE] (reasons: ...)
+ Trend: typecheck errors 939 → 412 (-527) ↑
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+ ```
+
+ **Gate results feed into scoring (Step 3)**:
+ - `gate0.cap.scoreCap` — maximum score the project can achieve
+ - `gate0.cap.penalties` — points deducted from the agent-derived score
+ - `gate0.eslintDisables` — passed to the Consistency agent as context
+ - `gate0.framework` — used to load framework-specific agent prompts
+ - `gate0.trend` — shown in the final report for improvement tracking
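+
+ A minimal sketch of how an orchestrator might consume these fields (the script path and the field names above come from this spec; the rest of the JSON shape is an assumption):
+
+ ```js
+ const { execFileSync } = require('node:child_process');
+
+ // Run Gate 0 and parse its JSON output (field names per the list above).
+ const gate0 = JSON.parse(execFileSync('node', [
+   'node_modules/wogiflow/scripts/flow-audit-gates.js', 'run'
+ ], { encoding: 'utf-8' }));
+
+ const scoreCap = gate0.cap.scoreCap;    // hard ceiling on the final score
+ const penalties = gate0.cap.penalties;  // points deducted from the agent score
+ const framework = gate0.framework;      // selects framework-specific prompts
+ ```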
+
+ **If Gate 0 reveals critical issues** (build fails, >500 type errors), display a prominent warning before proceeding to agents:
+ ```
+ ⚠ CRITICAL BASELINE ISSUES DETECTED
+ The project has fundamental health problems. Agent analysis will proceed,
+ but the overall score is capped at [GRADE] due to Gate 0 failures.
+ ```
+
+ **Framework-specific agent prompts**: When `gate0.framework` detects a known framework, inject framework-specific checks into the relevant agents:
+
+ | Framework | Agent | Additional Checks |
+ |-----------|-------|-------------------|
+ | **React** | Performance | useState count per component (>5 = re-render risk), React.memo usage ratio, inline objects in JSX .map(), useEffect without cleanup |
+ | **React** | Architecture | God components (>1000 LOC), prop drilling depth, context provider nesting |
+ | **Next.js** | Performance | Page bundle sizes, dynamic imports usage, appropriate ISR/SSR usage |
+ | **Next.js** | Architecture | API route structure, middleware usage, server/client boundary |
+ | **NestJS** | Architecture | Module structure, circular module deps, guard/interceptor coverage |
+ | **NestJS** | Performance | Eager-loaded modules, missing caching decorators |
+
+ The framework checks are appended to the existing agent prompts — they don't replace the universal checks.
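+
+ An illustrative sketch of that append step, using rows from the table above (the `FRAMEWORK_CHECKS` shape and `buildAgentPrompt` helper are hypothetical, not wogiflow internals):
+
+ ```js
+ // Framework-specific checks keyed by gate0.framework.name, then by agent.
+ const FRAMEWORK_CHECKS = {
+   react: {
+     Performance: ['useState count per component (>5 = re-render risk)', 'React.memo usage ratio'],
+     Architecture: ['God components (>1000 LOC)', 'prop drilling depth']
+   }
+ };
+
+ function buildAgentPrompt(basePrompt, agentName, framework) {
+   const extra = (FRAMEWORK_CHECKS[framework?.name] || {})[agentName] || [];
+   if (extra.length === 0) return basePrompt;
+   // Appended after the universal checks, never replacing them.
+   return `${basePrompt}\n\nAdditional ${framework.name} checks:\n- ${extra.join('\n- ')}`;
+ }
+ ```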
+
  ### Step 2: Launch 7 Parallel Agents

  Launch ALL enabled agents as parallel `Task` calls in a single message. Each agent uses `subagent_type=Explore` and `model="sonnet"` (per decisions.md: use Sonnet for routine exploration).
@@ -91,8 +154,13 @@ Analyze the architecture of this project.
  - Route handlers containing business logic (>50 LOC)
  - Utility files importing domain-specific modules
  4. Find god files (files with >300 LOC or >10 exported functions)
- 5. Check for circular dependencies between modules
+ 5. Check for circular dependencies between modules (import cycles)
  6. Identify missing abstractions (repeated patterns that could be extracted)
+ 7. **Dead export scan**: For every exported function/component/type, grep for importers.
+ Report exports with ZERO importers — these are dead code at the module boundary.
+ Count total dead exports and list the top 10 by file.
+ 8. **If React detected** (from Gate 0 framework): Flag components with >5 useState as
+ re-render risks, check React.memo usage ratio, identify prop drilling depth >3

  Return a structured report with:
  - Strengths (good patterns found)
@@ -117,9 +185,14 @@ Audit the project's dependencies.
  4. Check for known security vulnerabilities:
  - Run: node node_modules/wogiflow/scripts/flow-audit.js audit
  → This runs npm audit and returns structured results
+ 5. **Dependency health** (enhanced):
+ - Major versions behind: packages that are 2+ majors behind (HIGH)
+ - License risk: GPL/AGPL in commercial projects, or UNLICENSED packages
+ - Bundle size outliers: dependencies >500KB that could be replaced with lighter alternatives
+ - Duplicate packages: same package at multiple versions in the tree

  Return:
- - Dependencies summary (total, outdated, vulnerable)
+ - Dependencies summary (total, outdated, vulnerable, deprecated, license issues)
  - Each finding tagged [HIGH/MED/LOW]
  - Score: A through F
  ```
@@ -201,10 +274,20 @@ Audit consistency of patterns across the project.
  5. Configuration patterns:
  - Are config values accessed consistently?
  - Any hardcoded values that should be configurable?
+ 6. **eslint-disable comment census** (from Gate 0 data):
+ - Gate 0 provides the total count and top files
+ - Each eslint-disable is a suppressed violation — a high count (>50) indicates
+ hidden technical debt through suppression
+ - Flag files with >5 eslint-disable comments as consistency violations
+ 7. **Lint config integrity** (from Gate 0 data):
+ - If Gate 0 detected downgraded rules, include them as [HIGH] consistency findings
+ - This is "configuration-level debt masking" — making the project appear clean
+ by lowering standards instead of fixing code

  Return:
  - Consistency findings, each tagged [HIGH/MED/LOW]
  - Dominant patterns vs outliers
+ - eslint-disable count and top offenders
  - Score: A through F
  ```

@@ -252,11 +335,26 @@ Catalog technical debt in this project.
  5. Cross-reference with existing tech debt:
  - Read .workflow/state/tech-debt.json if it exists
  - Identify new debt vs already-tracked debt
+ 6. **Test coverage reality check** (from Gate 0 data):
+ - Test file ratio: N test files / M source files (ideal: >30%)
+ - If coverage report is available: line/branch coverage %
+ - 0% test coverage + complex business logic = [HIGH] tech debt
+ 7. **Git health indicators** (from Gate 0 data):
+ - Commit frequency: active/inactive
+ - Stale branches (unmerged >30 days)
+ - Commit message quality (conventional commits?)
+ - Large uncommitted changes count
+ 8. **Environment/config hygiene** (from Gate 0 data):
+ - .env.example missing when .env exists
+ - No CI configuration = no automated quality enforcement
+ - Secrets patterns in tracked files

  Return:
  - Tech debt items, each tagged [HIGH/MED/LOW]
  - Summary: TODOs count, FIXMEs count, HACKs count
  - Commented-out code blocks count
+ - Test coverage metrics
+ - Git health summary
  - Score: A through F
  ```

@@ -286,11 +384,39 @@ Return:
  - Score: A through F
  ```

- ### Step 3: Consolidate Results
+ ### Step 3: Consolidate Results + Apply Score Cap

  After all agents complete, consolidate into a single report.

- **Use `node node_modules/wogiflow/scripts/flow-audit.js score` with the agent scores to calculate a weighted overall score.**
+ **3.1. Calculate weighted agent score:**
+ ```bash
+ node node_modules/wogiflow/scripts/flow-audit.js score '{"architecture":"B+","dependencies":"A-",...}'
+ ```
+
+ **3.2. Apply Gate 0 score cap:**
+ ```
+ Final score = min(gate0_cap, weighted_agent_score - gate0_penalties)
+ ```
+
+ | Gate 0 Result | Score Cap |
+ |---------------|-----------|
+ | Build fails | max D (63) |
+ | Typecheck >500 errors | max D+ (67) |
+ | Typecheck >100 errors | max C (73) |
+ | Typecheck >50 errors | max C+ (77) |
+ | Lint >50 errors | max C (73) |
+ | Lint config manipulation | -3 pts per downgraded rule (max -15) |
+
+ **Example**: Agents score B (83), but the build fails → capped at D (63). Agents score B+ (87), but the lint config has 4 downgraded rules → 87 - 12 = 75 → C.
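+
+ The cap arithmetic, sketched with the point values from the table (the grade-to-points map is implied by this spec's examples rather than quoted from the script):
+
+ ```js
+ // Point values used in this spec's examples: D=63, D+=67, C=73, C+=77, B=83, B+=87.
+ const GRADE_POINTS = { 'D': 63, 'D+': 67, 'C': 73, 'C+': 77, 'B': 83, 'B+': 87 };
+
+ // Final score = min(gate0_cap, weighted_agent_score - gate0_penalties)
+ function applyGate0Cap(agentPoints, gate0) {
+   return Math.min(gate0.cap.scoreCap, agentPoints - gate0.cap.penalties);
+ }
+
+ // B+ (87) with 4 downgraded lint rules at -3 pts each:
+ applyGate0Cap(GRADE_POINTS['B+'], { cap: { scoreCap: 100, penalties: 12 } }); // → 75
+ ```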
+
+ **3.3. Include extended metrics in the report:**
+ - eslint-disable comment count (from Gate 0)
+ - Dead export count (from agent scan)
+ - Test file ratio (from Gate 0)
+ - Git health indicators (from Gate 0)
+
+ **3.4. Trend delta (if a previous audit exists):**
+ Compare current metrics with `last-audit.json`. Show improvement/regression arrows.
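+
+ A minimal sketch of the comparison, assuming `last-audit.json` carries the `gate0` block shown in the persistence step below:
+
+ ```js
+ const fs = require('node:fs');
+
+ // Returns e.g. "typecheck errors 939 → 412 (-527) ↑", or null on a first audit.
+ function typecheckTrend(current, statePath = '.workflow/state/last-audit.json') {
+   if (!fs.existsSync(statePath)) return null;
+   const prev = JSON.parse(fs.readFileSync(statePath, 'utf-8'));
+   if (!prev.gate0) return null;
+   const from = prev.gate0.typecheckErrors;
+   const to = current.typecheckErrors;
+   const delta = to - from;
+   const arrow = delta < 0 ? '↑' : delta > 0 ? '↓' : '→'; // fewer errors = improvement
+   return `typecheck errors ${from} → ${to} (${delta >= 0 ? '+' : ''}${delta}) ${arrow}`;
+ }
+ ```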

  ### Step 4: Display Report

@@ -301,7 +427,11 @@ PROJECT AUDIT REPORT

  Project: [name] | Files scanned: N | Date: YYYY-MM-DD

- HEALTH SCORE: [A/B/C/D/F] (weighted across all dimensions)
+ GATE 0 BASELINE:
+   Build: ✓/✗ | Typecheck: N errors | Lint: N errors, M warnings
+   Score cap: [GRADE] | Penalties: -N pts | Framework: [detected]
+
+ HEALTH SCORE: [A/B/C/D/F] (capped by Gate 0 from agent score of [X])

  ━━━ ARCHITECTURE (score: X) ━━━
  Strengths:
@@ -493,6 +623,24 @@ Regardless of user choice, always save the audit results to `.workflow/state/las
  {
    "date": "YYYY-MM-DD",
    "overallScore": "B+",
+   "gate0": {
+     "buildPasses": true,
+     "typecheckErrors": 0,
+     "lintErrors": 0,
+     "lintWarnings": 12,
+     "downgradedRules": [],
+     "testsPassing": true,
+     "missingScripts": [],
+     "eslintDisableCount": 23,
+     "scoreCap": 100,
+     "penalties": 0,
+     "framework": { "name": "react", "version": "18.2.0" },
+     "gitHealth": { "recentCommits": 45, "staleBranches": 2, "conventionalCommits": true },
+     "envHygiene": { "envExample": true, "ciConfigured": true },
+     "testCoverage": { "testFiles": 34, "sourceFiles": 120, "ratio": "28.3%" }
+   },
+   "agentScore": "B+",
+   "scoreCappedBy": null,
    "scores": {
      "architecture": "B+",
      "dependencies": "A-",
@@ -560,6 +560,282 @@ After implementing all scenarios, BEFORE quality gates:

  **Why this works**: The evaluator has NO emotional investment in the code. It reads the spec and the diff cold. It's explicitly prompted to be skeptical. And because it's a separate sub-agent, it has a fresh context — no accumulated "I already know this works" bias from the implementation phase.

+ ### Step 3.58: Runtime Verification Gate — Auto-Test Generation (MANDATORY)
+
+ **Activates when**: ANY code file is changed. This is the DEFAULT — not optional.
+
+ Run detection: `node node_modules/wogiflow/scripts/flow-runtime-verification.js task-type [changed-files...]`
+
+ This returns the task type: `frontend`, `backend`, `fullstack`, or `other`. For `frontend` and `fullstack`, UI browser tests are generated. For `backend` and `fullstack`, API integration tests are generated. For `other`, standard static verification applies.
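+
+ A sketch of the classification rule, inferred from the file patterns in the FRONTEND and BACKEND sections below (the helper and pattern lists are illustrative, not the script's source):
+
+ ```js
+ // File patterns per the "Activates when" sections below.
+ const FRONTEND = [/\.tsx$/, /\.jsx$/, /\.vue$/, /\.svelte$/, /\.css$/, /\.styled\./];
+ const BACKEND = [/\.controller\./, /\.service\./, /\.resolver\./, /\/routes\//, /\/api\//, /\.dto\./, /\.guard\./, /\.middleware\./];
+
+ function taskType(changedFiles) {
+   const fe = changedFiles.some(f => FRONTEND.some(p => p.test(f)));
+   const be = changedFiles.some(f => BACKEND.some(p => p.test(f)));
+   return fe && be ? 'fullstack' : fe ? 'frontend' : be ? 'backend' : 'other';
+ }
+ ```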
+
+ **The problem this solves**: AI workers mark tasks as "done" based on static evidence (TypeScript compiles, build succeeds) without verifying the feature actually works end-to-end. This leads to repeated failed iterations. Auto-generated tests catch these failures BEFORE the user does.
+
+ **DEFAULT BEHAVIOR**: For every task, WogiFlow auto-generates and runs verification tests as part of the execution loop. Tests are written to `tests/verification/` and persist as regression guards. This is ON by default — disable with `config.runtimeVerification.enabled: false`.
+
+ #### Auto-Test Generation Flow
+
+ ```
+ For EACH acceptance criterion in the spec:
+ 1. Classify: Is this a UI behavior, API behavior, or internal logic?
+ 2. Generate: Write a test that exercises the criterion
+ 3. Implement: Write the actual code
+ 4. Run: Execute the test — it MUST pass
+ 5. If FAIL → debug, fix, re-run (max 5 retries)
+ 6. Persist: Test file stays in tests/verification/ as regression guard
+ ```
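+
+ An illustrative control loop for steps 4-5 (`generateTest`, `runTest`, and `fixImplementation` are hypothetical hooks, not WogiFlow APIs):
+
+ ```js
+ // Generate, run, and retry a verification test for one acceptance criterion.
+ async function verifyCriterion(criterion, maxRetries = 5) {
+   const testFile = await generateTest(criterion);   // step 2: write the test
+   for (let attempt = 1; attempt <= maxRetries; attempt++) {
+     if (await runTest(testFile)) return true;       // step 4: it MUST pass
+     await fixImplementation(criterion);             // step 5: debug, fix, re-run
+   }
+   return false;                                     // blockOnFailure: the task stays open
+ }
+ ```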
+
+ **This is NOT TDD** (where tests come first and must fail initially). This is **post-implementation verification** — the test is generated from the criterion, the code is written, then the test validates the code works. The key difference: TDD tests are written before code; verification tests are written alongside code and run after.
+
+ ---
+
+ #### FRONTEND: Browser Test Generation (Playwright + WebMCP)
+
+ **Activates when**: Changed files match `*.tsx`, `*.jsx`, `*.vue`, `*.svelte`, `*.css`, `*.styled.*`
+
+ **The problem this solves**: AI workers mark UI tasks as "done" based on static evidence without ever opening a browser. (See: Pipeline Rules case study — 5 failed iterations, same bug.)
+
+ **BANNED verification methods** — these NEVER count as evidence for UI tasks:
+
+ | Banned Method | What it proves | Why it's insufficient |
+ |---|---|---|
+ | `grep` deployed bundle for function names | Code included in build | Function may never execute, or may render incorrectly |
+ | `tsc --noEmit` passes | Types are correct | Type-correct code can have wrong runtime behavior |
+ | `vite build` succeeds | Modules resolve | Build success says nothing about UX |
+ | "I read the code and it's logically correct" | Nothing | Author is worst possible judge of own work |
+ | `aws s3 sync` completes | Files hosted | Hosting ≠ functioning |
+
+ **Evidence Tiers** — every verification claim must be classified:
+
+ | Tier | Name | Sufficient alone? |
+ |---|---|---|
+ | 0 | STATIC (compile, build, lint) | NEVER |
+ | 1 | STRUCTURAL (file exists, imported, route registered) | NEVER |
+ | 2 | OBSERVATIONAL (page loads, feature renders) | Yes (display-only) |
+ | 3 | INTERACTIVE (click/type/submit → observed result persists) | Yes (behavioral) |
+ | 4 | AUTOMATED (Playwright/WebMCP test passes) | Yes (strongest) |
+
+ **Minimum: Tier 2 for display criteria, Tier 3 for behavioral criteria.**
+
+ #### Verification Method Selection
+
+ Run: `node node_modules/wogiflow/scripts/flow-runtime-verification.js method`
+
+ **Priority order** (use the first available):
+
+ **1. WebMCP Browser Verification (DEFAULT — preferred)**
+
+ When `config.webmcp.enabled` or a browser MCP server is detected in `.mcp.json`:
+
+ For EACH acceptance criterion:
+ 1. Navigate to the affected page via `mcp_browser_navigate`
+ 2. Screenshot BEFORE: `mcp_browser_screenshot()`
+ 3. Perform the user action (click, type, select, submit)
+ 4. Wait 2-3 seconds for async updates
+ 5. Screenshot AFTER: `mcp_browser_screenshot()`
+ 6. Assert DOM state: `mcp_browser_evaluate("document.querySelector(...)")`
+ 7. Record in Behavioral Evidence Log
+
+ **High-risk tasks** (state mutation detected — useMutation, invalidateQueries, onMutate):
+ - After all criteria verified, wait 3 seconds
+ - Screenshot again — check state persisted after refetch
+ - Reload page: `mcp_browser_navigate` to same URL
+ - Wait for networkidle
+ - Screenshot — check state survived page reload
+ - If state reverted → the server didn't persist, or refetch overwrote it → FAIL
+
+ **2. Playwright Test Generation (secondary)**
+
+ When Playwright/Puppeteer is in dependencies but WebMCP is not available:
+
+ 1. Auto-generate a Playwright test from acceptance criteria (see the sketch after this list)
+ 2. Write test to `tests/verification/verify-{taskId}.spec.ts`
+ 3. Instruct the user: "Run `npx playwright test tests/verification/verify-{taskId}.spec.ts --headed` to verify"
+ 4. If the project has CI, the test persists as a regression guard
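+
+ A sketch of what a generated spec might look like, assuming the dev server default `http://localhost:5173` from the configuration below; the route, selectors, and criterion text are hypothetical:
+
+ ```js
+ // tests/verification/verify-{taskId}.spec.ts (illustrative)
+ import { test, expect } from '@playwright/test';
+
+ test('criterion: routing a rule to a department updates the cell', async ({ page }) => {
+   await page.goto('http://localhost:5173/pipeline-rules');      // hypothetical route
+   await page.getByRole('cell', { name: 'Route To' }).click();   // hypothetical selectors
+   await page.getByRole('option', { name: 'Design Department' }).click();
+   // Tier 3 evidence: assert the observed result, then confirm it survives a reload
+   await expect(page.getByText('Design DEPARTMENT')).toBeVisible();
+   await page.reload({ waitUntil: 'networkidle' });
+   await expect(page.getByText('Design DEPARTMENT')).toBeVisible();
+ });
+ ```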
+
+ **3. User Verification Checklist (fallback — always available)**
+
+ When neither WebMCP nor Playwright is available:
+
+ Present a checklist to the user:
+ ```
+ ━━━ USER VERIFICATION CHECKLIST ━━━
+ I cannot verify UI behavior from the CLI. Please check:
+
+ □ 1. Navigate to [page]
+ □ 2. [criterion 1 — specific action + expected result]
+ □ 3. [criterion 2 — specific action + expected result]
+ □ Wait 3 seconds after each action
+ □ Refresh the page and verify changes persisted
+
+ Reply "verified" when all checks pass, or describe what's broken.
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+ ```
+
+ **CRITICAL**: The agent MUST wait for the user's "verified" response before marking the task complete. Do NOT proceed to quality gates without verification.
+
+ #### Behavioral Evidence Log (BEL)
+
+ Before marking ANY UI task complete, produce a BEL:
+
+ ```
+ ━━━ BEHAVIORAL EVIDENCE LOG ━━━
+ Task: wf-XXXXXXXX
+ Method: WEBMCP / PLAYWRIGHT / USER_CHECKLIST
+ Verified on: localhost:5173
+
+ CRITERION: "[text]"
+ ACTION: Clicked "Route To" cell, selected "Design Department"
+ EXPECTED: Cell updates to show "Design DEPARTMENT"
+ OBSERVED: Cell shows "Design DEPARTMENT" with blue icon
+ WAIT: 3 seconds — state persisted after refetch
+ VERDICT: PASS
+ EVIDENCE: Tier 3 (INTERACTIVE)
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+ ```
+
+ The OBSERVED field MUST describe what was SEEN, not what the code theoretically produces.
+
+ #### Pre-Implementation "See Before You Touch" (modification tasks)
+
+ For tasks modifying existing UI (not greenfield):
+ 1. Start dev server if not running
+ 2. Navigate to the affected page
+ 3. Screenshot/observe current state (BEFORE)
+ 4. Document the baseline
+ 5. Then implement changes
+ 6. After implementation, compare BEFORE vs AFTER
+
+ #### Repeat Failure Protocol (Groundhog Day Detector)
+
+ When the SAME issue is reported in 2+ consecutive dispatches:
+
+ | Strike | Action |
+ |--------|--------|
+ | 1 | Normal fix + BEL |
+ | 2 | MANDATORY root cause analysis BEFORE coding. Change approach. Add console.log tracing. Tier 3+ evidence required. |
+ | 3 | HARD BLOCK: Cannot mark done without screenshot/console evidence. Must state what's DIFFERENT this time. |
+ | 4+ | ESCALATION: Acknowledge inability, suggest pair debugging with user. |
+
+ Run: `node node_modules/wogiflow/scripts/flow-runtime-verification.js repeat wf-XXXXXXXX`
+
+ #### Devil's Advocate Prompt
+
+ Before marking ANY task complete (frontend or backend), ask yourself:
+
+ > "Assume this is broken. What are the 3 most likely ways it could fail?"
+
+ Then CHECK each one:
+ 1. Does the API actually accept these fields? (curl it or check the DTO)
+ 2. Does the response include the fields I'm reading? (log the response)
+ 3. Does the UI update persist after refetch/re-render? (wait 3 seconds and look again)
+ 4. Is the request payload shape what the server expects? (compare the DTO with the frontend fetch)
+
+ If ANY is plausible and not verified → investigate before marking done.
+
+ ---
+
+ #### BACKEND: API Integration Test Generation
+
+ **Activates when**: Changed files match `*.controller.*`, `*.service.*`, `*.resolver.*`, `/routes/`, `/api/`, `*.dto.*`, `*.guard.*`, `*.middleware.*`
+
+ Run detection: `node node_modules/wogiflow/scripts/flow-runtime-verification.js api-detect [changed-files...]`
+
+ **For EACH acceptance criterion that involves an API endpoint**:
+
+ 1. **Identify the endpoint**: method (GET/POST/PUT/PATCH/DELETE), path, expected request/response shape
+ 2. **Generate an integration test** that:
+ - Makes the actual HTTP request to the running dev server
+ - Asserts the status code matches expected
+ - Asserts the response body contains expected fields
+ - For mutations (POST/PUT/PATCH/DELETE): re-fetches the resource to verify persistence
+ - For auth-protected endpoints: includes the auth token
+ 3. **Write the test** to `tests/verification/api-verify-{taskId}.test.js`
+ 4. **Run the test**: `node --test tests/verification/api-verify-{taskId}.test.js`
+ 5. **If test fails** → debug, fix the implementation, re-run (max 5 retries)
+ 6. **Test persists** as a regression guard
+
+ **API Test Template** (generated per criterion):
+
+ ```javascript
+ it('POST /api/pipeline-rules — creates a rule with correct fields', async () => {
+   const res = await apiRequest('POST', '/api/pipeline-rules', {
+     tagPattern: 'animation',
+     routeTo: { type: 'department', id: 'dept-123' },
+     mode: 'CLAIMABLE'
+   });
+
+   // Status check
+   assert.equal(res.status, 201);
+
+   // Response shape check
+   assert.ok(res.data.id, 'Response missing field: id');
+   assert.equal(res.data.tagPattern, 'animation');
+   assert.equal(res.data.mode, 'CLAIMABLE');
+
+   // Persistence check: re-fetch and verify stored
+   const verify = await apiRequest('GET', `/api/pipeline-rules/${res.data.id}`);
+   assert.equal(verify.status, 200);
+   assert.equal(verify.data.tagPattern, 'animation');
+ });
+ ```
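+
+ The template assumes an `apiRequest` helper in the generated harness returning `{ status, data }`; the spec doesn't pin its implementation down, but a minimal fetch-based sketch could be:
+
+ ```js
+ // Minimal helper matching the { status, data } shape used in the template above.
+ // The base URL mirrors config.runtimeVerification.backend.baseUrl (see Configuration below).
+ const BASE_URL = process.env.VERIFY_BASE_URL || 'http://localhost:3000';
+
+ async function apiRequest(method, path, body) {
+   const res = await fetch(BASE_URL + path, {
+     method,
+     headers: { 'Content-Type': 'application/json' },
+     body: body === undefined ? undefined : JSON.stringify(body)
+   });
+   return { status: res.status, data: await res.json().catch(() => null) };
+ }
+ ```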
+
+ **Boundary verification** (frontend↔backend):
+ When the task is `fullstack` (both UI and API files changed):
+ 1. Generate BOTH browser tests AND API tests
+ 2. The API test verifies the server accepts the payload shape the frontend sends
+ 3. The browser test verifies the UI correctly displays the response shape the server returns
+ 4. If either fails → the boundary contract is broken
+
+ **Quick verification via curl** (for manual checking):
+ The AI can also generate and run curl commands directly:
+ ```bash
+ # Create a rule
+ curl -s -X POST http://localhost:3000/api/pipeline-rules \
+   -H "Content-Type: application/json" \
+   -d '{"tagPattern":"animation","routeTo":{"type":"department","id":"dept-123"},"mode":"CLAIMABLE"}'
+
+ # Verify it was stored
+ curl -s http://localhost:3000/api/pipeline-rules | jq '.[-1]'
+ ```
+
+ ---
+
+ #### Configuration
+
+ ```json
+ {
+   "runtimeVerification": {
+     "enabled": true,
+     "autoGenerateTests": true,
+     "frontend": {
+       "method": "webmcp",
+       "fallback": ["playwright", "checklist"],
+       "devServerUrl": "http://localhost:5173"
+     },
+     "backend": {
+       "method": "api-test",
+       "fallback": ["curl", "checklist"],
+       "baseUrl": "http://localhost:3000"
+     },
+     "testOutput": "tests/verification",
+     "persistTests": true,
+     "blockOnFailure": true
+   }
+ }
+ ```
+
+ **`autoGenerateTests: true`** (default) — Tests are generated for EVERY task. This is the core behavioral change: verification is not an afterthought; it's built into the execution loop.
+
+ **`persistTests: true`** (default) — Generated tests stay in `tests/verification/` as permanent regression guards. Over time, this builds an automated test suite from the actual use cases that were implemented.
+
+ **`blockOnFailure: true`** (default) — If generated tests fail, the task is NOT complete. The agent must fix the implementation until tests pass.
+
+ #### Skip Conditions
+
+ - `config.runtimeVerification.enabled: false` → skip entirely (not recommended)
+ - Task has NO code files in the changed set (docs-only, config-only) → skip
+ - Task is L3 trivial AND no UI/API files → skip
+

  ### Step 3.6: Integration Wiring Validation (MANDATORY)

  Run `node node_modules/wogiflow/scripts/flow-wiring-verifier.js wf-XXXXXXXX`
@@ -462,17 +462,32 @@ const server = http.createServer(async (req, res) => {
      return;
    }

-   // Determine sender from header or default
-   const from = req.headers['x-wogi-from'] || 'workspace-manager';
+   // Determine sender from header or default (validate against name pattern)
+   const rawFrom = req.headers['x-wogi-from'] || '';
+   const from = VALID_NAME_PATTERN.test(rawFrom) ? rawFrom : 'workspace-manager';
+
+   // Parse effort level prefix: [effort:high] /wogi-start wf-xxx
+   let effortLevel = '';
+   let cleanBody = body;
+   const effortMatch = body.match(/^\[effort:(low|medium|high)\]\s*/);
+   if (effortMatch) {
+     effortLevel = effortMatch[1];
+     cleanBody = body.substring(effortMatch[0].length);
+   }

    // Forward as channel notification to Claude Code
    const meta = {
      from,
      port: String(PORT),
      repo: REPO_NAME,
-     receivedAt: new Date().toISOString()
+     receivedAt: new Date().toISOString(),
+     ...(effortLevel && { effortLevel })
    };
-   sendChannelNotification(body, meta);
+   // Send the clean body (without effort prefix) but include effort in meta
+   const notificationBody = effortLevel
+     ? `${cleanBody}\n\n[System: Apply reasoning effort level "${effortLevel}" to this task — propagated from workspace manager]`
+     : cleanBody;
+   sendChannelNotification(notificationBody, meta);

    // Also broadcast to SSE subscribers
    if (sseClients.size > 0) {
@@ -63,6 +63,12 @@ const WORKSPACE_GATES = [
      description: 'Verify integration map is up-to-date',
      phase: 'pre',
      severity: 'warning'
+   },
+   {
+     name: 'deploymentReadiness',
+     description: 'Verify changes are committed and pushed before handoff to downstream workers',
+     phase: 'post',
+     severity: 'error'
    }
  ];

@@ -434,6 +440,84 @@ function broadcastPostChange(workspaceRoot, fromRepo, context, options = {}) {
   * @param {Object} [taskMeta] — { taskId, taskTitle, changedFiles, impactAssessed }
   * @returns {{ passed: boolean, message: string, severity: string }}
   */
+ /**
+  * Deployment readiness gate — verifies changes are committed and pushed
+  * before allowing handoff to downstream workers.
+  *
+  * In workspace mode, when backend completes and frontend needs to start,
+  * the backend's changes MUST be committed and pushed first. Otherwise the
+  * frontend worker will build against stale code.
+  *
+  * Checks:
+  * 1. No uncommitted changes in the current repo (git status clean)
+  * 2. Local branch is not ahead of remote (changes are pushed)
+  *
+  * @param {string} workspaceRoot
+  * @param {Object} context
+  * @param {Object} taskMeta
+  * @returns {{ passed: boolean, message: string, severity: string }}
+  */
+ function gateDeploymentReadiness(workspaceRoot, context, taskMeta) {
+   const { execFileSync } = require('node:child_process');
+
+   try {
+     // Check 1: No uncommitted changes
+     const statusOutput = execFileSync('git', ['status', '--porcelain'], {
+       encoding: 'utf-8',
+       timeout: 5000,
+       stdio: ['pipe', 'pipe', 'pipe'],
+       cwd: workspaceRoot || process.cwd()
+     }).trim();
+
+     if (statusOutput) {
+       const lineCount = statusOutput.split('\n').filter(Boolean).length;
+       return {
+         passed: false,
+         message: `${lineCount} uncommitted change(s). Commit and push before handoff to downstream workers.`,
+         severity: 'error'
+       };
+     }
+
+     // Check 2: Not ahead of remote (changes pushed)
+     try {
+       const aheadOutput = execFileSync('git', ['rev-list', '--count', '@{upstream}..HEAD'], {
+         encoding: 'utf-8',
+         timeout: 5000,
+         stdio: ['pipe', 'pipe', 'pipe'],
+         cwd: workspaceRoot || process.cwd()
+       }).trim();
+
+       const aheadCount = parseInt(aheadOutput, 10);
+       if (aheadCount > 0) {
+         return {
+           passed: false,
+           message: `${aheadCount} commit(s) not pushed to remote. Push before handoff to downstream workers.`,
+           severity: 'error'
+         };
+       }
+     } catch (_err) {
+       // No upstream configured — skip push check but warn
+       return {
+         passed: true,
+         message: 'No upstream branch configured — push check skipped',
+         severity: 'warning'
+       };
+     }
+
+     return {
+       passed: true,
+       message: 'Changes committed and pushed — ready for downstream handoff',
+       severity: 'info'
+     };
+   } catch (err) {
+     return {
+       passed: true,
+       message: `Deployment readiness check failed (${err.message}) — degraded to manual`,
+       severity: 'warning'
+     };
+   }
+ }
+
  function runWorkspaceGate(gateName, workspaceRoot, context, taskMeta = {}) {
    switch (gateName) {
      case 'crossRepoImpactCheck':
@@ -451,6 +535,9 @@ function runWorkspaceGate(gateName, workspaceRoot, context, taskMeta = {}) {
      case 'integrationMapFreshness':
        return gateIntegrationMapFreshness(workspaceRoot);

+     case 'deploymentReadiness':
+       return gateDeploymentReadiness(workspaceRoot, context, taskMeta);
+
      default:
        return { passed: true, message: `Unknown gate: ${gateName}`, severity: 'warning' };
    }