wogiflow 2.7.1 → 2.9.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -44,11 +44,12 @@ node node_modules/wogiflow/scripts/flow-progress-tracker.js update '{"taskId":"a
  | Phase | phaseNum | Description |
  |-------|----------|-------------|
  | 1 | Gather Files | Scan project files |
+ | 1.5 | Gate 0 | Pre-agent baseline checks (build, typecheck, lint, config integrity) |
  | 2 | Agents | 7 parallel agents (sub-steps = agents) |
- | 3 | Consolidate | Score calculation |
+ | 3 | Consolidate | Score calculation + Gate 0 cap |
  | 4 | Pattern Promotion | AI clustering + cross-reference + gaps |
- | 5 | Report | Display formatted report |
- | 6 | Persist | Save to last-audit.json |
+ | 5 | Report | Display formatted report with Gate 0 baseline |
+ | 6 | Persist | Save to last-audit.json (includes Gate 0 data + trend) |

  **Display at each agent completion:**
  ```
@@ -68,6 +69,68 @@ node node_modules/wogiflow/scripts/flow-audit.js files

  This returns all tracked project files (excluding node_modules, dist, .workflow/state/, etc.). Use this as the base file set for all agents.

+ ### Step 1.5: Gate 0 — Pre-Agent Baseline Checks (MANDATORY)
+
+ **Run BEFORE launching any analysis agents.** These are hard, verifiable checks — not AI judgment. They produce quantitative metrics that cap the final audit score.
+
+ **Principle**: If the project doesn't build, doesn't pass typecheck, or has hundreds of linter errors — the score CANNOT be higher than D+, regardless of how elegant the architecture is. The foundation is broken.
+
+ ```bash
+ node node_modules/wogiflow/scripts/flow-audit-gates.js run
+ ```
+
+ This returns JSON with all gate results. Parse and display:
+
+ ```
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+ GATE 0: PROJECT HEALTH BASELINE
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+
+ BUILD: ✓ passes | ✗ FAILS (cap: D)
+ TYPECHECK: ✓ 0 errors | ✗ N errors (cap: C/D+/D)
+ LINT: ✓ 0 errors, M warnings | ✗ N errors (cap: C)
+ LINT CONFIG: ✓ no downgraded rules | ✗ N rules downgraded (-N pts)
+ TESTS: ✓ pass | ✗ FAIL | ○ no test script
+ SCRIPTS: ✓ all present | ✗ missing: build, test
+
+ Extended:
+   eslint-disable comments: N (across M files)
+   Framework: React 18.x + TypeScript (monorepo)
+   Git health: 45 commits/30d, conventional commits: yes
+   Env hygiene: .env.example ✓, CI ✓
+
+ Score cap: [GRADE] (reasons: ...)
+ Trend: typecheck errors 939 → 412 (-527) ↑
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+ ```
+
+ **Gate results feed into scoring (Step 3)**:
+ - `gate0.cap.scoreCap` — maximum score the project can achieve
+ - `gate0.cap.penalties` — points deducted from the agent-derived score
+ - `gate0.eslintDisables` — passed to the Consistency agent as context
+ - `gate0.framework` — used to load framework-specific agent prompts
+ - `gate0.trend` — shown in the final report for improvement tracking
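+
+ A minimal sketch of how an orchestrator might consume these fields (the script path and the field names above come from this spec; the rest of the JSON shape is an assumption):
+
+ ```js
+ const { execFileSync } = require('node:child_process');
+
+ // Run Gate 0 and parse its JSON output (field names per the list above).
+ const gate0 = JSON.parse(execFileSync('node', [
+   'node_modules/wogiflow/scripts/flow-audit-gates.js', 'run'
+ ], { encoding: 'utf-8' }));
+
+ const scoreCap = gate0.cap.scoreCap;    // hard ceiling on the final score
+ const penalties = gate0.cap.penalties;  // points deducted from the agent score
+ const framework = gate0.framework;      // selects framework-specific prompts
+ ```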
+
+ **If Gate 0 reveals critical issues** (build fails, >500 type errors), display a prominent warning before proceeding to agents:
+ ```
+ ⚠ CRITICAL BASELINE ISSUES DETECTED
+ The project has fundamental health problems. Agent analysis will proceed,
+ but the overall score is capped at [GRADE] due to Gate 0 failures.
+ ```
+
+ **Framework-specific agent prompts**: When `gate0.framework` detects a known framework, inject framework-specific checks into the relevant agents:
+
+ | Framework | Agent | Additional Checks |
+ |-----------|-------|-------------------|
+ | **React** | Performance | useState count per component (>5 = re-render risk), React.memo usage ratio, inline objects in JSX .map(), useEffect without cleanup |
+ | **React** | Architecture | God components (>1000 LOC), prop drilling depth, context provider nesting |
+ | **Next.js** | Performance | Page bundle sizes, dynamic imports usage, appropriate ISR/SSR usage |
+ | **Next.js** | Architecture | API route structure, middleware usage, server/client boundary |
+ | **NestJS** | Architecture | Module structure, circular module deps, guard/interceptor coverage |
+ | **NestJS** | Performance | Eager-loaded modules, missing caching decorators |
+
+ The framework checks are appended to the existing agent prompts — they don't replace the universal checks.
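+
+ An illustrative sketch of that append step, using rows from the table above (the `FRAMEWORK_CHECKS` shape and `buildAgentPrompt` helper are hypothetical, not wogiflow internals):
+
+ ```js
+ // Framework-specific checks keyed by gate0.framework.name, then by agent.
+ const FRAMEWORK_CHECKS = {
+   react: {
+     Performance: ['useState count per component (>5 = re-render risk)', 'React.memo usage ratio'],
+     Architecture: ['God components (>1000 LOC)', 'prop drilling depth']
+   }
+ };
+
+ function buildAgentPrompt(basePrompt, agentName, framework) {
+   const extra = (FRAMEWORK_CHECKS[framework?.name] || {})[agentName] || [];
+   if (extra.length === 0) return basePrompt;
+   // Appended after the universal checks, never replacing them.
+   return `${basePrompt}\n\nAdditional ${framework.name} checks:\n- ${extra.join('\n- ')}`;
+ }
+ ```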
+
  ### Step 2: Launch 7 Parallel Agents

  Launch ALL enabled agents as parallel `Task` calls in a single message. Each agent uses `subagent_type=Explore` and `model="sonnet"` (per decisions.md: use Sonnet for routine exploration).
@@ -91,8 +154,13 @@ Analyze the architecture of this project.
  - Route handlers containing business logic (>50 LOC)
  - Utility files importing domain-specific modules
  4. Find god files (files with >300 LOC or >10 exported functions)
- 5. Check for circular dependencies between modules
+ 5. Check for circular dependencies between modules (import cycles)
  6. Identify missing abstractions (repeated patterns that could be extracted)
+ 7. **Dead export scan**: For every exported function/component/type, grep for importers.
+ Report exports with ZERO importers — these are dead code at the module boundary.
+ Count total dead exports and list the top 10 by file.
+ 8. **If React detected** (from Gate 0 framework): Flag components with >5 useState as
+ re-render risks, check React.memo usage ratio, identify prop drilling depth >3

  Return a structured report with:
  - Strengths (good patterns found)
@@ -117,9 +185,14 @@ Audit the project's dependencies.
  4. Check for known security vulnerabilities:
  - Run: node node_modules/wogiflow/scripts/flow-audit.js audit
  → This runs npm audit and returns structured results
+ 5. **Dependency health** (enhanced):
+ - Major versions behind: packages that are 2+ majors behind (HIGH)
+ - License risk: GPL/AGPL in commercial projects, or UNLICENSED packages
+ - Bundle size outliers: dependencies >500KB that could be replaced with lighter alternatives
+ - Duplicate packages: same package at multiple versions in the tree

  Return:
- - Dependencies summary (total, outdated, vulnerable)
+ - Dependencies summary (total, outdated, vulnerable, deprecated, license issues)
  - Each finding tagged [HIGH/MED/LOW]
  - Score: A through F
  ```
@@ -201,10 +274,20 @@ Audit consistency of patterns across the project.
  5. Configuration patterns:
  - Are config values accessed consistently?
  - Any hardcoded values that should be configurable?
+ 6. **eslint-disable comment census** (from Gate 0 data):
+ - Gate 0 provides the total count and top files
+ - Each eslint-disable is a suppressed violation — a high count (>50) indicates
+ hidden technical debt through suppression
+ - Flag files with >5 eslint-disable comments as consistency violations
+ 7. **Lint config integrity** (from Gate 0 data):
+ - If Gate 0 detected downgraded rules, include them as [HIGH] consistency findings
+ - This is "configuration-level debt masking" — making the project appear clean
+ by lowering standards instead of fixing code

  Return:
  - Consistency findings, each tagged [HIGH/MED/LOW]
  - Dominant patterns vs outliers
+ - eslint-disable count and top offenders
  - Score: A through F
  ```

@@ -252,11 +335,26 @@ Catalog technical debt in this project.
  5. Cross-reference with existing tech debt:
  - Read .workflow/state/tech-debt.json if it exists
  - Identify new debt vs already-tracked debt
+ 6. **Test coverage reality check** (from Gate 0 data):
+ - Test file ratio: N test files / M source files (ideal: >30%)
+ - If coverage report is available: line/branch coverage %
+ - 0% test coverage + complex business logic = [HIGH] tech debt
+ 7. **Git health indicators** (from Gate 0 data):
+ - Commit frequency: active/inactive
+ - Stale branches (unmerged >30 days)
+ - Commit message quality (conventional commits?)
+ - Large uncommitted changes count
+ 8. **Environment/config hygiene** (from Gate 0 data):
+ - .env.example missing when .env exists
+ - No CI configuration = no automated quality enforcement
+ - Secrets patterns in tracked files

  Return:
  - Tech debt items, each tagged [HIGH/MED/LOW]
  - Summary: TODOs count, FIXMEs count, HACKs count
  - Commented-out code blocks count
+ - Test coverage metrics
+ - Git health summary
  - Score: A through F
  ```

@@ -286,11 +384,39 @@ Return:
  - Score: A through F
  ```

- ### Step 3: Consolidate Results
+ ### Step 3: Consolidate Results + Apply Score Cap

  After all agents complete, consolidate into a single report.

- **Use `node node_modules/wogiflow/scripts/flow-audit.js score` with the agent scores to calculate a weighted overall score.**
+ **3.1. Calculate weighted agent score:**
+ ```bash
+ node node_modules/wogiflow/scripts/flow-audit.js score '{"architecture":"B+","dependencies":"A-",...}'
+ ```
+
+ **3.2. Apply Gate 0 score cap:**
+ ```
+ Final score = min(gate0_cap, weighted_agent_score - gate0_penalties)
+ ```
+
+ | Gate 0 Result | Score Cap |
+ |---------------|-----------|
+ | Build fails | max D (63) |
+ | Typecheck >500 errors | max D+ (67) |
+ | Typecheck >100 errors | max C (73) |
+ | Typecheck >50 errors | max C+ (77) |
+ | Lint >50 errors | max C (73) |
+ | Lint config manipulation | -3 pts per downgraded rule (max -15) |
+
+ **Example**: Agents score B (83), but the build fails → capped at D (63). Agents score B+ (87), but the lint config has 4 downgraded rules → 87 - 12 = 75 → C.
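+
+ The cap arithmetic, sketched with the point values from the table (the grade-to-points map is implied by this spec's examples rather than quoted from the script):
+
+ ```js
+ // Point values used in this spec's examples: D=63, D+=67, C=73, C+=77, B=83, B+=87.
+ const GRADE_POINTS = { 'D': 63, 'D+': 67, 'C': 73, 'C+': 77, 'B': 83, 'B+': 87 };
+
+ // Final score = min(gate0_cap, weighted_agent_score - gate0_penalties)
+ function applyGate0Cap(agentPoints, gate0) {
+   return Math.min(gate0.cap.scoreCap, agentPoints - gate0.cap.penalties);
+ }
+
+ // B+ (87) with 4 downgraded lint rules at -3 pts each:
+ applyGate0Cap(GRADE_POINTS['B+'], { cap: { scoreCap: 100, penalties: 12 } }); // → 75
+ ```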
+
+ **3.3. Include extended metrics in the report:**
+ - eslint-disable comment count (from Gate 0)
+ - Dead export count (from agent scan)
+ - Test file ratio (from Gate 0)
+ - Git health indicators (from Gate 0)
+
+ **3.4. Trend delta (if a previous audit exists):**
+ Compare current metrics with `last-audit.json`. Show improvement/regression arrows.
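+
+ A minimal sketch of the comparison, assuming `last-audit.json` carries the `gate0` block shown in the persistence step below:
+
+ ```js
+ const fs = require('node:fs');
+
+ // Returns e.g. "typecheck errors 939 → 412 (-527) ↑", or null on a first audit.
+ function typecheckTrend(current, statePath = '.workflow/state/last-audit.json') {
+   if (!fs.existsSync(statePath)) return null;
+   const prev = JSON.parse(fs.readFileSync(statePath, 'utf-8'));
+   if (!prev.gate0) return null;
+   const from = prev.gate0.typecheckErrors;
+   const to = current.typecheckErrors;
+   const delta = to - from;
+   const arrow = delta < 0 ? '↑' : delta > 0 ? '↓' : '→'; // fewer errors = improvement
+   return `typecheck errors ${from} → ${to} (${delta >= 0 ? '+' : ''}${delta}) ${arrow}`;
+ }
+ ```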

  ### Step 4: Display Report

@@ -301,7 +427,11 @@ PROJECT AUDIT REPORT

  Project: [name] | Files scanned: N | Date: YYYY-MM-DD

- HEALTH SCORE: [A/B/C/D/F] (weighted across all dimensions)
+ GATE 0 BASELINE:
+   Build: ✓/✗ | Typecheck: N errors | Lint: N errors, M warnings
+   Score cap: [GRADE] | Penalties: -N pts | Framework: [detected]
+
+ HEALTH SCORE: [A/B/C/D/F] (capped by Gate 0 from agent score of [X])

  ━━━ ARCHITECTURE (score: X) ━━━
  Strengths:
@@ -493,6 +623,24 @@ Regardless of user choice, always save the audit results to `.workflow/state/las
  {
    "date": "YYYY-MM-DD",
    "overallScore": "B+",
+   "gate0": {
+     "buildPasses": true,
+     "typecheckErrors": 0,
+     "lintErrors": 0,
+     "lintWarnings": 12,
+     "downgradedRules": [],
+     "testsPassing": true,
+     "missingScripts": [],
+     "eslintDisableCount": 23,
+     "scoreCap": 100,
+     "penalties": 0,
+     "framework": { "name": "react", "version": "18.2.0" },
+     "gitHealth": { "recentCommits": 45, "staleBranches": 2, "conventionalCommits": true },
+     "envHygiene": { "envExample": true, "ciConfigured": true },
+     "testCoverage": { "testFiles": 34, "sourceFiles": 120, "ratio": "28.3%" }
+   },
+   "agentScore": "B+",
+   "scoreCappedBy": null,
    "scores": {
      "architecture": "B+",
      "dependencies": "A-",
@@ -560,6 +560,282 @@ After implementing all scenarios, BEFORE quality gates:

  **Why this works**: The evaluator has NO emotional investment in the code. It reads the spec and the diff cold. It's explicitly prompted to be skeptical. And because it's a separate sub-agent, it has a fresh context — no accumulated "I already know this works" bias from the implementation phase.

+ ### Step 3.58: Runtime Verification Gate — Auto-Test Generation (MANDATORY)
+
+ **Activates when**: ANY code file is changed. This is the DEFAULT — not optional.
+
+ Run detection: `node node_modules/wogiflow/scripts/flow-runtime-verification.js task-type [changed-files...]`
+
+ This returns the task type: `frontend`, `backend`, `fullstack`, or `other`. For `frontend` and `fullstack`, UI browser tests are generated. For `backend` and `fullstack`, API integration tests are generated. For `other`, standard static verification applies.
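+
+ A sketch of the classification rule, inferred from the file patterns in the FRONTEND and BACKEND sections below (the helper and pattern lists are illustrative, not the script's source):
+
+ ```js
+ // File patterns per the "Activates when" sections below.
+ const FRONTEND = [/\.tsx$/, /\.jsx$/, /\.vue$/, /\.svelte$/, /\.css$/, /\.styled\./];
+ const BACKEND = [/\.controller\./, /\.service\./, /\.resolver\./, /\/routes\//, /\/api\//, /\.dto\./, /\.guard\./, /\.middleware\./];
+
+ function taskType(changedFiles) {
+   const fe = changedFiles.some(f => FRONTEND.some(p => p.test(f)));
+   const be = changedFiles.some(f => BACKEND.some(p => p.test(f)));
+   return fe && be ? 'fullstack' : fe ? 'frontend' : be ? 'backend' : 'other';
+ }
+ ```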
+
+ **The problem this solves**: AI workers mark tasks as "done" based on static evidence (TypeScript compiles, build succeeds) without verifying the feature actually works end-to-end. This leads to repeated failed iterations. Auto-generated tests catch these failures BEFORE the user does.
+
+ **DEFAULT BEHAVIOR**: For every task, WogiFlow auto-generates and runs verification tests as part of the execution loop. Tests are written to `tests/verification/` and persist as regression guards. This is ON by default — disable with `config.runtimeVerification.enabled: false`.
+
+ #### Auto-Test Generation Flow
+
+ ```
+ For EACH acceptance criterion in the spec:
+ 1. Classify: Is this a UI behavior, API behavior, or internal logic?
+ 2. Generate: Write a test that exercises the criterion
+ 3. Implement: Write the actual code
+ 4. Run: Execute the test — it MUST pass
+ 5. If FAIL → debug, fix, re-run (max 5 retries)
+ 6. Persist: Test file stays in tests/verification/ as regression guard
+ ```
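+
+ An illustrative control loop for steps 4-5 (`generateTest`, `runTest`, and `fixImplementation` are hypothetical hooks, not WogiFlow APIs):
+
+ ```js
+ // Generate, run, and retry a verification test for one acceptance criterion.
+ async function verifyCriterion(criterion, maxRetries = 5) {
+   const testFile = await generateTest(criterion);   // step 2: write the test
+   for (let attempt = 1; attempt <= maxRetries; attempt++) {
+     if (await runTest(testFile)) return true;       // step 4: it MUST pass
+     await fixImplementation(criterion);             // step 5: debug, fix, re-run
+   }
+   return false;                                     // blockOnFailure: the task stays open
+ }
+ ```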
+
+ **This is NOT TDD** (where tests come first and must fail initially). This is **post-implementation verification** — the test is generated from the criterion, the code is written, then the test validates the code works. The key difference: TDD tests are written before code; verification tests are written alongside code and run after.
+
+ ---
+
+ #### FRONTEND: Browser Test Generation (Playwright + WebMCP)
+
+ **Activates when**: Changed files match `*.tsx`, `*.jsx`, `*.vue`, `*.svelte`, `*.css`, `*.styled.*`
+
+ **The problem this solves**: AI workers mark UI tasks as "done" based on static evidence without ever opening a browser. (See: Pipeline Rules case study — 5 failed iterations, same bug.)
+
+ **BANNED verification methods** — these NEVER count as evidence for UI tasks:
+
+ | Banned Method | What it proves | Why it's insufficient |
+ |---|---|---|
+ | `grep` deployed bundle for function names | Code included in build | Function may never execute, or may render incorrectly |
+ | `tsc --noEmit` passes | Types are correct | Type-correct code can have wrong runtime behavior |
+ | `vite build` succeeds | Modules resolve | Build success says nothing about UX |
+ | "I read the code and it's logically correct" | Nothing | Author is worst possible judge of own work |
+ | `aws s3 sync` completes | Files hosted | Hosting ≠ functioning |
+
+ **Evidence Tiers** — every verification claim must be classified:
+
+ | Tier | Name | Sufficient alone? |
+ |---|---|---|
+ | 0 | STATIC (compile, build, lint) | NEVER |
+ | 1 | STRUCTURAL (file exists, imported, route registered) | NEVER |
+ | 2 | OBSERVATIONAL (page loads, feature renders) | Yes (display-only) |
+ | 3 | INTERACTIVE (click/type/submit → observed result persists) | Yes (behavioral) |
+ | 4 | AUTOMATED (Playwright/WebMCP test passes) | Yes (strongest) |
+
+ **Minimum: Tier 2 for display criteria, Tier 3 for behavioral criteria.**
+
+ #### Verification Method Selection
+
+ Run: `node node_modules/wogiflow/scripts/flow-runtime-verification.js method`
+
+ **Priority order** (use the first available):
+
+ **1. WebMCP Browser Verification (DEFAULT — preferred)**
+
+ When `config.webmcp.enabled` or a browser MCP server is detected in `.mcp.json`:
+
+ For EACH acceptance criterion:
+ 1. Navigate to the affected page via `mcp_browser_navigate`
+ 2. Screenshot BEFORE: `mcp_browser_screenshot()`
+ 3. Perform the user action (click, type, select, submit)
+ 4. Wait 2-3 seconds for async updates
+ 5. Screenshot AFTER: `mcp_browser_screenshot()`
+ 6. Assert DOM state: `mcp_browser_evaluate("document.querySelector(...)")`
+ 7. Record in Behavioral Evidence Log
+
+ **High-risk tasks** (state mutation detected — useMutation, invalidateQueries, onMutate):
+ - After all criteria verified, wait 3 seconds
+ - Screenshot again — check state persisted after refetch
+ - Reload page: `mcp_browser_navigate` to same URL
+ - Wait for networkidle
+ - Screenshot — check state survived page reload
+ - If state reverted → the server didn't persist, or refetch overwrote it → FAIL
+
+ **2. Playwright Test Generation (secondary)**
+
+ When Playwright/Puppeteer is in dependencies but WebMCP is not available:
+
+ 1. Auto-generate a Playwright test from acceptance criteria (see the sketch after this list)
+ 2. Write test to `tests/verification/verify-{taskId}.spec.ts`
+ 3. Instruct the user: "Run `npx playwright test tests/verification/verify-{taskId}.spec.ts --headed` to verify"
+ 4. If the project has CI, the test persists as a regression guard
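+
+ A sketch of what a generated spec might look like, assuming the dev server default `http://localhost:5173` from the configuration below; the route, selectors, and criterion text are hypothetical:
+
+ ```js
+ // tests/verification/verify-{taskId}.spec.ts (illustrative)
+ import { test, expect } from '@playwright/test';
+
+ test('criterion: routing a rule to a department updates the cell', async ({ page }) => {
+   await page.goto('http://localhost:5173/pipeline-rules');      // hypothetical route
+   await page.getByRole('cell', { name: 'Route To' }).click();   // hypothetical selectors
+   await page.getByRole('option', { name: 'Design Department' }).click();
+   // Tier 3 evidence: assert the observed result, then confirm it survives a reload
+   await expect(page.getByText('Design DEPARTMENT')).toBeVisible();
+   await page.reload({ waitUntil: 'networkidle' });
+   await expect(page.getByText('Design DEPARTMENT')).toBeVisible();
+ });
+ ```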
+
+ **3. User Verification Checklist (fallback — always available)**
+
+ When neither WebMCP nor Playwright is available:
+
+ Present a checklist to the user:
+ ```
+ ━━━ USER VERIFICATION CHECKLIST ━━━
+ I cannot verify UI behavior from the CLI. Please check:
+
+ □ 1. Navigate to [page]
+ □ 2. [criterion 1 — specific action + expected result]
+ □ 3. [criterion 2 — specific action + expected result]
+ □ Wait 3 seconds after each action
+ □ Refresh the page and verify changes persisted
+
+ Reply "verified" when all checks pass, or describe what's broken.
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+ ```
+
+ **CRITICAL**: The agent MUST wait for the user's "verified" response before marking the task complete. Do NOT proceed to quality gates without verification.
+
+ #### Behavioral Evidence Log (BEL)
+
+ Before marking ANY UI task complete, produce a BEL:
+
+ ```
+ ━━━ BEHAVIORAL EVIDENCE LOG ━━━
+ Task: wf-XXXXXXXX
+ Method: WEBMCP / PLAYWRIGHT / USER_CHECKLIST
+ Verified on: localhost:5173
+
+ CRITERION: "[text]"
+ ACTION: Clicked "Route To" cell, selected "Design Department"
+ EXPECTED: Cell updates to show "Design DEPARTMENT"
+ OBSERVED: Cell shows "Design DEPARTMENT" with blue icon
+ WAIT: 3 seconds — state persisted after refetch
+ VERDICT: PASS
+ EVIDENCE: Tier 3 (INTERACTIVE)
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+ ```
+
+ The OBSERVED field MUST describe what was SEEN, not what the code theoretically produces.
+
+ #### Pre-Implementation "See Before You Touch" (modification tasks)
+
+ For tasks modifying existing UI (not greenfield):
+ 1. Start dev server if not running
+ 2. Navigate to the affected page
+ 3. Screenshot/observe current state (BEFORE)
+ 4. Document the baseline
+ 5. Then implement changes
+ 6. After implementation, compare BEFORE vs AFTER
+
+ #### Repeat Failure Protocol (Groundhog Day Detector)
+
+ When the SAME issue is reported in 2+ consecutive dispatches:
+
+ | Strike | Action |
+ |--------|--------|
+ | 1 | Normal fix + BEL |
+ | 2 | MANDATORY root cause analysis BEFORE coding. Change approach. Add console.log tracing. Tier 3+ evidence required. |
+ | 3 | HARD BLOCK: Cannot mark done without screenshot/console evidence. Must state what's DIFFERENT this time. |
+ | 4+ | ESCALATION: Acknowledge inability, suggest pair debugging with user. |
+
+ Run: `node node_modules/wogiflow/scripts/flow-runtime-verification.js repeat wf-XXXXXXXX`
+
+ #### Devil's Advocate Prompt
+
+ Before marking ANY task complete (frontend or backend), ask yourself:
+
+ > "Assume this is broken. What are the 3 most likely ways it could fail?"
+
+ Then CHECK each one:
+ 1. Does the API actually accept these fields? (curl it or check the DTO)
+ 2. Does the response include the fields I'm reading? (log the response)
+ 3. Does the UI update persist after refetch/re-render? (wait 3 seconds and look again)
+ 4. Is the request payload shape what the server expects? (compare the DTO with the frontend fetch)
+
+ If ANY is plausible and not verified → investigate before marking done.
+
+ ---
+
+ #### BACKEND: API Integration Test Generation
+
+ **Activates when**: Changed files match `*.controller.*`, `*.service.*`, `*.resolver.*`, `/routes/`, `/api/`, `*.dto.*`, `*.guard.*`, `*.middleware.*`
+
+ Run detection: `node node_modules/wogiflow/scripts/flow-runtime-verification.js api-detect [changed-files...]`
+
+ **For EACH acceptance criterion that involves an API endpoint**:
+
+ 1. **Identify the endpoint**: method (GET/POST/PUT/PATCH/DELETE), path, expected request/response shape
+ 2. **Generate an integration test** that:
+ - Makes the actual HTTP request to the running dev server
+ - Asserts the status code matches expected
+ - Asserts the response body contains expected fields
+ - For mutations (POST/PUT/PATCH/DELETE): re-fetches the resource to verify persistence
+ - For auth-protected endpoints: includes the auth token
+ 3. **Write the test** to `tests/verification/api-verify-{taskId}.test.js`
+ 4. **Run the test**: `node --test tests/verification/api-verify-{taskId}.test.js`
+ 5. **If test fails** → debug, fix the implementation, re-run (max 5 retries)
+ 6. **Test persists** as a regression guard
+
+ **API Test Template** (generated per criterion):
+
+ ```javascript
+ it('POST /api/pipeline-rules — creates a rule with correct fields', async () => {
+   const res = await apiRequest('POST', '/api/pipeline-rules', {
+     tagPattern: 'animation',
+     routeTo: { type: 'department', id: 'dept-123' },
+     mode: 'CLAIMABLE'
+   });
+
+   // Status check
+   assert.equal(res.status, 201);
+
+   // Response shape check
+   assert.ok(res.data.id, 'Response missing field: id');
+   assert.equal(res.data.tagPattern, 'animation');
+   assert.equal(res.data.mode, 'CLAIMABLE');
+
+   // Persistence check: re-fetch and verify stored
+   const verify = await apiRequest('GET', `/api/pipeline-rules/${res.data.id}`);
+   assert.equal(verify.status, 200);
+   assert.equal(verify.data.tagPattern, 'animation');
+ });
+ ```
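+
+ The template assumes an `apiRequest` helper in the generated harness returning `{ status, data }`; the spec doesn't pin its implementation down, but a minimal fetch-based sketch could be:
+
+ ```js
+ // Minimal helper matching the { status, data } shape used in the template above.
+ // The base URL mirrors config.runtimeVerification.backend.baseUrl (see Configuration below).
+ const BASE_URL = process.env.VERIFY_BASE_URL || 'http://localhost:3000';
+
+ async function apiRequest(method, path, body) {
+   const res = await fetch(BASE_URL + path, {
+     method,
+     headers: { 'Content-Type': 'application/json' },
+     body: body === undefined ? undefined : JSON.stringify(body)
+   });
+   return { status: res.status, data: await res.json().catch(() => null) };
+ }
+ ```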
+
+ **Boundary verification** (frontend↔backend):
+ When the task is `fullstack` (both UI and API files changed):
+ 1. Generate BOTH browser tests AND API tests
+ 2. The API test verifies the server accepts the payload shape the frontend sends
+ 3. The browser test verifies the UI correctly displays the response shape the server returns
+ 4. If either fails → the boundary contract is broken
+
+ **Quick verification via curl** (for manual checking):
+ The AI can also generate and run curl commands directly:
+ ```bash
+ # Create a rule
+ curl -s -X POST http://localhost:3000/api/pipeline-rules \
+   -H "Content-Type: application/json" \
+   -d '{"tagPattern":"animation","routeTo":{"type":"department","id":"dept-123"},"mode":"CLAIMABLE"}'
+
+ # Verify it was stored
+ curl -s http://localhost:3000/api/pipeline-rules | jq '.[-1]'
+ ```
+
+ ---
+
+ #### Configuration
+
+ ```json
+ {
+   "runtimeVerification": {
+     "enabled": true,
+     "autoGenerateTests": true,
+     "frontend": {
+       "method": "webmcp",
+       "fallback": ["playwright", "checklist"],
+       "devServerUrl": "http://localhost:5173"
+     },
+     "backend": {
+       "method": "api-test",
+       "fallback": ["curl", "checklist"],
+       "baseUrl": "http://localhost:3000"
+     },
+     "testOutput": "tests/verification",
+     "persistTests": true,
+     "blockOnFailure": true
+   }
+ }
+ ```
+
+ **`autoGenerateTests: true`** (default) — Tests are generated for EVERY task. This is the core behavioral change: verification is not an afterthought; it's built into the execution loop.
+
+ **`persistTests: true`** (default) — Generated tests stay in `tests/verification/` as permanent regression guards. Over time, this builds an automated test suite from the actual use cases that were implemented.
+
+ **`blockOnFailure: true`** (default) — If generated tests fail, the task is NOT complete. The agent must fix the implementation until tests pass.
+
+ #### Skip Conditions
+
+ - `config.runtimeVerification.enabled: false` → skip entirely (not recommended)
+ - Task has NO code files in the changed set (docs-only, config-only) → skip
+ - Task is L3 trivial AND no UI/API files → skip
+

  ### Step 3.6: Integration Wiring Validation (MANDATORY)

  Run `node node_modules/wogiflow/scripts/flow-wiring-verifier.js wf-XXXXXXXX`
@@ -462,17 +462,32 @@ const server = http.createServer(async (req, res) => {
      return;
    }

-   // Determine sender from header or default
-   const from = req.headers['x-wogi-from'] || 'workspace-manager';
+   // Determine sender from header or default (validate against name pattern)
+   const rawFrom = req.headers['x-wogi-from'] || '';
+   const from = VALID_NAME_PATTERN.test(rawFrom) ? rawFrom : 'workspace-manager';
+
+   // Parse effort level prefix: [effort:high] /wogi-start wf-xxx
+   let effortLevel = '';
+   let cleanBody = body;
+   const effortMatch = body.match(/^\[effort:(low|medium|high)\]\s*/);
+   if (effortMatch) {
+     effortLevel = effortMatch[1];
+     cleanBody = body.substring(effortMatch[0].length);
+   }

    // Forward as channel notification to Claude Code
    const meta = {
      from,
      port: String(PORT),
      repo: REPO_NAME,
-     receivedAt: new Date().toISOString()
+     receivedAt: new Date().toISOString(),
+     ...(effortLevel && { effortLevel })
    };
-   sendChannelNotification(body, meta);
+   // Send the clean body (without effort prefix) but include effort in meta
+   const notificationBody = effortLevel
+     ? `${cleanBody}\n\n[System: Apply reasoning effort level "${effortLevel}" to this task — propagated from workspace manager]`
+     : cleanBody;
+   sendChannelNotification(notificationBody, meta);

    // Also broadcast to SSE subscribers
    if (sseClients.size > 0) {
@@ -63,6 +63,12 @@ const WORKSPACE_GATES = [
      description: 'Verify integration map is up-to-date',
      phase: 'pre',
      severity: 'warning'
+   },
+   {
+     name: 'deploymentReadiness',
+     description: 'Verify changes are committed and pushed before handoff to downstream workers',
+     phase: 'post',
+     severity: 'error'
    }
  ];

@@ -434,6 +440,84 @@ function broadcastPostChange(workspaceRoot, fromRepo, context, options = {}) {
   * @param {Object} [taskMeta] — { taskId, taskTitle, changedFiles, impactAssessed }
   * @returns {{ passed: boolean, message: string, severity: string }}
   */
+ /**
+  * Deployment readiness gate — verifies changes are committed and pushed
+  * before allowing handoff to downstream workers.
+  *
+  * In workspace mode, when backend completes and frontend needs to start,
+  * the backend's changes MUST be committed and pushed first. Otherwise the
+  * frontend worker will build against stale code.
+  *
+  * Checks:
+  * 1. No uncommitted changes in the current repo (git status clean)
+  * 2. Local branch is not ahead of remote (changes are pushed)
+  *
+  * @param {string} workspaceRoot
+  * @param {Object} context
+  * @param {Object} taskMeta
+  * @returns {{ passed: boolean, message: string, severity: string }}
+  */
+ function gateDeploymentReadiness(workspaceRoot, context, taskMeta) {
+   const { execFileSync } = require('node:child_process');
+
+   try {
+     // Check 1: No uncommitted changes
+     const statusOutput = execFileSync('git', ['status', '--porcelain'], {
+       encoding: 'utf-8',
+       timeout: 5000,
+       stdio: ['pipe', 'pipe', 'pipe'],
+       cwd: workspaceRoot || process.cwd()
+     }).trim();
+
+     if (statusOutput) {
+       const lineCount = statusOutput.split('\n').filter(Boolean).length;
+       return {
+         passed: false,
+         message: `${lineCount} uncommitted change(s). Commit and push before handoff to downstream workers.`,
+         severity: 'error'
+       };
+     }
+
+     // Check 2: Not ahead of remote (changes pushed)
+     try {
+       const aheadOutput = execFileSync('git', ['rev-list', '--count', '@{upstream}..HEAD'], {
+         encoding: 'utf-8',
+         timeout: 5000,
+         stdio: ['pipe', 'pipe', 'pipe'],
+         cwd: workspaceRoot || process.cwd()
+       }).trim();
+
+       const aheadCount = parseInt(aheadOutput, 10);
+       if (aheadCount > 0) {
+         return {
+           passed: false,
+           message: `${aheadCount} commit(s) not pushed to remote. Push before handoff to downstream workers.`,
+           severity: 'error'
+         };
+       }
+     } catch (_err) {
+       // No upstream configured — skip push check but warn
+       return {
+         passed: true,
+         message: 'No upstream branch configured — push check skipped',
+         severity: 'warning'
+       };
+     }
+
+     return {
+       passed: true,
+       message: 'Changes committed and pushed — ready for downstream handoff',
+       severity: 'info'
+     };
+   } catch (err) {
+     return {
+       passed: true,
+       message: `Deployment readiness check failed (${err.message}) — degraded to manual`,
+       severity: 'warning'
+     };
+   }
+ }
+
  function runWorkspaceGate(gateName, workspaceRoot, context, taskMeta = {}) {
    switch (gateName) {
      case 'crossRepoImpactCheck':
@@ -451,6 +535,9 @@ function runWorkspaceGate(gateName, workspaceRoot, context, taskMeta = {}) {
      case 'integrationMapFreshness':
        return gateIntegrationMapFreshness(workspaceRoot);

+     case 'deploymentReadiness':
+       return gateDeploymentReadiness(workspaceRoot, context, taskMeta);
+
      default:
        return { passed: true, message: `Unknown gate: ${gateName}`, severity: 'warning' };
    }