npm - wogiflow - Versions diffs - 2.4.3 → 2.5.0 - Mend

wogiflow 2.4.3 → 2.5.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (24)

package/.claude/commands/wogi-audit.md +26 -0
package/.claude/commands/wogi-review.md +29 -0
package/.claude/commands/wogi-start.md +124 -0
package/.claude/docs/claude-code-compatibility.md +24 -0
package/.claude/docs/explore-agents.md +19 -2
package/.claude/settings.json +11 -0
package/bin/flow +11 -1
package/lib/workspace-channel-server.js +364 -0
package/lib/workspace-contracts.js +599 -0
package/lib/workspace-intelligence.js +600 -0
package/lib/workspace-messages.js +441 -0
package/lib/workspace-routing.js +782 -0
package/lib/workspace-sync.js +339 -0
package/lib/workspace.js +1349 -0
package/package.json +1 -1
package/scripts/flow-config-defaults.js +28 -0
package/scripts/flow-eval-calibration.js +257 -0
package/scripts/flow-eval-judge.js +10 -1
package/scripts/flow-eval.js +9 -0
package/scripts/flow-schema-drift.js +837 -0
package/scripts/hooks/adapters/claude-code.js +29 -0
package/scripts/hooks/core/task-created.js +83 -0
package/scripts/hooks/entry/claude-code/task-created.js +15 -0
package/scripts/postinstall.js +2 -0

package/.claude/commands/wogi-audit.md CHANGED

@@ -260,6 +260,32 @@ Return:
 - Score: A through F
 ```
+#### Agent 8: Schema Drift Auditor
+```
+Audit schema drift across the entire project.
+1. Identify all schema source-of-truth files:
+   - Read schema-map.md and schema-index.json for registered schemas
+   - Scan for convention files: *.prisma, *.entity.ts, *.model.ts, *.schema.ts
+2. For each schema file, extract all defined field names
+3. For each field, grep the codebase for references outside the schema file
+4. Cross-reference: are there field names in consumer code that do NOT exist
+   in the current schema? (stale references from past changes)
+5. Check for inconsistencies:
+   - Field name in consumer doesn't match schema casing
+   - Optional field accessed without null check in consumer
+   - Field used in tests but removed from schema
+6. Run automated detection:
+   node scripts/flow-schema-drift.js
+Return:
+- Orphaned field references (field in consumer, not in schema)
+- Casing mismatches
+- Coverage: % of schema fields actually used by consumers
+- Score: A through F
+```
 ### Step 3: Consolidate Results
 After all agents complete, consolidate into a single report.
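Steps 2 through 5 of the Agent 8 prompt come down to set arithmetic over field names. The sketch below approximates two of its outputs, coverage and casing mismatches. It rests on these assumptions:

- Field names have already been extracted from the schema file.
- Consumer files are passed in as plain strings.
- Identifiers are found with a naive regex rather than a parser, so comments and unrelated variables can produce false positives.

`auditFieldUsage` is a hypothetical name, and this is not the logic in `scripts/flow-schema-drift.js`. Detecting orphaned references would also need the list of fields that used to exist, which this sketch does not model.

```javascript
// Hypothetical helper approximating Agent 8's "coverage" and "casing mismatch" outputs.
// Assumes schema field names are already extracted; consumer files are given as strings.
function auditFieldUsage(schemaFields, consumerSources) {
  const exact = new Set(schemaFields);
  // Lowercased name -> canonical schema spelling, used to spot casing drift.
  const byLower = new Map(schemaFields.map((f) => [f.toLowerCase(), f]));
  const used = new Set();
  const casingMismatches = [];

  for (const [file, source] of Object.entries(consumerSources)) {
    // Naive tokenizer: every JS-style identifier in the file.
    for (const token of source.match(/[A-Za-z_$][\w$]*/g) || []) {
      if (exact.has(token)) {
        used.add(token);
      } else if (byLower.has(token.toLowerCase())) {
        casingMismatches.push({ file, found: token, expected: byLower.get(token.toLowerCase()) });
      }
    }
  }

  const coverage = schemaFields.length === 0
    ? 0
    : Math.round((used.size / schemaFields.length) * 100);
  return {
    used: [...used].sort(),
    unused: schemaFields.filter((f) => !used.has(f)),
    casingMismatches,
    coverage, // percent of schema fields referenced by at least one consumer
  };
}

const report = auditFieldUsage(["id", "email", "createdAt", "displayName"], {
  "src/user-card.ts": "render(user.displayName, user.email)",
  "src/user-list.ts": "sortBy(users, (u) => u.createdat)",
});
console.log(report.coverage); // 50
console.log(report.casingMismatches);
```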

package/.claude/commands/wogi-review.md CHANGED

@@ -453,6 +453,35 @@ For each issue found, report as JSON:
   "agent": "performance" }
 ```
+#### Agent: Schema Drift Review
+Enabled when `"schema-drift"` is in `config.review.agents.optional`. **Auto-enabled** when any changed file matches schema conventions (*.prisma, *.entity.ts, *.model.ts, *.schema.ts) or is listed in schema-map.md.
+Launch a Task agent with subagent_type=Explore:
+```
+Schema drift review of the following files:
+[FILE_LIST]
+Check for:
+1. Read schema-map.md and schema-index.json to identify schema source-of-truth files
+2. For each schema file in the changed set, parse the git diff for removed/renamed fields
+3. For each removed/renamed field, grep the entire codebase for references:
+   - Property access: obj.fieldName
+   - Destructuring: { fieldName }
+   - Object keys: fieldName:
+   - String literals: 'fieldName' or "fieldName"
+4. Report any consumer file that still references a removed/renamed field
+For each issue found, report as JSON:
+{ "id": "finding-NNN", "file": "consumer-path", "line": N, "type": "schema-drift",
+  "severity": "high", "category": "schema-drift",
+  "issue": "Consumer references field 'X' which was removed/renamed in schema-file",
+  "recommendation": "Update reference to use new field name / remove reference",
+  "autoFixable": true, "agent": "schema-drift" }
+```
+Also run `node scripts/flow-schema-drift.js [changed-files]` for automated detection and include its output.
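The four reference shapes in step 3 of the Schema Drift Review prompt map fairly directly onto regular expressions. This is a rough sketch that assumes line-by-line matching on raw source text. `findFieldReferences` is a hypothetical helper, not code taken from `flow-schema-drift.js`. Its limits:

- It misses multi-line destructuring and destructuring in function parameters.
- It can flag unrelated object keys that happen to share the field name.

```javascript
// Hypothetical sketch: scan consumer source for the four reference shapes listed
// in the Schema Drift Review prompt. Regex-based, so deliberately approximate.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function findFieldReferences(source, field) {
  const f = escapeRegExp(field);
  const patterns = [
    ["property-access", new RegExp(`\\.${f}\\b`)],                    // obj.fieldName
    ["destructuring", new RegExp(`\\{[^}]*\\b${f}\\b[^}]*\\}\\s*=`)], // { fieldName } = obj
    ["object-key", new RegExp(`(^|[{,\\s])${f}\\s*:`)],               // fieldName: value
    ["string-literal", new RegExp(`(['"])${f}\\1`)],                  // 'fieldName' or "fieldName"
  ];
  const hits = [];
  source.split("\n").forEach((text, i) => {
    for (const [kind, re] of patterns) {
      if (re.test(text)) hits.push({ line: i + 1, kind });
    }
  });
  return hits;
}

const src = [
  "const label = user.fullName;",
  "const { fullName } = profile;",
  "send({ fullName: n });",
  "pick(row, 'fullName');",
].join("\n");
console.log(findFieldReferences(src, "fullName").map((h) => h.kind));
// [ 'property-access', 'destructuring', 'object-key', 'string-literal' ]
```

Working on an AST instead of raw lines would avoid most of these false positives and misses, at the cost of a parser dependency.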
 ### Project-Rules Agents (Auto-Generated from decisions.md)
 When `config.review.agents.projectRules` is `true`, additional agents are **automatically generated** from project rules:

package/.claude/commands/wogi-start.md CHANGED

@@ -305,6 +305,45 @@ Test framework auto-detected from package.json: jest, vitest, mocha, tap, or fal
 4. If failing: debug, fix, retry (max 5 attempts)
 5. Mark completed only when verification passes
+### Step 3.05: Sprint-Based Context Reset (L1+ tasks with 5+ criteria)
+**Activates when**: `config.sprintReset.enabled` (default: true) AND task has 5+ acceptance criteria AND current criterion index is a multiple of `config.sprintReset.criteriaPerSprint` (default: 3).
+**The problem this solves**: For large tasks, context fills with implementation details from early criteria. By criterion 6+, the AI is working with degraded context — old diffs, stale tool results, and exploration artifacts crowd out what matters for the current criterion. The Anthropic harness design research found that full context resets with structured file-based handoffs produce higher quality output than continuous context for long-running tasks.
+**Procedure** (runs automatically at sprint boundaries):
+1. After completing criterion N (where N % `criteriaPerSprint` === 0 AND remaining criteria > 0):
+2. **Commit progress**: `git add -A && git commit -m "sprint: criteria 1-N of M complete"`
+3. **Save sprint checkpoint** to `.workflow/state/task-checkpoint.json`:
+   - Task ID, spec path, completed criteria indices, changed files, remaining criteria
+4. **Output sprint summary** (visible to user):
+   ```
+   ━━━ SPRINT BOUNDARY ━━━
+   Completed criteria 1-N of M. Committing and resetting context.
+   Remaining: criteria (N+1)-M
+   ```
+5. **Compact context** — this triggers a full compaction. The PostCompact hook restores:
+   - Active task ID and spec reference
+   - Which criteria are done vs pending (from checkpoint)
+   - Changed files list
+6. **Resume from checkpoint** — read the spec fresh, skip completed criteria, continue with criterion N+1
+**Why this is different from normal compaction**: Normal compaction summarizes the conversation. Sprint reset goes further — it commits work, saves a structured checkpoint, and compacts. The next sprint starts with a clean slate + the checkpoint file, not a compressed summary of everything that happened. The AI reads the spec fresh rather than relying on a summarized memory of it.
+**Configuration**:
+```json
+{
+  "sprintReset": {
+    "enabled": true,
+    "criteriaPerSprint": 3,
+    "minTaskCriteria": 5
+  }
+}
+```
+**Skip when**: Task has < 5 criteria, TDD mode is active (TDD has its own rhythm), or `sprintReset.enabled` is false.
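The activation rule and checkpoint contents of Step 3.05 can be expressed as two small functions. This is a sketch under stated assumptions:

- The config keys mirror the documented `sprintReset` block.
- `isSprintBoundary`, `buildSprintCheckpoint` and the checkpoint's field names are hypothetical. The step lists what the checkpoint holds, not its schema.
- Criteria are numbered from 1.

```javascript
// Hypothetical sketch of the Step 3.05 trigger check and checkpoint payload.
// Config keys mirror the documented `sprintReset` block; names are illustrative.
const DEFAULT_SPRINT_RESET = { enabled: true, criteriaPerSprint: 3, minTaskCriteria: 5 };

// completedCount is the "N" in the procedure: criteria finished so far.
function isSprintBoundary(completedCount, totalCriteria, config = {}, tddActive = false) {
  const cfg = { ...DEFAULT_SPRINT_RESET, ...config };
  if (!cfg.enabled || tddActive) return false;             // disabled, or TDD keeps its own rhythm
  if (totalCriteria < cfg.minTaskCriteria) return false;   // small tasks never reset
  const remaining = totalCriteria - completedCount;
  return completedCount > 0 && completedCount % cfg.criteriaPerSprint === 0 && remaining > 0;
}

// Assumed shape of .workflow/state/task-checkpoint.json.
function buildSprintCheckpoint(taskId, specPath, completedCount, totalCriteria, changedFiles) {
  const all = Array.from({ length: totalCriteria }, (_, i) => i + 1);
  return {
    taskId,
    specPath,
    completedCriteria: all.slice(0, completedCount),
    remainingCriteria: all.slice(completedCount),
    changedFiles: [...changedFiles].sort(),
  };
}

// A 7-criterion task resets after criteria 3 and 6, but not after 7 (nothing remains).
const boundaries = [1, 2, 3, 4, 5, 6, 7].filter((n) => isSprintBoundary(n, 7));
console.log(boundaries); // [ 3, 6 ]
```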
 ### Step 3.5: Criteria Completion Verification (MANDATORY)
 After implementing all scenarios, BEFORE quality gates:
@@ -416,6 +455,91 @@ After implementing all scenarios, BEFORE quality gates:
 **Skip conditions**: Tasks that target a specific file or a small known set (e.g., "remove the mock import in Dashboard.tsx") don't need the full inventory — they're scoped enough already. The inventory is for "all X" / "every X" / "clean up X everywhere" tasks.
+### Step 3.56: Skeptical Evaluator Gate (L2+ tasks, when `config.skepticalEvaluator.enabled`)
+**The problem this solves**: The same agent that wrote the code verifies its own work in Step 3.5. Anthropic's harness design research found that "separating the agent doing the work from the agent judging it proves to be a strong lever" and that "tuning standalone evaluators toward skepticism is far more tractable than making a generator critical of its own work." This is "confident praise bias" — the implementer always thinks it did a good job.
+**Activates when**: `config.skepticalEvaluator.enabled` (default: true) AND task level is L2 or higher (not L3 trivial tasks).
+**Procedure**:
+1. **Spawn a skeptical evaluator sub-agent** (separate from the implementation agent):
+   ```
+   Agent({
+     subagent_type: "code-reviewer",
+     model: "sonnet",  // Use a different model for diversity
+     prompt: <see below>
+   })
+   ```
+2. **Evaluator prompt** (tuned toward skepticism):
+   ```
+   You are a SKEPTICAL code evaluator. Your job is to find problems, not praise.
+   Assume the implementation has gaps until proven otherwise.
+   ## Task Specification
+   <read and paste the spec from .workflow/specs/wf-XXXXXXXX.md>
+   ## Implementation Diff
+   <git diff of all changed files>
+   ## Your Job
+   For EACH acceptance criterion in the spec:
+   1. Read the criterion carefully
+   2. Find the EXACT code that implements it (cite file:line)
+   3. Grade: PASS (fully works), PARTIAL (code exists but incomplete), FAIL (not implemented)
+   4. If PARTIAL or FAIL: explain exactly what's missing
+   IMPORTANT: "Code exists" is NOT the same as "criterion is met."
+   A service that exists but is never called = FAIL.
+   A component that renders but doesn't handle the specified edge case = PARTIAL.
+   Only grade PASS when the criterion is FULLY satisfied end-to-end.
+   ## Output Format
+   Return JSON:
+   {
+     "criteria": [
+       { "criterion": "...", "grade": "PASS|PARTIAL|FAIL", "evidence": "file:line", "issue": "..." }
+     ],
+     "overallPass": true/false,
+     "criticalIssues": ["..."]
+   }
+   ```
+3. **Process evaluator results**:
+   - If `overallPass: true` → proceed to Step 3.6
+   - If `overallPass: false` → **iteration loop** (see below)
+4. **Generator-Evaluator Iteration Loop** (when evaluator finds issues):
+   - Feed the evaluator's `criticalIssues` and failed criteria back to the implementation context
+   - Fix the identified issues (targeted fixes, not re-implementation)
+   - Re-run the evaluator on the updated diff
+   - **Max iterations**: `config.skepticalEvaluator.maxIterations` (default: 3)
+   - If still failing after max iterations → proceed to Step 3.6 anyway but **flag the unresolved issues** in the completion report
+5. **Calibration** (when `config.skepticalEvaluator.calibration` is true):
+   - Before spawning the evaluator, check `.workflow/state/eval-calibration.json` for calibration examples
+   - If examples exist, inject 2-3 into the evaluator prompt as few-shot examples:
+     - One high-scoring example (what a PASS looks like)
+     - One low-scoring example (what a FAIL looks like)
+   - This prevents score drift — the evaluator is anchored to concrete examples
+**Configuration**:
+```json
+{
+  "skepticalEvaluator": {
+    "enabled": true,
+    "maxIterations": 3,
+    "model": "sonnet",
+    "calibration": true,
+    "skipForL3": true
+  }
+}
+```
+**Why this works**: The evaluator has NO emotional investment in the code. It reads the spec and the diff cold. It's explicitly prompted to be skeptical. And because it's a separate sub-agent, it has a fresh context — no accumulated "I already know this works" bias from the implementation phase.
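The iteration loop in item 4 is a bounded retry around an evaluator. In the sketch below, `evaluate` and `fix` are injected stand-ins for spawning the evaluator sub-agent and applying targeted fixes, so no agent API is assumed. Two interpretations are assumptions, not something the step states:

- `overallPass` is derived as "every criterion graded PASS".
- `maxIterations` counts fix-and-re-evaluate rounds after the first evaluation.

The function names are hypothetical.

```javascript
// Hypothetical sketch of the Step 3.56 generator-evaluator loop.
// `evaluate` and `fix` are injected stand-ins for the evaluator sub-agent and
// for applying targeted fixes, so no agent API is assumed here.

// Assumption: overallPass means every criterion was graded PASS.
function summarizeGrades(criteria) {
  const failing = criteria.filter((c) => c.grade !== "PASS");
  return { overallPass: failing.length === 0, failing };
}

// Assumption: maxIterations counts fix + re-evaluate rounds after the first evaluation.
function runEvaluatorLoop(evaluate, fix, maxIterations = 3) {
  let result = evaluate();
  let rounds = 0;
  while (!result.overallPass && rounds < maxIterations) {
    const failed = result.criteria.filter((c) => c.grade !== "PASS");
    fix(result.criticalIssues, failed); // targeted fixes, not re-implementation
    result = evaluate();                // re-run on the updated diff
    rounds += 1;
  }
  // Unresolved issues are flagged in the report rather than blocking Step 3.6.
  return { passed: result.overallPass, rounds, unresolved: result.overallPass ? [] : result.criticalIssues };
}

// Simulated evaluator: the second criterion stays PARTIAL until two fix rounds have run.
let fixesApplied = 0;
const evaluate = () => {
  const criteria = [
    { criterion: "Form validates email", grade: "PASS" },
    { criterion: "Lockout after 5 failed logins", grade: fixesApplied >= 2 ? "PASS" : "PARTIAL" },
  ];
  const { overallPass } = summarizeGrades(criteria);
  return { criteria, overallPass, criticalIssues: overallPass ? [] : ["Lockout counter is never incremented"] };
};

const outcome = runEvaluatorLoop(evaluate, () => { fixesApplied += 1; });
console.log(outcome.passed, outcome.rounds); // true 2
```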
 ### Step 3.6: Integration Wiring Validation (MANDATORY)
 Run `node node_modules/wogiflow/scripts/flow-wiring-verifier.js wf-XXXXXXXX`

package/.claude/docs/claude-code-compatibility.md CHANGED

@@ -71,6 +71,7 @@ flow parallel check  # See available parallel tasks
 | 2.0.0+ | 2.1.76+ | PostCompact hook, Elicitation/ElicitationResult events, deferred tool schema fix |
 | 2.1.0+ | 2.1.77+ | PreToolUse allow/deny separation, 128k output tokens, worktree sparse checkout, compaction circuit breaker |
 | 2.4.0+ | 2.1.83+ | managed-settings.d/, CwdChanged/FileChanged hooks, ENV_SCRUB, --channels limitations, MEMORY.md 25KB cap |
+| 2.5.0+ | 2.1.84+ | TaskCreated hook, YAML glob lists in rules, CLAUDE_STREAM_IDLE_TIMEOUT_MS, WorktreeCreate HTTP transport, idle-return prompt, MCP 2KB cap |
 ### Environment Variables (2.1.19+)
@@ -163,6 +164,7 @@ await cancelTask('wf-123', 'superseded', false);
 | Stop | stop.js | Session cleanup |
 | SessionEnd | session-end.js | Request logging, progress update |
 | TaskCompleted | task-completed.js | Move task to recentlyCompleted |
+| TaskCreated | task-created.js | Link native tasks to active WogiFlow task (2.1.84+) |
 | ConfigChange | config-change.js | Re-sync bridge on mid-session config changes |
 | InstructionsLoaded | instructions-loaded.js | Package check, rule conflicts, auto-onboard |
 | PostCompact | post-compact.js | Re-inject state after context compaction (2.1.76+) |
@@ -263,6 +265,28 @@ await cancelTask('wf-123', 'superseded', false);
 - **Uninstalled plugin hooks fix**: Fixed uninstalled plugin hooks continuing to fire until the next session. Improves hook hygiene for WogiFlow plugin management.
+### Features in 2.1.84+
+- **TaskCreated hook event**: New hook event fired when a task is created via TaskCreate. WogiFlow uses this to link native Claude Code tasks to the active WogiFlow task in `session-state.json`, enabling cross-system task tracking. Implemented in `scripts/hooks/core/task-created.js`.
+- **YAML glob lists in rules/skills frontmatter**: Rules and skills `globs:` field now accepts YAML lists in addition to single strings. WogiFlow's `flow-rules-sync.js` currently generates single-string globs with brace expansion (`"**/*.{js,ts}"`). This opens the door to cleaner multi-pattern rules without brace expansion hacks. No immediate code change — tracked as improvement.
+- **CLAUDE_STREAM_IDLE_TIMEOUT_MS**: New env var to configure the streaming idle watchdog threshold (default 90s). WogiFlow's explore phase launches 5-6 parallel agents — if an agent takes >90s without streaming output, the watchdog may kill the connection. Users experiencing timeouts during explore should set this higher (e.g., `CLAUDE_STREAM_IDLE_TIMEOUT_MS=180000` for 3 minutes).
+- **WorktreeCreate hook HTTP transport**: WorktreeCreate now supports `type: "http"` — return the created worktree path via `hookSpecificOutput.worktreePath`. WogiFlow continues to use command transport locally. HTTP transport enables wogiflow-cloud to receive worktree events server-side for team task tracking. Listed in `UNUSED_SUPPORTED_EVENTS` as a cloud opportunity.
+- **Idle-return prompt**: Users returning after 75+ minutes are nudged to `/clear`. WogiFlow's PostCompact hook handles `/clear` correctly — it fires on compaction, re-injects state (active task, workflow phase, durable session progress), and re-arms routing. Session restore tested and working via the same PostCompact pathway.
+- **MCP tool descriptions capped at 2KB**: MCP tool descriptions and server instructions now capped at 2KB to prevent OpenAPI-generated servers from bloating context. WogiFlow's plugin system registers MCP servers — plugins with verbose OpenAPI specs may have descriptions silently truncated. Plugin docs should note this limit.
+- **System-prompt caching with ToolSearch**: Global system-prompt caching now works when ToolSearch is enabled. WogiFlow sessions use ToolSearch for deferred MCP tools — this reduces input token costs automatically. No code change needed.
+- **Subagent JSON-schema fix**: Fixed workflow subagents failing with API 400 when the outer session uses `--json-schema` and the subagent also specifies a schema. Improves reliability of WogiFlow explore agents in structured-output sessions.
+- **allowedChannelPlugins managed setting**: Enterprise admins can define a channel plugin allowlist. Relevant for wogiflow-cloud teams product — team admins could control which wogi plugins are allowed across the team. Tracked as cloud opportunity.
+- **ANTHROPIC_DEFAULT_{OPUS,SONNET,HAIKU}_MODEL_SUPPORTS**: New env vars to override effort/thinking capability detection for pinned default models on Bedrock/Vertex/Foundry. WogiFlow's hybrid mode routes to different models — 3P users who pin models can now declare their capabilities properly.
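For plugin authors affected by the 2KB MCP cap, a pre-flight check can flag descriptions likely to be truncated. This sketch makes two assumptions:

- The cap is measured as 2048 bytes of UTF-8. The changelog wording does not say whether it counts bytes or characters.
- Each tool is an object shaped like `{ name, description }`.

`findOversizedDescriptions` is a hypothetical helper, not part of WogiFlow or Claude Code.

```javascript
// Hypothetical pre-flight check for the MCP 2KB description cap.
// Assumption: the cap is 2048 bytes of UTF-8; adjust if Claude Code counts characters.
const MCP_DESCRIPTION_CAP_BYTES = 2048;

function findOversizedDescriptions(tools, cap = MCP_DESCRIPTION_CAP_BYTES) {
  return tools
    .map((tool) => ({ name: tool.name, bytes: Buffer.byteLength(tool.description || "", "utf8") }))
    .filter((entry) => entry.bytes > cap);
}

const tools = [
  { name: "search_issues", description: "Search issues by label." },
  { name: "openapi_dump", description: "x".repeat(3000) }, // typical OpenAPI-generated bloat
];
console.log(findOversizedDescriptions(tools)); // [ { name: 'openapi_dump', bytes: 3000 } ]
```

Measuring bytes rather than `string.length` is the conservative choice here: non-ASCII text takes more than one byte per character in UTF-8.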
 ### Simple Mode Naming Distinction
 Claude Code's `CLAUDE_CODE_SIMPLE` environment variable (which enables a simplified tool set) is **unrelated** to WogiFlow's `loops.simpleMode` (a lightweight task completion loop using string detection). They are separate features that happen to share the word "simple":

package/.claude/docs/explore-agents.md CHANGED

@@ -111,12 +111,23 @@ Planned files: [FILES_TO_CHANGE]
 4. If a memory database exists (.workflow/memory/local.db or via MCP):
    - Query for rejected approaches from past tasks touching the same files
    - Surface any "approach X was tried and failed" warnings
+5. **Eval trend analysis** (NEW — from Anthropic harness design research):
+   - Read `.workflow/evals/` directory for the last 5-10 eval results
+   - Calculate average score per dimension (completeness, accuracy, workflowCompliance, tokenEfficiency, quality)
+   - If any dimension averages below 6/10 across recent evals:
+     - Flag it as a RECURRING WEAKNESS
+     - Suggest a mitigation for the spec (e.g., "tokenEfficiency averaging 4/10 → add context budgeting hints")
+   - If eval calibration exists (`.workflow/state/eval-calibration.json`):
+     - Compare the current task type against high/low calibration examples
+     - Warn if this task type historically scores low
 Return:
 - Known risks for this task type (from feedback-patterns)
 - Past corrections in this area (from corrections/)
 - Promoted rules that apply (from decisions.md, count >= 3)
 - Rejected approaches from similar past work (from memory-db)
+- **Eval trend warnings** (dimensions scoring below 6/10 in recent evals)
+- **Recommended spec hints** (based on eval trends — inject into spec generation)
 - Confidence: HIGH (many data points) / MEDIUM / LOW (no history)
 ```
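The eval trend analysis added as item 5 reduces to a per-dimension average over recent results. This sketch assumes each eval file has already been parsed into an object with a `scores` map keyed by the five dimension names, ordered oldest to newest. The doc says "last 5-10"; the sketch defaults to the last 10. `analyzeEvalTrends` is a hypothetical helper.

```javascript
// Hypothetical sketch of the eval trend analysis: average each scoring dimension
// over the most recent evals and flag any dimension below the threshold.
// Assumes evals are pre-parsed objects like { scores: { completeness: 7, ... } }.
const DIMENSIONS = ["completeness", "accuracy", "workflowCompliance", "tokenEfficiency", "quality"];

function analyzeEvalTrends(evals, { lastN = 10, threshold = 6 } = {}) {
  const recent = evals.slice(-lastN);
  const averages = {};
  const weaknesses = [];
  for (const dim of DIMENSIONS) {
    const values = recent.map((e) => e.scores?.[dim]).filter((v) => typeof v === "number");
    if (values.length === 0) continue; // no data for this dimension
    const avg = values.reduce((sum, v) => sum + v, 0) / values.length;
    averages[dim] = Math.round(avg * 10) / 10; // one decimal place for reporting
    if (avg < threshold) weaknesses.push(dim); // RECURRING WEAKNESS candidate
  }
  return { sampleSize: recent.length, averages, weaknesses };
}

const history = [
  { scores: { completeness: 8, accuracy: 7, workflowCompliance: 9, tokenEfficiency: 4, quality: 7 } },
  { scores: { completeness: 7, accuracy: 8, workflowCompliance: 8, tokenEfficiency: 5, quality: 6 } },
  { scores: { completeness: 9, accuracy: 6, workflowCompliance: 9, tokenEfficiency: 3, quality: 7 } },
];
console.log(analyzeEvalTrends(history).weaknesses); // [ 'tokenEfficiency' ]
```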
@@ -153,11 +164,12 @@ Return:
 - Security patterns that apply
 ```
-## Agent 6: Consumer Impact Analyzer (Refactor/Migration Only)
+## Agent 6: Consumer Impact Analyzer (Refactor/Migration/Schema Changes)
-Launch as `Agent(subagent_type=Explore)` (local only). **MANDATORY for refactor, migration, architecture tasks.**
+Launch as `Agent(subagent_type=Explore)` (local only). **MANDATORY for refactor, migration, architecture tasks AND any task that modifies schema/model files.**
 Trigger keywords: refactor, replace, rename, restructure, extract, consolidate, deprecate, migrate, move, reorganize.
+Trigger files: *.prisma, *.entity.ts, *.model.ts, *.schema.ts, files listed in schema-map.md.
 ```
 Analyze consumer impact for task: "[TASK_TITLE]"
@@ -172,10 +184,15 @@ You MUST map all consumers before changes proceed.
    c. Grep for ALL config files that reference it
    d. Grep for ALL documentation (.md) that reference it
    e. Grep for ALL test files that import or mock it
+   f. For schema/model files: grep for FIELD-LEVEL references — property accesses
+      (obj.fieldName), destructuring ({ fieldName }), object keys (fieldName:),
+      and string literals ('fieldName'). Report which specific fields are referenced
+      by which consumers. This catches drift that module-level import checks miss.
 2. For EACH consumer, classify impact:
    - BREAKING (import/API changes) — describe what breaks + migration path
    - NEEDS-UPDATE (behavior change) — describe expected behavioral change
+   - SCHEMA-DRIFT (field removed/renamed but consumer still references old name)
    - SAFE (no change needed)
 3. Check indirect consumers (up to 3 levels deep)

package/.claude/settings.json CHANGED

@@ -132,6 +132,17 @@
           }
         ]
       }
+    ],
+    "TaskCreated": [
+      {
+        "hooks": [
+          {
+            "type": "command",
+            "command": "node scripts/hooks/entry/claude-code/task-created.js",
+            "timeout": 5
+          }
+        ]
+      }
     ]
   },
   "_wogiFlowManaged": true,

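The `TaskCreated` entry in `settings.json` runs `task-created.js` as a command hook with a 5-second timeout. Claude Code passes hook input to command hooks as JSON on stdin, but this diff does not show the payload fields for this event. The sketch therefore makes these assumptions, all hypothetical names:

- The payload carries `task_id` and `task_subject`.
- `session-state.json` holds an `activeTask` object with a `linkedNativeTasks` array.

The linking step is kept as a pure function so it is easy to test. A real entry script would read stdin, call it, and write the state file back.

```javascript
// Hypothetical linking logic for a TaskCreated hook. Payload and state field
// names (task_id, task_subject, activeTask, linkedNativeTasks) are assumptions,
// not the documented Claude Code or WogiFlow schema.
function linkNativeTask(sessionState, hookInput, now = new Date().toISOString()) {
  const active = sessionState.activeTask;
  if (!active || !hookInput.task_id) {
    return { state: sessionState, linked: false }; // no active WogiFlow task: hook is a no-op
  }
  const existing = active.linkedNativeTasks || [];
  if (existing.some((t) => t.id === hookInput.task_id)) {
    return { state: sessionState, linked: false }; // already linked: keep the state unchanged
  }
  const updated = {
    ...sessionState,
    activeTask: {
      ...active,
      linkedNativeTasks: [
        ...existing,
        { id: hookInput.task_id, subject: hookInput.task_subject || "", linkedAt: now },
      ],
    },
  };
  return { state: updated, linked: true }; // input state object is not mutated
}

const state = { activeTask: { id: "wf-1a2b3c4d" } };
const first = linkNativeTask(state, { task_id: "7", task_subject: "Write tests" }, "2025-01-01T00:00:00Z");
const again = linkNativeTask(first.state, { task_id: "7" });
console.log(first.linked, again.linked); // true false
```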
package/bin/flow CHANGED

@@ -23,7 +23,7 @@ const packageJson = require('../package.json');
 const VERSION = packageJson.version;
 // Global commands that don't require a project context
-const GLOBAL_COMMANDS = ['init', 'upgrade', 'version', '--version', '-v', '--help', '-h', 'skill', 'channel', 'login', 'logout'];
+const GLOBAL_COMMANDS = ['init', 'upgrade', 'version', '--version', '-v', '--help', '-h', 'skill', 'channel', 'login', 'logout', 'workspace'];
 /**
  * Find the project root by looking for .workflow directory
@@ -97,6 +97,7 @@ Usage: flow <command> [options]
 Global Commands:
   init                 Initialize Wogi Flow in a new project
+  workspace init       Initialize a multi-repo workspace
   upgrade              Upgrade an existing project to latest version
   login                Connect to WogiFlow Teams
   logout               Disconnect from WogiFlow Teams
@@ -208,6 +209,15 @@ function main() {
     return;
   }
+  if (command === 'workspace') {
+    const { workspace } = require('../lib/workspace');
+    workspace(args.slice(1)).catch(err => {
+      console.error(`Workspace error: ${err.message}`);
+      process.exit(1);
+    });
+    return;
+  }
   // For all other commands, try to find project context
   const projectRoot = findProjectRoot();
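The `bin/flow` change does two things:

- It adds `workspace` to `GLOBAL_COMMANDS`, so the command runs without a `.workflow` directory.
- It dispatches to the async `workspace()` export and turns a rejected promise into exit code 1.

The sketch below reduces that routing decision to a pure function. `routeCommand` is a hypothetical helper that returns a description instead of executing anything. It assumes, as `args.slice(1)` suggests, that the first argument is the command name.

```javascript
// Hypothetical reduction of bin/flow's routing: decide whether a command needs a
// project root, and what arguments the workspace handler would receive.
const GLOBAL_COMMANDS = ["init", "upgrade", "version", "--version", "-v", "--help", "-h",
  "skill", "channel", "login", "logout", "workspace"];

function routeCommand(argv) {
  const [command, ...rest] = argv;
  if (command === "workspace") {
    // Mirrors `workspace(args.slice(1))`: the handler sees only the subcommand onward.
    return { handler: "workspace", args: rest, needsProject: false };
  }
  // Anything not listed as global falls through to findProjectRoot().
  return { handler: command, args: rest, needsProject: !GLOBAL_COMMANDS.includes(command) };
}

console.log(routeCommand(["workspace", "init"]).args);          // [ 'init' ]
console.log(routeCommand(["parallel", "check"]).needsProject);  // true
```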