wogiflow 2.4.3 → 2.5.0

@@ -260,6 +260,32 @@ Return:
260
260
  - Score: A through F
261
261
  ```
262
262
 
263
+ #### Agent 8: Schema Drift Auditor
264
+
265
+ ```
266
+ Audit schema drift across the entire project.
267
+
268
+ 1. Identify all schema source-of-truth files:
269
+ - Read schema-map.md and schema-index.json for registered schemas
270
+ - Scan for convention files: *.prisma, *.entity.ts, *.model.ts, *.schema.ts
271
+ 2. For each schema file, extract all defined field names
272
+ 3. For each field, grep the codebase for references outside the schema file
273
+ 4. Cross-reference: are there field names in consumer code that do NOT exist
274
+ in the current schema? (stale references from past changes)
275
+ 5. Check for inconsistencies:
276
+ - Field name in consumer doesn't match schema casing
277
+ - Optional field accessed without null check in consumer
278
+ - Field used in tests but removed from schema
279
+ 6. Run automated detection:
280
+ node scripts/flow-schema-drift.js
281
+
282
+ Return:
283
+ - Orphaned field references (field in consumer, not in schema)
284
+ - Casing mismatches
285
+ - Coverage: % of schema fields actually used by consumers
286
+ - Score: A through F
287
+ ```
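Steps 2 through 5, and the Coverage and Casing items in the Return list, come down to set arithmetic once schema field names and consumer source text are in hand. The sketch below is minimal and rests on three assumptions. Field names have already been extracted from the schema file. Consumer files are available as strings. The identifier regex is an acceptable stand-in for a real parser. `analyzeSchemaUsage` is a hypothetical helper, not part of `flow-schema-drift.js`. Orphaned references are not covered here because detecting them also needs the previous field list.

```javascript
// Hypothetical helper: measures how many schema fields consumers reference and
// flags identifiers that match a schema field only when case is ignored.
function analyzeSchemaUsage(schemaFields, consumerSources) {
  const fieldSet = new Set(schemaFields);
  // Lower-cased field name -> canonical spelling from the schema.
  const byLower = new Map(schemaFields.map((f) => [f.toLowerCase(), f]));

  const used = new Set();
  const casingMismatches = [];

  for (const [file, source] of Object.entries(consumerSources)) {
    // Rough heuristic: every identifier-like token in the file.
    const tokens = new Set(source.match(/[A-Za-z_$][A-Za-z0-9_$]*/g) || []);
    for (const token of tokens) {
      if (fieldSet.has(token)) {
        used.add(token);
        continue;
      }
      const canonical = byLower.get(token.toLowerCase());
      if (canonical) {
        casingMismatches.push({ file, found: token, expected: canonical });
      }
    }
  }

  const coverage = schemaFields.length === 0
    ? 0
    : Math.round((used.size / schemaFields.length) * 100);

  return {
    coverage, // percent of schema fields referenced at least once
    unusedFields: schemaFields.filter((f) => !used.has(f)),
    casingMismatches,
  };
}

const report = analyzeSchemaUsage(
  ['id', 'email', 'createdAt', 'displayName'],
  {
    'src/user-card.ts': 'const { email, displayname } = user; show(user.id);',
    'src/user-list.ts': 'users.sort((a, b) => a.createdAt - b.createdAt);',
  },
);
console.log(report.coverage); // 75
```

A token-based scan like this over-reports when unrelated variables share a field's name. It is most useful for ranking candidates for the agent to inspect, not as a verdict.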
288
+
263
289
  ### Step 3: Consolidate Results
264
290
 
265
291
  After all agents complete, consolidate into a single report.
@@ -453,6 +453,35 @@ For each issue found, report as JSON:
453
453
  "agent": "performance" }
454
454
  ```
455
455
 
456
+ #### Agent: Schema Drift Review
457
+
458
+ Enabled when `"schema-drift"` is in `config.review.agents.optional`. **Auto-enabled** when any changed file matches schema conventions (`*.prisma`, `*.entity.ts`, `*.model.ts`, `*.schema.ts`) or is listed in schema-map.md.
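The enablement rule is an explicit opt-in OR a file-pattern match. The following is a minimal sketch. It assumes `config` has the `review.agents.optional` array named above and that the paths registered in schema-map.md have already been read into an array. `isSchemaDriftReviewEnabled` and the suffix-matching approach are illustrative and do not reproduce WogiFlow's actual implementation.

```javascript
const path = require('path');

// Suffixes taken from the schema conventions listed above.
const SCHEMA_SUFFIXES = ['.prisma', '.entity.ts', '.model.ts', '.schema.ts'];

// Hypothetical helper: true when the schema-drift review agent should run.
function isSchemaDriftReviewEnabled(config, changedFiles, schemaMapPaths = []) {
  const optional = config?.review?.agents?.optional ?? [];
  if (optional.includes('schema-drift')) return true; // explicit opt-in

  // Normalize so "./prisma/schema.prisma" and "prisma/schema.prisma" compare equal.
  const registered = new Set(schemaMapPaths.map((p) => path.posix.normalize(p)));
  return changedFiles.some((file) => {
    const normalized = path.posix.normalize(file);
    return (
      SCHEMA_SUFFIXES.some((suffix) => normalized.endsWith(suffix)) ||
      registered.has(normalized)
    );
  });
}

console.log(isSchemaDriftReviewEnabled({}, ['src/db/user.model.ts'])); // true
console.log(isSchemaDriftReviewEnabled({}, ['README.md'])); // false
```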
459
+
460
+ Launch a Task agent with subagent_type=Explore:
461
+ ```
462
+ Schema drift review of the following files:
463
+ [FILE_LIST]
464
+
465
+ Check for:
466
+ 1. Read schema-map.md and schema-index.json to identify schema source-of-truth files
467
+ 2. For each schema file in the changed set, parse the git diff for removed/renamed fields
468
+ 3. For each removed/renamed field, grep the entire codebase for references:
469
+ - Property access: obj.fieldName
470
+ - Destructuring: { fieldName }
471
+ - Object keys: fieldName:
472
+ - String literals: 'fieldName' or "fieldName"
473
+ 4. Report any consumer file that still references a removed/renamed field
474
+
475
+ For each issue found, report as JSON:
476
+ { "id": "finding-NNN", "file": "consumer-path", "line": N, "type": "schema-drift",
477
+ "severity": "high", "category": "schema-drift",
478
+ "issue": "Consumer references field 'X' which was removed/renamed in schema-file",
479
+ "recommendation": "Update reference to use new field name / remove reference",
480
+ "autoFixable": true, "agent": "schema-drift" }
481
+ ```
482
+
483
+ Also run `node scripts/flow-schema-drift.js [changed-files]` for automated detection and include its output.
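Steps 3 and 4 of the review prompt amount to a line-by-line scan for each removed or renamed field across the four reference shapes it lists. The sketch below is minimal and assumes the removed field names and consumer file contents are already in memory. `findStaleFieldReferences` is hypothetical. The regexes are heuristics that will also match unrelated identifiers with the same name. The output mirrors the finding JSON above, with `finding-NNN` filled in as a zero-padded counter (an assumption about the numbering scheme).

```javascript
// Escape a field name for safe use inside a RegExp.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// One pattern per reference shape from the review prompt.
function referencePatterns(field) {
  const f = escapeRegExp(field);
  return [
    new RegExp(`\\.${f}\\b`), // property access: obj.fieldName
    new RegExp(`\\{[^}]*\\b${f}\\b[^}]*\\}`), // destructuring: { fieldName }
    new RegExp(`\\b${f}\\s*:`), // object key: fieldName:
    new RegExp(`(['"])${f}\\1`), // string literal: 'fieldName' or "fieldName"
  ];
}

// Hypothetical helper: one finding per consumer line that still uses a removed field.
function findStaleFieldReferences(removedFields, consumerSources, schemaFile) {
  const findings = [];
  for (const [file, source] of Object.entries(consumerSources)) {
    source.split('\n').forEach((text, index) => {
      for (const field of removedFields) {
        if (!referencePatterns(field).some((re) => re.test(text))) continue;
        findings.push({
          id: `finding-${String(findings.length + 1).padStart(3, '0')}`,
          file,
          line: index + 1,
          type: 'schema-drift',
          severity: 'high',
          category: 'schema-drift',
          issue: `Consumer references field '${field}' which was removed/renamed in ${schemaFile}`,
          recommendation: 'Update reference to use new field name / remove reference',
          autoFixable: true,
          agent: 'schema-drift',
        });
      }
    });
  }
  return findings;
}

const findings = findStaleFieldReferences(
  ['fullName'],
  {
    'src/profile.ts': 'const { fullName } = user;\nconsole.log(user.email);',
    'src/api.ts': "return { fullName: row.name };\nconst label = 'fullName';",
  },
  'prisma/schema.prisma',
);
console.log(findings.map((f) => `${f.file}:${f.line}`));
// [ 'src/profile.ts:1', 'src/api.ts:1', 'src/api.ts:2' ]
```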
484
+
456
485
  ### Project-Rules Agents (Auto-Generated from decisions.md)
457
486
 
458
487
  When `config.review.agents.projectRules` is `true`, additional agents are **automatically generated** from project rules:
@@ -305,6 +305,45 @@ Test framework auto-detected from package.json: jest, vitest, mocha, tap, or fal
305
305
  4. If failing: debug, fix, retry (max 5 attempts)
306
306
  5. Mark completed only when verification passes
307
307
 
308
+ ### Step 3.05: Sprint-Based Context Reset (L1+ tasks with 5+ criteria)
309
+
310
+ **Activates when**: `config.sprintReset.enabled` (default: true) AND the task has at least `config.sprintReset.minTaskCriteria` acceptance criteria (default: 5) AND the number of completed criteria is a multiple of `config.sprintReset.criteriaPerSprint` (default: 3).
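With the defaults, a reset fires after every third completed criterion. It does not fire after the last one, because the procedure requires at least one remaining criterion. The sketch below shows that arithmetic and assumes criteria are counted from 1. `sprintBoundaries` is a hypothetical helper.

```javascript
// Hypothetical helper: completed-criterion counts after which a sprint reset fires.
// A boundary is a count N with N % criteriaPerSprint === 0 and N < totalCriteria
// (so at least one criterion remains).
function sprintBoundaries(totalCriteria, criteriaPerSprint = 3, minTaskCriteria = 5) {
  if (totalCriteria < minTaskCriteria || criteriaPerSprint < 1) return [];
  const boundaries = [];
  for (let n = criteriaPerSprint; n < totalCriteria; n += criteriaPerSprint) {
    boundaries.push(n);
  }
  return boundaries;
}

console.log(sprintBoundaries(8)); // [ 3, 6 ]
console.log(sprintBoundaries(6)); // [ 3 ]  (no reset after the final criterion)
console.log(sprintBoundaries(4)); // []     (below the 5-criteria threshold)
```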
311
+
312
+ **The problem this solves**: For large tasks, context fills with implementation details from early criteria. By criterion 6+, the AI is working with degraded context — old diffs, stale tool results, and exploration artifacts crowd out what matters for the current criterion. The Anthropic harness design research found that full context resets with structured file-based handoffs produce higher quality output than continuous context for long-running tasks.
313
+
314
+ **Procedure** (runs automatically at sprint boundaries):
315
+
316
+ 1. After completing criterion N (where N % `criteriaPerSprint` === 0 AND remaining criteria > 0):
317
+ 2. **Commit progress**: `git add -A && git commit -m "sprint: criteria 1-N of M complete"`
318
+ 3. **Save sprint checkpoint** to `.workflow/state/task-checkpoint.json`:
319
+ - Task ID, spec path, completed criteria indices, changed files, remaining criteria
320
+ 4. **Output sprint summary** (visible to user):
321
+ ```
322
+ ━━━ SPRINT BOUNDARY ━━━
323
+ Completed criteria 1-N of M. Committing and resetting context.
324
+ Remaining: criteria (N+1)-M
325
+ ```
326
+ 5. **Compact context** — this triggers a full compaction. The PostCompact hook restores:
327
+ - Active task ID and spec reference
328
+ - Which criteria are done vs pending (from checkpoint)
329
+ - Changed files list
330
+ 6. **Resume from checkpoint** — read the spec fresh, skip completed criteria, continue with criterion N+1
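Steps 3, 4, and 6 depend on a checkpoint that a fresh context can read without the conversation history. The sketch below is minimal and rests on two assumptions. The field names are invented, since the step lists the checkpoint's contents but gives no schema. Criteria are identified by 1-based index. `saveSprintCheckpoint`, `nextCriterion`, and `sprintSummary` are hypothetical helpers. The git commit and the PostCompact restore are not shown.

```javascript
const fs = require('fs');
const path = require('path');

function checkpointPath(projectRoot) {
  return path.join(projectRoot, '.workflow', 'state', 'task-checkpoint.json');
}

// Hypothetical writer; field names are assumptions.
function saveSprintCheckpoint(projectRoot, { taskId, specPath, totalCriteria, completedCount, changedFiles }) {
  const checkpoint = {
    taskId,
    specPath,
    completedCriteria: Array.from({ length: completedCount }, (_, i) => i + 1),
    remainingCriteria: Array.from(
      { length: totalCriteria - completedCount },
      (_, i) => completedCount + i + 1,
    ),
    changedFiles,
    savedAt: new Date().toISOString(),
  };
  const file = checkpointPath(projectRoot);
  fs.mkdirSync(path.dirname(file), { recursive: true });
  fs.writeFileSync(file, JSON.stringify(checkpoint, null, 2));
  return file;
}

// On resume: skip completed criteria and continue with the first remaining one.
function nextCriterion(projectRoot) {
  const { remainingCriteria } = JSON.parse(fs.readFileSync(checkpointPath(projectRoot), 'utf8'));
  return remainingCriteria.length > 0 ? remainingCriteria[0] : null;
}

// Text for the sprint summary shown in step 4.
function sprintSummary(completed, total) {
  return [
    '━━━ SPRINT BOUNDARY ━━━',
    `Completed criteria 1-${completed} of ${total}. Committing and resetting context.`,
    `Remaining: criteria ${completed + 1}-${total}`,
  ].join('\n');
}
```

Keeping the checkpoint as plain JSON means the resumed sprint re-reads the spec file itself and only takes bookkeeping (which criteria remain, which files changed) from the checkpoint.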
331
+
332
+ **Why this is different from normal compaction**: Normal compaction summarizes the conversation. Sprint reset goes further — it commits work, saves a structured checkpoint, and compacts. The next sprint starts with a clean slate + the checkpoint file, not a compressed summary of everything that happened. The AI reads the spec fresh rather than relying on a summarized memory of it.
333
+
334
+ **Configuration**:
335
+ ```json
336
+ {
337
+ "sprintReset": {
338
+ "enabled": true,
339
+ "criteriaPerSprint": 3,
340
+ "minTaskCriteria": 5
341
+ }
342
+ }
343
+ ```
344
+
345
+ **Skip when**: the task has fewer than `sprintReset.minTaskCriteria` criteria (default: 5), TDD mode is active (TDD has its own rhythm), or `sprintReset.enabled` is false.
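The configuration block and the skip conditions combine into a single applicability check. The sketch below is minimal and makes two assumptions: user settings are shallow-merged over the defaults shown, and the caller already knows whether TDD mode is active. `shouldUseSprintReset` is hypothetical.

```javascript
// Defaults from the configuration block above.
const SPRINT_RESET_DEFAULTS = { enabled: true, criteriaPerSprint: 3, minTaskCriteria: 5 };

// Hypothetical applicability check combining config with the skip rules.
function shouldUseSprintReset(config, { criteriaCount, tddActive }) {
  const settings = { ...SPRINT_RESET_DEFAULTS, ...(config?.sprintReset ?? {}) };
  if (!settings.enabled) return false; // sprintReset.enabled is false
  if (tddActive) return false; // TDD has its own rhythm
  return criteriaCount >= settings.minTaskCriteria; // too few criteria -> skip
}
```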
346
+
308
347
  ### Step 3.5: Criteria Completion Verification (MANDATORY)
309
348
 
310
349
  After implementing all scenarios, BEFORE quality gates:
@@ -416,6 +455,91 @@ After implementing all scenarios, BEFORE quality gates:
416
455
 
417
456
  **Skip conditions**: Tasks that target a specific file or a small known set (e.g., "remove the mock import in Dashboard.tsx") don't need the full inventory — they're scoped enough already. The inventory is for "all X" / "every X" / "clean up X everywhere" tasks.
418
457
 
458
+ ### Step 3.56: Skeptical Evaluator Gate (L2+ tasks, when `config.skepticalEvaluator.enabled`)
459
+
460
+ **The problem this solves**: The same agent that wrote the code verifies its own work in Step 3.5. Anthropic's harness design research found that "separating the agent doing the work from the agent judging it proves to be a strong lever" and that "tuning standalone evaluators toward skepticism is far more tractable than making a generator critical of its own work." This is "confident praise bias" — the implementer always thinks it did a good job.
461
+
462
+ **Activates when**: `config.skepticalEvaluator.enabled` (default: true) AND the task is L2 or larger in scope. L3 trivial tasks are skipped while `skipForL3` is true.
463
+
464
+ **Procedure**:
465
+
466
+ 1. **Spawn a skeptical evaluator sub-agent** (separate from the implementation agent):
467
+ ```
468
+ Agent({
469
+ subagent_type: "code-reviewer",
470
+ model: "sonnet", // Use a different model for diversity
471
+ prompt: <see below>
472
+ })
473
+ ```
474
+
475
+ 2. **Evaluator prompt** (tuned toward skepticism):
476
+ ```
477
+ You are a SKEPTICAL code evaluator. Your job is to find problems, not praise.
478
+ Assume the implementation has gaps until proven otherwise.
479
+
480
+ ## Task Specification
481
+ <read and paste the spec from .workflow/specs/wf-XXXXXXXX.md>
482
+
483
+ ## Implementation Diff
484
+ <git diff of all changed files>
485
+
486
+ ## Your Job
487
+
488
+ For EACH acceptance criterion in the spec:
489
+ 1. Read the criterion carefully
490
+ 2. Find the EXACT code that implements it (cite file:line)
491
+ 3. Grade: PASS (fully works), PARTIAL (code exists but incomplete), FAIL (not implemented)
492
+ 4. If PARTIAL or FAIL: explain exactly what's missing
493
+
494
+ IMPORTANT: "Code exists" is NOT the same as "criterion is met."
495
+ A service that exists but is never called = FAIL.
496
+ A component that renders but doesn't handle the specified edge case = PARTIAL.
497
+ Only grade PASS when the criterion is FULLY satisfied end-to-end.
498
+
499
+ ## Output Format
500
+ Return JSON:
501
+ {
502
+ "criteria": [
503
+ { "criterion": "...", "grade": "PASS|PARTIAL|FAIL", "evidence": "file:line", "issue": "..." }
504
+ ],
505
+ "overallPass": true/false,
506
+ "criticalIssues": ["..."]
507
+ }
508
+ ```
509
+
510
+ 3. **Process evaluator results**:
511
+ - If `overallPass: true` → proceed to Step 3.6
512
+ - If `overallPass: false` → **iteration loop** (see below)
513
+
514
+ 4. **Generator-Evaluator Iteration Loop** (when evaluator finds issues):
515
+ - Feed the evaluator's `criticalIssues` and failed criteria back to the implementation context
516
+ - Fix the identified issues (targeted fixes, not re-implementation)
517
+ - Re-run the evaluator on the updated diff
518
+ - **Max iterations**: `config.skepticalEvaluator.maxIterations` (default: 3)
519
+ - If still failing after max iterations → proceed to Step 3.6 anyway, but **flag the unresolved issues** in the completion report
520
+
521
+ 5. **Calibration** (when `config.skepticalEvaluator.calibration` is true):
522
+ - Before spawning the evaluator, check `.workflow/state/eval-calibration.json` for calibration examples
523
+ - If examples exist, inject 2-3 into the evaluator prompt as few-shot examples:
524
+ - One high-scoring example (what a PASS looks like)
525
+ - One low-scoring example (what a FAIL looks like)
526
+ - This prevents score drift — the evaluator is anchored to concrete examples
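Steps 3 and 4 form a bounded generator-evaluator loop. The sketch below is minimal and rests on three assumptions. `evaluate` and `fix` are async callbacks wrapping the evaluator sub-agent and the implementation context; both are hypothetical, and the actual spawning goes through the `Agent(...)` call in step 1. `evaluate` resolves to the JSON shape from the evaluator prompt. Each fix-and-re-evaluate round counts as one iteration; the spec does not say whether the first evaluation counts.

```javascript
// Hypothetical driver for the generator-evaluator iteration loop.
// evaluate(): resolves to { criteria: [...], overallPass, criticalIssues }
// fix(feedback): applies targeted fixes for the reported problems
async function runEvaluatorLoop(evaluate, fix, maxIterations = 3) {
  let result = await evaluate();
  let iterations = 0;

  while (!result.overallPass && iterations < maxIterations) {
    // Feed back critical issues plus every criterion not graded PASS.
    const failedCriteria = result.criteria.filter((c) => c.grade !== 'PASS');
    await fix({ criticalIssues: result.criticalIssues, failedCriteria });
    iterations += 1;
    result = await evaluate(); // re-run on the updated diff
  }

  return {
    passed: result.overallPass,
    iterations,
    // Anything still unresolved gets flagged in the completion report.
    unresolved: result.overallPass
      ? []
      : result.criteria.filter((c) => c.grade !== 'PASS'),
  };
}
```

Returning `unresolved` rather than throwing matches the rule that the workflow proceeds to Step 3.6 even when the evaluator is still unsatisfied.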
527
+
528
+ **Configuration**:
529
+ ```json
530
+ {
531
+ "skepticalEvaluator": {
532
+ "enabled": true,
533
+ "maxIterations": 3,
534
+ "model": "sonnet",
535
+ "calibration": true,
536
+ "skipForL3": true
537
+ }
538
+ }
539
+ ```
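With this configuration, deciding whether to spawn the evaluator is a small predicate over the task level. The sketch below makes two assumptions. Levels are strings such as `"L2"`, where a smaller number means a larger task; this is inferred from L3 being the trivial tier. User settings are shallow-merged over the defaults shown. `shouldRunSkepticalEvaluator` is hypothetical.

```javascript
// Defaults from the configuration block above.
const EVALUATOR_DEFAULTS = {
  enabled: true,
  maxIterations: 3,
  model: 'sonnet',
  calibration: true,
  skipForL3: true,
};

// Hypothetical gate; assumes "L0".."L3" with L3 as the trivial tier.
function shouldRunSkepticalEvaluator(config, taskLevel) {
  const settings = { ...EVALUATOR_DEFAULTS, ...(config?.skepticalEvaluator ?? {}) };
  if (!settings.enabled) return false;
  const level = Number.parseInt(String(taskLevel).replace(/^L/i, ''), 10);
  if (Number.isNaN(level)) return false; // unknown level: do not spawn
  if (level >= 3 && settings.skipForL3) return false;
  return true;
}
```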
540
+
541
+ **Why this works**: The evaluator has NO emotional investment in the code. It reads the spec and the diff cold. It's explicitly prompted to be skeptical. And because it's a separate sub-agent, it has a fresh context — no accumulated "I already know this works" bias from the implementation phase.
542
+
419
543
  ### Step 3.6: Integration Wiring Validation (MANDATORY)
420
544
 
421
545
  Run `node node_modules/wogiflow/scripts/flow-wiring-verifier.js wf-XXXXXXXX`
@@ -71,6 +71,7 @@ flow parallel check # See available parallel tasks
71
71
  | 2.0.0+ | 2.1.76+ | PostCompact hook, Elicitation/ElicitationResult events, deferred tool schema fix |
72
72
  | 2.1.0+ | 2.1.77+ | PreToolUse allow/deny separation, 128k output tokens, worktree sparse checkout, compaction circuit breaker |
73
73
  | 2.4.0+ | 2.1.83+ | managed-settings.d/, CwdChanged/FileChanged hooks, ENV_SCRUB, --channels limitations, MEMORY.md 25KB cap |
74
+ | 2.5.0+ | 2.1.84+ | TaskCreated hook, YAML glob lists in rules, CLAUDE_STREAM_IDLE_TIMEOUT_MS, WorktreeCreate HTTP transport, idle-return prompt, MCP 2KB cap |
74
75
 
75
76
  ### Environment Variables (2.1.19+)
76
77
 
@@ -163,6 +164,7 @@ await cancelTask('wf-123', 'superseded', false);
163
164
  | Stop | stop.js | Session cleanup |
164
165
  | SessionEnd | session-end.js | Request logging, progress update |
165
166
  | TaskCompleted | task-completed.js | Move task to recentlyCompleted |
167
+ | TaskCreated | task-created.js | Link native tasks to active WogiFlow task (2.1.84+) |
166
168
  | ConfigChange | config-change.js | Re-sync bridge on mid-session config changes |
167
169
  | InstructionsLoaded | instructions-loaded.js | Package check, rule conflicts, auto-onboard |
168
170
  | PostCompact | post-compact.js | Re-inject state after context compaction (2.1.76+) |
@@ -263,6 +265,28 @@ await cancelTask('wf-123', 'superseded', false);
263
265
 
264
266
  - **Uninstalled plugin hooks fix**: Fixed uninstalled plugin hooks continuing to fire until the next session. Improves hook hygiene for WogiFlow plugin management.
265
267
 
268
+ ### Features in 2.1.84+
269
+
270
+ - **TaskCreated hook event**: New hook event fired when a task is created via TaskCreate. WogiFlow uses this to link native Claude Code tasks to the active WogiFlow task in `session-state.json`, enabling cross-system task tracking. Implemented in `scripts/hooks/core/task-created.js`.
271
+
272
+ - **YAML glob lists in rules/skills frontmatter**: Rules and skills `globs:` field now accepts YAML lists in addition to single strings. WogiFlow's `flow-rules-sync.js` currently generates single-string globs with brace expansion (`"**/*.{js,ts}"`). This opens the door to cleaner multi-pattern rules without brace expansion hacks. No immediate code change — tracked as improvement.
273
+
274
+ - **CLAUDE_STREAM_IDLE_TIMEOUT_MS**: New env var to configure the streaming idle watchdog threshold (default 90s). WogiFlow's explore phase launches 5-6 parallel agents — if an agent takes >90s without streaming output, the watchdog may kill the connection. Users experiencing timeouts during explore should set this higher (e.g., `CLAUDE_STREAM_IDLE_TIMEOUT_MS=180000` for 3 minutes).
275
+
276
+ - **WorktreeCreate hook HTTP transport**: WorktreeCreate now supports `type: "http"` — return the created worktree path via `hookSpecificOutput.worktreePath`. WogiFlow continues to use command transport locally. HTTP transport enables wogiflow-cloud to receive worktree events server-side for team task tracking. Listed in `UNUSED_SUPPORTED_EVENTS` as a cloud opportunity.
277
+
278
+ - **Idle-return prompt**: Users returning after 75+ minutes are nudged to `/clear`. WogiFlow's PostCompact hook handles `/clear` correctly — it fires on compaction, re-injects state (active task, workflow phase, durable session progress), and re-arms routing. Session restore tested and working via the same PostCompact pathway.
279
+
280
+ - **MCP tool descriptions capped at 2KB**: MCP tool descriptions and server instructions now capped at 2KB to prevent OpenAPI-generated servers from bloating context. WogiFlow's plugin system registers MCP servers — plugins with verbose OpenAPI specs may have descriptions silently truncated. Plugin docs should note this limit.
281
+
282
+ - **System-prompt caching with ToolSearch**: Global system-prompt caching now works when ToolSearch is enabled. WogiFlow sessions use ToolSearch for deferred MCP tools — this reduces input token costs automatically. No code change needed.
283
+
284
+ - **Subagent JSON-schema fix**: Fixed workflow subagents failing with API 400 when the outer session uses `--json-schema` and the subagent also specifies a schema. Improves reliability of WogiFlow explore agents in structured-output sessions.
285
+
286
+ - **allowedChannelPlugins managed setting**: Enterprise admins can define a channel plugin allowlist. Relevant for wogiflow-cloud teams product — team admins could control which wogi plugins are allowed across the team. Tracked as cloud opportunity.
287
+
288
+ - **ANTHROPIC_DEFAULT_{OPUS,SONNET,HAIKU}_MODEL_SUPPORTS**: New env vars to override effort/thinking capability detection for pinned default models on Bedrock/Vertex/Foundry. WogiFlow's hybrid mode routes to different models, so users on these third-party (3P) platforms who pin models can now declare their capabilities properly.
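The TaskCreated entry above links native Claude Code tasks to the active WogiFlow task in `session-state.json`. The sketch below shows that linking step as a pure function, which keeps it cheap enough for the hook's short timeout. All field names are hypothetical, on both sides: `activeTask`, `linkedNativeTasks`, and the native task's `id` and `subject`. Neither the actual payload Claude Code passes to the hook nor the real `task-created.js` logic is shown.

```javascript
// Hypothetical linker: records a native task against the active WogiFlow task.
// Field names on both objects are assumptions, not the real schemas.
function linkNativeTask(sessionState, nativeTask, now = new Date()) {
  const activeTaskId = sessionState?.activeTask?.id;
  if (!activeTaskId || !nativeTask?.id) {
    return { state: sessionState, linked: false }; // nothing to link to
  }

  const existing = sessionState.linkedNativeTasks ?? [];
  if (existing.some((link) => link.nativeTaskId === nativeTask.id)) {
    return { state: sessionState, linked: false }; // avoid duplicate links
  }

  const link = {
    nativeTaskId: nativeTask.id,
    subject: nativeTask.subject ?? '',
    wogiTaskId: activeTaskId,
    linkedAt: now.toISOString(),
  };
  // Return a new state object; the caller decides when to write it to disk.
  return {
    state: { ...sessionState, linkedNativeTasks: [...existing, link] },
    linked: true,
  };
}
```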
289
+
266
290
  ### Simple Mode Naming Distinction
267
291
 
268
292
  Claude Code's `CLAUDE_CODE_SIMPLE` environment variable (which enables a simplified tool set) is **unrelated** to WogiFlow's `loops.simpleMode` (a lightweight task completion loop using string detection). They are separate features that happen to share the word "simple":
@@ -111,12 +111,23 @@ Planned files: [FILES_TO_CHANGE]
111
111
  4. If a memory database exists (.workflow/memory/local.db or via MCP):
112
112
  - Query for rejected approaches from past tasks touching the same files
113
113
  - Surface any "approach X was tried and failed" warnings
114
+ 5. **Eval trend analysis** (NEW — from Anthropic harness design research):
115
+ - Read `.workflow/evals/` directory for the last 5-10 eval results
116
+ - Calculate average score per dimension (completeness, accuracy, workflowCompliance, tokenEfficiency, quality)
117
+ - If any dimension averages below 6/10 across recent evals:
118
+ - Flag it as a RECURRING WEAKNESS
119
+ - Suggest a mitigation for the spec (e.g., "tokenEfficiency averaging 4/10 → add context budgeting hints")
120
+ - If eval calibration exists (`.workflow/state/eval-calibration.json`):
121
+ - Compare the current task type against high/low calibration examples
122
+ - Warn if this task type historically scores low
114
123
 
115
124
  Return:
116
125
  - Known risks for this task type (from feedback-patterns)
117
126
  - Past corrections in this area (from corrections/)
118
127
  - Promoted rules that apply (from decisions.md, count >= 3)
119
128
  - Rejected approaches from similar past work (from memory-db)
129
+ - **Eval trend warnings** (dimensions scoring below 6/10 in recent evals)
130
+ - **Recommended spec hints** (based on eval trends — inject into spec generation)
120
131
  - Confidence: HIGH (many data points) / MEDIUM / LOW (no history)
121
132
  ```
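Item 5 of the prompt (eval trend analysis) is an average-and-threshold computation. The sketch below is minimal and assumes each eval result has already been parsed from `.workflow/evals/`. It further assumes each result carries a numeric `scores` map on a 0 to 10 scale, keyed by the five dimensions named above; the file format is an assumption. Results are taken to be ordered oldest to newest. `evalTrendWarnings` is hypothetical.

```javascript
const DIMENSIONS = ['completeness', 'accuracy', 'workflowCompliance', 'tokenEfficiency', 'quality'];

// Hypothetical trend check over the most recent eval results.
// evals: [{ scores: { completeness: 7, ... } }, ...] ordered oldest -> newest.
function evalTrendWarnings(evals, { window = 10, threshold = 6 } = {}) {
  const recent = evals.slice(-window);
  const warnings = [];
  for (const dim of DIMENSIONS) {
    const values = recent
      .map((e) => e?.scores?.[dim])
      .filter((v) => typeof v === 'number');
    if (values.length === 0) continue; // no data for this dimension
    const average = values.reduce((sum, v) => sum + v, 0) / values.length;
    if (average < threshold) {
      warnings.push({ dimension: dim, average: Number(average.toFixed(1)), label: 'RECURRING WEAKNESS' });
    }
  }
  return warnings;
}
```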
122
133
 
@@ -153,11 +164,12 @@ Return:
153
164
  - Security patterns that apply
154
165
  ```
155
166
 
156
- ## Agent 6: Consumer Impact Analyzer (Refactor/Migration Only)
167
+ ## Agent 6: Consumer Impact Analyzer (Refactor/Migration/Schema Changes)
157
168
 
158
- Launch as `Agent(subagent_type=Explore)` (local only). **MANDATORY for refactor, migration, architecture tasks.**
169
+ Launch as `Agent(subagent_type=Explore)` (local only). **MANDATORY for refactor, migration, architecture tasks AND any task that modifies schema/model files.**
159
170
 
160
171
  Trigger keywords: refactor, replace, rename, restructure, extract, consolidate, deprecate, migrate, move, reorganize.
172
+ Trigger files: `*.prisma`, `*.entity.ts`, `*.model.ts`, `*.schema.ts`, files listed in schema-map.md.
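Agent 6 runs when either a trigger keyword appears in the task or a trigger file is among the planned changes. The sketch below is minimal and assumes the task title is the text searched and the planned file paths are available as an array. `needsConsumerImpactAnalysis` is hypothetical. Its word-start matching, which also accepts inflections such as "renaming" or "refactored", is an assumption that goes beyond the listed keywords and can produce false positives.

```javascript
const TRIGGER_KEYWORDS = ['refactor', 'replace', 'rename', 'restructure', 'extract',
  'consolidate', 'deprecate', 'migrate', 'move', 'reorganize'];
const TRIGGER_SUFFIXES = ['.prisma', '.entity.ts', '.model.ts', '.schema.ts'];

// Hypothetical trigger check for the Consumer Impact Analyzer.
function needsConsumerImpactAnalysis(taskTitle, plannedFiles = [], schemaMapPaths = []) {
  // Word-start, case-insensitive match (heuristic).
  const keywordHit = TRIGGER_KEYWORDS.some((kw) =>
    new RegExp(`\\b${kw}`, 'i').test(taskTitle));
  const fileHit = plannedFiles.some((file) =>
    TRIGGER_SUFFIXES.some((s) => file.endsWith(s)) || schemaMapPaths.includes(file));
  return keywordHit || fileHit;
}
```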
161
173
 
162
174
  ```
163
175
  Analyze consumer impact for task: "[TASK_TITLE]"
@@ -172,10 +184,15 @@ You MUST map all consumers before changes proceed.
172
184
  c. Grep for ALL config files that reference it
173
185
  d. Grep for ALL documentation (.md) that reference it
174
186
  e. Grep for ALL test files that import or mock it
187
+ f. For schema/model files: grep for FIELD-LEVEL references — property accesses
188
+ (obj.fieldName), destructuring ({ fieldName }), object keys (fieldName:),
189
+ and string literals ('fieldName'). Report which specific fields are referenced
190
+ by which consumers. This catches drift that module-level import checks miss.
175
191
 
176
192
  2. For EACH consumer, classify impact:
177
193
  - BREAKING (import/API changes) — describe what breaks + migration path
178
194
  - NEEDS-UPDATE (behavior change) — describe expected behavioral change
195
+ - SCHEMA-DRIFT (field removed/renamed but consumer still references old name)
179
196
  - SAFE (no change needed)
180
197
 
181
198
  3. Check indirect consumers (up to 3 levels deep)
@@ -132,6 +132,17 @@
132
132
  }
133
133
  ]
134
134
  }
135
+ ],
136
+ "TaskCreated": [
137
+ {
138
+ "hooks": [
139
+ {
140
+ "type": "command",
141
+ "command": "node scripts/hooks/entry/claude-code/task-created.js",
142
+ "timeout": 5
143
+ }
144
+ ]
145
+ }
135
146
  ]
136
147
  },
137
148
  "_wogiFlowManaged": true,
package/bin/flow CHANGED
@@ -23,7 +23,7 @@ const packageJson = require('../package.json');
23
23
  const VERSION = packageJson.version;
24
24
 
25
25
  // Global commands that don't require a project context
26
- const GLOBAL_COMMANDS = ['init', 'upgrade', 'version', '--version', '-v', '--help', '-h', 'skill', 'channel', 'login', 'logout'];
26
+ const GLOBAL_COMMANDS = ['init', 'upgrade', 'version', '--version', '-v', '--help', '-h', 'skill', 'channel', 'login', 'logout', 'workspace'];
27
27
 
28
28
  /**
29
29
  * Find the project root by looking for .workflow directory
@@ -97,6 +97,7 @@ Usage: flow <command> [options]
97
97
 
98
98
  Global Commands:
99
99
  init Initialize Wogi Flow in a new project
100
+ workspace init Initialize a multi-repo workspace
100
101
  upgrade Upgrade an existing project to latest version
101
102
  login Connect to WogiFlow Teams
102
103
  logout Disconnect from WogiFlow Teams
@@ -208,6 +209,15 @@ function main() {
208
209
  return;
209
210
  }
210
211
 
212
+ if (command === 'workspace') {
213
+ const { workspace } = require('../lib/workspace');
214
+ workspace(args.slice(1)).catch(err => {
215
+ console.error(`Workspace error: ${err.message}`);
216
+ process.exit(1);
217
+ });
218
+ return;
219
+ }
220
+
211
221
  // For all other commands, try to find project context
212
222
  const projectRoot = findProjectRoot();
213
223