wogiflow 2.15.0 → 2.16.0

Files changed (38)
  1. package/.claude/commands/wogi-challenge.md +4 -4
  2. package/.claude/commands/wogi-gate-stats.md +1 -1
  3. package/.claude/commands/wogi-start-continuation.md +10 -10
  4. package/.claude/commands/wogi-start.md +2 -0
  5. package/.claude/docs/intent-grounded-reasoning.md +1 -1
  6. package/.claude/docs/knowledge-base/02-task-execution/02-execution-loop.md +8 -0
  7. package/.claude/docs/knowledge-base/02-task-execution/03-verification.md +110 -10
  8. package/.claude/docs/knowledge-base/02-task-execution/README.md +10 -0
  9. package/.claude/docs/knowledge-base/02-task-execution/decision-authority.md +110 -0
  10. package/.claude/docs/knowledge-base/02-task-execution/workspace-mode.md +176 -0
  11. package/.claude/docs/knowledge-base/04-memory-context/context-management.md +40 -0
  12. package/.claude/docs/knowledge-base/04-memory-context/memory-systems.md +12 -1
  13. package/.claude/docs/knowledge-base/06-safety-guardrails/README.md +1 -0
  14. package/.claude/docs/knowledge-base/06-safety-guardrails/mechanical-gates.md +150 -0
  15. package/.claude/docs/knowledge-base/wogiflow-enterprise-showcase.md +423 -0
  16. package/.claude/docs/phases/02-spec.md +2 -2
  17. package/.claude/docs/phases/04-verify.md +1 -1
  18. package/.workflow/agents/logic-adversary.md +7 -2
  19. package/.workflow/templates/claude-md.hbs +27 -0
  20. package/lib/wogi-claude +87 -0
  21. package/package.json +3 -2
  22. package/scripts/flow-architect-pass.js +3 -3
  23. package/scripts/flow-config-defaults.js +51 -0
  24. package/scripts/flow-constants.js +3 -1
  25. package/scripts/flow-correct.js +1 -0
  26. package/scripts/flow-done.js +16 -0
  27. package/scripts/flow-hook-status.js +6 -2
  28. package/scripts/flow-logic-adversary.js +4 -4
  29. package/scripts/flow-migrate-igr.js +1 -1
  30. package/scripts/hooks/core/phase-read-gate.js +52 -9
  31. package/scripts/hooks/core/post-compact.js +18 -0
  32. package/scripts/hooks/core/session-context.js +26 -0
  33. package/scripts/hooks/core/session-end.js +10 -0
  34. package/scripts/hooks/core/session-history.js +116 -0
  35. package/scripts/hooks/core/task-boundary-reset.js +249 -0
  36. package/scripts/hooks/core/task-completed.js +35 -0
  37. package/scripts/hooks/entry/claude-code/pre-tool-use.js +10 -5
  38. package/scripts/hooks/entry/claude-code/stop.js +63 -0
@@ -1,5 +1,5 @@
1
1
  ---
2
- description: "Manual trigger of the IGR Logic Adversary — critique a plan against the Logic Constitution v1 rubric."
2
+ description: "Manual trigger of the IGR Logic Adversary — critique a plan against the Logic Constitution v2 rubric (11 principles, including Platform Capability Grounding)."
3
3
  effort: medium
4
4
  ---
5
5
 
@@ -17,13 +17,13 @@ Story: `wf-b00262b1` (IGR)
17
17
  /wogi-challenge wf-XXXXXXXX
18
18
 
19
19
  # Critique with an explicit rubric version
20
- /wogi-challenge path/to/plan.md --rubric=logic-constitution-v1
20
+ /wogi-challenge path/to/plan.md --rubric=logic-constitution-v2
21
21
  ```
22
22
 
23
23
  ## What it does
24
24
 
25
25
  1. Loads the plan (either from a file path or from `.workflow/plans/{taskId}.md`).
26
- 2. Calls `scripts/flow-logic-adversary.js buildAdversaryPrompt` to assemble the critique prompt — includes the 10-principle Logic Constitution, few-shot calibration examples, and all available intent artifacts.
26
+ 2. Calls `scripts/flow-logic-adversary.js buildAdversaryPrompt` to assemble the critique prompt — includes the 11-principle Logic Constitution v2 (Principle 11 is Platform Capability Grounding), few-shot calibration examples, and all available intent artifacts.
27
27
  3. Spawns a sub-agent via the Agent tool on a different model than this session when possible (Sonnet when you're on Opus; Opus when you're on Sonnet) — the model-separation rule per the approved spec.
28
28
  4. Parses the returned JSON verdict against the rubric schema.
29
29
  5. Records a telemetry event (`gateId: logic-adversary`) with the verdict.
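Reviewer sketch of steps 3 and 5 above, under stated assumptions. The names `pickAdversaryModel` and `buildTelemetryEvent`, the fallback model, the `'pass'` verdict value, and every record field other than `gateId` are hypothetical; the real logic lives in `scripts/flow-logic-adversary.js`, which this hunk does not show.

```javascript
// Hypothetical sketch; not the code in scripts/flow-logic-adversary.js.

// Step 3: choose a model different from the current session's when possible.
function pickAdversaryModel(sessionModel) {
  const model = String(sessionModel).toLowerCase();
  if (model.includes('opus')) return 'sonnet';
  if (model.includes('sonnet')) return 'opus';
  // Unknown session model: separation cannot be guaranteed (assumed fallback).
  return 'sonnet';
}

// Step 5: build one telemetry record. Only `gateId` comes from the docs;
// the remaining field names are assumptions for illustration.
function buildTelemetryEvent(taskId, verdict, rubric) {
  return {
    gateId: 'logic-adversary',
    taskId,
    rubric,
    verdict,
    timestamp: new Date().toISOString(),
  };
}

const event = buildTelemetryEvent('wf-XXXXXXXX', 'pass', 'logic-constitution-v2');
console.log(pickAdversaryModel('opus'), JSON.stringify(event));
```

One such record per line would fit the JSON Lines file named later in this command doc, though the actual record schema may differ.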
@@ -46,7 +46,7 @@ Story: `wf-b00262b1` (IGR)
46
46
  ## Under the hood
47
47
 
48
48
  - Script: `scripts/flow-logic-adversary.js`
49
- - Rubric: `.workflow/rubrics/logic-constitution-v1.md`
49
+ - Rubric: `.workflow/rubrics/logic-constitution-v2.md` (v1 retained for historical telemetry)
50
50
  - Persona: `.workflow/agents/logic-adversary.md`
51
51
  - Calibration: `.workflow/state/adversary-calibration.json`
52
52
  - Telemetry: `gateId: logic-adversary` in `.workflow/state/gate-telemetry.jsonl`
@@ -43,7 +43,7 @@ Story: `wf-faf340cf` (IGR Story 0 — Gate Telemetry & Self-Assessment Framework
43
43
  A gate with `pass% = 100%` and `miss% > 10%` is **rubber-stamping**. It's letting things through that you then have to correct. This is the failure mode the owner's QA-98%-parable warned against: 100% coverage that creates false confidence is more dangerous than 70% coverage that triggers a second review.
44
44
 
45
45
  When you see high miss rates:
46
- 1. Tune the rubric (for `logic-adversary`: edit `.workflow/rubrics/logic-constitution-v1.md`)
46
+ 1. Tune the rubric (for `logic-adversary`: edit `.workflow/rubrics/logic-constitution-v2.md`)
47
47
  2. Add calibration examples (for `logic-adversary`: append to `.workflow/state/adversary-calibration.json`)
48
48
  3. Strengthen the gate's blocking behavior (for `completion-truth-gate`: raise `minTierForDone` or set `blockFalseCompletion: true`)
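Reviewer sketch of the rubber-stamping rule stated above (`pass% = 100%` and `miss% > 10%`). The input shape and the sample numbers are illustrative assumptions; how `/wogi-gate-stats` derives the two percentages is not shown in this hunk.

```javascript
// Hypothetical sketch: flag a rubber-stamping gate from aggregated stats.
// `passPct` and `missPct` are percentages in the range 0-100 (assumed shape).
function isRubberStamping({ passPct, missPct }) {
  return passPct === 100 && missPct > 10;
}

// Illustrative numbers only, not real telemetry.
const gates = [
  { gateId: 'logic-adversary', passPct: 100, missPct: 14 },
  { gateId: 'completion-truth-gate', passPct: 72, missPct: 3 },
];

for (const gate of gates) {
  const label = isRubberStamping(gate) ? 'rubber-stamping' : 'ok';
  console.log(`${gate.gateId}: ${label}`);
}
```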
49
49
 
@@ -39,11 +39,11 @@ For each criterion:
39
39
 
40
40
  ### 6.5. Additional Mandatory Gates
41
41
 
42
- **Inventory Verification** (remove/fix/replace-all tasks): Pre/post inventory scan per Step 3.55. Wait for user confirmation.
42
+ **Inventory Verification** (remove/fix/replace-all tasks): Pre/post inventory scan per Step 3.55 in `.claude/docs/phases/04-verify.md`. Wait for user confirmation.
43
43
 
44
- **Item Reconciliation** (3+ item inputs): Enumerate all items, verify each becomes a criterion, reconcile at completion per Step 1.25.
44
+ **Item Reconciliation** (3+ item inputs): Enumerate all items, verify each becomes a criterion, reconcile at completion per Step 1.25 in `.claude/docs/phases/01-explore.md`.
45
45
 
46
- **Scope-Confidence Gate** (L0/L1 only): Extract assumptions, verify against codebase, present UNVERIFIABLE/CONTRADICTED per Step 1.45.
46
+ **Scope-Confidence Gate** (L0/L1 only): Extract assumptions, verify against codebase, present UNVERIFIABLE/CONTRADICTED per Step 1.45 in `.claude/docs/phases/01-explore.md`.
47
47
 
48
48
  ### 7. Verification Gates (ALL MANDATORY)
49
49
 
@@ -78,13 +78,13 @@ At every 3rd criterion: commit progress, save checkpoint to `task-checkpoint.jso
78
78
 
79
79
  Before executing ANY phase, you MUST Read the phase instruction file. The PreToolUse hook BLOCKS Edit/Write/Bash until the phase file is read.
80
80
 
81
- | Phase | File to Read |
82
- |-------|-------------|
83
- | exploring | `.claude/docs/phases/01-explore.md` |
84
- | spec_review | `.claude/docs/phases/02-spec.md` |
85
- | coding | `.claude/docs/phases/03-implement.md` |
86
- | validating | `.claude/docs/phases/04-verify.md` |
87
- | completing | `.claude/docs/phases/05-complete.md` |
81
+ | Phase | File to Read | Contents |
82
+ |-------|-------------|----------|
83
+ | exploring | `.claude/docs/phases/01-explore.md` | Steps 1–1.45: Context, framing, clarifying questions, item reconciliation, multi-agent research, reuse gate, scope-confidence audit |
84
+ | spec_review | `.claude/docs/phases/02-spec.md` | Steps 1.55–2.5: Architect pass, logic adversary, spec generation, approval gate, test generation, TodoWrite, TDD check |
85
+ | coding | `.claude/docs/phases/03-implement.md` | Steps 3–3.52: Execution loop, sprint resets, criteria verification, sub-agent output verification |
86
+ | validating | `.claude/docs/phases/04-verify.md` | Steps 3.55–3.9: Inventory verification, skeptical evaluator, runtime verification, wiring validation, standards compliance, completion truth gate |
87
+ | completing | `.claude/docs/phases/05-complete.md` | Steps 4–5: Quality gates, finalization, progress tracking, mandatory rules |
88
88
 
89
89
  ## Rules
90
90
  - Validate after EVERY file edit
@@ -211,6 +211,8 @@ Before executing ANY phase, you MUST Read the phase instruction file. The PreToo
211
211
 
212
212
  **How it works**: When you transition to a new phase, Read the corresponding file BEFORE using Edit/Write/Bash. The phase-read gate tracks which files you've read and blocks mutation tools until the current phase's file is loaded.
213
213
 
214
+ **Enforcement caveats**: The gate blocks Edit/Write/Bash when all of these hold: (a) phase is non-idle, non-routing, (b) `hooks.rules.phaseReadGate.enabled` is not false, (c) `workflow-phase.json` exists and has a recognized phase, and (d) the required phase file has not been recorded as read. If any condition fails (no phase state, unknown phase, gate disabled, config error), the gate fails open — the tool is allowed through. Read phase files proactively on every phase transition rather than assuming the gate will always catch you.
215
+
214
216
  ## Mandatory Rules
215
217
 
216
218
  - **TodoWrite**: Track progress. Clean up all items after completion.
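Reviewer sketch of the fail-open behavior described in the enforcement caveats added in this hunk. It is a reading of conditions (a) through (d), not the contents of `scripts/hooks/core/phase-read-gate.js`; the function name `shouldBlock`, the `phase` field of `workflow-phase.json`, and the `readFiles` set are assumptions.

```javascript
// Hypothetical sketch; the real gate is scripts/hooks/core/phase-read-gate.js.
const PHASE_FILES = new Map([
  ['exploring', '.claude/docs/phases/01-explore.md'],
  ['spec_review', '.claude/docs/phases/02-spec.md'],
  ['coding', '.claude/docs/phases/03-implement.md'],
  ['validating', '.claude/docs/phases/04-verify.md'],
  ['completing', '.claude/docs/phases/05-complete.md'],
]);

// `phaseState` is the parsed workflow-phase.json (null when the file is missing);
// `readFiles` is a Set of phase files recorded as read. Both shapes are assumed.
function shouldBlock(tool, config, phaseState, readFiles) {
  try {
    if (!['Edit', 'Write', 'Bash'].includes(tool)) return false;
    // (b) gate explicitly disabled
    if (config?.hooks?.rules?.phaseReadGate?.enabled === false) return false;
    // (a) and (c): missing state, idle, or routing phases are never blocked
    const phase = phaseState?.phase;
    if (!phase || phase === 'idle' || phase === 'routing') return false;
    const required = PHASE_FILES.get(phase);
    if (!required) return false; // unknown phase: fail open
    // (d) block only when the required file has not been read
    return !readFiles.has(required);
  } catch {
    return false; // any error: fail open
  }
}

console.log(shouldBlock('Edit', {}, { phase: 'coding' }, new Set()));
```

Every early return allows the tool through, which is why the caveat recommends reading phase files proactively.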
@@ -18,7 +18,7 @@ Research finding: across 1,309 user messages mined, first-pass agent output was
18
18
  | 1 | **Intent Bootstrap** — scaffolds product/domain/glossary/user-journeys artifacts; agnostic trap-zone detector finds structural ambiguities | `scripts/flow-intent-bootstrap.js` + `scripts/flow-trap-zone.js` |
19
19
  | 2 | **Intent Framing Pass** — per-task reasoning step; produces a Framing Artifact resolving ambiguities before any other work | `scripts/flow-intent-framing.js` |
20
20
  | 3 | **Architect Pass** — read-only sub-agent produces an 8-section pre-spec plan | `scripts/flow-architect-pass.js` + persona `.workflow/agents/architect.md` |
21
- | 4 | **Logic Adversary** — separate sub-agent on a different model critiques the plan against the 10-principle Logic Constitution | `scripts/flow-logic-adversary.js` + rubric `.workflow/rubrics/logic-constitution-v1.md` |
21
+ | 4 | **Logic Adversary** — separate sub-agent on a different model critiques the plan against the 11-principle Logic Constitution (v2 adds Principle 11 — Platform Capability Grounding) | `scripts/flow-logic-adversary.js` + rubric `.workflow/rubrics/logic-constitution-v2.md` |
22
22
  | 5 | **Session Correction Memory** — detects user corrections during a session and cross-references back to gates that passed the contradicted work | extensions in `scripts/flow-correction-detector.js` |
23
23
  | 6 | **Completion Truth Gate** — audits "done" claims against Tier 0–4 evidence; downgrades language when evidence is insufficient | `scripts/flow-completion-truth-gate.js` |
24
24
  | 7 | **Pipeline wiring + rollout** — integrates all above into `/wogi-start`, the gate registry, the eval framework | (this story) |
@@ -4,6 +4,14 @@ The execution loop is the core mechanism that ensures task completion. When enab
4
4
 
5
5
  ---
6
6
 
7
+ ## Phase-Loaded Architecture (v2.15+)
8
+
9
+ The pipeline instructions are split into 5 phase files (`.claude/docs/phases/01-05`) loaded on-demand. The phase-read gate (PreToolUse hook) blocks Edit/Write/Bash until the current phase's instruction file is read. This saves ~79% of prompt tokens for conversations and small tasks.
10
+
11
+ See [Context Management](../04-memory-context/context-management.md) for details on the phase architecture and sprint-based context reset.
12
+
13
+ ---
14
+
7
15
  ## Self-Completing Loops
8
16
 
9
17
  **The Problem**: Without enforcement, AI often stops when code "looks done" but hasn't been verified against all acceptance criteria.
@@ -446,6 +446,101 @@ Cross-references spec deliverables against actual `git diff` to catch false "don
446
446
 
447
447
  ---
448
448
 
449
+ ## Skeptical Evaluator (v2.13+)
450
+
451
+ After implementation, a separate sub-agent independently grades every acceptance criterion.
452
+
453
+ **Why**: The same agent that wrote the code verifies its own work — this is "confident praise bias." Anthropic's harness research found that separating the implementer from the evaluator is a strong lever for quality.
454
+
455
+ **How it works**:
456
+ 1. Spawn a code-reviewer sub-agent on a **different model** (e.g., Sonnet evaluates Opus's work)
457
+ 2. Feed it the spec + git diff — it reads the code cold with no implementation context
458
+ 3. For each criterion: grade PASS / PARTIAL / FAIL with file:line evidence
459
+ 4. If issues found → feed back to implementer → fix → re-evaluate (max 3 rounds)
460
+ 5. Calibrated with few-shot examples from `.workflow/state/eval-calibration.json`
461
+
462
+ **Configuration**:
463
+ ```json
464
+ {
465
+ "skepticalEvaluator": {
466
+ "enabled": true,
467
+ "maxIterations": 3,
468
+ "model": "sonnet",
469
+ "calibration": true,
470
+ "skipForL3": true
471
+ }
472
+ }
473
+ ```
474
+
475
+ ---
476
+
477
+ ## Runtime Verification Gate (v2.13+)
478
+
479
+ Auto-generates and runs tests for every task that changes code. ON by default.
480
+
481
+ ### Evidence Tiers
482
+
483
+ | Tier | Name | Counts as Done? |
484
+ |------|------|----------------|
485
+ | 0 | STATIC (compiles, lints) | Never |
486
+ | 1 | STRUCTURAL (file exists, imported) | Never |
487
+ | 2 | OBSERVATIONAL (page loads, renders) | Display-only criteria |
488
+ | 3 | INTERACTIVE (click → result persists) | Yes |
489
+ | 4 | AUTOMATED (test passes) | Yes (strongest) |
490
+
491
+ ### Frontend: Browser tests generated when UI files change
492
+ - **WebMCP** (preferred): Drives actual browser — screenshot before/after, assert DOM, verify persistence after reload
493
+ - **Playwright**: Auto-generates test to `tests/verification/verify-{taskId}.spec.ts`
494
+ - **User Checklist** (fallback): Blocks completion until user replies "verified"
495
+
496
+ ### Backend: API tests generated when API files change
497
+ - HTTP integration tests with status, response shape, and persistence assertions
498
+ - Tests persist in `tests/verification/api-verify-{taskId}.test.js`
499
+
500
+ ### Fullstack: Boundary verification
501
+ - API test verifies server accepts the frontend's payload shape
502
+ - Browser test verifies UI displays the server's response shape
503
+
504
+ ### Repeat Failure Protocol
505
+
506
+ | Strike | Action |
507
+ |--------|--------|
508
+ | 1 | Normal fix |
509
+ | 2 | Mandatory root cause analysis. Must change approach. |
510
+ | 3 | Hard block: evidence required. Must explain what's different. |
511
+ | 4+ | Escalation: suggest pair debugging with developer. |
512
+
513
+ ```json
514
+ {
515
+ "runtimeVerification": {
516
+ "enabled": true,
517
+ "autoGenerateTests": true,
518
+ "persistTests": true,
519
+ "blockOnFailure": true
520
+ }
521
+ }
522
+ ```
523
+
524
+ ---
525
+
526
+ ## Sprint-Based Context Reset (v2.12+)
527
+
528
+ For large tasks (5+ criteria), every 3 criteria: commit progress, save checkpoint, compact context, resume with fresh context reading the spec anew.
529
+
530
+ ```json
531
+ {
532
+ "sprintReset": { "enabled": true, "criteriaPerSprint": 3, "minTaskCriteria": 5 }
533
+ }
534
+ ```
535
+
536
+ ---
537
+
538
+ ## Completion Truth Gate (IGR, v2.13+)
539
+
540
+ When IGR is enabled, audits every "done" claim against evidence tiers. Claims with only Tier 0-1 evidence are downgraded to "implemented (unverified)" and task completion is blocked.
541
+
542
+ ---
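Reviewer sketch of how the evidence tiers and the Completion Truth Gate described above might combine. The function name `auditClaim` and its return shape are hypothetical, and treating Tier 2 as sufficient only for display-only criteria is a reading of the tier table, not confirmed behavior of `scripts/flow-completion-truth-gate.js`.

```javascript
// Hypothetical sketch; not the code in scripts/flow-completion-truth-gate.js.
const TIERS = ['STATIC', 'STRUCTURAL', 'OBSERVATIONAL', 'INTERACTIVE', 'AUTOMATED'];

// `highestTier` is the strongest evidence tier (0-4) gathered for one criterion.
function auditClaim(highestTier, { displayOnly = false } = {}) {
  // Tier 3+ always counts; Tier 2 counts only for display-only criteria (assumed reading).
  const sufficient = highestTier >= 3 || (highestTier === 2 && displayOnly);
  return {
    tier: TIERS[highestTier],
    status: sufficient ? 'done' : 'implemented (unverified)',
    blockCompletion: !sufficient,
  };
}

console.log(auditClaim(1));
console.log(auditClaim(2, { displayOnly: true }));
console.log(auditClaim(4));
```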
543
+
449
544
  ## Verification Flow Summary
450
545
 
451
546
  ```
@@ -461,23 +556,28 @@ Task Completion Attempt
461
556
  │ 2.5 Git-Verified Claim Check │
462
557
  │ - Spec promises match git diff? │
463
558
  ├────────────────────────────────────────────┤
464
- │ 3. Integration Wiring Check │
559
+ │ 3. Skeptical Evaluator (L2+) │
560
+ │ - Separate agent grades each criterion │
561
+ ├────────────────────────────────────────────┤
562
+ │ 3.5 Runtime Verification Gate │
563
+ │ - Auto-generated frontend/backend tests │
564
+ ├────────────────────────────────────────────┤
565
+ │ 4. Integration Wiring Check │
465
566
  │ - Created files imported somewhere? │
466
- │ - Components wired to parents? │
467
567
  ├────────────────────────────────────────────┤
468
- │ 3.5 Cross-Artifact Consistency │
469
- │ - Maps match codebase? │
568
+ │ 4.5 Standards Compliance │
569
+ │ - Naming, security, decisions.md rules │
470
570
  ├────────────────────────────────────────────┤
471
- │ 4. Run Quality Gates │
571
+ │ 5. Run Quality Gates │
472
572
  │ - tests, lint, typecheck │
473
573
  ├────────────────────────────────────────────┤
474
- │ 5. Smoke Test (for refactors) │
475
- │ - App starts without errors │
574
+ │ 6. Completion Truth Gate (IGR) │
575
+ │ - Evidence tier >= 3 for "done" claims │
476
576
  ├────────────────────────────────────────────┤
477
- │ 6. Run Regression Tests (if enabled) │
478
- │ - Sample completed tasks │
577
+ │ 7. Smoke Test (for refactors) │
578
+ │ - App starts without errors │
479
579
  ├────────────────────────────────────────────┤
480
- │ 7. Security Scan (if enabled) │
580
+ │ 8. Security Scan (if enabled) │
481
581
  └────────────────────────────────────────────┘
482
582
 
483
583
  All passed? → Complete task
@@ -182,6 +182,16 @@ Merge, PR, or discard decision workflow for branches.
182
182
 
183
183
  [Read more: Branch Finalization](./branch-finalization.md)
184
184
 
185
+ ### Workspace Mode
186
+ Multi-repo orchestration with manager-worker architecture, boundary enforcement, and agent-to-agent communication.
187
+
188
+ [Read more: Workspace Mode](./workspace-mode.md)
189
+
190
+ ### Decision Authority
191
+ Automatic classification of which decisions the AI makes autonomously vs which need human approval.
192
+
193
+ [Read more: Decision Authority](./decision-authority.md)
194
+
185
195
  ### External Integrations (Archived)
186
196
  Task import from Jira and Linear — currently archived, may return via WogiFlow Teams.
187
197
 
@@ -0,0 +1,110 @@
1
+ # Decision Authority Framework
2
+
3
+ Automatically classifies which decisions the AI can make autonomously vs which require human approval.
4
+
5
+ ---
6
+
7
+ ## Overview
8
+
9
+ During task execution, the AI faces many decisions: naming conventions, error handling strategy, library choice, API shape, UX behavior. Without structure, it either asks too many questions (blocking progress) or makes too many autonomous choices (surprising the developer).
10
+
11
+ The Decision Authority Framework classifies every decision into one of four authority levels, with configurable defaults per category.
12
+
13
+ ---
14
+
15
+ ## Authority Levels
16
+
17
+ | Level | Action |
18
+ |-------|--------|
19
+ | `agent-decides` | Decide autonomously. Report only in completion summary. |
20
+ | `agent-decides-report-after` | Decide autonomously. Explicitly state the decision after implementing. |
21
+ | `owner-decides` | Present to user. Wait for answer before proceeding. |
22
+ | `auto-fix-report-after` | Fix automatically. Report what was fixed after. |
23
+
24
+ ---
25
+
26
+ ## Default Categories
27
+
28
+ | Category | Default Authority | Rationale |
29
+ |----------|------------------|-----------|
30
+ | Engineering | `agent-decides` | Code structure, patterns — AI competent |
31
+ | Naming | `agent-decides` | Variable/function names — low risk |
32
+ | Infrastructure | `agent-decides-report-after` | Build config, deps — report for awareness |
33
+ | Performance | `agent-decides-report-after` | Optimization choices — report for awareness |
34
+ | Product Behavior | `owner-decides` | Feature behavior — human judgment needed |
35
+ | UX | `owner-decides` | User-facing design — human judgment needed |
36
+ | Security | `auto-fix-report-after` | Vulnerabilities — fix immediately, report after |
37
+
38
+ ---
39
+
40
+ ## Batch Enforcement
41
+
42
+ When multiple decisions arise in a single task:
43
+ - Decisions are batched and classified together
44
+ - If `owner-decides` questions exceed `maxOwnerQuestionsPerBatch` (default: 5), overflow is automatically downgraded to `agent-decides-report-after`
45
+ - This prevents question flooding (12+ questions blocking progress)
46
+
47
+ ---
48
+
49
+ ## Low-Confidence Fallback
50
+
51
+ When the classifier cannot confidently categorize a decision, it defaults to `owner-decides` — the safest fallback. Better to ask unnecessarily than to make an unauthorized autonomous decision.
52
+
53
+ ---
54
+
55
+ ## Usage
56
+
57
+ ### Classify a decision
58
+
59
+ ```bash
60
+ node node_modules/wogiflow/scripts/flow-decision-authority.js classify "Should we use Redis or in-memory cache?"
61
+ ```
62
+
63
+ ### Batch classify
64
+
65
+ ```bash
66
+ node node_modules/wogiflow/scripts/flow-decision-authority.js batch '[
67
+ "Should we use Redis or in-memory cache?",
68
+ "Name for the cache service class?",
69
+ "Add rate limiting to the endpoint?"
70
+ ]'
71
+ ```
72
+
73
+ ### Update category authority
74
+
75
+ Users can change defaults via `/wogi-decide`:
76
+
77
+ ```
78
+ "from now on, just fix infrastructure decisions yourself"
79
+ → Updates infrastructure category to agent-decides
80
+ ```
81
+
82
+ ---
83
+
84
+ ## Configuration
85
+
86
+ ```json
87
+ {
88
+ "decisionAuthority": {
89
+ "enabled": true,
90
+ "maxOwnerQuestionsPerBatch": 5,
91
+ "categories": {
92
+ "engineering": "agent-decides",
93
+ "naming": "agent-decides",
94
+ "infrastructure": "agent-decides-report-after",
95
+ "performance": "agent-decides-report-after",
96
+ "productBehavior": "owner-decides",
97
+ "ux": "owner-decides",
98
+ "security": "auto-fix-report-after"
99
+ }
100
+ }
101
+ }
102
+ ```
103
+
104
+ ---
105
+
106
+ ## Related
107
+
108
+ - [Task Planning](./01-task-planning.md) — Where decisions arise during planning
109
+ - [Execution Loop](./02-execution-loop.md) — Decisions during implementation
110
+ - [Rules Management](../03-self-improvement/rules-management.md) — `/wogi-decide` for permanent rules
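Reviewer sketch of the batch enforcement and low-confidence fallback described in the new file above. The function name `enforceBatch`, the `{ question, category }` decision shape, and the choice to keep the first five owner questions and downgrade the rest are assumptions; `flow-decision-authority.js` itself is not part of this diff.

```javascript
// Hypothetical sketch; not the code in scripts/flow-decision-authority.js.
const DEFAULT_CATEGORIES = {
  engineering: 'agent-decides',
  naming: 'agent-decides',
  infrastructure: 'agent-decides-report-after',
  performance: 'agent-decides-report-after',
  productBehavior: 'owner-decides',
  ux: 'owner-decides',
  security: 'auto-fix-report-after',
};

function enforceBatch(decisions, maxOwnerQuestions = 5, categories = DEFAULT_CATEGORIES) {
  let ownerCount = 0;
  return decisions.map((decision) => {
    // Unknown category stands in for "low confidence": fall back to owner-decides.
    let authority = Object.hasOwn(categories, decision.category)
      ? categories[decision.category]
      : 'owner-decides';
    if (authority === 'owner-decides') {
      ownerCount += 1;
      // Overflow beyond the cap is downgraded (assumed: in arrival order).
      if (ownerCount > maxOwnerQuestions) authority = 'agent-decides-report-after';
    }
    return { ...decision, authority };
  });
}

const batch = Array.from({ length: 7 }, (_, i) => ({ question: `UX question ${i + 1}`, category: 'ux' }));
console.log(enforceBatch(batch).map((d) => d.authority));
```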
@@ -0,0 +1,176 @@
1
+ # Workspace Mode: Multi-Repo Orchestration
2
+
3
+ Manage multiple repositories from a single orchestrator using the manager-worker architecture.
4
+
5
+ ---
6
+
7
+ ## Overview
8
+
9
+ Workspace mode enables a **manager** Claude Code session to orchestrate work across multiple **worker** repos. The manager reads metadata, creates execution plans, and dispatches tasks — but never touches source code directly. Each worker runs its own Claude Code session and executes independently.
10
+
11
+ ```
12
+ Manager (workspace root)
13
+ ├── Backend repo (provider — APIs, database)
14
+ ├── Frontend repo (consumer — UI, pages)
15
+ ├── Shared repo (library — types, utilities)
16
+ └── Mobile repo (consumer — native app)
17
+ ```
18
+
19
+ ---
20
+
21
+ ## Setup
22
+
23
+ ### 1. Create workspace config
24
+
25
+ Create `wogi-workspace.json` at the workspace root:
26
+
27
+ ```json
28
+ {
29
+ "workspace": "my-project",
30
+ "members": {
31
+ "backend": { "role": "provider", "path": "./backend", "port": 8802 },
32
+ "frontend": { "role": "consumer", "path": "./frontend", "port": 8803 },
33
+ "shared": { "role": "library", "path": "./shared", "port": 8804 }
34
+ }
35
+ }
36
+ ```
37
+
38
+ ### 2. Start worker sessions
39
+
40
+ Each worker runs with its identity:
41
+
42
+ ```bash
43
+ WOGI_REPO_NAME=backend WOGI_CHANNEL_PORT=8802 claude
44
+ WOGI_REPO_NAME=frontend WOGI_CHANNEL_PORT=8803 claude
45
+ ```
46
+
47
+ ### 3. Start manager session
48
+
49
+ ```bash
50
+ WOGI_REPO_NAME=manager WOGI_PEERS=backend:8802,frontend:8803,shared:8804 claude
51
+ ```
52
+
53
+ ---
54
+
55
+ ## How Task Routing Works
56
+
57
+ When you tell the manager "Add user profile editing":
58
+
59
+ 1. **Metadata scan**: Manager reads api-map, app-map, schema-map from each member repo (never source code)
60
+ 2. **Routing analysis**: Scores each repo by matching task keywords against role keywords
61
+ - Provider keywords: endpoint, route, controller, database, schema, backend, api
62
+ - Consumer keywords: page, component, ui, form, modal, hook, redux
63
+ - Library keywords: shared, utility, types, common, helper
64
+ 3. **Execution plan**: Determines single-repo or cross-repo, creates phased plan
65
+ 4. **Dispatch**: Tasks sent to workers via HTTP channel
66
+
67
+ ### Execution Phase Order
68
+
69
+ Cross-repo tasks execute in dependency order:
70
+
71
+ ```
72
+ library (0) → contract (0) → provider (1) → consumer (2) → verify (4)
73
+ ```
74
+
75
+ The provider (backend) always finishes before the consumer (frontend) starts — no broken integrations from timing mismatches.
76
+
77
+ ---
78
+
79
+ ## Manager Boundary Enforcement
80
+
81
+ The manager-boundary-gate mechanically prevents the manager from modifying worker source code:
82
+
83
+ | Action | Allowed? |
84
+ |--------|----------|
85
+ | Read metadata (api-map, app-map, config, state) | Yes |
86
+ | Read source code | No — blocked |
87
+ | Edit/Write any worker file | No — blocked |
88
+ | Bash in worker directories | Only allowlisted read-only commands |
89
+ | Dispatch tasks to workers | Yes |
90
+ | Read worker messages | Yes |
91
+
92
+ This is enforced by a PreToolUse hook gate, not a prompt. The manager physically cannot `cd` into a worker repo and start editing.
93
+
94
+ ---
95
+
96
+ ## Agent-to-Agent Communication
97
+
98
+ Workers communicate through a file-based message bus at `.workspace/messages/`:
99
+
100
+ | Message Type | Purpose |
101
+ |-------------|---------|
102
+ | `contract-change` | "I changed an API endpoint" |
103
+ | `question` | "Does your side handle X?" |
104
+ | `impact-query` | Pre-implementation: "Will my change break you?" |
105
+ | `impact-response` | "Yes/No, watch out for..." |
106
+ | `task-complete` | "I finished my side" |
107
+ | `needs-help` | "I'm stuck, can you check X?" |
108
+ | `heads-up` | "I'm about to change Y, FYI" |
109
+ | `verification-request` | "Please verify your integrations" |
110
+ | `lock-acquired` / `lock-released` | Shared interface edit coordination |
111
+ | `bug-report` | "Your endpoint returns 500 when..." |
112
+
113
+ Workers can also query peers directly via HTTP for synchronous questions.
114
+
115
+ ---
116
+
117
+ ## Cross-Repo Quality Gates
118
+
119
+ When workspace mode is active, additional quality gates are injected:
120
+
121
+ - **Contract Compliance**: Changes must comply with declared API contracts in `.workspace/contracts/`
122
+ - **Peer Notification**: Affected repos are automatically notified of changes
123
+ - **Cascade Verification**: Library changes trigger verification in all consumer repos
124
+ - **Cross-Repo Impact Check**: Verify impact assessed before implementation
125
+
126
+ ---
127
+
128
+ ## Contract Management
129
+
130
+ `workspace-contracts.js` tracks integration health:
131
+
132
+ - Builds integration map: cross-references provider endpoints with consumer usage
133
+ - Detects orphaned consumers (calling endpoints that don't exist)
134
+ - Detects orphaned providers (endpoints nobody uses)
135
+ - Tracks type versions for schema drift detection
136
+ - Supports OpenAPI, GraphQL, TypeScript, and JSON Schema contract formats
137
+
138
+ ---
139
+
140
+ ## Session Continuity
141
+
142
+ Manager sessions have special handoff handling:
143
+
144
+ - `saveManagerHandoff()`: Captures dispatched tasks, pending messages, active locks, contract drifts
145
+ - `loadManagerHandoff()`: Restores state on next session start
146
+ - Session notes and decisions are preserved across restarts
147
+
148
+ ---
149
+
150
+ ## Directory Structure
151
+
152
+ ```
153
+ workspace-root/
154
+ ├── wogi-workspace.json # Workspace configuration
155
+ ├── .workspace/
156
+ │ ├── state/ # Workspace-level state
157
+ │ │ ├── workspace-manifest.json
158
+ │ │ └── manager-session.json
159
+ │ ├── contracts/ # Shared API contracts
160
+ │ ├── messages/ # Agent-to-agent message bus
161
+ │ └── specs/ # Cross-repo task specs
162
+ ├── backend/ # Worker repo (provider)
163
+ │ └── .workflow/ # Its own WogiFlow state
164
+ ├── frontend/ # Worker repo (consumer)
165
+ │ └── .workflow/
166
+ └── shared/ # Worker repo (library)
167
+ └── .workflow/
168
+ ```
169
+
170
+ ---
171
+
172
+ ## Related
173
+
174
+ - [Mechanical Gates](../06-safety-guardrails/mechanical-gates.md) — Manager boundary gate details
175
+ - [Execution Loop](./02-execution-loop.md) — Single-repo task execution
176
+ - [Model Management](./model-management.md) — Multi-model support
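Reviewer sketch of the routing analysis and execution phase order described in the new file above. The keyword lists and phase numbers are copied from the doc; whole-word matching, using the hit count as the score, and the names `scoreRoles` and `planPhases` are assumptions about an implementation this diff does not show.

```javascript
// Hypothetical sketch of keyword routing; the real scoring may differ.
const ROLE_KEYWORDS = {
  provider: ['endpoint', 'route', 'controller', 'database', 'schema', 'backend', 'api'],
  consumer: ['page', 'component', 'ui', 'form', 'modal', 'hook', 'redux'],
  library: ['shared', 'utility', 'types', 'common', 'helper'],
};

// Phase numbers as listed under "Execution Phase Order".
const PHASE_ORDER = { library: 0, contract: 0, provider: 1, consumer: 2, verify: 4 };

// Count whole-word keyword hits per role (assumed scoring rule).
function scoreRoles(task) {
  const words = new Set(task.toLowerCase().match(/[a-z]+/g) ?? []);
  const scores = {};
  for (const [role, keywords] of Object.entries(ROLE_KEYWORDS)) {
    scores[role] = keywords.filter((keyword) => words.has(keyword)).length;
  }
  return scores;
}

// Keep repos whose role scored above zero, ordered by execution phase.
function planPhases(members, task) {
  const scores = scoreRoles(task);
  return Object.entries(members)
    .filter(([, member]) => scores[member.role] > 0)
    .map(([repo, member]) => ({ repo, role: member.role, phase: PHASE_ORDER[member.role] }))
    .sort((a, b) => a.phase - b.phase);
}

const members = {
  frontend: { role: 'consumer', path: './frontend', port: 8803 },
  backend: { role: 'provider', path: './backend', port: 8802 },
  shared: { role: 'library', path: './shared', port: 8804 },
};
console.log(planPhases(members, 'Add an API endpoint and a form component for profile editing'));
```

Sorting by phase number is what places the provider before the consumer in a cross-repo plan.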
@@ -175,6 +175,46 @@ Before running `/compact`:
175
175
 
176
176
  ---
177
177
 
178
+ ## Phase-Loaded Architecture (v2.15+)
179
+
180
+ The `/wogi-start` pipeline instructions are split into 5 phase files loaded on-demand. This reduces prompt token consumption by ~79% for conversations and small tasks that never reach later phases.
181
+
182
+ | Phase | File | Loaded when |
183
+ |-------|------|-------------|
184
+ | exploring | `.claude/docs/phases/01-explore.md` | Phase transitions to exploring |
185
+ | spec_review | `.claude/docs/phases/02-spec.md` | Phase transitions to spec_review |
186
+ | coding | `.claude/docs/phases/03-implement.md` | Phase transitions to coding |
187
+ | validating | `.claude/docs/phases/04-verify.md` | Phase transitions to validating |
188
+ | completing | `.claude/docs/phases/05-complete.md` | Phase transitions to completing |
189
+
190
+ The phase-read gate (PreToolUse hook) blocks Edit/Write/Bash until the current phase's file is read.
191
+
192
+ ---
193
+
194
+ ## Sprint-Based Context Reset (v2.12+)
195
+
196
+ For large tasks (5+ acceptance criteria), context degrades as implementation details from early criteria crowd out what matters for the current one.
197
+
198
+ **How it works**: At every Nth criterion (default: 3):
199
+ 1. Commit progress: `git add -A && git commit -m "sprint: criteria 1-N of M complete"`
200
+ 2. Save checkpoint to `.workflow/state/task-checkpoint.json` (task ID, completed criteria, changed files)
201
+ 3. Compact context — the PostCompact hook restores task state automatically
202
+ 4. Resume from checkpoint with fresh context, reading the spec anew
203
+
204
+ **Key difference from normal compaction**: Normal compaction summarizes the conversation. Sprint reset commits work, saves a structured checkpoint, and provides a clean slate. The next sprint reads the spec fresh rather than relying on a compressed summary.
205
+
206
+ ```json
207
+ {
208
+ "sprintReset": {
209
+ "enabled": true,
210
+ "criteriaPerSprint": 3,
211
+ "minTaskCriteria": 5
212
+ }
213
+ }
214
+ ```
215
+
216
+ ---
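Reviewer sketch of the sprint-reset trigger described above, using the config values shown in the added JSON. The function names, the checkpoint field names, and the choice not to reset once every criterion is complete are assumptions; the hook code that performs the actual commit and compaction is not shown in this hunk.

```javascript
// Hypothetical sketch of the sprint-reset decision.
const sprintReset = { enabled: true, criteriaPerSprint: 3, minTaskCriteria: 5 };

// True when a reset should follow the criterion that was just completed.
function shouldResetAfter(completedCount, totalCriteria, config = sprintReset) {
  if (!config.enabled) return false;
  if (totalCriteria < config.minTaskCriteria) return false; // small task: no sprints
  if (completedCount >= totalCriteria) return false; // assumed: nothing left to resume
  return completedCount > 0 && completedCount % config.criteriaPerSprint === 0;
}

// Assumed checkpoint shape, based on the fields the doc lists for task-checkpoint.json.
function buildCheckpoint(taskId, completedCriteria, changedFiles) {
  return { taskId, completedCriteria, changedFiles, savedAt: new Date().toISOString() };
}

for (let done = 1; done <= 7; done++) {
  if (shouldResetAfter(done, 7)) {
    console.log(`reset after criterion ${done}`, buildCheckpoint('wf-XXXXXXXX', done, []));
  }
}
```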
217
+
178
218
  ## Compaction Strategy
179
219
 
180
220
  ### Default Strategy
@@ -44,12 +44,23 @@ WogiFlow has multiple memory systems:
44
44
 
45
45
  ## Local Facts
46
46
 
47
- Stored in SQLite database:
47
+ Stored in SQLite database with semantic search capabilities:
48
48
 
49
49
  ```
50
50
  .workflow/memory/local.db
51
51
  ```
52
52
 
53
+ ### Semantic Search (Embeddings)
54
+
55
+ The memory database supports vector-based semantic search using HuggingFace Transformers:
56
+
57
+ - **Embedding model**: `Xenova/all-MiniLM-L6-v2` (runs locally, no API calls)
58
+ - **Similarity**: Cosine similarity between query embedding and stored fact embeddings
59
+ - **Fallback**: When `@huggingface/transformers` is not installed, falls back to text-based search
60
+ - **Threshold**: Results below `0.1` similarity are filtered out
61
+
62
+ This enables queries like "find decisions related to authentication" to match facts that don't contain the exact word "authentication" but are semantically related (e.g., "JWT tokens expire after 1 hour", "Use bcrypt for password hashing").
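Reviewer sketch of the cosine-similarity ranking and `0.1` threshold described in the bullets above. It uses toy 3-dimensional vectors in place of real `Xenova/all-MiniLM-L6-v2` embeddings, and the names `cosineSimilarity` and `searchFacts` plus the fact shape are hypothetical; the package's actual search code is not shown in this hunk.

```javascript
// Hypothetical sketch; toy vectors stand in for model embeddings.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) throw new Error('Embedding dimensions differ');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // zero vector: treat as unrelated
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every fact, drop results below the threshold, best match first.
function searchFacts(queryEmbedding, facts, threshold = 0.1) {
  return facts
    .map((fact) => ({ ...fact, score: cosineSimilarity(queryEmbedding, fact.embedding) }))
    .filter((fact) => fact.score >= threshold)
    .sort((a, b) => b.score - a.score);
}

const facts = [
  { text: 'JWT tokens expire after 1 hour', embedding: [0.9, 0.1, 0.0] },
  { text: 'Use bcrypt for password hashing', embedding: [0.7, 0.3, 0.1] },
  { text: 'Button padding is 8px', embedding: [0.0, 0.0, 1.0] },
];
console.log(searchFacts([1, 0, 0], facts).map((fact) => fact.text));
```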
63
+
53
64
  ### Fact Structure
54
65
 
55
66
  ```json
@@ -18,6 +18,7 @@ Safety features prevent:
18
18
 
19
19
  | Feature | Purpose |
20
20
  |---------|---------|
21
+ | [Mechanical Gates](./mechanical-gates.md) | 12+ PreToolUse hook gates that physically block violations |
21
22
  | [Damage Control](./damage-control.md) | Pattern-based protection |
22
23
  | [Security Scanning](./security-scanning.md) | Pre-commit security checks |
23
24
  | [Checkpoint/Rollback](./checkpoint-rollback.md) | Recovery system |