npm - wogiflow - Versions diffs - 2.15.0 → 2.15.1 - Mend

wogiflow 2.15.0 → 2.15.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (20)

package/.claude/commands/wogi-start-continuation.md CHANGED

@@ -39,11 +39,11 @@ For each criterion:
 ### 6.5. Additional Mandatory Gates
-**Inventory Verification** (remove/fix/replace-all tasks): Pre/post inventory scan per Step 3.55. Wait for user confirmation.
+**Inventory Verification** (remove/fix/replace-all tasks): Pre/post inventory scan per Step 3.55 in `.claude/docs/phases/04-verify.md`. Wait for user confirmation.
-**Item Reconciliation** (3+ item inputs): Enumerate all items, verify each becomes a criterion, reconcile at completion per Step 1.25.
+**Item Reconciliation** (3+ item inputs): Enumerate all items, verify each becomes a criterion, reconcile at completion per Step 1.25 in `.claude/docs/phases/01-explore.md`.
-**Scope-Confidence Gate** (L0/L1 only): Extract assumptions, verify against codebase, present UNVERIFIABLE/CONTRADICTED per Step 1.45.
+**Scope-Confidence Gate** (L0/L1 only): Extract assumptions, verify against codebase, present UNVERIFIABLE/CONTRADICTED per Step 1.45 in `.claude/docs/phases/01-explore.md`.
 ### 7. Verification Gates (ALL MANDATORY)
@@ -78,13 +78,13 @@ At every 3rd criterion: commit progress, save checkpoint to `task-checkpoint.jso
 Before executing ANY phase, you MUST Read the phase instruction file. The PreToolUse hook BLOCKS Edit/Write/Bash until the phase file is read.
-| Phase | File to Read |
-|-------|-------------|
-| exploring | `.claude/docs/phases/01-explore.md` |
-| spec_review | `.claude/docs/phases/02-spec.md` |
-| coding | `.claude/docs/phases/03-implement.md` |
-| validating | `.claude/docs/phases/04-verify.md` |
-| completing | `.claude/docs/phases/05-complete.md` |
+| Phase | File to Read | Contents |
+|-------|-------------|----------|
+| exploring | `.claude/docs/phases/01-explore.md` | Steps 1–1.45: Context, framing, clarifying questions, item reconciliation, multi-agent research, reuse gate, scope-confidence audit |
+| spec_review | `.claude/docs/phases/02-spec.md` | Steps 1.55–2.5: Architect pass, logic adversary, spec generation, approval gate, test generation, TodoWrite, TDD check |
+| coding | `.claude/docs/phases/03-implement.md` | Steps 3–3.52: Execution loop, sprint resets, criteria verification, sub-agent output verification |
+| validating | `.claude/docs/phases/04-verify.md` | Steps 3.55–3.9: Inventory verification, skeptical evaluator, runtime verification, wiring validation, standards compliance, completion truth gate |
+| completing | `.claude/docs/phases/05-complete.md` | Steps 4–5: Quality gates, finalization, progress tracking, mandatory rules |
 ## Rules
 - Validate after EVERY file edit

package/.claude/commands/wogi-start.md CHANGED

@@ -211,6 +211,8 @@ Before executing ANY phase, you MUST Read the phase instruction file. The PreToo
 **How it works**: When you transition to a new phase, Read the corresponding file BEFORE using Edit/Write/Bash. The phase-read gate tracks which files you've read and blocks mutation tools until the current phase's file is loaded.
+**Enforcement caveats**: The gate blocks Edit/Write/Bash when all of these hold: (a) phase is non-idle, non-routing, (b) `hooks.rules.phaseReadGate.enabled` is not false, (c) `workflow-phase.json` exists and has a recognized phase, and (d) the required phase file has not been recorded as read. If any condition fails (no phase state, unknown phase, gate disabled, config error), the gate fails open — the tool is allowed through. Read phase files proactively on every phase transition rather than assuming the gate will always catch you.
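
The four conditions in the enforcement caveat can be read as a single predicate. The following is a minimal sketch, not code from the package: the function name `shouldBlockTool`, its argument shapes, and the literal phase names `idle` and `routing` are assumptions made for illustration. Only the config key `hooks.rules.phaseReadGate.enabled` and the phase-to-file mapping come from the diff.

```javascript
// Illustrative sketch only; not the package's hook implementation.
const PHASE_FILES = new Map([
  ["exploring", ".claude/docs/phases/01-explore.md"],
  ["spec_review", ".claude/docs/phases/02-spec.md"],
  ["coding", ".claude/docs/phases/03-implement.md"],
  ["validating", ".claude/docs/phases/04-verify.md"],
  ["completing", ".claude/docs/phases/05-complete.md"],
]);
const GATED_TOOLS = new Set(["Edit", "Write", "Bash"]);

// phaseState: parsed workflow-phase.json, or null when the file is missing.
// config: parsed project config, or null when loading it failed (assumed shape).
// filesRead: Set of phase file paths already recorded as read.
function shouldBlockTool(tool, phaseState, config, filesRead) {
  if (!GATED_TOOLS.has(tool)) return false; // only mutation tools are gated
  if (config == null) return false; // config error: fail open
  if (config.hooks?.rules?.phaseReadGate?.enabled === false) return false; // (b)
  const phase = phaseState?.phase; // undefined when there is no phase state
  if (phase === "idle" || phase === "routing") return false; // (a)
  const required = PHASE_FILES.get(phase);
  if (required === undefined) return false; // (c) missing or unknown phase
  return !filesRead.has(required); // (d) block until the phase file is read
}
```

Every early return is a fail-open path, which is why the caveat recommends reading the phase file proactively instead of relying on the gate.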
 ## Mandatory Rules
 - **TodoWrite**: Track progress. Clean up all items after completion.

package/.claude/docs/knowledge-base/02-task-execution/02-execution-loop.md CHANGED

@@ -4,6 +4,14 @@ The execution loop is the core mechanism that ensures task completion. When enab
 ---
+## Phase-Loaded Architecture (v2.15+)
+The pipeline instructions are split into 5 phase files (`.claude/docs/phases/01-05`) loaded on-demand. The phase-read gate (PreToolUse hook) blocks Edit/Write/Bash until the current phase's instruction file is read. This saves ~79% of prompt tokens for conversations and small tasks.
+See [Context Management](../04-memory-context/context-management.md) for details on the phase architecture and sprint-based context reset.
+---
 ## Self-Completing Loops
 **The Problem**: Without enforcement, AI often stops when code "looks done" but hasn't been verified against all acceptance criteria.

package/.claude/docs/knowledge-base/02-task-execution/03-verification.md CHANGED

@@ -446,6 +446,101 @@ Cross-references spec deliverables against actual `git diff` to catch false "don
 ---
+## Skeptical Evaluator (v2.13+)
+After implementation, a separate sub-agent independently grades every acceptance criterion.
+**Why**: The same agent that wrote the code verifies its own work — this is "confident praise bias." Anthropic's harness research found that separating the implementer from the evaluator is a strong lever for quality.
+**How it works**:
+1. Spawn a code-reviewer sub-agent on a **different model** (e.g., Sonnet evaluates Opus's work)
+2. Feed it the spec + git diff — it reads the code cold with no implementation context
+3. For each criterion: grade PASS / PARTIAL / FAIL with file:line evidence
+4. If issues found → feed back to implementer → fix → re-evaluate (max 3 rounds)
+5. Calibrated with few-shot examples from `.workflow/state/eval-calibration.json`
+**Configuration**:
+```json
+{
+  "skepticalEvaluator": {
+    "enabled": true,
+    "maxIterations": 3,
+    "model": "sonnet",
+    "calibration": true,
+    "skipForL3": true
+  }
+}
+```
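
The evaluate, fix, re-evaluate cycle described above can be sketched as a bounded loop. This is a hypothetical illustration, not the package's code: `runEvaluationLoop`, the `evaluate` and `fix` callbacks, and the result shape are invented names, and treating one "round" as one evaluation pass is an assumption about what `maxIterations` counts.

```javascript
// Illustrative sketch only; callback names and shapes are hypothetical.
// evaluate(): returns [{ criterion, grade: "PASS" | "PARTIAL" | "FAIL", evidence }]
// fix(issues): lets the implementer address the non-passing criteria
function runEvaluationLoop(evaluate, fix, maxIterations = 3) {
  let grades = [];
  for (let round = 1; round <= maxIterations; round++) {
    grades = evaluate();
    const issues = grades.filter((g) => g.grade !== "PASS");
    if (issues.length === 0) {
      return { passed: true, rounds: round, grades };
    }
    // Feed issues back only if another evaluation round remains.
    if (round < maxIterations) fix(issues);
  }
  return { passed: false, rounds: maxIterations, grades };
}
```

In a real setup the two callbacks would wrap sub-agent calls and be asynchronous; they are synchronous here to keep the control flow visible.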
+---
+## Runtime Verification Gate (v2.13+)
+Auto-generates and runs tests for every task that changes code. ON by default.
+### Evidence Tiers
+| Tier | Name | Counts as Done? |
+|------|------|----------------|
+| 0 | STATIC (compiles, lints) | Never |
+| 1 | STRUCTURAL (file exists, imported) | Never |
+| 2 | OBSERVATIONAL (page loads, renders) | Display-only criteria |
+| 3 | INTERACTIVE (click → result persists) | Yes |
+| 4 | AUTOMATED (test passes) | Yes (strongest) |
+### Frontend: Browser tests generated when UI files change
+- **WebMCP** (preferred): Drives actual browser — screenshot before/after, assert DOM, verify persistence after reload
+- **Playwright**: Auto-generates test to `tests/verification/verify-{taskId}.spec.ts`
+- **User Checklist** (fallback): Blocks completion until user replies "verified"
+### Backend: API tests generated when API files change
+- HTTP integration tests with status, response shape, and persistence assertions
+- Tests persist in `tests/verification/api-verify-{taskId}.test.js`
+### Fullstack: Boundary verification
+- API test verifies server accepts the frontend's payload shape
+- Browser test verifies UI displays the server's response shape
+### Repeat Failure Protocol
+| Strike | Action |
+|--------|--------|
+| 1 | Normal fix |
+| 2 | Mandatory root cause analysis. Must change approach. |
+| 3 | Hard block: evidence required. Must explain what's different. |
+| 4+ | Escalation: suggest pair debugging with developer. |
+```json
+{
+  "runtimeVerification": {
+    "enabled": true,
+    "autoGenerateTests": true,
+    "persistTests": true,
+    "blockOnFailure": true
+  }
+}
+```
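
The strike table in the Repeat Failure Protocol maps a failure count to an escalating action. A minimal sketch, assuming failures are counted per some stable key: the action labels, `recordFailure`, and the idea of a `failureKey` are hypothetical and do not come from the package.

```javascript
// Illustrative sketch only; labels and the failure key are hypothetical.
const STRIKE_ACTIONS = ["normal-fix", "root-cause-analysis", "hard-block"];

// Map a 1-based strike count to the action from the table (4+ escalates).
function actionForStrike(strike) {
  if (!Number.isInteger(strike) || strike < 1) {
    throw new RangeError("strike must be a positive integer");
  }
  return strike <= 3 ? STRIKE_ACTIONS[strike - 1] : "escalate";
}

// Increment the strike count for one failure and return the required action.
// strikes: Map from failure key (for example a test name) to count so far.
function recordFailure(strikes, failureKey) {
  const next = (strikes.get(failureKey) ?? 0) + 1;
  strikes.set(failureKey, next);
  return actionForStrike(next);
}
```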
+---
+## Sprint-Based Context Reset (v2.12+)
+For large tasks (5+ criteria), every 3 criteria: commit progress, save checkpoint, compact context, resume with fresh context reading the spec anew.
+```json
+{
+  "sprintReset": { "enabled": true, "criteriaPerSprint": 3, "minTaskCriteria": 5 }
+}
+```
+---
+## Completion Truth Gate (IGR, v2.13+)
+When IGR is enabled, audits every "done" claim against evidence tiers. Claims with only Tier 0-1 evidence are downgraded to "implemented (unverified)" and task completion is blocked.
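
A minimal sketch of how the evidence tiers could feed this gate. The function names and the claim shape (`tier`, `displayOnly`) are hypothetical. Treating Tier 2 evidence as sufficient only for display-only criteria is one reading of the Evidence Tiers table, not something this section states outright.

```javascript
// Illustrative sketch only; claim shape and names are hypothetical.
const TIER = { STATIC: 0, STRUCTURAL: 1, OBSERVATIONAL: 2, INTERACTIVE: 3, AUTOMATED: 4 };

// Tier 3 and 4 always count; Tier 2 counts only for display-only criteria.
function countsAsDone(tier, displayOnly = false) {
  if (tier >= TIER.INTERACTIVE) return true;
  return tier === TIER.OBSERVATIONAL && displayOnly;
}

// Downgrade under-evidenced claims and report whether completion may proceed.
function auditClaims(claims) {
  const audited = claims.map((claim) => ({
    ...claim,
    status: countsAsDone(claim.tier, claim.displayOnly)
      ? "done"
      : "implemented (unverified)",
  }));
  return { audited, canComplete: audited.every((c) => c.status === "done") };
}
```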
+---
 ## Verification Flow Summary
 ```
@@ -461,23 +556,28 @@ Task Completion Attempt
 │ 2.5 Git-Verified Claim Check               │
 │    - Spec promises match git diff?         │
 ├────────────────────────────────────────────┤
-│ 3. Integration Wiring Check                │
+│ 3. Skeptical Evaluator (L2+)               │
+│    - Separate agent grades each criterion  │
+├────────────────────────────────────────────┤
+│ 3.5 Runtime Verification Gate              │
+│    - Auto-generated frontend/backend tests │
+├────────────────────────────────────────────┤
+│ 4. Integration Wiring Check                │
 │    - Created files imported somewhere?     │
-│    - Components wired to parents?          │
 ├────────────────────────────────────────────┤
-│ 3.5 Cross-Artifact Consistency             │
-│    - Maps match codebase?                  │
+│ 4.5 Standards Compliance                   │
+│    - Naming, security, decisions.md rules  │
 ├────────────────────────────────────────────┤
-│ 4. Run Quality Gates                       │
+│ 5. Run Quality Gates                       │
 │    - tests, lint, typecheck               │
 ├────────────────────────────────────────────┤
-│ 5. Smoke Test (for refactors)              │
-│    - App starts without errors             │
+│ 6. Completion Truth Gate (IGR)             │
+│    - Evidence tier >= 3 for "done" claims  │
 ├────────────────────────────────────────────┤
-│ 6. Run Regression Tests (if enabled)       │
-│    - Sample completed tasks               │
+│ 7. Smoke Test (for refactors)              │
+│    - App starts without errors             │
 ├────────────────────────────────────────────┤
-│ 7. Security Scan (if enabled)              │
+│ 8. Security Scan (if enabled)              │
 └────────────────────────────────────────────┘
          ↓
     All passed? → Complete task

package/.claude/docs/knowledge-base/02-task-execution/README.md CHANGED

@@ -182,6 +182,16 @@ Merge, PR, or discard decision workflow for branches.
 [Read more: Branch Finalization](./branch-finalization.md)
+### Workspace Mode
+Multi-repo orchestration with manager-worker architecture, boundary enforcement, and agent-to-agent communication.
+[Read more: Workspace Mode](./workspace-mode.md)
+### Decision Authority
+Automatic classification of which decisions the AI makes autonomously vs which need human approval.
+[Read more: Decision Authority](./decision-authority.md)
 ### External Integrations (Archived)
 Task import from Jira and Linear — currently archived, may return via WogiFlow Teams.

package/.claude/docs/knowledge-base/02-task-execution/decision-authority.md ADDED

@@ -0,0 +1,110 @@
+# Decision Authority Framework
+Automatically classifies which decisions the AI can make autonomously vs which require human approval.
+---
+## Overview
+During task execution, the AI faces many decisions: naming conventions, error handling strategy, library choice, API shape, UX behavior. Without structure, it either asks too many questions (blocking progress) or makes too many autonomous choices (surprising the developer).
+The Decision Authority Framework classifies every decision into one of four authority levels, with configurable defaults per category.
+---
+## Authority Levels
+| Level | Action |
+|-------|--------|
+| `agent-decides` | Decide autonomously. Report only in completion summary. |
+| `agent-decides-report-after` | Decide autonomously. Explicitly state the decision after implementing. |
+| `owner-decides` | Present to user. Wait for answer before proceeding. |
+| `auto-fix-report-after` | Fix automatically. Report what was fixed after. |
+---
+## Default Categories
+| Category | Default Authority | Rationale |
+|----------|------------------|-----------|
+| Engineering | `agent-decides` | Code structure, patterns — AI competent |
+| Naming | `agent-decides` | Variable/function names — low risk |
+| Infrastructure | `agent-decides-report-after` | Build config, deps — report for awareness |
+| Performance | `agent-decides-report-after` | Optimization choices — report for awareness |
+| Product Behavior | `owner-decides` | Feature behavior — human judgment needed |
+| UX | `owner-decides` | User-facing design — human judgment needed |
+| Security | `auto-fix-report-after` | Vulnerabilities — fix immediately, report after |
+---
+## Batch Enforcement
+When multiple decisions arise in a single task:
+- Decisions are batched and classified together
+- If `owner-decides` questions exceed `maxOwnerQuestionsPerBatch` (default: 5), overflow is automatically downgraded to `agent-decides-report-after`
+- This prevents question flooding (12+ questions blocking progress)
+---
+## Low-Confidence Fallback
+When the classifier cannot confidently categorize a decision, it defaults to `owner-decides` — the safest fallback. Better to ask unnecessarily than to make an unauthorized autonomous decision.
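
The batch rule and the low-confidence fallback can be sketched together. This is a hypothetical illustration, not the logic of `flow-decision-authority.js`: the function names, the numeric `confidence` input, the `0.5` cutoff, and the choice to keep the first five owner questions in input order are all assumptions. The category defaults and the limit of 5 come from the tables above.

```javascript
// Illustrative sketch only; names, confidence scale, and cutoff are hypothetical.
const DEFAULT_AUTHORITY = new Map([
  ["engineering", "agent-decides"],
  ["naming", "agent-decides"],
  ["infrastructure", "agent-decides-report-after"],
  ["performance", "agent-decides-report-after"],
  ["productBehavior", "owner-decides"],
  ["ux", "owner-decides"],
  ["security", "auto-fix-report-after"],
]);

// Unknown categories and low-confidence classifications fall back to the owner.
function resolveAuthority(category, confidence, minConfidence = 0.5) {
  if (confidence < minConfidence) return "owner-decides";
  return DEFAULT_AUTHORITY.get(category) ?? "owner-decides";
}

// Keep at most maxOwnerQuestions owner-decides items; downgrade the overflow.
function enforceBatch(decisions, maxOwnerQuestions = 5) {
  let ownerCount = 0;
  return decisions.map((decision) => {
    if (decision.authority !== "owner-decides") return decision;
    ownerCount += 1;
    if (ownerCount <= maxOwnerQuestions) return decision;
    return { ...decision, authority: "agent-decides-report-after", downgraded: true };
  });
}
```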
+---
+## Usage
+### Classify a decision
+```bash
+node node_modules/wogiflow/scripts/flow-decision-authority.js classify "Should we use Redis or in-memory cache?"
+```
+### Batch classify
+```bash
+node node_modules/wogiflow/scripts/flow-decision-authority.js batch '[
+  "Should we use Redis or in-memory cache?",
+  "Name for the cache service class?",
+  "Add rate limiting to the endpoint?"
+]'
+```
+### Update category authority
+Users can change defaults via `/wogi-decide`:
+```
+"from now on, just fix infrastructure decisions yourself"
+→ Updates infrastructure category to agent-decides
+```
+---
+## Configuration
+```json
+{
+  "decisionAuthority": {
+    "enabled": true,
+    "maxOwnerQuestionsPerBatch": 5,
+    "categories": {
+      "engineering": "agent-decides",
+      "naming": "agent-decides",
+      "infrastructure": "agent-decides-report-after",
+      "performance": "agent-decides-report-after",
+      "productBehavior": "owner-decides",
+      "ux": "owner-decides",
+      "security": "auto-fix-report-after"
+    }
+  }
+}
+```
+---
+## Related
+- [Task Planning](./01-task-planning.md) — Where decisions arise during planning
+- [Execution Loop](./02-execution-loop.md) — Decisions during implementation
+- [Rules Management](../03-self-improvement/rules-management.md) — `/wogi-decide` for permanent rules

package/.claude/docs/knowledge-base/02-task-execution/workspace-mode.md ADDED

@@ -0,0 +1,176 @@
+# Workspace Mode: Multi-Repo Orchestration
+Manage multiple repositories from a single orchestrator using the manager-worker architecture.
+---
+## Overview
+Workspace mode enables a **manager** Claude Code session to orchestrate work across multiple **worker** repos. The manager reads metadata, creates execution plans, and dispatches tasks — but never touches source code directly. Each worker runs its own Claude Code session and executes independently.
+```
+Manager (workspace root)
+   ├── Backend repo (provider — APIs, database)
+   ├── Frontend repo (consumer — UI, pages)
+   ├── Shared repo (library — types, utilities)
+   └── Mobile repo (consumer — native app)
+```
+---
+## Setup
+### 1. Create workspace config
+Create `wogi-workspace.json` at the workspace root:
+```json
+{
+  "workspace": "my-project",
+  "members": {
+    "backend": { "role": "provider", "path": "./backend", "port": 8802 },
+    "frontend": { "role": "consumer", "path": "./frontend", "port": 8803 },
+    "shared": { "role": "library", "path": "./shared", "port": 8804 }
+  }
+}
+```
+### 2. Start worker sessions
+Each worker runs with its identity:
+```bash
+WOGI_REPO_NAME=backend WOGI_CHANNEL_PORT=8802 claude
+WOGI_REPO_NAME=frontend WOGI_CHANNEL_PORT=8803 claude
+```
+### 3. Start manager session
+```bash
+WOGI_REPO_NAME=manager WOGI_PEERS=backend:8802,frontend:8803,shared:8804 claude
+```
+---
+## How Task Routing Works
+When you tell the manager "Add user profile editing":
+1. **Metadata scan**: Manager reads api-map, app-map, schema-map from each member repo (never source code)
+2. **Routing analysis**: Scores each repo by matching task keywords against role keywords
+   - Provider keywords: endpoint, route, controller, database, schema, backend, api
+   - Consumer keywords: page, component, ui, form, modal, hook, redux
+   - Library keywords: shared, utility, types, common, helper
+3. **Execution plan**: Determines single-repo or cross-repo, creates phased plan
+4. **Dispatch**: Tasks sent to workers via HTTP channel
+### Execution Phase Order
+Cross-repo tasks execute in dependency order:
+```
+library (0) → contract (0) → provider (1) → consumer (2) → verify (4)
+```
+The provider (backend) always finishes before the consumer (frontend) starts — no broken integrations from timing mismatches.
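
A minimal sketch of keyword-based routing followed by dependency ordering. The keyword lists and phase numbers are copied from the text above; everything else (function names, whole-word matching, one point per matched keyword) is an assumption, since the diff does not say how the score is computed.

```javascript
// Illustrative sketch only; the scoring rule is an assumption.
const ROLE_KEYWORDS = {
  provider: ["endpoint", "route", "controller", "database", "schema", "backend", "api"],
  consumer: ["page", "component", "ui", "form", "modal", "hook", "redux"],
  library: ["shared", "utility", "types", "common", "helper"],
};
const PHASE_ORDER = { library: 0, contract: 0, provider: 1, consumer: 2, verify: 4 };

// Count how many of each role's keywords appear as whole words in the task.
function scoreRoles(task) {
  const words = new Set(task.toLowerCase().match(/[a-z0-9]+/g) ?? []);
  const scores = {};
  for (const [role, keywords] of Object.entries(ROLE_KEYWORDS)) {
    scores[role] = keywords.filter((keyword) => words.has(keyword)).length;
  }
  return scores;
}

// Return the roles a task touches, ordered so dependencies run first.
function planPhases(task) {
  const scores = scoreRoles(task);
  return Object.keys(scores)
    .filter((role) => scores[role] > 0)
    .sort((a, b) => PHASE_ORDER[a] - PHASE_ORDER[b]);
}
```

A task that matches one role is a single-repo task under this sketch; a task that matches several yields a cross-repo plan in phase order.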
+---
+## Manager Boundary Enforcement
+The manager-boundary-gate mechanically prevents the manager from modifying worker source code:
+| Action | Allowed? |
+|--------|----------|
+| Read metadata (api-map, app-map, config, state) | Yes |
+| Read source code | No — blocked |
+| Edit/Write any worker file | No — blocked |
+| Bash in worker directories | Only allowlisted read-only commands |
+| Dispatch tasks to workers | Yes |
+| Read worker messages | Yes |
+This is enforced by a PreToolUse hook gate, not a prompt. The manager physically cannot `cd` into a worker repo and start editing.
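
The allow/deny table can be sketched as a small decision function. This is a hypothetical illustration and not the gate shipped in the package: the action shape, the worker root list, the use of `.workflow/` as the metadata location, and the read-only command allowlist are all assumptions. It also ignores path traversal such as `backend/../x`, which a real gate would need to normalize.

```javascript
// Illustrative sketch only; paths, allowlist, and action shape are hypothetical.
const WORKER_ROOTS = ["backend/", "frontend/", "shared/"];
const METADATA_DIR = ".workflow/"; // assumed location of maps, config, and state
const READ_ONLY_COMMANDS = new Set(["ls", "cat", "git status", "git log"]);

// Return the worker root that contains the path, or undefined if none does.
function workerRoot(p) {
  const normalized = p.endsWith("/") ? p : p + "/";
  return WORKER_ROOTS.find((root) => normalized.startsWith(root));
}

// action: { tool, path } for Read/Edit/Write, { tool, cwd, command } for Bash.
function isManagerActionAllowed(action) {
  switch (action.tool) {
    case "Read": {
      const root = workerRoot(action.path);
      if (root === undefined) return true; // workspace-level file
      return action.path.slice(root.length).startsWith(METADATA_DIR);
    }
    case "Edit":
    case "Write":
      return workerRoot(action.path) === undefined; // never write worker files
    case "Bash":
      if (workerRoot(action.cwd) === undefined) return true;
      return READ_ONLY_COMMANDS.has(action.command.trim());
    default:
      return true; // dispatching tasks and reading messages are not gated here
  }
}
```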
+---
+## Agent-to-Agent Communication
+Workers communicate through a file-based message bus at `.workspace/messages/`:
+| Message Type | Purpose |
+|-------------|---------|
+| `contract-change` | "I changed an API endpoint" |
+| `question` | "Does your side handle X?" |
+| `impact-query` | Pre-implementation: "Will my change break you?" |
+| `impact-response` | "Yes/No, watch out for..." |
+| `task-complete` | "I finished my side" |
+| `needs-help` | "I'm stuck, can you check X?" |
+| `heads-up` | "I'm about to change Y, FYI" |
+| `verification-request` | "Please verify your integrations" |
+| `lock-acquired` / `lock-released` | Shared interface edit coordination |
+| `bug-report` | "Your endpoint returns 500 when..." |
+Workers can also query peers directly via HTTP for synchronous questions.
+---
+## Cross-Repo Quality Gates
+When workspace mode is active, additional quality gates are injected:
+- **Contract Compliance**: Changes must comply with declared API contracts in `.workspace/contracts/`
+- **Peer Notification**: Affected repos are automatically notified of changes
+- **Cascade Verification**: Library changes trigger verification in all consumer repos
+- **Cross-Repo Impact Check**: Verify impact assessed before implementation
+---
+## Contract Management
+`workspace-contracts.js` tracks integration health:
+- Builds integration map: cross-references provider endpoints with consumer usage
+- Detects orphaned consumers (calling endpoints that don't exist)
+- Detects orphaned providers (endpoints nobody uses)
+- Tracks type versions for schema drift detection
+- Supports OpenAPI, GraphQL, TypeScript, and JSON Schema contract formats
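
The two orphan checks listed above amount to set differences between what providers expose and what consumers call. A minimal sketch, assuming endpoints are identified by strings such as `"GET /users"`; `findOrphans` and that string format are hypothetical and not taken from `workspace-contracts.js`.

```javascript
// Illustrative sketch only; the endpoint identifier format is hypothetical.
function findOrphans(providerEndpoints, consumerCalls) {
  const provided = new Set(providerEndpoints);
  const called = new Set(consumerCalls);
  return {
    // Consumers calling endpoints that no provider declares.
    orphanedConsumers: [...called].filter((endpoint) => !provided.has(endpoint)),
    // Provider endpoints that no consumer calls.
    orphanedProviders: [...provided].filter((endpoint) => !called.has(endpoint)),
  };
}
```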
+---
+## Session Continuity
+Manager sessions have special handoff handling:
+- `saveManagerHandoff()`: Captures dispatched tasks, pending messages, active locks, contract drifts
+- `loadManagerHandoff()`: Restores state on next session start
+- Session notes and decisions are preserved across restarts
+---
+## Directory Structure
+```
+workspace-root/
+├── wogi-workspace.json          # Workspace configuration
+├── .workspace/
+│   ├── state/                   # Workspace-level state
+│   │   ├── workspace-manifest.json
+│   │   └── manager-session.json
+│   ├── contracts/               # Shared API contracts
+│   ├── messages/                # Agent-to-agent message bus
+│   └── specs/                   # Cross-repo task specs
+├── backend/                     # Worker repo (provider)
+│   └── .workflow/               # Its own WogiFlow state
+├── frontend/                    # Worker repo (consumer)
+│   └── .workflow/
+└── shared/                      # Worker repo (library)
+    └── .workflow/
+```
+---
+## Related
+- [Mechanical Gates](../06-safety-guardrails/mechanical-gates.md) — Manager boundary gate details
+- [Execution Loop](./02-execution-loop.md) — Single-repo task execution
+- [Model Management](./model-management.md) — Multi-model support

package/.claude/docs/knowledge-base/04-memory-context/context-management.md CHANGED

@@ -175,6 +175,46 @@ Before running `/compact`:
 ---
+## Phase-Loaded Architecture (v2.15+)
+The `/wogi-start` pipeline instructions are split into 5 phase files loaded on-demand. This reduces prompt token consumption by ~79% for conversations and small tasks that never reach later phases.
+| Phase | File | Loaded when |
+|-------|------|-------------|
+| exploring | `.claude/docs/phases/01-explore.md` | Phase transitions to exploring |
+| spec_review | `.claude/docs/phases/02-spec.md` | Phase transitions to spec_review |
+| coding | `.claude/docs/phases/03-implement.md` | Phase transitions to coding |
+| validating | `.claude/docs/phases/04-verify.md` | Phase transitions to validating |
+| completing | `.claude/docs/phases/05-complete.md` | Phase transitions to completing |
+The phase-read gate (PreToolUse hook) blocks Edit/Write/Bash until the current phase's file is read.
+---
+## Sprint-Based Context Reset (v2.12+)
+For large tasks (5+ acceptance criteria), context degrades as implementation details from early criteria crowd out what matters for the current one.
+**How it works**: At every Nth criterion (default: 3):
+1. Commit progress: `git add -A && git commit -m "sprint: criteria 1-N of M complete"`
+2. Save checkpoint to `.workflow/state/task-checkpoint.json` (task ID, completed criteria, changed files)
+3. Compact context — the PostCompact hook restores task state automatically
+4. Resume from checkpoint with fresh context, reading the spec anew
+**Key difference from normal compaction**: Normal compaction summarizes the conversation. Sprint reset commits work, saves a structured checkpoint, and provides a clean slate. The next sprint reads the spec fresh rather than relying on a compressed summary.
+```json
+{
+  "sprintReset": {
+    "enabled": true,
+    "criteriaPerSprint": 3,
+    "minTaskCriteria": 5
+  }
+}
+```
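
A minimal sketch of when a sprint reset would fire under the config above, plus the commit message and checkpoint from steps 1 and 2. The function names and the checkpoint field names are hypothetical. Skipping the reset once every criterion is complete is an assumption the text does not address.

```javascript
// Illustrative sketch only; function and field names are hypothetical.
function shouldResetAfter(completedCount, totalCriteria, cfg) {
  if (!cfg.enabled) return false;
  if (totalCriteria < cfg.minTaskCriteria) return false; // task too small
  if (completedCount <= 0 || completedCount >= totalCriteria) return false;
  return completedCount % cfg.criteriaPerSprint === 0; // every Nth criterion
}

// Commit message in the format shown in step 1.
function sprintCommitMessage(completedCount, totalCriteria) {
  return `sprint: criteria 1-${completedCount} of ${totalCriteria} complete`;
}

// Checkpoint carrying the three items step 2 lists.
function buildCheckpoint(taskId, completedCriteria, changedFiles) {
  return { taskId, completedCriteria: [...completedCriteria], changedFiles: [...changedFiles] };
}
```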
+---
 ## Compaction Strategy
 ### Default Strategy

package/.claude/docs/knowledge-base/04-memory-context/memory-systems.md CHANGED

@@ -44,12 +44,23 @@ WogiFlow has multiple memory systems:
 ## Local Facts
-Stored in SQLite database:
+Stored in SQLite database with semantic search capabilities:
 ```
 .workflow/memory/local.db
 ```
+### Semantic Search (Embeddings)
+The memory database supports vector-based semantic search using HuggingFace Transformers:
+- **Embedding model**: `Xenova/all-MiniLM-L6-v2` (runs locally, no API calls)
+- **Similarity**: Cosine similarity between query embedding and stored fact embeddings
+- **Fallback**: When `@huggingface/transformers` is not installed, falls back to text-based search
+- **Threshold**: Results below `0.1` similarity are filtered out
+This enables queries like "find decisions related to authentication" to match facts that don't contain the exact word "authentication" but are semantically related (e.g., "JWT tokens expire after 1 hour", "Use bcrypt for password hashing").
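
A minimal sketch of the ranking step: cosine similarity between a query embedding and stored fact embeddings, with the `0.1` threshold from the list above. It uses toy vectors in place of real model output and does not call `@huggingface/transformers`; the function names and the fact shape are hypothetical.

```javascript
// Illustrative sketch only; embeddings here are toy vectors, not model output.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) throw new Error("vectors must have equal length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // zero vector: no direction to compare
  return dot / Math.sqrt(normA * normB);
}

// Score every fact, drop results below the threshold, best match first.
function searchFacts(queryEmbedding, facts, threshold = 0.1) {
  return facts
    .map((fact) => ({ fact: fact.text, score: cosineSimilarity(queryEmbedding, fact.embedding) }))
    .filter((result) => result.score >= threshold)
    .sort((x, y) => y.score - x.score);
}
```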
 ### Fact Structure
 ```json

package/.claude/docs/knowledge-base/06-safety-guardrails/README.md CHANGED

@@ -18,6 +18,7 @@ Safety features prevent:
 | Feature | Purpose |
 |---------|---------|
+| [Mechanical Gates](./mechanical-gates.md) | 12+ PreToolUse hook gates that physically block violations |
 | [Damage Control](./damage-control.md) | Pattern-based protection |
 | [Security Scanning](./security-scanning.md) | Pre-commit security checks |
 | [Checkpoint/Rollback](./checkpoint-rollback.md) | Recovery system |