speexor 0.1.1 → 0.2.0

Files changed (46)

package/API-REFERENCE.md +96 -1
package/ARCHITECTURE.md +83 -32
package/BENCHMARKS.md +52 -0
package/CHANGELOG.md +35 -4
package/CODE-OF-CONDUCT.md +83 -83
package/CONTRIBUTING.md +98 -98
package/FAQ.md +105 -105
package/GLOSSARY.md +33 -0
package/LICENSE.md +21 -21
package/PUBLISH.md +77 -77
package/README.md +219 -5
package/REFACTOR-LOG.md +40 -40
package/ROADMAP.md +37 -15
package/SECURITY-DEFAULTS.md +118 -0
package/SECURITY.md +79 -79
package/SUMMARY.md +31 -8
package/TESTING.md +140 -140
package/dist/{agent-5D3BVWNK.js → agent-D4BRWEOZ.js} +4 -4
package/dist/agent-D4BRWEOZ.js.map +1 -0
package/dist/{chunk-2F66BZYJ.js → chunk-2DX54KIM.js} +2 -2
package/dist/chunk-2DX54KIM.js.map +1 -0
package/dist/{chunk-B7WLHC4W.js → chunk-7VZHDGRQ.js} +2 -2
package/dist/chunk-7VZHDGRQ.js.map +1 -0
package/dist/{chunk-SXALZEOJ.js → chunk-AOFWQZWY.js} +2 -2
package/dist/chunk-AOFWQZWY.js.map +1 -0
package/dist/cli/index.js +4 -4
package/dist/cli/index.js.map +1 -1
package/dist/core/index.js +1 -1
package/dist/index.js +3 -3
package/dist/index.js.map +1 -1
package/dist/plugins/index.js +1 -1
package/docs/SETUP.md +94 -94
package/docs/TROUBLESHOOTING.md +113 -113
package/docs/adr/0001-record-architecture-decisions.md +44 -0
package/docs/adr/0002-plugin-architecture.md +53 -0
package/docs/adr/0003-recursive-task-decomposition.md +57 -0
package/docs/adr/0004-local-first-security.md +58 -0
package/docs/adr/0005-data-directory-layout.md +69 -0
package/examples/basic.yaml +61 -61
package/package.json +103 -102
package/schema/config.schema.json +119 -119
package/speexor.config.yaml.example +30 -30
package/dist/agent-5D3BVWNK.js.map +0 -1
package/dist/chunk-2F66BZYJ.js.map +0 -1
package/dist/chunk-B7WLHC4W.js.map +0 -1
package/dist/chunk-SXALZEOJ.js.map +0 -1

package/README.md CHANGED

@@ -6,6 +6,8 @@
 [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
 [![Node](https://img.shields.io/badge/node-%3E%3D18.0.0-brightgreen)](https://nodejs.org)
 [![TypeScript](https://img.shields.io/badge/%3C/%3E-TypeScript-3178C6)](https://www.typescriptlang.org/)
+[![CI](https://github.com/superdevids/speexor/actions/workflows/ci.yml/badge.svg)](https://github.com/superdevids/speexor/actions/workflows/ci.yml)
+[![Coverage](https://img.shields.io/badge/coverage-~90%25-brightgreen)](https://github.com/superdevids/speexor)
 ---
@@ -15,9 +17,9 @@ Speexor is an **agent orchestrator** purpose-built for teams and solo developers
 Coordinating multiple coding agents manually is error-prone and slow. Speexor solves this with a **7-plugin slot architecture** that separates agent adapters, runtime backends, workspace management, issue tracking, SCM operations, notifications, and terminal I/O into clean, swappable modules. Git worktree isolation keeps each agent session on its own branch without polluting your working tree.
-Speexor ships with a built-in **HTTP dashboard**, a **reaction engine** for automated CI-fix loops, JSON-file session persistence, and two runtime backends — `tmux` on Unix and `Process` on Windows — so it works wherever you do.
+Speexor ships with a built-in **interactive dashboard**, a **reaction engine** for automated CI-fix loops, **recursive task decomposition** with DAG-based scheduling, a **governance engine** for risk-aware approvals, a **secrets vault** for secure credential storage, and an **extension marketplace** for community plugins.
-> **Status:** v0.1.0 — Early release. The plugin API and core commands are stable; breaking changes are possible before 1.0.
+> **Status:** v0.2.0 — Feature release. Core commands, task decomposition, governance, extensions, dashboard v2, and 320+ tests stable. Breaking changes are possible before 1.0.
 ---
@@ -25,13 +27,28 @@ Speexor ships with a built-in **HTTP dashboard**, a **reaction engine** for auto
 - **Parallel Agent Dispatch** — Spawn multiple coding agents (OpenCode, Claude Code, Aider, Codex) across different repositories simultaneously
 - **Multi-Provider Routing** — Configure primary and fallback AI providers per project with concurrent-limit controls
-- **Git Worktree Isolation** — Every agent task operates on its own isolated `git worktree` branch — zero interference with your main workspace
+- **Recursive Task Decomposition** — LLM-driven planner breaks high-level tasks into a DAG of sub-tasks with automatic dependency resolution
+- **Parallel Agent Scheduler** — Execute decomposed tasks in parallel with resource-aware throttling and auto-retry
+- **Governance Engine** — Two-axis approval model (Task Origin Gate + Action Risk Gate) with risk-based action classification
+- **Cost Tracking** — Per-provider, per-project, per-task-node cost tracking with budget guard enforcement
+- **Secrets Vault** — OS-native secure credential storage (macOS Keychain, Windows Credential Manager, Linux encrypted file)
+- **Extension Marketplace** — Search, install, list, update, and remove extensions via `speexor ext` commands
+- **Plugin SDK** — `@speexor/sdk` with extension scaffold generator and TypeScript types
+- **Decision Quality Eval** — Confidence calibration reporting, decision logging, and auto-labeling
+- **Worktree Hierarchy Protocol** — Pin-to-commit-hash forking, serialized subagent merges, and conflict escalation
+- **Interactive Dashboard v2** — Task Tree View, Agent Fleet, Approvals Inbox, Cost Panel, Decision Log, Simple Mode, WCAG accessibility
+- **Session Guard** — AFK auto-pause, budget enforcement, idle timeout
+- **Provider Error Retry** — Exponential backoff on provider failures with automatic fallback
 - **Reaction Engine** — Automatically detect CI failures, PR review changes, and approvals; trigger fix, notify, or escalate actions with configurable retries
-- **Built-in HTTP Dashboard** — Monitor sessions, worktrees, and agent status in real time (Node.js built-in `http` module, no external dependencies)
+- **Git Worktree Isolation** — Every agent task operates on its own isolated `git worktree` branch — zero interference with your main workspace
 - **7-Plugin Slot Architecture** — Swap or extend every layer: agent adapters, runtimes, workspace, tracker, SCM, notifier, and terminal
 - **Cross-Platform Runtimes** — `tmux` for Unix, `Process` for Windows — pick the backend that matches your OS
 - **JSON-File Persistence** — Session state, worktrees, and runtimes persist to `.speexor/state.json` automatically
 - **Zod-Validated Config** — YAML configuration validated at load with clear error messages for every field
+- **19 Test Files / ~320 Tests** — Comprehensive Vitest coverage across core, CLI, plugins, and scheduler
+- **CI/CD Pipeline** — GitHub Actions for test, lint, build, and publish
+- **Architecture Decision Records** — 5 ADRs in `docs/adr/` documenting key design choices
+- **Glossary** — Canonical terminology reference in `GLOSSARY.md`
 ---
@@ -106,10 +123,158 @@ projects:
       concurrentLimit: 2
 ```
+Validate your configuration against the schema and run any pending migrations:
+```bash
+speexor config validate
+```
 See the full schema with `speexor config-help`.
 ---
+## Recursive Task Decomposition & Scheduling
+Speexor's task system transforms high-level descriptions into an executable DAG of sub-tasks:
+### LLM-Driven Planner
+- Submit a task with `speexor task submit "Implement user authentication"`
+- The LLM planner decomposes it into a directed acyclic graph of atomic sub-tasks
+- Dependencies are automatically resolved: a sub-task only executes after all its predecessors succeed
+### Parallel Agent Scheduler
+- The scheduler executes independent sub-tasks in parallel across the configured agent fleet
+- **Resource-aware throttling** — respects per-project `concurrentLimit` and system load
+- **Auto-retry** — failed sub-tasks are retried up to the configured limit before escalation
+- **Cost tracking per node** — every executed sub-task is attributed with provider, tokens, and duration
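The dependency rule described in this hunk (a sub-task runs only after all of its predecessors succeed, and independent sub-tasks run in parallel) can be illustrated with a small sketch. It is not taken from the package: `SubTask` and `planWaves` are hypothetical names, task ids are assumed to be unique, and the grouping uses Kahn's algorithm as one plausible way to derive parallel batches from a DAG.

```typescript
// Illustrative sketch only: SubTask and planWaves are hypothetical names,
// not part of Speexor's published API. Task ids are assumed to be unique.

interface SubTask {
  id: string;
  dependsOn: string[]; // ids of sub-tasks that must succeed first
}

/**
 * Groups sub-tasks into "waves" using Kahn's algorithm: every task in a wave
 * has all of its dependencies in earlier waves, so a wave can run in parallel.
 * Throws if a dependency is unknown or the graph contains a cycle.
 */
function planWaves(tasks: SubTask[]): string[][] {
  const remaining: Record<string, number> = {}; // unmet dependency count per task
  const dependents: Record<string, string[]> = {}; // reverse edges
  for (const task of tasks) {
    remaining[task.id] = task.dependsOn.length;
    dependents[task.id] = [];
  }
  for (const task of tasks) {
    for (const dep of task.dependsOn) {
      if (!Object.prototype.hasOwnProperty.call(dependents, dep)) {
        throw new Error("Unknown dependency: " + dep);
      }
      dependents[dep].push(task.id);
    }
  }

  const waves: string[][] = [];
  let ready = tasks.filter((t) => t.dependsOn.length === 0).map((t) => t.id);
  let scheduledCount = 0;
  while (ready.length > 0) {
    waves.push(ready);
    scheduledCount += ready.length;
    const next: string[] = [];
    for (const id of ready) {
      for (const child of dependents[id]) {
        remaining[child] -= 1;
        if (remaining[child] === 0) {
          next.push(child); // all predecessors are now scheduled
        }
      }
    }
    ready = next;
  }
  if (scheduledCount !== tasks.length) {
    throw new Error("Dependency cycle detected");
  }
  return waves;
}

const demoWaves = planWaves([
  { id: "schema", dependsOn: [] },
  { id: "api", dependsOn: ["schema"] },
  { id: "ui", dependsOn: ["schema"] },
  { id: "e2e", dependsOn: ["api", "ui"] },
]);
console.log(JSON.stringify(demoWaves)); // [["schema"],["api","ui"],["e2e"]]
```

A real scheduler would presumably also split each wave by the per-project `concurrentLimit` and handle retries; the sketch leaves both out to keep the ordering logic visible.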
+### Worktree Hierarchy Protocol
+- Each decomposed sub-task forks from its parent's commit hash via `git worktree`
+- Completed sub-tasks merge back serially, respecting the dependency chain
+- Merge conflicts are detected and escalated for resolution
+```bash
+# Submit a high-level task for decomposition
+speexor task submit "Implement OAuth2 login flow"
+```
+---
+## Governance & Security
+### Governance Engine
+Speexor implements a two-axis approval model:
+| Axis | Gate | What It Checks |
+|------|------|----------------|
+| **Task Origin Gate** | Who submitted? | Validates the user, session, or CI trigger against configured policies |
+| **Action Risk Gate** | What will it do? | Classifies every action by risk (low / medium / high / critical) based on file scope, provider, and command |
+Actions are automatically classified:
+- **Low** — read-only, documentation, lint fixes — auto-approved
+- **Medium** — isolated file edits — requires single approval
+- **High** — cross-module refactors, dependency changes — requires peer approval
+- **Critical** — production config, secrets, destructive operations — requires multi-party approval
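The four-level classification listed above maps naturally onto a lookup table. The following is a minimal sketch under stated assumptions: the type and function names are invented for illustration, and only the level-to-approval pairing comes from the README text.

```typescript
// Illustrative sketch only: these type and function names are assumptions,
// not Speexor's actual governance API.

type RiskLevel = "low" | "medium" | "high" | "critical";

interface ApprovalRequirement {
  autoApproved: boolean;
  approval: "none" | "single" | "peer" | "multi-party";
}

// One entry per risk level, mirroring the list in the README hunk.
const APPROVAL_BY_RISK: Record<RiskLevel, ApprovalRequirement> = {
  low: { autoApproved: true, approval: "none" },
  medium: { autoApproved: false, approval: "single" },
  high: { autoApproved: false, approval: "peer" },
  critical: { autoApproved: false, approval: "multi-party" },
};

/** Looks up what must happen before an action of the given risk level runs. */
function requirementFor(level: RiskLevel): ApprovalRequirement {
  return APPROVAL_BY_RISK[level];
}

console.log(requirementFor("high").approval); // peer
```

Typing the table as `Record<RiskLevel, ApprovalRequirement>` means that adding a fifth level to `RiskLevel` without a matching table entry fails at compile time, which suits a policy that should never silently fall through.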
+### Secrets Vault
+Credentials are stored in the OS-native secure store:
+- **macOS** — Keychain via `security` CLI
+- **Windows** — Credential Manager via `cmdkey`
+- **Linux** — AES-256-GCM encrypted file at `~/.speexor/vault.enc`
+### Session Guard
+- **AFK auto-pause** — pauses agent sessions after a configurable period of keyboard inactivity
+- **Budget enforcement** — hard-stops execution when per-session or per-project cost limits are reached
+- **Idle timeout** — auto-terminates sessions that have no output for a configurable duration
+### Provider Error Retry
+Provider failures (rate limits, timeouts, 5xx) are handled with:
+- **Exponential backoff** — 1s, 2s, 4s, 8s, max 60s between retries
+- **Automatic fallback** — switches to the configured fallback provider after exhausting retries
+- Each retry and fallback event is logged for audit
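The delay sequence quoted above (1s, 2s, 4s, 8s, capped at 60s) is a standard doubling schedule with a ceiling. A minimal sketch, assuming a 0-based attempt counter; the function names and default parameters are illustrative and not taken from the package:

```typescript
// Illustrative sketch only: function names and parameters are assumptions,
// not Speexor's actual retry API.

/** Delay in milliseconds before retry number `attempt` (0-based). */
function backoffDelayMs(attempt: number, baseMs: number = 1000, capMs: number = 60000): number {
  if (attempt < 0) {
    throw new RangeError("attempt must be non-negative");
  }
  // Double the delay on every attempt, then clamp to the cap.
  return Math.min(baseMs * Math.pow(2, attempt), capMs);
}

/** Delays for the first `retries` attempts, e.g. for logging or driving a retry loop. */
function backoffSchedule(retries: number): number[] {
  const delays: number[] = [];
  for (let attempt = 0; attempt < retries; attempt++) {
    delays.push(backoffDelayMs(attempt));
  }
  return delays;
}

// 1s, 2s, 4s, 8s, 16s, 32s, then capped at 60s for every later attempt.
console.log(backoffSchedule(8).join(","));
```

The README does not say whether jitter is applied or how many retries precede the fallback provider, so the sketch leaves both out.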
+---
+## Extension Marketplace
+Speexor supports community extensions published to npm under the `@speexor/ext-*` scope.
+```bash
+# Search the marketplace
+speexor ext search "gitlab"
+# Install an extension
+speexor ext install @speexor/ext-gitlab-tracker
+# List installed extensions
+speexor ext list
+# Update all extensions
+speexor ext update
+# Remove an extension
+speexor ext remove @speexor/ext-gitlab-tracker
+```
+### Plugin SDK (`@speexor/sdk`)
+Build your own extensions:
+```bash
+npm create @speexor/ext my-extension
+cd my-extension
+npm install
+npm run build
+npm publish
+```
+The SDK provides:
+- TypeScript types for all 7 plugin slots
+- Extension scaffold generator (`npm create @speexor/ext`)
+- Validation utilities matching the core Zod schemas
+- Lifecycle hooks (`initialize`, `destroy`) for clean setup/teardown
+---
+## Interactive Dashboard
+The built-in HTTP dashboard has been upgraded to **v2** with six panels:
+| Panel | Description |
+|-------|-------------|
+| **Task Tree View** | Visual DAG of decomposed tasks with status, dependencies, and execution order |
+| **Agent Fleet** | Live view of all running agents, their sessions, resource usage, and logs |
+| **Approvals Inbox** | Pending governance approvals with risk classification and context |
+| **Cost Panel** | Real-time cost accumulation per provider, project, and task node |
+| **Decision Log** | Searchable history of governance decisions with confidence calibration |
+| **Simple Mode** | Reduced UI for read-only monitoring — hides actions and approvals |
+Accessibility: The dashboard targets WCAG 2.1 AA compliance with keyboard navigation, ARIA labels, and high-contrast mode support.
+```bash
+speexor start https://github.com/your-org/your-repo --port 3000
+# Open http://localhost:3000
+```
+---
+## Decision Quality Evaluation
+Speexor logs every governance decision and can evaluate decision quality:
+```bash
+# Run decision quality evaluation
+speexor eval decisions
+```
+This produces a **confidence calibration report** showing:
+- Accuracy of risk classification vs. actual outcomes
+- Decision latency distribution
+- Auto-labeling quality metrics
+- Recommendations for policy tuning
+---
 ## Plugin Architecture
 Speexor defines **7 plugin slots**. Each slot can have multiple implementations registered at startup.
@@ -126,6 +291,8 @@ Speexor defines **7 plugin slots**. Each slot can have multiple implementations
 Plugins implement `PluginModule` with a standard lifecycle (`initialize` / `destroy`) and receive typed access to the config, event bus, and logger through `PluginContext`.
+Extensions published on the marketplace implement the same interface and are loaded dynamically at startup.
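To make the `initialize` / `destroy` lifecycle concrete, here is a hedged sketch of a notifier-style plugin. The diff does not show the real `PluginModule` or `PluginContext` definitions, so every interface below is a stand-in prefixed with `Sketch`, the lifecycle is assumed to be synchronous, and the `session:failed` event name is hypothetical.

```typescript
// Illustrative sketch only. The interface shapes below are guesses made for
// this example; consult the package's own type definitions for the real ones.

interface SketchLogger {
  info(message: string): void;
}

interface SketchEventBus {
  on(event: string, handler: (payload: string) => void): void;
  off(event: string, handler: (payload: string) => void): void;
  emit(event: string, payload: string): void;
}

interface SketchPluginContext {
  config: Record<string, string>;
  events: SketchEventBus;
  logger: SketchLogger;
}

interface SketchPluginModule {
  name: string;
  initialize(context: SketchPluginContext): void;
  destroy(): void;
}

/** Logs every "session:failed" event (a hypothetical event name). */
class FailureNotifier implements SketchPluginModule {
  name = "failure-notifier";
  private context: SketchPluginContext | null = null;

  // Kept as a field so the same function reference can be unsubscribed later.
  private handler = (payload: string): void => {
    if (this.context !== null) {
      this.context.logger.info("session failed: " + payload);
    }
  };

  initialize(context: SketchPluginContext): void {
    this.context = context;
    context.events.on("session:failed", this.handler);
  }

  destroy(): void {
    // Undo everything initialize() did so the plugin can be unloaded cleanly.
    if (this.context !== null) {
      this.context.events.off("session:failed", this.handler);
      this.context = null;
    }
  }
}
```

The point of the pairing is symmetry: whatever `initialize` subscribes to or allocates, `destroy` releases, so a dynamically loaded extension leaves no handlers behind when it is removed.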
 ---
 ## CLI Reference
@@ -138,13 +305,57 @@ speexor <command> [options]
 |---------|-------------|
 | `start [repo]` | Initialize a project config and start the HTTP dashboard. Options: `--port`, `--name`, `--no-dashboard` |
 | `agent spawn --task <id>` | Spawn an agent for a GitHub issue. Options: `--agent` (opencode, claude-code, aider, codex) |
+| `task submit <description>` | Submit a high-level task description for LLM-based decomposition into a DAG of sub-tasks |
 | `list` | List all projects and active agent statuses |
 | `stop <session-id>` | Stop an agent session safely |
 | `logs <session-id>` | Tail logs for an agent session. Options: `--follow`, `--lines` |
+| `ext search <query>` | Search the extension marketplace for available extensions |
+| `ext install <package>` | Install an extension from the marketplace |
+| `ext list` | List all installed extensions |
+| `ext update [package]` | Update all extensions or a specific one |
+| `ext remove <package>` | Uninstall an extension |
+| `config validate` | Validate configuration against the schema and run pending migrations |
+| `eval decisions` | Run decision quality evaluation with confidence calibration report |
 | `config-help` | Print the full YAML config schema reference |
 ---
+## Testing & CI/CD
+### Test Suite
+Speexor includes **19 test files** containing **~320 tests** across the following domains:
+| Area | Files | Focus |
+|------|-------|-------|
+| Core | 6 | State management, config validation, event bus, session lifecycle |
+| CLI | 4 | Command parsing, argument validation, output formatting |
+| Plugins | 4 | Agent adapters, runtime backends, workspace isolation |
+| Scheduler | 3 | DAG execution, parallelism, throttling, retry logic |
+| Governance | 2 | Risk classification, approval workflow, budget enforcement |
+Run the tests:
+```bash
+npm test                 # Run all tests
+npm run test:watch       # Watch mode
+npm run test:coverage    # Coverage report (~90%)
+```
+### CI/CD Pipeline
+GitHub Actions (`.github/workflows/ci.yml`) runs on every push and pull request:
+| Stage | What It Does |
+|-------|-------------|
+| **Lint** | Biome check on `src/` |
+| **Typecheck** | `tsc --noEmit` full project type validation |
+| **Test** | Vitest with ~320 tests across all modules |
+| **Build** | `tsup` production build |
+| **Publish** | npm publish on tagged releases (`v*`) |
+---
 ## Examples
 A complete multi-project configuration with reaction rules is available in [`examples/basic.yaml`](./examples/basic.yaml).
@@ -158,6 +369,8 @@ A complete multi-project configuration with reaction rules is available in [`exa
 - [Behavior Spec (PRD 03)](./docs/PRD03.md)
 - [Design Docs (PRD 04)](./docs/PRD04.md)
 - [Specification (PRD 05)](./docs/PRD05.md)
+- [Architecture Decision Records](./docs/adr/) — 5 ADRs covering key design decisions
+- [Glossary](./GLOSSARY.md) — Canonical terminology reference
 - [Setup Guide](./docs/SETUP.md)
 - [Troubleshooting](./docs/TROUBLESHOOTING.md)
@@ -170,7 +383,8 @@ Speexor was created because existing agent orchestration tools were either tied
 - **Agent-agnostic** — run OpenCode, Claude Code, Aider, or Codex with the same CLI
 - **Lightweight** — zero external runtime dependencies beyond your agents and git
 - **Repository-native** — works with your existing GitHub repos, issues, and PR workflows
-- **Extensible** — swap any plugin slot with your own implementation
+- **Extensible** — swap any plugin slot with your own implementation or install community extensions
+- **Governance-ready** — built-in approval gates, cost controls, and audit logging for production use
 ---

package/REFACTOR-LOG.md CHANGED

@@ -1,40 +1,40 @@
-# Refactoring Log
-> Tracking significant architectural changes and refactoring decisions for Speexor.
-## [2026-06-30] — Initial Implementation (v0.1.0)
-### Context
-Initial implementation of the Agent Orchestrator based on PRD01.md. Built from scratch as `packages/speexor` in the speexjs monorepo.
-### Decisions
-1. **Plugin Architecture**: 7 fixed plugin slots (agent, runtime, workspace, tracker, scm, notifier, terminal) — inspired by AgentWrapper AO but with cleaner interface segregation
-2. **Monorepo as Single Package**: Unlike AO's multi-package structure, v1 lives in a single `speexor` package with internal module separation. This reduces initial complexity while maintaining clean module boundaries via `src/` directories.
-3. **TypeScript ESM Only**: No CJS support. The rest of speexjs ecosystem is ESM-first.
-4. **Built-in HTTP Dashboard**: Using Node.js built-in `http` module instead of Express/NestJS to keep dependencies minimal for v1.
-5. **No Dynamic Plugin Loading**: All built-in plugins are hardcoded in `loadAllPlugins()`. Dynamic loading from external packages is deferred to v2.
-6. **Synchronous File I/O for State**: `SessionStore` uses `readFileSync`/`writeFileSync` for simplicity. Will migrate to async when performance requires it.
-### Rebrand: Konduktor → Speexor
-The project was initially codenamed "Konduktor" internally. After initial implementation, renamed to "Speexor" to align with the SpeexJS brand ecosystem. All code identifiers, CLI names, config files, and documentation were updated.
-### Patterns Established
-- `PluginModule` base interface with `initialize()`/`destroy()` lifecycle
-- Debug namespace pattern: `speexor:<module>:<submodule>`
-- Session ID format: `<prefix>-<task-id>-<timestamp>`
-- Worktree branch format: `speexor/<task-id>`
-- Config directory: `.speexor/` at project root
-### Technical Debt
-- No test files yet (vitest configured but no tests written)
-- `getLiveStream()` throws "not implemented" in both runtime plugins
-- Terminal plugin has no implementation
-- Dashboard HTML is inline (hard to maintain)
-- Session store uses synchronous I/O
-- No dynamic plugin discovery mechanism
-### What Worked Well
-- TypeScript strict mode caught several type errors during initial build
-- Event bus pattern cleanly decouples lifecycle from dashboard
-- Zod validation provides clear error messages for misconfigured YAML
-- Plugin interface design allows adding new providers without touching core
+# Refactoring Log
+> Tracking significant architectural changes and refactoring decisions for Speexor.
+## [2026-06-30] — Initial Implementation (v0.1.0)
+### Context
+Initial implementation of the Agent Orchestrator based on PRD01.md. Built from scratch as `packages/speexor` in the speexjs monorepo.
+### Decisions
+1. **Plugin Architecture**: 7 fixed plugin slots (agent, runtime, workspace, tracker, scm, notifier, terminal) — inspired by AgentWrapper AO but with cleaner interface segregation
+2. **Monorepo as Single Package**: Unlike AO's multi-package structure, v1 lives in a single `speexor` package with internal module separation. This reduces initial complexity while maintaining clean module boundaries via `src/` directories.
+3. **TypeScript ESM Only**: No CJS support. The rest of speexjs ecosystem is ESM-first.
+4. **Built-in HTTP Dashboard**: Using Node.js built-in `http` module instead of Express/NestJS to keep dependencies minimal for v1.
+5. **No Dynamic Plugin Loading**: All built-in plugins are hardcoded in `loadAllPlugins()`. Dynamic loading from external packages is deferred to v2.
+6. **Synchronous File I/O for State**: `SessionStore` uses `readFileSync`/`writeFileSync` for simplicity. Will migrate to async when performance requires it.
+### Rebrand: Konduktor → Speexor
+The project was initially codenamed "Konduktor" internally. After initial implementation, renamed to "Speexor" to align with the SpeexJS brand ecosystem. All code identifiers, CLI names, config files, and documentation were updated.
+### Patterns Established
+- `PluginModule` base interface with `initialize()`/`destroy()` lifecycle
+- Debug namespace pattern: `speexor:<module>:<submodule>`
+- Session ID format: `<prefix>-<task-id>-<timestamp>`
+- Worktree branch format: `speexor/<task-id>`
+- Config directory: `.speexor/` at project root
+### Technical Debt
+- No test files yet (vitest configured but no tests written)
+- `getLiveStream()` throws "not implemented" in both runtime plugins
+- Terminal plugin has no implementation
+- Dashboard HTML is inline (hard to maintain)
+- Session store uses synchronous I/O
+- No dynamic plugin discovery mechanism
+### What Worked Well
+- TypeScript strict mode caught several type errors during initial build
+- Event bus pattern cleanly decouples lifecycle from dashboard
+- Zod validation provides clear error messages for misconfigured YAML
+- Plugin interface design allows adding new providers without touching core

package/ROADMAP.md CHANGED

@@ -10,6 +10,30 @@ Completed milestones:
 - M2: GitHub integration (tracker + SCM + reaction engine)
 - M3: Multi-agent adapters (Claude Code, Aider, Codex)
 - M4: Dashboard MVP (REST API + HTML frontend)
+- M6: Polish & Documentation — CI/CD pipeline setup, ADRs, glossary, benchmarks, ROADMAP update
+## PRD v5 Fix Milestones (inserted before M5)
+### M22a — Critical Fixes (Target: Q3 2026)
+- FR-85: Correct sandboxing model (isolated-vm / restricted process, not worker_threads)
+- FR-86: Unified two-axis Approval Model (Task Origin Gate + Action Risk Gate)
+- FR-87: Canonical dashboard panel names (Approvals, Decision Log)
+- FR-88: Worktree Hierarchy Protocol (pinned commits, serialized merges, conflict escalation)
+- FR-89: Split maxTaskGraphDepth / maxAgentSpawnDepth config
+### M22b — Major Fixes (Target: Q3 2026)
+- FR-90: Config schema migration tooling (speexor config validate)
+- FR-91: Decision Quality Eval Harness + calibration reporting
+- FR-92: Explicit rollback trigger conditions
+- FR-93: Budget guard enforcement (riskPolicy.budgetLimitUSD)
+- FR-94: SECURITY-DEFAULTS.md consolidated reference
+- FR-95: Provider error retry/backoff/fallback contract
+- FR-96: Clarified maxAfkDurationHours pause semantics
+### M22c — Documentation Consolidation (Target: Q3 2026)
+- SECURITY-DEFAULTS.md update with two-layer model
+- Glossary additions from v5 §6
+- GLOSSARY.md maintenance pass
 ## Upcoming Milestones
@@ -19,18 +43,14 @@ Completed milestones:
 - Usage analytics dashboard panel
 - Provider fallback configuration UI
-### M6 — Polish & Documentation (Target: Q3 2026)
-- Comprehensive test suite (Vitest + integration tests)
-- CI/CD pipeline setup
-- Open-source readiness review
-- Performance benchmarking
 ### M7 — Recursive Task Decomposition (Target: Q4 2026)
 - DAG-based task breakdown
+- LLM-based planner integration
 - Parallel sub-agent spawning
 - Task dependency resolution
 - Progress tracking across sub-tasks
 - Context window management
+- Worktree Hierarchy Protocol implementation
 ### M8 — Real-Time Terminal (Target: Q4 2026)
 - WebSocket-based live terminal streaming
@@ -39,11 +59,12 @@ Completed milestones:
 - Session recording and replay
 ### M9 — Extension Marketplace (Target: Q1 2027)
-- Extension manifest format
+- Extension manifest format (speexor.extension.json)
 - Plugin SDK (@speexor/sdk)
 - Marketplace index (GitHub-based registry)
 - Extension install/update/remove lifecycle
 - Permission system for extensions
+- Skill manifest as superset-compatible wrapper over ECC/affaan-m folders (FR-97)
 ### M10 — Advanced Security & Compliance (Target: Q1 2027)
 - End-to-end encryption for cross-device sync
@@ -61,18 +82,19 @@ Completed milestones:
 ## Themes
 ### 🎯 Short-term (Q3 2026)
-- Polish existing features
+- PRD v5 critical/major fix implementation (M22a, M22b)
+- Documentation consolidation (M22c)
+- Cost tracking (M5)
 - Test coverage >80%
-- Documentation completion
-- Cost tracking
+- CI/CD pipeline active
 ### 🚀 Medium-term (Q4 2026)
-- Recursive task decomposition
-- Real-time terminal streaming
+- Recursive task decomposition (M7)
+- Real-time terminal streaming (M8)
 - Advanced observability
 ### 🌟 Long-term (2027)
-- Extension marketplace
+- Extension marketplace (M9)
 - Plugin SDK for community
-- Multi-host distributed execution
-- Enterprise security features
+- Advanced security & compliance (M10)
+- Multi-host distributed execution (M11)

package/SECURITY-DEFAULTS.md ADDED

@@ -0,0 +1,118 @@
+# Security & Safety Defaults
+> Reference document for Speexor's two-layer security and safety model.
+> Created per FR-94 (PRD05 §4.5) to clarify the distinction between
+> Extension Permissions and Action Risk Tiers.
+## Overview
+Speexor has **two independent safety layers** that operate at different levels:
+1. **Extension Permissions** — gates what an extension *can ever do* (set once at install time)
+2. **Action Risk Tiers** — gates what *any* action (from any already-permitted extension or core agent) *does right now* (evaluated every time)
+These are intentionally separate systems. An extension that passes its permission check still has every runtime action evaluated against the Action Risk Tier policy.
+---
+## Layer 1: Extension Permissions
+### What it controls
+Capabilities granted to third-party extensions at install time.
+### Configuration
+`speexor.config.yaml` → `extensions.permissionsMode`
+| Mode | Behavior |
+|------|----------|
+| `strict` (default) | Extensions run in sandboxed process with enforced permission boundaries |
+| `relaxed` | Permissions still require user confirmation, but sandboxing is less restrictive |
+### Permission Axes
+| Axis | Levels | Default | Description |
+|------|--------|---------|-------------|
+| `fileSystem` | `none`, `read-only`, `read-write`, `scoped`, `full` | `none` | Access to the local file system |
+| `network` | `none`, `read-only`, `scoped`, `full` | `none` | Outbound network access |
+| `shell` | `none`, `read-only`, `scoped`, `full` | `none` | Shell command execution |
+| `secrets` | string[] of named scopes | `[]` | Access to named secrets in the Vault |
+### Permission Lifecycle
+1. **Install**: Extension manifest declares permissions → displayed to user in plain English
+2. **Confirm**: User explicitly accepts or rejects the permission set
+3. **Upgrade**: Any permission upgrade on update requires re-confirmation
+4. **Runtime**: PermissionEnforcer checks every operation against declared permissions
+### Enforcement
+- Extensions with `shell: none` + `network: none` → `IsolatedSandbox` (node:vm, no Node built-ins)
+- Extensions requiring broader access → `ProcessSandbox` (child_process with proxy)
+- `worker_threads` is NEVER used as a security boundary (it shares process memory)
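The sandbox selection rule in the Enforcement list can be written as a single predicate over the permission axes. This is a sketch under assumptions: the type names and `chooseSandbox` are hypothetical, the permission levels are copied from the Permission Axes table, and because the document does not say whether `fileSystem` or `secrets` influence the choice, the sketch ignores them.

```typescript
// Illustrative sketch only: type and function names are assumptions. The
// document names the two sandboxes but not the code that selects between them.

type AccessLevel = "none" | "read-only" | "scoped" | "full";
type FileSystemLevel = "none" | "read-only" | "read-write" | "scoped" | "full";

interface ExtensionPermissions {
  fileSystem: FileSystemLevel;
  network: AccessLevel;
  shell: AccessLevel;
  secrets: string[]; // named Vault scopes
}

type SandboxKind = "IsolatedSandbox" | "ProcessSandbox";

// Least-privilege defaults, as listed in the Permission Axes table.
const DEFAULT_PERMISSIONS: ExtensionPermissions = {
  fileSystem: "none",
  network: "none",
  shell: "none",
  secrets: [],
};

/** Extensions with no shell and no network access get the isolated sandbox. */
function chooseSandbox(permissions: ExtensionPermissions): SandboxKind {
  if (permissions.shell === "none" && permissions.network === "none") {
    return "IsolatedSandbox";
  }
  return "ProcessSandbox";
}

console.log(chooseSandbox(DEFAULT_PERMISSIONS)); // IsolatedSandbox
```

With every axis defaulting to `none`, an extension that declares nothing lands in the more restrictive sandbox, which matches the least-privilege rationale given in the Safety Defaults Summary.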
+---
+## Layer 2: Action Risk Tiers
+### What it controls
+Whether a specific runtime action (from any source) requires user approval before execution.
+### Configuration
+`speexor.config.yaml` → `riskPolicy`
+| Setting | Default | Description |
+|---------|---------|-------------|
+| `autoApprove` | `[]` | Action categories that auto-execute without approval |
+| `requireApproval` | `["irreversible-high-stakes"]` | Action categories that always require approval |
+| `approvalTimeout` | `4h` | How long an approval request waits before default action |
+| `approvalDefaultAction` | `skip` | What happens when approval times out |
+| `defaultRiskTierForUnknownActions` | `medium` | Default tier when action risk can't be classified |
+### Risk Tiers
+| Tier | Auto-Approve? | Examples |
+|------|---------------|----------|
+| `reversible-low` | Yes (default) | Running tests, linting, reading files |
+| `reversible-medium` | Yes (default) | Creating branches, committing code |
+| `irreversible-high-stakes` | No (requires approval) | Merging PRs, deleting branches, publishing packages, modifying CI config |
+| `unknown` | Depends on `defaultRiskTierForUnknownActions` | Actions not classified by the risk classifier |
+### Approval Workflow
+1. Action is evaluated by GovernanceEngine.evaluateAction()
+2. If requires approval → ApprovalItem created with expiry + default action
+3. Appears in dashboard "Approvals" panel (tagged as Axis 2)
+4. User approves/rejects or timeout triggers default action
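The four workflow steps above can be sketched as two small functions: one that decides whether an approval item is needed, and one that resolves it. Everything here is an assumption layered on the documented defaults: the `Sketch` names are invented, `"execute"` is a guessed alternative to the documented `skip` default action, and the default `medium` for unknown actions is assumed to mean the `reversible-medium` tier.

```typescript
// Illustrative sketch only. Names ending in "Sketch" and the "execute" value
// are assumptions; the source documents only the default action `skip`.

type RiskTier = "reversible-low" | "reversible-medium" | "irreversible-high-stakes";
type TimeoutAction = "skip" | "execute";

interface RiskPolicySketch {
  requireApproval: RiskTier[];
  approvalTimeoutMs: number;
  approvalDefaultAction: TimeoutAction;
  // Assumes the documented default `medium` refers to the reversible-medium tier.
  defaultRiskTierForUnknownActions: RiskTier;
}

interface ApprovalItemSketch {
  action: string;
  tier: RiskTier;
  expiresAt: number; // epoch milliseconds
  defaultAction: TimeoutAction;
}

const DEFAULT_POLICY: RiskPolicySketch = {
  requireApproval: ["irreversible-high-stakes"],
  approvalTimeoutMs: 4 * 60 * 60 * 1000, // documented default: 4h
  approvalDefaultAction: "skip",
  defaultRiskTierForUnknownActions: "reversible-medium",
};

/** Steps 1-2: returns null when the action may run at once, else an approval item. */
function evaluateActionSketch(
  action: string,
  tier: RiskTier | "unknown",
  policy: RiskPolicySketch,
  now: number
): ApprovalItemSketch | null {
  const effective: RiskTier = tier === "unknown" ? policy.defaultRiskTierForUnknownActions : tier;
  if (policy.requireApproval.indexOf(effective) === -1) {
    return null; // tier is not gated, so no approval item is created
  }
  return {
    action: action,
    tier: effective,
    expiresAt: now + policy.approvalTimeoutMs,
    defaultAction: policy.approvalDefaultAction,
  };
}

/** Step 4: a user decision wins; otherwise the default action applies at expiry. */
function resolveApproval(
  item: ApprovalItemSketch,
  decision: "approve" | "reject" | null,
  now: number
): "execute" | "skip" | "pending" {
  if (decision === "approve") {
    return "execute";
  }
  if (decision === "reject") {
    return "skip";
  }
  return now >= item.expiresAt ? item.defaultAction : "pending";
}
```

With the defaults, an unanswered request to merge a PR resolves to `skip` once four hours have passed, which is the fail-safe behaviour described in the Safety Defaults Summary.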
+---
+## Comparison: When Each Layer Applies
+| Scenario | Extension Permissions | Action Risk Tier |
+|----------|----------------------|------------------|
+| Installing a code-review skill | ✅ Checked (install time) | ❌ Not applicable |
+| Skill reads a file in workspace | ✅ Checked (fileSystem permission) | ✅ Evaluated, auto-approved (reversible-low) |
+| Skill runs `git push` | ✅ Checked (shell permission) | ✅ Checked (irreversible-high-stakes) |
+| Skill makes HTTP call | ✅ Checked (network permission) | ✅ Evaluated, auto-approved (reversible-medium) |
+| Core agent creates a PR | ❌ Not applicable (core, not extension) | ✅ Checked (irreversible-high-stakes) |
+| Core agent proposes a new task | ❌ Not applicable | ✅ Checked (task-origin-gate, Axis 1) |
+---
+## Safety Defaults Summary
+| Default | Value | Rationale |
+|---------|-------|-----------|
+| Extension permissions default | All `none` | Least privilege by default |
+| Permissions mode | `strict` | Sandbox everything third-party |
+| Unknown risk tier | `medium` | Conservative when can't classify |
+| Approval timeout default action | `skip` | Fail safe, don't auto-execute |
+| Budget limit | Not set (opt-in) | User must explicitly set cost controls |
+| Auto-merge PRs | `false` | Human judgment required for merges |
+---
+## Related Documentation
+- [SECURITY.md](SECURITY.md) — Vulnerability reporting, incident response
+- [ARCHITECTURE.md](ARCHITECTURE.md) — System architecture and plugin model
+- [CONTRIBUTING.md](CONTRIBUTING.md) — Extension development guidelines
+- `src/sandbox/` — Sandbox implementation
+- `src/governance/` — Governance engine implementation