speexor 0.1.0 → 0.2.0

Files changed (46)
  1. package/API-REFERENCE.md +96 -1
  2. package/ARCHITECTURE.md +84 -33
  3. package/BENCHMARKS.md +52 -0
  4. package/CHANGELOG.md +35 -4
  5. package/CODE-OF-CONDUCT.md +83 -83
  6. package/CONTRIBUTING.md +98 -98
  7. package/FAQ.md +105 -105
  8. package/GLOSSARY.md +33 -0
  9. package/LICENSE.md +21 -21
  10. package/PUBLISH.md +77 -77
  11. package/README.md +222 -8
  12. package/REFACTOR-LOG.md +40 -40
  13. package/ROADMAP.md +37 -15
  14. package/SECURITY-DEFAULTS.md +118 -0
  15. package/SECURITY.md +79 -79
  16. package/SUMMARY.md +31 -8
  17. package/TESTING.md +140 -140
  18. package/dist/{agent-5D3BVWNK.js → agent-D4BRWEOZ.js} +4 -4
  19. package/dist/agent-D4BRWEOZ.js.map +1 -0
  20. package/dist/{chunk-2F66BZYJ.js → chunk-2DX54KIM.js} +2 -2
  21. package/dist/chunk-2DX54KIM.js.map +1 -0
  22. package/dist/{chunk-B7WLHC4W.js → chunk-7VZHDGRQ.js} +2 -2
  23. package/dist/chunk-7VZHDGRQ.js.map +1 -0
  24. package/dist/{chunk-SXALZEOJ.js → chunk-AOFWQZWY.js} +2 -2
  25. package/dist/chunk-AOFWQZWY.js.map +1 -0
  26. package/dist/cli/index.js +4 -4
  27. package/dist/cli/index.js.map +1 -1
  28. package/dist/core/index.js +1 -1
  29. package/dist/index.js +3 -3
  30. package/dist/index.js.map +1 -1
  31. package/dist/plugins/index.js +1 -1
  32. package/docs/SETUP.md +94 -94
  33. package/docs/TROUBLESHOOTING.md +113 -113
  34. package/docs/adr/0001-record-architecture-decisions.md +44 -0
  35. package/docs/adr/0002-plugin-architecture.md +53 -0
  36. package/docs/adr/0003-recursive-task-decomposition.md +57 -0
  37. package/docs/adr/0004-local-first-security.md +58 -0
  38. package/docs/adr/0005-data-directory-layout.md +69 -0
  39. package/examples/basic.yaml +61 -61
  40. package/package.json +103 -102
  41. package/schema/config.schema.json +119 -119
  42. package/speexor.config.yaml.example +30 -30
  43. package/dist/agent-5D3BVWNK.js.map +0 -1
  44. package/dist/chunk-2F66BZYJ.js.map +0 -1
  45. package/dist/chunk-B7WLHC4W.js.map +0 -1
  46. package/dist/chunk-SXALZEOJ.js.map +0 -1
package/README.md CHANGED
@@ -1,11 +1,13 @@
1
- # @speexjs/speexor
1
+ # Speexor
2
2
 
3
3
  **Agent Orchestrator for multi-AI coding agents across repositories**
4
4
 
5
- [![npm version](https://img.shields.io/npm/v/@speexjs/speexor.svg)](https://www.npmjs.com/package/@speexjs/speexor)
5
+ [![npm version](https://img.shields.io/npm/v/speexor.svg)](https://www.npmjs.com/package/speexor)
6
6
  [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
7
7
  [![Node](https://img.shields.io/badge/node-%3E%3D18.0.0-brightgreen)](https://nodejs.org)
8
8
  [![TypeScript](https://img.shields.io/badge/%3C/%3E-TypeScript-3178C6)](https://www.typescriptlang.org/)
9
+ [![CI](https://github.com/superdevids/speexor/actions/workflows/ci.yml/badge.svg)](https://github.com/superdevids/speexor/actions/workflows/ci.yml)
10
+ [![Coverage](https://img.shields.io/badge/coverage-~90%25-brightgreen)](https://github.com/superdevids/speexor)
9
11
 
10
12
  ---
11
13
 
@@ -15,9 +17,9 @@ Speexor is an **agent orchestrator** purpose-built for teams and solo developers
15
17
 
16
18
  Coordinating multiple coding agents manually is error-prone and slow. Speexor solves this with a **7-plugin slot architecture** that separates agent adapters, runtime backends, workspace management, issue tracking, SCM operations, notifications, and terminal I/O into clean, swappable modules. Git worktree isolation keeps each agent session on its own branch without polluting your working tree.
17
19
 
18
- Speexor ships with a built-in **HTTP dashboard**, a **reaction engine** for automated CI-fix loops, JSON-file session persistence, and two runtime backends (`tmux` on Unix and `Process` on Windows), so it works wherever you do.
20
+ Speexor ships with a built-in **interactive dashboard**, a **reaction engine** for automated CI-fix loops, **recursive task decomposition** with DAG-based scheduling, a **governance engine** for risk-aware approvals, a **secrets vault** for secure credential storage, and an **extension marketplace** for community plugins.
19
21
 
20
- > **Status:** v0.1.0 — Early release. The plugin API and core commands are stable; breaking changes are possible before 1.0.
22
+ > **Status:** v0.2.0 — Feature release. Core commands, task decomposition, governance, extensions, and dashboard v2 are considered stable and are covered by 320+ tests. Breaking changes are possible before 1.0.
21
23
 
22
24
  ---
23
25
 
@@ -25,13 +27,28 @@ Speexor ships with a built-in **HTTP dashboard**, a **reaction engine** for auto
25
27
 
26
28
  - **Parallel Agent Dispatch** — Spawn multiple coding agents (OpenCode, Claude Code, Aider, Codex) across different repositories simultaneously
27
29
  - **Multi-Provider Routing** — Configure primary and fallback AI providers per project with concurrent-limit controls
28
- - **Git Worktree Isolation** — Every agent task operates on its own isolated `git worktree` branch — zero interference with your main workspace
30
+ - **Recursive Task Decomposition** — LLM-driven planner breaks high-level tasks into a DAG of sub-tasks with automatic dependency resolution
31
+ - **Parallel Agent Scheduler** — Execute decomposed tasks in parallel with resource-aware throttling and auto-retry
32
+ - **Governance Engine** — Two-axis approval model (Task Origin Gate + Action Risk Gate) with risk-based action classification
33
+ - **Cost Tracking** — Per-provider, per-project, per-task-node cost tracking with budget guard enforcement
34
+ - **Secrets Vault** — OS-native secure credential storage (macOS Keychain, Windows Credential Manager, Linux encrypted file)
35
+ - **Extension Marketplace** — Search, install, list, update, and remove extensions via `speexor ext` commands
36
+ - **Plugin SDK** — `@speexor/sdk` with extension scaffold generator and TypeScript types
37
+ - **Decision Quality Eval** — Confidence calibration reporting, decision logging, and auto-labeling
38
+ - **Worktree Hierarchy Protocol** — Pin-to-commit-hash forking, serialized subagent merges, and conflict escalation
39
+ - **Interactive Dashboard v2** — Task Tree View, Agent Fleet, Approvals Inbox, Cost Panel, Decision Log, Simple Mode, WCAG accessibility
40
+ - **Session Guard** — AFK auto-pause, budget enforcement, idle timeout
41
+ - **Provider Error Retry** — Exponential backoff on provider failures with automatic fallback
29
42
  - **Reaction Engine** — Automatically detect CI failures, PR review changes, and approvals; trigger fix, notify, or escalate actions with configurable retries
30
- - **Built-in HTTP Dashboard** — Monitor sessions, worktrees, and agent status in real time (Node.js built-in `http` module, no external dependencies)
43
+ - **Git Worktree Isolation** — Every agent task operates on its own isolated `git worktree` branch, so there is no interference with your main workspace
31
44
  - **7-Plugin Slot Architecture** — Swap or extend every layer: agent adapters, runtimes, workspace, tracker, SCM, notifier, and terminal
32
45
  - **Cross-Platform Runtimes** — `tmux` for Unix, `Process` for Windows — pick the backend that matches your OS
33
46
  - **JSON-File Persistence** — Session state, worktrees, and runtimes persist to `.speexor/state.json` automatically
34
47
  - **Zod-Validated Config** — YAML configuration validated at load with clear error messages for every field
48
+ - **19 Test Files / ~320 Tests** — Comprehensive Vitest coverage across core, CLI, plugins, and scheduler
49
+ - **CI/CD Pipeline** — GitHub Actions for test, lint, build, and publish
50
+ - **Architecture Decision Records** — 5 ADRs in `docs/adr/` documenting key design choices
51
+ - **Glossary** — Canonical terminology reference in `GLOSSARY.md`
35
52
 
36
53
  ---
37
54
 
@@ -39,7 +56,7 @@ Speexor ships with a built-in **HTTP dashboard**, a **reaction engine** for auto
39
56
 
40
57
  ```bash
41
58
  # 1. Install globally
42
- npm install -g @speexjs/speexor
59
+ npm install -g speexor
43
60
 
44
61
  # 2. Initialize a project (creates speexor.config.yaml + starts dashboard)
45
62
  speexor start https://github.com/your-org/your-repo --port 3000
@@ -106,10 +123,158 @@ projects:
106
123
  concurrentLimit: 2
107
124
  ```
108
125
 
126
+ Validate your configuration against the schema and run any pending migrations:
127
+
128
+ ```bash
129
+ speexor config validate
130
+ ```
131
+
109
132
  See the full schema with `speexor config-help`.
110
133
 
111
134
  ---
112
135
 
136
+ ## Recursive Task Decomposition & Scheduling
137
+
138
+ Speexor's task system transforms high-level descriptions into an executable DAG of sub-tasks:
139
+
140
+ ### LLM-Driven Planner
141
+ - Submit a task with `speexor task submit "Implement user authentication"`
142
+ - The LLM planner decomposes it into a directed acyclic graph of atomic sub-tasks
143
+ - Dependencies are automatically resolved: a sub-task only executes after all its predecessors succeed
144
+
145
+ ### Parallel Agent Scheduler
146
+ - The scheduler executes independent sub-tasks in parallel across the configured agent fleet
147
+ - **Resource-aware throttling** — respects per-project `concurrentLimit` and system load
148
+ - **Auto-retry** — failed sub-tasks are retried up to the configured limit before escalation
149
+ - **Cost tracking per node** — every executed sub-task is attributed with provider, tokens, and duration
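
The dependency rule and the `concurrentLimit` throttle described above can be sketched as a batch planner. This is a minimal illustration, not Speexor's scheduler: the `SubTask` shape and the `planBatches` function are hypothetical names, and a real scheduler would start new work as soon as a slot frees up rather than waiting for a whole batch.

```typescript
// Hypothetical shape of a decomposed sub-task (not Speexor's actual type).
interface SubTask {
  id: string;
  dependsOn: string[];
}

// Group sub-tasks into batches. Every task in a batch has all of its
// predecessors in earlier batches, and no batch exceeds concurrentLimit.
function planBatches(tasks: SubTask[], concurrentLimit: number): string[][] {
  if (concurrentLimit < 1) throw new Error("concurrentLimit must be at least 1");
  const known = tasks.map((t) => t.id);
  for (const t of tasks) {
    for (const dep of t.dependsOn) {
      if (known.indexOf(dep) === -1) throw new Error("unknown dependency: " + dep);
    }
  }

  const finished: string[] = [];
  let pending = tasks.slice();
  const planned: string[][] = [];
  while (pending.length > 0) {
    // A sub-task is ready once every predecessor has finished.
    const ready = pending.filter((t) =>
      t.dependsOn.every((dep) => finished.indexOf(dep) !== -1),
    );
    if (ready.length === 0) throw new Error("cycle detected: the graph is not a DAG");
    const batch = ready.slice(0, concurrentLimit).map((t) => t.id);
    planned.push(batch);
    for (const id of batch) finished.push(id);
    pending = pending.filter((t) => batch.indexOf(t.id) === -1);
  }
  return planned;
}

const demoBatches = planBatches(
  [
    { id: "schema", dependsOn: [] },
    { id: "routes", dependsOn: ["schema"] },
    { id: "ui", dependsOn: [] },
    { id: "tests", dependsOn: ["routes", "ui"] },
  ],
  2,
);
console.log(demoBatches); // [["schema", "ui"], ["routes"], ["tests"]]
```

Planning in batches keeps the sketch synchronous and easy to check by hand, and a cycle shows up as a round in which nothing is ready.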
150
+
151
+ ### Worktree Hierarchy Protocol
152
+ - Each decomposed sub-task forks from its parent's commit hash via `git worktree`
153
+ - Completed sub-tasks merge back serially, respecting the dependency chain
154
+ - Merge conflicts are detected and escalated for resolution
155
+
156
+ ```bash
157
+ # Submit a high-level task for decomposition
158
+ speexor task submit "Implement OAuth2 login flow"
159
+ ```
160
+
161
+ ---
162
+
163
+ ## Governance & Security
164
+
165
+ ### Governance Engine
166
+ Speexor implements a two-axis approval model:
167
+
168
+ | Axis | Gate | What It Checks |
169
+ |------|------|----------------|
170
+ | **Task Origin Gate** | Who submitted? | Validates the user, session, or CI trigger against configured policies |
171
+ | **Action Risk Gate** | What will it do? | Classifies every action by risk (low / medium / high / critical) based on file scope, provider, and command |
172
+
173
+ Actions are automatically classified:
174
+ - **Low** — read-only, documentation, lint fixes — auto-approved
175
+ - **Medium** — isolated file edits — requires single approval
176
+ - **High** — cross-module refactors, dependency changes — requires peer approval
177
+ - **Critical** — production config, secrets, destructive operations — requires multi-party approval
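
The classification rules above can be read as a small decision function. The sketch below is only an illustration under stated assumptions: the `ProposedAction` fields and the `classifyRisk` function are hypothetical, and the README says the real classifier also considers file scope, provider, and command.

```typescript
// Risk levels as named in the README table.
type RiskLevel = "low" | "medium" | "high" | "critical";

// Hypothetical description of an action; these fields are assumptions.
interface ProposedAction {
  readOnly: boolean;
  modulesTouched: number;
  changesDependencies: boolean;
  touchesSecrets: boolean;
  destructive: boolean;
}

// Check the most severe conditions first so they always win.
function classifyRisk(action: ProposedAction): RiskLevel {
  if (action.touchesSecrets || action.destructive) return "critical";
  if (action.changesDependencies || action.modulesTouched > 1) return "high";
  if (action.readOnly) return "low";
  return "medium";
}

// Approval requirement per level, taken from the list above.
const requiredApproval: Record<RiskLevel, string> = {
  low: "auto-approved",
  medium: "single approval",
  high: "peer approval",
  critical: "multi-party approval",
};

const dependencyBump: ProposedAction = {
  readOnly: false,
  modulesTouched: 1,
  changesDependencies: true,
  touchesSecrets: false,
  destructive: false,
};
console.log(requiredApproval[classifyRisk(dependencyBump)]); // "peer approval"
```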
178
+
179
+ ### Secrets Vault
180
+ Credentials are stored in the OS-native secure store:
181
+ - **macOS** — Keychain via `security` CLI
182
+ - **Windows** — Credential Manager via `cmdkey`
183
+ - **Linux** — AES-256-GCM encrypted file at `~/.speexor/vault.enc`
184
+
185
+ ### Session Guard
186
+ - **AFK auto-pause** — pauses agent sessions after a configurable period of keyboard inactivity
187
+ - **Budget enforcement** — hard-stops execution when per-session or per-project cost limits are reached
188
+ - **Idle timeout** — auto-terminates sessions that have no output for a configurable duration
189
+
190
+ ### Provider Error Retry
191
+ Provider failures (rate limits, timeouts, 5xx) are handled with:
192
+ - **Exponential backoff** — 1s, 2s, 4s, 8s, max 60s between retries
193
+ - **Automatic fallback** — switches to the configured fallback provider after exhausting retries
194
+ - Each retry and fallback event is logged for audit
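
The backoff schedule and fallback rule above could look roughly like this. It is a sketch, not the package's retry code: `backoffDelayMs`, `decideRetry`, and the `maxRetries` parameter are hypothetical names, and the README does not say whether jitter is applied, so none is shown.

```typescript
// Delay before retry number `attempt` (0-based): 1s, 2s, 4s, 8s, ... capped at 60s.
function backoffDelayMs(attempt: number, baseMs: number = 1000, maxMs: number = 60000): number {
  return Math.min(baseMs * Math.pow(2, attempt), maxMs);
}

type RetryDecision =
  | { kind: "retry"; provider: string; delayMs: number }
  | { kind: "fallback"; provider: string };

// Retry the primary provider until maxRetries is exhausted, then switch
// to the configured fallback provider.
function decideRetry(
  attempt: number,
  maxRetries: number,
  primary: string,
  fallback: string,
): RetryDecision {
  if (attempt < maxRetries) {
    return { kind: "retry", provider: primary, delayMs: backoffDelayMs(attempt) };
  }
  return { kind: "fallback", provider: fallback };
}

const backoffSchedule = [0, 1, 2, 3, 4, 5, 6, 7].map((n) => backoffDelayMs(n));
console.log(backoffSchedule); // [1000, 2000, 4000, 8000, 16000, 32000, 60000, 60000]
```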
195
+
196
+ ---
197
+
198
+ ## Extension Marketplace
199
+
200
+ Speexor supports community extensions published to npm under the `@speexor/ext-*` scope.
201
+
202
+ ```bash
203
+ # Search the marketplace
204
+ speexor ext search "gitlab"
205
+
206
+ # Install an extension
207
+ speexor ext install @speexor/ext-gitlab-tracker
208
+
209
+ # List installed extensions
210
+ speexor ext list
211
+
212
+ # Update all extensions
213
+ speexor ext update
214
+
215
+ # Remove an extension
216
+ speexor ext remove @speexor/ext-gitlab-tracker
217
+ ```
218
+
219
+ ### Plugin SDK (`@speexor/sdk`)
220
+
221
+ Build your own extensions:
222
+
223
+ ```bash
224
+ npm create @speexor/ext my-extension
225
+ cd my-extension
226
+ npm install
227
+ npm run build
228
+ npm publish
229
+ ```
230
+
231
+ The SDK provides:
232
+ - TypeScript types for all 7 plugin slots
233
+ - Extension scaffold generator (`npm create @speexor/ext`)
234
+ - Validation utilities matching the core Zod schemas
235
+ - Lifecycle hooks (`initialize`, `destroy`) for clean setup/teardown
236
+
237
+ ---
238
+
239
+ ## Interactive Dashboard
240
+
241
+ The built-in HTTP dashboard has been upgraded to **v2** with six panels:
242
+
243
+ | Panel | Description |
244
+ |-------|-------------|
245
+ | **Task Tree View** | Visual DAG of decomposed tasks with status, dependencies, and execution order |
246
+ | **Agent Fleet** | Live view of all running agents, their sessions, resource usage, and logs |
247
+ | **Approvals Inbox** | Pending governance approvals with risk classification and context |
248
+ | **Cost Panel** | Real-time cost accumulation per provider, project, and task node |
249
+ | **Decision Log** | Searchable history of governance decisions with confidence calibration |
250
+ | **Simple Mode** | Reduced UI for read-only monitoring — hides actions and approvals |
251
+
252
+ Accessibility: The dashboard targets WCAG 2.1 AA compliance with keyboard navigation, ARIA labels, and high-contrast mode support.
253
+
254
+ ```bash
255
+ speexor start https://github.com/your-org/your-repo --port 3000
256
+ # Open http://localhost:3000
257
+ ```
258
+
259
+ ---
260
+
261
+ ## Decision Quality Evaluation
262
+
263
+ Speexor logs every governance decision and can evaluate decision quality:
264
+
265
+ ```bash
266
+ # Run decision quality evaluation
267
+ speexor eval decisions
268
+ ```
269
+
270
+ This produces a **confidence calibration report** showing:
271
+ - Accuracy of risk classification vs. actual outcomes
272
+ - Decision latency distribution
273
+ - Auto-labeling quality metrics
274
+ - Recommendations for policy tuning
275
+
276
+ ---
277
+
113
278
  ## Plugin Architecture
114
279
 
115
280
  Speexor defines **7 plugin slots**. Each slot can have multiple implementations registered at startup.
@@ -126,6 +291,8 @@ Speexor defines **7 plugin slots**. Each slot can have multiple implementations
126
291
 
127
292
  Plugins implement `PluginModule` with a standard lifecycle (`initialize` / `destroy`) and receive typed access to the config, event bus, and logger through `PluginContext`.
128
293
 
294
+ Extensions published on the marketplace implement the same interface and are loaded dynamically at startup.
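
To make the lifecycle concrete, here is a hedged sketch of a plugin that follows the `initialize` / `destroy` pattern. Only the names `PluginModule` and `PluginContext` come from the README. Every field and method signature below is an assumption, and the real hooks may well be asynchronous.

```typescript
// Assumed context shape: the README only says it exposes config, event bus, and logger.
interface PluginContext {
  config: Record<string, unknown>;
  events: { emit(eventName: string, payload: unknown): void };
  logger: { info(message: string): void };
}

// Assumed module shape with the two lifecycle hooks named in the README.
interface PluginModule {
  slot: string;
  initialize(ctx: PluginContext): void;
  destroy(): void;
}

// Hypothetical notifier plugin that forwards messages to the event bus.
class EventBusNotifier implements PluginModule {
  slot = "notifier";
  private ctx: PluginContext | null = null;

  initialize(ctx: PluginContext): void {
    this.ctx = ctx;
    ctx.logger.info("notifier ready");
  }

  notify(message: string): void {
    if (!this.ctx) throw new Error("plugin used before initialize()");
    this.ctx.events.emit("notification", message);
  }

  destroy(): void {
    // Drop references so the plugin cannot be used after teardown.
    this.ctx = null;
  }
}

// In-memory context for the demo; a real host would supply its own.
const logLines: string[] = [];
const emittedEvents: string[] = [];
const demoContext: PluginContext = {
  config: {},
  events: { emit: (eventName, payload) => { emittedEvents.push(eventName + ":" + String(payload)); } },
  logger: { info: (message) => { logLines.push(message); } },
};

const demoNotifier = new EventBusNotifier();
demoNotifier.initialize(demoContext);
demoNotifier.notify("CI failed");
demoNotifier.destroy();
```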
295
+
129
296
  ---
130
297
 
131
298
  ## CLI Reference
@@ -138,13 +305,57 @@ speexor <command> [options]
138
305
  |---------|-------------|
139
306
  | `start [repo]` | Initialize a project config and start the HTTP dashboard. Options: `--port`, `--name`, `--no-dashboard` |
140
307
  | `agent spawn --task <id>` | Spawn an agent for a GitHub issue. Options: `--agent` (opencode, claude-code, aider, codex) |
308
+ | `task submit <description>` | Submit a high-level task description for LLM-based decomposition into a DAG of sub-tasks |
141
309
  | `list` | List all projects and active agent statuses |
142
310
  | `stop <session-id>` | Stop an agent session safely |
143
311
  | `logs <session-id>` | Tail logs for an agent session. Options: `--follow`, `--lines` |
312
+ | `ext search <query>` | Search the extension marketplace for available extensions |
313
+ | `ext install <package>` | Install an extension from the marketplace |
314
+ | `ext list` | List all installed extensions |
315
+ | `ext update [package]` | Update all extensions or a specific one |
316
+ | `ext remove <package>` | Uninstall an extension |
317
+ | `config validate` | Validate configuration against the schema and run pending migrations |
318
+ | `eval decisions` | Run decision quality evaluation with confidence calibration report |
144
319
  | `config-help` | Print the full YAML config schema reference |
145
320
 
146
321
  ---
147
322
 
323
+ ## Testing & CI/CD
324
+
325
+ ### Test Suite
326
+
327
+ Speexor includes **19 test files** containing **~320 tests** across the following domains:
328
+
329
+ | Area | Files | Focus |
330
+ |------|-------|-------|
331
+ | Core | 6 | State management, config validation, event bus, session lifecycle |
332
+ | CLI | 4 | Command parsing, argument validation, output formatting |
333
+ | Plugins | 4 | Agent adapters, runtime backends, workspace isolation |
334
+ | Scheduler | 3 | DAG execution, parallelism, throttling, retry logic |
335
+ | Governance | 2 | Risk classification, approval workflow, budget enforcement |
336
+
337
+ Run the tests:
338
+
339
+ ```bash
340
+ npm test # Run all tests
341
+ npm run test:watch # Watch mode
342
+ npm run test:coverage # Coverage report (~90%)
343
+ ```
344
+
345
+ ### CI/CD Pipeline
346
+
347
+ GitHub Actions (`.github/workflows/ci.yml`) runs on every push and pull request:
348
+
349
+ | Stage | What It Does |
350
+ |-------|-------------|
351
+ | **Lint** | Biome check on `src/` |
352
+ | **Typecheck** | `tsc --noEmit` full project type validation |
353
+ | **Test** | Vitest with ~320 tests across all modules |
354
+ | **Build** | `tsup` production build |
355
+ | **Publish** | npm publish on tagged releases (`v*`) |
356
+
357
+ ---
358
+
148
359
  ## Examples
149
360
 
150
361
  A complete multi-project configuration with reaction rules is available in [`examples/basic.yaml`](./examples/basic.yaml).
@@ -158,6 +369,8 @@ A complete multi-project configuration with reaction rules is available in [`exa
158
369
  - [Behavior Spec (PRD 03)](./docs/PRD03.md)
159
370
  - [Design Docs (PRD 04)](./docs/PRD04.md)
160
371
  - [Specification (PRD 05)](./docs/PRD05.md)
372
+ - [Architecture Decision Records](./docs/adr/) — 5 ADRs covering key design decisions
373
+ - [Glossary](./GLOSSARY.md) — Canonical terminology reference
161
374
  - [Setup Guide](./docs/SETUP.md)
162
375
  - [Troubleshooting](./docs/TROUBLESHOOTING.md)
163
376
 
@@ -170,7 +383,8 @@ Speexor was created because existing agent orchestration tools were either tied
170
383
  - **Agent-agnostic** — run OpenCode, Claude Code, Aider, or Codex with the same CLI
171
384
  - **Lightweight** — zero external runtime dependencies beyond your agents and git
172
385
  - **Repository-native** — works with your existing GitHub repos, issues, and PR workflows
173
- - **Extensible** — swap any plugin slot with your own implementation
386
+ - **Extensible** — swap any plugin slot with your own implementation or install community extensions
387
+ - **Governance-ready** — built-in approval gates, cost controls, and audit logging for production use
174
388
 
175
389
  ---
176
390
 
package/REFACTOR-LOG.md CHANGED
@@ -1,40 +1,40 @@
1
- # Refactoring Log
2
-
3
- > Tracking significant architectural changes and refactoring decisions for Speexor.
4
-
5
- ## [2026-06-30] — Initial Implementation (v0.1.0)
6
-
7
- ### Context
8
- Initial implementation of the Agent Orchestrator based on PRD01.md. Built from scratch as `packages/speexor` in the speexjs monorepo.
9
-
10
- ### Decisions
11
- 1. **Plugin Architecture**: 7 fixed plugin slots (agent, runtime, workspace, tracker, scm, notifier, terminal) — inspired by AgentWrapper AO but with cleaner interface segregation
12
- 2. **Monorepo as Single Package**: Unlike AO's multi-package structure, v1 lives in a single `@speexjs/speexor` package with internal module separation. This reduces initial complexity while maintaining clean module boundaries via `src/` directories.
13
- 3. **TypeScript ESM Only**: No CJS support. The rest of speexjs ecosystem is ESM-first.
14
- 4. **Built-in HTTP Dashboard**: Using Node.js built-in `http` module instead of Express/NestJS to keep dependencies minimal for v1.
15
- 5. **No Dynamic Plugin Loading**: All built-in plugins are hardcoded in `loadAllPlugins()`. Dynamic loading from external packages is deferred to v2.
16
- 6. **Synchronous File I/O for State**: `SessionStore` uses `readFileSync`/`writeFileSync` for simplicity. Will migrate to async when performance requires it.
17
-
18
- ### Rebrand: Konduktor → Speexor
19
- The project was initially codenamed "Konduktor" internally. After initial implementation, renamed to "Speexor" to align with the SpeexJS brand ecosystem. All code identifiers, CLI names, config files, and documentation were updated.
20
-
21
- ### Patterns Established
22
- - `PluginModule` base interface with `initialize()`/`destroy()` lifecycle
23
- - Debug namespace pattern: `speexor:<module>:<submodule>`
24
- - Session ID format: `<prefix>-<task-id>-<timestamp>`
25
- - Worktree branch format: `speexor/<task-id>`
26
- - Config directory: `.speexor/` at project root
27
-
28
- ### Technical Debt
29
- - No test files yet (vitest configured but no tests written)
30
- - `getLiveStream()` throws "not implemented" in both runtime plugins
31
- - Terminal plugin has no implementation
32
- - Dashboard HTML is inline (hard to maintain)
33
- - Session store uses synchronous I/O
34
- - No dynamic plugin discovery mechanism
35
-
36
- ### What Worked Well
37
- - TypeScript strict mode caught several type errors during initial build
38
- - Event bus pattern cleanly decouples lifecycle from dashboard
39
- - Zod validation provides clear error messages for misconfigured YAML
40
- - Plugin interface design allows adding new providers without touching core
1
+ # Refactoring Log
2
+
3
+ > Tracking significant architectural changes and refactoring decisions for Speexor.
4
+
5
+ ## [2026-06-30] — Initial Implementation (v0.1.0)
6
+
7
+ ### Context
8
+ Initial implementation of the Agent Orchestrator based on PRD01.md. Built from scratch as `packages/speexor` in the speexjs monorepo.
9
+
10
+ ### Decisions
11
+ 1. **Plugin Architecture**: 7 fixed plugin slots (agent, runtime, workspace, tracker, scm, notifier, terminal) — inspired by AgentWrapper AO but with cleaner interface segregation
12
+ 2. **Monorepo as Single Package**: Unlike AO's multi-package structure, v1 lives in a single `speexor` package with internal module separation. This reduces initial complexity while maintaining clean module boundaries via `src/` directories.
13
+ 3. **TypeScript ESM Only**: No CJS support. The rest of speexjs ecosystem is ESM-first.
14
+ 4. **Built-in HTTP Dashboard**: Using Node.js built-in `http` module instead of Express/NestJS to keep dependencies minimal for v1.
15
+ 5. **No Dynamic Plugin Loading**: All built-in plugins are hardcoded in `loadAllPlugins()`. Dynamic loading from external packages is deferred to v2.
16
+ 6. **Synchronous File I/O for State**: `SessionStore` uses `readFileSync`/`writeFileSync` for simplicity. Will migrate to async when performance requires it.
17
+
18
+ ### Rebrand: Konduktor → Speexor
19
+ The project was initially codenamed "Konduktor" internally. After initial implementation, renamed to "Speexor" to align with the SpeexJS brand ecosystem. All code identifiers, CLI names, config files, and documentation were updated.
20
+
21
+ ### Patterns Established
22
+ - `PluginModule` base interface with `initialize()`/`destroy()` lifecycle
23
+ - Debug namespace pattern: `speexor:<module>:<submodule>`
24
+ - Session ID format: `<prefix>-<task-id>-<timestamp>`
25
+ - Worktree branch format: `speexor/<task-id>`
26
+ - Config directory: `.speexor/` at project root
27
+
28
+ ### Technical Debt
29
+ - No test files yet (vitest configured but no tests written)
30
+ - `getLiveStream()` throws "not implemented" in both runtime plugins
31
+ - Terminal plugin has no implementation
32
+ - Dashboard HTML is inline (hard to maintain)
33
+ - Session store uses synchronous I/O
34
+ - No dynamic plugin discovery mechanism
35
+
36
+ ### What Worked Well
37
+ - TypeScript strict mode caught several type errors during initial build
38
+ - Event bus pattern cleanly decouples lifecycle from dashboard
39
+ - Zod validation provides clear error messages for misconfigured YAML
40
+ - Plugin interface design allows adding new providers without touching core
package/ROADMAP.md CHANGED
@@ -10,6 +10,30 @@ Completed milestones:
10
10
  - M2: GitHub integration (tracker + SCM + reaction engine)
11
11
  - M3: Multi-agent adapters (Claude Code, Aider, Codex)
12
12
  - M4: Dashboard MVP (REST API + HTML frontend)
13
+ - M6: Polish & Documentation — CI/CD pipeline setup, ADRs, glossary, benchmarks, ROADMAP update
14
+
15
+ ## PRD v5 Fix Milestones (inserted before M5)
16
+
17
+ ### M22a — Critical Fixes (Target: Q3 2026)
18
+ - FR-85: Correct sandboxing model (isolated-vm / restricted process, not worker_threads)
19
+ - FR-86: Unified two-axis Approval Model (Task Origin Gate + Action Risk Gate)
20
+ - FR-87: Canonical dashboard panel names (Approvals, Decision Log)
21
+ - FR-88: Worktree Hierarchy Protocol (pinned commits, serialized merges, conflict escalation)
22
+ - FR-89: Split maxTaskGraphDepth / maxAgentSpawnDepth config
23
+
24
+ ### M22b — Major Fixes (Target: Q3 2026)
25
+ - FR-90: Config schema migration tooling (speexor config validate)
26
+ - FR-91: Decision Quality Eval Harness + calibration reporting
27
+ - FR-92: Explicit rollback trigger conditions
28
+ - FR-93: Budget guard enforcement (riskPolicy.budgetLimitUSD)
29
+ - FR-94: SECURITY-DEFAULTS.md consolidated reference
30
+ - FR-95: Provider error retry/backoff/fallback contract
31
+ - FR-96: Clarified maxAfkDurationHours pause semantics
32
+
33
+ ### M22c — Documentation Consolidation (Target: Q3 2026)
34
+ - SECURITY-DEFAULTS.md update with two-layer model
35
+ - Glossary additions from v5 §6
36
+ - GLOSSARY.md maintenance pass
13
37
 
14
38
  ## Upcoming Milestones
15
39
 
@@ -19,18 +43,14 @@ Completed milestones:
19
43
  - Usage analytics dashboard panel
20
44
  - Provider fallback configuration UI
21
45
 
22
- ### M6 — Polish & Documentation (Target: Q3 2026)
23
- - Comprehensive test suite (Vitest + integration tests)
24
- - CI/CD pipeline setup
25
- - Open-source readiness review
26
- - Performance benchmarking
27
-
28
46
  ### M7 — Recursive Task Decomposition (Target: Q4 2026)
29
47
  - DAG-based task breakdown
48
+ - LLM-based planner integration
30
49
  - Parallel sub-agent spawning
31
50
  - Task dependency resolution
32
51
  - Progress tracking across sub-tasks
33
52
  - Context window management
53
+ - Worktree Hierarchy Protocol implementation
34
54
 
35
55
  ### M8 — Real-Time Terminal (Target: Q4 2026)
36
56
  - WebSocket-based live terminal streaming
@@ -39,11 +59,12 @@ Completed milestones:
39
59
  - Session recording and replay
40
60
 
41
61
  ### M9 — Extension Marketplace (Target: Q1 2027)
42
- - Extension manifest format
62
+ - Extension manifest format (speexor.extension.json)
43
63
  - Plugin SDK (@speexor/sdk)
44
64
  - Marketplace index (GitHub-based registry)
45
65
  - Extension install/update/remove lifecycle
46
66
  - Permission system for extensions
67
+ - Skill manifest as superset-compatible wrapper over ECC/affaan-m folders (FR-97)
47
68
 
48
69
  ### M10 — Advanced Security & Compliance (Target: Q1 2027)
49
70
  - End-to-end encryption for cross-device sync
@@ -61,18 +82,19 @@ Completed milestones:
61
82
  ## Themes
62
83
 
63
84
  ### 🎯 Short-term (Q3 2026)
64
- - Polish existing features
85
+ - PRD v5 critical/major fix implementation (M22a, M22b)
86
+ - Documentation consolidation (M22c)
87
+ - Cost tracking (M5)
65
88
  - Test coverage >80%
66
- - Documentation completion
67
- - Cost tracking
89
+ - CI/CD pipeline active
68
90
 
69
91
  ### 🚀 Medium-term (Q4 2026)
70
- - Recursive task decomposition
71
- - Real-time terminal streaming
92
+ - Recursive task decomposition (M7)
93
+ - Real-time terminal streaming (M8)
72
94
  - Advanced observability
73
95
 
74
96
  ### 🌟 Long-term (2027)
75
- - Extension marketplace
97
+ - Extension marketplace (M9)
76
98
  - Plugin SDK for community
77
- - Multi-host distributed execution
78
- - Enterprise security features
99
+ - Advanced security & compliance (M10)
100
+ - Multi-host distributed execution (M11)
package/SECURITY-DEFAULTS.md ADDED
@@ -0,0 +1,118 @@
1
+ # Security & Safety Defaults
2
+
3
+ > Reference document for Speexor's two-layer security and safety model.
4
+ > Created per FR-94 (PRD05 §4.5) to clarify the distinction between
5
+ > Extension Permissions and Action Risk Tiers.
6
+
7
+ ## Overview
8
+
9
+ Speexor has **two independent safety layers** that operate at different levels:
10
+
11
+ 1. **Extension Permissions** — gates what an extension *can ever do* (set once at install time)
12
+ 2. **Action Risk Tiers** — gates what *any* action (from any already-permitted extension or core agent) *does right now* (evaluated every time)
13
+
14
+ These are intentionally separate systems. An extension that passes its permission check still has every runtime action evaluated against the Action Risk Tier policy.
15
+
16
+ ---
17
+
18
+ ## Layer 1: Extension Permissions
19
+
20
+ ### What it controls
21
+ Capabilities granted to third-party extensions at install time.
22
+
23
+ ### Configuration
24
+ `speexor.config.yaml` → `extensions.permissionsMode`
25
+
26
+ | Mode | Behavior |
27
+ |------|----------|
28
+ | `strict` (default) | Extensions run in sandboxed process with enforced permission boundaries |
29
+ | `relaxed` | Permissions still require user confirmation, but sandboxing is less restrictive |
30
+
31
+ ### Permission Axes
32
+
33
+ | Axis | Levels | Default | Description |
34
+ |------|--------|---------|-------------|
35
+ | `fileSystem` | `none`, `read-only`, `read-write`, `scoped`, `full` | `none` | Access to the local file system |
36
+ | `network` | `none`, `read-only`, `scoped`, `full` | `none` | Outbound network access |
37
+ | `shell` | `none`, `read-only`, `scoped`, `full` | `none` | Shell command execution |
38
+ | `secrets` | string[] of named scopes | `[]` | Access to named secrets in the Vault |
39
+
40
+ ### Permission Lifecycle
41
+ 1. **Install**: Extension manifest declares permissions → displayed to user in plain English
42
+ 2. **Confirm**: User explicitly accepts or rejects the permission set
43
+ 3. **Upgrade**: Any permission upgrade on update requires re-confirmation
44
+ 4. **Runtime**: `PermissionEnforcer` checks every operation against declared permissions
45
+
46
+ ### Enforcement
47
+ - Extensions with `shell: none` + `network: none` → `IsolatedSandbox` (node:vm, no Node built-ins)
48
+ - Extensions requiring broader access → `ProcessSandbox` (child_process with proxy)
49
+ - `worker_threads` is NEVER used as a security boundary (it shares process memory)
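
The permission axes and the enforcement rule above can be summarized in a few lines. This is an illustrative sketch: the axis names, levels, defaults, and sandbox names come from this document, while the `ExtensionPermissions` type and `chooseSandbox` function are hypothetical.

```typescript
type AccessLevel = "none" | "read-only" | "scoped" | "full";

// Permission axes and levels as listed in the Permission Axes table.
interface ExtensionPermissions {
  fileSystem: "none" | "read-only" | "read-write" | "scoped" | "full";
  network: AccessLevel;
  shell: AccessLevel;
  secrets: string[]; // named Vault scopes
}

// Least-privilege defaults: every axis starts at `none`.
const defaultPermissions: ExtensionPermissions = {
  fileSystem: "none",
  network: "none",
  shell: "none",
  secrets: [],
};

type SandboxKind = "IsolatedSandbox" | "ProcessSandbox";

// Extensions without shell or network access run in the isolated sandbox.
// Anything broader goes to the process sandbox.
function chooseSandbox(permissions: ExtensionPermissions): SandboxKind {
  const isolated = permissions.shell === "none" && permissions.network === "none";
  return isolated ? "IsolatedSandbox" : "ProcessSandbox";
}

const trackerPermissions: ExtensionPermissions = {
  fileSystem: "read-only",
  network: "scoped",
  shell: "none",
  secrets: ["gitlab-token"], // hypothetical scope name
};

console.log(chooseSandbox(defaultPermissions)); // "IsolatedSandbox"
console.log(chooseSandbox(trackerPermissions)); // "ProcessSandbox"
```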
50
+
51
+ ---
52
+
53
+ ## Layer 2: Action Risk Tiers
54
+
55
+ ### What it controls
56
+ Whether a specific runtime action (from any source) requires user approval before execution.
57
+
58
+ ### Configuration
59
+ `speexor.config.yaml` → `riskPolicy`
60
+
61
+ | Setting | Default | Description |
62
+ |---------|---------|-------------|
63
+ | `autoApprove` | `[]` | Action categories that auto-execute without approval |
64
+ | `requireApproval` | `["irreversible-high-stakes"]` | Action categories that always require approval |
65
+ | `approvalTimeout` | `4h` | How long an approval request waits before the default action is taken |
66
+ | `approvalDefaultAction` | `skip` | What happens when approval times out |
67
+ | `defaultRiskTierForUnknownActions` | `medium` | Default tier when action risk can't be classified |
68
+
69
+ ### Risk Tiers
70
+
71
+ | Tier | Auto-Approve? | Examples |
72
+ |------|---------------|----------|
73
+ | `reversible-low` | Yes (default) | Running tests, linting, reading files |
74
+ | `reversible-medium` | Yes (default) | Creating branches, committing code |
75
+ | `irreversible-high-stakes` | No (requires approval) | Merging PRs, deleting branches, publishing packages, modifying CI config |
76
+ | `unknown` | Depends on `defaultRiskTierForUnknownActions` | Actions not classified by the risk classifier |
77
+
78
+ ### Approval Workflow
79
+ 1. Action is evaluated by `GovernanceEngine.evaluateAction()`
80
+ 2. If requires approval → ApprovalItem created with expiry + default action
81
+ 3. Appears in dashboard "Approvals" panel (tagged as Axis 2)
82
+ 4. User approves/rejects or timeout triggers default action
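
The four workflow steps above can be sketched as two functions. This is a simplified stand-in for `GovernanceEngine.evaluateAction()`, whose real signature is not shown in this document. The `ApprovalItem` fields, the `resolveApproval` helper, and the millisecond timeout representation are assumptions. The default values mirror the `riskPolicy` table.

```typescript
type Outcome = "execute" | "skip" | "pending";

// Assumed shape of an approval request (step 2 of the workflow).
interface ApprovalItem {
  action: string;
  tier: string;
  expiresAt: number; // epoch milliseconds
  defaultAction: "execute" | "skip";
}

// Defaults from the riskPolicy table: 4h timeout, `skip` on expiry.
const riskPolicyDefaults = {
  requireApproval: ["irreversible-high-stakes"],
  approvalTimeoutMs: 4 * 60 * 60 * 1000,
  approvalDefaultAction: "skip" as const,
};

// Returns null when the action may run without approval.
function evaluateAction(action: string, tier: string, nowMs: number): ApprovalItem | null {
  if (riskPolicyDefaults.requireApproval.indexOf(tier) === -1) return null;
  return {
    action,
    tier,
    expiresAt: nowMs + riskPolicyDefaults.approvalTimeoutMs,
    defaultAction: riskPolicyDefaults.approvalDefaultAction,
  };
}

// A user decision wins. Otherwise the default action applies once the request expires.
function resolveApproval(
  item: ApprovalItem,
  nowMs: number,
  decision?: "approve" | "reject",
): Outcome {
  if (decision === "approve") return "execute";
  if (decision === "reject") return "skip";
  return nowMs >= item.expiresAt ? item.defaultAction : "pending";
}

const mergeRequest = evaluateAction("merge PR", "irreversible-high-stakes", 0);
if (mergeRequest !== null) {
  console.log(resolveApproval(mergeRequest, 60000)); // "pending"
  console.log(resolveApproval(mergeRequest, mergeRequest.expiresAt)); // "skip"
}
```

With `skip` as the timeout default, an unanswered request never executes on its own, which matches the fail-safe rationale in the summary table.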
83
+
84
+ ---
85
+
86
+ ## Comparison: When Each Layer Applies
87
+
88
+ | Scenario | Extension Permissions | Action Risk Tier |
89
+ |----------|----------------------|------------------|
90
+ | Installing a code-review skill | ✅ Checked (install time) | ❌ Not applicable |
91
+ | Skill reads a file in workspace | ✅ Checked (fileSystem permission) | ✅ Evaluated, auto-approved (reversible-low) |
92
+ | Skill runs `git push` | ✅ Checked (shell permission) | ✅ Checked (irreversible-high-stakes) |
93
+ | Skill makes HTTP call | ✅ Checked (network permission) | ✅ Evaluated, auto-approved (reversible-medium) |
94
+ | Core agent creates a PR | ❌ Not applicable (core, not extension) | ✅ Checked (irreversible-high-stakes) |
95
+ | Core agent proposes a new task | ❌ Not applicable | ✅ Checked (task-origin-gate, Axis 1) |
96
+
97
+ ---
98
+
99
+ ## Safety Defaults Summary
100
+
101
+ | Default | Value | Rationale |
102
+ |---------|-------|-----------|
103
+ | Extension permissions default | All `none` | Least privilege by default |
104
+ | Permissions mode | `strict` | Sandbox everything third-party |
105
+ | Unknown risk tier | `medium` | Conservative when an action can't be classified |
106
+ | Approval timeout default action | `skip` | Fail safe, don't auto-execute |
107
+ | Budget limit | Not set (opt-in) | User must explicitly set cost controls |
108
+ | Auto-merge PRs | `false` | Human judgment required for merges |
109
+
110
+ ---
111
+
112
+ ## Related Documentation
113
+
114
+ - [SECURITY.md](SECURITY.md) — Vulnerability reporting, incident response
115
+ - [ARCHITECTURE.md](ARCHITECTURE.md) — System architecture and plugin model
116
+ - [CONTRIBUTING.md](CONTRIBUTING.md) — Extension development guidelines
117
+ - `src/sandbox/` — Sandbox implementation
118
+ - `src/governance/` — Governance engine implementation