speexor 0.1.0
- package/API-REFERENCE.md +201 -0
- package/ARCHITECTURE.md +548 -0
- package/CHANGELOG.md +52 -0
- package/CODE-OF-CONDUCT.md +83 -0
- package/CONTRIBUTING.md +98 -0
- package/FAQ.md +105 -0
- package/LICENSE.md +21 -0
- package/PUBLISH.md +77 -0
- package/README.md +179 -0
- package/REFACTOR-LOG.md +40 -0
- package/ROADMAP.md +78 -0
- package/SECURITY.md +79 -0
- package/SUMMARY.md +46 -0
- package/TESTING.md +140 -0
- package/dist/agent-5D3BVWNK.js +37 -0
- package/dist/agent-5D3BVWNK.js.map +1 -0
- package/dist/chunk-2F66BZYJ.js +212 -0
- package/dist/chunk-2F66BZYJ.js.map +1 -0
- package/dist/chunk-5NA2TFPG.js +3 -0
- package/dist/chunk-5NA2TFPG.js.map +1 -0
- package/dist/chunk-B7WLHC4W.js +666 -0
- package/dist/chunk-B7WLHC4W.js.map +1 -0
- package/dist/chunk-SXALZEOJ.js +345 -0
- package/dist/chunk-SXALZEOJ.js.map +1 -0
- package/dist/cli/index.d.ts +1 -0
- package/dist/cli/index.js +287 -0
- package/dist/cli/index.js.map +1 -0
- package/dist/core/index.d.ts +31 -0
- package/dist/core/index.js +4 -0
- package/dist/core/index.js.map +1 -0
- package/dist/index.d.ts +75 -0
- package/dist/index.js +205 -0
- package/dist/index.js.map +1 -0
- package/dist/plugins/index.d.ts +6 -0
- package/dist/plugins/index.js +3 -0
- package/dist/plugins/index.js.map +1 -0
- package/dist/types-0q_okI2g.d.ts +205 -0
- package/docs/PRD01.md +264 -0
- package/docs/PRD02.md +299 -0
- package/docs/PRD03.md +0 -0
- package/docs/PRD04.md +349 -0
- package/docs/PRD05.md +312 -0
- package/docs/SETUP.md +94 -0
- package/docs/TROUBLESHOOTING.md +113 -0
- package/examples/basic.yaml +61 -0
- package/package.json +102 -0
- package/schema/config.schema.json +119 -0
- package/speexor.config.yaml.example +30 -0
package/docs/PRD04.md
ADDED
|
@@ -0,0 +1,349 @@
|
|
|
1
|
+
# PRD: Speexor v4 — "The All-in-One Local Autonomous Agent Platform"
|
|
2
|
+
**Codename:** Speexor — Open, Local-First, Extensible Multi-Agent OS
|
|
3
|
+
**Version:** 4.0 — Built on top of v1, v2, and v3
|
|
4
|
+
**Author:** Aditya (drafted with Claude)
|
|
5
|
+
**Date:** June 30, 2026
|
|
6
|
+
**Status:** Draft
|
|
7
|
+
**Language policy:** This document and the product itself default to **English** for all UI strings, logs, docs, and config keys, to maximize adoption among developers worldwide. Localization (including Indonesian) is an opt-in layer, not the default.
|
|
8
|
+
|
|
9
|
+
> This PRD extends v1 (execution foundation: plugins, GitHub-first, worktrees), v2 (recursive task decomposition, parallel subagents, real-time observability), and v3 (Never-Ask autonomy, interrupt/override protocol, universal action layer). v4 adds the layer that turns Speexor from "a tool you built" into **a platform other developers can extend, trust, and run for free on their own machines**: an Extension Marketplace, a local-first security model, a performance/scalability architecture, a strict disambiguation policy, and a professional-grade interactive dashboard.
|
|
10
|
+
|
|
11
|
+
---
|
|
12
|
+
|
|
13
|
+
## 1. What Changes in v4
|
|
14
|
+
|
|
15
|
+
| Aspect | v3 | v4 |
|---|---|---|
| Extensibility | Plugins exist (7 slots + Action Adapters) but are hard-coded/manually installed | **Extension Manager & Marketplace**: agents, skills, plugins, libraries, commands, and MCP servers are discoverable, installable, versioned, and manageable from the CLI/dashboard, like a package manager |
| Deployment model | Implied local, not formalized | **Local-first by design**: the entire orchestrator, task graph, event bus, and dashboard run on the user's machine; no mandatory cloud backend |
| Licensing/cost | Not specified | **Free and open-source (MIT)**, forever, by design — no paywalled core features |
| Security | Mentioned (auth, encryption) | **Formal local-first security model**: sandboxed extensions, permission system, secrets vault, network egress control, signed extension manifests |
| Performance | Implied via scheduler | **Explicit performance architecture**: worker-thread/process-pool concurrency, event-bus backpressure, caching, benchmarked latency targets |
| Scalability of the *codebase itself* | Monorepo growth not addressed | **Plugin SDK + contribution pipeline** so the project itself scales with many third-party contributors, not just Aditya |
| Terminology | Some overlapping terms (agent/subagent/skill/plugin/adapter) | **Formal glossary + naming conventions** to eliminate ambiguity across docs, code, and UI |
| Dashboard | Functional, real-time | **Interactive, opinionated UX**: graph manipulation, agent drill-down, command palette, accessible by default |
| Theoretical grounding | Implicit | Explicitly informed by established multi-agent LLM research (see §13) |
|
|
26
|
+
|
|
27
|
+
---
|
|
28
|
+
|
|
29
|
+
## 2. Problem Statement (New in v4)
|
|
30
|
+
|
|
31
|
+
1. **Closed-garden orchestrators don't scale as software.** v1–v3 assume Aditya hand-writes every adapter (OpenCode, Claude Code, Aider, Codex) and every domain Action Adapter. This does not scale as a *project* — without a plugin SDK and a marketplace, every new agent backend, skill, or MCP server requires a core-team PR. This is the same lesson AO (AgentWrapper) learned: a 7-slot plugin system is what let it grow past 1,000 commits and a real contributor base.
|
|
32
|
+
2. **Running arbitrary AI-generated/installed extensions locally is dangerous without a security model.** Once we allow third-party skills, plugins, and MCP servers to be installed by users, we introduce a real attack surface: malicious or buggy extensions could exfiltrate secrets, execute destructive shell commands, or leak proprietary code. v1–v3 never formalized this.
|
|
33
|
+
3. **"Autonomous and parallel" claims fall apart under load without a real concurrency architecture.** Running many agents/subagents in parallel on a single laptop will choke the orchestrator's Node.js process, which by default runs JavaScript on a single-threaded event loop, on CPU-bound work (parsing, diffing, embedding) and on synchronous file I/O unless we explicitly design for worker threads/processes, backpressure, and resource budgeting.
|
|
34
|
+
4. **Ambiguous terminology slows down both contributors and users.** "Agent" vs "subagent" vs "skill" vs "plugin" vs "adapter" vs "MCP server" currently overlap loosely across v1–v3. Without a strict glossary, contributors will build incompatible mental models, and the dashboard will confuse end users.
|
|
35
|
+
5. **A tool only Aditya can install is not a platform.** For broad developer adoption (global, English-first), the project needs to be genuinely free, well-documented, and easy to extend — mirroring what successful open-source dev tools (e.g., VS Code's extension model, npm's package ecosystem) did to scale beyond their original author.
|
|
36
|
+
|
|
37
|
+
---
|
|
38
|
+
|
|
39
|
+
## 3. Goals
|
|
40
|
+
|
|
41
|
+
| # | Goal | Success Metric |
|---|---|---|
| G1 | Provide a first-class Extension Manager for agents, skills, plugins, libraries, commands, and MCP servers | A new extension can be installed, enabled, and used in a running task graph in under 60 seconds, without restarting the orchestrator |
| G2 | Run entirely local-first, with zero mandatory cloud dependency for core orchestration | Core orchestration (task graph, scheduler, event bus, dashboard) works fully offline except for the LLM provider API calls themselves |
| G3 | Be 100% free and open-source, with no artificially gated core features | License = MIT (or equivalent permissive OSS license); no "Pro tier" gating core orchestration, parallelism, or the dashboard |
| G4 | Guarantee a baseline security posture for third-party extensions | Every installed extension runs under an explicit permission model; zero default network/file-system access outside its declared scope |
| G5 | Meet explicit performance targets under realistic parallel load | ≥8 concurrent agents/subagents on a typical developer laptop (8 cores, 16GB RAM) without dashboard lag (<200ms UI interaction latency) |
| G6 | Make the codebase scalable for external contributors | A documented Plugin SDK + contribution template lets an external developer ship a new adapter without modifying `core` |
| G7 | Eliminate ambiguity in terminology, config, and UX | A single canonical glossary (§4) is referenced by 100% of public docs, code comments in `core`, and dashboard labels |
| G8 | Ship an interactive, readable, beginner-friendly dashboard | A new user can understand "what is happening right now" within 30 seconds of opening the dashboard for the first time, without reading docs |
|
|
51
|
+
|
|
52
|
+
### Non-Goals (v4)
|
|
53
|
+
- Speexor does **not** become a hosted SaaS in v4. A hosted/cloud-sync variant may be explored later as an optional, separate add-on — but the core promise is **local-first, free, and self-hosted**.
|
|
54
|
+
- Speexor does **not** build its own LLM. All "agents" remain orchestration wrappers around existing CLIs/APIs (OpenCode, Claude Code, Aider, Codex, or any MCP-compatible backend).
|
|
55
|
+
- In v4, the Extension Marketplace is not a monetized storefront. It is a free, open registry (similar in spirit to npm or the VS Code Marketplace), with no transaction layer.
|
|
56
|
+
|
|
57
|
+
---
|
|
58
|
+
|
|
59
|
+
## 4. Naming & Terminology Glossary (Disambiguation Layer)
|
|
60
|
+
|
|
61
|
+
To eliminate ambiguity across docs, code, and UI, v4 introduces **one canonical glossary**. Every term below has exactly one meaning across the entire project; synonyms are explicitly forbidden in code/docs.
|
|
62
|
+
|
|
63
|
+
| Term | Definition | Not to be confused with |
|---|---|---|
| **Orchestrator** | The single root process per task graph that owns decomposition, scheduling, and lifecycle | "Speexor" (the product name) — Orchestrator is a *role*, Speexor is the *platform* |
| **Agent** | A process instance assigned to exactly one task node in the graph, backed by one Agent Backend | "Agent Backend" (the underlying CLI/model, e.g. OpenCode) |
| **Subagent** | An Agent spawned *by* another Agent to handle a child task node (depth > 0 relative to its parent Agent) | "Worker" (not used in v4 docs — always say Agent or Subagent) |
| **Agent Backend** | A concrete CLI/SDK that actually runs model inference and produces output (OpenCode, Claude Code, Aider, Codex, or any future backend) | "Agent" — the Backend is what an Agent *uses*, not the Agent itself |
| **Skill** | A named, reusable capability/prompt-and-tool bundle that an Agent invokes during execution (e.g., `code-review`, `test-writer`, `task-proposal`) | "Plugin" |
| **Plugin** | A core-level extension implementing one of the seven Plugin Slots from v1 (Runtime, Agent, Workspace, Tracker, SCM, Notifier, Terminal) or a v3 Action Adapter | "Skill" — a Plugin extends *Speexor's capabilities as a system*; a Skill extends *what an individual Agent can do* |
| **MCP Server** | An external Model Context Protocol server providing tools to an Agent (e.g., a database tool, a browser tool) | "Plugin" — an MCP Server is a *third-party* tool surface; a Plugin is *Speexor-native* |
| **Library** | A versioned, installable code dependency used by an extension (Plugin/Skill/Agent Backend adapter) at the Node.js package level | "Plugin"/"Skill" |
| **Command** | A user- or agent-invocable CLI action exposed by Speexor core or an extension (e.g., `speexor task submit`) | "Skill" — a Command is invoked by *humans or the CLI*; a Skill is invoked by *an Agent during reasoning* |
| **Task Node** | One node in the Task Graph (v2 concept), unchanged in v4 | "Task" alone — always qualify as "Task Node" in technical docs to avoid ambiguity with "Task Graph" |
| **Extension** | Umbrella term covering any of: Plugin, Skill, Agent Backend adapter, MCP Server registration, or Library, when discussed generically in the Marketplace context | — |
|
|
76
|
+
|
|
77
|
+
This glossary ships as `GLOSSARY.md` at the repo root and is enforced via a lint rule (`scripts/lint-glossary.js`) that flags banned synonyms in new PRs.
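
As a rough illustration of the lint rule described above, the sketch below scans text for banned synonyms and reports where they occur. The `lintGlossary` function and the `BANNED_TERMS` list are hypothetical; only "Worker" is named as a banned term in the glossary table, and the real `scripts/lint-glossary.js` may work differently.

```typescript
// Hypothetical sketch of a glossary lint pass; not the actual scripts/lint-glossary.js.
interface GlossaryViolation {
  line: number; // 1-based line number in the scanned text
  found: string; // the banned term exactly as written
  suggestion: string; // canonical wording from the glossary
}

// Assumption: each entry maps one banned synonym to its canonical replacement.
// Only "worker" is taken from the glossary table; extend the list as needed.
const BANNED_TERMS: Array<{ banned: string; suggestion: string }> = [
  { banned: "worker", suggestion: "Agent or Subagent" },
];

function lintGlossary(text: string): GlossaryViolation[] {
  const violations: GlossaryViolation[] = [];
  text.split("\n").forEach((content, index) => {
    for (const term of BANNED_TERMS) {
      // Whole-word, case-insensitive match. "_" counts as a word character,
      // so identifiers such as worker_threads are not flagged.
      const pattern = new RegExp("\\b" + term.banned + "\\b", "gi");
      let match = pattern.exec(content);
      while (match !== null) {
        violations.push({ line: index + 1, found: match[0], suggestion: term.suggestion });
        match = pattern.exec(content);
      }
    }
  });
  return violations;
}
```

A whole-word match like this would also flag legitimate phrases such as "worker thread", so a practical rule would likely need an exception list.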
|
|
78
|
+
|
|
79
|
+
---
|
|
80
|
+
|
|
81
|
+
## 5. Core Concept: Extension Manager & Marketplace
|
|
82
|
+
|
|
83
|
+
### 5.1 What Can Be Managed
|
|
84
|
+
Per the new requirement, users must be able to install, configure, enable/disable, and remove the following from one unified interface (CLI + dashboard):
|
|
85
|
+
|
|
86
|
+
| Extension Type | Examples | Distribution |
|---|---|---|
| Agent Backends | OpenCode, Claude Code, Aider, Codex, future CLIs | npm package implementing `AgentBackendPlugin` |
| Skills | `test-writer`, `task-proposal`, `code-review`, custom domain skills | Folder-based bundle (prompt + metadata + optional scripts), distributed via npm or git URL — directly compatible with existing skill repos (e.g., the ECC/affaan-m pattern Aditya already uses) |
| Plugins (core slots) | new Tracker (GitLab), new SCM, new Notifier (Slack/Discord/Telegram) | npm package implementing the relevant `*Plugin` interface |
| Action Adapters (v3) | Document, Web/Browser, Scheduling, Communication | npm package implementing `ActionPlugin` |
| Libraries | any npm dependency an extension needs | standard npm resolution, sandboxed per extension |
| Commands | custom CLI subcommands contributed by extensions | declared in extension manifest, namespaced (`speexor ext run <ext>:<command>`) |
| MCP Servers | any MCP-compliant external tool server | registered by URL/connection spec, used by Agents like first-class tools |
|
|
95
|
+
|
|
96
|
+
### 5.2 Extension Manifest
|
|
97
|
+
Every extension ships a manifest (`speexor.extension.json`) — this is the contract that makes the whole system disambiguated and machine-verifiable:
|
|
98
|
+
|
|
99
|
+
```json
{
  "name": "speexor-skill-test-writer",
  "type": "skill",
  "version": "1.2.0",
  "speexorApiVersion": "^4.0.0",
  "permissions": {
    "fileSystem": ["read:workspace", "write:workspace"],
    "network": [],
    "shell": ["read-only"],
    "secrets": []
  },
  "entry": "./index.js",
  "description": "Generates unit tests for changed files in the current task node.",
  "author": "community",
  "signature": "sha256-..."
}
```
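
To make the "machine-verifiable" claim concrete, here is a minimal sketch of a shape check for the manifest above. The field names mirror the example; the `ExtensionManifest` interface, the `validateManifest` function, and the choice of which fields are required are assumptions for illustration. The check covers structure only and does not verify the `signature`.

```typescript
// Sketch only: field names mirror the example manifest; required-ness is assumed.
interface ExtensionManifest {
  name: string;
  type: string;
  version: string;
  speexorApiVersion: string;
  permissions: {
    fileSystem: string[];
    network: string[];
    shell: string[];
    secrets: string[];
  };
  entry: string;
  description: string;
  author: string;
  signature: string;
}

const REQUIRED_STRINGS = ["name", "type", "version", "speexorApiVersion", "entry", "signature"];
const PERMISSION_AXES = ["fileSystem", "network", "shell", "secrets"];

function isStringArray(value: unknown): boolean {
  return Array.isArray(value) && value.every((item) => typeof item === "string");
}

// Returns human-readable problems; an empty list means the shape looks valid.
function validateManifest(raw: unknown): string[] {
  if (typeof raw !== "object" || raw === null) {
    return ["manifest must be a JSON object"];
  }
  const manifest = raw as Record<string, unknown>;
  const problems: string[] = [];
  for (const key of REQUIRED_STRINGS) {
    if (typeof manifest[key] !== "string" || manifest[key] === "") {
      problems.push("missing or empty field: " + key);
    }
  }
  const permissions = manifest["permissions"];
  if (typeof permissions !== "object" || permissions === null) {
    problems.push("missing permissions object");
    return problems;
  }
  const axes = permissions as Record<string, unknown>;
  for (const axis of PERMISSION_AXES) {
    if (!isStringArray(axes[axis])) {
      problems.push("permissions." + axis + " must be an array of strings");
    }
  }
  return problems;
}
```

Returning a list of problems instead of throwing on the first one lets the CLI and dashboard show every issue in a single pass.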
|
|
117
|
+
|
|
118
|
+
### 5.3 Marketplace UX
|
|
119
|
+
- FR-62: `speexor ext search <query>` / dashboard "Marketplace" tab — search a public, free, open registry (a simple Git-based index initially, e.g. a GitHub repo of manifests + npm package names, similar in spirit to early VS Code extension galleries).
|
|
120
|
+
- FR-63: `speexor ext install <name>` — installs, verifies manifest signature, displays required permissions for explicit user confirmation before first run.
|
|
121
|
+
- FR-64: `speexor ext list` / dashboard panel — shows all installed extensions, version, enabled/disabled state, and permission footprint.
|
|
122
|
+
- FR-65: `speexor ext update` / `speexor ext remove` — standard lifecycle management.
|
|
123
|
+
- FR-66: Per-project enable/disable — an extension can be installed globally but only enabled for specific projects (`speexor.config.yaml` → `extensions.enabled: [...]`).
|
|
124
|
+
- FR-67: Dashboard surfaces, in real time, **which extension/skill/MCP server each running Agent or Subagent is currently using** — directly fulfilling the "see which skill/plugin/command/library is in use, live" requirement carried over from v2/v3 and made first-class here.
|
|
125
|
+
|
|
126
|
+
---
|
|
127
|
+
|
|
128
|
+
## 6. Core Concept: Local-First Architecture
|
|
129
|
+
|
|
130
|
+
### 6.1 Principles
|
|
131
|
+
- **No mandatory backend server owned by the project.** Everything — Task Graph Store, Event Bus, Checkpoint Manager, Extension registry cache — runs as local processes and local SQLite storage on the user's machine.
|
|
132
|
+
- **The only network calls Speexor itself makes** are: (a) calls to the Agent Backend's configured LLM provider (OpenCode/DeepSeek/GLM/etc., as the user already does), (b) calls to the user's configured SCM/Tracker (GitHub API), and (c) optional Marketplace index lookups (can be fully disabled for air-gapped use).
|
|
133
|
+
- **Cross-device access (v3, FR-59 and FR-60)** is achieved via the user's *own* tunnel/network (e.g., Tailscale, like AO's `caffeinate`-based remote-access pattern), not via a Speexor-hosted relay. This preserves the "local-first" guarantee even when the dashboard is accessed remotely.
|
|
134
|
+
|
|
135
|
+
### 6.2 Why This Matters for the Target User
|
|
136
|
+
Aditya (and any developer working on proprietary client code, e.g., Kata Netizen's NLP pipelines or BrainClash's unreleased game logic) cannot risk source code or credentials transiting a third-party server they don't control. Local-first is therefore not just a technical preference but a **trust requirement** for serious adoption by professional developers.
|
|
137
|
+
|
|
138
|
+
---
|
|
139
|
+
|
|
140
|
+
## 7. Core Concept: Security Model
|
|
141
|
+
|
|
142
|
+
### 7.1 Threat Model
|
|
143
|
+
| Threat | Source | Mitigation |
|---|---|---|
| Malicious/buggy extension exfiltrates secrets (API keys, tokens) | Third-party Skill/Plugin/MCP server | Secrets Vault (§7.2) + permission-gated access; extensions never see raw secrets unless explicitly granted |
| Extension executes destructive shell commands | Third-party Skill/Plugin | Permission model (§7.3): shell access is limited to `none`, `read-only`, or `scoped` by default; full shell access requires explicit, per-extension opt-in with a visible warning |
| Extension calls out to unknown network endpoints | Third-party Plugin/MCP server | Network egress allowlist per extension manifest; default network access is `none` |
| Two agents racing on the same file/worktree | Internal (already addressed in v1/v2) | Worktree isolation + lock files (unchanged) |
| High-stakes action executed without consent | Internal (v3 Decision Engine) | Risk Classifier + approval flow (unchanged, reinforced here) |
| Supply-chain attack via a compromised Marketplace package | Marketplace ecosystem | Signed manifests (`signature` field), checksum verification on install, optional "verified publisher" badge for community-reviewed extensions |
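
The network egress row can be sketched as a default-deny host check. This assumes the manifest's `network` array holds exact hostnames or `*.`-prefixed wildcards; that entry format and the `isEgressAllowed` function are illustrative, not defined by the PRD.

```typescript
// Sketch: default-deny egress check. An empty allowlist means no network access.
// Assumption: entries are exact hostnames or "*.example.com" style wildcards.
function isEgressAllowed(host: string, allowlist: string[]): boolean {
  const target = host.toLowerCase();
  return allowlist.some((entry) => {
    const rule = entry.toLowerCase();
    if (rule.indexOf("*.") === 0) {
      // "*.example.com" matches subdomains only, not "example.com" itself.
      const suffix = rule.slice(1); // ".example.com"
      return target.length > suffix.length && target.slice(-suffix.length) === suffix;
    }
    return target === rule;
  });
}
```

Comparing against the suffix including its leading dot keeps a lookalike such as `evilgithub.com` from matching a `*.github.com` rule.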
|
|
151
|
+
|
|
152
|
+
### 7.2 Secrets Vault
|
|
153
|
+
- FR-68: All credentials (LLM API keys, GitHub tokens, etc.) are stored in an OS-native secure store (macOS Keychain, Windows Credential Manager, Linux `libsecret`/encrypted file fallback) — never in plaintext config files.
|
|
154
|
+
- FR-69: Extensions request named secret scopes (e.g., `secrets: ["github-token"]`) in their manifest; the Vault injects only the scopes explicitly granted, never the full secret set.
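
A minimal sketch of the scope-limited injection in FR-69 follows. The vault is modeled as a plain object for illustration; a real implementation would read from the OS-native secure store. `SecretStore` and `injectGrantedSecrets` are hypothetical names.

```typescript
// Sketch of scope-limited secret injection (FR-69); names are illustrative.
type SecretStore = { [scope: string]: string };

function injectGrantedSecrets(
  vault: SecretStore,
  requestedScopes: string[],
  grantedScopes: string[],
): SecretStore {
  const injected: SecretStore = {};
  for (const scope of requestedScopes) {
    const isGranted = grantedScopes.indexOf(scope) !== -1;
    const value = vault[scope];
    // Inject only scopes that were both requested in the manifest and granted by the user.
    if (isGranted && typeof value === "string") {
      injected[scope] = value;
    }
  }
  return injected;
}
```

The extension receives a fresh object containing only its granted scopes, so it never holds a reference to the full vault.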
|
|
155
|
+
|
|
156
|
+
### 7.3 Permission Model
|
|
157
|
+
- FR-70: Each extension declares required permissions across four axes: `fileSystem` (none/read/write, scoped to workspace or broader), `network` (none/allowlist), `shell` (none/read-only/scoped/full), `secrets` (named scopes).
|
|
158
|
+
- FR-71: First install (and any permission upgrade on update) requires explicit user confirmation — shown clearly in both CLI and dashboard, in plain English, no jargon (e.g., "This extension can read and modify files in your project, but cannot access the internet.").
|
|
159
|
+
- FR-72: Extensions run sandboxed, outside the core orchestration context: in a separate Node.js `child_process`, or in a `worker_threads` worker (which has its own JavaScript context but shares the OS process, so it gives weaker isolation), with `fs`/`net` access restricted by a permission-enforcing wrapper.
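
The confirmation flow in FR-71 needs two pieces: detecting a permission upgrade and describing permissions in plain English. The sketch below shows one way to do both. The `PermissionSet` shape follows the manifest's four axes; the function names and the exact wording are assumptions.

```typescript
// Sketch for FR-71; the shape follows the manifest's four permission axes.
interface PermissionSet {
  fileSystem: string[];
  network: string[];
  shell: string[];
  secrets: string[];
}

// True when `next` requests any entry that `previous` did not contain.
// This set difference is deliberately conservative: any new entry triggers a
// fresh confirmation, even if it is narrower than what was granted before.
function isPermissionUpgrade(previous: PermissionSet, next: PermissionSet): boolean {
  const axes: Array<keyof PermissionSet> = ["fileSystem", "network", "shell", "secrets"];
  return axes.some((axis) =>
    next[axis].some((entry) => previous[axis].indexOf(entry) === -1),
  );
}

// Plain-language summary of file and network access; the wording is illustrative.
function describePermissions(p: PermissionSet): string {
  const canWrite = p.fileSystem.some((entry) => entry.indexOf("write:") === 0);
  const canRead = canWrite || p.fileSystem.some((entry) => entry.indexOf("read:") === 0);
  let files = "cannot access your files";
  if (canWrite) {
    files = "can read and modify files in your project";
  } else if (canRead) {
    files = "can read files in your project";
  }
  const net = p.network.length > 0
    ? "It can access the internet (" + p.network.join(", ") + ")."
    : "It cannot access the internet.";
  return "This extension " + files + ". " + net;
}
```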
|
|
160
|
+
|
|
161
|
+
### 7.4 Data Security at Rest & in Transit
|
|
162
|
+
- FR-73: Task Graph Store, Decision Journal, and Checkpoint data are encrypted at rest by default (SQLite with SQLCipher or equivalent).
|
|
163
|
+
- FR-74: Any cross-device sync (v3 remote access) is end-to-end encrypted; Speexor never has a server-side copy of user data.
|
|
164
|
+
|
|
165
|
+
---
|
|
166
|
+
|
|
167
|
+
## 8. Core Concept: Performance & Concurrency Architecture
|
|
168
|
+
|
|
169
|
+
### 8.1 Why a Naive Implementation Fails
|
|
170
|
+
Node.js runs JavaScript on a single main thread. Spawning many Agents as plain child processes is fine for I/O-bound waiting (the actual LLM call), but the orchestrator's own CPU-bound work must not block the event loop, or the whole system degrades as agent count grows. That work includes diffing large files, parsing ASTs for risk classification, computing task similarity for deduplication (v2 FR-34), and rendering large task graphs for the dashboard.
|
|
171
|
+
|
|
172
|
+
### 8.2 Architecture
|
|
173
|
+
| Layer | Design |
|---|---|
| **Agent/Subagent execution** | Each Agent runs as its own OS process (already true via Runtime plugin in v1) — true OS-level parallelism, not Node.js threads |
| **Orchestrator CPU-bound work** | Offloaded to a `worker_threads` pool (diffing, embeddings for dedup/similarity, AST-based risk classification) so the main event loop stays responsive for the Event Bus and WebSocket dashboard traffic |
| **Event Bus** | Implemented with explicit **backpressure**: if the dashboard/consumer is slower than event production, events are buffered up to `liveFeedBufferSize` (v2 config); beyond that, they are coalesced into summary events rather than dropped or allowed to block producers |
| **Task Graph Store** | SQLite with WAL mode for concurrent reads/writes from Scheduler, Checkpoint Manager, and Dashboard API simultaneously |
| **Scheduler** | Resource-budget aware: tracks CPU/memory headroom (via Node's `os` module) and throttles `maxConcurrentAgents` dynamically if the host machine is under heavy load, rather than relying solely on a static config number |
| **Dashboard rendering** | Task Graph visualization uses virtualization/level-of-detail rendering (collapse deep subtrees by default) so a 50-node graph (v2 `maxNodesPerGraph`) renders smoothly even on modest hardware |
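
The Event Bus row can be illustrated with a bounded buffer that coalesces events once it is full. This is a sketch under assumptions: apart from `liveFeedBufferSize`, every name here is hypothetical, and the merge policy (fold into a matching entry, otherwise fold the two oldest entries into a summary) is only one possible strategy.

```typescript
// Sketch of a bounded, coalescing event buffer; only liveFeedBufferSize is from the PRD.
interface BusEvent {
  agentId: string;
  kind: string;
  count: number; // how many raw events this entry represents
}

class CoalescingBuffer {
  private queue: BusEvent[] = [];
  private readonly limit: number;

  constructor(liveFeedBufferSize: number) {
    this.limit = Math.max(2, liveFeedBufferSize); // merging needs at least two slots
  }

  // Never blocks the producer and never discards an event's count.
  push(agentId: string, kind: string): void {
    if (this.queue.length >= this.limit) {
      // Prefer merging into the newest entry from the same agent and kind.
      for (let i = this.queue.length - 1; i >= 0; i--) {
        const entry = this.queue[i];
        if (entry !== undefined && entry.agentId === agentId && entry.kind === kind) {
          entry.count += 1;
          return;
        }
      }
      // Otherwise fold the two oldest entries into one summary entry to make room.
      const first = this.queue.shift();
      const second = this.queue.shift();
      if (first !== undefined && second !== undefined) {
        this.queue.unshift({ agentId: "*", kind: "summary", count: first.count + second.count });
      }
    }
    this.queue.push({ agentId: agentId, kind: kind, count: 1 });
  }

  // Called by the slow consumer (for example the dashboard feed) when it is ready.
  drain(): BusEvent[] {
    const drained = this.queue;
    this.queue = [];
    return drained;
  }
}
```

Because each entry carries a `count`, the dashboard can still report how many raw events a summary stands for.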
|
|
181
|
+
|
|
182
|
+
### 8.3 Performance Targets (Benchmarked, not aspirational)
|
|
183
|
+
| Metric | Target |
|---|---|
| Time to spawn 1 new Agent/Subagent from a `ready` Task Node | < 2 seconds (excluding LLM cold-start) |
| Dashboard UI interaction latency (click, drag, expand node) | < 200ms on a mid-range laptop with 8 concurrent agents |
| Event-to-dashboard latency (carried over from v2 FR-41) | < 1 second, reaffirmed and benchmarked in CI |
| Task Graph Store write throughput | ≥ 100 checkpoint writes/sec sustained |
| Memory footprint of idle orchestrator (no active agents) | < 200MB RSS |
|
|
190
|
+
|
|
191
|
+
These targets are tracked via an automated benchmark suite (`pnpm bench`) run in CI on every release, with results published in `BENCHMARKS.md` — preventing silent performance regressions as the codebase grows (directly serving the "scalability of package development" requirement).
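
One way a benchmark gate could compare measurements against the targets above is sketched below. The PRD does not say which statistic is compared, so the use of the 95th percentile is an assumption, as are the function and type names.

```typescript
// Sketch of a CI latency gate. Comparing the p95 against the target is an assumption.
// Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) {
    throw new Error("no samples");
  }
  const sorted = samples.slice().sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1] as number;
}

interface BenchmarkResult {
  metric: string;
  p95Ms: number;
  targetMs: number;
  passed: boolean;
}

function checkLatencyTarget(metric: string, samplesMs: number[], targetMs: number): BenchmarkResult {
  const p95Ms = percentile(samplesMs, 95);
  // Targets in the table are strict upper bounds ("< 200ms"), hence the strict comparison.
  return { metric: metric, p95Ms: p95Ms, targetMs: targetMs, passed: p95Ms < targetMs };
}
```

A percentile is less sensitive to a single slow outlier than the maximum, while still catching regressions that an average would hide.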
|
|
192
|
+
|
|
193
|
+
---
|
|
194
|
+
|
|
195
|
+
## 9. Core Concept: Scalability of Development (Plugin SDK & Contribution Pipeline)
|
|
196
|
+
|
|
197
|
+
### 9.1 Plugin SDK
|
|
198
|
+
- FR-75: A published `@speexor/sdk` package providing TypeScript types, a local extension scaffold generator (`speexor ext create --type skill`), and a local test harness (`speexor ext test`) that runs an extension against mock Task Nodes without needing a full orchestrator session.
|
|
199
|
+
- FR-76: Every core interface (`AgentBackendPlugin`, `ActionPlugin`, `TrackerPlugin`, `SCMPlugin`, `NotifierPlugin`, `RuntimePlugin`, `TerminalPlugin`, and the v4 `SkillModule`/`McpServerRegistration` contracts) is versioned independently with semver, decoupled from core release cadence, so third-party extensions don't break on every core update.
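
Independent semver versioning implies a compatibility check between an extension's declared `speexorApiVersion` range and the installed contract version. A minimal sketch for caret ranges such as `^4.0.0` follows. It handles only plain `MAJOR.MINOR.PATCH` versions with a major of 1 or higher; pre-release tags and `0.x` rules are out of scope, and the function names are hypothetical.

```typescript
// Sketch: does an API version satisfy a caret range such as "^4.0.0"?
interface SemVer {
  major: number;
  minor: number;
  patch: number;
}

function parseSemVer(text: string): SemVer | null {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(text);
  if (match === null) {
    return null;
  }
  return { major: Number(match[1]), minor: Number(match[2]), patch: Number(match[3]) };
}

function satisfiesCaret(apiVersion: string, range: string): boolean {
  if (range.charAt(0) !== "^") {
    return false;
  }
  const base = parseSemVer(range.slice(1));
  const actual = parseSemVer(apiVersion);
  if (base === null || actual === null || base.major === 0) {
    return false; // unsupported input is treated as incompatible
  }
  // Same major version, and at least the base minor.patch.
  if (actual.major !== base.major) {
    return false;
  }
  if (actual.minor !== base.minor) {
    return actual.minor > base.minor;
  }
  return actual.patch >= base.patch;
}
```

In practice the `semver` npm package's `satisfies` function covers the full range grammar; the hand-rolled version here only shows the rule being applied.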
|
|
200
|
+
|
|
201
|
+
### 9.2 Contribution Pipeline
|
|
202
|
+
- FR-77: `CONTRIBUTING.md` (carried over and expanded from v1) defines: how to scaffold a new extension, the manifest schema, the review checklist for Marketplace submission, and the glossary lint rule (§4).
|
|
203
|
+
- FR-78: CI pipeline runs: unit tests, the Plugin SDK contract test suite (ensures any new adapter actually implements the interface correctly), the benchmark suite (§8.3), and a security lint pass (checks manifest permissions against actual code behavior heuristically, e.g., flags `fileSystem: none` extensions that call `fs.writeFile`).
|
|
204
|
+
- FR-79: Architecture Decision Records (ADRs) stored in `docs/adr/` for every core architectural change, so external contributors understand *why*, not just *what* — critical for long-term maintainability as the contributor base grows beyond Aditya.
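
The heuristic security lint in FR-78 can be approximated with a text scan, as in the sketch below. The list of write-capable `fs` calls is illustrative rather than exhaustive, and `lintFileSystemUsage` is a hypothetical name. A regex scan like this produces both false positives and false negatives, so it is a review aid, not an enforcement mechanism.

```typescript
// Heuristic sketch of the manifest-versus-code check in FR-78.
interface LintFinding {
  rule: string;
  detail: string;
}

// Assumption: write-capable Node.js fs calls worth flagging (not exhaustive).
const FS_WRITE_CALLS = [
  "writeFile", "writeFileSync", "appendFile", "appendFileSync",
  "unlink", "unlinkSync", "rm", "rmSync",
];

function lintFileSystemUsage(declaredFileSystem: string[], source: string): LintFinding[] {
  // Entries such as "write:workspace" grant write access; anything else does not.
  const mayWrite = declaredFileSystem.some((entry) => entry.indexOf("write") === 0);
  if (mayWrite) {
    return [];
  }
  const findings: LintFinding[] = [];
  for (const call of FS_WRITE_CALLS) {
    // Match "fs.<call>(" so that writeFile does not also match writeFileSync.
    const pattern = new RegExp("\\bfs\\." + call + "\\s*\\(");
    if (pattern.test(source)) {
      findings.push({
        rule: "fs-write-without-permission",
        detail: "calls fs." + call + " but declares no write permission",
      });
    }
  }
  return findings;
}
```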
|
|
205
|
+
|
|
206
|
+
---
|
|
207
|
+
|
|
208
|
+
## 10. Core Concept: Interactive, Readable Dashboard
|
|
209
|
+
|
|
210
|
+
### 10.1 Design Principles
|
|
211
|
+
1. **Glanceable first, detailed second.** The landing view answers "what is happening right now" in one screen, with drill-down available but never required.
|
|
212
|
+
2. **English-first, plain language.** No unexplained jargon ("worktree," "checkpoint") on first-time screens — tooltips and a built-in glossary panel (rendered directly from `GLOSSARY.md`) explain terms inline.
|
|
213
|
+
3. **Live, not static.** Every panel updates via WebSocket without manual refresh, consistent with v2's real-time requirement.
|
|
214
|
+
4. **Actionable, not just observational.** Every status shown has a corresponding action available right there (approve, override, pause, drill into logs) — no "go check the CLI for this."
|
|
215
|
+
|
|
216
|
+
### 10.2 Key Views
|
|
217
|
+
| View | Content |
|---|---|
| **Mission Control (home)** | Active Task Graphs across all projects, agent count per project, pending approvals badge, today's digest summary |
| **Task Graph View (v2, refined)** | Interactive, zoomable/collapsible DAG; color-coded status; click any node to see its Agent, Subagents, and live activity feed |
| **Agent Fleet View (v2, refined)** | Card-per-agent showing: current Task Node, Agent Backend in use, **currently active Skill**, last 3 commands executed, CPU/memory usage, uptime |
| **Extension Manager (new, v4)** | Installed extensions, permission footprint per extension, Marketplace search/install |
| **Decision Journal / Review Queue (v3, refined)** | Filterable, searchable log of every autonomous decision, confidence score, and reversibility tag; one-click "flag for follow-up" |
| **Approval Inbox (v3, refined)** | High-stakes actions awaiting confirmation, with countdown timer and default-action indicator |
| **Command Palette** | `Cmd/Ctrl+K` quick action — submit a new task, override, search logs, jump to any agent — for power users who don't want to click through panels |
| **Settings → Security** | Visual permission matrix across all installed extensions; one place to audit "what can touch what" |
|
|
227
|
+
|
|
228
|
+
### 10.3 Accessibility & Readability
|
|
229
|
+
- FR-80: WCAG 2.1 AA-equivalent contrast and keyboard navigation support across all dashboard views.
|
|
230
|
+
- FR-81: A "Simple Mode" (carried over from v3) hides the Task Graph DAG and Extension Manager technical details behind a plain-language summary view, toggleable per user, without losing functionality underneath.
|
|
231
|
+
- FR-82: All status colors are paired with text labels and icons (not color alone), supporting colorblind users.
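
The contrast requirement in FR-80 can be checked programmatically. The sketch below implements the WCAG 2.1 relative-luminance and contrast-ratio formulas for sRGB colors; the 4.5:1 threshold is the AA level for normal-size text. The function names are illustrative.

```typescript
// WCAG 2.1 contrast ratio between two sRGB colors given as [r, g, b] in 0..255.
type Rgb = [number, number, number];

// Convert one 8-bit sRGB channel to its linear-light value.
function linearize(channel: number): number {
  const c = channel / 255;
  return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

function relativeLuminance(color: Rgb): number {
  const [r, g, b] = color;
  return 0.2126 * linearize(r) + 0.7152 * linearize(g) + 0.0722 * linearize(b);
}

function contrastRatio(a: Rgb, b: Rgb): number {
  const la = relativeLuminance(a);
  const lb = relativeLuminance(b);
  const lighter = Math.max(la, lb);
  const darker = Math.min(la, lb);
  return (lighter + 0.05) / (darker + 0.05);
}

// AA requires at least 4.5:1 for normal-size text (3:1 for large text).
function meetsAaNormalText(foreground: Rgb, background: Rgb): boolean {
  return contrastRatio(foreground, background) >= 4.5;
}
```

Black on white gives the maximum ratio of 21:1, so a check like this could run in CI against the dashboard's status color palette.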
|
|
232
|
+
|
|
233
|
+
---
|
|
234
|
+
|
|
235
|
+
## 11. Updated Configuration (`speexor.config.yaml` v4)
|
|
236
|
+
|
|
237
|
+
```yaml
$schema: https://.../config.schema.v4.json
language:
  default: en
  ui: en

license: MIT
telemetry:
  enabled: false # opt-in only, never default-on; local-first means no silent phone-home

extensions:
  marketplaceIndex: "https://registry.speexor.dev/index.json" # can be disabled for air-gapped use
  enabled:
    - "speexor-agent-opencode"
    - "speexor-agent-claude-code"
    - "speexor-skill-test-writer"
  permissionsMode: "strict" # strict | relaxed (relaxed still requires manual confirmation, never silent)

security:
  secretsBackend: "os-keychain"
  encryptAtRest: true
  sandboxExtensions: true

performance:
  maxConcurrentAgents: "auto" # dynamic resource-aware throttling (§8.2), or an explicit integer to override
  workerThreadPoolSize: 4

# All sections from v1, v2, v3 configs remain valid and unchanged.
```
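
How `maxConcurrentAgents: "auto"` might resolve to a number is sketched below. Host metrics are passed in as plain values so the logic stays testable; in Node.js they could come from `os.cpus().length`, `os.loadavg()[0]`, and `os.freemem()`. The reserved core and the 512 MB per-Agent budget are assumptions for illustration, not values from the PRD.

```typescript
// Sketch of resolving the maxConcurrentAgents setting; the budgets are assumed.
interface HostSnapshot {
  cpuCount: number;
  loadAverage1m: number; // 1-minute load average (always 0 on Windows in Node.js)
  freeMemoryMb: number;
}

const RESERVED_CORES = 1; // assumption: keep one core for the orchestrator itself
const MEMORY_PER_AGENT_MB = 512; // assumption: rough per-Agent memory budget

function resolveMaxConcurrentAgents(setting: "auto" | number, host: HostSnapshot): number {
  if (setting !== "auto") {
    // An explicit integer overrides the dynamic calculation.
    return Math.max(1, Math.floor(setting));
  }
  const idleCores = host.cpuCount - RESERVED_CORES - Math.ceil(host.loadAverage1m);
  const byCpu = Math.max(1, idleCores);
  const byMemory = Math.max(1, Math.floor(host.freeMemoryMb / MEMORY_PER_AGENT_MB));
  // The tighter of the two budgets wins, and at least one Agent is always allowed.
  return Math.min(byCpu, byMemory);
}
```

Since Agents spend most of their time waiting on LLM calls, a CPU-based cap like this is conservative; a real scheduler might re-evaluate it periodically instead of once at startup.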
|
|
266
|
+
|
|
267
|
+
---
|
|
268
|
+
|
|
269
|
+
## 12. Non-Functional Requirements Summary (v4 additions)
|
|
270
|
+
|
|
271
|
+
| Category | Requirement |
|---|---|
| **Cost** | Zero cost to install, run, and use core features, forever (MIT license, no telemetry-funded tiers) |
| **Data Security** | No plaintext secrets at rest; sandboxed extensions; explicit, auditable permission model |
| **Performance** | Benchmarked, CI-enforced latency/throughput targets (§8.3) |
| **Scalability (codebase)** | Decoupled, semver-versioned plugin contracts; documented SDK; CI contract tests for all extension types |
| **Clarity/Disambiguation** | One canonical glossary enforced by lint; manifest-driven, machine-verifiable extension contracts |
| **Internationalization** | English default; i18n architecture open for community-contributed locales (carried from v3) |
| **Portability** | Linux, macOS, Windows — unchanged from v1, reaffirmed |
|
|
280
|
+
|
|
281
|
+
---
|
|
282
|
+
|
|
283
|
+
## 13. Research Grounding
|
|
284
|
+
|
|
285
|
+
v4's design choices are deliberately aligned with established patterns in multi-agent LLM systems research and practice, rather than invented from scratch:
|
|
286
|
+
|
|
287
|
+
- **Hierarchical task decomposition with planner/executor separation** (v2's Planner Engine, reinforced by the v3 Decision Engine) mirrors the planner–executor split popularized by frameworks such as **AutoGPT** and **BabyAGI**, and explored more formally in **MetaGPT**'s role-based multi-agent collaboration and **AutoGen**'s conversable-agent orchestration pattern. Both MetaGPT and AutoGen report that decomposing a goal into role-specific subtasks improves task completion reliability over a single monolithic agent.
|
|
288
|
+
- **Reasoning-then-acting loops with explicit confidence/reflection** (the Never-Ask Decision Ladder, v3) draws on the **ReAct** (Reason+Act) prompting pattern and **Reflexion**'s self-evaluation loop, where an agent critiques its own intermediate outputs before committing to an action — adapted here into a confidence score and reversibility classification rather than a free-text self-critique.
|
|
289
|
+
- **Tree-structured exploration of solution paths** (Task Graph as a DAG rather than a list) is conceptually related to **Tree of Thoughts**, generalized from "thoughts" to "executable subtasks with dependencies."
|
|
290
|
+
- **Long-horizon autonomous agents that persist and accumulate skills over time** (the Skill system, Extension Marketplace, and Decision Journal as long-term memory) take inspiration from **Voyager**'s skill-library approach in open-ended environments, where an agent's previously learned skills become reusable building blocks for future tasks — directly mirrored in Speexor's Skill extension type being installable, versioned, and reused across Task Graphs.
|
|
291
|
+
- **Multi-agent role specialization and communication protocols** (Agent ↔ Subagent handoff, governance/approval flow) reflect patterns from **CAMEL**'s role-playing framework and **AutoGen**'s structured agent-to-agent message passing, adapted to a software-engineering-first, file/git-based execution context rather than pure conversational exchange.
|
|
292
|
+
- **Sandboxed, permissioned tool use** (the v4 Security Model) follows the general direction of the **Model Context Protocol (MCP)** itself and broader tool-use safety literature, which emphasizes scoped, declared tool permissions over unrestricted agent action as systems are given more autonomy.
|
|
293
|
+
|
|
294
|
+
This grounding is included to make explicit that Speexor's architecture is not ad hoc — it operationalizes well-documented patterns from the multi-agent LLM literature into a concrete, installable, local-first developer tool. (Note: specific implementation details remain Speexor's own design; the references above describe general patterns and frameworks, not sources to be reproduced verbatim.)
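
To make the Task Graph point concrete, here is a minimal TypeScript sketch of scheduling subtasks from a dependency DAG. It is an illustration only: the `Subtask` shape and the `planWaves` function are hypothetical names, not Speexor's actual API, and the algorithm is a plain Kahn-style topological grouping rather than anything taken from Tree of Thoughts. Each returned wave holds subtasks whose dependencies finished in earlier waves, so a scheduler could in principle run a wave in parallel.

```typescript
// Hypothetical shape: a subtask and the ids of the subtasks it depends on.
interface Subtask {
  id: string;
  dependsOn: string[];
}

// Groups subtasks into waves. Every subtask in a wave has all of its
// dependencies completed by earlier waves.
// Throws on an unknown dependency or on a cycle.
function planWaves(tasks: Subtask[]): string[][] {
  const knownIds = tasks.map((t) => t.id);
  for (const task of tasks) {
    for (const dep of task.dependsOn) {
      if (knownIds.indexOf(dep) === -1) {
        throw new Error(`Unknown dependency "${dep}" in subtask "${task.id}"`);
      }
    }
  }

  const done: string[] = [];
  let pending = tasks.slice();
  const waves: string[][] = [];
  while (pending.length > 0) {
    // A subtask is ready once all of its dependencies are done.
    const ready = pending.filter((t) =>
      t.dependsOn.every((dep) => done.indexOf(dep) !== -1)
    );
    if (ready.length === 0) {
      throw new Error("Task graph contains a cycle");
    }
    const readyIds = ready.map((t) => t.id);
    waves.push(readyIds);
    done.push(...readyIds);
    pending = pending.filter((t) => readyIds.indexOf(t.id) === -1);
  }
  return waves;
}

// Example: two subtasks can run in parallel once the schema is designed.
const exampleWaves = planWaves([
  { id: "design-schema", dependsOn: [] },
  { id: "write-migration", dependsOn: ["design-schema"] },
  { id: "write-api", dependsOn: ["design-schema"] },
  { id: "integration-test", dependsOn: ["write-migration", "write-api"] },
]);
console.log(exampleWaves);
// [["design-schema"], ["write-migration", "write-api"], ["integration-test"]]
```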

---

## 14. Roadmap (v4, building on M0–M21 from v1–v3)

| Phase | Scope | Estimate |
|---|---|---|
| **M22 — Glossary & Disambiguation Pass** | Canonical glossary, lint rule, retrofit existing v1-v3 docs/code naming | 1 week |
| **M23 — Extension Manifest & Permission Model** | Manifest schema, permission declaration, install-time confirmation flow | 2 weeks |
| **M24 — Secrets Vault & Sandboxing** | OS-keychain integration, sandboxed extension execution | 2-3 weeks |
| **M25 — Extension Manager CLI + Marketplace Index v1** | `speexor ext *` commands, simple Git-based registry index | 2 weeks |
| **M26 — Performance Architecture Retrofit** | Worker-thread pool, resource-aware scheduler, backpressure event bus | 2-3 weeks |
| **M27 — Benchmark Suite & CI Gating** | `pnpm bench`, performance regression CI gate, `BENCHMARKS.md` | 1 week |
| **M28 — Plugin SDK & Contribution Pipeline** | `@speexor/sdk`, scaffold generator, contract test suite, ADR process | 2-3 weeks |
| **M29 — Dashboard v4 (Extension Manager + Mission Control + Command Palette)** | New dashboard views, accessibility pass | 3 weeks |
| **M30 — Public Launch Readiness** | Licensing finalization (MIT), README/SETUP polish, public Marketplace index hosting, launch docs | 1-2 weeks |

Total additional estimate: **~16-20 weeks** after v1–v3 foundations are stable.

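
The total can be checked by summing the per-milestone ranges from the table. The snippet below is only that arithmetic, under the assumption that the milestones run strictly one after another with no overlap; the array and variable names are illustrative.

```typescript
// Per-milestone estimates in weeks, copied from the roadmap table.
// A single-value estimate such as "2 weeks" is entered as min = max.
const milestoneEstimates = [
  { phase: "M22", min: 1, max: 1 },
  { phase: "M23", min: 2, max: 2 },
  { phase: "M24", min: 2, max: 3 },
  { phase: "M25", min: 2, max: 2 },
  { phase: "M26", min: 2, max: 3 },
  { phase: "M27", min: 1, max: 1 },
  { phase: "M28", min: 2, max: 3 },
  { phase: "M29", min: 3, max: 3 },
  { phase: "M30", min: 1, max: 2 },
];

// Sequential execution assumed: the totals are plain sums of the bounds.
const totalMinWeeks = milestoneEstimates.reduce((sum, m) => sum + m.min, 0);
const totalMaxWeeks = milestoneEstimates.reduce((sum, m) => sum + m.max, 0);

console.log(`${totalMinWeeks}-${totalMaxWeeks} weeks`); // 16-20 weeks
```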

---

## 15. New Risks & Mitigations (v4-specific)

| Risk | Impact | Mitigation |
|---|---|---|
| Malicious extension published to the open Marketplace | High | Signed manifests, "verified publisher" community review badge, sandboxing as defense-in-depth even if review fails |
| Performance retrofit (worker threads, resource-aware scheduling) introduces regressions in already-working v1-v3 scheduler | Medium | Benchmark suite (§8.3) run before/after retrofit; feature-flagged rollout |
| Free/OSS model creates unsustainable maintenance burden for a solo maintainer | High | Plugin SDK + contribution pipeline explicitly designed to shift extension maintenance to the community; core stays small and stable |
| Glossary enforcement slows down early development velocity | Low | Lint rule is warning-only initially, hard-enforced only from M22 onward after existing code is retrofitted |
| Permission model is too strict, frustrating legitimate extension authors | Medium | "Relaxed" permission mode available, still requires explicit confirmation (never silent), can be tuned based on community feedback post-launch |
| Local-first constraint limits future monetization/sustainability options | Medium | Explicitly out of scope for v4 — acceptable tradeoff per stated goal (G3); revisit only as an optional separate hosted add-on later, never replacing the free local core |

---
## 16. Success Metrics (v4)

- A third-party developer (not Aditya) successfully publishes a working Skill or Agent Backend adapter to the Marketplace and it is installed and used by at least one other user.
- Core orchestration runs with zero outbound network calls other than the user's own configured LLM provider and SCM — verifiable via a network monitor during a test session.
- Benchmark suite passes all targets in §8.3 on a defined reference machine (8-core/16GB), gated in CI.
- A first-time user, given only the dashboard (no docs), correctly identifies what each visible agent is doing within 30 seconds.
- Zero reported incidents of an installed extension accessing a permission it did not declare, across the beta period.

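
The last metric is easier to measure if undeclared access is denied by default and every denied attempt is recorded. The sketch below shows one way that could look. The manifest shape, the permission strings, and `createPermissionGuard` are assumptions made for illustration, not the actual v4 manifest schema or SDK API.

```typescript
// Hypothetical manifest shape; the real schema is defined elsewhere in the PRD.
interface ExtensionManifest {
  name: string;
  permissions: string[];
}

interface DeniedAccess {
  extension: string;
  permission: string;
}

// Builds a deny-by-default guard for one extension. Only permissions listed
// in the manifest are allowed; every other request is refused and logged.
function createPermissionGuard(manifest: ExtensionManifest) {
  const denied: DeniedAccess[] = [];
  return {
    // Returns true only when the permission was declared in the manifest.
    check(permission: string): boolean {
      const allowed = manifest.permissions.indexOf(permission) !== -1;
      if (!allowed) {
        denied.push({ extension: manifest.name, permission });
      }
      return allowed;
    },
    // Copy of the undeclared access attempts recorded so far.
    deniedAttempts(): DeniedAccess[] {
      return denied.slice();
    },
  };
}

// Example with made-up permission names.
const exampleGuard = createPermissionGuard({
  name: "example-skill",
  permissions: ["fs:read"],
});
console.log(exampleGuard.check("fs:read")); // true
console.log(exampleGuard.check("net:fetch")); // false
console.log(exampleGuard.deniedAttempts().length); // 1
```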

---

## 17. Open Questions (Update from v3)

1. Should the initial Marketplace index be a simple GitHub-hosted JSON file (zero infra cost, fully aligned with "free forever") or a lightweight self-hosted registry service? If the latter, who funds and maintains its uptime long-term?
2. How strict should the "verified publisher" review process be without becoming a bottleneck that defeats the purpose of an open marketplace?
3. For sandboxing (§7.3), is a Node.js `worker_threads`/`child_process` permission wrapper sufficient, or should higher-risk extensions (full shell access) be required to run in a stronger isolation boundary (e.g., a lightweight container/VM), even at the cost of higher local startup latency?
4. Should the dynamic `maxConcurrentAgents: "auto"` resource-aware throttling be configurable with a minimum/maximum bound, to avoid surprising a user who expects a fixed number of parallel agents?
5. Given Aditya's existing CEO Orchestrator system (67+ agents, 261+ skills) and skill repo conventions (ECC/affaan-m pattern), should the v4 Skill extension format be designed for direct backward compatibility/import from that existing system, rather than inventing a new format from scratch?

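
As a discussion aid for question 4, one possible shape for bounded `"auto"` throttling is sketched below. Everything except `maxConcurrentAgents: "auto"` is hypothetical: the `minAgents`/`maxAgents` fields, the `HostResources` input, and the 512 MiB-per-agent figure are assumptions chosen only to make the example concrete. Host values are passed in as plain numbers so the function stays deterministic and easy to test.

```typescript
// Hypothetical config fields for bounding "auto" (not part of the current schema).
interface ConcurrencyConfig {
  maxConcurrentAgents: number | "auto";
  minAgents?: number;
  maxAgents?: number;
}

// Snapshot of host resources, supplied by the caller.
interface HostResources {
  cpuCount: number;
  freeMemoryBytes: number;
}

// Assumed memory budget per agent; a placeholder, not a measured value.
const ASSUMED_MEMORY_PER_AGENT_BYTES = 512 * 1024 * 1024;

function resolveMaxConcurrentAgents(
  config: ConcurrencyConfig,
  host: HostResources
): number {
  // A fixed number is always honored as-is.
  if (config.maxConcurrentAgents !== "auto") {
    return config.maxConcurrentAgents;
  }
  // Leave one core for the orchestrator itself, but never go below one agent.
  const byCpu = Math.max(1, host.cpuCount - 1);
  const byMemory = Math.floor(host.freeMemoryBytes / ASSUMED_MEMORY_PER_AGENT_BYTES);
  const suggested = Math.min(byCpu, byMemory);

  // Clamp the suggestion into the user's bounds so "auto" stays predictable.
  const lower = config.minAgents ?? 1;
  const upper = config.maxAgents ?? Number.POSITIVE_INFINITY;
  return Math.min(upper, Math.max(lower, suggested));
}

// Example: 8 cores and 4 GiB free would suggest 7, but the bound caps it at 4.
const GIB = 1024 * 1024 * 1024;
console.log(
  resolveMaxConcurrentAgents(
    { maxConcurrentAgents: "auto", minAgents: 2, maxAgents: 4 },
    { cpuCount: 8, freeMemoryBytes: 4 * GIB }
  )
); // 4
```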

---

*PRD v4 completes the platform layer on top of v1 (execution), v2 (task graph & observability), and v3 (autonomy & universality): it makes Speexor genuinely extensible, secure, fast, free, and approachable to external developers — turning it from a personal tool into a real open-source platform.*