vgxness 1.5.1 → 1.5.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (42)
  1. package/README.md +23 -2
  2. package/dist/agents/agent-seed-service.js +10 -0
  3. package/dist/agents/canonical-agent-manifest.js +177 -0
  4. package/dist/agents/canonical-agent-projection.js +146 -0
  5. package/dist/agents/renderers/claude-renderer.js +30 -52
  6. package/dist/cli/bun-bin.js +6 -0
  7. package/dist/cli/cli-help.js +3 -0
  8. package/dist/cli/commands/agent-skill-dispatcher.js +6 -5
  9. package/dist/cli/commands/mcp-dispatcher.js +65 -3
  10. package/dist/cli/index.js +1 -1
  11. package/dist/governance/governance-report-builder.js +45 -26
  12. package/dist/mcp/claude-code-agent-config.js +79 -0
  13. package/dist/mcp/claude-code-config.js +84 -0
  14. package/dist/mcp/client-install-claude-code-contract.js +86 -0
  15. package/dist/mcp/client-install-claude-code.js +85 -0
  16. package/dist/mcp/index.js +5 -0
  17. package/dist/mcp/opencode-default-agent-config.js +7 -113
  18. package/dist/mcp/provider-canonical-agent-manifest.js +39 -0
  19. package/dist/mcp/provider-change-plan.js +57 -1
  20. package/dist/mcp/provider-doctor.js +54 -0
  21. package/dist/mcp/provider-status.js +82 -2
  22. package/dist/mcp/schema.js +2 -2
  23. package/dist/mcp/validation.js +1 -1
  24. package/dist/memory/memory-service.js +4 -0
  25. package/dist/sdd/sdd-workflow-service.js +129 -59
  26. package/dist/setup/providers/claude-setup-adapter.js +7 -4
  27. package/docs/architecture.md +54 -112
  28. package/docs/cli.md +53 -0
  29. package/docs/code-runtime.md +218 -0
  30. package/docs/contributing.md +120 -0
  31. package/docs/glossary.md +211 -0
  32. package/docs/mcp.md +144 -0
  33. package/docs/prd.md +23 -26
  34. package/docs/providers.md +123 -0
  35. package/docs/roadmap.md +88 -0
  36. package/docs/safety.md +147 -0
  37. package/docs/storage.md +93 -0
  38. package/package.json +1 -1
  39. package/docs/funcionamiento-del-sistema.md +0 -865
  40. package/docs/harness-gap-analysis.md +0 -243
  41. package/docs/vgxcode.md +0 -87
  42. package/docs/vgxness-code.md +0 -48
package/docs/mcp.md ADDED
@@ -0,0 +1,144 @@
1
+ # MCP tools
2
+
3
+ VGXNESS exposes 38 typed MCP tools over stdio through `vgxness mcp start`. The canonical list is the `SUPPORTED_VGX_MCP_TOOL_NAMES` array in `src/mcp/schema.ts` — treat that as the source of truth. The list below mirrors it; if a tool appears here that is not in the array, or vice versa, the array wins.
4
+
5
+ Tools are exposed under the `vgxness_*` prefix. Some MCP hosts strip the prefix and accept the short form (`sdd_status`, `memory_save`, etc.). The schema accepts both.
6
+
7
+ All tools:
8
+
9
+ - Are read-only or auditable on the VGXNESS side. The state-mutating tools are the ones named `*_save`, `*_save_artifact`, `*_update`, `*_start`, `*_checkpoint`, `*_finalize`, `*_accept_artifact`, `*_set`, `*_append_activity`, `*_close`, and `*_record_*`, plus `vgxness_run_preflight`, which may create a pending approval when the policy decision is `ask`.
10
+ - Validate input with `zod` before dispatching to the service layer.
11
+ - Return either a typed result or a `VgxMcpError` with a `code` and `message`.
12
+ - Do not require the agent to import or instantiate anything; pass JSON, get JSON.
13
+
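A minimal sketch of the validate-then-dispatch contract for one tool, `vgxness_memory_get`. It uses a hand-rolled validator in place of `zod`, so it has no dependencies. Several names and details are assumptions for illustration: `McpResult`, `handleMemoryGet`, the `invalid_input`/`not_found` codes, and the numeric `id`. `VgxMcpError` here is a local stand-in, not the real export.

```typescript
// Hypothetical local stand-in for VgxMcpError: a code plus a message.
interface VgxMcpError {
  code: string;
  message: string;
}

// A tool returns either a typed result or an error; callers branch on `ok`.
type McpResult<T> = { ok: true; value: T } | { ok: false; error: VgxMcpError };

interface MemoryGetInput {
  id: number; // assumption: numeric ids in this sketch
}

// Stands in for a zod schema: reject bad input before touching the service layer.
function validateMemoryGetInput(raw: unknown): McpResult<MemoryGetInput> {
  if (typeof raw !== "object" || raw === null) {
    return { ok: false, error: { code: "invalid_input", message: "expected an object" } };
  }
  const id = (raw as Record<string, unknown>).id;
  if (typeof id !== "number" || !Number.isInteger(id) || id <= 0) {
    return { ok: false, error: { code: "invalid_input", message: "`id` must be a positive integer" } };
  }
  return { ok: true, value: { id } };
}

// Dispatch only after validation succeeds; the "service layer" here is a Map.
function handleMemoryGet(raw: unknown, store: Map<number, string>): McpResult<string> {
  const parsed = validateMemoryGetInput(raw);
  if (!parsed.ok) return parsed;
  const content = store.get(parsed.value.id);
  if (content === undefined) {
    return { ok: false, error: { code: "not_found", message: `no observation ${parsed.value.id}` } };
  }
  return { ok: true, value: content };
}
```

Returning errors as values rather than throwing them keeps the JSON-in/JSON-out shape: the host always gets a serializable object back.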
14
+ ## SDD workflow (10)
15
+
16
+ | Tool | Purpose | Required inputs | Notes |
17
+ |---|---|---|---|
18
+ | `vgxness_sdd_status` | Report per-phase artifact presence and the next ready missing phase. | `project`, `change` | |
19
+ | `vgxness_sdd_next` | Recommend the next phase decision for a change. | `project`, `change` | |
20
+ | `vgxness_sdd_ready` | Check whether a specific phase is ready. | `project`, `change`, `phase`; optional `runId`, `agentId` | Returns structured `SddReadiness` with `blockedPrerequisites`. |
21
+ | `vgxness_sdd_get_readiness` | Same as `ready` with explicit output shape. | same as `ready` | |
22
+ | `vgxness_sdd_save_artifact` | Persist phase content under canonical topic key `sdd/{change}/{phase}`. | `project`, `change`, `phase`, `content` | Saving does not imply acceptance. |
23
+ | `vgxness_sdd_accept_artifact` | Record explicit human acceptance for a phase. | `project`, `change`, `phase`, `acceptedBy` (`{type:"human", id, displayName?}`); optional `acceptedAt`, `note`, `notes`, `rationale`, `runId`, `agentId` | `acceptedBy.type` must be `"human"`. Runtime rejects agent or anonymous acceptance. |
24
+ | `vgxness_sdd_get_artifact` | Read one full artifact. | `project`, `change`, `phase`; optional `payloadMode` (`compact`/`verbose`) | |
25
+ | `vgxness_sdd_list_artifacts` | List all artifacts for a change in canonical phase order. | `project`, `change`; optional `payloadMode` | |
26
+ | `vgxness_sdd_cockpit` | Aggregate per-phase status, blockers, and recommended next action. | `project`, `change` | Returns `SddCockpit` with `aggregateBlockers` of kinds `missing-topic-key`, `unaccepted-phase`, `legacy-artifact`, `readiness`. |
27
+ | `vgxness_governance_report` | Build a redacted governance report for a project/change/run. | `project`; optional `change`, `runId`, `workflow`, `phase`, `agentId`, `agent` (with `id`, `name`, `mode`, `scope`), `payloadMode` | See [Safety model](./safety.md) for redaction rules. |
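Two rules in the SDD table are easy to get wrong: the canonical topic key and the human-only acceptance check. The sketch below covers both. It assumes a local `AcceptedBy` type that mirrors the documented `acceptedBy` input. Treating an empty `id` as anonymous acceptance is an assumption, not documented behavior.

```typescript
// Hypothetical local mirror of the documented `acceptedBy` input.
interface AcceptedBy {
  type: string; // only "human" is accepted
  id: string;
  displayName?: string;
}

// Canonical topic key for a phase artifact: sdd/{change}/{phase}.
function sddTopicKey(change: string, phase: string): string {
  return `sdd/${change}/${phase}`;
}

// Rejects agent or anonymous acceptance, as the table describes.
function assertHumanAcceptance(acceptedBy: AcceptedBy | undefined): void {
  if (acceptedBy === undefined || acceptedBy.type !== "human") {
    throw new Error('acceptedBy.type must be "human"');
  }
  // Assumption: an empty id counts as anonymous acceptance.
  if (acceptedBy.id.trim() === "") {
    throw new Error("acceptedBy.id must be non-empty");
  }
}
```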
28
+
29
+ ## Memory (4)
30
+
31
+ | Tool | Purpose | Required inputs |
32
+ |---|---|---|
33
+ | `vgxness_memory_save` | Upsert a memory observation. | `project`, `type` (`architecture`/`decision`/`bugfix`/`pattern`/`config`/`discovery`/`learning`/`preference`/`manual`), `title`, `content`; optional `scope` (`project`/`personal`), `topicKey` |
34
+ | `vgxness_memory_search` | FTS5 search over memory. | `limit` (1–100); optional `project`, `scope`, `type`, `topicKey`, `query` |
35
+ | `vgxness_memory_get` | Read one observation by id. | `id` |
36
+ | `vgxness_memory_update` | Patch an existing observation. | `id`; optional `type`, `title`, `content`, `topicKey` |
37
+
38
+ `topicKey` is the durable upsert key: re-saving with the same key updates the existing observation rather than creating a duplicate.
39
+
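Here is a sketch of the `topicKey` upsert semantics, with an in-memory map standing in for the SQLite store (hypothetical). For brevity, it keys only on `topicKey` and ignores `project` and `scope`. A real store would presumably include both in the key.

```typescript
interface Observation {
  id: number;
  title: string;
  content: string;
  topicKey?: string;
}

// Hypothetical in-memory stand-in for the memory store.
class MemoryStoreSketch {
  private nextId = 1;
  private readonly byId = new Map<number, Observation>();
  private readonly idByTopicKey = new Map<string, number>();

  // A save with a known topicKey updates that observation in place;
  // a save without a topicKey (or with a new one) inserts a fresh row.
  save(input: Omit<Observation, "id">): Observation {
    const existingId = input.topicKey !== undefined ? this.idByTopicKey.get(input.topicKey) : undefined;
    const id = existingId ?? this.nextId++;
    const obs: Observation = { id, ...input };
    this.byId.set(id, obs);
    if (input.topicKey !== undefined) this.idByTopicKey.set(input.topicKey, id);
    return obs;
  }

  count(): number {
    return this.byId.size;
  }
}
```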
40
+ ## Sessions (5)
41
+
42
+ | Tool | Purpose | Required inputs | Notes |
43
+ |---|---|---|---|
44
+ | `vgxness_session_start` | Begin a session record. | `project`; optional `id`, `directory` | If `id` is omitted, the runtime generates one. |
45
+ | `vgxness_session_append_activity` | Append a structured event to a session. | `sessionId`, `actor`, `kind` (`prompt`/`tool_call`/`artifact`/`summary`/`error`), `payloadJson` | `payloadJson` is a stringified JSON value, validated before append. |
46
+ | `vgxness_session_close` | Close a session with a summary. | `sessionId`, `actor`, `summary` (non-empty, trimmed) | |
47
+ | `vgxness_session_restore` | Read the latest restorable session for a project/directory. | `project`; optional `directory` | One signal — not authoritative. Prefer `vgxness_context_cockpit`. |
48
+ | `vgxness_context_cockpit` | Read-only cockpit: recent sessions, bounded memory previews, last restorable session. | `project`; optional `directory`, `limit` | Does not trace, mutate state, run preflight, or write provider config. |
49
+
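Because `payloadJson` is a string, the runtime can reject malformed payloads before anything is appended. The sketch below shows that check. `buildActivity` and the parsed `payload` field are hypothetical names.

```typescript
type ActivityKind = "prompt" | "tool_call" | "artifact" | "summary" | "error";

// Hypothetical in-memory event shape; `payload` is the parsed payloadJson.
interface ActivityEvent {
  sessionId: string;
  actor: string;
  kind: ActivityKind;
  payload: unknown;
}

const ACTIVITY_KINDS: readonly ActivityKind[] = ["prompt", "tool_call", "artifact", "summary", "error"];

// Validate kind and parse payloadJson up front, so nothing malformed is appended.
function buildActivity(sessionId: string, actor: string, kind: string, payloadJson: string): ActivityEvent {
  if (!(ACTIVITY_KINDS as readonly string[]).includes(kind)) {
    throw new Error(`unknown activity kind: ${kind}`);
  }
  let payload: unknown;
  try {
    payload = JSON.parse(payloadJson);
  } catch {
    throw new Error("payloadJson is not valid JSON");
  }
  return { sessionId, actor, kind: kind as ActivityKind, payload };
}
```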
50
+ ## Agents, profile, skills (5)
51
+
52
+ | Tool | Purpose | Required inputs | Notes |
53
+ |---|---|---|---|
54
+ | `vgxness_agent_resolve` | Rank registered agents/subagents for a task. | `project`, `taskDescription` or `intent`; optional `scope`, `desiredCapabilities`, `workflow`, `phase`, `providerAdapter`, `mode` (`agent`/`subagent`) | Deterministic rule-based ranking. No model call. |
55
+ | `vgxness_agent_activate` | Build an activation envelope for the resolved agent. | `project`, `agentId` or `name`; optional `scope`, `workflow`, `phase`, `providerAdapter`, `mode`, `workspaceRoot`, `maxSourceBytes`, `payloadMode` | Default `payloadMode` is `compact`. |
56
+ | `vgxness_manager_profile_get` | Read the manager profile overlay. | `project`; optional `scope`, `managerName` | |
57
+ | `vgxness_manager_profile_set` | Upsert a manager profile overlay. | input shape from `SaveManagerProfileOverlayInput` | Requires explicit human authorization in the harness prompt. |
58
+ | `vgxness_skill_payload` | Resolve and build a provider-agnostic skill injection payload. | `agentId` or `name`, plus `project`, `scope`, `workflow`, `phase`, `providerAdapter`, `workspaceRoot`; optional `maxSourceBytes`, `mode` | `mode` defaults to `compact`. See [Code runtime](./code-runtime.md) for how the runtime consumes this. |
59
+
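`vgxness_agent_resolve` is described as deterministic and rule-based. The sketch below shows one way such a ranking could work, using a hypothetical scoring rule: count the matching `desiredCapabilities`, then break ties by agent id. The real resolver's rules are not documented here.

```typescript
// Hypothetical slice of a registered agent; real registry records carry more fields.
interface AgentCandidate {
  id: string;
  mode: "agent" | "subagent";
  capabilities: string[];
}

// Plain string comparison, so the order never depends on the host locale.
const byId = (x: string, y: string): number => (x < y ? -1 : x > y ? 1 : 0);

// Score by capability overlap (descending), then tie-break by id (ascending).
function rankAgents(
  candidates: AgentCandidate[],
  desiredCapabilities: string[],
  mode?: "agent" | "subagent",
): AgentCandidate[] {
  const wanted = new Set(desiredCapabilities);
  const score = (a: AgentCandidate): number => a.capabilities.filter((c) => wanted.has(c)).length;
  return candidates
    .filter((a) => mode === undefined || a.mode === mode)
    .map((a) => ({ a, s: score(a) }))
    .sort((x, y) => y.s - x.s || byId(x.a.id, y.a.id))
    .map(({ a }) => a);
}
```

Because no model is involved, identical inputs always produce an identical order, which is what makes the ranking testable.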
60
+ ## OpenCode preview (2)
61
+
62
+ | Tool | Purpose | Required inputs | Notes |
63
+ |---|---|---|---|
64
+ | `vgxness_opencode_manager_payload` | Build the OpenCode manager payload envelope. | `OpenCodeManagerPayloadInput` shape (project, scope, agent, workflow/phase, workspaceRoot, maxSourceBytes, payloadMode) | `installable: false`, `readOnly: true`. |
65
+ | `vgxness_opencode_handoff_preview` | Compose a full handoff preview: provider artifacts + skill diagnostics + SDD status + provider status + safety flags. | `project`; optional `scope`, `agentId`, `agentName`, `workspaceRoot`, `maxSourceBytes`, `change`, `phase` | Read-only preview. Does not execute OpenCode, install hooks, or write provider config. |
66
+
67
+ ## Runs (8)
68
+
69
+ | Tool | Purpose | Required inputs | Notes |
70
+ |---|---|---|---|
71
+ | `vgxness_run_list` | List runs with filters. | `limit`; optional `project`, `status` (one of the 8 `RunStatus` values) | |
72
+ | `vgxness_run_get` | Read a run with `events`, `checkpoints`, `approvals`, `operationAttempts`. | `id` | |
73
+ | `vgxness_run_start` | Create a run record. | `CreateRunInput` shape | |
74
+ | `vgxness_run_checkpoint` | Append a resumable JSON state. | `AppendRunCheckpointInput` shape | |
75
+ | `vgxness_run_finalize` | Finalize a run with `outcome` and `outcomeReason`. | `FinalizeRunInput` shape | `outcome` must match a terminal `status`. Re-finalization is rejected. |
76
+ | `vgxness_run_preflight` | Run policy evaluation and return the decision + planned execution isolation strategy. | `PreflightRunOperationInput` shape | May create a pending approval when the decision is `ask`. Does not invoke an executor. |
77
+ | `vgxness_run_resume_inspect` | Plan-only resume advisory. | `runId` | Returns `RunResumeOrchestrationPlan` with blockers and `nextAction` (`resolve-approval`/`inspect-run`/`retry-check`/`manual-recovery`/`no-action`). |
78
+ | `vgxness_run_resume_gate` | Evaluate retry policy for an approval without executing. | `approvalId`; optional `policy` | Default policy is `never`. See [Safety model](./safety.md) for the policy table. |
79
+
80
+ Run `status` accepts the full 8-state vocabulary: `created`, `planned`, `running`, `needs-human`, `completed`, `failed`, `blocked`, `cancelled`. `outcome` is one of `success`, `partial`, `failure`, `blocked`, `cancelled`.
81
+
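Here is a sketch of the finalize rule from the Runs table: a run is finalized once, and its `outcome` maps to a terminal `status`. Two details are assumptions inferred from the vocabularies above, not confirmed behavior: the set of terminal statuses (`completed`, `failed`, `blocked`, `cancelled`) and the outcome-to-status mapping, especially `partial` to `completed`.

```typescript
type RunStatus =
  | "created" | "planned" | "running" | "needs-human"
  | "completed" | "failed" | "blocked" | "cancelled";
type RunOutcome = "success" | "partial" | "failure" | "blocked" | "cancelled";

// Assumption: these four statuses are terminal.
const TERMINAL: ReadonlySet<RunStatus> = new Set<RunStatus>(["completed", "failed", "blocked", "cancelled"]);

// Assumption: illustrative outcome -> terminal status mapping.
const STATUS_FOR_OUTCOME: Record<RunOutcome, RunStatus> = {
  success: "completed",
  partial: "completed",
  failure: "failed",
  blocked: "blocked",
  cancelled: "cancelled",
};

interface RunRecord {
  id: string;
  status: RunStatus;
  outcome?: RunOutcome;
  outcomeReason?: string;
}

// Finalize once: a run that already has a terminal status is rejected.
function finalizeRun(run: RunRecord, outcome: RunOutcome, outcomeReason: string): RunRecord {
  if (TERMINAL.has(run.status)) {
    throw new Error(`run ${run.id} is already finalized (${run.status})`);
  }
  return { ...run, status: STATUS_FOR_OUTCOME[outcome], outcome, outcomeReason };
}
```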
82
+ ## Providers (3)
83
+
84
+ | Tool | Purpose | Required inputs | Notes |
85
+ |---|---|---|---|
86
+ | `vgxness_provider_status` | Report provider health for a project/scope. | `ProviderHealthInput` shape (project, scope, workspaceRoot, providerAdapter) | Read-only. |
87
+ | `vgxness_provider_doctor` | Run provider health checks with structured JSON output. | same as `status` | |
88
+ | `vgxness_provider_change_plan` | Compose a read-only provider config change plan. | `ProviderChangePlanInput` shape (`provider`: `opencode`/`claude`/`antigravity`/`custom`; `changeType`: `opencode-mcp-install`/`setup`/`install`/`config-preparation`; `workspaceRoot`; `payloadMode`) | Does not write provider config. See [Providers](./providers.md). |
89
+
90
+ ## Verification (1)
91
+
92
+ | Tool | Purpose | Required inputs |
93
+ |---|---|---|
94
+ | `vgxness_verification_plan` | Recommend a verification plan for a change type. | `changeType` (`docs-only`/`test-only`/`cli`/`mcp`/`sdd-storage`/`provider-setup`/`package-release`/`workflow-runs`) |
95
+
96
+ ## Usage patterns
97
+
98
+ Starting, resuming, and ending a session:
99
+
100
+ ```text
101
+ vgxness_session_start { project: "vgxness", directory: "/path" }
102
+ vgxness_session_append_activity { sessionId, actor, kind, payloadJson }
103
+ vgxness_session_close { sessionId, actor, summary }
104
+ ```
105
+
106
+ Inspecting SDD before doing any work:
107
+
108
+ ```text
109
+ vgxness_sdd_cockpit { project: "vgxness", change: "new-flow" }
110
+ vgxness_sdd_get_readiness { project: "vgxness", change: "new-flow", phase: "proposal" }
111
+ ```
112
+
113
+ The cockpit returns `aggregateBlockers`; if any are `unaccepted-phase`, the agent should not run the next phase until a human accepts the prerequisite.
114
+
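An agent could run the client-side check below on the cockpit result before starting the next phase. It encodes only the rule stated above, that `unaccepted-phase` blocks progress. `CockpitSlice`, the optional `phase` field, and `mayRunNextPhase` are hypothetical, and other blocker kinds may also warrant stopping.

```typescript
type BlockerKind = "missing-topic-key" | "unaccepted-phase" | "legacy-artifact" | "readiness";

// Hypothetical minimal slice of SddCockpit; only `aggregateBlockers` and the kinds are documented.
interface CockpitBlocker {
  kind: BlockerKind;
  phase?: string;
}

interface CockpitSlice {
  aggregateBlockers: CockpitBlocker[];
}

// Stop before the next phase while any prerequisite is still unaccepted.
function mayRunNextPhase(cockpit: CockpitSlice): { allowed: boolean; waitingOn: string[] } {
  const waitingOn = cockpit.aggregateBlockers
    .filter((b) => b.kind === "unaccepted-phase")
    .map((b) => b.phase ?? "(unknown phase)");
  return { allowed: waitingOn.length === 0, waitingOn };
}
```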
115
+ Recording a phase with explicit human acceptance:
116
+
117
+ ```text
118
+ vgxness_sdd_save_artifact { project, change, phase: "proposal", content: "..." }
119
+ # human opens the harness and runs:
120
+ vgxness_sdd_accept_artifact { project, change, phase: "proposal", acceptedBy: { type: "human", id: "uziel" }, rationale: "Reviewed proposal." }
121
+ ```
122
+
123
+ Running a planned operation:
124
+
125
+ ```text
126
+ vgxness_run_start { project, userIntent, workflow: "sdd", phase: "apply-progress", selectedAgentId, providerAdapter, model }
127
+ vgxness_run_preflight { runId, category: "edit", operation: "apply_patch", workspaceRoot, targetPath, payload }
128
+ # if decision is "ask", wait for human approval; on approval, the harness resumes via run_resume_inspect + run_resume_gate.
129
+ vgxness_run_checkpoint { runId, label: "after-step-1", state: {...} }
130
+ vgxness_run_finalize { runId, outcome: "success", outcomeReason: "..." }
131
+ ```
132
+
133
+ ## Safety boundary
134
+
135
+ MCP tools do not:
136
+
137
+ - Write provider config (`.opencode/`, `.claude/`, `opencode.json`, `CLAUDE.md`).
138
+ - Execute providers (`opencode`, `claude`, etc.).
139
+ - Install global memory.
140
+ - Create `openspec/`.
141
+ - Bypass permission policy.
142
+ - Mutate other tools' state outside the VGXNESS SQLite store.
143
+
144
+ Tools that would normally mutate external state (provider install, shell, network) are split into a plan/preview step and an explicit `apply` step. The apply step requires an MCP-side `apply` flag or a separate `vgxness` CLI command (e.g. `vgxness setup apply --yes`).
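Here is a generic sketch of the plan/apply split described above: planning is pure, and applying requires an explicit flag. `ChangePlan`, `planConfigChange`, and `applyConfigChange` are hypothetical names. The writer is injected, so the sketch itself performs no I/O.

```typescript
// Hypothetical plan shape: what would change, computed without writing anything.
interface ChangePlan {
  targetPath: string;
  before: string | null;
  after: string;
}

interface ApplyOptions {
  apply: boolean; // explicit consent; nothing is written without it
}

// Planning is pure: it describes the change and never writes.
function planConfigChange(targetPath: string, current: string | null, desired: string): ChangePlan {
  return { targetPath, before: current, after: desired };
}

// Applying requires the explicit flag; the writer is injected by the caller.
function applyConfigChange(
  plan: ChangePlan,
  opts: ApplyOptions,
  write: (path: string, contents: string) => void,
): "applied" | "skipped" {
  if (!opts.apply) return "skipped";
  write(plan.targetPath, plan.after);
  return "applied";
}
```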
package/docs/prd.md CHANGED
@@ -128,18 +128,7 @@ Minimum MCP capabilities:
128
128
 
129
129
  Candidate MCP tools:
130
130
 
131
- ```text
132
- vgxness_sdd_status
133
- vgxness_sdd_next
134
- vgxness_sdd_ready
135
- vgxness_sdd_save_artifact
136
- vgxness_run_start
137
- vgxness_run_checkpoint
138
- vgxness_run_request_approval
139
- vgxness_agent_resolve
140
- vgxness_skill_payload
141
- vgxness_profile_get
142
- ```
131
+ The current MCP surface covers SDD status and artifacts, memory read/write, sessions, agent resolution and activation, skill payload, OpenCode manager payload, run lifecycle (start, list, get, preflight, checkpoint, finalize, resume inspect, resume gate), provider status/doctor/change-plan, verification plan, governance report, and context cockpit. The full, current list lives in [MCP tools](./mcp.md) and in `SUPPORTED_VGX_MCP_TOOL_NAMES` (`src/mcp/schema.ts`); treat that array as the source of truth.
143
132
 
144
133
  #### CLI
145
134
 
@@ -213,8 +202,8 @@ The same flow must be available through CLI flags for automation and CI-friendly
213
202
 
214
203
  Initial integration targets:
215
204
 
216
- - OpenCode
217
- - Claude Code
205
+ - OpenCode — primary supported provider. The configurator plane renders OpenCode MCP config and manager/SDD agent definitions; the code runtime speaks to any OpenAI-compatible endpoint, including OpenCode's local bridge.
206
+ - Claude Code — preview/manual only as of v1.5.1. The canonical agent manifest declares Claude support as `preview` with an explicit reason: VGXNESS does not install Claude or write `.claude/` or `CLAUDE.md`. Owners of Claude-only workflows must run setup themselves.
218
207
 
219
208
  Pi/`gentle-pi` is a design reference and future adapter target, not part of the first integration target list unless the MVP scope is explicitly expanded.
220
209
 
@@ -358,15 +347,23 @@ The MVP is successful when an advanced individual developer can:
358
347
 
359
348
  ## Open questions
360
349
 
361
- - What is the first integration adapter: OpenCode or Claude Code?
362
- - Should memory storage be per-repo by default, with global memory in a user-level directory?
363
- - What config format should define agents/subagents?
364
- - What config format should define skills and skill versions?
365
- - Which skill improvement proposals should require approval versus automatic rejection?
366
- - Which commands form the first public CLI surface?
367
- - Which TUI framework should be used for the first implementation?
368
- - Should the MCP server run only over stdio for MVP, or also support local HTTP later?
369
- - What is the safest default install command and update channel?
370
- - What privacy/export guarantees are required before public release?
371
- - What is the first sandbox strategy: normal workspace, git worktree, or process/container isolation?
372
- - What trace format should be used for local inspection and future cloud sync?
350
+ Many of the early PRD questions were resolved by v1.5.1. They are tracked here to preserve the historical record.
351
+
352
+ | Question | Resolved as of v1.5.1 |
353
+ |---|---|
354
+ | First integration adapter | OpenCode primary; Claude is preview/manual. |
355
+ | Memory storage scopes | Project memory and personal/global memory live in separate SQLite stores; the `--db` flag and the `VGXNESS_DB_PATH` environment variable override the database path. |
356
+ | Agent config format | Provider-neutral schema in `src/agents/schema.ts`; canonical manifest with validation in `src/agents/canonical-agent-manifest.ts`. |
357
+ | Skill config format | Versioned skills with source metadata (`path`/`url`/`inline`); active version gating in `src/skills/skill-registry-service.ts`. |
358
+ | Skill improvement approval | All proposals require explicit human approval before activation; rejected/cancelled proposals cannot be applied. |
359
+ | First public CLI surface | `vgxness {init, setup, doctor, mcp, sdd, agents, skills, memory, runs, code, opencode}`. See [CLI reference](./cli.md). |
360
+ | First TUI framework | OpenTUI (`@opentui/core`) for the main menu and setup screens; legacy tui-kit screens remain in the tree for backwards compatibility. |
361
+ | MCP transport | Local stdio for MVP; HTTP deferred. |
362
+ | Default install command | `npm install -g vgxness` ships both `vgxness` and `vgx` bins; Bun 1.3.14+ is the canonical installed runtime. |
363
+ | Sandbox strategy | Planning-only in v1.5.1 (`src/runs/sandbox-worktree-planning.ts`); real sandbox/worktree execution is follow-up. Approval-gated edits/shell happen through the code runtime's executors. |
364
+
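The memory-storage row says that both `--db` and `VGXNESS_DB_PATH` override the database location. The sketch below resolves the path under an assumed precedence: the flag first, then the environment variable, then a default. `resolveDbPath` is a hypothetical helper, and the default path is a caller-supplied placeholder.

```typescript
// Hypothetical resolver. Assumed precedence: --db flag, then VGXNESS_DB_PATH, then default.
function resolveDbPath(
  flagDb: string | undefined,
  env: Record<string, string | undefined>,
  defaultPath: string,
): string {
  if (flagDb !== undefined && flagDb !== "") return flagDb;
  const fromEnv = env["VGXNESS_DB_PATH"];
  if (fromEnv !== undefined && fromEnv !== "") return fromEnv;
  return defaultPath;
}
```

Passing the environment in as a plain record, rather than reading a global, keeps the resolution testable.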
365
+ Still open:
366
+
367
+ - Privacy/export guarantees required before public release (depends on release scope).
368
+ - Distributed multi-worker execution model (post-MVP).
369
+ - Web/hosted console surface (post-MVP).
package/docs/providers.md ADDED
@@ -0,0 +1,123 @@
1
+ # Providers
2
+
3
+ VGXNESS is provider-agnostic at the core: the registry stores provider-neutral definitions and adapters translate them into provider-specific config and runtime behavior. This document covers the two adapter layers: the **control-plane adapter** (OpenCode renderer today, with a Claude preview renderer in the tree) and the **code-runtime provider adapter** (OpenAI-compatible + fake).
4
+
5
+ ## Status (v1.5.1)
6
+
7
+ | Provider | Control plane | Code runtime | Notes |
8
+ |---|---|---|---|
9
+ | OpenCode | `managed` | n/a (target) | Primary supported provider. The configurator renders OpenCode MCP config and manager/SDD agent definitions into the chosen scope. |
10
+ | Claude Code | `preview/manual` | n/a | The canonical agent manifest declares Claude support as `preview` with an explicit reason: VGXNESS does not install Claude or write `.claude/` or `CLAUDE.md`. Owners of Claude-only workflows must run setup themselves. |
11
+ | Antigravity | `placeholder` | n/a | Listed in the TUI Installation surface as coming-soon. |
12
+ | Custom / future | `extension` | extension point | Per the [Architecture](./architecture.md) decision, anything not OpenCode or Claude is a custom extension. |
13
+ | OpenAI-compatible | n/a | `openai-compatible-provider-adapter.ts` | Real adapter used by `vgxness code`. Speaks to any OpenAI-compatible endpoint. |
14
+ | Fake | n/a | `fake-provider-adapter.ts` | Deterministic, offline; for unit tests and CI. |
15
+
16
+ ## Control-plane adapter contract
17
+
18
+ The control-plane adapter takes a root agent plus optional registered subagents and returns previewable artifacts. It does not mutate the registry, write provider config, or call providers.
19
+
20
+ | Type | Purpose |
21
+ |---|---|
22
+ | `ProviderRenderer` | Named renderer for one output format/provider. |
23
+ | `ProviderRenderInput` | A root agent plus optional registered subagents. |
24
+ | `ProviderRenderResult` | Generated artifacts, provider name, `installable: false`, warnings. |
25
+ | `ProviderRenderArtifact` | Relative path, content type, and generated contents. |
26
+
27
+ ### OpenCode renderer
28
+
29
+ `src/agents/renderers/opencode-renderer.ts` renders a single OpenCode config preview with `$schema` and an `agent` object. Top-level agents default to `primary`; subagents render as `subagent`. Agent keys are sanitized deterministically from registry names, and rendering rejects key collisions instead of overwriting generated config.
30
+
31
+ ```bash
32
+ vgxness agents render --provider opencode --project vgxness --name apply-agent
33
+ ```
34
+
35
+ The output is previewable JSON; the renderer does not write to `.opencode/`, `.claude/`, or any user/global provider config.
36
+
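This sketch shows deterministic key sanitization with collision rejection, as described for the OpenCode renderer. The sanitization rule is an assumption: lowercase the name, collapse runs of characters outside `[a-z0-9-]` to `-`, and trim leading and trailing dashes. The real renderer may use a different rule. `sanitizeAgentKey` and `buildAgentKeys` are hypothetical names.

```typescript
// Hypothetical sanitization rule; the real renderer may differ.
function sanitizeAgentKey(name: string): string {
  return name
    .toLowerCase()
    .replace(/[^a-z0-9-]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Map sanitized key -> original registry name, rejecting two names that land on
// the same key instead of letting the later one overwrite the earlier one.
function buildAgentKeys(names: string[]): Map<string, string> {
  const keys = new Map<string, string>();
  for (const name of names) {
    const key = sanitizeAgentKey(name);
    if (key === "") throw new Error(`agent name "${name}" sanitizes to an empty key`);
    const previous = keys.get(key);
    if (previous !== undefined) {
      throw new Error(`agent key collision: "${previous}" and "${name}" both map to "${key}"`);
    }
    keys.set(key, name);
  }
  return keys;
}
```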
37
+ ### JSON renderer
38
+
39
+ `src/agents/renderers/json-renderer.ts` produces a debug/export shape. It includes the matching `adapters.json` as `selectedAdapter` so downstream consumers can replay the same rendering deterministically.
40
+
41
+ ```bash
42
+ vgxness agents render --provider json --project vgxness --name apply-agent
43
+ ```
44
+
45
+ ### Claude renderer (preview)
46
+
47
+ `src/agents/renderers/claude-renderer.ts` exists in the tree as a preview renderer. The shape of an install-safe Claude artifact is not yet finalized, so the renderer is not enabled for end-to-end install flows.
48
+
49
+ ### OpenCode injection preview
50
+
51
+ `OpenCodeInjectionPreviewService` (in `src/providers/opencode/`) composes existing read-only outputs into a single envelope:
52
+
53
+ | Output | Source |
54
+ |---|---|
55
+ | `providerArtifacts` | OpenCode renderer for the selected agent and registered subagents. |
56
+ | `skillPayload` | Skill registry payload builder for the selected SDD phase and OpenCode adapter. |
57
+ | `sdd` | SDD workflow status and readiness for the selected project/change/phase. |
58
+ | `context` and `safety` | OpenCode preview layer metadata for future OpenCode/MCP/hook callers. |
59
+
60
+ The envelope is always `installable: false` and `readOnly: true`. It does not execute OpenCode, install hooks, create MCP servers, create runs, record skill usage, or touch provider/user/global config. Future live injection should build on this contract only after a separate approved change defines execution, hook, or MCP safety rules.
61
+
62
+ ```bash
63
+ vgxness opencode preview --provider opencode --agent apply-agent --project vgxness --change checkout-flow --phase apply-progress
64
+ ```
65
+
66
+ ## Code-runtime provider adapter
67
+
68
+ The code runtime speaks to a model through `CodeProviderAdapter` (`src/code/providers/provider-adapter.ts`):
69
+
70
+ ```ts
71
+ export interface CodeProviderAdapter {
72
+   readonly id: string;
73
+   readonly displayName: string;
74
+   readonly capabilities: ProviderCapabilities;
75
+   createResponse(request: ProviderRequest): Promise<ProviderResponse>;
76
+   streamResponse?(request: ProviderRequest): AsyncIterable<ProviderStreamEvent>;
77
+   countTokens?(input: ProviderTokenInput): Promise<TokenUsageEstimate>;
78
+   diagnostics?(model: string): Promise<ProviderDiagnostics>;
79
+ }
80
+ ```
81
+
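To show how small the contract is, here is a deterministic offline adapter in the spirit of `fake-provider-adapter.ts`. The request, response, capability, and stream-event types are simplified stand-ins invented for this sketch; the real ones live in `provider-adapter.ts`. The optional `countTokens` and `diagnostics` methods are omitted. `EchoProviderAdapter` is hypothetical.

```typescript
// Simplified, hypothetical stand-ins for the types CodeProviderAdapter references.
interface ProviderCapabilities {
  streaming: boolean;
  tokenCounting: boolean;
}
interface ProviderRequest {
  model: string;
  messages: { role: string; content: string }[];
}
interface ProviderResponse {
  text: string;
}
type ProviderStreamEvent = { type: "text-delta"; text: string } | { type: "done" };

// Subset of the interface above (countTokens and diagnostics omitted).
interface CodeProviderAdapter {
  readonly id: string;
  readonly displayName: string;
  readonly capabilities: ProviderCapabilities;
  createResponse(request: ProviderRequest): Promise<ProviderResponse>;
  streamResponse?(request: ProviderRequest): AsyncIterable<ProviderStreamEvent>;
}

// Hypothetical deterministic adapter: same request in, same response out, no network.
class EchoProviderAdapter implements CodeProviderAdapter {
  readonly id = "echo";
  readonly displayName = "Echo (offline)";
  readonly capabilities: ProviderCapabilities = { streaming: true, tokenCounting: false };

  async createResponse(request: ProviderRequest): Promise<ProviderResponse> {
    const last = request.messages[request.messages.length - 1];
    return { text: `echo: ${last ? last.content : ""}` };
  }

  // Streams the same text in 4-character chunks so the deltas reassemble exactly.
  async *streamResponse(request: ProviderRequest): AsyncIterable<ProviderStreamEvent> {
    const { text } = await this.createResponse(request);
    for (let i = 0; i < text.length; i += 4) {
      yield { type: "text-delta", text: text.slice(i, i + 4) };
    }
    yield { type: "done" };
  }
}
```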
82
+ Errors are surfaced through `CodeProviderError` with a `code` of `missing_credentials`, `blocked_credentials`, `provider_error`, or `retryable_provider_error`. The `retryable` flag tells callers whether a transient retry is meaningful.
83
+
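A caller might use the `retryable` flag as sketched below: retry transient failures a bounded number of times and fail fast on everything else. `CodeProviderError` here is a local stand-in with the four documented codes. Deriving `retryable` from the code by default and the three-attempt limit are assumptions, and backoff is omitted for brevity.

```typescript
type CodeProviderErrorCode =
  | "missing_credentials"
  | "blocked_credentials"
  | "provider_error"
  | "retryable_provider_error";

// Hypothetical local stand-in for CodeProviderError: a code plus a retryable flag.
class CodeProviderError extends Error {
  constructor(
    readonly code: CodeProviderErrorCode,
    message: string,
    readonly retryable: boolean = code === "retryable_provider_error",
  ) {
    super(message);
    this.name = "CodeProviderError";
  }
}

// Retry only when the error says a transient retry is meaningful.
async function withProviderRetry<T>(call: () => Promise<T>, maxAttempts = 3): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      const retryable = err instanceof CodeProviderError && err.retryable;
      if (!retryable || attempt >= maxAttempts) throw err;
    }
  }
}
```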
84
+ ### `openai-compatible-provider-adapter.ts`
85
+
86
+ The real adapter. Speaks to any OpenAI-compatible endpoint; credentials come from environment references, never embedded in prompts or reports. The adapter streams responses through `stream-normalizer.ts` and maps messages with `message-mapper.ts`.
87
+
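Credentials come from environment references, never from embedded secrets. The sketch below resolves such a reference, assuming a hypothetical `env:NAME` format. `resolveCredentialRef` and `MissingCredentialsError` are illustrative names. Error messages name the variable but never echo its value.

```typescript
// Hypothetical error carrying the documented `missing_credentials` code.
class MissingCredentialsError extends Error {
  readonly code = "missing_credentials";
}

// Assumption: credentials are configured as "env:VAR_NAME" references.
function resolveCredentialRef(ref: string, env: Record<string, string | undefined>): string {
  const match = /^env:([A-Za-z_][A-Za-z0-9_]*)$/.exec(ref);
  if (match === null) {
    // Refuse anything that is not a reference, so raw secrets never live in config.
    throw new Error("credential must be an env reference like env:NAME");
  }
  const name = match[1] ?? "";
  const value = env[name];
  if (value === undefined || value === "") {
    throw new MissingCredentialsError(`environment variable ${name} is not set`);
  }
  return value;
}
```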
88
+ ### `fake-provider-adapter.ts`
89
+
90
+ Deterministic, offline. Used by `test/code/` and `bun run smoke:opentui-code` to keep CI hermetic. The fake adapter is intentionally thin — it does not model provider quirks, only the contract.
91
+
92
+ ### Adding a new code-runtime provider
93
+
94
+ 1. Implement `CodeProviderAdapter`, surfacing failures as `CodeProviderError`. The interface is small; most of the work is in the request/response shape and the stream normalizer.
95
+ 2. Add credentials handling in `src/code/providers/credentials.ts`. Do not embed secrets; accept environment references only.
96
+ 3. Add a smoke test in `test/code/providers.test.ts` that exercises `createResponse`, `streamResponse`, and `countTokens` (where applicable).
97
+ 4. If the new provider has a different default tool shape, add a normalizer. If the stream events are different, extend `stream-normalizer.ts`.
98
+ 5. Update [Code runtime](./code-runtime.md) and this document with the new adapter id and capabilities.
99
+
100
+ ## OpenCode provider install and doctor
101
+
102
+ Provider install and doctor flows live in `src/mcp/client-install-opencode.ts` and `src/mcp/provider-doctor.ts` and are exposed through the MCP server and the CLI:
103
+
104
+ - `vgxness mcp install opencode --plan` — read-only plan; never writes config.
105
+ - `vgxness mcp install opencode --yes` — explicit write path; creates a backup first.
106
+ - `vgxness mcp doctor opencode` — JSON report of provider health.
107
+ - `vgxness provider status` / `vgxness provider doctor` (planned CLI) — same shape through the operator surface.
108
+ - `vgxness setup rollback --backup <path>` — restores a previous OpenCode config byte-for-byte after validation.
109
+
110
+ Writes happen only through `apply` with explicit consent. Plans, status, doctor, change-plan, and previews are read-only by contract.
111
+
112
+ ## Safety boundary
113
+
114
+ Adapters and renderers must not:
115
+
116
+ - Write provider config (`.opencode/`, `.claude/`, `opencode.json`, `CLAUDE.md`).
117
+ - Call providers (`opencode`, `claude`, etc.) during preview or status.
118
+ - Install global memory.
119
+ - Create `openspec/`.
120
+ - Bypass permission policy.
121
+ - Mutate other tools' state outside the VGXNESS SQLite store.
122
+
123
+ Adapter code is reviewed against the safety boundary tests in `test/mcp/` and `test/agents/provider-renderer.test.ts` before merge.
package/docs/roadmap.md ADDED
@@ -0,0 +1,88 @@
1
+ # Roadmap
2
+
3
+ This document tracks planned work that is not yet shipped. The [Architecture](./architecture.md) document describes what is built; this is what is next. Items are grouped by area and ordered roughly by impact and dependency.
4
+
5
+ ## Runtime execution
6
+
7
+ The lifecycle, policy recording, approval records, and retry policy are all in place. What is still test-only is the actual executor that turns a reserved attempt into a real-world side effect.
8
+
9
+ - **Real provider/tool executor.** The current `RunService.executeOperation(...)` takes an injected `RunOperationExecutor`; so far only fake/deterministic executors exist, and they are used in tests. The plan is to ship a sandboxed executor that respects the `ExecutionIsolationPlan` (`workspace`/`git-worktree`/`process-sandbox`) and the SDD phase permission matrix.
10
+ - **CLI/MCP orchestration for `resume-after-approval`.** Once the executor is real, the `vgxness_run_resume_inspect` and `vgxness_run_resume_gate` tools need an `apply` partner that calls the executor only after a human resolves the pending approval.
11
+ - **Sandbox/worktree execution strategies.** `src/runs/sandbox-worktree-planning.ts` produces a plan; the plan needs to materialize into a real worktree or sandbox. Symlink and realpath hardening is already in the policy evaluator; the next step is wiring it to a runner.
12
+ - **Richer verification evidence summaries.** Verification currently records `pass`/`fail`/`skipped`; the next step is to summarize per-task verification into the run, task list, and SDD cockpit so blocked phases have concrete evidence.
13
+
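Injecting the executor is what keeps `RunService` testable with fakes today. The sketch below shows that seam with one extra guard: the executor declares which isolation strategies it supports, and the caller refuses to run rather than silently downgrading to the workspace. The simplified shapes for `ExecutionIsolationPlan` and `RunOperationExecutor`, and the standalone `executeOperation`, are hypothetical.

```typescript
type IsolationStrategy = "workspace" | "git-worktree" | "process-sandbox";

// Hypothetical simplified shapes; the real types carry more detail.
interface ExecutionIsolationPlan {
  strategy: IsolationStrategy;
  root: string;
}

interface RunOperation {
  runId: string;
  operation: string;
}

interface RunOperationExecutor {
  readonly supports: readonly IsolationStrategy[];
  execute(op: RunOperation, plan: ExecutionIsolationPlan): Promise<{ ok: boolean }>;
}

// Refuse to run when the injected executor cannot honor the planned isolation.
async function executeOperation(
  op: RunOperation,
  plan: ExecutionIsolationPlan,
  executor: RunOperationExecutor,
): Promise<{ ok: boolean }> {
  if (!executor.supports.includes(plan.strategy)) {
    throw new Error(`executor cannot honor isolation strategy "${plan.strategy}"`);
  }
  return executor.execute(op, plan);
}
```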
14
+ ## Code runtime
15
+
16
+ The four modes (`inspect`/`plan`/`craft-preview`/`craft`) and the 19 tools are in. What is left is provider coverage, ergonomics, and the OpenTUI shell promotion.
17
+
18
+ - **Native Anthropic provider.** The code runtime speaks to any OpenAI-compatible endpoint. A native Anthropic adapter would remove the need to route Claude through an OpenAI-compatible bridge.
19
+ - **More providers.** The adapter contract is small and provider-neutral; additional adapters (Anthropic, Google, local model servers) can be added incrementally.
20
+ - **`vgxcode` OpenTUI shell promotion.** The shell is currently root-owned during development. To promote it, we need: deterministic event contracts, snapshot tests, and a packaging path that does not require `bun src/cli/tui/opentui/code/index.ts`.
21
+ - **Workspace-executor for the `craft` mode.** Workspace mutations (`apply_patch`, `create_file`, `update_file`, `delete_file`) currently use the `WorkspaceToolExecutor` with policy gating. The next step is a real file-system backend with rollback on failure.
22
+ - **Repair loop.** The `--verification repair` option exists in the CLI but is not wired to a loop in the runtime.
23
+
24
+ ## SDD governance
25
+
26
+ The hard acceptance gate and the cockpit blockers are in. The remaining work is mostly UX and stronger cross-linking.
27
+
28
+ - **Verification evidence linked to the cockpit.** The cockpit currently returns blockers; surfacing the per-phase verification evidence (pass/fail/skipped counts, last-run timestamps) would make "why is this blocked" answers more useful.
29
+ - **Per-phase model/profile routing in the cockpit.** The manager profile overlay exists. The cockpit should recommend a model for the next phase based on the active overlay.
30
+ - **Migration of `openspec/`-style workflows.** Some users bring artifacts from other tools. A formal `import` path through the artifact portability service would help, but it should not silently convert external artifacts into accepted SDD phases.
31
+
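The per-phase evidence the cockpit could surface (pass/fail/skipped counts and last-run timestamps) is a simple fold over verification records. The sketch below assumes a hypothetical `VerificationRecord` shape with ISO-8601 UTC timestamps.

```typescript
type VerificationResult = "pass" | "fail" | "skipped";

// Hypothetical record shape; timestamps are ISO-8601 UTC strings.
interface VerificationRecord {
  phase: string;
  result: VerificationResult;
  recordedAt: string;
}

interface PhaseEvidence {
  pass: number;
  fail: number;
  skipped: number;
  lastRunAt: string;
}

// Fold raw records into per-phase counts plus the latest timestamp.
function summarizeEvidence(records: VerificationRecord[]): Map<string, PhaseEvidence> {
  const byPhase = new Map<string, PhaseEvidence>();
  for (const r of records) {
    const e = byPhase.get(r.phase) ?? { pass: 0, fail: 0, skipped: 0, lastRunAt: r.recordedAt };
    e[r.result] += 1;
    // Same-format UTC ISO-8601 strings order correctly under string comparison.
    if (r.recordedAt > e.lastRunAt) e.lastRunAt = r.recordedAt;
    byPhase.set(r.phase, e);
  }
  return byPhase;
}
```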
32
+ ## Skills and agents
33
+
34
+ The registries, version model, payloads, and improvement-proposal lifecycle are in.
35
+
36
+ - **Skill evaluation harness.** `013_skill_evaluation_scenarios.sql` is the storage; the runtime that runs scenarios against a proposed skill version is not yet built.
37
+ - **Skill improvement suggestion agent.** Skill proposals today are user-driven. A trace-driven candidate detector is planned but explicitly out of scope per the "no silent skill mutation" rule.
38
+ - **Canonical manifest v7.** v6 is in the tree (WIP commit). v7 would address the uncommitted `canonical-agent-manifest.ts` and `canonical-agent-projection.ts` work and ship a stable, validated manifest as the source of truth.
39
+
40
+ ## Storage and portability
41
+
42
+ SQLite, scopes, migrations, and the run snapshot export are in.
43
+
44
+ - **`import` path for SDD artifacts.** The `ArtifactPortabilityService` exports; an import path that re-creates acceptance records with explicit human re-acceptance is planned.
45
+ - **Sensitive-data redaction during export.** `src/export/redaction.ts` exists. Wiring it as a default into the export path and a CLI flag (`--redact`) is the next step.
46
+ - **Database upgrade tooling.** Forward-only migrations work; downgrade requires a snapshot. A small CLI helper for "is my DB on the latest migration?" would be useful for support.
47
+
48
+ ## TUI
49
+
50
+ The main menu and setup screens are in place, built on OpenTUI. The dashboard directory is empty in the current tree.
51
+
52
+ - **TUI dashboard screen.** A real-time view of active runs, pending approvals, and SDD blockers. It should match the cockpit surface exposed over MCP.
53
+ - **TUI runs screen.** A run list with drill-down into a single run's events, approvals, and attempts, offering only safe, read-only actions.
54
+ - **TUI approvals screen.** Resolve pending approvals directly from the TUI, with the same audit trail as `vgxness_sdd_accept_artifact`.
55
+
56
+ ## Observability and evaluation
57
+
58
+ 95 test files cover the eval targets in v1.5.1.
59
+
60
+ - **Eval gate.** A `vgxness verify --evals` runner that orchestrates a quality gate beyond `bun run verify`. The architecture document lists 11 eval targets; lifting them into a single runner would close the loop.
61
+ - **Token/cost tracking.** A nice-to-have that the trace model already supports structurally (`ProviderTokenInput`).
62
+ - **Timeline export.** `RunSnapshotPackageV1` exists. A `runs timeline <id> --format html|md|json` CLI would make sharing easier.
63
+
64
+ ## Provider coverage
65
+
66
+ - **Claude Code install path.** Preview/manual only today. The renderer exists; an install-safe Claude artifact shape is the blocker.
67
+ - **Antigravity and other custom adapters.** Listed as placeholders; a real adapter would unblock those workflows.
68
+ - **Provider doctor upgrades.** Doctor checks could surface real config drift, not just the presence of expected files.
69
+
70
+ ## Out of scope (still)
71
+
72
+ These remain future expansion areas per the [PRD](./prd.md):
73
+
74
+ - Cloud sync across machines.
75
+ - Team/shared memory spaces.
76
+ - Hosted web console.
77
+ - Distributed agent workers.
78
+ - Hosted observability and evaluation UI.
79
+ - Skill marketplace or shared skill catalog.
80
+
81
+ ## How work gets into this list
82
+
83
+ When the architecture drifts from the code, or when a real product gap is identified, the right path is:
84
+
85
+ 1. Open an SDD change with `vgxness sdd status --project <project> --change <id>` (or a fresh change id).
86
+ 2. Move the item from here to an `explore` artifact.
87
+ 3. Walk the SDD phases: `explore → proposal → spec → design → tasks → apply-progress → verify → archive`.
88
+ 4. Once shipped, retire the item from this roadmap and update [Architecture](./architecture.md) and the [CHANGELOG](../../CHANGELOG.md) to reflect the new state.
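Step 3 walks the SDD phases in a fixed order. As a small illustration only, that order can be encoded as a readonly tuple. `sddPhases` and `nextPhase` are hypothetical names, and the sketch deliberately ignores the acceptance gate and cockpit blockers, which decide whether a change may actually advance.

```typescript
// Hypothetical encoding of the phase order listed in step 3; not VGXNESS code.
const sddPhases = [
  "explore", "proposal", "spec", "design", "tasks", "apply-progress", "verify", "archive",
] as const;

type SddPhase = (typeof sddPhases)[number];

// Returns the phase that follows `current`, or undefined after `archive`.
function nextPhase(current: SddPhase): SddPhase | undefined {
  const index = sddPhases.indexOf(current);
  return sddPhases[index + 1];
}
```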
package/docs/safety.md ADDED
@@ -0,0 +1,147 @@
1
+ # Safety model
2
+
3
+ VGXNESS treats safety as a core domain, not as adapter-specific behavior. This document describes the three layers that make up the safety model: the **policy evaluator** for the control plane, the **per-SDD-phase permission matrix**, and the **code runtime approval flow** with redactors.
4
+
5
+ If a tool, surface, or new provider wants to take an action, the action must:
6
+
7
+ 1. Be classified into a permission category.
8
+ 2. Resolve through the policy evaluator or the per-SDD-phase matrix.
9
+ 3. If the decision is `ask`, surface an approval prompt through the configured approval channel.
10
+ 4. Record the decision in the run event stream regardless of outcome.
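The four steps can be read as a single pipeline. The following is a minimal sketch under stated assumptions, not VGXNESS code: `ActionRequest`, `handleAction`, `resolveDecision`, `askHuman`, and the in-memory `events` array are hypothetical stand-ins for the classifier, the policy evaluator or phase matrix, the approval channel, and the run event stream. The recorded statuses reuse the names from the decisions described in this document.

```typescript
// Hypothetical sketch of the four-step safety pipeline; not the VGXNESS API.
type Decision = "allow" | "ask" | "deny";
type Category = "read" | "edit" | "shell" | "secrets"; // subset, for brevity

interface ActionRequest {
  name: string;
  category: Category; // step 1: the action is classified up front
}

interface RecordedEvent {
  action: string;
  decision: Decision;
  status: "succeeded" | "failed" | "pending-approval" | "blocked";
}

function handleAction(
  request: ActionRequest,
  resolveDecision: (category: Category) => Decision, // step 2: evaluator or phase matrix
  askHuman: (request: ActionRequest) => void, // step 3: approval channel (fire-and-forget here)
  execute: () => boolean, // returns true on success
  events: RecordedEvent[], // step 4: stand-in for the run event stream
): RecordedEvent {
  const decision = resolveDecision(request.category);
  let status: RecordedEvent["status"];
  if (decision === "allow") {
    status = execute() ? "succeeded" : "failed";
  } else if (decision === "ask") {
    askHuman(request); // the operation pauses until the approval is resolved
    status = "pending-approval";
  } else {
    status = "blocked"; // the executor is never invoked
  }
  const event: RecordedEvent = { action: request.name, decision, status };
  events.push(event); // recorded regardless of outcome
  return event;
}
```

The important property is that every branch reaches `events.push`, so a denied or pending action leaves the same audit trail as an executed one.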
11
+
12
+ ## Permission categories
13
+
14
+ The category list lives in `src/permissions/schema.ts` (`permissionCategories`). The default policy (`defaultPermissionPolicy` in `src/permissions/policy-evaluator.ts`) defines the baseline decision for each.
15
+
16
+ | Category | Default | Risky | Notes |
17
+ |---|---|---|---|
18
+ | `read` | `allow` | no | Files, memory, artifacts. Allowed only when the path stays inside `workspaceRoot`. |
19
+ | `edit` | `ask` | yes | Write/patch/modify files. |
20
+ | `implementation-edit` | `ask` | yes | Edits tied to an SDD `apply-progress` phase. |
21
+ | `spec-write` | `ask` | yes | Writes to a spec artifact. |
22
+ | `design-write` | `ask` | yes | Writes to a design artifact. |
23
+ | `task-write` | `ask` | yes | Writes to a task artifact. |
24
+ | `shell` | `ask` | yes | Commands, scripts, package managers. |
25
+ | `test-run` | `ask` | yes | Test execution. |
26
+ | `install` | `ask` | yes | Dependency installation mutates local state. |
27
+ | `network` | `ask` | yes | Web fetch, API calls, package downloads. |
28
+ | `git` | `ask` | yes | Status, diff, branch, commit. |
29
+ | `git-write` | `ask` | yes | Push, merge, branch mutation. |
30
+ | `memory` | `ask` | yes | Memory read/write/search. |
31
+ | `memory-write` | `ask` | yes | Memory upsert/update. |
32
+ | `external-directory` | `deny` | yes | Access outside project/user-approved roots. Cannot be relaxed by agent overrides. |
33
+ | `provider-tool` | `ask` | yes | Opaque adapter/provider tool calls. |
34
+ | `secrets` | `deny` | yes | Environment variables, credentials, tokens. |
35
+
36
+ The risky categories (`isRiskyPermissionCategory`) get a forced `ask` even when an agent override would otherwise allow the category.
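To make the override rules concrete, here is a hypothetical restatement of the table as data, plus a resolver that applies the two override constraints the table states: risky categories never resolve to a silent `allow`, and `external-directory` cannot be relaxed. The names `defaultPolicy`, `isRisky`, and `effectiveDecision` are illustrative. This is not the source of `defaultPermissionPolicy` or `isRiskyPermissionCategory`.

```typescript
// Hypothetical restatement of the default policy table; not the actual source.
type Decision = "allow" | "ask" | "deny";

const defaultPolicy = {
  read: "allow",
  edit: "ask",
  "implementation-edit": "ask",
  "spec-write": "ask",
  "design-write": "ask",
  "task-write": "ask",
  shell: "ask",
  "test-run": "ask",
  install: "ask",
  network: "ask",
  git: "ask",
  "git-write": "ask",
  memory: "ask",
  "memory-write": "ask",
  "external-directory": "deny",
  "provider-tool": "ask",
  secrets: "deny",
} as const;

type Category = keyof typeof defaultPolicy;

// In the table, every category except `read` is marked risky.
function isRisky(category: Category): boolean {
  return category !== "read";
}

function effectiveDecision(category: Category, agentOverride?: Decision): Decision {
  const base: Decision = defaultPolicy[category];
  if (agentOverride === undefined) return base;
  // external-directory cannot be relaxed by agent overrides.
  if (category === "external-directory") return "deny";
  // An override that would allow a risky category is forced down to `ask`.
  if (agentOverride === "allow" && isRisky(category)) return "ask";
  // Overrides that tighten (ask/deny) are applied as given.
  return agentOverride;
}
```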
37
+
38
+ ## Decisions
39
+
40
+ Every permission request resolves to one of three decisions:
41
+
42
+ - `allow` — execute and record `succeeded` or `failed`.
43
+ - `ask` — create a pending approval record, record `pending-approval`, and pause until the approval is resolved.
44
+ - `deny` — record `blocked` and do not invoke the executor.
45
+
46
+ In addition, the SDD-phase permission matrix uses four `PermissionMode` values:
47
+
48
+ - `allow` — pass through.
49
+ - `audit` — log and pass through.
50
+ - `require-preflight` — must run preflight and receive an `allow` decision before the operation executes.
51
+ - `deny` — block.
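The four modes differ mainly in whether the operation proceeds and whether anything extra happens first. In the minimal sketch below, `applyMode`, `runPreflight`, and the `auditLog` array are hypothetical names, and the preflight is modeled as a callback that returns a decision. Only the mode strings come from this document.

```typescript
// Hypothetical sketch of PermissionMode semantics; not VGXNESS code.
type PermissionMode = "allow" | "audit" | "require-preflight" | "deny";
type PreflightDecision = "allow" | "ask" | "deny";

interface ModeOutcome {
  proceed: boolean;
  audited: boolean;
}

function applyMode(
  mode: PermissionMode,
  runPreflight: () => PreflightDecision,
  auditLog: string[],
  operation: string,
): ModeOutcome {
  switch (mode) {
    case "allow":
      return { proceed: true, audited: false };
    case "audit":
      auditLog.push(operation); // log, then pass through
      return { proceed: true, audited: true };
    case "require-preflight":
      // Only an explicit `allow` from preflight lets the operation execute directly.
      return { proceed: runPreflight() === "allow", audited: false };
    case "deny":
      return { proceed: false, audited: false };
  }
}
```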
52
+
53
+ ## Per-SDD-phase permission matrix
54
+
55
+ The matrix (`sddPhasePermissionMatrix`, version `sdd-phase-permissions-v1`) assigns a mode to every (phase, category) pair. The matrix version is exported as a constant so consumers can compare matrix versions.
56
+
57
+ | Phase | Distinctive modes (vs. the planning baseline) |
58
+ |---|---|
59
+ | `explore`, `proposal`, `spec`, `design`, `tasks`, `archive` | Planning baseline. Reads allowed; edits denied; spec/design/task writes, shell, install, network, git, memory all `require-preflight`; provider-tool `audit`; external-directory and secrets `deny`. |
60
+ | `apply-progress` | Edits and `implementation-edit` are `require-preflight` (instead of `deny`). Shell and `test-run` are `require-preflight`. |
61
+ | `verify` | Edits and `implementation-edit` stay `deny`. Shell and `test-run` are `require-preflight`. |
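Because most phases share the planning baseline, the matrix can be pictured as one baseline plus a small set of per-phase overrides. The sketch below is hypothetical (`planningBaseline`, `phaseOverrides`, and `modeFor` are not the real `sddPhasePermissionMatrix`). It encodes only the cells the table spells out. Categories the table does not mention for a phase (for example `git-write` or `memory-write`) fall back to `deny`, and that fallback is an assumption of the sketch.

```typescript
// Hypothetical baseline-plus-overrides view of the SDD phase matrix.
type PermissionMode = "allow" | "audit" | "require-preflight" | "deny";
type Phase =
  | "explore" | "proposal" | "spec" | "design" | "tasks"
  | "apply-progress" | "verify" | "archive";

const planningBaseline: Record<string, PermissionMode> = {
  read: "allow",
  edit: "deny",
  "implementation-edit": "deny",
  "spec-write": "require-preflight",
  "design-write": "require-preflight",
  "task-write": "require-preflight",
  shell: "require-preflight",
  install: "require-preflight",
  network: "require-preflight",
  git: "require-preflight",
  memory: "require-preflight",
  "provider-tool": "audit",
  "external-directory": "deny",
  secrets: "deny",
};

// Only the cells that differ from the baseline, per the table above.
const phaseOverrides: Partial<Record<Phase, Record<string, PermissionMode>>> = {
  "apply-progress": {
    edit: "require-preflight",
    "implementation-edit": "require-preflight",
    shell: "require-preflight",
    "test-run": "require-preflight",
  },
  verify: {
    shell: "require-preflight",
    "test-run": "require-preflight",
  },
};

function modeFor(phase: Phase, category: string): PermissionMode {
  // Assumed fallback: anything not listed is denied.
  return phaseOverrides[phase]?.[category] ?? planningBaseline[category] ?? "deny";
}
```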
62
+
63
+ Workspace reads are allowed only when the target path stays inside `workspaceRoot`. The evaluator resolves the real path with `realpathSync` to defeat symlink escapes and refuses to relax workspace boundary denials through agent overrides.
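The symlink defense comes down to comparing canonical paths rather than the paths as written. Here is a minimal sketch with Node's `fs.realpathSync` and `path.relative`, assuming a hypothetical `isInsideWorkspace` helper. Treating unresolvable paths (such as missing files) as outside is a conservative choice made for this sketch, not documented evaluator behavior.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical helper: true only when `target` resolves, after following
// symlinks, to `workspaceRoot` itself or to a path beneath it.
function isInsideWorkspace(workspaceRoot: string, target: string): boolean {
  let realRoot: string;
  let realTarget: string;
  try {
    realRoot = fs.realpathSync(workspaceRoot);
    // Relative targets are interpreted against the workspace root.
    realTarget = fs.realpathSync(path.resolve(realRoot, target));
  } catch {
    // Unresolvable paths (for example, missing files) are treated as outside.
    return false;
  }
  const rel = path.relative(realRoot, realTarget);
  if (rel === "") return true; // the root itself
  return rel !== ".." && !rel.startsWith(`..${path.sep}`) && !path.isAbsolute(rel);
}
```

A symlink such as `workspace/escape -> /tmp` resolves to `/tmp`, so `path.relative` yields `..` and the read is refused even though the literal path sits inside the workspace.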
64
+
65
+ ## Code runtime approval flow
66
+
67
+ The code runtime layers a second, finer-grained decision on top of the policy evaluator. Per-tool definitions declare whether a tool is `read`, `confirm`, or `restricted` (see [Code runtime](./code-runtime.md)). The approval coordinator (`src/code/runtime/approval-coordinator.ts`) wires that to a broker and a gateway.
68
+
69
+ ```text
70
+ tool call
71
+    │
72
+    ▼
73
+ PolicyApprovalBroker ──► ConservativePermissionGateway.evaluate(tool, mode, context)
74
+    │                     │
75
+    │                     ├── allow → execute, emit CodeRuntimeEvent(succeeded)
76
+    │                     ├── ask   → ApprovalPrompt
77
+    │                     │           ├── stdio channel: read line from stdin
78
+    │                     │           └── auto channel: injected broker (e.g. MCP client)
79
+    │                     └── deny  → emit CodeRuntimeEvent(blocked), return error
80
+    ▼
81
+ CodeRuntimeEventSink (stream JSONL to consumer; OpenTUI shell, scripts, tests)
82
+ ```
83
+
84
+ The gateway is conservative by default: a non-`read` tool in a non-`apply-progress` phase defaults to `ask`. Even with `--approval-policy allow`, the gateway can re-promote to `ask` for risky categories or workspace boundary issues.
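The ordering of these rules matters: re-promotion to `ask` is checked before the approval policy is consulted, so `--approval-policy allow` cannot bypass it. The sketch below is hypothetical. `evaluateTool` and `GatewayContext` are illustrative names, the `ask` value for the approval policy is assumed, and deny paths (for example for `restricted` tools) are out of scope here. The real `ConservativePermissionGateway.evaluate(tool, mode, context)` signature differs.

```typescript
// Hypothetical sketch of the conservative gateway ordering; not VGXNESS code.
type ToolKind = "read" | "confirm" | "restricted";
type GatewayDecision = "allow" | "ask";

interface GatewayContext {
  phase: string;
  approvalPolicy: "ask" | "allow"; // "ask" is an assumed second value
  riskyCategory: boolean;
  boundaryIssue: boolean;
}

function evaluateTool(kind: ToolKind, ctx: GatewayContext): GatewayDecision {
  // Read tools inside the workspace pass straight through.
  if (kind === "read" && !ctx.boundaryIssue) return "allow";
  // Risky categories and boundary problems re-promote to ask, whatever the policy.
  if (ctx.riskyCategory || ctx.boundaryIssue) return "ask";
  // Conservative default: non-read tools outside apply-progress need approval.
  if (ctx.phase !== "apply-progress") return "ask";
  return ctx.approvalPolicy === "allow" ? "allow" : "ask";
}
```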
85
+
86
+ ## Redaction
87
+
88
+ `src/code/reporting/redaction.ts` ships three helpers used by every consumer that emits prompts, reports, checkpoints, transcripts, or memory saves.
89
+
90
+ | Helper | What it does |
91
+ |---|---|
92
+ | `redactSecrets(value: string): string` | Replaces secret-shaped substrings with `[REDACTED]`. Patterns include `*TOKEN=...`, `*SECRET=...`, `*KEY=...`, `*PASSWORD=...`, `sk_*`/`pk_*` keys, `Bearer ...` headers, and PEM private keys. |
93
+ | `redactJson(value: JsonValue): JsonValue` | Recursively walks JSON, redacts string values, and replaces values whose key matches `(token\|secret\|password\|api.?key\|authorization\|credential)` with the literal `"[REDACTED]"`. |
94
+ | `omitSensitiveCommandOutput<T extends JsonValue>(value: T): JsonValue` | For tool results that include `stdout`/`stderr`, replaces those fields with `"[omitted by default]"` unless the consumer explicitly opts in. |
95
+
96
+ These helpers are deterministic and are the only place secret-shaped values should be stripped. Tool results that include shell output must use `omitSensitiveCommandOutput` before they are recorded into the run event stream or rendered into a checkpoint.
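As an illustration of how the three helpers compose, the following is a hypothetical re-implementation that follows the descriptions in the table. The regular expressions are examples and are likely narrower than the shipped patterns in `src/code/reporting/redaction.ts`. The `includeOutput` opt-in parameter on `omitSensitiveCommandOutput` is an assumption, since the documented signature does not show how consumers opt in.

```typescript
// Hypothetical re-implementations for illustration; shipped patterns may differ.
type JsonValue = string | number | boolean | null | JsonValue[] | { [key: string]: JsonValue };

const REDACTED = "[REDACTED]";
const SENSITIVE_KEY = /(token|secret|password|api.?key|authorization|credential)/i;

// [pattern, replacement] pairs; variable names like GITHUB_TOKEN are kept, values are not.
const SECRET_PATTERNS: Array<[RegExp, string]> = [
  [/\b([A-Z0-9_]*(?:TOKEN|SECRET|KEY|PASSWORD))=\S+/g, `$1=${REDACTED}`],
  [/\b(?:sk|pk)_[A-Za-z0-9_]{8,}/g, REDACTED],
  [/Bearer\s+[A-Za-z0-9._~+\/=-]+/g, `Bearer ${REDACTED}`],
  [/-----BEGIN [A-Z ]*PRIVATE KEY-----[\s\S]*?-----END [A-Z ]*PRIVATE KEY-----/g, REDACTED],
];

function redactSecrets(value: string): string {
  return SECRET_PATTERNS.reduce((text, [pattern, replacement]) => text.replace(pattern, replacement), value);
}

function redactJson(value: JsonValue): JsonValue {
  if (typeof value === "string") return redactSecrets(value);
  if (Array.isArray(value)) return value.map(redactJson);
  if (value !== null && typeof value === "object") {
    const out: { [key: string]: JsonValue } = {};
    for (const [key, inner] of Object.entries(value)) {
      // Sensitive keys lose their whole value; other values are walked recursively.
      out[key] = SENSITIVE_KEY.test(key) ? REDACTED : redactJson(inner);
    }
    return out;
  }
  return value; // numbers, booleans, null
}

function omitSensitiveCommandOutput(value: JsonValue, includeOutput = false): JsonValue {
  if (includeOutput || value === null || typeof value !== "object" || Array.isArray(value)) {
    return value;
  }
  const out: { [key: string]: JsonValue } = { ...value };
  for (const field of ["stdout", "stderr"]) {
    if (field in out) out[field] = "[omitted by default]";
  }
  return out;
}
```

In this sketch a tool result would pass through `omitSensitiveCommandOutput` first and `redactJson` second, so shell output is dropped before any pattern matching is attempted on it.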
97
+
98
+ ## Approval lifecycle
99
+
100
+ Approvals are first-class records linked to a permission-decision event:
101
+
102
+ ```text
103
+ executeOperation(...)
104
+    │
106
+    ▼
107
+ evaluate permission
108
+    │
109
+    ├── allow → no approval record, executor runs
110
+    ├── deny  → no approval record, executor blocked
111
+    └── ask   → pending approval record + reserved operation attempt
112
+          │
113
+          ▼
114
+        human resolves approval (approved | rejected | cancelled)
115
+          │
116
+          ▼
117
+        resumeApprovedOperation({ approvalId, executor })
118
+          │
119
+          ▼
120
+        reserved attempt → succeeded | failed | abandoned
120
+ ```
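The `ask` branch of the lifecycle can be modeled as two small transitions: resolving the approval, then consuming the reserved attempt exactly once. The in-memory sketch below is hypothetical. `PendingOperation`, `createPending`, and `resolveApproval` are illustrative, and `resumeApprovedOperation` here only mirrors the documented name. It does not model what happens to the reserved attempt after a rejection or cancellation, which this document does not specify.

```typescript
// Hypothetical in-memory model of the approval lifecycle; not RunService code.
type ApprovalStatus = "pending" | "approved" | "rejected" | "cancelled";
type AttemptStatus = "reserved" | "succeeded" | "failed" | "abandoned";

interface PendingOperation {
  approvalId: string;
  approval: ApprovalStatus;
  attempt: AttemptStatus;
}

// An `ask` decision creates a pending approval and reserves an attempt.
function createPending(approvalId: string): PendingOperation {
  return { approvalId, approval: "pending", attempt: "reserved" };
}

function resolveApproval(op: PendingOperation, outcome: Exclude<ApprovalStatus, "pending">): PendingOperation {
  if (op.approval !== "pending") throw new Error(`approval ${op.approvalId} already resolved`);
  return { ...op, approval: outcome };
}

function resumeApprovedOperation(op: PendingOperation, executor: () => boolean): PendingOperation {
  if (op.approval !== "approved") throw new Error(`approval ${op.approvalId} is ${op.approval}`);
  // The reserved attempt is consumed exactly once.
  if (op.attempt !== "reserved") throw new Error(`attempt already ${op.attempt}`);
  return { ...op, attempt: executor() ? "succeeded" : "failed" };
}
```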
121
+
122
+ Retry policy evaluation is separate from execution. `vgxness_run_resume_gate` evaluates the policy and returns an `OperationRetryDecision` (`allowed`/`reasonCode`/`reason`/attempt metadata). The default policy is `never`, so any prior `reserved`/`succeeded`/`failed`/`abandoned` attempt blocks later resume before the executor is invoked.
123
+
124
+ | Policy | Allows a new attempt after | Always blocks |
125
+ |---|---|---|
126
+ | `never` | Only when there is no previous attempt | `reserved`, `succeeded`, `failed`, `abandoned` |
127
+ | `after-abandoned` | latest attempt is `abandoned` | active `reserved`, `succeeded`, `failed` |
128
+ | `after-failure` | latest attempt is `failed` | active `reserved`, `succeeded`, `abandoned` |
129
+ | `after-failure-or-abandoned` | latest attempt is `failed` or `abandoned` | active `reserved`, `succeeded` |
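The table reduces to one lookup: each policy names the latest-attempt statuses after which a new attempt is permitted, and every other status blocks. The sketch below illustrates the gate under that reading. `evaluateRetry`, `RetryDecisionSketch`, and the `reasonCode` strings are hypothetical and are not the real `OperationRetryDecision` values. Only the `allowed`/`reasonCode`/`reason` field names come from this document.

```typescript
// Hypothetical sketch of retry-policy evaluation; reason codes are illustrative.
type AttemptStatus = "reserved" | "succeeded" | "failed" | "abandoned";
type RetryPolicy = "never" | "after-abandoned" | "after-failure" | "after-failure-or-abandoned";

// Latest-attempt statuses that permit a new attempt; anything else blocks.
const retryableAfter: Record<RetryPolicy, readonly AttemptStatus[]> = {
  never: [],
  "after-abandoned": ["abandoned"],
  "after-failure": ["failed"],
  "after-failure-or-abandoned": ["failed", "abandoned"],
};

interface RetryDecisionSketch {
  allowed: boolean;
  reasonCode: string;
  reason: string;
}

function evaluateRetry(policy: RetryPolicy, latest: AttemptStatus | undefined): RetryDecisionSketch {
  if (latest === undefined) {
    return { allowed: true, reasonCode: "no-previous-attempt", reason: "No attempt has been made yet." };
  }
  if (retryableAfter[policy].includes(latest)) {
    return { allowed: true, reasonCode: "retry-permitted", reason: `Policy ${policy} permits a new attempt after ${latest}.` };
  }
  return { allowed: false, reasonCode: "blocked-by-policy", reason: `Policy ${policy} blocks a new attempt after ${latest}.` };
}
```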
130
+
131
+ `RunService.abandonReservedOperationAttempt({ attemptId, actor, reason })` is recovery-only: it transitions a stuck attempt to `abandoned` and appends an `operation-execution` audit event. It does not call an executor or retry.
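The recovery transition is deliberately narrow. The hypothetical sketch below shows its shape: the status changes from `reserved` to `abandoned`, one `operation-execution` audit event is appended, and no executor is called. `abandonReservedAttempt`, the `Map` store, and the choice to throw for non-reserved attempts are assumptions of this sketch, not documented `RunService` behavior.

```typescript
// Hypothetical sketch of the recovery-only abandon transition; not RunService code.
type AttemptStatus = "reserved" | "succeeded" | "failed" | "abandoned";

interface Attempt {
  id: string;
  status: AttemptStatus;
}

interface AuditEvent {
  type: "operation-execution";
  attemptId: string;
  actor: string;
  reason: string;
}

function abandonReservedAttempt(
  attempts: Map<string, Attempt>,
  audit: AuditEvent[],
  input: { attemptId: string; actor: string; reason: string },
): Attempt {
  const attempt = attempts.get(input.attemptId);
  if (!attempt) throw new Error(`unknown attempt ${input.attemptId}`);
  // Assumed: only a stuck reservation can be abandoned; finished attempts are left alone.
  if (attempt.status !== "reserved") throw new Error(`attempt ${attempt.id} is ${attempt.status}`);
  const updated: Attempt = { ...attempt, status: "abandoned" };
  attempts.set(updated.id, updated);
  audit.push({ type: "operation-execution", ...input });
  // No executor call and no retry happen here.
  return updated;
}
```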
132
+
133
+ ## Safety boundaries (enforced everywhere)
134
+
135
+ - Read-only/preview commands must stay non-mutating: setup plans, MCP setup previews, OpenCode previews, workflow previews, status views, and the natural-language orchestrator preview never write provider config, call providers, or install global memory.
136
+ - Provider config writes (`opencode.json`, `.opencode/`, `.claude/`, `CLAUDE.md`) require explicit consent (`--yes` or an equivalent confirmed flow) plus backup/rollback behavior.
137
+ - The control plane and the code runtime do not create or write `openspec/`. SDD artifacts are stored through the local SQLite artifact service under canonical topic keys.
138
+ - Human acceptance is distinct from artifact presence: `vgxness_sdd_accept_artifact` records explicit human-only acceptance; saving a draft never implies acceptance.
139
+ - Unrelated user work is preserved. Workspace boundary denials cannot be relaxed by agent or subagent overrides.
140
+
141
+ ## Tests
142
+
143
+ The policy and approval flow are covered by:
144
+
145
+ - `test/permissions/policy-evaluator.test.ts` — decisions, defaults, workspace boundary, SDD phase matrix.
146
+ - `test/runs/` — approval records, reserved attempts, retry policy, abandonment, resume inspect/gate.
147
+ - `test/code/` — approval broker, tool definitions, runtime approval flow.