npm - @agent-native/core - Versions diffs - 0.53.0 → 0.54.0 - Mend

@agent-native/core 0.53.0 → 0.54.0


Files changed (99)

package/dist/action.d.ts +40 -1
package/dist/action.d.ts.map +1 -1
package/dist/action.js +69 -2
package/dist/action.js.map +1 -1
package/dist/agent/index.d.ts +1 -0
package/dist/agent/index.d.ts.map +1 -1
package/dist/agent/index.js +1 -0
package/dist/agent/index.js.map +1 -1
package/dist/agent/observational-memory/index.d.ts +6 -6
package/dist/agent/observational-memory/index.js +6 -6
package/dist/agent/observational-memory/index.js.map +1 -1
package/dist/agent/observational-memory/read.d.ts +7 -9
package/dist/agent/observational-memory/read.d.ts.map +1 -1
package/dist/agent/observational-memory/read.js +7 -9
package/dist/agent/observational-memory/read.js.map +1 -1
package/dist/agent/processors.d.ts +146 -0
package/dist/agent/processors.d.ts.map +1 -0
package/dist/agent/processors.js +122 -0
package/dist/agent/processors.js.map +1 -0
package/dist/agent/production-agent.d.ts +10 -0
package/dist/agent/production-agent.d.ts.map +1 -1
package/dist/agent/production-agent.js +101 -0
package/dist/agent/production-agent.js.map +1 -1
package/dist/agent/run-loop-with-resume.d.ts.map +1 -1
package/dist/agent/run-loop-with-resume.js +4 -5
package/dist/agent/run-loop-with-resume.js.map +1 -1
package/dist/agent/tool-call-journal.d.ts +6 -8
package/dist/agent/tool-call-journal.d.ts.map +1 -1
package/dist/agent/tool-call-journal.js +6 -8
package/dist/agent/tool-call-journal.js.map +1 -1
package/dist/agent/types.d.ts +11 -0
package/dist/agent/types.d.ts.map +1 -1
package/dist/agent/types.js.map +1 -1
package/dist/cli/plan-local.d.ts.map +1 -1
package/dist/cli/plan-local.js +129 -4
package/dist/cli/plan-local.js.map +1 -1
package/dist/cli/skills.d.ts.map +1 -1
package/dist/cli/skills.js +38 -3
package/dist/cli/skills.js.map +1 -1
package/dist/coding-tools/run-code.d.ts.map +1 -1
package/dist/coding-tools/run-code.js +18 -2
package/dist/coding-tools/run-code.js.map +1 -1
package/dist/extensions/fetch-tool.d.ts.map +1 -1
package/dist/extensions/fetch-tool.js +80 -15
package/dist/extensions/fetch-tool.js.map +1 -1
package/dist/extensions/web-content.d.ts +61 -0
package/dist/extensions/web-content.d.ts.map +1 -0
package/dist/extensions/web-content.js +468 -0
package/dist/extensions/web-content.js.map +1 -0
package/dist/extensions/web-search-tool.js +3 -3
package/dist/extensions/web-search-tool.js.map +1 -1
package/dist/mcp/build-server.d.ts.map +1 -1
package/dist/mcp/build-server.js +4 -1
package/dist/mcp/build-server.js.map +1 -1
package/dist/provider-api/corpus-jobs.d.ts +80 -0
package/dist/provider-api/corpus-jobs.d.ts.map +1 -1
package/dist/provider-api/corpus-jobs.js +219 -22
package/dist/provider-api/corpus-jobs.js.map +1 -1
package/dist/provider-api/index.d.ts +24 -32
package/dist/provider-api/index.d.ts.map +1 -1
package/dist/provider-api/index.js +28 -1
package/dist/provider-api/index.js.map +1 -1
package/dist/server/agent-chat-plugin.js +1 -1
package/dist/server/agent-chat-plugin.js.map +1 -1
package/dist/server/better-auth-instance.d.ts +7 -0
package/dist/server/better-auth-instance.d.ts.map +1 -1
package/dist/server/better-auth-instance.js +90 -0
package/dist/server/better-auth-instance.js.map +1 -1
package/dist/server/deep-link.d.ts +7 -0
package/dist/server/deep-link.d.ts.map +1 -1
package/dist/server/deep-link.js +13 -2
package/dist/server/deep-link.js.map +1 -1
package/dist/server/index.d.ts +1 -1
package/dist/server/index.d.ts.map +1 -1
package/dist/server/index.js +1 -1
package/dist/server/index.js.map +1 -1
package/dist/templates/default/.agents/skills/actions/SKILL.md +52 -1
package/dist/templates/default/.agents/skills/security/SKILL.md +22 -0
package/dist/templates/workspace-core/.agents/skills/actions/SKILL.md +52 -1
package/dist/templates/workspace-core/.agents/skills/external-agents/SKILL.md +6 -4
package/dist/templates/workspace-core/.agents/skills/observability/SKILL.md +11 -0
package/dist/templates/workspace-core/.agents/skills/security/SKILL.md +22 -0
package/docs/content/actions.md +50 -0
package/docs/content/durable-resume.md +49 -0
package/docs/content/external-agents.md +2 -2
package/docs/content/human-approval.md +101 -0
package/docs/content/observability.md +21 -0
package/docs/content/observational-memory.md +63 -0
package/docs/content/plan-plugin.md +5 -0
package/docs/content/pr-visual-recap.md +4 -3
package/docs/content/processors.md +99 -0
package/docs/content/template-plan.md +78 -14
package/package.json +6 -1
package/src/templates/default/.agents/skills/actions/SKILL.md +52 -1
package/src/templates/default/.agents/skills/security/SKILL.md +22 -0
package/src/templates/workspace-core/.agents/skills/actions/SKILL.md +52 -1
package/src/templates/workspace-core/.agents/skills/external-agents/SKILL.md +6 -4
package/src/templates/workspace-core/.agents/skills/observability/SKILL.md +11 -0
package/src/templates/workspace-core/.agents/skills/security/SKILL.md +22 -0

package/docs/content/observability.md CHANGED

@@ -188,6 +188,27 @@ await putSetting("observability-config", {
 The framework emits `gen_ai.*` semantic convention spans compatible with the OpenTelemetry GenAI spec.
+## OpenTelemetry spans {#otel}
+Separate from the `exporters` config above (which ships the in-house traces to an OTLP endpoint), the agent loop can also emit **live OpenTelemetry spans** for every run, model call, and tool call — so a host that already runs an OTel collector sees agent activity alongside the rest of its distributed traces.
+This layer is **optional and no-op by default**:
+- `@opentelemetry/api` is an **optional dependency**. If it isn't installed, the helpers degrade to silent no-ops — nothing here ever throws into the agent loop.
+- Even when the api package _is_ present, it ships a default no-op tracer. Spans only become real once the **host registers a `TracerProvider`** (via `@opentelemetry/sdk-node` or similar). The framework deliberately does **not** depend on the heavy SDK/exporter packages or register a provider itself — instrumentation is opt-in by the embedding app.
+So the cost when you haven't wired OTel is a couple of cached property reads per call. To turn it on, install the api package plus your SDK and register a provider at server startup the same way you would for any other Node service.
+The agent loop emits three span kinds:
+| Span        | When                       | Attributes                                                        |
+| ----------- | -------------------------- | ----------------------------------------------------------------- |
+| `agent.run` | once per agent run         | `agent.run_id`, `agent.thread_id`, `agent.user_id`, `agent.model` |
+| `tool.call` | once per action invocation | `tool.name`, plus success/error status                            |
+| `llm.call`  | per model call             | timing + OK/error status                                          |
+Spans are finished with OK/ERROR status and record the error message on failure. Zero/sentinel attribute values are pruned so spans aren't cluttered with noise. This OTel layer is purely additive to the in-house `agent_trace_spans` / `agent_trace_summaries` tables that power the dashboard above — both are produced from the same run events.
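The pruning rule mentioned above can be illustrated with a small sketch. It assumes that "zero/sentinel" means `undefined`, `null`, empty strings, `0`, and `NaN`; the diff does not spell out the exact rule, and `pruneSpanAttributes` is a hypothetical name, not a function exported by `@agent-native/core`.

```ts
type AttributeValue = string | number | boolean | null | undefined;

// Hypothetical helper: drop attribute values that carry no information.
// Which values count as "zero/sentinel" is an assumption made for this sketch.
function pruneSpanAttributes(
  attrs: Record<string, AttributeValue>,
): Record<string, string | number | boolean> {
  const kept: Record<string, string | number | boolean> = {};
  for (const key of Object.keys(attrs)) {
    const value = attrs[key];
    if (value === undefined || value === null) continue; // unset
    if (value === "") continue; // empty string
    if (typeof value === "number" && (value === 0 || Number.isNaN(value))) continue;
    kept[key] = value; // note: `false` is kept, it is a real value
  }
  return kept;
}

// Attribute names come from the span table above; the values are made up.
const runAttributes = pruneSpanAttributes({
  "agent.run_id": "run_123",
  "agent.thread_id": "",
  "agent.user_id": undefined,
  "agent.model": "example-model",
});
// runAttributes now holds only "agent.run_id" and "agent.model".
```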
 ## Error reporting (Sentry) {#sentry}
 Server-side errors that escape Nitro route handlers are reported to Sentry when a DSN is configured. Without it the SDK silently no-ops, so it's safe to leave the env vars unset in dev. Browser and server events can go to the same Sentry project; split them into separate projects only when you want operational separation for ownership, volume, quotas, or alert routing.

package/docs/content/observational-memory.md ADDED

@@ -0,0 +1,63 @@
+---
+title: "Observational Memory"
+description: "Background three-tier compaction (recent raw → observations → reflections) that keeps long agent threads cheap and prompt-cache-stable without touching short conversations."
+---
+# Observational Memory
+A long-running agent thread accumulates a huge transcript: every message, every tool call, every result. Replaying that whole history into the model on each turn is expensive and eventually blows the context window. **Observational Memory (OM)** compacts the older part of a long thread into a dated, layered summary so the model still knows what happened — just at a fraction of the token cost — while the most recent turns stay verbatim.
+OM is entirely automatic and owner-scoped. **Short threads are unaffected**: until a thread crosses the first compaction threshold, OM is a no-op and the context is byte-for-byte what it would be without it.
+## The three tiers {#tiers}
+OM represents a long thread as three layers, from most-distilled to most-recent:
+| Tier                    | What it is                                                                                        |
+| ----------------------- | ------------------------------------------------------------------------------------------------- |
+| **Reflections**         | Highest-level, condensed from the observation log once it grows large. The long-arc summary.      |
+| **Observations**        | Dense, dated entries that fold a stretch of raw messages into a compact record of what happened.  |
+| **Recent raw messages** | The last N turns, kept **verbatim** — never folded — so the agent always sees the latest context. |
+On each turn, the read side assembles these into a single self-labeled `[Observational Memory]` block that replaces the raw older prefix, keeps the recent-raw window intact, and tells the model to treat the compacted record as authoritative (don't redo completed work, trust the recorded decisions, names, dates, and status).
+## How compaction runs {#compaction}
+Two passes run as a **fire-and-forget, best-effort** step _after_ a clean turn, so they never add latency to the user-visible response and any failure is swallowed:
+1. **Observer** — once a thread's _unobserved_ messages exceed the observation token threshold, folds them into a single dense observation entry.
+2. **Reflector** — once the persisted observation log itself exceeds the reflection token threshold, condenses the observations into a higher-level reflection.
+Both passes no-op below their thresholds, so calling the compactor after every turn is cheap. Because OM replaces the volatile raw prefix with stable compacted text, it also keeps the prompt **cache-stable** across turns of a long thread.
+OM data lives in the app's own SQL database, scoped to the owner (and org when present) — the same scoping model as the rest of the framework. It is never shared across users.
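As a rough illustration of the two threshold checks and the fire-and-forget step described above, the sketch below uses hypothetical names (`duePasses`, `runInBackground`) and assumes both passes are evaluated against the same snapshot of token counts. The actual compactor in the package may sequence them differently.

```ts
interface ThreadStats {
  unobservedTokens: number; // tokens in messages not yet folded into an observation
  observationLogTokens: number; // tokens in the persisted observation log
}

interface PassThresholds {
  observation: number;
  reflection: number;
}

type Pass = "observer" | "reflector";

// Decide which passes are due. "Exceed" is read as strictly greater than.
function duePasses(stats: ThreadStats, thresholds: PassThresholds): Pass[] {
  const due: Pass[] = [];
  if (stats.unobservedTokens > thresholds.observation) due.push("observer");
  if (stats.observationLogTokens > thresholds.reflection) due.push("reflector");
  return due;
}

// Fire-and-forget: start the task, never await it, swallow any failure.
function runInBackground(task: () => Promise<void>): void {
  void Promise.resolve()
    .then(task)
    .catch(() => {
      /* best-effort: compaction errors must not reach the user-visible turn */
    });
}

const defaults: PassThresholds = { observation: 30000, reflection: 40000 };
const shortThread = duePasses({ unobservedTokens: 1200, observationLogTokens: 0 }, defaults);
const longThread = duePasses({ unobservedTokens: 31000, observationLogTokens: 10000 }, defaults);
// shortThread is empty (both passes no-op); longThread contains only "observer".
```

Because `duePasses` returns an empty list below both thresholds, calling it after every clean turn costs two comparisons.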
+## Configuration {#config}
+Defaults are conservative. An operator can dial compaction at deploy time with `AGENT_NATIVE_OM_*` environment variables (no redeploy of the app code needed); an invalid or missing value always falls back to the named default.
+| Env var                                       | Default | What it controls                                                                       |
+| --------------------------------------------- | ------- | -------------------------------------------------------------------------------------- |
+| `AGENT_NATIVE_OM_OBSERVATION_TOKEN_THRESHOLD` | `30000` | Unobserved-message tokens that trigger the Observer to fold them into one observation. |
+| `AGENT_NATIVE_OM_REFLECTION_TOKEN_THRESHOLD`  | `40000` | Observation-log tokens that trigger the Reflector to condense into a reflection.       |
+| `AGENT_NATIVE_OM_RECENT_RAW_MESSAGE_COUNT`    | `12`    | How many of the most-recent messages stay verbatim (never folded into an observation). |
+The Observer and Reflector output caps (4000 / 2000 tokens) keep a single compaction pass from itself blowing the budget; they are tunable in code via `resolveObservationalMemoryConfig({ ... })` but not env-exposed.
+> [!TIP]
+> Lower the thresholds to compact sooner (cheaper long threads, slightly more summarization); raise them to keep more raw history in context before compacting. Set `AGENT_NATIVE_OM_RECENT_RAW_MESSAGE_COUNT` higher if your workflows need a longer verbatim tail.
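The fallback behaviour for the three environment variables might look like the sketch below. It assumes "invalid" means anything other than a positive integer, and `readOmEnv` / `readPositiveInt` are hypothetical helpers written for illustration; the package's own entry point is `resolveObservationalMemoryConfig`, whose signature is not shown in this diff.

```ts
interface OmThresholds {
  observationTokenThreshold: number;
  reflectionTokenThreshold: number;
  recentRawMessageCount: number;
}

// Defaults copied from the configuration table above.
const OM_DEFAULTS: OmThresholds = {
  observationTokenThreshold: 30000,
  reflectionTokenThreshold: 40000,
  recentRawMessageCount: 12,
};

// Assumption: only positive integers are valid; everything else falls back.
function readPositiveInt(raw: string | undefined, fallback: number): number {
  if (raw === undefined) return fallback;
  const trimmed = raw.trim();
  if (!/^\d+$/.test(trimmed)) return fallback;
  const parsed = Number.parseInt(trimmed, 10);
  return parsed > 0 ? parsed : fallback;
}

function readOmEnv(env: Record<string, string | undefined>): OmThresholds {
  return {
    observationTokenThreshold: readPositiveInt(
      env["AGENT_NATIVE_OM_OBSERVATION_TOKEN_THRESHOLD"],
      OM_DEFAULTS.observationTokenThreshold,
    ),
    reflectionTokenThreshold: readPositiveInt(
      env["AGENT_NATIVE_OM_REFLECTION_TOKEN_THRESHOLD"],
      OM_DEFAULTS.reflectionTokenThreshold,
    ),
    recentRawMessageCount: readPositiveInt(
      env["AGENT_NATIVE_OM_RECENT_RAW_MESSAGE_COUNT"],
      OM_DEFAULTS.recentRawMessageCount,
    ),
  };
}

// One invalid value and one override; the unset variable keeps its default.
const tuned = readOmEnv({
  AGENT_NATIVE_OM_OBSERVATION_TOKEN_THRESHOLD: "not-a-number",
  AGENT_NATIVE_OM_RECENT_RAW_MESSAGE_COUNT: "20",
});
```

In a Node server the same function could be called as `readOmEnv(process.env)`.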
+## When it kicks in {#when}
+OM only changes behavior for threads long enough to have produced at least one observation or reflection. Concretely:
+- A brand-new or short thread: no OM entries yet → the context is the plain transcript, unchanged.
+- A long thread that has crossed the observation threshold: the older prefix is replaced by the compacted `[Observational Memory]` block, the recent-raw tail stays verbatim, and token usage drops substantially.
+The injection is best-effort and boundary-safe — if a safe trim point can't be found (e.g. a pending tool-use/result pair sits at the window edge), OM injects the memory block _additively_ without trimming rather than risk dropping a pending tool result.
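One way to picture the boundary-safety check is the sketch below. The message shape (`toolUseIds`, `toolResultFor`) and the function `findSafeTrimIndex` are assumptions made for illustration only; the diff does not show how the package detects a pending tool-use/result pair.

```ts
// Hypothetical message shape, not the framework's own type.
interface SketchMessage {
  role: "user" | "assistant" | "tool";
  toolUseIds?: string[]; // tool calls requested by this assistant message
  toolResultFor?: string; // the tool call this message answers
}

// Returns the index where the verbatim tail starts, or null when no trim
// should happen (thread too short, or the cut would split a tool call from
// its result, in which case the memory block is injected additively).
function findSafeTrimIndex(
  messages: SketchMessage[],
  recentRawCount: number,
): number | null {
  const cut = messages.length - recentRawCount;
  if (cut <= 0) return null;

  const unanswered = new Set<string>();
  for (const message of messages.slice(0, cut)) {
    for (const id of message.toolUseIds ?? []) unanswered.add(id);
    if (message.toolResultFor !== undefined) unanswered.delete(message.toolResultFor);
  }
  // Any id left over has its result in the tail (or still pending): unsafe.
  return unanswered.size === 0 ? cut : null;
}

const thread: SketchMessage[] = [
  { role: "user" },
  { role: "assistant", toolUseIds: ["call_a"] },
  { role: "tool", toolResultFor: "call_a" },
  { role: "user" },
  { role: "assistant" },
];

const safeCut = findSafeTrimIndex(thread, 2); // cut falls after the tool result
const unsafeCut = findSafeTrimIndex(thread, 3); // cut would separate call_a from its result
```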
+## Related
+- [**Context X-Ray**](/docs/using-your-agent) — inspect what's actually in the live context window.
+- [**Observability**](/docs/observability) — token and cost metrics per run, where OM's savings show up.
+- [**Custom Agents & Teams**](/docs/agent-teams) — long sub-agent runs benefit from the same compaction.

package/docs/content/plan-plugin.md CHANGED

@@ -40,6 +40,11 @@ the Plan MCP connector. They write `plans/<slug>/plan.mdx` plus optional
 npx @agent-native/core@latest plan local preview --dir plans/<slug> --kind plan --open
 ```
+For folders in the current repo, the direct local route includes `?path=...` so
+the local Plan app can keep browser edits saving to the repo folder. The Plan
+app uses `apps.plan.roots[0].path` in `agent-native.json` as the default place
+to save promoted local plans, falling back to `plans/`.
 This keeps plan content out of the Agent-Native Plan database. Hosted sharing,
 comments, screenshots, and plan history are unavailable until you explicitly
 publish later.

package/docs/content/pr-visual-recap.md CHANGED

@@ -223,9 +223,10 @@ The returned URL opens the hosted Plan UI while the browser reads the recap MDX
 from a localhost bridge. Recap content is not written to the hosted Plan
 database, and the URL only works on the machine running the bridge. If you run
 the Plan app locally with the same `PLAN_LOCAL_DIR`, the
-`/local-plans/pr-123-visual-recap` route is also valid. This mode disables the
-hosted sticky PR comment, inline screenshot upload, usage attachment, and
-browser comments until you explicitly publish.
+`/local-plans/pr-123-visual-recap` route is also valid. Repo-backed folders can
+open as `/local-plans/pr-123-visual-recap?path=plans%2Fpr-123-visual-recap`.
+This mode disables the hosted sticky PR comment, inline screenshot upload,
+usage attachment, and browser comments until you explicitly publish.
 ## It's informational, not a gate

package/docs/content/processors.md ADDED

@@ -0,0 +1,99 @@
+---
+title: "In-Loop Processors"
+description: "Loop-internal observer/guardrail hooks that watch the model's streamed output and tool calls mid-run and can abort it — the seam for real-time guardrails and proof-of-done gates."
+---
+# In-Loop Processors
+A `Processor` is a loop-internal **observer/guardrail** for the agent run. It watches the model's streamed output and the tool calls it requests _as the run progresses_, keeps its own scratch state, and can **abort** the run before a "done" is claimed. This is the structural prerequisite for real-time guardrails (block disallowed output mid-stream) and a proof-of-done / coverage gate (inspect what the model is about to do and halt it).
+> [!WARNING]
+> A processor is **configuration**, not a tool, not an action, and not an authoring DSL. Processors only observe, mutate their own stream-scoped state, and `abort()`. They never define app behavior, replace actions, or appear to the model. App operations belong in [actions](/docs/actions).
+## The hooks {#hooks}
+A processor implements any subset of three optional lifecycle hooks (the shape is borrowed from Mastra's output processors):
+| Hook                  | Fires…                                                                | Use it to…                                                  |
+| --------------------- | --------------------------------------------------------------------- | ----------------------------------------------------------- |
+| `processOutputStream` | per streamed chunk (text / thinking deltas) while the model generates | react to output before the full turn lands                  |
+| `processOutputStep`   | once per model response, around tool execution                        | inspect the tool calls the model is about to run; gate them |
+| `processOutputResult` | once at run end, with the final assistant text                        | record a verdict / proof-of-done over the completed answer  |
+Each processor gets its own mutable, run-scoped `state` object that persists across every one of its hook invocations within a single run and is **isolated** from other processors' state.
+```ts
+import type { Processor } from "@agent-native/core";
+const noSecretsInOutput: Processor = {
+  name: "no-secrets",
+  processOutputStream({ part, abort }) {
+    if (part.type === "text" && /sk-live_/.test(part.text)) {
+      abort("Model attempted to emit a live secret token.", {
+        kind: "secret-leak",
+      });
+    }
+  },
+};
+const coverageGate: Processor = {
+  name: "proof-of-done",
+  processOutputStep({ toolCalls, state }) {
+    // Track what the model has actually done this run...
+    for (const call of toolCalls) {
+      (state.ran ??= new Set<string>()).add(call.name);
+    }
+  },
+  processOutputResult({ text, state }) {
+    // ...and record a verdict over the final answer.
+    const ran = state.ran as Set<string> | undefined;
+    state.verdict = ran?.has("run-tests") ? "verified" : "unverified";
+  },
+};
+```
+## Aborting with `TripWire` {#tripwire}
+A hook halts the run by calling `abort(reason, meta?)`, which throws a **`TripWire`**. The loop catches it, emits a single **`tripwire` event**, stops cleanly, and surfaces the reason as the final assistant message.
+```ts
+import { TripWire } from "@agent-native/core";
+```
+The `tripwire` event carries:
+| Field       | Type     | Notes                                                          |
+| ----------- | -------- | -------------------------------------------------------------- |
+| `reason`    | `string` | The human-readable reason passed to `abort`.                   |
+| `processor` | `string` | Name of the processor that aborted, when it declared a `name`. |
+`TripWire` also carries optional structured `meta` and the originating `processor` name for programmatic consumers that `instanceof`-check it. Because a halt is graceful, `processOutputResult` still fires on the (halted) final text so a proof-of-done processor can record its verdict even when the run was aborted.
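The halt flow described above can be sketched without the real package. Everything here is a stand-in: `SketchTripWire`, `SketchProcessor`, and `runSketch` are hypothetical and synchronous for brevity, whereas real hooks may also be async, and the text handed to `processOutputResult` after a halt (here, whatever streamed before the abort) is an assumption.

```ts
// Local stand-in for the TripWire class exported by @agent-native/core.
class SketchTripWire extends Error {
  processor: string | undefined;
  meta: Record<string, unknown> | undefined;
  constructor(reason: string, processor?: string, meta?: Record<string, unknown>) {
    super(reason);
    this.name = "TripWire";
    this.processor = processor;
    this.meta = meta;
    Object.setPrototypeOf(this, SketchTripWire.prototype); // keeps instanceof reliable
  }
}

type State = Record<string, unknown>;
type Abort = (reason: string, meta?: Record<string, unknown>) => never;

interface SketchProcessor {
  name: string;
  processOutputStream?: (ctx: { text: string; state: State; abort: Abort }) => void;
  processOutputResult?: (ctx: { text: string; state: State }) => void;
}

type LoopEvent =
  | { type: "text"; text: string }
  | { type: "tripwire"; reason: string; processor: string | undefined };

function runSketch(chunks: string[], processors: SketchProcessor[]) {
  // One isolated state object per processor, shared across its hooks.
  const chain = processors.map((processor) => ({ processor, state: {} as State }));
  const events: LoopEvent[] = [];
  let text = "";
  let finalMessage = "";
  try {
    for (const chunk of chunks) {
      for (const { processor, state } of chain) {
        const abort: Abort = (reason, meta) => {
          throw new SketchTripWire(reason, processor.name, meta);
        };
        processor.processOutputStream?.({ text: chunk, state, abort });
      }
      text += chunk;
      events.push({ type: "text", text: chunk });
    }
    finalMessage = text;
  } catch (err) {
    if (!(err instanceof SketchTripWire)) throw err; // real errors still propagate
    events.push({ type: "tripwire", reason: err.message, processor: err.processor });
    finalMessage = err.message; // the reason is surfaced as the final message
  }
  // The result hook fires even after a halt.
  for (const { processor, state } of chain) {
    processor.processOutputResult?.({ text, state });
  }
  return { events, finalMessage };
}

const resultTexts: string[] = [];
const guard: SketchProcessor = {
  name: "no-secrets",
  processOutputStream({ text, abort }) {
    if (text.includes("sk-live_")) abort("Model attempted to emit a live secret token.");
  },
  processOutputResult({ text }) {
    resultTexts.push(text);
  },
};

const outcome = runSketch(["Hello ", "sk-live_abc", " ignored"], [guard]);
```

The abort happens before the offending chunk is appended, so only the first chunk is emitted, followed by a single `tripwire` event.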
+## Wiring processors {#wiring}
+Processors are configured in code via the `processors` array on `runAgentLoop`:
+```ts
+await runAgentLoop({
+  engine,
+  model,
+  systemPrompt,
+  tools,
+  messages,
+  actions,
+  send,
+  signal,
+  processors: [noSecretsInOutput, coverageGate],
+});
+```
+**Zero-overhead when unused.** The loop builds the processor chain only when at least one processor is supplied; when `processors` is omitted or empty, none of the seam code runs and the loop is byte-for-byte unchanged. Hooks run in registration order and may be sync or async.
+> [!NOTE]
+> The loop-level seam is the deliverable today and is callable directly by sub-agents, A2A, MCP, and tests. Threading `processors` through the HTTP chat handler (so a per-request resolver can configure them without calling `runAgentLoop` directly) is convenience plumbing that is not yet wired — configure processors at the `runAgentLoop` call site for now.
+## Related
+- [**Durable Resume**](/docs/durable-resume) — how the loop survives interruptions without re-running completed side effects.
+- [**Custom Agents & Teams**](/docs/agent-teams) — sub-agents run the same loop and can carry their own processors.
+- [**Observability**](/docs/observability) — record processor verdicts alongside run traces.

package/docs/content/template-plan.md CHANGED

@@ -97,6 +97,22 @@ connector, so use the Agent-Native CLI path when you want the one-command setup.
 > Plan skills _and_ the connector in one install and auto-updates as the skills
 > improve — see [Plan plugin & marketplace](/docs/plan-plugin).
+### Open Plans inside VS Code {#vscode-extension}
+If you live in VS Code, the Agent Native VS Code extension can open the same
+Plan review surface in a side panel instead of sending you to a separate browser
+tab. Plans tools still return the normal web link, and the MCP metadata also
+includes a VS Code handoff URL:
+```text
+vscode://builderio.agent-native/open?url=<encoded-plan-url>
+```
+The extension handles that URI, opens the decoded Plan URL in a VS Code webview,
+and includes a command to run the existing Agent Native MCP connect flow for VS
+Code / GitHub Copilot. This is especially useful from Claude Code or another
+coding-agent workflow where the plan should stay next to the files being edited.
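Building and decoding the handoff URI shown above is a matter of percent-encoding the Plan URL. The sketch below uses two hypothetical helpers and an example Plan URL on `plans.example.com`; it assumes `url` is the only query parameter, which the diff does not confirm.

```ts
const HANDOFF_PREFIX = "vscode://builderio.agent-native/open?url=";

// Wrap a normal Plan web link in the VS Code handoff URI.
function toVsCodeHandoffUrl(planUrl: string): string {
  return HANDOFF_PREFIX + encodeURIComponent(planUrl);
}

// Recover the Plan URL from a handoff URI, or null if the URI has another shape.
function fromVsCodeHandoffUrl(handoffUrl: string): string | null {
  if (!handoffUrl.startsWith(HANDOFF_PREFIX)) return null;
  return decodeURIComponent(handoffUrl.slice(HANDOFF_PREFIX.length));
}

const handoff = toVsCodeHandoffUrl("https://plans.example.com/p/abc123");
// handoff === "vscode://builderio.agent-native/open?url=https%3A%2F%2Fplans.example.com%2Fp%2Fabc123"
const decoded = fromVsCodeHandoffUrl(handoff);
```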
 ## Use it from your coding agent
 After installation, ask your agent for the command that fits the work:
@@ -110,9 +126,9 @@ After installation, ask your agent for the command that fits the work:
   before/after blocks instead of a wall of raw diff.
 The agent should inspect the codebase first, then create the visual plan when a
-wrong direction would be costly. The returned Plans link opens the review UI so
-you can annotate, correct, choose options, and ask for updates before code
-changes begin.
+wrong direction would be costly. The returned Plans link opens the review UI in
+the browser or VS Code, so you can annotate, correct, choose options, and ask for
+updates before code changes begin.
 When a Codex, Claude Code, Markdown, or pasted plan already exists, use
 `/visual-plan`; the agent preserves that source plan and builds the richer review
@@ -207,12 +223,38 @@ not sent through hosted Plan actions. Keep the bridge process running while you
 review; the URL is local to your machine and is not a shareable team link.
 If you run the Plan app locally with the same `PLAN_LOCAL_DIR`, you can also
-open the read-only app route:
+open the editable app route:
 ```text
 http://localhost:<port>/local-plans/<slug>
 ```
+For repo-backed folders, the direct local route can carry the repo-relative
+folder path so browser edits keep writing to that folder:
+```text
+http://localhost:<port>/local-plans/<slug>?path=plans%2F<slug>
+```
+The Plan app uses `apps.plan.roots[0].path` in `agent-native.json` as the
+default repo location for promoted local plans, falling back to `plans/`:
+```json
+{
+  "version": 1,
+  "apps": {
+    "plan": {
+      "mode": "local-files",
+      "roots": [{ "name": "Plans", "path": "plans", "kind": "plans" }]
+    }
+  }
+}
+```
+Direct local Plan routes include a menu action to save a temporary local folder
+into that repo location. After promotion, the page reopens with `?path=...` and
+continues autosaving MDX edits to the repo folder.
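The root lookup and the `?path=` route can be sketched together. `resolvePlanRoot` and `localPlanUrl` are hypothetical helpers, the port is an arbitrary example, and the sketch assumes the repo folder is named after the slug, as in the `plans%2F<slug>` example above.

```ts
// Only the fields this sketch reads from agent-native.json.
interface AgentNativeConfigSketch {
  apps?: { plan?: { roots?: Array<{ path?: string }> } };
}

// First configured root, falling back to "plans".
function resolvePlanRoot(config: AgentNativeConfigSketch): string {
  const configured = config.apps?.plan?.roots?.[0]?.path;
  return (configured ?? "plans").replace(/\/+$/, ""); // tolerate a trailing slash
}

// Direct local route carrying the repo-relative folder path.
function localPlanUrl(port: number, slug: string, config: AgentNativeConfigSketch): string {
  const repoPath = `${resolvePlanRoot(config)}/${slug}`;
  return (
    `http://localhost:${port}/local-plans/${encodeURIComponent(slug)}` +
    `?path=${encodeURIComponent(repoPath)}`
  );
}

const recapUrl = localPlanUrl(3000, "pr-123-visual-recap", {});
// recapUrl === "http://localhost:3000/local-plans/pr-123-visual-recap?path=plans%2Fpr-123-visual-recap"
```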
 Local-files mode prevents plan or recap content from going to the Agent-Native
 Plan database. It also disables hosted sharing, browser comments, plan history,
 and publish/export receipts until you explicitly opt into publishing. To move a
@@ -244,6 +286,27 @@ This path does not require cloning the Plan app or running a CLI. It is for
 file-first review/editing around a hosted plan, not for keeping plan content out
 of the hosted database.
+## Deleting hosted plan data {#delete-data}
+Signed-in owners can delete their hosted plans and recaps from the Plans list or
+the plan action menu.
+- **Soft delete** moves the plan to the **Deleted** tab, makes normal plan
+  views/direct links stop working, and removes public access by making the row
+  private. The SQL rows are retained so the owner can restore the plan later.
+- **Restore** is available from the **Deleted** tab for soft-deleted plans.
+- **Permanent delete** removes the hosted plan row and plan-scoped comments,
+  sections, activity events, version snapshots, share grants, abuse reports, and
+  SQL asset records. The UI requires typing `DELETE <plan-id>` before the final
+  button enables.
+Permanent delete removes the Plan app's database records and SQL-backed asset
+bytes/references. If a deployment uses an external upload provider, provider
+object retention follows that provider's lifecycle because the shared upload
+abstraction does not currently expose object deletion. Local-files privacy mode
+keeps the source in your local MDX folder instead; deleting hosted data does not
+touch local files.
 ## Useful prompts
 - "Use `/visual-plan` before changing the auth flow."
@@ -287,16 +350,16 @@ The local template is useful when you are developing Plans itself, testing local
 Schema lives in `templates/plan/server/db/schema.ts`. Core tables:
-| Table              | What it holds                                                                                                                                                |
-| ------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------ |
-| `plans`            | Each plan or recap — `title`, `brief`, `kind` (plan/recap), `status`, `source`, `html`/`markdown`/`content`, `hosted_plan_id/url`, usage stats, `source_url` |
-| `plan_sections`    | Ordered sections within a plan — `type`, `title`, `body`, `html`, `sort_order`, `created_by`                                                                 |
-| `plan_comments`    | Threaded comments — `kind`, `status`, `anchor`, `message`, `resolution_target`, `mentions_json`, `resolved_by`                                               |
-| `plan_events`      | Audit log of agent/human events on a plan                                                                                                                    |
-| `plan_versions`    | Point-in-time snapshots for version history                                                                                                                  |
-| `plan_shares`      | Per-principal share grants (viewer / editor / admin)                                                                                                         |
-| `plan_guest_mints` | Rate-limit records for guest session issuance                                                                                                                |
-| `plan_assets`      | Inline image assets stored as base64 (fallback when no upload provider)                                                                                      |
+| Table              | What it holds                                                                                                                                                                           |
+| ------------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `plans`            | Each plan or recap — `title`, `brief`, `kind` (plan/recap), `status`, `source`, `html`/`markdown`/`content`, `hosted_plan_id/url`, usage stats, `source_url`, `deleted_at`/`deleted_by` |
+| `plan_sections`    | Ordered sections within a plan — `type`, `title`, `body`, `html`, `sort_order`, `created_by`                                                                                            |
+| `plan_comments`    | Threaded comments — `kind`, `status`, `anchor`, `message`, `resolution_target`, `mentions_json`, `resolved_by`                                                                          |
+| `plan_events`      | Audit log of agent/human events on a plan                                                                                                                                               |
+| `plan_versions`    | Point-in-time snapshots for version history                                                                                                                                             |
+| `plan_shares`      | Per-principal share grants (viewer / editor / admin)                                                                                                                                    |
+| `plan_guest_mints` | Rate-limit records for guest session issuance                                                                                                                                           |
+| `plan_assets`      | Inline image assets stored as base64 (fallback when no upload provider)                                                                                                                 |
 ### Key actions
@@ -304,6 +367,7 @@ Actions in `templates/plan/actions/`:
 - **Creation** — `create-visual-plan`, `create-visual-recap`, `create-ui-plan`, `create-prototype-plan`, `create-plan-design`, `create-visual-questions`
 - **Reading & editing** — `get-visual-plan`, `update-visual-plan`, `list-visual-plans`, `import-visual-plan-source`, `patch-visual-plan-source`, `read-visual-plan-source`, `export-visual-plan`
+- **Lifecycle** — `delete-visual-plan` for owner-only soft delete, restore, and typed-confirmation permanent delete
 - **Publishing & sharing** — `publish-visual-plan`
 - **Versions** — `list-plan-versions`, `get-plan-version`, `restore-plan-version`
 - **Comments & feedback** — `get-plan-feedback`, `reply-to-plan-comment`, `resolve-plan-comment`, `consume-plan-feedback`, `delete-plan-comment`

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@agent-native/core",
-  "version": "0.53.0",
+  "version": "0.54.0",
   "type": "module",
   "engines": {
     "node": ">=22"
@@ -165,6 +165,7 @@
     "@libsql/client": "^0.15.0",
     "@modelcontextprotocol/ext-apps": "1.7.2",
     "@modelcontextprotocol/sdk": "^1.29.0",
+    "@mozilla/readability": "0.6.0",
     "@neondatabase/serverless": "^1.1.0",
     "@radix-ui/react-dialog": "1.1.15",
     "@radix-ui/react-dropdown-menu": "^2.1.16",
@@ -212,6 +213,7 @@
     "jiti": "^2.6.1",
     "jose": "^6.2.2",
     "kiwi-schema": "^0.5.0",
+    "linkedom": "0.18.12",
     "lowlight": "^3.3.0",
     "minimatch": "^10.0.0",
     "nanoid": "^5.1.9",
@@ -224,10 +226,12 @@
     "recharts": "^3.8.1",
     "remark-gfm": "^4.0.1",
     "roughjs": "4.6.6",
+    "safe-regex2": "5.1.1",
     "shiki": "^4.0.2",
     "sonner": "^2.0.7",
     "tailwind-merge": "^3.5.0",
     "tiptap-markdown": "^0.9.0",
+    "turndown": "7.2.4",
     "tw-animate-css": "1.4.0",
     "y-protocols": "^1.0.7",
     "yjs": "^13.6.31",
@@ -380,6 +384,7 @@
     "@types/pako": "^2.0.4",
     "@types/react": "^19.2.14",
     "@types/react-dom": "^19.2.3",
+    "@types/turndown": "5.0.6",
     "@types/ws": "^8.18.1",
     "@vitejs/plugin-react-swc": "^4.0.0",
     "@vitest/coverage-v8": "4.1.5",

package/src/templates/default/.agents/skills/actions/SKILL.md CHANGED

@@ -112,7 +112,10 @@ action trio instead:
   docs/spec URLs, placeholders, and examples without exposing secrets.
 - `provider-api-docs`: fetches public provider docs/spec/changelog URLs when
   the exact endpoint, filter operator, payload shape, or pagination contract is
-  uncertain. Registered docs URLs are curated starting points.
+  uncertain. Registered docs URLs are curated starting points. Use
+  `responseMode: "markdown"` for clean readable docs, or
+  `responseMode: "matches"` with `search: { query | terms | regex }` for
+  compact snippets instead of flooding context with raw HTML.
 - `provider-api-request`: makes a constrained authenticated HTTP request to the
   provider host, injects configured credentials, blocks private/internal URLs,
   and redacts secrets.
@@ -151,6 +154,12 @@ pagination status, truncation, failed pages, and uncovered gaps. They must not
 turn default limits, sampled rows, truncated excerpts, or aborted calls into a
 confident "none found", "all records", or exhaustive conclusion.
+For public web pages and docs, prefer the token-efficient path: `web-search`
+to find likely URLs, `web-request` or `provider-api-docs` with clean
+`responseMode` output to read a page, and `run-code` with `webRead()` /
+`webFetch()` when you need to grep, aggregate, or compare many pages before
+returning a small result.
 ### The `http` Option
 Controls how the action is exposed as an HTTP endpoint:
@@ -195,6 +204,48 @@ run: async (args) => {
 }
 ```
+### Validating Return Values (`outputSchema`)
+`schema` validates inputs; `outputSchema` validates what the action **returns**. Pass any Standard Schema-compatible schema (Zod, Valibot, ArkType) and the framework validates the result _after_ `run()` resolves — input validated before `run`, output after.
+```ts
+export default defineAction({
+  description: "Summarize a thread.",
+  schema: z.object({ threadId: z.string() }),
+  outputSchema: z.object({ summary: z.string(), messageCount: z.number() }),
+  outputErrorStrategy: "warn", // default; "strict" | "fallback"
+  // outputFallback: { summary: "", messageCount: 0 }, // used only by "fallback"
+  run: async ({ threadId }) => {
+    /* ... */
+  },
+});
+```
+- `"warn"` (default) — `console.warn` the issues and return the **original** result unchanged. Non-breaking.
+- `"strict"` — throw a clear error so a buggy action surfaces loudly.
+- `"fallback"` — return `outputFallback` in place of the invalid result.
+On success the validated value is returned, so coercion/defaults on `outputSchema` apply. Omit `outputSchema` and behavior is byte-for-byte unchanged (no wrapping).
+### Human-in-the-Loop Approval (`needsApproval`)
+For high-consequence, outward-facing, hard-to-undo actions (sending an email, charging a card, deleting an account), set `needsApproval` so the agent **cannot** run the action without a human approving the specific call:
+```ts
+export default defineAction({
+  description: "Send an email via Gmail.",
+  schema: z.object({ to: z.string(), subject: z.string(), body: z.string() }),
+  needsApproval: true, // boolean, or (args, ctx) => boolean | Promise<boolean>
+  run: async (args) => {
+    /* ...actually send... */
+  },
+});
+```
+When the gate is truthy and the call isn't yet approved, the loop emits an `approval_required` event and **stops the turn — `run()` never executes**. A predicate gates conditionally (e.g. only external recipients) and **fails closed**: a throw is treated as "approval required". The human approves via the chat UI's Approve affordance, which re-issues the turn with the call's `approvalKey`, and only then does the action run.
+**Keep approvals rare** — the default is off and almost every action should leave it off. The canonical example is Mail's `send-email` (`needsApproval: true`). See the `security` skill and the Human Approval doc.
 ## Frontend Hooks
 The frontend calls actions using React Query hooks from `@agent-native/core/client`. Components should not hand-write `fetch("/_agent-native/actions/...")`; add or reuse a client hook/helper instead. Use `callAction` from the same package for imperative cases that do not fit a hook, such as debounced search, prefetching, or non-React event handlers.
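
The three `outputErrorStrategy` behaviors added in this hunk can be sketched in isolation. This is a minimal illustration under stated assumptions, not the framework's implementation: `applyOutputSchema`, `Validator`, and `summarySchema` are hypothetical names, and the hand-rolled synchronous validator stands in for a real Standard Schema validator (which is invoked through `~standard.validate` and may return a promise).

```typescript
// Hypothetical sketch: none of these names are the framework's real internals.
type Issue = { message: string };
type ValidationResult<T> =
  | { ok: true; value: T }
  | { ok: false; issues: Issue[] };
type Validator<T> = (value: unknown) => ValidationResult<T>;
type OutputErrorStrategy = "warn" | "strict" | "fallback";

interface OutputOptions<T> {
  outputSchema?: Validator<T>;
  outputErrorStrategy?: OutputErrorStrategy;
  outputFallback?: T;
}

function applyOutputSchema<T>(result: unknown, opts: OutputOptions<T>): unknown {
  // No schema configured: hand the result back untouched.
  if (!opts.outputSchema) return result;

  const checked = opts.outputSchema(result);
  // Success: return the validated value so coercion/defaults take effect.
  if (checked.ok) return checked.value;

  const summary = checked.issues.map((issue) => issue.message).join("; ");
  switch (opts.outputErrorStrategy ?? "warn") {
    case "strict":
      throw new Error(`Action output failed validation: ${summary}`);
    case "fallback":
      return opts.outputFallback;
    default:
      console.warn(`Action output failed validation: ${summary}`);
      return result; // the original result, unchanged
  }
}

// Hand-rolled validator standing in for a Zod/Valibot/ArkType schema.
type Summary = { summary: string; messageCount: number };
const summarySchema: Validator<Summary> = (value) => {
  const v = value as Partial<Summary> | null;
  if (v && typeof v.summary === "string" && typeof v.messageCount === "number") {
    // Rebuild the object so unknown keys are dropped, as a strict schema might do.
    return { ok: true, value: { summary: v.summary, messageCount: v.messageCount } };
  }
  return {
    ok: false,
    issues: [{ message: "expected { summary: string, messageCount: number }" }],
  };
};
```

Calling `applyOutputSchema({ summary: 42 }, { outputSchema: summarySchema })` logs a warning and returns the same object, while adding `outputErrorStrategy: "strict"` makes the same call throw.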

package/src/templates/default/.agents/skills/security/SKILL.md CHANGED

@@ -139,6 +139,28 @@ export default defineEventHandler(async (event) => {
 - Never create unprotected routes that modify data.
+## Human-in-the-Loop Approval for High-Consequence Actions
+For a small set of outward-facing, hard-to-undo operations — sending an email, charging a card, deleting an account, posting publicly — auth and access control are necessary but not sufficient: you also do not want the **agent** to perform them autonomously. Set `needsApproval` on the `defineAction` so the agent cannot run the action without a human approving the specific call.
+```ts
+export default defineAction({
+  description: "Send an email via Gmail.",
+  schema: z.object({ to: z.string(), subject: z.string(), body: z.string() }),
+  needsApproval: true, // or (args, ctx) => boolean | Promise<boolean>
+  run: async (args) => {
+    /* ...actually send... */
+  },
+});
+```
+When the gate is truthy and the call is not yet approved, the loop emits an `approval_required` event and **stops the turn — `run()` never executes**. The human approves via the chat UI's Approve affordance, which re-issues the turn with the call's stable `approvalKey`; only then does the action run. A predicate gates conditionally (e.g. only external recipients) and **fails closed** — a throw is treated as "approval required".
+Rules:
+- Reach for `needsApproval` only for genuinely high-consequence operations. The default is off, and the framework intentionally keeps approvals rare — over-gating turns the agent into a click-through wizard. The canonical (and intentionally lone) framework example is Mail's `send-email`.
+- `needsApproval` is **not** a substitute for `accessFilter` / `assertAccess` or for hiding sensitive operations from the model with `agentTool: false` / `toolCallable: false`. It is the layer for "a human must explicitly bless this specific outward-facing call," not for scoping data. See the `actions` skill for the full surface.
 ## Custom HTTP Routes Must Apply Access Control Themselves
 This is the single most-failed rule in the codebase. Auto-mounted action routes (`/_agent-native/actions/...`) get a request context wired up automatically. **Hand-written `/api/*` Nitro routes do not.** If your handler queries an ownable resource (any table with `...ownableColumns()`), you MUST:
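
The gate semantics described in this hunk (off by default, conditional predicates, fail closed, `run()` skipped until approval) can be sketched as follows. Everything here is a hypothetical illustration: `requiresApproval`, `runGated`, and the `alreadyApproved` flag are assumed names, the predicate's `ctx` argument is omitted, and the real loop tracks approval through the call's `approvalKey` rather than a boolean.

```typescript
// Hypothetical sketch of the gate; names are illustrative, not the framework's internals.
type ApprovalGate<A> = boolean | ((args: A) => boolean | Promise<boolean>);

async function requiresApproval<A>(
  gate: ApprovalGate<A> | undefined,
  args: A,
): Promise<boolean> {
  if (gate === undefined) return false; // default: approvals are off
  if (typeof gate === "boolean") return gate;
  try {
    return Boolean(await gate(args));
  } catch {
    return true; // fail closed: a throwing predicate means "approval required"
  }
}

type Outcome<R> = { type: "approval_required" } | { type: "result"; value: R };

async function runGated<A, R>(
  action: { needsApproval?: ApprovalGate<A>; run: (args: A) => R | Promise<R> },
  args: A,
  alreadyApproved: boolean, // assumption: stands in for a matching approvalKey
): Promise<Outcome<R>> {
  if (!alreadyApproved && (await requiresApproval(action.needsApproval, args))) {
    return { type: "approval_required" }; // run() is never called on this path
  }
  return { type: "result", value: await action.run(args) };
}

// Example: gate only external recipients (the internal domain is made up).
const sentTo: string[] = [];
const sendEmail = {
  needsApproval: (args: { to: string }) => {
    if (!args.to.includes("@")) throw new Error("malformed recipient");
    return !args.to.endsWith("@example.com");
  },
  run: (args: { to: string }) => {
    sentTo.push(args.to);
    return `sent to ${args.to}`;
  },
};
```

With this sketch, `runGated(sendEmail, { to: "x@other.org" }, false)` resolves to `{ type: "approval_required" }` without sending, and the same call with `alreadyApproved` set to `true` runs the action.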

package/src/templates/workspace-core/.agents/skills/actions/SKILL.md CHANGED

@@ -112,7 +112,10 @@ action trio instead:
   docs/spec URLs, placeholders, and examples without exposing secrets.
 - `provider-api-docs`: fetches public provider docs/spec/changelog URLs when
   the exact endpoint, filter operator, payload shape, or pagination contract is
-  uncertain. Registered docs URLs are curated starting points.
+  uncertain. Registered docs URLs are curated starting points. Use
+  `responseMode: "markdown"` for clean readable docs, or
+  `responseMode: "matches"` with `search: { query | terms | regex }` for
+  compact snippets instead of flooding context with raw HTML.
 - `provider-api-request`: makes a constrained authenticated HTTP request to the
   provider host, injects configured credentials, blocks private/internal URLs,
   and redacts secrets.
@@ -151,6 +154,12 @@ pagination status, truncation, failed pages, and uncovered gaps. They must not
 turn default limits, sampled rows, truncated excerpts, or aborted calls into a
 confident "none found", "all records", or exhaustive conclusion.
+For public web pages and docs, prefer the token-efficient path: `web-search`
+to find likely URLs, `web-request` or `provider-api-docs` with clean
+`responseMode` output to read a page, and `run-code` with `webRead()` /
+`webFetch()` when you need to grep, aggregate, or compare many pages before
+returning a small result.
 ### The `http` Option
 Controls how the action is exposed as an HTTP endpoint:
@@ -195,6 +204,48 @@ run: async (args) => {
 }
 ```
+### Validating Return Values (`outputSchema`)
+`schema` validates inputs; `outputSchema` validates what the action **returns**. Pass any Standard Schema-compatible schema (Zod, Valibot, ArkType) and the framework validates the result _after_ `run()` resolves — input validated before `run`, output after.
+```ts
+export default defineAction({
+  description: "Summarize a thread.",
+  schema: z.object({ threadId: z.string() }),
+  outputSchema: z.object({ summary: z.string(), messageCount: z.number() }),
+  outputErrorStrategy: "warn", // default; "strict" | "fallback"
+  // outputFallback: { summary: "", messageCount: 0 }, // used only by "fallback"
+  run: async ({ threadId }) => {
+    /* ... */
+  },
+});
+```
+- `"warn"` (default) — `console.warn` the issues and return the **original** result unchanged. Non-breaking.
+- `"strict"` — throw a clear error so a buggy action surfaces loudly.
+- `"fallback"` — return `outputFallback` in place of the invalid result.
+On success the validated value is returned, so coercion/defaults on `outputSchema` apply. Omit `outputSchema` and behavior is byte-for-byte unchanged (no wrapping).
+### Human-in-the-Loop Approval (`needsApproval`)
+For high-consequence, outward-facing, hard-to-undo actions (sending an email, charging a card, deleting an account), set `needsApproval` so the agent **cannot** run the action without a human approving the specific call:
+```ts
+export default defineAction({
+  description: "Send an email via Gmail.",
+  schema: z.object({ to: z.string(), subject: z.string(), body: z.string() }),
+  needsApproval: true, // boolean, or (args, ctx) => boolean | Promise<boolean>
+  run: async (args) => {
+    /* ...actually send... */
+  },
+});
+```
+When the gate is truthy and the call isn't yet approved, the loop emits an `approval_required` event and **stops the turn — `run()` never executes**. A predicate gates conditionally (e.g. only external recipients) and **fails closed**: a throw is treated as "approval required". The human approves via the chat UI's Approve affordance, which re-issues the turn with the call's `approvalKey`, and only then does the action run.
+**Keep approvals rare** — the default is off and almost every action should leave it off. The canonical example is Mail's `send-email` (`needsApproval: true`). See the `security` skill and the Human Approval doc.
 ## Frontend Hooks
 The frontend calls actions using React Query hooks from `@agent-native/core/client`. Components should not hand-write `fetch("/_agent-native/actions/...")`; add or reuse a client hook/helper instead. Use `callAction` from the same package for imperative cases that do not fit a hook, such as debounced search, prefetching, or non-React event handlers.
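
To make the `responseMode: "matches"` idea from the first hunk of this file concrete, here is a rough sketch of returning compact snippets instead of a whole page. It is an assumption-heavy illustration: `extractMatches`, `Search`, and `Snippet` are hypothetical, the diff does not show how `query`, `terms`, and `regex` are actually interpreted, and treating a query as whitespace-separated terms is a guess made only for this example.

```typescript
// Hypothetical shape, loosely mirroring `search: { query | terms | regex }`.
type Search = { query?: string; terms?: string[]; regex?: string };
type Snippet = { line: number; text: string };

function extractMatches(text: string, search: Search, maxSnippets = 5): Snippet[] {
  // Assumption: a free-text query is treated as whitespace-separated terms.
  const queryTerms = search.query ? search.query.split(/\s+/) : [];
  const terms = [...(search.terms ?? []), ...queryTerms]
    .map((term) => term.toLowerCase())
    .filter((term) => term.length > 0);
  // Case-insensitive, non-global regex so repeated test() calls carry no state.
  const pattern = search.regex ? new RegExp(search.regex, "i") : undefined;

  const snippets: Snippet[] = [];
  text.split("\n").forEach((line, index) => {
    if (snippets.length >= maxSnippets) return; // cap the result size
    const lower = line.toLowerCase();
    const termHit = terms.some((term) => lower.includes(term));
    const regexHit = pattern ? pattern.test(line) : false;
    if (termHit || regexHit) {
      snippets.push({ line: index + 1, text: line.trim() });
    }
  });
  return snippets;
}
```

The point of the cap and the per-line snippets is the one the skill text makes: the caller gets a small, targeted result rather than raw page content.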

package/src/templates/workspace-core/.agents/skills/external-agents/SKILL.md CHANGED

@@ -197,7 +197,7 @@ path is obvious.
 `defineAction` accepts an optional `link` builder. When set, every MCP/A2A
 result for that tool auto-appends a markdown `[label →](absoluteUrl)` block and
 a structured `_meta["agent-native/openLink"] = { label, view, webUrl,
-desktopUrl }`; `tools/list` adds
+desktopUrl, vscodeUrl }`; `tools/list` adds
 `annotations["agent-native/producesOpenLink"]` plus a description suffix so the
 external agent knows the tool yields an openable link.
@@ -285,9 +285,11 @@ ngrok/prod testing caveats are documented in
 `buildDeepLink(...)` returns the app-relative path
 `/_agent-native/open?app=…&view=…&<recordId>=…`. The MCP layer turns that into
-an absolute web URL (`toAbsoluteOpenUrl`, using the request origin) and a
-desktop `agentnative://open?…` URL (`toDesktopOpenUrl`). When the user clicks
-it in any browser or inline webview, `GET /_agent-native/open`
+an absolute web URL (`toAbsoluteOpenUrl`, using the request origin), a
+desktop `agentnative://open?…` URL (`toDesktopOpenUrl`), and a VS Code
+extension URL (`toVsCodeOpenUrl`) for
+`vscode://builderio.agent-native/open?url=…`. When the user clicks the web
+link in any browser or inline webview, `GET /_agent-native/open`
 (`createOpenRouteHandler`, mounted by the core routes plugin, gated by
 `disableOpenRoute`, customizable via `resolveOpenPath`):
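
The three URL shapes named in this hunk can be sketched with the standard `URL` and `URLSearchParams` APIs. The helper names below are hypothetical and deliberately differ from `buildDeepLink`, `toAbsoluteOpenUrl`, `toDesktopOpenUrl`, and `toVsCodeOpenUrl`, whose real signatures are not shown in this diff. The sketch also assumes that the VS Code `url` parameter carries the percent-encoded absolute web URL; the app, view, and record names are made up.

```typescript
// Hypothetical helpers; the real framework functions may take different arguments.
function sketchDeepLinkPath(
  app: string,
  view: string,
  record: Record<string, string>,
): string {
  // App-relative path: /_agent-native/open?app=...&view=...&<recordId>=...
  const params = new URLSearchParams({ app, view, ...record });
  return `/_agent-native/open?${params.toString()}`;
}

function sketchAbsoluteUrl(path: string, requestOrigin: string): string {
  // Resolve the app-relative path against the request origin.
  return new URL(path, requestOrigin).toString();
}

function sketchDesktopUrl(path: string): string {
  // Reuse the same query string under the agentnative:// scheme.
  const q = path.indexOf("?");
  const query = q >= 0 ? path.slice(q + 1) : "";
  return `agentnative://open?${query}`;
}

function sketchVsCodeUrl(absoluteUrl: string): string {
  // Assumption: `url` carries the percent-encoded absolute web URL.
  return `vscode://builderio.agent-native/open?url=${encodeURIComponent(absoluteUrl)}`;
}

const path = sketchDeepLinkPath("mail", "thread", { threadId: "t_123" });
const webUrl = sketchAbsoluteUrl(path, "https://app.example.com");
const desktopUrl = sketchDesktopUrl(path);
const vscodeUrl = sketchVsCodeUrl(webUrl);
```

Building the query with `URLSearchParams` rather than string concatenation keeps record IDs with reserved characters correctly escaped in all three variants.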

package/src/templates/workspace-core/.agents/skills/observability/SKILL.md CHANGED

@@ -220,3 +220,14 @@ await putSetting("observability-config", {
 ```
 The framework emits `gen_ai.*` semantic convention spans compatible with Langfuse, Datadog, Grafana, New Relic, and any OTel-compatible backend.
+## Live OpenTelemetry Spans (Optional)
+Separate from the `exporters` config above (which ships the in-house traces to an OTLP endpoint), the agent loop can also emit **live OpenTelemetry spans** for every run, model call, and tool call, so a host that already runs an OTel collector sees agent activity alongside its other distributed traces.
+This layer is optional and **no-op by default**:
+- `@opentelemetry/api` is an **optional dependency**. If it isn't installed, the span helpers degrade to silent no-ops — they never throw into the agent loop.
+- Even with the api package installed, it ships a default no-op tracer. Spans become real only once the **host registers a `TracerProvider`** (via `@opentelemetry/sdk-node` or similar). The framework deliberately does not depend on the heavy SDK/exporter packages and never registers a provider itself — instrumentation is opt-in by the embedding app.
+The loop emits `agent.run` (with `agent.run_id`, `agent.thread_id`, `agent.user_id`, `agent.model`), `tool.call` (`tool.name` + status), and `llm.call` spans, each finished with OK/ERROR status. This is purely additive to the in-house `agent_trace_spans` / `agent_trace_summaries` tables. Source: `packages/core/src/observability/tracing.ts` + `traces.ts`. See the Observability doc for the full table.
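
The "no-op by default, never throw into the agent loop" contract described above can be illustrated with a small stand-in. This sketch does not import or mirror the real `@opentelemetry/api` types: `SpanLike`, `TracerLike`, `registerTracer`, and `withSpan` are hypothetical names, and the wrapped work is synchronous for brevity, whereas the real agent loop is asynchronous.

```typescript
// Hypothetical stand-ins; not the real @opentelemetry/api surface.
interface SpanLike {
  setAttribute(key: string, value: string): void;
  end(status: "OK" | "ERROR"): void;
}
interface TracerLike {
  startSpan(name: string): SpanLike;
}

const noopSpan: SpanLike = { setAttribute() {}, end() {} };

// Stays undefined unless the host opts in, so spans are no-ops by default.
let hostTracer: TracerLike | undefined;

function registerTracer(tracer: TracerLike): void {
  hostTracer = tracer;
}

function safeEnd(span: SpanLike, status: "OK" | "ERROR"): void {
  try {
    span.end(status);
  } catch {
    // Swallow tracing errors; they must not surface in the caller.
  }
}

function withSpan<T>(name: string, attrs: Record<string, string>, fn: () => T): T {
  let span: SpanLike = noopSpan;
  try {
    span = hostTracer?.startSpan(name) ?? noopSpan;
    for (const [key, value] of Object.entries(attrs)) span.setAttribute(key, value);
  } catch {
    span = noopSpan; // a broken tracer degrades to a silent no-op
  }
  try {
    const result = fn();
    safeEnd(span, "OK");
    return result;
  } catch (err) {
    safeEnd(span, "ERROR");
    throw err; // the wrapped work's own error still propagates
  }
}
```

Without a registered tracer, `withSpan("tool.call", { "tool.name": "search" }, () => 2 + 2)` simply returns `4`; the span only becomes observable once the host calls `registerTracer`.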