@agent-native/core 0.53.0 → 0.54.0

Files changed (99)
  1. package/dist/action.d.ts +40 -1
  2. package/dist/action.d.ts.map +1 -1
  3. package/dist/action.js +69 -2
  4. package/dist/action.js.map +1 -1
  5. package/dist/agent/index.d.ts +1 -0
  6. package/dist/agent/index.d.ts.map +1 -1
  7. package/dist/agent/index.js +1 -0
  8. package/dist/agent/index.js.map +1 -1
  9. package/dist/agent/observational-memory/index.d.ts +6 -6
  10. package/dist/agent/observational-memory/index.js +6 -6
  11. package/dist/agent/observational-memory/index.js.map +1 -1
  12. package/dist/agent/observational-memory/read.d.ts +7 -9
  13. package/dist/agent/observational-memory/read.d.ts.map +1 -1
  14. package/dist/agent/observational-memory/read.js +7 -9
  15. package/dist/agent/observational-memory/read.js.map +1 -1
  16. package/dist/agent/processors.d.ts +146 -0
  17. package/dist/agent/processors.d.ts.map +1 -0
  18. package/dist/agent/processors.js +122 -0
  19. package/dist/agent/processors.js.map +1 -0
  20. package/dist/agent/production-agent.d.ts +10 -0
  21. package/dist/agent/production-agent.d.ts.map +1 -1
  22. package/dist/agent/production-agent.js +101 -0
  23. package/dist/agent/production-agent.js.map +1 -1
  24. package/dist/agent/run-loop-with-resume.d.ts.map +1 -1
  25. package/dist/agent/run-loop-with-resume.js +4 -5
  26. package/dist/agent/run-loop-with-resume.js.map +1 -1
  27. package/dist/agent/tool-call-journal.d.ts +6 -8
  28. package/dist/agent/tool-call-journal.d.ts.map +1 -1
  29. package/dist/agent/tool-call-journal.js +6 -8
  30. package/dist/agent/tool-call-journal.js.map +1 -1
  31. package/dist/agent/types.d.ts +11 -0
  32. package/dist/agent/types.d.ts.map +1 -1
  33. package/dist/agent/types.js.map +1 -1
  34. package/dist/cli/plan-local.d.ts.map +1 -1
  35. package/dist/cli/plan-local.js +129 -4
  36. package/dist/cli/plan-local.js.map +1 -1
  37. package/dist/cli/skills.d.ts.map +1 -1
  38. package/dist/cli/skills.js +38 -3
  39. package/dist/cli/skills.js.map +1 -1
  40. package/dist/coding-tools/run-code.d.ts.map +1 -1
  41. package/dist/coding-tools/run-code.js +18 -2
  42. package/dist/coding-tools/run-code.js.map +1 -1
  43. package/dist/extensions/fetch-tool.d.ts.map +1 -1
  44. package/dist/extensions/fetch-tool.js +80 -15
  45. package/dist/extensions/fetch-tool.js.map +1 -1
  46. package/dist/extensions/web-content.d.ts +61 -0
  47. package/dist/extensions/web-content.d.ts.map +1 -0
  48. package/dist/extensions/web-content.js +468 -0
  49. package/dist/extensions/web-content.js.map +1 -0
  50. package/dist/extensions/web-search-tool.js +3 -3
  51. package/dist/extensions/web-search-tool.js.map +1 -1
  52. package/dist/mcp/build-server.d.ts.map +1 -1
  53. package/dist/mcp/build-server.js +4 -1
  54. package/dist/mcp/build-server.js.map +1 -1
  55. package/dist/provider-api/corpus-jobs.d.ts +80 -0
  56. package/dist/provider-api/corpus-jobs.d.ts.map +1 -1
  57. package/dist/provider-api/corpus-jobs.js +219 -22
  58. package/dist/provider-api/corpus-jobs.js.map +1 -1
  59. package/dist/provider-api/index.d.ts +24 -32
  60. package/dist/provider-api/index.d.ts.map +1 -1
  61. package/dist/provider-api/index.js +28 -1
  62. package/dist/provider-api/index.js.map +1 -1
  63. package/dist/server/agent-chat-plugin.js +1 -1
  64. package/dist/server/agent-chat-plugin.js.map +1 -1
  65. package/dist/server/better-auth-instance.d.ts +7 -0
  66. package/dist/server/better-auth-instance.d.ts.map +1 -1
  67. package/dist/server/better-auth-instance.js +90 -0
  68. package/dist/server/better-auth-instance.js.map +1 -1
  69. package/dist/server/deep-link.d.ts +7 -0
  70. package/dist/server/deep-link.d.ts.map +1 -1
  71. package/dist/server/deep-link.js +13 -2
  72. package/dist/server/deep-link.js.map +1 -1
  73. package/dist/server/index.d.ts +1 -1
  74. package/dist/server/index.d.ts.map +1 -1
  75. package/dist/server/index.js +1 -1
  76. package/dist/server/index.js.map +1 -1
  77. package/dist/templates/default/.agents/skills/actions/SKILL.md +52 -1
  78. package/dist/templates/default/.agents/skills/security/SKILL.md +22 -0
  79. package/dist/templates/workspace-core/.agents/skills/actions/SKILL.md +52 -1
  80. package/dist/templates/workspace-core/.agents/skills/external-agents/SKILL.md +6 -4
  81. package/dist/templates/workspace-core/.agents/skills/observability/SKILL.md +11 -0
  82. package/dist/templates/workspace-core/.agents/skills/security/SKILL.md +22 -0
  83. package/docs/content/actions.md +50 -0
  84. package/docs/content/durable-resume.md +49 -0
  85. package/docs/content/external-agents.md +2 -2
  86. package/docs/content/human-approval.md +101 -0
  87. package/docs/content/observability.md +21 -0
  88. package/docs/content/observational-memory.md +63 -0
  89. package/docs/content/plan-plugin.md +5 -0
  90. package/docs/content/pr-visual-recap.md +4 -3
  91. package/docs/content/processors.md +99 -0
  92. package/docs/content/template-plan.md +78 -14
  93. package/package.json +6 -1
  94. package/src/templates/default/.agents/skills/actions/SKILL.md +52 -1
  95. package/src/templates/default/.agents/skills/security/SKILL.md +22 -0
  96. package/src/templates/workspace-core/.agents/skills/actions/SKILL.md +52 -1
  97. package/src/templates/workspace-core/.agents/skills/external-agents/SKILL.md +6 -4
  98. package/src/templates/workspace-core/.agents/skills/observability/SKILL.md +11 -0
  99. package/src/templates/workspace-core/.agents/skills/security/SKILL.md +22 -0
@@ -188,6 +188,27 @@ await putSetting("observability-config", {
188
188
 
189
189
  The framework emits `gen_ai.*` semantic convention spans compatible with the OpenTelemetry GenAI spec.
190
190
 
191
+ ## OpenTelemetry spans {#otel}
192
+
193
+ Separate from the `exporters` config above (which ships the in-house traces to an OTLP endpoint), the agent loop can also emit **live OpenTelemetry spans** for every run, model call, and tool call — so a host that already runs an OTel collector sees agent activity alongside the rest of its distributed traces.
194
+
195
+ This layer is **optional and no-op by default**:
196
+
197
+ - `@opentelemetry/api` is an **optional dependency**. If it isn't installed, the helpers degrade to silent no-ops — nothing here ever throws into the agent loop.
198
+ - Even when the api package _is_ present, it ships a default no-op tracer. Spans only become real once the **host registers a `TracerProvider`** (via `@opentelemetry/sdk-node` or similar). The framework deliberately does **not** depend on the heavy SDK/exporter packages or register a provider itself — instrumentation is opt-in by the embedding app.
199
+
200
+ So the cost when you haven't wired OTel is a couple of cached property reads per call. To turn it on, install the api package plus your SDK and register a provider at server startup the same way you would for any other Node service.
201
+
202
+ The agent loop emits three span kinds:
203
+
204
+ | Span | When | Attributes |
205
+ | ----------- | -------------------------- | ----------------------------------------------------------------- |
206
+ | `agent.run` | once per agent run | `agent.run_id`, `agent.thread_id`, `agent.user_id`, `agent.model` |
207
+ | `tool.call` | once per action invocation | `tool.name`, plus success/error status |
208
+ | `llm.call` | per model call | timing + OK/error status |
209
+
210
+ Spans are finished with OK/ERROR status and record the error message on failure. Zero/sentinel attribute values are pruned so spans aren't cluttered with noise. This OTel layer is purely additive to the in-house `agent_trace_spans` / `agent_trace_summaries` tables that power the dashboard above — both are produced from the same run events.
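The pruning rule described above can be illustrated with a small sketch. This is not the framework's implementation: `pruneSpanAttributes` is a hypothetical helper, and the set of values treated as zero/sentinel (`undefined`, `null`, empty string, `0`, `NaN`) is an assumption made for illustration.

```typescript
// Hypothetical helper, not part of @agent-native/core.
type RawAttribute = string | number | boolean | null | undefined;
type SpanAttribute = string | number | boolean;

function pruneSpanAttributes(
  attrs: Record<string, RawAttribute>,
): Record<string, SpanAttribute> {
  const kept: Record<string, SpanAttribute> = {};
  for (const [key, value] of Object.entries(attrs)) {
    // Assumed sentinels: missing values, empty strings, zero, and NaN.
    if (value === undefined || value === null) continue;
    if (value === "" || value === 0) continue;
    if (typeof value === "number" && Number.isNaN(value)) continue;
    kept[key] = value; // `false` is kept: it is a real value, not a sentinel
  }
  return kept;
}
```

A span builder could pass its candidate attributes through a helper like this before setting them, so an unknown `agent.user_id` is simply absent instead of appearing as an empty string.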
211
+
191
212
  ## Error reporting (Sentry) {#sentry}
192
213
 
193
214
  Server-side errors that escape Nitro route handlers are reported to Sentry when a DSN is configured. Without it the SDK silently no-ops, so it's safe to leave the env vars unset in dev. Browser and server events can go to the same Sentry project; split them into separate projects only when you want operational separation for ownership, volume, quotas, or alert routing.
@@ -0,0 +1,63 @@
1
+ ---
2
+ title: "Observational Memory"
3
+ description: "Background three-tier compaction (recent raw → observations → reflections) that keeps long agent threads cheap and prompt-cache-stable without touching short conversations."
4
+ ---
5
+
6
+ # Observational Memory
7
+
8
+ A long-running agent thread accumulates a huge transcript: every message, every tool call, every result. Replaying that whole history into the model on each turn is expensive and eventually blows the context window. **Observational Memory (OM)** compacts the older part of a long thread into a dated, layered summary so the model still knows what happened — just at a fraction of the token cost — while the most recent turns stay verbatim.
9
+
10
+ OM is entirely automatic and owner-scoped. **Short threads are unaffected**: until a thread crosses the first compaction threshold, OM is a no-op and the context is byte-for-byte what it would be without it.
11
+
12
+ ## The three tiers {#tiers}
13
+
14
+ OM represents a long thread as three layers, from most-distilled to most-recent:
15
+
16
+ | Tier | What it is |
17
+ | ----------------------- | ------------------------------------------------------------------------------------------------- |
18
+ | **Reflections** | Highest-level, condensed from the observation log once it grows large. The long-arc summary. |
19
+ | **Observations** | Dense, dated entries that fold a stretch of raw messages into a compact record of what happened. |
20
+ | **Recent raw messages** | The last N turns, kept **verbatim** — never folded — so the agent always sees the latest context. |
21
+
22
+ On each turn, the read side assembles these into a single self-labeled `[Observational Memory]` block that replaces the raw older prefix, keeps the recent-raw window intact, and tells the model to treat the compacted record as authoritative (don't redo completed work, trust the recorded decisions, names, dates, and status).
23
+
24
+ ## How compaction runs {#compaction}
25
+
26
+ Two passes run as a **fire-and-forget, best-effort** step _after_ a clean turn, so they never add latency to the user-visible response and any failure is swallowed:
27
+
28
+ 1. **Observer** — once a thread's _unobserved_ messages exceed the observation token threshold, folds them into a single dense observation entry.
29
+ 2. **Reflector** — once the persisted observation log itself exceeds the reflection token threshold, condenses the observations into a higher-level reflection.
30
+
31
+ Both passes no-op below their thresholds, so calling the compactor after every turn is cheap. Because OM replaces the volatile raw prefix with stable compacted text, it also keeps the prompt **cache-stable** across turns of a long thread.
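The threshold logic of the two passes can be sketched as follows. The names (`duePasses`, `OmThresholds`) are hypothetical, and the sketch assumes "exceed" means strictly greater than the threshold; the numbers are the documented defaults.

```typescript
// Hypothetical sketch of the threshold check; not the framework's code.
interface OmThresholds {
  observationTokenThreshold: number;
  reflectionTokenThreshold: number;
}

type CompactionPass = "observer" | "reflector";

const DEFAULT_THRESHOLDS: OmThresholds = {
  observationTokenThreshold: 30000,
  reflectionTokenThreshold: 40000,
};

function duePasses(
  unobservedTokens: number,
  observationLogTokens: number,
  thresholds: OmThresholds = DEFAULT_THRESHOLDS,
): CompactionPass[] {
  const due: CompactionPass[] = [];
  // Observer: fold unobserved raw messages once they exceed the threshold.
  if (unobservedTokens > thresholds.observationTokenThreshold) due.push("observer");
  // Reflector: condense the observation log once it exceeds its own threshold.
  if (observationLogTokens > thresholds.reflectionTokenThreshold) due.push("reflector");
  return due; // empty below both thresholds, so calling this every turn is cheap
}
```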
32
+
33
+ OM data lives in the app's own SQL database, scoped to the owner (and org when present) — the same scoping model as the rest of the framework. It is never shared across users.
34
+
35
+ ## Configuration {#config}
36
+
37
+ Defaults are conservative. An operator can tune compaction at deploy time with `AGENT_NATIVE_OM_*` environment variables (no redeploy of the app code needed); an invalid or missing value always falls back to the named default.
38
+
39
+ | Env var | Default | What it controls |
40
+ | --------------------------------------------- | ------- | -------------------------------------------------------------------------------------- |
41
+ | `AGENT_NATIVE_OM_OBSERVATION_TOKEN_THRESHOLD` | `30000` | Unobserved-message tokens that trigger the Observer to fold them into one observation. |
42
+ | `AGENT_NATIVE_OM_REFLECTION_TOKEN_THRESHOLD` | `40000` | Observation-log tokens that trigger the Reflector to condense into a reflection. |
43
+ | `AGENT_NATIVE_OM_RECENT_RAW_MESSAGE_COUNT` | `12` | How many of the most-recent messages stay verbatim (never folded into an observation). |
44
+
45
+ The Observer and Reflector output caps (4000 / 2000 tokens) keep a single compaction pass from blowing the budget itself; they are tunable in code via `resolveObservationalMemoryConfig({ ... })` but are not exposed as environment variables.
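The "invalid or missing value falls back to the default" behavior can be sketched like this. `resolveOmConfigFromEnv` and `readPositiveInt` are hypothetical names, not the package's API, and the sketch assumes only positive integers count as valid; the env var names and defaults are taken from the table above.

```typescript
// Hypothetical sketch; the real resolver may validate differently.
type Env = Record<string, string | undefined>;

function readPositiveInt(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined) return fallback; // missing -> default
  const parsed = Number(raw.trim());
  // Assumption: only positive integers are accepted; anything else -> default.
  return Number.isInteger(parsed) && parsed > 0 ? parsed : fallback;
}

function resolveOmConfigFromEnv(env: Env) {
  return {
    observationTokenThreshold: readPositiveInt(env, "AGENT_NATIVE_OM_OBSERVATION_TOKEN_THRESHOLD", 30000),
    reflectionTokenThreshold: readPositiveInt(env, "AGENT_NATIVE_OM_REFLECTION_TOKEN_THRESHOLD", 40000),
    recentRawMessageCount: readPositiveInt(env, "AGENT_NATIVE_OM_RECENT_RAW_MESSAGE_COUNT", 12),
  };
}
```

In a Node process this would be called as `resolveOmConfigFromEnv(process.env)`.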
46
+
47
+ > [!TIP]
48
+ > Lower the thresholds to compact sooner (cheaper long threads, slightly more summarization); raise them to keep more raw history in context before compacting. Set `AGENT_NATIVE_OM_RECENT_RAW_MESSAGE_COUNT` higher if your workflows need a longer verbatim tail.
49
+
50
+ ## When it kicks in {#when}
51
+
52
+ OM only changes behavior for threads long enough to have produced at least one observation or reflection. Concretely:
53
+
54
+ - A brand-new or short thread: no OM entries yet → the context is the plain transcript, unchanged.
55
+ - A long thread that has crossed the observation threshold: the older prefix is replaced by the compacted `[Observational Memory]` block, the recent-raw tail stays verbatim, and token usage drops substantially.
56
+
57
+ The injection is best-effort and boundary-safe — if a safe trim point can't be found (e.g. a pending tool-use/result pair sits at the window edge), OM injects the memory block _additively_ without trimming rather than risk dropping a pending tool result.
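One way to picture the boundary-safe behavior is the sketch below. The message shape (`toolUseIds`, `toolResultIds`), the function names, and the role given to the memory block are all assumptions for illustration; the actual read side may detect pending pairs differently. The sketch also assumes the thread already has OM entries.

```typescript
// Hypothetical message shape and helpers; not the framework's types.
interface ThreadMessage {
  role: "user" | "assistant" | "tool";
  text: string;
  toolUseIds?: string[]; // tool calls requested by this message
  toolResultIds?: string[]; // tool calls answered by this message
}

// Returns the index where the recent-raw tail starts, or null when trimming
// there would separate a tool call from its result.
function findSafeTrimIndex(messages: ThreadMessage[], recentCount: number): number | null {
  const cut = messages.length - recentCount;
  if (cut <= 0) return null; // nothing older than the recent window
  const requested: string[] = [];
  const resolved = new Set<string>();
  for (const message of messages.slice(0, cut)) {
    for (const id of message.toolUseIds ?? []) requested.push(id);
    for (const id of message.toolResultIds ?? []) resolved.add(id);
  }
  // Safe only if every tool call in the trimmed prefix was answered inside it.
  return requested.every((id) => resolved.has(id)) ? cut : null;
}

function assembleContext(
  messages: ThreadMessage[],
  memoryBlock: string,
  recentCount: number,
): ThreadMessage[] {
  const memory: ThreadMessage = { role: "user", text: memoryBlock };
  const cut = findSafeTrimIndex(messages, recentCount);
  // No safe trim point: inject additively and keep the full transcript.
  return cut === null ? [memory, ...messages] : [memory, ...messages.slice(cut)];
}
```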
58
+
59
+ ## Related
60
+
61
+ - [**Context X-Ray**](/docs/using-your-agent) — inspect what's actually in the live context window.
62
+ - [**Observability**](/docs/observability) — token and cost metrics per run, where OM's savings show up.
63
+ - [**Custom Agents & Teams**](/docs/agent-teams) — long sub-agent runs benefit from the same compaction.
@@ -40,6 +40,11 @@ the Plan MCP connector. They write `plans/<slug>/plan.mdx` plus optional
40
40
  npx @agent-native/core@latest plan local preview --dir plans/<slug> --kind plan --open
41
41
  ```
42
42
 
43
+ For folders in the current repo, the direct local route includes `?path=...` so
44
+ the local Plan app can keep browser edits saving to the repo folder. The Plan
45
+ app uses `apps.plan.roots[0].path` in `agent-native.json` as the default place
46
+ to save promoted local plans, falling back to `plans/`.
47
+
43
48
  This keeps plan content out of the Agent-Native Plan database. Hosted sharing,
44
49
  comments, screenshots, and plan history are unavailable until you explicitly
45
50
  publish later.
@@ -223,9 +223,10 @@ The returned URL opens the hosted Plan UI while the browser reads the recap MDX
223
223
  from a localhost bridge. Recap content is not written to the hosted Plan
224
224
  database, and the URL only works on the machine running the bridge. If you run
225
225
  the Plan app locally with the same `PLAN_LOCAL_DIR`, the
226
- `/local-plans/pr-123-visual-recap` route is also valid. This mode disables the
227
- hosted sticky PR comment, inline screenshot upload, usage attachment, and
228
- browser comments until you explicitly publish.
226
+ `/local-plans/pr-123-visual-recap` route is also valid. Repo-backed folders can
227
+ open as `/local-plans/pr-123-visual-recap?path=plans%2Fpr-123-visual-recap`.
228
+ This mode disables the hosted sticky PR comment, inline screenshot upload,
229
+ usage attachment, and browser comments until you explicitly publish.
229
230
 
230
231
  ## It's informational, not a gate
231
232
 
@@ -0,0 +1,99 @@
1
+ ---
2
+ title: "In-Loop Processors"
3
+ description: "Loop-internal observer/guardrail hooks that watch the model's streamed output and tool calls mid-run and can abort it — the seam for real-time guardrails and proof-of-done gates."
4
+ ---
5
+
6
+ # In-Loop Processors
7
+
8
+ A `Processor` is a loop-internal **observer/guardrail** for the agent run. It watches the model's streamed output and the tool calls it requests _as the run progresses_, keeps its own scratch state, and can **abort** the run before a "done" is claimed. This is the structural prerequisite for real-time guardrails (block disallowed output mid-stream) and a proof-of-done / coverage gate (inspect what the model is about to do and halt it).
9
+
10
+ > [!WARNING]
11
+ > A processor is **configuration**, not a tool, not an action, and not an authoring DSL. Processors only observe, mutate their own stream-scoped state, and `abort()`. They never define app behavior, replace actions, or appear to the model. App operations belong in [actions](/docs/actions).
12
+
13
+ ## The hooks {#hooks}
14
+
15
+ A processor implements any subset of three optional lifecycle hooks (the shape is borrowed from Mastra's output processors):
16
+
17
+ | Hook | Fires… | Use it to… |
18
+ | --------------------- | --------------------------------------------------------------------- | ----------------------------------------------------------- |
19
+ | `processOutputStream` | per streamed chunk (text / thinking deltas) while the model generates | react to output before the full turn lands |
20
+ | `processOutputStep` | once per model response, around tool execution | inspect the tool calls the model is about to run; gate them |
21
+ | `processOutputResult` | once at run end, with the final assistant text | record a verdict / proof-of-done over the completed answer |
22
+
23
+ Each processor gets its own mutable, run-scoped `state` object that persists across every one of its hook invocations within a single run and is **isolated** from other processors' state.
24
+
25
+ ```ts
26
+ import type { Processor } from "@agent-native/core";
27
+
28
+ const noSecretsInOutput: Processor = {
29
+   name: "no-secrets",
30
+   processOutputStream({ part, abort }) {
31
+     if (part.type === "text" && /sk-live_/.test(part.text)) {
32
+       abort("Model attempted to emit a live secret token.", {
33
+         kind: "secret-leak",
34
+       });
35
+     }
36
+   },
37
+ };
38
+
39
+ const coverageGate: Processor = {
40
+   name: "proof-of-done",
41
+   processOutputStep({ toolCalls, state }) {
42
+     // Track what the model has actually done this run...
43
+     for (const call of toolCalls) {
44
+       (state.ran ??= new Set<string>()).add(call.name);
45
+     }
46
+   },
47
+   processOutputResult({ text, state }) {
48
+     // ...and record a verdict over the final answer.
49
+     const ran = state.ran as Set<string> | undefined;
50
+     state.verdict = ran?.has("run-tests") ? "verified" : "unverified";
51
+   },
52
+ };
53
+ ```
54
+
55
+ ## Aborting with `TripWire` {#tripwire}
56
+
57
+ A hook halts the run by calling `abort(reason, meta?)`, which throws a **`TripWire`**. The loop catches it, emits a single **`tripwire` event**, stops cleanly, and surfaces the reason as the final assistant message.
58
+
59
+ ```ts
60
+ import { TripWire } from "@agent-native/core";
61
+ ```
62
+
63
+ The `tripwire` event carries:
64
+
65
+ | Field | Type | Notes |
66
+ | ----------- | -------- | -------------------------------------------------------------- |
67
+ | `reason` | `string` | The human-readable reason passed to `abort`. |
68
+ | `processor` | `string` | Name of the processor that aborted, when it declared a `name`. |
69
+
70
+ `TripWire` also carries optional structured `meta` and the originating `processor` name for programmatic consumers that `instanceof`-check it. Because a halt is graceful, `processOutputResult` still fires on the (halted) final text so a proof-of-done processor can record its verdict even when the run was aborted.
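The abort flow described above can be sketched with a toy loop. Everything here is a stand-in: `TripWireSketch`, `ProcessorSketch`, and `runLoopSketch` are hypothetical and deliberately synchronous, and they only model the documented behavior (isolated per-processor state, registration order, a single `tripwire` event, the reason surfacing as the final text, and `processOutputResult` still firing after a halt).

```typescript
// Stand-in for the exported TripWire class; illustrative only.
class TripWireSketch extends Error {
  processor?: string;
  meta?: Record<string, unknown>;
  constructor(reason: string, processor?: string, meta?: Record<string, unknown>) {
    super(reason);
    this.name = "TripWireSketch";
    this.processor = processor;
    this.meta = meta;
    Object.setPrototypeOf(this, new.target.prototype); // keep instanceof reliable
  }
}

type State = Record<string, unknown>;
type Abort = (reason: string, meta?: Record<string, unknown>) => never;

interface ProcessorSketch {
  name?: string;
  processOutputStream?(args: { part: { type: "text"; text: string }; state: State; abort: Abort }): void;
  processOutputResult?(args: { text: string; state: State }): void;
}

type LoopEvent =
  | { type: "text"; text: string }
  | { type: "tripwire"; reason: string; processor?: string };

function runLoopSketch(chunks: string[], processors: ProcessorSketch[] = []) {
  const events: LoopEvent[] = [];
  // One isolated, run-scoped state object per processor.
  const chain = processors.map((processor) => ({ processor, state: {} as State }));
  let finalText = "";
  try {
    for (const text of chunks) {
      // Hooks run in registration order for every streamed chunk.
      for (const { processor, state } of chain) {
        const abort: Abort = (reason, meta) => {
          throw new TripWireSketch(reason, processor.name, meta);
        };
        processor.processOutputStream?.({ part: { type: "text", text }, state, abort });
      }
      events.push({ type: "text", text });
      finalText += text;
    }
  } catch (err) {
    if (!(err instanceof TripWireSketch)) throw err;
    // Graceful halt: one tripwire event, and the reason becomes the final text.
    events.push({ type: "tripwire", reason: err.message, processor: err.processor });
    finalText = err.message;
  }
  // Result hooks still fire on the (possibly halted) final text.
  for (const { processor, state } of chain) {
    processor.processOutputResult?.({ text: finalText, state });
  }
  return { events, finalText };
}
```

With no processors supplied, the chain is empty and the sketch simply concatenates the chunks, which mirrors the "unused means unchanged" property of the real seam.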
71
+
72
+ ## Wiring processors {#wiring}
73
+
74
+ Processors are configured in code via the `processors` array on `runAgentLoop`:
75
+
76
+ ```ts
77
+ await runAgentLoop({
78
+   engine,
79
+   model,
80
+   systemPrompt,
81
+   tools,
82
+   messages,
83
+   actions,
84
+   send,
85
+   signal,
86
+   processors: [noSecretsInOutput, coverageGate],
87
+ });
88
+ ```
89
+
90
+ **Zero-overhead when unused.** The loop builds the processor chain only when at least one processor is supplied; when `processors` is omitted or empty, none of the seam code runs and the loop is byte-for-byte unchanged. Hooks run in registration order and may be sync or async.
91
+
92
+ > [!NOTE]
93
+ > The loop-level seam is the deliverable today and is callable directly by sub-agents, A2A, MCP, and tests. Threading `processors` through the HTTP chat handler (so a per-request resolver can configure them without calling `runAgentLoop` directly) is convenience plumbing that is not yet wired — configure processors at the `runAgentLoop` call site for now.
94
+
95
+ ## Related
96
+
97
+ - [**Durable Resume**](/docs/durable-resume) — how the loop survives interruptions without re-running completed side effects.
98
+ - [**Custom Agents & Teams**](/docs/agent-teams) — sub-agents run the same loop and can carry their own processors.
99
+ - [**Observability**](/docs/observability) — record processor verdicts alongside run traces.
@@ -97,6 +97,22 @@ connector, so use the Agent-Native CLI path when you want the one-command setup.
97
97
  > Plan skills _and_ the connector in one install and auto-updates as the skills
98
98
  > improve — see [Plan plugin & marketplace](/docs/plan-plugin).
99
99
 
100
+ ### Open Plans inside VS Code {#vscode-extension}
101
+
102
+ If you live in VS Code, the Agent Native VS Code extension can open the same
103
+ Plan review surface in a side panel instead of sending you to a separate browser
104
+ tab. Plans tools still return the normal web link, and the MCP metadata also
105
+ includes a VS Code handoff URL:
106
+
107
+ ```text
108
+ vscode://builderio.agent-native/open?url=<encoded-plan-url>
109
+ ```
110
+
111
+ The extension handles that URI, opens the decoded Plan URL in a VS Code webview,
112
+ and includes a command to run the existing Agent Native MCP connect flow for VS
113
+ Code / GitHub Copilot. This is especially useful from Claude Code or another
114
+ coding-agent workflow where the plan should stay next to the files being edited.
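A minimal sketch of building and decoding the handoff URL shown above. The helper names are hypothetical, and it assumes `<encoded-plan-url>` is ordinary percent-encoding as produced by `encodeURIComponent`; the example Plan URL is made up.

```typescript
// Hypothetical helpers; only the vscode:// prefix comes from the docs above.
const VSCODE_HANDOFF_PREFIX = "vscode://builderio.agent-native/open?url=";

function toVsCodeHandoffUrl(planUrl: string): string {
  // Assumption: the plan URL is percent-encoded as a single query value.
  return VSCODE_HANDOFF_PREFIX + encodeURIComponent(planUrl);
}

function fromVsCodeHandoffUrl(handoffUrl: string): string | null {
  if (!handoffUrl.startsWith(VSCODE_HANDOFF_PREFIX)) return null;
  return decodeURIComponent(handoffUrl.slice(VSCODE_HANDOFF_PREFIX.length));
}

// Made-up Plan URL, for illustration only.
const examplePlanUrl = "https://plans.example.com/p/abc?tab=review";
const exampleHandoff = toVsCodeHandoffUrl(examplePlanUrl);
```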
115
+
100
116
  ## Use it from your coding agent
101
117
 
102
118
  After installation, ask your agent for the command that fits the work:
@@ -110,9 +126,9 @@ After installation, ask your agent for the command that fits the work:
110
126
  before/after blocks instead of a wall of raw diff.
111
127
 
112
128
  The agent should inspect the codebase first, then create the visual plan when a
113
- wrong direction would be costly. The returned Plans link opens the review UI so
114
- you can annotate, correct, choose options, and ask for updates before code
115
- changes begin.
129
+ wrong direction would be costly. The returned Plans link opens the review UI in
130
+ the browser or VS Code, so you can annotate, correct, choose options, and ask for
131
+ updates before code changes begin.
116
132
 
117
133
  When a Codex, Claude Code, Markdown, or pasted plan already exists, use
118
134
  `/visual-plan`; the agent preserves that source plan and builds the richer review
@@ -207,12 +223,38 @@ not sent through hosted Plan actions. Keep the bridge process running while you
207
223
  review; the URL is local to your machine and is not a shareable team link.
208
224
 
209
225
  If you run the Plan app locally with the same `PLAN_LOCAL_DIR`, you can also
210
- open the read-only app route:
226
+ open the editable app route:
211
227
 
212
228
  ```text
213
229
  http://localhost:<port>/local-plans/<slug>
214
230
  ```
215
231
 
232
+ For repo-backed folders, the direct local route can carry the repo-relative
233
+ folder path so browser edits keep writing to that folder:
234
+
235
+ ```text
236
+ http://localhost:<port>/local-plans/<slug>?path=plans%2F<slug>
237
+ ```
238
+
239
+ The Plan app uses `apps.plan.roots[0].path` in `agent-native.json` as the
240
+ default repo location for promoted local plans, falling back to `plans/`:
241
+
242
+ ```json
243
+ {
244
+   "version": 1,
245
+   "apps": {
246
+     "plan": {
247
+       "mode": "local-files",
248
+       "roots": [{ "name": "Plans", "path": "plans", "kind": "plans" }]
249
+     }
250
+   }
251
+ }
252
+ ```
253
+
254
+ Direct local Plan routes include a menu action to save a temporary local folder
255
+ into that repo location. After promotion, the page reopens with `?path=...` and
256
+ continues autosaving MDX edits to the repo folder.
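The root lookup and the `?path=` route can be sketched together. `resolvePlanRoot` and `localPlanUrl` are hypothetical helpers, the config type only models the fields shown in the JSON above, and trailing-slash trimming is an assumption.

```typescript
// Hypothetical sketch; models only the agent-native.json fields shown above.
interface AgentNativeConfigSketch {
  apps?: {
    plan?: {
      mode?: string;
      roots?: Array<{ name?: string; path?: string; kind?: string }>;
    };
  };
}

function resolvePlanRoot(config: AgentNativeConfigSketch): string {
  const configured = config.apps?.plan?.roots?.[0]?.path;
  // Fall back to "plans" when no root is configured; drop trailing slashes.
  const root = (configured ?? "plans").replace(/\/+$/, "");
  return root.length > 0 ? root : "plans";
}

function localPlanUrl(port: number, slug: string, config: AgentNativeConfigSketch): string {
  const folder = `${resolvePlanRoot(config)}/${slug}`;
  // The repo-relative folder travels percent-encoded in the `path` query value.
  return `http://localhost:${port}/local-plans/${encodeURIComponent(slug)}?path=${encodeURIComponent(folder)}`;
}
```

For a slug of `pr-123-visual-recap` and no configured root, this produces the same `?path=plans%2Fpr-123-visual-recap` form used in the PR recap docs.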
257
+
216
258
  Local-files mode prevents plan or recap content from going to the Agent-Native
217
259
  Plan database. It also disables hosted sharing, browser comments, plan history,
218
260
  and publish/export receipts until you explicitly opt into publishing. To move a
@@ -244,6 +286,27 @@ This path does not require cloning the Plan app or running a CLI. It is for
244
286
  file-first review/editing around a hosted plan, not for keeping plan content out
245
287
  of the hosted database.
246
288
 
289
+ ## Deleting hosted plan data {#delete-data}
290
+
291
+ Signed-in owners can delete their hosted plans and recaps from the Plans list or
292
+ the plan action menu.
293
+
294
+ - **Soft delete** moves the plan to the **Deleted** tab, makes normal plan
295
+ views/direct links stop working, and removes public access by making the row
296
+ private. The SQL rows are retained so the owner can restore the plan later.
297
+ - **Restore** is available from the **Deleted** tab for soft-deleted plans.
298
+ - **Permanent delete** removes the hosted plan row and plan-scoped comments,
299
+ sections, activity events, version snapshots, share grants, abuse reports, and
300
+ SQL asset records. The UI requires typing `DELETE <plan-id>` before the final
301
+ button enables.
302
+
303
+ Permanent delete removes the Plan app's database records and SQL-backed asset
304
+ bytes/references. If a deployment uses an external upload provider, provider
305
+ object retention follows that provider's lifecycle because the shared upload
306
+ abstraction does not currently expose object deletion. Local-files privacy mode
307
+ keeps the source in your local MDX folder instead; deleting hosted data does not
308
+ touch local files.
309
+
247
310
  ## Useful prompts
248
311
 
249
312
  - "Use `/visual-plan` before changing the auth flow."
@@ -287,16 +350,16 @@ The local template is useful when you are developing Plans itself, testing local
287
350
 
288
351
  Schema lives in `templates/plan/server/db/schema.ts`. Core tables:
289
352
 
290
- | Table | What it holds |
291
- | ------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------ |
292
- | `plans` | Each plan or recap — `title`, `brief`, `kind` (plan/recap), `status`, `source`, `html`/`markdown`/`content`, `hosted_plan_id/url`, usage stats, `source_url` |
293
- | `plan_sections` | Ordered sections within a plan — `type`, `title`, `body`, `html`, `sort_order`, `created_by` |
294
- | `plan_comments` | Threaded comments — `kind`, `status`, `anchor`, `message`, `resolution_target`, `mentions_json`, `resolved_by` |
295
- | `plan_events` | Audit log of agent/human events on a plan |
296
- | `plan_versions` | Point-in-time snapshots for version history |
297
- | `plan_shares` | Per-principal share grants (viewer / editor / admin) |
298
- | `plan_guest_mints` | Rate-limit records for guest session issuance |
299
- | `plan_assets` | Inline image assets stored as base64 (fallback when no upload provider) |
353
+ | Table | What it holds |
354
+ | ------------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
355
+ | `plans` | Each plan or recap — `title`, `brief`, `kind` (plan/recap), `status`, `source`, `html`/`markdown`/`content`, `hosted_plan_id/url`, usage stats, `source_url`, `deleted_at`/`deleted_by` |
356
+ | `plan_sections` | Ordered sections within a plan — `type`, `title`, `body`, `html`, `sort_order`, `created_by` |
357
+ | `plan_comments` | Threaded comments — `kind`, `status`, `anchor`, `message`, `resolution_target`, `mentions_json`, `resolved_by` |
358
+ | `plan_events` | Audit log of agent/human events on a plan |
359
+ | `plan_versions` | Point-in-time snapshots for version history |
360
+ | `plan_shares` | Per-principal share grants (viewer / editor / admin) |
361
+ | `plan_guest_mints` | Rate-limit records for guest session issuance |
362
+ | `plan_assets` | Inline image assets stored as base64 (fallback when no upload provider) |
300
363
 
301
364
  ### Key actions
302
365
 
@@ -304,6 +367,7 @@ Actions in `templates/plan/actions/`:
304
367
 
305
368
  - **Creation** — `create-visual-plan`, `create-visual-recap`, `create-ui-plan`, `create-prototype-plan`, `create-plan-design`, `create-visual-questions`
306
369
  - **Reading & editing** — `get-visual-plan`, `update-visual-plan`, `list-visual-plans`, `import-visual-plan-source`, `patch-visual-plan-source`, `read-visual-plan-source`, `export-visual-plan`
370
+ - **Lifecycle** — `delete-visual-plan` for owner-only soft delete, restore, and typed-confirmation permanent delete
307
371
  - **Publishing & sharing** — `publish-visual-plan`
308
372
  - **Versions** — `list-plan-versions`, `get-plan-version`, `restore-plan-version`
309
373
  - **Comments & feedback** — `get-plan-feedback`, `reply-to-plan-comment`, `resolve-plan-comment`, `consume-plan-feedback`, `delete-plan-comment`
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@agent-native/core",
3
- "version": "0.53.0",
3
+ "version": "0.54.0",
4
4
  "type": "module",
5
5
  "engines": {
6
6
  "node": ">=22"
@@ -165,6 +165,7 @@
165
165
  "@libsql/client": "^0.15.0",
166
166
  "@modelcontextprotocol/ext-apps": "1.7.2",
167
167
  "@modelcontextprotocol/sdk": "^1.29.0",
168
+ "@mozilla/readability": "0.6.0",
168
169
  "@neondatabase/serverless": "^1.1.0",
169
170
  "@radix-ui/react-dialog": "1.1.15",
170
171
  "@radix-ui/react-dropdown-menu": "^2.1.16",
@@ -212,6 +213,7 @@
212
213
  "jiti": "^2.6.1",
213
214
  "jose": "^6.2.2",
214
215
  "kiwi-schema": "^0.5.0",
216
+ "linkedom": "0.18.12",
215
217
  "lowlight": "^3.3.0",
216
218
  "minimatch": "^10.0.0",
217
219
  "nanoid": "^5.1.9",
@@ -224,10 +226,12 @@
224
226
  "recharts": "^3.8.1",
225
227
  "remark-gfm": "^4.0.1",
226
228
  "roughjs": "4.6.6",
229
+ "safe-regex2": "5.1.1",
227
230
  "shiki": "^4.0.2",
228
231
  "sonner": "^2.0.7",
229
232
  "tailwind-merge": "^3.5.0",
230
233
  "tiptap-markdown": "^0.9.0",
234
+ "turndown": "7.2.4",
231
235
  "tw-animate-css": "1.4.0",
232
236
  "y-protocols": "^1.0.7",
233
237
  "yjs": "^13.6.31",
@@ -380,6 +384,7 @@
380
384
  "@types/pako": "^2.0.4",
381
385
  "@types/react": "^19.2.14",
382
386
  "@types/react-dom": "^19.2.3",
387
+ "@types/turndown": "5.0.6",
383
388
  "@types/ws": "^8.18.1",
384
389
  "@vitejs/plugin-react-swc": "^4.0.0",
385
390
  "@vitest/coverage-v8": "4.1.5",
@@ -112,7 +112,10 @@ action trio instead:
112
112
  docs/spec URLs, placeholders, and examples without exposing secrets.
113
113
  - `provider-api-docs`: fetches public provider docs/spec/changelog URLs when
114
114
  the exact endpoint, filter operator, payload shape, or pagination contract is
115
- uncertain. Registered docs URLs are curated starting points.
115
+ uncertain. Registered docs URLs are curated starting points. Use
116
+ `responseMode: "markdown"` for clean readable docs, or
117
+ `responseMode: "matches"` with `search: { query | terms | regex }` for
118
+ compact snippets instead of flooding context with raw HTML.
116
119
  - `provider-api-request`: makes a constrained authenticated HTTP request to the
117
120
  provider host, injects configured credentials, blocks private/internal URLs,
118
121
  and redacts secrets.
@@ -151,6 +154,12 @@ pagination status, truncation, failed pages, and uncovered gaps. They must not
151
154
  turn default limits, sampled rows, truncated excerpts, or aborted calls into a
152
155
  confident "none found", "all records", or exhaustive conclusion.
153
156
 
157
+ For public web pages and docs, prefer the token-efficient path: `web-search`
158
+ to find likely URLs, `web-request` or `provider-api-docs` with clean
159
+ `responseMode` output to read a page, and `run-code` with `webRead()` /
160
+ `webFetch()` when you need to grep, aggregate, or compare many pages before
161
+ returning a small result.
162
+
154
163
  ### The `http` Option
155
164
 
156
165
  Controls how the action is exposed as an HTTP endpoint:
@@ -195,6 +204,48 @@ run: async (args) => {
195
204
  }
196
205
  ```
197
206
 
207
+ ### Validating Return Values (`outputSchema`)
208
+
209
+ `schema` validates inputs; `outputSchema` validates what the action **returns**. Pass any Standard Schema-compatible schema (Zod, Valibot, ArkType) and the framework validates the result _after_ `run()` resolves — input validated before `run`, output after.
210
+
211
+ ```ts
212
+ export default defineAction({
213
+ description: "Summarize a thread.",
214
+ schema: z.object({ threadId: z.string() }),
215
+ outputSchema: z.object({ summary: z.string(), messageCount: z.number() }),
216
+ outputErrorStrategy: "warn", // default; "strict" | "fallback"
217
+ // outputFallback: { summary: "", messageCount: 0 }, // used only by "fallback"
218
+ run: async ({ threadId }) => {
219
+ /* ... */
220
+ },
221
+ });
222
+ ```
223
+
224
+ - `"warn"` (default) — `console.warn` the issues and return the **original** result unchanged. Non-breaking.
225
+ - `"strict"` — throw a clear error so a buggy action surfaces loudly.
226
+ - `"fallback"` — return `outputFallback` in place of the invalid result.
227
+
228
+ On success the validated value is returned, so coercion/defaults on `outputSchema` apply. Omit `outputSchema` and behavior is byte-for-byte unchanged (no wrapping).
229
+
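The three `outputErrorStrategy` values described above can be sketched roughly as follows. This is an illustration only, not the framework's implementation: `Validator`, `ValidationResult`, `applyOutputStrategy`, and `validateSummary` are hypothetical names, and a hand-written validator stands in for a Standard Schema-compatible schema.

```ts
// Hypothetical sketch of the strategies; NOT the @agent-native/core implementation.
type ValidationResult<T> =
  | { ok: true; value: T }
  | { ok: false; issues: string[] };

type Validator<T> = (input: unknown) => ValidationResult<T>;

type OutputErrorStrategy = "warn" | "strict" | "fallback";

function applyOutputStrategy<T>(
  result: unknown,
  validate: Validator<T>,
  strategy: OutputErrorStrategy = "warn",
  fallback?: T,
): unknown {
  const checked = validate(result);
  // On success, return the validated value so coercion/defaults take effect.
  if (checked.ok) return checked.value;

  switch (strategy) {
    case "strict":
      // Surface the bug loudly.
      throw new Error(`Invalid action output: ${checked.issues.join("; ")}`);
    case "fallback":
      // Replace the invalid result with the configured fallback.
      return fallback;
    default:
      // "warn": log the issues and return the original result unchanged.
      console.warn("Invalid action output:", checked.issues);
      return result;
  }
}

// Stand-in validator for { summary: string, messageCount: number }.
interface Summary {
  summary: string;
  messageCount: number;
}

const validateSummary: Validator<Summary> = (input) => {
  const v = input as Partial<Summary> | null;
  const issues: string[] = [];
  if (typeof v?.summary !== "string") issues.push("summary must be a string");
  if (typeof v?.messageCount !== "number") issues.push("messageCount must be a number");
  return issues.length === 0
    ? { ok: true, value: { summary: v!.summary!, messageCount: v!.messageCount! } }
    : { ok: false, issues };
};
```

In this sketch, `"warn"` hands back the very same object the action returned, which is what makes it non-breaking, while the success path returns the validator's rebuilt value.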
230
+ ### Human-in-the-Loop Approval (`needsApproval`)
231
+
232
+ For high-consequence, outward-facing, hard-to-undo actions (sending an email, charging a card, deleting an account), set `needsApproval` so the agent **cannot** run the action without a human approving the specific call:
233
+
234
+ ```ts
235
+ export default defineAction({
236
+ description: "Send an email via Gmail.",
237
+ schema: z.object({ to: z.string(), subject: z.string(), body: z.string() }),
238
+ needsApproval: true, // boolean, or (args, ctx) => boolean | Promise<boolean>
239
+ run: async (args) => {
240
+ /* ...actually send... */
241
+ },
242
+ });
243
+ ```
244
+
245
+ When the gate is truthy and the call isn't yet approved, the loop emits an `approval_required` event and **stops the turn — `run()` never executes**. A predicate gates conditionally (e.g. only external recipients) and **fails closed**: a throw is treated as "approval required". The human approves via the chat UI's Approve affordance, which re-issues the turn with the call's `approvalKey`, and only then does the action run.
246
+
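The fail-closed rule for a `needsApproval` predicate can be sketched as below. This is a minimal illustration under assumptions, not the loop's actual code: `ApprovalGate` and `resolveNeedsApproval` are hypothetical names, and only the "boolean or `(args, ctx) => boolean | Promise<boolean>`" shape comes from the text above.

```ts
// Hypothetical sketch; the real agent loop may evaluate the gate differently.
type ApprovalGate<A, C> =
  | boolean
  | ((args: A, ctx: C) => boolean | Promise<boolean>);

async function resolveNeedsApproval<A, C>(
  gate: ApprovalGate<A, C> | undefined,
  args: A,
  ctx: C,
): Promise<boolean> {
  // The default is off: no gate means no approval step.
  if (gate === undefined) return false;
  if (typeof gate === "boolean") return gate;
  try {
    // Predicates may be sync or async; await handles both.
    return Boolean(await gate(args, ctx));
  } catch {
    // Fail closed: a throwing predicate is treated as "approval required".
    return true;
  }
}
```

Treating a thrown predicate as `true` means a bug in the gate can only cause an extra approval prompt, never an unapproved run.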
247
+ **Keep approvals rare** — the default is off and almost every action should leave it off. The canonical example is Mail's `send-email` (`needsApproval: true`). See the `security` skill and the Human Approval doc.
248
+
198
249
  ## Frontend Hooks
199
250
 
200
251
  The frontend calls actions using React Query hooks from `@agent-native/core/client`. Components should not hand-write `fetch("/_agent-native/actions/...")`; add or reuse a client hook/helper instead. Use `callAction` from the same package for imperative cases that do not fit a hook, such as debounced search, prefetching, or non-React event handlers.
@@ -139,6 +139,28 @@ export default defineEventHandler(async (event) => {
139
139
 
140
140
  - Never create unprotected routes that modify data.
141
141
 
142
+ ## Human-in-the-Loop Approval for High-Consequence Actions
143
+
144
+ For a small set of outward-facing, hard-to-undo operations — sending an email, charging a card, deleting an account, posting publicly — auth and access control are necessary but not sufficient: you also do not want the **agent** to perform them autonomously. Set `needsApproval` on the `defineAction` so the agent cannot run the action without a human approving the specific call.
145
+
146
+ ```ts
147
+ export default defineAction({
148
+ description: "Send an email via Gmail.",
149
+ schema: z.object({ to: z.string(), subject: z.string(), body: z.string() }),
150
+ needsApproval: true, // or (args, ctx) => boolean | Promise<boolean>
151
+ run: async (args) => {
152
+ /* ...actually send... */
153
+ },
154
+ });
155
+ ```
156
+
157
+ When the gate is truthy and the call is not yet approved, the loop emits an `approval_required` event and **stops the turn — `run()` never executes**. The human approves via the chat UI's Approve affordance, which re-issues the turn with the call's stable `approvalKey`; only then does the action run. A predicate gates conditionally (e.g. only external recipients) and **fails closed** — a throw is treated as "approval required".
158
+
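A conditional gate such as "only external recipients" might look like the sketch below. The source does not say what `ctx` contains, so the `ApprovalContext` shape (an `orgDomain` field) and the function name `needsApprovalForExternal` are assumptions made purely for illustration.

```ts
// Hypothetical predicate; the ctx shape is assumed, not taken from the framework.
interface SendEmailArgs {
  to: string;
  subject: string;
  body: string;
}

interface ApprovalContext {
  orgDomain: string; // assumed field: the sender's own organization domain
}

function needsApprovalForExternal(
  args: SendEmailArgs,
  ctx: ApprovalContext,
): boolean {
  const at = args.to.lastIndexOf("@");
  const domain = at >= 0 ? args.to.slice(at + 1).trim().toLowerCase() : "";
  // An unparseable address is treated as external, so it still requires approval.
  return domain === "" || domain !== ctx.orgDomain.trim().toLowerCase();
}
```

Under these assumptions, a function like this would be passed as `needsApproval` in place of `true`, so internal mail runs without a prompt while anything else waits for a human.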
159
+ Rules:
160
+
161
+ - Reach for `needsApproval` only for genuinely high-consequence operations. The default is off, and the framework intentionally keeps approvals rare — over-gating turns the agent into a click-through wizard. The canonical (and intentionally lone) framework example is Mail's `send-email`.
162
+ - `needsApproval` is **not** a substitute for `accessFilter` / `assertAccess` or for hiding sensitive operations from the model with `agentTool: false` / `toolCallable: false`. It is the layer for "a human must explicitly bless this specific outward-facing call," not for scoping data. See the `actions` skill for the full surface.
163
+
142
164
  ## Custom HTTP Routes Must Apply Access Control Themselves
143
165
 
144
166
  This is the single most-failed rule in the codebase. Auto-mounted action routes (`/_agent-native/actions/...`) get a request context wired up automatically. **Hand-written `/api/*` Nitro routes do not.** If your handler queries an ownable resource (any table with `...ownableColumns()`), you MUST:
@@ -112,7 +112,10 @@ action trio instead:
112
112
  docs/spec URLs, placeholders, and examples without exposing secrets.
113
113
  - `provider-api-docs`: fetches public provider docs/spec/changelog URLs when
114
114
  the exact endpoint, filter operator, payload shape, or pagination contract is
115
- uncertain. Registered docs URLs are curated starting points.
115
+ uncertain. Registered docs URLs are curated starting points. Use
116
+ `responseMode: "markdown"` for clean readable docs, or
117
+ `responseMode: "matches"` with `search: { query | terms | regex }` for
118
+ compact snippets instead of flooding context with raw HTML.
116
119
  - `provider-api-request`: makes a constrained authenticated HTTP request to the
117
120
  provider host, injects configured credentials, blocks private/internal URLs,
118
121
  and redacts secrets.
@@ -151,6 +154,12 @@ pagination status, truncation, failed pages, and uncovered gaps. They must not
151
154
  turn default limits, sampled rows, truncated excerpts, or aborted calls into a
152
155
  confident "none found", "all records", or exhaustive conclusion.
153
156
 
157
+ For public web pages and docs, prefer the token-efficient path: `web-search`
158
+ to find likely URLs, `web-request` or `provider-api-docs` with clean
159
+ `responseMode` output to read a page, and `run-code` with `webRead()` /
160
+ `webFetch()` when you need to grep, aggregate, or compare many pages before
161
+ returning a small result.
162
+
154
163
  ### The `http` Option
155
164
 
156
165
  Controls how the action is exposed as an HTTP endpoint:
@@ -195,6 +204,48 @@ run: async (args) => {
195
204
  }
196
205
  ```
197
206
 
207
+ ### Validating Return Values (`outputSchema`)
208
+
209
+ `schema` validates inputs; `outputSchema` validates what the action **returns**. Pass any Standard Schema-compatible schema (Zod, Valibot, ArkType) and the framework validates the result _after_ `run()` resolves — input validated before `run`, output after.
210
+
211
+ ```ts
212
+ export default defineAction({
213
+ description: "Summarize a thread.",
214
+ schema: z.object({ threadId: z.string() }),
215
+ outputSchema: z.object({ summary: z.string(), messageCount: z.number() }),
216
+ outputErrorStrategy: "warn", // default; "strict" | "fallback"
217
+ // outputFallback: { summary: "", messageCount: 0 }, // used only by "fallback"
218
+ run: async ({ threadId }) => {
219
+ /* ... */
220
+ },
221
+ });
222
+ ```
223
+
224
+ - `"warn"` (default) — `console.warn` the issues and return the **original** result unchanged. Non-breaking.
225
+ - `"strict"` — throw a clear error so a buggy action surfaces loudly.
226
+ - `"fallback"` — return `outputFallback` in place of the invalid result.
227
+
228
+ On success the validated value is returned, so coercion/defaults on `outputSchema` apply. Omit `outputSchema` and behavior is byte-for-byte unchanged (no wrapping).
229
+
230
+ ### Human-in-the-Loop Approval (`needsApproval`)
231
+
232
+ For high-consequence, outward-facing, hard-to-undo actions (sending an email, charging a card, deleting an account), set `needsApproval` so the agent **cannot** run the action without a human approving the specific call:
233
+
234
+ ```ts
235
+ export default defineAction({
236
+ description: "Send an email via Gmail.",
237
+ schema: z.object({ to: z.string(), subject: z.string(), body: z.string() }),
238
+ needsApproval: true, // boolean, or (args, ctx) => boolean | Promise<boolean>
239
+ run: async (args) => {
240
+ /* ...actually send... */
241
+ },
242
+ });
243
+ ```
244
+
245
+ When the gate is truthy and the call isn't yet approved, the loop emits an `approval_required` event and **stops the turn — `run()` never executes**. A predicate gates conditionally (e.g. only external recipients) and **fails closed**: a throw is treated as "approval required". The human approves via the chat UI's Approve affordance, which re-issues the turn with the call's `approvalKey`, and only then does the action run.
246
+
247
+ **Keep approvals rare** — the default is off and almost every action should leave it off. The canonical example is Mail's `send-email` (`needsApproval: true`). See the `security` skill and the Human Approval doc.
248
+
198
249
  ## Frontend Hooks
199
250
 
200
251
  The frontend calls actions using React Query hooks from `@agent-native/core/client`. Components should not hand-write `fetch("/_agent-native/actions/...")`; add or reuse a client hook/helper instead. Use `callAction` from the same package for imperative cases that do not fit a hook, such as debounced search, prefetching, or non-React event handlers.
@@ -197,7 +197,7 @@ path is obvious.
197
197
  `defineAction` accepts an optional `link` builder. When set, every MCP/A2A
198
198
  result for that tool auto-appends a markdown `[label →](absoluteUrl)` block and
199
199
  a structured `_meta["agent-native/openLink"] = { label, view, webUrl,
200
- desktopUrl }`; `tools/list` adds
200
+ desktopUrl, vscodeUrl }`; `tools/list` adds
201
201
  `annotations["agent-native/producesOpenLink"]` plus a description suffix so the
202
202
  external agent knows the tool yields an openable link.
203
203
 
@@ -285,9 +285,11 @@ ngrok/prod testing caveats are documented in
285
285
 
286
286
  `buildDeepLink(...)` returns the app-relative path
287
287
  `/_agent-native/open?app=…&view=…&<recordId>=…`. The MCP layer turns that into
288
- an absolute web URL (`toAbsoluteOpenUrl`, using the request origin) and a
289
- desktop `agentnative://open?…` URL (`toDesktopOpenUrl`). When the user clicks
290
- it in any browser or inline webview, `GET /_agent-native/open`
288
+ an absolute web URL (`toAbsoluteOpenUrl`, using the request origin), a
289
+ desktop `agentnative://open?…` URL (`toDesktopOpenUrl`), and a VS Code
290
+ extension URL (`toVsCodeOpenUrl`) for
291
+ `vscode://builderio.agent-native/open?url=…`. When the user clicks the web
292
+ link in any browser or inline webview, `GET /_agent-native/open`
291
293
  (`createOpenRouteHandler`, mounted by the core routes plugin, gated by
292
294
  `disableOpenRoute`, customizable via `resolveOpenPath`):
293
295
 
@@ -220,3 +220,14 @@ await putSetting("observability-config", {
220
220
  ```
221
221
 
222
222
  The framework emits `gen_ai.*` semantic convention spans compatible with Langfuse, Datadog, Grafana, New Relic, and any OTel-compatible backend.
223
+
224
+ ## Live OpenTelemetry Spans (Optional)
225
+
226
+ Separate from the `exporters` config above (which ships the in-house traces to an OTLP endpoint), the agent loop can also emit **live OpenTelemetry spans** for every run, model call, and tool call, so a host that already runs an OTel collector sees agent activity alongside its other distributed traces.
227
+
228
+ This layer is optional and **no-op by default**:
229
+
230
+ - `@opentelemetry/api` is an **optional dependency**. If it isn't installed, the span helpers degrade to silent no-ops — they never throw into the agent loop.
231
+ - Even with the api package installed, it ships a default no-op tracer. Spans become real only once the **host registers a `TracerProvider`** (via `@opentelemetry/sdk-node` or similar). The framework deliberately does not depend on the heavy SDK/exporter packages and never registers a provider itself — instrumentation is opt-in by the embedding app.
232
+
233
+ The loop emits `agent.run` (with `agent.run_id`, `agent.thread_id`, `agent.user_id`, `agent.model`), `tool.call` (`tool.name` + status), and `llm.call` spans, each finished with OK/ERROR status. This is purely additive to the in-house `agent_trace_spans` / `agent_trace_summaries` tables. Source: `packages/core/src/observability/tracing.ts` + `traces.ts`. See the Observability doc for the full table.