npm - @synapsor/runner - Versions diffs - 0.1.0-alpha.9 → 0.1.1 - Mend

@synapsor/runner 0.1.0-alpha.9 → 0.1.1


Files changed (85)

package/CHANGELOG.md +189 -0
package/README.md +949 -164
package/dist/cli.d.ts +2 -0
package/dist/cli.d.ts.map +1 -1
package/dist/runner.mjs +2982 -238
package/docs/README.md +90 -15
package/docs/app-owned-executors.md +38 -0
package/docs/capability-authoring.md +265 -0
package/docs/cloud-mode.md +24 -0
package/docs/current-scope.md +29 -0
package/docs/dependency-license-inventory.md +35 -0
package/docs/doctor.md +98 -0
package/docs/getting-started-own-database.md +131 -46
package/docs/handler-helper.md +228 -0
package/docs/http-mcp.md +85 -17
package/docs/licensing.md +36 -0
package/docs/local-mode.md +44 -25
package/docs/mcp-audit.md +8 -8
package/docs/mcp-client-setup.md +59 -21
package/docs/openai-agents-sdk.md +57 -0
package/docs/recipes.md +6 -6
package/docs/release-notes.md +348 -0
package/docs/release-policy.md +125 -0
package/docs/result-envelope-v2.md +151 -0
package/docs/rfcs/001-result-envelope-v2.md +143 -0
package/docs/rfcs/002-app-owned-handler-helper.md +161 -0
package/docs/rfcs/003-integrator-feedback-teardown.md +97 -0
package/docs/store-lifecycle.md +83 -0
package/docs/troubleshooting-first-run.md +6 -6
package/docs/use-your-own-database.md +18 -0
package/docs/writeback-executors.md +92 -1
package/examples/app-owned-writeback/README.md +128 -0
package/examples/app-owned-writeback/business-actions.md +221 -0
package/examples/app-owned-writeback/command-handler.mjs +55 -0
package/examples/app-owned-writeback/node-fastify-handler.mjs +64 -0
package/examples/app-owned-writeback/python-fastapi-handler.py +66 -0
package/examples/claude-desktop-postgres/Makefile +6 -0
package/examples/claude-desktop-postgres/README.md +40 -0
package/examples/cursor-postgres/Makefile +6 -0
package/examples/cursor-postgres/README.md +30 -0
package/examples/mcp-postgres-billing-app-handler/README.md +94 -0
package/examples/mcp-postgres-billing-app-handler/app-handler.mjs +123 -0
package/examples/mcp-postgres-billing-app-handler/docker-compose.yml +13 -0
package/examples/mcp-postgres-billing-app-handler/schema.sql +59 -0
package/examples/mcp-postgres-billing-app-handler/scripts/run-demo.sh +100 -0
package/examples/mcp-postgres-billing-app-handler/seed.sql +39 -0
package/examples/mcp-postgres-billing-app-handler/synapsor-handler.mjs +437 -0
package/examples/mcp-postgres-billing-app-handler/synapsor.runner.json +158 -0
package/examples/mysql-refund-agent/Makefile +4 -0
package/examples/mysql-refund-agent/README.md +36 -0
package/examples/openai-agents-http/Makefile +6 -0
package/examples/openai-agents-http/README.md +33 -12
package/examples/openai-agents-http/agent.py +29 -65
package/examples/openai-agents-stdio/Makefile +6 -0
package/examples/openai-agents-stdio/README.md +24 -6
package/examples/openai-agents-stdio/agent.py +4 -2
package/examples/raw-sql-vs-synapsor/Makefile +11 -0
package/examples/raw-sql-vs-synapsor/README.md +41 -0
package/examples/reference-support-billing-app/README.md +16 -16
package/examples/reference-support-billing-app/mcp-client.generic.json +1 -1
package/examples/support-billing-agent/Makefile +19 -0
package/examples/support-billing-agent/README.md +89 -0
package/examples/support-billing-agent/app/README.md +13 -0
package/examples/support-billing-agent/db/schema.sql +91 -0
package/examples/support-billing-agent/db/seed.sql +43 -0
package/examples/support-billing-agent/docker-compose.yml +13 -0
package/examples/support-billing-agent/scripts/run-demo.sh +15 -0
package/examples/support-billing-agent/synapsor.runner.json +233 -0
package/fixtures/benchmark/mcp-efficiency.json +53 -0
package/fixtures/benchmark/mcp-efficiency.txt +25 -0
package/fixtures/protocol/MANIFEST.json +54 -0
package/fixtures/protocol/change-set.late-fee-waiver.v1.json +72 -0
package/fixtures/protocol/execution-receipt.applied.v1.json +14 -0
package/fixtures/protocol/execution-receipt.conflict.v1.json +15 -0
package/fixtures/protocol/runner-registration.v1.json +22 -0
package/fixtures/protocol/writeback-job.late-fee-waiver.v1.json +44 -0
package/package.json +27 -4
package/schemas/change-set.v1.schema.json +140 -0
package/schemas/execution-receipt.v1.schema.json +34 -0
package/schemas/onboarding-selection.v1.schema.json +132 -0
package/schemas/runner-registration.v1.schema.json +48 -0
package/schemas/synapsor.app-handler-receipt.v1.json +39 -0
package/schemas/synapsor.app-handler-request.v1.json +119 -0
package/schemas/synapsor.runner.schema.json +415 -0
package/schemas/writeback-job.v1.schema.json +121 -0

package/docs/rfcs/001-result-envelope-v2.md ADDED

@@ -0,0 +1,143 @@
+# Proposal: One Result Envelope for all Synapsor Runner tool results
+Status: draft for synapsor-runner (OSS)
+Author: external integrator feedback (built a full OpenAI Agents SDK + Postgres lab on alpha.6→alpha.11)
+Goal: make every tool result (read, proposal, error) share one shape that is both
+**machine-branchable** and **LLM-legible**, so agents behave reliably and client
+code stops special-casing.
+## Why
+Observed today (alpha.11), the shapes diverge:
+```jsonc
+// read success
+{ "status": "ok", "action": "billing.inspect_invoice", "data": { ... },
+  "evidence_bundle_id": "ev_…", "trusted_context": { ... }, "source_database_changed": false }
+// read not-found / tenant mismatch
+{ "ok": false, "code": "ROW_NOT_FOUND", "error": "The scoped capability read did not find exactly one authorized row." }
+// proposal success
+{ "status": "review_required", "proposal_id": "wrp_…", "diff": { "late_fee_cents": { "before": 2500, "proposed": 0 } },
+  "source_database_changed": false }
+```
+Two different branch keys (`status` vs `ok`) across three shapes, two different
+success vocabularies, and raw infra strings leaking into `error`. An LLM driving
+these has to learn three branches; in my live tests the model **stalled / misreported**
+("database access issue") when it hit the off-shape error path.
+## The envelope
+Every tool returns exactly this top-level shape:
+```jsonc
+{
+  "ok": true,                       // boolean — the ONLY field client code must branch on
+  "summary": "Invoice INV-3001: $100 + $25 late fee, status overdue.",  // one-line NL for the model to read/echo
+  "action": "billing.inspect_invoice",
+  "kind": "read",                   // read | proposal
+  "data": { ... } | null,           // read payload (the row), or null
+  "proposal": { ... } | null,       // proposal payload (see below), or null
+  "error": { ... } | null,          // populated iff ok=false (see below)
+  "evidence": { "bundle_id": "ev_…", "note": "audit handle; you do not need to act on it" } | null,
+  "source_database_changed": false, // ALWAYS present; true only after applied writeback
+  "_meta": { "tenant_id": "tenant_acme", "principal": "demo-operator", "provenance": "environment",
+             "canonical_capability": "billing.inspect_invoice" }
+}
+```
+Rules:
+- `ok` is the single branch point. No more `status` vs `ok`.
+- `summary` is mandatory and is the field the model is expected to read first and can
+  echo back to the user. Keep it short, factual, no internal ids unless useful.
+- Exactly one of `data` / `proposal` is non-null on success; both null on error.
+- `_meta` carries the trusted context and the canonical capability name (so OpenAI-safe
+  aliases like `billing__inspect_invoice` still expose their real name for reasoning/audit).
+### Proposal payload
+```jsonc
+"proposal": {
+  "id": "wrp_…",
+  "state": "review_required",      // review_required | approved | applied | conflict | rejected
+  "target": "invoices:INV-3001",
+  "diff": { "late_fee_cents": { "before": 2500, "proposed": 0 } },
+  "approval_required": true,
+  "writeback": { "mode": "direct_update" | "app_handler", "applied": false },
+  "next": "A human must approve outside this loop; nothing is committed yet."
+}
+```
+This part of today's output is already the best-designed — keep `diff.before/proposed`
+and `approval_required` verbatim, just move it under `proposal`.
+### Error payload (safe + stable)
+```jsonc
+"error": {
+  "code": "NOT_FOUND_IN_TENANT",   // STABLE enum (below) — never a raw infra string
+  "message": "No invoice INV-9999 is visible in your tenant.",  // safe, terse, actionable
+  "retryable": false
+}
+```
+Never surface raw driver text (`connect ECONNREFUSED 127.0.0.1:5433`) to the tool
+caller — log that to the local ledger only. Leaking it is a small info disclosure
+**and** degrades LLM behavior (the model parrots infra errors to the user).
+## Stable error code enum
+| code | meaning | retryable |
+|---|---|---|
+| `NOT_FOUND_IN_TENANT` | lookup found 0 authorized rows (missing OR wrong tenant — do not distinguish, it's a scoping signal) | no |
+| `INVALID_ARGUMENT` | arg failed schema/`numeric_bounds` | no |
+| `POLICY_VIOLATION` | request outside an allowed bound/transition | no |
+| `CAPABILITY_NOT_FOUND` | unknown tool name | no |
+| `VERSION_CONFLICT` | row changed since the agent saw it (stale-row guard) | no (re-inspect first) |
+| `MULTI_ROW_BLOCKED` | a write would touch ≠1 row | no |
+| `APPROVAL_REQUIRED` | attempted to apply without approval | no |
+| `TEMPORARILY_UNAVAILABLE` | DB/handler unreachable or timed out | yes |
+| `INTERNAL` | anything else (details only in ledger) | maybe |
+Keep this list small and documented; the model can be told "on `VERSION_CONFLICT`,
+re-inspect then re-propose" and act correctly.
+## Tool descriptions (ships with the envelope, same impact)
+The envelope fixes *results*; descriptions fix *whether the model calls the right
+tool at all*. Today they're generic ("Read public.outage_events through a reviewed
+Synapsor capability…"). Let capability config carry model-facing text:
+```jsonc
+{
+  "name": "support.inspect_outage",
+  "description": "Look up an outage event: its time window, affected plan, and the credit policy that governs waivers/credits. Use this before deciding whether a waiver or credit is justified.",
+  "args": {
+    "outage_id": { "type": "string", "description": "Outage/incident id, e.g. OUT-9001 (often referenced in the support ticket)." }
+  },
+  "returns_hint": "Returns the outage window, affected_plan, and credit_policy."
+}
+```
+In my live runs, the difference between a reliable agent and one that stalled before
+proposing was exactly this: whether the outage tool's description told the model it
+returns the *policy*. Surface `description` + per-arg `description` + `returns_hint`
+in `tools/list`. Fall back to today's auto-text only when the author omits them.
+## Migration
+- Add `"result_format": 2` to `synapsor.runner.json` (or a server flag
+  `--result-format v2`); default stays v1 for one minor cycle, then flips.
+- During transition the server can **dual-emit**: v2 envelope with a `legacy` mirror
+  of the old keys, so existing parsers don't break.
+- Document a one-line mapping: `status:"ok"` → `ok:true`; `status:"review_required"`
+  → `ok:true, proposal.state:"review_required"`; top-level `code/error` → `error.{code,message}`.
+## Acceptance
+- All of `tools/call` (read + proposal) and tool-level failures return the envelope.
+- `ok` alone is sufficient to branch in client code.
+- No raw driver/infra strings appear in any `error.message`.
+- `tools/list` exposes author-supplied `description` / arg `description` / `returns_hint`.
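
The one-line mapping in the Migration section of this RFC can be sketched as a small adapter. This is an illustrative Python sketch, not part of the package: `to_v2` and its `kind_hint` parameter are hypothetical names, the `summary` strings are placeholders (v1 results carry no summary), and `retryable` is assumed `False` because v1 errors carry no retry hint.

```python
from typing import Any, Dict


def to_v2(v1: Dict[str, Any], kind_hint: str = "read") -> Dict[str, Any]:
    """Map one v1-shaped tool result onto the proposed v2 envelope (sketch)."""
    envelope: Dict[str, Any] = {
        "ok": True,
        "summary": "",
        "action": v1.get("action"),
        "kind": kind_hint,
        "data": None,
        "proposal": None,
        "error": None,
        "evidence": None,
        "source_database_changed": bool(v1.get("source_database_changed", False)),
    }
    if v1.get("evidence_bundle_id"):
        envelope["evidence"] = {"bundle_id": v1["evidence_bundle_id"]}

    if v1.get("ok") is False:
        # top-level code/error -> error.{code,message}
        envelope["ok"] = False
        envelope["error"] = {
            "code": v1.get("code", "INTERNAL"),
            "message": v1.get("error", ""),
            "retryable": False,  # assumption: v1 has no retry hint
        }
        envelope["summary"] = envelope["error"]["message"]
    elif v1.get("status") == "review_required":
        # status:"review_required" -> ok:true, proposal.state:"review_required"
        envelope["kind"] = "proposal"
        envelope["proposal"] = {
            "id": v1.get("proposal_id"),
            "state": "review_required",
            "diff": v1.get("diff"),
        }
        envelope["summary"] = "Proposal created; review required."  # placeholder
    elif v1.get("status") == "ok":
        # status:"ok" -> ok:true with the row under data
        envelope["kind"] = "read"
        envelope["data"] = v1.get("data")
        envelope["summary"] = "Read succeeded."  # placeholder
    else:
        raise ValueError("unrecognized v1 result shape")
    return envelope
```

With an adapter like this, client code branches on `ok` alone, and exactly one of `data` / `proposal` is non-null on success, as the rules in the RFC require. A real shim would also need to rewrite raw error text into safe messages, which this sketch does not attempt.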

package/docs/rfcs/002-app-owned-handler-helper.md ADDED

@@ -0,0 +1,161 @@
+# Proposal: An app-owned writeback handler helper (safe-by-default executors)
+Status: draft for synapsor-runner (OSS)
+Goal: make `http_handler` / `command_handler` executors **safe by default**. Today a
+handler author must re-implement tenant scoping, the stale-row (`expected_version`)
+guard, and idempotency by hand — "match the example." That's a security-critical loop
+to leave to copy-paste. Ship a tiny helper that enforces the guards and hands the
+developer only the business write.
+## The problem, concretely
+I wrote a working credit handler for the lab. To be correct it had to, in order:
+re-auth the bearer token, reject the wrong `action`, extract tenant/object/version
+from a loosely-typed `change_set` (with 3 fallback paths each, copied from the
+example), check idempotency, `SELECT … FOR UPDATE`, compare a normalized
+`expected_version`, INSERT + UPDATE, and format a receipt with the right status
+vocabulary. **Every one of those is a place to introduce a vulnerability** (skip the
+version check → lost-update; skip tenant → cross-tenant write; trust body tenant
+without signature → spoofing). Most integrators will get at least one wrong.
+## The contract (formalize what's currently implicit)
+Request the runner POSTs to an `http_handler` (make this a published, versioned schema):
+```jsonc
+{
+  "protocol_version": "1.0",
+  "proposal_id": "wrp_…",
+  "idempotency_key": "wrp_…:INV-3001",
+  "issued_at": "2026-06-28T…Z",
+  "signature": "sha256=…",            // NEW: HMAC over the raw body (see Security)
+  "change_set": {
+    "action": "support.propose_plan_credit",
+    "scope":   { "tenant_id": "tenant_acme", "object_id": "INV-3001" },
+    "principal": { "id": "human-reviewer" },
+    "target":  { "schema": "public", "table": "invoices", "primary_key": { "column": "id", "value": "INV-3001" } },
+    "patch":   { "credit_requested_cents": 1500, "credit_reason": "outage credit" },
+    "guards":  { "tenant": { "column": "tenant_id", "value": "tenant_acme" },
+                 "expected_version": { "column": "updated_at", "value": "2026-05-16T00:00:00Z" } }
+  }
+}
+```
+Receipt the handler must return (today's status vocabulary, kept):
+```jsonc
+{ "status": "applied" | "already_applied" | "conflict" | "failed",
+  "rows_affected": 2,
+  "source_database_mutated": true,
+  "previous_version": "2026-05-16T00:00:00Z",
+  "new_version": "2026-06-28T…Z",
+  "safe_error_code": "ROW_CHANGED_AFTER_PROPOSAL",   // on conflict/failed
+  "details": { "effects": [ { "type": "db.insert", "table": "credits", "id": "CR-…" } ] } }
+```
+## Helper API (TypeScript — first-party, since the runner is TS/Node)
+```ts
+import { createWritebackHandler } from "../packages/handler/src/index.js";
+export const handler = createWritebackHandler({
+  // 1. Authenticity: helper verifies bearer AND the HMAC signature for you.
+  tokenEnv: "SYNAPSOR_APP_HANDLER_TOKEN",
+  signingSecretEnv: "SYNAPSOR_APP_HANDLER_SIGNING_SECRET",   // optional but recommended
+  // 2. Bind one apply() per capability. The helper has ALREADY:
+  //    - verified auth + signature + protocol_version
+  //    - matched the action
+  //    - parsed scope/target/patch/guards into a typed `job`
+  //    - opened a transaction, taken `SELECT … FOR UPDATE` on the target row,
+  //      enforced tenant match + expected_version (stale-row), and short-circuited
+  //      idempotency via the receipts table.
+  //  You only write business effects with the provided tx; throw to roll back.
+  capabilities: {
+    "support.propose_plan_credit": async (job, tx) => {
+      const creditId = `CR-${job.proposalId.slice(-12)}`;
+      await tx.insert("credits", {
+        id: creditId, tenant_id: job.tenantId, invoice_id: job.objectId,
+        customer_id: job.row.customer_id, amount_cents: job.patch.credit_requested_cents,
+        reason: job.patch.credit_reason, created_by: job.principal,
+      });
+      await tx.update("invoices", job.objectId, {
+        credited_cents: job.row.credited_cents + job.patch.credit_requested_cents,
+      });
+      return { effects: [{ type: "db.insert", table: "credits", id: creditId }] };
+    },
+  },
+  // 3. DB binding (helper owns the tx + FOR UPDATE + version compare + receipt write).
+  source: { engine: "postgres", writeUrlEnv: "SYNAPSOR_APP_WRITE_URL" },
+});
+// handler is a (req,res) you mount at POST /synapsor/writeback, or an express/fastify route.
+```
+The helper turns conflict/idempotency/auth into framework concerns. The author writes
+**only** the INSERT/UPDATE and returns `effects`; status/`rows_affected`/version
+bookkeeping/receipt shape are produced by the helper.
+## Helper API (Python reference — handlers are often the app, not Node)
+```python
+from synapsor_handler import writeback_handler, Job, Tx   # pip install synapsor-handler
+@writeback_handler(
+    token_env="SYNAPSOR_APP_HANDLER_TOKEN",
+    signing_secret_env="SYNAPSOR_APP_HANDLER_SIGNING_SECRET",
+    write_url_env="SYNAPSOR_APP_WRITE_URL",
+)
+def support_propose_plan_credit(job: Job, tx: Tx):
+    credit_id = f"CR-{job.proposal_id[-12:]}"
+    tx.insert("credits", id=credit_id, tenant_id=job.tenant_id, invoice_id=job.object_id,
+              customer_id=job.row["customer_id"], amount_cents=job.patch["credit_requested_cents"],
+              reason=job.patch["credit_reason"], created_by=job.principal)
+    tx.update("invoices", job.object_id,
+              credited_cents=job.row["credited_cents"] + job.patch["credit_requested_cents"])
+    return {"effects": [{"type": "db.insert", "table": "credits", "id": credit_id}]}
+# Mount as a FastAPI/Flask route:  app.post("/synapsor/writeback")(support_propose_plan_credit.asgi)
+```
+`Job` is fully typed/validated: `proposal_id`, `idempotency_key`, `tenant_id`,
+`object_id`, `principal`, `patch`, and `row` (the locked current row). `Tx` only
+exposes scoped `insert`/`update`/`query` against the configured write URL.
+## What the helper guarantees (so the author can't forget)
+1. **Authenticity** — bearer + HMAC signature over the raw body. Without signing,
+   a handler trusts body-supplied `tenant_id`; with it, spoofing a writeback requires
+   the secret, not just network reach.
+2. **Tenant scope** — the locked-row `SELECT` always includes `tenant_id = scope.tenant_id`.
+3. **Stale-row guard** — `expected_version` compared at second precision (matching the
+   runner's own `versionValuesMatch`); mismatch → `conflict`, auto-rollback.
+4. **Idempotency** — a receipts/dedup row keyed by `idempotency_key`; replay → `already_applied`, no double write.
+5. **Atomicity** — author effects + receipt commit in one tx; any throw rolls back and returns a safe `failed`.
+6. **Safe receipts** — never leaks raw driver errors; maps exceptions to `safe_error_code`.
+## Security notes
+- **Sign requests.** Add `signature = HMAC_SHA256(signing_secret, raw_body)` and a
+  short `issued_at` skew window. Document it as recommended for any handler not on loopback.
+- The handler's DB credential should still be least-privilege (in the lab: `synapsor_app`
+  = SELECT/UPDATE invoices + SELECT/INSERT credits, nothing else). The helper doesn't
+  replace DB perms; it complements them.
+- Receipts table/dedup store: the helper should create-or-require it and, on permission
+  error, print the exact `GRANT`/DDL (same gap as the direct-writeback receipts table).
+## Why this matters for adoption
+App-owned executors are the answer to "rich writes" (INSERT/multi-row/events) — the
+thing the runner deliberately won't do itself. But that answer is only safe if the
+handler is safe, and right now safety is the integrator's homework. A first-party
+helper makes the secure path the easy path, which is exactly the framing the whole
+product is built on: don't hand people a footgun, hand them a reviewed capability.
+## Acceptance
+- `createWritebackHandler` (TS) + `synapsor_handler` (Python) enforce auth, signature,
+  tenant, version, idempotency, atomicity with no author code.
+- A handler written with the helper passes the same conflict/idempotency/tenant tests
+  the runner ships for direct writeback.
+- Request/receipt schemas are published and versioned (`protocol_version`).
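
The signing rule in the Security notes (`HMAC_SHA256(signing_secret, raw_body)` plus a short `issued_at` skew window) can be sketched with the Python standard library. This is a hedged illustration, not the helper's implementation: `sign` and `verify` are hypothetical names, the five-minute window is an assumed default, and the sketch assumes the signature travels outside the signed bytes (for example in a header), since a signature cannot cover a body that already contains it.

```python
import hashlib
import hmac
from datetime import datetime, timedelta, timezone
from typing import Optional


def sign(secret: bytes, raw_body: bytes) -> str:
    """Return the signature in the "sha256=<hex>" form shown in the RFC."""
    return "sha256=" + hmac.new(secret, raw_body, hashlib.sha256).hexdigest()


def verify(
    secret: bytes,
    raw_body: bytes,
    signature: str,
    issued_at: str,
    now: Optional[datetime] = None,
    max_skew: timedelta = timedelta(minutes=5),  # assumed window
) -> bool:
    """Check the HMAC in constant time, then check the issued_at skew."""
    expected = sign(secret, raw_body).encode("utf-8")
    if not hmac.compare_digest(expected, signature.encode("utf-8")):
        return False
    try:
        # Older Python versions do not parse a trailing "Z" in fromisoformat.
        issued = datetime.fromisoformat(issued_at.replace("Z", "+00:00"))
    except ValueError:
        return False
    if issued.tzinfo is None:
        return False  # refuse timestamps without an explicit offset
    current = now or datetime.now(timezone.utc)
    return abs(current - issued) <= max_skew
```

Verifying against the raw bytes, before any JSON parsing, matters here: re-serializing the parsed body can change whitespace or key order and invalidate an otherwise valid signature.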

package/docs/rfcs/003-integrator-feedback-teardown.md ADDED

@@ -0,0 +1,97 @@
+Here's the detailed teardown of the OSS synapsor-runner specifically, split into (A) what trips up a serious dev, and (B) how an LLM actually experiences it. Everything below is from things I hit building and live-testing the lab, not theory. I'll keep Cloud-domain stuff out of it (RBAC, central ledger, etc.; those belong in hosted).
+A. Limitations for a serious dev/user
+1. Config authoring is the real onboarding cost.
+synapsor.runner.json is powerful but you mostly hand-write it. init --wizard exists, but multi-capability, executor-backed configs are manual, and there's no published JSON schema or one-page field reference — I learned the shape by running init and reading dist/runner.mjs. A serious dev wants: a versioned JSON Schema (so editors autocomplete + validate), and a "capability authoring" doc covering read vs proposal, patch (fixed/from_arg), allowed_columns, numeric_bounds, conflict_guard, executor.
+2. Env-var sprawl + historical credential ambiguity.
+There are a lot: SYNAPSOR_DATABASE_READ_URL, _WRITE_URL, legacy SYNAPSOR_DATABASE_URL, TENANT_ID, PRINCIPAL, RUNNER_HTTP_TOKEN, plus per-executor *_HANDLER_URL/_TOKEN. The apply-uses-SYNAPSOR_DATABASE_URL-vs-write_url_env thing cost me real debugging on alpha.6 (now documented). doctor is good but only checks presence; it doesn't prove the write path (attempt a rolled-back probe write with the writer cred) or handler reachability.
+3. Store ↔ server lifecycle is a footgun.
+The SQLite store is shared state, and the running server holds it open. Deleting/resetting the store under a live server gives corrupt/confusing behavior — I hit "database access issue" surfaced to the agent because I reset the store without restarting the server. There's no lock, warning, or coordinated reset. Also: running serve-http + serve-streamable-http on one store simultaneously is asking for contention, with no guardrail.
+4. The receipt-table permission gotcha.
+Direct writeback does CREATE TABLE IF NOT EXISTS synapsor_writeback_receipts, which a least-privilege writer can't do (PG15+ no CREATE on public). Now documented, but doctor could detect it and print the exact GRANT/DDL (or a flag to pre-create). I had to invent the dedicated-schema trick myself.
+5. App-owned handlers re-implement security by hand.
+The executor contract is "match the example": you must re-check tenant, parse change_set, enforce the expected_version stale-row guard, and do idempotency yourself. It's easy to write an insecure handler (skip the version check, forget tenant scope). There's no published handler SDK/helper and no request signing — the only auth is a bearer token, so the handler trusts the POST body's tenant/version. A first-party handler helper (verify + parse + enforce guards, optional HMAC signature) would remove a whole class of mistakes.
+6. Versioning discipline.
+@alpha is a moving tag and behavior changed meaningfully across alpha.6→11 (transport, arg types string→number, credential resolution). I had to pin and bump six times. A stable channel + changelog + semver promise is table stakes before serious devs build on it.
+7. Two serve modes are easy to confuse.
+serve-http (lightweight JSON-RPC, not real MCP) vs serve-streamable-http (spec MCP) on different ports, plus --alias-mode. I pointed the SDK at the wrong one. Consider: make serve-streamable-http the headline, rename serve-http to something like serve-bridge/--legacy-jsonrpc, and have mcp client-config always pair the client with the matching server command (it does for openai-agents — good).
+8. Observability is CLI-only.
+replay/activity are genuinely nice, but there's no structured event stream/webhook for proposal.created/approved/applied. Even a local webhook would let people build a review UI or Slack-notify a reviewer without polling. (The full ledger is Cloud's job; a local event hook isn't.)
+B. How easy is it for an LLM to use/understand?
+This is where you have the most leverage, and I have direct evidence.
+1. Tool descriptions are the single biggest reliability lever — and they're currently generic.
+On the JSON-RPC path my agent was reliable because my function-tool docstrings were rich ("Read an outage event — window, affected plan, credit policy; use this to decide if a waiver is justified"). On the native streamable path the model stalled and refused to propose, and a big reason was the auto-generated description: "Read public.outage_events through a reviewed Synapsor capability with trusted tenant context and evidence." That tells the model the plumbing, not what the tool is for or what it returns. The model didn't realize the outage tool gives it the policy it needed.
+→ Let capability config carry a model-facing description + per-arg descriptions + an optional "returns/when to use" hint, and surface them in tools/list. This alone would have made my streamable runs reliable. Right now authors can't easily improve what the model sees.
+2. Inconsistent result envelopes hurt the model (and the code).
+Success = {status:"ok", data:{...}, evidence_bundle_id, trusted_context, ...}. Not-found = {ok:false, code, error}. Two different shapes for the same tool means the model (and my client) must branch on multiple keys. → One envelope always: {ok: true/false, data?, error?, summary}, where summary is a one-line natural-language result the model can echo to the user. The proposal result (proposal_id, diff: {before, proposed}, source_database_changed:false) is the best-designed part — model-legible and unambiguous; mirror that everywhere.
+3. Leaky/raw errors confuse the model.
+A failed read surfaced connect ECONNREFUSED 127.0.0.1:5433 straight into the agent, which then told the user "database access issue." Raw infra errors are both a small info leak and bad for LLM behavior. → Safe, terse, actionable tool errors ("temporarily unavailable, retry later" / "not found in your tenant"), with details only in the local ledger.
+4. Dotted names vs aliases.
+alias-mode openai correctly makes names valid (billing__inspect_invoice), but the alias diverges from any example/instruction that uses the canonical dotted name — so prompts that say "call propose_late_fee_waiver" don't match the tool the model sees. I had to make my agent instructions tool-name-agnostic. → Keep openai-safe aliases as the default for the openai-agents config (you do), and put the canonical name + purpose in the description so the model can still reason about it.
+5. Evidence handles are slightly confusing in-loop.
+The model gets evidence_bundle_id but can't really do anything with it during the turn (it's for replay). Without a hint, a model may try to "use" it. A one-line "this is an audit handle; you don't need to act on it" in the description avoids wasted reasoning.
+6. No recommended agent-instruction snippet.
+The biggest gap for first-time LLM integrators: the propose-first pattern needs the model to inspect all evidence before proposing and follow referenced IDs. I had to write that system prompt myself (and rewrite it once for the streamable path). → mcp client-config could optionally emit a recommended instruction block ("gather evidence first; follow referenced ids like OUT-…; propose only within policy; you cannot write or approve"). That would make day-1 agent behavior reliable instead of something each integrator rediscovers.
+Net on LLM-friendliness: the conceptual model is excellent for LLMs — propose-only tools are inherently safe and easy for a model to call. The surface (generic descriptions, mixed envelopes, raw errors, alias/name divergence) is where reliability leaks. These are all fixable in the OSS runner and would noticeably raise the floor on agent reliability.
+C. If I were prioritizing your OSS updates
+P0 (adoption + LLM reliability):
+1. Author-supplied rich tool/arg descriptions in config, surfaced in tools/list.
+2. One consistent result envelope + safe terse errors + a summary field.
+3. Published config JSON Schema + a capability/executor authoring page.
+P1 (trust + safety):
+4. Stable release channel + changelog/semver.
+5. Handler helper/SDK (parse + enforce tenant/version/idempotency, optional HMAC) so app-owned writeback is safe-by-default.
+6. doctor that probes the real write path + handler and emits exact GRANT/DDL for receipts; store/server lifecycle guardrails (warn on store-removed-under-server).
+P2 (polish):
+7. Optional recommended agent-instruction snippet from mcp client-config.
+8. Local lifecycle webhook/event stream for proposals.
+---------------
+01-result-envelope-spec.md
+One envelope for every tool result, so ok is the only thing client code (and the model) must branch on:
+jsonc
+{ "ok": true, "summary": "<one-line NL the model reads/echoes>", "kind": "read|proposal",
+  "data": {…}|null, "proposal": {…}|null, "error": {"code","message","retryable"}|null,
+  "evidence": {…}|null, "source_database_changed": false, "_meta": {tenant, principal, canonical_capability} }
+Key decisions, all grounded in what broke in my live runs:
+- Kills the status vs ok / mixed-shape problem (today success uses status:"ok", errors use ok:false — two branches).
+- Mandatory summary field — the LLM reads it first; fixed my "model parroted infra error" issue.
+- Safe, stable error.code enum (9 codes, table included) — never raw ECONNREFUSED-style strings, which both leak and degrade agent behavior.
+- Author-supplied description / per-arg description / returns_hint surfaced in tools/list — this is the single biggest LLM lever; the streamable stall I hit was the outage tool's generic description not telling the model it returns the policy.
+- Migration path: result_format: 2 flag + dual-emit + old→new mapping table.
+02-handler-helper-interface.md
+A first-party helper so app-owned executors are safe by default instead of "match the example."
+- Formalizes (and versions) the request/receipt contract that's currently implicit.
+- createWritebackHandler (TS) + synapsor_handler (Python — since real handlers are app code) where the author writes only the INSERT/UPDATE and returns effects; the helper enforces auth + HMAC signature, tenant scope, expected_version stale-row guard, idempotency, atomicity, safe receipts for them.
+- Calls out request signing (today a handler trusts body-supplied tenant_id behind only a bearer token) and the receipts-table GRANT/DDL gap.
+- Rationale tied to your own thesis: rich writes are the executor's job, but that's only safe if the handler is safe — so make the secure path the easy path.
+Both are sized to drop into the repo as RFCs. If you want, next I can: turn the error-code enum into a concrete TypeScript type + a v1→v2 adapter shim, or sketch the Job/Tx type definitions for the handler helper so they're ready to implement.
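
The teardown closes by offering to turn the error-code enum into a concrete type. A rough Python equivalent is sketched below, using the nine codes and retryability values from the RFC 001 table. The names `ErrorCode`, `RETRYABLE`, and `next_step` are hypothetical; the "maybe" for `INTERNAL` is represented as `None`, and treating unknown strings as `INTERNAL` is an assumption of this sketch.

```python
from enum import Enum
from typing import Dict, Optional


class ErrorCode(str, Enum):
    NOT_FOUND_IN_TENANT = "NOT_FOUND_IN_TENANT"
    INVALID_ARGUMENT = "INVALID_ARGUMENT"
    POLICY_VIOLATION = "POLICY_VIOLATION"
    CAPABILITY_NOT_FOUND = "CAPABILITY_NOT_FOUND"
    VERSION_CONFLICT = "VERSION_CONFLICT"
    MULTI_ROW_BLOCKED = "MULTI_ROW_BLOCKED"
    APPROVAL_REQUIRED = "APPROVAL_REQUIRED"
    TEMPORARILY_UNAVAILABLE = "TEMPORARILY_UNAVAILABLE"
    INTERNAL = "INTERNAL"


# Retryability from the RFC table; None stands for "maybe".
RETRYABLE: Dict[ErrorCode, Optional[bool]] = {
    ErrorCode.NOT_FOUND_IN_TENANT: False,
    ErrorCode.INVALID_ARGUMENT: False,
    ErrorCode.POLICY_VIOLATION: False,
    ErrorCode.CAPABILITY_NOT_FOUND: False,
    ErrorCode.VERSION_CONFLICT: False,  # re-inspect first, then re-propose
    ErrorCode.MULTI_ROW_BLOCKED: False,
    ErrorCode.APPROVAL_REQUIRED: False,
    ErrorCode.TEMPORARILY_UNAVAILABLE: True,
    ErrorCode.INTERNAL: None,
}


def next_step(code: str) -> str:
    """Pick a client-side action for an error code: retry, reinspect, or stop."""
    try:
        parsed = ErrorCode(code)
    except ValueError:
        parsed = ErrorCode.INTERNAL  # assumption: unknown strings are INTERNAL
    if parsed is ErrorCode.VERSION_CONFLICT:
        return "reinspect"
    if RETRYABLE[parsed]:
        return "retry"
    return "stop"
```

Keeping the table as data rather than scattered `if` branches makes it easy to tell the model the same rules the client enforces.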

package/docs/store-lifecycle.md ADDED

@@ -0,0 +1,83 @@
+# Store Lifecycle
+Synapsor Runner keeps local evidence, query audit, proposals, receipts, replay,
+and lifecycle events in a SQLite store.
+Default path:
+```bash
+./.synapsor/local.db
+```
+## Server leases
+MCP server modes write a small lease file next to the store:
+```text
+<store>.lease.json
+```
+The lease records the server pid, mode, transport, and start time. Destructive
+store operations refuse to run while that lease points at a live process.
+Use `--force` only after you have stopped the server or verified the lease is
+stale.
+## Prune safely
+Preview first:
+```bash
+synapsor-runner store prune --store ./.synapsor/local.db --older-than 30d --dry-run
+```
+Apply after review:
+```bash
+synapsor-runner store prune --store ./.synapsor/local.db --older-than 30d --yes
+```
+Override an active/stale lease:
+```bash
+synapsor-runner store prune --store ./.synapsor/local.db --older-than 30d --yes --force
+```
+## Reset the local ledger
+Reset deletes only the local SQLite ledger files:
+```bash
+synapsor-runner store reset --store ./.synapsor/local.db --yes
+```
+It removes:
+```text
+local.db
+local.db-wal
+local.db-shm
+local.db.lease.json
+```
+It never touches your source Postgres/MySQL database. Like prune, reset refuses
+while an active server lease exists unless you pass `--force` after verifying
+the server is stopped or the lease is stale.
+## Deleted store under a running server
+If the store file disappears while a server is still running, model-facing tool
+calls fail safely with `TEMPORARILY_UNAVAILABLE`. Runner does not expose raw
+SQLite paths, corruption text, or filesystem errors to the model.
+Fix:
+1. Stop the running MCP server.
+2. Recreate the store by rerunning the demo/setup or restore the previous store.
+3. Restart the MCP server.
+## Concurrent server modes
+Running multiple server transports against the same SQLite store can cause
+contention and confusing local state. Runner refuses concurrent server leases by
+default. Use `--allow-concurrent-store` only for controlled local debugging.
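
The lease behaviour described in this file (a `<store>.lease.json` that records the server pid, with destructive operations refusing while that pid is live) can be sketched as follows. This is an illustration under assumptions, not the Runner's code: the JSON field name `pid` is a guess, `active_lease` and `pid_alive` are hypothetical names, and the liveness probe uses `os.kill(pid, 0)`, which is only safe on POSIX systems.

```python
import json
import os
from pathlib import Path
from typing import Any, Dict, Optional


def lease_path(store: str) -> Path:
    """The lease sits next to the store: local.db -> local.db.lease.json."""
    return Path(store + ".lease.json")


def pid_alive(pid: int) -> bool:
    """POSIX-only probe: signal 0 checks existence without signalling."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def active_lease(store: str) -> Optional[Dict[str, Any]]:
    """Return the lease if it points at a live process, else None (stale/absent)."""
    try:
        lease = json.loads(lease_path(store).read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        return None
    if not isinstance(lease, dict):
        return None
    pid = lease.get("pid")  # assumption: the pid is stored under "pid"
    if isinstance(pid, int) and pid_alive(pid):
        return lease
    return None
```

A prune or reset command built on this would refuse when `active_lease(store)` returns a lease and proceed when it returns `None`, which mirrors the documented meaning of a stale lease.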

package/docs/troubleshooting-first-run.md CHANGED

@@ -3,13 +3,13 @@
 Run the friendly doctor first:
 ```bash
-npx -y -p @synapsor/runner@alpha synapsor-runner doctor --first-run
+npx -y -p @synapsor/runner synapsor-runner doctor --first-run
 ```
 Use JSON for automation:
 ```bash
-npx -y -p @synapsor/runner@alpha synapsor-runner doctor --first-run --json
+npx -y -p @synapsor/runner synapsor-runner doctor --first-run --json
 ```
 ## Docker Missing
@@ -128,13 +128,13 @@ Own-database MCP setup needs a reviewed config before serving tools.
 Fix:
 ```bash
-npx -y -p @synapsor/runner@alpha synapsor-runner init --from-env DATABASE_URL --mode review --wizard
+npx -y -p @synapsor/runner synapsor-runner init --from-env DATABASE_URL --mode review --wizard
 ```
 Or pass an example config:
 ```bash
-npx -y -p @synapsor/runner@alpha synapsor-runner tools preview --config ./examples/mcp-postgres-billing/synapsor.runner.json --store ./.synapsor/local.db
+npx -y -p @synapsor/runner synapsor-runner tools preview --config ./examples/mcp-postgres-billing/synapsor.runner.json --store ./.synapsor/local.db
 ```
 ## SQLite Store Missing
@@ -180,7 +180,7 @@ Fix:
 ```bash
 export SYNAPSOR_DATABASE_READ_URL="<read-only-url>"
-npx -y -p @synapsor/runner@alpha synapsor-runner doctor --config synapsor.runner.json
+npx -y -p @synapsor/runner synapsor-runner doctor --config synapsor.runner.json
 ```
 ## Read/Write Credential Split Failed
@@ -216,7 +216,7 @@ Fix:
 Regenerate the snippet:
 ```bash
-npx -y -p @synapsor/runner@alpha synapsor-runner mcp config claude-desktop \
+npx -y -p @synapsor/runner synapsor-runner mcp config claude-desktop \
   --absolute-paths \
   --config ./synapsor.runner.json \
   --store ./.synapsor/local.db

package/docs/use-your-own-database.md ADDED

@@ -0,0 +1,18 @@
+# Use Your Own Database
+The canonical guide is [Connect Your Own Database](getting-started-own-database.md).
+Use it when you want to point Synapsor Runner at a staging Postgres/MySQL
+database, inspect schemas/tables, generate one reviewed context/capability, and
+serve semantic MCP tools without exposing raw SQL or write credentials to the
+model.
+Short path:
+```bash
+export DATABASE_URL="postgresql://readonly_user:password@host:5432/app?sslmode=require"
+npx -y -p @synapsor/runner synapsor-runner start --from-env DATABASE_URL --schema public
+```
+Runner stores environment-variable names in `synapsor.runner.json`, not database
+URLs. Keep credentials in your shell, process manager, or secret manager.