RubyGems - parse-stack-next - Versions diffs - 5.1.1 → 5.2.1 - Mend

parse-stack-next 5.1.1 → 5.2.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (60)

checksums.yaml +4 -4
data/.env.sample +12 -0
data/.env.test +4 -4
data/CHANGELOG.md +630 -0
data/Gemfile +3 -0
data/Gemfile.lock +6 -1
data/README.md +226 -39
data/Rakefile +56 -10
data/docs/atlas_vector_search_guide.md +110 -9
data/docs/mcp_guide.md +504 -0
data/docs/mongodb_direct_guide.md +66 -1
data/docs/mongodb_index_optimization_guide.md +22 -1
data/docs/usage_guide.md +15 -0
data/lib/parse/agent/approval_gate.rb +0 -0
data/lib/parse/agent/constraint_translator.rb +90 -19
data/lib/parse/agent/describe.rb +1 -0
data/lib/parse/agent/errors.rb +16 -0
data/lib/parse/agent/mcp_client.rb +9 -0
data/lib/parse/agent/mcp_dispatcher.rb +139 -7
data/lib/parse/agent/mcp_rack_app.rb +621 -17
data/lib/parse/agent/mcp_subscriptions.rb +607 -0
data/lib/parse/agent/metadata_dsl.rb +58 -0
data/lib/parse/agent/metadata_registry.rb +141 -1
data/lib/parse/agent/prompt_hardening.rb +213 -0
data/lib/parse/agent/result_formatter.rb +18 -3
data/lib/parse/agent/tools.rb +167 -24
data/lib/parse/agent.rb +692 -21
data/lib/parse/client/request.rb +55 -4
data/lib/parse/client/response.rb +4 -0
data/lib/parse/client.rb +205 -7
data/lib/parse/model/classes/installation.rb +27 -10
data/lib/parse/model/classes/user.rb +8 -0
data/lib/parse/model/core/actions.rb +65 -13
data/lib/parse/model/core/embed_managed.rb +19 -14
data/lib/parse/model/core/indexing.rb +108 -16
data/lib/parse/model/core/querying.rb +29 -0
data/lib/parse/model/model.rb +34 -3
data/lib/parse/model/object.rb +42 -0
data/lib/parse/query.rb +90 -24
data/lib/parse/retrieval/agent_tool.rb +369 -0
data/lib/parse/retrieval/chunk.rb +74 -0
data/lib/parse/retrieval/chunker.rb +208 -0
data/lib/parse/retrieval/retriever.rb +274 -0
data/lib/parse/retrieval.rb +10 -0
data/lib/parse/schema.rb +69 -20
data/lib/parse/stack/version.rb +2 -2
data/lib/parse/webhooks/payload.rb +62 -34
data/lib/parse/webhooks.rb +15 -3
data/parse-stack-next.gemspec +1 -1
data/scripts/docker/docker-compose.atlas.yml +14 -10
data/scripts/docker/docker-compose.test.yml +24 -20
data/scripts/docker/mongo-init.js +3 -3
data/scripts/start-parse.sh +10 -0
data/scripts/start_mcp_server.rb +1 -1
data/scripts/test_server_connection.rb +1 -1
data/scripts/vector_prototype/create_vector_index.js +1 -1
data/scripts/vector_prototype/fetch_embeddings.py +2 -2
data/scripts/vector_prototype/query_prototype.rb +1 -1
data/scripts/vector_prototype/run.sh +4 -4
metadata +10 -2

data/docs/atlas_vector_search_guide.md CHANGED

@@ -351,23 +351,124 @@ Mechanics:
 ### Single vector per record
-`embed` produces exactly one vector per record. There is no built-in
-chunker. Long source text whose concatenation exceeds the provider's
-per-call token budget will be truncated provider-side, and the
-resulting vector will represent only the leading portion of the
-document.
+`embed` produces exactly one vector per record. Long source text whose
+concatenation exceeds the provider's per-call token budget is truncated
+provider-side, and the stored vector represents only the leading portion
+of the document. **Chunking happens at retrieval time, not embed time**
+(see [Retrieval (RAG)](#retrieval-rag) below): the embedding stays
+one-vector-per-record by design.
-For long-form content, two options:
+If you instead want each passage to have its OWN embedding (true
+embed-time chunking), use one of these patterns:
 1. **Pre-chunk client-side** and write each chunk as its own
    `Parse::Object` record with its own `embed` declaration.
-2. **Dedicated `Chunk` subclass** that `belongs_to` the parent, with
+2. **Dedicated chunk subclass** that `belongs_to` the parent, with
    `embed :content, into: :embedding` on the chunk class itself. Run
    similarity search against the chunk collection, then hydrate
    parents as needed.
-A built-in chunker plus a `semantic_search` agent tool are scheduled
-for a future release.
+---
+## Retrieval (RAG)
+`Parse::Retrieval` (`Parse::RAG` is an alias) sits on top of
+`find_similar`. `Parse::Retrieval.retrieve` embeds a natural-language
+query, runs Atlas `$vectorSearch` through `find_similar` (so ACL/CLP are
+enforced mongo-direct — there is no REST two-stage re-query), and splits
+each retrieved document's text field into scored, citable chunks.
+Chunking here is **presentation-only**: every chunk inherits its parent
+document's single `$vectorSearch` score.
+```ruby
+chunks = Parse::Retrieval.retrieve(
+  query:         "how do I reset my password?",
+  klass:         KnowledgeArticle,   # or "KnowledgeArticle"
+  field:         :embedding,         # optional; auto-resolves a single :vector field
+  k:             5,
+  filter:        { published: true }, # post-$vectorSearch $match
+  vector_filter: nil,                 # Atlas-native pre-filter (fields must be type:"filter")
+  tenant_scope:  nil,                 # { field:, value: } merged into vector_filter
+  score_quantize: false,
+  session_token: user.session_token,  # ACL scope kwargs pass through to find_similar
+)
+# => Array<Parse::Retrieval::Chunk> — { id, score, content, source, metadata }
+```
+`rerank:` and `hybrid:` are reserved on the signature and raise
+`NotImplementedError` if supplied.
+### Chunkers
+The default is a fixed-size sliding window with overlap. Subclass
+`Parse::Retrieval::Chunker::Base` (implement `#chunk(text) -> Array<String>`)
+for semantic / sentence-aware strategies.
+```ruby
+Parse::Retrieval::Chunker::FixedSizeOverlap.new(
+  size: 800,                    # window width
+  overlap: 100,                 # units shared between consecutive windows (must be < size)
+  by: :chars,                   # :chars (default) or :tokens (whitespace tokens)
+  max_chunks_per_document: 200, # amplification cap — TRUNCATES with a signal, never raises
+)
+```
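To make the window arithmetic concrete, here is a small standalone sketch of a fixed-size sliding window. It is not the gem's `FixedSizeOverlap`. Each window starts `size - overlap` units after the previous one, and a per-document cap stops early and sets a flag instead of raising. The second helper copies one parent score onto every chunk, which is what "presentation-only" chunking implies. `SlidingWindowSketch`, `chunks_for`, the `"<objectId>#<index>"` id format, and the `"body"` text key are all hypothetical.

```ruby
# Hypothetical sketch, not Parse::Retrieval::Chunker::FixedSizeOverlap.
class SlidingWindowSketch
  attr_reader :truncated

  def initialize(size:, overlap:, by: :chars, max_chunks_per_document: 200)
    raise ArgumentError, "overlap must be < size" unless overlap < size
    @size = size
    @step = size - overlap # consecutive windows share `overlap` units
    @by = by
    @max = max_chunks_per_document
    @truncated = false
  end

  # Same shape as the Chunker::Base contract: String in, Array<String> out.
  # In :tokens mode, whitespace is normalized to single spaces on rejoin.
  def chunk(text)
    @truncated = false
    units = @by == :tokens ? text.split : text.chars
    joiner = @by == :tokens ? " " : ""
    windows = []
    start = 0
    while start < units.size
      if windows.size >= @max
        @truncated = true # cap reached: truncate and signal, never raise
        break
      end
      windows << units[start, @size].join(joiner)
      break if start + @size >= units.size # this window reached the end
      start += @step
    end
    windows
  end
end

# Presentation-only chunking: every chunk reuses the parent's single score.
# The "<objectId>#<index>" id and the "body" key are assumptions.
def chunks_for(hit, chunker, text_field: "body")
  source = hit[:source]
  chunker.chunk(source[text_field].to_s).each_with_index.map do |content, i|
    { id: "#{source['objectId']}##{i}", score: hit[:score], content: content,
      source: source, metadata: { object_id: source["objectId"], chunk_index: i } }
  end
end
```

With `size: 4, overlap: 1`, the text `"abcdefghij"` yields `"abcd"`, `"defg"`, `"ghij"`. Every one of those chunks carries the same score as its document, so ranking between chunks of one document is not meaningful.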
+### `agent_searchable` + the `semantic_search` agent tool
+Opt a model in to agentic retrieval, declaring the vector field and the
+fields an agent may filter on:
+```ruby
+class KnowledgeArticle < Parse::Object
+  property :title, :string
+  property :body, :string
+  property :embedding, :vector, dimensions: 1536, provider: :openai
+  embed :title, :body, into: :embedding
+  agent_searchable field: :embedding, filter_fields: %i[published category]
+end
+```
+Every property referenced by `embed` must be declared — omitting
+`property :title` here raises `InvalidEmbedDeclaration` at class load.
+Because this model embeds **two** text sources (`:title` and `:body`),
+`semantic_search` cannot guess which one to chunk and return as the
+result `content`. Pass `text_field:` to choose (it must name one of the
+embedded sources); a single-source model infers it automatically and the
+parameter is optional:
+```ruby
+# via the agent tool (LLM-facing parameter)
+semantic_search(class_name: "KnowledgeArticle", query: "vector indexes",
+                text_field: "body")
+# or directly
+Parse::Retrieval.retrieve(query: "vector indexes", klass: KnowledgeArticle,
+                          text_field: :body)
+```
+The readonly, `client_safe` `semantic_search` tool then routes through
+`Parse::Retrieval.retrieve` with the full agent security envelope:
+- a searchable-class allowlist (`MetadataRegistry.resolve_searchable!`);
+- recursive underscore-key refusal plus a filter-field allowlist on caller
+  input;
+- tenant scope folded into the Atlas pre-filter AND re-asserted on every
+  returned record;
+- `field_allowlist` projection of each source;
+- score quantization in non-admin contexts.
+In a tenant-aware deployment (any class declares `agent_tenant_scope`), a
+searchable class without its own tenant scope is refused at dispatch. See the
+[MCP guide](./mcp_guide.md) for the agent-side wiring.
+**Result shape (token-economy).** The tool returns
+`{ chunks:, documents:, count: }`. Each chunk's parent record is hoisted
+**once** into `documents` (keyed by `objectId`) rather than duplicated on
+every chunk — map a chunk to its source via `metadata.object_id`. A
+`max_total_tokens:` budget (default 20,000; estimated chars/4) trims the
+lowest-ranked chunks so a few long documents can't silently blow the
+context window, adding `budget_truncated: true` / `budget_dropped: <n>`
+when it trims (pass `0` to disable). The library-level
+`Parse::Retrieval.retrieve` still returns the flat `Array<Chunk>` with
+`source` on each chunk — the dedup and budget live in the agent tool's
+envelope. See the [MCP guide's Token Economy section](./mcp_guide.md#token-economy).
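The dedup-then-budget behaviour can be modeled in plain Ruby. This is a hedged sketch, not the tool's code. It assumes that chunks arrive best-first and that tokens are estimated as `ceil(chars / 4)` over chunk `content` only. It drops chunks from the tail until the estimate fits, and hoists sources only for chunks that survive. `build_envelope` and `estimate_tokens` are hypothetical names.

```ruby
# Hypothetical sketch of the semantic_search envelope (not the gem's code).
# Assumptions: chunks are sorted best-first; the estimate is ceil(chars / 4)
# over chunk content only; `count` reports the chunks that survive.
def estimate_tokens(text)
  (text.to_s.length / 4.0).ceil
end

def build_envelope(chunks, max_total_tokens: 20_000)
  kept = chunks.dup
  dropped = 0
  if max_total_tokens.positive? # 0 disables the budget
    total = kept.sum { |c| estimate_tokens(c[:content]) }
    while total > max_total_tokens && kept.any?
      total -= estimate_tokens(kept.pop[:content]) # drop the lowest-ranked chunk
      dropped += 1
    end
  end

  # Hoist each parent record once, keyed by objectId; chunks keep only
  # metadata.object_id as the back-reference.
  documents = {}
  slim = kept.map do |c|
    documents[c[:source]["objectId"]] ||= c[:source]
    c.reject { |k, _| k == :source }
  end

  envelope = { chunks: slim, documents: documents, count: slim.size }
  envelope.merge!(budget_truncated: true, budget_dropped: dropped) if dropped.positive?
  envelope
end
```

With three 40-character chunks (about 10 tokens each) and a budget of 25, the tail chunk is dropped, and the envelope reports `budget_dropped: 1`.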
 ---

data/docs/mcp_guide.md CHANGED

@@ -360,6 +360,504 @@ Common uses for the direct dispatcher:
 ---
+## Connecting Claude Desktop (stdio bridge)
+Parse Stack speaks MCP over **HTTP** (the standalone server and the
+Rack adapter both expose a JSON-RPC-over-HTTP endpoint). Claude Desktop,
+however, launches MCP servers as local **stdio** subprocesses — it does
+not dial an HTTP URL directly. Bridge the two with
+[`mcp-remote`](https://www.npmjs.com/package/mcp-remote), a small stdio↔HTTP
+proxy that Claude Desktop runs as the subprocess and which forwards to your
+HTTP endpoint.
+1. Start the Parse Stack MCP endpoint over HTTP (standalone or Rack — see
+   Deployment Modes above) and note its URL and the bearer token your
+   `agent_factory` expects, e.g. `http://localhost:3001/` with
+   `Authorization: Bearer <token>`.
+2. Add the bridge to `claude_desktop_config.json` (macOS:
+   `~/Library/Application Support/Claude/claude_desktop_config.json`;
+   Windows: `%APPDATA%\Claude\claude_desktop_config.json`):
+   ```json
+   {
+     "mcpServers": {
+       "parse-stack": {
+         "command": "npx",
+         "args": [
+           "-y",
+           "mcp-remote",
+           "http://localhost:3001/",
+           "--header",
+           "Authorization: Bearer ${PARSE_MCP_TOKEN}"
+         ],
+         "env": {
+           "PARSE_MCP_TOKEN": "your-mcp-token"
+         }
+       }
+     }
+   }
+   ```
+3. Restart Claude Desktop. The Parse Stack tools (`query_class`,
+   `get_schema`, `semantic_search`, …) appear in the client.
+Notes:
+- `mcp-remote` requires Node.js on the machine running Claude Desktop.
+- For a public endpoint, terminate TLS in front of the HTTP server and use
+  an `https://` URL; the bearer token rides the `Authorization` header.
+- The same bridge works for any stdio-only MCP client (e.g. some IDE
+  integrations). Clients that support remote MCP connectors natively can
+  point at the HTTP URL without the bridge.
+- Approval workflows (elicitation) need the streaming/listening-stream
+  prerequisites described under Approval Workflows — confirm the bridge and
+  client forward the SSE channel before relying on human-in-the-loop gating.
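Before wiring up the bridge, you may want to confirm the URL and token by hand. The sketch below builds (but does not send) the JSON-RPC `initialize` POST an MCP client would send. The `Accept` header follows the MCP Streamable HTTP transport, and `protocolVersion` uses the 2025-06-18 revision this guide cites. The `clientInfo` values and the helper name are placeholders.

```ruby
require "json"
require "net/http"
require "uri"

# Hypothetical smoke-test helper: builds the initialize request only.
def build_initialize_request(url, token)
  req = Net::HTTP::Post.new(URI(url))
  req["Authorization"] = "Bearer #{token}"
  req["Content-Type"] = "application/json"
  # Streamable HTTP clients accept either a JSON body or an SSE stream.
  req["Accept"] = "application/json, text/event-stream"
  req.body = JSON.generate({
    jsonrpc: "2.0", id: 1, method: "initialize",
    params: {
      protocolVersion: "2025-06-18",
      capabilities: {},
      clientInfo: { name: "smoke-test", version: "0.0.1" } # placeholder values
    }
  })
  req
end

# Usage against a running endpoint (sends a real request):
# uri = URI("http://localhost:3001/")
# res = Net::HTTP.start(uri.host, uri.port) do |http|
#   http.request(build_initialize_request(uri.to_s, ENV["PARSE_MCP_TOKEN"]))
# end
# puts res.code, res["Mcp-Session-Id"]
```

A `401`/`403` here points at the token or the `agent_factory`, not at Claude Desktop or `mcp-remote`.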
+---
+## Resource Subscriptions (LiveQuery bridge)
+MCP lets a client `resources/subscribe` to a resource URI and then receive
+unsolicited `notifications/resources/updated` messages whenever the underlying
+data changes. Parse Stack bridges that surface onto Parse LiveQuery: a
+subscribed `parse://<Class>/count` or `parse://<Class>/samples` resource is
+backed by a LiveQuery subscription on `<Class>`, and any matching
+create/update/delete/enter/leave event is debounced into a single coarse
+update for that URI. The client re-reads the resource via `resources/read` to
+obtain the new value — row payloads are never streamed through the resource
+surface.
+This is opt-in and requires a streaming-capable Rack server (Puma, Falcon —
+WEBrick buffers responses and cannot hold the listening stream open) plus
+LiveQuery enabled and configured.
+```ruby
+# Boot: enable LiveQuery and point it at the server.
+Parse.setup(
+  server_url:     "https://your-parse-server.com/parse",
+  application_id: "your_app_id",
+  api_key:        "your_api_key",
+  live_query_url: "wss://your-parse-server.com",
+)
+Parse.live_query_enabled = true
+# Mount the Rack app with resource subscriptions enabled.
+app = Parse::Agent::MCPRackApp.new(resource_subscriptions: true) do |env|
+  token = env["HTTP_AUTHORIZATION"].to_s.delete_prefix("Bearer ")
+  MyAuth.agent_for_token!(token) # returns a Parse::Agent or raises Unauthorized
+end
+```
+When enabled and LiveQuery is available, the `initialize` handshake advertises
+`resources.subscribe: true`. When LiveQuery is not enabled/available — or on
+the WEBrick `MCPServer`, which cannot stream — the capability stays
+`subscribe: false` and `resources/subscribe` returns a "not supported" error.
+The capability is a contract: it is never advertised unless the server can
+actually deliver updates.
+### Protocol flow
+1. **`initialize`** — the response carries a server-issued `Mcp-Session-Id`
+   header. The client echoes it on every subsequent request.
+2. **`GET` listening stream** — the client opens a long-lived `GET` to the same
+   endpoint with `Accept: text/event-stream` and the `Mcp-Session-Id` header.
+   This is the server→client channel; it stays open and emits
+   `notifications/resources/updated` events until the client disconnects.
+3. **`resources/subscribe`** — a normal `POST` with
+   `{ "uri": "parse://Post/count" }`. Returns an empty result; updates begin
+   flowing on the listening stream.
+4. **`resources/unsubscribe`** — stops one subscription. `DELETE` with the
+   session id tears the whole session down.
+Only `count` and `samples` resources are subscribable. `schema` is rejected
+with an invalid-params error because schema changes are not LiveQuery events.
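On the client side, the flow reduces to two small jobs: build the `resources/subscribe` body for a subscribable URI, and pick `notifications/resources/updated` events out of the GET stream. The following is a minimal sketch under stated assumptions. The helper names are hypothetical, the class-name pattern in the URI regex is a guess, and the SSE parser handles only `event:`, `data:`, and comment lines.

```ruby
require "json"

# Hypothetical client-side helpers (not part of parse-stack-next).
# Assumed class-name pattern; only /count and /samples are accepted.
SUBSCRIBABLE_URI = %r{\Aparse://[A-Za-z_][A-Za-z0-9_]*/(?:count|samples)\z}

def subscribable?(uri)
  SUBSCRIBABLE_URI.match?(uri)
end

# Step 3: the resources/subscribe request body.
def subscribe_body(uri, id:)
  raise ArgumentError, "only count/samples are subscribable" unless subscribable?(uri)
  { jsonrpc: "2.0", id: id, method: "resources/subscribe", params: { uri: uri } }
end

# Step 2: split the GET stream into events (a blank line ends an event).
def parse_sse(buffer)
  buffer.split(/\r?\n\r?\n/).filter_map do |block|
    event = "message"
    data = []
    block.each_line(chomp: true) do |line|
      next if line.start_with?(":") # SSE comment / keep-alive
      field, value = line.split(":", 2)
      value = value.to_s.delete_prefix(" ")
      event = value if field == "event"
      data << value if field == "data"
    end
    { event: event, data: data.join("\n") } unless data.empty?
  end
end

# URIs to re-read via resources/read; row payloads never arrive here.
def updated_uris(buffer)
  parse_sse(buffer).filter_map do |ev|
    msg = begin
      JSON.parse(ev[:data])
    rescue JSON::ParserError
      next # skip non-JSON payloads
    end
    next unless msg.is_a?(Hash) && msg["method"] == "notifications/resources/updated"
    msg.dig("params", "uri")
  end
end
```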
+### Access control (important)
+The bridge enforces the same scope rules as the rest of the SDK. LiveQuery
+filters events server-side using the credential on the subscribe frame, so the
+subscription's credentials are derived from the subscribing agent:
+| Agent scope | LiveQuery credential | Events seen |
+|-------------|----------------------|-------------|
+| session-token agent | that session token | only rows the user can read (ACL/CLP enforced by Parse Server) |
+| master-key agent | master key | every event |
+| `acl_user:` / `acl_role:` agent | **refused** | none — see below |
+`acl_user:` / `acl_role:` agents are an SDK-side, mongo-direct-only construct
+with no Parse Server REST or LiveQuery equivalent (Parse Server has no
+"act as this user pointer / role" handshake). Bridging them would force a
+silent downgrade to either master key (a row-level leak) or an unscoped
+session, so the bridge **fails closed** and refuses the subscription with a
+security error. Subscribe with a session-token or master-key agent instead.
+Because Parse Server decides ACL-bypass (master-key) authorization once, at
+LiveQuery *connect* time (there is no per-subscription master key), the bridge keeps two
+connections and routes by credential: master-posture subscriptions ride a
+dedicated **admin** connection
+(`Parse::LiveQuery::Client.new(use_master_key: true)`), while session-token
+subscriptions ride a normal connection and pass their token per subscription.
+Either way, an update only fires for an object the subscription's scope is
+permitted to read — LiveQuery filters events by ACL server-side. (Whether a
+master connection additionally surfaces master-key-only rows depends on the
+Parse Server version and its `masterKeyIps` configuration.)
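The routing rule in the table can be written as one small decision function. This is a sketch of the documented behaviour, not the bridge's code. `AgentScope` is a stand-in Struct rather than `Parse::Agent`, and the `:user`/`:admin` connection labels and `SubscriptionRefused` error are hypothetical.

```ruby
# Hypothetical stand-ins; not Parse::Agent or the bridge's real classes.
AgentScope = Struct.new(:session_token, :master, :acl_user, :acl_role, keyword_init: true)

class SubscriptionRefused < StandardError; end

def route_subscription(agent)
  # acl_user:/acl_role: have no LiveQuery equivalent: refuse (fail closed)
  # instead of silently downgrading to master key or an unscoped session.
  if agent.acl_user || agent.acl_role
    raise SubscriptionRefused, "acl_user/acl_role agents cannot subscribe"
  end
  # Session-token agents share a normal connection and pass the token
  # per subscription, so Parse Server applies ACL/CLP to each event.
  return { connection: :user, session_token: agent.session_token } if agent.session_token
  # Master posture is fixed at connect time, hence a dedicated connection.
  return { connection: :admin } if agent.master
  raise SubscriptionRefused, "no credential to scope the subscription"
end
```

The ACL check runs first on purpose: an agent that carries both an `acl_role:` and a master key is refused, not promoted to the admin connection.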
+### Operational notes and limitations
+- **Single-process.** Subscription state lives in the `MCPRackApp` instance
+  (like the cancellation registry), so in a clustered / multi-process
+  deployment a LiveQuery event observed on one worker does not reach a
+  listening stream held on another. The delivery seam
+  (`Parse::Agent::MCPSubscriptions::Notifier`) is isolated so a Redis-backed
+  pub/sub adapter can be supplied later without changing the bridge or the
+  dispatcher; pass it via `subscription_manager:`.
+- **Subscriptions do not survive a listening-stream reconnect.** Closing the
+  `GET` stream tears down the session's LiveQuery subscriptions; a client that
+  reconnects must re-issue its `resources/subscribe` calls.
+- **Listening streams are owner-bound (not a bare bearer capability).** The
+  stream authenticates via the agent factory *and* the server-issued
+  `Mcp-Session-Id` is bound to the principal that established it, so another
+  authenticated caller who knows or guesses the id is refused with `403`. The
+  `Mcp-Session-Id` is still secret-bearing and should be kept confidential, but
+  possession alone is no longer sufficient — see **Listening-stream ownership**
+  below for the binding model, its limits, and the `principal_resolver:` knob
+  master-key deployments need to make it effective.
+- **Per-session and global caps.** A client that subscribes but never opens (or
+  later drops) its listening stream leaves LiveQuery subscriptions running until
+  the session is torn down. A per-session ceiling (default 100,
+  `max_subscriptions_per_session:` on the manager) bounds one session's
+  footprint, and a global ceiling on the number of distinct subscribing sessions
+  (default 10,000, `max_sessions:`) bounds total growth. The global cap is a
+  rejection cap (new sessions are refused with a JSON-RPC error once it is
+  reached) and fails closed.
+- **Concurrent listening streams are bounded separately from request SSE.**
+  `max_concurrent_dispatchers:` does **not**, by itself, bound the GET listening
+  streams used for resource subscriptions and notifications — those get their own
+  soft cap *equal to* `max_concurrent_dispatchers`. So the effective steady-state
+  ceiling across both surfaces is up to **2× `max_concurrent_dispatchers`** (up
+  to N request-scoped SSE dispatchers plus N listening streams). Size the value
+  with that 2× factor in mind (e.g. relative to your Puma `max_threads`). Leaving
+  it unset (the default `nil`) leaves both surfaces uncapped; the app logs a
+  one-time warning at construction when a streaming or subscription/notification
+  surface is enabled without a cap.
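The two subscription ceilings can be modeled as a mutex-guarded map from session id to subscribed URIs. This sketch mirrors the documented defaults (100 per session, 10,000 sessions). It rejects new sessions once the global cap is reached, and it frees a session's slot only on teardown. `SubscriptionCaps` and `CapExceeded` are hypothetical names.

```ruby
# Hypothetical sketch of the per-session and global caps (not the gem's code).
class CapExceeded < StandardError; end

class SubscriptionCaps
  def initialize(max_subscriptions_per_session: 100, max_sessions: 10_000)
    @per_session = max_subscriptions_per_session
    @max_sessions = max_sessions
    @subs = Hash.new { |h, k| h[k] = [] } # session id => subscribed URIs
    @lock = Mutex.new
  end

  def subscribe(session_id, uri)
    @lock.synchronize do
      # Global cap rejects NEW sessions once reached (fails closed).
      if !@subs.key?(session_id) && @subs.size >= @max_sessions
        raise CapExceeded, "too many subscribing sessions"
      end
      list = @subs[session_id]
      return true if list.include?(uri) # re-subscribing does not count twice
      raise CapExceeded, "per-session limit reached" if list.size >= @per_session
      list << uri
      true
    end
  end

  # Only teardown releases capacity; an idle session keeps its slots.
  def teardown(session_id)
    @lock.synchronize { @subs.delete(session_id) }
  end
end
```

As a sizing example for the separate stream caps: with `max_concurrent_dispatchers: 8`, plan for up to 16 long-lived connections (8 request-scoped SSE dispatchers plus 8 listening streams), and compare that figure with Puma's `max_threads`.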
+### Listening-stream ownership
+The GET listening stream is the single server→client bus shared by resource
+subscriptions, [server-initiated notifications](#server-initiated-notifications-general-purpose),
+and [approval elicitation](#approval-workflows-mcp-elicitation). Whoever holds
+that stream receives everything pushed to its `Mcp-Session-Id` — another
+session's `notifications/resources/updated`, `elicitation/create` approval
+prompts, and arbitrary `notify` payloads. So the stream is **owner-bound**: a
+session is tied to the principal that established it, and only the same
+principal may later open (or re-open) its stream.
+How the binding is established and checked:
+- **Initialize-bound.** A session created through an `initialize` POST is bound
+  authoritatively to that caller's principal. A later `GET` carrying the same
+  `Mcp-Session-Id` from a *different* principal is refused with HTTP `403`
+  (`-32600`, "Mcp-Session-Id is owned by another principal"). A re-`initialize`
+  by the same caller refreshes the binding.
+- **Trust-on-first-use (TOFU) for the decoupled bus.** A session id that
+  `initialize` never saw — the `notifications: true` bus, where application code
+  pushes to ids it chose itself — is claimed by the first principal to attach a
+  listener; a different principal attaching afterward is refused. TOFU closes
+  the prior model's eviction-after-claim hole (a second caller could overwrite
+  or shadow an existing listener), but a first-mover attacker can still claim an
+  *unused* id, so **notification-bus session ids must be high-entropy**.
+- **Stream close keeps the claim.** The binding is dropped only on an explicit
+  `DELETE` termination, not on mere stream close — a reconnecting owner keeps
+  its claim, and an attacker cannot grab the id during a brief disconnect.
+The principal fingerprint is derived, in order, from: an operator-supplied
+`principal_resolver:`, then the agent's `session_token` (hashed), then
+`acl_user`, then `acl_role`. With none of these the agent falls back to a shared
+`"mk"` (master-key) principal:
+- **A master-key-everywhere factory makes owner-binding a no-op.** If every
+  request builds a bare master-key agent (no `session_token:` / `acl_user:` /
+  `acl_role:`), all agents share the `"mk"` fingerprint and are
+  indistinguishable, so the `403` never fires among them. Deployments that
+  authenticate users upstream and run master-key agents should supply a
+  `principal_resolver:` to restore a real per-user identity:
+  ```ruby
+  app = Parse::Agent::MCPRackApp.new(
+    streaming: true,
+    notifications: true,                 # or resource_subscriptions: true
+    principal_resolver: ->(agent, env) {
+      # Return a stable per-user id (String). nil/empty falls through to the
+      # agent's own scope, then to the shared "mk" principal.
+      env["myapp.authenticated_user_id"]
+    },
+    agent_factory: ->(env) { ... },
+  )
+  ```
+  The resolver must respond to `#call`; an invalid one raises `ArgumentError` at
+  construction. Per-user impersonation (binding a real `session_token` per
+  request) achieves the same effect without a resolver.
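The fallback order above can be expressed as a single function, which also shows why bare master-key agents collapse together. This is a hedged sketch. `AgentView`, the `res:`/`st:`/`user:`/`role:` prefixes, and SHA-256 as the token hash are assumptions. Unlike the real app, which rejects an invalid resolver at construction, the sketch checks it at call time for brevity.

```ruby
require "digest"

# Hypothetical stand-in for the agent's scope fields.
AgentView = Struct.new(:session_token, :acl_user, :acl_role, keyword_init: true)

# Documented order: principal_resolver -> hashed session_token -> acl_user
# -> acl_role -> shared "mk". Prefixes and hash choice are assumptions.
def principal_fingerprint(agent, env, resolver: nil)
  if resolver
    raise ArgumentError, "principal_resolver must respond to #call" unless resolver.respond_to?(:call)
    id = resolver.call(agent, env)
    return "res:#{id}" unless id.to_s.empty? # nil/empty falls through
  end
  token = agent.session_token.to_s
  return "st:#{Digest::SHA256.hexdigest(token)}" unless token.empty? # never the raw token
  return "user:#{agent.acl_user}" unless agent.acl_user.to_s.empty?
  return "role:#{agent.acl_role}" unless agent.acl_role.to_s.empty?
  "mk" # every bare master-key agent lands here, so they are indistinguishable
end
```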
+**Limits (same scope as the cancellation registry):** the owner registry is
+per-`MCPRackApp` instance and **single-process** — it does not span Puma workers
+or survive a restart. In a clustered deployment the `initialize` POST and the
+`GET` stream may land on different workers, so the initialize-binding degrades
+to TOFU there. The registry is LRU-bounded (default 10,000 sessions) so a stream
+of `initialize`-without-`DELETE` sessions cannot grow it without limit; evicting
+an active owner just downgrades that id to TOFU on its next attach. Blank
+session ids or blank fingerprints fail closed.
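The binding rules and limits combine into a small registry. In this sketch, `initialize` binds authoritatively, and an unseen id is claimed trust-on-first-use. Stream close changes nothing, and only `DELETE` releases a claim. An LRU bound evicts the oldest binding, which then falls back to TOFU. `OwnerRegistrySketch` is a hypothetical name, and it relies on Ruby hashes preserving insertion order to track recency.

```ruby
# Hypothetical sketch of owner binding (not Parse::Agent's registry).
class OwnerRegistrySketch
  def initialize(max: 10_000)
    @max = max
    @owners = {} # session id => fingerprint, least recently used first
    @lock = Mutex.new
  end

  # initialize POST: (re)bind authoritatively to this caller.
  def bind!(sid, fp)
    @lock.synchronize { blank?(sid, fp) ? false : store(sid, fp) }
  end

  # GET listening stream: allowed only for the bound (or first) principal.
  def attach?(sid, fp)
    @lock.synchronize do
      return false if blank?(sid, fp)     # blank ids/fingerprints fail closed
      owner = @owners[sid]
      return store(sid, fp) if owner.nil? # TOFU claim for an unseen id
      return false unless owner == fp     # the real app answers HTTP 403
      store(sid, fp)                      # same owner: refresh recency
    end
  end

  # Explicit DELETE termination is the only thing that releases a claim.
  def terminate!(sid)
    @lock.synchronize { @owners.delete(sid) }
  end

  private

  def blank?(*values)
    values.any? { |v| v.to_s.strip.empty? }
  end

  def store(sid, fp)
    @owners.delete(sid) # move to the most-recent position
    @owners[sid] = fp
    @owners.delete(@owners.keys.first) while @owners.size > @max # evict LRU
    true
  end
end
```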
+---
+## Approval Workflows (MCP elicitation)
+`:write` / `:admin` tier tool calls can require human approval before they run,
+using the MCP 2025-06-18 spec-native `elicitation/create` channel. Off by
+default, so existing clients are unaffected.
+```ruby
+# Opt tiers in (process-wide). Has teeth only when an approval gate is installed
+# (the MCP transport installs one per session; see below).
+Parse::Agent.require_approval_for = [:write, :admin]
+```
+The approval gate is a pluggable `agent.approval_gate` consulted inside
+`Parse::Agent#execute` — so it is reachable on the non-MCP path and
+unit-testable with a fake approver. `Parse::Agent::MCPElicitationGate` is the
+spec-native implementation; `Parse::Agent::NullGate` (the default) approves.
+Round-trip over the streaming transport:
+1. A `tools/call` for a gated tier pauses before execution. The server builds an
+   `elicitation/create` request whose payload carries the **approval preview**
+   (for `call_method` the *effective* tier is resolved from the target
+   `agent_method`'s declared permission, so write/admin methods invoked through
+   the readonly `call_method` tool are gated correctly). The preview is a real
+   before/after only for methods that declare `supports_dry_run`; for the
+   built-in `update_object` / `delete_object` it is the proposed `{ tool, args }`
+   call, **not** a fetched before/after of the target row.
+2. The request is pushed to the client over the open **GET listening stream**
+   (the same bus as resource subscriptions).
+3. The client replies with a JSON-RPC response (`{ result: { action: "accept" |
+   "decline" | "cancel" } }`) as a separate POST. The server routes it,
+   session-bound, into a pending registry that wakes the blocked tool thread.
+4. `accept` → the tool runs. Anything else → a structured refusal; the tool
+   never executes.
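Steps 3 and 4 describe a blocked thread that wakes on a session-bound reply. That is a classic condition-variable handoff with a deadline. The sketch below is illustrative only. `PendingApprovals` is a hypothetical name, and keying replies by `[session_id, request_id]` is an assumption about how "session-bound" routing could work. A timeout, `decline`, and `cancel` all resolve to a refusal.

```ruby
# Hypothetical sketch of the pending-approval handoff (not the gem's class).
class PendingApprovals
  def initialize
    @lock = Mutex.new
    @cond = ConditionVariable.new
    @replies = {} # [session_id, request_id] => action string
  end

  # Runs on the blocked tool thread. Returns :accept or :refused.
  def await(session_id, request_id, timeout:)
    key = [session_id, request_id]
    deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
    @lock.synchronize do
      until @replies.key?(key)
        remaining = deadline - Process.clock_gettime(Process::CLOCK_MONOTONIC)
        return :refused if remaining <= 0 # timed out: fail closed, never run
        @cond.wait(@lock, remaining)      # loop also absorbs spurious wakeups
      end
      @replies.delete(key) == "accept" ? :accept : :refused
    end
  end

  # Runs on the POST carrying the client's JSON-RPC response. The key includes
  # the answering session, so another session cannot satisfy the prompt.
  # (A real registry would also discard replies nobody is waiting for.)
  def deliver(session_id, request_id, action)
    @lock.synchronize do
      @replies[[session_id, request_id]] = action
      @cond.broadcast
    end
  end
end
```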
+Client capability and transport requirements (the server *reads* the client's
+`elicitation` capability at `initialize`; it does not advertise one of its own):
+```ruby
+Parse::Agent::MCPRackApp.new(
+  streaming: true,
+  resource_subscriptions: true,   # or notifications: true — either opens the GET bus
+  approval_timeout: 300,          # seconds to wait for a human; default 300
+  agent_factory: ->(env) { ... },
+)
+```
+**Three prerequisites — miss any one and every gated write fails closed,
+which looks like a bug rather than a config gap:**
+1. **`streaming: true`** on the `MCPRackApp` (it defaults to `false`). Approval
+   needs a server→client request, which only the streaming transport can send.
+2. **An open GET bus** — `notifications: true` *or* `resource_subscriptions:
+   true`. `notifications: true` is the lighter choice if you don't need
+   LiveQuery resource subscriptions. Without a bus there is no channel to
+   deliver `elicitation/create`.
+3. **A concurrent server (Puma), not the bundled `MCPServer`.** The bundled
+   server runs on WEBrick and is non-streaming, so approval can never round-trip
+   there — mount {Parse::Agent.rack_app} under Puma for any deployment that uses
+   approval.
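Because a missed prerequisite shows up only as refused writes, a boot-time check can surface the gap earlier. The helper below is hypothetical. It inspects a plain Hash of the options you plan to pass to `MCPRackApp.new`, plus a label for the server you run under, and lists what is missing. It does not call into the gem.

```ruby
# Hypothetical preflight helper; inspects your intended options only.
def approval_preflight(opts, server:)
  missing = []
  missing << "streaming: true" unless opts[:streaming]
  unless opts[:notifications] || opts[:resource_subscriptions]
    missing << "an open GET bus (notifications: or resource_subscriptions:)"
  end
  missing << "a concurrent server such as Puma (not WEBrick)" if server == :webrick
  missing
end

# Usage at boot:
# gaps = approval_preflight({ streaming: true, notifications: true }, server: :puma)
# warn "approval will fail closed; missing: #{gaps.join(', ')}" unless gaps.empty?
```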
+Operator aid: a write/admin agent served over MCP with `require_approval_for`
+empty emits a one-time `[Parse::Agent:SECURITY]` warning (writes run ungated).
+Approval round-trips also emit a `parse.agent.approval` `ActiveSupport::Notifications`
+event carrying `outcome`, `reason`, and the measured wait — subscribe to it to
+spot a non-answering client holding a dispatcher thread for the full
+`approval_timeout` (default 300s).
+**Fails closed.** When approval is required but the client did not advertise the
+`elicitation` capability, no listening stream is open, the transport is
+non-streaming (WEBrick), or the approver times out, the destructive operation is
+**refused** — never blocked forever, never silently executed. Replies are bound
+to the answering session's `Mcp-Session-Id`, so one session cannot answer (or
+guess the id of) another's prompt.
+---
+## Server-initiated Notifications (general purpose)
+The GET listening-stream bus also backs arbitrary server→client notifications,
+without requiring LiveQuery resource subscriptions:
+```ruby
+app = Parse::Agent::MCPRackApp.new(streaming: true, notifications: true,
+                                   agent_factory: ->(env) { ... })
+# From application code that holds the app reference:
+app.notify("the-session-id", method: "notifications/custom", params: { foo: 1 })
+```
+`notifications: true` builds the listening-stream manager in a `supported:
+false` posture: the GET stream and `#notify` work, but `resources.subscribe`
+stays unadvertised and `resources/subscribe` POSTs fail closed. `#notify` builds
+a JSON-RPC **notification** (never an `id` — that distinguishes it from the
+server-initiated *request* used by elicitation) and returns `false` when no
+stream is attached for the session. `app.subscription_manager` is exposed for an
+out-of-band / clustered publisher that needs the lower-level `publish` seam.
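The notification/request distinction comes down to the presence of an `id`. Here is a minimal sketch of both shapes framed as SSE `data:` events. The helper names are hypothetical, and the `elicitation/create` params in the usage line are a placeholder, not the gate's actual payload.

```ruby
require "json"

# Hypothetical helpers illustrating the two message kinds on the GET bus.
# A notification has no "id": there is nothing for the client to answer.
def jsonrpc_notification(method, params)
  { jsonrpc: "2.0", method: method, params: params }
end

# A server-initiated request (e.g. elicitation/create) carries an "id" so the
# client's later POSTed response can be matched back to it.
def jsonrpc_request(id, method, params)
  { jsonrpc: "2.0", id: id, method: method, params: params }
end

# One SSE event: a data line followed by the blank line that terminates it.
def sse_frame(message)
  "data: #{JSON.generate(message)}\n\n"
end

# Usage:
# sse_frame(jsonrpc_request(42, "elicitation/create", { message: "Approve?" }))
```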
+---
+## Built-in Agent Hardening & Telemetry
+5.2 adds several agent-side controls, all configured on `Parse::Agent`:
+- **Impersonation** — `Parse::Agent.new(impersonate_user: <id|Pointer|User>,
+  impersonate_mint: false, impersonation_label:)` (or `agent.impersonate(user)`
+  / `agent.stop_impersonating!`) resolves a real session token for a `_User`
+  (reusing an active `_Session`, or minting a restricted one with
+  `impersonate_mint: true`) and binds it as if `session_token:` had been passed.
+  Master-key client required; fails closed if no session resolves. An
+  `impersonation_label:` (also usable with `acl_role:`) is emitted on the
+  `parse.agent.tool_call` payload alongside `impersonated_user_id`.
+- **Prompt hardening** (`Parse::Agent::PromptHardening`) — schema descriptions
+  surfaced by `get_schema` / `get_all_schemas` are sanitized (non-identifier
+  field names dropped with a `[Parse::Agent:PROMPT]` warning, control/zero-width
+  chars stripped, capped, marker-wrapped); untrusted tool content has embedded
+  wrapper markers neutralized (`Parse::Agent.prompt_marker_strict = true` to
+  refuse instead). Operator canary phrases via
+  `Parse::Agent.prompt_injection_canaries = ["IGNORE PREVIOUS", /system:/i]`
+  emit `parse.agent.prompt_injection_detected`; set
+  `Parse::Agent.canary_action = :refuse` to raise on a hit.
+  `Parse::Agent::PROMPT_VERSION` is surfaced via
+  `agent.describe[:prompt][:version]`. A one-time warning fires when
+  `allowed_llm_endpoints` is left unrestricted (nil).
+- **Embedding-cost telemetry** — embedding calls made inside a tool span add
+  `embed_calls`, `embed_tokens`, and (when
+  `Parse::Agent.embed_cost_per_million_tokens` is set) `embed_cost_usd` to the
+  `parse.agent.tool_call` payload. The per-tool span does **not** cover
+  corpus/ingestion embeds fired at `Model.save` time (typically the dominant
+  spend) — wrap those in `Parse::Agent.measure_embeddings { … }`, which returns
+  `{ calls:, tokens:, cost_usd: }` for the work done on the calling thread:
+  ```ruby
+  stats = Parse::Agent.measure_embeddings do
+    KnowledgeArticle.save_all(batch)   # embed-on-save
+  end
+  stats # => { calls: 1200, tokens: 4_300_000, cost_usd: 0.43 }
+  ```
+  Thread-local: embeds fanned out to other threads/fibers are not captured —
+  measure inside each worker. `Parse::Agent.embed_cost_usd(tokens)` converts a
+  token count to USD using the configured rate (nil when unset).
+- **Provenance** — `Parse::Agent.include_source_provenance = true` (default
+  false) stamps each read-tool row with `_source = { class, tool, object_id }`,
+  applied after field-allowlist projection and redaction.
+- **`semantic_search` tool** — registered readonly + `client_safe`; opt a model
+  in with `agent_searchable field:, filter_fields:`. See the
+  [Atlas Vector Search Guide](./atlas_vector_search_guide.md#retrieval-rag).
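The thread-local caveat on `measure_embeddings` follows from how such a meter is usually built: counters live in `Thread.current` storage, which Ruby scopes to the current fiber. The sketch below shows that mechanism. `EmbedMeter` and its methods are hypothetical, and the 0.10 USD-per-million rate is only the rate implied by the example figures (4,300,000 tokens costing 0.43).

```ruby
# Hypothetical thread-local usage meter (not Parse::Agent's implementation).
# Thread.current[...] is fiber-local, so work on other threads/fibers is
# invisible to a meter opened here.
module EmbedMeter
  KEY = :embed_meter_stack

  # Called wherever an embedding request completes.
  def self.record(tokens)
    (Thread.current[KEY] || []).each do |m| # every open (nested) meter counts it
      m[:calls] += 1
      m[:tokens] += tokens
    end
  end

  def self.measure(rate_per_million: nil)
    stack = (Thread.current[KEY] ||= [])
    meter = { calls: 0, tokens: 0 }
    stack.push(meter)
    yield
    cost = rate_per_million && meter[:tokens] * rate_per_million / 1_000_000.0
    meter.merge(cost_usd: cost) # nil cost when no rate is configured
  ensure
    stack&.delete_if { |m| m.equal?(meter) } # remove exactly this meter
  end
end
```

The arithmetic behind the example above is 4,300,000 tokens x 0.10 / 1,000,000 = 0.43 USD.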
+### Runtime denial gates
+Beyond the permission-tier and env-gate checks, several gates refuse a tool
+call at runtime based on its arguments. They fail closed; a caller sees a
+structured error (the built-in tools return `{ success: false, error:,
+error_code: }`, which surfaces as `isError: true` over MCP). Knowing them up
+front avoids discovering each only on impact:
+| Gate | When it fires | Surfaced as |
+|------|---------------|-------------|
+| Missing tenant scope | A searchable class has no `agent_tenant_scope` while other classes do (tenant-aware deployment) | `Parse::Agent::MissingTenantScope` (search path); a one-time `[Parse::Agent:SECURITY]` lint warning on the general query path |
+| No tenant binding | A scoped class is queried by an agent whose tenant value resolves to `nil` | `Parse::Agent::AccessDenied` (`kind: :tenant`) |
+| Hidden class | A tool targets an `agent_hidden` class (or one outside a per-instance `classes:` allowlist) | `Parse::Agent::AccessDenied` (`kind: :hidden_class`) / off-allowlist refusal |
+| Reserved underscore key | A `filter:` / `vector_filter:` / `where:` contains an underscore-prefixed key (`_rperm`, `_p_*`, …) at any depth | `ArgumentError` / `ValidationError` (recursive refusal) |
+| Filter-field allowlist | A `filter:` / `vector_filter:` names a field not in the class's `agent_searchable filter_fields:` | `ValidationError` naming the offending field(s) |
+| `text_field` not embedded | `semantic_search` `text_field:` names a field that isn't a declared `embed` source | `ValidationError` listing the allowed sources |
+| Tool filtered | A tool/method removed by a per-instance `tools:` / `methods:` filter is invoked | `error_code: :tool_filtered` |
+| Approval denied/unavailable | A gated write/admin op is rejected or the approver is unreachable | `error_code: :approval_denied` |
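Two of these gates, the recursive underscore-key refusal and the filter-field allowlist, can be sketched as one tree walk. Several details here are assumptions, not documented behaviour. Keys starting with `$` are treated as query operators: they are recursed into and not allowlisted. Dotted paths are checked by their first segment. `FilterRejected` stands in for the real `ValidationError`.

```ruby
# Hypothetical sketch of two runtime gates (not the gem's validator).
class FilterRejected < StandardError; end

def validate_filter!(filter, allowed_fields)
  allowed = allowed_fields.map(&:to_s)
  walk = lambda do |node, field_level|
    case node
    when Hash
      node.each do |key, value|
        k = key.to_s
        # Underscore-prefixed keys (_rperm, _p_*, ...) are refused at any depth.
        raise FilterRejected, "reserved key #{k}" if k.start_with?("_")
        if k.start_with?("$")
          walk.call(value, field_level) # $and/$or keep us at field level
        else
          if field_level && !allowed.include?(k.split(".").first)
            raise FilterRejected, "field not allowed: #{k}"
          end
          walk.call(value, false) # inside a field's condition
        end
      end
    when Array
      node.each { |v| walk.call(v, field_level) }
    end
  end
  walk.call(filter, true)
  true
end
```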
+---
+## Token Economy
+The MCP surface is paid for in LLM context tokens — the tool schemas sent every
+session, and the data every tool returns. 5.2 adds controls to keep that cost
+down.
+### Lean tool profile
+A full `:readonly` `tools/list` payload is roughly **7.9K context tokens** every
+session. For small-context models or token-sensitive deployments, the `:lean`
+profile narrows the surface to the six core read tools (`get_all_schemas`,
+`get_schema`, `query_class`, `count_objects`, `get_object`, `aggregate`) —
+about **2.6K tokens, a ~67% reduction**:
+```ruby
+Parse::Agent.new(permissions: :readonly, tools: :lean)
+```
+A profile is an allowlist: it composes with the permission tier and can only
+narrow, never elevate. Profiles are selected by Symbol only (see
+`Parse::Agent::TOOL_PROFILES`); for finer control, pass an explicit Array or an
+`{ only:, except: }` hash instead. An unknown profile name raises rather than
+silently exposing the full surface.
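
The following is a minimal sketch of how a Symbol profile, an Array, or an `{ only:, except: }` hash can resolve against a permission tier without ever elevating it. `resolve_tools`, `PROFILES`, and any tier list you pass in are hypothetical; only the six `:lean` tool names come from this guide.

```ruby
# Hypothetical sketch: a tools filter can only narrow the tier's tool set.
# Constant and method names are illustrative, not Parse::Agent's own.
LEAN = %i[get_all_schemas get_schema query_class count_objects get_object aggregate].freeze
PROFILES = { lean: LEAN }.freeze

def resolve_tools(tier_tools, filter)
  allowed =
    case filter
    when nil    then tier_tools
    when Symbol then PROFILES.fetch(filter) { raise ArgumentError, "unknown tool profile: #{filter.inspect}" }
    when Array  then filter
    when Hash   then filter.fetch(:only, tier_tools) - filter.fetch(:except, [])
    else raise ArgumentError, "unsupported tools: value"
    end
  # Intersect with the tier so a filter can never add a tool the tier lacks.
  tier_tools & allowed
end
```

With this rule, `resolve_tools(readonly_tools, [:delete_object])` returns an empty list whenever the read-only tier never contained `delete_object`.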
+### Leaner tool responses
+Read tools return rows in an LLM-friendly form (Pointers as `{_type, class,
+id}`, Dates as bare ISO strings) and now **strip the raw `ACL` map** — it is
+operationally useless to a model (effective authority is enforced server-side
+regardless) and is pure token overhead plus a minor role/user-id disclosure.
+`get_objects` and the Atlas Search tools now go through the same normalization
+that `query_class` always used, instead of returning raw wire-format objects.
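
This normalization can be pictured as a recursive rewrite of Parse REST wire forms (`{"__type" => "Pointer", ...}` and `{"__type" => "Date", "iso" => ...}`). The sketch below assumes the output keys are literally `_type`, `class`, and `id`. `normalize_row` and `normalize_value` are hypothetical helpers, not the gem's API.

```ruby
# Hypothetical sketch, not Parse::Agent's code. Input rows use Parse REST
# wire forms; output key names (_type, class, id) are assumed from the text.
def normalize_value(value)
  case value
  when Hash
    case value["__type"]
    when "Pointer"
      { "_type" => "Pointer", "class" => value["className"], "id" => value["objectId"] }
    when "Date"
      value["iso"] # Dates become bare ISO-8601 strings
    else
      value.transform_values { |v| normalize_value(v) }
    end
  when Array
    value.map { |v| normalize_value(v) }
  else
    value
  end
end

def normalize_row(row)
  # Drop the raw ACL map; permissions are still enforced server-side.
  row.reject { |key, _| key == "ACL" }.transform_values { |v| normalize_value(v) }
end
```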
+Defaults that bound response size: `query_class` `limit:` defaults to 100 (cap
+1000), and the rendered array is capped at 50 rows (flagged with a
+`truncated_note`); `aggregate` auto-injects a terminal `$limit: 200` stage. Pass
+a smaller `limit:`, or project fewer fields via `keys:`, when you want a
+tighter result.
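
These defaults reduce to simple arithmetic. In the sketch below, the helper names, the `results`/`count` result keys, and the `truncated_note` wording are all hypothetical. Only the numbers come from the text above.

```ruby
# Hypothetical sketch of the response-size defaults; names are illustrative.
QUERY_DEFAULT_LIMIT = 100
QUERY_MAX_LIMIT     = 1000
RENDER_CAP          = 50
AGGREGATE_LIMIT     = 200

# Default the limit to 100 and never exceed the 1000 cap.
def effective_limit(limit)
  [limit || QUERY_DEFAULT_LIMIT, QUERY_MAX_LIMIT].min
end

# Render at most 50 rows and flag the cut when rows were dropped.
def render_rows(rows)
  shown = rows.first(RENDER_CAP)
  result = { "results" => shown, "count" => shown.size }
  result["truncated_note"] = "showing #{RENDER_CAP} of #{rows.size} rows" if rows.size > RENDER_CAP
  result
end

# Append a terminal $limit stage to an aggregation pipeline.
def with_terminal_limit(pipeline)
  pipeline + [{ "$limit" => AGGREGATE_LIMIT }]
end
```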
+### `semantic_search` — deduped sources and a token budget
+The `semantic_search` result hoists each chunk's parent record **once** into a
+`documents` map keyed by `objectId`, instead of duplicating the full source on
+every chunk — map a chunk back to its source via `metadata.object_id`:
+```jsonc
+{
+  "chunks": [
+    { "id": "a#0", "score": 0.82, "content": "…", "metadata": { "object_id": "a", "chunk_index": 0 } },
+    { "id": "a#1", "score": 0.82, "content": "…", "metadata": { "object_id": "a", "chunk_index": 1 } }
+  ],
+  "documents": { "a": { "objectId": "a", "title": "…" } },
+  "count": 2
+}
+```
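
Here is a sketch of the hoisting step. It assumes each raw hit carries its chunk text, score, chunk index, and full parent record; that input shape and `dedupe_hits` are hypothetical. The output follows the JSON shape shown above.

```ruby
# Hypothetical sketch: store each parent record once in a `documents` map
# and point chunks at it through metadata.object_id.
def dedupe_hits(hits)
  documents = {}
  chunks = hits.map do |hit|
    record = hit[:record]
    id = record["objectId"]
    documents[id] ||= record # first occurrence is stored; later chunks reuse it
    {
      "id" => "#{id}##{hit[:chunk_index]}",
      "score" => hit[:score],
      "content" => hit[:content],
      "metadata" => { "object_id" => id, "chunk_index" => hit[:chunk_index] }
    }
  end
  { "chunks" => chunks, "documents" => documents, "count" => chunks.size }
end
```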
+A `max_total_tokens` budget (default 20,000; estimated as chars/4) trims the
+lowest-ranked chunks so a few long documents can't silently blow the context
+window — the count caps (`k * max_chunks_per_document`) bound the chunk *count*
+but not their total size. When the budget trims, the result adds
+`budget_truncated: true` and `budget_dropped: <n>` so the truncation is never
+silent. Pass `max_total_tokens: 0` to disable.
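
The next sketch shows the budget trim. It assumes chunks arrive ranked best-first and that the chars/4 estimate rounds up; the rounding is an assumption. `apply_token_budget` is a hypothetical helper. It leaves the `documents` map untouched, and this guide does not say whether the real tool also prunes it.

```ruby
# Hypothetical sketch of the max_total_tokens trim; not the gem's code.
def estimate_tokens(text)
  (text.length / 4.0).ceil # chars/4, rounded up (assumption)
end

def apply_token_budget(result, max_total_tokens: 20_000)
  return result if max_total_tokens.zero? # 0 disables the budget

  used = 0
  # Keep the best-ranked prefix that fits; everything after is dropped.
  kept = result["chunks"].take_while do |chunk|
    used += estimate_tokens(chunk["content"])
    used <= max_total_tokens
  end
  dropped = result["chunks"].size - kept.size
  return result if dropped.zero?

  result.merge("chunks" => kept, "count" => kept.size,
               "budget_truncated" => true, "budget_dropped" => dropped)
end
```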
+### Structured error metadata on the wire
+A failing `tools/call` already carries an `error_code`, a structured `details:`
+hash (e.g. `allowed_fields`, `suggested_rewrite`), and `retry_after`. These are
+now forwarded on the MCP error envelope under `_meta` (`parse.error_code`,
+`parse.retry_after`, `parse.details`), so a client can branch deterministically
+and honor `retry_after` instead of re-parsing the prose message. The
+human-readable `content` text is unchanged.
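
The sketch below builds such an envelope and shows a client branching on it. The flat dotted keys under `_meta` follow the names listed above, but the exact nesting is an assumption. `tool_error_envelope` and `next_action` are hypothetical.

```ruby
# Hypothetical sketch of an MCP tool-error result carrying _meta, plus a
# client that branches on it instead of parsing the prose text.
def tool_error_envelope(message, error_code:, details: nil, retry_after: nil)
  meta = { "parse.error_code" => error_code.to_s }
  meta["parse.retry_after"] = retry_after if retry_after
  meta["parse.details"] = details if details
  {
    "content" => [{ "type" => "text", "text" => message }], # prose unchanged
    "isError" => true,
    "_meta" => meta
  }
end

def next_action(result)
  return :ok unless result["isError"]

  meta = result.fetch("_meta", {})
  if meta["parse.retry_after"]
    [:retry_in, meta["parse.retry_after"]]
  elsif (fields = meta.dig("parse.details", "allowed_fields"))
    [:refilter, fields]
  else
    [:give_up, meta["parse.error_code"]]
  end
end
```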
+`get_schema` on a mistyped class name now raises a `ValidationError` carrying a
+"Did you mean: …?" hint (near matches from the locally-known classes), so the
+model self-corrects in one retry instead of falling back to a full
+`get_all_schemas` sweep.
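
To illustrate the shape of such a hint, Ruby's bundled `DidYouMean::SpellChecker` can stand in for the matcher. The gem's own matching logic is not described here, and `check_class_name!` and `UnknownClassError` are hypothetical names.

```ruby
require "did_you_mean"

# Hypothetical sketch: DidYouMean::SpellChecker is a stand-in matcher,
# not necessarily what Parse::Agent uses internally.
class UnknownClassError < ArgumentError; end

def check_class_name!(name, known_classes)
  return name if known_classes.include?(name)

  suggestions = DidYouMean::SpellChecker.new(dictionary: known_classes).correct(name)
  hint = suggestions.empty? ? "" : " Did you mean: #{suggestions.join(', ')}?"
  raise UnknownClassError, "Unknown class #{name.inspect}.#{hint}"
end
```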
+---
 ## Custom Authentication
 The agent factory pattern gives you full control over authentication. Every request passes through the factory before any Parse operation is attempted.
@@ -739,6 +1237,10 @@ agent = Parse::Agent.new(tools: { only: [:query_class, :get_schema, :aggregate],
 # Denylist only
 agent = Parse::Agent.new(tools: { except: [:emit_artifact] })
+# Named profile (Symbol) — :lean narrows to the six core read tools
+# (~67% smaller tools/list). See "Token Economy" above.
+agent = Parse::Agent.new(tools: :lean)
 ```
 **Resolution order** is strict: env-gates ▷ permission tier ▷ per-instance filter. The filter cannot elevate — `tools: { only: [:delete_object] }` on a `:readonly` agent still excludes `delete_object` because `delete_object` is not in the readonly tier's permitted set in the first place.
@@ -1667,6 +2169,8 @@ Known `details[:kind]` subcodes for `:access_denied`:
 The top-level `error_code` stays at `:access_denied` for back-compat with consumers that only branch on it. The new subcode is purely additive — clients that ignore `details:` see no change in behavior.
+**On the wire (5.2+):** `error_code`, `retry_after`, and `details` are forwarded on the MCP tool-error envelope under `_meta` — `parse.error_code`, `parse.retry_after`, `parse.details` — so a spec-compliant client can branch deterministically (and honor `retry_after`) without parsing the prose `content` text. The `content` text and `isError: true` are unchanged.
 ---
 ## Performance and Timeouts