parse-stack-next 5.1.1 → 5.2.1

Files changed (60)
  1. checksums.yaml +4 -4
  2. data/.env.sample +12 -0
  3. data/.env.test +4 -4
  4. data/CHANGELOG.md +630 -0
  5. data/Gemfile +3 -0
  6. data/Gemfile.lock +6 -1
  7. data/README.md +226 -39
  8. data/Rakefile +56 -10
  9. data/docs/atlas_vector_search_guide.md +110 -9
  10. data/docs/mcp_guide.md +504 -0
  11. data/docs/mongodb_direct_guide.md +66 -1
  12. data/docs/mongodb_index_optimization_guide.md +22 -1
  13. data/docs/usage_guide.md +15 -0
  14. data/lib/parse/agent/approval_gate.rb +0 -0
  15. data/lib/parse/agent/constraint_translator.rb +90 -19
  16. data/lib/parse/agent/describe.rb +1 -0
  17. data/lib/parse/agent/errors.rb +16 -0
  18. data/lib/parse/agent/mcp_client.rb +9 -0
  19. data/lib/parse/agent/mcp_dispatcher.rb +139 -7
  20. data/lib/parse/agent/mcp_rack_app.rb +621 -17
  21. data/lib/parse/agent/mcp_subscriptions.rb +607 -0
  22. data/lib/parse/agent/metadata_dsl.rb +58 -0
  23. data/lib/parse/agent/metadata_registry.rb +141 -1
  24. data/lib/parse/agent/prompt_hardening.rb +213 -0
  25. data/lib/parse/agent/result_formatter.rb +18 -3
  26. data/lib/parse/agent/tools.rb +167 -24
  27. data/lib/parse/agent.rb +692 -21
  28. data/lib/parse/client/request.rb +55 -4
  29. data/lib/parse/client/response.rb +4 -0
  30. data/lib/parse/client.rb +205 -7
  31. data/lib/parse/model/classes/installation.rb +27 -10
  32. data/lib/parse/model/classes/user.rb +8 -0
  33. data/lib/parse/model/core/actions.rb +65 -13
  34. data/lib/parse/model/core/embed_managed.rb +19 -14
  35. data/lib/parse/model/core/indexing.rb +108 -16
  36. data/lib/parse/model/core/querying.rb +29 -0
  37. data/lib/parse/model/model.rb +34 -3
  38. data/lib/parse/model/object.rb +42 -0
  39. data/lib/parse/query.rb +90 -24
  40. data/lib/parse/retrieval/agent_tool.rb +369 -0
  41. data/lib/parse/retrieval/chunk.rb +74 -0
  42. data/lib/parse/retrieval/chunker.rb +208 -0
  43. data/lib/parse/retrieval/retriever.rb +274 -0
  44. data/lib/parse/retrieval.rb +10 -0
  45. data/lib/parse/schema.rb +69 -20
  46. data/lib/parse/stack/version.rb +2 -2
  47. data/lib/parse/webhooks/payload.rb +62 -34
  48. data/lib/parse/webhooks.rb +15 -3
  49. data/parse-stack-next.gemspec +1 -1
  50. data/scripts/docker/docker-compose.atlas.yml +14 -10
  51. data/scripts/docker/docker-compose.test.yml +24 -20
  52. data/scripts/docker/mongo-init.js +3 -3
  53. data/scripts/start-parse.sh +10 -0
  54. data/scripts/start_mcp_server.rb +1 -1
  55. data/scripts/test_server_connection.rb +1 -1
  56. data/scripts/vector_prototype/create_vector_index.js +1 -1
  57. data/scripts/vector_prototype/fetch_embeddings.py +2 -2
  58. data/scripts/vector_prototype/query_prototype.rb +1 -1
  59. data/scripts/vector_prototype/run.sh +4 -4
  60. metadata +10 -2
data/docs/atlas_vector_search_guide.md CHANGED
@@ -351,23 +351,124 @@ Mechanics:
 
  ### Single vector per record
 
- `embed` produces exactly one vector per record. There is no built-in
- chunker. Long source text whose concatenation exceeds the provider's
- per-call token budget will be truncated provider-side, and the
- resulting vector will represent only the leading portion of the
- document.
+ `embed` produces exactly one vector per record. Long source text whose
+ concatenation exceeds the provider's per-call token budget is truncated
+ provider-side, and the stored vector represents only the leading portion
+ of the document. **Chunking happens at retrieval time, not embed time**
+ (see [Retrieval (RAG)](#retrieval-rag) below): the embedding stays
+ one-vector-per-record by design.
 
- For long-form content, two options:
+ If you instead want each passage to have its OWN embedding (true
+ embed-time chunking), use one of these patterns:
 
  1. **Pre-chunk client-side** and write each chunk as its own
  `Parse::Object` record with its own `embed` declaration.
- 2. **Dedicated `Chunk` subclass** that `belongs_to` the parent, with
+ 2. **Dedicated chunk subclass** that `belongs_to` the parent, with
  `embed :content, into: :embedding` on the chunk class itself. Run
  similarity search against the chunk collection, then hydrate
  parents as needed.
 
- A built-in chunker plus a `semantic_search` agent tool are scheduled
- for a future release.
+ ---
+
+ ## Retrieval (RAG)
+
+ `Parse::Retrieval` (`Parse::RAG` is an alias) sits on top of
+ `find_similar`. `Parse::Retrieval.retrieve` embeds a natural-language
+ query, runs Atlas `$vectorSearch` through `find_similar` (so ACL/CLP are
+ enforced mongo-direct — there is no REST two-stage re-query), and splits
+ each retrieved document's text field into scored, citable chunks.
+ Chunking here is **presentation-only**: every chunk inherits its parent
+ document's single `$vectorSearch` score.
+
+ ```ruby
+ chunks = Parse::Retrieval.retrieve(
+   query: "how do I reset my password?",
+   klass: KnowledgeArticle, # or "KnowledgeArticle"
+   field: :embedding, # optional; auto-resolves a single :vector field
+   k: 5,
+   filter: { published: true }, # post-$vectorSearch $match
+   vector_filter: nil, # Atlas-native pre-filter (fields must be type:"filter")
+   tenant_scope: nil, # { field:, value: } merged into vector_filter
+   score_quantize: false,
+   session_token: user.session_token, # ACL scope kwargs pass through to find_similar
+ )
+ # => Array<Parse::Retrieval::Chunk> — { id, score, content, source, metadata }
+ ```
+
+ `rerank:` and `hybrid:` are reserved on the signature and raise
+ `NotImplementedError` if supplied.
+
+ ### Chunkers
+
+ The default is a fixed-size sliding window with overlap. Subclass
+ `Parse::Retrieval::Chunker::Base` (implement `#chunk(text) -> Array<String>`)
+ for semantic / sentence-aware strategies.
+
+ ```ruby
+ Parse::Retrieval::Chunker::FixedSizeOverlap.new(
+   size: 800, # window width
+   overlap: 100, # units shared between consecutive windows (must be < size)
+   by: :chars, # :chars (default) or :tokens (whitespace tokens)
+   max_chunks_per_document: 200, # amplification cap — TRUNCATES with a signal, never raises
+ )
+ ```
+
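Editorial sketch (not part of the diff): the fixed-size window with overlap described above can be illustrated in plain Ruby. This is not the gem's `FixedSizeOverlap` source. The `sliding_window_chunks` helper, its `max_chunks:` keyword, and the `truncated` flag in its return value are hypothetical names chosen for illustration, and the real chunker's truncation signal may look different.

```ruby
# Hypothetical stand-alone sketch of a fixed-size window with overlap.
# Not the gem's implementation; names and return shape are assumptions.
def sliding_window_chunks(text, size: 800, overlap: 100, by: :chars, max_chunks: 200)
  unless size.positive? && overlap >= 0 && overlap < size
    raise ArgumentError, "need size > 0 and 0 <= overlap < size"
  end

  # :tokens splits on whitespace; :chars walks individual characters.
  units = by == :tokens ? text.split : text.chars
  joiner = by == :tokens ? " " : ""
  step = size - overlap # each window starts this many units after the previous one
  chunks = []
  truncated = false
  start = 0

  while start < units.length
    if chunks.length >= max_chunks
      truncated = true # cap reached: stop and signal instead of raising
      break
    end
    chunks << units[start, size].join(joiner)
    break if start + size >= units.length # this window reached the end
    start += step
  end

  { chunks: chunks, truncated: truncated }
end

result = sliding_window_chunks("abcdefghij", size: 4, overlap: 1)
p result[:chunks]
```

With `size: 4` and `overlap: 1` the window advances three characters at a time, so consecutive chunks share exactly one character.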
+ ### `agent_searchable` + the `semantic_search` agent tool
+
+ Opt a model in to agentic retrieval, declaring the vector field and the
+ fields an agent may filter on:
+
+ ```ruby
+ class KnowledgeArticle < Parse::Object
+   property :title, :string
+   property :body, :string
+   property :embedding, :vector, dimensions: 1536, provider: :openai
+   embed :title, :body, into: :embedding
+   agent_searchable field: :embedding, filter_fields: %i[published category]
+ end
+ ```
+
+ Every property referenced by `embed` must be declared — omitting
+ `property :title` here raises `InvalidEmbedDeclaration` at class load.
+
+ Because this model embeds **two** text sources (`:title` and `:body`),
+ `semantic_search` cannot guess which one to chunk and return as the
+ result `content`. Pass `text_field:` to choose (it must name one of the
+ embedded sources); a single-source model infers it automatically and the
+ parameter is optional:
+
+ ```ruby
+ # via the agent tool (LLM-facing parameter)
+ semantic_search(class_name: "KnowledgeArticle", query: "vector indexes",
+                 text_field: "body")
+
+ # or directly
+ Parse::Retrieval.retrieve(query: "vector indexes", klass: KnowledgeArticle,
+                           text_field: :body)
+ ```
+
+ The readonly, `client_safe` `semantic_search` tool then routes through
+ `Parse::Retrieval.retrieve` with the full agent security envelope:
+ searchable-class allowlist (`MetadataRegistry.resolve_searchable!`),
+ recursive underscore-key refusal + filter-field allowlist on caller
+ input, tenant scope folded into the Atlas pre-filter AND re-asserted on
+ every returned record, `field_allowlist` projection of each source, and
+ score quantization in non-admin contexts. In a tenant-aware deployment
+ (any class declares `agent_tenant_scope`), a searchable class without its
+ own tenant scope is refused at dispatch. See the
+ [MCP guide](./mcp_guide.md) for the agent-side wiring.
+
+ **Result shape (token-economy).** The tool returns
+ `{ chunks:, documents:, count: }`. Each chunk's parent record is hoisted
+ **once** into `documents` (keyed by `objectId`) rather than duplicated on
+ every chunk — map a chunk to its source via `metadata.object_id`. A
+ `max_total_tokens:` budget (default 20,000; estimated chars/4) trims the
+ lowest-ranked chunks so a few long documents can't silently blow the
+ context window, adding `budget_truncated: true` / `budget_dropped: <n>`
+ when it trims (pass `0` to disable). The library-level
+ `Parse::Retrieval.retrieve` still returns the flat `Array<Chunk>` with
+ `source` on each chunk — the dedup and budget live in the agent tool's
+ envelope. See the [MCP guide's Token Economy section](./mcp_guide.md#token-economy).
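Editorial sketch (not part of the diff): the result shape described above can be approximated in plain Ruby as follows. The `build_envelope` helper is hypothetical, chunks are modeled as plain hashes rather than `Parse::Retrieval::Chunk` objects, and the exact trimming policy (keep the highest-scored chunks until the next one would exceed the budget) is an assumption. Only the field names and the chars/4 estimate come from the text.

```ruby
# Hypothetical sketch of the agent-tool envelope: hoist each parent record
# once into `documents` and trim the lowest-ranked chunks to a token budget.
# The trimming policy and helper name are assumptions, not the gem's code.
def build_envelope(chunks, max_total_tokens: 20_000)
  ranked = chunks.sort_by { |chunk| -chunk[:score] } # best score first
  kept = []
  used = 0

  ranked.each do |chunk|
    cost = (chunk[:content].length / 4.0).ceil # chars/4 token estimate
    # A budget of 0 disables trimming entirely.
    break if max_total_tokens.positive? && used + cost > max_total_tokens
    kept << chunk
    used += cost
  end

  documents = {}
  slim = kept.map do |chunk|
    object_id = chunk[:source]["objectId"]
    documents[object_id] ||= chunk[:source] # stored once per parent record
    chunk.reject { |key, _| key == :source }
         .merge(metadata: { object_id: object_id })
  end

  envelope = { chunks: slim, documents: documents, count: slim.length }
  dropped = ranked.length - kept.length
  envelope.merge!(budget_truncated: true, budget_dropped: dropped) if dropped.positive?
  envelope
end
```

A caller would then resolve a chunk's parent with `envelope[:documents][chunk[:metadata][:object_id]]` instead of reading a per-chunk `source`.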
 
  ---
 
data/docs/mcp_guide.md CHANGED
@@ -360,6 +360,504 @@ Common uses for the direct dispatcher:
 
  ---
 
+ ## Connecting Claude Desktop (stdio bridge)
+
+ Parse Stack speaks MCP over **HTTP** (the standalone server and the
+ Rack adapter both expose a JSON-RPC-over-HTTP endpoint). Claude Desktop,
+ however, launches MCP servers as local **stdio** subprocesses — it does
+ not dial an HTTP URL directly. Bridge the two with
+ [`mcp-remote`](https://www.npmjs.com/package/mcp-remote), a small stdio↔HTTP
+ proxy that Claude Desktop runs as the subprocess and which forwards to your
+ HTTP endpoint.
+
+ 1. Start the Parse Stack MCP endpoint over HTTP (standalone or Rack — see
+ Deployment Modes above) and note its URL and the bearer token your
+ `agent_factory` expects, e.g. `http://localhost:3001/` with
+ `Authorization: Bearer <token>`.
+
+ 2. Add the bridge to `claude_desktop_config.json` (macOS:
+ `~/Library/Application Support/Claude/claude_desktop_config.json`;
+ Windows: `%APPDATA%\Claude\claude_desktop_config.json`):
+
+ ```json
+ {
+   "mcpServers": {
+     "parse-stack": {
+       "command": "npx",
+       "args": [
+         "-y",
+         "mcp-remote",
+         "http://localhost:3001/",
+         "--header",
+         "Authorization: Bearer ${PARSE_MCP_TOKEN}"
+       ],
+       "env": {
+         "PARSE_MCP_TOKEN": "your-mcp-token"
+       }
+     }
+   }
+ }
+ ```
+
+ 3. Restart Claude Desktop. The Parse Stack tools (`query_class`,
+ `get_schema`, `semantic_search`, …) appear in the client.
+
+ Notes:
+
+ - `mcp-remote` requires Node.js on the machine running Claude Desktop.
+ - For a public endpoint, terminate TLS in front of the HTTP server and use
+ an `https://` URL; the bearer token rides the `Authorization` header.
+ - The same bridge works for any stdio-only MCP client (e.g. some IDE
+ integrations). Clients that support remote MCP connectors natively can
+ point at the HTTP URL without the bridge.
+ - Approval workflows (elicitation) need the streaming/listening-stream
+ prerequisites described under Approval Workflows — confirm the bridge and
+ client forward the SSE channel before relying on human-in-the-loop gating.
+
+ ---
+
+ ## Resource Subscriptions (LiveQuery bridge)
+
+ MCP lets a client `resources/subscribe` to a resource URI and then receive
+ unsolicited `notifications/resources/updated` messages whenever the underlying
+ data changes. Parse Stack bridges that surface onto Parse LiveQuery: a
+ subscribed `parse://<Class>/count` or `parse://<Class>/samples` resource is
+ backed by a LiveQuery subscription on `<Class>`, and any matching
+ create/update/delete/enter/leave event is debounced into a single coarse
+ update for that URI. The client re-reads the resource via `resources/read` to
+ obtain the new value — row payloads are never streamed through the resource
+ surface.
+
+ This is opt-in and requires a streaming-capable Rack server (Puma, Falcon —
+ WEBrick buffers responses and cannot hold the listening stream open) plus
+ LiveQuery enabled and configured.
+
+ ```ruby
+ # Boot: enable LiveQuery and point it at the server.
+ Parse.setup(
+   server_url: "https://your-parse-server.com/parse",
+   application_id: "your_app_id",
+   api_key: "your_api_key",
+   live_query_url: "wss://your-parse-server.com",
+ )
+ Parse.live_query_enabled = true
+
+ # Mount the Rack app with resource subscriptions enabled.
+ app = Parse::Agent::MCPRackApp.new(resource_subscriptions: true) do |env|
+   token = env["HTTP_AUTHORIZATION"].to_s.delete_prefix("Bearer ")
+   MyAuth.agent_for_token!(token) # returns a Parse::Agent or raises Unauthorized
+ end
+ ```
+
+ When enabled and LiveQuery is available, the `initialize` handshake advertises
+ `resources.subscribe: true`. When LiveQuery is not enabled/available — or on
+ the WEBrick `MCPServer`, which cannot stream — the capability stays
+ `subscribe: false` and `resources/subscribe` returns a "not supported" error.
+ The capability is a contract: it is never advertised unless the server can
+ actually deliver updates.
+
+ ### Protocol flow
+
+ 1. **`initialize`** — the response carries a server-issued `Mcp-Session-Id`
+ header. The client echoes it on every subsequent request.
+ 2. **`GET` listening stream** — the client opens a long-lived `GET` to the same
+ endpoint with `Accept: text/event-stream` and the `Mcp-Session-Id` header.
+ This is the server→client channel; it stays open and emits
+ `notifications/resources/updated` events until the client disconnects.
+ 3. **`resources/subscribe`** — a normal `POST` with
+ `{ "uri": "parse://Post/count" }`. Returns an empty result; updates begin
+ flowing on the listening stream.
+ 4. **`resources/unsubscribe`** — stops one subscription. `DELETE` with the
+ session id tears the whole session down.
+
+ Only `count` and `samples` resources are subscribable. `schema` is rejected
+ with an invalid-params error because schema changes are not LiveQuery events.
+
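Editorial sketch (not part of the diff): from the client side, step 3 of the flow above amounts to building a JSON-RPC 2.0 request body and checking that the URI names a subscribable resource kind. Both helpers below (`subscribable_uri?`, `subscribe_request`) are hypothetical and only mirror the rules stated in the text. The class-name pattern in the regular expression is an assumption, and the server remains the authority on what it accepts.

```ruby
require "json"

# Per the text, only `count` and `samples` resources are subscribable.
SUBSCRIBABLE_KINDS = %w[count samples].freeze

# Hypothetical client-side pre-check for parse://<Class>/<kind> URIs.
# The identifier pattern for <Class> is an assumption.
def subscribable_uri?(uri)
  match = %r{\Aparse://([A-Za-z_][A-Za-z0-9_]*)/([a-z]+)\z}.match(uri.to_s)
  !match.nil? && SUBSCRIBABLE_KINDS.include?(match[2])
end

# JSON-RPC 2.0 body for `resources/subscribe`. The request id is chosen by
# the client; the Mcp-Session-Id travels as an HTTP header, not in the body.
def subscribe_request(uri, id: 1)
  JSON.generate({
    jsonrpc: "2.0",
    id: id,
    method: "resources/subscribe",
    params: { uri: uri },
  })
end

puts subscribe_request("parse://Post/count") if subscribable_uri?("parse://Post/count")
```

The same body shape with `method: "resources/unsubscribe"` would cover step 4.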
+ ### Access control (important)
+
+ The bridge enforces the same scope rules as the rest of the SDK. LiveQuery
+ filters events server-side using the credential on the subscribe frame, so the
+ subscription's credentials are derived from the subscribing agent:
+
+ | Agent scope | LiveQuery credential | Events seen |
+ |-------------|----------------------|-------------|
+ | session-token agent | that session token | only rows the user can read (ACL/CLP enforced by Parse Server) |
+ | master-key agent | master key | every event |
+ | `acl_user:` / `acl_role:` agent | **refused** | none — see below |
+
+ `acl_user:` / `acl_role:` agents are an SDK-side, mongo-direct-only construct
+ with no Parse Server REST or LiveQuery equivalent (Parse Server has no
+ "act as this user pointer / role" handshake). Bridging them would force a
+ silent downgrade to either master key (a row-level leak) or an unscoped
+ session, so the bridge **fails closed** and refuses the subscription with a
+ security error. Subscribe with a session-token or master-key agent instead.
+
+ Because Parse Server fixes ACL-bypass authorization at LiveQuery *connect*
+ time (there is no per-subscription master key), the bridge keeps two
+ connections and routes by credential: master-posture subscriptions ride a
+ dedicated **admin** connection
+ (`Parse::LiveQuery::Client.new(use_master_key: true)`), while session-token
+ subscriptions ride a normal connection and pass their token per subscription.
+ Either way, an update only fires for an object the subscription's scope is
+ permitted to read — LiveQuery filters events by ACL server-side. (Whether a
+ master connection additionally surfaces master-key-only rows depends on the
+ Parse Server version and its `masterKeyIps` configuration.)
+
+ ### Operational notes and limitations
+
+ - **Single-process.** Subscription state lives in the `MCPRackApp` instance
+ (like the cancellation registry), so in a clustered / multi-process
+ deployment a LiveQuery event observed on one worker does not reach a
+ listening stream held on another. The delivery seam
+ (`Parse::Agent::MCPSubscriptions::Notifier`) is isolated so a Redis-backed
+ pub/sub adapter can be supplied later without changing the bridge or the
+ dispatcher; pass it via `subscription_manager:`.
+ - **Subscriptions do not survive a listening-stream reconnect.** Closing the
+ `GET` stream tears down the session's LiveQuery subscriptions; a client that
+ reconnects must re-issue its `resources/subscribe` calls.
+ - **Listening streams are owner-bound (not a bare bearer capability).** The
+ stream authenticates via the agent factory *and* the server-issued
+ `Mcp-Session-Id` is bound to the principal that established it, so another
+ authenticated caller who knows or guesses the id is refused with `403`. The
+ `Mcp-Session-Id` is still secret-bearing and should be kept confidential, but
+ possession alone is no longer sufficient — see **Listening-stream ownership**
+ below for the binding model, its limits, and the `principal_resolver:` knob
+ master-key deployments need to make it effective.
+ - **Per-session and global caps.** A client that subscribes but never opens (or
+ later drops) its listening stream leaves LiveQuery subscriptions running until
+ the session is torn down. A per-session ceiling (default 100,
+ `max_subscriptions_per_session:` on the manager) bounds one session's
+ footprint, and a global ceiling on the number of distinct subscribing sessions
+ (default 10,000, `max_sessions:`) bounds total growth. The global cap is a
+ rejection cap (new sessions are refused with a JSON-RPC error once it is
+ reached) and fails closed.
+ - **Concurrent listening streams are bounded separately from request SSE.**
+ `max_concurrent_dispatchers:` does **not**, by itself, bound the GET listening
+ streams used for resource subscriptions and notifications — those get their own
+ soft cap *equal to* `max_concurrent_dispatchers`. So the effective steady-state
+ ceiling across both surfaces is up to **2× `max_concurrent_dispatchers`** (up
+ to N request-scoped SSE dispatchers plus N listening streams). Size the value
+ with that 2× factor in mind (e.g. relative to your Puma `max_threads`). Leaving
+ it unset (the default `nil`) leaves both surfaces uncapped; the app logs a
+ one-time warning at construction when a streaming or subscription/notification
+ surface is enabled without a cap.
+
+ ### Listening-stream ownership
+
+ The GET listening stream is the single server→client bus shared by resource
+ subscriptions, [server-initiated notifications](#server-initiated-notifications-general-purpose),
+ and [approval elicitation](#approval-workflows-mcp-elicitation). Whoever holds
+ that stream receives everything pushed to its `Mcp-Session-Id` — another
+ session's `notifications/resources/updated`, `elicitation/create` approval
+ prompts, and arbitrary `notify` payloads. So the stream is **owner-bound**: a
+ session is tied to the principal that established it, and only the same
+ principal may later open (or re-open) its stream.
+
+ How the binding is established and checked:
+
+ - **Initialize-bound.** A session created through an `initialize` POST is bound
+ authoritatively to that caller's principal. A later `GET` carrying the same
+ `Mcp-Session-Id` from a *different* principal is refused with HTTP `403`
+ (`-32600`, "Mcp-Session-Id is owned by another principal"). A re-`initialize`
+ by the same caller refreshes the binding.
+ - **Trust-on-first-use (TOFU) for the decoupled bus.** A session id that
+ `initialize` never saw — the `notifications: true` bus, where application code
+ pushes to ids it chose itself — is claimed by the first principal to attach a
+ listener; a different principal attaching afterward is refused. TOFU closes
+ the prior model's eviction-after-claim hole (a second caller could overwrite
+ or shadow an existing listener), but a first-mover attacker can still claim an
+ *unused* id, so **notification-bus session ids must be high-entropy**.
+ - **Stream close keeps the claim.** The binding is dropped only on an explicit
+ `DELETE` termination, not on mere stream close — a reconnecting owner keeps
+ its claim, and an attacker cannot grab the id during a brief disconnect.
+
+ The principal fingerprint is derived, in order, from: an operator-supplied
+ `principal_resolver:`, then the agent's `session_token` (hashed), then
+ `acl_user`, then `acl_role`. With none of these the agent falls back to a shared
+ `"mk"` (master-key) principal:
+
+ - **A master-key-everywhere factory makes owner-binding a no-op.** If every
+ request builds a bare master-key agent (no `session_token:` / `acl_user:` /
+ `acl_role:`), all agents share the `"mk"` fingerprint and are
+ indistinguishable, so the `403` never fires among them. Deployments that
+ authenticate users upstream and run master-key agents should supply a
+ `principal_resolver:` to restore a real per-user identity:
+
+ ```ruby
+ app = Parse::Agent::MCPRackApp.new(
+   streaming: true,
+   notifications: true, # or resource_subscriptions: true
+   principal_resolver: ->(agent, env) {
+     # Return a stable per-user id (String). nil/empty falls through to the
+     # agent's own scope, then to the shared "mk" principal.
+     env["myapp.authenticated_user_id"]
+   },
+   agent_factory: ->(env) { ... },
+ )
+ ```
+
+ The resolver must respond to `#call`; an invalid one raises `ArgumentError` at
+ construction. Per-user impersonation (binding a real `session_token` per
+ request) achieves the same effect without a resolver.
+
+ **Limits (same scope as the cancellation registry):** the owner registry is
+ per-`MCPRackApp` instance and **single-process** — it does not span Puma workers
+ or survive a restart. In a clustered deployment the `initialize` POST and the
+ `GET` stream may land on different workers, so the initialize-binding degrades
+ to TOFU there. The registry is LRU-bounded (default 10,000 sessions) so a stream
+ of `initialize`-without-`DELETE` sessions cannot grow it without limit; evicting
+ an active owner just downgrades that id to TOFU on its next attach. Blank
+ session ids or blank fingerprints fail closed.
+
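Editorial sketch (not part of the diff): the owner-binding rules in this section can be modeled with a small in-memory registry. The `OwnerRegistry` class, its method names, the return symbols, and the fingerprint prefixes are all hypothetical. LRU bounding is left out, and the real registry is internal to `MCPRackApp` and may differ in every detail. Only the precedence order, the TOFU behavior, the DELETE-only release, and the fail-closed handling of blank values come from the text.

```ruby
require "digest"

# Hypothetical sketch of owner binding; not the gem's internal registry.
class OwnerRegistry
  def initialize
    @owners = {}
    @lock = Mutex.new
  end

  # Precedence from the text: resolver result, session token (hashed),
  # acl_user, acl_role, then the shared "mk" principal.
  def self.fingerprint(resolved: nil, session_token: nil, acl_user: nil, acl_role: nil)
    return "r:#{resolved}" unless resolved.to_s.empty?
    return "st:#{Digest::SHA256.hexdigest(session_token)}" unless session_token.to_s.empty?
    return "u:#{acl_user}" unless acl_user.to_s.empty?
    return "role:#{acl_role}" unless acl_role.to_s.empty?
    "mk"
  end

  # `initialize` POST: authoritative binding for a server-issued id.
  def bind(session_id, principal)
    return :refused if blank?(session_id) || blank?(principal)
    @lock.synchronize { @owners[session_id] = principal }
    :ok
  end

  # GET listening stream: the owner passes, an unknown id is claimed by the
  # first caller (TOFU), and any other principal is refused (HTTP 403).
  def attach(session_id, principal)
    return :refused if blank?(session_id) || blank?(principal)
    @lock.synchronize do
      owner = (@owners[session_id] ||= principal)
      owner == principal ? :ok : :forbidden
    end
  end

  # Only an explicit DELETE drops the claim; closing the stream does not.
  def release(session_id)
    @lock.synchronize { @owners.delete(session_id) }
  end

  private

  def blank?(value)
    value.to_s.strip.empty?
  end
end
```

Note how two bare master-key agents both fingerprint to `"mk"`, so `attach` would return `:ok` for either of them. That is the no-op the text warns about when no `principal_resolver:` is supplied.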
+ ---
+
+ ## Approval Workflows (MCP elicitation)
+
+ `:write` / `:admin` tier tool calls can require human approval before they run,
+ using the MCP 2025-06-18 spec-native `elicitation/create` channel. Off by
+ default, so existing clients are unaffected.
+
+ ```ruby
+ # Opt tiers in (process-wide). Has teeth only when an approval gate is installed
+ # (the MCP transport installs one per session; see below).
+ Parse::Agent.require_approval_for = [:write, :admin]
+ ```
+
+ The approval gate is a pluggable `agent.approval_gate` consulted inside
+ `Parse::Agent#execute` — so it is reachable on the non-MCP path and
+ unit-testable with a fake approver. `Parse::Agent::MCPElicitationGate` is the
+ spec-native implementation; `Parse::Agent::NullGate` (the default) approves.
+
+ Round-trip over the streaming transport:
+
+ 1. A `tools/call` for a gated tier pauses before execution. The server builds an
+ `elicitation/create` request whose payload carries the **approval preview**
+ (for `call_method` the *effective* tier is resolved from the target
+ `agent_method`'s declared permission, so write/admin methods invoked through
+ the readonly `call_method` tool are gated correctly). The preview is a real
+ before/after only for methods that declare `supports_dry_run`; for the
+ built-in `update_object` / `delete_object` it is the proposed `{ tool, args }`
+ call, **not** a fetched before/after of the target row.
+ 2. The request is pushed to the client over the open **GET listening stream**
+ (the same bus as resource subscriptions).
+ 3. The client replies with a JSON-RPC response (`{ result: { action: "accept" |
+ "decline" | "cancel" } }`) as a separate POST. The server routes it,
+ session-bound, into a pending registry that wakes the blocked tool thread.
+ 4. `accept` → the tool runs. Anything else → a structured refusal; the tool
+ never executes.
+
+ Client capability + transport requirements (the server READS, does not
+ advertise, the client's `elicitation` capability at `initialize`):
+
+ ```ruby
+ Parse::Agent::MCPRackApp.new(
+   streaming: true,
+   resource_subscriptions: true, # or notifications: true — either opens the GET bus
+   approval_timeout: 300, # seconds to wait for a human; default 300
+   agent_factory: ->(env) { ... },
+ )
+ ```
+
+ **Three prerequisites — miss any one and every gated write fails closed,
+ which looks like a bug rather than a config gap:**
+
+ 1. **`streaming: true`** on the `MCPRackApp` (it defaults to `false`). Approval
+ needs a server→client request, which only the streaming transport can send.
+ 2. **An open GET bus** — `notifications: true` *or* `resource_subscriptions:
+ true`. `notifications: true` is the lighter choice if you don't need
+ LiveQuery resource subscriptions. Without a bus there is no channel to
+ deliver `elicitation/create`.
+ 3. **A concurrent server (Puma), not the bundled `MCPServer`.** The bundled
+ server runs on WEBrick and is non-streaming, so approval can never round-trip
+ there — mount {Parse::Agent.rack_app} under Puma for any deployment that uses
+ approval.
+
+ Operator aid: a write/admin agent served over MCP with `require_approval_for`
+ empty emits a one-time `[Parse::Agent:SECURITY]` warning (writes run ungated).
+ Approval round-trips also emit a `parse.agent.approval` `ActiveSupport::Notifications`
+ event carrying `outcome`, `reason`, and the measured wait — subscribe to it to
+ spot a non-answering client holding a dispatcher thread for the full
+ `approval_timeout` (default 300s).
+
+ **Fails closed.** When approval is required but the client did not advertise the
+ `elicitation` capability, no listening stream is open, the transport is
+ non-streaming (WEBrick), or the approver times out, the destructive operation is
+ **refused** — never blocked forever, never silently executed. Replies are bound
+ to the answering session's `Mcp-Session-Id`, so one session cannot answer (or
+ guess the id of) another's prompt.
+
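Editorial sketch (not part of the diff): the "pending registry that wakes the blocked tool thread" and the fail-closed timeout can be pictured with a mutex and a condition variable. The `PendingApprovals` class, its method names, and its return symbols are hypothetical. The gem's `MCPElicitationGate` is not shown in this diff excerpt and may be structured quite differently.

```ruby
# Hypothetical sketch of a session-bound pending-approval registry.
# Illustrative only; not the gem's gate or registry.
class PendingApprovals
  Entry = Struct.new(:session_id, :action)

  def initialize
    @lock = Mutex.new
    @cond = ConditionVariable.new
    @pending = {}
  end

  # Blocks the tool thread until an answer arrives or the timeout lapses.
  # Only "accept" approves; decline, cancel, and timeout all refuse.
  def await(request_id, session_id, timeout:)
    deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
    @lock.synchronize do
      entry = (@pending[request_id] = Entry.new(session_id, nil))
      while entry.action.nil?
        remaining = deadline - Process.clock_gettime(Process::CLOCK_MONOTONIC)
        break if remaining <= 0 # timed out: fail closed below
        @cond.wait(@lock, remaining)
      end
      @pending.delete(request_id)
      entry.action == "accept" ? :approved : :refused
    end
  end

  # Routes a client reply. A reply from a session that does not own the
  # prompt (or for an unknown request id) is ignored and returns false.
  def answer(request_id, session_id, action)
    @lock.synchronize do
      entry = @pending[request_id]
      return false if entry.nil? || entry.session_id != session_id
      entry.action = action
      @cond.broadcast
      true
    end
  end
end
```

The waiting thread re-checks the deadline after every wake-up, so a spurious wake-up or an ignored reply cannot extend the wait beyond the timeout.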
689
+ ---
690
+
691
+ ## Server-initiated Notifications (general purpose)
692
+
693
+ The GET listening-stream bus also backs arbitrary server→client notifications,
694
+ without requiring LiveQuery resource subscriptions:
695
+
696
+ ```ruby
697
+ app = Parse::Agent::MCPRackApp.new(streaming: true, notifications: true,
698
+ agent_factory: ->(env) { ... })
699
+
700
+ # From application code that holds the app reference:
701
+ app.notify("the-session-id", method: "notifications/custom", params: { foo: 1 })
702
+ ```
703
+
704
+ `notifications: true` builds the listening-stream manager in a `supported:
705
+ false` posture: the GET stream and `#notify` work, but `resources.subscribe`
706
+ stays unadvertised and `resources/subscribe` POSTs fail closed. `#notify` builds
707
+ a JSON-RPC **notification** (never an `id` — that distinguishes it from the
708
+ server-initiated *request* used by elicitation) and returns `false` when no
709
+ stream is attached for the session. `app.subscription_manager` is exposed for an
710
+ out-of-band / clustered publisher that needs the lower-level `publish` seam.
711
+
712
+ ---
713
+
714
+ ## Built-in Agent Hardening & Telemetry
715
+
716
+ 5.2 adds several agent-side controls, all configured on `Parse::Agent`:
717
+
718
+ - **Impersonation** — `Parse::Agent.new(impersonate_user: <id|Pointer|User>,
719
+ impersonate_mint: false, impersonation_label:)` (or `agent.impersonate(user)`
720
+ / `agent.stop_impersonating!`) resolves a real session token for a `_User`
721
+ (reusing an active `_Session`, or minting a restricted one with
722
+ `impersonate_mint: true`) and binds it as if `session_token:` had been passed.
723
+ Master-key client required; fails closed if no session resolves. An
724
+ `impersonation_label:` (also usable with `acl_role:`) is emitted on the
725
+ `parse.agent.tool_call` payload alongside `impersonated_user_id`.
726
+ - **Prompt hardening** (`Parse::Agent::PromptHardening`) — schema descriptions
727
+ surfaced by `get_schema` / `get_all_schemas` are sanitized (non-identifier
728
+ field names dropped with a `[Parse::Agent:PROMPT]` warning, control/zero-width
729
+ chars stripped, capped, marker-wrapped); untrusted tool content has embedded
730
+ wrapper markers neutralized (`Parse::Agent.prompt_marker_strict = true` to
731
+ refuse instead). Operator canary phrases via
732
+ `Parse::Agent.prompt_injection_canaries = ["IGNORE PREVIOUS", /system:/i]`
733
+ emit `parse.agent.prompt_injection_detected`; set
734
+ `Parse::Agent.canary_action = :refuse` to raise on a hit.
735
+ `Parse::Agent::PROMPT_VERSION` is surfaced via
736
+ `agent.describe[:prompt][:version]`. A one-time warning fires when
737
+ `allowed_llm_endpoints` is left unrestricted (nil).
738
+ - **Embedding-cost telemetry** — embedding calls made inside a tool span add
739
+ `embed_calls`, `embed_tokens`, and (when
740
+ `Parse::Agent.embed_cost_per_million_tokens` is set) `embed_cost_usd` to the
741
+ `parse.agent.tool_call` payload. The per-tool span does **not** cover
742
+ corpus/ingestion embeds fired at `Model.save` time (typically the dominant
743
+ spend) — wrap those in `Parse::Agent.measure_embeddings { … }`, which returns
744
+ `{ calls:, tokens:, cost_usd: }` for the work done on the calling thread:
745
+
746
+ ```ruby
747
+ stats = Parse::Agent.measure_embeddings do
748
+ KnowledgeArticle.save_all(batch) # embed-on-save
749
+ end
750
+ stats # => { calls: 1200, tokens: 4_300_000, cost_usd: 0.43 }
751
+ ```
752
+
753
+ Thread-local: embeds fanned out to other threads/fibers are not captured —
754
+ measure inside each worker. `Parse::Agent.embed_cost_usd(tokens)` converts a
755
+ token count to USD using the configured rate (nil when unset).
756
+ - **Provenance** — `Parse::Agent.include_source_provenance = true` (default
+ false) stamps each read-tool row with `_source = { class, tool, object_id }`,
+ applied after field-allowlist projection and redaction.
+ - **`semantic_search` tool** — registered readonly + `client_safe`; opt a model
+ in with `agent_searchable field:, filter_fields:`. See the
+ [Atlas Vector Search Guide](./atlas_vector_search_guide.md#retrieval-rag).
+
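Illustrative sketch, not part of the released diff: the thread-local measuring behaviour described in the embedding-cost bullet above can be pictured roughly as follows. `EmbedMeter`, `record`, and the `:embed_meter_stats` slot are hypothetical names invented for this example; the gem's own `Parse::Agent.measure_embeddings` may be implemented quite differently. The 0.10 USD rate is inferred only from the 4.3M tokens / 0.43 USD figures in the example above.

```ruby
# Hypothetical stand-in for a per-thread embedding meter; not the gem's code.
module EmbedMeter
  KEY = :embed_meter_stats # hypothetical slot (Thread#[] is fiber-local in Ruby)

  # Called by a (hypothetical) embedding client after each request.
  def self.record(tokens)
    stats = Thread.current[KEY]
    return unless stats # nothing is measuring on this thread/fiber

    stats[:calls] += 1
    stats[:tokens] += tokens
  end

  # Convert a token count to USD at a per-million-token rate (nil when unset).
  def self.cost_usd(tokens, rate_per_million)
    return nil if rate_per_million.nil?

    (tokens / 1_000_000.0 * rate_per_million).round(6)
  end

  # Run the block and return only what was recorded on the calling thread.
  def self.measure(rate_per_million: nil)
    previous = Thread.current[KEY]
    stats = Thread.current[KEY] = { calls: 0, tokens: 0 }
    yield
    stats.merge(cost_usd: cost_usd(stats[:tokens], rate_per_million))
  ensure
    Thread.current[KEY] = previous
  end
end

stats = EmbedMeter.measure(rate_per_million: 0.10) do
  EmbedMeter.record(3_000)                      # counted: same thread
  Thread.new { EmbedMeter.record(9_000) }.join  # not counted: other thread
end
```

Because the counters live in a per-thread slot, the embed recorded inside `Thread.new` never reaches `stats`, which is the reason the text recommends measuring inside each worker.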
+ ### Runtime denial gates
+
+ Beyond the permission-tier and env-gate checks, several gates refuse a tool
+ call at runtime based on its arguments. They fail closed; a caller sees a
+ structured error (the built-in tools return `{ success: false, error:,
+ error_code: }`, which surfaces as `isError: true` over MCP). Knowing them up
+ front avoids discovering each only on impact:
+
+ | Gate | When it fires | Surfaced as |
+ |------|---------------|-------------|
+ | Missing tenant scope | A searchable class has no `agent_tenant_scope` while other classes do (tenant-aware deployment) | `Parse::Agent::MissingTenantScope` (search path); a one-time `[Parse::Agent:SECURITY]` lint warning on the general query path |
+ | No tenant binding | A scoped class is queried by an agent whose tenant value resolves to `nil` | `Parse::Agent::AccessDenied` (`kind: :tenant`) |
+ | Hidden class | A tool targets an `agent_hidden` class (or one outside a per-instance `classes:` allowlist) | `Parse::Agent::AccessDenied` (`kind: :hidden_class`) / off-allowlist refusal |
+ | Reserved underscore key | A `filter:` / `vector_filter:` / `where:` contains an underscore-prefixed key (`_rperm`, `_p_*`, …) at any depth | `ArgumentError` / `ValidationError` (recursive refusal) |
+ | Filter-field allowlist | A `filter:` / `vector_filter:` names a field not in the class's `agent_searchable filter_fields:` | `ValidationError` naming the offending field(s) |
+ | `text_field` not embedded | `semantic_search` `text_field:` names a field that isn't a declared `embed` source | `ValidationError` listing the allowed sources |
+ | Tool filtered | A tool/method removed by a per-instance `tools:` / `methods:` filter is invoked | `error_code: :tool_filtered` |
+ | Approval denied/unavailable | A gated write/admin op is rejected or the approver is unreachable | `error_code: :approval_denied` |
+
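Illustrative sketch, not part of the released diff: the "Reserved underscore key" row describes a recursive refusal at any depth. A minimal version of such a walk might look like the following. `reject_reserved_keys!` is a hypothetical helper name, and this sketch refuses every underscore-prefixed key, which may be stricter than what the gem actually does.

```ruby
# Hypothetical helper: refuse underscore-prefixed keys anywhere in a filter.
# Walks nested Hashes and Arrays; returns the filter unchanged when it is clean.
def reject_reserved_keys!(node, path = [])
  case node
  when Hash
    node.each do |key, value|
      here = path + [key.to_s]
      if key.to_s.start_with?("_")
        raise ArgumentError, "reserved key not allowed in filter: #{here.join('.')}"
      end
      reject_reserved_keys!(value, here)
    end
  when Array
    node.each_with_index do |item, index|
      reject_reserved_keys!(item, path + ["[#{index}]"])
    end
  end
  node
end

clean  = { "status" => "open", "score" => { "$gt" => 3 } }
nested = { "$or" => [{ "status" => "open" }, { "_rperm" => { "$in" => ["*"] } }] }

reject_reserved_keys!(clean) # passes through untouched

begin
  reject_reserved_keys!(nested)
rescue ArgumentError => e
  refusal = e.message # names the offending path inside the $or branch
end
```

Failing closed here means the whole call is refused rather than the offending key being silently dropped, so the caller learns which part of the filter was rejected.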
+ ---
+
+ ## Token Economy
+
+ The MCP surface is paid for in LLM context tokens — the tool schemas sent every
+ session, and the data every tool returns. 5.2 adds controls to keep that cost
+ down.
+
+ ### Lean tool profile
+
+ A full `:readonly` `tools/list` payload is roughly **7.9K context tokens** every
+ session. For small-context models or token-sensitive deployments, the `:lean`
+ profile narrows the surface to the six core read tools (`get_all_schemas`,
+ `get_schema`, `query_class`, `count_objects`, `get_object`, `aggregate`) —
+ about **2.6K tokens, a ~67% reduction**:
+
+ ```ruby
+ Parse::Agent.new(permissions: :readonly, tools: :lean)
+ ```
+
+ A profile is an allowlist: it composes with the permission tier and can only
+ narrow, never elevate. Profiles are Symbol-only (`Parse::Agent::TOOL_PROFILES`);
+ for finer control still pass an explicit Array or `{ only:, except: }`. An
+ unknown profile raises rather than silently exposing the full surface.
+
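Illustrative sketch, not part of the released diff: "can only narrow, never elevate" amounts to intersecting the tier's permitted set with whatever the filter requests. The tables and `resolve_tools` below are hypothetical; only the six `:lean` tool names come from the text, and the `:readonly` list is a partial stand-in rather than the gem's real tier contents.

```ruby
# Hypothetical tier table (partial) and profile table for illustration only.
TIER_TOOLS = {
  readonly: %i[get_all_schemas get_schema query_class count_objects
               get_object aggregate get_objects semantic_search]
}.freeze

PROFILES = {
  lean: %i[get_all_schemas get_schema query_class count_objects get_object aggregate]
}.freeze

# Resolve the effective tool list: the filter is applied as an allowlist on
# top of the tier, so it can remove tools but never add one.
def resolve_tools(tier:, tools: nil)
  permitted = TIER_TOOLS.fetch(tier)
  requested =
    case tools
    when nil    then permitted
    when Symbol then PROFILES.fetch(tools) { raise ArgumentError, "unknown tool profile: #{tools.inspect}" }
    when Array  then tools
    when Hash   then (tools[:only] || permitted) - Array(tools[:except])
    else raise ArgumentError, "unsupported tools filter: #{tools.inspect}"
    end
  permitted & requested
end

lean     = resolve_tools(tier: :readonly, tools: :lean)
elevated = resolve_tools(tier: :readonly, tools: [:delete_object])
```

With these tables, `lean` holds the six core read tools and `elevated` is empty, since `delete_object` was never in the tier's permitted set. An unknown Symbol raises instead of falling back to the full surface.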
+ ### Leaner tool responses
+
+ Read tools return rows in an LLM-friendly form (Pointers as `{_type, class,
+ id}`, Dates as bare ISO strings) and now **strip the raw `ACL` map** — it is
+ operationally useless to a model (effective authority is enforced server-side
+ regardless) and is pure token overhead plus a minor role/user-id disclosure.
+ `get_objects` and the Atlas Search tools now go through the same normalization
+ `query_class` always used, instead of shipping raw wire-form.
+
+ Defaults that bound response size: `query_class` `limit:` defaults to 100 (cap
+ 1000) with the rendered array capped at 50 (`truncated_note`); `aggregate`
+ auto-injects a terminal `$limit: 200`. Pass a smaller `limit:` / project fewer
+ fields via `keys:` when you want a tighter result.
+
+ ### `semantic_search` — deduped sources and a token budget
+
+ The `semantic_search` result hoists each chunk's parent record **once** into a
+ `documents` map keyed by `objectId`, instead of duplicating the full source on
+ every chunk — map a chunk back to its source via `metadata.object_id`:
+
+ ```jsonc
+ {
+   "chunks": [
+     { "id": "a#0", "score": 0.82, "content": "…", "metadata": { "object_id": "a", "chunk_index": 0 } },
+     { "id": "a#1", "score": 0.82, "content": "…", "metadata": { "object_id": "a", "chunk_index": 1 } }
+   ],
+   "documents": { "a": { "objectId": "a", "title": "…" } },
+   "count": 2
+ }
+ ```
+
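Illustrative sketch, not part of the released diff: a client can rejoin each chunk with its hoisted source by looking up `metadata.object_id` in the `documents` map. The payload below mirrors the shape shown above; the `content` and `title` strings are placeholder values standing in for the elided ones.

```ruby
require "json"

# Same shape as the documented result; string values are placeholders.
payload = JSON.parse(<<~JSON)
  {
    "chunks": [
      { "id": "a#0", "score": 0.82, "content": "first", "metadata": { "object_id": "a", "chunk_index": 0 } },
      { "id": "a#1", "score": 0.82, "content": "second", "metadata": { "object_id": "a", "chunk_index": 1 } }
    ],
    "documents": { "a": { "objectId": "a", "title": "Guide" } },
    "count": 2
  }
JSON

# Resolve each chunk's parent record once per lookup instead of carrying a
# full copy of the source on every chunk.
joined = payload["chunks"].map do |chunk|
  source = payload["documents"].fetch(chunk.dig("metadata", "object_id"))
  { id: chunk["id"], title: source["title"], content: chunk["content"] }
end
```

Both chunks resolve to the same `documents["a"]` entry, which is the deduplication the result format is built around.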
+ A `max_total_tokens` budget (default 20,000; estimated as chars/4) trims the
+ lowest-ranked chunks so a few long documents can't silently blow the context
+ window — the count caps (`k * max_chunks_per_document`) bound the chunk *count*
+ but not their total size. When the budget trims, the result adds
+ `budget_truncated: true` and `budget_dropped: <n>` so the truncation is never
+ silent. Pass `max_total_tokens: 0` to disable.
+
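Illustrative sketch, not part of the released diff: one plausible reading of the budget rule is to rank chunks by score, keep them from the top while the running chars/4 estimate fits, and report how many were dropped. `estimate_tokens` and `apply_token_budget` are hypothetical names. Rounding the estimate up and stopping at the first chunk that overflows are assumptions, not confirmed behaviour of the gem.

```ruby
# Rough token estimate: characters / 4, rounded up (the rounding is assumed).
def estimate_tokens(text)
  (text.length / 4.0).ceil
end

# Keep the highest-scoring chunks that fit within the budget; 0 disables it.
def apply_token_budget(chunks, max_total_tokens: 20_000)
  ranked = chunks.sort_by { |chunk| -chunk[:score] }
  return { chunks: ranked, count: ranked.size } if max_total_tokens.zero?

  used = 0
  kept = ranked.take_while do |chunk|
    used += estimate_tokens(chunk[:content])
    used <= max_total_tokens
  end

  result = { chunks: kept, count: kept.size }
  dropped = ranked.size - kept.size
  # Only flag truncation when something was actually removed.
  result.merge!(budget_truncated: true, budget_dropped: dropped) if dropped.positive?
  result
end

chunks = [
  { id: "a#0", score: 0.9, content: "x" * 40 }, # ~10 tokens
  { id: "b#0", score: 0.7, content: "y" * 40 }, # ~10 tokens
  { id: "a#1", score: 0.8, content: "z" * 40 }  # ~10 tokens
]

trimmed   = apply_token_budget(chunks, max_total_tokens: 25)
unlimited = apply_token_budget(chunks, max_total_tokens: 0)
```

With a 25-token budget the two best-scoring chunks fit and the lowest-ranked one is dropped, so `trimmed` carries `budget_truncated: true` and `budget_dropped: 1`. The count caps alone would have let all three through.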
+ ### Structured error metadata on the wire
+
+ A failing `tools/call` already carries `error_code` and a structured `details:`
+ hash (e.g. `allowed_fields`, `suggested_rewrite`) and `retry_after` — these are
+ now forwarded on the MCP error envelope under `_meta` (`parse.error_code`,
+ `parse.retry_after`, `parse.details`) so a client can branch deterministically
+ and honor `retry_after` instead of re-parsing the prose message. The
+ human-readable `content` text is unchanged.
+
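Illustrative sketch, not part of the released diff: a client-side branch on the forwarded metadata might look like this. `classify_tool_result` is a hypothetical helper, and the exact layout (flat string keys such as `"parse.error_code"` directly under `"_meta"`) is an assumption drawn from the wording above, not a verified wire capture.

```ruby
# Hypothetical client-side helper: decide what to do with a tools/call result
# using the structured metadata rather than the prose content text.
def classify_tool_result(result)
  return [:ok, nil] unless result["isError"]

  meta        = result.fetch("_meta", {})
  code        = meta["parse.error_code"]
  retry_after = meta["parse.retry_after"]
  details     = meta["parse.details"] || {}

  if retry_after
    [:retry_later, retry_after]
  elsif details["suggested_rewrite"]
    [:rewrite, details["suggested_rewrite"]]
  elsif code
    [:fail, code]
  else
    [:fail, "unknown"] # no metadata forwarded: only the prose text is available
  end
end

denied = {
  "isError" => true,
  "content" => [{ "type" => "text", "text" => "Access denied" }],
  "_meta"   => { "parse.error_code" => "access_denied",
                 "parse.details"    => { "kind" => "tenant" } }
}

throttled = {
  "isError" => true,
  "content" => [{ "type" => "text", "text" => "Try again later" }],
  "_meta"   => { "parse.retry_after" => 5 }
}

denied_action    = classify_tool_result(denied)
throttled_action = classify_tool_result(throttled)
```

The point of the design is that none of these branches inspect `content`, so wording changes in the human-readable message cannot break the client's control flow.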
+ `get_schema` on a mistyped class name now raises a `ValidationError` carrying a
+ "Did you mean: …?" hint (near matches from the locally-known classes), so the
+ model self-corrects in one retry instead of falling back to a full
+ `get_all_schemas` sweep.
+
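Illustrative sketch, not part of the released diff: a near-match hint of this kind can be produced with Ruby's bundled `did_you_mean` library. Whether the gem uses `DidYouMean::SpellChecker` or its own matcher is not stated in the source; `schema_hint` and `KNOWN_CLASSES` are hypothetical names for this example.

```ruby
require "did_you_mean"

# Hypothetical stand-in for the locally-known class names.
KNOWN_CLASSES = %w[KnowledgeArticle Installation Song].freeze

# Build an error message for an unknown class name, appending near matches
# when the spell checker finds any. Returns nil for a known class.
def schema_hint(name, known: KNOWN_CLASSES)
  return nil if known.include?(name)

  matches = DidYouMean::SpellChecker.new(dictionary: known).correct(name)
  return "Unknown class #{name}." if matches.empty?

  "Unknown class #{name}. Did you mean: #{matches.join(', ')}?"
end

hint = schema_hint("KnowledgeArticel")
```

Returning the suggestion inside the error gives the model enough to retry `get_schema` directly with the corrected name.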
+ ---
+
  ## Custom Authentication
 
  The agent factory pattern gives you full control over authentication. Every request passes through the factory before any Parse operation is attempted.
@@ -739,6 +1237,10 @@ agent = Parse::Agent.new(tools: { only: [:query_class, :get_schema, :aggregate],
 
  # Denylist only
  agent = Parse::Agent.new(tools: { except: [:emit_artifact] })
+
+ # Named profile (Symbol) — :lean narrows to the six core read tools
+ # (~67% smaller tools/list). See "Token Economy" above.
+ agent = Parse::Agent.new(tools: :lean)
  ```
 
  **Resolution order** is strict: env-gates ▷ permission tier ▷ per-instance filter. The filter cannot elevate — `tools: { only: [:delete_object] }` on a `:readonly` agent still excludes `delete_object` because `delete_object` is not in the readonly tier's permitted set in the first place.
@@ -1667,6 +2169,8 @@ Known `details[:kind]` subcodes for `:access_denied`:
 
  The top-level `error_code` stays at `:access_denied` for back-compat with consumers that only branch on it. The new subcode is purely additive — clients that ignore `details:` see no change in behavior.
 
+ **On the wire (5.2+):** `error_code`, `retry_after`, and `details` are forwarded on the MCP tool-error envelope under `_meta` — `parse.error_code`, `parse.retry_after`, `parse.details` — so a spec-compliant client can branch deterministically (and honor `retry_after`) without parsing the prose `content` text. The `content` text and `isError: true` are unchanged.
+
  ---
 
  ## Performance and Timeouts