parse-stack-next 5.4.1 → 5.5.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 842a9a2d8d24afbb8e0444d995ea6c1d8707ec5fc2c9a40405d40f71b46495b8
- data.tar.gz: 8e54c583a6bf251818144b2ae6e1589197b4b0153093e0dd36048798a696346a
+ metadata.gz: 21be0f771a719c1df464556b7b4757d23266e4446a0636887cbc7b0ca079e3db
+ data.tar.gz: e130b4255384a8fb3b0a1be0e44d6fa8a345ee120851c9596285e44a4c9ec81b
  SHA512:
- metadata.gz: 264a5574513616b8cb9ebe662c9e1109746d688c191affe3faefd11f20b0911039f09e895e535874e8715ddf5f53b30e863f6cdc5fa54f4cafabe2f5044398db
- data.tar.gz: d61b36d12ef78eb05b701ff1e45858c7dc2f911bd548d912db9e061fe9027fed86c3b2b2808f2a89cbeb5115c603fd279425ccec2d184f567b9ef00f8eb901af
+ metadata.gz: '008496f006ad4c6026675be14f50be0189e4d64fa8ba1b5102bf781ef85e0d851507c6eb7c3ad3fadb9796e09f983e0609d220e0f580f2589ba0cea1471668c9'
+ data.tar.gz: 1ccb000d645ad338c5cafb98c7d7dbc8610d5cf9ce0c2fac84595c6fb8e6c27dc66c9c14678a834180a1ad57b653f242bb64fe8139383a979921a10a13d84953
data/CHANGELOG.md CHANGED
@@ -1,5 +1,349 @@
  ## parse-stack-next Changelog
 
+ ### 5.5.0
+
+ #### Multimodal bytes-fetch path with magic-byte MIME verification
+
+ - **NEW**: `Parse::Embeddings::ImageFetch` — the SDK-side image download
+ layer for image embeddings. Downloads through the existing
+ `Parse::File.safe_open_url` SSRF primitive (CIDR blocks, port allowlist,
+ DNS-rebinding re-check, size caps, timeouts — no parallel fetch mechanism),
+ determines the MIME type **exclusively by magic-byte sniffing** of the
+ leading bytes (JPEG / PNG / GIF / WebP), cross-checks the URL extension
+ against the sniffed type, and enforces a configurable
+ `Parse::Embeddings.allowed_image_types` allowlist. The HTTP `Content-Type`
+ header is never consulted, closing the file MIME-laundering gap: a `.jpg`
+ URL serving HTML (or PNG bytes behind a JPEG extension) is refused outright.
+ - **NEW**: `embed_image ..., source: :bytes` declaration mode. Where the
+ default `source: :url` forwards a validated URL for the provider to fetch
+ itself (and therefore requires the `trust_provider_url_fetch` sentinel),
+ `:bytes` mode has the SDK download, verify, and metadata-strip the image,
+ then forward it to the provider as a base64 data URI. No third-party URL
+ egress occurs, so the sentinel is not required — but the file's host must
+ still be in `Parse::Embeddings.allowed_image_hosts` (deny-all when empty).
+
+ ```ruby
+ class Post < Parse::Object
+   property :cover_image, :file
+   property :cover_embedding, :vector, dimensions: 1024, provider: :voyage
+   embed_image :cover_image, into: :cover_embedding, source: :bytes
+ end
+ ```
+ - **NEW**: EXIF/XMP metadata stripping, **default ON** for the bytes path.
+ User-uploaded photos commonly carry GPS coordinates and device serial
+ numbers; forwarding them to an embedding provider is a PII egress. JPEG
+ APP1 segments (Exif and XMP), PNG `eXIf` chunks, and WebP `EXIF`/`XMP `
+ RIFF chunks (with the VP8X flag bits cleared) are removed before the bytes
+ leave the process. Opt out per declaration with `exif_strip: false` when
+ orientation metadata must be preserved.
+ - **NEW**: `Voyage#embed_image` and `Cohere#embed_image` accept
+ `Parse::Embeddings::ImageFetch::FetchedImage` sources alongside URL
+ Strings (forms may be mixed in one batch). Fetched bytes ride Voyage's
+ `image_base64` content row and Cohere's `image_url` data-URI form.
+ - **NEW**: `Parse::Embeddings.allowed_image_types=` — MIME allowlist for the
+ bytes path (default JPEG/PNG/GIF/WebP; SVG deliberately excluded as
+ script-capable active content).
+ - **ENHANCED**: `Parse::Embeddings.validate_image_url!` accepts
+ `mode: :fetch` for SDK-side downloads — same host allowlist,
+ obfuscated-IP screen, port and CIDR checks as the default `:forward`
+ mode, minus the provider-egress sentinel that doesn't apply when no URL
+ is forwarded.
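The magic-byte check described in the ImageFetch entry can be sketched in plain Ruby. This is a minimal illustration, not the SDK's `ImageFetch` code: the `MagicSniff` module and its `sniff` / `verify!` methods are hypothetical names, and only the four standard file signatures are recognized (JPEG `FF D8 FF`, the 8-byte PNG signature, `GIF87a`/`GIF89a`, and a RIFF container tagged `WEBP` at byte offset 8).

```ruby
require "uri"

# Hypothetical sketch, not the parse-stack-next ImageFetch implementation.
# The MIME type comes only from the leading bytes; no HTTP header is consulted.
module MagicSniff
  EXTENSIONS = {
    "image/jpeg" => %w[.jpg .jpeg],
    "image/png"  => %w[.png],
    "image/gif"  => %w[.gif],
    "image/webp" => %w[.webp]
  }.freeze

  # Returns the sniffed MIME type, or nil when no known signature matches.
  def self.sniff(bytes)
    b = bytes.b # compare raw bytes, not UTF-8 text
    return "image/jpeg" if b.start_with?("\xFF\xD8\xFF".b)
    return "image/png"  if b.start_with?("\x89PNG\r\n\x1A\n".b)
    return "image/gif"  if b.start_with?("GIF87a", "GIF89a")
    return "image/webp" if b.start_with?("RIFF") && b.byteslice(8, 4) == "WEBP"

    nil
  end

  # Raises unless the sniffed type is allowlisted and agrees with the URL extension.
  def self.verify!(bytes, url, allowed: EXTENSIONS.keys)
    mime = sniff(bytes) or raise ArgumentError, "unrecognized image bytes"
    raise ArgumentError, "#{mime} is not allowlisted" unless allowed.include?(mime)

    ext = File.extname(URI(url).path).downcase
    unless EXTENSIONS.fetch(mime).include?(ext)
      raise ArgumentError, "extension #{ext.inspect} does not match sniffed #{mime}"
    end
    mime
  end
end
```

Because `sniff` never looks at a response header, a `.jpg` URL that returns HTML fails at `sniff`, and genuine PNG bytes served behind a `.jpg` path fail the extension cross-check.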
+
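For the JPEG case of the metadata stripping described above, a minimal sketch walks the marker segments that precede Start-of-Scan and drops every APP1 segment, which is where both Exif and XMP are stored. It assumes a well-formed baseline JPEG and omits the PNG `eXIf` and WebP RIFF cases entirely; `strip_jpeg_app1` is a hypothetical helper, not the SDK's function.

```ruby
# Hypothetical sketch, not the SDK's stripper: drops APP1 (Exif / XMP) segments
# from a well-formed baseline JPEG. PNG eXIf and WebP EXIF/XMP handling is omitted.
def strip_jpeg_app1(jpeg)
  data = jpeg.b
  raise ArgumentError, "not a JPEG" unless data.start_with?("\xFF\xD8".b)

  out = "\xFF\xD8".b # keep the SOI marker
  pos = 2
  while pos + 4 <= data.bytesize
    raise ArgumentError, "expected a marker at byte #{pos}" unless data.getbyte(pos) == 0xFF

    marker = data.getbyte(pos + 1)
    # SOS (0xDA): everything after it is entropy-coded scan data plus EOI; copy verbatim.
    return out << data.byteslice(pos..) if marker == 0xDA

    # The big-endian length counts its own two bytes but not the 0xFF/marker pair.
    length = data.byteslice(pos + 2, 2).unpack1("n")
    out << data.byteslice(pos, 2 + length) unless marker == 0xE1 # 0xE1 = APP1
    pos += 2 + length
  end
  out << data.byteslice(pos..).to_s # truncated file: keep whatever is left
end
```

The Exif block also carries the Orientation tag, so removing APP1 discards rotation hints along with GPS and device data; that trade-off is what the `exif_strip: false` opt-out addresses.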
+ #### Embedding-model migration tooling
+
+ - **NEW**: `Class.reembed!(field:, batch_size:, limit:, where:, only_stale:,
+ save_opts:)` — bulk re-embed for provider/model migrations. Unlike
+ `embed_pending!` (which only fills null vectors), `reembed!` walks every
+ row with objectId-cursor pagination, clears the digest sibling so the
+ save-path recompute cannot elide the provider call, and saves. With
+ `only_stale: true` the walk skips rows whose recorded provenance already
+ matches the current provider, model, and dimensions — making a partially
+ failed migration resumable.
+ - **NEW**: `embed` / `embed_image` auto-declare an `<into>_meta` `:object`
+ sibling property recording `{ provider, model, dimensions, modality,
+ embedded_at }` on every recompute (cleared when the source clears).
+ This is the provenance record `reembed!(only_stale: true)` reads, and it
+ tells operational tooling which model produced any stored vector.
+ Override the name with `meta_field:`.
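The `only_stale: true` skip rule reduces to comparing each row's provenance record with the current registration. A hypothetical sketch, assuming the `<into>_meta` hash arrives with string keys and that a missing record counts as stale (the entry only states that matching rows are skipped); `CurrentModel`, `stale?`, and `rows_to_reembed` are illustrative names.

```ruby
# Hypothetical sketch of the only_stale decision. The hash keys mirror the
# `<into>_meta` fields listed above; the helpers themselves are illustrative.
CurrentModel = Struct.new(:provider, :model, :dimensions)

# A row is stale when it has no provenance or any recorded field differs.
def stale?(meta, current)
  return true if meta.nil? || meta.empty?

  meta["provider"].to_s != current.provider.to_s ||
    meta["model"].to_s != current.model.to_s ||
    meta["dimensions"].to_i != current.dimensions.to_i
end

def rows_to_reembed(rows, current, meta_field:)
  rows.select { |row| stale?(row[meta_field], current) }
end
```

Re-running a walk that uses this filter after a partial failure only touches rows the earlier run did not finish, which is what makes the migration resumable.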
+
+ #### Bulk embedding and query-embed caching
+
+ - **NEW**: `Parse::Embeddings::BatchEmbedder` — batch-level orchestration
+ for bulk embedding jobs. Wraps any registered provider with batch slicing
+ (defaulting to the provider's own batch-size hint), requests-per-minute
+ pacing between calls, and batch-level exponential backoff with jitter on
+ rate-limit / transient errors (previously backoff lived only inside each
+ provider's single HTTP call). A batch that exhausts its attempts raises
+ `BatchEmbedder::BatchFailed` carrying `batch_index` and `completed_count`
+ so a resumable job knows where to pick up. Supports `retry_on:` exception
+ overrides and an `on_progress:` callback.
+ - **NEW**: `Parse::Embeddings::Cache` — process-local embedding cache keyed
+ by `(provider, model, dimensions, input_type, SHA-256(input))`, disabled by
+ default. Dimensions participate in the key so two registrations of the
+ same Matryoshka-capable model at different output widths never serve each
+ other's vectors.
+ `Parse::Embeddings::Cache.enable!(max_entries:, ttl:)` activates an LRU +
+ TTL store (or pass `store:` for a custom backend); repeated identical
+ query embeds through `find_similar(text:)`, `hybrid_search(text:)`, and
+ `Parse::Retrieval.retrieve` then skip the provider round-trip. Cache hits
+ emit the standard `parse.embeddings.embed` notification with
+ `cached: true`, so existing spend subscribers see hits and misses on one
+ stream. The input text is hashed before keying — plaintext queries never
+ land in a shared store.
+
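A process-local LRU + TTL store with the key shape given in the Cache entry fits in a few lines, because Ruby hashes preserve insertion order. The `EmbedCache` class and its `key` / `get` / `set` methods are hypothetical; they only mirror the `get`/`set` duck the cache entries mention, and the injectable clock exists so expiry can be exercised deterministically.

```ruby
require "digest"

# Hypothetical LRU + TTL cache sketch; not Parse::Embeddings::Cache. Ruby hashes
# keep insertion order, so the first key is always the least recently used.
class EmbedCache
  def initialize(max_entries:, ttl:, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @max_entries = max_entries
    @ttl = ttl
    @clock = clock
    @store = {}
  end

  # Dimensions are part of the key; the input contributes only its SHA-256 digest.
  def self.key(provider:, model:, dimensions:, input_type:, input:)
    [provider, model, dimensions, input_type, Digest::SHA256.hexdigest(input)].join("|")
  end

  def get(key)
    entry = @store.delete(key)
    return nil if entry.nil? || entry[:expires_at] <= @clock.call

    @store[key] = entry # re-insert: now the most recently used
    entry[:value]
  end

  def set(key, value)
    @store.delete(key)
    @store[key] = { value: value, expires_at: @clock.call + @ttl }
    @store.delete(@store.first.first) while @store.size > @max_entries
    value
  end
end
```

A caller would write `cache.get(key) || cache.set(key, vector)`, calling the provider only on a miss; two widths of the same model produce different keys, and the raw query text never appears in a key.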
94
+ #### Vector index drift detection
95
+
96
+ - **NEW**: first-query verification of deployed Atlas vectorSearch indexes.
97
+ When `find_similar` / `hybrid_search` auto-discovers an index, the SDK now
98
+ compares the index's `numDimensions` and `similarity` against the
99
+ `:vector` property declaration, and — when the class registers an
100
+ `agent_tenant_scope` — confirms the scope field is declared as a
101
+ `type: "filter"` path (without it, every tenant-scoped
102
+ `$vectorSearch.filter` fails Atlas-side). Findings are computed once per
103
+ (class, field, index) per process and governed by
104
+ `Parse::VectorSearch.index_drift_policy`: `:warn` (default) emits a
105
+ `[Parse::VectorSearch:DRIFT]` warning on the first check; `:raise` raises
106
+ `Parse::Core::VectorSearchable::IndexDriftError` on **every** query
107
+ against the drifted index, so strict deployments never serve degraded
108
+ results after the first failure; `:ignore` skips verification. An
109
+ explicit `index:` kwarg is verified best-effort when the catalog's
110
+ covering index carries the same name (lookup failures never fail the
111
+ query).
112
+
113
+ #### Hybrid search hardening
114
+
115
+ - **FIXED**: on the opt-in native `$rankFusion` path, a scoped (non-master)
116
+ caller's `_hybrid_score` is now recomputed from the post-ACL visible
117
+ ordering instead of surfacing the raw fused score. The raw score is
118
+ materialized before the ACL `$match`, so it encoded a surviving row's
119
+ rank among rows the caller cannot read — a cross-tenant/cross-ACL
120
+ inference channel for callers probing with crafted queries. The
121
+ recomputed score is monotone with the true fused order but is a function
122
+ of visible rows only. Master-key results and the default client-side RRF
123
+ path (which ranks from already-filtered rows) are unchanged.
124
+ - **FIXED**: the `$rankFusion` support probe no longer classifies MongoDB
125
+ authorization errors as "stage unsupported". The probe's
126
+ unrecognized-stage matching included the broad phrase "is not allowed",
127
+ which also appears in auth failures ("not allowed to execute command
128
+ aggregate") and could cache the wrong verdict for the probe TTL. Matching
129
+ is narrowed to unambiguous unknown-stage phrases; any other failure is
130
+ treated as supported and the real query surfaces the real error, with
131
+ the client-side path as the standing fallback.
132
+
133
+ #### Retrieval spend-cap and filter hardening
134
+
135
+ - **NEW**: `Parse::Embeddings::SpendCap.configure(..., warn_at: 0.8)` —
136
+ soft-cap alerting. When a charge pushes a tenant's in-window usage across
137
+ the given fraction of its hard limit, a
138
+ `parse.embeddings.spend_cap_warning` ActiveSupport::Notifications event
139
+ is emitted (`tenant_id`, `used`, `limit`, `window`, `warn_at`,
140
+ `threshold`), once per crossing and re-arming as the window rolls off —
141
+ an operator alerting hook that fires BEFORE the hard refuse trips.
142
+ Disabled unless configured. Note the cap deliberately charges before the
143
+ query-embed cache lookup, so cache hits bill at full price: it bounds
144
+ query volume (an abuse control), not just provider spend.
145
+ - **NEW**: `Parse::Embeddings::Cache::MonetaStore` — persistent-L2 adapter
146
+ for the embedding cache. Wraps any Moneta-compatible store (`[]`/`[]=`,
147
+ optional `store(key, value, expires:)`) behind the cache's `get`/`set`
148
+ duck, with key namespacing and TTL forwarding, so
149
+ `Cache.enable!(store: MonetaStore.new(moneta, ttl: 30 * 24 * 3600))`
150
+ shares query-embed entries across processes and restarts. Fail-open: a
151
+ backend error degrades to a cache miss / dropped write, never a failed
152
+ embed. Cache keys are input hashes — plaintext queries never land in the
153
+ shared store.
154
+ - **NEW**: embedding spend-cap coverage on every query-embed path. The
155
+ per-tenant `Parse::Embeddings::SpendCap` was previously charged only at
156
+ the `semantic_search` agent-tool boundary; direct `find_similar(text:)`,
157
+ `hybrid_search(text:)`, and `Parse::Retrieval.retrieve` callers bypassed
158
+ it. The shared query-embed path now charges via
159
+ `SpendCap.charge_query!` — tenant identity resolves to the ambient
160
+ `Parse.with_cache_tenant` scope when set, else the shared default bucket.
161
+ The agent tool wraps its retrieval in the new `SpendCap.with_precharged`
162
+ block so a query it already charged with per-tenant identity is not
163
+ double-billed (and admin-exempt queries are not billed to the shared
164
+ bucket). As before, the cap is a no-op until configured.
165
+ - **NEW**: pointer-value translation for caller-supplied retrieval filters.
166
+ `Parse::Retrieval.retrieve` (and through it the `semantic_search` agent
167
+ tool) now rewrites Parse pointer values — `Parse::Pointer` /
168
+ `Parse::Object` instances and wire-form `{"__type": "Pointer"}` hashes,
169
+ including inside `$in` / `$eq` / `$ne` operator hashes — into their
170
+ MongoDB storage form, so `{ owner: some_user }` becomes
171
+ `{ "_p_owner" => "_User$abc123" }` and actually matches rows. Previously
172
+ a pointer-valued filter silently matched nothing. Translation runs after
173
+ the underscore-key gate and filter-field allowlist (callers still cannot
174
+ name `_p_*` columns directly) and before the tenant-scope fold. The
175
+ standalone helper is `Parse::Retrieval.translate_pointer_filter_values`.
176
+ - **IMPROVED**: `Parse::Schema::SearchIndexMigrator` auto-includes the
177
+ model's registered `agent_tenant_scope` field as a `type: "filter"` path
178
+ when planning or applying `vectorSearch` index declarations. Newly created
179
+ indexes support tenant-scoped pre-filtering out of the box; existing
180
+ indexes missing the path surface as `drifted:` in the plan instead of
181
+ failing at query time.
182
+
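Pointer translation over plain filter hashes can be sketched as below. Storing a pointer as `ClassName$objectId` in a `_p_<field>` column is how Parse Server's MongoDB adapter persists pointers; everything else is an assumption: the helper names are hypothetical, only wire-form `{"__type" => "Pointer"}` hashes are handled (not `Parse::Pointer` or `Parse::Object` instances), and the allowlist and tenant-scope steps the entry describes are omitted.

```ruby
# Hypothetical sketch of pointer-value translation on a plain filter Hash.
# Only wire-form pointer hashes are handled; Parse::Pointer objects are not.
def pointer?(value)
  value.is_a?(Hash) && value["__type"] == "Pointer"
end

# Parse Server's MongoDB adapter stores pointers as "ClassName$objectId".
def to_storage(value)
  pointer?(value) ? [value.fetch("className"), value.fetch("objectId")].join("$") : value
end

def translate_pointer_filter(filter)
  filter.each_with_object({}) do |(field, value), out|
    if pointer?(value)
      out["_p_#{field}"] = to_storage(value)
    elsif value.is_a?(Hash) && value.values.flatten.any? { |v| pointer?(v) }
      # Operator hash such as { "$in" => [ptr, ...] } or { "$ne" => ptr }
      out["_p_#{field}"] = value.transform_values { |v| v.is_a?(Array) ? v.map { |e| to_storage(e) } : to_storage(v) }
    else
      out[field.to_s] = value
    end
  end
end
```

Without the rename to `_p_owner`, a filter on `owner` targets a column that does not exist in storage, which is why untranslated pointer filters matched nothing.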
183
+ #### Opt-in Unicode regex matching for text constraints
184
+
185
+ - **NEW**: `starts_with`, `contains`, `ends_with`, and `like`/`regex` now accept
186
+ an opt-in `{ value:, unicode: true }` form that appends the `u` (Unicode) flag
187
+ to the compiled `$options`, enabling correct multibyte case-insensitive
188
+ matching for accented and non-Latin text (for example `café` matching
189
+ `CAFÉ`, or CJK characters).
190
+
191
+ ```ruby
192
+ Post.where(:title.starts_with => { value: "café", unicode: true })
193
+ # => "title": { "$regex": "^café", "$options": "iu" }
194
+
195
+ Post.where(:title.like => { value: /café/i, unicode: true })
196
+ # => "title": { "$regex": "café", "$options": "iu" }
197
+ ```
198
+
199
+ The flag is strictly opt-in: the bare-value forms
200
+ (`:title.starts_with => "café"`) compile exactly as before with `$options: "i"`,
201
+ so existing queries are unchanged. The `u` flag is honored by Parse Server
202
+ 8.3.0+ over the REST query interface and by MongoDB 6.1+ on the mongo-direct
203
+ query path; older Parse Servers reject it, which is why it is never emitted
204
+ unless requested.
205
+
206
+ #### ACL permission query hardening
207
+
208
+ - **FIXED**: `readable_by`, `writable_by`, `readable_by_role`,
209
+ `writable_by_role`, `publicly_readable`, and `publicly_writable` no longer
210
+ raise a pipeline-security error when they auto-route through the direct
211
+ MongoDB path. These constraints compile to an aggregation `$match` on the
212
+ internal `_rperm` / `_wperm` permission columns, and the internal-fields
213
+ denylist that protects user-supplied pipelines from referencing
214
+ server-internal columns was also rejecting these SDK-generated references.
215
+ The aggregation runner now forwards the `allow_internal_fields` sanction for
216
+ pipelines built entirely from SDK constraint translation — matching the
217
+ parity already held by the `results_direct` / `count_direct` /
218
+ `distinct_direct` helpers — so public-read detection (`publicly_readable`,
219
+ `readable_by("*")`) and role/user permission filtering work again. The
220
+ sanction is scoped to SDK-built ACL pipelines only; caller-supplied
221
+ aggregation pipelines remain subject to the full denylist, so they still
222
+ cannot reference password hashes, session tokens, or other internal columns.
223
+ - **FIXED**: `Query#count` now routes ACL permission filters
224
+ (`publicly_readable.count`, `readable_by(...).count`, and friends) through
225
+ the direct MongoDB path, mirroring `Query#results`. Previously `count` only
226
+ switched to the direct path for subquery `$lookup` stages, so an ACL count
227
+ was sent to Parse Server's REST aggregate endpoint, which cannot express a
228
+ `$match` on `_rperm` / `_wperm`.
229
+ - **FIXED**: the scalar aggregation terminals — `Query#sum`, `#average`,
230
+ `#min`, `#max`, `#distinct`, and `#count_distinct` — now honor ACL
231
+ permission filters and scoped queries. They funnel through `Query#aggregate`,
232
+ which previously only switched to the direct MongoDB path for subquery
233
+ `$lookup` stages. An ACL filter (`readable_by(...).sum(:plays)`) was sent to
234
+ Parse Server's REST aggregate endpoint, which cannot express a `$match` on
235
+ `_rperm` / `_wperm`. More seriously, a **scoped** terminal
236
+ (`scope_to_user(u).sum(:plays)`, `scope_to_role`, or a `session_token`)
237
+ reached the same REST endpoint, which is master-key-only and enforces
238
+ neither ACL nor CLP — so the aggregate ran unscoped as the master key,
239
+ computing the result over rows the caller cannot read. `Query#aggregate` now
240
+ routes to mongo-direct whenever the query is scoped or the pipeline
241
+ references the ACL columns, and **fails closed** (raises
242
+ `Parse::Query::MongoDirectRequired`) for a scoped terminal when mongo-direct
243
+ is unavailable, rather than silently bypassing enforcement. The same
244
+ contract covers the inline-pipeline terminals: a scoped `Query#count` or
245
+ `Query#results` whose constraints compile to an aggregation pipeline
246
+ (e.g. `:field.size`) promotes to mongo-direct and fails closed identically
247
+ instead of falling back to REST `/aggregate`.
248
+ - **FIXED**: `not_publicly_readable` / `not_publicly_writable` (and the
+ `:ACL.not_readable_by` / `:ACL.not_writable_by` constraints) no longer return
+ the rows they are meant to exclude. They compiled to
+ `{ _rperm: { $nin: [...] } }` (or the `_wperm` equivalent), and MongoDB's
+ `$nin` also matches documents where the field is **absent**, while Parse
+ Server treats a missing `_rperm` / `_wperm` as public. A security audit using
+ `not_publicly_writable` to find safe objects therefore silently reported
+ write-exposed (public-by-absence) objects as safe. The constraints now carry
+ an `$exists: true` guard. "Not readable by X" additionally expands the
+ principal's roles and excludes publicly-readable rows (a public row is
+ readable by everyone, so it cannot be "not readable by X").
+ - **FIXED**: `readable_by([])` / `writable_by([])` and the `:none` / `nil`
+ forms no longer raise `ArgumentError`; they now compile to the documented
+ "no permissions" match (an explicit empty `_rperm` / `_wperm`). Symbol
+ principals (`:public`, `:everyone`, `:world`) are accepted and map to the
+ public wildcard, matching the String forms.
+ - **FIXED**: `PrivateAclConstraint` (`:ACL.private_acl` / `master_key_only`)
+ no longer classifies public-by-absence rows as private. A truly master-key-only
+ object has an explicit empty `_rperm` **and** `_wperm`; a missing
+ column is public, the opposite of private, so the missing-field branch was
+ removed. `private_acl => false` is now the exact complement.
+ - **FIXED**: role expansion for `readable_by` / `writable_by` /
+ `readable_by_role` / `writable_by_role` now always includes the role's own
+ name in the permission set. The upward-inheritance walk yields nothing for
+ an unpersisted role (objectId still nil), which previously dropped the role
+ entirely and raised "no valid permissions"; the role's own `role:<name>`
+ entry is now appended idempotently, so persisted roles compile unchanged.
+ - **CHANGED**: a mistyped ACL permission no longer vanishes silently. An
+ unrecognized element in a `readable_by` / `writable_by` array (or an
+ unsupported Symbol) now raises `ArgumentError` instead of being dropped from
+ the permission set, which would silently weaken the intended filter.
+ - **NEW**: `strict:` option on `readable_by` / `writable_by` /
+ `readable_by_role` / `writable_by_role` (and the `:ACL.readable_by_exact` /
+ `writable_by_exact` / `*_by_role_exact` operators) for an **exact** match —
+ only rows whose `_rperm` / `_wperm` literally contains one of the resolved
+ permissions, with no implicit public `"*"` and no missing-field rows. The
+ default remains inclusive (access-simulation) semantics; `strict: true` is
+ the right choice for ownership and security audits.
+ - **NEW**: `Query#not_readable_by` / `#not_writable_by` chained methods, the
+ fluent counterparts to the existing `:ACL.not_readable_by` symbol operators.
+ - **BREAKING**: the alternate spelling `:ACL.writeable_by` now resolves
+ to the same public-inclusive, role-expanding implementation as
+ `:ACL.writable_by`. Previously the one-letter spelling difference selected a
+ separate, strict, non-role-expanding constraint, so `writeable_by` and
+ `writable_by` silently produced different result sets. Code that relied on
+ the old strict behavior of `writeable_by` should pass `strict: true` (or use
+ the `:writable_by_exact` operator).
+
295
+ #### Webhook after_save callback hardening
296
+
297
+ - **FIXED**: the model's chained `after_save` / `after_create` callbacks now
298
+ fire exactly once per `afterSave` delivery, even when an app registers both a
299
+ class-specific handler (`webhook :after_save, MyClass`) and a catch-all
300
+ handler (`webhook :after_save, "*"`). The webhook endpoint dispatches every
301
+ trigger to both the class route and the `"*"` route, and the callback chain
302
+ previously ran inside each route — so an app with both handlers fired its
303
+ model `after_save` twice (e.g. two emails per save). The chain now runs once,
304
+ after both routes are dispatched. The existing behavior is otherwise
305
+ preserved: an `afterSave` for a class with no registered handler never fires
306
+ model callbacks, and trusted Ruby-initiated saves still skip the webhook-side
307
+ callbacks so the local `run_callbacks :save` is the single fire.
308
+ - **FIXED**: a chained `after_save` or `after_create` callback that raises
309
+ during an `afterSave` webhook no longer crashes the webhook endpoint or
310
+ suppresses the other phase's side effects. Because `afterSave` fires after the
311
+ object is already persisted and Parse Server discards the response body, the
312
+ `after_create` and `after_save` phases now run independently and any
313
+ `StandardError` they raise is logged and swallowed (mirroring Parse Server's
314
+ own afterSave semantics). A raising `after_create :send_welcome_email` no
315
+ longer silently skips an unrelated `after_save :reindex`, and an uncaught
316
+ callback error can no longer return a 500 to Parse Server.
317
+ - **FIXED**: `Parse::Webhooks::Payload#ruby_initiated?` now memoizes a `false`
318
+ result stably instead of re-deriving it on every call. The prior `||=`
319
+ memoization recomputed whenever the cached value was `false`, so a stamped
320
+ `false` could be re-derived inconsistently; the detection result is now cached
321
+ exactly once.
322
+
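Isolating the two afterSave phases, as the second webhook entry describes, can be sketched with a per-phase `rescue`. `run_after_save_phases` is a hypothetical helper; it accepts any logger that responds to `#error` (the stdlib `Logger` qualifies), and the returned hash is a placeholder for whatever body the endpoint sends.

```ruby
# Hypothetical sketch: run each afterSave phase in isolation so one raising
# callback neither skips the other phase nor escapes into the webhook response.
def run_after_save_phases(phases, logger:)
  phases.each do |name, callback|
    callback.call
  rescue StandardError => e
    logger.error("#{name} raised #{e.class}: #{e.message}")
  end
  { "success" => true } # placeholder body; Parse Server discards it for afterSave
end
```

Swallowing is acceptable here only because the object is already persisted when afterSave runs; the same pattern would be wrong for a `beforeSave` hook, where an error is supposed to abort the write.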
323
+ #### `verify_password` client-side rate-limit parity
324
+
325
+ - **CHANGED**: `verify_password` now participates in the same client-side login
326
+ rate-limit as `login`. It calls the rate-limit guard before issuing the
327
+ request and records the result afterward, keyed on the bare username so
328
+ failures share a bucket with `login` — an attacker cannot sidestep a `login`
329
+ lockout by pivoting to the `verify_password` credential oracle. Because the
330
+ bucket is shared, a run of failed step-up / re-authentication calls counts
331
+ toward (and can trigger) the primary login lockout for that username. As with
332
+ `login`, this is a convenience guard, not a security boundary — server-side
333
+ rate limiting remains the real control.
334
+
335
+ #### Cloud function results are server-authoritative
336
+
337
+ - **IMPROVED**: Documented that decoded cloud function results are treated as
338
+ server-authoritative. A cloud function that returns a Parse object decodes
339
+ through the same trusted path as every query and `fetch` result, so
340
+ server-set fields on the returned object (including `sessionToken` on a
341
+ returned user) are preserved rather than stripped — consistent with how the
342
+ rest of the SDK hydrates server responses. If a cloud function is expected to
343
+ echo back third-party-influenced data that you want to sanitize yourself,
344
+ call it with `raw: true` (`Parse.call_function(name, body, raw: true)`) to
345
+ receive the undecoded response before any object is built.
346
+
  ### 5.4.1
 
  #### Webhook after_save callback fix
data/Gemfile.lock CHANGED
@@ -1,7 +1,7 @@
  PATH
  remote: .
  specs:
- parse-stack-next (5.4.1)
+ parse-stack-next (5.5.0)
  activemodel (>= 6.1, < 9)
  activesupport (>= 6.1, < 9)
  connection_pool (>= 2.2, < 4)
data/README.md CHANGED
@@ -4,6 +4,13 @@
 
  A full-featured Ruby client SDK for [Parse Server](http://parseplatform.org/). [parse-stack-next](https://github.com/neurosynq/parse-stack-next) is a Ruby client SDK, REST client, and Active Model ORM for [Parse Server](http://parseplatform.org/), combining a low-level API client, a query engine, an object-relational mapper (ORM), and a Cloud Code Webhooks rack application in a single gem.
 
+ ### What's new in 5.5
+
+ - **5.5.0 — Multimodal bytes-fetch with magic-byte MIME verification** — `embed_image ..., source: :bytes` has the SDK download an image itself through the `Parse::File.safe_open_url` SSRF primitive, verify the content by **magic-byte sniff** (the `Content-Type` header is never consulted — a `.jpg` URL serving HTML is refused), cross-check the URL extension, enforce a `Parse::Embeddings.allowed_image_types` allowlist, strip EXIF/XMP metadata **by default** (JPEG APP1, PNG `eXIf`, WebP `EXIF`/`XMP ` chunks; opt out with `exif_strip: false`), and forward the verified bytes to Voyage/Cohere as a base64 data URI. No provider-side URL fetch occurs, so the `trust_provider_url_fetch` sentinel is not required — the host allowlist still applies. See [CHANGELOG.md](./CHANGELOG.md)
+ - **5.5.0 — Embedding-model migration tooling** — `Class.reembed!(only_stale: true)` bulk re-embeds rows through the current provider/model (resumable; skips rows already current), driven by the new auto-declared `<into>_meta` provenance sibling (`{provider, model, dimensions, modality, embedded_at}`, stamped on every recompute). `Parse::Embeddings::BatchEmbedder` adds batch-level requests-per-minute pacing and exponential backoff for bulk jobs; `Parse::Embeddings::Cache.enable!` adds an opt-in query-embed cache keyed by `(provider, model, dimensions, input_type, input-hash)` so repeated identical queries skip the provider round-trip. See [CHANGELOG.md](./CHANGELOG.md)
+ - **5.5.0 — Vector index drift detection** — on first auto-discovered use of an Atlas vectorSearch index, the SDK verifies the deployed index's `numDimensions`/`similarity` against the `:vector` property declaration and confirms a registered `agent_tenant_scope` field is covered as a `type: "filter"` path. Policy via `Parse::VectorSearch.index_drift_policy` (`:warn` default / `:raise` / `:ignore`). `Parse::Schema::SearchIndexMigrator` now auto-includes the tenant-scope field in `vectorSearch` declarations, so newly created indexes support tenant-scoped pre-filtering out of the box. See [CHANGELOG.md](./CHANGELOG.md)
+ - **5.5.0 — Retrieval spend-cap and filter hardening** — the per-tenant embedding spend cap now covers every query-embed path (`find_similar(text:)`, `hybrid_search(text:)`, `Parse::Retrieval.retrieve`), not just the `semantic_search` agent tool; tenant identity resolves through the ambient `Parse.with_cache_tenant` scope. Caller-supplied retrieval filters now translate Parse pointer values to storage form (`{ owner: user }` → `{ "_p_owner" => "_User$id" }`), so pointer filters match rows instead of silently matching nothing. See [CHANGELOG.md](./CHANGELOG.md)
+
  ### What's new in 5.4
 
  - **5.4.0 — Hybrid search + reranking for RAG** — `Class.hybrid_search(text:, lexical:, vector:, k:, fusion:)` fuses a lexical Atlas Search branch with a `$vectorSearch` branch using reciprocal-rank fusion (RRF): lexical search nails exact tokens (codes, proper nouns), vector search nails paraphrase, and fusing the two beats either alone. Each branch enforces ACL/CLP independently before fusion (no separate hydration fetch to secure); results carry `#hybrid_score` / `#hybrid_ranks`. `Parse::VectorSearch::Hybrid.rank_fusion_supported?` detects Atlas 8.0+ native `$rankFusion` by a cached behavioural probe (native execution is opt-in; client-side RRF is the always-enforced default). `Parse::Retrieval::Reranker` adds cross-encoder reranking (`Reranker::Cohere` over `/v2/rerank`, plus a deterministic `Reranker::Fixture`), wired into `Parse::Retrieval.retrieve(hybrid:, rerank:)`. `Parse::Embeddings::SpendCap` adds an opt-in per-tenant embedding token cap (hard-refuse) at the `semantic_search` agent-tool boundary. See [CHANGELOG.md](./CHANGELOG.md) and [`docs/atlas_vector_search_guide.md`](./docs/atlas_vector_search_guide.md)
@@ -38,7 +45,7 @@ See [CHANGELOG.md](./CHANGELOG.md) for the full 5.2 entry.
  - **`Parse::File` URL normalization + presigned-URL stash** — `Parse::File#url=` and `attributes=` now strip signed-URL query parameters (`X-Amz-Signature`, `AWSAccessKeyId`, `Key-Pair-Id`, etc.) before storage; the bare canonical URL lands in `@url`, and the original signed URL is stashed in `file.presigned_url` with a data-driven expiry in `file.presigned_url_expires_at`. New `file.presigned_url_valid?(buffer: 60)` predicate, configurable `Parse::File.signed_url_policy = :strip | :raise`, and `Parse::File.log_filter` / `log_filter_strict` regexes for `lograge` / Sentry / Honeybadger scrubbers. `Parse::File#inspect` no longer emits the URL — see CHANGELOG for the error-reporter payload migration callout
  - **`Parse::Lock` — public TTL-bounded mutual-exclusion primitive** — `Parse::Lock.acquire(key, ttl:, wait:) { … }` exposes the Redis-backed lock previously hidden inside `first_or_create!` as a first-class API. In-process `Mutex` fallback for memory-backed caches, fails closed on backend errors, HMAC-keyed via `PARSE_STACK_LOCK_SECRET`, namespace-separated from `first_or_create!` so the two cannot collide
  - **LiveQuery ergonomics** — autoloaded (no explicit `require 'parse/live_query'`); connections are **ACL-scoped by default** (build an admin, ACL-bypassing connection explicitly with `Parse::LiveQuery::Client.new(use_master_key: true)` — master-key authorization is per-connection, not per-subscription); `Query#subscribe` / `Klass.subscribe` accept a block yielded the `Subscription` *before* the subscribe frame is sent so `sub.on(:create) { … }` callbacks are wired before any server event can arrive; `Parse::LiveQuery.run_until_signal!(client:) { … }` is a signal-safe shutdown helper for long-running consumers
- - **Image embeddings** — new `embed_image` class macro for `:file`-typed source properties plus `Voyage#embed_image` (`voyage-multimodal-3`, 1024-dim) and `Cohere#embed_image` (`embed-v4.0`, 1536-dim). URL-only routing in v5.1 (bytes-fetch with MIME-sniff lands later); operator-gated via the `Parse::Embeddings.trust_provider_url_fetch = "PROVIDER_EGRESS_VERIFIED"` sentinel plus a `Parse::Embeddings.allowed_image_hosts` CDN allowlist
+ - **Image embeddings** — new `embed_image` class macro for `:file`-typed source properties plus `Voyage#embed_image` (`voyage-multimodal-3`, 1024-dim) and `Cohere#embed_image` (`embed-v4.0`, 1536-dim). URL-only routing in v5.1 (the bytes-fetch path with MIME-sniff shipped in v5.5 as `source: :bytes`); operator-gated via the `Parse::Embeddings.trust_provider_url_fetch = "PROVIDER_EGRESS_VERIFIED"` sentinel plus a `Parse::Embeddings.allowed_image_hosts` CDN allowlist
  - **Tenant-aware cache namespacing** — `Parse.with_cache_tenant(scope) { … }` composes the tenant into the response-cache key as `<base>:T:<tenant>:…` so a multi-tenant app sharing one Redis gets per-tenant key isolation and per-tenant SCAN-delete eviction without per-tenant `Parse::Client.new` plumbing. Fiber-local, restored on block exit, AS::N payloads carry `:cache_tenant`
  - **`_User` field-visibility DSL** — `Parse::User.master_only_fields(*fields)` and `Parse::User.self_visible_fields(*fields, via: :self)` declare admin-only and owner-only field protections on `_User`. Requires Parse Server's `protectedFieldsOwnerExempt: false` server option (the SDK emits a one-time advisory at class declaration so the dependency is surfaced before deploy). Parse Server's default for this option is changing to `false` in a future version; until your server adopts that default, set it explicitly
  - **`Parse::Installation` `belongs_to :user`** — read `installation.user` to find which user a device is currently signed in as. Symmetric `Parse::User#has_many :installations` for targeted-push grouping (master-key-only by Parse Server design; see the YARD for the owner-identity caveat)
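The tenant-aware cache bullet can be illustrated with a small pure-Ruby model. This is a sketch under stated assumptions, not the SDK's code: `TenantCacheKeys`, its methods, and the untenanted key layout are hypothetical, and the elided tail of the key (`…`) is stood in for by a caller-supplied suffix. It shows the three properties the changelog names: a fiber-local scope, restoration on block exit, and a per-tenant key prefix that a SCAN-delete can target.

```ruby
# Hypothetical model of tenant-scoped cache keys; NOT parse-stack-next's implementation.
module TenantCacheKeys
  STORAGE_KEY = :hypothetical_cache_tenant

  # Runs the block with `tenant` as the active scope. Thread#[] storage is
  # fiber-local in Ruby, and `ensure` restores the previous scope on exit,
  # including when the block raises. Returns the block's value.
  def self.with_cache_tenant(tenant)
    previous = Thread.current[STORAGE_KEY]
    Thread.current[STORAGE_KEY] = tenant
    yield
  ensure
    Thread.current[STORAGE_KEY] = previous
  end

  # Assumed layout: "<base>:T:<tenant>:<suffix>" inside a scope,
  # "<base>:<suffix>" outside one.
  def self.cache_key(base, suffix)
    tenant = Thread.current[STORAGE_KEY]
    tenant ? "#{base}:T:#{tenant}:#{suffix}" : "#{base}:#{suffix}"
  end

  # Glob for per-tenant SCAN-delete eviction (e.g. a Redis SCAN MATCH pattern).
  def self.eviction_pattern(base, tenant)
    "#{base}:T:#{tenant}:*"
  end
end

TenantCacheKeys.cache_key("parse:cache", "Song:q1")
# => "parse:cache:Song:q1"
TenantCacheKeys.with_cache_tenant("acme") { TenantCacheKeys.cache_key("parse:cache", "Song:q1") }
# => "parse:cache:T:acme:Song:q1"
```

Restoring the previous value in `ensure` (rather than clearing it) is what lets nested tenant blocks unwind correctly.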
@@ -64,6 +71,16 @@ See [CHANGELOG.md](./CHANGELOG.md) for the full 5.0 entry, including security-ha
 
  ### Core capabilities
 
+ > **Vector search requires MongoDB Atlas (or Atlas Local).** The `:vector`
+ > property, `find_similar`, `hybrid_search`, and `Parse::Retrieval` all
+ > execute Atlas `$vectorSearch` / `$search` aggregation stages, which exist
+ > only on Atlas clusters and the Atlas Local container — community/self-hosted
+ > MongoDB is not supported and there is no in-process fallback (a pure-Ruby
+ > cosine scan over a real collection is a silent performance cliff, so the
+ > SDK refuses rather than degrades). This is a closed design decision.
+ > Everything else in this list works against any MongoDB that Parse Server
+ > supports.
+
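For orientation, here is a sketch of the kind of Atlas-only stage the vector features depend on. The field names (`index`, `path`, `queryVector`, `numCandidates`, `limit`) and the `vectorSearchScore` meta field follow MongoDB's published `$vectorSearch` stage; the helper name, the index name, and the `embedding` path are hypothetical, and this is not the pipeline the SDK emits.

```ruby
# Hypothetical helper: builds an Atlas $vectorSearch stage as a plain Hash.
# Only Atlas / Atlas Local understand this stage; community MongoDB rejects it.
def hypothetical_vector_search_stage(query_vector, index:, path:, limit: 10, candidates_per_result: 10)
  unless !query_vector.empty? && query_vector.all?(Numeric)
    raise ArgumentError, "query_vector must be a non-empty array of numerics"
  end

  {
    "$vectorSearch" => {
      "index" => index,
      "path" => path,
      "queryVector" => query_vector.map(&:to_f),
      # Size of the approximate-nearest-neighbour candidate pool; must be >= limit.
      "numCandidates" => limit * candidates_per_result,
      "limit" => limit
    }
  }
end

pipeline = [
  hypothetical_vector_search_stage([0.12, -0.4, 0.33], index: "song_embeddings", path: "embedding", limit: 5),
  { "$project" => { "title" => 1, "score" => { "$meta" => "vectorSearchScore" } } }
]
```

Sending `pipeline` to a non-Atlas server fails at the `$vectorSearch` stage itself, which is why the SDK refuses up front rather than falling back to a scan.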
  - MongoDB Aggregation Framework support
  - **MongoDB Atlas Search** — full-text search, autocomplete, faceted search with direct MongoDB access
  - **Direct MongoDB Queries** — bypass Parse Server's REST surface for high-performance reads, with SDK-side ACL/CLP/`protectedFields` enforcement for scoped agents
@@ -5533,13 +5550,24 @@ pipeline = [
 
  Filter objects by ACL permissions using MongoDB's `_rperm` and `_wperm` fields:
 
- **`readable_by` / `writable_by`** - Exact permission strings:
+ **`readable_by` / `writable_by`** - filter by principal:
  ```ruby
  Song.query.readable_by("user123").results(mongo_direct: true) # User ID
  Song.query.readable_by("role:Admin").results(mongo_direct: true) # Role (explicit prefix)
- Song.query.readable_by(current_user).results(mongo_direct: true) # User object
- Song.query.readable_by("public").results(mongo_direct: true) # Public access (alias for "*")
- Song.query.readable_by("none").results(mongo_direct: true) # Empty _rperm (master key only)
+ Song.query.readable_by(current_user).results(mongo_direct: true) # User object (roles expanded)
+ Song.query.readable_by(:public).results(mongo_direct: true) # Public access (maps to "*")
+ Song.query.readable_by([]).results(mongo_direct: true) # No read perms (empty _rperm)
+ ```
+
+ By default the match is **inclusive** — it ALSO returns publicly-readable rows
+ (`_rperm` contains `"*"`) and rows with a missing `_rperm` (public by absence),
+ because those are genuinely readable by the principal (access-simulation
+ semantics). For an **exact** match — only rows whose `_rperm` literally grants
+ the principal, with no public/missing rows — pass `strict: true`. This is what
+ an ownership or security audit wants:
+
+ ```ruby
+ Song.query.readable_by("role:Admin", strict: true).results # ONLY rows that explicitly grant Admin
  ```
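The inclusive/strict distinction can be modeled over plain `_rperm` arrays. This is a hypothetical in-memory sketch of the semantics described above (rows as hashes, the principal as a list of permission strings), not the SDK's query compiler; `readable_by?` is an illustrative name, and role inheritance is assumed to be already expanded into `principals`.

```ruby
# Hypothetical model of readable_by semantics over in-memory rows.
# A row's "_rperm" is an Array of permission strings, or absent (nil).
def readable_by?(row, principals, strict: false)
  rperm = row["_rperm"]
  if strict
    # Exact match: the row must literally grant one of the principals.
    !rperm.nil? && (rperm & principals).any?
  else
    # Inclusive: a missing _rperm (public by absence) and "*" both count as readable.
    rperm.nil? || rperm.include?("*") || (rperm & principals).any?
  end
end

rows = [
  { "id" => "a", "_rperm" => ["role:Admin"] },
  { "id" => "b", "_rperm" => ["*"] },
  { "id" => "c" },                           # no _rperm at all
  { "id" => "d", "_rperm" => ["user123"] },
  { "id" => "e", "_rperm" => [] }            # master-key only
]
admin = ["role:Admin"]

rows.select { |r| readable_by?(r, admin) }.map { |r| r["id"] }               # => ["a", "b", "c"]
rows.select { |r| readable_by?(r, admin, strict: true) }.map { |r| r["id"] } # => ["a"]
```

The inclusive result answers "what can Admin actually read?"; the strict result answers "which rows explicitly grant Admin?", which is the audit question.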
 
  **`readable_by_role` / `writable_by_role`** - Adds "role:" prefix automatically:
@@ -5549,7 +5577,18 @@ Song.query.readable_by_role(admin_role).results(mongo_direct: true) #
  Song.query.writable_by_role(["Admin", "Editor"]).results(mongo_direct: true) # Multiple roles
  ```
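The automatic prefixing can be sketched as a normalization step. `hypothetical_role_permissions` is illustrative only: it accepts role names, role-like objects responding to `name`, or an array of either. Leaving an already-prefixed `"role:Admin"` string untouched is an assumption about edge-case behavior, not documented SDK behavior.

```ruby
# Hypothetical normalizer: turns role names or role-like objects into
# "role:<name>" permission strings, as readable_by_role is described to do.
def hypothetical_role_permissions(roles)
  list = roles.is_a?(Array) ? roles : [roles]
  list.map do |role|
    name = role.respond_to?(:name) ? role.name.to_s : role.to_s
    # Assumption: an explicit "role:" prefix is not doubled.
    name.start_with?("role:") ? name : "role:#{name}"
  end
end

RoleLike = Struct.new(:name) # hypothetical stand-in for a role object

hypothetical_role_permissions(["Admin", "Editor"]) # => ["role:Admin", "role:Editor"]
hypothetical_role_permissions(RoleLike.new("Ops"))  # => ["role:Ops"]
```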
 
- **Note:** Requires the `mongo` gem. Add `gem 'mongo'` to your Gemfile.
+ **Convenience and negation:** `publicly_readable` / `publicly_writable`,
+ `privately_readable` / `private_acl` (master-key-only), `not_readable_by` /
+ `not_writable_by`, and `not_publicly_readable` / `not_publicly_writable`.
+ "Not readable by X" excludes rows readable by X directly, via any role X
+ inherits, or publicly.
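One way to read the negation rule is as the complement of inclusive readability after role expansion. The sketch assumes a hypothetical inheritance map (role name to the role names whose access it also receives) and plain `_rperm` arrays; `expand_principals`, `readable_inclusive?`, and `not_readable_by` are illustrative names that model the stated semantics, not the SDK's pipeline. Treating a missing `_rperm` as public follows the inclusive rule described earlier.

```ruby
require "set"

# Hypothetical: expand a user's direct roles transitively through an
# inheritance map, returning the user id plus "role:<name>" strings.
def expand_principals(user_id, direct_roles, inherits)
  seen = Set.new
  queue = direct_roles.dup
  until queue.empty?
    role = queue.shift
    next unless seen.add?(role) # add? returns nil for already-visited roles (handles cycles)
    queue.concat(inherits.fetch(role, []))
  end
  [user_id] + seen.map { |r| "role:#{r}" }
end

# Readable directly, via an inherited role, publicly, or by absent _rperm.
def readable_inclusive?(row, principals)
  rperm = row["_rperm"]
  rperm.nil? || rperm.include?("*") || (rperm & principals).any?
end

def not_readable_by(rows, principals)
  rows.reject { |row| readable_inclusive?(row, principals) }
end

inherits = { "Editor" => ["Viewer"] }
principals = expand_principals("user123", ["Editor"], inherits)
# => ["user123", "role:Editor", "role:Viewer"]
rows = [
  { "id" => "own",    "_rperm" => ["user123"] },
  { "id" => "viewer", "_rperm" => ["role:Viewer"] },
  { "id" => "public", "_rperm" => ["*"] },
  { "id" => "absent" },
  { "id" => "other",  "_rperm" => ["user999", "role:Admin"] }
]
not_readable_by(rows, principals).map { |r| r["id"] } # => ["other"]
```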
+
+ **Note:** These constraints compile to an aggregation `$match` on the internal
+ `_rperm` / `_wperm` columns, so they auto-route to the direct-MongoDB path
+ (requires the `mongo` gem and `Parse::MongoDB.configure(...)`). For a scoped
+ query (`scope_to_user` / `scope_to_role` / `session_token`) the SDK enforces
+ ACL/CLP on that path; a scoped aggregate fails closed if mongo-direct is not
+ configured rather than running unscoped.
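The routing note implies two behaviors worth seeing side by side: the constraint becomes a `$match` on `_rperm`, and a scoped aggregate refuses to run without mongo-direct. Both are modeled below under stated assumptions. `$or`, `$in`, and `$exists` are standard MongoDB query operators (`$in` against an array field matches when any element is in the list), but `hypothetical_rperm_match`, `HypotheticalScopeError`, and `hypothetical_run_scoped` are illustrative names, and the stage shape is not the pipeline the SDK actually emits.

```ruby
# Hypothetical: the kind of $match an inclusive or strict readable_by could compile to.
def hypothetical_rperm_match(principals, strict: false)
  if strict
    # Exact grant only: no "*" rows and no missing-_rperm rows.
    { "$match" => { "_rperm" => { "$in" => principals } } }
  else
    {
      "$match" => {
        "$or" => [
          { "_rperm" => { "$in" => principals + ["*"] } }, # direct grant or public
          { "_rperm" => { "$exists" => false } }            # public by absence
        ]
      }
    }
  end
end

class HypotheticalScopeError < StandardError; end

# Hypothetical fail-closed gate for SCOPED aggregates only: without a configured
# mongo-direct path it raises instead of running the pipeline unscoped.
# (Unscoped handling is not modeled here.)
def hypothetical_run_scoped(pipeline, mongo_direct_configured:)
  unless mongo_direct_configured
    raise HypotheticalScopeError, "scoped aggregate requires mongo-direct; refusing to run unscoped"
  end
  pipeline # a real implementation would execute the pipeline with ACL/CLP enforcement here
end

stage = hypothetical_rperm_match(["role:Admin"], strict: true)
hypothetical_run_scoped([stage], mongo_direct_configured: true)
```

Raising is the safer failure mode here: silently dropping the scope would return rows the principal cannot read.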
 
  ### ACL Dirty Tracking