rails-paradedb 0.2.0 → 0.3.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: be1908cb2b7b8e9062ac3a567ac2cb27ee4e90a62e5137ff09aa00345e805576
4
- data.tar.gz: 0ee0f049473df3ab660b86a84f494885fc95f1021c7ab258c57d9278083f993e
3
+ metadata.gz: f119945534d0ec4f358a9e05d85529d645dbd71e249a0456e65bbd70a7d63135
4
+ data.tar.gz: 7a9f56fe0eb2a0ea452697f90beeca5a58ca55b996c199fc08babbb14f10eb0c
5
5
  SHA512:
6
- metadata.gz: 9038e6a1fa469c4e0de2d0ba10949074d5c995943d655216626590bca0f912a9e9381f5054ae5f647df7d699cef56586b1491e837d390908038c3a7f2142315b
7
- data.tar.gz: 8fbbec4ccec76d6c1591e773dc72e06975b5b176ec7cc91ffcfe09200ccad9b892ecf2ab8c616e95190b180eed31d03df54e62a7b5d90d30b43df070b7eee48c
6
+ metadata.gz: 1f98449702f8795645745b81bc344a1264946f8928e7bef804295a3bd15e468677414c85b266b2983ceb46aa558d9494f7933e0f770757cb0e31ff1d5fa39567
7
+ data.tar.gz: 3c2e6e60585532b13f3f810160ea77684274a1ba9ad7d7e878d037602bdb784edb2dbe773b66a87e21651671be121c3ba948fd7be38ee4e341c13ec151b7d1bd
data/CHANGELOG.md CHANGED
@@ -4,6 +4,19 @@ All notable changes to this project will be documented in this file. The format
4
4
 
5
5
  ## [Unreleased]
6
6
 
7
+ ## [0.3.0] - 2026-03-23
8
+
9
+ ### Removed
10
+
11
+ - **BREAKING**: Removed `has_paradedb_index` class attribute. It had no
12
+ effect on library behavior. Remove `self.has_paradedb_index = true`
13
+ from your models.
14
+
15
+ ### Changed
16
+
17
+ - **BREAKING**: `near` now accepts a chainable `ParadeDB.proximity(...).within(...)`
18
+ clause to support the full proximity API
19
+
7
20
  ## [0.2.0] - 2026-03-13
8
21
 
9
22
  ### Added
@@ -90,6 +103,7 @@ All notable changes to this project will be documented in this file. The format
90
103
  - Schema dump/load round-trip for tokenizer configuration and index options
91
104
  (including `target_segment_count`)
92
105
 
93
- [Unreleased]: https://github.com/paradedb/rails-paradedb/compare/v0.2.0...HEAD
106
+ [Unreleased]: https://github.com/paradedb/rails-paradedb/compare/v0.3.0...HEAD
107
+ [0.3.0]: https://github.com/paradedb/rails-paradedb/releases/tag/v0.3.0
94
108
  [0.2.0]: https://github.com/paradedb/rails-paradedb/releases/tag/v0.2.0
95
109
  [0.1.0]: https://github.com/paradedb/rails-paradedb/releases/tag/v0.1.0
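The 0.3.0 "Removed" entry asks users to delete `self.has_paradedb_index = true` from their models. Because the class attribute is no longer defined, a leftover assignment would most likely raise `NoMethodError` when the model loads. Below is a minimal sketch of a scan for leftover assignments. The helper name `stale_index_flag_lines` and the sample model source are hypothetical and are not part of rails-paradedb.

```ruby
# Hypothetical helper (not part of rails-paradedb): report the 1-based line
# numbers where a model still assigns the removed class attribute.
STALE_FLAG = /^\s*self\.has_paradedb_index\s*=/

def stale_index_flag_lines(source)
  source.each_line.with_index(1).filter_map do |line, number|
    number if line.match?(STALE_FLAG)
  end
end

# Sample input modeled on the 0.2.0 README snippet.
MODEL_SOURCE = <<~RUBY
  class MockItem < ActiveRecord::Base
    include ParadeDB::Model

    self.table_name = "mock_items"
    self.has_paradedb_index = true
  end
RUBY

puts "stale assignments on lines: #{stale_index_flag_lines(MODEL_SOURCE).inspect}"
# prints: stale assignments on lines: [5]
```

To check a real application, the same helper could be run over each file returned by `Dir.glob("app/models/**/*.rb")`, assuming the conventional Rails model directory.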
data/README.md CHANGED
@@ -1,18 +1,61 @@
1
+ <!-- ParadeDB: Postgres for Search and Analytics -->
2
+ <h1 align="center">
3
+ <a href="https://paradedb.com"><img src="https://github.com/paradedb/paradedb/raw/main/docs/logo/readme.svg" alt="ParadeDB"></a>
4
+ <br>
5
+ </h1>
6
+
7
+ <p align="center">
8
+ <b>Simple, Elastic-quality search for Postgres</b><br/>
9
+ </p>
10
+
11
+ <h3 align="center">
12
+ <a href="https://paradedb.com">Website</a> &bull;
13
+ <a href="https://docs.paradedb.com">Docs</a> &bull;
14
+ <a href="https://paradedb.com/slack/">Community</a> &bull;
15
+ <a href="https://paradedb.com/blog/">Blog</a> &bull;
16
+ <a href="https://docs.paradedb.com/changelog/">Changelog</a>
17
+ </h3>
18
+
19
+ ---
20
+
1
21
  # rails-paradedb
2
22
 
3
23
  [![Gem Version](https://img.shields.io/gem/v/rails-paradedb)](https://rubygems.org/gems/rails-paradedb)
4
- [![CI](https://github.com/paradedb/rails-paradedb/actions/workflows/ci.yml/badge.svg)](https://github.com/paradedb/rails-paradedb/actions/workflows/ci.yml)
24
+ [![Ruby Requirement](https://img.shields.io/badge/dynamic/json?url=https%3A%2F%2Frubygems.org%2Fapi%2Fv1%2Fversions%2Frails-paradedb.json&query=%24%5B0%5D.ruby_version&label=ruby&logo=ruby)](https://rubygems.org/gems/rails-paradedb)
25
+ [![Gem Downloads](https://img.shields.io/gem/dt/rails-paradedb)](https://rubygems.org/gems/rails-paradedb)
26
+ [![Codecov](https://codecov.io/gh/paradedb/rails-paradedb/graph/badge.svg)](https://codecov.io/gh/paradedb/rails-paradedb)
5
27
  [![License](https://img.shields.io/github/license/paradedb/rails-paradedb?color=blue)](https://github.com/paradedb/rails-paradedb?tab=MIT-1-ov-file#readme)
28
+ [![Slack URL](https://img.shields.io/badge/Join%20Slack-purple?logo=slack&link=https%3A%2F%2Fparadedb.com%2Fslack)](https://paradedb.com/slack)
29
+ [![X URL](https://img.shields.io/twitter/url?url=https%3A%2F%2Ftwitter.com%2Fparadedb&label=Follow%20%40paradedb)](https://x.com/paradedb)
30
+
31
+ The official Ruby client for [ParadeDB](https://paradedb.com), built for ActiveRecord.
32
+ Use Elastic-quality full-text search, scoring, snippets, facets, and aggregations directly from Rails.
33
+
34
+ ## Features
6
35
 
7
- ActiveRecord integration for [ParadeDB](https://paradedb.com): BM25 full-text search, scoring, snippets, facets, and aggregations in PostgreSQL.
36
+ - BM25 index management in Rails migrations (`create_paradedb_index`, `remove_bm25_index`, `reindex_bm25`)
37
+ - Chainable ActiveRecord search API (`matching_all`, `matching_any`, `term`, `phrase`, `regex`, `near`, `parse`, and more)
38
+ - Relevance and highlighting (`with_score`, `with_snippet`, `with_snippets`, `with_snippet_positions`)
39
+ - Facets and aggregations (`with_facets`, `facets`, `with_agg`, `facets_agg`, `aggregate_by`)
40
+ - More Like This similarity search (`more_like_this`)
41
+ - Arel integration for advanced query composition with native ParadeDB operators
42
+ - Diagnostics helpers and rake tasks for index health and verification
43
+ - Optional runtime index validation to detect missing/drifted BM25 indexes
8
44
 
9
- ParadeDB docs: <https://docs.paradedb.com>
45
+ ## Requirements & Compatibility
10
46
 
11
- ## Requirements
47
+ | Component | Supported |
48
+ | ---------- | ------------------------------------------------ |
49
+ | Ruby | 3.2+ |
50
+ | Rails | 7.2+ |
51
+ | ParadeDB | 0.22.0+ |
52
+ | PostgreSQL | 15+ (PostgreSQL adapter with ParadeDB extension) |
12
53
 
13
- - Ruby 3.2+
14
- - Rails 7.2+
15
- - PostgreSQL 17+ with `pg_search` (ParadeDB)
54
+ Notes:
55
+
56
+ - CI runs Ruby `3.2` through `4.0` across Rails `7.2` and `8.1` on PostgreSQL `18`.
57
+ - Schema compatibility is checked against every ParadeDB release.
58
+ - The maintained minimum ParadeDB version is `0.22.0`; update `README.md`, `RELEASE.md`, and CI in the same PR whenever that floor changes.
16
59
 
17
60
  ## Installation
18
61
 
@@ -26,25 +69,24 @@ bundle install
26
69
 
27
70
  ## Quick Start
28
71
 
72
+ ### Prerequisites
73
+
74
+ Make sure your Rails app uses PostgreSQL and that `pg_search` is installed in the target database:
75
+
76
+ ```sql
77
+ CREATE EXTENSION IF NOT EXISTS pg_search;
78
+ ```
79
+
80
+ ### 1. Define Your Model and Index
81
+
29
82
  ```ruby
30
83
  class MockItem < ActiveRecord::Base
31
84
  include ParadeDB::Model
32
85
 
33
86
  self.table_name = "mock_items"
34
87
  self.primary_key = "id"
35
- self.has_paradedb_index = true
36
88
  end
37
- ```
38
89
 
39
- ```ruby
40
- MockItem.search(:description).matching_all("running shoes")
41
- MockItem.search(:description).matching_any("wireless", "bluetooth")
42
- MockItem.search(:description).term("electronics")
43
- ```
44
-
45
- ## Index Definition
46
-
47
- ```ruby
48
90
  class MockItemIndex < ParadeDB::Index
49
91
  self.table_name = :mock_items
50
92
  self.key_field = :id
@@ -62,14 +104,10 @@ class MockItemIndex < ParadeDB::Index
62
104
  end
63
105
  ```
64
106
 
65
- For text or JSON fields you plan to use in Top K queries, facets, grouped
66
- aggregations, or `top_hits` docvalue fields, use `:literal` or
67
- `:literal_normalized`.
68
-
69
- Create in migration:
107
+ ### 2. Create the BM25 Index in a Migration
70
108
 
71
109
  ```ruby
72
- class AddMockItemBm25Index < ActiveRecord::Migration[8.1]
110
+ class AddMockItemBm25Index < ActiveRecord::Migration[7.2] # use your app's migration version
73
111
  def up
74
112
  create_paradedb_index(MockItemIndex, if_not_exists: true)
75
113
  end
@@ -80,10 +118,18 @@ class AddMockItemBm25Index < ActiveRecord::Migration[8.1]
80
118
  end
81
119
  ```
82
120
 
121
+ ### 3. Search
122
+
123
+ ```ruby
124
+ MockItem.search(:description).matching_all("running shoes")
125
+ MockItem.search(:description).matching_any("wireless", "bluetooth")
126
+ MockItem.search(:description).term("electronics")
127
+ ```
128
+
83
129
  ## Query API
84
130
 
85
131
  ```ruby
86
- # Full-text
132
+ # Full text
87
133
  MockItem.search(:description).matching_all("running shoes")
88
134
  MockItem.search(:description).matching_any("wireless bluetooth")
89
135
 
@@ -92,6 +138,7 @@ MockItem.search(:description).matching_any("running shoes", tokenizer: "whitespa
92
138
  MockItem.search(:description).matching_any("running shoes", tokenizer: "whitespace('lowercase=false')")
93
139
 
94
140
  # Fuzzy options on match/term
141
+ # Note: tokenizer overrides are mutually exclusive with fuzzy options.
95
142
  MockItem.search(:description).matching_any("runing shose", distance: 1)
96
143
  MockItem.search(:description).matching_all("runing", distance: 1, prefix: true)
97
144
  MockItem.search(:description).term("shose", distance: 1, transposition_cost_one: true)
@@ -101,22 +148,25 @@ MockItem.search(:description).phrase("running shoes", slop: 2)
101
148
  MockItem.search(:description).phrase("running shoes", tokenizer: "whitespace")
102
149
  MockItem.search(:description).phrase(%w[running shoes])
103
150
  MockItem.search(:description).regex("run.*")
104
- MockItem.search(:description).near("running", anchor: "shoes", distance: 3)
105
- MockItem.search(:description).near("running", anchor: "shoes", distance: 3, ordered: true)
106
- MockItem.search(:description).near(ParadeDB.regex_term("run.*"), anchor: "shoes", distance: 3)
107
- MockItem.search(:description).near("running", "trail", anchor: "shoes", distance: 3)
108
- MockItem.search(:description).near(ParadeDB.regex_term("run.*"), "trail", anchor: "shoes", distance: 3)
151
+ MockItem.search(:description).near(ParadeDB.proximity("running").within(3, "shoes"))
152
+ MockItem.search(:description).near(ParadeDB.proximity("running").within(3, "shoes", ordered: true))
153
+ MockItem.search(:description).near(ParadeDB.proximity("hiking", "running").within(2, "shoes"))
154
+ MockItem.search(:description).near(ParadeDB.proximity("running").within(2, "shoes", "sneakers", ordered: true))
155
+ MockItem.search(:description).near(ParadeDB.regex_term("run.*").within(3, "shoes"))
156
+ MockItem.search(:description).near(ParadeDB.proximity("trail").within(1, "running").within(1, "shoes"))
157
+ MockItem.search(:description).near(ParadeDB.proximity("running").within(3, "shoes"), boost: 2.0)
158
+ MockItem.search(:description).near(ParadeDB.proximity("running").within(3, "shoes"), const: 1.0)
109
159
  MockItem.search(:description).regex_phrase("run.*", "shoes")
110
- MockItem.search(:description).phrase_prefix("run", "sh")
111
160
  MockItem.search(:description).phrase_prefix("run", "sh", max_expansion: 100)
112
161
  MockItem.search(:description).parse("running AND shoes", lenient: true)
113
- MockItem.search(:description).parse("running shoes", conjunction_mode: true)
114
162
 
163
+ # Match-all / exists / ranges
115
164
  MockItem.search(:id).match_all
116
165
  MockItem.search(:id).exists
117
166
  MockItem.search(:rating).range(gte: 3, lt: 5)
118
167
  MockItem.search(:weight_range).range_term("(10, 12]", relation: "Intersects")
119
168
 
169
+ # Similarity
120
170
  MockItem.more_like_this(42, fields: [:description])
121
171
  ```
122
172
 
@@ -150,16 +200,10 @@ relation = MockItem.search(:description)
150
200
  .with_facets(:category, size: 10)
151
201
  .order(:id)
152
202
  .limit(10)
203
+
153
204
  rows = relation.to_a
154
205
  facets = relation.facets
155
206
 
156
- # Non-exact window facets
157
- relation = MockItem.search(:description)
158
- .matching_all("shoes")
159
- .with_facets(:category, size: 10, exact: false)
160
- .order(:id)
161
- .limit(10)
162
-
163
207
  # Facets-only aggregate
164
208
  MockItem.search(:description).matching_all("shoes").facets(:category)
165
209
 
@@ -169,11 +213,39 @@ MockItem.search(:description).matching_all("shoes").facets_agg(
169
213
  avg_rating: ParadeDB::Aggregations.avg(:rating)
170
214
  )
171
215
 
172
- # Non-exact window named aggregations
216
+ # Window aggregations + rows
173
217
  MockItem.search(:description).matching_all("shoes").with_agg(
174
218
  exact: false,
175
- docs: ParadeDB::Aggregations.value_count(:id)
219
+ docs: ParadeDB::Aggregations.value_count(:id),
220
+ stats: ParadeDB::Aggregations.stats(:rating)
176
221
  ).order(:id).limit(10)
222
+
223
+ # Grouped aggregations
224
+ MockItem.search(:id).match_all.aggregate_by(
225
+ :category,
226
+ docs: ParadeDB::Aggregations.value_count(:id)
227
+ )
228
+ ```
229
+
230
+ If you group by text/JSON fields, index those fields using `:literal` or `:literal_normalized`.
231
+
232
+ ## ActiveRecord and Arel Composition
233
+
234
+ Use ParadeDB conditions with normal ActiveRecord scopes:
235
+
236
+ ```ruby
237
+ MockItem.search(:description)
238
+ .matching_all("shoes")
239
+ .where(in_stock: true)
240
+ .where(MockItem.arel_table[:rating].gteq(4))
241
+ .order(created_at: :desc)
242
+ ```
243
+
244
+ For advanced SQL composition, ParadeDB operators are also available through Arel predications:
245
+
246
+ ```ruby
247
+ t = MockItem.arel_table
248
+ MockItem.where(t[:description].pdb_match("running shoes"))
177
249
  ```
178
250
 
179
251
  ## Diagnostics Helpers
@@ -187,7 +259,9 @@ ParadeDB.paradedb_verify_index("search_idx", sample_rate: 0.1)
187
259
  ParadeDB.paradedb_verify_all_indexes(index_pattern: "search_idx")
188
260
  ```
189
261
 
190
- Rake tasks:
262
+ Availability depends on the installed `pg_search` version.
263
+
264
+ Repository development tasks (from this repo's `Rakefile`):
191
265
 
192
266
  ```bash
193
267
  rake paradedb:diagnostics:indexes
@@ -196,7 +270,60 @@ rake "paradedb:diagnostics:verify_index[search_idx]" SAMPLE_RATE=0.1
196
270
  rake paradedb:diagnostics:verify_all_indexes INDEX_PATTERN=search_idx
197
271
  ```
198
272
 
199
- Note: availability depends on your installed `pg_search` version.
273
+ ## Index Validation
274
+
275
+ By default, index validation is disabled. You can enable runtime checks globally:
276
+
277
+ ```ruby
278
+ # config/initializers/paradedb.rb
279
+ ParadeDB.index_validation_mode = :warn # :warn, :raise, or :off
280
+ ```
281
+
282
+ When enabled, `rails-paradedb` validates that the expected BM25 index exists and can raise
283
+ `ParadeDB::IndexDriftError` or `ParadeDB::IndexClassNotFoundError` depending on mode.
284
+
285
+ ## Common Errors
286
+
287
+ ### "No search field set. Call .search(column) first."
288
+
289
+ ```ruby
290
+ # ❌ Missing .search(...)
291
+ MockItem.matching_all("shoes")
292
+
293
+ # ✅ Start with .search(column)
294
+ MockItem.search(:description).matching_all("shoes")
295
+ ```
296
+
297
+ ### "with_facets requires ORDER BY and LIMIT"
298
+
299
+ ```ruby
300
+ # ❌ Missing order/limit
301
+ MockItem.search(:description).matching_all("shoes").with_facets(:category).to_a
302
+
303
+ # ✅ Include both
304
+ relation = MockItem.search(:description)
305
+ .matching_all("shoes")
306
+ .with_facets(:category)
307
+ .order(:id)
308
+ .limit(10)
309
+ relation.to_a
310
+ relation.facets
311
+ ```
312
+
313
+ ### "search(:field) is not indexed"
314
+
315
+ ```ruby
316
+ # ❌ Field not in your ParadeDB::Index fields hash
317
+ MockItem.search(:title).matching_all("shoes")
318
+
319
+ # ✅ Add :title to the index definition, then migrate
320
+ ```
321
+
322
+ ## Security
323
+
324
+ `rails-paradedb` builds SQL through Arel nodes and quoted literals (`Arel::Nodes.build_quoted`)
325
+ rather than manual string interpolation. Tokenizer expressions are validated and search operators are
326
+ rendered through typed nodes, with unit and integration coverage for quoting and edge cases.
200
327
 
201
328
  ## Examples
202
329
 
@@ -206,11 +333,37 @@ Note: availability depends on your installed `pg_search` version.
206
333
  - [More Like This](examples/more_like_this/more_like_this.rb)
207
334
  - [Hybrid RRF](examples/hybrid_rrf/hybrid_rrf.rb)
208
335
  - [RAG](examples/rag/rag.rb)
336
+ - [Examples README](examples/README.md)
337
+
338
+ ## Documentation
339
+
340
+ - **ParadeDB Official Docs**: <https://docs.paradedb.com>
341
+ - **ParadeDB Website**: <https://paradedb.com>
209
342
 
210
343
  ## Contributing
211
344
 
212
- See [CONTRIBUTING.md](CONTRIBUTING.md).
345
+ See [CONTRIBUTING.md](CONTRIBUTING.md) for development setup, test commands, linting, and PR workflow.
346
+
347
+ ## Support
348
+
349
+ If you're missing a feature or found a bug, open a
350
+ [GitHub Issue](https://github.com/paradedb/rails-paradedb/issues/new/choose).
351
+
352
+ For community support:
353
+
354
+ - Join the [ParadeDB Slack Community](https://paradedb.com/slack)
355
+ - Ask in [ParadeDB Discussions](https://github.com/paradedb/paradedb/discussions)
356
+
357
+ For commercial support, contact [sales@paradedb.com](mailto:sales@paradedb.com).
358
+
359
+ ## Acknowledgments
360
+
361
+ We would like to thank the following members of the community for their valuable feedback and reviews during the development of this package:
362
+
363
+ - [Eric Barendt](https://github.com/ebarendt) - Engineering at Modern Treasury
364
+ - [Matthew Higgins](https://github.com/matthuhiggins) - Engineering at Modern Treasury
365
+ - [Patrick Schmitz](https://github.com/bullfight) - Engineering at Modern Treasury
213
366
 
214
367
  ## License
215
368
 
216
- MIT
369
+ rails-paradedb is licensed under the [MIT License](LICENSE).
@@ -25,36 +25,39 @@ Render any node with `ParadeDB::Arel.to_sql(node)`. All nodes respond to
25
25
 
26
26
  ## Builder Methods
27
27
 
28
- | Method | ParadeDB SQL |
29
- | ---------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------- |
30
- | `match(column, *terms, tokenizer: nil, distance:, prefix:, transposition_cost_one:, boost: nil)` | `column &&& 'a b'::pdb.whitespace::pdb.fuzzy(...)::pdb.boost(N)` |
31
- | `match_any(column, *terms, tokenizer: nil, distance:, prefix:, transposition_cost_one:, boost: nil)` | `column \|\|\| 'a b'::pdb.whitespace::pdb.fuzzy(...)::pdb.boost(N)` |
32
- | `phrase(column, text_or_terms, slop: n, tokenizer: nil)` | `column ### 'text'::pdb.slop(n)::pdb.whitespace` / `### ARRAY['a', 'b']::pdb.slop(n)` |
33
- | `term(column, term, distance:, prefix:, transposition_cost_one:, boost: nil)` | `column === 'term'::pdb.fuzzy(...)::pdb.boost(N)` |
34
- | `term_set(column, *terms)` | `column @@@ pdb.term_set(ARRAY[...])` |
35
- | `regex(column, pattern)` | `column @@@ pdb.regex('pattern')` |
36
- | `regex_phrase(column, *patterns, slop: nil, max_expansions: nil)` | `column @@@ pdb.regex_phrase(ARRAY['a', 'b'], slop => 2)` |
37
- | `near(column, *terms, anchor:, distance:, ordered: false)` | `column @@@ ('a' ## d ## 'b')` / `(pdb.prox_array('a', 'b') ## d ## 'c')` |
38
- | `near(column, ParadeDB.regex_term('a'), 'b', anchor:, distance:)` | `column @@@ (pdb.prox_array(pdb.prox_regex('a'), 'b') ## d ## 'c')` |
39
- | `phrase_prefix(column, *terms, max_expansion: nil)` | `column @@@ pdb.phrase_prefix(ARRAY['a','b'][, 100])` |
40
- | `parse(column, query, lenient: nil, conjunction_mode: nil)` | `column @@@ pdb.parse('q', lenient => true, conjunction_mode => true)` |
41
- | `full_text(column, expr)` | `column @@@ expr` (raw right-hand value) |
42
- | `match_all(column)` | `column @@@ pdb.all()` |
43
- | `exists(column)` | `column @@@ pdb.exists()` |
44
- | `range(column, value = nil, gte:, gt:, lte:, lt:, type:)` | `column @@@ pdb.range(int8range(3, 5, '[)'))` |
45
- | `range_term(column, value, relation: nil, range_type: nil)` | `column @@@ pdb.range_term(1)` / `pdb.range_term('(1,2]'::int4range, 'Intersects')` |
46
- | `more_like_this(column, key, fields: [:f1, :f2])` | `column @@@ pdb.more_like_this(key, ARRAY['f1','f2'])` |
47
- | `score(key_field)` | `pdb.score(key_field)` |
48
- | `snippet(column, start, finish, max)` | `pdb.snippet(column, start, finish, max)` |
49
- | `snippets(column, start_tag:, end_tag:, max_num_chars:, limit:, offset:, sort_by:)` | `pdb.snippets(column, ...)` |
50
- | `snippet_positions(column)` | `pdb.snippet_positions(column)` |
51
- | `agg(json, exact: nil)` | `pdb.agg(json[, false])` |
28
+ | Method | ParadeDB SQL |
29
+ | ------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------- |
30
+ | `match(column, *terms, tokenizer: nil, distance:, prefix:, transposition_cost_one:, boost: nil, constant_score: nil)` | `column &&& 'a b'::pdb.whitespace::pdb.fuzzy(...)::pdb.boost(N)` |
31
+ | `match_any(column, *terms, tokenizer: nil, distance:, prefix:, transposition_cost_one:, boost: nil, constant_score: nil)` | `column \|\|\| 'a b'::pdb.whitespace::pdb.fuzzy(...)::pdb.boost(N)` |
32
+ | `phrase(column, text_or_terms, slop: n, tokenizer: nil, boost: nil, constant_score: nil)` | `column ### 'text'::pdb.slop(n)::pdb.whitespace` / `### ARRAY['a', 'b']::pdb.slop(n)` |
33
+ | `term(column, term, distance:, prefix:, transposition_cost_one:, boost: nil, constant_score: nil)` | `column === 'term'::pdb.fuzzy(...)::pdb.boost(N)` |
34
+ | `term_set(column, *terms, boost: nil, constant_score: nil)` | `column @@@ pdb.term_set(ARRAY[...])` |
35
+ | `regex(column, pattern, boost: nil, constant_score: nil)` | `column @@@ pdb.regex('pattern')` |
36
+ | `regex_phrase(column, *patterns, slop: nil, max_expansions: nil, boost: nil, constant_score: nil)` | `column @@@ pdb.regex_phrase(ARRAY['a', 'b'], slop => 2)` |
37
+ | `near(column, ParadeDB.proximity('a').within(d, 'b'))` | `column @@@ ('a' ## d ## 'b')` |
38
+ | `near(column, ParadeDB.proximity('a', 'b').within(d, ParadeDB.regex_term('c')).within(e, 'd'))` | `column @@@ ((pdb.prox_array('a', 'b') ## d ## pdb.prox_regex('c')) ## e ## 'd')` |
39
+ | `near(column, ParadeDB.proximity('a').within(d, 'b'), boost: 2.0)` | `column @@@ ('a' ## d ## 'b')::pdb.boost(2.0)` |
40
+ | `near(column, ParadeDB.proximity('a').within(d, 'b'), const: 1.0)` | `column @@@ ('a' ## d ## 'b')::pdb.const(1.0)` |
41
+ | `phrase_prefix(column, *terms, max_expansion: nil, boost: nil, constant_score: nil)` | `column @@@ pdb.phrase_prefix(ARRAY['a','b'][, 100])` |
42
+ | `parse(column, query, lenient: nil, conjunction_mode: nil, boost: nil, constant_score: nil)` | `column @@@ pdb.parse('q', lenient => true, conjunction_mode => true)` |
43
+ | `full_text(column, expr)` | `column @@@ expr` (raw right-hand value) |
44
+ | `match_all(column, boost: nil, constant_score: nil)` | `column @@@ pdb.all()` |
45
+ | `exists(column, boost: nil, constant_score: nil)` | `column @@@ pdb.exists()` |
46
+ | `range(column, value = nil, gte:, gt:, lte:, lt:, type:, boost: nil, constant_score: nil)` | `column @@@ pdb.range(int8range(3, 5, '[)'))` |
47
+ | `range_term(column, value, relation: nil, range_type: nil, boost: nil, constant_score: nil)` | `column @@@ pdb.range_term(1)` / `pdb.range_term('(1,2]'::int4range, 'Intersects')` |
48
+ | `more_like_this(column, key, fields: [:f1, :f2], options: {}, boost: nil, constant_score: nil)` | `column @@@ pdb.more_like_this(key, ARRAY['f1','f2'])` |
49
+ | `score(key_field)` | `pdb.score(key_field)` |
50
+ | `snippet(column, start, finish, max)` | `pdb.snippet(column, start, finish, max)` |
51
+ | `snippets(column, start_tag:, end_tag:, max_num_chars:, limit:, offset:, sort_by:)` | `pdb.snippets(column, ...)` |
52
+ | `snippet_positions(column)` | `pdb.snippet_positions(column)` |
53
+ | `agg(json, exact: nil)` | `pdb.agg(json[, false])` |
52
54
 
53
55
  `Builder#[]` returns a column node for manual composition: `arel[:description]`.
54
56
 
55
57
  > **Note:** `Builder` has no access to ActiveRecord model metadata.
56
58
  > When calling `range_term` with a `relation:`, you must pass `range_type:` explicitly.
57
59
  > The `SearchMethods` layer (`.search(:col).range_term(...)`) auto-infers `range_type` from the column's SQL type.
60
+ > Tokenizer overrides on `match`/`match_any` are mutually exclusive with fuzzy options.
58
61
 
59
62
  ## Composition
60
63
 
@@ -29,6 +29,12 @@ module ParadeDB
29
29
  boost: nil,
30
30
  constant_score: nil
31
31
  )
32
+ validate_tokenizer_fuzzy_compatibility!(
33
+ tokenizer: tokenizer,
34
+ distance: distance,
35
+ prefix: prefix,
36
+ transposition_cost_one: transposition_cost_one
37
+ )
32
38
  rhs = quoted_value(join_terms(terms))
33
39
  rhs = apply_fuzzy(
34
40
  rhs,
@@ -52,6 +58,12 @@ module ParadeDB
52
58
  boost: nil,
53
59
  constant_score: nil
54
60
  )
61
+ validate_tokenizer_fuzzy_compatibility!(
62
+ tokenizer: tokenizer,
63
+ distance: distance,
64
+ prefix: prefix,
65
+ transposition_cost_one: transposition_cost_one
66
+ )
55
67
  rhs = quoted_value(join_terms(terms))
56
68
  rhs = apply_fuzzy(
57
69
  rhs,
@@ -138,25 +150,9 @@ module ParadeDB
138
150
  infix("@@@", column_node(column), rhs)
139
151
  end
140
152
 
141
- def near(column, *terms, anchor:, distance:, ordered: false, boost: nil, constant_score: nil)
142
- raise ArgumentError, "near requires at least one term" if terms.empty?
143
-
144
- left_operand =
145
- if terms.length == 1 && !terms.first.is_a?(::Array)
146
- proximity_term_node(terms.first)
147
- else
148
- prox_array_node(terms.flatten)
149
- end
150
-
151
- build_proximity_query(
152
- column,
153
- left_operand: left_operand,
154
- right_operand: quoted_value(anchor),
155
- distance: distance,
156
- ordered: ordered,
157
- boost: boost,
158
- constant_score: constant_score
159
- )
153
+ def near(column, proximity, boost: nil, const: nil)
154
+ rhs = proximity_query_node(proximity, boost: boost, const: const)
155
+ infix("@@@", column_node(column), rhs)
160
156
  end
161
157
 
162
158
  def phrase_prefix(column, *terms, max_expansion: nil, boost: nil, constant_score: nil)
@@ -343,12 +339,38 @@ module ParadeDB
343
339
  ::Arel::Nodes.build_quoted(value)
344
340
  end
345
341
 
346
- def build_proximity_query(column, left_operand:, right_operand:, distance:, ordered:, boost:, constant_score:)
347
- validate_numeric!(distance, :distance)
348
- operator = ordered ? "##>" : "##"
349
- near_chain = infix(operator, infix(operator, left_operand, quoted_value(distance)), right_operand)
350
- rhs = apply_score_modifier(::Arel::Nodes::Grouping.new(near_chain), boost: boost, constant_score: constant_score)
351
- infix("@@@", column_node(column), rhs)
342
+ def proximity_query_node(proximity, boost: nil, const: nil)
343
+ unless proximity.is_a?(ParadeDB::Proximity::Clause)
344
+ raise ArgumentError, "near requires a ParadeDB.proximity(...) clause"
345
+ end
346
+
347
+ if proximity.clauses.empty?
348
+ raise ArgumentError, "near requires at least one within clause"
349
+ end
350
+
351
+ if boost && !const.nil?
352
+ raise ArgumentError, "boost and const are mutually exclusive"
353
+ end
354
+
355
+ validate_numeric!(boost, :boost) if boost
356
+ validate_numeric!(const, :const) unless const.nil?
357
+
358
+ apply_score_modifier(compile_proximity_clause(proximity), boost: boost, constant_score: const)
359
+ end
360
+
361
+ def compile_proximity_clause(clause)
362
+ current = proximity_operand_node(clause.operand, empty_message: "proximity requires at least one term")
363
+
364
+ clause.clauses.each do |within_clause|
365
+ validate_numeric!(within_clause.distance, :distance)
366
+ operator = within_clause.ordered ? "##>" : "##"
367
+ right_operand = proximity_operand_node(within_clause.operand, empty_message: "within requires at least one term")
368
+ current = ::Arel::Nodes::Grouping.new(
369
+ infix(operator, infix(operator, current, quoted_value(within_clause.distance)), right_operand)
370
+ )
371
+ end
372
+
373
+ current
352
374
  end
353
375
 
354
376
  def prox_regex_node(pattern, max_expansions)
@@ -360,16 +382,30 @@ module ParadeDB
360
382
  ::Arel::Nodes::NamedFunction.new("pdb.prox_regex", args)
361
383
  end
362
384
 
363
- def prox_array_node(left_terms)
364
- terms = normalize_proximity_terms(left_terms)
385
+ def prox_array_node(terms, empty_message:)
365
386
  values = terms.map { |term| proximity_term_node(term) }
366
- raise ArgumentError, "near requires at least one left-side term" if values.empty?
387
+ raise ArgumentError, empty_message if values.empty?
367
388
 
368
389
  ::Arel::Nodes::NamedFunction.new("pdb.prox_array", values)
369
390
  end
370
391
 
392
+ def proximity_operand_node(terms, empty_message:)
393
+ return compile_proximity_clause(terms) if terms.is_a?(ParadeDB::Proximity::Clause)
394
+
395
+ normalized_terms = normalize_proximity_terms(terms)
396
+ raise ArgumentError, empty_message if normalized_terms.empty?
397
+
398
+ if normalized_terms.length == 1
399
+ proximity_term_node(normalized_terms.first)
400
+ else
401
+ prox_array_node(normalized_terms, empty_message: empty_message)
402
+ end
403
+ end
404
+
371
405
  def proximity_term_node(term)
372
- if term.is_a?(ParadeDB::Proximity::RegexTerm)
406
+ if term.is_a?(ParadeDB::Proximity::Clause)
407
+ raise ArgumentError, "nested proximity clauses must be passed directly, not inside an array"
408
+ elsif term.is_a?(ParadeDB::Proximity::RegexTerm)
373
409
  prox_regex_node(term.pattern, term.max_expansions)
374
410
  else
375
411
  quoted_value(term)
@@ -566,6 +602,14 @@ module ParadeDB
566
602
  ParadeDB::TokenizerSQL.qualify(value)
567
603
  end
568
604
 
605
+ def validate_tokenizer_fuzzy_compatibility!(tokenizer:, distance:, prefix:, transposition_cost_one:)
606
+ return if tokenizer.nil?
607
+ return if distance.nil? && !prefix && !transposition_cost_one
608
+
609
+ raise ArgumentError,
610
+ "tokenizer cannot be combined with fuzzy options (distance, prefix, transposition_cost_one)"
611
+ end
612
+
569
613
  def arel_table
570
614
  @arel_table ||= table ? ::Arel::Table.new(table.to_s) : nil
571
615
  end
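The guard added to `match` and `match_any` in this file can be exercised on its own. The sketch below copies the method body from the hunk above into a standalone script. The default `nil` values for the fuzzy keywords and the `accepted?` wrapper are assumptions added for illustration, since the builder passes every keyword explicitly.

```ruby
# Standalone copy of the guard from the diff. Keyword defaults are an
# assumption added so the method can be called with partial options.
def validate_tokenizer_fuzzy_compatibility!(tokenizer:, distance: nil, prefix: nil, transposition_cost_one: nil)
  return if tokenizer.nil?
  return if distance.nil? && !prefix && !transposition_cost_one

  raise ArgumentError,
        "tokenizer cannot be combined with fuzzy options (distance, prefix, transposition_cost_one)"
end

# Hypothetical wrapper: true when the option set passes the guard.
def accepted?(**options)
  validate_tokenizer_fuzzy_compatibility!(**options)
  true
rescue ArgumentError
  false
end

puts accepted?(tokenizer: "whitespace")                 # true: tokenizer only
puts accepted?(tokenizer: nil, distance: 1)             # true: fuzzy only
puts accepted?(tokenizer: "whitespace", distance: 1)    # false: both supplied
puts accepted?(tokenizer: "whitespace", prefix: false)  # true: falsy prefix is ignored
```

Note that `distance` is tested with `nil?`, so an explicit `distance: 0` together with a tokenizer is still rejected, whereas `prefix: false` is not treated as a fuzzy option.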
@@ -52,8 +52,8 @@ module ParadeDB
52
52
  BUILDER.regex_phrase(self, *patterns, slop: slop, max_expansions: max_expansions)
53
53
  end
54
54
 
55
- def pdb_near(*terms, anchor:, distance:, ordered: false)
56
- BUILDER.near(self, *terms, anchor: anchor, distance: distance, ordered: ordered)
55
+ def pdb_near(proximity, boost: nil, const: nil)
56
+ BUILDER.near(self, proximity, boost: boost, const: const)
57
57
  end
58
58
 
59
59
  def pdb_phrase_prefix(*terms, max_expansion: nil)
@@ -189,13 +189,6 @@ module ParadeDB
189
189
  Nodes::TokenizerCast.new(node, normalized)
190
190
  end
191
191
 
192
- def pdb_apply_slop(node, slop)
193
- return node if slop.nil?
194
-
195
- pdb_validate_numeric!(slop, :slop)
196
- Nodes::SlopCast.new(node, pdb_quoted(slop))
197
- end
198
-
199
192
  def pdb_quoted(value)
200
193
  ::Arel::Nodes.build_quoted(value)
201
194
  end
@@ -33,7 +33,6 @@ module ParadeDB
33
33
  end
34
34
 
35
35
  base.extend(ClassMethods)
36
- base.class_attribute :has_paradedb_index, default: false
37
36
 
38
37
  # Provide `.search` as a convenience alias unless the model already defines it.
39
38
  # In collision scenarios (Searchkick, Ransack, etc.), users can call `.paradedb_search`.
@@ -2,7 +2,15 @@
2
2
 
3
3
  module ParadeDB
4
4
  module Proximity
5
+ module Chainable
6
+ def within(distance, *terms, ordered: false)
7
+ Clause.new(self).within(distance, *terms, ordered: ordered)
8
+ end
9
+ end
10
+
5
11
  class RegexTerm
12
+ include Chainable
13
+
6
14
  attr_reader :pattern, :max_expansions
7
15
 
8
16
  def initialize(pattern, max_expansions: nil)
@@ -15,5 +23,49 @@ module ParadeDB
15
23
  @max_expansions = max_expansions
16
24
  end
17
25
  end
26
+
27
+ class Within
28
+ attr_reader :distance, :operand, :ordered
29
+
30
+ def initialize(distance, operand, ordered: false)
31
+ @distance = distance
32
+ @operand = operand
33
+ @ordered = ordered
34
+ end
35
+ end
36
+
37
+ class Clause
38
+ include Chainable
39
+
40
+ attr_reader :operand, :clauses
41
+
42
+ def initialize(*terms, operand: nil, clauses: [])
43
+ @operand = operand || self.class.normalize_operand(terms)
44
+ @clauses = clauses
45
+ end
46
+
47
+ def within(distance, *terms, ordered: false)
48
+ normalized_operand =
49
+ begin
50
+ self.class.normalize_operand(terms)
51
+ rescue ArgumentError => e
52
+ raise unless e.message == "proximity requires at least one term"
53
+
54
+ raise ArgumentError, "within requires at least one term"
55
+ end
56
+
57
+ self.class.new(
58
+ operand: operand,
59
+ clauses: clauses + [Within.new(distance, normalized_operand, ordered: ordered)]
60
+ )
61
+ end
62
+
63
+ def self.normalize_operand(terms)
64
+ values = Array(terms).flatten(1).compact
65
+ raise ArgumentError, "proximity requires at least one term" if values.empty?
66
+
67
+ values.length == 1 ? values.first : values
68
+ end
69
+ end
18
70
  end
19
71
  end
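Note on the hunk above: the new `Chainable#within` and `Proximity::Clause#within` build a proximity query by accumulation, where each `within` call returns a fresh `Clause` with one more `Within` step appended. The following standalone sketch mirrors that shape so the behaviour can be checked without loading the gem. The module name `ProximitySketch` and the use of `Struct` for `Within` are assumptions made for illustration, and the error-message rewrite performed by the real `within` is omitted.

```ruby
# Standalone sketch of the chaining pattern from the diff above.
# ProximitySketch is a hypothetical name; it does not load rails-paradedb.
module ProximitySketch
  # One "within N of these terms" step in the chain.
  Within = Struct.new(:distance, :operand, :ordered)

  class Clause
    attr_reader :operand, :clauses

    def initialize(*terms, operand: nil, clauses: [])
      @operand = operand || self.class.normalize_operand(terms)
      @clauses = clauses
    end

    # Returns a new Clause with one more step; the receiver is not mutated.
    def within(distance, *terms, ordered: false)
      step = Within.new(distance, self.class.normalize_operand(terms), ordered)
      self.class.new(operand: operand, clauses: clauses + [step])
    end

    # A single term collapses to a scalar; several terms stay an array.
    def self.normalize_operand(terms)
      values = Array(terms).flatten(1).compact
      raise ArgumentError, "proximity requires at least one term" if values.empty?

      values.length == 1 ? values.first : values
    end
  end
end

base = ProximitySketch::Clause.new("running")
chained = base.within(2, "shoes").within(5, "trail", "road", ordered: true)
puts chained.clauses.length
```

Going by the CHANGELOG entry, a caller would presumably pass such a clause to `near`, along the lines of `Product.search(:description).near(ParadeDB.proximity("running").within(2, "shoes"))`, where the `Product` model and `:description` column are hypothetical.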
@@ -129,7 +129,7 @@ module ParadeDB
129
129
  boost: nil,
130
130
  constant_score: nil
131
131
  )
132
- raise "No search field set. Call .search(column) first." unless _paradedb_current_field
132
+ require_search_field!
133
133
 
134
134
  node = builder.match(
135
135
  _paradedb_current_field,
@@ -153,7 +153,7 @@ module ParadeDB
153
153
  boost: nil,
154
154
  constant_score: nil
155
155
  )
156
- raise "No search field set. Call .search(column) first." unless _paradedb_current_field
156
+ require_search_field!
157
157
 
158
158
  node = builder.match_any(
159
159
  _paradedb_current_field,
@@ -169,14 +169,14 @@ module ParadeDB
169
169
  end
170
170
 
171
171
  def excluding(*terms)
172
- raise "No search field set. Call .search(column) first." unless _paradedb_current_field
172
+ require_search_field!
173
173
 
174
174
  neg = builder.match(_paradedb_current_field, *terms)
175
175
  where(grouped(neg.not))
176
176
  end
177
177
 
178
178
  def phrase(text, slop: nil, tokenizer: nil, boost: nil, constant_score: nil)
179
- raise "No search field set. Call .search(column) first." unless _paradedb_current_field
179
+ require_search_field!
180
180
 
181
181
  node = builder.phrase(
182
182
  _paradedb_current_field,
@@ -190,14 +190,14 @@ module ParadeDB
190
190
  end
191
191
 
192
192
  def regex(pattern, boost: nil, constant_score: nil)
193
- raise "No search field set. Call .search(column) first." unless _paradedb_current_field
193
+ require_search_field!
194
194
 
195
195
  node = builder.regex(_paradedb_current_field, pattern, boost: boost, constant_score: constant_score)
196
196
  where(grouped(node))
197
197
  end
198
198
 
199
199
  def regex_phrase(*patterns, slop: nil, max_expansions: nil, boost: nil, constant_score: nil)
200
- raise "No search field set. Call .search(column) first." unless _paradedb_current_field
200
+ require_search_field!
201
201
 
202
202
  node = builder.regex_phrase(
203
203
  _paradedb_current_field,
@@ -218,7 +218,7 @@ module ParadeDB
218
218
  boost: nil,
219
219
  constant_score: nil
220
220
  )
221
- raise "No search field set. Call .search(column) first." unless _paradedb_current_field
221
+ require_search_field!
222
222
 
223
223
  node = builder.term(
224
224
  _paradedb_current_field,
@@ -233,29 +233,21 @@ module ParadeDB
233
233
  end
234
234
 
235
235
  def term_set(*values, boost: nil, constant_score: nil)
236
- raise "No search field set. Call .search(column) first." unless _paradedb_current_field
236
+ require_search_field!
237
237
 
238
238
  node = builder.term_set(_paradedb_current_field, *values, boost: boost, constant_score: constant_score)
239
239
  where(grouped(node))
240
240
  end
241
241
 
242
- def near(*terms, anchor:, distance:, ordered: false, boost: nil, constant_score: nil)
243
- raise "No search field set. Call .search(column) first." unless _paradedb_current_field
242
+ def near(proximity, boost: nil, const: nil)
243
+ require_search_field!
244
244
 
245
- node = builder.near(
246
- _paradedb_current_field,
247
- *terms,
248
- anchor: anchor,
249
- distance: distance,
250
- ordered: ordered,
251
- boost: boost,
252
- constant_score: constant_score
253
- )
245
+ node = builder.near(_paradedb_current_field, proximity, boost: boost, const: const)
254
246
  where(grouped(node))
255
247
  end
256
248
 
257
249
  def phrase_prefix(*terms, max_expansion: nil, boost: nil, constant_score: nil)
258
- raise "No search field set. Call .search(column) first." unless _paradedb_current_field
250
+ require_search_field!
259
251
 
260
252
  node = builder.phrase_prefix(
261
253
  _paradedb_current_field,
@@ -269,7 +261,7 @@ module ParadeDB
269
261
 
270
262
  # Parse query-string syntax into ParadeDB query AST (e.g. "running AND shoes").
271
263
  def parse(query, lenient: nil, conjunction_mode: nil, boost: nil, constant_score: nil)
272
- raise "No search field set. Call .search(column) first." unless _paradedb_current_field
264
+ require_search_field!
273
265
  node = builder.parse(
274
266
  _paradedb_current_field,
275
267
  query,
@@ -284,7 +276,7 @@ module ParadeDB
284
276
  # Match-all wrapper for APIs that need an explicit ParadeDB predicate.
285
277
  # Use with `.search(:id)` (or any indexed field): `Product.search(:id).match_all`.
286
278
  def match_all(boost: nil, constant_score: nil)
287
- raise "No search field set. Call .search(column) first." unless _paradedb_current_field
279
+ require_search_field!
288
280
 
289
281
  where(grouped(builder.match_all(_paradedb_current_field, boost: boost, constant_score: constant_score)))
290
282
  end
@@ -292,7 +284,7 @@ module ParadeDB
292
284
  # Exists wrapper to match rows where the indexed field has a value.
293
285
  # Use with `.search(:id)` (or another exists-compatible indexed field).
294
286
  def exists(boost: nil, constant_score: nil)
295
- raise "No search field set. Call .search(column) first." unless _paradedb_current_field
287
+ require_search_field!
296
288
 
297
289
  where(grouped(builder.exists(_paradedb_current_field, boost: boost, constant_score: constant_score)))
298
290
  end
@@ -302,7 +294,7 @@ module ParadeDB
302
294
  # Product.search(:rating).range(3..5)
303
295
  # Product.search(:rating).range(gte: 3, lt: 5)
304
296
  def range(value = nil, gte: nil, gt: nil, lte: nil, lt: nil, type: nil, boost: nil, constant_score: nil)
305
- raise "No search field set. Call .search(column) first." unless _paradedb_current_field
297
+ require_search_field!
306
298
 
307
299
  inferred_type = type || default_range_type_for_field(_paradedb_current_field)
308
300
  node = builder.range(_paradedb_current_field, value, gte: gte, gt: gt, lte: lte, lt: lt, type: inferred_type, boost: boost, constant_score: constant_score)
@@ -310,7 +302,7 @@ module ParadeDB
310
302
  end
311
303
 
312
304
  def range_term(value, relation: nil, range_type: nil, boost: nil, constant_score: nil)
313
- raise "No search field set. Call .search(column) first." unless _paradedb_current_field
305
+ require_search_field!
314
306
 
315
307
  inferred_range_type = range_type || (relation && infer_range_type_for_field(_paradedb_current_field))
316
308
  node = builder.range_term(
@@ -570,6 +562,12 @@ module ParadeDB
570
562
  ::Arel::Nodes::Grouping.new(node)
571
563
  end
572
564
 
565
+ def require_search_field!
566
+ return if _paradedb_current_field
567
+
568
+ raise ArgumentError, "No search field set. Call .search(column) first."
569
+ end
570
+
573
571
  def with_projection(projection)
574
572
  rel = self
575
573
  rel = rel.select(klass.arel_table[::Arel.star]) if rel.select_values.empty?
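Note on the hunks above: replacing the inline `raise "..."` with `require_search_field!` also changes the exception class. A bare `raise` with a string produces a `RuntimeError`, while the new helper raises `ArgumentError` with the same message. The sketch below shows the difference in isolation; the class `GuardSketch` and its `@field` instance variable are hypothetical stand-ins for the relation and `_paradedb_current_field`.

```ruby
# Hypothetical stand-in for a search relation; not part of rails-paradedb.
class GuardSketch
  MESSAGE = "No search field set. Call .search(column) first."

  def initialize(field = nil)
    @field = field
  end

  # 0.2.0 style: a bare string raise, which Ruby turns into RuntimeError.
  def old_guard
    raise MESSAGE unless @field

    :ok
  end

  # 0.3.0 style: a shared guard clause that raises ArgumentError.
  def new_guard
    require_search_field!
    :ok
  end

  private

  def require_search_field!
    return if @field

    raise ArgumentError, MESSAGE
  end
end

# Helper that returns the raised error instead of propagating it.
def capture_error
  yield
  nil
rescue StandardError => e
  e
end

puts capture_error { GuardSketch.new.old_guard }.class
puts capture_error { GuardSketch.new.new_guard }.class
```

Code that rescues `RuntimeError` specifically around these calls would stop catching the error after upgrading, since `ArgumentError` is not a subclass of `RuntimeError`. Rescuing `StandardError` or `ArgumentError` covers the new behaviour.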
@@ -1,5 +1,5 @@
1
1
  # frozen_string_literal: true
2
2
 
3
3
  module ParadeDB
4
- VERSION = "0.2.0"
4
+ VERSION = "0.3.0"
5
5
  end
data/lib/parade_db.rb CHANGED
@@ -45,8 +45,20 @@ module ParadeDB
45
45
  end
46
46
 
47
47
  def index_validation_mode=(mode)
48
- normalized = mode.to_sym
49
48
  valid_modes = %i[warn raise off]
49
+ normalized =
50
+ case mode
51
+ when Symbol
52
+ mode
53
+ when String
54
+ stripped = mode.strip
55
+ raise ArgumentError, "index_validation_mode must be one of: #{valid_modes.join(', ')}" if stripped.empty?
56
+
57
+ stripped.to_sym
58
+ else
59
+ raise ArgumentError, "index_validation_mode must be one of: #{valid_modes.join(', ')}"
60
+ end
61
+
50
62
  if valid_modes.include?(normalized)
51
63
  @index_validation_mode = normalized
52
64
  return
@@ -57,7 +69,8 @@ module ParadeDB
57
69
 
58
70
  def ensure_postgresql_adapter!(connection, context:)
59
71
  adapter_name = connection.adapter_name.to_s
60
- return if adapter_name.downcase.include?("postgres")
72
+ normalized = adapter_name.downcase
73
+ return if normalized.include?("postgres") || normalized.include?("postgis")
61
74
 
62
75
  raise Errors::UnsupportedAdapterError,
63
76
  "#{context} only supports PostgreSQL. Current adapter: #{adapter_name.inspect}"
@@ -66,4 +79,8 @@ module ParadeDB
66
79
  def regex_term(pattern, max_expansions: nil)
67
80
  Proximity::RegexTerm.new(pattern, max_expansions: max_expansions)
68
81
  end
82
+
83
+ def proximity(*terms)
84
+ Proximity::Clause.new(*terms)
85
+ end
69
86
  end
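Note on the `index_validation_mode=` hunk above: the setter now normalizes by input type instead of calling `to_sym` unconditionally, so strings are stripped first and values that are neither `Symbol` nor `String` get an `ArgumentError` instead of a `NoMethodError`. The sketch below reproduces that normalization as a standalone function. The module name `ValidationModeSketch` is hypothetical, and the final `raise` for an unrecognized mode is an assumption, because the diff does not show what the real setter does after the `valid_modes.include?` check.

```ruby
# Hypothetical standalone version of the normalization shown in the diff.
module ValidationModeSketch
  VALID_MODES = %i[warn raise off].freeze
  MESSAGE = "index_validation_mode must be one of: #{VALID_MODES.join(', ')}"

  module_function

  def normalize(mode)
    normalized =
      case mode
      when Symbol
        mode
      when String
        stripped = mode.strip
        raise ArgumentError, MESSAGE if stripped.empty?

        stripped.to_sym
      else
        raise ArgumentError, MESSAGE
      end

    # Assumption: unknown modes are rejected the same way as malformed input.
    raise ArgumentError, MESSAGE unless VALID_MODES.include?(normalized)

    normalized
  end
end

puts ValidationModeSketch.normalize(" warn ").inspect
puts ValidationModeSketch.normalize(:off).inspect
```

With the 0.2.0 code path, a padded string such as `" warn "` would have been converted to a symbol with the spaces intact and then failed the membership check, so stripping first is the main user-visible difference for string input.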
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: rails-paradedb
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.2.0
4
+ version: 0.3.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - ParadeDB