rails-paradedb 0.1.0 → 0.2.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/CHANGELOG.md +47 -6
- data/README.md +130 -196
- data/lib/parade_db/aggregations.rb +112 -1
- data/lib/parade_db/arel/README.md +28 -21
- data/lib/parade_db/arel/builder.rb +265 -25
- data/lib/parade_db/arel/nodes.rb +70 -8
- data/lib/parade_db/arel/predications.rb +93 -30
- data/lib/parade_db/arel/visitor.rb +32 -1
- data/lib/parade_db/diagnostics.rb +78 -0
- data/lib/parade_db/migration_helpers.rb +10 -12
- data/lib/parade_db/model.rb +51 -15
- data/lib/parade_db/proximity.rb +19 -0
- data/lib/parade_db/query.rb +14 -0
- data/lib/parade_db/search_methods.rb +370 -49
- data/lib/parade_db/tokenizer_sql.rb +21 -0
- data/lib/parade_db/version.rb +1 -1
- data/lib/parade_db.rb +23 -0
- metadata +35 -13
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: be1908cb2b7b8e9062ac3a567ac2cb27ee4e90a62e5137ff09aa00345e805576
+  data.tar.gz: 0ee0f049473df3ab660b86a84f494885fc95f1021c7ab258c57d9278083f993e
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 9038e6a1fa469c4e0de2d0ba10949074d5c995943d655216626590bca0f912a9e9381f5054ae5f647df7d699cef56586b1491e837d390908038c3a7f2142315b
+  data.tar.gz: 8fbbec4ccec76d6c1591e773dc72e06975b5b176ec7cc91ffcfe09200ccad9b892ecf2ab8c616e95190b180eed31d03df54e62a7b5d90d30b43df070b7eee48c
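Inside a `.gem` archive, `checksums.yaml` records the SHA256 and SHA512 digests of the two other archive members, `metadata.gz` and `data.tar.gz`. That is why all four digest lines change with every release.

The sketch below shows one way to re-check those digests with Ruby's standard `digest` and `yaml` libraries. It assumes the files have already been extracted from the `.gem` (a plain tar archive) and that `checksums.yaml.gz` has been decompressed. `verify_gem_checksums` is a hypothetical helper, not a RubyGems API.

```ruby
require "digest"
require "yaml"

# Hypothetical helper (not part of RubyGems): compare the digests listed in
# checksums.yaml with the files found in `dir`.
# Returns e.g. { "SHA256" => { "metadata.gz" => true, "data.tar.gz" => true }, ... }
def verify_gem_checksums(checksums_yaml, dir)
  digest_classes = { "SHA256" => Digest::SHA256, "SHA512" => Digest::SHA512 }
  expected = YAML.safe_load(checksums_yaml)

  expected.each_with_object({}) do |(algorithm, files), results|
    digest_class = digest_classes.fetch(algorithm) # KeyError for unknown algorithms
    results[algorithm] = files.to_h do |name, hex|
      # Hash the extracted file and compare it with the recorded hex digest.
      actual = digest_class.file(File.join(dir, name)).hexdigest
      [name, actual == hex.to_s]
    end
  end
end
```

A `false` anywhere in the result means the extracted file does not match the digest published for that release.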
data/CHANGELOG.md
CHANGED
@@ -1,13 +1,53 @@
 # Changelog
-<!-- markdownlint-disable MD024 -->
 
-All notable changes to this project will be documented in this file.
-
-The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
-and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+All notable changes to this project will be documented in this file. The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/), and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
 ## [Unreleased]
 
+## [0.2.0] - 2026-03-13
+
+### Added
+
+- Rails 7.2 support and CI coverage
+- New search/query APIs: `regex_phrase`, `phrase_prefix`, `parse`,
+  grouped `aggregate_by`, and `ParadeDB::Query.regex`
+- Expanded snippet support with `with_snippets` and
+  `with_snippet_positions`
+- ParadeDB diagnostics helpers:
+  `paradedb_indexes`, `paradedb_index_segments`,
+  `paradedb_verify_index`, and `paradedb_verify_all_indexes`
+- Additional aggregation helpers:
+  `percentiles`, `histogram`, `date_histogram`, `top_hits`, and
+  `filtered`
+- Support for passing regexes into proximity queries using
+  `ParadeDB.regex_term`
+
+### Changed
+
+- Fuzzy search controls are now flattened across the relation and Arel
+  DSLs with direct `distance`, `prefix`, and
+  `transposition_cost_one` options
+- `matching_all` and `matching_any` now accept explicit `tokenizer:`
+  overrides
+- Runtime index validation now includes index-class discovery, drift
+  checks, indexed-field validation, and model helpers for
+  `paradedb_index_classes`, `paradedb_indexed_fields`,
+  `paradedb_key_field`, and `paradedb_index_name`
+- Facet and aggregation APIs now support `exact:` controls for exact
+  versus windowed execution
+- README, examples, and Arel documentation were expanded to cover the
+  newer query, snippet, aggregation, and diagnostics APIs
+
+### Fixed
+
+- Search/runtime tokenizer handling now renders tokenizer SQL safely and
+  validates unsupported tokenizer and facet combinations earlier
+
+### Removed
+
+- **BREAKING**: `near_regex` has been removed in favor of calling
+  `near` with a regex argument using `ParadeDB.regex_term`
+
 ## [0.1.0] - 2026-02-07
 
 ### Added
@@ -50,5 +90,6 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - Schema dump/load round-trip for tokenizer configuration and index options
   (including `target_segment_count`)
 
-[Unreleased]: https://github.com/paradedb/rails-paradedb/compare/v0.
+[Unreleased]: https://github.com/paradedb/rails-paradedb/compare/v0.2.0...HEAD
+[0.2.0]: https://github.com/paradedb/rails-paradedb/releases/tag/v0.2.0
 [0.1.0]: https://github.com/paradedb/rails-paradedb/releases/tag/v0.1.0
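The changelog lists a new `filtered` aggregation helper. Going by `normalize_filter` in the `aggregations.rb` changes later in this diff, `filtered` takes one of two forms, never both:

- a prebuilt `filter:`, or
- a `field:`/`term:` pair, with optional fuzzy `distance:`, `prefix:`, and `transposition_cost_one:`.

The boolean options must be `true`, `false`, or `nil`.

The sketch below reproduces that argument check in plain Ruby so the rules can be tried without Rails. It rests on a few assumptions:

- `distance` must be a non-negative Integer; the body of the gem's `normalize_non_negative_integer` is not shown.
- Field normalization is simplified to `#to_s`.
- `transposition_cost_one` is left out for brevity.
- `check_filtered_args` is a hypothetical name, not part of rails-paradedb.

```ruby
# Hypothetical stand-in for the argument rules of ParadeDB::Aggregations.filtered.
# Returns plain Ruby values instead of the gem's FieldTermFilter struct.
def check_filtered_args(filter: nil, field: nil, term: nil, distance: nil, prefix: nil)
  # A prebuilt filter excludes the field/term form entirely.
  if filter
    unless field.nil? && term.nil?
      raise ArgumentError, "filtered aggregation accepts either filter: or field/term arguments, not both"
    end
    return filter
  end

  # Without a prebuilt filter, both field and term are mandatory.
  if field.nil? || term.nil?
    raise ArgumentError, "filtered aggregation requires filter: or both field: and term:"
  end

  # Assumption: distance must be a non-negative Integer when given.
  unless distance.nil? || (distance.is_a?(Integer) && distance >= 0)
    raise ArgumentError, "distance must be a non-negative integer"
  end

  # Mirrors normalize_boolean_option: only true, false, or nil are accepted.
  unless [true, false, nil].include?(prefix)
    raise ArgumentError, "prefix must be true, false, or nil"
  end

  { field: field.to_s, term: term, distance: distance, prefix: prefix }
end
```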
data/README.md
CHANGED
@@ -3,280 +3,214 @@
 [](https://rubygems.org/gems/rails-paradedb)
 [](https://github.com/paradedb/rails-paradedb/actions/workflows/ci.yml)
 [](https://github.com/paradedb/rails-paradedb?tab=MIT-1-ov-file#readme)
-[](https://join.slack.com/t/paradedbcommunity/shared_invite/zt-32abtyjg4-yoYoi~RPh9MSW8tDbl0BQw)
-[](https://x.com/paradedb)
 
-[ParadeDB](https://paradedb.com)
+ActiveRecord integration for [ParadeDB](https://paradedb.com): BM25 full-text search, scoring, snippets, facets, and aggregations in PostgreSQL.
 
-
+ParadeDB docs: <https://docs.paradedb.com>
 
-## Requirements
+## Requirements
 
-
-
-
-| Rails | 8.1+ |
-| ParadeDB | 0.21.0+ |
-| PostgreSQL | 17+ (with ParadeDB extension) |
-
-**Note**: This gem requires ActiveRecord with PostgreSQL. The DSL and Arel layer delegate SQL value quoting to `ActiveRecord::Base.connection.quote` for type safety and proper escaping.
+- Ruby 3.2+
+- Rails 7.2+
+- PostgreSQL 17+ with `pg_search` (ParadeDB)
 
 ## Installation
 
-Add to your Gemfile:
-
 ```ruby
 gem "rails-paradedb"
 ```
 
-Then run:
-
 ```bash
 bundle install
 ```
 
 ## Quick Start
 
-Enable ParadeDB on a model:
-
 ```ruby
-class
+class MockItem < ActiveRecord::Base
   include ParadeDB::Model
+
+  self.table_name = "mock_items"
+  self.primary_key = "id"
+  self.has_paradedb_index = true
 end
 ```
 
-Search with a simple query:
-
 ```ruby
-
-
-
-Check out some examples:
-
-- [Quick Start](examples/quickstart/quickstart.rb)
-- [Faceted Search](examples/faceted_search/faceted_search.rb)
-- [Autocomplete](examples/autocomplete/autocomplete.rb)
-- [More Like This](examples/more_like_this/more_like_this.rb)
-- [RAG](examples/rag/rag.rb)
-
-## BM25 Index
-
-Generate an index class and migration:
-
-```bash
-rails g parade_db:index Product description category rating
+MockItem.search(:description).matching_all("running shoes")
+MockItem.search(:description).matching_any("wireless", "bluetooth")
+MockItem.search(:description).term("electronics")
 ```
 
-
+## Index Definition
 
 ```ruby
-class
-  self.table_name = :
+class MockItemIndex < ParadeDB::Index
+  self.table_name = :mock_items
   self.key_field = :id
-  self.
+  self.index_name = :search_idx
   self.fields = {
-    id:
-    description:
-
-
-
-
-
-
-    "metadata->>'color'": { tokenizer: :literal, alias: "metadata_color" },
-    metadata: { fast: true, expand_dots: false }
+    id: nil,
+    description: nil,
+    category: nil,
+    rating: nil,
+    in_stock: nil,
+    created_at: nil,
+    metadata: nil,
+    weight_range: nil
   }
 end
 ```
 
-
-
-
-- `tokenizers` for multiple tokenizer entries on the same source field.
-- `args`, `named_args`, `filters`, `stemmer`, `alias` inside tokenizer entries.
-- field options such as `fast`, `record`, `normalizer`, `expand_dots`.
+For text or JSON fields you plan to use in Top K queries, facets, grouped
+aggregations, or `top_hits` docvalue fields, use `:literal` or
+`:literal_normalized`.
 
-Create
+Create in migration:
 
 ```ruby
-class
+class AddMockItemBm25Index < ActiveRecord::Migration[8.1]
   def up
-    create_paradedb_index(
+    create_paradedb_index(MockItemIndex, if_not_exists: true)
   end
 
   def down
-    remove_bm25_index :
+    remove_bm25_index :mock_items, name: :search_idx, if_exists: true
   end
 end
 ```
 
-
-
-- `create_paradedb_index(index_class_or_name, if_not_exists: false)`
-- `replace_paradedb_index(index_class_or_name)`
-- `add_bm25_index(table, fields:, key_field:, name: nil, index_options: nil, if_not_exists: false)`
-- `remove_bm25_index(table, name: nil, if_exists: false)`
-- `reindex_bm25(table, name: nil, concurrently: false)`
-
-### Index Validation Mode
-
-Runtime index drift validation is controlled by `ParadeDB.index_validation_mode`.
-Default is `:off` (no runtime drift checks).
-
-```ruby
-ParadeDB.index_validation_mode = :warn # log drift warnings
-ParadeDB.index_validation_mode = :raise # raise ParadeDB::IndexDriftError on drift
-ParadeDB.index_validation_mode = :off # disable drift checks (default)
-```
-
-## Query Types
-
-For advanced options, see [ParadeDB Query Builder Documentation](https://docs.paradedb.com/documentation/query-builder/overview) and the runnable scripts in [`examples/`](examples).
+## Query API
 
 ```ruby
 # Full-text
-
-
-
-
-
-
-
-#
-
-
-
-
-
-
-
-
-
-
-
-
-
+MockItem.search(:description).matching_all("running shoes")
+MockItem.search(:description).matching_any("wireless bluetooth")
+
+# Query-time tokenizer override
+MockItem.search(:description).matching_any("running shoes", tokenizer: "whitespace")
+MockItem.search(:description).matching_any("running shoes", tokenizer: "whitespace('lowercase=false')")
+
+# Fuzzy options on match/term
+MockItem.search(:description).matching_any("runing shose", distance: 1)
+MockItem.search(:description).matching_all("runing", distance: 1, prefix: true)
+MockItem.search(:description).term("shose", distance: 1, transposition_cost_one: true)
+
+# Other query types
+MockItem.search(:description).phrase("running shoes", slop: 2)
+MockItem.search(:description).phrase("running shoes", tokenizer: "whitespace")
+MockItem.search(:description).phrase(%w[running shoes])
+MockItem.search(:description).regex("run.*")
+MockItem.search(:description).near("running", anchor: "shoes", distance: 3)
+MockItem.search(:description).near("running", anchor: "shoes", distance: 3, ordered: true)
+MockItem.search(:description).near(ParadeDB.regex_term("run.*"), anchor: "shoes", distance: 3)
+MockItem.search(:description).near("running", "trail", anchor: "shoes", distance: 3)
+MockItem.search(:description).near(ParadeDB.regex_term("run.*"), "trail", anchor: "shoes", distance: 3)
+MockItem.search(:description).regex_phrase("run.*", "shoes")
+MockItem.search(:description).phrase_prefix("run", "sh")
+MockItem.search(:description).phrase_prefix("run", "sh", max_expansion: 100)
+MockItem.search(:description).parse("running AND shoes", lenient: true)
+MockItem.search(:description).parse("running shoes", conjunction_mode: true)
+
+MockItem.search(:id).match_all
+MockItem.search(:id).exists
+MockItem.search(:rating).range(gte: 3, lt: 5)
+MockItem.search(:weight_range).range_term("(10, 12]", relation: "Intersects")
+
+MockItem.more_like_this(42, fields: [:description])
 ```
 
-##
-
-See [BM25 Scoring](https://docs.paradedb.com/documentation/sorting/score) and [Highlighting](https://docs.paradedb.com/documentation/full-text/highlight) for full function details.
+## Scoring and Highlighting
 
 ```ruby
-
-
-
-
+results = MockItem.search(:description)
+  .matching_all("shoes")
+  .with_score
+  .order(search_score: :desc)
+
+MockItem.search(:description)
+  .matching_all("shoes")
+  .with_snippet(:description, start_tag: "<b>", end_tag: "</b>", max_chars: 80)
+
+MockItem.search(:description)
+  .matching_all("running")
+  .with_snippets(:description, max_chars: 15, limit: 2, offset: 0, sort_by: :position)
+
+MockItem.search(:description)
+  .matching_all("running")
+  .with_snippet_positions(:description)
 ```
 
-##
-
-For supported aggregate functions and JSON shapes, see [ParadeDB Aggregations Documentation](https://docs.paradedb.com/documentation/aggregates/overview).
-
-`with_facets(...)` requires:
-
-- an existing ParadeDB predicate
-- `.order(...)`
-- `.limit(...)`
+## Facets and Aggregations
 
 ```ruby
-# Rows + facets
-relation =
+# Rows + facets (requires order + limit)
+relation = MockItem.search(:description)
+  .matching_all("shoes")
   .with_facets(:category, size: 10)
   .order(:id)
   .limit(10)
 rows = relation.to_a
 facets = relation.facets
 
-#
-
-
-
-
-
-  .facets_agg(
-    docs: ParadeDB::Aggregations.value_count(:id),
-    avg_rating: ParadeDB::Aggregations.avg(:rating)
-  )
-```
+# Non-exact window facets
+relation = MockItem.search(:description)
+  .matching_all("shoes")
+  .with_facets(:category, size: 10, exact: false)
+  .order(:id)
+  .limit(10)
 
-
+# Facets-only aggregate
+MockItem.search(:description).matching_all("shoes").facets(:category)
 
-
+# Named aggregations
+MockItem.search(:description).matching_all("shoes").facets_agg(
+  docs: ParadeDB::Aggregations.value_count(:id),
+  avg_rating: ParadeDB::Aggregations.avg(:rating)
+)
 
-
-
-
-
-
-  .limit(10)
+# Non-exact window named aggregations
+MockItem.search(:description).matching_all("shoes").with_agg(
+  exact: false,
+  docs: ParadeDB::Aggregations.value_count(:id)
+).order(:id).limit(10)
 ```
 
-
+## Diagnostics Helpers
 
-
-If your application already defines `.search`, rails-paradedb will **not** override it.
-
-Use `.paradedb_search` instead:
+Ruby helpers:
 
 ```ruby
-
+ParadeDB.paradedb_indexes
+ParadeDB.paradedb_index_segments("search_idx")
+ParadeDB.paradedb_verify_index("search_idx", sample_rate: 0.1)
+ParadeDB.paradedb_verify_all_indexes(index_pattern: "search_idx")
 ```
 
-
-
-See the dedicated Arel guide: [`lib/parade_db/arel/README.md`](lib/parade_db/arel/README.md).
-
-## Security
-
-### SQL Injection Protection
-
-rails-paradedb uses **ActiveRecord's quoting** for all search terms:
-
-**Quoting Strategy:**
-
-- All user input is quoted via `ActiveRecord::Base.connection.quote`
-- Search terms use Arel's `Nodes.build_quoted()` for type-safe SQL generation
-- This prevents SQL injection while maintaining compatibility with ParadeDB's full-text operators
-
-**Implementation Details:**
-
-All values flow through ActiveRecord's connection adapter quoting, which handles:
+Rake tasks:
 
-
-
-
-
-
-
-```ruby
-# Even malicious input is safely escaped
-user_query = "'; DROP TABLE products; --"
-Product.search(:description).matching_all(user_query)
-# The query is escaped and treated as a literal search term
+```bash
+rake paradedb:diagnostics:indexes
+rake "paradedb:diagnostics:index_segments[search_idx]"
+rake "paradedb:diagnostics:verify_index[search_idx]" SAMPLE_RATE=0.1
+rake paradedb:diagnostics:verify_all_indexes INDEX_PATTERN=search_idx
 ```
 
-
-
-- **ParadeDB Official Docs**: <https://docs.paradedb.com>
-- **ParadeDB Website**: <https://paradedb.com>
-
-## Contributing
-
-Contribution and local development workflow live in [`CONTRIBUTING.md`](CONTRIBUTING.md).
+Note: availability depends on your installed `pg_search` version.
 
-##
+## Examples
 
-
-[
-
-
+- [Quick Start](examples/quickstart/quickstart.rb)
+- [Faceted Search](examples/faceted_search/faceted_search.rb)
+- [Autocomplete](examples/autocomplete/autocomplete.rb)
+- [More Like This](examples/more_like_this/more_like_this.rb)
+- [Hybrid RRF](examples/hybrid_rrf/hybrid_rrf.rb)
+- [RAG](examples/rag/rag.rb)
 
-
-- Ask for help on our [GitHub Discussions](https://github.com/paradedb/paradedb/discussions)
+## Contributing
 
-
+See [CONTRIBUTING.md](CONTRIBUTING.md).
 
 ## License
 
-
+MIT
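`ParadeDB::Aggregations` gains a `top_hits` helper in the `aggregations.rb` changes that follow. It returns a plain Hash keyed by `"top_hits"`, validating `sort` and `docvalue_fields` before the payload is passed on to `pdb.agg(...)`.

The sketch below re-creates that payload shape without Rails so the validation rules can be tried in isolation. It is an approximation, not the gem's code:

- `top_hits_payload`, `non_negative_integer!`, and `top_hits_sort` are hypothetical names.
- Field normalization is reduced to `#to_s`.
- The integer check assumes how `normalize_non_negative_integer` behaves; its body is not part of this diff.

```ruby
# Standalone approximation of the top_hits payload builder (not the gem's code).
def top_hits_payload(size: nil, from: nil, sort: nil, docvalue_fields: nil)
  payload = {}
  payload["size"] = non_negative_integer!(size, "size") unless size.nil?
  payload["from"] = non_negative_integer!(from, "from") unless from.nil?
  payload["sort"] = top_hits_sort(sort) unless sort.nil?
  unless docvalue_fields.nil?
    fields = Array(docvalue_fields)
    raise ArgumentError, "top_hits docvalue_fields must include at least one field" if fields.empty?

    payload["docvalue_fields"] = fields.map(&:to_s)
  end
  { "top_hits" => payload }
end

# Assumption: size/from must be non-negative Integers.
def non_negative_integer!(value, name)
  return value if value.is_a?(Integer) && value >= 0

  raise ArgumentError, "#{name} must be a non-negative integer"
end

# Each sort entry must be a one-key Hash such as { rating: :desc }.
def top_hits_sort(sort)
  entries = Array(sort) # Array({ rating: :desc }) becomes [[:rating, :desc]]
  raise ArgumentError, "top_hits sort must include at least one field" if entries.empty?

  entries.map do |entry|
    raise ArgumentError, "top_hits sort entries must be Hash values" unless entry.is_a?(Hash)
    raise ArgumentError, "top_hits sort entries must include exactly one field" unless entry.size == 1

    field, direction = entry.first
    dir = direction.to_s
    raise ArgumentError, "sort direction must be 'asc' or 'desc'" unless %w[asc desc].include?(dir)

    { field.to_s => dir }
  end
end
```

The gem's `normalize_top_hits_sort` also runs `sort` through `Array(...)`. As a result, a bare Hash such as `sort: { rating: :desc }` is turned into key/value pairs and rejected with "top_hits sort entries must be Hash values". Pass an array of one-key hashes instead: `sort: [{ rating: :desc }]`.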
data/lib/parade_db/aggregations.rb
CHANGED
@@ -3,6 +3,19 @@
 module ParadeDB
   # Typed helpers for building agg JSON payloads passed to pdb.agg(...).
   module Aggregations
+    FilteredSpec = Struct.new(:spec, :agg_filter, keyword_init: true) do
+      # Backward-compatible reader for code that accessed `filtered_spec.filter`.
+      alias filter agg_filter
+    end
+    FieldTermFilter = Struct.new(
+      :field,
+      :term,
+      :distance,
+      :prefix,
+      :transposition_cost_one,
+      keyword_init: true
+    )
+
     TERMS_ORDER = {
       count_desc: { "_count" => "desc" },
       count_asc: { "_count" => "asc" },
@@ -18,7 +31,7 @@ module ParadeDB
 
       specs.each_with_object({}) do |(alias_name, spec), payload|
         alias_key = normalize_alias(alias_name)
-        payload[alias_key] =
+        payload[alias_key] = normalize_named_spec(spec)
       end
     end
 
@@ -132,11 +145,43 @@ module ParadeDB
       }
     end
 
+    def top_hits(size: nil, from: nil, sort: nil, docvalue_fields: nil)
+      payload = {}
+      payload["size"] = normalize_non_negative_integer(size, "size") unless size.nil?
+      payload["from"] = normalize_non_negative_integer(from, "from") unless from.nil?
+      payload["sort"] = normalize_top_hits_sort(sort) unless sort.nil?
+      payload["docvalue_fields"] = normalize_docvalue_fields(docvalue_fields) unless docvalue_fields.nil?
+      { "top_hits" => payload }
+    end
+
+    def filtered(spec, filter: nil, field: nil, term: nil, distance: nil, prefix: nil, transposition_cost_one: nil)
+      normalized_spec = normalize_spec(spec)
+      normalized_filter = normalize_filter(
+        filter: filter,
+        field: field,
+        term: term,
+        distance: distance,
+        prefix: prefix,
+        transposition_cost_one: transposition_cost_one
+      )
+      FilteredSpec.new(spec: normalized_spec, agg_filter: normalized_filter)
+    end
+
     def metric(name, field)
       { name => { "field" => normalize_field(field) } }
     end
     private_class_method :metric
 
+    def normalize_named_spec(spec)
+      case spec
+      when FilteredSpec
+        FilteredSpec.new(spec: normalize_spec(spec.spec), agg_filter: spec.agg_filter)
+      else
+        normalize_spec(spec)
+      end
+    end
+    private_class_method :normalize_named_spec
+
     def normalize_alias(alias_name)
       value =
         case alias_name
@@ -166,6 +211,32 @@ module ParadeDB
     end
     private_class_method :normalize_spec
 
+    def normalize_filter(filter:, field:, term:, distance:, prefix:, transposition_cost_one:)
+      if filter
+        if !field.nil? || !term.nil?
+          raise ArgumentError, "filtered aggregation accepts either filter: or field/term arguments, not both"
+        end
+        return filter
+      end
+
+      if field.nil? || term.nil?
+        raise ArgumentError, "filtered aggregation requires filter: or both field: and term:"
+      end
+
+      normalized_distance = distance.nil? ? nil : normalize_non_negative_integer(distance, "distance")
+      normalized_prefix = normalize_boolean_option(prefix, "prefix")
+      normalized_transposition = normalize_boolean_option(transposition_cost_one, "transposition_cost_one")
+
+      FieldTermFilter.new(
+        field: normalize_field(field),
+        term: term,
+        distance: normalized_distance,
+        prefix: normalized_prefix,
+        transposition_cost_one: normalized_transposition
+      )
+    end
+    private_class_method :normalize_filter
+
     def normalize_field(field)
       case field
       when Symbol
@@ -215,6 +286,46 @@ module ParadeDB
     end
     private_class_method :normalize_bounds
 
+    def normalize_top_hits_sort(sort)
+      entries = Array(sort)
+      raise ArgumentError, "top_hits sort must include at least one field" if entries.empty?
+
+      entries.map do |entry|
+        raise ArgumentError, "top_hits sort entries must be Hash values" unless entry.is_a?(Hash)
+        raise ArgumentError, "top_hits sort entries must include exactly one field" unless entry.size == 1
+
+        field, direction = entry.first
+        {
+          normalize_field(field) => normalize_sort_direction(direction)
+        }
+      end
+    end
+    private_class_method :normalize_top_hits_sort
+
+    def normalize_docvalue_fields(fields)
+      values = Array(fields)
+      raise ArgumentError, "top_hits docvalue_fields must include at least one field" if values.empty?
+
+      values.map { |field| normalize_field(field) }
+    end
+    private_class_method :normalize_docvalue_fields
+
+    def normalize_sort_direction(direction)
+      value = direction.to_s
+      return value if %w[asc desc].include?(value)
+
+      raise ArgumentError, "sort direction must be 'asc' or 'desc'"
+    end
+    private_class_method :normalize_sort_direction
+
+    def normalize_boolean_option(value, name)
+      return nil if value.nil?
+      return value if value == true || value == false
+
+      raise ArgumentError, "#{name} must be true, false, or nil"
+    end
+    private_class_method :normalize_boolean_option
+
     def deep_stringify(value)
       case value
       when Hash