caoutsearch 0.0.0 → 0.0.3
- checksums.yaml +4 -4
- data/README.md +819 -6
- data/lib/caoutsearch/config/mappings.rb +1 -1
- data/lib/caoutsearch/filter/base.rb +11 -7
- data/lib/caoutsearch/filter/boolean.rb +1 -1
- data/lib/caoutsearch/filter/date.rb +93 -22
- data/lib/caoutsearch/filter/default.rb +10 -10
- data/lib/caoutsearch/filter/geo_point.rb +1 -1
- data/lib/caoutsearch/filter/match.rb +5 -5
- data/lib/caoutsearch/filter/none.rb +1 -1
- data/lib/caoutsearch/filter/range.rb +6 -6
- data/lib/caoutsearch/index/document.rb +11 -11
- data/lib/caoutsearch/index/indice_versions.rb +3 -3
- data/lib/caoutsearch/index/internal_dsl.rb +3 -3
- data/lib/caoutsearch/index/reindex.rb +11 -11
- data/lib/caoutsearch/index/scoping.rb +4 -4
- data/lib/caoutsearch/index/serialization.rb +13 -13
- data/lib/caoutsearch/instrumentation/base.rb +12 -12
- data/lib/caoutsearch/instrumentation/search.rb +11 -2
- data/lib/caoutsearch/mappings.rb +1 -1
- data/lib/caoutsearch/model/indexable.rb +57 -0
- data/lib/caoutsearch/model/searchable.rb +31 -0
- data/lib/caoutsearch/model.rb +12 -0
- data/lib/caoutsearch/response/aggregations.rb +50 -0
- data/lib/caoutsearch/response/response.rb +9 -0
- data/lib/caoutsearch/response/suggestions.rb +9 -0
- data/lib/caoutsearch/response.rb +6 -0
- data/lib/caoutsearch/search/adapter/active_record.rb +39 -0
- data/lib/caoutsearch/search/base.rb +16 -15
- data/lib/caoutsearch/search/batch/scroll.rb +93 -0
- data/lib/caoutsearch/search/batch/search_after.rb +70 -0
- data/lib/caoutsearch/search/batch_methods.rb +63 -0
- data/lib/caoutsearch/search/callbacks.rb +28 -0
- data/lib/caoutsearch/search/delete_methods.rb +19 -0
- data/lib/caoutsearch/search/dsl/item.rb +2 -2
- data/lib/caoutsearch/search/inspect.rb +34 -0
- data/lib/caoutsearch/search/instrumentation.rb +19 -0
- data/lib/caoutsearch/search/internal_dsl.rb +107 -0
- data/lib/caoutsearch/search/naming.rb +45 -0
- data/lib/caoutsearch/search/point_in_time.rb +28 -0
- data/lib/caoutsearch/search/query/boolean.rb +4 -4
- data/lib/caoutsearch/search/query/nested.rb +1 -1
- data/lib/caoutsearch/search/query/setters.rb +4 -4
- data/lib/caoutsearch/search/query_builder/aggregations.rb +49 -0
- data/lib/caoutsearch/search/query_builder.rb +89 -0
- data/lib/caoutsearch/search/query_methods.rb +157 -0
- data/lib/caoutsearch/search/records.rb +23 -0
- data/lib/caoutsearch/search/resettable.rb +38 -0
- data/lib/caoutsearch/search/response.rb +97 -0
- data/lib/caoutsearch/search/sanitizer.rb +2 -2
- data/lib/caoutsearch/search/search_methods.rb +239 -0
- data/lib/caoutsearch/search/type_cast.rb +14 -6
- data/lib/caoutsearch/search/value.rb +10 -10
- data/lib/caoutsearch/search/value_overflow.rb +1 -1
- data/lib/caoutsearch/settings.rb +1 -1
- data/lib/caoutsearch/testing/mock_requests.rb +105 -0
- data/lib/caoutsearch/testing.rb +3 -0
- data/lib/caoutsearch/version.rb +1 -1
- data/lib/caoutsearch.rb +10 -5
- metadata +44 -126
- data/lib/caoutsearch/search/search/delete_methods.rb +0 -21
- data/lib/caoutsearch/search/search/inspect.rb +0 -36
- data/lib/caoutsearch/search/search/instrumentation.rb +0 -21
- data/lib/caoutsearch/search/search/internal_dsl.rb +0 -77
- data/lib/caoutsearch/search/search/naming.rb +0 -47
- data/lib/caoutsearch/search/search/query_builder.rb +0 -94
- data/lib/caoutsearch/search/search/query_methods.rb +0 -180
- data/lib/caoutsearch/search/search/resettable.rb +0 -35
- data/lib/caoutsearch/search/search/response.rb +0 -88
- data/lib/caoutsearch/search/search/scroll_methods.rb +0 -113
- data/lib/caoutsearch/search/search/search_methods.rb +0 -230
data/README.md
CHANGED
@@ -1,18 +1,830 @@
|
|
1
1
|
# Caoutsearch [\ˈkawt͡ˈsɝtʃ\\](http://ipa-reader.xyz/?text=ˈkawt͡ˈsɝtʃ)
|
2
2
|
|
3
|
-
|
3
|
+
[](https://rubygems.org/gems/caoutsearch)
|
4
|
+
[](https://github.com/mon-territoire/caoutsearch/actions/workflows/ci.yml)
|
5
|
+
[](https://github.com/testdouble/standard)
|
6
|
+
[](https://codeclimate.com/github/mon-territoire/caoutsearch/maintainability)
|
7
|
+
[](https://codeclimate.com/github/mon-territoire/caoutsearch/test_coverage)
|
8
|
+
|
9
|
+
[](https://github.com/mon-territoire/caoutsearch/actions/workflows/jruby.yml)
|
10
|
+
[](https://github.com/mon-territoire/caoutsearch/actions/workflows/truffle_ruby.yml)
|
11
|
+
|
12
|
+
**!! Gem under development before public release !!**
|
13
|
+
|
14
|
+
Caoutsearch is a new Elasticsearch integration for Ruby and/or Rails.
|
15
|
+
It provides a simple but powerful DSL to perform complex indexing and searching, while securely exposing search criteria to a public and chainable API, without overwhelming your models.
|
16
|
+
|
17
|
+
Caoutsearch only supports Elasticsearch 8.x right now.
|
18
|
+
It is used in production in a robust application, updated and maintained for several years at [Mon Territoire](https://mon-territoire.fr).
|
19
|
+
|
20
|
+
Caoutsearch was inspired by awesome gems such as [elasticsearch-rails](https://github.com/elastic/elasticsearch-rails) or [search_flip](https://github.com/mrkamel/search_flip).
|
21
|
+
If you don't have scenarios as complex as those described in this documentation, they may better suit your needs.
|
22
|
+
|
23
|
+
## Table of Contents
|
24
|
+
|
25
|
+
- [Installation](#installation)
|
26
|
+
- [Configuration](#configuration)
|
27
|
+
- Instrumentation
|
28
|
+
- [Usage](#usage)
|
29
|
+
- [Indice Configuration](#indice-configuration)
|
30
|
+
- Mapping & settings
|
31
|
+
- Text analysis
|
32
|
+
- Versioning
|
33
|
+
- [Index Engine](#index-engine)
|
34
|
+
- Properties
|
35
|
+
- Partial updates
|
36
|
+
- Eager loading
|
37
|
+
- Interdependencies
|
38
|
+
- [Search Engine](#search-engine)
|
39
|
+
- Queries
|
40
|
+
- [Filters](#filters)
|
41
|
+
- Full-text query
|
42
|
+
- Custom filters
|
43
|
+
- Orders
|
44
|
+
- [Aggregations](#aggregations)
|
45
|
+
- [Transform aggregations](#transform-aggregations)
|
46
|
+
- [Responses](#responses)
|
47
|
+
- [Loading records](#loading-records)
|
48
|
+
- [Model integration](#model-integration)
|
49
|
+
- [Add Caoutsearch to your models](#add-caoutsearch-to-your-models)
|
50
|
+
- [Index records](#index-records)
|
51
|
+
- [Index multiple records](#index-multiple-records)
|
52
|
+
- [Index single records](#index-single-records)
|
53
|
+
- [Delete documents](#delete-documents)
|
54
|
+
- [Automatic Callbacks](#automatic-callbacks)
|
55
|
+
- Asynchronous methods
|
56
|
+
- [Search for records](#search-for-records)
|
57
|
+
- [Search API](#search-api)
|
58
|
+
- [Pagination](#pagination)
|
59
|
+
- [Total count](#total-count)
|
60
|
+
- [Iterating results](#iterating-results)
|
61
|
+
- [Testing with Caoutsearch](#testing-with-caoutsearch)
|
62
|
+
|
63
|
+
## Installation
|
4
64
|
|
5
65
|
```bash
|
6
66
|
bundle add caoutsearch
|
7
67
|
```
|
8
68
|
|
9
|
-
|
69
|
+
## Configuration
|
70
|
+
|
71
|
+
TODO
|
72
|
+
|
73
|
+
## Usage
|
74
|
+
|
75
|
+
### Indice Configuration
|
76
|
+
|
77
|
+
TODO
|
78
|
+
|
79
|
+
### Index Engine
|
80
|
+
|
81
|
+
TODO
|
82
|
+
|
83
|
+
### Search Engine
|
84
|
+
|
85
|
+
#### Filters
|
86
|
+
Filters declared in the search engine define how Caoutsearch builds its queries.
|
87
|
+
|
88
|
+
The main use of filters is to expose a field for search, but they can also be used to build more complex queries:
|
89
|
+
```ruby
|
90
|
+
class ArticleSearch < Caoutsearch::Search::Base
|
91
|
+
# Build a filter on the author field
|
92
|
+
filter :author
|
93
|
+
|
94
|
+
# Build a Match filter on multiple fields
|
95
|
+
filter :content, indexes: %i[title.words content], as: :match
|
96
|
+
|
97
|
+
# Build a more complex filter by using other filters
|
98
|
+
filter :public, as: :boolean
|
99
|
+
filter :published_on, as: :date
|
100
|
+
filter :active do |value|
|
101
|
+
search_by(published: value, published_on: value)
|
102
|
+
end
|
103
|
+
end
|
104
|
+
```
|
105
|
+
|
106
|
+
Caoutsearch provides different types of filters to handle different types of data or ways to search them:
|
107
|
+
|
108
|
+
##### Default filter
|
109
|
+
|
110
|
+
##### Boolean filter
|
111
|
+
|
112
|
+
##### Date filter
|
113
|
+
|
114
|
+
For a date filter defined like this:
|
115
|
+
```ruby
|
116
|
+
class ArticleSearch < Caoutsearch::Search::Base
|
117
|
+
...
|
118
|
+
|
119
|
+
filter :published_on, as: :date
|
120
|
+
end
|
121
|
+
```
|
122
|
+
|
123
|
+
You can now search the matching index with the `published_on` criterion:
|
124
|
+
```ruby
|
125
|
+
Article.search(published_on: Date.today)
|
126
|
+
```
|
127
|
+
|
128
|
+
and the following query will be generated and sent to Elasticsearch:
|
129
|
+
```json
|
130
|
+
{
|
131
|
+
"query": {
|
132
|
+
"bool": {
|
133
|
+
"filter": [
|
134
|
+
{ "range": { "published_on": { "gte": "2022-11-23", "lte": "2022-11-23"}}}
|
135
|
+
]
|
136
|
+
}
|
137
|
+
}
|
138
|
+
}
|
139
|
+
```
|
140
|
+
|
141
|
+
The date filter accepts multiple types of arguments:
|
142
|
+
|
143
|
+
```ruby
|
144
|
+
# Search for articles published on a date:
|
145
|
+
Article.search(published_on: Date.today)
|
146
|
+
|
147
|
+
# Search for articles published before a date:
|
148
|
+
Article.search(published_on: { less_than: "2022-12-25" })
|
149
|
+
Article.search(published_on: { less_than_or_equal: "2022-12-25" })
|
150
|
+
Article.search(published_on: ..Date.new(2022, 12, 25))
|
151
|
+
Article.search(published_on: [[nil, "now-2w/d"]])
|
152
|
+
|
153
|
+
# Search for articles published after a date:
|
154
|
+
Article.search(published_on: { greater_than: "2022-12-25" })
|
155
|
+
Article.search(published_on: { greater_than_or_equal: "2022-12-25" })
|
156
|
+
Article.search(published_on: Date.new(2022, 12, 25)..)
|
157
|
+
Article.search(published_on: [["now-1w/d", nil]])
|
158
|
+
|
159
|
+
# Search for articles published between two dates:
|
160
|
+
Article.search(published_on: { greater_than: "2022-12-25", less_than: "2023-12-25" })
|
161
|
+
Article.search(published_on: Date.new(2022, 12, 25)..Date.new(2023, 12, 25))
|
162
|
+
Article.search(published_on: [["now-1w/d", "now/d"]])
|
163
|
+
```
|
164
|
+
|
165
|
+
Dates of various formats are handled:
|
166
|
+
```ruby
|
167
|
+
"2022-10-11"
|
168
|
+
Date.today
|
169
|
+
Time.zone.now
|
170
|
+
```
|
171
|
+
|
172
|
+
We also support Elasticsearch's date math:
|
173
|
+
```ruby
|
174
|
+
"now-1h"
|
175
|
+
"now+2w/d"
|
176
|
+
```
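To make the accepted argument shapes more concrete, here is a minimal plain-Ruby sketch of how a date criterion could be normalized into an Elasticsearch `range` clause. `date_range_clause`, `range_bounds` and `OPERATORS` are hypothetical names written for this illustration, not Caoutsearch's implementation. The sketch assumes that an exclusive range end maps to `lt`, and it does not cover the nested array form.

```ruby
require "date"

# Hypothetical mapping from the operator names used above to range operators.
OPERATORS = {
  less_than: :lt,
  less_than_or_equal: :lte,
  greater_than: :gt,
  greater_than_or_equal: :gte
}.freeze

# Hypothetical helper: converts a Ruby Range into range bounds.
# Beginless and endless ranges simply omit the missing bound.
def range_bounds(range)
  bounds = {}
  bounds[:gte] = range.begin.to_s if range.begin
  bounds[range.exclude_end? ? :lt : :lte] = range.end.to_s if range.end
  bounds
end

# Hypothetical helper: normalizes a date criterion into a `range` clause.
def date_range_clause(field, value)
  bounds =
    case value
    when Range
      range_bounds(value)
    when Hash
      value.to_h { |operator, date| [OPERATORS.fetch(operator), date.to_s] }
    else
      # A single value matches the whole day or instant it designates.
      { gte: value.to_s, lte: value.to_s }
    end

  { range: { field => bounds } }
end

date_range_clause(:published_on, Date.new(2022, 11, 23))
# => { range: { published_on: { gte: "2022-11-23", lte: "2022-11-23" } } }
date_range_clause(:published_on, { less_than: "2022-12-25" })
# => { range: { published_on: { lt: "2022-12-25" } } }
```

Date math strings such as `"now-1h"` pass through the `else` branch unchanged, since they are already valid range bounds for Elasticsearch.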
|
177
|
+
|
178
|
+
##### GeoPoint filter
|
179
|
+
|
180
|
+
##### Match filter
|
181
|
+
|
182
|
+
##### Range filter
|
183
|
+
|
184
|
+
#### Aggregations
|
185
|
+
|
186
|
+
You can define simple to complex aggregations.
|
187
|
+
|
188
|
+
````ruby
|
189
|
+
class ArticleSearch < Caoutsearch::Search::Base
|
190
|
+
has_aggregation :view_count, sum: { field: :view_count }
|
191
|
+
has_aggregation :popular_tags,
|
192
|
+
filter: { term: { published: true } },
|
193
|
+
aggs: {
|
194
|
+
published: {
|
195
|
+
terms: { field: :tags, size: 10 }
|
196
|
+
}
|
197
|
+
}
|
198
|
+
end
|
199
|
+
````
|
200
|
+
|
201
|
+
Then you can request one or more aggregations at the same time or chain the `aggregate` method.
|
202
|
+
The `aggregations` method triggers a request and returns a [Response::Aggregations](#responses).
|
203
|
+
|
204
|
+
````ruby
|
205
|
+
ArticleSearch.aggregate(:view_count).aggregations
|
206
|
+
# ArticleSearch Search { "body": { "aggs": { "view_count": { "sum": { "field": "view_count" }}}}}
|
207
|
+
# ArticleSearch Search (10ms / took 5ms)
|
208
|
+
=> #<Caoutsearch::Response::Aggregations view_count=#<Caoutsearch::Response::Response value=119652>>
|
209
|
+
|
210
|
+
ArticleSearch.aggregate(:view_count, :popular_tags).aggregations
|
211
|
+
# ArticleSearch Search { "body": { "aggs": { "view_count": {…}, "popular_tags": {…}}}}
|
212
|
+
# ArticleSearch Search (10ms / took 5ms)
|
213
|
+
=> #<Caoutsearch::Response::Aggregations view_count=#<Caoutsearch::Response::Response value=119652> popular_tags=#<Caoutsearch::Response::Response buckets=…>>
|
214
|
+
|
215
|
+
ArticleSearch.aggregate(:view_count).aggregate(:popular_tags).aggregations
|
216
|
+
# ArticleSearch Search { "body": { "aggs": { "view_count": {…}, "popular_tags": {…}}}}
|
217
|
+
# ArticleSearch Search (10ms / took 5ms)
|
218
|
+
=> #<Caoutsearch::Response::Aggregations view_count=#<Caoutsearch::Response::Response value=119652> popular_tags=#<Caoutsearch::Response::Response buckets=…>>
|
219
|
+
````
|
220
|
+
|
221
|
+
You can create powerful aggregations using blocks and pass arguments to them.
|
222
|
+
|
223
|
+
````ruby
|
224
|
+
class ArticleSearch < Caoutsearch::Search::Base
|
225
|
+
has_aggregation :popular_tags_since do |date|
|
226
|
+
raise TypeError unless date.is_a?(Date)
|
227
|
+
|
228
|
+
query.aggregations[:popular_tags_since] = {
|
229
|
+
filter: { range: { publication_date: { gte: date.to_s } } },
|
230
|
+
aggs: {
|
231
|
+
published: {
|
232
|
+
terms: { field: :tags, size: 20 }
|
233
|
+
}
|
234
|
+
}
|
235
|
+
}
|
236
|
+
end
|
237
|
+
end
|
238
|
+
|
239
|
+
ArticleSearch.aggregate(popular_tags_since: 1.day.ago.to_date).aggregations
|
240
|
+
# ArticleSearch Search { "body": { "aggs": { "popular_tags_since": {…}}}}
|
241
|
+
# ArticleSearch Search (10ms / took 5ms)
|
242
|
+
=> #<Caoutsearch::Response::Aggregations popular_tags_since=#<Caoutsearch::Response::Response …
|
243
|
+
````
|
244
|
+
|
245
|
+
Only one argument can be passed to an aggregation block.
|
246
|
+
Use an Array or a Hash if you need to pass multiple options.
|
247
|
+
|
248
|
+
````ruby
|
249
|
+
class ArticleSearch < Caoutsearch::Search::Base
|
250
|
+
has_aggregation :popular_tags_since do |options|
|
251
|
+
# …
|
252
|
+
end
|
253
|
+
|
254
|
+
has_aggregation :popular_tags_between do |(first_date, end_date)|
|
255
|
+
# …
|
256
|
+
end
|
257
|
+
end
|
258
|
+
|
259
|
+
ArticleSearch.aggregate(popular_tags_since: { date: 1.day.ago, size: 20 })
|
260
|
+
ArticleSearch.aggregate(popular_tags_between: [date1, date2])
|
261
|
+
````
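The single-argument rule relies on ordinary Ruby block semantics. The plain-Ruby sketch below uses lambdas as stand-ins for aggregation blocks (the names `since_block` and `between_block` are illustrative only) to show how a Hash carries named options and how parenthesized parameters destructure an Array.

```ruby
require "date"

# A single Hash argument carries named options.
since_block = lambda do |options|
  "since #{options[:date]} (top #{options.fetch(:size, 10)})"
end

# A single Array argument is destructured by the parenthesized parameters.
between_block = lambda do |(first_date, end_date)|
  "between #{first_date} and #{end_date}"
end

since_block.call({ date: Date.new(2022, 12, 25), size: 20 })
# => "since 2022-12-25 (top 20)"
between_block.call([Date.new(2022, 1, 1), Date.new(2022, 12, 31)])
# => "between 2022-01-01 and 2022-12-31"
```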
|
262
|
+
|
263
|
+
Finally, you can create a "catch-all" aggregation to handle cumbersome behaviors:
|
264
|
+
|
265
|
+
````ruby
|
266
|
+
class ArticleSearch < Caoutsearch::Search::Base
|
267
|
+
has_aggregation do |name, options = {}|
|
268
|
+
raise "unexpected_error" unless name.match(/^view_count_(?<year>\d{4})$/)
|
269
|
+
|
270
|
+
query.aggregations[name] = {
|
271
|
+
filter: { term: { year: $LAST_MATCH_INFO[:year] } },
|
272
|
+
aggs: {
|
273
|
+
filtered: {
|
274
|
+
sum: { field: :view_count }
|
275
|
+
}
|
276
|
+
}
|
277
|
+
}
|
278
|
+
end
|
279
|
+
end
|
280
|
+
|
281
|
+
ArticleSearch.aggregate(:view_count_2020, :view_count_2019).aggregations
|
282
|
+
# ArticleSearch Search { "body": { "aggs": { "view_count_2020": {…}, "view_count_2019": {…}}}}
|
283
|
+
# ArticleSearch Search (10ms / took 5ms)
|
284
|
+
=> #<Caoutsearch::Response::Aggregations view_count_2020=#<Caoutsearch::Response::Response …
|
285
|
+
````
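A catch-all block usually needs to extract part of the aggregation name. As a plain-Ruby sketch (the `year_from` helper and `PATTERN` constant are hypothetical), named captures can be read from the `MatchData` returned by `match`. Note that `match?` only returns a boolean and does not populate `$~`.

```ruby
# Named captures are read from the MatchData returned by `match`.
PATTERN = /\Aview_count_(?<year>\d{4})\z/

# Hypothetical helper: extracts the year from an aggregation name.
def year_from(name)
  match = PATTERN.match(name.to_s)
  raise ArgumentError, "unexpected aggregation: #{name}" unless match

  match[:year]
end

year_from(:view_count_2020) # => "2020"
```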
|
286
|
+
|
287
|
+
#### Transform aggregations
|
288
|
+
|
289
|
+
When using [buckets aggregation](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket.html) and/or [pipeline aggregation](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-pipeline.html), the path to the expected values can get complicated and become subject to unexpected changes for a public API.
|
290
|
+
|
291
|
+
````ruby
|
292
|
+
ArticleSearch.aggregate(popular_tags_since: 1.month.ago).aggregations.popular_tags_since.published.buckets.pluck(:key)
|
293
|
+
=> ["Blog", "Tech", …]
|
294
|
+
````
|
295
|
+
|
296
|
+
Instead, you can define transformations to provide simpler access to aggregated data:
|
297
|
+
|
298
|
+
````ruby
|
299
|
+
class ArticleSearch < Caoutsearch::Search::Base
|
300
|
+
has_aggregation :popular_tags_since do |since|
|
301
|
+
# …
|
302
|
+
end
|
303
|
+
|
304
|
+
transform_aggregation :popular_tags_since do |aggs|
|
305
|
+
aggs.dig(:popular_tags_since, :published, :buckets).pluck(:key)
|
306
|
+
end
|
307
|
+
end
|
308
|
+
|
309
|
+
ArticleSearch.aggregate(popular_tags_since: 1.month.ago).aggregations.popular_tags_since
|
310
|
+
=> ["Blog", "Tech", …]
|
311
|
+
````
|
312
|
+
|
313
|
+
You can also use transformations to combine multiple aggregations:
|
314
|
+
|
315
|
+
````ruby
|
316
|
+
class ArticleSearch < Caoutsearch::Search::Base
|
317
|
+
has_aggregation :blog_count, filter: { term: { category: "blog" } }
|
318
|
+
has_aggregation :archives_count, filter: { term: { archived: true } }
|
319
|
+
|
320
|
+
transform_aggregation :stats, from: %i[blog_count archives_count] do |aggs|
|
321
|
+
{
|
322
|
+
blog_count: aggs.dig(:blog_count, :doc_count),
|
323
|
+
archives_count: aggs.dig(:archives_count, :doc_count)
|
324
|
+
}
|
325
|
+
end
|
326
|
+
end
|
327
|
+
|
328
|
+
ArticleSearch.aggregate(:stats).aggregations.stats
|
329
|
+
# ArticleSearch Search { "body": { "aggs": { "blog_count": {…}, "archives_count": {…}}}}
|
330
|
+
# ArticleSearch Search (10ms / took 5ms)
|
331
|
+
=> { blog_count: 124, archives_count: 2452 }
|
332
|
+
````
|
333
|
+
|
334
|
+
This is also useful to unify the API between different search engines:
|
335
|
+
|
336
|
+
````ruby
|
337
|
+
class ArticleSearch < Caoutsearch::Search::Base
|
338
|
+
has_aggregation :popular_tags,
|
339
|
+
filter: { term: { published: true } },
|
340
|
+
aggs: { published: { terms: { field: :tags, size: 10 } } }
|
341
|
+
|
342
|
+
transform_aggregation :popular_tags do |aggs|
|
343
|
+
aggs.dig(:popular_tags, :published, :buckets).pluck(:key)
|
344
|
+
end
|
345
|
+
end
|
346
|
+
|
347
|
+
class TagSearch < Caoutsearch::Search::Base
|
348
|
+
has_aggregation :popular_tags,
|
349
|
+
terms: { field: "label", size: 20, order: { used_count: "desc" } }
|
350
|
+
|
351
|
+
transform_aggregation :popular_tags do |aggs|
|
352
|
+
aggs.dig(:popular_tags, :buckets).pluck(:key)
|
353
|
+
end
|
354
|
+
end
|
355
|
+
|
356
|
+
ArticleSearch.aggregate(:popular_tags).aggregations.popular_tags
|
357
|
+
=> ["Blog", "Tech", …]
|
358
|
+
|
359
|
+
TagSearch.aggregate(:popular_tags).aggregations.popular_tags
|
360
|
+
=> ["Tech", "Blog", …]
|
361
|
+
````
|
362
|
+
|
363
|
+
Transformations are performed on demand and the result is memoized. That means:
|
364
|
+
- the result of a transformation is not visible in the [Response::Aggregations](#responses) output.
|
365
|
+
- the block is called only once for the same search instance.
|
366
|
+
|
367
|
+
````ruby
|
368
|
+
class ArticleSearch < Caoutsearch::Search::Base
|
369
|
+
has_aggregation :popular_tags, …
|
370
|
+
|
371
|
+
transform_aggregation :popular_tags do |aggs|
|
372
|
+
tags = aggs.dig(:popular_tags, :published, :buckets).pluck(:key)
|
373
|
+
authorized = Tag.where(title: tags, authorize: true).pluck(:title)
|
374
|
+
tags & authorized
|
375
|
+
end
|
376
|
+
end
|
377
|
+
|
378
|
+
article_search = ArticleSearch.aggregate(:popular_tags)
|
379
|
+
=> #<ArticleSearch current_aggregations: [:popular_tags]>
|
380
|
+
|
381
|
+
article_search.aggregations
|
382
|
+
# ArticleSearch Search (10ms / took 5ms)
|
383
|
+
=> #<Caoutsearch::Response::Aggregations popular_tags=#<Caoutsearch::Response::Response doc_count=100 …
|
384
|
+
|
385
|
+
article_search.aggregations.popular_tags
|
386
|
+
# (10.2ms) SELECT "tags"."title" FROM "tags" WHERE "tags"."title" IN …
|
387
|
+
=> ["Blog", "Tech", …]
|
388
|
+
|
389
|
+
article_search.aggregations.popular_tags
|
390
|
+
=> ["Blog", "Tech", …]
|
391
|
+
|
392
|
+
article_search.search("Tech").aggregations.popular_tags
|
393
|
+
# ArticleSearch Search (10ms / took 5ms)
|
394
|
+
# (10.2ms) SELECT "tags"."title" FROM "tags" WHERE "tags"."title" IN …
|
395
|
+
=> ["Blog", "Tech", …]
|
396
|
+
````
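The on-demand, memoized behavior can be pictured with a small plain-Ruby sketch. `MemoizedTransformations` is a hypothetical stand-in for the search instance, not a Caoutsearch class: it stores transformation blocks, runs one on first access, and caches the result for later calls.

```ruby
# Hypothetical stand-in for a search instance: transformations are stored as
# blocks, run on first access, and cached for later calls.
class MemoizedTransformations
  def initialize(raw_aggregations, transformations)
    @raw = raw_aggregations
    @transformations = transformations
    @cache = {}
  end

  def fetch(name)
    return @cache[name] if @cache.key?(name)

    @cache[name] = @transformations.fetch(name).call(@raw)
  end
end

calls = 0
raw = { popular_tags: { buckets: [{ key: "Blog" }, { key: "Tech" }] } }
transformations = {
  popular_tags: lambda do |aggs|
    calls += 1 # counts how many times the block actually runs
    aggs.dig(:popular_tags, :buckets).map { |bucket| bucket[:key] }
  end
}

aggregations = MemoizedTransformations.new(raw, transformations)
aggregations.fetch(:popular_tags) # => ["Blog", "Tech"]
aggregations.fetch(:popular_tags) # => ["Blog", "Tech"], served from the cache
calls                             # => 1
```

A new search instance would start with an empty cache, which matches the extra request and SQL query shown after chaining `search("Tech")`.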
|
397
|
+
|
398
|
+
Be careful to avoid using `aggregations.<aggregation_name>` inside a transformation block: it can lead to an infinite recursion.
|
399
|
+
|
400
|
+
````ruby
|
401
|
+
class ArticleSearch < Caoutsearch::Search::Base
|
402
|
+
transform_aggregation :popular_tags do
|
403
|
+
aggregations.popular_tags.buckets.pluck("key")
|
404
|
+
end
|
405
|
+
end
|
406
|
+
|
407
|
+
ArticleSearch.aggregate(:popular_tags).aggregations.popular_tags
|
408
|
+
Traceback (most recent call last):
|
409
|
+
4: from app/searches/article_search.rb:3:in `block in <class:ArticleSearch>'
|
410
|
+
3: from app/searches/article_search.rb:3:in `block in <class:ArticleSearch>'
|
411
|
+
2: from app/searches/article_search.rb:3:in `block in <class:ArticleSearch>'
|
412
|
+
1: from app/searches/article_search.rb:3:in `block in <class:ArticleSearch>'
|
413
|
+
SystemStackError (stack level too deep)
|
414
|
+
````
|
415
|
+
|
416
|
+
Instead, use the argument passed to the block: it is a shortcut for `response.aggregations`, which is a [Response::Response](#responses) and not a [Response::Aggregations](#responses).
|
417
|
+
|
418
|
+
````ruby
|
419
|
+
class ArticleSearch < Caoutsearch::Search::Base
|
420
|
+
transform_aggregation :popular_tags do |aggs|
|
421
|
+
aggs.popular_tags.buckets.pluck("key")
|
422
|
+
end
|
423
|
+
end
|
424
|
+
|
425
|
+
ArticleSearch.aggregate(:popular_tags).aggregations.popular_tags
|
426
|
+
=> ["Blog", "Tech", …]
|
427
|
+
````
|
428
|
+
|
429
|
+
One last helpful argument is `track_total_hits`, which lets you perform calculations over aggregations using the `total_count` method without sending a second request.
|
430
|
+
Take a look at [Total count](#total-count) to understand why a second request could be performed.
|
431
|
+
|
432
|
+
````ruby
|
433
|
+
class ArticleSearch < Caoutsearch::Search::Base
|
434
|
+
has_aggregation :tagged, filter: { exists: { field: "tag" } }
|
435
|
+
|
436
|
+
transform_aggregation :tagged_rate, from: :tagged, track_total_hits: true do |aggs|
|
437
|
+
count = aggs.dig(:tagged, :doc_count)
|
438
|
+
count.to_f / total_count
|
439
|
+
end
|
440
|
+
|
441
|
+
transform_aggregation :tagged_rate_without_track_total_hits, from: :tagged do |aggs|
|
442
|
+
count = aggs.dig(:tagged, :doc_count)
|
443
|
+
count.to_f / total_count
|
444
|
+
end
|
445
|
+
end
|
446
|
+
|
447
|
+
ArticleSearch.aggregate(:tagged_rate).aggregations.tagged_rate
|
448
|
+
# ArticleSearch Search { "body": { "track_total_hits": true, "aggs": { "tagged": {…}}}}
|
449
|
+
# ArticleSearch Search (10ms / took 5ms)
|
450
|
+
=> 0.95
|
451
|
+
|
452
|
+
ArticleSearch.aggregate(:tagged_rate_without_track_total_hits).aggregations.tagged_rate_without_track_total_hits
|
453
|
+
# ArticleSearch Search { "body": { "aggs": { "tagged": {…}}}}
|
454
|
+
# ArticleSearch Search (10ms / took 5ms)
|
455
|
+
# ArticleSearch Search { "body": { "track_total_hits": true, "aggs": { "tagged":
|
456
|
+
# ArticleSearch Search (10ms / took 5ms)
|
457
|
+
=> 0.95
|
458
|
+
````
|
459
|
+
|
460
|
+
#### Responses
|
461
|
+
|
462
|
+
After the request has been sent by calling a method such as `load`, `response` or `hits`, the result is wrapped in a `Response::Response` object, which provides method access to its properties via [Hashie::Mash](http://github.com/intridea/hashie).
|
463
|
+
|
464
|
+
Aggregations and suggestions are wrapped in their own respective subclasses of `Response::Response`:
|
465
|
+
|
466
|
+
````ruby
|
467
|
+
search.response
|
468
|
+
=> #<Caoutsearch::Response::Response _shards=#<Caoutsearch::Response::Response failed=0 skipped=0 successful=5 total=5> hits=…
|
469
|
+
|
470
|
+
search.hits
|
471
|
+
=> #<Hashie::Array [#<Caoutsearch::Response::Response _id="2"…
|
472
|
+
|
473
|
+
search.aggregations
|
474
|
+
=> #<Caoutsearch::Response::Aggregations view_count=#<Caoutsearch::Response::Response…
|
475
|
+
|
476
|
+
search.suggestions
|
477
|
+
=> #<Caoutsearch::Response::Suggestions tags=#<Caoutsearch::Response::Response…
|
478
|
+
````
|
479
|
+
|
480
|
+
##### Loading records
|
481
|
+
|
482
|
+
When calling `records`, the search engine will try to load records from a model using the same class name without the `Search` suffix:
|
483
|
+
* `ArticleSearch` > `Article`
|
484
|
+
* `Blog::ArticleSearch` > `Blog::Article`
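
The naming convention above amounts to dropping a trailing `Search` from the class name. A minimal sketch, assuming a hypothetical `model_name_for` helper that is not part of Caoutsearch:

```ruby
# Hypothetical helper: derives the model name from a search class name
# by removing a trailing "Search".
def model_name_for(search_class_name)
  search_class_name.delete_suffix("Search")
end

model_name_for("ArticleSearch")       # => "Article"
model_name_for("Blog::ArticleSearch") # => "Blog::Article"
```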
|
485
|
+
|
486
|
+
````ruby
|
487
|
+
ArticleSearch.new.records.first
|
488
|
+
# ArticleSearch Search (10ms / took 5ms)
|
489
|
+
# Article Load (9.6ms) SELECT "articles".* FROM "articles" WHERE "articles"."id" IN (1, …
|
490
|
+
=> #<Article id: 1, …>
|
491
|
+
````
|
492
|
+
|
493
|
+
However, you can define an alternative model to load records. This might be helpful when using [single table inheritance](https://api.rubyonrails.org/classes/ActiveRecord/Inheritance.html).
|
494
|
+
|
495
|
+
````ruby
|
496
|
+
ArticleSearch.new.records(use: BlogArticle).first
|
497
|
+
# ArticleSearch Search (10ms / took 5ms)
|
498
|
+
# BlogArticle Load (9.6ms) SELECT "articles".* FROM "articles" WHERE "articles"."id" IN (1, …
|
499
|
+
=> #<BlogArticle id: 1, …>
|
500
|
+
````
|
501
|
+
|
502
|
+
You can also define an alternative model at class level:
|
503
|
+
|
504
|
+
````ruby
|
505
|
+
class BlogArticleSearch < Caoutsearch::Search::Base
|
506
|
+
self.model_name = "Article"
|
507
|
+
|
508
|
+
default do
|
509
|
+
query.filters << { term: { category: "blog" } }
|
510
|
+
end
|
511
|
+
end
|
512
|
+
|
513
|
+
BlogArticleSearch.new.records.first
|
514
|
+
# BlogArticleSearch Search (10ms / took 5ms)
|
515
|
+
# Article Load (9.6ms) SELECT "articles".* FROM "articles" WHERE "articles"."id" IN (1, …
|
516
|
+
=> #<Article id: 1, …>
|
517
|
+
````
|
518
|
+
|
519
|
+
### Model integration
|
520
|
+
|
521
|
+
#### Add Caoutsearch to your models
|
522
|
+
|
523
|
+
The simplest solution is to add `Caoutsearch::Model` to your model and then link the appropriate `Index` and/or `Search` engines:
|
524
|
+
|
525
|
+
```ruby
|
526
|
+
class Article < ActiveRecord::Base
|
527
|
+
include Caoutsearch::Model
|
528
|
+
|
529
|
+
index_with ArticleIndex
|
530
|
+
search_with ArticleSearch
|
531
|
+
end
|
532
|
+
```
|
533
|
+
|
534
|
+
If you don't need your models to be both `Indexable` and `Searchable`, you can include only one of the following two modules:
|
535
|
+
|
536
|
+
````ruby
|
537
|
+
class Article < ActiveRecord::Base
|
538
|
+
include Caoutsearch::Model::Indexable
|
539
|
+
|
540
|
+
index_with ArticleIndex
|
541
|
+
end
|
542
|
+
````
|
543
|
+
or
|
544
|
+
````ruby
|
545
|
+
class Article < ActiveRecord::Base
|
546
|
+
include Caoutsearch::Model::Searchable
|
547
|
+
|
548
|
+
search_with ArticleSearch
|
549
|
+
end
|
550
|
+
````
|
551
|
+
|
552
|
+
The modules can be safely included in the meta model `ApplicationRecord`.
|
553
|
+
Indexing & searching features are not available until you call `index_with` or `search_with`:
|
554
|
+
|
555
|
+
````ruby
|
556
|
+
class ApplicationRecord < ActiveRecord::Base
|
557
|
+
include Caoutsearch::Model
|
558
|
+
end
|
559
|
+
````
|
560
|
+
|
561
|
+
#### Index records
|
562
|
+
|
563
|
+
##### Index multiple records
|
564
|
+
|
565
|
+
Import all your records or a restricted scope of records to Elasticsearch.
|
566
|
+
|
567
|
+
````ruby
|
568
|
+
Article.reindex
|
569
|
+
Article.where(published: true).reindex
|
570
|
+
````
|
571
|
+
|
572
|
+
You can update one or more properties (see [Index Engine](#index-engine) to read more about properties):
|
573
|
+
|
574
|
+
````ruby
|
575
|
+
Article.reindex(:category)
|
576
|
+
Article.reindex(%i[category published_on])
|
577
|
+
````
|
578
|
+
|
579
|
+
When `reindex` is called without properties, it'll import the full document to ES.
|
580
|
+
On the contrary, when properties are passed, it'll only update existing documents.
|
581
|
+
You can control this behavior with the `method` argument.
|
582
|
+
|
583
|
+
````ruby
|
584
|
+
Article.where(id: 123).reindex(:category)
|
585
|
+
# ArticleIndex Reindex {"index":"articles","body":[{"update":{"_id":123}},{"doc":{"category":"blog"}}]}
|
586
|
+
# [Error] {"update"=>{"_index"=>"articles", "_id"=>"123", "status"=>404, "error"=>{"type"=>"document_missing_exception", …}}
|
587
|
+
|
588
|
+
Article.where(id: 123).reindex(:category, method: :index)
|
589
|
+
# ArticleIndex Reindex {"index":"articles","body":[{"index":{"_id":123}},{"category":"blog"}]}
|
590
|
+
|
591
|
+
Article.where(id: 123).reindex(method: :update)
|
592
|
+
# ArticleIndex Reindex {"index":"articles","body":[{"update":{"_id":123}},{"doc":{…}}]}
|
593
|
+
````
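The difference between the two methods is visible in the bulk payload. The sketch below uses a hypothetical `bulk_body` helper, which is not Caoutsearch's implementation, to reproduce the two shapes shown in the logs above: `index` sends the document as-is, while `update` wraps it in a `doc` key and fails when the document does not exist yet.

```ruby
require "json"

# Hypothetical helper reproducing the bulk payload shapes shown in the logs.
def bulk_body(id, document, method: :index)
  case method
  when :index
    # Creates or replaces the whole document.
    [{ index: { _id: id } }, document]
  when :update
    # Merges the given fields into an existing document.
    [{ update: { _id: id } }, { doc: document }]
  else
    raise ArgumentError, "unsupported method: #{method.inspect}"
  end
end

puts JSON.generate(bulk_body(123, { category: "blog" }, method: :update))
# prints [{"update":{"_id":123}},{"doc":{"category":"blog"}}]
puts JSON.generate(bulk_body(123, { category: "blog" }, method: :index))
# prints [{"index":{"_id":123}},{"category":"blog"}]
```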
|
594
|
+
|
595
|
+
##### Index single records
|
596
|
+
|
597
|
+
Import a single record.
|
598
|
+
|
599
|
+
````ruby
|
600
|
+
Article.find(123).update_index
|
601
|
+
````
|
602
|
+
|
603
|
+
You can update one or more properties (see [Index Engine](#index-engine) to read more about properties):
|
604
|
+
|
605
|
+
````ruby
|
606
|
+
Article.find(123).update_index(:category)
|
607
|
+
Article.find(123).update_index(%i[category published_on])
|
608
|
+
````
|
609
|
+
|
610
|
+
You can verify if and how documents are indexed.
|
611
|
+
If the document is missing in ES, it'll raise an `Elastic::Transport::Transport::Errors::NotFound`.
|
612
|
+
|
613
|
+
````ruby
|
614
|
+
Article.find(123).indexed_document
|
615
|
+
# Traceback (most recent call last):
|
616
|
+
# 1: from (irb):1
|
617
|
+
# Elastic::Transport::Transport::Errors::NotFound ([404] {"_index":"articles","_id":"123","found":false})
|
618
|
+
|
619
|
+
Article.find(123).update_index
|
620
|
+
Article.find(123).indexed_document
|
621
|
+
=> {"_index"=>"articles", "_id"=>"123", "_version"=>1, "found"=>true, "_source"=>{…}}
|
622
|
+
````
|
623
|
+
|
624
|
+
##### Delete documents
|
625
|
+
|
626
|
+
You can delete one or more documents.
|
627
|
+
**Note**: it won't delete records from the database, only documents from the ES indice.
|
628
|
+
|
629
|
+
````ruby
|
630
|
+
Article.delete_indexes
|
631
|
+
Article.where(id: 123).delete_indexed_documents
|
632
|
+
Article.find(123).delete_index
|
633
|
+
````
|
634
|
+
|
635
|
+
If a record is already deleted from the database, you can still delete its document.
|
636
|
+
|
637
|
+
````ruby
|
638
|
+
Article.delete_index(123)
|
639
|
+
````
|
640
|
+
|
641
|
+
##### Automatic Callbacks
|
642
|
+
|
643
|
+
Callbacks are not provided by Caoutsearch but they are very easy to add:
|
644
|
+
|
645
|
+
````ruby
|
646
|
+
class Article < ApplicationRecord
|
647
|
+
index_with ArticleIndex
|
648
|
+
|
649
|
+
after_commit :update_index, on: %i[create update]
|
650
|
+
after_commit :delete_index, on: %i[destroy]
|
651
|
+
end
|
652
|
+
````
|
653
|
+
|
654
|
+
##### Asynchronous methods
|
655
|
+
|
656
|
+
TODO
|
657
|
+
|
658
|
+
#### Search for records
|
659
|
+
|
660
|
+
##### Search API
|
661
|
+
Searching is pretty simple.
|
662
|
+
|
663
|
+
````ruby
|
664
|
+
Article.search("Quick brown fox")
|
665
|
+
=> #<ArticleSearch current_criteria: ["Quick brown fox"]>
|
666
|
+
````
|
667
|
+
|
668
|
+
You can chain criteria and many other parameters:
|
669
|
+
````ruby
|
670
|
+
Article.search("Quick brown fox").search(published: true)
|
671
|
+
=> #<ArticleSearch current_criteria: ["Quick brown fox", {"published"=>true}]>
|
672
|
+
|
673
|
+
Article.search("Quick brown fox").order(:publication_date)
|
674
|
+
=> #<ArticleSearch current_criteria: ["Quick brown fox"], current_order: :publication_date>
|
675
|
+
|
676
|
+
Article.search("Quick brown fox").limit(100).offset(100)
|
677
|
+
=> #<ArticleSearch current_criteria: ["Quick brown fox"], current_limit: 100, current_offset: 100>
|
678
|
+
|
679
|
+
Article.search("Quick brown fox").page(1).per(100)
|
680
|
+
=> #<ArticleSearch current_criteria: ["Quick brown fox"], current_page: 1, current_limit: 100>
|
681
|
+
|
682
|
+
Article.search("Quick brown fox").aggregate(:tags).aggregate(:dates)
|
683
|
+
=> #<ArticleSearch current_criteria: ["Quick brown fox"], current_aggregations: [:tags, :dates]>
|
684
|
+
````
|
685
|
+
|
686
|
+
##### Pagination
|
687
|
+
|
688
|
+
Search results can be paginated.
|
689
|
+
````ruby
|
690
|
+
search = Article.search("Quick brown fox").page(1).per(100)
|
691
|
+
search.current_page
|
692
|
+
=> 1
|
693
|
+
|
694
|
+
search.total_pages
|
695
|
+
=> 2546
|
696
|
+
|
697
|
+
search.total_count
|
698
|
+
=> 254514
|
699
|
+
````
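
The figures above are consistent with ceiling division of the total count by the page size. A minimal plain-Ruby sketch of that arithmetic follows; the `total_pages_for` helper is hypothetical and is not part of Caoutsearch:

```ruby
# Hypothetical helper, for illustration only: not part of Caoutsearch.
# Number of pages needed to show `total_count` results, `per_page` at a time.
def total_pages_for(total_count, per_page)
  raise ArgumentError, "per_page must be positive" unless per_page.positive?

  # fdiv forces float division so that a partial last page still counts as a page
  total_count.fdiv(per_page).ceil
end

total_pages_for(254_514, 100) # => 2546
```

The last page is only partially filled here: 2545 full pages of 100 results, plus one page with the remaining 14.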
|
700
|
+
|
701
|
+
##### Total count
|
702
|
+
|
703
|
+
By default, [ES doesn't return the total number of hits](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-your-data.html#track-total-hits). So, when you call `total_count` or `total_pages`, a second request might be sent to ES.
|
704
|
+
To avoid a second roundtrip, use `track_total_hits`:
|
705
|
+
|
706
|
+
````ruby
|
707
|
+
search = Article.search("Quick brown fox")
|
708
|
+
search.hits
|
709
|
+
# ArticleSearch Search {…}
|
710
|
+
# ArticleSearch Search (81.8ms / took 16ms)
|
711
|
+
=> […]
|
712
|
+
|
713
|
+
search.total_count
|
714
|
+
# ArticleSearch Search {…, track_total_hits: true }
|
715
|
+
# ArticleSearch Search (135.3ms / took 76ms)
|
716
|
+
=> 276
|
717
|
+
|
718
|
+
search = Article.search("Quick brown fox").track_total_hits
|
719
|
+
search.hits
|
720
|
+
# ArticleSearch Search {…, track_total_hits: true }
|
721
|
+
# ArticleSearch Search (120.2ms / took 56ms)
|
722
|
+
=> […]
|
723
|
+
|
724
|
+
search.total_count
|
725
|
+
=> 276
|
726
|
+
````
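
For reference, `track_total_hits` is a standard parameter of the Elasticsearch search API. The sketch below builds, in plain Ruby, the kind of request body the log lines above suggest. The query Caoutsearch actually generates is elided in those logs, so the `match` clause and the `content` field are assumptions made purely for illustration:

```ruby
require "json"

# Illustrative only: the query part is an assumption, not what Caoutsearch builds.
body = {
  query: {match: {content: "Quick brown fox"}}, # hypothetical field name
  track_total_hits: true # ask Elasticsearch for an exact total in the same request
}

puts JSON.pretty_generate(body)
```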
|
727
|
+
|
728
|
+
##### Iterating results
|
729
|
+
|
730
|
+
Several methods are provided to loop through a collection of hits or records.
|
731
|
+
These methods process batches in the most efficient way: [PIT search_after](https://www.elastic.co/guide/en/elasticsearch/reference/current/paginate-search-results.html#search-after).
|
732
|
+
|
733
|
+
* `find_each_hit` to yield each hit returned by Elasticsearch.
|
734
|
+
* `find_each_record` to yield each record from your database.
|
735
|
+
* `find_hits_in_batches` to yield each batch of hits as returned by Elasticsearch.
|
736
|
+
* `find_records_in_batches` to yield each batch of records from the database.
|
737
|
+
|
738
|
+
Example:
|
739
|
+
|
740
|
+
```ruby
|
741
|
+
Article.search(published: true).find_each_record do |record|
|
742
|
+
record.inspect
|
743
|
+
end
|
744
|
+
```
|
745
|
+
|
746
|
+
The `keep_alive` parameter tells Elasticsearch how long it should keep the point in time alive. Defaults to 1 minute.
|
747
|
+
|
748
|
+
```ruby
|
749
|
+
Article.search(published: true).find_each_record(keep_alive: "2h")
|
750
|
+
```
|
751
|
+
|
752
|
+
To specify the batch size, use the chainable `per` method or the `batch_size` parameter. Defaults to 1000.
|
753
|
+
|
754
|
+
```ruby
|
755
|
+
Article.search(published: true).find_records_in_batches(batch_size: 500)
|
756
|
+
Article.search(published: true).per(500).find_records_in_batches
|
757
|
+
```
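
To give an intuition for `search_after`-style batching, here is a small in-memory sketch: each batch starts right after the sort value of the last document already seen, instead of using an offset. It does not talk to Elasticsearch, and `each_batch_after` is a hypothetical helper, not a Caoutsearch method:

```ruby
# In-memory illustration of search_after-style batching; no Elasticsearch involved.
# `each_batch_after` is a hypothetical helper, not a Caoutsearch method.
def each_batch_after(docs, batch_size:)
  sorted = docs.sort_by { |doc| doc[:id] }
  cursor = nil # sort value of the last document seen so far

  loop do
    # Keep only documents strictly after the cursor, then take one batch
    remaining = cursor ? sorted.select { |doc| doc[:id] > cursor } : sorted
    batch = remaining.first(batch_size)
    break if batch.empty?

    yield batch
    cursor = batch.last[:id]
  end
end

docs = (1..5).map { |i| {id: i} }
each_batch_after(docs, batch_size: 2) { |batch| p batch.map { |doc| doc[:id] } }
# prints [1, 2], then [3, 4], then [5]
```

Because the cursor is a sort value rather than a position, the cost of fetching a batch does not grow with how deep into the result set you are, which is the main advantage over `offset`-based paging.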
|
758
|
+
|
759
|
+
## Testing with Caoutsearch
|
760
|
+
|
761
|
+
Caoutsearch offers a few methods to stub Elasticsearch requests.
|
762
|
+
You first need to add [webmock](https://github.com/bblimke/webmock) to your Gemfile.
|
763
|
+
|
764
|
+
```bash
|
765
|
+
bundle add webmock
|
766
|
+
```
|
767
|
+
|
768
|
+
Then, add `Caoutsearch::Testing::MockRequests` to your test suite.
|
769
|
+
The examples below use RSpec, but the module should be compatible with other test frameworks.
|
770
|
+
|
771
|
+
```ruby
|
772
|
+
# spec/spec_helper.rb
|
773
|
+
|
774
|
+
require "caoutsearch/testing"
|
775
|
+
|
776
|
+
RSpec.configure do |config|
|
777
|
+
config.include Caoutsearch::Testing::MockRequests
|
778
|
+
end
|
779
|
+
```
|
780
|
+
|
781
|
+
You can then call the following methods:
|
782
|
+
|
783
|
+
```ruby
|
784
|
+
RSpec.describe SomeClass do
|
785
|
+
before do
|
786
|
+
stub_elasticsearch_request(:head, "articles").to_return(status: 200)
|
787
|
+
|
788
|
+
stub_elasticsearch_request(:get, "_cat/indices?format=json&h=index").to_return_json([
|
789
|
+
{ index: "ca_locals_v14" }
|
790
|
+
])
|
791
|
+
|
792
|
+
stub_elasticsearch_reindex_request("articles")
|
793
|
+
stub_elasticsearch_search_request("articles", [
|
794
|
+
{"_id" => "135", "_source" => {"name" => "Hello World"}},
|
795
|
+
{"_id" => "137", "_source" => {"name" => "Hello World"}}
|
796
|
+
])
|
797
|
+
end
|
798
|
+
|
799
|
+
# ... do your tests...
|
800
|
+
end
|
801
|
+
```
|
10
802
|
|
11
|
-
|
803
|
+
`stub_elasticsearch_search_request` accepts an array of records:
|
12
804
|
|
13
|
-
|
805
|
+
```ruby
|
806
|
+
RSpec.describe SomeClass do
|
807
|
+
let(:articles) { create_list(:article, 5) }
|
14
808
|
|
15
|
-
|
809
|
+
before do
|
810
|
+
stub_elasticsearch_search_request("articles", articles)
|
811
|
+
end
|
812
|
+
|
813
|
+
# ... do your tests...
|
814
|
+
end
|
815
|
+
```
|
816
|
+
|
817
|
+
It also lets you shim the total number of hits returned:
|
818
|
+
|
819
|
+
```ruby
|
820
|
+
RSpec.describe SomeClass do
|
821
|
+
before do
|
822
|
+
stub_elasticsearch_search_request("articles", [], total: 250)
|
823
|
+
end
|
824
|
+
|
825
|
+
# ... do your tests...
|
826
|
+
end
|
827
|
+
```
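
For context, Elasticsearch 7 and later report the total in the `hits.total` object of a search response. The sketch below builds such a body in plain Ruby to show what a stubbed response could look like. It is an assumption about the shape, not the actual payload produced by `stub_elasticsearch_search_request`, and `fake_search_response` is a hypothetical helper:

```ruby
require "json"

# Hypothetical helper, for illustration only: not part of Caoutsearch.
# Builds a minimal Elasticsearch-like search response body.
def fake_search_response(hits, total: hits.size)
  {
    "hits" => {
      "total" => {"value" => total, "relation" => "eq"},
      "hits" => hits
    }
  }
end

response = fake_search_response([], total: 250)
puts JSON.generate(response)
response.dig("hits", "total", "value") # => 250
```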
|
16
828
|
|
17
829
|
## Contributing
|
18
830
|
|
@@ -27,9 +839,10 @@ bundle add caoutsearch
|
|
27
839
|
```bash
|
28
840
|
bundle exec rspec
|
29
841
|
bundle exec rubocop
|
842
|
+
bundle exec standardrb
|
30
843
|
```
|
31
844
|
|
32
|
-
|
845
|
+
All of them can be run with:
|
33
846
|
|
34
847
|
```bash
|
35
848
|
bundle exec rake
|