caoutsearch 0.0.1 → 0.0.4
- checksums.yaml +4 -4
- data/README.md +816 -7
- data/lib/caoutsearch/config/mappings.rb +1 -1
- data/lib/caoutsearch/filter/base.rb +11 -7
- data/lib/caoutsearch/filter/boolean.rb +1 -1
- data/lib/caoutsearch/filter/date.rb +93 -22
- data/lib/caoutsearch/filter/default.rb +10 -10
- data/lib/caoutsearch/filter/geo_point.rb +1 -1
- data/lib/caoutsearch/filter/match.rb +5 -5
- data/lib/caoutsearch/filter/none.rb +1 -1
- data/lib/caoutsearch/filter/range.rb +6 -6
- data/lib/caoutsearch/index/document.rb +11 -11
- data/lib/caoutsearch/index/indice_versions.rb +3 -3
- data/lib/caoutsearch/index/internal_dsl.rb +3 -3
- data/lib/caoutsearch/index/reindex.rb +11 -11
- data/lib/caoutsearch/index/scoping.rb +3 -3
- data/lib/caoutsearch/index/serialization.rb +13 -13
- data/lib/caoutsearch/instrumentation/base.rb +12 -12
- data/lib/caoutsearch/instrumentation/search.rb +11 -2
- data/lib/caoutsearch/mappings.rb +1 -1
- data/lib/caoutsearch/model/indexable.rb +57 -0
- data/lib/caoutsearch/model/searchable.rb +31 -0
- data/lib/caoutsearch/model.rb +12 -0
- data/lib/caoutsearch/response/aggregations.rb +50 -0
- data/lib/caoutsearch/response/response.rb +9 -0
- data/lib/caoutsearch/response/suggestions.rb +9 -0
- data/lib/caoutsearch/response.rb +6 -0
- data/lib/caoutsearch/search/adapter/active_record.rb +39 -0
- data/lib/caoutsearch/search/base.rb +16 -15
- data/lib/caoutsearch/search/batch/scroll.rb +93 -0
- data/lib/caoutsearch/search/batch/search_after.rb +71 -0
- data/lib/caoutsearch/search/batch_methods.rb +63 -0
- data/lib/caoutsearch/search/callbacks.rb +28 -0
- data/lib/caoutsearch/search/delete_methods.rb +19 -0
- data/lib/caoutsearch/search/dsl/item.rb +2 -2
- data/lib/caoutsearch/search/inspect.rb +34 -0
- data/lib/caoutsearch/search/instrumentation.rb +19 -0
- data/lib/caoutsearch/search/internal_dsl.rb +107 -0
- data/lib/caoutsearch/search/naming.rb +45 -0
- data/lib/caoutsearch/search/point_in_time.rb +28 -0
- data/lib/caoutsearch/search/query/boolean.rb +4 -4
- data/lib/caoutsearch/search/query/nested.rb +1 -1
- data/lib/caoutsearch/search/query/setters.rb +4 -4
- data/lib/caoutsearch/search/query_builder/aggregations.rb +49 -0
- data/lib/caoutsearch/search/query_builder.rb +89 -0
- data/lib/caoutsearch/search/query_methods.rb +157 -0
- data/lib/caoutsearch/search/records.rb +23 -0
- data/lib/caoutsearch/search/resettable.rb +38 -0
- data/lib/caoutsearch/search/response.rb +97 -0
- data/lib/caoutsearch/search/sanitizer.rb +2 -2
- data/lib/caoutsearch/search/search_methods.rb +239 -0
- data/lib/caoutsearch/search/type_cast.rb +14 -6
- data/lib/caoutsearch/search/value.rb +10 -10
- data/lib/caoutsearch/search/value_overflow.rb +1 -1
- data/lib/caoutsearch/settings.rb +1 -1
- data/lib/caoutsearch/testing/mock_requests.rb +105 -0
- data/lib/caoutsearch/testing.rb +3 -0
- data/lib/caoutsearch/version.rb +1 -1
- data/lib/caoutsearch.rb +10 -5
- metadata +45 -127
- data/lib/caoutsearch/search/search/delete_methods.rb +0 -21
- data/lib/caoutsearch/search/search/inspect.rb +0 -36
- data/lib/caoutsearch/search/search/instrumentation.rb +0 -21
- data/lib/caoutsearch/search/search/internal_dsl.rb +0 -77
- data/lib/caoutsearch/search/search/naming.rb +0 -47
- data/lib/caoutsearch/search/search/query_builder.rb +0 -94
- data/lib/caoutsearch/search/search/query_methods.rb +0 -180
- data/lib/caoutsearch/search/search/resettable.rb +0 -35
- data/lib/caoutsearch/search/search/response.rb +0 -88
- data/lib/caoutsearch/search/search/scroll_methods.rb +0 -113
- data/lib/caoutsearch/search/search/search_methods.rb +0 -230
data/README.md
CHANGED
@@ -2,21 +2,829 @@
[![Gem Version](https://badge.fury.io/rb/caoutsearch.svg)](https://rubygems.org/gems/caoutsearch)
[![CI Status](https://github.com/mon-territoire/caoutsearch/actions/workflows/ci.yml/badge.svg)](https://github.com/mon-territoire/caoutsearch/actions/workflows/ci.yml)
[![Ruby Style Guide](https://img.shields.io/badge/code_style-standard-brightgreen.svg)](https://github.com/testdouble/standard)
[![Maintainability](https://api.codeclimate.com/v1/badges/fbe73db3fd8be9a10e12/maintainability)](https://codeclimate.com/github/mon-territoire/caoutsearch/maintainability)
[![Test Coverage](https://api.codeclimate.com/v1/badges/fbe73db3fd8be9a10e12/test_coverage)](https://codeclimate.com/github/mon-territoire/caoutsearch/test_coverage)

[![JRuby](https://github.com/mon-territoire/caoutsearch/actions/workflows/jruby.yml/badge.svg)](https://github.com/mon-territoire/caoutsearch/actions/workflows/jruby.yml)
[![Truffle Ruby](https://github.com/mon-territoire/caoutsearch/actions/workflows/truffle_ruby.yml/badge.svg)](https://github.com/mon-territoire/caoutsearch/actions/workflows/truffle_ruby.yml)

**!! Gem under development before public release !!**

Caoutsearch is a new Elasticsearch integration for Ruby and/or Rails.
It provides a simple but powerful DSL to perform complex indexing and searching, while securely exposing search criteria to a public and chainable API, without overwhelming your models.

Caoutsearch only supports Elasticsearch 8.x right now.
It is used in production in a robust application, updated and maintained for several years at [Mon Territoire](https://mon-territoire.fr).

Caoutsearch was inspired by awesome gems such as [elasticsearch-rails](https://github.com/elastic/elasticsearch-rails) or [search_flip](https://github.com/mrkamel/search_flip).
If you don't have scenarios as complex as those described in this documentation, those gems may suit your needs better.

## Table of Contents

- [Installation](#installation)
- [Configuration](#configuration)
  - Instrumentation
- [Usage](#usage)
  - [Indice Configuration](#indice-configuration)
    - Mapping & settings
    - Text analysis
    - Versioning
  - [Index Engine](#index-engine)
    - Properties
    - Partial updates
    - Eager loading
    - Interdependencies
  - [Search Engine](#search-engine)
    - Queries
    - [Filters](#filters)
    - Full-text query
    - Custom filters
    - Orders
    - [Aggregations](#aggregations)
    - [Transform aggregations](#transform-aggregations)
    - [Responses](#responses)
    - [Loading records](#loading-records)
  - [Model integration](#model-integration)
    - [Add Caoutsearch to your models](#add-caoutsearch-to-your-models)
    - [Index records](#index-records)
      - [Index multiple records](#index-multiple-records)
      - [Index single records](#index-single-records)
      - [Delete documents](#delete-documents)
      - [Automatic Callbacks](#automatic-callbacks)
      - Asynchronous methods
    - [Search for records](#search-for-records)
      - [Search API](#search-api)
      - [Pagination](#pagination)
      - [Total count](#total-count)
      - [Iterating results](#iterating-results)
- [Testing with Caoutsearch](#testing-with-caoutsearch)

## Installation

```bash
bundle add caoutsearch
```

## Configuration

TODO

## Usage

### Indice Configuration

TODO

### Index Engine

TODO

### Search Engine

#### Filters

Filters declared in the search engine define how Caoutsearch builds the queries.

The main use of filters is to expose a field for search, but they can also be used to build more complex queries:
```ruby
class ArticleSearch < Caoutsearch::Search::Base
  # Build a filter on the author field
  filter :author

  # Build a Match filter on multiple fields
  filter :content, indexes: %i[title.words content], as: :match

  # Build a more complex filter by using other filters
  filter :published, as: :boolean
  filter :published_on, as: :date
  filter :active do |value|
    search_by(published: value, published_on: value)
  end
end
```

Caoutsearch provides different types of filters to handle different types of data or ways to search them:

##### Default filter

##### Boolean filter

##### Date filter

For a date filter defined like this:
```ruby
class ArticleSearch < Caoutsearch::Search::Base
  ...

  filter :published_on, as: :date
end
```

You can now search the matching index with the `published_on` criterion:
```ruby
Article.search(published_on: Date.today)
```

and the following query will be generated and sent to Elasticsearch:
```json
{
  "query": {
    "bool": {
      "filter": [
        { "range": { "published_on": { "gte": "2022-11-23", "lte": "2022-11-23" }}}
      ]
    }
  }
}
```

The date filter accepts multiple types of arguments:

```ruby
# Search for articles published on a date:
Article.search(published_on: Date.today)

# Search for articles published before a date:
Article.search(published_on: { less_than: "2022-12-25" })
Article.search(published_on: { less_than_or_equal: "2022-12-25" })
Article.search(published_on: ..Date.new(2022, 12, 25))
Article.search(published_on: [[nil, "now-2w/d"]])

# Search for articles published after a date:
Article.search(published_on: { greater_than: "2022-12-25" })
Article.search(published_on: { greater_than_or_equal: "2022-12-25" })
Article.search(published_on: Date.new(2022, 12, 25)..)
Article.search(published_on: [["now-1w/d", nil]])

# Search for articles published between two dates:
Article.search(published_on: { greater_than: "2022-12-25", less_than: "2023-12-25" })
Article.search(published_on: Date.new(2022, 12, 25)..Date.new(2023, 12, 25))
Article.search(published_on: [["now-1w/d", "now/d"]])
```
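To make the link between these argument forms and the generated `range` clause more concrete, here is a plain-Ruby sketch. It is not Caoutsearch's implementation: the helper `date_range_clause` is hypothetical, and mapping `less_than`/`greater_than` (and their `_or_equal` variants) to the `lt`/`gt`/`lte`/`gte` range operators is an assumption made for illustration.

```ruby
require "date"

# Assumed mapping from the README's hash keys to Elasticsearch range operators.
RANGE_OPERATORS = {
  less_than: :lt,
  less_than_or_equal: :lte,
  greater_than: :gt,
  greater_than_or_equal: :gte
}.freeze

# Hypothetical helper (not Caoutsearch's code): turns one date criterion into an
# Elasticsearch `range` clause. Values are Date objects or strings
# (ISO dates or date math such as "now-1w/d").
def date_range_clause(field, value)
  bounds =
    case value
    when Hash
      value.to_h { |key, bound| [RANGE_OPERATORS.fetch(key.to_sym), bound.to_s] }
    when Range
      # A nil begin or end (`..date`, `date..`) leaves that side unbounded.
      upper = value.exclude_end? ? :lt : :lte
      { gte: value.begin&.to_s, upper => value.end&.to_s }.compact
    when Array
      # A [[from, to]] pair where nil means "unbounded".
      from, to = value.first
      { gte: from&.to_s, lte: to&.to_s }.compact
    else
      # A single value is used as both bounds, as in the query shown above.
      { gte: value.to_s, lte: value.to_s }
    end

  { range: { field => bounds } }
end

puts date_range_clause(:published_on, Date.new(2022, 11, 23)).inspect
puts date_range_clause(:published_on, { less_than: "2022-12-25" }).inspect
puts date_range_clause(:published_on, Range.new(Date.new(2022, 12, 25), nil)).inspect
puts date_range_clause(:published_on, [["now-1w/d", nil]]).inspect
```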

Dates of various formats are handled:
```ruby
"2022-10-11"
Date.today
Time.zone.now
```

Elasticsearch's date math is also supported:
```ruby
"now-1h"
"now+2w/d"
```

##### GeoPoint filter

##### Match filter

##### Range filter

#### Aggregations

You can define simple to complex aggregations.

````ruby
class ArticleSearch < Caoutsearch::Search::Base
  has_aggregation :view_count, sum: { field: :view_count }
  has_aggregation :popular_tags,
    filter: { term: { published: true } },
    aggs: {
      published: {
        terms: { field: :tags, size: 10 }
      }
    }
end
````

Then you can request one or more aggregations at the same time, or chain the `aggregate` method.
The `aggregations` method triggers a request and returns a [Response::Aggregations](#responses).

````ruby
ArticleSearch.aggregate(:view_count).aggregations
# ArticleSearch Search { "body": { "aggs": { "view_count": { "sum": { "field": "view_count" }}}}}
# ArticleSearch Search (10ms / took 5ms)
=> #<Caoutsearch::Response::Aggregations view_count=#<Caoutsearch::Response::Response value=119652>>

ArticleSearch.aggregate(:view_count, :popular_tags).aggregations
# ArticleSearch Search { "body": { "aggs": { "view_count": {…}, "popular_tags": {…}}}}
# ArticleSearch Search (10ms / took 5ms)
=> #<Caoutsearch::Response::Aggregations view_count=#<Caoutsearch::Response::Response value=119652> popular_tags=#<Caoutsearch::Response::Response buckets=…>>

ArticleSearch.aggregate(:view_count).aggregate(:popular_tags).aggregations
# ArticleSearch Search { "body": { "aggs": { "view_count": {…}, "popular_tags": {…}}}}
# ArticleSearch Search (10ms / took 5ms)
=> #<Caoutsearch::Response::Aggregations view_count=#<Caoutsearch::Response::Response value=119652> popular_tags=#<Caoutsearch::Response::Response buckets=…>>
````
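Conceptually, requesting aggregations by name amounts to picking the matching definitions and placing them under the `aggs` key of the request body, as the logged bodies above show. The plain-Ruby sketch below illustrates that idea with a hypothetical `aggregations_body` helper and a hash of definitions copied from the example; it is not how Caoutsearch stores or merges aggregations internally.

```ruby
require "json"

# Hypothetical registry mirroring the `has_aggregation` definitions above.
AGGREGATION_DEFINITIONS = {
  view_count: { sum: { field: :view_count } },
  popular_tags: {
    filter: { term: { published: true } },
    aggs: { published: { terms: { field: :tags, size: 10 } } }
  }
}.freeze

# Builds the `aggs` part of a search body from the requested aggregation names.
# Chained `aggregate` calls could simply concatenate names before calling it.
def aggregations_body(*names)
  unknown = names - AGGREGATION_DEFINITIONS.keys
  raise ArgumentError, "unknown aggregations: #{unknown.join(", ")}" if unknown.any?

  { aggs: AGGREGATION_DEFINITIONS.slice(*names) }
end

puts JSON.generate(aggregations_body(:view_count))
# prints {"aggs":{"view_count":{"sum":{"field":"view_count"}}}}
```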

You can create powerful aggregations using blocks and pass arguments to them.

````ruby
class ArticleSearch < Caoutsearch::Search::Base
  has_aggregation :popular_tags_since do |date|
    raise TypeError unless date.is_a?(Date)

    query.aggregations[:popular_tags_since] = {
      filter: { range: { publication_date: { gte: date.to_s } } },
      aggs: {
        published: {
          terms: { field: :tags, size: 20 }
        }
      }
    }
  end
end

ArticleSearch.aggregate(popular_tags_since: 1.day.ago.to_date).aggregations
# ArticleSearch Search { "body": { "aggs": { "popular_tags_since": {…}}}}
# ArticleSearch Search (10ms / took 5ms)
=> #<Caoutsearch::Response::Aggregations popular_tags_since=#<Caoutsearch::Response::Response …
````

Only one argument can be passed to an aggregation block.
Use an Array or a Hash if you need to pass multiple options.

````ruby
class ArticleSearch < Caoutsearch::Search::Base
  has_aggregation :popular_tags_since do |options|
    # …
  end

  has_aggregation :popular_tags_between do |(first_date, end_date)|
    # …
  end
end

ArticleSearch.aggregate(popular_tags_since: { date: 1.day.ago, size: 20 })
ArticleSearch.aggregate(popular_tags_between: [date1, date2])
````
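Passing an Array works because Ruby blocks can destructure a single Array argument with parenthesized parameters, as in `|(first_date, end_date)|`. This plain-Ruby snippet, independent of Caoutsearch (the constant names are only for illustration), shows both styles with `Proc#call`:

```ruby
require "date"

# A block that receives a single Hash of options.
SINCE_BLOCK = proc do |options|
  [options.fetch(:date), options.fetch(:size, 10)]
end

# A block that destructures a single Array argument into two parameters.
BETWEEN_BLOCK = proc do |(first_date, end_date)|
  (end_date - first_date).to_i
end

puts SINCE_BLOCK.call({ date: Date.new(2022, 11, 1), size: 20 }).last # prints 20
puts BETWEEN_BLOCK.call([Date.new(2022, 11, 1), Date.new(2022, 11, 30)]) # prints 29
```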

Finally, you can create a "catch-all" aggregation to handle cumbersome behaviors:

````ruby
class ArticleSearch < Caoutsearch::Search::Base
  has_aggregation do |name, options = {}|
    match = name.to_s.match(/\Aview_count_(?<year>\d{4})\z/)
    raise ArgumentError, "unexpected aggregation: #{name}" unless match

    query.aggregations[name] = {
      filter: { term: { year: match[:year] } },
      aggs: {
        filtered: {
          sum: { field: :view_count }
        }
      }
    }
  end
end

ArticleSearch.aggregate(:view_count_2020, :view_count_2019).aggregations
# ArticleSearch Search { "body": { "aggs": { "view_count_2020": {…}, "view_count_2019": {…}}}}
# ArticleSearch Search (10ms / took 5ms)
=> #<Caoutsearch::Response::Aggregations view_count_2020=#<Caoutsearch::Response::Response …
````
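The name check relies on `String#match`, which returns a `MatchData` holding the named captures. `String#match?` only returns a boolean and does not set `$~` (`$LAST_MATCH_INFO`), so captures cannot be read after it. The plain-Ruby sketch below isolates that logic in a hypothetical `view_count_aggregation` helper that builds the same aggregation hash as the block above:

```ruby
# Names such as :view_count_2020; \A and \z anchor the whole name.
VIEW_COUNT_PATTERN = /\Aview_count_(?<year>\d{4})\z/

# Hypothetical helper mirroring the catch-all block: returns the aggregation
# definition for a dynamic name, or raises for names it does not handle.
def view_count_aggregation(name)
  # String#match returns MatchData (or nil); String#match? would only return a boolean.
  match = name.to_s.match(VIEW_COUNT_PATTERN)
  raise ArgumentError, "unexpected aggregation: #{name}" unless match

  {
    filter: { term: { year: match[:year] } },
    aggs: { filtered: { sum: { field: :view_count } } }
  }
end

puts view_count_aggregation(:view_count_2020).inspect
```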

#### Transform aggregations

When using [bucket aggregations](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket.html) and/or [pipeline aggregations](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-pipeline.html), the path to the expected values can get complicated and become subject to unexpected changes in a public API.

````ruby
ArticleSearch.aggregate(popular_tags_since: 1.month.ago).aggregations.popular_tags_since.published.buckets.pluck(:key)
=> ["Blog", "Tech", …]
````

Instead, you can define transformations to provide simpler access to aggregated data:

````ruby
class ArticleSearch < Caoutsearch::Search::Base
  has_aggregation :popular_tags_since do |since|
    # …
  end

  transform_aggregation :popular_tags_since do |aggs|
    aggs.dig(:popular_tags_since, :published, :buckets).pluck(:key)
  end
end

ArticleSearch.aggregate(popular_tags_since: 1.month.ago).aggregations.popular_tags_since
=> ["Blog", "Tech", …]
````
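A transformation is essentially a function from the raw aggregations payload to a simpler value. The sketch below applies the same `dig` path to a hand-written Hash shaped like a filter aggregation wrapping a terms aggregation (the counts are made up). It uses `map` instead of ActiveSupport's `pluck` so that it runs in plain Ruby, and `popular_tag_keys` is a hypothetical name:

```ruby
# Made-up payload shaped like the raw aggregations of :popular_tags_since
# (a filter aggregation wrapping a terms aggregation named :published).
POPULAR_TAGS_AGGS = {
  popular_tags_since: {
    doc_count: 42,
    published: {
      buckets: [
        { key: "Blog", doc_count: 30 },
        { key: "Tech", doc_count: 12 }
      ]
    }
  }
}.freeze

# Same path as the transformation block; `map` stands in for ActiveSupport's `pluck`.
def popular_tag_keys(aggs)
  aggs.dig(:popular_tags_since, :published, :buckets).map { |bucket| bucket[:key] }
end

puts popular_tag_keys(POPULAR_TAGS_AGGS).inspect # prints ["Blog", "Tech"]
```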

You can also use transformations to combine multiple aggregations:

````ruby
class ArticleSearch < Caoutsearch::Search::Base
  has_aggregation :blog_count, filter: { term: { category: "blog" } }
  has_aggregation :archives_count, filter: { term: { archived: true } }

  transform_aggregation :stats, from: %i[blog_count archives_count] do |aggs|
    {
      blog_count: aggs.dig(:blog_count, :doc_count),
      archives_count: aggs.dig(:archives_count, :doc_count)
    }
  end
end

ArticleSearch.aggregate(:stats).aggregations.stats
# ArticleSearch Search { "body": { "aggs": { "blog_count": {…}, "archives_count": {…}}}}
# ArticleSearch Search (10ms / took 5ms)
=> { blog_count: 124, archives_count: 2452 }
````

This is also useful to unify the API across different search engines:

````ruby
class ArticleSearch < Caoutsearch::Search::Base
  has_aggregation :popular_tags,
    filter: { term: { published: true } },
    aggs: { published: { terms: { field: :tags, size: 10 } } }

  transform_aggregation :popular_tags do |aggs|
    aggs.dig(:popular_tags, :published, :buckets).pluck(:key)
  end
end

class TagSearch < Caoutsearch::Search::Base
  has_aggregation :popular_tags,
    terms: { field: "label", size: 20, order: { used_count: "desc" } }

  transform_aggregation :popular_tags do |aggs|
    aggs.dig(:popular_tags, :buckets).pluck(:key)
  end
end

ArticleSearch.aggregate(:popular_tags).aggregations.popular_tags
=> ["Blog", "Tech", …]

TagSearch.aggregate(:popular_tags).aggregations.popular_tags
=> ["Tech", "Blog", …]
````

Transformations are performed on demand and the result is memoized. This means:
- the result of the transformation is not visible in the [Response::Aggregations](#responses) output.
- the block is called only once for the same search instance.

````ruby
class ArticleSearch < Caoutsearch::Search::Base
  has_aggregation :popular_tags, …

  transform_aggregation :popular_tags do |aggs|
    tags = aggs.dig(:popular_tags, :published, :buckets).pluck(:key)
    authorized = Tag.where(title: tags, authorize: true).pluck(:title)
    tags & authorized
  end
end

article_search = ArticleSearch.aggregate(:popular_tags)
=> #<ArticleSearch current_aggregations: [:popular_tags]>

article_search.aggregations
# ArticleSearch Search (10ms / took 5ms)
=> #<Caoutsearch::Response::Aggregations popular_tags=#<Caoutsearch::Response::Response doc_count=100 …

article_search.aggregations.popular_tags
# (10.2ms) SELECT "tags"."title" FROM "tags" WHERE "tags"."title" IN …
=> ["Blog", "Tech", …]

article_search.aggregations.popular_tags
=> ["Blog", "Tech", …]

article_search.search("Tech").aggregations.popular_tags
# ArticleSearch Search (10ms / took 5ms)
# (10.2ms) SELECT "tags"."title" FROM "tags" WHERE "tags"."title" IN …
=> ["Blog", "Tech", …]
````
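The on-demand, compute-once behavior described above is ordinary per-instance memoization. A plain-Ruby sketch of the pattern, using a hypothetical `TransformedAggregations` class and a call log that exists only to show that each block runs once per instance:

```ruby
# Hypothetical holder: runs each transformation block at most once per instance.
class TransformedAggregations
  def initialize(raw, transformations)
    @raw = raw                         # raw aggregations payload
    @transformations = transformations # name => callable
    @cache = {}                        # name => memoized result
  end

  def [](name)
    # `key?` (rather than `||=`) also memoizes nil or false results.
    return @cache[name] if @cache.key?(name)

    @cache[name] = @transformations.fetch(name).call(@raw)
  end
end

# Records every execution of the transformation, to show it only runs once.
TRANSFORM_CALLS = []

TRANSFORMATIONS = {
  popular_tags: ->(raw) {
    TRANSFORM_CALLS << :popular_tags
    raw[:tags].sort
  }
}.freeze

AGGREGATIONS = TransformedAggregations.new({ tags: %w[Tech Blog] }, TRANSFORMATIONS)
AGGREGATIONS[:popular_tags]
AGGREGATIONS[:popular_tags] # served from the cache
puts TRANSFORM_CALLS.size   # prints 1
```

A new instance (like `article_search.search("Tech")` above) starts with an empty cache, so the block runs again.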

Be careful to avoid using `aggregations.<aggregation_name>` inside a transformation block: it can lead to infinite recursion.

````ruby
class ArticleSearch < Caoutsearch::Search::Base
  transform_aggregation :popular_tags do
    aggregations.popular_tags.buckets.pluck("key")
  end
end

ArticleSearch.aggregate(:popular_tags).aggregations.popular_tags
Traceback (most recent call last):
4: from app/searches/article_search.rb:3:in `block in <class:ArticleSearch>'
3: from app/searches/article_search.rb:3:in `block in <class:ArticleSearch>'
2: from app/searches/article_search.rb:3:in `block in <class:ArticleSearch>'
1: from app/searches/article_search.rb:3:in `block in <class:ArticleSearch>'
SystemStackError (stack level too deep)
````

Instead, use the argument passed to the block: it is a shortcut for `response.aggregations`, which is a [Response::Response](#responses) and not a [Response::Aggregations](#responses).

````ruby
class ArticleSearch < Caoutsearch::Search::Base
  transform_aggregation :popular_tags do |aggs|
    aggs.popular_tags.buckets.pluck("key")
  end
end

ArticleSearch.aggregate(:popular_tags).aggregations.popular_tags
=> ["Blog", "Tech", …]
````

One last helpful argument is `track_total_hits`, which lets you perform calculations over aggregations using the `total_count` method without sending a second request.
Take a look at [Total count](#total-count) to understand why a second request could be performed.

````ruby
class ArticleSearch < Caoutsearch::Search::Base
  has_aggregation :tagged, filter: { exists: { field: "tag" } }

  transform_aggregation :tagged_rate, from: :tagged, track_total_hits: true do |aggs|
    count = aggs.dig(:tagged, :doc_count)
    count.to_f / total_count
  end

  transform_aggregation :tagged_rate_without_track_total_hits, from: :tagged do |aggs|
    count = aggs.dig(:tagged, :doc_count)
    count.to_f / total_count
  end
end

ArticleSearch.aggregate(:tagged_rate).aggregations.tagged_rate
# ArticleSearch Search { "body": { "track_total_hits": true, "aggs": { "tagged": {…}}}}
# ArticleSearch Search (10ms / took 5ms)
=> 0.95

ArticleSearch.aggregate(:tagged_rate_without_track_total_hits).aggregations.tagged_rate_without_track_total_hits
# ArticleSearch Search { "body": { "aggs": { "tagged": {…}}}}
# ArticleSearch Search (10ms / took 5ms)
# ArticleSearch Search { "body": { "track_total_hits": true, "aggs": { "tagged": {…}}}}
# ArticleSearch Search (10ms / took 5ms)
=> 0.95
````

#### Responses

After the request has been sent by calling a method such as `load`, `response` or `hits`, the result is wrapped in a `Response::Response` class, which provides method access to its properties via [Hashie::Mash](http://github.com/intridea/hashie).

Aggregations and suggestions are wrapped in their own respective subclasses of `Response::Response`.

````ruby
search.response
=> #<Caoutsearch::Response::Response _shards=#<Caoutsearch::Response::Response failed=0 skipped=0 successful=5 total=5> hits=…

search.hits
=> #<Hashie::Array [#<Caoutsearch::Response::Response _id="2"…

search.aggregations
=> #<Caoutsearch::Response::Aggregations view_count=#<Caoutsearch::Response::Response…

search.suggestions
=> #<Caoutsearch::Response::Suggestions tags=#<Caoutsearch::Response::Response…
````

##### Loading records

When calling `records`, the search engine tries to load records from the model whose class name is the search class name without the `Search` suffix:

* `ArticleSearch` → `Article`
* `Blog::ArticleSearch` → `Blog::Article`

````ruby
ArticleSearch.new.records.first
# ArticleSearch Search (10ms / took 5ms)
# Article Load (9.6ms) SELECT "articles".* FROM "articles" WHERE "articles"."id" IN (1, …
=> #<Article id: 1, …>
````
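The naming convention above can be expressed with plain string manipulation. The hypothetical `model_name_for` helper below only sketches that convention; in a Rails app, the resulting string could then be resolved to a class with ActiveSupport's `constantize`.

```ruby
# Hypothetical helper: derives the model name from a search class name
# by removing the trailing "Search" suffix, keeping any namespace.
def model_name_for(search_class_name)
  unless search_class_name.end_with?("Search")
    raise ArgumentError, "#{search_class_name} does not end with Search"
  end

  search_class_name.delete_suffix("Search")
end

puts model_name_for("ArticleSearch")       # prints Article
puts model_name_for("Blog::ArticleSearch") # prints Blog::Article
```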

However, you can define an alternative model to load records. This might be helpful when using [single table inheritance](https://api.rubyonrails.org/classes/ActiveRecord/Inheritance.html).

````ruby
ArticleSearch.new.records(use: BlogArticle).first
# ArticleSearch Search (10ms / took 5ms)
# BlogArticle Load (9.6ms) SELECT "articles".* FROM "articles" WHERE "articles"."id" IN (1, …
=> #<BlogArticle id: 1, …>
````

You can also define an alternative model at class level:

````ruby
class BlogArticleSearch < Caoutsearch::Search::Base
  self.model_name = "Article"

  default do
    query.filters << { term: { category: "blog" } }
  end
end

BlogArticleSearch.new.records.first
# BlogArticleSearch Search (10ms / took 5ms)
# Article Load (9.6ms) SELECT "articles".* FROM "articles" WHERE "articles"."id" IN (1, …
=> #<Article id: 1, …>
````

### Model integration

#### Add Caoutsearch to your models

The simplest solution is to add `Caoutsearch::Model` to your model and then link the appropriate `Index` and/or `Search` engines:

```ruby
class Article < ActiveRecord::Base
  include Caoutsearch::Model

  index_with ArticleIndex
  search_with ArticleSearch
end
```

If you don't need your models to be both `Indexable` and `Searchable`, you can include only one of the following two modules:

````ruby
class Article < ActiveRecord::Base
  include Caoutsearch::Model::Indexable

  index_with ArticleIndex
end
````
or
````ruby
class Article < ActiveRecord::Base
  include Caoutsearch::Model::Searchable

  search_with ArticleSearch
end
````

The modules can be safely included in the base model `ApplicationRecord`.
Indexing and searching features are not available until you call `index_with` or `search_with`:

````ruby
class ApplicationRecord < ActiveRecord::Base
  include Caoutsearch::Model
end
````

#### Index records

##### Index multiple records

Import all your records, or a restricted scope of records, to Elasticsearch.

````ruby
Article.reindex
Article.where(published: true).reindex
````

You can update one or more properties (see [Index Engine](#index-engine) to read more about properties):

````ruby
Article.reindex(:category)
Article.reindex(%i[category published_on])
````

When `reindex` is called without properties, it imports the full document to ES.
Conversely, when properties are passed, it only updates existing documents.
You can control this behavior with the `method` argument.

````ruby
Article.where(id: 123).reindex(:category)
# ArticleIndex Reindex {"index":"articles","body":[{"update":{"_id":123}},{"doc":{"category":"blog"}}]}
# [Error] {"update"=>{"_index"=>"articles", "_id"=>"123", "status"=>404, "error"=>{"type"=>"document_missing_exception", …}}

Article.where(id: 123).reindex(:category, method: :index)
# ArticleIndex Reindex {"index":"articles","body":[{"index":{"_id":123}},{"category":"blog"}]}

Article.where(id: 123).reindex(method: :update)
# ArticleIndex Reindex {"index":"articles","body":[{"update":{"_id":123}},{"doc":{…}}]}
````
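The logged bodies above follow the Elasticsearch bulk API format: each action line (`index` or `update`) is followed by a source line, and partial updates wrap the fields in `doc`. Here is a plain-Ruby sketch that builds such a body; `bulk_body` and `to_ndjson` are hypothetical helpers, not part of Caoutsearch.

```ruby
require "json"

# Hypothetical helper: builds an Elasticsearch bulk body for { id => source } pairs.
# :index replaces (or creates) whole documents; :update patches existing ones and
# fails with document_missing_exception when the document does not exist.
def bulk_body(documents, method:)
  documents.flat_map do |id, source|
    case method
    when :index then [{ index: { _id: id } }, source]
    when :update then [{ update: { _id: id } }, { doc: source }]
    else raise ArgumentError, "unsupported method: #{method.inspect}"
    end
  end
end

# Bulk requests are sent as newline-delimited JSON, ending with a newline.
def to_ndjson(lines)
  lines.map { |line| JSON.generate(line) }.join("\n") + "\n"
end

puts to_ndjson(bulk_body({ 123 => { category: "blog" } }, method: :update))
# prints:
# {"update":{"_id":123}}
# {"doc":{"category":"blog"}}
```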
|
594
|
+
|
595
|
+
##### Index single records
|
596
|
+
|
597
|
+
Import a single record.
|
598
|
+
|
599
|
+
````ruby
|
600
|
+
Article.find(123).update_index
|
601
|
+
````
|
602
|
+
|
603
|
+
You can update one or more properties. (see [Indexation Engines](#indexation-engines) to read more about properties):
|
604
|
+
|
605
|
+
````ruby
|
606
|
+
Article.find(123).update_index(:category)
|
607
|
+
Article.find(123).update_index(%i[category published_on])
|
608
|
+
````
|
609
|
+
|
610
|
+
You can verify if and how documents are indexed.
|
611
|
+
If the document is missing in ES, it'll raise a `Elastic::Transport::Transport::Errors::NotFound`.
|
612
|
+
|
613
|
+
````ruby
|
614
|
+
Article.find(123).indexed_document
|
615
|
+
# Traceback (most recent call last):
|
616
|
+
# 1: from (irb):1
|
617
|
+
# Elastic::Transport::Transport::Errors::NotFound ([404] {"_index":"articles","_id":"123","found":false})
|
618
|
+
|
619
|
+
Article.find(123).update_index
|
620
|
+
Article.find(123).indexed_document
|
621
|
+
=> {"_index"=>"articles", "_id"=>"123", "_version"=>1"found"=>true, "_source"=>{…}}
|
622
|
+
````
|
623
|
+
|
624
|
+
##### Delete documents
|
625
|
+
|
626
|
+
You can delete one or more documents.
|
627
|
+
**Note**: it won't delete records from database, only from the ES indice.
|
628
|
+
|
629
|
+
````ruby
|
630
|
+
Article.delete_indexes
|
631
|
+
Article.where(id: 123).delete_indexed_documents
|
632
|
+
Article.find(123).delete_index
|
633
|
+
````

If a record has already been deleted from the database, you can still delete its document by ID:

````ruby
Article.delete_index(123)
````

##### Automatic callbacks

Caoutsearch doesn't provide callbacks, but they are easy to add:

````ruby
class Article < ApplicationRecord
  index_with ArticleIndex

  after_commit :update_index, on: %i[create update]
  after_commit :delete_index, on: %i[destroy]
end
````

##### Asynchronous methods

TODO

#### Search for records

##### Search API

Searching is straightforward:

````ruby
Article.search("Quick brown fox")
=> #<ArticleSearch current_criteria: ["Quick brown fox"]>
````

You can chain criteria and many other parameters:

````ruby
Article.search("Quick brown fox").search(published: true)
=> #<ArticleSearch current_criteria: ["Quick brown fox", {"published"=>true}]>

Article.search("Quick brown fox").order(:publication_date)
=> #<ArticleSearch current_criteria: ["Quick brown fox"], current_order: :publication_date>

Article.search("Quick brown fox").limit(100).offset(100)
=> #<ArticleSearch current_criteria: ["Quick brown fox"], current_limit: 100, current_offset: 100>

Article.search("Quick brown fox").page(1).per(100)
=> #<ArticleSearch current_criteria: ["Quick brown fox"], current_page: 1, current_limit: 100>

Article.search("Quick brown fox").aggregate(:tags).aggregate(:dates)
=> #<ArticleSearch current_criteria: ["Quick brown fox"], current_aggregations: [:tags, :dates]>
````
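
Each chained call above returns a search object with updated `current_*` values. A common Ruby pattern for this kind of API is an immutable object whose query methods return a modified copy, as ActiveRecord relations do, so a base search can be reused without being changed. The sketch below illustrates that pattern only; `SketchSearch` and its attributes are hypothetical and are not meant to describe Caoutsearch's internals.

```ruby
# Hypothetical sketch (not Caoutsearch's implementation): an immutable,
# chainable search object. Every query method returns a new instance.
class SketchSearch
  attr_reader :criteria, :order_value, :limit_value

  def initialize(criteria: [], order_value: nil, limit_value: nil)
    @criteria = criteria.dup.freeze
    @order_value = order_value
    @limit_value = limit_value
  end

  # Appends criteria; keyword arguments arrive as a positional Hash.
  def search(*values)
    copy(criteria: criteria + values)
  end

  def order(value)
    copy(order_value: value)
  end

  def limit(value)
    copy(limit_value: value)
  end

  private

  # Builds a new instance, overriding only the given attributes.
  def copy(**changes)
    self.class.new(
      criteria: changes.fetch(:criteria, criteria),
      order_value: changes.fetch(:order_value, order_value),
      limit_value: changes.fetch(:limit_value, limit_value)
    )
  end
end

base = SketchSearch.new.search("Quick brown fox")
refined = base.search(published: true).order(:publication_date).limit(100)
p base.criteria # => ["Quick brown fox"]
# `refined` holds both criteria plus the order and limit; `base` is unchanged.
```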

##### Pagination

Search results can be paginated:

````ruby
search = Article.search("Quick brown fox").page(1).per(100)
search.current_page
=> 1

search.total_pages
=> 2546

search.total_count
=> 254514
````
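
The numbers above follow the usual page arithmetic: 254,514 hits at 100 per page give 2,546 pages, the last one holding only 14 hits. The sketch below assumes Caoutsearch uses this convention, with 1-based pages and `page(n)` mapping to an offset of `(n - 1) * per`; `total_pages` and `offset_for` are hypothetical helper names.

```ruby
# Minimal sketch of standard page arithmetic (assumed, not taken from
# Caoutsearch): pages are 1-based and the last page may be partial.
def total_pages(total_count, per_page)
  (total_count + per_page - 1) / per_page # integer ceiling division
end

def offset_for(page, per_page)
  (page - 1) * per_page
end

puts total_pages(254_514, 100) # => 2546
puts offset_for(3, 100)        # => 200
```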

##### Total count

By default, [ES doesn't always return the exact total number of hits](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-your-data.html#track-total-hits). As a result, calling `total_count` or `total_pages` may send a second request to ES.
To avoid this second roundtrip, use `track_total_hits`:

````ruby
search = Article.search("Quick brown fox")
search.hits
# ArticleSearch Search {…}
# ArticleSearch Search (81.8ms / took 16ms)
=> […]

search.total_count
# ArticleSearch Search {…, track_total_hits: true }
# ArticleSearch Search (135.3ms / took 76ms)
=> 276

search = Article.search("Quick brown fox").track_total_hits
search.hits
# ArticleSearch Search {…, track_total_hits: true }
# ArticleSearch Search (120.2ms / took 56ms)
=> […]

search.total_count
=> 276
````
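
The two flows above can be modelled in plain Ruby to show where the extra request comes from. This is a hypothetical sketch, not Caoutsearch's implementation: `FakeClient` and `LazySearch` are made-up names, and the fake client simplifies Elasticsearch by omitting the total entirely unless `track_total_hits` is set.

```ruby
# Stands in for Elasticsearch and records every request body it receives.
class FakeClient
  attr_reader :requests

  def initialize(total)
    @total = total
    @requests = []
  end

  # Simplified search endpoint: the total is included only on request.
  def search(body)
    @requests << body
    hits = {"hits" => []}
    hits["total"] = {"value" => @total, "relation" => "eq"} if body[:track_total_hits]
    {"hits" => hits}
  end
end

class LazySearch
  def initialize(client, track_total_hits: false)
    @client = client
    @track_total_hits = track_total_hits
  end

  def hits
    response["hits"]["hits"]
  end

  # Reuses the total from the first response when present; otherwise
  # sends one extra request with track_total_hits enabled (memoized).
  def total_count
    @total_count ||= begin
      total = response["hits"]["total"]
      total ||= @client.search({track_total_hits: true})["hits"]["total"]
      total["value"]
    end
  end

  private

  def response
    @response ||= @client.search(@track_total_hits ? {track_total_hits: true} : {})
  end
end

client = FakeClient.new(276)
search = LazySearch.new(client)
search.hits
search.total_count   # => 276
client.requests.size # => 2 (the initial search, then the count)
```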

##### Iterating results

Several methods are provided to loop through a collection of hits or records.
These methods process results in batches using the most efficient approach: a [point in time (PIT) with `search_after`](https://www.elastic.co/guide/en/elasticsearch/reference/current/paginate-search-results.html#search-after).

* `find_each_hit` to yield each hit returned by Elasticsearch.
* `find_each_record` to yield each record from your database.
* `find_hits_in_batches` to yield each batch of hits as returned by Elasticsearch.
* `find_records_in_batches` to yield each batch of records from the database.

Example:

```ruby
Article.search(published: true).find_each_record do |record|
  puts record.inspect
end
```
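
To illustrate why `search_after` suits batch iteration, here is a plain-Ruby simulation over an in-memory array; it is not Caoutsearch's implementation, and `Hit`, `fetch_page` and `each_hit_batch` are hypothetical. Instead of an offset, each request passes the sort value of the last hit it received, so later batches cost no more than the first. In real Elasticsearch the sort values must be unique; with a PIT, Elasticsearch adds an implicit `_shard_doc` tiebreaker for that purpose.

```ruby
# Hypothetical simulation of search_after pagination: no Elasticsearch here,
# just an array of hits already sorted by a unique sort value.
Hit = Struct.new(:id, :sort_value)

# Returns up to `size` hits whose sort value is strictly greater than the
# cursor, mimicking a search request with a `search_after` parameter.
def fetch_page(all_hits, size:, search_after: nil)
  candidates = search_after ? all_hits.select { |hit| hit.sort_value > search_after } : all_hits
  candidates.first(size)
end

# Yields batches until a request comes back empty. The cursor is the sort
# value of the last hit of the previous batch, so no offset is needed.
def each_hit_batch(all_hits, batch_size)
  return enum_for(:each_hit_batch, all_hits, batch_size) unless block_given?

  cursor = nil
  loop do
    batch = fetch_page(all_hits, size: batch_size, search_after: cursor)
    break if batch.empty?

    yield batch
    cursor = batch.last.sort_value
  end
end

hits = (1..7).map { |i| Hit.new(i.to_s, i) }
each_hit_batch(hits, 3) { |batch| p batch.map(&:id) }
# Prints ["1", "2", "3"], then ["4", "5", "6"], then ["7"]
```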

The `keep_alive` parameter tells Elasticsearch how long it should keep the point in time alive. It defaults to 1 minute.

```ruby
Article.search(published: true).find_each_record(keep_alive: "2h")
```

To specify the batch size, use the `per` chainable method or the `batch_size` parameter. It defaults to 1000.

```ruby
Article.search(published: true).find_records_in_batches(batch_size: 500)
Article.search(published: true).per(500).find_records_in_batches
```
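
Loading records in batches implies mapping each batch of hits back to database rows. The sketch below shows one common way to do that, assumed rather than taken from Caoutsearch: collect the IDs from the hits, fetch the matching rows in a single lookup, then restore Elasticsearch's ordering, because a SQL `WHERE id IN (...)` query without `ORDER BY` guarantees no particular order. `Record`, `TABLE`, `find_by_ids` and `load_records` are hypothetical.

```ruby
# Hypothetical sketch: load the records for a batch of hits with a single
# lookup, then restore the order returned by Elasticsearch.
Record = Struct.new(:id, :title)

# Stands in for a database table; rows come back in primary-key order.
TABLE = [
  Record.new(1, "First"),
  Record.new(2, "Second"),
  Record.new(3, "Third")
].freeze

# Mimics `WHERE id IN (...)`: returns matching rows in table order,
# not in the order of the given IDs.
def find_by_ids(ids)
  TABLE.select { |record| ids.include?(record.id) }
end

def load_records(hits)
  ids = hits.map { |hit| Integer(hit["_id"]) }
  by_id = find_by_ids(ids).to_h { |record| [record.id, record] }
  # Keep the hit order and skip documents whose record no longer exists.
  ids.filter_map { |id| by_id[id] }
end

batch_hits = [{"_id" => "3"}, {"_id" => "99"}, {"_id" => "1"}]
p load_records(batch_hits).map(&:title) # => ["Third", "First"]
```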

## Testing with Caoutsearch

Caoutsearch offers a few methods to stub Elasticsearch requests.
You first need to add [webmock](https://github.com/bblimke/webmock) to your Gemfile.

```bash
bundle add webmock
```

Then, add `Caoutsearch::Testing::MockRequests` to your test suite.
The examples below use RSpec, but the helpers should be compatible with other test frameworks.

```ruby
# spec/spec_helper.rb

require "caoutsearch/testing"

RSpec.configure do |config|
  config.include Caoutsearch::Testing::MockRequests
end
```

You can then call the following methods:

```ruby
RSpec.describe SomeClass do
  before do
    stub_elasticsearch_request(:head, "articles").to_return(status: 200)

    stub_elasticsearch_request(:get, "_cat/indices?format=json&h=index").to_return_json([
      { index: "ca_locals_v14" }
    ])

    stub_elasticsearch_reindex_request("articles")
    stub_elasticsearch_search_request("articles", [
      {"_id" => "135", "_source" => {"name" => "Hello World"}},
      {"_id" => "137", "_source" => {"name" => "Hello World"}}
    ])
  end

  # ... do your tests...
end
```

`stub_elasticsearch_search_request` also accepts an array of records:

```ruby
RSpec.describe SomeClass do
  let(:articles) { create_list(:article, 5) }

  before do
    stub_elasticsearch_search_request("articles", articles)
  end

  # ... do your tests...
end
```

It also lets you stub the total number of hits returned:

```ruby
RSpec.describe SomeClass do
  before do
    stub_elasticsearch_search_request("articles", [], total: 250)
  end

  # ... do your tests...
end
```
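
To see what these stubs stand in for, the sketch below builds a body shaped like an Elasticsearch search response, where `hits.total.value` carries the total count independently of the `hits.hits` array; this is why a stub can report 250 hits while returning none. It is a hypothetical illustration: we haven't checked exactly which fields `stub_elasticsearch_search_request` returns, and `fake_search_response` is not a Caoutsearch method.

```ruby
require "json"

# Hypothetical helper (not part of Caoutsearch): builds a minimal body
# shaped like an Elasticsearch search response. `total` may differ from
# the number of hits, e.g. to fake pagination over a large result set.
def fake_search_response(index, documents, total: documents.size)
  {
    "took" => 1,
    "timed_out" => false,
    "hits" => {
      "total" => {"value" => total, "relation" => "eq"},
      "hits" => documents.map { |doc| doc.merge("_index" => index) }
    }
  }
end

body = fake_search_response("articles", [{"_id" => "135", "_source" => {"name" => "Hello World"}}], total: 250)
puts JSON.generate(body)
```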

## Contributing

```bash
bundle exec rspec
bundle exec rubocop
bundle exec standardrb
```

All of them can be run with:

```bash
bundle exec rake