caoutsearch 0.0.0 → 0.0.3

Files changed (71)
  1. checksums.yaml +4 -4
  2. data/README.md +819 -6
  3. data/lib/caoutsearch/config/mappings.rb +1 -1
  4. data/lib/caoutsearch/filter/base.rb +11 -7
  5. data/lib/caoutsearch/filter/boolean.rb +1 -1
  6. data/lib/caoutsearch/filter/date.rb +93 -22
  7. data/lib/caoutsearch/filter/default.rb +10 -10
  8. data/lib/caoutsearch/filter/geo_point.rb +1 -1
  9. data/lib/caoutsearch/filter/match.rb +5 -5
  10. data/lib/caoutsearch/filter/none.rb +1 -1
  11. data/lib/caoutsearch/filter/range.rb +6 -6
  12. data/lib/caoutsearch/index/document.rb +11 -11
  13. data/lib/caoutsearch/index/indice_versions.rb +3 -3
  14. data/lib/caoutsearch/index/internal_dsl.rb +3 -3
  15. data/lib/caoutsearch/index/reindex.rb +11 -11
  16. data/lib/caoutsearch/index/scoping.rb +4 -4
  17. data/lib/caoutsearch/index/serialization.rb +13 -13
  18. data/lib/caoutsearch/instrumentation/base.rb +12 -12
  19. data/lib/caoutsearch/instrumentation/search.rb +11 -2
  20. data/lib/caoutsearch/mappings.rb +1 -1
  21. data/lib/caoutsearch/model/indexable.rb +57 -0
  22. data/lib/caoutsearch/model/searchable.rb +31 -0
  23. data/lib/caoutsearch/model.rb +12 -0
  24. data/lib/caoutsearch/response/aggregations.rb +50 -0
  25. data/lib/caoutsearch/response/response.rb +9 -0
  26. data/lib/caoutsearch/response/suggestions.rb +9 -0
  27. data/lib/caoutsearch/response.rb +6 -0
  28. data/lib/caoutsearch/search/adapter/active_record.rb +39 -0
  29. data/lib/caoutsearch/search/base.rb +16 -15
  30. data/lib/caoutsearch/search/batch/scroll.rb +93 -0
  31. data/lib/caoutsearch/search/batch/search_after.rb +70 -0
  32. data/lib/caoutsearch/search/batch_methods.rb +63 -0
  33. data/lib/caoutsearch/search/callbacks.rb +28 -0
  34. data/lib/caoutsearch/search/delete_methods.rb +19 -0
  35. data/lib/caoutsearch/search/dsl/item.rb +2 -2
  36. data/lib/caoutsearch/search/inspect.rb +34 -0
  37. data/lib/caoutsearch/search/instrumentation.rb +19 -0
  38. data/lib/caoutsearch/search/internal_dsl.rb +107 -0
  39. data/lib/caoutsearch/search/naming.rb +45 -0
  40. data/lib/caoutsearch/search/point_in_time.rb +28 -0
  41. data/lib/caoutsearch/search/query/boolean.rb +4 -4
  42. data/lib/caoutsearch/search/query/nested.rb +1 -1
  43. data/lib/caoutsearch/search/query/setters.rb +4 -4
  44. data/lib/caoutsearch/search/query_builder/aggregations.rb +49 -0
  45. data/lib/caoutsearch/search/query_builder.rb +89 -0
  46. data/lib/caoutsearch/search/query_methods.rb +157 -0
  47. data/lib/caoutsearch/search/records.rb +23 -0
  48. data/lib/caoutsearch/search/resettable.rb +38 -0
  49. data/lib/caoutsearch/search/response.rb +97 -0
  50. data/lib/caoutsearch/search/sanitizer.rb +2 -2
  51. data/lib/caoutsearch/search/search_methods.rb +239 -0
  52. data/lib/caoutsearch/search/type_cast.rb +14 -6
  53. data/lib/caoutsearch/search/value.rb +10 -10
  54. data/lib/caoutsearch/search/value_overflow.rb +1 -1
  55. data/lib/caoutsearch/settings.rb +1 -1
  56. data/lib/caoutsearch/testing/mock_requests.rb +105 -0
  57. data/lib/caoutsearch/testing.rb +3 -0
  58. data/lib/caoutsearch/version.rb +1 -1
  59. data/lib/caoutsearch.rb +10 -5
  60. metadata +44 -126
  61. data/lib/caoutsearch/search/search/delete_methods.rb +0 -21
  62. data/lib/caoutsearch/search/search/inspect.rb +0 -36
  63. data/lib/caoutsearch/search/search/instrumentation.rb +0 -21
  64. data/lib/caoutsearch/search/search/internal_dsl.rb +0 -77
  65. data/lib/caoutsearch/search/search/naming.rb +0 -47
  66. data/lib/caoutsearch/search/search/query_builder.rb +0 -94
  67. data/lib/caoutsearch/search/search/query_methods.rb +0 -180
  68. data/lib/caoutsearch/search/search/resettable.rb +0 -35
  69. data/lib/caoutsearch/search/search/response.rb +0 -88
  70. data/lib/caoutsearch/search/search/scroll_methods.rb +0 -113
  71. data/lib/caoutsearch/search/search/search_methods.rb +0 -230
data/README.md CHANGED
@@ -1,18 +1,830 @@
1
1
  # Caoutsearch [\ˈkawt͡ˈsɝtʃ\\](http://ipa-reader.xyz/?text=ˈkawt͡ˈsɝtʃ)
2
2
 
3
- ### Installation
3
+ [![Gem Version](https://badge.fury.io/rb/caoutsearch.svg)](https://rubygems.org/gems/caoutsearch)
4
+ [![CI Status](https://github.com/mon-territoire/caoutsearch/actions/workflows/ci.yml/badge.svg)](https://github.com/mon-territoire/caoutsearch/actions/workflows/ci.yml)
5
+ [![Ruby Style Guide](https://img.shields.io/badge/code_style-standard-brightgreen.svg)](https://github.com/testdouble/standard)
6
+ [![Maintainability](https://api.codeclimate.com/v1/badges/fbe73db3fd8be9a10e12/maintainability)](https://codeclimate.com/github/mon-territoire/caoutsearch/maintainability)
7
+ [![Test Coverage](https://api.codeclimate.com/v1/badges/fbe73db3fd8be9a10e12/test_coverage)](https://codeclimate.com/github/mon-territoire/caoutsearch/test_coverage)
8
+
9
+ [![JRuby](https://github.com/mon-territoire/caoutsearch/actions/workflows/jruby.yml/badge.svg)](https://github.com/mon-territoire/caoutsearch/actions/workflows/jruby.yml)
10
+ [![Truffle Ruby](https://github.com/mon-territoire/caoutsearch/actions/workflows/truffle_ruby.yml/badge.svg)](https://github.com/mon-territoire/caoutsearch/actions/workflows/truffle_ruby.yml)
11
+
12
+ **!! Gem under development before public release !!**
13
+
14
+ Caoutsearch is a new Elasticsearch integration for Ruby and/or Rails.
15
+ It provides a simple but powerful DSL to perform complex indexing and searching, while securely exposing search criteria to a public and chainable API, without overwhelming your models.
16
+
17
+ Caoutsearch only supports Elasticsearch 8.x right now.
18
+ It is used in production in a robust application, updated and maintained for several years at [Mon Territoire](https://mon-territoire.fr).
19
+
20
+ Caoutsearch was inspired by awesome gems such as [elasticsearch-rails](https://github.com/elastic/elasticsearch-rails) or [search_flip](https://github.com/mrkamel/search_flip).
21
+ If you don't have scenarios as complex as those described in this documentation, they may better suit your needs.
22
+
23
+ ## Table of Contents
24
+
25
+ - [Installation](#installation)
26
+ - [Configuration](#configuration)
27
+ - Instrumentation
28
+ - [Usage](#usage)
29
+ - [Indice Configuration](#indice-configuration)
30
+ - Mapping & settings
31
+ - Text analysis
32
+ - Versioning
33
+ - [Index Engine](#index-engine)
34
+ - Properties
35
+ - Partial updates
36
+ - Eager loading
37
+ - Interdependencies
38
+ - [Search Engine](#search-engine)
39
+ - Queries
40
+ - [Filters](#filters)
41
+ - Full-text query
42
+ - Custom filters
43
+ - Orders
44
+ - [Aggregations](#aggregations)
45
+ - [Transform aggregations](#transform-aggregations)
46
+ - [Responses](#responses)
47
+ - [Loading records](#loading-records)
48
+ - [Model integration](#model-integration)
49
+ - [Add Caoutsearch to your models](#add-caoutsearch-to-your-models)
50
+ - [Index records](#index-records)
51
+ - [Index multiple records](#index-multiple-records)
52
+ - [Index single records](#index-single-records)
53
+ - [Delete documents](#delete-documents)
54
+ - [Automatic Callbacks](#automatic-callbacks)
55
+ - Asynchronous methods
56
+ - [Search for records](#search-for-records)
57
+ - [Search API](#search-api)
58
+ - [Pagination](#pagination)
59
+ - [Total count](#total-count)
60
+ - [Iterating results](#iterating-results)
61
+ - [Testing with Caoutsearch](#testing-with-caoutsearch)
62
+
63
+ ## Installation
4
64
 
5
65
  ```bash
6
66
  bundle add caoutsearch
7
67
  ```
8
68
 
9
- ### Configuration
69
+ ## Configuration
70
+
71
+ TODO
72
+
73
+ ## Usage
74
+
75
+ ### Indice Configuration
76
+
77
+ TODO
78
+
79
+ ### Index Engine
80
+
81
+ TODO
82
+
83
+ ### Search Engine
84
+
85
+ #### Filters
86
+ Filters declared in the search engine define how Caoutsearch builds the queries.
87
+
88
+ The main use of filters is to expose a field for search, but they can also be used to build more complex queries:
89
+ ```ruby
90
+ class ArticleSearch < Caoutsearch::Search::Base
91
+ # Build a filter on the author field
92
+ filter :author
93
+
94
+ # Build a Match filter on multiple fields
95
+ filter :content, indexes: %i[title.words content], as: :match
96
+
97
+ # Build a more complex filter by using other filters
98
+ filter :public, as: :boolean
99
+ filter :published_on, as: :date
100
+ filter :active do |value|
101
+ search_by(published: value, published_on: value)
102
+ end
103
+ end
104
+ ```
105
+
106
+ Caoutsearch provides different types of filters to handle different types of data or ways to search them:
107
+
108
+ ##### Default filter
109
+
110
+ ##### Boolean filter
111
+
112
+ ##### Date filter
113
+
114
+ For a date filter defined like this:
115
+ ```ruby
116
+ class ArticleSearch < Caoutsearch::Search::Base
117
+ ...
118
+
119
+ filter :published_on, as: :date
120
+ end
121
+ ```
122
+
123
+ You can now search the matching index with the `published_on` criterion:
124
+ ```ruby
125
+ Article.search(published_on: Date.today)
126
+ ```
127
+
128
+ and the following query will be generated and sent to Elasticsearch:
129
+ ```json
130
+ {
131
+ "query": {
132
+ "bool": {
133
+ "filter": [
134
+ { "range": { "published_on": { "gte": "2022-11-23", "lte": "2022-11-23"}}}
135
+ ]
136
+ }
137
+ }
138
+ }
139
+ ```
140
+
141
+ The date filter accepts multiple types of arguments:
142
+
143
+ ```ruby
144
+ # Search for articles published on a date:
145
+ Article.search(published_on: Date.today)
146
+
147
+ # Search for articles published before a date:
148
+ Article.search(published_on: { less_than: "2022-12-25" })
149
+ Article.search(published_on: { less_than_or_equal: "2022-12-25" })
150
+ Article.search(published_on: ..Date.new(2022, 12, 25))
151
+ Article.search(published_on: [[nil, "now-2w/d"]])
152
+
153
+ # Search for articles published after a date:
154
+ Article.search(published_on: { greater_than: "2022-12-25" })
155
+ Article.search(published_on: { greater_than_or_equal: "2022-12-25" })
156
+ Article.search(published_on: Date.new(2022, 12, 25)..)
157
+ Article.search(published_on: [["now-1w/d", nil]])
158
+
159
+ # Search for articles published between two dates:
160
+ Article.search(published_on: { greater_than: "2022-12-25", less_than: "2023-12-25" })
161
+ Article.search(published_on: Date.new(2022, 12, 25)..Date.new(2023, 12, 25))
162
+ Article.search(published_on: [["now-1w/d", "now/d"]])
163
+ ```
164
+
165
+ Dates of various formats are handled:
166
+ ```ruby
167
+ "2022-10-11"
168
+ Date.today
169
+ Time.zone.now
170
+ ```
171
+
172
+ We also support Elasticsearch's date math:
173
+ ```ruby
174
+ "now-1h"
175
+ "now+2w/d"
176
+ ```
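To make the accepted argument shapes more concrete, here is a minimal, standalone sketch of how a single date, a range, or an operator hash could be normalized into an Elasticsearch `range` clause. The helper name `date_range_clause` and the `OPERATORS` table are assumptions made for illustration; this is not Caoutsearch's actual implementation, and the array-of-pairs form is left out.

```ruby
require "date"

# Hypothetical mapping from the operator names shown above to
# Elasticsearch range operators.
OPERATORS = {
  less_than: :lt,
  less_than_or_equal: :lte,
  greater_than: :gt,
  greater_than_or_equal: :gte
}.freeze

# Hypothetical helper: builds a `range` clause for `field` from a Date,
# a Range of dates (possibly beginless or endless), or an operator Hash.
def date_range_clause(field, value)
  bounds =
    case value
    when Range
      upper = value.exclude_end? ? :lt : :lte
      # Drop the missing bound of a beginless or endless range.
      {gte: value.begin, upper => value.end}.compact
    when Hash
      value.to_h { |operator, date| [OPERATORS.fetch(operator), date] }
    else
      # A single date matches the whole day, as in the generated query above.
      {gte: value, lte: value}
    end

  {range: {field => bounds.transform_values(&:to_s)}}
end

p date_range_clause(:published_on, Date.new(2022, 11, 23))
p date_range_clause(:published_on, Date.new(2022, 12, 25)..)
p date_range_clause(:published_on, {less_than: "2022-12-25"})
```

Date math strings such as `"now-1w/d"` would pass through this sketch unchanged, since they are already strings that Elasticsearch interprets itself.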
177
+
178
+ ##### GeoPoint filter
179
+
180
+ ##### Match filter
181
+
182
+ ##### Range filter
183
+
184
+ #### Aggregations
185
+
186
+ You can define simple to complex aggregations.
187
+
188
+ ````ruby
189
+ class ArticleSearch < Caoutsearch::Search::Base
190
+ has_aggregation :view_count, sum: { field: :view_count }
191
+ has_aggregation :popular_tags,
192
+ filter: { term: { published: true } },
193
+ aggs: {
194
+ published: {
195
+ terms: { field: :tags, size: 10 }
196
+ }
197
+ }
198
+ end
199
+ ````
200
+
201
+ Then you can request one or more aggregations at the same time or chain the `aggregate` method.
202
+ The `aggregations` method triggers a request and returns a [Response::Aggregations](#responses).
203
+
204
+ ````ruby
205
+ ArticleSearch.aggregate(:view_count).aggregations
206
+ # ArticleSearch Search { "body": { "aggs": { "view_count": { "sum": { "field": "view_count" }}}}}
207
+ # ArticleSearch Search (10ms / took 5ms)
208
+ => #<Caoutsearch::Response::Aggregations view_count=#<Caoutsearch::Response::Response value=119652>>
209
+
210
+ ArticleSearch.aggregate(:view_count, :popular_tags).aggregations
211
+ # ArticleSearch Search { "body": { "aggs": { "view_count": {…}, "popular_tags": {…}}}}
212
+ # ArticleSearch Search (10ms / took 5ms)
213
+ => #<Caoutsearch::Response::Aggregations view_count=#<Caoutsearch::Response::Response value=119652> popular_tags=#<Caoutsearch::Response::Response buckets=…>>
214
+
215
+ ArticleSearch.aggregate(:view_count).aggregate(:popular_tags).aggregations
216
+ # ArticleSearch Search { "body": { "aggs": { "view_count": {…}, "popular_tags": {…}}}}
217
+ # ArticleSearch Search (10ms / took 5ms)
218
+ => #<Caoutsearch::Response::Aggregations view_count=#<Caoutsearch::Response::Response value=119652> popular_tags=#<Caoutsearch::Response::Response buckets=…>>
219
+ ````
220
+
221
+ You can create powerful aggregations using blocks and pass arguments to them.
222
+
223
+ ````ruby
224
+ class ArticleSearch < Caoutsearch::Search::Base
225
+ has_aggregation :popular_tags_since do |date|
226
+ raise TypeError unless date.is_a?(Date)
227
+
228
+ query.aggregations[:popular_tags_since] = {
229
+ filter: { range: { publication_date: { gte: date.to_s } } },
230
+ aggs: {
231
+ published: {
232
+ terms: { field: :tags, size: 20 }
233
+ }
234
+ }
235
+ }
236
+ end
237
+ end
238
+
239
+ ArticleSearch.aggregate(popular_tags_since: 1.day.ago.to_date).aggregations
240
+ # ArticleSearch Search { "body": { "aggs": { "popular_tags_since": {…}}}}
241
+ # ArticleSearch Search (10ms / took 5ms)
242
+ => #<Caoutsearch::Response::Aggregations popular_tags_since=#<Caoutsearch::Response::Response …
243
+ ````
244
+
245
+ Only one argument can be passed to an aggregation block.
246
+ Use an Array or a Hash if you need to pass multiple options.
247
+
248
+ ````ruby
249
+ class ArticleSearch < Caoutsearch::Search::Base
250
+ has_aggregation :popular_tags_since do |options|
251
+ # …
252
+ end
253
+
254
+ has_aggregation :popular_tags_between do |(first_date, end_date)|
255
+ # …
256
+ end
257
+ end
258
+
259
+ ArticleSearch.aggregate(popular_tags_since: { date: 1.day.ago, size: 20 })
260
+ ArticleSearch.aggregate(popular_tags_between: [date1, date2])
261
+ ````
262
+
263
+ Finally, you can create a "catch-all" aggregation to handle cumbersome behaviors:
264
+
265
+ ````ruby
266
+ class ArticleSearch < Caoutsearch::Search::Base
267
+ has_aggregation do |name, options = {}|
268
+ raise "unexpected_error" unless name.match(/^view_count_(?<year>\d{4})$/)
269
+
270
+ query.aggregations[name] = {
271
+ filter: { term: { year: $LAST_MATCH_INFO[:year] } },
272
+ aggs: {
273
+ filtered: {
274
+ sum: { field: :view_count }
275
+ }
276
+ }
277
+ }
278
+ end
279
+ end
280
+
281
+ ArticleSearch.aggregate(:view_count_2020, :view_count_2019).aggregations
282
+ # ArticleSearch Search { "body": { "aggs": { "view_count_2020": {…}, "view_count_2019": {…}}}}
283
+ # ArticleSearch Search (10ms / took 5ms)
284
+ => #<Caoutsearch::Response::Aggregations view_count_2020=#<Caoutsearch::Response::Response …
285
+ ````
286
+
287
+ #### Transform aggregations
288
+
289
+ When using [buckets aggregation](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket.html) and/or [pipeline aggregation](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-pipeline.html), the path to the expected values can get complicated and become subject to unexpected changes for a public API.
290
+
291
+ ````ruby
292
+ ArticleSearch.aggregate(popular_tags_since: 1.month.ago).aggregations.popular_tags_since.published.buckets.pluck(:key)
293
+ => ["Blog", "Tech", …]
294
+ ````
295
+
296
+ Instead, you can define transformations to provide simpler access to aggregated data:
297
+
298
+ ````ruby
299
+ class ArticleSearch < Caoutsearch::Search::Base
300
+ has_aggregation :popular_tags_since do |since|
301
+ # …
302
+ end
303
+
304
+ transform_aggregation :popular_tags_since do |aggs|
305
+ aggs.dig(:popular_tags_since, :published, :buckets).pluck(:key)
306
+ end
307
+ end
308
+
309
+ ArticleSearch.aggregate(popular_tags_since: 1.month.ago).aggregations.popular_tags_since
310
+ => ["Blog", "Tech", …]
311
+ ````
312
+
313
+ You can also use transformations to combine multiple aggregations:
314
+
315
+ ````ruby
316
+ class ArticleSearch < Caoutsearch::Search::Base
317
+ has_aggregation :blog_count, filter: { term: { category: "blog" } }
318
+ has_aggregation :archives_count, filter: { term: { archived: true } }
319
+
320
+ transform_aggregation :stats, from: %i[blog_count archives_count] do |aggs|
321
+ {
322
+ blog_count: aggs.dig(:blog_count, :doc_count),
323
+ archives_count: aggs.dig(:archives_count, :doc_count)
324
+ }
325
+ end
326
+ end
327
+
328
+ ArticleSearch.aggregate(:stats).aggregations.stats
329
+ # ArticleSearch Search { "body": { "aggs": { "blog_count": {…}, "archives_count": {…}}}}
330
+ # ArticleSearch Search (10ms / took 5ms)
331
+ => { blog_count: 124, archives_count: 2452 }
332
+ ````
333
+
334
+ This is also useful to unify the API between different search engines:
335
+
336
+ ````ruby
337
+ class ArticleSearch < Caoutsearch::Search::Base
338
+ has_aggregation :popular_tags,
339
+ filter: { term: { published: true } },
340
+ aggs: { published: { terms: { field: :tags, size: 10 } } }
341
+
342
+ transform_aggregation :popular_tags do |aggs|
343
+ aggs.dig(:popular_tags, :published, :buckets).pluck(:key)
344
+ end
345
+ end
346
+
347
+ class TagSearch < Caoutsearch::Search::Base
348
+ has_aggregation :popular_tags,
349
+ terms: { field: "label", size: 20, order: { used_count: "desc" } }
350
+
351
+ transform_aggregation :popular_tags do |aggs|
352
+ aggs.dig(:popular_tags, :buckets).pluck(:key)
353
+ end
354
+ end
355
+
356
+ ArticleSearch.aggregate(:popular_tags).aggregations.popular_tags
357
+ => ["Blog", "Tech", …]
358
+
359
+ TagSearch.aggregate(:popular_tags).aggregations.popular_tags
360
+ => ["Tech", "Blog", …]
361
+ ````
362
+
363
+ Transformations are performed on demand and the result is memoized. That means:
364
+ - the result of transformation is not visible in the [Response::Aggregations](#responses) output.
365
+ - the block is called only once for the same search instance.
366
+
367
+ ````ruby
368
+ class ArticleSearch < Caoutsearch::Search::Base
369
+ has_aggregation :popular_tags, …
370
+
371
+ transform_aggregation :popular_tags do |aggs|
372
+ tags = aggs.dig(:popular_tags, :published, :buckets).pluck(:key)
373
+ authorized = Tag.where(title: tags, authorize: true).pluck(:title)
374
+ tags & authorized
375
+ end
376
+ end
377
+
378
+ article_search = ArticleSearch.aggregate(:popular_tags)
379
+ => #<ArticleSearch current_aggregations: [:popular_tags]>
380
+
381
+ article_search.aggregations
382
+ # ArticleSearch Search (10ms / took 5ms)
383
+ => #<Caoutsearch::Response::Aggregations popular_tags=#<Caoutsearch::Response::Response doc_count=100 …
384
+
385
+ article_search.aggregations.popular_tags
386
+ # (10.2ms) SELECT "tags"."title" FROM "tags" WHERE "tags"."title" IN …
387
+ => ["Blog", "Tech", …]
388
+
389
+ article_search.aggregations.popular_tags
390
+ => ["Blog", "Tech", …]
391
+
392
+ article_search.search("Tech").aggregations.popular_tags
393
+ # ArticleSearch Search (10ms / took 5ms)
394
+ # (10.2ms) SELECT "tags"."title" FROM "tags" WHERE "tags"."title" IN …
395
+ => ["Blog", "Tech", …]
396
+ ````
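The on-demand, run-once behavior described above can be illustrated without Elasticsearch at all. The sketch below uses a hypothetical `LazyAggregations` class (not part of Caoutsearch) that calls a transformation block the first time its name is requested and caches the result afterwards.

```ruby
# Hypothetical stand-in for an aggregations wrapper: each transformation
# block runs only when first requested, and its result is cached.
class LazyAggregations
  def initialize(raw, transformations)
    @raw = raw
    @transformations = transformations
    @cache = {}
  end

  def fetch(name)
    return @cache[name] if @cache.key?(name)

    @cache[name] = @transformations.fetch(name).call(@raw)
  end
end

calls = 0
raw = {popular_tags: {buckets: [{key: "Blog"}, {key: "Tech"}]}}
transformations = {
  popular_tags: ->(aggs) {
    calls += 1 # counts how many times the block actually runs
    aggs.dig(:popular_tags, :buckets).map { |bucket| bucket[:key] }
  }
}

aggregations = LazyAggregations.new(raw, transformations)
aggregations.fetch(:popular_tags)
aggregations.fetch(:popular_tags)
puts calls
```

A new wrapper instance starts with an empty cache, which is consistent with the block running again after chaining a new `search` call in the example above.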
397
+
398
+ Be careful to avoid using `aggregations.<aggregation_name>` inside a transformation block: it can lead to an infinite recursion.
399
+
400
+ ````ruby
401
+ class ArticleSearch < Caoutsearch::Search::Base
402
+ transform_aggregation :popular_tags do
403
+ aggregations.popular_tags.buckets.pluck("key")
404
+ end
405
+ end
406
+
407
+ ArticleSearch.aggregate(:popular_tags).aggregations.popular_tags
408
+ Traceback (most recent call last):
409
+ 4: from app/searches/article_search.rb:3:in `block in <class:ArticleSearch>'
410
+ 3: from app/searches/article_search.rb:3:in `block in <class:ArticleSearch>'
411
+ 2: from app/searches/article_search.rb:3:in `block in <class:ArticleSearch>'
412
+ 1: from app/searches/article_search.rb:3:in `block in <class:ArticleSearch>'
413
+ SystemStackError (stack level too deep)
414
+ ````
415
+
416
+ Instead, use the argument passed to the block: it is a shortcut for `response.aggregations`, which is a [Response::Response](#responses) and not a [Response::Aggregations](#responses).
417
+
418
+ ````ruby
419
+ class ArticleSearch < Caoutsearch::Search::Base
420
+ transform_aggregation :popular_tags do |aggs|
421
+ aggs.popular_tags.buckets.pluck("key")
422
+ end
423
+ end
424
+
425
+ ArticleSearch.aggregate(:popular_tags).aggregations.popular_tags
426
+ => ["Blog", "Tech", …]
427
+ ````
428
+
429
+ One last helpful argument is `track_total_hits`, which lets you perform calculations over aggregations using the `total_count` method without sending a second request.
430
+ Take a look at [Total count](#total-count) to understand why a second request could be performed.
431
+
432
+ ````ruby
433
+ class ArticleSearch < Caoutsearch::Search::Base
434
+ has_aggregation :tagged, filter: { exists: { field: "tag" } }
435
+
436
+ transform_aggregation :tagged_rate, from: :tagged, track_total_hits: true do |aggs|
437
+ count = aggs.dig(:tagged, :doc_count)
438
+ count.to_f / total_count
439
+ end
440
+
441
+ transform_aggregation :tagged_rate_without_track_total_hits, from: :tagged do |aggs|
442
+ count = aggs.dig(:tagged, :doc_count)
443
+ count.to_f / total_count
444
+ end
445
+ end
446
+
447
+ ArticleSearch.aggregate(:tagged_rate).aggregations.tagged_rate
448
+ # ArticleSearch Search { "body": { "track_total_hits": true, "aggs": { "tagged": {…}}}}
449
+ # ArticleSearch Search (10ms / took 5ms)
450
+ => 0.95
451
+
452
+ ArticleSearch.aggregate(:tagged_rate_without_track_total_hits).aggregations.tagged_rate_without_track_total_hits
453
+ # ArticleSearch Search { "body": { "aggs": { "tagged": {…}}}}
454
+ # ArticleSearch Search (10ms / took 5ms)
455
+ # ArticleSearch Search { "body": { "track_total_hits": true, "aggs": { "tagged":
456
+ # ArticleSearch Search (10ms / took 5ms)
457
+ => 0.95
458
+ ````
459
+
460
+ #### Responses
461
+
462
+ After the request has been sent by calling a method such as `load`, `response` or `hits`, the result is wrapped in a `Response::Response` class which provides method access to its properties via [Hashie::Mash](http://github.com/intridea/hashie).
463
+
464
+ Aggregations and suggestions are wrapped in their own respective subclasses of `Response::Response`.
465
+
466
+ ````ruby
467
+ search.response
468
+ => #<Caoutsearch::Response::Response _shards=#<Caoutsearch::Response::Response failed=0 skipped=0 successful=5 total=5> hits=…
469
+
470
+ search.hits
471
+ => #<Hashie::Array [#<Caoutsearch::Response::Response _id="2"…
472
+
473
+ search.aggregations
474
+ => #<Caoutsearch::Response::Aggregations view_count=#<Caoutsearch::Response::Response…
475
+
476
+ search.suggestions
477
+ => #<Caoutsearch::Response::Suggestions tags=#<Caoutsearch::Response::Response…
478
+ ````
479
+
480
+ ##### Loading records
481
+
482
+ When calling `records`, the search engine will try to load records from a model using the same class name without the `Search` suffix:
483
+ * `ArticleSearch` > `Article`
484
+ * `Blog::ArticleSearch` > `Blog::Article`
485
+
486
+ ````ruby
487
+ ArticleSearch.new.records.first
488
+ # ArticleSearch Search (10ms / took 5ms)
489
+ # Article Load (9.6ms) SELECT "articles".* FROM "articles" WHERE "articles"."id" IN (1, …
490
+ => #<Article id: 1, …>
491
+ ````
492
+
493
+ However, you can define an alternative model to load records. This might be helpful when using [single table inheritance](https://api.rubyonrails.org/classes/ActiveRecord/Inheritance.html).
494
+
495
+ ````ruby
496
+ ArticleSearch.new.records(use: BlogArticle).first
497
+ # ArticleSearch Search (10ms / took 5ms)
498
+ # BlogArticle Load (9.6ms) SELECT "articles".* FROM "articles" WHERE "articles"."id" IN (1, …
499
+ => #<BlogArticle id: 1, …>
500
+ ````
501
+
502
+ You can also define an alternative model at class level:
503
+
504
+ ````ruby
505
+ class BlogArticleSearch < Caoutsearch::Search::Base
506
+ self.model_name = "Article"
507
+
508
+ default do
509
+ query.filters << { term: { category: "blog" } }
510
+ end
511
+ end
512
+
513
+ BlogArticleSearch.new.records.first
514
+ # BlogArticleSearch Search (10ms / took 5ms)
515
+ # Article Load (9.6ms) SELECT "articles".* FROM "articles" WHERE "articles"."id" IN (1, …
516
+ => #<Article id: 1, …>
517
+ ````
518
+
519
+ ### Model integration
520
+
521
+ #### Add Caoutsearch to your models
522
+
523
+ The simplest solution is to add `Caoutsearch::Model` to your model and then link the appropriate `Index` and/or `Search` engines:
524
+
525
+ ```ruby
526
+ class Article < ActiveRecord::Base
527
+ include Caoutsearch::Model
528
+
529
+ index_with ArticleIndex
530
+ search_with ArticleSearch
531
+ end
532
+ ```
533
+
534
+ If you don't need your models to be both `Indexable` and `Searchable`, you can include only one of the following two modules:
535
+
536
+ ````ruby
537
+ class Article < ActiveRecord::Base
538
+ include Caoutsearch::Model::Indexable
539
+
540
+ index_with ArticleIndex
541
+ end
542
+ ````
543
+ or
544
+ ````ruby
545
+ class Article < ActiveRecord::Base
546
+ include Caoutsearch::Model::Searchable
547
+
548
+ search_with ArticleSearch
549
+ end
550
+ ````
551
+
552
+ The modules can be safely included in the meta model `ApplicationRecord`.
553
+ Indexing & searching features are not available until you call `index_with` or `search_with`:
554
+
555
+ ````ruby
556
+ class ApplicationRecord < ActiveRecord::Base
557
+ include Caoutsearch::Model
558
+ end
559
+ ````
560
+
561
+ #### Index records
562
+
563
+ ##### Index multiple records
564
+
565
+ Import all your records or a restricted scope of records to Elasticsearch.
566
+
567
+ ````ruby
568
+ Article.reindex
569
+ Article.where(published: true).reindex
570
+ ````
571
+
572
+ You can update one or more properties (see [Index Engine](#index-engine) to read more about properties):
573
+
574
+ ````ruby
575
+ Article.reindex(:category)
576
+ Article.reindex(%i[category published_on])
577
+ ````
578
+
579
+ When `reindex` is called without properties, it'll import the full document to ES.
580
+ On the contrary, when properties are passed, it'll only update existing documents.
581
+ You can control this behavior with the `method` argument.
582
+
583
+ ````ruby
584
+ Article.where(id: 123).reindex(:category)
585
+ # ArticleIndex Reindex {"index":"articles","body":[{"update":{"_id":123}},{"doc":{"category":"blog"}}]}
586
+ # [Error] {"update"=>{"_index"=>"articles", "_id"=>"123", "status"=>404, "error"=>{"type"=>"document_missing_exception", …}}
587
+
588
+ Article.where(id: 123).reindex(:category, method: :index)
589
+ # ArticleIndex Reindex {"index":"articles","body":[{"index":{"_id":123}},{"category":"blog"}]}
590
+
591
+ Article.where(id: 123).reindex(method: :update)
592
+ # ArticleIndex Reindex {"index":"articles","body":[{"update":{"_id":123}},{"doc":{…}}]}
593
+ ````
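The log lines above suggest the shape of the bulk payload for each `method`. As a rough illustration only, the hypothetical helper below (`bulk_lines` is not a Caoutsearch method) builds the action line and the document line the way they appear in those logs.

```ruby
require "json"

# Hypothetical helper mirroring the payloads logged above:
# `index` sends the document as-is, `update` wraps it in a "doc" key.
def bulk_lines(id, document, method:)
  case method
  when :index then [{index: {_id: id}}, document]
  when :update then [{update: {_id: id}}, {doc: document}]
  else raise ArgumentError, "unsupported method: #{method.inspect}"
  end
end

puts JSON.generate(bulk_lines(123, {category: "blog"}, method: :update))
# [{"update":{"_id":123}},{"doc":{"category":"blog"}}]
puts JSON.generate(bulk_lines(123, {category: "blog"}, method: :index))
# [{"index":{"_id":123}},{"category":"blog"}]
```

An `update` action only patches a document that already exists, which is why the first example above fails with `document_missing_exception`, while `index` creates or replaces the whole document.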
594
+
595
+ ##### Index single records
596
+
597
+ Import a single record.
598
+
599
+ ````ruby
600
+ Article.find(123).update_index
601
+ ````
602
+
603
+ You can update one or more properties (see [Index Engine](#index-engine) to read more about properties):
604
+
605
+ ````ruby
606
+ Article.find(123).update_index(:category)
607
+ Article.find(123).update_index(%i[category published_on])
608
+ ````
609
+
610
+ You can verify if and how documents are indexed.
611
+ If the document is missing in ES, it'll raise an `Elastic::Transport::Transport::Errors::NotFound`.
612
+
613
+ ````ruby
614
+ Article.find(123).indexed_document
615
+ # Traceback (most recent call last):
616
+ # 1: from (irb):1
617
+ # Elastic::Transport::Transport::Errors::NotFound ([404] {"_index":"articles","_id":"123","found":false})
618
+
619
+ Article.find(123).update_index
620
+ Article.find(123).indexed_document
621
+ => {"_index"=>"articles", "_id"=>"123", "_version"=>1, "found"=>true, "_source"=>{…}}
622
+ ````
623
+
624
+ ##### Delete documents
625
+
626
+ You can delete one or more documents.
627
+ **Note**: it won't delete records from the database, only from the ES index.
628
+
629
+ ````ruby
630
+ Article.delete_indexes
631
+ Article.where(id: 123).delete_indexed_documents
632
+ Article.find(123).delete_index
633
+ ````
634
+
635
+ If a record is already deleted from the database, you can still delete its document.
636
+
637
+ ````ruby
638
+ Article.delete_index(123)
639
+ ````
640
+
641
+ ##### Automatic Callbacks
642
+
643
+ Callbacks are not provided by Caoutsearch but they are very easy to add:
644
+
645
+ ````ruby
646
+ class Article < ApplicationRecord
647
+ index_with ArticleIndex
648
+
649
+ after_commit :update_index, on: %i[create update]
650
+ after_commit :delete_index, on: %i[destroy]
651
+ end
652
+ ````
653
+
654
+ ##### Asynchronous methods
655
+
656
+ TODO
657
+
658
+ #### Search for records
659
+
660
+ ##### Search API
661
+ Searching is pretty simple.
662
+
663
+ ````ruby
664
+ Article.search("Quick brown fox")
665
+ => #<ArticleSearch current_criteria: ["Quick brown fox"]>
666
+ ````
667
+
668
+ You can chain criteria and many other parameters:
669
+ ````ruby
670
+ Article.search("Quick brown fox").search(published: true)
671
+ => #<ArticleSearch current_criteria: ["Quick brown fox", {"published"=>true}]>
672
+
673
+ Article.search("Quick brown fox").order(:publication_date)
674
+ => #<ArticleSearch current_criteria: ["Quick brown fox"], current_order: :publication_date>
675
+
676
+ Article.search("Quick brown fox").limit(100).offset(100)
677
+ => #<ArticleSearch current_criteria: ["Quick brown fox"], current_limit: 100, current_offset: 100>
678
+
679
+ Article.search("Quick brown fox").page(1).per(100)
680
+ => #<ArticleSearch current_criteria: ["Quick brown fox"], current_page: 1, current_limit: 100>
681
+
682
+ Article.search("Quick brown fox").aggregate(:tags).aggregate(:dates)
683
+ => #<ArticleSearch current_criteria: ["Quick brown fox"], current_aggregations: [:tags, :dates]>
684
+ ````
685
+
686
+ ##### Pagination
687
+
688
+ Search results can be paginated.
689
+ ````ruby
690
+ search = Article.search("Quick brown fox").page(1).per(100)
691
+ search.current_page
692
+ => 1
693
+
694
+ search.total_pages
695
+ => 2546
696
+
697
+ search.total_count
698
+ => 254514
699
+ ````
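The numbers in this example follow from simple arithmetic. The sketch below uses two hypothetical helpers (`total_pages` and `offset_for` are illustrative names, not Caoutsearch methods) and assumes pages are numbered from 1.

```ruby
# Number of pages needed to show `total_count` hits, `per_page` at a time.
def total_pages(total_count, per_page)
  (total_count.to_f / per_page).ceil
end

# Offset of the first hit on a given page, assuming pages start at 1.
def offset_for(page, per_page)
  (page - 1) * per_page
end

puts total_pages(254_514, 100) # 2546
puts offset_for(1, 100)        # 0
puts offset_for(2, 100)        # 100
```

Under that assumption, `page(2).per(100)` would cover the same hits as `limit(100).offset(100)`.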
700
+
701
+ ##### Total count
702
+
703
+ By default [ES doesn't return the total number of hits](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-your-data.html#track-total-hits). So, when calling `total_count` or `total_pages` a second request might be sent to ES.
704
+ To avoid a second roundtrip, use `track_total_hits`:
705
+
706
+ ````ruby
707
+ search = Article.search("Quick brown fox")
708
+ search.hits
709
+ # ArticleSearch Search {…}
710
+ # ArticleSearch Search (81.8ms / took 16ms)
711
+ => […]
712
+
713
+ search.total_count
714
+ # ArticleSearch Search {…, track_total_hits: true }
715
+ # ArticleSearch Search (135.3ms / took 76ms)
716
+ => 276
717
+
718
+ search = Article.search("Quick brown fox").track_total_hits
719
+ search.hits
720
+ # ArticleSearch Search {…, track_total_hits: true }
721
+ # ArticleSearch Search (120.2ms / took 56ms)
722
+ => […]
723
+
724
+ search.total_count
725
+ => 276
726
+ ````
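For background, Elasticsearch reports `hits.total` as an object with a `value` and a `relation`. With the default setting the count is only exact up to a threshold (10,000 hits); beyond it, `relation` is `"gte"` and `value` is a lower bound. The sketch below shows one way to tell the two cases apart on a raw response hash; `exact_total` is a hypothetical helper, and how Caoutsearch itself decides to send the second request is not shown here.

```ruby
# Returns the exact hit count if the response carries one, or nil when
# Elasticsearch only reported a lower bound (relation "gte") or no total.
def exact_total(response)
  total = response.dig("hits", "total")
  return nil if total.nil?

  total["relation"] == "eq" ? total["value"] : nil
end

exact = {"hits" => {"total" => {"value" => 276, "relation" => "eq"}}}
capped = {"hits" => {"total" => {"value" => 10_000, "relation" => "gte"}}}

p exact_total(exact)  # 276
p exact_total(capped) # nil
```

When the helper returns `nil`, a follow-up request with `track_total_hits: true` is what yields the accurate count.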
727
+
728
+ ##### Iterating results
729
+
730
+ Several methods are provided to loop through a collection of hits or records.
731
+ These methods process batches in the most efficient way: [PIT search_after](https://www.elastic.co/guide/en/elasticsearch/reference/current/paginate-search-results.html#search-after).
732
+
733
+ * `find_each_hit` to yield each hit returned by Elasticsearch.
734
+ * `find_each_record` to yield each record from your database.
735
+ * `find_hits_in_batches` to yield each batch of hits as returned by Elasticsearch.
736
+ * `find_records_in_batches` to yield each batch of records from the database.
737
+
738
+ Example:
739
+
740
+ ```ruby
741
+ Article.search(published: true).find_each_record do |record|
742
+ record.inspect
743
+ end
744
+ ```
745
+
746
+ The `keep_alive` parameter tells Elasticsearch how long it should keep the point in time alive. Defaults to 1 minute.
747
+
748
+ ```ruby
749
+ Article.search(published: true).find_each_record(keep_alive: "2h")
750
+ ```
751
+
752
+ To specify the batch size, use the `per` chainable method or the `batch_size` parameter. Defaults to 1000.
753
+
754
+ ```ruby
755
+ Article.search(published: true).find_records_in_batches(batch_size: 500)
756
+ Article.search(published: true).per(500).find_records_in_batches
757
+ ```
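To give an idea of what `search_after` batching does, here is a self-contained simulation that pages through an in-memory list instead of a real index. `DOCUMENTS`, `fake_search` and `each_batch` are hypothetical names used only for this sketch; a real implementation would also open and close a point in time, which is omitted here.

```ruby
# In-memory stand-in for an index: hits sorted by a unique sort value.
DOCUMENTS = (1..7).map { |id| {"_id" => id.to_s, "sort" => [id]} }.freeze

# Hypothetical fake of a search request: returns up to `size` hits whose
# sort value is strictly greater than `search_after`.
def fake_search(size:, search_after: nil)
  hits = DOCUMENTS
  hits = hits.select { |hit| (hit["sort"] <=> search_after) == 1 } if search_after
  hits.first(size)
end

# Yields batches until a request comes back empty, feeding the last hit's
# sort value into the next request.
def each_batch(batch_size:)
  search_after = nil
  loop do
    hits = fake_search(size: batch_size, search_after: search_after)
    break if hits.empty?

    yield hits
    search_after = hits.last["sort"]
  end
end

each_batch(batch_size: 3) do |hits|
  puts hits.map { |hit| hit["_id"] }.join(",")
end
```

Because each request resumes from the last sort value rather than from an offset, the cost of a batch does not grow with how deep into the results it is.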
758
+
759
+ ## Testing with Caoutsearch
760
+
761
+ Caoutsearch offers a few methods to stub Elasticsearch requests.
762
+ You first need to add [webmock](https://github.com/bblimke/webmock) to your Gemfile.
763
+
764
+ ```bash
765
+ bundle add webmock
766
+ ```
767
+
768
+ Then, add `Caoutsearch::Testing::MockRequests` to your test suite.
769
+ The examples below use RSpec, but they should be compatible with other test frameworks.
770
+
771
+ ```ruby
772
+ # spec/spec_helper.rb
773
+
774
+ require "caoutsearch/testing"
775
+
776
+ RSpec.configure do |config|
777
+ config.include Caoutsearch::Testing::MockRequests
778
+ end
779
+ ```
780
+
781
+ You can then call the following methods:
782
+
783
+ ```ruby
784
+ RSpec.describe SomeClass do
785
+ before do
786
+ stub_elasticsearch_request(:head, "articles").to_return(status: 200)
787
+
788
+ stub_elasticsearch_request(:get, "_cat/indices?format=json&h=index").to_return_json(body: [
789
+ { index: "ca_locals_v14" }
790
+ ])
791
+
792
+ stub_elasticsearch_reindex_request("articles")
793
+ stub_elasticsearch_search_request("articles", [
794
+ {"_id" => "135", "_source" => {"name" => "Hello World"}},
795
+ {"_id" => "137", "_source" => {"name" => "Hello World"}}
796
+ ])
797
+ end
798
+
799
+ # ... do your tests...
800
+ end
801
+ ```
10
802
 
11
- <!-- TODO -->
803
+ `stub_elasticsearch_search_request` accepts an array of records:
12
804
 
13
- ### Usage
805
+ ```ruby
806
+ RSpec.describe SomeClass do
807
+ let(:articles) { create_list(:article, 5) }
14
808
 
15
- <!-- TODO -->
809
+ before do
810
+ stub_elasticsearch_search_request("articles", articles)
811
+ end
812
+
813
+ # ... do your tests...
814
+ end
815
+ ```
816
+
817
+ It also lets you stub the total number of hits returned.
818
+
819
+ ```ruby
820
+ RSpec.describe SomeClass do
821
+ before do
822
+ stub_elasticsearch_search_request("articles", [], total: 250)
823
+ end
824
+
825
+ # ... do your tests...
826
+ end
827
+ ```
16
828
 
17
829
  ## Contributing
18
830
 
@@ -27,9 +839,10 @@ bundle add caoutsearch
27
839
  ```bash
28
840
  bundle exec rspec
29
841
  bundle exec rubocop
842
+ bundle exec standardrb
30
843
  ```
31
844
 
32
- Both can be run with:
845
+ All of them can be run with:
33
846
 
34
847
  ```bash
35
848
  bundle exec rake