caoutsearch 0.0.1 → 0.0.3

Files changed (71)
  1. checksums.yaml +4 -4
  2. data/README.md +816 -7
  3. data/lib/caoutsearch/config/mappings.rb +1 -1
  4. data/lib/caoutsearch/filter/base.rb +11 -7
  5. data/lib/caoutsearch/filter/boolean.rb +1 -1
  6. data/lib/caoutsearch/filter/date.rb +93 -22
  7. data/lib/caoutsearch/filter/default.rb +10 -10
  8. data/lib/caoutsearch/filter/geo_point.rb +1 -1
  9. data/lib/caoutsearch/filter/match.rb +5 -5
  10. data/lib/caoutsearch/filter/none.rb +1 -1
  11. data/lib/caoutsearch/filter/range.rb +6 -6
  12. data/lib/caoutsearch/index/document.rb +11 -11
  13. data/lib/caoutsearch/index/indice_versions.rb +3 -3
  14. data/lib/caoutsearch/index/internal_dsl.rb +3 -3
  15. data/lib/caoutsearch/index/reindex.rb +11 -11
  16. data/lib/caoutsearch/index/scoping.rb +3 -3
  17. data/lib/caoutsearch/index/serialization.rb +13 -13
  18. data/lib/caoutsearch/instrumentation/base.rb +12 -12
  19. data/lib/caoutsearch/instrumentation/search.rb +11 -2
  20. data/lib/caoutsearch/mappings.rb +1 -1
  21. data/lib/caoutsearch/model/indexable.rb +57 -0
  22. data/lib/caoutsearch/model/searchable.rb +31 -0
  23. data/lib/caoutsearch/model.rb +12 -0
  24. data/lib/caoutsearch/response/aggregations.rb +50 -0
  25. data/lib/caoutsearch/response/response.rb +9 -0
  26. data/lib/caoutsearch/response/suggestions.rb +9 -0
  27. data/lib/caoutsearch/response.rb +6 -0
  28. data/lib/caoutsearch/search/adapter/active_record.rb +39 -0
  29. data/lib/caoutsearch/search/base.rb +16 -15
  30. data/lib/caoutsearch/search/batch/scroll.rb +93 -0
  31. data/lib/caoutsearch/search/batch/search_after.rb +70 -0
  32. data/lib/caoutsearch/search/batch_methods.rb +63 -0
  33. data/lib/caoutsearch/search/callbacks.rb +28 -0
  34. data/lib/caoutsearch/search/delete_methods.rb +19 -0
  35. data/lib/caoutsearch/search/dsl/item.rb +2 -2
  36. data/lib/caoutsearch/search/inspect.rb +34 -0
  37. data/lib/caoutsearch/search/instrumentation.rb +19 -0
  38. data/lib/caoutsearch/search/internal_dsl.rb +107 -0
  39. data/lib/caoutsearch/search/naming.rb +45 -0
  40. data/lib/caoutsearch/search/point_in_time.rb +28 -0
  41. data/lib/caoutsearch/search/query/boolean.rb +4 -4
  42. data/lib/caoutsearch/search/query/nested.rb +1 -1
  43. data/lib/caoutsearch/search/query/setters.rb +4 -4
  44. data/lib/caoutsearch/search/query_builder/aggregations.rb +49 -0
  45. data/lib/caoutsearch/search/query_builder.rb +89 -0
  46. data/lib/caoutsearch/search/query_methods.rb +157 -0
  47. data/lib/caoutsearch/search/records.rb +23 -0
  48. data/lib/caoutsearch/search/resettable.rb +38 -0
  49. data/lib/caoutsearch/search/response.rb +97 -0
  50. data/lib/caoutsearch/search/sanitizer.rb +2 -2
  51. data/lib/caoutsearch/search/search_methods.rb +239 -0
  52. data/lib/caoutsearch/search/type_cast.rb +14 -6
  53. data/lib/caoutsearch/search/value.rb +10 -10
  54. data/lib/caoutsearch/search/value_overflow.rb +1 -1
  55. data/lib/caoutsearch/settings.rb +1 -1
  56. data/lib/caoutsearch/testing/mock_requests.rb +105 -0
  57. data/lib/caoutsearch/testing.rb +3 -0
  58. data/lib/caoutsearch/version.rb +1 -1
  59. data/lib/caoutsearch.rb +10 -5
  60. metadata +44 -126
  61. data/lib/caoutsearch/search/search/delete_methods.rb +0 -21
  62. data/lib/caoutsearch/search/search/inspect.rb +0 -36
  63. data/lib/caoutsearch/search/search/instrumentation.rb +0 -21
  64. data/lib/caoutsearch/search/search/internal_dsl.rb +0 -77
  65. data/lib/caoutsearch/search/search/naming.rb +0 -47
  66. data/lib/caoutsearch/search/search/query_builder.rb +0 -94
  67. data/lib/caoutsearch/search/search/query_methods.rb +0 -180
  68. data/lib/caoutsearch/search/search/resettable.rb +0 -35
  69. data/lib/caoutsearch/search/search/response.rb +0 -88
  70. data/lib/caoutsearch/search/search/scroll_methods.rb +0 -113
  71. data/lib/caoutsearch/search/search/search_methods.rb +0 -230
data/README.md CHANGED
@@ -2,21 +2,829 @@
2
2
 
3
3
  [![Gem Version](https://badge.fury.io/rb/caoutsearch.svg)](https://rubygems.org/gems/caoutsearch)
4
4
  [![CI Status](https://github.com/mon-territoire/caoutsearch/actions/workflows/ci.yml/badge.svg)](https://github.com/mon-territoire/caoutsearch/actions/workflows/ci.yml)
5
- [![Maintainability](https://api.codeclimate.com/v1/badges/9bb8b75ea8c66b1a9c94/maintainability)](https://codeclimate.com/github/mon-territoire/caoutsearch/maintainability)
5
+ [![Ruby Style Guide](https://img.shields.io/badge/code_style-standard-brightgreen.svg)](https://github.com/testdouble/standard)
6
+ [![Maintainability](https://api.codeclimate.com/v1/badges/fbe73db3fd8be9a10e12/maintainability)](https://codeclimate.com/github/mon-territoire/caoutsearch/maintainability)
7
+ [![Test Coverage](https://api.codeclimate.com/v1/badges/fbe73db3fd8be9a10e12/test_coverage)](https://codeclimate.com/github/mon-territoire/caoutsearch/test_coverage)
6
8
 
7
- ### Installation
9
+ [![JRuby](https://github.com/mon-territoire/caoutsearch/actions/workflows/jruby.yml/badge.svg)](https://github.com/mon-territoire/caoutsearch/actions/workflows/jruby.yml)
10
+ [![Truffle Ruby](https://github.com/mon-territoire/caoutsearch/actions/workflows/truffle_ruby.yml/badge.svg)](https://github.com/mon-territoire/caoutsearch/actions/workflows/truffle_ruby.yml)
11
+
12
+ **!! Gem under development before public release !!**
13
+
14
+ Caoutsearch is a new Elasticsearch integration for Ruby and/or Rails.
15
+ It provides a simple but powerful DSL to perform complex indexing and searching, while securely exposing search criteria to a public and chainable API, without overwhelming your models.
16
+
17
+ Caoutsearch only supports Elasticsearch 8.x right now.
18
+ It is used in production in a robust application, updated and maintained for several years at [Mon Territoire](https://mon-territoire.fr).
19
+
20
+ Caoutsearch was inspired by awesome gems such as [elasticsearch-rails](https://github.com/elastic/elasticsearch-rails) or [search_flip](https://github.com/mrkamel/search_flip).
21
+ If you don't have scenarios as complex as those described in this documentation, they may better suit your needs.
22
+
23
+ ## Table of Contents
24
+
25
+ - [Installation](#installation)
26
+ - [Configuration](#configuration)
27
+ - Instrumentation
28
+ - [Usage](#usage)
29
+ - [Indice Configuration](#indice-configuration)
30
+ - Mapping & settings
31
+ - Text analysis
32
+ - Versioning
33
+ - [Index Engine](#index-engine)
34
+ - Properties
35
+ - Partial updates
36
+ - Eager loading
37
+ - Interdependencies
38
+ - [Search Engine](#search-engine)
39
+ - Queries
40
+ - [Filters](#filters)
41
+ - Full-text query
42
+ - Custom filters
43
+ - Orders
44
+ - [Aggregations](#aggregations)
45
+ - [Transform aggregations](#transform-aggregations)
46
+ - [Responses](#responses)
47
+ - [Loading records](#loading-records)
48
+ - [Model integration](#model-integration)
49
+ - [Add Caoutsearch to your models](#add-caoutsearch-to-your-models)
50
+ - [Index records](#index-records)
51
+ - [Index multiple records](#index-multiple-records)
52
+ - [Index single records](#index-single-records)
53
+ - [Delete documents](#delete-documents)
54
+ - [Automatic Callbacks](#automatic-callbacks)
55
+ - Asynchronous methods
56
+ - [Search for records](#search-for-records)
57
+ - [Search API](#search-api)
58
+ - [Pagination](#pagination)
59
+ - [Total count](#total-count)
60
+ - [Iterating results](#iterating-results)
61
+ - [Testing with Caoutsearch](#testing-with-caoutsearch)
62
+
63
+ ## Installation
8
64
 
9
65
  ```bash
10
66
  bundle add caoutsearch
11
67
  ```
12
68
 
13
- ### Configuration
69
+ ## Configuration
70
+
71
+ TODO
72
+
73
+ ## Usage
74
+
75
+ ### Indice Configuration
76
+
77
+ TODO
78
+
79
+ ### Index Engine
80
+
81
+ TODO
82
+
83
+ ### Search Engine
84
+
85
+ #### Filters
86
+ Filters declared in the search engine define how Caoutsearch builds its queries.
87
+
88
+ The main use of filters is to expose a field for search, but they can also be used to build more complex queries:
89
+ ```ruby
90
+ class ArticleSearch < Caoutsearch::Search::Base
91
+ # Build a filter on the author field
92
+ filter :author
93
+
94
+ # Build a Match filter on multiple fields
95
+ filter :content, indexes: %i[title.words content], as: :match
96
+
97
+ # Build a more complex filter by using other filters
98
+ filter :public, as: :boolean
99
+ filter :published_on, as: :date
100
+ filter :active do |value|
101
+ search_by(published: value, published_on: value)
102
+ end
103
+ end
104
+ ```
105
+
106
+ Caoutsearch provides different types of filters to handle different types of data or ways to search them:
107
+
108
+ ##### Default filter
109
+
110
+ ##### Boolean filter
111
+
112
+ ##### Date filter
113
+
114
+ For a date filter defined like this:
115
+ ```ruby
116
+ class ArticleSearch < Caoutsearch::Search::Base
117
+ ...
118
+
119
+ filter :published_on, as: :date
120
+ end
121
+ ```
122
+
123
+ You can now search the matching index with the `published_on` criterion:
124
+ ```ruby
125
+ Article.search(published_on: Date.today)
126
+ ```
127
+
128
+ and the following query will be generated and sent to Elasticsearch:
129
+ ```json
130
+ {
131
+ "query": {
132
+ "bool": {
133
+ "filter": [
134
+ { "range": { "published_on": { "gte": "2022-11-23", "lte": "2022-11-23"}}}
135
+ ]
136
+ }
137
+ }
138
+ }
139
+ ```
140
+
141
+ The date filter accepts multiple types of arguments:
142
+
143
+ ```ruby
144
+ # Search for articles published on a date:
145
+ Article.search(published_on: Date.today)
146
+
147
+ # Search for articles published before a date:
148
+ Article.search(published_on: { less_than: "2022-12-25" })
149
+ Article.search(published_on: { less_than_or_equal: "2022-12-25" })
150
+ Article.search(published_on: ..Date.new(2022, 12, 25))
151
+ Article.search(published_on: [[nil, "now-2w/d"]])
152
+
153
+ # Search for articles published after a date:
154
+ Article.search(published_on: { greater_than: "2022-12-25" })
155
+ Article.search(published_on: { greater_than_or_equal: "2022-12-25" })
156
+ Article.search(published_on: Date.new(2022, 12, 25)..)
157
+ Article.search(published_on: [["now-1w/d", nil]])
158
+
159
+ # Search for articles published between two dates:
160
+ Article.search(published_on: { greater_than: "2022-12-25", less_than: "2023-12-25" })
161
+ Article.search(published_on: Date.new(2022, 12, 25)..Date.new(2023, 12, 25))
162
+ Article.search(published_on: [["now-1w/d", "now/d"]])
163
+ ```
164
+
165
+ Dates of various formats are handled:
166
+ ```ruby
167
+ "2022-10-11"
168
+ Date.today
169
+ Time.zone.now
170
+ ```
171
+
172
+ Elasticsearch's date math is also supported:
173
+ ```ruby
174
+ "now-1h"
175
+ "now+2w/d"
176
+ ```
177
+
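To make the mapping from Ruby arguments to a `range` clause more concrete, here is a minimal sketch. It is not Caoutsearch's implementation: `date_range_clause`, `format_bound` and `OPERATORS` are hypothetical names introduced for illustration, and only the hash, range and single-value forms are covered.

```ruby
require "date"
require "json"

# Hypothetical mapping from the criteria keys shown above to
# Elasticsearch range operators (an assumption, not Caoutsearch internals).
OPERATORS = {
  less_than: :lt,
  less_than_or_equal: :lte,
  greater_than: :gt,
  greater_than_or_equal: :gte
}.freeze

# Dates and times are serialized as ISO 8601; strings such as
# "now-1w/d" are passed through untouched.
def format_bound(value)
  value.respond_to?(:iso8601) ? value.iso8601 : value.to_s
end

# Hypothetical helper: builds a range clause for a single field.
def date_range_clause(field, value)
  bounds =
    case value
    when Hash
      value.to_h { |operator, bound| [OPERATORS.fetch(operator), format_bound(bound)] }
    when Range
      upper = value.exclude_end? ? :lt : :lte
      { gte: value.begin, upper => value.end }
        .compact
        .transform_values { |bound| format_bound(bound) }
    else
      # A single date matches the whole day.
      { gte: format_bound(value), lte: format_bound(value) }
    end

  { range: { field => bounds } }
end

puts JSON.generate(date_range_clause(:published_on, Date.new(2022, 11, 23)))
puts JSON.generate(date_range_clause(:published_on, { less_than: "2022-12-25" }))
puts JSON.generate(date_range_clause(:published_on, Date.new(2022, 12, 25)..))
```

Beginless and endless ranges simply drop the missing bound, which mirrors the "before" and "after" forms listed above.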
178
+ ##### GeoPoint filter
179
+
180
+ ##### Match filter
181
+
182
+ ##### Range filter
183
+
184
+ #### Aggregations
185
+
186
+ You can define simple to complex aggregations.
187
+
188
+ ````ruby
189
+ class ArticleSearch < Caoutsearch::Search::Base
190
+ has_aggregation :view_count, sum: { field: :view_count }
191
+ has_aggregation :popular_tags,
192
+ filter: { term: { published: true } },
193
+ aggs: {
194
+ published: {
195
+ terms: { field: :tags, size: 10 }
196
+ }
197
+ }
198
+ end
199
+ ````
200
+
201
+ Then you can request one or more aggregations at the same time or chain the `aggregate` method.
202
+ The `aggregations` method triggers a request and returns a [Response::Aggregations](#responses).
203
+
204
+ ````ruby
205
+ ArticleSearch.aggregate(:view_count).aggregations
206
+ # ArticleSearch Search { "body": { "aggs": { "view_count": { "sum": { "field": "view_count" }}}}}
207
+ # ArticleSearch Search (10ms / took 5ms)
208
+ => #<Caoutsearch::Response::Aggregations view_count=#<Caoutsearch::Response::Response value=119652>>
209
+
210
+ ArticleSearch.aggregate(:view_count, :popular_tags).aggregations
211
+ # ArticleSearch Search { "body": { "aggs": { "view_count": {…}, "popular_tags": {…}}}}
212
+ # ArticleSearch Search (10ms / took 5ms)
213
+ => #<Caoutsearch::Response::Aggregations view_count=#<Caoutsearch::Response::Response value=119652> popular_tags=#<Caoutsearch::Response::Response buckets=…>>
214
+
215
+ ArticleSearch.aggregate(:view_count).aggregate(:popular_tags).aggregations
216
+ # ArticleSearch Search { "body": { "aggs": { "view_count": {…}, "popular_tags": {…}}}}
217
+ # ArticleSearch Search (10ms / took 5ms)
218
+ => #<Caoutsearch::Response::Aggregations view_count=#<Caoutsearch::Response::Response value=119652> popular_tags=#<Caoutsearch::Response::Response buckets=…>>
219
+ ````
220
+
221
+ You can create powerful aggregations using blocks and pass arguments to them.
222
+
223
+ ````ruby
224
+ class ArticleSearch < Caoutsearch::Search::Base
225
+ has_aggregation :popular_tags_since do |date|
226
+ raise TypeError unless date.is_a?(Date)
227
+
228
+ query.aggregations[:popular_tags_since] = {
229
+ filter: { range: { publication_date: { gte: date.to_s } } },
230
+ aggs: {
231
+ published: {
232
+ terms: { field: :tags, size: 20 }
233
+ }
234
+ }
235
+ }
236
+ end
237
+ end
238
+
239
+ ArticleSearch.aggregate(popular_tags_since: 1.day.ago.to_date).aggregations
240
+ # ArticleSearch Search { "body": { "aggs": { "popular_tags_since": {…}}}}
241
+ # ArticleSearch Search (10ms / took 5ms)
242
+ => #<Caoutsearch::Response::Aggregations popular_tags_since=#<Caoutsearch::Response::Response …
243
+ ````
244
+
245
+ Only one argument can be passed to an aggregation block.
246
+ Use an Array or a Hash if you need to pass multiple options.
247
+
248
+ ````ruby
249
+ class ArticleSearch < Caoutsearch::Search::Base
250
+ has_aggregation :popular_tags_since do |options|
251
+ # …
252
+ end
253
+
254
+ has_aggregation :popular_tags_between do |(first_date, end_date)|
255
+ # …
256
+ end
257
+ end
258
+
259
+ ArticleSearch.aggregate(popular_tags_since: { date: 1.day.ago, size: 20 })
260
+ ArticleSearch.aggregate(popular_tags_between: [date1, date2])
261
+ ````
262
+
263
+ Finally, you can create a "catch-all" aggregation to handle cumbersome behaviors:
264
+
265
+ ````ruby
266
+ class ArticleSearch < Caoutsearch::Search::Base
267
+ has_aggregation do |name, options = {}|
268
+ raise "unexpected_error" unless name.to_s =~ /^view_count_(?<year>\d{4})$/
269
+
270
+ query.aggregations[name] = {
271
+ filter: { term: { year: Regexp.last_match(:year) } },
272
+ aggs: {
273
+ filtered: {
274
+ sum: { field: :view_count }
275
+ }
276
+ }
277
+ }
278
+ end
279
+ end
280
+
281
+ ArticleSearch.aggregate(:view_count_2020, :view_count_2019).aggregations
282
+ # ArticleSearch Search { "body": { "aggs": { "view_count_2020": {…}, "view_count_2019": {…}}}}
283
+ # ArticleSearch Search (10ms / took 5ms)
284
+ => #<Caoutsearch::Response::Aggregations view_count_2020=#<Caoutsearch::Response::Response …
285
+ ````
286
+
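Dynamic names such as `view_count_2020` have to be parsed before they can drive a query. The sketch below shows one way to do that with a named capture; `PATTERN` and `year_from` are hypothetical names, not part of Caoutsearch. Note that `match?` does not populate the last-match data, so the sketch keeps the `MatchData` object explicitly.

```ruby
# Hypothetical pattern for aggregation names like :view_count_2020.
PATTERN = /\Aview_count_(?<year>\d{4})\z/

# Hypothetical helper: extracts the year from a dynamic aggregation name.
def year_from(name)
  match = PATTERN.match(name.to_s)
  raise ArgumentError, "unexpected aggregation: #{name}" unless match

  match[:year].to_i
end

puts year_from(:view_count_2020)
puts year_from("view_count_2019")
```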
287
+ #### Transform aggregations
288
+
289
+ When using [buckets aggregation](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket.html) and/or [pipeline aggregation](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-pipeline.html), the path to the expected values can get complicated and become subject to unexpected changes for a public API.
290
+
291
+ ````ruby
292
+ ArticleSearch.aggregate(popular_tags_since: 1.month.ago).aggregations.popular_tags_since.published.buckets.pluck(:key)
293
+ => ["Blog", "Tech", …]
294
+ ````
295
+
296
+ Instead, you can define transformations to provide simpler access to aggregated data:
297
+
298
+ ````ruby
299
+ class ArticleSearch < Caoutsearch::Search::Base
300
+ has_aggregation :popular_tags_since do |since|
301
+ # …
302
+ end
303
+
304
+ transform_aggregation :popular_tags_since do |aggs|
305
+ aggs.dig(:popular_tags_since, :published, :buckets).pluck(:key)
306
+ end
307
+ end
308
+
309
+ ArticleSearch.aggregate(popular_tags_since: 1.month.ago).aggregations.popular_tags_since
310
+ => ["Blog", "Tech", …]
311
+ ````
312
+
313
+ You can also use transformations to combine multiple aggregations:
314
+
315
+ ````ruby
316
+ class ArticleSearch < Caoutsearch::Search::Base
317
+ has_aggregation :blog_count, filter: { term: { category: "blog" } }
318
+ has_aggregation :archives_count, filter: { term: { archived: true } }
319
+
320
+ transform_aggregation :stats, from: %i[blog_count archives_count] do |aggs|
321
+ {
322
+ blog_count: aggs.dig(:blog_count, :doc_count),
323
+ archives_count: aggs.dig(:archives_count, :doc_count)
324
+ }
325
+ end
326
+ end
327
+
328
+ ArticleSearch.aggregate(:stats).aggregations.stats
329
+ # ArticleSearch Search { "body": { "aggs": { "blog_count": {…}, "archives_count": {…}}}}
330
+ # ArticleSearch Search (10ms / took 5ms)
331
+ => { blog_count: 124, archives_count: 2452 }
332
+ ````
333
+
334
+ This is also useful to unify the API between different search engines:
335
+
336
+ ````ruby
337
+ class ArticleSearch < Caoutsearch::Search::Base
338
+ has_aggregation :popular_tags,
339
+ filter: { term: { published: true } },
340
+ aggs: { published: { terms: { field: :tags, size: 10 } } }
341
+
342
+ transform_aggregation :popular_tags do |aggs|
343
+ aggs.dig(:popular_tags, :published, :buckets).pluck(:key)
344
+ end
345
+ end
346
+
347
+ class TagSearch < Caoutsearch::Search::Base
348
+ has_aggregation :popular_tags,
349
+ terms: { field: "label", size: 20, order: { used_count: "desc" } }
350
+
351
+ transform_aggregation :popular_tags do |aggs|
352
+ aggs.dig(:popular_tags, :buckets).pluck(:key)
353
+ end
354
+ end
355
+
356
+ ArticleSearch.aggregate(:popular_tags).aggregations.popular_tags
357
+ => ["Blog", "Tech", …]
358
+
359
+ TagSearch.aggregate(:popular_tags).aggregations.popular_tags
360
+ => ["Tech", "Blog", …]
361
+ ````
362
+
363
+ Transformations are performed on demand and the result is memoized. That means:
364
+ - the result of transformation is not visible in the [Response::Aggregations](#responses) output.
365
+ - the block is called only once for the same search instance.
366
+
367
+ ````ruby
368
+ class ArticleSearch < Caoutsearch::Search::Base
369
+ has_aggregation :popular_tags, …
370
+
371
+ transform_aggregation :popular_tags do |aggs|
372
+ tags = aggs.dig(:popular_tags, :published, :buckets).pluck(:key)
373
+ authorized = Tag.where(title: tags, authorize: true).pluck(:title)
374
+ tags & authorized
375
+ end
376
+ end
377
+
378
+ article_search = ArticleSearch.aggregate(:popular_tags)
379
+ => #<ArticleSearch current_aggregations: [:popular_tags]>
380
+
381
+ article_search.aggregations
382
+ # ArticleSearch Search (10ms / took 5ms)
383
+ => #<Caoutsearch::Response::Aggregations popular_tags=#<Caoutsearch::Response::Response doc_count=100 …
384
+
385
+ article_search.aggregations.popular_tags
386
+ # (10.2ms) SELECT "tags"."title" FROM "tags" WHERE "tags"."title" IN …
387
+ => ["Blog", "Tech", …]
388
+
389
+ article_search.aggregations.popular_tags
390
+ => ["Blog", "Tech", …]
391
+
392
+ article_search.search("Tech").aggregations.popular_tags
393
+ # ArticleSearch Search (10ms / took 5ms)
394
+ # (10.2ms) SELECT "tags"."title" FROM "tags" WHERE "tags"."title" IN …
395
+ => ["Blog", "Tech", …]
396
+ ````
397
+
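The on-demand, memoized behavior can be pictured with a small standalone sketch. It only illustrates the idea and makes no claim about Caoutsearch internals: `LazyAggregations` and `fetch` are hypothetical names, and the raw aggregations are a plain Hash here.

```ruby
# Hypothetical wrapper: runs a transformation the first time a name is
# requested, then serves the cached result.
class LazyAggregations
  def initialize(raw, transformations)
    @raw = raw
    @transformations = transformations
    @cache = {}
  end

  def fetch(name)
    return @raw[name] unless @transformations.key?(name)
    return @cache[name] if @cache.key?(name)

    # The block receives the raw data, never the wrapper itself.
    @cache[name] = @transformations[name].call(@raw)
  end
end

calls = 0
raw = { popular_tags: { buckets: [{ key: "Blog" }, { key: "Tech" }] } }
transformations = {
  popular_tags: lambda do |aggs|
    calls += 1
    aggs.dig(:popular_tags, :buckets).map { |bucket| bucket[:key] }
  end
}

aggregations = LazyAggregations.new(raw, transformations)
aggregations.fetch(:popular_tags)
aggregations.fetch(:popular_tags)
puts calls
```

Passing the raw data to the block, rather than the wrapper, is also what keeps a transformation from calling itself.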
398
+ Be careful to avoid using `aggregations.<aggregation_name>` inside a transformation block: it can lead to an infinite recursion.
399
+
400
+ ````ruby
401
+ class ArticleSearch < Caoutsearch::Search::Base
402
+ transform_aggregation :popular_tags do
403
+ aggregations.popular_tags.buckets.pluck("key")
404
+ end
405
+ end
406
+
407
+ ArticleSearch.aggregate(:popular_tags).aggregations.popular_tags
408
+ Traceback (most recent call last):
409
+ 4: from app/searches/article_search.rb:3:in `block in <class:ArticleSearch>'
410
+ 3: from app/searches/article_search.rb:3:in `block in <class:ArticleSearch>'
411
+ 2: from app/searches/article_search.rb:3:in `block in <class:ArticleSearch>'
412
+ 1: from app/searches/article_search.rb:3:in `block in <class:ArticleSearch>'
413
+ SystemStackError (stack level too deep)
414
+ ````
415
+
416
+ Instead, use the argument passed to the block: it is a shortcut for `response.aggregations`, which is a [Response::Response](#responses) and not a [Response::Aggregations](#responses).
417
+
418
+ ````ruby
419
+ class ArticleSearch < Caoutsearch::Search::Base
420
+ transform_aggregation :popular_tags do |aggs|
421
+ aggs.popular_tags.buckets.pluck("key")
422
+ end
423
+ end
424
+
425
+ ArticleSearch.aggregate(:popular_tags).aggregations.popular_tags
426
+ => ["Blog", "Tech", …]
427
+ ````
428
+
429
+ One last helpful argument is `track_total_hits`, which lets you perform calculations over aggregations using the `total_count` method without sending a second request.
430
+ Take a look at [Total count](#total-count) to understand why a second request could be performed.
431
+
432
+ ````ruby
433
+ class ArticleSearch < Caoutsearch::Search::Base
434
+ has_aggregation :tagged, filter: { exists: { field: "tag" } }
435
+
436
+ transform_aggregation :tagged_rate, from: :tagged, track_total_hits: true do |aggs|
437
+ count = aggs.dig(:tagged, :doc_count)
438
+ count.to_f / total_count
439
+ end
440
+
441
+ transform_aggregation :tagged_rate_without_track_total_hits, from: :tagged do |aggs|
442
+ count = aggs.dig(:tagged, :doc_count)
443
+ count.to_f / total_count
444
+ end
445
+ end
446
+
447
+ ArticleSearch.aggregate(:tagged_rate).aggregations.tagged_rate
448
+ # ArticleSearch Search { "body": { "track_total_hits": true, "aggs": { "tagged": {…}}}}
449
+ # ArticleSearch Search (10ms / took 5ms)
450
+ => 0.95
451
+
452
+ ArticleSearch.aggregate(:tagged_rate_without_track_total_hits).aggregations.tagged_rate_without_track_total_hits
453
+ # ArticleSearch Search { "body": { "aggs": { "tagged": {…}}}}
454
+ # ArticleSearch Search (10ms / took 5ms)
455
+ # ArticleSearch Search { "body": { "track_total_hits": true, "aggs": { "tagged": {…}}}}
456
+ # ArticleSearch Search (10ms / took 5ms)
457
+ => 0.95
458
+ ````
459
+
460
+ #### Responses
461
+
462
+ After the request has been sent by calling a method such as `load`, `response` or `hits`, the result is wrapped in a `Response::Response` object, which provides method access to its properties via [Hashie::Mash](http://github.com/intridea/hashie).
463
+
464
+ Aggregations and suggestions are wrapped in their own respective subclasses of `Response::Response`:
465
+
466
+ ````ruby
467
+ search.response
468
+ => #<Caoutsearch::Response::Response _shards=#<Caoutsearch::Response::Response failed=0 skipped=0 successful=5 total=5> hits=…
469
+
470
+ search.hits
471
+ => #<Hashie::Array [#<Caoutsearch::Response::Response _id="2"…
14
472
 
15
- <!-- TODO -->
473
+ search.aggregations
474
+ => #<Caoutsearch::Response::Aggregations view_count=#<Caoutsearch::Response::Response…
16
475
 
17
- ### Usage
476
+ search.suggestions
477
+ => #<Caoutsearch::Response::Suggestions tags=#<Caoutsearch::Response::Response…
478
+ ````
18
479
 
19
- <!-- TODO -->
480
+ ##### Loading records
481
+
482
+ When calling `records`, the search engine will try to load records from a model using the same class name without the `Search` suffix:
483
+ * `ArticleSearch` > `Article`
484
+ * `Blog::ArticleSearch` > `Blog::Article`
485
+
486
+ ````ruby
487
+ ArticleSearch.new.records.first
488
+ # ArticleSearch Search (10ms / took 5ms)
489
+ # Article Load (9.6ms) SELECT "articles".* FROM "articles" WHERE "articles"."id" IN (1, …
490
+ => #<Article id: 1, …>
491
+ ````
492
+
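The naming convention above amounts to removing a suffix from the class name. A minimal sketch, assuming a hypothetical helper `model_name_for` (Caoutsearch may resolve the model differently):

```ruby
# Hypothetical helper: derives the model name from a search class name
# by removing the trailing "Search".
def model_name_for(search_class_name)
  search_class_name.delete_suffix("Search")
end

puts model_name_for("ArticleSearch")       # prints "Article"
puts model_name_for("Blog::ArticleSearch") # prints "Blog::Article"
```

In a Rails application the resulting string could then be resolved to a class, for example with `String#constantize` from ActiveSupport.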
493
+ However, you can define an alternative model to load records. This might be helpful when using [single table inheritance](https://api.rubyonrails.org/classes/ActiveRecord/Inheritance.html).
494
+
495
+ ````ruby
496
+ ArticleSearch.new.records(use: BlogArticle).first
497
+ # ArticleSearch Search (10ms / took 5ms)
498
+ # BlogArticle Load (9.6ms) SELECT "articles".* FROM "articles" WHERE "articles"."id" IN (1, …
499
+ => #<BlogArticle id: 1, …>
500
+ ````
501
+
502
+ You can also define an alternative model at class level:
503
+
504
+ ````ruby
505
+ class BlogArticleSearch < Caoutsearch::Search::Base
506
+ self.model_name = "Article"
507
+
508
+ default do
509
+ query.filters << { term: { category: "blog" } }
510
+ end
511
+ end
512
+
513
+ BlogArticleSearch.new.records.first
514
+ # BlogArticleSearch Search (10ms / took 5ms)
515
+ # Article Load (9.6ms) SELECT "articles".* FROM "articles" WHERE "articles"."id" IN (1, …
516
+ => #<Article id: 1, …>
517
+ ````
518
+
519
+ ### Model integration
520
+
521
+ #### Add Caoutsearch to your models
522
+
523
+ The simplest solution is to add `Caoutsearch::Model` to your model and then link the appropriate `Index` and/or `Search` engines:
524
+
525
+ ```ruby
526
+ class Article < ActiveRecord::Base
527
+ include Caoutsearch::Model
528
+
529
+ index_with ArticleIndex
530
+ search_with ArticleSearch
531
+ end
532
+ ```
533
+
534
+ If you don't need your models to be both `Indexable` and `Searchable`, you can include only one of the following two modules:
535
+
536
+ ````ruby
537
+ class Article < ActiveRecord::Base
538
+ include Caoutsearch::Model::Indexable
539
+
540
+ index_with ArticleIndex
541
+ end
542
+ ````
543
+ or
544
+ ````ruby
545
+ class Article < ActiveRecord::Base
546
+ include Caoutsearch::Model::Searchable
547
+
548
+ search_with ArticleSearch
549
+ end
550
+ ````
551
+
552
+ The modules can be safely included in the meta model `ApplicationRecord`.
553
+ Indexing & searching features are not available until you call `index_with` or `search_with`:
554
+
555
+ ````ruby
556
+ class ApplicationRecord < ActiveRecord::Base
557
+ include Caoutsearch::Model
558
+ end
559
+ ````
560
+
561
+ #### Index records
562
+
563
+ ##### Index multiple records
564
+
565
+ Import all your records or a restricted scope of records to Elasticsearch.
566
+
567
+ ````ruby
568
+ Article.reindex
569
+ Article.where(published: true).reindex
570
+ ````
571
+
572
+ You can update one or more properties (see [Index Engine](#index-engine) to read more about properties):
573
+
574
+ ````ruby
575
+ Article.reindex(:category)
576
+ Article.reindex(%i[category published_on])
577
+ ````
578
+
579
+ When `reindex` is called without properties, it'll import the full document to ES.
580
+ Conversely, when properties are passed, it'll only update existing documents.
581
+ You can control this behavior with the `method` argument.
582
+
583
+ ````ruby
584
+ Article.where(id: 123).reindex(:category)
585
+ # ArticleIndex Reindex {"index":"articles","body":[{"update":{"_id":123}},{"doc":{"category":"blog"}}]}
586
+ # [Error] {"update"=>{"_index"=>"articles", "_id"=>"123", "status"=>404, "error"=>{"type"=>"document_missing_exception", …}}
587
+
588
+ Article.where(id: 123).reindex(:category, method: :index)
589
+ # ArticleIndex Reindex {"index":"articles","body":[{"index":{"_id":123}},{"category":"blog"}]}
590
+
591
+ Article.where(id: 123).reindex(method: :update)
592
+ # ArticleIndex Reindex {"index":"articles","body":[{"update":{"_id":123}},{"doc":{…}}]}
593
+ ````
594
+
595
+ ##### Index single records
596
+
597
+ Import a single record.
598
+
599
+ ````ruby
600
+ Article.find(123).update_index
601
+ ````
602
+
603
+ You can update one or more properties (see [Index Engine](#index-engine) to read more about properties):
604
+
605
+ ````ruby
606
+ Article.find(123).update_index(:category)
607
+ Article.find(123).update_index(%i[category published_on])
608
+ ````
609
+
610
+ You can verify if and how documents are indexed.
611
+ If the document is missing from ES, it raises an `Elastic::Transport::Transport::Errors::NotFound` error.
612
+
613
+ ````ruby
614
+ Article.find(123).indexed_document
615
+ # Traceback (most recent call last):
616
+ # 1: from (irb):1
617
+ # Elastic::Transport::Transport::Errors::NotFound ([404] {"_index":"articles","_id":"123","found":false})
618
+
619
+ Article.find(123).update_index
620
+ Article.find(123).indexed_document
621
+ => {"_index"=>"articles", "_id"=>"123", "_version"=>1, "found"=>true, "_source"=>{…}}
622
+ ````
623
+
624
+ ##### Delete documents
625
+
626
+ You can delete one or more documents.
627
+ **Note**: it won't delete records from the database, only from the ES indice.
628
+
629
+ ````ruby
630
+ Article.delete_indexes
631
+ Article.where(id: 123).delete_indexed_documents
632
+ Article.find(123).delete_index
633
+ ````
634
+
635
+ If a record is already deleted from the database, you can still delete its document.
636
+
637
+ ````ruby
638
+ Article.delete_index(123)
639
+ ````
640
+
641
+ ##### Automatic Callbacks
642
+
643
+ Callbacks are not provided by Caoutsearch but they are very easy to add:
644
+
645
+ ````ruby
646
+ class Article < ApplicationRecord
647
+ index_with ArticleIndex
648
+
649
+ after_commit :update_index, on: %i[create update]
650
+ after_commit :delete_index, on: %i[destroy]
651
+ end
652
+ ````
653
+
654
+ ##### Asynchronous methods
655
+
656
+ TODO
657
+
658
+ #### Search for records
659
+
660
+ ##### Search API
661
+ Searching is pretty simple.
662
+
663
+ ````ruby
664
+ Article.search("Quick brown fox")
665
+ => #<ArticleSearch current_criteria: ["Quick brown fox"]>
666
+ ````
667
+
668
+ You can chain criteria and many other parameters:
669
+ ````ruby
670
+ Article.search("Quick brown fox").search(published: true)
671
+ => #<ArticleSearch current_criteria: ["Quick brown fox", {"published"=>true}]>
672
+
673
+ Article.search("Quick brown fox").order(:publication_date)
674
+ => #<ArticleSearch current_criteria: ["Quick brown fox"], current_order: :publication_date>
675
+
676
+ Article.search("Quick brown fox").limit(100).offset(100)
677
+ => #<ArticleSearch current_criteria: ["Quick brown fox"], current_limit: 100, current_offset: 100>
678
+
679
+ Article.search("Quick brown fox").page(1).per(100)
680
+ => #<ArticleSearch current_criteria: ["Quick brown fox"], current_page: 1, current_limit: 100>
681
+
682
+ Article.search("Quick brown fox").aggregate(:tags).aggregate(:dates)
683
+ => #<ArticleSearch current_criteria: ["Quick brown fox"], current_aggregations: [:tags, :dates]>
684
+ ````
685
+
686
+ ##### Pagination
687
+
688
+ Search results can be paginated.
689
+ ````ruby
690
+ search = Article.search("Quick brown fox").page(1).per(100)
691
+ search.current_page
692
+ => 1
693
+
694
+ search.total_pages
695
+ => 2546
696
+
697
+ search.total_count
698
+ => 254514
699
+ ````
700
+
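The figures above are linked by simple arithmetic: the page count is the total count divided by the page size, rounded up. A minimal sketch, assuming hypothetical helper names `total_pages_for` and `offset_for` that are not part of Caoutsearch:

```ruby
# Hypothetical helper: number of pages needed to show every hit.
def total_pages_for(total_count, per_page)
  (total_count.to_f / per_page).ceil
end

# Hypothetical helper: offset of the first hit on a 1-based page.
def offset_for(page, per_page)
  (page - 1) * per_page
end

puts total_pages_for(254_514, 100) # prints 2546
puts offset_for(2, 100)            # prints 100
```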
701
+ ##### Total count
702
+
703
+ By default [ES doesn't return the total number of hits](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-your-data.html#track-total-hits). So, when calling `total_count` or `total_pages` a second request might be sent to ES.
704
+ To avoid a second roundtrip, use `track_total_hits`:
705
+
706
+ ````ruby
707
+ search = Article.search("Quick brown fox")
708
+ search.hits
709
+ # ArticleSearch Search {…}
710
+ # ArticleSearch Search (81.8ms / took 16ms)
711
+ => […]
712
+
713
+ search.total_count
714
+ # ArticleSearch Search {…, track_total_hits: true }
715
+ # ArticleSearch Search (135.3ms / took 76ms)
716
+ => 276
717
+
718
+ search = Article.search("Quick brown fox").track_total_hits
719
+ search.hits
720
+ # ArticleSearch Search {…, track_total_hits: true }
721
+ # ArticleSearch Search (120.2ms / took 56ms)
722
+ => […]
723
+
724
+ search.total_count
725
+ => 276
726
+ ````
727
+
728
+ ##### Iterating results
729
+
730
+ Several methods are provided to loop through a collection of hits or records.
731
+ These methods process batches in the most efficient way: [PIT search_after](https://www.elastic.co/guide/en/elasticsearch/reference/current/paginate-search-results.html#search-after).
732
+
733
+ * `find_each_hit` to yield each hit returned by Elasticsearch.
734
+ * `find_each_record` to yield each record from your database.
735
+ * `find_hits_in_batches` to yield each batch of hits as returned by Elasticsearch.
736
+ * `find_records_in_batches` to yield each batch of records from the database.
737
+
738
+ Example:
739
+
740
+ ```ruby
741
+ Article.search(published: true).find_each_record do |record|
742
+ record.inspect
743
+ end
744
+ ```
745
+
746
+ The `keep_alive` parameter tells Elasticsearch how long it should keep the point in time alive. Defaults to 1 minute.
747
+
748
+ ```ruby
749
+ Article.search(published: true).find_each_record(keep_alive: "2h")
750
+ ```
751
+
752
+ To specify the batch size, use the chainable `per` method or the `batch_size` parameter. Defaults to 1000.
753
+
754
+ ```ruby
755
+ Article.search(published: true).find_records_in_batches(batch_size: 500)
756
+ Article.search(published: true).per(500).find_records_in_batches
757
+ ```
758
+
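The batching idea behind `search_after` can be sketched without a cluster: each request asks for the hits sorted after the last sort value already seen. The example below is a standalone simulation, not Caoutsearch code; `DOCUMENTS`, `fetch_batch` and `each_batch` are hypothetical names, and an in-memory array stands in for the index and its point in time.

```ruby
# In-memory stand-in for an index: each hit carries its sort values.
DOCUMENTS = (1..7).map { |id| { "_id" => id, "sort" => [id] } }.freeze

# Hypothetical fetcher: returns up to `size` hits sorted after the cursor.
def fetch_batch(size:, search_after: nil)
  hits = DOCUMENTS.sort_by { |hit| hit["sort"] }
  hits = hits.select { |hit| (hit["sort"] <=> search_after) == 1 } if search_after
  hits.first(size)
end

# Hypothetical iterator: yields batches until the index is exhausted.
def each_batch(batch_size:)
  cursor = nil

  loop do
    hits = fetch_batch(size: batch_size, search_after: cursor)
    break if hits.empty?

    yield hits
    # A short batch means there is nothing left to fetch.
    break if hits.size < batch_size

    cursor = hits.last["sort"]
  end
end

each_batch(batch_size: 3) do |hits|
  puts hits.map { |hit| hit["_id"] }.inspect
end
```

Because the cursor is a sort value rather than an offset, each request costs about the same regardless of how deep the iteration has gone.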
759
+ ## Testing with Caoutsearch
760
+
761
+ Caoutsearch offers a few methods to stub Elasticsearch requests.
762
+ You first need to add [webmock](https://github.com/bblimke/webmock) to your Gemfile.
763
+
764
+ ```bash
765
+ bundle add webmock
766
+ ```
767
+
768
+ Then, add `Caoutsearch::Testing::MockRequests` to your test suite.
769
+ The examples below use RSpec, but the module should be compatible with other test frameworks.
770
+
771
+ ```ruby
772
+ # spec/spec_helper.rb
773
+
774
+ require "caoutsearch/testing"
775
+
776
+ RSpec.configure do |config|
777
+ config.include Caoutsearch::Testing::MockRequests
778
+ end
779
+ ```
780
+
781
+ You can then call the following methods:
782
+
783
+ ```ruby
784
+ RSpec.describe SomeClass do
785
+ before do
786
+ stub_elasticsearch_request(:head, "articles").to_return(status: 200)
787
+
788
+ stub_elasticsearch_request(:get, "_cat/indices?format=json&h=index").to_return_json(body: [
789
+ { index: "ca_locals_v14" }
790
+ ])
791
+
792
+ stub_elasticsearch_reindex_request("articles")
793
+ stub_elasticsearch_search_request("articles", [
794
+ {"_id" => "135", "_source" => {"name" => "Hello World"}},
795
+ {"_id" => "137", "_source" => {"name" => "Hello World"}}
796
+ ])
797
+ end
798
+
799
+ # ... do your tests...
800
+ end
801
+ ```
802
+
803
+ `stub_elasticsearch_search_request` also accepts an array of records:
804
+
805
+ ```ruby
806
+ RSpec.describe SomeClass do
807
+ let(:articles) { create_list(:article, 5) }
808
+
809
+ before do
810
+ stub_elasticsearch_search_request("articles", articles)
811
+ end
812
+
813
+ # ... do your tests...
814
+ end
815
+ ```
816
+
817
+ It also lets you shim the total number of hits returned.
818
+
819
+ ```ruby
820
+ RSpec.describe SomeClass do
821
+ before do
822
+ stub_elasticsearch_search_request("articles", [], total: 250)
823
+ end
824
+
825
+ # ... do your tests...
826
+ end
827
+ ```
20
828
 
21
829
  ## Contributing
22
830
 
@@ -31,9 +839,10 @@ bundle add caoutsearch
31
839
  ```bash
32
840
  bundle exec rspec
33
841
  bundle exec rubocop
842
+ bundle exec standardrb
34
843
  ```
35
844
 
36
- Both can be run with:
845
+ All of them can be run with:
37
846
 
38
847
  ```bash
39
848
  bundle exec rake