searchkick 5.0.2 → 5.4.0
- checksums.yaml +4 -4
- data/CHANGELOG.md +72 -0
- data/LICENSE.txt +1 -1
- data/README.md +236 -96
- data/lib/searchkick/index.rb +6 -3
- data/lib/searchkick/index_options.rb +77 -2
- data/lib/searchkick/log_subscriber.rb +3 -3
- data/lib/searchkick/middleware.rb +8 -1
- data/lib/searchkick/model.rb +5 -5
- data/lib/searchkick/query.rb +154 -13
- data/lib/searchkick/reindex_queue.rb +12 -7
- data/lib/searchkick/relation.rb +137 -2
- data/lib/searchkick/relation_indexer.rb +12 -7
- data/lib/searchkick/reranking.rb +28 -0
- data/lib/searchkick/results.rb +3 -1
- data/lib/searchkick/script.rb +11 -0
- data/lib/searchkick/version.rb +1 -1
- data/lib/searchkick/where.rb +11 -0
- data/lib/searchkick.rb +42 -22
- metadata +9 -6
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 5dfa66d383b0e1a91288b4636fef42b22ac73ec63294a7955ac20298851df1c1
+  data.tar.gz: fbe59c3e85352b01674c16831f67b3906367e6a032769582ee552038853194f8
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 25d7df6cf3861522a99851c8c561f0726a6e922d4aafb0fe646883b86c37ba4c1ec78735940a5eb70056433422fa2987989701260ec7b1e2969fb452758d2d43
+  data.tar.gz: d9b637bba6c90e1c08de0b9cd8239e6b8b03aa86eb0a77aebc6b855ff0744220d775c8b7c8a12e1df736f98ee44dac3c6f11ad3a0506aa51e5c22f120f3c13d3
data/CHANGELOG.md
CHANGED
@@ -1,3 +1,75 @@
+## 5.4.0 (2024-09-04)
+
+- Added `knn` option
+- Added `rrf` method
+- Added experimental support for scripting to `where` option
+- Added warning for `exists` with non-`true` values
+- Added warning for full reindex and `:queue` mode
+- Fixed `per_page` method when paginating beyond `max_result_window`
+- Dropped support for Ruby < 3.1
+
+## 5.3.1 (2023-11-28)
+
+- Fixed error with misspellings below and failed queries
+
+## 5.3.0 (2023-07-02)
+
+- Fixed error with `cutoff_frequency`
+- Dropped support for Ruby < 3 and Active Record < 6.1
+- Dropped support for Mongoid < 7
+
+## 5.2.4 (2023-05-11)
+
+- Fixed error with non-string routing and `:async` mode
+
+## 5.2.3 (2023-04-12)
+
+- Fixed error with missing records and multiple models
+
+## 5.2.2 (2023-04-01)
+
+- Fixed `total_docs` method
+- Fixed deprecation warning with Active Support 7.1
+
+## 5.2.1 (2023-02-21)
+
+- Added support for `redis-client` gem
+
+## 5.2.0 (2023-02-08)
+
+- Added model name to warning about missing records
+- Fixed unnecessary data loading when reindexing relations with `:async` and `:queue` modes
+
+## 5.1.2 (2023-01-29)
+
+- Fixed error with missing point in time
+
+## 5.1.1 (2022-12-05)
+
+- Added support for strings for `offset` and `per_page`
+
+## 5.1.0 (2022-10-12)
+
+- Added support for fractional search timeout
+- Fixed search timeout with `elasticsearch` 8+ and `opensearch-ruby` gems
+- Fixed search timeout not applying to `multi_search`
+
+## 5.0.5 (2022-10-09)
+
+- Added `model` method to `Searchkick::Relation`
+- Fixed deprecation warning with `redis` gem
+- Fixed `respond_to?` method on relation loading relation
+- Fixed `Relation loaded` error for non-mutating methods on relation
+
+## 5.0.4 (2022-06-16)
+
+- Added `max_result_window` option
+- Improved error message for unsupported versions of Elasticsearch
+
+## 5.0.3 (2022-03-13)
+
+- Fixed context for index name for inherited models
+
 ## 5.0.2 (2022-03-03)
 
 - Fixed index name for inherited models
data/LICENSE.txt
CHANGED
data/README.md
CHANGED
@@ -26,7 +26,7 @@ Check out [Searchjoy](https://github.com/ankane/searchjoy) for analytics and [Au
 
 :tangerine: Battle-tested at [Instacart](https://www.instacart.com/opensource)
 
-[![Build Status](https://github.com/ankane/searchkick/workflows/build/badge.svg
+[![Build Status](https://github.com/ankane/searchkick/actions/workflows/build.yml/badge.svg)](https://github.com/ankane/searchkick/actions)
 
 ## Contents
 
@@ -43,15 +43,13 @@ Check out [Searchjoy](https://github.com/ankane/searchjoy) for analytics and [Au
 - [Reference](#reference)
 - [Contributing](#contributing)
 
-Searchkick 5.0 was recently released! See [how to upgrade](#upgrading)
-
 ## Getting Started
 
 Install [Elasticsearch](https://www.elastic.co/downloads/elasticsearch) or [OpenSearch](https://opensearch.org/downloads.html). For Homebrew, use:
 
 ```sh
-brew install elasticsearch
-brew services start elasticsearch
+brew install elastic/tap/elasticsearch-full
+brew services start elasticsearch-full
 # or
 brew install opensearch
 brew services start opensearch
@@ -66,7 +64,7 @@ gem "elasticsearch" # select one
 gem "opensearch-ruby" # select one
 ```
 
-The latest version works with Elasticsearch 7 and 8 and OpenSearch 1. For Elasticsearch 6, use version 4.6.3 and [this readme](https://github.com/ankane/searchkick/blob/v4.6.3/README.md).
+The latest version works with Elasticsearch 7 and 8 and OpenSearch 1 and 2. For Elasticsearch 6, use version 4.6.3 and [this readme](https://github.com/ankane/searchkick/blob/v4.6.3/README.md).
 
 Add searchkick to models you want to search.
 
@@ -122,9 +120,9 @@ where: {
   category: /frozen .+/, # regexp
   category: {prefix: "frozen"}, # prefix
   store_id: {exists: true}, # exists
+  _not: {store_id: 1}, # negate a condition
   _or: [{in_stock: true}, {backordered: true}],
-  _and: [{in_stock: true}, {backordered: true}]
-  _not: {store_id: 1} # negate a condition
+  _and: [{in_stock: true}, {backordered: true}]
 }
 ```
 
@@ -292,12 +290,18 @@ Option | Matches | Example
 
 The default is `:word`. The most matches will happen with `:word_middle`.
 
+To specify different matching for different fields, use:
+
+```ruby
+Product.search(query, fields: [{name: :word_start}, {brand: :word_middle}])
+```
+
 ### Exact Matches
 
 To match a field exactly (case-sensitive), use:
 
 ```ruby
-Product.search(query, fields: [{
+Product.search(query, fields: [{name: :exact}])
 ```
 
 ### Phrase Matches
@@ -323,11 +327,11 @@ end
 See the [list of languages](https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-stemmer-tokenfilter.html#analysis-stemmer-tokenfilter-configure-parms). A few languages require plugins:
 
 - `chinese` - [analysis-ik plugin](https://github.com/medcl/elasticsearch-analysis-ik)
-- `chinese2` - [analysis-smartcn plugin](https://www.elastic.co/guide/en/elasticsearch/plugins/
-- `japanese` - [analysis-kuromoji plugin](https://www.elastic.co/guide/en/elasticsearch/plugins/
+- `chinese2` - [analysis-smartcn plugin](https://www.elastic.co/guide/en/elasticsearch/plugins/current/analysis-smartcn.html)
+- `japanese` - [analysis-kuromoji plugin](https://www.elastic.co/guide/en/elasticsearch/plugins/current/analysis-kuromoji.html)
 - `korean` - [analysis-openkoreantext plugin](https://github.com/open-korean-text/elasticsearch-analysis-openkoreantext)
-- `korean2` - [analysis-nori plugin](https://www.elastic.co/guide/en/elasticsearch/plugins/
-- `polish` - [analysis-stempel plugin](https://www.elastic.co/guide/en/elasticsearch/plugins/
+- `korean2` - [analysis-nori plugin](https://www.elastic.co/guide/en/elasticsearch/plugins/current/analysis-nori.html)
+- `polish` - [analysis-stempel plugin](https://www.elastic.co/guide/en/elasticsearch/plugins/current/analysis-stempel.html)
 - `ukrainian` - [analysis-ukrainian plugin](https://www.elastic.co/guide/en/elasticsearch/plugins/7.4/analysis-ukrainian.html)
 - `vietnamese` - [analysis-vietnamese plugin](https://github.com/duydo/elasticsearch-analysis-vietnamese)
 
@@ -592,6 +596,14 @@ There are four strategies for keeping the index synced with your database.
 end
 ```
 
+And reindex a record or relation manually.
+
+```ruby
+product.reindex
+# or
+store.products.reindex(mode: :async)
+```
+
 You can also do bulk updates.
 
 ```ruby
@@ -608,6 +620,12 @@ Searchkick.callbacks(false) do
 end
 ```
 
+Or override the model’s strategy.
+
+```ruby
+product.reindex(mode: :async) # :inline or :queue
+```
+
 ### Associations
 
 Data is **not** automatically synced when an association is updated. If this is desired, add a callback to reindex:
@@ -654,20 +672,16 @@ The best starting point to improve your search **by far** is to track searches a
 Product.search("apple", track: {user_id: current_user.id})
 ```
 
-[See the docs](https://github.com/ankane/searchjoy) for how to install and use.
-
-Focus on:
+[See the docs](https://github.com/ankane/searchjoy) for how to install and use. Focus on top searches with a low conversion rate.
 
-
-- top searches with no results
-
-Searchkick can then use the conversion data to learn what users are looking for. If a user searches for “ice cream” and adds Ben & Jerry’s Chunky Monkey to the cart (our conversion metric at Instacart), that item gets a little more weight for similar searches.
+Searchkick can then use the conversion data to learn what users are looking for. If a user searches for “ice cream” and adds Ben & Jerry’s Chunky Monkey to the cart (our conversion metric at Instacart), that item gets a little more weight for similar searches. This can make a huge difference on the quality of your search.
 
 Add conversion data with:
 
 ```ruby
 class Product < ApplicationRecord
-  has_many :
+  has_many :conversions, class_name: "Searchjoy::Conversion", as: :convertable
+  has_many :searches, class_name: "Searchjoy::Search", through: :conversions
 
   searchkick conversions: [:conversions] # name of field
 
@@ -681,15 +695,100 @@ class Product < ApplicationRecord
 end
 ```
 
-Reindex and set up a cron job to add new conversions daily.
+Reindex and set up a cron job to add new conversions daily. For zero downtime deployment, temporarily set `conversions: false` in your search calls until the data is reindexed.
+
+### Performant Conversions
+
+A performant way to do conversions is to cache them to prevent N+1 queries. For Postgres, create a migration with:
 
 ```ruby
-
+add_column :products, :search_conversions, :jsonb
 ```
 
-
+For MySQL, use `:json`, and for others, use `:text` with a [JSON serializer](https://api.rubyonrails.org/classes/ActiveRecord/AttributeMethods/Serialization/ClassMethods.html).
+
+Next, update your model. Create a separate method for conversion data so you can use [partial reindexing](#partial-reindexing).
 
-
+```ruby
+class Product < ApplicationRecord
+  searchkick conversions: [:conversions]
+
+  def search_data
+    {
+      name: name,
+      category: category
+    }.merge(conversions_data)
+  end
+
+  def conversions_data
+    {
+      conversions: search_conversions || {}
+    }
+  end
+end
+```
+
+Deploy and reindex your data. For zero downtime deployment, temporarily set `conversions: false` in your search calls until the data is reindexed.
+
+```ruby
+Product.reindex
+```
+
+Then, create a job to update the conversions column and reindex records with new conversions. Here’s one you can use for Searchjoy:
+
+```ruby
+class UpdateConversionsJob < ApplicationJob
+  def perform(class_name, since: nil, update: true, reindex: true)
+    model = Searchkick.load_model(class_name)
+
+    # get records that have a recent conversion
+    recently_converted_ids =
+      Searchjoy::Conversion.where(convertable_type: class_name).where(created_at: since..)
+      .order(:convertable_id).distinct.pluck(:convertable_id)
+
+    # split into batches
+    recently_converted_ids.in_groups_of(1000, false) do |ids|
+      if update
+        # fetch conversions
+        conversions =
+          Searchjoy::Conversion.where(convertable_id: ids, convertable_type: class_name)
+          .joins(:search).where.not(searchjoy_searches: {user_id: nil})
+          .group(:convertable_id, :query).distinct.count(:user_id)
+
+        # group by record
+        conversions_by_record = {}
+        conversions.each do |(id, query), count|
+          (conversions_by_record[id] ||= {})[query] = count
+        end
+
+        # update conversions column
+        model.transaction do
+          conversions_by_record.each do |id, conversions|
+            model.where(id: id).update_all(search_conversions: conversions)
+          end
+        end
+      end
+
+      if reindex
+        # reindex conversions data
+        model.where(id: ids).reindex(:conversions_data)
+      end
+    end
+  end
+end
+```
+
+Run the job:
+
+```ruby
+UpdateConversionsJob.perform_now("Product")
+```
+
+And set it up to run daily.
+
+```ruby
+UpdateConversionsJob.perform_later("Product", since: 1.day.ago)
+```
 
 ## Personalized Results
 
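The `# group by record` step of `UpdateConversionsJob` in the hunk above reshapes a grouped count, keyed by `[convertable_id, query]` pairs, into one hash per record. A minimal standalone sketch of just that step, assuming a hand-written hash in place of the Active Record result (the sample data is made up):

```ruby
# Stand-in for the result of `.group(:convertable_id, :query).distinct.count(:user_id)`,
# which is a hash keyed by [convertable_id, query] pairs. Sample data is hypothetical.
conversions = {
  [1, "ice cream"] => 3,
  [1, "chunky monkey"] => 1,
  [2, "ice cream"] => 2
}

# group by record: {record_id => {query => distinct user count}}
conversions_by_record = {}
conversions.each do |(id, query), count|
  (conversions_by_record[id] ||= {})[query] = count
end

p conversions_by_record
```

Each inner hash is what the job writes to the `search_conversions` column for that record.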
@@ -716,7 +815,7 @@ Product.search("milk", boost_where: {orderer_ids: current_user.id})
 
 Autocomplete predicts what a user will type, making the search experience faster and easier.
 
-![Autocomplete](https://gist.
+![Autocomplete](https://gist.githubusercontent.com/ankane/b6988db2802aca68a589b31e41b44195/raw/40febe948427e5bc53ec4e5dc248822855fef76f/autocomplete.png)
 
 **Note:** To autocomplete on search terms rather than results, check out [Autosuggest](https://github.com/ankane/autosuggest).
 
@@ -782,7 +881,7 @@ Then add the search box and JavaScript code to a view.
 
 ## Suggestions
 
-![Suggest](https://gist.
+![Suggest](https://gist.githubusercontent.com/ankane/b6988db2802aca68a589b31e41b44195/raw/40febe948427e5bc53ec4e5dc248822855fef76f/recursion.png)
 
 ```ruby
 class Product < ApplicationRecord
@@ -801,7 +900,7 @@ products.suggestions # ["peanut butter"]
 
 [Aggregations](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations.html) provide aggregated search data.
 
-![Aggregations](https://gist.
+![Aggregations](https://gist.githubusercontent.com/ankane/b6988db2802aca68a589b31e41b44195/raw/40febe948427e5bc53ec4e5dc248822855fef76f/facets.png)
 
 ```ruby
 products = Product.search("chuck taylor", aggs: [:product_type, :gender, :brand])
@@ -1011,7 +1110,7 @@ Restaurant.search("soup", where: {bounds: {geo_shape: {type: "polygon", coordina
 Falling entirely within the query shape
 
 ```ruby
-Restaurant.search("salad", where: {bounds: {geo_shape: {type: "circle", relation: "within", coordinates:
+Restaurant.search("salad", where: {bounds: {geo_shape: {type: "circle", relation: "within", coordinates: {lat: 38, lon: -123}, radius: "1km"}}})
 ```
 
 Not touching the query shape
@@ -1384,7 +1483,15 @@ ENV["ELASTICSEARCH_URL"] = "https://user:password@host1,https://user:password@ho
 ENV["OPENSEARCH_URL"] = "https://user:password@host1,https://user:password@host2"
 ```
 
-
+### Client Options
+
+Create an initializer with:
+
+```ruby
+Searchkick.client_options[:reload_connections] = true
+```
+
+See the docs for [Elasticsearch](https://www.elastic.co/guide/en/elasticsearch/client/ruby-api/current/advanced-config.html) or [Opensearch](https://rubydoc.info/gems/opensearch-transport#configuration) for a complete list of options.
 
 ### Lograge
 
@@ -1575,11 +1682,12 @@ Reindex a subset of attributes to reduce time spent generating search data and c
 class Product < ApplicationRecord
   def search_data
     {
-      name: name
-
+      name: name,
+      category: category
+    }.merge(prices_data)
   end
 
-  def
+  def prices_data
     {
       price: price,
       sale_price: sale_price
@@ -1591,68 +1699,7 @@ end
 And use:
 
 ```ruby
-Product.reindex(:
-```
-
-### Performant Conversions
-
-Split out conversions into a separate method so you can use partial reindexing, and cache conversions to prevent N+1 queries. Be sure to use a centralized cache store like Memcached or Redis.
-
-```ruby
-class Product < ApplicationRecord
-  def search_data
-    {
-      name: name
-    }.merge(search_conversions)
-  end
-
-  def search_conversions
-    {
-      conversions: Rails.cache.read("search_conversions:#{self.class.name}:#{id}") || {}
-    }
-  end
-end
-```
-
-Create a job to update the cache and reindex records with new conversions.
-
-```ruby
-class ReindexConversionsJob < ApplicationJob
-  def perform(class_name)
-    # get records that have a recent conversion
-    recently_converted_ids =
-      Searchjoy::Search.where("convertable_type = ? AND converted_at > ?", class_name, 1.day.ago)
-      .order(:convertable_id).distinct.pluck(:convertable_id)
-
-    # split into groups
-    recently_converted_ids.in_groups_of(1000, false) do |ids|
-      # fetch conversions
-      conversions =
-        Searchjoy::Search.where(convertable_id: ids, convertable_type: class_name)
-        .group(:convertable_id, :query).distinct.count(:user_id)
-
-      # group conversions by record
-      conversions_by_record = {}
-      conversions.each do |(id, query), count|
-        (conversions_by_record[id] ||= {})[query] = count
-      end
-
-      # write to cache
-      conversions_by_record.each do |id, conversions|
-        Rails.cache.write("search_conversions:#{class_name}:#{id}", conversions)
-      end
-
-      # partial reindex
-      class_name.constantize.where(id: ids).reindex(:search_conversions)
-    end
-  end
-end
-```
-
-Run the job with:
-
-```ruby
-ReindexConversionsJob.perform_later("Product")
+Product.reindex(:prices_data)
 ```
 
 ## Advanced
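The partial reindexing README change above relies on `search_data` merging in a separate `prices_data` method, so that a call such as `Product.reindex(:prices_data)` only has to generate the price fields. A plain-Ruby sketch of that split, assuming a `Struct` stand-in for the model (no Searchkick or Active Record involved; `public_send` is only an illustration of selecting the method by name, not a description of Searchkick's internals):

```ruby
# Plain-Ruby stand-in for the model; field values below are hypothetical.
Product = Struct.new(:name, :category, :price, :sale_price) do
  # full document: base attributes plus the price fields
  def search_data
    {name: name, category: category}.merge(prices_data)
  end

  # subset of the document that a partial reindex would regenerate
  def prices_data
    {price: price, sale_price: sale_price}
  end
end

product = Product.new("Milk", "Dairy", 4.5, 3.99)

full = product.search_data
partial = product.public_send(:prices_data)

p full
p partial
```

Keeping the subset in its own method means the cheaper hash can be built without touching `name` or `category`.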
@@ -1798,6 +1845,87 @@ To query nested data, use dot notation.
 Product.search("san", fields: ["store.city"], where: {"store.zip_code" => 12345})
 ```
 
+## Nearest Neighbor Search
+
+*Available for Elasticsearch 8.6+ and OpenSearch 2.4+*
+
+```ruby
+class Product < ApplicationRecord
+  searchkick knn: {embedding: {dimensions: 3, distance: "cosine"}}
+end
+```
+
+Also supports `euclidean` and `inner_product`
+
+Reindex and search with:
+
+```ruby
+Product.search(knn: {field: :embedding, vector: [1, 2, 3]}, limit: 10)
+```
+
+## Semantic Search
+
+First, add [nearest neighbor search](#nearest-neighbor-search-unreleased-experimental) to your model
+
+```ruby
+class Product < ApplicationRecord
+  searchkick knn: {embedding: {dimensions: 768, distance: "cosine"}}
+end
+```
+
+Generate an embedding for each record (you can use an external service or a library like [Informers](https://github.com/ankane/informers))
+
+```ruby
+embed = Informers.pipeline("embedding", "Snowflake/snowflake-arctic-embed-m-v1.5")
+embed_options = {model_output: "sentence_embedding", pooling: "none"} # specific to embedding model
+
+Product.find_each do |product|
+  embedding = embed.(product.name, **embed_options)
+  product.update!(embedding: embedding)
+end
+```
+
+For search, generate an embedding for the query (the query prefix is specific to the [embedding model](https://huggingface.co/Snowflake/snowflake-arctic-embed-m-v1.5))
+
+```ruby
+query_prefix = "Represent this sentence for searching relevant passages: "
+query_embedding = embed.(query_prefix + query, **embed_options)
+```
+
+And perform nearest neighbor search
+
+```ruby
+Product.search(knn: {field: :embedding, vector: query_embedding}, limit: 20)
+```
+
+See a [full example](examples/semantic.rb)
+
+## Hybrid Search
+
+Perform keyword search and semantic search in parallel
+
+```ruby
+keyword_search = Product.search(query, limit: 20)
+semantic_search = Product.search(knn: {field: :embedding, vector: query_embedding}, limit: 20)
+Searchkick.multi_search([keyword_search, semantic_search])
+```
+
+To combine the results, use Reciprocal Rank Fusion (RRF)
+
+```ruby
+Searchkick::Reranking.rrf(keyword_search, semantic_search).first(5)
+```
+
+Or a reranking model
+
+```ruby
+rerank = Informers.pipeline("reranking", "mixedbread-ai/mxbai-rerank-xsmall-v1")
+results = (keyword_search.to_a + semantic_search.to_a).uniq
+rerank.(query, results.map(&:name)).first(5).map { |v| results[v[:doc_id]] }
+```
+
+See a [full example](examples/hybrid.rb)
+
 ## Reference
 
 Reindex one record
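The Hybrid Search section added above combines two result lists with Reciprocal Rank Fusion. As a rough illustration of the technique itself, here is a standalone sketch that does not call Searchkick. The function name `reciprocal_rank_fusion`, the constant `k = 60` (a commonly used default), the returned hash shape, and the sample ids are all assumptions of this sketch, not the library's implementation.

```ruby
# Reciprocal Rank Fusion: each item scores the sum of 1 / (k + rank) over the
# rankings it appears in, with rank starting at 1. k = 60 is an assumed default.
def reciprocal_rank_fusion(*rankings, k: 60)
  scores = Hash.new(0.0)
  rankings.each do |ranking|
    ranking.each_with_index do |item, index|
      scores[item] += 1.0 / (k + index + 1)
    end
  end
  # highest fused score first
  scores.sort_by { |_item, score| -score }.map { |item, score| {result: item, score: score} }
end

keyword_ids = ["a", "b", "c"]  # hypothetical keyword ranking
semantic_ids = ["b", "c", "d"] # hypothetical semantic ranking

fused = reciprocal_rank_fusion(keyword_ids, semantic_ids)
p fused.map { |v| v[:result] }
```

Items that appear in both lists accumulate two terms, so they tend to outrank items that appear near the top of only one list.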
@@ -2036,12 +2164,24 @@ Turn on misspellings after a certain number of characters
 Product.search("api", misspellings: {prefix_length: 2}) # api, apt, no ahi
 ```
 
-**Note:** With this option, if the query length is the same as `prefix_length`, misspellings are turned off with Elasticsearch 7 and OpenSearch
+**Note:** With this option, if the query length is the same as `prefix_length`, misspellings are turned off with Elasticsearch 7 and OpenSearch 1
 
 ```ruby
 Product.search("ah", misspellings: {prefix_length: 2}) # ah, no aha
 ```
 
+BigDecimal values are indexed as floats by default so they can be used for boosting. Convert them to strings to keep full precision.
+
+```ruby
+class Product < ApplicationRecord
+  def search_data
+    {
+      units: units.to_s("F")
+    }
+  end
+end
+```
+
 ## Gotchas
 
 ### Consistency
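The BigDecimal note added in the hunk above is about precision: a float keeps only about 15 to 17 significant digits, while `to_s("F")` writes out every digit in plain decimal notation. A short sketch using only Ruby's `bigdecimal` library (the sample value is arbitrary):

```ruby
require "bigdecimal"

units = BigDecimal("0.1234567890123456789")

as_float = units.to_f        # binary float, limited precision
as_string = units.to_s("F")  # plain decimal notation, all digits kept

puts as_float
puts as_string

# round-tripping through the string recovers the exact value; the float does not
puts BigDecimal(as_string) == units
puts BigDecimal(as_float.to_s) == units
```

The trade-off stated in the README still applies: a string field keeps full precision, but the float form is what makes the value usable for boosting.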
data/lib/searchkick/index.rb
CHANGED
@@ -67,7 +67,8 @@ module Searchkick
           index: name,
           body: {
             query: {match_all: {}},
-            size: 0
+            size: 0,
+            track_total_hits: true
           }
         )
 
@@ -98,7 +99,7 @@ module Searchkick
       record_data = RecordData.new(self, record).record_data
 
       # remove underscore
-      get_options = record_data.to_h { |k, v| [k.to_s.
+      get_options = record_data.to_h { |k, v| [k.to_s.delete_prefix("_").to_sym, v] }
 
       client.get(get_options)["_source"]
     end
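The added line in the hunk above strips a leading underscore from each key before the hash is passed to `client.get`. A standalone sketch of that transformation, with a hypothetical `record_data` hash standing in for what `RecordData#record_data` returns (the keys and values here are made up):

```ruby
# Hypothetical record data with underscore-prefixed keys.
record_data = {_index: "products_development", _id: "1", routing: "store-1"}

# remove a single leading underscore from each key; other keys pass through unchanged
get_options = record_data.to_h { |k, v| [k.to_s.delete_prefix("_").to_sym, v] }

p get_options
```

`String#delete_prefix` removes the prefix only when the string starts with it, so keys without an underscore are left as they are.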
@@ -351,6 +352,8 @@ module Searchkick
     # http://www.elasticsearch.org/blog/changing-mapping-with-zero-downtime/
     def full_reindex(relation, import: true, resume: false, retain: false, mode: nil, refresh_interval: nil, scope: nil, wait: nil)
       raise ArgumentError, "wait only available in :async mode" if !wait.nil? && mode != :async
+      # TODO raise ArgumentError in Searchkick 6
+      Searchkick.warn("Full reindex does not support :queue mode - use :async mode instead") if mode == :queue
 
       if resume
         index_name = all_indices.sort.last
@@ -418,7 +421,7 @@ module Searchkick
         true
       end
     rescue => e
-      if Searchkick.transport_error?(e) && e.message.include?("No handler for type [text]")
+      if Searchkick.transport_error?(e) && (e.message.include?("No handler for type [text]") || e.message.include?("class java.util.ArrayList cannot be cast to class java.util.Map"))
         raise UnsupportedVersionError
       end
 