searchkick 5.0.2 → 5.4.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 5ebb326348913a8532f1f4e7771bcdb57dda1abe4cc668e4f5bf8fe44bdbc85b
- data.tar.gz: 57b63c4444b9dbe26bb8decc4ab4dca81e4e037e6642d5728b96549aea0b2131
+ metadata.gz: 5dfa66d383b0e1a91288b4636fef42b22ac73ec63294a7955ac20298851df1c1
+ data.tar.gz: fbe59c3e85352b01674c16831f67b3906367e6a032769582ee552038853194f8
  SHA512:
- metadata.gz: f901dfae13328bb168e60cc72559ee2fd624f801456fee1628272c83ed402c4d00a83d3cac2ba978e6ce67c76853ca2dc5c1c0048b11ac34d74bb2b18bc3b974
- data.tar.gz: '086eb84edef491f27c254cb09ee5bce9ec07bcec46f79b2ca65404041e67bfccd877e30e92e6393ca2117c353ec71e8c632c50615c71e481e89c30c071934dd5'
+ metadata.gz: 25d7df6cf3861522a99851c8c561f0726a6e922d4aafb0fe646883b86c37ba4c1ec78735940a5eb70056433422fa2987989701260ec7b1e2969fb452758d2d43
+ data.tar.gz: d9b637bba6c90e1c08de0b9cd8239e6b8b03aa86eb0a77aebc6b855ff0744220d775c8b7c8a12e1df736f98ee44dac3c6f11ad3a0506aa51e5c22f120f3c13d3
data/CHANGELOG.md CHANGED
@@ -1,3 +1,75 @@
+ ## 5.4.0 (2024-09-04)
+
+ - Added `knn` option
+ - Added `rrf` method
+ - Added experimental support for scripting to `where` option
+ - Added warning for `exists` with non-`true` values
+ - Added warning for full reindex and `:queue` mode
+ - Fixed `per_page` method when paginating beyond `max_result_window`
+ - Dropped support for Ruby < 3.1
+
+ ## 5.3.1 (2023-11-28)
+
+ - Fixed error with misspellings below and failed queries
+
+ ## 5.3.0 (2023-07-02)
+
+ - Fixed error with `cutoff_frequency`
+ - Dropped support for Ruby < 3 and Active Record < 6.1
+ - Dropped support for Mongoid < 7
+
+ ## 5.2.4 (2023-05-11)
+
+ - Fixed error with non-string routing and `:async` mode
+
+ ## 5.2.3 (2023-04-12)
+
+ - Fixed error with missing records and multiple models
+
+ ## 5.2.2 (2023-04-01)
+
+ - Fixed `total_docs` method
+ - Fixed deprecation warning with Active Support 7.1
+
+ ## 5.2.1 (2023-02-21)
+
+ - Added support for `redis-client` gem
+
+ ## 5.2.0 (2023-02-08)
+
+ - Added model name to warning about missing records
+ - Fixed unnecessary data loading when reindexing relations with `:async` and `:queue` modes
+
+ ## 5.1.2 (2023-01-29)
+
+ - Fixed error with missing point in time
+
+ ## 5.1.1 (2022-12-05)
+
+ - Added support for strings for `offset` and `per_page`
+
+ ## 5.1.0 (2022-10-12)
+
+ - Added support for fractional search timeout
+ - Fixed search timeout with `elasticsearch` 8+ and `opensearch-ruby` gems
+ - Fixed search timeout not applying to `multi_search`
+
+ ## 5.0.5 (2022-10-09)
+
+ - Added `model` method to `Searchkick::Relation`
+ - Fixed deprecation warning with `redis` gem
+ - Fixed `respond_to?` method on relation loading relation
+ - Fixed `Relation loaded` error for non-mutating methods on relation
+
+ ## 5.0.4 (2022-06-16)
+
+ - Added `max_result_window` option
+ - Improved error message for unsupported versions of Elasticsearch
+
+ ## 5.0.3 (2022-03-13)
+
+ - Fixed context for index name for inherited models
+
  ## 5.0.2 (2022-03-03)

  - Fixed index name for inherited models
data/LICENSE.txt CHANGED
@@ -1,4 +1,4 @@
- Copyright (c) 2013-2021 Andrew Kane
+ Copyright (c) 2013-2023 Andrew Kane

  MIT License

data/README.md CHANGED
@@ -26,7 +26,7 @@ Check out [Searchjoy](https://github.com/ankane/searchjoy) for analytics and [Au

  :tangerine: Battle-tested at [Instacart](https://www.instacart.com/opensource)

- [![Build Status](https://github.com/ankane/searchkick/workflows/build/badge.svg?branch=master)](https://github.com/ankane/searchkick/actions)
+ [![Build Status](https://github.com/ankane/searchkick/actions/workflows/build.yml/badge.svg)](https://github.com/ankane/searchkick/actions)

  ## Contents

@@ -43,15 +43,13 @@ Check out [Searchjoy](https://github.com/ankane/searchjoy) for analytics and [Au
  - [Reference](#reference)
  - [Contributing](#contributing)

- Searchkick 5.0 was recently released! See [how to upgrade](#upgrading)
-
  ## Getting Started

  Install [Elasticsearch](https://www.elastic.co/downloads/elasticsearch) or [OpenSearch](https://opensearch.org/downloads.html). For Homebrew, use:

  ```sh
- brew install elasticsearch
- brew services start elasticsearch
+ brew install elastic/tap/elasticsearch-full
+ brew services start elasticsearch-full
  # or
  brew install opensearch
  brew services start opensearch
@@ -66,7 +64,7 @@ gem "elasticsearch" # select one
  gem "opensearch-ruby" # select one
  ```

- The latest version works with Elasticsearch 7 and 8 and OpenSearch 1. For Elasticsearch 6, use version 4.6.3 and [this readme](https://github.com/ankane/searchkick/blob/v4.6.3/README.md).
+ The latest version works with Elasticsearch 7 and 8 and OpenSearch 1 and 2. For Elasticsearch 6, use version 4.6.3 and [this readme](https://github.com/ankane/searchkick/blob/v4.6.3/README.md).

  Add searchkick to models you want to search.

@@ -122,9 +120,9 @@ where: {
    category: /frozen .+/, # regexp
    category: {prefix: "frozen"}, # prefix
    store_id: {exists: true}, # exists
+   _not: {store_id: 1}, # negate a condition
    _or: [{in_stock: true}, {backordered: true}],
-   _and: [{in_stock: true}, {backordered: true}],
-   _not: {store_id: 1} # negate a condition
+   _and: [{in_stock: true}, {backordered: true}]
  }
  ```

@@ -292,12 +290,18 @@ Option | Matches | Example

  The default is `:word`. The most matches will happen with `:word_middle`.

+ To specify different matching for different fields, use:
+
+ ```ruby
+ Product.search(query, fields: [{name: :word_start}, {brand: :word_middle}])
+ ```
+
  ### Exact Matches

  To match a field exactly (case-sensitive), use:

  ```ruby
- Product.search(query, fields: [{email: :exact}, :name])
+ Product.search(query, fields: [{name: :exact}])
  ```

  ### Phrase Matches
@@ -323,11 +327,11 @@ end
  See the [list of languages](https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-stemmer-tokenfilter.html#analysis-stemmer-tokenfilter-configure-parms). A few languages require plugins:

  - `chinese` - [analysis-ik plugin](https://github.com/medcl/elasticsearch-analysis-ik)
- - `chinese2` - [analysis-smartcn plugin](https://www.elastic.co/guide/en/elasticsearch/plugins/7.4/analysis-smartcn.html)
- - `japanese` - [analysis-kuromoji plugin](https://www.elastic.co/guide/en/elasticsearch/plugins/7.4/analysis-kuromoji.html)
+ - `chinese2` - [analysis-smartcn plugin](https://www.elastic.co/guide/en/elasticsearch/plugins/current/analysis-smartcn.html)
+ - `japanese` - [analysis-kuromoji plugin](https://www.elastic.co/guide/en/elasticsearch/plugins/current/analysis-kuromoji.html)
  - `korean` - [analysis-openkoreantext plugin](https://github.com/open-korean-text/elasticsearch-analysis-openkoreantext)
- - `korean2` - [analysis-nori plugin](https://www.elastic.co/guide/en/elasticsearch/plugins/7.4/analysis-nori.html)
- - `polish` - [analysis-stempel plugin](https://www.elastic.co/guide/en/elasticsearch/plugins/7.4/analysis-stempel.html)
+ - `korean2` - [analysis-nori plugin](https://www.elastic.co/guide/en/elasticsearch/plugins/current/analysis-nori.html)
+ - `polish` - [analysis-stempel plugin](https://www.elastic.co/guide/en/elasticsearch/plugins/current/analysis-stempel.html)
  - `ukrainian` - [analysis-ukrainian plugin](https://www.elastic.co/guide/en/elasticsearch/plugins/7.4/analysis-ukrainian.html)
  - `vietnamese` - [analysis-vietnamese plugin](https://github.com/duydo/elasticsearch-analysis-vietnamese)

@@ -592,6 +596,14 @@ There are four strategies for keeping the index synced with your database.
  end
  ```

+ And reindex a record or relation manually.
+
+ ```ruby
+ product.reindex
+ # or
+ store.products.reindex(mode: :async)
+ ```
+
  You can also do bulk updates.

  ```ruby
@@ -608,6 +620,12 @@ Searchkick.callbacks(false) do
  end
  ```

+ Or override the model’s strategy.
+
+ ```ruby
+ product.reindex(mode: :async) # :inline or :queue
+ ```
+
  ### Associations

  Data is **not** automatically synced when an association is updated. If this is desired, add a callback to reindex:
@@ -654,20 +672,16 @@ The best starting point to improve your search **by far** is to track searches a
  Product.search("apple", track: {user_id: current_user.id})
  ```

- [See the docs](https://github.com/ankane/searchjoy) for how to install and use.
-
- Focus on:
+ [See the docs](https://github.com/ankane/searchjoy) for how to install and use. Focus on top searches with a low conversion rate.

- - top searches with low conversions
- - top searches with no results
-
- Searchkick can then use the conversion data to learn what users are looking for. If a user searches for “ice cream” and adds Ben & Jerry’s Chunky Monkey to the cart (our conversion metric at Instacart), that item gets a little more weight for similar searches.
+ Searchkick can then use the conversion data to learn what users are looking for. If a user searches for “ice cream” and adds Ben & Jerry’s Chunky Monkey to the cart (our conversion metric at Instacart), that item gets a little more weight for similar searches. This can make a huge difference on the quality of your search.

  Add conversion data with:

  ```ruby
  class Product < ApplicationRecord
-   has_many :searches, class_name: "Searchjoy::Search", as: :convertable
+   has_many :conversions, class_name: "Searchjoy::Conversion", as: :convertable
+   has_many :searches, class_name: "Searchjoy::Search", through: :conversions

    searchkick conversions: [:conversions] # name of field

@@ -681,15 +695,100 @@ class Product < ApplicationRecord
  end
  ```

- Reindex and set up a cron job to add new conversions daily.
+ Reindex and set up a cron job to add new conversions daily. For zero downtime deployment, temporarily set `conversions: false` in your search calls until the data is reindexed.
+
+ ### Performant Conversions
+
+ A performant way to do conversions is to cache them to prevent N+1 queries. For Postgres, create a migration with:

  ```ruby
- rake searchkick:reindex CLASS=Product
+ add_column :products, :search_conversions, :jsonb
  ```

- This can make a huge difference on the quality of your search.
+ For MySQL, use `:json`, and for others, use `:text` with a [JSON serializer](https://api.rubyonrails.org/classes/ActiveRecord/AttributeMethods/Serialization/ClassMethods.html).
+
+ Next, update your model. Create a separate method for conversion data so you can use [partial reindexing](#partial-reindexing).

- For a more performant way to reindex conversion data, check out [performant conversions](#performant-conversions).
+ ```ruby
+ class Product < ApplicationRecord
+   searchkick conversions: [:conversions]
+
+   def search_data
+     {
+       name: name,
+       category: category
+     }.merge(conversions_data)
+   end
+
+   def conversions_data
+     {
+       conversions: search_conversions || {}
+     }
+   end
+ end
+ ```
+
+ Deploy and reindex your data. For zero downtime deployment, temporarily set `conversions: false` in your search calls until the data is reindexed.
+
+ ```ruby
+ Product.reindex
+ ```
+
+ Then, create a job to update the conversions column and reindex records with new conversions. Here’s one you can use for Searchjoy:
+
+ ```ruby
+ class UpdateConversionsJob < ApplicationJob
+   def perform(class_name, since: nil, update: true, reindex: true)
+     model = Searchkick.load_model(class_name)
+
+     # get records that have a recent conversion
+     recently_converted_ids =
+       Searchjoy::Conversion.where(convertable_type: class_name).where(created_at: since..)
+       .order(:convertable_id).distinct.pluck(:convertable_id)
+
+     # split into batches
+     recently_converted_ids.in_groups_of(1000, false) do |ids|
+       if update
+         # fetch conversions
+         conversions =
+           Searchjoy::Conversion.where(convertable_id: ids, convertable_type: class_name)
+           .joins(:search).where.not(searchjoy_searches: {user_id: nil})
+           .group(:convertable_id, :query).distinct.count(:user_id)
+
+         # group by record
+         conversions_by_record = {}
+         conversions.each do |(id, query), count|
+           (conversions_by_record[id] ||= {})[query] = count
+         end
+
+         # update conversions column
+         model.transaction do
+           conversions_by_record.each do |id, conversions|
+             model.where(id: id).update_all(search_conversions: conversions)
+           end
+         end
+       end
+
+       if reindex
+         # reindex conversions data
+         model.where(id: ids).reindex(:conversions_data)
+       end
+     end
+   end
+ end
+ ```
+
+ Run the job:
+
+ ```ruby
+ UpdateConversionsJob.perform_now("Product")
+ ```
+
+ And set it up to run daily.
+
+ ```ruby
+ UpdateConversionsJob.perform_later("Product", since: 1.day.ago)
+ ```
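
The "group by record" and batching steps of `UpdateConversionsJob` can be tried without a database. This is a minimal sketch in plain Ruby, assuming the grouped count comes back as a hash keyed by `[record_id, query]` pairs, which is the shape the destructuring in the job implies. The `group_conversions` helper name and the sample data are hypothetical and not part of Searchkick or Searchjoy.

```ruby
# Hypothetical helper: turn {[record_id, query] => count} into
# {record_id => {query => count}}, mirroring the "group by record" step.
def group_conversions(counts)
  counts.each_with_object({}) do |((id, query), count), by_record|
    (by_record[id] ||= {})[query] = count
  end
end

# Made-up sample data standing in for the grouped count result
counts = {
  [1, "ice cream"] => 3,
  [1, "dessert"] => 1,
  [2, "milk"] => 5
}

conversions_by_record = group_conversions(counts)

# Plain-Ruby counterpart of in_groups_of(1000, false): batches without padding
batches = (1..2500).each_slice(1000).to_a
```

Each inner hash is what the job writes to the `search_conversions` column for that record, so a record with no conversions simply has no entry.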

  ## Personalized Results

@@ -716,7 +815,7 @@ Product.search("milk", boost_where: {orderer_ids: current_user.id})

  Autocomplete predicts what a user will type, making the search experience faster and easier.

- ![Autocomplete](https://gist.github.com/ankane/b6988db2802aca68a589b31e41b44195/raw/40febe948427e5bc53ec4e5dc248822855fef76f/autocomplete.png)
+ ![Autocomplete](https://gist.githubusercontent.com/ankane/b6988db2802aca68a589b31e41b44195/raw/40febe948427e5bc53ec4e5dc248822855fef76f/autocomplete.png)

  **Note:** To autocomplete on search terms rather than results, check out [Autosuggest](https://github.com/ankane/autosuggest).

@@ -782,7 +881,7 @@ Then add the search box and JavaScript code to a view.

  ## Suggestions

- ![Suggest](https://gist.github.com/ankane/b6988db2802aca68a589b31e41b44195/raw/40febe948427e5bc53ec4e5dc248822855fef76f/recursion.png)
+ ![Suggest](https://gist.githubusercontent.com/ankane/b6988db2802aca68a589b31e41b44195/raw/40febe948427e5bc53ec4e5dc248822855fef76f/recursion.png)

  ```ruby
  class Product < ApplicationRecord
@@ -801,7 +900,7 @@ products.suggestions # ["peanut butter"]

  [Aggregations](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations.html) provide aggregated search data.

- ![Aggregations](https://gist.github.com/ankane/b6988db2802aca68a589b31e41b44195/raw/40febe948427e5bc53ec4e5dc248822855fef76f/facets.png)
+ ![Aggregations](https://gist.githubusercontent.com/ankane/b6988db2802aca68a589b31e41b44195/raw/40febe948427e5bc53ec4e5dc248822855fef76f/facets.png)

  ```ruby
  products = Product.search("chuck taylor", aggs: [:product_type, :gender, :brand])
@@ -1011,7 +1110,7 @@ Restaurant.search("soup", where: {bounds: {geo_shape: {type: "polygon", coordina
  Falling entirely within the query shape

  ```ruby
- Restaurant.search("salad", where: {bounds: {geo_shape: {type: "circle", relation: "within", coordinates: [{lat: 38, lon: -123}], radius: "1km"}}})
+ Restaurant.search("salad", where: {bounds: {geo_shape: {type: "circle", relation: "within", coordinates: {lat: 38, lon: -123}, radius: "1km"}}})
  ```

  Not touching the query shape
@@ -1384,7 +1483,15 @@ ENV["ELASTICSEARCH_URL"] = "https://user:password@host1,https://user:password@ho
  ENV["OPENSEARCH_URL"] = "https://user:password@host1,https://user:password@host2"
  ```

- See [elastic-transport](https://github.com/elastic/elastic-transport-ruby) or [opensearch-transport](https://github.com/opensearch-project/opensearch-ruby/tree/main/opensearch-transport) for a complete list of options.
+ ### Client Options
+
+ Create an initializer with:
+
+ ```ruby
+ Searchkick.client_options[:reload_connections] = true
+ ```
+
+ See the docs for [Elasticsearch](https://www.elastic.co/guide/en/elasticsearch/client/ruby-api/current/advanced-config.html) or [OpenSearch](https://rubydoc.info/gems/opensearch-transport#configuration) for a complete list of options.

  ### Lograge

@@ -1575,11 +1682,12 @@ Reindex a subset of attributes to reduce time spent generating search data and c
  class Product < ApplicationRecord
    def search_data
      {
-       name: name
-     }.merge(search_prices)
+       name: name,
+       category: category
+     }.merge(prices_data)
    end

-   def search_prices
+   def prices_data
      {
        price: price,
        sale_price: sale_price
@@ -1591,68 +1699,7 @@ end
  And use:

  ```ruby
- Product.reindex(:search_prices)
- ```
-
- ### Performant Conversions
-
- Split out conversions into a separate method so you can use partial reindexing, and cache conversions to prevent N+1 queries. Be sure to use a centralized cache store like Memcached or Redis.
-
- ```ruby
- class Product < ApplicationRecord
-   def search_data
-     {
-       name: name
-     }.merge(search_conversions)
-   end
-
-   def search_conversions
-     {
-       conversions: Rails.cache.read("search_conversions:#{self.class.name}:#{id}") || {}
-     }
-   end
- end
- ```
-
- Create a job to update the cache and reindex records with new conversions.
-
- ```ruby
- class ReindexConversionsJob < ApplicationJob
-   def perform(class_name)
-     # get records that have a recent conversion
-     recently_converted_ids =
-       Searchjoy::Search.where("convertable_type = ? AND converted_at > ?", class_name, 1.day.ago)
-       .order(:convertable_id).distinct.pluck(:convertable_id)
-
-     # split into groups
-     recently_converted_ids.in_groups_of(1000, false) do |ids|
-       # fetch conversions
-       conversions =
-         Searchjoy::Search.where(convertable_id: ids, convertable_type: class_name)
-         .group(:convertable_id, :query).distinct.count(:user_id)
-
-       # group conversions by record
-       conversions_by_record = {}
-       conversions.each do |(id, query), count|
-         (conversions_by_record[id] ||= {})[query] = count
-       end
-
-       # write to cache
-       conversions_by_record.each do |id, conversions|
-         Rails.cache.write("search_conversions:#{class_name}:#{id}", conversions)
-       end
-
-       # partial reindex
-       class_name.constantize.where(id: ids).reindex(:search_conversions)
-     end
-   end
- end
- ```
-
- Run the job with:
-
- ```ruby
- ReindexConversionsJob.perform_later("Product")
+ Product.reindex(:prices_data)
  ```

  ## Advanced
@@ -1798,6 +1845,87 @@ To query nested data, use dot notation.
  Product.search("san", fields: ["store.city"], where: {"store.zip_code" => 12345})
  ```

+ ## Nearest Neighbor Search
+
+ *Available for Elasticsearch 8.6+ and OpenSearch 2.4+*
+
+ ```ruby
+ class Product < ApplicationRecord
+   searchkick knn: {embedding: {dimensions: 3, distance: "cosine"}}
+ end
+ ```
+
+ Also supports `euclidean` and `inner_product`
+
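
To make the three `distance` options concrete, here is a small plain-Ruby sketch of what each one measures for a pair of vectors. The method names are illustrative only; the search engine computes these itself, and how Elasticsearch or OpenSearch turn a distance into a relevance score is not shown here.

```ruby
# Illustrative implementations of the three distance options.
# The search engine computes these itself; this only shows what each measures.
def inner_product(a, b)
  a.zip(b).sum { |x, y| x * y }
end

def euclidean_distance(a, b)
  Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
end

def cosine_similarity(a, b)
  inner_product(a, b) / (Math.sqrt(inner_product(a, a)) * Math.sqrt(inner_product(b, b)))
end

a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0] # same direction as a, twice the length

cosine = cosine_similarity(a, b)    # close to 1: the directions match
distance = euclidean_distance(a, b) # non-zero: the lengths differ
product = inner_product(a, b)       # grows with both alignment and length
```

Cosine ignores vector length, so it is a common choice for text embeddings, while `euclidean` and `inner_product` are sensitive to magnitude.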
+ Reindex and search with:
+
+ ```ruby
+ Product.search(knn: {field: :embedding, vector: [1, 2, 3]}, limit: 10)
+ ```
+
+ ## Semantic Search
+
+ First, add [nearest neighbor search](#nearest-neighbor-search) to your model
+
+ ```ruby
+ class Product < ApplicationRecord
+   searchkick knn: {embedding: {dimensions: 768, distance: "cosine"}}
+ end
+ ```
+
+ Generate an embedding for each record (you can use an external service or a library like [Informers](https://github.com/ankane/informers))
+
+ ```ruby
+ embed = Informers.pipeline("embedding", "Snowflake/snowflake-arctic-embed-m-v1.5")
+ embed_options = {model_output: "sentence_embedding", pooling: "none"} # specific to embedding model
+
+ Product.find_each do |product|
+   embedding = embed.(product.name, **embed_options)
+   product.update!(embedding: embedding)
+ end
+ ```
+
+ For search, generate an embedding for the query (the query prefix is specific to the [embedding model](https://huggingface.co/Snowflake/snowflake-arctic-embed-m-v1.5))
+
+ ```ruby
+ query_prefix = "Represent this sentence for searching relevant passages: "
+ query_embedding = embed.(query_prefix + query, **embed_options)
+ ```
+
+ And perform nearest neighbor search
+
+ ```ruby
+ Product.search(knn: {field: :embedding, vector: query_embedding}, limit: 20)
+ ```
+
+ See a [full example](examples/semantic.rb)
+
+ ## Hybrid Search
+
+ Perform keyword search and semantic search in parallel
+
+ ```ruby
+ keyword_search = Product.search(query, limit: 20)
+ semantic_search = Product.search(knn: {field: :embedding, vector: query_embedding}, limit: 20)
+ Searchkick.multi_search([keyword_search, semantic_search])
+ ```
+
+ To combine the results, use Reciprocal Rank Fusion (RRF)
+
+ ```ruby
+ Searchkick::Reranking.rrf(keyword_search, semantic_search).first(5)
+ ```
+
+ Or a reranking model
+
+ ```ruby
+ rerank = Informers.pipeline("reranking", "mixedbread-ai/mxbai-rerank-xsmall-v1")
+ results = (keyword_search.to_a + semantic_search.to_a).uniq
+ rerank.(query, results.map(&:name)).first(5).map { |v| results[v[:doc_id]] }
+ ```
+
+ See a [full example](examples/hybrid.rb)
+
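
For intuition about the Reciprocal Rank Fusion step in the Hybrid Search section, the following is a minimal plain-Ruby sketch of the general technique. It is not the implementation behind `Searchkick::Reranking.rrf`; the `rrf_sketch` name, the use of bare ids instead of result objects, and `k: 60` (a commonly used constant) are assumptions made for illustration.

```ruby
# Reciprocal Rank Fusion over ranked lists of ids (best first).
# Each list contributes 1 / (k + rank) for an item, with rank starting at 1.
def rrf_sketch(*rankings, k: 60)
  scores = Hash.new(0.0)
  rankings.each do |ranking|
    ranking.each_with_index do |item, index|
      scores[item] += 1.0 / (k + index + 1)
    end
  end
  # highest fused score first; ties keep first-seen order
  scores.sort_by.with_index { |(_, score), i| [-score, i] }.map(&:first)
end

keyword_ids = [1, 2, 3]  # hypothetical ids from a keyword search
semantic_ids = [3, 1, 4] # hypothetical ids from a semantic search

fused = rrf_sketch(keyword_ids, semantic_ids)
```

Items that rank well in both lists rise to the top, and only ranks are used, so the keyword and semantic scores never need to be on the same scale.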
  ## Reference

  Reindex one record
@@ -2036,12 +2164,24 @@ Turn on misspellings after a certain number of characters
  Product.search("api", misspellings: {prefix_length: 2}) # api, apt, no ahi
  ```

- **Note:** With this option, if the query length is the same as `prefix_length`, misspellings are turned off with Elasticsearch 7 and OpenSearch
+ **Note:** With this option, if the query length is the same as `prefix_length`, misspellings are turned off with Elasticsearch 7 and OpenSearch 1

  ```ruby
  Product.search("ah", misspellings: {prefix_length: 2}) # ah, no aha
  ```

+ BigDecimal values are indexed as floats by default so they can be used for boosting. Convert them to strings to keep full precision.
+
+ ```ruby
+ class Product < ApplicationRecord
+   def search_data
+     {
+       units: units.to_s("F")
+     }
+   end
+ end
+ ```
+
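
The precision difference behind the BigDecimal note can be checked in plain Ruby. This sketch uses a made-up value with more digits than a `Float` can hold and compares a round trip through `to_f` with one through `to_s("F")`.

```ruby
require "bigdecimal"

# Made-up value with more significant digits than a Float can represent
units = BigDecimal("1234567890.123456789012345678")

as_float = units.to_f       # Float keeps roughly 15-17 significant digits
as_string = units.to_s("F") # plain decimal notation, every digit kept

# Converting back shows which representation preserved the value
round_trip_float = BigDecimal(as_float.to_s) == units
round_trip_string = BigDecimal(as_string) == units
```

Only the string form converts back to the original value, which is why the string is the safer choice when exact amounts matter more than numeric boosting.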
2184
+
2045
2185
  ## Gotchas
2046
2186
 
2047
2187
  ### Consistency
@@ -67,7 +67,8 @@ module Searchkick
            index: name,
            body: {
              query: {match_all: {}},
-             size: 0
+             size: 0,
+             track_total_hits: true
            }
          )

@@ -98,7 +99,7 @@ module Searchkick
        record_data = RecordData.new(self, record).record_data

        # remove underscore
-       get_options = record_data.to_h { |k, v| [k.to_s.sub(/\A_/, "").to_sym, v] }
+       get_options = record_data.to_h { |k, v| [k.to_s.delete_prefix("_").to_sym, v] }

        client.get(get_options)["_source"]
      end
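
The change from `sub(/\A_/, "")` to `delete_prefix("_")` in this hunk should not alter behavior: both strip a single leading underscore. A minimal plain-Ruby check, using a made-up `record_data` hash in place of the real `RecordData` output:

```ruby
# Both forms strip one leading underscore from a key such as :_id.
old_style = ->(key) { key.to_s.sub(/\A_/, "").to_sym }
new_style = ->(key) { key.to_s.delete_prefix("_").to_sym }

# Made-up values; the real hash comes from RecordData#record_data
record_data = {_index: "products_development", _id: "1", routing: nil}

old_options = record_data.to_h { |k, v| [old_style.(k), v] }
new_options = record_data.to_h { |k, v| [new_style.(k), v] }
```

`delete_prefix` avoids building a regular expression match for what is a fixed-string operation.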
@@ -351,6 +352,8 @@ module Searchkick
      # http://www.elasticsearch.org/blog/changing-mapping-with-zero-downtime/
      def full_reindex(relation, import: true, resume: false, retain: false, mode: nil, refresh_interval: nil, scope: nil, wait: nil)
        raise ArgumentError, "wait only available in :async mode" if !wait.nil? && mode != :async
+       # TODO raise ArgumentError in Searchkick 6
+       Searchkick.warn("Full reindex does not support :queue mode - use :async mode instead") if mode == :queue

        if resume
          index_name = all_indices.sort.last
@@ -418,7 +421,7 @@ module Searchkick
          true
        end
      rescue => e
-       if Searchkick.transport_error?(e) && e.message.include?("No handler for type [text]")
+       if Searchkick.transport_error?(e) && (e.message.include?("No handler for type [text]") || e.message.include?("class java.util.ArrayList cannot be cast to class java.util.Map"))
          raise UnsupportedVersionError
        end