searchkick 5.4.0 → 5.5.0
- checksums.yaml +4 -4
- data/CHANGELOG.md +27 -1
- data/LICENSE.txt +1 -1
- data/README.md +51 -58
- data/lib/searchkick/index_options.rb +10 -1
- data/lib/searchkick/query.rb +25 -7
- data/lib/searchkick/railtie.rb +1 -1
- data/lib/searchkick/version.rb +1 -1
- metadata +6 -10
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 37b82bea130dfffde9634bf74e4e078e97bbe6e8d0b4e540c13800c765ddc6c3
+  data.tar.gz: 2435f38ff18bf471673e1df11f0fd96cfacb408cf390e04afcc068488777a358
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: f0541cc6673c2382ec82bce16bb5c230fa0fb3220d6430324739c8fb722ca75f8f70d33c1833642ee03edc936f8dba88c37282cdf6e200ead84ae16688054307
+  data.tar.gz: 7522e671cf005d61a5df2856e6754605570f281f6e4b359f537f4a9d6cfe17eccad1466b81adbb683b782c9941caac4f935a2244094f7af11b09f58954d173ff
data/CHANGELOG.md
CHANGED
@@ -1,3 +1,11 @@
+## 5.5.0 (2025-04-03)
+
+- Added `m` and `ef_construction` to `knn` index option
+- Added `ef_search` to `knn` search option
+- Fixed exact cosine distance for OpenSearch 2.19+
+- Dropped support for Ruby < 3.2 and Active Record < 7.1
+- Dropped support for Mongoid < 8
+
 ## 5.4.0 (2024-09-04)
 
 - Added `knn` option
@@ -833,4 +841,22 @@ Breaking changes
 
 ## 0.1.2 (2013-07-30)
 
--
+- Use conversions by default
+
+## 0.1.1 (2013-07-29)
+
+- Renamed `_source` to `search_data`
+- Renamed `searchkick_import` to `search_import`
+
+## 0.1.0 (2013-07-28)
+
+- Added `_source` method
+- Added `index_name` option
+
+## 0.0.2 (2013-07-17)
+
+- Added `conversions` option
+
+## 0.0.1 (2013-07-14)
+
+- First release
data/LICENSE.txt
CHANGED
data/README.md
CHANGED
@@ -821,7 +821,7 @@ Autocomplete predicts what a user will type, making the search experience faster
 
 **Note 2:** If you only have a few thousand records, don’t use Searchkick for autocomplete. It’s *much* faster to load all records into JavaScript and autocomplete there (eliminates network requests).
 
-First, specify which fields use this feature. This is necessary since autocomplete can increase the index size significantly, but don’t worry - this gives you blazing
+First, specify which fields use this feature. This is necessary since autocomplete can increase the index size significantly, but don’t worry - this gives you blazing fast queries.
 
 ```ruby
 class Movie < ApplicationRecord
@@ -835,7 +835,7 @@ Reindex and search with:
 Movie.search("jurassic pa", fields: [:title], match: :word_start)
 ```
 
-
+Use a front-end library like [typeahead.js](https://twitter.github.io/typeahead.js/) to show the results.
 
 #### Here’s how to make it work with Rails
 
@@ -844,13 +844,14 @@ First, add a route and controller action.
 ```ruby
 class MoviesController < ApplicationController
   def autocomplete
-    render json: Movie.search(
+    render json: Movie.search(
+      params[:query],
       fields: ["title^5", "director"],
       match: :word_start,
       limit: 10,
       load: false,
       misspellings: {below: 5}
-
+    ).map(&:title)
   end
 end
 ```
@@ -1023,7 +1024,7 @@ Additional options can be specified for each field:
 Band.search("cinema", fields: [:name], highlight: {fields: {name: {fragment_size: 200}}})
 ```
 
-You can find available highlight options in the [Elasticsearch
+You can find available highlight options in the [Elasticsearch](https://www.elastic.co/guide/en/elasticsearch/reference/current/highlighting.html) or [OpenSearch](https://opensearch.org/docs/latest/search-plugins/searching-data/highlight/) reference.
 
 ## Similar Items
 
@@ -1511,13 +1512,25 @@ See [Production Rails](https://github.com/ankane/production_rails) for other goo
 
 ### JSON Generation
 
-
+Increase performance with faster JSON generation. Add to your Gemfile:
 
 ```ruby
-gem "
+gem "json", ">= 2.10.2"
 ```
 
-
+And create an initializer with:
+
+```ruby
+class SearchSerializer
+  def dump(object)
+    JSON.generate(object)
+  end
+end
+
+Elasticsearch::API.settings[:serializer] = SearchSerializer.new
+# or
+OpenSearch::API.settings[:serializer] = SearchSerializer.new
+```
 
 ### Persistent HTTP Connections
 
@@ -1533,8 +1546,6 @@ To reduce log noise, create an initializer with:
 Ethon.logger = Logger.new(nil)
 ```
 
-If you run into issues on Windows, check out [this post](https://www.rastating.com/fixing-issues-in-typhoeus-and-httparty-on-windows/).
-
 ### Searchable Fields
 
 By default, all string fields are searchable (can be used in `fields` option). Speed up indexing and reduce index size by only making some fields searchable.
@@ -1563,7 +1574,7 @@ For large data sets, you can use background jobs to parallelize reindexing.
 
 ```ruby
 Product.reindex(mode: :async)
-# {index_name: "
+# {index_name: "products_production_20250111210018065"}
 ```
 
 Once the jobs complete, promote the new index with:
@@ -1590,19 +1601,19 @@ You can also have Searchkick wait for reindexing to complete
 Product.reindex(mode: :async, wait: true)
 ```
 
-You can use
+You can use your background job framework to control concurrency. For Solid Queue, create an initializer with:
 
 ```ruby
-
-
-
-And create an initializer with:
+module SearchkickBulkReindexConcurrency
+  extend ActiveSupport::Concern
 
-
-
+  included do
+    limits_concurrency to: 3, key: ""
+  end
+end
 
-
-
+Rails.application.config.after_initialize do
+  Searchkick::BulkReindexJob.include(SearchkickBulkReindexConcurrency)
 end
 ```
 
@@ -1616,7 +1627,7 @@ You can specify a longer refresh interval while reindexing to increase performan
 Product.reindex(mode: :async, refresh_interval: "30s")
 ```
 
-**Note:** This only makes a
+**Note:** This only makes a noticeable difference with parallel reindexing.
 
 When promoting, have it restored to the value in your mapping (defaults to `1s`).
 
@@ -1863,9 +1874,27 @@ Reindex and search with:
 Product.search(knn: {field: :embedding, vector: [1, 2, 3]}, limit: 10)
 ```
 
+### HNSW Options
+
+Nearest neighbor search uses [HNSW](https://en.wikipedia.org/wiki/Hierarchical_navigable_small_world) for indexing.
+
+Specify `m` and `ef_construction`
+
+```ruby
+class Product < ApplicationRecord
+  searchkick knn: {embedding: {dimensions: 3, distance: "cosine", m: 16, ef_construction: 100}}
+end
+```
+
+Specify `ef_search`
+
+```ruby
+Product.search(knn: {field: :embedding, vector: [1, 2, 3], ef_search: 40}, limit: 10)
+```
+
 ## Semantic Search
 
-First, add [nearest neighbor search](#nearest-neighbor-search
+First, add [nearest neighbor search](#nearest-neighbor-search) to your model
 
 ```ruby
 class Product < ApplicationRecord
@@ -2205,42 +2234,6 @@ end
 
 For convenience, this is set by default in the test environment.
 
-## Upgrading
-
-### 5.0
-
-Searchkick 5 supports both the `elasticsearch` and `opensearch-ruby` gems. Add the one you want to use to your Gemfile:
-
-```ruby
-gem "elasticsearch"
-# or
-gem "opensearch-ruby"
-```
-
-If using the deprecated `faraday_middleware-aws-signers-v4` gem, switch to `faraday_middleware-aws-sigv4`.
-
-Also, searches now use lazy loading:
-
-```ruby
-# search not executed
-Product.search("milk")
-
-# search executed
-Product.search("milk").to_a
-```
-
-You can reindex relations in the background:
-
-```ruby
-store.products.reindex(mode: :async)
-# or
-store.products.reindex(mode: :queue)
-```
-
-And there’s a [new option](#default-scopes) for models with default scopes.
-
-Check out the [changelog](https://github.com/ankane/searchkick/blob/master/CHANGELOG.md#500-2022-02-21) for the full list of changes.
-
 ## History
 
 View the [changelog](https://github.com/ankane/searchkick/blob/master/CHANGELOG.md).
data/lib/searchkick/index_options.rb
CHANGED
@@ -451,7 +451,8 @@ module Searchkick
   vector_options[:method] = {
     name: "hnsw",
     space_type: space_type,
-    engine: "lucene"
+    engine: "lucene",
+    parameters: knn_options.slice(:m, :ef_construction)
   }
 end
 
@@ -475,6 +476,14 @@ module Searchkick
   else
     raise ArgumentError, "Unknown distance: #{distance}"
   end
+
+  vector_index_options = knn_options.slice(:m, :ef_construction)
+  if vector_index_options.any?
+    # TODO no quantization by default in Searchkick 6
+    # int8_hnsw was made the default in Elasticsearch 8.14.0
+    type = Searchkick.server_below?("8.14.0") ? "hnsw" : "int8_hnsw"
+    vector_options[:index_options] = {type: type}.merge(vector_index_options)
+  end
 end
 
 mapping[field.to_s] = vector_options
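As a rough illustration of the two index_options.rb hunks, the sketch below rebuilds the HNSW part of the vector mapping outside of Searchkick. The method name `build_vector_options` and its `engine:` and `server_version:` arguments are hypothetical and exist only for this example; only the hash shapes come from the diff. `Gem::Version` stands in for `Searchkick.server_below?`, and the `space_type` key is left out for brevity.

```ruby
require "rubygems" # Gem::Version (normally loaded by default)

# Hypothetical helper, not part of Searchkick: mirrors the hash shapes
# that the hunks produce for each server family.
def build_vector_options(knn_options, engine:, server_version:)
  hnsw = knn_options.slice(:m, :ef_construction)

  if engine == :opensearch
    # OpenSearch: HNSW settings are passed under method.parameters
    {method: {name: "hnsw", engine: "lucene", parameters: hnsw}}
  elsif hnsw.any?
    # Elasticsearch: settings go under index_options with an explicit type,
    # chosen by server version as in the diff
    below_8_14 = Gem::Version.new(server_version) < Gem::Version.new("8.14.0")
    {index_options: {type: below_8_14 ? "hnsw" : "int8_hnsw"}.merge(hnsw)}
  else
    # No HNSW settings given: index_options is not set at all
    {}
  end
end

opts = {dimensions: 3, distance: "cosine", m: 16, ef_construction: 100}
p build_vector_options(opts, engine: :elasticsearch, server_version: "8.15.0")
p build_vector_options(opts, engine: :opensearch, server_version: "2.19.0")
```

One consequence visible in the diff: on Elasticsearch, `index_options` is only written when `m` or `ef_construction` is supplied, so mappings for models that set neither stay unchanged.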
data/lib/searchkick/query.rb
CHANGED
@@ -891,6 +891,7 @@ module Searchkick
 exact = knn[:exact]
 exact = field_options[:distance].nil? || distance != field_options[:distance] if exact.nil?
 k = per_page + offset
+ef_search = knn[:ef_search]
 filter = payload.delete(:query)
 
 if distance.nil?
@@ -934,22 +935,31 @@ module Searchkick
             space_type: space_type
           }
         },
-        boost: distance == "cosine" ? 0.5 : 1.0
+        boost: distance == "cosine" && Searchkick.server_below?("2.19.0", true) ? 0.5 : 1.0
       }
     }
   else
+    if ef_search && Searchkick.server_below?("2.16.0", true)
+      raise Error, "ef_search requires OpenSearch 2.16+"
+    end
+
     payload[:query] = {
       knn: {
         field.to_sym => {
           vector: vector,
           k: k,
           filter: filter
-        }
+        }.merge(ef_search ? {method_parameters: {ef_search: ef_search}} : {})
       }
     }
   end
 else
   if exact
+    # prevent incorrect distances/results with Elasticsearch 9.0.0-rc1
+    if !below90? && field_options[:distance] == "cosine" && distance != "cosine"
+      raise ArgumentError, "distance must match searchkick options"
+    end
+
     # https://github.com/elastic/elasticsearch/blob/main/docs/reference/vectors/vector-functions.asciidoc
     source =
       case distance
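The OpenSearch branch of this hunk packs two conditions into short expressions, so a standalone sketch may make them easier to read. Everything here is an assumption for illustration: `version_below?`, `exact_boost`, and `approximate_knn` are hypothetical names, `Gem::Version` replaces `Searchkick.server_below?`, and `ArgumentError` replaces the library's own `Error` class.

```ruby
require "rubygems" # Gem::Version (normally loaded by default)

# Hypothetical stand-in for Searchkick.server_below?
def version_below?(server_version, target)
  Gem::Version.new(server_version) < Gem::Version.new(target)
end

# Exact search: the 0.5 boost applies only to cosine on servers older than 2.19.
# `&&` binds tighter than `?:`, so this reads (cosine && old_server) ? 0.5 : 1.0
def exact_boost(distance, server_version)
  distance == "cosine" && version_below?(server_version, "2.19.0") ? 0.5 : 1.0
end

# Approximate search: method_parameters is merged in only when ef_search is given
def approximate_knn(vector:, k:, server_version:, ef_search: nil)
  if ef_search && version_below?(server_version, "2.16.0")
    raise ArgumentError, "ef_search requires OpenSearch 2.16+"
  end

  {vector: vector, k: k}.merge(ef_search ? {method_parameters: {ef_search: ef_search}} : {})
end

p exact_boost("cosine", "2.18.0")
p exact_boost("cosine", "2.19.0")
p approximate_knn(vector: [1, 2, 3], k: 10, server_version: "2.17.0", ef_search: 40)
```

Merging an empty hash when `ef_search` is absent keeps the request body identical to what 5.4.0 sent, which is presumably why the diff uses `.merge(... : {})` rather than always emitting the key.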
@@ -987,7 +997,7 @@ module Searchkick
         query_vector: vector,
         k: k,
         filter: filter
-      }
+      }.merge(ef_search ? {num_candidates: ef_search} : {})
     end
   end
 end
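On the Elasticsearch side, the same `ef_search` option is sent as `num_candidates`. A minimal sketch of that mapping follows, assuming a hypothetical `elasticsearch_knn` helper; the `k = per_page + offset` relationship is taken from the first query.rb hunk, and the `filter` key is omitted for brevity.

```ruby
# Hypothetical helper, not part of Searchkick: builds the knn clause and
# adds num_candidates only when ef_search is provided.
def elasticsearch_knn(field:, vector:, k:, ef_search: nil)
  {
    field: field,
    query_vector: vector,
    k: k
  }.merge(ef_search ? {num_candidates: ef_search} : {})
end

# k covers every result up to the requested page, as in the diff
per_page = 10
offset = 20
clause = elasticsearch_knn(
  field: "embedding",
  vector: [1, 2, 3],
  k: per_page + offset,
  ef_search: 40
)
p clause
```

Because `k` grows with the offset, an `ef_search` that is comfortable for the first page can end up smaller than `k` on deep pages; whether the server accepts that is worth checking against the Elasticsearch kNN documentation.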
@@ -1134,13 +1144,17 @@ module Searchkick
 range_query =
   case op
   when :gt
-
+    # TODO always use gt in Searchkick 6
+    below90? ? {from: op_value, include_lower: false} : {gt: op_value}
   when :gte
-
+    # TODO always use gte in Searchkick 6
+    below90? ? {from: op_value, include_lower: true} : {gte: op_value}
   when :lt
-
+    # TODO always use lt in Searchkick 6
+    below90? ? {to: op_value, include_upper: false} : {lt: op_value}
   when :lte
-
+    # TODO always use lte in Searchkick 6
+    below90? ? {to: op_value, include_upper: true} : {lte: op_value}
   else
     raise ArgumentError, "Unknown where operator: #{op.inspect}"
   end
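The range operators now produce one of two equivalent clause shapes depending on server version. The sketch below isolates that choice in a hypothetical `range_clause` method, with the version check passed in as a plain `below90:` flag instead of calling `below90?`; the hash shapes are copied from the hunk.

```ruby
# Hypothetical helper mirroring the case statement in the hunk:
# legacy from/to/include_* keys below 9.0, gt/gte/lt/lte otherwise.
def range_clause(op, value, below90:)
  case op
  when :gt  then below90 ? {from: value, include_lower: false} : {gt: value}
  when :gte then below90 ? {from: value, include_lower: true} : {gte: value}
  when :lt  then below90 ? {to: value, include_upper: false} : {lt: value}
  when :lte then below90 ? {to: value, include_upper: true} : {lte: value}
  else raise ArgumentError, "Unknown where operator: #{op.inspect}"
  end
end

p range_clause(:gt, 5, below90: true)
p range_clause(:gt, 5, below90: false)
```

The TODO comments in the diff suggest the legacy branch is kept only so that requests to pre-9.0 servers stay byte-for-byte the same within the 5.x series.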
@@ -1301,5 +1315,9 @@ module Searchkick
     def below80?
       Searchkick.server_below?("8.0.0")
     end
+
+    def below90?
+      Searchkick.server_below?("9.0.0")
+    end
   end
 end
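`below90?` delegates to `Searchkick.server_below?`, whose body is not part of this diff. For readers who want to reason about the version gates, here is a minimal sketch of such a check using `Gem::Version`. The name `version_below?` and the decision to strip a suffix like `-rc1` before comparing are assumptions made for this example, not a description of the library's implementation.

```ruby
require "rubygems" # Gem::Version (normally loaded by default)

# Hypothetical version check; the real Searchkick.server_below? may differ.
def version_below?(server_version, target)
  # Drop any suffix such as "-rc1" so that a release candidate is compared
  # as if it were the final release (an assumption of this sketch)
  Gem::Version.new(server_version.split("-").first) < Gem::Version.new(target)
end

p version_below?("8.13.4", "9.0.0")
p version_below?("9.0.0-rc1", "9.0.0")
p version_below?("9.0.0", "9.0.0")
```

Under this assumption a 9.0.0 release candidate is treated as 9.0, which would be consistent with the query.rb comment that mentions Elasticsearch 9.0.0-rc1 alongside a `!below90?` check.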
data/lib/searchkick/railtie.rb
CHANGED
data/lib/searchkick/version.rb
CHANGED
metadata
CHANGED
@@ -1,14 +1,13 @@
 --- !ruby/object:Gem::Specification
 name: searchkick
 version: !ruby/object:Gem::Version
-  version: 5.
+  version: 5.5.0
 platform: ruby
 authors:
 - Andrew Kane
-autorequire:
 bindir: bin
 cert_chain: []
-date:
+date: 2025-04-03 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: activemodel
@@ -16,14 +15,14 @@ dependencies:
     requirements:
     - - ">="
       - !ruby/object:Gem::Version
-        version: '
+        version: '7.1'
   type: :runtime
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
     - - ">="
       - !ruby/object:Gem::Version
-        version: '
+        version: '7.1'
 - !ruby/object:Gem::Dependency
   name: hashie
   requirement: !ruby/object:Gem::Requirement
@@ -38,7 +37,6 @@ dependencies:
     - - ">="
       - !ruby/object:Gem::Version
         version: '0'
-description:
 email: andrew@ankane.org
 executables: []
 extensions: []
@@ -79,7 +77,6 @@ homepage: https://github.com/ankane/searchkick
 licenses:
 - MIT
 metadata: {}
-post_install_message:
 rdoc_options: []
 require_paths:
 - lib
@@ -87,15 +84,14 @@ required_ruby_version: !ruby/object:Gem::Requirement
   requirements:
   - - ">="
     - !ruby/object:Gem::Version
-      version: '3.
+      version: '3.2'
 required_rubygems_version: !ruby/object:Gem::Requirement
   requirements:
   - - ">="
     - !ruby/object:Gem::Version
       version: '0'
 requirements: []
-rubygems_version: 3.
-signing_key:
+rubygems_version: 3.6.2
 specification_version: 4
 summary: Intelligent search made easy with Rails and Elasticsearch or OpenSearch
 test_files: []