searchkick 4.4.2 → 4.5.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: 782d732ce3dca45ba3654f3afa2ee565fc396543179dbdc87a0ee7418a151a3c
-   data.tar.gz: 2752fb40094f068a97e77009634fccf7a1433182294835ad077a66632d43bcb6
+   metadata.gz: 44f4a5255a208b7d24aa1b6d668808d3aff83a9ebb9f07a1a1a83fd9e3845738
+   data.tar.gz: 0ac096c02151a04a00751de140520547570b9c24fa2fccffd9d4761637b85531
  SHA512:
-   metadata.gz: b1125b09caaf20b0dcc8aa7c55755e257338499dda6d09427e139f9167d9e5b2e9971657b830352babe8100c510236412fbaa5a6edf9617c8eefcbe7aff4d2c7
-   data.tar.gz: b111187e5e1d05f19120f850833a491405727811fde6b31465f6021dffdd01b44e55c6e9a19b930246a4a92671ddda62aa65f9fa6fb6793342a5581009961985
+   metadata.gz: ff25ecdd74852fdbca33909d37445adbb42aa76785df75d9f1d9903d1dbee5712507d30a654dfaf829b9b6c0db0148cf48fbb4a257e3decf69e6c316c40b66fc
+   data.tar.gz: e07d8ddd9dbcf661db01948a7f20ad0b91d6427adddbbe987d571011983046650bd21726f74a2d0d11caaf9d49cd83b70a92738dc1cb21346e2c74009a1c494c
data/CHANGELOG.md CHANGED
@@ -1,3 +1,22 @@
+ ## 4.5.1 (2021-08-03)
+
+ - Improved performance of reindex queue
+
+ ## 4.5.0 (2021-06-07)
+
+ - Added experimental support for OpenSearch
+ - Added support for synonyms in Japanese
+
+ ## 4.4.4 (2021-03-12)
+
+ - Fixed `too_long_frame_exception` with `scroll` method
+ - Fixed multi-word emoji tokenization
+
+ ## 4.4.3 (2021-02-25)
+
+ - Added support for Hunspell
+ - Fixed warning about accessing system indices
+
  ## 4.4.2 (2020-11-23)
 
  - Added `missing_records` method to results
data/README.md CHANGED
@@ -20,7 +20,7 @@ Plus:
  - autocomplete
  - “Did you mean” suggestions
  - supports many languages
- - works with ActiveRecord, Mongoid, and NoBrainer
+ - works with Active Record, Mongoid, and NoBrainer
 
  Check out [Searchjoy](https://github.com/ankane/searchjoy) for analytics and [Autosuggest](https://github.com/ankane/autosuggest) for query suggestions
 
@@ -45,11 +45,11 @@ Check out [Searchjoy](https://github.com/ankane/searchjoy) for analytics and [Au
 
  ## Getting Started
 
- [Install Elasticsearch](https://www.elastic.co/downloads/elasticsearch). For Homebrew, use:
+ Install [Elasticsearch](https://www.elastic.co/downloads/elasticsearch) or [OpenSearch](https://opensearch.org/downloads.html) (OpenSearch support is experimental). For Homebrew, use:
 
  ```sh
- brew install elasticsearch
- brew services start elasticsearch
+ brew install elasticsearch # or opensearch
+ brew services start elasticsearch # or opensearch
  ```
 
  Add this line to your application’s Gemfile:
@@ -58,7 +58,7 @@ Add this line to your application’s Gemfile:
  gem 'searchkick'
  ```
 
- The latest version works with Elasticsearch 6 and 7. For Elasticsearch 5, use version 3.1.3 and [this readme](https://github.com/ankane/searchkick/blob/v3.1.3/README.md).
+ The latest version works with Elasticsearch 6 and 7 and OpenSearch 1. For Elasticsearch 5, use version 3.1.3 and [this readme](https://github.com/ankane/searchkick/blob/v3.1.3/README.md).
 
  Add searchkick to models you want to search.
 
@@ -176,7 +176,7 @@ Get the full response from Elasticsearch
  results.response
  ```
 
- **Note:** By default, Elasticsearch [limits paging](#deep-paging-master) to the first 10,000 results for performance. With Elasticsearch 7, this applies to the total count as well.
+ **Note:** By default, Elasticsearch [limits paging](#deep-paging) to the first 10,000 results for performance. With Elasticsearch 7, this applies to the total count as well.
 
  ### Boosting
 
@@ -209,7 +209,7 @@ boost_by_recency: {created_at: {scale: "7d", decay: 0.5}}
 
  You can also boost by:
 
- - [Conversions](#keep-getting-better)
+ - [Conversions](#intelligent-search)
  - [Distance](#boost-by-distance)
 
  ### Get Everything
@@ -311,7 +311,7 @@ class Product < ApplicationRecord
  end
  ```
 
- See the [list of stemmers](https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-stemmer-tokenfilter.html). A few languages require plugins:
+ See the [list of languages](https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-stemmer-tokenfilter.html#analysis-stemmer-tokenfilter-configure-parms). A few languages require plugins:
 
  - `chinese` - [analysis-ik plugin](https://github.com/medcl/elasticsearch-analysis-ik)
  - `chinese2` - [analysis-smartcn plugin](https://www.elastic.co/guide/en/elasticsearch/plugins/7.4/analysis-smartcn.html)
@@ -322,6 +322,14 @@ See the [list of stemmers](https://www.elastic.co/guide/en/elasticsearch/referen
  - `ukrainian` - [analysis-ukrainian plugin](https://www.elastic.co/guide/en/elasticsearch/plugins/7.4/analysis-ukrainian.html)
  - `vietnamese` - [analysis-vietnamese plugin](https://github.com/duydo/elasticsearch-analysis-vietnamese)
 
+ You can also use a Hunspell dictionary for stemming.
+
+ ```ruby
+ class Product < ApplicationRecord
+   searchkick stemmer: {type: "hunspell", locale: "en_US"}
+ end
+ ```
+
  Disable stemming with:
 
  ```ruby
@@ -641,7 +649,7 @@ class Product < ApplicationRecord
    def search_data
      {
        name: name,
-       conversions: searches.group(:query).uniq.count(:user_id)
+       conversions: searches.group(:query).distinct.count(:user_id)
        # {"ice cream" => 234, "chocolate" => 67, "cream" => 2}
      }
    end
@@ -1204,12 +1212,18 @@ FactoryBot.create(:product, :some_trait, :reindex, some_attribute: "foo")
 
  ### GitHub Actions
 
- Check out [setup-elasticsearch](https://github.com/ankane/setup-elasticsearch) for an easy way to install Elasticsearch.
+ Check out [setup-elasticsearch](https://github.com/ankane/setup-elasticsearch) for an easy way to install Elasticsearch:
 
  ```yml
  - uses: ankane/setup-elasticsearch@v1
  ```
 
+ And [setup-opensearch](https://github.com/ankane/setup-opensearch) for an easy way to install OpenSearch:
+
+ ```yml
+ - uses: ankane/setup-opensearch@v1
+ ```
+
  ## Deployment
 
  Searchkick uses `ENV["ELASTICSEARCH_URL"]` for the Elasticsearch server. This defaults to `http://localhost:9200`.
@@ -1217,7 +1231,7 @@ Searchkick uses `ENV["ELASTICSEARCH_URL"]` for the Elasticsearch server. This de
  - [Elastic Cloud](#elastic-cloud)
  - [Heroku](#heroku)
  - [Amazon Elasticsearch Service](#amazon-elasticsearch-service)
- - [Self-Hosted and Other](#other)
+ - [Self-Hosted and Other](#self-hosted-and-other)
 
  ### Elastic Cloud
 
@@ -1469,7 +1483,7 @@ Product.search_index.promote(index_name, update_refresh_interval: true)
 
  ### Queuing
 
- Push ids of records needing reindexed to a queue and reindex in bulk for better performance. First, set up Redis in an initializer. We recommend using [connection_pool](https://github.com/mperham/connection_pool).
+ Push ids of records needing reindexing to a queue and reindex in bulk for better performance. First, set up Redis in an initializer. We recommend using [connection_pool](https://github.com/mperham/connection_pool).
 
  ```ruby
  Searchkick.redis = ConnectionPool.new { Redis.new }
@@ -1572,14 +1586,14 @@ class ReindexConversionsJob < ApplicationJob
      # get records that have a recent conversion
      recently_converted_ids =
        Searchjoy::Search.where("convertable_type = ? AND converted_at > ?", class_name, 1.day.ago)
-       .order(:convertable_id).uniq.pluck(:convertable_id)
+       .order(:convertable_id).distinct.pluck(:convertable_id)
 
      # split into groups
      recently_converted_ids.in_groups_of(1000, false) do |ids|
        # fetch conversions
        conversions =
          Searchjoy::Search.where(convertable_id: ids, convertable_type: class_name)
-         .group(:convertable_id, :query).uniq.count(:user_id)
+         .group(:convertable_id, :query).distinct.count(:user_id)
 
        # group conversions by record
        conversions_by_record = {}
@@ -1831,7 +1845,7 @@ class Product < ApplicationRecord
    def search_data
      {
        name: name,
-       unique_user_conversions: searches.group(:query).uniq.count(:user_id),
+       unique_user_conversions: searches.group(:query).distinct.count(:user_id),
        # {"ice cream" => 234, "chocolate" => 67, "cream" => 2}
        total_conversions: searches.group(:query).count
        # {"ice cream" => 412, "chocolate" => 117, "cream" => 6}
data/lib/searchkick.rb CHANGED
@@ -74,11 +74,24 @@ module Searchkick
      (defined?(@search_timeout) && @search_timeout) || timeout
    end
 
+   # private
+   def self.server_info
+     @server_info ||= client.info
+   end
+
    def self.server_version
-     @server_version ||= client.info["version"]["number"]
+     @server_version ||= server_info["version"]["number"]
+   end
+
+   def self.opensearch?
+     unless defined?(@opensearch)
+       @opensearch = server_info["version"]["distribution"] == "opensearch"
+     end
+     @opensearch
    end
 
    def self.server_below?(version)
+     server_version = opensearch? ? "7.10.2" : self.server_version
      Gem::Version.new(server_version.split("-")[0]) < Gem::Version.new(version.split("-")[0])
    end
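
The version check above can be read as follows: when the server reports the `opensearch` distribution, its own version number is ignored and `7.10.2` is used for comparisons. Below is a minimal standalone sketch of that logic. The method signature with `server_version:` and `opensearch:` keywords and the constant name are assumptions for illustration; the gem itself reads these values from the cached `client.info` response.

```ruby
require "rubygems" # provides Gem::Version (loaded by default in most setups)

# Version substituted for OpenSearch servers, taken from the hunk above
OPENSEARCH_COMPAT_VERSION = "7.10.2"

# Hypothetical standalone variant of server_below?; the keyword arguments
# replace the cached server info used by the gem
def server_below?(target, server_version:, opensearch: false)
  effective = opensearch ? OPENSEARCH_COMPAT_VERSION : server_version
  # Drop any suffix after "-" (for example "-SNAPSHOT") before comparing
  Gem::Version.new(effective.split("-")[0]) < Gem::Version.new(target.split("-")[0])
end

puts server_below?("7.3.0", server_version: "7.2.1")                   # true
puts server_below?("7.3.0", server_version: "1.0.0", opensearch: true) # false
```

`Gem::Version` compares segments numerically, so `7.10.2` sorts above `7.3.0`, which a plain string comparison would get wrong.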
 
@@ -105,7 +105,7 @@ module Searchkick
        indices =
          begin
            if client.indices.respond_to?(:get_alias)
-             client.indices.get_alias
+             client.indices.get_alias(index: "#{name}*")
            else
              client.indices.get_aliases
            end
@@ -161,6 +161,7 @@ module Searchkick
      RecordData.new(self, record).document_type
    end
 
+   # TODO use like: [{_index: ..., _id: ...}] in Searchkick 5
    def similar_record(record, **options)
      like_text = retrieve(record).to_hash
        .keep_if { |k, _| !options[:fields] || options[:fields].map(&:to_s).include?(k) }
@@ -153,6 +153,7 @@ module Searchkick
        }
      }
 
+     raise ArgumentError, "Can't pass both language and stemmer" if options[:stemmer] && language
      update_language(settings, language)
      update_stemming(settings)
 
@@ -234,6 +235,27 @@ module Searchkick
            type: "kuromoji"
          }
        )
+     when "japanese2"
+       analyzer = {
+         type: "custom",
+         tokenizer: "kuromoji_tokenizer",
+         filter: [
+           "kuromoji_baseform",
+           "kuromoji_part_of_speech",
+           "cjk_width",
+           "ja_stop",
+           "searchkick_stemmer",
+           "lowercase"
+         ]
+       }
+       settings[:analysis][:analyzer].merge!(
+         default_analyzer => analyzer.deep_dup,
+         searchkick_search: analyzer.deep_dup,
+         searchkick_search2: analyzer.deep_dup
+       )
+       settings[:analysis][:filter][:searchkick_stemmer] = {
+         type: "kuromoji_stemmer"
+       }
      when "korean"
        settings[:analysis][:analyzer].merge!(
          default_analyzer => {
@@ -286,6 +308,18 @@ module Searchkick
    end
 
    def update_stemming(settings)
+     if options[:stemmer]
+       stemmer = options[:stemmer]
+       # could also support snowball and stemmer
+       case stemmer[:type]
+       when "hunspell"
+         # supports all token filter options
+         settings[:analysis][:filter][:searchkick_stemmer] = stemmer
+       else
+         raise ArgumentError, "Unknown stemmer: #{stemmer[:type]}"
+       end
+     end
+
      stem = options[:stem]
 
      # language analyzer used
@@ -499,8 +533,18 @@ module Searchkick
        end
        settings[:analysis][:filter][:searchkick_synonym_graph] = synonym_graph
 
-       [:searchkick_search2, :searchkick_word_search].each do |analyzer|
-         settings[:analysis][:analyzer][analyzer][:filter].insert(2, "searchkick_synonym_graph")
+       if options[:language] == "japanese2"
+         [:searchkick_search, :searchkick_search2].each do |analyzer|
+           settings[:analysis][:analyzer][analyzer][:filter].insert(4, "searchkick_synonym_graph")
+         end
+       else
+         [:searchkick_search2, :searchkick_word_search].each do |analyzer|
+           unless settings[:analysis][:analyzer][analyzer].key?(:filter)
+             raise Searchkick::Error, "Search synonyms are not supported yet for language"
+           end
+
+           settings[:analysis][:analyzer][analyzer][:filter].insert(2, "searchkick_synonym_graph")
+         end
        end
      end
    end
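
To see where the synonym filter ends up for `japanese2`, the sketch below applies the same `insert(4, ...)` call to the filter list that the `japanese2` analyzer defines earlier in this diff. The helper name `add_synonym_filter` and the use of `ArgumentError` are assumptions for illustration; the gem mutates its settings hash in place and raises `Searchkick::Error`.

```ruby
# Filter chain copied from the japanese2 analyzer in this diff
JAPANESE2_FILTERS = [
  "kuromoji_baseform",
  "kuromoji_part_of_speech",
  "cjk_width",
  "ja_stop",
  "searchkick_stemmer",
  "lowercase"
].freeze

# Hypothetical helper: returns a copy of the analyzer with the synonym
# filter inserted at the given position in its filter chain
def add_synonym_filter(analyzer, position)
  # Built-in analyzers have no :filter key, so there is nowhere to insert
  raise ArgumentError, "analyzer has no filter list" unless analyzer.key?(:filter)

  filters = analyzer[:filter].dup.insert(position, "searchkick_synonym_graph")
  analyzer.merge(filter: filters)
end

custom = { type: "custom", tokenizer: "kuromoji_tokenizer", filter: JAPANESE2_FILTERS }
puts add_synonym_filter(custom, 4)[:filter].inspect
```

With position 4 the synonym filter lands after `ja_stop` and before `searchkick_stemmer`, so in this sketch synonyms are expanded before stemming runs.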
@@ -6,7 +6,7 @@ module Searchkick
      unknown_keywords = options.keys - [:_all, :_type, :batch_size, :callbacks, :case_sensitive, :conversions, :deep_paging, :default_fields,
        :filterable, :geo_shape, :highlight, :ignore_above, :index_name, :index_prefix, :inheritance, :language,
        :locations, :mappings, :match, :merge_mappings, :routing, :searchable, :search_synonyms, :settings, :similarity,
-       :special_characters, :stem, :stem_conversions, :stem_exclusion, :stemmer_override, :suggest, :synonyms, :text_end,
+       :special_characters, :stem, :stemmer, :stem_conversions, :stem_exclusion, :stemmer_override, :suggest, :synonyms, :text_end,
        :text_middle, :text_start, :word, :wordnet, :word_end, :word_middle, :word_start]
      raise ArgumentError, "unknown keywords: #{unknown_keywords.join(", ")}" if unknown_keywords.any?
 
@@ -11,7 +11,7 @@ module Searchkick
      if record_ids.any?
        batch_options = {
          class_name: class_name,
-         record_ids: record_ids,
+         record_ids: record_ids.uniq,
          index_name: index_name
        }
 
@@ -25,7 +25,7 @@ module Searchkick
      term = term.to_s
 
      if options[:emoji]
-       term = EmojiParser.parse_unicode(term) { |e| " #{e.name} " }.strip
+       term = EmojiParser.parse_unicode(term) { |e| " #{e.name.tr('_', ' ')} " }.strip
      end
 
      @klass = klass
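
The effect of the added `tr('_', ' ')` is easiest to see on a multi-word emoji name. The sketch below is an illustration only: the `EMOJI_NAMES` table and the `expand_emoji` helper are hypothetical stand-ins for the emoji parser the gem calls, and the names in the table are assumed examples.

```ruby
# Hypothetical lookup table standing in for the emoji library's names
EMOJI_NAMES = { "🍨" => "ice_cream", "🍩" => "doughnut" }.freeze
EMOJI_PATTERN = Regexp.union(EMOJI_NAMES.keys)

# Replace each known emoji with its name, turning underscores into spaces
# so that a multi-word name becomes several separate search tokens
def expand_emoji(term)
  term.gsub(EMOJI_PATTERN) { |e| " #{EMOJI_NAMES[e].tr('_', ' ')} " }.strip
end

puts expand_emoji("🍨 shop").split.inspect
```

Without the `tr` call the term would contain the single token `ice_cream`, which would not match documents indexed with the words "ice cream".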
@@ -353,8 +353,8 @@ module Searchkick
        shared_options[:cutoff_frequency] = 0.001 unless operator.to_s == "and" || field_misspellings == false || (!below73? && !track_total_hits?)
        qs << shared_options.merge(analyzer: "searchkick_search")
 
-       # searchkick_search and searchkick_search2 are the same for ukrainian
-       unless %w(japanese korean polish ukrainian vietnamese).include?(searchkick_options[:language])
+       # searchkick_search and searchkick_search2 are the same for some languages
+       unless %w(japanese japanese2 korean polish ukrainian vietnamese).include?(searchkick_options[:language])
          qs << shared_options.merge(analyzer: "searchkick_search2")
        end
        exclude_analyzer = "searchkick_search2"
@@ -864,10 +864,11 @@ module Searchkick
      }
    end
 
-   # TODO id transformation for arrays
    def set_order(payload)
      order = options[:order].is_a?(Enumerable) ? options[:order] : {options[:order] => :asc}
      id_field = :_id
+     # TODO no longer map id to _id in Searchkick 5
+     # since sorting on _id is deprecated in Elasticsearch
      payload[:sort] = order.is_a?(Array) ? order : Hash[order.map { |k, v| [k.to_s == "id" ? id_field : k, v] }]
    end
 
@@ -14,11 +14,17 @@ module Searchkick
 
    # TODO use reliable queuing
    def reserve(limit: 1000)
-     record_ids = Set.new
-     while record_ids.size < limit && (record_id = Searchkick.with_redis { |r| r.rpop(redis_key) })
-       record_ids << record_id
+     if supports_rpop_with_count?
+       Searchkick.with_redis { |r| r.call("rpop", redis_key, limit) }
+     else
+       record_ids = []
+       Searchkick.with_redis do |r|
+         while record_ids.size < limit && (record_id = r.rpop(redis_key))
+           record_ids << record_id
+         end
+       end
+       record_ids
      end
-     record_ids.to_a
    end
 
    def clear
@@ -34,5 +40,13 @@ module Searchkick
    def redis_key
      "searchkick:reindex_queue:#{name}"
    end
+
+   def supports_rpop_with_count?
+     redis_version >= Gem::Version.new("6.2")
+   end
+
+   def redis_version
+     @redis_version ||= Searchkick.with_redis { |r| Gem::Version.new(r.info["redis_version"]) }
+   end
  end
  end
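
The two `reserve` branches above differ only in how many round trips they need: Redis 6.2 added a count argument to `RPOP`, so one call can return a whole batch, while older servers are polled one id at a time. The sketch below mirrors that branching without a Redis server. `FakeQueue` is a hypothetical in-memory stand-in, and its `rpop` only approximates the ordering of the real command.

```ruby
require "rubygems" # provides Gem::Version (loaded by default in most setups)

# Hypothetical in-memory stand-in for a Redis list, for illustration only
class FakeQueue
  attr_reader :version

  def initialize(items, version)
    @items = items.dup
    @version = Gem::Version.new(version)
  end

  # Approximates RPOP: removes one element from the tail, or up to `count`
  # elements (tail first) when a count is given
  def rpop(count = nil)
    count ? @items.pop(count).reverse : @items.pop
  end
end

# Mirrors the branching in reserve: one batched pop when the server
# supports a count, otherwise a loop of single pops
def reserve(queue, limit: 1000)
  if queue.version >= Gem::Version.new("6.2")
    queue.rpop(limit)
  else
    ids = []
    while ids.size < limit && (id = queue.rpop)
      ids << id
    end
    ids
  end
end

puts reserve(FakeQueue.new(%w[1 2 3], "6.2.0"), limit: 2).inspect
puts reserve(FakeQueue.new(%w[1 2 3], "5.0.0"), limit: 2).inspect
```

Both paths return a plain array rather than the `Set` used before, which fits with the `record_ids.uniq` call added to the batch options elsewhere in this diff.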
@@ -188,14 +188,9 @@ module Searchkick
 
        records.clear_scroll
      else
-       params = {
-         scroll: options[:scroll],
-         scroll_id: scroll_id
-       }
-
        begin
          # TODO Active Support notifications for this scroll call
-         Searchkick::Results.new(@klass, Searchkick.client.scroll(params), @options)
+         Searchkick::Results.new(@klass, Searchkick.client.scroll(scroll: options[:scroll], body: {scroll_id: scroll_id}), @options)
        rescue Elasticsearch::Transport::Transport::Errors::NotFound => e
          if e.class.to_s =~ /NotFound/ && e.message =~ /search_context_missing_exception/i
            raise Searchkick::Error, "Scroll id has expired"
@@ -236,7 +231,7 @@ module Searchkick
          index_alias = index.split("_")[0..-2].join("_")
          Array((options[:index_mapping] || {})[index_alias])
        end
-     raise Searchkick::Error, "Unknown model for index: #{index}" unless models.any?
+     raise Searchkick::Error, "Unknown model for index: #{index}. Pass the `models` option to the search method." unless models.any?
      index_models[index] = models
    end
 
@@ -1,3 +1,3 @@
  module Searchkick
-   VERSION = "4.4.2"
+   VERSION = "4.5.1"
  end
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: searchkick
  version: !ruby/object:Gem::Version
-   version: 4.4.2
+   version: 4.5.1
  platform: ruby
  authors:
  - Andrew Kane
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2020-11-24 00:00:00.000000000 Z
+ date: 2021-08-03 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
    name: activemodel
@@ -102,7 +102,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
    - !ruby/object:Gem::Version
      version: '0'
  requirements: []
- rubygems_version: 3.1.4
+ rubygems_version: 3.2.22
  signing_key:
  specification_version: 4
  summary: Intelligent search made easy with Rails and Elasticsearch