autosuggest 0.1.3 → 0.2.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: 3f04e20f21653bcc9a8941c6da1eddae7726cf775abea699b78022b64fec9e48
-   data.tar.gz: 7a5c362428558d17808310245d7cad1bdf693800419665d650ab9026e5aa8628
+   metadata.gz: fe418b5cfaa006d454a8e061ae08b63821de147677175054be5e28ad254399c6
+   data.tar.gz: 10c28b44de11f53fccd66a3e6f547a7f97282e23b529452a1db827b3d2dd0624
  SHA512:
-   metadata.gz: 62c38fb482638185c18dde9b0b4db1c9aa4deaa6e8f0fb4341220f61c28b218c51f79376b1080fb954982f0aa2b9a979b20b4212bfb704f73fe2e92be8b4373e
-   data.tar.gz: 84383dd6de2c654aefabe546f3c01335f906d5cfb517fd1890b26376ed9f482651b6e7e8d1b5082d120d228603b7b0002227b4293d2668c5de51a464aa831fa5
+   metadata.gz: 236b65f1939693fd076445ebddff62535d1d1197e44561ade442e2c381f2d3c4bd06572daacc29718dbfb087abc1cf3c001d0152f21c9198d059d8c1d933985c
+   data.tar.gz: 1ca979179a176b6a7eca3ee8b9593e6479680ea4706e8ad1c004d8c88efe0a1f1a82b779dee050cb1aa3c6af749323a87fc9f93c9c0c4d8eeb530a9e6a9f11cd
data/CHANGELOG.md CHANGED
@@ -1,3 +1,11 @@
+ ## 0.2.0 (2023-01-29)
+
+ - Added `language` option
+ - Changed `suggestions` method to filter by default
+ - Changed `filter: true` to only return query and score
+ - Removed `blacklist_words` method
+ - Dropped support for Ruby < 2.7
+
  ## 0.1.3 (2021-11-23)

  - Added model generator
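
Note on the two `suggestions` entries above: the change in default and the narrower return value can be illustrated without the gem. The following is a minimal sketch, assuming result hashes shaped like the ones the generator builds; the sample data and the `filter_suggestions` helper are hypothetical and not part of the gem's API.

```ruby
# Hypothetical sample of unfiltered results; the keys mirror the hash
# built in Autosuggest::Generator#suggestions, the values are made up.
RESULTS = [
  {query: "apples", score: 40, duplicate: nil, misspelling: false, profane: false, blocked: false, notes: []},
  {query: "apple", score: 25, duplicate: "apples", misspelling: false, profane: false, blocked: false, notes: ["duplicate of apples"]},
  {query: "boom box", score: 10, duplicate: nil, misspelling: false, profane: false, blocked: true, notes: ["blocked"]}
].freeze

# Stand-in for the filtering step (an illustration, not the gem's method).
# Enumerable#filter_map requires Ruby 2.7 or later.
def filter_suggestions(results, filter: true)
  return results unless filter

  results.filter_map do |s|
    # Skip anything flagged; `next` yields nil, which filter_map drops.
    next if s[:duplicate] || s[:misspelling] || s[:profane] || s[:blocked]

    # Keep only the two fields that 0.2.0 returns when filtering.
    s.slice(:query, :score)
  end
end

p filter_suggestions(RESULTS)
p filter_suggestions(RESULTS, filter: false).size
```

Code that relied on the extra fields (`notes`, `concepts`, and so on) would need to pass `filter: false` after upgrading. `Enumerable#filter_map` was introduced in Ruby 2.7, which is at least consistent with the raised minimum Ruby version listed in the same release.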
data/LICENSE.txt CHANGED
@@ -1,4 +1,4 @@
- Copyright (c) 2015-2021 Andrew Kane
+ Copyright (c) 2015-2023 Andrew Kane

  MIT License

data/README.md CHANGED
@@ -4,6 +4,8 @@ Generate autocomplete suggestions based on what your users search

  :tangerine: Battle-tested at [Instacart](https://www.instacart.com/opensource)

+ Autosuggest 0.2 was recently released! See [how to upgrade](#upgrading)
+
  [![Build Status](https://github.com/ankane/autosuggest/workflows/build/badge.svg?branch=master)](https://github.com/ankane/autosuggest/actions)

  ## Installation
@@ -11,7 +13,7 @@ Generate autocomplete suggestions based on what your users search
  Add this line to your application’s Gemfile:

  ```ruby
- gem 'autosuggest'
+ gem "autosuggest"
  ```

  ## Getting Started
@@ -38,14 +40,20 @@ top_queries = Searchjoy::Search.group(:normalized_query)
  Then pass them to Autosuggest.

  ```ruby
- autosuggest = Autosuggest.new(top_queries)
+ autosuggest = Autosuggest::Generator.new(top_queries)
  ```

  #### Filter duplicates

  [Stemming](https://en.wikipedia.org/wiki/Stemming) is used to detect duplicates like `apple` and `apples`.

- The most popular query is preferred by default. To override this, use:
+ Specify the stemming language (defaults to `english`) with:
+
+ ```ruby
+ autosuggest = Autosuggest::Generator.new(top_queries, language: "spanish")
+ ```
+
+ The most popular query is preferred by default. To override this, use:

  ```ruby
  autosuggest.prefer ["apples"]
@@ -90,7 +98,7 @@ autosuggest.block_words ["boom"]
  Generate suggestions with:

  ```ruby
- suggestions = autosuggest.suggestions(filter: true)
+ suggestions = autosuggest.suggestions
  ```

  #### Save suggestions
@@ -152,18 +160,18 @@ end
  You may want to have someone manually approve suggestions:

  ```ruby
- Autosuggest::Suggestion.where(approved: true)
+ Autosuggest::Suggestion.where(status: "approved")
  ```

  Or filter suggestions without results:

  ```ruby
  Autosuggest::Suggestion.find_each do |suggestion|
-   suggestion.has_results = Product.search(suggestion.query, load: false, limit: 1).any?
+   suggestion.results_count = Product.search(suggestion.query, load: false).count
    suggestion.save! if suggestion.changed?
  end

- Autosuggest::Suggestion.where(has_results: true)
+ Autosuggest::Suggestion.where("results_count > 0")
  ```

  You can add additional fields to your model/data store to accomplish this.
@@ -176,14 +184,14 @@ top_queries = Searchjoy::Search.group(:normalized_query)
  product_names = Product.pluck(:name)
  brand_names = Brand.pluck(:name)

- autosuggest = Autosuggest.new(top_queries)
+ autosuggest = Autosuggest::Generator.new(top_queries)
  autosuggest.parse_words product_names
  autosuggest.add_concept "brand", brand_names
  autosuggest.prefer brand_names
  autosuggest.not_duplicates [["straws", "straus"]]
  autosuggest.block_words ["boom"]

- suggestions = autosuggest.suggestions(filter: true)
+ suggestions = autosuggest.suggestions

  now = Time.now
  records = suggestions.map { |s| s.slice(:query, :score).merge(updated_at: now) }
@@ -193,6 +201,16 @@ Autosuggest::Suggestion.transaction do
  end
  ```

+ ## Upgrading
+
+ ### 0.2.0
+
+ Suggestions are now filtered by default, and only the query and score are returned. To get all queries and fields, use:
+
+ ```ruby
+ autosuggest.suggestions(filter: false)
+ ```
+
  ## History

  View the [changelog](https://github.com/ankane/autosuggest/blob/master/CHANGELOG.md)
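
Note on the README's "Filter duplicates" section: duplicate detection works on a normalized key (tokens are stemmed, sorted, and joined), so word order and simple plurals collapse to the same key. The sketch below shows that idea in isolation. It assumes a toy stemmer that only strips a trailing "s"; the gem itself delegates stemming to `Lingua::Stemmer` with the configured `language`, and the `dedupe` helper is hypothetical.

```ruby
# Toy stemmer used only for this illustration: it strips a trailing "s".
# The gem delegates to Lingua::Stemmer (ruby-stemmer) instead.
TOY_STEMMER = ->(word) { word.sub(/s\z/, "") }

# Lowercase and split on whitespace, like Generator#tokenize.
def tokenize(str)
  str.to_s.downcase.split(" ")
end

# Same shape as Generator#normalize_query: replace "&" with "and", stem
# each token, then sort and join so that word order does not matter.
def normalize_query(query, stemmer: TOY_STEMMER)
  tokenize(query.to_s.gsub("&", "and")).map { |t| stemmer.call(t) }.sort.join
end

# Hypothetical helper: keep the most popular query for each normalized key.
def dedupe(top_queries)
  seen = {}
  top_queries.sort_by { |_query, count| -count }.each do |query, _count|
    seen[normalize_query(query)] ||= query
  end
  seen.values
end

top_queries = {"apples" => 40, "apple" => 25, "Green Apples" => 12, "apple green" => 3}
p dedupe(top_queries)
```

The real generator does more than this: it honors preferred queries, checks single-edit misspellings, and marks duplicates in the result rather than silently dropping them when `filter: false` is used.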
data/lib/autosuggest/generator.rb ADDED
@@ -0,0 +1,226 @@
+ module Autosuggest
+   class Generator
+     def initialize(top_queries, language: "english")
+       @top_queries = top_queries
+       @concepts = {}
+       @words = Set.new
+       @non_duplicates = Set.new
+       @blocked_words = {}
+       @preferred_queries = {}
+       @profane_words = {}
+       @concept_tree = {}
+       begin
+         @stemmer = Lingua::Stemmer.new(language: language)
+       rescue Lingua::StemmerError
+         raise ArgumentError, "Language not available"
+       end
+       # TODO take language into account for profanity
+       add_nodes(@profane_words, Obscenity::Base.blacklist)
+     end
+
+     def add_concept(name, values)
+       values = values.compact.uniq
+       add_nodes(@concept_tree, values)
+       @concepts[name] = Set.new(values.map(&:downcase))
+     end
+
+     def parse_words(phrases, options = {})
+       min = options[:min] || 1
+
+       word_counts = Hash.new(0)
+       phrases.each do |phrase|
+         words = tokenize(phrase)
+         words.each do |word|
+           word_counts[word] += 1
+         end
+       end
+
+       word_counts.select { |_, c| c >= min }.each do |word, _|
+         @words << word
+       end
+
+       word_counts
+     end
+
+     def not_duplicates(pairs)
+       pairs.each do |pair|
+         @non_duplicates << pair.map(&:downcase).sort
+       end
+     end
+
+     def block_words(words)
+       add_nodes(@blocked_words, words)
+       words
+     end
+
+     def prefer(queries)
+       queries.each do |query|
+         @preferred_queries[normalize_query(query)] ||= query
+       end
+     end
+
+     def suggestions(filter: true)
+       stemmed_queries = {}
+       added_queries = Set.new
+       results = @top_queries.sort_by { |_query, count| -count }.map do |query, count|
+         query = query.to_s
+
+         # TODO do not ignore silently
+         next if query.length < 2
+
+         stemmed_query = normalize_query(query)
+
+         # get preferred term
+         preferred_query = @preferred_queries[stemmed_query]
+         if preferred_query && preferred_query != query
+           original_query, query = query, preferred_query
+         end
+
+         # exclude duplicates
+         duplicate = stemmed_queries[stemmed_query]
+         stemmed_queries[stemmed_query] ||= query
+
+         # also detect possibly misspelled duplicates
+         # TODO use top query as duplicate
+         if !duplicate && query.length > 4
+           edits(query).each do |edited_query|
+             if added_queries.include?(edited_query)
+               duplicate = edited_query
+               break
+             end
+           end
+         end
+         if duplicate && @non_duplicates.include?([duplicate, query].sort)
+           duplicate = nil
+         end
+         added_queries << query unless duplicate
+
+         # find concepts
+         concepts = []
+         @concepts.each do |name, values|
+           concepts << name if values.include?(query)
+         end
+
+         tokens = tokenize(query)
+
+         # exclude misspellings that are not brands
+         misspelling = @words.any? && misspellings?(tokens)
+
+         profane = blocked?(tokens, @profane_words)
+         blocked = blocked?(tokens, @blocked_words)
+
+         notes = []
+         notes << "duplicate of #{duplicate}" if duplicate
+         notes.concat(concepts)
+         notes << "misspelling" if misspelling
+         notes << "profane" if profane
+         notes << "blocked" if blocked
+         notes << "originally #{original_query}" if original_query
+
+         {
+           query: query,
+           original_query: original_query,
+           score: count,
+           duplicate: duplicate,
+           concepts: concepts,
+           misspelling: misspelling,
+           profane: profane,
+           blocked: blocked,
+           notes: notes
+         }
+       end
+
+       results.compact!
+
+       if filter
+         results.filter_map do |s|
+           unless s[:duplicate] || s[:misspelling] || s[:profane] || s[:blocked]
+             s.slice(:query, :score)
+           end
+         end
+       else
+         results
+       end
+     end
+
+     def table
+       str = "%-30s %5s %s\n" % %w(Query Score Notes)
+       suggestions(filter: false).each do |suggestion|
+         str << "%-30s %5d %s\n" % [suggestion[:query], suggestion[:score], suggestion[:notes].join(", ")]
+       end
+       str
+     end
+     alias_method :pretty_suggestions, :table
+
+     protected
+
+     def misspellings?(tokens)
+       pos = [0]
+       while i = pos.shift
+         return false if i == tokens.size
+
+         if @words.include?(tokens[i])
+           pos << i + 1
+         end
+
+         node = @concept_tree[tokens[i]]
+         j = i
+         while node
+           j += 1
+           pos << j if node[:eos]
+           break if j == tokens.size
+           node = node[tokens[j]]
+         end
+
+         pos.uniq!
+       end
+       true
+     end
+
+     def blocked?(tokens, blocked_words)
+       tokens.each_with_index do |token, i|
+         node = blocked_words[token]
+         j = i
+         while node
+           return true if node[:eos]
+           j += 1
+           break if j == tokens.size
+           node = node[tokens[j]]
+         end
+       end
+       false
+     end
+
+     def tokenize(str)
+       str.to_s.downcase.split(" ")
+     end
+
+     # from https://blog.lojic.com/2008/09/04/how-to-write-a-spelling-corrector-in-ruby/
+     LETTERS = ("a".."z").to_a.join + "'"
+     def edits(word)
+       n = word.length
+       deletion = (0...n).collect { |i| word[0...i] + word[i + 1..-1] }
+       transposition = (0...n - 1).collect { |i| word[0...i] + word[i + 1, 1] + word[i, 1] + word[i + 2..-1] }
+       alteration = []
+       n.times { |i| LETTERS.each_byte { |l| alteration << word[0...i] + l.chr + word[i + 1..-1] } }
+       insertion = []
+       (n + 1).times { |i| LETTERS.each_byte { |l| insertion << word[0...i] + l.chr + word[i..-1] } }
+       deletion + transposition + alteration + insertion
+     end
+
+     def normalize_query(query)
+       tokenize(query.to_s.gsub("&", "and")).map { |q| @stemmer.stem(q) }.sort.join
+     end
+
+     def add_nodes(var, words)
+       words.each do |word|
+         node = var
+         tokenize(word).each do |token|
+           node = (node[token] ||= {})
+         end
+         node[:eos] = true
+       end
+       var
+     end
+   end
+ end
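
Note on `add_nodes` and `blocked?` in the file above: blocked, profane, and concept phrases are stored in a nested hash keyed by token, with `:eos` marking the end of a complete phrase. A standalone sketch of that structure follows. The method bodies are adapted from the diff so they run outside the class, and the sample phrases are made up.

```ruby
# Lowercase and split on whitespace, like Generator#tokenize.
def tokenize(str)
  str.to_s.downcase.split(" ")
end

# Insert each phrase into a nested hash keyed by token; :eos marks the
# end of a complete phrase.
def add_nodes(tree, phrases)
  phrases.each do |phrase|
    node = tree
    tokenize(phrase).each { |token| node = (node[token] ||= {}) }
    node[:eos] = true
  end
  tree
end

# True if any stored phrase occurs as a contiguous run of whole tokens.
def blocked?(tokens, tree)
  tokens.each_index do |i|
    node = tree[tokens[i]]
    j = i
    while node
      return true if node[:eos]
      j += 1
      break if j == tokens.size
      node = node[tokens[j]]
    end
  end
  false
end

tree = add_nodes({}, ["boom", "hot sauce"])
p tree
p blocked?(tokenize("extra hot sauce"), tree)
p blocked?(tokenize("hot dog"), tree)
```

Matching is on whole tokens, so in this sketch a query such as `boombox` is not caught by the blocked word `boom`, and a multi-word phrase only matches when all of its tokens appear in order.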
data/lib/autosuggest/version.rb CHANGED
@@ -1,3 +1,3 @@
- class Autosuggest
-   VERSION = "0.1.3"
+ module Autosuggest
+   VERSION = "0.2.0"
  end
data/lib/autosuggest.rb CHANGED
@@ -7,226 +7,11 @@ require "lingua/stemmer"
  require "obscenity"

  # modules
- require "autosuggest/version"
+ require_relative "autosuggest/generator"
+ require_relative "autosuggest/version"

- class Autosuggest
-   def initialize(top_queries)
-     @top_queries = top_queries
-     @concepts = {}
-     @words = Set.new
-     @non_duplicates = Set.new
-     @blocked_words = {}
-     @blacklisted_words = {}
-     @preferred_queries = {}
-     @profane_words = {}
-     @concept_tree = {}
-     add_nodes(@profane_words, Obscenity::Base.blacklist)
-   end
-
-   def add_concept(name, values)
-     values = values.compact.uniq
-     add_nodes(@concept_tree, values)
-     @concepts[name] = Set.new(values.map(&:downcase))
-   end
-
-   def parse_words(phrases, options = {})
-     min = options[:min] || 1
-
-     word_counts = Hash.new(0)
-     phrases.each do |phrase|
-       words = tokenize(phrase)
-       words.each do |word|
-         word_counts[word] += 1
-       end
-     end
-
-     word_counts.select { |_, c| c >= min }.each do |word, _|
-       @words << word
-     end
-
-     word_counts
-   end
-
-   def not_duplicates(pairs)
-     pairs.each do |pair|
-       @non_duplicates << pair.map(&:downcase).sort
-     end
-   end
-
-   def block_words(words)
-     add_nodes(@blocked_words, words)
-     words
-   end
-
-   def blacklist_words(words)
-     warn "[autosuggest] blacklist_words is deprecated. Use block_words instead."
-     add_nodes(@blacklisted_words, words)
-     words
-   end
-
-   def prefer(queries)
-     queries.each do |query|
-       @preferred_queries[normalize_query(query)] ||= query
-     end
-   end
-
-   # TODO add queries method for filter: false and make suggestions use filter: true in 0.2.0
-   def suggestions(filter: false)
-     stemmed_queries = {}
-     added_queries = Set.new
-     results = @top_queries.sort_by { |_query, count| -count }.map do |query, count|
-       query = query.to_s
-
-       # TODO do not ignore silently
-       next if query.length < 2
-
-       stemmed_query = normalize_query(query)
-
-       # get preferred term
-       preferred_query = @preferred_queries[stemmed_query]
-       if preferred_query && preferred_query != query
-         original_query, query = query, preferred_query
-       end
-
-       # exclude duplicates
-       duplicate = stemmed_queries[stemmed_query]
-       stemmed_queries[stemmed_query] ||= query
-
-       # also detect possibly misspelled duplicates
-       # TODO use top query as duplicate
-       if !duplicate && query.length > 4
-         edits(query).each do |edited_query|
-           if added_queries.include?(edited_query)
-             duplicate = edited_query
-             break
-           end
-         end
-       end
-       if duplicate && @non_duplicates.include?([duplicate, query].sort)
-         duplicate = nil
-       end
-       added_queries << query unless duplicate
-
-       # find concepts
-       concepts = []
-       @concepts.each do |name, values|
-         concepts << name if values.include?(query)
-       end
-
-       tokens = tokenize(query)
-
-       # exclude misspellings that are not brands
-       misspelling = @words.any? && misspellings?(tokens)
-
-       profane = blocked?(tokens, @profane_words)
-       blocked = blocked?(tokens, @blocked_words)
-       blacklisted = blocked?(tokens, @blacklisted_words)
-
-       notes = []
-       notes << "duplicate of #{duplicate}" if duplicate
-       notes.concat(concepts)
-       notes << "misspelling" if misspelling
-       notes << "profane" if profane
-       notes << "blocked" if blocked
-       notes << "blacklisted" if blacklisted
-       notes << "originally #{original_query}" if original_query
-
-       result = {
-         query: query,
-         original_query: original_query,
-         score: count,
-         duplicate: duplicate,
-         concepts: concepts,
-         misspelling: misspelling,
-         profane: profane,
-         blocked: blocked
-       }
-       result[:blacklisted] = blacklisted if @blacklisted_words.any?
-       result[:notes] = notes
-       result
-     end
-     if filter
-       results.reject! { |s| s[:duplicate] || s[:misspelling] || s[:profane] || s[:blocked] }
-     end
-     results
-   end
-
-   def pretty_suggestions
-     str = "%-30s %5s %s\n" % %w(Query Score Notes)
-     suggestions.each do |suggestion|
-       str << "%-30s %5d %s\n" % [suggestion[:query], suggestion[:score], suggestion[:notes].join(", ")]
-     end
-     str
-   end
-
-   protected
-
-   def misspellings?(tokens)
-     pos = [0]
-     while i = pos.shift
-       return false if i == tokens.size
-
-       if @words.include?(tokens[i])
-         pos << i + 1
-       end
-
-       node = @concept_tree[tokens[i]]
-       j = i
-       while node
-         j += 1
-         pos << j if node[:eos]
-         break if j == tokens.size
-         node = node[tokens[j]]
-       end
-
-       pos.uniq!
-     end
-     true
-   end
-
-   def blocked?(tokens, blocked_words)
-     tokens.each_with_index do |token, i|
-       node = blocked_words[token]
-       j = i
-       while node
-         return true if node[:eos]
-         j += 1
-         break if j == tokens.size
-         node = node[tokens[j]]
-       end
-     end
-     false
-   end
-
-   def tokenize(str)
-     str.to_s.downcase.split(" ")
-   end
-
-   # from https://blog.lojic.com/2008/09/04/how-to-write-a-spelling-corrector-in-ruby/
-   LETTERS = ("a".."z").to_a.join + "'"
-   def edits(word)
-     n = word.length
-     deletion = (0...n).collect { |i| word[0...i] + word[i + 1..-1] }
-     transposition = (0...n - 1).collect { |i| word[0...i] + word[i + 1, 1] + word[i, 1] + word[i + 2..-1] }
-     alteration = []
-     n.times { |i| LETTERS.each_byte { |l| alteration << word[0...i] + l.chr + word[i + 1..-1] } }
-     insertion = []
-     (n + 1).times { |i| LETTERS.each_byte { |l| insertion << word[0...i] + l.chr + word[i..-1] } }
-     deletion + transposition + alteration + insertion
-   end
-
-   def normalize_query(query)
-     tokenize(query.to_s.gsub("&", "and")).map { |q| Lingua.stemmer(q) }.sort.join
-   end
-
-   def add_nodes(var, words)
-     words.each do |word|
-       node = var
-       tokenize(word).each do |token|
-         node = (node[token] ||= {})
-       end
-       node[:eos] = true
-     end
-     var
+ module Autosuggest
+   def self.new(*args, **options)
+     Generator.new(*args, **options)
    end
  end
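
Note on the new `Autosuggest.new` in the file above: `Autosuggest` changes from a class to a module, and a module has no constructor of its own, so the module-level `new` is what keeps the old `Autosuggest.new(top_queries)` spelling working. A minimal sketch of the pattern, using a hypothetical `SuggestDemo` namespace and a stub `Generator` rather than the gem itself:

```ruby
# Hypothetical namespace for illustration; the gem's module is Autosuggest.
module SuggestDemo
  # Stub standing in for the real generator class.
  class Generator
    attr_reader :top_queries, :language

    def initialize(top_queries, language: "english")
      @top_queries = top_queries
      @language = language
    end
  end

  # Module-level factory: forwards positional and keyword arguments
  # unchanged, so SuggestDemo.new(...) behaves like Generator.new(...).
  def self.new(*args, **options)
    Generator.new(*args, **options)
  end
end

legacy = SuggestDemo.new({"apples" => 40}, language: "english")
modern = SuggestDemo::Generator.new({"apples" => 40}, language: "spanish")
p legacy.class
p modern.language
```

Both spellings return an instance of the same class, which is why the README can switch its examples to `Autosuggest::Generator.new` without breaking callers that still use the shorter form.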
data/lib/generators/autosuggest/suggestions_generator.rb CHANGED
@@ -1,6 +1,6 @@
  require "rails/generators/active_record"

- class Autosuggest
+ module Autosuggest
    module Generators
      class SuggestionsGenerator < Rails::Generators::Base
        include ActiveRecord::Generators::Migration
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: autosuggest
  version: !ruby/object:Gem::Version
-   version: 0.1.3
+   version: 0.2.0
  platform: ruby
  authors:
  - Andrew Kane
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2021-11-24 00:00:00.000000000 Z
+ date: 2023-01-30 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
    name: ruby-stemmer
@@ -48,6 +48,7 @@ files:
  - LICENSE.txt
  - README.md
  - lib/autosuggest.rb
+ - lib/autosuggest/generator.rb
  - lib/autosuggest/version.rb
  - lib/generators/autosuggest/suggestions_generator.rb
  - lib/generators/autosuggest/templates/migration.rb.tt
@@ -64,14 +65,14 @@ required_ruby_version: !ruby/object:Gem::Requirement
    requirements:
    - - ">="
      - !ruby/object:Gem::Version
-       version: '2.4'
+       version: '2.7'
  required_rubygems_version: !ruby/object:Gem::Requirement
    requirements:
    - - ">="
      - !ruby/object:Gem::Version
        version: '0'
  requirements: []
- rubygems_version: 3.2.22
+ rubygems_version: 3.4.1
  signing_key:
  specification_version: 4
  summary: Generate autocomplete suggestions based on what your users search
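
Note on the `required_ruby_version` change above: the constraint moves from `>= 2.4` to `>= 2.7`, so RubyGems will refuse to install 0.2.0 on older interpreters. The sketch below shows how such a constraint is evaluated, using the standard `Gem::Requirement` and `Gem::Version` classes that ship with Ruby. The `ruby_supported?` helper and the version strings passed to it are illustrative only.

```ruby
# The constraint recorded in the 0.2.0 gem metadata.
REQUIRED_RUBY = Gem::Requirement.new(">= 2.7")

# Hypothetical helper: checks a Ruby version string against the constraint.
def ruby_supported?(version_string)
  REQUIRED_RUBY.satisfied_by?(Gem::Version.new(version_string))
end

p ruby_supported?("2.6.10")
p ruby_supported?("2.7.0")
p ruby_supported?(RUBY_VERSION)
```

Applications pinned to an older Ruby would stay on the 0.1.x line until they upgrade the interpreter.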