pg_search 0.2.2 → 0.3

data/.gitignore CHANGED
@@ -5,5 +5,4 @@ doc
5
5
  .idea
6
6
  tags
7
7
  *~
8
- gemfiles/rails2/Gemfile.lock
9
- gemfiles/rails3/Gemfile.lock
8
+ Gemfile.lock
data/.rspec ADDED
@@ -0,0 +1 @@
1
+ --colour
data/.travis.yml ADDED
@@ -0,0 +1,2 @@
1
+ before_script:
2
+ - "psql -c 'create database pg_search_test;' -U postgres >/dev/null"
data/CHANGELOG CHANGED
@@ -1,11 +1,14 @@
1
- ### 0.2.2
1
+ ### 0.3
2
2
 
3
- * Fix a compatibility issue between Ruby 1.8.7 and 1.9.3 when using Rails 2
4
- (James Badger)
3
+ * Drop Active Record 2.0 support.
5
4
 
6
- ### 0.2.1
5
+ * Add PgSearch.multisearch for cross-model searching.
7
6
 
8
- * Backport support for searching against tsvector columns (Kris Hicks)
7
+ * Fix PostgreSQL warnings about truncated identifiers
8
+
9
+ * Support specifying a method of rank normalisation when using tsearch. (Arthur Gunn)
10
+
11
+ * Add :any_word option to :tsearch which uses OR between query terms instead of AND. (Fernando Espinosa)
9
12
 
10
13
  ### 0.2
11
14
 
data/Gemfile CHANGED
@@ -1,5 +1,10 @@
1
- puts <<-MESSAGE
2
- This project uses multiple Gemfiles in subdirectories of ./gemfiles.
3
- The rake tasks automatically install these bundles as necessary. See rake -T.
4
- MESSAGE
5
- exit 1
1
+ source "http://rubygems.org"
2
+
3
+ gemspec
4
+
5
+ gem "rake"
6
+ gem "rdoc"
7
+ gem "pg"
8
+ gem "rspec", ">=2.4"
9
+ gem "autotest"
10
+ gem "with_model"
data/README.rdoc CHANGED
@@ -18,16 +18,11 @@ In Gemfile
18
18
 
19
19
  gem 'pg_search'
20
20
 
21
- === Rails 2
21
+ === Other ActiveRecord-based projects
22
22
 
23
- In environment.rb
23
+ In addition to installing and requiring the gem, you may want to include the PgSearch rake tasks in your Rakefile:
24
24
 
25
- config.gem 'pg_search'
26
-
27
- In Rakefile
28
-
29
- require 'rubygems'
30
- require 'pg_search/tasks'
25
+ load "pg_search/tasks.rb"
31
26
 
32
27
  == USAGE
33
28
 
@@ -37,6 +32,94 @@ To add PgSearch to an ActiveRecord model, simply include the PgSearch module.
37
32
  include PgSearch
38
33
  end
39
34
 
35
+ === Multi-search vs. search scopes
36
+
37
+ pg_search supports two different techniques for searching, multi-search and search scopes.
38
+
39
+ The first technique is multi-search, in which records of many different Active Record classes can be mixed together into one global search index across your entire application. Most sites that want to support a generic search page will want to use this feature.
40
+
41
+ The other technique is search scopes, which allow you to do more advanced searching against only one Active Record class. This is more useful for building things like autocompleters or filtering a list of items in a faceted search.
42
+
43
+ === Multi-search
44
+
45
+ ==== Setup
46
+
47
+ Before using multi-search, you must generate and run a migration to create the pg_search_documents database table.
48
+
49
+ $ rake pg_search:migration:multisearch
50
+ $ rake db:migrate
51
+
52
+ ==== multisearchable
53
+
54
+ To add a model to the global search index for your application, call multisearchable in its class definition.
55
+
56
+ class EpicPoem < ActiveRecord::Base
57
+ include PgSearch
58
+ multisearchable :against => [:title, :author]
59
+ end
60
+
61
+ class Flower < ActiveRecord::Base
62
+ include PgSearch
63
+ multisearchable :against => :color
64
+ end
65
+
66
+ Whenever a record is created, updated, or destroyed, an Active Record callback will fire, creating, updating, or removing the corresponding PgSearch::Document record in the pg_search_documents table. The :against option can be one or several methods which will be called on the record to generate its search text.
67
+
68
+ ==== Multi-search associations
69
+
70
+ Two associations are built automatically. On the original record, there is a has_one :pg_search_document association pointing to the PgSearch::Document record, and on the PgSearch::Document record there is a belongs_to :searchable polymorphic association pointing back to the original record.
71
+
72
+ odyssey = EpicPoem.create!(:title => "Odyssey", :author => "Homer")
73
+ search_document = odyssey.pg_search_document #=> PgSearch::Document instance
74
+ search_document.searchable #=> #<EpicPoem id: 1, title: "Odyssey", author: "Homer">
75
+
76
+ ==== Searching in the global search index
77
+
78
+ To fetch the PgSearch::Document entries for all of the records that match a given query, use PgSearch.multisearch.
79
+
80
+ odyssey = EpicPoem.create!(:title => "Odyssey", :author => "Homer")
81
+ rose = Flower.create!(:color => "Red")
82
+ PgSearch.multisearch("Homer") #=> [#<PgSearch::Document searchable: odyssey>]
83
+ PgSearch.multisearch("Red") #=> [#<PgSearch::Document searchable: rose>]
84
+
85
+ ==== Chaining method calls onto the results
86
+
87
+ PgSearch.multisearch returns an ActiveRecord::Relation, just like scopes do, so you can chain scope calls to the end. This works with gems like Kaminari that add scope methods. Just like with regular scopes, the database will only receive SQL requests when necessary.
88
+
89
+ PgSearch.multisearch("Bertha").limit(10)
90
+ PgSearch.multisearch("Juggler").where(:searchable_type => "Occupation")
91
+ PgSearch.multisearch("Alamo").page(3).per(30)
92
+ PgSearch.multisearch("Diagonal").find_each do |document|
93
+ puts document.searchable.updated_at
94
+ end
95
+
96
+ ==== Rebuilding search documents for a given class
97
+
98
+ If you change the :against option on a class, add multisearchable to a class that already has records in the database, or remove multisearchable from a class in order to remove it from the index, you will find that the pg_search_documents table could become out-of-sync with the actual records in your other tables.
99
+
100
+ The index can also become out-of-sync if you ever modify records in a way that does not trigger Active Record callbacks. For instance, the #update_column instance method and the .update_all class method both skip callbacks and directly modify the database.
101
+
102
+ To remove all of the documents for a given class, you can simply delete all of the PgSearch::Document records.
103
+
104
+ PgSearch::Document.delete_all(:searchable_type => "Animal")
105
+
106
+ Run this Rake task to regenerate all of the documents for a given class.
107
+
108
+ $ rake pg_search:multisearch:rebuild CLASS=BlogPost
109
+
110
+ Currently this is only supported for :against methods that directly map to Active Record attributes. Until that is fixed, you could also manually rebuild all of the documents.
111
+
112
+ PgSearch::Document.delete_all(:searchable_type => "Ingredient")
113
+ Ingredient.find_each { |record| record.update_pg_search_document }
114
+
115
+ ==== Disabling multi-search indexing temporarily
116
+
117
+ If you have a large bulk operation to perform, such as importing a lot of records from an external source, you might want to speed things up by turning off indexing temporarily. You could then use one of the techniques above to rebuild the search documents off-line.
118
+
119
+ PgSearch.disable_multisearch do
120
+ Movie.import_from_xml_file(File.open("movies.xml"))
121
+ end
122
+
40
123
  === pg_search_scope
41
124
 
42
125
  You can use pg_search_scope to build a search scope. The first parameter is a scope name, and the second parameter is an options hash. The only required option is :against, which tells pg_search_scope which column or columns to search against.
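As a rough illustration of the :against option of multisearchable described in this hunk, the plain-Ruby sketch below builds search text by calling one or several methods on a record and joining the results. The search_text_for helper and the Poem struct are hypothetical names used only for illustration; this is an assumption about the general idea, not pg_search's actual implementation.

```ruby
# Hypothetical stand-in for an Active Record model.
Poem = Struct.new(:title, :author)

# Hypothetical helper: call each :against method on the record and
# join the non-nil results into one string of search text.
def search_text_for(record, against)
  Array(against).map { |method_name| record.send(method_name) }.compact.join(" ")
end

odyssey = Poem.new("Odyssey", "Homer")
puts search_text_for(odyssey, [:title, :author])  # prints "Odyssey Homer"
puts search_text_for(odyssey, :title)             # prints "Odyssey"
```

Because Array() wraps a single symbol, the helper accepts either one method name or a list, which matches the two multisearchable examples in the README.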
@@ -135,11 +218,11 @@ You can pass a Hash into the :associated_against option to search columns on oth
135
218
 
136
219
  === Searching using different search features
137
220
 
138
- By default, pg_search_scope uses the built-in {PostgreSQL text search}[http://www.postgresql.org/docs/current/static/textsearch-intro.html]. If you pass the :features option to pg_search_scope, you can choose alternative search techniques.
221
+ By default, pg_search_scope uses the built-in {PostgreSQL text search}[http://www.postgresql.org/docs/current/static/textsearch-intro.html]. If you pass the :using option to pg_search_scope, you can choose alternative search techniques.
139
222
 
140
223
  class Beer < ActiveRecord::Base
141
224
  include PgSearch
142
- pg_search_scope :against => :name, :features => [:tsearch, :trigram, :dmetaphone]
225
+ pg_search_scope :search_name, :against => :name, :using => [:tsearch, :trigram, :dmetaphone]
143
226
  end
144
227
 
145
228
  The currently implemented features are
@@ -157,7 +240,7 @@ Each searchable column can be given a weight of "A", "B", "C", or "D". Columns w
157
240
 
158
241
  class NewsArticle < ActiveRecord::Base
159
242
  include PgSearch
160
- pg_search_scope :against => {
243
+ pg_search_scope :search_full_text, :against => {
161
244
  :title => 'A',
162
245
  :subtitle => 'B',
163
246
  :content => 'C'
@@ -168,7 +251,7 @@ You can also pass the weights in as an array of arrays, or any other structure t
168
251
 
169
252
  class NewsArticle < ActiveRecord::Base
170
253
  include PgSearch
171
- pg_search_scope :against => [
254
+ pg_search_scope :search_full_text, :against => [
172
255
  [:title, 'A'],
173
256
  [:subtitle, 'B'],
174
257
  [:content, 'C']
@@ -177,7 +260,7 @@ You can also pass the weights in as an array of arrays, or any other structure t
177
260
 
178
261
  class NewsArticle < ActiveRecord::Base
179
262
  include PgSearch
180
- pg_search_scope :against => [
263
+ pg_search_scope :search_full_text, :against => [
181
264
  [:title, 'A'],
182
265
  {:subtitle => 'B'},
183
266
  :content
@@ -228,15 +311,65 @@ PostgreSQL full text search also support multiple dictionaries for stemming. You
228
311
  BoringTweet.kinda_matching("sleeping") # => [sleepy, sleeping, sleeper]
229
312
  BoringTweet.literally_matching("sleeping") # => [sleeping]
230
313
 
231
- ==== :dmetaphone (Double Metaphone soundalike search)
314
+ ===== :normalization
232
315
 
233
- {Double Metaphone}[http://en.wikipedia.org/wiki/Double_Metaphone] is an algorithm for matching words that sound alike even if they are spelled very differently. For example, "Geoff" and "Jeff" sound identical and thus match. Currently, this is not a true double-metaphone, as only the first metaphone is used for searching.
316
+ PostgreSQL supports multiple algorithms for ranking results against queries. For instance, you might want to consider overall document size or the distance between multiple search terms in the original text. This option takes an integer, which is passed directly to PostgreSQL. According to the latest {PostgreSQL documentation}[http://www.postgresql.org/docs/current/static/textsearch-controls.html], the supported algorithms are:
234
317
 
235
- Double Metaphone support is currently available as part of the {fuzzystrmatch contrib package}[http://www.postgresql.org/docs/current/static/fuzzystrmatch.html] that must be installed before this feature can be used. In addition to the contrib package, you must install a utility function into your database. To generate a migration for this, add the following line to your Rakefile:
318
+ 0 (the default) ignores the document length
319
+ 1 divides the rank by 1 + the logarithm of the document length
320
+ 2 divides the rank by the document length
321
+ 4 divides the rank by the mean harmonic distance between extents
322
+ 8 divides the rank by the number of unique words in document
323
+ 16 divides the rank by 1 + the logarithm of the number of unique words in document
324
+ 32 divides the rank by itself + 1
236
325
 
237
- include "pg_search/tasks"
326
+ This integer is a bitmask, so if you want to combine algorithms, you can add their numbers together. (e.g. to use algorithms 1, 8, and 32, you would pass 1 + 8 + 32 = 41)
238
327
 
239
- and then run:
328
+ class BigLongDocument < ActiveRecord::Base
329
+ include PgSearch
330
+ pg_search_scope :regular_search,
331
+ :against => :text
332
+
333
+ pg_search_scope :short_search,
334
+ :against => :text,
335
+ :using => {
336
+ :tsearch => {:normalization => 2}
337
+ }
+ end
338
+
339
+ long = BigLongDocument.create!(:text => "Four score and twenty years ago")
340
+ short = BigLongDocument.create!(:text => "Four score")
341
+
342
+ BigLongDocument.regular_search("four score") #=> [long, short]
343
+ BigLongDocument.short_search("four score") #=> [short, long]
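The bitmask arithmetic behind :normalization can be sketched in plain Ruby. The flag values come from the list in this hunk; the symbolic names and the normalization_mask helper are hypothetical and are not part of pg_search, which simply takes the integer.

```ruby
# Flag values as listed in the README text; the symbol names are made up
# here for readability.
NORMALIZATION_FLAGS = {
  :log_length        => 1,
  :length            => 2,
  :harmonic_distance => 4,
  :unique_words      => 8,
  :log_unique_words  => 16,
  :self_plus_one     => 32
}

# Hypothetical helper: combine named flags into one integer bitmask.
def normalization_mask(*names)
  names.inject(0) { |mask, name| mask | NORMALIZATION_FLAGS.fetch(name) }
end

puts normalization_mask(:log_length, :unique_words, :self_plus_one)  # prints 41
puts normalization_mask()                                            # prints 0
```

Since each flag is a distinct power of two, bitwise OR and addition give the same result, so 1 + 8 + 32 and 1 | 8 | 32 both yield 41.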
344
+
345
+ ===== :any_word
346
+
347
+ Setting this option to true returns all records that contain any word of the search query, because the query terms are combined with OR instead of AND.
348
+
349
+ class Number < ActiveRecord::Base
350
+ include PgSearch
351
+ pg_search_scope :search_any_word,
352
+ :against => :text,
353
+ :using => {
354
+ :tsearch => {:any_word => true}
355
+ }
356
+
357
+ pg_search_scope :search_all_words,
358
+ :against => :text
359
+ end
360
+
361
+ one = Number.create! :text => 'one'
362
+ two = Number.create! :text => 'two'
363
+ three = Number.create! :text => 'three'
364
+
365
+ Number.search_any_word('one two three') # => [one, two, three]
366
+ Number.search_all_words('one two three') # => []
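The difference between matching any word and matching all words can be pictured with a small in-memory sketch. The DOCUMENTS list and the naive_search helper are hypothetical; the sketch ignores stemming, ranking, and everything else PostgreSQL does, and only illustrates the OR versus AND semantics.

```ruby
# Hypothetical in-memory documents standing in for database rows.
DOCUMENTS = ["one", "two", "three", "one two three"]

# Hypothetical helper: keep documents containing any (OR) or all (AND)
# of the whitespace-separated query terms.
def naive_search(query, any_word = false)
  terms = query.downcase.split
  DOCUMENTS.select do |text|
    words = text.downcase.split
    if any_word
      terms.any? { |term| words.include?(term) }
    else
      terms.all? { |term| words.include?(term) }
    end
  end
end

p naive_search("one two three", true)  # prints ["one", "two", "three", "one two three"]
p naive_search("one two three")        # prints ["one two three"]
```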
367
+
368
+ ==== :dmetaphone (Double Metaphone soundalike search)
369
+
370
+ {Double Metaphone}[http://en.wikipedia.org/wiki/Double_Metaphone] is an algorithm for matching words that sound alike even if they are spelled very differently. For example, "Geoff" and "Jeff" sound identical and thus match. Currently, this is not a true double-metaphone, as only the first metaphone is used for searching.
371
+
372
+ Double Metaphone support is currently available as part of the {fuzzystrmatch contrib package}[http://www.postgresql.org/docs/current/static/fuzzystrmatch.html] that must be installed before this feature can be used. In addition to the contrib package, you must install a utility function into your database. To generate a migration for this, run:
240
373
 
241
374
  $ rake pg_search:migration:dmetaphone
242
375
 
@@ -302,32 +435,6 @@ Ignoring accents uses the {unaccent contrib package}[http://www.postgresql.org/d
302
435
  SpanishQuestion.gringo_search("Que") # => [what]
303
436
  SpanishQuestion.gringo_search("Cüåñtô") # => [how_many]
304
437
 
305
- === Using tsvector columns
306
-
307
- PostgreSQL allows you the ability to search against a column with type tsvector instead of using an expression; this speeds up searching dramatically as it offloads creation of the tsvector that the tsquery is evaluated against.
308
-
309
- To use this functionality you'll need to do a few things:
310
-
311
- * Create a column of type tsvector that you'd like to search against. If you want to search using multiple search methods, for example tsearch and dmetaphone, you'll need a column for each.
312
- * Create a trigger function that will update the column(s) using the expression appropriate for that type of search. See: http://www.postgresql.org/docs/current/static/textsearch-features.html#TEXTSEARCH-UPDATE-TRIGGERS
313
- * Should you have any pre-existing data in the table, update the newly-created tsvector columns with the expression that your trigger function uses.
314
- * Add the option to pg_search_scope, e.g:
315
-
316
- pg_search_scope :fast_content_search,
317
- :against => :content,
318
- :using => {
319
- dmetaphone: {
320
- tsvector_column: 'tsvector_content_dmetaphone'
321
- },
322
- tsearch: {
323
- dictionary: 'english',
324
- tsvector_column: 'tsvector_content_tsearch'
325
- }
326
- trigram: {} # trigram does not use tsvectors
327
- }
328
-
329
- Please note that the :against column is only used when the tsvector_column is not present for the search type.
330
-
331
438
  == REQUIREMENTS
332
439
 
333
440
  * ActiveRecord 2 or 3
data/Rakefile CHANGED
@@ -3,42 +3,27 @@ Bundler::GemHelper.install_tasks
3
3
 
4
4
  task :default => :spec
5
5
 
6
- environments = %w[rails2 rails3]
7
- major, minor, revision = RUBY_VERSION.split(".").map{|str| str.to_i }
8
-
9
- in_environment = lambda do |environment, command|
10
- sh %Q{export BUNDLE_GEMFILE="gemfiles/#{environment}/Gemfile"; bundle update && bundle exec #{command}}
6
+ def bundle_exec(command)
7
+ sh %Q{bundle update && bundle exec #{command}}
11
8
  end
12
9
 
13
- in_all_environments = lambda do |command|
14
- environments.each do |environment|
15
- next if environment == "rails2" && major == 1 && minor > 8
16
- puts "\n---#{environment}---\n"
17
- in_environment.call(environment, command)
18
- end
19
- end
20
-
21
- desc "Run all specs against ActiveRecord 2 and 3"
10
+ desc "Run all specs"
22
11
  task "spec" do
23
- in_all_environments.call('rspec spec')
12
+ bundle_exec("rspec spec")
24
13
  end
25
14
 
26
15
  task "doc" do
27
- in_environment.call("rails3", "rspec --format d spec")
16
+ bundle_exec("rspec --format d spec")
28
17
  end
29
18
 
30
- namespace "autotest" do
31
- environments.each do |environment|
32
- desc "Run autotest in #{environment}"
33
- task environment do
34
- in_environment.call(environment, 'autotest -s rspec2')
35
- end
36
- end
19
+ desc "Launch autotest"
20
+ task "autotest" do
21
+ bundle_exec("autotest -s rspec2")
37
22
  end
38
23
 
39
24
  namespace "doc" do
40
25
  desc "Generate README and preview in browser"
41
26
  task "readme" do
42
- sh "rdoc -c utf8 README.rdoc && open doc/files/README_rdoc.html"
27
+ sh "rdoc -c utf8 README.rdoc && open doc/README_rdoc.html"
43
28
  end
44
29
  end
data/lib/pg_search.rb CHANGED
@@ -1,32 +1,56 @@
1
1
  require "active_record"
2
- require "pg_search/configuration"
3
- require "pg_search/features"
4
- require "pg_search/normalizer"
5
- require "pg_search/scope"
6
- require "pg_search/scope_options"
7
- require "pg_search/version"
8
- #require "pg_search/railtie" if defined?(Rails) && defined?(Rails::Railtie)
2
+ require "active_support/concern"
9
3
 
10
4
  module PgSearch
11
- def self.included(base)
12
- base.send(:extend, ClassMethods)
13
- end
5
+ extend ActiveSupport::Concern
14
6
 
15
7
  module ClassMethods
16
8
  def pg_search_scope(name, options)
17
- scope = PgSearch::Scope.new(name, self, options)
18
- scope_method =
19
- if respond_to?(:scope) && !protected_methods.map(&:to_s).include?('scope')
20
- :scope # ActiveRecord 3.x
21
- else
22
- :named_scope # ActiveRecord 2.x
23
- end
24
-
25
- send(scope_method, name, scope.to_proc)
9
+ self.scope(
10
+ name,
11
+ PgSearch::Scope.new(name, self, options).to_proc
12
+ )
13
+ end
14
+
15
+ def multisearchable(options = {})
16
+ include PgSearch::Multisearchable
17
+ class_attribute :pg_search_multisearchable_options
18
+ self.pg_search_multisearchable_options = options
26
19
  end
27
20
  end
28
21
 
29
- def rank
30
- attributes['pg_search_rank'].to_f
22
+ module InstanceMethods
23
+ def rank
24
+ attributes['pg_search_rank'].to_f
25
+ end
26
+ end
27
+
28
+ class << self
29
+ def multisearch(query)
30
+ PgSearch::Document.search(query)
31
+ end
32
+
33
+ def disable_multisearch
34
+ Thread.current["PgSearch.enable_multisearch"] = false
35
+ yield
36
+ ensure
37
+ Thread.current["PgSearch.enable_multisearch"] = true
38
+ end
39
+
40
+ def multisearch_enabled?
41
+ Thread.current.key?("PgSearch.enable_multisearch") ? Thread.current["PgSearch.enable_multisearch"] : true
42
+ end
31
43
  end
32
44
  end
45
+
46
+ require "pg_search/configuration"
47
+ require "pg_search/document"
48
+ require "pg_search/features"
49
+ require "pg_search/multisearch"
50
+ require "pg_search/multisearchable"
51
+ require "pg_search/normalizer"
52
+ require "pg_search/scope"
53
+ require "pg_search/scope_options"
54
+ require "pg_search/version"
55
+
56
+ require "pg_search/railtie" if defined?(Rails)
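The thread-local toggle used by disable_multisearch and multisearch_enabled? in this file can be tried out in isolation. The sketch below mirrors that pattern in a hypothetical IndexingSwitch module with a made-up key name; it is a stand-alone illustration, not the gem's code.

```ruby
# Hypothetical module mirroring the thread-local toggle shown in the diff.
module IndexingSwitch
  KEY = "IndexingSwitch.enabled"

  # Turn the flag off for the duration of the block.
  def self.disable
    Thread.current[KEY] = false
    yield
  ensure
    # Always switch the flag back on, even if the block raises.
    Thread.current[KEY] = true
  end

  # Enabled by default when the flag has never been set on this thread.
  def self.enabled?
    Thread.current.key?(KEY) ? Thread.current[KEY] : true
  end
end

puts IndexingSwitch.enabled?                             # prints true
IndexingSwitch.disable { puts IndexingSwitch.enabled? }  # prints false
puts IndexingSwitch.enabled?                             # prints true
```

Note that the ensure clause sets the flag to true rather than restoring the previous value, so with this pattern a nested call would re-enable the flag as soon as the inner block finishes.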
@@ -10,6 +10,14 @@ module PgSearch
10
10
  @model = model
11
11
  end
12
12
 
13
+ class << self
14
+ def alias(*strings)
15
+ name = Array.wrap(strings).compact.join("_")
16
+ # By default, PostgreSQL truncates identifiers longer than 63 bytes, so we hash the name and keep only the first 32 characters.
17
+ "pg_search_#{Digest::SHA2.hexdigest(name)}"[0,32]
18
+ end
19
+ end
20
+
13
21
  def columns
14
22
  regular_columns + associated_columns
15
23
  end
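To see what the alias helper added in this hunk produces, here is a stand-alone sketch. The search_alias name is hypothetical, and Array#flatten stands in for ActiveSupport's Array.wrap so that the example runs without Rails; otherwise it follows the same hash-and-truncate approach.

```ruby
require "digest/sha2"

# Hypothetical stand-alone version of the alias helper shown in the hunk.
# Joins the name parts, hashes them, and keeps the first 32 characters.
def search_alias(*strings)
  name = strings.flatten.compact.join("_")
  "pg_search_#{Digest::SHA2.hexdigest(name)}"[0, 32]
end

short = search_alias("users", "name")
long  = search_alias("a_very_long_table_name", "an_even_longer_association_name", "column")

puts short.length                     # prints 32
puts long.length                      # prints 32
puts short.start_with?("pg_search_")  # prints true
```

Whatever the input length, the result is a fixed 32 characters (the 10-character prefix plus 22 hex digits), which stays under PostgreSQL's default 63-byte identifier limit and so avoids the truncation warnings mentioned in the CHANGELOG.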
@@ -49,6 +57,10 @@ module PgSearch
49
57
  Array(@options[:using])
50
58
  end
51
59
 
60
+ def order_within_rank
61
+ @options[:order_within_rank]
62
+ end
63
+
52
64
  private
53
65
 
54
66
  def default_options
@@ -56,7 +68,7 @@ module PgSearch
56
68
  end
57
69
 
58
70
  def assert_valid_options(options)
59
- valid_keys = [:against, :ranked_by, :ignoring, :using, :query, :associated_against]
71
+ valid_keys = [:against, :ranked_by, :ignoring, :using, :query, :associated_against, :order_within_rank]
60
72
  valid_values = {
61
73
  :ignoring => [:accents]
62
74
  }