ar-tsvectors 0.0.1 → 0.0.2

data/README.md CHANGED
@@ -1,6 +1,6 @@
  # ActiveRecord support for ts_vectors
 
- This small library extends ActiveRecord to support PostgreSQL's `tsvector` type. PostgreSQL's `tsvector` stores an array of keywords that can be efficiently queried. The `tsvector` type is therefore particularly suitable for storing tags.
+ This small library extends ActiveRecord to support PostgreSQL's `tsvector` type. PostgreSQL's `tsvector` stores an array of keywords that can be efficiently queried. The `tsvector` type is therefore particularly suitable for storing tags, since it's possible to store all tags in a single column instead of a table.
 
  ## Requirements
 
@@ -13,9 +13,9 @@ Add gem to your `Gemfile`:
 
      gem 'ar-tsvectors', :require => 'activerecord_tsvectors'
 
- To test, add a `tsvector` column to your table in a migration:
+ To test, add a `text` column to your table in a migration:
 
-     add_column :posts, :tags, :tsvector
+     add_column :posts, :tags, :text
 
  Then extend your ActiveRecord model with a declaration:
 
@@ -34,10 +34,18 @@ You can now query based on the tags:
      Post.with_all_tags('pizza')
      Post.with_all_tags(['pizza', 'pepperoni'])
 
- Note that for queries to use indexes, you need to create an index on the column. This is slightly more complicated; again, in a migration:
+ For queries to perform well, you must add an index on the column. This is slightly more complicated; again, in a migration:
 
      execute("create index index_posts_on_tags on posts using gin(to_tsvector('simple', tags))")
 
+ ## Options
+
+ The `ts_vector` mixin takes the following options:
+
+ * `:configuration` - the name of the PostgreSQL text configuration to use. Defaults to `simple`, which is appropriate for keyword-type indexing.
+ * `:format` - the format of the column, must be either `:text` (the default) or `:tsvector`.
+ * `:normalize` - a proc that overrides the default normalization. It must take a single string and return a normalized copy of the string.
+
  ## Methods
 
  Declaring a `tsvector` column `tags` will dynamically add the following methods:
@@ -77,10 +85,24 @@ The values are normalized both when performing queries, and when assigning new v
      post.tags
      => ['wtf']
 
- ## Limitations
+ ## Using ts_vector directly
+
+ It's possible to use the `tsvector` type for the column directly. In other words, your migration would have something like:
+
+     add_column :posts, :tags, :tsvector
+
+ This is supported by supplying a format option to the `ts_vector` declaration in your model:
+
+     class Post < ActiveRecord::Base
+       ts_vector :tags, :format => :tsvector
+     end
+
+ When you do this, the way you define a table index is slightly different:
+
+     execute("create index index_posts_on_tags on posts using gin(tags)")
 
- Currently, the library will always use the built-in `simple` configuration, which only performs basic normalization, and does not perform stemming.
+ While using a `tsvector` column works, it has some drawbacks:
 
- Due to a limitation in ActiveRecord, stored column values (on `INSERT` and `UPDATE`) are passed to PostgreSQL as strings, and are therefore *not* normalized using the text configuration's rules. This means that if you override the normalization function, you must make sure you always strip and downcase in addition to whatever other normalization you do, otherwise queries will potentially *not* match all rows.
+ * The `tsvector` type is lossy. When you assign values to a `tsvector`, the text will be tokenized by PostgreSQL's text parser, which will remove all punctuation and quotes from the text. In this system, a quoted tag such as `"New York"` will be split into two different tags `New` and `York`, and it's no longer possible to search for `New York` as a distinct tag.
 
- A forthcoming version of ActiveRecord will provide the plumbing that will allow us to solve this issue.
+ * Due to a limitation in ActiveRecord, stored column values (on `INSERT` and `UPDATE`) are passed to PostgreSQL as strings, and are therefore *not* normalized using the text configuration's rules. This means that if you override the normalization function, you must make sure you always strip and downcase in addition to whatever other normalization you do, otherwise queries will potentially *not* match all rows. A forthcoming version of ActiveRecord will provide the plumbing that will allow us to solve this issue.
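The last README bullet says that a custom `:normalize` proc must still strip and downcase. The following is a minimal sketch of such a proc in plain Ruby, with no ActiveRecord dependency. The constant name `TAG_NORMALIZER` is hypothetical, and the digit-stripping rule is only an illustrative extra step modelled on the lambda used in the gem's spec.

```ruby
# Hypothetical normalizer for the :normalize option (name is an assumption).
# It keeps the default behaviour (strip + downcase) and then applies one
# extra, application-specific rule: removing digits.
TAG_NORMALIZER = lambda do |value|
  value.strip.downcase.gsub(/\d/, '')
end

# In a model this would presumably be passed as:
#   ts_vector :tags, :normalize => TAG_NORMALIZER
puts TAG_NORMALIZER.call('  Pizza42 ')   # prints "pizza"
```

Because the same proc is applied to assigned values and to query arguments, keeping strip and downcase inside it means stored tags and searched tags are reduced to the same form.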
@@ -1,5 +1,6 @@
  require 'active_record'
 
+ require 'ts_vectors/attribute'
  require 'ts_vectors/model'
 
  class ActiveRecord::Base
@@ -0,0 +1,82 @@
+ module TsVectors
+
+   class Attribute
+     def initialize(name, options = {})
+       options.assert_valid_keys(:normalize, :configuration, :format)
+       @name, @options = name, options
+       if (config = @options[:configuration])
+         @configuration = config
+       else
+         @configuration = 'simple'
+       end
+       if (format = @options[:format])
+         @format = format.to_sym
+       else
+         @format = :text
+       end
+     end
+
+     def format_operand
+       @operand ||= begin
+         config = @configuration
+         if @format == :text
+           "to_tsvector('#{config}', #{@name})"
+         else
+           @name
+         end
+       end
+     end
+
+     def format_query
+       @query ||= "to_tsquery('#{@configuration}', ?)"
+     end
+
+     def serialize_values(values)
+       if values and values.length > 0
+         values.join(' ')
+       else
+         nil
+       end
+     end
+
+     def parse_values(string)
+       if string
+         values = string.scan(/(?:([^'\s,]+(?::\d+)?)|'([^']+)(?::\d+)?')\s*/u).flatten
+         values.reject! { |v| v.blank? }
+         values
+       else
+         []
+       end
+     end
+
+     def normalize_values(values)
+       values = [values] unless values.is_a?(Enumerable)
+       values = values.map { |v| normalize(v) }
+       values.compact!
+       values
+     end
+
+     def normalize(value)
+       if value
+         if (normalize = @options[:normalize])
+           value = normalize.call(value)
+         else
+           value = value.strip.downcase
+         end
+         if value.blank?
+           nil
+         elsif value =~ /\s/
+           %('#{value}')
+         else
+           value
+         end
+       end
+     end
+
+     attr_reader :name
+     attr_reader :options
+     attr_reader :format
+     attr_reader :configuration
+   end
+
+ end
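To make the interplay of `normalize`, `serialize_values` and `parse_values` easier to follow, here is a stand-alone sketch of the round trip from assigned values to the stored string and back. It is not the gem's code: the method names `normalize_tag`, `serialize_tags`, `parse_tags` and the helper `blank_tag?` are hypothetical, `blank_tag?` stands in for ActiveSupport's `blank?`, and only the default normalization (strip and downcase) is modelled. The regular expression is copied from `parse_values` above.

```ruby
# Pattern copied from Attribute#parse_values: either an unquoted token or a
# single-quoted token, followed by optional whitespace.
TAG_PATTERN = /(?:([^'\s,]+(?::\d+)?)|'([^']+)(?::\d+)?')\s*/u

# Hypothetical stand-in for ActiveSupport's String#blank? / NilClass#blank?.
def blank_tag?(value)
  value.nil? || value.strip.empty?
end

# Default normalization: strip, downcase, drop blanks, and wrap values
# containing whitespace in single quotes so they survive as one tag.
def normalize_tag(value)
  return nil if value.nil?
  value = value.strip.downcase
  return nil if value.empty?
  value =~ /\s/ ? "'#{value}'" : value
end

# Normalize every value and join with spaces; nil when nothing is left.
def serialize_tags(values)
  tags = Array(values).map { |v| normalize_tag(v) }.compact
  tags.empty? ? nil : tags.join(' ')
end

# Split the stored string back into individual tags.
def parse_tags(string)
  return [] if string.nil?
  string.scan(TAG_PATTERN).flatten.reject { |v| blank_tag?(v) }
end

stored = serialize_tags(['Pizza', ' ', 'New York'])
puts stored.inspect              # prints "pizza 'new york'"
puts parse_tags(stored).inspect  # prints ["pizza", "new york"]
```

The quoting step is what lets a multi-word tag round-trip through a `text` column as a single value.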
@@ -5,50 +5,6 @@ module TsVectors
      included do
      end
 
-     class Attribute
-       def initialize(name, options = {})
-         options.assert_valid_keys(:normalize)
-         @name, @options = name, options
-       end
-
-       def parse_values(string)
-         if string
-           values = string.scan(/(?:([^'\s,]+)|'([^']+)')\s*/u).flatten
-           values.reject! { |v| v.blank? }
-           values
-         else
-           []
-         end
-       end
-
-       def normalize_values(values)
-         values = [values] unless values.is_a?(Enumerable)
-         values = values.map { |v| normalize(v) }
-         values.compact!
-         values
-       end
-
-       def normalize(value)
-         if value
-           if (normalize = @options[:normalize])
-             value = normalize.call(value)
-           else
-             value = value.strip.downcase
-           end
-           if value.blank?
-             nil
-           elsif value =~ /\s/
-             %('#{value}')
-           else
-             value
-           end
-         end
-       end
-
-       attr_reader :name
-       attr_reader :options
-     end
-
      module ClassMethods
 
        def ts_vector(attribute, options = {})
@@ -60,7 +16,7 @@ module TsVectors
          scope "with_all_#{attribute}", lambda { |values|
            values = attr.normalize_values(values)
            if values.any?
-             where("#{attribute} @@ to_tsquery('simple', ?)", values.join(' & '))
+             where("#{attr.format_operand} @@ #{attr.format_query}", values.join(' & '))
            else
              where('false')
            end
@@ -69,7 +25,7 @@ module TsVectors
          scope "with_any_#{attribute}", lambda { |values|
            values = attr.normalize_values(values)
            if values.any?
-             where("#{attribute} @@ to_tsquery('simple', ?)", values.join(' | '))
+             where("#{attr.format_operand} @@ #{attr.format_query}", values.join(' | '))
            else
              where('false')
            end
@@ -78,7 +34,7 @@ module TsVectors
          scope "without_all_#{attribute}", lambda { |values|
            values = attr.normalize_values(values)
            if values.any?
-             where("#{attribute} @@ (!! to_tsquery('simple', ?))", values.join(' & '))
+             where("#{attr.format_operand} @@ (!! #{attr.format_query})", values.join(' & '))
            else
              where('false')
            end
@@ -87,7 +43,7 @@ module TsVectors
          scope "without_any_#{attribute}", lambda { |values|
            values = attr.normalize_values(values)
            if values.any?
-             where("#{attribute} @@ (!! to_tsquery('simple', ?))", values.join(' | '))
+             where("#{attr.format_operand} @@ (!! #{attr.format_query})", values.join(' | '))
            else
              where('false')
            end
@@ -98,7 +54,7 @@ module TsVectors
            values = attr.normalize_values(values)
            if values.any?
              order(sanitize_sql_array([
-               "ts_rank(#{attribute}, to_tsquery('simple', ?)) #{direction}", values.join(' | ')]))
+               "ts_rank(#{attr.format_operand}, #{attr.format_query}) #{direction}", values.join(' | ')]))
            else
              order('false')
            end
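The scope hunks above all replace a hard-coded column reference with `attr.format_operand` and `attr.format_query`. As a rough illustration of what those two methods contribute to the SQL, here is a stand-alone sketch that builds the same condition string for both column formats. The helper names `tag_operand`, `tag_query` and `with_all_condition` are hypothetical and exist only for this example.

```ruby
# Hypothetical helper mirroring Attribute#format_operand: a text column is
# wrapped in to_tsvector(), a tsvector column is referenced directly.
def tag_operand(column, format, configuration = 'simple')
  if format == :text
    "to_tsvector('#{configuration}', #{column})"
  else
    column.to_s
  end
end

# Hypothetical helper mirroring Attribute#format_query: the "?" is the bind
# placeholder later filled with the joined tag values.
def tag_query(configuration = 'simple')
  "to_tsquery('#{configuration}', ?)"
end

# The condition string handed to `where` by the with_all/with_any scopes.
def with_all_condition(column, format)
  "#{tag_operand(column, format)} @@ #{tag_query}"
end

puts with_all_condition(:tags, :text)
# prints: to_tsvector('simple', tags) @@ to_tsquery('simple', ?)
puts with_all_condition(:tags, :tsvector)
# prints: tags @@ to_tsquery('simple', ?)
```

For the `:text` format the operand is the same expression used in the README's index definition, `to_tsvector('simple', tags)`, which is what allows PostgreSQL to consider that expression index for these queries.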
@@ -109,12 +65,8 @@ module TsVectors
          end
 
          define_method("#{attribute}=") do |values|
-           values = attr.normalize_values(values)
-           if values.any?
-             write_attribute(attribute, values.join(' '))
-           else
-             write_attribute(attribute, nil)
-           end
+           write_attribute(attribute, attr.serialize_values(
+             attr.normalize_values(values)))
          end
        end
 
@@ -1,3 +1,3 @@
  module TsVectors
-   VERSION = '0.0.1'
+   VERSION = '0.0.2'
  end
@@ -22,15 +22,15 @@ RSpec.configure do |c|
 
    c.before :each do
      ActiveRecord::Base.connection.execute %(
-       create table if not exists things (
+       create table if not exists models_with_tsvector_format (
          id serial primary key,
          tags tsvector
        )
      )
      ActiveRecord::Base.connection.execute %(
-       create table if not exists thangs (
+       create table if not exists models_with_text_format (
          id serial primary key,
-         tags tsvector
+         tags text
        )
      )
    end
@@ -1,98 +1,127 @@
  require 'spec_helper'
 
- class Thing < ActiveRecord::Base
+ class ModelUsingTsVectorFormat < ActiveRecord::Base
+   self.table_name = 'models_with_tsvector_format'
+   FORMAT = 'tsvector'
    ts_vector :tags
  end
 
- class Thang < ActiveRecord::Base
+ class ModelUsingTextFormat < ActiveRecord::Base
+   self.table_name = 'models_with_text_format'
+   FORMAT = 'text'
+   ts_vector :tags, :format => :text
+ end
+
+ class ModelWithNormalization < ActiveRecord::Base
+   self.table_name = 'models_with_tsvector_format'
    ts_vector :tags, :normalize => lambda { |v| v.downcase.gsub(/\d/, '') }
  end
 
  describe TsVectors::Model do
-   let :thing do
-     Thing.new
-   end
-
    context 'attribute accessor' do
+     let :model do
+       ModelUsingTsVectorFormat.new
+     end
+
      it "sets empty array when assigning nil" do
-       thing.tags = nil
-       thing.tags.should == []
+       model.tags = nil
+       model.tags.should == []
      end
 
      it "sets empty array when assigning empty array" do
-       thing.tags = []
-       thing.tags.should == []
+       model.tags = []
+       model.tags.should == []
      end
 
      it "sets empty array when assigning empty string" do
-       thing.tags = ""
-       thing.tags.should == []
+       model.tags = ""
+       model.tags.should == []
      end
 
      it "sets empty array when assigning string containing only spaces" do
-       thing.tags = " "
-       thing.tags.should == []
+       model.tags = " "
+       model.tags.should == []
      end
 
      it "sets empty array when assigning array of strings containing only spaces" do
-       thing.tags = [" "]
-       thing.tags.should == []
-     end
-   end
-
-   context 'with_all_XXX' do
-     it 'matches rows' do
-       thing.tags = %w(foo bar)
-       thing.save!
-       Thing.with_all_tags(%w(foo bar)).first.should == thing
-       Thing.with_all_tags(%w(foo bar baz)).first.should == nil
-       Thing.with_all_tags(%w(FOO BAR)).first.should == thing
-       Thing.with_all_tags(%w(FOO BAR BAZ)).first.should == nil
-       Thing.with_all_tags(%w(baz)).first.should == nil
+       model.tags = [" "]
+       model.tags.should == []
      end
    end
 
-   context 'with_any_XXX' do
-     it 'matches rows' do
-       thing.tags = %w(foo bar)
-       thing.save!
-       Thing.with_any_tags(%w(foo bar)).first.should == thing
-       Thing.with_any_tags(%w(foo bar baz)).first.should == thing
-       Thing.with_any_tags(%w(FOO BAR)).first.should == thing
-       Thing.with_any_tags(%w(FOO BAR BAZ)).first.should == thing
-       Thing.with_any_tags(%w(baz)).first.should == nil
-     end
-   end
-
-   context 'without_all_XXX' do
-     it 'does not match rows' do
-       thing1 = Thing.create!(:tags => %w(foo bar))
-       thing2 = Thing.create!(:tags => %w(foo baz))
-       thing3 = Thing.create!(:tags => %w(baz))
-
-       Thing.without_all_tags(%w(foo)).sort.should == [thing3]
-       Thing.without_all_tags(%w(foo bar)).sort.should == [thing2, thing3]
-       Thing.without_all_tags(%w(baz)).sort.should == [thing1]
-       Thing.without_all_tags(%w(foo baz)).sort.should == [thing1, thing3]
-     end
-   end
-
-   context 'without_any_XXX' do
-     it 'does not match rows' do
-       thing1 = Thing.create!(:tags => %w(foo bar))
-       thing2 = Thing.create!(:tags => %w(foo baz))
-       thing3 = Thing.create!(:tags => %w(baz))
-
-       Thing.without_any_tags(%w(foo)).sort.should == [thing3]
-       Thing.without_any_tags(%w(foo bar)).sort.should == [thing3]
-       Thing.without_any_tags(%w(baz)).sort.should == [thing1]
-       Thing.without_any_tags(%w(foo baz)).should == []
+   [ModelUsingTsVectorFormat, ModelUsingTextFormat].each do |klass|
+     describe "using #{klass.const_get('FORMAT')} column" do
+       context 'with_all_XXX' do
+         let :model do
+           klass.new
+         end
+
+         it 'matches rows' do
+           model.tags = %w(foo bar)
+           model.save!
+           klass.with_all_tags(%w(foo bar)).first.should == model
+           klass.with_all_tags(%w(foo bar baz)).first.should == nil
+           klass.with_all_tags(%w(FOO BAR)).first.should == model
+           klass.with_all_tags(%w(FOO BAR BAZ)).first.should == nil
+           klass.with_all_tags(%w(baz)).first.should == nil
+         end
+       end
+
+       context 'with_any_XXX' do
+         let :model do
+           klass.new
+         end
+
+         it 'matches rows' do
+           model.tags = %w(foo bar)
+           model.save!
+           klass.with_any_tags(%w(foo bar)).first.should == model
+           klass.with_any_tags(%w(foo bar baz)).first.should == model
+           klass.with_any_tags(%w(FOO BAR)).first.should == model
+           klass.with_any_tags(%w(FOO BAR BAZ)).first.should == model
+           klass.with_any_tags(%w(baz)).first.should == nil
+         end
+       end
+
+       context 'without_all_XXX' do
+         let :model do
+           klass.new
+         end
+
+         it 'does not match rows' do
+           thing1 = klass.create!(:tags => %w(foo bar))
+           thing2 = klass.create!(:tags => %w(foo baz))
+           thing3 = klass.create!(:tags => %w(baz))
+
+           klass.without_all_tags(%w(foo)).sort.should == [thing3]
+           klass.without_all_tags(%w(foo bar)).sort.should == [thing2, thing3]
+           klass.without_all_tags(%w(baz)).sort.should == [thing1]
+           klass.without_all_tags(%w(foo baz)).sort.should == [thing1, thing3]
+         end
+       end
+
+       context 'without_any_XXX' do
+         let :model do
+           klass.new
+         end
+
+         it 'does not match rows' do
+           thing1 = klass.create!(:tags => %w(foo bar))
+           thing2 = klass.create!(:tags => %w(foo baz))
+           thing3 = klass.create!(:tags => %w(baz))
+
+           klass.without_any_tags(%w(foo)).sort.should == [thing3]
+           klass.without_any_tags(%w(foo bar)).sort.should == [thing3]
+           klass.without_any_tags(%w(baz)).sort.should == [thing1]
+           klass.without_any_tags(%w(foo baz)).should == []
+         end
+       end
      end
    end
 
    context 'normalization' do
      let :thang do
-       Thang.new
+       ModelWithNormalization.new
      end
 
      it 'calls normalizing function on assignment' do
@@ -103,10 +132,10 @@ describe TsVectors::Model do
      it 'normalizes query' do
        thang.tags = %w(foo123 BAR123)
        thang.save!
-       Thang.with_all_tags(%w(foo123)).first.should == thang
-       Thang.with_all_tags(%w(BAR123)).first.should == thang
-       Thang.with_all_tags(%w(foo)).first.should == thang
-       Thang.with_all_tags(%w(bar)).first.should == thang
+       ModelWithNormalization.with_all_tags(%w(foo123)).first.should == thang
+       ModelWithNormalization.with_all_tags(%w(BAR123)).first.should == thang
+       ModelWithNormalization.with_all_tags(%w(foo)).first.should == thang
+       ModelWithNormalization.with_all_tags(%w(bar)).first.should == thang
      end
    end
 
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: ar-tsvectors
  version: !ruby/object:Gem::Version
-   version: 0.0.1
+   version: 0.0.2
  prerelease:
  platform: ruby
  authors:
@@ -90,6 +90,7 @@ files:
  - Rakefile
  - ar-tsvectors.gemspec
  - lib/activerecord_tsvectors.rb
+ - lib/ts_vectors/attribute.rb
  - lib/ts_vectors/model.rb
  - lib/ts_vectors/version.rb
  - spec/spec_helper.rb