ar-tsvectors 0.0.1 → 0.0.2
- data/README.md +30 -8
- data/lib/activerecord_tsvectors.rb +1 -0
- data/lib/ts_vectors/attribute.rb +82 -0
- data/lib/ts_vectors/model.rb +7 -55
- data/lib/ts_vectors/version.rb +1 -1
- data/spec/spec_helper.rb +3 -3
- data/spec/ts_vectors_spec.rb +97 -68
- metadata +2 -1
data/README.md
CHANGED
@@ -1,6 +1,6 @@
|
|
1
1
|
# ActiveRecord support for ts_vectors
|
2
2
|
|
3
|
-
This small library extends ActiveRecord to support PostgreSQL's `tsvector` type. PostgreSQL's `tsvector` stores an array of keywords that can be efficiently queried. The `tsvector` type is therefore particularly suitable for storing tags.
|
3
|
+
This small library extends ActiveRecord to support PostgreSQL's `tsvector` type. PostgreSQL's `tsvector` stores an array of keywords that can be efficiently queried. The `tsvector` type is therefore particularly suitable for storing tags, since all tags can be stored in a single column instead of a separate table.
|
4
4
|
|
5
5
|
## Requirements
|
6
6
|
|
@@ -13,9 +13,9 @@ Add gem to your `Gemfile`:
|
|
13
13
|
|
14
14
|
gem 'ar-tsvectors', :require => 'activerecord_tsvectors'
|
15
15
|
|
16
|
-
To test, add a `
|
16
|
+
To test, add a `text` column to your table in a migration:
|
17
17
|
|
18
|
-
add_column :posts, :tags, :
|
18
|
+
add_column :posts, :tags, :text
|
19
19
|
|
20
20
|
Then extend your ActiveRecord model with a declaration:
|
21
21
|
|
@@ -34,10 +34,18 @@ You can now query based on the tags:
|
|
34
34
|
Post.with_all_tags('pizza')
|
35
35
|
Post.with_all_tags(['pizza', 'pepperoni'])
|
36
36
|
|
37
|
-
|
37
|
+
For queries to perform well, you must add an index on the column. This is slightly more complicated; again, in a migration:
|
38
38
|
|
39
39
|
execute("create index index_posts_on_tags on posts using gin(to_tsvector('simple', tags))")
|
40
40
|
|
41
|
+
## Options
|
42
|
+
|
43
|
+
The `ts_vector` mixin takes the following options:
|
44
|
+
|
45
|
+
* `:configuration` - the name of the PostgreSQL text configuration to use. Defaults to `simple`, which is appropriate for keyword-type indexing.
|
46
|
+
* `:format` - the format of the column, must be either `:text` (the default) or `:tsvector`.
|
47
|
+
* `:normalize` - a proc that overrides the default normalization. It must take a single string and return a normalized copy of the string.
|
48
|
+
|
41
49
|
## Methods
|
42
50
|
|
43
51
|
Declaring a `tsvector` column `tags` will dynamically add the following methods:
|
@@ -77,10 +85,24 @@ The values are normalized both when performing queries, and when assigning new v
|
|
77
85
|
post.tags
|
78
86
|
=> ['wtf']
|
79
87
|
|
80
|
-
##
|
88
|
+
## Using ts_vector directly
|
89
|
+
|
90
|
+
It's possible to use the `tsvector` type for the column directly. In other words, your migration would have something like:
|
91
|
+
|
92
|
+
add_column :posts, :tags, :tsvector
|
93
|
+
|
94
|
+
This is supported by supplying a format option to the `ts_vector` declaration in your model:
|
95
|
+
|
96
|
+
class Post < ActiveRecord::Base
|
97
|
+
ts_vector :tags, :format => :tsvector
|
98
|
+
end
|
99
|
+
|
100
|
+
When you do this, the way you define a table index is slightly different:
|
101
|
+
|
102
|
+
execute("create index index_posts_on_tags on posts using gin(tags)")
|
81
103
|
|
82
|
-
|
104
|
+
While using a `tsvector` column works, it has some drawbacks:
|
83
105
|
|
84
|
-
|
106
|
+
* The `tsvector` type is lossy. When you assign values to a `tsvector`, the text will be tokenized by PostgreSQL's text parser, which will remove all punctuation and quotes from the text. In this system, a quoted tag such as `"New York"` will be split into two different tags `New` and `York`, and it's no longer possible to search for `New York` as a distinct tag.
|
85
107
|
|
86
|
-
A forthcoming version of ActiveRecord will provide the plumbing that will allow us to solve this issue.
|
108
|
+
* Due to a limitation in ActiveRecord, stored column values (on `INSERT` and `UPDATE`) are passed to PostgreSQL as strings, and are therefore *not* normalized using the text configuration's rules. This means that if you override the normalization function, you must make sure you always strip and downcase in addition to whatever other normalization you do, otherwise queries will potentially *not* match all rows. A forthcoming version of ActiveRecord will provide the plumbing that will allow us to solve this issue.
|
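The second drawback can be illustrated with a small sketch. The name `normalize_tag` and the digit-stripping rule are illustrative assumptions (the rule mirrors the one used in the gem's spec). The point is only that a custom normalizer should probably keep the default `strip` and `downcase` steps and add its own rules on top.

```ruby
# Hypothetical custom normalizer: keeps the default strip + downcase
# behaviour, then applies an extra, application-specific rule
# (here: dropping digits).
normalize_tag = lambda { |value| value.strip.downcase.gsub(/\d/, '') }

# It would be passed to the declaration roughly like this (not executed here,
# since it needs ActiveRecord and a database):
#
#   class Post < ActiveRecord::Base
#     ts_vector :tags, :format => :tsvector, :normalize => normalize_tag
#   end

example = normalize_tag.call("  Foo123 ")
```

Because the same proc is applied to assigned values and to query values, stored tags and search terms should end up in the same form.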
@@ -0,0 +1,82 @@
|
|
1
|
+
module TsVectors
|
2
|
+
|
3
|
+
class Attribute
|
4
|
+
def initialize(name, options = {})
|
5
|
+
options.assert_valid_keys(:normalize, :configuration, :format)
|
6
|
+
@name, @options = name, options
|
7
|
+
if (config = @options[:configuration])
|
8
|
+
@configuration = config
|
9
|
+
else
|
10
|
+
@configuration = 'simple'
|
11
|
+
end
|
12
|
+
if (format = @options[:format])
|
13
|
+
@format = format.to_sym
|
14
|
+
else
|
15
|
+
@format = :text
|
16
|
+
end
|
17
|
+
end
|
18
|
+
|
19
|
+
def format_operand
|
20
|
+
@operand ||= begin
|
21
|
+
config = @configuration
|
22
|
+
if @format == :text
|
23
|
+
"to_tsvector('#{config}', #{@name})"
|
24
|
+
else
|
25
|
+
@name
|
26
|
+
end
|
27
|
+
end
|
28
|
+
end
|
29
|
+
|
30
|
+
def format_query
|
31
|
+
@query ||= "to_tsquery('#{@configuration}', ?)"
|
32
|
+
end
|
33
|
+
|
34
|
+
def serialize_values(values)
|
35
|
+
if values and values.length > 0
|
36
|
+
values.join(' ')
|
37
|
+
else
|
38
|
+
nil
|
39
|
+
end
|
40
|
+
end
|
41
|
+
|
42
|
+
def parse_values(string)
|
43
|
+
if string
|
44
|
+
values = string.scan(/(?:([^'\s,]+(?::\d+)?)|'([^']+)(?::\d+)?')\s*/u).flatten
|
45
|
+
values.reject! { |v| v.blank? }
|
46
|
+
values
|
47
|
+
else
|
48
|
+
[]
|
49
|
+
end
|
50
|
+
end
|
51
|
+
|
52
|
+
def normalize_values(values)
|
53
|
+
values = [values] unless values.is_a?(Enumerable)
|
54
|
+
values = values.map { |v| normalize(v) }
|
55
|
+
values.compact!
|
56
|
+
values
|
57
|
+
end
|
58
|
+
|
59
|
+
def normalize(value)
|
60
|
+
if value
|
61
|
+
if (normalize = @options[:normalize])
|
62
|
+
value = normalize.call(value)
|
63
|
+
else
|
64
|
+
value = value.strip.downcase
|
65
|
+
end
|
66
|
+
if value.blank?
|
67
|
+
nil
|
68
|
+
elsif value =~ /\s/
|
69
|
+
%('#{value}')
|
70
|
+
else
|
71
|
+
value
|
72
|
+
end
|
73
|
+
end
|
74
|
+
end
|
75
|
+
|
76
|
+
attr_reader :name
|
77
|
+
attr_reader :options
|
78
|
+
attr_reader :format
|
79
|
+
attr_reader :configuration
|
80
|
+
end
|
81
|
+
|
82
|
+
end
|
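To make the round trip in `attribute.rb` easier to follow, here is a standalone sketch of normalize, serialize and parse in plain Ruby. The method names mirror `TsVectors::Attribute`, but this is an approximation rather than the gem's code: it assumes the default normalization, avoids ActiveSupport's `blank?`, and uses an input without position suffixes.

```ruby
# Default normalization: strip, downcase, drop empty values, and wrap
# values containing whitespace in single quotes.
def normalize(value)
  return nil if value.nil?
  value = value.strip.downcase
  return nil if value.empty?
  value =~ /\s/ ? "'#{value}'" : value
end

# Values are stored as a single space-separated string (nil when empty).
def serialize_values(values)
  values.empty? ? nil : values.join(' ')
end

# Reads the stored string back, treating quoted runs as one value.
def parse_values(string)
  return [] if string.nil?
  pattern = /(?:([^'\s,]+(?::\d+)?)|'([^']+)(?::\d+)?')\s*/u
  string.scan(pattern).flatten.compact.reject { |v| v.strip.empty? }
end

normalized = ['New York', ' Pizza ', '   '].map { |v| normalize(v) }.compact
stored = serialize_values(normalized)
tags = parse_values(stored)
```

With a `:text` column, the quoted form is what keeps a multi-word tag together when the string is read back.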
data/lib/ts_vectors/model.rb
CHANGED
@@ -5,50 +5,6 @@ module TsVectors
|
|
5
5
|
included do
|
6
6
|
end
|
7
7
|
|
8
|
-
class Attribute
|
9
|
-
def initialize(name, options = {})
|
10
|
-
options.assert_valid_keys(:normalize)
|
11
|
-
@name, @options = name, options
|
12
|
-
end
|
13
|
-
|
14
|
-
def parse_values(string)
|
15
|
-
if string
|
16
|
-
values = string.scan(/(?:([^'\s,]+)|'([^']+)')\s*/u).flatten
|
17
|
-
values.reject! { |v| v.blank? }
|
18
|
-
values
|
19
|
-
else
|
20
|
-
[]
|
21
|
-
end
|
22
|
-
end
|
23
|
-
|
24
|
-
def normalize_values(values)
|
25
|
-
values = [values] unless values.is_a?(Enumerable)
|
26
|
-
values = values.map { |v| normalize(v) }
|
27
|
-
values.compact!
|
28
|
-
values
|
29
|
-
end
|
30
|
-
|
31
|
-
def normalize(value)
|
32
|
-
if value
|
33
|
-
if (normalize = @options[:normalize])
|
34
|
-
value = normalize.call(value)
|
35
|
-
else
|
36
|
-
value = value.strip.downcase
|
37
|
-
end
|
38
|
-
if value.blank?
|
39
|
-
nil
|
40
|
-
elsif value =~ /\s/
|
41
|
-
%('#{value}')
|
42
|
-
else
|
43
|
-
value
|
44
|
-
end
|
45
|
-
end
|
46
|
-
end
|
47
|
-
|
48
|
-
attr_reader :name
|
49
|
-
attr_reader :options
|
50
|
-
end
|
51
|
-
|
52
8
|
module ClassMethods
|
53
9
|
|
54
10
|
def ts_vector(attribute, options = {})
|
@@ -60,7 +16,7 @@ module TsVectors
|
|
60
16
|
scope "with_all_#{attribute}", lambda { |values|
|
61
17
|
values = attr.normalize_values(values)
|
62
18
|
if values.any?
|
63
|
-
where("#{
|
19
|
+
where("#{attr.format_operand} @@ #{attr.format_query}", values.join(' & '))
|
64
20
|
else
|
65
21
|
where('false')
|
66
22
|
end
|
@@ -69,7 +25,7 @@ module TsVectors
|
|
69
25
|
scope "with_any_#{attribute}", lambda { |values|
|
70
26
|
values = attr.normalize_values(values)
|
71
27
|
if values.any?
|
72
|
-
where("#{
|
28
|
+
where("#{attr.format_operand} @@ #{attr.format_query}", values.join(' | '))
|
73
29
|
else
|
74
30
|
where('false')
|
75
31
|
end
|
@@ -78,7 +34,7 @@ module TsVectors
|
|
78
34
|
scope "without_all_#{attribute}", lambda { |values|
|
79
35
|
values = attr.normalize_values(values)
|
80
36
|
if values.any?
|
81
|
-
where("#{
|
37
|
+
where("#{attr.format_operand} @@ (!! #{attr.format_query})", values.join(' & '))
|
82
38
|
else
|
83
39
|
where('false')
|
84
40
|
end
|
@@ -87,7 +43,7 @@ module TsVectors
|
|
87
43
|
scope "without_any_#{attribute}", lambda { |values|
|
88
44
|
values = attr.normalize_values(values)
|
89
45
|
if values.any?
|
90
|
-
where("#{
|
46
|
+
where("#{attr.format_operand} @@ (!! #{attr.format_query})", values.join(' | '))
|
91
47
|
else
|
92
48
|
where('false')
|
93
49
|
end
|
@@ -98,7 +54,7 @@ module TsVectors
|
|
98
54
|
values = attr.normalize_values(values)
|
99
55
|
if values.any?
|
100
56
|
order(sanitize_sql_array([
|
101
|
-
"ts_rank(#{
|
57
|
+
"ts_rank(#{attr.format_operand}, #{attr.format_query}) #{direction}", values.join(' | ')]))
|
102
58
|
else
|
103
59
|
order('false')
|
104
60
|
end
|
@@ -109,12 +65,8 @@ module TsVectors
|
|
109
65
|
end
|
110
66
|
|
111
67
|
define_method("#{attribute}=") do |values|
|
112
|
-
|
113
|
-
|
114
|
-
write_attribute(attribute, values.join(' '))
|
115
|
-
else
|
116
|
-
write_attribute(attribute, nil)
|
117
|
-
end
|
68
|
+
write_attribute(attribute, attr.serialize_values(
|
69
|
+
attr.normalize_values(values)))
|
118
70
|
end
|
119
71
|
end
|
120
72
|
|
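The scopes above all follow the same pattern: an operand built from the column, the `@@` match operator, and a `to_tsquery` call with the values joined by `&` or `|`. The following sketch reproduces that string building outside ActiveRecord. The helper name `tag_condition` and its keyword arguments are hypothetical and exist only for illustration.

```ruby
# Builds the [sql_fragment, bind_value] pair that a scope would hand to `where`.
# format:        :text wraps the column in to_tsvector, :tsvector uses it as-is
# any:           join values with ' | ' instead of ' & '
# negate:        wrap the query in (!! ...) as the without_* scopes do
def tag_condition(column, values, format: :text, configuration: 'simple',
                  any: false, negate: false)
  operand = format == :text ? "to_tsvector('#{configuration}', #{column})" : column.to_s
  query = "to_tsquery('#{configuration}', ?)"
  query = "(!! #{query})" if negate
  ["#{operand} @@ #{query}", values.join(any ? ' | ' : ' & ')]
end

# Roughly what with_all_tags would produce for a :text column.
all_sql, all_bind = tag_condition(:tags, %w(pizza pepperoni))

# Roughly what without_any_tags would produce for a :tsvector column.
none_sql, none_bind = tag_condition(:tags, %w(pizza pepperoni),
                                    format: :tsvector, any: true, negate: true)
```

Keeping the operand and query formatting in one place, as `format_operand` and `format_query` do, means each scope only has to choose the join operator and whether to negate.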
data/lib/ts_vectors/version.rb
CHANGED
data/spec/spec_helper.rb
CHANGED
@@ -22,15 +22,15 @@ RSpec.configure do |c|
|
|
22
22
|
|
23
23
|
c.before :each do
|
24
24
|
ActiveRecord::Base.connection.execute %(
|
25
|
-
create table if not exists
|
25
|
+
create table if not exists models_with_tsvector_format (
|
26
26
|
id serial primary key,
|
27
27
|
tags tsvector
|
28
28
|
)
|
29
29
|
)
|
30
30
|
ActiveRecord::Base.connection.execute %(
|
31
|
-
create table if not exists
|
31
|
+
create table if not exists models_with_text_format (
|
32
32
|
id serial primary key,
|
33
|
-
tags
|
33
|
+
tags text
|
34
34
|
)
|
35
35
|
)
|
36
36
|
end
|
data/spec/ts_vectors_spec.rb
CHANGED
@@ -1,98 +1,127 @@
|
|
1
1
|
require 'spec_helper'
|
2
2
|
|
3
|
-
class
|
3
|
+
class ModelUsingTsVectorFormat < ActiveRecord::Base
|
4
|
+
self.table_name = 'models_with_tsvector_format'
|
5
|
+
FORMAT = 'tsvector'
|
4
6
|
ts_vector :tags
|
5
7
|
end
|
6
8
|
|
7
|
-
class
|
9
|
+
class ModelUsingTextFormat < ActiveRecord::Base
|
10
|
+
self.table_name = 'models_with_text_format'
|
11
|
+
FORMAT = 'text'
|
12
|
+
ts_vector :tags, :format => :text
|
13
|
+
end
|
14
|
+
|
15
|
+
class ModelWithNormalization < ActiveRecord::Base
|
16
|
+
self.table_name = 'models_with_tsvector_format'
|
8
17
|
ts_vector :tags, :normalize => lambda { |v| v.downcase.gsub(/\d/, '') }
|
9
18
|
end
|
10
19
|
|
11
20
|
describe TsVectors::Model do
|
12
|
-
let :thing do
|
13
|
-
Thing.new
|
14
|
-
end
|
15
|
-
|
16
21
|
context 'attribute accessor' do
|
22
|
+
let :model do
|
23
|
+
ModelUsingTsVectorFormat.new
|
24
|
+
end
|
25
|
+
|
17
26
|
it "sets empty array when assigning nil" do
|
18
|
-
|
19
|
-
|
27
|
+
model.tags = nil
|
28
|
+
model.tags.should == []
|
20
29
|
end
|
21
30
|
|
22
31
|
it "sets empty array when assigning empty array" do
|
23
|
-
|
24
|
-
|
32
|
+
model.tags = []
|
33
|
+
model.tags.should == []
|
25
34
|
end
|
26
35
|
|
27
36
|
it "sets empty array when assigning empty string" do
|
28
|
-
|
29
|
-
|
37
|
+
model.tags = ""
|
38
|
+
model.tags.should == []
|
30
39
|
end
|
31
40
|
|
32
41
|
it "sets empty array when assigning string containing only spaces" do
|
33
|
-
|
34
|
-
|
42
|
+
model.tags = " "
|
43
|
+
model.tags.should == []
|
35
44
|
end
|
36
45
|
|
37
46
|
it "sets empty array when assigning array of strings containing only spaces" do
|
38
|
-
|
39
|
-
|
40
|
-
end
|
41
|
-
end
|
42
|
-
|
43
|
-
context 'with_all_XXX' do
|
44
|
-
it 'matches rows' do
|
45
|
-
thing.tags = %w(foo bar)
|
46
|
-
thing.save!
|
47
|
-
Thing.with_all_tags(%w(foo bar)).first.should == thing
|
48
|
-
Thing.with_all_tags(%w(foo bar baz)).first.should == nil
|
49
|
-
Thing.with_all_tags(%w(FOO BAR)).first.should == thing
|
50
|
-
Thing.with_all_tags(%w(FOO BAR BAZ)).first.should == nil
|
51
|
-
Thing.with_all_tags(%w(baz)).first.should == nil
|
47
|
+
model.tags = [" "]
|
48
|
+
model.tags.should == []
|
52
49
|
end
|
53
50
|
end
|
54
51
|
|
52
|
+
[ModelUsingTsVectorFormat, ModelUsingTextFormat].each do |klass|
|
53
|
+
describe "using #{klass.const_get('FORMAT')} column" do
|
54
|
+
context 'with_all_XXX' do
|
55
|
+
let :model do
|
56
|
+
klass.new
|
57
|
+
end
|
58
|
+
|
59
|
+
it 'matches rows' do
|
60
|
+
model.tags = %w(foo bar)
|
61
|
+
model.save!
|
62
|
+
klass.with_all_tags(%w(foo bar)).first.should == model
|
63
|
+
klass.with_all_tags(%w(foo bar baz)).first.should == nil
|
64
|
+
klass.with_all_tags(%w(FOO BAR)).first.should == model
|
65
|
+
klass.with_all_tags(%w(FOO BAR BAZ)).first.should == nil
|
66
|
+
klass.with_all_tags(%w(baz)).first.should == nil
|
67
|
+
end
|
68
|
+
end
|
69
|
+
|
70
|
+
context 'with_any_XXX' do
|
71
|
+
let :model do
|
72
|
+
klass.new
|
73
|
+
end
|
74
|
+
|
75
|
+
it 'matches rows' do
|
76
|
+
model.tags = %w(foo bar)
|
77
|
+
model.save!
|
78
|
+
klass.with_any_tags(%w(foo bar)).first.should == model
|
79
|
+
klass.with_any_tags(%w(foo bar baz)).first.should == model
|
80
|
+
klass.with_any_tags(%w(FOO BAR)).first.should == model
|
81
|
+
klass.with_any_tags(%w(FOO BAR BAZ)).first.should == model
|
82
|
+
klass.with_any_tags(%w(baz)).first.should == nil
|
83
|
+
end
|
84
|
+
end
|
85
|
+
|
86
|
+
context 'without_all_XXX' do
|
87
|
+
let :model do
|
88
|
+
klass.new
|
89
|
+
end
|
90
|
+
|
91
|
+
it 'does not match rows' do
|
92
|
+
thing1 = klass.create!(:tags => %w(foo bar))
|
93
|
+
thing2 = klass.create!(:tags => %w(foo baz))
|
94
|
+
thing3 = klass.create!(:tags => %w(baz))
|
95
|
+
|
96
|
+
klass.without_all_tags(%w(foo)).sort.should == [thing3]
|
97
|
+
klass.without_all_tags(%w(foo bar)).sort.should == [thing2, thing3]
|
98
|
+
klass.without_all_tags(%w(baz)).sort.should == [thing1]
|
99
|
+
klass.without_all_tags(%w(foo baz)).sort.should == [thing1, thing3]
|
100
|
+
end
|
101
|
+
end
|
102
|
+
|
103
|
+
context 'without_any_XXX' do
|
104
|
+
let :model do
|
105
|
+
klass.new
|
106
|
+
end
|
107
|
+
|
108
|
+
it 'does not match rows' do
|
109
|
+
thing1 = klass.create!(:tags => %w(foo bar))
|
110
|
+
thing2 = klass.create!(:tags => %w(foo baz))
|
111
|
+
thing3 = klass.create!(:tags => %w(baz))
|
112
|
+
|
113
|
+
klass.without_any_tags(%w(foo)).sort.should == [thing3]
|
114
|
+
klass.without_any_tags(%w(foo bar)).sort.should == [thing3]
|
115
|
+
klass.without_any_tags(%w(baz)).sort.should == [thing1]
|
116
|
+
klass.without_any_tags(%w(foo baz)).should == []
|
117
|
+
end
|
118
|
+
end
|
90
119
|
end
|
91
120
|
end
|
92
121
|
|
93
122
|
context 'normalization' do
|
94
123
|
let :thang do
|
95
|
-
|
124
|
+
ModelWithNormalization.new
|
96
125
|
end
|
97
126
|
|
98
127
|
it 'calls normalizing function on assignment' do
|
@@ -103,10 +132,10 @@ describe TsVectors::Model do
|
|
103
132
|
it 'normalizes query' do
|
104
133
|
thang.tags = %w(foo123 BAR123)
|
105
134
|
thang.save!
|
106
|
-
|
107
|
-
|
108
|
-
|
109
|
-
|
135
|
+
ModelWithNormalization.with_all_tags(%w(foo123)).first.should == thang
|
136
|
+
ModelWithNormalization.with_all_tags(%w(BAR123)).first.should == thang
|
137
|
+
ModelWithNormalization.with_all_tags(%w(foo)).first.should == thang
|
138
|
+
ModelWithNormalization.with_all_tags(%w(bar)).first.should == thang
|
110
139
|
end
|
111
140
|
end
|
112
141
|
|
metadata
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: ar-tsvectors
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 0.0.
|
4
|
+
version: 0.0.2
|
5
5
|
prerelease:
|
6
6
|
platform: ruby
|
7
7
|
authors:
|
@@ -90,6 +90,7 @@ files:
|
|
90
90
|
- Rakefile
|
91
91
|
- ar-tsvectors.gemspec
|
92
92
|
- lib/activerecord_tsvectors.rb
|
93
|
+
- lib/ts_vectors/attribute.rb
|
93
94
|
- lib/ts_vectors/model.rb
|
94
95
|
- lib/ts_vectors/version.rb
|
95
96
|
- spec/spec_helper.rb
|