ar-tsvectors 0.0.1

data/.gitignore ADDED
@@ -0,0 +1,5 @@
+ *.gem
+ .bundle
+ Gemfile.lock
+ pkg/*
+ /spec/database.yml
data/.rspec ADDED
@@ -0,0 +1 @@
+ --color
data/Gemfile ADDED
@@ -0,0 +1,2 @@
+ source :rubygems
+ gemspec
data/README.md ADDED
@@ -0,0 +1,86 @@
+ # ActiveRecord support for ts_vectors
+
+ This small library extends ActiveRecord to support PostgreSQL's `tsvector` type. PostgreSQL's `tsvector` stores a sorted list of distinct keywords (lexemes) that can be queried efficiently. The `tsvector` type is therefore particularly suitable for storing tags.
+
+ ## Requirements
+
+ * ActiveRecord
+ * PostgreSQL with `tsvector` support. (Full-text search has been built into PostgreSQL since 8.3.)
+
+ ## Usage
+
+ Add the gem to your `Gemfile`:
+
+     gem 'ar-tsvectors', :require => 'activerecord_tsvectors'
+
+ To use it, add a `tsvector` column to your table in a migration:
+
+     add_column :posts, :tags, :tsvector
+
+ Then extend your ActiveRecord model with a declaration:
+
+     class Post < ActiveRecord::Base
+       ts_vector :tags
+     end
+
+ Now you can assign tags:
+
+     post = Post.new
+     post.tags = ['pizza', 'pepperoni', 'Italian', 'mushrooms']
+     post.save
+
+ You can now query based on the tags:
+
+     Post.with_all_tags('pizza')
+     Post.with_all_tags(['pizza', 'pepperoni'])
+
+ Note that queries can only use an index if one exists on the column. Since the column already holds a `tsvector`, a plain GIN index on it supports the `@@` operator used by the generated scopes; again, in a migration:
+
+     execute("create index index_posts_on_tags on posts using gin(tags)")
+
+ ## Methods
+
+ Declaring `ts_vector :tags` in a model dynamically adds `tags` and `tags=` accessors along with the following scopes:
+
+ * `with_all_tags(t)` - returns a scope that searches for `t`, which may be either a single value or an array of values. Only rows matching _all_ values will be returned.
+ * `without_all_tags(t)` - returns a scope that excludes `t`, which may be either a single value or an array of values. Only rows matching _all_ values will be excluded.
+ * `with_any_tags(t)` - returns a scope that searches for `t`, which may be either a single value or an array of values. Rows matching at least one of the provided values will be returned.
+ * `without_any_tags(t)` - returns a scope that excludes `t`, which may be either a single value or an array of values. Rows matching at least one of the provided values will be excluded.
+ * `order_by_tags_rank(t, direction = 'DESC')` - returns a scope that orders the rows by rank, i.e. a score computed from the overlap between `t` and the row's value. See below for an example.
+
+ ## Ranking
+
+ It's trivial to rank results according to how many values match a given row:
+
+     @posts = Post.with_any_tags(["pizza", "pepperoni"]).
+       order_by_tags_rank(["pizza", "pepperoni"])
+
+ This orders the rows such that rows matching _both_ `pizza` and `pepperoni` will be ordered above rows matching _either_ `pizza` or `pepperoni` but not both.
+
+ ## Normalization
+
+ The default behaviour is to strip leading and trailing spaces from values and downcase them, both on assignment and in queries, so by default `PIZZA` and `pizza` are treated as the same tag. The underlying `tsvector` match is case-sensitive, however, so a custom normalization function should downcase values itself, like so:
+
+     class Post < ActiveRecord::Base
+       ts_vector :tags, :normalize => lambda { |v| v.strip.downcase }
+     end
+
+ You can also use this to strip unwanted characters:
+
+     class Post < ActiveRecord::Base
+       ts_vector :tags, :normalize => lambda { |v| v.downcase.gsub(/[^\w]/, '') }
+     end
+
+ The values are normalized both when performing queries and when assigning new values:
+
+     post.tags = [' WTF#$%^ &*??!']
+     post.tags
+     => ['wtf']
+
+ ## Limitations
+
+ Currently, the library always uses the built-in `simple` text search configuration, which only performs basic normalization (lowercasing) and does not perform stemming.
+
+ Due to a limitation in ActiveRecord, stored column values (on `INSERT` and `UPDATE`) are passed to PostgreSQL as strings and parsed verbatim by the `tsvector` input function, so they are *not* normalized using the text search configuration's rules, whereas query values passed through `to_tsquery('simple', ...)` are lowercased. This means that if you override the normalization function, you must make sure it always strips and downcases in addition to whatever other normalization you do; otherwise queries may miss rows whose stored values contain uppercase letters or stray whitespace.
+
+ A forthcoming version of ActiveRecord will provide the plumbing that will allow us to solve this issue.
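The normalization rules described in the README can be tried out without a database. The following is a minimal pure-Ruby sketch, not part of the gem: `normalize_tag`, `normalize_tags` and `parse_tsvector_text` are hypothetical names whose logic is assumed to mirror `Attribute#normalize`, `#normalize_values` and `#parse_values` in `lib/ts_vectors/model.rb`, with ActiveSupport's `blank?` replaced by a plain `strip.empty?` check so it runs on plain Ruby.

```ruby
# Hypothetical, DB-free sketch of the README's normalization rules.
# Assumed to mirror TsVectors::Model::Attribute; not the gem's own API.

# Default rule: strip surrounding whitespace and downcase. A custom
# normalizer (any callable) replaces the default rule entirely.
def normalize_tag(value, normalizer = nil)
  return nil if value.nil?
  value = normalizer ? normalizer.call(value) : value.strip.downcase
  # Stand-in for ActiveSupport's String#blank?.
  return nil if value.nil? || value.strip.empty?
  # Values that still contain whitespace are single-quoted so that
  # PostgreSQL reads them as one lexeme.
  value =~ /\s/ ? "'#{value}'" : value
end

# Accepts a single value or any Enumerable of values; drops blanks.
def normalize_tags(values, normalizer = nil)
  values = [values] unless values.is_a?(Enumerable)
  values.map { |v| normalize_tag(v, normalizer) }.compact
end

# Reads back PostgreSQL's text form of a tsvector, e.g. "'bar' 'foo'",
# or the space-separated string written on assignment, e.g. "foo bar".
def parse_tsvector_text(string)
  return [] if string.nil?
  string.scan(/(?:([^'\s,]+)|'([^']+)')\s*/).flatten.compact.reject(&:empty?)
end

strip_symbols = lambda { |v| v.downcase.gsub(/[^\w]/, '') }

p normalize_tags([' Pizza ', 'PEPPERONI', '   '])     # => ["pizza", "pepperoni"]
p normalize_tags('Italian food')                      # => ["'italian food'"]
p normalize_tags([' WTF#$%^ &*??!'], strip_symbols)   # => ["wtf"]
p parse_tsvector_text("'bar' 'foo'")                  # => ["bar", "foo"]
```

Because the default rule is applied to both assigned and queried values, `PIZZA` and `pizza` end up as the same lexeme. A normalizer that skips `downcase` breaks that symmetry, since stored strings are taken verbatim while `to_tsquery('simple', ...)` lowercases query terms.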
data/Rakefile ADDED
@@ -0,0 +1 @@
+ require "bundler/gem_tasks"
data/ar-tsvectors.gemspec ADDED
@@ -0,0 +1,26 @@
+ # -*- encoding: utf-8 -*-
+ $:.push File.expand_path("../lib", __FILE__)
+
+ require 'ts_vectors/version'
+
+ Gem::Specification.new do |s|
+   s.name = "ar-tsvectors"
+   s.version = TsVectors::VERSION
+   s.authors = ["Alexander Staubo"]
+   s.email = ["alex@bengler.no"]
+   s.homepage = ""
+   s.summary = %q{Support for PostgreSQL's ts_vector data type in ActiveRecord}
+   s.description = %q{Support for PostgreSQL's ts_vector data type in ActiveRecord. Perfect for indexing tags, arrays etc.}
+
+   s.rubyforge_project = "ar-tsvectors"
+
+   s.files = `git ls-files`.split("\n")
+   s.test_files = `git ls-files -- spec/*`.split("\n")
+   s.executables = `git ls-files -- bin/*`.split("\n").map{ |f| File.basename(f) }
+   s.require_paths = ["lib"]
+
+   s.add_dependency "activerecord", ">= 3.0"
+   s.add_development_dependency "rspec"
+   s.add_development_dependency "simplecov"
+   s.add_development_dependency "activerecord-postgresql-adapter"
+ end
data/lib/activerecord_tsvectors.rb ADDED
@@ -0,0 +1,7 @@
+ require 'active_record'
+
+ require 'ts_vectors/model'
+
+ class ActiveRecord::Base
+   include ::TsVectors::Model
+ end
data/lib/ts_vectors/model.rb ADDED
@@ -0,0 +1,124 @@
+ module TsVectors
+   module Model
+     extend ActiveSupport::Concern
+
+     included do
+     end
+
+     class Attribute
+       def initialize(name, options = {})
+         options.assert_valid_keys(:normalize)
+         @name, @options = name, options
+       end
+
+       def parse_values(string)
+         if string
+           values = string.scan(/(?:([^'\s,]+)|'([^']+)')\s*/u).flatten
+           values.reject! { |v| v.blank? }
+           values
+         else
+           []
+         end
+       end
+
+       def normalize_values(values)
+         values = [values] unless values.is_a?(Enumerable)
+         values = values.map { |v| normalize(v) }
+         values.compact!
+         values
+       end
+
+       def normalize(value)
+         if value
+           if (normalize = @options[:normalize])
+             value = normalize.call(value)
+           else
+             value = value.strip.downcase
+           end
+           if value.blank?
+             nil
+           elsif value =~ /\s/
+             %('#{value}')
+           else
+             value
+           end
+         end
+       end
+
+       attr_reader :name
+       attr_reader :options
+     end
+
+     module ClassMethods
+
+       def ts_vector(attribute, options = {})
+         attr = Attribute.new(attribute, options)
+
+         @ts_vectors ||= {}
+         @ts_vectors[attribute] = attr
+
+         scope "with_all_#{attribute}", lambda { |values|
+           values = attr.normalize_values(values)
+           if values.any?
+             where("#{attribute} @@ to_tsquery('simple', ?)", values.join(' & '))
+           else
+             where('false')
+           end
+         }
+
+         scope "with_any_#{attribute}", lambda { |values|
+           values = attr.normalize_values(values)
+           if values.any?
+             where("#{attribute} @@ to_tsquery('simple', ?)", values.join(' | '))
+           else
+             where('false')
+           end
+         }
+
+         scope "without_all_#{attribute}", lambda { |values|
+           values = attr.normalize_values(values)
+           if values.any?
+             where("#{attribute} @@ (!! to_tsquery('simple', ?))", values.join(' & '))
+           else
+             where('false')
+           end
+         }
+
+         scope "without_any_#{attribute}", lambda { |values|
+           values = attr.normalize_values(values)
+           if values.any?
+             where("#{attribute} @@ (!! to_tsquery('simple', ?))", values.join(' | '))
+           else
+             where('false')
+           end
+         }
+
+         scope "order_by_#{attribute}_rank", lambda { |values, direction = nil|
+           direction = %w(asc ascending).include?(direction.to_s.downcase) ? 'ASC' : 'DESC'
+           values = attr.normalize_values(values)
+           if values.any?
+             order(sanitize_sql_array([
+               "ts_rank(#{attribute}, to_tsquery('simple', ?)) #{direction}", values.join(' | ')]))
+           else
+             order('false')
+           end
+         }
+
+         define_method(attribute) do
+           attr.parse_values(read_attribute(attribute))
+         end
+
+         define_method("#{attribute}=") do |values|
+           values = attr.normalize_values(values)
+           if values.any?
+             write_attribute(attribute, values.join(' '))
+           else
+             write_attribute(attribute, nil)
+           end
+         end
+       end
+
+     end
+
+   end
+ end
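The four filtering scopes in `ClassMethods#ts_vector` differ only in the boolean operator used to join the normalized values (`&` or `|`) and in whether the whole `tsquery` is negated with PostgreSQL's `!!` operator; the rank scope always joins with `|`. The sketch below restates that mapping as plain data so the `where` arguments can be inspected without a database. `WHERE_TEMPLATES`, `where_args` and `rank_direction` are hypothetical names, and a `tsvector` column named `tags` is assumed. `rank_direction` shows one way to map the accepted spellings onto SQL keywords, since PostgreSQL's `ORDER BY` accepts `ASC` and `DESC` but not spelled-out `ascending` or `descending`.

```ruby
# Hypothetical sketch of the (SQL, bind value) pairs the generated scopes
# pass to `where`, assuming a tsvector column named `tags` and values that
# have already been normalized.
WHERE_TEMPLATES = {
  with_all:    ["tags @@ to_tsquery('simple', ?)", ' & '],
  with_any:    ["tags @@ to_tsquery('simple', ?)", ' | '],
  without_all: ["tags @@ (!! to_tsquery('simple', ?))", ' & '],
  without_any: ["tags @@ (!! to_tsquery('simple', ?))", ' | ']
}.freeze

def where_args(scope, values)
  sql, operator = WHERE_TEMPLATES.fetch(scope)
  # With no usable values the model falls back to where('false').
  return ['false'] if values.empty?
  [sql, values.join(operator)]
end

# Only ever emit a literal SQL keyword, whatever spelling the caller used;
# anything unrecognized (including nil) falls back to descending order.
def rank_direction(direction)
  %w(asc ascending).include?(direction.to_s.downcase) ? 'ASC' : 'DESC'
end

p where_args(:with_all, %w(pizza pepperoni))
# => ["tags @@ to_tsquery('simple', ?)", "pizza & pepperoni"]
p where_args(:without_any, %w(foo baz))
# => ["tags @@ (!! to_tsquery('simple', ?))", "foo | baz"]
p rank_direction('Ascending') # => "ASC"
```

In the real scopes the bind value is substituted through ActiveRecord's `?` placeholder handling, so it is quoted as an SQL string literal; only the operator and the optional `!!` prefix are fixed in the SQL text.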
data/lib/ts_vectors/version.rb ADDED
@@ -0,0 +1,3 @@
+ module TsVectors
+   VERSION = '0.0.1'
+ end
data/spec/spec_helper.rb ADDED
@@ -0,0 +1,44 @@
+ require 'simplecov' # Must be required first
+ SimpleCov.start
+ require 'activerecord_tsvectors'
+
+ require 'logger'
+
+ ActiveRecord::Base.logger = Logger.new(STDOUT)
+ ActiveRecord::Base.logger.level = Logger::INFO
+
+ begin
+   database_config = YAML.load_file(
+     File.expand_path("../database.yml", __FILE__))
+ rescue Errno::ENOENT
+   abort "You need to create an ActiveRecord database configuration in #{File.expand_path("../database.yml", __FILE__)}."
+ else
+   connection = ActiveRecord::Base.establish_connection(database_config).connection
+   connection.execute("set client_min_messages = warning")
+ end
+
+ RSpec.configure do |c|
+   c.mock_with :rspec
+
+   c.before :each do
+     ActiveRecord::Base.connection.execute %(
+       create table if not exists things (
+         id serial primary key,
+         tags tsvector
+       )
+     )
+     ActiveRecord::Base.connection.execute %(
+       create table if not exists thangs (
+         id serial primary key,
+         tags tsvector
+       )
+     )
+   end
+
+   c.around(:each) do |example|
+     ActiveRecord::Base.connection.transaction do
+       example.run
+       raise ActiveRecord::Rollback
+     end
+   end
+ end
data/spec/ts_vectors_spec.rb ADDED
@@ -0,0 +1,113 @@
+ require 'spec_helper'
+
+ class Thing < ActiveRecord::Base
+   ts_vector :tags
+ end
+
+ class Thang < ActiveRecord::Base
+   ts_vector :tags, :normalize => lambda { |v| v.downcase.gsub(/\d/, '') }
+ end
+
+ describe TsVectors::Model do
+   let :thing do
+     Thing.new
+   end
+
+   context 'attribute accessor' do
+     it "sets empty array when assigning nil" do
+       thing.tags = nil
+       thing.tags.should == []
+     end
+
+     it "sets empty array when assigning empty array" do
+       thing.tags = []
+       thing.tags.should == []
+     end
+
+     it "sets empty array when assigning empty string" do
+       thing.tags = ""
+       thing.tags.should == []
+     end
+
+     it "sets empty array when assigning string containing only spaces" do
+       thing.tags = " "
+       thing.tags.should == []
+     end
+
+     it "sets empty array when assigning array of strings containing only spaces" do
+       thing.tags = [" "]
+       thing.tags.should == []
+     end
+   end
+
+   context 'with_all_XXX' do
+     it 'matches rows' do
+       thing.tags = %w(foo bar)
+       thing.save!
+       Thing.with_all_tags(%w(foo bar)).first.should == thing
+       Thing.with_all_tags(%w(foo bar baz)).first.should == nil
+       Thing.with_all_tags(%w(FOO BAR)).first.should == thing
+       Thing.with_all_tags(%w(FOO BAR BAZ)).first.should == nil
+       Thing.with_all_tags(%w(baz)).first.should == nil
+     end
+   end
+
+   context 'with_any_XXX' do
+     it 'matches rows' do
+       thing.tags = %w(foo bar)
+       thing.save!
+       Thing.with_any_tags(%w(foo bar)).first.should == thing
+       Thing.with_any_tags(%w(foo bar baz)).first.should == thing
+       Thing.with_any_tags(%w(FOO BAR)).first.should == thing
+       Thing.with_any_tags(%w(FOO BAR BAZ)).first.should == thing
+       Thing.with_any_tags(%w(baz)).first.should == nil
+     end
+   end
+
+   context 'without_all_XXX' do
+     it 'does not match rows' do
+       thing1 = Thing.create!(:tags => %w(foo bar))
+       thing2 = Thing.create!(:tags => %w(foo baz))
+       thing3 = Thing.create!(:tags => %w(baz))
+
+       Thing.without_all_tags(%w(foo)).sort.should == [thing3]
+       Thing.without_all_tags(%w(foo bar)).sort.should == [thing2, thing3]
+       Thing.without_all_tags(%w(baz)).sort.should == [thing1]
+       Thing.without_all_tags(%w(foo baz)).sort.should == [thing1, thing3]
+     end
+   end
+
+   context 'without_any_XXX' do
+     it 'does not match rows' do
+       thing1 = Thing.create!(:tags => %w(foo bar))
+       thing2 = Thing.create!(:tags => %w(foo baz))
+       thing3 = Thing.create!(:tags => %w(baz))
+
+       Thing.without_any_tags(%w(foo)).sort.should == [thing3]
+       Thing.without_any_tags(%w(foo bar)).sort.should == [thing3]
+       Thing.without_any_tags(%w(baz)).sort.should == [thing1]
+       Thing.without_any_tags(%w(foo baz)).should == []
+     end
+   end
+
+   context 'normalization' do
+     let :thang do
+       Thang.new
+     end
+
+     it 'calls normalizing function on assignment' do
+       thang.tags = %w(foo123 BAR123)
+       thang.tags.should == %w(foo bar)
+     end
+
+     it 'normalizes query' do
+       thang.tags = %w(foo123 BAR123)
+       thang.save!
+       Thang.with_all_tags(%w(foo123)).first.should == thang
+       Thang.with_all_tags(%w(BAR123)).first.should == thang
+       Thang.with_all_tags(%w(foo)).first.should == thang
+       Thang.with_all_tags(%w(bar)).first.should == thang
+     end
+   end
+
+ end
metadata ADDED
@@ -0,0 +1,121 @@
+ --- !ruby/object:Gem::Specification
+ name: ar-tsvectors
+ version: !ruby/object:Gem::Version
+   version: 0.0.1
+   prerelease:
+ platform: ruby
+ authors:
+ - Alexander Staubo
+ autorequire:
+ bindir: bin
+ cert_chain: []
+ date: 2012-06-08 00:00:00.000000000 Z
+ dependencies:
+ - !ruby/object:Gem::Dependency
+   name: activerecord
+   requirement: !ruby/object:Gem::Requirement
+     none: false
+     requirements:
+     - - ! '>='
+       - !ruby/object:Gem::Version
+         version: '3.0'
+   type: :runtime
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     none: false
+     requirements:
+     - - ! '>='
+       - !ruby/object:Gem::Version
+         version: '3.0'
+ - !ruby/object:Gem::Dependency
+   name: rspec
+   requirement: !ruby/object:Gem::Requirement
+     none: false
+     requirements:
+     - - ! '>='
+       - !ruby/object:Gem::Version
+         version: '0'
+   type: :development
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     none: false
+     requirements:
+     - - ! '>='
+       - !ruby/object:Gem::Version
+         version: '0'
+ - !ruby/object:Gem::Dependency
+   name: simplecov
+   requirement: !ruby/object:Gem::Requirement
+     none: false
+     requirements:
+     - - ! '>='
+       - !ruby/object:Gem::Version
+         version: '0'
+   type: :development
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     none: false
+     requirements:
+     - - ! '>='
+       - !ruby/object:Gem::Version
+         version: '0'
+ - !ruby/object:Gem::Dependency
+   name: activerecord-postgresql-adapter
+   requirement: !ruby/object:Gem::Requirement
+     none: false
+     requirements:
+     - - ! '>='
+       - !ruby/object:Gem::Version
+         version: '0'
+   type: :development
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     none: false
+     requirements:
+     - - ! '>='
+       - !ruby/object:Gem::Version
+         version: '0'
+ description: Support for PostgreSQL's ts_vector data type in ActiveRecord. Perfect
+   for indexing tags, arrays etc.
+ email:
+ - alex@bengler.no
+ executables: []
+ extensions: []
+ extra_rdoc_files: []
+ files:
+ - .gitignore
+ - .rspec
+ - Gemfile
+ - README.md
+ - Rakefile
+ - ar-tsvectors.gemspec
+ - lib/activerecord_tsvectors.rb
+ - lib/ts_vectors/model.rb
+ - lib/ts_vectors/version.rb
+ - spec/spec_helper.rb
+ - spec/ts_vectors_spec.rb
+ homepage: ''
+ licenses: []
+ post_install_message:
+ rdoc_options: []
+ require_paths:
+ - lib
+ required_ruby_version: !ruby/object:Gem::Requirement
+   none: false
+   requirements:
+   - - ! '>='
+     - !ruby/object:Gem::Version
+       version: '0'
+ required_rubygems_version: !ruby/object:Gem::Requirement
+   none: false
+   requirements:
+   - - ! '>='
+     - !ruby/object:Gem::Version
+       version: '0'
+ requirements: []
+ rubyforge_project: ar-tsvectors
+ rubygems_version: 1.8.24
+ signing_key:
+ specification_version: 3
+ summary: Support for PostgreSQL's ts_vector data type in ActiveRecord
+ test_files: []