dusen 0.2.2 → 0.3.0

Files changed (59)
  1. data/.gitignore +1 -0
  2. data/.travis.yml +3 -0
  3. data/README.md +176 -47
  4. data/documents/fulltext_vs_like_benchmark/all_records_and_scope/fulltext.csv +22 -0
  5. data/documents/fulltext_vs_like_benchmark/all_records_and_scope/fulltext_vs_like.xls +0 -0
  6. data/documents/fulltext_vs_like_benchmark/all_records_and_scope/like.csv +22 -0
  7. data/documents/fulltext_vs_like_benchmark/benchmark.rb +70 -0
  8. data/documents/fulltext_vs_like_benchmark/exact_number_of_records/fulltext.csv +22 -0
  9. data/documents/fulltext_vs_like_benchmark/exact_number_of_records/fulltext_vs_like.png +0 -0
  10. data/documents/fulltext_vs_like_benchmark/exact_number_of_records/fulltext_vs_like.xls +0 -0
  11. data/documents/fulltext_vs_like_benchmark/exact_number_of_records/like.csv +21 -0
  12. data/dusen.gemspec +1 -1
  13. data/lib/dusen/active_record/base_ext.rb +104 -0
  14. data/lib/dusen/active_record/search_text.rb +50 -0
  15. data/lib/dusen/description.rb +5 -4
  16. data/lib/dusen/query.rb +24 -4
  17. data/lib/dusen/railtie.rb +9 -0
  18. data/lib/dusen/syntax.rb +5 -0
  19. data/lib/dusen/tasks.rb +31 -0
  20. data/lib/dusen/token.rb +4 -0
  21. data/lib/dusen/util.rb +86 -1
  22. data/lib/dusen/version.rb +1 -1
  23. data/lib/dusen.rb +7 -1
  24. data/spec/rails-2.3/Gemfile +3 -1
  25. data/spec/rails-2.3/Rakefile +1 -1
  26. data/spec/rails-2.3/app_root/config/database.yml +4 -19
  27. data/spec/rails-2.3/app_root/config/environments/{in_memory.rb → test.rb} +0 -0
  28. data/spec/rails-2.3/spec/spec_helper.rb +7 -9
  29. data/spec/rails-3.0/Gemfile +3 -1
  30. data/spec/rails-3.0/Rakefile +1 -1
  31. data/spec/rails-3.0/app_root/config/database.yml +5 -3
  32. data/spec/rails-3.0/spec/spec_helper.rb +6 -7
  33. data/spec/rails-3.2/Gemfile +2 -1
  34. data/spec/rails-3.2/Rakefile +1 -1
  35. data/spec/rails-3.2/app_root/config/database.yml +5 -3
  36. data/spec/rails-3.2/spec/spec_helper.rb +3 -7
  37. data/spec/shared/app_root/app/models/recipe/category.rb +13 -0
  38. data/spec/shared/app_root/app/models/recipe/ingredient.rb +13 -0
  39. data/spec/shared/app_root/app/models/recipe.rb +14 -0
  40. data/spec/shared/app_root/app/models/user/with_fulltext.rb +35 -0
  41. data/spec/shared/app_root/app/models/user/without_fulltext.rb +34 -0
  42. data/spec/shared/app_root/config/database.sample.yml +6 -0
  43. data/spec/shared/app_root/db/migrate/001_create_search_text.rb +19 -0
  44. data/spec/shared/app_root/db/migrate/002_create_user_variants.rb +25 -0
  45. data/spec/shared/app_root/db/migrate/003_create_recipe_models.rb +23 -0
  46. data/spec/shared/spec/dusen/active_record/base_ext_spec.rb +138 -0
  47. data/spec/shared/spec/dusen/active_record/search_text_spec.rb +23 -0
  48. data/spec/shared/spec/dusen/parser_spec.rb +14 -0
  49. data/spec/shared/spec/dusen/query_spec.rb +20 -0
  50. data/spec/shared/spec/dusen/util_spec.rb +21 -0
  51. metadata +80 -46
  52. data/lib/dusen/active_record_ext.rb +0 -35
  53. data/spec/rails-2.3/app_root/config/environments/mysql.rb +0 -0
  54. data/spec/rails-2.3/app_root/config/environments/postgresql.rb +0 -0
  55. data/spec/rails-2.3/app_root/config/environments/sqlite.rb +0 -0
  56. data/spec/rails-2.3/app_root/config/environments/sqlite3.rb +0 -0
  57. data/spec/shared/app_root/app/models/user.rb +0 -22
  58. data/spec/shared/app_root/db/migrate/001_create_users.rb +0 -17
  59. data/spec/shared/dusen/active_record_spec.rb +0 -55
data/.gitignore CHANGED
@@ -5,5 +5,6 @@ pkg
5
5
  app_root/log/*
6
6
  Gemfile.lock
7
7
  spec/shared/app_root/db/*.db
8
+ spec/shared/app_root/config/database.yml
8
9
 
9
10
 
data/.travis.yml CHANGED
@@ -7,3 +7,6 @@ script: rake all:bundle all:spec
7
7
  notifications:
8
8
  email:
9
9
  - fail@makandra.de
10
+ branches:
11
+ only:
12
+ - master
data/README.md CHANGED
@@ -1,20 +1,15 @@
1
- Dusen - Maps Google-like queries to ActiveRecord scopes
2
- =======================================================
1
+ Dusen [![Build Status](https://secure.travis-ci.org/makandra/dusen.png?branch=master)](https://travis-ci.org/makandra/dusen)
2
+ ======
3
3
 
4
- [![Build Status](https://secure.travis-ci.org/makandra/dusen.png?branch=master)](https://travis-ci.org/makandra/dusen)
4
+ Comprehensive search solution for ActiveRecord and MySQL
5
+ --------------------------------------------------------
5
6
 
6
- Dusen gives your ActiveRecord models a DSL to process Google-like queries like:
7
+ Dusen lets you search ActiveRecord models when all you have is MySQL (no Solr, Sphinx, etc.). Here's what Dusen does for you:
7
8
 
8
- some words
9
- "a phrase of words"
10
- filetype:pdf
11
- a mix of words "and phrases" and qualified:fields
12
-
13
- Dusen tokenizes these queries for you and feeds them through simple mappers that
14
- convert a token to an ActiveRecord scope chain.
15
- This process is packaged in a class method `.search`:
16
-
17
- Contact.search('makandra software "Ruby on Rails" city:augsburg')
9
+ 1. It takes a text query in Google-like syntax (e.g. `some words "a phrase" filetype:pdf`)
10
+ 2. It parses the query into individual tokens.
11
+ 3. It lets you define simple mappers that convert a token to an ActiveRecord scope chain. Mappers can match tokens using ActiveRecord's `where` or perform full text searches with either [LIKE queries](#processing-full-text-search-queries-with-like-queries) or [FULLTEXT indexes](#processing-full-text-queries-with-fulltext-indexes) (see [performance analysis](https://makandracards.com/makandra/12813-performance-analysis-of-mysql-s-fulltext-indexes-and-like-queries-for-full-text-search)).
12
+ 4. It gives your model a method `Model.search('some query')` that performs all of the above and returns an ActiveRecord scope chain.
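To make steps 1 and 2 above concrete, here is a minimal tokenizer sketch in plain Ruby. It is not Dusen's actual parser: the `tokenize` helper, its regular expression, and the choice of `'text'` as the field for unqualified tokens are assumptions made for illustration only.

```ruby
# Illustrative only: a tiny tokenizer for Google-like queries.
# This is NOT Dusen's parser; the names and the pattern are assumptions.
def tokenize(query)
  # Optional "field:" qualifier, followed by either a "quoted phrase" or a bare word.
  pattern = /(?:(\w+):)?(?:"([^"]*)"|(\S+))/
  query.scan(pattern).map do |field, phrase, word|
    # Unqualified words and phrases are filed under the generic 'text' field.
    { :field => field || 'text', :value => phrase || word }
  end
end

tokenize('some words "a phrase" filetype:pdf').each do |token|
  puts "#{token[:field]}: #{token[:value]}"
end
```

Each resulting token could then be handed to the mapper registered for its field, which is the role `search_by` plays in the sections that follow.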
18
13
 
19
14
 
20
15
  Installation
@@ -28,25 +23,25 @@ Now run `bundle install` and restart your server.
28
23
 
29
24
 
30
25
 
31
- Processing text queries
32
- -----------------------
26
+ Processing full text search queries with LIKE queries
27
+ -----------------------------------------------------
33
28
 
34
29
  This describes how to define a search syntax that processes queries
35
- of words and phrases:
30
+ of words and phrases, e.g. `coworking fooville "market ave"`.
31
+
32
+ Under the hood the search will be performed using [LIKE queries](http://dev.mysql.com/doc/refman/5.0/en/string-comparison-functions.html#operator_like), which are [fast enough](https://makandracards.com/makandra/12813-performance-analysis-of-mysql-s-fulltext-indexes-and-like-queries-for-full-text-search) for medium-sized data sets. Once your data outgrows LIKE queries, Dusen lets you [migrate to FULLTEXT indexes](#processing-full-text-queries-with-fulltext-indexes), which perform better but come at the cost of some added complexity.
36
33
 
37
- coworking fooville "market ave"
38
34
 
35
+ ### Setup and usage
39
36
 
40
37
  Our example will be a simple address book:
41
38
 
42
39
  class Contact < ActiveRecord::Base
43
-
44
40
  validates_presence_of :name, :street, :city, :name
45
-
46
41
  end
47
42
 
48
43
 
49
- We will now teach `Contact` to process a text query like this:
44
+ In order to teach `Contact` how to process a text query, use the `search_syntax` and `search_by :text` macros:
50
45
 
51
46
  class Contact < ActiveRecord::Base
52
47
 
@@ -54,9 +49,9 @@ We will now teach `Contact` to process a text query like this:
54
49
 
55
50
  search_syntax do
56
51
 
57
- search_by :text do |scope, phrase|
52
+ search_by :text do |scope, phrases|
58
53
  columns = [:name, :street, :city, :email]
59
- scope.where_like(columns => phrase)
54
+ scope.where_like(columns => phrases)
60
55
  end
61
56
 
62
57
  end
@@ -64,36 +59,52 @@ We will now teach `Contact` to process a text query like this:
64
59
  end
65
60
 
66
61
 
67
- Note how you will only ever need to deal with a single token (word or phrase) and return a scope that matches the token.
68
- Dusen will take care how these scopes will be chained together.
62
+ Dusen will tokenize the query into individual phrases and call the `search_by :text` block with them. The block is expected to return a scope that filters by the given phrases.
69
63
 
70
- If we now call `Contact.search('coworking fooville "market ave"')`
71
- the block supplied to `search_by` is called once per token:
64
+ If, for example, we call `Contact.search('coworking fooville "market ave"')`
65
+ the block supplied to `search_by :text` is called with the following arguments:
72
66
 
73
- 1. `|Contact, 'coworking'|`
74
- 2. `|Contact.where_like(columns => 'coworking'), 'fooville'|`
75
- 3. `|Contact.where_like(columns => 'coworking').where_like(columns => 'fooville'), 'market ave'|`
67
+ |Contact, ['coworking', 'fooville', 'market ave']|
76
68
 
77
69
 
78
70
  The resulting scope chain is your `Contact` model filtered by
79
71
  the given query:
80
72
 
81
73
  > Contact.search('coworking fooville "market ave"')
82
- => Contact.where_like(columns => 'coworking').where_like(columns => 'fooville').where_like(columns => 'market ave')
74
+ => Contact.where_like([:name, :street, :city, :email] => ['coworking', 'fooville', 'market ave'])
83
75
 
76
+ ### What where_like does under the hood
84
77
 
85
78
  Note that `where_like` is a utility method that comes with the Dusen gem.
86
- It takes one or more column names and a phrase and generates an SQL fragment
87
- like this:
88
-
89
- contacts.name LIKE "%coworking%" OR contacts.street LIKE "%coworking%" OR contacts.email LIKE "%coworking%" OR contacts.email LIKE "%coworking%"
79
+ It takes one or more column names and one or more phrases and generates an SQL fragment
80
+ that looks roughly like the following:
81
+
82
+ ( contacts.name LIKE "%coworking%" OR
83
+ contacts.street LIKE "%coworking%" OR
84
+ contacts.city LIKE "%coworking%" OR
85
+ contacts.email LIKE "%coworking%" ) AND
86
+ ( contacts.name LIKE "%fooville%" OR
87
+ contacts.street LIKE "%fooville%" OR
88
+ contacts.city LIKE "%fooville%" OR
89
+ contacts.email LIKE "%fooville%" ) AND
90
+ ( contacts.name LIKE "%market ave%" OR
91
+ contacts.street LIKE "%market ave%" OR
92
+ contacts.city LIKE "%market ave%" OR
93
+ contacts.email LIKE "%market ave%" )
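The shape of that SQL fragment (columns OR-ed per phrase, phrases AND-ed together) can be sketched in plain Ruby as follows. This is a simplified stand-in rather than the gem's code: `where_like_sql` is a hypothetical helper, and the wildcard escaping is an assumption about what a helper like `Dusen::Util.like_expression` would need to do.

```ruby
# Illustrative only: builds a parameterized SQL fragment plus its bindings.
# where_like_sql is a hypothetical helper, not part of Dusen's API.
def where_like_sql(table, columns, phrases)
  groups = phrases.map do |phrase|
    # One "column LIKE ?" per column, OR-ed together for this phrase.
    alternatives = columns.map { |column| "#{table}.#{column} LIKE ?" }.join(' OR ')
    # Assumption: escape LIKE wildcards so the phrase is matched literally.
    escaped = phrase.gsub(/[\\%_]/) { |char| "\\#{char}" }
    ["(#{alternatives})", ["%#{escaped}%"] * columns.size]
  end
  sql = groups.map(&:first).join(' AND ')   # every phrase must match
  bindings = groups.flat_map(&:last)        # one binding per placeholder
  [sql, bindings]
end

sql, bindings = where_like_sql('contacts', [:name, :street, :city, :email], ['coworking', 'market ave'])
puts sql
puts bindings.inspect
```

Using placeholders with bindings, rather than interpolating the phrases into the SQL string, keeps user input out of the statement text.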
90
94
 
91
95
 
92
96
  Processing queries for qualified fields
93
97
  ---------------------------------------
94
98
 
95
- Let's give `Contact` a way to explictely search for a contact's email address, without
96
- going through a full text search. We do this by adding additional `search_by` instructions
99
+ Google supports queries like `filetype:pdf` that filter records by some criterion without performing a full text search. Dusen gives you a simple way to support such a search syntax.
100
+
101
+ ### Setup and usage
102
+
103
+ We now want to process a qualified query like `email:foo@bar.com` to
104
+ explicitly search for a contact's email address, without going through
105
+ a full text search.
106
+
107
+ We can teach our model this syntax by adding a `search_by :email` instruction
97
108
  to our model:
98
109
 
99
110
  search_syntax do
@@ -102,11 +113,11 @@ to our model:
102
113
  ...
103
114
  end
104
115
 
105
- search_by :email do |scope, email|
106
- scope.where(:email => email)
107
- end
116
+ search_by :email do |scope, email|
117
+ scope.where(:email => email)
118
+ end
108
119
 
109
- end
120
+ end
110
121
 
111
122
 
112
123
  The result is this:
@@ -115,17 +126,134 @@ The result is this:
115
126
  => Contact.where(:email => 'foo@bar.com')
116
127
 
117
128
 
118
- Feel free to combine text tokens and field tokens:
129
+ Note that you can combine text tokens and field tokens:
119
130
 
120
131
  > Contact.search('fooville email:foo@bar.com')
121
132
  => Contact.where_like(columns => 'fooville').where(:email => 'foo@bar.com')
122
133
 
123
134
 
135
+ Processing full text queries with FULLTEXT indexes
136
+ ---------------------------------------------------
137
+
138
+ ### When do I need this?
139
+
140
+ As your number of records grows larger, you might outgrow a full text implementation that uses LIKE (see [performance analysis](https://makandracards.com/makandra/12813-performance-analysis-of-mysql-s-fulltext-indexes-and-like-queries-for-full-text-search)). For this case Dusen ships with an alternative full text search solution using MySQL FULLTEXT indexes that scale much better.
141
+
142
+ ### Understanding the MyISAM limitation
143
+
144
+ Using this feature comes at some added complexity so you should first check if search performance is actually a problem for you. If all you have is a few thousand records with a few dozen words each, chances are your views render many times longer than a LIKE query takes to finish. Always measure before optimizing.
145
+
146
+ Currently stable MySQL versions only allow FULLTEXT indexes on MyISAM tables (this will change in MySQL 5.6). However, you don't want to migrate your models to MyISAM tables because of their many limitations (poor crash recovery, no transactions, etc.).
147
+
148
+ To work around this, Dusen uses a separate MyISAM table `search_texts` to index your searchable text. Each row in your model's table will be shadowed by a corresponding row in `search_texts`. Dusen will automatically create, update and destroy these shadow rows as your model records change.
149
+
150
+
151
+ ### Setup and usage
152
+
153
+ First we need to create the `search_texts` table. Since we're on Rails, we will do this using a migration. So enter `rails generate migration CreateSearchText` and use the following code as the migration's content:
154
+
155
+ class CreateSearchText < ActiveRecord::Migration
156
+
157
+ def self.up
158
+ create_table :search_texts, :options => 'ENGINE=MyISAM' do |t|
159
+ t.integer :source_id
160
+ t.string :source_type
161
+ t.boolean :stale
162
+ t.text :words
163
+ end
164
+ add_index :search_texts, [:source_type, :source_id] # for updates
165
+ add_index :search_texts, [:source_type, :stale] # for refreshes
166
+ execute 'CREATE FULLTEXT INDEX fulltext_index_words ON search_texts (words)'
167
+ end
168
+
169
+ def self.down
170
+ drop_table :search_texts
171
+ end
172
+
173
+ end
174
+
175
+ Since we're using some MySQL-specific options we also need to change the format of the schema dump from Ruby (`db/schema.rb`) to SQL (you will get an SQL structure dump instead, e.g. `db/structure.sql` in Rails 3.2). You can configure this in your `application.rb` (`environment.rb` in Rails 2):
176
+
177
+ config.active_record.schema_format = :sql
178
+
179
+
180
+ We now need to tell your model which text to index. We do this using the `search_text` macro and returning the searchable text:
181
+
182
+ class Contact < ActiveRecord::Base
183
+
184
+ search_syntax
185
+
186
+ search_text do
187
+ [name, street, city, email]
188
+ end
189
+
190
+ end
193
+
194
+ You can return any object or array of objects. Dusen will stringify the return value and index those words. Note that indexed words do not need to be fields of your model:
195
+
196
+ search_text do
197
+ [email, city, author.screen_name, ('client' if client?)]
198
+ end
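How the return value of such a block could be turned into one indexable string can be sketched in plain Ruby. The chain below follows the one used in `base_ext.rb` later in this diff, except that it uses `Array()` instead of ActiveSupport's `Array.wrap` so that it runs without Rails; `normalize_search_text` is a name chosen for this example only.

```ruby
# Illustrative only: stringify and flatten whatever the search_text block returns.
# Uses Array() as a plain-Ruby stand-in for ActiveSupport's Array.wrap.
def normalize_search_text(value)
  Array(value).      # accept a single object or an array
    flatten.         # nested arrays are fine
    map(&:to_s).     # nil becomes "", numbers become strings
    join(' ').
    gsub(/\s+/, ' '). # collapse runs of whitespace left by blanks and nils
    strip
end

puts normalize_search_text(['foo@bar.com', 'Fooville', nil, ['market  ave']])
```

Note that `nil` entries, such as the result of `('client' if client?)` for a non-client, simply vanish from the indexed text.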
199
+
200
+ You're done! You can now search `Contact` using the same API you used with [LIKE queries](#processing-full-text-search-queries-with-like-queries):
201
+
202
+ Contact.search('coworking fooville "market ave"')
203
+
204
+ Note that you didn't need to teach your model how to process text queries by defining a mapper with `search_by :text { ... }`. The `search_text` macro defines this mapper for you.
205
+
206
+ Also note that if you migrated an existing table to FULLTEXT search, you need to [build the index the first time](#building-the-index-for-existing-records).
207
+
208
+
209
+ ### Building the index for existing records
210
+
211
+ If you migrated an existing table to FULLTEXT search, you must build the index for all existing records:
212
+
213
+ Model.all.each(&:index_search_text)
214
+
215
+ You only need to do this once. Dusen will automatically index all further changes to your records.
216
+
217
+ ### Indexing changes in associated records
218
+
219
+ Dusen lets you index words from associated models. When you do this you need to reindex the indexed model whenever an associated record changes, or else the indexed text will be out of date.
220
+
221
+ As an example we will associate `Contact` with an `Organization` and make contacts searchable by the name of their organization:
222
+
223
+ class Contact < ActiveRecord::Base
224
+
225
+ belongs_to :organization
226
+
227
+ search_syntax
228
+
229
+ search_text do
230
+ [name, email, organization && organization.name]
231
+ end
232
+
233
+ end
234
+
235
+ To make sure contacts will reindex when the organization changes its name, use the `part_of_search_text_for` macro:
236
+
237
+ class Organization < ActiveRecord::Base
238
+
239
+ has_many :contacts
240
+
241
+ part_of_search_text_for do
242
+ contacts
243
+ end
244
+
245
+ end
246
+
247
+ All records returned by `part_of_search_text_for` will be reindexed when the organization is changed or destroyed.
248
+
124
249
 
125
250
  Programmatic access without DSL
126
251
  -------------------------------
127
252
 
128
- You can use Dusen's functionality without using the ActiveRecord DSL or the search scope. Here are some method calls to get you started:
253
+ You can use Dusen's functionality without using the ActiveRecord DSL or the `search` method.
254
+ **Please note that at this time we cannot yet commit to the API of these internal methods**. So don't get mad when stuff breaks after you update the gem.
255
+
256
+ Here are some method calls to get you started:
129
257
 
130
258
  Contact.search_syntax # => #<Dusen::Syntax>
131
259
 
@@ -144,14 +272,15 @@ You can use Dusen's functionality without using the ActiveRecord DSL or the sear
144
272
  Development
145
273
  -----------
146
274
 
147
- Test applications for various Rails versions lives in `spec`. You can run specs from the project root by saying:
148
-
149
- bundle exec rake all:spec
275
+ - Test applications for various Rails versions live in `spec`.
276
+ - You need to create a MySQL database and put credentials into `spec/shared/app_root/config/database.yml`.
277
+ - You can bundle all test applications by saying `bundle exec rake all:bundle`.
278
+ - You can run specs from the project root by saying `bundle exec rake all:spec`.
150
279
 
151
280
  If you would like to contribute:
152
281
 
153
282
  - Fork the repository.
154
- - Push your changes **with specs**.
283
+ - Push your changes **with passing specs**.
155
284
  - Send me a pull request.
156
285
 
157
286
  I'm very eager to keep this gem lightweight and on topic. If you're unsure whether a change would make it into the gem, [talk to me beforehand](mailto:henning.koch@makandra.de).
@@ -0,0 +1,22 @@
1
+ 1000; 0.394796924591064
2
+ 2000; 0.391613826751709
3
+ 3000; 0.414114084243774
4
+ 4000; 0.406150894165039
5
+ 5000; 0.426540040969849
6
+ 6000; 0.449329986572266
7
+ 7000; 0.496056623458862
8
+ 8000; 0.480993881225586
9
+ 9000; 0.503967409133911
10
+ 10000; 0.523442239761353
11
+ 11000; 0.53717339515686
12
+ 12000; 0.564517946243286
13
+ 13000; 0.574517812728882
14
+ 14000; 0.585553913116455
15
+ 15000; 0.609066152572632
16
+ 16000; 0.618757972717285
17
+ 17000; 0.672317543029785
18
+ 18000; 0.667384099960327
19
+ 19000; 0.672960157394409
20
+ 20000; 0.696838903427124
21
+ 21000; 0.709187746047974
22
+
@@ -0,0 +1,22 @@
1
+ 1000; 0.0629708099365234
2
+ 2000; 0.101030836105347
3
+ 3000; 0.15950831413269
4
+ 4000; 0.725705051422119
5
+ 5000; 0.755747880935669
6
+ 6000; 0.791393089294434
7
+ 7000; 0.813040456771851
8
+ 8000; 0.842384424209595
9
+ 9000; 0.880409164428711
10
+ 10000; 0.901688318252563
11
+ 11000; 0.926919441223145
12
+ 12000; 0.963594560623169
13
+ 13000; 0.988169240951538
14
+ 14000; 1.02734631538391
15
+ 15000; 1.0335246181488
16
+ 16000; 1.07445472717285
17
+ 17000; 1.12085759162903
18
+ 18000; 1.12942158699036
19
+ 19000; 1.14870473861694
20
+ 20000; 1.19215731620789
21
+ 21000; 1.22716112136841
22
+
@@ -0,0 +1,70 @@
1
+ reload!; Benchmark.measure { puts "..."; Note.search('rails test method').all; puts "DONE." }
2
+
3
+
4
+
5
+
6
+
7
+
8
+
9
+
10
+
11
+ reload!
12
+
13
+ site = Site.first
14
+ note_scope = site.notes; nil
15
+ note_ids = note_scope.collect_ids; nil
16
+ puts "Benchmarking #{note_ids.size} notes"
17
+
18
+ batch_size = 1000
19
+ amount = batch_size
20
+ max_amount = note_ids.size
21
+
22
+ while amount < max_amount
23
+
24
+ batch_ids = note_ids[0, amount]
25
+ batch_scope = note_scope.scoped(:conditions => { :id => batch_ids })
26
+
27
+ times = []
28
+ 25.times do
29
+ times << (Benchmark.realtime { batch_scope.search('rails test method').count })
30
+ end
31
+
32
+ time = times.sum.to_f / times.size
33
+
34
+ puts "#{amount}; #{time}"
35
+
36
+ amount += batch_size
37
+
38
+ end
39
+
40
+
41
+ #------------------------------------------
42
+
43
+
44
+ reload!
45
+
46
+ Note.delete_all('site_id <> 1')
47
+
48
+ batch_size = 1000
49
+ amount = 20000
50
+
51
+ while Note.count >= batch_size
52
+
53
+ while Note.count > amount
54
+ Note.last.destroy
55
+ end
56
+
57
+ times = []
58
+ 25.times do
59
+ times << (Benchmark.realtime { Note.search('rails test method').count })
60
+ end
61
+
62
+ time = times.sum.to_f / times.size
63
+
64
+ puts "#{amount}; #{time}"
65
+
66
+ amount -= batch_size
67
+
68
+ end
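Both loops in the benchmark script above average 25 wall-clock measurements per data point. That measuring step can be isolated into a small helper that runs outside a Rails console. `average_realtime` is a hypothetical name, and the workload in the usage line is a placeholder rather than a real search.

```ruby
require 'benchmark'

# Illustrative only: average the wall-clock time of a block over several runs.
# average_realtime is a hypothetical helper, not part of the benchmark script.
def average_realtime(runs)
  times = Array.new(runs) { Benchmark.realtime { yield } }
  # inject works without ActiveSupport's Array#sum.
  times.inject(0.0) { |sum, time| sum + time } / times.size
end

# Placeholder workload; in the script above this would be the search call.
average = average_realtime(25) { (1..10_000).inject(:+) }
puts "average: #{average}"
```

Averaging several runs smooths out one-off effects such as cache warm-up, which matters when the individual timings are only fractions of a second.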
69
+
70
+
@@ -0,0 +1,22 @@
1
+ 20000; 0.226401567459106
2
+ 19000; 0.210501832962036
3
+ 18000; 0.191384611129761
4
+ 17000; 0.188002090454102
5
+ 16000; 0.17433009147644
6
+ 15000; 0.168737888336182
7
+ 14000; 0.156399478912354
8
+ 13000; 0.141394910812378
9
+ 12000; 0.136331949234009
10
+ 11000; 0.129038143157959
11
+ 10000; 0.108856143951416
12
+ 9000; 0.100853176116943
13
+ 8000; 0.0817105197906494
14
+ 7000; 0.0714373302459717
15
+ 6000; 0.0654478454589844
16
+ 5000; 0.0597659683227539
17
+ 4000; 0.0383996486663818
18
+ 3000; 0.0362687492370605
19
+ 2000; 0.0331569194793701
20
+ 1000; 0.00965866088867188
21
+ 0; 0.00276429176330566
22
+
@@ -0,0 +1,21 @@
1
+ Records;LIKE;FULLTEXT
2
+ 1000;0.0211521244;0.0096586609
3
+ 2000;0.0543338013;0.0331569195
4
+ 3000;0.0776431179;0.0362687492
5
+ 4000;0.0866408348;0.0383996487
6
+ 5000;0.1185267925;0.0597659683
7
+ 6000;0.1424144745;0.0654478455
8
+ 7000;0.1785503292;0.0714373302
9
+ 8000;0.2093483353;0.0817105198
10
+ 9000;0.2463747692;0.1008531761
11
+ 10000;0.2596423721;0.108856144
12
+ 11000;0.2996723843;0.1290381432
13
+ 12000;0.3262492657;0.1363319492
14
+ 13000;0.3312761688;0.1413949108
15
+ 14000;0.3623173046;0.1563994789
16
+ 15000;0.3892039013;0.1687378883
17
+ 16000;0.4076850605;0.1743300915
18
+ 17000;0.4328162861;0.1880020905
19
+ 18000;0.4592379665;0.1913846111
20
+ 19000;0.489655714;0.210501833
21
+ 20000;0.5181032562;0.2264015675
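To read the table above as a relative speedup rather than as raw timings, the semicolon-separated rows can be divided column by column. The sketch below copies three rows from the data above; `speedups` is a hypothetical helper written for this example.

```ruby
# Illustrative only: compute LIKE time / FULLTEXT time per row.
# speedups is a hypothetical helper; the sample rows are copied from the table above.
def speedups(csv_text)
  lines = csv_text.strip.split("\n")
  lines.drop(1).map do |line|              # skip the header row
    records, like, fulltext = line.split(';')
    [records.to_i, like.to_f / fulltext.to_f]
  end
end

sample = "Records;LIKE;FULLTEXT\n" \
         "1000;0.0211521244;0.0096586609\n" \
         "10000;0.2596423721;0.108856144\n" \
         "20000;0.5181032562;0.2264015675\n"

speedups(sample).each do |records, ratio|
  puts "#{records} records: LIKE takes #{ratio.round(2)}x as long as FULLTEXT"
end
```

For these three rows the ratio stays a little above 2, so in this particular measurement FULLTEXT is roughly twice as fast across the sampled range.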
data/dusen.gemspec CHANGED
@@ -7,7 +7,7 @@ Gem::Specification.new do |s|
7
7
  s.authors = ["Henning Koch"]
8
8
  s.email = 'henning.koch@makandra.de'
9
9
  s.homepage = 'https://github.com/makandra/dusen'
10
- s.summary = 'Maps Google-like queries (words, "phrases", qualified:fields) to ActiveRecord scope chains'
10
+ s.summary = 'Comprehensive full text search for ActiveRecord and MySQL'
11
11
  s.description = s.summary
12
12
 
13
13
  s.files = `git ls-files`.split("\n")
@@ -0,0 +1,104 @@
1
+ # encoding: utf-8
2
+
3
+ module Dusen
4
+ module ActiveRecord
5
+ module BaseExt
6
+ module ClassMethods
7
+
8
+ def search_syntax(&dsl)
9
+ @search_syntax ||= Dusen::Syntax.new
10
+ Dusen::Description.parse_syntax(@search_syntax, &dsl) if dsl
11
+ unless singleton_class.method_defined?(:search)
12
+ singleton_class.send(:define_method, :search) do |query_string|
13
+ @search_syntax.search(self, query_string)
14
+ end
15
+ end
16
+ @search_syntax
17
+ end
18
+
19
+ def search_text?
20
+ !!@has_search_text
21
+ end
22
+
23
+ def index_search_texts
24
+ Dusen::ActiveRecord::SearchText.rewrite_all_invalid(self)
25
+ end
26
+
27
+ def search_text(&text)
28
+
29
+ @has_search_text = true
30
+
31
+ has_one :search_text, :as => :source, :dependent => :destroy, :class_name => '::Dusen::ActiveRecord::SearchText', :inverse_of => :source
32
+
33
+ after_create :create_search_text
34
+
35
+ after_update :invalidate_search_text
36
+
37
+ define_method :index_search_text do
38
+ new_text = instance_eval(&text)
39
+ new_text = Array.wrap(new_text).flatten.collect(&:to_s).join(' ').gsub(/\s+/, ' ').strip
40
+ search_text || build_search_text
41
+ search_text.update_words!(new_text)
42
+ true
43
+ end
44
+
45
+ define_method :invalidate_search_text do
46
+ search_text.invalidate!
47
+ true
48
+ end
49
+
50
+ private
51
+
52
+ define_method :create_search_text do
53
+ build_search_text(:stale => true)
54
+ search_text.save!
55
+ end
56
+
57
+ search_syntax do
58
+ search_by :text do |scope, phrases|
59
+ Dusen::ActiveRecord::SearchText.match(scope, phrases)
60
+ end
61
+ end
62
+
63
+ end
64
+
65
+ def part_of_search_text_for(&associations)
66
+ invalidate_associations_method = "invalidate_search_text_for_associated_records"
67
+
68
+ before_save invalidate_associations_method
69
+ before_destroy invalidate_associations_method
70
+
71
+ private
72
+
73
+ define_method invalidate_associations_method do
74
+ associated_records = Array.wrap(instance_eval(&associations)).flatten
75
+ associated_records.each(&:invalidate_search_text)
76
+ true
77
+ end
78
+
79
+ end
80
+
81
+ def where_like(conditions)
82
+ scope = self
83
+ conditions.each do |field_or_fields, query|
84
+ fields = Array(field_or_fields).collect do |field|
85
+ Util.qualify_column_name(scope, field)
86
+ end
87
+ Array.wrap(query).each do |phrase|
88
+ phrase_with_placeholders = fields.collect { |field| "#{field} LIKE ?" }.join(' OR ')
89
+ like_expression = Dusen::Util.like_expression(phrase)
90
+ bindings = [like_expression] * fields.size
91
+ conditions = [ phrase_with_placeholders, *bindings ]
92
+ scope = Util.append_scope_conditions(scope, conditions)
93
+ end
94
+ end
95
+ scope
96
+ end
97
+
98
+ end
99
+ end
100
+ end
101
+ end
102
+
103
+ ActiveRecord::Base.send(:extend, Dusen::ActiveRecord::BaseExt::ClassMethods)
104
+
@@ -0,0 +1,50 @@
1
+ module Dusen
2
+ module ActiveRecord
3
+ class SearchText < ::ActiveRecord::Base
4
+
5
+ self.table_name = 'search_texts'
6
+
7
+ belongs_to :source, :polymorphic => true, :inverse_of => :search_text
8
+
9
+ def update_words!(words)
10
+ update_attributes!(:words => words, :stale => false)
11
+ end
12
+
13
+ def invalidate!
14
+ update_attributes!(:stale => true)
15
+ end
16
+
17
+ def self.for_model(model)
18
+ Util.append_scope_conditions(scoped({}), :source_type => model.name)
19
+ end
20
+
21
+ def self.invalid
22
+ scoped(:conditions => { :stale => true })
23
+ end
24
+
25
+ def self.rewrite_all_invalid(model)
26
+ invalid_index_records = for_model(model).invalid
27
+ ids = Util.collect_column(invalid_index_records, :source_id)
28
+ Util.append_scope_conditions(model, :id => ids).each(&:index_search_text)
29
+ end
30
+
31
+ def self.match(model, words)
32
+ rewrite_all_invalid(model) if model.search_text?
33
+ Dusen::Util.append_scope_conditions(
34
+ model,
35
+ :id => matching_source_ids(model, words)
36
+ )
37
+ end
38
+
39
+ def self.matching_source_ids(model, words)
40
+ conditions = [
41
+ 'MATCH (words) AGAINST (? IN BOOLEAN MODE)',
42
+ Dusen::Util.boolean_fulltext_query(words)
43
+ ]
44
+ matching_texts = Dusen::Util.append_scope_conditions(for_model(model), conditions)
45
+ Dusen::Util.collect_column(matching_texts, :source_id)
46
+ end
47
+
48
+ end
49
+ end
50
+ end
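Taken together, `base_ext.rb` and `search_text.rb` implement a lazy reindexing scheme: callbacks only flag the shadow row as stale, and the actual rewrite happens in `rewrite_all_invalid` right before a search. The in-memory sketch below illustrates that flow with plain hashes. It is a simplification and not the gem's code: the helper names are made up, and substring matching stands in for MySQL's `MATCH ... AGAINST`.

```ruby
# Illustrative only: lazy reindexing with a stale flag, using plain hashes.
# index maps a record id to { :words => String, :stale => true/false }.

# Stand-in for the after_create / after_update callbacks: flag the row, do no indexing yet.
def mark_stale(index, id)
  row = (index[id] ||= { :words => '' })
  row[:stale] = true
end

# Stand-in for rewrite_all_invalid: rebuild the words of stale rows only.
def refresh_stale(index, records)
  index.each do |id, row|
    next unless row[:stale]
    row[:words] = records.fetch(id).join(' ')
    row[:stale] = false
  end
end

# Stand-in for match: refresh first, then return ids whose words contain every phrase.
# Assumption: substring matching replaces the real FULLTEXT query.
def matching_ids(index, records, phrases)
  refresh_stale(index, records)
  index.select { |_id, row| phrases.all? { |phrase| row[:words].include?(phrase) } }.keys
end

records = { 1 => ['Jane', 'Fooville'], 2 => ['John', 'Bartown'] }
index = {}
records.keys.each { |id| mark_stale(index, id) }
puts matching_ids(index, records, ['Fooville']).inspect

records[2] = ['John', 'Fooville'] # the record changes...
mark_stale(index, 2)              # ...and its shadow row is flagged
puts matching_ids(index, records, ['Fooville']).inspect
```

The benefit of this design is that many writes between two searches cost only a cheap flag update each, while the more expensive text rebuild runs once per stale record when a search actually needs it.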