dusen 0.2.2 → 0.3.0
- data/.gitignore +1 -0
- data/.travis.yml +3 -0
- data/README.md +176 -47
- data/documents/fulltext_vs_like_benchmark/all_records_and_scope/fulltext.csv +22 -0
- data/documents/fulltext_vs_like_benchmark/all_records_and_scope/fulltext_vs_like.xls +0 -0
- data/documents/fulltext_vs_like_benchmark/all_records_and_scope/like.csv +22 -0
- data/documents/fulltext_vs_like_benchmark/benchmark.rb +70 -0
- data/documents/fulltext_vs_like_benchmark/exact_number_of_records/fulltext.csv +22 -0
- data/documents/fulltext_vs_like_benchmark/exact_number_of_records/fulltext_vs_like.png +0 -0
- data/documents/fulltext_vs_like_benchmark/exact_number_of_records/fulltext_vs_like.xls +0 -0
- data/documents/fulltext_vs_like_benchmark/exact_number_of_records/like.csv +21 -0
- data/dusen.gemspec +1 -1
- data/lib/dusen/active_record/base_ext.rb +104 -0
- data/lib/dusen/active_record/search_text.rb +50 -0
- data/lib/dusen/description.rb +5 -4
- data/lib/dusen/query.rb +24 -4
- data/lib/dusen/railtie.rb +9 -0
- data/lib/dusen/syntax.rb +5 -0
- data/lib/dusen/tasks.rb +31 -0
- data/lib/dusen/token.rb +4 -0
- data/lib/dusen/util.rb +86 -1
- data/lib/dusen/version.rb +1 -1
- data/lib/dusen.rb +7 -1
- data/spec/rails-2.3/Gemfile +3 -1
- data/spec/rails-2.3/Rakefile +1 -1
- data/spec/rails-2.3/app_root/config/database.yml +4 -19
- data/spec/rails-2.3/app_root/config/environments/{in_memory.rb → test.rb} +0 -0
- data/spec/rails-2.3/spec/spec_helper.rb +7 -9
- data/spec/rails-3.0/Gemfile +3 -1
- data/spec/rails-3.0/Rakefile +1 -1
- data/spec/rails-3.0/app_root/config/database.yml +5 -3
- data/spec/rails-3.0/spec/spec_helper.rb +6 -7
- data/spec/rails-3.2/Gemfile +2 -1
- data/spec/rails-3.2/Rakefile +1 -1
- data/spec/rails-3.2/app_root/config/database.yml +5 -3
- data/spec/rails-3.2/spec/spec_helper.rb +3 -7
- data/spec/shared/app_root/app/models/recipe/category.rb +13 -0
- data/spec/shared/app_root/app/models/recipe/ingredient.rb +13 -0
- data/spec/shared/app_root/app/models/recipe.rb +14 -0
- data/spec/shared/app_root/app/models/user/with_fulltext.rb +35 -0
- data/spec/shared/app_root/app/models/user/without_fulltext.rb +34 -0
- data/spec/shared/app_root/config/database.sample.yml +6 -0
- data/spec/shared/app_root/db/migrate/001_create_search_text.rb +19 -0
- data/spec/shared/app_root/db/migrate/002_create_user_variants.rb +25 -0
- data/spec/shared/app_root/db/migrate/003_create_recipe_models.rb +23 -0
- data/spec/shared/spec/dusen/active_record/base_ext_spec.rb +138 -0
- data/spec/shared/spec/dusen/active_record/search_text_spec.rb +23 -0
- data/spec/shared/spec/dusen/parser_spec.rb +14 -0
- data/spec/shared/spec/dusen/query_spec.rb +20 -0
- data/spec/shared/spec/dusen/util_spec.rb +21 -0
- metadata +80 -46
- data/lib/dusen/active_record_ext.rb +0 -35
- data/spec/rails-2.3/app_root/config/environments/mysql.rb +0 -0
- data/spec/rails-2.3/app_root/config/environments/postgresql.rb +0 -0
- data/spec/rails-2.3/app_root/config/environments/sqlite.rb +0 -0
- data/spec/rails-2.3/app_root/config/environments/sqlite3.rb +0 -0
- data/spec/shared/app_root/app/models/user.rb +0 -22
- data/spec/shared/app_root/db/migrate/001_create_users.rb +0 -17
- data/spec/shared/dusen/active_record_spec.rb +0 -55
data/.gitignore
CHANGED
data/.travis.yml
CHANGED
data/README.md
CHANGED
@@ -1,20 +1,15 @@
-Dusen
-
+Dusen [![Build Status](https://secure.travis-ci.org/makandra/dusen.png?branch=master)](https://travis-ci.org/makandra/dusen)
+======
 
-
+Comprehensive search solution for ActiveRecord and MySQL
+--------------------------------------------------------
 
-Dusen
+Dusen lets you search ActiveRecord models when all you have is MySQL (no Solr, Sphinx, etc.). Here's what Dusen does for you:
 
-
-
-
-
-
-Dusen tokenizes these queries for you and feeds them through simple mappers that
-convert a token to an ActiveRecord scope chain.
-This process is packaged in a class method `.search`:
-
-    Contact.search('makandra software "Ruby on Rails" city:augsburg')
+1. It takes a text query in Google-like syntax (e.g. `some words "a phrase" filetype:pdf`).
+2. It parses the query into individual tokens.
+3. It lets you define simple mappers that convert a token to an ActiveRecord scope chain. Mappers can match tokens using ActiveRecord's `where` or perform full text searches with either [LIKE queries](#processing-full-text-search-queries-with-like-queries) or [FULLTEXT indexes](#processing-full-text-queries-with-fulltext-indexes) (see [performance analysis](https://makandracards.com/makandra/12813-performance-analysis-of-mysql-s-fulltext-indexes-and-like-queries-for-full-text-search)).
+4. It gives your model a method `Model.search('some query')` that performs all of the above and returns an ActiveRecord scope chain.
 
 
 Installation
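Steps 1 and 2 above (take a Google-like query, split it into tokens) can be illustrated with a small plain-Ruby sketch. This is not Dusen's tokenizer. The regular expression and the `tokenize`/`group_tokens` names are hypothetical. The sketch assumes that unqualified words and quoted phrases belong to a `:text` field and that `field:value` tokens belong to the named field. It then groups all text phrases together, because the README below describes the `search_by :text` block receiving every phrase at once.

```ruby
# Hypothetical sketch of tokenizing a Google-like query (not Dusen's code).
# Matches an optional "field:" prefix followed by either a "quoted phrase"
# or a run of non-whitespace characters.
TOKEN_PATTERN = /(?:(\w+):)?(?:"([^"]*)"|(\S+))/

# Returns [field, value] pairs; unqualified tokens get the field :text.
def tokenize(query)
  query.scan(TOKEN_PATTERN).map do |field, phrase, word|
    [(field || 'text').to_sym, phrase || word]
  end
end

# Collects the values per field, e.g. all :text phrases in one array.
def group_tokens(tokens)
  tokens.group_by(&:first).transform_values { |pairs| pairs.map(&:last) }
end

tokens = tokenize('coworking fooville "market ave" city:augsburg')
p group_tokens(tokens)
```

With this grouping, a `:text` mapper would see `['coworking', 'fooville', 'market ave']` and a `:city` mapper would see `['augsburg']`.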
@@ -28,25 +23,25 @@ Now run `bundle install` and restart your server.
 
 
 
-Processing text queries
-
+Processing full text search queries with LIKE queries
+-----------------------------------------------------
 
 This describes how to define a search syntax that processes queries
-of words and phrases
+of words and phrases, e.g. `coworking fooville "market ave"`.
+
+Under the hood the search will be performed using [LIKE queries](http://dev.mysql.com/doc/refman/5.0/en/string-comparison-functions.html#operator_like), which are [fast enough](https://makandracards.com/makandra/12813-performance-analysis-of-mysql-s-fulltext-indexes-and-like-queries-for-full-text-search) for medium-sized data sets. Once your data outgrows LIKE queries, Dusen lets you [migrate to FULLTEXT indexes](#processing-full-text-queries-with-fulltext-indexes), which perform better but come at some added complexity.
 
-    coworking fooville "market ave"
 
+### Setup and usage
 
 Our example will be a simple address book:
 
     class Contact < ActiveRecord::Base
-
       validates_presence_of :name, :street, :city, :email
-
     end
 
 
-
+In order to teach `Contact` how to process a text query, use the `search_syntax` and `search_by :text` macros:
 
     class Contact < ActiveRecord::Base
 
@@ -54,9 +49,9 @@ We will now teach `Contact` to process a text query like this:
 
       search_syntax do
 
-        search_by :text do |scope,
+        search_by :text do |scope, phrases|
           columns = [:name, :street, :city, :email]
-          scope.where_like(columns =>
+          scope.where_like(columns => phrases)
         end
 
       end
@@ -64,36 +59,52 @@ We will now teach `Contact` to process a text query like this:
     end
 
 
-
-Dusen will take care how these scopes will be chained together.
+Dusen will tokenize the query into individual phrases and call the `search_by :text` block with them. The block is expected to return a scope that filters by the given phrases.
 
-If we
-the block supplied to `search_by` is called
+If, for example, we call `Contact.search('coworking fooville "market ave"')`,
+the block supplied to `search_by :text` is called with the following arguments:
 
-
-2. `|Contact.where_like(columns => 'coworking'), 'fooville'|`
-3. `|Contact.where_like(columns => 'coworking').where_like(columns => 'fooville'), 'market ave'|`
+    |Contact, ['coworking', 'fooville', 'market ave']|
 
 
 The resulting scope chain is your `Contact` model filtered by
 the given query:
 
     > Contact.search('coworking fooville "market ave"')
-    => Contact.where_like(
+    => Contact.where_like([:name, :street, :city, :email] => ['coworking', 'fooville', 'market ave'])
 
+### What where_like does under the hood
 
 Note that `where_like` is a utility method that comes with the Dusen gem.
-It takes one or more column names and
-like
-
-    contacts.name LIKE "%coworking%"
+It takes one or more column names and one or more phrases and generates an SQL fragment
+that looks roughly like the following:
+
+    ( contacts.name LIKE "%coworking%" OR
+      contacts.street LIKE "%coworking%" OR
+      contacts.city LIKE "%coworking%" OR
+      contacts.email LIKE "%coworking%" ) AND
+    ( contacts.name LIKE "%fooville%" OR
+      contacts.street LIKE "%fooville%" OR
+      contacts.city LIKE "%fooville%" OR
+      contacts.email LIKE "%fooville%" ) AND
+    ( contacts.name LIKE "%market ave%" OR
+      contacts.street LIKE "%market ave%" OR
+      contacts.city LIKE "%market ave%" OR
+      contacts.email LIKE "%market ave%" )
 
 
 Processing queries for qualified fields
 ---------------------------------------
 
-
-
+Google supports queries like `filetype:pdf` that filter records by some criterion without performing a full text search. Dusen gives you a simple way to support such search syntax.
+
+### Setup and usage
+
+We now want to process a qualified query like `email:foo@bar.com` to
+explicitly search for a contact's email address, without going through
+a full text search.
+
+We can support this syntax by adding a `search_by :email` instruction
 to our model:
 
     search_syntax do
@@ -102,11 +113,11 @@ to our model:
         ...
       end
 
-
-
-
+      search_by :email do |scope, email|
+        scope.where(:email => email)
+      end
 
-
+    end
 
 
 The result is this:
@@ -115,17 +126,134 @@ The result is this:
     => Contact.where(:email => 'foo@bar.com')
 
 
-
+Note that you can combine text tokens and field tokens:
 
     > Contact.search('fooville email:foo@bar.com')
     => Contact.where_like(columns => 'fooville').where(:email => 'foo@bar.com')
 
 
+Processing full text queries with FULLTEXT indexes
+---------------------------------------------------
+
+### When do I need this?
+
+As your number of records grows, you might outgrow a full text implementation that uses LIKE (see [performance analysis](https://makandracards.com/makandra/12813-performance-analysis-of-mysql-s-fulltext-indexes-and-like-queries-for-full-text-search)). For this case Dusen ships with an alternative full text search solution using MySQL FULLTEXT indexes, which scales much better.
+
+### Understanding the MyISAM limitation
+
+Using this feature comes at some added complexity, so you should first check whether search performance is actually a problem for you. If all you have is a few thousand records with a few dozen words each, chances are your views take many times longer to render than a LIKE query takes to finish. Always measure before optimizing.
+
+Currently stable MySQL versions only allow FULLTEXT indexes on MyISAM tables (this will change in MySQL 5.6). However, you don't want to migrate your models to MyISAM tables because of their many limitations (poor crash recovery, no transactions, etc.).
+
+To work around this, Dusen uses a separate MyISAM table `search_texts` to index your searchable text. Each row in your model's table will be shadowed by a corresponding row in `search_texts`. Dusen will automatically create, update and destroy these shadow rows as your model records change.
+
+
+### Setup and usage
+
+First we need to create the `search_texts` table. Since we're on Rails, we will do this using a migration. So enter `rails generate migration CreateSearchText` and use the following code as the migration's content:
+
+    class CreateSearchText < ActiveRecord::Migration
+
+      def self.up
+        create_table :search_texts, :options => 'ENGINE=MyISAM' do |t|
+          t.integer :source_id
+          t.string :source_type
+          t.boolean :stale
+          t.text :words
+        end
+        add_index :search_texts, [:source_type, :source_id] # for updates
+        add_index :search_texts, [:source_type, :stale] # for refreshes
+        execute 'CREATE FULLTEXT INDEX fulltext_index_words ON search_texts (words)'
+      end
+
+      def self.down
+        drop_table :search_texts
+      end
+
+    end
+
+Since we're using some MySQL-specific options, we also need to change the format of your `db/schema.rb` from Ruby to SQL (you will get a `db/schema.sql` instead). You can configure this in your `application.rb` (`environment.rb` in Rails 2):
+
+    config.active_record.schema_format = :sql
+
+
+We now need to tell your model which text to index. We do this using the `search_text` macro, whose block returns the searchable text:
+
+    class Contact < ActiveRecord::Base
+
+      search_syntax
+
+      search_text do
+        [name, street, city, email]
+      end
+
+    end
+
+
+
+You can return any object or array of objects. Dusen will stringify the return value and index those words. Note that indexed words do not need to be fields of your model:
+
+    search_text do
+      [email, city, author.screen_name, ('client' if client?)]
+    end
+
+You're done! You can now search `Contact` using the same API you used with [LIKE queries](#processing-full-text-search-queries-with-like-queries):
+
+    Contact.search('coworking fooville "market ave"')
+
+Note that you didn't need to teach your model how to process text queries by defining a mapper with `search_by :text { ... }`. The `search_text` macro defines this mapper for you.
+
+Also note that if you migrated an existing table to FULLTEXT search, you need to [build the index the first time](#building-the-index-for-existing-records).
+
+
+### Building the index for existing records
+
+If you migrated an existing table to FULLTEXT search, you must build the index for all existing records:
+
+    Model.all.each(&:index_search_text)
+
+You only need to do this once. Dusen will automatically index all further changes to your records.
+
+### Indexing changes in associated records
+
+Dusen lets you index words from associated models. When you do this, you need to reindex the indexed model whenever an associated record changes, or else the indexed text will be out of date.
+
+As an example we will associate `Contact` with an `Organization` and make it searchable by the name of her `Organization`:
+
+    class Contact < ActiveRecord::Base
+
+      belongs_to :organization
+
+      search_syntax
+
+      search_text do
+        [name, email, organization && organization.name]
+      end
+
+    end
+
+To make sure contacts will reindex when the organization changes its name, use the `part_of_search_text_for` macro:
+
+    class Organization < ActiveRecord::Base
+
+      has_many :contacts
+
+      part_of_search_text_for do
+        contacts
+      end
+
+    end
+
+All records returned by `part_of_search_text_for` will be reindexed when the organization is changed or destroyed.
+
 
 Programmatic access without DSL
 -------------------------------
 
-You can use Dusen's functionality without using the ActiveRecord DSL or the search
+You can use Dusen's functionality without using the ActiveRecord DSL or the `search` method.
+**Please note that at this time we cannot yet commit to the API of these internal methods**, so don't get mad when stuff breaks after you update the gem.
+
+Here are some method calls to get you started:
 
     Contact.search_syntax # => #<Dusen::Syntax>
 
@@ -144,14 +272,15 @@ You can use Dusen's functionality without using the ActiveRecord DSL or the sear
 Development
 -----------
 
-Test applications for various Rails versions lives in `spec`.
-
-
+- Test applications for various Rails versions live in `spec`.
+- You need to create a MySQL database and put credentials into `spec/shared/app_root/config/database.yml`.
+- You can bundle all test applications by saying `bundle exec rake all:bundle`.
+- You can run specs from the project root by saying `bundle exec rake all:spec`.
 
 If you would like to contribute:
 
 - Fork the repository.
-- Push your changes **with specs**.
+- Push your changes **with passing specs**.
 - Send me a pull request.
 
 I'm very eager to keep this gem lightweight and on topic. If you're unsure whether a change would make it into the gem, [talk to me beforehand](mailto:henning.koch@makandra.de).
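The README states that a `search_text` block may return any object or array of objects, which Dusen stringifies before indexing. The `index_search_text` method defined in `base_ext.rb` further down does this with `Array.wrap(new_text).flatten.collect(&:to_s).join(' ').gsub(/\s+/, ' ').strip`. The plain-Ruby sketch below reproduces that step so it can run without Rails. It swaps in a hand-written, roughly equivalent stand-in for ActiveSupport's `Array.wrap`. Both helper names are hypothetical.

```ruby
# Hypothetical stand-in for ActiveSupport's Array.wrap:
# nil => [], arrays are returned unchanged, anything else => [obj].
def wrap_like_active_support(value)
  return [] if value.nil?
  value.is_a?(Array) ? value : [value]
end

# Hypothetical helper mirroring index_search_text's normalization:
# flatten nested arrays, stringify every element (nil becomes ""),
# join with spaces, then collapse whitespace runs and trim the ends.
def normalize_search_text(value)
  wrap_like_active_support(value).flatten.map(&:to_s).join(' ').gsub(/\s+/, ' ').strip
end

# A falsy element such as ('client' if client?) contributes nothing.
puts normalize_search_text(['jane@example.com', 'Augsburg', nil, ["Acme\nInc", 42]])
```

Because `nil.to_s` is an empty string and whitespace runs are collapsed afterwards, conditional entries like `('client' if client?)` leave no gaps in the indexed words.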
@@ -0,0 +1,22 @@
+1000; 0.394796924591064
+2000; 0.391613826751709
+3000; 0.414114084243774
+4000; 0.406150894165039
+5000; 0.426540040969849
+6000; 0.449329986572266
+7000; 0.496056623458862
+8000; 0.480993881225586
+9000; 0.503967409133911
+10000; 0.523442239761353
+11000; 0.53717339515686
+12000; 0.564517946243286
+13000; 0.574517812728882
+14000; 0.585553913116455
+15000; 0.609066152572632
+16000; 0.618757972717285
+17000; 0.672317543029785
+18000; 0.667384099960327
+19000; 0.672960157394409
+20000; 0.696838903427124
+21000; 0.709187746047974
+
Binary file
@@ -0,0 +1,22 @@
+1000; 0.0629708099365234
+2000; 0.101030836105347
+3000; 0.15950831413269
+4000; 0.725705051422119
+5000; 0.755747880935669
+6000; 0.791393089294434
+7000; 0.813040456771851
+8000; 0.842384424209595
+9000; 0.880409164428711
+10000; 0.901688318252563
+11000; 0.926919441223145
+12000; 0.963594560623169
+13000; 0.988169240951538
+14000; 1.02734631538391
+15000; 1.0335246181488
+16000; 1.07445472717285
+17000; 1.12085759162903
+18000; 1.12942158699036
+19000; 1.14870473861694
+20000; 1.19215731620789
+21000; 1.22716112136841
+
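The two preceding tables time the same search at increasing record counts. Going by the order of the file list, this sketch assumes the first table is `all_records_and_scope/fulltext.csv` and the second is `all_records_and_scope/like.csv`. It copies their first five rows and finds the smallest measured size at which the LIKE timing exceeds the FULLTEXT timing. The constant and method names are hypothetical.

```ruby
# Seconds per query, copied from the first five rows of each table
# (assumed: first table = FULLTEXT, second table = LIKE).
FULLTEXT_TIMES = {
  1000 => 0.394796924591064, 2000 => 0.391613826751709,
  3000 => 0.414114084243774, 4000 => 0.406150894165039,
  5000 => 0.426540040969849
}.freeze

LIKE_TIMES = {
  1000 => 0.0629708099365234, 2000 => 0.101030836105347,
  3000 => 0.15950831413269, 4000 => 0.725705051422119,
  5000 => 0.755747880935669
}.freeze

# Smallest measured record count at which LIKE took longer than
# FULLTEXT, or nil if LIKE was never slower in the given data.
def first_size_where_like_is_slower(like, fulltext)
  like.keys.sort.find { |records| like[records] > fulltext.fetch(records) }
end

puts first_size_where_like_is_slower(LIKE_TIMES, FULLTEXT_TIMES)
```

For these rows the call returns 4000: in this run LIKE was faster through 3,000 records and slower from 4,000 on.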
@@ -0,0 +1,70 @@
+reload!; Benchmark.measure { puts "..."; Note.search('rails test method').all; puts "DONE." }
+
+
+
+
+
+
+
+
+
+reload!
+
+site = Site.first
+note_scope = site.notes; nil
+note_ids = note_scope.collect_ids; nil
+puts "Benchmarking #{note_ids.size} notes"
+
+batch_size = 1000
+amount = batch_size
+max_amount = note_ids.size
+
+while amount < max_amount
+
+  batch_ids = note_ids[0, amount]
+  batch_scope = note_scope.scoped(:conditions => { :id => batch_ids })
+
+  times = []
+  25.times do
+    times << (Benchmark.realtime { batch_scope.search('rails test method').count })
+  end
+
+  time = times.sum.to_f / times.size
+
+  puts "#{amount}; #{time}"
+
+  amount += batch_size
+
+end
+
+
+#------------------------------------------
+
+
+reload!
+
+Note.delete_all('site_id <> 1')
+
+batch_size = 1000
+amount = 20000
+
+while Note.count >= batch_size
+
+  while Note.count > amount
+    Note.last.destroy
+  end
+
+  times = []
+  25.times do
+    times << (Benchmark.realtime { Note.search('rails test method').count })
+  end
+
+  time = times.sum.to_f / times.size
+
+  puts "#{amount}; #{time}"
+
+  amount -= batch_size
+
+end
+
+
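The benchmark script above is meant for a Rails console. Its inner loop runs each search 25 times with `Benchmark.realtime` and averages the results via `times.sum.to_f / times.size`. `Enumerable#sum` came from ActiveSupport in the Rails 2/3 era, and Ruby 2.4 added a built-in `Array#sum`. Below is a dependency-free sketch of that averaging step using a monotonic clock. `average_runtime` is a hypothetical helper, not part of the gem.

```ruby
# Hypothetical helper mirroring the inner loop of the benchmark script:
# run the block `runs` times and return the mean duration in seconds,
# measured with a monotonic clock (unaffected by wall-clock adjustments).
def average_runtime(runs = 25)
  raise ArgumentError, 'runs must be positive' unless runs.positive?

  times = Array.new(runs) do
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield
    Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  end
  times.sum / times.size
end

# In a Rails console this could replace the 25.times loop, e.g.
#   time = average_runtime { Note.search('rails test method').count }
mean = average_runtime(5) { (1..10_000).sum }
puts "mean: #{mean}"
```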
@@ -0,0 +1,22 @@
+20000; 0.226401567459106
+19000; 0.210501832962036
+18000; 0.191384611129761
+17000; 0.188002090454102
+16000; 0.17433009147644
+15000; 0.168737888336182
+14000; 0.156399478912354
+13000; 0.141394910812378
+12000; 0.136331949234009
+11000; 0.129038143157959
+10000; 0.108856143951416
+9000; 0.100853176116943
+8000; 0.0817105197906494
+7000; 0.0714373302459717
+6000; 0.0654478454589844
+5000; 0.0597659683227539
+4000; 0.0383996486663818
+3000; 0.0362687492370605
+2000; 0.0331569194793701
+1000; 0.00965866088867188
+0; 0.00276429176330566
+
Binary file
Binary file
@@ -0,0 +1,21 @@
+Records;LIKE;FULLTEXT
+1000;0.0211521244;0.0096586609
+2000;0.0543338013;0.0331569195
+3000;0.0776431179;0.0362687492
+4000;0.0866408348;0.0383996487
+5000;0.1185267925;0.0597659683
+6000;0.1424144745;0.0654478455
+7000;0.1785503292;0.0714373302
+8000;0.2093483353;0.0817105198
+9000;0.2463747692;0.1008531761
+10000;0.2596423721;0.108856144
+11000;0.2996723843;0.1290381432
+12000;0.3262492657;0.1363319492
+13000;0.3312761688;0.1413949108
+14000;0.3623173046;0.1563994789
+15000;0.3892039013;0.1687378883
+16000;0.4076850605;0.1743300915
+17000;0.4328162861;0.1880020905
+18000;0.4592379665;0.1913846111
+19000;0.489655714;0.210501833
+20000;0.5181032562;0.2264015675
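The `Records;LIKE;FULLTEXT` table above puts both timings side by side. This small plain-Ruby sketch, which is not part of the gem, parses three of its rows verbatim and computes how many times longer the LIKE query took than the FULLTEXT query at each size. The constant and method names are hypothetical.

```ruby
# Three rows copied verbatim from the Records;LIKE;FULLTEXT table.
ROWS = <<~CSV
  Records;LIKE;FULLTEXT
  1000;0.0211521244;0.0096586609
  10000;0.2596423721;0.108856144
  20000;0.5181032562;0.2264015675
CSV

# Skips the header line and returns { records => LIKE seconds / FULLTEXT seconds }.
def like_to_fulltext_ratios(text)
  text.lines.drop(1).map do |line|
    records, like, fulltext = line.strip.split(';')
    [records.to_i, like.to_f / fulltext.to_f]
  end.to_h
end

like_to_fulltext_ratios(ROWS).each do |records, ratio|
  puts format('%6d records: LIKE took %.2fx as long', records, ratio)
end
```

For the three rows used here, LIKE took roughly 2.2 to 2.4 times as long as FULLTEXT.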
data/dusen.gemspec
CHANGED
@@ -7,7 +7,7 @@ Gem::Specification.new do |s|
   s.authors = ["Henning Koch"]
   s.email = 'henning.koch@makandra.de'
   s.homepage = 'https://github.com/makandra/dusen'
-  s.summary = '
+  s.summary = 'Comprehensive full text search for ActiveRecord and MySQL'
   s.description = s.summary
 
   s.files = `git ls-files`.split("\n")
@@ -0,0 +1,104 @@
+# encoding: utf-8
+
+module Dusen
+  module ActiveRecord
+    module BaseExt
+      module ClassMethods
+
+        def search_syntax(&dsl)
+          @search_syntax ||= Dusen::Syntax.new
+          Dusen::Description.parse_syntax(@search_syntax, &dsl) if dsl
+          unless singleton_class.method_defined?(:search)
+            singleton_class.send(:define_method, :search) do |query_string|
+              @search_syntax.search(self, query_string)
+            end
+          end
+          @search_syntax
+        end
+
+        def search_text?
+          !!@has_search_text
+        end
+
+        def index_search_texts
+          Dusen::ActiveRecord::SearchText.rewrite_all_invalid(self)
+        end
+
+        def search_text(&text)
+
+          @has_search_text = true
+
+          has_one :search_text, :as => :source, :dependent => :destroy, :class_name => '::Dusen::ActiveRecord::SearchText', :inverse_of => :source
+
+          after_create :create_search_text
+
+          after_update :invalidate_search_text
+
+          define_method :index_search_text do
+            new_text = instance_eval(&text)
+            new_text = Array.wrap(new_text).flatten.collect(&:to_s).join(' ').gsub(/\s+/, ' ').strip
+            search_text || build_search_text
+            search_text.update_words!(new_text)
+            true
+          end
+
+          define_method :invalidate_search_text do
+            search_text.invalidate!
+            true
+          end
+
+          private
+
+          define_method :create_search_text do
+            build_search_text(:stale => true)
+            search_text.save!
+          end
+
+          search_syntax do
+            search_by :text do |scope, phrases|
+              Dusen::ActiveRecord::SearchText.match(scope, phrases)
+            end
+          end
+
+        end
+
+        def part_of_search_text_for(&associations)
+          invalidate_associations_method = "invalidate_search_text_for_associated_records"
+
+          before_save invalidate_associations_method
+          before_destroy invalidate_associations_method
+
+          private
+
+          define_method invalidate_associations_method do
+            associated_records = Array.wrap(instance_eval(&associations)).flatten
+            associated_records.each(&:invalidate_search_text)
+            true
+          end
+
+        end
+
+        def where_like(conditions)
+          scope = self
+          conditions.each do |field_or_fields, query|
+            fields = Array(field_or_fields).collect do |field|
+              Util.qualify_column_name(scope, field)
+            end
+            Array.wrap(query).each do |phrase|
+              phrase_with_placeholders = fields.collect { |field| "#{field} LIKE ?" }.join(' OR ')
+              like_expression = Dusen::Util.like_expression(phrase)
+              bindings = [like_expression] * fields.size
+              conditions = [ phrase_with_placeholders, *bindings ]
+              scope = Util.append_scope_conditions(scope, conditions)
+            end
+          end
+          scope
+        end
+
+      end
+    end
+  end
+end
+
+ActiveRecord::Base.send(:extend, Dusen::ActiveRecord::BaseExt::ClassMethods)
+
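`where_like` above builds one `"col LIKE ? OR col LIKE ?"` condition per phrase. It binds the same LIKE pattern once per column and appends each condition to the scope, so the phrase groups end up ANDed together. Below is a dependency-free sketch of just the condition-building part. It is not Dusen's code. The source of `Dusen::Util.like_expression` is not shown in this diff, so the sketch *assumes* that it escapes LIKE wildcards (`\`, `%`, `_`) and wraps the phrase in `%...%`. `like_expression` and `like_conditions` here are hypothetical stand-ins.

```ruby
# Assumed behaviour of a like_expression helper: escape the LIKE wildcards
# and MySQL's default escape character, then match the phrase anywhere.
def like_expression(phrase)
  escaped = phrase.gsub(/[\\%_]/) { |char| "\\#{char}" }
  "%#{escaped}%"
end

# One [sql, *bindings] array per phrase: the columns are OR-ed inside
# parentheses, with one bind value for every "?" placeholder.
def like_conditions(columns, phrases)
  Array(phrases).map do |phrase|
    placeholders = columns.map { |column| "#{column} LIKE ?" }.join(' OR ')
    [("(#{placeholders})"), *([like_expression(phrase)] * columns.size)]
  end
end

conditions = like_conditions(%w[contacts.name contacts.city], ['coworking', 'market ave'])
# Appending one scope condition per phrase ANDs the groups together:
puts conditions.map(&:first).join(' AND ')
```

As in `where_like`, only the phrases travel as bind values; the column names are interpolated into the SQL string, so they should come from your code rather than from user input.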
@@ -0,0 +1,50 @@
+module Dusen
+  module ActiveRecord
+    class SearchText < ::ActiveRecord::Base
+
+      self.table_name = 'search_texts'
+
+      belongs_to :source, :polymorphic => true, :inverse_of => :search_text
+
+      def update_words!(words)
+        update_attributes!(:words => words, :stale => false)
+      end
+
+      def invalidate!
+        update_attributes!(:stale => true)
+      end
+
+      def self.for_model(model)
+        Util.append_scope_conditions(scoped({}), :source_type => model.name)
+      end
+
+      def self.invalid
+        scoped(:conditions => { :stale => true })
+      end
+
+      def self.rewrite_all_invalid(model)
+        invalid_index_records = for_model(model).invalid
+        ids = Util.collect_column(invalid_index_records, :source_id)
+        Util.append_scope_conditions(model, :id => ids).each(&:index_search_text)
+      end
+
+      def self.match(model, words)
+        rewrite_all_invalid(model) if model.search_text?
+        Dusen::Util.append_scope_conditions(
+          model,
+          :id => matching_source_ids(model, words)
+        )
+      end
+
+      def self.matching_source_ids(model, words)
+        conditions = [
+          'MATCH (words) AGAINST (? IN BOOLEAN MODE)',
+          Dusen::Util.boolean_fulltext_query(words)
+        ]
+        matching_texts = Dusen::Util.append_scope_conditions(for_model(model), conditions)
+        Dusen::Util.collect_column(matching_texts, :source_id)
+      end
+
+    end
+  end
+end
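`matching_source_ids` binds `Dusen::Util.boolean_fulltext_query(words)` into `MATCH (words) AGAINST (? IN BOOLEAN MODE)`. The source of `util.rb` is not shown in this diff. The sketch below is therefore an assumption about what such a helper could do, not Dusen's implementation. In MySQL's boolean mode a leading `+` marks a word as required and double quotes match a phrase literally. The sketch requires every phrase and blanks out characters that MySQL treats as boolean operators, with the intent that user input cannot inject them. `boolean_fulltext_query` and `BOOLEAN_OPERATORS` are hypothetical names.

```ruby
# Characters with special meaning in MySQL boolean-mode full-text search.
BOOLEAN_OPERATORS = /[+\-<>()~*"]/

# Hypothetical sketch: blank out operator characters, collapse repeated
# spaces, drop phrases that end up empty, then require each remaining
# phrase with "+" (multi-word phrases are quoted so they match literally).
def boolean_fulltext_query(phrases)
  phrases.map { |phrase| phrase.gsub(BOOLEAN_OPERATORS, ' ').squeeze(' ').strip }
         .reject(&:empty?)
         .map { |phrase| phrase.include?(' ') ? %(+"#{phrase}") : "+#{phrase}" }
         .join(' ')
end

puts boolean_fulltext_query(['coworking', 'fooville', 'market ave'])
```

Keep in mind that MyISAM does not index words shorter than `ft_min_word_len` (4 characters by default), so very short words cannot be matched through the FULLTEXT index no matter how the query string is built.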