repertoire-faceting 0.5.5 → 0.6.0
- checksums.yaml +4 -4
- data/FAQ +23 -17
- data/INSTALL +52 -84
- data/LICENSE +1 -1
- data/README +213 -34
- data/TODO +20 -7
- data/ext/Makefile +24 -14
- data/ext/README.faceting +51 -0
- data/ext/bytea/bytea.sql +173 -0
- data/ext/bytea/faceting_bytea.control +6 -0
- data/ext/common/util.sql +35 -0
- data/ext/faceting--0.6.0.sql +251 -0
- data/ext/faceting_bytea--0.6.0.sql +207 -0
- data/ext/faceting_varbit--0.6.0.sql +198 -0
- data/ext/signature/faceting.control +6 -0
- data/ext/signature/signature.c +740 -0
- data/ext/{signature.o → signature/signature.o} +0 -0
- data/ext/{signature.so → signature/signature.so} +0 -0
- data/ext/signature/signature.sql +217 -0
- data/ext/varbit/faceting_varbit.control +7 -0
- data/ext/varbit/varbit.sql +164 -0
- data/{public → lib/assets}/images/repertoire-faceting/proportional_symbol.png +0 -0
- data/{public → lib/assets}/images/repertoire-faceting/spinner_sm.gif +0 -0
- data/{public → lib/assets}/javascripts/rep.faceting/context.js +2 -2
- data/{public → lib/assets}/javascripts/rep.faceting/ext/earth_facet.js +2 -4
- data/{public → lib/assets}/javascripts/rep.faceting/facet.js +1 -1
- data/{public → lib/assets}/javascripts/rep.faceting/facet_widget.js +3 -8
- data/{public → lib/assets}/javascripts/rep.faceting/nested_facet.js +1 -1
- data/{public → lib/assets}/javascripts/rep.faceting/results.js +1 -1
- data/{public → lib/assets}/javascripts/rep.faceting.js +5 -1
- data/{public → lib/assets}/javascripts/rep.protovis-facets.js +3 -3
- data/lib/assets/javascripts/rep.widgets/events.js +51 -0
- data/lib/assets/javascripts/rep.widgets/global.js +50 -0
- data/lib/assets/javascripts/rep.widgets/model.js +159 -0
- data/lib/assets/javascripts/rep.widgets/widget.js +213 -0
- data/lib/assets/javascripts/rep.widgets.js +14 -0
- data/{public → lib/assets}/stylesheets/rep.faceting.css +1 -1
- data/lib/repertoire-faceting/adapters/postgresql_adapter.rb +107 -48
- data/lib/repertoire-faceting/facets/abstract_facet.rb +43 -27
- data/lib/repertoire-faceting/facets/basic_facet.rb +23 -22
- data/lib/repertoire-faceting/facets/nested_facet.rb +50 -27
- data/lib/repertoire-faceting/model.rb +101 -65
- data/lib/repertoire-faceting/rails/engine.rb +8 -0
- data/lib/repertoire-faceting/rails/postgresql_adapter.rb +0 -1
- data/lib/repertoire-faceting/rails/relation.rb +0 -1
- data/lib/repertoire-faceting/railtie.rb +0 -1
- data/lib/repertoire-faceting/relation/calculations.rb +7 -2
- data/lib/repertoire-faceting/relation/query_methods.rb +17 -4
- data/lib/repertoire-faceting/routing.rb +2 -5
- data/lib/repertoire-faceting/tasks/all.rake +5 -4
- data/lib/repertoire-faceting/tasks/client.rake +2 -5
- data/lib/repertoire-faceting/version.rb +1 -1
- data/lib/repertoire-faceting.rb +2 -4
- data/{public → vendor/assets}/javascripts/google-earth-extensions.js +0 -0
- data/{public → vendor/assets}/javascripts/protovis.js +0 -0
- metadata +78 -78
- data/ext/README.signature +0 -33
- data/ext/signature.c +0 -740
- data/ext/signature.sql +0 -342
- data/ext/signature.sql.IN +0 -342
- data/ext/uninstall_signature.sql +0 -4
- data/ext/uninstall_signature.sql.IN +0 -4
- data/lib/repertoire-faceting/adapters/abstract_adapter.rb +0 -18
- data/lib/repertoire-faceting/relation/spawn_methods.rb +0 -26
data/README
CHANGED
@@ -1,12 +1,12 @@
|
|
1
1
|
=== Repertoire Faceting README
|
2
2
|
|
3
|
-
Repertoire Faceting is a highly scalable and extensible module for creating database-driven faceted browsers in Rails 3. It consists of three components: (1) a native PostgreSQL data type for constructing fast bitset indices over controlled vocabularies; (2) Rails
|
3
|
+
Repertoire Faceting is a highly scalable and extensible module for creating database-driven faceted browsers in Rails 3 & 4. It consists of three components: (1) a native PostgreSQL data type for constructing fast bitset indices over controlled vocabularies; (2) Rails model and controller mixins that add a faceting API to your existing application; and (3) a set of extensible JavaScript widgets for building user interfaces. In only 10-15 lines of new code you can implement a fully functional faceted browser for your existing Rails data models, with scalability out of the box to over 1,000,000 items.
|
4
4
|
|
5
5
|
== Features
|
6
6
|
|
7
7
|
Several features distinguish Repertoire Faceting from other faceting systems such as Simile Exhibit, Endeca, and Solr.
|
8
8
|
|
9
|
-
Repertoire Faceting is an end-to-end solution that works with your existing database schema and Rails models. There's no need to munge your data into a proprietary format, run a separate facet index server, or construct your own user-interface widgets. (Conversely, however, your project needs to use Rails
|
9
|
+
Repertoire Faceting is an end-to-end solution that works with your existing database schema and Rails models. There's no need to munge your data into a proprietary format, run a separate facet index server, or construct your own user-interface widgets. (Conversely, however, your project needs to use Rails, PostgreSQL, and JQuery.)
|
10
10
|
|
11
11
|
The module works equally well on small and large data sets, which means there's a low barrier to entry but existing projects can grow easily. In 'training wheels' mode, the module produces SQL queries for facet counts directly from declarations in your model so you can get a project up and running quickly. Then, after your dataset grows beyond a thousand items or so, just add indices as necessary. The module detects and uses these automatically, with no changes to your code or additional SQL necessary.
|
12
12
|
|
@@ -16,7 +16,7 @@ Both facet widgets and indexes are pluggable and extensible. You can subclass th
|
|
16
16
|
|
17
17
|
Similarly, you can write new facet implementations for novel data types, which automatically detect and index appropriate columns. For example, the module has been used to do facet value counts over GIS data points on a map, by drilling down through associated GIS layers using spatial logic relations.
|
18
18
|
|
19
|
-
For an out-of-the box example using Repertoire Faceting, which demonstrates the module's visualization and scalability features, see the example application (http://github.com/
|
19
|
+
For an out-of-the box example using Repertoire Faceting, which demonstrates the module's visualization and scalability features, see the example application (http://github.com/christopheryork/repertoire-faceting-example).
|
20
20
|
|
21
21
|
|
22
22
|
== Installation
|
@@ -29,10 +29,9 @@ See the INSTALL document for a description of how to install the module and buil
|
|
29
29
|
You can run the unit tests from the module's root directory. You will need a local PostgreSQL superuser role with your unix username (use 'createuser -Upostgres').
|
30
30
|
|
31
31
|
$ bundle install
|
32
|
-
$ rake db:
|
33
|
-
$ rake db:create
|
32
|
+
$ rake db:setup
|
34
33
|
$ rake test
|
35
|
-
|
34
|
+
|
36
35
|
|
37
36
|
== Generating documentation
|
38
37
|
|
@@ -68,9 +67,9 @@ See Repertoire::Faceting::Facets::AbstractFacet
|
|
68
67
|
It is very useful to create a rake task to update your application's indices. In the project's rake task file:
|
69
68
|
|
70
69
|
task :reindex => :environment do
|
71
|
-
Painting.
|
70
|
+
Painting.index_facets([:genre, :era])
|
72
71
|
end
|
73
|
-
|
72
|
+
|
74
73
|
Then run 'rake reindex' whenever you need to update indices manually.
|
75
74
|
|
76
75
|
*static* If the facet data is unchanging, use a rake task like the one above to create indices manually while developing or deploying.
|
@@ -84,7 +83,7 @@ Because repertoire-faceting depends on a native shared library loaded by the Pos
|
|
84
83
|
|
85
84
|
<server>$ bundle install --deployment
|
86
85
|
<server>$ export RAILS_ENV=production
|
87
|
-
<server>$ rake db:faceting:
|
86
|
+
<server>$ rake db:faceting:extensions:install
|
88
87
|
<server>$ # ... from here, follow normal deployment procedure
|
89
88
|
|
90
89
|
|
@@ -96,7 +95,7 @@ There are three direct implementations for faceted classifications like this in
|
|
96
95
|
|
97
96
|
*1:* Explicit controlled vocabulary, multiple valued facet
|
98
97
|
|
99
|
-
genres plays_genres plays
|
98
|
+
genres plays_genres plays
|
100
99
|
----+--------- ---------+---------- ----+------------------+---------
|
101
100
|
id | name play_id | genre_id id | title | date ...
|
102
101
|
----+--------- ---------+---------- ----+------------------|---------
|
@@ -112,7 +111,7 @@ There are three direct implementations for faceted classifications like this in
|
|
112
111
|
|
113
112
|
*2:* Implicit vocabulary, multiple valued facet
|
114
113
|
|
115
|
-
plays_genres plays
|
114
|
+
plays_genres plays
|
116
115
|
---------+---------- ----+------------------+---------
|
117
116
|
play_id | genre_id id | title | date ...
|
118
117
|
---------+---------- ----+------------------|---------
|
@@ -127,8 +126,8 @@ There are three direct implementations for faceted classifications like this in
|
|
127
126
|
... ....
|
128
127
|
|
129
128
|
*3:* Implicit vocabulary, single valued facet
|
130
|
-
|
131
|
-
plays
|
129
|
+
|
130
|
+
plays
|
132
131
|
----+-----------------+---------+---------
|
133
132
|
id | title | genre | date ...
|
134
133
|
----+-----------------|---------+---------
|
@@ -140,13 +139,13 @@ There are three direct implementations for faceted classifications like this in
|
|
140
139
|
6 | Comedy of Errors | comedy |
|
141
140
|
7 | Macbeth | tragedy |
|
142
141
|
8 | Hamlet | tragedy |
|
143
|
-
... ....
|
142
|
+
... ....
|
144
143
|
|
145
144
|
For all of these representations, Repertoire Faceting works by constructing an inverted bitset index from the controlled vocabulary to your central model. Each bit represents a distinct model row (plays.id in this example). 1 indicates the play is in the category, and 0 that it is not:
|
146
145
|
|
147
146
|
_plays_genre_facet
|
148
147
|
---------+-----------
|
149
|
-
genre | signature
|
148
|
+
genre | signature
|
150
149
|
---------+-----------
|
151
150
|
comedy | 00001100
|
152
151
|
history | 01110000
|
@@ -156,10 +155,10 @@ For all of these representations, Repertoire Faceting works by constructing an i
|
|
156
155
|
From these bitset "signatures", Repertoire Faceting can easily count the number of member plays for each category, even in combination with other facets and a base query. For example, the bitset signature for all plays whose title contains the search word "Henry" is 01100000. Masking this (via bitwise "and") with each signature in the genre index above, we see that there are 2 histories that match the base search - Henry 4 parts 1 & 2 - and none in the other categories:
|
157
156
|
|
158
157
|
---------+------------------
|
159
|
-
genre | signature & base
|
158
|
+
genre | signature & base
|
160
159
|
---------+------------------
|
161
|
-
comedy | 00000000
|
162
|
-
history | 01100000
|
160
|
+
comedy | 00000000
|
161
|
+
history | 01100000
|
163
162
|
romance | 00000000
|
164
163
|
tragedy | 00000000
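The masking step can be sketched in plain Ruby. This is only an illustration of the idea, not the module's implementation: the two genre signatures are the rows visible in the index table above, the base signature is written with eight bits to match them, and the names GENRE_INDEX, BASE, masked_count and facet_counts are hypothetical.

```ruby
# Genre signatures copied from the index table above (only the rows shown there).
GENRE_INDEX = {
  "comedy"  => "00001100",
  "history" => "01110000"
}.freeze

# Signature of the base query: plays whose title contains "Henry".
BASE = "01100000"

# Bitwise "and" of two equal-length signatures, followed by a count of set bits.
def masked_count(signature, base)
  (signature.to_i(2) & base.to_i(2)).to_s(2).count("1")
end

# Facet value counts for every entry in an index, restricted to the base query.
def facet_counts(index, base)
  index.transform_values { |signature| masked_count(signature, base) }
end

facet_counts(GENRE_INDEX, BASE).each { |genre, n| puts "#{genre}: #{n}" }
```

Only one pass over the (small) index table is needed, regardless of how many plays the base query matched.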
|
165
164
|
|
@@ -173,14 +172,198 @@ References on faceted search:
|
|
173
172
|
- http://en.wikipedia.org/wiki/Controlled_vocabulary
|
174
173
|
|
175
174
|
|
176
|
-
==
|
175
|
+
== A Quick Tour of API Levels
|
176
|
+
|
177
|
+
The Repertoire Faceting module is intended to be a toolkit for building highly scalable faceted browsers over data held in relational databases. While it can be used as a black box (see the INSTALL document for a recipe), each API is also designed to be called directly. For example, you might write your own facet widgets in JavaScript, using the JSON data feeds from the web service API level.
|
178
|
+
|
179
|
+
The API layers are:
|
180
|
+
|
181
|
+
- Javascript widgets [ see lib/assets/javascripts
|
182
|
+
- JSON web services [ see Repertoire::Faceting::Controller, Repertoire::Faceting::Routing
|
183
|
+
- Rails model & finders [ see Repertoire::Faceting::Model
|
184
|
+
- SQL queries and indexes [ see ext/README.faceting
|
185
|
+
|
186
|
+
To make the relationships between the APIs clear, here is the same basic facet count query traced through each layer. While the module itself does not always issue exactly the queries listed here, the basic data model is the same. (To experiment with the APIs, run psql or the rails console from the Repertoire Faceting Example application. You may wish to use "SET search_path = public, facet" to bring the faceting schema's namespace into scope.)
|
187
|
+
|
188
|
+
*** SQL API ***
|
189
|
+
|
190
|
+
The most basic facet count query is a simple SQL aggregation.
|
191
|
+
|
192
|
+
=# SELECT discipline, COUNT(*) FROM nobelists GROUP BY discipline;
|
193
|
+
discipline | count
|
194
|
+
---------------------+-------
|
195
|
+
Chemistry | 12
|
196
|
+
Peace | 2
|
197
|
+
Economics | 13
|
198
|
+
Medicine/Physiology | 9
|
199
|
+
Physics | 27
|
200
|
+
(5 rows)
|
201
|
+
|
202
|
+
Here is the facet value index for nobelists.discipline, as described in the prior section of the README.
|
203
|
+
|
204
|
+
=# SELECT * FROM facet.nobelists_discipline_index;
|
205
|
+
discipline | signature
|
206
|
+
---------------------+------------------------------------------------------------------
|
207
|
+
Physics | 01011011001100110100100101000011010100001000001110110100110001
|
208
|
+
Chemistry | 0000010000001100000000000000100000001011000101000100101
|
209
|
+
Economics | 0010000001000000000101001000010010100000001000000000000000111001
|
210
|
+
Medicine/Physiology | 000000001000000010100010001100000000000001000000000000010000001
|
211
|
+
Peace | 000000000000000000000000000000000000010000001
|
212
|
+
(5 rows)
|
213
|
+
|
214
|
+
The faceting API's count() function returns the number of set bits in a signature. The same query, using a facet value index:
|
215
|
+
|
216
|
+
=# SELECT discipline, facet.count(signature) FROM facet.nobelists_discipline_index;
|
217
|
+
discipline | count
|
218
|
+
---------------------+-------
|
219
|
+
Physics | 27
|
220
|
+
Chemistry | 12
|
221
|
+
Economics | 13
|
222
|
+
Medicine/Physiology | 9
|
223
|
+
Peace | 2
|
224
|
+
(5 rows)
|
225
|
+
|
226
|
+
One of the cardinal virtues of faceted search is that facet value counts show the "landscape" of data surrounding a base query. For example, here is a raw facet value count using "Robert" as the base query.
|
227
|
+
|
228
|
+
=# SELECT discipline, COUNT(*) FROM nobelists WHERE name LIKE 'Robert%' GROUP BY discipline;
|
229
|
+
discipline | count
|
230
|
+
------------+-------
|
231
|
+
Chemistry | 2
|
232
|
+
Physics | 1
|
233
|
+
Economics | 5
|
234
|
+
(3 rows)
|
235
|
+
|
236
|
+
(* Keep in mind a proper data model would use a full-text index here.)
|
237
|
+
|
238
|
+
To run facet value counts, we first gather a signature representing the base query, then use it as a mask over each entry in the facet value index. Here is a representative base query in raw SQL:
|
239
|
+
|
240
|
+
=# SELECT id, name, discipline, _packed_id FROM nobelists WHERE name LIKE 'Robert%';
|
241
|
+
id | name | discipline | _packed_id
|
242
|
+
------------+-----------------------+------------+------------
|
243
|
+
57839852 | Robert Burns Woodward | Chemistry | 39
|
244
|
+
506489850 | Robert B. Laughlin | Physics | 40
|
245
|
+
920398821 | Robert M. Solow | Economics | 42
|
246
|
+
54824727 | Robert S. Mulliken | Chemistry | 43
|
247
|
+
309696094 | Robert J. Aumann | Economics | 58
|
248
|
+
249288376 | Robert C. Merton | Economics | 59
|
249
|
+
889316300 | Robert Engle | Economics | 60
|
250
|
+
1039451971 | Robert A. Mundell | Economics | 63
|
251
|
+
(8 rows)
|
252
|
+
|
253
|
+
We can use the faceting API aggregator to read this result into a signature. (Note we use the serial id column, since the primary key is quite sparse.)
|
254
|
+
|
255
|
+
=# SELECT facet.signature(_packed_id) FROM nobelists WHERE name LIKE 'Robert%';
|
256
|
+
signature
|
257
|
+
------------------------------------------------------------------
|
258
|
+
0000000000000000000000000000000000000001101100000000000000111001
|
259
|
+
(1 row)
|
260
|
+
|
261
|
+
Combining this base mask bitwise with each of the signatures in the facet value indices allows us to quickly calculate counts for very large datasets. (* For clarity, we access the faceting namespace directly and use a subquery.)
|
262
|
+
|
263
|
+
=# SET search_path TO public, facet;
|
264
|
+
=# SELECT discipline, facet.count(index.signature & base.signature) FROM
|
265
|
+
(SELECT facet.signature(_packed_id) FROM nobelists WHERE name LIKE 'Robert%') AS base,
|
266
|
+
facet.nobelists_discipline_index AS index;
|
267
|
+
|
268
|
+
|
269
|
+
discipline | count
|
270
|
+
---------------------+-------
|
271
|
+
Physics | 1
|
272
|
+
Chemistry | 2
|
273
|
+
Economics | 5
|
274
|
+
Medicine/Physiology | 0
|
275
|
+
Peace | 0
|
276
|
+
(5 rows)
|
277
|
+
|
278
|
+
If other facet values have been refined, they are also collected into signatures and used as masks.
|
279
|
+
|
280
|
+
=# SELECT discipline, facet.count(index.signature & base.signature & refine.signature) FROM
|
281
|
+
(SELECT facet.signature(_packed_id) FROM nobelists WHERE name LIKE 'Robert%') AS base,
|
282
|
+
(SELECT signature FROM nobelists_degree_index WHERE degree = 'Ph.D.') AS refine,
|
283
|
+
facet.nobelists_discipline_index AS index;
|
284
|
+
|
285
|
+
In this fashion, facet count queries can be reduced to a single table scan over the model for the base query, plus an index table scan for each facet that has been refined.
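The same base-mask computation can be traced outside the database. The Ruby sketch below is an illustration under stated assumptions, not the extension's code: signature and count_and are hypothetical stand-ins for facet.signature() and facet.count(a & b), bit positions are assumed to count from the left starting at 0 (which is what the listings above suggest), and only three index rows are copied.

```ruby
# Packed ids of the rows matching name LIKE 'Robert%' (from the base query above).
ROBERT_IDS = [39, 40, 42, 43, 58, 59, 60, 63].freeze

# Three rows of facet.nobelists_discipline_index, copied from the listing above.
DISCIPLINE_INDEX = {
  "Physics"   => "01011011001100110100100101000011010100001000001110110100110001",
  "Chemistry" => "0000010000001100000000000000100000001011000101000100101",
  "Economics" => "0010000001000000000101001000010010100000001000000000000000111001"
}.freeze

# Stand-in for facet.signature(): position i (from the left, starting at 0)
# holds a 1 when id i is present. Expects a non-empty list of ids.
def signature(ids)
  bits = "0" * (ids.max + 1)
  ids.each { |id| bits[id] = "1" }
  bits
end

# Stand-in for facet.count(a & b): positions past the end of the shorter
# signature are treated as 0.
def count_and(a, b)
  a.chars.zip(b.chars).count { |x, y| x == "1" && y == "1" }
end

base = signature(ROBERT_IDS)
DISCIPLINE_INDEX.each do |discipline, sig|
  puts "#{discipline}: #{count_and(sig, base)}"
end
```

A refinement on another facet would add one more signature to the "and" before counting.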
|
286
|
+
|
287
|
+
Each of the PostgreSQL API bindings implements these same operators, but over a different bitmap value type.
|
288
|
+
|
289
|
+
*** ActiveRecord API ***
|
290
|
+
|
291
|
+
[ See Repertoire::Faceting::Model for full details ]
|
177
292
|
|
178
|
-
|
293
|
+
The ActiveRecord API is built around the observation that our basic facet value count query is built into Rails:
|
179
294
|
|
295
|
+
> Nobelist.count(:discipline)
|
296
|
+
=> {"Physics"=>27, "Economics"=>13, "Chemistry"=>12, "Medicine/Physiology"=>9, "Peace"=>2}
|
180
297
|
|
181
|
-
|
298
|
+
When the Repertoire Faceting module is loaded and facets declared in the model, the same query will read the facet index instead. Execute the query in the console, and you will see the SQL generated reads the facet value index rather than the model table.
|
182
299
|
|
183
|
-
|
300
|
+
> Nobelist.index_facets([:discipline, :degree])
|
301
|
+
> Nobelist.count(:discipline)
|
302
|
+
=> {"Physics"=>27, "Economics"=>13, "Chemistry"=>12, "Medicine/Physiology"=>9, "Peace"=>2}
|
303
|
+
|
304
|
+
Facets act just like Rails scoped queries, so you can use ActiveRecord's native syntax to specify a base query.
|
305
|
+
|
306
|
+
> Nobelist.where("name LIKE 'Robert%'").count(:discipline)
|
307
|
+
=> {"Economics"=>5, "Chemistry"=>2, "Physics"=>1}
|
308
|
+
|
309
|
+
Use refine() to specify facet value refinements on other attributes.
|
310
|
+
|
311
|
+
> Nobelist.where("name LIKE 'Robert%'").refine(:degree => 'Ph.D.').count(:discipline)
|
312
|
+
=> {"Economics"=>3, "Physics"=>1}
|
313
|
+
|
314
|
+
You will see from the SQL query that the faceting module is detecting which model column to use as an id key, then reading the facet value indices wherever possible. The result is similar to the final SQL API example above.
|
315
|
+
|
316
|
+
If you use refine() without count(), the module will use facet value indices to calculate the list of final results.
|
317
|
+
|
318
|
+
> Nobelist.where("name LIKE 'Robert%'").refine(:degree => 'Ph.D.')
|
319
|
+
=> ...
|
320
|
+
|
321
|
+
Note that because facets are assumed to be multi-valued, refine() is different from a normal ActiveRecord where() clause. In Rails, an equivalent query would be:
|
322
|
+
|
323
|
+
> Nobelist.where("name LIKE 'Robert%'").joins(:affiliations).where('affiliations.degree' => 'Ph.D.')
|
324
|
+
=> ...
|
325
|
+
|
326
|
+
When the number of rows in the model table is large and many facets come into play, using refine() can yield a performance gain over the straight query.
|
327
|
+
|
328
|
+
*** Web services API ***
|
329
|
+
|
330
|
+
The web services API exposes two JSON feeds, one that returns facet value counts given a set of refinements and another that returns the actual list of results. Your Rails controller constructs the base query, and the faceting webservice handles the surrounding facet refinements. For example, one of the faceting example application's controllers declares a query similar to this:
|
331
|
+
|
332
|
+
class NobelistsController < ApplicationController
|
333
|
+
...
|
334
|
+
def base
|
335
|
+
search = "#{params[:search]}%"
|
336
|
+
Nobelist.where(["name ILIKE ?", search])
|
337
|
+
end
|
338
|
+
|
339
|
+
After including "faceting_for :nobelists" in the router, you can query the indexer by facet name, base query, and refinement filter:
|
340
|
+
|
341
|
+
$ curl -g "http://localhost:3000/nobelists/counts/discipline"
|
342
|
+
[["Physics",27],["Economics",13],["Chemistry",12],["Medicine/Physiology",9],["Peace",2]]
|
343
|
+
|
344
|
+
$ curl -g "http://localhost:3000/nobelists/counts/discipline?search=Robert"
|
345
|
+
[["Economics",5],["Chemistry",2],["Physics",1]]
|
346
|
+
|
347
|
+
$ curl -g "http://localhost:3000/nobelists/counts/discipline?filter[degree][]=Ph.D.&search=Robert"
|
348
|
+
[["Economics",3],["Physics",1]]
|
349
|
+
|
350
|
+
Or you can issue a refinement query to get the results list:
|
351
|
+
|
352
|
+
$ curl -g "http://localhost:3000/nobelists/?filter[degree][]=Ph.D.&search=Robert"
|
353
|
+
=> ...
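When calling these feeds from Ruby rather than curl, the query string can be assembled with the standard library. This is a small sketch, assuming the host, path layout and parameter names shown in the curl examples above; counts_url is a hypothetical helper, not part of the gem.

```ruby
require "uri"

# Builds a facet count URL in the shape used by the curl examples above.
# `filter` maps a facet name to one value or a list of values.
def counts_url(facet, search: nil, filter: {})
  params = []
  filter.each do |name, values|
    Array(values).each { |value| params << ["filter[#{name}][]", value] }
  end
  params << ["search", search] if search
  url = "http://localhost:3000/nobelists/counts/#{facet}"
  params.empty? ? url : "#{url}?#{URI.encode_www_form(params)}"
end

puts counts_url("discipline")
puts counts_url("discipline", search: "Robert", filter: { degree: "Ph.D." })
```

URI.encode_www_form percent-encodes the square brackets in the parameter names, so the second URL carries the same parameters as the hand-written curl example, only in escaped form.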
|
354
|
+
|
355
|
+
|
356
|
+
== Appendix. PostgreSQL in-database Faceting API
|
357
|
+
|
358
|
+
Several bindings for the in-database faceting API are provided. In order of capability, they are:
|
359
|
+
|
360
|
+
- signature C language, requires superuser permissions
|
361
|
+
- bytea Javascript language, requires plv8 extension
|
362
|
+
- varbit No language or superuser requirements
|
363
|
+
|
364
|
+
In general, if you have superuser permissions you should build and install the C-language (signature) API, as it is more scalable than the others, at no cost.
|
365
|
+
|
366
|
+
All the Repertoire Faceting APIs add functionality for bitwise operations and population counts to PostgreSQL. For API details, see the ext directory.
|
184
367
|
|
185
368
|
Signature: an auto-sizing bitset with the following functions
|
186
369
|
|
@@ -189,14 +372,12 @@ Signature: an auto-sizing bitset with the following functions
|
|
189
372
|
- members(a) => { set of integers corresponding to set bits }
|
190
373
|
|
191
374
|
- sig_in, sig_out => { mandatory I/O functions }
|
192
|
-
- sig_and(a, b)
|
193
|
-
- sig_or(a, b)
|
194
|
-
-
|
195
|
-
- sig_length(a) => { number of bits in a }
|
196
|
-
- sig_min(a) => { lowest 1 in a, a.length }
|
375
|
+
- sig_and(a, b) => a & b
|
376
|
+
- sig_or(a, b) => a | b
|
377
|
+
- sig_length(a) => { number of bits in a }
|
197
378
|
- sig_get(a, i) => { ith bit of a, or 0 }
|
198
379
|
- sig_set(a, i, n) => { sets ith bit of a to n }
|
199
|
-
- sig_resize(a, n) => { resizes a to hold n bits }
|
380
|
+
- sig_resize(a, n) => { resizes a to hold at least n bits }
|
200
381
|
|
201
382
|
Bitwise signature operators: &, |
|
202
383
|
|
@@ -206,12 +387,10 @@ Bitwise aggregates:
|
|
206
387
|
- collect(signature) => 'or' signature results together
|
207
388
|
- filter(signature) => 'and' signature results together
|
208
389
|
|
209
|
-
Helper
|
210
|
-
|
211
|
-
- renumber_table(table, column, threshold)
|
390
|
+
Helper aggregates:
|
212
391
|
|
213
|
-
|
392
|
+
- wastage(INT) -> REAL
|
214
393
|
|
215
|
-
|
394
|
+
Aggregator that examines a table's primary key column, checking what proportion of signature bits constructed from the table would be wasted. If the proportion of wasted bits to valid bits is high, you should consider adding a new serial column.
|
216
395
|
|
217
|
-
|
396
|
+
The Rails API introspects signature wastage before any facet indexing operation, and adds or removes a new serial column (called _packed_id) as necessary.
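To see why sparse primary keys matter, here is a rough Ruby sketch of a wastage measure. The formula is an assumption made for illustration (the share of signature bits that would never be set); the extension's wastage aggregate may compute the proportion differently, and the function below is hypothetical.

```ruby
# Hypothetical stand-in for the wastage aggregate: the fraction of bits in a
# signature indexed directly by these ids that would never be set.
def wastage(ids)
  return 0.0 if ids.empty?
  bits = ids.max + 1                 # one bit per id, up to the largest id
  1.0 - ids.uniq.size.to_f / bits
end

sparse = [57839852, 506489850, 920398821]   # primary keys from the base query listing
packed = [0, 1, 2]                           # the same rows renumbered densely

puts wastage(sparse)
puts wastage(packed)
```

With the sparse keys almost every bit is wasted, while the densely renumbered ids waste none, which is the situation a packed serial column is meant to fix.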
|
data/TODO
CHANGED
@@ -5,9 +5,9 @@ DESIRED FEATURES / IMPROVEMENTS.
|
|
5
5
|
Adding support is a matter of defining the adapter and moving signature()
|
6
6
|
calls into the postgresql adapter ]
|
7
7
|
|
8
|
-
-- modify widgets to multiplex ajax calls to work around 2 call limit
|
8
|
+
-- modify widgets to multiplex ajax calls to work around 2 call limit
|
9
9
|
in many browsers
|
10
|
-
[ design: fetch() queues ajax calls; update() implementations request
|
10
|
+
[ design: fetch() queues ajax calls; update() implementations request
|
11
11
|
queue and merge current webservice call with ones already in the queue.
|
12
12
|
on controller side, receive multiple facet names, iterate, and bundle.
|
13
13
|
fetch() then unbundles the results and dispatches them to appropriate
|
@@ -16,13 +16,26 @@ DESIRED FEATURES / IMPROVEMENTS.
|
|
16
16
|
|
17
17
|
TODO
|
18
18
|
|
19
|
-
|
20
|
-
|
19
|
+
-- gemcutter release of new faceting gem
|
20
|
+
-- revise example app to bundle gem DONE
|
21
|
+
-- redeploy to bytea / varbit targets DONE
|
22
|
+
|
23
|
+
-- reference to the ActiveRecord API in the README DONE
|
24
|
+
-- revise the README to describe API layers & use DONE
|
25
|
+
|
26
|
+
-- clean up population in postgresql adapter? NO
|
27
|
+
-- clean up throughout the facet definitions PARTIAL
|
28
|
+
-- test harness
|
29
|
+
- test all extensions in order, skipping if cannot load DONE
|
30
|
+
-- simplify the in-database API
|
31
|
+
- use materialized views instead of dropping and adding tables DONE
|
32
|
+
- move signature_wastage into an aggregate function DONE
|
33
|
+
- remove the renumber function DONE
|
34
|
+
- the only remaining function should be expand_nesting MOVED
|
35
|
+
-- Makefile is verbose; use patterns NO
|
21
36
|
- check formatting of all docs DONE
|
22
|
-
|
23
37
|
- make rake tasks smarter about detecting whether to run NOT NECESSARY
|
24
38
|
- make rake tasks automatically choose task dep on db type NOT TO DO
|
25
|
-
|
26
39
|
- make sure example app can be set up and deployed via rake DONE
|
27
40
|
|
28
41
|
- README
|
@@ -34,7 +47,7 @@ TODO
|
|
34
47
|
* installing postgresql extensions DONE
|
35
48
|
* migrations for indexing DONE
|
36
49
|
* updating indices (a) postgresql-crontab (b) rake crontab DONE
|
37
|
-
- generate routes that don't conflict with resource routes
|
50
|
+
- generate routes that don't conflict with resource routes
|
38
51
|
(rails thinks /nobelists/results is the nobelist named 'results') NOT TO DO
|
39
52
|
- FAQ
|
40
53
|
* "not grouped error"
|
data/ext/Makefile
CHANGED
@@ -1,27 +1,37 @@
|
|
1
1
|
#-------------------------------------------------------------------------
|
2
2
|
#
|
3
3
|
# Makefile--
|
4
|
-
# Makefile for Repertoire
|
4
|
+
# Makefile for Repertoire faceting API
|
5
5
|
#
|
6
6
|
# By default, this builds against an existing PostgreSQL installation
|
7
7
|
# (the one identified by whichever pg_config is first in your path).
|
8
8
|
#
|
9
9
|
#-------------------------------------------------------------------------
|
10
|
-
|
11
|
-
MODULES = signature
|
12
|
-
DATA_built = signature.sql uninstall_signature.sql
|
13
|
-
# DOCS = README.signature # TODO. postgres 8.3 & 8.4 look for doc files in different places;
|
14
|
-
# temporarily commented out until we have migrated
|
15
10
|
|
16
|
-
|
17
|
-
|
11
|
+
API_VERSION = 0.6.0
|
12
|
+
|
13
|
+
MODULES = signature/signature
|
14
|
+
EXTENSION = signature/faceting \
|
15
|
+
bytea/faceting_bytea \
|
16
|
+
varbit/faceting_varbit
|
17
|
+
DATA_built = faceting--$(API_VERSION).sql \
|
18
|
+
faceting_bytea--$(API_VERSION).sql \
|
19
|
+
faceting_varbit--$(API_VERSION).sql
|
20
|
+
DOCS = README.faceting
|
21
|
+
|
18
22
|
PG_CONFIG = pg_config
|
19
23
|
PGXS := $(shell $(PG_CONFIG) --pgxs)
|
20
|
-
include $(PGXS)
|
21
24
|
|
22
|
-
|
23
|
-
|
24
|
-
|
25
|
+
-include $(PGXS)
|
26
|
+
# the dash above means loading pgxs may fail silently on Heroku, but
|
27
|
+
# the API bindings will still be built:
|
28
|
+
api_bindings : $(DATA_built)
|
29
|
+
|
30
|
+
faceting--$(API_VERSION).sql: signature/signature.sql common/util.sql
|
31
|
+
cat signature/signature.sql common/util.sql > $@
|
32
|
+
|
33
|
+
faceting_bytea--$(API_VERSION).sql: bytea/bytea.sql common/util.sql
|
34
|
+
cat bytea/bytea.sql common/util.sql > $@
|
25
35
|
|
26
|
-
|
27
|
-
|
36
|
+
faceting_varbit--$(API_VERSION).sql: varbit/varbit.sql common/util.sql
|
37
|
+
cat varbit/varbit.sql common/util.sql > $@
|
data/ext/README.faceting
ADDED
@@ -0,0 +1,51 @@
|
|
1
|
+
--
|
2
|
+
--
|
3
|
+
-- In-database support for Repertoire faceting module.
|
4
|
+
--
|
5
|
+
-- These libraries add scalable faceted indexing to the PostgreSQL database.
|
6
|
+
--
|
7
|
+
-- Basic approach is similar to other faceted browsers (Solr, Exhibit): an inverted bitmap index
|
8
|
+
-- allows fast computation of facet value counts, given a base context and prior facet refinements.
|
9
|
+
-- Bitsets can also be used to compute the result set of items.
|
10
|
+
--
|
11
|
+
-- There are three bindings for the API. The first extends PostgreSQL with a new bitset datatype
|
12
|
+
-- written in C (called 'signature'). This version provides scalable faceting up to 1,000,000 items
|
13
|
+
-- and beyond, but requires control over the PostgreSQL server instance to build and load the C
|
14
|
+
-- extensions.
|
15
|
+
--
|
16
|
+
-- The second is implemented using PostgreSQL's built-in VARBIT data type, and scales to a rough
|
17
|
+
-- limit of about 30,000 items. It works in exactly the same way as the 'signature' data type above,
|
18
|
+
-- but is about a factor of 5-10 slower. However, it does not require administrative control over
|
19
|
+
-- the database server to install or use and so is suited to shared host deployment.
|
20
|
+
--
|
21
|
+
-- The third uses PostgreSQL's built-in BYTEA data type, processed via the Google Javascript
|
22
|
+
-- language binding plv8 (https://code.google.com/p/plv8js/wiki/PLV8). Scalability and performance
|
23
|
+
-- are unknown, but should be similar to the native C 'signature' type. However, the server needs
|
24
|
+
-- to have the PLV8 language extension installed.
|
25
|
+
--
|
26
|
+
-- Only one binding of the API needs to be loaded at any time. Each consists of:
|
27
|
+
--
|
28
|
+
-- (a) functions for accessing the bitset data types. These are used to store inverted indices from
|
29
|
+
-- facet values to item ids. Functions are provided for doing refinements and counts on items
|
30
|
+
-- with a given facet value.
|
31
|
+
--
|
32
|
+
-- (b) facilities for adding a packed (continuous) id sequence to the main item table. Packed ids
|
33
|
+
-- are used in the facet value indexes.
|
34
|
+
--
|
35
|
+
-- (c) utility functions for creating/updating packed ids and facet value index tables, e.g. in
|
36
|
+
-- a crontab task.
|
37
|
+
--
|
38
|
+
-- The API bindings can each be built as a PostgreSQL extension, and then loaded and dropped using
|
39
|
+
-- CREATE EXTENSION <faceting|faceting_bytea|faceting_varbit> and DROP EXTENSION ...
|
40
|
+
--
|
41
|
+
-- For hosts without administrative access, the individual sql files can be sourced directly.
|
42
|
+
--
|
43
|
+
-- Installation (in a Rails app)
|
44
|
+
--
|
45
|
+
-- $ cd repertoire-faceting
|
46
|
+
-- $ rake db:faceting:extensions:install
|
47
|
+
--
|
48
|
+
-- Installation (PostgreSQL APIs only)
|
49
|
+
--
|
50
|
+
-- $ cd repertoire-faceting/ext
|
51
|
+
-- $ make; sudo make install
|