repertoire-faceting 0.5.5 → 0.6.0

Files changed (64)
  1. checksums.yaml +4 -4
  2. data/FAQ +23 -17
  3. data/INSTALL +52 -84
  4. data/LICENSE +1 -1
  5. data/README +213 -34
  6. data/TODO +20 -7
  7. data/ext/Makefile +24 -14
  8. data/ext/README.faceting +51 -0
  9. data/ext/bytea/bytea.sql +173 -0
  10. data/ext/bytea/faceting_bytea.control +6 -0
  11. data/ext/common/util.sql +35 -0
  12. data/ext/faceting--0.6.0.sql +251 -0
  13. data/ext/faceting_bytea--0.6.0.sql +207 -0
  14. data/ext/faceting_varbit--0.6.0.sql +198 -0
  15. data/ext/signature/faceting.control +6 -0
  16. data/ext/signature/signature.c +740 -0
  17. data/ext/{signature.o → signature/signature.o} +0 -0
  18. data/ext/{signature.so → signature/signature.so} +0 -0
  19. data/ext/signature/signature.sql +217 -0
  20. data/ext/varbit/faceting_varbit.control +7 -0
  21. data/ext/varbit/varbit.sql +164 -0
  22. data/{public → lib/assets}/images/repertoire-faceting/proportional_symbol.png +0 -0
  23. data/{public → lib/assets}/images/repertoire-faceting/spinner_sm.gif +0 -0
  24. data/{public → lib/assets}/javascripts/rep.faceting/context.js +2 -2
  25. data/{public → lib/assets}/javascripts/rep.faceting/ext/earth_facet.js +2 -4
  26. data/{public → lib/assets}/javascripts/rep.faceting/facet.js +1 -1
  27. data/{public → lib/assets}/javascripts/rep.faceting/facet_widget.js +3 -8
  28. data/{public → lib/assets}/javascripts/rep.faceting/nested_facet.js +1 -1
  29. data/{public → lib/assets}/javascripts/rep.faceting/results.js +1 -1
  30. data/{public → lib/assets}/javascripts/rep.faceting.js +5 -1
  31. data/{public → lib/assets}/javascripts/rep.protovis-facets.js +3 -3
  32. data/lib/assets/javascripts/rep.widgets/events.js +51 -0
  33. data/lib/assets/javascripts/rep.widgets/global.js +50 -0
  34. data/lib/assets/javascripts/rep.widgets/model.js +159 -0
  35. data/lib/assets/javascripts/rep.widgets/widget.js +213 -0
  36. data/lib/assets/javascripts/rep.widgets.js +14 -0
  37. data/{public → lib/assets}/stylesheets/rep.faceting.css +1 -1
  38. data/lib/repertoire-faceting/adapters/postgresql_adapter.rb +107 -48
  39. data/lib/repertoire-faceting/facets/abstract_facet.rb +43 -27
  40. data/lib/repertoire-faceting/facets/basic_facet.rb +23 -22
  41. data/lib/repertoire-faceting/facets/nested_facet.rb +50 -27
  42. data/lib/repertoire-faceting/model.rb +101 -65
  43. data/lib/repertoire-faceting/rails/engine.rb +8 -0
  44. data/lib/repertoire-faceting/rails/postgresql_adapter.rb +0 -1
  45. data/lib/repertoire-faceting/rails/relation.rb +0 -1
  46. data/lib/repertoire-faceting/railtie.rb +0 -1
  47. data/lib/repertoire-faceting/relation/calculations.rb +7 -2
  48. data/lib/repertoire-faceting/relation/query_methods.rb +17 -4
  49. data/lib/repertoire-faceting/routing.rb +2 -5
  50. data/lib/repertoire-faceting/tasks/all.rake +5 -4
  51. data/lib/repertoire-faceting/tasks/client.rake +2 -5
  52. data/lib/repertoire-faceting/version.rb +1 -1
  53. data/lib/repertoire-faceting.rb +2 -4
  54. data/{public → vendor/assets}/javascripts/google-earth-extensions.js +0 -0
  55. data/{public → vendor/assets}/javascripts/protovis.js +0 -0
  56. metadata +78 -78
  57. data/ext/README.signature +0 -33
  58. data/ext/signature.c +0 -740
  59. data/ext/signature.sql +0 -342
  60. data/ext/signature.sql.IN +0 -342
  61. data/ext/uninstall_signature.sql +0 -4
  62. data/ext/uninstall_signature.sql.IN +0 -4
  63. data/lib/repertoire-faceting/adapters/abstract_adapter.rb +0 -18
  64. data/lib/repertoire-faceting/relation/spawn_methods.rb +0 -26
data/README CHANGED
@@ -1,12 +1,12 @@
1
1
  === Repertoire Faceting README
2
2
 
3
- Repertoire Faceting is highly scalable and extensible module for creating database-driven faceted browsers in Rails 3. It consists of three components: (1) a native PostgreSQL data type for constructing fast bitset indices over controlled vocabularies; (2) Rails 3 model and controller mixins that add a faceting API to your existing application; and (3) a set of extensible javascript widgets for building user-interfaces. In only 10-15 lines of new code you can implement a fully-functional faceted browser for your existing Rails data models, with scalability out of the box to over 1,000,000 items.
3
+ Repertoire Faceting is a highly scalable and extensible module for creating database-driven faceted browsers in Rails 3 & 4. It consists of three components: (1) a native PostgreSQL data type for constructing fast bitset indices over controlled vocabularies; (2) Rails model and controller mixins that add a faceting API to your existing application; and (3) a set of extensible JavaScript widgets for building user interfaces. In only 10-15 lines of new code you can implement a fully functional faceted browser for your existing Rails data models, with scalability out of the box to over 1,000,000 items.
4
4
 
5
5
  == Features
6
6
 
7
7
  Several features distinguish Repertoire Faceting from other faceting systems such as Simile Exhibit, Endeca, and Solr.
8
8
 
9
- Repertoire Faceting is an end-to-end solution that works with your existing database schema and Rails models. There's no need to munge your data into a proprietary format, run a separate facet index server, or construct your own user-interface widgets. (Conversely, however, your project needs to use Rails 3, PostgreSQL, and JQuery.)
9
+ Repertoire Faceting is an end-to-end solution that works with your existing database schema and Rails models. There's no need to munge your data into a proprietary format, run a separate facet index server, or construct your own user-interface widgets. (Conversely, however, your project needs to use Rails, PostgreSQL, and jQuery.)
10
10
 
11
11
  The module works equally well on small and large data sets, which means there's a low barrier to entry but existing projects can grow easily. In 'training wheels' mode, the module produces SQL queries for facet counts directly from declarations in your model so you can get a project up and running quickly. Then, after your dataset grows beyond a thousand items or so, just add indices as necessary. The module detects and uses these automatically, with no changes to your code or additional SQL necessary.
12
12
 
@@ -16,7 +16,7 @@ Both facet widgets and indexes are pluggable and extensible. You can subclass th
16
16
 
17
17
  Similarly, you can write new facet implementations for novel data types, which automatically detect and index appropriate columns. For example, the module has been used to do facet value counts over GIS data points on a map, by drilling down through associated GIS layers using spatial logic relations.
18
18
 
19
- For an out-of-the box example using Repertoire Faceting, which demonstrates the module's visualization and scalability features, see the example application (http://github.com/yorkc/repertoire-faceting-example).
19
+ For an out-of-the-box example using Repertoire Faceting, which demonstrates the module's visualization and scalability features, see the example application (http://github.com/christopheryork/repertoire-faceting-example).
20
20
 
21
21
 
22
22
  == Installation
@@ -29,10 +29,9 @@ See the INSTALL document for a description of how to install the module and buil
29
29
  You can run the unit tests from the module's root directory. You will need a local PostgreSQL superuser role with your unix username (use 'createuser -Upostgres').
30
30
 
31
31
  $ bundle install
32
- $ rake db:faceting:build { sudo will prompt for your password }
33
- $ rake db:create
32
+ $ rake db:setup
34
33
  $ rake test
35
-
34
+
36
35
 
37
36
  == Generating documentation
38
37
 
@@ -68,9 +67,9 @@ See Repertoire::Faceting::Facets::AbstractFacet
68
67
  It is very useful to create a rake task to update your application's indices. In the project's rake task file:
69
68
 
70
69
  task :reindex => :environment do
71
- Painting.update_indexed_facets([:genre, :era])
70
+ Painting.index_facets([:genre, :era])
72
71
  end
73
-
72
+
74
73
  Then run 'rake reindex' whenever you need to update indices manually.
75
74
 
76
75
  *static* If the facet data is unchanging, use a rake task like the one above to create indices manually while developing or deploying.
@@ -84,7 +83,7 @@ Because repertoire-faceting depends on a native shared library loaded by the Pos
84
83
 
85
84
  <server>$ bundle install --deployment
86
85
  <server>$ export RAILS_ENV=production
87
- <server>$ rake db:faceting:build
86
+ <server>$ rake db:faceting:extensions:install
88
87
  <server>$ # ... from here, follow normal deployment procedure
89
88
 
90
89
 
@@ -96,7 +95,7 @@ There are three direct implementations for faceted classifications like this in
96
95
 
97
96
  *1:* Explicit controlled vocabulary, multiple valued facet
98
97
 
99
- genres plays_genres plays
98
+ genres plays_genres plays
100
99
  ----+--------- ---------+---------- ----+------------------+---------
101
100
  id | name play_id | genre_id id | title | date ...
102
101
  ----+--------- ---------+---------- ----+------------------|---------
@@ -112,7 +111,7 @@ There are three direct implementations for faceted classifications like this in
112
111
 
113
112
  *2:* Implicit vocabulary, multiple valued facet
114
113
 
115
- plays_genres plays
114
+ plays_genres plays
116
115
  ---------+---------- ----+------------------+---------
117
116
  play_id | genre_id id | title | date ...
118
117
  ---------+---------- ----+------------------|---------
@@ -127,8 +126,8 @@ There are three direct implementations for faceted classifications like this in
127
126
  ... ....
128
127
 
129
128
  *3:* Implicit vocabulary, single valued facet
130
-
131
- plays
129
+
130
+ plays
132
131
  ----+-----------------+---------+---------
133
132
  id | title | genre | date ...
134
133
  ----+-----------------|---------+---------
@@ -140,13 +139,13 @@ There are three direct implementations for faceted classifications like this in
140
139
  6 | Comedy of Errors | comedy |
141
140
  7 | Macbeth | tragedy |
142
141
  8 | Hamlet | tragedy |
143
- ... ....
142
+ ... ....
144
143
 
145
144
  For all of these representations, Repertoire Faceting works by constructing an inverted bitset index from the controlled vocabulary to your central model. Each bit represents a distinct model row (plays.id in this example). 1 indicates the play is in the category, and 0 that it is not:
146
145
 
147
146
  _plays_genre_facet
148
147
  ---------+-----------
149
- genre | signature
148
+ genre | signature
150
149
  ---------+-----------
151
150
  comedy | 00001100
152
151
  history | 01110000
@@ -156,10 +155,10 @@ For all of these representations, Repertoire Faceting works by constructing an i
156
155
  From these bitset "signatures", Repertoire Faceting can easily count the number of member plays for each category, even in combination with other facets and a base query. For example, the bitset signature for all plays whose title contains the search word "Henry" is 01100000. Masking this (via bitwise "and") with each signature in the genre index above, we see that there are 2 histories that match the base search - Henry 4 parts 1 & 2 - and none in the other categories:
157
156
 
158
157
  ---------+------------------
159
- genre | signature & base
158
+ genre | signature & base
160
159
  ---------+------------------
161
- comedy | 00000000
162
- history | 01100000
160
+ comedy | 00000000
161
+ history | 01100000
163
162
  romance | 00000000
164
163
  tragedy | 00000000
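
The masking step in the table above can be sketched in a few lines of Ruby. This is only an illustration under stated assumptions, not how the module computes counts (that happens inside PostgreSQL): signatures are modelled as Ruby integers parsed from bit strings, only the two index rows visible in this excerpt are included, and `masked_count` is a hypothetical helper name.

```ruby
# Illustrative only: signatures modelled as Ruby Integers parsed from bit
# strings. `masked_count` is a hypothetical helper, not part of the
# Repertoire Faceting API.
GENRE_INDEX = {
  "comedy"  => "00001100",
  "history" => "01110000"
}.freeze

# Base signature for plays whose title contains "Henry".
HENRY_BASE = "01100000"

# Bitwise-and a facet signature with the base mask, then count the set bits.
def masked_count(signature, base)
  (signature.to_i(2) & base.to_i(2)).to_s(2).count("1")
end

GENRE_COUNTS = GENRE_INDEX.transform_values { |sig| masked_count(sig, HENRY_BASE) }
puts GENRE_COUNTS.inspect
```

With these inputs the history entry comes out as 2 and the comedy entry as 0, matching the masked table.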
165
164
 
@@ -173,14 +172,198 @@ References on faceted search:
173
172
  - http://en.wikipedia.org/wiki/Controlled_vocabulary
174
173
 
175
174
 
176
- == Known issues
175
+ == A Quick Tour of API Levels
176
+
177
+ The Repertoire Faceting module is intended to be a toolkit for building highly scalable faceted browsers over data held in relational databases. While it can be used as a black box (see the INSTALL document for a recipe), each API is also designed to be called directly. For example, you might write your own facet widgets in JavaScript, using the JSON data feeds from the web service API level.
178
+
179
+ The API layers are:
180
+
181
+ - Javascript widgets [ see lib/assets/javascripts ]
182
+ - JSON web services [ see Repertoire::Faceting::Controller, Repertoire::Faceting::Routing ]
183
+ - Rails model & finders [ see Repertoire::Faceting::Model ]
184
+ - SQL queries and indexes [ see ext/README.faceting ]
185
+
186
+ To make the relationships between the APIs clear, here is the same basic facet count query traced through each layer. While the module itself does not always issue exactly the queries listed here, the basic data model is the same. (To experiment with the APIs, run psql or the rails console from the Repertoire Faceting Example application. You may wish to use "SET search_path = public, facet" to bring the faceting schema's namespace into scope.)
187
+
188
+ *** SQL API ***
189
+
190
+ The most basic facet count query is a simple SQL aggregation.
191
+
192
+ =# SELECT discipline, COUNT(*) FROM nobelists GROUP BY discipline;
193
+ discipline | count
194
+ ---------------------+-------
195
+ Chemistry | 12
196
+ Peace | 2
197
+ Economics | 13
198
+ Medicine/Physiology | 9
199
+ Physics | 27
200
+ (5 rows)
201
+
202
+ Here is the facet value index for nobelists.discipline, as described in the prior section of the README.
203
+
204
+ =# SELECT * FROM facet.nobelists_discipline_index;
205
+ discipline | signature
206
+ ---------------------+------------------------------------------------------------------
207
+ Physics | 01011011001100110100100101000011010100001000001110110100110001
208
+ Chemistry | 0000010000001100000000000000100000001011000101000100101
209
+ Economics | 0010000001000000000101001000010010100000001000000000000000111001
210
+ Medicine/Physiology | 000000001000000010100010001100000000000001000000000000010000001
211
+ Peace | 000000000000000000000000000000000000010000001
212
+ (5 rows)
213
+
214
+ The faceting API's count() function returns the number of set bits in a signature. The same query, using a facet value index:
215
+
216
+ =# SELECT discipline, facet.count(signature) FROM facet.nobelists_discipline_index;
217
+ discipline | count
218
+ ---------------------+-------
219
+ Physics | 27
220
+ Chemistry | 12
221
+ Economics | 13
222
+ Medicine/Physiology | 9
223
+ Peace | 2
224
+ (5 rows)
225
+
226
+ One of the cardinal virtues of faceted search is that facet value counts show the "landscape" of data surrounding a base query. For example, here is a raw facet value count using "Robert" as the base query.
227
+
228
+ =# SELECT discipline, COUNT(*) FROM nobelists WHERE name LIKE 'Robert%' GROUP BY discipline;
229
+ discipline | count
230
+ ------------+-------
231
+ Chemistry | 2
232
+ Physics | 1
233
+ Economics | 5
234
+ (3 rows)
235
+
236
+ (* Keep in mind that a proper data model would use a full-text index here.)
237
+
238
+ To run facet value counts, we first gather a signature representing the base query, then use it as a mask over each entry in the facet value index. Here is a representative base query in raw SQL:
239
+
240
+ =# SELECT id, name, discipline, _packed_id FROM nobelists WHERE name LIKE 'Robert%';
241
+ id | name | discipline | _packed_id
242
+ ------------+-----------------------+------------+------------
243
+ 57839852 | Robert Burns Woodward | Chemistry | 39
244
+ 506489850 | Robert B. Laughlin | Physics | 40
245
+ 920398821 | Robert M. Solow | Economics | 42
246
+ 54824727 | Robert S. Mulliken | Chemistry | 43
247
+ 309696094 | Robert J. Aumann | Economics | 58
248
+ 249288376 | Robert C. Merton | Economics | 59
249
+ 889316300 | Robert Engle | Economics | 60
250
+ 1039451971 | Robert A. Mundell | Economics | 63
251
+ (8 rows)
252
+
253
+ We can use the faceting API aggregator to read this result into a signature. (Note we use the serial id column, since the primary key is quite sparse.)
254
+
255
+ =# SELECT facet.signature(_packed_id) FROM nobelists WHERE name LIKE 'Robert%';
256
+ signature
257
+ ------------------------------------------------------------------
258
+ 0000000000000000000000000000000000000001101100000000000000111001
259
+ (1 row)
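
To make the aggregation above concrete, here is a minimal Ruby sketch of how packed ids could be collected into a signature. It assumes, as the sample output suggests, that id N sets the Nth bit counting from zero at the left; `signature_for` is a hypothetical name and not the extension's implementation.

```ruby
# Illustrative only: builds a bit-string signature from packed ids, assuming
# id N sets bit N, counting from zero at the left of the string.
# `signature_for` is a hypothetical helper name.
ROBERT_PACKED_IDS = [39, 40, 42, 43, 58, 59, 60, 63].freeze

def signature_for(ids)
  return "" if ids.empty?
  bits = Array.new(ids.max + 1, "0") # one bit per id from 0 up to the largest id
  ids.each { |id| bits[id] = "1" }
  bits.join
end

ROBERT_SIGNATURE = signature_for(ROBERT_PACKED_IDS)
puts ROBERT_SIGNATURE
```

Under that assumption the sketch yields a 64-bit string with eight bits set, one per matching row.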
260
+
261
+ Combining this base mask bitwise with each of the signatures in the facet value indices allows us to quickly calculate counts for very large datasets. (* For clarity we access the faceting namespace directly and use a subquery.)
262
+
263
+ =# SET search_path TO public, facet;
264
+ =# SELECT discipline, facet.count(index.signature & base.signature) FROM
265
+ (SELECT facet.signature(_packed_id) FROM nobelists WHERE name LIKE 'Robert%') AS base,
266
+ facet.nobelists_discipline_index AS index;
267
+
268
+
269
+ discipline | count
270
+ ---------------------+-------
271
+ Physics | 1
272
+ Chemistry | 2
273
+ Economics | 5
274
+ Medicine/Physiology | 0
275
+ Peace | 0
276
+ (5 rows)
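
The same calculation can be followed outside the database. The Ruby sketch below copies the index signatures from the listing earlier in this section and rebuilds the base mask from the packed ids of the "Robert" rows. It assumes bit N sits at position N from the left and that bits past the end of a shorter signature are zero; `and_count` is a hypothetical helper, not part of the faceting API.

```ruby
# Illustrative only: signatures as bit strings, position 0 at the left.
# Index rows are copied from the facet.nobelists_discipline_index listing.
DISCIPLINE_INDEX = {
  "Physics"             => "01011011001100110100100101000011010100001000001110110100110001",
  "Chemistry"           => "0000010000001100000000000000100000001011000101000100101",
  "Economics"           => "0010000001000000000101001000010010100000001000000000000000111001",
  "Medicine/Physiology" => "000000001000000010100010001100000000000001000000000000010000001",
  "Peace"               => "000000000000000000000000000000000000010000001"
}.freeze

# Base mask rebuilt from the packed ids of the "Robert" rows.
BASE_IDS  = [39, 40, 42, 43, 58, 59, 60, 63].freeze
BASE_MASK = (0..BASE_IDS.max).map { |i| BASE_IDS.include?(i) ? "1" : "0" }.join

# Count positions set in both signatures. Signatures may differ in length;
# positions missing from the shorter one are treated as 0.
def and_count(a, b)
  a.chars.zip(b.chars).count { |x, y| x == "1" && y == "1" }
end

DISCIPLINE_COUNTS = DISCIPLINE_INDEX.transform_values { |sig| and_count(sig, BASE_MASK) }
DISCIPLINE_COUNTS.each { |discipline, count| puts "#{discipline}: #{count}" }
```

A string-based representation is used here only for readability; the database bindings operate on packed bitmap types.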
277
+
278
+ If other facet values have been refined, they are also collected into signatures and used as masks.
279
+
280
+ =# SELECT discipline, facet.count(index.signature & base.signature & refine.signature) FROM
281
+ (SELECT facet.signature(_packed_id) FROM nobelists WHERE name LIKE 'Robert%') AS base,
282
+ (SELECT signature FROM nobelists_degree_index WHERE degree = 'Ph.D.') AS refine,
283
+ facet.nobelists_discipline_index AS index;
284
+
285
+ In this fashion, facet count queries can be reduced to a single table scan over the model for the base query, plus an index table scan for each facet that has been refined.
286
+
287
+ Each of the PostgreSQL API bindings implements these same operators, but over a different bitmap value type.
288
+
289
+ *** ActiveRecord API ***
290
+
291
+ [ See Repertoire::Faceting::Model for full details ]
177
292
 
178
- - Running the unit tests issues warnings about a circular require. These can be ignored.
293
+ The ActiveRecord API is built around the observation that our basic facet value count query is built into Rails:
179
294
 
295
+ > Nobelist.count(:discipline)
296
+ => {"Physics"=>27, "Economics"=>13, "Chemistry"=>12, "Medicine/Physiology"=>9, "Peace"=>2}
180
297
 
181
- == PostgreSQL C extensions
298
+ When the Repertoire Faceting module is loaded and facets declared in the model, the same query will read the facet index instead. Execute the query in the console, and you will see the SQL generated reads the facet value index rather than the model table.
182
299
 
183
- Repertoire Faceting adds a native data type supporting bitwise operations and population count to PostgreSQL. For API details, see ext/signature.sql.IN.
300
+ > Nobelist.index_facets([:discipline, :degree])
301
+ > Nobelist.count(:discipline)
302
+ => {"Physics"=>27, "Economics"=>13, "Chemistry"=>12, "Medicine/Physiology"=>9, "Peace"=>2}
303
+
304
+ Facets act just like Rails scoped queries, so you can use ActiveRecord's native syntax to specify a base query.
305
+
306
+ > Nobelist.where("name LIKE 'Robert%'").count(:discipline)
307
+ => {"Economics"=>5, "Chemistry"=>2, "Physics"=>1}
308
+
309
+ Use refine() to specify facet value refinements on other attributes.
310
+
311
+ > Nobelist.where("name LIKE 'Robert%'").refine(:degree => 'Ph.D.').count(:discipline)
312
+ => {"Economics"=>3, "Physics"=>1}
313
+
314
+ You will see from the SQL query that the faceting module is detecting which model column to use as an id key, then reading the facet value indices wherever possible. The result is similar to the final SQL API example above.
315
+
316
+ If you use refine() without count(), the module will use facet value indices to calculate the list of final results.
317
+
318
+ > Nobelist.where("name LIKE 'Robert%'").refine(:degree => 'Ph.D.')
319
+ => ...
320
+
321
+ Note that because facets are assumed to be multi-valued, refine() is different from a normal ActiveRecord where() clause. In Rails, an equivalent query would be:
322
+
323
+ > Nobelist.where("name LIKE 'Robert%'").joins(:affiliations).where('affiliations.degree' => 'Ph.D.')
324
+ => ...
325
+
326
+ When the number of rows in the model table is large and many facets come into play, using refine() can yield a performance gain over the straight query.
327
+
328
+ *** Web services API ***
329
+
330
+ The web services API exposes two JSON feeds, one that returns facet value counts given a set of refinements and another that returns the actual list of results. Your Rails controller constructs the base query, and the faceting webservice handles the surrounding facet refinements. For example, one of the faceting example application's controllers declares a query similar to this:
331
+
332
+ class NobelistsController < ApplicationController
333
+ ...
334
+ def base
335
+ search = "#{params[:search]}%"
336
+ Nobelist.where(["name ILIKE ?", search])
337
+ end
338
+
339
+ After including "faceting_for :nobelists" in the router, you can query the indexer by facet name, base query, and refinement filter:
340
+
341
+ $ curl -g "http://localhost:3000/nobelists/counts/discipline"
342
+ [["Physics",27],["Economics",13],["Chemistry",12],["Medicine/Physiology",9],["Peace",2]]
343
+
344
+ $ curl -g "http://localhost:3000/nobelists/counts/discipline?search=Robert"
345
+ [["Economics",5],["Chemistry",2],["Physics",1]]
346
+
347
+ $ curl -g "http://localhost:3000/nobelists/counts/discipline?filter[degree][]=Ph.D.&search=Robert"
348
+ [["Economics",3],["Physics",1]]
349
+
350
+ Or you can issue a refinement query to get the results list:
351
+
352
+ $ curl -g "http://localhost:3000/nobelists/?filter[degree][]=Ph.D.&search=Robert"
353
+ => ...
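
For clients written in Ruby rather than curl, the feed format above is easy to consume. The following sketch makes no HTTP request: the response body is copied from the refined count example, and the query string is built with the standard library. The constant names are illustrative only.

```ruby
require "json"
require "uri"

# Illustrative only: body copied from the curl example; no request is made.
RESPONSE_BODY = '[["Economics",3],["Physics",1]]'

# Each feed entry is a [facet value, count] pair, so the parsed array
# converts directly to a Hash.
FACET_COUNTS = JSON.parse(RESPONSE_BODY).to_h

# Query string for a refined count request. Rails-style array parameters
# use a trailing [] on the key; encode_www_form percent-encodes the brackets.
REFINE_QUERY = URI.encode_www_form("filter[degree][]" => "Ph.D.", "search" => "Robert")

puts FACET_COUNTS.inspect
puts REFINE_QUERY
```

The encoded query can be appended to a counts URL such as the ones shown above; the server decodes the percent-encoded brackets back into the filter[degree][] parameter.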
354
+
355
+
356
+ == Appendix. PostgreSQL in-database Faceting API
357
+
358
+ Several bindings for the in-database faceting API are provided. In order of capability, they are:
359
+
360
+ - signature C language, requires superuser permissions
361
+ - bytea Javascript language, requires plv8 extension
362
+ - varbit No language or superuser requirements
363
+
364
+ In general, if you have superuser permissions you should build and install the C-language (signature) API, as it is more scalable than the others, at no cost.
365
+
366
+ All the Repertoire Faceting APIs add functionality for bitwise operations and population counts to PostgreSQL. For API details, see the ext directory.
184
367
 
185
368
  Signature: an auto-sizing bitset with the following functions
186
369
 
@@ -189,14 +372,12 @@ Signature: an auto-sizing bitset with the following functions
189
372
  - members(a) => { set of integers corresponding to set bits }
190
373
 
191
374
  - sig_in, sig_out => { mandatory I/O functions }
192
- - sig_and(a, b) => a & b
193
- - sig_or(a, b) => a | b
194
- - sig_xor(a) => ~a
195
- - sig_length(a) => { number of bits in a }
196
- - sig_min(a) => { lowest 1 in a, a.length }
375
+ - sig_and(a, b) => a & b
376
+ - sig_or(a, b) => a | b
377
+ - sig_length(a) => { number of bits in a }
197
378
  - sig_get(a, i) => { ith bit of a, or 0 }
198
379
  - sig_set(a, i, n) => { sets ith bit of a to n }
199
- - sig_resize(a, n) => { resizes a to hold n bits }
380
+ - sig_resize(a, n) => { resizes a to hold at least n bits }
200
381
 
201
382
  Bitwise signature operators: &, |
202
383
 
@@ -206,12 +387,10 @@ Bitwise aggregates:
206
387
  - collect(signature) => 'or' signature results together
207
388
  - filter(signature) => 'and' signature results together
208
389
 
209
- Helper functions:
210
-
211
- - renumber_table(table, column, threshold)
390
+ Helper aggregates:
212
391
 
213
- Check what percentage of signature bits constructed from the specified table and column would be wasted. If higher than the threshold, drop and re-add the column with a packed, contiguous integer id.
392
+ - wastage(INT) -> REAL
214
393
 
215
- - signature_wastage(table, column)
394
+ Aggregator that examines a table's primary key column, checking what proportion of signature bits constructed from the table would be wasted. If the proportion of wasted bits to valid bits is high, you should consider adding a new serial column.
216
395
 
217
- Returns a real number representing the percentage of bits that would be wasted in signatures constructed from the specified column.
396
+ The Rails API introspects signature wastage before any facet indexing operation, and adds or removes a new serial column (called _packed_id) as necessary.
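
To see why sparse primary keys matter, consider the following Ruby sketch. It assumes a simple definition of wastage, the share of signature bits that never correspond to a row; the formula used by the real aggregate is not shown in this README, so treat `wastage` here as a hypothetical stand-in.

```ruby
# Illustrative only: an ASSUMED definition of wastage as the fraction of
# signature bits that correspond to no row. The real aggregate may differ.
def wastage(ids)
  return 0.0 if ids.empty?
  total_bits = ids.max + 1 # one bit per possible id, starting at 0
  1.0 - ids.uniq.size.to_f / total_bits
end

# Sparse primary keys (taken from the nobelists example) versus a packed,
# contiguous id column for the same four rows.
SPARSE_KEYS = [57839852, 506489850, 920398821, 54824727].freeze
PACKED_IDS  = [0, 1, 2, 3].freeze

puts wastage(SPARSE_KEYS)
puts wastage(PACKED_IDS)
```

With the sparse keys nearly every bit is wasted, while the packed ids waste none, which is the situation the _packed_id column is meant to address.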
data/TODO CHANGED
@@ -5,9 +5,9 @@ DESIRED FEATURES / IMPROVEMENTS.
5
5
  Adding support is a matter of defining the adapter and moving signature()
6
6
  calls into the postgresql adapter ]
7
7
 
8
- -- modify widgets to multiplex ajax calls to work around 2 call limit
8
+ -- modify widgets to multiplex ajax calls to work around 2 call limit
9
9
  in many browsers
10
- [ design: fetch() queues ajax calls; update() implementations request
10
+ [ design: fetch() queues ajax calls; update() implementations request
11
11
  queue and merge current webservice call with ones already in the queue.
12
12
  on controller side, receive multiple facet names, iterate, and bundle.
13
13
  fetch() then unbundles the results and dispatches them to appropriate
@@ -16,13 +16,26 @@ DESIRED FEATURES / IMPROVEMENTS.
16
16
 
17
17
  TODO
18
18
 
19
- - deploy to menzinga, document process
20
- - make sure works with postgresql-crontab
19
+ -- gemcutter release of new faceting gem
20
+ -- revise example app to bundle gem DONE
21
+ -- redeploy to bytea / varbit targets DONE
22
+
23
+ -- reference to the ActiveRecord API in the README DONE
24
+ -- revise the README to describe API layers & use DONE
25
+
26
+ -- clean up population in postgresql adapter? NO
27
+ -- clean up throughout the facet definitions PARTIAL
28
+ -- test harness
29
+ - test all extensions in order, skipping if cannot load DONE
30
+ -- simplify the in-database API
31
+ - use materialized views instead of dropping and adding tables DONE
32
+ - move signature_wastage into an aggregate function DONE
33
+ - remove the renumber function DONE
34
+ - the only remaining function should be expand_nesting MOVED
35
+ -- Makefile is verbose; use patterns NO
21
36
  - check formatting of all docs DONE
22
-
23
37
  - make rake tasks smarter about detecting whether to run NOT NECESSARY
24
38
  - make rake tasks automatically choose task dep on db type NOT TO DO
25
-
26
39
  - make sure example app can be set up and deployed via rake DONE
27
40
 
28
41
  - README
@@ -34,7 +47,7 @@ TODO
34
47
  * installing postgresql extensions DONE
35
48
  * migrations for indexing DONE
36
49
  * updating indices (a) postgresql-crontab (b) rake crontab DONE
37
- - generate routes that don't conflict with resource routes
50
+ - generate routes that don't conflict with resource routes
38
51
  (rails thinks /nobelists/results is the nobelist named 'results') NOT TO DO
39
52
  - FAQ
40
53
  * "not grouped error"
data/ext/Makefile CHANGED
@@ -1,27 +1,37 @@
1
1
  #-------------------------------------------------------------------------
2
2
  #
3
3
  # Makefile--
4
- # Makefile for Repertoire bitset signature type
4
+ # Makefile for Repertoire faceting API
5
5
  #
6
6
  # By default, this builds against an existing PostgreSQL installation
7
7
  # (the one identified by whichever pg_config is first in your path).
8
8
  #
9
9
  #-------------------------------------------------------------------------
10
-
11
- MODULES = signature
12
- DATA_built = signature.sql uninstall_signature.sql
13
- # DOCS = README.signature # TODO. postgres 8.3 & 8.4 look for doc files in different places;
14
- # temporarily commented out until we have migrated
15
10
 
16
- SHLIB_LINK = $(BE_DLLLIBS)
17
-
11
+ API_VERSION = 0.6.0
12
+
13
+ MODULES = signature/signature
14
+ EXTENSION = signature/faceting \
15
+ bytea/faceting_bytea \
16
+ varbit/faceting_varbit
17
+ DATA_built = faceting--$(API_VERSION).sql \
18
+ faceting_bytea--$(API_VERSION).sql \
19
+ faceting_varbit--$(API_VERSION).sql
20
+ DOCS = README.faceting
21
+
18
22
  PG_CONFIG = pg_config
19
23
  PGXS := $(shell $(PG_CONFIG) --pgxs)
20
- include $(PGXS)
21
24
 
22
- #-- Repertoire specific targets
23
- signature.sql:
24
- cp signature.sql.IN signature.sql
25
+ -include $(PGXS)
26
+ # the dash above means loading pgxs may fail silently on Heroku, but
27
+ # the API bindings will still be built:
28
+ api_bindings : $(DATA_built)
29
+
30
+ faceting--$(API_VERSION).sql: signature/signature.sql common/util.sql
31
+ cat signature/signature.sql common/util.sql > $@
32
+
33
+ faceting_bytea--$(API_VERSION).sql: bytea/bytea.sql common/util.sql
34
+ cat bytea/bytea.sql common/util.sql > $@
25
35
 
26
- uninstall_signature.sql:
27
- cp uninstall_signature.sql.IN uninstall_signature.sql
36
+ faceting_varbit--$(API_VERSION).sql: varbit/varbit.sql common/util.sql
37
+ cat varbit/varbit.sql common/util.sql > $@
@@ -0,0 +1,51 @@
1
+ --
2
+ --
3
+ -- In-database support for Repertoire faceting module.
4
+ --
5
+ -- These libraries add scalable faceted indexing to the PostgreSQL database.
6
+ --
7
+ -- Basic approach is similar to other faceted browsers (Solr, Exhibit): an inverted bitmap index
8
+ -- allows fast computation of facet value counts, given a base context and prior facet refinements.
9
+ -- Bitsets can also be used to compute the result set of items.
10
+ --
11
+ -- There are three bindings for the API. The first extends PostgreSQL with a new bitset datatype
12
+ -- written in C (called 'signature'). This version provides scalable faceting up to 1,000,000 items
13
+ -- and beyond, but requires control over the PostgreSQL server instance to build and load the C
14
+ -- extensions.
15
+ --
16
+ -- The second is implemented using PostgreSQL's built-in VARBIT data type, and scales to a rough
17
+ -- limit of about 30,000 items. It works in exactly the same way as the 'signature' data type above,
18
+ -- but is about a factor of 5-10 slower. However, it does not require administrative control over
19
+ -- the database server to install or use and so is suited to shared host deployment.
20
+ --
21
+ -- The third uses PostgreSQL's built-in BYTEA data type, processed via the Google Javascript
22
+ -- language binding plv8 (https://code.google.com/p/plv8js/wiki/PLV8). Scalability and performance
23
+ -- are unknown, but should be similar to the native C 'signature' type. However, the server needs
24
+ -- to have the PLV8 language extension installed.
25
+ --
26
+ -- Only one binding of the API needs to be loaded at any time. Each consists of:
27
+ --
28
+ -- (a) functions for accessing the bitset data types. These are used to store inverted indices from
29
+ -- facet values to item ids. Functions are provided for doing refinements and counts on items
30
+ -- with a given facet value.
31
+ --
32
+ -- (b) facilities for adding a packed (contiguous) id sequence to the main item table. Packed ids
33
+ -- are used in the facet value indexes.
34
+ --
35
+ -- (c) utility functions for creating/updating packed ids and facet value index tables, e.g. in
36
+ -- a crontab task.
37
+ --
38
+ -- The API bindings can each be built as a PostgreSQL extension, and then loaded and dropped using
39
+ -- CREATE EXTENSION <faceting|faceting_bytea|faceting_varbit> and DROP EXTENSION ...
40
+ --
41
+ -- For hosts without administrative access, the individual sql files can be sourced directly.
42
+ --
43
+ -- Installation (in a Rails app)
44
+ --
45
+ -- $ cd repertoire-faceting
46
+ -- $ rake db:faceting:extensions:install
47
+ --
48
+ -- Installation (PostgreSQL APIs only)
49
+ --
50
+ -- $ cd repertoire-faceting/ext
51
+ -- $ make; sudo make install