repertoire-faceting 0.5.2 → 0.5.3

data/README CHANGED
@@ -1,429 +1,217 @@
1
- == Deployment
2
-
3
- * repertoire-assets compression
4
-
5
- * deploying native extensions to server
1
+ === Repertoire Faceting README
6
2
 
7
- * sub-uris
3
+ Repertoire Faceting is a highly scalable and extensible module for creating database-driven faceted browsers in Rails 3. It consists of three components: (1) a native PostgreSQL data type for constructing fast bitset indices over controlled vocabularies; (2) Rails 3 model and controller mixins that add a faceting API to your existing application; and (3) a set of extensible JavaScript widgets for building user interfaces. In only 10-15 lines of new code you can implement a fully functional faceted browser for your existing Rails data models, with scalability out of the box to over 1,000,000 items.
8
4
 
9
- * do NOT use schema.rb with repertoire-faceting
10
-
11
-
12
- == Running unit tests
5
+ == Features
13
6
 
14
- You can run the unit tests from the module's root directory. You will need a
15
- local PostgreSQL superuser role with your unix username (use 'createuser -Upostgres').
7
+ Several features distinguish Repertoire Faceting from other faceting systems such as Simile Exhibit, Endeca, and Solr.
16
8
 
17
- $ bundle install
18
- $ rake db:faceting:build { sudo will prompt for your password }
19
- $ rake db:create
20
- $ rake test
9
+ Repertoire Faceting is an end-to-end solution that works with your existing database schema and Rails models. There's no need to munge your data into a proprietary format, run a separate facet index server, or construct your own user-interface widgets. (Conversely, however, your project needs to use Rails 3, PostgreSQL, and jQuery.)
21
10
 
11
+ The module works equally well on small and large data sets, which means there's a low barrier to entry but existing projects can grow easily. In 'training wheels' mode, the module produces SQL queries for facet counts directly from declarations in your model so you can get a project up and running quickly. Then, after your dataset grows beyond a thousand items or so, just add indices as necessary. The module detects and uses these automatically, with no changes to your code or additional SQL necessary.
22
12
 
23
- == Known issues
13
+ Unlike some faceting systems, Repertoire Faceting supports hierarchical vocabularies out of the box. Using familiar SQL expressions you can decompose a date field into a drillable year / month / day facet. Or you can combine several columns into a single nested facet, for example from countries to states to cities.
24
14
 
25
- - Running the unit tests issues warnings about a circular require. These can be ignored.
15
+ Both facet widgets and indexes are pluggable and extensible. You can subclass the JavaScript widgets to build drillable data visualizations, for example using bar graphs, donut and scatter charts, or heat maps to display the current search state and results.
26
16
 
17
+ Similarly, you can write new facet implementations for novel data types, which automatically detect and index appropriate columns. For example, the module has been used to do facet value counts over GIS data points on a map, by drilling down through associated GIS layers using spatial logic relations.
27
18
 
28
- == Updating Indices
19
+ For an out-of-the-box example using Repertoire Faceting, which demonstrates the module's visualization and scalability features, see the example application (http://github.com/yorkc/repertoire-faceting-example).
29
20
 
30
- Depending on how frequently you need to update facet indices, there are several options.
31
21
 
32
- (1) UNIX crontab, periodic, or launchd
22
+ == Installation
33
23
 
34
- Writing a rake task to update the facet indices for a given model is the simplest way to go. In your application's
35
- rake task file, use something like the following:
24
+ See the INSTALL document for a description of how to install the module and build a basic faceted browser for your existing Rails app.
36
25
 
37
- task :reindex => :environment do
38
- Nobelist.update_indexed_facets([:degree, :discipline])
39
- end
40
-
41
- Then configure your system-level crontab task to run 'rake reindex' at the required interval.
42
-
43
- (2) Using Repertoire's postgresql-crontab functionality
44
26
 
45
- If your project uses the Repertoire in-database crontab system, you can use migrations to insert SQL calls that
46
- re-index your model. However, since the database has no access to rails you will need to call the plpgsql
47
- index maintenance functions directly. In your application's indexing migration, use something like the following:
48
-
49
- class IndexFacets < ActiveRecord::Migration
50
- def self.up
51
- execute<<-SQL
52
- SQL
53
- end
54
-
55
- def self.down
56
- execute<<-SQL
57
- SQL
58
- end
59
- end
60
-
61
- (3) Using a one-time Rails migration
62
-
63
- If your facets' contents will not change, then you can create a rake task as in step 1 and run it by hand. Or
64
- create a migration with the same calls as the rake task.
65
-
66
-
67
-
68
- == Deployment
27
+ == Running unit tests
69
28
 
70
- Because repertoire-faceting depends on a native shared library loaded by the
71
- PostgreSQL server, the first time you deploy you will need to build and install
72
- the extension.
29
+ You can run the unit tests from the module's root directory. You will need a local PostgreSQL superuser role with your unix username (use 'createuser -Upostgres').
73
30
 
74
- <server>$ bundle install --deployment
75
- <server>$ export RAILS_ENV=production
76
- <server>$ rake db:facetinginstall
77
- <server>$ rake db:facetingload
78
-
31
+ $ bundle install
32
+ $ rake db:faceting:build { sudo will prompt for your password }
33
+ $ rake db:create
34
+ $ rake test
79
35
 
80
36
 
37
+ == Generating documentation
81
38
 
82
- Rails3 documentation revisions
83
-
84
- - requires ruby 1.9.2 (ordered hashes)
85
- - include_root_in_json = false
86
- - rake db:facetingpostgres:install, :load
87
-
88
- To cover in the README:
89
-
90
- - special features
91
- - scalability
92
- - nested facets
93
- - facet visualizations
94
- - extensible facet widgets, pluggable implementations
95
- - easy learning curve:
96
- - 'training wheels' mode
97
- - automatic index generation and detection
98
- [ no SQL necessary ]
99
- - persistent facet indices ( no separate server )
100
- - cross-database
101
- - packed ids
102
-
103
- - how facet indices work
104
- - why in-database
39
+ All API documentation, for both Ruby and JavaScript, is inline. To generate it:
105
40
 
106
- - faceting: controlled vocabularies
107
- relational representations
41
+ $ rake doc
108
42
 
109
- - maintaining facet indices
110
- - via migration
111
- - via rake task
43
+ For the javascript API documentation, please look in the source files.
112
44
 
113
- - ordering goes by facet name (not original column)
114
- - default grouping: column with facet name on last join
115
45
 
116
- =====
46
+ == Faceting declarations (Model API)
117
47
 
118
- API for faceted browsing within PostgreSQL, with extensions for Merb and jquery ajax widgets
48
+ See Repertoire::Faceting::Model::ClassMethods
119
49
 
120
- ===== Strategy =====
121
50
 
122
- - faceting only on individual tables (table is context)
123
- - packed id (0..n) kept for each model with facets
124
- - faceting tables kept in each project schema
125
- - each facet has own index table (named _{table}_{facet}_facet, of facet value -> signature)
126
- - nested facet values kept as arrays
127
- - update facet index (a) explicitly, (b) using Repertoire's crontab sweeper, or (c) both
51
+ == Faceting webservices (Controller API)
128
52
 
129
- ===== PostgreSQL extensions =====
53
+ See Repertoire::Faceting::Controller
130
54
 
131
- = The following code leads you through an example of the SQL calls to set up and execute
132
- = a faceted navigation.
133
55
 
134
- = You can either use this SQL interface directly, or the Datamapper model and migration extensions
135
- = above (which wrap the SQL interface).
56
+ == Facet widgets / HTML (User Interface API)
136
57
 
137
- == Setup ==
58
+ See rep.faceting.js inline documentation in the source tree
138
59
 
139
- = creating facet indices =
140
60
 
141
- CREATE TABLE _project_status_facet AS SELECT status, signature(id) FROM project;
61
+ == Custom facet implementations
142
62
 
143
- [ This creates an inverted bitmap index of facet values and project ids. ]
144
- [ However, it wastes space since even unused id numbers occupy a bit in each entry. Let's use packed ids beginning at 1 instead. ]
63
+ See Repertoire::Faceting::Facets::AbstractFacet
145
64
 
146
- SELECT renumber_table('project', '_packed_id');
147
- CREATE TABLE _project_status_facet AS SELECT status, signature(_packed_id) FROM project GROUP BY status;
148
65
 
149
- [ Now the inverted index contains packed bitmaps that refer to a new column on the base table ]
150
- [ But what if we're refreshing an existing facet index instead of creating one from scratch? ]
66
+ == Updating Facet Indices
151
67
 
152
- SELECT renumber_table('project', '_packed_id');
153
- SELECT recreate_table('_project_status_facet', 'SELECT status, signature(_packed_id) FROM project GROUP BY status');
68
+ It is very useful to create a rake task to update your application's indices. In the project's rake task file:
154
69
 
155
- [ Now we can run this pair of commands to regenerate the facet indices periodically (uses Repertoire's crontab): ]
156
-
157
- INSERT INTO crontab(notice, role, task, interval)
158
- VALUES ('Update Project facets', 'web', $$
159
- SELECT renumber_table('project', '_packed_id');
160
- SELECT recreate_table('_project_status_facet', 'SELECT status, signature(_packed_id) FROM project GROUP BY status');
161
- $$, '10 seconds');
162
-
163
- [ Simply declare any other facet indices in separate recreate_table statements and set the refresh cycle time ]
164
- [ But what if you have a complex data model whose facet values are defined by a join? This example indexes a multivalued many-to-many join ]
165
-
166
- ... SELECT recreate_table('_project_feature_facet',
167
- 'SELECT features.name AS feature, signature(project._packed_id) ' ||
168
- 'FROM project JOIN project_features ON (project_id = project.id) JOIN features ON (feature_id = features.id) ' ||
169
- 'GROUP BY features.name');
170
-
171
- [ Finally, to index nested facets collect the values into an array. Then call expand_nesting to post-process the facet index.
172
- (This adds entries for interior nodes in the nesting tree.) ]
173
-
174
- ... SELECT recreate_table('_project_start_facet',
175
- 'SELECT ARRAY[ EXTRACT(year FROM date), EXTRACT(month FROM date), EXTRACT(day FROM date) ] AS start, signature(_packed_id) ' ||
176
- 'FROM project GROUP BY date');
177
- SELECT expand_nesting('_project_start_facet', 'start');
70
+   task :reindex => :environment do
71
+     Painting.update_indexed_facets([:genre, :era])
72
+   end
73
+
74
+ Then run 'rake reindex' whenever you need to update indices manually.
178
75
 
76
+ *static* If the facet data is unchanging, use a rake task like the one above to create indices manually while developing or deploying.
179
77
 
180
- == Faceted Navigation ==
78
+ *crontab* The easiest way to update indices periodically is to run a rake task like the one above via a UNIX tool such as launchd, periodic, or crontab. See the documentation for your tool of choice.
181
79
 
182
- = to get a context base signature (via fulltext search):
183
80
 
184
- SELECT signature(_packed_id) FROM project WHERE _fulltext @@ to_tsquery('Bush');
81
+ == Deployment
185
82
 
186
- = to get refinement filter signature (using indices declared above):
83
+ Because repertoire-faceting depends on a native shared library loaded by the PostgreSQL server, the first time you deploy you will need to build and install the extension.
84
+
85
+ <server>$ bundle install --deployment
86
+ <server>$ export RAILS_ENV=production
87
+ <server>$ rake db:faceting:build
88
+ <server>$ # ... from here, follow normal deployment procedure
89
+
90
+
91
+ == How the module works
92
+
93
+ It is helpful to think of faceted data as a set of model items categorised by one or more controlled vocabularies, as this eliminates confusion from the start. (A faceted classification is neither object-oriented nor relational, though it can be represented in either.) For example, one might categorise Shakespeare's plays by a controlled vocabulary of genres -- comedy, history, tragedy, or romance. Counting the total number of plays for each vocabulary item in this "genre" facet, we see 13 comedies, 10 histories, 10 tragedies, and 4 romances.
94
+
95
+ There are three direct implementations for faceted classifications like this in an SQL database. The controlled vocabulary can be listed explicitly in a separate table, or it can be implicit in the range of values in a column on the central table (for single-valued facets) or on a join table (for multi-valued facets). Repertoire Faceting supports all of these configurations.
96
+
97
+ *1:* Explicit controlled vocabulary, multiple valued facet
98
+
99
+   genres              plays_genres              plays
100
+   ----+---------      ---------+----------      ----+------------------+---------
101
+    id | name           play_id | genre_id        id | title            | date ...
102
+   ----+---------      ---------+----------      ----+------------------+---------
103
+     1 | comedy               1 | 4                1 | The Tempest      |
104
+     2 | tragedy              2 | 3                2 | Henry 4, pt 1    |
105
+     3 | history              3 | 3                3 | Henry 4, pt 2    |
106
+     4 | romance              4 | 3                4 | Henry 5          |
107
+                              5 | 1                5 | As You Like It   |
108
+                              6 | 1                6 | Comedy of Errors |
109
+                              7 | 2                7 | Macbeth          |
110
+                              8 | 2                8 | Hamlet           |
111
+                            ...                   ....
112
+
113
+ *2:* Implicit vocabulary, multiple valued facet
114
+
115
+   plays_genres              plays
116
+   ---------+----------      ----+------------------+---------
117
+    play_id | genre           id | title            | date ...
118
+   ---------+----------      ----+------------------+---------
119
+          1 | romance          1 | The Tempest      |
120
+          2 | history          2 | Henry 4, pt 1    |
121
+          3 | history          3 | Henry 4, pt 2    |
122
+          4 | history          4 | Henry 5          |
123
+          5 | comedy           5 | As You Like It   |
124
+          6 | comedy           6 | Comedy of Errors |
125
+          7 | tragedy          7 | Macbeth          |
126
+          8 | tragedy          8 | Hamlet           |
127
+        ...                   ....
128
+
129
+ *3:* Implicit vocabulary, single valued facet
130
+
131
+ plays
132
+ ----+-----------------+---------+---------
133
+ id | title | genre | date ...
134
+ ----+-----------------|---------+---------
135
+ 1 | The Tempest | romance |
136
+ 2 | Henry 4, pt 1 | history |
137
+ 3 | Henry 4, pt 2 | history |
138
+ 4 | Henry 5 | history |
139
+ 5 | As You Like It | comedy |
140
+ 6 | Comedy of Errors | comedy |
141
+ 7 | Macbeth | tragedy |
142
+ 8 | Hamlet | tragedy |
143
+ ... ....
144
+
145
+ For all of these representations, Repertoire Faceting works by constructing an inverted bitset index from the controlled vocabulary to your central model. Each bit represents a distinct model row (plays.id in this example). 1 indicates the play is in the category, and 0 that it is not:
146
+
147
+ _plays_genre_facet
148
+ ---------+-----------
149
+ genre | signature
150
+ ---------+-----------
151
+ comedy | 00001100
152
+ history | 01110000
153
+ romance | 10000000
154
+ tragedy | 00000011
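As a rough illustration of how such an index could be assembled, the following Ruby sketch builds the same four signatures in memory. It is not the module's implementation (which lives in PostgreSQL); the `PLAYS` hash, `build_index`, and `render` are hypothetical names, and a plain Integer stands in for the native signature type.

```ruby
# Hypothetical in-memory stand-in for the plays table (id => genre).
PLAYS = {
  1 => "romance", 2 => "history", 3 => "history", 4 => "history",
  5 => "comedy",  6 => "comedy",  7 => "tragedy", 8 => "tragedy"
}

# Build genre => signature. A signature here is an Integer used as a bitset:
# bit (id - 1) is set when the play with that id belongs to the genre.
def build_index(rows)
  rows.each_with_object(Hash.new(0)) do |(id, genre), index|
    index[genre] |= 1 << (id - 1)
  end
end

# Render a signature in the notation of the table above,
# where the leftmost character corresponds to id 1.
def render(signature, width)
  (0...width).map { |bit| signature[bit] }.join
end

INDEX = build_index(PLAYS)
INDEX.sort.each do |genre, signature|
  puts format("%-8s| %s", genre, render(signature, PLAYS.size))
end
```

Using an Integer keeps the sketch short because Ruby integers grow as needed, loosely mirroring the "auto-sizing" behaviour described for the native type.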
155
+
156
+ From these bitset "signatures", Repertoire Faceting can easily count the number of member plays for each category, even in combination with other facets and a base query. For example, the bitset signature for all plays whose title contains the search phrase "Henry 4" is 01100000. Masking this (via bitwise "and") with each signature in the genre index above, we see that there are 2 histories that match the base search (Henry 4, parts 1 and 2) and none in the other categories:
157
+
158
+ ---------+------------------
159
+ genre | signature & base
160
+ ---------+------------------
161
+ comedy | 00000000
162
+ history | 01100000
163
+ romance | 00000000
164
+ tragedy | 00000000
165
+
166
+ Refinements on other facets are processed similarly, by looking up the relevant bitset signature for the refined value, and masking it against each potential value in the facet to be enumerated.
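The masking and counting step can be sketched in a few lines of Ruby. This is only an illustration under stated assumptions: `parse`, `popcount`, and `facet_counts` are hypothetical helpers, signatures are written in the table notation used above (leftmost character is id 1), and the base signature 01100000 selects plays 2 and 3.

```ruby
# Genre index in the notation of the tables above (leftmost character is id 1).
GENRE_INDEX = {
  "comedy"  => "00001100",
  "history" => "01110000",
  "romance" => "10000000",
  "tragedy" => "00000011"
}

# Parse the table notation into an Integer bitset (bit 0 corresponds to id 1).
def parse(signature)
  signature.reverse.to_i(2)
end

# Population count: the number of set bits in the bitset.
def popcount(bits)
  bits.to_s(2).count("1")
end

# For each facet value, mask its signature with the base signature
# and count the surviving bits.
def facet_counts(index, base)
  index.transform_values { |signature| popcount(parse(signature) & parse(base)) }
end

counts = facet_counts(GENRE_INDEX, "01100000")
counts.each { |genre, count| puts "#{genre}: #{count}" }
```

A refinement on another facet would simply contribute one more signature to the bitwise "and" before counting.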
167
+
168
+ As you may have noticed, this scheme depends on play ids being sequential. Otherwise many bits corresponding to nonexistent ids are wasted in every signature. To address this issue, Repertoire Faceting examines the projected wastage in constructing bitset signatures from the primary key id of your model table. If more than a predefined amount (e.g. 15%) of the signature would be wasted, the module instead adds a new column of sequentially packed ids that are used only for faceted searches. When the model's facets are re-indexed, the ids are examined and repacked if too much space is wasted.
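To make the wastage idea concrete, here is a small Ruby sketch. The exact formula the module uses is not given in this document, so the calculation below is an assumption: it treats a signature as spanning bits 1 through the largest id and counts every bit without a matching row as wasted. `wastage_percent`, `pack_ids`, and `THRESHOLD` are hypothetical names.

```ruby
# Percentage of signature bits that would correspond to no existing row.
# Assumption: bit positions run from 1 up to the largest id, ids are distinct.
def wastage_percent(ids)
  return 0.0 if ids.empty?
  (ids.max - ids.size) * 100.0 / ids.max
end

# Map each original id to a packed, contiguous id starting at 1.
def pack_ids(ids)
  ids.sort.each_with_index.map { |id, position| [id, position + 1] }.to_h
end

# Hypothetical threshold, following the "e.g. 15%" figure in the text.
THRESHOLD = 15.0

ids = [1, 2, 50, 100]
if wastage_percent(ids) > THRESHOLD
  packed = pack_ids(ids)
  puts "repacked: #{packed.inspect}"
end
```

With sparse ids like these, nearly every bit would be wasted, so a packed id column is the cheaper representation.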
169
+
170
+ References on faceted search:
171
+
172
+ - http://flamenco.berkeley.edu/pubs.html
173
+ - http://en.wikipedia.org/wiki/Controlled_vocabulary
187
174
 
188
- SELECT filter(signature) FROM
189
- (SELECT signature FROM _project_feature_facet WHERE feature = 'visualize'
190
- UNION
191
- SELECT signature FROM _project_status_facet WHERE status = 'in progress') AS filter
192
- ...
193
175
 
194
- = assuming you have the base and refinement signatures already calculated, get facet value counts with
176
+ == Known issues
195
177
 
196
- SELECT feature, count FROM
197
- (SELECT feature, count(<base> & <filter> & signature)
198
- FROM _project_feature_facet) AS facet
199
- WHERE count > 0 ORDER BY count DESC, feature ASC;
178
+ - Running the unit tests issues warnings about a circular require. These can be ignored.
200
179
 
201
- = or, to get facet results (e.g.):
202
180
 
203
- SELECT * FROM project WHERE contains(base & filter, _packed_id) ...; -- option 1: contains(signature, int) -> boolean
204
- SELECT * FROM project, members(base & filter) WHERE _packed_id = members ...; -- option 2: members(signature) -> set of int
181
+ == PostgreSQL C extensions
205
182
 
206
- [ of these, the second will generally be faster since it allows Postgres to do a merge join rather than a nested loop ]
183
+ Repertoire Faceting adds a native data type supporting bitwise operations and population count to PostgreSQL. For API details, see ext/signature.sql.IN.
207
184
 
208
- = a variation: to get counts for a facet with nested values, send in an array
185
+ Signature: an auto-sizing bitset with the following functions
209
186
 
210
- SELECT start_date, count FROM
211
- (SELECT start_date, count(<base> & <filter> & signature)
212
- FROM _project_start_date_facet
213
- WHERE start_date[1:2] = ARRAY [ '1993', '12' ]) AS facet
214
- WHERE count > 0 ORDER BY count DESC, start_date ASC;
215
-
216
- = putting it all together - if you're querying directly using SQL, you'll probably use sub-selects to compute the base and filter
217
- signatures:
218
- { this is a count query on the facet 'region', base query 'Bush', filter feature=browse&pi=Fendt, ordered by count descending }
187
+ - count(a) => { count of 1s in a }
188
+ - contains(a, i) => { true if the ith bit of a is set }
189
+ - members(a) => { set of integers corresponding to set bits }
219
190
 
220
- SELECT facet.region, count(base.signature & filter.signature & facet.signature) FROM _projects_region_facet,
221
- (SELECT signature(_packed_id) FROM project WHERE fulltext @@ to_tsquery('Bush')) AS base,
222
- (SELECT filter(signature) AS signature FROM
223
- (SELECT signature FROM _project_feature_facet WHERE feature = 'browse'
224
- UNION
225
- SELECT signature FROM _project_pi_facet WHERE pi = 'Fendt')) AS filter
226
- WHERE count > 0 ORDER BY count DESC, facet.region ASC;
191
+ - sig_in, sig_out => { mandatory I/O functions }
192
+ - sig_and(a, b) => a & b
193
+ - sig_or(a, b) => a | b
194
+ - sig_xor(a) => ~a
195
+ - sig_length(a) => { number of bits in a }
196
+ - sig_min(a) => { lowest 1 in a, a.length }
197
+ - sig_get(a, i) => { ith bit of a, or 0 }
198
+ - sig_set(a, i, n) => { sets ith bit of a to n }
199
+ - sig_resize(a, n) => { resizes a to hold n bits }
227
200
 
228
- = by using standard SQL WHERE and ORDER BY clauses, you can achieve most standard facet value configurations
229
- [ order by count, order by facet value, include/exclude zero values, etc. ]
230
-
201
+ Bitwise signature operators: &, |
231
202
 
232
- ===== PostgreSQL C data types =====
203
+ Bitwise aggregates:
233
204
 
234
- Signature: an auto-sizing bitset with the following functions
205
+ - signature(int) => assemble ints into a signature
206
+ - collect(signature) => 'or' signature results together
207
+ - filter(signature) => 'and' signature results together
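The semantics of these aggregates can be modelled in plain Ruby. The sketch below is not the extension's code: `assemble`, `or_all`, `and_all`, and `set_bits` are hypothetical stand-ins for signature(int), collect, filter, and members respectively, an Integer stands in for the signature type, and the bit numbering (bit i for integer i) is an assumption.

```ruby
# Models signature(int): assemble integers into a bitset (bit i set for each i).
def assemble(ints)
  ints.reduce(0) { |signature, i| signature | (1 << i) }
end

# Models collect(signature): 'or' all signatures together.
def or_all(signatures)
  signatures.reduce(0) { |a, b| a | b }
end

# Models filter(signature): 'and' all signatures together.
# Returns nil for an empty list, since there is nothing to intersect.
def and_all(signatures)
  signatures.reduce { |a, b| a & b }
end

# Models members(a): the integers whose bits are set.
def set_bits(signature)
  (0...signature.bit_length).select { |i| signature[i] == 1 }
end

a = assemble([1, 2, 3])
b = assemble([2, 3, 4])
puts "or:  #{set_bits(or_all([a, b])).inspect}"
puts "and: #{set_bits(and_all([a, b])).inspect}"
```

In a faceted query, the "and" form combines refinements from several facets, while the "or" form merges several values selected within one facet.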
235
208
 
236
- count(a) => { count of 1s in a }
237
- contains(a, i) => { true if the ith bit of a set }
209
+ Helper functions:
238
210
 
239
- sig_in, sig_out => { mandatory I/O functions }
240
- sig_and(a, b) => a & b
241
- sig_or(a, b) => a | b
242
- sig_xor(a) => ~a
243
- sig_length(a) => { number of bits in a }
244
- sig_min(a) => { lowest 1 in a, a.length }
245
- sig_get(a, i) => { ith bit of a, or 0 }
246
- sig_set(a, i, n) => { sets ith bit of a to n }
247
- sig_resize(a, n) => { resizes a to hold n bits }
211
+ - renumber_table(table, column, threshold)
248
212
 
249
- Bitwise signature operators: &, |
213
+ Check what percentage of signature bits constructed from the specified table and column would be wasted. If higher than the threshold, drop and re-add the column with a packed, contiguous integer id.
250
214
 
251
- Bitwise aggregates:
215
+ - signature_wastage(table, column)
252
216
 
253
- signature(int) => assemble ints into a signature
254
- collect(signature) => 'or' signature results together
255
- filter(signature) => 'and' signature results together
256
-
257
-
258
- ===== Future extensions =====
259
-
260
- = FUTURE: to declare a faceting index using a datamapper hook rather than a postgresql crontab,
261
- = provide a block that returns a hash. (This example is functionally equivalent to the above.)
262
-
263
- is :faceted do |model|
264
- { :genre => model.genre,
265
- :published => [ model.date.year, model.date.month, model.date.day ] }
266
- end
267
-
268
-
269
- ===== Using facets with GIS =====
270
-
271
- The repertoire-faceting module provides optional support for faceting over
272
- data associated with GIS features or points. For example, rather than
273
- selecting the words 'USA', 'Massachusetts', and 'Boston' successively to
274
- refine on the nested values of a textual facet, the user could drill down by
275
- clicking on a map. From a GIS perspective, this involves selecting
276
- intersecting features from a series of GIS layers (in this case, 'countries',
277
- 'states', and 'cities'). At each stage, the display colors features in the map
278
- according to a choropleth distribution of results given other selected values.
279
-
280
- From a conceptual standpoint a GIS facet behaves exactly the same way as a
281
- nested textual facet. However, standard GIS operations can be used to
282
- affiliate items in the result set with map features: for example, if faceting
283
- over a person's city of residence, only points in the final layer ('cities')
284
- need be directly connected to items in the result set. The remainder of the
285
- associations are computed using spatial logic operators in the database.
286
-
287
- Textual facets can either derive the range of facet values from data in your
288
- tables or can use a separate controlled vocabulary. Because when you view
289
- facet values in a map you generally want the entire map to appear but with
290
- those features associated to items highlighted, GIS facets nearly always use a
291
- controlled vocabulary. (Otherwise the widget would only display a fragmented
292
- map of values associated with items in the current result set.) Hence, the
293
- process of preparing data for faceted GIS browsing generally involves two
294
- steps: (a) marshalling a complete set of GIS data for all layers into a single
295
- table and preparing it for display on the web; and (b) creating a faceted
296
- index over the association between your result items and the GIS features.
297
-
298
- N.B. - You must be familiar with GIS concepts, be proficient with the
299
- PostgreSQL GIS extension (PostGIS), and your GIS data needs to be in good form
300
- before considering faceted indexing or browsing. Repertoire-faceting builds
301
- faceted browsing on top of your existing GIS, rather than automating GIS
302
- upload and display. You should be conversant with GIS databases and your data
303
- needs to be in good form before even considering faceted indexing or browsing.
304
- Ensure that each GIS layer has the correct SRID for its existing projection
305
- and that you can transform the layers into a common SRID and render them
306
- successfully. Also consider whether you need to simplify or combine geometries
307
- using ST_Union(), ST_Simplify(), ST_Transform(), etc. in order to render the
308
- features on the web with appropriate detail for the altitude you anticipate
309
- viewing each layer. If your layers display together correctly using a standard
310
- WMS (GeoServer is suggested) and rendered in Google Earth or Open Layers, then
311
- you are probably ready to proceed. Be aware you must use the EPSG:4269 (NAD
312
- 83) projection with Google Earth.
313
-
314
- - Preparing your system, a rough recipe:
315
-
316
- 1. Install PostGIS into your database
317
-
318
- sudo port install postgis
319
- psql -Upostgres hyperstudio_development -f /opt/local/share/postgresql84/contrib/postgis.sql
320
- psql -Upostgres repertoire_testing -f /opt/local/share/postgresql84/contrib/spatial_ref_sys.sql
321
-
322
- 2. Install Google Earth + Plugin
323
-
324
- 3. Install GeoServer (suggested for testing, not deployment)
325
-
326
- 4. load GIS data into PostgreSQL, using shp2pgsql, ogr2ogr, and other tools.
327
- (Make sure you know and set the SRID).
328
-
329
- 5. ensure GIS layers display together using GeoServer + Google Earth
330
-
331
- - Preparing your project to use GIS faceting
332
-
333
- 1. Register with Google for an Earth API key and add a script element to load
334
- it in your layout. (Since Google requires your API key, you must add this even
335
- though the remainder of the project may use repertoire-assets to load
336
- scripts.)
337
-
338
- <script src="http://www.google.com/jsapi?key=<your key code here>" type="text/javascript"></script>
339
-
340
- 2. In your database, create a table to hold the controlled vocabulary of
341
- features from every layer you may facet over. 'layer' is an integer indicating
342
- the nesting order of GIS layers (e.g. countries to states to cities). 'label'
343
- identifies the feature, e.g. 'California'. (The label needn't be unique.)
344
- 'full_geom' holds the complete geometry for that facet value, translated to a
345
- common projection (e.g. EPSG:4269). 'display_geom' holds the geometry that
346
- should actually be rendered when the feature is displayed in the facet (a
347
- simplified version of full_geom). For example:
348
-
349
- CREATE TABLE my_gis_vocab(id SERIAL, label TEXT, layer INTEGER);
350
- SELECT AddGeometryColumn('my_gis_vocab', 'display_geom', 4269, 'GEOMETRY', 2);
351
- SELECT AddGeometryColumn('my_gis_vocab', 'full_geom', 4269, 'GEOMETRY', 2);
352
- INSERT INTO my_gis_vocab(label, layer, display_geom, full_geom)
353
- SELECT country_name, 1, ST_SimplifyPreserveTopology(ST_Transform(the_geom, 4269), 0.002), the_geom FROM countries;
354
- INSERT INTO my_gis_vocab(label, layer, display_geom, full_geom)
355
- SELECT state_name, 2, ST_Transform(the_geom, 4269), the_geom FROM states;
356
- ...
357
-
358
- (Note that it is not strictly necessary to create a separate table for the
359
- controlled vocabulary, as it is also possible to do similar GIS
360
- transformations when creating a GIS facet value index and then UNION the
361
- results together. However, the SQL is simpler and this route allows you to
362
- verify and tweak the controlled vocabulary table using GeoServer.)
363
-
364
- 3. Generate the actual GIS facet index. Whereas normal facet indices have two
365
- columns, the value and the signature, GIS indices have five: an md5 checksum
366
- of the full geometry, and the fields in the controlled vocabulary (label,
367
- layer number, display geometry, and full geometry).
368
-
369
- Because the result item probably links to a GIS table rather than including
370
- the geometry as a column, we join these in a subselect (*a below). We then use
371
- the ST_Within spatial operator to find all of the features in the GIS
372
- controlled vocabulary that contain this feature (*b). Because there should be
373
- an index entry for every feature in the controlled vocabulary regardless of
374
- whether it is used by one of the items, we use a left outer join to combine
375
- the vocabulary with the signatures (*c) and coalesce to make sure the empty
376
- entries also have a signature (*d).
377
-
378
- SELECT recreate_table('_people_birthplace_facet', $$
379
- SELECT md5(full_geom::bytea) AS birthplace, label, layer,
380
- --- (*d)
381
- display_geom, full_geom, coalesce(signature, '0'::signature) AS signature
382
- FROM my_gis_vocab
383
- --- (*c)
384
- LEFT OUTER JOIN
385
- (SELECT my_gis_vocab.id AS vocab_id, signature(_packed_id)
386
- --- (*a)
387
- FROM my_gis_vocab,
388
- nobelists JOIN nobelist_cities ON (birth_city_id = nobelist_cities.id)
389
- --- (*b)
390
- WHERE ST_Within(nobelist_cities.the_geom, my_gis_vocab.full_geom)
391
- GROUP BY my_gis_vocab.id) AS signatures
392
- ON (vocab_id = my_gis_vocab.id)
393
- $$);
394
-
395
- Finally, a PostGIS index ensures that facet nesting queries are efficient:
396
-
397
- CREATE INDEX _nobelists_birthplace_facet_ndx ON _nobelists_birthplace_facet USING gist(full_geom);
398
-
399
- 4. In your model, declare that your facet uses 'geom' logic, e.g.:
400
-
401
- is :faceted, :birthplace => :geom
402
-
403
- 5. Usage in your controller and models is identical to traditional facets.
404
-
405
- 6. In your view, use the GIS widget, e.g.:
406
-
407
- $().ready(function() {
408
- $('#people').facet_context();
409
- $('#occupation').facet();
410
- $("#birthplace").earth_facet( {
411
- title: 'Location',
412
- camera: { lat: 37, long: -94, tilt: 4, altitude: 4000000, speed: 0.2 }
413
- });
414
- });
415
-
416
- Available options are documented in the earth faceting widget. You can set the
417
- initial camera position, the number of categories and coloring for choropleth
418
- maps, and other visual characteristics.
419
-
420
- 7. In the future, a helper that automatically assembles the controlled
421
- vocabulary and creates the facet index may be added. It would accept a select
422
- statement associating items ids and their geometries; and a series of layer
423
- definitions. E.g.:
424
-
425
- -- SELECT recreate_gis_facet('people', 'birthplace',
426
- -- "SELECT _packed_id, the_geom FROM people JOIN cities ON (people.birth_city_id = cities.id)",
427
- -- "SELECT city, the_geom, the_geom FROM nobelist_cities",
428
- -- "SELECT country, the_geom, ST_SimplifyPreserveTopology(the_geom, 0.02) FROM nobelist_countries")
429
- -- $$);
217
+ Returns a real number representing the percentage of bits that would be wasted in signatures constructed from the specified column.
data/TODO CHANGED
@@ -1,13 +1,29 @@
1
+ DESIRED FEATURES / IMPROVEMENTS.
2
+
3
+ -- mysql support
4
+ [ design: mysql has a count function and bitwise operators for blobs.
5
+ Adding support is a matter of defining the adapter and moving signature()
6
+ calls into the postgresql adapter ]
7
+
8
+ -- modify widgets to multiplex ajax calls to work around 2 call limit
9
+ in many browsers
10
+ [ design: fetch() queues ajax calls; update() implementations request
11
+ queue and merge current webservice call with ones already in the queue.
12
+ on controller side, receive multiple facet names, iterate, and bundle.
13
+ fetch() then unbundles the results and dispatches them to appropriate
14
+ callbacks ]
15
+
16
+
1
17
  TODO
2
18
 
19
+ - deploy to menzinga, document process
20
+ - make sure works with postgresql-crontab
21
+ - check formatting of all docs DONE
22
+
3
23
  - make rake tasks smarter about detecting whether to run NOT NECESSARY
4
24
  - make rake tasks automatically choose task dep on db type NOT TO DO
5
25
 
6
26
  - make sure example app can be set up and deployed via rake DONE
7
- - fulltext indexing for citizens in migration
8
-
9
- - facet index for citizens... migration or rake task?
10
- - gender not getting generated right
11
27
 
12
28
  - README
13
29
  * recipe for running tests DONE
@@ -71,7 +87,7 @@ TODO
71
87
  - get core examples working DONE
72
88
  - 'training wheels' using plain-jane SQL DONE
73
89
  - facets should inherit scope DONE
74
- - install procedure for new app
90
+ - install procedure for new app DONE
75
91
  * arrange to load without a generator/config file DONE
76
92
  * check rake tasks load when installed in an app DONE
77
93
  - system for checking presence of indexes DONE
@@ -87,30 +103,3 @@ TODO
87
103
  - new method: relation.facet[:name] ... finds the facet relation, merges with current one, and returns result DONE
88
104
  - sorting defaults NOT TO DO
89
105
  - prettier output for raw facet relations in to_s NOT TO DO [ db dependent ]
90
-
91
-
92
- Changes in this version
93
-
94
- - the 'type' parameter, which cast facet values to a given type, is gone
95
- - query execution is delayed until results are wanted
96
- - type specific formatters
97
-
98
-
99
- KNOWN PROBLEMS.
100
-
101
- -- none at the moment
102
-
103
-
104
- DESIRED FEATURES / IMPROVEMENTS.
105
-
106
- -- "training-wheels" mode using SQL group statements instead of bitsets
107
- -- migrations using DM facet declarations
108
-
109
- -- clean up sql generation in postgres adapter DONE
110
- -- determine exact memory requirements of postgres scalability example DONE
111
- -- optimise queries up to 1,000,000 scalability target DONE
112
-
113
- -- modify widgets to multiplex ajax calls to work around 2 call limit in many browsers
114
- [ design: fetch() queues ajax calls; update() implementations request queue and merge current webservice call with
115
- ones already in the queue. on controller side, receive multiple facet names, iterate, and bundle. fetch()
116
- then unbundles the results can dispatches them to appropriate callbacks ]