cequel 1.0.2 → 1.0.3

data/LICENSE ADDED
@@ -0,0 +1,19 @@
1
+ The MIT License (MIT)
2
+ Copyright (c) 2012 Brewster Inc., Mat Brown
3
+
4
+ Permission is hereby granted, free of charge, to any person obtaining a copy of
5
+ this software and associated documentation files (the "Software"), to deal in
6
+ the Software without restriction, including without limitation the rights to
7
+ use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
8
+ of the Software, and to permit persons to whom the Software is furnished to do
9
+ so, subject to the following conditions:
10
+
11
+ The above copyright notice and this permission notice shall be included in all
12
+ copies or substantial portions of the Software.
13
+
14
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
15
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
16
+ FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
17
+ COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
18
+ IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
19
+ CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
@@ -0,0 +1,525 @@
1
+ # Cequel #
2
+
3
+ Cequel is a Ruby ORM for [Cassandra](http://cassandra.apache.org/) using
4
+ [CQL3](http://www.datastax.com/documentation/cql/3.0/webhelp/index.html).
5
+
6
+ [![Gem
7
+ Version](https://badge.fury.io/rb/cequel.png)](http://badge.fury.io/rb/cequel)
8
+ [![Dependency
9
+ Status](https://gemnasium.com/cequel/cequel.png)](https://gemnasium.com/cequel/cequel)
10
+ [![Code
11
+ Climate](https://codeclimate.com/github/cequel/cequel.png)](https://codeclimate.com/github/cequel/cequel)
12
+ [![Inline docs](http://inch-pages.github.io/github/cequel/cequel.png)](http://inch-pages.github.io/github/cequel/cequel)
13
+
14
+ `Cequel::Record` is an ActiveRecord-like domain model layer that exposes
15
+ the robust data modeling capabilities of CQL3, including parent-child
16
+ relationships via compound primary keys and collection columns.
17
+
18
+ The lower-level `Cequel::Metal` layer provides a CQL query builder interface
19
+ inspired by the excellent [Sequel](http://sequel.rubyforge.org/) library.
20
+
21
+ ## Installation ##
22
+
23
+ Add it to your Gemfile:
24
+
25
+ ``` ruby
26
+ gem 'cequel'
27
+ ```
28
+
29
+ ### Rails integration ###
30
+
31
+ Cequel does not require Rails, but if you are using Rails, you
32
+ will need version 3.2+. Cequel::Record will read from the configuration file
33
+ `config/cequel.yml` if it is present. You can generate a default configuration
34
+ file with:
35
+
36
+ ```bash
37
+ rails g cequel:configuration
38
+ ```
39
+
40
+ Once you've got things configured (or decided to accept the defaults), run this
41
+ to create your keyspace (database):
42
+
43
+ ```bash
44
+ rake cequel:keyspace:create
45
+ ```
46
+
47
+ ## Setting up Models ##
48
+
49
+ Unlike in ActiveRecord, models declare their properties inline. We'll start with
50
+ a simple `Blog` model:
51
+
52
+ ```ruby
53
+ class Blog
54
+ include Cequel::Record
55
+
56
+ key :subdomain, :text
57
+ column :name, :text
58
+ column :description, :text
59
+ end
60
+ ```
61
+
62
+ Unlike a relational database, Cassandra does not have auto-incrementing primary
63
+ keys, so you must explicitly set the primary key when you create a new model.
64
+ For blogs, we use a natural key, which is the subdomain. Another option is to
65
+ use a UUID.
66
+
67
+ ### Compound keys and parent-child relationships ###
68
+
69
+ While Cassandra is not a relational database, compound keys do naturally map
70
+ to parent-child relationships. Cequel supports this explicitly with the
71
+ `has_many` and `belongs_to` relations. Let's create a model for posts that acts
72
+ as the child of the blog model:
73
+
74
+ ```ruby
75
+ class Post
76
+ include Cequel::Record
77
+ belongs_to :blog
78
+ key :id, :timeuuid, auto: true
79
+ column :title, :text
80
+ column :body, :text
81
+ end
82
+ ```
83
+
84
+ The `auto` option for the `key` declaration means Cequel will initialize new
85
+ records with a UUID already generated. This option is only valid for `:uuid` and
86
+ `:timeuuid` key columns.
87
+
88
+ Note that the `belongs_to` declaration must come *before* the `key` declaration.
89
+ This is because `belongs_to` defines the
90
+ [partition key](http://www.datastax.com/documentation/cql/3.0/webhelp/index.html#cql/ddl/../../cassandra/glossary/gloss_glossary.html#glossentry_dhv_s24_bk); the `id` column is
91
+ the [clustering column](http://www.datastax.com/documentation/cql/3.0/webhelp/index.html#glossentry_h31_xjk_bk).
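
To picture how a partition key and a clustering column work together, here is a plain-Ruby sketch that needs neither Cassandra nor Cequel. The nested hash and the variable names are illustrative assumptions only: the outer hash stands in for partitions keyed by `blog_subdomain`, and the inner hash for rows keyed by `id`.

```ruby
# Hypothetical in-memory picture of a compound primary key: the outer hash
# is keyed by partition key, the inner hash by clustering column.
posts = Hash.new { |table, partition| table[partition] = {} }
posts['bigdata'][2] = { title: 'Second post' }
posts['bigdata'][1] = { title: 'First post' }
posts['nosql'][1]   = { title: 'Hello NoSQL' }

# Rows within one partition are read back ordered by clustering column,
# regardless of the order in which they were written.
titles = posts['bigdata'].sort_by { |id, _row| id }.map { |_id, row| row[:title] }
p titles
```

A single post is addressed by both parts of the key, as in `posts['bigdata'][1]`.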
92
+
93
+ Practically speaking, this means that posts are accessed using both the
94
+ `blog_subdomain` (automatically defined by the `belongs_to` association) and the
95
+ `id`. The most natural way to represent this type of lookup is using a
96
+ `has_many` association. Let's add one to `Blog`:
97
+
98
+ ```ruby
99
+ class Blog
100
+ include Cequel::Record
101
+
102
+ key :subdomain, :text
103
+ column :name, :text
104
+ column :description, :text
105
+
106
+ has_many :posts
107
+ end
108
+ ```
109
+
110
+ Now we might do something like this:
111
+
112
+ ```ruby
113
+ class PostsController < ActionController::Base
114
+ def show
115
+ @post = Blog.find(current_subdomain).posts.find(params[:id])
116
+ end
117
+ end
118
+ ```
119
+
120
+ ### Schema synchronization ###
121
+
122
+ Cequel will automatically synchronize the schema stored in Cassandra to match
123
+ the schema you have defined in your models. If you're using Rails, you can
124
+ synchronize your schemas for everything in `app/models` by invoking:
125
+
126
+ ```bash
127
+ rake cequel:migrate
128
+ ```
129
+
130
+ ### Record sets ###
131
+
132
+ Record sets are lazy-loaded collections of records that correspond to a
133
+ particular CQL query. They behave similarly to ActiveRecord scopes:
134
+
135
+ ```ruby
136
+ Post.select(:id, :title).reverse.limit(10)
137
+ ```
138
+
139
+ To scope a record set to a primary key value, use the `[]` operator. This will
140
+ define a scoped value for the first unscoped primary key in the record set:
141
+
142
+ ```ruby
143
+ Post['bigdata'] # scopes posts with blog_subdomain="bigdata"
144
+ ```
145
+
146
+ You can pass multiple arguments to the `[]` operator, which will generate an
147
+ `IN` query:
148
+
149
+ ```ruby
150
+ Post['bigdata', 'nosql'] # scopes posts with blog_subdomain IN ("bigdata", "nosql")
151
+ ```
152
+
153
+ To select ranges of data, use `before`, `after`, `from`, `upto`, and `in`. Like
154
+ the `[]` operator, these methods operate on the first unscoped primary key:
155
+
156
+ ```ruby
157
+ Post['bigdata'].after(last_id) # scopes posts with blog_subdomain="bigdata" and id > last_id
158
+ ```
159
+
160
+ Note that record sets always load records in batches; Cassandra does not support
161
+ result sets of unbounded size. This process is transparent to you but you'll see
162
+ multiple queries in your logs if you're iterating over a huge result set.
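
The batching idea can be sketched in plain Ruby. This is an illustration under stated assumptions, not Cequel's actual query logic: `ROWS`, `fetch_batch`, and `each_row` are hypothetical names, and each call to `fetch_batch` stands in for one limited query that resumes after the last key seen.

```ruby
# Hypothetical key-sorted "table"; each fetch_batch call models one query.
ROWS = (1..7).map { |id| { id: id, title: "Post #{id}" } }.freeze

# Returns up to `limit` rows whose id is greater than `after_id`.
def fetch_batch(after_id, limit)
  ROWS.select { |row| after_id.nil? || row[:id] > after_id }.first(limit)
end

# Yields every row, fetching them one batch at a time.
def each_row(batch_size)
  last_id = nil
  loop do
    batch = fetch_batch(last_id, batch_size)
    batch.each { |row| yield row }
    break if batch.size < batch_size # a short batch means no rows remain
    last_id = batch.last[:id]
  end
end

ids = []
each_row(3) { |row| ids << row[:id] }
p ids
```

With seven rows and a batch size of three, the loop issues three fetches, which is why several queries can show up in the logs for a single iteration.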
163
+
164
+ #### Time UUID Queries ####
165
+
166
+ CQL has [special handling for the `timeuuid`
167
+ type](http://www.datastax.com/documentation/cql/3.0/webhelp/index.html#cql/cql_reference/cql_data_types_c.html#reference_ds_axc_xk5_yj),
168
+ which allows you to return rows whose UUID keys correspond to a range of
169
+ timestamps.
170
+
171
+ Cequel automatically constructs timeuuid range queries if you pass a `Time`
172
+ value for a range over a `timeuuid` column. So, if you want to get the posts
173
+ from the last day, you can run:
174
+
175
+ ```ruby
176
+ Blog['myblog'].posts.from(1.day.ago)
177
+ ```
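
One way to express such a lower bound in CQL is the `minTimeuuid` function. The helper below is a hypothetical sketch that only builds the clause as a string; it is an assumption about what an equivalent query could look like, and Cequel may construct the range differently.

```ruby
# Hypothetical helper, not part of Cequel: formats a Time as a CQL
# timestamp literal and wraps it in minTimeuuid for an inclusive lower bound.
def timeuuid_lower_bound_clause(column, time)
  stamp = time.utc.strftime('%Y-%m-%d %H:%M:%S+0000')
  "#{column} >= minTimeuuid('#{stamp}')"
end

clause = timeuuid_lower_bound_clause('id', Time.utc(2014, 1, 1, 12, 0, 0))
puts clause
```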
178
+
179
+ ### Updating records ###
180
+
181
+ When you update an existing record, Cequel will only write statements to the
182
+ database that correspond to explicit modifications you've made to the record in
183
+ memory. So, in this situation:
184
+
185
+ ```ruby
186
+ @post = Blog.find(current_subdomain).posts.find(params[:id])
187
+ @post.update_attributes!(title: "Announcing Cequel 1.0")
188
+ ```
189
+
190
+ Cequel will only update the title column. Note that this is not full dirty
191
+ tracking; simply setting the title on the record will signal to Cequel that you
192
+ want to write that attribute to the database, regardless of its previous value.
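
The difference from dirty tracking can be sketched in plain Ruby. `SketchRecord`, `assign`, and `pending_writes` are hypothetical names used for illustration, not Cequel's API: the record remembers every attribute that was assigned, whether or not the value changed.

```ruby
# Hypothetical record that remembers which attributes were assigned,
# whether or not the new value differs from the old one.
class SketchRecord
  def initialize(attributes)
    @attributes = attributes.dup
    @written = {}
  end

  def assign(name, value)
    @attributes[name] = value
    @written[name] = value
  end

  # Columns that a save would send to the database.
  def pending_writes
    @written.dup
  end
end

post = SketchRecord.new(title: 'Announcing Cequel 1.0', body: 'Long text')
post.assign(:title, 'Announcing Cequel 1.0') # same value as before
p post.pending_writes
```

Here `title` is still queued for writing even though its value is unchanged, while `body` is left alone because it was never assigned.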
193
+
194
+ ### Unloaded models ###
195
+
196
+ In the above example, we call the familiar `find` method to load a blog and then
197
+ one of its posts, but we didn't actually do anything with the data in the Blog
198
+ model; it was simply a convenient object-oriented way to get a handle to the
199
+ blog's posts. Cequel supports unloaded models via the `[]` operator; this will
200
+ return an **unloaded** blog instance, which knows the value of its primary key,
201
+ but does not read the row from the database. So, we can refactor the example to
202
+ be a bit more efficient:
203
+
204
+ ```ruby
205
+ class PostsController < ActionController::Base
206
+ def show
207
+ @post = Blog[current_subdomain].posts.find(params[:id])
208
+ end
209
+ end
210
+ ```
211
+
212
+ If you attempt to access a data attribute on an unloaded instance, it will
213
+ lazy-load the row from the database and become a normal loaded instance.
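
The lazy-loading behavior can be sketched in plain Ruby. `LazyBlog` and its loader block are hypothetical stand-ins, not Cequel classes; the block plays the role of the query that reads the row.

```ruby
# Hypothetical lazy-loading record; the loader block stands in for a query.
class LazyBlog
  attr_reader :subdomain

  def initialize(subdomain, &loader)
    @subdomain = subdomain
    @loader = loader
    @attributes = nil
  end

  def loaded?
    !@attributes.nil?
  end

  # Reading a data attribute triggers the load the first time only.
  def name
    @attributes ||= @loader.call(@subdomain)
    @attributes[:name]
  end
end

queries = 0
blog = LazyBlog.new('cassandra') do |key|
  queries += 1
  { name: "Blog at #{key}" }
end

blog.subdomain # key access: no query
blog.name      # first data access: one query
blog.name      # already loaded: no further query
puts queries
```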
214
+
215
+ You can generate a collection of unloaded instances by passing multiple
216
+ arguments to `[]`:
217
+
218
+ ```ruby
219
+ class BlogsController < ActionController::Base
220
+ def recommended
221
+ @blogs = Blog['cassandra', 'nosql']
222
+ end
223
+ end
224
+ ```
225
+
226
+ The above will not generate a CQL query, but when you access a property on any
227
+ of the unloaded `Blog` instances, Cequel will load data for all of them with
228
+ a single query. Note that CQL does not allow selecting collection columns when
229
+ loading multiple records by primary key; only scalar columns will be loaded.
230
+
231
+ There is another use for unloaded instances: you may set attributes on an
232
+ unloaded instance and call `save` without ever actually reading the row from
233
+ Cassandra. Because Cassandra is optimized for writing data, this "write without
234
+ reading" pattern gives you maximum efficiency, particularly if you are updating
235
+ a large number of records.
236
+
237
+ ### Collection columns ###
238
+
239
+ Cassandra supports three types of collection columns: lists, sets, and maps.
240
+ Collection columns can be manipulated using atomic collection mutation; e.g.,
241
+ you can add an element to a set without knowing the existing elements. Cequel
242
+ supports this by exposing collection objects that keep track of their
243
+ modifications, and which then persist those modifications to Cassandra on save.
244
+
245
+ Let's add a category set to our post model:
246
+
247
+
248
+ ```ruby
249
+ class Post
250
+ include Cequel::Record
251
+
252
+ belongs_to :blog
253
+ key :id, :uuid
254
+ column :title, :text
255
+ column :body, :text
256
+ set :categories, :text
257
+ end
258
+ ```
259
+
260
+ If we were to then update a post like so:
261
+
262
+ ```ruby
263
+ @post = Blog[current_subdomain].posts[params[:id]]
264
+ @post.categories << 'Kittens'
265
+ @post.save!
266
+ ```
267
+
268
+ Cequel would send the CQL equivalent of "Add the category 'Kittens' to the post
269
+ at the given `(blog_subdomain, id)`", without ever reading the saved value of
270
+ the `categories` set.
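
To make the idea concrete, here is a plain-Ruby sketch of a set wrapper that records additions and renders them as a CQL set-append assignment. `TrackedSet` and `to_cql_assignment` are hypothetical names, and the `posts` table name is an assumption; Cequel's real collection classes are not shown here.

```ruby
require 'set'

# Hypothetical wrapper that records additions instead of loading the
# stored set; Cequel's real collection classes are not shown here.
class TrackedSet
  attr_reader :added

  def initialize
    @added = Set.new
  end

  def <<(element)
    @added << element
    self
  end

  # Builds the `column = column + {...}` assignment for pending additions,
  # or nil when there is nothing to write.
  def to_cql_assignment(column)
    return nil if @added.empty?
    literals = @added.map { |e| "'#{e.gsub("'", "''")}'" }.join(', ')
    "#{column} = #{column} + {#{literals}}"
  end
end

categories = TrackedSet.new
categories << 'Kittens'
statement = "UPDATE posts SET #{categories.to_cql_assignment('categories')} " \
            "WHERE blog_subdomain = ? AND id = ?"
puts statement
```

Because the statement only appends to the column, the existing elements never need to be read.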
271
+
272
+ ### Secondary indexes ###
273
+
274
+ Cassandra supports secondary indexes, although with notable restrictions:
275
+
276
+ * Only scalar data columns can be indexed; key columns and collection columns
277
+ cannot.
278
+ * A secondary index consists of exactly one column.
279
+ * Though you can have more than one secondary index on a table, you can only use
280
+ one in any given query.
281
+
282
+ Cequel supports the `:index` option to add secondary indexes to column
283
+ definitions:
284
+
285
+ ```ruby
286
+ class Post
287
+ include Cequel::Record
288
+
289
+ belongs_to :blog
290
+ key :id, :uuid
291
+ column :title, :text
292
+ column :body, :text
293
+ column :author_id, :uuid, :index => true
294
+ set :categories, :text
295
+ end
296
+ ```
297
+
298
+ Defining a column with a secondary index adds several "magic methods" for using
299
+ the index:
300
+
301
+ ```ruby
302
+ Post.with_author_id(id) # returns a record set scoped to that author_id
303
+ Post.find_by_author_id(id) # returns the first post with that author_id
304
+ Post.find_all_by_author_id(id) # returns an array of all posts with that author_id
305
+ ```
306
+
307
+ You can also call the `where` method directly on record sets:
308
+
309
+ ```ruby
310
+ Post.where(:author_id, id)
311
+ ```
312
+
313
+ Note that `where` is only for secondary indexed columns; use `[]` to scope
314
+ record sets by primary keys.
315
+
316
+ ### ActiveModel Support ###
317
+
318
+ Cequel supports ActiveModel functionality, such as callbacks, validations,
319
+ dirty attribute tracking, naming, and serialization. If you're using Rails 3,
320
+ mass-assignment protection works as usual, and in Rails 4, strong parameters are
321
+ treated correctly. So we can add some extra ActiveModel goodness to our post
322
+ model:
323
+
324
+ ```ruby
325
+ class Post
326
+ include Cequel::Record
327
+
328
+ belongs_to :blog
329
+ key :id, :uuid
330
+ column :title, :text
331
+ column :body, :text
332
+
333
+ validates :body, presence: true
334
+
335
+ after_save :notify_followers
336
+ end
337
+ ```
338
+
339
+ Note that validations or callbacks that need to read data attributes will cause
340
+ unloaded models to load their row during the course of the save operation, so if
341
+ you are following a write-without-reading pattern, you will need to be careful.
342
+
343
+ Dirty attribute tracking is only enabled on loaded models.
344
+
345
+ ## Upgrading from Cequel 0.x ##
346
+
347
+ Cequel 0.x targeted CQL2, which has a substantially different data
348
+ representation from CQL3. Accordingly, upgrading from Cequel 0.x to Cequel 1.0
349
+ requires some changes to your data models.
350
+
351
+ ### Upgrading a Cequel::Model ###
352
+
353
+ Upgrading from a `Cequel::Model` class is fairly straightforward; simply add the
354
+ `compact_storage` directive to your class definition:
355
+
356
+ ```ruby
357
+ # Model definition in Cequel 0.x
358
+ class Post
359
+ include Cequel::Model
360
+
361
+ key :id, :uuid
362
+ column :title, :text
363
+ column :body, :text
364
+ end
365
+
366
+ # Model definition in Cequel 1.0
367
+ class Post
368
+ include Cequel::Record
369
+
370
+ key :id, :uuid
371
+ column :title, :text
372
+ column :body, :text
373
+
374
+ compact_storage
375
+ end
376
+ ```
377
+
378
+ Note that the semantics of `belongs_to` and `has_many` are completely different
379
+ between Cequel 0.x and Cequel 1.0; if you have data columns that reference keys
380
+ in other tables, you will need to hand-roll those associations for now.
381
+
382
+ ### Upgrading a Cequel::Model::Dictionary ###
383
+
384
+ CQL3 does not have a direct "wide row" representation like CQL2, so the
385
+ `Dictionary` class does not have a direct analog in Cequel 1.0. Instead, each
386
+ row key-map key-value tuple in a `Dictionary` corresponds to a single row in
387
+ CQL3. Upgrading a `Dictionary` to Cequel 1.0 involves defining two primary keys
388
+ and a single data column, again using the `compact_storage` directive:
389
+
390
+ ``` ruby
391
+ # Dictionary definition in Cequel 0.x
392
+ class BlogPosts < Cequel::Model::Dictionary
393
+ key :blog_id, :uuid
394
+ maps :uuid => :text
395
+
396
+ private
397
+
398
+ def serialize_value(column, value)
399
+ value.to_json
400
+ end
401
+
402
+ def deserialize_value(column, value)
403
+ JSON.parse(value)
404
+ end
405
+ end
406
+
407
+ # Equivalent model in Cequel 1.0
408
+ class BlogPost
409
+ include Cequel::Record
410
+
411
+ key :blog_id, :uuid
412
+ key :id, :uuid
413
+ column :data, :text
414
+
415
+ compact_storage
416
+
417
+ def data
418
+ JSON.parse(read_attribute(:data))
419
+ end
420
+
421
+ def data=(new_data)
422
+ write_attribute(:data, new_data.to_json)
423
+ end
424
+ end
425
+ ```
426
+
427
+ `Cequel::Model::Dictionary` did not infer a pluralized table name, as
428
+ `Cequel::Model` did and `Cequel::Record` does. If your legacy `Dictionary`
429
+ table has a singular table name, add `self.table_name = :blog_post` in the
430
+ model definition.
431
+
432
+ Note that you will want to run `::synchronize_schema` on your models when
433
+ upgrading; this will not change the underlying data structure, but will add some
434
+ CQL3-specific metadata to the table definition which will allow you to query it.
435
+
436
+ ### CQL Gotchas ###
437
+
438
+ CQL is designed to be immediately familiar to those of us who are used to
439
+ working with SQL, which is all of us. Cequel advances this spirit by providing
440
+ an ActiveRecord-like mapping for CQL. However, Cassandra is very much not a
441
+ relational database, so some behaviors can come as a surprise. Here's an
442
+ overview.
443
+
444
+ #### Upserts ####
445
+
446
+ Perhaps the most surprising fact about CQL is that `INSERT` and `UPDATE` are
447
+ essentially the same thing: both simply persist the given column data at the
448
+ given key(s). So, you may think you are creating a new record, but in fact
449
+ you're overwriting data at an existing record:
450
+
451
+ ``` ruby
452
+ # I'm just creating a blog here.
453
+ blog1 = Blog.create!(
454
+ subdomain: 'big-data',
455
+ name: 'Big Data',
456
+ description: 'A blog about all things big data')
457
+
458
+ # And another new blog.
459
+ blog2 = Blog.create!(
460
+ subdomain: 'big-data',
461
+ name: 'The Big Data Blog')
462
+ ```
463
+
464
+ Living in a relational world, we'd expect the second statement to throw an
465
+ error because the row with key 'big-data' already exists. But not Cassandra: the
466
+ above code will just overwrite the `name` in that row. Note that the
467
+ `description` will not be touched by the second statement; upserts only work on
468
+ the columns that are given.
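
The upsert behavior can be modeled in plain Ruby, without Cassandra or Cequel. This sketch is an illustration only: `FakeTable` and its `write` and `read` methods are hypothetical names, and a `Hash` stands in for a table keyed by `subdomain`.

```ruby
# Hypothetical in-memory stand-in for a Cassandra table; not part of Cequel.
class FakeTable
  def initialize
    @rows = {}
  end

  # Models both INSERT and UPDATE: merge the given columns into whatever
  # row already exists at the key, creating the row if it is absent.
  def write(key, columns)
    @rows[key] = (@rows[key] || {}).merge(columns)
  end

  def read(key)
    @rows[key]
  end
end

blogs = FakeTable.new
blogs.write('big-data', name: 'Big Data',
                        description: 'A blog about all things big data')
blogs.write('big-data', name: 'The Big Data Blog')

row = blogs.read('big-data')
puts row[:name]        # The Big Data Blog
puts row[:description] # A blog about all things big data
```

The second write raises no error and replaces only `name`, matching the behavior described above.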
469
+
470
+ ## Compatibility ##
471
+
472
+ ### Rails ###
473
+
474
+ * 4.0
475
+ * 3.2
476
+ * 3.1
477
+
478
+ ### Ruby ###
479
+
480
+ * Ruby 2.1, 2.0, 1.9.3
481
+ * JRuby 1.7
482
+ * Rubinius 2.2
483
+
484
+ ### Cassandra ###
485
+
486
+ * 1.2
487
+ * 2.0
488
+
489
+ Though Cequel is tested against Cassandra 2.0, it does not at this time support
490
+ any of the CQL3.1 features introduced in Cassandra 2.0. This will change in the
491
+ future.
492
+
493
+ ## Support & Bugs ##
494
+
495
+ If you find a bug, feel free to
496
+ [open an issue](https://github.com/cequel/cequel/issues/new) on GitHub.
497
+ Pull requests are most welcome.
498
+
499
+ For questions or feedback, hit up our mailing list at
500
+ [cequel@groups.google.com](http://groups.google.com/group/cequel)
501
+ or find outoftime in the #cassandra IRC channel on Freenode.
502
+
503
+ ## Contributing ##
504
+
505
+ See
506
+ [CONTRIBUTING.md](https://github.com/cequel/cequel/blob/master/CONTRIBUTING.md)
507
+
508
+ ## Credits ##
509
+
510
+ Cequel was written by:
511
+
512
+ * Mat Brown
513
+ * Aubrey Holland
514
+ * Keenan Brock
515
+ * Insoo Buzz Jung
516
+ * Louis Simoneau
517
+ * Randy Meech
518
+
519
+ Special thanks to [Brewster](https://www.brewster.com), which supported the 0.x
520
+ releases of Cequel.
521
+
522
+ ## License ##
523
+
524
+ Cequel is distributed under the MIT license. See the attached LICENSE for all
525
+ the sordid details.