cequel 1.0.2 → 1.0.3

data/LICENSE ADDED
@@ -0,0 +1,19 @@
1
+ The MIT License (MIT)
2
+ Copyright (c) 2012 Brewster Inc., Mat Brown
3
+
4
+ Permission is hereby granted, free of charge, to any person obtaining a copy of
5
+ this software and associated documentation files (the "Software"), to deal in
6
+ the Software without restriction, including without limitation the rights to
7
+ use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
8
+ of the Software, and to permit persons to whom the Software is furnished to do
9
+ so, subject to the following conditions:
10
+
11
+ The above copyright notice and this permission notice shall be included in all
12
+ copies or substantial portions of the Software.
13
+
14
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
15
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
16
+ FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
17
+ COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
18
+ IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
19
+ CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
@@ -0,0 +1,525 @@
1
+ # Cequel #
2
+
3
+ Cequel is a Ruby ORM for [Cassandra](http://cassandra.apache.org/) using
4
+ [CQL3](http://www.datastax.com/documentation/cql/3.0/webhelp/index.html).
5
+
6
+ [![Gem
7
+ Version](https://badge.fury.io/rb/cequel.png)](http://badge.fury.io/rb/cequel)
8
+ [![Dependency
9
+ Status](https://gemnasium.com/cequel/cequel.png)](https://gemnasium.com/cequel/cequel)
10
+ [![Code
11
+ Climate](https://codeclimate.com/github/cequel/cequel.png)](https://codeclimate.com/github/cequel/cequel)
12
+ [![Inline docs](http://inch-pages.github.io/github/cequel/cequel.png)](http://inch-pages.github.io/github/cequel/cequel)
13
+
14
+ `Cequel::Record` is an ActiveRecord-like domain model layer that exposes
15
+ the robust data modeling capabilities of CQL3, including parent-child
16
+ relationships via compound primary keys and collection columns.
17
+
18
+ The lower-level `Cequel::Metal` layer provides a CQL query builder interface
19
+ inspired by the excellent [Sequel](http://sequel.rubyforge.org/) library.
20
+
21
+ ## Installation ##
22
+
23
+ Add it to your Gemfile:
24
+
25
+ ``` ruby
26
+ gem 'cequel'
27
+ ```
28
+
29
+ ### Rails integration ###
30
+
31
+ Cequel does not require Rails, but if you are using Rails, you
32
+ will need version 3.2+. Cequel::Record will read from the configuration file
33
+ `config/cequel.yml` if it is present. You can generate a default configuration
34
+ file with:
35
+
36
+ ```bash
37
+ rails g cequel:configuration
38
+ ```
39
+
40
+ Once you've got things configured (or decided to accept the defaults), run this
41
+ to create your keyspace (database):
42
+
43
+ ```bash
44
+ rake cequel:keyspace:create
45
+ ```
46
+
47
+ ## Setting up Models ##
48
+
49
+ Unlike in ActiveRecord, models declare their properties inline. We'll start with
50
+ a simple `Blog` model:
51
+
52
+ ```ruby
53
+ class Blog
54
+ include Cequel::Record
55
+
56
+ key :subdomain, :text
57
+ column :name, :text
58
+ column :description, :text
59
+ end
60
+ ```
61
+
62
+ Unlike a relational database, Cassandra does not have auto-incrementing primary
63
+ keys, so you must explicitly set the primary key when you create a new model.
64
+ For blogs, we use a natural key, which is the subdomain. Another option is to
65
+ use a UUID.
66
+
67
+ ### Compound keys and parent-child relationships ###
68
+
69
+ While Cassandra is not a relational database, compound keys do naturally map
70
+ to parent-child relationships. Cequel supports this explicitly with the
71
+ `has_many` and `belongs_to` relations. Let's create a model for posts that acts
72
+ as the child of the blog model:
73
+
74
+ ```ruby
75
+ class Post
76
+ include Cequel::Record
77
+ belongs_to :blog
78
+ key :id, :timeuuid, auto: true
79
+ column :title, :text
80
+ column :body, :text
81
+ end
82
+ ```
83
+
84
+ The `auto` option for the `key` declaration means Cequel will initialize new
85
+ records with a UUID already generated. This option is only valid for `:uuid` and
86
+ `:timeuuid` key columns.
87
+
88
+ Note that the `belongs_to` declaration must come *before* the `key` declaration.
89
+ This is because `belongs_to` defines the
90
+ [partition key](http://www.datastax.com/documentation/cql/3.0/webhelp/index.html#cql/ddl/../../cassandra/glossary/gloss_glossary.html#glossentry_dhv_s24_bk); the `id` column is
91
+ the [clustering column](http://www.datastax.com/documentation/cql/3.0/webhelp/index.html#glossentry_h31_xjk_bk).
92
+
93
+ Practically speaking, this means that posts are accessed using both the
94
+ `blog_subdomain` (automatically defined by the `belongs_to` association) and the
95
+ `id`. The most natural way to represent this type of lookup is using a
96
+ `has_many` association. Let's add one to `Blog`:
97
+
98
+ ```ruby
99
+ class Blog
100
+ include Cequel::Record
101
+
102
+ key :subdomain, :text
103
+ column :name, :text
104
+ column :description, :text
105
+
106
+ has_many :posts
107
+ end
108
+ ```
109
+
110
+ Now we might do something like this:
111
+
112
+ ```ruby
113
+ class PostsController < ActionController::Base
114
+ def show
115
+ Blog.find(current_subdomain).posts.find(params[:id])
116
+ end
117
+ end
118
+ ```
119
+
120
+ ### Schema synchronization ###
121
+
122
+ Cequel will automatically synchronize the schema stored in Cassandra to match
123
+ the schema you have defined in your models. If you're using Rails, you can
124
+ synchronize your schemas for everything in `app/models` by invoking:
125
+
126
+ ```bash
127
+ rake cequel:migrate
128
+ ```
129
+
130
+ ### Record sets ###
131
+
132
+ Record sets are lazy-loaded collections of records that correspond to a
133
+ particular CQL query. They behave similarly to ActiveRecord scopes:
134
+
135
+ ```ruby
136
+ Post.select(:id, :title).reverse.limit(10)
137
+ ```
138
+
139
+ To scope a record set to a primary key value, use the `[]` operator. This will
140
+ define a scoped value for the first unscoped primary key in the record set:
141
+
142
+ ```ruby
143
+ Post['bigdata'] # scopes posts with blog_subdomain="bigdata"
144
+ ```
145
+
146
+ You can pass multiple arguments to the `[]` operator, which will generate an
147
+ `IN` query:
148
+
149
+ ```ruby
150
+ Post['bigdata', 'nosql'] # scopes posts with blog_subdomain IN ("bigdata", "nosql")
151
+ ```
152
+
153
+ To select ranges of data, use `before`, `after`, `from`, `upto`, and `in`. Like
154
+ the `[]` operator, these methods operate on the first unscoped primary key:
155
+
156
+ ```ruby
157
+ Post['bigdata'].after(last_id) # scopes posts with blog_subdomain="bigdata" and id > last_id
158
+ ```
159
+
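As a rough illustration of "the first unscoped primary key", here is a plain-Ruby model of the scoping rule. `RecordSetSketch` and its methods are hypothetical names invented for this sketch; it is not Cequel's implementation and never talks to Cassandra.

```ruby
# Hypothetical stand-in for a record set; NOT Cequel's implementation.
# It only models which key column each scoping call applies to.
class RecordSetSketch
  def initialize(key_columns, conditions = [])
    @key_columns = key_columns
    @conditions = conditions
  end

  # One value scopes by equality; several values produce an IN restriction.
  def [](*values)
    if values.length == 1
      scope("= #{quote(values.first)}")
    else
      scope("IN (#{values.map { |v| quote(v) }.join(', ')})")
    end
  end

  # Range restriction on the first key column that is not yet scoped.
  def after(value)
    scope("> #{quote(value)}")
  end

  def where_clause
    @conditions.join(' AND ')
  end

  private

  # Each call consumes the next key column and returns a new record set.
  def scope(expression)
    column = @key_columns[@conditions.length]
    raise ArgumentError, 'all primary key columns are already scoped' if column.nil?
    RecordSetSketch.new(@key_columns, @conditions + ["#{column} #{expression}"])
  end

  def quote(value)
    value.is_a?(String) ? "'#{value}'" : value.to_s
  end
end

posts = RecordSetSketch.new([:blog_subdomain, :id])
puts posts['bigdata'].after(42).where_clause
puts posts['bigdata', 'nosql'].where_clause
```

Each scoping call returns a new object rather than mutating the receiver, which mirrors how chained scopes are usually expected to behave.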
160
+ Note that record sets always load records in batches; Cassandra does not support
161
+ result sets of unbounded size. This process is transparent to you but you'll see
162
+ multiple queries in your logs if you're iterating over a huge result set.
163
+
164
+ #### Time UUID Queries ####
165
+
166
+ CQL has [special handling for the `timeuuid`
167
+ type](http://www.datastax.com/documentation/cql/3.0/webhelp/index.html#cql/cql_reference/cql_data_types_c.html#reference_ds_axc_xk5_yj),
168
+ which allows you to return rows whose UUID keys correspond to a range of
169
+ timestamps.
170
+
171
+ Cequel automatically constructs timeuuid range queries if you pass a `Time`
172
+ value for a range over a `timeuuid` column. So, if you want to get the posts
173
+ from the last day, you can run:
174
+
175
+ ```ruby
176
+ Blog['myblog'].posts.from(1.day.ago)
177
+ ```
178
+
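For intuition, CQL3 provides the `minTimeuuid` and `maxTimeuuid` functions for this kind of range. The sketch below builds such a statement by hand; the helper name `posts_since_cql` is hypothetical, and the CQL that Cequel actually generates may well differ.

```ruby
# Illustration only: builds a CQL string by hand for the posts table used in
# the examples above. The helper name is hypothetical, and real code should
# use bound parameters instead of string interpolation.
def posts_since_cql(blog_subdomain, since)
  # getutc returns a UTC copy and leaves the caller's Time untouched.
  timestamp = since.getutc.strftime('%Y-%m-%d %H:%M:%S+0000')
  "SELECT * FROM posts " \
    "WHERE blog_subdomain = '#{blog_subdomain}' " \
    "AND id >= minTimeuuid('#{timestamp}')"
end

puts posts_since_cql('myblog', Time.utc(2014, 1, 1))
```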
179
+ ### Updating records ###
180
+
181
+ When you update an existing record, Cequel will only write statements to the
182
+ database that correspond to explicit modifications you've made to the record in
183
+ memory. So, in this situation:
184
+
185
+ ```ruby
186
+ @post = Blog.find(current_subdomain).posts.find(params[:id])
187
+ @post.update_attributes!(title: "Announcing Cequel 1.0")
188
+ ```
189
+
190
+ Cequel will only update the title column. Note that this is not full dirty
191
+ tracking; simply setting the title on the record will signal to Cequel that you
192
+ want to write that attribute to the database, regardless of its previous value.
193
+
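The difference from full dirty tracking can be modeled in a few lines of plain Ruby. `WriteTrackingSketch` is a hypothetical class for illustration, not part of Cequel: it records every attribute that is assigned, even when the new value equals the old one.

```ruby
# Hypothetical model of "write what was explicitly set"; not Cequel code.
class WriteTrackingSketch
  def initialize(attributes)
    @attributes = attributes.dup
    @written = {}
  end

  def write_attribute(name, value)
    @attributes[name] = value
    # Recorded unconditionally: no comparison against the previous value.
    @written[name] = value
  end

  # Columns that would be sent to the database on save.
  def pending_update
    @written.dup
  end
end

post = WriteTrackingSketch.new(title: 'Announcing Cequel 1.0', body: 'Hello')
post.write_attribute(:title, 'Announcing Cequel 1.0') # same value as before
p post.pending_update
```

Here the title is still scheduled to be written although it did not change, while the untouched `body` is left out.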
194
+ ### Unloaded models ###
195
+
196
+ In the above example, we call the familiar `find` method to load a blog and then
197
+ one of its posts, but we didn't actually do anything with the data in the Blog
198
+ model; it was simply a convenient object-oriented way to get a handle to the
199
+ blog's posts. Cequel supports unloaded models via the `[]` operator; this will
200
+ return an **unloaded** blog instance, which knows the value of its primary key,
201
+ but does not read the row from the database. So, we can refactor the example to
202
+ be a bit more efficient:
203
+
204
+ ```ruby
205
+ class PostsController < ActionController::Base
206
+ def show
207
+ @post = Blog[current_subdomain].posts.find(params[:id])
208
+ end
209
+ end
210
+ ```
211
+
212
+ If you attempt to access a data attribute on an unloaded instance, it will
213
+ lazy-load the row from the database and become a normal loaded instance.
214
+
215
+ You can generate a collection of unloaded instances by passing multiple
216
+ arguments to `[]`:
217
+
218
+ ```ruby
219
+ class BlogsController < ActionController::Base
220
+ def recommended
221
+ @blogs = Blog['cassandra', 'nosql']
222
+ end
223
+ end
224
+ ```
225
+
226
+ The above will not generate a CQL query, but when you access a property on any
227
+ of the unloaded `Blog` instances, Cequel will load data for all of them with
228
+ a single query. Note that CQL does not allow selecting collection columns when
229
+ loading multiple records by primary key; only scalar columns will be loaded.
230
+
231
+ There is another use for unloaded instances: you may set attributes on an
232
+ unloaded instance and call `save` without ever actually reading the row from
233
+ Cassandra. Because Cassandra is optimized for writing data, this "write without
234
+ reading" pattern gives you maximum efficiency, particularly if you are updating
235
+ a large number of records.
236
+
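The lazy-loading behavior described in this section can be pictured with a small plain-Ruby sketch. `UnloadedSketch` and the in-memory `fake_table` are assumptions made for illustration; Cequel's real classes are not used here.

```ruby
# Hypothetical model of an unloaded instance; not Cequel's implementation.
class UnloadedSketch
  attr_reader :key, :load_count

  def initialize(key, &loader)
    @key = key
    @loader = loader
    @row = nil
    @load_count = 0
  end

  def loaded?
    !@row.nil?
  end

  # Reading a data attribute triggers a single load; later reads reuse it.
  def [](attribute)
    load_row unless loaded?
    @row[attribute]
  end

  private

  def load_row
    @load_count += 1
    @row = @loader.call(@key)
  end
end

# Stand-in for the database: a hash keyed by subdomain.
fake_table = { 'cassandra' => { name: 'Cassandra Blog' } }
blog = UnloadedSketch.new('cassandra') { |key| fake_table.fetch(key) }

puts blog.key        # known without touching the "database"
puts blog.loaded?
puts blog[:name]     # first attribute access performs the load
puts blog.load_count
```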
237
+ ### Collection columns ###
238
+
239
+ Cassandra supports three types of collection columns: lists, sets, and maps.
240
+ Collection columns can be manipulated using atomic collection mutation; e.g.,
241
+ you can add an element to a set without knowing the existing elements. Cequel
242
+ supports this by exposing collection objects that keep track of their
243
+ modifications, and which then persist those modifications to Cassandra on save.
244
+
245
+ Let's add a category set to our post model:
246
+
247
+
248
+ ```ruby
249
+ class Post
250
+ include Cequel::Record
251
+
252
+ belongs_to :blog
253
+ key :id, :uuid
254
+ column :title, :text
255
+ column :body, :text
256
+ set :categories, :text
257
+ end
258
+ ```
259
+
260
+ If we were to then update a post like so:
261
+
262
+ ```ruby
263
+ @post = Blog[current_subdomain].posts[params[:id]]
264
+ @post.categories << 'Kittens'
265
+ @post.save!
266
+ ```
267
+
268
+ Cequel would send the CQL equivalent of "Add the category 'Kittens' to the post
269
+ at the given `(blog_subdomain, id)`", without ever reading the saved value of
270
+ the `categories` set.
271
+
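A minimal sketch of the idea, assuming a hypothetical `TrackedSetSketch` wrapper (not Cequel's collection class): the object remembers only what was added and renders it with the CQL3 set-append syntax, so the existing contents are never read.

```ruby
require 'set'

# Hypothetical change-tracking wrapper; not Cequel's collection class.
class TrackedSetSketch
  def initialize(column)
    @column = column
    @additions = Set.new
  end

  # Record the addition without knowing the set's current contents.
  def <<(element)
    @additions << element
    self
  end

  # CQL3 appends to a set with: column = column + {elements}
  def to_cql_assignment
    literals = @additions.map { |element| "'#{element}'" }.join(', ')
    "#{@column} = #{@column} + {#{literals}}"
  end
end

categories = TrackedSetSketch.new(:categories)
categories << 'Kittens'
puts "UPDATE posts SET #{categories.to_cql_assignment} " \
     "WHERE blog_subdomain = ? AND id = ?"
```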
272
+ ### Secondary indexes ###
273
+
274
+ Cassandra supports secondary indexes, although with notable restrictions:
275
+
276
+ * Only scalar data columns can be indexed; key columns and collection columns
277
+ cannot.
278
+ * A secondary index consists of exactly one column.
279
+ * Though you can have more than one secondary index on a table, you can only use
280
+ one in any given query.
281
+
282
+ Cequel supports the `:index` option to add secondary indexes to column
283
+ definitions:
284
+
285
+ ```ruby
286
+ class Post
287
+ include Cequel::Record
288
+
289
+ belongs_to :blog
290
+ key :id, :uuid
291
+ column :title, :text
292
+ column :body, :text
293
+ column :author_id, :uuid, :index => true
294
+ set :categories, :text
295
+ end
296
+ ```
297
+
298
+ Defining a column with a secondary index adds several "magic methods" for using
299
+ the index:
300
+
301
+ ```ruby
302
+ Post.with_author_id(id) # returns a record set scoped to that author_id
303
+ Post.find_by_author_id(id) # returns the first post with that author_id
304
+ Post.find_all_by_author_id(id) # returns an array of all posts with that author_id
305
+ ```
306
+
307
+ You can also call the `where` method directly on record sets:
308
+
309
+ ```ruby
310
+ Post.where(:author_id, id)
311
+ ```
312
+
313
+ Note that `where` is only for secondary indexed columns; use `[]` to scope
314
+ record sets by primary keys.
315
+
316
+ ### ActiveModel Support ###
317
+
318
+ Cequel supports ActiveModel functionality, such as callbacks, validations,
319
+ dirty attribute tracking, naming, and serialization. If you're using Rails 3,
320
+ mass-assignment protection works as usual, and in Rails 4, strong parameters are
321
+ treated correctly. So we can add some extra ActiveModel goodness to our post
322
+ model:
323
+
324
+ ```ruby
325
+ class Post
326
+ include Cequel::Record
327
+
328
+ belongs_to :blog
329
+ key :id, :uuid
330
+ column :title, :text
331
+ column :body, :text
332
+
333
+ validates :body, presence: true
334
+
335
+ after_save :notify_followers
336
+ end
337
+ ```
338
+
339
+ Note that validations or callbacks that need to read data attributes will cause
340
+ unloaded models to load their row during the course of the save operation, so if
341
+ you are following a write-without-reading pattern, you will need to be careful.
342
+
343
+ Dirty attribute tracking is only enabled on loaded models.
344
+
345
+ ## Upgrading from Cequel 0.x ##
346
+
347
+ Cequel 0.x targeted CQL2, which has a substantially different data
348
+ representation from CQL3. Accordingly, upgrading from Cequel 0.x to Cequel 1.0
349
+ requires some changes to your data models.
350
+
351
+ ### Upgrading a Cequel::Model ###
352
+
353
+ Upgrading from a `Cequel::Model` class is fairly straightforward; simply add the
354
+ `compact_storage` directive to your class definition:
355
+
356
+ ```ruby
357
+ # Model definition in Cequel 0.x
358
+ class Post
359
+ include Cequel::Model
360
+
361
+ key :id, :uuid
362
+ column :title, :text
363
+ column :body, :text
364
+ end
365
+
366
+ # Model definition in Cequel 1.0
367
+ class Post
368
+ include Cequel::Record
369
+
370
+ key :id, :uuid
371
+ column :title, :text
372
+ column :body, :text
373
+
374
+ compact_storage
375
+ end
376
+ ```
377
+
378
+ Note that the semantics of `belongs_to` and `has_many` are completely different
379
+ between Cequel 0.x and Cequel 1.0; if you have data columns that reference keys
380
+ in other tables, you will need to hand-roll those associations for now.
381
+
382
+ ### Upgrading a Cequel::Model::Dictionary ###
383
+
384
+ CQL3 does not have a direct "wide row" representation like CQL2, so the
385
+ `Dictionary` class does not have a direct analog in Cequel 1.0. Instead, each
386
+ (row key, map key, value) tuple in a `Dictionary` corresponds to a single row in
387
+ CQL3. Upgrading a `Dictionary` to Cequel 1.0 involves defining two primary keys
388
+ and a single data column, again using the `compact_storage` directive:
389
+
390
+ ``` ruby
391
+ # Dictionary definition in Cequel 0.x
392
+ class BlogPosts < Cequel::Model::Dictionary
393
+ key :blog_id, :uuid
394
+ maps :uuid => :text
395
+
396
+ private
397
+
398
+ def serialize_value(column, value)
399
+ value.to_json
400
+ end
401
+
402
+ def deserialize_value(column, value)
403
+ JSON.parse(value)
404
+ end
405
+ end
406
+
407
+ # Equivalent model in Cequel 1.0
408
+ class BlogPost
409
+ include Cequel::Record
410
+
411
+ key :blog_id, :uuid
412
+ key :id, :uuid
413
+ column :data, :text
414
+
415
+ compact_storage
416
+
417
+ def data
418
+ JSON.parse(read_attribute(:data))
419
+ end
420
+
421
+ def data=(new_data)
422
+ write_attribute(:data, new_data.to_json)
423
+ end
424
+ end
425
+ ```
426
+
427
+ `Cequel::Model::Dictionary` did not infer a pluralized table name, as
428
+ `Cequel::Model` did and `Cequel::Record` does. If your legacy `Dictionary`
429
+ table has a singular table name, add a `self.table_name = :blog_post` in the
430
+ model definition.
431
+
432
+ Note that you will want to run `::synchronize_schema` on your models when
433
+ upgrading; this will not change the underlying data structure, but will add some
434
+ CQL3-specific metadata to the table definition which will allow you to query it.
435
+
436
+ ### CQL Gotchas ###
437
+
438
+ CQL is designed to be immediately familiar to those of us who are used to
439
+ working with SQL, which is all of us. Cequel advances this spirit by providing
440
+ an ActiveRecord-like mapping for CQL. However, Cassandra is very much not a
441
+ relational database, so some behaviors can come as a surprise. Here's an
442
+ overview.
443
+
444
+ #### Upserts ####
445
+
446
+ Perhaps the most surprising fact about CQL is that `INSERT` and `UPDATE` are
447
+ essentially the same thing: both simply persist the given column data at the
448
+ given key(s). So, you may think you are creating a new record, but in fact
449
+ you're overwriting data at an existing record:
450
+
451
+ ``` ruby
452
+ # I'm just creating a blog here.
453
+ blog1 = Blog.create!(
454
+ subdomain: 'big-data',
455
+ name: 'Big Data',
456
+ description: 'A blog about all things big data')
457
+
458
+ # And another new blog.
459
+ blog2 = Blog.create!(
460
+ subdomain: 'big-data',
461
+ name: 'The Big Data Blog')
462
+ ```
463
+
464
+ Living in a relational world, we'd expect the second statement to throw an
465
+ error because the row with key 'big-data' already exists. But not Cassandra: the
466
+ above code will just overwrite the `name` in that row. Note that the
467
+ `description` will not be touched by the second statement; upserts only work on
468
+ the columns that are given.
469
+
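The upsert semantics can be mimicked with an in-memory hash. This is only a model of the behavior described above; the `upsert` helper and the `table` hash are hypothetical and involve neither Cequel nor Cassandra.

```ruby
# Hypothetical in-memory model of a table keyed by subdomain.
# Writing to an existing key merges the given columns instead of failing.
def upsert(table, key, columns)
  table[key] = (table[key] || {}).merge(columns)
end

table = {}
upsert(table, 'big-data', { name: 'Big Data',
                            description: 'A blog about all things big data' })
upsert(table, 'big-data', { name: 'The Big Data Blog' })

p table.size
p table['big-data']
```

After both calls there is still one row: the `name` has been overwritten and the `description` from the first write survives.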
470
+ ## Compatibility ##
471
+
472
+ ### Rails ###
473
+
474
+ * 4.0
475
+ * 3.2
476
+ * 3.1
477
+
478
+ ### Ruby ###
479
+
480
+ * Ruby 2.1, 2.0, 1.9.3
481
+ * JRuby 1.7
482
+ * Rubinius 2.2
483
+
484
+ ### Cassandra ###
485
+
486
+ * 1.2
487
+ * 2.0
488
+
489
+ Though Cequel is tested against Cassandra 2.0, it does not at this time support
490
+ any of the CQL3.1 features introduced in Cassandra 2.0. This will change in the
491
+ future.
492
+
493
+ ## Support & Bugs ##
494
+
495
+ If you find a bug, feel free to
496
+ [open an issue](https://github.com/cequel/cequel/issues/new) on GitHub.
497
+ Pull requests are most welcome.
498
+
499
+ For questions or feedback, hit up our mailing list at
500
+ [cequel@groups.google.com](http://groups.google.com/group/cequel)
501
+ or find outoftime in the #cassandra IRC channel on Freenode.
502
+
503
+ ## Contributing ##
504
+
505
+ See
506
+ [CONTRIBUTING.md](https://github.com/cequel/cequel/blob/master/CONTRIBUTING.md)
507
+
508
+ ## Credits ##
509
+
510
+ Cequel was written by:
511
+
512
+ * Mat Brown
513
+ * Aubrey Holland
514
+ * Keenan Brock
515
+ * Insoo Buzz Jung
516
+ * Louis Simoneau
517
+ * Randy Meech
518
+
519
+ Special thanks to [Brewster](https://www.brewster.com), which supported the 0.x
520
+ releases of Cequel.
521
+
522
+ ## License ##
523
+
524
+ Cequel is distributed under the MIT license. See the attached LICENSE for all
525
+ the sordid details.