shameless 0.5.2 → 0.6.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: ada388329571a9d4e7401b2258bd315cd92f9219
4
- data.tar.gz: d12bbf07ca0ede9db4f4ea79b21048b0eeeab4b7
3
+ metadata.gz: 5bc5dbfda6dcf36c8ca5ac5d91394cf1c01b08ec
4
+ data.tar.gz: f6ce2ddeedc0506c1ca7ec6613a42971eafa86ef
5
5
  SHA512:
6
- metadata.gz: df9d74db8abe98a8b5437511a35af4e159ae224d6db17e8fe3870acf12311a9bd407221248ca2a081c866b49da903283b37f8f423c6dc62ad11643cb924c006f
7
- data.tar.gz: '0638d93b2c824cf7c4fa39b8454ec13ef8ae667f159a6ed8fe691b441334c6731c34f57022ce7a1d9297834ebae02b985897914a7e6ea57e4293212b668c2997'
6
+ metadata.gz: 42d218c63e8ef57dd1aa5b6a6f2a69db89f39720c8935f26e0d8e47e3872c4a21a5449915500520cbe109750028659349444258a795c3ed44fef2178aa7ae3d3
7
+ data.tar.gz: 762b0ce2284870d8518aa17a458746b6d0e80641af8f5c5f94ca4f25f386c7ea3481d8ce82598379c9848087ec4e4387b060fc775e3f8e43ba462555f93bacf9
@@ -1,5 +1,10 @@
1
1
  ### Unreleased
2
2
 
3
+ ### 0.6.0 (2017-07-26)
4
+
5
+ * Require `securerandom` explicitly
6
+ * Add support for queries using Sequel's virtual rows
7
+
3
8
  ### 0.5.2 (2016-12-02)
4
9
 
5
10
  * Eagerly initialize `Model#cells`
data/README.md CHANGED
@@ -8,29 +8,29 @@ Shameless is an implementation of a schemaless, distributed, append-only store b
8
8
 
9
9
  Shameless was born out of our need to have highly scalable, distributed storage for hotel rates. Rates are a way hotels package their rooms, they typically include check-in and check-out date, room type, rate plan, net price, discount, extra services, etc. Our original solution of storing rates in a typical relational SQL table was reaching its limits due to write congestion, migration anxiety, and high maintenance.
10
10
 
11
- Hotel rates change very frequently, so our solution needed to have consistent write latency. There are also mutliple agents mutating various aspects of those rates, so we wanted something that would enable versioning. We also wanted to avoid having to create migrations whenever we were adding more data to rates.
11
+ Hotel rates change very frequently, so our solution needed to have consistent write latency. There are also multiple agents mutating various aspects of those rates, so we wanted something that would enable versioning. We also wanted to avoid having to create migrations whenever we were adding more data to rates.
12
12
 
13
13
  ## Concept
14
14
 
15
15
  The whole idea of Shameless is to split a regular SQL table into index tables and content tables. Index tables map the fields you want to query by to UUIDs, content tables map UUIDs to model contents (bodies). In addition, both index and content tables are sharded.
16
16
 
17
- The body of the model is schema-less, you can store an arbitrary data structures. Under the hood, the body is serialized using MessagePack and stored as a blob in a single database column (hence the need for index tables).
17
+ The body of the model is schema-less, you can store arbitrary data structures in it. Under the hood, the body is serialized using MessagePack and stored as a blob in a single database column (hence the need for index tables).
18
18
 
19
19
  The process of querying for records can be described as:
20
20
 
21
21
  1. Query the index tables by index fields (e.g. hotel ID, check-in date, and length of stay), sharded by hotel ID, getting back a list of UUIDs
22
- 2. Query the content tables, sharded by UUID, for most recent version of model
22
+ 1. Query the content tables, sharded by UUID, for most recent version of model
23
23
 
24
24
  Inserting a record is similar:
25
25
 
26
26
  1. Generate a UUID
27
- 2. Serialize and write model content into appropriate shard of the content tables
28
- 2. Insert a row (index fields + model UUID) to the appropriate shard of the index table
27
+ 1. Serialize and write model content into appropriate shard of the content tables
28
+ 1. Insert a row (index fields + model UUID) to the appropriate shard of the index table
29
29
 
30
30
  Inserting a new version of an existing record is even simpler:
31
31
 
32
32
  1. Increment version
33
- 2. Serialize and write model content into appropriate shard of the content tables
33
+ 1. Serialize and write model content into appropriate shard of the content tables
34
34
 
35
35
  Naturally, shameless hides all that complexity behind a straight-forward API.
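The query, insert, and new-version steps above can be sketched with a toy in-memory store. This is only an illustration under stated assumptions: `ToyShardedStore`, `shard_for` (a CRC32 modulo), and the use of JSON in place of MessagePack are all hypothetical stand-ins, not shameless internals.

```ruby
require 'securerandom'
require 'json'
require 'zlib'

# Toy model of the index/content split. All names here are hypothetical.
class ToyShardedStore
  def initialize(shards_count)
    @shards_count = shards_count
    # One index "table" and one content "table" per shard
    @index_shards = Array.new(shards_count) { [] }
    @content_shards = Array.new(shards_count) { [] }
  end

  # Insert: generate a UUID, write the content, then write the index row
  def insert(index_fields, body)
    uuid = SecureRandom.uuid
    write_content(uuid, 0, body)
    @index_shards[shard_for(index_fields.fetch(:hotel_id))] << index_fields.merge(uuid: uuid)
    uuid
  end

  # New version: increment the version and append content; the index is untouched
  def insert_version(uuid, body)
    write_content(uuid, latest_row(uuid)[:version] + 1, body)
  end

  # Query: index shard (by hotel ID) -> UUIDs -> latest content for each UUID
  def where(query)
    rows = @index_shards[shard_for(query.fetch(:hotel_id))]
    uuids = rows.select { |row| query.all? { |k, v| row[k] == v } }.map { |row| row[:uuid] }
    uuids.map { |uuid| JSON.parse(latest_row(uuid)[:body], symbolize_names: true) }
  end

  private

  # Assumed sharding function; the real one may differ
  def shard_for(value)
    Zlib.crc32(value.to_s) % @shards_count
  end

  # The body is stored as a single serialized blob (JSON here, as a stand-in)
  def write_content(uuid, version, body)
    @content_shards[shard_for(uuid)] << {uuid: uuid, version: version, body: JSON.generate(body)}
  end

  def latest_row(uuid)
    @content_shards[shard_for(uuid)].select { |row| row[:uuid] == uuid }.max_by { |row| row[:version] }
  end
end

store = ToyShardedStore.new(4)
uuid = store.insert({hotel_id: 1, room_type: '1 bed'}, {net_price: 120.0})
store.insert_version(uuid, {net_price: 130.0})
result = store.where(hotel_id: 1, room_type: '1 bed')
```

Note that writing a new version never touches the index shard, which is why that path is the simplest of the three.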
36
36
 
@@ -46,12 +46,13 @@ The core object of shameless is a `Store`. Here's how you can set one up:
46
46
  RateStore = Shameless::Store.new(:rate_store) do |c|
47
47
  c.partition_urls = [ENV['RATE_STORE_DATABASE_URL_0'], ENV['RATE_STORE_DATABASE_URL_1']]
48
48
  c.shards_count = 512 # total number of shards across all partitions
49
- c.connection_options = {max_connections: 10} # connection options passed to Sequel.connect
49
+ c.connection_options = {max_connections: 10} # connection options passed to `Sequel.connect`
50
50
  c.database_extensions = [:newrelic_instrumentation]
51
+ c.create_table_options = {engine: "InnoDB"} # passed to Sequel's `create_table`
51
52
  end
52
53
  ```
53
54
 
54
- The initializer argument (`:rate_store`) defines the namespace by which all tables will be prefixed, in this case `rate_store_`.
55
+ The initializer argument (`:rate_store`) defines the namespace by which all tables will be prefixed, in this case `rate_store_`. If you pass `nil`, there will be no prefix.
55
56
 
56
57
  Once you've got the Store configured, you can declare models.
57
58
 
@@ -135,7 +136,9 @@ end
135
136
 
136
137
  ### Reading/writing
137
138
 
138
- To insert and query the model, use `Model.put` and `Model.where`:
139
+ To write data to the model, use `Model.put`, `Model#save`, `Cell#save`, `Model#update`, or `Cell#update`. `Model.put` will perform an "upsert", i.e. it will try to find an existing record with the given index fields, and insert a new version for that record's base cell if it finds one, or pick a new UUID, write the first version of the base cell, and write to all indices otherwise.
140
+
141
+ Here are some examples of how you can read and write data from/to a shameless store:
139
142
 
140
143
  ```ruby
141
144
  # Writing - all index fields are required, the rest is the schemaless content
@@ -146,16 +149,85 @@ rate[:net_price] # => 120.0 # access in the "base" cell
146
149
  rate[:net_price] = 130.0
147
150
  rate.save
148
151
 
152
+ # You can also access the "base" cell explicitly
153
+ rate.base[:net_price] = 140.0
154
+ rate.base.save
155
+
149
156
  # Reading from/writing to a different cell is simple, too:
150
157
  rate.meta[:hotel_enabled] = true
151
158
  rate.meta.save
152
159
 
160
+ # You can also do that in one go using `Model#update` or `Cell#update`. This writes a new
161
+ # version of the cell, merging the hash passed in as parameter with existing values.
162
+ rate.update(tax_rate: 11.0, gateway: 'pegasus')
163
+ rate.body # => {net_price: 140.0, tax_rate: 11.0, gateway: 'pegasus'}
164
+
165
+ rate.meta.update(hotel_enabled: false)
166
+ ```
167
+
168
+ To query, use `Model.where` (also using Sequel's [virtual row blocks](http://sequel.jeremyevans.net/rdoc/files/doc/virtual_rows_rdoc.html)):
169
+
170
+ ```ruby
153
171
  # Querying by primary index
154
172
  rates = Rate.where(hotel_id: 1, room_type: '1 bed', check_in_date: Date.today)
155
173
 
156
174
  # Querying by a named index
157
175
  rates = Rate.secondary_index.where(hotel_id: 1, gateway: 'pegasus', discount_type: 'geo')
158
176
  rates.first[:net_price] # => 130.0
177
+
178
+ # Query using Sequel's virtual row block (handy for inequality operators)
179
+ rates = Rate.where(hotel_id: 1, room_type: '1 bed') { check_in_date > Date.today }
180
+ ```
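To illustrate how a block given to `where` can travel through several layers and be evaluated at the bottom, here is a toy sketch. It is not Sequel: `ToyColumn`, `ToyVirtualRow`, `ToyDataset`, `ToyStoreLayer`, and `ToyModel` are hypothetical, and the block produces an in-memory predicate rather than SQL. It only mimics the idea that bare names inside the block resolve to column references.

```ruby
require 'date'

# Hypothetical column reference: `>` returns a predicate over a row hash
class ToyColumn
  def initialize(name)
    @name = name
  end

  def >(other)
    ->(row) { row[@name] > other }
  end
end

# Bare method names inside the block become columns via method_missing
class ToyVirtualRow < BasicObject
  def method_missing(name, *_args)
    ::ToyColumn.new(name)
  end
end

# Bottom layer: evaluates the block (if any) against the toy virtual row
class ToyDataset
  def initialize(rows)
    @rows = rows
  end

  def where(query, &block)
    predicate = block ? ToyVirtualRow.new.instance_eval(&block) : ->(_row) { true }
    @rows.select { |row| query.all? { |k, v| row[k] == v } && predicate.call(row) }
  end
end

# Middle and top layers only forward the block with `&block`
class ToyStoreLayer
  def initialize(dataset)
    @dataset = dataset
  end

  def where(query, &block)
    @dataset.where(query, &block)
  end
end

class ToyModel
  def initialize(store)
    @store = store
  end

  def where(query, &block)
    @store.where(query, &block)
  end
end

rows = [
  {hotel_id: 1, check_in_date: Date.new(2017, 7, 25)},
  {hotel_id: 1, check_in_date: Date.new(2017, 7, 27)},
  {hotel_id: 2, check_in_date: Date.new(2017, 7, 28)}
]
model = ToyModel.new(ToyStoreLayer.new(ToyDataset.new(rows)))
cutoff = Date.new(2017, 7, 26)
upcoming = model.where(hotel_id: 1) { check_in_date > cutoff }
```

Calling `where` without a block still works, because `&block` is simply `nil` in that case.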
181
+
182
+ To access a cell field that you're not sure has a value, you can use `Cell#fetch` (`Model#fetch` delegates to the base cell) to get a value from a cell, or a default, e.g.:
183
+
184
+ ```ruby
185
+ rate[:net_price] = 130.0
186
+ rate.fetch(:net_price, 100) # => 130.0
187
+ rate.meta.fetch(:enabled, true) # => true
188
+ ```
189
+
190
+ Cells are versioned, the current version is stored in a column called `ref_key`. The first version of a cell has a `ref_key` of zero. To access a previous version of a cell, use `Cell#previous` (`Model#previous` delegates to the base cell). Example:
191
+
192
+ ```ruby
193
+ # ...
194
+ rate[:net_price] # => 120.0
195
+ rate.ref_key # => 1
196
+ rate.update(net_price: 130.0)
197
+ rate.ref_key # => 2
198
+ rate.previous[:net_price] # => 120.0
199
+ rate.previous.ref_key # => 1
200
+ rate.previous.previous.ref_key # => 0
201
+ rate.previous.previous.previous # => nil
202
+ ```
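The versioning behaviour can be pictured with a toy append-only cell. This is a minimal sketch under assumptions: `ToyCell` and its shared `log` array are hypothetical and only imitate the `ref_key`/`previous` semantics described above, where the first version has a `ref_key` of zero.

```ruby
# Toy append-only cell: every update appends a row, nothing is changed in place.
class ToyCell
  attr_reader :ref_key, :body

  def initialize(log, ref_key = nil)
    @log = log # shared array of {ref_key:, body:} rows
    row = ref_key ? @log.find { |r| r[:ref_key] == ref_key } : @log.max_by { |r| r[:ref_key] }
    @ref_key = row && row[:ref_key]
    @body = row ? row[:body].dup : {}
  end

  def [](key)
    @body[key]
  end

  # Merge new values into the body and append the result as the next version
  def update(values)
    @body = @body.merge(values)
    @ref_key = @ref_key ? @ref_key + 1 : 0
    @log << {ref_key: @ref_key, body: @body.dup}
    self
  end

  # The version right before this one, or nil for the first version
  def previous
    return nil if @ref_key.nil? || @ref_key.zero?
    ToyCell.new(@log, @ref_key - 1)
  end
end

log = []
cell = ToyCell.new(log)
cell.update(net_price: 120.0) # first version
cell.update(net_price: 130.0) # second version
cell.update(tax_rate: 11.0)   # third version, keeps net_price
```

Because older rows are never overwritten, walking back with `previous` is just a lookup by a smaller `ref_key`.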
203
+
204
+ Another process may have updated a model/cell in the meantime. To fetch the latest state, use `Cell#reload` (`Model#reload` reloads *all* cells), e.g.:
205
+
206
+ ```ruby
207
+ rate[:net_price] # => 120.0
208
+
209
+ # Another process updates the cell
210
+ Rate.where(hotel_id: rate[:hotel_id]).first.update(net_price: 130.0)
211
+
212
+ rate[:net_price] # => 120.0
213
+ rate.reload
214
+ rate[:net_price] # => 130.0
215
+ ```
216
+
217
+ To check if a given cell exists, use `Cell#present?` (as you can suspect, `Model#present?` delegates to the base cell). You can also use `Model#cells` to iterate over all cells, e.g.:
218
+
219
+ ```ruby
220
+ rate.present? # => true
221
+ rate.meta.present? # => false
222
+
223
+ rate.cells.any?(&:present?) # => true
224
+ ```
225
+
226
+ To see the cell's full state (body + metadata), use `Cell#as_json` (`Model#as_json` delegates to the base cell), e.g.:
227
+
228
+ ```ruby
229
+ rate.as_json # => {id: 123, uuid: "...", created_at: "...", column_name: "base", ref_key: 3,
230
+ # body: {hotel_id: 1, check_in_date: "2017-01-03", room_type: "ROH", net_price: 130.0}}
159
231
  ```
160
232
 
161
233
  ### Creating tables
@@ -168,6 +240,18 @@ RateStore.create_tables!
168
240
 
169
241
  This will create the underlying index tables, content tables, together with database indices for fast access.
170
242
 
243
+ ### Concurrent writes
244
+
245
+ Since writes to shameless aren't atomic, concurrency control needs to be moved to application code. We're using a setup where almost all our writes go through queues, one queue per shard. We're using `Store#each_shard` to match queue names to shards and to aggregate queue stats across all shards.
246
+
247
+ ### Using shameless as a data stream store (similar to Kafka)
248
+
249
+ Thanks to storing each write to shameless as a new record in the underlying database table, we're able to use our shameless store as a log for stream processing. For each shard, we have a worker that goes through all new entries in that shard and triggers various event processors to handle all kinds of asynchronous work. For that purpose, we're using `Model.fetch_latest_cells(shard:, cursor:, limit:)`, and incrementing a cursor (stored in Redis) after each record has been processed successfully. We're using the cells' IDs as cursors, using `Cell#id`, which returns the underlying table's primary key value.
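The cursor-driven worker described above can be sketched as follows. Everything in this block is a stand-in: `ToyShardLog#fetch_latest` only imitates the shape of `Model.fetch_latest_cells(shard:, cursor:, limit:)` (whose exact semantics, such as whether the cursor is exclusive, are assumed here), and a plain Hash plays the role of Redis.

```ruby
# Toy per-shard log with auto-incrementing ids, like a table's primary key
class ToyShardLog
  def initialize
    @rows = []
  end

  def append(body)
    @rows << {id: @rows.size + 1, body: body}
  end

  # Rows with an id greater than the cursor, oldest first, at most `limit`
  def fetch_latest(cursor:, limit:)
    @rows.select { |row| row[:id] > cursor }.first(limit)
  end
end

# Drains all new entries of one shard, advancing the cursor after each record
def process_shard(log, cursors, shard, limit: 2)
  processed = []
  loop do
    batch = log.fetch_latest(cursor: cursors.fetch(shard, 0), limit: limit)
    break if batch.empty?

    batch.each do |row|
      processed << row[:body]   # stand-in for the event processors
      cursors[shard] = row[:id] # advance only after successful processing
    end
  end
  processed
end

log = ToyShardLog.new
%w[a b c].each { |body| log.append(body) }
cursors = {} # stand-in for the cursor stored in Redis

first_run = process_shard(log, cursors, 196)
log.append('d')
second_run = process_shard(log, cursors, 196)
```

Advancing the cursor per record rather than per batch means a crash mid-batch replays at most the record that was in flight.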
250
+
251
+ ### Utilities
252
+
253
+ Sometimes it may be useful to know where a given model will end up, based on its shardable value. For this, you can use `Store#find_shard(shardable_value)`, e.g.: `RateStore.find_shard(hotel.id) # => 196`.
254
+
171
255
  ## Installation
172
256
 
173
257
  Add this line to your application's Gemfile:
@@ -194,7 +278,6 @@ To install this gem onto your local machine, run `bundle exec rake install`. To
194
278
 
195
279
  Bug reports and pull requests are welcome on GitHub at https://github.com/hoteltonight/shameless.
196
280
 
197
-
198
281
  ## License
199
282
 
200
283
  The gem is available as open source under the terms of the [MIT License](http://opensource.org/licenses/MIT).
@@ -3,6 +3,8 @@ module Shameless
3
3
  attr_accessor :partition_urls, :shards_count, :connection_options, :database_extensions,
4
4
  :create_table_options
5
5
 
6
+ # Needed to deal with our legacy schema that stores created_at as an integer timestamp
7
+ # and does date conversions in Ruby-land, don't set to `true` for new projects
6
8
  attr_accessor :legacy_created_at_is_bigint
7
9
 
8
10
  def shards_per_partition_count
@@ -36,10 +36,10 @@ module Shameless
36
36
  @model.store.put(table_name, shardable_value, index_values)
37
37
  end
38
38
 
39
- def where(query)
39
+ def where(query, &block)
40
40
  shardable_value = query.fetch(@shard_on).to_i
41
41
  query = index_values(query, false)
42
- @model.store.where(table_name, shardable_value, query).map {|r| @model.new(r[:uuid]) }
42
+ @model.store.where(table_name, shardable_value, query, &block).map {|r| @model.new(r[:uuid]) }
43
43
  end
44
44
 
45
45
  def table_name
@@ -1,3 +1,4 @@
1
+ require 'securerandom'
1
2
  require 'shameless/index'
2
3
  require 'shameless/cell'
3
4
 
@@ -98,8 +99,8 @@ module Shameless
98
99
  @indices.each(&:create_tables!)
99
100
  end
100
101
 
101
- def where(query)
102
- primary_index.where(query)
102
+ def where(query, &block)
103
+ primary_index.where(query, &block)
103
104
  end
104
105
 
105
106
  def reject_index_values(values)
@@ -22,8 +22,8 @@ module Shameless
22
22
  find_table(table_name, shardable_value).insert(values)
23
23
  end
24
24
 
25
- def where(table_name, shardable_value, query)
26
- find_table(table_name, shardable_value).where(query)
25
+ def where(table_name, shardable_value, query, &block)
26
+ find_table(table_name, shardable_value).where(query, &block)
27
27
  end
28
28
 
29
29
  def disconnect
@@ -1,3 +1,3 @@
1
1
  module Shameless
2
- VERSION = "0.5.2"
2
+ VERSION = "0.6.0"
3
3
  end
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: shameless
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.5.2
4
+ version: 0.6.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - Olek Janiszewski
8
8
  autorequire:
9
9
  bindir: exe
10
10
  cert_chain: []
11
- date: 2016-12-02 00:00:00.000000000 Z
11
+ date: 2017-07-26 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: msgpack
@@ -140,7 +140,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
140
140
  version: '0'
141
141
  requirements: []
142
142
  rubyforge_project:
143
- rubygems_version: 2.5.2
143
+ rubygems_version: 2.6.11
144
144
  signing_key:
145
145
  specification_version: 4
146
146
  summary: Scalable distributed append-only data store