shameless 0.5.2 → 0.6.0
- checksums.yaml +4 -4
- data/CHANGELOG.md +5 -0
- data/README.md +93 -10
- data/lib/shameless/configuration.rb +2 -0
- data/lib/shameless/index.rb +2 -2
- data/lib/shameless/model.rb +3 -2
- data/lib/shameless/store.rb +2 -2
- data/lib/shameless/version.rb +1 -1
- metadata +3 -3
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA1:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: 5bc5dbfda6dcf36c8ca5ac5d91394cf1c01b08ec
|
4
|
+
data.tar.gz: f6ce2ddeedc0506c1ca7ec6613a42971eafa86ef
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: 42d218c63e8ef57dd1aa5b6a6f2a69db89f39720c8935f26e0d8e47e3872c4a21a5449915500520cbe109750028659349444258a795c3ed44fef2178aa7ae3d3
|
7
|
+
data.tar.gz: 762b0ce2284870d8518aa17a458746b6d0e80641af8f5c5f94ca4f25f386c7ea3481d8ce82598379c9848087ec4e4387b060fc775e3f8e43ba462555f93bacf9
|
data/CHANGELOG.md
CHANGED
data/README.md
CHANGED
@@ -8,29 +8,29 @@ Shameless is an implementation of a schemaless, distributed, append-only store b
|
|
8
8
|
|
9
9
|
Shameless was born out of our need to have highly scalable, distributed storage for hotel rates. Rates are a way hotels package their rooms, they typically include check-in and check-out date, room type, rate plan, net price, discount, extra services, etc. Our original solution of storing rates in a typical relational SQL table was reaching its limits due to write congestion, migration anxiety, and high maintenance.
|
10
10
|
|
11
|
-
Hotel rates change very frequently, so our solution needed to have consistent write latency. There are also
|
11
|
+
Hotel rates change very frequently, so our solution needed to have consistent write latency. There are also multiple agents mutating various aspects of those rates, so we wanted something that would enable versioning. We also wanted to avoid having to create migrations whenever we were adding more data to rates.
|
12
12
|
|
13
13
|
## Concept
|
14
14
|
|
15
15
|
The whole idea of Shameless is to split a regular SQL table into index tables and content tables. Index tables map the fields you want to query by to UUIDs, content tables map UUIDs to model contents (bodies). In addition, both index and content tables are sharded.
|
16
16
|
|
17
|
-
The body of the model is schema-less, you can store
|
17
|
+
The body of the model is schema-less, you can store arbitrary data structures in it. Under the hood, the body is serialized using MessagePack and stored as a blob in a single database column (hence the need for index tables).
|
18
18
|
|
19
19
|
The process of querying for records can be described as:
|
20
20
|
|
21
21
|
1. Query the index tables by index fields (e.g. hotel ID, check-in date, and length of stay), sharded by hotel ID, getting back a list of UUIDs
|
22
|
-
|
22
|
+
1. Query the content tables, sharded by UUID, for most recent version of model
|
23
23
|
|
24
24
|
Inserting a record is similar:
|
25
25
|
|
26
26
|
1. Generate a UUID
|
27
|
-
|
28
|
-
|
27
|
+
1. Serialize and write model content into appropriate shard of the content tables
|
28
|
+
1. Insert a row (index fields + model UUID) to the appropriate shard of the index table
|
29
29
|
|
30
30
|
Inserting a new version of an existing record is even simpler:
|
31
31
|
|
32
32
|
1. Increment version
|
33
|
-
|
33
|
+
1. Serialize and write model content into appropriate shard of the content tables
|
34
34
|
|
35
35
|
Naturally, shameless hides all that complexity behind a straight-forward API.
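The query and insert steps described above can be sketched with plain Ruby arrays standing in for the sharded tables. This is only an illustration under stated assumptions, not shameless's implementation: `TinyStore`, the modulo shard functions, and the shard count of 4 are hypothetical, and `Marshal` is used in place of MessagePack so the sketch needs only the standard library.

```ruby
require 'securerandom'

# Hypothetical in-memory model of the index/content split (not shameless's code).
class TinyStore
  SHARDS_COUNT = 4 # assumption; real stores use many more shards

  def initialize
    @index_shards   = Array.new(SHARDS_COUNT) { [] } # rows: index fields + :uuid
    @content_shards = Array.new(SHARDS_COUNT) { [] } # rows: :uuid, :ref_key, :body
  end

  # Insert: generate a UUID, write the content, then write the index row.
  def insert(index_fields, body)
    uuid = SecureRandom.uuid
    write_version(uuid, 0, body)
    index_shard(index_fields.fetch(:hotel_id)) << index_fields.merge(uuid: uuid)
    uuid
  end

  # New version: increment the version and append; index tables are untouched.
  def update(uuid, body)
    write_version(uuid, latest_row(uuid)[:ref_key] + 1, body)
  end

  # Query: index lookup first, then the most recent content version per UUID.
  def where(query)
    rows = index_shard(query.fetch(:hotel_id)).select do |row|
      query.all? { |field, value| row[field] == value }
    end
    rows.map { |row| Marshal.load(latest_row(row[:uuid])[:body]) }
  end

  private

  # Index tables are sharded by hotel ID (assumed: simple modulo).
  def index_shard(hotel_id)
    @index_shards[hotel_id % SHARDS_COUNT]
  end

  # Content tables are sharded by UUID (assumed: UUID read as a hex integer).
  def content_shard(uuid)
    @content_shards[uuid.delete('-').to_i(16) % SHARDS_COUNT]
  end

  # The body is serialized into a single blob, as in the README's description.
  def write_version(uuid, ref_key, body)
    content_shard(uuid) << {uuid: uuid, ref_key: ref_key, body: Marshal.dump(body)}
  end

  def latest_row(uuid)
    content_shard(uuid).select { |row| row[:uuid] == uuid }.max_by { |row| row[:ref_key] }
  end
end

store = TinyStore.new
uuid = store.insert({hotel_id: 1, check_in_date: '2017-01-03'}, {net_price: 120.0})
store.update(uuid, {net_price: 130.0})
store.where(hotel_id: 1, check_in_date: '2017-01-03') # => [{net_price: 130.0}]
```

Note that `update` never touches the index rows, which is why writing a new version is the cheapest of the three operations.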
|
36
36
|
|
@@ -46,12 +46,13 @@ The core object of shameless is a `Store`. Here's how you can set one up:
|
|
46
46
|
RateStore = Shameless::Store.new(:rate_store) do |c|
|
47
47
|
c.partition_urls = [ENV['RATE_STORE_DATABASE_URL_0'], ENV['RATE_STORE_DATABASE_URL_1']]
|
48
48
|
c.shards_count = 512 # total number of shards across all partitions
|
49
|
-
c.connection_options = {max_connections: 10} # connection options passed to Sequel.connect
|
49
|
+
c.connection_options = {max_connections: 10} # connection options passed to `Sequel.connect`
|
50
50
|
c.database_extensions = [:newrelic_instrumentation]
|
51
|
+
c.create_table_options = {engine: "InnoDB"} # passed to Sequel's `create_table`
|
51
52
|
end
|
52
53
|
```
|
53
54
|
|
54
|
-
The initializer argument (`:rate_store`) defines the namespace by which all tables will be prefixed, in this case `rate_store_`.
|
55
|
+
The initializer argument (`:rate_store`) defines the namespace by which all tables will be prefixed, in this case `rate_store_`. If you pass `nil`, there will be no prefix.
|
55
56
|
|
56
57
|
Once you've got the Store configured, you can declare models.
|
57
58
|
|
@@ -135,7 +136,9 @@ end
|
|
135
136
|
|
136
137
|
### Reading/writing
|
137
138
|
|
138
|
-
To
|
139
|
+
To write data to the model, use `Model.put`, `Model#save`, `Cell#save`, `Model#update`, or `Cell#update`. `Model.put` performs an "upsert": it looks for an existing record with the given index fields. If it finds one, it inserts a new version of that record's base cell; otherwise it picks a new UUID, writes the first version of the base cell, and writes to all indices.
|
140
|
+
|
141
|
+
Here are some examples of how you can read and write data from/to a shameless store:
|
139
142
|
|
140
143
|
```ruby
|
141
144
|
# Writing - all index fields are required, the rest is the schemaless content
|
@@ -146,16 +149,85 @@ rate[:net_price] # => 120.0 # access in the "base" cell
|
|
146
149
|
rate[:net_price] = 130.0
|
147
150
|
rate.save
|
148
151
|
|
152
|
+
# You can also access the "base" cell explicitly
|
153
|
+
rate.base[:net_price] = 140.0
|
154
|
+
rate.base.save
|
155
|
+
|
149
156
|
# Reading from/writing to a different cell is simple, too:
|
150
157
|
rate.meta[:hotel_enabled] = true
|
151
158
|
rate.meta.save
|
152
159
|
|
160
|
+
# You can also do that in one go using `Model#update` or `Cell#update`. This writes a new
|
161
|
+
# version of the cell, merging the hash passed in as parameter with existing values.
|
162
|
+
rate.update(tax_rate: 11.0, gateway: 'pegasus')
|
163
|
+
rate.body # => {net_price: 140.0, tax_rate: 11.0, gateway: 'pegasus'}
|
164
|
+
|
165
|
+
rate.meta.update(hotel_enabled: false)
|
166
|
+
```
|
167
|
+
|
168
|
+
To query, use `Model.where` (also using Sequel's [virtual row blocks](http://sequel.jeremyevans.net/rdoc/files/doc/virtual_rows_rdoc.html)):
|
169
|
+
|
170
|
+
```ruby
|
153
171
|
# Querying by primary index
|
154
172
|
rates = Rate.where(hotel_id: 1, room_type: '1 bed', check_in_date: Date.today)
|
155
173
|
|
156
174
|
# Querying by a named index
|
157
175
|
rates = Rate.secondary_index.where(hotel_id: 1, gateway: 'pegasus', discount_type: 'geo')
|
158
176
|
rates.first[:net_price] # => 130.0
|
177
|
+
|
178
|
+
# Query using Sequel's virtual row block (handy for inequality operators)
|
179
|
+
rates = Rate.where(hotel_id: 1, room_type: '1 bed') { check_in_date > Date.today }
|
180
|
+
```
|
181
|
+
|
182
|
+
To read a cell field that may not have a value, use `Cell#fetch` (`Model#fetch` delegates to the base cell), which returns the stored value or the given default, e.g.:
|
183
|
+
|
184
|
+
```ruby
|
185
|
+
rate[:net_price] = 130.0
|
186
|
+
rate.fetch(:net_price, 100) # => 130.0
|
187
|
+
rate.meta.fetch(:enabled, true) # => true
|
188
|
+
```
|
189
|
+
|
190
|
+
Cells are versioned, the current version is stored in a column called `ref_key`. The first version of a cell has a `ref_key` of zero. To access a previous version of a cell, use `Cell#previous` (`Model#previous` delegates to the base cell). Example:
|
191
|
+
|
192
|
+
```ruby
|
193
|
+
# ...
|
194
|
+
rate[:net_price] # => 120.0
|
195
|
+
rate.ref_key # => 1
|
196
|
+
rate.update(net_price: 130.0)
|
197
|
+
rate.ref_key # => 2
|
198
|
+
rate.previous[:net_price] # => 120.0
|
199
|
+
rate.previous.ref_key # => 1
|
200
|
+
rate.previous.previous.ref_key # => 0
|
201
|
+
rate.previous.previous.previous # => nil
|
202
|
+
```
|
203
|
+
|
204
|
+
Another process may have updated a model/cell since you loaded it. To fetch the latest state, use `Cell#reload` (`Model#reload` reloads *all* cells), e.g.:
|
205
|
+
|
206
|
+
```ruby
|
207
|
+
rate[:net_price] # => 120.0
|
208
|
+
|
209
|
+
# Another process updates the cell
|
210
|
+
Rate.where(hotel_id: rate[:hotel_id]).first.update(net_price: 130.0)
|
211
|
+
|
212
|
+
rate[:net_price] # => 120.0
|
213
|
+
rate.reload
|
214
|
+
rate[:net_price] # => 130.0
|
215
|
+
```
|
216
|
+
|
217
|
+
To check if a given cell exists, use `Cell#present?` (as you can suspect, `Model#present?` delegates to the base cell). You can also use `Model#cells` to iterate over all cells, e.g.:
|
218
|
+
|
219
|
+
```ruby
|
220
|
+
rate.present? # => true
|
221
|
+
rate.meta.present? # => false
|
222
|
+
|
223
|
+
rate.cells.any?(&:present?) # => true
|
224
|
+
```
|
225
|
+
|
226
|
+
To see the cell's full state (body + metadata), use `Cell#as_json` (`Model#as_json` delegates to the base cell), e.g.:
|
227
|
+
|
228
|
+
```ruby
|
229
|
+
rate.as_json # => {id: 123, uuid: "...", created_at: "...", column_name: "base", ref_key: 3,
|
230
|
+
# body: {hotel_id: 1, check_in_date: "2017-01-03", room_type: "ROH", net_price: 130.0}}
|
159
231
|
```
|
160
232
|
|
161
233
|
### Creating tables
|
@@ -168,6 +240,18 @@ RateStore.create_tables!
|
|
168
240
|
|
169
241
|
This will create the underlying index tables, content tables, together with database indices for fast access.
|
170
242
|
|
243
|
+
### Concurrent writes
|
244
|
+
|
245
|
+
Since writes to shameless aren't atomic, concurrency control needs to be moved to application code. We're using a setup where almost all our writes go through queues, one queue per shard. We're using `Store#each_shard` to match queue names to shards and to aggregate queue stats across all shards.
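A minimal sketch of the one-queue-per-shard idea, using only Ruby's built-in `Queue` and threads. Everything here is an assumption made for illustration: `ShardedWriter` is a hypothetical class, its `find_shard` uses a plain modulo (the real mapping is whatever `Store#find_shard` returns), and a production setup would more likely use an external queueing system.

```ruby
# Hypothetical: one queue and one worker thread per shard, so that writes
# targeting the same shard are applied strictly one at a time, in order.
class ShardedWriter
  def initialize(shards_count, &apply)
    @shards_count = shards_count
    @queues = Array.new(shards_count) { Queue.new }
    @workers = @queues.map do |queue|
      Thread.new do
        # A nil job is the shutdown signal for this worker.
        while (job = queue.pop)
          apply.call(job)
        end
      end
    end
  end

  # Assumed shard function; stands in for Store#find_shard.
  def find_shard(shardable_value)
    shardable_value % @shards_count
  end

  def enqueue(shardable_value, job)
    @queues[find_shard(shardable_value)] << job
  end

  # Per-shard backlog, e.g. for aggregating queue stats.
  def queue_sizes
    @queues.map(&:size)
  end

  def shutdown
    @queues.each { |queue| queue << nil }
    @workers.each(&:join)
  end
end

applied = Hash.new { |hash, key| hash[key] = [] }
lock = Mutex.new
writer = ShardedWriter.new(4) do |job|
  lock.synchronize { applied[job[:hotel_id]] << job[:net_price] }
end

writer.enqueue(1, {hotel_id: 1, net_price: 120.0})
writer.enqueue(1, {hotel_id: 1, net_price: 130.0})
writer.enqueue(2, {hotel_id: 2, net_price: 99.0})
writer.shutdown

applied[1] # => [120.0, 130.0]
```

Because each shard has exactly one consumer, two writes for the same hotel can never interleave, while writes for different shards still proceed in parallel.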
|
246
|
+
|
247
|
+
### Using shameless as a data stream store (similar to Kafka)
|
248
|
+
|
249
|
+
Thanks to storing each write to shameless as a new record in the underlying database table, we're able to use our shameless store as a log for stream processing. For each shard, we have a worker that goes through all new entries in that shard and triggers various event processors to handle all kinds of asynchronous work. For that purpose, we're using `Model.fetch_latest_cells(shard:, cursor:, limit:)`, and incrementing a cursor (stored in Redis) after each record has been processed successfully. We're using the cells' IDs as cursors, using `Cell#id`, which returns the underlying table's primary key value.
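The cursor loop described above might look roughly like the following. This is a sketch under assumptions, not the gem's code: `ShardLog` is a hypothetical in-memory stand-in for one shard's table, its `fetch_latest_cells` only mimics the keyword arguments named above and is assumed to return cells with an ID strictly greater than the cursor, and a plain Hash replaces Redis as the cursor store.

```ruby
# Hypothetical in-memory stand-in for one shard: append-only, IDs only increase.
class ShardLog
  Cell = Struct.new(:id, :uuid, :body)

  def initialize
    @cells = []
  end

  def append(uuid, body)
    @cells << Cell.new(@cells.size + 1, uuid, body)
  end

  # Assumed semantics: up to `limit` cells whose id is greater than `cursor`.
  def fetch_latest_cells(cursor:, limit:)
    @cells.select { |cell| cell.id > cursor }.first(limit)
  end
end

# Processes new cells in batches, advancing the cursor after each success.
# If the block raises, the cursor still points at the last processed cell.
def drain(log, cursors, shard, limit: 2)
  loop do
    cells = log.fetch_latest_cells(cursor: cursors.fetch(shard, 0), limit: limit)
    break if cells.empty?

    cells.each do |cell|
      yield cell
      cursors[shard] = cell.id # a Hash here; the text above keeps this in Redis
    end
  end
end

log = ShardLog.new
log.append('a', {net_price: 120.0})
log.append('a', {net_price: 130.0})
log.append('b', {net_price: 99.0})

cursors = {}
seen = []
drain(log, cursors, 0) { |cell| seen << cell.body[:net_price] }

seen       # => [120.0, 130.0, 99.0]
cursors[0] # => 3
```

Advancing the cursor only after the block returns gives at-least-once processing: a crash mid-batch replays the unfinished cell on the next run rather than skipping it.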
|
250
|
+
|
251
|
+
### Utilities
|
252
|
+
|
253
|
+
Sometimes it may be useful to know where a given model will end up, based on its shardable value. For this, you can use `Store#find_shard(shardable_value)`, e.g.: `RateStore.find_shard(hotel.id) # => 196`.
|
254
|
+
|
171
255
|
## Installation
|
172
256
|
|
173
257
|
Add this line to your application's Gemfile:
|
@@ -194,7 +278,6 @@ To install this gem onto your local machine, run `bundle exec rake install`. To
|
|
194
278
|
|
195
279
|
Bug reports and pull requests are welcome on GitHub at https://github.com/hoteltonight/shameless.
|
196
280
|
|
197
|
-
|
198
281
|
## License
|
199
282
|
|
200
283
|
The gem is available as open source under the terms of the [MIT License](http://opensource.org/licenses/MIT).
|
data/lib/shameless/configuration.rb
CHANGED
@@ -3,6 +3,8 @@ module Shameless
|
|
3
3
|
attr_accessor :partition_urls, :shards_count, :connection_options, :database_extensions,
|
4
4
|
:create_table_options
|
5
5
|
|
6
|
+
# Needed to deal with our legacy schema that stores created_at as an integer timestamp
|
7
|
+
# and does date conversions in Ruby-land, don't set to `true` for new projects
|
6
8
|
attr_accessor :legacy_created_at_is_bigint
|
7
9
|
|
8
10
|
def shards_per_partition_count
|
data/lib/shameless/index.rb
CHANGED
@@ -36,10 +36,10 @@ module Shameless
|
|
36
36
|
@model.store.put(table_name, shardable_value, index_values)
|
37
37
|
end
|
38
38
|
|
39
|
-
def where(query)
|
39
|
+
def where(query, &block)
|
40
40
|
shardable_value = query.fetch(@shard_on).to_i
|
41
41
|
query = index_values(query, false)
|
42
|
-
@model.store.where(table_name, shardable_value, query).map {|r| @model.new(r[:uuid]) }
|
42
|
+
@model.store.where(table_name, shardable_value, query, &block).map {|r| @model.new(r[:uuid]) }
|
43
43
|
end
|
44
44
|
|
45
45
|
def table_name
|
data/lib/shameless/model.rb
CHANGED
@@ -1,3 +1,4 @@
|
|
1
|
+
require 'securerandom'
|
1
2
|
require 'shameless/index'
|
2
3
|
require 'shameless/cell'
|
3
4
|
|
@@ -98,8 +99,8 @@ module Shameless
|
|
98
99
|
@indices.each(&:create_tables!)
|
99
100
|
end
|
100
101
|
|
101
|
-
def where(query)
|
102
|
-
primary_index.where(query)
|
102
|
+
def where(query, &block)
|
103
|
+
primary_index.where(query, &block)
|
103
104
|
end
|
104
105
|
|
105
106
|
def reject_index_values(values)
|
data/lib/shameless/store.rb
CHANGED
@@ -22,8 +22,8 @@ module Shameless
|
|
22
22
|
find_table(table_name, shardable_value).insert(values)
|
23
23
|
end
|
24
24
|
|
25
|
-
def where(table_name, shardable_value, query)
|
26
|
-
find_table(table_name, shardable_value).where(query)
|
25
|
+
def where(table_name, shardable_value, query, &block)
|
26
|
+
find_table(table_name, shardable_value).where(query, &block)
|
27
27
|
end
|
28
28
|
|
29
29
|
def disconnect
|
data/lib/shameless/version.rb
CHANGED
metadata
CHANGED
@@ -1,14 +1,14 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: shameless
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 0.
|
4
|
+
version: 0.6.0
|
5
5
|
platform: ruby
|
6
6
|
authors:
|
7
7
|
- Olek Janiszewski
|
8
8
|
autorequire:
|
9
9
|
bindir: exe
|
10
10
|
cert_chain: []
|
11
|
-
date:
|
11
|
+
date: 2017-07-26 00:00:00.000000000 Z
|
12
12
|
dependencies:
|
13
13
|
- !ruby/object:Gem::Dependency
|
14
14
|
name: msgpack
|
@@ -140,7 +140,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
|
|
140
140
|
version: '0'
|
141
141
|
requirements: []
|
142
142
|
rubyforge_project:
|
143
|
-
rubygems_version: 2.
|
143
|
+
rubygems_version: 2.6.11
|
144
144
|
signing_key:
|
145
145
|
specification_version: 4
|
146
146
|
summary: Scalable distributed append-only data store
|