sequel-schema-sharding 0.0.1 → 0.0.2
- checksums.yaml +4 -4
- data/CONTRIBUTORS.md +6 -0
- data/LICENSE.txt +1 -1
- data/README.md +247 -10
- data/examples/db/migrations/albums/001_create_albums.rb +15 -0
- data/examples/db/migrations/artists/001_create_artists.rb +15 -0
- data/examples/model.rb +19 -0
- data/examples/sharding.rb +4 -0
- data/examples/sharding.yml +53 -0
- data/lib/sequel/schema-sharding/version.rb +1 -1
- metadata +10 -3
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 59427115172509c05a9c50ff6b52f6bf13b3bae7
+  data.tar.gz: 82f16ec31818221234a0873abe6663395137e0ba
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 8620ce75eb60600a686dbd2c3ef53bf5d11d604963e6315e7d467c046578a6c7d76fe408cba8a001925c40ef04ec6576e1c697b4aad821dcc5ffbe3b0a5ebec9
+  data.tar.gz: cd8c89a4f8a40afd9c062aeed3423638db1f7441eb8c9bb967a7a470a134d2fa583d8a53782f6bb526d2cbdfe03570926f5cf36447b85106938d134ee3970f21
data/CONTRIBUTORS.md
ADDED
data/LICENSE.txt
CHANGED
data/README.md
CHANGED
@@ -1,28 +1,265 @@
-
-
+sequel-schema-sharding
+======================
 
-[](https://travis-ci.org/wanelo/sequel-sharding)
+[](https://travis-ci.org/wanelo/sequel-schema-sharding)
+
+Horizontally shard PostgreSQL tables with the Sequel gem, where each shard
+lives in its own PostgreSQL schema.
+
+This gem allows you to configure mappings between logical and physical shards, pooling
+connections between logical shards on the same physical server.
 
-Horizontally shard postgres with the Sequel gem. This gem allows you to configure mappings between logical and
-physical shards, and pool connections between logical shards on the same physical server.
 
 ## Installation
 
 Add this line to your application's Gemfile:
 
-    gem 'sequel-sharding'
+    gem 'sequel-schema-sharding'
 
 And then execute:
 
     $ bundle
 
-Or install it yourself as:
-
-    $ gem install sequel-sharding
 
 ## Usage
 
-
+See the `examples` directory for example files.
+
+### Configuration
+
+Create a sharding configuration file in your project, for instance at
+`config/sharding.yml`. The format should match the following
+conventions:
+
+```yml
+<env>:
+  tables:
+    <table_name>:
+      schema_name: "schema_%e_%s"
+      logical_shards:
+        <1..n>: <shard_name>
+  physical_shards:
+    <shard_name>:
+      host: <hostname>
+      database: <database>
+  common:
+    username: <pg_username>
+    password: <pg_password>
+    port: <pg_port>
+```
+
+Tables can coexist in schemas, though they do not have to.
+
+In your project, configure `sequel-schema-sharding` in a Ruby file that
+gets loaded before your models, for instance at `config/sharding.rb`.
+
+```ruby
+require 'sequel-schema-sharding'
+
+Sequel::SchemaSharding.migration_path = File.expand_path('../../db/sharding_migrations', __FILE__)
+Sequel::SchemaSharding.sharding_yml_path = File.expand_path('../sharding.yml', __FILE__)
+```
+
+### Migrations
+
+Each table gets its own set of migrations. Behind the scenes,
+`sequel-schema-sharding` uses Sequel migrations, though migrations are
+run using the `Sequel::SchemaSharding::DatabaseManager` class.
+
+For instance, if you have two sharded tables, `:artists` and `:albums`,
+your migration folder would look something like this:
+
+```yml
+- my_project
+  - db
+    - migrations
+      - artists
+        - 001_create_artists.rb
+        - 002_add_indexes_to_artists.rb
+      - albums
+        - 001_create_albums.rb
+```
+
+See Sequel documentation for more info:
+* (http://sequel.rubyforge.org/rdoc/files/doc/schema_modification_rdoc.html)
+* (http://sequel.rubyforge.org/rdoc/files/doc/migration_rdoc.html)
+
+TODO: rake tasks for running migrations
+
+### Models
+
+Models declare their table in the class definition. This allows Sequel
+to load table information from the database when the environment loads.
+This is particularly important for typecasting, so empty strings can be
+typecast to null, etc.
+
+The tricky bit is that `sequel-schema-sharding` connects to the first
+available shard for a table in order to read the database schema.
+
+```ruby
+require 'config/sharding'
+
+class Artist < Sequel::SchemaSharding::Model('artists')
+  set_columns [:id, :name]
+  set_sharded_column :id
+
+  def this
+    @this ||= self.class.by_id(id)
+  end
+
+  def self.by_id(id)
+    shard_for(id).where(id: id).first
+  end
+end
+
+class Album < Sequel::SchemaSharding::Model('albums')
+  set_columns [:artist_id, :name, :release_date, :created_at]
+  set_sharded_column :artist_id
+
+  def this
+    @this ||= self.class.by_artist(artist_id)
+  end
+
+  def self.by_artist(artist_id)
+    shard_for(artist_id).where(artist_id: artist_id)
+  end
+
+  def self.by_artist_and_name(artist_id, name)
+    shard_for(artist_id).where(name: name, artist_id: artist_id)
+  end
+end
+```
+
+Note that logical and physical shards mapped in sharding.yml need to exist
+before you can load models into memory.
+
+Read access always starts with the `:shard_for` method, to ensure that
+the correct database connection and shard name are used. Writes will
+automatically choose the correct shard based on the sharded column.
+Never try to insert records with nil values in sharded columns.
+
+TODO: explain why we define `this`
+
+## FAQ
+
+### How should I shard my databases?
+
+This is entirely dependent on the access patterns of your application. A
+good rule, though, is to look at your indexes. If every query goes
+through an index on `:user_id`, then chances are that you should shard
+on `:user_id`. If half of your queries go through `:user_id` and the
+other half go through `:job_id`, then you may need to create two sets of
+shards, each with its own model, and have your application write to
+both. This requires additional application complexity to keep the two
+sets of shards in sync, though it is less complex than doing multi-shard reads to
+keep everything in one model.
+
+When going into database sharding, an early exercise that is very
+helpful is to analyze application queries and try to reduce the number
+of unique queries. If possible, try to refactor queries such that they
+fit into the smallest number of shard types. For instance, if you find
+Albums by release year, but every action you query from already has the
+`:artist_id`, consider changing your query to find by `:artist_id` and
+release year.
+
+### How should I generate IDs?
+
+This is also dependent on your application and your comfort level with
+various technologies, but regardless should be done outside
+of `sequel-schema-sharding`. In general there are three approaches that
+we've considered:
+
+* Follow Instagram's approach and let PostgreSQL generate ids. They
+install functions into each shard, to ensure that each shard generates
+unique ids.
+
+* Follow Twitter's approach and deploy a separate service for unique id
+generation. Their in-house solution is called Snowflake, and depends
+on Maven, Finagle and Thrift.
+
+* Why use ids at all? If you are sharding Like data or something that
+looks similar to a join table, you may not need a unique identifier.
+You are probably sharding on a foreign key to some other table, and
+may not ever access individual Likes by id.
+
+### Should each table get its own set of shards/schemas?
+
+In the early days of a project's lifetime, it may seem like less
+management overhead to let multiple tables coexist in each shard.
+Experience with sharding in other technologies (particularly Redis) has
+shown us that in any sharded data store, you will eventually need
+to redistribute shards. More data equals larger storage and RAM
+requirements, and as servers fill up you will find yourself needing to
+move shards onto a greater number of servers. If your project is
+successful, this may come much sooner than you expect in initial
+infrastructure planning meetings.
+
+Colocating multiple data sets in individual shards makes shard
+redistribution more complicated and risk-prone. More things break when
+an individual shard goes down. Pages or queries that depend on an
+individual data set will stop working when you take down shards to do
+maintenance on other data sets.
+
+Simply put, it's less stressful when doing operational maintenance to
+require twice as many steps that are each easier and less risk-prone.
+So, do whatever you feel is best, but we've chosen to make each shard
+single-purpose in our infrastructure.
+
+### Sequel does sharding. Why another gem?
+
+The sharding plugin that ships with Sequel assumes that each shard is a
+separate database. This means that each shard requires a separate
+connection pool, and that each shard includes every table. When
+splitting a database into thousands of shards, this means that each
+application process requires thousands of connections. A proxy such as
+PgBouncer could help reduce the number of connections from an individual
+application server, but even then PgBouncer would need to manage thousands
+of connections.
+
+When designing a sharded architecture similar to Instagram's approach
+(http://instagram-engineering.tumblr.com/post/10853187575/sharding-ids-at-instagram),
+it may be desirable to start with thousands or tens of thousands of shards,
+to delay the need for resharding as long as possible. PostgreSQL is able
+to manage tens of thousands of schemas in a single database without
+significant performance problems, so we can design a sharded backend of
+thousands of shards living on a few physical servers. As stored data
+grows, these shards can be moved onto a greater number of servers,
+without the complication of resharding (i.e. changing the number of
+shards while retaining the exact mapping of data into old shards).
+
+### Why Sequel?
+
+After both good and bad experiences with other Ruby ORMs, Sequel's
+documentation, ease of use and understandable codebase made it a solid
+choice for us. The fact that it already supports horizontal sharding and
+was easy to adapt to our own requirements was a pleasant surprise.
+
+### What the what?? def self.Model; ???
+
+Yeah, this threw us for a while, too. The thing is, ORMs in Ruby tend to
+load information like column info, indexes, etc. directly from the
+connected databases, rather than from local schema dictionaries. In
+order to do this, databases need to be created and migrations run BEFORE
+model files can be validly loaded.
+
+If the ORM doesn't load this info from somewhere, then it can't
+correctly do things like typecast string HTTP params to integers (or
+nulls).
+
+Rather than monkeypatching our way around this requirement in Sequel, we
+ride the wave and just patch in our additions.
+
+### What could go wrong?
+
+The thing that you never want to happen is to change the mapping of
+shards to data. For instance, if you change the number of shards without
+migrating data into a new database backend, the algorithm by which
+schemas are chosen will start returning a different mapping for reads than
+that which was used to insert data. New records will go into the new
+mapping, but any attempt to read a record inserted via the old mapping
+will pick the wrong shard and return an empty set. DON'T EVER DO THIS.
+It's really embarrassing.
+
 
 ## Contributing
 
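Note on the README's "What could go wrong?" section: the remapping hazard it describes can be illustrated with a minimal sketch. The sketch assumes a plain modulo mapping from an integer key to a logical shard number. `logical_shard_for` is a hypothetical helper, not part of `sequel-schema-sharding`, and the gem's own algorithm for choosing schemas may differ.

```ruby
# Hypothetical helper, not the gem's API: maps an integer key onto one of
# `shard_count` logical shards, numbered from 1.
def logical_shard_for(key, shard_count)
  (key % shard_count) + 1
end

ids = (1..20).to_a

# Mapping in effect when the records were written (4 logical shards).
written_to = ids.to_h { |id| [id, logical_shard_for(id, 4)] }

# Mapping used for reads after the shard count is changed to 5
# without migrating any data.
read_from = ids.to_h { |id| [id, logical_shard_for(id, 5)] }

# Ids whose reads would now look in a different shard than the one
# that actually holds the row.
misrouted = ids.reject { |id| written_to[id] == read_from[id] }

puts "#{misrouted.size} of #{ids.size} ids would be read from the wrong shard"
```

In this toy setup most ids end up misrouted, which is why the README recommends starting with a large, fixed number of logical shards and moving them between servers instead of changing their count.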
data/examples/model.rb
ADDED
@@ -0,0 +1,19 @@
+require 'config/sharding'
+
+class Thing < Sequel::SchemaSharding::Model('things')
+  set_columns [:name, :thing1, :thing2]
+  set_sharded_column :name
+
+  # class variables used by Sequel can't easily be set via
+  # pretty methods at the moment. They can be quickly overridden,
+  # however.
+  @require_modification = false
+
+  def this
+    @this ||= self.class.by_name(name)
+  end
+
+  def self.by_name(name)
+    shard_for(name).where(name: name)
+  end
+end
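Note on `examples/model.rb`: this model shards on a string column (`:name`). How `shard_for` turns a value into a shard is not shown in this diff. The sketch below only illustrates one property any such mapping needs, namely that it returns the same shard for the same key in every process. `stable_shard_number` and the count of four logical shards are assumptions for illustration. The sketch uses `Zlib.crc32` from the Ruby standard library instead of `String#hash`, because `String#hash` is seeded differently in each Ruby process.

```ruby
require 'zlib'

# Hypothetical helper, not the gem's API: derives a logical shard number
# (1..shard_count) from a string key using a checksum that is stable
# across processes and machines.
def stable_shard_number(key, shard_count)
  (Zlib.crc32(key.to_s) % shard_count) + 1
end

%w[thing_a thing_b thing_c].each do |name|
  puts "#{name} -> logical shard #{stable_shard_number(name, 4)}"
end
```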
data/examples/sharding.yml
ADDED
@@ -0,0 +1,53 @@
+test:
+  tables:
+    artists:
+      schema_name: artists_%e_%s
+      logical_shards:
+        1..2: shard1
+        3..4: shard2
+    albums:
+      schema_name: albums_%e_%s
+      logical_shards:
+        1..2: shard2
+        3..4: shard3
+  physical_shards:
+    shard1:
+      host: 127.0.0.1
+      database: my_project_test_shard1
+    shard2:
+      host: 127.0.0.1
+      database: my_project_test_shard2
+    shard3:
+      host: 127.0.0.1
+      database: my_project_test_shard3
+  common:
+    username: postgres
+    password:
+    port: 5432
+development:
+  tables:
+    artists:
+      schema_name: artists_%e_%s
+      logical_shards:
+        1..2: shard1
+        3..4: shard2
+    albums:
+      schema_name: albums_%e_%s
+      logical_shards:
+        1..2: shard2
+        3..4: shard3
+  physical_shards:
+    shard1:
+      host: 127.0.0.1
+      database: my_project_development_shard1
+    shard2:
+      host: 127.0.0.1
+      database: my_project_development_shard2
+    shard3:
+      host: 127.0.0.1
+      database: my_project_development_shard3
+  common:
+    username: postgres
+    password:
+    port: 5432
+
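Note on `examples/sharding.yml`: the `logical_shards` entries use range keys such as `1..2`. A minimal sketch of how such a file could be expanded into a lookup from logical shard number to physical shard and schema name follows. It is not the gem's loader. `expand_logical_shards` is a hypothetical helper, and reading `%e` as the environment name and `%s` as the logical shard number is an assumption, since the README does not define those placeholders.

```ruby
require 'yaml'

# A trimmed copy of the example configuration.
config = YAML.safe_load(<<~YML)
  test:
    tables:
      artists:
        schema_name: artists_%e_%s
        logical_shards:
          1..2: shard1
          3..4: shard2
YML

# Hypothetical helper: expands "1..2: shard1" style entries into
# { 1 => {...}, 2 => {...} } for one table in one environment.
def expand_logical_shards(config, env, table)
  table_config = config.fetch(env).fetch('tables').fetch(table)
  template = table_config.fetch('schema_name')

  table_config.fetch('logical_shards').each_with_object({}) do |(range, physical), map|
    first, last = range.to_s.split('..').map { |n| Integer(n) }
    (first..last).each do |number|
      # Assumed placeholder meaning: %e = environment, %s = shard number.
      schema = template.sub('%e', env).sub('%s', number.to_s)
      map[number] = { physical_shard: physical, schema: schema }
    end
  end
end

shards = expand_logical_shards(config, 'test', 'artists')
shards.each do |number, info|
  puts "#{number}: #{info[:schema]} on #{info[:physical_shard]}"
end
```

Under these assumptions, several logical shards resolve to the same physical shard name, which is what would let connections be pooled per physical server as the README describes.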
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: sequel-schema-sharding
 version: !ruby/object:Gem::Version
-  version: 0.0.
+  version: 0.0.2
 platform: ruby
 authors:
 - Paul Henry
@@ -10,7 +10,7 @@ authors:
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2013-09-
+date: 2013-09-08 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: sequel
@@ -78,10 +78,16 @@ files:
 - .gitignore
 - .rspec
 - .travis.yml
+- CONTRIBUTORS.md
 - Gemfile
 - LICENSE.txt
 - README.md
 - Rakefile
+- examples/db/migrations/albums/001_create_albums.rb
+- examples/db/migrations/artists/001_create_artists.rb
+- examples/model.rb
+- examples/sharding.rb
+- examples/sharding.yml
 - lib/sequel-schema-sharding.rb
 - lib/sequel/schema-sharding.rb
 - lib/sequel/schema-sharding/configuration.rb
@@ -125,7 +131,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
       version: '0'
 requirements: []
 rubyforge_project:
-rubygems_version: 2.0.
+rubygems_version: 2.0.3
 signing_key:
 specification_version: 4
 summary: Create horizontally sharded Sequel models with Postgres
@@ -141,3 +147,4 @@ test_files:
 - spec/schema-sharding/ring_spec.rb
 - spec/spec_helper.rb
 - spec/support/database_helper.rb
+has_rdoc: