sequel-schema-sharding 0.0.1 → 0.0.2

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: 9a83484015fdcd8dc4799e752d6d82ee94498738
4
- data.tar.gz: 2e44e0d65a6c598371035daddc1724022b3c893b
3
+ metadata.gz: 59427115172509c05a9c50ff6b52f6bf13b3bae7
4
+ data.tar.gz: 82f16ec31818221234a0873abe6663395137e0ba
5
5
  SHA512:
6
- metadata.gz: bac9d7a0ed39dbe558e2f0bd6422bda34f68676662ba5283722340c1ffbf784f6e8c433ca6b118efe41c8a0e66ed123c8bfd2f482597ace1ab750a2b7d2fcbd1
7
- data.tar.gz: 724cae47d487ee6e63a8e969ae8ed9f1d380ffaf61b3877074578a9704565fd0290ffd27550904afc9ad3949aba536b4fa69d5209670a39b13f21ae2e0fd874c
6
+ metadata.gz: 8620ce75eb60600a686dbd2c3ef53bf5d11d604963e6315e7d467c046578a6c7d76fe408cba8a001925c40ef04ec6576e1c697b4aad821dcc5ffbe3b0a5ebec9
7
+ data.tar.gz: cd8c89a4f8a40afd9c062aeed3423638db1f7441eb8c9bb967a7a470a134d2fa583d8a53782f6bb526d2cbdfe03570926f5cf36447b85106938d134ee3970f21
data/CONTRIBUTORS.md ADDED
@@ -0,0 +1,6 @@
1
+ Contributors
2
+ ============
3
+
4
+ * James Hart (james@wanelo.com)
5
+ * Paul Henry (paul@wanelo.com)
6
+ * Eric Saxby (sax@wanelo.com)
data/LICENSE.txt CHANGED
@@ -1,4 +1,4 @@
1
- Copyright (c) 2013 TODO: Write your name
1
+ Copyright (c) 2013 Wanelo, Inc
2
2
 
3
3
  MIT License
4
4
 
data/README.md CHANGED
@@ -1,28 +1,265 @@
1
- Sequel::SchemaSharding
2
- ================
1
+ sequel-schema-sharding
2
+ ======================
3
3
 
4
- [![Build Status](https://travis-ci.org/wanelo/sequel-sharding.png?branch=master)](https://travis-ci.org/wanelo/sequel-sharding)
4
+ [![Build Status](https://travis-ci.org/wanelo/sequel-schema-sharding.png?branch=master)](https://travis-ci.org/wanelo/sequel-schema-sharding)
5
+
6
+ Horizontally shard PostgreSQL tables with the Sequel gem, where each shard
7
+ lives in its own PostgreSQL schema.
8
+
9
+ This gem allows you to configure mappings between logical and physical shards, pooling
10
+ connections between logical shards on the same physical server.
5
11
 
6
- Horizontally shard postgres with the Sequel gem. This gem allows you to configure mappings between logical and
7
- physical shards, and pool connections between logical shards on the same physical server.
8
12
 
9
13
  ## Installation
10
14
 
11
15
  Add this line to your application's Gemfile:
12
16
 
13
- gem 'sequel-sharding'
17
+ gem 'sequel-schema-sharding'
14
18
 
15
19
  And then execute:
16
20
 
17
21
  $ bundle
18
22
 
19
- Or install it yourself as:
20
-
21
- $ gem install sequel-sharding
22
23
 
23
24
  ## Usage
24
25
 
25
- TODO :)
26
+ See the `examples` directory for example files.
27
+
28
+ ### Configuration
29
+
30
+ Create a sharding configuration file in your project, for instance at
31
+ `config/sharding.yml`. The format should match the following
32
+ conventions:
33
+
34
+ ```yml
35
+ <env>:
36
+   tables:
37
+     <table_name>:
38
+       schema_name: "schema_%e_%s"
39
+       logical_shards:
40
+         <1..n>: <shard_name>
41
+   physical_shards:
42
+     <shard_name>:
43
+       host: <hostname>
44
+       database: <database>
45
+   common:
46
+     username: <pg_username>
47
+     password: <pg_password>
48
+     port: <pg_port>
49
+ ```
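To make the template concrete, here is a minimal Ruby sketch of how such a configuration might be expanded into a lookup table and a list of schema names. It is not the gem's implementation: the helpers `expand_logical_shards` and `schema_name_for` are hypothetical, and it assumes that `%e` stands for the environment name and `%s` for the logical shard number, which the README does not state.

```ruby
# Hypothetical helpers; not part of sequel-schema-sharding's API.

# Turn {"1..2" => "shard1", "3..4" => "shard2"} into
# {1 => "shard1", 2 => "shard1", 3 => "shard2", 4 => "shard2"}.
def expand_logical_shards(ranges)
  ranges.each_with_object({}) do |(range, physical), map|
    first, last = range.to_s.split('..').map { |n| Integer(n) }
    last ||= first # a single number such as "5" maps exactly one shard
    (first..last).each { |logical| map[logical] = physical }
  end
end

# Assumption: %e is the environment and %s is the logical shard number.
def schema_name_for(pattern, env, logical_shard)
  pattern.gsub('%e', env.to_s).gsub('%s', logical_shard.to_s)
end

table = {
  'schema_name'    => 'artists_%e_%s',
  'logical_shards' => { '1..2' => 'shard1', '3..4' => 'shard2' }
}

# Logical shard number => physical shard name.
SHARD_MAP = expand_logical_shards(table['logical_shards'])

# One schema name per logical shard, for the "test" environment.
SCHEMAS = SHARD_MAP.keys.map { |n| schema_name_for(table['schema_name'], 'test', n) }
```

Under these assumptions, four logical shards spread over two physical servers yield four schemas, two per database.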
50
+
51
+ Tables can coexist in schemas, though they do not have to.
52
+
53
+ In your project, configure `sequel-schema-sharding` in a ruby file that
54
+ gets loaded before your models, for instance at `config/sharding.rb`.
55
+
56
+ ```ruby
57
+ require 'sequel-schema-sharding'
58
+
59
+ Sequel::SchemaSharding.migration_path = File.expand_path('../../db/sharding_migrations', __FILE__)
60
+ Sequel::SchemaSharding.sharding_yml_path = File.expand_path('../sharding.yml', __FILE__)
61
+ ```
62
+
63
+ ### Migrations
64
+
65
+ Each table gets its own set of migrations. Under the hood,
66
+ `sequel-schema-sharding` uses Sequel migrations, though migrations are
67
+ run using the `Sequel::SchemaSharding::DatabaseManager` class.
68
+
69
+ For instance, if you have two sharded tables, `:artists` and `:albums`,
70
+ your migration folder would look something like this:
71
+
72
+ ```yml
73
+ - my_project
74
+ - db
75
+ - migrations
76
+ - artists
77
+ - 001_create_artists.rb
78
+ - 002_add_indexes_to_artists.rb
79
+ - albums
80
+ - 001_create_albums.rb
81
+ ```
82
+
83
+ See Sequel documentation for more info:
84
+ * (http://sequel.rubyforge.org/rdoc/files/doc/schema_modification_rdoc.html)
85
+ * (http://sequel.rubyforge.org/rdoc/files/doc/migration_rdoc.html)
86
+
87
+ TODO: rake tasks for running migrations
88
+
89
+ ### Models
90
+
91
+ Models declare their table in the class definition. This allows Sequel
92
+ to load table information from the database when the environment loads.
93
+ This is particularly important for typecasting, so empty strings can be
94
+ typecast to null, etc.
95
+
96
+ The tricky bit is that `sequel-schema-sharding` connects to the first
97
+ available shard for a table in order to read the database schema.
98
+
99
+ ```ruby
100
+ require 'config/sharding'
101
+
102
+ class Artist < Sequel::SchemaSharding::Model('artists')
103
+ set_columns [:id, :name]
104
+ set_sharded_column :id
105
+
106
+ def this
107
+ @this ||= self.class.by_id(id)
108
+ end
109
+
110
+ def self.by_id(id)
111
+ shard_for(id).where(id: id).first
112
+ end
113
+ end
114
+
115
+ class Album < Sequel::SchemaSharding::Model('albums')
116
+ set_columns [:artist_id, :name, :release_date, :created_at]
117
+ set_sharded_column :artist_id
118
+
119
+ def this
120
+ @this ||= self.class.by_artist(artist_id)
121
+ end
122
+
123
+ def self.by_artist(artist_id)
124
+ shard_for(artist_id).where(artist_id: artist_id)
125
+ end
126
+
127
+ def self.by_artist_and_name(artist_id, name)
128
+ shard_for(artist_id).where(name: name, artist_id: artist_id)
129
+ end
130
+ end
131
+ ```
132
+
133
+ Note that logical and physical shards mapped in sharding.yml need to exist
134
+ before you can load models into memory.
135
+
136
+ Read access always starts with the `:shard_for` method, to ensure that
137
+ the correct database connection and shard name is used. Writes will
138
+ automatically choose the correct shard based on the sharded column.
139
+ Never try to insert records with nil values in sharded columns.
140
+
141
+ TODO: explain why we define `this`
142
+
143
+ ## FAQ
144
+
145
+ ### How should I shard my databases?
146
+
147
+ This is entirely dependent on the access patterns of your application. A
148
+ good rule, though, is to look at your indexes. If every query goes
149
+ through an index on `:user_id`, then chances are that you should shard
150
+ on `:user_id`. If half of your queries go through `:user_id` and the
151
+ other half go through `:job_id`, then you may need to create two sets of
152
+ shards, each with its own model, and have your application write to
153
+ both. This requires additional application complexity to keep the two
154
+ sets of shards in sync; it's less complex than doing multi-shard reads to
155
+ keep everything in one model, though.
156
+
157
+ When going into database sharding, an early exercise that is very
158
+ helpful is to analyze application queries and try to reduce the number
159
+ of unique queries. If possible, try to refactor queries such that they
160
+ fit into the smallest number of shard types. For instance, if you find
161
+ Albums by release year, but every action you query from already has the
162
+ `:artist_id`, consider changing your query to find by `:artist_id` and
163
+ release year.
164
+
165
+ ### How should I generate IDs?
166
+
167
+ This is also dependent on your application and your comfort level with
168
+ various technologies, but regardless should be done outside
169
+ of `sequel-schema-sharding`. In general there are three approaches that
170
+ we've considered:
171
+
172
+ * Follow Instagram's approach and let PostgreSQL generate ids. They
173
+ install functions into each shard, to ensure that each shard generates
174
+ unique ids.
175
+
176
+ * Follow Twitter's approach and deploy a separate service for unique id
177
+ generation. Their in-house solution is called Snowflake, and depends
178
+ on maven, finagle and thrift.
179
+
180
+ * Why use ids at all? If you are sharding Like data or something that
181
+ looks similar to a join table, you may not need a unique identifier.
182
+ You are probably sharding on a foreign key to some other table, and
183
+ may not ever access individual Likes by id.
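As an illustration of the first option, the sketch below composes a 64-bit id in Ruby using the bit layout Instagram described publicly: 41 bits of milliseconds since a custom epoch, 13 bits of logical shard id, and 10 bits of per-shard sequence. Instagram does this inside PostgreSQL functions; the Ruby version, the method names, and the epoch value are illustrative assumptions only.

```ruby
# Illustrative only: Instagram generates ids in database functions per schema.
# The epoch below is an arbitrary assumption (2013-01-01 UTC, in milliseconds).
CUSTOM_EPOCH_MS = 1_356_998_400_000
SHARD_BITS      = 13
SEQUENCE_BITS   = 10

# Pack timestamp, logical shard id and sequence number into one integer.
def compose_id(time_ms, shard_id, sequence)
  ((time_ms - CUSTOM_EPOCH_MS) << (SHARD_BITS + SEQUENCE_BITS)) |
    ((shard_id % (1 << SHARD_BITS)) << SEQUENCE_BITS) |
    (sequence % (1 << SEQUENCE_BITS))
end

# Recover the logical shard id embedded in an id.
def shard_of(id)
  (id >> SEQUENCE_BITS) & ((1 << SHARD_BITS) - 1)
end

# Recover the creation time (milliseconds since the Unix epoch).
def time_ms_of(id)
  (id >> (SHARD_BITS + SEQUENCE_BITS)) + CUSTOM_EPOCH_MS
end
```

Because the shard id is embedded in the id itself, a record can be routed to its shard from the id alone, and ids sort roughly by creation time.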
184
+
185
+ ### Should each table get its own set of shards/schemas?
186
+
187
+ In the early days of a project's lifetime, it may seem like less
188
+ management overhead to let multiple tables coexist in each shard.
189
+ Experience with sharding in other technologies (particularly Redis) has
190
+ shown us that in any sharded data store, you will eventually need
191
+ to redistribute shards. More data equals larger storage and RAM
192
+ requirements, and as servers fill up you will find yourself needing to
193
+ move shards onto a greater number of servers. If your project is
194
+ successful, this may come much sooner than you expect in initial
195
+ infrastructure planning meetings.
196
+
197
+ Colocating multiple data sets in individual shards makes shard
198
+ redistribution more complicated and risk-prone. More things break when
199
+ an individual shard goes down. Pages or queries that depend on an
200
+ individual data set will stop working when you take down shards to do
201
+ maintenance on other data sets.
202
+
203
+ Simply put, it's less stressful when doing operational maintenance to
204
+ require twice as many steps that are each easier and less risk-prone.
205
+ So, do whatever you feel is best, but we've chosen to make each shard
206
+ single-purpose in our infrastructure.
207
+
208
+ ### Sequel does sharding. Why another gem?
209
+
210
+ The sharding plugin that ships with Sequel assumes that each shard is a
211
+ separate database. This means that each shard requires a separate
212
+ connection pool, and that each shard includes every table. When
213
+ splitting a database into thousands of shards, this means that each
214
+ application process requires thousands of connections. A proxy such as
215
+ PGBouncer could help reduce the number of connections from an individual
216
+ application server, but even then PGBouncer would need to manage thousands
217
+ of connections.
218
+
219
+ When designing a sharded architecture similar to Instagram's approach
220
+ (http://instagram-engineering.tumblr.com/post/10853187575/sharding-ids-at-instagram),
221
+ it may be desirable to start with thousands or tens of thousands of shards,
222
+ to delay the need for resharding as long as possible. PostgreSQL is able
223
+ to manage tens of thousands of schemas in a single database without
224
+ significant performance problems, so we can design a sharded backend of
225
+ thousands of shards living on a few physical servers. As stored data
226
+ grows, these shards can be moved onto a greater number of servers,
227
+ without the complication of resharding (i.e. changing the number of
228
+ shards while retaining the exact mapping of data into old shards).
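The difference between moving shards and resharding can be sketched in a few lines of Ruby. The data below is hypothetical and only mirrors the `sharding.yml` conventions: moving a logical shard edits the logical-to-physical lookup, while the set of logical shards (and therefore the schema each record lives in) stays the same.

```ruby
# Hypothetical mapping of logical shard number => physical shard name.
BEFORE = { 1 => 'shard1', 2 => 'shard1', 3 => 'shard2', 4 => 'shard2' }

# Move logical shard 2 onto a new physical server. Only the lookup changes;
# the number of logical shards is untouched, so no record changes schema.
AFTER = BEFORE.merge(2 => 'shard3')

# Look up the physical server for a logical shard; raises KeyError if unmapped.
def physical_for(map, logical_shard)
  map.fetch(logical_shard)
end
```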
229
+
230
+ ### Why Sequel?
231
+
232
+ After both good and bad experiences with other Ruby ORMs, Sequel's
233
+ documentation, ease of use and understandable codebase made it a solid
234
+ choice for us. The fact that it already supports horizontal sharding and
235
+ was easy to adapt to our own requirements was a pleasant surprise.
236
+
237
+ ### What the what?? def self.Model; ???
238
+
239
+ Yeah, this threw us for a while, too. The thing is, ORMs in Ruby tend to
240
+ load information like column info, indexes, etc. directly from the
241
+ connected databases, rather than from local schema dictionaries. In
242
+ order to do this, databases need to be created and migrations run BEFORE
243
+ model files can be validly loaded.
244
+
245
+ If the ORM doesn't load this info from somewhere, then it can't
246
+ correctly do things like typecast string HTTP params to integers (or
247
+ nulls).
248
+
249
+ Rather than monkeypatching our way around this requirement in Sequel, we
250
+ ride the wave and just patch in our additions.
251
+
252
+ ### What could go wrong?
253
+
254
+ The thing that you never want to happen is to change the mapping of
255
+ shards to data. For instance, if you change the number of shards without
256
+ migrating data into a new database backend, the algorithm by which
257
+ schemas are chosen will start returning a different mapping for reads than
258
+ that which was used to insert data. New records will go into the new
259
+ mapping, but any attempt to read a record inserted via the old mapping
260
+ will pick the wrong shard and return an empty set. DON'T EVER DO THIS.
261
+ It's really embarrassing.
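The failure mode can be reproduced with a few lines of Ruby. This sketch assumes a simple `CRC32 mod N` mapping from key to logical shard; the gem's actual algorithm is not shown here and may differ, but any mapping that depends on the shard count behaves the same way.

```ruby
require 'zlib'

# Assumed mapping for illustration: CRC32 of the key, modulo the shard count.
# Logical shards are numbered from 1, as in sharding.yml.
def logical_shard_for(key, shard_count)
  (Zlib.crc32(key.to_s) % shard_count) + 1
end

# Keys whose logical shard differs between the old and new shard counts.
def moved_keys(keys, old_count, new_count)
  keys.reject { |k| logical_shard_for(k, old_count) == logical_shard_for(k, new_count) }
end

keys = (1..1000).to_a
MOVED = moved_keys(keys, 4, 5)
puts "#{MOVED.size} of #{keys.size} keys now read from a different shard"
```

Every key in `MOVED` was written to one schema under the old count and would be looked up in another under the new count, which is exactly the empty-result scenario described above.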
262
+
26
263
 
27
264
  ## Contributing
28
265
 
data/examples/db/migrations/albums/001_create_albums.rb ADDED
@@ -0,0 +1,15 @@
1
+ Sequel.migration do
2
+ up do
3
+ create_table :albums do
4
+ Integer :artist_id, null: false
5
+ String :name, null: false
6
+ Date :release_date
7
+ Time :created_at
8
+ end
9
+ end
10
+
11
+ down do
12
+ drop_table :albums
13
+ end
14
+ end
15
+
data/examples/db/migrations/artists/001_create_artists.rb ADDED
@@ -0,0 +1,15 @@
1
+ Sequel.migration do
2
+ up do
3
+ create_table :artists do
4
+ Integer :id, null: false
5
+ String :name, null: false
6
+ Time :created_at
7
+ end
8
+ end
9
+
10
+ down do
11
+ drop_table :artists
12
+ end
13
+ end
14
+
15
+
data/examples/model.rb ADDED
@@ -0,0 +1,19 @@
1
+ require 'config/sharding'
2
+
3
+ class Thing < Sequel::SchemaSharding::Model('things')
4
+ set_columns [:name, :thing1, :thing2]
5
+ set_sharded_column :name
6
+
7
+ # class variables used by Sequel can't easily be set via
8
+ # pretty methods at the moment. They can be quickly overridden,
9
+ # however.
10
+ @require_modification = false
11
+
12
+ def this
13
+ @this ||= self.class.by_name(name)
14
+ end
15
+
16
+ def self.by_name(name)
17
+ shard_for(name).where(name: name)
18
+ end
19
+ end
data/examples/sharding.rb ADDED
@@ -0,0 +1,4 @@
1
+ require 'sequel-schema-sharding'
2
+
3
+ Sequel::SchemaSharding.migration_path = File.expand_path('../db/sharding_migrations', __FILE__)
4
+ Sequel::SchemaSharding.sharding_yml_path = File.expand_path('../sharding.yml', __FILE__)
data/examples/sharding.yml ADDED
@@ -0,0 +1,53 @@
1
+ test:
2
+ tables:
3
+ artists:
4
+ schema_name: artists_%e_%s
5
+ logical_shards:
6
+ 1..2: shard1
7
+ 3..4: shard2
8
+ albums:
9
+ schema_name: albums_%e_%s
10
+ logical_shards:
11
+ 1..2: shard2
12
+ 3..4: shard3
13
+ physical_shards:
14
+ shard1:
15
+ host: 127.0.0.1
16
+ database: my_project_test_shard1
17
+ shard2:
18
+ host: 127.0.0.1
19
+ database: my_project_test_shard2
20
+ shard3:
21
+ host: 127.0.0.1
22
+ database: my_project_test_shard3
23
+ common:
24
+ username: postgres
25
+ password:
26
+ port: 5432
27
+ development:
28
+ tables:
29
+ artists:
30
+ schema_name: artists_%e_%s
31
+ logical_shards:
32
+ 1..2: shard1
33
+ 3..4: shard2
34
+ albums:
35
+ schema_name: albums_%e_%s
36
+ logical_shards:
37
+ 1..2: shard2
38
+ 3..4: shard3
39
+ physical_shards:
40
+ shard1:
41
+ host: 127.0.0.1
42
+ database: my_project_development_shard1
43
+ shard2:
44
+ host: 127.0.0.1
45
+ database: my_project_development_shard2
46
+ shard3:
47
+ host: 127.0.0.1
48
+ database: my_project_development_shard3
49
+ common:
50
+ username: postgres
51
+ password:
52
+ port: 5432
53
+
@@ -1,5 +1,5 @@
1
1
  module Sequel
2
2
  module SchemaSharding
3
- VERSION = "0.0.1"
3
+ VERSION = "0.0.2"
4
4
  end
5
5
  end
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: sequel-schema-sharding
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.0.1
4
+ version: 0.0.2
5
5
  platform: ruby
6
6
  authors:
7
7
  - Paul Henry
@@ -10,7 +10,7 @@ authors:
10
10
  autorequire:
11
11
  bindir: bin
12
12
  cert_chain: []
13
- date: 2013-09-06 00:00:00.000000000 Z
13
+ date: 2013-09-08 00:00:00.000000000 Z
14
14
  dependencies:
15
15
  - !ruby/object:Gem::Dependency
16
16
  name: sequel
@@ -78,10 +78,16 @@ files:
78
78
  - .gitignore
79
79
  - .rspec
80
80
  - .travis.yml
81
+ - CONTRIBUTORS.md
81
82
  - Gemfile
82
83
  - LICENSE.txt
83
84
  - README.md
84
85
  - Rakefile
86
+ - examples/db/migrations/albums/001_create_albums.rb
87
+ - examples/db/migrations/artists/001_create_artists.rb
88
+ - examples/model.rb
89
+ - examples/sharding.rb
90
+ - examples/sharding.yml
85
91
  - lib/sequel-schema-sharding.rb
86
92
  - lib/sequel/schema-sharding.rb
87
93
  - lib/sequel/schema-sharding/configuration.rb
@@ -125,7 +131,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
125
131
  version: '0'
126
132
  requirements: []
127
133
  rubyforge_project:
128
- rubygems_version: 2.0.7
134
+ rubygems_version: 2.0.3
129
135
  signing_key:
130
136
  specification_version: 4
131
137
  summary: Create horizontally sharded Sequel models with Postgres
@@ -141,3 +147,4 @@ test_files:
141
147
  - spec/schema-sharding/ring_spec.rb
142
148
  - spec/spec_helper.rb
143
149
  - spec/support/database_helper.rb
150
+ has_rdoc: