sequel-schema-sharding 0.0.1 → 0.0.2

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: 9a83484015fdcd8dc4799e752d6d82ee94498738
4
- data.tar.gz: 2e44e0d65a6c598371035daddc1724022b3c893b
3
+ metadata.gz: 59427115172509c05a9c50ff6b52f6bf13b3bae7
4
+ data.tar.gz: 82f16ec31818221234a0873abe6663395137e0ba
5
5
  SHA512:
6
- metadata.gz: bac9d7a0ed39dbe558e2f0bd6422bda34f68676662ba5283722340c1ffbf784f6e8c433ca6b118efe41c8a0e66ed123c8bfd2f482597ace1ab750a2b7d2fcbd1
7
- data.tar.gz: 724cae47d487ee6e63a8e969ae8ed9f1d380ffaf61b3877074578a9704565fd0290ffd27550904afc9ad3949aba536b4fa69d5209670a39b13f21ae2e0fd874c
6
+ metadata.gz: 8620ce75eb60600a686dbd2c3ef53bf5d11d604963e6315e7d467c046578a6c7d76fe408cba8a001925c40ef04ec6576e1c697b4aad821dcc5ffbe3b0a5ebec9
7
+ data.tar.gz: cd8c89a4f8a40afd9c062aeed3423638db1f7441eb8c9bb967a7a470a134d2fa583d8a53782f6bb526d2cbdfe03570926f5cf36447b85106938d134ee3970f21
data/CONTRIBUTORS.md ADDED
@@ -0,0 +1,6 @@
1
+ Contributors
2
+ ============
3
+
4
+ * James Hart (james@wanelo.com)
5
+ * Paul Henry (paul@wanelo.com)
6
+ * Eric Saxby (sax@wanelo.com)
data/LICENSE.txt CHANGED
@@ -1,4 +1,4 @@
1
- Copyright (c) 2013 TODO: Write your name
1
+ Copyright (c) 2013 Wanelo, Inc
2
2
 
3
3
  MIT License
4
4
 
data/README.md CHANGED
@@ -1,28 +1,265 @@
1
- Sequel::SchemaSharding
2
- ================
1
+ sequel-schema-sharding
2
+ ======================
3
3
 
4
- [![Build Status](https://travis-ci.org/wanelo/sequel-sharding.png?branch=master)](https://travis-ci.org/wanelo/sequel-sharding)
4
+ [![Build Status](https://travis-ci.org/wanelo/sequel-schema-sharding.png?branch=master)](https://travis-ci.org/wanelo/sequel-schema-sharding)
5
+
6
+ Horizontally shard PostgreSQL tables with the Sequel gem, where each shard
7
+ lives in its own PostgreSQL schema.
8
+
9
+ This gem allows you to configure mappings between logical and physical shards, pooling
10
+ connections between logical shards on the same physical server.
5
11
 
6
- Horizontally shard postgres with the Sequel gem. This gem allows you to configure mappings between logical and
7
- physical shards, and pool connections between logical shards on the same physical server.
8
12
 
9
13
  ## Installation
10
14
 
11
15
  Add this line to your application's Gemfile:
12
16
 
13
- gem 'sequel-sharding'
17
+ gem 'sequel-schema-sharding'
14
18
 
15
19
  And then execute:
16
20
 
17
21
  $ bundle
18
22
 
19
- Or install it yourself as:
20
-
21
- $ gem install sequel-sharding
22
23
 
23
24
  ## Usage
24
25
 
25
- TODO :)
26
+ See the `examples` directory for example files.
27
+
28
+ ### Configuration
29
+
30
+ Create a sharding configuration file in your project, for instance at
31
+ `config/sharding.yml`. The format should match the following
32
+ conventions:
33
+
34
+ ```yml
35
+ <env>:
36
+   tables:
37
+     <table_name>:
38
+       schema_name: "schema_%e_%s"
39
+       logical_shards:
40
+         <1..n>: <shard_name>
41
+   physical_shards:
42
+     <shard_name>:
43
+       host: <hostname>
44
+       database: <database>
45
+   common:
46
+     username: <pg_username>
47
+     password: <pg_password>
48
+     port: <pg_port>
49
+ ```
50
+
51
+ Tables can coexist in schemas, though they do not have to.
52
+
53
+ In your project, configure `sequel-schema-sharding` in a ruby file that
54
+ gets loaded before your models, for instance at `config/sharding.rb`.
55
+
56
+ ```ruby
57
+ require 'sequel-schema-sharding'
58
+
59
+ Sequel::SchemaSharding.migration_path = File.expand_path('../../db/sharding_migrations', __FILE__)
60
+ Sequel::SchemaSharding.sharding_yml_path = File.expand_path('../sharding.yml', __FILE__)
61
+ ```
62
+
63
+ ### Migrations
64
+
65
+ Each table gets its own set of migrations. Under the hood,
66
+ `sequel-schema-sharding` uses Sequel migrations, but the migrations are
67
+ run using the `Sequel::SchemaSharding::DatabaseManager` class.
68
+
69
+ For instance, if you have two sharded tables, `:artists` and `:albums`,
70
+ your migration folder would look something like this:
71
+
72
+ ```text
73
+ - my_project
74
+   - db
75
+     - sharding_migrations
76
+       - artists
77
+         - 001_create_artists.rb
78
+         - 002_add_indexes_to_artists.rb
79
+       - albums
80
+         - 001_create_albums.rb
81
+ ```
82
+
83
+ See the Sequel documentation for more info:
84
+ * Schema modification: http://sequel.rubyforge.org/rdoc/files/doc/schema_modification_rdoc.html
85
+ * Migrations: http://sequel.rubyforge.org/rdoc/files/doc/migration_rdoc.html
86
+
87
+ TODO: rake tasks for running migrations
88
+
89
+ ### Models
90
+
91
+ Models declare their table in the class definition. This allows Sequel
92
+ to load table information from the database when the environment loads.
93
+ This is particularly important for typecasting, so empty strings can be
94
+ typecast to null, etc.
95
+
96
+ The tricky bit is that `sequel-schema-sharding` connects to the first
97
+ available shard for a table in order to read the database schema.
98
+
99
+ ```ruby
100
+ require 'config/sharding'
101
+
102
+ class Artist < Sequel::SchemaSharding::Model('artists')
103
+   set_columns [:id, :name]
104
+   set_sharded_column :id
105
+
106
+   def this
107
+     @this ||= self.class.shard_for(id).where(id: id)
108
+   end
109
+
110
+   def self.by_id(id)
111
+     shard_for(id).where(id: id).first
112
+   end
113
+ end
114
+
115
+ class Album < Sequel::SchemaSharding::Model('albums')
116
+   set_columns [:artist_id, :name, :release_date, :created_at]
117
+   set_sharded_column :artist_id
118
+
119
+   def this
120
+     @this ||= self.class.by_artist_and_name(artist_id, name)
121
+   end
122
+
123
+   def self.by_artist(artist_id)
124
+     shard_for(artist_id).where(artist_id: artist_id)
125
+   end
126
+
127
+   def self.by_artist_and_name(artist_id, name)
128
+     shard_for(artist_id).where(name: name, artist_id: artist_id)
129
+   end
130
+ end
131
+ ```
132
+
133
+ Note that logical and physical shards mapped in sharding.yml need to exist
134
+ before you can load models into memory.
135
+
136
+ Read access always starts with the `:shard_for` method, to ensure that
137
+ the correct database connection and shard name are used. Writes will
138
+ automatically choose the correct shard based on the sharded column.
139
+ Never try to insert records with nil values in sharded columns.
140
+
141
+ TODO: explain why we define `this`
142
+
143
+ ## FAQ
144
+
145
+ ### How should I shard my databases?
146
+
147
+ This is entirely dependent on the access patterns of your application. A
148
+ good rule, though, is to look at your indexes. If every query goes
149
+ through an index on `:user_id`, then chances are that you should shard
150
+ on `:user_id`. If half of your queries go through `:user_id` and the
151
+ other half go through `:job_id`, then you may need to create two sets of
152
+ shards, each with its own model, and have your application write to
153
+ both. This requires additional application complexity to keep the two
154
+ sets of shards in sync, but that is less complex than doing multi-shard
155
+ reads to keep everything in one model.
156
+
157
+ When going into database sharding, an early exercise that is very
158
+ helpful is to analyze application queries and try to reduce the number
159
+ of unique queries. If possible, try to refactor queries such that they
160
+ fit into the smallest number of shard types. For instance, if you find
161
+ Albums by release year, but every action you query from already has the
162
+ `:artist_id`, consider changing your query to find by `:artist_id` and
163
+ release year.
164
+
165
+ ### How should I generate IDs?
166
+
167
+ This is also dependent on your application and your comfort level with
168
+ various technologies, but regardless should be done outside
169
+ of `sequel-schema-sharding`. In general there are three approaches that
170
+ we've considered:
171
+
172
+ * Follow Instagram's approach and let PostgreSQL generate ids. They
173
+ install functions into each shard so that ids generated on different
174
+ shards never collide.
175
+
176
+ * Follow Twitter's approach and deploy a separate service for unique id
177
+ generation. Their in-house solution is called Snowflake, and depends
178
+ on Maven, Finagle and Thrift.
179
+
180
+ * Why use ids at all? If you are sharding Like data or something that
181
+ looks similar to a join table, you may not need a unique identifier.
182
+ You are probably sharding on a foreign key to some other table, and
183
+ may not ever access individual Likes by id.
184
+
185
+ ### Should each table get its own set of shards/schemas?
186
+
187
+ In the early days of a project's lifetime, it may seem like less
188
+ management overhead to let multiple tables coexist in each shard.
189
+ Experience with sharding in other technologies (particularly Redis) has
190
+ shown us that in any sharded data store, you will eventually need
191
+ to redistribute shards. More data equals larger storage and RAM
192
+ requirements, and as servers fill up you will find yourself needing to
193
+ move shards onto a greater number of servers. If your project is
194
+ successful, this may come much sooner than you expect in initial
195
+ infrastructure planning meetings.
196
+
197
+ Colocating multiple data sets in individual shards makes shard
198
+ redistribution more complicated and risk-prone. More things break when
199
+ an individual shard goes down. Pages or queries that depend on an
200
+ individual data set will stop working when you take down shards to do
201
+ maintenance on other data sets.
202
+
203
+ Simply put, it's less stressful when doing operational maintenance to
204
+ require twice as many steps that are each easier and less risk-prone.
205
+ So, do whatever you feel is best, but we've chosen to make each shard
206
+ single-purpose in our infrastructure.
207
+
208
+ ### Sequel does sharding. Why another gem?
209
+
210
+ The sharding plugin that ships with Sequel assumes that each shard is a
211
+ separate database. This means that each shard requires a separate
212
+ connection pool, and that each shard includes every table. When
213
+ splitting a database into thousands of shards, this means that each
214
+ application process requires thousands of connections. A proxy such as
215
+ PgBouncer could help reduce the number of connections from an individual
216
+ application server, but even then PgBouncer would need to manage thousands
217
+ of connections.
218
+
219
+ When designing a sharded architecture similar to Instagram's approach
220
+ (http://instagram-engineering.tumblr.com/post/10853187575/sharding-ids-at-instagram),
221
+ it may be desirable to start with thousands or tens of thousands of shards,
222
+ to delay the need for resharding as long as possible. PostgreSQL is able
223
+ to manage tens of thousands of schemas in a single database without
224
+ significant performance problems, so we can design a sharded backend of
225
+ thousands of shards living on a few physical servers. As stored data
226
+ grows, these shards can be moved onto a greater number of servers,
227
+ without the complication of resharding (i.e. changing the number of
228
+ shards while retaining the exact mapping of data into old shards).
229
+
230
+ ### Why Sequel?
231
+
232
+ After both good and bad experiences with other Ruby ORMs, Sequel's
233
+ documentation, ease of use and understandable codebase made it a solid
234
+ choice for us. The fact that it already supports horizontal sharding and
235
+ was easy to adapt to our own requirements was a pleasant surprise.
236
+
237
+ ### What the what?? def self.Model; ???
238
+
239
+ Yeah, this threw us for a while, too. The thing is, ORMs in Ruby tend to
240
+ load information like column info, indexes, etc. directly from the
241
+ connected databases, rather than from local schema dictionaries. In
242
+ order to do this, databases need to be created and migrations run BEFORE
243
+ model files can be validly loaded.
244
+
245
+ If the ORM doesn't load this info from somewhere, then it can't
246
+ correctly do things like typecast string HTTP params to integers (or
247
+ nulls).
248
+
249
+ Rather than monkeypatching our way around this requirement in Sequel, we
250
+ ride the wave and just patch in our additions.
251
+
252
+ ### What could go wrong?
253
+
254
+ The thing that you never want to happen is to change the mapping of
255
+ shards to data. For instance, if you change the number of shards without
256
+ migrating data into a new database backend, the algorithm by which
257
+ schemas are chosen will start returning a different mapping for reads than
258
+ that which was used to insert data. New records will go into the new
259
+ mapping, but any attempt to read a record inserted via the old mapping
260
+ will pick the wrong shard and return an empty set. DON'T EVER DO THIS.
261
+ It's really embarrassing.
262
+
26
263
 
27
264
  ## Contributing
28
265
 
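The README's "What could go wrong?" and "Sequel does sharding. Why another gem?" answers depend on the difference between changing the number of logical shards (which remaps data) and moving logical shards onto other physical servers (which does not). Below is a minimal Ruby sketch of that difference. It assumes a plain `id % shard_count` mapping; `logical_shard_for` is a hypothetical helper, and the gem's own ring (see `spec/schema-sharding/ring_spec.rb`) may use a different function.

```ruby
# Hypothetical id -> logical shard function (1-based). Plain modulo is an
# assumption made for illustration; the gem's own ring may differ.
def logical_shard_for(id, shard_count)
  (id % shard_count) + 1
end

ids = (1..1000).to_a

# Changing the number of logical shards sends most ids to a different
# schema than the one they were originally written to.
moved = ids.count { |id| logical_shard_for(id, 4) != logical_shard_for(id, 5) }
puts "#{moved} of #{ids.size} ids change logical shard"
# 800 of 1000 ids change logical shard

# Moving logical shards onto more physical servers only edits the
# logical -> physical table; every id keeps its logical shard.
before_move = { 1 => 'shard1', 2 => 'shard1', 3 => 'shard2', 4 => 'shard2' }
after_move  = { 1 => 'shard1', 2 => 'shard3', 3 => 'shard2', 4 => 'shard4' }

logical = logical_shard_for(41, 4)
puts "id 41 -> logical shard #{logical}: #{before_move[logical]} then #{after_move[logical]}"
# id 41 -> logical shard 2: shard1 then shard3
```

The data for logical shard 2 has to be copied from `shard1` to `shard3` before the lookup changes, but no row ever needs a new logical shard, which is why starting with many logical shards postpones resharding.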
data/examples/db/migrations/albums/001_create_albums.rb ADDED
@@ -0,0 +1,15 @@
1
+ Sequel.migration do
2
+   up do
3
+     create_table :albums do
4
+       Integer :artist_id, null: false
5
+       String :name, null: false
6
+       Date :release_date
7
+       Time :created_at
8
+     end
9
+   end
10
+
11
+   down do
12
+     drop_table :albums
13
+   end
14
+ end
15
+
data/examples/db/migrations/artists/001_create_artists.rb ADDED
@@ -0,0 +1,15 @@
1
+ Sequel.migration do
2
+   up do
3
+     create_table :artists do
4
+       Integer :id, null: false
5
+       String :name, null: false
6
+       Time :created_at
7
+     end
8
+   end
9
+
10
+   down do
11
+     drop_table :artists
12
+   end
13
+ end
14
+
15
+
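The `artists.id` column above is a plain `Integer` with no default, in line with the README's advice that ids be generated outside `sequel-schema-sharding`. As a rough sketch of the Instagram-style option the README mentions, the Ruby below packs a 64-bit id from 41 bits of milliseconds since a custom epoch, 13 bits of shard id and 10 bits of per-shard sequence, following Instagram's published layout. Instagram computes this in a PL/pgSQL function inside each shard; doing it in Ruby, the `CUSTOM_EPOCH_MS` value and the helper names are assumptions for illustration only.

```ruby
# Bit layout modeled on Instagram's published scheme (an assumption here):
# [41 bits ms since custom epoch][13 bits shard id][10 bits sequence]
CUSTOM_EPOCH_MS = 1_378_000_000_000 # hypothetical, arbitrary epoch
SHARD_BITS      = 13
SEQUENCE_BITS   = 10

# Hypothetical helper: combine time, shard and sequence into one integer.
def compose_id(time_ms, shard_id, sequence)
  elapsed = time_ms - CUSTOM_EPOCH_MS
  (elapsed << (SHARD_BITS + SEQUENCE_BITS)) |
    ((shard_id % (1 << SHARD_BITS)) << SEQUENCE_BITS) |
    (sequence % (1 << SEQUENCE_BITS))
end

# Hypothetical helper: recover the three fields from an id.
def decompose_id(id)
  {
    time_ms:  (id >> (SHARD_BITS + SEQUENCE_BITS)) + CUSTOM_EPOCH_MS,
    shard_id: (id >> SEQUENCE_BITS) & ((1 << SHARD_BITS) - 1),
    sequence: id & ((1 << SEQUENCE_BITS) - 1)
  }
end

now_ms = (Time.now.to_f * 1000).to_i
id = compose_id(now_ms, 3, 17)
p decompose_id(id)
```

Because the timestamp occupies the high bits, ids sort roughly by creation time, and embedding a shard id keeps ids from different shards distinct without coordination.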
data/examples/model.rb ADDED
@@ -0,0 +1,19 @@
1
+ require 'config/sharding'
2
+
3
+ class Thing < Sequel::SchemaSharding::Model('things')
4
+   set_columns [:name, :thing1, :thing2]
5
+   set_sharded_column :name
6
+
7
+   # class-level instance variables used by Sequel can't easily be set via
8
+   # pretty methods at the moment. They can be quickly overridden,
9
+   # however.
10
+   @require_modification = false
11
+
12
+   def this
13
+     @this ||= self.class.by_name(name)
14
+   end
15
+
16
+   def self.by_name(name)
17
+     shard_for(name).where(name: name)
18
+   end
19
+ end
data/examples/sharding.rb ADDED
@@ -0,0 +1,4 @@
1
+ require 'sequel-schema-sharding'
2
+
3
+ Sequel::SchemaSharding.migration_path = File.expand_path('../db/migrations', __FILE__)
4
+ Sequel::SchemaSharding.sharding_yml_path = File.expand_path('../sharding.yml', __FILE__)
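The example `sharding.rb` and the README's `config/sharding.rb` use a different number of `..` segments because `File.expand_path('../x', __FILE__)` treats the file name itself as the first segment to step out of. Per the gemspec file list, the example migrations ship under `examples/db/migrations`. A small sketch, with hypothetical absolute paths standing in for `__FILE__`:

```ruby
# The leading '..' cancels the file name, so '../x' resolves next to the file
# and '../../x' resolves one directory above it. Paths are hypothetical.
example_file = '/srv/my_project/examples/sharding.rb'
config_file  = '/srv/my_project/config/sharding.rb'

puts File.expand_path('../db/migrations', example_file)
# /srv/my_project/examples/db/migrations
puts File.expand_path('../sharding.yml', example_file)
# /srv/my_project/examples/sharding.yml
puts File.expand_path('../../db/sharding_migrations', config_file)
# /srv/my_project/db/sharding_migrations
```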
data/examples/sharding.yml ADDED
@@ -0,0 +1,53 @@
1
+ test:
2
+   tables:
3
+     artists:
4
+       schema_name: artists_%e_%s
5
+       logical_shards:
6
+         1..2: shard1
7
+         3..4: shard2
8
+     albums:
9
+       schema_name: albums_%e_%s
10
+       logical_shards:
11
+         1..2: shard2
12
+         3..4: shard3
13
+   physical_shards:
14
+     shard1:
15
+       host: 127.0.0.1
16
+       database: my_project_test_shard1
17
+     shard2:
18
+       host: 127.0.0.1
19
+       database: my_project_test_shard2
20
+     shard3:
21
+       host: 127.0.0.1
22
+       database: my_project_test_shard3
23
+   common:
24
+     username: postgres
25
+     password:
26
+     port: 5432
27
+ development:
28
+   tables:
29
+     artists:
30
+       schema_name: artists_%e_%s
31
+       logical_shards:
32
+         1..2: shard1
33
+         3..4: shard2
34
+     albums:
35
+       schema_name: albums_%e_%s
36
+       logical_shards:
37
+         1..2: shard2
38
+         3..4: shard3
39
+   physical_shards:
40
+     shard1:
41
+       host: 127.0.0.1
42
+       database: my_project_development_shard1
43
+     shard2:
44
+       host: 127.0.0.1
45
+       database: my_project_development_shard2
46
+     shard3:
47
+       host: 127.0.0.1
48
+       database: my_project_development_shard3
49
+   common:
50
+     username: postgres
51
+     password:
52
+     port: 5432
53
+
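In `sharding.yml`, each `logical_shards` entry maps a range of logical shard numbers to a named physical shard, and `schema_name` is a template. A minimal sketch of how such a file could be expanded into a lookup table, using Ruby's standard `yaml` library on a trimmed copy of the `test` environment. The helpers are hypothetical and not the gem's `Configuration` API, and reading `%e` as the environment and `%s` as the logical shard number is an assumption based on the placeholder names. Because `%e` is a float conversion in `Kernel#format`, the sketch substitutes with `gsub`.

```ruby
require 'yaml'

# A trimmed copy of the test environment from examples/sharding.yml.
SHARDING_CONFIG = YAML.safe_load(<<~YML)
  test:
    tables:
      artists:
        schema_name: artists_%e_%s
        logical_shards:
          1..2: shard1
          3..4: shard2
    physical_shards:
      shard1:
        host: 127.0.0.1
        database: my_project_test_shard1
      shard2:
        host: 127.0.0.1
        database: my_project_test_shard2
YML

# Hypothetical helper: expand "1..2: shard1" style entries into a
# logical shard number -> physical shard name lookup.
def logical_to_physical(env_config, table)
  env_config['tables'][table]['logical_shards'].each_with_object({}) do |(range, physical), lookup|
    first, last = range.to_s.split('..').map { |n| Integer(n) }
    (first..last).each { |n| lookup[n] = physical }
  end
end

# Hypothetical helper: fill the schema_name template, assuming %e is the
# environment and %s is the logical shard number.
def schema_name_for(env, env_config, table, shard)
  env_config['tables'][table]['schema_name'].gsub('%e', env).gsub('%s', shard.to_s)
end

lookup = logical_to_physical(SHARDING_CONFIG['test'], 'artists')
puts schema_name_for('test', SHARDING_CONFIG['test'], 'artists', 3) # artists_test_3
puts lookup[3]                                                      # shard2

# Logical shards that share a physical shard can share one connection pool.
p lookup.values.uniq # ["shard1", "shard2"]
```

Here four logical shards resolve to two physical shards, so a process following this layout would need two connection pools rather than four, which is the pooling behavior the README describes.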
@@ -1,5 +1,5 @@
1
1
  module Sequel
2
2
  module SchemaSharding
3
- VERSION = "0.0.1"
3
+ VERSION = "0.0.2"
4
4
  end
5
5
  end
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: sequel-schema-sharding
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.0.1
4
+ version: 0.0.2
5
5
  platform: ruby
6
6
  authors:
7
7
  - Paul Henry
@@ -10,7 +10,7 @@ authors:
10
10
  autorequire:
11
11
  bindir: bin
12
12
  cert_chain: []
13
- date: 2013-09-06 00:00:00.000000000 Z
13
+ date: 2013-09-08 00:00:00.000000000 Z
14
14
  dependencies:
15
15
  - !ruby/object:Gem::Dependency
16
16
  name: sequel
@@ -78,10 +78,16 @@ files:
78
78
  - .gitignore
79
79
  - .rspec
80
80
  - .travis.yml
81
+ - CONTRIBUTORS.md
81
82
  - Gemfile
82
83
  - LICENSE.txt
83
84
  - README.md
84
85
  - Rakefile
86
+ - examples/db/migrations/albums/001_create_albums.rb
87
+ - examples/db/migrations/artists/001_create_artists.rb
88
+ - examples/model.rb
89
+ - examples/sharding.rb
90
+ - examples/sharding.yml
85
91
  - lib/sequel-schema-sharding.rb
86
92
  - lib/sequel/schema-sharding.rb
87
93
  - lib/sequel/schema-sharding/configuration.rb
@@ -125,7 +131,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
125
131
  version: '0'
126
132
  requirements: []
127
133
  rubyforge_project:
128
- rubygems_version: 2.0.7
134
+ rubygems_version: 2.0.3
129
135
  signing_key:
130
136
  specification_version: 4
131
137
  summary: Create horizontally sharded Sequel models with Postgres
@@ -141,3 +147,4 @@ test_files:
141
147
  - spec/schema-sharding/ring_spec.rb
142
148
  - spec/spec_helper.rb
143
149
  - spec/support/database_helper.rb
150
+ has_rdoc: