fury_dumper 0.1.0
- checksums.yaml +7 -0
- data/.github/workflows/build.yaml +43 -0
- data/.gitignore +16 -0
- data/.rspec +3 -0
- data/.rubocop.yml +67 -0
- data/.ruby-version +1 -0
- data/.travis.yml +6 -0
- data/Breadth-first.png +0 -0
- data/CODE_OF_CONDUCT.md +74 -0
- data/Depth-first.png +0 -0
- data/Gemfile +10 -0
- data/Gemfile.lock +202 -0
- data/README.md +383 -0
- data/README.ru.md +382 -0
- data/Rakefile +8 -0
- data/app/controllers/fury_dumper/dump_process_controller.rb +25 -0
- data/config/routes.rb +6 -0
- data/fury_dumper.gemspec +37 -0
- data/lib/fury_dumper/api.rb +82 -0
- data/lib/fury_dumper/config.rb +113 -0
- data/lib/fury_dumper/dumper.rb +632 -0
- data/lib/fury_dumper/dumpers/dump_state.rb +61 -0
- data/lib/fury_dumper/dumpers/model.rb +131 -0
- data/lib/fury_dumper/dumpers/model_queue.rb +34 -0
- data/lib/fury_dumper/dumpers/relation_items.rb +79 -0
- data/lib/fury_dumper/encrypter.rb +17 -0
- data/lib/fury_dumper/engine.rb +7 -0
- data/lib/fury_dumper/version.rb +5 -0
- data/lib/fury_dumper.rb +102 -0
- data/lib/generators/fury_dumper/config_generator.rb +21 -0
- data/rails_generators/fury_dumper_config/fury_dumper_config_generator.rb +10 -0
- data/rails_generators/fury_dumper_config/templates/fury_dumper.rb +1 -0
- data/rails_generators/fury_dumper_config/templates/fury_dumper.yml +44 -0
- metadata +181 -0
data/README.md
ADDED
@@ -0,0 +1,383 @@
|
|
1
|
+
# FuryDumper 🧙
|
2
|
+
|
3
|
+
Welcome to dumper gem! In this directory, you'll find the files you need to be able to package up your Ruby library into a gem. Put your Ruby code in the file `lib/fury_dumper`.
|
4
|
+
|
5
|
+
It helps you dump data from the remote database of the main service and of other microservices that have the `fury_dumper` gem installed.
|
6
|
+
|
7
|
+
*Read this in other languages: [Russian](README.ru.md).*
|
8
|
+
|
9
|
+
*For developers: [English](README.md#dev-documentation), [Russian](README.ru.md#документация-для-разработчиков).*
|
10
|
+
|
11
|
+
## Installation
|
12
|
+
|
13
|
+
Add this line to your application's Gemfile:
|
14
|
+
|
15
|
+
```ruby
|
16
|
+
gem 'fury_dumper'
|
17
|
+
```
|
18
|
+
|
19
|
+
And then execute:
|
20
|
+
|
21
|
+
bundle install
|
22
|
+
|
23
|
+
Or install it yourself as:
|
24
|
+
|
25
|
+
gem install fury_dumper
|
26
|
+
|
27
|
+
## Usage
|
28
|
+
|
29
|
+
### Configuration
|
30
|
+
|
31
|
+
Create the default configuration:
|
32
|
+
|
33
|
+
bundle exec rails generate fury_dumper:config
|
34
|
+
|
35
|
+
To work correctly with other services, edit the default `fury_dumper.yml` config. Its structure is described below:
|
36
|
+
|
37
|
+
```yaml
# The size of the batch at the first iteration
#
# Optional; default 100
batch_size: 100

# The ratio of the number of records (fetching_records) loaded from the database to the size of the batch
# Formula: fetching_records = ratio_records_batches * batch_size
# fetching_records acts as the LIMIT value for SQL queries
#
# Optional; default 10
ratio_records_batches: 10

# Traversal mode of the relation graph: in width (:wide) or in depth (:depth)
#
# Optional; default wide
mode: wide

# Relations that will be excluded from the dump,
# useful for speeding up dumps and for skipping
# unneeded data:
# <class name>.<association name>
#
# Optional; default is empty array
exclude_relations: User.friends, Post.author

# By default, data is dumped quickly (without sorting)
# fast mode allows you to dump records sorted by primary key (false) or not (true),
# useful for creating dumps for developers.
#
# Optional; default is true
fast: true

# List of microservice connections
#
# Optional
relative_services:
  # Microservice name
  #
  # Optional
  post_service:
    # Name of the remote database for this microservice (post_service)
    # from which data will be dumped
    #
    # Required
    database: 'post_service_development_dump'

    # Host for remote database
    #
    # Required
    host: 'localhost'

    # Port for remote database
    #
    # Required
    port: '5432'

    # Username for remote database
    #
    # Required
    user: 'user'

    # Password for remote database
    #
    # Required
    password: 'password'

    # A list of tables associated with this microservice (post_service)
    #
    # Required
    tables:
      # Table name in the current service
      #
      # Required
      users:

        # Table name in microservice (post_service)
        #
        # Required
        users:
          # Column name in the table of this service (users)
          #
          # Required
          self_field_name: 'id'

          # Model name in microservice (post_service)
          #
          # Required
          ms_model_name: 'User'

          # Name of the column in the microservice table (users)
          #
          # Required
          ms_field_name: 'root_user_id'
      root_posts:
        posts:
          self_field_name: 'id'
          ms_model_name: 'Post'
          ms_field_name: 'root_post_id'
  logs_service:
    database: 'logs_service_development_dump'
    host: 'localhost'
    port: '5432'
    user: 'user'
    password: 'password'
    tables:
      users:
        logs:
          self_field_name: "log::json->>'id'"
          ms_model_name: 'Log'
          ms_field_name: 'id'
```
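As a rough illustration of how the top-level options fit together, the sketch below merges a YAML document over the documented defaults and applies the `fetching_records` formula. The helper names (`load_dumper_config`, `fetching_records`, `excluded_relations`) are hypothetical and are not part of the gem's API; treating `exclude_relations` as a comma-separated string is an assumption based on the sample above.

```ruby
require 'yaml'

# Defaults as documented in the config comments above.
DUMPER_DEFAULTS = {
  'batch_size' => 100,
  'ratio_records_batches' => 10,
  'mode' => 'wide',
  'fast' => true,
  'exclude_relations' => ''
}.freeze

# Hypothetical helper: merge a YAML document over the documented defaults.
def load_dumper_config(yaml_text)
  DUMPER_DEFAULTS.merge(YAML.safe_load(yaml_text) || {})
end

# fetching_records = ratio_records_batches * batch_size
def fetching_records(config)
  config['ratio_records_batches'] * config['batch_size']
end

# Assumption: exclude_relations is a comma-separated string, as in the sample.
def excluded_relations(config)
  config['exclude_relations'].to_s.split(',').map(&:strip)
end

config = load_dumper_config(<<~YAML)
  batch_size: 50
  exclude_relations: User.friends, Post.author
YAML

puts fetching_records(config)            # 10 * 50 = 500
puts excluded_relations(config).inspect  # ["User.friends", "Post.author"]
```

With the default values (`batch_size: 100`, `ratio_records_batches: 10`) the same formula gives 1000, which matches the `LIMIT 1000` in the SQL shown in the Fast mode section.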
|
149
|
+
|
150
|
+
### Routing for microservices
|
151
|
+
|
152
|
+
Add this line to your `config/routes.rb` so that other services can dump your database:
|
153
|
+
|
154
|
+
```ruby
|
155
|
+
mount FuryDumper::Engine => "fury_dumper" unless Rails.env.production?
|
156
|
+
```
|
157
|
+
|
158
|
+
### Main call
|
159
|
+
|
160
|
+
**⚠️ ⚠️ ⚠️ Attention! If copied data conflicts with data already present in the target database, the data from the remote database takes priority (the existing records will be overwritten)! ⚠️ ⚠️ ⚠️**
|
161
|
+
|
162
|
+
To start dumping from production or staging, run this command:
|
163
|
+
|
164
|
+
```ruby
FuryDumper.dump(password: 'password',
                host: 'localhost',
                port: '5632',
                user: 'username',
                model_name: 'User',
                field_name: 'token',
                field_values: ['99999999-8888-4444-1212-111111111111'],
                database: 'staging',
                debug_mode: :short)
```
|
175
|
+
|
176
|
+
To connect to the remote host, open an SSH tunnel first. Example for the main service on staging:
```bash
ssh -NL <port>:<host>:<hostport> username@<host>
```
|
180
|
+
|
181
|
+
Description for arguments:
|
182
|
+
|
183
|
+
| Argument | Description |
| --- | --- |
| host | host of the remote DB |
| port | port of the remote DB |
| user | username for the remote DB |
| password | password for the remote DB |
| database | name of the remote DB |
| model_name | name of the model to dump |
| field_name | name of the model field to filter by |
| field_values | values of field_name |
| debug_mode | debug mode (full: print all messages, short: print some messages, none: print nothing) |
| ask | ask the user for confirmation when the schemas of the target and remote DB differ |
|
195
|
+
|
196
|
+
|
197
|
+
### Examples
|
198
|
+
|
199
|
+
In these examples you do not need to change the `fury_dumper.yml` config; use the [default](README.ru.md#configs).
|
200
|
+
|
201
|
+
User dump by admin_token:
|
202
|
+
```ruby
FuryDumper.dump(password: 'password',
                host: 'localhost',
                port: '5632',
                user: 'username',
                model_name: 'User',
                field_name: 'admin_token',
                field_values: [admin_token_value],
                database: 'staging',
                debug_mode: :short)
```
|
213
|
+
Dump about 1000 users (you can tweak `batch_size` here; with the default `batch_size` of 100 the dumper runs more than 10 iterations):
|
214
|
+
```ruby
FuryDumper.dump(password: 'password',
                host: 'localhost',
                port: '5632',
                user: 'username',
                model_name: 'User',
                field_values: (500..1500),
                database: 'staging',
                debug_mode: :short)
```
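The examples pass `field_values` as an array, a range, or a single value. A minimal sketch of how such input could be normalized and split into batches is shown below; `batches_for` is a hypothetical helper, and whether the gem batches exactly this way is an assumption.

```ruby
# Hypothetical helper: accept a scalar, an array, or a range and
# split the resulting values into batches of batch_size.
def batches_for(field_values, batch_size: 100)
  Array(field_values).each_slice(batch_size).to_a
end

batches = batches_for(500..1500)
puts batches.size               # 1001 ids -> 11 batches
puts batches.last.inspect       # [1500]
puts batches_for(3368).inspect  # [[3368]]
```

The range `500..1500` is inclusive, so it holds 1001 ids, which is why the default batch size of 100 leads to more than 10 iterations.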
|
224
|
+
Dump AdminUser:
|
225
|
+
```ruby
FuryDumper.dump(password: 'password',
                host: 'localhost',
                port: '5632',
                user: 'username',
                model_name: 'AdminUser',
                field_values: 3368,
                database: 'staging',
                debug_mode: :short)
```
|
235
|
+
|
236
|
+
Stage dump:
|
237
|
+
|
238
|
+
```bash
|
239
|
+
ssh -NL <port>:<host>:<hostport> username@<host>
|
240
|
+
```
|
241
|
+
|
242
|
+
```ruby
FuryDumper.dump(password: 'password',
                host: 'localhost',
                port: '5632',
                user: 'username',
                model_name: 'User',
                field_values: 1,
                database: 'staging',
                debug_mode: :short)
```
|
252
|
+
|
253
|
+
Dump from replica of production:
|
254
|
+
|
255
|
+
```bash
|
256
|
+
ssh -NL <port>:<host>:<hostport> username@<host>
|
257
|
+
```
|
258
|
+
|
259
|
+
```ruby
FuryDumper.dump(password: 'password',
                host: 'localhost',
                port: '5632',
                user: 'username',
                model_name: 'User',
                field_values: 1,
                database: 'production',
                debug_mode: :short)
```
|
269
|
+
|
270
|
+
### Statistics 📈
|
271
|
+
|
272
|
+
Dump statistics from the replica (standard configuration, see [this config](README.md#configuration))
|
273
|
+
|
274
|
+
| Number of base objects | Number of related objects | Time |
| --- | --- | --- |
| 1 | ~150* | 2 min 14 sec |
| 10 | ~3,500* | 6 min 15 sec |
| 100 | ~10,000* | 11 min 8 sec |
| 1,000 | ~10,000* | 16 min 6 sec |
|
280
|
+
|
281
|
+
\* Operations are dumped several times along different paths and may duplicate one another, so the number in the table only approximates the number of unique records in the database.
|
282
|
+
|
283
|
+
Note: The runtime may differ between objects depending on the number of related objects.
|
284
|
+
|
285
|
+
# Dev documentation
|
286
|
+
|
287
|
+
Abbreviations for greater convenience
|
288
|
+
* PK - primary key
|
289
|
+
* FK - foreign key
|
290
|
+
* remote DB - remote DB from which data will be pulled
|
291
|
+
* target DB - the current database to which copying will be performed
|
292
|
+
|
293
|
+
|
294
|
+
## Track of the relation graph
|
295
|
+
|
296
|
+
At the moment, two ways of traversing the relation graph are implemented: in depth (depth-first) and in width (breadth-first).
|
297
|
+
|
298
|
+
### Depth first
|
299
|
+
|
300
|
+
How the algorithm works, in brief:

1. Find a model
2. Find all model relationships
3. For each relation:
    1. Find all the data (PK / FK, values, etc.)
    2. Dump the found relation

This is a classic depth-first algorithm.
|
309
|
+
|
310
|
+
![Depth_first_example](Depth-first.png)
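The depth-first steps can be sketched on a toy graph. This is only an illustration under simplifying assumptions, not the gem's implementation: the graph is a plain hash of model names, and appending to `dumped` stands in for copying rows from the remote DB. `dump_depth_first` is a hypothetical name.

```ruby
# Depth-first: dump a model, then fully follow each relation
# before moving on to the next one.
def dump_depth_first(graph, model, dumped = [])
  return dumped if dumped.include?(model) # already dumped, stop here

  dumped << model                         # stands in for copying from the remote DB
  graph.fetch(model, []).each do |related|
    dump_depth_first(graph, related, dumped)
  end
  dumped
end

# Toy relation graph: model name => directly related models.
graph = {
  'User' => %w[Post Device],
  'Post' => %w[Comment],
  'Device' => [],
  'Comment' => []
}

puts dump_depth_first(graph, 'User').inspect
# ["User", "Post", "Comment", "Device"]
```

`Comment` is reached before `Device` because the `Post` branch is followed to its end first.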
|
311
|
+
|
312
|
+
### Breadth first
|
313
|
+
|
314
|
+
How the algorithm works, in brief:

1. Get the model for the dump (input data) and add it to the model queue
2. While there are models in the queue:
    1. Take the first model in the queue as the current model
    2. Copy this model from the remote DB
    3. For each relation of this model:
        1. Find all the data (PK / FK, values, etc.)
        2. Put the linked model at the end of the queue

This is a classic breadth-first algorithm.
|
325
|
+
|
326
|
+
![Breadth_first_example](Breadth-first.png)
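The breadth-first steps can be sketched the same way with an explicit queue. Again, this is a simplified illustration rather than the gem's code: `dump_breadth_first` is a hypothetical name, the graph is a hash of model names, and appending to `dumped` stands in for copying from the remote DB.

```ruby
# Breadth-first: process models in the order they were queued,
# so models close to the root are dumped first.
def dump_breadth_first(graph, root)
  queue  = [root] # the model queue
  dumped = []

  until queue.empty?
    model = queue.shift                 # the current model is the first in the queue
    next if dumped.include?(model)

    dumped << model                     # stands in for copying from the remote DB
    graph.fetch(model, []).each { |related| queue.push(related) }
  end
  dumped
end

graph = {
  'User' => %w[Post Device],
  'Post' => %w[Comment],
  'Device' => [],
  'Comment' => []
}

puts dump_breadth_first(graph, 'User').inspect
# ["User", "Post", "Device", "Comment"]
```

Here `Device` comes before `Comment`, because both direct relations of `User` are handled before anything one step further away.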
|
327
|
+
|
328
|
+
By default, the dumper works in width (breadth-first), because it gives higher priority to relations that are close to the starting model.\
You can explicitly make the dumper work in depth by setting `mode: depth` in the `fury_dumper.yml` configuration file.
|
330
|
+
|
331
|
+
## Relationships for a specific model
|
332
|
+
|
333
|
+
Each model can have many relations, and the dumper follows almost all of them. The relation types it handles are:
|
334
|
+
* has_one and has_many (handled together; has_one is not applied as LIMIT 1, so it is effectively treated as has_many)
|
335
|
+
* belongs_to
|
336
|
+
* has_and_belongs_to_many
|
337
|
+
|
338
|
+
There are a few exceptions: for example, `through` relations are ignored.
|
339
|
+
|
340
|
+
Relations with scopes are dumped too. However, if there is a wider (covering) relation without a scope, only the covering relation is dumped.
|
341
|
+
|
342
|
+
For example, a user has documents and a main document:
* `has_many :documents, class_name: 'User::Document'`
* `has_one :main_document, -> { main }, class_name: 'User::Document'`
|
345
|
+
|
346
|
+
The main_document relation is not dumped, because documents is a covering relation: it is wider and has no scope.\
If there were no documents relation, main_document would be dumped with its scope condition.
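The covering rule can be illustrated with a small filter. This is a sketch under simplifying assumptions: `SketchRelation` and `without_covered` are hypothetical names, and a relation is treated as covering another when it has no scope and targets the same class (keys are ignored here).

```ruby
# Hypothetical, simplified description of an association.
SketchRelation = Struct.new(:name, :class_name, :scoped)

# Drop scoped relations when an unscoped relation targets the same class.
def without_covered(relations)
  covering_classes = relations.reject(&:scoped).map(&:class_name)
  relations.reject { |rel| rel.scoped && covering_classes.include?(rel.class_name) }
end

relations = [
  SketchRelation.new(:documents, 'User::Document', false),
  SketchRelation.new(:main_document, 'User::Document', true),
  SketchRelation.new(:active_devices, 'Device', true)
]

puts without_covered(relations).map(&:name).inspect
# [:documents, :active_devices]
```

`main_document` is dropped because `documents` covers it, while `active_devices` stays because no unscoped relation targets `Device`.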
|
348
|
+
|
349
|
+
Models with polymorphic relations are dumped as well (`belongs_to :resource, polymorphic: true` and `has_many :devices, as: :owner`).
|
350
|
+
|
351
|
+
### has_and_belongs_to_many relation
|
352
|
+
|
353
|
+
A has_and_belongs_to_many association has a join (proxy) table that also needs to be dumped. It is dumped while processing the model that declares the relation.
|
354
|
+
|
355
|
+
### Features of as-relation
|
356
|
+
`as` relations are handled a bit differently from the rest, because many models can link to the same table: these links are not duplicates, each carries its own meaning, and none of them may be lost.\
For example, the relation `has_many :devices, as: :owner` on the user may also be present on the lead. And in an ideal universe 🦄, both need to be pulled out.\
To dump both, the dumper records the relation path (`as` relations only) along which the model was reached. If one path is a subpath of another, the two are considered the same and the model is not dumped again.
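One possible reading of the subpath rule is sketched below. It assumes a path is an array of relation labels and that "subpath" means a contiguous run inside the other path; both the representation and the helper names (`subpath?`, `same_as_path?`) are assumptions made for illustration.

```ruby
# True when `short` occurs as a contiguous run inside `long`.
def subpath?(short, long)
  return true if short.empty?

  long.each_cons(short.size).any? { |window| window == short }
end

# Two paths are treated as the same when either one contains the other.
def same_as_path?(path_a, path_b)
  subpath?(path_a, path_b) || subpath?(path_b, path_a)
end

user_path   = ['User.devices']
lead_path   = ['Lead.devices']
nested_path = ['User.devices', 'Device.logs']

puts same_as_path?(user_path, lead_path)   # false: both are dumped
puts same_as_path?(user_path, nested_path) # true: treated as the same
```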
|
359
|
+
|
360
|
+
## Fast mode
|
361
|
+
|
362
|
+
Fast mode runs SQL queries without ordering by primary key. If you want to dump the **last** records of a model, set `fast: false` in the configuration.\
In fast mode, SQL queries look like this:
|
364
|
+
```sql
|
365
|
+
SELECT * FROM table WHERE fk_id IN (...) LIMIT 1000;
|
366
|
+
```
|
367
|
+
In non-fast mode sql queries look like this:
|
368
|
+
```sql
|
369
|
+
SELECT * FROM table WHERE fk_id IN (...) ORDER BY table.id LIMIT 1000
|
370
|
+
```
|
371
|
+
Non-fast mode makes queries slower: because of the ordering, the PostgreSQL planner may build the query around the primary key index and apply `fk_id IN (...)` as a filter.
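The two query shapes can be produced from the config values by a small builder. This is a hypothetical sketch (`relation_query` is not part of the gem), and it interpolates identifiers directly, which is acceptable only for trusted table and column names.

```ruby
# Hypothetical helper that builds the fast and non-fast query shapes.
def relation_query(table:, fk:, ids:, batch_size: 100, ratio_records_batches: 10, fast: true)
  limit = ratio_records_batches * batch_size  # fetching_records
  sql = "SELECT * FROM #{table} WHERE #{fk} IN (#{ids.map(&:to_i).join(', ')})"
  sql += " ORDER BY #{table}.id" unless fast  # non-fast mode sorts by primary key
  "#{sql} LIMIT #{limit};"
end

puts relation_query(table: 'posts', fk: 'user_id', ids: [1, 2, 3])
# SELECT * FROM posts WHERE user_id IN (1, 2, 3) LIMIT 1000;
puts relation_query(table: 'posts', fk: 'user_id', ids: [1, 2, 3], fast: false)
# SELECT * FROM posts WHERE user_id IN (1, 2, 3) ORDER BY posts.id LIMIT 1000;
```

The `LIMIT 1000` comes from the default `ratio_records_batches * batch_size = 10 * 100`.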
|
372
|
+
|
373
|
+
## Briefly about classes
|
374
|
+
|
375
|
+
* `FuryDumper` - initiates the dump process and performs batching on the first iteration
* `FuryDumper::Dumper` - the main class that implements the dump process; the main algorithm for traversing relations lives here
* `FuryDumper::Dumpers::Model` - model class
* `FuryDumper::Dumpers::ModelQueue` - a queue of models for a dump in width
* `FuryDumper::Dumpers::DumpState` - dump state class; it stores information about the models that have already been dumped and some statistics
* `FuryDumper::Dumpers::RelationItem` - relation structure: the keys and values used for the dump. For ordinary models, RelationItems are compared with each other only by key. The Additional option makes it possible to compare by key and value. Complex explicitly says that there is only a key, that is, the key contains a string such as `date_from IS NULL`, which is the condition of the relation.
* `FuryDumper::Api` - a class for communicating with microservices
* `FuryDumper::Config` - config class
* `FuryDumper::Encrypter` - a class for encrypting passwords
|