fury_dumper 0.1.0

# FuryDumper 🧙

Welcome to the `fury_dumper` gem!

It helps you dump data from a remote database into the main service and into other microservices that also have the `fury_dumper` gem installed.

*Read this in other languages: [Russian](README.ru.md).*

*For developers: [English](README.md#dev-documentation), [Russian](README.ru.md#документация-для-разработчиков).*

## Installation

Add this line to your application's Gemfile:

```ruby
gem 'fury_dumper'
```

And then execute:

```bash
bundle install
```

Or install it yourself as:

```bash
gem install fury_dumper
```

## Usage

### Configuration

Create the default configuration:

```bash
bundle exec rails generate fury_dumper:config
```

To make the dumper work correctly with other services, adjust the generated `fury_dumper.yml` config. Its structure is described below:

```yaml
# The size of the batch at the first iteration
#
# Optional; default 100
batch_size: 100

# The ratio of the number of records loaded from the database (fetching_records)
# to the batch size.
# Formula: fetching_records = ratio_records_batches * batch_size
# fetching_records is used as the LIMIT value in SQL queries
#
# Optional; default 10
ratio_records_batches: 10

# Traversal mode for the relation graph: breadth-first (wide) or depth-first (depth)
#
# Optional; default wide
mode: wide

# Relations to exclude from the dump; useful to speed up the dump
# and to avoid loading unnecessary data. Format:
# <class name>.<association name>
#
# Optional; default is an empty list
exclude_relations: User.friends, Post.author

# By default, data is dumped quickly, without sorting.
# false dumps records sorted by primary key, true dumps them unsorted;
# useful for creating dumps for developers.
#
# Optional; default is true
fast: true

# List of microservice connections
#
# Optional
relative_services:
  # Microservice name
  #
  # Optional
  post_service:
    # Name of the remote database of this microservice (post_service)
    # used for the dump
    #
    # Required
    database: 'post_service_development_dump'

    # Host of the remote database
    #
    # Required
    host: 'localhost'

    # Port of the remote database
    #
    # Required
    port: '5432'

    # Username for the remote database
    #
    # Required
    user: 'user'

    # Password for the remote database
    #
    # Required
    password: 'password'

    # Tables linked to this microservice (post_service)
    #
    # Required
    tables:
      # Table name in the current service
      #
      # Required
      users:
        # Table name in the microservice (post_service)
        #
        # Required
        users:
          # Column name in this service's table (users)
          #
          # Required
          self_field_name: 'id'

          # Model name in the microservice (post_service)
          #
          # Required
          ms_model_name: 'User'

          # Column name in the microservice table (users)
          #
          # Required
          ms_field_name: 'root_user_id'
      root_posts:
        posts:
          self_field_name: 'id'
          ms_model_name: 'Post'
          ms_field_name: 'root_post_id'
  logs_service:
    database: 'logs_service_development_dump'
    host: 'localhost'
    port: '5432'
    user: 'user'
    password: 'password'
    tables:
      users:
        logs:
          self_field_name: "log::json->>'id'"
          ms_model_name: 'Log'
          ms_field_name: 'id'
```
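
The interplay of `batch_size` and `ratio_records_batches` can be illustrated with a short Ruby sketch. It is an illustration only, not the gem's config loader: it assumes the file can be read with the standard `yaml` library, fills in the documented defaults, derives `fetching_records`, and splits `exclude_relations` on commas into class/association pairs. The helper name `effective_settings` is hypothetical.

```ruby
require 'yaml'

# Documented defaults for the optional top-level keys.
DEFAULTS = {
  'batch_size' => 100,
  'ratio_records_batches' => 10,
  'mode' => 'wide',
  'fast' => true
}.freeze

# Hypothetical helper: merge defaults, derive the SQL LIMIT and parse exclusions.
def effective_settings(yaml_text)
  config = DEFAULTS.merge(YAML.safe_load(yaml_text) || {})
  # fetching_records = ratio_records_batches * batch_size (used as the SQL LIMIT)
  config['fetching_records'] = config['ratio_records_batches'] * config['batch_size']
  # "User.friends, Post.author" -> [["User", "friends"], ["Post", "author"]]
  config['exclude_relations'] = config.fetch('exclude_relations', '').to_s
                                      .split(',')
                                      .map { |item| item.strip.split('.', 2) }
  config
end

settings = effective_settings(<<~YAML)
  batch_size: 50
  exclude_relations: User.friends, Post.author
YAML

p settings['fetching_records']   # => 500
p settings['exclude_relations']  # => [["User", "friends"], ["Post", "author"]]
```

With the defaults alone, `fetching_records` would be 10 * 100 = 1000, which is the `LIMIT` that appears in the SQL examples of the fast mode section.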

### Routing for microservices

To allow other services to dump your database, add this code to your `config/routes.rb`:

```ruby
mount FuryDumper::Engine => "fury_dumper" unless Rails.env.production?
```

### Main call

**⚠️ ⚠️ ⚠️ Attention! If copied data conflicts with data that already exists, the data from the remote database takes priority: existing records will be overwritten! ⚠️ ⚠️ ⚠️**

To start dumping from production or staging, run this command:

```ruby
FuryDumper.dump(password: 'password',
                host: 'localhost',
                port: '5632',
                user: 'username',
                model_name: 'User',
                field_name: 'token',
                field_values: ['99999999-8888-4444-1212-111111111111'],
                database: 'staging',
                debug_mode: :short)
```

To connect to the remote host, open an SSH tunnel first. Example for the main service's staging environment:

```bash
ssh -NL <port>:<host>:<hostport> username@<host>
```

Description of the arguments:

| Argument | Description |
| --- | --- |
| host | host of the remote DB |
| port | port of the remote DB |
| user | username for the remote DB |
| password | password for the remote DB |
| database | name of the remote DB |
| model_name | name of the model to dump |
| field_name | name of the model field used to select records |
| field_values | values of `field_name` |
| debug_mode | debug mode: `full` prints all messages, `short` prints some of them, `none` prints nothing |
| ask | ask the user to confirm when the schemas of the target and remote DBs differ |

### Examples

These examples work without changing the `fury_dumper.yml` config; the [default](README.ru.md#configs) one is enough.

Dump a user by `admin_token`:

```ruby
FuryDumper.dump(password: 'password',
                host: 'localhost',
                port: '5632',
                user: 'username',
                model_name: 'User',
                field_name: 'admin_token',
                field_values: [admin_token_value],
                database: 'staging',
                debug_mode: :short)
```

Dump about 1,000 users. Here you may want to tweak `batch_size`: with the default `batch_size` of 100, the dumper will run more than 10 iterations.

```ruby
FuryDumper.dump(password: 'password',
                host: 'localhost',
                port: '5632',
                user: 'username',
                model_name: 'User',
                field_values: (500..1500),
                database: 'staging',
                debug_mode: :short)
```
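
The examples pass `field_values` as an array, a range, or a single value. The sketch below shows one way such input could be normalized and split into first-iteration batches of `batch_size`. It is a hypothetical illustration (`first_iteration_batches` is not part of the gem's documented API), but it shows why the 1,001 ids in `500..1500` take 11 batches with the default `batch_size` of 100.

```ruby
# Hypothetical helper: accept an Array, a Range or a single value
# and split it into batches, the way a first dump iteration might.
def first_iteration_batches(field_values, batch_size: 100)
  # Kernel#Array expands a Range into its elements and wraps a single value.
  values = Array(field_values)
  values.each_slice(batch_size).to_a
end

batches = first_iteration_batches(500..1500)
p batches.size                   # => 11 (ten batches of 100 ids plus one of a single id)
p batches.last                   # => [1500]
p first_iteration_batches(3368)  # => [[3368]]
```

Raising `batch_size` to 500 would bring the same range down to three batches.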

Dump an `AdminUser`:

```ruby
FuryDumper.dump(password: 'password',
                host: 'localhost',
                port: '5632',
                user: 'username',
                model_name: 'AdminUser',
                field_values: 3368,
                database: 'staging',
                debug_mode: :short)
```

Dump from staging:

```bash
ssh -NL <port>:<host>:<hostport> username@<host>
```

```ruby
FuryDumper.dump(password: 'password',
                host: 'localhost',
                port: '5632',
                user: 'username',
                model_name: 'User',
                field_values: 1,
                database: 'staging',
                debug_mode: :short)
```

Dump from a production replica:

```bash
ssh -NL <port>:<host>:<hostport> username@<host>
```

```ruby
FuryDumper.dump(password: 'password',
                host: 'localhost',
                port: '5632',
                user: 'username',
                model_name: 'User',
                field_values: 1,
                database: 'production',
                debug_mode: :short)
```

### Statistics 📈

Dump statistics from the replica (standard configuration, see [this config](README.md#configuration)):

| Number of base objects | Number of related objects | Time |
| --- | --- | --- |
| 1 | ~150\* | 2 min 14 sec |
| 10 | ~3,500\* | 6 min 15 sec |
| 100 | ~10,000\* | 11 min 8 sec |
| 1,000 | ~10,000\* | 16 min 6 sec |

\* Some records are dumped several times along different paths and can be duplicated, so the numbers in the table only approximate the number of unique records in the database.

Note: the runtime may vary between objects depending on the number of their related objects.

# Dev documentation

Abbreviations used below:
* PK - primary key
* FK - foreign key
* remote DB - the remote database from which data is pulled
* target DB - the current database into which data is copied

## Relation graph traversal

Two strategies for traversing the relation graph are currently implemented: depth-first and breadth-first (wide).

### Depth first

How the algorithm works, in brief:

1. Find a model
2. Find all relations of the model
3. For each relation:
    1. Find all the data (PK / FK, values, etc.)
    2. Dump the found relation, repeating these steps for the related model

This is a classic depth-first algorithm.

![Depth_first_example](Depth-first.png)
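
A minimal Ruby sketch of the depth-first order, assuming the relation graph is a plain hash from a model name to the names of related models. The graph and `dump_depth_first` are hypothetical: the real dumper works with ActiveRecord models and SQL. Appending to `order` stands in for copying records, and the `visited` set is an added assumption that keeps a model from being followed twice when relations form a cycle.

```ruby
require 'set'

# Hypothetical relation graph: model name => names of related models.
relations = {
  'User'     => %w[Post Document],
  'Post'     => %w[Comment],
  'Document' => [],
  'Comment'  => %w[User] # cycle back to User
}

# Depth-first: dump a model, then fully follow each relation before the next one.
def dump_depth_first(model, graph, visited = Set.new, order = [])
  return order if visited.include?(model) # already dumped, stop here
  visited << model
  order << model # stands in for "copy this model from the remote DB"
  graph.fetch(model, []).each do |related|
    dump_depth_first(related, graph, visited, order)
  end
  order
end

p dump_depth_first('User', relations)  # => ["User", "Post", "Comment", "Document"]
```

`Comment` is reached before `Document` because the `Post` branch is exhausted first.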

### Breadth first

How the algorithm works, in brief:

1. Take the model to dump (the input data) and add it to the model queue
2. While the queue is not empty:
    1. Take the first model in the queue as the current model
    2. Copy this model from the remote DB
    3. For each relation of this model:
        1. Find all the data (PK / FK, values, etc.)
        2. Put the related model at the end of the queue

This is a classic breadth-first algorithm.

![Breadth_first_example](Breadth-first.png)
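
The same hypothetical graph traversed breadth-first, following the steps above with a plain array as the queue. This is a sketch, not the gem's `ModelQueue`. The `visited` set is again an added assumption so that cycles do not enqueue a model forever.

```ruby
require 'set'

# Hypothetical relation graph: model name => names of related models.
relations = {
  'User'     => %w[Post Document],
  'Post'     => %w[Comment],
  'Document' => [],
  'Comment'  => %w[User] # cycle back to User
}

# Breadth-first: take models from the head of the queue, append related ones to the tail.
def dump_breadth_first(root, graph)
  queue = [root]
  visited = Set[root]
  order = []
  until queue.empty?
    model = queue.shift # the first model in the queue becomes the current one
    order << model      # stands in for "copy this model from the remote DB"
    graph.fetch(model, []).each do |related|
      next if visited.include?(related)
      visited << related
      queue << related  # put the related model at the end of the queue
    end
  end
  order
end

p dump_breadth_first('User', relations)  # => ["User", "Post", "Document", "Comment"]
```

Compared with the depth-first sketch, `Document` (one hop from `User`) now comes before `Comment` (two hops), which is why the wide mode favors close relations.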

By default, the dumper traverses the graph breadth-first (`mode: wide`), because it gives priority to relations that are close to the starting model.\
To make the dumper work depth-first instead, set `mode: depth` in the `fury_dumper.yml` configuration file.

## Relations of a specific model

Each model can have many relations, and the dumper handles almost all of them. The following relation types are processed:
* `has_one` and `has_many` (handled together: `has_one` is not restricted with `LIMIT 1`, so it effectively becomes `has_many`)
* `belongs_to`
* `has_and_belongs_to_many`

There are a few exceptions; for example, `through` relations are ignored.

Scoped relations are taken into account as well. However, if there is a wider (covering) relation to the same model without a scope, only the covering relation is dumped.

For example, a user has documents and a main document:
* `has_many :documents, class_name: 'User::Document'`
* `has_one :main_document, -> { main }, class_name: 'User::Document'`

The `main_document` relation is skipped during the dump because `documents` is a covering relation: it is wider and has no scope.\
If there were no `documents` relation, `main_document` would be dumped with its scope condition.
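
The covering rule can be sketched in a few lines of Ruby. This is a simplified model, not the gem's code: `Relation` is a hypothetical struct, and a scope is represented by any non-nil value (a string here) instead of a real Rails lambda.

```ruby
require 'set'

# Hypothetical description of an association: target class and optional scope.
Relation = Struct.new(:name, :class_name, :scope, keyword_init: true)

# Drop scoped relations whose target class is already covered
# by an unscoped relation of the same model.
def relations_to_dump(relations)
  covered = relations.reject(&:scope).map(&:class_name).to_set
  relations.reject { |rel| rel.scope && covered.include?(rel.class_name) }
end

documents     = Relation.new(name: :documents, class_name: 'User::Document', scope: nil)
main_document = Relation.new(name: :main_document, class_name: 'User::Document', scope: 'main')

p relations_to_dump([documents, main_document]).map(&:name)  # => [:documents]
p relations_to_dump([main_document]).map(&:name)             # => [:main_document]
```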

Polymorphic relations are handled as well (`belongs_to :resource, polymorphic: true` and `has_many :devices, as: :owner`).

### has_and_belongs_to_many relation

A `has_and_belongs_to_many` association has a join table that must be dumped as well. This happens while processing the model that declares the association.
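
To illustrate, here is an in-memory sketch with a hypothetical `groups_users` join table between users and groups; it is not the gem's code. When the `User` rows with the given ids are processed, the matching join rows are selected for copying, and the group ids they reference become the values still to dump.

```ruby
# Hypothetical rows of a groups_users join table (users <-> groups).
JOIN_ROWS = [
  { user_id: 1, group_id: 10 },
  { user_id: 1, group_id: 11 },
  { user_id: 2, group_id: 11 },
  { user_id: 3, group_id: 12 }
].freeze

# While processing owners with the given ids, pick the join rows to copy
# and the ids on the other side of the association that still need a dump.
def habtm_step(join_rows, owner_key, other_key, owner_ids)
  rows = join_rows.select { |row| owner_ids.include?(row[owner_key]) }
  [rows, rows.map { |row| row[other_key] }.uniq]
end

rows, group_ids = habtm_step(JOIN_ROWS, :user_id, :group_id, [1, 2])
p rows.size  # => 3
p group_ids  # => [10, 11]
```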

### Specifics of `as` relations

Polymorphic `as` relations are handled a little differently from the rest. Many models can link to the same table through them; these links are not duplicates, each carries its own meaning, and none of them may be lost.\
For example, the user relation `has_many :devices, as: :owner` may also exist on a lead. In an ideal universe 🦄, both sets of devices need to be pulled out.\
To dump both, the dumper records the path of `as` relations along which each model was reached. If one path is a subpath of the other, the two are considered the same and the model is not dumped again.
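
A minimal sketch of the path check, assuming a path is an array of `as` relation names and that "subpath" means a contiguous run of one path inside the other; the exact rule in the gem may differ. `subpath?`, `already_dumped?` and the path labels are hypothetical.

```ruby
# True if `short` occurs as a contiguous run of elements inside `long`.
def subpath?(short, long)
  return true if short.empty?
  long.each_cons(short.size).any? { |window| window == short }
end

# A newly reached model is skipped when its `as` path and a recorded path
# are subpaths of one another; otherwise it must be dumped too.
def already_dumped?(path, recorded_paths)
  recorded_paths.any? { |seen| subpath?(seen, path) || subpath?(path, seen) }
end

recorded = [['User.devices']]
p already_dumped?(['Lead.devices'], recorded)  # => false, so the lead's devices are dumped too
p already_dumped?(['User.devices'], recorded)  # => true
```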

## Fast mode

In fast mode, SQL queries are issued without ordering by primary key. If you want to dump the **last** records of a model, set `fast: false` in the configuration.\
In fast mode, SQL queries look like this:
```sql
SELECT * FROM table WHERE fk_id IN (...) LIMIT 1000;
```
In non-fast mode, SQL queries look like this:
```sql
SELECT * FROM table WHERE fk_id IN (...) ORDER BY table.id LIMIT 1000;
```
However, non-fast mode makes queries slower: because of the ordering, the PostgreSQL planner may build the plan around the primary key index and apply the `fk_id IN (...)` filter while walking it.
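
The two modes differ by a single clause. The sketch below uses a hypothetical `fetch_sql` helper, not the gem's query builder, to render both query shapes shown above. The `LIMIT` of 1000 matches the default `fetching_records` (10 * 100).

```ruby
# Hypothetical helper that renders the two query shapes from this section.
# Table and column names are interpolated as-is, so only trusted identifiers
# (e.g. taken from model metadata) should ever be passed in.
def fetch_sql(table, fk_column, ids, limit, fast: true)
  id_list = ids.map { |id| Integer(id) }.join(', ') # raises on non-integer ids
  sql = "SELECT * FROM #{table} WHERE #{fk_column} IN (#{id_list})"
  sql += " ORDER BY #{table}.id" unless fast # non-fast mode sorts by primary key
  "#{sql} LIMIT #{limit};"
end

puts fetch_sql('posts', 'user_id', [1, 2], 1000)
puts fetch_sql('posts', 'user_id', [1, 2], 1000, fast: false)
```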

## Briefly about classes

* `FuryDumper` - starts the dump process and splits the input into batches on the first iteration
* `FuryDumper::Dumper` - the main class implementing the dump process; the main relation traversal algorithm lives here
* `FuryDumper::Dumpers::Model` - model class
* `FuryDumper::Dumpers::ModelQueue` - the queue of models for the breadth-first dump
* `FuryDumper::Dumpers::DumpState` - dump state class; stores information about the models that have already been dumped, plus some statistics
* `FuryDumper::Dumpers::RelationItem` - relation structure: the keys and values used for the dump. For ordinary models, RelationItems are compared with each other by key only. The Additional option makes them compare by both key and value. Complex states explicitly that there is only a key, i.e. the key holds a string such as `date_from IS NULL`, which is the condition of the relation.
* `FuryDumper::Api` - class for communicating with microservices
* `FuryDumper::Config` - config class
* `FuryDumper::Encrypter` - class for encrypting passwords
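
The comparison rules for `RelationItem` can be sketched as follows. This is a hypothetical stand-in built on `Struct`, not the gem's class. The `additional` and `complex` flags only mirror the options described above, and a complex item carries its condition in `key` with no values.

```ruby
# Hypothetical stand-in for RelationItem: a key plus the values used for the dump.
RelationItemSketch = Struct.new(:key, :values, :additional, :complex, keyword_init: true) do
  # Ordinary items match by key only; "additional" items must also have equal values.
  # Complex items have no values (the condition lives in key), so comparing keys suffices.
  def same_as?(other)
    return key == other.key unless additional
    key == other.key && values == other.values
  end
end

a = RelationItemSketch.new(key: 'user_id', values: [1, 2])
b = RelationItemSketch.new(key: 'user_id', values: [3])
p a.same_as?(b)  # => true (ordinary items: key only)

c = RelationItemSketch.new(key: 'user_id', values: [1, 2], additional: true)
d = RelationItemSketch.new(key: 'user_id', values: [3], additional: true)
p c.same_as?(d)  # => false (additional items: key and values)

e = RelationItemSketch.new(key: 'date_from IS NULL', complex: true)
p e.same_as?(RelationItemSketch.new(key: 'date_from IS NULL', complex: true))  # => true
```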