dump_cleaner 0.5.0 → 0.6.0
- checksums.yaml +4 -4
- data/CHANGELOG.md +4 -0
- data/README.md +9 -8
- data/dump_cleaner.png +0 -0
- data/lib/dump_cleaner/cleaners/mysql_shell_table_cleaner.rb +1 -2
- data/lib/dump_cleaner/version.rb +1 -1
- metadata +4 -6
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-metadata.gz:
-data.tar.gz:
+metadata.gz: 2fb7bab589f7b66ce216c08fcf3b92306577713d8ba96bcde0ec10675ebbde4a
+data.tar.gz: a74f4fbb5f76aba4f507616d405e4075858915f81d2d1a79a5632d3cb5d0b97b
 SHA512:
-metadata.gz:
-data.tar.gz:
+metadata.gz: 2fe9e2b71156303bfe718ee1e5c57e529decb13be26433e392a803a309efd87796d52d711fc5afa9c9fcf746a430cdb66f8513af7d33b0d329ad9090ba606d51
+data.tar.gz: 1fc94b2056a4c1f756cd0795fc60b812b0f3181de209c915d0630093db89e2c38be53902f9385713be468a4528d27db08c9f011f15257bdbe69abbbd54b481f2
data/CHANGELOG.md
CHANGED
data/README.md
CHANGED
@@ -1,3 +1,5 @@
+
+
 # DumpCleaner
 
 DumpCleaner is a tool that can randomize or anonymize your database dumps. Currently, it works with the [MySQL Shell Dump](https://dev.mysql.com/doc/mysql-shell/8.4/en/mysql-shell-utilities-dump-instance-schema.html) format (other formats may be added later).
@@ -32,7 +34,7 @@ That said, having an exact production data copy at developers’ machines is ins
 - It can **ignore certain columns and/or records** in the dump based on a set of conditions to e.g. skip randomizing contact information of internal admin users.
 - It obeys the inherent limits of the given dump format, if any (for example, it takes great care to keep the length and byte size of the updated data the same as original so as not to corrupt the MySQL Shell dump chunk index files).
 
-All in all, DumpCleaner is just a
+All in all, DumpCleaner is just a _”more specialized and configurable `awk`“_, i.e. a text replacement tool.
 
 #### Non-goals and limitations
 
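The length and byte-size constraint mentioned in the README hunk above can be illustrated with a minimal Ruby sketch. The `same_size?` helper is hypothetical and is not taken from the gem; it only shows why both measures have to be compared when the data contains multi-byte UTF-8 characters.

```ruby
# Hypothetical check, not DumpCleaner's actual code: a replacement counts as
# size-preserving only if both the character length and the byte size match.
def same_size?(original, replacement)
  original.length == replacement.length &&
    original.bytesize == replacement.bytesize
end

# Both values have 7 characters and 7 bytes.
puts same_size?("Johnson", "Jackson")

# Same character length, but "á" occupies 2 bytes in UTF-8, so the byte sizes differ.
puts same_size?("Borák", "Borak")
```

A replacement that fails this kind of check would shift byte offsets in the file, which is what the README says must be avoided for the MySQL Shell dump chunk index files.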
@@ -44,15 +46,13 @@ All in all, DumpCleaner is just a „more specialized and configurable `awk`“,
 
 ## Installation
 
-
-
-Install the gem and add to the application's Gemfile by executing:
+To install the gem, add it to the application's Gemfile by executing:
 
-$ bundle add
+$ bundle add dump_cleaner
 
 If bundler is not being used to manage dependencies, install the gem by executing:
 
-$ gem install
+$ gem install dump_cleaner
 
 ## Usage
 
@@ -78,7 +78,7 @@ MySQLShell JS> util.dumpSchemas(["db"], "mysql_shell_dump");
 The dump contains a `users` table with the following sample contents:
 
 ```sh
-$ zstdcat
+$ zstdcat mysql_shell_dump/db@users@@0.tsv.zst
 
 # id name email phone_number
 1 Johnson johnson@gmail.com +420774678763
@@ -96,7 +96,7 @@ $ dump_cleaner -f mysql_shell_dump -t mysql_shell_anonymized_dump \
 a destination dump directory gets created with a copy of the source dump but with the data in the `users` table randomized, in this case in the following way:
 
 ```sh
-$ zstdcat
+$ zstdcat mysql_shell_anonymized_dump/db@users@@0.tsv.zst
 
 # id name email phone_number
 1 Jackson variety@gmail.com +420774443735
@@ -279,6 +279,7 @@ If multiple conditions are specified, they are logically OR-ed, i.e. if _any_ of
 
 - The issue with random seeds being dependent on the primary key (and thus artificially increasing data variance): this behavior should probably be optional.
 - The `RandomizeFormattedNumber` step could be generalized to `RandomizeFormattedString`, allowing to replace any matching part of the string with not only numbers, but alphanumeric etc. as well. The `RandomizeEmail` could then be rewritten using this new step.
+- The ability to work with mysqldump / mysqlpump database dump files would be nice.
 
 ## Development
 
data/dump_cleaner.png
ADDED
Binary file
data/lib/dump_cleaner/cleaners/mysql_shell_table_cleaner.rb
CHANGED
@@ -28,8 +28,7 @@ module DumpCleaner
 
 DumpCleaner::Cleanup::Uniqueness::CaseInsensitiveCache.instance.clear
 
-Dir.glob("#{options.source_dump_path}/#{table_info.db_at_table}
-# Open3.pipeline_r(pipe_source_args(file), ["head", "-n", "1000"]) do |tsv_data, _wait_thread|
+Dir.glob("#{options.source_dump_path}/#{table_info.db_at_table}@*.#{table_info.extension}").each do |file|
 Open3.pipeline_r(pipe_source_args(file)) do |tsv_data, _wait_thread|
 Open3.pipeline_w(pipe_sink_args(destination_file_for(file))) do |zstd_out, _wait_thread|
 tsv_data.each_line do |line|
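To see which files the glob in the added line would select, here is a small standalone Ruby sketch. It assumes that `table_info.db_at_table` evaluates to something like `db@users` and `table_info.extension` to `tsv.zst`, which is inferred from the `db@users@@0.tsv.zst` file name in the README hunks and is not confirmed by the diff. The `chunk_files` helper and the extra file names are hypothetical.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical helper that mirrors the glob pattern from the added line.
def chunk_files(source_dump_path, db_at_table, extension)
  Dir.glob("#{source_dump_path}/#{db_at_table}@*.#{extension}").sort
end

Dir.mktmpdir do |dir|
  # Made-up dump directory contents, for illustration only.
  %w[
    db@users@@0.tsv.zst
    db@users@@1.tsv.zst
    db@users.json
    db@users_archive@@0.tsv.zst
  ].each { |name| FileUtils.touch(File.join(dir, name)) }

  # Only the two data chunks of the users table are listed; the .json file and
  # the similarly named users_archive table do not match the pattern.
  puts chunk_files(dir, "db@users", "tsv.zst").map { |f| File.basename(f) }
end
```

The literal `@` right after the table name is what keeps a table whose name merely starts with `users` out of the result.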
data/lib/dump_cleaner/version.rb
CHANGED
metadata
CHANGED
@@ -1,14 +1,13 @@
 --- !ruby/object:Gem::Specification
 name: dump_cleaner
 version: !ruby/object:Gem::Version
-version: 0.
+version: 0.6.0
 platform: ruby
 authors:
 - Matouš Borák
-autorequire:
 bindir: exe
 cert_chain: []
-date:
+date: 2025-03-18 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
 name: zeitwerk
@@ -41,6 +40,7 @@ files:
 - Rakefile
 - doc/workflow_steps.md
 - dump_cleaner.gemspec
+- dump_cleaner.png
 - exe/dump_cleaner
 - lib/dump_cleaner.rb
 - lib/dump_cleaner/cleaners/base_cleaner.rb
@@ -83,7 +83,6 @@ metadata:
 homepage_uri: https://github.com/NejRemeslnici/dump-cleaner
 source_code_uri: https://github.com/NejRemeslnici/dump-cleaner
 changelog_uri: https://github.com/NejRemeslnici/dump-cleaner/blob/main/CHANGELOG.md
-post_install_message:
 rdoc_options: []
 require_paths:
 - lib
@@ -98,8 +97,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
 - !ruby/object:Gem::Version
 version: '0'
 requirements: []
-rubygems_version: 3.
-signing_key:
+rubygems_version: 3.6.2
 specification_version: 4
 summary: Anonymizes data in logical database dumps.
 test_files: []