dump_cleaner 0.5.0 → 0.6.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: fad879c857b2b0f2bb5c0eb130a91b49fef18dd822b16ab0467ef1af7f9293e9
- data.tar.gz: 710ce63c414421cb7be41d7775bb8be22bcc5570ff208a1fa0701e2e36627b0c
+ metadata.gz: 88a45ecda8f05b867e67332797dd2e31adf526b319f181637b1185edd1d5c3bc
+ data.tar.gz: 5a097c5895e21105a8d09e6b4436cdb187df9669a13d2e1980414c99e93f72c5
  SHA512:
- metadata.gz: 8388f418065647c421e97f6c85f5a6c89fca228e3f544117fe6b5f687bf6f6aaaa986ea5c35ce4578c690f335a0f4245317f5374b7b017b6138e0f58c5ee8575
- data.tar.gz: d8f449349ac08020eee4fffeaef9a33191703eab1d3048417741fa687be1a34126e88ab18678386e6251b054826518fbecf7f6b0b31771dd7193897aa20cddf3
+ metadata.gz: 50d3c285b980c834f56b4a6f3049f1b2a30aa947eb891f5b80b3e8e4e35a62548447125f3ba04fbd86b7ad1d9717b54a63db4d0df1ff47109bbba2ea16790fa1
+ data.tar.gz: 859f9eff4e9424c46fa7d3673bb99449c4b95d2996a992b981cbc651f5febc5dee8556bdd62bcf0a5a66e1bc42d53fb784fd395812f553d082bd25bc7cec023d
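For readers who want to check the new digests locally, here is a minimal sketch that is not part of the gem. It assumes the entries in `checksums.yaml` are plain hex digests of the raw bytes of `metadata.gz` and `data.tar.gz`; the helper name `digests_for` is hypothetical.

```ruby
require "digest"

# Hypothetical helper: returns the hex digests that checksums.yaml
# would list for one archive member, given that member's raw bytes.
def digests_for(bytes)
  {
    "SHA256" => Digest::SHA256.hexdigest(bytes),
    "SHA512" => Digest::SHA512.hexdigest(bytes)
  }
end

# Stand-in input; in practice this would be File.binread("data.tar.gz")
# after unpacking the .gem file (itself a tar archive).
digests = digests_for("abc")
puts digests["SHA256"].length # 64 hex characters
puts digests["SHA512"].length # 128 hex characters
```

The computed strings can then be compared against the `+` lines of the hunk above.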
data/CHANGELOG.md CHANGED
@@ -1,5 +1,13 @@
  ## [Unreleased]
 
+ ## [0.6.1] - 2025-04-08
+
+ - Make `logger` an explicit dependency.
+
+ ## [0.6.0] - 2025-03-18
+
+ - Fix glob pattern when finding mysql dump files
+
  ## [0.5.0] - 2024-06-13
 
  - Initial public release
data/README.md CHANGED
@@ -1,3 +1,5 @@
+ ![Dump Cleaner](dump_cleaner.png)
+
  # DumpCleaner
 
  DumpCleaner is a tool that can randomize or anonymize your database dumps. Currently, it works with the [MySQL Shell Dump](https://dev.mysql.com/doc/mysql-shell/8.4/en/mysql-shell-utilities-dump-instance-schema.html) format (other formats may be added later).
@@ -32,7 +34,7 @@ That said, having an exact production data copy at developers’ machines is ins
  - It can **ignore certain columns and/or records** in the dump based on a set of conditions to e.g. skip randomizing contact information of internal admin users.
  - It obeys the inherent limits of the given dump format, if any (for example, it takes great care to keep the length and byte size of the updated data the same as original so as not to corrupt the MySQL Shell dump chunk index files).
 
- All in all, DumpCleaner is just a „more specialized and configurable `awk`“, i.e. a text replacement tool.
+ All in all, DumpCleaner is just a _”more specialized and configurable `awk`“_, i.e. a text replacement tool.
 
  #### Non-goals and limitations
 
@@ -44,15 +46,13 @@ All in all, DumpCleaner is just a „more specialized and configurable `awk`“,
 
  ## Installation
 
- TODO: Replace `UPDATE_WITH_YOUR_GEM_NAME_IMMEDIATELY_AFTER_RELEASE_TO_RUBYGEMS_ORG` with your gem name right after releasing it to RubyGems.org. Please do not do it earlier due to security reasons. Alternatively, replace this section with instructions to install your gem from git if you don't plan to release to RubyGems.org.
-
- Install the gem and add to the application's Gemfile by executing:
+ To install the gem, add it to the application's Gemfile by executing:
 
- $ bundle add UPDATE_WITH_YOUR_GEM_NAME_IMMEDIATELY_AFTER_RELEASE_TO_RUBYGEMS_ORG
+ $ bundle add dump_cleaner
 
  If bundler is not being used to manage dependencies, install the gem by executing:
 
- $ gem install UPDATE_WITH_YOUR_GEM_NAME_IMMEDIATELY_AFTER_RELEASE_TO_RUBYGEMS_ORG
+ $ gem install dump_cleaner
 
  ## Usage
 
@@ -78,7 +78,7 @@ MySQLShell JS> util.dumpSchemas(["db"], "mysql_shell_dump");
  The dump contains a `users` table with the following sample contents:
 
  ```sh
- $ zstdcat spec/support/data/mysql_shell_dump/db@users@@0.tsv.zst
+ $ zstdcat mysql_shell_dump/db@users@@0.tsv.zst
 
  # id name email phone_number
  1 Johnson johnson@gmail.com +420774678763
@@ -96,7 +96,7 @@ $ dump_cleaner -f mysql_shell_dump -t mysql_shell_anonymized_dump \
  a destination dump directory gets created with a copy of the source dump but with the data in the `users` table randomized, in this case in the following way:
 
  ```sh
- $ zstdcat spec/support/data/mysql_shell_anonymized_dump/db@users@@0.tsv.zst
+ $ zstdcat mysql_shell_anonymized_dump/db@users@@0.tsv.zst
 
  # id name email phone_number
  1 Jackson variety@gmail.com +420774443735
@@ -279,6 +279,7 @@ If multiple conditions are specified, they are logically OR-ed, i.e. if _any_ of
 
  - The issue with random seeds being dependent on the primary key (and thus artificially increasing data variance): this behavior should probably be optional.
  - The `RandomizeFormattedNumber` step could be generalized to `RandomizeFormattedString`, allowing to replace any matching part of the string with not only numbers, but alphanumeric etc. as well. The `RandomizeEmail` could then be rewritten using this new step.
+ - The ability to work with mysqldump / mysqlpump database dump files would be nice.
 
  ## Development
 
data/dump_cleaner.gemspec CHANGED
@@ -31,6 +31,7 @@ Gem::Specification.new do |spec|
  spec.require_paths = ["lib"]
 
  # Uncomment to register a new dependency of your gem
+ spec.add_dependency "logger", "~> 1.7"
  spec.add_dependency "zeitwerk", "~> 2.6"
 
  # For more information and examples about making a new gem, check out our
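To make the new `"~> 1.7"` constraint concrete, the sketch below asks RubyGems which versions it accepts. The helper name `allowed_by_logger_constraint?` and the sample version strings are illustrative only and do not come from the gem.

```ruby
require "rubygems"

# Hypothetical helper: checks a version string against the pessimistic
# constraint added to the gemspec ("~> 1.7", i.e. >= 1.7 and < 2.0).
def allowed_by_logger_constraint?(version)
  Gem::Requirement.new("~> 1.7").satisfied_by?(Gem::Version.new(version))
end

# Sample version strings picked only to probe both edges of the range.
%w[1.6.6 1.7.0 1.9.2 2.0.0].each do |version|
  puts "#{version}: #{allowed_by_logger_constraint?(version)}"
end
```

Only the two versions inside the 1.7 to 2.0 range are accepted, so a future 2.x release of `logger` would not be picked up automatically.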
data/dump_cleaner.png ADDED
Binary file
@@ -28,8 +28,7 @@ module DumpCleaner
 
  DumpCleaner::Cleanup::Uniqueness::CaseInsensitiveCache.instance.clear
 
- Dir.glob("#{options.source_dump_path}/#{table_info.db_at_table}@@*.#{table_info.extension}").each do |file|
- # Open3.pipeline_r(pipe_source_args(file), ["head", "-n", "1000"]) do |tsv_data, _wait_thread|
+ Dir.glob("#{options.source_dump_path}/#{table_info.db_at_table}@*.#{table_info.extension}").each do |file|
  Open3.pipeline_r(pipe_source_args(file)) do |tsv_data, _wait_thread|
  Open3.pipeline_w(pipe_sink_args(destination_file_for(file))) do |zstd_out, _wait_thread|
  tsv_data.each_line do |line|
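The change from `@@*` to `@*` above appears to be the 0.6.0 changelog entry about the glob pattern. As a rough illustration of its effect, the sketch below runs both patterns against a temporary directory. The file names are made up for the demonstration, and it is an assumption here that a dump can contain both single-`@` and double-`@` chunk files; `glob_matches` is a hypothetical helper, not part of the gem.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical helper: creates a few sample files in a temp dir and
# returns the sorted base names that the given glob pattern matches.
def glob_matches(pattern)
  sample_files = %w[db@users@0.tsv.zst db@users@1.tsv.zst db@users@@2.tsv.zst db@users.json]
  Dir.mktmpdir do |dir|
    sample_files.each { |name| FileUtils.touch(File.join(dir, name)) }
    Dir.glob(File.join(dir, pattern)).map { |path| File.basename(path) }.sort
  end
end

p glob_matches("db@users@@*.tsv.zst") # pattern used in 0.5.0
p glob_matches("db@users@*.tsv.zst")  # pattern used in 0.6.1
```

With these sample names, the old pattern matches only the double-`@` file, while the new one also picks up the single-`@` files.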
@@ -1,5 +1,5 @@
  # frozen_string_literal: true
 
  module DumpCleaner
- VERSION = "0.5.0"
+ VERSION = "0.6.1"
  end
metadata CHANGED
@@ -1,15 +1,28 @@
  --- !ruby/object:Gem::Specification
  name: dump_cleaner
  version: !ruby/object:Gem::Version
- version: 0.5.0
+ version: 0.6.1
  platform: ruby
  authors:
  - Matouš Borák
- autorequire:
  bindir: exe
  cert_chain: []
- date: 2024-06-13 00:00:00.000000000 Z
+ date: 2025-04-08 00:00:00.000000000 Z
  dependencies:
+ - !ruby/object:Gem::Dependency
+ name: logger
+ requirement: !ruby/object:Gem::Requirement
+ requirements:
+ - - "~>"
+ - !ruby/object:Gem::Version
+ version: '1.7'
+ type: :runtime
+ prerelease: false
+ version_requirements: !ruby/object:Gem::Requirement
+ requirements:
+ - - "~>"
+ - !ruby/object:Gem::Version
+ version: '1.7'
  - !ruby/object:Gem::Dependency
  name: zeitwerk
  requirement: !ruby/object:Gem::Requirement
@@ -41,6 +54,7 @@ files:
  - Rakefile
  - doc/workflow_steps.md
  - dump_cleaner.gemspec
+ - dump_cleaner.png
  - exe/dump_cleaner
  - lib/dump_cleaner.rb
  - lib/dump_cleaner/cleaners/base_cleaner.rb
@@ -83,7 +97,6 @@ metadata:
  homepage_uri: https://github.com/NejRemeslnici/dump-cleaner
  source_code_uri: https://github.com/NejRemeslnici/dump-cleaner
  changelog_uri: https://github.com/NejRemeslnici/dump-cleaner/blob/main/CHANGELOG.md
- post_install_message:
  rdoc_options: []
  require_paths:
  - lib
@@ -98,8 +111,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
  - !ruby/object:Gem::Version
  version: '0'
  requirements: []
- rubygems_version: 3.5.3
- signing_key:
+ rubygems_version: 3.6.2
  specification_version: 4
  summary: Anonymizes data in logical database dumps.
  test_files: []