dump_cleaner 0.5.0 → 0.6.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: fad879c857b2b0f2bb5c0eb130a91b49fef18dd822b16ab0467ef1af7f9293e9
-   data.tar.gz: 710ce63c414421cb7be41d7775bb8be22bcc5570ff208a1fa0701e2e36627b0c
+   metadata.gz: 2fb7bab589f7b66ce216c08fcf3b92306577713d8ba96bcde0ec10675ebbde4a
+   data.tar.gz: a74f4fbb5f76aba4f507616d405e4075858915f81d2d1a79a5632d3cb5d0b97b
  SHA512:
-   metadata.gz: 8388f418065647c421e97f6c85f5a6c89fca228e3f544117fe6b5f687bf6f6aaaa986ea5c35ce4578c690f335a0f4245317f5374b7b017b6138e0f58c5ee8575
-   data.tar.gz: d8f449349ac08020eee4fffeaef9a33191703eab1d3048417741fa687be1a34126e88ab18678386e6251b054826518fbecf7f6b0b31771dd7193897aa20cddf3
+   metadata.gz: 2fe9e2b71156303bfe718ee1e5c57e529decb13be26433e392a803a309efd87796d52d711fc5afa9c9fcf746a430cdb66f8513af7d33b0d329ad9090ba606d51
+   data.tar.gz: 1fc94b2056a4c1f756cd0795fc60b812b0f3181de209c915d0630093db89e2c38be53902f9385713be468a4528d27db08c9f011f15257bdbe69abbbd54b481f2
data/CHANGELOG.md CHANGED
@@ -1,5 +1,9 @@
  ## [Unreleased]
  
+ ## [0.6.0] - 2025-03-18
+
+ - Fix glob pattern when finding mysql dump files
+
  ## [0.5.0] - 2024-06-13
  
  - Initial public release
data/README.md CHANGED
@@ -1,3 +1,5 @@
+ ![Dump Cleaner](dump_cleaner.png)
+
  # DumpCleaner
  
  DumpCleaner is a tool that can randomize or anonymize your database dumps. Currently, it works with the [MySQL Shell Dump](https://dev.mysql.com/doc/mysql-shell/8.4/en/mysql-shell-utilities-dump-instance-schema.html) format (other formats may be added later).
@@ -32,7 +34,7 @@ That said, having an exact production data copy at developers’ machines is ins
  - It can **ignore certain columns and/or records** in the dump based on a set of conditions to e.g. skip randomizing contact information of internal admin users.
  - It obeys the inherent limits of the given dump format, if any (for example, it takes great care to keep the length and byte size of the updated data the same as original so as not to corrupt the MySQL Shell dump chunk index files).
  
- All in all, DumpCleaner is just a „more specialized and configurable `awk`“, i.e. a text replacement tool.
+ All in all, DumpCleaner is just a _”more specialized and configurable `awk`“_, i.e. a text replacement tool.
  
  #### Non-goals and limitations
  
@@ -44,15 +46,13 @@ All in all, DumpCleaner is just a „more specialized and configurable `awk`“,
  
  ## Installation
  
- TODO: Replace `UPDATE_WITH_YOUR_GEM_NAME_IMMEDIATELY_AFTER_RELEASE_TO_RUBYGEMS_ORG` with your gem name right after releasing it to RubyGems.org. Please do not do it earlier due to security reasons. Alternatively, replace this section with instructions to install your gem from git if you don't plan to release to RubyGems.org.
-
- Install the gem and add to the application's Gemfile by executing:
+ To install the gem, add it to the application's Gemfile by executing:
  
- $ bundle add UPDATE_WITH_YOUR_GEM_NAME_IMMEDIATELY_AFTER_RELEASE_TO_RUBYGEMS_ORG
+ $ bundle add dump_cleaner
  
  If bundler is not being used to manage dependencies, install the gem by executing:
  
- $ gem install UPDATE_WITH_YOUR_GEM_NAME_IMMEDIATELY_AFTER_RELEASE_TO_RUBYGEMS_ORG
+ $ gem install dump_cleaner
  
  ## Usage
  
@@ -78,7 +78,7 @@ MySQLShell JS> util.dumpSchemas(["db"], "mysql_shell_dump");
  The dump contains a `users` table with the following sample contents:
  
  ```sh
- $ zstdcat spec/support/data/mysql_shell_dump/db@users@@0.tsv.zst
+ $ zstdcat mysql_shell_dump/db@users@@0.tsv.zst
  
  # id name email phone_number
  1 Johnson johnson@gmail.com +420774678763
@@ -96,7 +96,7 @@ $ dump_cleaner -f mysql_shell_dump -t mysql_shell_anonymized_dump \
  a destination dump directory gets created with a copy of the source dump but with the data in the `users` table randomized, in this case in the following way:
  
  ```sh
- $ zstdcat spec/support/data/mysql_shell_anonymized_dump/db@users@@0.tsv.zst
+ $ zstdcat mysql_shell_anonymized_dump/db@users@@0.tsv.zst
  
  # id name email phone_number
  1 Jackson variety@gmail.com +420774443735
@@ -279,6 +279,7 @@ If multiple conditions are specified, they are logically OR-ed, i.e. if _any_ of
  
  - The issue with random seeds being dependent on the primary key (and thus artificially increasing data variance): this behavior should probably be optional.
  - The `RandomizeFormattedNumber` step could be generalized to `RandomizeFormattedString`, allowing to replace any matching part of the string with not only numbers, but alphanumeric etc. as well. The `RandomizeEmail` could then be rewritten using this new step.
+ - The ability to work with mysqldump / mysqlpump database dump files would be nice.
  
  ## Development
  
data/dump_cleaner.png ADDED
Binary file
@@ -28,8 +28,7 @@ module DumpCleaner
  
  DumpCleaner::Cleanup::Uniqueness::CaseInsensitiveCache.instance.clear
  
- Dir.glob("#{options.source_dump_path}/#{table_info.db_at_table}@@*.#{table_info.extension}").each do |file|
-   # Open3.pipeline_r(pipe_source_args(file), ["head", "-n", "1000"]) do |tsv_data, _wait_thread|
+ Dir.glob("#{options.source_dump_path}/#{table_info.db_at_table}@*.#{table_info.extension}").each do |file|
    Open3.pipeline_r(pipe_source_args(file)) do |tsv_data, _wait_thread|
      Open3.pipeline_w(pipe_sink_args(destination_file_for(file))) do |zstd_out, _wait_thread|
        tsv_data.each_line do |line|
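
The effect of relaxing the glob from `@@*` to `@*` can be sketched with `File.fnmatch`, which uses the same wildcard rules as `Dir.glob` for a plain `*`. The file names below are hypothetical. Only `db@users@@0.tsv.zst` appears in the README, and the single-`@` chunk names are an assumption about how MySQL Shell names the chunks of a table split across several files.

```ruby
# Hypothetical dump directory listing (assumption: only the final chunk of a
# table carries a double "@@" before its index; earlier chunks use a single "@").
files = %w[
  db@users@0.tsv.zst
  db@users@1.tsv.zst
  db@users@@2.tsv.zst
  db@users.json
  db@users_archive@@0.tsv.zst
]

old_pattern = "db@users@@*.tsv.zst" # pattern shape used in 0.5.0
new_pattern = "db@users@*.tsv.zst"  # pattern shape used in 0.6.0

old_matches = files.select { |name| File.fnmatch(old_pattern, name) }
new_matches = files.select { |name| File.fnmatch(new_pattern, name) }

puts "old: #{old_matches.inspect}"
puts "new: #{new_matches.inspect}"
```

Under these assumptions the old pattern picks up only the `@@` file, while the new one also picks up the single-`@` chunks and still skips the metadata file and the similarly named `users_archive` table.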
@@ -1,5 +1,5 @@
  # frozen_string_literal: true
  
  module DumpCleaner
-   VERSION = "0.5.0"
+   VERSION = "0.6.0"
  end
metadata CHANGED
@@ -1,14 +1,13 @@
  --- !ruby/object:Gem::Specification
  name: dump_cleaner
  version: !ruby/object:Gem::Version
-   version: 0.5.0
+   version: 0.6.0
  platform: ruby
  authors:
  - Matouš Borák
- autorequire:
  bindir: exe
  cert_chain: []
- date: 2024-06-13 00:00:00.000000000 Z
+ date: 2025-03-18 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
    name: zeitwerk
@@ -41,6 +40,7 @@ files:
  - Rakefile
  - doc/workflow_steps.md
  - dump_cleaner.gemspec
+ - dump_cleaner.png
  - exe/dump_cleaner
  - lib/dump_cleaner.rb
  - lib/dump_cleaner/cleaners/base_cleaner.rb
@@ -83,7 +83,6 @@ metadata:
    homepage_uri: https://github.com/NejRemeslnici/dump-cleaner
    source_code_uri: https://github.com/NejRemeslnici/dump-cleaner
    changelog_uri: https://github.com/NejRemeslnici/dump-cleaner/blob/main/CHANGELOG.md
- post_install_message:
  rdoc_options: []
  require_paths:
  - lib
@@ -98,8 +97,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
      - !ruby/object:Gem::Version
        version: '0'
  requirements: []
- rubygems_version: 3.5.3
- signing_key:
+ rubygems_version: 3.6.2
  specification_version: 4
  summary: Anonymizes data in logical database dumps.
  test_files: []