dumpling-cli 0.1.0__py3-none-win_amd64.whl

Binary file

@@ -0,0 +1,207 @@
Metadata-Version: 2.4
Name: dumpling-cli
Version: 0.1.0
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Operating System :: MacOS
Classifier: Operating System :: Microsoft :: Windows
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Rust
Classifier: Topic :: Database
Classifier: Topic :: Security
Classifier: Topic :: Software Development :: Libraries
Classifier: Topic :: Utilities
Summary: Static anonymizer for Postgres plain SQL dumps produced by pg_dump.
Keywords: postgres,sql,anonymization,cli,rust
Requires-Python: >=3.8
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM

## Dumpling

Static anonymizer for Postgres plain SQL dumps produced by `pg_dump`. It scans `INSERT` and `COPY FROM stdin` statements and replaces sensitive row data based on configurable rules.

### Install / Build

```bash
cargo build --release
./target/release/dumpling --help
```

### Python package build (maturin)

This repository now includes Python distribution metadata so Dumpling can be
published as a pip-installable CLI package (distribution name:
`dumpling-cli`).

```bash
# Build a release wheel locally (`maturin sdist` builds the source distribution)
maturin build --release

# Install from local source (requires maturin as PEP 517 backend)
pip install .
```

After installation, the CLI command is still `dumpling`:

```bash
dumpling --help
```

### Project automation

- **Lint:** `.github/workflows/ci.yml` runs `cargo fmt` and `cargo clippy` only (fast signal).
- **Test:** `.github/workflows/tests.yml` runs `cargo test --all-targets --all-features`.
- **Platform compatibility (latest):** `.github/workflows/platform-compat-latest.yml` runs cross-platform build checks on latest runner images.
- **Platform compatibility (matrix):** `.github/workflows/platform-compat-matrix.yml` is a manual, explicit-version matrix for legacy compatibility checks over time.
- **Docs:** `.github/workflows/docs.yml` builds this repo's mdBook docs and deploys them from `main` to GitHub Pages.
- **Publish:** `.github/workflows/publish.yml` builds wheels/sdist via `maturin`, publishes to PyPI from tags, and supports manual TestPyPI publication.
- **Release:** `.github/workflows/release.yml` publishes tagged releases (`v*.*.*`) with checksummed Linux artifacts.

### Docs

```bash
mdbook build
```

Primary docs live under `docs/src/`, including the [release process](docs/src/releasing.md).

### Usage

```bash
dumpling -i dump.sql -o sanitized.sql       # read from file, write to file
dumpling -i dump.sql --in-place             # overwrite the input file (atomic swap)
cat dump.sql | dumpling > sanitized.sql     # stream from stdin to stdout
dumpling -i dump.sql -c .dumplingconf       # use explicit config path
dumpling --check -i dump.sql                # exit 1 if changes would occur, no output
dumpling --stats -i dump.sql -o out.sql     # print summary to stderr
dumpling --report report.json -i dump.sql   # write detailed JSON report of changes/drops
dumpling --include-table '^public\.' -i dump.sql -o out.sql
dumpling --exclude-table '^audit\.' -i dump.sql -o out.sql
dumpling --allow-ext dmp -i data.dmp        # restrict processing to specific extensions
```
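
If you drive Dumpling from Python (for example in a test suite), a thin `subprocess` wrapper over the flags above keeps invocations consistent. This is an illustrative sketch only: `build_argv` and `would_change` are hypothetical helper names, it assumes the `dumpling` executable is on `PATH` (as after `pip install dumpling-cli`), and it assumes `--check` exits 0 when no changes would occur, which the README implies but does not state.

```python
import subprocess
from typing import List, Optional


def build_argv(
    input_path: str,
    output_path: Optional[str] = None,
    config_path: Optional[str] = None,
    check: bool = False,
    report_path: Optional[str] = None,
) -> List[str]:
    """Assemble a dumpling command line from the flags documented in Usage."""
    argv = ["dumpling"]
    if check:
        argv.append("--check")
    if report_path is not None:
        argv += ["--report", report_path]
    argv += ["-i", input_path]
    if config_path is not None:
        argv += ["-c", config_path]
    if output_path is not None:
        argv += ["-o", output_path]
    return argv


def would_change(input_path: str, config_path: Optional[str] = None) -> bool:
    """Run `dumpling --check`; per the README, exit code 1 means changes would occur."""
    result = subprocess.run(
        build_argv(input_path, config_path=config_path, check=True),
        capture_output=True,
    )
    if result.returncode not in (0, 1):
        # Treating other exit codes as errors is an assumption, not documented behavior.
        raise RuntimeError(result.stderr.decode(errors="replace"))
    return result.returncode == 1
```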

Configuration is loaded in this order:

1) `--config <path>` if provided
2) `.dumplingconf` in the current directory
3) `pyproject.toml` `[tool.dumpling]` section

If no configuration is found, Dumpling performs a no-op transformation.

### Configuration (TOML)

Both `.dumplingconf` and `[tool.dumpling]` inside `pyproject.toml` use the same schema. In `pyproject.toml`, everything nests under `tool.dumpling`: put `salt` directly below `[tool.dumpling]` and write rule tables as `[tool.dumpling.rules."public.users"]`.

```toml
# Optional global salt for strategies that support it (e.g. hash)
salt = "mysalt"

# Rules are keyed by either "table" or "schema.table"
[rules."public.users"]
email = { strategy = "email" }
name = { strategy = "name" }
ssn = { strategy = "hash", as_string = true } # SHA-256 of original (salted)
age = { strategy = "int_range", min = 18, max = 90 }

[rules."orders"]
credit_card = { strategy = "redact", as_string = true }
```
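
One plausible way to resolve the two key forms is to try the qualified `schema.table` key first and fall back to the bare table name. That precedence, and the assumption that a bare key applies to a table of that name in any schema, are not documented; `rules_for` is a hypothetical helper, shown here with the example rules loaded as Python dicts.

```python
from typing import Dict, Optional

Rules = Dict[str, Dict[str, dict]]


def rules_for(rules: Rules, schema: Optional[str], table: str) -> Dict[str, dict]:
    """Look up column rules for a table, preferring "schema.table" over "table"."""
    if schema is not None:
        qualified = rules.get(f"{schema}.{table}")
        if qualified is not None:
            return qualified
    # Fall back to the bare table name (assumed to match in any schema).
    return rules.get(table, {})


# The [rules] tables from the example above, as a TOML parser would load them.
RULES: Rules = {
    "public.users": {
        "email": {"strategy": "email"},
        "name": {"strategy": "name"},
        "ssn": {"strategy": "hash", "as_string": True},
        "age": {"strategy": "int_range", "min": 18, "max": 90},
    },
    "orders": {"credit_card": {"strategy": "redact", "as_string": True}},
}
```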

Supported strategies:

- `null`: set field to SQL NULL
- `redact`: replace with `REDACTED` (string)
- `uuid`: random UUIDv4-like string
- `hash`: SHA-256 hex of original value; supports per-column `salt` and global `salt`
- `email`: random-looking email at `example.com`
- `name`, `first_name`, `last_name`: simple placeholder names
- `phone`: simple US-like phone number `(xxx) xxx-xxxx`
- `int_range`: random integer in `[min, max]`
- `string`: random alphanumeric string of `length` characters (default: 12)
- `date_fuzz`: shifts a date by a random number of days in `[min_days, max_days]` (defaults: `-30..30`)
- `time_fuzz`: shifts a time-of-day by a random number of seconds in `[min_seconds, max_seconds]` with 24h wraparound (defaults: `-300..300`)
- `datetime_fuzz`: shifts a timestamp/timestamptz by a random number of seconds in `[min_seconds, max_seconds]` (defaults: `-86400..86400`)
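
Several of these strategies reduce to a few lines each. This Python sketch uses `hashlib` and `random` as stand-ins for Dumpling's Rust internals, so it will not reproduce Dumpling's actual output; in particular, prefixing the salt to the value before hashing is an assumption, and the helper names are hypothetical.

```python
import hashlib
import random
from datetime import date, time, timedelta
from typing import Optional


def hash_value(value: str, salt: Optional[str] = None) -> str:
    """SHA-256 hex digest; prefixing the salt is an assumed way of mixing it in."""
    return hashlib.sha256(((salt or "") + value).encode("utf-8")).hexdigest()


def int_range(rng: random.Random, lo: int, hi: int) -> int:
    """Uniform integer in the closed interval [lo, hi]."""
    return rng.randint(lo, hi)


def date_fuzz(d: date, rng: random.Random, min_days: int = -30, max_days: int = 30) -> date:
    """Shift a date by a whole number of days drawn from [min_days, max_days]."""
    return d + timedelta(days=rng.randint(min_days, max_days))


def time_fuzz(t: time, rng: random.Random, min_seconds: int = -300, max_seconds: int = 300) -> time:
    """Shift a time of day, wrapping around midnight (24h = 86400 s)."""
    secs = t.hour * 3600 + t.minute * 60 + t.second
    shifted = (secs + rng.randint(min_seconds, max_seconds)) % 86400
    return time(shifted // 3600, (shifted % 3600) // 60, shifted % 60, t.microsecond)
```

Python's `%` returns a non-negative result for a positive modulus, which is what makes the midnight wraparound in `time_fuzz` a single expression.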

Options:

- `as_string`: if true, forces the anonymized value to be rendered as a quoted SQL string literal. By default, Dumpling preserves the original quoting where possible.
- `min_days`/`max_days`: used by `date_fuzz`
- `min_seconds`/`max_seconds`: used by `time_fuzz` and `datetime_fuzz`
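
To make the quoting rule concrete, here is a hypothetical renderer for values inside `INSERT ... VALUES (...)`. It assumes standard SQL quoting (single quotes, embedded quotes doubled) and approximates "preserve the original quoting" with a flag recording whether the original token was quoted; Dumpling's real renderer may handle more cases, such as `E'...'` literals. `COPY` rows carry no quotes, so `as_string` presumably matters only for `INSERT` output.

```python
from typing import Optional


def quote_literal(text: str) -> str:
    """Standard SQL string literal: wrap in single quotes, double embedded quotes."""
    return "'" + text.replace("'", "''") + "'"


def render_insert_value(value: Optional[str], original_quoted: bool, as_string: bool = False) -> str:
    """Render an anonymized value as it would appear inside INSERT ... VALUES (...)."""
    if value is None:
        return "NULL"  # e.g. the `null` strategy always yields SQL NULL
    if as_string or original_quoted:
        return quote_literal(value)
    return value  # keep bare tokens (e.g. numbers) unquoted, as in the original
```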

### Input format

This tool targets the plain-text SQL format from `pg_dump`, handling:

- `INSERT INTO schema.table (col1, col2, ...) VALUES (...), (...), ...;`
- `COPY schema.table (col1, col2, ...) FROM stdin; ... \.` (tab-delimited with `\N` as NULL)

Other `pg_dump` output formats (custom, directory, and tar) are not supported.
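
Decoding a `COPY` data line follows PostgreSQL's text format: fields are split on tabs, a field consisting exactly of `\N` is NULL, and backslash escapes such as `\t`, `\n`, and `\\` are unescaped. The sketch below is a simplified Python illustration, not Dumpling's parser; it omits the octal (`\123`) and hex (`\x41`) escapes, and `parse_copy_row` is a hypothetical name.

```python
from typing import List, Optional

# Backslash escapes from PostgreSQL's COPY text format (octal/hex forms omitted here).
_ESCAPES = {"b": "\b", "f": "\f", "n": "\n", "r": "\r", "t": "\t", "v": "\v", "\\": "\\"}


def parse_copy_field(raw: str) -> Optional[str]:
    """Decode one tab-separated COPY field; a bare \\N is SQL NULL."""
    if raw == "\\N":
        return None
    out: List[str] = []
    i = 0
    while i < len(raw):
        ch = raw[i]
        if ch == "\\" and i + 1 < len(raw):
            nxt = raw[i + 1]
            # Any other backslashed character stands for itself.
            out.append(_ESCAPES.get(nxt, nxt))
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def parse_copy_row(line: str) -> List[Optional[str]]:
    """Split one data line (without its newline); callers stop at the \. line first."""
    # Embedded tabs are always escaped as \t, so a raw tab is a field separator.
    return [parse_copy_field(field) for field in line.split("\t")]
```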

### Row filtering (retain/delete)

You can retain or delete rows for specific tables using explicit predicate lists. Semantics:

- If `retain` is non-empty, a row is kept only if it matches at least one of its predicates.
- Regardless of `retain`, a row is dropped if it matches any predicate in `delete`.
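
These semantics amount to an allow-list followed by a deny-list. In this Python sketch, predicates are plain callables standing in for the TOML predicate tables, `keep_row` is a hypothetical name, and `RETAIN`/`DELETE` are simplified hand-written mirrors of the `row_filters` example in this section.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, Optional[str]]
Predicate = Callable[[Row], bool]


def keep_row(row: Row, retain: List[Predicate], delete: List[Predicate]) -> bool:
    """Apply the retain/delete rules to one row."""
    # A non-empty retain list is an allow-list: at least one predicate must match.
    if retain and not any(p(row) for p in retain):
        return False
    # delete always wins: any match drops the row, even if it was retained.
    return not any(p(row) for p in delete)


# Simplified stand-ins for the example's predicate tables.
RETAIN: List[Predicate] = [
    lambda r: r.get("country") == "US",
    lambda r: (r.get("email") or "").lower().endswith("@myco.com"),
]
DELETE: List[Predicate] = [
    lambda r: r.get("is_admin") == "true",
    lambda r: (r.get("email") or "").lower().endswith("@example.com"),
]
```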

Predicates support these operators on a column:

- `eq`, `neq` (string compare; case-insensitive if `case_insensitive = true`)
- `in`, `not_in` (list of values given in `values`, string compare)
- `like`, `ilike` (SQL-like: `%` and `_`)
- `regex`, `iregex` (Rust regex; `iregex` is case-insensitive)
- `lt`, `lte`, `gt`, `gte` (numeric compare; values parsed as numbers)
- `is_null`, `not_null` (no value needed)
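
A Python sketch of these operators follows. Several details are assumptions the README leaves open: a NULL or missing cell fails every operator except `is_null`, numbers are parsed as floats and non-numeric cells never match, `like`/`ilike` must match the whole value, and `regex`/`iregex` match anywhere in the value. Python's `re` module stands in for the Rust `regex` crate; the two share most syntax but are not identical. `matches` and `like_to_regex` are hypothetical names.

```python
import re
from typing import Dict, Optional

Row = Dict[str, Optional[str]]


def like_to_regex(pattern: str) -> str:
    """Translate SQL LIKE wildcards: % -> any run of characters, _ -> exactly one."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return "".join(parts)


def _num(text: str) -> Optional[float]:
    try:
        return float(text)
    except ValueError:
        return None


def matches(pred: dict, row: Row) -> bool:
    """Evaluate one predicate, e.g. {"column": "country", "op": "eq", "value": "US"}."""
    op = pred["op"]
    cell = row.get(pred["column"])  # a missing column is treated as NULL
    if op == "is_null":
        return cell is None
    if op == "not_null":
        return cell is not None
    if cell is None:
        return False  # assumption: NULL never satisfies a value comparison
    if op in ("eq", "neq"):
        fold = pred.get("case_insensitive", False)
        a, b = (cell.lower(), pred["value"].lower()) if fold else (cell, pred["value"])
        return (a == b) if op == "eq" else (a != b)
    if op in ("in", "not_in"):
        found = cell in pred["values"]
        return found if op == "in" else not found
    if op in ("like", "ilike"):
        flags = re.IGNORECASE if op == "ilike" else 0
        return re.fullmatch(like_to_regex(pred["value"]), cell, flags | re.DOTALL) is not None
    if op in ("regex", "iregex"):
        flags = re.IGNORECASE if op == "iregex" else 0
        return re.search(pred["value"], cell, flags) is not None  # unanchored search
    if op in ("lt", "lte", "gt", "gte"):
        x, y = _num(cell), _num(str(pred["value"]))
        if x is None or y is None:
            return False  # assumption: non-numeric values never match
        return {"lt": x < y, "lte": x <= y, "gt": x > y, "gte": x >= y}[op]
    raise ValueError(f"unknown op: {op}")
```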

Example:

```toml
[row_filters."public.users"]
retain = [
  { column = "country", op = "eq", value = "US" },
  { column = "email", op = "ilike", value = "%@myco.com" }
]
delete = [
  { column = "is_admin", op = "eq", value = "true" },
  { column = "email", op = "ilike", value = "%@example.com" }
]
```

Row filtering works for both `INSERT ... VALUES (...)` and `COPY ... FROM stdin` rows.

### Conditional per-column cases (first-match-wins)

Define default strategies in `rules."<table>"` and add ordered per-column cases in `column_cases."<table>"."<column>"`. For each row and each column, Dumpling applies the first matching case; if none match, it uses the default from `rules`.

Example:

```toml
[rules."public.users"]
email = { strategy = "hash", as_string = true } # default
name = { strategy = "name" }

[[column_cases."public.users".email]]
when.any = [{ column = "is_admin", op = "eq", value = "true" }]
strategy = { strategy = "redact", as_string = true }

[[column_cases."public.users".email]]
when.any = [{ column = "country", op = "in", values = ["DE","FR","GB"] }]
strategy = { strategy = "hash", salt = "eu-salt", as_string = true }
```

Notes:

- `when.any` is OR, `when.all` is AND; you can use either or both. If both are empty, the case matches unconditionally.
- First-match-wins per column; there is no merge/replace or fallthrough flag.
- Row filtering (`row_filters`) is evaluated before cases; deleted rows are not transformed.
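
First-match-wins selection can be sketched as a loop that returns on the first case whose conditions hold. Here `when.any`/`when.all` are modeled as lists of Python callables under hypothetical `any`/`all` keys, requiring both lists to pass when both are given is an assumption, and `EMAIL_CASES` mirrors the `email` cases in the example above.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, Optional[str]]
Predicate = Callable[[Row], bool]


def case_matches(row: Row, any_of: List[Predicate], all_of: List[Predicate]) -> bool:
    """when.any is OR, when.all is AND; an empty case matches unconditionally."""
    if not any_of and not all_of:
        return True
    # Assumption: when both lists are present, both conditions must hold.
    any_ok = not any_of or any(p(row) for p in any_of)
    all_ok = all(p(row) for p in all_of)
    return any_ok and all_ok


def pick_strategy(row: Row, cases: List[dict], default: Optional[dict]) -> Optional[dict]:
    """Return the strategy of the first matching case, else the rules default."""
    for case in cases:
        if case_matches(row, case.get("any", []), case.get("all", [])):
            return case["strategy"]  # first match wins; later cases are not consulted
    return default


EMAIL_CASES = [
    {"any": [lambda r: r.get("is_admin") == "true"],
     "strategy": {"strategy": "redact", "as_string": True}},
    {"any": [lambda r: r.get("country") in ("DE", "FR", "GB")],
     "strategy": {"strategy": "hash", "salt": "eu-salt", "as_string": True}},
]
EMAIL_DEFAULT = {"strategy": "hash", "as_string": True}
```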

### Notes

- This is a streaming transformer; memory usage stays small even for big dumps.
- For best results, configure strategies compatible with column data types. If you hash an integer column, Dumpling renders a string. Postgres coerces many quoted literals (such as `'42'`) into integer columns, but a SHA-256 hex digest is not a valid integer, so a numeric strategy such as `int_range` is safer there; an explicit `as_string = false` may help in some cases.
- If you switch runtimes/branches frequently and see test DB migration issues in your project, remember you can run tests with `pytest --create-db` (project convention).
- Deterministic anonymization for tests: pass `--seed <u64>` or set env `DUMPLING_SEED` to make fuzz strategies reproducible across runs.
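
Reproducibility comes from seeding the random generator once per run. The sketch below uses Python's `random.Random`, so a given seed will not yield the same values as Dumpling; letting `--seed` take precedence over `DUMPLING_SEED` is an assumption, and `resolve_seed`/`fuzz_dates` are hypothetical names.

```python
import os
import random
from datetime import date, timedelta
from typing import List, Mapping, Optional


def resolve_seed(cli_seed: Optional[int], env: Mapping[str, str] = os.environ) -> Optional[int]:
    """Prefer --seed, then DUMPLING_SEED; None means a fresh random run."""
    if cli_seed is not None:
        return cli_seed
    raw = env.get("DUMPLING_SEED")
    return int(raw) if raw else None


def fuzz_dates(dates: List[date], seed: Optional[int],
               min_days: int = -30, max_days: int = 30) -> List[date]:
    """Shift dates with one generator per run; the same seed repeats the same shifts."""
    rng = random.Random(seed)  # Random(None) seeds from the OS, so unseeded runs differ
    return [d + timedelta(days=rng.randint(min_days, max_days)) for d in dates]
```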

@@ -0,0 +1,5 @@
dumpling_cli-0.1.0.data/scripts/dumpling.exe,sha256=6vvLUVqdiCpBYYxN_vbl-LWnfEMzpgILCuZAqTphCGg,3210752
dumpling_cli-0.1.0.dist-info/METADATA,sha256=29V0898AhGpcxzdVumYBAE9cfmJwW79qnR-GZCQLPxI,8668
dumpling_cli-0.1.0.dist-info/WHEEL,sha256=uJOc2U-Q1x95AlblQcqMRb3iR4QnPtdI7X2ycPN99rM,94
dumpling_cli-0.1.0.dist-info/sboms/dumpling.cyclonedx.json,sha256=0Blmq2tyPTgp0vNrUIFsg906AnCWSbtgTL9cq1dFUGI,90520
dumpling_cli-0.1.0.dist-info/RECORD,,