PyPI - dumpling-cli - Versions diffs - 0.6.0__tar.gz → 0.7.0a0__tar.gz - Mend

dumpling-cli 0.6.0tar.gz → 0.7.0a0tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (45) hide show

{dumpling_cli-0.6.0 → dumpling_cli-0.7.0a0}/.dumplingconf.example RENAMED Viewed

@@ -59,8 +59,10 @@ last_login    = { strategy = "datetime_fuzz" }
 wake_time     = { strategy = "time_fuzz", min_seconds = -3600, max_seconds = 3600 }
 [rules."public.orders"]
-# credit card — redact entirely; force as quoted string
-credit_card   = { strategy = "redact", as_string = true }
+# credit card — Luhn-valid synthetic PAN (length 13–19); use domain for stable FKs across dumps
+credit_card   = { strategy = "payment_card", length = 16, domain = "order_pan" }
+# monetary / numeric — random decimal in range with fixed fractional digits
+order_total   = { strategy = "decimal", min = 0, max = 99999, scale = 2, domain = "order_amount" }
 # keep the same anonymized email as users table via shared domain
 customer_email = { strategy = "faker", faker = "internet::SafeEmail", domain = "customer_identity" }

{dumpling_cli-0.6.0 → dumpling_cli-0.7.0a0}/.github/workflows/ci.yml RENAMED Viewed

@@ -23,7 +23,7 @@ jobs:
           components: rustfmt, clippy
       - name: Cache Cargo build artifacts
-        uses: Swatinem/rust-cache@v2
+        uses: Swatinem/rust-cache@v2.9.1
       - name: Check formatting
         run: cargo fmt --all -- --check

{dumpling_cli-0.6.0 → dumpling_cli-0.7.0a0}/.github/workflows/platform-compat-latest.yml RENAMED Viewed

@@ -28,7 +28,7 @@ jobs:
         uses: dtolnay/rust-toolchain@stable
       - name: Cache Cargo build artifacts
-        uses: Swatinem/rust-cache@v2
+        uses: Swatinem/rust-cache@v2.9.1
       - name: Build release binary
         run: cargo build --release --locked

{dumpling_cli-0.6.0 → dumpling_cli-0.7.0a0}/.github/workflows/platform-compat-matrix.yml RENAMED Viewed

@@ -26,7 +26,7 @@ jobs:
         uses: dtolnay/rust-toolchain@stable
       - name: Cache Cargo build artifacts
-        uses: Swatinem/rust-cache@v2
+        uses: Swatinem/rust-cache@v2.9.1
       - name: Build release binary
         run: cargo build --release --locked

{dumpling_cli-0.6.0 → dumpling_cli-0.7.0a0}/.github/workflows/policy-lint.yml RENAMED Viewed

@@ -38,7 +38,7 @@ jobs:
         uses: dtolnay/rust-toolchain@stable
       - name: Cache Cargo build artifacts
-        uses: Swatinem/rust-cache@v2
+        uses: Swatinem/rust-cache@v2.9.1
       - name: Build dumpling
         run: cargo build --release --locked

{dumpling_cli-0.6.0 → dumpling_cli-0.7.0a0}/.github/workflows/publish.yml RENAMED Viewed

@@ -38,7 +38,7 @@ jobs:
         uses: dtolnay/rust-toolchain@stable
       - name: Cache Cargo build artifacts
-        uses: Swatinem/rust-cache@v2
+        uses: Swatinem/rust-cache@v2.9.1
       - name: Set up Python
         uses: actions/setup-python@v6

{dumpling_cli-0.6.0 → dumpling_cli-0.7.0a0}/.github/workflows/release.yml RENAMED Viewed

@@ -21,7 +21,7 @@ jobs:
           components: rustfmt, clippy
       - name: Cache Cargo build artifacts
-        uses: Swatinem/rust-cache@v2
+        uses: Swatinem/rust-cache@v2.9.1
       - name: Validate formatting
         run: cargo fmt --all -- --check

{dumpling_cli-0.6.0 → dumpling_cli-0.7.0a0}/.github/workflows/tests.yml RENAMED Viewed

@@ -21,7 +21,7 @@ jobs:
         uses: dtolnay/rust-toolchain@stable
       - name: Cache Cargo build artifacts
-        uses: Swatinem/rust-cache@v2
+        uses: Swatinem/rust-cache@v2.9.1
       - name: Run cargo tests
         run: cargo test --all-targets --all-features

{dumpling_cli-0.6.0 → dumpling_cli-0.7.0a0}/AGENTS.md RENAMED Viewed

@@ -221,7 +221,7 @@ Follow these steps in order. Do not skip any step.
 7. **Tests**: Add `#[test]` functions in `src/transform.rs` (unit-test strategy output values) and in `src/sql.rs` (end-to-end pipeline test). Use `set_random_seed(N)` for reproducibility.
-8. **`README.md`**: Add a row to the "Anonymization strategies" table.
+8. **`README.md`**: Document the strategy under *Configuration → Anonymization strategies* (per-strategy subsection with accepted options), and mention any new spec fields in `AnonymizerSpec`’s doc comment in `settings.rs`.
 **`faker` strategy:** Config only carries string identifiers; Dumpling never evaluates user Rust from config. To ship a new generator, add dispatch in `src/faker_dispatch.rs` and validation in `validate_anonymizer_spec` for the `faker` branch. Upstream reference: [`fake` on docs.rs](https://docs.rs/fake/latest/fake/), [`fake::faker` module index](https://docs.rs/fake/latest/fake/faker/index.html), [source on GitHub](https://github.com/cksac/fake-rs).

{dumpling_cli-0.6.0 → dumpling_cli-0.7.0a0}/CHANGELOG.md RENAMED Viewed

@@ -7,11 +7,23 @@ and this project follows [Semantic Versioning](https://semver.org/spec/v2.0.0.ht
 ## [Unreleased]
+## [0.7.0-alpha] - 2026-05-04
+Pre-release toward **0.7.0** (stable **0.7.0** is not published yet; crates use the **0.7.0-alpha** prerelease identifier until then).
+### Removed
+- **`--include-table` / `--exclude-table`**: these CLI flags and the associated per-table skip logic in the SQL stream processor are removed. Anonymize the full dump, or split/filter dumps outside Dumpling if you need a smaller input.
+### Changed
+- **Dump seal (`v=3`):** the fingerprint JSON no longer includes table-filter fields. Seals produced by Dumpling **0.6.x** (`v=2`) will not match **0.7.x**; the first line is treated as stale and the dump is re-processed.
 ## [0.6.0] - 2026-05-03
 ### Added
-- **Dump seal** (leading `-- dumpling-seal:` SQL comment): records Dumpling version, security profile, a SHA-256 fingerprint of the resolved policy, and runtime CLI options that affect transforms (`--format`, sorted `--include-table` / `--exclude-table`, effective PRNG seed in standard profile). When the input already begins with a **matching** seal, the remainder is copied through unchanged; stale or unknown seal lines are stripped and the dump is re-processed. See README for full semantics ([#58](https://github.com/ababic/dumpling/pull/58)).
+- **Dump seal** (leading `-- dumpling-seal:` SQL comment): records Dumpling version, security profile, a SHA-256 fingerprint of the resolved policy, and runtime CLI options that affect transforms (`--format` and the effective PRNG seed in standard profile; `null` in hardened, where seeds are ignored). When the input already begins with a **matching** seal, the remainder is copied through unchanged; stale or unknown seal lines are stripped and the dump is re-processed. See README for full semantics ([#58](https://github.com/ababic/dumpling/pull/58)).
 - **`--stats`**: prints `wall_ms` plus `domain_cache_hits` and `domain_cache_misses` for quick profiling of large runs ([#59](https://github.com/ababic/dumpling/pull/59)).
 - **`CONTRIBUTORS.md`** ([#59](https://github.com/ababic/dumpling/pull/59)).

{dumpling_cli-0.6.0 → dumpling_cli-0.7.0a0}/Cargo.lock RENAMED Viewed

@@ -262,7 +262,7 @@ dependencies = [
 [[package]]
 name = "dumpling"
-version = "0.6.0"
+version = "0.7.0-alpha"
 dependencies = [
  "anyhow",
  "chrono",

{dumpling_cli-0.6.0 → dumpling_cli-0.7.0a0}/Cargo.toml RENAMED Viewed

@@ -1,6 +1,6 @@
 [package]
 name = "dumpling"
-version = "0.6.0"
+version = "0.7.0-alpha"
 edition = "2021"
 readme = "README.md"

{dumpling_cli-0.6.0 → dumpling_cli-0.7.0a0}/PKG-INFO RENAMED Viewed

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: dumpling-cli
-Version: 0.6.0
+Version: 0.7.0a0
 Classifier: Development Status :: 4 - Beta
 Classifier: Environment :: Console
 Classifier: Intended Audience :: Developers
@@ -26,7 +26,7 @@ Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
 <h1 align="center">Dumpling</h1>
 <p align="center">
-  <strong>Sanitize SQL dumps before they go anywhere.</strong><br />
+  <strong>Sanitize database dumps before they go anywhere.</strong><br />
   Turn huge <code>pg_dump</code> / SQLite / SQL Server exports into shareable, test-friendly snapshots — no DB connection, no secrets left by accident.
 </p>
@@ -55,10 +55,12 @@ Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
 ## Why Dumpling?
+- **Rich built-in strategies** — from fast clears (`null`, `redact`, `blank`, `empty_array` / `empty_object`) and bounded fakes (`int_range`, `decimal`, `string`) to realistic stand-ins (`email`, `name`, `payment_card`, `faker`, date/time fuzz), with optional **`domain`** so the same source value stays consistent across tables.
+- **JSON inside columns** — target paths inside `json` / `jsonb` text with the same dot or `__` syntax you use elsewhere; pair with row filters on nested fields.
+- **Row-level control** — **`retain`** and **`delete`** predicates (including nested JSON paths) drop or keep whole rows before transforms run.
 - **Offline by design** — works on dump files only; nothing connects to your database.
 - **Streams giant files** — line-by-line processing keeps multi‑GB dumps reasonable on modest hardware.
 - **Fails loud, not silent** — missing config exits non‑zero and lists where Dumpling looked; use `--allow-noop` only when you mean it.
-- **Stable pseudonyms** — optional domain mappings keep the same source value as the same fake value across tables (foreign keys stay consistent).
 - **Pipeline-ready** — `--check`, strict coverage, JSON reports, and residual PII scans fit pre-merge gates and release automation.
 - **Configure once** — `.dumplingconf` or `[tool.dumpling]` in `pyproject.toml`; install via **Rust** (`cargo`) or **`pip install dumpling-cli`**.
@@ -95,6 +97,21 @@ dumpling --help
 ---
+## Getting started
+Follow these steps once; you will have a working path from “raw dump” to “first sanitized output,” then you can deepen coverage using the rest of this README and the [documentation site](https://ababic.github.io/dumpling/).
+1. **Start from the example policy** — Copy [`.dumplingconf.example`](.dumplingconf.example) to `.dumplingconf` in your project root (or merge the same keys under `[tool.dumpling]` in `pyproject.toml`). Set environment variables for `salt` and any `${…}` references so Dumpling can resolve secrets at startup.
+2. **Name your tables and columns** — Open your dump next to the config. `CREATE TABLE`, `COPY … (…)` and `INSERT INTO … (…)` lines list the identifiers you need for `[rules."table"]` or `[rules."schema.table"]` (see [Configuration (TOML)](#configuration-toml) below). Trim the example rules down to the tables you care about first, then add columns and strategies as you go.
+3. **Run Dumpling** — `dumpling -i dump.sql -o sanitized.sql` (add `-c path` if the config is not in the default search path). Use `dumpling --check -i dump.sql` when you only want to know whether anything would change.
+4. **Tighten the policy** — Run `dumpling lint-policy` on your config. When you are ready for stricter gates, add `[sensitive_columns]` and use `--strict-coverage` / `--report` / `--scan-output` as described under [Usage](#usage).
+**Draft policy generation (planned)** — A future command will stream a dump and emit a **draft** starter TOML so you spend less time hunting table and column names and basic DDL hints (for example `varchar(N)` lengths). Output will be explicitly **draft**: always review and edit before production or compliance workflows; it is a time-saver, not a full policy.
+The same flow is spelled out in the docs: [Getting started](https://ababic.github.io/dumpling/getting-started.html).
+---
 ## Usage
 ```bash
@@ -108,13 +125,10 @@ dumpling --report report.json -i dump.sql       # write detailed JSON report of
 dumpling --strict-coverage --report report.json -i dump.sql --check  # fail on uncovered sensitive columns
 dumpling --scan-output --report report.json -i dump.sql               # scan transformed output for residual PII-like patterns
 dumpling --scan-output --fail-on-findings --report report.json -i dump.sql --check  # fail if scan thresholds are exceeded
-dumpling --include-table '^public\\.' -i dump.sql -o out.sql
-dumpling --exclude-table '^audit\\.' -i dump.sql -o out.sql
 dumpling --allow-ext dmp -i data.dmp            # restrict processing to specific extensions
 dumpling --allow-noop -i dump.sql -o out.sql    # explicitly allow no-op when config is missing
 dumpling --format sqlite -i data.db.sql -o out.sql  # process a SQLite .dump file
 dumpling --format mssql  -i backup.sql -o out.sql   # process a SQL Server plain-SQL dump
-dumpling --security-profile hardened -i dump.sql -o sanitized.sql  # hardened CSPRNG + HMAC mode
 dumpling lint-policy                          # lint the anonymization policy config
 dumpling lint-policy --config .dumplingconf   # lint with explicit config path
 ```
@@ -129,15 +143,194 @@ If no configuration is found, Dumpling fails closed by default and exits non-zer
 The error output lists every checked location. Use `--allow-noop` to explicitly
 permit no-op behavior.
-### Dump seal (always on)
+The **dump seal** comment prefixed to successful output and **`--security-profile hardened`** are documented in the [configuration guide](https://ababic.github.io/dumpling/configuration.html) (see *Dump seal* and *Hardened security profile*).
+---
+## Anonymization strategies
+Column rules live under `[rules."schema.table"]` (or `[rules."table"]`) as inline tables: `{ strategy = "<name>", ... }`. **Strategy-specific keys** are documented next to the strategy that accepts them. A few keys apply across many strategies; see [Cross-cutting options](#cross-cutting-options) below.
+#### Choosing a strategy (cheaper vs more realistic)
+Prefer **lightweight** strategies when nothing downstream requires lifelike values: **`null`**, **`redact`**, **`blank`**, **`empty_array`**, **`empty_object`**, **`string`**, **`int_range`**, and **`decimal`** are cheap to generate (simple constants, random digits/alnum, or bounded numeric shapes). Use **`blank`** for NOT NULL text where you must clear content without SQL NULL; use **`empty_array`** / **`empty_object`** on JSON path rules (or text columns holding JSON) when the document must keep `[]` / `{}` instead of `null` or `""`.
+Reach for **richer** strategies when realism matters for restores, demos, or tests that exercise parsers and validators: **`email`**, **`name`**, **`first_name`**, **`last_name`**, **`phone`**, **`faker`**, **`uuid`**, **`hash`**, **`payment_card`**, and the **`date_fuzz` / `time_fuzz` / `datetime_fuzz`** family do more work (formatting, parsing, digest, or upstream generators). If a cheap strategy would break **CHECK constraints**, **NOT NULL**, **foreign-key shape**, or **import tooling** that validates formats, switch to a strategy that emits compatible values—or keep **`domain`** on the heavier strategy so referential consistency is preserved where you need it.
+#### `null`
+- **Behavior:** emit SQL `NULL` for the cell.
+- **Options:** none. (`domain` is rejected.)
+#### `redact`
+- **Behavior:** replace with the literal `REDACTED`.
+- **`as_string`:** if `true`, the replacement is always a single-quoted SQL string; if `false`, it is emitted without quotes (still valid as an identifier-like token in many dumps). When the **original** cell was already a quoted string, Dumpling quotes the output even when `as_string` is omitted—see [Cross-cutting options](#cross-cutting-options).
+#### `blank`
+- **Behavior:** replace with an **empty string** (`''` in SQL when quoted). If the source cell is SQL **`NULL`**, the cell stays **`NULL`** (same as `null` / `redact` semantics for missing values).
+- **Options:** none. (`domain` is rejected.) **`as_string`** is ignored; output is always the empty string literal when non-NULL.
+#### `empty_array` / `empty_object`
+- **Behavior:** replace with the JSON tokens **`[]`** and **`{}`** as **unquoted** SQL/COPY tokens (so they parse as JSON when the column holds JSON). If the source cell is SQL **`NULL`**, the cell stays **`NULL`**.
+- **JSON path rules:** use these on leaves that are JSON **arrays** or **objects** when you need a typed empty container instead of `null` or `""`.
+- **Options:** none. (`domain` is rejected.)
+#### `uuid`
+- **Behavior:** random UUIDv4-like hyphenated hex string.
+- **`as_string`:** same meaning as for `redact` / `hash` (force quoted literal vs. unquoted token).
+#### `hash`
+- **Behavior:** salted digest of the original cell value (SHA-256 by default; see [configuration guide — Hardened security profile](https://ababic.github.io/dumpling/configuration.html#hardened-security-profile) for HMAC mode).
+- **`salt`:** optional per-column salt; otherwise the top-level `salt` or registry default applies.
+- **`as_string`:** if `true`, force a quoted string literal; if `false`, unquoted hex. Quoted **source** cells are still written quoted when `as_string` is omitted.
+#### `email`, `name`, `first_name`, `last_name`, `phone`
+- **Behavior:** locale-aware fake values (same underlying generators as the matching `faker` targets).
+- **`locale`:** optional; one of `en`, `fr_fr`, `de_de`, `it_it`, `pt_br`, `pt_pt`, `ar_sa`, `zh_cn`, `zh_tw`, `ja_jp`, `cy_gb` (default `en`).
+- **Output:** always emitted as a quoted string replacement.
+#### `int_range`
+- **Behavior:** random integer in the inclusive range `[min, max]` (defaults `min = 0`, `max = 1_000_000`).
+- **`min` / `max`:** inclusive bounds; `min` must be ≤ `max`.
+- **Output:** always unquoted digits (suitable for integer / JSON number columns).
-Every successful run that writes output prefixes the stream with a single-line SQL comment:
+#### `decimal`
-`-- dumpling-seal: v=2 version=<semver> profile=<standard|hardened> sha256=<64 hex chars>`
+- **Behavior:** random decimal with integer part in `[min, max]` and fractional part of **`scale`** digits (defaults `min = 0`, `max = 1_000_000`, `scale = 2`). Use `scale = 0` for a plain integer string in the same range.
+- **`min` / `max`:** inclusive integer-part bounds.
+- **`scale`:** number of digits after `.` (0–38).
+- **`as_string`:** same as `hash` / `redact` for quoting the full literal.
-The `sha256` is over canonical JSON that includes the Dumpling version, the active security profile, a stable encoding of the resolved policy (rules, row filters, column cases, sensitive columns, output scan, global salt), and **runtime options** that affect transforms: `--format`, sorted `--include-table` / `--exclude-table` patterns, and the effective `--seed` / `DUMPLING_SEED` value in standard profile (`null` in hardened, where seeds are ignored).
+#### `payment_card`
+- **Behavior:** random digit string of length **`length`** (default **16**) with a **valid Luhn check digit**, so `--scan-output` PAN detection treats synthetic values like test cards, not arbitrary digit runs.
+- **`length`:** total digit count including check digit; must be 13–19 (PAN lengths).
+- **Output:** always a quoted string of digits (no separators).
+#### `string`
+- **Behavior:** random alphanumeric string.
+- **`length`:** character count (default 12); must be ≥ 1 when set.
+#### `faker`
+- **Behavior:** values from the Rust [`fake`](https://crates.io/crates/fake) crate ([`faker` modules](https://docs.rs/fake/latest/fake/faker/index.html)), selected only by the string **`faker = "module::Type"`** (e.g. `internet::SafeEmail`). Config is **data only**—nothing from TOML is compiled as Rust. Unsupported pairs fail at config load; new generators require a **new Dumpling release** (or a fork), not config-side code.
+- **`faker`:** required; maps to a built-in allowlist in `src/faker_dispatch.rs`.
+- **`locale`:** optional; same set as the built-in PII strategies when the upstream generator is locale-aware.
+- **`min` / `max` / `length` / `format`:** only for/faker combinations that upstream supports (e.g. `number::NumberWithFormat` uses **`format`**: `#` = any digit, `^` = 1–9 per [`fake` docs](https://docs.rs/fake/latest/fake/)).
+**Upstream reference:** [docs.rs — `fake`](https://docs.rs/fake/latest/fake/), [docs.rs — `fake::faker`](https://docs.rs/fake/latest/fake/faker/index.html), [GitHub — cksac/fake-rs](https://github.com/cksac/fake-rs).
+#### `date_fuzz`, `time_fuzz`, `datetime_fuzz`
+- **Behavior:** parse the existing value when possible and shift by a random offset; on parse failure the original string is kept.
+- **`date_fuzz`:** **`min_days` / `max_days`** (defaults `-30` … `30`).
+- **`time_fuzz` / `datetime_fuzz`:** **`min_seconds` / `max_seconds`** (`time_fuzz` defaults `-300` … `300`; `datetime_fuzz` defaults `-86400` … `86400`).
+- **`as_string`:** force quoted literal vs. unquoted token for the emitted date/time/timestamp string.
+### Cross-cutting options
+These keys are valid on **multiple** strategies (unless validation says otherwise):
+- **`domain`:** deterministic mapping bucket. The same non-NULL source value maps to the same pseudonym for that strategy inside the domain (across tables/columns). **SQL `NULL` is always preserved**—no fabricated FK targets.
+- **`unique_within_domain`:** when `true` (requires `domain`), different source values are assigned distinct pseudonyms within the domain.
+- **`as_string`:** when `true`, force the replacement to render as a **single-quoted SQL string literal**. When `false` or omitted, Dumpling still quotes the output if the **original** cell was quoted (`render_cell` uses `force_quoted || original.was_quoted`). Set `as_string = true` when the source may be unquoted (numeric-looking literals, some `COPY` shapes) but you need a string literal in the dump.
+---
+## Conditional per-column cases
+Define default strategies in `rules."<table>"` and add ordered per-column cases in `column_cases."<table>"."<column>"`. For each row and column, Dumpling applies the first matching case; if none match, it falls back to the default from `rules`.
+```toml
+[rules."public.users"]
+email = { strategy = "hash", as_string = true }   # default
+name  = { strategy = "name" }
+[[column_cases."public.users".email]]
+when.any = [{ column = "is_admin", op = "eq", value = "true" }]
+strategy = { strategy = "redact", as_string = true }
+[[column_cases."public.users".email]]
+when.any = [{ column = "country", op = "in", values = ["DE","FR","GB"] }]
+strategy = { strategy = "hash", salt = "eu-salt", as_string = true }
+```
+- `when.any` is OR, `when.all` is AND; you can use either or both. If both are empty, the case matches unconditionally.
+- First-match-wins per column; there is no merge or fallthrough.
+- Row filtering (`row_filters`) is evaluated before cases; deleted rows are not transformed.
+---
+## JSON path rules inside columns
+When a column stores JSON as text (`json` / `jsonb` dumped as a string), you can target **fields inside the document** with the same path syntax as row filters — but as **keys under `[rules."<table>"]`**. Use **quoted** TOML keys when the path contains dots.
+- Dot notation: `"payload.profile.email" = { strategy = "email", domain = "orders_email", as_string = true }`
+- Django-style: `"payload__profile__email" = { strategy = "hash", salt = "${env:ORDER_SECRET_SALT}", as_string = true }`
+The segment before the first `.` or `__` is the **SQL column name**; the rest is the path inside the parsed JSON. You can use **either** path-level rules for a column **or** one whole-column rule for that column’s base name, not both (Dumpling rejects the conflict at startup). If a path is missing in a row, that rule is skipped for that row. When only path rules apply, the rest of the JSON is left unchanged. Path rules run in **longest-path-first** order. `column_cases` still match the SQL column name only; use `when` predicates with nested `column` paths to branch on JSON content.
+---
+## Row filtering
+You can retain or delete rows for specific tables using explicit predicate lists.
+- If `retain` is non-empty, a row is kept only if it matches at least one predicate.
+- Regardless of `retain`, a row is dropped if it matches any predicate in `delete`.
+Supported predicate operators:
+| Operator | Description |
+|---|---|
+| `eq` / `neq` | String compare (case-insensitive if `case_insensitive = true`) |
+| `in` / `not_in` | List of values (string compare) |
+| `like` / `ilike` | SQL-like patterns (`%` and `_`) |
+| `regex` / `iregex` | Rust regex (`iregex` is case-insensitive) |
+| `lt` / `lte` / `gt` / `gte` | Numeric compare (values parsed as numbers) |
+| `is_null` / `not_null` | No value needed |
+Predicates can target nested JSON values using dot notation (`payload.profile.tier`) or Django-style notation (`payload__profile__tier`). For JSON arrays, path segments are evaluated against each element, so list-of-dicts structures can be matched naturally.
+### JSON path list targeting
+JSON list/array traversal is automatic once a path segment resolves to an array.
+- **All elements in an array**: use the next field name directly.
+  - `payload.items.kind` or `payload__items__kind`
+  - Matches/rewrites `kind` for every object in `items`.
+- **Specific array index**: use a numeric segment.
+  - `payload.items.0.kind` or `payload__items__0__kind`
+  - Targets only the first element.
+- **Nested arrays**: combine field and index segments as needed.
+  - `payload.groups.members.email`
+  - `payload.groups.1.members.0.email`
+This path behavior is shared by both `row_filters` predicates and JSON-path anonymization rules in `[rules]`.
+```toml
+[row_filters."public.users"]
+retain = [
+  { column = "country", op = "eq",  value = "US" },
+  { column = "email",   op = "ilike", value = "%@myco.com" },
+  { column = "profile.flags.plan", op = "eq", value = "gold" }
+]
+delete = [
+  { column = "is_admin", op = "eq", value = "true" },
+  { column = "email",    op = "ilike", value = "%@example.com" },
+  { column = "devices__platform", op = "eq", value = "android" }
+]
+```
-If the **input** already begins with a seal line and it **matches** the current run, Dumpling copies the rest of the file through unchanged. If the line looks like a seal but does **not** match (stale policy, different flags, or older `v=`), that line is **dropped** and the dump is re-processed so you do not end up with two seal lines. `--strict-coverage` cannot be combined with a matching seal (table definitions are not scanned in passthrough mode). `--check` writes no output and therefore emits no seal line.
+Row filtering works for both `INSERT ... VALUES (...)` and `COPY ... FROM stdin` rows.
 ---
@@ -158,7 +351,8 @@ ssn   = { strategy = "hash", salt = "${env:DUMPLING_USERS_SSN_SALT}", as_string
 age   = { strategy = "int_range", min = 18, max = 90 }
 [rules."orders"]
-credit_card = { strategy = "redact", as_string = true }
+credit_card = { strategy = "payment_card", length = 16, domain = "order_pan" }
+amount = { strategy = "decimal", min = 0, max = 9999, scale = 2, domain = "order_amount" }
 # Optional explicit sensitive columns policy list (for strict coverage)
 [sensitive_columns]
@@ -185,32 +379,6 @@ pan = "critical"
 token = "high"
 ```
-### Anonymization strategies
-| Strategy | Description |
-|---|---|
-| `null` | Set field to SQL `NULL` |
-| `redact` | Replace with `REDACTED` (string) |
-| `uuid` | Random UUIDv4-like string |
-| `hash` | SHA-256 hex of original value; supports per-column `salt` and global `salt` |
-| `email` | Safe email address (same generator as `faker = "internet::SafeEmail"`); supports `locale` |
-| `name` | Full name (same as `faker = "name::Name"`); supports `locale` |
-| `first_name` | First name (same as `faker = "name::FirstName"`); supports `locale` |
-| `last_name` | Last name (same as `faker = "name::LastName"`); supports `locale` |
-| `phone` | Locale-aware fake phone number (configurable via `locale`); defaults to English format |
-| `faker` | Values from the Rust [`fake`](https://crates.io/crates/fake) crate ([docs.rs](https://docs.rs/fake/latest/fake/), [`faker` modules](https://docs.rs/fake/latest/fake/faker/index.html)), chosen by a **string identifier** only (`faker = "module::Type"`, e.g. `internet::SafeEmail`). Config is **data only**: nothing from TOML is compiled or executed as Rust at runtime. Use `locale` for locale-aware generators; optional `min`/`max`, `length`, `format` as documented. Unsupported targets fail at config load. New generators require a **new Dumpling release** (or your own fork), not config-side code. |
-| `int_range` | Random integer in `[min, max]` |
-| `string` | Random alphanumeric string (`length = 12` by default) |
-| `date_fuzz` | Shifts a date by a random number of days in `[min_days, max_days]` (defaults: `-30..30`) |
-| `time_fuzz` | Shifts a time-of-day by a random number of seconds in `[min_seconds, max_seconds]` with 24h wraparound (defaults: `-300..300`) |
-| `datetime_fuzz` | Shifts a timestamp/timestamptz by a random number of seconds in `[min_seconds, max_seconds]` (defaults: `-86400..86400`) |
-**`faker` reference (upstream `fake` crate):** Dumpling’s `faker = "module::Type"` strings mirror the Rust [`fake`](https://crates.io/crates/fake) crate’s [`faker`](https://docs.rs/fake/latest/fake/faker/index.html) module layout. Use these when picking or extending generators:
-- [docs.rs — `fake` crate root](https://docs.rs/fake/latest/fake/) (overview, `Fake` / `Dummy` traits, locales)
-- [docs.rs — `fake::faker` module index](https://docs.rs/fake/latest/fake/faker/index.html) (per-domain submodules: `address`, `internet`, `name`, …)
-- [GitHub — `cksac/fake-rs`](https://github.com/cksac/fake-rs) (source, README with the CLI’s generator name list)
 ### Secret references
 Dumpling resolves secret references in string config fields so plaintext salts/keys
@@ -251,17 +419,6 @@ dumpling --input dump.sql --check --strict-coverage --report coverage.json
 dumpling --security-profile hardened --input dump.sql --check
 ```
-### Common column options
-- `as_string`: if true, forces the anonymized value to be rendered as a quoted SQL string literal. By default Dumpling preserves the original quoting where possible.
-- `domain`: deterministic mapping domain. When set, the same source value always maps to the same pseudonym inside that domain (across tables/columns). **SQL `NULL` inputs are always preserved as `NULL`** — a null FK reference has no source value to map, so no pseudonym is fabricated.
-- `unique_within_domain`: when true, different source values are assigned unique pseudonyms within the configured `domain`. NULL values are unaffected and always remain NULL.
-- `min_days` / `max_days`: used by `date_fuzz`.
-- `min_seconds` / `max_seconds`: used by `time_fuzz` and `datetime_fuzz`.
-- `locale`: selects the language/regional format for `email`, `name`, `first_name`, `last_name`, `faker`, and `phone`. Supported values: `en`, `fr_fr`, `de_de`, `it_it`, `pt_br`, `pt_pt`, `ar_sa`, `zh_cn`, `zh_tw`, `ja_jp`, `cy_gb`. Defaults to `en` when not specified.
-- `faker`: required when `strategy = "faker"`. A plain string `"module::Type"` (case-insensitive) that maps to a **built-in** generator compiled into Dumpling—not arbitrary Rust or expressions. Names follow [`fake::faker`](https://docs.rs/fake/latest/fake/faker/index.html) (e.g. `internet::SafeEmail` → `faker::internet::SafeEmail` in the crate).
-- `format`: used with `faker = "number::NumberWithFormat"`; pattern uses `#` (0–9) and `^` (1–9) per the [`fake` crate docs](https://docs.rs/fake/latest/fake/).
 > **Note:** `table_options` are no longer supported; use explicit `rules` and optional `column_cases`.
 ---
@@ -368,144 +525,6 @@ Produced by SSMS "Script Table as → INSERT To", `mssql-scripter`, or similar t
 ---
-## Row filtering
-You can retain or delete rows for specific tables using explicit predicate lists.
-- If `retain` is non-empty, a row is kept only if it matches at least one predicate.
-- Regardless of `retain`, a row is dropped if it matches any predicate in `delete`.
-Supported predicate operators:
-| Operator | Description |
-|---|---|
-| `eq` / `neq` | String compare (case-insensitive if `case_insensitive = true`) |
-| `in` / `not_in` | List of values (string compare) |
-| `like` / `ilike` | SQL-like patterns (`%` and `_`) |
-| `regex` / `iregex` | Rust regex (`iregex` is case-insensitive) |
-| `lt` / `lte` / `gt` / `gte` | Numeric compare (values parsed as numbers) |
-| `is_null` / `not_null` | No value needed |
-Predicates can target nested JSON values using dot notation (`payload.profile.tier`) or Django-style notation (`payload__profile__tier`). For JSON arrays, path segments are evaluated against each element, so list-of-dicts structures can be matched naturally.
-### JSON path list targeting
-JSON list/array traversal is automatic once a path segment resolves to an array.
-- **All elements in an array**: use the next field name directly.
-  - `payload.items.kind` or `payload__items__kind`
-  - Matches/rewrites `kind` for every object in `items`.
-- **Specific array index**: use a numeric segment.
-  - `payload.items.0.kind` or `payload__items__0__kind`
-  - Targets only the first element.
-- **Nested arrays**: combine field and index segments as needed.
-  - `payload.groups.members.email`
-  - `payload.groups.1.members.0.email`
-This path behavior is shared by both `row_filters` predicates and JSON-path anonymization rules in `[rules]`.
-```toml
-[row_filters."public.users"]
-retain = [
-  { column = "country", op = "eq",  value = "US" },
-  { column = "email",   op = "ilike", value = "%@myco.com" },
-  { column = "profile.flags.plan", op = "eq", value = "gold" }
-]
-delete = [
-  { column = "is_admin", op = "eq", value = "true" },
-  { column = "email",    op = "ilike", value = "%@example.com" },
-  { column = "devices__platform", op = "eq", value = "android" }
-]
-```
-Row filtering works for both `INSERT ... VALUES (...)` and `COPY ... FROM stdin` rows.
----
-## Conditional per-column cases
-Define default strategies in `rules."<table>"` and add ordered per-column cases in `column_cases."<table>"."<column>"`. For each row and column, Dumpling applies the first matching case; if none match, it falls back to the default from `rules`.
-```toml
-[rules."public.users"]
-email = { strategy = "hash", as_string = true }   # default
-name  = { strategy = "name" }
-[[column_cases."public.users".email]]
-when.any = [{ column = "is_admin", op = "eq", value = "true" }]
-strategy = { strategy = "redact", as_string = true }
-[[column_cases."public.users".email]]
-when.any = [{ column = "country", op = "in", values = ["DE","FR","GB"] }]
-strategy = { strategy = "hash", salt = "eu-salt", as_string = true }
-```
-- `when.any` is OR, `when.all` is AND; you can use either or both. If both are empty, the case matches unconditionally.
-- First-match-wins per column; there is no merge or fallthrough.
-- Row filtering (`row_filters`) is evaluated before cases; deleted rows are not transformed.
----
-## Hardened security profile
-For adversarial risk environments — where an internal or external actor may have partial auxiliary data — use `--security-profile hardened`:
-```bash
-dumpling --security-profile hardened -i dump.sql -o sanitized.sql
-```
-### What changes in hardened mode
-| Aspect | Standard | Hardened |
-|---|---|---|
-| Random generation | xorshift64\* seeded from system time | OS CSPRNG (`getrandom`) — non-predictable |
-| `hash` strategy | SHA-256(salt \|\| input) | HMAC-SHA-256(key=salt, data=input) |
-| Deterministic domain byte stream | SHA-256 CTR-mode | HMAC-SHA-256 CTR-mode |
-| Report `security_profile` field | `"standard"` | `"hardened"` |
-| `--seed` / `DUMPLING_SEED` | Seeds the PRNG | Ignored (warning emitted) |
-### Why this matters
-- **Non-predictable output**: xorshift64\* is seeded from system time, which is guessable. The OS CSPRNG cannot be predicted from timing alone.
-- **Proper keyed hashing**: `SHA-256(key || data)` is vulnerable to length-extension attacks and weak as a MAC. HMAC-SHA-256 uses the salt as a genuine cryptographic key, providing provable PRF security.
-- **Domain separation**: HMAC construction ensures outputs from one salt/key cannot be confused with another.
-### Key management guidance
-Configure a per-environment secret via an env-backed reference to prevent key leakage:
-```toml
-# .dumplingconf
-salt = "${DUMPLING_HMAC_KEY}"
-[rules."public.users"]
-ssn = { strategy = "hash", as_string = true }
-email = { strategy = "email", domain = "users" }
-```
-```bash
-export DUMPLING_HMAC_KEY="$(openssl rand -base64 32)"
-dumpling --security-profile hardened -i dump.sql -o sanitized.sql
-```
-**Key rotation**: Changing `DUMPLING_HMAC_KEY` will produce entirely different pseudonyms for all salted/domain-mapped columns. If you rely on referential consistency across separately-processed dumps (e.g., snapshots over time), keep the same key or re-anonymize all related dumps together. Rotate keys when:
-- A key may have been compromised.
-- You intentionally want to break prior referential linkability.
-### Report metadata
-The JSON report always includes the active security profile:
-```json
-{
-  "security_profile": "hardened",
-  "total_rows_processed": 1000,
-  ...
-}
-```
----
 ## Policy linting
 The `lint-policy` subcommand statically analyses your configuration and flags common issues before they affect a production pipeline.
@@ -539,7 +558,7 @@ See the [CI guardrails documentation](docs/src/ci-guardrails.md) for full pipeli
 - For CI/CD and production-like workflows, prefer the default fail-closed mode and avoid `--allow-noop` unless a no-op run is intentional.
 - For best results, configure strategies compatible with column data types. If you hash an integer column, Dumpling will render a string; most databases can coerce this, but explicit `as_string = false` may help in some cases.
 - For length-restricted text columns (`varchar(n)`, `character varying(n)`, `char(n)`, `character(n)`), Dumpling reads `CREATE TABLE` definitions and truncates generated text values to fit within the declared limit.
-- Deterministic anonymization for tests: pass `--seed <u64>` or set env `DUMPLING_SEED` to make fuzz strategies reproducible across runs. Note: `--seed` has no effect in `--security-profile hardened`.
+- Deterministic anonymization for tests: pass `--seed <u64>` or set env `DUMPLING_SEED` to make fuzz strategies reproducible across runs. In hardened security profile, seeds are ignored; see the [configuration guide](https://ababic.github.io/dumpling/configuration.html#hardened-security-profile).
 - Domain mappings (`domain = "..."`) are deterministic by source value + domain (+ optional salt), so referential joins stay stable across tables within the same dump.
 ---

dumpling-cli 0.6.0__tar.gz → 0.7.0a0__tar.gz

dumpling-cli 0.6.0tar.gz → 0.7.0a0tar.gz