safeseed 0.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (55)
  1. package/LICENSE +21 -0
  2. package/README.md +150 -0
  3. package/dist/catalog.d.ts +77 -0
  4. package/dist/catalog.d.ts.map +1 -0
  5. package/dist/catalog.js +203 -0
  6. package/dist/catalog.js.map +1 -0
  7. package/dist/cli.d.ts +3 -0
  8. package/dist/cli.d.ts.map +1 -0
  9. package/dist/cli.js +228 -0
  10. package/dist/cli.js.map +1 -0
  11. package/dist/csv.d.ts +21 -0
  12. package/dist/csv.d.ts.map +1 -0
  13. package/dist/csv.js +85 -0
  14. package/dist/csv.js.map +1 -0
  15. package/dist/generate.d.ts +28 -0
  16. package/dist/generate.d.ts.map +1 -0
  17. package/dist/generate.js +101 -0
  18. package/dist/generate.js.map +1 -0
  19. package/dist/hash.d.ts +10 -0
  20. package/dist/hash.d.ts.map +1 -0
  21. package/dist/hash.js +16 -0
  22. package/dist/hash.js.map +1 -0
  23. package/dist/index.d.ts +18 -0
  24. package/dist/index.d.ts.map +1 -0
  25. package/dist/index.js +9 -0
  26. package/dist/index.js.map +1 -0
  27. package/dist/luhn.d.ts +8 -0
  28. package/dist/luhn.d.ts.map +1 -0
  29. package/dist/luhn.js +24 -0
  30. package/dist/luhn.js.map +1 -0
  31. package/dist/net.d.ts +16 -0
  32. package/dist/net.d.ts.map +1 -0
  33. package/dist/net.js +95 -0
  34. package/dist/net.js.map +1 -0
  35. package/dist/record.d.ts +36 -0
  36. package/dist/record.d.ts.map +1 -0
  37. package/dist/record.js +53 -0
  38. package/dist/record.js.map +1 -0
  39. package/dist/rng.d.ts +13 -0
  40. package/dist/rng.d.ts.map +1 -0
  41. package/dist/rng.js +26 -0
  42. package/dist/rng.js.map +1 -0
  43. package/dist/scan.d.ts +24 -0
  44. package/dist/scan.d.ts.map +1 -0
  45. package/dist/scan.js +49 -0
  46. package/dist/scan.js.map +1 -0
  47. package/dist/types.d.ts +30 -0
  48. package/dist/types.d.ts.map +1 -0
  49. package/dist/types.js +9 -0
  50. package/dist/types.js.map +1 -0
  51. package/dist/verify.d.ts +33 -0
  52. package/dist/verify.d.ts.map +1 -0
  53. package/dist/verify.js +177 -0
  54. package/dist/verify.js.map +1 -0
  55. package/package.json +60 -0
package/LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 SafeSeed contributors
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
package/README.md ADDED
@@ -0,0 +1,150 @@
1
+ # SafeSeed
2
+
3
+ **Confirmably-synthetic test data by construction.** Generate stand-in data for test, CI, and demo environments from ranges that published standards have reserved as *permanently not real*, bind a tamper-evident record to the output, verify a file stays in range, and scan data you already have for real PII that slipped in.
4
+
5
+ No model, no training data, no backend, no accounts, no telemetry, zero runtime dependencies. It runs entirely on your machine. MIT licensed.
6
+
7
+ > SafeSeed makes *"no production data crossed this boundary"* a property you can audit once and enforce on every run — and it tells you, in writing, exactly where that guarantee ends. The full argument is in [docs/safe-test-data-by-construction.md](docs/safe-test-data-by-construction.md).
8
+
9
+ ---
10
+
11
+ ## Why this exists
12
+
13
+ The breaches teams plan for involve production. The ones that actually happen often involve a *copy* of production sitting somewhere nobody hardened: a staging database, a CI job's fixtures, a developer's laptop, a screenshot in a bug ticket. The fix everyone agrees on is "don't put real data there, use synthetic data." The disagreement is about what "synthetic" should mean.
14
+
15
+ Most synthetic-data tools **learn the shape of your real data from your real data**, so the output can memorize and re-emit real records, and privacy becomes something you defend *after the fact* (membership-inference tests, differential-privacy noise, a privacy report) on every dataset, forever.
16
+
17
+ SafeSeed takes the other path: **never let real data into the process at all.** Each field is drawn from values a standard reserves as non-real. If the source data never touches the generator, there is no real record for it to memorize or re-emit — not because you defended it after the fact, but because the input was never there. You audit a few hundred cited lines once, and every output inherits the guarantee. (This is the argument against model memorization; the structurally-fake tier still carries the coincidence caveat below.)
18
+
19
+ ## What it does
20
+
21
+ 1. **Generate** safe-by-construction test data from the cited reserved ranges (deterministic from a seed, so the output is a committable fixture).
22
+ 2. **Attest** with a tamper-evident run record that binds to the file's content hash and states the honesty tier of every field.
23
+ 3. **Verify** that a file is still byte-for-byte the generated one *and* that every value is still in range — wire it into CI to fail the build on drift. Strict by default; opt into **column-scoped verify** (`--allow-added-columns`) to attest only the synthetic columns you generated, so a team can add its own business columns (job title, industry) without breaking attestation — added columns are reported as *unattested*, never a silent pass.
24
+ 4. **Scan** an *existing* CSV / seed file and flag values that are not in reserved ranges as candidate real PII. Column-scoped verify pairs with scan: verify vouches for the synthetic columns, scan checks the columns you added.
25
+
26
+ `verify` and `scan` are **generator-agnostic**: they work on any data file, however it was produced, so you can keep the generator you already use and wrap it.
27
+
28
+ There is also a self-serve **generator page** (browser, zero network): pick fields and types, set rows and seed, preview the tier-colored output, and download the CSV plus its run record. It ships as a committed offline single file at [`demo/safeseed-generator.html`](demo/safeseed-generator.html).
29
+
30
+ ## Quickstart
31
+
32
+ ### CLI
33
+
34
+ ```bash
35
+ # Generate 100 rows of safe data + a run record
36
+ safeseed generate \
37
+ --fields email:email,name:fullName,phone:phone,ssn:ssn,card:creditCard \
38
+ --rows 100 --seed 42 \
39
+ --out data.csv --record record.json
40
+
41
+ # Fail the build if the file drifts out of range or was tampered with
42
+ safeseed verify --in data.csv --record record.json # exit 0 clean, 1 on drift
43
+
44
+ # Column-scoped: attest the synthetic columns, allow + report added business columns
45
+ safeseed verify --in data.csv --record record.json --allow-added-columns
46
+
47
+ # Scan a legacy file for real PII that slipped in
48
+ safeseed scan --in legacy.csv --fields email:email,phone:phone,ssn:ssn
49
+
50
+ # Inspect the reserved-range catalog (every field's range, citation, and tier)
51
+ safeseed catalog
52
+ ```
53
+
54
+ ### As a CI gate (GitHub Action)
55
+
56
+ ```yaml
57
+ - uses: <org>/safeseed@v0 # the verify Action
58
+ with:
59
+ data: fixtures/seed.csv
60
+ record: fixtures/seed.record.json
61
+ # allow-added-columns: true # optional: column-scoped verify (attest synthetic columns,
62
+ # # report added business columns instead of failing)
63
+ ```
64
+
65
+ ### As a library
66
+
67
+ ```ts
68
+ import { generate, toCsv, makeRunRecord, verify, scan } from "safeseed";
69
+
70
+ const ds = generate({
71
+ schema: [
72
+ { name: "email", type: "email" },
73
+ { name: "phone", type: "phone" },
74
+ ],
75
+ rows: 100,
76
+ seed: 42,
77
+ });
78
+ const csv = toCsv(ds.columns, ds.rows);
79
+ const record = await makeRunRecord(ds, csv);
80
+
81
+ const result = await verify(csv, record); // { ok, failures, checked, unattestedColumns, warnings }
82
+
83
+ // Column-scoped: attest the synthetic columns only; added columns are reported, not failed.
84
+ const scoped = await verify(extendedCsv, record, { allowAddedColumns: true });
85
+ // scoped.unattestedColumns lists the business columns the team added — scan those.
86
+ ```
87
+
88
+ The library is isomorphic — the same core runs in Node (>=18) and in the browser, using the platform's Web Crypto for hashing.
89
+
90
+ ## The honesty tiers
91
+
92
+ Honesty is the credibility here, so the claim has tiers, and every field is labeled with its own:
93
+
94
+ | Tier | What it means | Examples | The claim |
95
+ |---|---|---|---|
96
+ | **provably-non-real** | Reserved by a published standard/protocol; the standard itself makes them non-routable/non-registrable. | RFC 2606 email domains, RFC 5737 / 3849 documentation IPs | "Cannot correspond to a real person or system." |
97
+ | **reserved-not-issued** | Reserved by the *issuing authority* and never assigned — strong, but administrative policy, not protocol. | NANPA `555-01xx` phones, never-issued SSN ranges | "Never assigned, so no real holder has one; non-real by policy, not protocol." |
98
+ | **designated-test-only** | A valid-looking value processors/sandboxes *designate* for testing. It passes validation. | Card test PANs (`4242…`) | "Non-real by designation, **not** by impossibility." |
99
+ | **structurally-fake** | No standard reserves it, so it is made *self-evidently* fake instead of plausible. | `TEST_Lastname_000142`, `123 Example Way` | "Synthetic token; not derived from any real record." |
100
+
101
+ Stating which tier each field sits in is not a weakness to bury. It is the thing that separates a practitioner from a datasheet.
102
+
103
+ ## What this does **not** prove
104
+
105
+ The fastest way to lose a security reviewer is to claim more than you can defend. So, on purpose:
106
+
107
+ - **It attests the generator, not your environment.** "Generated from reserved ranges, no real input" is true at the moment of generation. It says nothing about a file someone later edits, joins against a prod snapshot, or replaces. That is why `verify` re-checks the *actual artifact* (content hash + range), and why the assurance rests on the open, auditable code — a signature proves the tool ran, not that the tool is right. The run record is therefore a **tamper-evident record**, not "cryptographic proof of no PII."
108
+ - **"Not derived from production data" is not "not personal data."** The defensible claim is the former, never the latter.
109
+ - **It is a control, not a scope-out.** A security-of-processing and data-minimization control for non-production environments (GDPR Articles 25 and 32; SOC 2 and ISO 27001 in audit terms). It is not a DSAR answer and not a lawful-basis story.
110
+ - **It is deliberately low-fidelity.** Every phone in one small block, every IP in three ranges. That is correct for "prove there's no real PII in functional and CI tests," and wrong as your general fixture source, your ML training data, or your load-testing input.
111
+
112
+ ## "Why not just use Faker?"
113
+
114
+ Off-the-shelf fake-data libraries already emit reserved-range values. What is missing is the *discipline* around them: every personal-data field tied to a cited standard, an enforcement check that fails the build when a value drifts out of range, a scan that flags real-looking data already sitting in your test environment, and an honest written statement of exactly what is and isn't guaranteed. That discipline is the contribution. The generator was never the hard part.
115
+
116
+ (SafeSeed can wrap Faker, by the way: keep it for realistic non-PII fields, and let the cited reserved ranges own every PII-shaped one.)
117
+
118
+ ## Standards referenced
119
+
120
+ - **RFC 2606** — reserved DNS names (`example.com/.net/.org`, `.test`/`.example`/`.invalid`/`.localhost`).
121
+ - **RFC 5737** — IPv4 documentation blocks (`192.0.2.0/24`, `198.51.100.0/24`, `203.0.113.0/24`).
122
+ - **RFC 3849** — IPv6 documentation prefix (`2001:db8::/32`).
123
+ - **NANPA / ATIS** — fictitious telephone numbers (`555-0100` through `555-0199`).
124
+ - **SSA SSN randomization** (effective 2011-06-25) — never-assigned ranges (area `000`/`666`/`900-999`, group `00`, serial `0000`), confirmed against the [SSA randomization rules](https://www.ssa.gov/employer/randomization.html).
125
+ - Card numbers are **published processor/sandbox test PANs** (e.g. Stripe testing docs), in the `designated-test-only` tier (they pass Luhn, authorize nowhere).
126
+
127
+ ## Development
128
+
129
+ ```bash
130
+ npm install
131
+ npm run typecheck # tsc --noEmit
132
+ npm test # vitest (the named TDD suite: catalog/generate/verify/scan/record)
133
+ npm run build # emit dist/ (library + CLI)
134
+ ```
135
+
136
+ The catalog in [`src/catalog.ts`](src/catalog.ts) is the reusable core: it maps each field type to its reserved range, citation, and tier. Generation, verification, and scanning all read from it, which is what makes the promise auditable.
137
+
138
+ ## Status
139
+
140
+ Core library, CLI, the `verify` Action, and an interactive browser demo are built and tested (64 tests; CI green). SafeSeed 0.2.0 adds per-column hashes and opt-in **column-scoped verify**, a self-serve **generator page**, and a four-tier honesty taxonomy that separates protocol-reserved values from authority-reserved (never-issued) ones. The demo lives in [`demo/`](demo/); both the showcase and the generator ship as committed, offline single files at [`demo/safeseed-demo.html`](demo/safeseed-demo.html) and [`demo/safeseed-generator.html`](demo/safeseed-generator.html). npm publication is the remaining step. The design record is in [SPEC.md](SPEC.md); the v2 feature spec is in [docs/generator-and-column-scoped-verify.md](docs/generator-and-column-scoped-verify.md).
141
+
142
+ ## Support
143
+
144
+ SafeSeed is free and MIT-licensed. If it's useful to you or your team, you can [support its development on Ko-fi](https://ko-fi.com/Q3S6220HI9).
145
+
146
+ [![ko-fi](https://ko-fi.com/img/githubbutton_sm.svg)](https://ko-fi.com/Q3S6220HI9)
147
+
148
+ ## License
149
+
150
+ [MIT](LICENSE).
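
Note on the README's "pass Luhn" claim for the card test PANs: the check is small enough to sketch. The block below is a minimal illustration, not the package's own `dist/luhn.js` (which is not shown in this part of the diff); the name `passesLuhn` is hypothetical.

```typescript
// Hypothetical helper for illustration only; not SafeSeed's dist/luhn.js.
function passesLuhn(pan: string): boolean {
  // Ignore spaces, dashes, and any other non-digit separators.
  const digits = pan.replace(/\D/g, "");
  if (digits.length === 0) return false;

  let sum = 0;
  // Walk right to left; double every second digit, starting with the
  // digit immediately left of the check digit.
  for (let i = 0; i < digits.length; i++) {
    let d = digits.charCodeAt(digits.length - 1 - i) - 48;
    if (i % 2 === 1) {
      d *= 2;
      if (d > 9) d -= 9; // same as summing the two digits of the product
    }
    sum += d;
  }
  return sum % 10 === 0;
}

// Two PANs from the catalog's test list, plus one with a changed check digit.
const samples = ["4242424242424242", "378282246310005", "4242424242424241"];
for (const pan of samples) {
  console.log(pan, passesLuhn(pan));
}
```

A passing checksum only shows the number is well-formed, which is why the README places these values in the `designated-test-only` tier rather than a stronger one.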
package/dist/catalog.d.ts ADDED
@@ -0,0 +1,77 @@
1
+ /**
2
+ * The reserved-range catalog — SafeSeed's reusable core IP.
3
+ *
4
+ * Each entry maps a PII-shaped field type to a standards-reserved "never-real"
5
+ * space, the citation for that reservation, its honesty tier, and the exact
6
+ * language allowed about it. Generation, verification, and scanning all read from
7
+ * this one table, which is what makes the promise auditable: review these few
8
+ * hundred cited lines once, and every output inherits the guarantee.
9
+ *
10
+ * Sourcing: the RFC 2606 / 5737 / 3849 reservations and the SSA SSN randomization
11
+ * exclusions (area 000/666/900-999, group 00, serial 0000) are confirmed against
12
+ * primary sources. The NANPA 555-0100..0199 fictitious block is well-established;
13
+ * its shipped citation link is the NANPA homepage rather than a deep rule page.
14
+ */
15
+ import type { FieldType, Tier } from "./types.js";
16
+ export declare const CATALOG_VERSION = "1.0.0";
17
+ /** Inspectable, structured definition of a reserved space. Drives generation,
18
+ * verification, and scanning, and lets tests assert the ranges match standards. */
19
+ export type ReservedSpec = {
20
+ kind: "emailDomains";
21
+ domains: readonly string[];
22
+ reservedTlds: readonly string[];
23
+ } | {
24
+ kind: "domains";
25
+ domains: readonly string[];
26
+ reservedTlds: readonly string[];
27
+ } | {
28
+ kind: "ipv4Blocks";
29
+ cidrs: readonly string[];
30
+ } | {
31
+ kind: "ipv6Blocks";
32
+ cidrs: readonly string[];
33
+ } | {
34
+ kind: "phoneBlock";
35
+ centralOfficeCode: string;
36
+ subscriberStart: number;
37
+ subscriberEnd: number;
38
+ } | {
39
+ kind: "ssnInvalid";
40
+ invalidAreas: readonly string[];
41
+ invalidAreaMin: number;
42
+ invalidAreaMax: number;
43
+ invalidGroup: string;
44
+ invalidSerial: string;
45
+ } | {
46
+ kind: "cardTestNumbers";
47
+ numbers: readonly string[];
48
+ } | {
49
+ kind: "fakeToken";
50
+ pattern: string;
51
+ };
52
+ export interface CatalogEntry {
53
+ field: FieldType;
54
+ tier: Tier;
55
+ /** Human-readable citation for the reservation. */
56
+ citation: string;
57
+ /** What the reserved space is, in plain words. */
58
+ description: string;
59
+ /** Tier-appropriate, non-overclaiming statement about values of this field. */
60
+ claim: string;
61
+ reserved: ReservedSpec;
62
+ }
63
+ export declare const CATALOG: readonly CatalogEntry[];
64
+ /** Look up the catalog entry for a field type. Throws if the field is unknown. */
65
+ export declare function getEntry(field: FieldType): CatalogEntry;
66
+ /**
67
+ * Is `value` inside the reserved range declared for `entry`? This is the single
68
+ * predicate behind both `verify` (is generated output still in range?) and `scan`
69
+ * (does existing data contain anything *out* of range, i.e. candidate real PII?).
70
+ */
71
+ export declare function isReserved(entry: CatalogEntry, value: string): boolean;
72
+ /**
73
+ * Heuristic used to assert the structurally-fake tier really is self-evident:
74
+ * a human glancing at the value should see "test data", not a plausible person.
75
+ */
76
+ export declare function isSelfEvidentlyFake(value: string): boolean;
77
+ //# sourceMappingURL=catalog.d.ts.map
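
Note on `ReservedSpec`: because every variant carries a literal `kind`, a consumer can narrow on it with a `switch` and let the compiler flag unhandled variants. The sketch below assumes a trimmed copy of three variants (`SpecSketch`) and a hypothetical helper `describeSpec`; neither name is exported by the package.

```typescript
// Trimmed copy of three ReservedSpec variants from the declaration above.
// SpecSketch and describeSpec are illustrative names, not package exports.
type SpecSketch =
  | { kind: "ipv4Blocks"; cidrs: readonly string[] }
  | { kind: "phoneBlock"; centralOfficeCode: string; subscriberStart: number; subscriberEnd: number }
  | { kind: "fakeToken"; pattern: string };

// Render a spec as one human-readable line.
function describeSpec(spec: SpecSketch): string {
  switch (spec.kind) {
    case "ipv4Blocks":
      return "IPv4 in " + spec.cidrs.join(", ");
    case "phoneBlock": {
      // Subscriber numbers are four digits, so 100 is shown as 0100.
      const pad = (n: number): string => ("0000" + n).slice(-4);
      const co = spec.centralOfficeCode;
      return co + "-" + pad(spec.subscriberStart) + " through " + co + "-" + pad(spec.subscriberEnd);
    }
    case "fakeToken":
      return "matches /" + spec.pattern + "/";
    default: {
      // Exhaustiveness check: adding a variant without a case fails to compile.
      const unreachable: never = spec;
      return unreachable;
    }
  }
}

console.log(
  describeSpec({ kind: "phoneBlock", centralOfficeCode: "555", subscriberStart: 100, subscriberEnd: 199 })
);
```

The `never` assignment in the `default` branch is a common TypeScript idiom for exhaustiveness; whether the package's own source uses it is not visible from the declaration file.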
package/dist/catalog.d.ts.map ADDED
@@ -0,0 +1 @@
1
+ {"version":3,"file":"catalog.d.ts","sourceRoot":"","sources":["../src/catalog.ts"],"names":[],"mappings":"AAAA;;;;;;;;;;;;;GAaG;AACH,OAAO,KAAK,EAAE,SAAS,EAAE,IAAI,EAAE,MAAM,YAAY,CAAC;AAGlD,eAAO,MAAM,eAAe,UAAU,CAAC;AAEvC;mFACmF;AACnF,MAAM,MAAM,YAAY,GACpB;IAAE,IAAI,EAAE,cAAc,CAAC;IAAC,OAAO,EAAE,SAAS,MAAM,EAAE,CAAC;IAAC,YAAY,EAAE,SAAS,MAAM,EAAE,CAAA;CAAE,GACrF;IAAE,IAAI,EAAE,SAAS,CAAC;IAAC,OAAO,EAAE,SAAS,MAAM,EAAE,CAAC;IAAC,YAAY,EAAE,SAAS,MAAM,EAAE,CAAA;CAAE,GAChF;IAAE,IAAI,EAAE,YAAY,CAAC;IAAC,KAAK,EAAE,SAAS,MAAM,EAAE,CAAA;CAAE,GAChD;IAAE,IAAI,EAAE,YAAY,CAAC;IAAC,KAAK,EAAE,SAAS,MAAM,EAAE,CAAA;CAAE,GAChD;IAAE,IAAI,EAAE,YAAY,CAAC;IAAC,iBAAiB,EAAE,MAAM,CAAC;IAAC,eAAe,EAAE,MAAM,CAAC;IAAC,aAAa,EAAE,MAAM,CAAA;CAAE,GACjG;IACE,IAAI,EAAE,YAAY,CAAC;IACnB,YAAY,EAAE,SAAS,MAAM,EAAE,CAAC;IAChC,cAAc,EAAE,MAAM,CAAC;IACvB,cAAc,EAAE,MAAM,CAAC;IACvB,YAAY,EAAE,MAAM,CAAC;IACrB,aAAa,EAAE,MAAM,CAAC;CACvB,GACD;IAAE,IAAI,EAAE,iBAAiB,CAAC;IAAC,OAAO,EAAE,SAAS,MAAM,EAAE,CAAA;CAAE,GACvD;IAAE,IAAI,EAAE,WAAW,CAAC;IAAC,OAAO,EAAE,MAAM,CAAA;CAAE,CAAC;AAE3C,MAAM,WAAW,YAAY;IAC3B,KAAK,EAAE,SAAS,CAAC;IACjB,IAAI,EAAE,IAAI,CAAC;IACX,mDAAmD;IACnD,QAAQ,EAAE,MAAM,CAAC;IACjB,kDAAkD;IAClD,WAAW,EAAE,MAAM,CAAC;IACpB,+EAA+E;IAC/E,KAAK,EAAE,MAAM,CAAC;IACd,QAAQ,EAAE,YAAY,CAAC;CACxB;AAiCD,eAAO,MAAM,OAAO,EAAE,SAAS,YAAY,EAwG1C,CAAC;AAIF,kFAAkF;AAClF,wBAAgB,QAAQ,CAAC,KAAK,EAAE,SAAS,GAAG,YAAY,CAIvD;AAUD;;;;GAIG;AACH,wBAAgB,UAAU,CAAC,KAAK,EAAE,YAAY,EAAE,KAAK,EAAE,MAAM,GAAG,OAAO,CAwCtE;AAED;;;GAGG;AACH,wBAAgB,mBAAmB,CAAC,KAAK,EAAE,MAAM,GAAG,OAAO,CAE1D"}
package/dist/catalog.js ADDED
@@ -0,0 +1,203 @@
1
+ import { ipv4InCidr, ipv6InPrefix } from "./net.js";
2
+ export const CATALOG_VERSION = "1.0.0";
3
+ // Tier-appropriate claim language. The strong wording ("cannot correspond to a
4
+ // real person") lives ONLY on the protocol-provable tier; every weaker tier
5
+ // deliberately avoids "proof", "impossible", "guarantee", "cannot be real".
6
+ const CLAIM_PROVABLE = "Reserved by published standard; values in this range cannot correspond to a real person or system.";
7
+ const CLAIM_RESERVED_NOT_ISSUED = "Reserved by the issuing authority and never assigned, so no real holder has one. Non-real by administrative policy rather than by protocol.";
8
+ const CLAIM_DESIGNATED = "Designated test value that passes validation; non-real by processor/sandbox designation, not by construction. Valid-looking, but reserved for testing.";
9
+ const CLAIM_FAKE = "Structurally synthetic token; not derived from any real record. This field type is not reserved by any standard, so realism is deliberately avoided.";
10
+ const RFC2606_DOMAINS = ["example.com", "example.net", "example.org"];
11
+ const RFC2606_TLDS = ["test", "example", "invalid", "localhost"];
12
+ const RFC5737_BLOCKS = ["192.0.2.0/24", "198.51.100.0/24", "203.0.113.0/24"];
13
+ const RFC3849_BLOCKS = ["2001:db8::/32"];
14
+ /** Payment-processor / sandbox test PANs (all Luhn-valid by design; authorize nowhere). */
15
+ const CARD_TEST_NUMBERS = [
16
+ "4242424242424242", // Visa (widely used sandbox)
17
+ "4111111111111111", // Visa
18
+ "4000056655665556", // Visa debit
19
+ "5555555555554444", // Mastercard
20
+ "5105105105105100", // Mastercard
21
+ "2223003122003222", // Mastercard (2-series)
22
+ "378282246310005", // American Express
23
+ "371449635398431", // American Express
24
+ "6011111111111117", // Discover
25
+ "3530111333300000", // JCB
26
+ ];
27
+ export const CATALOG = [
28
+ {
29
+ field: "email",
30
+ tier: "provably-non-real",
31
+ citation: "RFC 2606 §2–3 (reserved example.com/.net/.org and TLDs .test/.example/.invalid/.localhost)",
32
+ description: "Email addresses whose domain is an RFC 2606 reserved domain or TLD; can never route to a real mailbox.",
33
+ claim: CLAIM_PROVABLE,
34
+ reserved: { kind: "emailDomains", domains: RFC2606_DOMAINS, reservedTlds: RFC2606_TLDS },
35
+ },
36
+ {
37
+ field: "domain",
38
+ tier: "provably-non-real",
39
+ citation: "RFC 2606 §2–3 (reserved domains and TLDs)",
40
+ description: "Hostnames under an RFC 2606 reserved domain or TLD.",
41
+ claim: CLAIM_PROVABLE,
42
+ reserved: { kind: "domains", domains: RFC2606_DOMAINS, reservedTlds: RFC2606_TLDS },
43
+ },
44
+ {
45
+ field: "ipv4",
46
+ tier: "provably-non-real",
47
+ citation: "RFC 5737 (IPv4 documentation blocks TEST-NET-1/2/3)",
48
+ description: "IPv4 addresses inside the three RFC 5737 documentation ranges, which per the RFC should not be routed on the public Internet.",
49
+ claim: CLAIM_PROVABLE,
50
+ reserved: { kind: "ipv4Blocks", cidrs: RFC5737_BLOCKS },
51
+ },
52
+ {
53
+ field: "ipv6",
54
+ tier: "provably-non-real",
55
+ citation: "RFC 3849 (IPv6 documentation prefix 2001:db8::/32)",
56
+ description: "IPv6 addresses inside the RFC 3849 documentation prefix.",
57
+ claim: CLAIM_PROVABLE,
58
+ reserved: { kind: "ipv6Blocks", cidrs: RFC3849_BLOCKS },
59
+ },
60
+ {
61
+ field: "phone",
62
+ tier: "reserved-not-issued",
63
+ citation: "NANPA / ATIS fictitious-number assignment (555-0100 through 555-0199)",
64
+ description: "North American numbers in the 555-01xx fictitious subscriber block, which the numbering authority reserves and never assigns to a real line (administrative reservation, not a protocol limit).",
65
+ claim: CLAIM_RESERVED_NOT_ISSUED,
66
+ reserved: { kind: "phoneBlock", centralOfficeCode: "555", subscriberStart: 100, subscriberEnd: 199 },
67
+ },
68
+ {
69
+ field: "ssn",
70
+ tier: "reserved-not-issued",
71
+ citation: "SSA SSN randomization (effective 2011-06-25): never-assigned area numbers 000 / 666 / 900-999, plus group 00 and serial 0000 (ssa.gov/employer/randomization.html)",
72
+ description: "US SSNs whose area, group, or serial falls in a range the SSA never issues — reserved by the issuing authority's own rules rather than by protocol.",
73
+ claim: CLAIM_RESERVED_NOT_ISSUED,
74
+ reserved: {
75
+ kind: "ssnInvalid",
76
+ invalidAreas: ["000", "666"],
77
+ invalidAreaMin: 900,
78
+ invalidAreaMax: 999,
79
+ invalidGroup: "00",
80
+ invalidSerial: "0000",
81
+ },
82
+ },
83
+ {
84
+ field: "creditCard",
85
+ tier: "designated-test-only",
86
+ citation: "Payment-processor / sandbox test PANs (e.g. Stripe testing docs); pass Luhn, authorize nowhere",
87
+ description: "Card numbers processors and sandboxes publish for testing. They pass the Luhn checksum but authorize nowhere, so they are non-real by designation, not by impossibility.",
88
+ claim: CLAIM_DESIGNATED,
89
+ reserved: { kind: "cardTestNumbers", numbers: CARD_TEST_NUMBERS },
90
+ },
91
+ {
92
+ field: "firstName",
93
+ tier: "structurally-fake",
94
+ citation: "No standard reserves names; structurally-fake token convention",
95
+ description: "Given names rendered as obvious TEST_ tokens rather than plausible names.",
96
+ claim: CLAIM_FAKE,
97
+ reserved: { kind: "fakeToken", pattern: "^TEST_Firstname_\\d{6,}$" },
98
+ },
99
+ {
100
+ field: "lastName",
101
+ tier: "structurally-fake",
102
+ citation: "No standard reserves names; structurally-fake token convention",
103
+ description: "Family names rendered as obvious TEST_ tokens.",
104
+ claim: CLAIM_FAKE,
105
+ reserved: { kind: "fakeToken", pattern: "^TEST_Lastname_\\d{6,}$" },
106
+ },
107
+ {
108
+ field: "fullName",
109
+ tier: "structurally-fake",
110
+ citation: "No standard reserves names; structurally-fake token convention",
111
+ description: "Full names rendered as obvious TEST_ tokens.",
112
+ claim: CLAIM_FAKE,
113
+ reserved: { kind: "fakeToken", pattern: "^TEST_Person_\\d{6,}$" },
114
+ },
115
+ {
116
+ field: "streetAddress",
117
+ tier: "structurally-fake",
118
+ citation: "No standard reserves addresses; structurally-fake 'Example' convention",
119
+ description: "Street addresses built on the obvious 'Example' street name.",
120
+ claim: CLAIM_FAKE,
121
+ reserved: { kind: "fakeToken", pattern: "^\\d+ Example (Way|St|Ave|Rd|Blvd)$" },
122
+ },
123
+ {
124
+ field: "freeText",
125
+ tier: "structurally-fake",
126
+ citation: "No standard reserves free text; structurally-fake token convention",
127
+ description: "Free-text fields rendered as obvious TEST_ tokens.",
128
+ claim: CLAIM_FAKE,
129
+ reserved: { kind: "fakeToken", pattern: "^TEST_Text_\\d{6,}$" },
130
+ },
131
+ ];
132
+ const BY_FIELD = new Map(CATALOG.map((e) => [e.field, e]));
133
+ /** Look up the catalog entry for a field type. Throws if the field is unknown. */
134
+ export function getEntry(field) {
135
+ const entry = BY_FIELD.get(field);
136
+ if (entry === undefined)
137
+ throw new Error(`No catalog entry for field type: ${field}`);
138
+ return entry;
139
+ }
140
+ function domainIsReserved(domain, domains, tlds) {
141
+ const d = domain.toLowerCase();
142
+ // RFC 2606 reserves the whole zone of a reserved second-level domain, so a
143
+ // subdomain (mail.example.com) is reserved too — not just the bare domain.
144
+ if (domains.some((rd) => d === rd || d.endsWith(`.${rd}`)))
145
+ return true;
146
+ return tlds.some((t) => d === t || d.endsWith(`.${t}`));
147
+ }
148
+ /**
149
+ * Is `value` inside the reserved range declared for `entry`? This is the single
150
+ * predicate behind both `verify` (is generated output still in range?) and `scan`
151
+ * (does existing data contain anything *out* of range, i.e. candidate real PII?).
152
+ */
153
+ export function isReserved(entry, value) {
154
+ const r = entry.reserved;
155
+ switch (r.kind) {
156
+ case "emailDomains": {
157
+ const at = value.lastIndexOf("@");
158
+ if (at < 0)
159
+ return false;
160
+ return domainIsReserved(value.slice(at + 1), r.domains, r.reservedTlds);
161
+ }
162
+ case "domains":
163
+ return domainIsReserved(value, r.domains, r.reservedTlds);
164
+ case "ipv4Blocks":
165
+ return r.cidrs.some((c) => ipv4InCidr(value, c));
166
+ case "ipv6Blocks":
167
+ return r.cidrs.some((c) => ipv6InPrefix(value, c));
168
+ case "phoneBlock": {
169
+ const digits = value.replace(/\D/g, "");
170
+ if (digits.length < 7)
171
+ return false;
172
+ const last7 = digits.slice(-7);
173
+ const nxx = last7.slice(0, 3);
174
+ const line = Number(last7.slice(3));
175
+ return nxx === r.centralOfficeCode && line >= r.subscriberStart && line <= r.subscriberEnd;
176
+ }
177
+ case "ssnInvalid": {
178
+ const digits = value.replace(/\D/g, "");
179
+ if (digits.length !== 9)
180
+ return false;
181
+ const area = digits.slice(0, 3);
182
+ const group = digits.slice(3, 5);
183
+ const serial = digits.slice(5);
184
+ const areaNum = Number(area);
185
+ const areaInvalid = r.invalidAreas.includes(area) || (areaNum >= r.invalidAreaMin && areaNum <= r.invalidAreaMax);
186
+ return areaInvalid || group === r.invalidGroup || serial === r.invalidSerial;
187
+ }
188
+ case "cardTestNumbers": {
189
+ const digits = value.replace(/\D/g, "");
190
+ return r.numbers.some((n) => n.replace(/\D/g, "") === digits);
191
+ }
192
+ case "fakeToken":
193
+ return new RegExp(r.pattern).test(value);
194
+ }
195
+ }
196
+ /**
197
+ * Heuristic used to assert the structurally-fake tier really is self-evident:
198
+ * a human glancing at the value should see "test data", not a plausible person.
199
+ */
200
+ export function isSelfEvidentlyFake(value) {
201
+ return /test[_\s]/i.test(value) || /\bexample\b/i.test(value);
202
+ }
203
+ //# sourceMappingURL=catalog.js.map
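
Note on the `ipv4Blocks` case: `isReserved` delegates to `ipv4InCidr` from `./net.js`, which is not shown in this part of the diff. As a rough idea of what such a membership test involves, here is a minimal sketch under the assumption that addresses are dotted-quad strings and CIDRs look like `192.0.2.0/24`. The names `ipv4ToInt` and `ipv4InCidrSketch` are hypothetical, and the real implementation may differ.

```typescript
// Hypothetical stand-in for the ipv4InCidr imported from ./net.js.
// Parse a dotted-quad address into an unsigned 32-bit value, or null if malformed.
function ipv4ToInt(addr: string): number | null {
  const parts = addr.split(".");
  if (parts.length !== 4) return null;
  let n = 0;
  for (const p of parts) {
    if (!/^\d{1,3}$/.test(p)) return null;
    const octet = Number(p);
    if (octet > 255) return null;
    n = n * 256 + octet; // plain arithmetic avoids signed 32-bit bitwise pitfalls
  }
  return n;
}

// True when addr falls inside the block described by cidr (e.g. "192.0.2.0/24").
function ipv4InCidrSketch(addr: string, cidr: string): boolean {
  const pieces = cidr.split("/");
  if (pieces.length !== 2) return false;
  const base = pieces[0];
  const bitsText = pieces[1];
  if (base === undefined || bitsText === undefined) return false;
  if (!/^\d{1,2}$/.test(bitsText)) return false;
  const bits = Number(bitsText);
  if (bits > 32) return false;

  const a = ipv4ToInt(addr);
  const b = ipv4ToInt(base);
  if (a === null || b === null) return false;

  // Two addresses share a /bits prefix when they land in the same
  // block of 2^(32 - bits) consecutive addresses.
  const blockSize = Math.pow(2, 32 - bits);
  return Math.floor(a / blockSize) === Math.floor(b / blockSize);
}

// The three RFC 5737 documentation blocks listed in the catalog.
const docBlocks = ["192.0.2.0/24", "198.51.100.0/24", "203.0.113.0/24"];
for (const ip of ["192.0.2.17", "203.0.113.255", "192.0.3.1"]) {
  console.log(ip, docBlocks.some((c) => ipv4InCidrSketch(ip, c)));
}
```

Malformed input returns `false` here, which in a `scan` context would surface the value as out of range rather than silently passing it.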
package/dist/catalog.js.map ADDED
@@ -0,0 +1 @@
1
+ {"version":3,"file":"catalog.js","sourceRoot":"","sources":["../src/catalog.ts"],"names":[],"mappings":"AAeA,OAAO,EAAE,UAAU,EAAE,YAAY,EAAE,MAAM,UAAU,CAAC;AAEpD,MAAM,CAAC,MAAM,eAAe,GAAG,OAAO,CAAC;AAiCvC,+EAA+E;AAC/E,4EAA4E;AAC5E,4EAA4E;AAC5E,MAAM,cAAc,GAClB,oGAAoG,CAAC;AACvG,MAAM,yBAAyB,GAC7B,6IAA6I,CAAC;AAChJ,MAAM,gBAAgB,GACpB,wJAAwJ,CAAC;AAC3J,MAAM,UAAU,GACd,sJAAsJ,CAAC;AAEzJ,MAAM,eAAe,GAAG,CAAC,aAAa,EAAE,aAAa,EAAE,aAAa,CAAU,CAAC;AAC/E,MAAM,YAAY,GAAG,CAAC,MAAM,EAAE,SAAS,EAAE,SAAS,EAAE,WAAW,CAAU,CAAC;AAC1E,MAAM,cAAc,GAAG,CAAC,cAAc,EAAE,iBAAiB,EAAE,gBAAgB,CAAU,CAAC;AACtF,MAAM,cAAc,GAAG,CAAC,eAAe,CAAU,CAAC;AAElD,2FAA2F;AAC3F,MAAM,iBAAiB,GAAG;IACxB,kBAAkB,EAAE,6BAA6B;IACjD,kBAAkB,EAAE,OAAO;IAC3B,kBAAkB,EAAE,aAAa;IACjC,kBAAkB,EAAE,aAAa;IACjC,kBAAkB,EAAE,aAAa;IACjC,kBAAkB,EAAE,wBAAwB;IAC5C,iBAAiB,EAAE,mBAAmB;IACtC,iBAAiB,EAAE,mBAAmB;IACtC,kBAAkB,EAAE,WAAW;IAC/B,kBAAkB,EAAE,MAAM;CAClB,CAAC;AAEX,MAAM,CAAC,MAAM,OAAO,GAA4B;IAC9C;QACE,KAAK,EAAE,OAAO;QACd,IAAI,EAAE,mBAAmB;QACzB,QAAQ,EAAE,4FAA4F;QACtG,WAAW,EAAE,wGAAwG;QACrH,KAAK,EAAE,cAAc;QACrB,QAAQ,EAAE,EAAE,IAAI,EAAE,cAAc,EAAE,OAAO,EAAE,eAAe,EAAE,YAAY,EAAE,YAAY,EAAE;KACzF;IACD;QACE,KAAK,EAAE,QAAQ;QACf,IAAI,EAAE,mBAAmB;QACzB,QAAQ,EAAE,2CAA2C;QACrD,WAAW,EAAE,qDAAqD;QAClE,KAAK,EAAE,cAAc;QACrB,QAAQ,EAAE,EAAE,IAAI,EAAE,SAAS,EAAE,OAAO,EAAE,eAAe,EAAE,YAAY,EAAE,YAAY,EAAE;KACpF;IACD;QACE,KAAK,EAAE,MAAM;QACb,IAAI,EAAE,mBAAmB;QACzB,QAAQ,EAAE,qDAAqD;QAC/D,WAAW,EAAE,+HAA+H;QAC5I,KAAK,EAAE,cAAc;QACrB,QAAQ,EAAE,EAAE,IAAI,EAAE,YAAY,EAAE,KAAK,EAAE,cAAc,EAAE;KACxD;IACD;QACE,KAAK,EAAE,MAAM;QACb,IAAI,EAAE,mBAAmB;QACzB,QAAQ,EAAE,oDAAoD;QAC9D,WAAW,EAAE,0DAA0D;QACvE,KAAK,EAAE,cAAc;QACrB,QAAQ,EAAE,EAAE,IAAI,EAAE,YAAY,EAAE,KAAK,EAAE,cAAc,EAAE;KACxD;IACD;QACE,KAAK,EAAE,OAAO;QACd,IAAI,EAAE,qBAAqB;QAC3B,QAAQ,EAAE,uEAAuE;QACjF,WAAW,EAAE,iMAAiM;QAC9M,KAAK,EAAE,yBAAyB;QAChC,QAAQ,EAAE,EAAE,IAAI,EAAE,YAAY,EAAE,iBAAiB,EAAE,KAAK,EAAE,eAAe,EAAE,GAAG,EAAE,aAAa,EAAE,GAAG,EAAE;KACrG;IACD;QACE,KAAK,EAAE,KAAK;QACZ,IAAI,EAAE,qBAAqB;QAC3B,QAAQ,EAAE,oKAAoK;QAC9K,WAAW
,EAAE,qJAAqJ;QAClK,KAAK,EAAE,yBAAyB;QAChC,QAAQ,EAAE;YACR,IAAI,EAAE,YAAY;YAClB,YAAY,EAAE,CAAC,KAAK,EAAE,KAAK,CAAC;YAC5B,cAAc,EAAE,GAAG;YACnB,cAAc,EAAE,GAAG;YACnB,YAAY,EAAE,IAAI;YAClB,aAAa,EAAE,MAAM;SACtB;KACF;IACD;QACE,KAAK,EAAE,YAAY;QACnB,IAAI,EAAE,sBAAsB;QAC5B,QAAQ,EAAE,gGAAgG;QAC1G,WAAW,EAAE,0KAA0K;QACvL,KAAK,EAAE,gBAAgB;QACvB,QAAQ,EAAE,EAAE,IAAI,EAAE,iBAAiB,EAAE,OAAO,EAAE,iBAAiB,EAAE;KAClE;IACD;QACE,KAAK,EAAE,WAAW;QAClB,IAAI,EAAE,mBAAmB;QACzB,QAAQ,EAAE,gEAAgE;QAC1E,WAAW,EAAE,2EAA2E;QACxF,KAAK,EAAE,UAAU;QACjB,QAAQ,EAAE,EAAE,IAAI,EAAE,WAAW,EAAE,OAAO,EAAE,0BAA0B,EAAE;KACrE;IACD;QACE,KAAK,EAAE,UAAU;QACjB,IAAI,EAAE,mBAAmB;QACzB,QAAQ,EAAE,gEAAgE;QAC1E,WAAW,EAAE,gDAAgD;QAC7D,KAAK,EAAE,UAAU;QACjB,QAAQ,EAAE,EAAE,IAAI,EAAE,WAAW,EAAE,OAAO,EAAE,yBAAyB,EAAE;KACpE;IACD;QACE,KAAK,EAAE,UAAU;QACjB,IAAI,EAAE,mBAAmB;QACzB,QAAQ,EAAE,gEAAgE;QAC1E,WAAW,EAAE,8CAA8C;QAC3D,KAAK,EAAE,UAAU;QACjB,QAAQ,EAAE,EAAE,IAAI,EAAE,WAAW,EAAE,OAAO,EAAE,uBAAuB,EAAE;KAClE;IACD;QACE,KAAK,EAAE,eAAe;QACtB,IAAI,EAAE,mBAAmB;QACzB,QAAQ,EAAE,wEAAwE;QAClF,WAAW,EAAE,8DAA8D;QAC3E,KAAK,EAAE,UAAU;QACjB,QAAQ,EAAE,EAAE,IAAI,EAAE,WAAW,EAAE,OAAO,EAAE,qCAAqC,EAAE;KAChF;IACD;QACE,KAAK,EAAE,UAAU;QACjB,IAAI,EAAE,mBAAmB;QACzB,QAAQ,EAAE,oEAAoE;QAC9E,WAAW,EAAE,oDAAoD;QACjE,KAAK,EAAE,UAAU;QACjB,QAAQ,EAAE,EAAE,IAAI,EAAE,WAAW,EAAE,OAAO,EAAE,qBAAqB,EAAE;KAChE;CACF,CAAC;AAEF,MAAM,QAAQ,GAAG,IAAI,GAAG,CAA0B,OAAO,CAAC,GAAG,CAAC,CAAC,CAAC,EAAE,EAAE,CAAC,CAAC,CAAC,CAAC,KAAK,EAAE,CAAC,CAAC,CAAC,CAAC,CAAC;AAEpF,kFAAkF;AAClF,MAAM,UAAU,QAAQ,CAAC,KAAgB;IACvC,MAAM,KAAK,GAAG,QAAQ,CAAC,GAAG,CAAC,KAAK,CAAC,CAAC;IAClC,IAAI,KAAK,KAAK,SAAS;QAAE,MAAM,IAAI,KAAK,CAAC,oCAAoC,KAAK,EAAE,CAAC,CAAC;IACtF,OAAO,KAAK,CAAC;AACf,CAAC;AAED,SAAS,gBAAgB,CAAC,MAAc,EAAE,OAA0B,EAAE,IAAuB;IAC3F,MAAM,CAAC,GAAG,MAAM,CAAC,WAAW,EAAE,CAAC;IAC/B,2EAA2E;IAC3E,2EAA2E;IAC3E,IAAI,OAAO,CAAC,IAAI,CAAC,CAAC,EAAE,EAAE,EAAE,CAAC,CAAC,KAAK,EAAE,IAAI,CAAC,CAAC,QAAQ,CAAC,IAAI,EAAE,EAAE,CAAC,CAAC;QAAE,OAAO,IAAI,CAAC;IACxE,OAAO,IAAI,CAAC,IAAI,CAAC,CAAC,CAAC,EAAE,EAAE,CAAC,CAAC,KAAK,CAAC,
IAAI,CAAC,CAAC,QAAQ,CAAC,IAAI,CAAC,EAAE,CAAC,CAAC,CAAC;AAC1D,CAAC;AAED;;;;GAIG;AACH,MAAM,UAAU,UAAU,CAAC,KAAmB,EAAE,KAAa;IAC3D,MAAM,CAAC,GAAG,KAAK,CAAC,QAAQ,CAAC;IACzB,QAAQ,CAAC,CAAC,IAAI,EAAE,CAAC;QACf,KAAK,cAAc,CAAC,CAAC,CAAC;YACpB,MAAM,EAAE,GAAG,KAAK,CAAC,WAAW,CAAC,GAAG,CAAC,CAAC;YAClC,IAAI,EAAE,GAAG,CAAC;gBAAE,OAAO,KAAK,CAAC;YACzB,OAAO,gBAAgB,CAAC,KAAK,CAAC,KAAK,CAAC,EAAE,GAAG,CAAC,CAAC,EAAE,CAAC,CAAC,OAAO,EAAE,CAAC,CAAC,YAAY,CAAC,CAAC;QAC1E,CAAC;QACD,KAAK,SAAS;YACZ,OAAO,gBAAgB,CAAC,KAAK,EAAE,CAAC,CAAC,OAAO,EAAE,CAAC,CAAC,YAAY,CAAC,CAAC;QAC5D,KAAK,YAAY;YACf,OAAO,CAAC,CAAC,KAAK,CAAC,IAAI,CAAC,CAAC,CAAC,EAAE,EAAE,CAAC,UAAU,CAAC,KAAK,EAAE,CAAC,CAAC,CAAC,CAAC;QACnD,KAAK,YAAY;YACf,OAAO,CAAC,CAAC,KAAK,CAAC,IAAI,CAAC,CAAC,CAAC,EAAE,EAAE,CAAC,YAAY,CAAC,KAAK,EAAE,CAAC,CAAC,CAAC,CAAC;QACrD,KAAK,YAAY,CAAC,CAAC,CAAC;YAClB,MAAM,MAAM,GAAG,KAAK,CAAC,OAAO,CAAC,KAAK,EAAE,EAAE,CAAC,CAAC;YACxC,IAAI,MAAM,CAAC,MAAM,GAAG,CAAC;gBAAE,OAAO,KAAK,CAAC;YACpC,MAAM,KAAK,GAAG,MAAM,CAAC,KAAK,CAAC,CAAC,CAAC,CAAC,CAAC;YAC/B,MAAM,GAAG,GAAG,KAAK,CAAC,KAAK,CAAC,CAAC,EAAE,CAAC,CAAC,CAAC;YAC9B,MAAM,IAAI,GAAG,MAAM,CAAC,KAAK,CAAC,KAAK,CAAC,CAAC,CAAC,CAAC,CAAC;YACpC,OAAO,GAAG,KAAK,CAAC,CAAC,iBAAiB,IAAI,IAAI,IAAI,CAAC,CAAC,eAAe,IAAI,IAAI,IAAI,CAAC,CAAC,aAAa,CAAC;QAC7F,CAAC;QACD,KAAK,YAAY,CAAC,CAAC,CAAC;YAClB,MAAM,MAAM,GAAG,KAAK,CAAC,OAAO,CAAC,KAAK,EAAE,EAAE,CAAC,CAAC;YACxC,IAAI,MAAM,CAAC,MAAM,KAAK,CAAC;gBAAE,OAAO,KAAK,CAAC;YACtC,MAAM,IAAI,GAAG,MAAM,CAAC,KAAK,CAAC,CAAC,EAAE,CAAC,CAAC,CAAC;YAChC,MAAM,KAAK,GAAG,MAAM,CAAC,KAAK,CAAC,CAAC,EAAE,CAAC,CAAC,CAAC;YACjC,MAAM,MAAM,GAAG,MAAM,CAAC,KAAK,CAAC,CAAC,CAAC,CAAC;YAC/B,MAAM,OAAO,GAAG,MAAM,CAAC,IAAI,CAAC,CAAC;YAC7B,MAAM,WAAW,GACf,CAAC,CAAC,YAAY,CAAC,QAAQ,CAAC,IAAI,CAAC,IAAI,CAAC,OAAO,IAAI,CAAC,CAAC,cAAc,IAAI,OAAO,IAAI,CAAC,CAAC,cAAc,CAAC,CAAC;YAChG,OAAO,WAAW,IAAI,KAAK,KAAK,CAAC,CAAC,YAAY,IAAI,MAAM,KAAK,CAAC,CAAC,aAAa,CAAC;QAC/E,CAAC;QACD,KAAK,iBAAiB,CAAC,CAAC,CAAC;YACvB,MAAM,MAAM,GAAG,KAAK,CAAC,OAAO,CAAC,KAAK,EAAE,EAAE,CAAC,CAAC;YACxC,OAAO,CAAC,CAAC,OAAO,CAAC,IA
AI,CAAC,CAAC,CAAC,EAAE,EAAE,CAAC,CAAC,CAAC,OAAO,CAAC,KAAK,EAAE,EAAE,CAAC,KAAK,MAAM,CAAC,CAAC;QAChE,CAAC;QACD,KAAK,WAAW;YACd,OAAO,IAAI,MAAM,CAAC,CAAC,CAAC,OAAO,CAAC,CAAC,IAAI,CAAC,KAAK,CAAC,CAAC;IAC7C,CAAC;AACH,CAAC;AAED;;;GAGG;AACH,MAAM,UAAU,mBAAmB,CAAC,KAAa;IAC/C,OAAO,YAAY,CAAC,IAAI,CAAC,KAAK,CAAC,IAAI,cAAc,CAAC,IAAI,CAAC,KAAK,CAAC,CAAC;AAChE,CAAC"}
package/dist/cli.d.ts ADDED
@@ -0,0 +1,3 @@
+ #!/usr/bin/env node
+ export {};
+ //# sourceMappingURL=cli.d.ts.map
package/dist/cli.d.ts.map ADDED
@@ -0,0 +1 @@
+ {"version":3,"file":"cli.d.ts","sourceRoot":"","sources":["../src/cli.ts"],"names":[],"mappings":""}