data_redactor 0.11.0-x86_64-linux → 0.14.0-x86_64-linux

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 79d2dc2a7a24145626e06741bdec238660b21d2cc2268e824878238e62d8331b
- data.tar.gz: 8e15d8faddbf76cf4d339efe14b88d401e91b4f67ed911682a7dad75b29ca384
+ metadata.gz: 86ed390986b76cb2bcb48a80c322be42b8bd91dcd5b9afb8299a6ab4c0dd9419
+ data.tar.gz: ca7260eff77b03d15392b0c763e56ee7b5bf146eb7f0b7597f78f41d2a6dfe08
  SHA512:
- metadata.gz: 37103c8e31cc7271ba2bc0903ceb90fc09d3e2f39441264249e3adb16ff40ed3ffcc713f31f78134fd2f7df3f1a635bfc01e625d52c0a978b405793ec3d18082
- data.tar.gz: 4aa60db835020332e65f1a24d93b584143a203695469a9a94b9c3974f12803146d8778dde4e62ce0d020f740417255c3385da120ccec64aa489cce7be21c5c34
+ metadata.gz: a683277146fbb1e19caefffe829a9de090d48b181f0d15cc98a625509f47da1a7cf4f7f69df165f4b077a321f2ff1ee125ef4881e5746949b62f6db87269e022
+ data.tar.gz: ecb6d0acfcf1a9b9c2c79f1d0e016ae4ad13d0c938e31c9d97f0b3bdc8f323ac7c7c1d1a2828067b3a5b9f33895e0e9bc3bb6a69bd9c7f324e2055cdeb082ee8
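For readers who want to check a downloaded copy against these digests, the comparison can be sketched in a few lines of Ruby. This is a minimal sketch and not part of the gem: `verify_checksums` is a hypothetical helper, and it assumes the `.gem` archive has already been unpacked so that `metadata.gz` and `data.tar.gz` sit together in one directory and the decompressed text of `checksums.yaml` is available as a string.

```ruby
require "digest"
require "yaml"

# Hypothetical helper (not part of data_redactor): compares the SHA256 entries
# recorded in checksums.yaml against the files found in `dir`.
# Returns a Hash of file name => true/false.
def verify_checksums(checksums_yaml, dir)
  recorded = YAML.safe_load(checksums_yaml).fetch("SHA256")
  recorded.to_h do |name, expected|
    actual = Digest::SHA256.file(File.join(dir, name)).hexdigest
    [name, actual == expected.to_s]
  end
end
```

Every value in the returned Hash should be `true` for an intact download; a `false` entry means the file on disk does not match the digest recorded for that version.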
data/CHANGELOG.md CHANGED
@@ -7,6 +7,49 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
  
  ## [Unreleased]
  
+ ## [0.14.0] - 2026-06-17
+
+ ### Added
+ - **Key-name-anchored secret redaction** (`:credentials`). A new pattern tier
+ redacts a secret by the *name of the field it is assigned to*, for values with
+ no distinctive shape of their own — the primary case being an `.env` file or
+ config blob passed through the redactor. Anchored on the key words `password`,
+ `passwd`, `pwd`, `secret`, `token`, `api_key`, `apikey`, `access_key`, and
+ `client_secret` (case-insensitive), followed by `=` or `:` (dotenv and YAML
+ styles), with quoted (`"..."`/`'...'`) or unquoted (≥6 chars) values. Only the
+ **value** is redacted; the key is kept so logs stay greppable
+ (`PASSWORD=[REDACTED]`). Compound key names match whether the secret word is a
+ prefix or suffix segment (`POSTGRES_DB_PASSWORD=`, `PASSWORD_POSTGRES=`).
+ Requires the assignment separator, so the word in prose ("reset your password")
+ is not a false positive.
+ - `examples/` directory with runnable, copy-pasteable usage scripts for every
+ feature (core redaction, scan/dry-run, custom patterns, deep/JSON traversal,
+ and the Logger / Rack / Rails / LLM integrations). Repo-only — not packaged in
+ the gem. Linked from the README.
+
+ ## [0.13.0] - 2026-06-13
+
+ ### Changed
+ - **Custom-pattern registration is now thread-safe.** `add_pattern`,
+ `remove_pattern`, and `clear_custom_patterns!` are guarded by a mutex shared
+ with the `redact`/`scan` custom-pattern loop, so patterns may be registered,
+ removed, or cleared from any thread at any time — including at runtime from a
+ request handler — without coordinating with in-flight redactions. The previous
+ "register custom patterns at boot only" caveat is lifted. (The C extension now
+ links `-lpthread` on glibc; no-op on musl and macOS where pthread is in libc.)
+ - **`redact` releases the GVL for large inputs.** The v19 engine's per-scan
+ mutable state (NFA scratch and the lazy DFA cache) moved into per-thread
+ storage, making the engine re-entrant. `redact` now releases the GVL
+ (`rb_thread_call_without_gvl`) around the built-in scan for inputs above a few
+ KB, so a large redaction on one thread no longer blocks other Ruby threads.
+ Small inputs keep the GVL. No public API change; output is byte-for-byte
+ identical (verified by a differential gate over ~6000 inputs). The per-thread
+ DFA cache's allocation floor was tuned so this adds ~0.86 MB per scanning
+ thread (down from a naive ~3.2 MB), with no throughput change. Per-thread scan
+ state is freed at thread exit (via a `pthread_key` destructor), so processes
+ that churn many short-lived scanning threads do not accumulate dead caches —
+ RSS stays flat across thousands of threads.
+
  ## [0.11.0] - 2026-06-10
  
  ### Added
@@ -232,7 +275,9 @@ features as 0.7.1 plus the pipeline fix.
  - `DataRedactor.redact(text)` module function returning the input with every match replaced by `[REDACTED]`.
  - RSpec suite with one example per pattern.
  
- [Unreleased]: https://github.com/danielefrisanco/data_redactor/compare/v0.11.0...HEAD
+ [Unreleased]: https://github.com/danielefrisanco/data_redactor/compare/v0.14.0...HEAD
+ [0.14.0]: https://github.com/danielefrisanco/data_redactor/compare/v0.13.0...v0.14.0
+ [0.13.0]: https://github.com/danielefrisanco/data_redactor/compare/v0.11.0...v0.13.0
  [0.11.0]: https://github.com/danielefrisanco/data_redactor/compare/v0.10.1...v0.11.0
  [0.10.1]: https://github.com/danielefrisanco/data_redactor/compare/v0.10.0...v0.10.1
  [0.10.0]: https://github.com/danielefrisanco/data_redactor/compare/v0.9.0...v0.10.0
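The gem's own implementation of the key-name-anchored tier from the 0.14.0 entry is not visible in this diff. As a rough illustration of the matching rules that entry describes, the sketch below approximates them with a plain Ruby `Regexp`. `SECRET_WORDS`, `KEY_VALUE`, and `redact_assignments` are hypothetical names, and details such as tolerating spaces around the separator or dropping the quotes from a redacted value are assumptions rather than documented behaviour.

```ruby
# Key words listed in the 0.14.0 changelog entry.
SECRET_WORDS = %w[
  password passwd pwd secret token api_key apikey access_key client_secret
].freeze

# Illustrative approximation only, not the gem's pattern.
KEY_VALUE = /
  \b((?:[A-Za-z0-9]+_)*(?:#{SECRET_WORDS.join("|")})(?:_[A-Za-z0-9]+)*)  # key, optionally compound
  ([^\S\n]*[=:][^\S\n]*)                                                  # separator with same-line spacing
  ("[^"\n]*"|'[^'\n]*'|[^\s"']{6,})                                       # quoted value or long unquoted value
/xi

# Replaces only the value; the key and separator are kept so output stays greppable.
def redact_assignments(text, placeholder: "[REDACTED]")
  text.gsub(KEY_VALUE) { "#{$1}#{$2}#{placeholder}" }
end

puts redact_assignments("DB_HOST=localhost\nPOSTGRES_DB_PASSWORD=hunter2hunter2\n")
puts redact_assignments("please reset your password today")
```

Because the pattern requires `=` or `:` after the key, the second call leaves the prose sentence untouched, while the first redacts only the value on the `POSTGRES_DB_PASSWORD` line.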
data/README.md CHANGED
@@ -19,7 +19,7 @@ It ships **88 built-in patterns** across 15+ countries, grouped into tags
  (`:credentials`, `:financial`, `:contact`, ...) so you can redact only what you
  care about. Beyond plain strings it can walk nested Hashes, Arrays, and JSON,
  audit a payload without mutating it (`scan`), and plug into Logger, Rails, and
- Rack. You can also register your own patterns at boot.
+ Rack. You can also register your own patterns at boot or at runtime from any thread.
  
  ### Use cases
  
@@ -46,6 +46,12 @@ DataRedactor.redact(text)
  # => "User CF is [REDACTED] and key is [REDACTED]"
  ```
  
+ Prefer runnable code? The [`examples/`](examples/) directory has self-contained,
+ copy-pasteable scripts for every feature below — core redaction, scan/dry-run,
+ custom patterns, deep/JSON traversal, and the Logger / Rack / Rails / LLM
+ integrations. Run any of them with `bundle exec ruby examples/<name>.rb` (see
+ [examples/README.md](examples/README.md)).
+
  ### Filtering by tag or pattern name
  
  `only:` and `except:` both accept a single value or an Array, mixing **Symbols** (tag names) and **Strings** (specific pattern names).
@@ -161,7 +167,7 @@ DataRedactor.redact_json("not json") # => JSON::ParserError
  
  ### Custom patterns
  
- Teams often have internal IDs that the gem can't ship. Register them at boot:
+ Teams often have internal IDs that the gem can't ship. Register them at boot — or at runtime from any thread (registration is thread-safe, see [Thread safety](#thread-safety)):
  
  ```ruby
  # String (POSIX ERE) or Regexp — both accepted
@@ -415,6 +421,16 @@ redactor/
  │ └── tags.h # TAG_* bit constants
  ├── spec/
  │ └── data_redactor_spec.rb # RSpec tests — at least one example per pattern, plus filter / placeholder / custom-pattern coverage
+ ├── examples/ # Repo-only runnable usage scripts (not packaged in the gem)
+ │ ├── README.md # Index + how to run
+ │ ├── basic_redact.rb # redact, tag filters, placeholder modes
+ │ ├── scan_report.rb # scan dry-run with byte offsets
+ │ ├── custom_pattern.rb # add_pattern + name_pattern
+ │ ├── deep_and_json.rb # redact_deep / redact_json
+ │ ├── logger.rb # Logger::Formatter integration
+ │ ├── rack_middleware.rb # Rack middleware (body + headers)
+ │ ├── rails_filter.rb # filter_parameters adapter
+ │ └── llm_payload.rb # Claude / OpenAI message + response redaction
  ├── benchmark/ # Repo-only perf scripts (not packaged in the gem)
  │ ├── README.md # How to run, what each script measures
  │ ├── support/corpus.rb # Shared payload builders + pure-Ruby baseline redactor
@@ -571,9 +587,9 @@ All C-side buffers are heap-allocated with `malloc`/`strdup` and freed before th
  
  ## Thread safety
  
- `DataRedactor.redact` and `DataRedactor.scan` are safe to call concurrently from multiple threads. The v19 engine holds MRI's GVL for the duration of each call (no `rb_thread_call_without_gvl`), so concurrent calls are serialised by the GVL. Each call allocates its own working buffers; built-in engine state is read-only after `mm_init()` at load time.
+ `DataRedactor.redact` and `DataRedactor.scan` are safe to call concurrently from multiple threads. The v19 engine keeps its compiled patterns immutable and shared (read-only after `mm_init()` at load time) and all per-scan mutable state — NFA scratch and the lazy DFA cache — in per-thread storage, so concurrent scans never touch each other's state. For inputs above a few KB, `redact` **releases the GVL** (`rb_thread_call_without_gvl`) around the built-in scan, so a large redaction on one thread no longer blocks other Ruby threads from running. Small inputs keep the GVL (the release bookkeeping would cost more than the scan). Each call allocates its own working buffers. A thread's per-thread state is freed automatically when the thread exits, so processes that spawn many short-lived scanning threads do not accumulate memory.
  
- `DataRedactor.add_pattern`, `remove_pattern`, and `clear_custom_patterns!` mutate a shared dynamic array and are **not** thread-safe. Register custom patterns once at boot (before spawning worker threads or forking) and they will be visible (read-only) to every subsequent `redact`/`scan` call.
+ `DataRedactor.add_pattern`, `remove_pattern`, and `clear_custom_patterns!` are also thread-safe: the shared custom-pattern array is guarded by a mutex that writers take around the mutation and `redact`/`scan` take around their custom-pattern loop. You can register, remove, or clear custom patterns from any thread at any time, including from request handlers in a running server, without coordinating with in-flight redactions. (Registration is still a rare operation; the lock is uncontended in practice.)
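The locking scheme described above lives in the gem's C extension, but its shape can be illustrated in pure Ruby. The sketch below is an analogy only: `PatternRegistry` and its methods are hypothetical stand-ins, a Ruby `Mutex` takes the place of the extension's lock, and holding that lock for the whole custom-pattern loop mirrors what the README text says `redact`/`scan` do.

```ruby
# Hypothetical pure-Ruby analogy of a mutex-guarded custom-pattern list.
# It is not the gem's API and not its implementation.
class PatternRegistry
  def initialize
    @mutex = Mutex.new
    @patterns = {}
  end

  # Writers take the lock only around the mutation itself.
  def add_pattern(name, regexp)
    @mutex.synchronize { @patterns[name] = regexp }
  end

  def remove_pattern(name)
    @mutex.synchronize { @patterns.delete(name) }
  end

  def clear!
    @mutex.synchronize { @patterns.clear }
  end

  def size
    @mutex.synchronize { @patterns.size }
  end

  # Readers hold the same lock across the whole loop, so a concurrent
  # add, remove, or clear can never change the list mid-iteration.
  def redact(text, placeholder: "[REDACTED]")
    @mutex.synchronize do
      @patterns.each_value.reduce(text) { |out, re| out.gsub(re, placeholder) }
    end
  end
end

registry = PatternRegistry.new
registry.add_pattern("employee_id", /EMP-\d{6}/)
puts registry.redact("ticket opened by EMP-123456")
```

Sharing one lock between writers and the read loop is the simplest design that stays correct; it serialises redactions against each other only for the custom-pattern pass, which matches the README's remark that the lock is uncontended in practice.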
  
  ## Versioning
  
Binary file
Binary file
Binary file
Binary file
Binary file
Binary file
@@ -1,4 +1,4 @@
  module DataRedactor
  # Current gem version. Follows {https://semver.org Semantic Versioning 2.0.0}.
- VERSION = "0.11.0"
+ VERSION = "0.14.0"
  end
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: data_redactor
  version: !ruby/object:Gem::Version
- version: 0.11.0
+ version: 0.14.0
  platform: x86_64-linux
  authors:
  - Daniele Frisanco