data_redactor 0.11.0-x86_64-linux-musl → 0.13.0-x86_64-linux-musl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: d3c4276232644060bfd836a260bfbb5388a73da647e2b606db3a525734096caa
- data.tar.gz: b19982bbc64f10b95dd12db0068b34dcdca687a32bc881c1754280f7c5b58f33
+ metadata.gz: 56e8ce3c962e337b03a7b8fab9da6eee08a5d3d65323425f7f9afc6105b68876
+ data.tar.gz: 0ed77d4a8620bb1f8e03c386a201a99640cd7539ed321dc4e5d310ab4ce3b58b
  SHA512:
- metadata.gz: 453b1e64a831352f4a01bdc665c3d0b5e08dee3e17ae16c430443ed3d9bcd583d1464fbb908f6d11430e0e161b1bcbe7b5b27c620222a75d41eaddfafe78c79d
- data.tar.gz: 51ff6ce5f94dec65ceed6b2101faf350d5f27b1678aaf2419ade92624ce5f378e5f323226319d4f9c1bf1860931498c49769be321187fb9ac7062c18953f04e9
+ metadata.gz: b0b64c56150d14b34adcd81e6ce0204d4cc486aeb07cfbaafe067c8738a5233011a73aedaf6e4702f5dd309b13fce1844ec8d91a7f15704783053a4c60825b6d
+ data.tar.gz: b4cf6871059703476b2906984aec3d7dabb7ec86cf7912bbee33ed23dab1f7e81f988e82253b7b2c0af706cdc6ee53051822500d863e5bb06c2e9f50286f50ae
data/CHANGELOG.md CHANGED
@@ -7,6 +7,29 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
  ## [Unreleased]
 
+ ## [0.13.0] - 2026-06-13
+
+ ### Changed
+ - **Custom-pattern registration is now thread-safe.** `add_pattern`,
+ `remove_pattern`, and `clear_custom_patterns!` are guarded by a mutex shared
+ with the `redact`/`scan` custom-pattern loop, so patterns may be registered,
+ removed, or cleared from any thread at any time — including at runtime from a
+ request handler — without coordinating with in-flight redactions. The previous
+ "register custom patterns at boot only" caveat is lifted. (The C extension now
+ links `-lpthread` on glibc; no-op on musl and macOS where pthread is in libc.)
+ - **`redact` releases the GVL for large inputs.** The v19 engine's per-scan
+ mutable state (NFA scratch and the lazy DFA cache) moved into per-thread
+ storage, making the engine re-entrant. `redact` now releases the GVL
+ (`rb_thread_call_without_gvl`) around the built-in scan for inputs above a few
+ KB, so a large redaction on one thread no longer blocks other Ruby threads.
+ Small inputs keep the GVL. No public API change; output is byte-for-byte
+ identical (verified by a differential gate over ~6000 inputs). The per-thread
+ DFA cache's allocation floor was tuned so this adds ~0.86 MB per scanning
+ thread (down from a naive ~3.2 MB), with no throughput change. Per-thread scan
+ state is freed at thread exit (via a `pthread_key` destructor), so processes
+ that churn many short-lived scanning threads do not accumulate dead caches —
+ RSS stays flat across thousands of threads.
+
  ## [0.11.0] - 2026-06-10
 
  ### Added
@@ -232,7 +255,8 @@ features as 0.7.1 plus the pipeline fix.
  - `DataRedactor.redact(text)` module function returning the input with every match replaced by `[REDACTED]`.
  - RSpec suite with one example per pattern.
 
- [Unreleased]: https://github.com/danielefrisanco/data_redactor/compare/v0.11.0...HEAD
+ [Unreleased]: https://github.com/danielefrisanco/data_redactor/compare/v0.13.0...HEAD
+ [0.13.0]: https://github.com/danielefrisanco/data_redactor/compare/v0.11.0...v0.13.0
  [0.11.0]: https://github.com/danielefrisanco/data_redactor/compare/v0.10.1...v0.11.0
  [0.10.1]: https://github.com/danielefrisanco/data_redactor/compare/v0.10.0...v0.10.1
  [0.10.0]: https://github.com/danielefrisanco/data_redactor/compare/v0.9.0...v0.10.0
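The 0.13.0 changelog entry above describes a single mutex shared by the pattern writers (`add_pattern`, `remove_pattern`, `clear_custom_patterns!`) and the `redact`/`scan` custom-pattern loop. The pure-Ruby sketch below models only that locking discipline, as an illustration under stated assumptions: `CustomPatternRegistry` is a hypothetical class that borrows the gem's method names, the real guard lives in the C extension, and nothing here reflects the built-in engine or GVL release.

```ruby
# Hypothetical model of the 0.13.0 locking scheme, not DataRedactor's C code:
# one Mutex guards the custom-pattern array; writers hold it while mutating it
# and the redaction loop holds it while iterating over it.
class CustomPatternRegistry
  REPLACEMENT = "[REDACTED]"

  def initialize
    @lock = Mutex.new
    @patterns = []
  end

  # Writers: every mutation happens entirely inside the lock.
  def add_pattern(regexp)
    @lock.synchronize { @patterns << regexp }
    self
  end

  def remove_pattern(regexp)
    @lock.synchronize { @patterns.delete(regexp) }
    self
  end

  def clear_custom_patterns!
    @lock.synchronize { @patterns.clear }
    self
  end

  # Reader: the whole custom-pattern loop runs under the same lock, so a
  # concurrent writer can never change the array mid-iteration.
  def redact(text)
    @lock.synchronize do
      @patterns.reduce(text) { |acc, re| acc.gsub(re, REPLACEMENT) }
    end
  end
end

# Usage: register patterns from several threads while other threads redact.
registry = CustomPatternRegistry.new
writers = 10.times.map { |i| Thread.new { registry.add_pattern(/EMP-#{i}\d{4}/) } }
readers = 5.times.map { Thread.new { registry.redact("badge EMP-01234") } }
(writers + readers).each(&:join)
puts registry.redact("badge EMP-01234, manager EMP-95678")
```

Holding one lock across the whole read loop keeps the model simple; a reader/writer lock would be the usual refinement if many redactions had to run this loop in parallel.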
data/README.md CHANGED
@@ -19,7 +19,7 @@ It ships **88 built-in patterns** across 15+ countries, grouped into tags
  (`:credentials`, `:financial`, `:contact`, ...) so you can redact only what you
  care about. Beyond plain strings it can walk nested Hashes, Arrays, and JSON,
  audit a payload without mutating it (`scan`), and plug into Logger, Rails, and
- Rack. You can also register your own patterns at boot.
+ Rack. You can also register your own patterns at boot or at runtime from any thread.
 
  ### Use cases
 
@@ -161,7 +161,7 @@ DataRedactor.redact_json("not json") # => JSON::ParserError
 
  ### Custom patterns
 
- Teams often have internal IDs that the gem can't ship. Register them at boot:
+ Teams often have internal IDs that the gem can't ship. Register them at boot — or at runtime from any thread (registration is thread-safe, see [Thread safety](#thread-safety)):
 
  ```ruby
  # String (POSIX ERE) or Regexp — both accepted
@@ -571,9 +571,9 @@ All C-side buffers are heap-allocated with `malloc`/`strdup` and freed before th
 
  ## Thread safety
 
- `DataRedactor.redact` and `DataRedactor.scan` are safe to call concurrently from multiple threads. The v19 engine holds MRI's GVL for the duration of each call (no `rb_thread_call_without_gvl`), so concurrent calls are serialised by the GVL. Each call allocates its own working buffers; built-in engine state is read-only after `mm_init()` at load time.
+ `DataRedactor.redact` and `DataRedactor.scan` are safe to call concurrently from multiple threads. The v19 engine keeps its compiled patterns immutable and shared (read-only after `mm_init()` at load time) and all per-scan mutable state — NFA scratch and the lazy DFA cache — in per-thread storage, so concurrent scans never touch each other's state. For inputs above a few KB, `redact` **releases the GVL** (`rb_thread_call_without_gvl`) around the built-in scan, so a large redaction on one thread no longer blocks other Ruby threads from running. Small inputs keep the GVL (the release bookkeeping would cost more than the scan). Each call allocates its own working buffers. A thread's per-thread state is freed automatically when the thread exits, so processes that spawn many short-lived scanning threads do not accumulate memory.
 
- `DataRedactor.add_pattern`, `remove_pattern`, and `clear_custom_patterns!` mutate a shared dynamic array and are **not** thread-safe. Register custom patterns once at boot, before spawning worker threads or forking, and they will be visible (read-only) to every subsequent `redact`/`scan` call.
+ `DataRedactor.add_pattern`, `remove_pattern`, and `clear_custom_patterns!` are also thread-safe: the shared custom-pattern array is guarded by a mutex that writers take around the mutation and `redact`/`scan` take around their custom-pattern loop. You can register, remove, or clear custom patterns from any thread at any time, including from request handlers in a running server, without coordinating with in-flight redactions. (Registration is still a rare operation; the lock is uncontended in practice.)
 
  ## Versioning
 
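The README's new thread-safety paragraph rests on one split: compiled patterns are immutable and shared, while all per-scan mutable state lives in per-thread storage. The pure-Ruby sketch below illustrates that split under stated assumptions: `ReentrantScanner` and the `:scanner_scratch` key are hypothetical, `Thread#thread_variable_get`/`thread_variable_set` stand in for the C extension's per-thread storage, and GVL release is not modelled at all. One difference worth noting: the changelog says the gem frees its state at thread exit with a `pthread_key` destructor, whereas Ruby reclaims a thread variable only once the `Thread` object itself is garbage-collected.

```ruby
# Hypothetical model, not DataRedactor's engine: the compiled patterns are frozen
# and shared by all threads, while each thread lazily creates its own mutable
# scratch, so concurrent scans never write to one another's state.
class ReentrantScanner
  SCRATCH_KEY = :scanner_scratch # hypothetical per-thread slot name

  def initialize(patterns)
    @patterns = patterns.map(&:freeze).freeze # immutable, shared
  end

  def scan(text)
    scratch = thread_scratch
    scratch[:calls] += 1
    scratch[:matches].clear # reuse this thread's buffer across calls
    @patterns.each do |re|
      text.scan(re) { scratch[:matches] << Regexp.last_match(0) }
    end
    scratch[:matches].dup # hand back a copy; the buffer stays thread-private
  end

  private

  # Returns the calling thread's scratch, creating it on first use.
  def thread_scratch
    thread = Thread.current
    thread.thread_variable_get(SCRATCH_KEY) ||
      { calls: 0, matches: [] }.tap { |s| thread.thread_variable_set(SCRATCH_KEY, s) }
  end
end

# Usage: four threads share one scanner; each keeps its own call counter.
scanner = ReentrantScanner.new([/\d{3}-\d{4}/])
results = 4.times.map do
  Thread.new do
    3.times { scanner.scan("call 555-1234") }
    [scanner.scan("call 555-1234 or 555-9876"),
     Thread.current.thread_variable_get(ReentrantScanner::SCRATCH_KEY)[:calls]]
  end
end.map(&:value)
p results.first
```

Each thread reports four calls regardless of how the others interleave, because no thread ever reads or writes another thread's scratch.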
Binary file
Binary file
Binary file
Binary file
Binary file
Binary file
@@ -1,4 +1,4 @@
  module DataRedactor
  # Current gem version. Follows {https://semver.org Semantic Versioning 2.0.0}.
- VERSION = "0.11.0"
+ VERSION = "0.13.0"
  end
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: data_redactor
  version: !ruby/object:Gem::Version
- version: 0.11.0
+ version: 0.13.0
  platform: x86_64-linux-musl
  authors:
  - Daniele Frisanco