data_redactor 0.10.0-x86_64-linux → 0.10.1-x86_64-linux

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 8c68f704fce41662e1b7829085e9fadffa4931dae4323f528916897164cfdf5b
- data.tar.gz: 1d91d3f55059e579cf36967eb06294ed18e7a54c3c1396971a749d30570118e4
+ metadata.gz: d56efe6252d7abbcf870bbc1cc633836b64163970f9c7945cba87c8cfc8b428d
+ data.tar.gz: dbfb5bffe86b85cdd9b6938c1367eb29a6ff7aab9a7535d124231b9f5a649931
  SHA512:
- metadata.gz: c5df30474632d5026ca2a79e8be61fc020a4cb72cd7c3fb9333134d03a7d8430380185a0d209eabab33b2f27b6502ca7e9e34fb5c8fdd6df9ac24b2d71e33bd5
- data.tar.gz: b0fd12a2666dfa50e9dd18ecb3a6733ca712ea5f1a8ea31305bd0cf788fa766cba9720c3891deec67c3eace73dafca2cc5fc566712d8bdb627b2bc9eac818b81
+ metadata.gz: 13d1ee90ea2f24c7e0855b559fd5dab578ae8be7e9918a2fc575659a43cc6c4cc07445994c1f42a1f9bf4bb3518f29b3d2a1dcc925ac339511239db221d257f5
+ data.tar.gz: be502bc4bc1dfb6a1e68bb8d86f382a5c598507e7618bd6d13a608017363c7ffb3bdb7d88657676aa52adc3267d8a340e79f233087d59bea8405821231afcaa2
data/CHANGELOG.md CHANGED
@@ -7,6 +7,15 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
  ## [Unreleased]
 
+ ## [0.10.1] - 2026-06-10
+
+ ### Fixed
+ - **musl/Alpine load failure** — the `hashicorp_vault_batch_token` pattern used a
+ `{138,300}` interval whose upper bound exceeds POSIX `RE_DUP_MAX` (255). glibc
+ accepts it, but musl's `regcomp` rejects it ("Invalid contents of {}"), so the
+ native musl gem raised at load (`require "data_redactor"`) on Alpine. Capped the
+ bound at 255; tokens are still neutralized (prefix + 251+ chars redacted).
+
  ## [0.10.0] - 2026-06-09
 
  ### Changed
@@ -204,7 +213,9 @@ features as 0.7.1 plus the pipeline fix.
  - `DataRedactor.redact(text)` module function returning the input with every match replaced by `[REDACTED]`.
  - RSpec suite with one example per pattern.
 
- [Unreleased]: https://github.com/danielefrisanco/data_redactor/compare/v0.9.0...HEAD
+ [Unreleased]: https://github.com/danielefrisanco/data_redactor/compare/v0.10.1...HEAD
+ [0.10.1]: https://github.com/danielefrisanco/data_redactor/compare/v0.10.0...v0.10.1
+ [0.10.0]: https://github.com/danielefrisanco/data_redactor/compare/v0.9.0...v0.10.0
  [0.9.0]: https://github.com/danielefrisanco/data_redactor/compare/v0.8.0...v0.9.0
  [0.8.0]: https://github.com/danielefrisanco/data_redactor/compare/v0.7.2...v0.8.0
  [0.7.2]: https://github.com/danielefrisanco/data_redactor/compare/v0.7.1...v0.7.2
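The 0.10.1 changelog entry traces the Alpine load failure to a single interval bound above 255. A minimal sketch of a check that could flag such bounds before release, assuming pattern sources are available as plain strings. `oversized_intervals` and the sample pattern are hypothetical and not part of the gem, and the scan is naive: it does not skip escaped braces or braces inside bracket expressions.

```ruby
# Interval limit cited in the changelog entry for POSIX RE_DUP_MAX.
RE_DUP_MAX = 255

# Returns every {m}, {m,} or {m,n} interval in a pattern source whose
# bounds exceed the limit, rendered as it appears in the source.
def oversized_intervals(source, limit = RE_DUP_MAX)
  source.scan(/\{(\d+)(?:,(\d*))?\}/).filter_map do |min, max|
    bounds = [min, max].compact.reject(&:empty?).map(&:to_i)
    "{#{[min, max].compact.join(',')}}" if bounds.any? { |b| b > limit }
  end
end

# Placeholder pattern source for illustration only; the gem's actual
# hashicorp_vault_batch_token pattern is not shown in this diff.
sample = 'hvb\.[A-Za-z0-9_-]{138,300}'
puts oversized_intervals(sample).inspect
```

Running a check like this over every built-in pattern in CI would surface portability problems on any platform, rather than only when the musl build is loaded.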
@@ -523,7 +523,7 @@ All C-side buffers are heap-allocated with `malloc`/`strdup` and freed before th
 
  ## Thread safety
 
- `DataRedactor.redact` and `DataRedactor.scan` are safe to call concurrently from multiple threads. Built-in patterns are compiled into a static `regex_t` array at load time and never mutated afterward, and each call allocates its own working buffers. POSIX `regexec` is documented as thread-safe.
+ `DataRedactor.redact` and `DataRedactor.scan` are safe to call concurrently from multiple threads. The v19 engine holds MRI's GVL for the duration of each call (no `rb_thread_call_without_gvl`), so concurrent calls are serialised by the GVL. Each call allocates its own working buffers; built-in engine state is read-only after `mm_init()` at load time.
 
  `DataRedactor.add_pattern`, `remove_pattern`, and `clear_custom_patterns!` mutate a shared dynamic array and are **not** thread-safe. Register custom patterns once at boot — before spawning worker threads or forking — and they will be visible (read-only) to every subsequent `redact`/`scan` call.
 
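The thread-safety text above implies a simple usage shape: finish any pattern registration first, then fan work out to threads that only call the redactor. A minimal sketch of that shape follows. `redact_concurrently`, its `workers:` parameter, and the stand-in lambda are hypothetical; with the gem loaded, `DataRedactor.method(:redact)` could presumably be passed as the callable, since the changelog documents `DataRedactor.redact(text)`.

```ruby
# Runs a redactor callable over many lines from a fixed pool of threads.
# Any custom pattern registration is assumed to have finished before this
# is called, so workers only ever read shared pattern state.
def redact_concurrently(lines, redactor, workers: 4)
  queue = Queue.new
  lines.each_with_index { |line, i| queue << [i, line] }
  queue.close # pop returns nil once the closed queue is drained

  results = Array.new(lines.size)
  threads = Array.new(workers) do
    Thread.new do
      while (item = queue.pop)
        index, line = item
        results[index] = redactor.call(line) # each worker writes distinct slots
      end
    end
  end
  threads.each(&:join)
  results
end

# Stand-in redactor so the sketch runs without the gem installed;
# it does not reproduce the gem's patterns.
stand_in = ->(text) { text.gsub(/\d{4,}/, "[REDACTED]") }
p redact_concurrently(["user 12345", "no digits"], stand_in)
```

Because the new text says each call holds the GVL, a pool like this would not be expected to speed up redaction itself on MRI; it mainly shows that worker threads can share the redactor without extra locking.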
@@ -540,4 +540,4 @@ Released under the [MIT License](LICENSE).
  - **Pattern ordering matters** — patterns run sequentially. An early broad pattern (e.g. the 9-digit passport) may consume digits that a later pattern (e.g. credit card) depends on. Boundary wrapping mitigates this for pure-digit patterns.
  - **AWS Secret Key (pattern 1)** — 40 consecutive base64 characters is a broad match. It can produce false positives in base64-encoded content such as embedded images or binary blobs.
  - **Duplicate digit patterns** — several national ID formats share the same digit-length (11 digits: PESEL, Norwegian Fødselsnummer, Belgian National Number). They are kept as separate slots for clarity but the practical effect is that any 11-digit boundary-delimited number will be redacted.
- - **Performance is currently slower than pure-Ruby `gsub`.** A May 2026 investigation found the C extension is 3–5× slower than a pure-Ruby `gsub` loop running the same 88 patterns, across input sizes from 168 bytes to 1 MB. The root cause is glibc's POSIX `regexec()`: each call allocates an O(input-length) state buffer before any matching begins, and the gem calls it once per pattern in sequence. Ruby's Onigmo engine wins by using a built-in Boyer-Moore literal pre-filter that this gem can only approximate. Two perf fixes have shipped (buffer-sizing in `replace_all_matches`, a `strstr` literal pre-filter, and input chunking for large payloads), which gave ~25-30% improvement and made scaling linear, but the absolute gap remains. Use the gem on small payloads where the absolute latency is still acceptable (< 1 ms for typical log lines); for high-throughput pipelines, hold off until the next major release. See `docs/standalone_matcher_design.md` for the long-term plan.
+ - **Single-pass overlap semantics** built-in patterns are resolved by an index-order greedy claim: the lower-index pattern wins any region it matches. When two secrets abut with no separator, a rewrite-created word boundary can cause the second to be missed. This is rare in real text (secrets are almost always separator-delimited) and will be fixed by the upcoming longest-match-wins resolver in 1.0.
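The "index-order greedy claim" described in the added limitation can be sketched in plain Ruby. This is an illustration under assumptions, not the gem's engine: `claim_spans` and `redact_with_claims` are hypothetical names, the patterns are ordinary Ruby regexps rather than the built-in set, and the abutting-secrets edge case mentioned above is not modelled.

```ruby
# Patterns earlier in the list claim their matched regions first; a later
# pattern keeps a match only if it overlaps no region already claimed.
def claim_spans(text, patterns)
  claimed = []
  patterns.each do |re|
    pos = 0
    while (m = re.match(text, pos))
      span = m.begin(0)...m.end(0)
      overlaps = claimed.any? { |c| c.begin < span.end && span.begin < c.end }
      claimed << span unless overlaps || span.size.zero?
      # Always move forward, even after an empty match.
      pos = m.end(0) > m.begin(0) ? m.end(0) : m.end(0) + 1
    end
  end
  claimed.sort_by(&:begin)
end

# Rebuilds the text with every claimed region replaced by the marker.
def redact_with_claims(text, patterns, marker = "[REDACTED]")
  out = +""
  pos = 0
  claim_spans(text, patterns).each do |span|
    out << text[pos...span.begin] << marker
    pos = span.end
  end
  out << text[pos..]
end

digits = "4111111111111111" # 16 digits
puts redact_with_claims(digits, [/\d{9}/, /\d{16}/]) # 9-digit pattern listed first
puts redact_with_claims(digits, [/\d{16}/, /\d{9}/]) # 16-digit pattern listed first
```

With the 9-digit pattern listed first, it claims the leading nine digits and the 16-digit pattern is shut out, leaving seven digits exposed. Reversing the order redacts the whole number, which is the ordering sensitivity the limitations list describes.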
Binary file
Binary file
Binary file
Binary file
Binary file
Binary file
@@ -1,4 +1,4 @@
  module DataRedactor
  # Current gem version. Follows {https://semver.org Semantic Versioning 2.0.0}.
- VERSION = "0.10.0"
+ VERSION = "0.10.1"
  end
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: data_redactor
  version: !ruby/object:Gem::Version
- version: 0.10.0
+ version: 0.10.1
  platform: x86_64-linux
  authors:
  - Daniele Frisanco
@@ -106,6 +106,7 @@ extra_rdoc_files: []
  files:
  - CHANGELOG.md
  - LICENSE
+ - README.md
  - lib/data_redactor.rb
  - lib/data_redactor/3.0/data_redactor.so
  - lib/data_redactor/3.1/data_redactor.so
@@ -118,7 +119,6 @@ files:
  - lib/data_redactor/integrations/rails.rb
  - lib/data_redactor/name_pattern.rb
  - lib/data_redactor/version.rb
- - readme.md
  homepage: https://github.com/danielefrisanco/data_redactor
  licenses:
  - MIT