data_redactor 0.11.0-x86_64-linux → 0.14.0-x86_64-linux
- checksums.yaml +4 -4
- data/CHANGELOG.md +46 -1
- data/README.md +20 -4
- data/lib/data_redactor/3.0/data_redactor.so +0 -0
- data/lib/data_redactor/3.1/data_redactor.so +0 -0
- data/lib/data_redactor/3.2/data_redactor.so +0 -0
- data/lib/data_redactor/3.3/data_redactor.so +0 -0
- data/lib/data_redactor/3.4/data_redactor.so +0 -0
- data/lib/data_redactor/4.0/data_redactor.so +0 -0
- data/lib/data_redactor/version.rb +1 -1
- metadata +1 -1
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-metadata.gz:
-data.tar.gz:
+metadata.gz: 86ed390986b76cb2bcb48a80c322be42b8bd91dcd5b9afb8299a6ab4c0dd9419
+data.tar.gz: ca7260eff77b03d15392b0c763e56ee7b5bf146eb7f0b7597f78f41d2a6dfe08
 SHA512:
-metadata.gz:
-data.tar.gz:
+metadata.gz: a683277146fbb1e19caefffe829a9de090d48b181f0d15cc98a625509f47da1a7cf4f7f69df165f4b077a321f2ff1ee125ef4881e5746949b62f6db87269e022
+data.tar.gz: ecb6d0acfcf1a9b9c2c79f1d0e016ae4ad13d0c938e31c9d97f0b3bdc8f323ac7c7c1d1a2828067b3a5b9f33895e0e9bc3bb6a69bd9c7f324e2055cdeb082ee8
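A `.gem` file is a tar archive holding `metadata.gz`, `data.tar.gz`, and `checksums.yaml.gz`, so the digests above can be checked against the archive members they name. The sketch below shows one way to do that comparison with Ruby's standard `digest` library. The helper name `checksums_match?` and the shape of its `expected` argument are assumptions made for illustration, not part of RubyGems or of this gem.

```ruby
require "digest/sha2"

# Hypothetical helper (not a RubyGems API): returns true when every digest
# listed in `expected` matches the digest computed over `bytes`.
# `expected` maps an algorithm name to a hex string, for example
# { "SHA256" => "86ed...", "SHA512" => "a683..." }.
def checksums_match?(bytes, expected)
  return false if expected.empty?

  actual = {
    "SHA256" => Digest::SHA256.hexdigest(bytes),
    "SHA512" => Digest::SHA512.hexdigest(bytes),
  }
  # An algorithm that is not in `actual` yields nil and therefore fails.
  expected.all? { |algo, hex| actual[algo] == hex.downcase }
end
```

In practice `bytes` would come from something like `File.binread("metadata.gz")` after unpacking the `.gem`, and `expected` from the matching entries of `checksums.yaml`.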
data/CHANGELOG.md
CHANGED
@@ -7,6 +7,49 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

 ## [Unreleased]

+## [0.14.0] - 2026-06-17
+
+### Added
+- **Key-name-anchored secret redaction** (`:credentials`). A new pattern tier
+redacts a secret by the *name of the field it is assigned to*, for values with
+no distinctive shape of their own — the primary case being an `.env` file or
+config blob passed through the redactor. Anchored on the key words `password`,
+`passwd`, `pwd`, `secret`, `token`, `api_key`, `apikey`, `access_key`, and
+`client_secret` (case-insensitive), followed by `=` or `:` (dotenv and YAML
+styles), with quoted (`"..."`/`'...'`) or unquoted (≥6 chars) values. Only the
+**value** is redacted; the key is kept so logs stay greppable
+(`PASSWORD=[REDACTED]`). Compound key names match whether the secret word is a
+prefix or suffix segment (`POSTGRES_DB_PASSWORD=`, `PASSWORD_POSTGRES=`).
+Requires the assignment separator, so the word in prose ("reset your password")
+is not a false positive.
+- `examples/` directory with runnable, copy-pasteable usage scripts for every
+feature (core redaction, scan/dry-run, custom patterns, deep/JSON traversal,
+and the Logger / Rack / Rails / LLM integrations). Repo-only — not packaged in
+the gem. Linked from the README.
+
+## [0.13.0] - 2026-06-13
+
+### Changed
+- **Custom-pattern registration is now thread-safe.** `add_pattern`,
+`remove_pattern`, and `clear_custom_patterns!` are guarded by a mutex shared
+with the `redact`/`scan` custom-pattern loop, so patterns may be registered,
+removed, or cleared from any thread at any time — including at runtime from a
+request handler — without coordinating with in-flight redactions. The previous
+"register custom patterns at boot only" caveat is lifted. (The C extension now
+links `-lpthread` on glibc; no-op on musl and macOS where pthread is in libc.)
+- **`redact` releases the GVL for large inputs.** The v19 engine's per-scan
+mutable state (NFA scratch and the lazy DFA cache) moved into per-thread
+storage, making the engine re-entrant. `redact` now releases the GVL
+(`rb_thread_call_without_gvl`) around the built-in scan for inputs above a few
+KB, so a large redaction on one thread no longer blocks other Ruby threads.
+Small inputs keep the GVL. No public API change; output is byte-for-byte
+identical (verified by a differential gate over ~6000 inputs). The per-thread
+DFA cache's allocation floor was tuned so this adds ~0.86 MB per scanning
+thread (down from a naive ~3.2 MB), with no throughput change. Per-thread scan
+state is freed at thread exit (via a `pthread_key` destructor), so processes
+that churn many short-lived scanning threads do not accumulate dead caches —
+RSS stays flat across thousands of threads.
+
 ## [0.11.0] - 2026-06-10

 ### Added
@@ -232,7 +275,9 @@ features as 0.7.1 plus the pipeline fix.
 - `DataRedactor.redact(text)` module function returning the input with every match replaced by `[REDACTED]`.
 - RSpec suite with one example per pattern.

-[Unreleased]: https://github.com/danielefrisanco/data_redactor/compare/v0.
+[Unreleased]: https://github.com/danielefrisanco/data_redactor/compare/v0.14.0...HEAD
+[0.14.0]: https://github.com/danielefrisanco/data_redactor/compare/v0.13.0...v0.14.0
+[0.13.0]: https://github.com/danielefrisanco/data_redactor/compare/v0.11.0...v0.13.0
 [0.11.0]: https://github.com/danielefrisanco/data_redactor/compare/v0.10.1...v0.11.0
 [0.10.1]: https://github.com/danielefrisanco/data_redactor/compare/v0.10.0...v0.10.1
 [0.10.0]: https://github.com/danielefrisanco/data_redactor/compare/v0.9.0...v0.10.0
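The 0.14.0 changelog entry describes key-name-anchored redaction in prose only. The pure-Ruby sketch below approximates those rules so the matching behaviour is easier to picture. It is not the gem's compiled pattern: the constants `SECRET_WORDS` and `KEY_VALUE`, the method `redact_assignments`, and the choice to replace a quoted value together with its quotes are all assumptions made for illustration.

```ruby
# Hypothetical approximation of the rules in the 0.14.0 entry.
# Not the gem's implementation; every name here is illustrative.
SECRET_WORDS = %w[
  password passwd pwd secret token api_key apikey access_key client_secret
].freeze

SECRET_WORD = Regexp.union(SECRET_WORDS).source

KEY_VALUE = /
  (?<key>
    \b
    (?:[A-Za-z0-9]+_)*          # optional leading segments such as POSTGRES_DB_
    (?:#{SECRET_WORD})          # the anchoring secret word
    (?:_[A-Za-z0-9]+)*          # optional trailing segments such as _POSTGRES
  )
  (?<sep>[[:blank:]]*[=:][[:blank:]]*)   # the assignment separator is required
  (?<value>
    "[^"\n]*" | '[^'\n]*'       # quoted value of any length
    | [^\s"']{6,}               # unquoted value of at least 6 characters
  )
/xi

# Replaces only the value; the key and separator are kept as written.
def redact_assignments(text, placeholder: "[REDACTED]")
  text.gsub(KEY_VALUE) do
    "#{Regexp.last_match(:key)}#{Regexp.last_match(:sep)}#{placeholder}"
  end
end

redact_assignments("PASSWORD=hunter2secret")      # => "PASSWORD=[REDACTED]"
redact_assignments("Please reset your password")  # => unchanged, no separator
```

Requiring the separator inside the pattern is what keeps ordinary prose from matching, and the unquoted branch's six-character minimum keeps short values such as `token=abc` untouched in this sketch.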
data/README.md
CHANGED
@@ -19,7 +19,7 @@ It ships **88 built-in patterns** across 15+ countries, grouped into tags
 (`:credentials`, `:financial`, `:contact`, ...) so you can redact only what you
 care about. Beyond plain strings it can walk nested Hashes, Arrays, and JSON,
 audit a payload without mutating it (`scan`), and plug into Logger, Rails, and
-Rack. You can also register your own patterns at boot.
+Rack. You can also register your own patterns — at boot or at runtime from any thread.

 ### Use cases

@@ -46,6 +46,12 @@ DataRedactor.redact(text)
 # => "User CF is [REDACTED] and key is [REDACTED]"
 ```

+Prefer runnable code? The [`examples/`](examples/) directory has self-contained,
+copy-pasteable scripts for every feature below — core redaction, scan/dry-run,
+custom patterns, deep/JSON traversal, and the Logger / Rack / Rails / LLM
+integrations. Run any of them with `bundle exec ruby examples/<name>.rb` (see
+[examples/README.md](examples/README.md)).
+
 ### Filtering by tag or pattern name

 `only:` and `except:` both accept a single value or an Array, mixing **Symbols** (tag names) and **Strings** (specific pattern names).
@@ -161,7 +167,7 @@ DataRedactor.redact_json("not json") # => JSON::ParserError

 ### Custom patterns

-Teams often have internal IDs that the gem can't ship. Register them at boot:
+Teams often have internal IDs that the gem can't ship. Register them at boot — or at runtime from any thread (registration is thread-safe, see [Thread safety](#thread-safety)):

 ```ruby
 # String (POSIX ERE) or Regexp — both accepted
@@ -415,6 +421,16 @@ redactor/
 │ └── tags.h # TAG_* bit constants
 ├── spec/
 │ └── data_redactor_spec.rb # RSpec tests — at least one example per pattern, plus filter / placeholder / custom-pattern coverage
+├── examples/ # Repo-only runnable usage scripts (not packaged in the gem)
+│ ├── README.md # Index + how to run
+│ ├── basic_redact.rb # redact, tag filters, placeholder modes
+│ ├── scan_report.rb # scan dry-run with byte offsets
+│ ├── custom_pattern.rb # add_pattern + name_pattern
+│ ├── deep_and_json.rb # redact_deep / redact_json
+│ ├── logger.rb # Logger::Formatter integration
+│ ├── rack_middleware.rb # Rack middleware (body + headers)
+│ ├── rails_filter.rb # filter_parameters adapter
+│ └── llm_payload.rb # Claude / OpenAI message + response redaction
 ├── benchmark/ # Repo-only perf scripts (not packaged in the gem)
 │ ├── README.md # How to run, what each script measures
 │ ├── support/corpus.rb # Shared payload builders + pure-Ruby baseline redactor
@@ -571,9 +587,9 @@ All C-side buffers are heap-allocated with `malloc`/`strdup` and freed before th

 ## Thread safety

-`DataRedactor.redact` and `DataRedactor.scan` are safe to call concurrently from multiple threads. The v19 engine
+`DataRedactor.redact` and `DataRedactor.scan` are safe to call concurrently from multiple threads. The v19 engine keeps its compiled patterns immutable and shared (read-only after `mm_init()` at load time) and all per-scan mutable state — NFA scratch and the lazy DFA cache — in per-thread storage, so concurrent scans never touch each other's state. For inputs above a few KB, `redact` **releases the GVL** (`rb_thread_call_without_gvl`) around the built-in scan, so a large redaction on one thread no longer blocks other Ruby threads from running. Small inputs keep the GVL (the release bookkeeping would cost more than the scan). Each call allocates its own working buffers. A thread's per-thread state is freed automatically when the thread exits, so processes that spawn many short-lived scanning threads do not accumulate memory.

-`DataRedactor.add_pattern`, `remove_pattern`, and `clear_custom_patterns!`
+`DataRedactor.add_pattern`, `remove_pattern`, and `clear_custom_patterns!` are also thread-safe: the shared custom-pattern array is guarded by a mutex that writers take around the mutation and `redact`/`scan` take around their custom-pattern loop. You can register, remove, or clear custom patterns from any thread at any time — including from request handlers in a running server — without coordinating with in-flight redactions. (Registration is still a rare operation; the lock is uncontended in practice.)

 ## Versioning

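The new Thread safety text describes one mutex that writers take around each mutation and that `redact`/`scan` take around the custom-pattern loop. The sketch below is a pure-Ruby analogue of that locking scheme, using Ruby's `Mutex` in place of the C extension's pthread mutex. The class `CustomPatternRegistry` and its method signatures are hypothetical and are not the gem's API.

```ruby
# Illustrative Ruby analogue only; the gem implements this in C.
# Class and method names are assumptions, not DataRedactor's API.
class CustomPatternRegistry
  def initialize
    @lock = Mutex.new
    @patterns = {} # name => Regexp, the shared mutable state
  end

  # Writers hold the lock only around the mutation itself.
  def add_pattern(name, regex)
    @lock.synchronize { @patterns[name] = regex }
    true
  end

  # Returns true when a pattern with that name existed and was removed.
  def remove_pattern(name)
    @lock.synchronize { !@patterns.delete(name).nil? }
  end

  def clear!
    @lock.synchronize { @patterns.clear }
    nil
  end

  # Readers take the same lock around the whole custom-pattern loop, so a
  # concurrent writer cannot change the Hash while it is being iterated.
  def redact(text, placeholder: "[REDACTED]")
    @lock.synchronize do
      @patterns.each_value.reduce(text) { |acc, re| acc.gsub(re, placeholder) }
    end
  end
end

registry = CustomPatternRegistry.new

# Register patterns from several threads while other threads redact.
writers = 4.times.map do |t|
  Thread.new do
    25.times { |i| registry.add_pattern("emp_#{t}_#{i}", /EMP-#{t}#{i}-\d{4}/) }
  end
end
readers = 4.times.map do
  Thread.new { 100.times { registry.redact("badge EMP-00-1234 issued") } }
end
(writers + readers).each(&:join)
```

Holding a single lock across the read loop is the simplest correct choice when registration is rare, which matches the README's remark that the lock is uncontended in practice. A read-write lock would only pay off under heavy concurrent registration.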
Binary file
Binary file
Binary file
Binary file
Binary file
Binary file