data_redactor 0.10.1-x86_64-darwin → 0.11.0-x86_64-darwin

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 85bd57c818650ea452e6787af3596c4b868d54d9b4d0caaf66b4729bd3618ddb
- data.tar.gz: 61eba720403deacbce30fea57626629885cf4b0d68bc8c1446e31f79e66569d6
+ metadata.gz: a96a7eca713bd504ed219651ad47b4feeef9ae577ab2b5c2787316a00630ede9
+ data.tar.gz: 5d47a193b079d6d09176b204abd4f1d488da88c135fe8b3a83d533ca40b6cc2c
  SHA512:
- metadata.gz: 937c343346177564cd9688b85028817fac5918846e96a03bf2376cb276c54e6df6b33238271a17473106b823cebc54ac2c2ff98d6459b34825941b0162c885e2
- data.tar.gz: a44bee564f7ab53c85b1cc244ae14f2893b39761d59b4043b81acf19c053315731ce4756e5cf54540a38ca1867f723c60c11312510772a3496f076ac11c98753
+ metadata.gz: 63520c587722bef51d5fc24df838935f2311b2b5c2e3f55776e1230eb670c787e32cc0b264f9d571b0f8b1a6909364202a3e76c81f8b1de35045dfe7c079c1cc
+ data.tar.gz: a9e7ffb2a7b043ddb9e87e5fb96e7d79bd2da90a94de88dd511fc63208e245be16914487f4c46c9d663c461b24026527cc2f1c84b9a2a288aa2414901ba6e14e
data/CHANGELOG.md CHANGED
@@ -7,6 +7,25 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
  ## [Unreleased]
 
+ ## [0.11.0] - 2026-06-10
+
+ ### Added
+ - **Claude / OpenAI LLM integrations** — two new soft-required adapters that
+   scrub PII and secrets from LLM payloads before they leave the process and
+   from responses before they're logged:
+   - `DataRedactor::Integrations::Claude` — `.redact_messages` (handles the
+     `messages` array plus a top-level `system:` prompt; String or
+     array-of-content-block content) and `.redact_response` (Messages API
+     `content` text blocks).
+   - `DataRedactor::Integrations::OpenAI` — `.redact_messages` (Chat
+     Completions `messages`, including a `system` message and array-of-parts
+     content) and `.redact_response` (`choices[].message.content`).
+   Both operate on plain Ruby Hashes/Arrays with String or Symbol keys (no
+   runtime dependency on the `anthropic`/`openai` gems), return a deep copy
+   (never mutate the caller's input), pass non-text content blocks through
+   untouched, and forward `only:`/`except:`/`placeholder:` to
+   `DataRedactor.redact`.
+
  ## [0.10.1] - 2026-06-10
 
  ### Fixed
@@ -213,7 +232,8 @@ features as 0.7.1 plus the pipeline fix.
  - `DataRedactor.redact(text)` module function returning the input with every match replaced by `[REDACTED]`.
  - RSpec suite with one example per pattern.
 
- [Unreleased]: https://github.com/danielefrisanco/data_redactor/compare/v0.10.1...HEAD
+ [Unreleased]: https://github.com/danielefrisanco/data_redactor/compare/v0.11.0...HEAD
+ [0.11.0]: https://github.com/danielefrisanco/data_redactor/compare/v0.10.1...v0.11.0
  [0.10.1]: https://github.com/danielefrisanco/data_redactor/compare/v0.10.0...v0.10.1
  [0.10.0]: https://github.com/danielefrisanco/data_redactor/compare/v0.9.0...v0.10.0
  [0.9.0]: https://github.com/danielefrisanco/data_redactor/compare/v0.8.0...v0.9.0
data/README.md CHANGED
@@ -275,6 +275,34 @@ Pass an empty subset (e.g. `scrub: [:headers]`) to opt out of body wrapping. For
 
  > **Body wrapping is buffering.** The middleware reads the entire response body into memory before scanning. For streaming endpoints (SSE, large file downloads, Rack::Hijack) use `scrub: [:headers]` and rely on the Logger formatter for application logs instead.
 
+ ### Claude / OpenAI LLM payloads
+
+ Sanitize LLM message payloads before they leave the process, and scrub responses before they're logged or stored. Both adapters operate on plain Ruby Hashes/Arrays (String **or** Symbol keys), so they work with the `anthropic`/`openai` gems, a raw HTTP client, or parsed JSON — no runtime dependency on any SDK. They **return a deep copy and never mutate your input**, and forward `only:`/`except:`/`placeholder:` to `DataRedactor.redact`.
+
+ ```ruby
+ require "data_redactor/integrations/claude"
+
+ # Redact a messages array before sending to Claude
+ safe_messages = DataRedactor::Integrations::Claude.redact_messages(messages)
+ client.messages.create(model: "claude-opus-4-8", max_tokens: 1024, messages: safe_messages)
+
+ # Redact the response (assistant content blocks) before logging
+ safe_response = DataRedactor::Integrations::Claude.redact_response(response)
+ ```
+
+ ```ruby
+ require "data_redactor/integrations/openai"
+
+ # Redact a messages array before sending to OpenAI
+ safe_messages = DataRedactor::Integrations::OpenAI.redact_messages(messages)
+ client.chat(parameters: { model: "gpt-4o", messages: safe_messages })
+
+ # Redact the response (choices[].message.content) before logging
+ safe_response = DataRedactor::Integrations::OpenAI.redact_response(response)
+ ```
+
+ `content` may be a plain String or an array of content blocks/parts (`{ type: "text", text: "..." }`) — only the `text` of `text` blocks is redacted; image and other block types pass through untouched. For Claude, a top-level `system:` String is also redacted; for OpenAI, a `{ role: "system" }` message in the array is redacted like any other. Pass a bare `messages` array or the whole request Hash (with a `messages` key) — either works.
+
  ## Detected patterns (88 total)
 
  The table below is a representative sample. Use `DataRedactor.pattern_names` for the canonical, machine-readable list — it stays in sync with the C extension automatically.
@@ -510,6 +538,26 @@ All payload sizes pass a correctness check (redaction count matches pure-Ruby `g
  The previous engine (per-pattern `regexec`) was **4.25× slower** than pure Ruby on the
  1 MB payload — a ~9× swing. Old numbers are in git history (`CHANGELOG.md` [0.9.0]).
 
+ #### Linear scaling
+
+ Throughput stays flat as input grows — the single-pass engine is O(N), so a 10×
+ larger payload takes ~10× longer and MB/s holds steady. The old per-pattern
+ `regexec` engine was O(N²) and fell off a cliff on large inputs (a 10 MB log took
+ tens of seconds); v19 redacts the same 10 MB in ~1.4 s.
+
+ | Size | Time | MB/s |
+ |--------|----------|---------|
+ | 1 KB | 0.14 ms | 7.1 |
+ | 100 KB | 13.4 ms | 7.3 |
+ | 1 MB | 142 ms | 7.0 |
+ | 10 MB | 1.42 s | 7.0 |
+ | 50 MB | 7.14 s | 7.0 |
+
+ No published benchmarks exist for comparable Ruby PII-redaction gems, so the
+ numbers above are absolute (vs pure-Ruby `gsub`), not a head-to-head against
+ another gem. Run `benchmark/scaling.rb` on your own hardware — absolute MB/s is
+ machine-dependent, but the flat curve is not.
+
  ## How it works
 
  1. At load time, `Init_data_redactor` compiles all 85 regex patterns once using `regcomp` (POSIX ERE) and stores them as static `regex_t` structs. Patterns marked as boundary-wrapped are expanded with `wrap_boundary()` before compilation.
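Reviewer note: the MB/s column of the "Linear scaling" table can be recomputed from its Size and Time columns as a sanity check on the flat-curve claim. The sketch below assumes decimal units (1 MB = 1,000,000 bytes), which the README does not state. `ROWS`, `RATES`, `SPREAD` and `throughput_mb_s` are illustrative names, not part of the gem.

```ruby
# Rows copied from the "Linear scaling" table: [label, size in bytes, time in seconds].
# Assumption: decimal units (1 KB = 1_000 bytes); the README does not say which it uses.
ROWS = [
  ["1 KB",   1_000,      0.00014],
  ["100 KB", 100_000,    0.0134],
  ["1 MB",   1_000_000,  0.142],
  ["10 MB",  10_000_000, 1.42],
  ["50 MB",  50_000_000, 7.14]
].freeze

# Throughput in MB/s for a payload of `bytes` processed in `seconds`.
def throughput_mb_s(bytes, seconds)
  bytes / seconds / 1_000_000.0
end

RATES = ROWS.map { |_label, bytes, seconds| throughput_mb_s(bytes, seconds) }

ROWS.zip(RATES).each do |(label, _bytes, _seconds), rate|
  puts format("%-7s %.2f MB/s", label, rate)
end

# A flat curve means the fastest and slowest rows differ only slightly.
SPREAD = RATES.max / RATES.min
puts format("max/min throughput ratio: %.3f", SPREAD)
```

With the rounded times from the table, the recomputed rates fall between roughly 7.0 and 7.5 MB/s, which is consistent with the flat curve the README describes.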
data/lib/data_redactor/integrations/claude.rb ADDED
@@ -0,0 +1,113 @@
+ require "data_redactor"
+ require "data_redactor/integrations/llm_support"
+
+ module DataRedactor
+   module Integrations
+     # Adapter for Anthropic Claude (Messages API) payloads. Scrubs PII and
+     # secrets from a request's `messages` (and top-level `system:` prompt)
+     # before they leave the process, and from a response's `content` blocks
+     # before they're logged or stored.
+     #
+     # Operates on plain Ruby Hashes/Arrays with either String or Symbol keys,
+     # so it works with the `anthropic` gem, a raw HTTP client, or parsed JSON —
+     # no runtime dependency on any SDK. Inputs are never mutated; a deep copy
+     # is returned.
+     #
+     # @example Scrub a request before sending
+     #   require "data_redactor/integrations/claude"
+     #
+     #   messages = [{ role: "user", content: "my email is alice@example.com" }]
+     #   safe = DataRedactor::Integrations::Claude.redact_messages(messages)
+     #   client.messages.create(model: "claude-opus-4-8", max_tokens: 1024,
+     #                          messages: safe)
+     #
+     # @example Scrub a response before logging
+     #   resp = client.messages.create(...)
+     #   logger.info DataRedactor::Integrations::Claude.redact_response(resp)
+     module Claude
+       module_function
+
+       # Redact a Claude `messages` array (and an optional top-level `system:`
+       # String) before sending the request. Returns a deep copy; the input is
+       # not mutated.
+       #
+       # Each message's `content` may be a String or an array of content blocks
+       # (`{ type: "text", text: "..." }`); only the `text` field of `text`
+       # blocks is redacted. Non-text blocks (e.g. `image`) pass through
+       # untouched.
+       #
+       # @param messages [Array<Hash>, Hash] either a bare array of
+       #   `{ role:, content: }` hashes, or a request Hash containing a
+       #   `messages` key and an optional `system` key. Keys may be String or
+       #   Symbol.
+       # @param only forwarded to {DataRedactor.redact}
+       # @param except forwarded to {DataRedactor.redact}
+       # @param placeholder forwarded to {DataRedactor.redact}
+       # @return [Array<Hash>, Hash] a deep copy of the input with text leaves
+       #   redacted; an Array if an Array was given, a Hash if a Hash was given.
+       # @example
+       #   Claude.redact_messages([{ role: "user", content: "ssn 123-45-6789" }])
+       #   #=> [{ role: "user", content: "ssn [REDACTED]" }]
+       def redact_messages(messages, only: nil, except: nil, placeholder: DataRedactor::PLACEHOLDER_DEFAULT)
+         redact = ->(s) { DataRedactor.redact(s, only: only, except: except, placeholder: placeholder) }
+
+         if messages.is_a?(Hash)
+           out = LLMSupport.deep_copy(messages)
+           sys = LLMSupport.fetch(out, :system)
+           LLMSupport.put(out, :system, redact.call(sys)) if sys.is_a?(String)
+           list = LLMSupport.fetch(out, :messages)
+           LLMSupport.put(out, :messages, redact_message_list(list, redact)) if list.is_a?(Array)
+           out
+         else
+           redact_message_list(LLMSupport.deep_copy(messages), redact)
+         end
+       end
+
+       # Redact a Claude Messages API response before logging or storing it.
+       # Returns a deep copy; the input is not mutated.
+       #
+       # Walks the response's `content` array and redacts the `text` field of
+       # each `text` block, leaving the rest of the response (id, role, usage,
+       # non-text blocks) intact.
+       #
+       # @param response [Hash] a Claude response Hash with a `content` array of
+       #   blocks. Keys may be String or Symbol.
+       # @param only forwarded to {DataRedactor.redact}
+       # @param except forwarded to {DataRedactor.redact}
+       # @param placeholder forwarded to {DataRedactor.redact}
+       # @return [Hash] a deep copy of the response with text blocks redacted.
+       # @example
+       #   Claude.redact_response(
+       #     "content" => [{ "type" => "text", "text" => "card 4111111111111111" }]
+       #   )
+       #   #=> {"content"=>[{"type"=>"text", "text"=>"card [REDACTED]"}]}
+       def redact_response(response, only: nil, except: nil, placeholder: DataRedactor::PLACEHOLDER_DEFAULT)
+         redact = ->(s) { DataRedactor.redact(s, only: only, except: except, placeholder: placeholder) }
+         out = LLMSupport.deep_copy(response)
+         return out unless out.is_a?(Hash)
+
+         content = LLMSupport.fetch(out, :content)
+         LLMSupport.put(out, :content, LLMSupport.redact_text_blocks(content, redact)) if content.is_a?(Array)
+         out
+       end
+
+       # @!visibility private
+       # Redact each message's content (String or array of blocks) in place.
+       # Expects an already-deep-copied list.
+       def redact_message_list(messages, redact)
+         messages.map do |msg|
+           next msg unless msg.is_a?(Hash)
+
+           content = LLMSupport.fetch(msg, :content)
+           case content
+           when String
+             LLMSupport.put(msg, :content, redact.call(content))
+           when Array
+             LLMSupport.put(msg, :content, LLMSupport.redact_text_blocks(content, redact))
+           end
+           msg
+         end
+       end
+     end
+   end
+ end
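Reviewer note: one consequence of the `return out unless out.is_a?(Hash)` guard in `redact_response` is that a response that is not a plain Hash comes back unchanged, and therefore unredacted. The sketch below mirrors that control flow with stand-ins so it runs without the gem. `STUB_REDACT` is a hypothetical one-pattern redactor (not `DataRedactor.redact`), `sketch_redact_response` is an illustrative re-statement for Symbol keys only, and `FakeSdkMessage` stands in for whatever typed object an SDK might return.

```ruby
# Hypothetical stand-in for DataRedactor.redact: masks 16-digit runs only.
STUB_REDACT = ->(s) { s.gsub(/\b\d{16}\b/, "[REDACTED]") }

# Stand-in for a typed SDK response object (an assumption; real SDK classes differ).
FakeSdkMessage = Struct.new(:content)

# Illustrative re-statement of the adapter's control flow for Symbol-keyed Hashes.
# Not the gem's code: it copies only the levels it rewrites.
def sketch_redact_response(response, redact)
  return response unless response.is_a?(Hash) # non-Hash input passes through as-is

  content = response[:content]
  return response.dup unless content.is_a?(Array)

  redacted = content.map do |block|
    next block unless block.is_a?(Hash) && block[:text].is_a?(String)

    block.merge(text: redact.call(block[:text]))
  end
  response.merge(content: redacted)
end

HASH_RESPONSE = { role: "assistant", content: [{ type: "text", text: "card 4111111111111111" }] }
SDK_RESPONSE  = FakeSdkMessage.new([{ type: "text", text: "card 4111111111111111" }])

SAFE_HASH = sketch_redact_response(HASH_RESPONSE, STUB_REDACT)
SAFE_SDK  = sketch_redact_response(SDK_RESPONSE, STUB_REDACT)

puts SAFE_HASH[:content].first[:text] # text block is masked
puts SAFE_SDK.content.first[:text]    # still contains the card number
puts SAFE_SDK.equal?(SDK_RESPONSE)    # the very same object came back
```

If the client in use returns objects like this, converting the response to a plain Hash first (for example, from the parsed JSON body) would be one way to make sure the text blocks are actually scanned.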
data/lib/data_redactor/integrations/llm_support.rb ADDED
@@ -0,0 +1,63 @@
+ require "data_redactor"
+
+ module DataRedactor
+   module Integrations
+     # Shared helpers for the LLM payload adapters ({Claude}, {OpenAI}).
+     #
+     # Both adapters operate on plain Ruby Hashes/Arrays whose keys may be
+     # String or Symbol, never mutate the caller's input, and redact only the
+     # `text` field of text content blocks. This module holds that common,
+     # non-trivial logic; the per-provider modules keep only their own
+     # provider-specific shape walking.
+     #
+     # @!visibility private
+     module LLMSupport
+       module_function
+
+       # Read a key from a Hash that may use String or Symbol keys.
+       # @return the value under the Symbol or String form of `key`, or nil.
+       def fetch(hash, key)
+         hash.key?(key) ? hash[key] : hash[key.to_s]
+       end
+
+       # Write a value back under whichever key form (String/Symbol) the Hash
+       # already uses for `key`, defaulting to the Symbol form.
+       # @return [void]
+       def put(hash, key, value)
+         if hash.key?(key.to_s)
+           hash[key.to_s] = value
+         else
+           hash[key] = value
+         end
+       end
+
+       # Recursively copy a Hash/Array/String structure so the original is never
+       # mutated. Non-container, non-String leaves are returned as-is.
+       # @return a deep copy of `obj`.
+       def deep_copy(obj)
+         case obj
+         when Hash then obj.each_with_object({}) { |(k, v), o| o[k] = deep_copy(v) }
+         when Array then obj.map { |v| deep_copy(v) }
+         when String then obj.dup
+         else obj
+         end
+       end
+
+       # Redact the `text` field of each `text` content block in `blocks`,
+       # passing non-text blocks (e.g. images) through untouched. Mutates the
+       # blocks in `blocks` (call on an already-copied structure).
+       # @param blocks [Array] content blocks.
+       # @param redact [#call] a String -> String redaction lambda.
+       # @return [Array] the same `blocks` array, with text blocks redacted.
+       def redact_text_blocks(blocks, redact)
+         blocks.map do |block|
+           next block unless block.is_a?(Hash)
+
+           text = fetch(block, :text)
+           put(block, :text, redact.call(text)) if text.is_a?(String)
+           block
+         end
+       end
+     end
+   end
+ end
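Reviewer note: the `fetch`/`put` pair is what lets the adapters accept String or Symbol keys and write the result back under whichever form the caller used. A minimal self-contained sketch of that idea follows. `KeySketch`, `MASK_EMAIL`, `BLOCKS` and `RESULT` are hypothetical names that only mirror the behaviour shown in the diff. The sketch also adds an explicit `type == "text"` check, which the diffed `redact_text_blocks` does not perform (it rewrites any Hash block whose `text` value is a String), so treat that part as a stricter variant rather than the gem's behaviour.

```ruby
# Hypothetical helper module mirroring the key-agnostic access shown in the diff.
module KeySketch
  module_function

  # Read `key` whether the Hash uses the Symbol or the String form.
  def fetch(hash, key)
    hash.key?(key) ? hash[key] : hash[key.to_s]
  end

  # Write back under the String form if the Hash already uses it, else the Symbol form.
  def put(hash, key, value)
    if hash.key?(key.to_s)
      hash[key.to_s] = value
    else
      hash[key] = value
    end
  end

  # Stricter variant (an assumption, not the gem's code): only blocks whose
  # type is "text" are rewritten; everything else is returned untouched.
  def redact_text_blocks(blocks, redact)
    blocks.map do |block|
      next block unless block.is_a?(Hash) && fetch(block, :type).to_s == "text"

      text = fetch(block, :text)
      next block unless text.is_a?(String)

      copy = block.dup
      put(copy, :text, redact.call(text))
      copy
    end
  end
end

# Hypothetical one-pattern redactor standing in for DataRedactor.redact.
MASK_EMAIL = ->(s) { s.gsub(/[\w.+-]+@[\w-]+(?:\.[\w-]+)+/, "[REDACTED]") }

BLOCKS = [
  { "type" => "text", "text" => "mail alice@example.com" }, # String keys
  { type: "text", text: "mail bob@example.com" },           # Symbol keys
  { type: "image", text: "alt for carol@example.com" }      # non-text block
].freeze

RESULT = KeySketch.redact_text_blocks(BLOCKS, MASK_EMAIL)
RESULT.each { |block| p block }
```

Each rewritten block keeps the key form it arrived with, and the block tagged `image` is returned as-is even though it carries a `text` value.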
data/lib/data_redactor/integrations/openai.rb ADDED
@@ -0,0 +1,118 @@
+ require "data_redactor"
+ require "data_redactor/integrations/llm_support"
+
+ module DataRedactor
+   module Integrations
+     # Adapter for OpenAI Chat Completions payloads. Scrubs PII and secrets from
+     # a request's `messages` before they leave the process, and from a
+     # response's `choices[].message.content` before they're logged or stored.
+     #
+     # Operates on plain Ruby Hashes/Arrays with either String or Symbol keys,
+     # so it works with the `openai` gem, a raw HTTP client, or parsed JSON — no
+     # runtime dependency on any SDK. Inputs are never mutated; a deep copy is
+     # returned.
+     #
+     # @example Scrub a request before sending
+     #   require "data_redactor/integrations/openai"
+     #
+     #   messages = [{ role: "user", content: "my email is alice@example.com" }]
+     #   safe = DataRedactor::Integrations::OpenAI.redact_messages(messages)
+     #   client.chat(parameters: { model: "gpt-4o", messages: safe })
+     #
+     # @example Scrub a response before logging
+     #   resp = client.chat(parameters: { ... })
+     #   logger.info DataRedactor::Integrations::OpenAI.redact_response(resp)
+     module OpenAI
+       module_function
+
+       # Redact an OpenAI `messages` array before sending the request. Returns a
+       # deep copy; the input is not mutated.
+       #
+       # Each message's `content` may be a String or an array of parts
+       # (`{ type: "text", text: "..." }`); only the `text` field of `text`
+       # parts is redacted. Non-text parts (e.g. `image_url`) pass through
+       # untouched. A `{ role: "system", content: ... }` entry is redacted like
+       # any other message (OpenAI carries the system prompt in the array).
+       #
+       # @param messages [Array<Hash>, Hash] either a bare array of
+       #   `{ role:, content: }` hashes, or a request Hash containing a
+       #   `messages` key. Keys may be String or Symbol.
+       # @param only forwarded to {DataRedactor.redact}
+       # @param except forwarded to {DataRedactor.redact}
+       # @param placeholder forwarded to {DataRedactor.redact}
+       # @return [Array<Hash>, Hash] a deep copy of the input with text leaves
+       #   redacted; an Array if an Array was given, a Hash if a Hash was given.
+       # @example
+       #   OpenAI.redact_messages([{ role: "user", content: "ssn 123-45-6789" }])
+       #   #=> [{ role: "user", content: "ssn [REDACTED]" }]
+       def redact_messages(messages, only: nil, except: nil, placeholder: DataRedactor::PLACEHOLDER_DEFAULT)
+         redact = ->(s) { DataRedactor.redact(s, only: only, except: except, placeholder: placeholder) }
+
+         if messages.is_a?(Hash)
+           out = LLMSupport.deep_copy(messages)
+           list = LLMSupport.fetch(out, :messages)
+           LLMSupport.put(out, :messages, redact_message_list(list, redact)) if list.is_a?(Array)
+           out
+         else
+           redact_message_list(LLMSupport.deep_copy(messages), redact)
+         end
+       end
+
+       # Redact an OpenAI Chat Completions response before logging or storing it.
+       # Returns a deep copy; the input is not mutated.
+       #
+       # Walks `choices[].message.content` and redacts each (String content),
+       # leaving the rest of the response (id, usage, finish_reason) intact.
+       #
+       # @param response [Hash] a response Hash with a `choices` array, each
+       #   choice carrying a `message` Hash with a `content` String. Keys may be
+       #   String or Symbol.
+       # @param only forwarded to {DataRedactor.redact}
+       # @param except forwarded to {DataRedactor.redact}
+       # @param placeholder forwarded to {DataRedactor.redact}
+       # @return [Hash] a deep copy of the response with message content redacted.
+       # @example
+       #   OpenAI.redact_response(
+       #     "choices" => [{ "message" => { "content" => "card 4111111111111111" } }]
+       #   )
+       #   #=> {"choices"=>[{"message"=>{"content"=>"card [REDACTED]"}}]}
+       def redact_response(response, only: nil, except: nil, placeholder: DataRedactor::PLACEHOLDER_DEFAULT)
+         redact = ->(s) { DataRedactor.redact(s, only: only, except: except, placeholder: placeholder) }
+         out = LLMSupport.deep_copy(response)
+         return out unless out.is_a?(Hash)
+
+         choices = LLMSupport.fetch(out, :choices)
+         return out unless choices.is_a?(Array)
+
+         choices.each do |choice|
+           next unless choice.is_a?(Hash)
+
+           message = LLMSupport.fetch(choice, :message)
+           next unless message.is_a?(Hash)
+
+           content = LLMSupport.fetch(message, :content)
+           LLMSupport.put(message, :content, redact.call(content)) if content.is_a?(String)
+         end
+         out
+       end
+
+       # @!visibility private
+       # Redact each message's content (String or array of parts) in place.
+       # Expects an already-deep-copied list.
+       def redact_message_list(messages, redact)
+         messages.map do |msg|
+           next msg unless msg.is_a?(Hash)
+
+           content = LLMSupport.fetch(msg, :content)
+           case content
+           when String
+             LLMSupport.put(msg, :content, redact.call(content))
+           when Array
+             LLMSupport.put(msg, :content, LLMSupport.redact_text_blocks(content, redact))
+           end
+           msg
+         end
+       end
+     end
+   end
+ end
@@ -2,7 +2,8 @@ require "data_redactor"
 
  module DataRedactor
    # Namespace for the optional framework adapters under
-   # +lib/data_redactor/integrations/+ ({Logger}, +Rails+, {Rack}).
+   # +lib/data_redactor/integrations/+ ({Logger}, +Rails+, {Rack},
+   # {Claude}, {OpenAI}).
    #
    # Each adapter is soft-required — none load with +require "data_redactor"+;
    # +require+ only the one you need. They add no runtime gem dependencies and
@@ -1,4 +1,4 @@
  module DataRedactor
    # Current gem version. Follows {https://semver.org Semantic Versioning 2.0.0}.
-   VERSION = "0.10.1"
+   VERSION = "0.11.0"
  end
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: data_redactor
  version: !ruby/object:Gem::Version
-   version: 0.10.1
+   version: 0.11.0
  platform: x86_64-darwin
  authors:
  - Daniele Frisanco
@@ -114,7 +114,10 @@ files:
  - lib/data_redactor/3.3/data_redactor.bundle
  - lib/data_redactor/3.4/data_redactor.bundle
  - lib/data_redactor/4.0/data_redactor.bundle
+ - lib/data_redactor/integrations/claude.rb
+ - lib/data_redactor/integrations/llm_support.rb
  - lib/data_redactor/integrations/logger.rb
+ - lib/data_redactor/integrations/openai.rb
  - lib/data_redactor/integrations/rack.rb
  - lib/data_redactor/integrations/rails.rb
  - lib/data_redactor/name_pattern.rb