data_redactor 0.7.2-aarch64-linux → 0.8.0-aarch64-linux

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: 32eb6b488baa1145c90e7bb6ad04ec17e49f37c88f4b36a4dc533398b6bbcbe9
-   data.tar.gz: 257d1f2a40ba998d1ae2f2170e6fd91b5bbc976cd0e0006633632f916decbd05
+   metadata.gz: 508508ec72fc9ab5eb574a60619fa95961c682902adfcedcf8ae29f57b246e1d
+   data.tar.gz: dd1bcabcaec602a9719d30fbc9c3bbc532aa9b4d59f9103bc8120a8cf327a39c
  SHA512:
-   metadata.gz: a6eb684c111fcd3097172621110e6a6bf35ee4ca36ef51681a96c8e5ff93ba1c948259cc8e19ef18202696aea9a30bf811f83312b5debf17424ff882b77241dd
-   data.tar.gz: d34b87cf5f0c6d6acb7db884aeb66326ffaf2ddfae13a88f81a0beeb722f2898af6e2d062b2292a341c51344c08ebfe4ad1d53048537d211da4fe54ad376b068
+   metadata.gz: fa665cc51f93155c58bded6bcf737a09787dc1607232d94324994503588e783ee516c39936a2dc980623794c85e52537a30958db431ce29219a39d77852dbd70
+   data.tar.gz: 83762eb09df4b13b6c1a87347b3969571480ebf03bff16e88cb82e769a4aa0c9257a6bd1375ee3b5a2969111aa8cf812d255947580a8670680686354d06e68ca
data/CHANGELOG.md CHANGED
@@ -7,6 +7,15 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
  ## [Unreleased]
 
+ ### Added
+ - `DataRedactor.redact_deep(data, only:, except:, placeholder:)` — recursively redacts every String value in a nested Hash/Array structure. Non-string scalars (Integer, Float, nil, Boolean) and Hash keys are passed through unchanged. Returns a deep copy; never mutates the input. Raises `ArgumentError` on circular references.
+ - `DataRedactor.redact_json(json_string, only:, except:, placeholder:)` — parses JSON, redacts via `redact_deep`, and returns valid JSON. Raises `JSON::ParserError` on invalid input.
+ - HashiCorp Vault service tokens (`hvs.` prefix, 90–120 chars) — pattern `hashicorp_vault_service_token`
+ - HashiCorp Vault batch tokens (`hvb.` prefix, 138–300 chars) — pattern `hashicorp_vault_batch_token`
+ - HashiCorp Terraform Cloud API tokens (`<14-char-id>.atlasv1.<token>`) — pattern `hashicorp_terraform_api_token`
+
+ All three HashiCorp patterns are tagged `:credentials` and do not require word-boundary wrapping (distinctive prefixes eliminate false positives).
+
  ## [0.7.2] - 2026-05-09
 
  **Supersedes 0.7.1, which has been yanked from RubyGems.**
Binary file (6 entries; file paths not shown)
@@ -1,4 +1,4 @@
  module DataRedactor
    # Current gem version. Follows {https://semver.org Semantic Versioning 2.0.0}.
-   VERSION = "0.7.2"
+   VERSION = "0.8.0"
  end
data/lib/data_redactor.rb CHANGED
@@ -1,4 +1,5 @@
  require "set"
+ require "json"
  require_relative "data_redactor/version"
  require_relative "data_redactor/data_redactor" # loads the compiled .so
 
@@ -161,6 +162,54 @@ module DataRedactor
    result
  end
 
+ # Recursively redact every String value in a nested Hash/Array structure.
+ #
+ # Walks the structure depth-first. Only String leaves are passed through
+ # {redact}; all other leaf types (Integer, Float, nil, Symbol, Boolean)
+ # are copied unchanged. Hash keys are never modified.
+ #
+ # Returns a deep copy — the original structure is never mutated.
+ #
+ # @param data [Hash, Array, String, Object] the structure to walk.
+ #   Any type is accepted; non-String scalars are returned as-is.
+ # @param only [Symbol, String, Array, nil] forwarded to {redact}.
+ # @param except [Symbol, String, Array, nil] forwarded to {redact}.
+ # @param placeholder [String, :tagged, :hash] forwarded to {redact}.
+ # @return [Hash, Array, String, Object] a new structure of the same shape
+ #   with all String leaves redacted.
+ # @raise [ArgumentError] if the structure contains a circular reference.
+ #
+ # @example Rails params
+ #   safe = DataRedactor.redact_deep(params.to_h)
+ #
+ # @example Mixed filter
+ #   DataRedactor.redact_deep(payload, only: :credentials, placeholder: :tagged)
+ def redact_deep(data, only: nil, except: nil, placeholder: PLACEHOLDER_DEFAULT)
+   _walk(data, only: only, except: except, placeholder: placeholder, seen: Set.new)
+ end
+
+ # Parse +json_string+, redact every String value in the resulting structure,
+ # and return valid JSON.
+ #
+ # Delegates traversal to {redact_deep}. All keyword arguments are forwarded
+ # to {redact}.
+ #
+ # @param json_string [String] valid JSON input.
+ # @param only [Symbol, String, Array, nil] forwarded to {redact}.
+ # @param except [Symbol, String, Array, nil] forwarded to {redact}.
+ # @param placeholder [String, :tagged, :hash] forwarded to {redact}.
+ # @return [String] a JSON string with all String values redacted.
+ # @raise [JSON::ParserError] if +json_string+ is not valid JSON.
+ #
+ # @example
+ #   DataRedactor.redact_json('{"email":"alice@example.com","count":3}')
+ #   # => '{"email":"[REDACTED]","count":3}'
+ def redact_json(json_string, only: nil, except: nil, placeholder: PLACEHOLDER_DEFAULT)
+   parsed = JSON.parse(json_string)
+   redacted = redact_deep(parsed, only: only, except: except, placeholder: placeholder)
+   JSON.generate(redacted)
+ end
+
  # Register a custom redaction pattern.
  #
  # Patterns must be valid POSIX ERE. Ruby-only syntax (+\d+, +\s+, +\w+,
@@ -317,6 +366,31 @@ module DataRedactor
    bits
  end
 
+ # @api private
+ # Depth-first recursive walker for {redact_deep}.
+ # +seen+ is a Set of object_ids already on the current traversal stack,
+ # used to detect circular references.
+ def _walk(node, only:, except:, placeholder:, seen:)
+   case node
+   when String
+     redact(node, only: only, except: except, placeholder: placeholder)
+   when Hash
+     raise ArgumentError, "redact_deep: circular reference detected" if seen.include?(node.object_id)
+     seen.add(node.object_id)
+     result = node.transform_values { |v| _walk(v, only: only, except: except, placeholder: placeholder, seen: seen) }
+     seen.delete(node.object_id)
+     result
+   when Array
+     raise ArgumentError, "redact_deep: circular reference detected" if seen.include?(node.object_id)
+     seen.add(node.object_id)
+     result = node.map { |v| _walk(v, only: only, except: except, placeholder: placeholder, seen: seen) }
+     seen.delete(node.object_id)
+     result
+   else
+     node
+   end
+ end
+
  # @api private
  def pattern_enabled?(name, tag_bit, only_present, only_bits, only_names,
                       except_bits, except_names)
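
Note on the `_walk` hunk above: because each container's `object_id` is removed from `seen` once its children have been processed, the set only tracks the current traversal stack. A structure that references the same Hash twice as siblings should therefore pass, while a true cycle raises. The sketch below illustrates that behaviour without the gem installed. `redact_stub` and `walk_sketch` are hypothetical names, and the email regex is only a stand-in for the gem's pattern matching, which lives in the compiled extension.

```ruby
require "set"

# Hypothetical stand-in for DataRedactor.redact: masks anything shaped like
# an email address. This is an assumption, not the gem's real matcher.
def redact_stub(str)
  str.gsub(/[^\s@]+@[^\s@]+\.[^\s@]+/, "[REDACTED]")
end

# Mirrors the shape of _walk in the diff: +seen+ holds the object_ids of the
# containers currently on the traversal stack.
def walk_sketch(node, seen = Set.new)
  case node
  when String
    redact_stub(node)
  when Hash, Array
    raise ArgumentError, "circular reference detected" if seen.include?(node.object_id)

    seen.add(node.object_id)
    result =
      if node.is_a?(Hash)
        node.transform_values { |v| walk_sketch(v, seen) }
      else
        node.map { |v| walk_sketch(v, seen) }
      end
    seen.delete(node.object_id) # leaving this container: off the stack again
    result
  else
    node
  end
end

# The same Hash referenced twice as siblings is shared, not circular.
shared = { "email" => "alice@example.com" }
input  = { "a" => shared, "b" => shared, "count" => 3 }
output = walk_sketch(input)

# A Hash that contains itself is a genuine cycle.
cyclic = { "name" => "loop" }
cyclic["self"] = cyclic
cycle_error =
  begin
    walk_sketch(cyclic)
    nil
  rescue ArgumentError => e
    e.message
  end
```

In this sketch `output` holds redacted copies under both `"a"` and `"b"`, `shared` is left untouched, and `cycle_error` carries the `ArgumentError` message.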
data/readme.md CHANGED
@@ -103,6 +103,36 @@ DataRedactor.scan(text, except: :network)
  DataRedactor.scan(text, only: :contact, except: ["email"])
  ```
 
+ ### Hash / JSON traversal
+
+ Redact every string value inside a nested Hash or Array — useful for params hashes, Sidekiq job payloads, webhook bodies, and anything that isn't a flat string:
+
+ ```ruby
+ # Hash — returns a deep copy, never mutates the input
+ result = DataRedactor.redact_deep({
+   "user" => { "email" => "alice@example.com" },
+   "count" => 3,
+   "tags" => ["admin", "alice@example.com"]
+ })
+ # => { "user" => { "email" => "[REDACTED]" }, "count" => 3, "tags" => ["admin", "[REDACTED]"] }
+
+ # Hash keys are never touched — only values are redacted
+ # Non-string scalars (Integer, Float, nil, Boolean) pass through unchanged
+
+ # Accepts the same filters as redact
+ DataRedactor.redact_deep(params, only: :credentials)
+ DataRedactor.redact_deep(payload, except: :network, placeholder: :tagged)
+ ```
+
+ ```ruby
+ # JSON string — parse → redact_deep → re-serialise
+ safe_json = DataRedactor.redact_json('{"email":"alice@example.com","count":3}')
+ # => '{"email":"[REDACTED]","count":3}'
+
+ # Raises JSON::ParserError on invalid input
+ DataRedactor.redact_json("not json") # => JSON::ParserError
+ ```
+
  ### Custom patterns
 
  Teams often have internal IDs that the gem can't ship. Register them at boot:
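
Note on the README hunk above: the documented `redact_json` flow is parse, walk, re-serialise, so the returned string is regenerated by `JSON.generate` and need not keep the input's whitespace. The sketch below shows that round trip using only Ruby's bundled `json` library. `redact_stub`, `deep_sketch`, and `redact_json_sketch` are hypothetical names, and the email regex is an assumed stand-in for the gem's matcher. Cycle detection is omitted because parsed JSON cannot contain circular references.

```ruby
require "json"

# Hypothetical stand-in for DataRedactor.redact (assumption: email-only).
def redact_stub(str)
  str.gsub(/[^\s@]+@[^\s@]+\.[^\s@]+/, "[REDACTED]")
end

# Simplified walker: String leaves are redacted, keys and other scalars pass
# through unchanged.
def deep_sketch(node)
  case node
  when String then redact_stub(node)
  when Hash   then node.transform_values { |v| deep_sketch(v) }
  when Array  then node.map { |v| deep_sketch(v) }
  else node
  end
end

# parse -> walk -> generate, the same three steps the diff shows.
def redact_json_sketch(json_string)
  JSON.generate(deep_sketch(JSON.parse(json_string)))
end

pretty = <<~PAYLOAD
  {
    "email": "alice@example.com",
    "count": 3,
    "tags": ["admin", null, true]
  }
PAYLOAD

compact = redact_json_sketch(pretty)
# compact == '{"email":"[REDACTED]","count":3,"tags":["admin",null,true]}'

# Invalid input surfaces the parser's own exception.
parse_failed =
  begin
    redact_json_sketch("not json")
    false
  rescue JSON::ParserError
    true
  end
```

Callers that compare the output byte-for-byte against the input (for example, signed webhook bodies) may want to keep this re-serialisation in mind.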
@@ -179,7 +209,7 @@ Pass an empty subset (e.g. `scrub: [:headers]`) to opt out of body wrapping. For
 
  > **Body wrapping is buffering.** The middleware reads the entire response body into memory before scanning. For streaming endpoints (SSE, large file downloads, Rack::Hijack) use `scrub: [:headers]` and rely on the Logger formatter for application logs instead.
 
- ## Detected patterns (85 total)
+ ## Detected patterns (88 total)
 
  The table below is a representative sample. Use `DataRedactor.pattern_names` for the canonical, machine-readable list — it stays in sync with the C extension automatically.
 
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: data_redactor
  version: !ruby/object:Gem::Version
-   version: 0.7.2
+   version: 0.8.0
  platform: aarch64-linux
  authors:
  - Daniele Frisanco