promptscrub 0.1.0

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+ metadata.gz: 6fe475bdcd3c44552076094d338cf41c2644ce834060969132295834b47f8425
4
+ data.tar.gz: ef23ed4be55822096547adff17507da18f8b48d47981bc2c53074abf1ded4631
5
+ SHA512:
6
+ metadata.gz: 987376e104f603c3bb1c24e638e47591ec6b54642ff422ddf49c39f8ea5fc308b77a5a3edca9a779967f35bc22189951474ee47b2cf99c60740c2bd6b44508fa
7
+ data.tar.gz: 14621351db2f92ca5db1abb840452f6ead5307fa1e348ee30a50b4f3235dcce7100afccd7ea443f6164c4825c180f363b1c85035ba65274865bd8175cc22dde6
data/CHANGELOG.md ADDED
@@ -0,0 +1,14 @@
1
+ # Changelog
2
+
3
+ All notable changes to this project will be documented in this file.
4
+
5
+ ## [0.1.0] - 2025-06-03
6
+
7
+ ### Added
8
+ - `PromptScrub::Middleware` — Faraday middleware for request redaction + response rehydration
9
+ - Built-in detectors: email, SSN, credit card (Luhn-validated), US phone number
10
+ - `PromptScrub::Vault` — per-request thread-safe token↔value store
11
+ - `PromptScrub::StreamRehydrator` — streaming helper with partial-token buffer
12
+ - `PromptScrub.configure` block for global configuration
13
+ - `Configuration#add_detector` — register custom regex detectors
14
+ - `Configuration#disable_detector` — opt out of specific built-in detectors
data/LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2025 Jibran Usman
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,134 @@
1
+ # promptscrub
2
+
3
+ [![Gem Version](https://badge.fury.io/rb/promptscrub.svg)](https://rubygems.org/gems/promptscrub)
4
+ [![CI](https://github.com/jibranusman95/promptscrub/actions/workflows/ci.yml/badge.svg)](https://github.com/jibranusman95/promptscrub/actions/workflows/ci.yml)
5
+
6
+ **Strip PII from LLM prompts. Rehydrate it in responses. Your users see real data. Your LLM provider never does.**
7
+
8
+ Drop-in Faraday middleware for OpenAI, Anthropic, Gemini — and any LLM library built on Faraday (RubyLLM, langchainrb, llm.rb).
9
+
10
+ ```
11
+ Your App PromptScrub LLM API
12
+ │ │ │
13
+ │ "SSN is 123-45-6789" │ │
14
+ │──────────────────────────►│ redact │
15
+ │ │ "SSN is <SSN_001>" │
16
+ │ │─────────────────────────►│
17
+ │ │ │ generate
18
+ │ │◄─────────────────────────│
19
+ │ │ "Your SSN <SSN_001>..." │
20
+ │ │ rehydrate │
21
+ │ "Your SSN 123-45-6789..."│ │
22
+ │◄──────────────────────────│ │
23
+ ```
24
+
25
+ No infra to deploy. No gateway to operate. Just middleware.
26
+
27
+ ## Installation
28
+
29
+ ```ruby
30
+ gem "promptscrub"
31
+ ```
32
+
33
+ ## Quick start
34
+
35
+ ```ruby
36
+ require "faraday"
37
+ require "promptscrub"
38
+
39
+ conn = Faraday.new("https://api.openai.com") do |f|
40
+ f.request :json
41
+ f.response :json
42
+ f.use PromptScrub::Middleware # last before the adapter, so it sees raw JSON strings in both directions
43
+ f.adapter Faraday.default_adapter
44
+ end
45
+
46
+ # PII is stripped before the request leaves your app.
47
+ # Tokens are rehydrated in the response. Transparent to your code.
48
+ response = conn.post("/v1/chat/completions", {
49
+ model: "gpt-4o",
50
+ messages: [{ role: "user", content: "Summarize claim for SSN 234-56-7890, card 4532015112830366" }]
51
+ })
52
+ ```
53
+
54
+ ### With RubyLLM
55
+
56
+ ```ruby
57
+ RubyLLM.configure do |c|
58
+ c.faraday do |f|
59
+ f.use PromptScrub::Middleware
60
+ end
61
+ end
62
+ ```
63
+
64
+ ## Built-in detectors
65
+
66
+ | Type | Detects | Token example |
67
+ |---------|------------------------------------|-----------------|
68
+ | EMAIL | `john.doe+tag@sub-domain.co.uk` | `<EMAIL_001>` |
69
+ | SSN | `123-45-6789` (invalid ranges excluded) | `<SSN_001>` |
70
+ | CARD | 13–19 digit numbers (Luhn-validated) | `<CARD_001>` |
71
+ | PHONE | US numbers: optional `+1`, optional parentheses, `-`, `.` or space separators | `<PHONE_001>` |
72
+
73
+ Same value always maps to the same token within a request — so `alice@corp.com` appearing twice becomes `<EMAIL_001>` twice.
74
+
75
+ ## Configuration
76
+
77
+ ```ruby
78
+ PromptScrub.configure do |config|
79
+ # Add a custom detector
80
+ config.add_detector(:zip, /\b\d{5}(?:-\d{4})?\b/)
81
+
82
+ # Opt out of a built-in
83
+ config.disable_detector(:phone)
84
+
85
+ # Redact only outbound (skip rehydration)
86
+ config.scrub_response = false
87
+ end
88
+ ```
89
+
90
+ ## Streaming (SSE)
91
+
92
+ For streaming responses where your app processes chunks directly, use `StreamRehydrator` to wrap your callback:
93
+
94
+ ```ruby
95
+ vault = PromptScrub::Vault.new
96
+ redactor = PromptScrub::Redactor.new(vault, PromptScrub.configuration.detectors)
97
+ rehydrator = PromptScrub::StreamRehydrator.new(vault) do |clean_chunk|
98
+ print clean_chunk # user sees real values
99
+ end
100
+
101
+ # Before streaming request:
102
+ redacted_prompt = redactor.scrub(user_prompt)
103
+
104
+ # For each SSE chunk received:
105
+ rehydrator.call(raw_chunk)
106
+
107
+ # After stream ends:
108
+ rehydrator.flush
109
+ ```
110
+
111
+ `StreamRehydrator` buffers partial tokens at chunk boundaries (e.g. `<EMAIL_` split across two chunks) and flushes them correctly when the token completes.
112
+
113
+ ## How it works
114
+
115
+ 1. **Redact** — on every outgoing request, `Redactor` scans the body string with all registered detectors and replaces matches with `<TYPE_NNN>` tokens. Each unique value gets a stable token stored in a per-request `Vault`.
116
+ 2. **Send** — the redacted body hits the LLM API. The model never sees real PII.
117
+ 3. **Rehydrate** — on the response, `Rehydrator` scans for token patterns and substitutes original values from the vault. Your application code receives the real data.
118
+
119
+ The vault is in-memory and scoped to a single request — no persistence, no shared state between requests.
120
+
121
+ ## Security notes
122
+
123
+ - Tokens are **not encrypted**. The vault lives in your process memory for the duration of a request.
124
+ - Detection is regex-based. It will catch well-formed PII; obfuscated or unusual formats may slip through.
125
+ - For high-assurance use cases (HIPAA, PCI-DSS), add custom detectors for your specific data patterns and review false-negative rates in your domain.
126
+ - promptscrub is client-side middleware. It does not replace network-level controls or data governance policies.
127
+
128
+ ## Contributing
129
+
130
+ Bug reports and pull requests are welcome on [GitHub](https://github.com/jibranusman95/promptscrub).
131
+
132
+ ## License
133
+
134
+ MIT — see [LICENSE](LICENSE).
data/lib/promptscrub/configuration.rb ADDED
@@ -0,0 +1,40 @@
1
+ # frozen_string_literal: true
2
+
3
+ module PromptScrub
4
+ class Configuration
5
+ attr_accessor :scrub_request, :scrub_response
6
+ attr_reader :detectors
7
+
8
+ def initialize
9
+ @scrub_request = true
10
+ @scrub_response = true
11
+ @detectors = default_detectors
12
+ end
13
+
14
+ def add_detector(type_or_detector, pattern = nil)
15
+ detector = if type_or_detector.is_a?(Detector)
16
+ type_or_detector
17
+ else
18
+ Detector.new(type_or_detector, pattern)
19
+ end
20
+ @detectors << detector
21
+ self
22
+ end
23
+
24
+ def disable_detector(type)
25
+ @detectors.reject! { |d| d.type.casecmp(type.to_s).zero? }
26
+ self
27
+ end
28
+
29
+ private
30
+
31
+ def default_detectors
32
+ [
33
+ Detectors::Email.new,
34
+ Detectors::SSN.new,
35
+ Detectors::CreditCard.new,
36
+ Detectors::Phone.new
37
+ ]
38
+ end
39
+ end
40
+ end
data/lib/promptscrub/detector.rb ADDED
@@ -0,0 +1,16 @@
1
+ # frozen_string_literal: true
2
+
3
+ module PromptScrub
4
+ class Detector
5
+ attr_reader :type, :pattern
6
+
7
+ def initialize(type, pattern)
8
+ @type = type.to_s.upcase
9
+ @pattern = pattern
10
+ end
11
+
12
+ def scan(text)
13
+ text.to_enum(:scan, pattern).map { Regexp.last_match(0) }.uniq # whole match, even if the pattern has capture groups
14
+ end
15
+ end
16
+ end
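`String#scan` returns the capture groups rather than the whole match as soon as a pattern contains a group, which matters for custom detectors such as a ZIP pattern with an optional `+4` suffix. A minimal sketch of the difference, assuming a hypothetical helper `whole_matches` that is not part of the gem:

```ruby
# Hypothetical helper (not part of promptscrub): collect whole matches,
# regardless of any capture groups in the pattern.
def whole_matches(text, pattern)
  text.to_enum(:scan, pattern).map { Regexp.last_match(0) }.uniq
end

ZIP_WITH_GROUP = /\b\d{5}(-\d{4})?\b/
sample = 'Ship to 94103 or 10001-1234, then 94103 again.'

groups_only = sample.scan(ZIP_WITH_GROUP) # one array of groups per match
full_values = whole_matches(sample, ZIP_WITH_GROUP)

p groups_only # [[nil], ["-1234"], [nil]]
p full_values # ["94103", "10001-1234"]
```

Writing the group as non-capturing, `(?:-\d{4})?`, avoids the question entirely, since `scan` then yields plain strings.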
data/lib/promptscrub/detectors/credit_card.rb ADDED
@@ -0,0 +1,34 @@
1
+ # frozen_string_literal: true
2
+
3
+ module PromptScrub
4
+ module Detectors
5
+ class CreditCard < Detector
6
+ PATTERN = /\b(?:\d[ -]?){12,18}\d\b/ # 12..18 digits plus a final digit = 13..19 digits
7
+
8
+ def initialize
9
+ super('CARD', PATTERN)
10
+ end
11
+
12
+ def scan(text)
13
+ super.select { |n| luhn_valid?(n.gsub(/[ -]/, '')) }
14
+ end
15
+
16
+ private
17
+
18
+ def luhn_valid?(number)
19
+ return false unless number.match?(/\A\d{13,19}\z/)
20
+
21
+ digits = number.chars.map(&:to_i)
22
+ sum = digits.reverse.each_with_index.sum do |digit, i|
23
+ if i.odd?
24
+ doubled = digit * 2
25
+ doubled > 9 ? doubled - 9 : doubled
26
+ else
27
+ digit
28
+ end
29
+ end
30
+ (sum % 10).zero?
31
+ end
32
+ end
33
+ end
34
+ end
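Two details of card detection are easy to get wrong: the Luhn doubling rule, and the fact that a quantifier `{n,m}` followed by a final `\d` admits n+1 to m+1 digits. The sketch below is a standalone illustration, not the gem's code; the constant names and sample numbers are chosen for the example.

```ruby
# Standalone Luhn check: double every second digit counting from the right,
# subtract 9 from two-digit results, and require a sum divisible by 10.
def luhn_valid?(number)
  digits = number.gsub(/[ -]/, '').chars.map(&:to_i)
  sum = digits.reverse.each_with_index.sum do |digit, i|
    next digit if i.even?

    doubled = digit * 2
    doubled > 9 ? doubled - 9 : doubled
  end
  (sum % 10).zero?
end

# The repeated group counts every digit except the last one.
FOURTEEN_TO_NINETEEN = /\A(?:\d[ -]?){13,18}\d\z/
THIRTEEN_TO_NINETEEN = /\A(?:\d[ -]?){12,18}\d\z/

short_card = '4222222222222'      # 13 digits, passes Luhn
long_card = '4111 1111 1111 1111' # 16 digits, passes Luhn

p luhn_valid?(short_card)                    # true
p luhn_valid?(long_card)                     # true
p luhn_valid?('4111 1111 1111 1112')         # false
p short_card.match?(FOURTEEN_TO_NINETEEN)    # false
p short_card.match?(THIRTEEN_TO_NINETEEN)    # true
```

A 13-digit number can therefore pass the Luhn check and still be missed if the regex requires at least 14 digits.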
data/lib/promptscrub/detectors/email.rb ADDED
@@ -0,0 +1,13 @@
1
+ # frozen_string_literal: true
2
+
3
+ module PromptScrub
4
+ module Detectors
5
+ class Email < Detector
6
+ PATTERN = /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/
7
+
8
+ def initialize
9
+ super('EMAIL', PATTERN)
10
+ end
11
+ end
12
+ end
13
+ end
data/lib/promptscrub/detectors/phone.rb ADDED
@@ -0,0 +1,13 @@
1
+ # frozen_string_literal: true
2
+
3
+ module PromptScrub
4
+ module Detectors
5
+ class Phone < Detector
6
+ PATTERN = /(?<!\d)(?:\+1[-.\s]?)?\(?[2-9]\d{2}\)?[-.\s]?\d{3}[-.\s]?\d{4}(?!\d)/
7
+
8
+ def initialize
9
+ super('PHONE', PATTERN)
10
+ end
11
+ end
12
+ end
13
+ end
data/lib/promptscrub/detectors/ssn.rb ADDED
@@ -0,0 +1,13 @@
1
+ # frozen_string_literal: true
2
+
3
+ module PromptScrub
4
+ module Detectors
5
+ class SSN < Detector
6
+ PATTERN = /\b(?!000|666|9\d{2})\d{3}-(?!00)\d{2}-(?!0000)\d{4}\b/
7
+
8
+ def initialize
9
+ super('SSN', PATTERN)
10
+ end
11
+ end
12
+ end
13
+ end
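The phone and SSN patterns above rely on lookarounds to reject invalid ranges and to avoid matching inside longer digit runs. A quick way to see what they accept is to run them against a few samples. The patterns are copied from the two detectors; the sample strings are made up for the example.

```ruby
SSN = /\b(?!000|666|9\d{2})\d{3}-(?!00)\d{2}-(?!0000)\d{4}\b/
PHONE = /(?<!\d)(?:\+1[-.\s]?)?\(?[2-9]\d{2}\)?[-.\s]?\d{3}[-.\s]?\d{4}(?!\d)/

ssn_samples = %w[123-45-6789 000-12-3456 666-12-3456 900-12-3456 123-00-4567 123-45-0000]
phone_samples = [
  '(415) 555-2671',
  '+1 415-555-2671',
  '415.555.2671',
  '4155552671',
  '123-456-7890',    # area code may not start with 0 or 1
  '4532015112830366' # 16-digit run: the digit lookarounds prevent a partial hit
]

ssn_hits = ssn_samples.select { |s| s.match?(SSN) }
phone_hits = phone_samples.select { |s| s.match?(PHONE) }

p ssn_hits   # ["123-45-6789"]
p phone_hits # ["(415) 555-2671", "+1 415-555-2671", "415.555.2671", "4155552671"]
```

Anything outside these shapes, such as extensions or international numbers, would need a custom detector.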
data/lib/promptscrub/middleware.rb ADDED
@@ -0,0 +1,35 @@
1
+ # frozen_string_literal: true
2
+
3
+ require 'json'
4
+
5
+ module PromptScrub
6
+ class Middleware < Faraday::Middleware
7
+ def initialize(app, config: PromptScrub.configuration)
8
+ super(app)
9
+ @config = config
10
+ end
11
+
12
+ def call(env)
13
+ vault = Vault.new
14
+ redactor = Redactor.new(vault, @config.detectors)
15
+ rehydrator = Rehydrator.new(vault)
16
+
17
+ env[:body] = scrub_body(env[:body], redactor) if @config.scrub_request
18
+
19
+ @app.call(env).on_complete do |response_env|
20
+ if @config.scrub_response && !vault.empty? && response_env[:body].is_a?(String)
21
+ response_env[:body] = rehydrator.rehydrate(response_env[:body])
22
+ end
23
+ end
24
+ end
25
+
26
+ private
27
+
28
+ def scrub_body(body, redactor)
29
+ return body if body.nil?
30
+
31
+ body_str = body.is_a?(String) ? body : JSON.generate(body)
32
+ redactor.scrub(body_str)
33
+ end
34
+ end
35
+ end
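The middleware serializes non-String request bodies with `JSON.generate` and rehydrates a response only when its body is a `String`, so both steps operate on raw JSON text. The round trip can be sketched without Faraday. Everything here is an assumption for illustration: the `fake_llm` lambda stands in for the remote API, and the inline token bookkeeping replaces the gem's `Vault`.

```ruby
require 'json'

EMAIL = /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/
TOKEN = /<[A-Z]+_\d{3}>/

store = {} # token => original value, scoped to this one request

# 1. Redact: serialize the payload, then replace matches in the JSON text.
payload = { messages: [{ role: 'user', content: 'Reply to alice@corp.com' }] }
redacted = JSON.generate(payload).gsub(EMAIL) do |value|
  token = store.key(value) || format('<EMAIL_%03d>', store.size + 1)
  store[token] = value
  token
end

# 2. Send: a stand-in upstream that echoes the prompt it received.
fake_llm = lambda do |json|
  content = JSON.parse(json)['messages'][0]['content']
  JSON.generate({ 'reply' => "Echo: #{content}" })
end
raw_response = fake_llm.call(redacted)

# 3. Rehydrate: swap tokens back in the raw string, then parse.
rehydrated = raw_response.gsub(TOKEN) { |token| store.fetch(token, token) }
parsed = JSON.parse(rehydrated)

puts redacted
puts parsed['reply']
```

If a JSON response parser runs before the rehydration step, the body is already a Hash by then and the `String` guard skips it, which is why the position of the middleware in the stack matters.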
data/lib/promptscrub/redactor.rb ADDED
@@ -0,0 +1,23 @@
1
+ # frozen_string_literal: true
2
+
3
+ module PromptScrub
4
+ class Redactor
5
+ def initialize(vault, detectors)
6
+ @vault = vault
7
+ @detectors = detectors
8
+ end
9
+
10
+ def scrub(text)
11
+ return text if text.nil? || text.empty?
12
+
13
+ result = text.dup
14
+ @detectors.each do |detector|
15
+ detector.scan(result).sort_by { |m| -m.length }.each do |match| # longest first, so a shorter match cannot split a longer one
16
+ token = @vault.tokenize(detector.type, match)
17
+ result = result.gsub(match, token)
18
+ end
19
+ end
20
+ result
21
+ end
22
+ end
23
+ end
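Because replacement is done with a literal `gsub` per match, the order of replacement matters when one detected value is a suffix of another. The sketch below shows the effect with two overlapping email addresses; `scrub_in_order` is a hypothetical helper that numbers tokens by position and is not the gem's `Redactor`.

```ruby
EMAIL = /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/

# Hypothetical helper: replace each match literally, in the order given.
def scrub_in_order(text, matches)
  matches.each_with_index.reduce(text) do |result, (match, i)|
    result.gsub(match, format('<EMAIL_%03d>', i + 1))
  end
end

text = 'Cc bob@x.com and jim.bob@x.com'
matches = text.scan(EMAIL).uniq # ["bob@x.com", "jim.bob@x.com"]

in_scan_order = scrub_in_order(text, matches)
longest_first = scrub_in_order(text, matches.sort_by { |m| -m.length })

puts in_scan_order # Cc <EMAIL_001> and jim.<EMAIL_001>
puts longest_first # Cc <EMAIL_002> and <EMAIL_001>
```

In scan order the fragment `jim.` survives in the outgoing text, even though rehydration would still reconstruct the original. Replacing longer matches first removes the overlap.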
data/lib/promptscrub/rehydrator.rb ADDED
@@ -0,0 +1,20 @@
1
+ # frozen_string_literal: true
2
+
3
+ module PromptScrub
4
+ class Rehydrator
5
+ TOKEN_PATTERN = /<[A-Z][A-Z0-9_]*_\d{3,}>/ # also matches custom types such as ZIP_CODE and counters past 999
6
+
7
+ def initialize(vault)
8
+ @vault = vault
9
+ end
10
+
11
+ def rehydrate(text)
12
+ return text if text.nil? || text.empty?
13
+ return text if @vault.empty?
14
+
15
+ text.gsub(TOKEN_PATTERN) do |token|
16
+ @vault.rehydrate(token) || token
17
+ end
18
+ end
19
+ end
20
+ end
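Token shapes depend on the detector type name and on the counter, so the pattern used for rehydration decides which tokens can be restored. The sketch compares a strict pattern (letters only, exactly three digits) with a more permissive one. Both constants, the `rehydrate` helper, and the plain Hash standing in for the vault are assumptions for the example.

```ruby
STRICT = /<[A-Z]+_\d{3}>/
PERMISSIVE = /<[A-Z][A-Z0-9_]*_\d{3,}>/

# Hypothetical helper: swap known tokens back, leave unknown ones untouched.
def rehydrate(text, store, pattern)
  text.gsub(pattern) { |token| store.fetch(token, token) }
end

store = { '<EMAIL_001>' => 'a@b.co', '<ZIP_CODE_001>' => '94103' }
text = 'Mail <EMAIL_001>, zip <ZIP_CODE_001>, unknown <SSN_009>'

strict_result = rehydrate(text, store, STRICT)
permissive_result = rehydrate(text, store, PERMISSIVE)

puts strict_result     # Mail a@b.co, zip <ZIP_CODE_001>, unknown <SSN_009>
puts permissive_result # Mail a@b.co, zip 94103, unknown <SSN_009>
```

Tokens the model invents, such as `<SSN_009>` here, have no entry in the store and pass through unchanged in both cases.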
data/lib/promptscrub/stream_rehydrator.rb ADDED
@@ -0,0 +1,39 @@
1
+ # frozen_string_literal: true
2
+
3
+ module PromptScrub
4
+ class StreamRehydrator
5
+ PARTIAL_TOKEN = /<[A-Z_\d]*\z/
6
+
7
+ def initialize(vault, &callback)
8
+ @vault = vault
9
+ @callback = callback
10
+ @buffer = ''
11
+ end
12
+
13
+ def call(chunk)
14
+ combined = @buffer + chunk
15
+ @buffer = ''
16
+
17
+ if (match = combined.match(PARTIAL_TOKEN))
18
+ @buffer = match[0]
19
+ combined = combined[0, match.begin(0)]
20
+ end
21
+
22
+ @callback.call(rehydrate(combined)) unless combined.empty?
23
+ end
24
+
25
+ def flush
26
+ result = rehydrate(@buffer)
27
+ @buffer = ''
28
+ @callback.call(result) unless result.empty?
29
+ end
30
+
31
+ private
32
+
33
+ def rehydrate(text)
34
+ text.gsub(Rehydrator::TOKEN_PATTERN) do |token|
35
+ @vault.rehydrate(token) || token
36
+ end
37
+ end
38
+ end
39
+ end
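The buffering logic is easiest to follow on a concrete chunk sequence. The sketch below re-creates it as a plain function that returns the chunks a callback would receive, including the final flush. `rehydrate_stream` and the Hash store are stand-ins for the example, not the gem's API.

```ruby
PARTIAL_TOKEN = /<[A-Z_\d]*\z/
TOKEN = /<[A-Z]+_\d{3}>/

# Hypothetical stand-in for the streaming helper: returns what would be
# passed to the callback, chunk by chunk, followed by the flushed remainder.
def rehydrate_stream(chunks, store)
  buffer = ''
  emitted = []
  swap = ->(text) { text.gsub(TOKEN) { |token| store.fetch(token, token) } }

  chunks.each do |chunk|
    combined = buffer + chunk
    buffer = ''
    if (match = combined.match(PARTIAL_TOKEN))
      buffer = match[0] # hold back a possible token prefix
      combined = combined[0, match.begin(0)]
    end
    emitted << swap.call(combined) unless combined.empty?
  end

  emitted << swap.call(buffer) unless buffer.empty? # flush at end of stream
  emitted
end

store = { '<EMAIL_001>' => 'alice@corp.com' }
emitted = rehydrate_stream(['Contact <EMA', 'IL_001> today', ' if 1 <'], store)

p emitted # ["Contact ", "alice@corp.com today", " if 1 ", "<"]
```

The trailing `<` in the last chunk is held back because it could start a token, and it is only released by the flush. This is why the flush call after the stream ends is not optional.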
data/lib/promptscrub/vault.rb ADDED
@@ -0,0 +1,36 @@
1
+ # frozen_string_literal: true
2
+
3
+ module PromptScrub
4
+ class Vault
5
+ def initialize
6
+ @store = {}
7
+ @reverse = {}
8
+ @counters = Hash.new(0)
9
+ @mutex = Mutex.new
10
+ end
11
+
12
+ def tokenize(type, value)
13
+ @mutex.synchronize do
14
+ return @reverse[value] if @reverse.key?(value)
15
+
16
+ @counters[type] += 1
17
+ token = format('<%<type>s_%<num>03d>', type: type.upcase, num: @counters[type])
18
+ @store[token] = value
19
+ @reverse[value] = token
20
+ token
21
+ end
22
+ end
23
+
24
+ def rehydrate(token)
25
+ @mutex.synchronize { @store[token] }
26
+ end
27
+
28
+ def empty?
29
+ @mutex.synchronize { @store.empty? }
30
+ end
31
+
32
+ def size
33
+ @mutex.synchronize { @store.size }
34
+ end
35
+ end
36
+ end
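The vault's contract is small: the same value always yields the same token, counters are kept per type, and unknown tokens resolve to `nil`. A single-threaded sketch of that contract follows. `MiniVault` is a hypothetical name, and the mutex is deliberately left out to keep the example short.

```ruby
# Hypothetical, single-threaded stand-in for the token store.
class MiniVault
  def initialize
    @store = {}              # token => value
    @reverse = {}            # value => token
    @counters = Hash.new(0)  # type  => last number issued
  end

  # Returns the existing token for a known value, otherwise issues a new one.
  def tokenize(type, value)
    @reverse[value] ||= begin
      @counters[type] += 1
      token = format('<%s_%03d>', type, @counters[type])
      @store[token] = value
      token
    end
  end

  def rehydrate(token)
    @store[token]
  end
end

vault = MiniVault.new
first = vault.tokenize('EMAIL', 'a@x.com')
again = vault.tokenize('EMAIL', 'a@x.com')
other = vault.tokenize('EMAIL', 'b@x.com')
ssn = vault.tokenize('SSN', '123-45-6789')

p [first, again, other, ssn] # ["<EMAIL_001>", "<EMAIL_001>", "<EMAIL_002>", "<SSN_001>"]
p vault.rehydrate('<EMAIL_002>') # "b@x.com"
p vault.rehydrate('<CARD_001>')  # nil
```

Because the reverse lookup is keyed by value alone, a value reported by two different detectors keeps the token of whichever detector saw it first.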
data/lib/promptscrub/version.rb ADDED
@@ -0,0 +1,5 @@
1
+ # frozen_string_literal: true
2
+
3
+ module PromptScrub
4
+ VERSION = '0.1.0'
5
+ end
data/lib/promptscrub.rb ADDED
@@ -0,0 +1,31 @@
1
+ # frozen_string_literal: true
2
+
3
+ require 'faraday'
4
+ require_relative 'promptscrub/version'
5
+ require_relative 'promptscrub/configuration'
6
+ require_relative 'promptscrub/vault'
7
+ require_relative 'promptscrub/detector'
8
+ require_relative 'promptscrub/detectors/email'
9
+ require_relative 'promptscrub/detectors/ssn'
10
+ require_relative 'promptscrub/detectors/credit_card'
11
+ require_relative 'promptscrub/detectors/phone'
12
+ require_relative 'promptscrub/redactor'
13
+ require_relative 'promptscrub/rehydrator'
14
+ require_relative 'promptscrub/middleware'
15
+ require_relative 'promptscrub/stream_rehydrator'
16
+
17
+ module PromptScrub
18
+ class << self
19
+ def configuration
20
+ @configuration ||= Configuration.new
21
+ end
22
+
23
+ def configure
24
+ yield configuration
25
+ end
26
+
27
+ def reset!
28
+ @configuration = nil
29
+ end
30
+ end
31
+ end
metadata ADDED
@@ -0,0 +1,81 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: promptscrub
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.1.0
5
+ platform: ruby
6
+ authors:
7
+ - Jibran Usman
8
+ autorequire:
9
+ bindir: bin
10
+ cert_chain: []
11
+ date: 2026-06-03 00:00:00.000000000 Z
12
+ dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ name: faraday
15
+ requirement: !ruby/object:Gem::Requirement
16
+ requirements:
17
+ - - ">="
18
+ - !ruby/object:Gem::Version
19
+ version: '1.0'
20
+ type: :runtime
21
+ prerelease: false
22
+ version_requirements: !ruby/object:Gem::Requirement
23
+ requirements:
24
+ - - ">="
25
+ - !ruby/object:Gem::Version
26
+ version: '1.0'
27
+ description: |
28
+ Drop-in Faraday middleware that detects and tokenizes PII (emails, SSNs, credit cards,
29
+ phone numbers, custom patterns) in outgoing LLM requests, then rehydrates tokens back
30
+ in responses. Works with OpenAI, Anthropic, Gemini, RubyLLM, langchainrb, and any
31
+ Faraday-based HTTP client. Includes StreamRehydrator for SSE streaming use cases.
32
+ email:
33
+ - jibran.usman@hotmail.com
34
+ executables: []
35
+ extensions: []
36
+ extra_rdoc_files: []
37
+ files:
38
+ - CHANGELOG.md
39
+ - LICENSE
40
+ - README.md
41
+ - lib/promptscrub.rb
42
+ - lib/promptscrub/configuration.rb
43
+ - lib/promptscrub/detector.rb
44
+ - lib/promptscrub/detectors/credit_card.rb
45
+ - lib/promptscrub/detectors/email.rb
46
+ - lib/promptscrub/detectors/phone.rb
47
+ - lib/promptscrub/detectors/ssn.rb
48
+ - lib/promptscrub/middleware.rb
49
+ - lib/promptscrub/redactor.rb
50
+ - lib/promptscrub/rehydrator.rb
51
+ - lib/promptscrub/stream_rehydrator.rb
52
+ - lib/promptscrub/vault.rb
53
+ - lib/promptscrub/version.rb
54
+ homepage: https://github.com/jibranusman95/promptscrub
55
+ licenses:
56
+ - MIT
57
+ metadata:
58
+ homepage_uri: https://github.com/jibranusman95/promptscrub
59
+ source_code_uri: https://github.com/jibranusman95/promptscrub
60
+ changelog_uri: https://github.com/jibranusman95/promptscrub/blob/main/CHANGELOG.md
61
+ post_install_message:
62
+ rdoc_options: []
63
+ require_paths:
64
+ - lib
65
+ required_ruby_version: !ruby/object:Gem::Requirement
66
+ requirements:
67
+ - - ">="
68
+ - !ruby/object:Gem::Version
69
+ version: '3.1'
70
+ required_rubygems_version: !ruby/object:Gem::Requirement
71
+ requirements:
72
+ - - ">="
73
+ - !ruby/object:Gem::Version
74
+ version: '0'
75
+ requirements: []
76
+ rubygems_version: 3.5.22
77
+ signing_key:
78
+ specification_version: 4
79
+ summary: Bidirectional PII redaction for LLM calls — strip sensitive data from prompts,
80
+ rehydrate in responses.
81
+ test_files: []