promptmenot 0.1.2 → 0.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 4f76b9212ba4a8eb4b7e1835cb82bb573eb769487dc7ad5b7de743cc52718bff
4
- data.tar.gz: caf3f49249460d07ad4a21a226426723e24902a0db5d764ae0f271c9b9485d47
3
+ metadata.gz: 6240f653fe69f0219a000d1633adafaf5752d791f99890f7a16e5256aeb8c0c9
4
+ data.tar.gz: ee2a9badc12c91d1a6d075b7d8aebc04c480ee05e2d6f6b26924a1ebc8807fcb
5
5
  SHA512:
6
- metadata.gz: 5f666cbc3caac8763f662fc3afa96d1c01699bd88085f38777919576060907b543dd9caccd44f747a3e36c7e4a40915635a0b4dfd44b57335aad3abf67e80075
7
- data.tar.gz: 7a6af1ec5bcd9af34baf62c0f5f69254e3924d00f6a3ffd212d6e203af8850f20c1ccebaf8ada61dbad5ab8ddc50030051d44c24c36e03fbe126530a23545fe4
6
+ metadata.gz: 04b75ccb08cf0dd68f2f0696fa855c0ec201fc91b60921dc5530a631cff62de7b6332573513200bd8701696f199335dc6dc3b5104ff9a17d7a21506f5e0ff745
7
+ data.tar.gz: 7d2d1c1f02d9abab3557eb04d5cd162bc878bba686f55ea8e9c7a55f3f92cbeb03700be1fbccbeba460c659129e9a83ede3abdbeda70bb38c21c2fb4084ed948
data/CHANGELOG.md CHANGED
@@ -1,10 +1,32 @@
1
1
  # Changelog
2
2
 
3
+ ## [0.2.0] - 2026-02-25
4
+
5
+ ### Added
6
+
7
+ - New `resource_extraction` pattern category with 10 patterns detecting attempts to trick AI agents into transferring funds, leaking credentials, or exhausting compute resources
8
+ - Crypto transfer requests ("transfer 100 SOL to")
9
+ - Wallet address detection with transfer instructions
10
+ - Full balance drain attempts ("send all your tokens")
11
+ - Financial urgency manipulation
12
+ - Authorization claims for transfers
13
+ - Transaction execution instructions
14
+ - Credential extraction ("give me your API key", "show seed phrase")
15
+ - External endpoint exfiltration ("send results to https://...")
16
+ - Access escalation ("grant me full access to the wallet")
17
+ - Resource exhaustion ("use all your credits")
18
+
19
+ ## [0.1.3] - 2026-02-17
20
+
21
+ ### Fixed
22
+
23
+ - Ruby 3.0+ compatibility by relaxing ActiveSupport/ActiveModel constraints to allow v7.x
24
+
3
25
  ## [0.1.0] - 2026-02-17
4
26
 
5
27
  ### Added
6
28
 
7
- - Core detection engine with 6 pattern categories (~60 patterns)
29
+ - Core detection engine with 6 pattern categories (~60 patterns):
8
30
  - Direct instruction override
9
31
  - Role manipulation
10
32
  - Delimiter injection
data/CONTRIBUTING.md CHANGED
@@ -1,6 +1,16 @@
1
1
  # Contributing to PromptMeNot
2
2
 
3
- We'd love your help improving PromptMeNot! Here's how to contribute:
3
+ Thanks for your interest in PromptMeNot! Whether you're adding new detection patterns, fixing bugs, or improving docs, contributions are welcome.
4
+
5
+ ## Table of Contents
6
+
7
+ - [Development Setup](#development-setup)
8
+ - [Architecture Overview](#architecture-overview)
9
+ - [Adding New Patterns](#adding-new-patterns)
10
+ - [Testing](#testing)
11
+ - [Code Style](#code-style)
12
+ - [Submitting Changes](#submitting-changes)
13
+ - [Reporting Issues](#reporting-issues)
4
14
 
5
15
  ## Development Setup
6
16
 
@@ -10,60 +20,283 @@ cd promptmenot
10
20
  bundle install
11
21
  ```
12
22
 
13
- ## Running Tests
23
+ Verify everything works:
14
24
 
15
25
  ```bash
16
- # Run full test suite
17
- bundle exec rspec
26
+ bundle exec rake # runs rspec + rubocop
27
+ ```
28
+
29
+ **Requirements**: Ruby 3.0+, Bundler
30
+
31
+ ## Architecture Overview
32
+
33
+ Understanding how detection works will help you contribute effectively:
34
+
35
+ ```
36
+ Input text
37
+ -> Detector
38
+ -> PatternRegistry.for_sensitivity(level)
39
+ -> Pattern.match(text) # per pattern
40
+ -> Deduplicate overlapping matches
41
+ -> Result (safe/unsafe + matches)
42
+ ```
43
+
44
+ **Key components:**
45
+
46
+ | File | Purpose |
47
+ |------|---------|
48
+ | `lib/promptmenot/detector.rb` | Core detection engine - scans text against active patterns |
49
+ | `lib/promptmenot/pattern.rb` | Pattern class - wraps a name, regex, sensitivity, and confidence |
50
+ | `lib/promptmenot/pattern_registry.rb` | Stores all registered patterns, filters by sensitivity level |
51
+ | `lib/promptmenot/sanitizer.rb` | Removes matched content from text (`:sanitize` mode) |
52
+ | `lib/promptmenot/validator.rb` | ActiveModel integration (`validates :field, prompt_safety: ...`) |
53
+ | `lib/promptmenot/patterns/` | Pattern definitions organized by attack category |
54
+
55
+ **Sensitivity cascade**: Patterns activate cumulatively. A `:low` pattern fires at *all* levels. A `:high` pattern only fires at `:high` and `:paranoid`. Think of it as:
56
+
57
+ ```
58
+ :paranoid -> all patterns active
59
+ :high -> :low + :medium + :high
60
+ :medium -> :low + :medium
61
+ :low -> :low only
62
+ ```
63
+
64
+ ## Adding New Patterns
65
+
66
+ This is the most common (and most valuable) contribution. If you've found a prompt injection technique that PromptMeNot doesn't catch, here's how to add it.
67
+
68
+ ### 1. Choose the right category
69
+
70
+ Patterns live in `lib/promptmenot/patterns/` and are organized by attack type:
71
+
72
+ | Category | File | Examples |
73
+ |----------|------|----------|
74
+ | Direct instruction override | `direct_instruction_override.rb` | "ignore previous instructions", "new instructions:" |
75
+ | Role manipulation | `role_manipulation.rb` | "DAN mode", "act as unrestricted AI" |
76
+ | Delimiter injection | `delimiter_injection.rb` | `<\|system\|>`, `[INST]`, fake XML/markdown headers |
77
+ | Encoding obfuscation | `encoding_obfuscation.rb` | Base64 payloads, hex escapes, zero-width chars |
78
+ | Indirect injection | `indirect_injection.rb` | "Dear AI", "if you are an LLM" |
79
+ | Context manipulation | `context_manipulation.rb` | `===RESET===`, prompt leaking attempts |
80
+ | Resource extraction | `resource_extraction.rb` | Crypto transfers, wallet drains, credential theft, endpoint exfiltration |
18
81
 
19
- # Run specific test file
20
- bundle exec rspec spec/promptmenot/detector_spec.rb
82
+ If your pattern doesn't fit any existing category, open an issue to discuss adding a new one.
21
83
 
22
- # Run with coverage
23
- bundle exec rspec --coverage
84
+ ### 2. Write the pattern
85
+
86
+ All pattern classes inherit from `Patterns::Base` and use the `register` DSL:
87
+
88
+ ```ruby
89
+ # frozen_string_literal: true
90
+
91
+ module Promptmenot
92
+ module Patterns
93
+ class DirectInstructionOverride < Base
94
+ register(
95
+ name: :ignore_previous_instructions,
96
+ regex: /\bignore\s+(all\s+)?(previous|prior|above)\s+(instructions|directives|rules)\b/i,
97
+ sensitivity: :low,
98
+ confidence: :high
99
+ )
100
+ end
101
+ end
102
+ end
24
103
  ```
25
104
 
26
- ## Code Quality
105
+ **Parameters:**
106
+
107
+ - **`name`** (Symbol) - Unique identifier across all patterns. Use `snake_case` describing the attack.
108
+ - **`regex`** (Regexp) - The detection pattern. Always use `\b` word boundaries to prevent substring matches. Always use the `/i` flag for case-insensitive matching.
109
+ - **`sensitivity`** (Symbol) - When this pattern activates:
110
+ - `:low` - High-signal, almost never a false positive. Use for unambiguous injection phrases like "ignore all previous instructions".
111
+ - `:medium` - Reasonable default. May have edge cases but generally reliable.
112
+ - `:high` - Catches more but may flag legitimate text. Use for broader patterns like "from now on you must...".
113
+ - `:paranoid` - Maximum coverage, higher false positive rate. Use for patterns that catch things like encoded content that *could* be legitimate.
114
+ - **`confidence`** (Symbol) - How certain we are that a match is actually an injection:
115
+ - `:high` - The matched text is almost certainly an injection attempt.
116
+ - `:medium` - Likely an injection, but could appear in normal text.
117
+ - `:low` - Suspicious but may need human review.
118
+
119
+ ### 3. Guidelines for good patterns
120
+
121
+ **DO:**
122
+ - Use `\b` word boundaries to anchor matches
123
+ - Use non-capturing groups `(?:...)` when you don't need captures
124
+ - Test against realistic false positives (see [Testing](#testing))
125
+ - Keep regexes readable - complex patterns should have comments
126
+ - Consider multilingual variations if applicable
127
+
128
+ **DON'T:**
129
+ - Write overly broad patterns that match normal English (e.g., don't match just "ignore" alone)
130
+ - Duplicate existing patterns - check the registry first
131
+ - Use lookbehinds/lookaheads unless necessary (they hurt performance)
132
+ - Forget the `/i` flag - injections come in all cases
133
+
134
+ ### 4. Sensitivity/confidence decision guide
135
+
136
+ Ask yourself:
137
+
138
+ > "If a user submitted this text in a normal form, would it ever appear naturally?"
139
+
140
+ - **Never** (e.g., "ignore all previous instructions") -> `:low` sensitivity, `:high` confidence
141
+ - **Rarely** (e.g., "new instructions:") -> `:medium` sensitivity, `:medium` confidence
142
+ - **Sometimes** (e.g., "from now on you will") -> `:high` sensitivity, `:medium` confidence
143
+ - **Often** (e.g., encoded Base64 content) -> `:paranoid` sensitivity, `:low` confidence
144
+
145
+ ## Testing
146
+
147
+ Every pattern needs tests. No exceptions.
148
+
149
+ ### Running tests
27
150
 
28
151
  ```bash
29
- # Run RuboCop linter
30
- bundle exec rubocop
152
+ # Full suite
153
+ bundle exec rspec
31
154
 
32
- # Auto-fix offenses
33
- bundle exec rubocop -a
155
+ # Specific pattern tests
156
+ bundle exec rspec spec/promptmenot/patterns/direct_instruction_override_spec.rb
157
+
158
+ # Run a single test by line number
159
+ bundle exec rspec spec/promptmenot/patterns/direct_instruction_override_spec.rb:19
34
160
  ```
35
161
 
36
- ## Making Changes
162
+ ### Writing pattern tests
37
163
 
38
- 1. **Fork** the repository on GitHub
39
- 2. **Create a branch** for your feature: `git checkout -b feature/my-feature`
40
- 3. **Make your changes** and add tests
41
- 4. **Ensure all tests pass**: `bundle exec rspec`
42
- 5. **Ensure code is clean**: `bundle exec rubocop -a`
43
- 6. **Commit** with clear messages: `git commit -am 'Add new pattern for X'`
44
- 7. **Push** to your fork: `git push origin feature/my-feature`
45
- 8. **Open a PR** on GitHub
164
+ Pattern specs follow a consistent structure with two sections: **detections** (unsafe text that should be caught) and **false positive resistance** (safe text that should pass).
46
165
 
47
- ## Adding New Patterns
166
+ ```ruby
167
+ # frozen_string_literal: true
168
+
169
+ RSpec.describe Promptmenot::Patterns::YourCategory do
170
+ let(:patterns) { described_class.patterns }
171
+
172
+ describe "pattern registration" do
173
+ it "registers patterns" do
174
+ expect(patterns).not_to be_empty
175
+ end
176
+
177
+ it "all patterns have correct category" do
178
+ patterns.each do |pattern|
179
+ expect(pattern.category).to eq(:your_category)
180
+ end
181
+ end
182
+ end
183
+
184
+ describe "detections" do
185
+ [
186
+ "your injection example here",
187
+ "another variant of the attack",
188
+ ].each do |injection|
189
+ it "detects: #{injection[0..50]}" do
190
+ result = Promptmenot.detect(injection, sensitivity: :high)
191
+ expect(result).to be_unsafe, "Expected '#{injection}' to be detected as unsafe"
192
+ end
193
+ end
194
+ end
195
+
196
+ describe "false positive resistance" do
197
+ [
198
+ "Normal sentence that looks similar but is safe",
199
+ "Another benign example using similar words",
200
+ ].each do |safe_text|
201
+ it "allows: #{safe_text[0..50]}" do
202
+ result = Promptmenot.detect(safe_text, sensitivity: :medium)
203
+ expect(result).to be_safe, "Expected '#{safe_text}' to pass but got: #{result.patterns_detected}"
204
+ end
205
+ end
206
+ end
207
+ end
208
+ ```
209
+
210
+ **Important testing notes:**
211
+
212
+ - Detection tests use `sensitivity: :high` to ensure patterns are active
213
+ - False positive tests use `sensitivity: :medium` (the default) to ensure safe text isn't wrongly flagged at normal settings
214
+ - The custom matchers `be_safe` and `be_unsafe` are defined in `spec/spec_helper.rb`
215
+ - Include at least 3-5 detection examples covering variations (caps, spacing, phrasing)
216
+ - Include at least 3-5 false positive examples with similar-looking but safe text
217
+ - `Promptmenot.reset!` runs automatically between tests (configured in spec_helper)
218
+
219
+ ## Code Style
220
+
221
+ We use RuboCop. Run it before submitting:
222
+
223
+ ```bash
224
+ bundle exec rubocop # check
225
+ bundle exec rubocop -a # auto-fix
226
+ ```
227
+
228
+ Key conventions:
229
+
230
+ - **Frozen string literals** required in all files (`# frozen_string_literal: true`)
231
+ - **Double quotes** for strings
232
+ - **Max line length**: 120 characters (relaxed for regex patterns in pattern files)
233
+ - **Max method length**: 20 lines
234
+ - **Max class length**: 150 lines
235
+
236
+ ## Submitting Changes
237
+
238
+ 1. **Fork** the repository
239
+ 2. **Create a branch**: `git checkout -b feature/detect-new-attack-type`
240
+ 3. **Write your code and tests**
241
+ 4. **Run the full suite**: `bundle exec rake`
242
+ 5. **Commit** with a clear message:
243
+ ```
244
+ feat: add detection for [attack type]
245
+
246
+ Adds N patterns to detect [description].
247
+ Includes M safe-text cases for false positive resistance.
248
+ ```
249
+ 6. **Push** to your fork: `git push origin feature/detect-new-attack-type`
250
+ 7. **Open a Pull Request** against `main`
251
+
252
+ ### Commit message format
253
+
254
+ We loosely follow [Conventional Commits](https://www.conventionalcommits.org/):
255
+
256
+ - `feat:` new patterns, features, or capabilities
257
+ - `fix:` bug fixes or false positive corrections
258
+ - `docs:` documentation changes
259
+ - `test:` test-only changes
260
+ - `chore:` build, CI, or maintenance tasks
48
261
 
49
- New injection attack patterns go in `lib/promptmenot/patterns/`.
262
+ ### PR checklist
50
263
 
51
- See existing pattern files for the DSL. Each pattern registers with:
52
- - `name` — unique identifier
53
- - `regex` — detection pattern
54
- - `sensitivity` — `:low`, `:medium`, `:high`, or `:paranoid`
55
- - `confidence` — `:high`, `:medium`, or `:low`
264
+ Before submitting, make sure:
56
265
 
57
- Always include tests in `spec/promptmenot/patterns/`.
266
+ - [ ] All tests pass (`bundle exec rspec`)
267
+ - [ ] RuboCop is clean (`bundle exec rubocop`)
268
+ - [ ] New patterns have both detection and false-positive tests
269
+ - [ ] Sensitivity and confidence levels are justified
270
+ - [ ] No overly broad regexes that would cause false positives at `:medium`
58
271
 
59
272
  ## Reporting Issues
60
273
 
61
- Found a bug or have a suggestion? Open an issue on GitHub with:
274
+ ### Bugs
275
+
276
+ Open an issue with:
277
+
62
278
  - Clear description of the problem
63
- - Steps to reproduce (if applicable)
279
+ - Steps to reproduce
64
280
  - Expected vs. actual behavior
65
- - Ruby/Rails version info
281
+ - Ruby and Rails version (`ruby -v`, `rails -v`)
282
+ - PromptMeNot version (`Promptmenot::VERSION`)
283
+
284
+ ### False Positives
285
+
286
+ If PromptMeNot is flagging legitimate text:
287
+
288
+ - Include the exact text being flagged
289
+ - The sensitivity level you're using
290
+ - Which pattern(s) matched (check `result.matches`)
291
+
292
+ ### Missed Injections
293
+
294
+ If you've found an injection that gets through:
295
+
296
+ - Include the injection text
297
+ - The sensitivity level you tested at
298
+ - Bonus points if you include a PR with the fix!
66
299
 
67
300
  ## License
68
301
 
69
- All contributions are made under the MIT license.
302
+ All contributions are made under the [MIT License](LICENSE.txt).
data/README.md CHANGED
@@ -1,9 +1,50 @@
1
1
  # PromptMeNot
2
2
 
3
- Detect and sanitize prompt injection attacks in user-submitted text. Protects Rails apps against:
3
+ [![Build Status](https://github.com/kevinl05/promptmenot/actions/workflows/ci.yml/badge.svg)](https://github.com/kevinl05/promptmenot/actions/workflows/ci.yml)
4
+ [![Gem Version](https://badge.fury.io/rb/promptmenot.svg)](https://rubygems.org/gems/promptmenot)
5
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
6
+ [![Ruby 3.0+](https://img.shields.io/badge/Ruby-3.0%2B-red.svg)](https://www.ruby-lang.org/)
4
7
 
5
- - **Direct injection** -- users trying to hack your LLMs via form inputs
6
- - **Indirect injection** -- users storing malicious prompts in profiles so other LLMs that scrape your site get compromised
8
+ PromptMeNot is a Ruby gem that helps protect your application against prompt injection attacks. It scans user-submitted text for malicious patterns like instruction overrides, role manipulation, delimiter injection, encoding tricks, and more. With ~70 built-in detection patterns organized across 7 attack categories, it covers direct injection (where users try to hijack your LLM through form inputs), indirect injection (where malicious prompts are stored in user content like profiles or comments, waiting for other LLMs to scrape and execute them), and resource extraction (where attackers trick AI agents into transferring funds, leaking credentials, or exhausting compute resources). It plugs into Rails with a simple ActiveModel validator or works standalone in any Ruby app, with configurable sensitivity levels so you can tune the trade-off between coverage and false positives.
9
+
10
+ ---
11
+
12
+ ### What it catches
13
+
14
+ | Attack type | Description |
15
+ |---|---|
16
+ | **Direct injection** | Users trying to override LLM instructions via form inputs (e.g., "ignore all previous instructions") |
17
+ | **Indirect injection** | Malicious prompts stored in profiles, comments, or posts that target LLMs scraping or processing your site |
18
+ | **Delimiter attacks** | Fake system tokens, ChatML tags, and XML/markdown boundaries injected into text |
19
+ | **Obfuscation** | Base64-encoded payloads, zero-width characters, hex escapes, and other encoding tricks |
20
+ | **Resource extraction** | Crypto transfer requests, wallet drain attacks, credential theft, financial urgency manipulation |
21
+
22
+ ### What it doesn't do
23
+
24
+ > PromptMeNot is a **supplemental defense layer**, not a silver bullet.
25
+
26
+ It uses pattern matching to catch known injection techniques, which means:
27
+
28
+ - **It can be bypassed.** Sufficiently creative or novel attacks may evade detection. Prompt injection is an evolving problem and no regex-based approach will catch everything.
29
+ - **It's not a replacement for other safeguards.** You should still use system prompts with clear boundaries, output filtering, least-privilege API access, and human review where appropriate.
30
+ - **It won't prevent all LLM misuse.** It focuses on the input side. It doesn't monitor or constrain what your LLM outputs.
31
+
32
+ Think of it like input validation for SQL injection: you still use parameterized queries, but rejecting `'; DROP TABLE users--` at the front door doesn't hurt. PromptMeNot is that front door check for prompt injection.
33
+
34
+ ### Defense in depth
35
+
36
+ For production apps, pair PromptMeNot with other layers:
37
+
38
+ | Layer | What to do |
39
+ |---|---|
40
+ | **Structural isolation** | Wrap user input in XML delimiters (`<user_input>...</user_input>`) so the LLM treats it as data, not instructions |
41
+ | **System prompt design** | Explicitly tell the model to ignore instructions found inside user content |
42
+ | **Output validation** | Check LLM responses for leaked system prompts, PII, or unexpected behavior before returning them to users |
43
+ | **Least-privilege access** | Restrict what your LLM can do (read-only DB access, scoped API keys, no `eval`) |
44
+
45
+ PromptMeNot handles the fast, cheap first pass. It catches the known attacks before they cost you an API call. The layers above handle the rest.
46
+
47
+ ---
7
48
 
8
49
  ## Installation
9
50
 
@@ -26,10 +67,10 @@ rails generate promptmenot:install # creates config/initializers/promptmenot.rb
26
67
 
27
68
  ```ruby
28
69
  class UserProfile < ApplicationRecord
29
- # Reject mode (default) -- adds validation error
70
+ # Reject mode (default) adds validation error
30
71
  validates :bio, prompt_safety: true
31
72
 
32
- # Sanitize mode -- strips malicious content, no error
73
+ # Sanitize mode strips malicious content, no error
33
74
  validates :about_me, prompt_safety: { mode: :sanitize }
34
75
 
35
76
  # Custom sensitivity
@@ -90,7 +131,7 @@ end
90
131
 
91
132
  ## Sensitivity Levels
92
133
 
93
- Sensitivity controls which patterns are active. Each pattern declares a minimum sensitivity level -- it only runs when the requested sensitivity is at or above that level.
134
+ Sensitivity controls which patterns are active. Each pattern declares a minimum sensitivity level and only runs when the requested sensitivity is at or above that level.
94
135
 
95
136
  | Pattern sensitivity | Active at `:low` | `:medium` | `:high` | `:paranoid` |
96
137
  |---|---|---|---|---|
@@ -111,17 +152,163 @@ Sensitivity controls which patterns are active. Each pattern declares a minimum
111
152
  | `encoding_obfuscation` | Base64 payloads, zero-width chars, hex escapes | ~10 |
112
153
  | `indirect_injection` | "Dear AI", "if you are an LLM", "note to chatbot" | ~10 |
113
154
  | `context_manipulation` | `===RESET===`, "the above is a test", prompt leaking | ~8 |
155
+ | `resource_extraction` | "transfer 100 SOL to", credential theft, wallet drain, endpoint exfiltration | ~10 |
114
156
 
115
157
  ## False Positive Mitigation
116
158
 
117
159
  Patterns use contextual qualifiers to minimize false positives:
118
160
 
119
- - "ignore" alone is fine -- "ignore **previous instructions**" is flagged
120
- - "act as" requires malicious qualifiers -- "act as a consultant" passes
121
- - "you are now" requires AI/restriction qualifiers -- "you are now subscribed" passes
122
- - "from now on" requires imperative "you must/will" -- "from now on I'll work from home" passes
161
+ - "ignore" alone is fine, but "ignore **previous instructions**" is flagged
162
+ - "act as" requires malicious qualifiers, so "act as a consultant" passes
163
+ - "you are now" requires AI/restriction qualifiers, so "you are now subscribed" passes
164
+ - "from now on" requires imperative "you must/will", so "from now on I'll work from home" passes
123
165
  - Broad patterns are placed at `:high`/`:paranoid` sensitivity so they don't fire at default settings
124
166
 
167
+ ## FAQ
168
+
169
+ <details open>
170
+ <summary><b>Does this work without Rails?</b></summary>
171
+ <br>
172
+
173
+ Yes. The ActiveModel validator is the Rails integration, but the core API works in any Ruby app:
174
+
175
+ ```ruby
176
+ Promptmenot.safe?("some text")
177
+ Promptmenot.detect("some text")
178
+ Promptmenot.sanitize("some text")
179
+ ```
180
+
181
+ The Railtie only loads if `Rails::Railtie` is defined.
182
+
183
+ </details>
184
+
185
+ <details>
186
+ <summary><b>Is this thread-safe?</b></summary>
187
+ <br>
188
+
189
+ Yes. The module singleton (configuration, registry) is protected by a `Monitor`. Pattern matching itself is stateless, so concurrent calls to `detect` or `sanitize` are safe.
190
+
191
+ </details>
192
+
193
+ <details>
194
+ <summary><b>What happens when patterns overlap?</b></summary>
195
+ <br>
196
+
197
+ The detector deduplicates automatically. If two patterns match overlapping regions of text, it keeps the larger match and discards the smaller one. This prevents double-counting in results and avoids garbled output in sanitize mode.
198
+
199
+ </details>
200
+
201
+ <details>
202
+ <summary><b>Can I use this to scan existing database records?</b></summary>
203
+ <br>
204
+
205
+ Yes. You can run detection against any string, not just incoming form input:
206
+
207
+ ```ruby
208
+ UserProfile.find_each do |profile|
209
+ result = Promptmenot.detect(profile.bio, sensitivity: :high)
210
+ puts "#{profile.id}: #{result.summary}" if result.unsafe?
211
+ end
212
+ ```
213
+
214
+ </details>
215
+
216
+ <details>
217
+ <summary><b>What's the performance like?</b></summary>
218
+ <br>
219
+
220
+ At default sensitivity (`:medium`), roughly 40-50 patterns are active. Each is a single regex scan, so detection is fast on typical user input. The `max_length` config (default: 50,000 characters) truncates excessively long inputs before scanning to prevent regex backtracking on adversarial payloads.
221
+
222
+ </details>
223
+
224
+ <details>
225
+ <summary><b>Can I scan only specific categories?</b></summary>
226
+ <br>
227
+
228
+ Yes. Both the `Detector` and `Sanitizer` accept a `categories` filter:
229
+
230
+ ```ruby
231
+ detector = Promptmenot::Detector.new(
232
+ sensitivity: :high,
233
+ categories: [:delimiter_injection, :encoding_obfuscation]
234
+ )
235
+ result = detector.detect(user_input)
236
+ ```
237
+
238
+ This is useful if you only care about certain attack types for a given field.
239
+
240
+ </details>
241
+
242
+ <details>
243
+ <summary><b>How does the on_detect callback work?</b></summary>
244
+ <br>
245
+
246
+ The callback fires whenever an injection is detected, before the result is returned. It receives the full `Result` object, so you can log, alert, or track metrics:
247
+
248
+ ```ruby
249
+ Promptmenot.configure do |config|
250
+ config.on_detect = ->(result) {
251
+ Rails.logger.warn("Injection detected: #{result.summary}")
252
+ StatsD.increment("promptmenot.injection_detected")
253
+ }
254
+ end
255
+ ```
256
+
257
+ If the callback raises an exception, it's rescued and printed to `warn` so it never breaks your app.
258
+
259
+ </details>
260
+
261
+ <details>
262
+ <summary><b>Does it catch leetspeak and Cyrillic homoglyphs?</b></summary>
263
+ <br>
264
+
265
+ Yes. The `encoding_obfuscation` category includes patterns for leetspeak injection (e.g., `1gn0r3 1nstruct10ns`) and mixed-script homoglyphs (Latin + Cyrillic characters in the same string). The homoglyph pattern is set to `:paranoid` sensitivity since mixed scripts can appear in legitimate multilingual content.
266
+
267
+ </details>
268
+
269
+ <details>
270
+ <summary><b>What's the difference between confidence and sensitivity?</b></summary>
271
+ <br>
272
+
273
+ They answer different questions:
274
+
275
+ - **Sensitivity** controls *when* a pattern runs. A `:low` sensitivity pattern runs at all levels. A `:paranoid` pattern only runs when you explicitly crank sensitivity up.
276
+ - **Confidence** describes *how certain* we are that a match is actually an attack. A `:high` confidence match (e.g., "ignore all previous instructions") is almost certainly malicious. A `:low` confidence match (e.g., mixed Cyrillic/Latin text) might be legitimate.
277
+
278
+ You can filter results by confidence after detection using `result.high_confidence_matches`.
279
+
280
+ </details>
281
+
282
+ <details>
283
+ <summary><b>Can I add patterns without modifying the gem source?</b></summary>
284
+ <br>
285
+
286
+ Yes. Use the config DSL in your initializer:
287
+
288
+ ```ruby
289
+ Promptmenot.configure do |config|
290
+ config.add_pattern(
291
+ name: :my_app_specific_attack,
292
+ regex: /some pattern specific to your app/i,
293
+ category: :custom,
294
+ sensitivity: :medium,
295
+ confidence: :high
296
+ )
297
+ end
298
+ ```
299
+
300
+ Custom patterns go through the same detection pipeline as built-in ones.
301
+
302
+ </details>
303
+
304
+ ## Contributing
305
+
306
+ We welcome contributions! See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines on:
307
+ - Setting up development environment
308
+ - Running tests and linting
309
+ - Adding new patterns
310
+ - Reporting issues
311
+
125
312
  ## License
126
313
 
127
314
  MIT License. See [LICENSE.txt](LICENSE.txt).
@@ -0,0 +1,77 @@
1
+ # frozen_string_literal: true
2
+
3
+ module Promptmenot
4
+ module Patterns
5
+ class ResourceExtraction < Base
6
+ register(
7
+ name: :crypto_transfer_request,
8
+ regex: /\b(?:transfer|send|move|withdraw)\s+\d+(?:\.\d+)?\s*(?:SOL|ETH|BTC|USDC|USDT|XRP|MATIC|AVAX|DOT|ADA|BNB|DOGE|tokens?|coins?)\s+(?:to|into|towards)\b/i,
9
+ sensitivity: :low,
10
+ confidence: :high
11
+ )
12
+
13
+ register(
14
+ name: :wallet_address_with_instruction,
15
+ regex: /\b(?:transfer|send|move|withdraw|deposit)\b.{0,80}(?:0x[0-9a-fA-F]{20,}|[13][a-km-zA-HJ-NP-Z1-9]{25,34}|[1-9A-HJ-NP-Za-km-z]{32,44})\b/i,
16
+ sensitivity: :low,
17
+ confidence: :high
18
+ )
19
+
20
+ register(
21
+ name: :full_balance_drain,
22
+ regex: /\b(?:send|transfer|move|withdraw|drain|sweep)\s+(?:all|entire|whole|every|full|remaining|total)\s+(?:of\s+)?(?:your\s+|my\s+|the\s+)?(?:balance|funds?|tokens?|holdings?|assets?|coins?|crypto|portfolio|wallet)\b/i,
23
+ sensitivity: :low,
24
+ confidence: :high
25
+ )
26
+
27
+ register(
28
+ name: :financial_urgency_manipulation,
29
+ regex: /\b(?:urgent(?:ly)?|immediate(?:ly)?|right\s+now|asap|time[- ]sensitive|quickly|hurry|before\s+it'?s?\s+too\s+late|window\s+closing)\b.{0,60}\b(?:transfer|send|pay|wire|withdraw|transaction|funds?|money|payment)\b/i,
30
+ sensitivity: :medium,
31
+ confidence: :medium
32
+ )
33
+
34
+ register(
35
+ name: :authorization_claim_for_transfer,
36
+ regex: /\b(?:i(?:'m|\s+am)\s+authorized|i\s+have\s+(?:authorization|permission|approval)|(?:authorized|approved|cleared)\s+(?:to|for)\s+(?:receive|collect|access|withdraw))\b.{0,40}\b(?:funds?|tokens?|payment|money|transfer|wallet|assets?)\b/i,
37
+ sensitivity: :low,
38
+ confidence: :high
39
+ )
40
+
41
+ register(
42
+ name: :execute_transaction_instruction,
43
+ regex: /\b(?:execute|submit|place|perform|make|initiate|process)\s+(?:a\s+)?(?:swap|trade|sell\s+order|buy\s+order|transaction|transfer|market\s+order|limit\s+order|exchange)\b/i,
44
+ sensitivity: :medium,
45
+ confidence: :medium
46
+ )
47
+
48
+ register(
49
+ name: :credential_extraction,
50
+ regex: /\b(?:give|show|tell|send|share|reveal|display|output|print|provide|paste)\s+(?:me\s+)?(?:your\s+|the\s+)?(?:API\s+key|secret\s+key|private\s+key|seed\s+phrase|mnemonic|recovery\s+phrase|password|credentials?|auth\s+token|access\s+token|wallet\s+key)\b/i,
51
+ sensitivity: :low,
52
+ confidence: :high
53
+ )
54
+
55
+ register(
56
+ name: :external_endpoint_exfiltration,
57
+ regex: /\b(?:send|post|submit|forward|transmit|upload|push|pipe)\s+(?:the\s+)?(?:results?|output|response|keys?|tokens?|credentials?|secrets?|data)\s+(?:to|at|via)\s+(?:https?:\/\/|webhook|endpoint|external)/i,
58
+ sensitivity: :low,
59
+ confidence: :high
60
+ )
61
+
62
+ register(
63
+ name: :grant_access_escalation,
64
+ regex: /\b(?:grant|give|provide|allow|enable)\s+(?:me\s+)?(?:full|admin|root|owner|unlimited|unrestricted|complete)?\s*(?:access|control|permissions?|privileges?)\s+(?:to|over|for)\s+(?:the\s+)?(?:wallet|account|funds?|system|database|keys?|credentials?|resources?)\b/i,
65
+ sensitivity: :medium,
66
+ confidence: :high
67
+ )
68
+
69
+ register(
70
+ name: :resource_exhaustion,
71
+ regex: /\b(?:use|exhaust|consume|burn\s+through|deplete|drain|spend|max\s+out)\s+(?:all\s+)?(?:(?:your|the|my|available|remaining)\s+)?(?:credits?|quota|budget|compute|resources?|API\s+(?:calls?|requests?)|rate\s+limit|tokens?|capacity)\b/i,
72
+ sensitivity: :medium,
73
+ confidence: :high
74
+ )
75
+ end
76
+ end
77
+ end
@@ -1,5 +1,5 @@
1
1
  # frozen_string_literal: true
2
2
 
3
3
  module Promptmenot
4
- VERSION = "0.1.2"
4
+ VERSION = "0.2.0"
5
5
  end
data/lib/promptmenot.rb CHANGED
@@ -17,6 +17,7 @@ require_relative "promptmenot/patterns/delimiter_injection"
17
17
  require_relative "promptmenot/patterns/encoding_obfuscation"
18
18
  require_relative "promptmenot/patterns/indirect_injection"
19
19
  require_relative "promptmenot/patterns/context_manipulation"
20
+ require_relative "promptmenot/patterns/resource_extraction"
20
21
  require_relative "promptmenot/detector"
21
22
  require_relative "promptmenot/sanitizer"
22
23
  require_relative "promptmenot/validator"
@@ -81,7 +82,8 @@ module Promptmenot
81
82
  Patterns::DelimiterInjection,
82
83
  Patterns::EncodingObfuscation,
83
84
  Patterns::IndirectInjection,
84
- Patterns::ContextManipulation
85
+ Patterns::ContextManipulation,
86
+ Patterns::ResourceExtraction
85
87
  ]
86
88
  end
87
89
 
data/promptmenot.gemspec CHANGED
@@ -29,6 +29,6 @@ Gem::Specification.new do |spec|
29
29
  end
30
30
  spec.require_paths = ["lib"]
31
31
 
32
- spec.add_dependency "activemodel", "~> 6.0"
33
- spec.add_dependency "activesupport", "~> 6.0"
32
+ spec.add_dependency "activemodel", ">= 6.0"
33
+ spec.add_dependency "activesupport", ">= 6.0"
34
34
  end
metadata CHANGED
@@ -1,41 +1,41 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: promptmenot
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.1.2
4
+ version: 0.2.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - promptmenot contributors
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2026-02-18 00:00:00.000000000 Z
11
+ date: 2026-02-25 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: activemodel
15
15
  requirement: !ruby/object:Gem::Requirement
16
16
  requirements:
17
- - - "~>"
17
+ - - ">="
18
18
  - !ruby/object:Gem::Version
19
19
  version: '6.0'
20
20
  type: :runtime
21
21
  prerelease: false
22
22
  version_requirements: !ruby/object:Gem::Requirement
23
23
  requirements:
24
- - - "~>"
24
+ - - ">="
25
25
  - !ruby/object:Gem::Version
26
26
  version: '6.0'
27
27
  - !ruby/object:Gem::Dependency
28
28
  name: activesupport
29
29
  requirement: !ruby/object:Gem::Requirement
30
30
  requirements:
31
- - - "~>"
31
+ - - ">="
32
32
  - !ruby/object:Gem::Version
33
33
  version: '6.0'
34
34
  type: :runtime
35
35
  prerelease: false
36
36
  version_requirements: !ruby/object:Gem::Requirement
37
37
  requirements:
38
- - - "~>"
38
+ - - ">="
39
39
  - !ruby/object:Gem::Version
40
40
  version: '6.0'
41
41
  description: A Ruby on Rails gem that detects and sanitizes prompt injection attacks.
@@ -71,6 +71,7 @@ files:
71
71
  - lib/promptmenot/patterns/direct_instruction_override.rb
72
72
  - lib/promptmenot/patterns/encoding_obfuscation.rb
73
73
  - lib/promptmenot/patterns/indirect_injection.rb
74
+ - lib/promptmenot/patterns/resource_extraction.rb
74
75
  - lib/promptmenot/patterns/role_manipulation.rb
75
76
  - lib/promptmenot/railtie.rb
76
77
  - lib/promptmenot/result.rb