RubyGems - promptmenot - Versions diffs - 0.1.1 - Mend

promptmenot 0.1.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (34)

checksums.yaml +7 -0
data/.rspec +3 -0
data/.rubocop.yml +36 -0
data/CHANGELOG.md +21 -0
data/CONTRIBUTING.md +69 -0
data/Gemfile +9 -0
data/LICENSE.txt +21 -0
data/README.md +127 -0
data/Rakefile +12 -0
data/agents.md +150 -0
data/config/locales/en.yml +4 -0
data/lib/generators/promptmenot/install_generator.rb +17 -0
data/lib/generators/promptmenot/templates/promptmenot.rb +27 -0
data/lib/promptmenot/configuration.rb +54 -0
data/lib/promptmenot/detector.rb +67 -0
data/lib/promptmenot/errors.rb +7 -0
data/lib/promptmenot/match.rb +36 -0
data/lib/promptmenot/pattern.rb +66 -0
data/lib/promptmenot/pattern_registry.rb +53 -0
data/lib/promptmenot/patterns/base.rb +36 -0
data/lib/promptmenot/patterns/context_manipulation.rb +63 -0
data/lib/promptmenot/patterns/delimiter_injection.rb +81 -0
data/lib/promptmenot/patterns/direct_instruction_override.rb +95 -0
data/lib/promptmenot/patterns/encoding_obfuscation.rb +79 -0
data/lib/promptmenot/patterns/indirect_injection.rb +79 -0
data/lib/promptmenot/patterns/role_manipulation.rb +79 -0
data/lib/promptmenot/railtie.rb +13 -0
data/lib/promptmenot/result.rb +41 -0
data/lib/promptmenot/sanitizer.rb +50 -0
data/lib/promptmenot/validator.rb +39 -0
data/lib/promptmenot/version.rb +5 -0
data/lib/promptmenot.rb +96 -0
data/promptmenot.gemspec +34 -0
metadata +108 -0

checksums.yaml ADDED

@@ -0,0 +1,7 @@
+---
+SHA256:
+  metadata.gz: 000e08f45a4388639584116f108d8e5d9f94f8e2d606e1f1823c82fb67645cf7
+  data.tar.gz: 0460fc97672f78cd0f16a67b0ea10139bbcc73b477eef6ab47d19802636cbc1d
+SHA512:
+  metadata.gz: 8d922773aa2f10bc810bdf285a7d28bea76817ad57fdddb49b69168b0e573b2c3e1b78f9d045d457279843ee3850d4024d480997aa1bd48392cbbca002a69124
+  data.tar.gz: 7158d333c28534f1f6d49a21652497ac765c60b7df39c4a49d989dc0e97a55d38c620e84e54eeae5f85af0a858ecd56b84ca83dc0e090172687ed1f0b30e8187

data/.rspec ADDED

@@ -0,0 +1,3 @@
+--format documentation
+--color
+--require spec_helper

data/.rubocop.yml ADDED

@@ -0,0 +1,36 @@
+AllCops:
+  TargetRubyVersion: 3.0
+  NewCops: enable
+  SuggestExtensions: false
+Style/Documentation:
+  Enabled: false
+Style/FrozenStringLiteralComment:
+  EnforcedStyle: always
+Metrics/MethodLength:
+  Max: 20
+Metrics/ClassLength:
+  Max: 150
+Metrics/BlockLength:
+  Exclude:
+    - "spec/**/*"
+    - "promptmenot.gemspec"
+Layout/LineLength:
+  Max: 120
+  Exclude:
+    - "lib/promptmenot/patterns/**/*"
+Metrics/AbcSize:
+  Exclude:
+    - "lib/promptmenot/detector.rb"
+Style/StringLiterals:
+  EnforcedStyle: double_quotes
+Style/StringLiteralsInInterpolation:
+  EnforcedStyle: double_quotes

data/CHANGELOG.md ADDED

@@ -0,0 +1,21 @@
+# Changelog
+## [0.1.0] - 2026-02-17
+### Added
+- Core detection engine with 6 pattern categories (~60 patterns)
+  - Direct instruction override
+  - Role manipulation
+  - Delimiter injection
+  - Encoding obfuscation
+  - Indirect injection
+  - Context manipulation
+- Filter-based sensitivity levels: `:low`, `:medium`, `:high`, `:paranoid`
+- Two operating modes: `:reject` (validation error) and `:sanitize` (strip content)
+- ActiveModel validator (`prompt_safety`)
+- Standalone API (`Promptmenot.safe?`, `.detect`, `.sanitize`)
+- Global configuration DSL with custom pattern support
+- Detection callbacks
+- Rails generator (`rails g promptmenot:install`)
+- I18n support for error messages

data/CONTRIBUTING.md ADDED

@@ -0,0 +1,69 @@
+# Contributing to PromptMeNot
+We'd love your help improving PromptMeNot! Here's how to contribute:
+## Development Setup
+```bash
+git clone https://github.com/kevinl05/promptmenot.git
+cd promptmenot
+bundle install
+```
+## Running Tests
+```bash
+# Run full test suite
+bundle exec rspec
+# Run specific test file
+bundle exec rspec spec/promptmenot/detector_spec.rb
+# Coverage: RSpec has no built-in --coverage option; a coverage tool
+# such as SimpleCov (not in the Gemfile) has to be set up first
+```
+## Code Quality
+```bash
+# Run RuboCop linter
+bundle exec rubocop
+# Auto-fix offenses
+bundle exec rubocop -a
+```
+## Making Changes
+1. **Fork** the repository on GitHub
+2. **Create a branch** for your feature: `git checkout -b feature/my-feature`
+3. **Make your changes** and add tests
+4. **Ensure all tests pass**: `bundle exec rspec`
+5. **Ensure code is clean**: `bundle exec rubocop -a`
+6. **Commit** with clear messages: `git commit -am 'Add new pattern for X'`
+7. **Push** to your fork: `git push origin feature/my-feature`
+8. **Open a PR** on GitHub
+## Adding New Patterns
+New injection attack patterns go in `lib/promptmenot/patterns/`.
+See existing pattern files for the DSL. Each pattern registers with:
+- `name` — unique identifier
+- `regex` — detection pattern
+- `sensitivity` — `:low`, `:medium`, `:high`, or `:paranoid`
+- `confidence` — `:high`, `:medium`, or `:low`
+Always include tests in `spec/promptmenot/patterns/`.
+## Reporting Issues
+Found a bug or have a suggestion? Open an issue on GitHub with:
+- Clear description of the problem
+- Steps to reproduce (if applicable)
+- Expected vs. actual behavior
+- Ruby/Rails version info
+## License
+All contributions are made under the MIT license.
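The pattern fields listed above fit naturally into a small value object. `pattern.rb` is not part of this excerpt, so the following is only a sketch under assumptions: the `ExamplePattern` class, its argument validation, and a `match` method that returns `[matched_text, range]` pairs are hypothetical and are not the gem's `Pattern` API.

```ruby
# frozen_string_literal: true

# Hypothetical stand-in for a detection pattern; not the gem's Pattern class.
class ExamplePattern
  LEVELS = %i[low medium high paranoid].freeze
  CONFIDENCES = %i[high medium low].freeze

  attr_reader :name, :category, :regex, :sensitivity, :confidence

  def initialize(name:, category:, regex:, sensitivity: :medium, confidence: :medium)
    raise ArgumentError, "unknown sensitivity: #{sensitivity}" unless LEVELS.include?(sensitivity)
    raise ArgumentError, "unknown confidence: #{confidence}" unless CONFIDENCES.include?(confidence)

    @name = name
    @category = category
    @regex = regex
    @sensitivity = sensitivity
    @confidence = confidence
  end

  # Returns every non-overlapping hit as [matched_text, end-exclusive range].
  def match(text)
    text.to_enum(:scan, regex).map do
      md = Regexp.last_match
      [md[0], md.begin(0)...md.end(0)]
    end
  end
end

pattern = ExamplePattern.new(
  name: :ignore_previous,
  category: :direct_instruction_override,
  regex: /ignore (all )?previous instructions/i,
  sensitivity: :low,
  confidence: :high
)
p pattern.match("Hi. Ignore previous instructions.")
# => [["Ignore previous instructions", 4...32]]
```

Collecting `Regexp.last_match` inside `to_enum(:scan, ...)` is a common Ruby idiom for getting offsets as well as text, which is what an overlap-based deduplication step needs.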

data/Gemfile ADDED

@@ -0,0 +1,9 @@
+# frozen_string_literal: true
+source "https://rubygems.org"
+gemspec
+gem "rake", "~> 13.0"
+gem "rspec", "~> 3.0"
+gem "rubocop", "~> 1.21", require: false

data/LICENSE.txt ADDED

@@ -0,0 +1,21 @@
+The MIT License (MIT)
+Copyright (c) 2026 promptmenot contributors
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in
+all copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+THE SOFTWARE.

data/README.md ADDED

@@ -0,0 +1,127 @@
+# PromptMeNot
+Detect and sanitize prompt injection attacks in user-submitted text. Protects Rails apps against:
+- **Direct injection** -- users trying to hack your LLMs via form inputs
+- **Indirect injection** -- users storing malicious prompts in profiles so other LLMs that scrape your site get compromised
+## Installation
+Add to your Gemfile:
+```ruby
+gem "promptmenot"
+```
+Then run:
+```bash
+bundle install
+rails generate promptmenot:install  # creates config/initializers/promptmenot.rb
+```
+## Quick Start
+### ActiveModel Validation
+```ruby
+class UserProfile < ApplicationRecord
+  # Reject mode (default) -- adds validation error
+  validates :bio, prompt_safety: true
+  # Sanitize mode -- strips malicious content, no error
+  validates :about_me, prompt_safety: { mode: :sanitize }
+  # Custom sensitivity
+  validates :notes, prompt_safety: { sensitivity: :high, mode: :reject }
+end
+```
+### Standalone API
+```ruby
+Promptmenot.safe?("Hello world")
+# => true
+Promptmenot.safe?("Ignore all previous instructions")
+# => false
+result = Promptmenot.detect("Some text with [SYSTEM] override")
+result.safe?              # => false
+result.unsafe?            # => true
+result.matches            # => [#<Match ...>]
+result.categories_detected # => [:delimiter_injection]
+result.summary            # => "Detected 1 potential prompt injection pattern..."
+sanitized = Promptmenot.sanitize("Hello. Ignore all previous instructions. Goodbye.")
+sanitized.sanitized  # => "Hello. [removed] Goodbye."
+sanitized.changed?   # => true
+sanitized.original   # => "Hello. Ignore all previous instructions. Goodbye."
+```
+## Configuration
+```ruby
+# config/initializers/promptmenot.rb
+Promptmenot.configure do |config|
+  # Default sensitivity level for all validations
+  # Options: :low, :medium (default), :high, :paranoid
+  config.sensitivity = :medium
+  # Default mode: :reject (validation error) or :sanitize (strip content)
+  config.mode = :reject
+  # Replacement text used in sanitize mode
+  config.replacement_text = "[removed]"
+  # Callback fired whenever injection is detected
+  config.on_detect = ->(result) { Rails.logger.warn("Injection: #{result.summary}") }
+  # Register custom patterns
+  config.add_pattern(
+    name: :my_custom_pattern,
+    regex: /my dangerous regex/i,
+    category: :custom,
+    sensitivity: :medium,
+    confidence: :high
+  )
+end
+```
+## Sensitivity Levels
+Sensitivity controls which patterns are active. Each pattern declares a minimum sensitivity level -- it only runs when the requested sensitivity is at or above that level.
+| Pattern sensitivity | Active at `:low` | `:medium` | `:high` | `:paranoid` |
+|---|---|---|---|---|
+| `:low` | Yes | Yes | Yes | Yes |
+| `:medium` | No | Yes | Yes | Yes |
+| `:high` | No | No | Yes | Yes |
+| `:paranoid` | No | No | No | Yes |
+**`:low`** catches only the most obvious attacks (e.g., "ignore all previous instructions"). **`:paranoid`** flags anything remotely suspicious, including mixed-script text.
+## Pattern Categories
+| Category | Examples | Count |
+|---|---|---|
+| `direct_instruction_override` | "ignore previous instructions", "new instructions:" | ~12 |
+| `role_manipulation` | "jailbreak mode", "act as unrestricted AI", "DAN" | ~10 |
+| `delimiter_injection` | `<\|system\|>`, `[SYSTEM]`, ChatML tokens | ~10 |
+| `encoding_obfuscation` | Base64 payloads, zero-width chars, hex escapes | ~10 |
+| `indirect_injection` | "Dear AI", "if you are an LLM", "note to chatbot" | ~10 |
+| `context_manipulation` | `===RESET===`, "the above is a test", prompt leaking | ~8 |
+## False Positive Mitigation
+Patterns use contextual qualifiers to minimize false positives:
+- "ignore" alone is fine -- "ignore **previous instructions**" is flagged
+- "act as" requires malicious qualifiers -- "act as a consultant" passes
+- "you are now" requires AI/restriction qualifiers -- "you are now subscribed" passes
+- "from now on" requires imperative "you must/will" -- "from now on I'll work from home" passes
+- Broad patterns are placed at `:high`/`:paranoid` sensitivity so they don't fire at default settings
+## License
+MIT License. See [LICENSE.txt](LICENSE.txt).
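The README's `Promptmenot.sanitize` example swaps each flagged span for `replacement_text`. `sanitizer.rb` is outside this excerpt, so here is a minimal sketch of that substitution under stated assumptions: `sanitize_spans` is a hypothetical helper, and spans are taken to be non-overlapping, end-exclusive index ranges.

```ruby
# frozen_string_literal: true

# Hypothetical helper: replaces each end-exclusive index range with `replacement`.
# Assumes the ranges do not overlap (i.e. they were deduplicated beforehand).
def sanitize_spans(text, ranges, replacement: "[removed]")
  result = text.dup
  # Work right-to-left so earlier offsets stay valid after each replacement.
  ranges.sort_by(&:begin).reverse_each do |range|
    result[range] = replacement
  end
  result
end

original = "Hello. Ignore all previous instructions. Goodbye."
span = original.index("Ignore")...(original.index("Goodbye") - 1)
puts sanitize_spans(original, [span])
# prints "Hello. [removed] Goodbye."
```

Replacing from the rightmost span backwards is the design point: each substitution changes the string's length, which would shift the offsets of any span further to the right.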

data/Rakefile ADDED

@@ -0,0 +1,12 @@
+# frozen_string_literal: true
+require "bundler/gem_tasks"
+require "rspec/core/rake_task"
+RSpec::Core::RakeTask.new(:spec)
+require "rubocop/rake_task"
+RuboCop::RakeTask.new
+task default: %i[spec rubocop]

data/agents.md ADDED

@@ -0,0 +1,150 @@
+# Agents Reference Guide
+This document provides operational knowledge for AI agents and developers working on PromptMeNot.
+## Project Overview
+- **Project Name:** PromptMeNot
+- **Type:** Ruby gem (Rails plugin)
+- **Framework:** ActiveModel / ActiveSupport (>= 6.0)
+- **Language:** Ruby >= 3.0
+- **Package Manager:** Bundler
+- **Test Framework:** RSpec
+- **Linter:** RuboCop
+---
+## Getting Started
+```bash
+# Install dependencies
+bundle install
+# Run tests
+bundle exec rspec
+# Run linter
+bundle exec rubocop
+```
+---
+## Build & Release
+### Build the Gem
+```bash
+# Build .gem file
+gem build promptmenot.gemspec
+# Install locally for testing
+gem install promptmenot-*.gem
+```
+### Release
+```bash
+# Bump version in lib/promptmenot/version.rb, then:
+gem build promptmenot.gemspec
+gem push promptmenot-*.gem
+```
+---
+## Architecture
+### Pattern System
+Patterns are registered via a DSL in `lib/promptmenot/patterns/*.rb`. Each pattern declares:
+- **name** — unique identifier
+- **category** — which pattern category it belongs to
+- **regex** — the detection regex
+- **sensitivity** — minimum sensitivity level to activate (`:low`, `:medium`, `:high`, `:paranoid`)
+- **confidence** — how confident the match is (`:high`, `:medium`, `:low`)
+### Sensitivity Levels (Filter-Based)
+Each pattern declares its minimum sensitivity. At runtime, only patterns at or below the requested level are active:
+| Pattern sensitivity | Active at :low | :medium | :high | :paranoid |
+|---|---|---|---|---|
+| :low | Yes | Yes | Yes | Yes |
+| :medium | No | Yes | Yes | Yes |
+| :high | No | No | Yes | Yes |
+| :paranoid | No | No | No | Yes |
+### Detection Flow
+1. `Detector` receives text + sensitivity level
+2. `PatternRegistry` filters patterns by sensitivity
+3. Each pattern's regex runs against the text
+4. Overlapping matches are deduplicated
+5. `Result` object is returned (safe?/unsafe?, matches, categories)
+### Modes
+- **reject** — adds ActiveModel validation error (default)
+- **sanitize** — strips matched content from the field value
+---
+## Common Scripts Reference
+```bash
+# Development
+bundle install               # Install dependencies
+bundle console               # Open IRB with gem loaded (if configured)
+# Testing
+bundle exec rspec            # Run full test suite
+bundle exec rspec spec/promptmenot/detector_spec.rb  # Run single spec
+# Linting
+bundle exec rubocop          # Run linter
+bundle exec rubocop -a       # Auto-fix offenses
+```
+---
+## Key File Locations
+| File | Purpose |
+|---|---|
+| `lib/promptmenot.rb` | Root entry point, convenience API |
+| `lib/promptmenot/version.rb` | Gem version |
+| `lib/promptmenot/configuration.rb` | Global config DSL |
+| `lib/promptmenot/detector.rb` | Core detection engine |
+| `lib/promptmenot/sanitizer.rb` | Content sanitization |
+| `lib/promptmenot/validator.rb` | ActiveModel validator |
+| `lib/promptmenot/pattern_registry.rb` | Central pattern registry |
+| `lib/promptmenot/patterns/` | All pattern category definitions |
+| `lib/promptmenot/railtie.rb` | Rails auto-config |
+| `config/locales/en.yml` | I18n error messages |
+| `promptmenot.gemspec` | Gem specification |
+| `spec/` | All test specs |
+---
+## Troubleshooting
+### Bundle Install Fails
+**Symptom:** Dependency resolution errors
+```bash
+# Remove lockfile and retry
+rm Gemfile.lock && bundle install
+```
+### RSpec Can't Find Patterns
+**Symptom:** Tests pass but no patterns are detected
+Check that all pattern files in `lib/promptmenot/patterns/` are required in `lib/promptmenot.rb`.
+---
+## Related Documentation
+- `README.md` - Usage examples, configuration guide, pattern reference
+- `CHANGELOG.md` - Version history
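The filter rule in the sensitivity table reduces to comparing positions in an ordered list of levels. A minimal sketch, assuming levels are ordered `:low < :medium < :high < :paranoid` as in `Configuration::VALID_SENSITIVITIES`; the pattern names and the `active?` helper are hypothetical, not `PatternRegistry` code:

```ruby
# frozen_string_literal: true

LEVELS = %i[low medium high paranoid].freeze

# A pattern is active when its declared level is at or below the requested level.
def active?(pattern_level, requested_level)
  LEVELS.index(pattern_level) <= LEVELS.index(requested_level)
end

# Hypothetical (name => minimum level) pairs standing in for registered patterns.
patterns = {
  ignore_previous: :low,
  act_as_unrestricted: :medium,
  base64_payload: :high,
  mixed_script: :paranoid
}

LEVELS.each do |requested|
  active = patterns.select { |_name, level| active?(level, requested) }.keys
  puts "#{requested}: #{active.join(", ")}"
end
```

Each level prints a superset of the level before it, which is why raising sensitivity can only add detections, never remove them.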

data/config/locales/en.yml ADDED

@@ -0,0 +1,4 @@
+en:
+  errors:
+    messages:
+      prompt_injection_detected: "contains potentially unsafe prompt injection content"

data/lib/generators/promptmenot/install_generator.rb ADDED

@@ -0,0 +1,17 @@
+# frozen_string_literal: true
+require "rails/generators"
+module Promptmenot
+  module Generators
+    class InstallGenerator < Rails::Generators::Base
+      source_root File.expand_path("templates", __dir__)
+      desc "Creates a Promptmenot initializer in your application."
+      def copy_initializer
+        template "promptmenot.rb", "config/initializers/promptmenot.rb"
+      end
+    end
+  end
+end

data/lib/generators/promptmenot/templates/promptmenot.rb ADDED

@@ -0,0 +1,27 @@
+# frozen_string_literal: true
+Promptmenot.configure do |config|
+  # Default sensitivity level for all validations.
+  # Options: :low, :medium (default), :high, :paranoid
+  # config.sensitivity = :medium
+  # Default mode for the prompt_safety validator.
+  # :reject  — adds a validation error (default)
+  # :sanitize — strips matched content from the field
+  # config.mode = :reject
+  # Replacement text used in sanitize mode.
+  # config.replacement_text = "[removed]"
+  # Callback fired whenever an injection is detected.
+  # config.on_detect = ->(result) { Rails.logger.warn("Prompt injection: #{result.summary}") }
+  # Register custom patterns:
+  # config.add_pattern(
+  #   name: :my_custom_pattern,
+  #   regex: /my custom regex/i,
+  #   category: :custom,
+  #   sensitivity: :medium,
+  #   confidence: :high
+  # )
+end

data/lib/promptmenot/configuration.rb ADDED

@@ -0,0 +1,54 @@
+# frozen_string_literal: true
+module Promptmenot
+  class Configuration
+    VALID_SENSITIVITIES = %i[low medium high paranoid].freeze
+    VALID_MODES = %i[reject sanitize].freeze
+    attr_reader :sensitivity, :mode
+    attr_accessor :replacement_text, :on_detect, :max_length
+    def initialize
+      @sensitivity = :medium
+      @mode = :reject
+      @replacement_text = "[removed]"
+      @max_length = 50_000
+      @custom_patterns_list = []
+      @custom_patterns = nil
+      @on_detect = nil
+    end
+    def sensitivity=(value)
+      sym = value.to_sym
+      unless VALID_SENSITIVITIES.include?(sym)
+        raise ConfigurationError, "Invalid sensitivity: #{value}. Must be one of: #{VALID_SENSITIVITIES.join(", ")}"
+      end
+      @sensitivity = sym
+    end
+    def mode=(value)
+      sym = value.to_sym
+      unless VALID_MODES.include?(sym)
+        raise ConfigurationError, "Invalid mode: #{value}. Must be one of: #{VALID_MODES.join(", ")}"
+      end
+      @mode = sym
+    end
+    def custom_patterns
+      @custom_patterns ||= @custom_patterns_list.dup.freeze
+    end
+    def add_pattern(name:, regex:, category: :custom, sensitivity: :medium, confidence: :medium)
+      @custom_patterns_list << Pattern.new(
+        name: name,
+        category: category,
+        regex: regex,
+        sensitivity: sensitivity,
+        confidence: confidence
+      )
+      @custom_patterns = nil
+    end
+  end
+end
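`Configuration` combines two patterns worth seeing in isolation: setters that normalize and validate input, and a memoized frozen snapshot (`custom_patterns`) that `add_pattern` invalidates by resetting it to `nil`. The sketch below is a stripped-down, hypothetical `ExampleConfig`, not the gem's class. One deliberate difference: it checks `respond_to?(:to_sym)` first, so a `nil` value raises the config error instead of the `NoMethodError` that `nil.to_sym` would raise in the setter above.

```ruby
# frozen_string_literal: true

# Hypothetical stand-ins mirroring the shape of Promptmenot::Configuration.
class ExampleConfigError < StandardError; end

class ExampleConfig
  VALID_SENSITIVITIES = %i[low medium high paranoid].freeze

  attr_reader :sensitivity

  def initialize
    @sensitivity = :medium
    @pattern_list = []
    @patterns = nil # memoized frozen snapshot
  end

  def sensitivity=(value)
    sym = value.respond_to?(:to_sym) ? value.to_sym : nil
    unless VALID_SENSITIVITIES.include?(sym)
      raise ExampleConfigError, "Invalid sensitivity: #{value.inspect}"
    end

    @sensitivity = sym
  end

  # Callers get a frozen copy, so they cannot mutate the internal list.
  def patterns
    @patterns ||= @pattern_list.dup.freeze
  end

  def add_pattern(name)
    @pattern_list << name
    @patterns = nil # invalidate the snapshot so the next read sees the new entry
  end
end

config = ExampleConfig.new
config.sensitivity = "high" # strings are accepted and normalized to symbols
config.add_pattern(:my_custom_pattern)
p config.sensitivity # => :high
p config.patterns    # => [:my_custom_pattern]
```

Freezing the returned copy keeps callers from appending to the list behind the configuration's back; resetting the memo on every write keeps reads cheap without serving a stale list.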

data/lib/promptmenot/detector.rb ADDED

@@ -0,0 +1,67 @@
+# frozen_string_literal: true
+module Promptmenot
+  class Detector
+    attr_reader :sensitivity, :categories
+    def initialize(sensitivity: nil, categories: nil)
+      @sensitivity = sensitivity || Promptmenot.configuration.sensitivity
+      @categories = categories
+    end
+    def detect(text)
+      return Result.new(text: text.to_s) if text.nil? || text.to_s.strip.empty?
+      input = text.to_s
+      max = Promptmenot.configuration.max_length
+      input = input[0, max] if max && input.length > max
+      patterns = Promptmenot.registry.for_sensitivity_and_categories(
+        @sensitivity,
+        categories: @categories
+      )
+      all_matches = patterns.flat_map { |pattern| pattern.match(input) }
+      deduped = deduplicate(all_matches)
+      result = Result.new(text: input, matches: deduped)
+      fire_callback(result) if result.unsafe?
+      result
+    end
+    private
+    def deduplicate(matches)
+      return matches if matches.size <= 1
+      sorted = matches.sort_by { |m| [m.position.begin, -m.position.size] }
+      kept = []
+      sorted.each do |match|
+        existing = kept.find { |m| overlaps?(m, match) }
+        if existing
+          # Keep the larger match when overlapping
+          if match.position.size > existing.position.size
+            kept.delete(existing)
+            kept << match
+          end
+        else
+          kept << match
+        end
+      end
+      kept
+    end
+    def overlaps?(first, second)
+      first.position.begin < second.position.end && second.position.begin < first.position.end
+    end
+    def fire_callback(result)
+      callback = Promptmenot.configuration.on_detect
+      callback&.call(result)
+    rescue StandardError => e
+      warn "[Promptmenot] on_detect callback raised #{e.class}: #{e.message}"
+    end
+  end
+end
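`Detector#overlaps?` is the standard half-open interval test, which assumes `Match#position` is an end-exclusive `Range`. `deduplicate` sweeps matches in start order and lets a longer match replace the one it overlaps. For comparison, here is a different rule, size-first greedy selection, as a standalone sketch on plain ranges; the helper names are hypothetical and this is not the gem's behavior.

```ruby
# frozen_string_literal: true

# Half-open overlap test, same condition as Detector#overlaps? above.
def overlaps?(a, b)
  a.begin < b.end && b.begin < a.end
end

# Size-first greedy: visit the longest spans first, keep a span only if it
# overlaps nothing already kept, then return the survivors in text order.
def dedupe_longest_first(ranges)
  kept = []
  ranges.sort_by { |r| [-r.size, r.begin] }.each do |r|
    kept << r unless kept.any? { |k| overlaps?(k, r) }
  end
  kept.sort_by(&:begin)
end

chain = [0...10, 8...19, 17...29]
p dedupe_longest_first(chain) # => [0...10, 17...29]
```

The difference shows up on chains of overlaps. Tracing `deduplicate` on the same three spans, `8...19` replaces `0...10` and `17...29` then replaces `8...19`, so only `17...29` survives; the size-first rule also keeps `0...10` because it never overlapped the winner. Which result is preferable depends on whether every distinct flagged span should be reported or sanitized.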

data/lib/promptmenot/errors.rb ADDED

@@ -0,0 +1,7 @@
+# frozen_string_literal: true
+module Promptmenot
+  class Error < StandardError; end
+  class ConfigurationError < Error; end
+  class PatternError < Error; end
+end

data/lib/promptmenot/match.rb ADDED

@@ -0,0 +1,36 @@
+# frozen_string_literal: true
+module Promptmenot
+  class Match
+    attr_reader :pattern, :matched_text, :position
+    def initialize(pattern:, matched_text:, position:)
+      @pattern = pattern
+      @matched_text = matched_text
+      @position = position
+    end
+    def category
+      pattern.category
+    end
+    def pattern_name
+      pattern.name
+    end
+    def confidence
+      pattern.confidence
+    end
+    def sensitivity
+      pattern.sensitivity
+    end
+    def ==(other)
+      other.is_a?(Match) &&
+        pattern_name == other.pattern_name &&
+        matched_text == other.matched_text &&
+        position == other.position
+    end
+  end
+end
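`Match#==` compares pattern name, matched text, and position, but the class does not define `eql?` or `hash`. Ruby's hash-based operations (`Array#uniq`, `Hash` keys, `Set`) use those two methods rather than `==`, so two `Match` objects that are `==` would still count as distinct there. A minimal sketch of the usual fix, using a hypothetical `SpanMatch` class rather than the gem's:

```ruby
# frozen_string_literal: true

# Hypothetical value object with the same identity fields as Match#==.
class SpanMatch
  attr_reader :pattern_name, :matched_text, :position

  def initialize(pattern_name:, matched_text:, position:)
    @pattern_name = pattern_name
    @matched_text = matched_text
    @position = position
  end

  def ==(other)
    other.is_a?(SpanMatch) &&
      pattern_name == other.pattern_name &&
      matched_text == other.matched_text &&
      position == other.position
  end
  # Hash-based collections (uniq, Hash keys, Set) consult eql? and hash, not ==.
  alias eql? ==

  def hash
    [self.class, pattern_name, matched_text, position].hash
  end
end

a = SpanMatch.new(pattern_name: :dan, matched_text: "DAN", position: 0...3)
b = SpanMatch.new(pattern_name: :dan, matched_text: "DAN", position: 0...3)
p [a, b].uniq.size # => 1 (would be 2 without eql? and hash)
```

Deriving `hash` from exactly the fields that `==` compares is what keeps the two methods consistent: objects that are equal must produce the same hash.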