ai_redactor 0.1.0

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+ metadata.gz: 993e3f9ba1f6b448dedbb5de93e8ea20c9d3f52b27b61ca8c099e3775cc95450
4
+ data.tar.gz: f1de9114804300e55ea2e4463a7590f55cf28c724054b68c0a028914aeb2de2d
5
+ SHA512:
6
+ metadata.gz: 5d9f1f6e2492e810a94d5e2242eade09467e5e1865af777870d0e6fabf16fa874a65cba7e8855634bfc9c3051d35bf42888bf7f25ae6bee27671c4c5edfe0b53
7
+ data.tar.gz: 871f92bd62a0e7acc10fafdf9906b80f52cbca771f098d12ff2aef81db61a19a0448828084cb491313acf78b1156714ec0be1f24c0fa0a6b2d5ea5164099eedc
data/.rspec ADDED
@@ -0,0 +1,3 @@
1
+ --format documentation
2
+ --color
3
+ --require spec_helper
data/.rubocop.yml ADDED
@@ -0,0 +1,45 @@
1
+ AllCops:
2
+ TargetRubyVersion: 2.7
3
+ NewCops: enable
4
+ Exclude:
5
+ - 'vendor/**/*'
6
+ - 'bin/**/*'
7
+ - 'exe/**/*'
8
+
9
+ Style/Documentation:
10
+ Enabled: false
11
+
12
+ Style/StringLiterals:
13
+ EnforcedStyle: double_quotes
14
+
15
+ Style/StringLiteralsInInterpolation:
16
+ EnforcedStyle: double_quotes
17
+
18
+ Layout/LineLength:
19
+ Max: 120
20
+
21
+ Metrics/MethodLength:
22
+ Max: 20
23
+
24
+ Metrics/ClassLength:
25
+ Max: 150
26
+
27
+ Metrics/ModuleLength:
28
+ Max: 150
29
+
30
+ Metrics/BlockLength:
31
+ Exclude:
32
+ - 'spec/**/*'
33
+ - '*.gemspec'
34
+
35
+ Style/FrozenStringLiteralComment:
36
+ Enabled: true
37
+
38
+ Layout/EmptyLinesAroundBlockBody:
39
+ Enabled: false
40
+
41
+ Layout/EmptyLinesAroundClassBody:
42
+ Enabled: false
43
+
44
+ Layout/EmptyLinesAroundModuleBody:
45
+ Enabled: false
data/CHANGELOG.md ADDED
@@ -0,0 +1,39 @@
1
+ # Changelog
2
+
3
+ All notable changes to this project will be documented in this file.
4
+
5
+ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
6
+ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
7
+
8
+ ## [Unreleased]
9
+
10
+ ## [0.1.0] - 2024-09-09
11
+
12
+ ### Added
13
+ - Initial release of AI Redactor
14
+ - Text redaction with regex-based PII detection
15
+ - Support for multiple PII types:
16
+ - Turkish National ID numbers
17
+ - IBAN numbers
18
+ - Credit card numbers
19
+ - Phone numbers
20
+ - Email addresses
21
+ - Social Security Numbers
22
+ - License plates
23
+ - IP addresses
24
+ - Passport numbers
25
+ - Tax ID numbers
26
+ - Bank account numbers
27
+ - Flexible masking options with custom characters and lengths
28
+ - Format-preserving masking capability
29
+ - Comprehensive reporting with JSON export
30
+ - Confidence scoring for detections
31
+ - CLI interface with multiple output formats
32
+ - Pattern filtering for selective detection
33
+ - Full test suite with RSpec
34
+ - Documentation and examples
35
+
36
+ ### Security
37
+ - Offline processing - no external API calls
38
+ - No data persistence - all processing in memory
39
+ - Privacy-by-design architecture
data/Gemfile ADDED
@@ -0,0 +1,6 @@
1
+ # frozen_string_literal: true
2
+
3
+ source "https://rubygems.org"
4
+
5
+ # Specify your gem's dependencies in ai_redactor.gemspec
6
+ gemspec
data/README.md ADDED
@@ -0,0 +1,208 @@
1
+ # AI Redactor
2
+
3
+ [![Gem Version](https://badge.fury.io/rb/ai_redactor.svg)](https://badge.fury.io/rb/ai_redactor)
4
+ [![Build Status](https://github.com/ahmetxhero/ai_redactor/workflows/CI/badge.svg)](https://github.com/ahmetxhero/ai_redactor/actions)
5
+
6
+ AI-powered redaction tool for automatically detecting and masking Personally Identifiable Information (PII) in text and images. Designed to help organizations comply with GDPR, HIPAA, and KVKK regulations.
7
+
8
+ ## Features
9
+
10
+ - **Text Analysis**: Regex-based detection of PII including:
11
+ - Turkish National ID numbers
12
+ - IBAN numbers
13
+ - Credit card numbers
14
+ - Phone numbers
15
+ - Email addresses
16
+ - Social Security Numbers
17
+ - License plates
18
+ - IP addresses
19
+ - Passport numbers
20
+ - Tax ID numbers
21
+ - Bank account numbers
22
+
23
+ - **Flexible Masking Options**:
24
+ - Custom mask characters
25
+ - Configurable mask length
26
+ - Format-preserving masking
27
+ - Pattern-specific filtering
28
+
29
+ - **Comprehensive Reporting**:
30
+ - JSON reports with detection details
31
+ - Confidence scores for each detection
32
+ - Position information
33
+ - Summary statistics
34
+
35
+ - **CLI Interface**: Command-line tool for batch processing
36
+ - **Developer-friendly API**: Simple Ruby interface
37
+
38
+ ## Installation
39
+
40
+ Add this line to your application's Gemfile:
41
+
42
+ ```ruby
43
+ gem 'ai_redactor'
44
+ ```
45
+
46
+ And then execute:
47
+
48
+ $ bundle install
49
+
50
+ Or install it yourself as:
51
+
52
+ $ gem install ai_redactor
53
+
54
+ ## Usage
55
+
56
+ ### Basic Text Masking
57
+
58
+ ```ruby
59
+ require 'ai_redactor'
60
+
61
+ # Simple masking
62
+ text = "John Smith, National ID: 12345678901, IBAN: GB29NWBK60161331926819"
63
+ masked = AiRedactor.mask_text(text)
64
+ puts masked
65
+ # => "John Smith, National ID: ********, IBAN: ********"
66
+
67
+ # Custom masking options
68
+ masked = AiRedactor.mask_text(text,
69
+ mask_char: 'X',
70
+ mask_length: 4,
71
+ preserve_format: true
72
+ )
73
+ ```
74
+
75
+ ### Detailed Analysis
76
+
77
+ ```ruby
78
+ # Get detailed analysis report
79
+ report = AiRedactor.analyze_text(text)
80
+
81
+ puts "Found #{report.detection_count} PII items"
82
+ puts "Detection types: #{report.detection_types.join(', ')}"
83
+ puts "Average confidence: #{report.summary[:average_confidence]}"
84
+
85
+ # Access individual detections
86
+ report.detections.each do |detection|
87
+ puts "#{detection[:type]}: #{detection[:original]} (confidence: #{detection[:confidence]})"
88
+ end
89
+
90
+ # Export as JSON
91
+ json_report = report.to_json
92
+ File.write('analysis_report.json', json_report)
93
+ ```
94
+
95
+ ### Pattern Filtering
96
+
97
+ ```ruby
98
+ # Only detect specific patterns
99
+ email_only = AiRedactor.mask_text(text, patterns: [:email])
100
+
101
+ # Multiple specific patterns
102
+ financial_only = AiRedactor.mask_text(text, patterns: [:iban, :credit_card, :bank_account])
103
+ ```
104
+
105
+ ### CLI Usage
106
+
107
+ ```bash
108
+ # Basic masking
109
+ ai_redactor "Contact John at john@example.com or call 555-123-4567"
110
+
111
+ # Custom options
112
+ ai_redactor --mask-char X --mask-length 4 "Email: john@example.com"
113
+
114
+ # Detailed analysis
115
+ ai_redactor --analyze --format json "ID: 12345678901, Email: john@test.com"
116
+
117
+ # Save to file
118
+ ai_redactor --output report.txt --analyze "Sensitive data here"
119
+
120
+ # List available patterns
121
+ ai_redactor --list-patterns
122
+
123
+ # Filter specific patterns
124
+ ai_redactor --patterns email,phone "Contact info: john@test.com, 555-1234"
125
+ ```
126
+
127
+ ## Configuration Options
128
+
129
+ | Option | Description | Default |
130
+ |--------|-------------|---------|
131
+ | `mask_char` | Character used for masking | `"*"` |
132
+ | `mask_length` | Maximum length of the mask (capped at the match length; ignored when `preserve_format` is `true`) | `8` |
133
+ | `preserve_format` | Preserve original format | `false` |
134
+ | `patterns` | Array of patterns to detect | All patterns |
135
+ | `case_sensitive` | Case-sensitive matching | `false` |
136
+
137
+ ## Supported PII Patterns
138
+
139
+ - **turkish_id**: Turkish National ID numbers (11 digits)
140
+ - **iban**: International Bank Account Numbers
141
+ - **credit_card**: Credit card numbers (various formats)
142
+ - **phone**: Phone numbers (international and local)
143
+ - **email**: Email addresses
144
+ - **ssn**: Social Security Numbers (US format)
145
+ - **license_plate**: License plate numbers (pattern disabled in 0.1.0 to avoid false positives)
146
+ - **ip_address**: IP addresses
147
+ - **passport**: Passport numbers (pattern disabled in 0.1.0)
148
+ - **tax_id**: Tax ID numbers
149
+ - **bank_account**: Bank account numbers (pattern disabled in 0.1.0)
150
+
151
+ ## Privacy & Security
152
+
153
+ - **Offline Processing**: No data sent to external services
154
+ - **No Data Storage**: Text is processed in memory only
155
+ - **Configurable**: Full control over detection and masking
156
+ - **Compliance Ready**: Supports GDPR, HIPAA, and KVKK requirements
157
+
158
+ ## Development
159
+
160
+ After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake spec` to run the tests. You can also run `bin/console` for an interactive prompt.
161
+
162
+ To install this gem onto your local machine, run `bundle exec rake install`.
163
+
164
+ ### Running Tests
165
+
166
+ ```bash
167
+ bundle exec rspec
168
+ ```
169
+
170
+ ### Code Quality
171
+
172
+ ```bash
173
+ bundle exec rubocop
174
+ ```
175
+
176
+ ## Roadmap
177
+
178
+ - **v0.1**: ✅ Text redaction with regex patterns
179
+ - **v0.2**: 🔄 ONNX-powered face detection in images
180
+ - **v0.3**: 📋 REST/gRPC API service mode
181
+ - **v1.0**: 🎯 Full compliance suite with audit logs
182
+
183
+ ## Contributing
184
+
185
+ Bug reports and pull requests are welcome on GitHub at https://github.com/ahmetxhero/ai_redactor.
186
+
187
+ 1. Fork it
188
+ 2. Create your feature branch (`git checkout -b my-new-feature`)
189
+ 3. Commit your changes (`git commit -am 'Add some feature'`)
190
+ 4. Push to the branch (`git push origin my-new-feature`)
191
+ 5. Create new Pull Request
192
+
193
+ ## License
194
+
195
+ The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).
196
+
197
+ ## Support
198
+
199
+ - 📧 Email: ahmetxhero@gmail.com
200
+ - 🐛 Issues: [GitHub Issues](https://github.com/ahmetxhero/ai_redactor/issues)
201
+ - 📖 Documentation: [GitHub Wiki](https://github.com/ahmetxhero/ai_redactor/wiki)
202
+ - 🌐 Portfolio: [ahmetxhero.web.app](https://ahmetxhero.web.app)
203
+ - 🐤 Twitter: [@ahmetxhero](https://x.com/ahmetxhero)
204
+ - 💼 LinkedIn: [linkedin.com/in/ahmetxhero](https://linkedin.com/in/ahmetxhero)
205
+
206
+ ---
207
+
208
+ **Protecting Privacy, One Redaction at a Time** 🛡️
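The `mask_length` option in the README's configuration table is applied as an upper bound rather than a fixed width: the non-format-preserving branch of `generate_mask` in `lib/ai_redactor/text_redactor.rb` multiplies the mask character by the smaller of `mask_length` and the match length. A minimal standalone sketch of that rule follows; `capped_mask` is a hypothetical name used only for illustration and is not part of the gem.

```ruby
# frozen_string_literal: true

# Illustrative stand-in for the non-format-preserving masking rule.
# `capped_mask` is a hypothetical helper, not an ai_redactor method.
def capped_mask(original, mask_char: "*", mask_length: 8)
  # The mask never grows longer than the text it replaces.
  mask_char * [mask_length, original.length].min
end

puts capped_mask("12345678901") # 11 characters in, 8 mask characters out
puts capped_mask("1.2.3.4")     # shorter than mask_length, so 7 mask characters
puts capped_mask("john@test.com", mask_char: "X", mask_length: 4)
```

Because of the cap, a masked value reveals the original length only when the match is shorter than `mask_length`.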
data/Rakefile ADDED
@@ -0,0 +1,13 @@
1
+ # frozen_string_literal: true
2
+
3
+ require "bundler/gem_tasks"
4
+ require "rspec/core/rake_task"
5
+ require "rubocop/rake_task"
6
+
7
+ RSpec::Core::RakeTask.new(:spec)
8
+ RuboCop::RakeTask.new
9
+
10
+ desc "Run all checks (tests and linting)"
11
+ task check: %i[spec rubocop]
12
+
13
+ task default: :check
data/debug_test.rb ADDED
@@ -0,0 +1,25 @@
1
+ #!/usr/bin/env ruby
2
+
3
+ require_relative 'lib/ai_redactor'
4
+
5
+ # Test the failing cases
6
+ puts "=== Test 1: Multiple detections ==="
7
+ text1 = "Email: john@test.com, ID: 12345678901, Phone: +1-555-123-4567"
8
+ report1 = AiRedactor.analyze_text(text1)
9
+ puts "Text: #{text1}"
10
+ puts "Detection count: #{report1.detection_count}"
11
+ puts "Detection types: #{report1.detection_types}"
12
+ report1.detections.each_with_index do |d, i|
13
+ puts " #{i+1}. #{d[:type]}: '#{d[:original]}' at #{d[:start_position]}-#{d[:end_position]}"
14
+ end
15
+ puts
16
+
17
+ puts "=== Test 2: Email and ID ==="
18
+ text2 = "Email: john@example.com, ID: 12345678901"
19
+ report2 = AiRedactor.analyze_text(text2)
20
+ puts "Text: #{text2}"
21
+ puts "Detection count: #{report2.detection_count}"
22
+ puts "Detection types: #{report2.detection_types}"
23
+ report2.detections.each_with_index do |d, i|
24
+ puts " #{i+1}. #{d[:type]}: '#{d[:original]}' at #{d[:start_position]}-#{d[:end_position]}"
25
+ end
data/examples/basic_usage.rb ADDED
@@ -0,0 +1,52 @@
1
+ #!/usr/bin/env ruby
2
+ # frozen_string_literal: true
3
+
4
+ require_relative '../lib/ai_redactor'
5
+
6
+ puts "=== AI Redactor Examples ==="
7
+ puts
8
+
9
+ # Example 1: Basic text masking
10
+ puts "1. Basic Text Masking:"
11
+ text = "John Smith, National ID: 12345678901, IBAN: GB29NWBK60161331926819"
12
+ masked = AiRedactor.mask_text(text)
13
+ puts "Original: #{text}"
14
+ puts "Masked: #{masked}"
15
+ puts
16
+
17
+ # Example 2: Custom masking options
18
+ puts "2. Custom Masking Options:"
19
+ custom_masked = AiRedactor.mask_text(text,
20
+ mask_char: 'X',
21
+ mask_length: 4,
22
+ preserve_format: true
23
+ )
24
+ puts "Custom: #{custom_masked}"
25
+ puts
26
+
27
+ # Example 3: Detailed analysis
28
+ puts "3. Detailed Analysis:"
29
+ report = AiRedactor.analyze_text("Contact: john@example.com, Phone: +1-555-123-4567")
30
+ puts "Detections found: #{report.detection_count}"
31
+ puts "Types: #{report.detection_types.join(', ')}"
32
+ puts "Average confidence: #{report.summary[:average_confidence]}"
33
+ puts "Masked text: #{report.masked_text}"
34
+ puts
35
+
36
+ # Example 4: Pattern filtering
37
+ puts "4. Pattern Filtering (emails only):"
38
+ email_only = AiRedactor.mask_text(
39
+ "Email: john@test.com, ID: 12345678901, Phone: +1-555-123-4567",
40
+ patterns: [:email]
41
+ )
42
+ puts "Email only: #{email_only}"
43
+ puts
44
+
45
+ # Example 5: JSON export
46
+ puts "5. JSON Report:"
47
+ json_report = report.to_json
48
+ puts "JSON length: #{json_report.length} characters"
49
+ puts "Sample: #{json_report[0..100]}..."
50
+ puts
51
+
52
+ puts "=== Examples completed! ==="
data/exe/ai_redactor ADDED
@@ -0,0 +1,173 @@
1
+ #!/usr/bin/env ruby
2
+ # frozen_string_literal: true
3
+
4
+ require_relative "../lib/ai_redactor"
5
+ require "optparse"
6
+ require "json"
7
+
8
+ class AiRedactorCLI
9
+ def initialize
10
+ @options = {
11
+ mask_char: "*",
12
+ mask_length: 8,
13
+ preserve_format: false,
14
+ patterns: nil,
15
+ output_format: "text",
16
+ output_file: nil,
17
+ analyze_only: false
18
+ }
19
+ end
20
+
21
+ def run(args)
22
+ parse_options(args)
23
+
24
+ if args.empty?
25
+ puts "Error: No input text provided"
26
+ puts "Use --help for usage information"
27
+ exit 1
28
+ end
29
+
30
+ input_text = args.join(" ")
31
+
32
+ if @options[:analyze_only]
33
+ analyze_text(input_text)
34
+ else
35
+ mask_text(input_text)
36
+ end
37
+ end
38
+
39
+ private
40
+
41
+ def parse_options(args)
42
+ OptionParser.new do |opts|
43
+ opts.banner = "Usage: ai_redactor [options] TEXT"
44
+ opts.separator ""
45
+ opts.separator "AI-powered redaction tool for detecting and masking PII"
46
+ opts.separator ""
47
+ opts.separator "Options:"
48
+
49
+ opts.on("-c", "--mask-char CHAR", "Character to use for masking (default: *)") do |char|
50
+ @options[:mask_char] = char
51
+ end
52
+
53
+ opts.on("-l", "--mask-length LENGTH", Integer, "Length of mask (default: 8)") do |length|
54
+ @options[:mask_length] = length
55
+ end
56
+
57
+ opts.on("-p", "--preserve-format", "Preserve original format when masking") do
58
+ @options[:preserve_format] = true
59
+ end
60
+
61
+ opts.on("--patterns PATTERNS", Array, "Comma-separated list of patterns to detect") do |patterns|
62
+ @options[:patterns] = patterns.map(&:to_sym)
63
+ end
64
+
65
+ opts.on("-f", "--format FORMAT", ["text", "json"], "Output format (text, json)") do |format|
66
+ @options[:output_format] = format
67
+ end
68
+
69
+ opts.on("-o", "--output FILE", "Output file (default: stdout)") do |file|
70
+ @options[:output_file] = file
71
+ end
72
+
73
+ opts.on("-a", "--analyze", "Analyze text and return detailed report") do
74
+ @options[:analyze_only] = true
75
+ end
76
+
77
+ opts.on("--list-patterns", "List available detection patterns") do
78
+ list_patterns
79
+ exit 0
80
+ end
81
+
82
+ opts.on("-h", "--help", "Show this help message") do
83
+ puts opts
84
+ exit 0
85
+ end
86
+
87
+ opts.on("--version", "Show version") do
88
+ puts "ai_redactor #{AiRedactor::VERSION}"
89
+ exit 0
90
+ end
91
+ end.parse!(args)
92
+ end
93
+
94
+ def mask_text(text)
95
+ redactor_options = {
96
+ mask_char: @options[:mask_char],
97
+ mask_length: @options[:mask_length],
98
+ preserve_format: @options[:preserve_format]
99
+ }
100
+ redactor_options[:patterns] = @options[:patterns] if @options[:patterns]
101
+
102
+ result = AiRedactor.mask_text(text, redactor_options)
103
+ output_result(result)
104
+ end
105
+
106
+ def analyze_text(text)
107
+ redactor_options = {}
108
+ redactor_options[:patterns] = @options[:patterns] if @options[:patterns]
109
+
110
+ report = AiRedactor.analyze_text(text, redactor_options)
111
+
112
+ case @options[:output_format]
113
+ when "json"
114
+ output_result(report.to_json)
115
+ else
116
+ output_analysis_report(report)
117
+ end
118
+ end
119
+
120
+ def output_analysis_report(report)
121
+ output = []
122
+ output << "=== AI Redactor Analysis Report ==="
123
+ output << "Timestamp: #{report.timestamp}"
124
+ output << ""
125
+ output << "Summary:"
126
+ output << " Total detections: #{report.detection_count}"
127
+ output << " Detection types: #{report.detection_types.join(', ')}"
128
+ output << " Average confidence: #{report.summary[:average_confidence]}"
129
+ output << " High confidence detections: #{report.summary[:high_confidence_detections]}"
130
+ output << ""
131
+
132
+ if report.has_detections?
133
+ output << "Detections:"
134
+ report.detections.each_with_index do |detection, index|
135
+ output << " #{index + 1}. Type: #{detection[:type]}"
136
+ output << " Original: #{detection[:original]}"
137
+ output << " Position: #{detection[:start_position]}-#{detection[:end_position]}"
138
+ output << " Confidence: #{(detection[:confidence] * 100).round(1)}%"
139
+ output << " Masked: #{detection[:masked_value]}"
140
+ output << ""
141
+ end
142
+ end
143
+
144
+ output << "Original text:"
145
+ output << " #{report.original_text}"
146
+ output << ""
147
+ output << "Masked text:"
148
+ output << " #{report.masked_text}"
149
+
150
+ output_result(output.join("\n"))
151
+ end
152
+
153
+ def output_result(result)
154
+ if @options[:output_file]
155
+ File.write(@options[:output_file], result)
156
+ puts "Output written to #{@options[:output_file]}"
157
+ else
158
+ puts result
159
+ end
160
+ end
161
+
162
+ def list_patterns
163
+ puts "Available detection patterns:"
164
+ AiRedactor::Patterns.pattern_names.each do |pattern|
165
+ puts " - #{pattern}"
166
+ end
167
+ end
168
+ end
169
+
170
+ if __FILE__ == $0
171
+ cli = AiRedactorCLI.new
172
+ cli.run(ARGV)
173
+ end
data/lib/ai_redactor/patterns.rb ADDED
@@ -0,0 +1,69 @@
1
+ # frozen_string_literal: true
2
+
3
+ module AiRedactor
4
+ module Patterns
5
+ # Turkish National ID (TC Kimlik No) - 11 digits
6
+ TURKISH_ID = /\b\d{11}\b/
7
+
8
+ # International Bank Account Number (IBAN)
9
+ # Supports various country formats
10
+ IBAN = /\b[A-Z]{2}\d{2}[A-Z0-9]{4}\d{7}([A-Z0-9]?){0,16}\b/
11
+
12
+ # Credit Card Numbers (various formats)
13
+ CREDIT_CARD = /\b(?:\d{4}[-\s]?){3}\d{4}\b/
14
+
15
+ # Phone Numbers (international and local formats) - more specific to avoid ID conflicts
16
+ PHONE = /(?:\+\d{1,3}[-.\s])\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}|\(\d{3}\)\s?\d{3}[-.\s]?\d{4}|\b\d{3}[.\s]\d{3}[.\s]\d{4}\b/
17
+
18
+ # Email Addresses
19
+ EMAIL = /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/
20
+
21
+ # Social Security Numbers (US format)
22
+ SSN = /\b\d{3}-\d{2}-\d{4}\b/
23
+
24
+ # License Plates (various formats) - disabled for now to avoid false positives
25
+ # LICENSE_PLATE = /\b[A-Z]{2,3}[-\s]?[A-Z0-9]{2,4}[-\s]?[A-Z0-9]{2,4}\b/
26
+
27
+ # IP Addresses
28
+ IP_ADDRESS = /\b(?:\d{1,3}\.){3}\d{1,3}\b/
29
+
30
+ # Passport Numbers (alphanumeric, 6-9 characters) - disabled for now
31
+ # PASSPORT = /\b[A-Z0-9]{6,9}\b/
32
+
33
+ # Tax ID Numbers (various formats)
34
+ TAX_ID = /\b\d{2}-\d{7}\b/
35
+
36
+ # Bank Account Numbers (8-17 digits) - disabled for now
37
+ # BANK_ACCOUNT = /\b\d{8,17}\b/
38
+
39
+ # All patterns combined for comprehensive scanning
40
+ ALL_PATTERNS = {
41
+ turkish_id: TURKISH_ID,
42
+ iban: IBAN,
43
+ credit_card: CREDIT_CARD,
44
+ phone: PHONE,
45
+ email: EMAIL,
46
+ ssn: SSN,
47
+ # license_plate: LICENSE_PLATE,
48
+ ip_address: IP_ADDRESS,
49
+ # passport: PASSPORT,
50
+ tax_id: TAX_ID
51
+ # bank_account: BANK_ACCOUNT
52
+ }.freeze
53
+
54
+ # Get pattern by name
55
+ def self.get_pattern(name)
56
+ ALL_PATTERNS[name.to_sym]
57
+ end
58
+
59
+ # Get all pattern names
60
+ def self.pattern_names
61
+ ALL_PATTERNS.keys
62
+ end
63
+
64
+ # Check if a pattern name is valid
65
+ def self.valid_pattern?(name)
66
+ ALL_PATTERNS.key?(name.to_sym)
67
+ end
68
+ end
69
+ end
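The `TURKISH_ID` pattern above accepts any 11-digit run, so it will also match order numbers, timestamps, and similar values. Turkish national ID numbers carry two check digits, and validating them would reduce such false positives. The sketch below uses the commonly described rules (first digit non-zero; tenth digit derived from the odd and even positions; eleventh digit equal to the sum of the first ten modulo 10). `TurkishIdChecksum` is a hypothetical module name and is not part of ai_redactor 0.1.0.

```ruby
# frozen_string_literal: true

# Hypothetical helper, not part of ai_redactor 0.1.0.
module TurkishIdChecksum
  module_function

  # True when the 11-digit string satisfies both check digits.
  def valid?(candidate)
    value = candidate.to_s
    return false unless value.match?(/\A[1-9]\d{10}\z/)

    digits = value.chars.map(&:to_i)
    odd_sum = digits.values_at(0, 2, 4, 6, 8).sum # 1st, 3rd, 5th, 7th, 9th digits
    even_sum = digits.values_at(1, 3, 5, 7).sum   # 2nd, 4th, 6th, 8th digits

    tenth = (odd_sum * 7 - even_sum) % 10
    eleventh = digits.first(10).sum % 10

    digits[9] == tenth && digits[10] == eleventh
  end
end

puts TurkishIdChecksum.valid?("10000000146") # well-formed test value
puts TurkishIdChecksum.valid?("12345678901") # the README sample fails the check digits
```

A detector could keep the regex as a first pass and use a check like this to raise or lower the reported confidence instead of discarding the match outright.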
data/lib/ai_redactor/report.rb ADDED
@@ -0,0 +1,84 @@
1
+ # frozen_string_literal: true
2
+
3
+ require "json"
4
+ require "time"
5
+
6
+ module AiRedactor
7
+ class Report
8
+ attr_reader :original_text, :detections, :masked_text, :timestamp
9
+
10
+ def initialize(original_text, detections, masked_text)
11
+ @original_text = original_text
12
+ @detections = detections
13
+ @masked_text = masked_text
14
+ @timestamp = Time.now.utc.strftime("%Y-%m-%dT%H:%M:%SZ")
15
+ end
16
+
17
+ def to_json(*args)
18
+ {
19
+ timestamp: @timestamp,
20
+ summary: {
21
+ total_detections: @detections.length,
22
+ detection_types: detection_types_summary,
23
+ text_length: @original_text.length,
24
+ masked_length: @masked_text.length
25
+ },
26
+ detections: @detections,
27
+ original_text: @original_text,
28
+ masked_text: @masked_text
29
+ }.to_json(*args)
30
+ end
31
+
32
+ def to_h
33
+ JSON.parse(to_json)
34
+ end
35
+
36
+ def summary
37
+ {
38
+ total_detections: @detections.length,
39
+ detection_types: detection_types_summary,
40
+ high_confidence_detections: high_confidence_count,
41
+ average_confidence: average_confidence
42
+ }
43
+ end
44
+
45
+ def has_detections?
46
+ !@detections.empty?
47
+ end
48
+
49
+ def detection_count
50
+ @detections.length
51
+ end
52
+
53
+ def detection_types
54
+ @detections.map { |d| d[:type] }.uniq
55
+ end
56
+
57
+ def detections_by_type(type)
58
+ @detections.select { |d| d[:type] == type.to_s }
59
+ end
60
+
61
+ def high_confidence_detections(threshold = 0.8)
62
+ @detections.select { |d| d[:confidence] >= threshold }
63
+ end
64
+
65
+ private
66
+
67
+ def detection_types_summary
68
+ types_count = Hash.new(0)
69
+ @detections.each { |d| types_count[d[:type]] += 1 }
70
+ types_count
71
+ end
72
+
73
+ def high_confidence_count(threshold = 0.8)
74
+ @detections.count { |d| d[:confidence] >= threshold }
75
+ end
76
+
77
+ def average_confidence
78
+ return 0.0 if @detections.empty?
79
+
80
+ total_confidence = @detections.sum { |d| d[:confidence] }
81
+ (total_confidence / @detections.length).round(2)
82
+ end
83
+ end
84
+ end
data/lib/ai_redactor/text_redactor.rb ADDED
@@ -0,0 +1,158 @@
1
+ # frozen_string_literal: true
2
+
3
+ require "json"
4
+ require_relative "patterns"
5
+ require_relative "report"
6
+
7
+ module AiRedactor
8
+ class TextRedactor
9
+ DEFAULT_MASK_CHAR = "*"
10
+ DEFAULT_MASK_LENGTH = 8
11
+
12
+ def initialize(options = {})
13
+ @mask_char = options[:mask_char] || DEFAULT_MASK_CHAR
14
+ @mask_length = options[:mask_length] || DEFAULT_MASK_LENGTH
15
+ @preserve_format = options[:preserve_format] || false
16
+ @patterns = options[:patterns] || Patterns::ALL_PATTERNS.keys
17
+ @case_sensitive = options[:case_sensitive] || false
18
+ end
19
+
20
+ def mask(text)
21
+ return "" if text.nil? || text.empty?
22
+
23
+ masked_text = text.dup
24
+ detections = []
25
+
26
+ @patterns.each do |pattern_name|
27
+ pattern = Patterns.get_pattern(pattern_name)
28
+ next unless pattern
29
+
30
+ flags = @case_sensitive ? 0 : Regexp::IGNORECASE
31
+ regex = Regexp.new(pattern.source, flags)
32
+
33
+ masked_text.gsub!(regex) do |match|
34
+ start_pos = Regexp.last_match.begin(0)
35
+ end_pos = Regexp.last_match.end(0)
36
+
37
+ detections << {
38
+ type: pattern_name.to_s,
39
+ original: match,
40
+ start_position: start_pos,
41
+ end_position: end_pos,
42
+ confidence: 1.0
43
+ }
44
+
45
+ generate_mask(match, pattern_name)
46
+ end
47
+ end
48
+
49
+ masked_text
50
+ end
51
+
52
+ def analyze(text)
53
+ return Report.new(text.to_s, [], text.to_s) if text.nil? || text.empty?
54
+
55
+ detections = []
56
+ masked_text = text.dup
57
+
58
+ @patterns.each do |pattern_name|
59
+ pattern = Patterns.get_pattern(pattern_name)
60
+ next unless pattern
61
+
62
+ flags = @case_sensitive ? 0 : Regexp::IGNORECASE
63
+ regex = Regexp.new(pattern.source, flags)
64
+
65
+ text.scan(regex) do
66
+ match = Regexp.last_match
67
+ start_pos = match.begin(0)
68
+ end_pos = match.end(0)
69
+ matched_text = match[0]
70
+
71
+ detections << {
72
+ type: pattern_name.to_s,
73
+ original: matched_text,
74
+ start_position: start_pos,
75
+ end_position: end_pos,
76
+ confidence: calculate_confidence(matched_text, pattern_name),
77
+ masked_value: generate_mask(matched_text, pattern_name)
78
+ }
79
+ end
80
+ end
81
+
82
+ # Apply masking to create masked text
83
+ detections.sort_by { |d| -d[:start_position] }.each do |detection|
84
+ start_pos = detection[:start_position]
85
+ end_pos = detection[:end_position]
86
+ masked_text[start_pos...end_pos] = detection[:masked_value]
87
+ end
88
+
89
+ Report.new(text, detections, masked_text)
90
+ end
91
+
92
+ private
93
+
94
+ def generate_mask(original_text, pattern_name)
95
+ if @preserve_format
96
+ generate_format_preserving_mask(original_text, pattern_name)
97
+ else
98
+ @mask_char * [@mask_length, original_text.length].min
99
+ end
100
+ end
101
+
102
+ def generate_format_preserving_mask(text, pattern_name)
103
+ case pattern_name.to_sym
104
+ when :iban
105
+ # Preserve IBAN format: GB29 NWBK 6016 1331 9268 19
106
+ text.gsub(/[A-Za-z0-9]/, @mask_char)
107
+ when :credit_card
108
+ # Preserve credit card format: **** **** **** 1234 (show last 4)
109
+ if text.length >= 4
110
+ masked_part = @mask_char * (text.length - 4)
111
+ last_four = text[-4..-1]
112
+ text.gsub(/\d/, @mask_char)[0...-4] + last_four
113
+ else
114
+ text.gsub(/\d/, @mask_char)
115
+ end
116
+ when :phone
117
+ # Preserve phone format structure
118
+ text.gsub(/\d/, @mask_char)
119
+ when :email
120
+ # Preserve email format: ***@***.com
121
+ parts = text.split("@")
122
+ if parts.length == 2
123
+ username_masked = @mask_char * [parts[0].length, 3].min
124
+ domain_parts = parts[1].split(".")
125
+ domain_masked = @mask_char * [domain_parts[0].length, 3].min
126
+ extension = domain_parts.length > 1 ? ".#{domain_parts[-1]}" : ""
127
+ "#{username_masked}@#{domain_masked}#{extension}"
128
+ else
129
+ @mask_char * text.length
130
+ end
131
+ else
132
+ @mask_char * text.length
133
+ end
134
+ end
135
+
136
+ def calculate_confidence(text, pattern_name)
137
+ # Basic confidence calculation based on pattern matching
138
+ # In future versions, this could be enhanced with ML models
139
+ case pattern_name.to_sym
140
+ when :turkish_id
141
+ # Turkish ID has a checksum algorithm we could validate
142
+ text.length == 11 && text.match?(/^\d{11}$/) ? 0.9 : 0.7
143
+ when :iban
144
+ # IBAN has checksum validation we could implement
145
+ text.length >= 15 && text.length <= 34 ? 0.9 : 0.7
146
+ when :email
147
+ # Email validation could be more sophisticated
148
+ text.include?("@") && text.include?(".") ? 0.95 : 0.6
149
+ when :phone
150
+ # Phone number validation based on length and format
151
+ digits_only = text.gsub(/\D/, "")
152
+ digits_only.length >= 10 && digits_only.length <= 15 ? 0.85 : 0.6
153
+ else
154
+ 0.8 # Default confidence
155
+ end
156
+ end
157
+ end
158
+ end
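The `:iban` branch of `calculate_confidence` notes that IBAN checksum validation could be implemented, but 0.1.0 only checks the length. The standard check is the mod-97 test: move the first four characters to the end, replace each letter with its two-digit value (A = 10 through Z = 35), and confirm the resulting integer leaves a remainder of 1 when divided by 97. The following is a minimal sketch of that test; `IbanChecksum` is a hypothetical module name, and the sketch does not verify country-specific lengths.

```ruby
# frozen_string_literal: true

# Hypothetical helper, not part of ai_redactor 0.1.0.
module IbanChecksum
  module_function

  # True when the mod-97 check passes. Country-specific lengths are not verified.
  def valid?(iban)
    compact = iban.to_s.delete(" ").upcase
    return false unless compact.match?(/\A[A-Z]{2}\d{2}[A-Z0-9]{11,30}\z/)

    # Move the country code and check digits to the end, then map A..Z to 10..35.
    rearranged = compact[4..] + compact[0, 4]
    numeric = rearranged.gsub(/[A-Z]/) { |letter| (letter.ord - 55).to_s }

    # Ruby integers are arbitrary precision, so no chunked arithmetic is needed.
    numeric.to_i % 97 == 1
  end
end

puts IbanChecksum.valid?("GB29NWBK60161331926819") # sample IBAN from the README
puts IbanChecksum.valid?("GB29NWBK60161331926818") # last digit altered
```

Used alongside the regex, a passing checksum could justify a higher confidence score than the fixed 0.9 currently returned for any plausibly sized match.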
data/lib/ai_redactor/version.rb ADDED
@@ -0,0 +1,5 @@
1
+ # frozen_string_literal: true
2
+
3
+ module AiRedactor
4
+ VERSION = "0.1.0"
5
+ end
data/lib/ai_redactor.rb ADDED
@@ -0,0 +1,20 @@
1
+ # frozen_string_literal: true
2
+
3
+ require_relative "ai_redactor/version"
4
+ require_relative "ai_redactor/text_redactor"
5
+ require_relative "ai_redactor/patterns"
6
+ require_relative "ai_redactor/report"
7
+
8
+ module AiRedactor
9
+ class Error < StandardError; end
10
+
11
+ # Main entry point for text masking
12
+ def self.mask_text(text, options = {})
13
+ TextRedactor.new(options).mask(text)
14
+ end
15
+
16
+ # Main entry point for text analysis with detailed report
17
+ def self.analyze_text(text, options = {})
18
+ TextRedactor.new(options).analyze(text)
19
+ end
20
+ end
metadata ADDED
@@ -0,0 +1,148 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: ai_redactor
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.1.0
5
+ platform: ruby
6
+ authors:
7
+ - Ahmet KAHRAMAN
8
+ autorequire:
9
+ bindir: exe
10
+ cert_chain: []
11
+ date: 2025-09-08 00:00:00.000000000 Z
12
+ dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ name: json
15
+ requirement: !ruby/object:Gem::Requirement
16
+ requirements:
17
+ - - "~>"
18
+ - !ruby/object:Gem::Version
19
+ version: '2.0'
20
+ type: :runtime
21
+ prerelease: false
22
+ version_requirements: !ruby/object:Gem::Requirement
23
+ requirements:
24
+ - - "~>"
25
+ - !ruby/object:Gem::Version
26
+ version: '2.0'
27
+ - !ruby/object:Gem::Dependency
28
+ name: rspec
29
+ requirement: !ruby/object:Gem::Requirement
30
+ requirements:
31
+ - - "~>"
32
+ - !ruby/object:Gem::Version
33
+ version: '3.0'
34
+ type: :development
35
+ prerelease: false
36
+ version_requirements: !ruby/object:Gem::Requirement
37
+ requirements:
38
+ - - "~>"
39
+ - !ruby/object:Gem::Version
40
+ version: '3.0'
41
+ - !ruby/object:Gem::Dependency
42
+ name: rubocop
43
+ requirement: !ruby/object:Gem::Requirement
44
+ requirements:
45
+ - - "~>"
46
+ - !ruby/object:Gem::Version
47
+ version: '1.0'
48
+ type: :development
49
+ prerelease: false
50
+ version_requirements: !ruby/object:Gem::Requirement
51
+ requirements:
52
+ - - "~>"
53
+ - !ruby/object:Gem::Version
54
+ version: '1.0'
55
+ - !ruby/object:Gem::Dependency
56
+ name: rubocop-rspec
57
+ requirement: !ruby/object:Gem::Requirement
58
+ requirements:
59
+ - - "~>"
60
+ - !ruby/object:Gem::Version
61
+ version: '2.0'
62
+ type: :development
63
+ prerelease: false
64
+ version_requirements: !ruby/object:Gem::Requirement
65
+ requirements:
66
+ - - "~>"
67
+ - !ruby/object:Gem::Version
68
+ version: '2.0'
69
+ - !ruby/object:Gem::Dependency
70
+ name: rake
71
+ requirement: !ruby/object:Gem::Requirement
72
+ requirements:
73
+ - - "~>"
74
+ - !ruby/object:Gem::Version
75
+ version: '13.0'
76
+ type: :development
77
+ prerelease: false
78
+ version_requirements: !ruby/object:Gem::Requirement
79
+ requirements:
80
+ - - "~>"
81
+ - !ruby/object:Gem::Version
82
+ version: '13.0'
83
+ - !ruby/object:Gem::Dependency
84
+ name: bundler
85
+ requirement: !ruby/object:Gem::Requirement
86
+ requirements:
87
+ - - ">="
88
+ - !ruby/object:Gem::Version
89
+ version: '1.17'
90
+ type: :development
91
+ prerelease: false
92
+ version_requirements: !ruby/object:Gem::Requirement
93
+ requirements:
94
+ - - ">="
95
+ - !ruby/object:Gem::Version
96
+ version: '1.17'
97
+ description: A lightweight Ruby library that automatically detects and redacts sensitive
98
+ information like national ID numbers, IBANs, phone numbers, emails, and faces in
99
+ documents and images. Supports GDPR, HIPAA, and KVKK compliance.
100
+ email:
101
+ - ahmetxhero@gmail.com
102
+ executables:
103
+ - ai_redactor
104
+ extensions: []
105
+ extra_rdoc_files: []
106
+ files:
107
+ - ".rspec"
108
+ - ".rubocop.yml"
109
+ - CHANGELOG.md
110
+ - Gemfile
111
+ - README.md
112
+ - Rakefile
113
+ - debug_test.rb
114
+ - examples/basic_usage.rb
115
+ - exe/ai_redactor
116
+ - lib/ai_redactor.rb
117
+ - lib/ai_redactor/patterns.rb
118
+ - lib/ai_redactor/report.rb
119
+ - lib/ai_redactor/text_redactor.rb
120
+ - lib/ai_redactor/version.rb
121
+ homepage: https://github.com/ahmetxhero/ai_redactor
122
+ licenses:
123
+ - MIT
124
+ metadata:
125
+ allowed_push_host: https://rubygems.org
126
+ homepage_uri: https://github.com/ahmetxhero/ai_redactor
127
+ source_code_uri: https://github.com/ahmetxhero/ai_redactor
128
+ changelog_uri: https://github.com/ahmetxhero/ai_redactor/blob/main/CHANGELOG.md
129
+ post_install_message:
130
+ rdoc_options: []
131
+ require_paths:
132
+ - lib
133
+ required_ruby_version: !ruby/object:Gem::Requirement
134
+ requirements:
135
+ - - ">="
136
+ - !ruby/object:Gem::Version
137
+ version: 2.6.0
138
+ required_rubygems_version: !ruby/object:Gem::Requirement
139
+ requirements:
140
+ - - ">="
141
+ - !ruby/object:Gem::Version
142
+ version: '0'
143
+ requirements: []
144
+ rubygems_version: 3.0.3.1
145
+ signing_key:
146
+ specification_version: 4
147
+ summary: AI-powered redaction tool for detecting and masking PII in text and images
148
+ test_files: []