mask-pii 0.2.0

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+ metadata.gz: 4b8364569252025ebce46522149831dccc170b742f479a4549ea8396fba350f3
4
+ data.tar.gz: 5fb7d89b4ae0638f8bdf348af4640713c5e812f51174769543cdbde7c37d4bd7
5
+ SHA512:
6
+ metadata.gz: 4f2ce3d554d77719f8961b186daa21f6bc57e286387094e0190ca783d5c10f98604aed271889d9180770a8e13dfc58fb97fb23d41bfa4426c046099018251c5a
7
+ data.tar.gz: 7ea311f2d8be27f324cbf671e9f63db98ed2c13cb03949089492ec00d0c5a656ccb6cc4526fdd9b081401da49547922479099c2d0583c50bd76f064d09e0fa2a
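`checksums.yaml` records SHA256 and SHA512 digests of the two archive members. Inside a `.gem` file it is stored as `checksums.yaml.gz`, next to `metadata.gz` and `data.tar.gz`. The sketch below shows one way to compare the SHA256 entries against the actual member bytes using Ruby's standard `digest` and `yaml` libraries. It makes these assumptions:

- The helper name `checksum_mismatches` is hypothetical.
- You have already extracted and decompressed the YAML and read the member bytes yourself.
- The sample data is synthetic, not the real mask-pii archive.

```ruby
require "digest"
require "yaml"

# Hypothetical helper: returns the names of archive members whose SHA256
# digest does not match the entry recorded in checksums.yaml.
# `entries` maps member names (e.g. "metadata.gz") to their raw bytes,
# which are assumed to have been extracted from the .gem file already.
def checksum_mismatches(entries, checksums_yaml)
  expected = YAML.safe_load(checksums_yaml).fetch("SHA256")
  expected.each_with_object([]) do |(name, digest), bad|
    data = entries[name]
    # A missing member is reported as a mismatch too.
    bad << name if data.nil? || Digest::SHA256.hexdigest(data) != digest
  end
end

# Usage with synthetic bytes (not the real mask-pii archive members):
entries = { "metadata.gz" => "meta bytes", "data.tar.gz" => "data bytes" }
digests = entries.transform_values { |bytes| Digest::SHA256.hexdigest(bytes) }
yaml = { "SHA256" => digests }.to_yaml

p checksum_mismatches(entries, yaml)                                    # => []
p checksum_mismatches(entries.merge("data.tar.gz" => "tampered"), yaml) # => ["data.tar.gz"]
```

The same loop could be pointed at the `SHA512` key with `Digest::SHA512` if you prefer the longer digest.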
data/README.md ADDED
@@ -0,0 +1,85 @@
1
+ # mask-pii (Ruby)
2
+
3
+ Version: 0.2.0
4
+
5
+ A lightweight, customizable Ruby library for masking Personally Identifiable Information (PII) such as **email addresses** and **phone numbers**.
6
+
7
+ It is designed to be safe, fast, and easy to integrate into logging or data processing pipelines.
8
+
9
+ Official website: [https://finitefield.org/en/oss/mask-pii](https://finitefield.org/en/oss/mask-pii)
10
+ Developed by: [Finite Field, K.K.](https://finitefield.org)
11
+
12
+ ## Features
13
+
14
+ - **Email Masking:** Masks the local part (all but its first character) while preserving the domain (e.g., `a****@example.com`).
15
+ - **Global Phone Masking:** Detects international phone formats and masks all digits except the last 4.
16
+ - **Customizable:** Change the masking character (default is `*`).
17
+ - **No Runtime Dependencies:** Pure Ruby implementation.
18
+
19
+ ## Installation
20
+
21
+ To use a local checkout of the repository, add this to your `Gemfile` and adjust `path` as needed:
22
+
23
+ ```ruby
24
+ gem "mask-pii", path: "../mask-pii/ruby"
25
+ ```
26
+
27
+ Or install the released gem from RubyGems:
28
+
29
+ ```bash
30
+ gem install mask-pii
31
+ ```
32
+
33
+ ## Publishing
34
+
35
+ From the `ruby` directory:
36
+
37
+ ```bash
38
+ gem build mask-pii.gemspec
39
+ gem push mask-pii-0.2.0.gem
40
+ ```
41
+
42
+ ## Usage
43
+
44
+ ```ruby
45
+ require "mask_pii"
46
+
47
+ masker = MaskPII::Masker.new
48
+ .mask_emails
49
+ .mask_phones
50
+ .with_mask_char("#")
51
+
52
+ input = "Contact: alice@example.com or 090-1234-5678."
53
+ output = masker.process(input)
54
+
55
+ puts output
56
+ # => "Contact: a####@example.com or ###-####-5678."
57
+ ```
58
+
59
+ ## Configuration
60
+
61
+ The `Masker` class uses a builder-style API: each builder method updates the masker in place and returns it, so calls can be chained. By default, `Masker.new` performs **no masking** (pass-through).
62
+
63
+ ### Builder Methods
64
+
65
+ | Method | Description | Default |
66
+ | --- | --- | --- |
67
+ | `mask_emails` | Enables detection and masking of email addresses. | Disabled |
68
+ | `mask_phones` | Enables detection and masking of global phone numbers. | Disabled |
69
+ | `with_mask_char(char)` | Sets the character used for masking (e.g., `"*"`, `"#"`, `"x"`). Only the first character of `char` is used; an empty string falls back to `"*"`. | `"*"` |
70
+
71
+ ### Masking Logic Details
72
+
73
+ **Emails**
74
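+ - **Pattern:** Detects addresses whose local part uses ASCII letters, digits, `.`, `_`, `%`, `+`, or `-`, and whose domain has at least two dot-separated labels ending in an alphabetic top-level domain of two or more letters (so `alice@localhost` is left as-is).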
75
+ - **Behavior:** Keeps the first character of the local part and the domain. Masks the rest of the local part.
76
+ - **Example:** `alice@example.com` -> `a****@example.com`
77
+ - **Short Emails:** If the local part is 1 character, it is fully masked (e.g., `a@b.com` -> `*@b.com`).
78
+
79
+ **Phones (Global Support)**
80
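+ - **Pattern:** Detects runs that start with a digit, `+`, or `(`, contain only digits, spaces, hyphens, parentheses, and `+`, and include at least 5 digits (supports international `+81...`, US `(555)...`, and hyphenated `090-...`). Other numbers of that shape, such as the date `2024-01-28`, are masked as well.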
81
+ - **Behavior:** Preserves formatting (hyphens, spaces, parentheses) and the **last 4 digits**. All other digits are replaced.
82
+ - **Examples:**
83
+ - `090-1234-5678` -> `***-****-5678`
84
+ - `+1 (800) 123-4567` -> `+* (***) ***-4567`
85
+ - `12345` -> `*2345`
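The phone rule in the README can be restated as a small standalone function:

- Keep every non-digit character.
- Keep the last 4 digits and mask the others.
- Leave runs with fewer than 5 digits untouched.

This is a hypothetical sketch, not the gem's API. `mask_phone_run` works on a single run that has already been isolated, and it does not reproduce the library's byte-level scan of surrounding text.

```ruby
# Hypothetical helper restating the README's phone rule for one run that
# has already been isolated from the surrounding text.
def mask_phone_run(run, mask_char = "*")
  total = run.count("0-9")  # number of ASCII digits in the run
  return run if total < 5   # same 5-digit threshold as mask_phones_in_text

  seen = 0
  run.each_char.map do |ch|
    # Non-digits (spaces, hyphens, parentheses, "+") are copied unchanged.
    next ch unless ch.between?("0", "9")

    seen += 1
    seen <= total - 4 ? mask_char : ch # keep only the last 4 digits
  end.join
end

puts mask_phone_run("090-1234-5678")     # => ***-****-5678
puts mask_phone_run("+1 (800) 123-4567") # => +* (***) ***-4567
puts mask_phone_run("1234")              # => 1234
```

Counting digits first and then masking by position is what lets the formatting characters survive while the digit budget ("all but the last 4") is applied only to digits.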
data/lib/mask_pii/version.rb ADDED
@@ -0,0 +1,6 @@
1
+ # frozen_string_literal: true
2
+
3
+ module MaskPII
4
+ # The current version of the mask-pii Ruby gem.
5
+ VERSION = "0.2.0"
6
+ end
data/lib/mask_pii.rb ADDED
@@ -0,0 +1,230 @@
1
+ # frozen_string_literal: true
2
+
3
+ require "mask_pii/version"
4
+
5
+ # Top-level namespace for the mask-pii Ruby library.
6
+ module MaskPII
7
+ # A configurable masker for common PII such as emails and phone numbers.
8
+ class Masker
9
+ # Create a new masker with all masks disabled by default.
10
+ def initialize
11
+ @mask_email = false
12
+ @mask_phone = false
13
+ @mask_char = "*"
14
+ end
15
+
16
+ # Enable email address masking.
17
+ # @return [Masker] the current masker instance
18
+ def mask_emails
19
+ @mask_email = true
20
+ self
21
+ end
22
+
23
+ # Enable phone number masking.
24
+ # @return [Masker] the current masker instance
25
+ def mask_phones
26
+ @mask_phone = true
27
+ self
28
+ end
29
+
30
+ # Set the character used for masking.
31
+ # @param char [String] a single-character string to use for masking
32
+ # @return [Masker] the current masker instance
33
+ def with_mask_char(char)
34
+ @mask_char = char.to_s[0] || "*"
35
+ self
36
+ end
37
+
38
+ # Process input text and mask enabled PII patterns.
39
+ # @param input [String] text to scan and mask
40
+ # @return [String] masked output text
41
+ def process(input)
42
+ result = input.to_s.dup
43
+
44
+ result = mask_emails_in_text(result) if @mask_email
45
+ result = mask_phones_in_text(result) if @mask_phone
46
+
47
+ result
48
+ end
49
+
50
+ private
51
+
52
+ def mask_emails_in_text(text)
53
+ bytes = text.bytes
54
+ len = bytes.length
55
+ output = String.new(encoding: text.encoding)
56
+ last = 0
57
+ i = 0
58
+
59
+ while i < len
60
+ if bytes[i] == 64
61
+ local_start = i
62
+ while local_start > last && local_char?(bytes[local_start - 1])
63
+ local_start -= 1
64
+ end
65
+ local_end = i
66
+
67
+ domain_start = i + 1
68
+ domain_end = domain_start
69
+ while domain_end < len && domain_char?(bytes[domain_end])
70
+ domain_end += 1
71
+ end
72
+
73
+ if local_start < local_end && domain_start < domain_end
74
+ candidate_end = domain_end
75
+ matched_domain_end = nil
76
+ while candidate_end > domain_start
77
+ domain = slice_bytes(text, domain_start, candidate_end - domain_start)
78
+ if valid_domain?(domain)
79
+ matched_domain_end = candidate_end
80
+ break
81
+ end
82
+ candidate_end -= 1
83
+ end
84
+
85
+ if matched_domain_end
86
+ local = slice_bytes(text, local_start, local_end - local_start)
87
+ domain = slice_bytes(text, domain_start, matched_domain_end - domain_start)
88
+ output << slice_bytes(text, last, local_start - last)
89
+ output << mask_local(local)
90
+ output << "@"
91
+ output << domain
92
+ last = matched_domain_end
93
+ i = matched_domain_end
94
+ next
95
+ end
96
+ end
97
+ end
98
+
99
+ i += 1
100
+ end
101
+
102
+ output << slice_bytes(text, last, len - last)
103
+ output
104
+ end
105
+
106
+ def mask_phones_in_text(text)
107
+ bytes = text.bytes
108
+ len = bytes.length
109
+ output = String.new(encoding: text.encoding)
110
+ last = 0
111
+ i = 0
112
+
113
+ while i < len
114
+ if phone_start?(bytes[i])
115
+ end_index = i
116
+ while end_index < len && phone_char?(bytes[end_index])
117
+ end_index += 1
118
+ end
119
+
120
+ digit_count = 0
121
+ last_digit = nil
122
+ idx = i
123
+ while idx < end_index
124
+ if digit?(bytes[idx])
125
+ digit_count += 1
126
+ last_digit = idx
127
+ end
128
+ idx += 1
129
+ end
130
+
131
+ if last_digit && digit_count >= 5
132
+ candidate_end = last_digit + 1
133
+ candidate = slice_bytes(text, i, candidate_end - i)
134
+ output << slice_bytes(text, last, i - last)
135
+ output << mask_phone_candidate(candidate)
136
+ last = candidate_end
137
+ i = candidate_end
138
+ next
139
+ end
140
+
141
+ i = end_index
142
+ next
143
+ end
144
+
145
+ i += 1
146
+ end
147
+
148
+ output << slice_bytes(text, last, len - last)
149
+ output
150
+ end
151
+
152
+ def mask_local(local)
153
+ if local.bytesize > 1
154
+ slice_bytes(local, 0, 1) + (@mask_char * (local.bytesize - 1))
155
+ else
156
+ @mask_char
157
+ end
158
+ end
159
+
160
+ def mask_phone_candidate(candidate)
161
+ bytes = candidate.bytes
162
+ digit_count = bytes.count { |byte| digit?(byte) }
163
+ current_index = 0
164
+ output = String.new(encoding: candidate.encoding)
165
+
166
+ bytes.each do |byte|
167
+ if digit?(byte)
168
+ current_index += 1
169
+ if digit_count > 4 && current_index <= digit_count - 4
170
+ output << @mask_char
171
+ else
172
+ output << byte
173
+ end
174
+ else
175
+ output << byte
176
+ end
177
+ end
178
+
179
+ output
180
+ end
181
+
182
+ def slice_bytes(text, start_index, length)
183
+ slice = text.byteslice(start_index, length)
184
+ slice.force_encoding(text.encoding)
185
+ end
186
+
187
+ def local_char?(byte)
188
+ alpha?(byte) || digit?(byte) || byte == 46 || byte == 95 || byte == 37 || byte == 43 || byte == 45
189
+ end
190
+
191
+ def domain_char?(byte)
192
+ alpha?(byte) || digit?(byte) || byte == 45 || byte == 46
193
+ end
194
+
195
+ def valid_domain?(domain)
196
+ return false if domain.start_with?(".") || domain.end_with?(".")
197
+
198
+ parts = domain.split(".")
199
+ return false if parts.length < 2
200
+
201
+ parts.each do |part|
202
+ return false if part.empty?
203
+ return false if part.start_with?("-") || part.end_with?("-")
204
+ return false unless part.bytes.all? { |byte| alpha?(byte) || digit?(byte) || byte == 45 }
205
+ end
206
+
207
+ tld = parts.last
208
+ return false if tld.length < 2
209
+ return false unless tld.bytes.all? { |byte| alpha?(byte) }
210
+
211
+ true
212
+ end
213
+
214
+ def phone_start?(byte)
215
+ digit?(byte) || byte == 43 || byte == 40
216
+ end
217
+
218
+ def phone_char?(byte)
219
+ digit?(byte) || byte == 32 || byte == 45 || byte == 40 || byte == 41 || byte == 43
220
+ end
221
+
222
+ def digit?(byte)
223
+ byte >= 48 && byte <= 57
224
+ end
225
+
226
+ def alpha?(byte)
227
+ (byte >= 65 && byte <= 90) || (byte >= 97 && byte <= 122)
228
+ end
229
+ end
230
+ end
data/test/test_mask_pii.rb ADDED
@@ -0,0 +1,135 @@
1
+ # frozen_string_literal: true
2
+
3
+ require "minitest/autorun"
4
+ require "mask_pii"
5
+
6
+ class TestMaskPII < Minitest::Test
7
+ def test_email_basic_cases
8
+ masker = MaskPII::Masker.new.mask_emails
9
+ assert_cases(
10
+ masker,
11
+ "alice@example.com" => "a****@example.com",
12
+ "a@b.com" => "*@b.com",
13
+ "ab@example.com" => "a*@example.com",
14
+ "a.b+c_d@example.co.jp" => "a******@example.co.jp"
15
+ )
16
+ end
17
+
18
+ def test_email_mixed_text
19
+ masker = MaskPII::Masker.new.mask_emails
20
+ assert_cases(
21
+ masker,
22
+ "Contact: alice@example.com." => "Contact: a****@example.com.",
23
+ "alice@example.com and bob@example.org" => "a****@example.com and b**@example.org"
24
+ )
25
+ end
26
+
27
+ def test_email_edge_cases
28
+ masker = MaskPII::Masker.new.mask_emails
29
+ assert_cases(
30
+ masker,
31
+ "alice@example" => "alice@example",
32
+ "alice@localhost" => "alice@localhost",
33
+ "alice@@example.com" => "alice@@example.com",
34
+ "first.last+tag@sub.domain.com" => "f*************@sub.domain.com"
35
+ )
36
+ end
37
+
38
+ def test_phone_basic_formats
39
+ masker = MaskPII::Masker.new.mask_phones
40
+ assert_cases(
41
+ masker,
42
+ "090-1234-5678" => "***-****-5678",
43
+ "Call (555) 123-4567" => "Call (***) ***-4567",
44
+ "Intl: +81 3 1234 5678" => "Intl: +** * **** 5678",
45
+ "+1 (800) 123-4567" => "+* (***) ***-4567"
46
+ )
47
+ end
48
+
49
+ def test_phone_short_and_boundary_lengths
50
+ masker = MaskPII::Masker.new.mask_phones
51
+ assert_cases(
52
+ masker,
53
+ "1234" => "1234",
54
+ "12345" => "*2345",
55
+ "12-3456" => "**-3456"
56
+ )
57
+ end
58
+
59
+ def test_phone_mixed_text
60
+ masker = MaskPII::Masker.new.mask_phones
61
+ assert_cases(
62
+ masker,
63
+ "Tel: 090-1234-5678 ext. 99" => "Tel: ***-****-5678 ext. 99",
64
+ "Numbers: 111-2222 and 333-4444" => "Numbers: ***-2222 and ***-4444"
65
+ )
66
+ end
67
+
68
+ def test_phone_edge_cases
69
+ masker = MaskPII::Masker.new.mask_phones
70
+ assert_cases(
71
+ masker,
72
+ "abcdef" => "abcdef",
73
+ "+" => "+",
74
+ "(12) 345 678" => "(**) **5 678"
75
+ )
76
+ end
77
+
78
+ def test_combined_masking
79
+ masker = MaskPII::Masker.new.mask_emails.mask_phones
80
+ assert_cases(
81
+ masker,
82
+ "Contact: alice@example.com or 090-1234-5678." => "Contact: a****@example.com or ***-****-5678.",
83
+ "Email bob@example.org, phone +1 (800) 123-4567" => "Email b**@example.org, phone +* (***) ***-4567"
84
+ )
85
+ end
86
+
87
+ def test_custom_mask_character
88
+ email_masker = MaskPII::Masker.new.mask_emails.with_mask_char("#")
89
+ phone_masker = MaskPII::Masker.new.mask_phones.with_mask_char("#")
90
+ combined = MaskPII::Masker.new.mask_emails.mask_phones.with_mask_char("#")
91
+
92
+ assert_cases(
93
+ email_masker,
94
+ "alice@example.com" => 'a####@example.com'
95
+ )
96
+
97
+ assert_cases(
98
+ phone_masker,
99
+ "090-1234-5678" => "###-####-5678"
100
+ )
101
+
102
+ assert_equal 'Contact: a####@example.com or ###-####-5678.',
103
+ combined.process("Contact: alice@example.com or 090-1234-5678.")
104
+ end
105
+
106
+ def test_masker_configuration
107
+ input = "alice@example.com 090-1234-5678"
108
+
109
+ assert_equal input, MaskPII::Masker.new.process(input)
110
+
111
+ email_only = MaskPII::Masker.new.mask_emails
112
+ assert_equal "a****@example.com 090-1234-5678", email_only.process(input)
113
+
114
+ phone_only = MaskPII::Masker.new.mask_phones
115
+ assert_equal "alice@example.com ***-****-5678", phone_only.process(input)
116
+
117
+ both = MaskPII::Masker.new.mask_emails.mask_phones
118
+ assert_equal "a****@example.com ***-****-5678", both.process(input)
119
+ end
120
+
121
+ def test_non_ascii_text_is_preserved
122
+ masker = MaskPII::Masker.new.mask_emails.mask_phones
123
+ input = "連絡先: alice@example.com と 090-1234-5678"
124
+ expected = "連絡先: a****@example.com と ***-****-5678"
125
+ assert_equal expected, masker.process(input)
126
+ end
127
+
128
+ private
129
+
130
+ def assert_cases(masker, cases)
131
+ cases.each do |input, expected|
132
+ assert_equal expected, masker.process(input)
133
+ end
134
+ end
135
+ end
metadata ADDED
@@ -0,0 +1,51 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: mask-pii
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.2.0
5
+ platform: ruby
6
+ authors:
7
+ - Finite Field, K.K.
8
+ autorequire:
9
+ bindir: bin
10
+ cert_chain: []
11
+ date: 2026-01-28 00:00:00.000000000 Z
12
+ dependencies: []
13
+ description: A lightweight, customizable Ruby library for masking PII such as email
14
+ addresses and phone numbers.
15
+ email:
16
+ - dev@finitefield.org
17
+ executables: []
18
+ extensions: []
19
+ extra_rdoc_files: []
20
+ files:
21
+ - README.md
22
+ - lib/mask_pii.rb
23
+ - lib/mask_pii/version.rb
24
+ - test/test_mask_pii.rb
25
+ homepage: https://finitefield.org/en/oss/mask-pii
26
+ licenses:
27
+ - MIT
28
+ metadata:
29
+ homepage_uri: https://finitefield.org/en/oss/mask-pii
30
+ source_code_uri: https://github.com/finitefield-org/mask-pii
31
+ rubygems_mfa_required: 'true'
32
+ post_install_message:
33
+ rdoc_options: []
34
+ require_paths:
35
+ - lib
36
+ required_ruby_version: !ruby/object:Gem::Requirement
37
+ requirements:
38
+ - - ">="
39
+ - !ruby/object:Gem::Version
40
+ version: 2.7.0
41
+ required_rubygems_version: !ruby/object:Gem::Requirement
42
+ requirements:
43
+ - - ">="
44
+ - !ruby/object:Gem::Version
45
+ version: '0'
46
+ requirements: []
47
+ rubygems_version: 3.5.9
48
+ signing_key:
49
+ specification_version: 4
50
+ summary: A lightweight library to mask PII (emails and phone numbers).
51
+ test_files: []