pii_cipher 0.1.0-arm64-darwin
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +7 -0
- data/CHANGELOG.md +15 -0
- data/CODE_OF_CONDUCT.md +10 -0
- data/LICENSE.txt +21 -0
- data/README.md +302 -0
- data/Rakefile +34 -0
- data/benchmarks/run.rb +203 -0
- data/lib/pii_cipher/3.2/pii_cipher.bundle +0 -0
- data/lib/pii_cipher/3.3/pii_cipher.bundle +0 -0
- data/lib/pii_cipher/active_record_ext.rb +92 -0
- data/lib/pii_cipher/query_interceptor.rb +73 -0
- data/lib/pii_cipher/railtie.rb +15 -0
- data/lib/pii_cipher/version.rb +5 -0
- data/lib/pii_cipher.rb +51 -0
- data/mise.toml +2 -0
- data/sig/pii_cipher.rbs +16 -0
- metadata +99 -0
checksums.yaml
ADDED
|
@@ -0,0 +1,7 @@
|
|
|
1
|
+
---
|
|
2
|
+
SHA256:
|
|
3
|
+
metadata.gz: 263c43b19c209069638c95cd92b5a303654d1cb1b6b76da2d8d53d4f987e4b38
|
|
4
|
+
data.tar.gz: c3844548e420df53189717786b7712aa57ed5d9eae836fb71b532ae54713b3d4
|
|
5
|
+
SHA512:
|
|
6
|
+
metadata.gz: cc86376c2c80ea8ab090b624ce5eaa5207d91d59fe6675d5993d302efc26422ee246fb569d40351e41616c786bf4f9fb0d3285a1966cc6713589ac68012e69da
|
|
7
|
+
data.tar.gz: 13538529242384a7ec2170e59475b64d30768709de540406fe6623f416dc1814f6666c9bab412f5138a79fd3f79458c46f64a49c30ca86b105f54cd31ec02007
|
data/CHANGELOG.md
ADDED
|
@@ -0,0 +1,15 @@
|
|
|
1
|
+
## [Unreleased]
|
|
2
|
+
|
|
3
|
+
- Case-insensitive search by default (values are downcased before hashing); opt out with `case_sensitive: true`.
|
|
4
|
+
- Configurable partial-search window via `gram_size:` (default 3).
|
|
5
|
+
- Query rewriting now works on chained relations and scopes, not just direct `Model.where` calls.
|
|
6
|
+
- `where` no longer mutates the conditions hash passed to it.
|
|
7
|
+
- Blanking an attribute now clears its blind index instead of leaving a stale one.
|
|
8
|
+
- Clearer error (`PiiCipher::MissingSecretKeyError`) when `PII_SECRET_KEY` is unset.
|
|
9
|
+
- Lowered requirements to Ruby >= 3.1 and ActiveRecord/Railties >= 7.1.
|
|
10
|
+
- Replaced placeholder test with a real RSpec suite (unit + PostgreSQL integration) and Rust unit tests; CI now runs across Ruby 3.1–4.0 with a Postgres service.
|
|
11
|
+
- Renamed the Rust entry point `generate_trigram_hashes` → `generate_ngram_hashes(text, secret, n)`.
|
|
12
|
+
|
|
13
|
+
## [0.1.0] - 2026-05-16
|
|
14
|
+
|
|
15
|
+
- Initial release
|
data/CODE_OF_CONDUCT.md
ADDED
|
@@ -0,0 +1,10 @@
|
|
|
1
|
+
# Code of Conduct
|
|
2
|
+
|
|
3
|
+
"pii_cipher" follows [The Ruby Community Conduct Guideline](https://www.ruby-lang.org/en/conduct) in all "collaborative space", which is defined as community communications channels (such as mailing lists, submitted patches, commit comments, etc.):
|
|
4
|
+
|
|
5
|
+
* Participants will be tolerant of opposing views.
|
|
6
|
+
* Participants must ensure that their language and actions are free of personal attacks and disparaging personal remarks.
|
|
7
|
+
* When interpreting the words and actions of others, participants should always assume good intentions.
|
|
8
|
+
* Behaviour which can be reasonably considered harassment will not be tolerated.
|
|
9
|
+
|
|
10
|
+
If you have any concerns about behaviour within this project, please contact us at ["selva.chezhian@momentivesoftware.com"](mailto:"selva.chezhian@momentivesoftware.com").
|
data/LICENSE.txt
ADDED
|
@@ -0,0 +1,21 @@
|
|
|
1
|
+
The MIT License (MIT)
|
|
2
|
+
|
|
3
|
+
Copyright (c) 2026 Selva Chezhian
|
|
4
|
+
|
|
5
|
+
Permission is hereby granted, free of charge, to any person obtaining a copy
|
|
6
|
+
of this software and associated documentation files (the "Software"), to deal
|
|
7
|
+
in the Software without restriction, including without limitation the rights
|
|
8
|
+
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
|
9
|
+
copies of the Software, and to permit persons to whom the Software is
|
|
10
|
+
furnished to do so, subject to the following conditions:
|
|
11
|
+
|
|
12
|
+
The above copyright notice and this permission notice shall be included in
|
|
13
|
+
all copies or substantial portions of the Software.
|
|
14
|
+
|
|
15
|
+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
|
16
|
+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
|
17
|
+
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
|
18
|
+
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
|
19
|
+
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
|
20
|
+
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
|
|
21
|
+
THE SOFTWARE.
|
data/README.md
ADDED
|
@@ -0,0 +1,302 @@
|
|
|
1
|
+
# PiiCipher
|
|
2
|
+
|
|
3
|
+
A Rails gem that enables **searchable blind indexing** for PII fields — powered by a Rust extension for performance.
|
|
4
|
+
|
|
5
|
+
PiiCipher handles the **search layer** of encrypted PII. It is designed to sit alongside Rails' built-in `ActiveRecord::Encryption` (`encrypts :email`), which handles the actual column encryption. Together they give you full GDPR-compliant storage: the real value never touches the database as plaintext, and searching still works.
|
|
6
|
+
|
|
7
|
+
PiiCipher computes HMAC-SHA256 hashes of the plaintext value before it is encrypted, and stores those hashes in a separate column. Queries are rewritten to search the hashes — the ciphertext column is never scanned.
|
|
8
|
+
|
|
9
|
+
Two search modes are supported:
|
|
10
|
+
|
|
11
|
+
| Mode | Column type | Use case |
|
|
12
|
+
|------|-------------|----------|
|
|
13
|
+
| **Partial** (default) | `jsonb` array | `LIKE`-style substring searches (e.g. searching `"smi"` matches `"Smith"`) |
|
|
14
|
+
| **Exact** | `string` | Exact-match lookups (e.g. looking up a full SSN or email) |
|
|
15
|
+
|
|
16
|
+
## How it works
|
|
17
|
+
|
|
18
|
+
### Partial search — trigram blind indexing
|
|
19
|
+
|
|
20
|
+
For partial search, PiiCipher slides a window across the plaintext and HMAC-SHA256s each n-gram using your secret key. The window size defaults to 3 (trigrams) and is configurable per attribute with `gram_size:`:
|
|
21
|
+
|
|
22
|
+
```
|
|
23
|
+
"smith" → ["smi", "mit", "ith"] → [hmac("smi"), hmac("mit"), hmac("ith")]
|
|
24
|
+
```
|
|
25
|
+
|
|
26
|
+
By default values are downcased before hashing, so search is **case-insensitive** (`"smi"` matches `"Smith"`). Set `case_sensitive: true` to opt out.
|
|
27
|
+
|
|
28
|
+
These hashes are stored in a `jsonb` array column. Querying with `where(email: "mit")` generates the same hashes for the search term and uses a PostgreSQL `@>` (contains) check — no plaintext ever touches the database.
|
|
29
|
+
|
|
30
|
+
Partial search is **approximate**: `@>` matches when the stored array contains *all* of the search term's n-gram hashes, which is occasionally satisfied by values that don't actually contain the term as a contiguous substring. Treat it like a fast candidate filter; if you need exact substring semantics, re-filter the returned (decrypted) records in Ruby.
|
|
31
|
+
|
|
32
|
+
### Exact search — single blind index
|
|
33
|
+
|
|
34
|
+
For exact match, a single HMAC-SHA256 of the full value is stored in a regular string column. Querying generates the same hash and does a standard equality check.
|
|
35
|
+
|
|
36
|
+
Both hash functions live in a Rust extension (`magnus` bindings + the `hmac` and `sha2` crates) and are called transparently from Ruby.
|
|
37
|
+
|
|
38
|
+
### Column encryption (the full picture)
|
|
39
|
+
|
|
40
|
+
PiiCipher only generates the blind indexes — it does not encrypt the column itself. Column encryption is handled by Rails AR Encryption (`encrypts`). The two work at different layers and do not interfere:
|
|
41
|
+
|
|
42
|
+
```
|
|
43
|
+
user.save
|
|
44
|
+
├─ before_save (pii_cipher) → reads plaintext → writes hashes to email_bidx_array
|
|
45
|
+
└─ DB write (Rails AR Enc.) → encrypts plaintext → writes ciphertext to email column
|
|
46
|
+
```
|
|
47
|
+
|
|
48
|
+
Because Rails AR Encryption works at the DB serialization layer (not a callback), `self.email` always returns plaintext during `before_save` — pii_cipher always hashes the real value, never the ciphertext.
|
|
49
|
+
|
|
50
|
+
## Requirements
|
|
51
|
+
|
|
52
|
+
- Ruby >= 3.1
|
|
53
|
+
- Rails / ActiveRecord >= 7.1 (Active Record Encryption ships in Rails 7.0+)
|
|
54
|
+
- PostgreSQL (partial search relies on the `jsonb` `@>` operator)
|
|
55
|
+
- Rust toolchain (only needed when building the gem from source)
|
|
56
|
+
|
|
57
|
+
## Installation
|
|
58
|
+
|
|
59
|
+
Add to your `Gemfile`:
|
|
60
|
+
|
|
61
|
+
```ruby
|
|
62
|
+
gem "pii_cipher"
|
|
63
|
+
```
|
|
64
|
+
|
|
65
|
+
Then run:
|
|
66
|
+
|
|
67
|
+
```bash
|
|
68
|
+
bundle install
|
|
69
|
+
```
|
|
70
|
+
|
|
71
|
+
## Setup
|
|
72
|
+
|
|
73
|
+
### 1. Generate Rails AR Encryption keys
|
|
74
|
+
|
|
75
|
+
Run this once to generate the three keys Rails AR Encryption needs:
|
|
76
|
+
|
|
77
|
+
```bash
|
|
78
|
+
bin/rails db:encryption:init
|
|
79
|
+
```
|
|
80
|
+
|
|
81
|
+
Copy the output into your credentials file:
|
|
82
|
+
|
|
83
|
+
```bash
|
|
84
|
+
bin/rails credentials:edit
|
|
85
|
+
```
|
|
86
|
+
|
|
87
|
+
```yaml
|
|
88
|
+
active_record_encryption:
|
|
89
|
+
primary_key: <generated>
|
|
90
|
+
deterministic_key: <generated>
|
|
91
|
+
key_derivation_salt: <generated>
|
|
92
|
+
```
|
|
93
|
+
|
|
94
|
+
These keys encrypt and decrypt the column values. Keep them in your secrets manager — losing them means losing access to your data.
|
|
95
|
+
|
|
96
|
+
### 2. Set the PiiCipher secret key
|
|
97
|
+
|
|
98
|
+
PiiCipher reads the HMAC key from the `PII_SECRET_KEY` environment variable. Add it to your environment (e.g. via credentials, dotenv, or your secrets manager):
|
|
99
|
+
|
|
100
|
+
```bash
|
|
101
|
+
PII_SECRET_KEY=your-long-random-secret-here
|
|
102
|
+
```
|
|
103
|
+
|
|
104
|
+
Generate a secure random value with:
|
|
105
|
+
|
|
106
|
+
```bash
|
|
107
|
+
rails secret
|
|
108
|
+
```
|
|
109
|
+
|
|
110
|
+
Changing this key will invalidate all existing blind indexes.
|
|
111
|
+
|
|
112
|
+
### 3. Add blind index columns
|
|
113
|
+
|
|
114
|
+
For each encrypted attribute, add the corresponding blind index column in a migration.
|
|
115
|
+
|
|
116
|
+
**Partial search** (default — stores trigram hashes in a `jsonb` array):
|
|
117
|
+
|
|
118
|
+
```ruby
|
|
119
|
+
class AddEmailBidxToUsers < ActiveRecord::Migration[8.1]
|
|
120
|
+
def change
|
|
121
|
+
add_column :users, :email_bidx_array, :jsonb
|
|
122
|
+
add_index :users, :email_bidx_array, using: :gin
|
|
123
|
+
end
|
|
124
|
+
end
|
|
125
|
+
```
|
|
126
|
+
|
|
127
|
+
**Exact search** (stores a single hash string):
|
|
128
|
+
|
|
129
|
+
```ruby
|
|
130
|
+
class AddSsnBidxToUsers < ActiveRecord::Migration[8.1]
|
|
131
|
+
def change
|
|
132
|
+
add_column :users, :ssn_bidx, :string
|
|
133
|
+
add_index :users, :ssn_bidx
|
|
134
|
+
end
|
|
135
|
+
end
|
|
136
|
+
```
|
|
137
|
+
|
|
138
|
+
The GIN index on `jsonb` columns is strongly recommended for performance on partial searches.
|
|
139
|
+
|
|
140
|
+
### 4. Declare encrypted attributes in your model
|
|
141
|
+
|
|
142
|
+
Declare `encrypts` (Rails AR Encryption) first, then `use_pii_cipher`. Both must be present for full GDPR-compliant searchable encryption.
|
|
143
|
+
|
|
144
|
+
```ruby
|
|
145
|
+
class User < ApplicationRecord
|
|
146
|
+
encrypts :email # Rails: stores ciphertext in DB, decrypts on read
|
|
147
|
+
use_pii_cipher :email # pii_cipher: generates trigram blind indexes from plaintext
|
|
148
|
+
|
|
149
|
+
encrypts :ssn
|
|
150
|
+
use_pii_cipher :ssn, partial: false # exact-match blind index
|
|
151
|
+
end
|
|
152
|
+
```
|
|
153
|
+
|
|
154
|
+
Multiple attributes can be passed to `use_pii_cipher` in a single call:
|
|
155
|
+
|
|
156
|
+
```ruby
|
|
157
|
+
encrypts :email, :phone_number
|
|
158
|
+
use_pii_cipher :email, :phone_number
|
|
159
|
+
```
|
|
160
|
+
|
|
161
|
+
## Usage
|
|
162
|
+
|
|
163
|
+
### Saving records
|
|
164
|
+
|
|
165
|
+
No changes to your existing create/update code. Everything happens automatically:
|
|
166
|
+
|
|
167
|
+
```ruby
|
|
168
|
+
User.create!(email: "alice@example.com", ssn: "123-45-6789")
|
|
169
|
+
```
|
|
170
|
+
|
|
171
|
+
What happens under the hood:
|
|
172
|
+
|
|
173
|
+
1. `before_save` (pii_cipher) reads `"alice@example.com"` as plaintext, generates trigram hashes, writes them to `email_bidx_array`
|
|
174
|
+
2. Rails AR Encryption encrypts `"alice@example.com"` and writes ciphertext to the `email` column
|
|
175
|
+
|
|
176
|
+
### What's in the database vs what Ruby sees
|
|
177
|
+
|
|
178
|
+
```ruby
|
|
179
|
+
user = User.find(1)
|
|
180
|
+
|
|
181
|
+
# Ruby — always decrypted transparently by Rails
|
|
182
|
+
user.email
|
|
183
|
+
# => "alice@example.com"
|
|
184
|
+
|
|
185
|
+
# Raw database row — email column holds ciphertext, blind index holds hashes
|
|
186
|
+
# email => {"p":"Wd5LybiwJGPHYI...","h":{"iv":"XJul...","at":"Pk..."}}
|
|
187
|
+
# email_bidx_array => ["a3f2c1...", "9b4e7d...", ...]
|
|
188
|
+
```
|
|
189
|
+
|
|
190
|
+
Nobody with direct database access can read the email. The blind index is just opaque hashes — it reveals nothing about the original value without the `PII_SECRET_KEY`.
|
|
191
|
+
|
|
192
|
+
### Querying
|
|
193
|
+
|
|
194
|
+
Pass the plaintext value to `where` exactly as you normally would — PiiCipher intercepts encrypted columns and rewrites the query to search the blind index:
|
|
195
|
+
|
|
196
|
+
```ruby
|
|
197
|
+
# Partial search — finds any user whose email contains "alice"
|
|
198
|
+
User.where(email: "alice")
|
|
199
|
+
|
|
200
|
+
# Exact search — finds the user with that exact SSN
|
|
201
|
+
User.where(ssn: "123-45-6789")
|
|
202
|
+
|
|
203
|
+
# Mix encrypted and plain columns freely
|
|
204
|
+
User.where(email: "alice", status: "active")
|
|
205
|
+
```
|
|
206
|
+
|
|
207
|
+
The found records have their emails decrypted by Rails on the way out — callers always receive plaintext. The interceptor only rewrites keys declared with `use_pii_cipher`; all other `where` calls pass through to ActiveRecord unchanged.
|
|
208
|
+
|
|
209
|
+
## Performance
|
|
210
|
+
|
|
211
|
+
Benchmarked on a local machine against PostgreSQL 18 with 100,000 rows. The comparison baseline is a plain (unencrypted) column with a standard index — the closest real-world alternative for each search type.
|
|
212
|
+
|
|
213
|
+
### Writes
|
|
214
|
+
|
|
215
|
+
| | Time (100k rows) |
|
|
216
|
+
|---|---|
|
|
217
|
+
| Plain insert | 1,221 ms |
|
|
218
|
+
| Encrypted insert | 2,861 ms (+134%) |
|
|
219
|
+
|
|
220
|
+
The overhead is not from the Rust hashing — that runs in microseconds. It comes from **writing significantly more data per row**: each record gains a `jsonb` array of 64-character HMAC hex strings (one per trigram) and a 64-character blind index string. Both the larger rows and the GIN index maintenance during insert contribute to the slower writes.
|
|
221
|
+
|
|
222
|
+
### Reads
|
|
223
|
+
|
|
224
|
+
| Query type | Plain | Encrypted | Difference |
|
|
225
|
+
|---|---|---|---|
|
|
226
|
+
| Exact match (B-tree) | 0.121 ms | 0.095 ms | ~within noise |
|
|
227
|
+
| Partial match (GIN) | 1.515 ms | 1.865 ms | +23% |
|
|
228
|
+
|
|
229
|
+
**Exact match** is effectively identical. Both paths hit a B-tree index; the lookup cost is the same regardless of what the key looks like.
|
|
230
|
+
|
|
231
|
+
**Partial match** is ~23% slower. The GIN index sizes end up comparable (see below), but PostgreSQL has to parse the `jsonb` array and evaluate the `@>` containment operator on each probe, which adds a small constant overhead that `pg_trgm`'s native GIN operator doesn't pay.
|
|
232
|
+
|
|
233
|
+
### Storage
|
|
234
|
+
|
|
235
|
+
| | Table total | Email index | Name GIN index |
|
|
236
|
+
|---|---|---|---|
|
|
237
|
+
| Plain | 21 MB | 5 MB | 7.2 MB |
|
|
238
|
+
| Encrypted | 89 MB | 12 MB | 7.0 MB |
|
|
239
|
+
|
|
240
|
+
The table is **4.2× larger**. Every stored trigram hash is 64 characters regardless of what the original value looked like — a 5-character name still produces 3 trigrams × 64 chars = 192 bytes of blind index data. At large scale, this is the dominant cost to plan for.
|
|
241
|
+
|
|
242
|
+
The email B-tree index is 2.4× larger for the same reason (64-char hash vs ~25-char email). The name GIN index sizes are nearly identical — HMAC hashes repeat across rows the same way plain trigrams do (same input + same key = same hash), so the GIN posting lists compress similarly.
|
|
243
|
+
|
|
244
|
+
### What this means in practice
|
|
245
|
+
|
|
246
|
+
- **Reads are fast.** Sub-millisecond exact lookups and ~2ms partial searches hold up well even at this row count.
|
|
247
|
+
- **Writes cost more.** If your workload is write-heavy on PII fields, budget for the extra insert time.
|
|
248
|
+
- **Storage is the main tradeoff.** Plan for roughly 4× the table and index footprint compared to an equivalent unencrypted schema.
|
|
249
|
+
|
|
250
|
+
You can reproduce these results yourself:
|
|
251
|
+
|
|
252
|
+
```bash
|
|
253
|
+
ruby -I lib benchmarks/run.rb
|
|
254
|
+
```
|
|
255
|
+
|
|
256
|
+
## Configuration reference
|
|
257
|
+
|
|
258
|
+
`use_pii_cipher(*attributes, partial: true, gram_size: 3, case_sensitive: false)`
|
|
259
|
+
|
|
260
|
+
| Option | Type | Default | Description |
|
|
261
|
+
|--------|------|---------|-------------|
|
|
262
|
+
| `partial` | Boolean | `true` | `true` → n-gram array in `column_bidx_array`; `false` → single hash in `column_bidx` |
|
|
263
|
+
| `gram_size` | Integer | `3` | Sliding-window size for partial search. Ignored when `partial: false`. Changing it invalidates existing indexes. |
|
|
264
|
+
| `case_sensitive` | Boolean | `false` | `false` downcases values before hashing (case-insensitive search). Must match between stored index and queries; changing it invalidates existing indexes. |
|
|
265
|
+
|
|
266
|
+
## Limitations & gotchas
|
|
267
|
+
|
|
268
|
+
- **Query rewriting covers hash-form `where`.** `Model.where(email: "x")`, scopes, and chained relations (`Model.active.where(email: "x")`) are all rewritten. Conditions that don't go through `where(hash)` are **not** rewritten — including `where.not(...)`, raw string/array conditions (`where("email = ?", x)`), `.or(...)` branches, and `find_by` with string SQL. For those, build the blind index yourself with `PiiCipher.generate_ngram_hashes` / `generate_blind_index`.
|
|
269
|
+
- **Partial search is approximate** and may over-match (see "How it works"). Re-filter in Ruby if you need exact substring semantics.
|
|
270
|
+
- **Search terms shorter than `gram_size`** are hashed whole and only match values that were themselves shorter than `gram_size`. Prefer search terms at least `gram_size` characters long.
|
|
271
|
+
- **PostgreSQL only** for partial search — it uses the `jsonb` `@>` containment operator.
|
|
272
|
+
- **Key/option changes invalidate indexes.** Changing `PII_SECRET_KEY`, `gram_size`, or `case_sensitive` means existing blind indexes no longer match; you must re-save affected records to regenerate them.
|
|
273
|
+
|
|
274
|
+
## Development
|
|
275
|
+
|
|
276
|
+
After checking out the repo, run `bin/setup` to install dependencies (this also compiles the Rust extension). Then run the test suite:
|
|
277
|
+
|
|
278
|
+
```bash
|
|
279
|
+
bundle exec rake spec
|
|
280
|
+
```
|
|
281
|
+
|
|
282
|
+
The Ruby specs include a PostgreSQL-backed integration suite (it builds a temporary table and exercises real `@>` queries). Set the standard `PG*` env vars to point at a database, or skip those examples with `bundle exec rspec --tag ~integration`. The Rust extension also has its own unit tests, runnable from `ext/pii_cipher` with `cargo test`.
|
|
283
|
+
|
|
284
|
+
To open an interactive console with the gem loaded:
|
|
285
|
+
|
|
286
|
+
```bash
|
|
287
|
+
bin/console
|
|
288
|
+
```
|
|
289
|
+
|
|
290
|
+
To build and install the gem locally:
|
|
291
|
+
|
|
292
|
+
```bash
|
|
293
|
+
bundle exec rake install
|
|
294
|
+
```
|
|
295
|
+
|
|
296
|
+
## Contributing
|
|
297
|
+
|
|
298
|
+
Bug reports and pull requests are welcome on GitHub at https://github.com/selvachezhian/pii_cipher. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the [code of conduct](CODE_OF_CONDUCT.md).
|
|
299
|
+
|
|
300
|
+
## License
|
|
301
|
+
|
|
302
|
+
The gem is available as open source under the terms of the [MIT License](LICENSE.txt).
|
data/Rakefile
ADDED
|
@@ -0,0 +1,34 @@
|
|
|
1
|
+
# frozen_string_literal: true
|
|
2
|
+
|
|
3
|
+
require "bundler/gem_tasks"
|
|
4
|
+
require "rspec/core/rake_task"
|
|
5
|
+
|
|
6
|
+
RSpec::Core::RakeTask.new(:spec)
|
|
7
|
+
|
|
8
|
+
require "rubocop/rake_task"
|
|
9
|
+
|
|
10
|
+
RuboCop::RakeTask.new
|
|
11
|
+
|
|
12
|
+
require "rb_sys/extensiontask"
|
|
13
|
+
|
|
14
|
+
task build: :compile
|
|
15
|
+
|
|
16
|
+
GEMSPEC = Gem::Specification.load("pii_cipher.gemspec")
|
|
17
|
+
|
|
18
|
+
RbSys::ExtensionTask.new("pii_cipher", GEMSPEC) do |ext|
|
|
19
|
+
ext.lib_dir = "lib/pii_cipher"
|
|
20
|
+
ext.cross_compile = true
|
|
21
|
+
ext.cross_platform = %w[
|
|
22
|
+
x86_64-linux
|
|
23
|
+
aarch64-linux
|
|
24
|
+
x86_64-darwin
|
|
25
|
+
arm64-darwin
|
|
26
|
+
x64-mingw-ucrt
|
|
27
|
+
]
|
|
28
|
+
ext.cross_compiling do |spec|
|
|
29
|
+
# rb_sys is only needed to compile from source; pre-built gems don't need it
|
|
30
|
+
spec.dependencies.reject! { |dep| dep.name == "rb_sys" }
|
|
31
|
+
end
|
|
32
|
+
end
|
|
33
|
+
|
|
34
|
+
task default: %i[compile spec rubocop]
|
data/benchmarks/run.rb
ADDED
|
@@ -0,0 +1,203 @@
|
|
|
1
|
+
require 'active_record'
|
|
2
|
+
require 'benchmark'
|
|
3
|
+
require 'pg'
|
|
4
|
+
|
|
5
|
+
$LOAD_PATH.unshift File.join(__dir__, '..', 'lib')
|
|
6
|
+
require 'pii_cipher'
|
|
7
|
+
|
|
8
|
+
DB = 'pii_cipher_benchmark'
|
|
9
|
+
SECRET = 'benchmark-secret-key-do-not-use-in-prod'
|
|
10
|
+
ROWS = 100_000
|
|
11
|
+
QUERIES = 1_000
|
|
12
|
+
|
|
13
|
+
NAMES = %w[
|
|
14
|
+
Smith Johnson Williams Brown Jones Garcia Miller Davis Wilson Moore
|
|
15
|
+
Taylor Anderson Thomas Jackson White Harris Martin Thompson Chezhian
|
|
16
|
+
Robinson Clark Rodriguez Lewis Lee Walker Hall Allen Young Hernandez
|
|
17
|
+
].freeze
|
|
18
|
+
|
|
19
|
+
DOMAINS = %w[gmail.com yahoo.com outlook.com protonmail.com icloud.com].freeze
|
|
20
|
+
|
|
21
|
+
def rand_name = "#{NAMES.sample}#{rand(9999)}"
|
|
22
|
+
def rand_email = "#{NAMES.sample.downcase}#{rand(9999)}@#{DOMAINS.sample}"
|
|
23
|
+
|
|
24
|
+
# ── DB setup ─────────────────────────────────────────────────────────────────
|
|
25
|
+
|
|
26
|
+
def connect(db = 'postgres')
|
|
27
|
+
ActiveRecord::Base.establish_connection(adapter: 'postgresql', database: db, host: 'localhost')
|
|
28
|
+
ActiveRecord::Base.connection
|
|
29
|
+
end
|
|
30
|
+
|
|
31
|
+
puts "==> Setting up database '#{DB}'..."
|
|
32
|
+
conn = connect('postgres')
|
|
33
|
+
conn.execute("DROP DATABASE IF EXISTS #{DB}")
|
|
34
|
+
conn.execute("CREATE DATABASE #{DB}")
|
|
35
|
+
|
|
36
|
+
conn = connect(DB)
|
|
37
|
+
conn.execute('CREATE EXTENSION IF NOT EXISTS pg_trgm')
|
|
38
|
+
|
|
39
|
+
conn.create_table :plain_users, force: true do |t|
|
|
40
|
+
t.string :name, null: false
|
|
41
|
+
t.string :email, null: false
|
|
42
|
+
end
|
|
43
|
+
|
|
44
|
+
conn.create_table :encrypted_users, force: true do |t|
|
|
45
|
+
t.string :name, null: false
|
|
46
|
+
t.string :email, null: false
|
|
47
|
+
t.jsonb :name_bidx_array
|
|
48
|
+
t.string :email_bidx
|
|
49
|
+
end
|
|
50
|
+
|
|
51
|
+
# Indexes on plain table
|
|
52
|
+
conn.execute('CREATE INDEX idx_plain_email ON plain_users (email)')
|
|
53
|
+
conn.execute('CREATE INDEX idx_plain_name_trgm ON plain_users USING GIN (name gin_trgm_ops)')
|
|
54
|
+
|
|
55
|
+
# Indexes on encrypted table
|
|
56
|
+
conn.execute('CREATE INDEX idx_enc_email_bidx ON encrypted_users (email_bidx)')
|
|
57
|
+
conn.execute('CREATE INDEX idx_enc_name_bidx_array ON encrypted_users USING GIN (name_bidx_array)')
|
|
58
|
+
|
|
59
|
+
puts "==> Tables and indexes created.\n\n"
|
|
60
|
+
|
|
61
|
+
# ── Seed data (pre-generate so Rust/hash time isn't mixed with AR overhead) ──
|
|
62
|
+
|
|
63
|
+
puts "==> Pre-generating #{ROWS} rows of test data..."
|
|
64
|
+
plain_rows = ROWS.times.map { { name: rand_name, email: rand_email } }
|
|
65
|
+
encrypted_rows = plain_rows.map do |r|
|
|
66
|
+
{
|
|
67
|
+
name: r[:name],
|
|
68
|
+
email: r[:email],
|
|
69
|
+
name_bidx_array: PiiCipher.generate_ngram_hashes(r[:name].downcase, SECRET, 3).to_json,
|
|
70
|
+
email_bidx: PiiCipher.generate_blind_index(r[:email].downcase, SECRET)
|
|
71
|
+
}
|
|
72
|
+
end
|
|
73
|
+
puts "==> Done.\n\n"
|
|
74
|
+
|
|
75
|
+
# ── Benchmark writes ──────────────────────────────────────────────────────────
|
|
76
|
+
|
|
77
|
+
puts "=" * 60
|
|
78
|
+
puts "WRITE BENCHMARK (#{ROWS} rows, batch insert)"
|
|
79
|
+
puts "=" * 60
|
|
80
|
+
|
|
81
|
+
write_results = Benchmark.bm(30) do |x|
|
|
82
|
+
x.report("Plain insert:") do
|
|
83
|
+
plain_rows.each_slice(1000) do |batch|
|
|
84
|
+
conn.execute(
|
|
85
|
+
"INSERT INTO plain_users (name, email) VALUES " +
|
|
86
|
+
batch.map { |r| "('#{conn.quote_string(r[:name])}','#{conn.quote_string(r[:email])}')" }.join(',')
|
|
87
|
+
)
|
|
88
|
+
end
|
|
89
|
+
end
|
|
90
|
+
|
|
91
|
+
x.report("Encrypted insert:") do
|
|
92
|
+
encrypted_rows.each_slice(1000) do |batch|
|
|
93
|
+
conn.execute(
|
|
94
|
+
"INSERT INTO encrypted_users (name, email, name_bidx_array, email_bidx) VALUES " +
|
|
95
|
+
batch.map { |r|
|
|
96
|
+
name_array = r[:name_bidx_array].gsub("'", "''")
|
|
97
|
+
"('#{conn.quote_string(r[:name])}','#{conn.quote_string(r[:email])}'," \
|
|
98
|
+
"'#{name_array}'::jsonb,'#{conn.quote_string(r[:email_bidx])}')"
|
|
99
|
+
}.join(',')
|
|
100
|
+
)
|
|
101
|
+
end
|
|
102
|
+
end
|
|
103
|
+
end
|
|
104
|
+
|
|
105
|
+
plain_write_ms = (write_results[0].real * 1000).round(1)
|
|
106
|
+
enc_write_ms = (write_results[1].real * 1000).round(1)
|
|
107
|
+
write_overhead = ((enc_write_ms - plain_write_ms) / plain_write_ms * 100).round(1)
|
|
108
|
+
|
|
109
|
+
# ── Benchmark reads ───────────────────────────────────────────────────────────
|
|
110
|
+
|
|
111
|
+
# Pick real values that exist in the DB for fair comparison
|
|
112
|
+
sample_plain_row = conn.execute('SELECT name, email FROM plain_users ORDER BY RANDOM() LIMIT 1').first
|
|
113
|
+
exact_email = sample_plain_row['email']
|
|
114
|
+
partial_name_term = sample_plain_row['name'][0, 4] # 4-char prefix
|
|
115
|
+
|
|
116
|
+
exact_email_bidx = PiiCipher.generate_blind_index(exact_email.downcase, SECRET)
|
|
117
|
+
partial_hashes_json = PiiCipher.generate_ngram_hashes(partial_name_term.downcase, SECRET, 3).to_json
|
|
118
|
+
|
|
119
|
+
puts "\n"
|
|
120
|
+
puts "=" * 60
|
|
121
|
+
puts "READ BENCHMARK (#{QUERIES} queries each)"
|
|
122
|
+
puts " Exact search term : #{exact_email}"
|
|
123
|
+
puts " Partial search term: #{partial_name_term}"
|
|
124
|
+
puts "=" * 60
|
|
125
|
+
|
|
126
|
+
read_results = Benchmark.bm(30) do |x|
|
|
127
|
+
x.report("Plain exact (B-tree):") do
|
|
128
|
+
QUERIES.times { conn.execute("SELECT id FROM plain_users WHERE email = '#{conn.quote_string(exact_email)}'") }
|
|
129
|
+
end
|
|
130
|
+
|
|
131
|
+
x.report("Encrypted exact (blind idx):") do
|
|
132
|
+
QUERIES.times { conn.execute("SELECT id FROM encrypted_users WHERE email_bidx = '#{conn.quote_string(exact_email_bidx)}'") }
|
|
133
|
+
end
|
|
134
|
+
|
|
135
|
+
x.report("Plain partial (pg_trgm GIN):") do
|
|
136
|
+
QUERIES.times { conn.execute("SELECT id FROM plain_users WHERE name LIKE '%#{conn.quote_string(partial_name_term)}%'") }
|
|
137
|
+
end
|
|
138
|
+
|
|
139
|
+
x.report("Encrypted partial (bidx GIN):") do
|
|
140
|
+
QUERIES.times { conn.execute("SELECT id FROM encrypted_users WHERE name_bidx_array @> '#{partial_hashes_json}'::jsonb") }
|
|
141
|
+
end
|
|
142
|
+
end
|
|
143
|
+
|
|
144
|
+
plain_exact_ms = (read_results[0].real * 1000 / QUERIES).round(3)
|
|
145
|
+
enc_exact_ms = (read_results[1].real * 1000 / QUERIES).round(3)
|
|
146
|
+
plain_partial_ms = (read_results[2].real * 1000 / QUERIES).round(3)
|
|
147
|
+
enc_partial_ms = (read_results[3].real * 1000 / QUERIES).round(3)
|
|
148
|
+
|
|
149
|
+
exact_overhead = ((enc_exact_ms - plain_exact_ms) / plain_exact_ms * 100).round(1)
|
|
150
|
+
partial_overhead = ((enc_partial_ms - plain_partial_ms) / plain_partial_ms * 100).round(1)
|
|
151
|
+
|
|
152
|
+
# ── Storage sizes ─────────────────────────────────────────────────────────────
|
|
153
|
+
|
|
154
|
+
def table_size(conn, table)
|
|
155
|
+
conn.execute("SELECT pg_size_pretty(pg_total_relation_size('#{table}'))").first['pg_size_pretty']
|
|
156
|
+
end
|
|
157
|
+
|
|
158
|
+
def index_size(conn, index)
|
|
159
|
+
result = conn.execute("SELECT pg_size_pretty(pg_relation_size('#{index}'))").first
|
|
160
|
+
result ? result['pg_size_pretty'] : 'n/a'
|
|
161
|
+
end
|
|
162
|
+
|
|
163
|
+
plain_size = table_size(conn, 'plain_users')
|
|
164
|
+
encrypted_size = table_size(conn, 'encrypted_users')
|
|
165
|
+
|
|
166
|
+
plain_email_idx_size = index_size(conn, 'idx_plain_email')
|
|
167
|
+
plain_name_trgm_size = index_size(conn, 'idx_plain_name_trgm')
|
|
168
|
+
enc_email_bidx_size = index_size(conn, 'idx_enc_email_bidx')
|
|
169
|
+
enc_name_bidx_arr_size = index_size(conn, 'idx_enc_name_bidx_array')
|
|
170
|
+
|
|
171
|
+
# ── Print summary ─────────────────────────────────────────────────────────────
|
|
172
|
+
|
|
173
|
+
puts "\n"
|
|
174
|
+
puts "=" * 60
|
|
175
|
+
puts "SUMMARY (#{ROWS} rows)"
|
|
176
|
+
puts "=" * 60
|
|
177
|
+
|
|
178
|
+
puts "\n--- Writes ---"
|
|
179
|
+
puts " Plain insert: #{plain_write_ms} ms total"
|
|
180
|
+
puts " Encrypted insert: #{enc_write_ms} ms total (+#{write_overhead}% overhead)"
|
|
181
|
+
|
|
182
|
+
puts "\n--- Reads (avg per query) ---"
|
|
183
|
+
puts " Exact match"
|
|
184
|
+
puts " Plain (B-tree): #{plain_exact_ms} ms"
|
|
185
|
+
puts " Encrypted (blind idx): #{enc_exact_ms} ms (+#{exact_overhead}% overhead)"
|
|
186
|
+
puts " Partial match"
|
|
187
|
+
puts " Plain (pg_trgm GIN): #{plain_partial_ms} ms"
|
|
188
|
+
puts " Encrypted (bidx GIN): #{enc_partial_ms} ms (+#{partial_overhead}% overhead)"
|
|
189
|
+
|
|
190
|
+
puts "\n--- Storage ---"
|
|
191
|
+
puts " Plain table total: #{plain_size}"
|
|
192
|
+
puts " Encrypted table total: #{encrypted_size}"
|
|
193
|
+
puts ""
|
|
194
|
+
puts " Plain email index (B-tree): #{plain_email_idx_size}"
|
|
195
|
+
puts " Plain name index (pg_trgm GIN): #{plain_name_trgm_size}"
|
|
196
|
+
puts " Encrypted email index (B-tree): #{enc_email_bidx_size}"
|
|
197
|
+
puts " Encrypted name index (GIN): #{enc_name_bidx_arr_size}"
|
|
198
|
+
|
|
199
|
+
# ── Cleanup ───────────────────────────────────────────────────────────────────
|
|
200
|
+
|
|
201
|
+
ActiveRecord::Base.remove_connection
|
|
202
|
+
connect('postgres').execute("DROP DATABASE #{DB}")
|
|
203
|
+
puts "\n==> Benchmark database dropped. Done."
|
|
Binary file
|
|
Binary file
|
|
@@ -0,0 +1,92 @@
|
|
|
1
|
+
# frozen_string_literal: true
|
|
2
|
+
|
|
3
|
+
# lib/pii_cipher/active_record_ext.rb
|
|
4
|
+
require "active_support/concern"
|
|
5
|
+
|
|
6
|
+
module PiiCipher
|
|
7
|
+
# Default sliding-window size for partial (n-gram) blind indexes.
|
|
8
|
+
DEFAULT_GRAM_SIZE = 3
|
|
9
|
+
|
|
10
|
+
module ActiveRecordExt
|
|
11
|
+
extend ActiveSupport::Concern
|
|
12
|
+
|
|
13
|
+
class_methods do
|
|
14
|
+
# Declare one or more attributes as searchable encrypted PII.
|
|
15
|
+
#
|
|
16
|
+
# use_pii_cipher :email # partial trigram search
|
|
17
|
+
# use_pii_cipher :ssn, partial: false # exact-match search
|
|
18
|
+
# use_pii_cipher :name, gram_size: 4 # 4-gram partial search
|
|
19
|
+
# use_pii_cipher :email, case_sensitive: true # do not downcase
|
|
20
|
+
#
|
|
21
|
+
# Options:
|
|
22
|
+
# partial: true -> trigram/n-gram array in `<attr>_bidx_array`
|
|
23
|
+
# false -> single hash in `<attr>_bidx`
|
|
24
|
+
# gram_size: window size for partial search (default: 3). Ignored
|
|
25
|
+
# when partial: false.
|
|
26
|
+
# case_sensitive: false (default) downcases values before hashing so
|
|
27
|
+
# searches are case-insensitive. Must match between the
|
|
28
|
+
# stored index and queries — changing it invalidates
|
|
29
|
+
# existing indexes.
|
|
30
|
+
def use_pii_cipher(*attributes, partial: true, gram_size: PiiCipher::DEFAULT_GRAM_SIZE,
|
|
31
|
+
case_sensitive: false)
|
|
32
|
+
if partial && (!gram_size.is_a?(Integer) || gram_size < 1)
|
|
33
|
+
raise ArgumentError, "gram_size must be a positive integer (got #{gram_size.inspect})"
|
|
34
|
+
end
|
|
35
|
+
|
|
36
|
+
# Registry of which attributes are indexed and how. Built with merge so
|
|
37
|
+
# repeated calls accumulate, and so subclasses (STI) get their own copy
|
|
38
|
+
# rather than mutating a parent's shared hash.
|
|
39
|
+
class_attribute :pii_cipher_configs unless respond_to?(:pii_cipher_configs)
|
|
40
|
+
self.pii_cipher_configs ||= {}
|
|
41
|
+
|
|
42
|
+
new_configs = attributes.each_with_object({}) do |attr, acc|
|
|
43
|
+
acc[attr.to_sym] = {
|
|
44
|
+
partial: partial,
|
|
45
|
+
gram_size: gram_size,
|
|
46
|
+
case_sensitive: case_sensitive
|
|
47
|
+
}
|
|
48
|
+
end
|
|
49
|
+
self.pii_cipher_configs = pii_cipher_configs.merge(new_configs)
|
|
50
|
+
|
|
51
|
+
# Install callbacks and the query patch once per model.
|
|
52
|
+
unless defined?(@_pii_cipher_configured) && @_pii_cipher_configured
|
|
53
|
+
before_save :generate_pii_ciphers!
|
|
54
|
+
PiiCipher.install_query_patch!
|
|
55
|
+
@_pii_cipher_configured = true
|
|
56
|
+
end
|
|
57
|
+
end
|
|
58
|
+
end
|
|
59
|
+
|
|
60
|
+
private
|
|
61
|
+
|
|
62
|
+
# Runs automatically before `record.save`. Reads the still-plaintext value
|
|
63
|
+
# (Rails AR Encryption serializes at the DB layer, not via callbacks) and
|
|
64
|
+
# writes the blind index(es).
|
|
65
|
+
def generate_pii_ciphers!
|
|
66
|
+
secret = PiiCipher.secret_key
|
|
67
|
+
|
|
68
|
+
self.class.pii_cipher_configs.each do |column, config|
|
|
69
|
+
raw_value = send(column)
|
|
70
|
+
|
|
71
|
+
if raw_value.blank?
|
|
72
|
+
# Clear any stale index so a removed/blanked value stops matching.
|
|
73
|
+
if config[:partial]
|
|
74
|
+
send("#{column}_bidx_array=", nil)
|
|
75
|
+
else
|
|
76
|
+
send("#{column}_bidx=", nil)
|
|
77
|
+
end
|
|
78
|
+
next
|
|
79
|
+
end
|
|
80
|
+
|
|
81
|
+
value = PiiCipher.normalize(raw_value, config)
|
|
82
|
+
|
|
83
|
+
if config[:partial]
|
|
84
|
+
hashes = PiiCipher.generate_ngram_hashes(value, secret, config[:gram_size])
|
|
85
|
+
send("#{column}_bidx_array=", hashes)
|
|
86
|
+
else
|
|
87
|
+
send("#{column}_bidx=", PiiCipher.generate_blind_index(value, secret))
|
|
88
|
+
end
|
|
89
|
+
end
|
|
90
|
+
end
|
|
91
|
+
end
|
|
92
|
+
end
|
|
@@ -0,0 +1,73 @@
|
|
|
1
|
+
# frozen_string_literal: true
|
|
2
|
+
|
|
3
|
+
# lib/pii_cipher/query_interceptor.rb
|
|
4
|
+
|
|
5
|
+
module PiiCipher
|
|
6
|
+
# Prepended onto ActiveRecord::Relation so that `where(hash)` is rewritten to
|
|
7
|
+
# search blind indexes — for the model class itself AND for any relation
|
|
8
|
+
# derived from it. Because `Model.where(...)` delegates to `Model.all.where`,
|
|
9
|
+
# patching the relation also covers class-level calls, scopes, and chains
|
|
10
|
+
# like `Model.active.where(email: "alice")`.
|
|
11
|
+
#
|
|
12
|
+
# Only hash-form `where` on attributes declared with `use_pii_cipher` is
|
|
13
|
+
# rewritten. String/array conditions, and models that don't use PiiCipher,
|
|
14
|
+
# pass straight through to ActiveRecord untouched.
|
|
15
|
+
module RelationExt
|
|
16
|
+
def where(*args, &block)
|
|
17
|
+
opts = args.first
|
|
18
|
+
|
|
19
|
+
configs = pii_cipher_configs_for_relation
|
|
20
|
+
if configs && opts.is_a?(Hash)
|
|
21
|
+
encrypted_keys = opts.keys.select { |k| configs.key?(k.to_sym) }
|
|
22
|
+
|
|
23
|
+
if encrypted_keys.any?
|
|
24
|
+
secret = PiiCipher.secret_key
|
|
25
|
+
# Dup so we never mutate the caller's hash (e.g. `where(params)`).
|
|
26
|
+
remaining = opts.dup
|
|
27
|
+
relation = self
|
|
28
|
+
|
|
29
|
+
encrypted_keys.each do |key|
|
|
30
|
+
raw_term = remaining.delete(key)
|
|
31
|
+
config = configs[key.to_sym]
|
|
32
|
+
|
|
33
|
+
# nil means "search for records with no value" — match the cleared
|
|
34
|
+
# (NULL) blind index rather than hashing nil.
|
|
35
|
+
if raw_term.nil?
|
|
36
|
+
column = config[:partial] ? "#{key}_bidx_array" : "#{key}_bidx"
|
|
37
|
+
relation = relation.where(column => nil)
|
|
38
|
+
next
|
|
39
|
+
end
|
|
40
|
+
|
|
41
|
+
value = PiiCipher.normalize(raw_term, config)
|
|
42
|
+
|
|
43
|
+
relation =
|
|
44
|
+
if config[:partial]
|
|
45
|
+
hashes = PiiCipher.generate_ngram_hashes(value, secret, config[:gram_size])
|
|
46
|
+
relation.where("#{key}_bidx_array @> ?::jsonb", hashes.to_json)
|
|
47
|
+
else
|
|
48
|
+
relation.where("#{key}_bidx" => PiiCipher.generate_blind_index(value, secret))
|
|
49
|
+
end
|
|
50
|
+
end
|
|
51
|
+
|
|
52
|
+
# Chain any remaining standard columns (e.g. status: "active").
|
|
53
|
+
relation = relation.where(remaining) if remaining.any?
|
|
54
|
+
return relation
|
|
55
|
+
end
|
|
56
|
+
end
|
|
57
|
+
|
|
58
|
+
super
|
|
59
|
+
end
|
|
60
|
+
|
|
61
|
+
private
|
|
62
|
+
|
|
63
|
+
# The PiiCipher config for this relation's model, or nil if it doesn't use
|
|
64
|
+
# PiiCipher. Guarded so prepending to the shared Relation class is a no-op
|
|
65
|
+
# for every other model.
|
|
66
|
+
def pii_cipher_configs_for_relation
|
|
67
|
+
k = klass
|
|
68
|
+
return nil unless k.respond_to?(:pii_cipher_configs)
|
|
69
|
+
|
|
70
|
+
k.pii_cipher_configs
|
|
71
|
+
end
|
|
72
|
+
end
|
|
73
|
+
end
|
|
@@ -0,0 +1,15 @@
|
|
|
1
|
+
# frozen_string_literal: true
|
|
2
|
+
|
|
3
|
+
# lib/pii_cipher/railtie.rb
|
|
4
|
+
require "rails/railtie"
|
|
5
|
+
|
|
6
|
+
module PiiCipher
|
|
7
|
+
class Railtie < Rails::Railtie
|
|
8
|
+
initializer "pii_cipher.initialize" do
|
|
9
|
+
ActiveSupport.on_load(:active_record) do
|
|
10
|
+
# Injects our macro into ApplicationRecord automatically.
|
|
11
|
+
include PiiCipher::ActiveRecordExt
|
|
12
|
+
end
|
|
13
|
+
end
|
|
14
|
+
end
|
|
15
|
+
end
|
data/lib/pii_cipher.rb
ADDED
|
@@ -0,0 +1,51 @@
|
|
|
1
|
+
# frozen_string_literal: true
|
|
2
|
+
|
|
3
|
+
# lib/pii_cipher.rb
|
|
4
|
+
require_relative "pii_cipher/version"
|
|
5
|
+
|
|
6
|
+
# 1. Load the compiled Rust extension. It defines:
|
|
7
|
+
# PiiCipher.generate_ngram_hashes(text, secret, n) -> [hash, ...]
|
|
8
|
+
# PiiCipher.generate_blind_index(text, secret) -> hash
|
|
9
|
+
require_relative "pii_cipher/pii_cipher"
|
|
10
|
+
|
|
11
|
+
# 2. Load our Ruby logic
|
|
12
|
+
require_relative "pii_cipher/active_record_ext"
|
|
13
|
+
require_relative "pii_cipher/query_interceptor"
|
|
14
|
+
|
|
15
|
+
module PiiCipher
|
|
16
|
+
# Raised when the HMAC secret key is not configured.
|
|
17
|
+
class MissingSecretKeyError < StandardError; end
|
|
18
|
+
|
|
19
|
+
class << self
|
|
20
|
+
# The HMAC secret used for all blind indexes. Read from the
|
|
21
|
+
# `PII_SECRET_KEY` environment variable. Changing it invalidates every
|
|
22
|
+
# existing blind index.
|
|
23
|
+
def secret_key
|
|
24
|
+
ENV.fetch("PII_SECRET_KEY") do
|
|
25
|
+
raise MissingSecretKeyError,
|
|
26
|
+
"PII_SECRET_KEY is not set. PiiCipher needs it to generate blind indexes."
|
|
27
|
+
end
|
|
28
|
+
end
|
|
29
|
+
|
|
30
|
+
# Apply the per-attribute normalization (currently just case-folding) that
|
|
31
|
+
# must be identical on writes and queries.
|
|
32
|
+
def normalize(value, config)
|
|
33
|
+
str = value.to_s
|
|
34
|
+
config[:case_sensitive] ? str : str.downcase
|
|
35
|
+
end
|
|
36
|
+
|
|
37
|
+
# Idempotently prepend the query patch onto ActiveRecord::Relation. Called
|
|
38
|
+
# the first time any model declares `use_pii_cipher`, by which point
|
|
39
|
+
# ActiveRecord is guaranteed to be loaded. Guarded so it only happens once.
|
|
40
|
+
def install_query_patch!
|
|
41
|
+
return if @query_patch_installed
|
|
42
|
+
return unless defined?(ActiveRecord::Relation)
|
|
43
|
+
|
|
44
|
+
ActiveRecord::Relation.prepend(PiiCipher::RelationExt)
|
|
45
|
+
@query_patch_installed = true
|
|
46
|
+
end
|
|
47
|
+
end
|
|
48
|
+
end
|
|
49
|
+
|
|
50
|
+
# 3. Load the Railtie ONLY if Rails is present in the user's app
|
|
51
|
+
require_relative "pii_cipher/railtie" if defined?(Rails)
|
data/mise.toml
ADDED
data/sig/pii_cipher.rbs
ADDED
|
@@ -0,0 +1,16 @@
|
|
|
1
|
+
module PiiCipher
|
|
2
|
+
VERSION: String
|
|
3
|
+
DEFAULT_GRAM_SIZE: Integer
|
|
4
|
+
|
|
5
|
+
class MissingSecretKeyError < StandardError
|
|
6
|
+
end
|
|
7
|
+
|
|
8
|
+
# Implemented in the Rust extension.
|
|
9
|
+
def self.generate_ngram_hashes: (String text, String secret_key, Integer n) -> Array[String]
|
|
10
|
+
def self.generate_blind_index: (String text, String secret_key) -> String
|
|
11
|
+
|
|
12
|
+
# Implemented in Ruby (lib/pii_cipher.rb).
|
|
13
|
+
def self.secret_key: () -> String
|
|
14
|
+
def self.normalize: (untyped value, Hash[Symbol, untyped] config) -> String
|
|
15
|
+
def self.install_query_patch!: () -> void
|
|
16
|
+
end
|
metadata
ADDED
|
@@ -0,0 +1,99 @@
|
|
|
1
|
+
--- !ruby/object:Gem::Specification
|
|
2
|
+
name: pii_cipher
|
|
3
|
+
version: !ruby/object:Gem::Version
|
|
4
|
+
version: 0.1.0
|
|
5
|
+
platform: arm64-darwin
|
|
6
|
+
authors:
|
|
7
|
+
- Selva Chezhian
|
|
8
|
+
autorequire:
|
|
9
|
+
bindir: exe
|
|
10
|
+
cert_chain: []
|
|
11
|
+
date: 2026-06-29 00:00:00.000000000 Z
|
|
12
|
+
dependencies:
|
|
13
|
+
- !ruby/object:Gem::Dependency
|
|
14
|
+
name: activerecord
|
|
15
|
+
requirement: !ruby/object:Gem::Requirement
|
|
16
|
+
requirements:
|
|
17
|
+
- - ">="
|
|
18
|
+
- !ruby/object:Gem::Version
|
|
19
|
+
version: '7.1'
|
|
20
|
+
type: :runtime
|
|
21
|
+
prerelease: false
|
|
22
|
+
version_requirements: !ruby/object:Gem::Requirement
|
|
23
|
+
requirements:
|
|
24
|
+
- - ">="
|
|
25
|
+
- !ruby/object:Gem::Version
|
|
26
|
+
version: '7.1'
|
|
27
|
+
- !ruby/object:Gem::Dependency
|
|
28
|
+
name: railties
|
|
29
|
+
requirement: !ruby/object:Gem::Requirement
|
|
30
|
+
requirements:
|
|
31
|
+
- - ">="
|
|
32
|
+
- !ruby/object:Gem::Version
|
|
33
|
+
version: '7.1'
|
|
34
|
+
type: :runtime
|
|
35
|
+
prerelease: false
|
|
36
|
+
version_requirements: !ruby/object:Gem::Requirement
|
|
37
|
+
requirements:
|
|
38
|
+
- - ">="
|
|
39
|
+
- !ruby/object:Gem::Version
|
|
40
|
+
version: '7.1'
|
|
41
|
+
description: |
|
|
42
|
+
PiiCipher lets you search encrypted PII columns in ActiveRecord without ever
|
|
43
|
+
storing or querying plaintext. It generates HMAC-SHA256 blind indexes alongside
|
|
44
|
+
your ciphertext — trigram arrays for partial (substring) searches and single
|
|
45
|
+
hashes for exact-match lookups. The hash functions run in a native Rust
|
|
46
|
+
extension for performance. Query interception is transparent: call `where`
|
|
47
|
+
as normal and PiiCipher rewrites the query against the blind index automatically.
|
|
48
|
+
email:
|
|
49
|
+
- selvachezhian.labam@gmail.com
|
|
50
|
+
executables: []
|
|
51
|
+
extensions: []
|
|
52
|
+
extra_rdoc_files: []
|
|
53
|
+
files:
|
|
54
|
+
- CHANGELOG.md
|
|
55
|
+
- CODE_OF_CONDUCT.md
|
|
56
|
+
- LICENSE.txt
|
|
57
|
+
- README.md
|
|
58
|
+
- Rakefile
|
|
59
|
+
- benchmarks/run.rb
|
|
60
|
+
- lib/pii_cipher.rb
|
|
61
|
+
- lib/pii_cipher/3.2/pii_cipher.bundle
|
|
62
|
+
- lib/pii_cipher/3.3/pii_cipher.bundle
|
|
63
|
+
- lib/pii_cipher/active_record_ext.rb
|
|
64
|
+
- lib/pii_cipher/query_interceptor.rb
|
|
65
|
+
- lib/pii_cipher/railtie.rb
|
|
66
|
+
- lib/pii_cipher/version.rb
|
|
67
|
+
- mise.toml
|
|
68
|
+
- sig/pii_cipher.rbs
|
|
69
|
+
homepage: https://github.com/selvachezhian/pii_cipher
|
|
70
|
+
licenses:
|
|
71
|
+
- MIT
|
|
72
|
+
metadata:
|
|
73
|
+
allowed_push_host: https://rubygems.org
|
|
74
|
+
homepage_uri: https://github.com/selvachezhian/pii_cipher
|
|
75
|
+
source_code_uri: https://github.com/selvachezhian/pii_cipher
|
|
76
|
+
changelog_uri: https://github.com/selvachezhian/pii_cipher/blob/main/CHANGELOG.md
|
|
77
|
+
post_install_message:
|
|
78
|
+
rdoc_options: []
|
|
79
|
+
require_paths:
|
|
80
|
+
- lib
|
|
81
|
+
required_ruby_version: !ruby/object:Gem::Requirement
|
|
82
|
+
requirements:
|
|
83
|
+
- - ">="
|
|
84
|
+
- !ruby/object:Gem::Version
|
|
85
|
+
version: '3.2'
|
|
86
|
+
- - "<"
|
|
87
|
+
- !ruby/object:Gem::Version
|
|
88
|
+
version: 3.4.dev
|
|
89
|
+
required_rubygems_version: !ruby/object:Gem::Requirement
|
|
90
|
+
requirements:
|
|
91
|
+
- - ">="
|
|
92
|
+
- !ruby/object:Gem::Version
|
|
93
|
+
version: '0'
|
|
94
|
+
requirements: []
|
|
95
|
+
rubygems_version: 3.5.23
|
|
96
|
+
signing_key:
|
|
97
|
+
specification_version: 4
|
|
98
|
+
summary: Searchable blind indexing for PII fields in Rails, powered by a Rust extension.
|
|
99
|
+
test_files: []
|