opaque_id 1.2.0 → 1.3.0
- checksums.yaml +4 -4
- data/.release-please-manifest.json +1 -1
- data/CHANGELOG.md +28 -0
- data/CODE_OF_CONDUCT.md +11 -11
- data/README.md +82 -0
- data/docs/.gitignore +5 -0
- data/docs/404.html +25 -0
- data/docs/Gemfile +31 -0
- data/docs/Gemfile.lock +335 -0
- data/docs/_config.yml +162 -0
- data/docs/_data/navigation.yml +132 -0
- data/docs/_includes/footer_custom.html +8 -0
- data/docs/_includes/head_custom.html +67 -0
- data/docs/algorithms.md +409 -0
- data/docs/alphabets.md +521 -0
- data/docs/api-reference.md +594 -0
- data/docs/assets/css/custom.scss +798 -0
- data/docs/assets/images/favicon.svg +17 -0
- data/docs/assets/images/og-image.svg +65 -0
- data/docs/configuration.md +548 -0
- data/docs/development.md +567 -0
- data/docs/getting-started.md +256 -0
- data/docs/index.md +132 -0
- data/docs/installation.md +377 -0
- data/docs/performance.md +488 -0
- data/docs/robots.txt +11 -0
- data/docs/security.md +598 -0
- data/docs/usage.md +414 -0
- data/docs/use-cases.md +569 -0
- data/lib/opaque_id/version.rb +1 -1
- data/tasks/0003-prd-documentation-site.md +191 -0
- data/tasks/tasks-0003-prd-documentation-site.md +84 -0
- metadata +27 -2
- data/sig/opaque_id.rbs +0 -4
data/docs/algorithms.md (added)

---
layout: default
title: Algorithms
nav_order: 7
description: "Technical explanation of OpaqueId's generation algorithms and optimization strategies"
permalink: /algorithms/
---

# Algorithms

OpaqueId generates cryptographically secure, collision-resistant opaque IDs. This guide explains the technical details behind the generation process, its optimization strategies, and the mathematics that justify them.

## Overview

OpaqueId implements two primary algorithms, each optimized for a different scenario:

1. **Fast Path Algorithm** - For 64-character alphabets (optimized performance)
2. **Unbiased Path Algorithm** - For all other alphabets (rejection sampling)

## Fast Path Algorithm (64-character alphabets)

When the alphabet has exactly 64 characters, OpaqueId uses an optimized algorithm. It maps each random byte straight to a character, with no modulo bias and no rejected bytes.

### How It Works

```ruby
require "securerandom"

def generate_fast(size, alphabet)
  # Pre-size a mutable buffer in the alphabet's encoding
  result = String.new(capacity: size, encoding: alphabet.encoding)

  size.times do
    # Generate one random byte (0-255)
    byte = SecureRandom.random_bytes(1).unpack1("C")

    # Use bitwise AND for fast modulo 64
    index = byte & 63 # Equivalent to byte % 64

    # Append the character from the alphabet in place
    result << alphabet[index]
  end

  result
end
```

### Key Features

- **Bitwise Operations**: Uses `byte & 63` instead of `byte % 64` for faster computation
- **No Modulo Bias**: 64 is a power of 2 that divides 256 evenly, so the bitwise AND yields a uniform distribution
- **One Random Call per Character**: Each character consumes one `SecureRandom.random_bytes(1)` call
- **Maximum Performance**: Optimized for speed with 64-character alphabets

### Mathematical Foundation

For a 64-character alphabet:

- Each of the 64 characters is produced by exactly 4 of the 256 possible byte values (256 / 64 = 4)
- `byte & 63` extracts the lower 6 bits, giving an index from 0 to 63
- Every index is hit by the same number of byte values, so each character has probability exactly 1/64 and there is no bias

### Performance Characteristics

- **Optimized for speed**: Uses a bitwise operation instead of division
- **No rejection sampling**: Every random byte produces a character
- **Linear time complexity**: O(n), where n is the ID length

## Unbiased Path Algorithm (Other alphabets)

For alphabets whose size is not 64, OpaqueId uses rejection sampling to guarantee an unbiased distribution.

### How It Works

```ruby
require "securerandom"

def generate_unbiased(size, alphabet, alphabet_size)
  # Largest multiple of alphabet_size that is <= 256; computed once, not per byte.
  # Assumes alphabet_size <= 256: a larger alphabet makes threshold 0 and loops forever.
  threshold = 256 - (256 % alphabet_size)
  result = String.new(capacity: size, encoding: alphabet.encoding)

  size.times do
    loop do
      # Generate one random byte (0-255)
      byte = SecureRandom.random_bytes(1).unpack1("C")

      # Reject bytes in the uneven tail (avoids modulo bias)
      next if byte >= threshold

      # Map the accepted byte to an index, append the character, and stop retrying
      result << alphabet[byte % alphabet_size]
      break
    end
  end

  result
end
```

### Key Features

- **Rejection Sampling**: Discards random values from the uneven tail that would bias the result
- **Unbiased Distribution**: Ensures each character has equal probability
- **Cryptographically Secure**: Uses `SecureRandom` for all random generation
- **Collision Resistant**: High entropy per ID makes collisions unlikely

### Mathematical Foundation

The rejection sampling algorithm ensures a uniform distribution:

1. **Calculate Threshold**: `threshold = 256 - (256 % alphabet_size)`, the largest multiple of `alphabet_size` that is at most 256
2. **Reject Biased Values**: Only accept bytes < threshold
3. **Uniform Mapping**: `byte % alphabet_size` maps exactly `threshold / alphabet_size` accepted bytes to each character, so every character is equally likely

### Example: 62-character alphabet

```ruby
# For a 62-character alphabet
alphabet_size = 62
threshold = 256 - (256 % 62) # = 256 - 8 = 248

# Accept bytes 0-247 (248 values) and reject bytes 248-255 (8 values)
# Each accepted byte maps to byte % 62, and each of the 62 characters
# receives exactly 248 / 62 = 4 byte values: a uniform distribution
```
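A byte has only 256 values, so the mapping argument can be checked exhaustively. The sketch below uses a hypothetical method name, `byte_mapping_stats`, which is not part of OpaqueId. It applies the same threshold rule to every byte value and reports several figures: how many byte values land on each character, the rejection rate, and the expected number of bytes consumed per character. For 64 characters the threshold is 256, so nothing is rejected, and `byte % 64` selects the same index as the fast path's `byte & 63`.

```ruby
# Sketch: enumerate all 256 byte values and tally which alphabet index each
# accepted byte produces, using the threshold rule described above.
def byte_mapping_stats(alphabet_size)
  threshold = 256 - (256 % alphabet_size)
  counts = Hash.new(0)

  (0..255).each do |byte|
    next if byte >= threshold # rejected bytes never reach the alphabet
    counts[byte % alphabet_size] += 1
  end

  {
    threshold: threshold,
    characters_covered: counts.size,
    bytes_per_character: counts.values.uniq, # a single value means uniform
    rejection_probability: (256 - threshold) / 256.0,
    expected_bytes_per_character: 256.0 / threshold
  }
end

p byte_mapping_stats(64)
p byte_mapping_stats(62)
```

For 62 characters it reports a threshold of 248, 4 byte values for every character, and a 3.125% rejection rate.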

### Performance Characteristics

- **Unbiased distribution**: Uses rejection sampling to ensure uniform character distribution
- **Slight overhead**: A fraction `(256 % alphabet_size) / 256` of bytes is rejected (3.125% for a 62-character alphabet)
- **Expected linear time**: Each character consumes on average `256 / threshold` bytes (about 1.03 for a 62-character alphabet), so generation is O(n) in expectation, where n is the ID length

## Algorithm Selection

OpaqueId automatically selects the appropriate algorithm based on the alphabet size:

```ruby
def generate(size: 21, alphabet: ALPHANUMERIC_ALPHABET)
  alphabet_size = alphabet.size

  # Handle edge case: a single-character alphabet needs no randomness
  return alphabet * size if alphabet_size == 1

  # Use the fast path for 64-character alphabets
  return generate_fast(size, alphabet) if alphabet_size == 64

  # Use the unbiased path for all other alphabets
  generate_unbiased(size, alphabet, alphabet_size)
end
```
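The fragments above can be assembled into one self-contained generator. The module below is a sketch under stated assumptions, not the gem's source:

- The name `OpaqueIdSketch` is hypothetical.
- The two alphabet constants are assumed values: 62 alphanumerics, plus a 64-character URL-safe set.
- As a design choice, the fast path draws all `size` bytes with a single `SecureRandom.random_bytes(size)` call instead of one call per character. This is possible because `& 63` never rejects a byte, so the number of bytes needed is known in advance.

```ruby
require "securerandom"

# Hypothetical module name, used only for this sketch.
module OpaqueIdSketch
  # Assumed alphabets: 62 alphanumerics, and 64 URL-safe characters.
  ALPHANUMERIC_ALPHABET = [*"0".."9", *"a".."z", *"A".."Z"].join.freeze
  STANDARD_ALPHABET = ("_-" + ALPHANUMERIC_ALPHABET).freeze

  module_function

  def generate(size: 21, alphabet: ALPHANUMERIC_ALPHABET)
    alphabet_size = alphabet.size
    return alphabet * size if alphabet_size == 1
    return generate_fast(size, alphabet) if alphabet_size == 64

    generate_unbiased(size, alphabet, alphabet_size)
  end

  # Fast path: one call for all bytes, each masked down to 6 bits (0-63)
  def generate_fast(size, alphabet)
    SecureRandom.random_bytes(size).each_byte.map { |byte| alphabet[byte & 63] }.join
  end

  # Unbiased path: reject bytes at or above the largest multiple of alphabet_size <= 256
  def generate_unbiased(size, alphabet, alphabet_size)
    threshold = 256 - (256 % alphabet_size)
    chars = []
    while chars.size < size
      byte = SecureRandom.random_bytes(1).unpack1("C")
      chars << alphabet[byte % alphabet_size] if byte < threshold
    end
    chars.join
  end
end

OpaqueIdSketch.generate                                                        # 21 alphanumeric characters
OpaqueIdSketch.generate(size: 10, alphabet: OpaqueIdSketch::STANDARD_ALPHABET) # fast path
OpaqueIdSketch.generate(size: 3, alphabet: "x")                                # => "xxx"
```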

### Selection Criteria

| Alphabet Size | Algorithm         | Reason                            |
| ------------- | ----------------- | --------------------------------- |
| 1             | Direct repetition | No randomness needed              |
| 64            | Fast Path         | Optimized bitwise operations      |
| Other         | Unbiased Path     | Rejection sampling for uniformity |

## Entropy Analysis

### Entropy Calculation

The entropy of an opaque ID depends on the alphabet size and length. Both algorithms draw each character uniformly and independently, so the entropy in bits is:

```
Entropy = length × log₂(alphabet_size)
```

### Examples

```ruby
# 21-character ID with 62-character alphabet
21 * Math.log2(62) # ≈ 21 × 5.954 ≈ 125.0 bits

# 21-character ID with 64-character alphabet
21 * Math.log2(64) # = 21 × 6.000 = 126.0 bits

# 15-character ID with 16-character alphabet
15 * Math.log2(16) # = 15 × 4.000 = 60.0 bits
```

### Collision Probability

The probability of at least one collision among N generated IDs follows the birthday approximation:

```
P(collision) ≈ 1 - e^(-N²/(2 × 2^entropy))
```

### Practical Examples

```ruby
# 21-character alphanumeric ID (≈125 bits of entropy, 62²¹ ≈ 4.4 × 10³⁷ possible IDs)
# 1 billion IDs: P(collision) ≈ 1.1 × 10⁻²⁰
# 1 trillion IDs: P(collision) ≈ 1.1 × 10⁻¹⁴

# 15-character hexadecimal ID (60 bits of entropy, 16¹⁵ ≈ 1.2 × 10¹⁸ possible IDs)
# 1 million IDs: P(collision) ≈ 4.3 × 10⁻⁷
# 1 billion IDs: P(collision) ≈ 0.35 (it passes 50% at roughly 1.3 billion IDs)
```
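The figures above can be reproduced with a few lines of Ruby. This sketch uses a hypothetical helper, `collision_probability`, that evaluates the birthday approximation directly. For very small exponents it switches to the series x - x²/2, because `1 - Math.exp(-x)` loses most of its significant digits in floating point when x is tiny. At the 125-bit level, the naive expression would simply round to 0.

```ruby
# Birthday-bound estimate: P ≈ 1 - e^(-N² / (2 × space)), space = alphabet_size**length
def collision_probability(count, alphabet_size, length)
  space = alphabet_size.to_f**length
  x = count.to_f**2 / (2 * space)

  # For tiny x, 1 - e^(-x) cancels catastrophically; the series x - x²/2 stays accurate
  x < 1e-6 ? x - x * x / 2 : 1 - Math.exp(-x)
end

collision_probability(1_000_000_000, 62, 21) # ≈ 1.1e-20
collision_probability(1_000_000, 16, 15)     # ≈ 4.3e-7
collision_probability(1_000_000_000, 16, 15) # ≈ 0.35
```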

## Security Analysis

### Cryptographic Security

OpaqueId uses Ruby's `SecureRandom`, which provides:

- **Cryptographically Secure PRNG**: Draws from OS entropy sources
- **Unpredictable Output**: Cannot be predicted from previous outputs
- **High Entropy**: Sufficient randomness for security applications

### Attack Resistance

#### Brute Force Attacks

```ruby
# 21-character alphanumeric ID
# Search space: 62²¹ ≈ 4.4 × 10³⁷
# Time to exhaust the space: ~1.4 × 10²¹ years (at 1 billion attempts/second)
```
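The brute-force estimate is simple arithmetic. Ruby's arbitrary-precision integers keep the search space exact. The attack rate of 1 billion guesses per second is the assumption stated above, and a year is taken as 365.25 days. The script prints roughly 4.4 × 10³⁷ for the search space and 1.4 × 10²¹ years to exhaust it.

```ruby
search_space = 62**21                     # exact Integer
attempts_per_second = 1_000_000_000       # assumed attacker speed
seconds_per_year = 365.25 * 24 * 60 * 60  # 31,557,600 seconds

years_to_exhaust = search_space / attempts_per_second.to_f / seconds_per_year

puts format("search space: %.2e", search_space)
puts format("years to exhaust: %.2e", years_to_exhaust)
```

This measures exhausting the whole space. Finding one specific ID takes half as long on average. An attacker who would accept any of M existing IDs needs about M times fewer guesses.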

#### Timing Attacks

The algorithms are designed so that generation timing does not reveal the generated ID:

- **Constant Time Operations**: The fast path's bitwise AND takes the same time for every byte value
- **Rejection Sampling**: The number of loop iterations depends only on rejected bytes. Those bytes never appear in the output, so the iteration count reveals nothing about the ID
- **SecureRandom**: IDs come from a cryptographically secure generator, so observing when they are generated does not help predict their values

#### Statistical Attacks

- **Uniform Distribution**: Rejection sampling ensures statistical uniformity
- **No Patterns**: Cryptographically secure randomness prevents pattern detection
- **High Entropy**: Sufficient entropy prevents statistical analysis

## Performance Optimization

### Memory Usage

```ruby
require "securerandom"

# Memory-efficient generation: build the ID in one pre-sized buffer
def generate_efficient(size, alphabet)
  # Pre-allocate the result string (in the alphabet's encoding)
  result = String.new(capacity: size, encoding: alphabet.encoding)

  # Append characters directly into the buffer instead of creating new strings
  size.times do
    result << generate_character(alphabet)
  end

  result
end

# Illustrative helper: picks one character without bias (rejection sampling)
def generate_character(alphabet)
  threshold = 256 - (256 % alphabet.size)
  loop do
    byte = SecureRandom.random_bytes(1).unpack1("C")
    return alphabet[byte % alphabet.size] if byte < threshold
  end
end
```

### Batch Generation

```ruby
# Batch generation: Array.new with a block allocates the array once
# and fills each slot with a freshly generated ID
def generate_batch(count, size, alphabet)
  Array.new(count) { generate(size: size, alphabet: alphabet) }
end
```

### Caching Strategies

```ruby
# Compute the alphabet size once per call and reuse it for algorithm selection
module OpaqueId
  def self.generate(size: 21, alphabet: ALPHANUMERIC_ALPHABET)
    alphabet_size = alphabet.size

    # Select the algorithm from the precomputed size
    if alphabet_size == 64
      generate_fast(size, alphabet)
    else
      generate_unbiased(size, alphabet, alphabet_size)
    end
  end
end
```

## Performance Characteristics

### Algorithm Comparison

- **Fast Path (64-char)**: Fastest option; uses bitwise operations and never rejects a byte
- **Unbiased Path (other)**: Slightly slower because some bytes are rejected, but ensures uniform distribution
- **Memory usage**: Scales linearly with ID length and batch size
- **Time complexity**: Linear in ID length (in expectation for the unbiased path)

## Implementation Details

### Error Handling

```ruby
def generate(size: 21, alphabet: ALPHANUMERIC_ALPHABET)
  # Validate parameters
  raise ConfigurationError, "Size must be positive" unless size.positive?
  raise ConfigurationError, "Alphabet cannot be empty" if alphabet.nil? || alphabet.empty?

  # Handle edge cases
  return alphabet * size if alphabet.size == 1

  # Generate ID
  if alphabet.size == 64
    generate_fast(size, alphabet)
  else
    generate_unbiased(size, alphabet, alphabet.size)
  end
rescue ConfigurationError
  # Let validation errors propagate unchanged
  raise
rescue => e
  # Wrap any other failure in a GenerationError
  raise GenerationError, "Failed to generate opaque ID: #{e.message}"
end
```

### Thread Safety

```ruby
# OpaqueId is thread-safe
# SecureRandom is thread-safe
# No shared state between generations

# Safe for concurrent use
threads = 10.times.map do
  Thread.new do
    1000.times { OpaqueId.generate }
  end
end

threads.each(&:join)
```

## Comparison with Other Algorithms

### vs. UUID

| Aspect      | OpaqueId       | UUID                   |
| ----------- | -------------- | ---------------------- |
| Length      | Configurable   | Fixed (36 chars)       |
| Entropy     | Configurable   | Fixed (122 bits in v4) |
| Readability | Human-friendly | Not readable           |
| Performance | Optimized      | Standard               |

### vs. NanoID

| Aspect        | OpaqueId    | NanoID          |
| ------------- | ----------- | --------------- |
| Language      | Native Ruby | JavaScript port |
| Performance   | Optimized   | Good            |
| Dependencies  | None        | External gem    |
| Customization | Extensive   | Limited         |

### vs. SecureRandom.hex

| Aspect      | OpaqueId     | SecureRandom.hex |
| ----------- | ------------ | ---------------- |
| Alphabet    | Configurable | Fixed (hex)      |
| Bias        | None         | None             |
| Performance | Optimized    | Good             |
| Features    | Rich         | Basic            |

## Best Practices

### 1. Choose Appropriate Algorithm

```ruby
# Use 64-character alphabets for maximum performance
self.opaque_id_alphabet = OpaqueId::STANDARD_ALPHABET

# Use other alphabets for specific requirements
self.opaque_id_alphabet = "0123456789" # Numeric only
```

### 2. Optimize for Your Use Case

```ruby
# High-performance applications
self.opaque_id_alphabet = OpaqueId::STANDARD_ALPHABET
self.opaque_id_length = 21

# Human-readable applications
self.opaque_id_alphabet = OpaqueId::ALPHANUMERIC_ALPHABET
self.opaque_id_length = 15
```

### 3. Consider Entropy Requirements

```ruby
# High security (125+ bits entropy)
self.opaque_id_length = 21
self.opaque_id_alphabet = OpaqueId::ALPHANUMERIC_ALPHABET

# Medium security (60+ bits entropy)
self.opaque_id_length = 15
self.opaque_id_alphabet = "0123456789abcdef"
```
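When picking a length, it helps to invert the entropy formula. The two helpers below have hypothetical names and are not part of OpaqueId's API. One computes the entropy of a configuration; the other finds the shortest length that reaches a target number of bits. The small epsilon keeps floating-point rounding from pushing an exact boundary up by one character.

```ruby
# Entropy in bits of an ID whose characters are drawn uniformly and independently
def entropy_bits(length, alphabet_size)
  length * Math.log2(alphabet_size)
end

# Shortest length reaching at least target_bits; the epsilon absorbs float rounding
def min_length_for(target_bits, alphabet_size)
  (target_bits / Math.log2(alphabet_size) - 1e-9).ceil
end

entropy_bits(21, 62)    # ≈ 125.0
entropy_bits(15, 16)    # => 60.0
min_length_for(128, 62) # => 22
min_length_for(128, 64) # => 22
```

This can be combined with the collision formula. Suppose the collision probability must stay under 10⁻⁶ across 1 billion IDs. That requires 2^entropy ≥ N² / (2 × 10⁻⁶), which is about 79 bits. `min_length_for(79, 62)` turns that into 14 alphanumeric characters.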

## Next Steps

Now that you understand the algorithms:

1. **Explore [Performance](performance.md)** for detailed benchmarks and optimization tips
2. **Check out [Security](security.md)** for security considerations and best practices
3. **Review [Configuration](configuration.md)** for algorithm selection guidance
4. **Read [API Reference](api-reference.md)** for complete algorithm documentation