opaque_id 1.2.0 → 1.4.0

@@ -0,0 +1,412 @@
1
+ ---
2
+ layout: default
3
+ title: Algorithms
4
+ nav_order: 7
5
+ description: "Technical explanation of OpaqueId's generation algorithms and optimization strategies"
6
+ permalink: /algorithms/
7
+ ---
8
+
9
+ # Algorithms
10
+
11
+ OpaqueId generates cryptographically secure, collision-resistant opaque IDs. This guide explains how IDs are generated, which optimizations apply, and the mathematics behind them.
12
+
13
+ - TOC
14
+ {:toc}
15
+
16
+ ## Overview
17
+
18
+ OpaqueId implements two primary algorithms optimized for different scenarios:
19
+
20
+ 1. **Fast Path Algorithm** - For 64-character alphabets (optimized performance)
21
+ 2. **Unbiased Path Algorithm** - For all other alphabets (rejection sampling)
22
+
23
+ ## Fast Path Algorithm (64-character alphabets)
24
+
25
+ When the alphabet has exactly 64 characters, OpaqueId maps each random byte to a character with a single bit mask. Because 64 divides 256 evenly, this introduces no modulo bias and needs no rejection step.
26
+
27
+ ### How It Works
28
+
29
+ ```ruby
30
+ def generate_fast(size, alphabet)
31
+ result = ""
32
+ size.times do
33
+ # Generate random byte
34
+ byte = SecureRandom.random_bytes(1).unpack1("C")
35
+
36
+ # Use bitwise AND for fast modulo 64
37
+ index = byte & 63 # Equivalent to byte % 64
38
+
39
+ # Append character from alphabet
40
+ result += alphabet[index]
41
+ end
42
+ result
43
+ end
44
+ ```
45
+
46
+ ### Key Features
47
+
48
+ - **Bitwise Operations**: Uses `byte & 63` instead of `byte % 64` for faster computation
49
+ - **No Modulo Bias**: 64 divides 256 evenly, so masking the low 6 bits of a uniform byte yields a uniform index
50
+ - **Single Random Call**: One `SecureRandom.random_bytes(1)` call per character
51
+ - **Maximum Performance**: Optimized for speed with 64-character alphabets
52
+
53
+ ### Mathematical Foundation
54
+
55
+ For a 64-character alphabet:
56
+
57
+ - Each of the 64 characters is selected by exactly 4 of the 256 possible byte values (256 / 64 = 4)
58
+ - `byte & 63` extracts the lower 6 bits (0-63)
59
+ - This provides uniform distribution without bias
60
+
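As a quick check of the mapping described above, this Ruby sketch tallies how many of the 256 byte values land on each index after masking with 63. It is an illustration only and not part of OpaqueId; `index_counts` is a name chosen for this example.

```ruby
# Count how many byte values (0..255) map to each index under `byte & 63`.
def index_counts
  (0..255).map { |byte| byte & 63 }.tally
end

counts = index_counts
puts counts.size                 # number of distinct indices: 64
puts counts.values.uniq.inspect  # byte values per index: [4]
```

Every index receives the same number of byte values, which is why the mask introduces no bias.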
61
+ ### Performance Characteristics
62
+
63
+ - **Optimized for speed**: Uses bitwise operations for maximum performance
64
+ - **No rejection sampling**: Every generated byte yields one character (only the upper 2 bits of each byte are discarded)
65
+ - **Linear time complexity**: O(n) where n is the ID length
66
+
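One possible refinement, sketched here under assumptions rather than taken from the gem, is to request all random bytes in a single `SecureRandom.random_bytes(size)` call and mask each byte. The method name `generate_fast_batched` and the constant `SKETCH_ALPHABET_64` are hypothetical and exist only for this example.

```ruby
require "securerandom"

# Hypothetical 64-character alphabet used only for this sketch.
SKETCH_ALPHABET_64 = (("A".."Z").to_a + ("a".."z").to_a + ("0".."9").to_a + ["-", "_"]).join.freeze

# Draw `size` random bytes at once and map each to a character with `& 63`.
def generate_fast_batched(size, alphabet)
  raise ArgumentError, "alphabet must have 64 characters" unless alphabet.size == 64

  SecureRandom.random_bytes(size).each_byte.map { |byte| alphabet[byte & 63] }.join
end

puts generate_fast_batched(21, SKETCH_ALPHABET_64)
```

The output distribution is the same as in the per-character version; only the number of calls into `SecureRandom` changes.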
67
+ ## Unbiased Path Algorithm (Other alphabets)
68
+
69
+ For alphabets that aren't 64 characters, OpaqueId uses rejection sampling to ensure unbiased distribution.
70
+
71
+ ### How It Works
72
+
73
+ ```ruby
74
+ def generate_unbiased(size, alphabet, alphabet_size)
75
+ result = ""
76
+ size.times do
77
+ loop do
78
+ # Generate random byte
79
+ byte = SecureRandom.random_bytes(1).unpack1("C")
80
+
81
+ # Calculate threshold to avoid modulo bias
82
+ threshold = 256 - (256 % alphabet_size)
83
+
84
+ # Reject if byte is too large (avoids bias)
85
+ next if byte >= threshold
86
+
87
+ # Use modulo to get index
88
+ index = byte % alphabet_size
89
+
90
+ # Append character and break loop
91
+ result += alphabet[index]
92
+ break
93
+ end
94
+ end
95
+ result
96
+ end
97
+ ```
98
+
99
+ ### Key Features
100
+
101
+ - **Rejection Sampling**: Discards bytes at or above the threshold, which would otherwise over-represent the first few characters of the alphabet
102
+ - **Unbiased Distribution**: Ensures each character has equal probability
103
+ - **Cryptographically Secure**: Uses `SecureRandom` for all random generation
104
+ - **Collision Resistant**: High entropy per character makes collisions improbable for realistic numbers of IDs
105
+
106
+ ### Mathematical Foundation
107
+
108
+ The rejection sampling algorithm ensures uniform distribution:
109
+
110
+ 1. **Calculate Threshold**: `threshold = 256 - (256 % alphabet_size)`
111
+ 2. **Reject Biased Values**: Only accept bytes < threshold
112
+ 3. **Uniform Mapping**: Each character is selected by the same number of accepted byte values (`threshold / alphabet_size`)
113
+
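The effect of the threshold can be checked by enumerating all 256 byte values. The sketch below is illustrative only; `index_tally` is a hypothetical helper, not an OpaqueId method. It compares plain modulo reduction with the rejection step for a 62-character alphabet.

```ruby
# Tally how often each index is produced when every byte value 0..255 is
# reduced with `% alphabet_size`, optionally rejecting bytes >= threshold.
def index_tally(alphabet_size, reject: true)
  threshold = 256 - (256 % alphabet_size)
  bytes = (0..255).select { |byte| !reject || byte < threshold }
  bytes.map { |byte| byte % alphabet_size }.tally
end

biased   = index_tally(62, reject: false)
unbiased = index_tally(62, reject: true)

puts biased.values.minmax.inspect    # [4, 5]: indices 0..7 occur once more often
puts unbiased.values.minmax.inspect  # [4, 4]: every index occurs equally often
```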
114
+ ### Example: 62-character alphabet
115
+
116
+ ```ruby
117
+ # For 62-character alphabet
118
+ alphabet_size = 62
119
+ threshold = 256 - (256 % 62) # = 256 - 8 = 248
120
+
121
+ # Accept bytes 0-247 (248 values)
122
+ # Each byte maps to: byte % 62
123
+ # This gives uniform distribution across all 62 characters
124
+ ```
125
+
126
+ ### Performance Characteristics
127
+
128
+ - **Unbiased distribution**: Uses rejection sampling to ensure uniform character distribution
129
+ - **Slight overhead**: Some bytes are rejected to maintain uniformity
130
+ - **Expected linear time**: About n × 256 / threshold random bytes on average, where n is the ID length (fewer than 2n for any alphabet of up to 256 characters)
131
+
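The overhead can be estimated directly: a byte is accepted with probability `threshold / 256`, so the expected number of bytes drawn per character is `256 / threshold`. The following sketch computes this for a few alphabet sizes; `expected_bytes_per_char` is a hypothetical helper used only for illustration.

```ruby
# Expected number of random bytes drawn per output character.
def expected_bytes_per_char(alphabet_size)
  threshold = 256 - (256 % alphabet_size)
  256.0 / threshold
end

[10, 62, 129].each do |n|
  puts format("%3d characters: %.3f bytes per character", n, expected_bytes_per_char(n))
end
```

Sizes just above a power of two, such as 129, are the worst case because almost half of all bytes are rejected.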
132
+ ## Algorithm Selection
133
+
134
+ OpaqueId automatically selects the appropriate algorithm based on the alphabet size:
135
+
136
+ ```ruby
137
+ def generate(size: 21, alphabet: ALPHANUMERIC_ALPHABET)
138
+ alphabet_size = alphabet.size
139
+
140
+ # Handle edge case: single character alphabet
141
+ return alphabet * size if alphabet_size == 1
142
+
143
+ # Use fast path for 64-character alphabets
144
+ return generate_fast(size, alphabet) if alphabet_size == 64
145
+
146
+ # Use unbiased path for all other alphabets
147
+ generate_unbiased(size, alphabet, alphabet_size)
148
+ end
149
+ ```
150
+
151
+ ### Selection Criteria
152
+
153
+ | Alphabet Size | Algorithm | Reason |
154
+ | ------------- | ----------------- | --------------------------------- |
155
+ | 1 | Direct repetition | No randomness needed |
156
+ | 64 | Fast Path | Optimized bitwise operations |
157
+ | Other | Unbiased Path | Rejection sampling for uniformity |
158
+
159
+ ## Entropy Analysis
160
+
161
+ ### Entropy Calculation
162
+
163
+ The entropy of an opaque ID depends on the alphabet size and length:
164
+
165
+ ```
166
+ Entropy = length × log₂(alphabet_size)
167
+ ```
168
+
169
+ ### Examples
170
+
171
+ ```ruby
172
+ # 21-character ID with 62-character alphabet
173
+ entropy = 21 * Math.log2(62) # ≈ 125.0 bits
174
+
175
+ # 21-character ID with 64-character alphabet
176
+ entropy = 21 * Math.log2(64) # = 126.0 bits
177
+
178
+ # 15-character ID with 16-character alphabet
179
+ entropy = 15 * Math.log2(16) # = 60.0 bits
180
+ ```
181
+
182
+ ### Collision Probability
183
+
184
+ The probability of collision for N generated IDs:
185
+
186
+ ```
187
+ P(collision) ≈ 1 - e^(-N²/(2 × 2^entropy))
188
+ ```
189
+
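The approximation above can be evaluated with a few lines of Ruby. This is a minimal sketch; `collision_probability` is a hypothetical helper, not part of the OpaqueId API. For very small exponents it returns the first-order term directly, because `1 - e^(-x)` rounds to zero in floating point.

```ruby
# Birthday-bound approximation: P ≈ 1 - e^(-N² / (2 × 2^entropy)).
def collision_probability(count, entropy_bits)
  x = (count.to_f**2) / (2.0 * (2.0**entropy_bits))
  # For tiny x, 1 - e^(-x) rounds to 0.0 in floating point; x itself is
  # the first-order approximation and is accurate in that range.
  x < 1e-8 ? x : 1.0 - Math.exp(-x)
end

alphanumeric_bits = 21 * Math.log2(62)
puts collision_probability(1_000_000_000, alphanumeric_bits) # on the order of 1e-20
puts collision_probability(1_000_000_000, 60)                # roughly 0.35
```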
190
+ ### Practical Examples
191
+
192
+ ```ruby
193
+ # 21-character alphanumeric ID (125 bits entropy)
194
+ # 1 billion IDs: P(collision) ≈ 1.1 × 10⁻²⁰
195
+ # 1 trillion IDs: P(collision) ≈ 1.1 × 10⁻¹⁴
196
+
197
+ # 15-character hexadecimal ID (60 bits entropy)
198
+ # 1 million IDs: P(collision) ≈ 4.3 × 10⁻⁷
199
+ # 1 billion IDs: P(collision) ≈ 0.35
200
+ ```
201
+
202
+ ## Security Analysis
203
+
204
+ ### Cryptographic Security
205
+
206
+ OpaqueId uses Ruby's `SecureRandom` which provides:
207
+
208
+ - **Cryptographically Secure PRNG**: Uses OS entropy sources
209
+ - **Unpredictable Output**: Cannot be predicted from previous outputs
210
+ - **High Entropy**: Sufficient randomness for security applications
211
+
212
+ ### Attack Resistance
213
+
214
+ #### Brute Force Attacks
215
+
216
+ ```ruby
217
+ # 21-character alphanumeric ID
218
+ # Search space: 62²¹ ≈ 4.4 × 10³⁷
219
+ # Time to exhaust the search space: ~1.4 × 10²¹ years (at 1 billion attempts/second)
220
+ ```
221
+
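The brute-force estimate is plain arithmetic and can be reproduced as follows. The helper `years_to_exhaust` and the constant `SECONDS_PER_YEAR` are names chosen for this sketch, and the guessing rate is an assumed figure rather than a measured one.

```ruby
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60

# Years needed to try every possible ID at a fixed guessing rate.
def years_to_exhaust(alphabet_size, length, attempts_per_second)
  (alphabet_size**length) / attempts_per_second.to_f / SECONDS_PER_YEAR
end

puts format("%.1e years", years_to_exhaust(62, 21, 1_000_000_000))
```

On average an attacker finds a specific ID after searching half the space, so the expected time is half of this figure.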
222
+ #### Timing Attacks
223
+
224
+ The algorithms are designed to be timing-attack resistant:
225
+
226
+ - **Constant Time Operations**: Bitwise operations have predictable timing
227
+ - **Rejection Sampling**: The number of rejected bytes is independent of the characters that are accepted, so variable loop iterations don't leak information about the ID
228
+ - **SecureRandom**: OS-level entropy prevents timing-based prediction
229
+
230
+ #### Statistical Attacks
231
+
232
+ - **Uniform Distribution**: Rejection sampling ensures statistical uniformity
233
+ - **No Patterns**: Cryptographically secure randomness prevents pattern detection
234
+ - **High Entropy**: Sufficient entropy prevents statistical analysis
235
+
236
+ ## Performance Optimization
237
+
238
+ ### Memory Usage
239
+
240
+ ```ruby
241
+ # Memory-efficient generation
242
+ def generate_efficient(size, alphabet)
243
+ # Pre-allocate result string
244
+ result = String.new(capacity: size)
245
+
246
+ # Append characters in place; generate_character stands for a per-character helper that is not shown in this guide
247
+ size.times do
248
+ result << generate_character(alphabet)
249
+ end
250
+
251
+ result
252
+ end
253
+ ```
254
+
255
+ ### Batch Generation
256
+
257
+ ```ruby
258
+ # Optimized batch generation
259
+ def generate_batch(count, size, alphabet)
260
+ # Pre-allocate array
261
+ results = Array.new(count)
262
+
263
+ # Generate all IDs
264
+ count.times do |i|
265
+ results[i] = generate(size: size, alphabet: alphabet)
266
+ end
267
+
268
+ results
269
+ end
270
+ ```
271
+
272
+ ### Caching Strategies
273
+
274
+ ```ruby
275
+ # Compute the alphabet size once and reuse it
276
+ class OpaqueId
277
+ def self.generate(size: 21, alphabet: ALPHANUMERIC_ALPHABET)
278
+ alphabet_size = alphabet.size
279
+
280
+ # Select the algorithm from the alphabet size computed above
281
+ if alphabet_size == 64
282
+ generate_fast(size, alphabet)
283
+ else
284
+ generate_unbiased(size, alphabet, alphabet_size)
285
+ end
286
+ end
287
+ end
288
+ ```
289
+
290
+ ## Performance Characteristics
291
+
292
+ ### Algorithm Comparison
293
+
294
+ - **Fast Path (64-char)**: Maximum performance with bitwise operations
295
+ - **Unbiased Path (other)**: Slightly slower but ensures uniform distribution
296
+ - **Memory usage**: Scales linearly with ID length and batch size
297
+ - **Time complexity**: Linear scaling with ID length
298
+
299
+ ## Implementation Details
300
+
301
+ ### Error Handling
302
+
303
+ ```ruby
304
+ def generate(size: 21, alphabet: ALPHANUMERIC_ALPHABET)
305
+ # Validate parameters
306
+ raise ConfigurationError, "Size must be positive" unless size.positive?
307
+ raise ConfigurationError, "Alphabet cannot be empty" if alphabet.nil? || alphabet.empty?
308
+
309
+ # Handle edge cases
310
+ return alphabet * size if alphabet.size == 1
311
+
312
+ # Generate ID
313
+ if alphabet.size == 64
314
+ generate_fast(size, alphabet)
315
+ else
316
+ generate_unbiased(size, alphabet, alphabet.size)
317
+ end
318
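+ rescue ConfigurationError
+ raise # Propagate validation errors unchanged instead of wrapping them
+ rescue StandardError => e
319
+ raise GenerationError, "Failed to generate opaque ID: #{e.message}"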
320
+ end
321
+ ```
322
+
323
+ ### Thread Safety
324
+
325
+ ```ruby
326
+ # OpaqueId is thread-safe
327
+ # SecureRandom is thread-safe
328
+ # No shared state between generations
329
+
330
+ # Safe for concurrent use
331
+ threads = 10.times.map do
332
+ Thread.new do
333
+ 1000.times { OpaqueId.generate }
334
+ end
335
+ end
336
+
337
+ threads.each(&:join)
338
+ ```
339
+
340
+ ## Comparison with Other Algorithms
341
+
342
+ ### vs. UUID
343
+
344
+ | Aspect | OpaqueId | UUID |
345
+ | ----------- | -------------- | ---------------- |
346
+ | Length | Configurable | Fixed (36 chars) |
347
+ | Entropy | Configurable | Fixed (122 bits) |
348
+ | Readability | Human-friendly | Not readable |
349
+ | Performance | Optimized | Standard |
350
+
351
+ ### vs. NanoID
352
+
353
+ | Aspect | OpaqueId | NanoID |
354
+ | ------------- | ----------- | --------------- |
355
+ | Language | Native Ruby | JavaScript port |
356
+ | Performance | Optimized | Good |
357
+ | Dependencies | None | External gem |
358
+ | Customization | Extensive | Limited |
359
+
360
+ ### vs. SecureRandom.hex
361
+
362
+ | Aspect | OpaqueId | SecureRandom.hex |
363
+ | ----------- | ------------ | ---------------- |
364
+ | Alphabet | Configurable | Fixed (hex) |
365
+ | Bias | None | None |
366
+ | Performance | Optimized | Good |
367
+ | Features | Rich | Basic |
368
+
369
+ ## Best Practices
370
+
371
+ ### 1. Choose Appropriate Algorithm
372
+
373
+ ```ruby
374
+ # Use 64-character alphabets for maximum performance
375
+ self.opaque_id_alphabet = OpaqueId::STANDARD_ALPHABET
376
+
377
+ # Use other alphabets for specific requirements
378
+ self.opaque_id_alphabet = "0123456789" # Numeric only
379
+ ```
380
+
381
+ ### 2. Optimize for Your Use Case
382
+
383
+ ```ruby
384
+ # High-performance applications
385
+ self.opaque_id_alphabet = OpaqueId::STANDARD_ALPHABET
386
+ self.opaque_id_length = 21
387
+
388
+ # Human-readable applications
389
+ self.opaque_id_alphabet = OpaqueId::ALPHANUMERIC_ALPHABET
390
+ self.opaque_id_length = 15
391
+ ```
392
+
393
+ ### 3. Consider Entropy Requirements
394
+
395
+ ```ruby
396
+ # High security (125+ bits entropy)
397
+ self.opaque_id_length = 21
398
+ self.opaque_id_alphabet = OpaqueId::ALPHANUMERIC_ALPHABET
399
+
400
+ # Medium security (60+ bits entropy)
401
+ self.opaque_id_length = 15
402
+ self.opaque_id_alphabet = "0123456789abcdef"
403
+ ```
404
+
405
+ ## Next Steps
406
+
407
+ Now that you understand the algorithms:
408
+
409
+ 1. **Explore [Performance](performance.md)** for detailed benchmarks and optimization tips
410
+ 2. **Check out [Security](security.md)** for security considerations and best practices
411
+ 3. **Review [Configuration](configuration.md)** for algorithm selection guidance
412
+ 4. **Read [API Reference](api-reference.md)** for complete algorithm documentation