opaque_id 1.2.0 → 1.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,409 @@
1
+ ---
2
+ layout: default
3
+ title: Algorithms
4
+ nav_order: 7
5
+ description: "Technical explanation of OpaqueId's generation algorithms and optimization strategies"
6
+ permalink: /algorithms/
7
+ ---
8
+
9
+ # Algorithms
10
+
11
+ OpaqueId generates cryptographically secure, collision-resistant opaque IDs. This guide explains the generation process, the optimization strategies, and the mathematics behind them.
12
+
13
+ ## Overview
14
+
15
+ OpaqueId implements two primary algorithms optimized for different scenarios:
16
+
17
+ 1. **Fast Path Algorithm** - For 64-character alphabets (optimized performance)
18
+ 2. **Unbiased Path Algorithm** - For all other alphabets (rejection sampling)
19
+
20
+ ## Fast Path Algorithm (64-character alphabets)
21
+
22
+ When the alphabet has exactly 64 characters, OpaqueId maps each random byte to an index with a 6-bit mask. Because 64 divides 256 evenly, this introduces no modulo bias and needs no rejection loop.
23
+
24
+ ### How It Works
25
+
26
+ ```ruby
27
+ def generate_fast(size, alphabet)
28
+   result = ""
29
+   size.times do
30
+     # Generate random byte
31
+     byte = SecureRandom.random_bytes(1).unpack1("C")
32
+
33
+     # Use bitwise AND for fast modulo 64
34
+     index = byte & 63 # Equivalent to byte % 64
35
+
36
+     # Append character from alphabet
37
+     result += alphabet[index]
38
+   end
39
+   result
40
+ end
41
+ ```
42
+
43
+ ### Key Features
44
+
45
+ - **Bitwise Operations**: Uses `byte & 63` instead of `byte % 64` for faster computation
46
+ - **No Modulo Bias**: 64 is a power of 2, so bitwise AND provides uniform distribution
47
+ - **Single Random Call**: One `SecureRandom.random_bytes(1)` call per character
48
+ - **Maximum Performance**: No byte is ever rejected, so each character costs exactly one random byte
49
+
50
+ ### Mathematical Foundation
51
+
52
+ For a 64-character alphabet:
53
+
54
+ - Each of the 64 characters corresponds to exactly 4 of the 256 possible byte values (256 / 64 = 4)
55
+ - `byte & 63` extracts the lower 6 bits (0-63)
56
+ - This provides uniform distribution without bias
57
+
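The mapping can be checked exhaustively, since there are only 256 possible byte values. The following is an illustrative sketch written for this guide rather than code taken from OpaqueId; the names `counts` and `uniform` are introduced here.

```ruby
# Tally how many of the 256 possible byte values land on each index
# when masked with 63 (the lower 6 bits).
counts = (0..255).map { |byte| byte & 63 }.tally

# Every one of the 64 indices should be hit by exactly 256 / 64 = 4 bytes.
uniform = counts.size == 64 && counts.values.all? { |hits| hits == 4 }
puts "indices: #{counts.size}, uniform: #{uniform}"
```

Because every index receives the same number of byte values, a uniformly random byte yields a uniformly random character.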
58
+ ### Performance Characteristics
59
+
60
+ - **Optimized for speed**: Uses bitwise operations for maximum performance
61
+ - **No rejection sampling**: Every generated byte produces one character (only its upper 2 bits are discarded)
62
+ - **Linear time complexity**: O(n) where n is the ID length
63
+
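Because no byte is ever rejected on this path, all the bytes for one ID could in principle be requested in a single call. The sketch below is a hypothetical variant showing that idea; `generate_fast_batched` and `URL_SAFE_64` are names made up for this example and are not part of OpaqueId's API.

```ruby
require "securerandom"

# Hypothetical 64-character alphabet used only for this illustration.
URL_SAFE_64 = [*"A".."Z", *"a".."z", *"0".."9", "-", "_"].join.freeze

# Request all random bytes up front, then mask each one down to 6 bits.
def generate_fast_batched(size, alphabet)
  SecureRandom.random_bytes(size).each_byte.map { |byte| alphabet[byte & 63] }.join
end

id = generate_fast_batched(21, URL_SAFE_64)
puts id.length
```

Whether this is measurably faster than one call per character depends on the Ruby version and platform, so it is worth benchmarking before relying on it.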
64
+ ## Unbiased Path Algorithm (Other alphabets)
65
+
66
+ For alphabets that aren't 64 characters, OpaqueId uses rejection sampling to ensure unbiased distribution.
67
+
68
+ ### How It Works
69
+
70
+ ```ruby
71
+ def generate_unbiased(size, alphabet, alphabet_size)
72
+   result = ""
73
+   size.times do
74
+     loop do
75
+       # Generate random byte
76
+       byte = SecureRandom.random_bytes(1).unpack1("C")
77
+
78
+       # Calculate threshold to avoid modulo bias
79
+       threshold = 256 - (256 % alphabet_size)
80
+
81
+       # Reject if byte is too large (avoids bias)
82
+       next if byte >= threshold
83
+
84
+       # Use modulo to get index
85
+       index = byte % alphabet_size
86
+
87
+       # Append character and break loop
88
+       result += alphabet[index]
89
+       break
90
+     end
91
+   end
92
+   result
93
+ end
94
+ ```
95
+
96
+ ### Key Features
97
+
98
+ - **Rejection Sampling**: Discards byte values at or above the threshold, which would otherwise skew the distribution
99
+ - **Unbiased Distribution**: Ensures each character has equal probability
100
+ - **Cryptographically Secure**: Uses `SecureRandom` for all random generation
101
+ - **Collision Resistant**: High entropy makes collisions statistically unlikely
102
+
103
+ ### Mathematical Foundation
104
+
105
+ The rejection sampling algorithm ensures uniform distribution:
106
+
107
+ 1. **Calculate Threshold**: `threshold = 256 - (256 % alphabet_size)`
108
+ 2. **Reject Biased Values**: Only accept bytes < threshold
109
+ 3. **Uniform Mapping**: Each character is reached by the same number of accepted byte values (`threshold / alphabet_size`)
110
+
111
+ ### Example: 62-character alphabet
112
+
113
+ ```ruby
114
+ # For 62-character alphabet
115
+ alphabet_size = 62
116
+ threshold = 256 - (256 % 62) # = 256 - 8 = 248
117
+
118
+ # Accept bytes 0-247 (248 values)
119
+ # Each byte maps to: byte % 62
120
+ # This gives uniform distribution across all 62 characters
121
+ ```
122
+
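To see what the threshold buys, the sketch below compares a plain `byte % 62` against the thresholded version over all 256 byte values. It is an exhaustive check written for this guide, not library code; `naive` and `filtered` are illustrative names.

```ruby
alphabet_size = 62
threshold = 256 - (256 % alphabet_size) # 248

# Without rejection: bytes 248-255 wrap around onto indices 0-7.
naive = (0..255).map { |byte| byte % alphabet_size }.tally
# With rejection: bytes 248-255 are discarded before taking the modulo.
filtered = (0...threshold).map { |byte| byte % alphabet_size }.tally

puts "naive hits per index:    #{naive.values.minmax.inspect}"
puts "filtered hits per index: #{filtered.values.minmax.inspect}"
```

Without the threshold, the first eight characters of the alphabet would each be backed by 5 byte values instead of 4, making them about 25% more likely than the rest.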
123
+ ### Performance Characteristics
124
+
125
+ - **Unbiased distribution**: Uses rejection sampling to ensure uniform character distribution
126
+ - **Slight overhead**: Some bytes are rejected to maintain uniformity
127
+ - **Linear expected time**: About n × 256 / threshold random bytes on average, where n is the ID length
128
+
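The cost of rejection can be estimated from the threshold alone: a byte is accepted with probability `threshold / 256`, so the expected number of bytes drawn per character is `256 / threshold`. A small sketch follows; the helper name `expected_draws` is an assumption made for this example.

```ruby
# Expected number of random bytes needed per output character when
# sampling single bytes with the threshold rule shown above.
def expected_draws(alphabet_size)
  threshold = 256 - (256 % alphabet_size)
  256.0 / threshold
end

[10, 16, 36, 62, 129].each do |n|
  puts format("alphabet %3d: %.3f bytes per character", n, expected_draws(n))
end
```

The worst case is an alphabet just over 128 characters, where almost half of all bytes are rejected. Note that the single-byte scheme as written assumes an alphabet of at most 256 characters: for larger sizes the threshold becomes 0 and every byte would be rejected.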
129
+ ## Algorithm Selection
130
+
131
+ OpaqueId automatically selects the appropriate algorithm based on the alphabet size:
132
+
133
+ ```ruby
134
+ def generate(size: 21, alphabet: ALPHANUMERIC_ALPHABET)
135
+   alphabet_size = alphabet.size
136
+
137
+   # Handle edge case: single character alphabet
138
+   return alphabet * size if alphabet_size == 1
139
+
140
+   # Use fast path for 64-character alphabets
141
+   return generate_fast(size, alphabet) if alphabet_size == 64
142
+
143
+   # Use unbiased path for all other alphabets
144
+   generate_unbiased(size, alphabet, alphabet_size)
145
+ end
146
+ ```
147
+
148
+ ### Selection Criteria
149
+
150
+ | Alphabet Size | Algorithm | Reason |
151
+ | ------------- | ----------------- | --------------------------------- |
152
+ | 1 | Direct repetition | No randomness needed |
153
+ | 64 | Fast Path | Optimized bitwise operations |
154
+ | Other | Unbiased Path | Rejection sampling for uniformity |
155
+
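The fast path works because 64 is a power of two that divides 256. The check below illustrates that property; `power_of_two?` is a helper invented for this example, and per the table above OpaqueId itself only special-cases 64.

```ruby
# A positive integer is a power of two when exactly one bit is set.
def power_of_two?(n)
  n.positive? && (n & (n - 1)).zero?
end

# For power-of-two sizes up to 256, the rejection threshold is 256,
# so the unbiased path would never discard a byte either.
[16, 32, 62, 64].each do |size|
  threshold = 256 - (256 % size)
  puts "size #{size}: power_of_two=#{power_of_two?(size)}, threshold=#{threshold}"
end
```

For such sizes, masking with `size - 1` gives the same index as `byte % size`, which is the identity the 64-character path exploits.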
156
+ ## Entropy Analysis
157
+
158
+ ### Entropy Calculation
159
+
160
+ The entropy of an opaque ID depends on the alphabet size and length:
161
+
162
+ ```
163
+ Entropy = length × log₂(alphabet_size)
164
+ ```
165
+
166
+ ### Examples
167
+
168
+ ```ruby
169
+ # 21-character ID with 62-character alphabet
170
+ entropy = 21 * Math.log2(62) # ≈ 21 × 5.954 ≈ 125.0 bits
170
+
171
+ # 21-character ID with 64-character alphabet
172
+ entropy = 21 * Math.log2(64) # = 21 × 6.000 = 126.0 bits
173
+
174
+ # 15-character ID with 16-character alphabet
175
+ entropy = 15 * Math.log2(16) # = 15 × 4.000 = 60.0 bits
177
+ ```
178
+
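The formula can also be turned around to find the shortest ID that meets an entropy target. The helpers below are a minimal sketch; `entropy_bits` and `min_length_for` are names chosen for this example, not OpaqueId methods.

```ruby
# Bits of entropy for an ID of the given length over the given alphabet.
def entropy_bits(length, alphabet_size)
  length * Math.log2(alphabet_size)
end

# Smallest length whose entropy reaches at least target_bits.
def min_length_for(target_bits, alphabet_size)
  (target_bits / Math.log2(alphabet_size)).ceil
end

puts entropy_bits(21, 62).round(1)
puts min_length_for(128, 62)
```

For example, reaching 128 bits with a 62-character alphabet takes 22 characters, one more than the default length of 21.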
179
+ ### Collision Probability
180
+
181
+ The probability of at least one collision among N generated IDs follows the birthday-bound approximation:
182
+
183
+ ```
184
+ P(collision) ≈ 1 - e^(-N²/(2 × 2^entropy))
185
+ ```
186
+
187
+ ### Practical Examples
188
+
189
+ ```ruby
190
+ # 21-character alphanumeric ID (125 bits entropy)
191
+ # 1 billion IDs: P(collision) ≈ 1.1 × 10⁻²⁰
191
+ # 1 trillion IDs: P(collision) ≈ 1.1 × 10⁻¹⁴
192
+
193
+ # 15-character hexadecimal ID (60 bits entropy)
194
+ # 1 million IDs: P(collision) ≈ 4.3 × 10⁻⁷
195
+ # 1 billion IDs: P(collision) ≈ 0.35
197
+ ```
198
+
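The approximation is easy to evaluate directly. The sketch below is one way to do it; `collision_probability` is a helper written for this guide, and the small-value branch is a numerical precaution rather than part of the formula.

```ruby
# Birthday-bound approximation: P ≈ 1 - e^(-n² / (2 · 2^bits)).
def collision_probability(n, bits)
  x = (n.to_f**2) / (2.0 * (2.0**bits))
  # For tiny x, 1 - e^(-x) rounds to 0.0 in floating point, so fall back
  # to the first-order approximation 1 - e^(-x) ≈ x.
  x < 1e-8 ? x : 1.0 - Math.exp(-x)
end

alphanumeric_bits = 21 * Math.log2(62)
puts collision_probability(1e9, alphanumeric_bits) # on the order of 1e-20
puts collision_probability(1e9, 60)                # roughly 0.35
```

The second result shows why 60 bits is too little for very large datasets: at a billion IDs, a collision is more likely than one in three.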
199
+ ## Security Analysis
200
+
201
+ ### Cryptographic Security
202
+
203
+ OpaqueId uses Ruby's `SecureRandom` which provides:
204
+
205
+ - **Cryptographically Secure PRNG**: Uses OS entropy sources
206
+ - **Unpredictable Output**: Cannot be predicted from previous outputs
207
+ - **High Entropy**: Sufficient randomness for security applications
208
+
209
+ ### Attack Resistance
210
+
211
+ #### Brute Force Attacks
212
+
213
+ ```ruby
214
+ # 21-character alphanumeric ID
215
+ # Search space: 62²¹ ≈ 4.4 × 10³⁷
215
+ # Time to exhaust the space: ~1.4 × 10²¹ years (at 1 billion attempts/second)
217
+ ```
218
+
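The brute-force estimate is plain arithmetic and can be reproduced as follows. This is a back-of-the-envelope sketch; the attempt rate is the one assumed above and the variable names are illustrative.

```ruby
search_space = 62**21                 # exact integer
attempts_per_second = 1_000_000_000
seconds_per_year = 365.25 * 24 * 3600

years_to_exhaust = search_space.to_f / attempts_per_second / seconds_per_year
puts format("search space: %.2e", search_space.to_f)
puts format("years to exhaust: %.2e", years_to_exhaust)
```

On average an attacker would hit a specific ID after searching half the space, which halves the figure without changing the conclusion.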
219
+ #### Timing Attacks
220
+
221
+ The algorithms are designed to be timing-attack resistant:
222
+
223
+ - **Constant Time Operations**: Bitwise operations have predictable timing
224
+ - **Rejection Sampling**: The number of loop iterations depends only on discarded bytes, which are independent of the characters that end up in the ID
225
+ - **SecureRandom**: OS-level entropy prevents timing-based prediction
226
+
227
+ #### Statistical Attacks
228
+
229
+ - **Uniform Distribution**: Rejection sampling ensures statistical uniformity
230
+ - **No Patterns**: Cryptographically secure randomness prevents pattern detection
231
+ - **High Entropy**: Sufficient entropy prevents statistical analysis
232
+
233
+ ## Performance Optimization
234
+
235
+ ### Memory Usage
236
+
237
+ ```ruby
238
+ # Memory-efficient generation
239
+ def generate_efficient(size, alphabet)
240
+   # Pre-allocate result string
241
+   result = String.new(capacity: size)
242
+
243
+   # Append each character in place (generate_character is a helper not shown in this guide)
244
+   size.times do
245
+     result << generate_character(alphabet)
246
+   end
247
+
248
+   result
249
+ end
250
+ ```
251
+
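The snippet above calls `generate_character`, which this guide does not define. A self-contained version might look like the following, where `generate_character` is a hypothetical helper that applies the same rejection rule as the unbiased path and assumes an alphabet of 1 to 256 characters.

```ruby
require "securerandom"

# Hypothetical helper: draw one unbiased character via rejection sampling.
# Assumes 1 <= alphabet.size <= 256.
def generate_character(alphabet)
  threshold = 256 - (256 % alphabet.size)
  loop do
    byte = SecureRandom.random_bytes(1).unpack1("C")
    return alphabet[byte % alphabet.size] if byte < threshold
  end
end

def generate_efficient(size, alphabet)
  # Reserve capacity once, then append in place with <<.
  result = String.new(capacity: size)
  size.times { result << generate_character(alphabet) }
  result
end

puts generate_efficient(12, "0123456789")
```

Appending with `<<` mutates the one pre-sized string, whereas `+=` would allocate a new string on every iteration.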
252
+ ### Batch Generation
253
+
254
+ ```ruby
255
+ # Optimized batch generation
256
+ def generate_batch(count, size, alphabet)
257
+   # Pre-allocate array
258
+   results = Array.new(count)
259
+
260
+   # Generate all IDs
261
+   count.times do |i|
262
+     results[i] = generate(size: size, alphabet: alphabet)
263
+   end
264
+
265
+   results
266
+ end
267
+ ```
268
+
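In idiomatic Ruby the allocate-then-assign loop can be collapsed into `Array.new` with a block. The sketch below shows the pattern with a stand-in generator: `sample_id` is a placeholder built on `SecureRandom.alphanumeric` so the example runs without the gem, where real code would call `OpaqueId.generate(size: size)`.

```ruby
require "securerandom"

# Stand-in generator for this example only.
def sample_id(size)
  SecureRandom.alphanumeric(size)
end

# Array.new with a block fills each slot as it is created.
def generate_batch(count, size)
  Array.new(count) { sample_id(size) }
end

ids = generate_batch(1_000, 21)
puts "generated: #{ids.size}, unique: #{ids.uniq.size}"
```

Comparing `ids.size` with `ids.uniq.size` is a cheap sanity check when a batch is about to be inserted under a unique index.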
269
+ ### Caching Strategies
270
+
271
+ ```ruby
272
+ # Compute the alphabet size once and reuse it for algorithm selection
273
+ class OpaqueId
274
+   def self.generate(size: 21, alphabet: ALPHANUMERIC_ALPHABET)
275
+     alphabet_size = alphabet.size
276
+
277
+     # Select the algorithm from the stored size
278
+     if alphabet_size == 64
279
+       generate_fast(size, alphabet)
280
+     else
281
+       generate_unbiased(size, alphabet, alphabet_size)
282
+     end
283
+   end
284
+ end
285
+ ```
286
+
287
+ ## Performance Characteristics
288
+
289
+ ### Algorithm Comparison
290
+
291
+ - **Fast Path (64-char)**: Maximum performance with bitwise operations
292
+ - **Unbiased Path (other)**: Slightly slower but ensures uniform distribution
293
+ - **Memory usage**: Scales linearly with ID length and batch size
294
+ - **Time complexity**: Linear scaling with ID length
295
+
296
+ ## Implementation Details
297
+
298
+ ### Error Handling
299
+
300
+ ```ruby
301
+ def generate(size: 21, alphabet: ALPHANUMERIC_ALPHABET)
302
+   # Validate parameters
303
+   raise ConfigurationError, "Size must be positive" unless size.positive?
304
+   raise ConfigurationError, "Alphabet cannot be empty" if alphabet.nil? || alphabet.empty?
305
+
306
+   return alphabet * size if alphabet.size == 1 # Edge case: nothing to randomize
307
+
308
+   if alphabet.size == 64
309
+     generate_fast(size, alphabet)
310
+   else
311
+     generate_unbiased(size, alphabet, alphabet.size)
312
+   end
313
+ rescue ConfigurationError
314
+   raise # Let validation errors propagate unchanged
315
+ rescue StandardError => e
316
+   raise GenerationError, "Failed to generate opaque ID: #{e.message}"
317
+ end
318
+ ```
319
+
320
+ ### Thread Safety
321
+
322
+ ```ruby
323
+ # OpaqueId is thread-safe
324
+ # SecureRandom is thread-safe
325
+ # No shared state between generations
326
+
327
+ # Safe for concurrent use
328
+ threads = 10.times.map do
329
+   Thread.new do
330
+     1000.times { OpaqueId.generate }
331
+   end
332
+ end
333
+
334
+ threads.each(&:join)
335
+ ```
336
+
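When the generated IDs need to be gathered from several threads, the collection itself must also be thread-safe. The sketch below uses Ruby's built-in `Queue` for that; `SecureRandom.alphanumeric` stands in for `OpaqueId.generate` so the example runs without the gem, and the variable names are illustrative.

```ruby
require "securerandom"

# Thread-safe collection point; Queue handles its own locking.
collected = Queue.new

threads = 10.times.map do
  Thread.new do
    # Stand-in for OpaqueId.generate in this example.
    100.times { collected << SecureRandom.alphanumeric(21) }
  end
end
threads.each(&:join)

# Drain the queue into a plain array once all threads have finished.
ids = Array.new(collected.size) { collected.pop }
puts "collected: #{ids.size}, unique: #{ids.uniq.size}"
```

A `Queue` avoids having to guard a shared array with a `Mutex`, which keeps the worker threads free of explicit locking.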
337
+ ## Comparison with Other Algorithms
338
+
339
+ ### vs. UUID
340
+
341
+ | Aspect | OpaqueId | UUID |
342
+ | ----------- | -------------- | ---------------- |
343
+ | Length | Configurable | Fixed (36 chars) |
344
+ | Entropy | Configurable | Fixed (122 bits) |
345
+ | Readability | Human-friendly | Not readable |
346
+ | Performance | Optimized | Standard |
347
+
348
+ ### vs. NanoID
349
+
350
+ | Aspect | OpaqueId | NanoID |
351
+ | ------------- | ----------- | --------------- |
352
+ | Language | Native Ruby | JavaScript port |
353
+ | Performance | Optimized | Good |
354
+ | Dependencies | None | External gem |
355
+ | Customization | Extensive | Limited |
356
+
357
+ ### vs. SecureRandom.hex
358
+
359
+ | Aspect | OpaqueId | SecureRandom.hex |
360
+ | ----------- | ------------ | ---------------- |
361
+ | Alphabet | Configurable | Fixed (hex) |
362
+ | Bias | None | None |
363
+ | Performance | Optimized | Good |
364
+ | Features | Rich | Basic |
365
+
366
+ ## Best Practices
367
+
368
+ ### 1. Choose Appropriate Algorithm
369
+
370
+ ```ruby
371
+ # Use 64-character alphabets for maximum performance
372
+ self.opaque_id_alphabet = OpaqueId::STANDARD_ALPHABET
373
+
374
+ # Use other alphabets for specific requirements
375
+ self.opaque_id_alphabet = "0123456789" # Numeric only
376
+ ```
377
+
378
+ ### 2. Optimize for Your Use Case
379
+
380
+ ```ruby
381
+ # High-performance applications
382
+ self.opaque_id_alphabet = OpaqueId::STANDARD_ALPHABET
383
+ self.opaque_id_length = 21
384
+
385
+ # Human-readable applications
386
+ self.opaque_id_alphabet = OpaqueId::ALPHANUMERIC_ALPHABET
387
+ self.opaque_id_length = 15
388
+ ```
389
+
390
+ ### 3. Consider Entropy Requirements
391
+
392
+ ```ruby
393
+ # High security (125+ bits entropy)
394
+ self.opaque_id_length = 21
395
+ self.opaque_id_alphabet = OpaqueId::ALPHANUMERIC_ALPHABET
396
+
397
+ # Medium security (60+ bits entropy)
398
+ self.opaque_id_length = 15
399
+ self.opaque_id_alphabet = "0123456789abcdef"
400
+ ```
401
+
402
+ ## Next Steps
403
+
404
+ Now that you understand the algorithms:
405
+
406
+ 1. **Explore [Performance](performance.md)** for detailed benchmarks and optimization tips
407
+ 2. **Check out [Security](security.md)** for security considerations and best practices
408
+ 3. **Review [Configuration](configuration.md)** for algorithm selection guidance
409
+ 4. **Read [API Reference](api-reference.md)** for complete algorithm documentation