domain_extractor 0.2.7 → 0.2.9
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/CHANGELOG.md +268 -0
- data/LICENSE +28 -0
- data/README.md +395 -5
- data/lib/domain_extractor/auth.rb +82 -0
- data/lib/domain_extractor/parsed_url.rb +236 -5
- data/lib/domain_extractor/parser.rb +91 -14
- data/lib/domain_extractor/result.rb +40 -9
- data/lib/domain_extractor/uri_helpers.rb +168 -0
- data/lib/domain_extractor/validators.rb +15 -0
- data/lib/domain_extractor/version.rb +1 -1
- data/lib/domain_extractor.rb +30 -0
- data/spec/auth_and_uri_spec.rb +454 -0
- data/spec/domain_extractor_spec.rb +2 -2
- data/spec/domain_validator_spec.rb +1 -1
- data/spec/formatter_spec.rb +2 -2
- metadata +32 -12
- data/LICENSE.txt +0 -21
data/README.md
CHANGED
@@ -1,8 +1,9 @@
+
+
 # DomainExtractor
 
 [](https://badge.fury.io/rb/domain_extractor)
 [](https://github.com/opensite-ai/domain_extractor/actions/workflows/ci.yml)
-[](https://codeclimate.com/github/opensite-ai/domain_extractor)
 
 A lightweight, robust Ruby library for url parsing and domain parsing with **accurate multi-part TLD support**. DomainExtractor delivers a high-throughput url parser and domain parser that excels at domain extraction tasks while staying friendly to analytics pipelines. Perfect for web scraping, analytics, url manipulation, query parameter parsing, and multi-environment domain analysis.
 
@@ -10,16 +11,20 @@ Use **DomainExtractor** whenever you need a dependable tld parser for tricky mul
 
 ## Why DomainExtractor?
 
+✅ **URI-Compatible Accessors** - Covers common absolute-URL workflows with a Ruby `URI`-style API
+✅ **Authentication Extraction** - Parse credentials from Redis, database, FTP, and API URLs
 ✅ **Accurate Multi-part TLD Parser** - Handles complex multi-part TLDs (co.uk, com.au, gov.br) using the [Public Suffix List](https://publicsuffix.org/)
 ✅ **Nested Subdomain Extraction** - Correctly parses multi-level subdomains (api.staging.example.com)
 ✅ **Smart URL Normalization** - Automatically handles URLs with or without schemes
 ✅ **Powerful URL Formatting** - Transform and standardize URLs with flexible options
 ✅ **Rails Integration** - Custom ActiveModel validator for declarative URL validation
 ✅ **Query Parameter Parsing** - Parse query strings into structured hashes
+✅ **Authentication Helpers** - Generate Basic Auth and Bearer token headers
 ✅ **Batch Processing** - Parse multiple URLs efficiently
 ✅ **IP Address Detection** - Identifies and handles IPv4 and IPv6 addresses
+✅ **Benchmark-Backed Performance** - Auth helpers run in low microseconds; full parses are documented in the included benchmark suite
 ✅ **Zero Configuration** - Works out of the box with sensible defaults
-✅ **Well-Tested** -
+✅ **Well-Tested** - 200+ comprehensive test cases covering all scenarios
 
 ## Installation
 
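The new feature bullets advertise `URI`-style accessors and credential extraction for non-HTTP URLs. For comparison, here is a minimal sketch of what Ruby's standard `URI` library already returns for the Redis URL used in the README's Basic Usage example; it uses only the standard library, not the gem. One detail worth knowing: `rediss://` is not a registered scheme, so `URI` parses it as `URI::Generic`, which has no default port, and `port` stays `nil` unless the URL spells it out. That is the situation the README's Redis helper guards against with `config.port || 6379`.

```ruby
require 'uri'

# Standard-library baseline for a credentialed Redis URL. The accessor names
# (scheme, user, password, host, port, path) are the ones the README's
# URI-style API mirrors.
uri = URI.parse('rediss://default:my_secret_pw@redis.cloud:6385/0')

components = {
  scheme: uri.scheme,     # "rediss"
  user: uri.user,         # "default"
  password: uri.password, # "my_secret_pw"
  host: uri.host,         # "redis.cloud"
  port: uri.port,         # 6385 (Integer)
  path: uri.path          # "/0" (the Redis database index, carried as a path)
}
p components

# rediss:// has no registered URI subclass, so it parses as URI::Generic,
# which defines no default port: #port stays nil unless the URL sets one.
no_port = URI.parse('rediss://:secret@redis.cloud')
p no_port.class # => URI::Generic
p no_port.port  # => nil
```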
@@ -370,6 +375,235 @@ DomainExtractor.format(url_string, **options)
 # :use_trailing_slash (true/false)
 ```
 
+## Authentication Extraction
+
+DomainExtractor provides comprehensive authentication extraction from URLs, supporting all major database systems, caching solutions, and file transfer protocols.
+
+### Supported URL Schemes
+
+**Database Connections:**
+- PostgreSQL: `postgresql://user:pass@host:5432/dbname`
+- MySQL: `mysql://user:pass@host:3306/database`
+- MongoDB: `mongodb+srv://user:pass@cluster.mongodb.net/db`
+- CockroachDB: `postgresql://user:pass@host:26257/db`
+
+**Caching & Message Queues:**
+- Redis: `redis://user:pass@host:6379/0`
+- Redis SSL: `rediss://:password@host:6380`
+
+**File Transfer:**
+- FTP: `ftp://user:pass@host/path`
+- SFTP: `sftp://user:pass@host:22/path`
+- FTPS: `ftps://user:pass@host:990/path`
+
+**HTTP/HTTPS:**
+- Basic Auth: `https://user:pass@api.example.com`
+
+### Basic Usage
+
+```ruby
+# Parse Redis URL
+redis_url = 'rediss://default:my_secret_pw@redis.cloud:6385/0'
+result = DomainExtractor.parse(redis_url)
+
+result.scheme # => "rediss"
+result.user # => "default"
+result.password # => "my_secret_pw"
+result.host # => "redis.cloud"
+result.port # => 6385
+result.path # => "/0"
+
+# Parse PostgreSQL URL
+db_url = 'postgresql://appuser:SecurePass@db.prod.internal:5432/production'
+result = DomainExtractor.parse(db_url)
+
+result.user # => "appuser"
+result.password # => "SecurePass"
+result.host # => "db.prod.internal"
+result.port # => 5432
+result.path # => "/production"
+```
+
+### Special Character Handling
+
+DomainExtractor automatically handles percent-encoded special characters in credentials:
+
+```ruby
+# Password with special characters: P@ss:word!
+url = 'redis://user:P%40ss%3Aword%21@localhost:6379'
+result = DomainExtractor.parse(url)
+
+result.password # => "P%40ss%3Aword%21" (encoded)
+result.decoded_password # => "P@ss:word!" (decoded, ready to use)
+
+# Username as email address
+url = 'https://user%40domain.com:password@api.example.com'
+result = DomainExtractor.parse(url)
+
+result.user # => "user%40domain.com"
+result.decoded_user # => "user@domain.com"
+```
+
+### Authentication Helper Methods
+
+**Generate Basic Authentication Headers:**
+
+```ruby
+# From parsed URL
+result = DomainExtractor.parse('https://user:pass@api.example.com')
+auth_header = result.basic_auth_header
+# => "Basic dXNlcjpwYXNz"
+
+# Use in HTTP request
+require 'net/http'
+uri = URI('https://api.example.com/endpoint')
+request = Net::HTTP::Get.new(uri)
+request['Authorization'] = auth_header
+
+# Or use module method directly
+header = DomainExtractor.basic_auth_header('username', 'password')
+# => "Basic dXNlcm5hbWU6cGFzc3dvcmQ="
+```
+
+**Generate Bearer Token Headers:**
+
+```ruby
+token = 'eyJhbGciOiJIUzI1NiIs...'
+header = DomainExtractor.bearer_auth_header(token)
+# => "Bearer eyJhbGciOiJIUzI1NiIs..."
+
+# Use in API request
+request['Authorization'] = header
+```
+
+**Encode/Decode Credentials:**
+
+```ruby
+# Encode credentials for URL use
+password = 'P@ss:word!'
+encoded = DomainExtractor.encode_credential(password)
+# => "P%40ss%3Aword%21"
+
+# Build URL with encoded credentials
+url = "redis://user:#{encoded}@localhost:6379"
+
+# Decode credentials
+decoded = DomainExtractor.decode_credential(encoded)
+# => "P@ss:word!"
+```
+
+### Real-World Examples
+
+**Database Connection Configuration:**
+
+```ruby
+class DatabaseConfig
+  def self.from_url(url)
+    config = DomainExtractor.parse(url)
+
+    {
+      adapter: config.scheme.sub('postgresql', 'postgres'),
+      host: config.host,
+      port: config.port,
+      database: config.path&.sub('/', ''),
+      username: config.decoded_user,
+      password: config.decoded_password
+    }
+  end
+end
+
+# Usage
+db_url = ENV['DATABASE_URL']
+config = DatabaseConfig.from_url(db_url)
+# => { adapter: "postgres", host: "db.prod.internal", port: 5432, ... }
+```
+
+**Redis Connection Helper:**
+
+```ruby
+class RedisConnection
+  def self.from_url(url)
+    config = DomainExtractor.parse(url)
+
+    Redis.new(
+      host: config.host,
+      port: config.port || 6379,
+      password: config.decoded_password,
+      db: config.path&.sub('/', '')&.to_i || 0,
+      ssl: config.scheme == 'rediss'
+    )
+  end
+end
+
+# Usage
+redis = RedisConnection.from_url(ENV['REDIS_URL'])
+```
+
+**SFTP Deployment Script:**
+
+```ruby
+def deploy_via_sftp(url, local_path)
+  config = DomainExtractor.parse(url)
+
+  Net::SFTP.start(
+    config.host,
+    config.decoded_user,
+    password: config.decoded_password,
+    port: config.port || 22
+  ) do |sftp|
+    sftp.upload!(local_path, config.path)
+  end
+end
+
+# Usage
+deploy_via_sftp(ENV['DEPLOY_URL'], './build')
+```
+
+### Security Best Practices
+
+⚠️ **Important Security Considerations:**
+
+1. **Never hardcode credentials in source code**
+```ruby
+# ❌ Bad
+url = 'redis://user:password@localhost:6379'
+
+# ✅ Good
+url = ENV['REDIS_URL']
+url = Rails.application.credentials.redis[:url]
+```
+
+2. **Use environment variables or secret managers**
+```ruby
+# ✅ Good
+db_config = DomainExtractor.parse(ENV['DATABASE_URL'])
+redis_config = DomainExtractor.parse(ENV['REDIS_URL'])
+```
+
+3. **Never log URLs with credentials**
+```ruby
+# ❌ Bad
+logger.info("Connecting to #{database_url}")
+
+# ✅ Good
+config = DomainExtractor.parse(database_url)
+logger.info("Connecting to #{config.host}:#{config.port}")
+```
+
+4. **Always use TLS/SSL for credential transmission**
+```ruby
+# ✅ Good - Use rediss:// not redis://
+url = 'rediss://user:pass@redis.cloud:6380'
+
+# ✅ Good - Use postgresql:// with sslmode
+url = 'postgresql://user:pass@db.cloud:5432/db?sslmode=require'
+```
+
+5. **Rotate credentials regularly**
+- Use secret rotation services (AWS Secrets Manager, HashiCorp Vault)
+- Never commit credentials to version control
+- Use `.env` files with `.gitignore`
+
 ## URL Formatting
 
 DomainExtractor provides powerful URL formatting capabilities to normalize, transform, and standardize URLs according to your application's requirements.
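The auth helpers added above produce standard HTTP `Authorization` values. As a minimal sketch, assuming the gem follows the usual formats (RFC 7617 for Basic, RFC 6750 for Bearer), the same strings can be built with core Ruby alone. The method names below are hypothetical stand-ins for illustration, not the gem's implementation; the Basic outputs reproduce the `dXNlcjpwYXNz` and `dXNlcm5hbWU6cGFzc3dvcmQ=` values shown in the README.

```ruby
# Hypothetical stand-ins for the README's basic_auth_header and
# bearer_auth_header; a sketch of the header formats, not the gem's code.

# RFC 7617 Basic: "Basic " + Base64("user:password").
# Array#pack('m0') is strict Base64 (no trailing newline) and needs no require.
# Pass *decoded* credentials: the header carries raw bytes, not %XX escapes.
def basic_auth_header(user, password)
  "Basic #{["#{user}:#{password}"].pack('m0')}"
end

# RFC 6750 Bearer: the token follows the "Bearer " prefix unchanged.
def bearer_auth_header(token)
  "Bearer #{token}"
end

p basic_auth_header('user', 'pass')         # => "Basic dXNlcjpwYXNz"
p basic_auth_header('username', 'password') # => "Basic dXNlcm5hbWU6cGFzc3dvcmQ="
p bearer_auth_header('abc123')              # => "Bearer abc123"

# Decoding the payload shows the exact bytes a server receives. Servers split
# at the first colon, which is why RFC 7617 forbids ':' in the user-id.
payload = basic_auth_header('user', 'P@ss:word!').delete_prefix('Basic ')
p payload.unpack1('m0') # => "user:P@ss:word!"
```

Because the header embeds raw bytes, feeding it the still-encoded `password` (for example `P%40ss%3Aword%21`) rather than the decoded value would send the wrong credential.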
@@ -861,6 +1095,155 @@ secure.errors[:url]
 # => ["must use https://"]
 ```
 
+## URI-Compatible Access
+
+DomainExtractor covers the most common absolute-URL workflows people reach for Ruby's `URI` library for, while adding domain extraction, auth helpers, and formatting utilities.
+
+### Why Replace URI?
+
+**Performance:**
+- Included benchmarks measure roughly 5k-6k full parses/sec for common URLs on Ruby 3.4 / Apple Silicon
+- Auth helper methods remain microsecond-level operations
+- Domain extraction work happens in the same parse pass
+
+**Features:**
+- Common absolute-URL component accessors and setters
+- PLUS: Multi-part TLD parsing
+- PLUS: Domain component extraction
+- PLUS: Decoded credentials
+- PLUS: Authentication helpers
+- PLUS: URL formatting
+
+### Migration from URI
+
+**Low-friction migration for common absolute-URL use cases:**
+
+```ruby
+# Before (using URI)
+require 'uri'
+
+uri = URI.parse('https://user:pass@example.com:8080/path?query=value#section')
+uri.scheme # => "https"
+uri.user # => "user"
+uri.password # => "pass"
+uri.host # => "example.com"
+uri.port # => 8080
+uri.path # => "/path"
+uri.query # => "query=value"
+uri.fragment # => "section"
+
+# After (using DomainExtractor) - URI-style access plus domain helpers
+require 'domain_extractor'
+
+result = DomainExtractor.parse('https://user:pass@example.com:8080/path?query=value#section')
+result.scheme # => "https"
+result.user # => "user"
+result.password # => "pass"
+result.host # => "example.com"
+result.port # => 8080
+result.path # => "/path"
+result.query # => "query=value"
+result.fragment # => "section"
+
+# PLUS: Additional features not in URI along with each method
+# also having `?` and `!` variants for custom behavior
+result.subdomain # => nil
+result.domain # => "example"
+result.tld # => "com"
+result.root_domain # => "example.com"
+result.decoded_user # => "user"
+result.decoded_password # => "pass"
+result.basic_auth_header # => "Basic dXNlcjpwYXNz"
+```
+
+### URI Method Compatibility
+
+**Common absolute-URL URI methods are supported:**
+
+```ruby
+result = DomainExtractor.parse('https://api.example.com:8443/v1/users?page=2#results')
+
+# Component accessors
+result.scheme # => "https"
+result.host # => "api.example.com"
+result.hostname # => "api.example.com"
+result.port # => 8443
+result.path # => "/v1/users"
+result.query # => "page=2"
+result.fragment # => "results"
+
+# Authentication
+result.user # => nil
+result.password # => nil
+result.userinfo # => nil
+
+# URI state checks
+result.absolute? # => true
+result.relative? # => false # bare hosts are normalized to https://
+
+# Default ports
+result.default_port # => 443 (for https)
+
+# String conversion
+result.to_s # => Full URL string
+result.to_str # => Alias for to_s
+result.to_h # => Hash representation
+```
+
+### Advanced URI Features
+
+**Proxy Detection:**
+
+```ruby
+# Automatically detects proxy from environment
+# Checks http_proxy, HTTP_PROXY, and no_proxy
+result = DomainExtractor.parse('https://api.example.com')
+proxy = result.find_proxy
+# => #<URI::HTTP http://proxy.company.com:8080> or nil
+```
+
+**URI Normalization:**
+
+```ruby
+result = DomainExtractor.parse('HTTP://EXAMPLE.COM:80/Path')
+normalized = result.normalize
+
+normalized.scheme # => "http" (lowercased)
+normalized.host # => "example.com" (lowercased)
+normalized.port # => 80 (URI-compatible default port)
+normalized.to_s # => "http://example.com/Path"
+```
+
+**URI Merging:**
+
+```ruby
+base = DomainExtractor.parse('https://example.com/api/v1/')
+relative = 'users/123'
+
+merged = base.merge(relative)
+merged.to_s # => "https://example.com/api/v1/users/123"
+```
+
+### Component Setters
+
+**Modify URI components programmatically:**
+
+```ruby
+result = DomainExtractor.parse('http://example.com')
+
+# Set individual components
+result.scheme = 'https'
+result.host = 'secure.example.com'
+result.port = 8443
+result.path = '/api/endpoint'
+result.query = 'key=value'
+result.fragment = 'section'
+
+# Build complete URL
+result.build_url
+# => "https://secure.example.com:8443/api/endpoint?key=value#section"
+```
+
 ## Use Cases
 
 **Web Scraping**
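The compatibility section promises `merge`, `normalize`, and `default_port` with `URI` semantics. Ruby's own `URI` is the natural reference for those semantics, so the sketch below (standard library only) records what it returns for the README's inputs, plus one RFC 3986 detail that the merging example depends on: the trailing slash on the base path. It does not exercise DomainExtractor itself; treat it as an oracle to compare the gem's results against, not as a description of the gem.

```ruby
require 'uri'

# Reference behaviour from Ruby's URI for merge, normalize and default_port.
base = URI.parse('https://example.com/api/v1/')
p base.merge('users/123').to_s  # => "https://example.com/api/v1/users/123"
p base.merge('/users/123').to_s # => "https://example.com/users/123" (absolute path wins)

# Without the trailing slash, RFC 3986 merging replaces the last segment.
no_slash = URI.parse('https://example.com/api/v1')
p no_slash.merge('users/123').to_s # => "https://example.com/api/users/123"

# normalize lowercases scheme and host; to_s omits the scheme's default port,
# and the path keeps its case.
normalized = URI.parse('HTTP://EXAMPLE.COM:80/Path').normalize
p normalized.host # => "example.com"
p normalized.port # => 80
p normalized.to_s # => "http://example.com/Path"

# default_port reports the scheme default, not the port in the URL.
p URI.parse('https://api.example.com:8443/v1/users').default_port # => 443
```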
@@ -892,8 +1275,8 @@ end
 
 Optimized for high-throughput production use:
 
-- **Single URL parsing**:
-- **Batch processing**:
+- **Single URL parsing**: the included benchmarks currently land around 170-280μs for common absolute URLs on Ruby 3.4 / Apple Silicon
+- **Batch processing**: the included benchmarks currently land around 5k-6k URLs/sec for common workloads, with larger batches becoming allocation-bound
 - **Memory efficient**: <100KB overhead, ~200 bytes per parse
 - **Thread-safe**: Stateless modules, safe for concurrent use
 - **Zero-allocation hot paths**: Frozen constants, pre-compiled regex
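The two benchmark figures in this hunk describe the same quantity from opposite directions, since mean latency is the reciprocal of throughput: 5,000 parses/sec is 1,000,000 / 5,000 = 200 µs per parse, 6,000/sec is about 167 µs, and the 170-280 µs single-parse range corresponds to roughly 3.6k-5.9k parses/sec. The sketch below makes that conversion explicit and adds a minimal monotonic-clock throughput probe. It times the standard library's `URI.parse` purely as a placeholder workload; the helper names are hypothetical and this is not the gem's benchmark suite.

```ruby
require 'uri'

# Mean time per operation, in microseconds, for a given throughput.
def microseconds_per_op(ops_per_second)
  1_000_000.0 / ops_per_second
end

# Minimal throughput probe: runs the block `iterations` times and returns
# operations per second, timed with the monotonic clock (immune to NTP jumps).
def ops_per_second(iterations)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  iterations.times { yield }
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
  iterations / elapsed
end

p microseconds_per_op(5_000) # => 200.0
p microseconds_per_op(6_000).round(1) # => 166.7

# Placeholder workload: substitute the parser you actually want to measure.
rate = ops_per_second(10_000) { URI.parse('https://api.example.com/v1/users?page=2') }
puts format('%.0f parses/sec, %.1f us/parse', rate, microseconds_per_op(rate))
```

A single short loop like this is sensitive to warm-up and GC; a fuller harness would discard a warm-up run and report several repetitions.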
@@ -907,8 +1290,15 @@ View [performance analysis](https://github.com/opensite-ai/domain_extractor/blob
 | Multi-part TLD parser | ✅ | ❌ | ❌ |
 | Subdomain extraction | ✅ | ❌ | ❌ |
 | Domain component separation | ✅ | ❌ | ❌ |
-
+| Auth extraction & decoding | ✅ | ❌ | ⚠️ (basic) |
+| Authentication helpers | ✅ | ❌ | ❌ |
+| Built-in url normalization | ✅ | ❌ | ✅ |
+| URL formatting | ✅ | ❌ | ❌ |
+| Proxy detection | ✅ | ❌ | ✅ |
+| Performance profile | Feature-rich single-pass parse | Varies | Faster raw parse |
+| Auth helper speed | Microsecond-level | ❌ | ❌ |
 | Lightweight | ✅ | ❌ | ✅ |
+| Rails validator | ✅ | ❌ | ❌ |
 
 ## Requirements
 
data/lib/domain_extractor/auth.rb
@@ -0,0 +1,82 @@
+# frozen_string_literal: true
+
+require 'uri'
+
+module DomainExtractor
+  # Auth module extracts authentication components from URIs
+  # Handles userinfo parsing with support for special characters and percent-encoding
+  module Auth
+    # Frozen constants for zero allocation
+    COLON = ':'
+    EMPTY_AUTH = {
+      user: nil,
+      password: nil,
+      userinfo: nil,
+      decoded_user: nil,
+      decoded_password: nil
+    }.freeze
+
+    module_function
+
+    # Extract userinfo components from a URI object
+    # @param uri [URI::Generic] The parsed URI object
+    # @return [Hash] Hash containing :user, :password, :userinfo, :decoded_user, :decoded_password
+    def extract(uri)
+      return empty_auth unless uri&.userinfo
+
+      user, password = split_userinfo(uri.userinfo)
+
+      {
+        user: user,
+        password: password,
+        userinfo: uri.userinfo,
+        decoded_user: decode_component(user),
+        decoded_password: decode_component(password)
+      }
+    end
+
+    # Split userinfo into user and password components
+    # Handles edge cases like password-only (":password") and user-only ("user")
+    # @param userinfo [String] The userinfo string from URI
+    # @return [Array<String, String>] Array of [user, password]
+    def split_userinfo(userinfo)
+      return [nil, nil] if userinfo.nil? || userinfo.empty?
+
+      # Find first colon to split user:password
+      colon_index = userinfo.index(COLON)
+
+      if colon_index.nil?
+        # No colon means user-only
+        [userinfo, nil]
+      elsif colon_index.zero?
+        # Starts with colon means password-only (Redis pattern: ":password")
+        [nil, userinfo[1..]]
+      else
+        # Normal case: "user:password"
+        [userinfo[0...colon_index], userinfo[(colon_index + 1)..]]
+      end
+    end
+    private_class_method :split_userinfo
+
+    # Decode percent-encoded component
+    # @param component [String, nil] The component to decode
+    # @return [String, nil] Decoded component or nil
+    def decode_component(component)
+      return nil if component.nil?
+      return component if component.empty?
+
+      URI::DEFAULT_PARSER.unescape(component)
+    rescue StandardError
+      # If decoding fails, return original
+      component
+    end
+    private_class_method :decode_component
+
+    # Return empty auth hash
+    # @return [Hash] Empty auth components
+    def empty_auth
+      EMPTY_AUTH
+    end
+    private_class_method :empty_auth
+  end
+end
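`Auth.split_userinfo` splits on the first colon only, with special cases for user-only and Redis-style password-only userinfo, and `decode_component` then percent-decodes each half. The sketch below restates that behaviour in self-contained form so the edge cases can be exercised without the gem. Two substitutions are deliberate assumptions of this sketch, not the gem's code: `String#partition` replaces the index arithmetic (same results, including `"user:"` giving `["user", ""]`), and a small `%XX` decoder stands in for `URI::DEFAULT_PARSER.unescape`, so the example does not depend on which parser a given Ruby version installs as the default. `encode_credential` is a hypothetical inverse that reproduces the README's `P%40ss%3Aword%21` example.

```ruby
# Illustrative restatement of the userinfo handling in Auth (names hypothetical).

# RFC 3986 allows ':' anywhere in userinfo; by the user:password convention the
# FIRST colon is the separator, so any later colons belong to the password.
def split_userinfo(userinfo)
  return [nil, nil] if userinfo.nil? || userinfo.empty?

  user, sep, password = userinfo.partition(':')
  return [userinfo, nil] if sep.empty?  # "user"      -> user only
  return [nil, password] if user.empty? # ":password" -> password only (Redis style)

  [user, password]
end

# Decode %XX escapes only. Unlike URI.decode_www_form_component, this leaves
# '+' untouched, which matters for passwords that contain a literal '+'.
def decode_credential(component)
  return component if component.nil? || component.empty?

  component.b.gsub(/%(\h\h)/) { Regexp.last_match(1).hex.chr }
           .force_encoding(Encoding::UTF_8)
end

# Escape every byte outside RFC 3986 "unreserved" (A-Z a-z 0-9 - . _ ~).
def encode_credential(component)
  component.b.gsub(/[^A-Za-z0-9\-._~]/) { |byte| format('%%%02X', byte.ord) }
           .force_encoding(Encoding::UTF_8)
end

user, password = split_userinfo('default:my%2Bsecret%40pw')
p [decode_credential(user), decode_credential(password)] # => ["default", "my+secret@pw"]
p split_userinfo(':password')                            # => [nil, "password"]
p encode_credential('P@ss:word!')                        # => "P%40ss%3Aword%21"
```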