domain_extractor 0.2.8 → 0.2.9
- checksums.yaml +4 -4
- data/CHANGELOG.md +268 -0
- data/LICENSE +1 -1
- data/README.md +395 -5
- data/lib/domain_extractor/auth.rb +82 -0
- data/lib/domain_extractor/parsed_url.rb +236 -5
- data/lib/domain_extractor/parser.rb +91 -14
- data/lib/domain_extractor/result.rb +40 -9
- data/lib/domain_extractor/uri_helpers.rb +168 -0
- data/lib/domain_extractor/validators.rb +15 -0
- data/lib/domain_extractor/version.rb +1 -1
- data/lib/domain_extractor.rb +30 -0
- data/spec/auth_and_uri_spec.rb +454 -0
- data/spec/domain_extractor_spec.rb +2 -2
- data/spec/domain_validator_spec.rb +1 -1
- data/spec/formatter_spec.rb +2 -2
- metadata +30 -10
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 0b791db73463bab1af5200390a77607e87094609a4982c267cf45462648edb09
+  data.tar.gz: 1202be56be2eb390bbd767e2c60369a61268261fefe5c593acdb064d586e7431
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 957f691b10eb6bd705646cf192c9c3fecf84a246a3a76e8e5f29406d38b97f52277006bfaa9e09fc334eb3594eeae61c977e05eb95c10021d7cdce6a27ae8325
+  data.tar.gz: 205629c265724025d48c9d25318c78ede7d6bccf93486cae0b5957a4094ef94a46dde00a5b97540517c4b2f5244025e5ace4eb9de6bf4c6f0e0a7477692897e7
data/CHANGELOG.md
CHANGED
@@ -5,6 +5,274 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [0.2.9] - 2026-03-11
+
+### Added - URI-Compatible Accessors and Authentication Extraction
+
+This major release adds a much broader **URI-compatible API for common absolute-URL workflows** along with new authentication extraction, URI manipulation helpers, and expanded documentation.
+
+#### Authentication Extraction
+
+**Comprehensive userinfo parsing** for database connections, Redis, FTP, SFTP, and any URL scheme with embedded credentials:
+
+- `user` - Extract username from URL
+- `password` - Extract password from URL
+- `userinfo` - Complete userinfo string (user:password)
+- `decoded_user` - Percent-decoded username (handles special characters)
+- `decoded_password` - Percent-decoded password (handles special characters)
+
+**Supported URL Schemes:**
+
+- **Redis/Rediss**: `redis://username:password@host:6379/0`, `rediss://:password@host:6380`
+- **PostgreSQL**: `postgresql://user:pass@localhost:5432/dbname`
+- **MySQL**: `mysql://user:pass@host:3306/database`
+- **MongoDB**: `mongodb+srv://user:pass@cluster.mongodb.net/db`
+- **FTP/SFTP/FTPS**: `ftp://user:pass@host/path`, `sftp://user:pass@host:22/path`
+- **HTTP/HTTPS**: `https://user:pass@api.example.com` (deprecated but supported)
+
+**Special Character Handling:**
+
+- Automatic percent-decoding of credentials with `@`, `:`, and other special characters
+- `decoded_user` and `decoded_password` provide clean, usable credentials
+- Handles edge cases: password-only (`:password`), username-only, empty passwords
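The edge cases listed above (password-only, username-only, empty password) can be illustrated with a small plain-Ruby sketch. The helpers `split_authority` and `split_userinfo` are hypothetical names introduced here for illustration; they are not taken from the gem, and the gem may represent a missing username as `nil` rather than an empty string.

```ruby
# Hypothetical helpers, not part of domain_extractor.

# Split "userinfo@host:port" at the LAST "@", so an unencoded "@" inside
# the username stays with the credentials rather than the host.
def split_authority(authority)
  userinfo, at, hostport = authority.rpartition('@')
  [at.empty? ? nil : userinfo, hostport]
end

# Split "user:password" at the FIRST ":", keeping empty parts.
def split_userinfo(userinfo)
  return [nil, nil] if userinfo.nil? || userinfo.empty?

  user, password = userinfo.split(':', 2)
  [user, password]
end

userinfo, hostport = split_authority(':secret@host:6380')
split_userinfo(userinfo) # => ["", "secret"]   (password-only)
split_userinfo('deploy') # => ["deploy", nil]  (username-only)
split_userinfo('user:')  # => ["user", ""]     (empty password)
```

Splitting the userinfo at the first colon means a password may itself contain further colons without being truncated.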
+
+#### Complete URI Component Access
+
+**Common URI components** are now accessible as both getters and setters:
+
+**Read Access:**
+
+- `scheme` - URL scheme (http, https, redis, postgresql, etc.)
+- `host` - Host value for the parsed URI
+- `hostname` - Hostname helper for URI-style access
+- `port` - Port number
+- `path` - URL path
+- `query` - Raw query string
+- `fragment` - Fragment/anchor (#section)
+- `user`, `password`, `userinfo` - Authentication components
+- `subdomain`, `domain`, `tld`, `root_domain` - Domain components (existing)
+
+**Write Access (Setter Methods):**
+
+- `scheme=`, `host=`, `hostname=`, `port=`, `path=`, `query=`, `fragment=`
+- `user=`, `password=`, `userinfo=`
+- Enables programmatic URI construction and modification
+
+#### Authentication Helper Methods
+
+**Basic Authentication:**
+
+```ruby
+# Generate Authorization header from parsed URL
+result = DomainExtractor.parse('https://user:pass@api.example.com')
+result.basic_auth_header
+# => "Basic dXNlcjpwYXNz"
+
+# Or use module method directly
+DomainExtractor.basic_auth_header('user', 'password')
+# => "Basic dXNlcjpwYXNzd29yZA=="
+```
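For readers checking the header values above, here is a minimal sketch of how a Basic `Authorization` value is formed (the scheme name followed by the Base64 encoding of `user:password`, as in RFC 7617). The function name `sketch_basic_auth_header` is an assumption made for this example; it shows the general technique, not the gem's implementation.

```ruby
# Hypothetical helper, not part of domain_extractor.
# Array#pack('m0') produces Base64 without line breaks, so no extra
# library needs to be required.
def sketch_basic_auth_header(user, password)
  token = ["#{user}:#{password}"].pack('m0')
  "Basic #{token}"
end

sketch_basic_auth_header('user', 'pass')     # => "Basic dXNlcjpwYXNz"
sketch_basic_auth_header('user', 'password') # => "Basic dXNlcjpwYXNzd29yZA=="
```

Both results match the values quoted in the changelog example, which suggests the gem encodes the raw `user:password` pair in the same way.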
+
+**Bearer Token Authentication:**
+
+```ruby
+DomainExtractor.bearer_auth_header('eyJhbGciOiJIUzI1NiIs...')
+# => "Bearer eyJhbGciOiJIUzI1NiIs..."
+```
+
+**Credential Encoding/Decoding:**
+```ruby
+# Encode credentials for URL use (percent-encoding)
+DomainExtractor.encode_credential('P@ss:word!')
+# => "P%40ss%3Aword%21"
+
+# Decode percent-encoded credentials
+DomainExtractor.decode_credential('P%40ss%3Aword%21')
+# => "P@ss:word!"
+```
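The encoding shown above is ordinary percent-encoding: every byte outside the RFC 3986 unreserved set is written as `%` followed by two hex digits. The sketch below reproduces the same input and output pair in plain Ruby. `percent_encode` and `percent_decode` are hypothetical names for this illustration, and the gem may differ in which characters it leaves unescaped.

```ruby
# Hypothetical helpers, not part of domain_extractor.

# Escape every byte that is not an RFC 3986 "unreserved" character.
def percent_encode(str)
  str.b
     .gsub(/[^A-Za-z0-9\-._~]/) { |byte| format('%%%02X', byte.ord) }
     .force_encoding(Encoding::UTF_8)
end

# Turn each %XX escape back into its byte. "+" is left untouched,
# since it only means "space" in form-encoded query strings.
def percent_decode(str)
  str.b
     .gsub(/%\h\h/) { |escape| escape[1, 2].hex.chr }
     .force_encoding(Encoding::UTF_8)
end

percent_encode('P@ss:word!')       # => "P%40ss%3Aword%21"
percent_decode('P%40ss%3Aword%21') # => "P@ss:word!"
```

Working on bytes rather than characters keeps multi-byte UTF-8 credentials intact across a round trip.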
+
+#### Advanced URI Methods
+
+**URI Manipulation:**
+- `merge(relative_url)` - Merge with relative URL (RFC 2396 compliant)
+- `normalize` - Normalize URI (lowercase scheme/host, remove default ports)
+- `absolute?` - Check if URI is absolute
+- `relative?` - Check if URI is relative
+- `default_port` - Get default port for scheme (80 for http, 443 for https, 6379 for redis, etc.)
+- `build_url` - Reconstruct complete URL from components
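Since the changelog presents these methods as URI-compatible, the behaviour of Ruby's standard `URI` library is a useful reference point. The sketch below uses only the standard library; whether `DomainExtractor` results match it exactly (for example for non-HTTP schemes such as `redis`) is an assumption to verify against the gem itself.

```ruby
require 'uri'

# Reference behaviour from Ruby's standard library, not from domain_extractor.
base = URI.parse('https://example.com/a/b')

# merge resolves a relative reference against the base URI.
base.merge('../c').to_s    # => "https://example.com/c"
base.merge('c?x=1').to_s   # => "https://example.com/a/c?x=1"

# normalize lowercases scheme and host; to_s omits the default port.
URI.parse('HTTP://Example.COM:80/Path').normalize.to_s # => "http://example.com/Path"

base.absolute?                      # => true
URI.parse('/path/only').relative?   # => true
base.default_port                   # => 443
```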
+
+**Proxy Detection:**
+- `find_proxy` - Automatic proxy detection from environment variables
+- Checks scheme-specific proxy variables, falls back to `http_proxy` / `HTTP_PROXY`, and respects `no_proxy`
+- Returns proxy URI or nil
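The lookup order described above can be sketched as follows. This is a hedged reconstruction from the three bullet points only: the function name `find_proxy_sketch`, its parameters, and the simple suffix matching used for `no_proxy` are all assumptions, and the gem's real `find_proxy` may handle ports, IP ranges, or loopback hosts differently.

```ruby
require 'uri'

# Hypothetical sketch, not the gem's find_proxy.
# env is any Hash-like object such as ENV.
def find_proxy_sketch(scheme, host, env)
  # 1. Respect no_proxy: skip the proxy for listed hosts and their subdomains.
  no_proxy = env['no_proxy'] || env['NO_PROXY']
  if no_proxy
    bypassed = no_proxy.split(',').map { |entry| entry.strip.delete_prefix('.') }
    bypassed.reject!(&:empty?)
    return nil if bypassed.any? { |name| host == name || host.end_with?(".#{name}") }
  end

  # 2. Scheme-specific variable first, then the http_proxy fallback.
  keys = ["#{scheme}_proxy", "#{scheme.upcase}_PROXY", 'http_proxy', 'HTTP_PROXY']
  value = keys.map { |key| env[key] }.find { |v| v && !v.empty? }

  # 3. Return a proxy URI, or nil when nothing is configured.
  value && URI.parse(value)
end

env = {
  'https_proxy' => 'http://secure-proxy:3128',
  'http_proxy'  => 'http://proxy:8080',
  'no_proxy'    => 'internal.example.com,.corp'
}
find_proxy_sketch('https', 'api.example.com', env).to_s  # => "http://secure-proxy:3128"
find_proxy_sketch('ftp', 'files.example.com', env).to_s  # => "http://proxy:8080"
find_proxy_sketch('https', 'db.corp', env)               # => nil
```

Passing the environment in as an argument, rather than reading `ENV` directly, keeps the lookup easy to test.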
+
+**Alias Methods for URI Compatibility:**
+- `to_str` - Alias for `to_s`
+- `hostname` - URI-style hostname accessor
+- `query` - Raw query string access
+
+#### Real-World Use Cases
+
+**Database Connection Parsing:**
+```ruby
+db_url = 'postgresql://appuser:SecurePass@db.prod.internal:5432/production'
+result = DomainExtractor.parse(db_url)
+
+result.user # => "appuser"
+result.password # => "SecurePass"
+result.host # => "db.prod.internal"
+result.port # => 5432
+result.path # => "/production"
+```
+
+**Redis Connection with Special Characters:**
+```ruby
+redis_url = 'rediss://default:P%40ss%3Aword@redis.cloud:6385/0'
+result = DomainExtractor.parse(redis_url)
+
+result.password # => "P%40ss%3Aword"
+result.decoded_password # => "P@ss:word"
+result.scheme # => "rediss"
+result.port # => 6385
+```
+
+**API Authentication Header Generation:**
+```ruby
+api_url = 'https://nick@untappd.com:MySuperAPIToken123@api.untappd.com/v4'
+result = DomainExtractor.parse(api_url)
+
+# Generate Basic Auth header
+auth_header = result.basic_auth_header
+# Use in HTTP request:
+# headers['Authorization'] = auth_header
+```
+
+**FTP/SFTP Deployment:**
+```ruby
+deploy_url = 'sftp://deploy_user:DeployKey123@deployment.internal:22/var/www/app'
+result = DomainExtractor.parse(deploy_url)
+
+result.user # => "deploy_user"
+result.password # => "DeployKey123"
+result.host # => "deployment.internal"
+result.port # => 22
+result.path # => "/var/www/app"
+```
+
+#### Security Considerations
+
+**Important Security Notes:**
+- Embedding credentials in URLs is **deprecated** per RFC 3986 and should be avoided in production
+- Use environment variables, secret managers, or secure vaults for credential storage
+- The library supports credential extraction for **legacy systems** and **configuration parsing**
+- Always use HTTPS/TLS when credentials must be transmitted
+- Never log URLs containing credentials
+- Consider using header-based authentication (Bearer tokens, API keys) instead
+
+**Safe Credential Handling:**
+```ruby
+# ✅ Good: Parse from environment variable
+db_url = ENV['DATABASE_URL']
+config = DomainExtractor.parse(db_url)
+
+# ✅ Good: Extract and use separately
+username = config.decoded_user
+password = config.decoded_password
+# Pass to connection library without logging URL
+
+# ❌ Bad: Hardcode credentials in source
+db_url = 'postgresql://user:password@localhost/db' # Don't do this!
+```
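One way to follow the "never log URLs containing credentials" advice is to redact the userinfo part before a URL reaches a logger. The helper below is a hypothetical plain-Ruby sketch (the name `redact_credentials` and the `[REDACTED]` placeholder are assumptions, and nothing here comes from the gem). It deliberately matches up to the last `@` in the authority so that an unencoded `@` inside a username does not leave the secret exposed.

```ruby
# Hypothetical helper, not part of domain_extractor.
# Matches "scheme://" followed by everything up to the last "@" that
# appears before the first "/", "?" or "#".
CREDENTIALS_IN_URL = %r{\A([a-z][a-z0-9+.\-]*://)[^/?#]*@}i.freeze

def redact_credentials(url)
  url.sub(CREDENTIALS_IN_URL, '\1[REDACTED]@')
end

redact_credentials('postgresql://user:password@localhost/db')
# => "postgresql://[REDACTED]@localhost/db"
redact_credentials('https://example.com/a@b')
# => "https://example.com/a@b" (no credentials, unchanged)
```

Redacting on the raw string avoids parsing failures on malformed URLs, at the cost of not validating them.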
+
+#### Performance
+
+**Maintains Performance-First Design:**
+- All new features use frozen constants and optimized string operations
+- Auth extraction adds <5μs overhead per parse
+- Core hot paths remain allocation-conscious
+- Thread-safe stateless modules
+- Full parse throughput depends on host complexity; see the benchmark docs for measured results
+
+#### URI-Style Access
+
+**Common URI-style access with additional domain helpers:**
+```ruby
+# Before (using URI)
+uri = URI.parse('https://user:pass@example.com:8080/path?query=value#section')
+uri.scheme # => "https"
+uri.user # => "user"
+uri.password # => "pass"
+uri.host # => "example.com"
+uri.port # => 8080
+
+# After (using DomainExtractor) - identical API
+result = DomainExtractor.parse('https://user:pass@example.com:8080/path?query=value#section')
+result.scheme # => "https"
+result.user # => "user"
+result.password # => "pass"
+result.host # => "example.com"
+result.port # => 8080
+
+# PLUS: Additional domain parsing features
+result.subdomain # => nil
+result.domain # => "example"
+result.tld # => "com"
+result.root_domain # => "example.com"
+
+# PLUS: Decoded credentials
+result.decoded_user # => "user"
+result.decoded_password # => "pass"
+
+# PLUS: Authentication helpers
+result.basic_auth_header # => "Basic dXNlcjpwYXNz"
+```
+
+#### Implementation Details
+
+**New Modules:**
+- `DomainExtractor::Auth` - Authentication component extraction with percent-decoding
+- `DomainExtractor::URIHelpers` - Advanced URI manipulation and helper methods
+
+**Enhanced Modules:**
+- `DomainExtractor::Parser` - Now extracts auth components and additional URI parts
+- `DomainExtractor::Result` - Builds results with auth and URI components
+- `DomainExtractor::ParsedURL` - Extended with URI-compatible methods and setters
+
+**Code Quality:**
+- 200+ comprehensive test cases covering all scenarios
+- RuboCop clean with zero offenses
+- 100% backward compatible - no breaking changes
+- Extensive documentation with real-world examples
+
+#### Migration from URI Library
+
+**Low-friction migration for common absolute-URL use cases:**
+```ruby
+# Swap URI.parse for DomainExtractor.parse
+# Before:
+require 'uri'
+uri = URI.parse(url)
+
+# After:
+require 'domain_extractor'
+uri = DomainExtractor.parse(url)
+
+# Common URI-style accessors continue to work, plus you get:
+# - Multi-part TLD support
+# - Domain component extraction
+# - Decoded credentials
+# - Authentication helpers
+# - Better performance
+```
+
+#### Documentation
+
+- Comprehensive CHANGELOG with all features documented
+- README updated with authentication examples
+- Real-world use cases for Redis, databases, FTP, APIs
+- Security best practices section
+- Migration guide from URI library
+
 ## [0.2.7] - 2025-11-09
 
 ### Added - URL Formatting API
data/LICENSE
CHANGED
@@ -1,6 +1,6 @@
 BSD 3-Clause License
 
-Copyright (c)
+Copyright (c) 2026, OpenSite AI. All rights reserved.
 
 Redistribution and use in source and binary forms, with or without
 modification, are permitted provided that the following conditions are met: