domain_extractor 0.2.6 → 0.2.8

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (10)

checksums.yaml +4 -4
data/CHANGELOG.md +145 -0
data/LICENSE +28 -0
data/README.md +237 -0
data/lib/domain_extractor/formatter.rb +105 -0
data/lib/domain_extractor/version.rb +1 -1
data/lib/domain_extractor.rb +27 -0
data/spec/formatter_spec.rb +299 -0
metadata +7 -5
data/LICENSE.txt +0 -21

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: b770e3c09383122b5cae3baa952127a0f616ee721c2a241f1facd9ddc42a4762
-  data.tar.gz: de6e3561bba3d457da8a4cd9aee88c5f6c76aedaf233c3bda4930cb8402b2871
+  metadata.gz: d3a08e0e813341f588f96df2c7ec48eede21b5e93b6f46e127d15b2266439919
+  data.tar.gz: f05739738dda333fa4793397f4c51c047574d2c94e591ed6c76904f8cdc51fe7
 SHA512:
-  metadata.gz: 342694e42f321dbea6b197a99909afba2fc4de4d13d01e6e92e66f54fa7d286c1abdfbc1713c56709783d1d05a840523c8f0b202c89528bdeca754eade68cf60
-  data.tar.gz: e23a61526b995375057f34b6a87053c9a26e1f6b6699a521332829921ecb28d1fa6fcf13802fdb9f79637a45200bc1ff98fa47fb8179671fbaed27500e9c16e9
+  metadata.gz: f1d4c335712a677707d7fedb966924e9f9a2f0e06f9084fc5ab1075d5d620ae51cd390a2e5013245e6fb9ec3d32563ed7c090d989807e86d01a304ac8ba6d9bc
+  data.tar.gz: 143f192445d137c456668c44e260cf96183ca0afae22be308e884f9fa9c9248cbf43b5608466412fc0734e652565c72bea45990bc4d038ac20934b854f6b929f
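
The digests above cover `metadata.gz` and `data.tar.gz`, the two members packed inside the `.gem` archive. As a rough sketch, assuming a member has already been extracted to disk, a published SHA256 value could be checked with Ruby's standard `digest` library. The helper name `sha256_matches?` and the file path in the usage comment are hypothetical and not part of the gem.

```ruby
require 'digest'

# Hypothetical helper: compares a file's SHA256 hex digest with an expected value.
def sha256_matches?(path, expected_hex)
  actual = Digest::SHA256.file(path).hexdigest
  actual == expected_hex.strip.downcase
end

# Usage (assumes data.tar.gz was extracted from the 0.2.8 .gem archive beforehand):
# sha256_matches?('data.tar.gz',
#                 'f05739738dda333fa4793397f4c51c047574d2c94e591ed6c76904f8cdc51fe7')
```

The same pattern should apply to the SHA512 entries by swapping in `Digest::SHA512`.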

data/CHANGELOG.md CHANGED

@@ -5,6 +5,151 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+## [0.2.7] - 2025-11-09
+### Added - URL Formatting API
+Added a comprehensive `format` method for programmatic URL normalization and transformation. The formatter provides precise control over URL structure, protocol, and formatting while maintaining the same validation modes as the Rails validator.
+#### Features
+**Core Method:**
+- `DomainExtractor.format(url, **options)` - Format and normalize URLs based on specified options
+- Returns formatted URL string or `nil` for invalid input
+- Strips paths and query parameters from URLs
+- Supports all validation modes from the Rails validator
+**Validation Modes:**
+- `:standard` (default) - Preserves full host as-is while normalizing protocol/slashes
+- `:root_domain` - Strips all subdomains, returns only root domain
+- `:root_or_custom_subdomain` - Preserves custom subdomains but removes 'www'
+**Formatting Options:**
+- `use_protocol` (default: `true`) - Include/exclude protocol in output
+- `use_https` (default: `true`) - Use HTTPS vs HTTP (only when `use_protocol` is true)
+- `use_trailing_slash` (default: `false`) - Add/remove trailing slash from output
+#### Usage Examples
+**Basic Formatting:**
+```ruby
+# Remove trailing slash (default)
+DomainExtractor.format('https://example.com/')
+# => 'https://example.com'
+# Strip paths and query parameters
+DomainExtractor.format('https://example.com/path?query=value')
+# => 'https://example.com'
+# Normalize to HTTPS
+DomainExtractor.format('http://example.com')
+# => 'https://example.com'
+```
+**Validation Modes:**
+```ruby
+# Root domain only (strips subdomains)
+DomainExtractor.format('https://shop.example.com', validation: :root_domain)
+# => 'https://example.com'
+# Strip www but keep custom subdomains
+DomainExtractor.format('https://www.example.com', validation: :root_or_custom_subdomain)
+# => 'https://example.com'
+```
+**Protocol Control:**
+```ruby
+# Without protocol
+DomainExtractor.format('https://example.com', use_protocol: false)
+# => 'example.com'
+# Force HTTP instead of HTTPS
+DomainExtractor.format('https://example.com', use_https: false)
+# => 'http://example.com'
+```
+**Trailing Slash Control:**
+```ruby
+# Add trailing slash
+DomainExtractor.format('https://example.com', use_trailing_slash: true)
+# => 'https://example.com/'
+```
+**Combined Options:**
+```ruby
+# Root domain, no protocol, with trailing slash
+DomainExtractor.format('https://shop.example.com/path',
+                       validation: :root_domain,
+                       use_protocol: false,
+                       use_trailing_slash: true)
+# => 'example.com/'
+```
+#### Real-World Use Cases
+**Canonical URL Generation:**
+```ruby
+def canonical_url(url)
+  DomainExtractor.format(url,
+                         validation: :root_or_custom_subdomain,
+                         use_https: true,
+                         use_trailing_slash: false)
+end
+canonical_url('http://www.example.com/')   # => 'https://example.com'
+```
+**Domain Normalization for Allowlists:**
+```ruby
+def normalize_domain(url)
+  DomainExtractor.format(url, validation: :root_domain, use_protocol: false)
+end
+normalize_domain('https://shop.example.com/path')  # => 'example.com'
+```
+**Multi-Tenant URL Standardization:**
+```ruby
+class Tenant < ApplicationRecord
+  before_validation :normalize_custom_domain
+  private
+  def normalize_custom_domain
+    return if custom_domain.blank?
+    self.custom_domain = DomainExtractor.format(
+      custom_domain,
+      validation: :root_or_custom_subdomain,
+      use_https: true,
+      use_trailing_slash: false
+    )
+  end
+end
+```
+#### Implementation Details
+- **Performance**: Leverages existing DomainExtractor parsing engine with minimal overhead
+- **Nil-safe**: Returns `nil` for invalid URLs instead of raising exceptions
+- **Consistent API**: Uses same option names and validation modes as Rails validator
+- **Path/Query Stripping**: Automatically removes paths and query parameters
+- **Multi-part TLD Support**: Correctly handles complex TLDs like `.co.uk`, `.com.au`
+#### Code Quality
+- **49 comprehensive test cases** covering all formatting modes and options
+- **RuboCop clean** with zero offenses
+- **100% test coverage** maintained across entire gem (200 total tests)
+- **Well-documented** with extensive README section and real-world examples
+#### Documentation
+- Added comprehensive **URL Formatting** section to README.md
+- Includes examples for all validation modes and options
+- Real-world use cases: canonical URLs, domain normalization, multi-tenant standardization
+- Clear API reference with all available options
 ## [0.2.6] - 2025-11-09
 ### Fixed - Rails Validator Registration

data/LICENSE ADDED

@@ -0,0 +1,28 @@
+BSD 3-Clause License
+Copyright (c) 2025, OpenSite AI. All rights reserved.
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions are met:
+1. Redistributions of source code must retain the above copyright notice, this
+   list of conditions and the following disclaimer.
+2. Redistributions in binary form must reproduce the above copyright notice,
+   this list of conditions and the following disclaimer in the documentation
+   and/or other materials provided with the distribution.
+3. Neither the name of the copyright holder nor the names of its
+   contributors may be used to endorse or promote products derived from
+   this software without specific prior written permission.
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
+FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
+SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

data/README.md CHANGED

@@ -13,6 +13,8 @@ Use **DomainExtractor** whenever you need a dependable tld parser for tricky mul
 ✅ **Accurate Multi-part TLD Parser** - Handles complex multi-part TLDs (co.uk, com.au, gov.br) using the [Public Suffix List](https://publicsuffix.org/)
 ✅ **Nested Subdomain Extraction** - Correctly parses multi-level subdomains (api.staging.example.com)
 ✅ **Smart URL Normalization** - Automatically handles URLs with or without schemes
+✅ **Powerful URL Formatting** - Transform and standardize URLs with flexible options
+✅ **Rails Integration** - Custom ActiveModel validator for declarative URL validation
 ✅ **Query Parameter Parsing** - Parse query strings into structured hashes
 ✅ **Batch Processing** - Parse multiple URLs efficiently
 ✅ **IP Address Detection** - Identifies and handles IPv4 and IPv6 addresses
@@ -355,6 +357,241 @@ DomainExtractor.parse_query_params(query_string)
 # Returns: Hash of query parameters
 ```
+```ruby
+DomainExtractor.format(url_string, **options)
+# => Formats a URL according to the specified options.
+# Returns: Formatted URL string or nil if invalid
+# Options:
+#   :validation (:standard, :root_domain, :root_or_custom_subdomain)
+#   :use_protocol (true/false)
+#   :use_https (true/false)
+#   :use_trailing_slash (true/false)
+```
+## URL Formatting
+DomainExtractor provides powerful URL formatting capabilities to normalize, transform, and standardize URLs according to your application's requirements.
+### Basic Formatting
+```ruby
+# Remove trailing slash (default)
+DomainExtractor.format('https://example.com/')
+# => 'https://example.com'
+# Strip paths and query parameters
+DomainExtractor.format('https://example.com/path?query=value')
+# => 'https://example.com'
+# Normalize to HTTPS
+DomainExtractor.format('http://example.com')
+# => 'https://example.com'
+```
+### Validation Modes
+#### Standard Mode (Default)
+Preserves the full host as-is while normalizing protocol and trailing slashes.
+```ruby
+DomainExtractor.format('https://shop.example.com')
+# => 'https://shop.example.com'
+DomainExtractor.format('https://www.example.com/')
+# => 'https://www.example.com'
+DomainExtractor.format('https://api.staging.example.com')
+# => 'https://api.staging.example.com'
+```
+#### Root Domain Mode
+Strips all subdomains and returns only the root domain.
+```ruby
+DomainExtractor.format('https://shop.example.com', validation: :root_domain)
+# => 'https://example.com'
+DomainExtractor.format('https://www.example.com/', validation: :root_domain)
+# => 'https://example.com'
+DomainExtractor.format('https://api.staging.example.com', validation: :root_domain)
+# => 'https://example.com'
+# Works with multi-part TLDs
+DomainExtractor.format('https://shop.example.co.uk', validation: :root_domain)
+# => 'https://example.co.uk'
+```
+#### Root or Custom Subdomain Mode
+Preserves custom subdomains but specifically removes the 'www' subdomain.
+```ruby
+DomainExtractor.format('https://example.com', validation: :root_or_custom_subdomain)
+# => 'https://example.com'
+DomainExtractor.format('https://shop.example.com', validation: :root_or_custom_subdomain)
+# => 'https://shop.example.com'
+# Strips www subdomain
+DomainExtractor.format('https://www.example.com', validation: :root_or_custom_subdomain)
+# => 'https://example.com'
+DomainExtractor.format('https://api.example.com', validation: :root_or_custom_subdomain)
+# => 'https://api.example.com'
+```
+### Protocol Options
+#### Without Protocol
+Remove the protocol entirely from the output.
+```ruby
+DomainExtractor.format('https://example.com', use_protocol: false)
+# => 'example.com'
+DomainExtractor.format('https://shop.example.com', use_protocol: false)
+# => 'shop.example.com'
+# Combine with root_domain
+DomainExtractor.format('https://shop.example.com',
+                       validation: :root_domain,
+                       use_protocol: false)
+# => 'example.com'
+```
+#### HTTP vs HTTPS
+Control which protocol to use in the output.
+```ruby
+# Default: use HTTPS
+DomainExtractor.format('http://example.com')
+# => 'https://example.com'
+# Allow HTTP
+DomainExtractor.format('https://example.com', use_https: false)
+# => 'http://example.com'
+DomainExtractor.format('http://example.com', use_https: false)
+# => 'http://example.com'
+```
+### Trailing Slash Options
+```ruby
+# Remove trailing slash (default)
+DomainExtractor.format('https://example.com/')
+# => 'https://example.com'
+# Add trailing slash
+DomainExtractor.format('https://example.com', use_trailing_slash: true)
+# => 'https://example.com/'
+DomainExtractor.format('https://example.com/', use_trailing_slash: true)
+# => 'https://example.com/'
+# Works with other options
+DomainExtractor.format('https://shop.example.com',
+                       validation: :root_domain,
+                       use_trailing_slash: true)
+# => 'https://example.com/'
+```
+### Combined Options
+Mix and match options for precise URL formatting:
+```ruby
+# Root domain, no protocol, with trailing slash
+DomainExtractor.format('https://shop.example.com/path',
+                       validation: :root_domain,
+                       use_protocol: false,
+                       use_trailing_slash: true)
+# => 'example.com/'
+# Strip www, use HTTP, with trailing slash
+DomainExtractor.format('https://www.example.com',
+                       validation: :root_or_custom_subdomain,
+                       use_https: false,
+                       use_trailing_slash: true)
+# => 'http://example.com/'
+# Standard mode, no protocol, with trailing slash
+DomainExtractor.format('https://api.example.com',
+                       use_protocol: false,
+                       use_trailing_slash: true)
+# => 'api.example.com/'
+```
+### Real-World Use Cases
+#### Canonical URL Generation
+```ruby
+def canonical_url(url)
+  DomainExtractor.format(url,
+                         validation: :root_or_custom_subdomain,
+                         use_https: true,
+                         use_trailing_slash: false)
+end
+canonical_url('http://www.example.com/')      # => 'https://example.com'
+canonical_url('https://shop.example.com/')    # => 'https://shop.example.com'
+```
+#### Domain Normalization for Allowlists
+```ruby
+def normalize_domain_for_allowlist(url)
+  DomainExtractor.format(url,
+                         validation: :root_domain,
+                         use_protocol: false)
+end
+normalize_domain_for_allowlist('https://shop.example.com/path')  # => 'example.com'
+normalize_domain_for_allowlist('http://www.example.com')         # => 'example.com'
+```
+#### Multi-Tenant URL Standardization
+```ruby
+class Tenant < ApplicationRecord
+  before_validation :normalize_custom_domain
+  private
+  def normalize_custom_domain
+    return if custom_domain.blank?
+    self.custom_domain = DomainExtractor.format(
+      custom_domain,
+      validation: :root_or_custom_subdomain,
+      use_https: true,
+      use_trailing_slash: false
+    )
+  end
+end
+```
+#### API Endpoint Formatting
+```ruby
+def format_api_endpoint(url)
+  DomainExtractor.format(url,
+                         validation: :standard,
+                         use_https: true,
+                         use_trailing_slash: true)
+end
+format_api_endpoint('http://api.example.com')  # => 'https://api.example.com/'
+```
 ## Rails Integration
 DomainExtractor provides a custom ActiveModel validator for Rails applications, enabling declarative URL/domain validation with multiple modes and options.

data/lib/domain_extractor/formatter.rb ADDED

@@ -0,0 +1,105 @@
+# frozen_string_literal: true
+module DomainExtractor
+  # Formatter provides URL formatting based on validation modes and protocol requirements.
+  #
+  # Formats a URL string according to the specified options:
+  # - Validation modes: :standard, :root_domain, :root_or_custom_subdomain
+  # - Protocol options: use_protocol, use_https
+  # - Trailing slash: use_trailing_slash
+  #
+  # @example Standard formatting
+  #   DomainExtractor.format('https://www.example.com/')
+  #   # => 'https://www.example.com'
+  #
+  # @example Root domain only
+  #   DomainExtractor.format('https://shop.example.com/path', validation: :root_domain)
+  #   # => 'https://example.com'
+  #
+  # @example Without protocol
+  #   DomainExtractor.format('https://example.com', use_protocol: false)
+  #   # => 'example.com'
+  module Formatter
+    VALIDATION_MODES = %i[standard root_domain root_or_custom_subdomain].freeze
+    WWW_SUBDOMAIN = 'www'
+    module_function
+    # Format a URL according to the specified options
+    #
+    # @param url [String] The URL to format
+    # @param options [Hash] Formatting options
+    # @option options [Symbol] :validation (:standard) Validation mode
+    # @option options [Boolean] :use_protocol (true) Include protocol in output
+    # @option options [Boolean] :use_https (true) Use https instead of http
+    # @option options [Boolean] :use_trailing_slash (false) Include trailing slash
+    # @return [String, nil] Formatted URL or nil if invalid
+    def call(url, **options)
+      validation = options.fetch(:validation, :standard)
+      use_protocol = options.fetch(:use_protocol, true)
+      use_https = options.fetch(:use_https, true)
+      use_trailing_slash = options.fetch(:use_trailing_slash, false)
+      validate_options!(validation)
+      # Parse the URL
+      parsed = DomainExtractor.parse(url)
+      return nil unless parsed.valid?
+      # Build the formatted URL based on validation mode
+      formatted_host = build_host(parsed, validation)
+      build_url(formatted_host, use_protocol, use_https, use_trailing_slash)
+    end
+    def validate_options!(validation)
+      return if VALIDATION_MODES.include?(validation)
+      raise ArgumentError, "Invalid validation mode: #{validation}. " \
+                           "Must be one of: #{VALIDATION_MODES.join(', ')}"
+    end
+    private_class_method :validate_options!
+    # Build the host portion based on validation mode
+    def build_host(parsed, validation)
+      case validation
+      when :standard
+        # Return the full host as-is
+        parsed.host
+      when :root_domain
+        # Return only the root domain (no subdomains)
+        parsed.root_domain
+      when :root_or_custom_subdomain
+        # Return root domain or custom subdomain (strip www)
+        if parsed.subdomain == WWW_SUBDOMAIN
+          parsed.root_domain
+        else
+          parsed.host
+        end
+      end
+    end
+    private_class_method :build_host
+    # Build the final URL string with protocol and trailing slash options
+    def build_url(host, use_protocol, use_https, use_trailing_slash)
+      url = ''
+      # Add protocol if requested
+      if use_protocol
+        protocol = use_https ? 'https://' : 'http://'
+        url = protocol + host
+      else
+        url = host
+      end
+      # Add or remove trailing slash
+      if use_trailing_slash
+        url += '/' unless url.end_with?('/')
+      else
+        url = url.chomp('/')
+      end
+      url
+    end
+    private_class_method :build_url
+  end
+end
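
The added formatter works in two steps: pick a host according to the validation mode, then assemble the output string. The sketch below approximates both steps without the gem installed. `ParsedStub`, `select_host`, and `assemble_url` are hypothetical names, and the `subdomain` value `'www.shop'` for a nested host is an assumption about what `DomainExtractor.parse` reports, not something this diff confirms.

```ruby
# Stand-in for the parsed result; the real object comes from DomainExtractor.parse.
ParsedStub = Struct.new(:host, :root_domain, :subdomain)

# Approximates the mode dispatch in Formatter.build_host.
def select_host(parsed, validation)
  case validation
  when :standard
    parsed.host
  when :root_domain
    parsed.root_domain
  when :root_or_custom_subdomain
    # Only an exact 'www' subdomain is stripped.
    parsed.subdomain == 'www' ? parsed.root_domain : parsed.host
  else
    raise ArgumentError, "Invalid validation mode: #{validation}"
  end
end

# Approximates Formatter.build_url: optional scheme, then optional trailing slash.
def assemble_url(host, use_protocol: true, use_https: true, use_trailing_slash: false)
  scheme = ''
  scheme = use_https ? 'https://' : 'http://' if use_protocol
  url = "#{scheme}#{host}".chomp('/')
  use_trailing_slash ? "#{url}/" : url
end

www    = ParsedStub.new('www.example.com', 'example.com', 'www')
nested = ParsedStub.new('www.shop.example.com', 'example.com', 'www.shop') # assumed subdomain value

assemble_url(select_host(www, :root_or_custom_subdomain))
# => "https://example.com"
assemble_url(select_host(nested, :root_or_custom_subdomain))
# => "https://www.shop.example.com"
```

Under that assumption, `:root_or_custom_subdomain` leaves a nested `www.shop` host untouched, because the comparison is an exact match against `'www'`.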

data/lib/domain_extractor/version.rb CHANGED

@@ -1,5 +1,5 @@
 # frozen_string_literal: true
 module DomainExtractor
-  VERSION = '0.2.6'
+  VERSION = '0.2.8'
 end

data/lib/domain_extractor.rb CHANGED

@@ -8,6 +8,7 @@ require_relative 'domain_extractor/errors'
 require_relative 'domain_extractor/parsed_url'
 require_relative 'domain_extractor/parser'
 require_relative 'domain_extractor/query_params'
+require_relative 'domain_extractor/formatter'
 # Conditionally load Rails validator if ActiveModel is available
 begin
@@ -70,6 +71,32 @@ module DomainExtractor
       QueryParams.call(query_string)
     end
+    # Format a URL according to the specified options.
+    # Returns a formatted URL string or nil if the input is invalid.
+    #
+    # @param url [String] The URL to format
+    # @param options [Hash] Formatting options
+    # @option options [Symbol] :validation (:standard) Validation mode
+    # @option options [Boolean] :use_protocol (true) Include protocol in output
+    # @option options [Boolean] :use_https (true) Use https instead of http
+    # @option options [Boolean] :use_trailing_slash (false) Include trailing slash
+    # @return [String, nil]
+    #
+    # @example Standard formatting
+    #   DomainExtractor.format('https://www.example.com/')
+    #   # => 'https://www.example.com'
+    #
+    # @example Root domain only
+    #   DomainExtractor.format('https://shop.example.com/path', validation: :root_domain)
+    #   # => 'https://example.com'
+    #
+    # @example Without protocol
+    #   DomainExtractor.format('https://example.com', use_protocol: false)
+    #   # => 'example.com'
+    def format(url, **)
+      Formatter.call(url, **)
+    end
     alias parse_query parse_query_params
   end
 end
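
The new `format(url, **)` method forwards its keyword arguments with an anonymous keyword splat, a syntax that, as far as I know, requires Ruby 3.2 or newer. The sketch below shows the equivalent named form, which should also run on older Rubies. `Sketch` and `SketchFormatter` are hypothetical stand-ins, not the gem's modules.

```ruby
# Hypothetical stand-in for the formatter: echoes back what it receives.
module SketchFormatter
  def self.call(url, **options)
    [url, options]
  end
end

module Sketch
  # Named keyword splat: forwards the same options without the anonymous `**` syntax.
  def self.format(url, **options)
    SketchFormatter.call(url, **options)
  end
end

# Returns the URL together with the forwarded options hash.
Sketch.format('example.com', use_protocol: false)
```

Whether the named form matters in practice depends on the Ruby versions the gem declares support for, which this diff does not show.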

data/spec/formatter_spec.rb ADDED

@@ -0,0 +1,299 @@
+# frozen_string_literal: true
+require 'spec_helper'
+RSpec.describe DomainExtractor::Formatter do
+  describe '.call' do
+    context 'with :standard validation mode' do
+      it 'formats a simple URL with default options' do
+        result = described_class.call('https://example.com')
+        expect(result).to eq('https://example.com')
+      end
+      it 'removes trailing slash by default' do
+        result = described_class.call('https://example.com/')
+        expect(result).to eq('https://example.com')
+      end
+      it 'preserves subdomains' do
+        result = described_class.call('https://shop.example.com')
+        expect(result).to eq('https://shop.example.com')
+      end
+      it 'preserves www subdomain' do
+        result = described_class.call('https://www.example.com')
+        expect(result).to eq('https://www.example.com')
+      end
+      it 'preserves multi-level subdomains' do
+        result = described_class.call('https://api.staging.example.com')
+        expect(result).to eq('https://api.staging.example.com')
+      end
+      it 'handles URLs without protocol' do
+        result = described_class.call('example.com')
+        expect(result).to eq('https://example.com')
+      end
+      it 'strips path from URL' do
+        result = described_class.call('https://example.com/path/to/page')
+        expect(result).to eq('https://example.com')
+      end
+      it 'strips query parameters from URL' do
+        result = described_class.call('https://example.com?foo=bar')
+        expect(result).to eq('https://example.com')
+      end
+    end
+    context 'with :root_domain validation mode' do
+      it 'returns root domain for URL with subdomain' do
+        result = described_class.call('https://shop.example.com', validation: :root_domain)
+        expect(result).to eq('https://example.com')
+      end
+      it 'returns root domain for URL with www' do
+        result = described_class.call('https://www.example.com', validation: :root_domain)
+        expect(result).to eq('https://example.com')
+      end
+      it 'returns root domain for URL without subdomain' do
+        result = described_class.call('https://example.com', validation: :root_domain)
+        expect(result).to eq('https://example.com')
+      end
+      it 'returns root domain for multi-level subdomains' do
+        result = described_class.call('https://api.staging.example.com', validation: :root_domain)
+        expect(result).to eq('https://example.com')
+      end
+      it 'handles multi-part TLDs' do
+        result = described_class.call('https://shop.example.co.uk', validation: :root_domain)
+        expect(result).to eq('https://example.co.uk')
+      end
+    end
+    context 'with :root_or_custom_subdomain validation mode' do
+      it 'preserves root domain' do
+        result = described_class.call('https://example.com', validation: :root_or_custom_subdomain)
+        expect(result).to eq('https://example.com')
+      end
+      it 'preserves custom subdomains' do
+        result = described_class.call('https://shop.example.com', validation: :root_or_custom_subdomain)
+        expect(result).to eq('https://shop.example.com')
+      end
+      it 'strips www subdomain' do
+        result = described_class.call('https://www.example.com', validation: :root_or_custom_subdomain)
+        expect(result).to eq('https://example.com')
+      end
+      it 'preserves api subdomain' do
+        result = described_class.call('https://api.example.com', validation: :root_or_custom_subdomain)
+        expect(result).to eq('https://api.example.com')
+      end
+      it 'preserves multi-level custom subdomains' do
+        result = described_class.call('https://api.staging.example.com', validation: :root_or_custom_subdomain)
+        expect(result).to eq('https://api.staging.example.com')
+      end
+    end
+    context 'with use_protocol option' do
+      it 'includes protocol by default' do
+        result = described_class.call('https://example.com')
+        expect(result).to eq('https://example.com')
+      end
+      it 'includes protocol when use_protocol is true' do
+        result = described_class.call('https://example.com', use_protocol: true)
+        expect(result).to eq('https://example.com')
+      end
+      it 'excludes protocol when use_protocol is false' do
+        result = described_class.call('https://example.com', use_protocol: false)
+        expect(result).to eq('example.com')
+      end
+      it 'excludes protocol with subdomain' do
+        result = described_class.call('https://shop.example.com', use_protocol: false)
+        expect(result).to eq('shop.example.com')
+      end
+      it 'works with root_domain validation' do
+        result = described_class.call('https://shop.example.com',
+                                      validation: :root_domain,
+                                      use_protocol: false)
+        expect(result).to eq('example.com')
+      end
+    end
+    context 'with use_https option' do
+      it 'uses https by default' do
+        result = described_class.call('http://example.com')
+        expect(result).to eq('https://example.com')
+      end
+      it 'uses https when use_https is true' do
+        result = described_class.call('http://example.com', use_https: true)
+        expect(result).to eq('https://example.com')
+      end
+      it 'uses http when use_https is false' do
+        result = described_class.call('https://example.com', use_https: false)
+        expect(result).to eq('http://example.com')
+      end
+      it 'preserves http when use_https is false' do
+        result = described_class.call('http://example.com', use_https: false)
+        expect(result).to eq('http://example.com')
+      end
+      it 'ignores use_https when use_protocol is false' do
+        result = described_class.call('https://example.com',
+                                      use_protocol: false,
+                                      use_https: false)
+        expect(result).to eq('example.com')
+      end
+    end
+    context 'with use_trailing_slash option' do
+      it 'removes trailing slash by default' do
+        result = described_class.call('https://example.com/')
+        expect(result).to eq('https://example.com')
+      end
+      it 'removes trailing slash when use_trailing_slash is false' do
+        result = described_class.call('https://example.com/', use_trailing_slash: false)
+        expect(result).to eq('https://example.com')
+      end
+      it 'adds trailing slash when use_trailing_slash is true' do
+        result = described_class.call('https://example.com', use_trailing_slash: true)
+        expect(result).to eq('https://example.com/')
+      end
+      it 'preserves trailing slash when use_trailing_slash is true' do
+        result = described_class.call('https://example.com/', use_trailing_slash: true)
+        expect(result).to eq('https://example.com/')
+      end
+      it 'works without protocol' do
+        result = described_class.call('https://example.com',
+                                      use_protocol: false,
+                                      use_trailing_slash: true)
+        expect(result).to eq('example.com/')
+      end
+      it 'works with root_domain validation' do
+        result = described_class.call('https://shop.example.com',
+                                      validation: :root_domain,
+                                      use_trailing_slash: true)
+        expect(result).to eq('https://example.com/')
+      end
+    end
+    context 'with combined options' do
+      it 'formats with all options: root_domain, no protocol, with trailing slash' do
+        result = described_class.call('https://shop.example.com/path',
+                                      validation: :root_domain,
+                                      use_protocol: false,
+                                      use_trailing_slash: true)
+        expect(result).to eq('example.com/')
+      end
+      it 'formats with root_or_custom_subdomain, http protocol, no trailing slash' do
+        result = described_class.call('https://www.example.com/',
+                                      validation: :root_or_custom_subdomain,
+                                      use_https: false,
+                                      use_trailing_slash: false)
+        expect(result).to eq('http://example.com')
+      end
+      it 'formats with standard, no protocol, http, with trailing slash' do
+        result = described_class.call('https://api.example.com',
+                                      validation: :standard,
+                                      use_protocol: false,
+                                      use_trailing_slash: true)
+        expect(result).to eq('api.example.com/')
+      end
+      it 'strips www and adds trailing slash' do
+        result = described_class.call('https://www.example.com',
+                                      validation: :root_or_custom_subdomain,
+                                      use_trailing_slash: true)
+        expect(result).to eq('https://example.com/')
+      end
+    end
+    context 'with multi-part TLDs' do
+      it 'handles UK domains with standard mode' do
+        result = described_class.call('https://shop.example.co.uk')
+        expect(result).to eq('https://shop.example.co.uk')
+      end
+      it 'handles UK domains with root_domain mode' do
+        result = described_class.call('https://shop.example.co.uk', validation: :root_domain)
+        expect(result).to eq('https://example.co.uk')
+      end
+      it 'handles Australian domains' do
+        result = described_class.call('https://www.example.com.au',
+                                      validation: :root_or_custom_subdomain)
+        expect(result).to eq('https://example.com.au')
+      end
+    end
+    context 'with invalid input' do
+      it 'returns nil for invalid URLs' do
+        result = described_class.call('not-a-url')
+        expect(result).to be_nil
+      end
+      it 'returns nil for nil input' do
+        result = described_class.call(nil)
+        expect(result).to be_nil
+      end
+      it 'returns nil for empty string' do
+        result = described_class.call('')
+        expect(result).to be_nil
+      end
+      it 'returns nil for IP addresses' do
+        result = described_class.call('https://192.168.1.1')
+        expect(result).to be_nil
+      end
+    end
+    context 'with error handling' do
+      it 'raises error for invalid validation mode' do
+        expect do
+          described_class.call('https://example.com', validation: :invalid_mode)
+        end.to raise_error(ArgumentError, /Invalid validation mode/)
+      end
+    end
+  end
+end
+RSpec.describe DomainExtractor do
+  describe '.format' do
+    it 'delegates to Formatter.call' do
+      result = DomainExtractor.format('https://www.example.com/')
+      expect(result).to eq('https://www.example.com')
+    end
+    it 'passes options correctly' do
+      result = DomainExtractor.format('https://shop.example.com',
+                                      validation: :root_domain,
+                                      use_protocol: false)
+      expect(result).to eq('example.com')
+    end
+    it 'returns nil for invalid URLs' do
+      result = DomainExtractor.format('invalid-url')
+      expect(result).to be_nil
+    end
+  end
+end
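
The specs above exercise two distinct failure paths: invalid URLs return `nil`, while an unknown validation mode raises `ArgumentError`. A caller that wants a fallback value therefore only needs to handle `nil`, and can let the `ArgumentError` surface as a programming error. The following is a minimal sketch of that pattern. `format_or_default` and `STUB_FORMAT` are hypothetical, and the stub only imitates the contract so the example runs without the gem.

```ruby
# Hypothetical wrapper: `formatter` is any callable with the same contract as
# DomainExtractor.format (nil for invalid URLs, ArgumentError for a bad mode).
def format_or_default(formatter, url, default: nil, **options)
  formatter.call(url, **options) || default
end

# Stub that imitates the contract; it does not parse URLs properly.
STUB_FORMAT = lambda do |url, validation: :standard, **_rest|
  known = %i[standard root_domain root_or_custom_subdomain]
  raise ArgumentError, "Invalid validation mode: #{validation}" unless known.include?(validation)

  url.to_s.include?('.') ? "https://#{url}" : nil
end

format_or_default(STUB_FORMAT, 'example.com')                    # => "https://example.com"
format_or_default(STUB_FORMAT, 'not-a-url', default: 'unknown')  # => "unknown"
```

With the gem loaded, `DomainExtractor.method(:format)` could presumably be passed in place of the stub.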

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: domain_extractor
 version: !ruby/object:Gem::Version
-  version: 0.2.6
+  version: 0.2.8
 platform: ruby
 authors:
 - OpenSite AI
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2025-11-09 00:00:00.000000000 Z
+date: 2025-11-11 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: public_suffix
@@ -33,16 +33,17 @@ executables: []
 extensions: []
 extra_rdoc_files:
 - README.md
-- LICENSE.txt
+- LICENSE
 - CHANGELOG.md
 files:
 - ".rubocop.yml"
 - CHANGELOG.md
-- LICENSE.txt
+- LICENSE
 - README.md
 - lib/domain_extractor.rb
 - lib/domain_extractor/domain_validator.rb
 - lib/domain_extractor/errors.rb
+- lib/domain_extractor/formatter.rb
 - lib/domain_extractor/normalizer.rb
 - lib/domain_extractor/parsed_url.rb
 - lib/domain_extractor/parser.rb
@@ -52,11 +53,12 @@ files:
 - lib/domain_extractor/version.rb
 - spec/domain_extractor_spec.rb
 - spec/domain_validator_spec.rb
+- spec/formatter_spec.rb
 - spec/parsed_url_spec.rb
 - spec/spec_helper.rb
 homepage: https://github.com/opensite-ai/domain_extractor
 licenses:
-- MIT
+- BSD-3-Clause
 metadata:
   source_code_uri: https://github.com/opensite-ai/domain_extractor
   changelog_uri: https://github.com/opensite-ai/domain_extractor/blob/master/CHANGELOG.md

data/LICENSE.txt DELETED

@@ -1,21 +0,0 @@
-MIT License
-Copyright (c) 2025 OpenSite AI
-Permission is hereby granted, free of charge, to any person obtaining a copy
-of this software and associated documentation files (the "Software"), to deal
-in the Software without restriction, including without limitation the rights
-to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
-copies of the Software, and to permit persons to whom the Software is
-furnished to do so, subject to the following conditions:
-The above copyright notice and this permission notice shall be included in all
-copies or substantial portions of the Software.
-THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
-IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
-FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
-AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
-LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
-OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
-SOFTWARE.