RubyGems - webinspector - Versions diffs - 0.4.0 → 1.0.0 - Mend

webinspector 0.4.0 → 1.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (14)

checksums.yaml +5 -5
data/Gemfile +2 -0
data/README.md +58 -28
data/Rakefile +2 -1
data/bin/console +4 -3
data/lib/web_inspector/inspector.rb +192 -83
data/lib/web_inspector/meta.rb +36 -15
data/lib/web_inspector/page.rb +145 -61
data/lib/web_inspector/request.rb +10 -8
data/lib/web_inspector/version.rb +3 -1
data/lib/web_inspector.rb +4 -2
data/lib/webinspector.rb +3 -1
data/webinspector.gemspec +33 -26
metadata +103 -60

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
-SHA1:
-  metadata.gz: c9168c8258b2cc38cad1e30e12d1f42c07d2e0ce
-  data.tar.gz: 302adea791b1d4a4afd03a3fa36e5220244a9896
+SHA256:
+  metadata.gz: 0413d3ff948ab6efff6a1cbe8a7844287149ad06f09353655e6cb208968f9481
+  data.tar.gz: 152b950595afb57adc522da24c6959f71d160ba903b3d01ce6ee5f6a8b4d81d2
 SHA512:
-  metadata.gz: d43248b9c86fb8da996fa874a8ae3202ce9b033cc0d38c015ece9d55b65576a5e55a778929a8afa950feb3db4a51d2637dbbbf2ca6a1de5c460f433ccb1356a5
-  data.tar.gz: c5364539ff2f5701f01feff1d931bef49e27abeeeba5cbfc959c1d932a91e3e8276482cd7b7ed7b64d2f3c8ad83d2f174d91c830e462d840657ea670e490f89a
+  metadata.gz: c6230493b59a0d23585be729ec98706cfdbd6852e2de2d65db83d1638f85110369d41f2275b8e0aa09b58008d53924d036840ce63523d9004f19275999be90f8
+  data.tar.gz: dad518b0b04c1e341c14c29438ebcf84f4602bf394254a4c61382ad79ce53dae2ac92f138f331d5bf089d2220011f14b0cde0a79608a91177b2bda2ab1773a96
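
The hunk above drops the SHA1 entries in favor of SHA256 and keeps SHA512. As a minimal sketch of how such digests could be recomputed with Ruby's standard `digest` library (the helper name `gem_checksums` and the literal input are illustrative assumptions, not part of the gem or of RubyGems):

```ruby
require 'digest'

# Hypothetical helper: returns hex digests for a blob of bytes, using the
# two algorithms that appear in the 1.0.0 checksums.yaml.
def gem_checksums(bytes)
  {
    'SHA256' => Digest::SHA256.hexdigest(bytes),
    'SHA512' => Digest::SHA512.hexdigest(bytes)
  }
end

# In practice the input would be the raw bytes of metadata.gz or data.tar.gz,
# for example File.binread('data.tar.gz'). A literal is used here so the
# sketch runs on its own.
sums = gem_checksums('abc')
puts sums['SHA256']
puts sums['SHA512'].length
```

A SHA256 hex digest is 64 characters and a SHA512 hex digest is 128, which matches the lengths of the values listed in the new file.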

data/Gemfile CHANGED

@@ -1,3 +1,5 @@
+# frozen_string_literal: true
 source 'https://rubygems.org'
 # Specify your gem's dependencies in webinspector.gemspec

data/README.md CHANGED

@@ -1,10 +1,10 @@
-# Webinspector
+# WebInspector
-Ruby gem to inspect completely a web page. It scrapes a given URL, and returns you its title, description, meta, links, images and more.
+Ruby gem to inspect web pages. It scrapes a given URL and returns its title, description, meta tags, links, images, and more.
+<a href="https://codeclimate.com/github/davidesantangelo/webinspector"><img src="https://codeclimate.com/github/davidesantangelo/webinspector/badges/gpa.svg" /></a>
-## See it in action!
-You can try WebInspector live at this little demo: [https://scrappet.herokuapp.com](https://scrappet.herokuapp.com)
 ## Installation
 Add this line to your application's Gemfile:
@@ -23,58 +23,88 @@ Or install it yourself as:
 ## Usage
-Initialize a WebInspector instance for an URL, like this:
+### Initialize a WebInspector instance
 ```ruby
-page = WebInspector.new('http://davidesantangelo.com')
+page = WebInspector.new('http://example.com')
 ```
-## Accessing response status and headers
+### With options
-You can check the status and headers from the response like this:
+```ruby
+page = WebInspector.new('http://example.com', {
+  timeout: 30,                         # Request timeout in seconds (default: 30)
+  retries: 3,                          # Number of retries (default: 3)
+  headers: {'User-Agent': 'Custom UA'} # Custom HTTP headers
+})
+```
+### Accessing response status and headers
 ```ruby
 page.response.status  # 200
-page.response.headers # { "server"=>"apache", "content-type"=>"text/html; charset=utf-8", "cache-control"=>"must-revalidate, private, max-age=0", ... }
+page.response.headers # { "server"=>"apache", "content-type"=>"text/html; charset=utf-8", ... }
+page.status_code      # 200
+page.success?         # true if the page was loaded successfully
+page.error_message    # returns the error message if any
 ```
-## Accessing inpsected data
-You can see the data like this:
+### Accessing page data
 ```ruby
-page.url                 # URL of the page
-page.scheme              # Scheme of the page (http, https)
-page.host                # Hostname of the page (like, davidesantangelo.com, without the scheme)
-page.port                # Port of the page
-page.title               # title of the page from the head section, as string
-page.description         # description of the page
-page.links               # every link found
-page.images              # every image found
-page.meta                # metatags of the page
+page.url           # URL of the page
+page.scheme        # Scheme of the page (http, https)
+page.host          # Hostname of the page (like, example.com, without the scheme)
+page.port          # Port of the page
+page.title         # title of the page from the head section
+page.description   # description of the page
+page.links         # array of all links found on the page (absolute URLs)
+page.images        # array of all images found on the page (absolute URLs)
+page.meta          # meta tags of the page
+page.favicon       # favicon URL if available
 ```
-## Accessing meta tags
+### Working with meta tags
 ```ruby
-page.meta                 # metatags of the page
+page.meta                 # all meta tags
 page.meta['description']  # meta description
-page.meta['keywords']      # meta keywords
+page.meta['keywords']     # meta keywords
+page.meta['og:title']     # OpenGraph title
+```
+### Filtering links and images by domain
+```ruby
+page.domain_links('example.com')  # returns only links pointing to example.com
+page.domain_images('example.com') # returns only images hosted on example.com
+```
+### Searching for words
+```ruby
+page.find(["ruby", "rails"]) # returns [{"ruby"=>3}, {"rails"=>1}]
+```
+### Export all data to JSON
+```ruby
+page.to_hash # returns a hash with all page data
 ```
 ## Contributors
   * Steven Shelby ([@stevenshelby](https://github.com/stevenshelby))
-	* Sam Nissen ([@samnissen](https://github.com/samnissen))
+  * Sam Nissen ([@samnissen](https://github.com/samnissen))
 ## License
-The webinspector GEM is released under the MIT License.
+The WebInspector gem is released under the MIT License.
 ## Contributing
-1. Fork it ( https://github.com/[my-github-username]/webinspector/fork )
+1. Fork it ( https://github.com/davidesantangelo/webinspector/fork )
 2. Create your feature branch (`git checkout -b my-new-feature`)
 3. Commit your changes (`git commit -am 'Add some feature'`)
 4. Push to the branch (`git push origin my-new-feature`)
 5. Create a new Pull Request
->>>>>>> develop
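
The new README documents `page.find`, which returns one single-entry hash per search word. The following standalone sketch shows that result shape. It is modeled on the `counter` method added in `inspector.rb` but is not the gem's code: the name `count_words` is hypothetical, and `Regexp.escape` is an added assumption so that words containing regex metacharacters are matched literally.

```ruby
# Standalone sketch of case-insensitive word counting in the style of
# WebInspector's `find`. Not the gem's implementation.
def count_words(text, words)
  haystack = text.downcase
  words.map do |word|
    # Regexp.escape is an addition for this sketch; it keeps input such as
    # "c++" from being interpreted as a regex pattern.
    { word => haystack.scan(/#{Regexp.escape(word.downcase)}/).size }
  end
end

sample = 'Ruby on Rails is written in Ruby. I like ruby.'
result = count_words(sample, %w[ruby rails])
p result
```

Because the match is a plain substring scan, a word such as "rubyist" would also count toward "ruby"; adding `\b` word boundaries would be one way to tighten that.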

data/Rakefile CHANGED

@@ -1,2 +1,3 @@
-require "bundler/gem_tasks"
+# frozen_string_literal: true
+require 'bundler/gem_tasks'

data/bin/console CHANGED

@@ -1,7 +1,8 @@
 #!/usr/bin/env ruby
+# frozen_string_literal: true
-require "bundler/setup"
-require "webinspector"
+require 'bundler/setup'
+require 'webinspector'
 # You can add fixtures and/or initialization code here to make experimenting
 # with your gem easier. You can also use a different console, if you like.
@@ -10,5 +11,5 @@ require "webinspector"
 # require "pry"
 # Pry.start
-require "irb"
+require 'irb'
 IRB.start

data/lib/web_inspector/inspector.rb CHANGED

@@ -1,127 +1,236 @@
+# frozen_string_literal: true
 require File.expand_path(File.join(File.dirname(__FILE__), 'meta'))
 module WebInspector
   class Inspector
+    attr_reader :page, :url, :host, :meta
     def initialize(page)
       @page = page
       @meta = WebInspector::Meta.new(page).meta
+      @base_url = nil
+    end
+    def set_url(url, host)
+      @url = url
+      @host = host
     end
     def title
-      @page.css('title').inner_text.strip rescue nil
+      @page.css('title').inner_text.strip
+    rescue StandardError
+      nil
     end
     def description
-      @meta['description'] || snippet
+      @meta['description'] || @meta['og:description'] || snippet
     end
     def body
       @page.css('body').to_html
     end
-    def meta
-      @meta
+    # Search for specific words in the page content
+    # @param words [Array<String>] List of words to search for
+    # @return [Array<Hash>] Counts of word occurrences
+    def find(words)
+      text = @page.at('html').inner_text
+      counter(text.downcase, words)
     end
+    # Get all links from the page
+    # @return [Array<String>] Array of URLs
     def links
-      get_new_links unless @links
-      return @links
+      @links ||= begin
+        links = []
+        @page.css('a').each do |a|
+          href = a[:href]
+          next unless href
+          # Skip javascript and mailto links
+          next if href.start_with?('javascript:', 'mailto:', 'tel:')
+          # Clean and normalize URL
+          href = href.strip
+          begin
+            absolute_url = make_absolute_url(href)
+            links << absolute_url if absolute_url
+          rescue URI::InvalidURIError
+            # Skip invalid URLs
+          end
+        end
+        links.uniq
+      end
     end
-    def domain_links(user_domain, host)
+    # Get links from a specific domain
+    # @param user_domain [String] Domain to filter links by
+    # @param host [String] Current host
+    # @return [Array<String>] Filtered links
+    def domain_links(user_domain, host = nil)
       @host ||= host
-      validated_domain_uri = validate_url_domain("http://#{user_domain.downcase.gsub(/\s+/, '')}")
-      raise "Invalid domain provided" unless validated_domain_uri
-      domain = validated_domain_uri.domain
-      domain_links = []
-      links.each do |l|
-        u = validate_url_domain(l)
-        next unless u && u.domain
-        domain_links.push(l) if domain == u.domain.downcase
+      return [] if links.empty?
+      # Handle nil user_domain
+      user_domain = @host.to_s if user_domain.nil? || user_domain.empty?
+      # Normalize domain for comparison
+      user_domain = user_domain.to_s.downcase.gsub(/\s+/, '')
+      user_domain = user_domain.sub(/^www\./, '') # Remove www prefix for comparison
+      links.select do |link|
+        uri = URI.parse(link.to_s)
+        next false unless uri.host # Skip URLs without hosts
+        uri_host = uri.host.to_s.downcase
+        uri_host = uri_host.sub(/^www\./, '') # Remove www prefix for comparison
+        uri_host.include?(user_domain)
+      rescue URI::InvalidURIError, NoMethodError
+        false
       end
-      return domain_links.compact
     end
-    def domain_images(user_domain, host)
+    # Get all images from the page
+    # @return [Array<String>] Array of image URLs
+    def images
+      @images ||= begin
+        images = []
+        @page.css('img').each do |img|
+          src = img[:src]
+          next unless src
+          # Clean and normalize URL
+          src = src.strip
+          begin
+            absolute_url = make_absolute_url(src)
+            images << absolute_url if absolute_url
+          rescue URI::InvalidURIError, URI::BadURIError
+            # Skip invalid URLs
+          end
+        end
+        images.uniq.compact
+      end
+    end
+    # Get images from a specific domain
+    # @param user_domain [String] Domain to filter images by
+    # @param host [String] Current host
+    # @return [Array<String>] Filtered images
+    def domain_images(user_domain, host = nil)
       @host ||= host
-      validated_domain_uri = validate_url_domain("http://#{user_domain.downcase.gsub(/\s+/, '')}")
-      raise "Invalid domain provided" unless validated_domain_uri
-      domain = validated_domain_uri.domain
-      domain_images = []
-      images.each do |img|
-        u = validate_url_domain(img)
-        next unless u && u.domain
-        domain_images.push(img) if u.domain.downcase.end_with?(domain)
+      return [] if images.empty?
+      # Handle nil user_domain
+      user_domain = @host.to_s if user_domain.nil? || user_domain.empty?
+      # Normalize domain for comparison
+      user_domain = user_domain.to_s.downcase.gsub(/\s+/, '')
+      user_domain = user_domain.sub(/^www\./, '') # Remove www prefix for comparison
+      images.select do |img|
+        uri = URI.parse(img.to_s)
+        next false unless uri.host # Skip URLs without hosts
+        uri_host = uri.host.to_s.downcase
+        uri_host = uri_host.sub(/^www\./, '') # Remove www prefix for comparison
+        uri_host.include?(user_domain)
+      rescue URI::InvalidURIError, NoMethodError
+        false
       end
-      return domain_images.compact
     end
-    # Normalize and validate the URLs on the page for comparison
+    private
+    # Count occurrences of words in text
+    # @param text [String] Text to search in
+    # @param words [Array<String>] Words to find
+    # @return [Array<Hash>] Count results
+    def counter(text, words)
+      words.map do |word|
+        { word => text.scan(/#{word.downcase}/).size }
+      end
+    end
+    # Validate a URL domain
+    # @param u [String] URL to validate
+    # @return [PublicSuffix::Domain, false] Domain object or false if invalid
     def validate_url_domain(u)
-      # Enforce a few bare standards before proceeding
-      u = "#{u}"
-      u = "/" if u.empty?
+      u = u.to_s
+      u = '/' if u.empty?
       begin
-        # Look for evidence of a host. If this is a relative link
-        # like '/contact', add the page host.
-        domained_url   = @host + u unless (u.split("/").first || "").match(/(\:|\.)/)
-        domained_url ||= u
-        # http the URL if it is missing
-        httpped_url   = "http://" + domained_url unless domained_url[0..3] == 'http'
-        httpped_url ||= domained_url
-        # Make sure the URL parses
-        uri     = URI.parse(httpped_url)
-        # Make sure the URL passes ICANN rules.
-        # The PublicSuffix object splits the domain and subdomain
-        # (unlike URI), which allows more liberal URL matching.
-        return PublicSuffix.parse(uri.host)
-      rescue URI::InvalidURIError, PublicSuffix::DomainInvalid => e
-        return false
+        domained_url = if !(u.split('/').first || '').match(/(:|\.)/)
+                         @host + u
+                       else
+                         u
+                       end
+        httpped_url = domained_url.start_with?('http') ? domained_url : "http://#{domained_url}"
+        uri = URI.parse(httpped_url)
+        PublicSuffix.parse(uri.host)
+      rescue URI::InvalidURIError, PublicSuffix::DomainInvalid
+        false
       end
     end
-    def images
-      get_new_images unless @images
-      return @images
-    end
+    # Make a URL absolute
+    # @param url [String] URL to make absolute
+    # @return [String, nil] Absolute URL or nil if invalid
+    def make_absolute_url(url)
+      return nil if url.nil? || url.empty?
-    private
-    def get_new_images
-      @images = []
-      @page.css("img").each do |img|
-        @images.push((img[:src].to_s.start_with? @url.to_s) ? img[:src] : URI.join(url, img[:src]).to_s) if (img and img[:src])
+      # If it's already absolute, return it
+      return url if url.start_with?('http://', 'https://')
+      # Get base URL from the page if not already set
+      if @base_url.nil?
+        base_tag = @page.at_css('base[href]')
+        @base_url = base_tag ? base_tag['href'] : nil
       end
-    end
-    def get_new_links
-      @links = []
-      @page.css("a").each do |a|
-        @links.push((a[:href].to_s.start_with? @url.to_s) ? a[:href] : URI.join(@url, a[:href]).to_s) if (a and a[:href])
+      begin
+        # Try joining with base URL first if available
+        if @base_url && !@base_url.empty?
+          begin
+            return URI.join(@base_url, url).to_s
+          rescue URI::InvalidURIError, URI::BadURIError
+            # Fall through to next method
+          end
+        end
+        # If we have @url, try to use it
+        if @url
+          begin
+            return URI.join(@url, url).to_s
+          rescue URI::InvalidURIError, URI::BadURIError
+            # Fall through to next method
+          end
+        end
+        # Otherwise use a default http:// base if url is absolute path
+        return "http://#{@host}#{url}" if url.start_with?('/')
+        # For truly relative URLs with no base, we need to make our best guess
+        return "http://#{@host}/#{url}" if @host
+        # Last resort, return the original
+        url
+      rescue URI::InvalidURIError, URI::BadURIError
+        url # Return original instead of nil to be more lenient
       end
     end
+    # Extract a snippet from the first long paragraph
+    # @return [String] Text snippet
     def snippet
       first_long_paragraph = @page.search('//p[string-length() >= 120]').first
-      first_long_paragraph ? first_long_paragraph.text : ''
+      first_long_paragraph ? first_long_paragraph.text.strip[0..255] : ''
     end
   end
-end
+end
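
Two ideas in the rewritten `inspector.rb` are worth seeing in isolation: resolving each `href` or `src` against the page URL, and filtering the results by host. In the diff, `domain_links` and `domain_images` compare hosts with `String#include?`, which is a substring test. The sketch below uses only the standard `uri` library. Both helper names are hypothetical, and `host_in_domain?` is a stricter alternative shown for comparison, not what the gem does.

```ruby
require 'uri'

# Resolve an href against a page URL, in the spirit of make_absolute_url.
# Returns nil when the pieces cannot be parsed or joined.
def absolutize(page_url, href)
  URI.join(page_url, href.strip).to_s
rescue URI::InvalidURIError, URI::BadURIError
  nil
end

# Hypothetical stricter check: a host matches only if it equals the domain
# or is a subdomain of it (a leading "www." is ignored on both sides).
def host_in_domain?(url, domain)
  host = URI.parse(url).host
  return false unless host

  host = host.downcase.sub(/\Awww\./, '')
  domain = domain.downcase.sub(/\Awww\./, '')
  host == domain || host.end_with?(".#{domain}")
rescue URI::InvalidURIError
  false
end

hrefs = ['/about', 'img/logo.png', 'https://blog.example.com/post', 'https://notexample.com/']
links = hrefs.map { |href| absolutize('http://example.com/docs/index.html', href) }.compact

p links
p links.select { |link| host_in_domain?(link, 'example.com') }
```

With a substring test, `notexample.com` would pass a filter for `example.com`; the suffix check rejects it while still accepting `blog.example.com`.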

data/lib/web_inspector/meta.rb CHANGED

@@ -1,15 +1,18 @@
+# frozen_string_literal: true
 module WebInspector
   class Meta
-  	def initialize(page)
+    def initialize(page)
       @page = page
     end
     def meta_tags
       {
-        'name'        => meta_tags_by('name'),
-        'http-equiv'  => meta_tags_by('http-equiv'),
-        'property'    => meta_tags_by('property'),
-        'charset'     => [charset_from_meta_charset]
+        'name' => meta_tags_by('name'),
+        'http-equiv' => meta_tags_by('http-equiv'),
+        'property' => meta_tags_by('property'),
+        'charset' => [charset_from_meta_charset],
+        'itemprop' => meta_tags_by('itemprop') # Add support for schema.org microdata
       }
     end
@@ -19,30 +22,48 @@ module WebInspector
     def meta
       meta_tag['name']
-          .merge(meta_tag['http-equiv'])
-          .merge(meta_tag['property'])
-          .merge('charset' => meta_tag['charset'])
+        .merge(meta_tag['http-equiv'])
+        .merge(meta_tag['property'])
+        .merge(meta_tag['itemprop'] || {})
+        .merge('charset' => meta_tag['charset'])
     end
     def charset
-      @charset ||= (charset_from_meta_charset || charset_from_meta_content_type)
+      @charset ||= charset_from_meta_charset || charset_from_meta_content_type || charset_from_header || 'utf-8'
     end
     private
     def charset_from_meta_charset
-      @page.css('meta[charset]')[0].attributes['charset'].value rescue nil
+      @page.css('meta[charset]')[0].attributes['charset'].value
+    rescue StandardError
+      nil
     end
     def charset_from_meta_content_type
-      @page.css("meta[http-equiv='Content-Type']")[0].attributes['content'].value.split(';')[1].split('=')[1] rescue nil
+      @page.css("meta[http-equiv='Content-Type']")[0].attributes['content'].value.split(';')[1].strip.split('=')[1]
+    rescue StandardError
+      nil
+    end
+    def charset_from_header
+      # Try to get charset from Content-Type header if available
+      nil
     end
-   	def meta_tags_by(attribute)
+    def meta_tags_by(attribute)
       hash = {}
       @page.css("meta[@#{attribute}]").map do |tag|
-        name    = tag.attributes[attribute].value.downcase rescue nil
-        content = tag.attributes['content'].value rescue nil
+        name = begin
+          tag.attributes[attribute].value.downcase
+        rescue StandardError
+          nil
+        end
+        content = begin
+          tag.attributes['content'].value
+        rescue StandardError
+          nil
+        end
         if name && content
           hash[name] ||= []
@@ -64,4 +85,4 @@ module WebInspector
       end
     end
   end
-end
+end
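
The new `charset` method in `meta.rb` falls back through several sources and ends at `'utf-8'`; in this release `charset_from_header` is a stub that always returns `nil`. A minimal sketch of that fallback order on plain strings (no Nokogiri involved; the helper names `charset_from_content_type` and `pick_charset` are assumptions for illustration):

```ruby
# Extract the charset from a Content-Type style value, using the same
# split/strip chain as the diff. Returns nil when no charset is present.
def charset_from_content_type(content)
  content.split(';')[1].strip.split('=')[1]
rescue StandardError
  nil
end

# Hypothetical helper mirroring the fallback order of Meta#charset:
# <meta charset>, then the http-equiv Content-Type value, then 'utf-8'.
def pick_charset(meta_charset, content_type)
  meta_charset || charset_from_content_type(content_type) || 'utf-8'
end

puts pick_charset(nil, 'text/html; charset=ISO-8859-1')
puts pick_charset(nil, 'text/html')
puts pick_charset('utf-16', 'text/html; charset=ISO-8859-1')
```

The blanket `rescue StandardError` keeps the one-line chain short, at the cost of hiding the reason a value could not be parsed.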

data/lib/web_inspector/page.rb CHANGED

@@ -1,3 +1,5 @@
+# frozen_string_literal: true
 require 'nokogiri'
 require 'uri'
 require 'open-uri'
@@ -5,129 +7,211 @@ require 'open_uri_redirections'
 require 'faraday'
 require 'public_suffix'
+# Explicitly load Faraday::Retry if available
+begin
+  require 'faraday/retry'
+rescue LoadError
+  # Faraday retry is not available
+end
 require File.expand_path(File.join(File.dirname(__FILE__), 'inspector'))
 require File.expand_path(File.join(File.dirname(__FILE__), 'request'))
 module WebInspector
   class Page
-    attr_reader :url, :scheme, :host, :port, :title, :description, :body, :meta, :links, :domain_links, :domain_images, :images, :response
+    attr_reader :url, :scheme, :host, :port, :title, :description, :body, :meta, :links,
+                :domain_links, :domain_images, :images, :response, :status_code, :favicon
+    DEFAULT_TIMEOUT = 30
+    DEFAULT_RETRIES = 3
+    DEFAULT_USER_AGENT = -> { "WebInspector/#{WebInspector::VERSION} (+https://github.com/davidesantangelo/webinspector)" }
+    # Initialize a new WebInspector Page
+    #
+    # @param url [String] The URL to inspect
+    # @param options [Hash] Optional parameters
+    # @option options [Integer] :timeout Request timeout in seconds
+    # @option options [Integer] :retries Number of retries for failed requests
+    # @option options [Hash] :headers Custom HTTP headers
+    # @option options [Boolean] :allow_redirections Whether to follow redirects
+    # @option options [String] :user_agent Custom user agent
     def initialize(url, options = {})
       @url = url
       @options = options
+      @retries = options[:retries] || DEFAULT_RETRIES
+      @timeout = options[:timeout] || DEFAULT_TIMEOUT
+      @headers = options[:headers] || { 'User-Agent' => options[:user_agent] || DEFAULT_USER_AGENT.call }
+      @allow_redirections = options[:allow_redirections].nil? || options[:allow_redirections]
       @request = WebInspector::Request.new(url)
-      @inspector = WebInspector::Inspector.new(page)
-    end
-    def title
-      @inspector.title
+      begin
+        @inspector = WebInspector::Inspector.new(page)
+        @inspector.set_url(url, host)
+        @status_code = 200
+      rescue StandardError => e
+        @error = e
+        @status_code = e.respond_to?(:status_code) ? e.status_code : 500
+      end
     end
-    def description
-      @inspector.description
+    # Check if the page was successfully loaded
+    #
+    # @return [Boolean] true if the page was loaded, false otherwise
+    def success?
+      !@inspector.nil? && !@error
     end
-    def body
-      @inspector.body
-    end
-    def links
-      @inspector.links
+    # Get the error message if any
+    #
+    # @return [String, nil] The error message or nil if no error
+    def error_message
+      @error&.message
     end
-    def images
-      @inspector.images
-    end
+    # Delegate methods to inspector
+    %i[title description body links images meta].each do |method|
+      define_method(method) do
+        return nil unless success?
-    def meta
-      @inspector.meta
+        @inspector.send(method)
+      end
     end
-    def url
-      @request.url
-    end
+    # Special case for find method that takes arguments
+    def find(words)
+      return nil unless success?
-    def host
-      @request.host
+      @inspector.find(words)
     end
-    def domain
-      @request.domain
+    # Delegate methods to request
+    %i[url host domain scheme port].each do |method|
+      define_method(method) do
+        @request.send(method)
+      end
     end
-    def scheme
-      @request.scheme
-    end
+    # Get the favicon URL if available
+    #
+    # @return [String, nil] The favicon URL or nil if not found
+    def favicon
+      return @favicon if defined?(@favicon)
-    def port
-      @request.port
+      return nil unless success?
+      @favicon = begin
+        # Try multiple approaches to find favicon
+        # 1. Look for standard favicon link tags
+        favicon_link = @inspector.page.css("link[rel='shortcut icon'], link[rel='icon'], link[rel='apple-touch-icon']").first
+        if favicon_link && favicon_link['href']
+          begin
+            return URI.join(url, favicon_link['href']).to_s
+          rescue URI::InvalidURIError
+            # Try next method
+          end
+        end
+        # 2. Try the default location /favicon.ico
+        "#{scheme}://#{host}/favicon.ico"
+      rescue StandardError
+        nil
+      end
     end
     def domain_links(u = domain)
+      return [] unless success?
       @inspector.domain_links(u, host)
     end
     def domain_images(u = domain)
+      return [] unless success?
       @inspector.domain_images(u, host)
     end
+    # Get full JSON representation of the page
+    #
+    # @return [Hash] JSON representation of the page
     def to_hash
       {
-        'url'           => url,
-        'scheme'        => scheme,
-        'host'          => host,
-        'port'          => port,
-        'title'         => title,
-        'description'  	=> description,
-        'meta'  				=> meta,
-        'links'					=> links,
-        'images'				=> images,
-        'response'      => { 'status'  => response.status,
-                             'headers' => response.headers }
+        'url' => url,
+        'scheme' => scheme,
+        'host' => host,
+        'port' => port,
+        'title' => title,
+        'description' => description,
+        'meta' => meta,
+        'links' => links,
+        'images' => images,
+        'favicon' => favicon,
+        'response' => {
+          'status' => status_code,
+          'headers' => response&.headers || {},
+          'success' => success?
+        },
+        'error' => error_message
       }
     end
     def response
       @response ||= fetch
-    rescue Faraday::TimeoutError, Faraday::Error::ConnectionFailed, RuntimeError, URI::InvalidURIError => e
+    rescue StandardError => e
+      @error = e
       nil
     end
     private
     def fetch
-      session = Faraday.new(:url => url) do |faraday|
-        faraday.request :retry, max: @retries
+      session = Faraday.new(url: url) do |faraday|
+        # Configure retries based on available middleware
+        faraday.request :retry, { max: @retries } if defined?(Faraday::Retry)
+        # Configure redirect handling
         if @allow_redirections
-          faraday.use FaradayMiddleware::FollowRedirects, limit: 10
-          faraday.use :cookie_jar
+          begin
+            faraday.use FaradayMiddleware::FollowRedirects, limit: 10
+            faraday.use :cookie_jar
+          rescue NameError, NoMethodError
+            # Continue without middleware if not available
+          end
         end
-        faraday.headers.merge!(@headers || {})
+        faraday.headers.merge!(@headers)
         faraday.adapter :net_http
       end
-      response = session.get do |req|
-        req.options.timeout      = @connection_timeout
-        req.options.open_timeout = @read_timeout
-      end
+      # Manual retry mechanism as a backup
+      retries = 0
-      @url = response.env.url.to_s
+      begin
+        response = session.get do |req|
+          req.options.timeout = @timeout
+          req.options.open_timeout = @timeout
+        end
-      response
+        @url = response.env.url.to_s
+        response
+      rescue Faraday::TimeoutError, Faraday::ConnectionFailed => e
+        retries += 1
+        retry if retries <= @retries
+        raise e
+      end
     end
     def with_default_scheme(request)
-      request.url && request.scheme.nil? ? 'http://' + request.url : request.url
-    end
-    def default_user_agent
-      "WebInspector/#{WebInspector::VERSION} (+https://github.com/davidesantangelo/webinspector)"
+      request.url && request.scheme.nil? ? "http://#{request.url}" : request.url
     end
     def page
-      Nokogiri::HTML(open(with_default_scheme(@request), allow_redirections: :safe))
+      # Use URI.open instead of open for Ruby 3.0+ compatibility
+      Nokogiri::HTML(URI.open(with_default_scheme(@request),
+                              allow_redirections: :safe,
+                              read_timeout: @timeout,
+                              'User-Agent' => @headers['User-Agent']))
     end
   end
 end
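
The `fetch` method above adds a manual retry loop built on Ruby's `retry` keyword. Since the counter is incremented before the comparison `retries <= @retries`, a request is attempted once and then retried up to `@retries` more times. The sketch below reproduces that control flow without any HTTP client; `with_retries` and the simulated `IOError` are stand-ins, not part of the gem.

```ruby
# Hypothetical stand-in for the manual retry loop in Page#fetch: the block is
# attempted once, then retried up to max_retries more times on listed errors.
def with_retries(max_retries, errors: [StandardError])
  attempts = 0
  begin
    attempts += 1
    yield attempts
  rescue *errors
    # `retry` re-runs the begin block; `attempts` lives outside it and persists.
    retry if attempts <= max_retries
    raise
  end
end

calls = 0
result = with_retries(3, errors: [IOError]) do |attempt|
  calls += 1
  raise IOError, 'simulated timeout' if attempt < 3

  "ok after #{attempt} attempts"
end
puts result
```

When a retry middleware is also active on the connection, the two mechanisms may stack, so the total number of requests can exceed what either setting suggests on its own.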

data/lib/web_inspector/request.rb CHANGED

@@ -1,3 +1,5 @@
+# frozen_string_literal: true
 require 'addressable/uri'
 module WebInspector
@@ -13,7 +15,7 @@ module WebInspector
     def host
       uri.host
     end
     def domain
       suffix_domain
     end
@@ -24,23 +26,23 @@ module WebInspector
     def port
       URI(normalized_uri).port
-    end
+    end
     private
     def suffix_domain
       return @domain if @domain
       begin
         @domain = PublicSuffix.parse(host).domain
-      rescue URI::InvalidURIError, PublicSuffix::DomainInvalid => e
+      rescue URI::InvalidURIError, PublicSuffix::DomainInvalid
         @domain = ''
       end
     end
     def uri
       Addressable::URI.parse(@url)
-    rescue Addressable::URI::InvalidURIError => e
+    rescue Addressable::URI::InvalidURIError
       nil
     end
@@ -48,4 +50,4 @@ module WebInspector
       uri.normalize.to_s
     end
   end
-end
+end

data/lib/web_inspector/version.rb CHANGED

@@ -1,3 +1,5 @@
+# frozen_string_literal: true
 module WebInspector
-  VERSION = "0.4.0"
+  VERSION = '1.0.0'
 end

data/lib/web_inspector.rb CHANGED

@@ -1,10 +1,12 @@
+# frozen_string_literal: true
 require File.expand_path(File.join(File.dirname(__FILE__), 'web_inspector/page'))
 require File.expand_path(File.join(File.dirname(__FILE__), 'web_inspector/version'))
 module WebInspector
-  extend self
+  module_function
   def new(url, options = {})
     Page.new(url, options)
   end
-end
+end
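
Replacing `extend self` with `module_function` keeps `WebInspector.new(...)` callable on the module, but it changes what happens if the module is ever included elsewhere: `module_function` makes the instance-level copy of each method private. A small sketch with made-up module and class names illustrates the difference:

```ruby
module WithExtendSelf
  extend self

  def build(name)
    "built #{name}"
  end
end

module WithModuleFunction
  module_function

  def build(name)
    "built #{name}"
  end
end

# Hypothetical classes that mix the modules in.
class PublicHost
  include WithExtendSelf
end

class PrivateHost
  include WithModuleFunction
end

puts WithExtendSelf.build('page')             # callable on the module
puts WithModuleFunction.build('page')         # callable on the module
puts PublicHost.new.respond_to?(:build)       # true
puts PrivateHost.new.respond_to?(:build)      # false
```

For code that only ever calls `WebInspector.new`, the two forms behave the same.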

data/lib/webinspector.rb CHANGED

@@ -1 +1,3 @@
-require File.expand_path(File.join(File.dirname(__FILE__), './web_inspector'))
+# frozen_string_literal: true
+require File.expand_path(File.join(File.dirname(__FILE__), './web_inspector'))

data/webinspector.gemspec CHANGED

@@ -1,38 +1,45 @@
-# coding: utf-8
-lib = File.expand_path('../lib', __FILE__)
+# frozen_string_literal: true
+lib = File.expand_path('lib', __dir__)
 $LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
-require File.expand_path('../lib/web_inspector/version', __FILE__)
+require File.expand_path('lib/web_inspector/version', __dir__)
 Gem::Specification.new do |spec|
-  spec.name          = "webinspector"
+  spec.name          = 'webinspector'
   spec.version       = WebInspector::VERSION
-  spec.authors       = ["Davide Santangelo"]
-  spec.email         = ["davide.santangelo@gmail.com"]
+  spec.authors       = ['Davide Santangelo']
+  spec.email         = ['davide.santangelo@gmail.com']
-  spec.summary       = %q{Ruby gem to inspect completely a web page.}
-  spec.description   = %q{Ruby gem to inspect completely a web page. It scrapes a given URL, and returns you its meta, links, images and more.}
-  spec.homepage      = ""
-  spec.license       = "MIT"
+  spec.summary       = 'Ruby gem to inspect completely a web page.'
+  spec.description   = 'Ruby gem to inspect completely a web page. It scrapes a given URL, and returns you its meta, links, images and more.'
+  spec.homepage      = 'https://github.com/davidesantangelo/webinspector'
+  spec.license       = 'MIT'
   spec.files         = `git ls-files -z`.split("\x0").reject { |f| f.match(%r{^(test|spec|features)/}) }
-  spec.bindir        = "exe"
+  spec.bindir        = 'exe'
   spec.executables   = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
-  spec.require_paths = ["lib"]
-  spec.add_development_dependency "bundler", "~> 1.8"
-  spec.add_development_dependency "rake", "~> 10.0"
+  spec.require_paths = ['lib']
+  spec.metadata      = {
+    'source_code_uri' => 'https://github.com/davidesantangelo/webinspector',
+    'bug_tracker_uri' => 'https://github.com/davidesantangelo/webinspector/issues'
+  }
-  spec.add_development_dependency 'rspec'
-  spec.add_development_dependency "vcr"
-  spec.add_development_dependency "typhoeus"
+  spec.required_ruby_version = '>= 3.0.0'
-  spec.required_ruby_version = ">= 1.9.3"
+  spec.add_development_dependency 'rake', '~> 13.0'
+  spec.add_development_dependency 'rspec', '~> 3.12'
+  spec.add_development_dependency 'rubocop', '~> 1.50'
+  spec.add_development_dependency 'vcr', '~> 6.1'
+  spec.add_development_dependency 'webmock', '~> 3.18'
-  spec.add_dependency "faraday"
-  spec.add_dependency "json"
-  spec.add_dependency "addressable"
-  spec.add_dependency "nokogiri"
-  spec.add_dependency "open_uri_redirections"
-  spec.add_dependency "openurl"
-  spec.add_dependency "public_suffix"
+  spec.add_dependency 'addressable', '~> 2.8'
+  spec.add_dependency 'faraday', '~> 2.7'
+  spec.add_dependency 'faraday-cookie_jar', '~> 0.0.7'
+  spec.add_dependency 'faraday-follow_redirects', '~> 0.3'
+  spec.add_dependency 'faraday-retry', '~> 2.1'
+  spec.add_dependency 'json', '~> 2.6'
+  spec.add_dependency 'nokogiri', '~> 1.14'
+  spec.add_dependency 'open_uri_redirections', '~> 0.2'
+  spec.add_dependency 'openurl', '~> 1.0'
+  spec.add_dependency 'public_suffix', '~> 5.0'
 end
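
The gemspec above replaces unconstrained runtime dependencies with pessimistic (`~>`) constraints and raises the minimum Ruby version from 1.9.3 to 3.0.0. The following is a minimal sketch of what those constraints admit, using RubyGems' own `Gem::Requirement` and `Gem::Version` classes. The `allows?` helper and the sample version numbers are assumptions for illustration and are not part of webinspector.

```ruby
require 'rubygems' # bundled with Ruby; provides Gem::Requirement and Gem::Version

# Hypothetical helper (not in the gem): does `constraint` admit `version`?
def allows?(constraint, version)
  Gem::Requirement.new(constraint).satisfied_by?(Gem::Version.new(version))
end

# '~> 2.7' means >= 2.7 and < 3.0, so later 2.x releases are accepted.
puts allows?('~> 2.7', '2.9.1')   # true
puts allows?('~> 2.7', '3.0.0')   # false

# A three-segment constraint is tighter: '~> 0.0.7' means >= 0.0.7 and < 0.1.0.
puts allows?('~> 0.0.7', '0.0.8') # true
puts allows?('~> 0.0.7', '0.1.0') # false

# The new required_ruby_version rejects interpreters older than 3.0.0.
puts allows?('>= 3.0.0', '2.7.8') # false
```

One consequence worth noting: `faraday-cookie_jar` is pinned to the `0.0.x` series, so a `0.1.0` release of that gem would not resolve without a gemspec change.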

metadata CHANGED

@@ -1,183 +1,225 @@
 --- !ruby/object:Gem::Specification
 name: webinspector
 version: !ruby/object:Gem::Version
-  version: 0.4.0
+  version: 1.0.0
 platform: ruby
 authors:
 - Davide Santangelo
-autorequire:
+autorequire:
 bindir: exe
 cert_chain: []
-date: 2015-06-09 00:00:00.000000000 Z
+date: 2025-03-18 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
-  name: bundler
+  name: rake
   requirement: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: '1.8'
+        version: '13.0'
   type: :development
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: '1.8'
+        version: '13.0'
 - !ruby/object:Gem::Dependency
-  name: rake
+  name: rspec
   requirement: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: '10.0'
+        version: '3.12'
   type: :development
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: '10.0'
+        version: '3.12'
 - !ruby/object:Gem::Dependency
-  name: rspec
+  name: rubocop
   requirement: !ruby/object:Gem::Requirement
     requirements:
-    - - ">="
+    - - "~>"
       - !ruby/object:Gem::Version
-        version: '0'
+        version: '1.50'
   type: :development
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
-    - - ">="
+    - - "~>"
       - !ruby/object:Gem::Version
-        version: '0'
+        version: '1.50'
 - !ruby/object:Gem::Dependency
   name: vcr
   requirement: !ruby/object:Gem::Requirement
     requirements:
-    - - ">="
+    - - "~>"
       - !ruby/object:Gem::Version
-        version: '0'
+        version: '6.1'
   type: :development
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
-    - - ">="
+    - - "~>"
       - !ruby/object:Gem::Version
-        version: '0'
+        version: '6.1'
 - !ruby/object:Gem::Dependency
-  name: typhoeus
+  name: webmock
   requirement: !ruby/object:Gem::Requirement
     requirements:
-    - - ">="
+    - - "~>"
       - !ruby/object:Gem::Version
-        version: '0'
+        version: '3.18'
   type: :development
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
-    - - ">="
+    - - "~>"
       - !ruby/object:Gem::Version
-        version: '0'
+        version: '3.18'
+- !ruby/object:Gem::Dependency
+  name: addressable
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '2.8'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '2.8'
 - !ruby/object:Gem::Dependency
   name: faraday
   requirement: !ruby/object:Gem::Requirement
     requirements:
-    - - ">="
+    - - "~>"
       - !ruby/object:Gem::Version
-        version: '0'
+        version: '2.7'
   type: :runtime
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
-    - - ">="
+    - - "~>"
       - !ruby/object:Gem::Version
-        version: '0'
+        version: '2.7'
 - !ruby/object:Gem::Dependency
-  name: json
+  name: faraday-cookie_jar
   requirement: !ruby/object:Gem::Requirement
     requirements:
-    - - ">="
+    - - "~>"
       - !ruby/object:Gem::Version
-        version: '0'
+        version: 0.0.7
   type: :runtime
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
-    - - ">="
+    - - "~>"
       - !ruby/object:Gem::Version
-        version: '0'
+        version: 0.0.7
 - !ruby/object:Gem::Dependency
-  name: addressable
+  name: faraday-follow_redirects
   requirement: !ruby/object:Gem::Requirement
     requirements:
-    - - ">="
+    - - "~>"
       - !ruby/object:Gem::Version
-        version: '0'
+        version: '0.3'
   type: :runtime
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
-    - - ">="
+    - - "~>"
       - !ruby/object:Gem::Version
-        version: '0'
+        version: '0.3'
+- !ruby/object:Gem::Dependency
+  name: faraday-retry
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '2.1'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '2.1'
+- !ruby/object:Gem::Dependency
+  name: json
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '2.6'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '2.6'
 - !ruby/object:Gem::Dependency
   name: nokogiri
   requirement: !ruby/object:Gem::Requirement
     requirements:
-    - - ">="
+    - - "~>"
       - !ruby/object:Gem::Version
-        version: '0'
+        version: '1.14'
   type: :runtime
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
-    - - ">="
+    - - "~>"
       - !ruby/object:Gem::Version
-        version: '0'
+        version: '1.14'
 - !ruby/object:Gem::Dependency
   name: open_uri_redirections
   requirement: !ruby/object:Gem::Requirement
     requirements:
-    - - ">="
+    - - "~>"
       - !ruby/object:Gem::Version
-        version: '0'
+        version: '0.2'
   type: :runtime
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
-    - - ">="
+    - - "~>"
       - !ruby/object:Gem::Version
-        version: '0'
+        version: '0.2'
 - !ruby/object:Gem::Dependency
   name: openurl
   requirement: !ruby/object:Gem::Requirement
     requirements:
-    - - ">="
+    - - "~>"
       - !ruby/object:Gem::Version
-        version: '0'
+        version: '1.0'
   type: :runtime
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
-    - - ">="
+    - - "~>"
       - !ruby/object:Gem::Version
-        version: '0'
+        version: '1.0'
 - !ruby/object:Gem::Dependency
   name: public_suffix
   requirement: !ruby/object:Gem::Requirement
     requirements:
-    - - ">="
+    - - "~>"
       - !ruby/object:Gem::Version
-        version: '0'
+        version: '5.0'
   type: :runtime
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
-    - - ">="
+    - - "~>"
       - !ruby/object:Gem::Version
-        version: '0'
+        version: '5.0'
 description: Ruby gem to inspect completely a web page. It scrapes a given URL, and
   returns you its meta, links, images and more.
 email:
@@ -203,11 +245,13 @@ files:
 - lib/web_inspector/version.rb
 - lib/webinspector.rb
 - webinspector.gemspec
-homepage: ''
+homepage: https://github.com/davidesantangelo/webinspector
 licenses:
 - MIT
-metadata: {}
-post_install_message:
+metadata:
+  source_code_uri: https://github.com/davidesantangelo/webinspector
+  bug_tracker_uri: https://github.com/davidesantangelo/webinspector/issues
+post_install_message:
 rdoc_options: []
 require_paths:
 - lib
@@ -215,16 +259,15 @@ required_ruby_version: !ruby/object:Gem::Requirement
   requirements:
   - - ">="
     - !ruby/object:Gem::Version
-      version: 1.9.3
+      version: 3.0.0
 required_rubygems_version: !ruby/object:Gem::Requirement
   requirements:
   - - ">="
     - !ruby/object:Gem::Version
       version: '0'
 requirements: []
-rubyforge_project:
-rubygems_version: 2.4.6
-signing_key:
+rubygems_version: 3.3.26
+signing_key:
 specification_version: 4
 summary: Ruby gem to inspect completely a web page.
 test_files: []