broken_link_finder 0.9.5 → 0.10.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/CHANGELOG.md +11 -0
- data/Gemfile.lock +3 -3
- data/README.md +18 -14
- data/benchmark.rb +9 -5
- data/bin/console +4 -14
- data/exe/broken_link_finder +5 -2
- data/lib/broken_link_finder.rb +5 -1
- data/lib/broken_link_finder/finder.rb +85 -49
- data/lib/broken_link_finder/reporter/html_reporter.rb +134 -0
- data/lib/broken_link_finder/reporter/reporter.rb +77 -0
- data/lib/broken_link_finder/reporter/text_reporter.rb +86 -0
- data/lib/broken_link_finder/version.rb +1 -1
- metadata +6 -5
- data/lib/broken_link_finder/reporter.rb +0 -116
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 7a53784c1bd2f75c18b3492ea782b4cc2e229a94f89afcf33b60ef633512554e
+  data.tar.gz: 393dca220b7f00d72314c93e7b877e0412afdf784fa2e563bbecb2dc6c6b29f7
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: c0d304e5b0a9258265c5c084c0a6e5819c169ba8eb02b3c6317a37784a9ca12982b0fc520c3cca1060fde60126ee936708d7891c69133c5d72c9c0287a79b3f5
+  data.tar.gz: c21a4aec2c077e2617fb625debad28f746148ad98229a27a590a4412601e30759c709aa3a6e6d80e81c16160e16968fc0392181fc9c75e4da06578452f7c5ab6
data/CHANGELOG.md
CHANGED
@@ -9,6 +9,17 @@
 - ...
 ---
 
+## v0.10.0
+### Added
+- A `--html` flag to the `crawl` executable command which produces a HTML report (instead of text).
+- Added a 'retry' mechanism for any broken links found. This is essentially a verification step before generating a report.
+- `Finder#crawl_stats` for info such as crawl duration, total links crawled etc.
+### Changed/Removed
+- The API has changed somewhat. See the [docs](https://www.rubydoc.info/gems/broken_link_finder) for the up to date code signatures if you're using `broken_link_finder` outside of its executable.
+### Fixed
+- ...
+---
+
 ## v0.9.5
 ### Added
 - ...
data/Gemfile.lock
CHANGED
@@ -1,7 +1,7 @@
 PATH
   remote: .
   specs:
-    broken_link_finder (0.9.5)
+    broken_link_finder (0.10.0)
       thor (~> 0.20)
       thread (~> 0.2)
       wgit (~> 0.5)
@@ -18,7 +18,7 @@ GEM
       safe_yaml (~> 1.0.0)
     ethon (0.12.0)
       ffi (>= 1.3.0)
-    ffi (1.11.
+    ffi (1.11.3)
     hashdiff (1.0.0)
     maxitest (3.4.0)
       minitest (>= 5.0.0, < 5.13.0)
@@ -65,4 +65,4 @@ RUBY VERSION
    ruby 2.5.3p105
 
 BUNDLED WITH
-   2.0.
+   2.0.2
data/README.md
CHANGED
@@ -57,7 +57,7 @@ Installing this gem installs the `broken_link_finder` executable into your `$PAT
 
     $ broken_link_finder crawl http://txti.es
 
-Adding the
+Adding the `--recursive` flag would crawl the entire `txti.es` site, not just its index page.
 
 See the [output](#Output) section below for an example of a site with broken links.
 
@@ -76,7 +76,7 @@ require 'broken_link_finder'
 
 finder = BrokenLinkFinder.new
 finder.crawl_site 'http://txti.es' # Or use Finder#crawl_page for a single webpage.
-finder.
+finder.report # Or use Finder#broken_links and Finder#ignored_links
               # for direct access to the link Hashes.
 ```
 
@@ -91,13 +91,15 @@ See the full source code documentation [here](https://www.rubydoc.info/gems/brok
 If broken links are found then the output will look something like:
 
 ```text
+Crawled http://txti.es (7 page(s) in 7.88 seconds)
+
 Found 6 broken link(s) across 2 page(s):
 
 The following broken links were found on 'http://txti.es/about':
 http://twitter.com/thebarrytone
+/doesntexist
 http://twitter.com/nwbld
-
-https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick&hosted_button_id=84L4BDS86FBUU
+twitter.com/txties
 
 The following broken links were found on 'http://txti.es/how':
 http://en.wikipedia.org/wiki/Markdown
@@ -105,14 +107,16 @@ http://imgur.com
 
 Ignored 3 unsupported link(s) across 2 page(s), which you should check manually:
 
-The following links were ignored on http://txti.es:
+The following links were ignored on 'http://txti.es':
 tel:+13174562564
 mailto:big.jim@jmail.com
 
-The following links were ignored on http://txti.es/contact:
+The following links were ignored on 'http://txti.es/contact':
 ftp://server.com
 ```
 
+You can provide the `--html` flag if you'd prefer a HTML based report.
+
 ## Contributing
 
 Bug reports and feature requests are welcome on [GitHub](https://github.com/michaeltelford/broken-link-finder). Just raise an issue.
@@ -128,11 +132,11 @@ After checking out the repo, run `bin/setup` to install dependencies. Then, run
 To install this gem onto your local machine, run `bundle exec rake install`.
 
 To release a new gem version:
-- Update the deps in the `*.gemspec
-- Update the version number in `version.rb` and add the new version to the `CHANGELOG
-- Run `bundle install
-- Run `bundle exec rake test` ensuring all tests pass
-- Run `bundle exec rake compile` ensuring no warnings
-- Run `bundle exec rake install && rbenv rehash
-- Manually test the executable
-- Run `bundle exec rake release[origin]
+- Update the deps in the `*.gemspec`, if necessary.
+- Update the version number in `version.rb` and add the new version to the `CHANGELOG`.
+- Run `bundle install`.
+- Run `bundle exec rake test` ensuring all tests pass.
+- Run `bundle exec rake compile` ensuring no warnings.
+- Run `bundle exec rake install && rbenv rehash`.
+- Manually test the executable.
+- Run `bundle exec rake release[origin]`.
data/benchmark.rb
CHANGED
@@ -10,15 +10,19 @@ finder = BrokenLinkFinder::Finder.new
 puts Benchmark.measure { finder.crawl_site url }
 puts "Links crawled: #{finder.total_links_crawled}"
 
-# http://txti.es page crawl
-# Pre
-# Post
+# http://txti.es page crawl with threading
+# Pre: 17.5 seconds
+# Post: 7.5 seconds
 
-# http://txti.es
+# http://txti.es with threading - page vs site crawl
 # Page: 9.526981
 # Site: 9.732416
 # Multi-threading crawl_site now yields the same time as a single page
 
-# Large site crawl -
+# Large site crawl - all link recording functionality
 # Pre: 608 seconds with 7665 links crawled
 # Post: 355 seconds with 1099 links crawled
+
+# Large site crawl - retry mechanism
+# Pre: 140 seconds
+# Post: 170 seconds
data/bin/console
CHANGED
@@ -5,20 +5,10 @@ require 'bundler/setup'
 require 'pry'
 require 'byebug'
 require 'broken_link_finder'
+require 'logger'
 
-#
-
-singleton_class.class_eval do
-  alias_method :orig_get, :get
-end
-
-def self.get(base_url, options = {})
-  puts "[typhoeus] Sending GET: #{base_url}"
-  resp = orig_get(base_url, options)
-  puts "[typhoeus] Status: #{resp.code} (#{resp.body.length} bytes in #{resp.total_time} seconds)"
-  resp
-end
-end
+# Logs all HTTP requests.
+Wgit.logger.level = Logger::DEBUG
 
 # Call reload to load all recent code changes.
 def reload
@@ -39,6 +29,6 @@ by_link = Finder.new sort: :link
 finder = by_page
 
 # Start the console.
-puts "\nbroken_link_finder v#{BrokenLinkFinder::VERSION}"
+puts "\nbroken_link_finder v#{BrokenLinkFinder::VERSION} (#{Wgit.version_str})"
 
 binding.pry
data/exe/broken_link_finder
CHANGED
@@ -9,12 +9,14 @@ class BrokenLinkFinderCLI < Thor
   desc 'crawl [URL]', 'Find broken links at the URL'
   option :recursive, type: :boolean, aliases: [:r], default: false, desc: 'Crawl the entire site.'
   option :threads, type: :numeric, aliases: [:t], default: BrokenLinkFinder::DEFAULT_MAX_THREADS, desc: 'Max number of threads to use when crawling recursively; 1 thread per web page.'
+  option :html, type: :boolean, aliases: [:h], default: false, desc: 'Produce a HTML report (instead of text)'
   option :sort_by_link, type: :boolean, aliases: [:l], default: false, desc: 'Makes report more concise if there are more pages crawled than broken links found. Use with -r on medium/large sites.'
   option :verbose, type: :boolean, aliases: [:v], default: false, desc: 'Display all ignored links.'
   option :concise, type: :boolean, aliases: [:c], default: false, desc: 'Display only a summary of broken links.'
   def crawl(url)
     url = "http://#{url}" unless url.start_with?('http')
 
+    report_type = options[:html] ? :html : :text
     sort_by = options[:sort_by_link] ? :link : :page
     max_threads = options[:threads]
     broken_verbose = !options[:concise]
@@ -22,8 +24,9 @@ class BrokenLinkFinderCLI < Thor
 
     finder = BrokenLinkFinder::Finder.new(sort: sort_by, max_threads: max_threads)
     options[:recursive] ? finder.crawl_site(url) : finder.crawl_page(url)
-    finder.
-
+    finder.report(
+      type: report_type,
+      broken_verbose: broken_verbose,
       ignored_verbose: ignored_verbose
     )
   rescue Exception => e
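The `--html` flag only feeds a ternary inside `crawl`, alongside the existing option lookups. A minimal standalone sketch of that mapping (the `options` Hash stands in for Thor's parsed options; the values here are hypothetical):

```ruby
# Stand-in for Thor's parsed options Hash (values are made up for illustration).
options = { html: true, sort_by_link: false, concise: false, verbose: false }

# The same option-to-setting mapping the crawl command performs.
report_type     = options[:html] ? :html : :text
sort_by         = options[:sort_by_link] ? :link : :page
broken_verbose  = !options[:concise]
ignored_verbose = options[:verbose]

puts [report_type, sort_by, broken_verbose, ignored_verbose].inspect
# => [:html, :page, true, false]
```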
data/lib/broken_link_finder.rb
CHANGED
@@ -2,8 +2,12 @@
 
 require 'wgit'
 require 'wgit/core_ext'
+require 'thread/pool'
+require 'set'
 
 require_relative './broken_link_finder/wgit_extensions'
 require_relative './broken_link_finder/version'
-require_relative './broken_link_finder/reporter'
+require_relative './broken_link_finder/reporter/reporter'
+require_relative './broken_link_finder/reporter/text_reporter'
+require_relative './broken_link_finder/reporter/html_reporter'
 require_relative './broken_link_finder/finder'
data/lib/broken_link_finder/finder.rb
CHANGED
@@ -1,9 +1,5 @@
 # frozen_string_literal: true
 
-require_relative 'reporter'
-require 'thread/pool'
-require 'set'
-
 module BrokenLinkFinder
   DEFAULT_MAX_THREADS = 100
 
@@ -13,7 +9,7 @@ module BrokenLinkFinder
   end
 
   class Finder
-    attr_reader :sort, :
+    attr_reader :sort, :max_threads, :broken_links, :ignored_links, :crawl_stats
 
     # Creates a new Finder instance.
     def initialize(sort: :page, max_threads: BrokenLinkFinder::DEFAULT_MAX_THREADS)
@@ -25,35 +21,38 @@ module BrokenLinkFinder
       @lock = Mutex.new
       @crawler = Wgit::Crawler.new
 
-
+      reset_crawl
     end
 
     # Clear/empty the link collection Hashes.
-    def
+    def reset_crawl
       @broken_links = {}
       @ignored_links = {}
-      @
-      @
-      @
+      @all_broken_links = Set.new # Used to prevent crawling a link twice.
+      @all_intact_links = Set.new # "
+      @broken_link_map = {} # Maps a link to its absolute form.
+      @crawl_stats = {} # Records crawl stats e.g. duration etc.
     end
 
     # Finds broken links within a single page and appends them to the
     # @broken_links array. Returns true if at least one broken link was found.
     # Access the broken links afterwards with Finder#broken_links.
     def crawl_url(url)
-
+      reset_crawl
 
-
-
+      start = Time.now
+      url = url.to_url
+      doc = @crawler.crawl(url)
 
       # Ensure the given page url is valid.
       raise "Invalid or broken URL: #{url}" unless doc
 
       # Get all page links and determine which are broken.
       find_broken_links(doc)
+      retry_broken_links
 
       sort_links
-
+      set_crawl_stats(url: url, pages_crawled: [url], start: start)
 
       @broken_links.any?
     end
@@ -63,15 +62,16 @@ module BrokenLinkFinder
     # at least one broken link was found and an Array of all pages crawled.
     # Access the broken links afterwards with Finder#broken_links.
     def crawl_site(url)
-
+      reset_crawl
 
-
-
-
+      start = Time.now
+      url = url.to_url
+      pool = Thread.pool(@max_threads)
+      crawled = Set.new
 
       # Crawl the site's HTML web pages looking for links.
       externals = @crawler.crawl_site(url) do |doc|
-
+        crawled << doc.url
         next unless doc
 
         # Start a thread for each page, checking for broken links.
@@ -83,30 +83,31 @@ module BrokenLinkFinder
 
       # Wait for all threads to finish.
       pool.shutdown
+      retry_broken_links
 
       sort_links
-
+      set_crawl_stats(url: url, pages_crawled: crawled.to_a, start: start)
 
-
+      @broken_links.any?
     end
 
     # Pretty prints the link report into a stream e.g. STDOUT or a file,
     # anything that respond_to? :puts. Defaults to STDOUT.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
+    def report(stream = STDOUT,
+               type: :text, broken_verbose: true, ignored_verbose: false)
+      klass = case type
+              when :text
+                BrokenLinkFinder::TextReporter
+              when :html
+                BrokenLinkFinder::HTMLReporter
+              else
+                raise "type: must be :text or :html, not: :#{type}"
+              end
+
+      reporter = klass.new(stream, @sort, @broken_links,
+                           @ignored_links, @broken_link_map, @crawl_stats)
+      reporter.call(broken_verbose: broken_verbose,
+                    ignored_verbose: ignored_verbose)
     end
 
     private
@@ -117,11 +118,11 @@ module BrokenLinkFinder
 
       # Iterate over the supported links checking if they're broken or not.
       links.each do |link|
-        #
+        # Skip if the link has been processed previously.
         next if @all_intact_links.include?(link)
 
         if @all_broken_links.include?(link)
-          append_broken_link(page.url, link)
+          append_broken_link(page.url, link) # Record on which page.
          next
         end
 
@@ -129,10 +130,8 @@ module BrokenLinkFinder
         link_doc = crawl_link(page, link)
 
         # Determine if the crawled link is broken or not.
-        if
-
-           has_broken_anchor(link_doc)
-          append_broken_link(page.url, link)
+        if link_broken?(link_doc)
+          append_broken_link(page.url, link, doc: page)
         else
           @lock.synchronize { @all_intact_links << link }
         end
@@ -141,6 +140,17 @@ module BrokenLinkFinder
       nil
     end
 
+    # Implements a retry mechanism for each of the broken links found.
+    # Removes any broken links found to be working OK.
+    def retry_broken_links
+      sleep(0.5) # Give the servers a break, then retry the links.
+
+      @broken_link_map.each do |link, href|
+        doc = @crawler.crawl(href)
+        remove_broken_link(link) unless link_broken?(doc)
+      end
+    end
+
     # Report and reject any non supported links. Any link that is absolute and
     # doesn't start with 'http' is unsupported e.g. 'mailto:blah' etc.
     def get_supported_links(doc)
@@ -153,12 +163,17 @@ module BrokenLinkFinder
       end
     end
 
-    #
+    # Make the link absolute and crawl it, returning its Wgit::Document.
     def crawl_link(doc, link)
       link = link.prefix_base(doc)
       @crawler.crawl(link)
     end
 
+    # Return if the crawled link is broken or not.
+    def link_broken?(doc)
+      doc.nil? || @crawler.last_response.not_found? || has_broken_anchor(doc)
+    end
+
     # Returns true if the link is/contains a broken anchor/fragment.
     def has_broken_anchor(doc)
       raise 'link document is nil' unless doc
@@ -170,7 +185,8 @@ module BrokenLinkFinder
     end
 
     # Append key => [value] to @broken_links.
-
+    # If doc: is provided then the link will be recorded in absolute form.
+    def append_broken_link(url, link, doc: nil)
       key, value = get_key_value(url, link)
 
       @lock.synchronize do
@@ -178,6 +194,23 @@ module BrokenLinkFinder
         @broken_links[key] << value
 
         @all_broken_links << link
+
+        @broken_link_map[link] = link.prefix_base(doc) if doc
+      end
+    end
+
+    # Remove the broken_link from the necessary collections.
+    def remove_broken_link(link)
+      @lock.synchronize do
+        if @sort == :page
+          @broken_links.each { |_k, links| links.delete(link) }
+          @broken_links.delete_if { |_k, links| links.empty? }
+        else
+          @broken_links.delete(link)
+        end
+
+        @all_broken_links.delete(link)
+        @all_intact_links << link
       end
     end
 
@@ -217,12 +250,15 @@ module BrokenLinkFinder
     end
 
     # Sets and returns the total number of links crawled.
-    def
-      @
+    def set_crawl_stats(url:, pages_crawled:, start:)
+      @crawl_stats[:url] = url
+      @crawl_stats[:pages_crawled] = pages_crawled
+      @crawl_stats[:num_pages] = pages_crawled.size
+      @crawl_stats[:num_links] = @all_broken_links.size + @all_intact_links.size
+      @crawl_stats[:duration] = Time.now - start
     end
 
-    alias crawl_page
-    alias crawl_r
-    alias pretty_print_link_summary pretty_print_link_report
+    alias crawl_page crawl_url
+    alias crawl_r crawl_site
   end
 end
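The reworked `Finder#report` selects a reporter class from the `type:` keyword and delegates to it. A runnable sketch of that dispatch shape, using trivial stand-in classes rather than the gem's real `TextReporter`/`HTMLReporter`:

```ruby
require 'stringio'

# Stand-in reporters for illustration only (the real ones take more arguments).
class TextReporter
  def initialize(stream); @stream = stream; end
  def call; @stream.puts('text report'); end
end

class HTMLReporter
  def initialize(stream); @stream = stream; end
  def call; @stream.puts('<div>html report</div>'); end
end

# Mirrors the shape of the new Finder#report: type: picks the reporter class,
# an unknown type raises, and the chosen reporter writes to the given stream.
def report(stream, type: :text)
  klass = case type
          when :text then TextReporter
          when :html then HTMLReporter
          else raise "type: must be :text or :html, not: :#{type}"
          end
  klass.new(stream).call
end

out = StringIO.new
report(out, type: :html)
print out.string # prints "<div>html report</div>"
```

Passing a `StringIO` (or an open `File`) as the stream is what lets the same report land on STDOUT or in a file, as the method comment in the diff describes.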
data/lib/broken_link_finder/reporter/html_reporter.rb
ADDED
@@ -0,0 +1,134 @@
+# frozen_string_literal: true
+
+module BrokenLinkFinder
+  class HTMLReporter < Reporter
+    # Creates a new HTMLReporter instance.
+    # stream is any Object that responds to :puts and :print.
+    def initialize(stream, sort,
+                   broken_links, ignored_links,
+                   broken_link_map, crawl_stats)
+      super
+    end
+
+    # Pretty print a report detailing the full link summary.
+    def call(broken_verbose: true, ignored_verbose: false)
+      puts '<div class="broken_link_finder_report">'
+
+      report_crawl_summary
+      report_broken_links(verbose: broken_verbose)
+      report_ignored_links(verbose: ignored_verbose)
+
+      puts '</div>'
+
+      nil
+    end
+
+    private
+
+    # Report a summary of the overall crawl.
+    def report_crawl_summary
+      puts format(
+        '<p class="crawl_summary">Crawled %s (%s page(s) in %s seconds)</p>',
+        @crawl_stats[:url],
+        @crawl_stats[:num_pages],
+        @crawl_stats[:duration]&.truncate(2)
+      )
+    end
+
+    # Report a summary of the broken links.
+    def report_broken_links(verbose: true)
+      puts '<div class="broken_links">'
+
+      if @broken_links.empty?
+        puts_summary 'Good news, there are no broken links!', type: :broken
+      else
+        num_pages, num_links = get_hash_stats(@broken_links)
+        puts_summary "Found #{num_links} broken link(s) across #{num_pages} page(s):", type: :broken
+
+        @broken_links.each do |key, values|
+          puts_group(key, type: :broken) # Puts the opening <p> element.
+
+          if verbose || (values.length <= NUM_VALUES)
+            values.each { |value| puts_group_item value, type: :broken }
+          else # Only print N values and summarise the rest.
+            NUM_VALUES.times { |i| puts_group_item values[i], type: :broken }
+
+            objects = sort_by_page? ? 'link(s)' : 'page(s)'
+            puts "+ #{values.length - NUM_VALUES} other #{objects}, remove --concise to see them all<br />"
+          end
+
+          puts '</p>'
+        end
+      end
+
+      puts '</div>'
+    end
+
+    # Report a summary of the ignored links.
+    def report_ignored_links(verbose: false)
+      puts '<div class="ignored_links">'
+
+      if @ignored_links.any?
+        num_pages, num_links = get_hash_stats(@ignored_links)
+        puts_summary "Ignored #{num_links} unsupported link(s) across #{num_pages} page(s), which you should check manually:", type: :ignored
+
+        @ignored_links.each do |key, values|
+          puts_group(key, type: :ignored) # Puts the opening <p> element.
+
+          if verbose || (values.length <= NUM_VALUES)
+            values.each { |value| puts_group_item value, type: :ignored }
+          else # Only print N values and summarise the rest.
+            NUM_VALUES.times { |i| puts_group_item values[i], type: :ignored }
+
+            objects = sort_by_page? ? 'link(s)' : 'page(s)'
+            puts "+ #{values.length - NUM_VALUES} other #{objects}, use --verbose to see them all<br />"
+          end
+
+          puts '</p>'
+        end
+      end
+
+      puts '</div>'
+    end
+
+    def puts_summary(text, type:)
+      klass = (type == :broken) ? 'broken_links_summary' : 'ignored_links_summary'
+      puts "<p class=\"#{klass}\">#{text}</p>"
+    end
+
+    def puts_group(link, type:)
+      href = build_url(link)
+      a_element = "<a href=\"#{href}\">#{link}</a>"
+
+      case type
+      when :broken
+        msg = sort_by_page? ?
+          "The following broken links were found on '#{a_element}':" :
+          "The broken link '#{a_element}' was found on the following pages:"
+        klass = 'broken_links_group'
+      when :ignored
+        msg = sort_by_page? ?
+          "The following links were ignored on '#{a_element}':" :
+          "The link '#{a_element}' was ignored on the following pages:"
+        klass = 'ignored_links_group'
+      else
+        raise "type: must be :broken or :ignored, not: #{type}"
+      end
+
+      puts "<p class=\"#{klass}\">"
+      puts msg + '<br />'
+    end
+
+    def puts_group_item(value, type:)
+      klass = (type == :broken) ? 'broken_links_group_item' : 'ignored_links_group_item'
+      puts "<a class=\"#{klass}\" href=\"#{build_url(value)}\">#{value}</a><br />"
+    end
+
+    def build_url(link)
+      return link if link.to_url.absolute?
+      @broken_link_map.fetch(link)
+    end
+
+    alias_method :report, :call
+  end
+end
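The reporter's crawl summary is a single `format` call over the new `crawl_stats` Hash. Rebuilt standalone with sample stats (the values here are invented to match the README example, not produced by a real crawl):

```ruby
# Sample stats mirroring the Finder#crawl_stats keys used by the reporters.
crawl_stats = { url: 'http://txti.es', num_pages: 7, duration: 7.8812 }

# The same format string HTMLReporter#report_crawl_summary uses; &.truncate(2)
# trims the duration to two decimal places and tolerates a nil duration.
summary = format(
  '<p class="crawl_summary">Crawled %s (%s page(s) in %s seconds)</p>',
  crawl_stats[:url],
  crawl_stats[:num_pages],
  crawl_stats[:duration]&.truncate(2) # 7.8812 -> 7.88
)

puts summary
# => <p class="crawl_summary">Crawled http://txti.es (7 page(s) in 7.88 seconds)</p>
```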
data/lib/broken_link_finder/reporter/reporter.rb
ADDED
@@ -0,0 +1,77 @@
+# frozen_string_literal: true
+
+module BrokenLinkFinder
+  # Generic reporter class to be inherited from by format specific reporters.
+  class Reporter
+    # The amount of pages/links to display when verbose is false.
+    NUM_VALUES = 3
+
+    # Creates a new Reporter instance.
+    # stream is any Object that responds to :puts and :print.
+    def initialize(stream, sort,
+                   broken_links, ignored_links,
+                   broken_link_map, crawl_stats)
+      unless stream.respond_to?(:puts) && stream.respond_to?(:print)
+        raise 'stream must respond_to? :puts and :print'
+      end
+      raise "sort by either :page or :link, not #{sort}" \
+      unless %i[page link].include?(sort)
+
+      @stream = stream
+      @sort = sort
+      @broken_links = broken_links
+      @ignored_links = ignored_links
+      @broken_link_map = broken_link_map
+      @crawl_stats = crawl_stats
+    end
+
+    # Pretty print a report detailing the full link summary.
+    def call(broken_verbose: true, ignored_verbose: false)
+      raise 'Not implemented by parent class'
+    end
+
+    protected
+
+    # Return true if the sort is by page.
+    def sort_by_page?
+      @sort == :page
+    end
+
+    # Returns the key/value statistics of hash e.g. the number of keys and
+    # combined values. The hash should be of the format: { 'str' => [...] }.
+    # Use like: `num_pages, num_links = get_hash_stats(links)`.
+    def get_hash_stats(hash)
+      num_keys = hash.keys.length
+      values = hash.values.flatten
+      num_values = sort_by_page? ? values.length : values.uniq.length
+
+      sort_by_page? ?
+        [num_keys, num_values] :
+        [num_values, num_keys]
+    end
+
+    # Prints the text. Defaults to a blank line.
+    def print(text = '')
+      @stream.print(text)
+    end
+
+    # Prints the text + \n. Defaults to a blank line.
+    def puts(text = '')
+      @stream.puts(text)
+    end
+
+    # Prints text + \n\n.
+    def putsn(text)
+      puts(text)
+      puts
+    end
+
+    # Prints \n + text + \n.
+    def nputs(text)
+      puts
+      puts(text)
+    end
+
+    alias_method :report, :call
+  end
+end
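`get_hash_stats` is where the "N link(s) across M page(s)" numbers come from: whichever way the Hash is sorted, the keys give one count and the flattened values the other. A runnable copy of that logic, extracted from the class so it works standalone (the `sort_by_page:` parameter replaces the instance's `sort_by_page?`, and the sample data is invented):

```ruby
# get_hash_stats from the new Reporter base class, as a standalone method.
# Returns [num_pages, num_links] regardless of which way the Hash is keyed.
def get_hash_stats(hash, sort_by_page: true)
  num_keys = hash.keys.length
  values = hash.values.flatten
  num_values = sort_by_page ? values.length : values.uniq.length

  sort_by_page ? [num_keys, num_values] : [num_values, num_keys]
end

# Sorted by page: keys are pages, values are the broken links found on each.
broken_by_page = {
  'http://txti.es/about' => ['/doesntexist', 'http://twitter.com/nwbld'],
  'http://txti.es/how'   => ['/doesntexist']
}
num_pages, num_links = get_hash_stats(broken_by_page)
puts "#{num_pages} page(s), #{num_links} link(s)" # => 2 page(s), 3 link(s)

# Sorted by link: keys are links, values are the pages they appear on.
# Duplicate pages are uniq'd so the page count isn't inflated.
broken_by_link = {
  '/doesntexist'             => ['http://txti.es/about', 'http://txti.es/how'],
  'http://twitter.com/nwbld' => ['http://txti.es/about']
}
num_pages, num_links = get_hash_stats(broken_by_link, sort_by_page: false)
puts "#{num_pages} page(s), #{num_links} link(s)" # => 2 page(s), 2 link(s)
```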
data/lib/broken_link_finder/reporter/text_reporter.rb
ADDED
@@ -0,0 +1,86 @@
+# frozen_string_literal: true
+
+module BrokenLinkFinder
+  class TextReporter < Reporter
+    # Creates a new TextReporter instance.
+    # stream is any Object that responds to :puts and :print.
+    def initialize(stream, sort,
+                   broken_links, ignored_links,
+                   broken_link_map, crawl_stats)
+      super
+    end
+
+    # Pretty print a report detailing the full link summary.
+    def call(broken_verbose: true, ignored_verbose: false)
+      report_crawl_summary
+      report_broken_links(verbose: broken_verbose)
+      report_ignored_links(verbose: ignored_verbose)
+
+      nil
+    end
+
+    private
+
+    # Report a summary of the overall crawl.
+    def report_crawl_summary
+      putsn format(
+        'Crawled %s (%s page(s) in %s seconds)',
+        @crawl_stats[:url],
+        @crawl_stats[:num_pages],
+        @crawl_stats[:duration]&.truncate(2)
+      )
+    end
+
+    # Report a summary of the broken links.
+    def report_broken_links(verbose: true)
+      if @broken_links.empty?
+        puts 'Good news, there are no broken links!'
+      else
+        num_pages, num_links = get_hash_stats(@broken_links)
+        puts "Found #{num_links} broken link(s) across #{num_pages} page(s):"
+
+        @broken_links.each do |key, values|
+          msg = sort_by_page? ?
+            "The following broken links were found on '#{key}':" :
+            "The broken link '#{key}' was found on the following pages:"
+          nputs msg
+
+          if verbose || (values.length <= NUM_VALUES)
+            values.each { |value| puts value }
+          else # Only print N values and summarise the rest.
+            NUM_VALUES.times { |i| puts values[i] }
+
+            objects = sort_by_page? ? 'link(s)' : 'page(s)'
+            puts "+ #{values.length - NUM_VALUES} other #{objects}, remove --concise to see them all"
+          end
+        end
+      end
+    end
+
+    # Report a summary of the ignored links.
+    def report_ignored_links(verbose: false)
+      if @ignored_links.any?
+        num_pages, num_links = get_hash_stats(@ignored_links)
+        nputs "Ignored #{num_links} unsupported link(s) across #{num_pages} page(s), which you should check manually:"
+
+        @ignored_links.each do |key, values|
+          msg = sort_by_page? ?
+            "The following links were ignored on '#{key}':" :
+            "The link '#{key}' was ignored on the following pages:"
+          nputs msg
+
+          if verbose || (values.length <= NUM_VALUES)
+            values.each { |value| puts value }
+          else # Only print N values and summarise the rest.
+            NUM_VALUES.times { |i| puts values[i] }
+
+            objects = sort_by_page? ? 'link(s)' : 'page(s)'
+            puts "+ #{values.length - NUM_VALUES} other #{objects}, use --verbose to see them all"
+          end
+        end
+      end
+    end
+
+    alias_method :report, :call
+  end
+end
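Both new reporters share the same truncation rule around `NUM_VALUES`: print everything when verbose, otherwise the first three values plus a one-line count of the rest. A standalone sketch of that rule (the `stream:` parameter is my addition so the sketch is testable; the reporters write via their `@stream` instead):

```ruby
NUM_VALUES = 3 # Display limit when verbose is false, as in Reporter::NUM_VALUES.

# Print every value when verbose, otherwise the first NUM_VALUES values
# followed by a summary line counting what was left out.
def print_values(values, verbose: false, stream: $stdout)
  if verbose || (values.length <= NUM_VALUES)
    values.each { |value| stream.puts value }
  else # Only print N values and summarise the rest.
    NUM_VALUES.times { |i| stream.puts values[i] }
    stream.puts "+ #{values.length - NUM_VALUES} other link(s), use --verbose to see them all"
  end
end

print_values(%w[tel:+13174562564 mailto:big.jim@jmail.com ftp://server.com smb://share])
# Prints the first three links, then: + 1 other link(s), use --verbose to see them all
```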
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: broken_link_finder
 version: !ruby/object:Gem::Version
-  version: 0.9.5
+  version: 0.10.0
 platform: ruby
 authors:
 - Michael Telford
 autorequire:
 bindir: exe
 cert_chain: []
-date: 2019-11-
+date: 2019-11-28 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: bundler
@@ -159,7 +159,9 @@ files:
 - exe/broken_link_finder
 - lib/broken_link_finder.rb
 - lib/broken_link_finder/finder.rb
-- lib/broken_link_finder/reporter.rb
+- lib/broken_link_finder/reporter/html_reporter.rb
+- lib/broken_link_finder/reporter/reporter.rb
+- lib/broken_link_finder/reporter/text_reporter.rb
 - lib/broken_link_finder/version.rb
 - lib/broken_link_finder/wgit_extensions.rb
 - load.rb
@@ -187,8 +189,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
 - !ruby/object:Gem::Version
   version: '0'
 requirements: []
-
-rubygems_version: 2.7.6
+rubygems_version: 3.0.6
 signing_key:
 specification_version: 4
 summary: Finds a website's broken links and reports back to you with a summary.
data/lib/broken_link_finder/reporter.rb
DELETED
@@ -1,116 +0,0 @@
-# frozen_string_literal: true
-
-module BrokenLinkFinder
-  class Reporter
-    # The amount of pages/links to display when verbose is false.
-    NUM_VALUES = 3
-
-    # Creates a new Reporter instance.
-    # stream is any Object that responds to :puts.
-    def initialize(stream, sort, broken_links, ignored_links)
-      raise 'stream must respond_to? :puts' unless stream.respond_to?(:puts)
-      raise "sort by either :page or :link, not #{sort}" \
-      unless %i[page link].include?(sort)
-
-      @stream = stream
-      @sort = sort
-      @broken_links = broken_links
-      @ignored_links = ignored_links
-    end
-
-    # Pretty print a report detailing the link summary.
-    def pretty_print_link_report(broken_verbose: true, ignored_verbose: false)
-      report_broken_links(verbose: broken_verbose)
-      report_ignored_links(verbose: ignored_verbose)
-
-      nil
-    end
-
-    private
-
-    # Report a summary of the broken links.
-    def report_broken_links(verbose: true)
-      if @broken_links.empty?
-        print 'Good news, there are no broken links!'
-      else
-        num_pages, num_links = get_hash_stats(@broken_links)
-        print "Found #{num_links} broken link(s) across #{num_pages} page(s):"
-
-        @broken_links.each do |key, values|
-          msg = sort_by_page? ?
-            "The following broken links were found on '#{key}':" :
-            "The broken link '#{key}' was found on the following pages:"
-          nprint msg
-
-          if verbose || (values.length <= NUM_VALUES)
-            values.each { |value| print value }
-          else # Only print N values and summarise the rest.
-            NUM_VALUES.times { |i| print values[i] }
-
-            objects = sort_by_page? ? 'link(s)' : 'page(s)'
-            print "+ #{values.length - NUM_VALUES} other #{objects}, remove --concise to see them all"
-          end
-        end
-      end
-    end
-
-    # Report a summary of the ignored links.
-    def report_ignored_links(verbose: false)
-      if @ignored_links.any?
-        num_pages, num_links = get_hash_stats(@ignored_links)
-        nprint "Ignored #{num_links} unsupported link(s) across #{num_pages} page(s), which you should check manually:"
-
-        @ignored_links.each do |key, values|
-          msg = sort_by_page? ?
-            "The following links were ignored on '#{key}':" :
-            "The link '#{key}' was ignored on the following pages:"
-          nprint msg
-
-          if verbose || (values.length <= NUM_VALUES)
-            values.each { |value| print value }
-          else # Only print N values and summarise the rest.
-            NUM_VALUES.times { |i| print values[i] }
-
-            objects = sort_by_page? ? 'link(s)' : 'page(s)'
-            print "+ #{values.length - NUM_VALUES} other #{objects}, use --verbose to see them all"
-          end
-        end
-      end
-    end
-
-    # Return true if the sort is by page.
-    def sort_by_page?
-      @sort == :page
-    end
-
-    # Returns the key/value statistics of hash e.g. the number of keys and
-    # combined values. The hash should be of the format: { 'str' => [...] }.
-    # Use like: `num_pages, num_links = get_hash_stats(links)`.
-    def get_hash_stats(hash)
-      num_keys = hash.keys.length
-      values = hash.values.flatten
-      num_values = sort_by_page? ? values.length : values.uniq.length
-
-      sort_by_page? ?
-        [num_keys, num_values] :
-        [num_values, num_keys]
-    end
-
-    # Prints the text + \n. Defaults to a blank line.
-    def print(text = '')
-      @stream.puts(text)
-    end
-
-    # Prints text + \n\n.
-    def printn(text)
-      print(text)
-      print
-    end
-
-    # Prints \n + text + \n.
-    def nprint(text)
-      print
-      print(text)
-    end
-  end
-end