link_scraper 1.01 → 1.02
- checksums.yaml +4 -4
- data/README.md +48 -2
- data/Rakefile +16 -20
- data/lib/link_scraper.rb +6 -2
- data/lib/link_scraper/scrape.rb +256 -0
- data/lib/link_scraper/version.rb +1 -1
- data/lib/webs_criteria.rb +2 -2
- data/link_scraper.gemspec +7 -7
- metadata +20 -19
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 1722daeff2d952711b2ed8247f5e1875c3b3d2739e96269ca483474405756cd2
+  data.tar.gz: bc901c65b11f0d123e5973bcb5b65b0b6add539f81b7102c625a47bb7a62994a
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: aa4096762dc168b12d44909929a443d12dd901c40c4ad4d69baaafae4ffc2d97c53bb97f34a90175a613a843ccecd6caaeaf5d5fdc279e5f4062400d750057d6
+  data.tar.gz: 1acb651d9f98ec067551dc5f38fa10c2c98a597229016f76407a2d81a414805cb54fe23414ec698ed7b86caf70d45d2b3803ba318de6bab581f700b142c28e97
data/README.md
CHANGED
@@ -3,7 +3,7 @@
 [![Gem Version](https://badge.fury.io/rb/link_scraper.svg)](https://badge.fury.io/rb/link_scraper)
 [![MIT License](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
 
-#### Scrape website links' text and
+#### Scrape website links' text and path with built-in scrubbing filter.
 
 Designed to rapidly visit and scrape links from list of several URLs, then filters them based on your criteria. For example, to only grab the links for inventory, staff, or contact us, etc.
 
@@ -26,7 +26,53 @@ Or install it yourself as:
 
 ## Usage
 
-
+This is an example of how to grab links from a URL. `args` are optional if you want to scrub and filter the links based on your criteria, like below.
+
+```
+text_criteria = {
+  pos_criteria: ['coordinates', 'zip codes', 'area codes', 'climate', 'demographics'],
+  neg_criteria: %w[drought school]
+}
+
+path_criteria = {
+  pos_criteria: ['coordinates', 'zip codes', 'area codes', 'climate', 'demographics'],
+  neg_criteria: %w[drought school]
+}
+
+scraper = LinkScraper::Scrape.new(text_criteria: text_criteria, path_criteria: path_criteria)
+scraped_links = scraper.start('https://en.wikipedia.org/wiki/Austin%2C_Texas')
+```
+
+Example without Criteria (returns all links)
+
+```
+scraper = LinkScraper::Scrape.new
+scraped_links = scraper.start('https://en.wikipedia.org/wiki/Austin%2C_Texas')
+```
+
+Returns Array of Links Based on Criteria in `args`:
+
+```
+[
+  {:text=>"coordinates", :path=>"/wiki/geographic_coordinate_system"},
+  {:text=>"2.2 climate", :path=>""},
+  {:text=>"3 demographics", :path=>""},
+  {:text=>"explanation", :path=>"/wiki/template:climate_chart/how_to_read_a_climate_chart"},
+  {:text=>"humid subtropical climate", :path=>"/wiki/humid_subtropical_climate"},
+  {:text=>"kppen climate classification", :path=>"/wiki/k%c3%b6ppen_climate_classification"},
+  {:text=>"climate", :path=>""},
+  {:text=>"austin climate summary", :path=>"/web/20110606123855/http://www.srh.noaa.gov/images/ewx/aus/ausclisum.pdf"},
+  {:text=>"u.s. climate data", :path=>""},
+  {:text=>"nowdata - noaa online weather data", :path=>"/climate/xmacis.php"},
+  {:text=>"austin weather & climate", :path=>"/web/20070118231257/http://austin.about.com/od/weatherenvironment/a/weather.htm"},
+  {:text=>"nowdata - noaa online weather data", :path=>"/climate/xmacis.php"},
+  {:text=>"wmo climate normals for austin/municipal ap tx 19611990", :path=>"pub/gcos/wmo-normals/tables/reg_iv/us/group3/72254.txt"},
+  {:text=>"climate", :path=>"/wiki/climate_of_texas"},
+  {:text=>"demographics", :path=>"/wiki/demographics_of_texas"},
+  {:text=>"coordinates on wikidata", :path=>"/wiki/category:coordinates_on_wikidata"}
+]
+
+```
 
 ## Development
 
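Not part of the README, but for orientation: a minimal sketch of how the `scraped_links` array shown above might be consumed if you want absolute URLs back. `scraped_links` and the Wikipedia URL come from the example; the filtering and the `URI.join` resolution are illustrative assumptions, not something the gem does for you.

```ruby
require 'uri'

# `scraped_links` is the array returned by scraper.start(...) in the README example above;
# the base URL is the page that was scraped.
base_url = 'https://en.wikipedia.org/wiki/Austin%2C_Texas'

absolute_links = scraped_links
  .reject { |link| link[:path].to_s.empty? }   # drop entries whose :path was scrubbed away
  .map    { |link| { text: link[:text], url: URI.join(base_url, link[:path]).to_s } }

absolute_links.each { |link| puts "#{link[:text]} -> #{link[:url]}" }
```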
data/Rakefile
CHANGED
@@ -18,32 +18,28 @@ task :console do
   ARGV.clear
 
   scraped_links = run_link_scraper
-
+  binding.pry
 
   IRB.start
 end
 
 
 def run_link_scraper
-
-
-
-
-
-
-
-
-
-
-
-
-
-    www.www.nissancars/inventory
-    www.www.toyotatown.net/staff/management
-    www.www.yellowpages.com/business
-  ]
+
+  text_criteria = {
+    pos_criteria: ['coordinates', 'zip codes', 'area codes', 'climate', 'demographics'],
+    neg_criteria: %w[drought school]
+  }
+
+  path_criteria = {
+    pos_criteria: ['coordinates', 'zip codes', 'area codes', 'climate', 'demographics'],
+    neg_criteria: %w[drought school]
+  }
+
+  scraper = LinkScraper::Scrape.new(text_criteria: text_criteria, path_criteria: path_criteria)
+  scraped_links = scraper.start('https://en.wikipedia.org/wiki/Austin%2C_Texas')
 
   binding.pry
-
-
+
+  # scraper = LinkScraper::Scrape.new(WebsCriteria.all_scrub_web_criteria)
 end
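For context on the console workflow above: `rake console` runs the task, calls `run_link_scraper`, and stops at `binding.pry` with the results in scope. A few hypothetical things one might evaluate at that prompt (not part of the Rakefile):

```ruby
# Inside the pry session opened by the :console task (illustrative only):
scraped_links.size                                   # how many links survived the criteria
scraped_links.first(3)                               # peek at a few {:text=>..., :path=>...} hashes
scraped_links.count { |l| !l[:path].to_s.empty? }    # how many still carry a usable path
```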
data/lib/link_scraper.rb
CHANGED
data/lib/link_scraper/scrape.rb
ADDED
@@ -0,0 +1,256 @@
+module LinkScraper
+  class Scrape
+    # attr_accessor :text_criteria, :path_criteria
+
+    def initialize(args={})
+      @text_scrub = ScrubDb::Strings.new(args.fetch(:text_criteria, {}))
+      @path_scrub = ScrubDb::Strings.new(args.fetch(:path_criteria, {}))
+      @noko = Mechanizer::Noko.new
+    end
+
+
+    def start(url)
+      noko_hash = @noko.scrape({url: url})
+      link_hashes = noko_hash[:texts_and_paths]
+
+      err_msg = noko_hash[:err_msg]
+      page = noko_hash[:page]
+      valid_links = scrub_link_hashes(link_hashes, url)
+      valid_links = extract_link_from_url(valid_links, url)
+    end
+
+
+    def extract_link_from_url(valid_links, url)
+      formatted = valid_links.map do |link|
+
+        begin
+          link[:path] = URI(link[:path])&.path
+        rescue StandardError => e
+          puts e.message
+        end
+
+        link
+      end
+      formatted
+    end
+
+
+    def scrub_link_hashes(link_hashes, url)
+      valid_hashes = link_hashes.map do |link_hsh|
+
+        if link_hsh[:text].present? || link_hsh[:path].present?
+          link_hsh = encode_link(link_hsh)
+          text_hsh = @text_scrub.scrub_string(link_hsh[:text])
+          path_hsh = @path_scrub.scrub_string(link_hsh[:path])
+
+          text = evaluate_scrub_hsh(text_hsh)
+          path = evaluate_scrub_hsh(path_hsh)
+        end
+
+        link_hsh = nil unless (text.present? || path.present?)
+        link_hsh
+      end
+
+      valid_hashes = valid_hashes.compact
+    end
+
+    def encode_link(link_hsh)
+      link_hsh[:text] = encoder(link_hsh[:text])
+      link_hsh[:path] = encoder(link_hsh[:path])
+      link_hsh
+    end
+
+
+    def encoder(text)
+      # if text.present? && !text.valid_encoding?
+      if text.present?
+        text = text.chars.select(&:valid_encoding?).join
+        text = text.delete("^\u{0000}-\u{007F}")
+        text = text&.gsub(/\s+/, ' ')&.strip
+        text = text.gsub("\"", ' ')&.strip
+      end
+      text
+    end
+
+
+    def evaluate_scrub_hsh(hsh)
+      string = nil
+      string = hsh[:string] if (hsh[:pos_criteria].any? && hsh[:neg_criteria].empty?)
+    end
+
+
+    ##############################################################################
+    ### Below has been replaced with dependency gems, but keep incase needed later. ####
+    # def old_but_important
+    # temp_name = nil
+    # stock_hsh = get_stocks(temp_name)
+    # stock_texts = stock_hsh[:stock_texts]
+    # stock_links = stock_hsh[:stock_links]
+    #
+    # link_text_results = []
+    # noko_page.links.each do |noko_text_link|
+    # noko_text = noko_text_link.text&.downcase&.gsub(/\W/,'')
+    # pre_noko_link = noko_text_link&.path&.downcase&.strip
+    # noko_link = @formatter.format_link(url, pre_noko_link)
+    #
+    # if (noko_text && noko_link) && (noko_text.length > 3 && noko_link.length > 3) && (check_text_link_ban(noko_link, noko_text, temp_name) != true)
+    # ## Find any Texts or Links that include 'team' or 'staff'
+    # if noko_text.include?('staff') || noko_link.include?('staff')
+    # link_text_hsh = {staff_text: noko_text, staff_link: noko_link}
+    # link_text_results << link_text_hsh
+    # end
+    #
+    # ## Find valid Links
+    # stock_links.each do |stock_link|
+    # stock_link = stock_link.downcase&.strip
+    # if noko_link.include?(stock_link) || stock_link.include?(noko_link)
+    # link_text_hsh = {staff_text: noko_text, staff_link: noko_link}
+    # link_text_results << link_text_hsh
+    # end
+    # end
+    #
+    # ## Find valid Texts
+    # stock_texts.each do |stock_text|
+    # stock_text = stock_text.downcase&.gsub(/\W/,'')
+    # if noko_text.include?(stock_text) || stock_text.include?(noko_text)
+    # link_text_hsh = {staff_text: noko_text, staff_link: noko_link}
+    # link_text_results << link_text_hsh
+    # end
+    # end
+    # end
+    # end
+    #
+    # link_text_results.uniq!
+    # puts "\n\n===================="
+    # puts "Valid Text and Links: #{link_text_results.count}"
+    # puts link_text_results.inspect
+    # # sleep(1)
+    # return link_text_results
+    # end
+
+
+    # def get_stocks(temp_name)
+    # special_templates = ["Cobalt", "Dealer Inspire", "DealerFire"]
+    # temp_name = 'general' if !special_templates.include?(temp_name)
+    #
+    # stock_texts = Term.where(sub_category: "staff_text").where(criteria_term: temp_name).map(&:response_term)
+    # # stock_texts += @tally_staff_texts
+    # # stock_texts.uniq!
+    #
+    # stock_links = Term.where(sub_category: "staff_path").where(criteria_term: temp_name).map(&:response_term)
+    # # stock_links += @tally_staff_links
+    # # stock_links.uniq!
+    #
+    # stock_hsh = {stock_texts: stock_texts, stock_links: stock_links}
+    # # puts stock_hsh
+    # # sleep(1)
+    # return stock_hsh
+    # end
+
+
+
+    # def get_query
+    # err_sts_arr = ['Error: Timeout', 'Error: Host', 'Error: TCP']
+    #
+    # query = Web.select(:id)
+    # .where(url_sts: 'Valid', page_sts: "Invalid")
+    # .where('page_date < ? OR page_date IS NULL', @cut_off)
+    # .or(Web.select(:id)
+    # .where(url_sts: 'Valid', temp_sts: 'Valid', page_sts: ['Valid', nil])
+    # .where('page_date < ? OR page_date IS NULL', @cut_off)
+    # ).or(Web.select(:id)
+    # .where(url_sts: 'Valid', temp_sts: 'Valid', page_sts: err_sts_arr)
+    # .where('timeout < ?', @db_timeout_limit)
+    # ).order("timeout ASC").pluck(:id)
+    # end
+    #
+    # def start_find_page
+    # query = get_query[0..20]
+    # while query.any?
+    # setup_iterator(query)
+    # query = get_query[0..20]
+    # break if !query.any?
+    # end
+    # end
+    #
+    # def setup_iterator(query)
+    # @query_count = query.count
+    # (@query_count & @query_count > @obj_in_grp) ? @group_count = (@query_count / @obj_in_grp) : @group_count = 2
+    # @dj_on ? iterate_query(query) : query.each { |id| template_starter(id) }
+    # end
+    #
+    #
+    # def template_starter(id)
+    # web = Web.find(id)
+    # web.links.destroy_all
+    # url = web.url
+    # temp_name = web.temp_name
+    # db_timeout = web.timeout
+    # db_timeout == 0 ? timeout = @dj_refresh_interval : timeout = (db_timeout * 3)
+    # puts "timeout: #{timeout}"
+    # puts "temp_name: #{temp_name}"
+    # puts url
+    #
+    # noko_hsh = start_noko(url, timeout)
+    # noko_page = noko_hsh[:noko_page]
+    # err_msg = noko_hsh[:err_msg]
+    #
+    # if err_msg.present?
+    # puts err_msg
+    # web.update(page_sts: err_msg, page_date: Time.now, timeout: timeout)
+    # elsif noko_page.present?
+    # link_text_results = scrub_link_hashes(noko_page, web)
+    # if !link_text_results.any?
+    # web.update(page_sts: 'Invalid', page_date: Time.now, timeout: timeout)
+    # else
+    # link_text_results.each do |link_text_hsh|
+    # link_obj = Link.find_or_create_by(link_text_hsh)
+    # web_link = web.links.where(id: link_obj).exists?
+    # web.links << link_obj if !web_link.present?
+    # web.update(page_sts: 'Valid', page_date: Time.now, timeout: 0)
+    # end
+    # end
+    # end
+    # end
+
+
+
+    ############ HELPER METHODS BELOW ################
+
+    #
+    # def check_text_link_ban(staff_link, staff_text, temp_name)
+    # return true if !staff_link.present? || !staff_text.present? || staff_link.length < 4
+    # return true if (temp_name = "Cobalt" && staff_text == 'sales')
+    # return true if check_link_ban(staff_link)
+    # return true if check_text_ban(staff_text)
+    #
+    # include_ban = %w(/#card-view/card/ 404 appl approve body career center click collision commercial contact customer demo direction discl drive employ espanol espaol finan get google guarantee habla history home hour inventory javascript job join lease legal location lube mail map match multilingual offers oil open opportunit parts phone place price quick rating review sales_tab schedule search service special start yourdeal survey tel test text trade value vehicle video virtual websiteby welcome why facebook commercial twit near dealernear educat faculty discount event year fleet build index amenit tire find award year blog)
+    #
+    # banned_link_text = include_ban.find { |ban| staff_link.include?(ban) || staff_text.include?(ban) }
+    # banned_link_text.present? ? true : false
+    # end
+    #
+    #
+    # def check_text_ban(staff_text)
+    # if staff_text.present?
+    # ## Make sure staff_text is downcase and compact like below for accurate comparisons.
+    # banned_texts = %w(dealershipinfo porsche preowned aboutus ourdealership newcars cars about honda ford learnmoreaboutus news fleet aboutourdealership fordf150 fordtrucks fordtransitconnectwagon fordtransitconnectwagon fordecosport fordfusion fordedge fordfocus fordescape fordexpedition fordexpeditionmax fordcmaxhybrid fordexplorer fordcars fordflex fordtransitcargovan fordsuvs fordtransitconnect fordtransitwagon fordtransitconnectvan fordfusionenergi fordvans fordfusionhybrid fordmustang moreaboutus tourournewdealership tourourdealership)
+    #
+    # banned_text = banned_texts.find { |ban| staff_text == ban }
+    # banned_text.present? ? true : false
+    # end
+    # end
+    #
+    #
+    # def check_link_ban(staff_link)
+    # if staff_link.present?
+    # link_strict_ban = %w(/about /about-us /about-us.htm /about.htm /about.html /#commercial /commercial.html /dealership/about.htm /dealeronlineretailing_d /dealeronlineretailing /dealership/department.htm /dealership/news.htm /departments.aspx /fleet /index.htm /meetourdepartments /sales.aspx /#tab-sales)
+    #
+    # banned_link = link_strict_ban.find { |ban| staff_link == ban }
+    # banned_link.present? ? true : false
+    # end
+    # end
+
+
+  end
+end
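A standalone sketch of what the `encoder` step in the new `Scrape` class does to link text: keep only validly encoded characters, strip non-ASCII, collapse whitespace, and drop double quotes. It mirrors the method's logic with plain-Ruby guards in place of ActiveSupport's `present?`; it is an illustration, not the gem's own code.

```ruby
def encode(text)
  return text if text.to_s.empty?

  text = text.chars.select(&:valid_encoding?).join  # drop characters that are not validly encoded
  text = text.delete("^\u{0000}-\u{007F}")          # keep ASCII only
  text = text.gsub(/\s+/, ' ').strip                # collapse runs of whitespace
  text.gsub('"', ' ').strip                         # replace double quotes, trim again
end

puts encode("  K\u00F6ppen   climate    classification \n")
# => "Kppen climate classification"
```

This is why the README example above shows "kppen climate classification": the non-ASCII character is simply removed rather than transliterated.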
data/lib/link_scraper/version.rb
CHANGED
data/lib/webs_criteria.rb
CHANGED
@@ -37,11 +37,11 @@ class WebsCriteria
     %w(com net)
   end
 
-  # def self.
+  # def self.seed_neg_paths
   # %w(? .com .jpg @ * afri after anounc apply approved blog book business buy call care career cash charit cheap check click collis commerc cont contrib deal distrib download employ event face feature feed financ find fleet form gas generat golf here holiday hospi hour info insta inventory join later light login mail mobile movie museu music news none now oil part pay phone policy priva pump quick quote rate regist review saving schedul service shop sign site speci ticket tire today transla travel truck tv twitter watch youth)
   # end
   #
-  # def self.
+  # def self.seed_pos_paths
   # %w(team staff management)
   # end
 
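The renamed stubs above (`seed_neg_paths` / `seed_pos_paths`) are still commented out. Purely as a hypothetical sketch, if they were reinstated they could feed the criteria hash that `LinkScraper::Scrape.new` already accepts:

```ruby
# Hypothetical: assumes WebsCriteria.seed_pos_paths / seed_neg_paths are uncommented.
path_criteria = {
  pos_criteria: WebsCriteria.seed_pos_paths,   # e.g. %w(team staff management)
  neg_criteria: WebsCriteria.seed_neg_paths
}

scraper = LinkScraper::Scrape.new(path_criteria: path_criteria)
scraped_links = scraper.start('https://www.example.com')
```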
data/link_scraper.gemspec
CHANGED
@@ -8,11 +8,11 @@ Gem::Specification.new do |spec|
   spec.version = LinkScraper::VERSION
   spec.authors = ["Adam Booth"]
   spec.email = ["4rlm@protonmail.ch"]
-  spec.homepage = 'https://github.com/4rlm/
+  spec.homepage = 'https://github.com/4rlm/link_scraper'
   spec.license = "MIT"
 
-  spec.summary = %q{Scrape website links' text and
-  spec.description = %q{Scrape website links' text and
+  spec.summary = %q{Scrape website links' text and path with built-in scrubbing filter.}
+  spec.description = %q{Scrape website links' text and path with built-in scrubbing filter. Designed to rapidly visit and scrape links from list of several URLs, then filters them based on your criteria. For example, to only grab the links for inventory, staff, or contact us, etc.}
 
 
   if spec.respond_to?(:metadata)
@@ -38,11 +38,11 @@ Gem::Specification.new do |spec|
 
   spec.required_ruby_version = '~> 2.5.1'
   spec.add_dependency 'activesupport', '~> 5.2'
-  spec.add_dependency '
-  spec.add_dependency '
-  spec.add_dependency '
-  spec.add_dependency 'scrub_db', '~> 2.22'
+  spec.add_dependency 'crm_formatter', '~> 2.64'
+  spec.add_dependency 'mechanizer', '~> 1.12'
+  spec.add_dependency 'scrub_db', '~> 2.23'
   spec.add_dependency 'url_verifier', '~> 2.12'
+  spec.add_dependency 'utf8_sanitizer', '~> 2.16'
 
   # spec.add_dependency "activesupport-inflector", ['~> 0.1.0']
   spec.add_development_dependency 'bundler', '~> 1.16', '>= 1.16.2'
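For a consuming application, the new runtime dependencies listed above are resolved automatically by Bundler; all that is needed is a Gemfile entry pinned to this release (the gem name and version come from the gemspec, the rest is illustrative):

```ruby
# Gemfile
source 'https://rubygems.org'

gem 'link_scraper', '~> 1.02'
```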
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: link_scraper
 version: !ruby/object:Gem::Version
-  version: '1.
+  version: '1.02'
 platform: ruby
 authors:
 - Adam Booth
@@ -25,75 +25,75 @@ dependencies:
       - !ruby/object:Gem::Version
         version: '5.2'
 - !ruby/object:Gem::Dependency
-  name:
+  name: crm_formatter
   requirement: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: '2.
+        version: '2.64'
   type: :runtime
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: '2.
+        version: '2.64'
 - !ruby/object:Gem::Dependency
-  name:
+  name: mechanizer
   requirement: !ruby/object:Gem::Requirement
     requirements:
    - - "~>"
       - !ruby/object:Gem::Version
-        version: '
+        version: '1.12'
   type: :runtime
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: '
+        version: '1.12'
 - !ruby/object:Gem::Dependency
-  name:
+  name: scrub_db
   requirement: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: '
+        version: '2.23'
   type: :runtime
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: '
+        version: '2.23'
 - !ruby/object:Gem::Dependency
-  name:
+  name: url_verifier
   requirement: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: '2.
+        version: '2.12'
   type: :runtime
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: '2.
+        version: '2.12'
 - !ruby/object:Gem::Dependency
-  name:
+  name: utf8_sanitizer
   requirement: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: '2.
+        version: '2.16'
   type: :runtime
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: '2.
+        version: '2.16'
 - !ruby/object:Gem::Dependency
   name: bundler
   requirement: !ruby/object:Gem::Requirement
@@ -162,7 +162,7 @@ dependencies:
     - - "~>"
       - !ruby/object:Gem::Version
         version: '3.7'
-description: Scrape website links' text and
+description: Scrape website links' text and path with built-in scrubbing filter. Designed
   to rapidly visit and scrape links from list of several URLs, then filters them based
   on your criteria. For example, to only grab the links for inventory, staff, or
   contact us, etc.
@@ -183,10 +183,11 @@ files:
 - bin/console
 - bin/setup
 - lib/link_scraper.rb
+- lib/link_scraper/scrape.rb
 - lib/link_scraper/version.rb
 - lib/webs_criteria.rb
 - link_scraper.gemspec
-homepage: https://github.com/4rlm/
+homepage: https://github.com/4rlm/link_scraper
 licenses:
 - MIT
 metadata:
@@ -210,5 +211,5 @@ rubyforge_project:
 rubygems_version: 2.7.6
 signing_key:
 specification_version: 4
-summary: Scrape website links' text and
+summary: Scrape website links' text and path with built-in scrubbing filter.
 test_files: []