RubyGems - wgit - Versions diffs - 0.10.6 → 0.10.8 - Mend

wgit 0.10.6 → 0.10.8

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (7)

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 4598dcfc047ce3915ba5a871837be5efc54201d61b4967cf53070bec2af4dd52
-  data.tar.gz: 604010011024af6f2d4dfcc87e6c4c1d73f8e4811938281119fccb79792818c1
+  metadata.gz: 66e8b435303d07b2f81d260badc96662936599c9782916f7f014b74a7c617499
+  data.tar.gz: 7b55890c66ec09efd8d5749bd66605a4cb43d5091416f072f8fcc5aaaa85fbe7
 SHA512:
-  metadata.gz: 44b098e2a97191801787386e9d2060dcdeacc625c3453976679fc276a73b2bf0614713764a55f7074073018e898f2e43dc1a7f4f803339a86158052f59dcabcb
-  data.tar.gz: 8645c7095bb14590cf83c21905c9f5ed524e1047254e6526b8fe46a53f3989395472300d27fb65f899951a5f4b80ee9928accd23164b10e1a834975bf045db47
+  metadata.gz: fe1b605224f6682ac504f17b55ab83518556f1320f0410741af8f95bf3a669918c69b48832fb413ca1f78482fdbb7e0d2e7d6f57841c6a562b7f926f7511cdd7
+  data.tar.gz: 856be2111709bc96488b7d43abbc49c563a9a56330344adb4b9ec40fc263cb91e63465c3c3dab317c0d8930965a609a43102d53d80bbc2001e6165a15cb905fa
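
The checksums above can be compared against a downloaded archive. The following is a minimal sketch, not part of the gem: `checksum_matches?` is a hypothetical helper, the YAML is copied from the new side of the hunk above, and an in-memory string stands in for the real `data.tar.gz` bytes (which would normally be read with `File.binread`).

```ruby
require 'digest'
require 'yaml'

# SHA256 entries copied from the 0.10.8 side of checksums.yaml above.
CHECKSUMS_YAML = <<~YAML
  ---
  SHA256:
    metadata.gz: 66e8b435303d07b2f81d260badc96662936599c9782916f7f014b74a7c617499
    data.tar.gz: 7b55890c66ec09efd8d5749bd66605a4cb43d5091416f072f8fcc5aaaa85fbe7
YAML

# Hypothetical helper: true if the SHA256 digest of `bytes` equals `expected_hex`.
def checksum_matches?(bytes, expected_hex)
  Digest::SHA256.hexdigest(bytes) == expected_hex
end

checksums = YAML.safe_load(CHECKSUMS_YAML)
expected  = checksums.fetch('SHA256').fetch('data.tar.gz')

# Stand-in bytes (assumption): real usage would pass the archive's contents.
sample = 'not the real archive'
puts checksum_matches?(sample, expected)
puts checksum_matches?(sample, Digest::SHA256.hexdigest(sample))
```

The first call compares arbitrary bytes against the published digest, and the second compares them against their own digest, so the two calls exercise both outcomes of the helper.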

data/CHANGELOG.md CHANGED

@@ -9,6 +9,26 @@
 - ...
 ---
+## v0.10.8
+### Added
+- Custom `#inspect` methods to `Wgit::Url` and `Wgit::Document` classes.
+- `Document.remove_extractors` method, which removes all default and defined extractors.
+### Changed/Removed
+- ...
+### Fixed
+- ...
+---
+## v0.10.7
+### Added
+- ...
+### Changed/Removed
+- ...
+### Fixed
+- Security vulnerabilities by updating gem dependencies.
+---
 ## v0.10.6
 ### Added
 - `Wgit::DSL` method `#crawl_url` (aliased to `#crawl`).

data/README.md CHANGED

@@ -18,7 +18,7 @@ Wgit was primarily designed to crawl static HTML websites to index and  search t
 Wgit provides a high level, easy-to-use API and DSL that you can use in your own applications and scripts.
-Check out this [demo search engine](https://search-engine-rb.herokuapp.com) - [built](https://github.com/michaeltelford/search_engine) using Wgit and Sinatra - deployed to [Heroku](https://www.heroku.com/). Heroku's free tier is used so the initial page load may be slow. Try searching for "Matz" or something else that's Ruby related.
+Check out this [demo search engine](https://wgit-search-engine.fly.dev) - [built](https://github.com/michaeltelford/search_engine) using Wgit and Sinatra - deployed to [fly.io](https://fly.io). Try searching for something that's Ruby related like "Matz" or "Rails".
 ## Table Of Contents
@@ -62,7 +62,23 @@ end
 puts JSON.generate(quotes)
 ```
-But what if we want to crawl and store the content in a database, so that it can be searched? Wgit makes it easy to index and search HTML using [MongoDB](https://www.mongodb.com/):
+Which outputs:
+```text
+[
+    {
+        "quote": "“The person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.”",
+        "author": "Jane Austen"
+    },
+    {
+        "quote": "“A day without sunshine is like, you know, night.”",
+        "author": "Steve Martin"
+    },
+    ...
+]
+```
+Great! But what if we want to crawl and store the content in a database, so that it can be searched? Wgit makes it easy to index and search HTML using [MongoDB](https://www.mongodb.com/):
 ```ruby
 require 'wgit'
@@ -89,6 +105,8 @@ The `search` call (on the last line) will return and output the results:
 Quotes to Scrape
 “I am free of all prejudice. I hate everyone equally. ”
 http://quotes.toscrape.com/tag/humor/page/2/
+...
 ```
 Using a MongoDB [client](https://robomongo.org/), we can see that the two web pages have been indexed, along with their extracted *quotes* and *authors*:

data/lib/wgit/document.rb CHANGED

@@ -89,9 +89,9 @@ module Wgit
     #
     # @return [String] An xpath String to obtain a webpage's text elements.
     def self.text_elements_xpath
-      Wgit::Document.text_elements.each_with_index.reduce("") do |xpath, (el, i)|
-        xpath += " | " unless i.zero?
-        xpath += format("//%s/text()", el)
+      Wgit::Document.text_elements.each_with_index.reduce('') do |xpath, (el, i)|
+        xpath += ' | ' unless i.zero?
+        xpath += format('//%s/text()', el)
       end
     end
@@ -192,13 +192,27 @@ module Wgit
       Document.send(:remove_method, "init_#{var}_from_object")
       @extractors.delete(var.to_sym)
       true
     rescue NameError
       false
     end
+    # Removes all default and defined extractors by calling
+    # `Document.remove_extractor` underneath. See its documentation.
+    def self.remove_extractors
+      @extractors.each { |var| remove_extractor(var) }
+    end
     ### Document Instance Methods ###
+    # Overrides String#inspect to shorten the printed output of a Document.
+    #
+    # @return [String] A short textual representation of this Document.
+    def inspect
+      "#<Wgit::Document url=\"#{@url}\" html=#{size} bytes>"
+    end
     # Determines if both the url and html match. Use
     # doc.object_id == other.object_id for exact object comparison.
     #
@@ -505,7 +519,7 @@ be relative"
     # parameter.
     #
     # @param xpath [String, #call] Used to find the value/object in @html.
-    # @param singleton [Boolean] singleton ? results.first (single Object) :
+    # @param singleton [Boolean] singleton ? results.first (single Object) :
     #   results (Enumerable).
     # @param text_content_only [Boolean] text_content_only ? result.content
     #   (String) : result (Nokogiri Object).
@@ -546,7 +560,7 @@ be relative"
     # parameter.
     #
     # @param xpath [String, #call] Used to find the value/object in @html.
-    # @param singleton [Boolean] singleton ? results.first (single Object) :
+    # @param singleton [Boolean] singleton ? results.first (single Object) :
     #   results (Enumerable).
     # @param text_content_only [Boolean] text_content_only ? result.content
     #   (String) : result (Nokogiri Object).
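
The `remove_extractors` addition in this file follows a registry pattern: each extractor name is tracked in a class-level collection and backed by a dynamically defined method. The sketch below illustrates that pattern in isolation. `SketchDocument` and `define_extractor` are hypothetical stand-ins, not Wgit's actual implementation, and the sketch iterates over a snapshot of the registry so that removal does not depend on the collection type tolerating deletion during iteration.

```ruby
require 'set'

# Hypothetical stand-in for a class that keeps a registry of extractors.
class SketchDocument
  @extractors = Set.new

  class << self
    attr_reader :extractors

    # Registers an extractor: defines a reader method and records its name.
    def define_extractor(var)
      name = var.to_sym
      define_method(name) { "value of #{name}" }
      @extractors << name
      name
    end

    # Removes one extractor; returns false if no such method was defined.
    def remove_extractor(var)
      remove_method(var.to_sym)
      @extractors.delete(var.to_sym)
      true
    rescue NameError
      false
    end

    # Removes every registered extractor, working from a snapshot (to_a).
    def remove_extractors
      @extractors.to_a.each { |var| remove_extractor(var) }
    end
  end
end

SketchDocument.define_extractor(:title)
SketchDocument.define_extractor(:author)
SketchDocument.remove_extractors

puts SketchDocument.extractors.empty?         # true
puts SketchDocument.new.respond_to?(:title)   # false
puts SketchDocument.remove_extractor(:title)  # false, already removed
```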

data/lib/wgit/url.rb CHANGED

@@ -117,6 +117,13 @@ Addressable::URI::InvalidURIError")
       @date_crawled = bool ? Wgit::Utils.time_stamp : nil
     end
+    # Overrides String#inspect to distingiush this Url from a String.
+    #
+    # @return [String] A short textual representation of this Url.
+    def inspect
+      "#<Wgit::Url url=\"#{self}\" crawled=#{@crawled}>"
+    end
     # Overrides String#replace setting the new_url @uri and String value.
     #
     # @param new_url [Wgit::Url, String] The new URL value.
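
The effect of the `#inspect` override in this hunk can be seen with a small String subclass. This is a sketch under stated assumptions: `SketchUrl` is a hypothetical stand-in that only mirrors the format string from the diff, and it does not reproduce any other behaviour of `Wgit::Url`.

```ruby
# Hypothetical stand-in: a String subclass carrying a crawled flag.
class SketchUrl < String
  attr_accessor :crawled

  def initialize(url, crawled = false)
    super(url)
    @crawled = crawled
  end

  # Mirrors the format used in the diff, so the object no longer
  # prints like a plain quoted String.
  def inspect
    "#<SketchUrl url=\"#{self}\" crawled=#{@crawled}>"
  end
end

url = SketchUrl.new('http://example.com')

puts 'http://example.com'.inspect  # "http://example.com"
puts url.inspect                   # #<SketchUrl url="http://example.com" crawled=false>
puts [url].inspect                 # [#<SketchUrl url="http://example.com" crawled=false>]
```

Because `Array#inspect` calls `inspect` on each element, the override also changes how such objects appear inside collections, for example in an IRB session.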

data/lib/wgit/version.rb CHANGED

@@ -6,7 +6,7 @@
 # @author Michael Telford
 module Wgit
   # The current gem version of Wgit.
-  VERSION = '0.10.6'
+  VERSION = '0.10.8'
   # Returns the current gem version of Wgit as a String.
   def self.version
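
Since the additions in this diff only exist from 0.10.8 onwards, calling code may want to check a version string before relying on them. The sketch below uses `Gem::Version` from RubyGems for the comparison. `supports_remove_extractors?` is a hypothetical helper, and the version strings are passed in directly rather than read from an installed gem.

```ruby
require 'rubygems'

# First release that lists `Document.remove_extractors` in the changelog above.
MIN_VERSION = Gem::Version.new('0.10.8')

# Hypothetical helper: true if the given version string is at least 0.10.8.
def supports_remove_extractors?(version_string)
  Gem::Version.new(version_string) >= MIN_VERSION
end

puts supports_remove_extractors?('0.10.6')  # false
puts supports_remove_extractors?('0.10.8')  # true

# Plain String comparison is lexicographic, so it misorders these versions.
puts '0.10.8' >= '0.9.0'                                      # false
puts Gem::Version.new('0.10.8') >= Gem::Version.new('0.9.0')  # true
```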

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: wgit
 version: !ruby/object:Gem::Version
-  version: 0.10.6
+  version: 0.10.8
 platform: ruby
 authors:
 - Michael Telford
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2022-07-27 00:00:00.000000000 Z
+date: 2023-08-18 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: addressable