wgit 0.10.7 → 0.10.8

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: cd1829ff2dda87e2b88fb738c18ba3a31765a8e3afbe82874e804b58b6c094fb
- data.tar.gz: 8085a6ab3da61aea02fea4bb1f7e7c18caf3edca0d4d11161df3e3a0255293e6
+ metadata.gz: 66e8b435303d07b2f81d260badc96662936599c9782916f7f014b74a7c617499
+ data.tar.gz: 7b55890c66ec09efd8d5749bd66605a4cb43d5091416f072f8fcc5aaaa85fbe7
  SHA512:
- metadata.gz: add6099d1433baebf93b4ad9471a5a35fb0a551e28eded322c46dfa4cfc45eea059284c8645bc2eb8bf33dd91c55060969f15ce6b24567445ffb737ae8a9afc4
- data.tar.gz: 787830ce4f6eea9c7270542c36718c54f48beb7f33e26feb06abbf72d0ad0750943542de051ff7b43caced904ea83550e877631b1ecb936c8f3b6ee211713282
+ metadata.gz: fe1b605224f6682ac504f17b55ab83518556f1320f0410741af8f95bf3a669918c69b48832fb413ca1f78482fdbb7e0d2e7d6f57841c6a562b7f926f7511cdd7
+ data.tar.gz: 856be2111709bc96488b7d43abbc49c563a9a56330344adb4b9ec40fc263cb91e63465c3c3dab317c0d8930965a609a43102d53d80bbc2001e6165a15cb905fa
data/CHANGELOG.md CHANGED
@@ -9,6 +9,17 @@
  - ...
  ---
 
+ ## v0.10.8
+ ### Added
+ - Custom `#inspect` methods to `Wgit::Url` and `Wgit::Document` classes.
+ - `Document.remove_extractors` method, which removes all default and defined extractors.
+
+ ### Changed/Removed
+ - ...
+ ### Fixed
+ - ...
+ ---
+
  ## v0.10.7
  ### Added
  - ...
data/README.md CHANGED
@@ -62,7 +62,23 @@ end
  puts JSON.generate(quotes)
  ```
 
- But what if we want to crawl and store the content in a database, so that it can be searched? Wgit makes it easy to index and search HTML using [MongoDB](https://www.mongodb.com/):
+ Which outputs:
+
+ ```text
+ [
+ {
+ "quote": "“The person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.”",
+ "author": "Jane Austen"
+ },
+ {
+ "quote": "“A day without sunshine is like, you know, night.”",
+ "author": "Steve Martin"
+ },
+ ...
+ ]
+ ```
+
+ Great! But what if we want to crawl and store the content in a database, so that it can be searched? Wgit makes it easy to index and search HTML using [MongoDB](https://www.mongodb.com/):
 
  ```ruby
  require 'wgit'
  require 'wgit'
@@ -89,6 +105,8 @@ The `search` call (on the last line) will return and output the results:
  Quotes to Scrape
  “I am free of all prejudice. I hate everyone equally. ”
  http://quotes.toscrape.com/tag/humor/page/2/
+
+ ...
  ```
 
  Using a MongoDB [client](https://robomongo.org/), we can see that the two web pages have been indexed, along with their extracted *quotes* and *authors*:
data/lib/wgit/document.rb CHANGED
@@ -192,13 +192,27 @@ module Wgit
  Document.send(:remove_method, "init_#{var}_from_object")
 
  @extractors.delete(var.to_sym)
+
  true
  rescue NameError
  false
  end
 
+ # Removes all default and defined extractors by calling
+ # `Document.remove_extractor` underneath. See its documentation.
+ def self.remove_extractors
+ @extractors.each { |var| remove_extractor(var) }
+ end
+
  ### Document Instance Methods ###
 
+ # Overrides String#inspect to shorten the printed output of a Document.
+ #
+ # @return [String] A short textual representation of this Document.
+ def inspect
+ "#<Wgit::Document url=\"#{@url}\" html=#{size} bytes>"
+ end
+
  # Determines if both the url and html match. Use
  # doc.object_id == other.object_id for exact object comparison.
  #
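
The new `Document.remove_extractors` walks the registered extractor names and delegates each one to `Document.remove_extractor`. The sketch below illustrates that pattern in plain Ruby, without depending on the gem. `SketchDocument`, `define_extractor`, and the `init_*` method names are hypothetical stand-ins, not Wgit's real API, and iterating over a copy of the registry is a cautious choice made for this sketch rather than something taken from the diff.

```ruby
require 'set'

# Hypothetical stand-in for Wgit::Document; not the gem's real class.
class SketchDocument
  # Class-level registry of extractor names (assumed to be a Set here).
  @extractors = Set.new

  class << self
    attr_reader :extractors
  end

  # Registers an extractor name and defines a placeholder init method for it.
  def self.define_extractor(var)
    define_method("init_#{var}") { nil }
    @extractors << var.to_sym
    true
  end

  # Removes one extractor. Returns false if its init method does not exist,
  # because remove_method raises NameError in that case.
  def self.remove_extractor(var)
    remove_method("init_#{var}")
    @extractors.delete(var.to_sym)
    true
  rescue NameError
    false
  end

  # Removes every registered extractor by delegating to remove_extractor.
  # A copy of the registry is walked so deletions never touch the
  # collection being iterated.
  def self.remove_extractors
    @extractors.to_a.each { |var| remove_extractor(var) }
    nil
  end
end
```

After registering a few names with `SketchDocument.define_extractor`, a single `SketchDocument.remove_extractors` call leaves the registry empty and removes the generated `init_*` methods.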
data/lib/wgit/url.rb CHANGED
@@ -117,6 +117,13 @@ Addressable::URI::InvalidURIError")
  @date_crawled = bool ? Wgit::Utils.time_stamp : nil
  end
 
+ # Overrides String#inspect to distinguish this Url from a String.
+ #
+ # @return [String] A short textual representation of this Url.
+ def inspect
+ "#<Wgit::Url url=\"#{self}\" crawled=#{@crawled}>"
+ end
+
  # Overrides String#replace setting the new_url @uri and String value.
  #
  # @param new_url [Wgit::Url, String] The new URL value.
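
The `inspect` override above changes how a Url appears in debugging output (`p`, IRB) while leaving its String value untouched. A minimal sketch of the same idea follows, using a hypothetical `SketchUrl` class that subclasses `String`. The class name, constructor, and `crawled:` keyword are assumptions made for illustration and are not the gem's actual interface.

```ruby
# Hypothetical stand-in for Wgit::Url: a String subclass carrying a crawled flag.
class SketchUrl < String
  attr_accessor :crawled

  def initialize(url, crawled: false)
    super(url)          # store the URL text as the String value
    @crawled = crawled
  end

  # Custom inspect so the object is not mistaken for a plain String
  # when printed with `p` or shown in a REPL.
  def inspect
    "#<SketchUrl url=\"#{self}\" crawled=#{@crawled}>"
  end
end

url = SketchUrl.new('http://example.com')
puts url  # prints the plain URL text; to_s is unaffected
p url     # prints the custom inspect form
```

Because only `inspect` is overridden, string operations and comparisons keep working on the underlying URL text.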
data/lib/wgit/version.rb CHANGED
@@ -6,7 +6,7 @@
  # @author Michael Telford
  module Wgit
  # The current gem version of Wgit.
- VERSION = '0.10.7'
+ VERSION = '0.10.8'
 
  # Returns the current gem version of Wgit as a String.
  def self.version
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: wgit
  version: !ruby/object:Gem::Version
- version: 0.10.7
+ version: 0.10.8
  platform: ruby
  authors:
  - Michael Telford
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2022-12-01 00:00:00.000000000 Z
+ date: 2023-08-18 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: addressable