wgit 0.10.7 → 0.10.8

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: cd1829ff2dda87e2b88fb738c18ba3a31765a8e3afbe82874e804b58b6c094fb
- data.tar.gz: 8085a6ab3da61aea02fea4bb1f7e7c18caf3edca0d4d11161df3e3a0255293e6
+ metadata.gz: 66e8b435303d07b2f81d260badc96662936599c9782916f7f014b74a7c617499
+ data.tar.gz: 7b55890c66ec09efd8d5749bd66605a4cb43d5091416f072f8fcc5aaaa85fbe7
  SHA512:
- metadata.gz: add6099d1433baebf93b4ad9471a5a35fb0a551e28eded322c46dfa4cfc45eea059284c8645bc2eb8bf33dd91c55060969f15ce6b24567445ffb737ae8a9afc4
- data.tar.gz: 787830ce4f6eea9c7270542c36718c54f48beb7f33e26feb06abbf72d0ad0750943542de051ff7b43caced904ea83550e877631b1ecb936c8f3b6ee211713282
+ metadata.gz: fe1b605224f6682ac504f17b55ab83518556f1320f0410741af8f95bf3a669918c69b48832fb413ca1f78482fdbb7e0d2e7d6f57841c6a562b7f926f7511cdd7
+ data.tar.gz: 856be2111709bc96488b7d43abbc49c563a9a56330344adb4b9ec40fc263cb91e63465c3c3dab317c0d8930965a609a43102d53d80bbc2001e6165a15cb905fa
data/CHANGELOG.md CHANGED
@@ -9,6 +9,17 @@
  - ...
  ---
 
+ ## v0.10.8
+ ### Added
+ - Custom `#inspect` methods to `Wgit::Url` and `Wgit::Document` classes.
+ - `Document.remove_extractors` method, which removes all default and defined extractors.
+
+ ### Changed/Removed
+ - ...
+ ### Fixed
+ - ...
+ ---
+
  ## v0.10.7
  ### Added
  - ...
data/README.md CHANGED
@@ -62,7 +62,23 @@ end
  puts JSON.generate(quotes)
  ```
 
- But what if we want to crawl and store the content in a database, so that it can be searched? Wgit makes it easy to index and search HTML using [MongoDB](https://www.mongodb.com/):
+ Which outputs:
+
+ ```text
+ [
+ {
+ "quote": "“The person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.”",
+ "author": "Jane Austen"
+ },
+ {
+ "quote": "“A day without sunshine is like, you know, night.”",
+ "author": "Steve Martin"
+ },
+ ...
+ ]
+ ```
+
+ Great! But what if we want to crawl and store the content in a database, so that it can be searched? Wgit makes it easy to index and search HTML using [MongoDB](https://www.mongodb.com/):
 
  ```ruby
  require 'wgit'
@@ -89,6 +105,8 @@ The `search` call (on the last line) will return and output the results:
  Quotes to Scrape
  “I am free of all prejudice. I hate everyone equally. ”
  http://quotes.toscrape.com/tag/humor/page/2/
+
+ ...
  ```
 
  Using a MongoDB [client](https://robomongo.org/), we can see that the two web pages have been indexed, along with their extracted *quotes* and *authors*:
data/lib/wgit/document.rb CHANGED
@@ -192,13 +192,27 @@ module Wgit
  Document.send(:remove_method, "init_#{var}_from_object")
 
  @extractors.delete(var.to_sym)
+
  true
  rescue NameError
  false
  end
 
+ # Removes all default and defined extractors by calling
+ # `Document.remove_extractor` underneath. See its documentation.
+ def self.remove_extractors
+ @extractors.each { |var| remove_extractor(var) }
+ end
+
  ### Document Instance Methods ###
 
+ # Overrides String#inspect to shorten the printed output of a Document.
+ #
+ # @return [String] A short textual representation of this Document.
+ def inspect
+ "#<Wgit::Document url=\"#{@url}\" html=#{size} bytes>"
+ end
+
  # Determines if both the url and html match. Use
  # doc.object_id == other.object_id for exact object comparison.
  #
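
The `remove_extractors` method in this hunk removes every entry by calling the single-entry `remove_extractor` for each item in `@extractors`, which itself deletes from that collection. As a rough illustration of the same delegate-to-single-removal pattern, here is a minimal sketch using a hypothetical `ExtractorRegistry` class (an assumption for illustration, not Wgit's code). It iterates over a snapshot of the collection, so the loop does not depend on how the underlying collection behaves when it is modified mid-iteration:

```ruby
require 'set'

# Hypothetical stand-in for a class that tracks named extractors.
# This is NOT Wgit::Document; the class name and its contents are assumptions.
class ExtractorRegistry
  @extractors = Set.new(%i[title author keywords])

  # Exposes the registered extractor names.
  def self.extractors
    @extractors
  end

  # Removes a single extractor; returns true if it was registered,
  # false otherwise (Set#delete? returns nil when nothing was deleted).
  def self.remove_extractor(var)
    !@extractors.delete?(var.to_sym).nil?
  end

  # Removes every extractor by delegating to remove_extractor.
  # Iterating over a copy (to_a) keeps the loop independent of the
  # deletions made to the live Set.
  def self.remove_extractors
    @extractors.to_a.each { |var| remove_extractor(var) }
    nil
  end
end

ExtractorRegistry.remove_extractors
puts ExtractorRegistry.extractors.empty? # prints "true"
```

Delegating to the single-entry method keeps any per-entry cleanup in one place; the snapshot is a defensive choice in this sketch rather than something the diff itself does.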
data/lib/wgit/url.rb CHANGED
@@ -117,6 +117,13 @@ Addressable::URI::InvalidURIError")
  @date_crawled = bool ? Wgit::Utils.time_stamp : nil
  end
 
+ # Overrides String#inspect to distinguish this Url from a String.
+ #
+ # @return [String] A short textual representation of this Url.
+ def inspect
+ "#<Wgit::Url url=\"#{self}\" crawled=#{@crawled}>"
+ end
+
  # Overrides String#replace setting the new_url @uri and String value.
  #
  # @param new_url [Wgit::Url, String] The new URL value.
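
To show what overriding `#inspect` on a `String` subclass changes in practice, here is a minimal sketch. The `TaggedUrl` class and its `crawled:` keyword are hypothetical names loosely modelled on the hunk above, not Wgit's implementation. `puts` still prints the plain string value, while `p` (and consoles such as IRB) use the custom `inspect` output:

```ruby
# Hypothetical String subclass for illustration only; not Wgit::Url.
class TaggedUrl < String
  def initialize(url, crawled: false)
    super(url)          # Sets the underlying String value.
    @crawled = crawled  # Extra state that a plain String would not show.
  end

  # Makes the object visibly different from a plain String when inspected.
  def inspect
    "#<TaggedUrl url=\"#{self}\" crawled=#{@crawled}>"
  end
end

url = TaggedUrl.new('http://example.com')

puts url # prints: http://example.com
p url    # prints: #<TaggedUrl url="http://example.com" crawled=false>
```

Without the override, `p url` would print only the quoted string, which hides both the class and the `@crawled` state.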
data/lib/wgit/version.rb CHANGED
@@ -6,7 +6,7 @@
  # @author Michael Telford
  module Wgit
  # The current gem version of Wgit.
- VERSION = '0.10.7'
+ VERSION = '0.10.8'
 
  # Returns the current gem version of Wgit as a String.
  def self.version
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: wgit
  version: !ruby/object:Gem::Version
- version: 0.10.7
+ version: 0.10.8
  platform: ruby
  authors:
  - Michael Telford
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2022-12-01 00:00:00.000000000 Z
+ date: 2023-08-18 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: addressable