mechanizer 1.10 → 1.11

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 454518f3e065bb85be179436c05269e29bdfbd861d1ea4251979544775ec79ad
-  data.tar.gz: c0c2e74dffb8fb7a69064bce076187fd145f3da0e8c5594638023d8452ea75d6
+  metadata.gz: ef9ac4cda832d55e693e46c28e9dd569cad29e096545338ea364bee468f7f02f
+  data.tar.gz: b58af71643298b06fb45ee3cbb2aaa2df3abc2579826e42c6dc197d739aa8943
 SHA512:
-  metadata.gz: 2afa734cf56b9d9997f8943dab38cbaec17deaaa9f6b7a792f28925ba8e16112aa618d1d2453e9d16d3ac3aadb0fdaa3a87ed77fc02d2f642b97ff9d077e08fe
-  data.tar.gz: a73b49229a1bc6ec421b085f1210a3785b911984f83f3ef99e9cd3e666a0ea7dd299f0609ac2e79c10c08604dbbea86fd3b56f51247ec38c07e59f086202e590
+  metadata.gz: e3034c83bdbb741c0a3f65ec798edf4daf7eff600a233ae6bc301e241ee85b774538c7725e51893375e390e9c17c120683dca1b22986aa1dcf237e6eb19b7901
+  data.tar.gz: d8ca61e01ce5c34fbb906aea820e16d947f79ebcf75bc64cddfc7bf1fe9dbaa744438cb507e9551eceda67f724e9ebb256b3285a481228b2f6b6f3ce5cb84312
data/README.md CHANGED
@@ -7,13 +7,15 @@
 
 Light, easy to use wrapper for Mechanize and Nokogiri. No configuration or error handling to worry about. Simply enter the target URL and Mechanizer scrapes the page for you to easily parse.
 
-#### Recommended Gems
+### Recommended Gems
 Note: the URL MUST be properly formatted and valid, for example:
+
 Correct: https://www.example.com
+
 Incorrect: www.example.com, example.com, https://example.com
 
-##### 1. If you need to pre-format your URLs, try using `CrmFormatter gem`
-##### 2. If you need to verify your URLs, try using `UrlVerifier gem`, which includes the `CrmFormatter gem` inside of it.
+1. If you need to pre-format your URLs, try using the `CrmFormatter` gem
+2. If you need to verify your URLs, try using the `UrlVerifier` gem, which includes the `CrmFormatter` gem inside of it.
 
 Then, feed the results from those gems into this gem. The documentation below assumes the URLs are correctly formatted and have been verified before passing them through the `Mechanizer` gem.
@@ -35,14 +37,14 @@ Or install it yourself as:
 
 ## Usage
 
-#### 1. Instantiate & Pass URL
+### 1. Instantiate & Pass URL
 
 ```
 noko = Mechanizer::Noko.new
 noko_hash = noko.scrape({url: 'https://www.wikipedia.org'})
 ```
 
-#### 2. To Customize Timeout:
+### 2. To Customize Timeout:
 The default timeout is 60 seconds. You can adjust that time or omit the option if 60 is fine.
 
 ```
@@ -51,7 +53,7 @@ args = {url: 'https://www.wikipedia.org', timeout: 30}
 noko_hash = noko.scrape(args)
 ```
 
-#### 3. Noko Result in Hash Format
+### 3. Noko Result in Hash Format
 
 ```
 err_msg = noko_hash[:err_msg]
@@ -59,7 +61,7 @@ page = noko_hash[:page]
 texts_and_hrefs = noko_hash[:texts_and_hrefs]
 ```
 
-#### 4. Example Texts & Hrefs:
+### 4. Example Texts & Hrefs:
 
 ```
 texts_and_hrefs = [
@@ -73,17 +75,17 @@ texts_and_hrefs = [
 ]
 ```
 
-#### 5. Example Parsing Page:
+### 5. Example Parsing Page:
 There are several ways to parse and manipulate `noko_hash[:page]`. Essentially, you parse the page by its CSS classes and HTML tags, either separately or together. Some pages are very straightforward, but others can require a lot of skill. Here is a good reference guide: [Nokogiri Tutorials](http://www.nokogiri.org/tutorials). All Nokogiri methods are available through this wrapper; it simply saves you the setup, manages and reduces errors, and helps automate your scraping process.
 
-##### For the Wikipedia URL in the example above, at the time of this README there is a group of icons on its homepage. If you right-click on any of them you can inspect. Look for any classes that interest you. In this example, it's `.other-project`. Simply paste it like below to get started. Remember, there are several ways to do this, so read the docs and explore what's available.
+For the Wikipedia URL in the example above, at the time of this README there is a group of icons on its homepage. Right-click any of them to inspect, and look for classes that interest you. In this example, it's `.other-project`. Simply paste it as shown below to get started. Remember, there are several ways to do this, so read the docs and explore what's available.
 
 ```
 other_projects = page.css('.other-project')&.text
 other_projects = other_projects.split("\n").reject(&:blank?)
 ```
 
-##### 6. Results from Parsing Page (from example 5):
+### 6. Results from Parsing Page (from example 5):
 
 ```
 other_projects = [
@@ -114,7 +116,7 @@ other_projects = [
 ]
 ```
 
-##### 7. Automating Your Scraping:
+### 7. Automating Your Scraping:
 You may wish to automate your scraping for various reasons, including:
 
 * Verifying inventory items and pricing (car dealers, retail, menus, etc.),
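Taken together, the seven README steps compose into a small scraping loop. The following is a minimal end-to-end sketch, not taken from the README itself: the URL list is illustrative, and it assumes `:err_msg` is nil when a scrape succeeds (hash keys as documented in step 3).

```
require 'mechanizer'

# Illustrative URL list; in practice these would come pre-formatted and
# verified by CrmFormatter / UrlVerifier, as recommended above.
urls = ['https://www.wikipedia.org', 'https://www.example.com']
noko = Mechanizer::Noko.new

urls.each do |url|
  noko_hash = noko.scrape({url: url, timeout: 30})

  # Assumption: :err_msg is nil on success.
  if noko_hash[:err_msg]
    puts "#{url}: #{noko_hash[:err_msg]}"
    next
  end

  # Any Nokogiri method is available on :page; '.other-project' matches the
  # Wikipedia homepage icons described in step 5.
  puts noko_hash[:page].css('.other-project')&.text
end
```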
data/Rakefile CHANGED
@@ -26,6 +26,7 @@ end
 def run_mechanizer
   noko = Mechanizer::Noko.new
   args = {url: 'https://www.wikipedia.org', timeout: 30}
+  # args = {url: 'wikipedia', timeout: 30}
   noko_hash = noko.scrape(args)
 
   err_msg = noko_hash[:err_msg]
@@ -34,5 +35,4 @@ def run_mechanizer
 
   other_projects = page.css('.other-project')&.text
   other_projects = other_projects.split("\n").reject(&:blank?)
-
 end
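The commented-out `args` line added to the Rakefile reads like a hand-toggle for the failure path: `'wikipedia'` is not an absolute URL, which is exactly the case the new `error_parser` branch further below normalizes. A sketch of the toggle; the expected label is an assumption based on that branch.

```
# Swap the comment markers to exercise the failure path by hand:
# args = {url: 'https://www.wikipedia.org', timeout: 30}
args = {url: 'wikipedia', timeout: 30} # deliberately not an absolute URL

noko_hash = Mechanizer::Noko.new.scrape(args)
puts noko_hash[:err_msg] # expected: "Error: URL Not Absolute"
```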
@@ -5,7 +5,6 @@
 # require 'open-uri'
 # require 'whois'
 # require 'delayed_job'
-#
 # require 'timeout'
 # require 'net/ping'
 
@@ -72,7 +71,9 @@ module Mechanizer
   end
 
   def pre_noko_msg(url)
-    puts "\n\n#{'='*40}\nSCRAPING: #{url}\nMax Wait Set: #{@timeout} Seconds\n\n"
+    msg = "\n\n#{'='*40}\nSCRAPING: #{url}\nMax Wait Set: #{@timeout} Seconds\n\n"
+    puts msg
+    msg
   end
 
   def error_parser(err_msg)
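Returning `msg` instead of discarding it after the `puts` makes `pre_noko_msg` easier to test. A sketch of what that enables, assuming the method lives on a `Noko` instance and using `send` in case it is private:

```
noko = Mechanizer::Noko.new

# The banner string is now the return value, so a test can assert on it
# directly instead of capturing stdout.
msg = noko.send(:pre_noko_msg, 'https://www.wikipedia.org')
puts msg.include?('SCRAPING: https://www.wikipedia.org') # => true
```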
@@ -86,6 +87,8 @@ module Mechanizer
       err_msg = "Error: TCP"
     elsif err_msg.include?("execution expired")
       err_msg = "Error: Runtime"
+    elsif err_msg.include?("absolute URL needed")
+      err_msg = "Error: URL Not Absolute"
     else
       err_msg = "Error: Undefined"
     end
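The new branch folds Mechanize's "absolute URL needed" failure into a stable label, so callers can match on `"Error: URL Not Absolute"` rather than raw exception text. A sketch of the mapping, assuming `error_parser` returns the normalized string (the raw message below is illustrative):

```
noko = Mechanizer::Noko.new

raw = "absolute URL needed (not 'wikipedia')" # illustrative raw message
puts noko.send(:error_parser, raw)            # => "Error: URL Not Absolute"

# Anything unrecognized still falls through to the catch-all:
puts noko.send(:error_parser, 'some new failure') # => "Error: Undefined"
```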
@@ -1,3 +1,3 @@
 module Mechanizer
-  VERSION = "1.10"
+  VERSION = "1.11"
 end
metadata CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: mechanizer
 version: !ruby/object:Gem::Version
-  version: '1.10'
+  version: '1.11'
 platform: ruby
 authors:
 - Adam Booth
 autorequire:
 bindir: exe
 cert_chain: []
-date: 2018-07-02 00:00:00.000000000 Z
+date: 2018-07-04 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: activesupport