metainspector 1.9.0 → 1.9.1

Sign up to get free protection for your applications and to get access to all the features.
data/README.rdoc CHANGED
@@ -14,16 +14,21 @@ This gem is tested on Ruby versions 1.8.7, 1.9.2 and 1.9.3.
14
14
 
15
15
  Initialize a scraper instance for an URL, like this:
16
16
 
17
- page = MetaInspector::Scraper.new('http://pagerankalert.com')
17
+ page = MetaInspector::Scraper.new('http://w3clove.com')
18
18
 
19
19
  or, for short, a convenience alias is also available:
20
20
 
21
- page = MetaInspector.new('http://pagerankalert.com')
21
+ page = MetaInspector.new('http://w3clove.com')
22
22
 
23
23
  If you don't include the scheme on the URL, http:// will be used
24
24
  by defaul:
25
25
 
26
- page = MetaInspector.new('pagerankalert.com')
26
+ page = MetaInspector.new('w3clove.com')
27
+
28
+ By default, MetaInspector times out after 20 seconds of waiting for a page to respond.
29
+ You can set a different timeout with a second parameter, like this:
30
+
31
+ page = MetaInspector.new('w3clove.com', 5) # this would wait just 5 seconds to timeout
27
32
 
28
33
  Then you can see the scraped data like this:
29
34
 
@@ -58,7 +63,7 @@ Please notice that MetaInspector is case sensitive, so page.meta_Content_Type is
58
63
 
59
64
  You can also access most of the scraped data as a hash:
60
65
 
61
- page.to_hash # { "url"=>"http://pagerankalert.com", "title" => "PageRankAlert.com", ... }
66
+ page.to_hash # { "url"=>"http://w3clove.com", "title" => "W3CLove :: site-wide markup validation tool", ... }
62
67
 
63
68
  The full scraped document if accessible from:
64
69
 
@@ -72,23 +77,23 @@ You can find some sample scripts on the samples folder, including a basic scrapi
72
77
  >> require 'metainspector'
73
78
  => true
74
79
 
75
- >> page = MetaInspector.new('http://pagerankalert.com')
76
- => #<MetaInspector:0x11330c0 @url="http://pagerankalert.com">
80
+ >> page = MetaInspector.new('http://w3clove.com')
81
+ => #<MetaInspector:0x11330c0 @url="http://w3clove.com">
77
82
 
78
83
  >> page.title
79
- => "PageRankAlert.com :: Track your PageRank changes"
84
+ => "W3CLove :: site-wide markup validation tool"
80
85
 
81
86
  >> page.meta_description
82
- => "Track your PageRank(TM) changes and receive alerts by email"
87
+ => "Site-wide markup validation tool. Validate the markup of your whole site with just one click."
83
88
 
84
89
  >> page.meta_keywords
85
- => "pagerank, seo, optimization, google"
90
+ => "html, markup, validation, validator, tool, w3c, development, standards, free"
86
91
 
87
92
  >> page.links.size
88
- => 8
93
+ => 15
89
94
 
90
- >> page.links[5]
91
- => "http://pagerankalert.posterous.com"
95
+ >> page.links[4]
96
+ => "/plans-and-pricing"
92
97
 
93
98
  >> page.document.class
94
99
  => String
@@ -103,6 +108,7 @@ You're welcome to fork this project and send pull requests. I want to thank spec
103
108
  * Ryan Romanchuk https://github.com/rromanchuk
104
109
  * Edmund Haselwanter https://github.com/ehaselwanter
105
110
  * Jonathan Hernández https://github.com/ionmx
111
+ * Oriol Gual https://github.com/oriolgual
106
112
 
107
113
  = To Do
108
114
 
@@ -4,6 +4,7 @@ require 'open-uri'
4
4
  require 'nokogiri'
5
5
  require 'charguess'
6
6
  require 'hashie/rash'
7
+ require 'timeout'
7
8
 
8
9
  # MetaInspector provides an easy way to scrape web pages and get its elements
9
10
  module MetaInspector
@@ -11,10 +12,11 @@ module MetaInspector
11
12
  attr_reader :url, :scheme
12
13
  # Initializes a new instance of MetaInspector, setting the URL to the one given
13
14
  # If no scheme given, set it to http:// by default
14
- def initialize(url)
15
- @url = URI.parse(url).scheme.nil? ? 'http://' + url : url
16
- @scheme = URI.parse(url).scheme || 'http'
17
- @data = Hashie::Rash.new('url' => @url)
15
+ def initialize(url, timeout = 20)
16
+ @url = URI.parse(url).scheme.nil? ? 'http://' + url : url
17
+ @scheme = URI.parse(url).scheme || 'http'
18
+ @timeout = timeout
19
+ @data = Hashie::Rash.new('url' => @url)
18
20
  end
19
21
 
20
22
  # Returns the parsed document title, from the content of the <title> tag.
@@ -92,7 +94,7 @@ module MetaInspector
92
94
 
93
95
  # Returns the original, unparsed document
94
96
  def document
95
- @document ||= open(@url).read
97
+ @document ||= Timeout::timeout(@timeout) { open(@url).read }
96
98
 
97
99
  rescue SocketError
98
100
  warn 'MetaInspector exception: The url provided does not exist or is temporarily unavailable (socket error)'
@@ -1,5 +1,5 @@
1
1
  # -*- encoding: utf-8 -*-
2
2
 
3
3
  module MetaInspector
4
- VERSION = "1.9.0"
4
+ VERSION = "1.9.1"
5
5
  end
@@ -6,7 +6,7 @@ module MetaInspector
6
6
  extend self
7
7
 
8
8
  # Sugar method to be able to create a scraper in a shorter way
9
- def new(url)
10
- Scraper.new(url)
9
+ def new(url, timeout = 20)
10
+ Scraper.new(url, timeout)
11
11
  end
12
12
  end
@@ -14,7 +14,7 @@ Gem::Specification.new do |gem|
14
14
  gem.require_paths = ["lib"]
15
15
  gem.version = MetaInspector::VERSION
16
16
 
17
- gem.add_dependency 'nokogiri', '1.5.3'
17
+ gem.add_dependency 'nokogiri', '~> 1.5'
18
18
  gem.add_dependency 'charguess', '1.3.20111021164500'
19
19
  gem.add_dependency 'rash', '0.3.2'
20
20
 
metadata CHANGED
@@ -1,13 +1,13 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: metainspector
3
3
  version: !ruby/object:Gem::Version
4
- hash: 51
4
+ hash: 49
5
5
  prerelease:
6
6
  segments:
7
7
  - 1
8
8
  - 9
9
- - 0
10
- version: 1.9.0
9
+ - 1
10
+ version: 1.9.1
11
11
  platform: ruby
12
12
  authors:
13
13
  - Jaime Iniesta
@@ -15,7 +15,7 @@ autorequire:
15
15
  bindir: bin
16
16
  cert_chain: []
17
17
 
18
- date: 2012-06-03 00:00:00 Z
18
+ date: 2012-07-11 00:00:00 Z
19
19
  dependencies:
20
20
  - !ruby/object:Gem::Dependency
21
21
  name: nokogiri
@@ -23,14 +23,13 @@ dependencies:
23
23
  requirement: &id001 !ruby/object:Gem::Requirement
24
24
  none: false
25
25
  requirements:
26
- - - "="
26
+ - - ~>
27
27
  - !ruby/object:Gem::Version
28
28
  hash: 5
29
29
  segments:
30
30
  - 1
31
31
  - 5
32
- - 3
33
- version: 1.5.3
32
+ version: "1.5"
34
33
  type: :runtime
35
34
  version_requirements: *id001
36
35
  - !ruby/object:Gem::Dependency