RubyGems - metainspector - Versions diffs - 1.9.0 → 1.9.1 - Mend

metainspector 1.9.0 → 1.9.1

Files changed (6)

data/README.rdoc +18 -12
data/lib/meta_inspector/scraper.rb +7 -5
data/lib/meta_inspector/version.rb +1 -1
data/lib/meta_inspector.rb +2 -2
data/meta_inspector.gemspec +1 -1
metadata +6 -7

data/README.rdoc CHANGED

@@ -14,16 +14,21 @@ This gem is tested on Ruby versions 1.8.7, 1.9.2 and 1.9.3.
 Initialize a scraper instance for an URL, like this:
-  page = MetaInspector::Scraper.new('http://pagerankalert.com')
+  page = MetaInspector::Scraper.new('http://w3clove.com')
 or, for short, a convenience alias is also available:
-  page = MetaInspector.new('http://pagerankalert.com')
+  page = MetaInspector.new('http://w3clove.com')
 If you don't include the scheme on the URL, http:// will be used
 by defaul:
-  page = MetaInspector.new('pagerankalert.com')
+  page = MetaInspector.new('w3clove.com')
+By default, MetaInspector times out after 20 seconds of waiting for a page to respond.
+You can set a different timeout with a second parameter, like this:
+  page = MetaInspector.new('w3clove.com', 5) # this would wait just 5 seconds to timeout
 Then you can see the scraped data like this:
@@ -58,7 +63,7 @@ Please notice that MetaInspector is case sensitive, so page.meta_Content_Type is
 You can also access most of the scraped data as a hash:
-  page.to_hash               # { "url"=>"http://pagerankalert.com", "title" => "PageRankAlert.com", ... }
+  page.to_hash               # { "url"=>"http://w3clove.com", "title" => "W3CLove :: site-wide markup validation tool", ... }
 The full scraped document if accessible from:
@@ -72,23 +77,23 @@ You can find some sample scripts on the samples folder, including a basic scrapi
   >> require 'metainspector'
   => true
-  >> page = MetaInspector.new('http://pagerankalert.com')
-  => #<MetaInspector:0x11330c0 @url="http://pagerankalert.com">
+  >> page = MetaInspector.new('http://w3clove.com')
+  => #<MetaInspector:0x11330c0 @url="http://w3clove.com">
   >> page.title
-  => "PageRankAlert.com :: Track your PageRank changes"
+  => "W3CLove :: site-wide markup validation tool"
   >> page.meta_description
-  => "Track your PageRank(TM) changes and receive alerts by email"
+  => "Site-wide markup validation tool. Validate the markup of your whole site with just one click."
   >> page.meta_keywords
-  => "pagerank, seo, optimization, google"
+  => "html, markup, validation, validator, tool, w3c, development, standards, free"
   >> page.links.size
-  => 8
+  => 15
-  >> page.links[5]
-  => "http://pagerankalert.posterous.com"
+  >> page.links[4]
+  => "/plans-and-pricing"
   >> page.document.class
   => String
@@ -103,6 +108,7 @@ You're welcome to fork this project and send pull requests. I want to thank spec
 * Ryan Romanchuk https://github.com/rromanchuk
 * Edmund Haselwanter https://github.com/ehaselwanter
 * Jonathan Hernández https://github.com/ionmx
+* Oriol Gual https://github.com/oriolgual
 = To Do

data/lib/meta_inspector/scraper.rb CHANGED

@@ -4,6 +4,7 @@ require 'open-uri'
 require 'nokogiri'
 require 'charguess'
 require 'hashie/rash'
+require 'timeout'
 # MetaInspector provides an easy way to scrape web pages and get its elements
 module MetaInspector
@@ -11,10 +12,11 @@ module MetaInspector
     attr_reader :url, :scheme
     # Initializes a new instance of MetaInspector, setting the URL to the one given
     # If no scheme given, set it to http:// by default
-    def initialize(url)
-      @url    = URI.parse(url).scheme.nil? ? 'http://' + url : url
-      @scheme = URI.parse(url).scheme || 'http'
-      @data   = Hashie::Rash.new('url' => @url)
+    def initialize(url, timeout = 20)
+      @url      = URI.parse(url).scheme.nil? ? 'http://' + url : url
+      @scheme   = URI.parse(url).scheme || 'http'
+      @timeout  = timeout
+      @data     = Hashie::Rash.new('url' => @url)
     end
     # Returns the parsed document title, from the content of the <title> tag.
@@ -92,7 +94,7 @@ module MetaInspector
     # Returns the original, unparsed document
     def document
-      @document ||= open(@url).read
+      @document ||= Timeout::timeout(@timeout) { open(@url).read }
       rescue SocketError
         warn 'MetaInspector exception: The url provided does not exist or is temporarily unavailable (socket error)'
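
Note on the hunk above: the fetch is now wrapped in Timeout::timeout, which raises Timeout::Error when the block runs longer than the given number of seconds. The visible part of the hunk only shows a rescue for SocketError, so how the gem handles the timeout error itself is not shown here. The following is a minimal sketch of the general pattern, not the gem's code; the helper name fetch_with_timeout is hypothetical, and a sleep stands in for a slow network read so the example runs without network access.

```ruby
require 'timeout'

# Hypothetical helper (not part of MetaInspector): runs the given block under
# a time limit and returns nil instead of raising when the limit is exceeded.
def fetch_with_timeout(seconds)
  Timeout.timeout(seconds) { yield }
rescue Timeout::Error
  warn "timed out after #{seconds} seconds"
  nil
end

# A block that finishes in time returns its value unchanged.
fast = fetch_with_timeout(1) { 'page body' }

# A block that takes too long is interrupted; sleep simulates a slow server.
slow = fetch_with_timeout(0.1) { sleep 1; 'never returned' }

puts fast.inspect
puts slow.inspect
```

Returning nil on timeout is only one possible design; a caller that needs to tell a timeout apart from an empty response would probably let the error propagate or wrap it in its own exception.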

data/lib/meta_inspector/version.rb CHANGED

@@ -1,5 +1,5 @@
 # -*- encoding: utf-8 -*-
 module MetaInspector
-  VERSION = "1.9.0"
+  VERSION = "1.9.1"
 end

data/lib/meta_inspector.rb CHANGED

@@ -6,7 +6,7 @@ module MetaInspector
   extend self
   # Sugar method to be able to create a scraper in a shorter way
-  def new(url)
-    Scraper.new(url)
+  def new(url, timeout = 20)
+    Scraper.new(url, timeout)
   end
 end
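
Note on the hunk above: the convenience alias has to repeat the default value so that the one-argument call keeps working and the optional second argument is forwarded to the scraper. A rough sketch of that shape follows, using a stand-in module named Inspector and a stripped-down Scraper that only stores its arguments; both are illustrative assumptions and do not fetch anything.

```ruby
require 'uri'

# Stand-in for the real module; names and behavior are illustrative only.
module Inspector
  extend self

  # Stripped-down scraper: stores the normalized URL and the timeout.
  class Scraper
    attr_reader :url, :timeout

    def initialize(url, timeout = 20)
      # Prepend http:// when the given URL has no scheme, as in the diff.
      @url     = URI.parse(url).scheme.nil? ? 'http://' + url : url
      @timeout = timeout
    end
  end

  # Sugar method: forwards both arguments, keeping the same default.
  def new(url, timeout = 20)
    Scraper.new(url, timeout)
  end
end

default_page = Inspector.new('example.com')
quick_page   = Inspector.new('http://example.com', 5)

puts default_page.url
puts default_page.timeout
puts quick_page.timeout
```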

data/meta_inspector.gemspec CHANGED

@@ -14,7 +14,7 @@ Gem::Specification.new do |gem|
   gem.require_paths = ["lib"]
   gem.version       = MetaInspector::VERSION
-  gem.add_dependency 'nokogiri', '1.5.3'
+  gem.add_dependency 'nokogiri', '~> 1.5'
   gem.add_dependency 'charguess', '1.3.20111021164500'
   gem.add_dependency 'rash', '0.3.2'
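
Note on the hunk above: the nokogiri dependency moves from an exact pin to the pessimistic operator. With two version segments, '~> 1.5' accepts any release from 1.5 up to, but not including, 2.0. The snippet below is a small illustration using the RubyGems API; the list of version numbers is arbitrary and chosen only to show both sides of each boundary.

```ruby
require 'rubygems'

exact       = Gem::Requirement.new('1.5.3')  # a bare version means '= 1.5.3'
pessimistic = Gem::Requirement.new('~> 1.5') # equivalent to '>= 1.5', '< 2.0'

# Arbitrary sample versions, picked to sit on either side of the boundaries.
%w[1.4.7 1.5.0 1.5.3 1.6.0 2.0.0].each do |number|
  version = Gem::Version.new(number)
  puts "#{number}: exact=#{exact.satisfied_by?(version)} " \
       "pessimistic=#{pessimistic.satisfied_by?(version)}"
end
```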

metadata CHANGED

@@ -1,13 +1,13 @@
 --- !ruby/object:Gem::Specification
 name: metainspector
 version: !ruby/object:Gem::Version
-  hash: 51
+  hash: 49
   prerelease:
   segments:
   - 1
   - 9
-  - 0
-  version: 1.9.0
+  - 1
+  version: 1.9.1
 platform: ruby
 authors:
 - Jaime Iniesta
@@ -15,7 +15,7 @@ autorequire:
 bindir: bin
 cert_chain: []
-date: 2012-06-03 00:00:00 Z
+date: 2012-07-11 00:00:00 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: nokogiri
@@ -23,14 +23,13 @@ dependencies:
   requirement: &id001 !ruby/object:Gem::Requirement
     none: false
     requirements:
-    - - "="
+    - - ~>
       - !ruby/object:Gem::Version
         hash: 5
         segments:
         - 1
         - 5
-        - 3
-        version: 1.5.3
+        version: "1.5"
   type: :runtime
   version_requirements: *id001
 - !ruby/object:Gem::Dependency