RubyGems - metainspector - Versions diffs - 1.17.2 → 1.17.3 - Mend

metainspector 1.17.2 → 1.17.3

Files changed (8)

checksums.yaml +4 -4
data/README.md +10 -16
data/lib/meta_inspector/parser.rb +22 -1
data/lib/meta_inspector/version.rb +1 -1
data/spec/fixtures/opengraph.response +52 -0
data/spec/parser_spec.rb +89 -14
data/spec/spec_helper.rb +1 -0
metadata +4 -3

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz: 6d1795a40750fba9db895d1ab321f54d94349a38
-  data.tar.gz: 0648ba0471516d9a6e33a9439972619d51580ff7
+  metadata.gz: 5c94b07066d8b0080029d5e93808a4388b716575
+  data.tar.gz: 33135740e3e740e21c4ccc011a44a3466c73a926
 SHA512:
-  metadata.gz: a4c8859c52e9f424cf08c72b8f3515dede0f678ac87bcc2331a8e543fa1a839912c0fd85f2b5c56069990cd270dd5db8c2761ac2a3dce6d5929ba906b3692856
-  data.tar.gz: 61217d3d51e81e892f6750f8d24379ea75a80d9e7d07c2408d6c762786e5c088a14996674ed2cb070a5e532c9286e406e55832398c5bb842f4bd51497fcc1674
+  metadata.gz: 4c3ffda64efceaaaa9631178751df36a0316d17176ce0788ce6256fbd96ac5347f8fc9c6b7019c3942bcd746a2f8868f78fdaa7d7aab83d4314a6a01dd9522dd
+  data.tar.gz: 38c3fd01c8c156c82985c6b6972cec1002d90135f4381ab53f9e69e4dd6595bc2b22c8a530c2035703aabc318dfb9f37abf62396502dae12b994183aea091db2
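
The digests above cover the `metadata.gz` and `data.tar.gz` members of the gem archive, so they can be recomputed locally to confirm that a downloaded copy matches the registry. The following is a minimal sketch, assuming the member's bytes have already been read into a string; `digest_matches?` is a hypothetical helper for illustration, not part of RubyGems.

```ruby
require "digest/sha1"
require "digest/sha2"

# Hypothetical helper (not a RubyGems API): returns true when the hex digest
# of `bytes` equals `expected_hex` for the chosen algorithm.
def digest_matches?(bytes, expected_hex, algorithm: :sha512)
  digest_class = algorithm == :sha1 ? Digest::SHA1 : Digest::SHA512
  digest_class.hexdigest(bytes) == expected_hex.downcase
end

# Demonstration with an in-memory string; for a real check, `bytes` would be
# the contents of an extracted archive member, e.g. File.binread("data.tar.gz").
sample = "abc"
puts digest_matches?(sample, Digest::SHA1.hexdigest(sample), algorithm: :sha1)
puts digest_matches?(sample, Digest::SHA512.hexdigest(sample))
```

Comparing lowercase hex strings keeps the check independent of how the expected value was capitalised when copied.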

data/README.md CHANGED

@@ -22,15 +22,15 @@ This gem is tested on Ruby versions 1.9.2, 1.9.3 and 2.0.0.
 Initialize a MetaInspector instance for an URL, like this:
-    page = MetaInspector.new('http://markupvalidator.com')
+    page = MetaInspector.new('http://sitevalidator.com')
 If you don't include the scheme on the URL, http:// will be used by default:
-    page = MetaInspector.new('markupvalidator.com')
+    page = MetaInspector.new('sitevalidator.com')
 You can also include the html which will be used as the document to scrape:
-    page = MetaInspector.new("http://markupvalidator.com", :document => "<html><head><title>Hello From Passed Html</title><a href='/hello'>Hello link</a></head><body></body></html>")
+    page = MetaInspector.new("http://sitevalidator.com", :document => "<html><head><title>Hello From Passed Html</title><a href='/hello'>Hello link</a></head><body></body></html>")
 ## Accessing scraped data
@@ -38,8 +38,8 @@ Then you can see the scraped data like this:
     page.url                # URL of the page
     page.scheme             # Scheme of the page (http, https)
-    page.host               # Hostname of the page (like, markupvalidator.com, without the scheme)
-    page.root_url           # Root url (scheme + host, like http://markupvalidator.com/)
+    page.host               # Hostname of the page (like, sitevalidator.com, without the scheme)
+    page.root_url           # Root url (scheme + host, like http://sitevalidator.com/)
     page.title              # title of the page, as string
     page.links              # array of strings, with every link found on the page as an absolute URL
     page.internal_links     # array of strings, with every internal link found on the page as an absolute URL
@@ -69,7 +69,7 @@ Please notice that MetaInspector is case sensitive, so `page.meta_Content_Type`
 You can also access most of the scraped data as a hash:
-    page.to_hash  # { "url"   => "http://markupvalidator.com",
+    page.to_hash  # { "url"   => "http://sitevalidator.com",
                       "title" => "MarkupValidator :: site-wide markup validation tool", ... }
 The original document is accessible from:
@@ -106,7 +106,7 @@ Note that MetaInspector gives priority to content over value. In other words if
 By default, MetaInspector times out after 20 seconds of waiting for a page to respond.
 You can set a different timeout with a second parameter, like this:
-    page = MetaInspector.new('markupvalidator.com', :timeout => 5) # 5 seconds timeout
+    page = MetaInspector.new('sitevalidator.com', :timeout => 5) # 5 seconds timeout
 ### Redirections
@@ -124,7 +124,7 @@ However, you can tell MetaInspector to allow these redirections with the option
 MetaInspector will try to parse all URLs by default. If you want to raise an exception when trying to parse a non-html URL (one that has a content-type different than text/html), you can state it like this:
-    page = MetaInspector.new('markupvalidator.com', :html_content_only => true)
+    page = MetaInspector.new('sitevalidator.com', :html_content_only => true)
 This is useful when using MetaInspector on web spidering. Although on the initial URL you'll probably have an HTML URL, following links you may find yourself trying to parse non-html URLs.
@@ -167,8 +167,8 @@ You can find some sample scripts on the samples folder, including a basic scrapi
     >> require 'metainspector'
     => true
-    >> page = MetaInspector.new('http://markupvalidator.com')
-    => #<MetaInspector:0x11330c0 @url="http://markupvalidator.com">
+    >> page = MetaInspector.new('http://sitevalidator.com')
+    => #<MetaInspector:0x11330c0 @url="http://sitevalidator.com">
     >> page.title
     => "MarkupValidator :: site-wide markup validation tool"
@@ -185,12 +185,6 @@ You can find some sample scripts on the samples folder, including a basic scrapi
     >> page.links[4]
     => "/plans-and-pricing"
-    >> page.document.class
-    => String
-    >> page.parsed_document.class
-    => Nokogiri::HTML::Document
 ## ZOMG Fork! Thank you!
 You're welcome to fork this project and send pull requests. Just remember to include specs.

data/lib/meta_inspector/parser.rb CHANGED

@@ -106,7 +106,11 @@ module MetaInspector
         key = meta_tags_method.meta_tag
         #special treatment for opengraph (og:) and twitter card (twitter:) tags
-        key.gsub!("_",":") if key =~ /^og_(.*)/ || key =~ /^twitter_(.*)/
+        if key =~ /^og_(.*)/
+          key = og_key(key)
+        elsif key =~ /^twitter_(.*)/
+          key.gsub!("_",":")
+        end
         scrape_meta_data
@@ -116,6 +120,23 @@ module MetaInspector
       end
     end
+    # Not all OG keys can be directly translated to meta tags method names replacing _ by : as they include the _ in the name
+    # This is going to be deprecated and replaced soon by a simpler, clearer method, like page.meta['og:site_name']
+    def og_key(key)
+      case key
+      when "og_site_name"
+        "og:site_name"
+      when "og_image_secure_url"
+        "og:image:secure_url"
+      when "og_video_secure_url"
+        "og:video:secure_url"
+      when "og_audio_secure_url"
+        "og:audio:secure_url"
+      else
+        key.gsub("_", ":")
+      end
+    end
     # Scrapes all meta tags found
     def scrape_meta_data
       unless @data.meta
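
To illustrate why the special cases in this hunk are needed, the sketch below copies the `og_key` mapping out of the parser so it runs without the gem, and compares it with the plain underscore-to-colon replacement that 1.17.2 applied to every `og_` key. The `naive_key` helper is hypothetical and exists only for the comparison.

```ruby
# Standalone copy of the mapping added in 1.17.3, lifted out of
# MetaInspector::Parser so it can run without the gem installed.
def og_key(key)
  case key
  when "og_site_name"        then "og:site_name"
  when "og_image_secure_url" then "og:image:secure_url"
  when "og_video_secure_url" then "og:video:secure_url"
  when "og_audio_secure_url" then "og:audio:secure_url"
  else key.gsub("_", ":")
  end
end

# Hypothetical helper mirroring the 1.17.2 behaviour: replace every underscore.
def naive_key(key)
  key.gsub("_", ":")
end

# Print both translations side by side for a few method-name suffixes.
%w[og_title og_site_name og_image_secure_url].each do |name|
  puts "#{name}: naive=#{naive_key(name)} fixed=#{og_key(name)}"
end
```

Keys without an underscore in the property name, such as `og_title` or `og_video_width`, translate identically either way; only the four listed properties differ.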

data/lib/meta_inspector/version.rb CHANGED

@@ -1,5 +1,5 @@
 # -*- encoding: utf-8 -*-
 module MetaInspector
-  VERSION = "1.17.2"
+  VERSION = "1.17.3"
 end

data/spec/fixtures/opengraph.response ADDED

@@ -0,0 +1,52 @@
+HTTP/1.1 200 OK
+Age: 13
+Cache-Control: max-age=120
+Content-Type: text/html
+Date: Mon, 06 Jan 2014 12:47:42 GMT
+Expires: Mon, 06 Jan 2014 12:49:28 GMT
+Server: Apache/2.2.14 (Ubuntu)
+Vary: Accept-Encoding
+Via: 1.1 varnish
+X-Powered-By: PHP/5.3.2-1ubuntu4.22
+X-Varnish: 1188792404 1188790413
+Content-Length: 40571
+Connection: keep-alive
+<!DOCTYPE html>
+<html xmlns="http://www.w3.org/1999/xhtml" xmlns:fb="http://www.facebook.com/2008/fbml">
+  <head>
+    <meta http-equiv="Content-type" content="text/html; charset=utf-8">
+    <!-- Basic OG Metadata -->
+    <meta property="og:title" content="An OG title" />
+    <meta property="og:type" content="website" />
+    <meta property="og:url" content="http://example.com/opengraph" />
+    <!-- Optional OG Metadata -->
+    <meta property="og:description" content="Sean Connery found fame and fortune" />
+    <meta property="og:determiner" content="the" />
+    <meta property="og:locale" content="en_GB" />
+    <meta property="og:locale:alternate" content="fr_FR" />
+    <meta property="og:site_name" content="IMDb" />
+    <!-- Structured OG Properties -->
+    <meta property="og:image" content="http://example.com/ogp.jpg" />
+    <meta property="og:image:secure_url" content="https://secure.example.com/ogp.jpg" />
+    <meta property="og:image:type" content="image/jpeg" />
+    <meta property="og:image:width" content="400" />
+    <meta property="og:image:height" content="300" />
+    <meta property="og:video" content="http://example.com/movie.swf" />
+    <meta property="og:video:secure_url" content="https://secure.example.com/movie.swf" />
+    <meta property="og:video:type" content="application/x-shockwave-flash" />
+    <meta property="og:video:width" content="400" />
+    <meta property="og:video:height" content="300" />
+    <meta property="og:audio" content="http://example.com/sound.mp3" />
+    <meta property="og:audio:secure_url" content="https://secure.example.com/sound.mp3" />
+    <meta property="og:audio:type" content="audio/mpeg" />
+  </head>
+  <body>
+    <p>A sample page with many Open Graph meta tags</p>
+  </body>
+</html>

data/spec/parser_spec.rb CHANGED

@@ -292,7 +292,7 @@ describe MetaInspector::Parser do
   end
   describe 'respond_to? for not implemented methods' do
     before(:each) do
       @m = MetaInspector.new('http://pagerankalert.com')
     end
@@ -346,16 +346,6 @@ describe MetaInspector::Parser do
       @m.meta_generator.should == 'WordPress 3.4.2'
     end
-    it "should find a meta_og_title" do
-      @m = MetaInspector::Parser.new(doc 'http://www.theonion.com/articles/apple-claims-new-iphone-only-visible-to-most-loyal,2772/')
-      @m.meta_og_title.should == "Apple Claims New iPhone Only Visible To Most Loyal Of Customers"
-    end
-    it "should not find a meta_og_something" do
-      @m = MetaInspector::Parser.new(doc 'http://www.theonion.com/articles/apple-claims-new-iphone-only-visible-to-most-loyal,2772/')
-      @m.meta_og_something.should == nil
-    end
     it "should find a meta_twitter_site" do
       @m = MetaInspector::Parser.new(doc 'http://www.youtube.com/watch?v=iaGSSrp49uc')
       @m.meta_twitter_site.should == "@youtube"
@@ -371,9 +361,94 @@ describe MetaInspector::Parser do
       @m.meta_twitter_dummy.should == nil
     end
-    it "should find a meta_og_video_width" do
-      @m = MetaInspector::Parser.new(doc 'http://www.youtube.com/watch?v=iaGSSrp49uc')
-      @m.meta_og_video_width.should == "1920"
+    describe "opengraph meta tags" do
+      before(:each) do
+        @m = MetaInspector::Parser.new(doc 'http://example.com/opengraph')
+      end
+      it "should find a meta og:title" do
+        @m.meta_og_title.should == "An OG title"
+      end
+      it "should find a meta og:type" do
+        @m.meta_og_type.should == "website"
+      end
+      it "should find a meta og:url" do
+        @m.meta_og_url.should == "http://example.com/opengraph"
+      end
+      it "should find a meta og:description" do
+        @m.meta_og_description.should == "Sean Connery found fame and fortune"
+      end
+      it "should find a meta og:determiner" do
+        @m.meta_og_determiner.should == "the"
+      end
+      it "should find a meta og:locale" do
+        @m.meta_og_locale.should == "en_GB"
+      end
+      it "should find a meta og:locale:alternate" do
+        @m.meta_og_locale_alternate.should == "fr_FR"
+      end
+      it "should find a meta og:site_name" do
+        @m.meta_og_site_name.should == "IMDb"
+      end
+      it "should find a meta og:image" do
+        @m.meta_og_image.should == "http://example.com/ogp.jpg"
+      end
+      it "should find a meta og:image:secure_url" do
+        @m.meta_og_image_secure_url.should == "https://secure.example.com/ogp.jpg"
+      end
+      it "should find a meta og:image:type" do
+        @m.meta_og_image_type.should == "image/jpeg"
+      end
+      it "should find a meta og:image:width" do
+        @m.meta_og_image_width.should == "400"
+      end
+      it "should find a meta og:image:height" do
+        @m.meta_og_image_height.should == "300"
+      end
+      it "should find a meta og:video" do
+        @m.meta_og_video.should == "http://example.com/movie.swf"
+      end
+      it "should find a meta og:video:secure_url" do
+        @m.meta_og_video_secure_url.should == "https://secure.example.com/movie.swf"
+      end
+      it "should find a meta og:video:type" do
+        @m.meta_og_video_type.should == "application/x-shockwave-flash"
+      end
+      it "should find a meta og:video:width" do
+        @m.meta_og_video_width.should == "400"
+      end
+      it "should find a meta og:video:height" do
+        @m.meta_og_video_height.should == "300"
+      end
+      it "should find a meta og:audio" do
+        @m.meta_og_audio.should == "http://example.com/sound.mp3"
+      end
+      it "should find a meta og:video:secure_url" do
+        @m.meta_og_audio_secure_url.should == "https://secure.example.com/sound.mp3"
+      end
+      it "should find a meta og:audio:type" do
+        @m.meta_og_audio_type.should == "audio/mpeg"
+      end
     end
   end

data/spec/spec_helper.rb CHANGED

@@ -41,6 +41,7 @@ FakeWeb.register_uri(:get, "http://charset002.com", :response => fixture_file("c
 FakeWeb.register_uri(:get, "http://www.inkthemes.com/", :response => fixture_file("wordpress_site.response"))
 FakeWeb.register_uri(:get, "http://pagerankalert.com/image.png", :body => "Image", :content_type => "image/png")
 FakeWeb.register_uri(:get, "http://pagerankalert.com/file.tar.gz", :body => "Image", :content_type => "application/x-gzip")
+FakeWeb.register_uri(:get, "http://example.com/opengraph", :response => fixture_file("opengraph.response"))
 # These examples are used to test relative links
 FakeWeb.register_uri(:get, "http://relative.com/", :response => fixture_file("relative_links.response"))

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: metainspector
 version: !ruby/object:Gem::Version
-  version: 1.17.2
+  version: 1.17.3
 platform: ruby
 authors:
 - Jaime Iniesta
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2013-11-21 00:00:00.000000000 Z
+date: 2014-01-09 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: nokogiri
@@ -168,6 +168,7 @@ files:
 - spec/fixtures/malformed_href.response
 - spec/fixtures/markupvalidator_faqs.response
 - spec/fixtures/nonhttp.response
+- spec/fixtures/opengraph.response
 - spec/fixtures/pagerankalert.com.response
 - spec/fixtures/protocol_relative.response
 - spec/fixtures/relative_links.response
@@ -206,7 +207,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
       version: '0'
 requirements: []
 rubyforge_project:
-rubygems_version: 2.1.3
+rubygems_version: 2.0.6
 signing_key:
 specification_version: 4
 summary: MetaInspector is a ruby gem for web scraping purposes, that returns a hash