RubyGems - metainspector - Versions diffs - 1.17.3 → 2.0.0 - Mend

metainspector 1.17.3 → 2.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (15)

checksums.yaml +4 -4
data/README.md +123 -52
data/lib/meta_inspector.rb +0 -1
data/lib/meta_inspector/document.rb +5 -4
data/lib/meta_inspector/parser.rb +38 -69
data/lib/meta_inspector/version.rb +1 -1
data/meta_inspector.gemspec +0 -1
data/spec/document_spec.rb +7 -10
data/spec/fixtures/meta_tags.response +54 -0
data/spec/fixtures/youtube.response +1 -1
data/spec/parser_spec.rb +88 -202
data/spec/spec_helper.rb +1 -1
metadata +3 -18
data/lib/meta_inspector/meta_tags_dynamic_match.rb +0 -18
data/spec/fixtures/opengraph.response +0 -52

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz: 5c94b07066d8b0080029d5e93808a4388b716575
-  data.tar.gz: 33135740e3e740e21c4ccc011a44a3466c73a926
+  metadata.gz: f4afd755a0fdc53abb2c3af992bea56021245ec2
+  data.tar.gz: fddcbbb1b151558bf245c9630585553149a93798
 SHA512:
-  metadata.gz: 4c3ffda64efceaaaa9631178751df36a0316d17176ce0788ce6256fbd96ac5347f8fc9c6b7019c3942bcd746a2f8868f78fdaa7d7aab83d4314a6a01dd9522dd
-  data.tar.gz: 38c3fd01c8c156c82985c6b6972cec1002d90135f4381ab53f9e69e4dd6595bc2b22c8a530c2035703aabc318dfb9f37abf62396502dae12b994183aea091db2
+  metadata.gz: e3c4cf8afc4de72cf0432cb2c051f3fc38049f1e1e648033c151644d5a3b38a211f829d88cff33f7813ff6d1d18fb945aa6016df9dea286a7b1d8eed37bbc623
+  data.tar.gz: d18e6647d1187ff115a734d5d9ebca9ab6f87eab4b12b91f0541df6db65cb2c2eb5e31b1be298c7540862fdfbdc3b84ca79eb5769401abfd0de2dac2a701d744

data/README.md CHANGED

@@ -1,6 +1,8 @@
 # MetaInspector [![Build Status](https://secure.travis-ci.org/jaimeiniesta/metainspector.png)](http://travis-ci.org/jaimeiniesta/metainspector) [![Dependency Status](https://gemnasium.com/jaimeiniesta/metainspector.png)](https://gemnasium.com/jaimeiniesta/metainspector)
-MetaInspector is a gem for web scraping purposes. You give it an URL, and it lets you easily get its title, links, images, charset, description, keywords, meta tags...
+MetaInspector is a gem for web scraping purposes.
+You give it an URL, and it lets you easily get its title, links, images, charset, description, keywords, meta tags...
 ## See it in action!
@@ -36,36 +38,124 @@ You can also include the html which will be used as the document to scrape:
 Then you can see the scraped data like this:
-    page.url                # URL of the page
-    page.scheme             # Scheme of the page (http, https)
-    page.host               # Hostname of the page (like, sitevalidator.com, without the scheme)
-    page.root_url           # Root url (scheme + host, like http://sitevalidator.com/)
-    page.title              # title of the page, as string
-    page.links              # array of strings, with every link found on the page as an absolute URL
-    page.internal_links     # array of strings, with every internal link found on the page as an absolute URL
-    page.external_links     # array of strings, with every external link found on the page as an absolute URL
-    page.meta_description   # meta description, as string
-    page.description        # returns the meta description, or the first long paragraph if no meta description is found
-    page.meta_keywords      # meta keywords, as string
-    page.image              # Most relevant image, if defined with og:image
-    page.images             # array of strings, with every img found on the page as an absolute URL
-    page.feed               # Get rss or atom links in meta data fields as array
-    page.charset            # UTF-8
-    page.content_type       # content-type returned by the server when the url was requested
-MetaInspector uses dynamic methods for meta_tag discovery, so all these will work, and will be converted to a search of a meta tag by the corresponding name, and return its content attribute
-    page.meta_description   # <meta name="description" content="..." />
-    page.meta_keywords      # <meta name="keywords" content="..." />
-    page.meta_robots        # <meta name="robots" content="..." />
-    page.meta_generator     # <meta name="generator" content="..." />
-It will also work for the meta tags of the form <meta http-equiv="name" ... />, like the following:
-    page.meta_content_language  # <meta http-equiv="content-language" content="..." />
-    page.meta_Content_Type      # <meta http-equiv="Content-Type" content="..." />
-Please notice that MetaInspector is case sensitive, so `page.meta_Content_Type` is not the same as `page.meta_content_type`
+    page.url                 # URL of the page
+    page.scheme              # Scheme of the page (http, https)
+    page.host                # Hostname of the page (like, sitevalidator.com, without the scheme)
+    page.root_url            # Root url (scheme + host, like http://sitevalidator.com/)
+    page.title               # title of the page, as string
+    page.links               # array of strings, with every link found on the page as an absolute URL
+    page.internal_links      # array of strings, with every internal link found on the page as an absolute URL
+    page.external_links      # array of strings, with every external link found on the page as an absolute URL
+    page.meta['keywords']    # meta keywords, as string
+    page.meta['description'] # meta description, as string
+    page.description         # returns the meta description, or the first long paragraph if no meta description is found
+    page.image               # Most relevant image, if defined with the og:image meta tag
+    page.images              # array of strings, with every img found on the page as an absolute URL
+    page.feed                # Get rss or atom links in meta data fields as array
+    page.charset             # UTF-8
+    page.content_type        # content-type returned by the server when the url was requested
+## Meta tags
+When it comes to meta tags, you have several options:
+    page.meta_tags          # Gives you all the meta tags by type:
+                            # (meta name, meta http-equiv, meta property and meta charset)
+                            # As meta tags can be repeated (in the case of 'og:image', for example),
+                            # the values returned will be arrays
+                            #
+                            # For example:
+                            #
+                            # {
+                                'name' => {
+                                            'keywords'       => ['one, two, three'],
+                                            'description'    => ['the description'],
+                                            'author'         => ['Joe Sample'],
+                                            'robots'         => ['index,follow'],
+                                            'revisit'        => ['15 days'],
+                                            'dc.date.issued' => ['2011-09-15']
+                                           },
+                                'http-equiv' => {
+                                                  'content-type'        => ['text/html; charset=UTF-8'],
+                                                  'content-style-type'  => ['text/css']
+                                                },
+                                'property' => {
+                                                'og:title'        => ['An OG title'],
+                                                'og:type'         => ['website'],
+                                                'og:url'          => ['http://example.com/meta-tags'],
+                                                'og:image'        => ['http://example.com/rock.jpg',
+                                                                      'http://example.com/rock2.jpg',
+                                                                      'http://example.com/rock3.jpg'],
+                                                'og:image:width'  => ['300'],
+                                                'og:image:height' => ['300', '1000']
+                                              },
+                                'charset' => ['UTF-8']
+                              }
+As this method returns a hash, you can also take only the key that you need, like in:
+    page.meta_tags['property']  # Returns:
+                                # {
+                                #   'og:title'        => ['An OG title'],
+                                #   'og:type'         => ['website'],
+                                #   'og:url'          => ['http://example.com/meta-tags'],
+                                #   'og:image'        => ['http://example.com/rock.jpg',
+                                #                         'http://example.com/rock2.jpg',
+                                #                         'http://example.com/rock3.jpg'],
+                                #   'og:image:width'  => ['300'],
+                                #   'og:image:height' => ['300', '1000']
+                                # }
+In most cases you will only be interested in the first occurrence of a meta tag, so you can
+use the singular form of that method:
+    page.meta_tag['name']  # Returns:
+                           # {
+                           #   'keywords'       => 'one, two, three',
+                           #   'description'    => 'the description',
+                           #   'author'         => 'Joe Sample',
+                           #   'robots'         => 'index,follow',
+                           #   'revisit'        => '15 days',
+                           #   'dc.date.issued' => '2011-09-15'
+                           #  }
+Or, as this is also a hash:
+    page.meta_tag['name']['keywords']    # Returns 'one, two, three'
+And finally, you can use the shorter `meta` method that will merge the different keys so you have
+a simpler hash:
+    page.meta       # Returns:
+                    #
+                    # {
+                    #     'keywords'            => 'one, two, three',
+                    #     'description'         => 'the description',
+                    #     'author'              => 'Joe Sample',
+                    #     'robots'              => 'index,follow',
+                    #     'revisit'             => '15 days',
+                    #     'dc.date.issued'      => '2011-09-15',
+                    #     'content-type'        => 'text/html; charset=UTF-8',
+                    #     'content-style-type'  => 'text/css',
+                    #     'og:title'            => 'An OG title',
+                    #     'og:type'             => 'website',
+                    #     'og:url'              => 'http://example.com/meta-tags',
+                    #     'og:image'            => 'http://example.com/rock.jpg',
+                    #     'og:image:width'      => '300',
+                    #     'og:image:height'     => '300',
+                    #     'charset'             => 'UTF-8'
+                    #   }
+This way, you can get most meta tags just like that:
+    page.meta['author']     # Returns "Joe Sample"
+Please be aware that all keys are converted to downcase, so it's `'dc.date.issued'` and not `'DC.date.issued'`.
+## Other representations
 You can also access most of the scraped data as a hash:
@@ -80,25 +170,6 @@ And the full scraped document is accessible from:
     page.parsed  # Nokogiri doc that you can use it to get any element from the page
-## Opengraph and Twitter card meta tags
-Twitter cards & Open graph tags make it possible for you to attach media experiences to Tweets & Facebook posts. Nowadays most of the content creators add these meta tags to headers to quickly identify content on the page. Sometimes these tags could be nested as well. For example when a site wants to provide information about primary image used on a page it could use
-    <meta name="og:image" content="http://www.somedomain.com/assets/images/abc.jpeg">
-    <meta name="og:image:width" content="200">
-    <meta name="twitter:image" value="http://www.somedomain.com/assets/images/abc.jpeg">
-    <meta property="twitter:image:width" value="200">
-Also many sites use name & property, content & value attributes interchangeably. Using MetaInspector accessing this information is as easy as -
-    page.meta_og_image
-    page.meta_twitter_image_width
-Note that MetaInspector gives priority to content over value. In other words if there is a tag of the form
-    <meta property="og:something" value="100" content="real value">
-    page.meta_og_something #=> "real value"
 ## Options
 ### Timeout
@@ -173,10 +244,10 @@ You can find some sample scripts on the samples folder, including a basic scrapi
     >> page.title
     => "MarkupValidator :: site-wide markup validation tool"
-    >> page.meta_description
+    >> page.meta['description']
     => "Site-wide markup validation tool. Validate the markup of your whole site with just one click."
-    >> page.meta_keywords
+    >> page.meta['keywords']
     => "html, markup, validation, validator, tool, w3c, development, standards, free"
     >> page.links.size
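
Note on the README change above: the `page.meta_*` ghost methods are replaced by lookups in the `page.meta` hash. The sketch below shows one way to translate an old ghost-method name into the key that 2.0.0 would use. `legacy_meta_key` and `OG_SPECIAL_KEYS` are hypothetical names, not part of MetaInspector. The mapping is modelled on the `og_key` method that this release removes from parser.rb, and it assumes that downcasing the name first is acceptable.

```ruby
# Hypothetical migration helper; not part of MetaInspector.
# Open Graph keys that contain an underscore, taken from the removed og_key method.
OG_SPECIAL_KEYS = {
  'og_site_name'        => 'og:site_name',
  'og_image_secure_url' => 'og:image:secure_url',
  'og_video_secure_url' => 'og:video:secure_url',
  'og_audio_secure_url' => 'og:audio:secure_url'
}.freeze

# Turns a 1.x ghost-method name (String or Symbol) into a candidate key
# for the 2.0.0 page.meta hash.
def legacy_meta_key(method_name)
  key = method_name.to_s.sub(/\Ameta_/, '').downcase
  return OG_SPECIAL_KEYS[key] if OG_SPECIAL_KEYS.key?(key)
  # og_* and twitter_* names use ':' as separator in the real tag
  return key.tr('_', ':') if key.start_with?('og_', 'twitter_')

  key
end
```

With a 2.0.0 document this could be used as `page.meta[legacy_meta_key(:meta_og_image)]`. Two caveats that follow from the parser.rb diff: the new `meta_tags_by` reads only the `content` attribute, so tags that carried their data in `value` are no longer picked up, and hyphenated names keep their hyphen (the updated document spec shows `csrf-param`), which this helper does not try to guess.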

data/lib/meta_inspector.rb CHANGED

@@ -5,7 +5,6 @@ require File.expand_path(File.join(File.dirname(__FILE__), 'meta_inspector/excep
 require File.expand_path(File.join(File.dirname(__FILE__), 'meta_inspector/exception_log'))
 require File.expand_path(File.join(File.dirname(__FILE__), 'meta_inspector/request'))
 require File.expand_path(File.join(File.dirname(__FILE__), 'meta_inspector/url'))
-require File.expand_path(File.join(File.dirname(__FILE__), 'meta_inspector/meta_tags_dynamic_match'))
 require File.expand_path(File.join(File.dirname(__FILE__), 'meta_inspector/parser'))
 require File.expand_path(File.join(File.dirname(__FILE__), 'meta_inspector/document'))
 require File.expand_path(File.join(File.dirname(__FILE__), 'meta_inspector/deprecations'))

data/lib/meta_inspector/document.rb CHANGED

@@ -39,8 +39,8 @@ module MetaInspector
     extend Forwardable
     def_delegators :@url,     :url, :scheme, :host, :root_url
     def_delegators :@request, :content_type
-    def_delegators :@parser,  :parsed, :method_missing, :respond_to?, :title, :description, :links, :internal_links, :external_links,
-                              :images, :image, :feed, :charset
+    def_delegators :@parser,  :parsed, :respond_to?, :title, :description, :links, :internal_links, :external_links,
+                              :images, :image, :feed, :charset, :meta_tags, :meta_tag, :meta
     # Returns all document data as a nested Hash
     def to_hash
@@ -53,8 +53,9 @@ module MetaInspector
         'images' => images,
         'charset' => charset,
         'feed' => feed,
-        'content_type' => content_type
-      }.merge @parser.to_hash
+        'content_type' => content_type,
+        'meta_tags' => meta_tags
+      }
     end
     # Returns the contents of the document as a string

data/lib/meta_inspector/parser.rb CHANGED

@@ -1,7 +1,6 @@
 # -*- encoding: utf-8 -*-
 require 'nokogiri'
-require 'hashie/rash'
 module MetaInspector
   # Parses the document with Nokogiri
@@ -12,13 +11,29 @@ module MetaInspector
       options = defaults.merge(options)
       @document       = document
-      @data           = Hashie::Rash.new
       @exception_log  = options[:exception_log]
     end
     extend Forwardable
     def_delegators :@document, :url, :scheme, :host
+    def meta_tags
+      {
+        'name'        => meta_tags_by('name'),
+        'http-equiv'  => meta_tags_by('http-equiv'),
+        'property'    => meta_tags_by('property'),
+        'charset'     => [charset_from_meta_charset]
+      }
+    end
+    def meta_tag
+      convert_each_array_to_first_element_on meta_tags
+    end
+    def meta
+      meta_tag['name'].merge(meta_tag['http-equiv']).merge(meta_tag['property']).merge({'charset' => meta_tag['charset']})
+    end
     # Returns the whole parsed document
     def parsed
       @parsed ||= Nokogiri::HTML(@document.to_s)
@@ -27,11 +42,6 @@ module MetaInspector
         @exception_log << e
     end
-    def to_hash
-      scrape_meta_data
-      @data.to_hash
-    end
     # Returns the parsed document title, from the content of the <title> tag.
     # This is not the same as the meta_title tag
     def title
@@ -41,7 +51,7 @@ module MetaInspector
     # A description getter that first checks for a meta description and if not present will
     # guess by looking at the first paragraph with more than 120 characters
     def description
-      meta_description || secondary_description
+      meta['description'] || secondary_description
     end
     # Links found on the page, as absolute URLs
@@ -67,8 +77,10 @@ module MetaInspector
     # Returns the parsed image from Facebook's open graph property tags
     # Most all major websites now define this property and is usually very relevant
     # See doc at http://developers.facebook.com/docs/opengraph/
+    # If none found, tries with Twitter image
+    # TODO: if not found, try with images.first
     def image
-      meta_og_image || meta_twitter_image
+      meta['og:image'] || meta['twitter:image']
     end
     # Returns the parsed document meta rss link
@@ -83,81 +95,38 @@ module MetaInspector
       @charset ||= (charset_from_meta_charset || charset_from_meta_content_type)
     end
-    def respond_to?(method_name, include_private = false)
-      MetaInspector::MetaTagsDynamicMatch.new(method_name).match? || super
-    end
     private
     def defaults
       { exception_log: MetaInspector::ExceptionLog.new }
     end
-    # Scrapers for all meta_tags in the form of "meta_name" are automatically defined. This has been tested for
-    # meta name: keywords, description, robots, generator
-    # meta http-equiv: content-language, Content-Type
-    #
-    # It will first try with meta name="..." and if nothing found,
-    # with meta http-equiv="...", substituting "_" by "-"
-    def method_missing(method_name)
-      meta_tags_method = MetaInspector::MetaTagsDynamicMatch.new(method_name)
+    def meta_tags_by(attribute)
+      hash = {}
+      parsed.css("meta[@#{attribute}]").map do |tag|
+        name    = tag.attributes[attribute].value.downcase rescue nil
+        content = tag.attributes['content'].value rescue nil
-      if meta_tags_method.match?
-        key = meta_tags_method.meta_tag
-        #special treatment for opengraph (og:) and twitter card (twitter:) tags
-        if key =~ /^og_(.*)/
-          key = og_key(key)
-        elsif key =~ /^twitter_(.*)/
-          key.gsub!("_",":")
+        if name && content
+          hash[name] ||= []
+          hash[name] << content
         end
-        scrape_meta_data
-        @data.meta.name && (@data.meta.name[key.downcase]) || (@data.meta.property && @data.meta.property[key.downcase])
-      else
-        super
-      end
-    end
-    # Not all OG keys can be directly translated to meta tags method names replacing _ by : as they include the _ in the name
-    # This is going to be deprecated and replaced soon by a simpler, clearer method, like page.meta['og:site_name']
-    def og_key(key)
-      case key
-      when "og_site_name"
-        "og:site_name"
-      when "og_image_secure_url"
-        "og:image:secure_url"
-      when "og_video_secure_url"
-        "og:video:secure_url"
-      when "og_audio_secure_url"
-        "og:audio:secure_url"
-      else
-        key.gsub("_", ":")
       end
+      hash
     end
-    # Scrapes all meta tags found
-    def scrape_meta_data
-      unless @data.meta
-        @data.meta!.name!
-        @data.meta!.property!
-        parsed_search("//meta").each do |element|
-          get_meta_name_or_property(element)
+    def convert_each_array_to_first_element_on(hash)
+      hash.each_pair do |k, v|
+        hash[k] = if v.is_a?(Hash)
+          convert_each_array_to_first_element_on(v)
+        elsif v.is_a?(Array)
+          v.first
+        else
+          v
         end
       end
     end
-    # Store meta tag value, looking at meta name or meta property
-    def get_meta_name_or_property(element)
-      name_or_property = element.attributes["name"] ? "name" : (element.attributes["property"] ? "property" : nil)
-      content_or_value = element.attributes["content"] ? "content" : (element.attributes["value"] ? "value" : nil)
-      if !name_or_property.nil? && !content_or_value.nil?
-        @data.meta.name[element.attributes[name_or_property].value.downcase] = element.attributes[content_or_value].value
-      end
-    end
     # Look for the first <p> block with 120 characters or more
     def secondary_description
       first_long_paragraph = parsed_search('//p[string-length() >= 120]').first
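
Note on the parser.rb change above: `meta_tags`, `meta_tag` and `meta` form a chain in which each step simplifies the previous result. The sketch below reproduces that reduction on a plain hash so it can be run without MetaInspector or Nokogiri. `SAMPLE_META_TAGS`, `first_values` and `flatten_meta` are hypothetical names, and `first_values` is a non-mutating counterpart of `convert_each_array_to_first_element_on`, not the gem's own code.

```ruby
# Standalone sketch: sample data shaped like the return value of Parser#meta_tags.
SAMPLE_META_TAGS = {
  'name'       => { 'keywords'    => ['one, two, three'],
                    'description' => ['the description'] },
  'http-equiv' => { 'content-type' => ['text/html; charset=UTF-8'] },
  'property'   => { 'og:image'        => ['http://example.com/rock.jpg',
                                          'http://example.com/rock2.jpg'],
                    'og:image:height' => ['300', '1000'] },
  'charset'    => ['UTF-8']
}.freeze

# Hypothetical helper: replaces every array by its first element,
# recursing into nested hashes and returning a new structure.
def first_values(value)
  case value
  when Hash  then value.each_with_object({}) { |(k, v), out| out[k] = first_values(v) }
  when Array then value.first
  else value
  end
end

# Hypothetical helper mirroring the merge order used by Parser#meta:
# name, then http-equiv, then property, then charset.
def flatten_meta(meta_tags)
  singular = first_values(meta_tags)
  singular['name']
    .merge(singular['http-equiv'])
    .merge(singular['property'])
    .merge('charset' => singular['charset'])
end

META = flatten_meta(SAMPLE_META_TAGS)
```

Because of the merge order, a key present in more than one group resolves to the value merged last, so a `property` entry would win over a `name` entry of the same name. The helper in the diff rewrites the hash it receives in place, which appears safe there because `meta_tags` builds a fresh hash on every call.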

data/lib/meta_inspector/version.rb CHANGED

@@ -1,5 +1,5 @@
 # -*- encoding: utf-8 -*-
 module MetaInspector
-  VERSION = "1.17.3"
+  VERSION = "2.0.0"
 end

data/meta_inspector.gemspec CHANGED

@@ -16,7 +16,6 @@ Gem::Specification.new do |gem|
   gem.version       = MetaInspector::VERSION
   gem.add_dependency 'nokogiri', '~> 1.6'
-  gem.add_dependency 'rash', '~> 0.4.0'
   gem.add_dependency 'open_uri_redirections', '~> 0.1.4'
   gem.add_dependency 'addressable', '~> 2.3.5'

data/spec/document_spec.rb CHANGED

@@ -46,16 +46,13 @@ describe MetaInspector::Document do
                             "charset"         => "utf-8",
                             "feed"            => "http://feeds.feedburner.com/PageRankAlert",
                             "content_type"    =>"text/html",
-                            "meta"            => {
-                                                    "name" => {
-                                                                "description"=> "Track your PageRank(TM) changes and receive alerts by email",
-                                                                "keywords" => "pagerank, seo, optimization, google",
-                                                                "robots" => "all,follow",
-                                                                "csrf_param" => "authenticity_token",
-                                                                "csrf_token" => "iW1/w+R8zrtDkhOlivkLZ793BN04Kr3X/pS+ixObHsE="
-                                                              },
-                                                    "property"=>{}
-                                                 }
+                            "meta_tags"       => { "name" => { "description" => ["Track your PageRank(TM) changes and receive alerts by email"],
+                                                               "keywords"    => ["pagerank, seo, optimization, google"], "robots"=>["all,follow"],
+                                                               "csrf-param"  => ["authenticity_token"],
+                                                               "csrf-token"  => ["iW1/w+R8zrtDkhOlivkLZ793BN04Kr3X/pS+ixObHsE="] },
+                                                   "http-equiv" => {},
+                                                   "property"   => {},
+                                                   "charset"    => ["utf-8"] }
                          }
   end

data/spec/fixtures/meta_tags.response ADDED

@@ -0,0 +1,54 @@
+HTTP/1.1 200 OK
+Age: 13
+Cache-Control: max-age=120
+Content-Type: text/html
+Date: Mon, 06 Jan 2014 12:47:42 GMT
+Expires: Mon, 06 Jan 2014 12:49:28 GMT
+Server: Apache/2.2.14 (Ubuntu)
+Vary: Accept-Encoding
+Via: 1.1 varnish
+X-Powered-By: PHP/5.3.2-1ubuntu4.22
+X-Varnish: 1188792404 1188790413
+Content-Length: 40571
+Connection: keep-alive
+<!DOCTYPE html>
+<html xmlns="http://www.w3.org/1999/xhtml" xmlns:fb="http://www.facebook.com/2008/fbml">
+  <head>
+    <!-- meta name examples -->
+    <meta name="keywords"       content="one, two, three" />
+    <meta name="description"    content="the description" />
+    <meta name="author"         content="Joe Sample" />
+    <meta name="robots"         content="index,follow" />
+    <meta name="revisit"        content="15 days" />
+    <meta name="DC.date.issued" content="2011-09-15">
+    <!-- meta http-equiv examples -->
+    <meta http-equiv="Content-Type"       content="text/html; charset=UTF-8">
+    <meta http-equiv="Content-Style-Type" content="text/css" />
+    <!-- meta charset examples -->
+    <meta charset="UTF-8" />
+    <!-- meta property examples -->
+    <meta property="og:title" content="An OG title" />
+    <meta property="og:type" content="website" />
+    <meta property="og:url" content="http://example.com/meta-tags" />
+    <!-- meta properties can be repeated, like in this example from http://open.me -->
+    <meta property="og:image" content="http://example.com/rock.jpg" />
+    <meta property="og:image:width" content="300" />
+    <meta property="og:image:height" content="300" />
+    <meta property="og:image" content="http://example.com/rock2.jpg" />
+    <meta property="og:image" content="http://example.com/rock3.jpg" />
+    <meta property="og:image:height" content="1000" />
+  </head>
+  <body>
+    <p>A sample page with many types of meta tags</p>
+  </body>
+</html>
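
Note on the fixture above: it repeats `og:image` and `og:image:height`, which is why `meta_tags` returns arrays. The sketch below imitates the grouping done by `meta_tags_by` (downcase the name, append each content value in document order) on plain pairs instead of Nokogiri nodes. `TAG_PAIRS` and `group_tags` are hypothetical names, and the pairs are copied by hand from the fixture rather than parsed from it.

```ruby
# Each pair stands in for one <meta ... content="..."> tag, in document order.
TAG_PAIRS = [
  ['og:image',        'http://example.com/rock.jpg'],
  ['og:image:width',  '300'],
  ['og:image:height', '300'],
  ['og:image',        'http://example.com/rock2.jpg'],
  ['og:image',        'http://example.com/rock3.jpg'],
  ['og:image:height', '1000'],
  # A meta name tag in the fixture; included here only to show the downcasing.
  ['DC.date.issued',  '2011-09-15']
].freeze

# Hypothetical helper: groups content values under the downcased tag name,
# skipping pairs where either part is missing.
def group_tags(pairs)
  pairs.each_with_object({}) do |(name, content), grouped|
    next if name.nil? || content.nil?

    (grouped[name.downcase] ||= []) << content
  end
end

GROUPED = group_tags(TAG_PAIRS)
```

Values keep their document order, so taking the first element of each array (as `meta_tag` and `meta` do) yields the first occurrence in the page.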

data/spec/fixtures/youtube.response CHANGED

@@ -38,7 +38,7 @@ var yt = yt || {};yt.timing = yt.timing || {};yt.timing.tick = function(label, o
       <meta name="title" content="4. Far Cry 3 - Ubisoft E3 2011 Press Conference HD 1080p">
-      <meta name="description" content="">
+      <meta name="description" content="This is Youtube">
       <meta name="keywords" content="FARCRY, Ubisoft, E3, 2011, Press, Conference, HD, 1080p">

data/spec/parser_spec.rb CHANGED

@@ -21,7 +21,6 @@ describe MetaInspector::Parser do
       it "should find the og image" do
         @m = MetaInspector::Parser.new(doc 'http://www.theonion.com/articles/apple-claims-new-iphone-only-visible-to-most-loyal,2772/')
         @m.image.should == "http://o.onionstatic.com/images/articles/article/2772/Apple-Claims-600w-R_jpg_130x110_q85.jpg"
-        @m.meta_og_image.should == "http://o.onionstatic.com/images/articles/article/2772/Apple-Claims-600w-R_jpg_130x110_q85.jpg"
       end
       it "should find image on youtube" do
@@ -71,16 +70,16 @@ describe MetaInspector::Parser do
         @m.feed.should == nil
       end
     end
+  end
-    describe "get description" do
-      it "should find description on youtube" do
-        MetaInspector::Parser.new(doc 'http://www.youtube.com/watch?v=iaGSSrp49uc').description.should == ""
-      end
+  describe '#description' do
+    it "should find description from meta description" do
+      page = MetaInspector::Parser.new(doc 'http://www.youtube.com/watch?v=iaGSSrp49uc')
+      page.description.should == "This is Youtube"
     end
-  end
-  describe 'Page with missing meta description' do
-    it "should find a secondary description" do
+    it "should find a secondary description if no meta description" do
       @m = MetaInspector::Parser.new(doc 'http://theonion-no-description.com')
       @m.description.should == "SAN FRANCISCO—In a move expected to revolutionize the mobile device industry, Apple launched its fastest and most powerful iPhone to date Tuesday, an innovative new model that can only be seen by the company's hippest and most dedicated customers. This is secondary text picked up because of a missing meta description."
     end
@@ -267,188 +266,87 @@ describe MetaInspector::Parser do
     end
   end
-  describe 'respond_to? for meta tags ghost methods' do
-    before(:each) do
-      @m = MetaInspector.new('http://pagerankalert.com')
-    end
-    it "should return true for meta tags as string" do
-      @m.respond_to?("meta_robots").should be_true
-    end
-    it "should return true for meta tags as symbols" do
-      @m.respond_to?(:meta_robots).should be_true
-    end
-    it "should return true for meta_twitter_site as string" do
-      @m = MetaInspector.new('http://www.youtube.com/watch?v=iaGSSrp49uc')
-      @m.respond_to?("meta_twitter_site").should be_true
-    end
-    it "should return true for meta_twitter_site as symbol" do
-      @m = MetaInspector.new('http://www.youtube.com/watch?v=iaGSSrp49uc')
-      @m.respond_to?(:meta_twitter_player_width).should be_true
-    end
-  end
-  describe 'respond_to? for not implemented methods' do
-    before(:each) do
-      @m = MetaInspector.new('http://pagerankalert.com')
-    end
-    it "should return false when method name passed as string" do
-      @m.respond_to?("method_not_implemented").should be_false
-    end
-    it "should return false when method name passed as symbols" do
-      @m = MetaInspector.new('http://www.youtube.com/watch?v=iaGSSrp49uc')
-      @m.respond_to?(:method_not_implemented).should be_false
-    end
-  end
-  describe 'Getting meta tags by ghost methods' do
-    before(:each) do
-      @m = MetaInspector::Parser.new(doc 'http://pagerankalert.com')
-    end
-    it "should get the robots meta tag" do
-      @m.meta_robots.should == 'all,follow'
-    end
-    it "should get the robots meta tag" do
-      @m.meta_RoBoTs.should == 'all,follow'
-    end
-    it "should get the description meta tag" do
-      @m.meta_description.should == 'Track your PageRank(TM) changes and receive alerts by email'
-    end
-    it "should get the keywords meta tag" do
-      @m.meta_keywords.should == "pagerank, seo, optimization, google"
-    end
-    it "should get the content-language meta tag" do
-      pending "mocks"
-      @m.meta_content_language.should == "en"
-    end
-    it "should get the Csrf_pAram meta tag" do
-      @m.meta_Csrf_pAram.should == "authenticity_token"
-    end
-    it "should return nil for nonfound meta_tags" do
-      @m.meta_lollypop.should == nil
-    end
-    it "should get the generator meta tag" do
-      @m = MetaInspector::Parser.new(doc 'http://www.inkthemes.com/')
-      @m.meta_generator.should == 'WordPress 3.4.2'
-    end
-    it "should find a meta_twitter_site" do
-      @m = MetaInspector::Parser.new(doc 'http://www.youtube.com/watch?v=iaGSSrp49uc')
-      @m.meta_twitter_site.should == "@youtube"
-    end
-    it "should find a meta_twitter_player_width" do
-      @m = MetaInspector::Parser.new(doc 'http://www.youtube.com/watch?v=iaGSSrp49uc')
-      @m.meta_twitter_player_width.should == "1920"
-    end
-    it "should not find a meta_twitter_dummy" do
-      @m = MetaInspector::Parser.new(doc 'http://www.youtube.com/watch?v=iaGSSrp49uc')
-      @m.meta_twitter_dummy.should == nil
-    end
-    describe "opengraph meta tags" do
-      before(:each) do
-        @m = MetaInspector::Parser.new(doc 'http://example.com/opengraph')
-      end
-      it "should find a meta og:title" do
-        @m.meta_og_title.should == "An OG title"
-      end
-      it "should find a meta og:type" do
-        @m.meta_og_type.should == "website"
-      end
-      it "should find a meta og:url" do
-        @m.meta_og_url.should == "http://example.com/opengraph"
-      end
-      it "should find a meta og:description" do
-        @m.meta_og_description.should == "Sean Connery found fame and fortune"
-      end
-      it "should find a meta og:determiner" do
-        @m.meta_og_determiner.should == "the"
-      end
-      it "should find a meta og:locale" do
-        @m.meta_og_locale.should == "en_GB"
-      end
-      it "should find a meta og:locale:alternate" do
-        @m.meta_og_locale_alternate.should == "fr_FR"
-      end
-      it "should find a meta og:site_name" do
-        @m.meta_og_site_name.should == "IMDb"
-      end
-      it "should find a meta og:image" do
-        @m.meta_og_image.should == "http://example.com/ogp.jpg"
-      end
-      it "should find a meta og:image:secure_url" do
-        @m.meta_og_image_secure_url.should == "https://secure.example.com/ogp.jpg"
-      end
-      it "should find a meta og:image:type" do
-        @m.meta_og_image_type.should == "image/jpeg"
-      end
-      it "should find a meta og:image:width" do
-        @m.meta_og_image_width.should == "400"
-      end
-      it "should find a meta og:image:height" do
-        @m.meta_og_image_height.should == "300"
-      end
-      it "should find a meta og:video" do
-        @m.meta_og_video.should == "http://example.com/movie.swf"
-      end
-      it "should find a meta og:video:secure_url" do
-        @m.meta_og_video_secure_url.should == "https://secure.example.com/movie.swf"
-      end
-      it "should find a meta og:video:type" do
-        @m.meta_og_video_type.should == "application/x-shockwave-flash"
-      end
-      it "should find a meta og:video:width" do
-        @m.meta_og_video_width.should == "400"
-      end
-      it "should find a meta og:video:height" do
-        @m.meta_og_video_height.should == "300"
-      end
-      it "should find a meta og:audio" do
-        @m.meta_og_audio.should == "http://example.com/sound.mp3"
-      end
-      it "should find a meta og:video:secure_url" do
-        @m.meta_og_audio_secure_url.should == "https://secure.example.com/sound.mp3"
-      end
-      it "should find a meta og:audio:type" do
-        @m.meta_og_audio_type.should == "audio/mpeg"
-      end
+  describe 'Getting meta tags' do
+    let(:page) { MetaInspector::Parser.new(doc 'http://example.com/meta-tags') }
+    it "#meta_tags" do
+      page.meta_tags.should == {
+                                  'name' => {
+                                              'keywords'       => ['one, two, three'],
+                                              'description'    => ['the description'],
+                                              'author'         => ['Joe Sample'],
+                                              'robots'         => ['index,follow'],
+                                              'revisit'        => ['15 days'],
+                                              'dc.date.issued' => ['2011-09-15']
+                                             },
+                                  'http-equiv' => {
+                                                    'content-type'        => ['text/html; charset=UTF-8'],
+                                                    'content-style-type'  => ['text/css']
+                                                  },
+                                  'property' => {
+                                                  'og:title'        => ['An OG title'],
+                                                  'og:type'         => ['website'],
+                                                  'og:url'          => ['http://example.com/meta-tags'],
+                                                  'og:image'        => ['http://example.com/rock.jpg',
+                                                                        'http://example.com/rock2.jpg',
+                                                                        'http://example.com/rock3.jpg'],
+                                                  'og:image:width'  => ['300'],
+                                                  'og:image:height' => ['300', '1000']
+                                                },
+                                  'charset' => ['UTF-8']
+                                }
+    end
+    it "#meta_tag" do
+      page.meta_tag.should == {
+                                  'name' => {
+                                              'keywords'       => 'one, two, three',
+                                              'description'    => 'the description',
+                                              'author'         => 'Joe Sample',
+                                              'robots'         => 'index,follow',
+                                              'revisit'        => '15 days',
+                                              'dc.date.issued' => '2011-09-15'
+                                             },
+                                  'http-equiv' => {
+                                                    'content-type'        => 'text/html; charset=UTF-8',
+                                                    'content-style-type'  => 'text/css'
+                                                  },
+                                  'property' => {
+                                                  'og:title'        => 'An OG title',
+                                                  'og:type'         => 'website',
+                                                  'og:url'          => 'http://example.com/meta-tags',
+                                                  'og:image'        => 'http://example.com/rock.jpg',
+                                                  'og:image:width'  => '300',
+                                                  'og:image:height' => '300'
+                                                },
+                                  'charset' => 'UTF-8'
+                                }
+    end
+    it "#meta" do
+      page.meta.should == {
+                            'keywords'            => 'one, two, three',
+                            'description'         => 'the description',
+                            'author'              => 'Joe Sample',
+                            'robots'              => 'index,follow',
+                            'revisit'             => '15 days',
+                            'dc.date.issued'      => '2011-09-15',
+                            'content-type'        => 'text/html; charset=UTF-8',
+                            'content-style-type'  => 'text/css',
+                            'og:title'            => 'An OG title',
+                            'og:type'             => 'website',
+                            'og:url'              => 'http://example.com/meta-tags',
+                            'og:image'            => 'http://example.com/rock.jpg',
+                            'og:image:width'      => '300',
+                            'og:image:height'     => '300',
+                            'charset'             => 'UTF-8'
+                          }
     end
   end
@@ -469,18 +367,6 @@ describe MetaInspector::Parser do
     end
   end
-  describe 'to_hash' do
-    it "should return a hash with all the values set" do
-      @m = MetaInspector::Parser.new(doc 'http://pagerankalert.com')
-      @m.to_hash.should == { "meta" => { "name" => { "description" => "Track your PageRank(TM) changes and receive alerts by email",
-                                                     "keywords"    => "pagerank, seo, optimization, google",
-                                                     "robots"      => "all,follow",
-                                                     "csrf_param"  => "authenticity_token",
-                                                     "csrf_token"  => "iW1/w+R8zrtDkhOlivkLZ793BN04Kr3X/pS+ixObHsE="},
-                                         "property"=>{}}}
-    end
-  end
   private
   def doc(url, options = {})
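
Note on the three hash shapes in the new spec: `#meta_tags` keeps every value as an array, `#meta_tag` holds only the first value of each array, and `#meta` is a single flat hash. The sketch below rebuilds those shapes from a hand-written hash to show how they relate. It is an illustration only, not the gem's implementation: the helper names `first_values` and `flatten_groups` are hypothetical, and the sample data is a trimmed copy of the values in the spec above.

```ruby
# Hand-built stand-in for the hash the "#meta_tags" spec expects.
# Nothing here calls MetaInspector; the data is copied from the spec.
meta_tags = {
  'name'     => { 'keywords' => ['one, two, three'],
                  'author'   => ['Joe Sample'] },
  'property' => { 'og:image'        => ['http://example.com/rock.jpg',
                                        'http://example.com/rock2.jpg'],
                  'og:image:height' => ['300', '1000'] },
  'charset'  => ['UTF-8']
}

# Hypothetical helper: produce the "#meta_tag" shape by keeping
# only the first value of every array, recursing into nested groups.
def first_values(tags)
  tags.each_with_object({}) do |(key, value), result|
    result[key] = value.is_a?(Hash) ? first_values(value) : value.first
  end
end

# Hypothetical helper: produce the "#meta" shape by merging the
# nested groups ('name', 'property', ...) into one flat hash.
def flatten_groups(tag)
  tag.each_with_object({}) do |(key, value), result|
    if value.is_a?(Hash)
      result.merge!(value)
    else
      result[key] = value
    end
  end
end

meta_tag = first_values(meta_tags)
meta     = flatten_groups(meta_tag)
```

With this data, a lookup that 1.17.3 wrote as a dynamic method such as `meta_og_image` would read from a hash by its original tag name, for example `meta['og:image']`.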

data/spec/spec_helper.rb CHANGED

@@ -41,7 +41,7 @@ FakeWeb.register_uri(:get, "http://charset002.com", :response => fixture_file("c
 FakeWeb.register_uri(:get, "http://www.inkthemes.com/", :response => fixture_file("wordpress_site.response"))
 FakeWeb.register_uri(:get, "http://pagerankalert.com/image.png", :body => "Image", :content_type => "image/png")
 FakeWeb.register_uri(:get, "http://pagerankalert.com/file.tar.gz", :body => "Image", :content_type => "application/x-gzip")
-FakeWeb.register_uri(:get, "http://example.com/opengraph", :response => fixture_file("opengraph.response"))
+FakeWeb.register_uri(:get, "http://example.com/meta-tags", :response => fixture_file("meta_tags.response"))
 # These examples are used to test relative links
 FakeWeb.register_uri(:get, "http://relative.com/", :response => fixture_file("relative_links.response"))

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: metainspector
 version: !ruby/object:Gem::Version
-  version: 1.17.3
+  version: 2.0.0
 platform: ruby
 authors:
 - Jaime Iniesta
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2014-01-09 00:00:00.000000000 Z
+date: 2014-01-20 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: nokogiri
@@ -24,20 +24,6 @@ dependencies:
     - - ~>
       - !ruby/object:Gem::Version
         version: '1.6'
-- !ruby/object:Gem::Dependency
-  name: rash
-  requirement: !ruby/object:Gem::Requirement
-    requirements:
-    - - ~>
-      - !ruby/object:Gem::Version
-        version: 0.4.0
-  type: :runtime
-  prerelease: false
-  version_requirements: !ruby/object:Gem::Requirement
-    requirements:
-    - - ~>
-      - !ruby/object:Gem::Version
-        version: 0.4.0
 - !ruby/object:Gem::Dependency
   name: open_uri_redirections
   requirement: !ruby/object:Gem::Requirement
@@ -142,7 +128,6 @@ files:
 - lib/meta_inspector/document.rb
 - lib/meta_inspector/exception_log.rb
 - lib/meta_inspector/exceptionable.rb
-- lib/meta_inspector/meta_tags_dynamic_match.rb
 - lib/meta_inspector/parser.rb
 - lib/meta_inspector/request.rb
 - lib/meta_inspector/url.rb
@@ -167,8 +152,8 @@ files:
 - spec/fixtures/iteh.at.response
 - spec/fixtures/malformed_href.response
 - spec/fixtures/markupvalidator_faqs.response
+- spec/fixtures/meta_tags.response
 - spec/fixtures/nonhttp.response
-- spec/fixtures/opengraph.response
 - spec/fixtures/pagerankalert.com.response
 - spec/fixtures/protocol_relative.response
 - spec/fixtures/relative_links.response

data/lib/meta_inspector/meta_tags_dynamic_match.rb DELETED

@@ -1,18 +0,0 @@
-module MetaInspector
-  # Encapsulates matching for method_missing and respond_to? for meta tags methods
-  class MetaTagsDynamicMatch
-    attr_reader :meta_tag
-    def initialize(method_name)
-      if method_name.to_s =~ /^meta_(.+)/
-        @meta_tag = $1
-      end
-    end
-    def match?
-      @meta_tag
-    end
-  end
-end

data/spec/fixtures/opengraph.response DELETED

@@ -1,52 +0,0 @@
-HTTP/1.1 200 OK
-Age: 13
-Cache-Control: max-age=120
-Content-Type: text/html
-Date: Mon, 06 Jan 2014 12:47:42 GMT
-Expires: Mon, 06 Jan 2014 12:49:28 GMT
-Server: Apache/2.2.14 (Ubuntu)
-Vary: Accept-Encoding
-Via: 1.1 varnish
-X-Powered-By: PHP/5.3.2-1ubuntu4.22
-X-Varnish: 1188792404 1188790413
-Content-Length: 40571
-Connection: keep-alive
-<!DOCTYPE html>
-<html xmlns="http://www.w3.org/1999/xhtml" xmlns:fb="http://www.facebook.com/2008/fbml">
-  <head>
-    <meta http-equiv="Content-type" content="text/html; charset=utf-8">
-    <!-- Basic OG Metadata -->
-    <meta property="og:title" content="An OG title" />
-    <meta property="og:type" content="website" />
-    <meta property="og:url" content="http://example.com/opengraph" />
-    <!-- Optional OG Metadata -->
-    <meta property="og:description" content="Sean Connery found fame and fortune" />
-    <meta property="og:determiner" content="the" />
-    <meta property="og:locale" content="en_GB" />
-    <meta property="og:locale:alternate" content="fr_FR" />
-    <meta property="og:site_name" content="IMDb" />
-    <!-- Structured OG Properties -->
-    <meta property="og:image" content="http://example.com/ogp.jpg" />
-    <meta property="og:image:secure_url" content="https://secure.example.com/ogp.jpg" />
-    <meta property="og:image:type" content="image/jpeg" />
-    <meta property="og:image:width" content="400" />
-    <meta property="og:image:height" content="300" />
-    <meta property="og:video" content="http://example.com/movie.swf" />
-    <meta property="og:video:secure_url" content="https://secure.example.com/movie.swf" />
-    <meta property="og:video:type" content="application/x-shockwave-flash" />
-    <meta property="og:video:width" content="400" />
-    <meta property="og:video:height" content="300" />
-    <meta property="og:audio" content="http://example.com/sound.mp3" />
-    <meta property="og:audio:secure_url" content="https://secure.example.com/sound.mp3" />
-    <meta property="og:audio:type" content="audio/mpeg" />
-  </head>
-  <body>
-    <p>A sample page with many Open Graph meta tags</p>
-  </body>
-</html>