RubyGems - hypermicrodata - Versions diffs - 0.1.0 - Mend

hypermicrodata 0.1.0

Files changed (32)

checksums.yaml +7 -0
data/.gitignore +17 -0
data/.travis.yml +8 -0
data/Gemfile +8 -0
data/LICENSE.txt +22 -0
data/README.md +100 -0
data/Rakefile +10 -0
data/bin/hypermicrodata.rb +25 -0
data/hypermicrodata.gemspec +28 -0
data/lib/hypermicrodata.rb +37 -0
data/lib/hypermicrodata/document.rb +27 -0
data/lib/hypermicrodata/extract.rb +22 -0
data/lib/hypermicrodata/item.rb +113 -0
data/lib/hypermicrodata/itemprop_parser.rb +114 -0
data/lib/hypermicrodata/link.rb +7 -0
data/lib/hypermicrodata/property.rb +27 -0
data/lib/hypermicrodata/rails/html_based_json_renderer.rb +35 -0
data/lib/hypermicrodata/serializer/base.rb +24 -0
data/lib/hypermicrodata/serializer/hal.rb +47 -0
data/lib/hypermicrodata/serializer/jsonld.rb +44 -0
data/lib/hypermicrodata/serializer/uber.rb +100 -0
data/lib/hypermicrodata/submit_button.rb +105 -0
data/lib/hypermicrodata/version.rb +3 -0
data/lib/uberous/uber.rb +104 -0
data/test/data/example.html +22 -0
data/test/data/example_itemref.html +16 -0
data/test/data/example_with_no_itemscope.html +22 -0
data/test/test_helper.rb +3 -0
data/test/test_itemref.rb +19 -0
data/test/test_json.rb +15 -0
data/test/test_parse.rb +36 -0
metadata +139 -0

checksums.yaml ADDED

@@ -0,0 +1,7 @@
+---
+SHA1:
+  metadata.gz: 6aa222d1d9f2fd94e7eabda85a111de9b63b17ba
+  data.tar.gz: 624be0e7d6c825c69ed224508f6286da2911cd8e
+SHA512:
+  metadata.gz: 094a2d0285349d16ff74308ce8756d5a2510f67c0ab564bd93112823c488bc0eeee030725feb87fa6b2f89e2ab4805407a2a536ded47262c3125f81ea1cd9901
+  data.tar.gz: 6753e62b18ea5b2e4b0550b5fcaaf2eeb5f3101efbb61c5af745901285a3d4621761e2b75a8201e670274350cab0a5a2f7bebac8570e4e87357a6e581626b700
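
Note: a consumer could check an archive against the digests recorded in this file along the following lines. This is a minimal sketch, not RubyGems' own verification code. The helper name `checksum_mismatches` and the in-memory stand-in data are assumptions made for illustration, and the demo values are quoted in the YAML so they always load as strings.

```ruby
require "digest"
require "yaml"

# Hypothetical helper (not part of RubyGems): returns the names of the
# algorithms whose recorded digest does not match the given bytes.
# `checksums` is the parsed checksums.yaml hash; `name` is e.g. "data.tar.gz".
def checksum_mismatches(checksums, name, bytes)
  digests = {
    "SHA1"   => Digest::SHA1.hexdigest(bytes),
    "SHA512" => Digest::SHA512.hexdigest(bytes)
  }
  digests.reject { |algo, hex| checksums.dig(algo, name) == hex }.keys
end

# Demo with in-memory stand-in data rather than the real archive.
bytes = "stand-in archive contents"
checksums = YAML.safe_load(<<~YAML)
  SHA1:
    data.tar.gz: "#{Digest::SHA1.hexdigest(bytes)}"
  SHA512:
    data.tar.gz: "#{Digest::SHA512.hexdigest(bytes)}"
YAML

p checksum_mismatches(checksums, "data.tar.gz", bytes)       # => []
p checksum_mismatches(checksums, "data.tar.gz", bytes + "x") # => ["SHA1", "SHA512"]
```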

data/.gitignore ADDED

@@ -0,0 +1,17 @@
+*.gem
+*.rbc
+.bundle
+.config
+.yardoc
+Gemfile.lock
+InstalledFiles
+_yardoc
+coverage
+doc/
+lib/bundler/man
+pkg
+rdoc
+spec/reports
+test/tmp
+test/version_tmp
+tmp

data/.travis.yml ADDED

@@ -0,0 +1,8 @@
+language: ruby
+rvm:
+  - "1.9.2"
+  - "1.9.3"
+  - "2.0.0"
+  - jruby-19mode # JRuby in 1.9 mode
+# uncomment this line if your project needs to run something other than `rake`:
+script: rake test

data/Gemfile ADDED

@@ -0,0 +1,8 @@
+source 'https://rubygems.org'
+# Specify your gem's dependencies in hypermicrodata.gemspec
+gemspec
+group :test do
+  gem 'pry'
+end

data/LICENSE.txt ADDED

@@ -0,0 +1,22 @@
+Copyright (c) 2013 Jason Ronallo, Toru KAWAMURA
+MIT License
+Permission is hereby granted, free of charge, to any person obtaining
+a copy of this software and associated documentation files (the
+"Software"), to deal in the Software without restriction, including
+without limitation the rights to use, copy, modify, merge, publish,
+distribute, sublicense, and/or sell copies of the Software, and to
+permit persons to whom the Software is furnished to do so, subject to
+the following conditions:
+The above copyright notice and this permission notice shall be
+included in all copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
+LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
+OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

data/README.md ADDED

@@ -0,0 +1,100 @@
+# Hypermicrodata
+Ruby library for extracting HTML5 Microdata with Hypermedia
+[![Build Status](https://travis-ci.org/tkawa/hypermicrodata.png)](https://travis-ci.org/tkawa/hypermicrodata)
+## Story
+Most of the code here was extracted from [Mida](https://github.com/LawrenceWoodman/mida) by Lawrence Woodman. This was done in order to have a simpler, more generic Microdata parser without all the vocabulary awareness and other features. This gem is also tested under Ruby 1.9.3 and Ruby 2.0.0, though it could be better tested.
+## Installation
+This library has not been released to RubyGems.org yet. Once it is, the intended installation steps are as follows.
+Add this line to your application's Gemfile:
+    gem 'hypermicrodata'
+And then execute:
+    $ bundle
+Or install it yourself as:
+    $ gem install hypermicrodata
+## Usage
+### Basic
+```
+json = Hypermicrodata::Extract.new(html).to_json(:uber)
+```
+Supported formats are:
+- application/vnd.amundsen-uber+json (:uber)
+- application/hal+json (:hal)
+- application/json (:plain)
+### Rails Integration
+When you use this in Rails, you don't need to extract data manually.
+/app/controllers/people_controller.rb
+```
+class PeopleController < ApplicationController
+  before_action :set_message, only: %i(show edit update destroy)
+  include Hypermicrodata::Rails::HtmlBasedJsonRenderer
+  ...
+end
+```
+/app/views/people/show.html.haml
+```
+.person{itemscope: true, itemtype: 'http://schema.org/Person',
+        itemid: person_url(@person), data: {main_item: true}}
+  .media
+    .media-image.pull-left
+      = image_tag @person.picture_path, alt: '', itemprop: 'image'
+    .media-body
+      %h1.media-heading
+        %span{itemprop: 'name'}= @person.name
+  = link_to 'collection', people_path, rel: 'collection', itemprop: 'isPartOf'
+```
+You can then serve the following JSON:
+```
+GET /people/1 HTTP/1.1
+Host: www.example.com
+Accept: application/vnd.amundsen-uber+json
+```
+```
+{
+  "uber": {
+    "version": "1.0",
+    "data": [{
+      "url": "http://www.example.com/people/1",
+      "name": "Person",
+      "data": [
+        { "name": "image", "value": "/assets/bob.png" },
+        { "name": "name", "value": "Bob Smith" },
+        { "name": "isPartOf", "rel": "collection", "url": "/people" }
+      ]
+    }]
+  }
+}
+```
+## Contributing
+1. Fork it
+2. Create your feature branch (`git checkout -b my-new-feature`)
+3. Commit your changes (`git commit -am 'Add some feature'`)
+4. Push to the branch (`git push origin my-new-feature`)
+5. Create new Pull Request
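
Note: a client receiving the Uber response described in the README might separate plain values from links roughly as follows. This is only a sketch under stated assumptions: the body is modeled on the README's example, `split_uber_item` is a hypothetical helper that is not part of this gem, and entries are treated as links simply because they lack a `value` key.

```ruby
require "json"

# Response body modeled on the README's Uber example.
body = <<~JSON
  {
    "uber": {
      "version": "1.0",
      "data": [{
        "url": "http://www.example.com/people/1",
        "name": "Person",
        "data": [
          { "name": "image", "value": "/assets/bob.png" },
          { "name": "name", "value": "Bob Smith" },
          { "name": "isPartOf", "rel": "collection", "url": "/people" }
        ]
      }]
    }
  }
JSON

# Hypothetical helper: split an Uber item's children into values and links.
# Assumption: an entry without a "value" key is a link keyed by its "rel".
def split_uber_item(item)
  values, links = item.fetch("data", []).partition { |d| d.key?("value") }
  {
    values: values.to_h { |d| [d["name"], d["value"]] },
    links:  links.to_h { |d| [d["rel"], d["url"]] }
  }
end

person = JSON.parse(body).dig("uber", "data", 0)
result = split_uber_item(person)

puts result[:values]["name"]      # prints "Bob Smith"
puts result[:links]["collection"] # prints "/people"
```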

data/Rakefile ADDED

@@ -0,0 +1,10 @@
+require "bundler/gem_tasks"
+require 'rake/testtask'
+Rake::TestTask.new do |t|
+ t.libs << 'test'
+end
+desc "Run tests"
+task :default => :test

data/bin/hypermicrodata.rb ADDED

@@ -0,0 +1,25 @@
+#! /usr/bin/env ruby
+# hypermicrodata.rb
+# Extract HTML5 Microdata and output JSON
+$LOAD_PATH.unshift File.join(File.dirname(__FILE__), '..', 'lib')
+require 'hypermicrodata'
+location = ARGV[0]
+content = open(location)
+document = Hypermicrodata::Document.new(content, location)
+items = document.extract_items
+if items.nil? || items.empty?
+  puts "No Microdata items found."
+  itemprops = document.doc.search('//*[@itemprop]')
+  if !itemprops.empty?
+    puts "There are some itemprops, which means no top level items with an itemscope have been found."
+  end
+else
+  hash = {}
+  hash[:items] = items.map do |item|
+    item.to_hash
+  end
+  puts JSON.pretty_generate(hash)
+end

data/hypermicrodata.gemspec ADDED

@@ -0,0 +1,28 @@
+# coding: utf-8
+lib = File.expand_path('../lib', __FILE__)
+$LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
+require 'hypermicrodata/version'
+Gem::Specification.new do |spec|
+  spec.name          = "hypermicrodata"
+  spec.version       = Hypermicrodata::VERSION
+  spec.authors       = ["Jason Ronallo", "Toru KAWAMURA"]
+  spec.email         = ["jronallo@gmail.com", "tkawa@4bit.net"]
+  spec.description   = %q{HTML5 Microdata extractor with Hypermedia}
+  spec.summary       = %q{Ruby library for extracting HTML5 Microdata with Hypermedia}
+  spec.homepage      = "https://github.com/tkawa/hypermicrodata"
+  spec.license       = "MIT"
+  spec.files         = `git ls-files`.split($/)
+  spec.executables   = spec.files.grep(%r{^bin/}) { |f| File.basename(f) }
+  spec.test_files    = spec.files.grep(%r{^(test|spec|features)/})
+  spec.require_paths = ["lib"]
+  spec.add_dependency "nokogiri"
+  spec.add_dependency "mechanize"
+  spec.add_dependency "halibut"
+  spec.add_dependency "multi_json"
+  spec.add_development_dependency "bundler", "~> 1.3"
+  spec.add_development_dependency "rake"
+end
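
Note: the `spec.executables` and `spec.test_files` lines rely on `Array#grep`, which with a block maps each matching element. The sketch below uses a hand-written stand-in for the output of `git ls-files`. The paths mirror this diff's file list without the `data/` prefix and are otherwise illustrative.

```ruby
# Stand-in for `git ls-files`.split($/); not read from a real repository.
files = %w[
  bin/hypermicrodata.rb
  lib/hypermicrodata.rb
  lib/hypermicrodata/item.rb
  test/test_parse.rb
  test/data/example.html
]

# With a block, grep maps each matching path through the block.
executables = files.grep(%r{^bin/}) { |f| File.basename(f) }
# Without a block, grep returns the matching paths unchanged.
test_files = files.grep(%r{^(test|spec|features)/})

p executables # => ["hypermicrodata.rb"]
p test_files  # => ["test/test_parse.rb", "test/data/example.html"]
```

Because the executable is registered by basename, the command installed with the gem would be `hypermicrodata.rb`, extension included.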

data/lib/hypermicrodata.rb ADDED

@@ -0,0 +1,37 @@
+require "hypermicrodata/version"
+require "uberous/uber"
+require "nokogiri"
+require "mechanize"
+require "hypermicrodata/item"
+require "hypermicrodata/document"
+require "hypermicrodata/property"
+require "hypermicrodata/link"
+require "hypermicrodata/itemprop_parser"
+require "hypermicrodata/submit_button"
+require "hypermicrodata/serializer/base"
+require "hypermicrodata/serializer/hal"
+require "hypermicrodata/serializer/uber"
+require "hypermicrodata/extract"
+require "hypermicrodata/rails/html_based_json_renderer"
+require 'open-uri'
+require 'json'
+require 'uri'
+module Hypermicrodata
+  def self.get_items(location)
+    content = open(location)
+    page_url = location
+    Hypermicrodata::Document.new(content, page_url).extract_items
+  end
+  def self.to_json(location)
+    items = get_items(location)
+    hash = {}
+    hash[:items] = items.map do |item|
+      item.to_hash
+    end
+    JSON.pretty_generate hash
+  end
+end

data/lib/hypermicrodata/document.rb ADDED

@@ -0,0 +1,27 @@
+module Hypermicrodata
+  class Document
+    attr_reader :items, :doc
+    def initialize(content, page_url=nil, filter_xpath_attr=nil)
+      @doc = Nokogiri::HTML(content)
+      @page_url = page_url
+      @filter_xpath_attr = filter_xpath_attr
+      @items = extract_items
+    end
+    def extract_items
+      itemscopes = []
+      if @filter_xpath_attr
+        itemscopes = @doc.xpath("//*[#{@filter_xpath_attr} and @itemscope]")
+        puts "No element matches XPath //*[#{@filter_xpath_attr} and @itemscope]. Falling back to the root node." if itemscopes.empty?
+      end
+      itemscopes = @doc.xpath('self::*[@itemscope] | .//*[@itemscope and not(@itemprop)]') if itemscopes.empty?
+      itemscopes.collect do |itemscope|
+        Item.new(itemscope, @page_url)
+      end
+    end
+  end
+end

data/lib/hypermicrodata/extract.rb ADDED

@@ -0,0 +1,22 @@
+module Hypermicrodata
+  class Extract
+    def initialize(html, options = {})
+      default_data_attr_name = 'main-item'
+      @location = options[:location]
+      @profile_path = options[:profile_path]
+      filter_xpath_attr = "@data-#{options[:data_attr_name] || default_data_attr_name}"
+      @document = Hypermicrodata::Document.new(html, @location, filter_xpath_attr)
+    end
+    def to_json(format = :plain, options = {})
+      case format
+      when :hal
+        Hypermicrodata::Serializer::Hal.new(@document, @location, @profile_path).to_json(options)
+      when :uber
+        Hypermicrodata::Serializer::Uber.new(@document, @location, @profile_path).to_json(options)
+      else
+        Hypermicrodata::Serializer::Base.new(@document, @location, @profile_path).to_json(options)
+      end
+    end
+  end
+end
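
Note: the `case` in `Extract#to_json` sends `:hal` and `:uber` to dedicated serializers and everything else to `Serializer::Base`. The sketch below restates that dispatch as a table lookup with a default. The `DispatchSketch` module and its classes are stand-ins invented for illustration (the real serializers take a document, a location and a profile path), and the media types are the ones listed in the README.

```ruby
# Stand-in classes; they only mirror the shape of the dispatch in Extract#to_json.
module DispatchSketch
  class Base
    def media_type
      "application/json"
    end
  end

  class Hal < Base
    def media_type
      "application/hal+json"
    end
  end

  class Uber < Base
    def media_type
      "application/vnd.amundsen-uber+json"
    end
  end

  SERIALIZERS = { hal: Hal, uber: Uber }.freeze

  # Any format without an entry falls back to Base, like the `else` branch.
  def self.serializer_for(format)
    SERIALIZERS.fetch(format, Base)
  end
end

%i[hal uber plain jsonld].each do |format|
  puts "#{format} -> #{DispatchSketch.serializer_for(format).new.media_type}"
end
```

One consequence visible in this diff: `serializer/jsonld.rb` is added, but `Extract#to_json` has no `:jsonld` branch, so that format would be handled by the base serializer.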

data/lib/hypermicrodata/item.rb ADDED

@@ -0,0 +1,113 @@
+module Hypermicrodata
+  class Item
+    attr_reader :type, :properties, :links, :id
+    def initialize(top_node, page_url)
+      @top_node = top_node
+      @type = extract_itemtype
+      @id   = extract_itemid
+      @properties = {}
+      @links = {}
+      @page_url = page_url
+      add_itemref_properties(@top_node)
+      parse_elements(extract_elements(@top_node))
+    end
+    def to_hash
+      hash = {}
+      hash[:id] = id if id
+      hash[:type] = type if type
+      hash[:properties] = {}
+      properties.each do |name, same_name_properties|
+        final_values = same_name_properties.map do |property|
+          if property.item
+            property.item.to_hash
+          else
+            property.value
+          end
+        end
+        hash[:properties][name] = final_values
+      end
+      hash[:links] = {}
+      links.each do |rel, same_rel_links|
+        final_values = same_rel_links.map do |link|
+          if link.item
+            link.item.to_hash
+          else
+            link.value
+          end
+        end
+        hash[:links][rel] = final_values
+      end
+      hash
+    end
+    def all_properties_and_links
+      properties.values.flatten | links.values.flatten
+    end
+    private
+    def extract_elements(node)
+      node.search('./*')
+    end
+    def extract_itemid
+      (value = @top_node.attribute('itemid')) ? value.value : nil
+    end
+    def extract_itemtype
+      (value = @top_node.attribute('itemtype')) ? value.value.split(' ') : nil
+    end
+    def parse_elements(elements)
+      elements.each {|element| parse_element(element)}
+    end
+    def parse_element(element)
+      itemscope = element.attribute('itemscope')
+      itemprop = element.attribute('itemprop')
+      internal_elements = extract_elements(element)
+      add_itemprop(element) if itemscope || itemprop || ItempropParser::LINK_ELEMENTS.include?(element.name)
+      add_form(element) if element.name == 'form'
+      parse_elements(internal_elements) if internal_elements && !itemscope
+    end
+    # Add an 'itemprop' to the properties
+    def add_itemprop(element)
+      property = ItempropParser.parse(element, @page_url)
+      if property.link? && property.names.empty? && property.rels.empty?
+        (@links['link'] ||= []) << property
+      else
+        property.names.each { |name| (@properties[name] ||= []) << property }
+        property.rels.each { |rel| (@links[rel] ||= []) << property }
+      end
+    end
+    # Add any properties referred to by 'itemref'
+    def add_itemref_properties(element)
+      itemref = element.attribute('itemref')
+      if itemref
+        itemref.value.split(' ').each {|id| parse_elements(find_with_id(id))}
+      end
+    end
+    def add_form(element)
+      submit_buttons = FormParser.parse(element, @page_url)
+      submit_buttons.each do |submit_button|
+        submit_button.names.each { |name| (@properties[name] ||= []) << submit_button }
+        if submit_button.rels.empty?
+          (@links['submit'] ||= []) << submit_button
+        else
+          submit_button.rels.each { |rel| (@links[rel] ||= []) << submit_button }
+        end
+      end
+    end
+    # Find an element with a matching id
+    def find_with_id(id)
+      @top_node.search("//*[@id='#{id}']")
+    end
+  end
+end
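
Note: `add_itemprop` files one parsed property under every name and every rel it carries, and `to_hash` then maps each bucket to its values. The sketch below imitates that grouping with a plain Struct. `SketchProperty`, `group_properties` and the sample data are assumptions made for illustration, not the gem's `Property` class or `ItempropParser` output.

```ruby
# Stand-in for a parsed property: names come from itemprop, rels from rel.
SketchProperty = Struct.new(:names, :rels, :value, :link)

# Mirrors the bucketing in Item#add_itemprop: a link with neither names nor
# rels lands under the generic "link" rel; everything else is filed under
# each of its names and each of its rels.
def group_properties(parsed)
  properties = {}
  links = {}
  parsed.each do |prop|
    if prop.link && prop.names.empty? && prop.rels.empty?
      (links["link"] ||= []) << prop
    else
      prop.names.each { |name| (properties[name] ||= []) << prop }
      prop.rels.each  { |rel|  (links[rel] ||= []) << prop }
    end
  end
  [properties, links]
end

parsed = [
  SketchProperty.new(["name"], [], "Bob Smith", false),
  SketchProperty.new(["name", "alternateName"], [], "Bobby", false),
  SketchProperty.new(["isPartOf"], ["collection"], "/people", true),
  SketchProperty.new([], [], "/about", true)
]

properties, links = group_properties(parsed)

# Same flattening step as Item#to_hash for properties without a nested item.
property_values = properties.transform_values { |props| props.map(&:value) }
link_values     = links.transform_values { |props| props.map(&:value) }

p property_values
p link_values
```

Every bucket is an array even when it holds a single entry, which matches the `hash[:properties][name] = final_values` shape produced by `to_hash`.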