mida 0.2.0 → 0.3.0

Sign up to get free protection for your applications and to get access to all the features.
data/CHANGELOG.rdoc CHANGED
@@ -1,7 +1,23 @@
1
+ == 0.3.0 (29th June 2011)
2
+ * Merge +VocabularyDesc+ into +Vocabulary+
3
+ * Vocabularies are now auto registered using +inherited+ hook
4
+ * Removed vocabulary from <tt>Item#to_h</tt>
5
+ * Deprecate +types+ to describe a Vocabulary property if favour of +extract+
6
+ * Add +DataType+ so can use <tt>DataType::Text</tt> instead of +String+ for a
7
+ type
8
+ * Add various <tt>DataType</tt>s: +Boolean+, +Float+, +Integer+, +Number+,
9
+ +ISO8601Date+, +Text+
10
+ * Add Bundler support
11
+ * Properties marked as <tt>has_one</tt> now output a single value instead of
12
+ an +Array+
13
+ * <tt>Document#search</tt> now only uses a +Regexp+ to search with
14
+ * +Document+ now includes +Enumerable+ Mixin
15
+
1
16
  == 0.2.0 (3rd May 2011)
2
17
  * Add ability to describe and conform to vocabularies
3
- * Rename Mida::Property to Mida::Itemprop to better reflect use
4
- * Make some of the Mida::Itemprop class methods private
18
+ * Rename <tt>Mida::Property</tt> to <tt>Mida::Itemprop</tt> to better reflect
19
+ use
20
+ * Make some of the <tt>Mida::Itemprop</tt> class methods private
5
21
 
6
22
  == 0.1.3 (18th April 2011)
7
23
  * Ensure itemprops are parsed properly if containing non-microdata elements
data/README.rdoc CHANGED
@@ -1,6 +1,7 @@
1
1
  = Mida
2
2
 
3
- * {Mida Project Page}[https://github.com/LawrenceWoodman/mida]
3
+ * {Mida Project Page}[http://lawrencewoodman.github.com/mida]
4
+ * {Mida Github Repository}[https://github.com/LawrenceWoodman/mida]
4
5
  * {Mida Bug Tracker}[https://github.com/LawrenceWoodman/mida/issues]
5
6
 
6
7
  == Description
@@ -43,8 +44,8 @@ To return all the +Items+ that use one of Google's Review vocabularies:
43
44
  doc.search(%r{http://data-vocabulary\.org.*?review.*?}i)
44
45
 
45
46
  === Inspecting an +Item+
46
- Each +Item+ is a <tt>Mida::Item</tt> instance and has three main methods of
47
- interest, +type+, +properties+ and +id+.
47
+ Each +Item+ is a <tt>Mida::Item</tt> instance and has four main methods of
48
+ interest: +type+, +vocabulary+, +properties+ and +id+.
48
49
 
49
50
  To find out the +itemtype+ of the +Item+:
50
51
  puts doc.items.first.type
@@ -60,21 +61,31 @@ To see the +properties+ of the +Item+:
60
61
 
61
62
  === Working with Vocabularies
62
63
  Mida allows you to define vocabularies, so that input data can be constrained to match
63
- expected patterns. By default a generic vocabulary (<tt>Mida::Vocabulary::Generic</tt>)
64
+ expected patterns. By default a generic vocabulary (<tt>Mida::GenericVocabulary</tt>)
64
65
  is registered which will match against any +itemtype+ with any number of properties.
65
66
 
66
- If you want to specify a vocabulary you create a class derived from <tt>Mida::VocabularyDesc</tt>
67
- and use +itemtype+, +has_one+, +has_many+ and +types+ to describe the vocabulary.
67
+ If you want to specify a vocabulary you create a class derived from <tt>Mida::Vocabulary</tt>
68
+ and use +itemtype+, +has_one+, +has_many+ and +extract+ to describe the vocabulary.
68
69
 
69
70
  As an example the following describes a subset of Google's Review vocabulary:
70
- class Review < Mida::VocabularyDesc
71
- itemtype %r{http://data-vocabulary.org/review}
71
+
72
+ class Rating < Mida::Vocabulary
73
+ itemtype %r{http://data-vocabulary.org/rating}i
74
+ has_one 'best'
75
+ has_one 'worst'
76
+ has_one 'value'
77
+ end
78
+
79
+ class Review < Mida::Vocabulary
80
+ itemtype %r{http://data-vocabulary.org/review}i
72
81
  has_one 'itemreviewed'
73
- has_one 'rating'
82
+ has_one 'rating' do
83
+ extract Rating, Mida::DataType::Text
84
+ end
74
85
  end
75
86
 
76
- To register the above Vocabulary use:
77
- Mida::Vocabulary.register(Review)
87
+ When you create a subclass of <tt>Mida::Vocabulary</tt> it automatically
88
+ registers the Vocabulary.
78
89
 
79
90
  Now if Mida is parsing some input and manages to match against the +Review+ +itemtype+, it
80
91
  will only allow the specified properties and will reject any that don't have the correct number. It
data/Rakefile CHANGED
@@ -6,10 +6,10 @@ spec = Gem::Specification.new do |s|
6
6
  s.name = "mida"
7
7
  s.summary = "A Microdata parser/extractor library"
8
8
  s.description = "A Microdata parser and extractor library, based on the latest published version of the Microdata Specification, dated 5th April 2011."
9
- s.version = "0.2.0"
9
+ s.version = "0.3.0"
10
10
  s.author = "Lawrence Woodman"
11
11
  s.email = "lwoodman@vlifesystems.com"
12
- s.homepage = %q{http://github.com/LawrenceWoodman/mida}
12
+ s.homepage = %q{http://lawrencewoodman.github.com/mida/}
13
13
  s.platform = Gem::Platform::RUBY
14
14
  s.required_ruby_version = '>=1.9'
15
15
  s.files = Dir['lib/**/*.rb'] + Dir['spec/**/*.rb'] + Dir['*.rdoc'] + Dir['Rakefile']
data/lib/mida.rb CHANGED
@@ -6,4 +6,4 @@ module Mida
6
6
 
7
7
  end
8
8
 
9
- require_relative 'mida/vocabulary/generic'
9
+ require 'mida/genericvocabulary'
@@ -0,0 +1,15 @@
1
+ module Mida
2
+ # Module to hold the various data types.
3
+ # Each DataType should be a module containing the class method: +extract+
4
+ # which returns the value extracted or raises an +ArgumentError+ exception
5
+ # if input value is not valid.
6
+ module DataType
7
+ end
8
+ end
9
+
10
+ require 'mida/datatype/boolean'
11
+ require 'mida/datatype/float'
12
+ require 'mida/datatype/integer'
13
+ require 'mida/datatype/iso8601date'
14
+ require 'mida/datatype/number'
15
+ require 'mida/datatype/text'
@@ -0,0 +1,18 @@
1
+ module Mida
2
+ module DataType
3
+
4
+ # Boolean data type
5
+ module Boolean
6
+
7
+ # Returns the +value+ as a boolean
8
+ # or raises ArgumentError if not valid
9
+ def self.extract(value)
10
+ case value.downcase
11
+ when 'true' then true
12
+ when 'false' then false
13
+ else raise ArgumentError, 'Invalid value'
14
+ end
15
+ end
16
+ end
17
+ end
18
+ end
@@ -0,0 +1,15 @@
1
+ module Mida
2
+ module DataType
3
+
4
+ # Float data type
5
+ module Float
6
+
7
+ # Returns the +value+ as a floating point number
8
+ # Relies on +Float+ to raise +ArgumentError+ if not valid
9
+ def self.extract(value)
10
+ Float(value)
11
+ end
12
+
13
+ end
14
+ end
15
+ end
@@ -0,0 +1,15 @@
1
+ module Mida
2
+ module DataType
3
+
4
+ # Integer data type
5
+ module Integer
6
+
7
+ # Returns the +value+ as an integer
8
+ # Relies on +Integer+ to raise +ArgumentError+ if not valid
9
+ def self.extract(value)
10
+ Integer(value)
11
+ end
12
+
13
+ end
14
+ end
15
+ end
@@ -0,0 +1,17 @@
1
+ require 'date'
2
+
3
+ module Mida
4
+ module DataType
5
+
6
+ # ISO 8601 Date data type
7
+ module ISO8601Date
8
+
9
+ # Returns the +value+ as a +DateTime+ instance
10
+ # Relies on <tt>DateTime#iso8601</tt> to raise
11
+ # +ArgumentError+ if not valid
12
+ def self.extract(value)
13
+ DateTime.iso8601(value)
14
+ end
15
+ end
16
+ end
17
+ end
@@ -0,0 +1,15 @@
1
+ module Mida
2
+ module DataType
3
+
4
+ # Number data type
5
+ module Number
6
+
7
+ # Returns the +value+ as a number
8
+ # Relies on +Float+ to raise +ArgumentError+ if not valid
9
+ def self.extract(value)
10
+ Float(value)
11
+ end
12
+
13
+ end
14
+ end
15
+ end
@@ -0,0 +1,13 @@
1
+ module Mida
2
+ module DataType
3
+
4
+ # Text data type
5
+ module Text
6
+
7
+ # Returns the value extracted
8
+ def self.extract(value)
9
+ value
10
+ end
11
+ end
12
+ end
13
+ end
data/lib/mida/document.rb CHANGED
@@ -4,6 +4,7 @@ module Mida
4
4
 
5
5
  # Class that holds the extracted Microdata
6
6
  class Document
7
+ include Enumerable
7
8
 
8
9
  # An Array of Mida::Item objects. These are all top-level
9
10
  # and hence not properties of other Items
@@ -20,25 +21,27 @@ module Mida
20
21
  @items = extract_items
21
22
  end
22
23
 
23
- # Returns an array of matching Mida::Item objects
24
- #
25
- # [vocabulary] A regexp to match the item types against
26
- # or a Class derived from Mida::VocabularyDesc
27
- # to match against
28
- def search(vocabulary, items=@items)
29
- found_items = []
30
- regexp_passed = vocabulary.kind_of?(Regexp)
31
- regexp = if regexp_passed then vocabulary else vocabulary.itemtype end
24
+ # Implements method for Enumerable
25
+ def each
26
+ @items.each {|item| yield(item)}
27
+ end
32
28
 
33
- items.each do |item|
29
+ # Returns an array of matching <tt>Mida::Item</tt> objects
30
+ #
31
+ # This drills down through each +Item+ to find match items
32
+ #
33
+ # [itemtype] A regexp to match the item types against
34
+ # [items] An array of items to search. If no argument supplied, will
35
+ # search through all items in the document.
36
+ def search(itemtype, items=@items)
37
+ items.each_with_object([]) do |item, found_items|
34
38
  # Allows matching against empty string, otherwise couldn't match
35
39
  # as item.type can be nil
36
- if (item.type.nil? && "" =~ regexp) || (item.type =~ regexp)
40
+ if (item.type.nil? && "" =~ itemtype) || (item.type =~ itemtype)
37
41
  found_items << item
38
42
  end
39
- found_items += search_values(item.properties.values, regexp)
43
+ found_items.concat(search_values(item.properties.values, itemtype))
40
44
  end
41
- found_items
42
45
  end
43
46
 
44
47
  private
@@ -47,18 +50,17 @@ module Mida
47
50
  return nil unless itemscopes
48
51
 
49
52
  itemscopes.collect do |itemscope|
50
- Item.new(itemscope, @page_url)
53
+ itemscope = Itemscope.new(itemscope, @page_url)
54
+ Item.new(itemscope)
51
55
  end
52
56
  end
53
57
 
54
58
  def search_values(values, vocabulary)
55
- items = []
56
- values.each do |value|
57
- if value.is_a?(Mida::Item) then items += search(vocabulary, [value])
58
- elsif value.is_a?(Array) then items += search_values(value, vocabulary)
59
+ values.each_with_object([]) do |value, items|
60
+ if value.is_a?(Item) then items.concat(search(vocabulary, [value]))
61
+ elsif value.is_a?(Array) then items.concat(search_values(value, vocabulary))
59
62
  end
60
63
  end
61
- items
62
64
  end
63
65
 
64
66
  end
@@ -0,0 +1,13 @@
1
+ require 'mida/vocabulary'
2
+
3
+ module Mida
4
+
5
+ # A Generic vocabulary that will match against anything
6
+ class GenericVocabulary < Mida::Vocabulary
7
+ itemtype %r{}
8
+ has_many :any do
9
+ extract :any
10
+ end
11
+ end
12
+
13
+ end
data/lib/mida/item.rb CHANGED
@@ -1,9 +1,11 @@
1
1
  require 'nokogiri'
2
+ require 'mida'
2
3
 
3
4
  module Mida
4
5
 
5
- # Class that holds each item/itemscope
6
+ # Class that holds a validated item
6
7
  class Item
8
+
7
9
  # The vocabulary used to interpret this item
8
10
  attr_reader :vocabulary
9
11
 
@@ -18,29 +20,26 @@ module Mida
18
20
  # or <tt>Mida::Item</tt> instances
19
21
  attr_reader :properties
20
22
 
21
- # Create a new Item object
23
+ # Create a new Item object from an +Itemscope+ and validates
24
+ # its +properties+
22
25
  #
23
- # [itemscope] The itemscope that you want to parse.
24
- # [page_url] The url of target used for form absolute url.
25
- def initialize(itemscope, page_url=nil)
26
- @itemscope, @page_url = itemscope, page_url
27
- @type, @id = extract_attribute('itemtype'), extract_attribute('itemid')
26
+ # [itemscope] The itemscope that has been parsed by +Itemscope+
27
+ def initialize(itemscope)
28
+ @type = itemscope.type
29
+ @id = itemscope.id
28
30
  @vocabulary = Mida::Vocabulary.find(@type)
29
- @properties = {}
30
- add_itemref_properties
31
- parse_elements(extract_elements(@itemscope))
31
+ @properties = itemscope.properties
32
32
  validate_properties
33
33
  end
34
34
 
35
35
  # Return a Hash representation
36
36
  # of the form:
37
- # { vocabulary: 'http://example.com/vocab/review',
38
- # type: 'The item type',
37
+ # { type: 'http://example.com/vocab/review',
39
38
  # id: 'urn:isbn:1-934356-08-5',
40
39
  # properties: {'a name' => 'avalue' }
41
40
  # }
42
41
  def to_h
43
- {vocabulary: @vocabulary, type: @type, id: @id, properties: properties_to_h(@properties)}
42
+ {type: @type, id: @id, properties: properties_to_h(@properties)}
44
43
  end
45
44
 
46
45
  def to_s
@@ -58,63 +57,101 @@ module Mida
58
57
  def validate_properties
59
58
  @properties =
60
59
  @properties.each_with_object({}) do |(property, values), hash|
61
- if valid_property?(property, values)
62
- hash[property] = valid_values(property, values)
63
- end
60
+ valid_values = validate_values(property, values)
61
+ hash[property] = valid_values unless valid_values.nil?
64
62
  end
65
63
  end
66
64
 
67
- # Return whether the number of values conforms to the spec
68
- def valid_num_values?(property, values)
69
- return false unless @vocabulary.prop_spec.has_key?(property)
70
- property_spec = @vocabulary.prop_spec[property]
71
- (property_spec[:num] == :many ||
72
- (property_spec[:num] == :one && values.length == 1))
65
+ # Return whether the number of values conforms to +num+
66
+ def valid_num_values?(num, values)
67
+ num == :many || (num == :one && values.length == 1)
73
68
  end
74
69
 
70
+ # Return whether this property name is valid
75
71
  def valid_property?(property, values)
76
- [property, :any].any? {|prop| valid_num_values?(prop, values)}
72
+ [property, :any].any? do |prop|
73
+ @vocabulary.properties.has_key?(prop)
74
+ end
77
75
  end
78
76
 
79
- def valid_values(property, values)
80
- prop_types = if @vocabulary.prop_spec.has_key?(property)
81
- @vocabulary.prop_spec[property][:types]
77
+ # Return valid values, converted to the correct +DataType+
78
+ # or +Item+ and number if necessary
79
+ def validate_values(property, values)
80
+ return nil unless valid_property?(property, values)
81
+ prop_num = property_number(property)
82
+ return nil unless valid_num_values?(prop_num, values)
83
+ prop_types = property_types(property)
84
+
85
+ valid_values = values.each_with_object([]) do |value, valid_values|
86
+ new_value = validate_value(prop_types, value)
87
+ valid_values << new_value unless new_value.nil?
88
+ end
89
+
90
+ # Convert property to correct number
91
+ prop_num == :many ? valid_values : valid_values.first
92
+ end
93
+
94
+ # Returns value converted to correct +DataType+ or +Item+
95
+ # or +nil+ if not valid
96
+ def validate_value(prop_types, value)
97
+ if is_itemscope?(value)
98
+ valid_itemtype?(prop_types, value.type) ? Item.new(value) : nil
99
+ elsif (extract_value = datatype_extract(prop_types, value))
100
+ extract_value
101
+ elsif prop_types.include?(:any)
102
+ value
82
103
  else
83
- @vocabulary.prop_spec[:any][:types]
104
+ nil
84
105
  end
106
+ end
85
107
 
86
- values.select {|value| valid_type(prop_types, value) }
108
+ # Return the correct type for this property
109
+ def property_types(property)
110
+ if @vocabulary.properties.has_key?(property)
111
+ @vocabulary.properties[property][:types]
112
+ else
113
+ @vocabulary.properties[:any][:types]
114
+ end
87
115
  end
88
116
 
89
- def valid_type(prop_types, value)
90
- if value.respond_to?(:vocabulary)
91
- if prop_types.include?(value.vocabulary) || prop_types.include?(:any)
92
- return true
93
- end
94
- elsif prop_types.include?(value.class) || prop_types.include?(:any)
95
- return true
117
+ # Return the correct number for this property
118
+ def property_number(property)
119
+ if @vocabulary.properties.has_key?(property)
120
+ @vocabulary.properties[property][:num]
121
+ else
122
+ @vocabulary.properties[:any][:num]
96
123
  end
97
- false
98
124
  end
99
125
 
100
- def extract_attribute(attribute)
101
- (value = @itemscope.attribute(attribute)) ? value.value : nil
126
+ def is_itemscope?(object)
127
+ object.kind_of?(Itemscope)
102
128
  end
103
129
 
104
- def extract_elements(itemscope)
105
- itemscope.search('./*')
130
+ # Returns whether the +itemtype+ is a valid type
131
+ def valid_itemtype?(valid_types, itemtype)
132
+ return true if valid_types.include?(:any)
133
+
134
+ valid_types.find do |type|
135
+ type.respond_to?(:itemtype) && type.itemtype =~ itemtype
136
+ end
106
137
  end
107
138
 
108
- # Find an element with a matching id
109
- def find_with_id(id)
110
- @itemscope.search("//*[@id='#{id}']")
139
+ # Returns the extracted value or +nil+ if none of the datatypes
140
+ # could extract the +value+
141
+ def datatype_extract(valid_types, value)
142
+ valid_types.find do |type|
143
+ begin
144
+ return type.extract(value) if type.respond_to?(:extract)
145
+ rescue ArgumentError
146
+ end
147
+ end
148
+ nil
111
149
  end
112
150
 
113
151
  # The value as it should appear in to_h()
114
152
  def value_to_h(value)
115
- case
116
- when value.is_a?(Array) then value.collect {|element| value_to_h(element)}
117
- when value.is_a?(Item) then value.to_h
153
+ if value.is_a?(Array) then value.collect {|element| value_to_h(element)}
154
+ elsif value.is_a?(Item) then value.to_h
118
155
  else value
119
156
  end
120
157
  end
@@ -125,31 +162,6 @@ module Mida
125
162
  end
126
163
  end
127
164
 
128
- # Add any properties referred to by 'itemref'
129
- def add_itemref_properties
130
- itemref = extract_attribute('itemref')
131
- if itemref
132
- itemref.split.each {|id| parse_elements(find_with_id(id))}
133
- end
134
- end
135
-
136
- def parse_elements(elements)
137
- elements.each {|element| parse_element(element)}
138
- end
139
-
140
- def parse_element(element)
141
- itemscope = element.attribute('itemscope')
142
- itemprop = element.attribute('itemprop')
143
- internal_elements = extract_elements(element)
144
- add_itemprop(element) if itemscope || itemprop
145
- parse_elements(internal_elements) if internal_elements && !itemscope
146
- end
147
-
148
- def add_itemprop(itemprop)
149
- properties = Itemprop.parse(itemprop, @page_url)
150
- properties.each { |name, value| (@properties[name] ||= []) << value }
151
- end
152
-
153
165
  end
154
166
 
155
167
  end