micromicro 0.1.0 → 1.0.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: ae75d6afb6f7d98bc3a2fc12333ccbd31824e1cecd8336057e551fe6b392a643
-   data.tar.gz: 877da43669217907635d71015ec574ebc656e8755c8f6267229cebde36f281b0
+   metadata.gz: 6c5c3cb23e1c8338fef3d7cca36c47e4dcc834f3819525c01bd7ce6dab314971
+   data.tar.gz: 27794eaac80a6701c7910f4e31200a4bc999d0b8082b3ca1d2dd5a87a0bdaa82
  SHA512:
-   metadata.gz: a4b5a68c2d343fe0935d2103016aacf0b94d7a7c391d3ea360160a25fcb01bc16b7c70ae715e12b17142cedb928bea782b8057db5002df3e9ece2a99fdf1558c
-   data.tar.gz: 16a62f01b013d56547cebcac3001ecf59b77b3c14653f473a833d66589c6be31b5c53c2e6c547e0ce5c65f3dafb18feeb94be30ae0f57c65a96def024d6afccb
+   metadata.gz: d59f245021fb8bec36e1e319cffce3c3a6168ea763952efc8759bda7cb7a7bf13420632cdcfd2b988d72524a348f5a1238ff206cfd3aa75bfb5a57621f4f3b8c
+   data.tar.gz: 9355a25a3b65fe72828abf95d4dcec2829e46772904487b030a906cd95708f4bc946570f507bae326df2a3e8066e3de1c332799977cf8db749269e095bba413b
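The checksum entries above are hex digests of the two archives packed inside the `.gem` file. As a rough sketch of how such a digest could be checked with Ruby's standard `digest` library (the helper name `checksum_matches?` is hypothetical, and reading `metadata.gz` or `data.tar.gz` from disk is left out):

```ruby
require 'digest'

# Hypothetical helper: compare the digest of some bytes against an expected
# hex string, such as one of the values recorded in checksums.yaml.
def checksum_matches?(data, expected_hex, algorithm: Digest::SHA256)
  algorithm.hexdigest(data) == expected_hex.downcase
end

# The SHA-256 digest of an empty string is a well-known constant.
EMPTY_SHA256 = 'e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855'.freeze

puts checksum_matches?('', EMPTY_SHA256)
puts checksum_matches?('tampered', EMPTY_SHA256)
```

Passing `algorithm: Digest::SHA512` would cover the second group of entries in the same way.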
data/.simplecov CHANGED
@@ -2,9 +2,11 @@ require 'simplecov-console'

  formatters = [SimpleCov::Formatter::HTMLFormatter]

+ # rubocop:disable Style/IfUnlessModifier
  if RSpec.configuration.files_to_run.length > 1
    formatters << SimpleCov::Formatter::Console
  end
+ # rubocop:enable Style/IfUnlessModifier

  SimpleCov.start do
    formatter SimpleCov::Formatter::MultiFormatter.new(formatters)
@@ -1,5 +1,15 @@
  # Changelog

+ ## 1.0.0 / 2020-11-08
+
+ - Add `MicroMicro::Item#plain_text_properties` and `MicroMicro::Item#url_properties` methods (351e1f1)
+ - Add `MicroMicro::Collections::RelationshipsCollection#rels` and `MicroMicro::Collections::RelationshipsCollection#urls` methods (c0e5665)
+ - Add `MicroMicro::Collections::PropertiesCollection#names` and `MicroMicro::Collections::PropertiesCollection#values` methods (65486bc)
+ - Add `MicroMicro::Collections::ItemsCollection#types` method (6b53a81)
+ - Update absolutely dependency (4e67bb2)
+ - Add `Collectible` concern and refactor using Composite design pattern (82503b8)
+ - Update absolutely dependency (4578fb4)
+
  ## 0.1.0 / 2020-05-06

  - Initial release!
data/Gemfile CHANGED
@@ -6,9 +6,9 @@ gemspec
  gem 'pry-byebug', '~> 3.9'
  gem 'rake', '~> 13.0'
  gem 'reek', '~> 6.0'
- gem 'rspec', '~> 3.9'
- gem 'rubocop', '~> 0.82.0'
- gem 'rubocop-performance', '~> 1.5'
- gem 'rubocop-rspec', '~> 1.38'
- gem 'simplecov', '~> 0.18.5'
+ gem 'rspec', '~> 3.10'
+ gem 'rubocop', '~> 1.2'
+ gem 'rubocop-performance', '~> 1.8'
+ gem 'rubocop-rspec', '~> 2.0'
+ gem 'simplecov', '~> 0.19.1'
  gem 'simplecov-console', '~> 0.7.2'
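The `~>` (pessimistic) constraints in this Gemfile hunk explain why the bumps had to be edited by hand: `'~> 0.82.0'` only admits 0.82.x releases, so RuboCop 1.x could never satisfy it. A small sketch using RubyGems' own `Gem::Requirement` shows the ranges involved (the helper name `allowed?` is made up for illustration):

```ruby
require 'rubygems'

# Hypothetical helper: does `version` satisfy the Gemfile-style `constraint`?
def allowed?(constraint, version)
  Gem::Requirement.new(constraint).satisfied_by?(Gem::Version.new(version))
end

puts allowed?('~> 0.82.0', '0.82.9') # patch releases within 0.82 are accepted
puts allowed?('~> 0.82.0', '1.2.0')  # a new major version is rejected
puts allowed?('~> 1.2', '1.9.0')     # two-segment form allows any 1.x from 1.2 up
puts allowed?('~> 1.2', '2.0.0')     # but stops before 2.0
```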
data/README.md CHANGED
@@ -53,7 +53,7 @@ An example using a simple `String` of HTML as input:
  require 'micromicro'

  doc = MicroMicro.parse('<div class="h-card">Jason Garber</div>', 'https://sixtwothree.org')
- #=> #<MicroMicro::Document items: #<MicroMicro::Collections::ItemsCollection count: 1, members: [#<MicroMicro::Item types: ["h-card"], properties: 1, children: 0>]>, relations: #<MicroMicro::Collections::RelationsCollection count: 0, members: []>>
+ #=> #<MicroMicro::Document items: #<MicroMicro::Collections::ItemsCollection count: 1, members: [#<MicroMicro::Item types: ["h-card"], properties: 1, children: 0>]>, relationships: #<MicroMicro::Collections::RelationshipsCollection count: 0, members: []>>

  doc.to_h
  #=> { :items => [{ :type => ["h-card"], :properties => { :name => ["Jason Garber"] } }], :rels => {}, :"rel-urls" => {} }
@@ -71,7 +71,7 @@ url = "https://tantek.com"
  rsp = Net::HTTP.get(URI.parse(url))

  doc = MicroMicro.parse(rsp, url)
- #=> #<MicroMicro::Document items: #<MicroMicro::Collections::ItemsCollection count: 1, members: […]>, relations: #<MicroMicro::Collections::RelationsCollection count: 31, members: […]>>
+ #=> #<MicroMicro::Document items: #<MicroMicro::Collections::ItemsCollection count: 1, members: […]>, relationships: #<MicroMicro::Collections::RelationshipsCollection count: 31, members: […]>>

  doc.to_h
  #=> { :items => [{ :type => ["h-card"], :properties => {…}, :children => […]}], :rels => {…}, :'rel-urls' => {…} }
@@ -81,30 +81,65 @@ doc.to_h

  Building on the example above, a MicroMicro-parsed document is navigable and manipulable using a familiar `Enumerable`-esque interface.

+ #### Items
+
  ```ruby
  doc.items.first
  #=> #<MicroMicro::Item types: ["h-card"], properties: 42, children: 6>

+ # 🆕 in v1.0.0
+ doc.items.types
+ #=> ["h-card"]
+
+ doc.items.first.children
+ #=> #<MicroMicro::Collections::ItemsCollection count: 6, members: […]>
+ ```
+
+ #### Properties
+
+ ```ruby
  doc.items.first.properties
  #=> #<MicroMicro::Collections::PropertiesCollection count: 42, members: […]>

+ # 🆕 in v1.0.0
+ doc.items.first.plain_text_properties
+ #=> #<MicroMicro::Collections::PropertiesCollection count: 34, members: […]>
+
+ # 🆕 in v1.0.0
+ doc.items.first.url_properties
+ #=> #<MicroMicro::Collections::PropertiesCollection count: 11, members: […]>
+
+ # 🆕 in v1.0.0
+ doc.items.first.properties.names
+ #=> ["category", "name", "note", "org", "photo", "pronoun", "pronouns", "role", "uid", "url"]
+
+ # 🆕 in v1.0.0
+ doc.items.first.properties.values
+ #=> [{:value=>"https://tantek.com/photo.jpg", :alt=>""}, "https://tantek.com/", "Tantek Çelik", "Inventor, writer, teacher, runner, coder, more.", "Inventor", "writer", "teacher", "runner", "coder", …]
+
  doc.items.first.properties[7]
  #=> #<MicroMicro::Property name: "category", prefix: "p", value: "teacher">

  doc.items.first.properties.take(5).map { |property| [property.name, property.value] }
  #=> [["photo", { :value => "https://tantek.com/photo.jpg", :alt => "" }], ["url", "https://tantek.com/"], ["uid", "https://tantek.com/"], ["name", "Tantek Çelik"], ["role", "Inventor, writer, teacher, runner, coder, more."]]
+ ```

- doc.items.first.children
- #=> #<MicroMicro::Collections::ItemsCollection count: 6, members: […]>
+ #### Relationships

- doc.relations.first
- #=> #<MicroMicro::Relation href: "https://tantek.com/", rels: ["canonical"]>
+ ```ruby
+ doc.relationships.first
+ #=> #<MicroMicro::Relationship href: "https://tantek.com/", rels: ["canonical"]>

- doc.relations.map(&:rels).flatten.uniq.sort
+ # 🆕 in v1.0.0
+ doc.relationships.rels
  #=> ["alternate", "apple-touch-icon-precomposed", "author", "authorization_endpoint", "bookmark", "canonical", "hub", "icon", "me", "microsub", …]

- doc.relations.find { |relation| relation.rels.include?('webmention') }
- # => #<MicroMicro::Relation href: "https://webmention.io/tantek.com/webmention", rels: ["webmention"]>
+ # 🆕 in v1.0.0
+ doc.relationships.urls
+ #=> ["http://dribbble.com/tantek/", "http://last.fm/user/tantekc", "https://aperture.p3k.io/microsub/277", "https://en.wikipedia.org/wiki/User:Tantek", "https://github.com/tantek", "https://indieauth.com/auth", "https://indieauth.com/openid", "https://micro.blog/t", "https://pubsubhubbub.superfeedr.com/", "https://tantek.com/", …]
+
+ doc.relationships.find { |relationship| relationship.rels.include?('webmention') }
+ # => #<MicroMicro::Relationship href: "https://webmention.io/tantek.com/webmention", rels: ["webmention"]>
  ```

  ## Contributing
@@ -0,0 +1,13 @@
+ module MicroMicro
+   module Collectible
+     attr_accessor :collection
+
+     def next_all
+       collection.split(self).last
+     end
+
+     def prev_all
+       collection.split(self).first
+     end
+   end
+ end
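The new `Collectible` module gives each member a back-reference to its collection and derives its siblings by splitting that collection around itself. A dependency-free sketch of the idea follows. `TinyCollection` and `Node` are hypothetical stand-ins, and `split` here is a simplified version that only splits around the first occurrence of the member. The gem instead delegates `split` to its members array, where the method presumably comes from ActiveSupport (plain Ruby arrays have no `split`).

```ruby
# Mirrors the module added in the diff.
module Collectible
  attr_accessor :collection

  # Members that come after this one in the collection.
  def next_all
    collection.split(self).last
  end

  # Members that come before this one in the collection.
  def prev_all
    collection.split(self).first
  end
end

# Hypothetical member class, for illustration only.
class Node
  include Collectible

  attr_reader :name

  def initialize(name)
    @name = name
  end
end

# Hypothetical collection with a simplified `split`.
class TinyCollection
  def initialize(members)
    @members = members
    members.each { |member| member.collection = self }
  end

  # Returns [members before `member`, members after `member`].
  def split(member)
    index = @members.index(member)
    [@members.take(index), @members.drop(index + 1)]
  end
end

first, middle, last = %w[first middle last].map { |name| Node.new(name) }
TinyCollection.new([first, middle, last])

puts middle.prev_all.map(&:name).inspect
puts middle.next_all.map(&:name).inspect
```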
@@ -1,15 +1,15 @@
  module MicroMicro
    module Collections
      class BaseCollection
+       extend Forwardable
+
        include Enumerable

-       delegate :[], :each, :last, :length, :split, to: :members
+       def_delegators :members, :[], :each, :last, :length, :split

-       # @param members [Array<MicroMicro::Item, MicroMicro::Property, MicroMicro::Relation>]
+       # @param members [Array<MicroMicro::Item, MicroMicro::Property, MicroMicro::Relationship>]
        def initialize(members = [])
-         @members = members
-
-         decorate_members if respond_to?(:decorate_members, true)
+         members.each { |member| push(member) }
        end

        # @return [String]
@@ -17,21 +17,20 @@ module MicroMicro
          format(%(#<#{self.class.name}:%#0x count: #{count}, members: #{members.inspect}>), object_id)
        end

-       # @param member [MicroMicro::Item, MicroMicro::Property, MicroMicro::Relation]
-       # @return [self]
+       # @param member [MicroMicro::Item, MicroMicro::Property, MicroMicro::Relationship]
        def push(member)
-         members.push(member)
+         members << member

-         decorate_members if respond_to?(:decorate_members, true)
-
-         self
+         member.collection = self
        end

        alias << push

        private

-       attr_reader :members
+       def members
+         @members ||= []
+       end
      end
    end
  end
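The refactored `BaseCollection` swaps ActiveSupport's `delegate` for the standard library's `Forwardable`, and `push` now wires each member back to its collection as it is added. The sketch below reproduces that shape with hypothetical `Sketch*` names. One detail it demonstrates: because `member.collection = self` is the last expression in `push`, and an attribute assignment evaluates to its right-hand side, `push` still returns the collection even though the explicit `self` line was removed.

```ruby
require 'forwardable'

# Hypothetical member type with a back-reference to its collection.
class SketchMember
  attr_accessor :collection
  attr_reader :name

  def initialize(name)
    @name = name
  end
end

# Hypothetical collection following the structure of the diff.
class SketchCollection
  extend Forwardable

  include Enumerable

  # Forward a few Array methods to the private `members` reader.
  def_delegators :members, :[], :each, :last, :length

  def initialize(members = [])
    members.each { |member| push(member) }
  end

  # Adds the member, then points it back at this collection.
  def push(member)
    members << member

    member.collection = self
  end

  alias << push

  private

  # Lazily initialised so `push` works before any ivar is set.
  def members
    @members ||= []
  end
end

alpha = SketchMember.new('alpha')
beta = SketchMember.new('beta')
collection = SketchCollection.new([alpha])
collection << beta

puts collection.map(&:name).inspect
puts beta.collection.equal?(collection)
```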
@@ -5,6 +5,11 @@ module MicroMicro
        def to_a
          map(&:to_h)
        end
+
+       # @return [Array<String>]
+       def types
+         @types ||= map(&:types).flatten.uniq.sort
+       end
      end
    end
  end
@@ -1,17 +1,19 @@
  module MicroMicro
    module Collections
      class PropertiesCollection < BaseCollection
+       # @return [Array<String>]
+       def names
+         @names ||= map(&:name).uniq.sort
+       end
+
        # @return [Hash{Symbol => Array<String, Hash>}]
        def to_h
-         group_by(&:name).symbolize_keys.deep_transform_values do |property|
-           property.item_node? ? property.value.to_h : property.value
-         end
+         group_by(&:name).symbolize_keys.deep_transform_values(&:value)
        end

-       private
-
-       def decorate_members
-         each { |member| member.collection = self }
+       # @return [Array<String, Hash>]
+       def values
+         @values ||= map(&:value).uniq
        end
      end
    end
  end
@@ -1,23 +1,32 @@
  module MicroMicro
    module Collections
-     class RelationsCollection < BaseCollection
-       # @see microformats2 Parsing Specification section 1.4
+     class RelationshipsCollection < BaseCollection
        # @see http://microformats.org/wiki/microformats2-parsing#parse_a_hyperlink_element_for_rel_microformats
        #
-       # @return [Hash{Symbole => Hash{Symbol => Array, String}}]
+       # @return [Hash{Symbol => Hash{Symbol => Array, String}}]
        def group_by_url
-         group_by(&:href).symbolize_keys.transform_values { |relations| relations.first.to_h.slice!(:href) }
+         group_by(&:href).symbolize_keys.transform_values { |relationships| relationships.first.to_h.slice!(:href) }
        end

-       # @see microformats2 Parsing Specification section 1.4
        # @see http://microformats.org/wiki/microformats2-parsing#parse_a_hyperlink_element_for_rel_microformats
        #
        # @return [Hash{Symbol => Array<String>}]
        def group_by_rel
+         # flat_map { |member| member.rels.map { |rel| [rel, member.href] } }.group_by(&:shift).symbolize_keys.transform_values(&:flatten).transform_values(&:uniq)
          each_with_object(Hash.new { |hash, key| hash[key] = [] }) do |member, hash|
            member.rels.each { |rel| hash[rel] << member.href }
          end.symbolize_keys.transform_values(&:uniq)
        end
+
+       # @return [Array<String>]
+       def rels
+         @rels ||= map(&:rels).flatten.uniq.sort
+       end
+
+       # @return [Array<String>]
+       def urls
+         @urls ||= map(&:href).uniq.sort
+       end
      end
    end
  end
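To make the grouping logic in `group_by_rel` and the new `rels` and `urls` helpers concrete, here is a plain-Ruby sketch over a few made-up links. `Link` is a hypothetical stand-in for `MicroMicro::Relationship`, and `transform_keys(&:to_sym)` replaces the `symbolize_keys` call in the diff (which appears to be the ActiveSupport method) so the example needs no gems.

```ruby
# Hypothetical stand-in for a parsed relationship: one URL, many rel values.
Link = Struct.new(:href, :rels)

LINKS = [
  Link.new('https://example.com/', %w[canonical me]),
  Link.new('https://example.com/feed', %w[alternate]),
  Link.new('https://example.com/', %w[me])
].freeze

# Invert href => rels into rel => hrefs, dropping duplicate URLs per rel.
def group_by_rel(links)
  grouped = links.each_with_object(Hash.new { |hash, key| hash[key] = [] }) do |link, hash|
    link.rels.each { |rel| hash[rel] << link.href }
  end

  grouped.transform_keys(&:to_sym).transform_values(&:uniq)
end

# Every distinct rel value, sorted.
def rels(links)
  links.map(&:rels).flatten.uniq.sort
end

# Every distinct URL, sorted.
def urls(links)
  links.map(&:href).uniq.sort
end

puts group_by_rel(LINKS).inspect
puts rels(LINKS).inspect
puts urls(LINKS).inspect
```

The `Hash.new` block supplies an empty array the first time each rel is seen, which is what lets the inner loop append without checking for missing keys.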
@@ -1,57 +1,117 @@
1
  module MicroMicro
    class Document
-     # @param markup [String] the HTML to parse
-     # @param base_url [String] the URL associated with the provided markup
+     # A map of HTML `srcset` attributes and their associated element names
+     #
+     # @see https://html.spec.whatwg.org/#srcset-attributes
+     # @see https://html.spec.whatwg.org/#attributes-3
+     HTML_IMAGE_CANDIDATE_STRINGS_ATTRIBUTES_MAP = {
+       'imagesrcset' => %w[link],
+       'srcset' => %w[img source]
+     }.freeze
+
+     # A map of HTML URL attributes and their associated element names
+     #
+     # @see https://html.spec.whatwg.org/#attributes-3
+     HTML_URL_ATTRIBUTES_MAP = {
+       'action' => %w[form],
+       'cite' => %w[blockquote del ins q],
+       'data' => %w[object],
+       'formaction' => %w[button input],
+       'href' => %w[a area base link],
+       'manifest' => %w[html],
+       'ping' => %w[a area],
+       'poster' => %w[video],
+       'src' => %w[audio embed iframe img input script source track video]
+     }.freeze
+
+     # Parse a string of HTML for microformats2-encoded data.
+     #
+     # MicroMicro::Document.new('<a href="/" class="h-card" rel="me">Jason Garber</a>', 'https://sixtwothree.org')
+     #
+     # Or, pull the source HTML of a page on the Web:
+     #
+     # url = 'https://tantek.com'
+     # markup = Net::HTTP.get(URI.parse(url))
+     #
+     # doc = MicroMicro::Document.new(markup, url)
+     #
+     # @param markup [String] The HTML to parse for microformats2-encoded data.
+     # @param base_url [String] The URL associated with markup. Used for relative URL resolution.
      def initialize(markup, base_url)
        @markup = markup
        @base_url = base_url
+
+       resolve_relative_urls
      end

      # @return [String]
      def inspect
-       format(%(#<#{self.class.name}:%#0x items: #{items.inspect}, relations: #{relations.inspect}>), object_id)
+       format(%(#<#{self.class.name}:%#0x items: #{items.inspect}, relationships: #{relationships.inspect}>), object_id)
      end

+     # A collection of items parsed from the provided markup.
+     #
      # @return [MicroMicro::Collections::ItemsCollection]
      def items
        @items ||= Collections::ItemsCollection.new(Item.items_from(document))
      end

-     # @return [MicroMicro::Collections::RelationsCollection]
-     def relations
-       @relations ||= Collections::RelationsCollection.new(Relation.relations_from(document))
+     # A collection of relationships parsed from the provided markup.
+     #
+     # @return [MicroMicro::Collections::RelationshipsCollection]
+     def relationships
+       @relationships ||= Collections::RelationshipsCollection.new(Relationship.relationships_from(document))
      end

-     # @see microformats2 Parsing Specification section 1.1
+     # Return the parsed document as a Hash.
+     #
      # @see http://microformats.org/wiki/microformats2-parsing#parse_a_document_for_microformats
      #
-     # @return [Hash]
+     # @return [Hash{Symbol => Array, Hash}]
      def to_h
        {
          items: items.to_a,
-         rels: relations.group_by_rel,
-         'rel-urls': relations.group_by_url
+         rels: relationships.group_by_rel,
+         'rel-urls': relationships.group_by_url
        }
      end

+     # Ignore this node?
+     #
      # @param node [Nokogiri::XML::Element]
      # @return [Boolean]
      def self.ignore_node?(node)
        ignored_node_names.include?(node.name)
      end

+     # A list of HTML element names the parser should ignore.
+     #
      # @return [Array<String>]
      def self.ignored_node_names
        %w[script style template]
      end

+     # @see http://microformats.org/wiki/microformats2-parsing#parse_an_element_for_properties
+     # @see http://microformats.org/wiki/microformats2-parsing#parsing_for_implied_properties
+     #
+     # @param context [Nokogiri::HTML::Document, Nokogiri::XML::NodeSet, Nokogiri::XML::Element]
+     # @yield [context]
+     # @return [String]
+     def self.text_content_from(context)
+       context.css(*ignored_node_names).unlink
+
+       yield(context) if block_given?
+
+       context.text.strip
+     end
+
      private

      attr_reader :base_url, :markup

      # @return [Nokogiri::XML::Element, nil]
      def base_element
-       @base_element ||= Nokogiri::HTML(markup).at_css('base[href]')
+       @base_element ||= Nokogiri::HTML(markup).at('//base[@href]')
      end

      # @return [Nokogiri::HTML::Document]
@@ -59,12 +119,32 @@ module MicroMicro
        @document ||= Nokogiri::HTML(markup, resolved_base_url)
      end

+     def resolve_relative_urls
+       HTML_URL_ATTRIBUTES_MAP.each do |attribute, names|
+         document.xpath(*names.map { |name| "//#{name}[@#{attribute}]" }).each do |node|
+           node[attribute] = Absolutely.to_abs(base: resolved_base_url, relative: node[attribute].strip)
+         end
+       end
+
+       HTML_IMAGE_CANDIDATE_STRINGS_ATTRIBUTES_MAP.each do |attribute, names|
+         document.xpath(*names.map { |name| "//#{name}[@#{attribute}]" }).each do |node|
+           candidates = node[attribute].split(',').map(&:strip).map { |candidate| candidate.match(/^(?<url>.+?)(?<descriptor>\s+.+)?$/) }
+
+           node[attribute] = candidates.map { |candidate| "#{Absolutely.to_abs(base: resolved_base_url, relative: candidate[:url])}#{candidate[:descriptor]}" }.join(', ')
+         end
+       end
+
+       self
+     end
+
      # @return [String]
      def resolved_base_url
        @resolved_base_url ||= begin
-         return base_url unless base_element
-
-         Absolutely.to_abs(base: base_url, relative: base_element['href'])
+         if base_element
+           Absolutely.to_abs(base: base_url, relative: base_element['href'].strip)
+         else
+           base_url
+         end
        end
      end
    end
  end
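The `srcset` handling in `resolve_relative_urls` can be tried in isolation. The sketch below reuses the candidate regular expression from the diff but substitutes the standard library's `URI.join` for `Absolutely.to_abs`, so it runs without the gem. The helper name `resolve_srcset` and the example URLs are made up, and skipping empty candidates (for instance after a trailing comma) is an addition that the diff does not make. As in the diff, splitting on commas is a simplification that would mishandle URLs that themselves contain commas.

```ruby
require 'uri'

# Same pattern as the diff: a URL, then an optional descriptor such as
# " 2x" or " 480w" introduced by whitespace.
CANDIDATE_PATTERN = /^(?<url>.+?)(?<descriptor>\s+.+)?$/.freeze

# Hypothetical helper: resolve every URL in a srcset-style string against
# base_url while leaving the descriptors untouched.
def resolve_srcset(srcset, base_url)
  candidates = srcset.split(',').map(&:strip).reject(&:empty?)

  candidates.map do |candidate|
    match = candidate.match(CANDIDATE_PATTERN)

    # match[:descriptor] is nil when absent, which interpolates as "".
    "#{URI.join(base_url, match[:url])}#{match[:descriptor]}"
  end.join(', ')
end

puts resolve_srcset('/a.jpg 1x, b.jpg 2x, c.jpg', 'https://example.com/dir/page.html')
```

The lazy `.+?` in the pattern is what keeps the descriptor out of the URL group: the URL capture stops at the first run of whitespace that is followed by more text.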