micromicro 0.1.0 → 1.0.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: ae75d6afb6f7d98bc3a2fc12333ccbd31824e1cecd8336057e551fe6b392a643
-   data.tar.gz: 877da43669217907635d71015ec574ebc656e8755c8f6267229cebde36f281b0
+   metadata.gz: 6c5c3cb23e1c8338fef3d7cca36c47e4dcc834f3819525c01bd7ce6dab314971
+   data.tar.gz: 27794eaac80a6701c7910f4e31200a4bc999d0b8082b3ca1d2dd5a87a0bdaa82
  SHA512:
-   metadata.gz: a4b5a68c2d343fe0935d2103016aacf0b94d7a7c391d3ea360160a25fcb01bc16b7c70ae715e12b17142cedb928bea782b8057db5002df3e9ece2a99fdf1558c
-   data.tar.gz: 16a62f01b013d56547cebcac3001ecf59b77b3c14653f473a833d66589c6be31b5c53c2e6c547e0ce5c65f3dafb18feeb94be30ae0f57c65a96def024d6afccb
+   metadata.gz: d59f245021fb8bec36e1e319cffce3c3a6168ea763952efc8759bda7cb7a7bf13420632cdcfd2b988d72524a348f5a1238ff206cfd3aa75bfb5a57621f4f3b8c
+   data.tar.gz: 9355a25a3b65fe72828abf95d4dcec2829e46772904487b030a906cd95708f4bc946570f507bae326df2a3e8066e3de1c332799977cf8db749269e095bba413b
data/.simplecov CHANGED
@@ -2,9 +2,11 @@ require 'simplecov-console'

  formatters = [SimpleCov::Formatter::HTMLFormatter]

+ # rubocop:disable Style/IfUnlessModifier
  if RSpec.configuration.files_to_run.length > 1
    formatters << SimpleCov::Formatter::Console
  end
+ # rubocop:enable Style/IfUnlessModifier

  SimpleCov.start do
    formatter SimpleCov::Formatter::MultiFormatter.new(formatters)
@@ -1,5 +1,15 @@
  # Changelog

+ ## 1.0.0 / 2020-11-08
+
+ - Add `MicroMicro::Item#plain_text_properties` and `MicroMicro::Item#url_properties` methods (351e1f1)
+ - Add `MicroMicro::Collections::RelationshipsCollection#rels` and `MicroMicro::Collections::RelationshipsCollection#urls` methods(c0e5665)
+ - Add `MicroMicro::Collections::PropertiesCollection#names` and `MicroMicro::Collections::PropertiesCollection#values` methods (65486bc)
+ - Add `MicroMicro::Collections::ItemsCollection#types` method (6b53a81)
+ - Update absolutely dependency (4e67bb2)
+ - Add `Collectible` concern and refactor using Composite design pattern (82503b8)
+ - Update absolutely dependency (4578fb4)
+
  ## 0.1.0 / 2020-05-06

  - Initial release!
data/Gemfile CHANGED
@@ -6,9 +6,9 @@ gemspec
  gem 'pry-byebug', '~> 3.9'
  gem 'rake', '~> 13.0'
  gem 'reek', '~> 6.0'
- gem 'rspec', '~> 3.9'
- gem 'rubocop', '~> 0.82.0'
- gem 'rubocop-performance', '~> 1.5'
- gem 'rubocop-rspec', '~> 1.38'
- gem 'simplecov', '~> 0.18.5'
+ gem 'rspec', '~> 3.10'
+ gem 'rubocop', '~> 1.2'
+ gem 'rubocop-performance', '~> 1.8'
+ gem 'rubocop-rspec', '~> 2.0'
+ gem 'simplecov', '~> 0.19.1'
  gem 'simplecov-console', '~> 0.7.2'
data/README.md CHANGED
@@ -53,7 +53,7 @@ An example using a simple `String` of HTML as input:
  require 'micromicro'

  doc = MicroMicro.parse('<div class="h-card">Jason Garber</div>', 'https://sixtwothree.org')
- #=> #<MicroMicro::Document items: #<MicroMicro::Collections::ItemsCollection count: 1, members: [#<MicroMicro::Item types: ["h-card"], properties: 1, children: 0>]>, relations: #<MicroMicro::Collections::RelationsCollection count: 0, members: []>>
+ #=> #<MicroMicro::Document items: #<MicroMicro::Collections::ItemsCollection count: 1, members: [#<MicroMicro::Item types: ["h-card"], properties: 1, children: 0>]>, relationships: #<MicroMicro::Collections::RelationshipsCollection count: 0, members: []>>

  doc.to_h
  #=> { :items => [{ :type => ["h-card"], :properties => { :name => ["Jason Garber"] } }], :rels => {}, :"rel-urls" => {} }
@@ -71,7 +71,7 @@ url = "https://tantek.com"
  rsp = Net::HTTP.get(URI.parse(url))

  doc = MicroMicro.parse(rsp, url)
- #=> #<MicroMicro::Document items: #<MicroMicro::Collections::ItemsCollection count: 1, members: […]>, relations: #<MicroMicro::Collections::RelationsCollection count: 31, members: […]>>
+ #=> #<MicroMicro::Document items: #<MicroMicro::Collections::ItemsCollection count: 1, members: […]>, relationships: #<MicroMicro::Collections::RelationshipsCollection count: 31, members: […]>>

  doc.to_h
  #=> { :items => [{ :type => ["h-card"], :properties => {…}, :children => […]}], :rels => {…}, :'rel-urls' => {…} }
@@ -81,30 +81,65 @@ doc.to_h

  Building on the example above, a MicroMicro-parsed document is navigable and manipulable using a familiar `Enumerable`-esque interface.

+ #### Items
+
  ```ruby
  doc.items.first
  #=> #<MicroMicro::Item types: ["h-card"], properties: 42, children: 6>

+ # 🆕 in v1.0.0
+ doc.items.types
+ #=> ["h-card"]
+
+ doc.items.first.children
+ #=> #<MicroMicro::Collections::ItemsCollection count: 6, members: […]>
+ ```
+
+ #### Properties
+
+ ```ruby
  doc.items.first.properties
  #=> #<MicroMicro::Collections::PropertiesCollection count: 42, members: […]>

+ # 🆕 in v1.0.0
+ doc.items.first.plain_text_properties
+ #=> #<MicroMicro::Collections::PropertiesCollection count: 34, members: […]>
+
+ # 🆕 in v1.0.0
+ doc.items.first.url_properties
+ #=> #<MicroMicro::Collections::PropertiesCollection count: 11, members: […]>
+
+ # 🆕 in v1.0.0
+ doc.items.first.properties.names
+ #=> ["category", "name", "note", "org", "photo", "pronoun", "pronouns", "role", "uid", "url"]
+
+ # 🆕 in v1.0.0
+ doc.items.first.properties.values
+ #=> [{:value=>"https://tantek.com/photo.jpg", :alt=>""}, "https://tantek.com/", "Tantek Çelik", "Inventor, writer, teacher, runner, coder, more.", "Inventor", "writer", "teacher", "runner", "coder", …]
+
  doc.items.first.properties[7]
  #=> #<MicroMicro::Property name: "category", prefix: "p", value: "teacher">

  doc.items.first.properties.take(5).map { |property| [property.name, property.value] }
  #=> [["photo", { :value => "https://tantek.com/photo.jpg", :alt => "" }], ["url", "https://tantek.com/"], ["uid", "https://tantek.com/"], ["name", "Tantek Çelik"], ["role", "Inventor, writer, teacher, runner, coder, more."]]
+ ```

- doc.items.first.children
- #=> #<MicroMicro::Collections::ItemsCollection count: 6, members: […]>
+ #### Relationships

- doc.relations.first
- #=> #<MicroMicro::Relation href: "https://tantek.com/", rels: ["canonical"]>
+ ```ruby
+ doc.relationships.first
+ #=> #<MicroMicro::Relationship href: "https://tantek.com/", rels: ["canonical"]>

- doc.relations.map(&:rels).flatten.uniq.sort
+ # 🆕 in v1.0.0
+ doc.relationships.rels
  #=> ["alternate", "apple-touch-icon-precomposed", "author", "authorization_endpoint", "bookmark", "canonical", "hub", "icon", "me", "microsub", …]

- doc.relations.find { |relation| relation.rels.include?('webmention') }
- # => #<MicroMicro::Relation href: "https://webmention.io/tantek.com/webmention", rels: ["webmention"]>
+ # 🆕 in v1.0.0
+ doc.relationships.urls
+ #=> ["http://dribbble.com/tantek/", "http://last.fm/user/tantekc", "https://aperture.p3k.io/microsub/277", "https://en.wikipedia.org/wiki/User:Tantek", "https://github.com/tantek", "https://indieauth.com/auth", "https://indieauth.com/openid", "https://micro.blog/t", "https://pubsubhubbub.superfeedr.com/", "https://tantek.com/", …]
+
+ doc.relationships.find { |relationship| relationship.rels.include?('webmention') }
+ # => #<MicroMicro::Relationship href: "https://webmention.io/tantek.com/webmention", rels: ["webmention"]>
  ```

  ## Contributing
@@ -0,0 +1,13 @@
+ module MicroMicro
+   module Collectible
+     attr_accessor :collection
+
+     def next_all
+       collection.split(self).last
+     end
+
+     def prev_all
+       collection.split(self).first
+     end
+   end
+ end
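
Review note on the `Collectible` hunk above: `next_all` and `prev_all` rely on the collection responding to `split`, which appears to be ActiveSupport's `Array#split` (an assumption based on the delegation in `BaseCollection`). The sketch below shows the same sibling navigation in plain Ruby using `Array#take` and `Array#drop`. `SketchCollectible`, `SketchMember`, and `SketchList` are hypothetical names for illustration only, not classes from the gem.

```ruby
# Plain-Ruby sketch of sibling navigation through a back-reference.
# All names here are hypothetical stand-ins, not the gem's classes.
module SketchCollectible
  attr_accessor :collection

  # Members that come after this one in the owning collection.
  def next_all
    siblings = collection.to_a
    siblings.drop(siblings.index(self) + 1)
  end

  # Members that come before this one in the owning collection.
  def prev_all
    siblings = collection.to_a
    siblings.take(siblings.index(self))
  end
end

class SketchMember
  include SketchCollectible

  attr_reader :name

  def initialize(name)
    @name = name
  end
end

class SketchList
  include Enumerable

  def initialize(members = [])
    @members = []
    members.each { |member| push(member) }
  end

  # Store the member and hand it a reference to its owner.
  def push(member)
    @members << member
    member.collection = self
  end

  def each(&block)
    @members.each(&block)
  end
end

a, b, c = %w[a b c].map { |name| SketchMember.new(name) }
SketchList.new([a, b, c])

b.prev_all.map(&:name) # => ["a"]
b.next_all.map(&:name) # => ["c"]
```

The first member gets an empty `prev_all` and the last member an empty `next_all`, which matches what splitting an array on a boundary element would give.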
@@ -1,15 +1,15 @@
  module MicroMicro
    module Collections
      class BaseCollection
+       extend Forwardable
+
        include Enumerable

-       delegate :[], :each, :last, :length, :split, to: :members
+       def_delegators :members, :[], :each, :last, :length, :split

-       # @param members [Array<MicroMicro::Item, MicroMicro::Property, MicroMicro::Relation>]
+       # @param members [Array<MicroMicro::Item, MicroMicro::Property, MicroMicro::Relationship>]
        def initialize(members = [])
-         @members = members
-
-         decorate_members if respond_to?(:decorate_members, true)
+         members.each { |member| push(member) }
        end

        # @return [String]
@@ -17,21 +17,20 @@ module MicroMicro
        format(%(#<#{self.class.name}:%#0x count: #{count}, members: #{members.inspect}>), object_id)
      end

-     # @param member [MicroMicro::Item, MicroMicro::Property, MicroMicro::Relation]
-     # @return [self]
+     # @param member [MicroMicro::Item, MicroMicro::Property, MicroMicro::Relationship]
      def push(member)
-       members.push(member)
+       members << member

-       decorate_members if respond_to?(:decorate_members, true)
-
-       self
+       member.collection = self
      end

      alias << push

      private

-     attr_reader :members
+     def members
+       @members ||= []
+     end
    end
  end
end
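
Review note on the `BaseCollection` hunks above: the change swaps ActiveSupport-style `delegate ... to:` for the standard library's `Forwardable`, and routes every member through `push` so each one receives a back-reference to its collection. The following is a minimal sketch of that shape, not the gem's implementation. `TrackedMember` and `SketchCollection` are hypothetical names, and the sketch delegates to the `@members` instance variable rather than a private reader to keep it short.

```ruby
require 'forwardable'

# Hypothetical member type that can remember which collection owns it.
TrackedMember = Struct.new(:name) do
  attr_accessor :collection
end

# Hypothetical collection: forwards Array-like calls to an internal array.
class SketchCollection
  extend Forwardable
  include Enumerable

  def_delegators :@members, :[], :each, :last, :length

  def initialize(members = [])
    @members = []
    members.each { |member| push(member) }
  end

  # Append the member, then point it back at this collection.
  def push(member)
    @members << member
    member.collection = self
  end

  alias << push
end

collection = SketchCollection.new([TrackedMember.new('a'), TrackedMember.new('b')])
collection << TrackedMember.new('c')

collection.length      # => 3
collection.map(&:name) # => ["a", "b", "c"]
```

Because an assignment expression evaluates to its right-hand side, `push` in this sketch still evaluates to the collection even without an explicit `self` on the last line.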
@@ -5,6 +5,11 @@ module MicroMicro
        def to_a
          map(&:to_h)
        end
+
+       # @return [Array<String>]
+       def types
+         @types ||= map(&:types).flatten.uniq.sort
+       end
      end
    end
  end
@@ -1,17 +1,19 @@
  module MicroMicro
    module Collections
      class PropertiesCollection < BaseCollection
+       # @return [Array<String>]
+       def names
+         @names ||= map(&:name).uniq.sort
+       end
+
        # @return [Hash{Symbol => Array<String, Hash>}]
        def to_h
-         group_by(&:name).symbolize_keys.deep_transform_values do |property|
-           property.item_node? ? property.value.to_h : property.value
-         end
+         group_by(&:name).symbolize_keys.deep_transform_values(&:value)
        end

-       private
-
-       def decorate_members
-         each { |member| member.collection = self }
+       # @return [Array<String, Hash>]
+       def values
+         @values ||= map(&:value).uniq
        end
      end
    end
@@ -1,23 +1,32 @@
  module MicroMicro
    module Collections
-     class RelationsCollection < BaseCollection
-       # @see microformats2 Parsing Specification section 1.4
+     class RelationshipsCollection < BaseCollection
        # @see http://microformats.org/wiki/microformats2-parsing#parse_a_hyperlink_element_for_rel_microformats
        #
-       # @return [Hash{Symbole => Hash{Symbol => Array, String}}]
+       # @return [Hash{Symbol => Hash{Symbol => Array, String}}]
        def group_by_url
-         group_by(&:href).symbolize_keys.transform_values { |relations| relations.first.to_h.slice!(:href) }
+         group_by(&:href).symbolize_keys.transform_values { |relationships| relationships.first.to_h.slice!(:href) }
        end

-       # @see microformats2 Parsing Specification section 1.4
        # @see http://microformats.org/wiki/microformats2-parsing#parse_a_hyperlink_element_for_rel_microformats
        #
        # @return [Hash{Symbol => Array<String>}]
        def group_by_rel
+         # flat_map { |member| member.rels.map { |rel| [rel, member.href] } }.group_by(&:shift).symbolize_keys.transform_values(&:flatten).transform_values(&:uniq)
          each_with_object(Hash.new { |hash, key| hash[key] = [] }) do |member, hash|
            member.rels.each { |rel| hash[rel] << member.href }
          end.symbolize_keys.transform_values(&:uniq)
        end
+
+       # @return [Array<String>]
+       def rels
+         @rels ||= map(&:rels).flatten.uniq.sort
+       end
+
+       # @return [Array<String>]
+       def urls
+         @urls ||= map(&:href).uniq.sort
+       end
      end
    end
  end
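
Review note on the `RelationshipsCollection` hunk above: the new `rels` and `urls` helpers and the existing `group_by_rel` are all small transformations over `href`/`rels` pairs. The sketch below replays them on sample data in plain Ruby. `SketchRelationship` and the example URLs are hypothetical, and `transform_keys(&:to_sym)` stands in for ActiveSupport's `symbolize_keys`.

```ruby
# Hypothetical stand-in for a parsed relationship: one URL plus its rel values.
SketchRelationship = Struct.new(:href, :rels)

relationships = [
  SketchRelationship.new('https://example.com/', %w[me canonical]),
  SketchRelationship.new('https://example.com/feed', %w[alternate]),
  SketchRelationship.new('https://example.com/', %w[me])
]

# Unique, sorted rel values across all relationships.
rels = relationships.map(&:rels).flatten.uniq.sort
# => ["alternate", "canonical", "me"]

# Unique, sorted URLs across all relationships.
urls = relationships.map(&:href).uniq.sort
# => ["https://example.com/", "https://example.com/feed"]

# Each rel value mapped to the unique URLs that carry it.
by_rel = relationships.each_with_object(Hash.new { |hash, key| hash[key] = [] }) do |member, hash|
  member.rels.each { |rel| hash[rel] << member.href }
end.transform_keys(&:to_sym).transform_values(&:uniq)
# => { me: ["https://example.com/"], canonical: ["https://example.com/"], alternate: ["https://example.com/feed"] }
```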
@@ -1,57 +1,117 @@
  module MicroMicro
    class Document
-     # @param markup [String] the HTML to parse
-     # @param base_url [String] the URL associated with the provided markup
+     # A map of HTML `srcset` attributes and their associated element names
+     #
+     # @see https://html.spec.whatwg.org/#srcset-attributes
+     # @see https://html.spec.whatwg.org/#attributes-3
+     HTML_IMAGE_CANDIDATE_STRINGS_ATTRIBUTES_MAP = {
+       'imagesrcset' => %w[link],
+       'srcset' => %w[img source]
+     }.freeze
+
+     # A map of HTML URL attributes and their associated element names
+     #
+     # @see https://html.spec.whatwg.org/#attributes-3
+     HTML_URL_ATTRIBUTES_MAP = {
+       'action' => %w[form],
+       'cite' => %w[blockquote del ins q],
+       'data' => %w[object],
+       'formaction' => %w[button input],
+       'href' => %w[a area base link],
+       'manifest' => %w[html],
+       'ping' => %w[a area],
+       'poster' => %w[video],
+       'src' => %w[audio embed iframe img input script source track video]
+     }.freeze
+
+     # Parse a string of HTML for microformats2-encoded data.
+     #
+     # MicroMicro::Document.new('<a href="/" class="h-card" rel="me">Jason Garber</a>', 'https://sixtwothree.org')
+     #
+     # Or, pull the source HTML of a page on the Web:
+     #
+     # url = 'https://tantek.com'
+     # markup = Net::HTTP.get(URI.parse(url))
+     #
+     # doc = MicroMicro::Document.new(markup, url)
+     #
+     # @param markup [String] The HTML to parse for microformats2-encoded data.
+     # @param base_url [String] The URL associated with markup. Used for relative URL resolution.
      def initialize(markup, base_url)
        @markup = markup
        @base_url = base_url
+
+       resolve_relative_urls
      end

      # @return [String]
      def inspect
-       format(%(#<#{self.class.name}:%#0x items: #{items.inspect}, relations: #{relations.inspect}>), object_id)
+       format(%(#<#{self.class.name}:%#0x items: #{items.inspect}, relationships: #{relationships.inspect}>), object_id)
      end

+     # A collection of items parsed from the provided markup.
+     #
      # @return [MicroMicro::Collections::ItemsCollection]
      def items
        @items ||= Collections::ItemsCollection.new(Item.items_from(document))
      end

-     # @return [MicroMicro::Collections::RelationsCollection]
-     def relations
-       @relations ||= Collections::RelationsCollection.new(Relation.relations_from(document))
+     # A collection of relationships parsed from the provided markup.
+     #
+     # @return [MicroMicro::Collections::RelationshipsCollection]
+     def relationships
+       @relationships ||= Collections::RelationshipsCollection.new(Relationship.relationships_from(document))
      end

-     # @see microformats2 Parsing Specification section 1.1
+     # Return the parsed document as a Hash.
+     #
      # @see http://microformats.org/wiki/microformats2-parsing#parse_a_document_for_microformats
      #
-     # @return [Hash]
+     # @return [Hash{Symbol => Array, Hash}]
      def to_h
        {
          items: items.to_a,
-         rels: relations.group_by_rel,
-         'rel-urls': relations.group_by_url
+         rels: relationships.group_by_rel,
+         'rel-urls': relationships.group_by_url
        }
      end

+     # Ignore this node?
+     #
      # @param node [Nokogiri::XML::Element]
      # @return [Boolean]
      def self.ignore_node?(node)
        ignored_node_names.include?(node.name)
      end

+     # A list of HTML element names the parser should ignore.
+     #
      # @return [Array<String>]
      def self.ignored_node_names
        %w[script style template]
      end

+     # @see http://microformats.org/wiki/microformats2-parsing#parse_an_element_for_properties
+     # @see http://microformats.org/wiki/microformats2-parsing#parsing_for_implied_properties
+     #
+     # @param context [Nokogiri::HTML::Document, Nokogiri::XML::NodeSet, Nokogiri::XML::Element]
+     # @yield [context]
+     # @return [String]
+     def self.text_content_from(context)
+       context.css(*ignored_node_names).unlink
+
+       yield(context) if block_given?
+
+       context.text.strip
+     end
+
      private

      attr_reader :base_url, :markup

      # @return [Nokogiri::XML::Element, nil]
      def base_element
-       @base_element ||= Nokogiri::HTML(markup).at_css('base[href]')
+       @base_element ||= Nokogiri::HTML(markup).at('//base[@href]')
      end

      # @return [Nokogiri::HTML::Document]
@@ -59,12 +119,32 @@ module MicroMicro
        @document ||= Nokogiri::HTML(markup, resolved_base_url)
      end

+     def resolve_relative_urls
+       HTML_URL_ATTRIBUTES_MAP.each do |attribute, names|
+         document.xpath(*names.map { |name| "//#{name}[@#{attribute}]" }).each do |node|
+           node[attribute] = Absolutely.to_abs(base: resolved_base_url, relative: node[attribute].strip)
+         end
+       end
+
+       HTML_IMAGE_CANDIDATE_STRINGS_ATTRIBUTES_MAP.each do |attribute, names|
+         document.xpath(*names.map { |name| "//#{name}[@#{attribute}]" }).each do |node|
+           candidates = node[attribute].split(',').map(&:strip).map { |candidate| candidate.match(/^(?<url>.+?)(?<descriptor>\s+.+)?$/) }
+
+           node[attribute] = candidates.map { |candidate| "#{Absolutely.to_abs(base: resolved_base_url, relative: candidate[:url])}#{candidate[:descriptor]}" }.join(', ')
+         end
+       end
+
+       self
+     end
+
      # @return [String]
      def resolved_base_url
        @resolved_base_url ||= begin
-         return base_url unless base_element
-
-         Absolutely.to_abs(base: base_url, relative: base_element['href'])
+         if base_element
+           Absolutely.to_abs(base: base_url, relative: base_element['href'].strip)
+         else
+           base_url
+         end
        end
      end
    end
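
Review note on `resolve_relative_urls` above: the `srcset` branch splits the attribute on commas, separates each candidate into a URL and an optional descriptor, resolves the URL, and joins the candidates back together. The sketch below reproduces that flow with the standard library only. `resolve_srcset` is a hypothetical helper name, `URI.join` is swapped in for `Absolutely.to_abs`, and the `reject(&:empty?)` guard against empty candidates is an addition that the diff does not contain.

```ruby
require 'uri'

# Hypothetical helper: resolve every URL in a srcset-style string against base_url.
# Uses URI.join in place of the gem's Absolutely.to_abs (an assumption for this sketch).
def resolve_srcset(srcset, base_url)
  # Split into image candidate strings, dropping empties such as a trailing comma.
  candidates = srcset.split(',').map(&:strip).reject(&:empty?)

  candidates.map do |candidate|
    # Same pattern as the diff: a URL, then an optional descriptor like " 2x" or " 480w".
    match = candidate.match(/^(?<url>.+?)(?<descriptor>\s+.+)?$/)

    "#{URI.join(base_url, match[:url])}#{match[:descriptor]}"
  end.join(', ')
end

resolve_srcset('/a.jpg 1x, b.jpg 2x, c.jpg', 'https://example.com/photos/')
# => "https://example.com/a.jpg 1x, https://example.com/photos/b.jpg 2x, https://example.com/photos/c.jpg"
```

Splitting on commas is a simplification shared with the diff: a URL that itself contains a comma would be cut into separate candidates.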