micromicro 1.0.0 → 2.0.1
- checksums.yaml +4 -4
- data/CHANGELOG.md +33 -0
- data/CONTRIBUTING.md +2 -2
- data/README.md +21 -20
- data/lib/micro_micro/collectible.rb +2 -0
- data/lib/micro_micro/collections/base_collection.rb +7 -1
- data/lib/micro_micro/collections/items_collection.rb +3 -1
- data/lib/micro_micro/collections/properties_collection.rb +12 -0
- data/lib/micro_micro/collections/relationships_collection.rb +11 -10
- data/lib/micro_micro/document.rb +11 -99
- data/lib/micro_micro/helpers.rb +88 -0
- data/lib/micro_micro/implied_property.rb +2 -0
- data/lib/micro_micro/item.rb +57 -62
- data/lib/micro_micro/parsers/base_implied_property_parser.rb +29 -0
- data/lib/micro_micro/parsers/base_property_parser.rb +6 -14
- data/lib/micro_micro/parsers/date_time_parser.rb +60 -25
- data/lib/micro_micro/parsers/date_time_property_parser.rb +10 -9
- data/lib/micro_micro/parsers/embedded_markup_property_parser.rb +4 -3
- data/lib/micro_micro/parsers/implied_name_property_parser.rb +15 -17
- data/lib/micro_micro/parsers/implied_photo_property_parser.rb +21 -45
- data/lib/micro_micro/parsers/implied_url_property_parser.rb +12 -31
- data/lib/micro_micro/parsers/plain_text_property_parser.rb +4 -2
- data/lib/micro_micro/parsers/url_property_parser.rb +22 -14
- data/lib/micro_micro/parsers/value_class_pattern_parser.rb +29 -44
- data/lib/micro_micro/property.rb +68 -56
- data/lib/micro_micro/relationship.rb +15 -13
- data/lib/micro_micro/version.rb +3 -1
- data/lib/micromicro.rb +31 -26
- data/micromicro.gemspec +14 -9
- metadata +23 -32
- data/.editorconfig +0 -14
- data/.gitignore +0 -34
- data/.gitmodules +0 -3
- data/.reek.yml +0 -8
- data/.rspec +0 -2
- data/.rubocop +0 -3
- data/.rubocop.yml +0 -25
- data/.ruby-version +0 -1
- data/.simplecov +0 -13
- data/.travis.yml +0 -19
- data/Gemfile +0 -14
- data/Rakefile +0 -18
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA256:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: d9c965d277e7c87c68de6ce7a12aff28da257d813573de2519bfe36943df1f4f
|
4
|
+
data.tar.gz: a62d02a8fe962b060c36a9c615445f586fe76ea1c570d71f05175d140d631d32
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: 8ee19e4ca072c36137746dd54e435405a9a749761c9b735d75ff9f97a5c28f8e5367f79c3a7ad45c83039eb8cb54f1043cac2dd7ddcd1bfd2d4fbbad0d218ad6
|
7
|
+
data.tar.gz: f5f88370b3fc92e2d467c8e358798c6266b5fb9dbfe50fb13e74b2235572e9adfbfeb4f171b1df3d5bf474cc549bac7adfe3fa7c2a59b6830d2bacc3cab3d500
|
data/CHANGELOG.md
CHANGED
@@ -1,5 +1,38 @@
|
|
1
1
|
# Changelog
|
2
2
|
|
3
|
+
## 2.0.1 / 2022-08-20
|
4
|
+
|
5
|
+
- Use ruby/debug instead of pry-byebug (2965b2e)
|
6
|
+
- Update nokogiri-html-ext to v0.2.2 (921c486)
|
7
|
+
- Include root items with property class names (dd14212)
|
8
|
+
|
9
|
+
## 2.0.0 / 2022-08-12
|
10
|
+
|
11
|
+
- Refactor implied property parsers (203fec9)
|
12
|
+
- Add `Helpers` module (caa1c02)
|
13
|
+
- New `PropertiesCollection` and `Property` instance methods (e9bb38b):
|
14
|
+
- `PropertiesCollection#plain_text_properties`
|
15
|
+
- `PropertiesCollection#url_properties`
|
16
|
+
- `Property#date_time_property?`
|
17
|
+
- `Property#embedded_markup_property?`
|
18
|
+
- `Property#plain_text_property?`
|
19
|
+
- `Property#url_property?`
|
20
|
+
- Remove Addressable (66c2bb4)
|
21
|
+
- Refactor classes to use nokogiri-html-ext (33fdf4a)
|
22
|
+
- Update activesupport (563bf56)
|
23
|
+
- **Breaking change:** Set minimum supported Ruby to 2.7 (ba17d05)
|
24
|
+
- Update development Ruby to 2.7.6 (ba17d05)
|
25
|
+
- Remove Reek (c1e76c5)
|
26
|
+
- Update runtime dependency version constraints (f83f26a)
|
27
|
+
- ~~**Breaking change:** Set minimum supported Ruby to 2.6~~ (fc588cd)
|
28
|
+
- ~~Update development Ruby to 2.6.10~~ (d05a2ac)
|
29
|
+
|
30
|
+
## 1.1.0 / 2021-06-10
|
31
|
+
|
32
|
+
- Replace Absolutely dependency with Addressable (e93721b)
|
33
|
+
- Add support for Ruby 3.0 (d897c54)
|
34
|
+
- Update development Ruby version to 2.6.10 (051c9ad)
|
35
|
+
|
3
36
|
## 1.0.0 / 2020-11-08
|
4
37
|
|
5
38
|
- Add `MicroMicro::Item#plain_text_properties` and `MicroMicro::Item#url_properties` methods (351e1f1)
|
data/CONTRIBUTING.md
CHANGED
@@ -8,9 +8,9 @@ There are a couple ways you can help improve MicroMicro:
|
|
8
8
|
|
9
9
|
## Getting Started
|
10
10
|
|
11
|
-
MicroMicro is developed using Ruby 2.
|
11
|
+
MicroMicro is developed using Ruby 2.7.6 and is additionally tested against Ruby 3.0 and 3.1 using [GitHub Actions](https://github.com/jgarber623/micromicro/actions).
|
12
12
|
|
13
|
-
Before making changes to MicroMicro, you'll want to install Ruby 2.
|
13
|
+
Before making changes to MicroMicro, you'll want to install Ruby 2.7.6. It's recommended that you use a Ruby version management tool like [rbenv](https://github.com/rbenv/rbenv), [chruby](https://github.com/postmodern/chruby), or [rvm](https://github.com/rvm/rvm). Once you've installed Ruby 2.7.6 using your method of choice, install the project's gems by running:
|
14
14
|
|
15
15
|
```sh
|
16
16
|
bundle install
|
data/README.md
CHANGED
@@ -1,42 +1,43 @@
|
|
1
1
|
# MicroMicro
|
2
2
|
|
3
|
-
**A Ruby gem for extracting [microformats2](
|
3
|
+
**A Ruby gem for extracting [microformats2](https://microformats.org/wiki/microformats2)-encoded data from HTML documents.**
|
4
4
|
|
5
|
-
[![
|
6
|
-
[![
|
7
|
-
[![
|
8
|
-
[![
|
5
|
+
[![Gem](https://img.shields.io/gem/v/micromicro.svg?logo=rubygems&style=for-the-badge)](https://rubygems.org/gems/micromicro)
|
6
|
+
[![Downloads](https://img.shields.io/gem/dt/micromicro.svg?logo=rubygems&style=for-the-badge)](https://rubygems.org/gems/micromicro)
|
7
|
+
[![Build](https://img.shields.io/github/workflow/status/jgarber623/micromicro/CI?logo=github&style=for-the-badge)](https://github.com/jgarber623/micromicro/actions/workflows/ci.yml)
|
8
|
+
[![Maintainability](https://img.shields.io/codeclimate/maintainability/jgarber623/micromicro.svg?logo=code-climate&style=for-the-badge)](https://codeclimate.com/github/jgarber623/micromicro)
|
9
|
+
[![Coverage](https://img.shields.io/codeclimate/c/jgarber623/micromicro.svg?logo=code-climate&style=for-the-badge)](https://codeclimate.com/github/jgarber623/micromicro/code)
|
9
10
|
|
10
11
|
## Key Features
|
11
12
|
|
12
|
-
- Parses microformats2-encoded HTML documents according to the [microformats2 parsing specification](
|
13
|
+
- Parses microformats2-encoded HTML documents according to the [microformats2 parsing specification](https://microformats.org/wiki/microformats2-parsing)
|
13
14
|
- Passes all microformats2 tests from [the official test suite](https://github.com/microformats/tests)¹
|
14
|
-
- Supports Ruby 2.
|
15
|
+
- Supports Ruby 2.7 and newer
|
15
16
|
|
16
|
-
**Note:** MicroMicro **does not** parse [Classic Microformats](
|
17
|
+
**Note:** MicroMicro **does not** parse [Classic Microformats](https://microformats.org/wiki/Main_Page#Classic_Microformats) (referred to in [the parsing specification](https://microformats.org/wiki/microformats2-parsing#note_backward_compatibility_details) as "backcompat root classes" and "backcompat properties" and in vocabulary specifications in the "Parser Compatibility" sections [e.g. [h-entry](https://microformats.org/wiki/h-entry#Parser_Compatibility)]). To parse documents marked up with Classic Microformats, consider using [the official microformats-ruby parser](https://github.com/microformats/microformats-ruby).
|
17
18
|
|
18
19
|
<small>¹ …with some exceptions until [this pull request](https://github.com/microformats/tests/pull/112) is merged.</small>
|
19
20
|
|
20
21
|
## Getting Started
|
21
22
|
|
22
|
-
Before installing and using MicroMicro, you'll want to have [Ruby](https://www.ruby-lang.org) 2.
|
23
|
+
Before installing and using MicroMicro, you'll want to have [Ruby](https://www.ruby-lang.org) 2.7 (or newer) installed. It's recommended that you use a Ruby version management tool like [rbenv](https://github.com/rbenv/rbenv), [chruby](https://github.com/postmodern/chruby), or [rvm](https://github.com/rvm/rvm).
|
23
24
|
|
24
|
-
MicroMicro is developed using Ruby 2.
|
25
|
+
MicroMicro is developed using Ruby 2.7.6 and is additionally tested against Ruby 3.0 and 3.1 using [GitHub Actions](https://github.com/jgarber623/micromicro/actions).
|
25
26
|
|
26
27
|
## Installation
|
27
28
|
|
28
|
-
If you're using [Bundler](https://bundler.io), add MicroMicro to your project's
|
29
|
+
If you're using [Bundler](https://bundler.io) to manage gem dependencies, add MicroMicro to your project's Gemfile:
|
29
30
|
|
30
31
|
```ruby
|
31
|
-
source 'https://rubygems.org'
|
32
|
-
|
33
32
|
gem 'micromicro'
|
34
33
|
```
|
35
34
|
|
36
|
-
…and
|
35
|
+
…and run `bundle install` in your shell.
|
36
|
+
|
37
|
+
To install the gem manually, run the following in your shell:
|
37
38
|
|
38
39
|
```sh
|
39
|
-
|
40
|
+
gem install micromicro
|
40
41
|
```
|
41
42
|
|
42
43
|
## Usage
|
@@ -64,10 +65,10 @@ The `Hash` produced by calling `doc.to_h` may be converted to JSON (e.g. `doc.to
|
|
64
65
|
Another example pulling the source HTML from [Tantek](https://tantek.com)'s website:
|
65
66
|
|
66
67
|
```ruby
|
67
|
-
require
|
68
|
-
require
|
68
|
+
require 'net/http'
|
69
|
+
require 'micromicro'
|
69
70
|
|
70
|
-
url =
|
71
|
+
url = 'https://tantek.com'
|
71
72
|
rsp = Net::HTTP.get(URI.parse(url))
|
72
73
|
|
73
74
|
doc = MicroMicro.parse(rsp, url)
|
@@ -144,11 +145,11 @@ doc.relationships.find { |relationship| relationship.rels.include?('webmention')
|
|
144
145
|
|
145
146
|
## Contributing
|
146
147
|
|
147
|
-
Interested in helping improve MicroMicro? Awesome! Your help is greatly appreciated. See [CONTRIBUTING.md](https://github.com/jgarber623/micromicro/blob/
|
148
|
+
Interested in helping improve MicroMicro? Awesome! Your help is greatly appreciated. See [CONTRIBUTING.md](https://github.com/jgarber623/micromicro/blob/main/CONTRIBUTING.md) for details.
|
148
149
|
|
149
150
|
## Acknowledgments
|
150
151
|
|
151
|
-
MicroMicro wouldn't exist without the hard work of everyone involved in the [microformats](
|
152
|
+
MicroMicro wouldn't exist without the hard work of everyone involved in the [microformats](https://microformats.org) community. Additionally, the comprehensive [microformats test suite](https://github.com/microformats/tests) was invaluable in the development of this Ruby gem.
|
152
153
|
|
153
154
|
MicroMicro is written and maintained by [Jason Garber](https://sixtwothree.org).
|
154
155
|
|
data/lib/micro_micro/collections/base_collection.rb
CHANGED
@@ -1,3 +1,5 @@
|
|
1
|
+
# frozen_string_literal: true
|
2
|
+
|
1
3
|
module MicroMicro
|
2
4
|
module Collections
|
3
5
|
class BaseCollection
|
@@ -12,10 +14,14 @@ module MicroMicro
|
|
12
14
|
members.each { |member| push(member) }
|
13
15
|
end
|
14
16
|
|
17
|
+
# :nocov:
|
15
18
|
# @return [String]
|
16
19
|
def inspect
|
17
|
-
|
20
|
+
"#<#{self.class}:#{format('%#0x', object_id)} " \
|
21
|
+
"count: #{count}, " \
|
22
|
+
"members: #{members.inspect}>"
|
18
23
|
end
|
24
|
+
# :nocov:
|
19
25
|
|
20
26
|
# @param member [MicroMicro::Item, MicroMicro::Property, MicroMicro::Relationship]
|
21
27
|
def push(member)
|
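The reworked `inspect` above builds its output from adjacent string literals joined with a trailing backslash and renders the object ID in hexadecimal via `format('%#0x', object_id)`. A minimal sketch of that layout follows; `SketchCollection` is a hypothetical stand-in written for illustration and is not part of MicroMicro.

```ruby
# Hypothetical stand-in for a collection class; not part of MicroMicro.
class SketchCollection
  include Enumerable

  def initialize(members = [])
    @members = members
  end

  def each(&block)
    @members.each(&block)
  end

  # Adjacent string literals joined by a trailing backslash form a
  # single string, which keeps each line of the method short.
  def inspect
    "#<#{self.class}:#{format('%#0x', object_id)} " \
      "count: #{count}, " \
      "members: #{@members.inspect}>"
  end
end

collection = SketchCollection.new(%w[a b])
puts collection.inspect
```

The hexadecimal portion varies between runs because it is derived from `object_id`.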
data/lib/micro_micro/collections/items_collection.rb
CHANGED
@@ -1,3 +1,5 @@
|
|
1
|
+
# frozen_string_literal: true
|
2
|
+
|
1
3
|
module MicroMicro
|
2
4
|
module Collections
|
3
5
|
class ItemsCollection < BaseCollection
|
@@ -8,7 +10,7 @@ module MicroMicro
|
|
8
10
|
|
9
11
|
# @return [Array<String>]
|
10
12
|
def types
|
11
|
-
@types ||=
|
13
|
+
@types ||= flat_map(&:types).uniq.sort
|
12
14
|
end
|
13
15
|
end
|
14
16
|
end
|
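The new `types` body above chains `flat_map(&:types).uniq.sort`. The sketch below shows what that chain does with plain Ruby objects; `SketchItem` is a hypothetical `Struct` used only for illustration, not the gem's `Item` class.

```ruby
# Hypothetical stand-in for an item that exposes a list of root types.
SketchItem = Struct.new(:types)

items = [
  SketchItem.new(%w[h-entry h-card]),
  SketchItem.new(%w[h-card]),
  SketchItem.new(%w[h-adr])
]

# flat_map merges every item's types into one flat array; uniq and sort
# then drop duplicates and order the names alphabetically.
types = items.flat_map(&:types).uniq.sort

p types
```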
data/lib/micro_micro/collections/properties_collection.rb
CHANGED
@@ -1,3 +1,5 @@
|
|
1
|
+
# frozen_string_literal: true
|
2
|
+
|
1
3
|
module MicroMicro
|
2
4
|
module Collections
|
3
5
|
class PropertiesCollection < BaseCollection
|
@@ -6,11 +8,21 @@ module MicroMicro
|
|
6
8
|
@names ||= map(&:name).uniq.sort
|
7
9
|
end
|
8
10
|
|
11
|
+
# @return [MicroMicro::Collections::PropertiesCollection]
|
12
|
+
def plain_text_properties
|
13
|
+
self.class.new(select(&:plain_text_property?))
|
14
|
+
end
|
15
|
+
|
9
16
|
# @return [Hash{Symbol => Array<String, Hash>}]
|
10
17
|
def to_h
|
11
18
|
group_by(&:name).symbolize_keys.deep_transform_values(&:value)
|
12
19
|
end
|
13
20
|
|
21
|
+
# @return [MicroMicro::Collections::PropertiesCollection]
|
22
|
+
def url_properties
|
23
|
+
self.class.new(select(&:url_property?))
|
24
|
+
end
|
25
|
+
|
14
26
|
# @return [Array<String, Hash>]
|
15
27
|
def values
|
16
28
|
@values ||= map(&:value).uniq
|
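The two methods added above wrap a filtered selection in `self.class.new`, so the result is another collection rather than a bare array. A rough sketch of that pattern follows. `SketchProperty` and `SketchProperties` are hypothetical stand-ins, and the prefix-based predicates are an assumption made for illustration; the diff shown here does not reveal how the real `Property` predicates are implemented.

```ruby
# Hypothetical stand-ins; the real classes live in the MicroMicro gem.
# The prefix check below is an assumption for illustration only.
SketchProperty = Struct.new(:name, :prefix) do
  def plain_text_property?
    prefix == 'p'
  end

  def url_property?
    prefix == 'u'
  end
end

class SketchProperties
  include Enumerable

  def initialize(members = [])
    @members = members
  end

  def each(&block)
    @members.each(&block)
  end

  # Returning self.class.new keeps the filtered result a collection,
  # so further collection methods can be chained onto it.
  def plain_text_properties
    self.class.new(select(&:plain_text_property?))
  end

  def url_properties
    self.class.new(select(&:url_property?))
  end
end

properties = SketchProperties.new([
  SketchProperty.new('name', 'p'),
  SketchProperty.new('url', 'u'),
  SketchProperty.new('photo', 'u')
])

p properties.url_properties.map(&:name)
p properties.plain_text_properties.map(&:name)
```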
data/lib/micro_micro/collections/relationships_collection.rb
CHANGED
@@ -1,26 +1,27 @@
|
|
1
|
+
# frozen_string_literal: true
|
2
|
+
|
1
3
|
module MicroMicro
|
2
4
|
module Collections
|
3
5
|
class RelationshipsCollection < BaseCollection
|
4
|
-
# @see
|
5
|
-
#
|
6
|
-
# @return [Hash{Symbol => Hash{Symbol => Array, String}}]
|
7
|
-
def group_by_url
|
8
|
-
group_by(&:href).symbolize_keys.transform_values { |relationships| relationships.first.to_h.slice!(:href) }
|
9
|
-
end
|
10
|
-
|
11
|
-
# @see http://microformats.org/wiki/microformats2-parsing#parse_a_hyperlink_element_for_rel_microformats
|
6
|
+
# @see https://microformats.org/wiki/microformats2-parsing#parse_a_hyperlink_element_for_rel_microformats
|
12
7
|
#
|
13
8
|
# @return [Hash{Symbol => Array<String>}]
|
14
9
|
def group_by_rel
|
15
|
-
# flat_map { |member| member.rels.map { |rel| [rel, member.href] } }.group_by(&:shift).symbolize_keys.transform_values(&:flatten).transform_values(&:uniq)
|
16
10
|
each_with_object(Hash.new { |hash, key| hash[key] = [] }) do |member, hash|
|
17
11
|
member.rels.each { |rel| hash[rel] << member.href }
|
18
12
|
end.symbolize_keys.transform_values(&:uniq)
|
19
13
|
end
|
20
14
|
|
15
|
+
# @see https://microformats.org/wiki/microformats2-parsing#parse_a_hyperlink_element_for_rel_microformats
|
16
|
+
#
|
17
|
+
# @return [Hash{Symbol => Hash{Symbol => Array, String}}]
|
18
|
+
def group_by_url
|
19
|
+
group_by(&:href).symbolize_keys.transform_values { |relationships| relationships.first.to_h.slice!(:href) }
|
20
|
+
end
|
21
|
+
|
21
22
|
# @return [Array<String>]
|
22
23
|
def rels
|
23
|
-
@rels ||=
|
24
|
+
@rels ||= flat_map(&:rels).uniq.sort
|
24
25
|
end
|
25
26
|
|
26
27
|
# @return [Array<String>]
|
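`group_by_rel` above accumulates URLs per `rel` value using a `Hash` whose default block creates an empty array on first access. The sketch below reproduces that accumulation in plain Ruby. `SketchRelationship` is a hypothetical `Struct`, and `transform_keys(&:to_sym)` is swapped in for ActiveSupport's `symbolize_keys` so the example runs without the gem's dependencies.

```ruby
# Hypothetical stand-in for a relationship with an href and rel values.
SketchRelationship = Struct.new(:href, :rels)

relationships = [
  SketchRelationship.new('https://example.com/a', %w[me authn]),
  SketchRelationship.new('https://example.com/b', %w[me]),
  SketchRelationship.new('https://example.com/a', %w[me])
]

# The default block stores an empty array the first time a key is read,
# so each rel can be appended to without an explicit existence check.
grouped = relationships.each_with_object(Hash.new { |hash, key| hash[key] = [] }) do |member, hash|
  member.rels.each { |rel| hash[rel] << member.href }
end

# transform_keys(&:to_sym) stands in for ActiveSupport's symbolize_keys.
by_rel = grouped.transform_keys(&:to_sym).transform_values(&:uniq)

p by_rel
```

The final `uniq` matters here because the same URL appears twice under `me`.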
data/lib/micro_micro/document.rb
CHANGED
@@ -1,29 +1,7 @@
|
|
1
|
+
# frozen_string_literal: true
|
2
|
+
|
1
3
|
module MicroMicro
|
2
4
|
class Document
|
3
|
-
# A map of HTML `srcset` attributes and their associated element names
|
4
|
-
#
|
5
|
-
# @see https://html.spec.whatwg.org/#srcset-attributes
|
6
|
-
# @see https://html.spec.whatwg.org/#attributes-3
|
7
|
-
HTML_IMAGE_CANDIDATE_STRINGS_ATTRIBUTES_MAP = {
|
8
|
-
'imagesrcset' => %w[link],
|
9
|
-
'srcset' => %w[img source]
|
10
|
-
}.freeze
|
11
|
-
|
12
|
-
# A map of HTML URL attributes and their associated element names
|
13
|
-
#
|
14
|
-
# @see https://html.spec.whatwg.org/#attributes-3
|
15
|
-
HTML_URL_ATTRIBUTES_MAP = {
|
16
|
-
'action' => %w[form],
|
17
|
-
'cite' => %w[blockquote del ins q],
|
18
|
-
'data' => %w[object],
|
19
|
-
'formaction' => %w[button input],
|
20
|
-
'href' => %w[a area base link],
|
21
|
-
'manifest' => %w[html],
|
22
|
-
'ping' => %w[a area],
|
23
|
-
'poster' => %w[video],
|
24
|
-
'src' => %w[audio embed iframe img input script source track video]
|
25
|
-
}.freeze
|
26
|
-
|
27
5
|
# Parse a string of HTML for microformats2-encoded data.
|
28
6
|
#
|
29
7
|
# MicroMicro::Document.new('<a href="/" class="h-card" rel="me">Jason Garber</a>', 'https://sixtwothree.org')
|
@@ -38,22 +16,23 @@ module MicroMicro
|
|
38
16
|
# @param markup [String] The HTML to parse for microformats2-encoded data.
|
39
17
|
# @param base_url [String] The URL associated with markup. Used for relative URL resolution.
|
40
18
|
def initialize(markup, base_url)
|
41
|
-
@
|
42
|
-
@base_url = base_url
|
43
|
-
|
44
|
-
resolve_relative_urls
|
19
|
+
@document = Nokogiri::HTML(markup, base_url).resolve_relative_urls!
|
45
20
|
end
|
46
21
|
|
22
|
+
# :nocov:
|
47
23
|
# @return [String]
|
48
24
|
def inspect
|
49
|
-
|
25
|
+
"#<#{self.class}:#{format('%#0x', object_id)} " \
|
26
|
+
"items: #{items.inspect}, " \
|
27
|
+
"relationships: #{relationships.inspect}>"
|
50
28
|
end
|
29
|
+
# :nocov:
|
51
30
|
|
52
31
|
# A collection of items parsed from the provided markup.
|
53
32
|
#
|
54
33
|
# @return [MicroMicro::Collections::ItemsCollection]
|
55
34
|
def items
|
56
|
-
@items ||= Collections::ItemsCollection.new(Item.
|
35
|
+
@items ||= Collections::ItemsCollection.new(Item.from_context(document.element_children))
|
57
36
|
end
|
58
37
|
|
59
38
|
# A collection of relationships parsed from the provided markup.
|
@@ -65,7 +44,7 @@ module MicroMicro
|
|
65
44
|
|
66
45
|
# Return the parsed document as a Hash.
|
67
46
|
#
|
68
|
-
# @see
|
47
|
+
# @see https://microformats.org/wiki/microformats2-parsing#parse_a_document_for_microformats
|
69
48
|
#
|
70
49
|
# @return [Hash{Symbol => Array, Hash}]
|
71
50
|
def to_h
|
@@ -76,76 +55,9 @@ module MicroMicro
|
|
76
55
|
}
|
77
56
|
end
|
78
57
|
|
79
|
-
# Ignore this node?
|
80
|
-
#
|
81
|
-
# @param node [Nokogiri::XML::Element]
|
82
|
-
# @return [Boolean]
|
83
|
-
def self.ignore_node?(node)
|
84
|
-
ignored_node_names.include?(node.name)
|
85
|
-
end
|
86
|
-
|
87
|
-
# A list of HTML element names the parser should ignore.
|
88
|
-
#
|
89
|
-
# @return [Array<String>]
|
90
|
-
def self.ignored_node_names
|
91
|
-
%w[script style template]
|
92
|
-
end
|
93
|
-
|
94
|
-
# @see http://microformats.org/wiki/microformats2-parsing#parse_an_element_for_properties
|
95
|
-
# @see http://microformats.org/wiki/microformats2-parsing#parsing_for_implied_properties
|
96
|
-
#
|
97
|
-
# @param context [Nokogiri::HTML::Document, Nokogiri::XML::NodeSet, Nokogiri::XML::Element]
|
98
|
-
# @yield [context]
|
99
|
-
# @return [String]
|
100
|
-
def self.text_content_from(context)
|
101
|
-
context.css(*ignored_node_names).unlink
|
102
|
-
|
103
|
-
yield(context) if block_given?
|
104
|
-
|
105
|
-
context.text.strip
|
106
|
-
end
|
107
|
-
|
108
58
|
private
|
109
59
|
|
110
|
-
attr_reader :base_url, :markup
|
111
|
-
|
112
|
-
# @return [Nokogiri::XML::Element, nil]
|
113
|
-
def base_element
|
114
|
-
@base_element ||= Nokogiri::HTML(markup).at('//base[@href]')
|
115
|
-
end
|
116
|
-
|
117
60
|
# @return [Nokogiri::HTML::Document]
|
118
|
-
|
119
|
-
@document ||= Nokogiri::HTML(markup, resolved_base_url)
|
120
|
-
end
|
121
|
-
|
122
|
-
def resolve_relative_urls
|
123
|
-
HTML_URL_ATTRIBUTES_MAP.each do |attribute, names|
|
124
|
-
document.xpath(*names.map { |name| "//#{name}[@#{attribute}]" }).each do |node|
|
125
|
-
node[attribute] = Absolutely.to_abs(base: resolved_base_url, relative: node[attribute].strip)
|
126
|
-
end
|
127
|
-
end
|
128
|
-
|
129
|
-
HTML_IMAGE_CANDIDATE_STRINGS_ATTRIBUTES_MAP.each do |attribute, names|
|
130
|
-
document.xpath(*names.map { |name| "//#{name}[@#{attribute}]" }).each do |node|
|
131
|
-
candidates = node[attribute].split(',').map(&:strip).map { |candidate| candidate.match(/^(?<url>.+?)(?<descriptor>\s+.+)?$/) }
|
132
|
-
|
133
|
-
node[attribute] = candidates.map { |candidate| "#{Absolutely.to_abs(base: resolved_base_url, relative: candidate[:url])}#{candidate[:descriptor]}" }.join(', ')
|
134
|
-
end
|
135
|
-
end
|
136
|
-
|
137
|
-
self
|
138
|
-
end
|
139
|
-
|
140
|
-
# @return [String]
|
141
|
-
def resolved_base_url
|
142
|
-
@resolved_base_url ||= begin
|
143
|
-
if base_element
|
144
|
-
Absolutely.to_abs(base: base_url, relative: base_element['href'].strip)
|
145
|
-
else
|
146
|
-
base_url
|
147
|
-
end
|
148
|
-
end
|
149
|
-
end
|
61
|
+
attr_reader :document
|
150
62
|
end
|
151
63
|
end
|
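The removed `resolve_relative_urls` method rewrote URL attributes and `srcset`-style candidate lists against a base URL, work that this version hands to `resolve_relative_urls!`. A minimal sketch of the underlying idea follows, using the standard library's `URI.join` as a stand-in; it is not how the gem or nokogiri-html-ext is implemented, and `resolve_url` and `resolve_srcset` are hypothetical helper names.

```ruby
require 'uri'

# Resolve one URL reference against a base URL using the standard library.
def resolve_url(base_url, relative)
  URI.join(base_url, relative.strip).to_s
end

# Resolve each image candidate in a srcset-style value, keeping any
# descriptor (such as "2x" or "480w") that follows the URL. Splitting
# on commas is a simplification, as in the removed code above.
def resolve_srcset(base_url, value)
  value.split(',').map(&:strip).reject(&:empty?).map do |candidate|
    url, descriptor = candidate.split(/\s+/, 2)
    [resolve_url(base_url, url), descriptor].compact.join(' ')
  end.join(', ')
end

base_url = 'https://example.com/notes/'

puts resolve_url(base_url, ' ../photo.jpg ')
puts resolve_srcset(base_url, 'small.jpg 1x, /large.jpg 2x')
```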
data/lib/micro_micro/helpers.rb
ADDED
@@ -0,0 +1,88 @@
|
|
1
|
+
# frozen_string_literal: true
|
2
|
+
|
3
|
+
module MicroMicro
|
4
|
+
module Helpers
|
5
|
+
IGNORED_NODE_NAMES = %w[script style template].freeze
|
6
|
+
|
7
|
+
# @param node [Nokogiri::XML::Element]
|
8
|
+
# @param attributes_map [Hash{String => Array}]
|
9
|
+
# @return [String, nil]
|
10
|
+
def self.attribute_value_from(node, attributes_map)
|
11
|
+
attributes_map.filter_map do |attribute, names|
|
12
|
+
node[attribute] if names.include?(node.name) && node[attribute]
|
13
|
+
end.first
|
14
|
+
end
|
15
|
+
|
16
|
+
# @param node [Nokogiri::XML::Element]
|
17
|
+
# @return [Boolean]
|
18
|
+
def self.ignore_node?(node)
|
19
|
+
IGNORED_NODE_NAMES.include?(node.name)
|
20
|
+
end
|
21
|
+
|
22
|
+
# @param nodes [Nokogiri::XML::NodeSet]
|
23
|
+
# @return [Boolean]
|
24
|
+
def self.ignore_nodes?(nodes)
|
25
|
+
(nodes.map(&:name) & IGNORED_NODE_NAMES).any?
|
26
|
+
end
|
27
|
+
|
28
|
+
# @param node [Nokogiri::XML::Element]
|
29
|
+
# @return [Boolean]
|
30
|
+
def self.item_node?(node)
|
31
|
+
root_class_names_from(node).any?
|
32
|
+
end
|
33
|
+
|
34
|
+
# @param nodes [Nokogiri::XML::NodeSet]
|
35
|
+
# @return [Boolean]
|
36
|
+
def self.item_nodes?(nodes)
|
37
|
+
nodes.filter_map { |node| item_node?(node) }.any?
|
38
|
+
end
|
39
|
+
|
40
|
+
# @param node [Nokogiri::XML::Element]
|
41
|
+
# @return [Array<String>]
|
42
|
+
def self.property_class_names_from(node)
|
43
|
+
node.classes.grep(/^(?:dt|e|p|u)(?:-[0-9a-z]+)?(?:-[a-z]+)+$/).uniq
|
44
|
+
end
|
45
|
+
|
46
|
+
# @param node [Nokogiri::XML::Element]
|
47
|
+
# @return [Boolean]
|
48
|
+
def self.property_node?(node)
|
49
|
+
property_class_names_from(node).any?
|
50
|
+
end
|
51
|
+
|
52
|
+
# @param node [Nokogiri::XML::Element]
|
53
|
+
# @return [Array<String>]
|
54
|
+
def self.root_class_names_from(node)
|
55
|
+
node.classes.grep(/^h(?:-[0-9a-z]+)?(?:-[a-z]+)+$/).uniq.sort
|
56
|
+
end
|
57
|
+
|
58
|
+
# @see https://microformats.org/wiki/microformats2-parsing#parse_an_element_for_properties
|
59
|
+
# @see https://microformats.org/wiki/microformats2-parsing#parsing_for_implied_properties
|
60
|
+
#
|
61
|
+
# @param context [Nokogiri::HTML::Document, Nokogiri::XML::NodeSet, Nokogiri::XML::Element]
|
62
|
+
# @yield [context]
|
63
|
+
# @return [String]
|
64
|
+
def self.text_content_from(context)
|
65
|
+
context.css(*IGNORED_NODE_NAMES).unlink
|
66
|
+
|
67
|
+
yield(context) if block_given?
|
68
|
+
|
69
|
+
context.text.strip
|
70
|
+
end
|
71
|
+
|
72
|
+
# @see https://microformats.org/wiki/value-class-pattern#Basic_Parsing
|
73
|
+
#
|
74
|
+
# @param node [Nokogiri::XML::Element]
|
75
|
+
# @return [Boolean]
|
76
|
+
def self.value_class_node?(node)
|
77
|
+
node.classes.include?('value')
|
78
|
+
end
|
79
|
+
|
80
|
+
# @see https://microformats.org/wiki/value-class-pattern#Parsing_value_from_a_title_attribute
|
81
|
+
#
|
82
|
+
# @param node [Nokogiri::XML::Element]
|
83
|
+
# @return [Boolean]
|
84
|
+
def self.value_title_node?(node)
|
85
|
+
node.classes.include?('value-title')
|
86
|
+
end
|
87
|
+
end
|
88
|
+
end
|
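The two regular expressions in `root_class_names_from` and `property_class_names_from` decide which class names count as microformats2 roots and properties. The sketch below applies the same patterns to a plain array of strings, which stands in for Nokogiri's `node.classes`; the constant names and sample class names are made up for illustration.

```ruby
# Patterns copied from the Helpers module above; the constant names
# are hypothetical.
ROOT_CLASS_PATTERN = /^h(?:-[0-9a-z]+)?(?:-[a-z]+)+$/
PROPERTY_CLASS_PATTERN = /^(?:dt|e|p|u)(?:-[0-9a-z]+)?(?:-[a-z]+)+$/

# A plain array of class names stands in for Nokogiri's node.classes.
class_names = %w[h-card p-name u-url h-x-vendor-thing dt-published hidden p- H-CARD]

# Root names are de-duplicated and sorted; property names keep their
# original order, matching the two helper methods.
root_names = class_names.grep(ROOT_CLASS_PATTERN).uniq.sort
property_names = class_names.grep(PROPERTY_CLASS_PATTERN).uniq

p root_names
p property_names
```

Names such as `hidden`, `p-`, and `H-CARD` match neither pattern: both require a lowercase prefix followed by at least one hyphenated lowercase segment.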