micromicro 0.1.0 → 2.0.0
- checksums.yaml +4 -4
- data/CHANGELOG.md +37 -0
- data/CONTRIBUTING.md +2 -2
- data/README.md +65 -29
- data/lib/micro_micro/collectible.rb +15 -0
- data/lib/micro_micro/collections/base_collection.rb +18 -13
- data/lib/micro_micro/collections/items_collection.rb +7 -0
- data/lib/micro_micro/collections/properties_collection.rb +20 -6
- data/lib/micro_micro/collections/relationships_collection.rb +33 -0
- data/lib/micro_micro/document.rb +36 -44
- data/lib/micro_micro/helpers.rb +82 -0
- data/lib/micro_micro/implied_property.rb +2 -0
- data/lib/micro_micro/item.rb +78 -52
- data/lib/micro_micro/parsers/base_implied_property_parser.rb +29 -0
- data/lib/micro_micro/parsers/base_property_parser.rb +9 -14
- data/lib/micro_micro/parsers/date_time_parser.rb +60 -31
- data/lib/micro_micro/parsers/date_time_property_parser.rb +20 -29
- data/lib/micro_micro/parsers/embedded_markup_property_parser.rb +7 -17
- data/lib/micro_micro/parsers/implied_name_property_parser.rb +17 -58
- data/lib/micro_micro/parsers/implied_photo_property_parser.rb +23 -51
- data/lib/micro_micro/parsers/implied_url_property_parser.rb +13 -42
- data/lib/micro_micro/parsers/plain_text_property_parser.rb +11 -18
- data/lib/micro_micro/parsers/url_property_parser.rb +29 -37
- data/lib/micro_micro/parsers/value_class_pattern_parser.rb +29 -55
- data/lib/micro_micro/property.rb +73 -68
- data/lib/micro_micro/{relation.rb → relationship.rb} +19 -16
- data/lib/micro_micro/version.rb +3 -1
- data/lib/micromicro.rb +37 -22
- data/micromicro.gemspec +14 -9
- metadata +23 -31
- data/.editorconfig +0 -14
- data/.gitignore +0 -34
- data/.gitmodules +0 -3
- data/.reek.yml +0 -8
- data/.rspec +0 -2
- data/.rubocop +0 -3
- data/.rubocop.yml +0 -25
- data/.ruby-version +0 -1
- data/.simplecov +0 -11
- data/.travis.yml +0 -19
- data/Gemfile +0 -14
- data/Rakefile +0 -18
- data/lib/micro_micro/collections/relations_collection.rb +0 -23
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: b3037dad18fbd29d5487d985c388e288a8a09beab3f075488696ece006855f64
+  data.tar.gz: e0426d2ab7f6bfb762ff856f4bfb3afcca7af1c499ddedefdbfd9111c41c26eb
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 8024a4c6de518aab23991e056554ea97d4a21c06b3ba27e38362840be82b5021bebe618d82698903d1e2ce7a5b49d22b131f8cfb7d460074198a89cef9873476
+  data.tar.gz: 6137ab273b1418e2e5f063eda6ef35f1b3bdec5841272e73ef3a9ac25c8e7995ac09aa2495c5f0ad9e88726e88c8fdba97ac50d8d5bdc0e16d271a00c5a42597
data/CHANGELOG.md
CHANGED
@@ -1,5 +1,42 @@
 # Changelog

+## 2.0.0 / 2022-08-12
+
+- Refactor implied property parsers (203fec9)
+- Add `Helpers` module (caa1c02)
+- New `PropertiesCollection` and `Property` instance methods (e9bb38b):
+  - `PropertiesCollection#plain_text_properties`
+  - `PropertiesCollection#url_properties`
+  - `Property#date_time_property?`
+  - `Property#embedded_markup_property?`
+  - `Property#plain_text_property?`
+  - `Property#url_property?`
+- Remove Addressable (66c2bb4)
+- Refactor classes to use nokogiri-html-ext (33fdf4a)
+- Update activesupport (563bf56)
+- **Breaking change:** Set minimum supported Ruby to 2.7 (ba17d05)
+- Update development Ruby to 2.7.6 (ba17d05)
+- Remove Reek (c1e76c5)
+- Update runtime dependency version constraints (f83f26a)
+- ~~**Breaking change:** Set minimum supported Ruby to 2.6~~ (fc588cd)
+- ~~Update development Ruby to 2.6.10~~ (d05a2ac)
+
+## 1.1.0 / 2021-06-10
+
+- Replace Absolutely dependency with Addressable (e93721b)
+- Add support for Ruby 3.0 (d897c54)
+- Update development Ruby version to 2.6.10 (051c9ad)
+
+## 1.0.0 / 2020-11-08
+
+- Add `MicroMicro::Item#plain_text_properties` and `MicroMicro::Item#url_properties` methods (351e1f1)
+- Add `MicroMicro::Collections::RelationshipsCollection#rels` and `MicroMicro::Collections::RelationshipsCollection#urls` methods (c0e5665)
+- Add `MicroMicro::Collections::PropertiesCollection#names` and `MicroMicro::Collections::PropertiesCollection#values` methods (65486bc)
+- Add `MicroMicro::Collections::ItemsCollection#types` method (6b53a81)
+- Update absolutely dependency (4e67bb2)
+- Add `Collectible` concern and refactor using Composite design pattern (82503b8)
+- Update absolutely dependency (4578fb4)
+
 ## 0.1.0 / 2020-05-06

 - Initial release!
data/CONTRIBUTING.md
CHANGED
@@ -8,9 +8,9 @@ There are a couple ways you can help improve MicroMicro:

 ## Getting Started

-MicroMicro is developed using Ruby 2.
+MicroMicro is developed using Ruby 2.7.6 and is additionally tested against Ruby 3.0 and 3.1 using [GitHub Actions](https://github.com/jgarber623/micromicro/actions).

-Before making changes to MicroMicro, you'll want to install Ruby 2.
+Before making changes to MicroMicro, you'll want to install Ruby 2.7.6. It's recommended that you use a Ruby version management tool like [rbenv](https://github.com/rbenv/rbenv), [chruby](https://github.com/postmodern/chruby), or [rvm](https://github.com/rvm/rvm). Once you've installed Ruby 2.7.6 using your method of choice, install the project's gems by running:

 ```sh
 bundle install
data/README.md
CHANGED
@@ -1,42 +1,43 @@
 # MicroMicro

-**A Ruby gem for extracting [microformats2](
+**A Ruby gem for extracting [microformats2](https://microformats.org/wiki/microformats2)-encoded data from HTML documents.**

-[](https://rubygems.org/gems/micromicro)
+[](https://rubygems.org/gems/micromicro)
+[](https://github.com/jgarber623/micromicro/actions/workflows/ci.yml)
+[](https://codeclimate.com/github/jgarber623/micromicro)
+[](https://codeclimate.com/github/jgarber623/micromicro/code)

 ## Key Features

-- Parses microformats2-encoded HTML documents according to the [microformats2 parsing specification](
+- Parses microformats2-encoded HTML documents according to the [microformats2 parsing specification](https://microformats.org/wiki/microformats2-parsing)
 - Passes all microformats2 tests from [the official test suite](https://github.com/microformats/tests)¹
-- Supports Ruby 2.
+- Supports Ruby 2.7 and newer

-**Note:** MicroMicro **does not** parse [Classic Microformats](
+**Note:** MicroMicro **does not** parse [Classic Microformats](https://microformats.org/wiki/Main_Page#Classic_Microformats) (referred to in [the parsing specification](https://microformats.org/wiki/microformats2-parsing#note_backward_compatibility_details) as "backcompat root classes" and "backcompat properties" and in vocabulary specifications in the "Parser Compatibility" sections [e.g. [h-entry](https://microformats.org/wiki/h-entry#Parser_Compatibility)]). To parse documents marked up with Classic Microformats, consider using [the official microformats-ruby parser](https://github.com/microformats/microformats-ruby).

 <small>¹ …with some exceptions until [this pull request](https://github.com/microformats/tests/pull/112) is merged.</small>

 ## Getting Started

-Before installing and using MicroMicro, you'll want to have [Ruby](https://www.ruby-lang.org) 2.
+Before installing and using MicroMicro, you'll want to have [Ruby](https://www.ruby-lang.org) 2.7 (or newer) installed. It's recommended that you use a Ruby version management tool like [rbenv](https://github.com/rbenv/rbenv), [chruby](https://github.com/postmodern/chruby), or [rvm](https://github.com/rvm/rvm).

-MicroMicro is developed using Ruby 2.
+MicroMicro is developed using Ruby 2.7.6 and is additionally tested against Ruby 3.0 and 3.1 using [GitHub Actions](https://github.com/jgarber623/micromicro/actions).

 ## Installation

-If you're using [Bundler](https://bundler.io), add MicroMicro to your project's
+If you're using [Bundler](https://bundler.io) to manage gem dependencies, add MicroMicro to your project's Gemfile:

 ```ruby
-source 'https://rubygems.org'
-
 gem 'micromicro'
 ```

-…and
+…and run `bundle install` in your shell.
+
+To install the gem manually, run the following in your shell:

 ```sh
-
+gem install micromicro
 ```

 ## Usage
@@ -53,7 +54,7 @@ An example using a simple `String` of HTML as input:
 require 'micromicro'

 doc = MicroMicro.parse('<div class="h-card">Jason Garber</div>', 'https://sixtwothree.org')
-#=> #<MicroMicro::Document items: #<MicroMicro::Collections::ItemsCollection count: 1, members: [#<MicroMicro::Item types: ["h-card"], properties: 1, children: 0>]>,
+#=> #<MicroMicro::Document items: #<MicroMicro::Collections::ItemsCollection count: 1, members: [#<MicroMicro::Item types: ["h-card"], properties: 1, children: 0>]>, relationships: #<MicroMicro::Collections::RelationshipsCollection count: 0, members: []>>

 doc.to_h
 #=> { :items => [{ :type => ["h-card"], :properties => { :name => ["Jason Garber"] } }], :rels => {}, :"rel-urls" => {} }
@@ -64,14 +65,14 @@ The `Hash` produced by calling `doc.to_h` may be converted to JSON (e.g. `doc.to
 Another example pulling the source HTML from [Tantek](https://tantek.com)'s website:

 ```ruby
-require
-require
+require 'net/http'
+require 'micromicro'

-url =
+url = 'https://tantek.com'
 rsp = Net::HTTP.get(URI.parse(url))

 doc = MicroMicro.parse(rsp, url)
-#=> #<MicroMicro::Document items: #<MicroMicro::Collections::ItemsCollection count: 1, members: […]>,
+#=> #<MicroMicro::Document items: #<MicroMicro::Collections::ItemsCollection count: 1, members: […]>, relationships: #<MicroMicro::Collections::RelationshipsCollection count: 31, members: […]>>

 doc.to_h
 #=> { :items => [{ :type => ["h-card"], :properties => {…}, :children => […]}], :rels => {…}, :'rel-urls' => {…} }
@@ -81,39 +82,74 @@ doc.to_h

 Building on the example above, a MicroMicro-parsed document is navigable and manipulable using a familiar `Enumerable`-esque interface.

+#### Items
+
 ```ruby
 doc.items.first
 #=> #<MicroMicro::Item types: ["h-card"], properties: 42, children: 6>

+# 🆕 in v1.0.0
+doc.items.types
+#=> ["h-card"]
+
+doc.items.first.children
+#=> #<MicroMicro::Collections::ItemsCollection count: 6, members: […]>
+```
+
+#### Properties
+
+```ruby
 doc.items.first.properties
 #=> #<MicroMicro::Collections::PropertiesCollection count: 42, members: […]>

+# 🆕 in v1.0.0
+doc.items.first.plain_text_properties
+#=> #<MicroMicro::Collections::PropertiesCollection count: 34, members: […]>
+
+# 🆕 in v1.0.0
+doc.items.first.url_properties
+#=> #<MicroMicro::Collections::PropertiesCollection count: 11, members: […]>
+
+# 🆕 in v1.0.0
+doc.items.first.properties.names
+#=> ["category", "name", "note", "org", "photo", "pronoun", "pronouns", "role", "uid", "url"]
+
+# 🆕 in v1.0.0
+doc.items.first.properties.values
+#=> [{:value=>"https://tantek.com/photo.jpg", :alt=>""}, "https://tantek.com/", "Tantek Çelik", "Inventor, writer, teacher, runner, coder, more.", "Inventor", "writer", "teacher", "runner", "coder", …]
+
 doc.items.first.properties[7]
 #=> #<MicroMicro::Property name: "category", prefix: "p", value: "teacher">

 doc.items.first.properties.take(5).map { |property| [property.name, property.value] }
 #=> [["photo", { :value => "https://tantek.com/photo.jpg", :alt => "" }], ["url", "https://tantek.com/"], ["uid", "https://tantek.com/"], ["name", "Tantek Çelik"], ["role", "Inventor, writer, teacher, runner, coder, more."]]
+```

-
-#=> #<MicroMicro::Collections::ItemsCollection count: 6, members: […]>
+#### Relationships

-
-
+```ruby
+doc.relationships.first
+#=> #<MicroMicro::Relationship href: "https://tantek.com/", rels: ["canonical"]>

-
+# 🆕 in v1.0.0
+doc.relationships.rels
 #=> ["alternate", "apple-touch-icon-precomposed", "author", "authorization_endpoint", "bookmark", "canonical", "hub", "icon", "me", "microsub", …]

-
-
+# 🆕 in v1.0.0
+doc.relationships.urls
+#=> ["http://dribbble.com/tantek/", "http://last.fm/user/tantekc", "https://aperture.p3k.io/microsub/277", "https://en.wikipedia.org/wiki/User:Tantek", "https://github.com/tantek", "https://indieauth.com/auth", "https://indieauth.com/openid", "https://micro.blog/t", "https://pubsubhubbub.superfeedr.com/", "https://tantek.com/", …]
+
+doc.relationships.find { |relationship| relationship.rels.include?('webmention') }
+# => #<MicroMicro::Relationship href: "https://webmention.io/tantek.com/webmention", rels: ["webmention"]>
 ```

 ## Contributing

-Interested in helping improve MicroMicro? Awesome! Your help is greatly appreciated. See [CONTRIBUTING.md](https://github.com/jgarber623/micromicro/blob/
+Interested in helping improve MicroMicro? Awesome! Your help is greatly appreciated. See [CONTRIBUTING.md](https://github.com/jgarber623/micromicro/blob/main/CONTRIBUTING.md) for details.

 ## Acknowledgments

-MicroMicro wouldn't exist without the hard work of everyone involved in the [microformats](
+MicroMicro wouldn't exist without the hard work of everyone involved in the [microformats](https://microformats.org) community. Additionally, the comprehensive [microformats test suite](https://github.com/microformats/tests) was invaluable in the development of this Ruby gem.

 MicroMicro is written and maintained by [Jason Garber](https://sixtwothree.org).

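The README hunks above mention that the `Hash` returned by `doc.to_h` may be converted to JSON. A minimal sketch of that step follows. It assumes a hand-written Hash in the shape the README prints, not a real parse, so it runs without the gem installed. The variable names `parsed`, `json`, and `round_tripped` are illustrative only.

```ruby
require 'json'

# Stand-in for `doc.to_h`, copied from the shape shown in the README example.
parsed = {
  items: [{ type: ['h-card'], properties: { name: ['Jason Garber'] } }],
  rels: {},
  'rel-urls': {}
}

# Symbol keys (including :"rel-urls") are serialized as JSON strings.
json = JSON.generate(parsed)

# Parsing the JSON again yields a Hash with String keys.
round_tripped = JSON.parse(json)
puts round_tripped['items'].first['properties']['name'].first
```

Note that the round trip does not restore the Symbol keys, so code consuming the JSON should expect `'rel-urls'` as a String.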
data/lib/micro_micro/collections/base_collection.rb
CHANGED
@@ -1,37 +1,42 @@
+# frozen_string_literal: true
+
 module MicroMicro
   module Collections
     class BaseCollection
+      extend Forwardable
+
       include Enumerable

-
+      def_delegators :members, :[], :each, :last, :length, :split

-      # @param members [Array<MicroMicro::Item, MicroMicro::Property, MicroMicro::
+      # @param members [Array<MicroMicro::Item, MicroMicro::Property, MicroMicro::Relationship>]
       def initialize(members = [])
-
-
-        decorate_members if respond_to?(:decorate_members, true)
+        members.each { |member| push(member) }
       end

+      # :nocov:
       # @return [String]
       def inspect
-
+        "#<#{self.class}:#{format('%#0x', object_id)} " \
+          "count: #{count}, " \
+          "members: #{members.inspect}>"
       end
+      # :nocov:

-      # @param member [MicroMicro::Item, MicroMicro::Property, MicroMicro::
-      # @return [self]
+      # @param member [MicroMicro::Item, MicroMicro::Property, MicroMicro::Relationship]
       def push(member)
-        members
+        members << member

-
-
-        self
+        member.collection = self
       end

       alias << push

       private

-
+      def members
+        @members ||= []
+      end
     end
   end
 end
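The `BaseCollection` changes above replace hand-written accessors with `Forwardable` delegation and give each pushed member a back-reference to its collection. A minimal, self-contained sketch of that pattern follows. `SketchCollection` and `Member` are hypothetical stand-ins, not classes from the gem; the real members are `Item`, `Property`, and `Relationship` objects.

```ruby
require 'forwardable'

# Hypothetical member type with a writable `collection` back-reference.
Member = Struct.new(:name, :collection)

class SketchCollection
  extend Forwardable
  include Enumerable

  # Delegate a handful of Array methods to the private members list.
  def_delegators :members, :[], :each, :last, :length

  def initialize(members = [])
    members.each { |member| push(member) }
  end

  # Append a member and point it back at this collection.
  def push(member)
    members << member
    member.collection = self
  end

  alias << push

  private

  def members
    @members ||= []
  end
end

collection = SketchCollection.new([Member.new('name'), Member.new('url')])
collection << Member.new('photo')

puts collection.map(&:name).inspect
puts collection.last.collection.equal?(collection)
```

Because `each` is delegated and `Enumerable` is included, methods such as `map`, `select`, and `flat_map` come for free, which the subclasses in the following hunks rely on.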
data/lib/micro_micro/collections/items_collection.rb
CHANGED
@@ -1,3 +1,5 @@
+# frozen_string_literal: true
+
 module MicroMicro
   module Collections
     class ItemsCollection < BaseCollection
@@ -5,6 +7,11 @@ module MicroMicro
       def to_a
         map(&:to_h)
       end
+
+      # @return [Array<String>]
+      def types
+        @types ||= flat_map(&:types).uniq.sort
+      end
     end
   end
 end
data/lib/micro_micro/collections/properties_collection.rb
CHANGED
@@ -1,17 +1,31 @@
+# frozen_string_literal: true
+
 module MicroMicro
   module Collections
     class PropertiesCollection < BaseCollection
+      # @return [Array<String>]
+      def names
+        @names ||= map(&:name).uniq.sort
+      end
+
+      # @return [MicroMicro::Collections::PropertiesCollection]
+      def plain_text_properties
+        self.class.new(select(&:plain_text_property?))
+      end
+
       # @return [Hash{Symbol => Array<String, Hash>}]
       def to_h
-        group_by(&:name).symbolize_keys.deep_transform_values
-          property.item_node? ? property.value.to_h : property.value
-        end
+        group_by(&:name).symbolize_keys.deep_transform_values(&:value)
       end

-
+      # @return [MicroMicro::Collections::PropertiesCollection]
+      def url_properties
+        self.class.new(select(&:url_property?))
+      end

-
-
+      # @return [Array<String, Hash>]
+      def values
+        @values ||= map(&:value).uniq
       end
     end
   end
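The `PropertiesCollection` hunk above adds predicate-based filters and simplifies `to_h` to a group-then-extract step. The sketch below illustrates both ideas in plain Ruby, without ActiveSupport. `SketchProperty` is a hypothetical stand-in, and deciding the predicates from the `prefix` field is an assumption made for illustration; the gem's actual `Property` predicates are not shown in this part of the diff.

```ruby
# Hypothetical stand-in for MicroMicro::Property with only the fields used here.
SketchProperty = Struct.new(:name, :prefix, :value) do
  # Assumption: the prefix alone decides the property kind.
  def plain_text_property?
    prefix == 'p'
  end

  def url_property?
    prefix == 'u'
  end
end

properties = [
  SketchProperty.new('name', 'p', 'Jason Garber'),
  SketchProperty.new('url', 'u', 'https://sixtwothree.org/'),
  SketchProperty.new('category', 'p', 'writer'),
  SketchProperty.new('category', 'p', 'coder')
]

# Filter by predicate, in the spirit of `plain_text_properties` and `url_properties`.
plain_text = properties.select(&:plain_text_property?)
urls = properties.select(&:url_property?)

# Group by name, then replace each property with its value.
# `transform_keys(&:to_sym)` stands in for ActiveSupport's `symbolize_keys`.
as_hash = properties
          .group_by(&:name)
          .transform_keys(&:to_sym)
          .transform_values { |group| group.map(&:value) }

p plain_text.map(&:name)
p urls.map(&:value)
p as_hash
```

In the gem itself the filters wrap their result in a new collection via `self.class.new(...)`, so the filtered set keeps the same interface as the original.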
data/lib/micro_micro/collections/relationships_collection.rb
ADDED
@@ -0,0 +1,33 @@
+# frozen_string_literal: true
+
+module MicroMicro
+  module Collections
+    class RelationshipsCollection < BaseCollection
+      # @see https://microformats.org/wiki/microformats2-parsing#parse_a_hyperlink_element_for_rel_microformats
+      #
+      # @return [Hash{Symbol => Array<String>}]
+      def group_by_rel
+        each_with_object(Hash.new { |hash, key| hash[key] = [] }) do |member, hash|
+          member.rels.each { |rel| hash[rel] << member.href }
+        end.symbolize_keys.transform_values(&:uniq)
+      end
+
+      # @see https://microformats.org/wiki/microformats2-parsing#parse_a_hyperlink_element_for_rel_microformats
+      #
+      # @return [Hash{Symbol => Hash{Symbol => Array, String}}]
+      def group_by_url
+        group_by(&:href).symbolize_keys.transform_values { |relationships| relationships.first.to_h.slice!(:href) }
+      end
+
+      # @return [Array<String>]
+      def rels
+        @rels ||= flat_map(&:rels).uniq.sort
+      end
+
+      # @return [Array<String>]
+      def urls
+        @urls ||= map(&:href).uniq.sort
+      end
+    end
+  end
+end
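The `group_by_rel` method above inverts a list of relationships into a rel-to-URLs map using a Hash with a default block. A minimal plain-Ruby sketch of that technique follows. `SketchRelationship` is a hypothetical stand-in for `MicroMicro::Relationship`, and the pairing of rels with URLs is invented sample data for illustration.

```ruby
# Hypothetical stand-in for MicroMicro::Relationship.
SketchRelationship = Struct.new(:href, :rels)

# Sample data; the rel/URL pairings are illustrative only.
relationships = [
  SketchRelationship.new('https://tantek.com/', ['canonical']),
  SketchRelationship.new('https://github.com/tantek', ['me']),
  SketchRelationship.new('https://micro.blog/t', ['me']),
  SketchRelationship.new('https://github.com/tantek', %w[me author])
]

# The default block creates a fresh Array the first time a rel is seen.
grouped = relationships.each_with_object(Hash.new { |hash, key| hash[key] = [] }) do |relationship, hash|
  relationship.rels.each { |rel| hash[rel] << relationship.href }
end

# `transform_keys(&:to_sym)` stands in for ActiveSupport's `symbolize_keys`.
by_rel = grouped.transform_keys(&:to_sym).transform_values(&:uniq)

# Sorted, de-duplicated lists, as in `rels` and `urls`.
rels = relationships.flat_map(&:rels).uniq.sort
urls = relationships.map(&:href).uniq.sort

p by_rel
p rels
p urls
```

The `uniq` step matters because the same URL can appear in several link elements with the same rel, as the repeated GitHub entry shows.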
data/lib/micro_micro/document.rb
CHANGED
@@ -1,71 +1,63 @@
+# frozen_string_literal: true
+
 module MicroMicro
   class Document
-    #
-    #
+    # Parse a string of HTML for microformats2-encoded data.
+    #
+    # MicroMicro::Document.new('<a href="/" class="h-card" rel="me">Jason Garber</a>', 'https://sixtwothree.org')
+    #
+    # Or, pull the source HTML of a page on the Web:
+    #
+    # url = 'https://tantek.com'
+    # markup = Net::HTTP.get(URI.parse(url))
+    #
+    # doc = MicroMicro::Document.new(markup, url)
+    #
+    # @param markup [String] The HTML to parse for microformats2-encoded data.
+    # @param base_url [String] The URL associated with markup. Used for relative URL resolution.
     def initialize(markup, base_url)
-      @
-      @base_url = base_url
+      @document = Nokogiri::HTML(markup, base_url).resolve_relative_urls!
     end

+    # :nocov:
     # @return [String]
     def inspect
-
+      "#<#{self.class}:#{format('%#0x', object_id)} " \
+        "items: #{items.inspect}, " \
+        "relationships: #{relationships.inspect}>"
     end
+    # :nocov:

+    # A collection of items parsed from the provided markup.
+    #
     # @return [MicroMicro::Collections::ItemsCollection]
     def items
-      @items ||= Collections::ItemsCollection.new(Item.
+      @items ||= Collections::ItemsCollection.new(Item.from_context(document.element_children))
     end

-    #
-
-
+    # A collection of relationships parsed from the provided markup.
+    #
+    # @return [MicroMicro::Collections::RelationshipsCollection]
+    def relationships
+      @relationships ||= Collections::RelationshipsCollection.new(Relationship.relationships_from(document))
     end

-    #
-    #
+    # Return the parsed document as a Hash.
+    #
+    # @see https://microformats.org/wiki/microformats2-parsing#parse_a_document_for_microformats
     #
-    # @return [Hash]
+    # @return [Hash{Symbol => Array, Hash}]
     def to_h
       {
         items: items.to_a,
-        rels:
-        'rel-urls':
+        rels: relationships.group_by_rel,
+        'rel-urls': relationships.group_by_url
       }
     end

-    # @param node [Nokogiri::XML::Element]
-    # @return [Boolean]
-    def self.ignore_node?(node)
-      ignored_node_names.include?(node.name)
-    end
-
-    # @return [Array<String>]
-    def self.ignored_node_names
-      %w[script style template]
-    end
-
     private

-    attr_reader :base_url, :markup
-
-    # @return [Nokogiri::XML::Element, nil]
-    def base_element
-      @base_element ||= Nokogiri::HTML(markup).at_css('base[href]')
-    end
-
     # @return [Nokogiri::HTML::Document]
-
-      @document ||= Nokogiri::HTML(markup, resolved_base_url)
-    end
-
-    # @return [String]
-    def resolved_base_url
-      @resolved_base_url ||= begin
-        return base_url unless base_element
-
-        Absolutely.to_abs(base: base_url, relative: base_element['href'])
-      end
-    end
+    attr_reader :document
   end
 end
data/lib/micro_micro/helpers.rb
ADDED
@@ -0,0 +1,82 @@
+# frozen_string_literal: true
+
+module MicroMicro
+  module Helpers
+    IGNORED_NODE_NAMES = %w[script style template].freeze
+
+    # @param node [Nokogiri::XML::Element]
+    # @param attributes_map [Hash{String => Array}]
+    # @return [String, nil]
+    def self.attribute_value_from(node, attributes_map)
+      attributes_map.filter_map do |attribute, names|
+        node[attribute] if names.include?(node.name) && node[attribute]
+      end.first
+    end
+
+    # @param node [Nokogiri::XML::Element]
+    # @return [Boolean]
+    def self.ignore_node?(node)
+      IGNORED_NODE_NAMES.include?(node.name)
+    end
+
+    # @param nodes [Nokogiri::XML::NodeSet]
+    # @return [Boolean]
+    def self.ignore_nodes?(nodes)
+      (nodes.map(&:name) & IGNORED_NODE_NAMES).any?
+    end
+
+    # @param node [Nokogiri::XML::Element]
+    # @return [Boolean]
+    def self.item_node?(node)
+      root_class_names_from(node).any?
+    end
+
+    # @param node [Nokogiri::XML::Element]
+    # @return [Array<String>]
+    def self.property_class_names_from(node)
+      node.classes.grep(/^(?:dt|e|p|u)(?:-[0-9a-z]+)?(?:-[a-z]+)+$/).uniq
+    end
+
+    # @param node [Nokogiri::XML::Element]
+    # @return [Boolean]
+    def self.property_node?(node)
+      property_class_names_from(node).any?
+    end
+
+    # @param node [Nokogiri::XML::Element]
+    # @return [Array<String>]
+    def self.root_class_names_from(node)
+      node.classes.grep(/^h(?:-[0-9a-z]+)?(?:-[a-z]+)+$/).uniq.sort
+    end
+
+    # @see https://microformats.org/wiki/microformats2-parsing#parse_an_element_for_properties
+    # @see https://microformats.org/wiki/microformats2-parsing#parsing_for_implied_properties
+    #
+    # @param context [Nokogiri::HTML::Document, Nokogiri::XML::NodeSet, Nokogiri::XML::Element]
+    # @yield [context]
+    # @return [String]
+    def self.text_content_from(context)
+      context.css(*IGNORED_NODE_NAMES).unlink
+
+      yield(context) if block_given?
+
+      context.text.strip
+    end
+
+    # @see https://microformats.org/wiki/value-class-pattern#Basic_Parsing
+    #
+    # @param node [Nokogiri::XML::Element]
+    # @return [Boolean]
+    def self.value_class_node?(node)
+      node.classes.include?('value')
+    end
+
+    # @see https://microformats.org/wiki/value-class-pattern#Parsing_value_from_a_title_attribute
+    #
+    # @param node [Nokogiri::XML::Element]
+    # @return [Boolean]
+    def self.value_title_node?(node)
+      node.classes.include?('value-title')
+    end
+  end
+end
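The `Helpers` module above classifies class names with two regular expressions and picks an attribute value with `filter_map`. The sketch below exercises both ideas without Nokogiri. The two patterns are copied from the hunk; `SketchNode`, the sample class names, and the `attributes_map` contents are hypothetical and chosen only for illustration. `filter_map` requires Ruby 2.7 or newer, matching the gem's stated minimum.

```ruby
# Patterns copied from Helpers.root_class_names_from and Helpers.property_class_names_from.
ROOT_CLASS_PATTERN = /^h(?:-[0-9a-z]+)?(?:-[a-z]+)+$/
PROPERTY_CLASS_PATTERN = /^(?:dt|e|p|u)(?:-[0-9a-z]+)?(?:-[a-z]+)+$/

# Stand-in for `node.classes`: names split from a `class` attribute value.
class_names = 'h-card p-author u-url vcard h-x1-custom h- H-Entry p-name p-name'.split

root_names = class_names.grep(ROOT_CLASS_PATTERN).uniq.sort
property_names = class_names.grep(PROPERTY_CLASS_PATTERN).uniq

# Hypothetical minimal node exposing only `name` and `[]`, the two methods
# that `attribute_value_from` calls on a Nokogiri element.
class SketchNode
  attr_reader :name

  def initialize(name, attributes = {})
    @name = name
    @attributes = attributes
  end

  def [](attribute)
    @attributes[attribute]
  end
end

# Same logic as Helpers.attribute_value_from: first attribute whose allowed
# element names include this node and whose value is present.
def attribute_value_from(node, attributes_map)
  attributes_map.filter_map do |attribute, names|
    node[attribute] if names.include?(node.name) && node[attribute]
  end.first
end

# Hypothetical map of attribute name to the elements it applies to.
attributes_map = { 'href' => %w[a area], 'src' => %w[img] }

link_value = attribute_value_from(SketchNode.new('a', 'href' => 'https://sixtwothree.org/'), attributes_map)
image_value = attribute_value_from(SketchNode.new('img'), attributes_map)

p root_names
p property_names
p link_value
p image_value
```

Both patterns are case-sensitive and require at least one lowercase-letter segment after the prefix, which is why `h-`, `H-Entry`, and the Classic Microformats name `vcard` are not matched here.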