micromicro 3.0.0 → 4.0.0
- checksums.yaml +4 -4
- data/CHANGELOG.md +13 -1
- data/CONTRIBUTING.md +2 -2
- data/README.md +2 -2
- data/lib/micro_micro/collections/properties_collection.rb +1 -1
- data/lib/micro_micro/collections/relationships_collection.rb +4 -4
- data/lib/micro_micro/document.rb +2 -2
- data/lib/micro_micro/helpers.rb +1 -1
- data/lib/micro_micro/item.rb +1 -1
- data/lib/micro_micro/parsers/date_time_parser.rb +2 -5
- data/lib/micro_micro/parsers/image_element_parser.rb +68 -0
- data/lib/micro_micro/parsers/implied_photo_property_parser.rb +5 -6
- data/lib/micro_micro/parsers/url_property_parser.rb +2 -5
- data/lib/micro_micro/parsers/value_class_pattern_parser.rb +1 -1
- data/lib/micro_micro/property.rb +2 -2
- data/lib/micro_micro/relationship.rb +2 -2
- data/lib/micro_micro/version.rb +1 -1
- data/lib/micromicro.rb +3 -2
- data/micromicro.gemspec +2 -2
- metadata +9 -8
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-metadata.gz:
-data.tar.gz:
+metadata.gz: c4ed4ff8b2b7284cd01096982ef345571ffb1de9c10e5943b725976aa6778674
+data.tar.gz: 9275bb41d54398977ea98439e76b0adab839ffa38869a2ecbe63468fed0dc376
 SHA512:
-metadata.gz:
-data.tar.gz:
+metadata.gz: 0e87a9b3275b7c212e485459a520ea3b7a687550450f42ff5ade89069e1fc610d64e9c918eec27c1a88be8807f14f652351c81150a3028884b37a050db7c0f70
+data.tar.gz: 50278fa0b0130004d98b5851dc4530f19c6888028266538bd0d05e43820ff3b3d64af6aadc937e1497cce158a134553c1605857e342cc2511e67450c1314411d
data/CHANGELOG.md
CHANGED
@@ -1,6 +1,18 @@
 # Changelog
 
-##
+## 4.0.0 / 2023-03-14
+
+- Parse HTML with `Nokogiri::HTML5::Document.parse` (330a2d1, 0de40d7)
+- Update Nokogiri and nokogiri-html-ext constraints (e038606, cb6e499, 9793b17)
+- Remove code-scanning-rubocop and rspec-github gems (2fbb9c5)
+- Update development Ruby to v2.7.7 (a333103)
+
+## 3.1.0 / 2022-09-24
+
+- **New feature:** parse `img[srcset]` (microformats/microformats2-parsing#7) (cdda328)
+- Improve usage of activesupport extensions (5ed120c)
+
+## 3.0.0 / 2022-08-28
 
 - Improved YARD documentation
 - New `Item` instance methods (8105d6f):
data/CONTRIBUTING.md
CHANGED
@@ -8,9 +8,9 @@ There are a couple ways you can help improve MicroMicro:
 
 ## Getting Started
 
-MicroMicro is developed using Ruby 2.7.
+MicroMicro is developed using Ruby 2.7.7 and is additionally tested against Ruby 3.0, 3.1, and 3.2 using [GitHub Actions](https://github.com/jgarber623/micromicro/actions).
 
-Before making changes to MicroMicro, you'll want to install Ruby 2.7.
+Before making changes to MicroMicro, you'll want to install Ruby 2.7.7. It's recommended that you use a Ruby version managment tool like [rbenv](https://github.com/rbenv/rbenv), [chruby](https://github.com/postmodern/chruby), or [rvm](https://github.com/rvm/rvm). Once you've installed Ruby 2.7.7 using your method of choice, install the project's gems by running:
 
 ```sh
 bundle install
data/README.md
CHANGED
@@ -4,7 +4,7 @@
 
 [![Gem](https://img.shields.io/gem/v/micromicro.svg?logo=rubygems&style=for-the-badge)](https://rubygems.org/gems/micromicro)
 [![Downloads](https://img.shields.io/gem/dt/micromicro.svg?logo=rubygems&style=for-the-badge)](https://rubygems.org/gems/micromicro)
-[![Build](https://img.shields.io/github/workflow/status/jgarber623/micromicro/
+[![Build](https://img.shields.io/github/actions/workflow/status/jgarber623/micromicro/ci.yml?branch=main&logo=github&style=for-the-badge)](https://github.com/jgarber623/micromicro/actions/workflows/ci.yml)
 [![Maintainability](https://img.shields.io/codeclimate/maintainability/jgarber623/micromicro.svg?logo=code-climate&style=for-the-badge)](https://codeclimate.com/github/jgarber623/micromicro)
 [![Coverage](https://img.shields.io/codeclimate/c/jgarber623/micromicro.svg?logo=code-climate&style=for-the-badge)](https://codeclimate.com/github/jgarber623/micromicro/code)
 
@@ -48,7 +48,7 @@ doc.to_h
 #=> { :items => [{ :type => ["h-card"], :properties => { :name => ["Jason Garber"] } }], :rels => {}, :"rel-urls" => {} }
 ```
 
-See [USAGE.md](https://github.com/jgarber623/micromicro/blob/main/USAGE.md) for detailed examples of MicroMicro's features.
+See [USAGE.md](https://github.com/jgarber623/micromicro/blob/main/USAGE.md) for detailed examples of MicroMicro's features. Additional structured documentation is available on [RubyDoc.info](https://rubydoc.info/gems/micromicro).
 
 ## Contributing
 
data/lib/micro_micro/collections/properties_collection.rb
CHANGED
@@ -64,7 +64,7 @@ module MicroMicro
 #
 # @return [Hash{Symbol => Array<String, Hash>}]
 def to_h
-group_by(&:name).
+group_by(&:name).transform_keys(&:to_sym).deep_transform_values(&:value)
 end
 
 # A collection of url {MicroMicro::Property}s parsed from the node.
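The new `to_h` chain is easier to follow with the ActiveSupport pieces swapped out. The sketch below is an illustration, not the gem's code: `FakeProperty` is a hypothetical Struct standing in for `MicroMicro::Property`, and `transform_values { ... map(&:value) }` replaces `deep_transform_values(&:value)`, which is equivalent only for the Hash-of-Arrays shape that `group_by` produces.

```ruby
# Hypothetical stand-in for MicroMicro::Property: just the two readers
# that PropertiesCollection#to_h relies on.
FakeProperty = Struct.new(:name, :value)

# Sketch of PropertiesCollection#to_h without ActiveSupport:
# 1. group the properties by name (String keys),
# 2. symbolize the keys,
# 3. replace each grouped Property with its value.
def properties_to_h(properties)
  properties
    .group_by(&:name)
    .transform_keys(&:to_sym)
    .transform_values { |group| group.map(&:value) }
end

properties = [
  FakeProperty.new('name', 'Jason Garber'),
  FakeProperty.new('url', 'https://example.com/'),
  FakeProperty.new('name', 'Jason')
]

p properties_to_h(properties)
```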
data/lib/micro_micro/collections/relationships_collection.rb
CHANGED
@@ -38,9 +38,9 @@ module MicroMicro
 #
 # @return [Hash{Symbol => Array<String>}]
 def group_by_rel
-each_with_object(Hash.new { |hash, key| hash[key] =
-member.rels.each { |rel| hash[rel] << member.href }
-end.
+each_with_object(Hash.new { |hash, key| hash[key] = Set.new }) do |member, hash|
+member.rels.each { |rel| hash[rel.to_sym] << member.href }
+end.transform_values(&:to_a)
 end
 
 # Return a Hash of this collection's {MicroMicro::Relationship}s grouped
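A minimal sketch of the reworked `group_by_rel`, runnable without the gem. `FakeRelationship` is a hypothetical stand-in exposing only `href` and `rels`. Backing each key with a `Set` means an href listed twice under the same rel is kept once, and `transform_values(&:to_a)` turns the Sets back into Arrays to match the documented `Hash{Symbol => Array<String>}` return type.

```ruby
require 'set'

# Hypothetical stand-in for MicroMicro::Relationship exposing only #href and #rels.
FakeRelationship = Struct.new(:href, :rels)

# Mirrors the 4.0.0 body: the default proc gives each new rel key its own
# Set, so a repeated href is stored once per rel; the Sets are then
# converted back to Arrays.
def group_by_rel(relationships)
  relationships.each_with_object(Hash.new { |hash, key| hash[key] = Set.new }) do |member, hash|
    member.rels.each { |rel| hash[rel.to_sym] << member.href }
  end.transform_values(&:to_a)
end

relationships = [
  FakeRelationship.new('https://example.com/', %w[me author]),
  FakeRelationship.new('https://example.com/', %w[me]),
  FakeRelationship.new('https://example.org/', %w[me])
]

p group_by_rel(relationships)
```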
@@ -51,7 +51,7 @@ module MicroMicro
 #
 # @return [Hash{Symbol => Hash{Symbol => Array, String}}]
 def group_by_url
-group_by(&:href).
+group_by(&:href).to_h { |k, v| [k.to_sym, v.first.to_h.except(:href)] }
 end
 
 # Retrieve an Array of this collection's unique {MicroMicro::Relationship}
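`group_by_url` now keys the result by symbolized href and keeps only the first relationship seen for each URL, minus its `href`. In this hedged sketch, `UrlRelationship` is a hypothetical Struct whose `to_h` plays the role of `MicroMicro::Relationship#to_h`, and `reject` stands in for `Hash#except` (supplied by ActiveSupport's `hash/except` on Ruby 2.7, built in from Ruby 3.0).

```ruby
# Hypothetical stand-in for MicroMicro::Relationship; Struct#to_h supplies
# the { href:, rels: } Hash that the real class builds itself.
UrlRelationship = Struct.new(:href, :rels)

# Sketch of group_by_url: key by symbolized href, keep the first
# relationship's Hash for each URL, and drop its :href entry.
def group_by_url(relationships)
  relationships
    .group_by(&:href)
    .to_h { |href, members| [href.to_sym, members.first.to_h.reject { |key, _| key == :href }] }
end

relationships = [
  UrlRelationship.new('https://example.com/', %w[me]),
  UrlRelationship.new('https://example.com/', %w[author]),
  UrlRelationship.new('https://example.org/', %w[me])
]

p group_by_url(relationships)
```

Note that the second `https://example.com/` relationship (with `rel="author"`) does not appear in the result; only the first one per href survives.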
data/lib/micro_micro/document.rb
CHANGED
@@ -16,7 +16,7 @@ module MicroMicro
 # @param markup [String] The HTML to parse for microformats2-encoded data.
 # @param base_url [String] The URL associated with markup. Used for relative URL resolution.
 def initialize(markup, base_url)
-@document = Nokogiri::
+@document = Nokogiri::HTML5::Document.parse(markup, base_url).resolve_relative_urls!
 end
 
 # @return [String]
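The `base_url` argument is documented as being used for relative URL resolution. The snippet below only illustrates that URL arithmetic with the standard library's `URI.join`; it is not how nokogiri-html-ext implements `resolve_relative_urls!`, and `resolve` is a hypothetical helper.

```ruby
require 'uri'

# Hypothetical helper: resolve an href against a base URL with the
# stdlib's URI.join.
def resolve(base_url, href)
  URI.join(base_url, href).to_s
end

base_url = 'https://example.com/posts/1'

p resolve(base_url, 'photo.jpg')                     # sibling of /posts/1
p resolve(base_url, '../photo.jpg')                  # one directory up
p resolve(base_url, 'https://cdn.example.org/x.jpg') # absolute URLs are unchanged
```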
@@ -63,7 +63,7 @@ module MicroMicro
 
 private
 
-# @return [Nokogiri::
+# @return [Nokogiri::HTML5::Document]
 attr_reader :document
 end
 end
data/lib/micro_micro/helpers.rb
CHANGED
@@ -62,7 +62,7 @@ module MicroMicro
 # @see https://microformats.org/wiki/microformats2-parsing#parsing_for_implied_properties
 # microformats.org: microformats2 parsing specification § Parsing for implied properties
 #
-# @param context [Nokogiri::
+# @param context [Nokogiri::HTML5::Document, Nokogiri::XML::NodeSet, Nokogiri::XML::Element]
 # @yield [context]
 # @return [String]
 def self.text_content_from(context)
data/lib/micro_micro/item.rb
CHANGED
@@ -32,7 +32,7 @@ module MicroMicro
 
 # Extract {MicroMicro::Item}s from a context.
 #
-# @param context [Nokogiri::
+# @param context [Nokogiri::HTML5::Document, Nokogiri::XML::NodeSet, Nokogiri::XML::Element]
 # @return [Array<MicroMicro::Item>]
 def self.from_context(context)
 ItemNodeSearch
data/lib/micro_micro/parsers/date_time_parser.rb
CHANGED
@@ -90,17 +90,14 @@ module MicroMicro
 
 # @return [String, nil]
 def value
-@value ||=
-if normalized_date || normalized_time || normalized_timezone
-"#{normalized_date} #{normalized_time}#{normalized_timezone}".strip
-end
+@value ||= "#{normalized_date} #{normalized_time}#{normalized_timezone}".strip.presence
 end
 
 # @return [Hash{Symbol => String, nil}]
 def values
 @values ||=
 if string.match?(DATE_TIME_TIMEZONE_REGEXP)
-string.match(DATE_TIME_TIMEZONE_REGEXP).named_captures.
+string.match(DATE_TIME_TIMEZONE_REGEXP).named_captures.transform_keys(&:to_sym)
 else
 {}
 end
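The `value` refactor replaces an explicit nil check with `strip.presence`. To compare the two bodies without ActiveSupport, the sketch passes the normalized parts in as arguments and uses a hypothetical `presence_of` helper in place of `String#presence`. The two agree whenever each part is nil or a non-empty string; they would only differ if a part were an empty string. `SIMPLE_REGEXP` is a made-up pattern, much narrower than the gem's `DATE_TIME_TIMEZONE_REGEXP`, used only to show what `named_captures.transform_keys(&:to_sym)` returns.

```ruby
# Hypothetical stand-in for ActiveSupport's String#presence. After #strip,
# a blank string is simply an empty one.
def presence_of(string)
  string.empty? ? nil : string
end

# 3.x body of DateTimeParser#value, with the normalized parts passed in.
def value_before(date, time, timezone)
  "#{date} #{time}#{timezone}".strip if date || time || timezone
end

# 4.0.0 body, with .presence replaced by presence_of.
def value_after(date, time, timezone)
  presence_of("#{date} #{time}#{timezone}".strip)
end

p value_before('2023-03-14', '12:00', 'Z') # "2023-03-14 12:00Z"
p value_after('2023-03-14', '12:00', 'Z')  # "2023-03-14 12:00Z"
p value_after(nil, nil, nil)               # nil
p value_before('', nil, nil)               # "" (an empty String is truthy)
p value_after('', nil, nil)                # nil

# Made-up pattern, far simpler than DATE_TIME_TIMEZONE_REGEXP.
SIMPLE_REGEXP = /\A(?<date>\d{4}-\d{2}-\d{2}) (?<time>\d{2}:\d{2})\z/

# named_captures returns String keys; transform_keys symbolizes them.
p '2023-03-14 12:00'.match(SIMPLE_REGEXP).named_captures.transform_keys(&:to_sym)
```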
data/lib/micro_micro/parsers/image_element_parser.rb
ADDED
@@ -0,0 +1,68 @@
+# frozen_string_literal: true
+
+module MicroMicro
+module Parsers
+class ImageElementParser
+# @return [String]
+attr_reader :value
+
+# @param node [Nokogiri::XML::Element]
+# @param value [String]
+def initialize(node, value)
+@node = node
+@value = value
+end
+
+# @return [String, nil]
+def alt
+@alt ||= node['alt']&.strip
+end
+
+# @return [Boolean]
+def alt?
+!alt.nil?
+end
+
+# @return [Hash{Symbol => String}, nil]
+def srcset
+@srcset ||= image_candidates if node['srcset']
+end
+
+# @return [Boolean]
+def srcset?
+srcset.present?
+end
+
+# @return [Hash{Symbol => String, Hash{Symbol => String}}]
+def to_h
+hash = { value: value }
+
+hash[:srcset] = srcset if srcset?
+hash[:alt] = alt if alt?
+
+hash
+end
+
+private
+
+# @return [Nokogiri::XML::Element]
+attr_reader :node
+
+# @return [Hash{Symbol => String}]
+#
+# rubocop:disable Style/PerlBackrefs
+def image_candidates
+node['srcset']
+.split(',')
+.each_with_object({}) do |candidate, hash|
+candidate.strip.match(/^(.+?)(\s+.+)?$/) do
+key = ($2 || '1x').strip.to_sym
+
+hash[key] = $1 unless hash[key]
+end
+end
+end
+# rubocop:enable Style/PerlBackrefs
+end
+end
+end
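The new `ImageElementParser` can be exercised without Nokogiri by feeding it a plain Hash of attributes. `ImageElementSketch` is a hypothetical port under that assumption: it keeps the class's public methods, swaps `present?` for a stdlib check, and reads the regex captures from the `MatchData` block argument rather than `$1`/`$2`. Like the original, it splits `srcset` naively on commas, defaults a missing descriptor to `1x`, and keeps the first URL seen for a repeated descriptor.

```ruby
# Stdlib-only sketch of MicroMicro::Parsers::ImageElementParser.
# Assumption: a Hash of attribute names to values stands in for the
# Nokogiri::XML::Element that the real class receives.
class ImageElementSketch
  attr_reader :value

  def initialize(attributes, value)
    @attributes = attributes
    @value = value
  end

  def alt
    @alt ||= @attributes['alt']&.strip
  end

  def alt?
    !alt.nil?
  end

  def srcset
    @srcset ||= image_candidates if @attributes['srcset']
  end

  # The gem uses ActiveSupport's #present?; this is the stdlib equivalent
  # for the nil-or-Hash value returned by #srcset.
  def srcset?
    !srcset.nil? && !srcset.empty?
  end

  def to_h
    hash = { value: value }
    hash[:srcset] = srcset if srcset?
    hash[:alt] = alt if alt?
    hash
  end

  private

  # Split on commas, then split each candidate into a URL and an optional
  # descriptor ("2x", "480w"). A missing descriptor becomes "1x" and the
  # first URL seen for a given descriptor wins.
  def image_candidates
    @attributes['srcset'].split(',').each_with_object({}) do |candidate, hash|
      candidate.strip.match(/^(.+?)(\s+.+)?$/) do |match|
        key = (match[2] || '1x').strip.to_sym
        hash[key] ||= match[1]
      end
    end
  end
end

image = ImageElementSketch.new(
  { 'alt' => ' Jason Garber ', 'srcset' => 'jason.jpg, jason-2x.jpg 2x, other.jpg 2x' },
  'https://example.com/jason.jpg'
)

p image.to_h
```

Called with neither `alt` nor `srcset`, this sketch's `to_h` returns only `{ value: ... }`; in the gem itself, the property parsers skip `ImageElementParser` in that case and return the bare URL string.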
data/lib/micro_micro/parsers/implied_photo_property_parser.rb
CHANGED
@@ -19,12 +19,11 @@ module MicroMicro
 def value
 @value ||=
 if attribute_value
-
-
-
-
-
-}
+if candidate_node.matches?('img[alt], img[srcset]')
+ImageElementParser.new(candidate_node, attribute_value).to_h
+else
+attribute_value
+end
 end
 end
 
data/lib/micro_micro/parsers/url_property_parser.rb
CHANGED
@@ -23,11 +23,8 @@ module MicroMicro
 # @return [String, Hash{Symbol => String}]
 def value
 @value ||=
-if node.matches?('img[alt]')
-
-value: resolved_value,
-alt: node['alt'].strip
-}
+if node.matches?('img[alt], img[srcset]')
+ImageElementParser.new(node, resolved_value).to_h
 else
 resolved_value
 end
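With the selector widened from `img[alt]` to `img[alt], img[srcset]`, an `<img>` that carries `srcset` but no `alt` now yields a Hash instead of a plain URL String. Code that expects a String may want to unwrap it; `photo_url` below is a hypothetical consumer-side helper showing one way to do that, not part of MicroMicro's API.

```ruby
# Hypothetical consumer-side helper (not part of MicroMicro): accept either
# shape a URL property value can take and return the URL string.
def photo_url(property_value)
  property_value.is_a?(Hash) ? property_value[:value] : property_value
end

p photo_url('https://example.com/photo.jpg')
p photo_url({ value: 'https://example.com/photo.jpg', srcset: { '2x': 'https://example.com/photo-2x.jpg' } })
```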
data/lib/micro_micro/property.rb
CHANGED
@@ -152,8 +152,8 @@ module MicroMicro
 
 return hash.merge(parser.value) if embedded_markup_property?
 
-p_property = item.properties.
-u_property = item.properties.
+p_property = item.properties.find_by(name: 'name') if plain_text_property?
+u_property = item.properties.find_by(name: 'url') if url_property?
 
 hash.merge(value: (p_property || u_property || parser).value)
 else
data/lib/micro_micro/relationship.rb
CHANGED
@@ -6,7 +6,7 @@ module MicroMicro
 
 # Extract {MicroMicro::Relationship}s from a context.
 #
-# @param context [Nokogiri::
+# @param context [Nokogiri::HTML5::Document, Nokogiri::XML::Element]
 # @return [Array<MicroMicro::Relationship>]
 def self.from_context(context)
 context.css('[href][rel]:not([rel=""])')
@@ -65,7 +65,7 @@ module MicroMicro
 title: title,
 type: type,
 text: text
-}.
+}.compact_blank!
 end
 
 # An Array of unique values from node's +rel+ attribute.
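`Relationship#to_h` now ends in ActiveSupport's `compact_blank!`, which drops nil and blank values from the Hash in place. The stdlib approximation below is a sketch: `blank_value?` only covers the value types used here, and the example Hash is made up. It uses `delete_if`, as ActiveSupport's implementation does, because `Hash#reject!` returns nil when nothing is removed, which would break a method whose last expression is the bang call.

```ruby
# Rough stdlib stand-in for ActiveSupport's Object#blank?, covering only
# the value types in the example below.
def blank_value?(value)
  case value
  when nil, false then true
  when String then value.match?(/\A[[:space:]]*\z/)
  when Array, Hash then value.empty?
  else false
  end
end

# Sketch of Hash#compact_blank!: delete_if always returns the receiver,
# whereas reject! would return nil when nothing was removed.
def compact_blank!(hash)
  hash.delete_if { |_key, value| blank_value?(value) }
end

# Made-up relationship values for illustration.
relationship = {
  href: 'https://example.com/',
  rels: ['me'],
  title: '   ',
  type: nil,
  text: 'Jason Garber'
}

p compact_blank!(relationship)
```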
data/lib/micro_micro/version.rb
CHANGED
data/lib/micromicro.rb
CHANGED
@@ -3,9 +3,9 @@
 require 'forwardable'
 
 require 'active_support/core_ext/array/grouping'
+require 'active_support/core_ext/enumerable'
 require 'active_support/core_ext/hash/deep_transform_values'
-require 'active_support/core_ext/hash/
-require 'active_support/core_ext/hash/slice'
+require 'active_support/core_ext/hash/except'
 require 'active_support/core_ext/object/blank'
 require 'nokogiri'
 require 'nokogiri/html-ext'
@@ -15,6 +15,7 @@ require_relative 'micro_micro/collectible'
 require_relative 'micro_micro/helpers'
 
 require_relative 'micro_micro/parsers/date_time_parser'
+require_relative 'micro_micro/parsers/image_element_parser'
 require_relative 'micro_micro/parsers/value_class_pattern_parser'
 
 require_relative 'micro_micro/parsers/base_property_parser'
data/micromicro.gemspec
CHANGED
@@ -28,6 +28,6 @@ Gem::Specification.new do |spec|
 }
 
 spec.add_runtime_dependency 'activesupport', '~> 7.0'
-spec.add_runtime_dependency 'nokogiri', '>= 1.
-spec.add_runtime_dependency 'nokogiri-html-ext', '~> 0.
+spec.add_runtime_dependency 'nokogiri', '>= 1.14'
+spec.add_runtime_dependency 'nokogiri-html-ext', '~> 0.4.0'
 end
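The gemspec now requires Nokogiri 1.14 or later and pins nokogiri-html-ext with the pessimistic operator. RubyGems' own `Gem::Requirement` can show what those constraint strings accept; the version numbers checked below are arbitrary examples.

```ruby
require 'rubygems'

# The two Nokogiri-related runtime constraints from the 4.0.0 gemspec.
NOKOGIRI_REQUIREMENT = Gem::Requirement.new('>= 1.14')
HTML_EXT_REQUIREMENT = Gem::Requirement.new('~> 0.4.0')

def allowed?(requirement, version)
  requirement.satisfied_by?(Gem::Version.new(version))
end

p allowed?(NOKOGIRI_REQUIREMENT, '1.13.10') # false: below the new floor
p allowed?(NOKOGIRI_REQUIREMENT, '1.15.0')  # true
p allowed?(HTML_EXT_REQUIREMENT, '0.4.3')   # true: patch releases are allowed
p allowed?(HTML_EXT_REQUIREMENT, '0.5.0')   # false: ~> 0.4.0 means >= 0.4.0 and < 0.5
```

The Nokogiri constraint has no upper bound, so any later major release also satisfies it, while nokogiri-html-ext is held to the 0.4.x series.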
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: micromicro
 version: !ruby/object:Gem::Version
-version:
+version: 4.0.0
 platform: ruby
 authors:
 - Jason Garber
 autorequire:
 bindir: bin
 cert_chain: []
-date:
+date: 2023-03-14 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
 name: activesupport
@@ -30,28 +30,28 @@ dependencies:
 requirements:
 - - ">="
 - !ruby/object:Gem::Version
-version: '1.
+version: '1.14'
 type: :runtime
 prerelease: false
 version_requirements: !ruby/object:Gem::Requirement
 requirements:
 - - ">="
 - !ruby/object:Gem::Version
-version: '1.
+version: '1.14'
 - !ruby/object:Gem::Dependency
 name: nokogiri-html-ext
 requirement: !ruby/object:Gem::Requirement
 requirements:
 - - "~>"
 - !ruby/object:Gem::Version
-version: 0.
+version: 0.4.0
 type: :runtime
 prerelease: false
 version_requirements: !ruby/object:Gem::Requirement
 requirements:
 - - "~>"
 - !ruby/object:Gem::Version
-version: 0.
+version: 0.4.0
 description: Extract microformats2-encoded data from HTML documents.
 email:
 - jason@sixtwothree.org
@@ -77,6 +77,7 @@ files:
 - lib/micro_micro/parsers/date_time_parser.rb
 - lib/micro_micro/parsers/date_time_property_parser.rb
 - lib/micro_micro/parsers/embedded_markup_property_parser.rb
+- lib/micro_micro/parsers/image_element_parser.rb
 - lib/micro_micro/parsers/implied_name_property_parser.rb
 - lib/micro_micro/parsers/implied_photo_property_parser.rb
 - lib/micro_micro/parsers/implied_url_property_parser.rb
@@ -93,7 +94,7 @@ licenses:
 - MIT
 metadata:
 bug_tracker_uri: https://github.com/jgarber623/micromicro/issues
-changelog_uri: https://github.com/jgarber623/micromicro/blob/
+changelog_uri: https://github.com/jgarber623/micromicro/blob/v4.0.0/CHANGELOG.md
 rubygems_mfa_required: 'true'
 post_install_message:
 rdoc_options: []
@@ -113,7 +114,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
 - !ruby/object:Gem::Version
 version: '0'
 requirements: []
-rubygems_version: 3.3.
+rubygems_version: 3.3.26
 signing_key:
 specification_version: 4
 summary: Extract microformats2-encoded data from HTML documents.