nokogiri-html-ext 0.1.0
- checksums.yaml +7 -0
- data/CHANGELOG.md +5 -0
- data/CONTRIBUTING.md +37 -0
- data/LICENSE +21 -0
- data/README.md +139 -0
- data/lib/nokogiri/html-ext.rb +5 -0
- data/lib/nokogiri/html_ext/document.rb +110 -0
- data/lib/nokogiri/html_ext/version.rb +7 -0
- data/nokogiri-html-ext.gemspec +31 -0
- metadata +68 -0
checksums.yaml
ADDED
@@ -0,0 +1,7 @@
+---
+SHA256:
+  metadata.gz: b391309327d894afb0495dd2fcbe3e3555a0c41bd21937af25a134e0ad1d27ec
+  data.tar.gz: 16e6cd74fd4485610a239edd904318973125de3225a1a5beca69a311fb79eda5
+SHA512:
+  metadata.gz: dc29f44a0ed51d136b5c7e44ff646bfc6d024ac66308b7f539ce64e3ceaad3fe78faed345842946ed538ab4d4dd673a5d831a1cf2f1cc9689c429689c5815c0f
+  data.tar.gz: 2e4ec5bba2a5678f854194ea5d1c1c3a054fd5ae11e62d61f6992f0a52afe87ba239abc6f5114a5e6494a3e1cab1fac0f42a9fe81a79e490f21b77bd63e24161
data/CHANGELOG.md
ADDED
data/CONTRIBUTING.md
ADDED
@@ -0,0 +1,37 @@
+# Contributing to nokogiri-html-ext
+
+There are a couple of ways you can help improve nokogiri-html-ext:
+
+1. Fix an existing [issue][issues] and submit a [pull request][pulls].
+1. Review open [pull requests][pulls].
+1. Report a new [issue][issues]. _Only do this after you've made sure the behavior or problem you're observing isn't already documented in an open issue._
+
+## Getting Started
+
+nokogiri-html-ext is developed using Ruby 2.7.6 and is additionally tested against Ruby 3.0 and 3.1 using [GitHub Actions](https://github.com/jgarber623/nokogiri-html-ext/actions).
+
+Before making changes to nokogiri-html-ext, you'll want to install Ruby 2.7.6. It's recommended that you use a Ruby version management tool like [rbenv](https://github.com/rbenv/rbenv), [chruby](https://github.com/postmodern/chruby), or [rvm](https://github.com/rvm/rvm). Once you've installed Ruby 2.7.6 using your method of choice, install the project's gems by running:
+
+```sh
+bundle install
+```
+
+## Making Changes
+
+1. Fork and clone the project's repo.
+1. Install development dependencies as outlined above.
+1. Create a feature branch for the code changes you're looking to make: `git checkout -b my-new-feature`.
+1. _Write some code!_
+1. If your changes would benefit from testing, add the necessary tests and verify everything passes by running `bundle exec rspec`.
+1. Commit your changes: `git commit -am 'Add some new feature or fix some issue'`. _(See [this excellent article](https://chris.beams.io/posts/git-commit/) for tips on writing useful Git commit messages.)_
+1. Push the branch to your fork: `git push -u origin my-new-feature`.
+1. Create a new [pull request][pulls] and we'll review your changes.
+
+## Code Style
+
+Code formatting conventions are defined in the `.editorconfig` file which uses the [EditorConfig](http://editorconfig.org) syntax. There are [plugins for a variety of editors](http://editorconfig.org/#download) that utilize the settings in the `.editorconfig` file. We recommend you install the EditorConfig plugin for your editor of choice.
+
+Your bug fix or feature addition won't be rejected if it runs afoul of any (or all) of these guidelines, but following the guidelines will definitely make everyone's lives a little easier.
+
+[issues]: https://github.com/jgarber623/nokogiri-html-ext/issues
+[pulls]: https://github.com/jgarber623/nokogiri-html-ext/pulls
data/LICENSE
ADDED
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2022 Jason Garber
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
data/README.md
ADDED
@@ -0,0 +1,139 @@
+# nokogiri-html-ext
+
+**A Ruby gem extending [Nokogiri](https://nokogiri.org) with several useful HTML-centric features.**
+
+[](https://rubygems.org/gems/nokogiri-html-ext)
+[](https://rubygems.org/gems/nokogiri-html-ext)
+[](https://github.com/jgarber623/nokogiri-html-ext/actions/workflows/ci.yml)
+[](https://codeclimate.com/github/jgarber623/nokogiri-html-ext)
+[](https://codeclimate.com/github/jgarber623/nokogiri-html-ext/code)
+
+## Key features
+
+- Resolves all relative URLs in a Nokogiri-parsed HTML document.
+- Adds helpers for getting and setting a document's `<base>` element's `href` attribute.
+- Supports Ruby 2.7 and newer.
+
+## Getting Started
+
+Before installing and using nokogiri-html-ext, you'll want to have [Ruby](https://www.ruby-lang.org) 2.7 (or newer) installed. It's recommended that you use a Ruby version management tool like [rbenv](https://github.com/rbenv/rbenv), [chruby](https://github.com/postmodern/chruby), or [rvm](https://github.com/rvm/rvm).
+
+nokogiri-html-ext is developed using Ruby 2.7.6 and is additionally tested against Ruby 3.0 and 3.1 using [GitHub Actions](https://github.com/jgarber623/nokogiri-html-ext/actions).
+
+## Installation
+
+If you're using [Bundler](https://bundler.io) to manage gem dependencies, add nokogiri-html-ext to your project's Gemfile:
+
+```ruby
+gem 'nokogiri-html-ext'
+```
+
+…and run `bundle install` in your shell.
+
+To install the gem manually, run the following in your shell:
+
+```sh
+gem install nokogiri-html-ext
+```
+
+## Usage
+
+### `base_href`
+
+nokogiri-html-ext provides two helper methods for getting and setting a document's `<base>` element's `href` attribute. The first, `base_href`, retrieves the element's `href` attribute value if it exists.
+
+```ruby
+doc = Nokogiri::HTML('<html><body>Hello, world!</body></html>')
+
+doc.base_href
+#=> nil
+
+doc = Nokogiri::HTML('<html><head><base target="_top"><body>Hello, world!</body></html>')
+
+doc.base_href
+#=> nil
+
+doc = Nokogiri::HTML('<html><head><base href="/foo"><body>Hello, world!</body></html>')
+
+doc.base_href
+#=> "/foo"
+```
+
+The `base_href=` method allows you to manipulate the document's `<base>` element.
+
+```ruby
+doc = Nokogiri::HTML('<html><body>Hello, world!</body></html>')
+
+doc.base_href = '/foo'
+#=> "/foo"
+
+doc.at_css('base').to_s
+#=> "<base href=\"/foo\">"
+
+doc = Nokogiri::HTML('<html><head><base href="/foo"><body>Hello, world!</body></html>')
+
+doc.base_href = '/bar'
+#=> "/bar"
+
+doc.at_css('base').to_s
+#=> "<base href=\"/bar\">"
+```
+
+### `resolve_relative_urls!`
+
+nokogiri-html-ext will resolve a document's relative URLs against a provided source URL. The source URL _should_ be an absolute URL (e.g. `https://jgarber.example`) representing the location of the document being parsed. The source URL _may_ be any `String` (or any Ruby object that responds to `#to_s`).
+
+nokogiri-html-ext takes advantage of [the `Nokogiri::XML::Document.parse` method](https://github.com/sparklemotion/nokogiri/blob/main/lib/nokogiri/xml/document.rb#L48)'s second positional argument to set the parsed document's URL. Nokogiri's source code is _very_ complex, but in short: [the `Nokogiri::HTML` method](https://github.com/sparklemotion/nokogiri/blob/main/lib/nokogiri/html.rb#L7-L8) is an alias to [the `Nokogiri::HTML4` method](https://github.com/sparklemotion/nokogiri/blob/main/lib/nokogiri/html4.rb#L10-L12) which eventually winds its way to the aforementioned `Nokogiri::XML::Document.parse` method. _Phew._ 🥵
+
+URL resolution uses Ruby's built-in URL parsing and normalizing capabilities. Absolute URLs will remain unmodified.
+
+**Note:** If the document's markup includes a `<base>` element whose `href` attribute is an absolute URL, _that_ URL will take precedence when performing URL resolution.
+
+An abbreviated example:
+
+```ruby
+markup = <<-HTML
+<html>
+<body>
+<a href="/home">Home</a>
+<img src="/foo.png" srcset="../bar.png 720w">
+</body>
+</html>
+HTML
+
+doc = Nokogiri::HTML(markup, 'https://jgarber.example')
+
+doc.url
+#=> "https://jgarber.example"
+
+doc.base_href
+#=> nil
+
+doc.base_href = '/foo/bar/biz'
+#=> "/foo/bar/biz"
+
+doc.resolve_relative_urls!
+
+doc.at_css('base')['href']
+#=> "https://jgarber.example/foo/bar/biz"
+
+doc.at_css('a')['href']
+#=> "https://jgarber.example/home"
+
+doc.at_css('img').to_s
+#=> "<img src=\"https://jgarber.example/foo.png\" srcset=\"https://jgarber.example/foo/bar.png 720w\">"
+```
+
+## Contributing
+
+Interested in helping improve nokogiri-html-ext? Awesome! Your help is greatly appreciated. See [CONTRIBUTING.md](https://github.com/jgarber623/nokogiri-html-ext/blob/main/CONTRIBUTING.md) for details.
+
+## Acknowledgments
+
+nokogiri-html-ext wouldn't exist without the [Nokogiri](https://nokogiri.org) project and its [community](https://github.com/sparklemotion/nokogiri).
+
+nokogiri-html-ext is written and maintained by [Jason Garber](https://sixtwothree.org).
+
+## License
+
+nokogiri-html-ext is freely available under the MIT License. Use it, learn from it, fork it, improve it, change it, tailor it to your needs.
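The README's claim that URL resolution relies on Ruby's built-in URL handling can be illustrated with the standard library alone. The following is a minimal sketch, not the gem's code: `resolve_url` is a hypothetical helper, and it skips the escape/unescape step the gem applies around the join. It chains the document URL, an optional `<base>` href, and the target URL through `URI.join`, which is why an absolute `<base>` href overrides the document URL.

```ruby
require 'uri'

# Hypothetical helper (not part of nokogiri-html-ext): resolves `url` against
# the document URL and an optional <base> href using Ruby's URI library.
def resolve_url(document_url, base_href, url)
  # nil entries (for example, a missing <base> href) are dropped before joining.
  URI.join(*[document_url, base_href, url].compact).normalize.to_s
end

# No <base> element: the URL resolves against the document URL.
puts resolve_url('https://jgarber.example', nil, '/home')

# A relative <base> href is resolved first, then the URL against the result.
puts resolve_url('https://jgarber.example', '/foo/bar/biz', '../bar.png')

# An absolute <base> href replaces the document URL as the resolution base.
puts resolve_url('https://jgarber.example', 'https://cdn.example/assets/', 'logo.png')
```

The second call reproduces the `srcset` result from the README example: `../bar.png` resolved against `/foo/bar/biz` yields `/foo/bar.png`.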
data/lib/nokogiri/html_ext/document.rb
ADDED
@@ -0,0 +1,110 @@
+# frozen_string_literal: true
+
+require 'nokogiri'
+
+module Nokogiri
+  module HTML4
+    class Document < Nokogiri::XML::Document
+      # A map of HTML `srcset` attributes and their associated element names.
+      #
+      # @see https://html.spec.whatwg.org/#srcset-attributes
+      # @see https://html.spec.whatwg.org/#attributes-3
+      IMAGE_CANDIDATE_STRINGS_ATTRIBUTES_MAP = {
+        'imagesrcset' => %w[link],
+        'srcset' => %w[img source]
+      }.freeze
+
+      private_constant :IMAGE_CANDIDATE_STRINGS_ATTRIBUTES_MAP
+
+      # A map of HTML URL attributes and their associated element names.
+      #
+      # @see https://html.spec.whatwg.org/#attributes-3
+      URL_ATTRIBUTES_MAP = {
+        'action' => %w[form],
+        'cite' => %w[blockquote del ins q],
+        'data' => %w[object],
+        'formaction' => %w[button input],
+        'href' => %w[a area base link],
+        'ping' => %w[a area],
+        'poster' => %w[video],
+        'src' => %w[audio embed iframe img input script source track video]
+      }.freeze
+
+      private_constant :URL_ATTRIBUTES_MAP
+
+      # Get the <base> element's HREF attribute value.
+      #
+      # @return [String, nil]
+      def base_href
+        (base = at_xpath('//base[@href]')) && base['href'].strip
+      end
+
+      # Set the <base> element's HREF attribute value.
+      #
+      # If a <base> element exists, its HREF attribute value is replaced with
+      # the given value. If no <base> element exists, this method will create
+      # one and append it to the document's <head> (creating that element if
+      # necessary).
+      #
+      # @param url [String, #to_s]
+      #
+      # @return [String]
+      def base_href=(url)
+        url_str = url.to_s
+
+        if (base = at_xpath('//base'))
+          base['href'] = url_str
+          url_str
+        else
+          base = XML::Node.new('base', self)
+          base['href'] = url_str
+
+          set_metadata_element(base)
+        end
+      end
+
+      # Convert the document's relative URLs to absolute URLs.
+      #
+      # @return [self]
+      #
+      # rubocop:disable Style/PerlBackrefs
+      def resolve_relative_urls!
+        resolve_relative_urls_for(URL_ATTRIBUTES_MAP) { |value| resolve_relative_url(value.strip) }
+
+        resolve_relative_urls_for(IMAGE_CANDIDATE_STRINGS_ATTRIBUTES_MAP) do |value|
+          value
+            .split(',')
+            .map { |candidate| candidate.strip.sub(/^(.+?)(\s+.+)?$/) { "#{resolve_relative_url($1)}#{$2}" } }
+            .join(', ')
+        end
+
+        self
+      end
+      # rubocop:enable Style/PerlBackrefs
+
+      private
+
+      def resolve_relative_url(url)
+        uri_parser.unescape(
+          uri_parser.join(*[document.url.strip, base_href, url].compact.map { |u| uri_parser.escape(u) })
+            .normalize
+            .to_s
+        )
+      end
+
+      def resolve_relative_urls_for(attributes_map)
+        attributes_map.each do |attribute, names|
+          xpaths = names.map { |name| "//#{name}[@#{attribute}]" }
+
+          xpath(*xpaths).each do |node|
+            node[attribute] = yield node[attribute]
+          end
+        end
+      end
+
+      def uri_parser
+        @uri_parser ||= URI::DEFAULT_PARSER
+      end
+    end
+  end
+end
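The `srcset` branch of `resolve_relative_urls!` splits the attribute value on commas and rewrites only the URL portion of each image candidate, leaving any width or density descriptor untouched. A standalone sketch of that step follows, assuming only the standard library: `resolve_srcset` and `CANDIDATE_PATTERN` are hypothetical names, the pattern is the same one used in the file above with named groups added, and `URI.join` stands in for the gem's private `resolve_relative_url`.

```ruby
require 'uri'

# Same shape as the pattern in document.rb, with named groups for readability:
# a lazily matched URL followed by an optional whitespace-led descriptor.
CANDIDATE_PATTERN = /^(?<url>.+?)(?<descriptor>\s+.+)?$/.freeze

# Hypothetical helper (not part of nokogiri-html-ext): resolves every image
# candidate URL in a `srcset` value against `base_url`.
def resolve_srcset(srcset, base_url)
  srcset
    .split(',')
    .map do |candidate|
      candidate.strip.sub(CANDIDATE_PATTERN) do
        # $~ holds the match for the current substitution; a missing
        # descriptor is nil and interpolates as an empty string.
        "#{URI.join(base_url, $~[:url])}#{$~[:descriptor]}"
      end
    end
    .join(', ')
end

puts resolve_srcset('../bar.png 720w, /baz.png 2x', 'https://jgarber.example/foo/bar/biz')
```

Because candidates are separated with a plain `split(',')`, a URL that itself contains a comma (a `data:` URI, for instance) would be cut in two; the sketch shares that limitation.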
data/nokogiri-html-ext.gemspec
ADDED
@@ -0,0 +1,31 @@
+# frozen_string_literal: true
+
+require_relative 'lib/nokogiri/html_ext/version'
+
+Gem::Specification.new do |spec|
+  spec.required_ruby_version = '>= 2.7'
+
+  spec.name = 'nokogiri-html-ext'
+  spec.version = Nokogiri::HTMLExt::VERSION
+  spec.authors = ['Jason Garber']
+  spec.email = ['jason@sixtwothree.org']
+
+  spec.summary = 'Extend Nokogiri with several useful HTML-centric features.'
+  spec.description = spec.summary
+  spec.homepage = 'https://github.com/jgarber623/nokogiri-html-ext'
+  spec.license = 'MIT'
+
+  spec.files = Dir['lib/**/*'].reject { |f| File.directory?(f) }
+  spec.files += %w[LICENSE CHANGELOG.md CONTRIBUTING.md README.md]
+  spec.files += %w[nokogiri-html-ext.gemspec]
+
+  spec.require_paths = ['lib']
+
+  spec.metadata = {
+    'bug_tracker_uri' => "#{spec.homepage}/issues",
+    'changelog_uri' => "#{spec.homepage}/blob/v#{spec.version}/CHANGELOG.md",
+    'rubygems_mfa_required' => 'true'
+  }
+
+  spec.add_runtime_dependency 'nokogiri', '>= 1.13'
+end
metadata
ADDED
@@ -0,0 +1,68 @@
+--- !ruby/object:Gem::Specification
+name: nokogiri-html-ext
+version: !ruby/object:Gem::Version
+  version: 0.1.0
+platform: ruby
+authors:
+- Jason Garber
+autorequire:
+bindir: bin
+cert_chain: []
+date: 2022-07-02 00:00:00.000000000 Z
+dependencies:
+- !ruby/object:Gem::Dependency
+  name: nokogiri
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '1.13'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '1.13'
+description: Extend Nokogiri with several useful HTML-centric features.
+email:
+- jason@sixtwothree.org
+executables: []
+extensions: []
+extra_rdoc_files: []
+files:
+- CHANGELOG.md
+- CONTRIBUTING.md
+- LICENSE
+- README.md
+- lib/nokogiri/html-ext.rb
+- lib/nokogiri/html_ext/document.rb
+- lib/nokogiri/html_ext/version.rb
+- nokogiri-html-ext.gemspec
+homepage: https://github.com/jgarber623/nokogiri-html-ext
+licenses:
+- MIT
+metadata:
+  bug_tracker_uri: https://github.com/jgarber623/nokogiri-html-ext/issues
+  changelog_uri: https://github.com/jgarber623/nokogiri-html-ext/blob/v0.1.0/CHANGELOG.md
+  rubygems_mfa_required: 'true'
+post_install_message:
+rdoc_options: []
+require_paths:
+- lib
+required_ruby_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: '2.7'
+required_rubygems_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: '0'
+requirements: []
+rubygems_version: 3.3.16
+signing_key:
+specification_version: 4
+summary: Extend Nokogiri with several useful HTML-centric features.
+test_files: []