nokogiri-html-ext 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+ metadata.gz: b391309327d894afb0495dd2fcbe3e3555a0c41bd21937af25a134e0ad1d27ec
4
+ data.tar.gz: 16e6cd74fd4485610a239edd904318973125de3225a1a5beca69a311fb79eda5
5
+ SHA512:
6
+ metadata.gz: dc29f44a0ed51d136b5c7e44ff646bfc6d024ac66308b7f539ce64e3ceaad3fe78faed345842946ed538ab4d4dd673a5d831a1cf2f1cc9689c429689c5815c0f
7
+ data.tar.gz: 2e4ec5bba2a5678f854194ea5d1c1c3a054fd5ae11e62d61f6992f0a52afe87ba239abc6f5114a5e6494a3e1cab1fac0f42a9fe81a79e490f21b77bd63e24161
data/CHANGELOG.md ADDED
@@ -0,0 +1,5 @@
1
+ # Changelog
2
+
3
+ ## v0.1.0 / 2022-07-01
4
+
5
+ - Initial release! 🎉
data/CONTRIBUTING.md ADDED
@@ -0,0 +1,37 @@
1
+ # Contributing to nokogiri-html-ext
2
+
3
+ There are a couple of ways you can help improve nokogiri-html-ext:
4
+
5
+ 1. Fix an existing [issue][issues] and submit a [pull request][pulls].
6
+ 1. Review open [pull requests][pulls].
7
+ 1. Report a new [issue][issues]. _Only do this after you've made sure the behavior or problem you're observing isn't already documented in an open issue._
8
+
9
+ ## Getting Started
10
+
11
+ nokogiri-html-ext is developed using Ruby 2.7.6 and is additionally tested against Ruby 3.0 and 3.1 using [GitHub Actions](https://github.com/jgarber623/nokogiri-html-ext/actions).
12
+
13
+ Before making changes to nokogiri-html-ext, you'll want to install Ruby 2.7.6. It's recommended that you use a Ruby version management tool like [rbenv](https://github.com/rbenv/rbenv), [chruby](https://github.com/postmodern/chruby), or [rvm](https://github.com/rvm/rvm). Once you've installed Ruby 2.7.6 using your method of choice, install the project's gems by running:
14
+
15
+ ```sh
16
+ bundle install
17
+ ```
18
+
19
+ ## Making Changes
20
+
21
+ 1. Fork and clone the project's repo.
22
+ 1. Install development dependencies as outlined above.
23
+ 1. Create a feature branch for the code changes you're looking to make: `git checkout -b my-new-feature`.
24
+ 1. _Write some code!_
25
+ 1. If your changes would benefit from testing, add the necessary tests and verify everything passes by running `bundle exec rspec`.
26
+ 1. Commit your changes: `git commit -am 'Add some new feature or fix some issue'`. _(See [this excellent article](https://chris.beams.io/posts/git-commit/) for tips on writing useful Git commit messages.)_
27
+ 1. Push the branch to your fork: `git push -u origin my-new-feature`.
28
+ 1. Create a new [pull request][pulls] and we'll review your changes.
29
+
30
+ ## Code Style
31
+
32
+ Code formatting conventions are defined in the `.editorconfig` file, which uses the [EditorConfig](http://editorconfig.org) syntax. There are [plugins for a variety of editors](http://editorconfig.org/#download) that utilize the settings in the `.editorconfig` file. We recommend installing the EditorConfig plugin for your editor of choice.
33
+
34
+ Your bug fix or feature addition won't be rejected if it runs afoul of any (or all) of these guidelines, but following the guidelines will definitely make everyone's lives a little easier.
35
+
36
+ [issues]: https://github.com/jgarber623/nokogiri-html-ext/issues
37
+ [pulls]: https://github.com/jgarber623/nokogiri-html-ext/pulls
data/LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2022 Jason Garber
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,139 @@
1
+ # nokogiri-html-ext
2
+
3
+ **A Ruby gem extending [Nokogiri](https://nokogiri.org) with several useful HTML-centric features.**
4
+
5
+ [![Gem](https://img.shields.io/gem/v/nokogiri-html-ext.svg?logo=rubygems&style=for-the-badge)](https://rubygems.org/gems/nokogiri-html-ext)
6
+ [![Downloads](https://img.shields.io/gem/dt/nokogiri-html-ext.svg?logo=rubygems&style=for-the-badge)](https://rubygems.org/gems/nokogiri-html-ext)
7
+ [![Build](https://img.shields.io/github/workflow/status/jgarber623/nokogiri-html-ext/CI?logo=github&style=for-the-badge)](https://github.com/jgarber623/nokogiri-html-ext/actions/workflows/ci.yml)
8
+ [![Maintainability](https://img.shields.io/codeclimate/maintainability/jgarber623/nokogiri-html-ext.svg?logo=code-climate&style=for-the-badge)](https://codeclimate.com/github/jgarber623/nokogiri-html-ext)
9
+ [![Coverage](https://img.shields.io/codeclimate/c/jgarber623/nokogiri-html-ext.svg?logo=code-climate&style=for-the-badge)](https://codeclimate.com/github/jgarber623/nokogiri-html-ext/code)
10
+
11
+ ## Key features
12
+
13
+ - Resolves all relative URLs in a Nokogiri-parsed HTML document.
14
+ - Adds helpers for getting and setting a document's `<base>` element's `href` attribute.
15
+ - Supports Ruby 2.7 and newer.
16
+
17
+ ## Getting Started
18
+
19
+ Before installing and using nokogiri-html-ext, you'll want to have [Ruby](https://www.ruby-lang.org) 2.7 (or newer) installed. It's recommended that you use a Ruby version management tool like [rbenv](https://github.com/rbenv/rbenv), [chruby](https://github.com/postmodern/chruby), or [rvm](https://github.com/rvm/rvm).
20
+
21
+ nokogiri-html-ext is developed using Ruby 2.7.6 and is additionally tested against Ruby 3.0 and 3.1 using [GitHub Actions](https://github.com/jgarber623/nokogiri-html-ext/actions).
22
+
23
+ ## Installation
24
+
25
+ If you're using [Bundler](https://bundler.io) to manage gem dependencies, add nokogiri-html-ext to your project's Gemfile:
26
+
27
+ ```ruby
28
+ gem 'nokogiri-html-ext'
29
+ ```
30
+
31
+ …and run `bundle install` in your shell.
32
+
33
+ To install the gem manually, run the following in your shell:
34
+
35
+ ```sh
36
+ gem install nokogiri-html-ext
37
+ ```
38
+
39
+ ## Usage
40
+
41
+ ### `base_href`
42
+
43
+ nokogiri-html-ext provides two helper methods for getting and setting a document's `<base>` element's `href` attribute. The first, `base_href`, retrieves the element's `href` attribute value if it exists.
44
+
45
+ ```ruby
46
+ doc = Nokogiri::HTML('<html><body>Hello, world!</body></html>')
47
+
48
+ doc.base_href
49
+ #=> nil
50
+
51
+ doc = Nokogiri::HTML('<html><head><base target="_top"><body>Hello, world!</body></html>')
52
+
53
+ doc.base_href
54
+ #=> nil
55
+
56
+ doc = Nokogiri::HTML('<html><head><base href="/foo"><body>Hello, world!</body></html>')
57
+
58
+ doc.base_href
59
+ #=> "/foo"
60
+ ```
61
+
62
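+ The `base_href=` method sets the `href` attribute of the document's `<base>` element. If the document has no `<base>` element, one is created and appended to the document's `<head>` (which is itself created if necessary).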
63
+
64
+ ```ruby
65
+ doc = Nokogiri::HTML('<html><body>Hello, world!</body></html>')
66
+
67
+ doc.base_href = '/foo'
68
+ #=> "/foo"
69
+
70
+ doc.at_css('base').to_s
71
+ #=> "<base href=\"/foo\">"
72
+
73
+ doc = Nokogiri::HTML('<html><head><base href="/foo"><body>Hello, world!</body></html>')
74
+
75
+ doc.base_href = '/bar'
76
+ #=> "/bar"
77
+
78
+ doc.at_css('base').to_s
79
+ #=> "<base href=\"/bar\">"
80
+ ```
81
+
82
+ ### `resolve_relative_urls!`
83
+
84
+ nokogiri-html-ext will resolve a document's relative URLs against a provided source URL. The source URL _should_ be an absolute URL (e.g. `https://jgarber.example`) representing the location of the document being parsed. The source URL _may_ be any `String` (or any Ruby object that responds to `#to_s`).
85
+
86
+ nokogiri-html-ext takes advantage of [the `Nokogiri::XML::Document.parse` method](https://github.com/sparklemotion/nokogiri/blob/main/lib/nokogiri/xml/document.rb#L48)'s second positional argument to set the parsed document's URL. Nokogiri's source code is _very_ complex, but in short: [the `Nokogiri::HTML` method](https://github.com/sparklemotion/nokogiri/blob/main/lib/nokogiri/html.rb#L7-L8) is an alias to [the `Nokogiri::HTML4` method](https://github.com/sparklemotion/nokogiri/blob/main/lib/nokogiri/html4.rb#L10-L12), which eventually winds its way to the aforementioned `Nokogiri::XML::Document.parse` method. _Phew._ 🥵
87
+
88
+ URL resolution uses Ruby's built-in `URI` library (`URI::DEFAULT_PARSER`) to join and normalize URLs. Absolute URLs are not resolved against the document's URL, though they are still normalized.
89
+
90
+ **Note:** If the document's markup includes a `<base>` element whose `href` attribute is an absolute URL, _that_ URL will take precedence when performing URL resolution.
91
+
92
+ An abbreviated example:
93
+
94
+ ```ruby
95
+ markup = <<-HTML
96
+ <html>
97
+ <body>
98
+ <a href="/home">Home</a>
99
+ <img src="/foo.png" srcset="../bar.png 720w">
100
+ </body>
101
+ </html>
102
+ HTML
103
+
104
+ doc = Nokogiri::HTML(markup, 'https://jgarber.example')
105
+
106
+ doc.url
107
+ #=> "https://jgarber.example"
108
+
109
+ doc.base_href
110
+ #=> nil
111
+
112
+ doc.base_href = '/foo/bar/biz'
113
+ #=> "/foo/bar/biz"
114
+
115
+ doc.resolve_relative_urls!
116
+
117
+ doc.at_css('base')['href']
118
+ #=> "https://jgarber.example/foo/bar/biz"
119
+
120
+ doc.at_css('a')['href']
121
+ #=> "https://jgarber.example/home"
122
+
123
+ doc.at_css('img').to_s
124
+ #=> "<img src=\"https://jgarber.example/foo.png\" srcset=\"https://jgarber.example/foo/bar.png 720w\">"
125
+ ```
126
+
127
+ ## Contributing
128
+
129
+ Interested in helping improve nokogiri-html-ext? Awesome! Your help is greatly appreciated. See [CONTRIBUTING.md](https://github.com/jgarber623/nokogiri-html-ext/blob/main/CONTRIBUTING.md) for details.
130
+
131
+ ## Acknowledgments
132
+
133
+ nokogiri-html-ext wouldn't exist without the [Nokogiri](https://nokogiri.org) project and its [community](https://github.com/sparklemotion/nokogiri).
134
+
135
+ nokogiri-html-ext is written and maintained by [Jason Garber](https://sixtwothree.org).
136
+
137
+ ## License
138
+
139
+ nokogiri-html-ext is freely available under the MIT License. Use it, learn from it, fork it, improve it, change it, tailor it to your needs.
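The README describes relative URLs being resolved against the document's URL and, when present, the `<base>` element's `href`, with an absolute `<base>` URL taking precedence. The following is a minimal sketch of that resolution order using only Ruby's standard library. It mirrors the chained join in `resolve_relative_url` in `document.rb`, but it is an illustration, not the gem's code: the `resolve` helper is hypothetical, it calls `URI.join` rather than `URI::DEFAULT_PARSER.join`, and it skips the gem's escape/unescape round trip, so results may differ for unusual input.

```ruby
require 'uri'

# Hypothetical helper: resolve `url` against the document URL and an
# optional <base href> value, in that order. nil values are skipped, and
# each later value is merged onto the result of the earlier ones.
def resolve(document_url, base_href, url)
  parts = [document_url, base_href, url].compact
  URI.join(*parts).normalize.to_s
end

# A relative <base href> is first resolved against the document URL.
puts resolve('https://jgarber.example', '/foo/bar/biz', '../bar.png')
# => https://jgarber.example/foo/bar.png

# An absolute <base href> takes precedence over the document URL.
puts resolve('https://jgarber.example', 'https://cdn.example/assets/', 'img.png')
# => https://cdn.example/assets/img.png

# An absolute URL keeps pointing where it already points.
puts resolve('https://jgarber.example', nil, 'https://other.example/x')
# => https://other.example/x
```

Because each step is an RFC 3986 reference merge, any absolute URL later in the chain simply replaces whatever came before it, which is why an absolute `<base href>` wins.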
data/lib/nokogiri/html-ext.rb ADDED
@@ -0,0 +1,5 @@
1
+ # frozen_string_literal: true
2
+
3
+ require_relative 'html_ext/version'
4
+
5
+ require_relative 'html_ext/document'
data/lib/nokogiri/html_ext/document.rb ADDED
@@ -0,0 +1,110 @@
1
+ # frozen_string_literal: true
2
+
3
+ require 'nokogiri'
4
+
5
+ module Nokogiri
6
+ module HTML4
7
+ class Document < Nokogiri::XML::Document
8
+ # A map of HTML `srcset` attributes and their associated element names.
9
+ #
10
+ # @see https://html.spec.whatwg.org/#srcset-attributes
11
+ # @see https://html.spec.whatwg.org/#attributes-3
12
+ IMAGE_CANDIDATE_STRINGS_ATTRIBUTES_MAP = {
13
+ 'imagesrcset' => %w[link],
14
+ 'srcset' => %w[img source]
15
+ }.freeze
16
+
17
+ private_constant :IMAGE_CANDIDATE_STRINGS_ATTRIBUTES_MAP
18
+
19
+ # A map of HTML URL attributes and their associated element names.
20
+ #
21
+ # @see https://html.spec.whatwg.org/#attributes-3
22
+ URL_ATTRIBUTES_MAP = {
23
+ 'action' => %w[form],
24
+ 'cite' => %w[blockquote del ins q],
25
+ 'data' => %w[object],
26
+ 'formaction' => %w[button input],
27
+ 'href' => %w[a area base link],
28
+ 'ping' => %w[a area],
29
+ 'poster' => %w[video],
30
+ 'src' => %w[audio embed iframe img input script source track video]
31
+ }.freeze
32
+
33
+ private_constant :URL_ATTRIBUTES_MAP
34
+
35
+ # Get the <base> element's HREF attribute value.
36
+ #
37
+ # @return [String, nil]
38
+ def base_href
39
+ (base = at_xpath('//base[@href]')) && base['href'].strip
40
+ end
41
+
42
+ # Set the <base> element's HREF attribute value.
43
+ #
44
+ # If a <base> element exists, its HREF attribute value is replaced with
45
+ # the given value. If no <base> element exists, this method will create
46
+ # one and append it to the document's <head> (creating that element if
47
+ # necessary).
48
+ #
49
+ # @param url [String, #to_s]
50
+ #
51
+ # @return [String]
52
+ def base_href=(url)
53
+ url_str = url.to_s
54
+
55
+ if (base = at_xpath('//base'))
56
+ base['href'] = url_str
57
+ url_str
58
+ else
59
+ base = XML::Node.new('base', self)
60
+ base['href'] = url_str
61
+
62
+ set_metadata_element(base)
63
+ end
64
+ end
65
+
66
+ # Convert the document's relative URLs to absolute URLs.
67
+ #
68
+ # @return [self]
69
+ #
70
+ # rubocop:disable Style/PerlBackrefs
71
+ def resolve_relative_urls!
72
+ resolve_relative_urls_for(URL_ATTRIBUTES_MAP) { |value| resolve_relative_url(value.strip) }
73
+
74
+ resolve_relative_urls_for(IMAGE_CANDIDATE_STRINGS_ATTRIBUTES_MAP) do |value|
75
+ value
76
+ .split(',')
77
+ .map { |candidate| candidate.strip.sub(/^(.+?)(\s+.+)?$/) { "#{resolve_relative_url($1)}#{$2}" } }
78
+ .join(', ')
79
+ end
80
+
81
+ self
82
+ end
83
+ # rubocop:enable Style/PerlBackrefs
84
+
85
+ private
86
+
87
+ def resolve_relative_url(url)
88
+ uri_parser.unescape(
89
+ uri_parser.join(*[document.url.strip, base_href, url].compact.map { |u| uri_parser.escape(u) })
90
+ .normalize
91
+ .to_s
92
+ )
93
+ end
94
+
95
+ def resolve_relative_urls_for(attributes_map)
96
+ attributes_map.each do |attribute, names|
97
+ xpaths = names.map { |name| "//#{name}[@#{attribute}]" }
98
+
99
+ xpath(*xpaths).each do |node|
100
+ node[attribute] = yield node[attribute]
101
+ end
102
+ end
103
+ end
104
+
105
+ def uri_parser
106
+ @uri_parser ||= URI::DEFAULT_PARSER
107
+ end
108
+ end
109
+ end
110
+ end
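`resolve_relative_urls!` rewrites `srcset` and `imagesrcset` values by splitting them on commas, resolving the URL part of each image candidate string, and keeping any width or density descriptor. Below is a standalone sketch of that step using the same regular expression as `document.rb`. The `rewrite_srcset` name and its block-based resolver are hypothetical, and, like the source, the plain comma split assumes candidate URLs contain no commas.

```ruby
require 'uri'

# Hypothetical helper mirroring the srcset handling in
# resolve_relative_urls!: split on commas, pass each candidate's URL to
# the given block, and keep its descriptor (e.g. " 720w" or " 2x") as-is.
def rewrite_srcset(value)
  value
    .split(',')
    .map do |candidate|
      candidate.strip.sub(/^(.+?)(\s+.+)?$/) do
        # Capture both groups before yielding so the resolver cannot
        # clobber the match data.
        url, descriptor = Regexp.last_match.captures
        "#{yield(url)}#{descriptor}"
      end
    end
    .join(', ')
end

base = 'https://jgarber.example/foo/bar/biz'

puts rewrite_srcset('../bar.png 720w, /large.png 1440w') { |url| URI.join(base, url).to_s }
# => https://jgarber.example/foo/bar.png 720w, https://jgarber.example/large.png 1440w
```

The lazy `(.+?)` group stops at the first run of whitespace, so everything after it is treated as the descriptor and passed through unchanged.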
data/lib/nokogiri/html_ext/version.rb ADDED
@@ -0,0 +1,7 @@
1
+ # frozen_string_literal: true
2
+
3
+ module Nokogiri
4
+ module HTMLExt
5
+ VERSION = '0.1.0'
6
+ end
7
+ end
data/nokogiri-html-ext.gemspec ADDED
@@ -0,0 +1,31 @@
1
+ # frozen_string_literal: true
2
+
3
+ require_relative 'lib/nokogiri/html_ext/version'
4
+
5
+ Gem::Specification.new do |spec|
6
+ spec.required_ruby_version = '>= 2.7'
7
+
8
+ spec.name = 'nokogiri-html-ext'
9
+ spec.version = Nokogiri::HTMLExt::VERSION
10
+ spec.authors = ['Jason Garber']
11
+ spec.email = ['jason@sixtwothree.org']
12
+
13
+ spec.summary = 'Extend Nokogiri with several useful HTML-centric features.'
14
+ spec.description = spec.summary
15
+ spec.homepage = 'https://github.com/jgarber623/nokogiri-html-ext'
16
+ spec.license = 'MIT'
17
+
18
+ spec.files = Dir['lib/**/*'].reject { |f| File.directory?(f) }
19
+ spec.files += %w[LICENSE CHANGELOG.md CONTRIBUTING.md README.md]
20
+ spec.files += %w[nokogiri-html-ext.gemspec]
21
+
22
+ spec.require_paths = ['lib']
23
+
24
+ spec.metadata = {
25
+ 'bug_tracker_uri' => "#{spec.homepage}/issues",
26
+ 'changelog_uri' => "#{spec.homepage}/blob/v#{spec.version}/CHANGELOG.md",
27
+ 'rubygems_mfa_required' => 'true'
28
+ }
29
+
30
+ spec.add_runtime_dependency 'nokogiri', '>= 1.13'
31
+ end
metadata ADDED
@@ -0,0 +1,68 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: nokogiri-html-ext
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.1.0
5
+ platform: ruby
6
+ authors:
7
+ - Jason Garber
8
+ autorequire:
9
+ bindir: bin
10
+ cert_chain: []
11
+ date: 2022-07-02 00:00:00.000000000 Z
12
+ dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ name: nokogiri
15
+ requirement: !ruby/object:Gem::Requirement
16
+ requirements:
17
+ - - ">="
18
+ - !ruby/object:Gem::Version
19
+ version: '1.13'
20
+ type: :runtime
21
+ prerelease: false
22
+ version_requirements: !ruby/object:Gem::Requirement
23
+ requirements:
24
+ - - ">="
25
+ - !ruby/object:Gem::Version
26
+ version: '1.13'
27
+ description: Extend Nokogiri with several useful HTML-centric features.
28
+ email:
29
+ - jason@sixtwothree.org
30
+ executables: []
31
+ extensions: []
32
+ extra_rdoc_files: []
33
+ files:
34
+ - CHANGELOG.md
35
+ - CONTRIBUTING.md
36
+ - LICENSE
37
+ - README.md
38
+ - lib/nokogiri/html-ext.rb
39
+ - lib/nokogiri/html_ext/document.rb
40
+ - lib/nokogiri/html_ext/version.rb
41
+ - nokogiri-html-ext.gemspec
42
+ homepage: https://github.com/jgarber623/nokogiri-html-ext
43
+ licenses:
44
+ - MIT
45
+ metadata:
46
+ bug_tracker_uri: https://github.com/jgarber623/nokogiri-html-ext/issues
47
+ changelog_uri: https://github.com/jgarber623/nokogiri-html-ext/blob/v0.1.0/CHANGELOG.md
48
+ rubygems_mfa_required: 'true'
49
+ post_install_message:
50
+ rdoc_options: []
51
+ require_paths:
52
+ - lib
53
+ required_ruby_version: !ruby/object:Gem::Requirement
54
+ requirements:
55
+ - - ">="
56
+ - !ruby/object:Gem::Version
57
+ version: '2.7'
58
+ required_rubygems_version: !ruby/object:Gem::Requirement
59
+ requirements:
60
+ - - ">="
61
+ - !ruby/object:Gem::Version
62
+ version: '0'
63
+ requirements: []
64
+ rubygems_version: 3.3.16
65
+ signing_key:
66
+ specification_version: 4
67
+ summary: Extend Nokogiri with several useful HTML-centric features.
68
+ test_files: []