nokogiri-html-ext 0.1.0

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+ metadata.gz: b391309327d894afb0495dd2fcbe3e3555a0c41bd21937af25a134e0ad1d27ec
4
+ data.tar.gz: 16e6cd74fd4485610a239edd904318973125de3225a1a5beca69a311fb79eda5
5
+ SHA512:
6
+ metadata.gz: dc29f44a0ed51d136b5c7e44ff646bfc6d024ac66308b7f539ce64e3ceaad3fe78faed345842946ed538ab4d4dd673a5d831a1cf2f1cc9689c429689c5815c0f
7
+ data.tar.gz: 2e4ec5bba2a5678f854194ea5d1c1c3a054fd5ae11e62d61f6992f0a52afe87ba239abc6f5114a5e6494a3e1cab1fac0f42a9fe81a79e490f21b77bd63e24161
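
The digests above cover the `metadata.gz` and `data.tar.gz` members of the packaged gem. As a rough sketch of how such a digest could be checked with Ruby's standard `digest` library: the helper name `sha256_matches?` is hypothetical, and the sample input is the well-known `abc` test string, not the gem's actual bytes.

```ruby
require 'digest'

# Hypothetical helper (not part of RubyGems or this gem): returns true when
# the SHA-256 hex digest of the given bytes equals the expected hex string.
def sha256_matches?(bytes, expected_hex)
  Digest::SHA256.hexdigest(bytes) == expected_hex.downcase
end

# Illustration only: 'abc' stands in for bytes that would normally be read
# with File.binread from an extracted data.tar.gz or metadata.gz.
puts sha256_matches?('abc', 'ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad')
```

For a real check, the expected value would be the `SHA256` entry listed for the corresponding file.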
data/CHANGELOG.md ADDED
@@ -0,0 +1,5 @@
1
+ # Changelog
2
+
3
+ ## v0.1.0 / 2022-07-01
4
+
5
+ - Initial release! 🎉
data/CONTRIBUTING.md ADDED
@@ -0,0 +1,37 @@
1
+ # Contributing to nokogiri-html-ext
2
+
3
+ There are a couple of ways you can help improve nokogiri-html-ext:
4
+
5
+ 1. Fix an existing [issue][issues] and submit a [pull request][pulls].
6
+ 1. Review open [pull requests][pulls].
7
+ 1. Report a new [issue][issues]. _Only do this after you've made sure the behavior or problem you're observing isn't already documented in an open issue._
8
+
9
+ ## Getting Started
10
+
11
+ nokogiri-html-ext is developed using Ruby 2.7.6 and is additionally tested against Ruby 3.0 and 3.1 using [GitHub Actions](https://github.com/jgarber623/nokogiri-html-ext/actions).
12
+
13
+ Before making changes to nokogiri-html-ext, you'll want to install Ruby 2.7.6. It's recommended that you use a Ruby version management tool like [rbenv](https://github.com/rbenv/rbenv), [chruby](https://github.com/postmodern/chruby), or [rvm](https://github.com/rvm/rvm). Once you've installed Ruby 2.7.6 using your method of choice, install the project's gems by running:
14
+
15
+ ```sh
16
+ bundle install
17
+ ```
18
+
19
+ ## Making Changes
20
+
21
+ 1. Fork and clone the project's repo.
22
+ 1. Install development dependencies as outlined above.
23
+ 1. Create a feature branch for the code changes you're looking to make: `git checkout -b my-new-feature`.
24
+ 1. _Write some code!_
25
+ 1. If your changes would benefit from testing, add the necessary tests and verify everything passes by running `bundle exec rspec`.
26
+ 1. Commit your changes: `git commit -am 'Add some new feature or fix some issue'`. _(See [this excellent article](https://chris.beams.io/posts/git-commit/) for tips on writing useful Git commit messages.)_
27
+ 1. Push the branch to your fork: `git push -u origin my-new-feature`.
28
+ 1. Create a new [pull request][pulls] and we'll review your changes.
29
+
30
+ ## Code Style
31
+
32
+ Code formatting conventions are defined in the `.editorconfig` file, which uses the [EditorConfig](http://editorconfig.org) syntax. There are [plugins for a variety of editors](http://editorconfig.org/#download) that utilize the settings in the `.editorconfig` file. We recommend you install the EditorConfig plugin for your editor of choice.
33
+
34
+ Your bug fix or feature addition won't be rejected if it runs afoul of any (or all) of these guidelines, but following the guidelines will definitely make everyone's lives a little easier.
35
+
36
+ [issues]: https://github.com/jgarber623/nokogiri-html-ext/issues
37
+ [pulls]: https://github.com/jgarber623/nokogiri-html-ext/pulls
data/LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2022 Jason Garber
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,139 @@
1
+ # nokogiri-html-ext
2
+
3
+ **A Ruby gem extending [Nokogiri](https://nokogiri.org) with several useful HTML-centric features.**
4
+
5
+ [![Gem](https://img.shields.io/gem/v/nokogiri-html-ext.svg?logo=rubygems&style=for-the-badge)](https://rubygems.org/gems/nokogiri-html-ext)
6
+ [![Downloads](https://img.shields.io/gem/dt/nokogiri-html-ext.svg?logo=rubygems&style=for-the-badge)](https://rubygems.org/gems/nokogiri-html-ext)
7
+ [![Build](https://img.shields.io/github/workflow/status/jgarber623/nokogiri-html-ext/CI?logo=github&style=for-the-badge)](https://github.com/jgarber623/nokogiri-html-ext/actions/workflows/ci.yml)
8
+ [![Maintainability](https://img.shields.io/codeclimate/maintainability/jgarber623/nokogiri-html-ext.svg?logo=code-climate&style=for-the-badge)](https://codeclimate.com/github/jgarber623/nokogiri-html-ext)
9
+ [![Coverage](https://img.shields.io/codeclimate/c/jgarber623/nokogiri-html-ext.svg?logo=code-climate&style=for-the-badge)](https://codeclimate.com/github/jgarber623/nokogiri-html-ext/code)
10
+
11
+ ## Key features
12
+
13
+ - Resolves all relative URLs in a Nokogiri-parsed HTML document.
14
+ - Adds helpers for getting and setting a document's `<base>` element's `href` attribute.
15
+ - Supports Ruby 2.7 and newer.
16
+
17
+ ## Getting Started
18
+
19
+ Before installing and using nokogiri-html-ext, you'll want to have [Ruby](https://www.ruby-lang.org) 2.7 (or newer) installed. It's recommended that you use a Ruby version management tool like [rbenv](https://github.com/rbenv/rbenv), [chruby](https://github.com/postmodern/chruby), or [rvm](https://github.com/rvm/rvm).
20
+
21
+ nokogiri-html-ext is developed using Ruby 2.7.6 and is additionally tested against Ruby 3.0 and 3.1 using [GitHub Actions](https://github.com/jgarber623/nokogiri-html-ext/actions).
22
+
23
+ ## Installation
24
+
25
+ If you're using [Bundler](https://bundler.io) to manage gem dependencies, add nokogiri-html-ext to your project's Gemfile:
26
+
27
+ ```ruby
28
+ gem 'nokogiri-html-ext'
29
+ ```
30
+
31
+ …and run `bundle install` in your shell.
32
+
33
+ To install the gem manually, run the following in your shell:
34
+
35
+ ```sh
36
+ gem install nokogiri-html-ext
37
+ ```
38
+
39
+ ## Usage
40
+
41
+ ### `base_href`
42
+
43
+ nokogiri-html-ext provides two helper methods for getting and setting a document's `<base>` element's `href` attribute. The first, `base_href`, retrieves the element's `href` attribute value if it exists.
44
+
45
+ ```ruby
46
+ doc = Nokogiri::HTML('<html><body>Hello, world!</body></html>')
47
+
48
+ doc.base_href
49
+ #=> nil
50
+
51
+ doc = Nokogiri::HTML('<html><head><base target="_top"><body>Hello, world!</body></html>')
52
+
53
+ doc.base_href
54
+ #=> nil
55
+
56
+ doc = Nokogiri::HTML('<html><head><base href="/foo"><body>Hello, world!</body></html>')
57
+
58
+ doc.base_href
59
+ #=> "/foo"
60
+ ```
61
+
62
+ The `base_href=` method allows you to manipulate the document's `<base>` element.
63
+
64
+ ```ruby
65
+ doc = Nokogiri::HTML('<html><body>Hello, world!</body></html>')
66
+
67
+ doc.base_href = '/foo'
68
+ #=> "/foo"
69
+
70
+ doc.at_css('base').to_s
71
+ #=> "<base href=\"/foo\">"
72
+
73
+ doc = Nokogiri::HTML('<html><head><base href="/foo"><body>Hello, world!</body></html>')
74
+
75
+ doc.base_href = '/bar'
76
+ #=> "/bar"
77
+
78
+ doc.at_css('base').to_s
79
+ #=> "<base href=\"/bar\">"
80
+ ```
81
+
82
+ ### `resolve_relative_urls!`
83
+
84
+ nokogiri-html-ext will resolve a document's relative URLs against a provided source URL. The source URL _should_ be an absolute URL (e.g. `https://jgarber.example`) representing the location of the document being parsed. The source URL _may_ be any `String` (or any Ruby object that responds to `#to_s`).
85
+
86
+ nokogiri-html-ext takes advantage of [the `Nokogiri::XML::Document.parse` method](https://github.com/sparklemotion/nokogiri/blob/main/lib/nokogiri/xml/document.rb#L48)'s second positional argument to set the parsed document's URL. Nokogiri's source code is _very_ complex, but in short: [the `Nokogiri::HTML` method](https://github.com/sparklemotion/nokogiri/blob/main/lib/nokogiri/html.rb#L7-L8) is an alias to [the `Nokogiri::HTML4` method](https://github.com/sparklemotion/nokogiri/blob/main/lib/nokogiri/html4.rb#L10-L12) which eventually winds its way to the aforementioned `Nokogiri::XML::Document.parse` method. _Phew._ 🥵
87
+
88
+ URL resolution uses Ruby's built-in URL parsing and normalizing capabilities. Absolute URLs will remain unmodified.
89
+
90
+ **Note:** If the document's markup includes a `<base>` element whose `href` attribute is an absolute URL, _that_ URL will take precedence when performing URL resolution.
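
The resolution order described above can be approximated with Ruby's standard `URI.join`, which merges each argument onto the result of the previous ones. This is a minimal sketch, not the gem's exact implementation (which also escapes, normalizes, and unescapes each URL). The helper name `sketch_resolve` and the `cdn.example` and `other.example` hosts are assumptions made for illustration.

```ruby
require 'uri'

# Hypothetical helper (not part of the gem): merges a URL onto an optional
# <base> href, which is itself merged onto the document's source URL.
def sketch_resolve(document_url, base_href, url)
  URI.join(*[document_url, base_href, url].compact).to_s
end

# A relative <base> href is resolved against the document URL first.
puts sketch_resolve('https://jgarber.example', '/foo/bar/biz', '../bar.png')

# An absolute <base> href (hypothetical host) takes precedence over the document URL.
puts sketch_resolve('https://jgarber.example', 'https://cdn.example/assets/', 'logo.png')

# An absolute URL passes through unchanged.
puts sketch_resolve('https://jgarber.example', '/foo/bar/biz', 'https://other.example/page')
```

Because each step only replaces what the next argument specifies, a fully absolute argument discards everything accumulated before it. This is why an absolute `<base>` href overrides the source URL.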
91
+
92
+ An abbreviated example:
93
+
94
+ ```ruby
95
+ markup = <<-HTML
96
+ <html>
97
+ <body>
98
+ <a href="/home">Home</a>
99
+ <img src="/foo.png" srcset="../bar.png 720w">
100
+ </body>
101
+ </html>
102
+ HTML
103
+
104
+ doc = Nokogiri::HTML(markup, 'https://jgarber.example')
105
+
106
+ doc.url
107
+ #=> "https://jgarber.example"
108
+
109
+ doc.base_href
110
+ #=> nil
111
+
112
+ doc.base_href = '/foo/bar/biz'
113
+ #=> "/foo/bar/biz"
114
+
115
+ doc.resolve_relative_urls!
116
+
117
+ doc.at_css('base')['href']
118
+ #=> "https://jgarber.example/foo/bar/biz"
119
+
120
+ doc.at_css('a')['href']
121
+ #=> "https://jgarber.example/home"
122
+
123
+ doc.at_css('img').to_s
124
+ #=> "<img src=\"https://jgarber.example/foo.png\" srcset=\"https://jgarber.example/foo/bar.png 720w\">"
125
+ ```
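
The `srcset` result in the example above comes from treating the attribute as a comma-separated list of image candidate strings, each made up of a URL and an optional descriptor such as `720w`. The following is a simplified sketch of that idea using only the standard library. The helper name `sketch_resolve_srcset` is hypothetical, and the `/baz.png 2x` candidate is an invented value added to show a second entry.

```ruby
require 'uri'

# Hypothetical helper (not part of the gem): resolves the URL portion of each
# image candidate string against base_url and keeps any descriptor as-is.
def sketch_resolve_srcset(value, base_url)
  value
    .split(',')
    .map do |candidate|
      # Split into the URL and an optional descriptor (e.g. "720w" or "2x").
      url, descriptor = candidate.strip.split(/\s+/, 2)

      [URI.join(base_url, url).to_s, descriptor].compact.join(' ')
    end
    .join(', ')
end

puts sketch_resolve_srcset('../bar.png 720w, /baz.png 2x', 'https://jgarber.example/foo/bar/biz')
```

Splitting on every comma keeps the sketch short, but it would mishandle URLs that themselves contain commas, as well as empty candidates such as those left by a trailing comma.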
126
+
127
+ ## Contributing
128
+
129
+ Interested in helping improve nokogiri-html-ext? Awesome! Your help is greatly appreciated. See [CONTRIBUTING.md](https://github.com/jgarber623/nokogiri-html-ext/blob/main/CONTRIBUTING.md) for details.
130
+
131
+ ## Acknowledgments
132
+
133
+ nokogiri-html-ext wouldn't exist without the [Nokogiri](https://nokogiri.org) project and its [community](https://github.com/sparklemotion/nokogiri).
134
+
135
+ nokogiri-html-ext is written and maintained by [Jason Garber](https://sixtwothree.org).
136
+
137
+ ## License
138
+
139
+ nokogiri-html-ext is freely available under the MIT License. Use it, learn from it, fork it, improve it, change it, tailor it to your needs.
@@ -0,0 +1,5 @@
1
+ # frozen_string_literal: true
2
+
3
+ require_relative 'html_ext/version'
4
+
5
+ require_relative 'html_ext/document'
@@ -0,0 +1,110 @@
1
+ # frozen_string_literal: true
2
+
3
+ require 'nokogiri'
4
+
5
+ module Nokogiri
6
+ module HTML4
7
+ class Document < Nokogiri::XML::Document
8
+ # A map of HTML `srcset` attributes and their associated element names.
9
+ #
10
+ # @see https://html.spec.whatwg.org/#srcset-attributes
11
+ # @see https://html.spec.whatwg.org/#attributes-3
12
+ IMAGE_CANDIDATE_STRINGS_ATTRIBUTES_MAP = {
13
+ 'imagesrcset' => %w[link],
14
+ 'srcset' => %w[img source]
15
+ }.freeze
16
+
17
+ private_constant :IMAGE_CANDIDATE_STRINGS_ATTRIBUTES_MAP
18
+
19
+ # A map of HTML URL attributes and their associated element names.
20
+ #
21
+ # @see https://html.spec.whatwg.org/#attributes-3
22
+ URL_ATTRIBUTES_MAP = {
23
+ 'action' => %w[form],
24
+ 'cite' => %w[blockquote del ins q],
25
+ 'data' => %w[object],
26
+ 'formaction' => %w[button input],
27
+ 'href' => %w[a area base link],
28
+ 'ping' => %w[a area],
29
+ 'poster' => %w[video],
30
+ 'src' => %w[audio embed iframe img input script source track video]
31
+ }.freeze
32
+
33
+ private_constant :URL_ATTRIBUTES_MAP
34
+
35
+ # Get the <base> element's HREF attribute value.
36
+ #
37
+ # @return [String, nil]
38
+ def base_href
39
+ (base = at_xpath('//base[@href]')) && base['href'].strip
40
+ end
41
+
42
+ # Set the <base> element's HREF attribute value.
43
+ #
44
+ # If a <base> element exists, its HREF attribute value is replaced with
45
+ # the given value. If no <base> element exists, this method will create
46
+ # one and append it to the document's <head> (creating that element if
47
+ # necessary).
48
+ #
49
+ # @param url [String, #to_s]
50
+ #
51
+ # @return [String]
52
+ def base_href=(url)
53
+ url_str = url.to_s
54
+
55
+ if (base = at_xpath('//base'))
56
+ base['href'] = url_str
57
+ url_str
58
+ else
59
+ base = XML::Node.new('base', self)
60
+ base['href'] = url_str
61
+
62
+ set_metadata_element(base)
63
+ end
64
+ end
65
+
66
+ # Convert the document's relative URLs to absolute URLs.
67
+ #
68
+ # @return [self]
69
+ #
70
+ # rubocop:disable Style/PerlBackrefs
71
+ def resolve_relative_urls!
72
+ resolve_relative_urls_for(URL_ATTRIBUTES_MAP) { |value| resolve_relative_url(value.strip) }
73
+
74
+ resolve_relative_urls_for(IMAGE_CANDIDATE_STRINGS_ATTRIBUTES_MAP) do |value|
75
+ value
76
+ .split(',')
77
+ .map { |candidate| candidate.strip.sub(/^(.+?)(\s+.+)?$/) { "#{resolve_relative_url($1)}#{$2}" } }
78
+ .join(', ')
79
+ end
80
+
81
+ self
82
+ end
83
+ # rubocop:enable Style/PerlBackrefs
84
+
85
+ private
86
+
87
+ def resolve_relative_url(url)
88
+ uri_parser.unescape(
89
+ uri_parser.join(*[document.url.strip, base_href, url].compact.map { |u| uri_parser.escape(u) })
90
+ .normalize
91
+ .to_s
92
+ )
93
+ end
94
+
95
+ def resolve_relative_urls_for(attributes_map)
96
+ attributes_map.each do |attribute, names|
97
+ xpaths = names.map { |name| "//#{name}[@#{attribute}]" }
98
+
99
+ xpath(*xpaths).each do |node|
100
+ node[attribute] = yield node[attribute]
101
+ end
102
+ end
103
+ end
104
+
105
+ def uri_parser
106
+ @uri_parser ||= URI::DEFAULT_PARSER
107
+ end
108
+ end
109
+ end
110
+ end
@@ -0,0 +1,7 @@
1
+ # frozen_string_literal: true
2
+
3
+ module Nokogiri
4
+ module HTMLExt
5
+ VERSION = '0.1.0'
6
+ end
7
+ end
@@ -0,0 +1,31 @@
1
+ # frozen_string_literal: true
2
+
3
+ require_relative 'lib/nokogiri/html_ext/version'
4
+
5
+ Gem::Specification.new do |spec|
6
+ spec.required_ruby_version = '>= 2.7'
7
+
8
+ spec.name = 'nokogiri-html-ext'
9
+ spec.version = Nokogiri::HTMLExt::VERSION
10
+ spec.authors = ['Jason Garber']
11
+ spec.email = ['jason@sixtwothree.org']
12
+
13
+ spec.summary = 'Extend Nokogiri with several useful HTML-centric features.'
14
+ spec.description = spec.summary
15
+ spec.homepage = 'https://github.com/jgarber623/nokogiri-html-ext'
16
+ spec.license = 'MIT'
17
+
18
+ spec.files = Dir['lib/**/*'].reject { |f| File.directory?(f) }
19
+ spec.files += %w[LICENSE CHANGELOG.md CONTRIBUTING.md README.md]
20
+ spec.files += %w[nokogiri-html-ext.gemspec]
21
+
22
+ spec.require_paths = ['lib']
23
+
24
+ spec.metadata = {
25
+ 'bug_tracker_uri' => "#{spec.homepage}/issues",
26
+ 'changelog_uri' => "#{spec.homepage}/blob/v#{spec.version}/CHANGELOG.md",
27
+ 'rubygems_mfa_required' => 'true'
28
+ }
29
+
30
+ spec.add_runtime_dependency 'nokogiri', '>= 1.13'
31
+ end
metadata ADDED
@@ -0,0 +1,68 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: nokogiri-html-ext
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.1.0
5
+ platform: ruby
6
+ authors:
7
+ - Jason Garber
8
+ autorequire:
9
+ bindir: bin
10
+ cert_chain: []
11
+ date: 2022-07-02 00:00:00.000000000 Z
12
+ dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ name: nokogiri
15
+ requirement: !ruby/object:Gem::Requirement
16
+ requirements:
17
+ - - ">="
18
+ - !ruby/object:Gem::Version
19
+ version: '1.13'
20
+ type: :runtime
21
+ prerelease: false
22
+ version_requirements: !ruby/object:Gem::Requirement
23
+ requirements:
24
+ - - ">="
25
+ - !ruby/object:Gem::Version
26
+ version: '1.13'
27
+ description: Extend Nokogiri with several useful HTML-centric features.
28
+ email:
29
+ - jason@sixtwothree.org
30
+ executables: []
31
+ extensions: []
32
+ extra_rdoc_files: []
33
+ files:
34
+ - CHANGELOG.md
35
+ - CONTRIBUTING.md
36
+ - LICENSE
37
+ - README.md
38
+ - lib/nokogiri/html-ext.rb
39
+ - lib/nokogiri/html_ext/document.rb
40
+ - lib/nokogiri/html_ext/version.rb
41
+ - nokogiri-html-ext.gemspec
42
+ homepage: https://github.com/jgarber623/nokogiri-html-ext
43
+ licenses:
44
+ - MIT
45
+ metadata:
46
+ bug_tracker_uri: https://github.com/jgarber623/nokogiri-html-ext/issues
47
+ changelog_uri: https://github.com/jgarber623/nokogiri-html-ext/blob/v0.1.0/CHANGELOG.md
48
+ rubygems_mfa_required: 'true'
49
+ post_install_message:
50
+ rdoc_options: []
51
+ require_paths:
52
+ - lib
53
+ required_ruby_version: !ruby/object:Gem::Requirement
54
+ requirements:
55
+ - - ">="
56
+ - !ruby/object:Gem::Version
57
+ version: '2.7'
58
+ required_rubygems_version: !ruby/object:Gem::Requirement
59
+ requirements:
60
+ - - ">="
61
+ - !ruby/object:Gem::Version
62
+ version: '0'
63
+ requirements: []
64
+ rubygems_version: 3.3.16
65
+ signing_key:
66
+ specification_version: 4
67
+ summary: Extend Nokogiri with several useful HTML-centric features.
68
+ test_files: []