nokogumbo 2.0.0.pre.alpha → 2.0.4
- checksums.yaml +4 -4
- data/README.md +101 -14
- data/ext/nokogumbo/extconf.rb +7 -2
- data/ext/nokogumbo/nokogumbo.c +630 -235
- data/gumbo-parser/src/ascii.c +42 -0
- data/gumbo-parser/src/ascii.h +91 -7
- data/gumbo-parser/src/char_ref.c +5973 -4601
- data/gumbo-parser/src/char_ref.h +13 -28
- data/gumbo-parser/src/error.c +391 -126
- data/gumbo-parser/src/error.h +63 -125
- data/gumbo-parser/src/gumbo.h +74 -4
- data/gumbo-parser/src/parser.c +1161 -1025
- data/gumbo-parser/src/string_buffer.c +1 -1
- data/gumbo-parser/src/string_buffer.h +1 -1
- data/gumbo-parser/src/token_buffer.c +79 -0
- data/gumbo-parser/src/token_buffer.h +71 -0
- data/gumbo-parser/src/tokenizer.c +1440 -1278
- data/gumbo-parser/src/tokenizer.h +7 -18
- data/gumbo-parser/src/tokenizer_states.h +275 -23
- data/gumbo-parser/src/utf8.c +17 -59
- data/gumbo-parser/src/utf8.h +52 -16
- data/lib/nokogumbo.rb +3 -1
- data/lib/nokogumbo/html5.rb +17 -15
- data/lib/nokogumbo/html5/document.rb +19 -3
- data/lib/nokogumbo/html5/document_fragment.rb +36 -20
- data/lib/nokogumbo/{xml → html5}/node.rb +28 -13
- data/lib/nokogumbo/version.rb +1 -1
- metadata +20 -14
- data/CHANGELOG.md +0 -56
data/CHANGELOG.md
DELETED
@@ -1,56 +0,0 @@
-# Changelog
-
-All notable changes to Nokogumbo will be documented in this file.
-
-The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/)
-and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html).
-
-## [Unreleased]
-### Added
-- Experimental support for errors (it was supported in 1.5.0 but
-  undocumented).
-- Added proper HTML5 serialization.
-- Added option `:max_tree_depth` to control the maximum parse tree depth.
-
-### Changed
-- Integrated [Gumbo parser](https://github.com/google/gumbo-parser) into
-  Nokogumbo. A system version will not be used.
-- The undocumented (but publicly mentioned) `:max_parse_errors` renamed to `:max_errors`;
-  `:max_parse_errors` is deprecated and will go away
-- The various `#parse` and `#fragment` (and `Nokogiri.HTML5`) methods return
-  `Nokogiri::HTML5::Document` and `Nokogiri::HTML5::DocumentFragment` classes
-  rather than `Nokogiri::HTML::Document` and
-  `Nokogiri::HTML::DocumentFragment`.
-- Changed the top-level API to more closely match Nokogiri's while maintaining
-  backwards compatibility. The new APIs are
-  * `Nokogiri::HTML5(html, url = nil, encoding = nil, **options, &block)`
-  * `Nokogiri::HTML5.parse(html, url = nil, encoding = nil, **options, &block)`
-  * `Nokogiri::HTML5::Document.parse(html, url = nil, encoding = nil, **options, &block)`
-  * `Nokogiri::HTML5.fragment(html, encoding = nil, **options)`
-  * `Nokogiri::HTML5::DocumentFragment.parse(html, encoding = nil, **options)`
-  In all cases, `html` can be a string or an `IO` object (something that
-  responds to `#read`). The `url` parameter is entirely for error reporting,
-  as in Nokogiri. The `encoding` parameter only signals what encoding `html`
-  should have on input; the output `Document` or `DocumentFragment` will be in
-  UTF-8. Currently, the only options supported is `:max_errors` which controls
-  the maximum number of reported by `#errors`.
-
-### Deprecated
-- `:max_parse_errors`; use `:max_errors`
-
-### Removed
-
-### Fixed
-- Fixed documents failing to serialize (via `to_html`) if they contain certain
-  `meta` elements that set the `charset`.
-- Documents are now properly marked as UTF-8 after parsing.
-- Fixed `Nokogiri::HTML5.fragment` reporting an error due to a missing
-  `<!DOCTYPE html>`.
-- Fixed crash when input contains U+0000 NULL bytes and error reporting is
-  enabled.
-
-### Security
-- The most recent, released version of Gumbo has a [potential security
-  issue](https://github.com/google/gumbo-parser/pull/375) that could result in
-  a cross-site scripting vulnerability. This has been fixed by integrating
-  Gumbo into Nokogumbo.
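
The deleted changelog notes that `:max_parse_errors` was renamed to `:max_errors`, with the old name deprecated. For callers who still pass the old key, the rename could be handled with a small wrapper along the lines of the sketch below. This is only an illustration in plain Ruby: `normalize_parse_options` is a hypothetical helper, not part of Nokogumbo, and the rule that an explicit `:max_errors` wins over the legacy key is an assumption, not something the changelog states.

```ruby
# Hypothetical helper (not part of Nokogumbo): maps the deprecated
# :max_parse_errors option onto :max_errors before the options are
# handed to a parse call.
def normalize_parse_options(options)
  opts = options.dup # leave the caller's hash untouched

  if opts.key?(:max_parse_errors)
    warn ':max_parse_errors is deprecated; use :max_errors'
    legacy = opts.delete(:max_parse_errors)
    # Assumption: an explicit :max_errors takes precedence over the legacy key.
    opts[:max_errors] = legacy unless opts.key?(:max_errors)
  end

  opts
end

# Example inputs; other options such as :max_tree_depth pass through unchanged.
p normalize_parse_options(max_parse_errors: 10)
p normalize_parse_options(max_errors: 5, max_parse_errors: 10, max_tree_depth: 100)
```

Going by the signatures listed in the changelog, the normalized hash could then be splatted into a call such as `Nokogiri::HTML5.parse(html, **opts)`.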