nokogumbo 2.0.0.pre.alpha → 2.0.4
- checksums.yaml +4 -4
- data/README.md +101 -14
- data/ext/nokogumbo/extconf.rb +7 -2
- data/ext/nokogumbo/nokogumbo.c +630 -235
- data/gumbo-parser/src/ascii.c +42 -0
- data/gumbo-parser/src/ascii.h +91 -7
- data/gumbo-parser/src/char_ref.c +5973 -4601
- data/gumbo-parser/src/char_ref.h +13 -28
- data/gumbo-parser/src/error.c +391 -126
- data/gumbo-parser/src/error.h +63 -125
- data/gumbo-parser/src/gumbo.h +74 -4
- data/gumbo-parser/src/parser.c +1161 -1025
- data/gumbo-parser/src/string_buffer.c +1 -1
- data/gumbo-parser/src/string_buffer.h +1 -1
- data/gumbo-parser/src/token_buffer.c +79 -0
- data/gumbo-parser/src/token_buffer.h +71 -0
- data/gumbo-parser/src/tokenizer.c +1440 -1278
- data/gumbo-parser/src/tokenizer.h +7 -18
- data/gumbo-parser/src/tokenizer_states.h +275 -23
- data/gumbo-parser/src/utf8.c +17 -59
- data/gumbo-parser/src/utf8.h +52 -16
- data/lib/nokogumbo.rb +3 -1
- data/lib/nokogumbo/html5.rb +17 -15
- data/lib/nokogumbo/html5/document.rb +19 -3
- data/lib/nokogumbo/html5/document_fragment.rb +36 -20
- data/lib/nokogumbo/{xml → html5}/node.rb +28 -13
- data/lib/nokogumbo/version.rb +1 -1
- metadata +20 -14
- data/CHANGELOG.md +0 -56
data/CHANGELOG.md
DELETED
@@ -1,56 +0,0 @@
-# Changelog
-
-All notable changes to Nokogumbo will be documented in this file.
-
-The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/)
-and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html).
-
-## [Unreleased]
-### Added
-- Experimental support for errors (it was supported in 1.5.0 but
-  undocumented).
-- Added proper HTML5 serialization.
-- Added option `:max_tree_depth` to control the maximum parse tree depth.
-
-### Changed
-- Integrated [Gumbo parser](https://github.com/google/gumbo-parser) into
-  Nokogumbo. A system version will not be used.
-- The undocumented (but publicly mentioned) `:max_parse_errors` renamed to `:max_errors`;
-  `:max_parse_errors` is deprecated and will go away.
-- The various `#parse` and `#fragment` (and `Nokogiri.HTML5`) methods return
-  `Nokogiri::HTML5::Document` and `Nokogiri::HTML5::DocumentFragment` classes
-  rather than `Nokogiri::HTML::Document` and
-  `Nokogiri::HTML::DocumentFragment`.
-- Changed the top-level API to more closely match Nokogiri's while maintaining
-  backwards compatibility. The new APIs are
-  * `Nokogiri::HTML5(html, url = nil, encoding = nil, **options, &block)`
-  * `Nokogiri::HTML5.parse(html, url = nil, encoding = nil, **options, &block)`
-  * `Nokogiri::HTML5::Document.parse(html, url = nil, encoding = nil, **options, &block)`
-  * `Nokogiri::HTML5.fragment(html, encoding = nil, **options)`
-  * `Nokogiri::HTML5::DocumentFragment.parse(html, encoding = nil, **options)`
-  In all cases, `html` can be a string or an `IO` object (something that
-  responds to `#read`). The `url` parameter is entirely for error reporting,
-  as in Nokogiri. The `encoding` parameter only signals what encoding `html`
-  should have on input; the output `Document` or `DocumentFragment` will be in
-  UTF-8. Currently, the only option supported is `:max_errors`, which controls
-  the maximum number of errors reported by `#errors`.
-
-### Deprecated
-- `:max_parse_errors`; use `:max_errors`
-
-### Removed
-
-### Fixed
-- Fixed documents failing to serialize (via `to_html`) if they contain certain
-  `meta` elements that set the `charset`.
-- Documents are now properly marked as UTF-8 after parsing.
-- Fixed `Nokogiri::HTML5.fragment` reporting an error due to a missing
-  `<!DOCTYPE html>`.
-- Fixed crash when input contains U+0000 NULL bytes and error reporting is
-  enabled.
-
-### Security
-- The most recent, released version of Gumbo has a [potential security
-  issue](https://github.com/google/gumbo-parser/pull/375) that could result in
-  a cross-site scripting vulnerability. This has been fixed by integrating
-  Gumbo into Nokogumbo.
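The rename of `:max_parse_errors` to `:max_errors` described in the deleted changelog can be illustrated with a small option-normalizing step. This is a minimal pure-Ruby sketch, not Nokogumbo's actual implementation: the helper name `normalize_options` and the constant `DEFAULT_MAX_ERRORS` (and its value) are hypothetical, chosen only to show how a deprecated key might be mapped onto its replacement.

```ruby
# Hypothetical default, for illustration only; not taken from Nokogumbo.
DEFAULT_MAX_ERRORS = 0

# Hypothetical helper: returns a copy of +options+ in which the deprecated
# :max_parse_errors key has been folded into :max_errors.
def normalize_options(options)
  opts = options.dup
  if opts.key?(:max_parse_errors)
    warn ':max_parse_errors is deprecated; use :max_errors'
    deprecated = opts.delete(:max_parse_errors)
    # An explicitly passed :max_errors takes precedence over the old alias.
    opts[:max_errors] = deprecated unless opts.key?(:max_errors)
  end
  # Fall back to the assumed default when neither key was supplied.
  opts[:max_errors] ||= DEFAULT_MAX_ERRORS
  opts
end

p normalize_options({ max_parse_errors: 10 })
p normalize_options({ max_errors: 5, max_parse_errors: 10 })
p normalize_options({})
```

Letting the new key win when both are present keeps callers that have already migrated unaffected by a leftover deprecated option.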