nokogiri-html5-inference 0.1.1 → 0.2.0

Files changed (6)

checksums.yaml +4 -4
data/CHANGELOG.md +5 -0
data/README.md +20 -10
data/lib/nokogiri/html5/inference/version.rb +1 -1
data/lib/nokogiri/html5/inference.rb +45 -41
metadata +2 -2

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: '0942dd8a89c2d930794a10583ed84746da2c434af00ccb3c3d52c4932fa1b812'
-  data.tar.gz: c09d3bf45f24570c4c4d80571a16e35c7fdefa82be467fead21334194788938f
+  metadata.gz: bd21e524361107504a10fa58bd991b9b89bd40f968b7550b90d38454156e65c2
+  data.tar.gz: ddbedefb2798713c3ae46d27df75efa5c0bfe10d1782dc419d8587dd6a155a74
 SHA512:
-  metadata.gz: 5e931d3e3c4a3a516c046bf04ad352c58acd0118fca531fc3ce7f5a9ce3a93f1918ac96447c53814185e58d3d51c578ea4e45dbd15f68e9d290a696a6d6a0783
-  data.tar.gz: 49d6050376b44467248e3c7c85251add238773c954914d874c282108c5569249394852ce3028693fef547ea60e41346e2be328a105e9076106250449d1d02b29
+  metadata.gz: c46997b8c93033fb53e5c2b141a34bdbe93d6c1f55a3cd3de76abd6116c807b60ae2a26dfb02e8ce06adf441308fbf0e2522eb0b302ef11fedb1231c33df0300
+  data.tar.gz: e24dc98db23641a8edb55824a7a4c0f7d858f6c0d2fe99f854a56c62ebc97a8e4f0f79df849798bb2097b9e0253213ebbbf3adbd988a3580b55ca891f8375a63

data/CHANGELOG.md CHANGED

@@ -1,5 +1,10 @@
 ## [Unreleased]
+## [0.2.0] - 2024-04-26
+- When a `<head>` tag is seen first in the input string, include the `<body>` tag in the returned fragment or node set. (#3, #4) @flavorjones
 ## [0.1.1] - 2024-04-24
 - Make protected methods `#context` and `#pluck_path` public, but keeping them undocumented.

data/README.md CHANGED

@@ -2,7 +2,10 @@
 Given HTML5 input, make a reasonable guess at how to parse it correctly.
-Infer from the HTML5 input whether it's a fragment or a document, and if it's a fragment what the proper context node should be. This is useful for parsing trusted content like view snippets, particularly for morphing cases like StimulusReflex.
+Nokogiri::HTML5::Inference makes reasonable inferences that work for both HTML5 documents and HTML5
+fragments, and for all the different HTML5 tags that a web developer might need in a view library.
+This is useful for parsing trusted content like view snippets, particularly for morphing cases like StimulusReflex.
 ## The problem this library solves
@@ -20,23 +23,25 @@ For example:
 ``` ruby
 Nokogiri::HTML5::DocumentFragment.parse("<td>foo</td>").to_html
-# => "foo"
+# => "foo" # where did the tag go!?
 ```
 In the default "in body" mode, the parser will log an error, "Start tag 'td' isn't allowed here",
-and drop the tag. This fragment must be parsed "in the context" of a table in order to parse
-properly. Thankfully, libgumbo and Nokogiri allow us to do this:
+and drop the tag. This particular fragment must be parsed "in the context" of a table in order to
+parse properly.
+Thankfully, libgumbo and Nokogiri allow us to set the context node:
 ``` ruby
 Nokogiri::HTML5::DocumentFragment.new(
   Nokogiri::HTML5::Document.new,
   "<td>foo</td>",
-  "table"  # this is the context node
+  "table"  # <--- this is the context node
 ).to_html
 # => "<tbody><tr><td>foo</td></tr></tbody>"
 ```
-This is _almost_ correct, but we're seeing another HTML5 parsing rule in action: there may be
+This result is _almost_ correct, but we're seeing another HTML5 parsing rule in action: there may be
 _intermediate parent tags_ that the HTML5 spec requires to be inserted by the parser. In this case,
 the `<td>` tag must be wrapped in `<tbody><tr>` tags.
@@ -51,9 +56,12 @@ Nokogiri::HTML5::DocumentFragment.new(
 # => "<td>foo</td>"
 ```
-Hurrah! This is precisely what Nokogiri::HTML5::Inference.parse does: make reasonable inferences
-that work for both HTML5 documents and HTML5 fragments, and for all the different HTML5 tags that a
-web developer might need in a view library.
+Huzzah! That works. And it's precisely what Nokogiri::HTML5::Inference.parse does:
+``` ruby
+Nokogiri::HTML5::Inference.parse("<td>foo</td>").to_html
+# => "<td>foo</td>"
+```
 ## Usage
@@ -67,7 +75,7 @@ html = <<~HTML
   <!doctype html>
   <html lang="en">
     <head>
-      <meta encoding="UTF-8">
+      <meta charset="utf-8">
     </head>
     <body>
       <h1>Hello, world!</h1>
@@ -131,6 +139,8 @@ decisions. Nonetheless, it is a step forward from what Nokogiri and libgumbo do
 The implementation also is almost certainly incomplete, meaning there are HTML5 tags that aren't handled by this library as you might expect.
+This implementation is probably OK for handling untrusted content, but it's still new and I haven't really thought very hard about it yet. If you want to use it on untrusted content, open an issue and talk with us about your use case so we can help keep you secure!
 We would welcome bug reports and pull requests improving this library!

data/lib/nokogiri/html5/inference/version.rb CHANGED

@@ -3,7 +3,7 @@
 module Nokogiri
   module HTML5
     module Inference
-      VERSION = "0.1.1"
+      VERSION = "0.2.0"
     end
   end
 end

data/lib/nokogiri/html5/inference.rb CHANGED

@@ -12,55 +12,59 @@ else
     module HTML5
       # :markup: markdown
       #
-      # The [HTML5 Spec](https://html.spec.whatwg.org/multipage/parsing.html) defines some very precise
-      # context-dependent parsing rules which can make it challenging to "just parse" a fragment of HTML
-      # without knowing the parent node -- also called the "context node" -- in which it will be inserted.
+      #  The [HTML5 Spec](https://html.spec.whatwg.org/multipage/parsing.html) defines some very precise
+      #  context-dependent parsing rules which can make it challenging to "just parse" a fragment of HTML
+      #  without knowing the parent node -- also called the "context node" -- in which it will be inserted.
       #
-      # Most content in an HTML5 document can be parsed assuming the parser's mode will be in the
-      # ["in body" insertion mode](https://html.spec.whatwg.org/multipage/parsing.html#parsing-main-inbody),
-      # but there are some notable exceptions. Perhaps the most problematic to web developers are the
-      # table-related tags, which will not be parsed properly unless the parser is in the
-      # ["in table" insertion mode](https://html.spec.whatwg.org/multipage/parsing.html#parsing-main-intable).
+      #  Most content in an HTML5 document can be parsed assuming the parser's mode will be in the
+      #  ["in body" insertion mode](https://html.spec.whatwg.org/multipage/parsing.html#parsing-main-inbody),
+      #  but there are some notable exceptions. Perhaps the most problematic to web developers are the
+      #  table-related tags, which will not be parsed properly unless the parser is in the
+      #  ["in table" insertion mode](https://html.spec.whatwg.org/multipage/parsing.html#parsing-main-intable).
       #
-      # For example:
+      #  For example:
       #
-      # ``` ruby
-      # Nokogiri::HTML5::DocumentFragment.parse("<td>foo</td>").to_html
-      # # => "foo"
-      # ```
+      #  ``` ruby
+      #  Nokogiri::HTML5::DocumentFragment.parse("<td>foo</td>").to_html
+      #  # => "foo" # where did the tag go!?
+      #  ```
       #
-      # In the default "in body" mode, the parser will log an error, "Start tag 'td' isn't allowed here",
-      # and drop the tag. This fragment must be parsed "in the context" of a table in order to parse
-      # properly. Thankfully, libgumbo and Nokogiri allow us to do this:
+      #  In the default "in body" mode, the parser will log an error, "Start tag 'td' isn't allowed here",
+      #  and drop the tag. This particular fragment must be parsed "in the context" of a table in order to
+      #  parse properly.
       #
-      # ``` ruby
-      # Nokogiri::HTML5::DocumentFragment.new(
-      #   Nokogiri::HTML5::Document.new,
-      #   "<td>foo</td>",
-      #   "table"  # this is the context node
-      # ).to_html
-      # # => "<tbody><tr><td>foo</td></tr></tbody>"
-      # ```
+      #  Thankfully, libgumbo and Nokogiri allow us to set the context node:
       #
-      # This is _almost_ correct, but we're seeing another HTML5 parsing rule in action: there may be
-      # _intermediate parent tags_ that the HTML5 spec requires to be inserted by the parser. In this case,
-      # the `<td>` tag must be wrapped in `<tbody><tr>` tags.
+      #  ``` ruby
+      #  Nokogiri::HTML5::DocumentFragment.new(
+      #    Nokogiri::HTML5::Document.new,
+      #    "<td>foo</td>",
+      #    "table"  # <--- this is the context node
+      #  ).to_html
+      #  # => "<tbody><tr><td>foo</td></tr></tbody>"
+      #  ```
       #
-      # We can narrow down the result set with an XPath query to get back only the intended tags:
+      #  This result is _almost_ correct, but we're seeing another HTML5 parsing rule in action: there may be
+      #  _intermediate parent tags_ that the HTML5 spec requires to be inserted by the parser. In this case,
+      #  the `<td>` tag must be wrapped in `<tbody><tr>` tags.
       #
-      # ``` ruby
-      # Nokogiri::HTML5::DocumentFragment.new(
-      #   Nokogiri::HTML5::Document.new,
-      #   "<td>foo</td>",
-      #   "table"  # this is the context node
-      # ).xpath("tbody/tr/*").to_html
-      # # => "<td>foo</td>"
-      # ```
+      #  We can narrow down the result set with an XPath query to get back only the intended tags:
       #
-      # Hurrah! This is precisely what Nokogiri::HTML5::Inference.parse does: make reasonable inferences
-      # that work for both HTML5 documents and HTML5 fragments, and for all the different HTML5 tags that a
-      # web developer might need in a view library.
+      #  ``` ruby
+      #  Nokogiri::HTML5::DocumentFragment.new(
+      #    Nokogiri::HTML5::Document.new,
+      #    "<td>foo</td>",
+      #    "table"  # this is the context node
+      #  ).xpath("tbody/tr/*").to_html
+      #  # => "<td>foo</td>"
+      #  ```
       #
+      #  Huzzah! That works. And it's precisely what Nokogiri::HTML5::Inference.parse does:
+      #
+      #  ``` ruby
+      #  Nokogiri::HTML5::Inference.parse("<td>foo</td>").to_html
+      #  # => "<td>foo</td>"
+      #  ```
       module Inference
         # Tags that must be parsed in a specific HTML5 insertion mode, for which we must use a
         # context node.
@@ -89,7 +93,7 @@ else
           TBODY = /\A\s*<(#{PluckTags::TBODY.join("|")})\b/i
           TBODY_TR = /\A\s*<(#{PluckTags::TBODY_TR.join("|")})\b/i
           COLGROUP = /\A\s*<(#{PluckTags::COLGROUP.join("|")})\b/i
-          HEAD_OUTER = /\A\s*<(head)\b/i
+          HTML_INNER = /\A\s*<(head)\b/i
           BODY_OUTER = /\A\s*<(body)\b/i
         end
@@ -180,7 +184,7 @@ else
             when PluckRegexp::TBODY then "tbody/*"
             when PluckRegexp::TBODY_TR then "tbody/tr/*"
             when PluckRegexp::COLGROUP then "colgroup/*"
-            when PluckRegexp::HEAD_OUTER then "head"
+            when PluckRegexp::HTML_INNER then "./*"
             when PluckRegexp::BODY_OUTER then "body"
             end
           end
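
Note on the two hunks above: the rename from `HEAD_OUTER` to `HTML_INNER` leaves the regexp itself unchanged; only the XPath returned for a leading `<head>` tag changes, from `"head"` to `"./*"`. The following is a minimal sketch of that selection logic, not the gem's actual code. The method name `pluck_path_for` is hypothetical, and only the two regexps and path strings are taken from the diff. Presumably `"./*"` is evaluated relative to an `html` context node, as the new constant name suggests, so that both `<head>` and `<body>` are returned, which matches the 0.2.0 CHANGELOG entry.

```ruby
# Hypothetical helper, for illustration only: maps the start of an input
# string to the XPath used to pluck the result, as of 0.2.0.
def pluck_path_for(input)
  case input
  when /\A\s*<(head)\b/i then "./*"   # PluckRegexp::HTML_INNER; 0.1.1 returned "head"
  when /\A\s*<(body)\b/i then "body"  # PluckRegexp::BODY_OUTER, unchanged
  end
end

p pluck_path_for("<head><title>x</title></head><body><p>hi</p></body>") # => "./*"
p pluck_path_for("  <BODY class='a'>hi</body>")                         # => "body"
p pluck_path_for("<header>not a head tag</header>")                     # => nil
```

The trailing `\b` in each pattern is what keeps `<header>` from being mistaken for `<head>`, and the `i` flag makes the match case-insensitive.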

metadata CHANGED Viewed

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: nokogiri-html5-inference
 version: !ruby/object:Gem::Version
-  version: 0.1.1
+  version: 0.2.0
 platform: ruby
 authors:
 - Mike Dalessio
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2024-04-24 00:00:00.000000000 Z
+date: 2024-04-26 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: nokogiri