RubyGems - canon - Versions diffs - 0.2.4 → 0.2.6 - Mend

canon 0.2.4 → 0.2.6

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (29)

checksums.yaml +4 -4
data/README.adoc +4 -0
data/docs/advanced/diff-classification.adoc +16 -0
data/docs/advanced/semantic-diff-report.adoc +65 -0
data/docs/features/diff-formatting/index.adoc +5 -0
data/docs/features/diff-formatting/whitespace-adjacency.adoc +218 -0
data/docs/reference/environment-variables.adoc +3 -1
data/lib/canon/comparison/comparison_result.rb +16 -2
data/lib/canon/comparison/html_comparator.rb +4 -0
data/lib/canon/comparison/markup_comparator.rb +49 -71
data/lib/canon/comparison/node_inspector.rb +103 -0
data/lib/canon/comparison/xml_comparator/child_comparison.rb +127 -55
data/lib/canon/comparison/xml_comparator/diff_node_builder.rb +24 -23
data/lib/canon/comparison/xml_comparator.rb +97 -3
data/lib/canon/comparison/xml_node_comparison.rb +37 -81
data/lib/canon/comparison.rb +59 -0
data/lib/canon/diff/diff_classifier.rb +37 -39
data/lib/canon/diff/xml_serialization_formatter.rb +27 -42
data/lib/canon/diff_formatter/diff_detail_formatter/dimension_formatter.rb +119 -9
data/lib/canon/diff_formatter/diff_detail_formatter/node_utils.rb +75 -4
data/lib/canon/diff_formatter.rb +71 -2
data/lib/canon/pretty_printer/html.rb +76 -14
data/lib/canon/pretty_printer/html_void_elements.rb +20 -0
data/lib/canon/pretty_printer/xml_normalized.rb +10 -3
data/lib/canon/version.rb +1 -1
data/lib/canon/xml/data_model.rb +13 -1
data/lib/canon/xml/node.rb +15 -0
data/lib/canon/xml/sax_builder.rb +18 -0
metadata +5 -2

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: c24b7c541d6159a3c261d389c0d41b85f954cd4152d88d9ca4748d9a3ceb34ef
-  data.tar.gz: 1de985c950b90c6979432b7b0bd1ed1b469240456fb7fd985a3d7f6929448b83
+  metadata.gz: 6437f1a8b556bb49bffbcecf47ec0eeecabdf6541bd5baa5954ac88f98f33a2c
+  data.tar.gz: 98eff2aa558165dc7e13c8d29da21d8d9c6589cae1d48a18d27f0420d6be7198
 SHA512:
-  metadata.gz: 719eefd6be6d642503adb82e50609983fe9082ec8c7efe34c5e6cf27bfdc8065edc05b7ae75a959db8e5fe117f0ec67d71d81006d342a1c01f2846b4aa54b196
-  data.tar.gz: 32a1bece85afd8265f158fdea547de08759773ba8a1e574ca72e42c79f6f59b02ed881cc4ba4bb78e54d135c9f4362100c8d409d2ee08b0eaa3561b13652296c
+  metadata.gz: 055614c143bca292b575755f5b4a1554a002e0d6f264ddee3e29049f89d6f9795a61069c1bdf7ebfa459ed11ac2a21203a779e115f8aee143a2aa3c77951a086
+  data.tar.gz: ff64c25654c1eef41dcc80b471df2958c516e69fa625723bdecfd18c5a99716e7f347c313bf72c28f3b898547d5c97aac3ebb62bcf44e726bc49e668a73e0dd9
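
Every content change rotates all four values, because `checksums.yaml` in a `.gem` records SHA256 and SHA512 digests of the two archive members, `metadata.gz` and `data.tar.gz`. The following minimal sketch checks such a table. It assumes the member bytes have already been read out of the `.gem` tar archive. `verify_checksums` and `DIGESTS` are illustrative names; this is not RubyGems' own verification code.

```ruby
require "digest/sha2"
require "yaml"

# Digest classes for the two algorithms that appear in checksums.yaml.
DIGESTS = { "SHA256" => Digest::SHA256, "SHA512" => Digest::SHA512 }.freeze

# Check raw archive members against a checksums.yaml-style document of the
# form { algorithm => { member_name => hex_digest } }.
# Returns one [algorithm, member_name, matched?] triple per recorded digest.
def verify_checksums(checksums_yaml, members)
  YAML.safe_load(checksums_yaml).flat_map do |algorithm, entries|
    digest = DIGESTS.fetch(algorithm) # an unknown algorithm raises KeyError
    entries.map do |name, expected_hex|
      [algorithm, name, digest.hexdigest(members.fetch(name)) == expected_hex]
    end
  end
end
```

A mismatch on any member means the bytes differ from what was published. This explains why even a documentation-only change produces a new set of digests.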

data/README.adoc CHANGED

@@ -614,6 +614,10 @@ See link:docs/MODES[Diff modes] for details.
 * **Formatting diff detection**: Automatically detects and highlights purely cosmetic whitespace/line break differences
 * **Whitespace visualization**: Make invisible characters visible with CJK-safe
   Unicode symbols
+* **Whitespace adjacency reporting**: Stray whitespace-only text nodes are
+  reported as a dedicated `:whitespace_adjacency` dimension with direction
+  wording (`before`/`after`/`adjacent to`) instead of cascading into
+  misleading `:text_content` mismatches
 * **Non-ASCII detection**: Warnings for unexpected Unicode characters
 * **Customizable**: Character maps, context lines, grouping options

data/docs/advanced/diff-classification.adoc CHANGED

@@ -266,6 +266,22 @@ match dimension but follows the same normative rule as all general dimensions:
   - Structural differences tracked but don't affect equivalence
   - Useful for content-only comparisons where wrapper elements don't matter
+==== Whitespace Adjacency
+`:whitespace_adjacency` is a derived dimension — emitted when the
+alignment walk pairs a whitespace-only text node on one side against a
+content node on the other. The Reason line describes the whitespace's
+direction relative to the partner content: `before`, `after`, or
+`adjacent to`.
+* **Always normative** — differences always affect equivalence
+* **Not user-configurable** — dimension is always tracked when the
+  re-alignment walk encounters an asymmetric whitespace node
+* **Report-only** — does not change equivalence outcomes compared to
+  pre-#137 behaviour; only changes the diff-report shape (see
+  link:../features/diff-formatting/whitespace-adjacency.adoc[Whitespace
+  adjacency] for details)
 .Example: Comment handling
 ====
 [source,ruby]

data/docs/advanced/semantic-diff-report.adoc CHANGED

@@ -14,6 +14,39 @@ The Semantic Diff Report provides dimension-specific, actionable details for eac
 The report is automatically shown in verbose mode when differences exist, appearing before the detailed diff output.
+== Parse errors
+When Canon's underlying parser (libxml for XML, HTML5 for HTML) reports errors during input parsing, Canon surfaces them at the top of the diff report in a banner section before any per-difference output. The banner names the offending side and warns that the diff below describes the parsed tree, not the input — content the parser could not represent has been silently dropped from the comparison tree.
+This is purely a transparency feature: Canon does not modify the parse to "fix" invalid input. The user is responsible for deciding whether the parse failure was expected (e.g. testing legacy fixtures during a migration) or symptomatic of an upstream bug.
+.Example: Banner for a duplicate-attribute FATAL on the received side
+[example]
+====
+[source]
+----
+======================================================================
+  ⚠️  PARSE ERRORS
+======================================================================
+  Received side:
+    Attribute xml:lang redefined
+  ⚠️  The diff below describes the parsed tree, not the input.
+      Content that the parser could not represent has been
+      dropped and may appear as "missing" in the report.
+======================================================================
+----
+====
+Common triggers in HTML / XHTML round-trips:
+* Duplicate attributes (XML strict; HTML5 permissive — only XML mode triggers a banner)
+* Stray processing instructions in fragment context
+* Malformed namespace declarations
+* DOCTYPE in unexpected positions
+The banner is rendered when `Canon::Comparison::ComparisonResult#parse_errors?` is true. Programmatic callers can read `parse_errors_expected` and `parse_errors_received` directly off the result.
 == Key Features
 * XPath locations for XML/HTML elements
@@ -179,6 +212,38 @@ Reason:  Text: "¬······:¬······"
 This fallback is implemented in `Canon::DiffFormatter::DiffDetailFormatterHelpers::DimensionFormatter.format_text_content_details` and only triggers when `TextUtils.ambiguous_text_pair?` returns `true` _and_ at least one side has a parent element to render.
+==== One-sided text diffs (added or removed text nodes)
+When a `text_content` difference carries a text node on one side and `nil` on the other (issue #125) -- the shape that fragment-length mismatches and child-comparison emit when a text-node child is missing -- the renderer mirrors `element_structure`: the missing side reads `(not present)`, and the present side reads the text-node content (whitespace-visualised) plus a brief parent open-tag hint for context. The full ancestor subtree is *not* dumped; only the immediate parent's opening tag is shown, so a missing whitespace text node cannot make the diff look like the entire ancestor differs.
+.Example: Whitespace text node missing on the received side
+[example]
+====
+[source]
+----
+🔍 DIFFERENCE #1/1 [NORMATIVE]
+──────────────────────────────────────────────────────────────────────
+Dimension: text_content
+Reason:  element missing: text
+⊖ Expected (File 1):
+   text "¬············" in <div id="A">
+⊕ Actual (File 2):
+   (not present)
+✨ Changes:
+   Text removed: text "¬············" in <div id="A">
+----
+====
+The `Changes:` line uses `Text removed:` or `Text added:` to mirror the `Element removed:` / `Element added:` phrasing of `element_structure`.
+==== Element-shaped diffs misclassified as text_content
+In rare cases an upstream comparator may emit an *element*-shaped one-sided diff under `dimension: :text_content`.  Without a guard, the one-sided text formatter would call `raw_text_value` on the element (which returns `""` for an empty element such as `<br/>`) and render `text "" in <parent>` -- meaningless when an element is what's actually missing.
+The formatter detects element-shaped present-side nodes (Canon `ElementNode` or Nokogiri `Element`) and delegates to `format_element_structure_details`, so the rendered output reads `<br/>` and `Element removed:` rather than `text ""` and `Text removed:`.  This is defence in depth -- the construction-side fix in `XmlComparatorHelpers::ChildComparison` ensures element orphans are now tagged `:element_structure` at source -- but a misclassified diff still renders meaningfully if any path slips through.
 === Structural Whitespace
 Shows whitespace-only differences (usually informative).
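
The following sketch shows the one-sided text rendering and the element-shaped guard described in this file. `SketchText`, `SketchElement`, `describe` and `format_one_sided` are hypothetical stand-ins, not Canon's `DimensionFormatter` API. The `¬`/`·` visualisation map is an assumption taken from the example output.

```ruby
# Hypothetical stand-ins for the two node shapes the formatter distinguishes.
SketchText = Struct.new(:value, :parent_tag)
SketchElement = Struct.new(:name)

# Assumed visualisation map (¬ for newline, · for space).
VISIBLE = { "\n" => "¬", " " => "·" }.freeze

# Text nodes show their visualised content plus the parent open tag only;
# elements show themselves, so an empty <br/> never reads as text "".
def describe(node)
  case node
  when SketchElement then "<#{node.name}/>"
  when SketchText then %(text "#{node.value.gsub(/[\n ]/, VISIBLE)}" in #{node.parent_tag})
  end
end

# Render a one-sided diff where exactly one of expected/actual is nil.
def format_one_sided(expected, actual)
  present = expected || actual
  # Element-shaped guard: route elements to Element wording, not Text wording.
  kind = present.is_a?(SketchElement) ? "Element" : "Text"
  change = expected ? "removed" : "added"
  [
    "⊖ Expected (File 1):",
    "   #{expected ? describe(expected) : '(not present)'}",
    "⊕ Actual (File 2):",
    "   #{actual ? describe(actual) : '(not present)'}",
    "✨ Changes:",
    "   #{kind} #{change}: #{describe(present)}",
  ].join("\n")
end
```

Only the immediate parent's open tag is rendered, never the ancestor subtree. A single missing whitespace node therefore stays a one-line difference.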

data/docs/features/diff-formatting/index.adoc CHANGED

@@ -28,6 +28,11 @@ Canon's diff formatting includes:
 * **Context and grouping**: Control how much surrounding context to show
 * **Algorithm-specific output**: Different output styles for different diff
   algorithms
+* **Whitespace adjacency**: Stray whitespace-only text nodes are anchored at
+  themselves instead of cascading into mismatches against neighbouring
+  content. The Reason line names the direction relative to the partner
+  (`before`/`after`/`adjacent to`)
+  (link:./whitespace-adjacency.adoc[details])
 == Available formatting options

data/docs/features/diff-formatting/whitespace-adjacency.adoc ADDED

@@ -0,0 +1,218 @@
+---
+title: Whitespace adjacency in diff reports
+parent: Diff Formatting
+nav_order: 8
+---
+= Whitespace adjacency in diff reports
+:toc:
+:toclevels: 2
+== Purpose
+Canon's diff reports anchor whitespace-only text nodes that have no
+counterpart on the other side to a dedicated `:whitespace_adjacency`
+dimension instead of letting them cascade into 3-4 misaligned
+`:text_content` mismatches against neighbouring content nodes.
+This is a *report-only* contract — equivalence verdicts are unchanged.
+Inputs that were non-equivalent before this feature remain non-equivalent;
+only the *shape* of the diff report changes.
+== The problem
+Consider an HTML fragment compared as `be_html_equivalent_to`:
+[source,html]
+----
+<!-- expected -->
+<p>
+  <span>ISO </span>
+  <span>20483</span>
+  ,
+  <i>Cereals and pulses</i>
+</p>
+<!-- actual -->
+<p><span>ISO </span><span>20483</span>, <i>Cereals and pulses</i></p>
+----
+Both render identically in a browser — the indentation is structural HTML
+formatting, not content. Before this feature, the diff report contained
+four entries:
+[source]
+----
+DIFFERENCE #1 — element_structure: parent <p> "missing children"
+DIFFERENCE #2 — text_content: ""  vs  "20483"        (visualised: ↵░░░░)
+DIFFERENCE #3 — text_content: "20483"  vs  ","
+DIFFERENCE #4 — text_content: ","  vs  "Cereals and pulses"
+----
+The cascade comes from positional `zip()` alignment in
+`Canon::Comparison::XmlComparatorHelpers::ChildComparison`: with the
+expected side carrying extra whitespace-only text nodes and the actual
+side carrying none, every child slides by one slot and gets paired
+against the wrong neighbour.
+== The contract
+When `ChildComparison` aligns child sequences and encounters a
+whitespace-only text node on one side paired against a non-whitespace
+node on the other, it:
+1. Treats the whitespace node as a *single-side gap* in the alignment.
+2. Emits one `:whitespace_adjacency` diff entry anchored at the
+   whitespace node itself (not at its mis-paired neighbour).
+3. Advances only the cursor that carries the whitespace, so the next
+   iteration aligns content against content.
+The asymmetric whitespace still produces a non-equivalent verdict — the
+`:whitespace_adjacency` dimension is classified as normative
+unconditionally — so any test that previously failed on whitespace
+asymmetry continues to fail.
+After the new contract, the cascade above collapses to:
+[source]
+----
+DIFFERENCE #1 — whitespace_adjacency: Whitespace before "20483":
+                  present on EXPECTED ("↵░░"), absent on ACTUAL
+DIFFERENCE #2 — whitespace_adjacency: Whitespace before ",":
+                  present on EXPECTED ("↵░░"), absent on ACTUAL
+DIFFERENCE #3 — text_content: "↵░░,↵░░"  vs  ", "
+----
+== Direction relative to the partner
+The Reason line names the document-order position of the whitespace
+node relative to the *partner content node* it was zipped against by
+the alignment walk. The partner is the next (or, at parent edge,
+previous) non-whitespace sibling on the whitespace-bearing side, which
+is what aligns against the corresponding content node on the other
+side.
+`before`::  The whitespace immediately precedes its next non-whitespace
+sibling. This is the common case (e.g. indentation between two inline
+spans where the asymmetric whitespace sits on the leading edge of the
+partner).
+`after`::  The whitespace trails the previous non-whitespace sibling
+and has no non-whitespace sibling after it. Emitted at the trailing
+edge of a parent.
+`adjacent to`::  Degenerate fallback for a whitespace node with no
+non-whitespace siblings at all. Rarely emitted.
+NOTE: An earlier wording (`Whitespace surrounding "X"`) classified the
+*whitespace node's position among its own siblings* rather than its
+direction relative to the partner. That label was misleading when the
+whitespace sat between two element siblings but the asymmetry was
+one-sided — see issue #137 follow-up.
+=== Examples
+==== "before" — whitespace between inline element siblings
+[source,ruby]
+----
+html1 = "<a><span>ISO </span>\n   <span>712</span></a>"
+html2 = "<a><span>ISO </span><span>712</span></a>"
+result = Canon::Comparison.equivalent?(html1, html2,
+  format: :html5, verbose: true)
+# => #<ComparisonResult equivalent=false>
+# The stray "\n   " between two spans is the only asymmetric node.
+# It sits immediately before <span>712</span>, its next non-ws sibling.
+result.differences.first.reason
+# => "Whitespace before \"712\": present on EXPECTED (\"░\"), absent on ACTUAL"
+----
+==== "after" — trailing whitespace in a whitespace-preserving element
+[source,ruby]
+----
+# <code> preserves whitespace, so the trailing newline survives the
+# upstream filter and pairs against the extra <b>B</b> on the other side.
+html1 = "<code><b>A</b>\n</code>"
+html2 = "<code><b>A</b><b>B</b></code>"
+result = Canon::Comparison.equivalent?(html1, html2,
+  format: :html5, verbose: true)
+result.differences.first.reason
+# => "Whitespace after \"B\": present on EXPECTED (\"↵\"), absent on ACTUAL"
+----
+==== "adjacent to" — sole-child whitespace node
+[source,ruby]
+----
+# A whitespace-only text node as the only child of <code>, paired
+# against an element on the other side.  No non-ws siblings exist.
+html1 = "<code>\n</code>"
+html2 = "<code><b>A</b></code>"
+result = Canon::Comparison.equivalent?(html1, html2,
+  format: :html5, verbose: true)
+result.differences.first.reason
+# => "Whitespace adjacent to \"A\": present on EXPECTED (\"↵\"), absent on ACTUAL"
+----
+== Working with :whitespace_adjacency diffs programmatically
+Use the `dimension` field on `DiffNode` to filter:
+[source,ruby]
+----
+result = Canon::Comparison.equivalent?(html1, html2,
+  format: :html5, verbose: true)
+# Find all whitespace-adjacency diffs
+ws_diffs = result.differences.select { |d| d.dimension == :whitespace_adjacency }
+# These are always normative — they affect the equivalence verdict
+ws_diffs.all?(&:normative?)  # => true
+----
+== What this contract does NOT do
+* **Does not change equivalence outcomes.** A non-equivalent comparison
+  before #137 remains non-equivalent after — only the diff-report shape
+  changes.
+* **Does not silently filter whitespace.** The asymmetric whitespace is
+  always reported; it is just labelled `:whitespace_adjacency` and
+  anchored at the whitespace node, instead of cascading as
+  `:text_content` against unrelated content nodes.
+* **Does not affect symmetric whitespace.** When both sides carry
+  parallel whitespace-only nodes, those compare normally
+  (no `:whitespace_adjacency` entry, no cascade).
+== Where it runs
+The contract is implemented as a re-alignment walk inside
+`Canon::Comparison::XmlComparatorHelpers::ChildComparison.use_positional_comparison`.
+It activates whenever the existing positional `zip()` alignment would
+pair a whitespace-only text node against a content node — that is, in
+every whitespace context where the upstream filter has not already
+dropped the whitespace nodes.
+For elements where whitespace is preserved by configuration
+(`preserve_whitespace_elements`) the upstream filter does not drop
+indentation, and the re-alignment walk surfaces every asymmetric
+whitespace node as a single normative `:whitespace_adjacency` diff.
+== Related
+* link:../../advanced/diff-classification.adoc[Diff classification] —
+  Normative vs informative differences.
+* link:../match-options/index.adoc[Match options] — Configuring
+  `preserve_whitespace_elements`, `collapse_whitespace_elements`, and
+  `strip_whitespace_elements`.
+== History
+The cascade behaviour was reported in
+https://github.com/lutaml/canon/issues/137[issue #137]. The fix landed
+as a report-only re-alignment in PR #138. PR #141 replaced the
+misleading `:surrounding`/`:preceding`/`:following` position labels
+with direction-faithful wording (`before`/`after`/`adjacent to`).
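
The re-alignment contract and the direction rules can be sketched over plain arrays of children. Everything here is hypothetical. `ChildNode`, `realign` and `ws_direction` are not Canon names, and this is not the `ChildComparison` implementation. The sketch rests on two assumptions:

* The quoted partner is the node on the other side at the aligned cursor. This reproduces the three example Reason strings.
* The visualisation is a plain character-for-character substitution, so its output for a given whitespace run may differ from Canon's.

```ruby
# Hypothetical child model: kind is :text or :element; text is the label.
ChildNode = Struct.new(:kind, :text) do
  def whitespace_only?
    kind == :text && text.strip.empty?
  end
end

DiffEntry = Struct.new(:dimension, :reason)

# Character-for-character visualisation (↵ for newline, ░ for space).
def visualise(str)
  str.gsub("\n", "↵").gsub(" ", "░")
end

# Direction of siblings[index] relative to its own non-whitespace siblings.
def ws_direction(siblings, index)
  if siblings[(index + 1)..].any? { |n| !n.whitespace_only? }
    "before"
  elsif siblings[0...index].any? { |n| !n.whitespace_only? }
    "after"
  else
    "adjacent to"
  end
end

def ws_entry(siblings, index, partner, present_on, absent_on)
  ws = siblings[index]
  DiffEntry.new(
    :whitespace_adjacency,
    %(Whitespace #{ws_direction(siblings, index)} "#{partner.text}": ) +
      %(present on #{present_on} ("#{visualise(ws.text)}"), absent on #{absent_on})
  )
end

# Mirrors determine_node_dimension: element nodes map to :element_structure.
def node_dimension(node)
  node.kind == :element ? :element_structure : :text_content
end

def realign(expected, actual)
  diffs = []
  i = j = 0
  while i < expected.size && j < actual.size
    e = expected[i]
    a = actual[j]
    if e.whitespace_only? && !a.whitespace_only?
      diffs << ws_entry(expected, i, a, "EXPECTED", "ACTUAL")
      i += 1 # advance only the cursor carrying the whitespace
    elsif a.whitespace_only? && !e.whitespace_only?
      diffs << ws_entry(actual, j, e, "ACTUAL", "EXPECTED")
      j += 1
    else
      # Symmetric pair (content/content or whitespace/whitespace): compare normally.
      diffs << DiffEntry.new(node_dimension(e), %("#{e.text}" vs "#{a.text}")) unless e == a
      i += 1
      j += 1
    end
  end
  # Leftover children on either side become one-sided orphans.
  expected[i..].each { |n| diffs << DiffEntry.new(node_dimension(n), %(missing on ACTUAL: "#{n.text}")) }
  actual[j..].each { |n| diffs << DiffEntry.new(node_dimension(n), %(missing on EXPECTED: "#{n.text}")) }
  diffs
end
```

Advancing only the whitespace-bearing cursor is what stops the cascade. The next iteration compares content against content instead of letting every later child slide by one slot.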

data/docs/reference/environment-variables.adoc CHANGED

@@ -194,7 +194,9 @@ export CANON_JSON_FORMAT_PREPROCESSING=normalize
 |`CANON_SHOW_PRETTYPRINT_RECEIVED`
 |boolean
 |`false`
-|Show only the RECEIVED (actual) block in the fixture-ready pretty-printed section.  This is the most common fixture-update workflow: enable this option to get a copy-pasteable pretty-printed form of the generated output that can replace the old fixture heredoc.  Format-specific: `CANON_{FORMAT}_DIFF_SHOW_PRETTYPRINT_RECEIVED`
+|Show only the RECEIVED (actual) block in the fixture-ready pretty-printed section.  This is the most common fixture-update workflow: enable this option to get a copy-pasteable pretty-printed form of the generated output that can replace the old fixture heredoc.  Format-specific: `CANON_{FORMAT}_DIFF_SHOW_PRETTYPRINT_RECEIVED`.
+For HTML / HTML4 / HTML5 inputs, the pretty-printed output is XHTML-shaped: void elements are self-closed (`<br/>`, `<meta/>`), non-void elements are paired (`<a></a>`), and Nokogiri may add `xmlns="http://www.w3.org/1999/xhtml"` on `<html>` and an `xml:lang` mirror of `lang`.  This is a display-only serialisation chosen because libxml's `FORMAT` save flag (the only path that actually indents HTML5 input) requires the XHTML save mode -- `Nokogiri::HTML5#to_html` silently ignores its `indent:` keyword.  See lutaml/canon#133.
 |All formats (display only)
 |`CANON_COMPACT_SEMANTIC_REPORT`
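
For end tags, the XHTML shape described for `CANON_SHOW_PRETTYPRINT_RECEIVED` comes down to one rule, shown in the minimal sketch below. The sketch assumes a toy `[name, attrs, children]` tree. Its void-element list follows the WHATWG HTML Standard. It does not reproduce Canon's `html_void_elements.rb`, because this diff does not show that file's contents. `to_xhtml` and `escape` are hypothetical helpers.

```ruby
# HTML void elements per the WHATWG HTML Standard: they never take an end tag.
VOID_ELEMENTS = %w[area base br col embed hr img input link meta source track wbr].freeze

ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;" }.freeze

def escape(str)
  str.gsub(/[&<>"]/, ESCAPES)
end

# Serialise a [name, attrs, children] tree in XHTML shape: void elements
# self-close, every other element gets an explicit end tag even when empty.
# Plain Strings are text nodes.
def to_xhtml(node)
  return escape(node) if node.is_a?(String)

  name, attrs, children = node
  attr_s = attrs.map { |k, v| %( #{k}="#{escape(v)}") }.join
  if VOID_ELEMENTS.include?(name)
    "<#{name}#{attr_s}/>"
  else
    "<#{name}#{attr_s}>#{children.map { |c| to_xhtml(c) }.join}</#{name}>"
  end
end
```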

data/lib/canon/comparison/comparison_result.rb CHANGED

@@ -6,7 +6,8 @@ module Canon
     # Provides methods to query equivalence based on normative diffs
     class ComparisonResult
       attr_reader :differences, :preprocessed_strings, :format, :html_version,
-                  :match_options, :algorithm, :original_strings
+                  :match_options, :algorithm, :original_strings,
+                  :parse_errors_expected, :parse_errors_received
       # @param differences [Array<DiffNode>] Array of difference nodes
       # @param preprocessed_strings [Array<String, String>] Pre-processed content for display
@@ -15,8 +16,11 @@ module Canon
       # @param match_options [Hash, nil] Resolved match options used for comparison
       # @param algorithm [Symbol] Diff algorithm used (:dom or :semantic)
       # @param original_strings [Array<String, String>, nil] Original unprocessed content for line diff
+      # @param parse_errors_expected [Array<String>, nil] Parser errors from the expected side
+      # @param parse_errors_received [Array<String>, nil] Parser errors from the received side
       def initialize(differences:, preprocessed_strings:, format:,
-html_version: nil, match_options: nil, algorithm: :dom, original_strings: nil)
+html_version: nil, match_options: nil, algorithm: :dom, original_strings: nil,
+parse_errors_expected: nil, parse_errors_received: nil)
         @differences = differences
         @preprocessed_strings = preprocessed_strings
         @original_strings = original_strings || preprocessed_strings
@@ -24,6 +28,16 @@ html_version: nil, match_options: nil, algorithm: :dom, original_strings: nil)
         @html_version = html_version
         @match_options = match_options
         @algorithm = algorithm
+        @parse_errors_expected = Array(parse_errors_expected)
+        @parse_errors_received = Array(parse_errors_received)
+      end
+      # Whether either side reported parse errors.  Used by the diff
+      # formatter to decide whether to render the parse-error banner.
+      #
+      # @return [Boolean]
+      def parse_errors?
+        @parse_errors_expected.any? || @parse_errors_received.any?
       end
       # Check if documents are semantically equivalent (no normative diffs)
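
The parse-error banner documented in `semantic-diff-report.adoc` is gated on `parse_errors?`. The following minimal sketch shows that flow using a hypothetical `ResultStub` that mimics the new readers. The stub includes the `Array()` normalisation, which turns `nil` into `[]` and a lone String into a one-element array. `parse_error_banner` only approximates the documented layout. Neither is Canon's formatter code.

```ruby
# Hypothetical stand-in for a comparison result with the two new readers.
ResultStub = Struct.new(:parse_errors_expected, :parse_errors_received) do
  def initialize(expected = nil, received = nil)
    super(Array(expected), Array(received))
  end

  def parse_errors?
    parse_errors_expected.any? || parse_errors_received.any?
  end
end

RULE = "=" * 70

# Approximates the documented banner; returns nil when there is nothing to show.
def parse_error_banner(result)
  return nil unless result.parse_errors?

  lines = [RULE, "  ⚠️  PARSE ERRORS", RULE]
  sides = { "Expected" => result.parse_errors_expected,
            "Received" => result.parse_errors_received }
  sides.each do |side, errors|
    next if errors.empty? # only name the side(s) that actually failed

    lines << "  #{side} side:"
    errors.each { |message| lines << "    #{message}" }
  end
  lines << "  ⚠️  The diff below describes the parsed tree, not the input."
  lines << RULE
  lines.join("\n")
end
```

For example, `ResultStub.new(nil, "Attribute xml:lang redefined")` yields a banner with only a Received section, as in the duplicate-attribute example.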

data/lib/canon/comparison/html_comparator.rb CHANGED

@@ -151,6 +151,8 @@ module Canon
               html_version: detect_html_version_from_node(node1),
               match_options: match_opts_hash,
               algorithm: :dom,
+              parse_errors_expected: Comparison.parse_errors_for(node1),
+              parse_errors_received: Comparison.parse_errors_for(node2),
             )
           elsif result != Comparison::EQUIVALENT && !differences.empty?
             # Non-verbose mode: check equivalence
@@ -300,6 +302,8 @@ module Canon
               html_version: html_version,
               match_options: match_opts_hash.merge(strategy.metadata),
               algorithm: :semantic,
+              parse_errors_expected: Comparison.parse_errors_for(node1),
+              parse_errors_received: Comparison.parse_errors_for(node2),
             )
           else
             # Simple boolean result - equivalent if no normative differences

data/lib/canon/comparison/markup_comparator.rb CHANGED

@@ -1,6 +1,7 @@
 # frozen_string_literal: true
 require_relative "../comparison" # Load base module with constants
+require_relative "node_inspector"
 require_relative "../diff/diff_node"
 require_relative "../diff/path_builder"
@@ -87,23 +88,20 @@ module Canon
           return nil if node.nil?
           # Canon::Xml::Node types
-          if node.is_a?(Canon::Xml::Nodes::RootNode)
+          case node
+          when Canon::Xml::Nodes::RootNode
             # Serialize all children of root
             node.children.map { |child| serialize_node(child) }.join
-          elsif node.is_a?(Canon::Xml::Nodes::ElementNode)
+          when Canon::Xml::Nodes::ElementNode
             serialize_element_node(node)
-          elsif node.is_a?(Canon::Xml::Nodes::TextNode)
+          when Canon::Xml::Nodes::TextNode
             # Use original text (with entity references) if available,
             # otherwise fall back to value (decoded text)
             node.original || node.value
-          elsif node.is_a?(Canon::Xml::Nodes::CommentNode)
+          when Canon::Xml::Nodes::CommentNode
             "<!--#{node.value}-->"
-          elsif node.is_a?(Canon::Xml::Nodes::ProcessingInstructionNode)
+          when Canon::Xml::Nodes::ProcessingInstructionNode
             "<?#{node.target} #{node.data}?>"
-          elsif node.respond_to?(:to_xml)
-            node.to_xml
-          elsif node.respond_to?(:to_html)
-            node.to_html
           else
             node.to_s
           end
@@ -121,8 +119,8 @@ module Canon
             node.attribute_nodes.to_h do |attr|
               [attr.name, attr.value]
             end
-          # Nokogiri nodes
-          elsif node.respond_to?(:attributes)
+          # Nokogiri elements
+          elsif node.is_a?(Nokogiri::XML::Element)
             node.attributes.to_h do |_, attr|
               [attr.name, attr.value]
             end
@@ -227,8 +225,8 @@ module Canon
         def same_node_type?(node1, node2)
           return false if node1.class != node2.class
-          # For Nokogiri/Canon::Xml nodes, check node type
-          if node1.respond_to?(:node_type) && node2.respond_to?(:node_type)
+          case node1
+          when Canon::Xml::Node, Nokogiri::XML::Node
             node1.node_type == node2.node_type
           else
             true
@@ -245,20 +243,7 @@ module Canon
         # @param node [Object] Node to check
         # @return [Boolean] true if node is a comment
         def comment_node?(node)
-          return true if node.respond_to?(:comment?) && node.comment?
-          return true if node.respond_to?(:node_type) && node.node_type == :comment
-          # HTML comments are parsed as TEXT nodes by Nokogiri
-          # Check if this is a text node with HTML comment content
-          if text_node?(node)
-            text = node_text(node)
-            # Strip whitespace and backslashes for comparison
-            # Nokogiri escapes HTML comments as "<\\!-- comment -->" in full documents
-            text_stripped = text.to_s.strip.gsub("\\", "")
-            return true if text_stripped.start_with?("<!--") && text_stripped.end_with?("-->")
-          end
-          false
+          NodeInspector.comment_node?(node)
         end
         # Check if a node is a text node
@@ -266,9 +251,7 @@ module Canon
         # @param node [Object] Node to check
         # @return [Boolean] true if node is a text node
         def text_node?(node)
-          (node.respond_to?(:text?) && node.text? &&
-            !node.respond_to?(:element?)) ||
-            (node.respond_to?(:node_type) && node.node_type == :text)
+          NodeInspector.text_node?(node)
         end
         # Get text content from a node
@@ -276,15 +259,7 @@ module Canon
         # @param node [Object] Node to get text from
         # @return [String] Text content
         def node_text(node)
-          # Canon::Xml::Node TextNode uses .value
-          if node.respond_to?(:value)
-            node.value.to_s
-          # Nokogiri nodes use .content
-          elsif node.respond_to?(:content)
-            node.content.to_s
-          else
-            node.to_s
-          end
+          NodeInspector.text_content(node)
         end
         # Check if difference between two texts is only whitespace
@@ -328,7 +303,7 @@ module Canon
           if diff1 == Canon::Comparison::MISSING_NODE && diff2 == Canon::Comparison::MISSING_NODE
             "element structure mismatch (children differ)"
           else
-            "#{diff1} vs #{diff2}"
+            Canon::Comparison.code_pair_label(diff1, diff2)
           end
         end
@@ -371,26 +346,18 @@ module Canon
         def extract_text_content_from_node(node)
           return nil if node.nil?
-          # For Canon::Xml::Nodes::TextNode
-          return node.value if node.respond_to?(:value) && node.is_a?(Canon::Xml::Nodes::TextNode)
-          # For XML/HTML nodes with text_content method
-          return node.text_content if node.respond_to?(:text_content)
-          # For nodes with text method
-          return node.text if node.respond_to?(:text)
-          # For nodes with content method (Moxml::Text)
-          return node.content if node.respond_to?(:content)
-          # For nodes with value method (other types)
-          return node.value if node.respond_to?(:value)
-          # For simple text nodes or strings
-          return node.to_s if node.is_a?(String)
-          # For other node types, try to_s
-          node.to_s
+          case node
+          when Canon::Xml::Nodes::TextNode
+            node.value
+          when Canon::Xml::Node
+            node.text_content
+          when Nokogiri::XML::Node
+            node.content.to_s
+          when String
+            node
+          else
+            node.to_s
+          end
         rescue StandardError
           nil
         end
@@ -444,26 +411,37 @@ module Canon
         # Determine the appropriate dimension for a node type
         #
+        # Used by ChildComparison to tag per-child orphan diffs with a
+        # dimension that matches what the node *is*, so the formatter
+        # renders correctly.  An element orphan tagged :text_content
+        # would otherwise route through PR #126's one-sided text
+        # formatter and render as +text ""+ instead of as the actual
+        # element (see lutaml/canon#125 follow-up).
+        #
         # @param node [Object] The node to check
         # @return [Symbol] The dimension symbol
         def determine_node_dimension(node)
-          # Canon::Xml::Node types
-          if node.respond_to?(:node_type) && node.node_type.is_a?(Symbol)
+          case node
+          when Canon::Xml::Node
             case node.node_type
+            when :element then :element_structure
             when :comment then :comments
             when :text, :cdata then :text_content
             when :processing_instruction then :processing_instructions
             else :text_content
             end
-          # Moxml/Nokogiri types
-          elsif node.respond_to?(:comment?) && node.comment?
-            :comments
-          elsif node.respond_to?(:text?) && node.text?
-            :text_content
-          elsif node.respond_to?(:cdata?) && node.cdata?
-            :text_content
-          elsif node.respond_to?(:processing_instruction?) && node.processing_instruction?
-            :processing_instructions
+          when Nokogiri::XML::Node
+            if node.comment?
+              :comments
+            elsif node.text? || node.cdata?
+              :text_content
+            elsif node.processing_instruction?
+              :processing_instructions
+            elsif node.element?
+              :element_structure
+            else
+              :text_content
+            end
           else
             :text_content
           end
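
These hunks replace `respond_to?` duck typing with `case`/`when` dispatch on classes. `when` tests each class with `Module#===` in order and stops at the first match. A subclass branch must therefore precede its superclass, as `Canon::Xml::Nodes::TextNode` does ahead of `Canon::Xml::Node` in `extract_text_content_from_node`. This sketch assumes `TextNode` inherits from `Canon::Xml::Node`, because the class definitions are not part of this diff. The stand-in classes below are hypothetical.

```ruby
# Hypothetical stand-ins: SketchTextNode is a kind of SketchNode.
class SketchNode
  def text_content
    "(generic node text)"
  end
end

class SketchTextNode < SketchNode
  attr_reader :value

  def initialize(value)
    super()
    @value = value
  end
end

# Subclass branch first: `when` uses Module#=== and the first match wins.
def extract_text(node)
  case node
  when SketchTextNode then node.value
  when SketchNode then node.text_content
  when String then node
  else node.to_s
  end
end

# Same branches with the superclass first: text nodes never reach their branch.
def extract_text_misordered(node)
  case node
  when SketchNode then node.text_content
  when SketchTextNode then node.value
  else node.to_s
  end
end
```

Explicit class dispatch makes the supported node types visible at a glance. As a trade-off, any type not listed now falls through to the `else` branch instead of being probed by method name.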