canon 0.2.6 → 0.2.7
- checksums.yaml +4 -4
- data/docs/advanced/semantic-diff-report.adoc +2 -0
- data/docs/features/diff-formatting/display-preprocessing.adoc +1 -1
- data/docs/features/diff-formatting/whitespace-adjacency.adoc +9 -0
- data/docs/interfaces/ruby-api/index.adoc +20 -0
- data/docs/understanding/formats/html.adoc +17 -0
- data/lib/canon/comparison/node_inspector.rb +9 -0
- data/lib/canon/comparison/xml_comparator.rb +25 -9
- data/lib/canon/diff_formatter/diff_detail_formatter/dimension_formatter.rb +28 -8
- data/lib/canon/pretty_printer/html.rb +34 -0
- data/lib/canon/version.rb +1 -1
- metadata +2 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 1800400419926b8607eb146490d0bc4a0ecf5e4bfaf2b3007a87e99d440661f3
+  data.tar.gz: 0fc8298171e94fec5e9c4b650001fcc31e79ca774e1d7ad0f19fe91308199b18
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 5c553b671df23a70814bedb7521836b01ce7d6e0ab1af99f00aa05ca6f4ef875d8e8af2e4e63c208443d8ddbab92ed7a05c3c8c6828da2608c42bf6a38f0b7c4
+  data.tar.gz: e183e77684bc3fe7c072caa904cf709a3c20e2f6c38099a42f099e3367d2b0c2336f5407a6def0147f7405056a26a639f4b9dd5075deb6b51a60bcb7921c7c44
data/docs/advanced/semantic-diff-report.adoc
CHANGED
@@ -212,6 +212,8 @@ Reason: Text: "¬······:¬······"
 
 This fallback is implemented in `Canon::DiffFormatter::DiffDetailFormatterHelpers::DimensionFormatter.format_text_content_details` and only triggers when `TextUtils.ambiguous_text_pair?` returns `true` _and_ at least one side has a parent element to render.
 
+The same fallback also applies to the `whitespace_adjacency` dimension (see <<whitespace-adjacency,Whitespace adjacency>>): when the alignment partner of a stray whitespace node extracts to an empty / whitespace-only string, the Reason line reads `Whitespace inside <PARENT>` (rather than `Whitespace before ""`), and the Expected/Actual block surfaces each side's parent element compactly. See `format_whitespace_adjacency_details` and `Canon::Comparison::XmlComparator#build_whitespace_adjacency_reason`.
+
 ==== One-sided text diffs (added or removed text nodes)
 
 When a `text_content` difference carries a text node on one side and `nil` on the other (issue #125) -- the shape that fragment-length mismatches and child-comparison emit when a text-node child is missing -- the renderer mirrors `element_structure`: the missing side reads `(not present)`, and the present side reads the text-node content (whitespace-visualised) plus a brief parent open-tag hint for context. The full ancestor subtree is *not* dumped; only the immediate parent's opening tag is shown, so a missing whitespace text node cannot make the diff look like the entire ancestor differs.
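The Reason-line fallback described in this hunk can be sketched in plain Ruby. This is a minimal illustration under stated assumptions, not Canon's implementation: `WsNode`, `visualize`, `parent_label`, and `adjacency_reason` are hypothetical stand-ins, the whitespace markers (`·` for a space, `¬` for a newline) are inferred from the sample Reason line in the hunk header, and the `expected`/`actual` side labels and the default `before` direction are placeholders.

```ruby
# Hypothetical stand-in for a DOM node; Canon works on Canon::Xml::Node or
# Nokogiri nodes instead.
WsNode = Struct.new(:name, :parent)

# Assumed whitespace visualiser: space => "·", newline => "¬".
def visualize(text)
  text.gsub(" ", "·").gsub("\n", "¬")
end

# Label the parent element, or fall back when it cannot be named.
def parent_label(ws_node)
  name = ws_node.parent&.name
  name && !name.empty? ? "<#{name}>" : "(unknown parent)"
end

# Build the Reason line. When the partner text is nil, empty, or
# whitespace-only, name the parent element instead of the partner text.
def adjacency_reason(ws_node, ws_text, content_text, direction: "before",
                     present_side: "expected", absent_side: "actual")
  tail = "present on #{present_side} (\"#{visualize(ws_text)}\"), " \
         "absent on #{absent_side}"
  if content_text.nil? || content_text.strip.empty?
    "Whitespace inside #{parent_label(ws_node)}: #{tail}"
  else
    "Whitespace #{direction} \"#{visualize(content_text)}\": #{tail}"
  end
end

div = WsNode.new("div", nil)
ws  = WsNode.new("text", div)

puts adjacency_reason(ws, "\n  ", "")
# Whitespace inside <div>: present on expected ("¬··"), absent on actual
puts adjacency_reason(ws, " ", "Hello")
# Whitespace before "Hello": present on expected ("·"), absent on actual
```

The point of the branch is that an empty partner string never reaches the Reason line, so the reader always gets either real partner text or a structural anchor.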
data/docs/features/diff-formatting/display-preprocessing.adoc
CHANGED
@@ -430,7 +430,7 @@ pretty-printer. This is a known future work item.
 |✓ Full
 |✓ (via XML serializer)
 |✓ Full
-|`:pretty_print` uses `Canon::PrettyPrinter::Html
+|`:pretty_print` uses `Canon::PrettyPrinter::Html` in fixture-ready mode (`FORMAT|AS_XHTML|NO_DECLARATION`); `:normalize_pretty_print` falls back to `XmlNormalized` pending a dedicated `HtmlNormalized`; `:c14n` uses Nokogiri HTML5 serialization. In fixture-ready mode, stray structural whitespace (whitespace-only text nodes between block-level siblings) is stripped before formatting so that libxml's `FORMAT` flag produces correct indentation. Whitespace inside `<pre>`, `<script>`, `<style>`, and `<textarea>` is preserved.
 
 |JSON
 |Planned
data/docs/features/diff-formatting/whitespace-adjacency.adoc
CHANGED
@@ -103,6 +103,15 @@ edge of a parent.
 `adjacent to`:: Degenerate fallback for a whitespace node with no
 non-whitespace siblings at all. Rarely emitted.
 
+When the alignment partner extracts to an empty / whitespace-only
+string (e.g. an element with no text descendants), the direction
+phrasing degenerates to `Whitespace before ""` which carries no
+information. In that case Canon falls back to naming the parent
+element instead — `Whitespace inside <PARENT>` — and the
+Expected/Actual detail block renders each side's parent element
+compactly per the contract from
+link:../../advanced/semantic-diff-report.adoc#parent-context-fallback-for-ambiguous-text-diffs[issue #112].
+
 NOTE: An earlier wording (`Whitespace surrounding "X"`) classified the
 *whitespace node's position among its own siblings* rather than its
 direction relative to the partner. That label was misleading when the
data/docs/interfaces/ruby-api/index.adoc
CHANGED
@@ -116,6 +116,9 @@ Where:
 `{Format}`:: The format module (`Xml`, `Html`, `Json`)
 `n`:: Number of spaces (default: 2) or tabs (use 1 for tabs)
 `type`:: Indentation type: `'space'` (default) or `'tab'`
+`fixture_ready`:: (HTML only) When `true`, emit indented XHTML-shaped
+output that strips structural whitespace before formatting. Designed for
+copy-paste into RSpec heredoc fixtures. Default: `false`.
 `content`:: The input string
 
 .Pretty-print examples
@@ -151,6 +154,23 @@ Canon::Xml::PrettyPrinter.new(
 html_input = '<div><p>Hello</p></div>'
 Canon::Html::PrettyPrinter.new(indent: 2).format(html_input)
 
+# HTML fixture-ready mode: produces indented XHTML-shaped output
+# suitable for pasting into RSpec heredoc fixtures. Strips stray
+# structural whitespace (inter-element text nodes) so libxml's FORMAT
+# flag can indent block-level siblings that would otherwise be treated
+# as mixed content. Whitespace inside <pre>, <script>, <style>, and
+# <textarea> is preserved.
+Canon::Html::PrettyPrinter.new(indent: 2, fixture_ready: true)
+  .format('<html><body><div>a</div> <div>b</div></body></html>')
+# =>
+# <html xmlns="http://www.w3.org/1999/xhtml">
+#   <head>...</head>
+#   <body>
+#     <div>a</div>
+#     <div>b</div>
+#   </body>
+# </html>
+
 # JSON with 2-space indentation
 json_input = '{"z":3,"a":{"b":1}}'
 Canon::Json::PrettyPrinter.new(indent: 2).format(json_input)
data/docs/understanding/formats/html.adoc
CHANGED
@@ -235,6 +235,23 @@ HTML whitespace is collapsed per CSS rendering rules. Empty text nodes between e
 Multiple spaces within text content are collapsed to single spaces when `text_content: :normalize` is used.
 ====
 
+==== Fixture-ready pretty-print and structural whitespace
+
+When using `Canon::PrettyPrinter::Html` with `fixture_ready: true` (the mode
+used by the diff pipeline's *PRETTY-PRINTED INPUTS* section), Canon strips
+stray structural whitespace before formatting. Real-world HTML5 input from
+upstream pipelines often carries whitespace-only text nodes between block-level
+siblings (`<body>` → `<div>`, `<br>`, `<div>`, ...). libxml's `FORMAT` flag
+treats any element with a non-whitespace-only text child as mixed content and
+refuses to indent its children — producing a single-line blob instead of a
+readable tree.
+
+The fixture-ready mode removes whitespace-only text nodes from parents that
+are purely structural (no real text content) and are not whitespace-preserving
+elements (`<pre>`, `<script>`, `<style>`, `<textarea>`). Mixed-content runs
+like `<p>foo <em>bar</em> baz</p>` are left untouched so that significant
+inline whitespace is preserved.
+
 === Attribute order
 
 HTML attributes are inherently unordered per the HTML specification, so default is `:ignore`.
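The stripping rule documented in this hunk can be tried out without Nokogiri. The sketch below is an illustration on a hypothetical `TreeNode` class, not the gem's code: it walks the tree recursively, whereas the diffed implementation traverses a `Nokogiri::HTML5` document and collects nodes for removal. The rule itself is the same: drop whitespace-only text children unless the parent is whitespace-preserving or has real text.

```ruby
# Hypothetical minimal tree; text nodes are modelled as nodes named "text".
class TreeNode
  attr_reader :name, :content, :children
  attr_accessor :parent

  def initialize(name, content: nil, children: [])
    @name = name
    @content = content
    @children = children
    children.each { |child| child.parent = self }
  end

  def text?
    name == "text"
  end
end

PRESERVING = %w[pre textarea script style].freeze

def blank_text?(node)
  node.text? && node.content.strip.empty?
end

# True when the parent has at least one non-blank text child (mixed content).
def real_text?(parent)
  parent.children.any? { |c| c.text? && !c.content.strip.empty? }
end

# Remove whitespace-only text children from purely structural parents.
def strip_structural_whitespace!(node)
  node.children.each { |child| strip_structural_whitespace!(child) }
  return if PRESERVING.include?(node.name) || real_text?(node)

  node.children.reject! { |child| blank_text?(child) }
end

# Helper for building text nodes in the example below.
def t(string)
  TreeNode.new("text", content: string)
end

body = TreeNode.new("body", children: [
  TreeNode.new("div", children: [t("a")]),
  t(" "),                                   # stray structural whitespace
  TreeNode.new("div", children: [t("b")]),
  TreeNode.new("p", children: [t("foo "), TreeNode.new("em", children: [t("bar")]), t(" "), t("baz")]),
  TreeNode.new("pre", children: [t("  ")])
])

strip_structural_whitespace!(body)
p body.children.map(&:name)        # => ["div", "div", "p", "pre"]
p body.children[2].children.size   # => 4 (mixed content left untouched)
p body.children[3].children.size   # => 1 (<pre> whitespace preserved)
```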
data/lib/canon/comparison/node_inspector.rb
CHANGED
@@ -98,6 +98,15 @@ module Canon
           []
         end
       end
+
+      # Return the parent node of +node+, or nil when +node+ is not a
+      # recognised DOM backend type or has no parent.
+      def self.parent_of(node)
+        case node
+        when Canon::Xml::Node, Nokogiri::XML::Node
+          node.parent
+        end
+      end
     end
   end
 end
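The new `parent_of` helper relies on a Ruby idiom worth spelling out: a `case` expression with no `else` branch evaluates to `nil` when nothing matches. A small sketch with hypothetical stand-in classes (`CanonNode` and `NokogiriNode` take the place of `Canon::Xml::Node` and `Nokogiri::XML::Node`) shows the three outcomes:

```ruby
# Hypothetical stand-ins for the two DOM backends named in the diff.
CanonNode    = Struct.new(:name, :parent)
NokogiriNode = Struct.new(:name, :parent)

def parent_of(node)
  case node
  when CanonNode, NokogiriNode
    node.parent
  end # no else branch: unrecognised types fall through to nil
end

root  = CanonNode.new("root", nil)
child = NokogiriNode.new("p", root)

p parent_of(child).name  # => "root"
p parent_of(root)        # => nil (recognised type, no parent)
p parent_of("a string")  # => nil (not a recognised node type)
```

Callers can therefore guard with a single `unless parent` check, which is the form the `xml_comparator.rb` hunks use.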
data/lib/canon/comparison/xml_comparator.rb
CHANGED
@@ -873,12 +873,31 @@ differences)
           return build_text_diff_reason(text1, text2)
         end
 
-        direction = whitespace_partner_direction(ws_node)
         ws_vis = visualize_whitespace(ws_text)
-        content_vis = content_text ? visualize_whitespace(truncate_text(content_text)) : "(none)"
 
-
-
+        if content_text.nil? || content_text.strip.empty?
+          # Partner content extracts to "" / whitespace-only — naming it
+          # in the Reason ("Whitespace before \"\"") gives the reader
+          # nothing. Fall back to the parent element name so the
+          # diff carries structural context (issue #112's contract,
+          # extended from :text_content to :whitespace_adjacency).
+          parent_label = whitespace_adjacency_parent_label(ws_node)
+          "Whitespace inside #{parent_label}: " \
+            "present on #{present_side} (\"#{ws_vis}\"), absent on #{absent_side}"
+        else
+          direction = whitespace_partner_direction(ws_node)
+          content_vis = visualize_whitespace(truncate_text(content_text))
+          "Whitespace #{direction} \"#{content_vis}\": " \
+            "present on #{present_side} (\"#{ws_vis}\"), absent on #{absent_side}"
+        end
+      end
+
+      def whitespace_adjacency_parent_label(ws_node)
+        parent = NodeInspector.parent_of(ws_node)
+        return "(unknown parent)" unless parent
+
+        name = parent.name
+        name && !name.empty? ? "<#{name}>" : "(unknown parent)"
       end
 
       # Direction of the partner content relative to the whitespace node,
@@ -889,11 +908,8 @@ differences)
       # "adjacent to" as a degenerate fallback when neither neighbour
       # exists.
       def whitespace_partner_direction(ws_node)
-
-
-
-        parent = ws_node.parent
-        return "adjacent to" if parent.nil?
+        parent = NodeInspector.parent_of(ws_node)
+        return "adjacent to" unless parent
 
         siblings = parent.children
         idx = siblings.index(ws_node)
data/lib/canon/diff_formatter/diff_detail_formatter/dimension_formatter.rb
CHANGED
@@ -525,14 +525,34 @@ expand_difference: false)
           text1 = NodeUtils.get_node_text(node1).to_s
           text2 = NodeUtils.get_node_text(node2).to_s
 
-
-
-
-
-
-
-
-
+          if TextUtils.ambiguous_text_pair?(text1, text2) &&
+             (NodeUtils.parent_of(node1) || NodeUtils.parent_of(node2))
+            # Both sides extract to empty / whitespace-only strings —
+            # `""` / `""` tells the reader nothing. Fall back to a
+            # brief parent open-tag hint per #112's contract, but
+            # without dumping the full ancestor subtree (#125).
+            hint1 = NodeUtils.serialize_open_tag(NodeUtils.parent_of(node1))
+            hint2 = NodeUtils.serialize_open_tag(NodeUtils.parent_of(node2))
+            ws1 = TextUtils.visualize_whitespace(text1)
+            ws2 = TextUtils.visualize_whitespace(text2)
+            detail1 = ColorHelper.colorize(
+              "\"#{ws1}\" in #{hint1}",
+              :red, use_color
+            )
+            detail2 = ColorHelper.colorize(
+              "\"#{ws2}\" in #{hint2}",
+              :green, use_color
+            )
+          else
+            detail1 = ColorHelper.colorize(
+              "\"#{TextUtils.visualize_whitespace(text1)}\"",
+              :red, use_color
+            )
+            detail2 = ColorHelper.colorize(
+              "\"#{TextUtils.visualize_whitespace(text2)}\"",
+              :green, use_color
+            )
+          end
 
           reason = if diff.is_a?(Canon::Diff::DiffNode)
                      diff.reason
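The detail-line branch in this hunk follows the same pattern as the Reason-line fallback: when both texts are uninformative, append a short parent hint. The sketch below is an approximation under loud assumptions. `ambiguous_text_pair?` is assumed to mean "both sides empty or whitespace-only" (taken from the diff's inline comment), `open_tag` is a hypothetical simplification of `serialize_open_tag` that ignores attributes, the `(no parent)` placeholder is invented for the example, and colouring is omitted.

```ruby
# Hypothetical stand-in for a DOM node.
TextNode = Struct.new(:name, :parent)

# Assumed whitespace visualiser: space => "·", newline => "¬".
def visualize_ws(text)
  text.gsub(" ", "·").gsub("\n", "¬")
end

# Assumption: a pair is ambiguous when both sides are empty/whitespace-only.
def ambiguous_text_pair?(text1, text2)
  text1.strip.empty? && text2.strip.empty?
end

# Hypothetical open-tag hint without attributes.
def open_tag(node)
  node ? "<#{node.name}>" : "(no parent)"
end

# Returns the [expected, actual] detail strings for a text difference.
def text_details(node1, text1, node2, text2)
  if ambiguous_text_pair?(text1, text2) && (node1.parent || node2.parent)
    ["\"#{visualize_ws(text1)}\" in #{open_tag(node1.parent)}",
     "\"#{visualize_ws(text2)}\" in #{open_tag(node2.parent)}"]
  else
    ["\"#{visualize_ws(text1)}\"", "\"#{visualize_ws(text2)}\""]
  end
end

li = TextNode.new("li", nil)
n1 = TextNode.new("text", li)
n2 = TextNode.new("text", li)

p text_details(n1, "\n ", n2, "")  # => ["\"¬·\" in <li>", "\"\" in <li>"]
p text_details(n1, "a", n2, "b")   # => ["\"a\"", "\"b\""]
```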
data/lib/canon/pretty_printer/html.rb
CHANGED
@@ -29,6 +29,8 @@ module Canon
     #
     # See lutaml/canon#133, lutaml/canon#135.
     class Html
+      WHITESPACE_PRESERVING_ELEMENTS = %w[pre textarea script style].freeze
+
       def initialize(indent: 2, indent_type: "space", fixture_ready: false)
         @indent = indent.to_i
         @indent_type = indent_type
@@ -83,6 +85,7 @@ module Canon
       # suppresses the +<?xml ...?>+ prefix.
       def format_fixture_ready(html_string)
         doc = Nokogiri::HTML5(html_string)
+        strip_structural_whitespace!(doc)
         io = StringIO.new
         if @indent_type == "tab"
           doc.write_to(io, save_with: fixture_ready_save_options,
@@ -94,6 +97,37 @@ module Canon
         io.string
       end
 
+      # libxml's +FORMAT+ save flag does not insert indentation around
+      # the children of any element it sees as mixed content (any
+      # non-whitespace-only text node child). +Nokogiri::HTML5+ does
+      # not accept the +noblanks+ option that the XML parser uses to
+      # strip these inter-sibling text nodes pre-serialisation, so we
+      # do it manually here: drop whitespace-only text nodes whose
+      # parent is structural (no real text content) and not a
+      # whitespace-preserving element. Mixed-content runs like
+      # +<p>foo <em>bar</em> baz</p>+ are left alone.
+      def strip_structural_whitespace!(doc)
+        to_remove = []
+        doc.traverse do |node|
+          next unless node.text?
+          next unless node.content.strip.empty?
+
+          parent = node.parent
+          next if parent.nil?
+          next if WHITESPACE_PRESERVING_ELEMENTS.include?(parent.name)
+          next if parent_has_real_text?(parent)
+
+          to_remove << node
+        end
+        to_remove.each(&:remove)
+      end
+
+      def parent_has_real_text?(parent)
+        parent.children.any? do |c|
+          c.text? && !c.content.strip.empty?
+        end
+      end
+
       def fixture_ready_save_options
         Nokogiri::XML::Node::SaveOptions::FORMAT |
           Nokogiri::XML::Node::SaveOptions::AS_XHTML |
data/lib/canon/version.rb
CHANGED
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: canon
 version: !ruby/object:Gem::Version
-  version: 0.2.
+  version: 0.2.7
 platform: ruby
 authors:
 - Ribose Inc.
 autorequire:
 bindir: exe
 cert_chain: []
-date: 2026-05-
+date: 2026-05-04 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: diff-lcs