canon 0.2.5 → 0.2.6
- checksums.yaml +4 -4
- data/README.adoc +4 -0
- data/docs/advanced/diff-classification.adoc +16 -0
- data/docs/features/diff-formatting/index.adoc +3 -1
- data/docs/features/diff-formatting/whitespace-adjacency.adoc +91 -13
- data/lib/canon/comparison/xml_comparator.rb +17 -14
- data/lib/canon/version.rb +1 -1
- metadata +1 -1
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 6437f1a8b556bb49bffbcecf47ec0eeecabdf6541bd5baa5954ac88f98f33a2c
+  data.tar.gz: 98eff2aa558165dc7e13c8d29da21d8d9c6589cae1d48a18d27f0420d6be7198
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 055614c143bca292b575755f5b4a1554a002e0d6f264ddee3e29049f89d6f9795a61069c1bdf7ebfa459ed11ac2a21203a779e115f8aee143a2aa3c77951a086
+  data.tar.gz: ff64c25654c1eef41dcc80b471df2958c516e69fa625723bdecfd18c5a99716e7f347c313bf72c28f3b898547d5c97aac3ebb62bcf44e726bc49e668a73e0dd9
data/README.adoc
CHANGED
@@ -614,6 +614,10 @@ See link:docs/MODES[Diff modes] for details.
 * **Formatting diff detection**: Automatically detects and highlights purely cosmetic whitespace/line break differences
 * **Whitespace visualization**: Make invisible characters visible with CJK-safe
 Unicode symbols
+* **Whitespace adjacency reporting**: Stray whitespace-only text nodes are
+reported as a dedicated `:whitespace_adjacency` dimension with direction
+wording (`before`/`after`/`adjacent to`) instead of cascading into
+misleading `:text_content` mismatches
 * **Non-ASCII detection**: Warnings for unexpected Unicode characters
 * **Customizable**: Character maps, context lines, grouping options
 
data/docs/advanced/diff-classification.adoc
CHANGED
@@ -266,6 +266,22 @@ match dimension but follows the same normative rule as all general dimensions:
 - Structural differences tracked but don't affect equivalence
 - Useful for content-only comparisons where wrapper elements don't matter
 
+==== Whitespace Adjacency
+
+`:whitespace_adjacency` is a derived dimension — emitted when the
+alignment walk pairs a whitespace-only text node on one side against a
+content node on the other. The Reason line describes the whitespace's
+direction relative to the partner content: `before`, `after`, or
+`adjacent to`.
+
+* **Always normative** — differences always affect equivalence
+* **Not user-configurable** — dimension is always tracked when the
+re-alignment walk encounters an asymmetric whitespace node
+* **Report-only** — does not change equivalence outcomes compared to
+pre-#137 behaviour; only changes the diff-report shape (see
+link:../features/diff-formatting/whitespace-adjacency.adoc[Whitespace
+adjacency] for details)
+
 .Example: Comment handling
 ====
 [source,ruby]
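The classification rules added in this hunk (a derived, always-normative `:whitespace_adjacency` dimension) can be illustrated with a small tally. This is a minimal sketch rather than Canon's implementation: `DiffStub` is a hypothetical stand-in for `DiffNode` that exposes only `dimension` and `normative?`, and the normative flags on the sample data are assumptions chosen to mirror the collapsed three-difference report in whitespace-adjacency.adoc.

```ruby
# Hypothetical stand-in for Canon's DiffNode; only the two fields the
# documentation mentions (dimension, normative?) are modelled here.
DiffStub = Struct.new(:dimension, :normative, keyword_init: true) do
  def normative?
    normative
  end
end

# Sample data (assumed): two whitespace-adjacency diffs and one text diff.
diffs = [
  DiffStub.new(dimension: :whitespace_adjacency, normative: true),
  DiffStub.new(dimension: :whitespace_adjacency, normative: true),
  DiffStub.new(dimension: :text_content, normative: true)
]

# Count differences per dimension.
counts = diffs.group_by(&:dimension).transform_values(&:size)

# Documents are equivalent only when no normative difference remains.
equivalent = diffs.none?(&:normative?)

puts counts.inspect
puts "equivalent: #{equivalent}"
```

Because `:whitespace_adjacency` is described as always normative, any such entry is enough to make the verdict non-equivalent in this model.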
data/docs/features/diff-formatting/index.adoc
CHANGED
@@ -30,7 +30,9 @@ Canon's diff formatting includes:
 algorithms
 * **Whitespace adjacency**: Stray whitespace-only text nodes are anchored at
 themselves instead of cascading into mismatches against neighbouring
-content
+content. The Reason line names the direction relative to the partner
+(`before`/`after`/`adjacent to`)
+(link:./whitespace-adjacency.adoc[details])
 
 == Available formatting options
 
data/docs/features/diff-formatting/whitespace-adjacency.adoc
CHANGED
@@ -75,28 +75,104 @@ After the new contract, the cascade above collapses to:
 
 [source]
 ----
-DIFFERENCE #1 — whitespace_adjacency: Whitespace
+DIFFERENCE #1 — whitespace_adjacency: Whitespace before "20483":
 present on EXPECTED ("↵░░"), absent on ACTUAL
-DIFFERENCE #2 — whitespace_adjacency: Whitespace
+DIFFERENCE #2 — whitespace_adjacency: Whitespace before ",":
 present on EXPECTED ("↵░░"), absent on ACTUAL
 DIFFERENCE #3 — text_content: "↵░░,↵░░" vs ", "
 ----
 
-==
+== Direction relative to the partner
 
-The Reason line names the
-relative to
+The Reason line names the document-order position of the whitespace
+node relative to the *partner content node* it was zipped against by
+the alignment walk. The partner is the next (or, at parent edge,
+previous) non-whitespace sibling on the whitespace-bearing side, which
+is what aligns against the corresponding content node on the other
+side.
 
-
-sibling
+`before`:: The whitespace immediately precedes its next non-whitespace
+sibling. This is the common case (e.g. indentation between two inline
+spans where the asymmetric whitespace sits on the leading edge of the
+partner).
 
-
-sibling
+`after`:: The whitespace trails the previous non-whitespace sibling
+and has no non-whitespace sibling after it. Emitted at the trailing
+edge of a parent.
 
-
+`adjacent to`:: Degenerate fallback for a whitespace node with no
+non-whitespace siblings at all. Rarely emitted.
 
-
-
+NOTE: An earlier wording (`Whitespace surrounding "X"`) classified the
+*whitespace node's position among its own siblings* rather than its
+direction relative to the partner. That label was misleading when the
+whitespace sat between two element siblings but the asymmetry was
+one-sided — see issue #137 follow-up.
+
+=== Examples
+
+==== "before" — whitespace between inline element siblings
+
+[source,ruby]
+----
+html1 = "<a><span>ISO </span>\n <span>712</span></a>"
+html2 = "<a><span>ISO </span><span>712</span></a>"
+
+result = Canon::Comparison.equivalent?(html1, html2,
+format: :html5, verbose: true)
+# => #<ComparisonResult equivalent=false>
+
+# The stray "\n " between two spans is the only asymmetric node.
+# It sits immediately before <span>712</span>, its next non-ws sibling.
+result.differences.first.reason
+# => "Whitespace before \"712\": present on EXPECTED (\"░\"), absent on ACTUAL"
+----
+
+==== "after" — trailing whitespace in a whitespace-preserving element
+
+[source,ruby]
+----
+# <code> preserves whitespace, so the trailing newline survives the
+# upstream filter and pairs against the extra <b>B</b> on the other side.
+html1 = "<code><b>A</b>\n</code>"
+html2 = "<code><b>A</b><b>B</b></code>"
+
+result = Canon::Comparison.equivalent?(html1, html2,
+format: :html5, verbose: true)
+result.differences.first.reason
+# => "Whitespace after \"B\": present on EXPECTED (\"↵\"), absent on ACTUAL"
+----
+
+==== "adjacent to" — sole-child whitespace node
+
+[source,ruby]
+----
+# A whitespace-only text node as the only child of <code>, paired
+# against an element on the other side. No non-ws siblings exist.
+html1 = "<code>\n</code>"
+html2 = "<code><b>A</b></code>"
+
+result = Canon::Comparison.equivalent?(html1, html2,
+format: :html5, verbose: true)
+result.differences.first.reason
+# => "Whitespace adjacent to \"A\": present on EXPECTED (\"↵\"), absent on ACTUAL"
+----
+
+== Working with :whitespace_adjacency diffs programmatically
+
+Use the `dimension` field on `DiffNode` to filter:
+
+[source,ruby]
+----
+result = Canon::Comparison.equivalent?(html1, html2,
+format: :html5, verbose: true)
+
+# Find all whitespace-adjacency diffs
+ws_diffs = result.differences.select { |d| d.dimension == :whitespace_adjacency }
+
+# These are always normative — they affect the equivalence verdict
+ws_diffs.all?(&:normative?) # => true
+----
 
 == What this contract does NOT do
 
@@ -137,4 +213,6 @@ whitespace node as a single normative `:whitespace_adjacency` diff.
 
 The cascade behaviour was reported in
 https://github.com/lutaml/canon/issues/137[issue #137]. The fix landed
-as a report-only re-alignment in PR #138.
+as a report-only re-alignment in PR #138. PR #141 replaced the
+misleading `:surrounding`/`:preceding`/`:following` position labels
+with direction-faithful wording (`before`/`after`/`adjacent to`).
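The Reason line documented in whitespace-adjacency.adoc has a fixed shape, so a report post-processor could split it back into its parts. The following is a rough sketch under stated assumptions: `REASON_PATTERN` and `parse_reason` are hypothetical names that are not part of Canon, and the pattern only targets the wording shown in the 0.2.6 examples. The Reason text is written for human readers and may change between releases, so filtering on `dimension` remains the sturdier option.

```ruby
# Hypothetical pattern for the 0.2.6 Reason wording:
#   Whitespace <direction> "<partner>": present on <SIDE> ("<ws>"), absent on <SIDE>
# \s+ tolerates the line wrap that the formatted report inserts after the colon.
REASON_PATTERN = /\AWhitespace (?<direction>before|after|adjacent to) "(?<partner>.*)":\s+present on (?<present_side>\w+) \("(?<whitespace>.*)"\),\s+absent on (?<absent_side>\w+)\z/m

# Returns a Hash with symbol keys, or nil when the string has another shape.
def parse_reason(reason)
  match = REASON_PATTERN.match(reason)
  match && match.named_captures.transform_keys(&:to_sym)
end

parsed = parse_reason('Whitespace before "712": present on EXPECTED ("░"), absent on ACTUAL')
puts parsed[:direction]    # before
puts parsed[:partner]      # 712
puts parsed[:present_side] # EXPECTED
```

Strings that do not follow this shape, such as `:text_content` reasons, simply yield `nil`.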
data/lib/canon/comparison/xml_comparator.rb
CHANGED
@@ -873,36 +873,39 @@ differences)
     return build_text_diff_reason(text1, text2)
   end
 
-
+  direction = whitespace_partner_direction(ws_node)
   ws_vis = visualize_whitespace(ws_text)
   content_vis = content_text ? visualize_whitespace(truncate_text(content_text)) : "(none)"
 
-  "Whitespace #{
+  "Whitespace #{direction} \"#{content_vis}\": " \
     "present on #{present_side} (\"#{ws_vis}\"), absent on #{absent_side}"
 end
 
-
-
+# Direction of the partner content relative to the whitespace node,
+# phrased from the partner's point of view: "before" when the
+# whitespace immediately precedes its next non-whitespace sibling
+# (the alignment partner on the other side), "after" when the
+# whitespace trails the previous non-whitespace sibling, or
+# "adjacent to" as a degenerate fallback when neither neighbour
+# exists.
+def whitespace_partner_direction(ws_node)
+  return "adjacent to" unless ws_node.is_a?(Canon::Xml::Node) ||
     ws_node.is_a?(Nokogiri::XML::Node)
 
   parent = ws_node.parent
-  return
+  return "adjacent to" if parent.nil?
 
   siblings = parent.children
   idx = siblings.index(ws_node)
-  return
+  return "adjacent to" unless idx
 
-
-
-
-  if before && after then :surrounding
-  elsif before then :following
-  elsif after then :preceding
-  else :isolated
+  if non_ws_sibling_exists?(siblings, idx, 1) then "before"
+  elsif non_ws_sibling_exists?(siblings, idx, -1) then "after"
+  else "adjacent to"
   end
 end
 
-def
+def non_ws_sibling_exists?(siblings, idx, direction)
   i = idx + direction
   while i >= 0 && i < siblings.length
     s = siblings[i]
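The direction rule in `whitespace_partner_direction` can be tried in isolation. The sketch below is a simplified model, not the code from the gem: siblings are plain strings instead of `Canon::Xml::Node` or `Nokogiri::XML::Node` objects, and `blank_text?`, `non_blank_sibling?` and `partner_direction` are hypothetical helpers. In particular, the hunk above cuts off inside `non_ws_sibling_exists?`, so the scan-and-skip loop here is an assumption about its behaviour.

```ruby
# Assumed test for "whitespace-only": nothing is left after stripping.
def blank_text?(text)
  text.strip.empty?
end

# Walks from idx in steps of +1 or -1 and reports whether any
# non-blank sibling exists in that direction.
def non_blank_sibling?(siblings, idx, step)
  i = idx + step
  while i >= 0 && i < siblings.length
    return true unless blank_text?(siblings[i])

    i += step
  end
  false
end

# Mirrors the before / after / adjacent-to precedence from the diff:
# a following partner wins over a preceding one.
def partner_direction(siblings, idx)
  if non_blank_sibling?(siblings, idx, 1)
    "before"
  elsif non_blank_sibling?(siblings, idx, -1)
    "after"
  else
    "adjacent to"
  end
end

puts partner_direction(["ISO ", "\n ", "712"], 1) # before
puts partner_direction(["A", "\n"], 1)            # after
puts partner_direction(["\n"], 0)                 # adjacent to
```

Checking the following side first matches the documented contract that `after` is only reported when no non-whitespace sibling comes later.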
data/lib/canon/version.rb
CHANGED