canon 0.2.5 → 0.2.6

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: 615e3154c89a9850e86c39852201e5573b461ac62d52cc423523e444ace301f7
-   data.tar.gz: 37ee00969f0682dde670168fbd7888294edda612220bfbebb7c950efbcb76aa2
+   metadata.gz: 6437f1a8b556bb49bffbcecf47ec0eeecabdf6541bd5baa5954ac88f98f33a2c
+   data.tar.gz: 98eff2aa558165dc7e13c8d29da21d8d9c6589cae1d48a18d27f0420d6be7198
  SHA512:
-   metadata.gz: bce4239ab6a471edd896fd3b54def4e57e21714078cb3631b55363b50646349a6923eed1e208e5706c3319d3e7a2ae75f2db698ffe853c0e03a754d76c856679
-   data.tar.gz: 1441bd5412658d9d2b975e3889fc95bfd080dec2b89b731f71e191f5ca7bbc7e0a8aa63e787916781bd5e653732c16d5c03b0d3fc3b967a3b653a2a735e62636
+   metadata.gz: 055614c143bca292b575755f5b4a1554a002e0d6f264ddee3e29049f89d6f9795a61069c1bdf7ebfa459ed11ac2a21203a779e115f8aee143a2aa3c77951a086
+   data.tar.gz: ff64c25654c1eef41dcc80b471df2958c516e69fa625723bdecfd18c5a99716e7f347c313bf72c28f3b898547d5c97aac3ebb62bcf44e726bc49e668a73e0dd9
data/README.adoc CHANGED
@@ -614,6 +614,10 @@ See link:docs/MODES[Diff modes] for details.
  * **Formatting diff detection**: Automatically detects and highlights purely cosmetic whitespace/line break differences
  * **Whitespace visualization**: Make invisible characters visible with CJK-safe
  Unicode symbols
+ * **Whitespace adjacency reporting**: Stray whitespace-only text nodes are
+ reported as a dedicated `:whitespace_adjacency` dimension with direction
+ wording (`before`/`after`/`adjacent to`) instead of cascading into
+ misleading `:text_content` mismatches
  * **Non-ASCII detection**: Warnings for unexpected Unicode characters
  * **Customizable**: Character maps, context lines, grouping options

@@ -266,6 +266,22 @@ match dimension but follows the same normative rule as all general dimensions:
  - Structural differences tracked but don't affect equivalence
  - Useful for content-only comparisons where wrapper elements don't matter

+ ==== Whitespace Adjacency
+
+ `:whitespace_adjacency` is a derived dimension — emitted when the
+ alignment walk pairs a whitespace-only text node on one side against a
+ content node on the other. The Reason line describes the whitespace's
+ direction relative to the partner content: `before`, `after`, or
+ `adjacent to`.
+
+ * **Always normative** — differences always affect equivalence
+ * **Not user-configurable** — dimension is always tracked when the
+ re-alignment walk encounters an asymmetric whitespace node
+ * **Report-only** — does not change equivalence outcomes compared to
+ pre-#137 behaviour; only changes the diff-report shape (see
+ link:../features/diff-formatting/whitespace-adjacency.adoc[Whitespace
+ adjacency] for details)
+
  .Example: Comment handling
  ====
  [source,ruby]
@@ -30,7 +30,9 @@ Canon's diff formatting includes:
  algorithms
  * **Whitespace adjacency**: Stray whitespace-only text nodes are anchored at
  themselves instead of cascading into mismatches against neighbouring
- content (link:./whitespace-adjacency.adoc[details])
+ content. The Reason line names the direction relative to the partner
+ (`before`/`after`/`adjacent to`)
+ (link:./whitespace-adjacency.adoc[details])

  == Available formatting options

@@ -75,28 +75,104 @@ After the new contract, the cascade above collapses to:

  [source]
  ----
- DIFFERENCE #1 — whitespace_adjacency: Whitespace surrounding "20483":
+ DIFFERENCE #1 — whitespace_adjacency: Whitespace before "20483":
  present on EXPECTED ("↵░░"), absent on ACTUAL
- DIFFERENCE #2 — whitespace_adjacency: Whitespace surrounding ",":
+ DIFFERENCE #2 — whitespace_adjacency: Whitespace before ",":
  present on EXPECTED ("↵░░"), absent on ACTUAL
  DIFFERENCE #3 — text_content: "↵░░,↵░░" vs ", "
  ----

- == Adjacency positions
+ == Direction relative to the partner

- The Reason line names the adjacency position of the whitespace node
- relative to its non-whitespace siblings:
+ The Reason line names the document-order position of the whitespace
+ node relative to the *partner content node* it was zipped against by
+ the alignment walk. The partner is the next (or, at parent edge,
+ previous) non-whitespace sibling on the whitespace-bearing side, which
+ is what aligns against the corresponding content node on the other
+ side.

- `:preceding`:: Whitespace at the start of its parent (no non-whitespace
- sibling before it, has one after it).
+ `before`:: The whitespace immediately precedes its next non-whitespace
+ sibling. This is the common case (e.g. indentation between two inline
+ spans where the asymmetric whitespace sits on the leading edge of the
+ partner).

- `:following`:: Whitespace at the end of its parent (has a non-whitespace
- sibling before it, none after).
+ `after`:: The whitespace trails the previous non-whitespace sibling
+ and has no non-whitespace sibling after it. Emitted at the trailing
+ edge of a parent.

- `:surrounding`:: Sandwiched between two non-whitespace siblings.
+ `adjacent to`:: Degenerate fallback for a whitespace node with no
+ non-whitespace siblings at all. Rarely emitted.

- `:isolated`:: No non-whitespace siblings at all (degenerate; rarely
- emitted).
+ NOTE: An earlier wording (`Whitespace surrounding "X"`) classified the
+ *whitespace node's position among its own siblings* rather than its
+ direction relative to the partner. That label was misleading when the
+ whitespace sat between two element siblings but the asymmetry was
+ one-sided — see issue #137 follow-up.
+
+ === Examples
+
+ ==== "before" — whitespace between inline element siblings
+
+ [source,ruby]
+ ----
+ html1 = "<a><span>ISO </span>\n <span>712</span></a>"
+ html2 = "<a><span>ISO </span><span>712</span></a>"
+
+ result = Canon::Comparison.equivalent?(html1, html2,
+                                        format: :html5, verbose: true)
+ # => #<ComparisonResult equivalent=false>
+
+ # The stray "\n " between two spans is the only asymmetric node.
+ # It sits immediately before <span>712</span>, its next non-ws sibling.
+ result.differences.first.reason
+ # => "Whitespace before \"712\": present on EXPECTED (\"░\"), absent on ACTUAL"
+ ----
+
+ ==== "after" — trailing whitespace in a whitespace-preserving element
+
+ [source,ruby]
+ ----
+ # <code> preserves whitespace, so the trailing newline survives the
+ # upstream filter and pairs against the extra <b>B</b> on the other side.
+ html1 = "<code><b>A</b>\n</code>"
+ html2 = "<code><b>A</b><b>B</b></code>"
+
+ result = Canon::Comparison.equivalent?(html1, html2,
+                                        format: :html5, verbose: true)
+ result.differences.first.reason
+ # => "Whitespace after \"B\": present on EXPECTED (\"↵\"), absent on ACTUAL"
+ ----
+
+ ==== "adjacent to" — sole-child whitespace node
+
+ [source,ruby]
+ ----
+ # A whitespace-only text node as the only child of <code>, paired
+ # against an element on the other side. No non-ws siblings exist.
+ html1 = "<code>\n</code>"
+ html2 = "<code><b>A</b></code>"
+
+ result = Canon::Comparison.equivalent?(html1, html2,
+                                        format: :html5, verbose: true)
+ result.differences.first.reason
+ # => "Whitespace adjacent to \"A\": present on EXPECTED (\"↵\"), absent on ACTUAL"
+ ----
+
+ == Working with :whitespace_adjacency diffs programmatically
+
+ Use the `dimension` field on `DiffNode` to filter:
+
+ [source,ruby]
+ ----
+ result = Canon::Comparison.equivalent?(html1, html2,
+                                        format: :html5, verbose: true)
+
+ # Find all whitespace-adjacency diffs
+ ws_diffs = result.differences.select { |d| d.dimension == :whitespace_adjacency }
+
+ # These are always normative — they affect the equivalence verdict
+ ws_diffs.all?(&:normative?) # => true
+ ----

  == What this contract does NOT do

@@ -137,4 +213,6 @@ whitespace node as a single normative `:whitespace_adjacency` diff.

  The cascade behaviour was reported in
  https://github.com/lutaml/canon/issues/137[issue #137]. The fix landed
- as a report-only re-alignment in PR #138.
+ as a report-only re-alignment in PR #138. PR #141 replaced the
+ misleading `:surrounding`/`:preceding`/`:following` position labels
+ with direction-faithful wording (`before`/`after`/`adjacent to`).
@@ -873,36 +873,39 @@ differences)
      return build_text_diff_reason(text1, text2)
    end

-   position = whitespace_adjacency_position(ws_node)
+   direction = whitespace_partner_direction(ws_node)
    ws_vis = visualize_whitespace(ws_text)
    content_vis = content_text ? visualize_whitespace(truncate_text(content_text)) : "(none)"

-   "Whitespace #{position} \"#{content_vis}\": " \
+   "Whitespace #{direction} \"#{content_vis}\": " \
      "present on #{present_side} (\"#{ws_vis}\"), absent on #{absent_side}"
  end

- def whitespace_adjacency_position(ws_node)
-   return :isolated unless ws_node.is_a?(Canon::Xml::Node) ||
+ # Direction of the partner content relative to the whitespace node,
+ # phrased from the partner's point of view: "before" when the
+ # whitespace immediately precedes its next non-whitespace sibling
+ # (the alignment partner on the other side), "after" when the
+ # whitespace trails the previous non-whitespace sibling, or
+ # "adjacent to" as a degenerate fallback when neither neighbour
+ # exists.
+ def whitespace_partner_direction(ws_node)
+   return "adjacent to" unless ws_node.is_a?(Canon::Xml::Node) ||
      ws_node.is_a?(Nokogiri::XML::Node)

    parent = ws_node.parent
-   return :isolated if parent.nil?
+   return "adjacent to" if parent.nil?

    siblings = parent.children
    idx = siblings.index(ws_node)
-   return :isolated unless idx
+   return "adjacent to" unless idx

-   before = sibling_with_content?(siblings, idx, -1)
-   after = sibling_with_content?(siblings, idx, 1)
-
-   if before && after then :surrounding
-   elsif before then :following
-   elsif after then :preceding
-   else :isolated
+   if non_ws_sibling_exists?(siblings, idx, 1) then "before"
+   elsif non_ws_sibling_exists?(siblings, idx, -1) then "after"
+   else "adjacent to"
    end
  end

- def sibling_with_content?(siblings, idx, direction)
+ def non_ws_sibling_exists?(siblings, idx, direction)
    i = idx + direction
    while i >= 0 && i < siblings.length
      s = siblings[i]
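
The hunk above is cut off before the body of the sibling scan, so the following is only a minimal sketch of how the old position labels and the new direction wording differ, not the gem's implementation. It assumes siblings can be modelled as plain strings and that a node counts as whitespace-only when its text is blank. The helper names `whitespace_only?`, `non_ws_sibling?`, `legacy_position`, and `partner_direction` are hypothetical stand-ins.

```ruby
# Simplified model (assumption): each sibling is a plain string, and a
# sibling is whitespace-only when nothing remains after stripping it.
def whitespace_only?(text)
  text.strip.empty?
end

# Hypothetical stand-in for the scan helper: walks from idx in the given
# step direction (+1 or -1) and reports whether any non-whitespace
# sibling exists on that side.
def non_ws_sibling?(siblings, idx, step)
  i = idx + step
  while i >= 0 && i < siblings.length
    return true unless whitespace_only?(siblings[i])
    i += step
  end
  false
end

# Old labelling: position of the whitespace node among its own siblings.
def legacy_position(siblings, idx)
  before = non_ws_sibling?(siblings, idx, -1)
  after = non_ws_sibling?(siblings, idx, 1)
  if before && after then :surrounding
  elsif before then :following
  elsif after then :preceding
  else :isolated
  end
end

# New wording: a following non-whitespace sibling wins, then a preceding
# one, then the degenerate fallback.
def partner_direction(siblings, idx)
  if non_ws_sibling?(siblings, idx, 1) then "before"
  elsif non_ws_sibling?(siblings, idx, -1) then "after"
  else "adjacent to"
  end
end

# Whitespace sandwiched between two content siblings.
siblings = ["ISO ", "\n ", "712"]
puts legacy_position(siblings, 1)   # prints "surrounding"
puts partner_direction(siblings, 1) # prints "before"
```

Under this model a sandwiched whitespace node, previously labelled `:surrounding`, is now reported as `before`, because the next non-whitespace sibling is checked first.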
data/lib/canon/version.rb CHANGED
@@ -1,5 +1,5 @@
  # frozen_string_literal: true

  module Canon
-   VERSION = "0.2.5"
+   VERSION = "0.2.6"
  end
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: canon
  version: !ruby/object:Gem::Version
-   version: 0.2.5
+   version: 0.2.6
  platform: ruby
  authors:
  - Ribose Inc.