canon 0.1.23 → 0.2.1
- checksums.yaml +4 -4
- data/.rubocop_todo.yml +155 -30
- data/docs/INDEX.adoc +4 -0
- data/docs/advanced/diff-classification.adoc +3 -2
- data/docs/advanced/verbose-mode-architecture.adoc +23 -0
- data/docs/features/configuration-profiles.adoc +288 -0
- data/docs/features/diff-formatting/character-visualization.adoc +153 -454
- data/docs/features/diff-formatting/display-filtering.adoc +44 -0
- data/docs/features/diff-formatting/display-preprocessing.adoc +656 -0
- data/docs/features/diff-formatting/index.adoc +47 -0
- data/docs/features/diff-formatting/pretty-diff-mode.adoc +154 -0
- data/docs/features/environment-configuration/override-system.adoc +10 -3
- data/docs/features/index.adoc +9 -0
- data/docs/features/match-options/html-policies.adoc +3 -0
- data/docs/features/match-options/index.adoc +32 -42
- data/docs/features/match-options/pretty-printed-fixtures.adoc +270 -0
- data/docs/guides/choosing-configuration.adoc +22 -0
- data/docs/reference/environment-variables.adoc +121 -1
- data/docs/reference/options-across-interfaces.adoc +182 -2
- data/lib/canon/cli.rb +20 -0
- data/lib/canon/commands/diff_command.rb +7 -2
- data/lib/canon/commands/format_command.rb +1 -1
- data/lib/canon/comparison/html_comparator.rb +29 -19
- data/lib/canon/comparison/html_compare_profile.rb +4 -4
- data/lib/canon/comparison/markup_comparator.rb +12 -3
- data/lib/canon/comparison/match_options/base_resolver.rb +29 -7
- data/lib/canon/comparison/match_options/json_resolver.rb +9 -0
- data/lib/canon/comparison/match_options/xml_resolver.rb +16 -2
- data/lib/canon/comparison/match_options/yaml_resolver.rb +10 -0
- data/lib/canon/comparison/match_options.rb +4 -1
- data/lib/canon/comparison/whitespace_sensitivity.rb +189 -137
- data/lib/canon/comparison/xml_comparator/child_comparison.rb +21 -4
- data/lib/canon/comparison/xml_comparator.rb +14 -12
- data/lib/canon/comparison/xml_node_comparison.rb +51 -6
- data/lib/canon/comparison.rb +52 -9
- data/lib/canon/config/env_schema.rb +32 -4
- data/lib/canon/config/override_resolver.rb +16 -3
- data/lib/canon/config/profile_loader.rb +135 -0
- data/lib/canon/config/profiles/metanorma.yml +74 -0
- data/lib/canon/config/profiles/metanorma_debug.yml +8 -0
- data/lib/canon/config/type_converter.rb +8 -0
- data/lib/canon/config.rb +469 -5
- data/lib/canon/diff/diff_classifier.rb +41 -11
- data/lib/canon/diff_formatter/diff_detail_formatter/dimension_formatter.rb +48 -17
- data/lib/canon/diff_formatter/diff_detail_formatter/node_utils.rb +58 -0
- data/lib/canon/diff_formatter/diff_detail_formatter.rb +73 -17
- data/lib/canon/diff_formatter.rb +493 -36
- data/lib/canon/pretty_printer/xml_normalized.rb +395 -0
- data/lib/canon/rspec_matchers.rb +36 -0
- data/lib/canon/version.rb +1 -1
- data/lib/canon/xml/nodes/namespace_node.rb +4 -0
- data/lib/canon/xml/nodes/processing_instruction_node.rb +4 -0
- data/lib/canon/xml/nodes/root_node.rb +4 -0
- data/lib/canon/xml/nodes/text_node.rb +4 -0
- data/lib/tasks/performance_helpers.rb +2 -2
- metadata +24 -2
|
data/docs/features/diff-formatting/character-visualization.adoc
@@ -1,528 +1,227 @@
|
|
|
1
1
|
---
|
|
2
|
+
layout: default
|
|
2
3
|
title: Character Visualization
|
|
3
4
|
parent: Diff Formatting
|
|
4
5
|
grand_parent: Features
|
|
5
|
-
nav_order:
|
|
6
|
+
nav_order: 3
|
|
6
7
|
---
|
|
7
|
-
|
|
8
|
+
|
|
8
9
|
:toc:
|
|
9
10
|
:toclevels: 3
|
|
10
11
|
|
|
11
|
-
==
|
|
12
|
-
|
|
13
|
-
Canon's character visualization system makes invisible characters (spaces, tabs, zero-width characters) visible in diff output, helping you quickly identify whitespace differences that cause test failures.
|
|
14
|
-
|
|
15
|
-
Visualization is **CJK-safe**, using Unicode symbols that don't conflict with Chinese, Japanese, or Korean text.
|
|
16
|
-
|
|
17
|
-
== When visualization is applied
|
|
18
|
-
|
|
19
|
-
Character visualization is applied **only to diff lines** (additions, deletions, and changes), not to context lines (unchanged lines). This ensures:
|
|
20
|
-
|
|
21
|
-
* Context lines display content in original form
|
|
22
|
-
* Only actual changes show visualization
|
|
23
|
-
* Differences are easier to spot
|
|
24
|
-
|
|
25
|
-
Within changed lines showing token-level diffs, unchanged tokens are displayed in the terminal's default color (not red/green) to distinguish them from actual changes.
|
|
26
|
-
|
|
27
|
-
== Default character map
|
|
28
|
-
|
|
29
|
-
Canon provides a comprehensive CJK-safe character mapping.
|
|
30
|
-
|
|
31
|
-
=== Common whitespace
|
|
32
|
-
|
|
33
|
-
[cols="1,1,1,2"]
|
|
34
|
-
|===
|
|
35
|
-
|Character |Unicode |Symbol |Description
|
|
36
|
-
|
|
37
|
-
|Regular space
|
|
38
|
-
|U+0020
|
|
39
|
-
|`░`
|
|
40
|
-
|Light Shade (U+2591)
|
|
41
|
-
|
|
42
|
-
|Tab
|
|
43
|
-
|U+0009
|
|
44
|
-
|`⇥`
|
|
45
|
-
|Rightwards Arrow to Bar (U+21E5)
|
|
46
|
-
|
|
47
|
-
|Non-breaking space
|
|
48
|
-
|U+00A0
|
|
49
|
-
|`␣`
|
|
50
|
-
|Open Box (U+2423)
|
|
51
|
-
|===
|
|
52
|
-
|
|
53
|
-
=== Line endings
|
|
54
|
-
|
|
55
|
-
[cols="1,1,1,2"]
|
|
56
|
-
|===
|
|
57
|
-
|Character |Unicode |Symbol |Description
|
|
58
|
-
|
|
59
|
-
|Line feed (LF)
|
|
60
|
-
|U+000A
|
|
61
|
-
|`↵`
|
|
62
|
-
|Downwards Arrow with Corner Leftwards (U+21B5)
|
|
63
|
-
|
|
64
|
-
|Carriage return (CR)
|
|
65
|
-
|U+000D
|
|
66
|
-
|`⏎`
|
|
67
|
-
|Return Symbol (U+23CE)
|
|
68
|
-
|
|
69
|
-
|Windows line ending (CRLF)
|
|
70
|
-
|U+000D U+000A
|
|
71
|
-
|`↵`
|
|
72
|
-
|Downwards Arrow with Corner Leftwards (U+21B5)
|
|
73
|
-
|
|
74
|
-
|Next line (NEL)
|
|
75
|
-
|U+0085
|
|
76
|
-
|`⏎`
|
|
77
|
-
|Return Symbol (U+23CE)
|
|
78
|
-
|
|
79
|
-
|Line separator
|
|
80
|
-
|U+2028
|
|
81
|
-
|`⤓`
|
|
82
|
-
|Downwards Arrow to Bar (U+2913)
|
|
83
|
-
|
|
84
|
-
|Paragraph separator
|
|
85
|
-
|U+2029
|
|
86
|
-
|`⤓`
|
|
87
|
-
|Downwards Arrow to Bar (U+2913)
|
|
88
|
-
|===
|
|
89
|
-
|
|
90
|
-
=== Unicode spaces
|
|
91
|
-
|
|
92
|
-
[cols="1,1,1,2"]
|
|
93
|
-
|===
|
|
94
|
-
|Character |Unicode |Symbol |Description
|
|
95
|
-
|
|
96
|
-
|En space
|
|
97
|
-
|U+2002
|
|
98
|
-
|`▭`
|
|
99
|
-
|White Rectangle (U+25AD)
|
|
100
|
-
|
|
101
|
-
|Em space
|
|
102
|
-
|U+2003
|
|
103
|
-
|`▬`
|
|
104
|
-
|Black Rectangle (U+25AC)
|
|
12
|
+
== Overview
|
|
105
13
|
|
|
106
|
-
|
|
107
|
-
|
|
108
|
-
|
|
109
|
-
|
|
14
|
+
Canon replaces invisible characters (spaces, tabs, non-breaking spaces, etc.)
|
|
15
|
+
with visible Unicode symbols before rendering diff output. This makes it easy
|
|
16
|
+
to spot whitespace differences that would otherwise be indistinguishable from
|
|
17
|
+
surrounding content.
|
|
110
18
|
|
|
111
|
-
|
|
112
|
-
|U+2006
|
|
113
|
-
|`⏕`
|
|
114
|
-
|Metrical Two Shorts Over Long (U+23D5)
|
|
115
|
-
|
|
116
|
-
|Thin space
|
|
117
|
-
|U+2009
|
|
118
|
-
|`▯`
|
|
119
|
-
|White Vertical Rectangle (U+25AF)
|
|
120
|
-
|
|
121
|
-
|Hair space
|
|
122
|
-
|U+200A
|
|
123
|
-
|`▮`
|
|
124
|
-
|Black Vertical Rectangle (U+25AE)
|
|
125
|
-
|
|
126
|
-
|Figure space
|
|
127
|
-
|U+2007
|
|
128
|
-
|`□`
|
|
129
|
-
|White Square (U+25A1)
|
|
130
|
-
|
|
131
|
-
|Narrow no-break space
|
|
132
|
-
|U+202F
|
|
133
|
-
|`▫`
|
|
134
|
-
|White Small Square (U+25AB)
|
|
135
|
-
|
|
136
|
-
|Medium mathematical space
|
|
137
|
-
|U+205F
|
|
138
|
-
|`▭`
|
|
139
|
-
|White Rectangle (U+25AD)
|
|
140
|
-
|
|
141
|
-
|Ideographic space
|
|
142
|
-
|U+3000
|
|
143
|
-
|`⎵`
|
|
144
|
-
|Bottom Square Bracket (U+23B5)
|
|
145
|
-
|
|
146
|
-
|Ideographic half space
|
|
147
|
-
|U+303F
|
|
148
|
-
|`⏑`
|
|
149
|
-
|Metrical Breve (U+23D1)
|
|
150
|
-
|===
|
|
151
|
-
|
|
152
|
-
=== Zero-width characters
|
|
153
|
-
|
|
154
|
-
[cols="1,1,1,2"]
|
|
155
|
-
|===
|
|
156
|
-
|Character |Unicode |Symbol |Description
|
|
157
|
-
|
|
158
|
-
|Zero-width space
|
|
159
|
-
|U+200B
|
|
160
|
-
|`→`
|
|
161
|
-
|Rightwards Arrow (U+2192)
|
|
162
|
-
|
|
163
|
-
|Zero-width non-joiner
|
|
164
|
-
|U+200C
|
|
165
|
-
|`↛`
|
|
166
|
-
|Rightwards Arrow with Stroke (U+219B)
|
|
167
|
-
|
|
168
|
-
|Zero-width joiner
|
|
169
|
-
|U+200D
|
|
170
|
-
|`⇢`
|
|
171
|
-
|Rightwards Dashed Arrow (U+21E2)
|
|
172
|
-
|
|
173
|
-
|Zero-width no-break space (BOM)
|
|
174
|
-
|U+FEFF
|
|
175
|
-
|`⇨`
|
|
176
|
-
|Rightwards White Arrow (U+21E8)
|
|
177
|
-
|===
|
|
178
|
-
|
|
179
|
-
=== Bidirectional/RTL markers
|
|
180
|
-
|
|
181
|
-
[cols="1,1,1,2"]
|
|
182
|
-
|===
|
|
183
|
-
|Character |Unicode |Symbol |Description
|
|
184
|
-
|
|
185
|
-
|Left-to-right mark
|
|
186
|
-
|U+200E
|
|
187
|
-
|`⟹`
|
|
188
|
-
|Long Rightwards Double Arrow (U+27F9)
|
|
189
|
-
|
|
190
|
-
|Right-to-left mark
|
|
191
|
-
|U+200F
|
|
192
|
-
|`⟸`
|
|
193
|
-
|Long Leftwards Double Arrow (U+27F8)
|
|
194
|
-
|
|
195
|
-
|LTR embedding
|
|
196
|
-
|U+202A
|
|
197
|
-
|`⇒`
|
|
198
|
-
|Rightwards Double Arrow (U+21D2)
|
|
199
|
-
|
|
200
|
-
|RTL embedding
|
|
201
|
-
|U+202B
|
|
202
|
-
|`⇐`
|
|
203
|
-
|Leftwards Double Arrow (U+21D0)
|
|
204
|
-
|
|
205
|
-
|Pop directional formatting
|
|
206
|
-
|U+202C
|
|
207
|
-
|`↔`
|
|
208
|
-
|Left Right Arrow (U+2194)
|
|
209
|
-
|
|
210
|
-
|LTR override
|
|
211
|
-
|U+202D
|
|
212
|
-
|`⇉`
|
|
213
|
-
|Rightwards Paired Arrows (U+21C9)
|
|
214
|
-
|
|
215
|
-
|RTL override
|
|
216
|
-
|U+202E
|
|
217
|
-
|`⇇`
|
|
218
|
-
|Leftwards Paired Arrows (U+21C7)
|
|
219
|
-
|===
|
|
220
|
-
|
|
221
|
-
=== Control characters
|
|
222
|
-
|
|
223
|
-
[cols="1,1,1,2"]
|
|
224
|
-
|===
|
|
225
|
-
|Character |Unicode |Symbol |Description
|
|
226
|
-
|
|
227
|
-
|Null
|
|
228
|
-
|U+0000
|
|
229
|
-
|`␀`
|
|
230
|
-
|Symbol for Null (U+2400)
|
|
231
|
-
|
|
232
|
-
|Soft hyphen
|
|
233
|
-
|U+00AD
|
|
234
|
-
|`‐`
|
|
235
|
-
|Hyphen (U+2010)
|
|
236
|
-
|
|
237
|
-
|Backspace
|
|
238
|
-
|U+0008
|
|
239
|
-
|`␈`
|
|
240
|
-
|Symbol for Backspace (U+2408)
|
|
241
|
-
|
|
242
|
-
|Delete
|
|
243
|
-
|U+007F
|
|
244
|
-
|`␡`
|
|
245
|
-
|Symbol for Delete (U+2421)
|
|
246
|
-
|===
|
|
247
|
-
|
|
248
|
-
== CJK safety
|
|
249
|
-
|
|
250
|
-
The visualization characters are specifically chosen to avoid conflicts with CJK text:
|
|
251
|
-
|
|
252
|
-
**Avoided characters**:
|
|
253
|
-
|
|
254
|
-
* **No middle dots** (`·`) - commonly used as separators in CJK
|
|
255
|
-
* **No bullets** (`∙`) - used in CJK lists
|
|
256
|
-
* **No circles** (`◌◍◎`) - look similar to CJK characters like ○ ●
|
|
257
|
-
* **No small dots** (`⋅`) - conflict with CJK punctuation
|
|
258
|
-
|
|
259
|
-
**Used instead**:
|
|
260
|
-
|
|
261
|
-
* Box characters (`□▭▬▯▮▫`) for various space types
|
|
262
|
-
* Arrow symbols (`→↛⇢⇨⟹⟸⇒⇐`) for zero-width and directional characters
|
|
263
|
-
* Control Pictures block symbols (`␀␈␡`) for control characters
|
|
264
|
-
|
|
265
|
-
== Examples in use
|
|
266
|
-
|
|
267
|
-
=== Space added
|
|
268
|
-
|
|
269
|
-
.Regular space added
|
|
270
|
-
[example]
|
|
271
|
-
====
|
|
272
|
-
[source]
|
|
273
|
-
----
|
|
274
|
-
10| -| <tag>Value</tag> # No space
|
|
275
|
-
| 10+| <tag>░Value</tag> # Space added (green light shade)
|
|
276
|
-
----
|
|
277
|
-
|
|
278
|
-
The `░` symbol clearly shows a regular space was added between `<tag>` and `Value`.
|
|
279
|
-
====
|
|
280
|
-
|
|
281
|
-
=== Tab vs spaces
|
|
282
|
-
|
|
283
|
-
.Tab replaced with spaces
|
|
19
|
+
.Example output with character visualization enabled (default)
|
|
284
20
|
[example]
|
|
285
21
|
====
|
|
286
|
-
[source]
|
|
287
22
|
----
|
|
288
|
-
|
|
289
|
-
|
|
|
23
|
+
| 3- | ░░<item>Alpha</item>
|
|
24
|
+
| 3+ | ░░<item>Beta</item>
|
|
290
25
|
----
|
|
291
|
-
|
|
292
|
-
The difference between a tab (`⇥`) and two spaces (`░░`) is immediately visible.
|
|
293
|
-
====
|
|
294
|
-
|
|
295
|
-
=== Non-breaking space
|
|
296
|
-
|
|
297
|
-
.Non-breaking space from web copy-paste
|
|
298
|
-
[example]
|
|
26
|
+
The `░` characters represent U+0020 spaces used for indentation.
|
|
299
27
|
====
|
|
300
|
-
Without visualization, these look identical:
|
|
301
|
-
|
|
302
|
-
[source,xml]
|
|
303
|
-
----
|
|
304
|
-
<foreword id="fwd">
|
|
305
|
-
<foreword id="fwd">
|
|
306
|
-
----
|
|
307
28
|
|
|
308
|
-
|
|
29
|
+
== Configuration
|
|
309
30
|
|
|
310
|
-
|
|
311
|
-
|
|
312
|
-
4| -| <foreword░id="fwd"> # Regular space (U+0020)
|
|
313
|
-
| 4+| <foreword␣id="fwd"> # Non-breaking space (U+00A0)
|
|
314
|
-
----
|
|
31
|
+
`character_visualization` is a `DiffConfig` option accessible via
|
|
32
|
+
`Canon::Config`, environment variables, or the `DiffFormatter` constructor.
|
|
315
33
|
|
|
316
|
-
|
|
317
|
-
====
|
|
34
|
+
=== Values
|
|
318
35
|
|
|
319
|
-
|
|
36
|
+
[cols="1,4"]
|
|
37
|
+
|===
|
|
38
|
+
|Value |Behaviour
|
|
39
|
+
|
|
40
|
+
|`true`
|
|
41
|
+
|(default) The full default visualization map is applied. Spaces appear as
|
|
42
|
+
`░`, tabs as `⇥`, non-breaking spaces as `␣`, and so on.
|
|
43
|
+
|
|
44
|
+
|`false`
|
|
45
|
+
|All visualization is disabled. Characters appear exactly as they are stored
|
|
46
|
+
in the document. Useful when copying failure-message output into a text
|
|
47
|
+
editor, or when downstream tooling cannot handle Unicode symbols.
|
|
48
|
+
|
|
49
|
+
|`:content_only`
|
|
50
|
+
|**Reserved for future use** — currently behaves identically to `true`.
|
|
51
|
+
Future intent: apply visualization only to text-node content, leaving
|
|
52
|
+
structural indentation whitespace plain so that pretty-printed diffs remain
|
|
53
|
+
visually uncluttered. See the <<future-work>> section below.
|
|
54
|
+
|===
|
|
320
55
|
|
|
321
|
-
|
|
322
|
-
[example]
|
|
323
|
-
====
|
|
324
|
-
Zero-width characters are invisible but affect comparison:
|
|
56
|
+
=== Via `Canon::Config` (RSpec `spec_helper.rb`)
|
|
325
57
|
|
|
326
|
-
[source,
|
|
327
|
-
----
|
|
328
|
-
<item>Widget</item>
|
|
329
|
-
<item>Widget</item> <!-- Contains U+200B zero-width space after "Widget" -->
|
|
58
|
+
[source,ruby]
|
|
330
59
|
----
|
|
60
|
+
Canon::Config.configure do |cfg|
|
|
61
|
+
# Disable for all XML diffs in this test suite
|
|
62
|
+
cfg.xml.diff.character_visualization = false
|
|
331
63
|
|
|
332
|
-
|
|
333
|
-
|
|
334
|
-
|
|
335
|
-
----
|
|
336
|
-
5| -| <item>Widget</item>
|
|
337
|
-
| 5+| <item>Widget→</item> # Zero-width space visualized as →
|
|
64
|
+
# Or disable for HTML as well
|
|
65
|
+
cfg.html.diff.character_visualization = false
|
|
66
|
+
end
|
|
338
67
|
----
|
|
339
68
|
|
|
340
|
-
The
|
|
341
|
-
|
|
342
|
-
|
|
343
|
-
== Real-world scenarios
|
|
69
|
+
The setting is format-specific: `cfg.xml`, `cfg.html`, `cfg.json`,
|
|
70
|
+
`cfg.yaml`, and `cfg.string` each have their own `diff.character_visualization`.
|
|
344
71
|
|
|
345
|
-
===
|
|
72
|
+
=== Via Environment Variable
|
|
346
73
|
|
|
347
|
-
|
|
348
|
-
|
|
349
|
-
.Detection example
|
|
350
|
-
[example]
|
|
351
|
-
====
|
|
352
|
-
[source]
|
|
353
|
-
----
|
|
354
|
-
4| -| <p>Hello░world</p> # U+0020 (regular space)
|
|
355
|
-
| 4+| <p>Hello␣world</p> # U+00A0 (non-breaking space)
|
|
74
|
+
[source,bash]
|
|
356
75
|
----
|
|
76
|
+
# Disable globally
|
|
77
|
+
export CANON_CHARACTER_VISUALIZATION=false
|
|
357
78
|
|
|
358
|
-
|
|
359
|
-
|
|
360
|
-
|
|
361
|
-
=== Smart quotes
|
|
362
|
-
|
|
363
|
-
**Problem**: Text editors may automatically convert straight quotes to curly quotes.
|
|
79
|
+
# Disable for XML only
|
|
80
|
+
export CANON_XML_DIFF_CHARACTER_VISUALIZATION=false
|
|
364
81
|
|
|
365
|
-
|
|
366
|
-
|
|
367
|
-
====
|
|
368
|
-
[source]
|
|
369
|
-
----
|
|
370
|
-
10| -| <title>John's Book</title> # Straight apostrophe
|
|
371
|
-
   | 10+| <title>John’s Book</title> # Curly apostrophe (U+2019)
|
|
82
|
+
# Set to :content_only for XML only
|
|
83
|
+
export CANON_XML_DIFF_CHARACTER_VISUALIZATION=content_only
|
|
372
84
|
----
|
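The environment variables above carry plain strings, so they have to be turned into `true`, `false`, or `:content_only` before use. The following is a minimal Ruby sketch of how that resolution could look. The helper names are hypothetical, and the rule that the format-specific variable wins over the global one is an assumption, not something this diff confirms; only the variable names, the accepted values, and the default of `true` come from the documentation.

```ruby
# Hypothetical helper, not Canon's actual type converter.
# Maps the documented string values onto Ruby values.
def parse_character_visualization(raw)
  case raw.to_s.strip.downcase
  when "true" then true
  when "false" then false
  when "content_only" then :content_only
  else
    raise ArgumentError, "unsupported character_visualization value: #{raw.inspect}"
  end
end

# Assumption: a format-specific variable takes precedence over the global one.
# Falls back to the documented default (true) when neither is set.
def resolve_character_visualization(env, format)
  specific = "CANON_#{format.to_s.upcase}_DIFF_CHARACTER_VISUALIZATION"
  raw = env[specific] || env["CANON_CHARACTER_VISUALIZATION"]
  raw.nil? ? true : parse_character_visualization(raw)
end
```

Passing the environment in as a hash (for example `resolve_character_visualization(ENV, :xml)`) keeps the sketch easy to exercise without touching the real process environment.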
|
373
85
|
|
|
374
|
-
|
|
375
|
-
====
|
|
376
|
-
|
|
377
|
-
=== Template generation
|
|
378
|
-
|
|
379
|
-
**Problem**: Generated output has invisible character differences.
|
|
86
|
+
=== Via `DiffFormatter` directly
|
|
380
87
|
|
|
381
|
-
|
|
382
|
-
[example]
|
|
383
|
-
====
|
|
384
|
-
[source]
|
|
88
|
+
[source,ruby]
|
|
385
89
|
----
|
|
386
|
-
|
|
387
|
-
|
|
90
|
+
formatter = Canon::DiffFormatter.new(
|
|
91
|
+
use_color: false,
|
|
92
|
+
mode: :by_line,
|
|
93
|
+
character_visualization: false,
|
|
94
|
+
)
|
|
388
95
|
----
|
|
389
96
|
|
|
390
|
-
|
|
391
|
-
====
|
|
392
|
-
|
|
393
|
-
== Customizing character visualization
|
|
97
|
+
== Default Visualization Map
|
|
394
98
|
|
|
395
|
-
|
|
99
|
+
The following characters are visualized by default:
|
|
396
100
|
|
|
397
|
-
|
|
101
|
+
[cols="2,1,1,3"]
|
|
102
|
+
|===
|
|
103
|
+
|Name |Unicode |Visualization |Notes
|
|
398
104
|
|
|
399
|
-
|
|
400
|
-
|
|
401
|
-
|
|
105
|
+
|Space
|
|
106
|
+
|U+0020
|
|
107
|
+
|`░`
|
|
108
|
+
|Most common; appears as indentation and between tokens
|
|
402
109
|
|
|
403
|
-
|
|
404
|
-
|
|
405
|
-
|
|
406
|
-
|
|
407
|
-
"\u200B" => '⚠' # Warning symbol for zero-width space
|
|
408
|
-
})
|
|
110
|
+
|Tab
|
|
111
|
+
|U+0009
|
|
112
|
+
|`⇥`
|
|
113
|
+
|Structural indentation in tab-indented files
|
|
409
114
|
|
|
410
|
-
|
|
411
|
-
|
|
412
|
-
|
|
413
|
-
|
|
414
|
-
)
|
|
415
|
-
----
|
|
115
|
+
|No-Break Space
|
|
116
|
+
|U+00A0
|
|
117
|
+
|`␣`
|
|
118
|
+
|Common in HTML (`&nbsp;`) and word-processor output
|
|
416
119
|
|
|
417
|
-
|
|
120
|
+
|Thin Space
|
|
121
|
+
|U+2009
|
|
122
|
+
|`·`
|
|
123
|
+
|Typographic use
|
|
418
124
|
|
|
419
|
-
|
|
125
|
+
|En Space
|
|
126
|
+
|U+2002
|
|
127
|
+
|`◦`
|
|
128
|
+
|
|
|
420
129
|
|
|
421
|
-
|
|
422
|
-
|
|
423
|
-
|
|
424
|
-
|
|
130
|
+
|Em Space
|
|
131
|
+
|U+2003
|
|
132
|
+
|`▬`
|
|
133
|
+
|
|
|
425
134
|
|
|
426
|
-
|
|
135
|
+
|Line Feed
|
|
136
|
+
|U+000A
|
|
137
|
+
|`↵`
|
|
138
|
+
|End-of-line (shown in certain modes)
|
|
427
139
|
|
|
428
|
-
|
|
429
|
-
|
|
430
|
-
|
|
140
|
+
|Carriage Return
|
|
141
|
+
|U+000D
|
|
142
|
+
|`←`
|
|
143
|
+
|Windows line endings
|
|
431
144
|
|
|
432
|
-
|
|
145
|
+
|Zero-Width Space
|
|
146
|
+
|U+200B
|
|
147
|
+
|`⁰`
|
|
148
|
+
|Invisible in output; often a bug
|
|
149
|
+
|===
|
|
433
150
|
|
|
434
|
-
|
|
151
|
+
The complete map is loaded from
|
|
152
|
+
`lib/canon/diff_formatter/character_map.yml` at startup.
|
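The default map table above can be read as a character-to-symbol substitution. The following is a minimal Ruby sketch under that assumption. The names `VISUALIZATION_MAP` and `visualize` are hypothetical, only the nine entries listed in the table are included, and Canon's actual implementation and its full `character_map.yml` may differ.

```ruby
# Hypothetical subset of the visualization map, copied from the table above.
VISUALIZATION_MAP = {
  " "      => "░", # U+0020 space
  "\t"     => "⇥", # U+0009 tab
  "\u00A0" => "␣", # U+00A0 no-break space
  "\u2009" => "·", # U+2009 thin space
  "\u2002" => "◦", # U+2002 en space
  "\u2003" => "▬", # U+2003 em space
  "\n"     => "↵", # U+000A line feed
  "\r"     => "←", # U+000D carriage return
  "\u200B" => "⁰", # U+200B zero-width space
}.freeze

# One pattern that matches any mapped character.
VISUALIZATION_PATTERN = Regexp.union(VISUALIZATION_MAP.keys)

# Replaces each mapped character with its symbol.
# With enabled: false the line is returned unchanged.
def visualize(line, enabled: true)
  return line unless enabled

  line.gsub(VISUALIZATION_PATTERN, VISUALIZATION_MAP)
end
```

For instance, `visualize("  <item>Alpha</item>")` yields `░░<item>Alpha</item>`, matching the example output shown earlier in the document.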
|
435
153
|
|
|
436
|
-
|
|
154
|
+
== Interaction with `display_preprocessing`
|
|
437
155
|
|
|
438
|
-
|
|
156
|
+
Character visualization is applied **after** display preprocessing.
|
|
439
157
|
|
|
440
|
-
|
|
441
|
-
|
|
442
|
-
|
|
443
|
-
diff: { use_color: true }
|
|
158
|
+
When `display_preprocessing: :pretty_print` is active, Canon runs both
|
|
159
|
+
documents through `Canon::PrettyPrinter::Xml` before the line diff. The
|
|
160
|
+
pretty-printer produces only:
|
|
444
161
|
|
|
445
|
-
|
|
446
|
-
|
|
447
|
-
----
|
|
162
|
+
* ASCII U+0020 spaces for indentation
|
|
163
|
+
* ASCII U+000A newlines between elements
|
|
448
164
|
|
|
449
|
-
|
|
165
|
+
Because the default visualization map visualizes U+0020 spaces as `░`,
|
|
166
|
+
structural indentation introduced by the pretty-printer *will* appear as `░`
|
|
167
|
+
characters in context lines. This is intentional: the diff output stays
|
|
168
|
+
consistent regardless of how indentation was introduced.
|
|
450
169
|
|
|
451
|
-
.
|
|
170
|
+
.Example (display_preprocessing: :pretty_print, character_visualization: true)
|
|
452
171
|
[example]
|
|
453
172
|
====
|
|
454
|
-
[source,ruby]
|
|
455
173
|
----
|
|
456
|
-
|
|
457
|
-
|
|
458
|
-
|
|
459
|
-
|
|
460
|
-
)
|
|
461
|
-
|
|
462
|
-
# Disable for plain text
|
|
463
|
-
Canon::Comparison.equivalent?(doc1, doc2,
|
|
464
|
-
verbose: true,
|
|
465
|
-
diff: { use_color: false } # No visualization
|
|
466
|
-
)
|
|
174
|
+
| 1 | <root>
|
|
175
|
+
| 2- | ░░<item>Alpha</item>
|
|
176
|
+
| 2+ | ░░<item>Beta</item>
|
|
177
|
+
| 3 | </root>
|
|
467
178
|
----
|
|
179
|
+
The `░░` represents the 2-space indentation added by the pretty-printer.
|
|
468
180
|
====
|
|
469
181
|
|
|
470
|
-
.
|
|
471
|
-
|
|
472
|
-
|
|
473
|
-
|
|
474
|
-
----
|
|
475
|
-
# Enable (default)
|
|
476
|
-
canon diff file1.xml file2.xml --verbose
|
|
182
|
+
If you want structure-free visualization (e.g. indentation stays as plain
|
|
183
|
+
spaces but content whitespace is still visualized), set
|
|
184
|
+
`character_visualization: :content_only` once that feature is implemented.
|
|
185
|
+
See <<future-work>>.
|
|
477
186
|
|
|
478
|
-
|
|
479
|
-
|
|
480
|
-
|
|
481
|
-
|
|
187
|
+
== Combining with `context_lines`
|
|
188
|
+
|
|
189
|
+
Context lines (unchanged lines shown around a diff hunk) are also subject to
|
|
190
|
+
character visualization. Reducing `context_lines` limits the number of
|
|
191
|
+
visualized lines displayed:
|
|
482
192
|
|
|
483
|
-
.RSpec
|
|
484
|
-
[example]
|
|
485
|
-
====
|
|
486
193
|
[source,ruby]
|
|
487
194
|
----
|
|
488
|
-
Canon::
|
|
489
|
-
#
|
|
490
|
-
|
|
195
|
+
Canon::Config.configure do |cfg|
|
|
196
|
+
cfg.xml.diff.context_lines = 1 # show only 1 context line
|
|
197
|
+
cfg.xml.diff.character_visualization = true
|
|
491
198
|
end
|
|
492
199
|
----
|
|
493
|
-
====
|
|
494
|
-
|
|
495
|
-
== Troubleshooting
|
|
496
|
-
|
|
497
|
-
=== Visualization not showing
|
|
498
|
-
|
|
499
|
-
**Problem**: Invisible characters not visualized.
|
|
500
|
-
|
|
501
|
-
**Solutions**:
|
|
502
|
-
|
|
503
|
-
* Ensure `use_color: true`
|
|
504
|
-
* Check terminal supports Unicode
|
|
505
|
-
* Verify the characters are in diff lines (not context lines)
|
|
506
|
-
|
|
507
|
-
=== Wrong symbols displayed
|
|
508
200
|
|
|
509
|
-
|
|
201
|
+
== [[future-work]]Future Work: `:content_only`
|
|
510
202
|
|
|
511
|
-
|
|
203
|
+
The `:content_only` value is reserved for a planned DOM-level pre-serialization
|
|
204
|
+
pass that would:
|
|
512
205
|
|
|
513
|
-
|
|
514
|
-
*
|
|
515
|
-
|
|
206
|
+
1. Walk the parsed DOM tree before serialization.
|
|
207
|
+
2. Apply character visualization *only* to text node content.
|
|
208
|
+
3. Leave structural whitespace (indentation, element-separator newlines)
|
|
209
|
+
plain.
|
|
516
210
|
|
|
517
|
-
|
|
211
|
+
This would allow pretty-printed output to have clean indentation while
|
|
212
|
+
still making content whitespace visible.
|
|
518
213
|
|
|
519
|
-
|
|
214
|
+
The constraint that blocks this today: visualization is currently applied
|
|
215
|
+
as a post-serialization string substitution by the by-line formatters.
|
|
216
|
+
Implementing `:content_only` requires moving the substitution step into a
|
|
217
|
+
DOM walk before `Nokogiri::XML::Node#to_xml` is called.
|
|
520
218
|
|
|
521
|
-
|
|
219
|
+
When implemented, `:content_only` will become the recommended value for
|
|
220
|
+
suites that use `display_preprocessing: :pretty_print`.
|
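The three steps listed for `:content_only` can be illustrated on a toy tree. This is only a sketch of the idea: the `Element` and `Text` structs, the reduced map, and both method names are hypothetical stand-ins for a real parsed DOM, and nothing here reflects how Canon will implement the feature.

```ruby
# Toy document model (hypothetical; a real implementation would walk a parsed DOM).
Element = Struct.new(:name, :children)
Text = Struct.new(:content)

CONTENT_MAP = { " " => "░", "\t" => "⇥", "\u00A0" => "␣" }.freeze
CONTENT_PATTERN = Regexp.union(CONTENT_MAP.keys)

# Steps 1 and 2: walk the tree and visualize text-node content only.
def visualize_text_nodes(node)
  case node
  when Text
    Text.new(node.content.gsub(CONTENT_PATTERN, CONTENT_MAP))
  when Element
    Element.new(node.name, node.children.map { |child| visualize_text_nodes(child) })
  end
end

# Step 3: serialize afterwards, so structural indentation stays as plain spaces.
def serialize(node, depth = 0)
  indent = "  " * depth
  case node
  when Text
    "#{indent}#{node.content}\n"
  when Element
    inner = node.children.map { |child| serialize(child, depth + 1) }.join
    "#{indent}<#{node.name}>\n#{inner}#{indent}</#{node.name}>\n"
  end
end

doc = Element.new("root", [Element.new("item", [Text.new("Alpha Beta")])])
puts serialize(visualize_text_nodes(doc))
```

Because substitution happens before serialization, the space inside `Alpha Beta` is printed as `░` while the indentation added by `serialize` remains ordinary spaces.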
|
522
221
|
|
|
523
|
-
== See
|
|
222
|
+
== See Also
|
|
524
223
|
|
|
525
|
-
* link:
|
|
526
|
-
* link:
|
|
527
|
-
* link:../../interfaces
|
|
528
|
-
* link:../../
|
|
224
|
+
* link:display-preprocessing.html[Display Preprocessing]
|
|
225
|
+
* link:index.html[Diff Formatting overview]
|
|
226
|
+
* link:../../reference/options-across-interfaces.html[Options Across Interfaces]
|
|
227
|
+
* link:../../reference/environment-variables.html[Environment Variables]
|