canon 0.1.23 → 0.2.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (56)
  1. checksums.yaml +4 -4
  2. data/.rubocop_todo.yml +155 -30
  3. data/docs/INDEX.adoc +4 -0
  4. data/docs/advanced/diff-classification.adoc +3 -2
  5. data/docs/advanced/verbose-mode-architecture.adoc +23 -0
  6. data/docs/features/configuration-profiles.adoc +288 -0
  7. data/docs/features/diff-formatting/character-visualization.adoc +153 -454
  8. data/docs/features/diff-formatting/display-filtering.adoc +44 -0
  9. data/docs/features/diff-formatting/display-preprocessing.adoc +656 -0
  10. data/docs/features/diff-formatting/index.adoc +47 -0
  11. data/docs/features/diff-formatting/pretty-diff-mode.adoc +154 -0
  12. data/docs/features/environment-configuration/override-system.adoc +10 -3
  13. data/docs/features/index.adoc +9 -0
  14. data/docs/features/match-options/html-policies.adoc +3 -0
  15. data/docs/features/match-options/index.adoc +32 -42
  16. data/docs/features/match-options/pretty-printed-fixtures.adoc +270 -0
  17. data/docs/guides/choosing-configuration.adoc +22 -0
  18. data/docs/reference/environment-variables.adoc +121 -1
  19. data/docs/reference/options-across-interfaces.adoc +182 -2
  20. data/lib/canon/cli.rb +20 -0
  21. data/lib/canon/commands/diff_command.rb +7 -2
  22. data/lib/canon/commands/format_command.rb +1 -1
  23. data/lib/canon/comparison/html_comparator.rb +29 -19
  24. data/lib/canon/comparison/html_compare_profile.rb +4 -4
  25. data/lib/canon/comparison/markup_comparator.rb +12 -3
  26. data/lib/canon/comparison/match_options/base_resolver.rb +29 -7
  27. data/lib/canon/comparison/match_options/json_resolver.rb +9 -0
  28. data/lib/canon/comparison/match_options/xml_resolver.rb +16 -2
  29. data/lib/canon/comparison/match_options/yaml_resolver.rb +10 -0
  30. data/lib/canon/comparison/match_options.rb +4 -1
  31. data/lib/canon/comparison/whitespace_sensitivity.rb +189 -137
  32. data/lib/canon/comparison/xml_comparator/child_comparison.rb +21 -4
  33. data/lib/canon/comparison/xml_comparator.rb +14 -12
  34. data/lib/canon/comparison/xml_node_comparison.rb +51 -6
  35. data/lib/canon/comparison.rb +52 -9
  36. data/lib/canon/config/env_schema.rb +32 -4
  37. data/lib/canon/config/override_resolver.rb +16 -3
  38. data/lib/canon/config/profile_loader.rb +135 -0
  39. data/lib/canon/config/profiles/metanorma.yml +74 -0
  40. data/lib/canon/config/profiles/metanorma_debug.yml +8 -0
  41. data/lib/canon/config/type_converter.rb +8 -0
  42. data/lib/canon/config.rb +469 -5
  43. data/lib/canon/diff/diff_classifier.rb +41 -11
  44. data/lib/canon/diff_formatter/diff_detail_formatter/dimension_formatter.rb +48 -17
  45. data/lib/canon/diff_formatter/diff_detail_formatter/node_utils.rb +58 -0
  46. data/lib/canon/diff_formatter/diff_detail_formatter.rb +73 -17
  47. data/lib/canon/diff_formatter.rb +493 -36
  48. data/lib/canon/pretty_printer/xml_normalized.rb +395 -0
  49. data/lib/canon/rspec_matchers.rb +36 -0
  50. data/lib/canon/version.rb +1 -1
  51. data/lib/canon/xml/nodes/namespace_node.rb +4 -0
  52. data/lib/canon/xml/nodes/processing_instruction_node.rb +4 -0
  53. data/lib/canon/xml/nodes/root_node.rb +4 -0
  54. data/lib/canon/xml/nodes/text_node.rb +4 -0
  55. data/lib/tasks/performance_helpers.rb +2 -2
  56. metadata +24 -2
data/docs/features/diff-formatting/character-visualization.adoc
@@ -1,528 +1,227 @@
  ---
+ layout: default
  title: Character Visualization
  parent: Diff Formatting
  grand_parent: Features
- nav_order: 4
+ nav_order: 3
  ---
- = Character visualization
+
  :toc:
  :toclevels: 3
 
- == Purpose
-
- Canon's character visualization system makes invisible characters (spaces, tabs, zero-width characters) visible in diff output, helping you quickly identify whitespace differences that cause test failures.
-
- Visualization is **CJK-safe**, using Unicode symbols that don't conflict with Chinese, Japanese, or Korean text.
-
- == When visualization is applied
-
- Character visualization is applied **only to diff lines** (additions, deletions, and changes), not to context lines (unchanged lines). This ensures:
-
- * Context lines display content in original form
- * Only actual changes show visualization
- * Differences are easier to spot
-
- Within changed lines showing token-level diffs, unchanged tokens are displayed in the terminal's default color (not red/green) to distinguish them from actual changes.
-
- == Default character map
-
- Canon provides a comprehensive CJK-safe character mapping.
-
- === Common whitespace
-
- [cols="1,1,1,2"]
- |===
- |Character |Unicode |Symbol |Description
-
- |Regular space
- |U+0020
- |`░`
- |Light Shade (U+2591)
-
- |Tab
- |U+0009
- |`⇥`
- |Rightwards Arrow to Bar (U+21E5)
-
- |Non-breaking space
- |U+00A0
- |`␣`
- |Open Box (U+2423)
- |===
-
- === Line endings
-
- [cols="1,1,1,2"]
- |===
- |Character |Unicode |Symbol |Description
-
- |Line feed (LF)
- |U+000A
- |`↵`
- |Downwards Arrow with Corner Leftwards (U+21B5)
-
- |Carriage return (CR)
- |U+000D
- |`⏎`
- |Return Symbol (U+23CE)
-
- |Windows line ending (CRLF)
- |U+000D U+000A
- |`↵`
- |Downwards Arrow with Corner Leftwards (U+21B5)
-
- |Next line (NEL)
- |U+0085
- |`⏎`
- |Return Symbol (U+23CE)
-
- |Line separator
- |U+2028
- |`⤓`
- |Downwards Arrow to Bar (U+2913)
-
- |Paragraph separator
- |U+2029
- |`⤓`
- |Downwards Arrow to Bar (U+2913)
- |===
-
- === Unicode spaces
-
- [cols="1,1,1,2"]
- |===
- |Character |Unicode |Symbol |Description
-
- |En space
- |U+2002
- |`▭`
- |White Rectangle (U+25AD)
-
- |Em space
- |U+2003
- |`▬`
- |Black Rectangle (U+25AC)
+ == Overview
 
- |Four-per-em space
- |U+2005
- |`⏓`
- |Metrical Short Over Long (U+23D3)
+ Canon replaces invisible characters (spaces, tabs, non-breaking spaces, etc.)
+ with visible Unicode symbols before rendering diff output. This makes it easy
+ to spot whitespace differences that would otherwise be indistinguishable from
+ surrounding content.
 
- |Six-per-em space
- |U+2006
- |`⏕`
- |Metrical Two Shorts Over Long (U+23D5)
-
- |Thin space
- |U+2009
- |`▯`
- |White Vertical Rectangle (U+25AF)
-
- |Hair space
- |U+200A
- |`▮`
- |Black Vertical Rectangle (U+25AE)
-
- |Figure space
- |U+2007
- |`□`
- |White Square (U+25A1)
-
- |Narrow no-break space
- |U+202F
- |`▫`
- |White Small Square (U+25AB)
-
- |Medium mathematical space
- |U+205F
- |`▭`
- |White Rectangle (U+25AD)
-
- |Ideographic space
- |U+3000
- |`⎵`
- |Bottom Square Bracket (U+23B5)
-
- |Ideographic half space
- |U+303F
- |`⏑`
- |Metrical Breve (U+23D1)
- |===
-
- === Zero-width characters
-
- [cols="1,1,1,2"]
- |===
- |Character |Unicode |Symbol |Description
-
- |Zero-width space
- |U+200B
- |`→`
- |Rightwards Arrow (U+2192)
-
- |Zero-width non-joiner
- |U+200C
- |`↛`
- |Rightwards Arrow with Stroke (U+219B)
-
- |Zero-width joiner
- |U+200D
- |`⇢`
- |Rightwards Dashed Arrow (U+21E2)
-
- |Zero-width no-break space (BOM)
- |U+FEFF
- |`⇨`
- |Rightwards White Arrow (U+21E8)
- |===
-
- === Bidirectional/RTL markers
-
- [cols="1,1,1,2"]
- |===
- |Character |Unicode |Symbol |Description
-
- |Left-to-right mark
- |U+200E
- |`⟹`
- |Long Rightwards Double Arrow (U+27F9)
-
- |Right-to-left mark
- |U+200F
- |`⟸`
- |Long Leftwards Double Arrow (U+27F8)
-
- |LTR embedding
- |U+202A
- |`⇒`
- |Rightwards Double Arrow (U+21D2)
-
- |RTL embedding
- |U+202B
- |`⇐`
- |Leftwards Double Arrow (U+21D0)
-
- |Pop directional formatting
- |U+202C
- |`↔`
- |Left Right Arrow (U+2194)
-
- |LTR override
- |U+202D
- |`⇉`
- |Rightwards Paired Arrows (U+21C9)
-
- |RTL override
- |U+202E
- |`⇇`
- |Leftwards Paired Arrows (U+21C7)
- |===
-
- === Control characters
-
- [cols="1,1,1,2"]
- |===
- |Character |Unicode |Symbol |Description
-
- |Null
- |U+0000
- |`␀`
- |Symbol for Null (U+2400)
-
- |Soft hyphen
- |U+00AD
- |`­‐`
- |Hyphen (U+2010)
-
- |Backspace
- |U+0008
- |`␈`
- |Symbol for Backspace (U+2408)
-
- |Delete
- |U+007F
- |`␡`
- |Symbol for Delete (U+2421)
- |===
-
- == CJK safety
-
- The visualization characters are specifically chosen to avoid conflicts with CJK text:
-
- **Avoided characters**:
-
- * **No middle dots** (`·`) - commonly used as separators in CJK
- * **No bullets** (`∙`) - used in CJK lists
- * **No circles** (`◌◍◎`) - look similar to CJK characters like ○ ●
- * **No small dots** (`⋅`) - conflict with CJK punctuation
-
- **Used instead**:
-
- * Box characters (`□▭▬▯▮▫`) for various space types
- * Arrow symbols (`→↛⇢⇨⟹⟸⇒⇐`) for zero-width and directional characters
- * Control Pictures block symbols (`␀␈␡`) for control characters
-
- == Examples in use
-
- === Space added
-
- .Regular space added
- [example]
- ====
- [source]
- ----
- 10| -| <tag>Value</tag> # No space
- | 10+| <tag>░Value</tag> # Space added (green light shade)
- ----
-
- The `░` symbol clearly shows a regular space was added between `<tag>` and `Value`.
- ====
-
- === Tab vs spaces
-
- .Tab replaced with spaces
+ .Example output with character visualization enabled (default)
  [example]
  ====
- [source]
  ----
- 15| -| <tag>⇥Value</tag> # Tab (red arrow-to-bar)
- | 15+| <tag>░░Value</tag> # Two spaces (green light shades)
+ | 3- | ░░<item>Alpha</item>
+ | 3+ | ░░<item>Beta</item>
  ----
-
- The difference between a tab (`⇥`) and two spaces (`░░`) is immediately visible.
- ====
-
- === Non-breaking space
-
- .Non-breaking space from web copy-paste
- [example]
+ The `░` characters represent U+0020 spaces used for indentation.
  ====
- Without visualization, these look identical:
-
- [source,xml]
- ----
- <foreword id="fwd">
- <foreword id="fwd">
- ----
 
- With visualization:
+ == Configuration
 
- [source]
- ----
- 4| -| <foreword░id="fwd"> # Regular space (U+0020)
- | 4+| <foreword␣id="fwd"> # Non-breaking space (U+00A0)
- ----
+ `character_visualization` is a `DiffConfig` option accessible via
+ `Canon::Config`, environment variables, or the `DiffFormatter` constructor.
 
- The different symbols (`░` vs `␣`) clearly show that one uses a regular space while the other uses a non-breaking space, likely from copying from a web page.
- ====
+ === Values
 
- === Zero-width space
+ [cols="1,4"]
+ |===
+ |Value |Behaviour
+
+ |`true`
+ |(default) The full default visualization map is applied. Spaces appear as
+ `░`, tabs as `⇥`, non-breaking spaces as `␣`, and so on.
+
+ |`false`
+ |All visualization is disabled. Characters appear exactly as they are stored
+ in the document. Useful when copying failure-message output into a text
+ editor, or when downstream tooling cannot handle Unicode symbols.
+
+ |`:content_only`
+ |**Reserved for future use** — currently behaves identically to `true`.
+ Future intent: apply visualization only to text-node content, leaving
+ structural indentation whitespace plain so that pretty-printed diffs remain
+ visually uncluttered. See the <<future-work>> section below.
+ |===
 
- .Zero-width space (completely invisible)
- [example]
- ====
- Zero-width characters are invisible but affect comparison:
+ === Via `Canon::Config` (RSpec `spec_helper.rb`)
 
- [source,xml]
- ----
- <item>Widget</item>
- <item>Widget</item> <!-- Contains U+200B zero-width space after "Widget" -->
+ [source,ruby]
  ----
+ Canon::Config.configure do |cfg|
+   # Disable for all XML diffs in this test suite
+   cfg.xml.diff.character_visualization = false
 
- The diff shows:
-
- [source]
- ----
- 5| -| <item>Widget</item>
- | 5+| <item>Widget→</item> # Zero-width space visualized as →
+   # Or disable for HTML as well
+   cfg.html.diff.character_visualization = false
+ end
  ----
 
- The rightwards arrow (`→`) reveals the presence of a zero-width space.
- ====
-
- == Real-world scenarios
+ The setting is format-specific: `cfg.xml`, `cfg.html`, `cfg.json`,
+ `cfg.yaml`, and `cfg.string` each have their own `diff.character_visualization`.
 
- === Web copy-paste
+ === Via Environment Variable
 
- **Problem**: Text copied from web pages often contains non-breaking spaces (U+00A0) instead of regular spaces.
-
- .Detection example
- [example]
- ====
- [source]
- ----
- 4| -| <p>Hello░world</p> # U+0020 (regular space)
- | 4+| <p>Hello␣world</p> # U+00A0 (non-breaking space)
+ [source,bash]
  ----
+ # Disable globally
+ export CANON_CHARACTER_VISUALIZATION=false
 
- The `␣` symbol immediately identifies the non-breaking space.
- ====
-
- === Smart quotes
-
- **Problem**: Text editors may automatically convert straight quotes to curly quotes.
+ # Disable for XML only
+ export CANON_XML_DIFF_CHARACTER_VISUALIZATION=false
 
- .Detection example
- [example]
- ====
- [source]
- ----
- 10| -| <title>John's Book</title> # Straight apostrophe
- | 10+| <title>John’s Book</title> # Curly apostrophe (U+2019)
+ # Set to :content_only for XML only
+ export CANON_XML_DIFF_CHARACTER_VISUALIZATION=content_only
  ----
 
- Non-ASCII warning will alert you to the smart quote.
- ====
-
- === Template generation
-
- **Problem**: Generated output has invisible character differences.
+ === Via `DiffFormatter` directly
 
- .Detection example
- [example]
- ====
- [source]
+ [source,ruby]
  ----
- 20| -| <item>Value→</item> # Zero-width space present
- | 20+| <item>Value</item> # No zero-width space
+ formatter = Canon::DiffFormatter.new(
+   use_color: false,
+   mode: :by_line,
+   character_visualization: false,
+ )
  ----
 
- The `→` symbol reveals the zero-width space in generated content.
- ====
-
- == Customizing character visualization
+ == Default Visualization Map
 
- You can customize the visualization map for specific needs.
+ The following characters are visualized by default:
 
- === Custom map
+ [cols="2,1,1,3"]
+ |===
+ |Name |Unicode |Visualization |Notes
 
- [source,ruby]
- ----
- require 'canon/diff_formatter'
+ |Space
+ |U+0020
+ |`░`
+ |Most common; appears as indentation and between tokens
 
- # Create custom visualization map
- custom_map = Canon::DiffFormatter.merge_visualization_map({
-   ' ' => '·', # Use middle dot for spaces (if not using CJK)
-   "\t" => '→', # Use simple arrow for tabs
-   "\u200B" => '⚠' # Warning symbol for zero-width space
- })
+ |Tab
+ |U+0009
+ |`⇥`
+ |Structural indentation in tab-indented files
 
- # Use custom map with formatter
- formatter = Canon::DiffFormatter.new(
-   use_color: true,
-   visualization_map: custom_map
- )
- ----
+ |No-Break Space
+ |U+00A0
+ |`␣`
+ |Common in HTML (`&nbsp;`) and word-processor output
 
- === When to customize
+ |Thin Space
+ |U+2009
+ |`·`
+ |Typographic use
 
- **Use custom visualization when**:
+ |En Space
+ |U+2002
+ |`◦`
+ |
 
- * Working with non-CJK text exclusively
- * Prefer simpler symbols
- * Need specific character highlighting
- * Integrating with existing tools
+ |Em Space
+ |U+2003
+ |`▬`
+ |
 
- **Keep defaults when**:
+ |Line Feed
+ |U+000A
+ |`↵`
+ |End-of-line (shown in certain modes)
 
- * Working with CJK text
- * Maximum compatibility needed
- * Standard behavior preferred
+ |Carriage Return
+ |U+000D
+ |`←`
+ |Windows line endings
 
- == Configuration
+ |Zero-Width Space
+ |U+200B
+ |`⁰`
+ |Invisible in output; often a bug
+ |===
 
- Character visualization is automatically enabled when `use_color: true` and applies across all Canon interfaces.
+ The complete map is loaded from
+ `lib/canon/diff_formatter/character_map.yml` at startup.
 
- === Enabling/disabling
+ == Interaction with `display_preprocessing`
 
- Visualization is tied to color output:
+ Character visualization is applied **after** display preprocessing.
 
- [source,ruby]
- ----
- # Enable (visualization active)
- diff: { use_color: true }
+ When `display_preprocessing: :pretty_print` is active, Canon runs both
+ documents through `Canon::PrettyPrinter::Xml` before the line diff. The
+ pretty-printer produces only:
 
- # Disable (no visualization)
- diff: { use_color: false }
- ----
+ * ASCII U+0020 spaces for indentation
+ * ASCII U+000A newlines between elements
 
- === Interface configuration
+ Because the default visualization map visualizes U+0020 spaces as `░`,
+ structural indentation introduced by the pretty-printer *will* appear as `░`
+ characters in context lines. This is intentional: the diff output stays
+ consistent regardless of how indentation was introduced.
 
- .Ruby API
+ .Example (display_preprocessing: :pretty_print, character_visualization: true)
  [example]
  ====
- [source,ruby]
  ----
- # Visualization enabled by default
- Canon::Comparison.equivalent?(doc1, doc2,
-   verbose: true,
-   diff: { use_color: true } # Visualization active
- )
-
- # Disable for plain text
- Canon::Comparison.equivalent?(doc1, doc2,
-   verbose: true,
-   diff: { use_color: false } # No visualization
- )
+ | 1 | <root>
+ | 2- | ░░<item>Alpha</item>
+ | 2+ | ░░<item>Beta</item>
+ | 3 | </root>
  ----
+ The `░░` represents the 2-space indentation added by the pretty-printer.
  ====
 
- .CLI
- [example]
- ====
- [source,bash]
- ----
- # Enable (default)
- canon diff file1.xml file2.xml --verbose
+ If you want structure-free visualization (e.g. indentation stays as plain
+ spaces but content whitespace is still visualized), set
+ `character_visualization: :content_only` once that feature is implemented.
+ See <<future-work>>.
 
- # Disable
- canon diff file1.xml file2.xml --no-color --verbose
- ----
- ====
+ == Combining with `context_lines`
+
+ Context lines (unchanged lines shown around a diff hunk) are also subject to
+ character visualization. Reducing `context_lines` limits the number of
+ visualized lines displayed:
 
- .RSpec
- [example]
- ====
  [source,ruby]
  ----
- Canon::RSpecMatchers.configure do |config|
-   # Enable for local development
-   config.xml.diff.use_color = !ENV['CI']
+ Canon::Config.configure do |cfg|
+   cfg.xml.diff.context_lines = 1 # show only 1 context line
+   cfg.xml.diff.character_visualization = true
  end
  ----
- ====
-
- == Troubleshooting
-
- === Visualization not showing
-
- **Problem**: Invisible characters not visualized.
-
- **Solutions**:
-
- * Ensure `use_color: true`
- * Check terminal supports Unicode
- * Verify the characters are in diff lines (not context lines)
-
- === Wrong symbols displayed
 
- **Problem**: Symbols appear garbled or as boxes.
+ == [[future-work]]Future Work: `:content_only`
 
- **Solutions**:
+ The `:content_only` value is reserved for a planned DOM-level pre-serialization
+ pass that would:
 
- * Use terminal with Unicode support
- * Install Unicode-compatible font
- * Check terminal encoding (should be UTF-8)
+ 1. Walk the parsed DOM tree before serialization.
+ 2. Apply character visualization *only* to text node content.
+ 3. Leave structural whitespace (indentation, element-separator newlines)
+ plain.
 
- === CJK text affected
+ This would allow pretty-printed output to have clean indentation while
+ still making content whitespace visible.
 
- **Problem**: Visualization conflicts with CJK text.
+ The constraint that blocks this today: visualization is currently applied
+ as a post-serialization string substitution by the by-line formatters.
+ Implementing `:content_only` requires moving the substitution step into a
+ DOM walk before `Nokogiri::XML::Node#to_xml` is called.
 
- **Solution**: Canon's defaults are CJK-safe. If using custom map, avoid the characters listed in "CJK safety" section.
+ When implemented, `:content_only` will become the recommended value for
+ suites that use `display_preprocessing: :pretty_print`.
 
- == See also
+ == See Also
 
- * link:index.adoc[Diff Formatting] - Overview of formatting options
- * link:colors-and-symbols.adoc[Colors and Symbols] - Color scheme details
- * link:../../interfaces/cli/index.adoc[CLI Interface] - Command-line usage
- * link:../../interfaces/ruby-api/index.adoc[Ruby API] - Programmatic usage
+ * link:display-preprocessing.html[Display Preprocessing]
+ * link:index.html[Diff Formatting overview]
+ * link:../../reference/options-across-interfaces.html[Options Across Interfaces]
+ * link:../../reference/environment-variables.html[Environment Variables]
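The rewritten page describes visualization as a post-serialization string substitution driven by a character map. It also reserves `:content_only` for leaving structural indentation plain. The Ruby sketch below illustrates both ideas under stated assumptions:

* `VISUALIZATION_MAP`, `visualize`, and `visualize_content_only` are hypothetical names chosen for illustration. They are not part of Canon's API.
* The map copies only four entries from the new default table.
* The content-only variant simply keeps each line's leading indentation unchanged. This is a line-based stand-in for the planned DOM walk over text nodes, not an implementation of it.

```ruby
# Hypothetical subset of the new default map (illustration only, not Canon's API).
VISUALIZATION_MAP = {
  " " => "░",      # U+0020 space
  "\t" => "⇥",     # U+0009 tab
  "\u00A0" => "␣", # U+00A0 no-break space
  "\u200B" => "⁰"  # U+200B zero-width space
}.freeze

# One regexp matching any key; Regexp.union escapes each key for us.
VISUALIZABLE = Regexp.union(VISUALIZATION_MAP.keys)

# Full visualization (the `true` setting): replace every mapped character
# in an already-serialized line, in a single gsub pass.
def visualize(line)
  line.gsub(VISUALIZABLE, VISUALIZATION_MAP)
end

# Rough stand-in for the planned :content_only behaviour: keep the leading
# indentation plain and visualize only the remainder of the line.
# Assumption: indentation is any leading run of spaces/tabs. Spaces inside
# tags (e.g. before attributes) are still visualized, unlike a real
# text-node-only DOM walk.
def visualize_content_only(line)
  indent = line[/\A[ \t]*/]
  indent + visualize(line[indent.length..-1])
end

puts visualize("  <p>Hello world</p>")
puts visualize_content_only("  <p>Hello world</p>")
puts visualize_content_only("  <p>Hello\u00A0world</p>")
```

`visualize` turns the two indentation spaces into `░░` as well as the space between the words. `visualize_content_only` keeps the indentation as plain spaces but still marks the word-separating space (`░`) or no-break space (`␣`). Passing the hash directly to `gsub` avoids chaining one substitution per character.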