canon 0.1.8 → 0.1.9

Files changed (98)
  1. checksums.yaml +4 -4
  2. data/.rubocop_todo.yml +112 -25
  3. data/docs/Gemfile +1 -0
  4. data/docs/_config.yml +90 -1
  5. data/docs/advanced/diff-classification.adoc +82 -2
  6. data/docs/features/match-options/index.adoc +239 -1
  7. data/lib/canon/comparison/format_detector.rb +2 -1
  8. data/lib/canon/comparison/html_comparator.rb +19 -8
  9. data/lib/canon/comparison/html_compare_profile.rb +8 -2
  10. data/lib/canon/comparison/match_options/base_resolver.rb +7 -0
  11. data/lib/canon/comparison/whitespace_sensitivity.rb +208 -0
  12. data/lib/canon/comparison/xml_comparator/child_comparison.rb +15 -7
  13. data/lib/canon/comparison/xml_comparator/node_parser.rb +10 -5
  14. data/lib/canon/comparison/xml_comparator/node_type_comparator.rb +14 -7
  15. data/lib/canon/comparison/xml_comparator.rb +48 -23
  16. data/lib/canon/comparison/xml_node_comparison.rb +25 -3
  17. data/lib/canon/diff/diff_classifier.rb +101 -2
  18. data/lib/canon/diff/formatting_detector.rb +1 -1
  19. data/lib/canon/rspec_matchers.rb +37 -8
  20. data/lib/canon/version.rb +1 -1
  21. data/lib/canon/xml/data_model.rb +24 -13
  22. metadata +3 -78
  23. data/docs/plans/2025-01-17-html-parser-selection-fix.adoc +0 -250
  24. data/false_positive_analysis.txt +0 -0
  25. data/file1.html +0 -1
  26. data/file2.html +0 -1
  27. data/old-docs/ADVANCED_TOPICS.adoc +0 -20
  28. data/old-docs/BASIC_USAGE.adoc +0 -16
  29. data/old-docs/CHARACTER_VISUALIZATION.adoc +0 -567
  30. data/old-docs/CLI.adoc +0 -497
  31. data/old-docs/CUSTOMIZING_BEHAVIOR.adoc +0 -19
  32. data/old-docs/DIFF_ARCHITECTURE.adoc +0 -435
  33. data/old-docs/DIFF_FORMATTING.adoc +0 -540
  34. data/old-docs/DIFF_PARAMETERS.adoc +0 -261
  35. data/old-docs/DOM_DIFF.adoc +0 -1017
  36. data/old-docs/ENV_CONFIG.adoc +0 -876
  37. data/old-docs/FORMATS.adoc +0 -867
  38. data/old-docs/INPUT_VALIDATION.adoc +0 -477
  39. data/old-docs/MATCHER_BEHAVIOR.adoc +0 -90
  40. data/old-docs/MATCH_ARCHITECTURE.adoc +0 -463
  41. data/old-docs/MATCH_OPTIONS.adoc +0 -912
  42. data/old-docs/MODES.adoc +0 -432
  43. data/old-docs/NORMATIVE_INFORMATIVE_DIFFS.adoc +0 -219
  44. data/old-docs/OPTIONS.adoc +0 -1387
  45. data/old-docs/PREPROCESSING.adoc +0 -491
  46. data/old-docs/README.old.adoc +0 -2831
  47. data/old-docs/RSPEC.adoc +0 -814
  48. data/old-docs/RUBY_API.adoc +0 -485
  49. data/old-docs/SEMANTIC_DIFF_REPORT.adoc +0 -646
  50. data/old-docs/SEMANTIC_TREE_DIFF.adoc +0 -765
  51. data/old-docs/STRING_COMPARE.adoc +0 -345
  52. data/old-docs/TMP.adoc +0 -3384
  53. data/old-docs/TREE_DIFF.adoc +0 -1080
  54. data/old-docs/UNDERSTANDING_CANON.adoc +0 -17
  55. data/old-docs/VERBOSE.adoc +0 -482
  56. data/old-docs/VISUALIZATION_MAP.adoc +0 -625
  57. data/old-docs/WHITESPACE_TREATMENT.adoc +0 -1155
  58. data/scripts/analyze_current_state.rb +0 -85
  59. data/scripts/analyze_false_positives.rb +0 -114
  60. data/scripts/analyze_remaining_failures.rb +0 -105
  61. data/scripts/compare_current_failures.rb +0 -95
  62. data/scripts/compare_dom_tree_diff.rb +0 -158
  63. data/scripts/compare_failures.rb +0 -151
  64. data/scripts/debug_attribute_extraction.rb +0 -66
  65. data/scripts/debug_blocks_839.rb +0 -115
  66. data/scripts/debug_meta_matching.rb +0 -52
  67. data/scripts/debug_p_matching.rb +0 -192
  68. data/scripts/debug_signature_matching.rb +0 -118
  69. data/scripts/debug_sourcecode_124.rb +0 -32
  70. data/scripts/debug_whitespace_sensitive.rb +0 -192
  71. data/scripts/extract_false_positives.rb +0 -138
  72. data/scripts/find_actual_false_positives.rb +0 -125
  73. data/scripts/investigate_all_false_positives.rb +0 -161
  74. data/scripts/investigate_batch1.rb +0 -127
  75. data/scripts/investigate_classification.rb +0 -150
  76. data/scripts/investigate_classification_detailed.rb +0 -190
  77. data/scripts/investigate_common_failures.rb +0 -342
  78. data/scripts/investigate_false_negative.rb +0 -80
  79. data/scripts/investigate_false_positive.rb +0 -83
  80. data/scripts/investigate_false_positives.rb +0 -227
  81. data/scripts/investigate_false_positives_batch.rb +0 -163
  82. data/scripts/investigate_mixed_content.rb +0 -125
  83. data/scripts/investigate_remaining_16.rb +0 -214
  84. data/scripts/run_single_test.rb +0 -29
  85. data/scripts/test_all_false_positives.rb +0 -95
  86. data/scripts/test_attribute_details.rb +0 -61
  87. data/scripts/test_both_algorithms.rb +0 -49
  88. data/scripts/test_both_simple.rb +0 -49
  89. data/scripts/test_enhanced_semantic_output.rb +0 -125
  90. data/scripts/test_readme_examples.rb +0 -131
  91. data/scripts/test_semantic_tree_diff.rb +0 -99
  92. data/scripts/test_semantic_ux_improvements.rb +0 -135
  93. data/scripts/test_single_false_positive.rb +0 -119
  94. data/scripts/test_size_limits.rb +0 -99
  95. data/test_html_1.html +0 -21
  96. data/test_html_2.html +0 -21
  97. data/test_nokogiri.rb +0 -33
  98. data/test_normalize.rb +0 -45
data/old-docs/SEMANTIC_DIFF_REPORT.adoc +0 -646
@@ -1,646 +0,0 @@
1
- ---
2
- layout: default
3
- title: Semantic Diff Report
4
- nav_order: 41
5
- parent: Advanced Topics
6
- ---
7
- = Canon semantic diff report
8
- :toc:
9
- :toclevels: 3
10
-
11
- == General
12
-
13
- The Semantic Diff Report provides dimension-specific, actionable details for
14
- each difference found during comparison. Unlike the detailed line-by-line or
15
- object tree diffs, which show every changed line, the Semantic Diff Report
16
- focuses on WHAT changed and WHY it matters.
17
-
18
- The report is always shown in verbose mode when differences exist, providing
19
- a high-level summary before the detailed diff output.
20
-
21
- Key features:
22
-
23
- * XPath locations for XML/HTML elements
24
- * JSON path locations for JSON/YAML data
25
- * Dimension-specific formatting optimized for each type of difference
26
- * Colorized output for easy visual scanning
27
- * Whitespace preservation detection for `<pre>`, `<code>`, etc.
28
- * Actionable change summaries (e.g., "Added: +xmlns:v, +xmlns:o")
29
-
30
- == Architecture
31
-
32
- The Semantic Diff Report is generated by the `DiffDetailFormatter` module
33
- and integrated into the main diff output flow:
34
-
35
- [source]
36
- ----
37
- ╔═════════════════════════════════════════════════════════════════════╗
38
- ║ SEMANTIC DIFF REPORT ARCHITECTURE ║
39
- ╚═════════════════════════════════════════════════════════════════════╝
40
-
41
- Input: ComparisonResult with differences array
42
-
43
- ┌─────────────────────────────────────────────────────────────────────┐
44
- │ DiffDetailFormatter.format_report(differences) │
45
- │ │
46
- │ For each difference: │
47
- │ 1. Detect type (DiffNode for XML/HTML, Hash for JSON/YAML) │
48
- │ 2. Extract location (XPath or JSON path) │
49
- │ 3. Dispatch to dimension-specific formatter │
50
- │ 4. Format as vertical section with colors │
51
- └─────────────────────────────────────────────────────────────────────┘
52
-
53
-
54
- ┌─────────────────────────────────────────────────────────────────────┐
55
- │ Dimension-Specific Formatters │
56
- │ │
57
- │ XML/HTML: JSON/YAML: │
58
- │ • attribute_presence • hash_diff │
59
- │ • attribute_values (unified handling) │
60
- │ • text_content │
61
- │ • structural_whitespace │
62
- │ • comments │
63
- │ │
64
- │ Each formatter returns: │
65
- │ [detail1, detail2, changes_summary] │
66
- └─────────────────────────────────────────────────────────────────────┘
67
-
68
-
69
- Output: Formatted semantic diff report with:
70
- • Header with difference count
71
- • Individual difference sections
72
- • Color-coded for easy scanning
73
- ----
74
-
75
- === Integration with DiffFormatter
76
-
77
- The Semantic Diff Report is integrated at the `format_comparison_result()`
78
- level:
79
-
80
- [source,ruby]
81
- ----
82
- # In Canon::DiffFormatter
83
- def format_comparison_result(comparison_result, expected, actual)
84
- output = []
85
-
86
- # 1. CANON VERBOSE tables (optional)
87
- output << DebugOutput.verbose_tables_only(...)
88
-
89
- # 2. Semantic Diff Report (always if diffs exist)
90
- if comparison_result.differences.any?
91
- output << DiffDetailFormatter.format_report(differences)
92
- end
93
-
94
- # 3. Detailed diff (always)
95
- output << format(...)
96
-
97
- output.compact.join("\n")
98
- end
99
- ----
100
-
101
- This ensures the Semantic Diff Report is part of the main output, not debug
102
- information.
103
-
104
- == Output format
105
-
106
- === General structure
107
-
108
- Each difference is displayed as a vertical section with these components:
109
-
110
- [example]
111
- ====
112
- [source]
113
- ----
114
- 🔍 DIFFERENCE #1/3 [STATUS]
115
- ──────────────────────────────────────────────────────────────────────
116
- Dimension: dimension_name
117
- Location: /xpath/or/json.path
118
-
119
- ⊖ Expected (File 1):
120
- Details...
121
-
122
- ⊕ Actual (File 2):
123
- Details...
124
-
125
- ✨ Changes:
126
- Summary of what changed
127
- ----
128
- ====
129
-
130
- Where:
131
-
132
- `STATUS`:: Either `[NORMATIVE]` (green) or `[INFORMATIVE]` (yellow)
133
- `Dimension`:: The match dimension that detected this difference (magenta)
134
- `Location`:: XPath for XML/HTML, JSON path for JSON/YAML (blue)
135
- `Expected`:: What was in File 1 (red heading)
136
- `Actual`:: What was in File 2 (green heading)
137
- `Changes`:: Actionable summary (yellow)
138
-
139
- === Color scheme
140
-
141
- The report uses colors to make scanning easy:
142
-
143
- * **Dimension name**: Magenta
144
- * **XPath/JSON path**: Blue
145
- * **Expected heading**: Red (bold)
146
- * **Actual heading**: Green (bold)
147
- * **Changes heading**: Yellow (bold)
148
- * **Status [NORMATIVE]**: Green (bold)
149
- * **Status [INFORMATIVE]**: Yellow (bold)
150
- * **Added items**: Green (with `+` prefix)
151
- * **Removed items**: Red (with `-` prefix)
152
- * **Element names**: Magenta
153
- * **Attribute names**: Cyan
154
-
155
- === Vertical layout
156
-
157
- The vertical layout ensures no width constraints, making it easy to read
158
- even with long attribute lists or deeply nested paths.
159
-
160
- == XML/HTML dimensions
161
-
162
- === General
163
-
164
- XML and HTML comparisons use the same set of dimensions, classified based
165
- on what aspect of the document differs.
166
-
167
- === Attribute presence differences
168
-
169
- Reports when attributes are missing or extra.
170
-
171
- .Real-world example: Removed attribute in IsoDoc output
172
- [example]
173
- [source]
174
- ----
175
- 🔍 DIFFERENCE #1/1 [NORMATIVE]
176
- ──────────────────────────────────────────────────────────────────────
177
- Dimension: attribute_presence
178
- Location: /iso-standard/preface/foreword
179
-
180
- ⊖ Expected (File 1):
181
- <foreword> with 3 attributes: displayorder, id, semx-id
182
-
183
- ⊕ Actual (File 2):
184
- <foreword> with 2 attributes: displayorder, id
185
-
186
- ✨ Changes:
187
- Removed: -semx-id
188
- ----
189
-
190
- .Another real-world example: Missing class attribute
191
- [example]
192
- [source]
193
- ----
194
- 🔍 DIFFERENCE #1/1 [NORMATIVE]
195
- ──────────────────────────────────────────────────────────────────────
196
- Dimension: attribute_presence
197
- Location: /html/body/div/div
198
-
199
- ⊖ Expected (File 1):
200
- <div> with 2 attributes: class, id
201
-
202
- ⊕ Actual (File 2):
203
- <div> with 1 attribute: id
204
-
205
- [source]
206
- ----
207
- 🔍 DIFFERENCE #1/1 [NORMATIVE]
208
- ──────────────────────────────────────────────────────────────────────
209
- Dimension: attribute_presence
210
- Location: /html/body/p
211
-
212
- ⊖ Expected (File 1):
213
- <p> with 2 attributes: id, lang
214
-
215
- ⊕ Actual (File 2):
216
- <p> with 5 attributes: id, lang, xmlns:o, xmlns:v, xmlns:w
217
-
218
- ✨ Changes:
219
- Added: +xmlns:o, +xmlns:v, +xmlns:w
220
- ----
221
-
222
- The report shows:
223
- ✨ Changes:
224
- Removed: -class
225
- ----
226
-
227
- The report shows:
228
- ====
229
- [source]
230
- ----
231
- 🔍 DIFFERENCE #1/1 [NORMATIVE]
232
- ──────────────────────────────────────────────────────────────────────
233
- Dimension: attribute_presence
234
- Location: /html/body/p
235
-
236
- ⊖ Expected (File 1):
237
- <p> with 2 attributes: id, lang
238
-
239
- ⊕ Actual (File 2):
240
- <p> with 5 attributes: id, lang, xmlns:o, xmlns:v, xmlns:w
241
-
242
- ✨ Changes:
243
- Added: +xmlns:o, +xmlns:v, +xmlns:w
244
- ----
245
- ====
246
-
247
- The report shows:
248
-
249
- * Element name
250
- * Total attribute count in each document
251
- * Complete list of attributes
252
- * Which were added (green `+` prefix) or removed (red `-` prefix)
253
-
254
- This makes it immediately clear what needs to be added or removed to fix
255
- the test.
256
-
257
- === Attribute value differences
258
-
259
- Reports when an attribute value differs.
260
-
261
- [example]
262
- ====
263
- [source]
264
- ----
265
- 🔍 DIFFERENCE #1/1 [NORMATIVE]
266
- ──────────────────────────────────────────────────────────────────────
267
- Dimension: attribute_values
268
- Location: /html/body/div[@id="main"]
269
-
270
- ⊖ Expected (File 1):
271
- <div> class=" container fluid "
272
-
273
- ⊕ Actual (File 2):
274
- <div> class="container fluid"
275
-
276
- ✨ Changes:
277
- Whitespace normalization difference
278
- ----
279
- ====
280
-
281
- The report shows:
282
-
283
- * Which specific attribute has different value (cyan highlighting)
284
- * Exact values on both sides (with quotes)
285
- * Analysis of the difference type:
286
- ** "Whitespace difference only" - Only leading/trailing whitespace differs
287
- ** "Whitespace normalization difference" - Whitespace runs differ
288
- ** "Value changed" - Actual content differs
289
-
290
- === Text content differences
291
-
292
- Reports when element text content differs.
293
-
294
- .Real-world example: Whitespace differences inside `<pre>` element
295
- [example]
296
- [source]
297
- ----
298
- 🔍 DIFFERENCE #1/1 [NORMATIVE]
299
- ──────────────────────────────────────────────────────────────────────
300
- Dimension: text_content
301
- Location: /html/body/div/div/div/table/tbody/tr/td/pre/text
302
-
303
- ⊖ Expected (File 1):
304
- <text> "
305
- puts \"Hello, world.\"
306
- "
307
-
308
- ⊕ Actual (File 2):
309
- <text> "puts \"Hello, world.\" "
310
-
311
- ✨ Changes:
312
- ⚠️ Whitespace preserved (inside <pre>, <code>, etc. - whitespace is significant)
313
- ----
314
-
315
- .Another real-world example: Text content change
316
- [example]
317
- [source]
318
- ----
319
- 🔍 DIFFERENCE #1/1 [NORMATIVE]
320
- ──────────────────────────────────────────────────────────────────────
321
- Dimension: text_content
322
- Location: /iso-standard/bibliography/references/bibitem/formattedref
323
-
324
- ⊖ Expected (File 1):
325
- <formattedref> "
326
- R. FIELDING, J. GETTYS, J. MOGUL, H. FRYSTYK, L. MASINTER, P. LEACH and T. BERNER..."
327
-
328
- ⊕ Actual (File 2):
329
- <formattedref> "[1]IETF RFC 2616, "
330
-
331
- [source]
332
- ----
333
- 🔍 DIFFERENCE #1/1 [NORMATIVE]
334
- ──────────────────────────────────────────────────────────────────────
335
- Dimension: text_content
336
- Location: /html/body/div/table/tbody/tr/td/pre/text
337
-
338
- ⊖ Expected (File 1):
339
- <text> "
340
- puts \"Hello, world.\"
341
- "
342
-
343
- ⊕ Actual (File 2):
344
- <text> "puts \"Hello, world.\" "
345
-
346
- ✨ Changes:
347
- ⚠️ Whitespace preserved (inside <pre>, <code>, etc. - whitespace
348
- is significant)
349
- ----
350
-
351
- The report shows:
352
- ✨ Changes:
353
- Text content changed
354
- ----
355
-
356
- The report shows:
357
- ====
358
- [source]
359
- ----
360
- 🔍 DIFFERENCE #1/1 [NORMATIVE]
361
- ──────────────────────────────────────────────────────────────────────
362
- Dimension: text_content
363
- Location: /html/body/div/table/tbody/tr/td/pre/text
364
-
365
- ⊖ Expected (File 1):
366
- <text> "
367
- puts \"Hello, world.\"
368
- "
369
-
370
- ⊕ Actual (File 2):
371
- <text> "puts \"Hello, world.\" "
372
-
373
- ✨ Changes:
374
- ⚠️ Whitespace preserved (inside <pre>, <code>, etc. - whitespace
375
- is significant)
376
- ----
377
- ====
378
-
379
- The report shows:
380
-
381
- * Text preview (truncated at 100 characters if long)
382
- * Element containing the text
383
- * Special warning if the text is inside whitespace-preserving elements
384
- (`<pre>`, `<code>`, `<textarea>`, `<script>`, `<style>`)
385
-
386
- The whitespace warning is important because Canon automatically switches
387
- from `text_content: normalize` to `:strict` mode inside these elements.
388
-
389
- === Structural whitespace differences
390
-
391
- Reports whitespace-only text differences (usually informative).
392
-
393
- [example]
394
- ====
395
- [source]
396
- ----
397
- 🔍 DIFFERENCE #1/1 [INFORMATIVE]
398
- ──────────────────────────────────────────────────────────────────────
399
- Dimension: structural_whitespace
400
- Location: /root/section/p
401
-
402
- ⊖ Expected (File 1):
403
- <p> "hello␣␣world"
404
-
405
- ⊕ Actual (File 2):
406
- <p> "hello␣world"
407
-
408
- ✨ Changes:
409
- Whitespace-only difference (informative)
410
- ----
411
- ====
412
-
413
- The report shows:
414
-
415
- * Whitespace visualized using Unicode symbols:
416
- ** `␣` - Space (U+0020)
417
- ** `→` - Tab
418
- ** `↵` - Newline
419
- * Marked as `[INFORMATIVE]` (yellow) when `structural_whitespace: ignore`
420
-
421
- === Comment differences
422
-
423
- Reports when HTML/XML comment content differs.
424
-
425
- [example]
426
- ====
427
- [source]
428
- ----
429
- 🔍 DIFFERENCE #1/1 [INFORMATIVE]
430
- ──────────────────────────────────────────────────────────────────────
431
- Dimension: comments
432
- Location: /html/head
433
-
434
- ⊖ Expected (File 1):
435
- <!-- Original comment text -->
436
-
437
- ⊕ Actual (File 2):
438
- <!-- Modified comment text -->
439
-
440
- ✨ Changes:
441
- Comment content differs
442
- ----
443
- ====
444
-
445
- == JSON/YAML dimensions
446
-
447
- === General
448
-
449
- JSON and YAML comparisons use path-based difference reporting with Hash
450
- objects containing:
451
-
452
- * `:path` - The JSON path to the difference (e.g., `user.profile.email`)
453
- * `:value1` - Expected value
454
- * `:value2` - Actual value
455
- * `:diff_code` - Type of difference (MISSING_HASH_KEY,
456
- UNEQUAL_PRIMITIVES, etc.)
457
-
458
- === Hash key differences
459
-
460
- Reports when a key is missing or has different value.
461
-
462
- [example]
463
- ====
464
- [source]
465
- ----
466
- 🔍 DIFFERENCE #1/1 [NORMATIVE]
467
- ──────────────────────────────────────────────────────────────────────
468
- Dimension: 2
469
- Location: user.email
470
-
471
- ⊖ Expected (File 1):
472
- user.email = "alice@example.com"
473
-
474
- ⊕ Actual (File 2):
475
- user.email = nil
476
-
477
- ✨ Changes:
478
- Key missing
479
- ----
480
- ====
481
-
482
- === Primitive value differences
483
-
484
- Reports when primitive values (strings, numbers, booleans) differ.
485
-
486
- [example]
487
- ====
488
- [source]
489
- ----
490
- 🔍 DIFFERENCE #1/1 [NORMATIVE]
491
- ──────────────────────────────────────────────────────────────────────
492
- Dimension: 15
493
- Location: users[0].age
494
-
495
- ⊖ Expected (File 1):
496
- users[0].age = 25
497
-
498
- ⊕ Actual (File 2):
499
- users[0].age = 30
500
-
501
- ✨ Changes:
502
- Value changed
503
- ----
504
- ====
505
-
506
- === Array differences
507
-
508
- Reports when arrays have different lengths or elements.
509
-
510
- [example]
511
- ====
512
- [source]
513
- ----
514
- 🔍 DIFFERENCE #1/1 [NORMATIVE]
515
- ──────────────────────────────────────────────────────────────────────
516
- Dimension: 12
517
- Location: items
518
-
519
- ⊖ Expected (File 1):
520
- items = [...] (5 items)
521
-
522
- ⊕ Actual (File 2):
523
- items = [...] (3 items)
524
-
525
- ✨ Changes:
526
- Array length differs
527
- ----
528
- ====
529
-
530
- Complex values (hashes, arrays) are shown as `{...} (N keys)` or
531
- `[...] (N items)` to keep output concise.
532
-
533
- == Special features
534
-
535
- === XPath location extraction
536
-
537
- For XML/HTML differences, the report extracts XPath with:
538
-
539
- * Full path from root: `/html/body/div/section/p`
540
- * Position predicates when multiple siblings: `/p[2]`, `/div[3]`
541
- * Safe traversal with depth limits to prevent infinite loops
542
- * Graceful error handling for circular references
543
- * Document node detection to stop at appropriate boundaries
544
-
545
- === Whitespace preservation detection
546
-
547
- The report detects when text is inside whitespace-preserving HTML elements
548
- and shows a special warning:
549
-
550
- [source]
551
- ----
552
- ✨ Changes: ⚠️ Whitespace preserved (inside <pre>, <code>, etc. -
553
- whitespace is significant)
554
- ----
555
-
556
- This is important because Canon automatically switches to `:strict` mode
557
- for text content inside these elements:
558
-
559
- * `<pre>` - Preformatted text
560
- * `<code>` - Code blocks
561
- * `<textarea>` - Text input areas
562
- * `<script>` - JavaScript code
563
- * `<style>` - CSS style sheets
564
-
565
- The warning helps developers understand why a seemingly minor whitespace
566
- difference is causing a test failure.
567
-
568
- === Comprehensive error handling
569
-
570
- The formatter includes multiple layers of error handling:
571
-
572
- * Top-level rescue in `format_single_diff()` - Catches any formatting
573
- errors
574
- * Safe XPath extraction with depth limits and circular reference detection
575
- * Safe parent traversal with document node checks
576
- * Graceful fallbacks when node types are unexpected
577
-
578
- This ensures the Semantic Diff Report never crashes, even with unusual DOM
579
- structures.
580
-
581
- == Implementation
582
-
583
- === DiffDetailFormatter class
584
-
585
- Module: `Canon::DiffFormatter::DiffDetailFormatter`
586
-
587
- Location: `lib/canon/diff_formatter/diff_detail_formatter.rb`
588
-
589
- Main entry point:
590
-
591
- [source,ruby]
592
- ----
593
- # Format all differences as semantic diff report
594
- def self.format_report(differences, use_color: true)
595
- # Returns formatted string with all difference sections
596
- end
597
- ----
598
-
599
- === Dimension dispatch mechanism
600
-
601
- The formatter uses dimension-based dispatch:
602
-
603
- [source,ruby]
604
- ----
605
- def format_dimension_details(diff, use_color)
606
- # Handle Hash diffs (JSON/YAML)
607
- return format_hash_diff_details(diff) if diff.is_a?(Hash)
608
-
609
- # Handle DiffNode (XML/HTML) based on dimension
610
- case diff.dimension
611
- when :attribute_presence
612
- format_attribute_presence_details(diff)
613
- when :attribute_values
614
- format_attribute_values_details(diff)
615
- when :text_content
616
- format_text_content_details(diff)
617
- # ... other dimensions
618
- end
619
- end
620
- ----
621
-
622
- This ensures each difference type is optimally formatted.
623
-
624
- === Helper methods
625
-
626
- Key helper methods:
627
-
628
- `extract_xpath(node)`:: Extracts XPath from XML/HTML nodes with safety
629
- limits and error handling
630
-
631
- `extract_location(diff)`:: Dispatches to XPath extraction for XML/HTML or
632
- returns JSON path for JSON/YAML
633
-
634
- `inside_preserve_element?(node)`:: Detects if node is inside `<pre>`,
635
- `<code>`, etc. with safe parent traversal
636
-
637
- `get_attribute_names(node)`:: Extracts sorted attribute names from elements
638
-
639
- `find_differing_attribute(node1, node2)`:: Finds which attribute has
640
- different value
641
-
642
- `format_json_value(value)`:: Formats JSON values concisely ({...},
643
- [...], primitives)
644
-
645
- All helpers include comprehensive error handling to ensure the report never
646
- crashes.
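
Reviewer note: the removed document describes dimension-based dispatch in which every formatter returns a `[detail1, detail2, changes_summary]` triple. The sketch below illustrates that idea only; it is not Canon's implementation. `SketchDiffNode`, `DispatchSketch`, the `expected_attrs`/`actual_attrs` fields, the fallback strings, and the rule "a nil `:value2` means the key is missing" are all assumptions made for illustration (the document says the real Hash diffs carry a `:diff_code` instead).

```ruby
# Hypothetical stand-in for Canon's DiffNode: only the fields this sketch needs.
SketchDiffNode = Struct.new(:dimension, :expected_attrs, :actual_attrs,
                            keyword_init: true)

module DispatchSketch
  module_function

  # Returns [expected_detail, actual_detail, changes_summary], the triple the
  # removed document says each dimension formatter produces.
  def format_dimension_details(diff)
    # Hash diffs (JSON/YAML) take a single unified path.
    return format_hash_diff(diff) if diff.is_a?(Hash)

    case diff.dimension
    when :attribute_presence then format_attribute_presence(diff)
    else ["(no detail)", "(no detail)", "Unhandled dimension: #{diff.dimension}"]
    end
  rescue StandardError => e
    # Assumed fallback so one bad diff never aborts the whole report.
    ["(unavailable)", "(unavailable)", "Formatting failed: #{e.class}"]
  end

  def format_attribute_presence(diff)
    expected = diff.expected_attrs.sort
    actual = diff.actual_attrs.sort
    added = actual - expected
    removed = expected - actual

    changes = []
    changes << "Added: #{added.map { |a| "+#{a}" }.join(', ')}" unless added.empty?
    changes << "Removed: #{removed.map { |a| "-#{a}" }.join(', ')}" unless removed.empty?
    [describe(expected), describe(actual), changes.join("; ")]
  end

  def describe(names)
    noun = names.size == 1 ? "attribute" : "attributes"
    "#{names.size} #{noun}: #{names.join(', ')}"
  end

  # Simplification: treats a nil :value2 as a missing key.
  def format_hash_diff(diff)
    path = diff[:path]
    changes = diff[:value2].nil? ? "Key missing" : "Value changed"
    ["#{path} = #{diff[:value1].inspect}",
     "#{path} = #{diff[:value2].inspect}",
     changes]
  end
end

node = SketchDiffNode.new(dimension: :attribute_presence,
                          expected_attrs: %w[id lang],
                          actual_attrs: %w[id lang xmlns:o xmlns:v xmlns:w])
puts DispatchSketch.format_dimension_details(node)
```

Under these assumptions the `<p>` example from the removed document yields the same "Added: +xmlns:o, +xmlns:v, +xmlns:w" summary line, and a Hash such as `{ path: "user.email", value1: "alice@example.com", value2: nil }` is routed to the hash formatter without touching the dimension `case`.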