canon 0.1.3 → 0.1.5

Files changed (102)
  1. checksums.yaml +4 -4
  2. data/.rubocop.yml +9 -1
  3. data/.rubocop_todo.yml +276 -7
  4. data/README.adoc +203 -138
  5. data/_config.yml +116 -0
  6. data/docs/ADVANCED_TOPICS.adoc +20 -0
  7. data/docs/BASIC_USAGE.adoc +16 -0
  8. data/docs/CHARACTER_VISUALIZATION.adoc +567 -0
  9. data/docs/CLI.adoc +493 -0
  10. data/docs/CUSTOMIZING_BEHAVIOR.adoc +19 -0
  11. data/docs/DIFF_ARCHITECTURE.adoc +435 -0
  12. data/docs/DIFF_FORMATTING.adoc +540 -0
  13. data/docs/FORMATS.adoc +447 -0
  14. data/docs/INDEX.adoc +222 -0
  15. data/docs/INPUT_VALIDATION.adoc +477 -0
  16. data/docs/MATCH_ARCHITECTURE.adoc +463 -0
  17. data/docs/MATCH_OPTIONS.adoc +719 -0
  18. data/docs/MODES.adoc +432 -0
  19. data/docs/NORMATIVE_INFORMATIVE_DIFFS.adoc +219 -0
  20. data/docs/OPTIONS.adoc +1387 -0
  21. data/docs/PREPROCESSING.adoc +491 -0
  22. data/docs/RSPEC.adoc +605 -0
  23. data/docs/RUBY_API.adoc +478 -0
  24. data/docs/SEMANTIC_DIFF_REPORT.adoc +528 -0
  25. data/docs/UNDERSTANDING_CANON.adoc +17 -0
  26. data/docs/VERBOSE.adoc +482 -0
  27. data/exe/canon +7 -0
  28. data/lib/canon/cli.rb +179 -0
  29. data/lib/canon/commands/diff_command.rb +195 -0
  30. data/lib/canon/commands/format_command.rb +113 -0
  31. data/lib/canon/comparison/base_comparator.rb +39 -0
  32. data/lib/canon/comparison/comparison_result.rb +79 -0
  33. data/lib/canon/comparison/html_comparator.rb +410 -0
  34. data/lib/canon/comparison/json_comparator.rb +212 -0
  35. data/lib/canon/comparison/match_options.rb +616 -0
  36. data/lib/canon/comparison/xml_comparator.rb +566 -0
  37. data/lib/canon/comparison/yaml_comparator.rb +93 -0
  38. data/lib/canon/comparison.rb +239 -0
  39. data/lib/canon/config.rb +172 -0
  40. data/lib/canon/diff/diff_block.rb +71 -0
  41. data/lib/canon/diff/diff_block_builder.rb +105 -0
  42. data/lib/canon/diff/diff_classifier.rb +46 -0
  43. data/lib/canon/diff/diff_context.rb +85 -0
  44. data/lib/canon/diff/diff_context_builder.rb +107 -0
  45. data/lib/canon/diff/diff_line.rb +77 -0
  46. data/lib/canon/diff/diff_node.rb +56 -0
  47. data/lib/canon/diff/diff_node_mapper.rb +148 -0
  48. data/lib/canon/diff/diff_report.rb +133 -0
  49. data/lib/canon/diff/diff_report_builder.rb +62 -0
  50. data/lib/canon/diff_formatter/by_line/base_formatter.rb +407 -0
  51. data/lib/canon/diff_formatter/by_line/html_formatter.rb +672 -0
  52. data/lib/canon/diff_formatter/by_line/json_formatter.rb +284 -0
  53. data/lib/canon/diff_formatter/by_line/simple_formatter.rb +190 -0
  54. data/lib/canon/diff_formatter/by_line/xml_formatter.rb +860 -0
  55. data/lib/canon/diff_formatter/by_line/yaml_formatter.rb +292 -0
  56. data/lib/canon/diff_formatter/by_object/base_formatter.rb +199 -0
  57. data/lib/canon/diff_formatter/by_object/json_formatter.rb +305 -0
  58. data/lib/canon/diff_formatter/by_object/xml_formatter.rb +248 -0
  59. data/lib/canon/diff_formatter/by_object/yaml_formatter.rb +17 -0
  60. data/lib/canon/diff_formatter/character_map.yml +197 -0
  61. data/lib/canon/diff_formatter/debug_output.rb +431 -0
  62. data/lib/canon/diff_formatter/diff_detail_formatter.rb +551 -0
  63. data/lib/canon/diff_formatter/legend.rb +141 -0
  64. data/lib/canon/diff_formatter.rb +520 -0
  65. data/lib/canon/errors.rb +56 -0
  66. data/lib/canon/formatters/html4_formatter.rb +17 -0
  67. data/lib/canon/formatters/html5_formatter.rb +17 -0
  68. data/lib/canon/formatters/html_formatter.rb +37 -0
  69. data/lib/canon/formatters/html_formatter_base.rb +163 -0
  70. data/lib/canon/formatters/json_formatter.rb +3 -0
  71. data/lib/canon/formatters/xml_formatter.rb +20 -55
  72. data/lib/canon/formatters/yaml_formatter.rb +4 -1
  73. data/lib/canon/pretty_printer/html.rb +57 -0
  74. data/lib/canon/pretty_printer/json.rb +25 -0
  75. data/lib/canon/pretty_printer/xml.rb +29 -0
  76. data/lib/canon/rspec_matchers.rb +222 -80
  77. data/lib/canon/validators/base_validator.rb +49 -0
  78. data/lib/canon/validators/html_validator.rb +138 -0
  79. data/lib/canon/validators/json_validator.rb +89 -0
  80. data/lib/canon/validators/xml_validator.rb +53 -0
  81. data/lib/canon/validators/yaml_validator.rb +73 -0
  82. data/lib/canon/version.rb +1 -1
  83. data/lib/canon/xml/attribute_handler.rb +80 -0
  84. data/lib/canon/xml/c14n.rb +36 -0
  85. data/lib/canon/xml/character_encoder.rb +38 -0
  86. data/lib/canon/xml/data_model.rb +225 -0
  87. data/lib/canon/xml/element_matcher.rb +196 -0
  88. data/lib/canon/xml/line_range_mapper.rb +158 -0
  89. data/lib/canon/xml/namespace_handler.rb +86 -0
  90. data/lib/canon/xml/node.rb +32 -0
  91. data/lib/canon/xml/nodes/attribute_node.rb +54 -0
  92. data/lib/canon/xml/nodes/comment_node.rb +23 -0
  93. data/lib/canon/xml/nodes/element_node.rb +56 -0
  94. data/lib/canon/xml/nodes/namespace_node.rb +38 -0
  95. data/lib/canon/xml/nodes/processing_instruction_node.rb +24 -0
  96. data/lib/canon/xml/nodes/root_node.rb +16 -0
  97. data/lib/canon/xml/nodes/text_node.rb +23 -0
  98. data/lib/canon/xml/processor.rb +151 -0
  99. data/lib/canon/xml/whitespace_normalizer.rb +72 -0
  100. data/lib/canon/xml/xml_base_handler.rb +188 -0
  101. data/lib/canon.rb +14 -3
  102. metadata +116 -21
@@ -0,0 +1,463 @@
---
layout: default
title: Match Architecture
nav_order: 22
parent: Understanding Canon
---
= Match architecture
:toc:
:toclevels: 3

== Scope

This document explains Canon's three-phase comparison architecture and how
documents flow through preprocessing, semantic matching, and diff rendering.

For match dimension details, see link:MATCH_OPTIONS[Match options].

For diff output customization, see link:DIFF_FORMATTING[Diff formatting].

== General

Canon divides comparison into three phases, each responsible for a single
concern:

. **Preprocessing**: Optional document normalization
. **Semantic matching**: Content comparison with configurable dimensions
. **Diff rendering**: Formatted, human-readable output of any differences

Each phase is configured independently, which gives fine-grained control over
comparison behavior.

== Architecture diagram

[source]
----
┌──────────────────────────────────────────────────────────────────┐
│                      CANON COMPARISON FLOW                       │
└──────────────────────────────────────────────────────────────────┘

                      ┌─────────────────────┐
                      │   Input Documents   │
                      │  (File 1, File 2)   │
                      └──────────┬──────────┘
                                 │
                                 ▼
╔══════════════════════════════════════════════════════════════════╗
║  PHASE 1: PREPROCESSING                                          ║
╠══════════════════════════════════════════════════════════════════╣
║  Options:                                                        ║
║  • none      - No preprocessing                                  ║
║  • c14n      - Canonical form (XML C14N, JSON/YAML sorted)       ║
║  • normalize - Normalize whitespace                              ║
║  • format    - Pretty-print with standard formatting             ║
╚══════════════════════════════════════════════════════════════════╝
                                 │
                                 ▼
                      ┌─────────────────────┐
                      │  Preprocessed Docs  │
                      └──────────┬──────────┘
                                 │
                                 ▼
╔══════════════════════════════════════════════════════════════════╗
║  PHASE 2: SEMANTIC MATCHING                                      ║
╠══════════════════════════════════════════════════════════════════╣
║  Match Dimensions:                                               ║
║  • text_content          (strict|normalize|ignore)               ║
║  • structural_whitespace (strict|normalize|ignore)               ║
║  • attribute_whitespace  (strict|normalize|ignore) [XML/HTML]    ║
║  • attribute_order       (strict|ignore)           [XML/HTML]    ║
║  • attribute_values      (strict|normalize|ignore) [XML/HTML]    ║
║  • key_order             (strict|ignore)           [JSON/YAML]   ║
║  • comments              (strict|normalize|ignore)               ║
║                                                                  ║
║  Match Profiles:                                                 ║
║  • strict        - All dimensions strict (exact matching)        ║
║  • rendered      - Mimics browser/CSS rendering behavior         ║
║  • spec_friendly - Test-friendly (ignores formatting diffs)      ║
║  • content_only  - Only semantic content matters                 ║
╚══════════════════════════════════════════════════════════════════╝
                                 │
                                 ├─ Equivalent? ──► Return true
                                 │
                                 │  Different?
                                 ▼
╔══════════════════════════════════════════════════════════════════╗
║  PHASE 3: DIFF RENDERING                                         ║
╠══════════════════════════════════════════════════════════════════╣
║  Diff Options:                                                   ║
║  • mode           - by_line (HTML) | by_object (XML/JSON/YAML)   ║
║  • use_color      - Colorized output (terminal colors)           ║
║  • context_lines  - Lines of context around changes              ║
║  • grouping_lines - Group nearby changes into blocks             ║
╚══════════════════════════════════════════════════════════════════╝
                                 │
                                 ▼
                      ┌─────────────────────┐
                      │   Formatted Diff    │
                      │       Output        │
                      └─────────────────────┘
----
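
The flow in the diagram can be sketched in a few lines of plain Ruby. This is a minimal illustration under stated assumptions, not Canon's implementation: `ComparisonPipeline` and the lambdas are hypothetical names, each phase is assumed to be a callable, and matching is reduced to a list of differences where an empty list means "equivalent".

```ruby
# Hypothetical sketch of the three-phase flow. ComparisonPipeline and the
# lambdas below are illustrative names, not part of Canon's API.
ComparisonPipeline = Struct.new(:preprocess, :match, :render) do
  # Returns true when the documents are equivalent, otherwise the rendered diff.
  def compare(doc1, doc2)
    # Phase 1: optional normalization, applied to both inputs
    left = preprocess.call(doc1)
    right = preprocess.call(doc2)

    # Phase 2: semantic matching produces a list of differences
    differences = match.call(left, right)
    return true if differences.empty? # Equivalent? -> return true

    # Phase 3: rendering only runs when the documents differ
    render.call(differences)
  end
end

no_preprocessing = ->(doc) { doc }
squeeze_spaces   = ->(doc) { doc.gsub(/\s+/, " ").strip }
exact_match      = ->(a, b) { a == b ? [] : [[a, b]] }
simple_render    = ->(diffs) { diffs.map { |before, after| "- #{before}\n+ #{after}" }.join("\n") }

strict   = ComparisonPipeline.new(no_preprocessing, exact_match, simple_render)
tolerant = ComparisonPipeline.new(squeeze_spaces, exact_match, simple_render)

puts strict.compare("<p>Test 1</p>", "<p>Test 2</p>")
# prints:
# - <p>Test 1</p>
# + <p>Test 2</p>
```

The only difference between `strict` and `tolerant` is the Phase 1 callable, which is the kind of composition the three-phase split is meant to allow.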

== Phase 1: Preprocessing

=== Purpose

Preprocessing transforms both documents into a normalized form before
comparison, removing format-specific variations that should not affect
semantic equivalence.

=== Options

`none` (default):: No preprocessing; documents are compared as-is

`c14n`:: Canonical form:
* XML: W3C Canonical XML 1.1
* HTML: Normalized HTML structure
* JSON: Sorted keys, normalized whitespace
* YAML: Sorted keys, standard format

`normalize`:: Normalize whitespace:
* Collapse each run of whitespace into a single space
* Trim leading and trailing whitespace
* Normalize line endings

`format`:: Pretty-print with standard formatting:
* 2-space indentation
* One element or property per line
* Consistent structure
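
The three `normalize` steps listed above can be sketched as a plain string transformation. `normalize_whitespace` is a hypothetical helper that works on raw text only; Canon normalizes with knowledge of the document format, so treat this as an illustration of the steps rather than the real behavior.

```ruby
# Hypothetical, string-level sketch of the `normalize` steps.
# It ignores document structure entirely; it is not Canon's implementation.
def normalize_whitespace(text)
  unified = text.gsub(/\r\n?/, "\n")   # normalize line endings: CRLF and lone CR become LF
  collapsed = unified.gsub(/\s+/, " ") # collapse each whitespace run, newlines included, to one space
  collapsed.strip                      # trim leading and trailing whitespace
end

p normalize_whitespace("  Hello\r\n   world\t! ")
# prints "Hello world !"
```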

=== Usage

.Ruby API
[example]
====
[source,ruby]
----
Canon::Comparison.equivalent?(doc1, doc2,
  preprocessing: :normalize
)
----
====

.CLI
[example]
====
[source,bash]
----
$ canon diff file1.xml file2.xml --preprocessing normalize
----
====

See link:PREPROCESSING[Preprocessing documentation] for details.

== Phase 2: Semantic matching

=== Purpose

Semantic matching compares document content along configurable match
dimensions. Each dimension controls how one aspect of the documents is
compared.

=== Match dimensions

Match dimensions are orthogonal aspects of documents that can be compared
independently:

`text_content`:: Text within elements or values
`structural_whitespace`:: Whitespace between elements
`attribute_whitespace`:: Whitespace in attribute values (XML/HTML)
`attribute_order`:: Order of attributes (XML/HTML)
`attribute_values`:: Attribute value content (XML/HTML)
`key_order`:: Order of object keys (JSON/YAML)
`comments`:: Comment content and placement

Each dimension supports some or all of these behaviors (`attribute_order` and
`key_order` accept only `:strict` and `:ignore`):

* `:strict` - Must match exactly
* `:normalize` - Match after normalization
* `:ignore` - Not compared
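
A minimal sketch of how the three behaviors could decide whether two text values match. `values_match?` is a hypothetical helper, and `:normalize` is assumed here to mean "collapse whitespace runs and trim"; each real dimension defines its own normalization.

```ruby
# Hypothetical sketch: one dimension's behavior decides whether two text
# values match. Not Canon's implementation.
def values_match?(a, b, behavior)
  case behavior
  when :strict then a == b                                      # must match exactly
  when :normalize then a.split.join(" ") == b.split.join(" ")   # compare after collapsing whitespace
  when :ignore then true                                        # dimension is not compared at all
  else raise ArgumentError, "unknown behavior: #{behavior.inspect}"
  end
end

p values_match?("Hello  world", "Hello world", :strict)        # prints false
p values_match?("Hello  world", " Hello world\n", :normalize)  # prints true
p values_match?("Test 1", "Test 2", :ignore)                   # prints true
```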

See link:MATCH_OPTIONS[Match options] for the complete dimension reference.

=== Match profiles

Profiles are predefined combinations of dimension settings for common
scenarios:

`:strict`:: Exact matching - all dimensions use `:strict` behavior
`:rendered`:: Browser rendering - ignores formatting that doesn't affect
display
`:spec_friendly`:: Test-friendly - ignores formatting, focuses on content
`:content_only`:: Maximum tolerance - only semantic content matters

=== Usage

.With dimensions
[example]
====
[source,ruby]
----
Canon::Comparison.equivalent?(doc1, doc2,
  match: {
    text_content: :normalize,
    structural_whitespace: :ignore,
    comments: :ignore
  }
)
----
====

.With profile
[example]
====
[source,ruby]
----
Canon::Comparison.equivalent?(doc1, doc2,
  match_profile: :spec_friendly
)
----
====

.Profile with dimension overrides
[example]
====
[source,ruby]
----
Canon::Comparison.equivalent?(doc1, doc2,
  match_profile: :spec_friendly,
  match: {
    comments: :strict # explicit option overrides the profile's comments setting
  }
)
----
====

== Phase 3: Diff rendering

=== Purpose

When the documents differ, diff rendering formats the differences for human
review with syntax highlighting, context lines, and whitespace visualization.

=== Diff modes

`by-line`:: Traditional line-by-line diff
* Default for HTML
* Optional for XML
* Shows changes in document order
* Uses DOM-guided semantic matching for XML

`by-object`:: Tree-based semantic diff
* Default for XML, JSON, YAML
* Shows only what changed in structure
* Visual tree representation

See link:MODES[Diff modes] for details.
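
To illustrate the by-object idea of reporting only what changed in structure, here is a hypothetical recursive diff over parsed JSON/YAML-style data (hashes, arrays, scalars). `object_diff` and its `[kind, path, ...]` entries are assumptions for illustration, not Canon's by-object formatter. Because keys are compared by name rather than position, key order never produces a difference, which corresponds to `key_order: :ignore`.

```ruby
# Hypothetical sketch of a by-object diff: walk two parsed structures and
# report only the paths that changed. Not Canon's implementation.
def object_diff(a, b, path = [])
  if a.is_a?(Hash) && b.is_a?(Hash)
    (a.keys | b.keys).flat_map do |key|
      next [[:removed, path + [key], a[key]]] unless b.key?(key)
      next [[:added, path + [key], b[key]]] unless a.key?(key)

      object_diff(a[key], b[key], path + [key]) # recurse into shared keys
    end
  elsif a.is_a?(Array) && b.is_a?(Array) && a.length == b.length
    # Same-length arrays are compared element by element, by index
    a.zip(b).each_with_index.flat_map { |(x, y), i| object_diff(x, y, path + [i]) }
  elsif a == b
    []
  else
    [[:changed, path, a, b]] # leaf values (or mismatched shapes) differ
  end
end

before = { "title" => "Report", "count" => 1, "tags" => ["x", "y"] }
after  = { "count" => 2, "title" => "Report", "tags" => ["x", "z"], "draft" => true }
object_diff(before, after).each { |entry| p entry }
# prints:
# [:changed, ["count"], 1, 2]
# [:changed, ["tags", 1], "y", "z"]
# [:added, ["draft"], true]
```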

=== Diff options

`use_color`:: Enable or disable ANSI color codes (default: `true`)

`context_lines`:: Number of unchanged lines shown around each change
(default: `3`)

`grouping_lines`:: Group changes that are within N lines of each other into a
single block (default: `nil`)

See link:DIFF_FORMATTING[Diff formatting] for details.
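
One way to read `context_lines` and `grouping_lines` together is sketched below. `diff_ranges` is hypothetical, and its grouping rule (changes at most `grouping_lines` apart share a block; `nil` merges only consecutive lines) is an assumption about the semantics, not Canon's documented algorithm.

```ruby
# Hypothetical sketch, not Canon's code: turn 1-based changed line numbers
# into the line ranges a by-line diff would print.
def diff_ranges(changed_lines, total_lines:, context_lines: 3, grouping_lines: nil)
  threshold = grouping_lines || 1 # assumption: nil groups only consecutive lines

  # 1. Group changes that sit within `threshold` lines of each other.
  blocks = changed_lines.sort.uniq.slice_when { |prev, cur| cur - prev > threshold }

  # 2. Widen each block by the context lines, clipped to the document bounds.
  ranges = blocks.map do |block|
    [block.first - context_lines, 1].max..[block.last + context_lines, total_lines].min
  end

  # 3. Merge ranges whose context now overlaps or touches.
  ranges.each_with_object([]) do |range, merged|
    if merged.any? && range.begin <= merged.last.end + 1
      merged[-1] = merged.last.begin..[merged.last.end, range.end].max
    else
      merged << range
    end
  end
end

p diff_ranges([10, 20], total_lines: 100, context_lines: 2)                     # prints [8..12, 18..22]
p diff_ranges([10, 20], total_lines: 100, context_lines: 2, grouping_lines: 10) # prints [8..22]
```

With `grouping_lines: 10` the two changes share one block; without it they print as separate hunks.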

=== Usage

.Ruby API
[example]
====
[source,ruby]
----
Canon::Comparison.equivalent?(doc1, doc2,
  verbose: true,
  diff: {
    mode: :by_line,
    use_color: true,
    context_lines: 5,
    grouping_lines: 10
  }
)
----
====

.CLI
[example]
====
[source,bash]
----
$ canon diff file1.xml file2.xml \
    --verbose \
    --by-line \
    --context-lines 5 \
    --diff-grouping-lines 10
----
====

== Data flow examples

=== Example 1: Equivalent documents with formatting differences

[source]
----
Input:
  doc1: <root><a>1</a><b>2</b></root>
  doc2: <root> <b>2</b> <a>1</a> </root>

Phase 1 - Preprocessing (none):
  No changes

Phase 2 - Semantic Matching (default):
  • text_content: Both have "1" and "2" ✓
  • structural_whitespace: Normalized ✓
  • element order: Doesn't matter for siblings ✓
  Result: EQUIVALENT

Phase 3 - Diff Rendering:
  Not needed (documents equivalent)

Return: true
----

=== Example 2: Different text content

[source]
----
Input:
  doc1: <p>Test 1</p>
  doc2: <p>Test 2</p>

Phase 1 - Preprocessing (none):
  No changes

Phase 2 - Semantic Matching (default):
  • text_content: "Test 1" ≠ "Test 2" ✗
  Result: DIFFERENT

Phase 3 - Diff Rendering (by-line):
  Output:
    1| -  | <p>Test 1</p>
     | 1+ | <p>Test 2</p>

Return: false (or diff hash if verbose: true)
----

=== Example 3: Ignoring attribute order

[source]
----
Input:
  doc1: <div class="foo" id="x">Content</div>
  doc2: <div id="x" class="foo">Content</div>

Phase 1 - Preprocessing (none):
  No changes

Phase 2 - Semantic Matching (attribute_order: ignore):
  • attribute_order: Ignored (set to :ignore)
  • Both have class="foo" and id="x" ✓
  • Both have the same content ✓
  Result: EQUIVALENT

Phase 3 - Diff Rendering:
  Not needed (documents equivalent)

Return: true
----
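
The attribute comparison in Example 3 can be sketched by modeling attributes as ordered `[name, value]` pairs. `attributes_match?` is a hypothetical helper; under `:ignore` it compares the sorted pairs, so only order differences are tolerated while a changed value still fails.

```ruby
# Hypothetical sketch of the attribute_order dimension. Attributes are
# modeled as ordered [name, value] pairs, in source order. Not Canon's code.
def attributes_match?(attrs1, attrs2, attribute_order: :strict)
  case attribute_order
  when :strict then attrs1 == attrs2           # same pairs in the same order
  when :ignore then attrs1.sort == attrs2.sort # same pairs in any order
  else raise ArgumentError, "attribute_order accepts :strict or :ignore"
  end
end

doc1_attrs = [["class", "foo"], ["id", "x"]]
doc2_attrs = [["id", "x"], ["class", "foo"]]

p attributes_match?(doc1_attrs, doc2_attrs)                           # prints false
p attributes_match?(doc1_attrs, doc2_attrs, attribute_order: :ignore) # prints true
```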

=== Example 4: With preprocessing

[source]
----
Input:
  doc1: <root> <a> 1 </a> </root>
  doc2: <root><a>1</a></root>

Phase 1 - Preprocessing (normalize):
  Both become: <root><a>1</a></root>

Phase 2 - Semantic Matching:
  Preprocessed documents are identical
  Result: EQUIVALENT

Phase 3 - Diff Rendering:
  Not needed (documents equivalent)

Return: true
----

== Configuration precedence

When an option is set in more than one place, Canon uses the value from the
highest-priority source, in this order (highest first):

. Per-comparison explicit options (highest)
. Per-comparison profile
. Global configuration explicit options
. Global configuration profile
. Format defaults (lowest)
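
The precedence rules amount to merging option layers from lowest to highest priority, since `Hash#merge` keeps the right-hand value on a key collision. The sketch below assumes each source is a plain hash of dimension settings; `resolve_options` and the profile contents are made up for illustration and do not reflect Canon's actual profiles.

```ruby
# Hypothetical sketch of the precedence order. Layers are merged from lowest
# to highest priority, so a later (higher-priority) layer wins on collision.
def resolve_options(format_defaults:, global_profile: {}, global_options: {},
                    comparison_profile: {}, comparison_options: {})
  [format_defaults, global_profile, global_options,
   comparison_profile, comparison_options].reduce({}) do |resolved, layer|
    resolved.merge(layer)
  end
end

resolved = resolve_options(
  format_defaults: { text_content: :strict, structural_whitespace: :strict, comments: :strict },
  global_profile: { text_content: :normalize, comments: :ignore }, # made-up profile values
  global_options: { comments: :strict },
  comparison_profile: { structural_whitespace: :normalize },       # made-up profile values
  comparison_options: { structural_whitespace: :ignore }
)
# resolved[:text_content]          is :normalize (global profile beats format defaults)
# resolved[:comments]              is :strict    (global explicit option beats global profile)
# resolved[:structural_whitespace] is :ignore    (per-comparison explicit option wins)
```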

.Precedence example
[example]
====
Global configuration:

[source,ruby]
----
Canon::RSpecMatchers.configure do |config|
  config.xml.match.profile = :spec_friendly
  config.xml.match.options = { comments: :strict }
end
----

Per-test usage:

[source,ruby]
----
expect(actual).to be_xml_equivalent_to(expected)
  .with_profile(:rendered)
  .with_options(structural_whitespace: :ignore)
----

**Final resolved options**:

* `text_content: :normalize` (from `:rendered` per-test profile)
* `structural_whitespace: :ignore` (from per-test explicit option)
* `comments: :strict` (from global explicit option)
* Other dimensions use `:rendered` profile or format defaults
====

== Benefits of the three-phase architecture

**Separation of concerns**:: Each phase has a single responsibility

**Composability**:: Mix and match preprocessing, matching, and rendering
options

**Testability**:: Each phase can be tested independently

**Flexibility**:: Fine-grained control over comparison behavior

**Clarity**:: Clear data flow from input to output

**Extensibility**:: Easy to add new preprocessing options, match dimensions, or
rendering modes

== See also

* link:PREPROCESSING[Preprocessing options]
* link:MATCH_OPTIONS[Match dimensions and profiles]
* link:MODES[Diff modes]
* link:DIFF_FORMATTING[Diff formatting]
* link:RUBY_API[Ruby API documentation]
* link:CLI[Command-line interface]
* link:RSPEC[RSpec matchers]