canon 0.1.7 → 0.1.9

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (77)
  1. checksums.yaml +4 -4
  2. data/.rubocop_todo.yml +69 -92
  3. data/README.adoc +13 -13
  4. data/docs/.lycheeignore +69 -0
  5. data/docs/Gemfile +1 -0
  6. data/docs/_config.yml +90 -1
  7. data/docs/advanced/diff-classification.adoc +82 -2
  8. data/docs/advanced/extending-canon.adoc +193 -0
  9. data/docs/features/match-options/index.adoc +239 -1
  10. data/docs/internals/diffnode-enrichment.adoc +611 -0
  11. data/docs/internals/index.adoc +251 -0
  12. data/docs/lychee.toml +13 -6
  13. data/docs/understanding/architecture.adoc +749 -33
  14. data/docs/understanding/comparison-pipeline.adoc +122 -0
  15. data/lib/canon/cache.rb +129 -0
  16. data/lib/canon/comparison/dimensions/attribute_order_dimension.rb +68 -0
  17. data/lib/canon/comparison/dimensions/attribute_presence_dimension.rb +68 -0
  18. data/lib/canon/comparison/dimensions/attribute_values_dimension.rb +171 -0
  19. data/lib/canon/comparison/dimensions/base_dimension.rb +107 -0
  20. data/lib/canon/comparison/dimensions/comments_dimension.rb +121 -0
  21. data/lib/canon/comparison/dimensions/element_position_dimension.rb +90 -0
  22. data/lib/canon/comparison/dimensions/registry.rb +77 -0
  23. data/lib/canon/comparison/dimensions/structural_whitespace_dimension.rb +119 -0
  24. data/lib/canon/comparison/dimensions/text_content_dimension.rb +96 -0
  25. data/lib/canon/comparison/dimensions.rb +54 -0
  26. data/lib/canon/comparison/format_detector.rb +87 -0
  27. data/lib/canon/comparison/html_comparator.rb +70 -26
  28. data/lib/canon/comparison/html_compare_profile.rb +8 -2
  29. data/lib/canon/comparison/html_parser.rb +80 -0
  30. data/lib/canon/comparison/json_comparator.rb +12 -0
  31. data/lib/canon/comparison/json_parser.rb +19 -0
  32. data/lib/canon/comparison/markup_comparator.rb +293 -0
  33. data/lib/canon/comparison/match_options/base_resolver.rb +150 -0
  34. data/lib/canon/comparison/match_options/json_resolver.rb +82 -0
  35. data/lib/canon/comparison/match_options/xml_resolver.rb +151 -0
  36. data/lib/canon/comparison/match_options/yaml_resolver.rb +87 -0
  37. data/lib/canon/comparison/match_options.rb +68 -463
  38. data/lib/canon/comparison/profile_definition.rb +149 -0
  39. data/lib/canon/comparison/ruby_object_comparator.rb +180 -0
  40. data/lib/canon/comparison/strategies/semantic_tree_match_strategy.rb +7 -10
  41. data/lib/canon/comparison/whitespace_sensitivity.rb +208 -0
  42. data/lib/canon/comparison/xml_comparator/attribute_comparator.rb +177 -0
  43. data/lib/canon/comparison/xml_comparator/attribute_filter.rb +136 -0
  44. data/lib/canon/comparison/xml_comparator/child_comparison.rb +197 -0
  45. data/lib/canon/comparison/xml_comparator/diff_node_builder.rb +115 -0
  46. data/lib/canon/comparison/xml_comparator/namespace_comparator.rb +186 -0
  47. data/lib/canon/comparison/xml_comparator/node_parser.rb +79 -0
  48. data/lib/canon/comparison/xml_comparator/node_type_comparator.rb +102 -0
  49. data/lib/canon/comparison/xml_comparator.rb +97 -684
  50. data/lib/canon/comparison/xml_node_comparison.rb +319 -0
  51. data/lib/canon/comparison/xml_parser.rb +19 -0
  52. data/lib/canon/comparison/yaml_comparator.rb +3 -3
  53. data/lib/canon/comparison.rb +265 -110
  54. data/lib/canon/diff/diff_classifier.rb +101 -2
  55. data/lib/canon/diff/diff_node.rb +32 -2
  56. data/lib/canon/diff/formatting_detector.rb +1 -1
  57. data/lib/canon/diff/node_serializer.rb +191 -0
  58. data/lib/canon/diff/path_builder.rb +143 -0
  59. data/lib/canon/diff_formatter/by_line/base_formatter.rb +251 -0
  60. data/lib/canon/diff_formatter/by_line/html_formatter.rb +6 -248
  61. data/lib/canon/diff_formatter/by_line/xml_formatter.rb +38 -229
  62. data/lib/canon/diff_formatter/diff_detail_formatter/color_helper.rb +30 -0
  63. data/lib/canon/diff_formatter/diff_detail_formatter/dimension_formatter.rb +579 -0
  64. data/lib/canon/diff_formatter/diff_detail_formatter/location_extractor.rb +121 -0
  65. data/lib/canon/diff_formatter/diff_detail_formatter/node_utils.rb +253 -0
  66. data/lib/canon/diff_formatter/diff_detail_formatter/text_utils.rb +61 -0
  67. data/lib/canon/diff_formatter/diff_detail_formatter.rb +31 -1028
  68. data/lib/canon/diff_formatter.rb +1 -1
  69. data/lib/canon/rspec_matchers.rb +38 -9
  70. data/lib/canon/tree_diff/operation_converter.rb +92 -338
  71. data/lib/canon/tree_diff/operation_converter_helpers/metadata_enricher.rb +71 -0
  72. data/lib/canon/tree_diff/operation_converter_helpers/post_processor.rb +103 -0
  73. data/lib/canon/tree_diff/operation_converter_helpers/reason_builder.rb +168 -0
  74. data/lib/canon/tree_diff/operation_converter_helpers/update_change_handler.rb +188 -0
  75. data/lib/canon/version.rb +1 -1
  76. data/lib/canon/xml/data_model.rb +24 -13
  77. metadata +48 -2
@@ -194,19 +194,69 @@ See link:algorithms/[Algorithm documentation] for details.
194
194
 
195
195
  === Purpose
196
196
 
197
- Configure what to compare and how strictly. **This layer is algorithm-specific** - each algorithm interprets match options differently.
197
+ Configure what to compare and how strictly. **Match options are format-specific** - each format (XML, HTML, JSON, YAML) has its own set of dimensions based on its structure.
198
198
 
199
- === Match dimensions
199
+ === Key architectural principle
200
200
 
201
- Match dimensions are orthogonal aspects of documents that can be compared independently:
201
+ **Dimensions are format-specific, NOT algorithm-specific.**
202
202
 
203
- `text_content`:: Text within elements/values
203
+ The comparison architecture works as follows:
204
+
205
+ [cols="2,4,3"]
206
+ |===
207
+ |**Aspect** |**Description** |**Examples**
208
+
209
+ |**Format**
210
+ |Determines which dimensions exist
211
+ |XML has attributes, JSON has keys
212
+
213
+ |**Dimensions**
214
+ |WHAT to compare (format-specific)
215
+ |`text_content`, `attribute_values`, `key_order`
216
+
217
+ |**Profile**
218
+ |Configures dimension behaviors for a format
219
+ |`text_content: :normalize`, `comments: :ignore`
220
+
221
+ |**Algorithm**
222
+ |HOW nodes are matched (format-independent)
223
+ |DOM: position-based, Semantic: signature-based
224
+ |===
225
+
226
+ **Critical distinction:**
227
+
228
+ * **Format → Dimensions**: XML has `attribute_values`, JSON has `key_order`
229
+ * **Profile → Behaviors**: Configures HOW dimensions are compared (`:strict`, `:normalize`, `:ignore`)
230
+ * **Algorithm → Matching Strategy**: DOM (position) vs Semantic (signature) - works with ANY format
231
+
232
+ === Format-specific dimensions
233
+
234
+ Different formats have different dimensions based on their structure:
235
+
236
+ **XML/HTML dimensions:**
237
+
238
+ `text_content`:: Text within elements
204
239
  `structural_whitespace`:: Whitespace between elements
205
- `attribute_whitespace`:: Whitespace in attribute values (XML/HTML)
206
- `attribute_order`:: Order of attributes (XML/HTML)
207
- `attribute_values`:: Attribute value content (XML/HTML)
208
- `key_order`:: Order of object keys (JSON/YAML)
209
- `comments`:: Comment content and placement
240
+ `attribute_presence`:: Which attributes exist
241
+ `attribute_order`:: Order of attributes
242
+ `attribute_values`:: Attribute value content
243
+ `element_position`:: Position in tree
244
+ `comments`:: Comment nodes
245
+
246
+ **JSON dimensions:**
247
+
248
+ `text_content`:: Value text
249
+ `structural_whitespace`:: Whitespace
250
+ `key_order`:: Order of object keys
251
+
252
+ **YAML dimensions:**
253
+
254
+ `text_content`:: Value text
255
+ `structural_whitespace`:: Whitespace
256
+ `key_order`:: Order of keys
257
+ `comments`:: Comments
258
+
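The per-format dimension lists above can be read as a lookup table. The following is a minimal sketch of that idea, not Canon's implementation: `FORMAT_DIMENSIONS_SKETCH` and `unknown_dimensions` are hypothetical names introduced here, and the `preprocessing` setting that appears in the profile tables is deliberately left out because it is not listed as a dimension.

```ruby
# Hypothetical lookup table transcribed from the dimension lists in this diff.
FORMAT_DIMENSIONS_SKETCH = {
  xml:  %i[text_content structural_whitespace attribute_presence
           attribute_order attribute_values element_position comments],
  html: %i[text_content structural_whitespace attribute_presence
           attribute_order attribute_values element_position comments],
  json: %i[text_content structural_whitespace key_order],
  yaml: %i[text_content structural_whitespace key_order comments]
}.freeze

# Returns the keys of a profile hash that are not dimensions of the format.
# Raises KeyError for a format that is not in the table.
def unknown_dimensions(format, profile)
  profile.keys - FORMAT_DIMENSIONS_SKETCH.fetch(format)
end

html_profile = { text_content: :normalize, comments: :ignore }
json_profile = { key_order: :ignore, attribute_values: :strict }

p unknown_dimensions(:html, html_profile) # => []
p unknown_dimensions(:json, json_profile) # => [:attribute_values]
```

A check of this kind is one way a format-specific profile could reject `attribute_values` for JSON while accepting it for XML or HTML.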
259
+ === Dimension behaviors
210
260
 
211
261
  Each dimension supports behaviors:
212
262
 
@@ -216,59 +266,507 @@ Each dimension supports behaviors:
216
266
 
217
267
  === Match profiles
218
268
 
219
- Profiles are predefined combinations of dimension settings for common scenarios:
269
+ Profiles are predefined combinations of dimension settings for common scenarios.
220
270
 
221
- `:strict`:: Exact matching - all dimensions use `:strict` behavior
222
- `:rendered`:: Browser rendering - ignores formatting that doesn't affect display
223
- `:spec_friendly`:: Test-friendly - ignores formatting, focuses on content
224
- `:content_only`:: Maximum tolerance - only semantic content matters
271
+ **Important**: Profiles are **format-specific**. Each format (Xml, Html, Json, Yaml) has its own set of profiles configured for its dimensions.
225
272
 
226
- === Algorithm-specific behavior
273
+ See link:#available-preset-profiles[Available Preset Profiles] for complete profile reference.
227
274
 
228
- **Critical**: The same match options behave differently with each algorithm!
275
+ === Available preset profiles
229
276
 
230
- * **DOM algorithm**: Uses options for positional element comparison
231
- * **Semantic algorithm**: Uses options during signature calculation
277
+ Canon provides preset profiles optimized for different comparison scenarios. Each format has its own set of profiles with appropriate dimension configurations.
232
278
 
233
- See link:../features/match-options/algorithm-specific-behavior.adoc[Algorithm-Specific Behavior] for detailed comparison.
279
+ ==== XML/HTML profiles
234
280
 
235
- === Usage
281
+ **Profile: `:strict`**
282
+
283
+ Exact matching - all dimensions use `:strict` behavior (XML default).
284
+
285
+ [cols="2,2,4"]
286
+ |===
287
+ |Dimension |Behavior |Description
288
+
289
+ |preprocessing
290
+ |`:none`
291
+ |No preprocessing - compare as-is
292
+
293
+ |text_content
294
+ |`:strict`
295
+ |Must match exactly
296
+
297
+ |structural_whitespace
298
+ |`:strict`
299
+ |Whitespace must match exactly
300
+
301
+ |attribute_presence
302
+ |`:strict`
303
+ |All attributes must be present
304
+
305
+ |attribute_order
306
+ |`:strict`
307
+ |Attribute order must match
308
+
309
+ |attribute_values
310
+ |`:strict`
311
+ |Attribute values must match exactly
312
+
313
+ |element_position
314
+ |`:strict`
315
+ |Element positions must match
316
+
317
+ |comments
318
+ |`:strict`
319
+ |Comments must match exactly
320
+ |===
321
+
322
+ **Use when**: You need exact byte-for-byte matching (e.g., validating serialization).
323
+
324
+ **Profile: `:rendered`**
325
+
326
+ Browser rendering - ignores formatting that doesn't affect display (HTML default).
327
+
328
+ [cols="2,2,4"]
329
+ |===
330
+ |Dimension |Behavior |Description
331
+
332
+ |preprocessing
333
+ |`:none`
334
+ |No preprocessing
335
+
336
+ |text_content
337
+ |`:normalize`
338
+ |Normalize text (collapse whitespace)
339
+
340
+ |structural_whitespace
341
+ |`:normalize`
342
+ |Normalize whitespace
343
+
344
+ |attribute_presence
345
+ |`:strict`
346
+ |All attributes must be present
347
+
348
+ |attribute_order
349
+ |`:strict`
350
+ |Attribute order must match
351
+
352
+ |attribute_values
353
+ |`:strict`
354
+ |Attribute values must match exactly
355
+
356
+ |element_position
357
+ |`:strict`
358
+ |Element positions must match
359
+
360
+ |comments
361
+ |`:ignore`
362
+ |Comments are ignored
363
+ |===
364
+
365
+ **Use when**: You care about what the browser displays, not source formatting.
366
+
367
+ **Profile: `:html4`**
368
+
369
+ HTML4 rendered output - HTML4 normalizes attribute whitespace.
370
+
371
+ [cols="2,2,4"]
372
+ |===
373
+ |Dimension |Behavior |Description
374
+
375
+ |preprocessing
376
+ |`:rendered`
377
+ |Rendered HTML preprocessing
378
+
379
+ |text_content
380
+ |`:normalize`
381
+ |Normalize text
382
+
383
+ |structural_whitespace
384
+ |`:normalize`
385
+ |Normalize whitespace
386
+
387
+ |attribute_presence
388
+ |`:strict`
389
+ |All attributes must be present
390
+
391
+ |attribute_order
392
+ |`:strict`
393
+ |Attribute order must match
394
+
395
+ |attribute_values
396
+ |`:normalize`
397
+ |Normalize attribute values
398
+
399
+ |element_position
400
+ |`:ignore`
401
+ |Element position doesn't matter
402
+
403
+ |comments
404
+ |`:ignore`
405
+ |Comments are ignored
406
+ |===
407
+
408
+ **Use when**: Testing HTML4 output where attribute whitespace may vary.
409
+
410
+ **Profile: `:html5`**
411
+
412
+ HTML5 rendered output - same as `:rendered`.
413
+
414
+ [cols="2,2,4"]
415
+ |===
416
+ |Dimension |Behavior |Description
417
+
418
+ |preprocessing
419
+ |`:rendered`
420
+ |Rendered HTML preprocessing
421
+
422
+ |text_content
423
+ |`:normalize`
424
+ |Normalize text
425
+
426
+ |structural_whitespace
427
+ |`:normalize`
428
+ |Normalize whitespace
429
+
430
+ |attribute_presence
431
+ |`:strict`
432
+ |All attributes must be present
433
+
434
+ |attribute_order
435
+ |`:strict`
436
+ |Attribute order must match
437
+
438
+ |attribute_values
439
+ |`:strict`
440
+ |Attribute values must match exactly
441
+
442
+ |element_position
443
+ |`:ignore`
444
+ |Element position doesn't matter
445
+
446
+ |comments
447
+ |`:ignore`
448
+ |Comments are ignored
449
+ |===
450
+
451
+ **Use when**: Testing HTML5 output.
452
+
453
+ **Profile: `:spec_friendly`**
454
+
455
+ Test-friendly - ignores formatting, focuses on content.
456
+
457
+ [cols="2,2,4"]
458
+ |===
459
+ |Dimension |Behavior |Description
460
+
461
+ |preprocessing
462
+ |`:rendered`
463
+ |Rendered HTML preprocessing
464
+
465
+ |text_content
466
+ |`:normalize`
467
+ |Normalize text
468
+
469
+ |structural_whitespace
470
+ |`:ignore`
471
+ |Whitespace ignored
472
+
473
+ |attribute_presence
474
+ |`:strict`
475
+ |All attributes must be present
476
+
477
+ |attribute_order
478
+ |`:ignore`
479
+ |Attribute order ignored
480
+
481
+ |attribute_values
482
+ |`:normalize`
483
+ |Normalize attribute values
484
+
485
+ |element_position
486
+ |`:ignore`
487
+ |Element position ignored
488
+
489
+ |comments
490
+ |`:ignore`
491
+ |Comments ignored
492
+ |===
493
+
494
+ **Use when**: Writing tests where formatting changes are acceptable.
495
+
496
+ **Profile: `:content_only`**
497
+
498
+ Maximum tolerance - only semantic content matters.
499
+
500
+ [cols="2,2,4"]
501
+ |===
502
+ |Dimension |Behavior |Description
503
+
504
+ |preprocessing
505
+ |`:c14n`
506
+ |Canonical XML preprocessing
507
+
508
+ |text_content
509
+ |`:normalize`
510
+ |Normalize text
511
+
512
+ |structural_whitespace
513
+ |`:ignore`
514
+ |Whitespace ignored
515
+
516
+ |attribute_presence
517
+ |`:strict`
518
+ |All attributes must be present
519
+
520
+ |attribute_order
521
+ |`:ignore`
522
+ |Attribute order ignored
523
+
524
+ |attribute_values
525
+ |`:normalize`
526
+ |Normalize attribute values
527
+
528
+ |element_position
529
+ |`:ignore`
530
+ |Element position ignored
531
+
532
+ |comments
533
+ |`:ignore`
534
+ |Comments ignored
535
+ |===
536
+
537
+ **Use when**: You only care about semantic content, not structure or formatting.
538
+
539
+ ==== JSON profiles
540
+
541
+ JSON has 3 preset profiles: `:strict`, `:spec_friendly`, and `:content_only`.
542
+
543
+ [cols="2,2,2,2"]
544
+ |===
545
+ |Dimension |`:strict` |`:spec_friendly` |`:content_only`
546
+
547
+ |preprocessing
548
+ |`:none`
549
+ |`:normalize`
550
+ |`:normalize`
551
+
552
+ |text_content
553
+ |`:strict`
554
+ |`:strict`
555
+ |`:normalize`
556
+
557
+ |structural_whitespace
558
+ |`:strict`
559
+ |`:ignore`
560
+ |`:ignore`
561
+
562
+ |key_order
563
+ |`:strict`
564
+ |`:ignore`
565
+ |`:ignore`
566
+ |===
567
+
568
+ **Use cases**:
569
+
570
+ * `:strict` - Exact JSON matching (order-sensitive)
571
+ * `:spec_friendly` - Order-independent JSON comparison
572
+ * `:content_only` - Normalized values, order and formatting ignored
573
+
574
+ ==== YAML profiles
575
+
576
+ YAML has 3 preset profiles: `:strict`, `:spec_friendly`, and `:content_only`.
577
+
578
+ [cols="2,2,2,2"]
579
+ |===
580
+ |Dimension |`:strict` |`:spec_friendly` |`:content_only`
581
+
582
+ |preprocessing
583
+ |`:none`
584
+ |`:normalize`
585
+ |`:normalize`
586
+
587
+ |text_content
588
+ |`:strict`
589
+ |`:strict`
590
+ |`:normalize`
591
+
592
+ |structural_whitespace
593
+ |`:strict`
594
+ |`:ignore`
595
+ |`:ignore`
596
+
597
+ |key_order
598
+ |`:strict`
599
+ |`:ignore`
600
+ |`:ignore`
601
+
602
+ |comments
603
+ |`:strict`
604
+ |`:ignore`
605
+ |`:ignore`
606
+ |===
607
+
608
+ **Use cases**:
609
+
610
+ * `:strict` - Exact YAML matching (order and comments matter)
611
+ * `:spec_friendly` - Order-independent, comments ignored
612
+ * `:content_only` - Maximum tolerance, only values matter
613
+
614
+ === Customizing profiles
615
+
616
+ Canon provides two ways to customize comparison behavior: inline custom profiles and named custom profiles.
617
+
618
+ ==== Inline custom profiles
619
+
620
+ For one-off comparisons, pass a Hash directly to the `profile` parameter:
236
621
 
237
- .With dimensions
238
- [example]
239
- ====
240
622
  [source,ruby]
241
623
  ----
242
- Canon::Comparison.equivalent?(doc1, doc2,
243
- match: {
624
+ Canon::Comparison.equivalent?(html1, html2,
625
+ profile: {
244
626
  text_content: :normalize,
245
627
  structural_whitespace: :ignore,
246
628
  comments: :ignore
247
629
  }
248
630
  )
249
631
  ----
250
- ====
251
632
 
252
- .With profile
633
+ **Validation**: Inline profiles are validated at comparison time. Invalid dimensions or behaviors will raise a `Canon::Error`.
634
+
635
+ [source,ruby]
636
+ ----
637
+ # This raises Canon::Error
638
+ Canon::Comparison.equivalent?(html1, html2,
639
+ profile: {
640
+ unknown_dimension: :strict # => Error: Unknown dimension: unknown_dimension
641
+ }
642
+ )
643
+ ----
644
+
645
+ ==== Named custom profiles (Profile DSL)
646
+
647
+ For reusable custom profiles, define them using the Profile DSL:
648
+
649
+ [source,ruby]
650
+ ----
651
+ # Define a custom profile
652
+ Canon::Comparison.define_profile(:content_focused) do
653
+ text_content :normalize
654
+ comments :ignore
655
+ structural_whitespace :ignore
656
+ attribute_values :normalize
657
+ preprocessing :rendered
658
+ end
659
+
660
+ # Use the custom profile
661
+ Canon::Comparison.equivalent?(html1, html2, profile: :content_focused)
662
+
663
+ # List all available profiles (includes custom profiles)
664
+ Canon::Comparison.available_profiles
665
+ # => [:strict, :rendered, :html4, :html5, :spec_friendly, :content_only, :content_focused]
666
+ ----
667
+
668
+ **Available DSL methods**:
669
+
670
+ * `text_content` - Text within elements
671
+ * `structural_whitespace` - Whitespace between elements
672
+ * `attribute_presence` - Which attributes exist
673
+ * `attribute_order` - Order of attributes
674
+ * `attribute_values` - Attribute value content
675
+ * `element_position` - Position of elements
676
+ * `comments` - Comment content and placement
677
+ * `preprocessing` - Preprocessing option (`:none`, `:c14n`, `:normalize`, `:format`, `:rendered`)
678
+
679
+ **Behaviors for each dimension**:
680
+
681
+ * `:strict` - Must match exactly
682
+ * `:normalize` - Match after normalization
683
+ * `:ignore` - Don't compare
684
+ * `:strip` - (attribute_values only) Strip leading/trailing whitespace
685
+ * `:compact` - (attribute_values only) Collapse internal whitespace
686
+
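To make the behavior list above concrete, here is a small sketch of how the five behaviors might apply to a pair of attribute values. It is illustrative only: `attribute_values_match?` is a hypothetical helper, and the exact normalization Canon performs is not shown in this diff, so `:normalize` is assumed here to mean "strip, then collapse whitespace runs".

```ruby
# Hypothetical helper: compares two attribute value strings under a behavior.
def attribute_values_match?(value1, value2, behavior)
  case behavior
  when :ignore
    true                                   # dimension is not compared
  when :strict
    value1 == value2                       # must match exactly
  when :strip
    value1.strip == value2.strip           # leading/trailing whitespace removed
  when :compact
    value1.gsub(/\s+/, ' ') == value2.gsub(/\s+/, ' ') # internal runs collapsed
  when :normalize
    # Assumption: normalization is strip plus whitespace collapse.
    value1.strip.gsub(/\s+/, ' ') == value2.strip.gsub(/\s+/, ' ')
  else
    raise ArgumentError, "Invalid behavior '#{behavior}'"
  end
end

p attribute_values_match?('btn  primary', 'btn primary', :strict)      # => false
p attribute_values_match?('btn  primary', 'btn primary', :compact)     # => true
p attribute_values_match?(' btn ', 'btn', :strip)                      # => true
p attribute_values_match?('  btn   primary ', 'btn primary', :normalize) # => true
```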
687
+ **Validation at definition time**:
688
+
689
+ The Profile DSL validates immediately when you define the profile:
690
+
691
+ [source,ruby]
692
+ ----
693
+ # This raises Canon::Error at definition time
694
+ Canon::Comparison.define_profile(:invalid) do
695
+ unknown_dimension :strict # => Error: Unknown dimension: unknown_dimension
696
+ text_content :invalid_behavior # => Error: Invalid behavior 'invalid_behavior'
697
+ end
698
+ ----
699
+
700
+ This prevents invalid profiles from ever being used in comparisons.
701
+
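Definition-time validation of this kind is commonly built with `instance_eval` and `method_missing`. The sketch below shows the general technique under stated assumptions: `ProfileBuilderSketch`, `PROFILES_SKETCH`, and `define_profile_sketch` are hypothetical names, only three behaviors are accepted, and `ArgumentError` stands in for the `Canon::Error` mentioned above. It does not reproduce Canon's actual `ProfileDefinition` class.

```ruby
# Hypothetical builder that records dimension settings and rejects bad input.
class ProfileBuilderSketch
  DIMENSIONS = %i[text_content structural_whitespace attribute_presence
                  attribute_order attribute_values element_position comments].freeze
  BEHAVIORS = %i[strict normalize ignore].freeze

  attr_reader :settings

  def initialize
    @settings = {}
  end

  # One DSL method per known dimension; the behavior is validated immediately.
  DIMENSIONS.each do |dimension|
    define_method(dimension) do |behavior|
      unless BEHAVIORS.include?(behavior)
        raise ArgumentError, "Invalid behavior '#{behavior}'"
      end

      @settings[dimension] = behavior
    end
  end

  # Any other method call inside the block is treated as an unknown dimension.
  def method_missing(name, *_args)
    raise ArgumentError, "Unknown dimension: #{name}"
  end
end

PROFILES_SKETCH = {}

# The profile is registered only if the whole block evaluates without error.
def define_profile_sketch(name, &block)
  builder = ProfileBuilderSketch.new
  builder.instance_eval(&block)
  PROFILES_SKETCH[name] = builder.settings.freeze
end

define_profile_sketch(:content_focused) do
  text_content :normalize
  comments :ignore
end

p PROFILES_SKETCH[:content_focused]
```

Because the block is evaluated before the registry is touched, a profile with a typo in a dimension name never becomes available to later comparisons.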
702
+ **Removing custom profiles**:
703
+
704
+ [source,ruby]
705
+ ----
706
+ # Remove a custom profile
707
+ Canon::Comparison.remove_profile(:content_focused)
708
+ ----
709
+
710
+ **Profile best practices**:
711
+
712
+ * Use preset profiles when possible - they're well-tested and documented
713
+ * Name custom profiles descriptively (e.g., `:content_focused`, `:seo_test`)
714
+ * Define profiles at application startup, not during request handling
715
+ * Document why a custom profile is needed in comments
716
+
717
+ === Algorithm interaction with match options
718
+
719
+ Both algorithms (DOM and Semantic) work with ALL formats. The algorithm determines **HOW nodes are matched**, not **WHAT is compared**:
720
+
721
+ * **DOM algorithm**: Position-based matching (element at position 0 matches element at position 0)
722
+ * **Semantic algorithm**: Signature-based matching (nodes with similar signatures match)
723
+
724
+ Once nodes are matched, both algorithms use the **same dimension comparisons** configured by the profile.
725
+
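The difference between the two matching strategies can be illustrated on a toy list of sibling elements. This is a simplified sketch, not either of Canon's algorithms: `ElementSketch`, `signature`, `match_by_position`, and `match_by_signature` are hypothetical, and "similar signatures" is reduced to "equal name and id" for brevity.

```ruby
ElementSketch = Struct.new(:name, :id)

# Assumed signature: element name plus id attribute.
def signature(node)
  [node.name, node.id]
end

# Position-based: the element at index i is paired with the element at index i.
def match_by_position(left, right)
  left.zip(right)
end

# Signature-based: each left element is paired with the first unused right
# element that has the same signature, or nil when none is found.
def match_by_signature(left, right)
  remaining = right.dup
  left.map do |node|
    index = remaining.index { |candidate| signature(candidate) == signature(node) }
    [node, index ? remaining.delete_at(index) : nil]
  end
end

before = [ElementSketch.new('p', 'intro'), ElementSketch.new('p', 'summary')]
after  = [ElementSketch.new('p', 'summary'), ElementSketch.new('p', 'intro')]

match_by_position(before, after).each { |a, b| puts "#{a.id} <-> #{b.id}" }
match_by_signature(before, after).each { |a, b| puts "#{a.id} <-> #{b&.id}" }
```

With reordered siblings, the positional pairing compares `intro` against `summary`, while the signature pairing compares each element against its counterpart. In both cases the dimension comparisons that run on each pair afterwards are the same.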
726
+ === Usage
727
+
728
+ .Using the new unified `profile` parameter
253
729
  [example]
254
730
  ====
255
731
  [source,ruby]
256
732
  ----
733
+ # Using a preset profile
257
734
  Canon::Comparison.equivalent?(doc1, doc2,
258
- match_profile: :spec_friendly
735
+ profile: :spec_friendly
736
+ )
737
+
738
+ # Using an inline custom profile
739
+ Canon::Comparison.equivalent?(doc1, doc2,
740
+ profile: {
741
+ text_content: :normalize,
742
+ structural_whitespace: :ignore,
743
+ comments: :ignore
744
+ }
745
+ )
746
+
747
+ # Defining and using a custom profile
748
+ Canon::Comparison.define_profile(:my_custom) do
749
+ text_content :normalize
750
+ comments :ignore
751
+ preprocessing :rendered
752
+ end
753
+
754
+ Canon::Comparison.equivalent?(doc1, doc2,
755
+ profile: :my_custom
259
756
  )
260
757
  ----
261
758
  ====
262
759
 
263
- .Profile with dimension overrides
760
+ .Using dimensions (deprecated - use profile instead)
264
761
  [example]
265
762
  ====
266
763
  [source,ruby]
267
764
  ----
268
765
  Canon::Comparison.equivalent?(doc1, doc2,
269
- match_profile: :spec_friendly,
270
766
  match: {
271
- comments: :strict # Override profile setting
767
+ text_content: :normalize,
768
+ structural_whitespace: :ignore,
769
+ comments: :ignore
272
770
  }
273
771
  )
274
772
  ----
@@ -356,8 +854,8 @@ result = Canon::Comparison.equivalent?(doc1, doc2,
356
854
  # Layer 2: Algorithm
357
855
  diff_algorithm: :semantic,
358
856
 
359
- # Layer 3: Match Options
360
- match_profile: :spec_friendly,
857
+ # Layer 3: Match Options (new unified profile API)
858
+ profile: :spec_friendly,
361
859
 
362
860
  # Layer 4: Diff Formatting
363
861
  verbose: true,
@@ -369,6 +867,121 @@ result = Canon::Comparison.equivalent?(doc1, doc2,
369
867
 
370
868
  See link:comparison-pipeline.adoc[Comparison Pipeline] for layer-by-layer examples.
371
869
 
870
+ === DiffNode: Representation of differences
871
+
872
+ ==== Purpose
873
+
874
+ `DiffNode` objects represent individual differences between documents. Each DiffNode carries complete information about what changed, where it changed, and how to display it.
875
+
876
+ ==== DiffNode structure
877
+
878
+ [source,ruby]
879
+ ----
880
+ class DiffNode
881
+ # Core properties
882
+ attr_reader :node1, :node2 # Raw node references
883
+ attr_accessor :dimension, :reason # What changed and why
884
+ attr_accessor :normative, :formatting # Classification
885
+
886
+ # Location and display information
887
+ attr_accessor :path # Canonical path with ordinal indices
888
+ attr_accessor :serialized_before # Serialized "before" content
889
+ attr_accessor :serialized_after # Serialized "after" content
890
+ attr_accessor :attributes_before # Normalized "before" attributes
891
+ attr_accessor :attributes_after # Normalized "after" attributes
892
+ end
893
+ ----
894
+
895
+ ===== Properties explained
896
+
897
+ **Core properties**:
898
+
899
+ `node1, node2`:: Raw node references from the original documents
900
+
901
+ `dimension`:: What type of difference (`:text_content`, `:attribute_values`, `:element_structure`, etc.)
902
+
903
+ `reason`:: Human-readable explanation of the difference
904
+
905
+ `normative`:: Whether this difference affects semantic equivalence (true) or is just formatting (false)
906
+
907
+ `formatting`:: Whether this is a purely cosmetic whitespace difference
908
+
909
+ **Location and display properties**:
910
+
911
+ `path`:: Canonical XPath-like path with ordinal indices that uniquely identifies the node location (e.g., `/#document/div[0]/body[0]/p[1]/span[2]`)
912
+
913
+ `serialized_before`:: Serialized content of the "before" state captured at comparison time
914
+
915
+ `serialized_after`:: Serialized content of the "after" state captured at comparison time
916
+
917
+ `attributes_before`:: Normalized attribute hash from the "before" state
918
+
919
+ `attributes_after`:: Normalized attribute hash from the "after" state
920
+
921
+ ==== Using DiffNode in verbose output
922
+
923
+ When you enable verbose mode, Canon returns a `ComparisonResult` containing DiffNode objects:
924
+
925
+ [source,ruby]
926
+ ----
927
+ result = Canon::Comparison.equivalent?(doc1, doc2, verbose: true)
928
+
929
+ # Access individual differences
930
+ result.differences.each do |diff|
931
+ puts "Location: #{diff.path}"
932
+ puts "Dimension: #{diff.dimension}"
933
+ puts "Reason: #{diff.reason}"
934
+ puts "Normative: #{diff.normative?}"
935
+ end
936
+ ----
937
+
938
+ ==== Canonical paths with ordinal indices
939
+
940
+ DiffNode paths use ordinal indices to uniquely identify nodes. Instead of ambiguous paths like:
941
+
942
+ [source,text]
943
+ ----
944
+ /#document-fragment/div/p/span/span
945
+ ----
946
+
947
+ Canon generates precise paths like:
948
+
949
+ [source,text]
950
+ ----
951
+ /#document-fragment/div[0]/p[1]/span[2]/span[0]
952
+ ----
953
+
954
+ This tells you exactly which element changed:
955
+ * `div[0]` - First div element
956
+ * `p[1]` - Second paragraph element
957
+ * `span[2]` - Third span element
958
+ * `span[0]` - First nested span element
959
+
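A path of this shape can be produced by walking from a node up to the root and counting same-named siblings. The sketch below assumes that the ordinal is a zero-based index among siblings with the same name, which is what the bullet list above suggests; `TreeNodeSketch` and `ordinal_path` are hypothetical and are not Canon's `PathBuilder`.

```ruby
# Minimal tree node with a parent pointer, for illustration only.
class TreeNodeSketch
  attr_reader :name, :parent, :children

  def initialize(name, parent = nil)
    @name = name
    @parent = parent
    @children = []
    parent.children << self if parent
  end
end

# Builds "/root/name[i]/..." where i counts same-named siblings from zero.
def ordinal_path(node)
  segments = []
  while node
    if node.parent
      same_name = node.parent.children.select { |sibling| sibling.name == node.name }
      index = same_name.index { |sibling| sibling.equal?(node) }
      segments.unshift("#{node.name}[#{index}]")
    else
      segments.unshift(node.name) # the root carries no index
    end
    node = node.parent
  end
  "/#{segments.join('/')}"
end

root = TreeNodeSketch.new('#document-fragment')
div = TreeNodeSketch.new('div', root)
TreeNodeSketch.new('p', div)
second_p = TreeNodeSketch.new('p', div)
TreeNodeSketch.new('span', second_p)
TreeNodeSketch.new('span', second_p)
third_span = TreeNodeSketch.new('span', second_p)
nested = TreeNodeSketch.new('span', third_span)

puts ordinal_path(nested)
# prints /#document-fragment/div[0]/p[1]/span[2]/span[0]
```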
960
+ ==== Enriched metadata in diff output
961
+
962
+ Layer 4 (diff formatting) uses enriched metadata to display accurate before/after content:
963
+
964
+ [source,text]
965
+ ----
966
+ 🔍 DIFFERENCE #1/3 [NORMATIVE]
967
+ ════════════════════════════════════════════════════════════════════════
968
+ Dimension: element_structure
969
+ Location: /#document/div[0]/body[0]/p[1]/span[2]
970
+
971
+ ⊖ Expected (File 1):
972
+ (not present)
973
+
974
+ ⊕ Actual (File 2):
975
+ <span id="new-element">Added content</span>
976
+
977
+ ✨ Changes:
978
+ Element inserted
979
+ ----
980
+
981
+ The `Location` field shows the enriched path, and the before/after content uses `serialized_before` and `serialized_after` to ensure accurate display.
982
+
983
+ See link:../internals/[Internals] for implementation details on PathBuilder, NodeSerializer, and how metadata flows through the comparison layers.
984
+
372
985
  == Configuration precedence
373
986
 
374
987
  When options are specified in multiple places, Canon resolves them using this hierarchy (highest to lowest priority):
@@ -432,10 +1045,113 @@ expect(actual).to be_xml_equivalent_to(expected)
432
1045
 
433
1046
  **Extensibility**:: Easy to add new preprocessing, algorithms, dimensions, or rendering modes
434
1047
 
1048
+ == Profile DSL and Dimension System
1049
+
1050
+ === Overview
1051
+
1052
+ Canon 2.0 introduces a Profile DSL and Dimension system for cleaner, more maintainable comparison configuration:
1053
+
1054
+ * **Profile DSL** - Define custom comparison profiles with validation
1055
+ * **Dimension Classes** - Object-oriented dimension handling with reusable behaviors
1056
+
1057
+ === Profile DSL
1058
+
1059
+ The Profile DSL provides a clean, validated way to define custom comparison profiles:
1060
+
1061
+ [source,ruby]
1062
+ ----
1063
+ # Define a custom profile
1064
+ Canon::Comparison.define_profile(:content_focused) do
1065
+ text_content :normalize
1066
+ comments :ignore
1067
+ structural_whitespace :ignore
1068
+ attribute_values :normalize
1069
+ preprocessing :rendered
1070
+ end
1071
+
1072
+ # Use the custom profile
1073
+ Canon::Comparison.equivalent?(html1, html2, profile: :content_focused)
1074
+
1075
+ # List all available profiles
1076
+ Canon::Comparison.available_profiles
1077
+ # => [:strict, :rendered, :html4, :html5, :spec_friendly, :content_only, :content_focused]
1078
+ ----
1079
+
1080
+ **Available dimensions**:
1081
+
1082
+ * `text_content` - Text within elements/values
1083
+ * `structural_whitespace` - Whitespace between elements
1084
+ * `attribute_presence` - Which attributes exist
1085
+ * `attribute_order` - Order of attributes
1086
+ * `attribute_values` - Attribute value content
1087
+ * `element_position` - Position of elements
1088
+ * `comments` - Comment content and placement
1089
+
1090
+ **Behaviors for each dimension**:
1091
+
1092
+ * `:strict` - Must match exactly
1093
+ * `:normalize` - Match after normalization
1094
+ * `:ignore` - Don't compare
1095
+
1096
+ **Validation**:
1097
+
1098
+ The Profile DSL validates at definition time:
1099
+
1100
+ [source,ruby]
1101
+ ----
1102
+ # This raises an error at definition time
1103
+ Canon::Comparison.define_profile(:invalid) do
1104
+ unknown_dimension :strict # => Error: Unknown dimension: unknown_dimension
1105
+ text_content :invalid_behavior # => Error: Invalid behavior 'invalid_behavior'
1106
+ end
1107
+ ----
1108
+
1109
+ === Dimension Classes
1110
+
1111
+ Behind the scenes, Canon uses dimension classes that encapsulate comparison logic:
1112
+
1113
+ [source,ruby]
1114
+ ----
1115
+ # Each dimension knows how to extract and compare data
1116
+ dimension = Canon::Comparison::Dimensions::Registry.get(:text_content)
1117
+
1118
+ # Extract data from a node
1119
+ text = dimension.extract_data(node)
1120
+
1121
+ # Compare according to behavior
1122
+ dimension.equivalent?(node1, node2, :normalize)
1123
+ ----
1124
+
1125
+ **Available dimension classes**:
1126
+
1127
+ * `TextContentDimension` - Text content comparison
1128
+ * `CommentsDimension` - Comment comparison
1129
+ * `AttributeValuesDimension` - Attribute values comparison
1130
+ * `AttributePresenceDimension` - Attribute presence comparison
1131
+ * `AttributeOrderDimension` - Attribute order comparison
1132
+ * `ElementPositionDimension` - Element position comparison
1133
+ * `StructuralWhitespaceDimension` - Structural whitespace comparison
1134
+
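The registry-plus-dimension-class arrangement described above follows a familiar pattern: a base class owns the behavior dispatch and each subclass supplies extraction and normalization. The following sketch shows that pattern with plain hashes standing in for parsed nodes. All class and module names carry a `Sketch` suffix to mark them as hypothetical; the real classes under `Canon::Comparison::Dimensions` may differ in signatures and behavior.

```ruby
# Base class: subclasses say what to extract and how to normalize it.
class BaseDimensionSketch
  def extract_data(_node)
    raise NotImplementedError, "#{self.class} must implement extract_data"
  end

  def normalize(data)
    data
  end

  def equivalent?(node1, node2, behavior)
    case behavior
    when :ignore then true
    when :strict then extract_data(node1) == extract_data(node2)
    when :normalize
      normalize(extract_data(node1)) == normalize(extract_data(node2))
    else
      raise ArgumentError, "Invalid behavior '#{behavior}'"
    end
  end
end

class TextContentDimensionSketch < BaseDimensionSketch
  def extract_data(node)
    node[:text].to_s
  end

  # Assumption: text normalization strips and collapses whitespace.
  def normalize(data)
    data.strip.gsub(/\s+/, ' ')
  end
end

class CommentsDimensionSketch < BaseDimensionSketch
  def extract_data(node)
    node.fetch(:comments, [])
  end

  def normalize(data)
    data.map(&:strip)
  end
end

module DimensionRegistrySketch
  DIMENSIONS = {
    text_content: TextContentDimensionSketch.new,
    comments: CommentsDimensionSketch.new
  }.freeze

  def self.get(name)
    DIMENSIONS.fetch(name) { raise ArgumentError, "Unknown dimension: #{name}" }
  end
end

text = DimensionRegistrySketch.get(:text_content)
node1 = { text: 'Hello   world' }
node2 = { text: ' Hello world ' }

p text.equivalent?(node1, node2, :strict)    # => false
p text.equivalent?(node1, node2, :normalize) # => true
```

Keeping the behavior dispatch in the base class means a new dimension only has to define what it extracts and, optionally, how that data is normalized.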
1135
+ === Refactored Module Structure
1136
+
1137
+ Canon's internal modules have been reorganized for better separation of concerns:
1138
+
1139
+ * **XmlComparatorHelpers** - Node parsing, attribute comparison, namespace comparison
1140
+ * **DiffDetailFormatterHelpers** - Location extraction, node utilities, text utilities, dimension formatting
1141
+ * **Dimensions** - Reusable dimension classes for comparison
1142
+
1143
+ This refactoring improves:
1144
+ - **Maintainability** - Each module has a single responsibility
1145
+ - **Testability** - Modules can be tested independently
1146
+ - **Extensibility** - New dimensions/formatters can be added easily
1147
+ - **Code organization** - Related functionality is grouped together
1148
+
435
1149
  == See also
436
1150
 
437
1151
  * link:comparison-pipeline.adoc[Comparison Pipeline] - Complete 4-layer walkthrough
438
1152
  * link:algorithms/[Algorithms] - DOM and Semantic algorithm details
1153
+ * link:../internals/[Internals] - Implementation details and data structures
1154
+ * link:../internals/diffnode-enrichment.adoc[DiffNode Enrichment] - How metadata flows from Layer 2 to Layer 4
439
1155
  * link:../features/preprocessing/[Preprocessing options]
440
1156
  * link:../features/match-options/[Match dimensions and profiles]
441
1157
  * link:../features/match-options/algorithm-specific-behavior.adoc[Algorithm-Specific Behavior]