canon 0.1.7 → 0.1.9
- checksums.yaml +4 -4
- data/.rubocop_todo.yml +69 -92
- data/README.adoc +13 -13
- data/docs/.lycheeignore +69 -0
- data/docs/Gemfile +1 -0
- data/docs/_config.yml +90 -1
- data/docs/advanced/diff-classification.adoc +82 -2
- data/docs/advanced/extending-canon.adoc +193 -0
- data/docs/features/match-options/index.adoc +239 -1
- data/docs/internals/diffnode-enrichment.adoc +611 -0
- data/docs/internals/index.adoc +251 -0
- data/docs/lychee.toml +13 -6
- data/docs/understanding/architecture.adoc +749 -33
- data/docs/understanding/comparison-pipeline.adoc +122 -0
- data/lib/canon/cache.rb +129 -0
- data/lib/canon/comparison/dimensions/attribute_order_dimension.rb +68 -0
- data/lib/canon/comparison/dimensions/attribute_presence_dimension.rb +68 -0
- data/lib/canon/comparison/dimensions/attribute_values_dimension.rb +171 -0
- data/lib/canon/comparison/dimensions/base_dimension.rb +107 -0
- data/lib/canon/comparison/dimensions/comments_dimension.rb +121 -0
- data/lib/canon/comparison/dimensions/element_position_dimension.rb +90 -0
- data/lib/canon/comparison/dimensions/registry.rb +77 -0
- data/lib/canon/comparison/dimensions/structural_whitespace_dimension.rb +119 -0
- data/lib/canon/comparison/dimensions/text_content_dimension.rb +96 -0
- data/lib/canon/comparison/dimensions.rb +54 -0
- data/lib/canon/comparison/format_detector.rb +87 -0
- data/lib/canon/comparison/html_comparator.rb +70 -26
- data/lib/canon/comparison/html_compare_profile.rb +8 -2
- data/lib/canon/comparison/html_parser.rb +80 -0
- data/lib/canon/comparison/json_comparator.rb +12 -0
- data/lib/canon/comparison/json_parser.rb +19 -0
- data/lib/canon/comparison/markup_comparator.rb +293 -0
- data/lib/canon/comparison/match_options/base_resolver.rb +150 -0
- data/lib/canon/comparison/match_options/json_resolver.rb +82 -0
- data/lib/canon/comparison/match_options/xml_resolver.rb +151 -0
- data/lib/canon/comparison/match_options/yaml_resolver.rb +87 -0
- data/lib/canon/comparison/match_options.rb +68 -463
- data/lib/canon/comparison/profile_definition.rb +149 -0
- data/lib/canon/comparison/ruby_object_comparator.rb +180 -0
- data/lib/canon/comparison/strategies/semantic_tree_match_strategy.rb +7 -10
- data/lib/canon/comparison/whitespace_sensitivity.rb +208 -0
- data/lib/canon/comparison/xml_comparator/attribute_comparator.rb +177 -0
- data/lib/canon/comparison/xml_comparator/attribute_filter.rb +136 -0
- data/lib/canon/comparison/xml_comparator/child_comparison.rb +197 -0
- data/lib/canon/comparison/xml_comparator/diff_node_builder.rb +115 -0
- data/lib/canon/comparison/xml_comparator/namespace_comparator.rb +186 -0
- data/lib/canon/comparison/xml_comparator/node_parser.rb +79 -0
- data/lib/canon/comparison/xml_comparator/node_type_comparator.rb +102 -0
- data/lib/canon/comparison/xml_comparator.rb +97 -684
- data/lib/canon/comparison/xml_node_comparison.rb +319 -0
- data/lib/canon/comparison/xml_parser.rb +19 -0
- data/lib/canon/comparison/yaml_comparator.rb +3 -3
- data/lib/canon/comparison.rb +265 -110
- data/lib/canon/diff/diff_classifier.rb +101 -2
- data/lib/canon/diff/diff_node.rb +32 -2
- data/lib/canon/diff/formatting_detector.rb +1 -1
- data/lib/canon/diff/node_serializer.rb +191 -0
- data/lib/canon/diff/path_builder.rb +143 -0
- data/lib/canon/diff_formatter/by_line/base_formatter.rb +251 -0
- data/lib/canon/diff_formatter/by_line/html_formatter.rb +6 -248
- data/lib/canon/diff_formatter/by_line/xml_formatter.rb +38 -229
- data/lib/canon/diff_formatter/diff_detail_formatter/color_helper.rb +30 -0
- data/lib/canon/diff_formatter/diff_detail_formatter/dimension_formatter.rb +579 -0
- data/lib/canon/diff_formatter/diff_detail_formatter/location_extractor.rb +121 -0
- data/lib/canon/diff_formatter/diff_detail_formatter/node_utils.rb +253 -0
- data/lib/canon/diff_formatter/diff_detail_formatter/text_utils.rb +61 -0
- data/lib/canon/diff_formatter/diff_detail_formatter.rb +31 -1028
- data/lib/canon/diff_formatter.rb +1 -1
- data/lib/canon/rspec_matchers.rb +38 -9
- data/lib/canon/tree_diff/operation_converter.rb +92 -338
- data/lib/canon/tree_diff/operation_converter_helpers/metadata_enricher.rb +71 -0
- data/lib/canon/tree_diff/operation_converter_helpers/post_processor.rb +103 -0
- data/lib/canon/tree_diff/operation_converter_helpers/reason_builder.rb +168 -0
- data/lib/canon/tree_diff/operation_converter_helpers/update_change_handler.rb +188 -0
- data/lib/canon/version.rb +1 -1
- data/lib/canon/xml/data_model.rb +24 -13
- metadata +48 -2
|
@@ -194,19 +194,69 @@ See link:algorithms/[Algorithm documentation] for details.
|
|
|
194
194
|
|
|
195
195
|
=== Purpose
|
|
196
196
|
|
|
197
|
-
Configure what to compare and how strictly. **
|
|
197
|
+
Configure what to compare and how strictly. **Match options are format-specific** - each format (XML, HTML, JSON, YAML) has its own set of dimensions based on its structure.
|
|
198
198
|
|
|
199
|
-
===
|
|
199
|
+
=== Key architectural principle
|
|
200
200
|
|
|
201
|
-
|
|
201
|
+
**Dimensions are format-specific, NOT algorithm-specific.**
|
|
202
202
|
|
|
203
|
-
|
|
203
|
+
The comparison architecture works as follows:
|
|
204
|
+
|
|
205
|
+
[cols="2,4,3"]
|
|
206
|
+
|===
|
|
207
|
+
|**Aspect** |**Description** |**Examples**
|
|
208
|
+
|
|
209
|
+
|**Format**
|
|
210
|
+
|Determines which dimensions exist
|
|
211
|
+
|XML has attributes, JSON has keys
|
|
212
|
+
|
|
213
|
+
|**Dimensions**
|
|
214
|
+
|WHAT to compare (format-specific)
|
|
215
|
+
|`text_content`, `attribute_values`, `key_order`
|
|
216
|
+
|
|
217
|
+
|**Profile**
|
|
218
|
+
|Configures dimension behaviors for a format
|
|
219
|
+
|`text_content: :normalize`, `comments: :ignore`
|
|
220
|
+
|
|
221
|
+
|**Algorithm**
|
|
222
|
+
|HOW nodes are matched (format-independent)
|
|
223
|
+
|DOM: position-based, Semantic: signature-based
|
|
224
|
+
|===
|
|
225
|
+
|
|
226
|
+
**Critical distinction:**
|
|
227
|
+
|
|
228
|
+
* **Format → Dimensions**: XML has `attribute_values`, JSON has `key_order`
|
|
229
|
+
* **Profile → Behaviors**: Configures HOW dimensions are compared (`:strict`, `:normalize`, `:ignore`)
|
|
230
|
+
* **Algorithm → Matching Strategy**: DOM (position) vs Semantic (signature) - works with ANY format
|
|
231
|
+
|
|
232
|
+
=== Format-specific dimensions
|
|
233
|
+
|
|
234
|
+
Different formats have different dimensions based on their structure:
|
|
235
|
+
|
|
236
|
+
**XML/HTML dimensions:**
|
|
237
|
+
|
|
238
|
+
`text_content`:: Text within elements
|
|
204
239
|
`structural_whitespace`:: Whitespace between elements
|
|
205
|
-
`
|
|
206
|
-
`attribute_order`:: Order of attributes
|
|
207
|
-
`attribute_values`:: Attribute value content
|
|
208
|
-
`
|
|
209
|
-
`comments`:: Comment
|
|
240
|
+
`attribute_presence`:: Which attributes exist
|
|
241
|
+
`attribute_order`:: Order of attributes
|
|
242
|
+
`attribute_values`:: Attribute value content
|
|
243
|
+
`element_position`:: Position in tree
|
|
244
|
+
`comments`:: Comment nodes
|
|
245
|
+
|
|
246
|
+
**JSON dimensions:**
|
|
247
|
+
|
|
248
|
+
`text_content`:: Value text
|
|
249
|
+
`structural_whitespace`:: Whitespace
|
|
250
|
+
`key_order`:: Order of object keys
|
|
251
|
+
|
|
252
|
+
**YAML dimensions:**
|
|
253
|
+
|
|
254
|
+
`text_content`:: Value text
|
|
255
|
+
`structural_whitespace`:: Whitespace
|
|
256
|
+
`key_order`:: Order of keys
|
|
257
|
+
`comments`:: Comments
|
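As a rough illustration of how a format determines which dimensions exist, the lists above can be written as a lookup table. This is a standalone sketch, not Canon's implementation: `FORMAT_DIMENSIONS` and `unsupported_dimensions` are hypothetical names, and only the dimension names themselves come from the lists above.

```ruby
# Hypothetical lookup table built from the dimension lists above.
MARKUP_DIMENSIONS = %i[
  text_content structural_whitespace attribute_presence
  attribute_order attribute_values element_position comments
].freeze

FORMAT_DIMENSIONS = {
  xml: MARKUP_DIMENSIONS,
  html: MARKUP_DIMENSIONS,
  json: %i[text_content structural_whitespace key_order].freeze,
  yaml: %i[text_content structural_whitespace key_order comments].freeze
}.freeze

# Returns the profile keys that are not dimensions of the given format.
def unsupported_dimensions(format, profile)
  profile.keys - FORMAT_DIMENSIONS.fetch(format)
end

# attribute_order only exists for XML/HTML, so it is reported for JSON.
json_profile = { key_order: :ignore, attribute_order: :ignore }
p unsupported_dimensions(:json, json_profile)
```

A check of this kind would explain why the same profile hash can be valid for one format and rejected for another.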
|
258
|
+
|
|
259
|
+
=== Dimension behaviors
|
|
210
260
|
|
|
211
261
|
Each dimension supports behaviors:
|
|
212
262
|
|
|
@@ -216,59 +266,507 @@ Each dimension supports behaviors:
|
|
|
216
266
|
|
|
217
267
|
=== Match profiles
|
|
218
268
|
|
|
219
|
-
Profiles are predefined combinations of dimension settings for common scenarios
|
|
269
|
+
Profiles are predefined combinations of dimension settings for common scenarios.
|
|
220
270
|
|
|
221
|
-
|
|
222
|
-
`:rendered`:: Browser rendering - ignores formatting that doesn't affect display
|
|
223
|
-
`:spec_friendly`:: Test-friendly - ignores formatting, focuses on content
|
|
224
|
-
`:content_only`:: Maximum tolerance - only semantic content matters
|
|
271
|
+
**Important**: Profiles are **format-specific**. Each format (XML, HTML, JSON, YAML) has its own set of profiles configured for its dimensions.
|
|
225
272
|
|
|
226
|
-
|
|
273
|
+
See link:#available-preset-profiles[Available Preset Profiles] for the complete profile reference.
|
|
227
274
|
|
|
228
|
-
|
|
275
|
+
=== Available preset profiles
|
|
229
276
|
|
|
230
|
-
|
|
231
|
-
* **Semantic algorithm**: Uses options during signature calculation
|
|
277
|
+
Canon provides preset profiles optimized for different comparison scenarios. Each format has its own set of profiles with appropriate dimension configurations.
|
|
232
278
|
|
|
233
|
-
|
|
279
|
+
==== XML/HTML profiles
|
|
234
280
|
|
|
235
|
-
|
|
281
|
+
**Profile: `:strict`**
|
|
282
|
+
|
|
283
|
+
Exact matching - all dimensions use `:strict` behavior (XML default).
|
|
284
|
+
|
|
285
|
+
[cols="2,2,4"]
|
|
286
|
+
|===
|
|
287
|
+
|Dimension |Behavior |Description
|
|
288
|
+
|
|
289
|
+
|preprocessing
|
|
290
|
+
|`:none`
|
|
291
|
+
|No preprocessing - compare as-is
|
|
292
|
+
|
|
293
|
+
|text_content
|
|
294
|
+
|`:strict`
|
|
295
|
+
|Must match exactly
|
|
296
|
+
|
|
297
|
+
|structural_whitespace
|
|
298
|
+
|`:strict`
|
|
299
|
+
|Whitespace must match exactly
|
|
300
|
+
|
|
301
|
+
|attribute_presence
|
|
302
|
+
|`:strict`
|
|
303
|
+
|All attributes must be present
|
|
304
|
+
|
|
305
|
+
|attribute_order
|
|
306
|
+
|`:strict`
|
|
307
|
+
|Attribute order must match
|
|
308
|
+
|
|
309
|
+
|attribute_values
|
|
310
|
+
|`:strict`
|
|
311
|
+
|Attribute values must match exactly
|
|
312
|
+
|
|
313
|
+
|element_position
|
|
314
|
+
|`:strict`
|
|
315
|
+
|Element positions must match
|
|
316
|
+
|
|
317
|
+
|comments
|
|
318
|
+
|`:strict`
|
|
319
|
+
|Comments must match exactly
|
|
320
|
+
|===
|
|
321
|
+
|
|
322
|
+
**Use when**: You need exact byte-for-byte matching (e.g., validating serialization).
|
|
323
|
+
|
|
324
|
+
**Profile: `:rendered`**
|
|
325
|
+
|
|
326
|
+
Browser rendering - ignores formatting that doesn't affect display (HTML default).
|
|
327
|
+
|
|
328
|
+
[cols="2,2,4"]
|
|
329
|
+
|===
|
|
330
|
+
|Dimension |Behavior |Description
|
|
331
|
+
|
|
332
|
+
|preprocessing
|
|
333
|
+
|`:none`
|
|
334
|
+
|No preprocessing
|
|
335
|
+
|
|
336
|
+
|text_content
|
|
337
|
+
|`:normalize`
|
|
338
|
+
|Normalize text (collapse whitespace)
|
|
339
|
+
|
|
340
|
+
|structural_whitespace
|
|
341
|
+
|`:normalize`
|
|
342
|
+
|Normalize whitespace
|
|
343
|
+
|
|
344
|
+
|attribute_presence
|
|
345
|
+
|`:strict`
|
|
346
|
+
|All attributes must be present
|
|
347
|
+
|
|
348
|
+
|attribute_order
|
|
349
|
+
|`:strict`
|
|
350
|
+
|Attribute order must match
|
|
351
|
+
|
|
352
|
+
|attribute_values
|
|
353
|
+
|`:strict`
|
|
354
|
+
|Attribute values must match exactly
|
|
355
|
+
|
|
356
|
+
|element_position
|
|
357
|
+
|`:strict`
|
|
358
|
+
|Element positions must match
|
|
359
|
+
|
|
360
|
+
|comments
|
|
361
|
+
|`:ignore`
|
|
362
|
+
|Comments are ignored
|
|
363
|
+
|===
|
|
364
|
+
|
|
365
|
+
**Use when**: You care about what the browser displays, not source formatting.
|
|
366
|
+
|
|
367
|
+
**Profile: `:html4`**
|
|
368
|
+
|
|
369
|
+
HTML4 rendered output - HTML4 normalizes attribute whitespace.
|
|
370
|
+
|
|
371
|
+
[cols="2,2,4"]
|
|
372
|
+
|===
|
|
373
|
+
|Dimension |Behavior |Description
|
|
374
|
+
|
|
375
|
+
|preprocessing
|
|
376
|
+
|`:rendered`
|
|
377
|
+
|Rendered HTML preprocessing
|
|
378
|
+
|
|
379
|
+
|text_content
|
|
380
|
+
|`:normalize`
|
|
381
|
+
|Normalize text
|
|
382
|
+
|
|
383
|
+
|structural_whitespace
|
|
384
|
+
|`:normalize`
|
|
385
|
+
|Normalize whitespace
|
|
386
|
+
|
|
387
|
+
|attribute_presence
|
|
388
|
+
|`:strict`
|
|
389
|
+
|All attributes must be present
|
|
390
|
+
|
|
391
|
+
|attribute_order
|
|
392
|
+
|`:strict`
|
|
393
|
+
|Attribute order must match
|
|
394
|
+
|
|
395
|
+
|attribute_values
|
|
396
|
+
|`:normalize`
|
|
397
|
+
|Normalize attribute values
|
|
398
|
+
|
|
399
|
+
|element_position
|
|
400
|
+
|`:ignore`
|
|
401
|
+
|Element position doesn't matter
|
|
402
|
+
|
|
403
|
+
|comments
|
|
404
|
+
|`:ignore`
|
|
405
|
+
|Comments are ignored
|
|
406
|
+
|===
|
|
407
|
+
|
|
408
|
+
**Use when**: Testing HTML4 output where attribute whitespace may vary.
|
|
409
|
+
|
|
410
|
+
**Profile: `:html5`**
|
|
411
|
+
|
|
412
|
+
HTML5 rendered output - like `:rendered`, but with `:rendered` preprocessing and `element_position` ignored.
|
|
413
|
+
|
|
414
|
+
[cols="2,2,4"]
|
|
415
|
+
|===
|
|
416
|
+
|Dimension |Behavior |Description
|
|
417
|
+
|
|
418
|
+
|preprocessing
|
|
419
|
+
|`:rendered`
|
|
420
|
+
|Rendered HTML preprocessing
|
|
421
|
+
|
|
422
|
+
|text_content
|
|
423
|
+
|`:normalize`
|
|
424
|
+
|Normalize text
|
|
425
|
+
|
|
426
|
+
|structural_whitespace
|
|
427
|
+
|`:normalize`
|
|
428
|
+
|Normalize whitespace
|
|
429
|
+
|
|
430
|
+
|attribute_presence
|
|
431
|
+
|`:strict`
|
|
432
|
+
|All attributes must be present
|
|
433
|
+
|
|
434
|
+
|attribute_order
|
|
435
|
+
|`:strict`
|
|
436
|
+
|Attribute order must match
|
|
437
|
+
|
|
438
|
+
|attribute_values
|
|
439
|
+
|`:strict`
|
|
440
|
+
|Attribute values must match exactly
|
|
441
|
+
|
|
442
|
+
|element_position
|
|
443
|
+
|`:ignore`
|
|
444
|
+
|Element position doesn't matter
|
|
445
|
+
|
|
446
|
+
|comments
|
|
447
|
+
|`:ignore`
|
|
448
|
+
|Comments are ignored
|
|
449
|
+
|===
|
|
450
|
+
|
|
451
|
+
**Use when**: Testing HTML5 output.
|
|
452
|
+
|
|
453
|
+
**Profile: `:spec_friendly`**
|
|
454
|
+
|
|
455
|
+
Test-friendly - ignores formatting, focuses on content.
|
|
456
|
+
|
|
457
|
+
[cols="2,2,4"]
|
|
458
|
+
|===
|
|
459
|
+
|Dimension |Behavior |Description
|
|
460
|
+
|
|
461
|
+
|preprocessing
|
|
462
|
+
|`:rendered`
|
|
463
|
+
|Rendered HTML preprocessing
|
|
464
|
+
|
|
465
|
+
|text_content
|
|
466
|
+
|`:normalize`
|
|
467
|
+
|Normalize text
|
|
468
|
+
|
|
469
|
+
|structural_whitespace
|
|
470
|
+
|`:ignore`
|
|
471
|
+
|Whitespace ignored
|
|
472
|
+
|
|
473
|
+
|attribute_presence
|
|
474
|
+
|`:strict`
|
|
475
|
+
|All attributes must be present
|
|
476
|
+
|
|
477
|
+
|attribute_order
|
|
478
|
+
|`:ignore`
|
|
479
|
+
|Attribute order ignored
|
|
480
|
+
|
|
481
|
+
|attribute_values
|
|
482
|
+
|`:normalize`
|
|
483
|
+
|Normalize attribute values
|
|
484
|
+
|
|
485
|
+
|element_position
|
|
486
|
+
|`:ignore`
|
|
487
|
+
|Element position ignored
|
|
488
|
+
|
|
489
|
+
|comments
|
|
490
|
+
|`:ignore`
|
|
491
|
+
|Comments ignored
|
|
492
|
+
|===
|
|
493
|
+
|
|
494
|
+
**Use when**: Writing tests where formatting changes are acceptable.
|
|
495
|
+
|
|
496
|
+
**Profile: `:content_only`**
|
|
497
|
+
|
|
498
|
+
Maximum tolerance - only semantic content matters.
|
|
499
|
+
|
|
500
|
+
[cols="2,2,4"]
|
|
501
|
+
|===
|
|
502
|
+
|Dimension |Behavior |Description
|
|
503
|
+
|
|
504
|
+
|preprocessing
|
|
505
|
+
|`:c14n`
|
|
506
|
+
|Canonical XML preprocessing
|
|
507
|
+
|
|
508
|
+
|text_content
|
|
509
|
+
|`:normalize`
|
|
510
|
+
|Normalize text
|
|
511
|
+
|
|
512
|
+
|structural_whitespace
|
|
513
|
+
|`:ignore`
|
|
514
|
+
|Whitespace ignored
|
|
515
|
+
|
|
516
|
+
|attribute_presence
|
|
517
|
+
|`:strict`
|
|
518
|
+
|All attributes must be present
|
|
519
|
+
|
|
520
|
+
|attribute_order
|
|
521
|
+
|`:ignore`
|
|
522
|
+
|Attribute order ignored
|
|
523
|
+
|
|
524
|
+
|attribute_values
|
|
525
|
+
|`:normalize`
|
|
526
|
+
|Normalize attribute values
|
|
527
|
+
|
|
528
|
+
|element_position
|
|
529
|
+
|`:ignore`
|
|
530
|
+
|Element position ignored
|
|
531
|
+
|
|
532
|
+
|comments
|
|
533
|
+
|`:ignore`
|
|
534
|
+
|Comments ignored
|
|
535
|
+
|===
|
|
536
|
+
|
|
537
|
+
**Use when**: You only care about semantic content, not structure or formatting.
|
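One way to read the six tables above is as plain data. The sketch below copies the table values into hashes and reports which dimensions each preset relaxes relative to `:strict`, and where two presets differ. `XML_HTML_PRESETS`, `relaxed_dimensions`, and `changed_settings` are illustrative names and are not part of Canon's API.

```ruby
# Values copied from the XML/HTML profile tables above.
STRICT = {
  preprocessing: :none, text_content: :strict, structural_whitespace: :strict,
  attribute_presence: :strict, attribute_order: :strict,
  attribute_values: :strict, element_position: :strict, comments: :strict
}.freeze

SPEC_FRIENDLY = STRICT.merge(
  preprocessing: :rendered, text_content: :normalize,
  structural_whitespace: :ignore, attribute_order: :ignore,
  attribute_values: :normalize, element_position: :ignore, comments: :ignore
).freeze

XML_HTML_PRESETS = {
  strict: STRICT,
  rendered: STRICT.merge(text_content: :normalize,
                         structural_whitespace: :normalize, comments: :ignore),
  html4: STRICT.merge(preprocessing: :rendered, text_content: :normalize,
                      structural_whitespace: :normalize,
                      attribute_values: :normalize,
                      element_position: :ignore, comments: :ignore),
  html5: STRICT.merge(preprocessing: :rendered, text_content: :normalize,
                      structural_whitespace: :normalize,
                      element_position: :ignore, comments: :ignore),
  spec_friendly: SPEC_FRIENDLY,
  content_only: SPEC_FRIENDLY.merge(preprocessing: :c14n)
}.freeze

# Dimensions whose behavior is anything other than :strict.
def relaxed_dimensions(name)
  XML_HTML_PRESETS.fetch(name)
                  .reject { |key, value| key == :preprocessing || value == :strict }
                  .keys
end

# Settings (including preprocessing) on which two presets disagree.
def changed_settings(first, second)
  a = XML_HTML_PRESETS.fetch(first)
  b = XML_HTML_PRESETS.fetch(second)
  a.keys.reject { |key| a[key] == b[key] }
end

p relaxed_dimensions(:rendered)
p changed_settings(:html4, :html5)
```

Reading the presets this way makes it easy to see, for example, that `:html4` and `:html5` differ only in how attribute values are compared.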
|
538
|
+
|
|
539
|
+
==== JSON profiles
|
|
540
|
+
|
|
541
|
+
JSON has 3 preset profiles: `:strict`, `:spec_friendly`, and `:content_only`.
|
|
542
|
+
|
|
543
|
+
[cols="2,2,2,2"]
|
|
544
|
+
|===
|
|
545
|
+
|Dimension |`:strict` |`:spec_friendly` |`:content_only`
|
|
546
|
+
|
|
547
|
+
|preprocessing
|
|
548
|
+
|`:none`
|
|
549
|
+
|`:normalize`
|
|
550
|
+
|`:normalize`
|
|
551
|
+
|
|
552
|
+
|text_content
|
|
553
|
+
|`:strict`
|
|
554
|
+
|`:strict`
|
|
555
|
+
|`:normalize`
|
|
556
|
+
|
|
557
|
+
|structural_whitespace
|
|
558
|
+
|`:strict`
|
|
559
|
+
|`:ignore`
|
|
560
|
+
|`:ignore`
|
|
561
|
+
|
|
562
|
+
|key_order
|
|
563
|
+
|`:strict`
|
|
564
|
+
|`:ignore`
|
|
565
|
+
|`:ignore`
|
|
566
|
+
|===
|
|
567
|
+
|
|
568
|
+
**Use cases**:
|
|
569
|
+
|
|
570
|
+
* `:strict` - Exact JSON matching (order-sensitive)
|
|
571
|
+
* `:spec_friendly` - Order-independent JSON comparison
|
|
572
|
+
* `:content_only` - Normalized values, order and formatting ignored
|
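The effect of `key_order` can be illustrated without Canon. The sketch below parses two JSON strings with Ruby's standard `json` library and compares them either order-sensitively or order-independently. `json_match?` and `same_order?` are hypothetical helpers; how Canon itself implements `key_order` is not shown in this document.

```ruby
require "json"

# Recursively checks that two parsed JSON values are equal and that
# object keys appear in the same order.
def same_order?(left, right)
  case left
  when Hash
    right.is_a?(Hash) && left.keys == right.keys &&
      left.all? { |key, value| same_order?(value, right[key]) }
  when Array
    right.is_a?(Array) && left.size == right.size &&
      left.zip(right).all? { |a, b| same_order?(a, b) }
  else
    left == right
  end
end

# key_order: :strict compares key order as well; :ignore relies on Ruby's
# Hash#==, which does not take insertion order into account.
def json_match?(json1, json2, key_order: :strict)
  left = JSON.parse(json1)
  right = JSON.parse(json2)
  key_order == :ignore ? left == right : same_order?(left, right)
end

a = '{"name": "canon", "tags": ["xml", "json"]}'
b = '{"tags": ["xml", "json"], "name": "canon"}'

puts json_match?(a, b)                     # key order differs
puts json_match?(a, b, key_order: :ignore) # same content
```

Array element order is still compared in both modes here, since only object key order is covered by the `key_order` dimension.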
|
573
|
+
|
|
574
|
+
==== YAML profiles
|
|
575
|
+
|
|
576
|
+
YAML has 3 preset profiles: `:strict`, `:spec_friendly`, and `:content_only`.
|
|
577
|
+
|
|
578
|
+
[cols="2,2,2,2"]
|
|
579
|
+
|===
|
|
580
|
+
|Dimension |`:strict` |`:spec_friendly` |`:content_only`
|
|
581
|
+
|
|
582
|
+
|preprocessing
|
|
583
|
+
|`:none`
|
|
584
|
+
|`:normalize`
|
|
585
|
+
|`:normalize`
|
|
586
|
+
|
|
587
|
+
|text_content
|
|
588
|
+
|`:strict`
|
|
589
|
+
|`:strict`
|
|
590
|
+
|`:normalize`
|
|
591
|
+
|
|
592
|
+
|structural_whitespace
|
|
593
|
+
|`:strict`
|
|
594
|
+
|`:ignore`
|
|
595
|
+
|`:ignore`
|
|
596
|
+
|
|
597
|
+
|key_order
|
|
598
|
+
|`:strict`
|
|
599
|
+
|`:ignore`
|
|
600
|
+
|`:ignore`
|
|
601
|
+
|
|
602
|
+
|comments
|
|
603
|
+
|`:strict`
|
|
604
|
+
|`:ignore`
|
|
605
|
+
|`:ignore`
|
|
606
|
+
|===
|
|
607
|
+
|
|
608
|
+
**Use cases**:
|
|
609
|
+
|
|
610
|
+
* `:strict` - Exact YAML matching (order and comments matter)
|
|
611
|
+
* `:spec_friendly` - Order-independent, comments ignored
|
|
612
|
+
* `:content_only` - Maximum tolerance, only values matter
|
|
613
|
+
|
|
614
|
+
=== Customizing profiles
|
|
615
|
+
|
|
616
|
+
Canon provides two ways to customize comparison behavior: inline custom profiles and named custom profiles.
|
|
617
|
+
|
|
618
|
+
==== Inline custom profiles
|
|
619
|
+
|
|
620
|
+
For one-off comparisons, pass a Hash directly to the `profile` parameter:
|
|
236
621
|
|
|
237
|
-
.With dimensions
|
|
238
|
-
[example]
|
|
239
|
-
====
|
|
240
622
|
[source,ruby]
|
|
241
623
|
----
|
|
242
|
-
Canon::Comparison.equivalent?(
|
|
243
|
-
|
|
624
|
+
Canon::Comparison.equivalent?(html1, html2,
|
|
625
|
+
profile: {
|
|
244
626
|
text_content: :normalize,
|
|
245
627
|
structural_whitespace: :ignore,
|
|
246
628
|
comments: :ignore
|
|
247
629
|
}
|
|
248
630
|
)
|
|
249
631
|
----
|
|
250
|
-
====
|
|
251
632
|
|
|
252
|
-
.
|
|
633
|
+
**Validation**: Inline profiles are validated at comparison time. An invalid dimension or behavior raises a `Canon::Error`.
|
|
634
|
+
|
|
635
|
+
[source,ruby]
|
|
636
|
+
----
|
|
637
|
+
# This raises Canon::Error
|
|
638
|
+
Canon::Comparison.equivalent?(html1, html2,
|
|
639
|
+
profile: {
|
|
640
|
+
unknown_dimension: :strict # => Error: Unknown dimension: unknown_dimension
|
|
641
|
+
}
|
|
642
|
+
)
|
|
643
|
+
----
|
|
644
|
+
|
|
645
|
+
==== Named custom profiles (Profile DSL)
|
|
646
|
+
|
|
647
|
+
For reusable custom profiles, define them using the Profile DSL:
|
|
648
|
+
|
|
649
|
+
[source,ruby]
|
|
650
|
+
----
|
|
651
|
+
# Define a custom profile
|
|
652
|
+
Canon::Comparison.define_profile(:content_focused) do
|
|
653
|
+
text_content :normalize
|
|
654
|
+
comments :ignore
|
|
655
|
+
structural_whitespace :ignore
|
|
656
|
+
attribute_values :normalize
|
|
657
|
+
preprocessing :rendered
|
|
658
|
+
end
|
|
659
|
+
|
|
660
|
+
# Use the custom profile
|
|
661
|
+
Canon::Comparison.equivalent?(html1, html2, profile: :content_focused)
|
|
662
|
+
|
|
663
|
+
# List all available profiles (includes custom profiles)
|
|
664
|
+
Canon::Comparison.available_profiles
|
|
665
|
+
# => [:strict, :rendered, :html4, :html5, :spec_friendly, :content_only, :content_focused]
|
|
666
|
+
----
|
|
667
|
+
|
|
668
|
+
**Available DSL methods**:
|
|
669
|
+
|
|
670
|
+
* `text_content` - Text within elements
|
|
671
|
+
* `structural_whitespace` - Whitespace between elements
|
|
672
|
+
* `attribute_presence` - Which attributes exist
|
|
673
|
+
* `attribute_order` - Order of attributes
|
|
674
|
+
* `attribute_values` - Attribute value content
|
|
675
|
+
* `element_position` - Position of elements
|
|
676
|
+
* `comments` - Comment content and placement
|
|
677
|
+
* `preprocessing` - Preprocessing option (`:none`, `:c14n`, `:normalize`, `:format`, `:rendered`)
|
|
678
|
+
|
|
679
|
+
**Behaviors for each dimension**:
|
|
680
|
+
|
|
681
|
+
* `:strict` - Must match exactly
|
|
682
|
+
* `:normalize` - Match after normalization
|
|
683
|
+
* `:ignore` - Don't compare
|
|
684
|
+
* `:strip` - (attribute_values only) Strip leading/trailing whitespace
|
|
685
|
+
* `:compact` - (attribute_values only) Collapse internal whitespace
|
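To make the behaviors listed above concrete, here is a minimal sketch that applies each one to a pair of strings. It is not Canon's code: `values_match?` is a hypothetical helper, and treating `:normalize` as "strip, then collapse runs of whitespace" is an assumption made for the example.

```ruby
# Compares two strings under one of the behaviors listed above.
def values_match?(expected, actual, behavior)
  case behavior
  when :strict    then expected == actual
  when :strip     then expected.strip == actual.strip
  when :compact   then expected.gsub(/\s+/, " ") == actual.gsub(/\s+/, " ")
  when :normalize then expected.strip.gsub(/\s+/, " ") == actual.strip.gsub(/\s+/, " ")
  when :ignore    then true
  else
    raise ArgumentError, "unknown behavior: #{behavior.inspect}"
  end
end

expected = "btn  primary"
actual   = " btn primary "

%i[strict strip compact normalize ignore].each do |behavior|
  puts "#{behavior}: #{values_match?(expected, actual, behavior)}"
end
```

In this example only `:normalize` and `:ignore` treat the two values as equal, because the strings differ in both surrounding and internal whitespace.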
|
686
|
+
|
|
687
|
+
**Validation at definition time**:
|
|
688
|
+
|
|
689
|
+
The Profile DSL validates immediately when you define the profile:
|
|
690
|
+
|
|
691
|
+
[source,ruby]
|
|
692
|
+
----
|
|
693
|
+
# This raises Canon::Error at definition time
|
|
694
|
+
Canon::Comparison.define_profile(:invalid) do
|
|
695
|
+
unknown_dimension :strict # => Error: Unknown dimension: unknown_dimension
|
|
696
|
+
text_content :invalid_behavior # => Error: Invalid behavior 'invalid_behavior'
|
|
697
|
+
end
|
|
698
|
+
----
|
|
699
|
+
|
|
700
|
+
This prevents invalid profiles from ever being used in comparisons.
|
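Definition-time validation of this kind can be sketched in plain Ruby with `instance_eval`. The module below is an illustration only: `ProfileSketch`, its `Builder`, and its `Error` class are made-up names, the behavior list is simplified to three values, and nothing here is taken from Canon's source.

```ruby
module ProfileSketch
  class Error < StandardError; end

  DIMENSIONS = %i[
    text_content structural_whitespace attribute_presence attribute_order
    attribute_values element_position comments
  ].freeze
  BEHAVIORS = %i[strict normalize ignore].freeze

  # Collects dimension settings while the DSL block runs.
  class Builder
    attr_reader :settings

    def initialize
      @settings = {}
    end

    # One DSL method per known dimension; each validates its argument.
    DIMENSIONS.each do |dimension|
      define_method(dimension) do |behavior|
        raise Error, "Invalid behavior '#{behavior}'" unless BEHAVIORS.include?(behavior)

        @settings[dimension] = behavior
      end
    end

    # Any other method call inside the block is an unknown dimension.
    def method_missing(name, *_args)
      raise Error, "Unknown dimension: #{name}"
    end

    def respond_to_missing?(_name, _include_private = false)
      false
    end
  end

  @profiles = {}

  # Runs the block immediately, so errors surface when the profile is defined.
  def self.define_profile(name, &block)
    builder = Builder.new
    builder.instance_eval(&block)
    @profiles[name] = builder.settings.freeze
  end

  def self.profiles
    @profiles
  end
end

ProfileSketch.define_profile(:content_focused) do
  text_content :normalize
  comments :ignore
end
p ProfileSketch.profiles[:content_focused]
```

Because the block is evaluated before the profile is stored, a profile that raises is never registered, which matches the guarantee described above.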
|
701
|
+
|
|
702
|
+
**Removing custom profiles**:
|
|
703
|
+
|
|
704
|
+
[source,ruby]
|
|
705
|
+
----
|
|
706
|
+
# Remove a custom profile
|
|
707
|
+
Canon::Comparison.remove_profile(:content_focused)
|
|
708
|
+
----
|
|
709
|
+
|
|
710
|
+
**Profile best practices**:
|
|
711
|
+
|
|
712
|
+
* Use preset profiles when possible - they're well-tested and documented
|
|
713
|
+
* Name custom profiles descriptively (e.g., `:content_focused`, `:seo_test`)
|
|
714
|
+
* Define profiles at application startup, not during request handling
|
|
715
|
+
* Document why a custom profile is needed in comments
|
|
716
|
+
|
|
717
|
+
=== Algorithm interaction with match options
|
|
718
|
+
|
|
719
|
+
Both algorithms (DOM and Semantic) work with ALL formats. The algorithm determines **HOW nodes are matched**, not **WHAT is compared**:
|
|
720
|
+
|
|
721
|
+
* **DOM algorithm**: Position-based matching (element at position 0 matches element at position 0)
|
|
722
|
+
* **Semantic algorithm**: Signature-based matching (nodes with similar signatures match)
|
|
723
|
+
|
|
724
|
+
Once nodes are matched, both algorithms use the **same dimension comparisons** configured by the profile.
|
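The split between matching and comparing can be shown with a toy example. In the sketch below, two hypothetical matchers pair up sibling nodes differently, and one shared comparison then runs on the pairs. `SketchNode` and the three helper methods are invented for illustration, and using the element name as the "signature" is a deliberate simplification rather than Canon's actual signature calculation.

```ruby
SketchNode = Struct.new(:name, :text)

# DOM-style pairing: the node at index i is paired with the node at index i.
def match_by_position(left, right)
  Array.new([left.size, right.size].max) { |i| [left[i], right[i]] }
end

# Semantic-style pairing: each node is paired with the first unused node
# that has the same signature (here, simply the same element name).
def match_by_signature(left, right)
  unused = right.dup
  pairs = left.map do |node|
    index = unused.index { |candidate| candidate.name == node.name }
    [node, index ? unused.delete_at(index) : nil]
  end
  pairs + unused.map { |node| [nil, node] }
end

# The same comparison runs on the pairs regardless of how they were matched.
def mismatched_pairs(pairs)
  pairs.reject { |a, b| a && b && a == b }
end

left  = [SketchNode.new("title", "Hello"), SketchNode.new("note", "World")]
right = [SketchNode.new("note", "World"), SketchNode.new("title", "Hello")]

puts mismatched_pairs(match_by_position(left, right)).size  # positions disagree
puts mismatched_pairs(match_by_signature(left, right)).size # reordered nodes pair up
```

Only the pairing step differs between the two runs; `mismatched_pairs` plays the role of the profile-configured dimension comparison.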
|
725
|
+
|
|
726
|
+
=== Usage
|
|
727
|
+
|
|
728
|
+
.Using the new unified `profile` parameter
|
|
253
729
|
[example]
|
|
254
730
|
====
|
|
255
731
|
[source,ruby]
|
|
256
732
|
----
|
|
733
|
+
# Using a preset profile
|
|
257
734
|
Canon::Comparison.equivalent?(doc1, doc2,
|
|
258
|
-
|
|
735
|
+
profile: :spec_friendly
|
|
736
|
+
)
|
|
737
|
+
|
|
738
|
+
# Using an inline custom profile
|
|
739
|
+
Canon::Comparison.equivalent?(doc1, doc2,
|
|
740
|
+
profile: {
|
|
741
|
+
text_content: :normalize,
|
|
742
|
+
structural_whitespace: :ignore,
|
|
743
|
+
comments: :ignore
|
|
744
|
+
}
|
|
745
|
+
)
|
|
746
|
+
|
|
747
|
+
# Defining and using a custom profile
|
|
748
|
+
Canon::Comparison.define_profile(:my_custom) do
|
|
749
|
+
text_content :normalize
|
|
750
|
+
comments :ignore
|
|
751
|
+
preprocessing :rendered
|
|
752
|
+
end
|
|
753
|
+
|
|
754
|
+
Canon::Comparison.equivalent?(doc1, doc2,
|
|
755
|
+
profile: :my_custom
|
|
259
756
|
)
|
|
260
757
|
----
|
|
261
758
|
====
|
|
262
759
|
|
|
263
|
-
.
|
|
760
|
+
.Using dimensions (deprecated - use profile instead)
|
|
264
761
|
[example]
|
|
265
762
|
====
|
|
266
763
|
[source,ruby]
|
|
267
764
|
----
|
|
268
765
|
Canon::Comparison.equivalent?(doc1, doc2,
|
|
269
|
-
match_profile: :spec_friendly,
|
|
270
766
|
match: {
|
|
271
|
-
|
|
767
|
+
text_content: :normalize,
|
|
768
|
+
structural_whitespace: :ignore,
|
|
769
|
+
comments: :ignore
|
|
272
770
|
}
|
|
273
771
|
)
|
|
274
772
|
----
|
|
@@ -356,8 +854,8 @@ result = Canon::Comparison.equivalent?(doc1, doc2,
|
|
|
356
854
|
# Layer 2: Algorithm
|
|
357
855
|
diff_algorithm: :semantic,
|
|
358
856
|
|
|
359
|
-
# Layer 3: Match Options
|
|
360
|
-
|
|
857
|
+
# Layer 3: Match Options (new unified profile API)
|
|
858
|
+
profile: :spec_friendly,
|
|
361
859
|
|
|
362
860
|
# Layer 4: Diff Formatting
|
|
363
861
|
verbose: true,
|
|
@@ -369,6 +867,121 @@ result = Canon::Comparison.equivalent?(doc1, doc2,
|
|
|
369
867
|
|
|
370
868
|
See link:comparison-pipeline.adoc[Comparison Pipeline] for layer-by-layer examples.
|
|
371
869
|
|
|
870
|
+
=== DiffNode: Representation of differences
|
|
871
|
+
|
|
872
|
+
==== Purpose
|
|
873
|
+
|
|
874
|
+
`DiffNode` objects represent individual differences between documents. Each DiffNode carries complete information about what changed, where it changed, and how to display it.
|
|
875
|
+
|
|
876
|
+
==== DiffNode structure
|
|
877
|
+
|
|
878
|
+
[source,ruby]
|
|
879
|
+
----
|
|
880
|
+
class DiffNode
|
|
881
|
+
# Core properties
|
|
882
|
+
attr_reader :node1, :node2 # Raw node references
|
|
883
|
+
attr_accessor :dimension, :reason # What changed and why
|
|
884
|
+
attr_accessor :normative, :formatting # Classification
|
|
885
|
+
|
|
886
|
+
# Location and display information
|
|
887
|
+
attr_accessor :path # Canonical path with ordinal indices
|
|
888
|
+
attr_accessor :serialized_before # Serialized "before" content
|
|
889
|
+
attr_accessor :serialized_after # Serialized "after" content
|
|
890
|
+
attr_accessor :attributes_before # Normalized "before" attributes
|
|
891
|
+
attr_accessor :attributes_after # Normalized "after" attributes
|
|
892
|
+
end
|
|
893
|
+
----
|
|
894
|
+
|
|
895
|
+
===== Properties explained
|
|
896
|
+
|
|
897
|
+
**Core properties**:
|
|
898
|
+
|
|
899
|
+
`node1, node2`:: Raw node references from the original documents
|
|
900
|
+
|
|
901
|
+
`dimension`:: What type of difference (`:text_content`, `:attribute_values`, `:element_structure`, etc.)
|
|
902
|
+
|
|
903
|
+
`reason`:: Human-readable explanation of the difference
|
|
904
|
+
|
|
905
|
+
`normative`:: Whether this difference affects semantic equivalence (true) or is just formatting (false)
|
|
906
|
+
|
|
907
|
+
`formatting`:: Whether this is a purely cosmetic whitespace difference
|
|
908
|
+
|
|
909
|
+
**Location and display properties**:
|
|
910
|
+
|
|
911
|
+
`path`:: Canonical XPath-like path with ordinal indices that uniquely identifies the node location (e.g., `/#document/div[0]/body[0]/p[1]/span[2]`)
|
|
912
|
+
|
|
913
|
+
`serialized_before`:: Serialized content of the "before" state captured at comparison time
|
|
914
|
+
|
|
915
|
+
`serialized_after`:: Serialized content of the "after" state captured at comparison time
|
|
916
|
+
|
|
917
|
+
`attributes_before`:: Normalized attribute hash from the "before" state
|
|
918
|
+
|
|
919
|
+
`attributes_after`:: Normalized attribute hash from the "after" state
|
|
920
|
+
|
|
921
|
+
==== Using DiffNode in verbose output
|
|
922
|
+
|
|
923
|
+
When you enable verbose mode, Canon returns a `ComparisonResult` containing DiffNode objects:
|
|
924
|
+
|
|
925
|
+
[source,ruby]
|
|
926
|
+
----
|
|
927
|
+
result = Canon::Comparison.equivalent?(doc1, doc2, verbose: true)
|
|
928
|
+
|
|
929
|
+
# Access individual differences
|
|
930
|
+
result.differences.each do |diff|
|
|
931
|
+
puts "Location: #{diff.path}"
|
|
932
|
+
puts "Dimension: #{diff.dimension}"
|
|
933
|
+
puts "Reason: #{diff.reason}"
|
|
934
|
+
puts "Normative: #{diff.normative?}"
|
|
935
|
+
end
|
|
936
|
+
----
|
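A common next step is to separate normative differences from formatting-only ones. The sketch below does this with a stand-in `SketchDiff` struct so that it runs without Canon; the struct, the `summarize` helper, and the sample data are all hypothetical, and only the `path`, `dimension`, `reason`, and `normative?` names mirror the properties described above.

```ruby
# Stand-in for a DiffNode, carrying only the fields used below.
SketchDiff = Struct.new(:path, :dimension, :reason, :normative) do
  def normative?
    normative
  end
end

# Splits differences into normative and informative groups.
def summarize(differences)
  normative, informative = differences.partition(&:normative?)
  {
    equivalent: normative.empty?,
    normative_paths: normative.map(&:path),
    informative_by_dimension: informative.group_by(&:dimension)
                                         .transform_values(&:size)
  }
end

differences = [
  SketchDiff.new("/root[0]/p[1]", :text_content, "Text differs", true),
  SketchDiff.new("/root[0]", :structural_whitespace, "Whitespace differs", false)
]

p summarize(differences)
```

Treating the documents as equivalent when no normative difference remains follows the description of the `normative` property; whether Canon reports equivalence in exactly this way is an assumption.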
|
937
|
+
|
|
938
|
+
==== Canonical paths with ordinal indices
|
|
939
|
+
|
|
940
|
+
DiffNode paths use ordinal indices to uniquely identify nodes. Instead of ambiguous paths like:
|
|
941
|
+
|
|
942
|
+
[source,text]
|
|
943
|
+
----
|
|
944
|
+
/#document-fragment/div/p/span/span
|
|
945
|
+
----
|
|
946
|
+
|
|
947
|
+
Canon generates precise paths like:
|
|
948
|
+
|
|
949
|
+
[source,text]
|
|
950
|
+
----
|
|
951
|
+
/#document-fragment/div[0]/p[1]/span[2]/span[0]
|
|
952
|
+
----
|
|
953
|
+
|
|
954
|
+
This tells you exactly which element changed:
|
|
955
|
+
* `div[0]` - First div element
|
|
956
|
+
* `p[1]` - Second paragraph element
|
|
957
|
+
* `span[2]` - Third span element
|
|
958
|
+
* `span[0]` - First nested span element
|
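A path of this shape can be produced by walking from a node up to the root and recording each node's index among its same-named siblings. The sketch below is a minimal illustration with a hypothetical `PathNode` class; it assumes zero-based ordinals counted per element name, which matches the example above but is not taken from the PathBuilder source.

```ruby
# Minimal tree node with a parent link, used only for this sketch.
class PathNode
  attr_reader :name, :parent, :children

  def initialize(name, parent = nil)
    @name = name
    @parent = parent
    @children = []
  end

  # Appends a child element and returns it.
  def add(name)
    child = PathNode.new(name, self)
    @children << child
    child
  end
end

# Builds a path such as /root/div[0]/p[1] by walking up to the root.
def ordinal_path(node)
  segments = []
  while node.parent
    same_name = node.parent.children.select { |sibling| sibling.name == node.name }
    ordinal = same_name.index { |sibling| sibling.equal?(node) }
    segments.unshift("#{node.name}[#{ordinal}]")
    node = node.parent
  end
  "/#{node.name}" + segments.map { |segment| "/#{segment}" }.join
end

root = PathNode.new("#document-fragment")
div = root.add("div")
div.add("p")
second_p = div.add("p")
spans = Array.new(3) { second_p.add("span") }
inner = spans.last.add("span")

puts ordinal_path(inner)
```

Counting per element name keeps the ordinals stable when unrelated sibling elements are inserted between them.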
|
959
|
+
|
|
960
|
+
==== Enriched metadata in diff output
|
|
961
|
+
|
|
962
|
+
Layer 4 (diff formatting) uses enriched metadata to display accurate before/after content:
|
|
963
|
+
|
|
964
|
+
[source,text]
|
|
965
|
+
----
|
|
966
|
+
🔍 DIFFERENCE #1/3 [NORMATIVE]
|
|
967
|
+
════════════════════════════════════════════════════════════════════════
|
|
968
|
+
Dimension: element_structure
|
|
969
|
+
Location: /#document/div[0]/body[0]/p[1]/span[2]
|
|
970
|
+
|
|
971
|
+
⊖ Expected (File 1):
|
|
972
|
+
(not present)
|
|
973
|
+
|
|
974
|
+
⊕ Actual (File 2):
|
|
975
|
+
<span id="new-element">Added content</span>
|
|
976
|
+
|
|
977
|
+
✨ Changes:
|
|
978
|
+
Element inserted
|
|
979
|
+
----
|
|
980
|
+
|
|
981
|
+
The `Location` field shows the enriched path, and the before/after content uses `serialized_before` and `serialized_after` to ensure accurate display.
|
|
982
|
+
|
|
983
|
+
See link:../internals/[Internals] for implementation details on PathBuilder, NodeSerializer, and how metadata flows through the comparison layers.
|
|
984
|
+
|
|
372
985
|
== Configuration precedence
|
|
373
986
|
|
|
374
987
|
When options are specified in multiple places, Canon resolves them using this hierarchy (highest to lowest priority):
|
|
@@ -432,10 +1045,113 @@ expect(actual).to be_xml_equivalent_to(expected)
|
|
|
432
1045
|
|
|
433
1046
|
**Extensibility**:: Easy to add new preprocessing, algorithms, dimensions, or rendering modes
|
|
434
1047
|
|
|
1048
|
+
== Profile DSL and Dimension System
|
|
1049
|
+
|
|
1050
|
+
=== Overview
|
|
1051
|
+
|
|
1052
|
+
Canon introduces a Profile DSL and Dimension system for cleaner, more maintainable comparison configuration:
|
|
1053
|
+
|
|
1054
|
+
* **Profile DSL** - Define custom comparison profiles with validation
|
|
1055
|
+
* **Dimension Classes** - Object-oriented dimension handling with reusable behaviors
|
|
1056
|
+
|
|
1057
|
+
=== Profile DSL
|
|
1058
|
+
|
|
1059
|
+
The Profile DSL provides a clean, validated way to define custom comparison profiles:
|
|
1060
|
+
|
|
1061
|
+
[source,ruby]
|
|
1062
|
+
----
|
|
1063
|
+
# Define a custom profile
|
|
1064
|
+
Canon::Comparison.define_profile(:content_focused) do
|
|
1065
|
+
text_content :normalize
|
|
1066
|
+
comments :ignore
|
|
1067
|
+
structural_whitespace :ignore
|
|
1068
|
+
attribute_values :normalize
|
|
1069
|
+
preprocessing :rendered
|
|
1070
|
+
end
|
|
1071
|
+
|
|
1072
|
+
# Use the custom profile
|
|
1073
|
+
Canon::Comparison.equivalent?(html1, html2, profile: :content_focused)
|
|
1074
|
+
|
|
1075
|
+
# List all available profiles
|
|
1076
|
+
Canon::Comparison.available_profiles
|
|
1077
|
+
# => [:strict, :rendered, :html4, :html5, :spec_friendly, :content_only, :content_focused]
|
|
1078
|
+
----
|
|
1079
|
+
|
|
1080
|
+
**Available dimensions**:
|
|
1081
|
+
|
|
1082
|
+
* `text_content` - Text within elements/values
|
|
1083
|
+
* `structural_whitespace` - Whitespace between elements
|
|
1084
|
+
* `attribute_presence` - Which attributes exist
|
|
1085
|
+
* `attribute_order` - Order of attributes
|
|
1086
|
+
* `attribute_values` - Attribute value content
|
|
1087
|
+
* `element_position` - Position of elements
|
|
1088
|
+
* `comments` - Comment content and placement
|
|
1089
|
+
|
|
1090
|
+
**Behaviors for each dimension**:
|
|
1091
|
+
|
|
1092
|
+
* `:strict` - Must match exactly
|
|
1093
|
+
* `:normalize` - Match after normalization
|
|
1094
|
+
* `:ignore` - Don't compare
|
|
1095
|
+
|
|
1096
|
+
**Validation**:
|
|
1097
|
+
|
|
1098
|
+
The Profile DSL validates at definition time:
|
|
1099
|
+
|
|
1100
|
+
[source,ruby]
|
|
1101
|
+
----
|
|
1102
|
+
# This raises an error at definition time
|
|
1103
|
+
Canon::Comparison.define_profile(:invalid) do
|
|
1104
|
+
unknown_dimension :strict # => Error: Unknown dimension: unknown_dimension
|
|
1105
|
+
text_content :invalid_behavior # => Error: Invalid behavior 'invalid_behavior'
|
|
1106
|
+
end
|
|
1107
|
+
----
|
|
1108
|
+
|
|
1109
|
+
=== Dimension Classes
|
|
1110
|
+
|
|
1111
|
+
Behind the scenes, Canon uses dimension classes that encapsulate comparison logic:
|
|
1112
|
+
|
|
1113
|
+
[source,ruby]
|
|
1114
|
+
----
|
|
1115
|
+
# Each dimension knows how to extract and compare data
|
|
1116
|
+
dimension = Canon::Comparison::Dimensions::Registry.get(:text_content)
|
|
1117
|
+
|
|
1118
|
+
# Extract data from a node
|
|
1119
|
+
text = dimension.extract_data(node)
|
|
1120
|
+
|
|
1121
|
+
# Compare according to behavior
|
|
1122
|
+
dimension.equivalent?(node1, node2, :normalize)
|
|
1123
|
+
----
|
|
1124
|
+
|
|
1125
|
+
**Available dimension classes**:
|
|
1126
|
+
|
|
1127
|
+
* `TextContentDimension` - Text content comparison
|
|
1128
|
+
* `CommentsDimension` - Comment comparison
|
|
1129
|
+
* `AttributeValuesDimension` - Attribute values comparison
|
|
1130
|
+
* `AttributePresenceDimension` - Attribute presence comparison
|
|
1131
|
+
* `AttributeOrderDimension` - Attribute order comparison
|
|
1132
|
+
* `ElementPositionDimension` - Element position comparison
|
|
1133
|
+
* `StructuralWhitespaceDimension` - Structural whitespace comparison
|
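The general shape of such a dimension class can be sketched as a small base class plus a registry. Everything below is illustrative: `SketchDimensions` and its classes are stand-ins, nodes are represented as plain hashes, and the normalization rules are assumptions rather than Canon's actual logic. Only the `extract_data` and `equivalent?` method names follow the example above.

```ruby
module SketchDimensions
  # Shared behavior handling; subclasses say what data to extract.
  class Base
    def equivalent?(node1, node2, behavior)
      case behavior
      when :ignore then true
      when :strict then extract_data(node1) == extract_data(node2)
      when :normalize
        normalize(extract_data(node1)) == normalize(extract_data(node2))
      else
        raise ArgumentError, "unknown behavior: #{behavior.inspect}"
      end
    end

    def normalize(data)
      data
    end
  end

  class TextContent < Base
    def extract_data(node)
      node.fetch(:text, "")
    end

    # Assumed rule: trim, then collapse runs of whitespace.
    def normalize(data)
      data.strip.gsub(/\s+/, " ")
    end
  end

  class AttributeOrder < Base
    def extract_data(node)
      node.fetch(:attributes, {}).keys
    end

    # Assumed rule: order no longer matters once names are sorted.
    def normalize(data)
      data.sort
    end
  end

  REGISTRY = {
    text_content: TextContent.new,
    attribute_order: AttributeOrder.new
  }.freeze

  def self.get(name)
    REGISTRY.fetch(name) { raise ArgumentError, "unknown dimension: #{name.inspect}" }
  end
end

node1 = { text: "Hello   world", attributes: { "id" => "a", "class" => "b" } }
node2 = { text: " Hello world ", attributes: { "class" => "b", "id" => "a" } }

text = SketchDimensions.get(:text_content)
puts text.equivalent?(node1, node2, :strict)
puts text.equivalent?(node1, node2, :normalize)
```

Keeping the behavior switch in the base class means a new dimension only has to define what it extracts and, optionally, how it normalizes.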
|
1134
|
+
|
|
1135
|
+
=== Refactored Module Structure
|
|
1136
|
+
|
|
1137
|
+
Canon's internal modules have been reorganized for better separation of concerns:
|
|
1138
|
+
|
|
1139
|
+
* **XmlComparatorHelpers** - Node parsing, attribute comparison, namespace comparison
|
|
1140
|
+
* **DiffDetailFormatterHelpers** - Location extraction, node utilities, text utilities, dimension formatting
|
|
1141
|
+
* **Dimensions** - Reusable dimension classes for comparison
|
|
1142
|
+
|
|
1143
|
+
This refactoring improves:
|
|
1144
|
+
- **Maintainability** - Each module has a single responsibility
|
|
1145
|
+
- **Testability** - Modules can be tested independently
|
|
1146
|
+
- **Extensibility** - New dimensions/formatters can be added easily
|
|
1147
|
+
- **Code organization** - Related functionality is grouped together
|
|
1148
|
+
|
|
435
1149
|
== See also
|
|
436
1150
|
|
|
437
1151
|
* link:comparison-pipeline.adoc[Comparison Pipeline] - Complete 4-layer walkthrough
|
|
438
1152
|
* link:algorithms/[Algorithms] - DOM and Semantic algorithm details
|
|
1153
|
+
* link:../internals/[Internals] - Implementation details and data structures
|
|
1154
|
+
* link:../internals/diffnode-enrichment.adoc[DiffNode Enrichment] - How metadata flows from Layer 2 to Layer 4
|
|
439
1155
|
* link:../features/preprocessing/[Preprocessing options]
|
|
440
1156
|
* link:../features/match-options/[Match dimensions and profiles]
|
|
441
1157
|
* link:../features/match-options/algorithm-specific-behavior.adoc[Algorithm-Specific Behavior]
|