canon 0.1.8 → 0.1.9

Files changed (98)
  1. checksums.yaml +4 -4
  2. data/.rubocop_todo.yml +112 -25
  3. data/docs/Gemfile +1 -0
  4. data/docs/_config.yml +90 -1
  5. data/docs/advanced/diff-classification.adoc +82 -2
  6. data/docs/features/match-options/index.adoc +239 -1
  7. data/lib/canon/comparison/format_detector.rb +2 -1
  8. data/lib/canon/comparison/html_comparator.rb +19 -8
  9. data/lib/canon/comparison/html_compare_profile.rb +8 -2
  10. data/lib/canon/comparison/match_options/base_resolver.rb +7 -0
  11. data/lib/canon/comparison/whitespace_sensitivity.rb +208 -0
  12. data/lib/canon/comparison/xml_comparator/child_comparison.rb +15 -7
  13. data/lib/canon/comparison/xml_comparator/node_parser.rb +10 -5
  14. data/lib/canon/comparison/xml_comparator/node_type_comparator.rb +14 -7
  15. data/lib/canon/comparison/xml_comparator.rb +48 -23
  16. data/lib/canon/comparison/xml_node_comparison.rb +25 -3
  17. data/lib/canon/diff/diff_classifier.rb +101 -2
  18. data/lib/canon/diff/formatting_detector.rb +1 -1
  19. data/lib/canon/rspec_matchers.rb +37 -8
  20. data/lib/canon/version.rb +1 -1
  21. data/lib/canon/xml/data_model.rb +24 -13
  22. metadata +3 -78
  23. data/docs/plans/2025-01-17-html-parser-selection-fix.adoc +0 -250
  24. data/false_positive_analysis.txt +0 -0
  25. data/file1.html +0 -1
  26. data/file2.html +0 -1
  27. data/old-docs/ADVANCED_TOPICS.adoc +0 -20
  28. data/old-docs/BASIC_USAGE.adoc +0 -16
  29. data/old-docs/CHARACTER_VISUALIZATION.adoc +0 -567
  30. data/old-docs/CLI.adoc +0 -497
  31. data/old-docs/CUSTOMIZING_BEHAVIOR.adoc +0 -19
  32. data/old-docs/DIFF_ARCHITECTURE.adoc +0 -435
  33. data/old-docs/DIFF_FORMATTING.adoc +0 -540
  34. data/old-docs/DIFF_PARAMETERS.adoc +0 -261
  35. data/old-docs/DOM_DIFF.adoc +0 -1017
  36. data/old-docs/ENV_CONFIG.adoc +0 -876
  37. data/old-docs/FORMATS.adoc +0 -867
  38. data/old-docs/INPUT_VALIDATION.adoc +0 -477
  39. data/old-docs/MATCHER_BEHAVIOR.adoc +0 -90
  40. data/old-docs/MATCH_ARCHITECTURE.adoc +0 -463
  41. data/old-docs/MATCH_OPTIONS.adoc +0 -912
  42. data/old-docs/MODES.adoc +0 -432
  43. data/old-docs/NORMATIVE_INFORMATIVE_DIFFS.adoc +0 -219
  44. data/old-docs/OPTIONS.adoc +0 -1387
  45. data/old-docs/PREPROCESSING.adoc +0 -491
  46. data/old-docs/README.old.adoc +0 -2831
  47. data/old-docs/RSPEC.adoc +0 -814
  48. data/old-docs/RUBY_API.adoc +0 -485
  49. data/old-docs/SEMANTIC_DIFF_REPORT.adoc +0 -646
  50. data/old-docs/SEMANTIC_TREE_DIFF.adoc +0 -765
  51. data/old-docs/STRING_COMPARE.adoc +0 -345
  52. data/old-docs/TMP.adoc +0 -3384
  53. data/old-docs/TREE_DIFF.adoc +0 -1080
  54. data/old-docs/UNDERSTANDING_CANON.adoc +0 -17
  55. data/old-docs/VERBOSE.adoc +0 -482
  56. data/old-docs/VISUALIZATION_MAP.adoc +0 -625
  57. data/old-docs/WHITESPACE_TREATMENT.adoc +0 -1155
  58. data/scripts/analyze_current_state.rb +0 -85
  59. data/scripts/analyze_false_positives.rb +0 -114
  60. data/scripts/analyze_remaining_failures.rb +0 -105
  61. data/scripts/compare_current_failures.rb +0 -95
  62. data/scripts/compare_dom_tree_diff.rb +0 -158
  63. data/scripts/compare_failures.rb +0 -151
  64. data/scripts/debug_attribute_extraction.rb +0 -66
  65. data/scripts/debug_blocks_839.rb +0 -115
  66. data/scripts/debug_meta_matching.rb +0 -52
  67. data/scripts/debug_p_matching.rb +0 -192
  68. data/scripts/debug_signature_matching.rb +0 -118
  69. data/scripts/debug_sourcecode_124.rb +0 -32
  70. data/scripts/debug_whitespace_sensitive.rb +0 -192
  71. data/scripts/extract_false_positives.rb +0 -138
  72. data/scripts/find_actual_false_positives.rb +0 -125
  73. data/scripts/investigate_all_false_positives.rb +0 -161
  74. data/scripts/investigate_batch1.rb +0 -127
  75. data/scripts/investigate_classification.rb +0 -150
  76. data/scripts/investigate_classification_detailed.rb +0 -190
  77. data/scripts/investigate_common_failures.rb +0 -342
  78. data/scripts/investigate_false_negative.rb +0 -80
  79. data/scripts/investigate_false_positive.rb +0 -83
  80. data/scripts/investigate_false_positives.rb +0 -227
  81. data/scripts/investigate_false_positives_batch.rb +0 -163
  82. data/scripts/investigate_mixed_content.rb +0 -125
  83. data/scripts/investigate_remaining_16.rb +0 -214
  84. data/scripts/run_single_test.rb +0 -29
  85. data/scripts/test_all_false_positives.rb +0 -95
  86. data/scripts/test_attribute_details.rb +0 -61
  87. data/scripts/test_both_algorithms.rb +0 -49
  88. data/scripts/test_both_simple.rb +0 -49
  89. data/scripts/test_enhanced_semantic_output.rb +0 -125
  90. data/scripts/test_readme_examples.rb +0 -131
  91. data/scripts/test_semantic_tree_diff.rb +0 -99
  92. data/scripts/test_semantic_ux_improvements.rb +0 -135
  93. data/scripts/test_single_false_positive.rb +0 -119
  94. data/scripts/test_size_limits.rb +0 -99
  95. data/test_html_1.html +0 -21
  96. data/test_html_2.html +0 -21
  97. data/test_nokogiri.rb +0 -33
  98. data/test_normalize.rb +0 -45
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 1d94be550a90d23eb695f46579b13fa327434993b89a995b3c95ba658a143fb9
4
- data.tar.gz: 2ac083712aa9d0153aa2e0898186a7cbf669e775368378c9de8de6c42c52a257
3
+ metadata.gz: 24f79ae4b9b6817104e388a5bef96d24677f797db5d13bd6f009a26a04170137
4
+ data.tar.gz: 3b8260af8e2157f2f449421b3d40649521ba64495a15f7667e64c3e343b6d3b7
5
5
  SHA512:
6
- metadata.gz: 45b4502c83bfd367c5933e66f610de529b87f27d5cfef18c0fe808bcbc91e20c1c3c0e32ba36ed6163a70de37c5ae3a1023c67fed586491e7eea5f7c621e2769
7
- data.tar.gz: 3713856d8dfdfb4164cfd9e8bdcd4dcbdce7b303619d703b7defe58d09c5468eb3acc5f32435f3184d35f20b3a6f7470c2960d8e5cef21ba720ebd7dc44ccfbf
6
+ metadata.gz: 971daa53fd96c5c46b5c37c2175f12875e7e36f658cc4186848f1df90ab3db9ceff06af69320e5004d1da1d6b2dc4b35d800c5aca1b1522ac05cf14c73025c21
7
+ data.tar.gz: 02f5160a42bf651db2a252909966cbc7dea43239cbbae2a53d155c55c4e09709eddc34e599496bfc5cda14705931f24efbebd267b9acc8700b834fb859d8096f
data/.rubocop_todo.yml CHANGED
@@ -1,6 +1,6 @@
1
1
  # This configuration was generated by
2
2
  # `rubocop --auto-gen-config`
3
- # on 2026-01-17 14:46:16 UTC using RuboCop version 1.81.7.
3
+ # on 2026-01-20 02:18:38 UTC using RuboCop version 1.81.7.
4
4
  # The point is for the user to remove these configuration records
5
5
  # one by one as the offenses are removed from the code base.
6
6
  # Note that changes in the inspected code, or installation of new
@@ -12,27 +12,70 @@ Gemspec/RequiredRubyVersion:
12
12
  Exclude:
13
13
  - 'canon.gemspec'
14
14
 
15
+ # Offense count: 2
16
+ # This cop supports safe autocorrection (--autocorrect).
17
+ # Configuration parameters: EnforcedStyleAlignWith.
18
+ # SupportedStylesAlignWith: either, start_of_block, start_of_line
19
+ Layout/BlockAlignment:
20
+ Exclude:
21
+ - 'spec/canon/rspec_matchers_spec.rb'
22
+
23
+ # Offense count: 2
24
+ # This cop supports safe autocorrection (--autocorrect).
25
+ Layout/BlockEndNewline:
26
+ Exclude:
27
+ - 'spec/canon/rspec_matchers_spec.rb'
28
+
29
+ # Offense count: 2
30
+ # This cop supports safe autocorrection (--autocorrect).
31
+ # Configuration parameters: AllowForAlignment.
32
+ Layout/CommentIndentation:
33
+ Exclude:
34
+ - 'lib/canon/comparison/xml_comparator.rb'
35
+
15
36
  # Offense count: 1
16
37
  # This cop supports safe autocorrection (--autocorrect).
17
- # Configuration parameters: EnforcedStyle, IndentationWidth.
18
- # SupportedStyles: with_first_argument, with_fixed_indentation
19
- Layout/ArgumentAlignment:
38
+ Layout/ElseAlignment:
20
39
  Exclude:
21
- - 'lib/canon/comparison.rb'
40
+ - 'lib/canon/comparison/xml_comparator.rb'
41
+
42
+ # Offense count: 1
43
+ # This cop supports safe autocorrection (--autocorrect).
44
+ # Configuration parameters: EnforcedStyleAlignWith, Severity.
45
+ # SupportedStylesAlignWith: keyword, variable, start_of_line
46
+ Layout/EndAlignment:
47
+ Exclude:
48
+ - 'lib/canon/comparison/xml_comparator.rb'
49
+
50
+ # Offense count: 1
51
+ # This cop supports safe autocorrection (--autocorrect).
52
+ # Configuration parameters: EnforcedStyle.
53
+ # SupportedStyles: normal, indented_internal_methods
54
+ Layout/IndentationConsistency:
55
+ Exclude:
56
+ - 'lib/canon/comparison/xml_comparator.rb'
22
57
 
23
- # Offense count: 697
58
+ # Offense count: 4
59
+ # This cop supports safe autocorrection (--autocorrect).
60
+ # Configuration parameters: Width, AllowedPatterns.
61
+ Layout/IndentationWidth:
62
+ Exclude:
63
+ - 'spec/canon/rspec_matchers_spec.rb'
64
+
65
+ # Offense count: 655
24
66
  # This cop supports safe autocorrection (--autocorrect).
25
67
  # Configuration parameters: Max, AllowHeredoc, AllowURI, AllowQualifiedName, URISchemes, IgnoreCopDirectives, AllowedPatterns, SplitStrings.
26
68
  # URISchemes: http, https
27
69
  Layout/LineLength:
28
70
  Enabled: false
29
71
 
30
- # Offense count: 1
72
+ # Offense count: 3
31
73
  # This cop supports safe autocorrection (--autocorrect).
32
- # Configuration parameters: AllowInHeredoc.
33
- Layout/TrailingWhitespace:
74
+ # Configuration parameters: EnforcedStyle, IndentationWidth.
75
+ # SupportedStyles: aligned, indented
76
+ Layout/MultilineOperationIndentation:
34
77
  Exclude:
35
- - 'lib/canon/comparison.rb'
78
+ - 'lib/canon/diff/diff_classifier.rb'
36
79
 
37
80
  # Offense count: 48
38
81
  # Configuration parameters: IgnoreLiteralBranches, IgnoreConstantBranches, IgnoreDuplicateElseBranch.
@@ -74,38 +117,38 @@ Lint/UnusedMethodArgument:
74
117
  - 'lib/canon/diff_formatter/by_line/xml_formatter.rb'
75
118
  - 'lib/canon/diff_formatter/by_object/base_formatter.rb'
76
119
 
77
- # Offense count: 225
120
+ # Offense count: 194
78
121
  # Configuration parameters: AllowedMethods, AllowedPatterns, CountRepeatedAttributes, Max.
79
122
  Metrics/AbcSize:
80
123
  Enabled: false
81
124
 
82
- # Offense count: 27
125
+ # Offense count: 20
83
126
  # Configuration parameters: CountComments, CountAsOne, AllowedMethods, AllowedPatterns, inherit_mode.
84
127
  # AllowedMethods: refine
85
128
  Metrics/BlockLength:
86
129
  Max: 84
87
130
 
88
- # Offense count: 178
131
+ # Offense count: 164
89
132
  # Configuration parameters: AllowedMethods, AllowedPatterns, Max.
90
133
  Metrics/CyclomaticComplexity:
91
134
  Enabled: false
92
135
 
93
- # Offense count: 376
136
+ # Offense count: 346
94
137
  # Configuration parameters: CountComments, CountAsOne, AllowedMethods, AllowedPatterns.
95
138
  Metrics/MethodLength:
96
139
  Max: 110
97
140
 
98
- # Offense count: 39
141
+ # Offense count: 45
99
142
  # Configuration parameters: CountKeywordArgs, MaxOptionalParameters.
100
143
  Metrics/ParameterLists:
101
144
  Max: 9
102
145
 
103
- # Offense count: 143
146
+ # Offense count: 131
104
147
  # Configuration parameters: AllowedMethods, AllowedPatterns, Max.
105
148
  Metrics/PerceivedComplexity:
106
149
  Enabled: false
107
150
 
108
- # Offense count: 29
151
+ # Offense count: 28
109
152
  # Configuration parameters: MinNameLength, AllowNamesEndingInNumbers, AllowedNames, ForbiddenNames.
110
153
  # AllowedNames: as, at, by, cc, db, id, if, in, io, ip, of, on, os, pp, to
111
154
  Naming/MethodParameterName:
@@ -113,7 +156,6 @@ Naming/MethodParameterName:
113
156
  - 'lib/canon/comparison/xml_comparator.rb'
114
157
  - 'lib/canon/comparison/xml_comparator/attribute_comparator.rb'
115
158
  - 'lib/canon/xml/namespace_handler.rb'
116
- - 'scripts/investigate_all_false_positives.rb'
117
159
 
118
160
  # Offense count: 1
119
161
  # Configuration parameters: NamePrefix, ForbiddenPrefixes, AllowedMethods, MethodDefinitionMacros, UseSorbetSigs.
@@ -140,7 +182,7 @@ Performance/CollectionLiteralInLoop:
140
182
  - 'lib/canon/comparison/html_comparator.rb'
141
183
  - 'lib/canon/xml/xml_base_handler.rb'
142
184
 
143
- # Offense count: 62
185
+ # Offense count: 64
144
186
  # Configuration parameters: Prefixes, AllowedPatterns.
145
187
  # Prefixes: when, with, without
146
188
  RSpec/ContextWording:
@@ -157,7 +199,7 @@ RSpec/DescribeMethod:
157
199
  - 'spec/canon/comparison/multiple_differences_spec.rb'
158
200
  - 'spec/canon/diff_formatter/character_map_customization_spec.rb'
159
201
 
160
- # Offense count: 624
202
+ # Offense count: 663
161
203
  # Configuration parameters: CountAsOne.
162
204
  RSpec/ExampleLength:
163
205
  Max: 67
@@ -171,7 +213,7 @@ RSpec/ExpectActual:
171
213
  - 'spec/canon/rspec_matchers_spec.rb'
172
214
  - 'spec/canon/string_matcher_spec.rb'
173
215
 
174
- # Offense count: 171
216
+ # Offense count: 175
175
217
  # Configuration parameters: Max, AllowedIdentifiers, AllowedPatterns.
176
218
  RSpec/IndexedLet:
177
219
  Exclude:
@@ -212,7 +254,7 @@ RSpec/MultipleDescribes:
212
254
  RSpec/MultipleExpectations:
213
255
  Max: 15
214
256
 
215
- # Offense count: 66
257
+ # Offense count: 69
216
258
  # Configuration parameters: AllowSubject.
217
259
  RSpec/MultipleMemoizedHelpers:
218
260
  Max: 13
@@ -226,7 +268,7 @@ RSpec/NamedSubject:
226
268
  - 'spec/canon/pretty_printer/json_spec.rb'
227
269
  - 'spec/canon/pretty_printer/xml_spec.rb'
228
270
 
229
- # Offense count: 30
271
+ # Offense count: 37
230
272
  # Configuration parameters: AllowedGroups.
231
273
  RSpec/NestedGroups:
232
274
  Max: 4
@@ -254,14 +296,34 @@ RSpec/SpecFilePathFormat:
254
296
  - 'spec/canon/yaml/formatter_spec.rb'
255
297
  - 'spec/xml_c14n_spec.rb'
256
298
 
257
- # Offense count: 94
299
+ # Offense count: 95
258
300
  # Configuration parameters: IgnoreNameless, IgnoreSymbolicNames.
259
301
  RSpec/VerifiedDoubles:
260
302
  Exclude:
303
+ - 'spec/canon/comparison/whitespace_sensitivity_spec.rb'
261
304
  - 'spec/canon/diff/diff_classifier_spec.rb'
262
305
  - 'spec/canon/diff/path_builder_spec.rb'
263
306
  - 'spec/canon/tree_diff/operation_converter_spec.rb'
264
307
 
308
+ # Offense count: 3
309
+ # This cop supports safe autocorrection (--autocorrect).
310
+ # Configuration parameters: EnforcedStyle, ProceduralMethods, FunctionalMethods, AllowedMethods, AllowedPatterns, AllowBracesOnProceduralOneLiners, BracesRequiredMethods.
311
+ # SupportedStyles: line_count_based, semantic, braces_for_chaining, always_braces
312
+ # ProceduralMethods: benchmark, bm, bmbm, create, each_with_object, measure, new, realtime, tap, with_object
313
+ # FunctionalMethods: let, let!, subject, watch
314
+ # AllowedMethods: lambda, proc, it
315
+ Style/BlockDelimiters:
316
+ Exclude:
317
+ - 'spec/canon/rspec_matchers_spec.rb'
318
+
319
+ # Offense count: 1
320
+ # This cop supports safe autocorrection (--autocorrect).
321
+ # Configuration parameters: EnforcedStyle, AllowComments.
322
+ # SupportedStyles: empty, nil, both
323
+ Style/EmptyElse:
324
+ Exclude:
325
+ - 'lib/canon/comparison/xml_comparator.rb'
326
+
265
327
  # Offense count: 3
266
328
  # Configuration parameters: MinBranchesCount.
267
329
  Style/HashLikeCase:
@@ -269,10 +331,11 @@ Style/HashLikeCase:
269
331
  - 'lib/canon/diff/diff_block_builder.rb'
270
332
  - 'lib/canon/xml/character_encoder.rb'
271
333
 
272
- # Offense count: 4
334
+ # Offense count: 6
273
335
  # This cop supports unsafe autocorrection (--autocorrect-all).
274
336
  Style/IdenticalConditionalBranches:
275
337
  Exclude:
338
+ - 'lib/canon/comparison/xml_comparator.rb'
276
339
  - 'lib/canon/diff_formatter/by_object/base_formatter.rb'
277
340
  - 'lib/canon/diff_formatter/legend.rb'
278
341
 
@@ -282,3 +345,27 @@ Style/IdenticalConditionalBranches:
282
345
  Style/OptionalBooleanParameter:
283
346
  Exclude:
284
347
  - 'lib/canon/diff_formatter/debug_output.rb'
348
+
349
+ # Offense count: 6
350
+ # This cop supports safe autocorrection (--autocorrect).
351
+ # Configuration parameters: EnforcedStyle, ConsistentQuotesInMultiline.
352
+ # SupportedStyles: single_quotes, double_quotes
353
+ Style/StringLiterals:
354
+ Exclude:
355
+ - 'spec/canon/rspec_matchers_spec.rb'
356
+
357
+ # Offense count: 5
358
+ # This cop supports safe autocorrection (--autocorrect).
359
+ # Configuration parameters: EnforcedStyleForMultiline.
360
+ # SupportedStylesForMultiline: comma, consistent_comma, diff_comma, no_comma
361
+ Style/TrailingCommaInArguments:
362
+ Exclude:
363
+ - 'spec/canon/rspec_matchers_spec.rb'
364
+
365
+ # Offense count: 3
366
+ # This cop supports safe autocorrection (--autocorrect).
367
+ # Configuration parameters: EnforcedStyleForMultiline.
368
+ # SupportedStylesForMultiline: comma, consistent_comma, diff_comma, no_comma
369
+ Style/TrailingCommaInHashLiteral:
370
+ Exclude:
371
+ - 'spec/canon/rspec_matchers_spec.rb'
data/docs/Gemfile CHANGED
@@ -6,4 +6,5 @@ gem "just-the-docs"
6
6
 
7
7
  group :jekyll_plugins do
8
8
  gem "jekyll-seo-tag"
9
+ gem "jekyll-sitemap"
9
10
  end
data/docs/_config.yml CHANGED
@@ -12,8 +12,13 @@ repository: lutaml/canon
12
12
 
13
13
  # Theme
14
14
  theme: just-the-docs
15
+ remote_theme: just-the-docs/just-the-docs@v0.7.0
15
16
  color_scheme: light
16
17
 
18
+ # Logo (uncomment if you have a logo)
19
+ # logo: "/assets/images/logo.svg"
20
+ # favicon_ico: "/assets/images/favicon.ico"
21
+
17
22
  # AsciiDoc support
18
23
  asciidoc: {}
19
24
  asciidoctor:
@@ -63,10 +68,36 @@ heading_anchors: true
63
68
  # Footer
64
69
  footer_content: 'Copyright &copy; 2025 Ribose. Distributed under the <a href="https://github.com/lutaml/canon/blob/main/LICENSE.txt">BSD 2-Clause License</a>.'
65
70
 
71
+ # Footer last edit timestamp
72
+ last_edit_timestamp: true
73
+ last_edit_time_format: "%b %e %Y at %I:%M %p"
74
+
75
+ # Enable code copy button
76
+ enable_copy_code_button: true
77
+
78
+ # Callouts
79
+ callouts_level: quiet
80
+ callouts:
81
+ highlight:
82
+ color: yellow
83
+ important:
84
+ title: Important
85
+ color: blue
86
+ new:
87
+ title: New
88
+ color: green
89
+ note:
90
+ title: Note
91
+ color: purple
92
+ warning:
93
+ title: Warning
94
+ color: red
95
+
66
96
  # Plugins
67
97
  plugins:
68
98
  - jekyll-asciidoc
69
99
  - jekyll-seo-tag
100
+ - jekyll-sitemap
70
101
 
71
102
  # Markdown settings (for any markdown files)
72
103
  markdown: kramdown
@@ -75,6 +106,60 @@ kramdown:
75
106
  hard_wrap: false
76
107
  syntax_highlighter: rouge
77
108
 
109
+ # Collections for organizing content
110
+ collections:
111
+ # Core documentation pages (getting-started, interfaces, etc.)
112
+ pages:
113
+ permalink: "/:path/"
114
+ output: true
115
+
116
+ # Feature documentation
117
+ features:
118
+ permalink: "/:collection/:path/"
119
+ output: true
120
+
121
+ # Understanding/internal documentation
122
+ understanding:
123
+ permalink: "/:collection/:path/"
124
+ output: true
125
+
126
+ # Advanced topics
127
+ advanced:
128
+ permalink: "/:collection/:path/"
129
+ output: true
130
+
131
+ # Guides (task-oriented tutorials)
132
+ guides:
133
+ permalink: "/:collection/:path/"
134
+ output: true
135
+
136
+ # Reference documentation
137
+ reference:
138
+ permalink: "/:collection/:path/"
139
+ output: true
140
+
141
+ # Just the Docs collection configuration
142
+ just_the_docs:
143
+ collections:
144
+ pages:
145
+ name: Pages
146
+ nav_fold: false
147
+ features:
148
+ name: Features
149
+ nav_fold: true
150
+ understanding:
151
+ name: Understanding
152
+ nav_fold: true
153
+ advanced:
154
+ name: Advanced
155
+ nav_fold: true
156
+ guides:
157
+ name: Guides
158
+ nav_fold: true
159
+ reference:
160
+ name: Reference
161
+ nav_fold: true
162
+
78
163
  # Defaults
79
164
  defaults:
80
165
  - scope:
@@ -83,6 +168,10 @@ defaults:
83
168
  values:
84
169
  layout: default
85
170
 
171
+ # Include additional files
172
+ include:
173
+ - "*.adoc"
174
+
86
175
  # Exclude from processing
87
176
  exclude:
88
177
  - Gemfile
@@ -97,4 +186,4 @@ exclude:
97
186
  - .git
98
187
  - .gitignore
99
188
 
100
- permalink: pretty
189
+ permalink: pretty
data/docs/advanced/diff-classification.adoc CHANGED
@@ -229,6 +229,69 @@ result = Canon::Comparison.equivalent?(
229
229
  ----
230
230
  ====
231
231
 
232
+ ==== Text Content
233
+
234
+ * **`:strict` behavior** → Normative
235
+ - Text must match exactly, including all whitespace
236
+ - Any text difference causes non-equivalence
237
+
238
+ * **`:normalize` behavior** → Normative (after normalization) or Informative (if formatting-only)
239
+ - Whitespace is normalized (collapsed/trimmed) before comparison
240
+ - If normalized texts match but originals differ, classified as formatting-only (informative)
241
+ - This ensures that whitespace-only differences don't affect equivalence
242
+ - Element-level sensitivity is respected (e.g., `<pre>`, `<code>` preserve whitespace)
243
+
244
+ * **`:ignore` behavior** → Informative
245
+ - Text content differences tracked but don't affect equivalence
246
+
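The three `text_content` behaviors listed above can be sketched in plain Ruby. This is an illustrative approximation only, not Canon's implementation: `normalize_ws` and `classify_text` are hypothetical helper names, and the normalization rule (collapse whitespace runs, trim both ends) is an assumption based on the "collapsed/trimmed" wording in the added documentation.

```ruby
# Hypothetical helpers for illustration; these are not part of Canon's API.

# Collapse every run of whitespace to a single space and trim both ends.
def normalize_ws(text)
  text.gsub(/\s+/, " ").strip
end

# Classify a pair of text values under a text_content behavior.
# Returns :equal, :informative, or :normative.
def classify_text(left, right, behavior)
  return :equal if left == right

  case behavior
  when :strict
    # Any difference, including whitespace, affects equivalence.
    :normative
  when :normalize
    # Whitespace-only differences are formatting-only (informative);
    # anything that survives normalization is normative.
    normalize_ws(left) == normalize_ws(right) ? :informative : :normative
  when :ignore
    # Differences are tracked but never affect equivalence.
    :informative
  else
    raise ArgumentError, "unknown behavior: #{behavior.inspect}"
  end
end

puts classify_text("Hello world", "Hello   world", :strict)    # normative
puts classify_text("Hello world", "Hello   world", :normalize) # informative
puts classify_text("Hello world", "Goodbye", :normalize)       # normative
puts classify_text("Hello world", "Goodbye", :ignore)          # informative
```

The sketch deliberately leaves out element-level sensitivity, which the documentation treats as a separate check applied before the formatting-only test.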
247
+ .Example: Text content with normalize behavior
248
+ ====
249
+ [source,ruby]
250
+ ----
251
+ # Formatting-only difference - normalized texts match
252
+ xml1 = '<p>Hello world</p>'
253
+ xml2 = '<p>Hello  world</p>'
254
+
255
+ result = Canon::Comparison.equivalent?(
256
+ xml1, xml2,
257
+ match: { text_content: :normalize }
258
+ )
259
+ # => true (extra space is formatting-only, classified as informative)
260
+
261
+ # Shows as informative in verbose output
262
+ result.differences.first.normative?
263
+ # => false
264
+ result.differences.first.formatting?
265
+ # => true
266
+ ----
267
+
268
+ .Using text_content: :normalize with element-level sensitivity
269
+ ====
270
+ [source,ruby]
271
+ ----
272
+ # HTML defaults: <code> is whitespace-sensitive
273
+ html1 = '<code> indented </code><p> text </p>'
274
+ html2 = '<code>indented</code><p>text</p>'
275
+
276
+ # With <code> blacklisted from sensitive elements
277
+ Canon::Comparison.equivalent?(html1, html2,
278
+ format: :html,
279
+ match: {
280
+ whitespace_insensitive_elements: [:code],
281
+ }
282
+ )
283
+ # => true
284
+ # - <code> whitespace: formatting-only (informative)
285
+ # - <p> whitespace: formatting-only (informative)
286
+
287
+ # Without blacklisting (default HTML behavior)
288
+ Canon::Comparison.equivalent?(html1, html2, format: :html)
289
+ # => false
290
+ # - <code> whitespace: normative (sensitive element)
291
+ # - <p> whitespace: formatting-only (informative)
292
+ ----
293
+ ====
294
+
232
295
  === FormattingDetector Integration
233
296
 
234
297
  For dimensions that support it (`:text_content`, `:structural_whitespace`),
@@ -262,12 +325,23 @@ The [`CompareProfile`](../../lib/canon/comparison/compare_profile.rb) class prov
262
325
  * `affects_equivalence?(dimension)` - Does this dimension affect equivalence?
263
326
  * `supports_formatting_detection?(dimension)` - Can this dimension have formatting-only diffs?
264
327
 
265
- The [`DiffClassifier`](../../lib/canon/diff/diff_classifier.rb) uses CompareProfile to classify:
328
+ The [`DiffClassifier`](../../lib/canon/diff/diff_classifier.rb) uses CompareProfile to classify differences, with special handling for `text_content: :normalize`:
266
329
 
267
330
  [source,ruby]
268
331
  ----
269
332
  def classify(diff_node)
270
- # Check normative status based on policy
333
+ # SPECIAL CASE: text_content with :normalize behavior
334
+ # Formatting-only differences (whitespace-only) are marked as non-normative
335
+ if diff_node.dimension == :text_content &&
336
+ profile.send(:behavior_for, :text_content) == :normalize &&
337
+ !inside_whitespace_sensitive_element?(diff_node) &&
338
+ formatting_only_diff?(diff_node)
339
+ diff_node.formatting = true
340
+ diff_node.normative = false
341
+ return diff_node
342
+ end
343
+
344
+ # Standard classification flow
271
345
  is_normative = profile.normative_dimension?(diff_node.dimension)
272
346
 
273
347
  # Only check formatting for non-normative dimensions
@@ -284,6 +358,12 @@ def classify(diff_node)
284
358
  end
285
359
  ----
286
360
 
361
+ The key distinction for `text_content: :normalize`:
362
+
363
+ * **Formatting-only detection**: Uses `normalized_equivalent?` method to compare normalized texts
364
+ * **Element sensitivity**: Respects element-level whitespace sensitivity (`<pre>`, `<code>`, etc.)
365
+ * **Result**: Whitespace-only differences are classified as *informative* (non-normative) when using `:normalize`
366
+
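The interaction between formatting-only detection and element sensitivity described above can be sketched as follows. This is a minimal stand-alone illustration under stated assumptions, not Canon's code: nodes are modeled as plain hashes with `:name` and `:parent` keys, the default sensitive set (`pre`, `code`, `textarea`) is assumed for the example, and every method name here is hypothetical.

```ruby
# Hypothetical sketch; node shape, element list, and method names are assumptions.

# Assumed default set of whitespace-sensitive elements (illustrative only).
def default_sensitive_elements
  %w[pre code textarea]
end

# True when the node or any ancestor is a whitespace-sensitive element
# that has not been explicitly listed as insensitive.
def inside_sensitive_element?(node, insensitive: [])
  excluded = insensitive.map(&:to_s)
  while node
    name = node[:name].to_s
    return true if default_sensitive_elements.include?(name) && !excluded.include?(name)
    node = node[:parent]
  end
  false
end

# True when the two texts differ only in whitespace.
def formatting_only?(left, right)
  left != right && left.split == right.split
end

# A text difference is informative only if it is formatting-only AND
# it does not sit inside a whitespace-sensitive element.
def informative_text_diff?(node, left, right, insensitive: [])
  !inside_sensitive_element?(node, insensitive: insensitive) &&
    formatting_only?(left, right)
end

body = { name: "body", parent: nil }
code = { name: "code", parent: body }
para = { name: "p", parent: body }

puts informative_text_diff?(code, " indented ", "indented")                        # false
puts informative_text_diff?(code, " indented ", "indented", insensitive: [:code])  # true
puts informative_text_diff?(para, " text ", "text")                                # true
```

Checking sensitivity before the formatting-only test mirrors the order of the conditions in the `classify` excerpt, so a whitespace difference inside a sensitive element falls through to the standard classification flow.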
287
367
  == Visual Indicators
288
368
 
289
369
  === Normative Diffs