canon 0.1.8 → 0.1.9

Files changed (98)
  1. checksums.yaml +4 -4
  2. data/.rubocop_todo.yml +112 -25
  3. data/docs/Gemfile +1 -0
  4. data/docs/_config.yml +90 -1
  5. data/docs/advanced/diff-classification.adoc +82 -2
  6. data/docs/features/match-options/index.adoc +239 -1
  7. data/lib/canon/comparison/format_detector.rb +2 -1
  8. data/lib/canon/comparison/html_comparator.rb +19 -8
  9. data/lib/canon/comparison/html_compare_profile.rb +8 -2
  10. data/lib/canon/comparison/match_options/base_resolver.rb +7 -0
  11. data/lib/canon/comparison/whitespace_sensitivity.rb +208 -0
  12. data/lib/canon/comparison/xml_comparator/child_comparison.rb +15 -7
  13. data/lib/canon/comparison/xml_comparator/node_parser.rb +10 -5
  14. data/lib/canon/comparison/xml_comparator/node_type_comparator.rb +14 -7
  15. data/lib/canon/comparison/xml_comparator.rb +48 -23
  16. data/lib/canon/comparison/xml_node_comparison.rb +25 -3
  17. data/lib/canon/diff/diff_classifier.rb +101 -2
  18. data/lib/canon/diff/formatting_detector.rb +1 -1
  19. data/lib/canon/rspec_matchers.rb +37 -8
  20. data/lib/canon/version.rb +1 -1
  21. data/lib/canon/xml/data_model.rb +24 -13
  22. metadata +3 -78
  23. data/docs/plans/2025-01-17-html-parser-selection-fix.adoc +0 -250
  24. data/false_positive_analysis.txt +0 -0
  25. data/file1.html +0 -1
  26. data/file2.html +0 -1
  27. data/old-docs/ADVANCED_TOPICS.adoc +0 -20
  28. data/old-docs/BASIC_USAGE.adoc +0 -16
  29. data/old-docs/CHARACTER_VISUALIZATION.adoc +0 -567
  30. data/old-docs/CLI.adoc +0 -497
  31. data/old-docs/CUSTOMIZING_BEHAVIOR.adoc +0 -19
  32. data/old-docs/DIFF_ARCHITECTURE.adoc +0 -435
  33. data/old-docs/DIFF_FORMATTING.adoc +0 -540
  34. data/old-docs/DIFF_PARAMETERS.adoc +0 -261
  35. data/old-docs/DOM_DIFF.adoc +0 -1017
  36. data/old-docs/ENV_CONFIG.adoc +0 -876
  37. data/old-docs/FORMATS.adoc +0 -867
  38. data/old-docs/INPUT_VALIDATION.adoc +0 -477
  39. data/old-docs/MATCHER_BEHAVIOR.adoc +0 -90
  40. data/old-docs/MATCH_ARCHITECTURE.adoc +0 -463
  41. data/old-docs/MATCH_OPTIONS.adoc +0 -912
  42. data/old-docs/MODES.adoc +0 -432
  43. data/old-docs/NORMATIVE_INFORMATIVE_DIFFS.adoc +0 -219
  44. data/old-docs/OPTIONS.adoc +0 -1387
  45. data/old-docs/PREPROCESSING.adoc +0 -491
  46. data/old-docs/README.old.adoc +0 -2831
  47. data/old-docs/RSPEC.adoc +0 -814
  48. data/old-docs/RUBY_API.adoc +0 -485
  49. data/old-docs/SEMANTIC_DIFF_REPORT.adoc +0 -646
  50. data/old-docs/SEMANTIC_TREE_DIFF.adoc +0 -765
  51. data/old-docs/STRING_COMPARE.adoc +0 -345
  52. data/old-docs/TMP.adoc +0 -3384
  53. data/old-docs/TREE_DIFF.adoc +0 -1080
  54. data/old-docs/UNDERSTANDING_CANON.adoc +0 -17
  55. data/old-docs/VERBOSE.adoc +0 -482
  56. data/old-docs/VISUALIZATION_MAP.adoc +0 -625
  57. data/old-docs/WHITESPACE_TREATMENT.adoc +0 -1155
  58. data/scripts/analyze_current_state.rb +0 -85
  59. data/scripts/analyze_false_positives.rb +0 -114
  60. data/scripts/analyze_remaining_failures.rb +0 -105
  61. data/scripts/compare_current_failures.rb +0 -95
  62. data/scripts/compare_dom_tree_diff.rb +0 -158
  63. data/scripts/compare_failures.rb +0 -151
  64. data/scripts/debug_attribute_extraction.rb +0 -66
  65. data/scripts/debug_blocks_839.rb +0 -115
  66. data/scripts/debug_meta_matching.rb +0 -52
  67. data/scripts/debug_p_matching.rb +0 -192
  68. data/scripts/debug_signature_matching.rb +0 -118
  69. data/scripts/debug_sourcecode_124.rb +0 -32
  70. data/scripts/debug_whitespace_sensitive.rb +0 -192
  71. data/scripts/extract_false_positives.rb +0 -138
  72. data/scripts/find_actual_false_positives.rb +0 -125
  73. data/scripts/investigate_all_false_positives.rb +0 -161
  74. data/scripts/investigate_batch1.rb +0 -127
  75. data/scripts/investigate_classification.rb +0 -150
  76. data/scripts/investigate_classification_detailed.rb +0 -190
  77. data/scripts/investigate_common_failures.rb +0 -342
  78. data/scripts/investigate_false_negative.rb +0 -80
  79. data/scripts/investigate_false_positive.rb +0 -83
  80. data/scripts/investigate_false_positives.rb +0 -227
  81. data/scripts/investigate_false_positives_batch.rb +0 -163
  82. data/scripts/investigate_mixed_content.rb +0 -125
  83. data/scripts/investigate_remaining_16.rb +0 -214
  84. data/scripts/run_single_test.rb +0 -29
  85. data/scripts/test_all_false_positives.rb +0 -95
  86. data/scripts/test_attribute_details.rb +0 -61
  87. data/scripts/test_both_algorithms.rb +0 -49
  88. data/scripts/test_both_simple.rb +0 -49
  89. data/scripts/test_enhanced_semantic_output.rb +0 -125
  90. data/scripts/test_readme_examples.rb +0 -131
  91. data/scripts/test_semantic_tree_diff.rb +0 -99
  92. data/scripts/test_semantic_ux_improvements.rb +0 -135
  93. data/scripts/test_single_false_positive.rb +0 -119
  94. data/scripts/test_size_limits.rb +0 -99
  95. data/test_html_1.html +0 -21
  96. data/test_html_2.html +0 -21
  97. data/test_nokogiri.rb +0 -33
  98. data/test_normalize.rb +0 -45
@@ -1,345 +0,0 @@
1
- = String Comparison in Canon
2
- :toc:
3
- :toclevels: 3
4
-
5
- == General
6
-
7
- Canon provides advanced string comparison capabilities with character-level visualization and diff rendering. This feature is particularly useful for:
8
-
9
- * Comparing strings with invisible whitespace differences (spaces, tabs, trailing newlines)
10
- * Detecting Unicode differences (non-breaking spaces, zero-width characters, etc.)
11
- * Visualizing multi-line string differences with context
12
- * Identifying trailing newline differences in text output
13
-
14
- == Character Visualization
15
-
16
- Canon automatically visualizes invisible and special characters to make differences clear:
17
-
18
- [cols="2,3,2",options="header"]
19
- |===
20
- |Character
21
- |Description
22
- |Visualization
23
-
24
- |Space (U+0020)
25
- |Regular space
26
- |`░`
27
-
28
- |Tab (U+0009)
29
- |Tab character
30
- |`⇥`
31
-
32
- |Non-breaking space (U+00A0)
33
- |Non-breaking space
34
- |`␣`
35
-
36
- |Zero-width space (U+200B)
37
- |Zero-width space
38
- |`→`
39
-
40
- |Other invisible characters
41
- |Various Unicode invisibles
42
- |See Unicode legend in output
43
- |===
44
-
45
- == Usage in RSpec
46
-
47
- === Auto-detection with `be_equivalent_to`
48
-
49
- The `be_equivalent_to` matcher automatically detects the format (XML, JSON, YAML, or string) and uses the appropriate comparison mode.
50
-
51
- .Auto-detecting string mode
52
- [source,ruby]
53
- ----
54
- RSpec.describe "String comparison" do
55
- it "auto-detects string format" do
56
- actual = "Hello World"
57
- expected = "Hello Universe"
58
-
59
- expect(actual).to be_equivalent_to(expected)
60
- # Automatically uses STRING mode for plain text
61
- end
62
- end
63
- ----
64
-
65
- === Explicit string mode with `be_string_equivalent_to`
66
-
67
- For explicit string comparison, use the `be_string_equivalent_to` matcher.
68
-
69
- .Basic string comparison
70
- [source,ruby]
71
- ----
72
- RSpec.describe "String comparison" do
73
- it "compares strings exactly" do
74
- expect("Hello World").to be_string_equivalent_to("Hello World")
75
- end
76
-
77
- it "detects differences" do
78
- expect("Hello World").not_to be_string_equivalent_to("Hello Universe")
79
- end
80
- end
81
- ----
82
-
83
- === Whitespace differences
84
-
85
- .Detecting extra spaces
86
- [source,ruby]
87
- ----
88
- it "detects extra spaces" do
89
- actual = "Hello World"
90
- expected = "Hello  World" # Two spaces
91
-
92
- expect(actual).not_to be_string_equivalent_to(expected)
93
- end
94
- ----
95
-
96
- When this test fails, the output shows:
97
-
98
- ----
99
- expected STRING to be equivalent
100
-
101
- Line-by-line diff (STRING mode):
102
- 1| 1- | Hello░World
103
- | 1+ | Hello░░World
104
- ----
105
-
106
- === Trailing newline differences
107
-
108
- .Detecting trailing newlines
109
- [source,ruby]
110
- ----
111
- it "detects trailing newline" do
112
- actual = "data:image/png;base64,abc123"
113
- expected = "data:image/png;base64,abc123\n"
114
-
115
- expect(actual).not_to be_string_equivalent_to(expected)
116
- end
117
- ----
118
-
119
- When this test fails, the output shows two separate lines:
120
-
121
- ----
122
- expected STRING to be equivalent
123
-
124
- Line-by-line diff (STRING mode):
125
- 1| 1 | data:image/png;base64,abc123
126
- | 2+ |
127
- ----
128
-
129
- The empty line on line 2 (shown in green with `+`) represents the trailing newline character.
130
-
131
- === Unicode character differences
132
-
133
- .Detecting Unicode differences
134
- [source,ruby]
135
- ----
136
- it "detects non-breaking space vs regular space" do
137
- actual = "Hello World"
138
- expected = "Hello\u00A0World" # Non-breaking space
139
-
140
- expect(actual).not_to be_string_equivalent_to(expected)
141
- end
142
- ----
143
-
144
- When this test fails, the output includes a Unicode legend:
145
-
146
- ----
147
- expected STRING to be equivalent
148
-
149
- Character Visualization Legend:
150
- ░ = U+0020 (Space)
151
- ␣ = U+00A0 (Non-breaking space)
152
-
153
- Line-by-line diff (STRING mode):
154
- 1| 1- | Hello░World
155
- | 1+ | Hello␣World
156
- ----
157
-
158
- === Multi-line string differences
159
-
160
- .Comparing multi-line strings
161
- [source,ruby]
162
- ----
163
- it "shows line-by-line diff for multi-line strings" do
164
- actual = <<~TEXT
165
- Line 1
166
- Line 2
167
- Line 3
168
- TEXT
169
-
170
- expected = <<~TEXT
171
- Line 1
172
- Line 2 Modified
173
- Line 3
174
- TEXT
175
-
176
- expect(actual).not_to be_string_equivalent_to(expected)
177
- end
178
- ----
179
-
180
- Output shows context around the changed line:
181
-
182
- ----
183
- expected STRING to be equivalent
184
-
185
- Line-by-line diff (STRING mode):
186
- 1| 1 | Line 1
187
- 2| 2- | Line 2
188
- | 2+ | Line 2 Modified
189
- 3| 3 | Line 3
190
- ----
191
-
192
- == Usage in CLI
193
-
194
- === Comparing string files
195
-
196
- .Using the diff command
197
- [source,bash]
198
- ----
199
- canon diff actual.txt expected.txt --format string
200
- ----
201
-
202
- .Output
203
- ----
204
- Line-by-line diff (STRING mode):
205
- 1| 1 | Hello World
206
- 2| 2- | Line with spaces
207
- | 2+ | Line with spaces
208
- 3| 3 | Final line
209
- ----
210
-
211
- === Format auto-detection
212
-
213
- If you don't specify `--format`, Canon will auto-detect the format based on file content:
214
-
215
- [source,bash]
216
- ----
217
- canon diff actual.txt expected.txt
218
- # Auto-detects as string if content isn't XML/JSON/YAML
219
- ----
220
-
221
- == Usage in API
222
-
223
- === Using DiffFormatter directly
224
-
225
- .Programmatic string comparison
226
- [source,ruby]
227
- ----
228
- require 'canon/diff_formatter'
229
-
230
- actual = "Hello World"
231
- expected = "Hello  World" # Two spaces
232
-
233
- formatter = Canon::DiffFormatter.new(
234
- use_color: true,
235
- mode: :by_line,
236
- context_lines: 3
237
- )
238
-
239
- diff = formatter.format([], :string,
240
- doc1: actual,
241
- doc2: expected)
242
- puts diff
243
- ----
244
-
245
- .Output
246
- ----
247
- Line-by-line diff (STRING mode):
248
- 1| 1- | Hello░World
249
- | 1+ | Hello░░World
250
- ----
251
-
252
- === Using Canon.format for canonicalization
253
-
254
- Note: For strings, Canon does not perform canonicalization (no formatting changes are applied). The string is compared exactly as-is.
255
-
256
- .String comparison example
257
- [source,ruby]
258
- ----
259
- require 'canon'
260
-
261
- actual = "Hello World"
262
- expected = "Hello World"
263
-
264
- # Strings are compared as-is (no canonicalization)
265
- result = Canon.format(actual, :string)
266
- # => "Hello World"
267
-
268
- result == expected # => true
269
- ----
270
-
271
- == Configuration
272
-
273
- === Customizing character visualization
274
-
275
- You can customize how characters are visualized by configuring the visualization map:
276
-
277
- .Customizing character map
278
- [source,ruby]
279
- ----
280
- Canon::RSpecMatchers.configure do |config|
281
- config.use_color = true
282
- config.context_lines = 3
283
- config.diff_mode = :by_line
284
- end
285
- ----
286
-
287
- === Color output
288
-
289
- Color output is enabled by default in RSpec. To disable:
290
-
291
- .Disabling colors
292
- [source,ruby]
293
- ----
294
- Canon::RSpecMatchers.configure do |config|
295
- config.use_color = false
296
- end
297
- ----
298
-
299
- == Advanced Features
300
-
301
- === Trailing newline detection
302
-
303
- Canon properly handles trailing newlines by preserving them during line splitting. This ensures that strings like:
304
-
305
- * `"text"` (no trailing newline)
306
- * `"text\n"` (with trailing newline)
307
-
308
- Are shown as different, with the trailing newline visualized as an empty line in the diff output.
309
-
310
- === Context lines
311
-
312
- By default, Canon shows 3 lines of context around changes. This can be configured:
313
-
314
- .Adjusting context lines
315
- [source,ruby]
316
- ----
317
- Canon::RSpecMatchers.configure do |config|
318
- config.context_lines = 5 # Show 5 lines of context
319
- end
320
- ----
321
-
322
- == Technical Implementation
323
-
324
- === Line splitting
325
-
326
- Canon uses `split("\n", -1)` to preserve trailing empty strings, ensuring that:
327
-
328
- [source,ruby]
329
- ----
330
- "abc\n".split("\n", -1) # => ["abc", ""]
331
- "abc".split("\n", -1) # => ["abc"]
332
- ----
333
-
334
- This allows proper detection and visualization of trailing newlines.
335
-
336
- === Character visualization map
337
-
338
- The character visualization is configurable via `lib/canon/diff_formatter/character_map.yml`. See link:VISUALIZATION_MAP[Visualization Map Documentation] for details on customizing character representations.
339
-
340
- == See Also
341
-
342
- * link:README[Canon README] - General Canon documentation
343
- * link:VISUALIZATION_MAP[Visualization Map] - Character visualization customization
344
- * link:DIFF_PARAMETERS[Diff Parameters] - Diff formatting options
345
- * link:MATCHER_BEHAVIOR[Matcher Behavior] - RSpec matcher details