canon 0.1.8 → 0.1.10

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (101)
  1. checksums.yaml +4 -4
  2. data/.rubocop_todo.yml +83 -22
  3. data/docs/Gemfile +1 -0
  4. data/docs/_config.yml +90 -1
  5. data/docs/advanced/diff-classification.adoc +196 -24
  6. data/docs/features/match-options/index.adoc +239 -1
  7. data/lib/canon/comparison/format_detector.rb +2 -1
  8. data/lib/canon/comparison/html_comparator.rb +19 -8
  9. data/lib/canon/comparison/html_compare_profile.rb +8 -2
  10. data/lib/canon/comparison/markup_comparator.rb +109 -2
  11. data/lib/canon/comparison/match_options/base_resolver.rb +7 -0
  12. data/lib/canon/comparison/whitespace_sensitivity.rb +208 -0
  13. data/lib/canon/comparison/xml_comparator/child_comparison.rb +15 -7
  14. data/lib/canon/comparison/xml_comparator/diff_node_builder.rb +108 -0
  15. data/lib/canon/comparison/xml_comparator/node_parser.rb +10 -5
  16. data/lib/canon/comparison/xml_comparator/node_type_comparator.rb +14 -7
  17. data/lib/canon/comparison/xml_comparator.rb +240 -23
  18. data/lib/canon/comparison/xml_node_comparison.rb +25 -3
  19. data/lib/canon/diff/diff_classifier.rb +119 -5
  20. data/lib/canon/diff/formatting_detector.rb +1 -1
  21. data/lib/canon/diff/xml_serialization_formatter.rb +153 -0
  22. data/lib/canon/rspec_matchers.rb +37 -8
  23. data/lib/canon/version.rb +1 -1
  24. data/lib/canon/xml/data_model.rb +24 -13
  25. metadata +4 -78
  26. data/docs/plans/2025-01-17-html-parser-selection-fix.adoc +0 -250
  27. data/false_positive_analysis.txt +0 -0
  28. data/file1.html +0 -1
  29. data/file2.html +0 -1
  30. data/old-docs/ADVANCED_TOPICS.adoc +0 -20
  31. data/old-docs/BASIC_USAGE.adoc +0 -16
  32. data/old-docs/CHARACTER_VISUALIZATION.adoc +0 -567
  33. data/old-docs/CLI.adoc +0 -497
  34. data/old-docs/CUSTOMIZING_BEHAVIOR.adoc +0 -19
  35. data/old-docs/DIFF_ARCHITECTURE.adoc +0 -435
  36. data/old-docs/DIFF_FORMATTING.adoc +0 -540
  37. data/old-docs/DIFF_PARAMETERS.adoc +0 -261
  38. data/old-docs/DOM_DIFF.adoc +0 -1017
  39. data/old-docs/ENV_CONFIG.adoc +0 -876
  40. data/old-docs/FORMATS.adoc +0 -867
  41. data/old-docs/INPUT_VALIDATION.adoc +0 -477
  42. data/old-docs/MATCHER_BEHAVIOR.adoc +0 -90
  43. data/old-docs/MATCH_ARCHITECTURE.adoc +0 -463
  44. data/old-docs/MATCH_OPTIONS.adoc +0 -912
  45. data/old-docs/MODES.adoc +0 -432
  46. data/old-docs/NORMATIVE_INFORMATIVE_DIFFS.adoc +0 -219
  47. data/old-docs/OPTIONS.adoc +0 -1387
  48. data/old-docs/PREPROCESSING.adoc +0 -491
  49. data/old-docs/README.old.adoc +0 -2831
  50. data/old-docs/RSPEC.adoc +0 -814
  51. data/old-docs/RUBY_API.adoc +0 -485
  52. data/old-docs/SEMANTIC_DIFF_REPORT.adoc +0 -646
  53. data/old-docs/SEMANTIC_TREE_DIFF.adoc +0 -765
  54. data/old-docs/STRING_COMPARE.adoc +0 -345
  55. data/old-docs/TMP.adoc +0 -3384
  56. data/old-docs/TREE_DIFF.adoc +0 -1080
  57. data/old-docs/UNDERSTANDING_CANON.adoc +0 -17
  58. data/old-docs/VERBOSE.adoc +0 -482
  59. data/old-docs/VISUALIZATION_MAP.adoc +0 -625
  60. data/old-docs/WHITESPACE_TREATMENT.adoc +0 -1155
  61. data/scripts/analyze_current_state.rb +0 -85
  62. data/scripts/analyze_false_positives.rb +0 -114
  63. data/scripts/analyze_remaining_failures.rb +0 -105
  64. data/scripts/compare_current_failures.rb +0 -95
  65. data/scripts/compare_dom_tree_diff.rb +0 -158
  66. data/scripts/compare_failures.rb +0 -151
  67. data/scripts/debug_attribute_extraction.rb +0 -66
  68. data/scripts/debug_blocks_839.rb +0 -115
  69. data/scripts/debug_meta_matching.rb +0 -52
  70. data/scripts/debug_p_matching.rb +0 -192
  71. data/scripts/debug_signature_matching.rb +0 -118
  72. data/scripts/debug_sourcecode_124.rb +0 -32
  73. data/scripts/debug_whitespace_sensitive.rb +0 -192
  74. data/scripts/extract_false_positives.rb +0 -138
  75. data/scripts/find_actual_false_positives.rb +0 -125
  76. data/scripts/investigate_all_false_positives.rb +0 -161
  77. data/scripts/investigate_batch1.rb +0 -127
  78. data/scripts/investigate_classification.rb +0 -150
  79. data/scripts/investigate_classification_detailed.rb +0 -190
  80. data/scripts/investigate_common_failures.rb +0 -342
  81. data/scripts/investigate_false_negative.rb +0 -80
  82. data/scripts/investigate_false_positive.rb +0 -83
  83. data/scripts/investigate_false_positives.rb +0 -227
  84. data/scripts/investigate_false_positives_batch.rb +0 -163
  85. data/scripts/investigate_mixed_content.rb +0 -125
  86. data/scripts/investigate_remaining_16.rb +0 -214
  87. data/scripts/run_single_test.rb +0 -29
  88. data/scripts/test_all_false_positives.rb +0 -95
  89. data/scripts/test_attribute_details.rb +0 -61
  90. data/scripts/test_both_algorithms.rb +0 -49
  91. data/scripts/test_both_simple.rb +0 -49
  92. data/scripts/test_enhanced_semantic_output.rb +0 -125
  93. data/scripts/test_readme_examples.rb +0 -131
  94. data/scripts/test_semantic_tree_diff.rb +0 -99
  95. data/scripts/test_semantic_ux_improvements.rb +0 -135
  96. data/scripts/test_single_false_positive.rb +0 -119
  97. data/scripts/test_size_limits.rb +0 -99
  98. data/test_html_1.html +0 -21
  99. data/test_html_2.html +0 -21
  100. data/test_nokogiri.rb +0 -33
  101. data/test_normalize.rb +0 -45
@@ -1,345 +0,0 @@
= String Comparison in Canon
:toc:
:toclevels: 3

== General

Canon provides advanced string comparison capabilities with character-level visualization and diff rendering. This feature is particularly useful for:

* Comparing strings with invisible whitespace differences (spaces, tabs, trailing newlines)
* Detecting Unicode differences (non-breaking spaces, zero-width characters, etc.)
* Visualizing multi-line string differences with context
* Identifying trailing newline differences in text output

== Character Visualization

Canon automatically visualizes invisible and special characters to make differences clear:

[cols="2,3,2",options="header"]
|===
|Character
|Description
|Visualization

|Space (U+0020)
|Regular space
|`░`

|Tab (U+0009)
|Tab character
|`⇥`

|Non-breaking space (U+00A0)
|Non-breaking space
|`␣`

|Zero-width space (U+200B)
|Zero-width space
|`→`

|Other invisible characters
|Various Unicode invisibles
|See Unicode legend in output
|===
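
The substitution behind this table can be pictured with a short Ruby sketch. `VISUAL_MAP` and `visualize` are hypothetical names chosen for illustration. Canon reads its real mapping from its own character map file (see the Technical Implementation section), and that file may cover more characters or handle them differently.

```ruby
# Hypothetical map mirroring the four rows of the table above.
# This is not Canon's actual character map.
VISUAL_MAP = {
  " "      => "░", # U+0020 space
  "\t"     => "⇥", # U+0009 tab
  "\u00A0" => "␣", # U+00A0 non-breaking space
  "\u200B" => "→"  # U+200B zero-width space
}.freeze

# Replace every mapped character with its visible symbol.
# Characters that are not in the map pass through unchanged.
def visualize(text)
  text.gsub(Regexp.union(VISUAL_MAP.keys), VISUAL_MAP)
end

puts visualize("Hello World")      # Hello░World
puts visualize("Hello\u00A0World") # Hello␣World
```

`Regexp.union` escapes each key, so characters with a special meaning in regular expressions could also be added to the map safely.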

== Usage in RSpec

=== Auto-detection with `be_equivalent_to`

The `be_equivalent_to` matcher automatically detects the format (XML, JSON, YAML, or string) and uses the appropriate comparison mode.

.Auto-detecting string mode
[source,ruby]
----
RSpec.describe "String comparison" do
  it "auto-detects string format" do
    actual = "Hello World"
    expected = "Hello Universe"

    # Plain text is auto-detected, so STRING mode is used.
    # The strings differ, so they are not equivalent.
    expect(actual).not_to be_equivalent_to(expected)
  end
end
----

=== Explicit string mode with `be_string_equivalent_to`

For explicit string comparison, use the `be_string_equivalent_to` matcher.

.Basic string comparison
[source,ruby]
----
RSpec.describe "String comparison" do
  it "compares strings exactly" do
    expect("Hello World").to be_string_equivalent_to("Hello World")
  end

  it "detects differences" do
    expect("Hello World").not_to be_string_equivalent_to("Hello Universe")
  end
end
----

=== Whitespace differences

.Detecting extra spaces
[source,ruby]
----
it "detects extra spaces" do
  actual = "Hello World"
  expected = "Hello  World" # Two spaces

  expect(actual).not_to be_string_equivalent_to(expected)
end
----

If the same strings are asserted to be equivalent (using `to` instead of `not_to`), the failure message shows:

----
expected STRING to be equivalent

Line-by-line diff (STRING mode):
1| 1- | Hello░World
| 1+ | Hello░░World
----

=== Trailing newline differences

.Detecting trailing newlines
[source,ruby]
----
it "detects trailing newline" do
  actual = "data:image/png;base64,abc123"
  expected = "data:image/png;base64,abc123\n"

  expect(actual).not_to be_string_equivalent_to(expected)
end
----

With `to` in place of `not_to`, the failure message shows the expected string as two lines:

----
expected STRING to be equivalent

Line-by-line diff (STRING mode):
1| 1 | data:image/png;base64,abc123
| 2+ |
----

The empty line 2 (shown in green and marked with `+`) represents the trailing newline character.

=== Unicode character differences

.Detecting Unicode differences
[source,ruby]
----
it "detects non-breaking space vs regular space" do
  actual = "Hello World"
  expected = "Hello\u00A0World" # Non-breaking space

  expect(actual).not_to be_string_equivalent_to(expected)
end
----

With `to` in place of `not_to`, the failure message includes a Unicode legend:

----
expected STRING to be equivalent

Character Visualization Legend:
░ = U+0020 (Space)
␣ = U+00A0 (Non-breaking space)

Line-by-line diff (STRING mode):
1| 1- | Hello░World
| 1+ | Hello␣World
----
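
A legend like this could be assembled in two steps. First, collect the visualized characters that occur in either string. Then print each one with its code point. This is only a sketch: `LEGEND` and `legend_for` are hypothetical names, and the table is limited to the two characters used in the example above.

```ruby
# Hypothetical symbol/name table covering only the example's characters.
LEGEND = {
  " "      => ["░", "Space"],
  "\u00A0" => ["␣", "Non-breaking space"]
}.freeze

# Return one legend line per mapped character found in any of the texts,
# ordered by code point.
def legend_for(*texts)
  chars = texts.flat_map(&:chars).uniq.select { |c| LEGEND.key?(c) }
  chars.sort_by(&:ord).map do |c|
    symbol, name = LEGEND.fetch(c)
    format("%s = U+%04X (%s)", symbol, c.ord, name)
  end
end

puts legend_for("Hello World", "Hello\u00A0World")
# ░ = U+0020 (Space)
# ␣ = U+00A0 (Non-breaking space)
```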

=== Multi-line string differences

.Comparing multi-line strings
[source,ruby]
----
it "shows line-by-line diff for multi-line strings" do
  actual = <<~TEXT
    Line 1
    Line 2
    Line 3
  TEXT

  expected = <<~TEXT
    Line 1
    Line 2 Modified
    Line 3
  TEXT

  expect(actual).not_to be_string_equivalent_to(expected)
end
----

If these strings are asserted to be equivalent instead, the failure output shows context around the changed line:

----
expected STRING to be equivalent

Line-by-line diff (STRING mode):
1| 1 | Line 1
2| 2- | Line 2
| 2+ | Line 2 Modified
3| 3 | Line 3
----
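
The `-` and `+` markers in output like this come from a line-level diff. This document does not describe the algorithm Canon uses, so what follows is only a sketch of one common approach: a longest-common-subsequence (LCS) table over lines. `line_diff` is a hypothetical name. The input is split with `split("\n", -1)`, as described under Technical Implementation.

```ruby
# Sketch of an LCS-based line diff. Returns [type, line] pairs, where type is
# :same (context), :del (only in the first input) or :add (only in the second).
def line_diff(old_lines, new_lines)
  n = old_lines.size
  m = new_lines.size

  # lcs[i][j] = length of the longest common subsequence of
  # old_lines[i..] and new_lines[j..]
  lcs = Array.new(n + 1) { Array.new(m + 1, 0) }
  (n - 1).downto(0) do |i|
    (m - 1).downto(0) do |j|
      lcs[i][j] =
        if old_lines[i] == new_lines[j]
          lcs[i + 1][j + 1] + 1
        else
          [lcs[i + 1][j], lcs[i][j + 1]].max
        end
    end
  end

  # Walk the table from the start. On ties, emit the deletion first,
  # so a changed line appears as "-" followed by "+".
  ops = []
  i = 0
  j = 0
  while i < n && j < m
    if old_lines[i] == new_lines[j]
      ops << [:same, old_lines[i]]
      i += 1
      j += 1
    elsif lcs[i + 1][j] >= lcs[i][j + 1]
      ops << [:del, old_lines[i]]
      i += 1
    else
      ops << [:add, new_lines[j]]
      j += 1
    end
  end
  # Whatever remains on either side is a pure deletion or addition
  old_lines.drop(i).each { |line| ops << [:del, line] }
  new_lines.drop(j).each { |line| ops << [:add, line] }
  ops
end

actual   = "Line 1\nLine 2\nLine 3"
expected = "Line 1\nLine 2 Modified\nLine 3"
line_diff(actual.split("\n", -1), expected.split("\n", -1)).each do |type, line|
  marker = { same: " ", del: "-", add: "+" }.fetch(type)
  puts "#{marker} #{line}"
end
```

Applied to `"abc"` and `"abc\n"`, the same function returns `[[:same, "abc"], [:add, ""]]`. The trailing newline therefore surfaces as one added empty line, matching the trailing-newline example above. A full table costs O(n·m) time and memory, which is why many diff tools use Myers' algorithm instead.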

== Usage in CLI

=== Comparing string files

.Using the diff command
[source,bash]
----
canon diff actual.txt expected.txt --format string
----

.Output
----
Line-by-line diff (STRING mode):
1| 1 | Hello World
2| 2- | Line with spaces
| 2+ | Line with spaces
3| 3 | Final line
----

=== Format auto-detection

If you don't specify `--format`, Canon will auto-detect the format based on file content:

[source,bash]
----
canon diff actual.txt expected.txt
# Auto-detects as string if content isn't XML/JSON/YAML
----
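
This document does not describe the detection rules Canon applies. Purely as an illustration, a heuristic of the following shape falls back to string mode for plain text. `detect_format` is a hypothetical helper, and treating any input that begins with `<` as XML is a deliberate simplification.

```ruby
require "json"
require "yaml"

# Hypothetical heuristic for illustration; not Canon's detection logic.
# Returns :xml, :json, :yaml or :string.
def detect_format(text)
  stripped = text.strip
  return :xml if stripped.start_with?("<")

  begin
    parsed = JSON.parse(stripped)
    return :json if parsed.is_a?(Hash) || parsed.is_a?(Array)
  rescue JSON::ParserError
    # Not JSON; try the next format
  end

  begin
    parsed = YAML.safe_load(stripped)
    # Plain text is valid YAML too (it loads as a String), so only
    # mappings and sequences count as YAML documents here
    return :yaml if parsed.is_a?(Hash) || parsed.is_a?(Array)
  rescue Psych::Exception
    # Not loadable as YAML; fall back to plain string comparison
  end

  :string
end

puts detect_format("Hello World") # string
```

This heuristic would classify a line such as `Hello: World` as YAML. That is one reason a real detector needs more care than this sketch.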

== Usage in API

=== Using DiffFormatter directly

.Programmatic string comparison
[source,ruby]
----
require 'canon/diff_formatter'

actual = "Hello World"
expected = "Hello  World" # Two spaces

formatter = Canon::DiffFormatter.new(
  use_color: true,
  mode: :by_line,
  context_lines: 3
)

diff = formatter.format([], :string,
                        doc1: actual,
                        doc2: expected)
puts diff
----

.Output
----
Line-by-line diff (STRING mode):
1| 1- | Hello░World
| 1+ | Hello░░World
----

=== Using Canon.format for canonicalization

Note: For strings, Canon does not perform canonicalization (no formatting changes are applied). The string is compared exactly as-is.

.String comparison example
[source,ruby]
----
require 'canon'

actual = "Hello World"
expected = "Hello World"

# Strings are compared as-is (no canonicalization)
result = Canon.format(actual, :string)
# => "Hello World"

result == expected # => true
----

== Configuration

=== Customizing diff output

Diff output options such as color, the number of context lines, and the diff mode are set with `Canon::RSpecMatchers.configure`. The character visualization map itself is customized separately, as described under _Character visualization map_ in the Technical Implementation section.

.Configuring diff output
[source,ruby]
----
Canon::RSpecMatchers.configure do |config|
  config.use_color = true
  config.context_lines = 3
  config.diff_mode = :by_line
end
----

=== Color output

Color output is enabled by default in RSpec. To disable it:

.Disabling colors
[source,ruby]
----
Canon::RSpecMatchers.configure do |config|
  config.use_color = false
end
----
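
Terminal colors are usually produced with ANSI escape sequences. The sketch below assumes red for removed lines and green for added lines; this document only states that added lines are green. `colorize` and `COLORS` are hypothetical and not part of Canon's API.

```ruby
# ANSI SGR color codes: 31 = red, 32 = green; "\e[0m" resets attributes.
COLORS = { del: 31, add: 32 }.freeze

# Hypothetical helper: wrap a diff line in a color escape when use_color is
# true and the line type has a color; otherwise return the line unchanged.
def colorize(line, type, use_color: true)
  code = COLORS[type]
  return line unless use_color && code

  "\e[#{code}m#{line}\e[0m"
end

puts colorize("| 1+ | Hello░░World", :add)
puts colorize("| 1+ | Hello░░World", :add, use_color: false)
```

With `use_color: false`, the line is returned unchanged. This mirrors the effect of the setting above.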

== Advanced Features

=== Trailing newline detection

Canon handles trailing newlines by preserving them during line splitting. As a result, the following strings are shown as different, with the trailing newline visualized as an empty line in the diff output:

* `"text"` (no trailing newline)
* `"text\n"` (with trailing newline)

=== Context lines

By default, Canon shows 3 lines of context around changes. This can be configured:

.Adjusting context lines
[source,ruby]
----
Canon::RSpecMatchers.configure do |config|
  config.context_lines = 5 # Show 5 lines of context
end
----
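
Context selection can be pictured as keeping each changed line plus up to N unchanged neighbours on each side. The sketch below makes two assumptions. First, diff entries are `[type, line]` pairs, a hypothetical representation and not Canon's internal one. Second, the `context` argument plays the role of `config.context_lines`. `with_context` is a hypothetical name.

```ruby
# Hypothetical helper: keep every changed entry plus up to `context`
# entries on each side of it. `ops` is an array of [type, line] pairs,
# where type is :same, :del or :add.
def with_context(ops, context = 3)
  changed = ops.each_index.reject { |k| ops[k][0] == :same }
  keep = changed.flat_map { |k| ((k - context)..(k + context)).to_a }
  keep.select { |k| k >= 0 && k < ops.size }.uniq.sort.map { |k| ops[k] }
end

ops = (1..9).map { |n| [:same, "line #{n}"] }
ops[4] = [:del, "line 5"]
p with_context(ops, 1)
# => [[:same, "line 4"], [:del, "line 5"], [:same, "line 6"]]
```

Real diff output usually also marks where unchanged lines were skipped. That is omitted here.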

== Technical Implementation

=== Line splitting

Canon splits text with `split("\n", -1)`. The negative limit keeps trailing empty strings, which a plain `split("\n")` would drop:

[source,ruby]
----
"abc\n".split("\n", -1) # => ["abc", ""]
"abc".split("\n", -1)   # => ["abc"]
"abc\n".split("\n")     # => ["abc"] (trailing newline lost)
----

This allows trailing newlines to be detected and visualized.

=== Character visualization map

The character visualization is configurable via `lib/canon/diff_formatter/character_map.yml`. See link:VISUALIZATION_MAP[Visualization Map Documentation] for details on customizing character representations.

== See Also

* link:README[Canon README] - General Canon documentation
* link:VISUALIZATION_MAP[Visualization Map] - Character visualization customization
* link:DIFF_PARAMETERS[Diff Parameters] - Diff formatting options
* link:MATCHER_BEHAVIOR[Matcher Behavior] - RSpec matcher details