canon 0.1.8 → 0.1.9
- checksums.yaml +4 -4
- data/.rubocop_todo.yml +112 -25
- data/docs/Gemfile +1 -0
- data/docs/_config.yml +90 -1
- data/docs/advanced/diff-classification.adoc +82 -2
- data/docs/features/match-options/index.adoc +239 -1
- data/lib/canon/comparison/format_detector.rb +2 -1
- data/lib/canon/comparison/html_comparator.rb +19 -8
- data/lib/canon/comparison/html_compare_profile.rb +8 -2
- data/lib/canon/comparison/match_options/base_resolver.rb +7 -0
- data/lib/canon/comparison/whitespace_sensitivity.rb +208 -0
- data/lib/canon/comparison/xml_comparator/child_comparison.rb +15 -7
- data/lib/canon/comparison/xml_comparator/node_parser.rb +10 -5
- data/lib/canon/comparison/xml_comparator/node_type_comparator.rb +14 -7
- data/lib/canon/comparison/xml_comparator.rb +48 -23
- data/lib/canon/comparison/xml_node_comparison.rb +25 -3
- data/lib/canon/diff/diff_classifier.rb +101 -2
- data/lib/canon/diff/formatting_detector.rb +1 -1
- data/lib/canon/rspec_matchers.rb +37 -8
- data/lib/canon/version.rb +1 -1
- data/lib/canon/xml/data_model.rb +24 -13
- metadata +3 -78
- data/docs/plans/2025-01-17-html-parser-selection-fix.adoc +0 -250
- data/false_positive_analysis.txt +0 -0
- data/file1.html +0 -1
- data/file2.html +0 -1
- data/old-docs/ADVANCED_TOPICS.adoc +0 -20
- data/old-docs/BASIC_USAGE.adoc +0 -16
- data/old-docs/CHARACTER_VISUALIZATION.adoc +0 -567
- data/old-docs/CLI.adoc +0 -497
- data/old-docs/CUSTOMIZING_BEHAVIOR.adoc +0 -19
- data/old-docs/DIFF_ARCHITECTURE.adoc +0 -435
- data/old-docs/DIFF_FORMATTING.adoc +0 -540
- data/old-docs/DIFF_PARAMETERS.adoc +0 -261
- data/old-docs/DOM_DIFF.adoc +0 -1017
- data/old-docs/ENV_CONFIG.adoc +0 -876
- data/old-docs/FORMATS.adoc +0 -867
- data/old-docs/INPUT_VALIDATION.adoc +0 -477
- data/old-docs/MATCHER_BEHAVIOR.adoc +0 -90
- data/old-docs/MATCH_ARCHITECTURE.adoc +0 -463
- data/old-docs/MATCH_OPTIONS.adoc +0 -912
- data/old-docs/MODES.adoc +0 -432
- data/old-docs/NORMATIVE_INFORMATIVE_DIFFS.adoc +0 -219
- data/old-docs/OPTIONS.adoc +0 -1387
- data/old-docs/PREPROCESSING.adoc +0 -491
- data/old-docs/README.old.adoc +0 -2831
- data/old-docs/RSPEC.adoc +0 -814
- data/old-docs/RUBY_API.adoc +0 -485
- data/old-docs/SEMANTIC_DIFF_REPORT.adoc +0 -646
- data/old-docs/SEMANTIC_TREE_DIFF.adoc +0 -765
- data/old-docs/STRING_COMPARE.adoc +0 -345
- data/old-docs/TMP.adoc +0 -3384
- data/old-docs/TREE_DIFF.adoc +0 -1080
- data/old-docs/UNDERSTANDING_CANON.adoc +0 -17
- data/old-docs/VERBOSE.adoc +0 -482
- data/old-docs/VISUALIZATION_MAP.adoc +0 -625
- data/old-docs/WHITESPACE_TREATMENT.adoc +0 -1155
- data/scripts/analyze_current_state.rb +0 -85
- data/scripts/analyze_false_positives.rb +0 -114
- data/scripts/analyze_remaining_failures.rb +0 -105
- data/scripts/compare_current_failures.rb +0 -95
- data/scripts/compare_dom_tree_diff.rb +0 -158
- data/scripts/compare_failures.rb +0 -151
- data/scripts/debug_attribute_extraction.rb +0 -66
- data/scripts/debug_blocks_839.rb +0 -115
- data/scripts/debug_meta_matching.rb +0 -52
- data/scripts/debug_p_matching.rb +0 -192
- data/scripts/debug_signature_matching.rb +0 -118
- data/scripts/debug_sourcecode_124.rb +0 -32
- data/scripts/debug_whitespace_sensitive.rb +0 -192
- data/scripts/extract_false_positives.rb +0 -138
- data/scripts/find_actual_false_positives.rb +0 -125
- data/scripts/investigate_all_false_positives.rb +0 -161
- data/scripts/investigate_batch1.rb +0 -127
- data/scripts/investigate_classification.rb +0 -150
- data/scripts/investigate_classification_detailed.rb +0 -190
- data/scripts/investigate_common_failures.rb +0 -342
- data/scripts/investigate_false_negative.rb +0 -80
- data/scripts/investigate_false_positive.rb +0 -83
- data/scripts/investigate_false_positives.rb +0 -227
- data/scripts/investigate_false_positives_batch.rb +0 -163
- data/scripts/investigate_mixed_content.rb +0 -125
- data/scripts/investigate_remaining_16.rb +0 -214
- data/scripts/run_single_test.rb +0 -29
- data/scripts/test_all_false_positives.rb +0 -95
- data/scripts/test_attribute_details.rb +0 -61
- data/scripts/test_both_algorithms.rb +0 -49
- data/scripts/test_both_simple.rb +0 -49
- data/scripts/test_enhanced_semantic_output.rb +0 -125
- data/scripts/test_readme_examples.rb +0 -131
- data/scripts/test_semantic_tree_diff.rb +0 -99
- data/scripts/test_semantic_ux_improvements.rb +0 -135
- data/scripts/test_single_false_positive.rb +0 -119
- data/scripts/test_size_limits.rb +0 -99
- data/test_html_1.html +0 -21
- data/test_html_2.html +0 -21
- data/test_nokogiri.rb +0 -33
- data/test_normalize.rb +0 -45
data/old-docs/TMP.adoc
DELETED
@@ -1,3384 +0,0 @@
-= Canon: Canonicalization for serialization formats
-
-A Ruby library for canonicalizing and pretty-printing XML, HTML, YAML, and JSON
-with RSpec matchers for equivalence testing.
-
-
-== Purpose
-
-Canon provides canonicalization and pretty-printing for various serialization
-formats (XML, HTML, JSON, YAML), producing standardized forms suitable for
-comparison, testing, digital signatures, and human-readable output.
-
-
-== Architecture
-
-Canon follows an **orchestrator pattern** with clear separation of concerns.
-
-=== Comparison module
-
-The `Canon::Comparison` module (123 lines) acts as a pure orchestrator that:
-
-* Detects input format (XML, HTML, JSON, YAML)
-* Validates format compatibility
-* Delegates to format-specific comparator classes
-
-Format-specific comparators:
-
-* `Canon::Comparison::XmlComparator` - XML semantic comparison
-* `Canon::Comparison::HtmlComparator` - HTML semantic comparison
-* `Canon::Comparison::JsonComparator` - JSON/Ruby object comparison
-* `Canon::Comparison::YamlComparator` - YAML comparison (delegates to JsonComparator)
-
-Each comparator is self-contained and handles all comparison logic for its format.
-
-=== DiffFormatter module
-
-The `Canon::DiffFormatter` class (171 lines) acts as a pure orchestrator that:
-
-* Manages diff options (colors, visualization, context)
-* Detects diff mode (by-object vs by-line)
-* Delegates to mode-specific and format-specific formatters
-
-Two diff modes:
-
-**By-object mode** (tree-based semantic diff):
-
-* `Canon::DiffFormatter::ByObject::BaseFormatter` - Factory and common logic
-* `Canon::DiffFormatter::ByObject::XmlFormatter` - XML DOM differences
-* `Canon::DiffFormatter::ByObject::JsonFormatter` - Ruby object differences
-* `Canon::DiffFormatter::ByObject::YamlFormatter` - YAML differences
-
-**By-line mode** (line-based diff):
-
-* `Canon::DiffFormatter::ByLine::BaseFormatter` - LCS algorithm and factory
-* `Canon::DiffFormatter::ByLine::XmlFormatter` - DOM-guided XML line diff
-* `Canon::DiffFormatter::ByLine::JsonFormatter` - Semantic JSON line diff
-* `Canon::DiffFormatter::ByLine::YamlFormatter` - Semantic YAML line diff
-* `Canon::DiffFormatter::ByLine::SimpleFormatter` - Fallback line diff
-
-Each formatter handles format-specific intelligence (DOM parsing, token
-highlighting, semantic understanding).
-
-=== Object-oriented diff foundation
-
-Canon uses three foundational classes for managing diff data:
-
-* `Canon::Diff::DiffBlock` - Represents a contiguous block of changes
-* `Canon::Diff::DiffContext` - Groups diff blocks with surrounding context
-* `Canon::Diff::DiffReport` - Top-level container for complete diff results
-
-These classes ensure clean separation by providing clear ownership of diff data
-at different granularity levels.
-
-
-== Features
-
-=== Ruby API
-
-Single API for working with all four formats (XML, HTML, JSON, YAML).
-
-
-=== XML canonicalization
-
-Format XML documents according to the
-https://www.w3.org/TR/xml-c14n11/[W3C Canonical XML Version 1.1] specification.
-
-Key features:
-
-* Namespace declaration ordering (lexicographic by prefix)
-* Attribute ordering (lexicographic by namespace URI, then local name)
-* Character encoding normalization to UTF-8
-* Special character encoding in text and attributes
-* Removal of superfluous namespace declarations
-* Support for xml:base, xml:lang, xml:space, and xml:id attributes
-* Processing instruction and comment handling
-* Document subset support with attribute inheritance
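As a rough illustration of the attribute-ordering rule in the list above (namespace URI first, then local name), the following Ruby sketch sorts a small set of attributes. The `Attr` struct and the `c14n_attribute_order` helper are hypothetical names used only for this example; they are not part of Canon's API, and the sketch ignores every other canonicalization step.

```ruby
# Hypothetical attribute record for illustration; not a Canon class.
Attr = Struct.new(:namespace_uri, :local_name, :value)

# Order attributes by namespace URI, then by local name.
# Attributes without a namespace (nil URI) sort first because nil
# is treated as the empty string.
def c14n_attribute_order(attrs)
  attrs.sort_by { |a| [a.namespace_uri.to_s, a.local_name] }
end

attrs = [
  Attr.new("http://www.w3.org/1999/xlink", "href", "#x"),
  Attr.new(nil, "id", "a1"),
  Attr.new(nil, "class", "note"),
  Attr.new("http://example.com/ns", "role", "main"),
]

ordered = c14n_attribute_order(attrs).map(&:local_name)
puts ordered.inspect
```

Sorting on the namespace URI rather than the prefix means that renaming a prefix does not change the resulting order.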
-
-=== HTML canonicalization
-
-Format HTML 4/5 and XHTML documents with consistent formatting. Automatically
-detects HTML vs XHTML and applies appropriate formatting.
-
-=== YAML canonicalization
-
-Format YAML documents with keys sorted alphabetically at all levels of the
-structure.
-
-=== JSON canonicalization
-
-Format JSON documents with keys sorted alphabetically at all levels of the
-structure.
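The "keys sorted alphabetically at all levels" behaviour described above can be sketched with Ruby's standard `json` library alone. This is a minimal illustration under that assumption, not Canon's implementation; `sort_keys` and `canonical_json` are hypothetical helper names.

```ruby
require "json"

# Recursively rebuild hashes with their keys in sorted order.
# Arrays keep their element order, since order is significant there.
def sort_keys(value)
  case value
  when Hash
    value.keys.sort.each_with_object({}) { |k, h| h[k] = sort_keys(value[k]) }
  when Array
    value.map { |v| sort_keys(v) }
  else
    value
  end
end

# Parse, sort, and emit compact JSON (no indentation or extra spaces).
def canonical_json(text)
  JSON.generate(sort_keys(JSON.parse(text)))
end

canonical = canonical_json('{"z":3,"a":{"d":1,"c":2},"b":[{"y":1,"x":2}]}')
puts canonical
# => {"a":{"c":2,"d":1},"b":[{"x":2,"y":1}],"z":3}
```

Two inputs that differ only in key order or whitespace produce the same string, which is what makes this form usable for equality checks and hashing.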
-
-
-=== Output modes
-
-Canon supports two output modes for all formats:
-
-`c14n` (canonical):: Compact output without indentation, suitable for digital
-signatures, hashing, and equivalence testing. Removes formatting whitespace.
-
-`pretty` (pretty-print):: Human-readable output with consistent indentation.
-Configurable indent size and type (spaces or tabs). This is the default mode for
-CLI commands.
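The difference between the two modes, and the idea of a configurable indent size and type, can be shown with Ruby's standard `json` library. This is a sketch under that assumption and does not call Canon; `indent_unit` and its `size:`/`type:` keywords are hypothetical names chosen for the example.

```ruby
require "json"

# Build the indentation string from a size and a type.
# "tab" yields tab characters; anything else yields spaces.
def indent_unit(size: 2, type: "space")
  (type == "tab" ? "\t" : " ") * size
end

data = { "a" => 1, "b" => { "c" => 2 } }

# Compact form: no indentation, no formatting whitespace.
compact = JSON.generate(data)

# Pretty forms: same data, different indentation settings.
pretty_spaces = JSON.pretty_generate(data, indent: indent_unit(size: 4))
pretty_tabs   = JSON.pretty_generate(data, indent: indent_unit(size: 1, type: "tab"))

puts compact
puts pretty_spaces
puts pretty_tabs
```

Both pretty variants parse back to the same data as the compact form; only the presentation differs.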
-
-
-=== RSpec matchers
-
-Provides matchers for testing equivalence between serialized formats.
-
-NOTE: RSpec matchers always use canonical (c14n) mode for comparison to ensure
-formatting differences don't affect test results.
-
-=== Comparison API
-
-Canon provides a `Canon::Comparison` module for semantic comparison of HTML and
-XML documents.
-
-The `Canon::Comparison.equivalent?` method compares two documents for semantic
-equivalence, ignoring formatting differences that don't affect meaning.
-
-Key features:
-
-* Semantic comparison (content, not formatting)
-* Whitespace normalization
-* Comment handling (can ignore or include)
-* Attribute sorting
-* Support for both HTML and XML documents
-* Optional verbose diff output
-
-NOTE: `Canon::Comparison.equivalent?` adopts option names used by the excellent
-https://github.com/vkononov/compare-xml[`compare-xml` gem].
-
-
-
-== Usage
-
-=== Command-line usage
-
-=== Installation
-
-After installing the gem, the `canon` command will be available:
-
-[source,bash]
-----
-$ gem install canon
-$ canon --help
-----
-
-=== Format command
-
-The `format` command formats files in XML, HTML, JSON, or YAML.
-
-==== Output modes
-
-`pretty` (default):: Human-readable output with indentation (2 spaces default)
-`c14n`:: Canonical form without indentation
-
-==== Command syntax
-
-[source,bash]
-----
-canon format FILE [OPTIONS]
-----
-
-==== Options
-
-`-f, --format FORMAT`:: Specify format: `xml`, `html`, `json`, or `yaml`
-(auto-detected from extension if not specified)
-
-`-m, --mode MODE`:: Output mode: `pretty` (default) or `c14n`
-
-`-i, --indent N`:: Indentation spaces for pretty mode (default: 2)
-
-`--indent-type TYPE`:: Indentation type: `space` (default) or `tab`
-
-`-o, --output FILE`:: Write output to file instead of stdout
-
-`-c, --with-comments`:: Include comments in canonical XML output
-
-==== Examples
-
-[source,bash]
-----
-# Pretty-print (default mode)
-$ canon format input.xml
-<?xml version="1.0" encoding="UTF-8"?>
-<root>
-  <a>1</a>
-  <b>2</b>
-</root>
-
-# Canonical mode (compact)
-$ canon format input.xml --mode c14n
-<root><a>1</a><b>2</b></root>
-
-# Custom indentation
-$ canon format input.xml --mode pretty --indent 4
-$ canon format input.json --indent 4
-
-# Tab indentation
-$ canon format input.xml --indent-type tab
-$ canon format input.html --mode pretty --indent-type tab
-
-# Specify format explicitly
-$ canon format data.txt --format xml
-
-# Save to file
-$ canon format input.xml --output formatted.xml
-
-# Include XML comments in canonical output
-$ canon format doc.xml --mode c14n --with-comments
-
-# HTML files
-$ canon format page.html
-$ canon format page.html --mode c14n
-----
-
-
-=== Diff command
-
-Compare two files using **semantic comparison** that understands the structure of
-XML, HTML, JSON, and YAML formats. Unlike traditional text-based diff tools,
-`canon diff` compares the meaning and structure of your data, not just the
-characters.
-
-==== Command syntax
-
-[source,bash]
-----
-canon diff FILE1 FILE2 [OPTIONS]
-----
-
-==== Diff modes
-
-Canon supports two diff modes optimized for different use cases:
-
-===== by-object mode (default for JSON/YAML)
-
-Compares files **semantically** by their data structure and displays differences
-as a visual tree showing what changed in the structure.
-
-Best for::
-* Configuration files where you care about what values changed
-* API responses where structure matters
-* Comparing semantic equivalence across formats
-
-Features::
-* Tree visualization with box-drawing characters
-* Shows only what changed (additions, removals, modifications)
-* Ignores formatting differences automatically
-* Color-coded output (red=removed, green=added, yellow=changed)
-
-===== by-line mode (default for HTML, optional for XML)
-
-Compares files **line-by-line** after canonicalization, showing traditional
-diff-style output.
-
-Best for::
-* HTML markup where line-level changes matter
-* Reviewing exact textual differences
-* When you need to see the full document context
-
-Features::
-* Traditional diff format with line numbers
-* Shows before/after for each change
-* Better for understanding markup structure changes
-
-[NOTE]
-* JSON and YAML always use **by-object** mode
-* HTML always uses **by-line** mode
-* XML uses **by-object** mode by default, but can use **by-line** with `--by-line`
-
-==== Options
-
-===== Format options
-
-`-f, --format FORMAT`:: Format for both files: `xml`, `html`, `json`, or `yaml`
-(auto-detected from extension if not specified)
-
-`--format1 FORMAT`:: Format for first file (when comparing different formats)
-
-`--format2 FORMAT`:: Format for second file (when comparing different formats)
-
-===== Comparison options
-
-`-v, --verbose`:: Show detailed differences in tree format (default: just show
-if files differ)
-
-`--by-line`:: Use line-by-line diff for XML (default: by-object mode)
-
-`--ignore-attr-order` / `--no-ignore-attr-order`:: Control whether attribute/key
-ordering matters (default: ignore order)
-
-`--ignore-comments`:: Ignore XML/HTML comments during comparison (overrides
-`--with-comments`)
-
-`--ignore-text-nodes`:: Ignore all text node content, only compare structure
-
-`-c, --with-comments`:: Include comments in comparison (sets `ignore_comments: false`)
-
-===== Output options
-
-`--color` / `--no-color`:: Enable/disable colored output (default: enabled)
-
-==== Examples
-
-===== Basic comparison
-
-[source,bash]
-----
-# Compare two JSON files (shows if equivalent or different)
-$ canon diff config1.json config2.json
-Files are semantically different
-
-# Compare two XML files
-$ canon diff file1.xml file2.xml
-✅ Files are semantically equivalent
-----
-
-===== Verbose mode examples
-
-====== JSON comparison (by-object mode)
-
-[example]
-Given these two JSON files:
-
-.config1.json
-[source,json]
-----
-{
-  "name": "myapp",
-  "version": "1.0.0",
-  "settings": {
-    "debug": true,
-    "port": 8080
-  }
-}
-----
-
-.config2.json
-[source,json]
-----
-{
-  "version": "2.0.0",
-  "name": "myapp",
-  "settings": {
-    "debug": false,
-    "port": 8080
-  }
-}
-----
-
-Running with `--verbose`:
-
-[source,bash]
-----
-$ canon diff config1.json config2.json --verbose
-Visual Diff:
-├── settings.debug:
-│   ├── - true
-│   └── + false
-└── version:
-    ├── - "1.0.0"
-    └── + "2.0.0"
-----
-
-The tree shows:
-
-* Key order difference (`version` moved) is ignored
-* Only semantic changes are shown: `debug` and `version` values changed
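A by-object comparison of this kind can be sketched in a few lines of Ruby using only the standard `json` library. The `object_diff` helper below is a hypothetical illustration of the idea, not Canon's comparator: it walks both structures by key, so key order never matters, and records a dotted path for each leaf value that differs.

```ruby
require "json"

# Collect [path, old_value, new_value] for every differing leaf.
# Hashes are compared key by key; anything else is compared with ==.
# Simplification: a missing key and an explicit null are not distinguished.
def object_diff(a, b, path = [])
  if a.is_a?(Hash) && b.is_a?(Hash)
    (a.keys | b.keys).sort.flat_map do |key|
      object_diff(a[key], b[key], path + [key])
    end
  elsif a == b
    []
  else
    [[path.join("."), a, b]]
  end
end

old_doc = JSON.parse('{"name":"myapp","version":"1.0.0","settings":{"debug":true,"port":8080}}')
new_doc = JSON.parse('{"version":"2.0.0","name":"myapp","settings":{"debug":false,"port":8080}}')

changes = object_diff(old_doc, new_doc)
changes.each do |path, before, after|
  puts "#{path}: - #{before.inspect} + #{after.inspect}"
end
```

Run against the two example files, the sketch reports `settings.debug` and `version` and nothing else, which mirrors the two branches of the tree shown above.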
-
-====== XML comparison (by-object mode with DOM-guided semantic matching)
-
-Canon's XML diff uses **hybrid DOM-guided line diff** that semantically matches
-elements across documents using identity attributes (such as `id`, `ref`, `name`,
-`key`) and element paths. This ensures that corresponding elements are compared
-even when they appear at different line positions in the files.
-
-[example]
-Given these two XML files:
-
-.document1.xml
-[source,xml]
-----
-<standard-document>
-  <preface>
-    <foreword id="fwd">
-      <p>First paragraph</p>
-    </foreword>
-  </preface>
-  <sections>
-    <clause id="scope">
-      <title>Scope</title>
-    </clause>
-  </sections>
-</standard-document>
-----
-
-.document2.xml
-[source,xml]
-----
-<standard-document>
-  <preface>
-    <foreword displayorder="2" id="fwd">
-      <p>First paragraph</p>
-    </foreword>
-  </preface>
-  <sections>
-    <clause id="scope">
-      <title>Scope</title>
-      <p>New content</p>
-    </clause>
-  </sections>
-</standard-document>
-----
-
-Running with `--verbose` using by-object mode (default):
-
-[source,bash]
-----
-$ canon diff document1.xml document2.xml --verbose
-Visual Diff:
-├── preface.foreword:
-│   └── + displayorder="2"
-└── sections.clause.p:
-    └── + "New content"
-----
-
-The DOM-guided diff shows:
-
-* The `<foreword id="fwd">` elements are **semantically matched** by their `id`
-attribute, even though they may be at different positions
-* Only the **added** `displayorder` attribute is shown for foreword
-* The **added** `<p>` element in clause is shown
-* Unchanged content is not displayed
-
-[example]
-Example with element matching when positions differ:
-
-.file1.xml
-[source,xml]
-----
-<root>
-  <item id="1" name="Widget" price="10.00"/>
-  <item id="2" name="Gadget" price="20.00"/>
-</root>
-----
-
-.file2.xml
-[source,xml]
-----
-<root>
-  <item price="20.00" name="Gadget" id="2"/>
-  <item id="1" name="Widget" price="15.00"/>
-</root>
-----
-
-Running with `--verbose`:
-
-[source,bash]
-----
-$ canon diff file1.xml file2.xml --verbose
-Visual Diff:
-└── root.item[id="1"].price:
-    ├── - "10.00"
-    └── + "15.00"
-----
-
-The semantic matching shows:
-
-* Elements are matched by `id` attribute (`id="1"` with `id="1"`, `id="2"` with `id="2"`)
-* Position changes are ignored (item with `id="2"` moved from second to first)
-* Attribute reordering is ignored (price/name order changed)
-* Only the semantic change is shown: `price` value changed for item `id="1"`
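The matching behaviour described in this list can be approximated in plain Ruby. The sketch below models each `<item>` as a hash of its attributes instead of a parsed DOM node, and `IDENTITY_ATTRS`, `identity_of`, and `match_and_diff` are hypothetical names; it illustrates the idea of pairing elements by an identity attribute and should not be read as Canon's matching algorithm.

```ruby
# Identity attributes named in the text, checked in this order.
IDENTITY_ATTRS = %w[id ref name key].freeze

# Return [attribute_name, value] for the first identity attribute
# present on the element, or nil when it has none.
def identity_of(element)
  found = IDENTITY_ATTRS.find { |name| element.key?(name) }
  found && [found, element[found]]
end

# Pair elements by identity (ignoring position and attribute order),
# then report each attribute whose value differs between the pair.
def match_and_diff(old_items, new_items)
  new_by_identity = new_items.to_h { |el| [identity_of(el), el] }
  old_items.flat_map do |old_el|
    new_el = new_by_identity[identity_of(old_el)]
    next [] unless new_el

    (old_el.keys | new_el.keys)
      .reject { |k| old_el[k] == new_el[k] }
      .map { |k| [identity_of(old_el), k, old_el[k], new_el[k]] }
  end
end

file1 = [
  { "id" => "1", "name" => "Widget", "price" => "10.00" },
  { "id" => "2", "name" => "Gadget", "price" => "20.00" },
]
file2 = [
  { "price" => "20.00", "name" => "Gadget", "id" => "2" },
  { "id" => "1", "name" => "Widget", "price" => "15.00" },
]

diffs = match_and_diff(file1, file2)
diffs.each { |identity, attr_name, before, after| puts "#{identity.inspect} #{attr_name}: #{before} -> #{after}" }
```

Elements without any identity attribute would all share the `nil` identity here; a fuller version would fall back to element paths, as the text describes.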
-
-[NOTE]
-DOM-guided semantic matching features:
-
-* **Identity attributes**: Matches elements using `id`, `ref`, `name`, or `key` attributes
-* **Element paths**: Uses full element path for matching (e.g., `root.item`)
-* **Token-level highlighting**: Shows differences at semantic token level (element
-names, attribute names, attribute values)
-* **Parent filtering**: Skips parent elements that only differ in children to
-avoid redundant output
-* **Line range mapping**: Maps DOM elements to exact line ranges in pretty-printed
-output for accurate diff display
-
-====== XML comparison (by-line mode)
-
-The `--by-line` option switches to traditional line-by-line diff after
-canonicalization, useful when you need to see exact line-level changes.
-
-[example]
-Using the previous example files, but with `--by-line`:
-
-[source,bash]
-----
-$ canon diff document1.xml document2.xml --by-line --verbose
-Line-by-line diff:
-4 - | <foreword id="fwd">
-4 + | <foreword displayorder="2" id="fwd">
-5 | <p>First paragraph</p>
-10 + | <p>New content</p>
-11 | </clause>
-----
-
-The by-line mode shows:
-
-* Traditional diff format with line numbers
-* Full line context after canonicalization
-* All changes at line level (not semantic level)
-* Useful for reviewing exact textual differences
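The architecture notes earlier in this file mention an LCS algorithm behind the by-line formatters. A generic longest-common-subsequence line diff can be sketched as follows; this is the textbook dynamic-programming version with hypothetical helper names (`lcs_table`, `line_diff`), offered as an illustration and not as Canon's formatter code.

```ruby
# Build the LCS length table for two arrays of lines.
# table[i][j] is the LCS length of a[0, i] and b[0, j].
def lcs_table(a, b)
  table = Array.new(a.size + 1) { Array.new(b.size + 1, 0) }
  a.each_index do |i|
    b.each_index do |j|
      table[i + 1][j + 1] =
        a[i] == b[j] ? table[i][j] + 1 : [table[i][j + 1], table[i + 1][j]].max
    end
  end
  table
end

# Walk the table from the end to produce [marker, line] pairs:
# " " for unchanged, "-" for removed, "+" for added lines.
def line_diff(a, b)
  table = lcs_table(a, b)
  i = a.size
  j = b.size
  ops = []
  while i.positive? || j.positive?
    if i.positive? && j.positive? && a[i - 1] == b[j - 1]
      ops << [" ", a[i - 1]]
      i -= 1
      j -= 1
    elsif j.positive? && (i.zero? || table[i][j - 1] >= table[i - 1][j])
      ops << ["+", b[j - 1]]
      j -= 1
    else
      ops << ["-", a[i - 1]]
      i -= 1
    end
  end
  ops.reverse
end

old_lines = ['<foreword id="fwd">', "<p>First paragraph</p>", "</foreword>"]
new_lines = ['<foreword displayorder="2" id="fwd">', "<p>First paragraph</p>", "</foreword>"]

result = line_diff(old_lines, new_lines)
result.each { |marker, line| puts "#{marker} #{line}" }
```

Because both inputs are canonicalized before this step, lines that differ only in formatting have already been made identical and drop out of the diff.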
-
-====== YAML comparison (by-object mode)
-
-YAML comparison uses by-object mode to show semantic differences in the data
-structure, ignoring formatting and key ordering differences.
-
-[example]
-Given these two YAML files:
-
-.config1.yaml
-[source,yaml]
-----
-database:
-  host: localhost
-  port: 5432
-  name: mydb
-logging:
-  level: info
-  format: json
-----
-
-.config2.yaml
-[source,yaml]
-----
-logging:
-  level: debug
-  format: json
-database:
-  port: 5432
-  host: localhost
-  name: production
-----
-
-Running with `--verbose`:
-
-[source,bash]
-----
-$ canon diff config1.yaml config2.yaml --verbose
-Visual Diff:
-├── database.name:
-│   ├── - "mydb"
-│   └── + "production"
-└── logging.level:
-    ├── - "info"
-    └── + "debug"
-----
-
-The by-object mode shows:
-
-* Section reordering (`logging` before `database`) is ignored
-* Key reordering within sections (`port` before `host`) is ignored
-* Only semantic value changes are displayed
-* Tree structure clearly shows the path to each change
-
-===== Comparison options examples
-
-[source,bash]
-----
-# Include comments in XML comparison
-$ canon diff doc1.xml doc2.xml --with-comments --verbose
-
-# Ignore all text content, only compare structure
-$ canon diff template1.html template2.html --ignore-text-nodes
-
-# Don't collapse whitespace (exact whitespace comparison)
-$ canon diff file1.xml file2.xml --no-collapse-whitespace
-
-# Compare different formats (must have same structure)
-$ canon diff config.json config.yaml --format1 json --format2 yaml --verbose
-----
-
-===== HTML comparison (by-line mode only)
-
-HTML comparison always uses by-line mode after canonicalization, which is ideal
-for reviewing markup structure changes.
-
-[example]
-Given these two HTML files:
-
-.page1.html
-[source,html]
-----
-<!DOCTYPE html>
-<html>
-  <head>
-    <title>My Page</title>
-  </head>
-  <body>
-    <div class="header">
-      <h1>Welcome</h1>
-      <p>Introduction text</p>
-    </div>
-    <div class="content">
-      <p>Main content</p>
-    </div>
-  </body>
-</html>
-----
-
-.page2.html
-[source,html]
-----
-<!DOCTYPE html>
-<html>
-  <head>
-    <title>My Updated Page</title>
-  </head>
-  <body>
-    <nav class="header">
-      <h1>Welcome</h1>
-      <p>Updated introduction</p>
-    </nav>
-    <div class="content">
-      <p>Main content</p>
-      <p>Additional paragraph</p>
-    </div>
-  </body>
-</html>
-----
-
-Running with `--verbose`:
-
-[source,bash]
-----
-$ canon diff page1.html page2.html --verbose
-Line-by-line diff:
-4 - | <title>My Page</title>
-4 + | <title>My Updated Page</title>
-7 - | <div class="header">
-7 + | <nav class="header">
-9 - | <p>Introduction text</p>
-9 + | <p>Updated introduction</p>
-10 - | </div>
-10 + | </nav>
-13 + | <p>Additional paragraph</p>
-14 | </div>
-----
-
-The line-by-line mode shows:
-
-* Element name changes (`<div>` to `<nav>`)
-* Text content changes
-* Added elements with proper indentation context
-* Line numbers help locate changes in the document
-
-===== Exit codes
-
-* `0` - Files are semantically equivalent
-* `1` - Files are semantically different
-
-=== Ruby API usage
-
-=== Basic formatting (c14n mode)
-
-The `Canon.format` method produces canonical output by default.
-
-Syntax:
-
-[source,ruby]
-----
-Canon.format({content}, {format})
-Canon.format_{format}({content}) # Format-specific shorthand
-----
-
-Where,
-
-`{content}`:: The input string
-`{format}`:: The format type (`:xml`, `:html`, `:json`, or `:yaml`)
-
-.Canonical formatting examples
-[example]
-====
-[source,ruby]
-----
-require 'canon'
-
-# XML - compact canonical form
-xml = '<root><b>2</b><a>1</a></root>'
-Canon.format(xml, :xml)
-# => "<root><a>1</a><b>2</b></root>"
-
-Canon.format_xml(xml) # Shorthand
-# => "<root><a>1</a><b>2</b></root>"
-
-# HTML - compact canonical form
-html = '<div><p>Hello</p></div>'
-Canon.format(html, :html)
-Canon.format_html(html) # Shorthand
-
-# JSON - canonical with sorted keys
-json = '{"z":3,"a":1,"b":2}'
-Canon.format(json, :json)
-# => {"a":1,"b":2,"z":3}
-
-# YAML - canonical with sorted keys
-yaml = "z: 3\na: 1\nb: 2"
-Canon.format(yaml, :yaml)
-----
-====
-
-=== Pretty-print mode
-
-For human-readable output with indentation, use the format-specific pretty
-printer classes.
-
-Syntax:
-
-[source,ruby]
-----
-Canon::{Format}::PrettyPrinter.new(indent: {n}, indent_type: {type}).format({content})
-----
-
-Where,
-
-`{Format}`:: The format module (`Xml`, `Html`, `Json`)
-`{n}`:: Number of spaces (default: 2) or tabs (use 1 for tabs)
-`{type}`:: Indentation type: `'space'` (default) or `'tab'`
-`{content}`:: The input string
-
-.Pretty-print examples
-[example]
-====
-[source,ruby]
-----
-require 'canon/xml/pretty_printer'
-require 'canon/html/pretty_printer'
-require 'canon/json/pretty_printer'
-
-xml_input = '<root><b>2</b><a>1</a></root>'
-
-# XML with 2-space indentation (default)
-Canon::Xml::PrettyPrinter.new(indent: 2).format(xml_input)
-# =>
-# <?xml version="1.0" encoding="UTF-8"?>
-# <root>
-#   <a>1</a>
-#   <b>2</b>
-# </root>
-
-# XML with 4-space indentation
-Canon::Xml::PrettyPrinter.new(indent: 4).format(xml_input)
-
-# XML with tab indentation
-Canon::Xml::PrettyPrinter.new(
-  indent: 1,
-  indent_type: 'tab'
-).format(xml_input)
-
-# HTML with 2-space indentation
-html_input = '<div><p>Hello</p></div>'
-Canon::Html::PrettyPrinter.new(indent: 2).format(html_input)
-
-# JSON with 2-space indentation
-json_input = '{"z":3,"a":{"b":1}}'
-Canon::Json::PrettyPrinter.new(indent: 2).format(json_input)
-
-# JSON with tab indentation
-Canon::Json::PrettyPrinter.new(
-  indent: 1,
-  indent_type: 'tab'
-).format(json_input)
-----
-====
-
-=== Parsing
-
-The `Canon.parse` method parses content into Ruby objects or Nokogiri documents.
-
-Syntax:
-
-[source,ruby]
-----
-Canon.parse({content}, {format})
-Canon.parse_{format}({content}) # Format-specific shorthand
-----
-
-Where,
-
-`{content}`:: The input string
-`{format}`:: The format type (`:xml`, `:html`, `:json`, or `:yaml`)
-
-.Parsing examples
|
|
815
|
-
[example]
|
|
816
|
-
====
|
|
817
|
-
[source,ruby]
|
|
818
|
-
----
|
|
819
|
-
# Parse XML → Nokogiri::XML::Document
|
|
820
|
-
xml_doc = Canon.parse(xml_input, :xml)
|
|
821
|
-
xml_doc = Canon.parse_xml(xml_input)
|
|
822
|
-
|
|
823
|
-
# Parse HTML → Nokogiri::HTML5::Document (or XML::Document for XHTML)
|
|
824
|
-
html_doc = Canon.parse(html_input, :html)
|
|
825
|
-
html_doc = Canon.parse_html(html_input)
|
|
826
|
-
|
|
827
|
-
# Parse JSON → Ruby Hash/Array
|
|
828
|
-
json_obj = Canon.parse(json_input, :json)
|
|
829
|
-
json_obj = Canon.parse_json(json_input)
|
|
830
|
-
|
|
831
|
-
# Parse YAML → Ruby Hash/Array
|
|
832
|
-
yaml_obj = Canon.parse(yaml_input, :yaml)
|
|
833
|
-
yaml_obj = Canon.parse_yaml(yaml_input)
|
|
834
|
-
----
|
|
835
|
-
====
|
|
836
|
-
|
|
837
|
-
=== Comparison
|
|
838
|
-
|
|
839
|
-
The `Canon::Comparison.equivalent?` method compares two HTML or XML documents.
|
|
840
|
-
|
|
841
|
-
The Comparison module performs a depth-first comparison of the two DOM trees,
|
|
842
|
-
traversing them in parallel and comparing corresponding nodes.
|
|
843
|
-
|
|
844
|
-
In XML mode:
|
|
845
|
-
|
|
846
|
-
* Parsing: accepts Moxml (`Moxml::Document`) or Nokogiri
|
|
847
|
-
(`Nokogiri::XML::Document`)
|
|
848
|
-
* Comments: normalized and compared unless `ignore_comments: true`
|
|
849
|
-
* Whitespace: collapses whitespace in text nodes unless `collapse_whitespace: false`
|
|
850
|
-
* Sorts attributes alphabetically before comparison
|
|
851
|
-
|
|
852
|
-
In HTML mode:
|
|
853
|
-
|
|
854
|
-
* Parsing: accepts Nokogiri (`Nokogiri::HTML5` or `Nokogiri::HTML`)
|
|
855
|
-
* Normalizes HTML comments in `<style>` and `<script>` tags
|
|
856
|
-
* Sorts attributes alphabetically before comparison
|
|
857
|
-
* Collapses whitespace for text content comparison
|
|
858
|
-
* Removes empty text nodes between elements
|
|
859
|
-
|
|
860
|
-
[NOTE]
|
|
861
|
-
====
|
|
862
|
-
The comparison module is automatically used by Canon's RSpec matchers
|
|
863
|
-
(`be_html_equivalent_to`, `be_xml_equivalent_to`, etc.) to provide reliable
|
|
864
|
-
semantic comparison in tests.
|
|
865
|
-
====
|
|
866
|
-
|
|
867
|
-
|
|
868
|
-
Syntax:
|
|
869
|
-
|
|
870
|
-
[source,ruby]
|
|
871
|
-
----
|
|
872
|
-
Canon::Comparison.equivalent?({doc1}, {doc2}, {options})
|
|
873
|
-
----
|
|
874
|
-
|
|
875
|
-
Where,
|
|
876
|
-
|
|
877
|
-
`{doc1}`:: First document object (String, Nokogiri::HTML::Document, or supported XML document)
|
|
878
|
-
`{doc2}`:: Second document object (String, Nokogiri::HTML::Document, or supported XML document)
|
|
879
|
-
`{options}`:: Hash of comparison options (optional)
|
|
880
|
-
|
|
881
|
-
For XML, `Canon::Comparison` accepts `Moxml::Document` and `Nokogiri::XML::Document`
|
|
882
|
-
as input.
|
|
883
|
-
|
|
884
|
-
Returns:
|
|
885
|
-
|
|
886
|
-
* `true` if documents are equivalent
|
|
887
|
-
* `false` if documents differ
|
|
888
|
-
* `Array` of differences if `verbose: true` option is set
|
|
889
|
-
|
|
890
|
-
.Basic comparison examples
|
|
891
|
-
[example]
|
|
892
|
-
====
|
|
893
|
-
[source,ruby]
|
|
894
|
-
----
|
|
895
|
-
require 'canon/comparison'
|
|
896
|
-
|
|
897
|
-
# HTML comparison - ignores whitespace and comments by default
|
|
898
|
-
html1 = '<div><p>Hello</p></div>'
|
|
899
|
-
html2 = '<div> <p> Hello </p> </div>'
|
|
900
|
-
Canon::Comparison.equivalent?(html1, html2)
|
|
901
|
-
# => true
|
|
902
|
-
|
|
903
|
-
# HTML with different content
|
|
904
|
-
html3 = '<div><p>Goodbye</p></div>'
|
|
905
|
-
Canon::Comparison.equivalent?(html1, html3)
|
|
906
|
-
# => false
|
|
907
|
-
|
|
908
|
-
# XML comparison
|
|
909
|
-
xml1 = '<root><a>1</a><b>2</b></root>'
|
|
910
|
-
xml2 = '<root> <b>2</b> <a>1</a> </root>'
|
|
911
|
-
Canon::Comparison.equivalent?(xml1, xml2)
|
|
912
|
-
# => true
|
|
913
|
-
|
|
914
|
-
# With Nokogiri documents
|
|
915
|
-
doc1 = Nokogiri::HTML5(html1)
|
|
916
|
-
doc2 = Nokogiri::HTML5(html2)
|
|
917
|
-
Canon::Comparison.equivalent?(doc1, doc2)
|
|
918
|
-
# => true
|
|
919
|
-
----
|
|
920
|
-
====
|
|
921
|
-
|
|
922
|
-
=== RSpec usage
|
|
923
|
-
|
|
924
|
-
==== General
|
|
925
|
-
|
|
926
|
-
Canon provides RSpec matchers for testing equivalence between serialized formats. All matchers
|
|
927
|
-
use canonical (c14n) mode for comparison.
|
|
928
|
-
|
|
929
|
-
See <<Diff formatting configuration>> for details on configuring diff output
|
|
930
|
-
in RSpec matchers.
|
|
931
|
-
|
|
932
|
-
.RSpec matcher examples
|
|
933
|
-
[example]
|
|
934
|
-
====
|
|
935
|
-
[source,ruby]
|
|
936
|
-
----
|
|
937
|
-
require 'rspec'
|
|
938
|
-
require 'canon'
|
|
939
|
-
|
|
940
|
-
RSpec.describe 'Serialization equivalence' do
|
|
941
|
-
# Unified matcher with format parameter
|
|
942
|
-
it 'compares XML' do
|
|
943
|
-
xml1 = '<root><a>1</a><b>2</b></root>'
|
|
944
|
-
xml2 = '<root> <b>2</b> <a>1</a> </root>'
|
|
945
|
-
expect(xml1).to be_serialization_equivalent_to(xml2, format: :xml)
|
|
946
|
-
end
|
|
947
|
-
|
|
948
|
-
it 'compares HTML' do
|
|
949
|
-
html1 = '<div><p>Hello</p></div>'
|
|
950
|
-
html2 = '<div> <p> Hello </p> </div>'
|
|
951
|
-
expect(html1).to be_serialization_equivalent_to(html2, format: :html)
|
|
952
|
-
end
|
|
953
|
-
|
|
954
|
-
it 'compares JSON' do
|
|
955
|
-
json1 = '{"a":1,"b":2}'
|
|
956
|
-
json2 = '{"b":2,"a":1}'
|
|
957
|
-
expect(json1).to be_serialization_equivalent_to(json2, format: :json)
|
|
958
|
-
end
|
|
959
|
-
|
|
960
|
-
it 'compares YAML' do
|
|
961
|
-
yaml1 = "a: 1\nb: 2"
|
|
962
|
-
yaml2 = "b: 2\na: 1"
|
|
963
|
-
expect(yaml1).to be_serialization_equivalent_to(yaml2, format: :yaml)
|
|
964
|
-
end
|
|
965
|
-
|
|
966
|
-
# Format-specific matchers (xml1, xml2, html1, ... must be in scope here, e.g. defined with `let`)
|
|
967
|
-
it 'uses format-specific matchers' do
|
|
968
|
-
expect(xml1).to be_xml_equivalent_to(xml2) # XML
|
|
969
|
-
expect(xml1).to be_analogous_with(xml2) # XML (legacy)
|
|
970
|
-
expect(html1).to be_html_equivalent_to(html2) # HTML
|
|
971
|
-
expect(json1).to be_json_equivalent_to(json2) # JSON
|
|
972
|
-
expect(yaml1).to be_yaml_equivalent_to(yaml2) # YAML
|
|
973
|
-
end
|
|
974
|
-
end
|
|
975
|
-
----
|
|
976
|
-
====
|
|
977
|
-
|
|
978
|
-
[IMPORTANT]
|
|
979
|
-
====
|
|
980
|
-
RSpec matchers always canonicalize both sides before comparing, so:
|
|
981
|
-
|
|
982
|
-
* Formatting differences (whitespace, indentation) are ignored
|
|
983
|
-
* Attribute order in XML/HTML is normalized
|
|
984
|
-
* Key order in JSON/YAML is normalized
|
|
985
|
-
* Tests focus on content equality, not formatting
|
|
986
|
-
====
|
|
987
|
-
|
|
988
|
-
|
|
989
|
-
=== Usage examples
|
|
990
|
-
|
|
991
|
-
==== Using predefined profiles
|
|
992
|
-
|
|
993
|
-
Use a profile for XML comparison:
|
|
994
|
-
|
|
995
|
-
[source,ruby]
|
|
996
|
-
----
|
|
997
|
-
expect(actual_xml).to be_xml_equivalent_to(
|
|
998
|
-
expected_xml,
|
|
999
|
-
match_profile: :spec_friendly
|
|
1000
|
-
)
|
|
1001
|
-
----
|
|
1002
|
-
|
|
1003
|
-
Use a profile for HTML comparison:
|
|
1004
|
-
|
|
1005
|
-
[source,ruby]
|
|
1006
|
-
----
|
|
1007
|
-
expect(actual_html).to be_html_equivalent_to(
|
|
1008
|
-
expected_html,
|
|
1009
|
-
match_profile: :content_only
|
|
1010
|
-
)
|
|
1011
|
-
----
|
|
1012
|
-
|
|
1013
|
-
==== Using explicit match options
|
|
1014
|
-
|
|
1015
|
-
Override specific dimensions:
|
|
1016
|
-
|
|
1017
|
-
[source,ruby]
|
|
1018
|
-
----
|
|
1019
|
-
expect(actual_xml).to be_xml_equivalent_to(
|
|
1020
|
-
expected_xml,
|
|
1021
|
-
match_options: {
|
|
1022
|
-
text_content: :normalize,
|
|
1023
|
-
structural_whitespace: :ignore,
|
|
1024
|
-
attribute_whitespace: :strict,
|
|
1025
|
-
comments: :ignore
|
|
1026
|
-
}
|
|
1027
|
-
)
|
|
1028
|
-
----
|
|
1029
|
-
|
|
1030
|
-
==== Combining profiles and explicit options
|
|
1031
|
-
|
|
1032
|
-
Explicit options override profile settings:
|
|
1033
|
-
|
|
1034
|
-
[source,ruby]
|
|
1035
|
-
----
|
|
1036
|
-
expect(actual_xml).to be_xml_equivalent_to(
|
|
1037
|
-
expected_xml,
|
|
1038
|
-
match_profile: :spec_friendly,
|
|
1039
|
-
match_options: {
|
|
1040
|
-
attribute_whitespace: :strict # Override just this dimension
|
|
1041
|
-
}
|
|
1042
|
-
)
|
|
1043
|
-
----
|
|
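The override rule can be pictured as a plain hash merge in which explicit options win over profile values. This is a sketch under stated assumptions, not Canon's resolver: `PROFILES` and `resolve_match_options` are hypothetical names, and the profile values are copied from the profile table in the "Match profiles" section of this document.

```ruby
# Hypothetical constant (not Canon's internals): profile values as listed in
# the "Match profiles" table of this document.
PROFILES = {
  strict: { text_content: :strict, structural_whitespace: :strict,
            attribute_whitespace: :strict, comments: :strict },
  rendered: { text_content: :normalize, structural_whitespace: :ignore,
              attribute_whitespace: :normalize, comments: :ignore },
  spec_friendly: { text_content: :normalize, structural_whitespace: :ignore,
                   attribute_whitespace: :normalize, comments: :ignore },
  content_only: { text_content: :normalize, structural_whitespace: :ignore,
                  attribute_whitespace: :ignore, comments: :ignore }
}.freeze

# Hypothetical helper: start from the profile, then let explicit options
# override individual dimensions.
def resolve_match_options(match_profile: :strict, match_options: {})
  PROFILES.fetch(match_profile).merge(match_options)
end

resolved = resolve_match_options(
  match_profile: :spec_friendly,
  match_options: { attribute_whitespace: :strict }
)
p resolved
```

Only `attribute_whitespace` changes in the resolved hash; the other three dimensions keep the `spec_friendly` values.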
1044
|
-
|
|
1045
|
-
==== Global configuration
|
|
1046
|
-
|
|
1047
|
-
Set a global default profile for all tests:
|
|
1048
|
-
|
|
1049
|
-
[source,ruby]
|
|
1050
|
-
----
|
|
1051
|
-
# In spec_helper.rb
|
|
1052
|
-
Canon::RSpecMatchers.configure do |config|
|
|
1053
|
-
config.xml_match_profile = :spec_friendly
|
|
1054
|
-
config.html_match_profile = :rendered
|
|
1055
|
-
end
|
|
1056
|
-
----
|
|
1057
|
-
|
|
1058
|
-
Override global profile in specific tests:
|
|
1059
|
-
|
|
1060
|
-
[source,ruby]
|
|
1061
|
-
----
|
|
1062
|
-
# This test uses strict matching despite global spec_friendly
|
|
1063
|
-
expect(actual_xml).to be_xml_equivalent_to(
|
|
1064
|
-
expected_xml,
|
|
1065
|
-
match_profile: :strict
|
|
1066
|
-
)
|
|
1067
|
-
----
|
|
1068
|
-
|
|
1069
|
-
== Configuration
|
|
1070
|
-
|
|
1071
|
-
=== Comparison options
|
|
1072
|
-
|
|
1073
|
-
==== Overview
|
|
1074
|
-
|
|
1075
|
-
Canon provides a flexible matching system for XML, HTML, JSON, and YAML
|
|
1076
|
-
comparisons.
|
|
1077
|
-
|
|
1078
|
-
This system allows precise control over how whitespace and formatting
|
|
1079
|
-
differences are handled during comparison.
|
|
1080
|
-
|
|
1081
|
-
These options apply to the `Canon::Comparison.equivalent?` method, to Canon's
|
|
1082
|
-
RSpec matchers, and to the command-line `canon diff` tool, all of which perform
|
|
1083
|
-
semantic comparison.
|
|
1084
|
-
|
|
1085
|
-
The system uses a two-phase architecture:
|
|
1086
|
-
|
|
1087
|
-
* *Preprocessing phase*: What to compare (normalization, canonicalization, formatting)
|
|
1088
|
-
* *Matching phase*: How to compare (4 dimensions × 3 behaviors)
|
|
1089
|
-
|
|
1090
|
-
The system uses `match_options` and `match_profile` parameters that offer
|
|
1091
|
-
precise control over comparison behavior.
|
|
1092
|
-
|
|
1093
|
-
`ignore_attr_order`:: (default: `true`) when `true`, ignores attribute ordering
|
|
1094
|
-
(<<ignore_attr_order>>)
|
|
1095
|
-
|
|
1096
|
-
`verbose`:: (default: `false`) when `true`, returns array of differences instead
|
|
1097
|
-
of boolean (<<verbose>>)
|
|
1098
|
-
|
|
1099
|
-
|
|
1100
|
-
=== Preprocessing phase
|
|
1101
|
-
|
|
1102
|
-
The preprocessing phase determines what content is compared.
|
|
1103
|
-
|
|
1104
|
-
Canon supports four preprocessing options:
|
|
1105
|
-
|
|
1106
|
-
[cols="1,3"]
|
|
1107
|
-
|===
|
|
1108
|
-
| Option | Description
|
|
1109
|
-
|
|
1110
|
-
| `:none`
|
|
1111
|
-
| No preprocessing - compare raw content as-is
|
|
1112
|
-
|
|
1113
|
-
| `:c14n`
|
|
1114
|
-
| Apply XML Canonicalization (C14N) to normalize structure before comparison
|
|
1115
|
-
|
|
1116
|
-
| `:normalize`
|
|
1117
|
-
| Apply whitespace normalization (collapsing, trimming) before comparison
|
|
1118
|
-
|
|
1119
|
-
| `:format`
|
|
1120
|
-
| Apply format-specific pretty-printing to standardize formatting before comparison
|
|
1121
|
-
|
|
1122
|
-
|===
|
|
1123
|
-
|
|
1124
|
-
The preprocessing option is controlled via the `preprocessing` parameter and
|
|
1125
|
-
its default depends on the format being compared.
|
|
1126
|
-
|
|
1127
|
-
=== Matching phase
|
|
1128
|
-
|
|
1129
|
-
The matching phase defines how content is compared across four independent
|
|
1130
|
-
dimensions. Each dimension can be configured with one of three mutually
|
|
1131
|
-
exclusive behaviors.
|
|
1132
|
-
|
|
1133
|
-
=== Match dimensions
|
|
1134
|
-
|
|
1135
|
-
The matching phase operates on four collectively exhaustive dimensions:
|
|
1136
|
-
|
|
1137
|
-
[cols="1,3"]
|
|
1138
|
-
|===
|
|
1139
|
-
| Dimension | What it controls
|
|
1140
|
-
|
|
1141
|
-
| `text_content`
|
|
1142
|
-
| Text content within elements/values
|
|
1143
|
-
|
|
1144
|
-
| `structural_whitespace`
|
|
1145
|
-
| Whitespace between tags/elements (indentation, line breaks)
|
|
1146
|
-
|
|
1147
|
-
| `attribute_whitespace`
|
|
1148
|
-
| Whitespace within attribute values
|
|
1149
|
-
|
|
1150
|
-
| `comments`
|
|
1151
|
-
| How comments are handled
|
|
1152
|
-
|===
|
|
1153
|
-
|
|
1154
|
-
These four dimensions are collectively exhaustive - they cover all aspects of
|
|
1155
|
-
whitespace and formatting in structured documents.
|
|
1156
|
-
|
|
1157
|
-
=== Match behaviors
|
|
1158
|
-
|
|
1159
|
-
For each dimension, you can specify one of three mutually exclusive behaviors:
|
|
1160
|
-
|
|
1161
|
-
[cols="1,3"]
|
|
1162
|
-
|===
|
|
1163
|
-
| Behavior | Description
|
|
1164
|
-
|
|
1165
|
-
| `:strict`
|
|
1166
|
-
| Exact character-for-character matching (including all whitespace)
|
|
1167
|
-
|
|
1168
|
-
| `:normalize`
|
|
1169
|
-
| Collapse consecutive whitespace to single spaces, trim leading/trailing whitespace
|
|
1170
|
-
|
|
1171
|
-
| `:ignore`
|
|
1172
|
-
| Don't compare this dimension at all
|
|
1173
|
-
|===
|
|
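The three behaviors can be illustrated on plain strings. The sketch below is an assumption-laden illustration rather than Canon's matching code: `normalize_ws` and `values_match?` are hypothetical helpers, and `:normalize` is taken to mean exactly what the table says (collapse runs of whitespace to one space, then trim).

```ruby
# Hypothetical helper: collapse consecutive whitespace and trim the ends.
def normalize_ws(text)
  text.gsub(/\s+/, ' ').strip
end

# Hypothetical helper: decide whether two values match under one behavior.
def values_match?(left, right, behavior)
  case behavior
  when :strict
    left == right                               # character-for-character
  when :normalize
    normalize_ws(left) == normalize_ws(right)   # whitespace-insensitive
  when :ignore
    true                                        # dimension is not compared
  else
    raise ArgumentError, "unknown behavior: #{behavior.inspect}"
  end
end

a = "  text \n with   spaces "
b = "text with spaces"

puts values_match?(a, b, :strict)     # false
puts values_match?(a, b, :normalize)  # true
puts values_match?(a, "other", :ignore) # true
```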
1174
|
-
|
|
1175
|
-
=== Match profiles
|
|
1176
|
-
|
|
1177
|
-
==== Overview
|
|
1178
|
-
|
|
1179
|
-
Canon provides a set of predefined match profiles optimized for common use cases.
|
|
1180
|
-
|
|
1181
|
-
The following table shows how each profile configures the four match dimensions:
|
|
1182
|
-
|
|
1183
|
-
[cols="1,1,1,1,1"]
|
|
1184
|
-
|===
|
|
1185
|
-
|Profile |text_content |structural_whitespace |attribute_whitespace |comments
|
|
1186
|
-
|
|
1187
|
-
|`strict` |`:strict` |`:strict` |`:strict` |`:strict`
|
|
1188
|
-
|
|
1189
|
-
|`rendered` |`:normalize` |`:ignore` |`:normalize` |`:ignore`
|
|
1190
|
-
|
|
1191
|
-
|`spec_friendly` |`:normalize` |`:ignore` |`:normalize` |`:ignore`
|
|
1192
|
-
|
|
1193
|
-
|`content_only` |`:normalize` |`:ignore` |`:ignore` |`:ignore`
|
|
1194
|
-
|
|
1195
|
-
|===
|
|
1196
|
-
|
|
1197
|
-
The key differences between profiles are:
|
|
1198
|
-
|
|
1199
|
-
strict:: Exact matching on all dimensions - use for byte-for-byte comparison
|
|
1200
|
-
rendered:: Mimics browser rendering - collapses text, ignores formatting and comments
|
|
1201
|
-
spec_friendly:: Same as rendered - ideal for test specifications
|
|
1202
|
-
content_only:: Most permissive - only compares text content, ignores all formatting and attribute whitespace
|
|
1203
|
-
|
|
1204
|
-
NOTE: The `rendered` and `spec_friendly` profiles have identical configurations
|
|
1205
|
-
but serve different semantic purposes in your codebase.
|
|
1206
|
-
|
|
1207
|
-
==== Strict profile
|
|
1208
|
-
|
|
1209
|
-
The `strict` profile is the default for XML and requires exact matching:
|
|
1210
|
-
|
|
1211
|
-
[source,ruby]
|
|
1212
|
-
----
|
|
1213
|
-
{
|
|
1214
|
-
text_content: :strict,
|
|
1215
|
-
structural_whitespace: :strict,
|
|
1216
|
-
attribute_whitespace: :strict,
|
|
1217
|
-
comments: :strict
|
|
1218
|
-
}
|
|
1219
|
-
----
|
|
1220
|
-
|
|
1221
|
-
Use this when:
|
|
1222
|
-
|
|
1223
|
-
* You need exact byte-for-byte comparison
|
|
1224
|
-
* Whitespace is semantically significant
|
|
1225
|
-
* Working with canonicalized or pre-normalized content
|
|
1226
|
-
|
|
1227
|
-
==== Rendered profile
|
|
1228
|
-
|
|
1229
|
-
The `rendered` profile mimics how browsers render HTML/XML:
|
|
1230
|
-
|
|
1231
|
-
[source,ruby]
|
|
1232
|
-
----
|
|
1233
|
-
{
|
|
1234
|
-
text_content: :normalize,
|
|
1235
|
-
structural_whitespace: :ignore,
|
|
1236
|
-
attribute_whitespace: :normalize,
|
|
1237
|
-
comments: :ignore
|
|
1238
|
-
}
|
|
1239
|
-
----
|
|
1240
|
-
|
|
1241
|
-
Use this when:
|
|
1242
|
-
|
|
1243
|
-
* Comparing HTML documents where rendering matters
|
|
1244
|
-
* Whitespace between tags doesn't affect output
|
|
1245
|
-
* Comments are documentation-only
|
|
1246
|
-
|
|
1247
|
-
This is the default profile for HTML comparisons.
|
|
1248
|
-
|
|
1249
|
-
==== Spec-friendly profile
|
|
1250
|
-
|
|
1251
|
-
The `spec_friendly` profile ignores formatting differences and normalizes whitespace in text and attribute values:
|
|
1252
|
-
|
|
1253
|
-
[source,ruby]
|
|
1254
|
-
----
|
|
1255
|
-
{
|
|
1256
|
-
text_content: :normalize,
|
|
1257
|
-
structural_whitespace: :ignore,
|
|
1258
|
-
attribute_whitespace: :normalize,
|
|
1259
|
-
comments: :ignore
|
|
1260
|
-
}
|
|
1261
|
-
----
|
|
1262
|
-
|
|
1263
|
-
Use this when:
|
|
1264
|
-
|
|
1265
|
-
* Writing test specifications
|
|
1266
|
-
* Formatting/indentation style doesn't matter
|
|
1267
|
-
* Generated vs. hand-written content comparison
|
|
1268
|
-
* CI/CD environments with different formatters
|
|
1269
|
-
|
|
1270
|
-
==== Content-only profile
|
|
1271
|
-
|
|
1272
|
-
The `content_only` profile focuses solely on actual content:
|
|
1273
|
-
|
|
1274
|
-
[source,ruby]
|
|
1275
|
-
----
|
|
1276
|
-
{
|
|
1277
|
-
text_content: :normalize,
|
|
1278
|
-
structural_whitespace: :ignore,
|
|
1279
|
-
attribute_whitespace: :ignore,
|
|
1280
|
-
comments: :ignore
|
|
1281
|
-
}
|
|
1282
|
-
----
|
|
1283
|
-
|
|
1284
|
-
Use this when:
|
|
1285
|
-
|
|
1286
|
-
* Only semantic content matters
|
|
1287
|
-
* All whitespace (including in attributes) is insignificant
|
|
1288
|
-
* Maximum tolerance for formatting differences
|
|
1289
|
-
|
|
1290
|
-
|
|
1291
|
-
=== Format-specific defaults
|
|
1292
|
-
|
|
1293
|
-
==== General
|
|
1294
|
-
|
|
1295
|
-
Different formats have different default behaviors optimized for their typical
|
|
1296
|
-
use cases.
|
|
1297
|
-
|
|
1298
|
-
==== XML defaults
|
|
1299
|
-
|
|
1300
|
-
[source,ruby]
|
|
1301
|
-
----
|
|
1302
|
-
{
|
|
1303
|
-
preprocessing: :none,
|
|
1304
|
-
match_profile: :strict
|
|
1305
|
-
}
|
|
1306
|
-
----
|
|
1307
|
-
|
|
1308
|
-
XML defaults to strict matching because:
|
|
1309
|
-
|
|
1310
|
-
* XML whitespace can be semantically significant
|
|
1311
|
-
* XML is often machine-generated with consistent formatting
|
|
1312
|
-
* Canonicalization (C14N) is available for normalization when needed
|
|
1313
|
-
|
|
1314
|
-
==== HTML defaults
|
|
1315
|
-
|
|
1316
|
-
[source,ruby]
|
|
1317
|
-
----
|
|
1318
|
-
{
|
|
1319
|
-
preprocessing: :none,
|
|
1320
|
-
match_profile: :rendered
|
|
1321
|
-
}
|
|
1322
|
-
----
|
|
1323
|
-
|
|
1324
|
-
HTML defaults to rendered-style matching because:
|
|
1325
|
-
|
|
1326
|
-
* Browsers collapse whitespace when rendering
|
|
1327
|
-
* Indentation and formatting are for readability only
|
|
1328
|
-
* Comments are typically documentation
|
|
1329
|
-
|
|
1330
|
-
==== JSON defaults
|
|
1331
|
-
|
|
1332
|
-
[source,ruby]
|
|
1333
|
-
----
|
|
1334
|
-
{
|
|
1335
|
-
preprocessing: :format,
|
|
1336
|
-
match_profile: :rendered
|
|
1337
|
-
}
|
|
1338
|
-
----
|
|
1339
|
-
|
|
1340
|
-
JSON applies pretty-printing before comparison because:
|
|
1341
|
-
|
|
1342
|
-
* JSON whitespace is never semantically significant
|
|
1343
|
-
* Minified vs. formatted JSON should be equivalent
|
|
1344
|
-
* Pretty-printing ensures consistent structure
|
|
1345
|
-
|
|
1346
|
-
==== YAML defaults
|
|
1347
|
-
|
|
1348
|
-
[source,ruby]
|
|
1349
|
-
----
|
|
1350
|
-
{
|
|
1351
|
-
preprocessing: :format,
|
|
1352
|
-
match_profile: :rendered
|
|
1353
|
-
}
|
|
1354
|
-
----
|
|
1355
|
-
|
|
1356
|
-
YAML applies pretty-printing because:
|
|
1357
|
-
|
|
1358
|
-
* YAML formatting can vary significantly
|
|
1359
|
-
* Indentation styles differ between generators
|
|
1360
|
-
* Content equivalence is what matters
|
|
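The four default hashes above can be read as one lookup table keyed by format. The sketch below only restates the values given in this section; `FORMAT_DEFAULTS` and `defaults_for` are hypothetical names and are not claimed to exist in Canon.

```ruby
# Hypothetical lookup table restating the per-format defaults listed above.
FORMAT_DEFAULTS = {
  xml: { preprocessing: :none, match_profile: :strict },
  html: { preprocessing: :none, match_profile: :rendered },
  json: { preprocessing: :format, match_profile: :rendered },
  yaml: { preprocessing: :format, match_profile: :rendered }
}.freeze

# Hypothetical helper: per-format defaults with optional caller overrides.
def defaults_for(format, overrides = {})
  FORMAT_DEFAULTS.fetch(format).merge(overrides)
end

p defaults_for(:xml)
p defaults_for(:xml, match_profile: :spec_friendly)
```

Using `fetch` makes an unsupported format fail loudly with a `KeyError` instead of silently comparing with no defaults.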
1361
|
-
|
|
1362
|
-
|
|
1363
|
-
|
|
1364
|
-
==== Dimension-specific examples
|
|
1365
|
-
|
|
1366
|
-
=== Text content dimension
|
|
1367
|
-
|
|
1368
|
-
The `text_content` dimension controls how text within elements is compared.
|
|
1369
|
-
|
|
1370
|
-
==== Strict behavior (exact whitespace)
|
|
1371
|
-
|
|
1372
|
-
When `text_content: :strict`, all whitespace in text content must match exactly.
|
|
1373
|
-
|
|
1374
|
-
.XML examples with strict text_content
|
|
1375
|
-
[example]
|
|
1376
|
-
The following XML strings are **not** considered equal because whitespace differs:
|
|
1377
|
-
|
|
1378
|
-
[source,xml]
|
|
1379
|
-
----
|
|
1380
|
-
<p> text with spaces </p>
|
|
1381
|
-
<p>text with spaces</p>
|
|
1382
|
-
----
|
|
1383
|
-
|
|
1384
|
-
[source,ruby]
|
|
1385
|
-
----
|
|
1386
|
-
actual = "<p> text with spaces </p>"
|
|
1387
|
-
expected = "<p>text with spaces</p>"
|
|
1388
|
-
|
|
1389
|
-
expect(actual).not_to be_xml_equivalent_to(
|
|
1390
|
-
expected,
|
|
1391
|
-
match_options: {
|
|
1392
|
-
text_content: :strict,
|
|
1393
|
-
structural_whitespace: :ignore,
|
|
1394
|
-
attribute_whitespace: :strict,
|
|
1395
|
-
comments: :ignore
|
|
1396
|
-
}
|
|
1397
|
-
)
|
|
1398
|
-
# => true (documents are NOT equivalent)
|
|
1399
|
-
----
|
|
1400
|
-
|
|
1401
|
-
Even differences in leading/trailing whitespace matter:
|
|
1402
|
-
|
|
1403
|
-
[source,xml]
|
|
1404
|
-
----
|
|
1405
|
-
<item> Value </item>
|
|
1406
|
-
<item>Value</item>
|
|
1407
|
-
----
|
|
1408
|
-
|
|
1409
|
-
[source,ruby]
|
|
1410
|
-
----
|
|
1411
|
-
xml1 = "<item> Value </item>"
|
|
1412
|
-
xml2 = "<item>Value</item>"
|
|
1413
|
-
|
|
1414
|
-
expect(xml1).not_to be_xml_equivalent_to(
|
|
1415
|
-
xml2,
|
|
1416
|
-
match_options: { text_content: :strict, structural_whitespace: :ignore }
|
|
1417
|
-
)
|
|
1418
|
-
# => true (documents are NOT equivalent)
|
|
1419
|
-
----
|
|
1420
|
-
|
|
1421
|
-
.HTML examples with strict text_content
|
|
1422
|
-
[example]
|
|
1423
|
-
[source,html]
|
|
1424
|
-
----
|
|
1425
|
-
<a href="/admin"> SOME TEXT </a>
|
|
1426
|
-
<a href="/admin">SOME TEXT</a>
|
|
1427
|
-
----
|
|
1428
|
-
|
|
1429
|
-
[source,ruby]
|
|
1430
|
-
----
|
|
1431
|
-
html1 = '<a href="/admin"> SOME TEXT </a>'
|
|
1432
|
-
html2 = '<a href="/admin">SOME TEXT</a>'
|
|
1433
|
-
|
|
1434
|
-
expect(html1).not_to be_html_equivalent_to(
|
|
1435
|
-
html2,
|
|
1436
|
-
match_options: { text_content: :strict, structural_whitespace: :ignore }
|
|
1437
|
-
)
|
|
1438
|
-
# => true (documents are NOT equivalent)
|
|
1439
|
-
----
|
|
1440
|
-
|
|
1441
|
-
==== Normalize behavior (collapse whitespace)
|
|
1442
|
-
|
|
1443
|
-
When `text_content: :normalize`, consecutive whitespace is collapsed to single spaces and leading/trailing whitespace is trimmed.
|
|
1444
|
-
|
|
1445
|
-
.XML examples with normalized text_content
|
|
1446
|
-
[example]
|
|
1447
|
-
The following XML strings **are** considered equal:
|
|
1448
|
-
|
|
1449
|
-
[source,xml]
|
|
1450
|
-
----
|
|
1451
|
-
<p>  text   with   multiple   spaces  </p>
|
|
1452
|
-
<p>text with multiple spaces</p>
|
|
1453
|
-
----
|
|
1454
|
-
|
|
1455
|
-
[source,ruby]
|
|
1456
|
-
----
|
|
1457
|
-
actual = "<p>  text   with   multiple   spaces  </p>"
|
|
1458
|
-
expected = "<p>text with multiple spaces</p>"
|
|
1459
|
-
|
|
1460
|
-
expect(actual).to be_xml_equivalent_to(
|
|
1461
|
-
expected,
|
|
1462
|
-
match_options: {
|
|
1463
|
-
text_content: :normalize,
|
|
1464
|
-
structural_whitespace: :ignore,
|
|
1465
|
-
attribute_whitespace: :strict,
|
|
1466
|
-
comments: :ignore
|
|
1467
|
-
}
|
|
1468
|
-
)
|
|
1469
|
-
# => true (documents are equivalent)
|
|
1470
|
-
----
|
|
1471
|
-
|
|
1472
|
-
Tabs and newlines are also normalized:
|
|
1473
|
-
|
|
1474
|
-
[source,xml]
|
|
1475
|
-
----
|
|
1476
|
-
<description>
|
|
1477
|
-
This is a
|
|
1478
|
-
multi-line
|
|
1479
|
-
description
|
|
1480
|
-
</description>
|
|
1481
|
-
|
|
1482
|
-
<description>This is a multi-line description</description>
|
|
1483
|
-
----
|
|
1484
|
-
|
|
1485
|
-
[source,ruby]
|
|
1486
|
-
----
|
|
1487
|
-
xml1 = <<~XML
|
|
1488
|
-
<description>
|
|
1489
|
-
This is a
|
|
1490
|
-
multi-line
|
|
1491
|
-
description
|
|
1492
|
-
</description>
|
|
1493
|
-
XML
|
|
1494
|
-
|
|
1495
|
-
xml2 = "<description>This is a multi-line description</description>"
|
|
1496
|
-
|
|
1497
|
-
expect(xml1).to be_xml_equivalent_to(
|
|
1498
|
-
xml2,
|
|
1499
|
-
match_options: { text_content: :normalize, structural_whitespace: :ignore }
|
|
1500
|
-
)
|
|
1501
|
-
# => true (documents are equivalent)
|
|
1502
|
-
----
|
|
1503
|
-
|
|
1504
|
-
.HTML examples with normalized text_content
|
|
1505
|
-
[example]
|
|
1506
|
-
[source,html]
|
|
1507
|
-
----
|
|
1508
|
-
<a href="/admin">  SOME   TEXT   CONTENT  </a>
|
|
1509
|
-
<a href="/admin">SOME TEXT CONTENT</a>
|
|
1510
|
-
----
|
|
1511
|
-
|
|
1512
|
-
[source,ruby]
|
|
1513
|
-
----
|
|
1514
|
-
html1 = '<a href="/admin">  SOME   TEXT   CONTENT  </a>'
|
|
1515
|
-
html2 = '<a href="/admin">SOME TEXT CONTENT</a>'
|
|
1516
|
-
|
|
1517
|
-
expect(html1).to be_html_equivalent_to(
|
|
1518
|
-
html2,
|
|
1519
|
-
match_options: { text_content: :normalize, structural_whitespace: :ignore }
|
|
1520
|
-
)
|
|
1521
|
-
# => true (documents are equivalent)
|
|
1522
|
-
----
|
|
1523
|
-
|
|
1524
|
-
Multi-line HTML text:
|
|
1525
|
-
|
|
1526
|
-
[source,html]
|
|
1527
|
-
----
|
|
1528
|
-
<p>
|
|
1529
|
-
This is a paragraph
|
|
1530
|
-
with multiple lines
|
|
1531
|
-
of text.
|
|
1532
|
-
</p>
|
|
1533
|
-
|
|
1534
|
-
<p>This is a paragraph with multiple lines of text.</p>
|
|
1535
|
-
----
|
|
1536
|
-
|
|
1537
|
-
[source,ruby]
|
|
1538
|
-
----
|
|
1539
|
-
html1 = <<~HTML
|
|
1540
|
-
<p>
|
|
1541
|
-
This is a paragraph
|
|
1542
|
-
with multiple lines
|
|
1543
|
-
of text.
|
|
1544
|
-
</p>
|
|
1545
|
-
HTML
|
|
1546
|
-
|
|
1547
|
-
html2 = "<p>This is a paragraph with multiple lines of text.</p>"
|
|
1548
|
-
|
|
1549
|
-
expect(html1).to be_html_equivalent_to(
|
|
1550
|
-
html2,
|
|
1551
|
-
match_options: { text_content: :normalize, structural_whitespace: :ignore }
|
|
1552
|
-
)
|
|
1553
|
-
# => true (documents are equivalent)
|
|
1554
|
-
----
|
|
1555
|
-
|
|
1556
|
-
=== Structural whitespace dimension
|
|
1557
|
-
|
|
1558
|
-
The `structural_whitespace` dimension controls whitespace between tags (indentation, line breaks, formatting).
|
|
1559
|
-
|
|
1560
|
-
==== Strict behavior
|
|
1561
|
-
|
|
1562
|
-
When `structural_whitespace: :strict`, all whitespace between tags must match exactly, including indentation and line breaks.
|
|
1563
|
-
|
|
1564
|
-
.XML examples with strict structural_whitespace
|
|
1565
|
-
[example]
|
|
1566
|
-
These documents are **not** equivalent due to different indentation:
|
|
1567
|
-
|
|
1568
|
-
[source,xml]
|
|
1569
|
-
----
|
|
1570
|
-
<root>
|
|
1571
|
-
  <item>Value</item>
|
|
1572
|
-
</root>
|
|
1573
|
-
|
|
1574
|
-
<root>
|
|
1575
|
-
    <item>Value</item>
|
|
1576
|
-
</root>
|
|
1577
|
-
----
|
|
1578
|
-
|
|
1579
|
-
[source,ruby]
|
|
1580
|
-
----
|
|
1581
|
-
xml1 = "<root>\n  <item>Value</item>\n</root>"
|
|
1582
|
-
xml2 = "<root>\n    <item>Value</item>\n</root>"
|
|
1583
|
-
|
|
1584
|
-
expect(xml1).not_to be_xml_equivalent_to(
|
|
1585
|
-
xml2,
|
|
1586
|
-
match_options: {
|
|
1587
|
-
text_content: :normalize,
|
|
1588
|
-
structural_whitespace: :strict,
|
|
1589
|
-
attribute_whitespace: :strict,
|
|
1590
|
-
comments: :ignore
|
|
1591
|
-
}
|
|
1592
|
-
)
|
|
1593
|
-
# => true (documents are NOT equivalent - indentation differs)
|
|
1594
|
-
----
|
|
1595
|
-
|
|
1596
|
-
==== Ignore behavior (formatting doesn't matter)
|
|
1597
|
-
|
|
1598
|
-
When `structural_whitespace: :ignore`, all whitespace between tags is ignored, making pretty-printed and compact formats equivalent.
|
|
1599
|
-
|
|
1600
|
-
.XML examples with ignored structural_whitespace
|
|
1601
|
-
[example]
|
|
1602
|
-
Pretty-printed vs compact XML **are** considered equal:
|
|
1603
|
-
|
|
1604
|
-
[source,xml]
|
|
1605
|
-
----
|
|
1606
|
-
<!-- Pretty-printed with indentation -->
|
|
1607
|
-
<root>
|
|
1608
|
-
<a>
|
|
1609
|
-
<b>text</b>
|
|
1610
|
-
</a>
|
|
1611
|
-
</root>
|
|
1612
|
-
|
|
1613
|
-
<!-- Compact on one line -->
|
|
1614
|
-
<root><a><b>text</b></a></root>
|
|
1615
|
-
----
|
|
1616
|
-
|
|
1617
|
-
[source,ruby]
|
|
1618
|
-
----
|
|
1619
|
-
compact = "<root><a><b>text</b></a></root>"
|
|
1620
|
-
formatted = <<~XML
|
|
1621
|
-
<root>
|
|
1622
|
-
<a>
|
|
1623
|
-
<b>text</b>
|
|
1624
|
-
</a>
|
|
1625
|
-
</root>
|
|
1626
|
-
XML
|
|
1627
|
-
|
|
1628
|
-
expect(compact).to be_xml_equivalent_to(
|
|
1629
|
-
formatted,
|
|
1630
|
-
match_options: {
|
|
1631
|
-
text_content: :normalize,
|
|
1632
|
-
structural_whitespace: :ignore,
|
|
1633
|
-
attribute_whitespace: :strict,
|
|
1634
|
-
comments: :ignore
|
|
1635
|
-
}
|
|
1636
|
-
)
|
|
1637
|
-
# => true (documents are equivalent)
|
|
1638
|
-
----
|
|
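Why pretty-printed and compact input compare as equal under `:ignore` can be illustrated by removing whitespace-only runs between tags before comparing. This is a deliberately crude, regex-based sketch and an assumption about the effect, not how Canon does it (Canon compares parsed DOM trees); `strip_structural_whitespace` is a hypothetical helper, and the regex would also drop whitespace-only text in mixed content.

```ruby
# Hypothetical helper: remove whitespace that sits only between a closing
# ">" and the next "<". Text inside elements is left untouched.
def strip_structural_whitespace(markup)
  markup.gsub(/>\s+</, '><').strip
end

compact = "<root><a><b>text</b></a></root>"
formatted = <<~XML
  <root>
    <a>
      <b>text</b>
    </a>
  </root>
XML

puts strip_structural_whitespace(formatted) == compact
# prints true
```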
1639
|
-
|
|
1640
|
-
Complex nested structures with different indentation:
|
|
1641
|
-
|
|
1642
|
-
[source,xml]
|
|
1643
|
-
----
|
|
1644
|
-
<!-- 2-space indentation -->
|
|
1645
|
-
<document>
|
|
1646
|
-
<metadata>
|
|
1647
|
-
<title>My Document</title>
|
|
1648
|
-
<author>
|
|
1649
|
-
<name>John Doe</name>
|
|
1650
|
-
</author>
|
|
1651
|
-
</metadata>
|
|
1652
|
-
</document>
|
|
1653
|
-
|
|
1654
|
-
<!-- 4-space indentation -->
|
|
1655
|
-
<document>
|
|
1656
|
-
<metadata>
|
|
1657
|
-
<title>My Document</title>
|
|
1658
|
-
<author>
|
|
1659
|
-
<name>John Doe</name>
|
|
1660
|
-
</author>
|
|
1661
|
-
</metadata>
|
|
1662
|
-
</document>
|
|
1663
|
-
|
|
1664
|
-
<!-- Compact -->
|
|
1665
|
-
<document><metadata><title>My Document</title><author><name>John Doe</name></author></metadata></document>
|
|
1666
|
-
----
|
|
1667
|
-
|
|
1668
|
-
[source,ruby]
|
|
1669
|
-
----
|
|
1670
|
-
two_spaces = <<~XML
|
|
1671
|
-
<document>
|
|
1672
|
-
<metadata>
|
|
1673
|
-
<title>My Document</title>
|
|
1674
|
-
<author>
|
|
1675
|
-
<name>John Doe</name>
|
|
1676
|
-
</author>
|
|
1677
|
-
</metadata>
|
|
1678
|
-
</document>
|
|
1679
|
-
XML
|
|
1680
|
-
|
|
1681
|
-
four_spaces = "<document>\n    <metadata>\n        <title>My Document</title>\n        <author>\n            <name>John Doe</name>\n        </author>\n    </metadata>\n</document>"
|
|
1682
|
-
|
|
1683
|
-
compact = "<document><metadata><title>My Document</title><author><name>John Doe</name></author></metadata></document>"
|
|
1684
|
-
|
|
1685
|
-
expect(two_spaces).to be_xml_equivalent_to(
|
|
1686
|
-
four_spaces,
|
|
1687
|
-
match_options: { structural_whitespace: :ignore }
|
|
1688
|
-
)
|
|
1689
|
-
# => true
|
|
1690
|
-
|
|
1691
|
-
expect(two_spaces).to be_xml_equivalent_to(
|
|
1692
|
-
compact,
|
|
1693
|
-
match_options: { structural_whitespace: :ignore }
|
|
1694
|
-
)
|
|
1695
|
-
# => true
|
|
1696
|
-
----
|
|
1697
|
-
|
|
1698
|
-
.HTML examples with ignored structural_whitespace
|
|
1699
|
-
[example]
|
|
1700
|
-
[source,html]
|
|
1701
|
-
----
|
|
1702
|
-
<!-- Pretty-printed -->
|
|
1703
|
-
<div class="container">
|
|
1704
|
-
<header>
|
|
1705
|
-
<h1>Welcome</h1>
|
|
1706
|
-
<p>Introduction text</p>
|
|
1707
|
-
</header>
|
|
1708
|
-
</div>
|
|
1709
|
-
|
|
1710
|
-
<!-- Compact -->
|
|
1711
|
-
<div class="container"><header><h1>Welcome</h1><p>Introduction text</p></header></div>
|
|
1712
|
-
----
|
|
1713
|
-
|
|
1714
|
-
[source,ruby]
|
|
1715
|
-
----
|
|
1716
|
-
pretty_html = <<~HTML
|
|
1717
|
-
<div class="container">
|
|
1718
|
-
<header>
|
|
1719
|
-
<h1>Welcome</h1>
|
|
1720
|
-
<p>Introduction text</p>
|
|
1721
|
-
</header>
|
|
1722
|
-
</div>
|
|
1723
|
-
HTML
|
|
1724
|
-
|
|
1725
|
-
compact_html = '<div class="container"><header><h1>Welcome</h1><p>Introduction text</p></header></div>'
|
|
1726
|
-
|
|
1727
|
-
expect(pretty_html).to be_html_equivalent_to(
|
|
1728
|
-
compact_html,
|
|
1729
|
-
match_options: { structural_whitespace: :ignore }
|
|
1730
|
-
)
|
|
1731
|
-
# => true (documents are equivalent)
|
|
1732
|
-
----
|
|
1733
|
-
|
|
1734
|
-
==== Normalize behavior
|
|
1735
|
-
|
|
1736
|
-
When `structural_whitespace: :normalize`, whitespace between tags is collapsed to single spaces.
|
|
1737
|
-
|
|
1738
|
-
.XML examples with normalized structural_whitespace
|
|
1739
|
-
[example]
|
|
1740
|
-
[source,xml]
|
|
1741
|
-
----
|
|
1742
|
-
<root>
|
|
1743
|
-
|
|
1744
|
-
|
|
1745
|
-
<item>Value</item>
|
|
1746
|
-
|
|
1747
|
-
|
|
1748
|
-
</root>
|
|
1749
|
-
|
|
1750
|
-
<root> <item>Value</item> </root>
|
|
1751
|
-
----
|
|
1752
|
-
|
|
1753
|
-
[source,ruby]
|
|
1754
|
-
----
|
|
1755
|
-
xml1 = "<root>\n\n\n <item>Value</item>\n\n\n</root>"
|
|
1756
|
-
xml2 = "<root> <item>Value</item> </root>"
|
|
1757
|
-
|
|
1758
|
-
expect(xml1).to be_xml_equivalent_to(
|
|
1759
|
-
xml2,
|
|
1760
|
-
match_options: { structural_whitespace: :normalize }
|
|
1761
|
-
)
|
|
1762
|
-
# => true (documents are equivalent - whitespace normalized)
|
|
1763
|
-
----
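
The `:ignore` and `:normalize` behaviors for whitespace between tags can be approximated in plain Ruby. The following is a minimal sketch, not Canon's implementation: the helper names `drop_structural_whitespace` and `collapse_structural_whitespace` are hypothetical, and the regular expression only handles whitespace that sits directly between two tags.

```ruby
# Hypothetical helpers for illustration only; they are not part of Canon.

# :ignore -> whitespace-only runs between tags are removed entirely.
def drop_structural_whitespace(xml)
  xml.gsub(/>\s+</, "><").strip
end

# :normalize -> whitespace-only runs between tags collapse to one space.
def collapse_structural_whitespace(xml)
  xml.gsub(/>\s+</, "> <").strip
end

pretty  = "<root>\n\n\n  <item>Value</item>\n\n\n</root>"
spaced  = "<root>   <item>Value</item>   </root>"
compact = "<root><item>Value</item></root>"

# Both inputs collapse to the same string under :normalize.
puts collapse_structural_whitespace(pretty) == collapse_structural_whitespace(spaced) # true
# Under :ignore the pretty-printed form matches the compact form.
puts drop_structural_whitespace(pretty) == compact # true
```

A string-level regex is enough to show the idea, but it would also touch whitespace inside CDATA sections or comments, so a real comparison is better done on parsed nodes.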
|
|
1764
|
-
|
|
1765
|
-
=== Attribute whitespace dimension
|
|
1766
|
-
|
|
1767
|
-
The `attribute_whitespace` dimension controls whitespace within attribute values.
|
|
1768
|
-
|
|
1769
|
-
==== Strict behavior (exact attribute whitespace)
|
|
1770
|
-
|
|
1771
|
-
When `attribute_whitespace: :strict`, whitespace in attribute values must match exactly.
|
|
1772
|
-
|
|
1773
|
-
.XML examples with strict attribute_whitespace
|
|
1774
|
-
[example]
|
|
1775
|
-
These documents are **not** equivalent due to attribute whitespace differences:
|
|
1776
|
-
|
|
1777
|
-
[source,xml]
|
|
1778
|
-
----
|
|
1779
|
-
<div class=" foo bar ">text</div>
|
|
1780
|
-
<div class="foo bar">text</div>
|
|
1781
|
-
----
|
|
1782
|
-
|
|
1783
|
-
[source,ruby]
|
|
1784
|
-
----
|
|
1785
|
-
actual = '<div class=" foo bar ">text</div>'
|
|
1786
|
-
expected = '<div class="foo bar">text</div>'
|
|
1787
|
-
|
|
1788
|
-
expect(actual).not_to be_xml_equivalent_to(
|
|
1789
|
-
expected,
|
|
1790
|
-
match_options: {
|
|
1791
|
-
text_content: :normalize,
|
|
1792
|
-
structural_whitespace: :ignore,
|
|
1793
|
-
attribute_whitespace: :strict,
|
|
1794
|
-
comments: :ignore
|
|
1795
|
-
}
|
|
1796
|
-
)
|
|
1797
|
-
# => expectation passes (documents are NOT equivalent)
|
|
1798
|
-
----
|
|
1799
|
-
|
|
1800
|
-
Leading/trailing whitespace in attributes:
|
|
1801
|
-
|
|
1802
|
-
[source,xml]
|
|
1803
|
-
----
|
|
1804
|
-
<item id=" 123 " name=" Widget "/>
|
|
1805
|
-
<item id="123" name="Widget"/>
|
|
1806
|
-
----
|
|
1807
|
-
|
|
1808
|
-
[source,ruby]
|
|
1809
|
-
----
|
|
1810
|
-
xml1 = '<item id=" 123 " name=" Widget "/>'
|
|
1811
|
-
xml2 = '<item id="123" name="Widget"/>'
|
|
1812
|
-
|
|
1813
|
-
expect(xml1).not_to be_xml_equivalent_to(
|
|
1814
|
-
xml2,
|
|
1815
|
-
match_options: { attribute_whitespace: :strict }
|
|
1816
|
-
)
|
|
1817
|
-
# => expectation passes (documents are NOT equivalent)
|
|
1818
|
-
----
|
|
1819
|
-
|
|
1820
|
-
.HTML examples with strict attribute_whitespace
|
|
1821
|
-
[example]
|
|
1822
|
-
[source,html]
|
|
1823
|
-
----
|
|
1824
|
-
<a href="/admin" class=" button primary ">Link</a>
|
|
1825
|
-
<a href="/admin" class="button primary">Link</a>
|
|
1826
|
-
----
|
|
1827
|
-
|
|
1828
|
-
[source,ruby]
|
|
1829
|
-
----
|
|
1830
|
-
html1 = '<a href="/admin" class=" button primary ">Link</a>'
|
|
1831
|
-
html2 = '<a href="/admin" class="button primary">Link</a>'
|
|
1832
|
-
|
|
1833
|
-
expect(html1).not_to be_html_equivalent_to(
|
|
1834
|
-
html2,
|
|
1835
|
-
match_options: { attribute_whitespace: :strict }
|
|
1836
|
-
)
|
|
1837
|
-
# => expectation passes (documents are NOT equivalent)
|
|
1838
|
-
----
|
|
1839
|
-
|
|
1840
|
-
==== Normalize behavior (collapse attribute whitespace)
|
|
1841
|
-
|
|
1842
|
-
When `attribute_whitespace: :normalize`, whitespace in attribute values is collapsed and trimmed.
|
|
1843
|
-
|
|
1844
|
-
.XML examples with normalized attribute_whitespace
|
|
1845
|
-
[example]
|
|
1846
|
-
These documents **are** considered equal:
|
|
1847
|
-
|
|
1848
|
-
[source,xml]
|
|
1849
|
-
----
|
|
1850
|
-
<div class=" foo bar ">text</div>
|
|
1851
|
-
<div class="foo bar">text</div>
|
|
1852
|
-
----
|
|
1853
|
-
|
|
1854
|
-
[source,ruby]
|
|
1855
|
-
----
|
|
1856
|
-
actual = '<div class=" foo bar ">text</div>'
|
|
1857
|
-
expected = '<div class="foo bar">text</div>'
|
|
1858
|
-
|
|
1859
|
-
expect(actual).to be_xml_equivalent_to(
|
|
1860
|
-
expected,
|
|
1861
|
-
match_options: {
|
|
1862
|
-
text_content: :normalize,
|
|
1863
|
-
structural_whitespace: :ignore,
|
|
1864
|
-
attribute_whitespace: :normalize,
|
|
1865
|
-
comments: :ignore
|
|
1866
|
-
}
|
|
1867
|
-
)
|
|
1868
|
-
# => true (documents are equivalent)
|
|
1869
|
-
----
|
|
1870
|
-
|
|
1871
|
-
Multiple attributes with whitespace:
|
|
1872
|
-
|
|
1873
|
-
[source,xml]
|
|
1874
|
-
----
|
|
1875
|
-
<item id=" 123 " name=" Widget " category=" tools "/>
|
|
1876
|
-
<item id="123" name="Widget" category="tools"/>
|
|
1877
|
-
----
|
|
1878
|
-
|
|
1879
|
-
[source,ruby]
|
|
1880
|
-
----
|
|
1881
|
-
xml1 = '<item id=" 123 " name=" Widget " category=" tools "/>'
|
|
1882
|
-
xml2 = '<item id="123" name="Widget" category="tools"/>'
|
|
1883
|
-
|
|
1884
|
-
expect(xml1).to be_xml_equivalent_to(
|
|
1885
|
-
xml2,
|
|
1886
|
-
match_options: { attribute_whitespace: :normalize }
|
|
1887
|
-
)
|
|
1888
|
-
# => true (documents are equivalent)
|
|
1889
|
-
----
|
|
1890
|
-
|
|
1891
|
-
.HTML examples with normalized attribute_whitespace
|
|
1892
|
-
[example]
|
|
1893
|
-
[source,html]
|
|
1894
|
-
----
|
|
1895
|
-
<a href="/admin" class=" button primary " id=" main-link ">Link</a>
|
|
1896
|
-
<a href="/admin" class="button primary" id="main-link">Link</a>
|
|
1897
|
-
----
|
|
1898
|
-
|
|
1899
|
-
[source,ruby]
|
|
1900
|
-
----
|
|
1901
|
-
html1 = '<a href="/admin" class=" button primary " id=" main-link ">Link</a>'
|
|
1902
|
-
html2 = '<a href="/admin" class="button primary" id="main-link">Link</a>'
|
|
1903
|
-
|
|
1904
|
-
expect(html1).to be_html_equivalent_to(
|
|
1905
|
-
html2,
|
|
1906
|
-
match_options: { attribute_whitespace: :normalize }
|
|
1907
|
-
)
|
|
1908
|
-
# => true (documents are equivalent)
|
|
1909
|
-
----
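
The difference between `:strict` and `:normalize` for attribute values can be sketched without Canon. This is an illustration under assumptions: `normalize_attribute_value` and `attributes_match?` are hypothetical names, and attributes are modelled as a plain Hash of name to value.

```ruby
# Hypothetical helper: trim the value and collapse inner whitespace runs.
def normalize_attribute_value(value)
  value.strip.gsub(/\s+/, " ")
end

# Compare two attribute hashes under an (assumed) whitespace mode.
def attributes_match?(attrs1, attrs2, mode)
  case mode
  when :strict
    attrs1 == attrs2
  when :normalize
    attrs1.transform_values { |v| normalize_attribute_value(v) } ==
      attrs2.transform_values { |v| normalize_attribute_value(v) }
  else
    raise ArgumentError, "unsupported mode: #{mode.inspect}"
  end
end

padded = { "class" => "  button   primary ", "id" => " main-link " }
clean  = { "class" => "button primary", "id" => "main-link" }

puts attributes_match?(padded, clean, :strict)    # false
puts attributes_match?(padded, clean, :normalize) # true
```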
|
|
1910
|
-
|
|
1911
|
-
==== Ignore behavior
|
|
1912
|
-
|
|
1913
|
-
When `attribute_whitespace: :ignore`, attribute values are not compared at all (only attribute names are checked).
|
|
1914
|
-
|
|
1915
|
-
.Example with ignored attribute_whitespace
|
|
1916
|
-
[example]
|
|
1917
|
-
[source,ruby]
|
|
1918
|
-
----
|
|
1919
|
-
xml1 = '<item class="foo">text</item>'
|
|
1920
|
-
xml2 = '<item class="completely different">text</item>'
|
|
1921
|
-
|
|
1922
|
-
expect(xml1).to be_xml_equivalent_to(
|
|
1923
|
-
xml2,
|
|
1924
|
-
match_options: { attribute_whitespace: :ignore }
|
|
1925
|
-
)
|
|
1926
|
-
# => true (attribute values are not compared)
|
|
1927
|
-
----
|
|
1928
|
-
|
|
1929
|
-
=== Comments dimension
|
|
1930
|
-
|
|
1931
|
-
The `comments` dimension controls how XML/HTML comments are compared.
|
|
1932
|
-
|
|
1933
|
-
==== Strict behavior
|
|
1934
|
-
|
|
1935
|
-
When `comments: :strict`, comments must match exactly, including their content and position.
|
|
1936
|
-
|
|
1937
|
-
.XML examples with strict comments
|
|
1938
|
-
[example]
|
|
1939
|
-
These documents are **not** equivalent due to different comments:
|
|
1940
|
-
|
|
1941
|
-
[source,xml]
|
|
1942
|
-
----
|
|
1943
|
-
<root><!-- First comment --><a>text</a></root>
|
|
1944
|
-
<root><!-- Different comment --><a>text</a></root>
|
|
1945
|
-
----
|
|
1946
|
-
|
|
1947
|
-
[source,ruby]
|
|
1948
|
-
----
|
|
1949
|
-
xml1 = "<root><!-- First comment --><a>text</a></root>"
|
|
1950
|
-
xml2 = "<root><!-- Different comment --><a>text</a></root>"
|
|
1951
|
-
|
|
1952
|
-
expect(xml1).not_to be_xml_equivalent_to(
|
|
1953
|
-
xml2,
|
|
1954
|
-
match_options: { comments: :strict }
|
|
1955
|
-
)
|
|
1956
|
-
# => expectation passes (documents are NOT equivalent - comments differ)
|
|
1957
|
-
----
|
|
1958
|
-
|
|
1959
|
-
==== Ignore behavior (comments don't affect comparison)
|
|
1960
|
-
|
|
1961
|
-
When `comments: :ignore`, comments are completely ignored during comparison.
|
|
1962
|
-
|
|
1963
|
-
.XML examples with ignored comments
|
|
1964
|
-
[example]
|
|
1965
|
-
These documents **are** considered equal despite different comments:
|
|
1966
|
-
|
|
1967
|
-
[source,xml]
|
|
1968
|
-
----
|
|
1969
|
-
<root><!-- comment --><a>text</a></root>
|
|
1970
|
-
<root><!-- different --><a>text</a></root>
|
|
1971
|
-
<root><a>text</a></root>
|
|
1972
|
-
----
|
|
1973
|
-
|
|
1974
|
-
[source,ruby]
|
|
1975
|
-
----
|
|
1976
|
-
with_comment = "<root><!-- comment --><a>text</a></root>"
|
|
1977
|
-
different_comment = "<root><!-- different --><a>text</a></root>"
|
|
1978
|
-
no_comment = "<root><a>text</a></root>"
|
|
1979
|
-
|
|
1980
|
-
expect(with_comment).to be_xml_equivalent_to(
|
|
1981
|
-
different_comment,
|
|
1982
|
-
match_options: {
|
|
1983
|
-
text_content: :normalize,
|
|
1984
|
-
structural_whitespace: :ignore,
|
|
1985
|
-
attribute_whitespace: :strict,
|
|
1986
|
-
comments: :ignore
|
|
1987
|
-
}
|
|
1988
|
-
)
|
|
1989
|
-
# => true (documents are equivalent - comments ignored)
|
|
1990
|
-
|
|
1991
|
-
expect(with_comment).to be_xml_equivalent_to(
|
|
1992
|
-
no_comment,
|
|
1993
|
-
match_options: {
|
|
1994
|
-
text_content: :normalize,
|
|
1995
|
-
structural_whitespace: :ignore,
|
|
1996
|
-
attribute_whitespace: :strict,
|
|
1997
|
-
comments: :ignore
|
|
1998
|
-
}
|
|
1999
|
-
)
|
|
2000
|
-
# => true (documents are equivalent - comments ignored)
|
|
2001
|
-
----
|
|
2002
|
-
|
|
2003
|
-
Complex document with multiple comments:
|
|
2004
|
-
|
|
2005
|
-
[source,xml]
|
|
2006
|
-
----
|
|
2007
|
-
<!-- Document header -->
|
|
2008
|
-
<document>
|
|
2009
|
-
<!-- Metadata section -->
|
|
2010
|
-
<metadata>
|
|
2011
|
-
<title>My Document</title>
|
|
2012
|
-
<!-- Author information -->
|
|
2013
|
-
<author>John Doe</author>
|
|
2014
|
-
</metadata>
|
|
2015
|
-
<!-- Main content -->
|
|
2016
|
-
<content>
|
|
2017
|
-
<p>Text</p>
|
|
2018
|
-
</content>
|
|
2019
|
-
</document>
|
|
2020
|
-
|
|
2021
|
-
<document>
|
|
2022
|
-
<metadata>
|
|
2023
|
-
<title>My Document</title>
|
|
2024
|
-
<author>John Doe</author>
|
|
2025
|
-
</metadata>
|
|
2026
|
-
<content>
|
|
2027
|
-
<p>Text</p>
|
|
2028
|
-
</content>
|
|
2029
|
-
</document>
|
|
2030
|
-
----
|
|
2031
|
-
|
|
2032
|
-
[source,ruby]
|
|
2033
|
-
----
|
|
2034
|
-
with_comments = <<~XML
|
|
2035
|
-
<!-- Document header -->
|
|
2036
|
-
<document>
|
|
2037
|
-
<!-- Metadata section -->
|
|
2038
|
-
<metadata>
|
|
2039
|
-
<title>My Document</title>
|
|
2040
|
-
<!-- Author information -->
|
|
2041
|
-
<author>John Doe</author>
|
|
2042
|
-
</metadata>
|
|
2043
|
-
<!-- Main content -->
|
|
2044
|
-
<content>
|
|
2045
|
-
<p>Text</p>
|
|
2046
|
-
</content>
|
|
2047
|
-
</document>
|
|
2048
|
-
XML
|
|
2049
|
-
|
|
2050
|
-
without_comments = <<~XML
|
|
2051
|
-
<document>
|
|
2052
|
-
<metadata>
|
|
2053
|
-
<title>My Document</title>
|
|
2054
|
-
<author>John Doe</author>
|
|
2055
|
-
</metadata>
|
|
2056
|
-
<content>
|
|
2057
|
-
<p>Text</p>
|
|
2058
|
-
</content>
|
|
2059
|
-
</document>
|
|
2060
|
-
XML
|
|
2061
|
-
|
|
2062
|
-
expect(with_comments).to be_xml_equivalent_to(
|
|
2063
|
-
without_comments,
|
|
2064
|
-
match_options: { comments: :ignore }
|
|
2065
|
-
)
|
|
2066
|
-
# => true (documents are equivalent)
|
|
2067
|
-
----
|
|
2068
|
-
|
|
2069
|
-
.HTML examples with ignored comments
|
|
2070
|
-
[example]
|
|
2071
|
-
[source,html]
|
|
2072
|
-
----
|
|
2073
|
-
<!-- Navigation -->
|
|
2074
|
-
<nav>
|
|
2075
|
-
<!-- Primary menu -->
|
|
2076
|
-
<ul>
|
|
2077
|
-
<li>Home</li>
|
|
2078
|
-
</ul>
|
|
2079
|
-
</nav>
|
|
2080
|
-
|
|
2081
|
-
<nav>
|
|
2082
|
-
<ul>
|
|
2083
|
-
<li>Home</li>
|
|
2084
|
-
</ul>
|
|
2085
|
-
</nav>
|
|
2086
|
-
----
|
|
2087
|
-
|
|
2088
|
-
[source,ruby]
|
|
2089
|
-
----
|
|
2090
|
-
html_with_comments = <<~HTML
|
|
2091
|
-
<!-- Navigation -->
|
|
2092
|
-
<nav>
|
|
2093
|
-
<!-- Primary menu -->
|
|
2094
|
-
<ul>
|
|
2095
|
-
<li>Home</li>
|
|
2096
|
-
</ul>
|
|
2097
|
-
</nav>
|
|
2098
|
-
HTML
|
|
2099
|
-
|
|
2100
|
-
html_without_comments = <<~HTML
|
|
2101
|
-
<nav>
|
|
2102
|
-
<ul>
|
|
2103
|
-
<li>Home</li>
|
|
2104
|
-
</ul>
|
|
2105
|
-
</nav>
|
|
2106
|
-
HTML
|
|
2107
|
-
|
|
2108
|
-
expect(html_with_comments).to be_html_equivalent_to(
|
|
2109
|
-
html_without_comments,
|
|
2110
|
-
match_options: { comments: :ignore }
|
|
2111
|
-
)
|
|
2112
|
-
# => true (documents are equivalent)
|
|
2113
|
-
----
|
|
2114
|
-
|
|
2115
|
-
==== Normalize behavior
|
|
2116
|
-
|
|
2117
|
-
When `comments: :normalize`, comment content is trimmed and whitespace is collapsed before comparison.
|
|
2118
|
-
|
|
2119
|
-
.Example with normalized comments
|
|
2120
|
-
[example]
|
|
2121
|
-
[source,ruby]
|
|
2122
|
-
----
|
|
2123
|
-
xml1 = "<root><!--   comment   with   spaces   --><a>text</a></root>"
|
|
2124
|
-
xml2 = "<root><!-- comment with spaces --><a>text</a></root>"
|
|
2125
|
-
|
|
2126
|
-
expect(xml1).to be_xml_equivalent_to(
|
|
2127
|
-
xml2,
|
|
2128
|
-
match_options: { comments: :normalize }
|
|
2129
|
-
)
|
|
2130
|
-
# => true (comments are normalized before comparison)
|
|
2131
|
-
----
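
The three `comments` behaviors can be compared side by side in a small sketch. It is only an approximation: `comment_bodies` and `comments_for` are hypothetical helpers, comments are extracted with a regular expression rather than a parser, and comment position is not modelled.

```ruby
# Hypothetical helper: pull comment bodies out of a markup string.
def comment_bodies(markup)
  markup.scan(/<!--(.*?)-->/m).flatten
end

# Return the comment list as each (assumed) mode would see it.
def comments_for(markup, mode)
  bodies = comment_bodies(markup)
  case mode
  when :strict    then bodies                                         # compared verbatim
  when :normalize then bodies.map { |b| b.strip.gsub(/\s+/, " ") }    # trimmed and collapsed
  when :ignore    then []                                             # never compared
  else raise ArgumentError, "unsupported mode: #{mode.inspect}"
  end
end

xml1 = "<root><!--   comment   with   spaces   --><a>text</a></root>"
xml2 = "<root><!-- comment with spaces --><a>text</a></root>"
bare = "<root><a>text</a></root>"

puts comments_for(xml1, :strict) == comments_for(xml2, :strict)       # false
puts comments_for(xml1, :normalize) == comments_for(xml2, :normalize) # true
puts comments_for(xml1, :ignore) == comments_for(bare, :ignore)       # true
```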
|
|
2132
|
-
|
|
2133
|
-
==== Precedence resolution
|
|
2134
|
-
|
|
2135
|
-
When multiple configuration sources are present, Canon resolves them in this order (highest to lowest precedence):
|
|
2136
|
-
|
|
2137
|
-
. Explicit `match_options` hash in the test
|
|
2138
|
-
. Named `match_profile` in the test
|
|
2139
|
-
. Global format-specific profile (e.g., `xml_match_profile`)
|
|
2140
|
-
. Format-specific defaults (e.g., XML → strict, HTML → rendered)
|
|
2141
|
-
|
|
2142
|
-
.Example of precedence resolution
|
|
2143
|
-
====
|
|
2144
|
-
[source,ruby]
|
|
2145
|
-
----
|
|
2146
|
-
# Global configuration
|
|
2147
|
-
Canon::RSpecMatchers.configure do |config|
|
|
2148
|
-
config.xml_match_profile = :spec_friendly
|
|
2149
|
-
end
|
|
2150
|
-
|
|
2151
|
-
# This uses strict for attribute_whitespace (explicit option)
|
|
2152
|
-
# and spec_friendly for other dimensions (global profile)
|
|
2153
|
-
expect(actual).to be_xml_equivalent_to(
|
|
2154
|
-
expected,
|
|
2155
|
-
match_options: {
|
|
2156
|
-
attribute_whitespace: :strict
|
|
2157
|
-
}
|
|
2158
|
-
)
|
|
2159
|
-
----
|
|
2160
|
-
====
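
The precedence order can be pictured as a chain of Hash merges in which later sources overwrite earlier ones. This sketch rests on assumptions: the contents of `FORMAT_DEFAULTS` and `PROFILES` are invented for illustration and are not Canon's real defaults, and `resolve_options` is a hypothetical helper.

```ruby
# Illustrative values only; Canon's actual defaults and profiles may differ.
FORMAT_DEFAULTS = {
  text_content: :strict,
  structural_whitespace: :strict,
  attribute_whitespace: :strict,
  comments: :strict
}.freeze

PROFILES = {
  spec_friendly: {
    text_content: :normalize,
    structural_whitespace: :ignore,
    comments: :ignore
  }
}.freeze

# Later merges win, so sources are listed from lowest to highest precedence.
def resolve_options(global_profile: nil, match_profile: nil, match_options: {})
  FORMAT_DEFAULTS
    .merge(PROFILES.fetch(global_profile, {}))
    .merge(PROFILES.fetch(match_profile, {}))
    .merge(match_options)
end

resolved = resolve_options(
  global_profile: :spec_friendly,
  match_options: { attribute_whitespace: :strict, comments: :strict }
)

p resolved[:structural_whitespace] # :ignore  (from the global profile)
p resolved[:comments]              # :strict  (explicit option wins)
```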
|
|
2161
|
-
|
|
2162
|
-
|
|
2163
|
-
[[ignore_attr_order]]
|
|
2164
|
-
==== ignore_attr_order
|
|
2165
|
-
|
|
2166
|
-
`ignore_attr_order: {true|false}` default: `true`
|
|
2167
|
-
|
|
2168
|
-
When `true`, all attributes are sorted before comparison and only attributes of
|
|
2169
|
-
the same type are compared.
|
|
2170
|
-
|
|
2171
|
-
Usage:
|
|
2172
|
-
|
|
2173
|
-
[source,ruby]
|
|
2174
|
-
----
|
|
2175
|
-
Canon::Comparison.equivalent?(doc1, doc2, ignore_attr_order: true)
|
|
2176
|
-
----
|
|
2177
|
-
|
|
2178
|
-
.HTML examples with ignore_attr_order
|
|
2179
|
-
[example]
|
|
2180
|
-
====
|
|
2181
|
-
When `true` the following HTML strings are considered equal:
|
|
2182
|
-
|
|
2183
|
-
[source,html]
|
|
2184
|
-
----
|
|
2185
|
-
<a href="/admin" class="button" target="_blank">Link</a>
|
|
2186
|
-
<a class="button" target="_blank" href="/admin">Link</a>
|
|
2187
|
-
----
|
|
2188
|
-
|
|
2189
|
-
[source,ruby]
|
|
2190
|
-
----
|
|
2191
|
-
html1 = '<a href="/admin" class="button" target="_blank">Link</a>'
|
|
2192
|
-
html2 = '<a class="button" target="_blank" href="/admin">Link</a>'
|
|
2193
|
-
Canon::Comparison.equivalent?(html1, html2, ignore_attr_order: true)
|
|
2194
|
-
# => true
|
|
2195
|
-
----
|
|
2196
|
-
|
|
2197
|
-
When `false` attributes are compared in order:
|
|
2198
|
-
|
|
2199
|
-
[source,ruby]
|
|
2200
|
-
----
|
|
2201
|
-
html1 = '<a href="/admin" class="button">Link</a>'
|
|
2202
|
-
html2 = '<a class="button" href="/admin">Link</a>'
|
|
2203
|
-
Canon::Comparison.equivalent?(html1, html2, ignore_attr_order: false)
|
|
2204
|
-
# => false
|
|
2205
|
-
----
|
|
2206
|
-
====
|
|
2207
|
-
|
|
2208
|
-
.XML examples with ignore_attr_order
|
|
2209
|
-
[example]
|
|
2210
|
-
====
|
|
2211
|
-
When `true` the following XML strings are considered equal:
|
|
2212
|
-
|
|
2213
|
-
[source,xml]
|
|
2214
|
-
----
|
|
2215
|
-
<item id="1" name="Widget" price="10.00"/>
|
|
2216
|
-
<item price="10.00" id="1" name="Widget"/>
|
|
2217
|
-
----
|
|
2218
|
-
|
|
2219
|
-
[source,ruby]
|
|
2220
|
-
----
|
|
2221
|
-
xml1 = '<item id="1" name="Widget" price="10.00"/>'
|
|
2222
|
-
xml2 = '<item price="10.00" id="1" name="Widget"/>'
|
|
2223
|
-
Canon::Comparison.equivalent?(xml1, xml2, ignore_attr_order: true)
|
|
2224
|
-
# => true
|
|
2225
|
-
----
|
|
2226
|
-
====
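
The effect of `ignore_attr_order` can be shown with attributes modelled as ordered name/value pairs. This is a sketch under that assumption; `attributes_equal?` is a hypothetical helper and says nothing about how Canon stores attributes internally.

```ruby
# Attributes are modelled as [name, value] pairs in source order.
def attributes_equal?(attrs1, attrs2, ignore_attr_order: true)
  if ignore_attr_order
    attrs1.sort == attrs2.sort # order-insensitive: sort by name, then value
  else
    attrs1 == attrs2           # order-sensitive: pairs must line up by position
  end
end

a = [["href", "/admin"], ["class", "button"]]
b = [["class", "button"], ["href", "/admin"]]

puts attributes_equal?(a, b, ignore_attr_order: true)  # true
puts attributes_equal?(a, b, ignore_attr_order: false) # false
```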
|
|
2227
|
-
|
|
2228
|
-
|
|
2229
|
-
[[verbose]]
|
|
2230
|
-
==== verbose
|
|
2231
|
-
|
|
2232
|
-
`verbose: {true|false}` default: `false`
|
|
2233
|
-
|
|
2234
|
-
When `true`, instead of returning a boolean value `Canon::Comparison.equivalent?`
|
|
2235
|
-
returns an array of all errors encountered when performing a comparison.
|
|
2236
|
-
|
|
2237
|
-
WARNING: When `true`, the comparison takes longer. More processing is required to
produce meaningful differences, and in this mode the comparison does **NOT** stop at
the first difference, because the goal is to capture as many differences as possible.
|
|
2241
|
-
|
|
2242
|
-
Usage:
|
|
2243
|
-
|
|
2244
|
-
[source,ruby]
|
|
2245
|
-
----
|
|
2246
|
-
Canon::Comparison.equivalent?(doc1, doc2, verbose: true)
|
|
2247
|
-
----
|
|
2248
|
-
|
|
2249
|
-
Return values in verbose mode:
|
|
2250
|
-
|
|
2251
|
-
* Empty array `[]` if documents are equivalent
|
|
2252
|
-
* Array of difference hashes if documents differ
|
|
2253
|
-
|
|
2254
|
-
Each difference hash contains:
|
|
2255
|
-
|
|
2256
|
-
`node1`:: The first node involved in the difference
|
|
2257
|
-
`node2`:: The second node involved in the difference
|
|
2258
|
-
`diff1`:: Difference code for the first node
|
|
2259
|
-
`diff2`:: Difference code for the second node
|
|
2260
|
-
|
|
2261
|
-
Difference codes:
|
|
2262
|
-
|
|
2263
|
-
* `Canon::Comparison::EQUIVALENT` (1) - Nodes are equivalent
|
|
2264
|
-
* `Canon::Comparison::MISSING_ATTRIBUTE` (2) - Attribute missing
|
|
2265
|
-
* `Canon::Comparison::MISSING_NODE` (3) - Node missing
|
|
2266
|
-
* `Canon::Comparison::UNEQUAL_ATTRIBUTES` (4) - Attributes differ
|
|
2267
|
-
* `Canon::Comparison::UNEQUAL_COMMENTS` (5) - Comments differ
|
|
2268
|
-
* `Canon::Comparison::UNEQUAL_ELEMENTS` (7) - Element names differ
|
|
2269
|
-
* `Canon::Comparison::UNEQUAL_NODES_TYPES` (8) - Node types differ
|
|
2270
|
-
* `Canon::Comparison::UNEQUAL_TEXT_CONTENTS` (9) - Text content differs
|
|
2271
|
-
|
|
2272
|
-
.Verbose mode examples
|
|
2273
|
-
[example]
|
|
2274
|
-
====
|
|
2275
|
-
[source,ruby]
|
|
2276
|
-
----
|
|
2277
|
-
# Verbose mode with equivalent documents
|
|
2278
|
-
html1 = '<div>Hello</div>'
|
|
2279
|
-
html2 = '<div>Hello</div>'
|
|
2280
|
-
result = Canon::Comparison.equivalent?(html1, html2, verbose: true)
|
|
2281
|
-
# => [] (empty array indicates equivalence)
|
|
2282
|
-
|
|
2283
|
-
# Verbose mode with different text content
|
|
2284
|
-
html1 = '<div>Hello</div>'
|
|
2285
|
-
html2 = '<div>Goodbye</div>'
|
|
2286
|
-
result = Canon::Comparison.equivalent?(html1, html2, verbose: true)
|
|
2287
|
-
# => [{
|
|
2288
|
-
# node1: <Nokogiri::XML::Text>,
|
|
2289
|
-
# node2: <Nokogiri::XML::Text>,
|
|
2290
|
-
# diff1: 9, # UNEQUAL_TEXT_CONTENTS
|
|
2291
|
-
# diff2: 9 # UNEQUAL_TEXT_CONTENTS
|
|
2292
|
-
# }]
|
|
2293
|
-
|
|
2294
|
-
# Verbose mode with different element names
|
|
2295
|
-
html1 = '<div>Test</div>'
|
|
2296
|
-
html2 = '<span>Test</span>'
|
|
2297
|
-
result = Canon::Comparison.equivalent?(html1, html2, verbose: true)
|
|
2298
|
-
# => [{
|
|
2299
|
-
# node1: <Nokogiri::XML::Element: div>,
|
|
2300
|
-
# node2: <Nokogiri::XML::Element: span>,
|
|
2301
|
-
# diff1: 7, # UNEQUAL_ELEMENTS
|
|
2302
|
-
# diff2: 7 # UNEQUAL_ELEMENTS
|
|
2303
|
-
# }]
|
|
2304
|
-
|
|
2305
|
-
# Verbose mode with missing attributes
|
|
2306
|
-
html1 = '<div class="foo" id="bar">Test</div>'
|
|
2307
|
-
html2 = '<div class="foo">Test</div>'
|
|
2308
|
-
result = Canon::Comparison.equivalent?(html1, html2, verbose: true)
|
|
2309
|
-
# => [{
|
|
2310
|
-
# node1: <Nokogiri::XML::Element: div>,
|
|
2311
|
-
# node2: <Nokogiri::XML::Element: div>,
|
|
2312
|
-
# diff1: 2, # MISSING_ATTRIBUTE
|
|
2313
|
-
# diff2: 2 # MISSING_ATTRIBUTE
|
|
2314
|
-
# }]
|
|
2315
|
-
|
|
2316
|
-
# Check difference type programmatically
|
|
2317
|
-
result = Canon::Comparison.equivalent?(html1, html2, verbose: true)
|
|
2318
|
-
if result.empty?
|
|
2319
|
-
puts "Documents are equivalent"
|
|
2320
|
-
else
|
|
2321
|
-
result.each do |diff|
|
|
2322
|
-
case diff[:diff1]
|
|
2323
|
-
when Canon::Comparison::UNEQUAL_TEXT_CONTENTS
|
|
2324
|
-
puts "Text content differs"
|
|
2325
|
-
when Canon::Comparison::UNEQUAL_ELEMENTS
|
|
2326
|
-
puts "Element names differ"
|
|
2327
|
-
when Canon::Comparison::MISSING_ATTRIBUTE
|
|
2328
|
-
puts "Attribute missing"
|
|
2329
|
-
end
|
|
2330
|
-
end
|
|
2331
|
-
end
|
|
2332
|
-
----
|
|
2333
|
-
====
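
A verbose result can also be condensed into a short summary. The sketch below works on plain hashes shaped like the difference hashes described above; the code numbers are copied from the list of difference codes, while `DIFF_LABELS` and `summarize` are hypothetical names that are not part of Canon.

```ruby
# Code numbers follow the list of difference codes; labels are illustrative.
DIFF_LABELS = {
  1 => "equivalent",
  2 => "missing attribute",
  3 => "missing node",
  4 => "unequal attributes",
  5 => "unequal comments",
  7 => "unequal elements",
  8 => "unequal node types",
  9 => "unequal text contents"
}.freeze

# Count differences per type, keyed on the code of the first node.
def summarize(differences)
  return "Documents are equivalent" if differences.empty?

  counts = differences.map do |diff|
    DIFF_LABELS.fetch(diff[:diff1], "unknown (#{diff[:diff1]})")
  end.tally

  counts.map { |label, count| "#{label}: #{count}" }.join(", ")
end

sample = [
  { node1: nil, node2: nil, diff1: 9, diff2: 9 },
  { node1: nil, node2: nil, diff1: 9, diff2: 9 },
  { node1: nil, node2: nil, diff1: 2, diff2: 2 }
]

puts summarize(sample) # unequal text contents: 2, missing attribute: 1
puts summarize([])     # Documents are equivalent
```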
|
|
2334
|
-
|
|
2335
|
-
=== Input validation
|
|
2336
|
-
|
|
2337
|
-
Canon provides comprehensive input validation for all supported formats (XML,
|
|
2338
|
-
HTML, JSON, YAML). When malformed input is detected, Canon raises a
|
|
2339
|
-
`Canon::ValidationError` with detailed location information to help you quickly
|
|
2340
|
-
identify and fix the problem.
|
|
2341
|
-
|
|
2342
|
-
==== Purpose
|
|
2343
|
-
|
|
2344
|
-
Input validation ensures that:
|
|
2345
|
-
|
|
2346
|
-
* Malformed documents are detected early with clear error messages
|
|
2347
|
-
* Syntax errors show exact line and column numbers
|
|
2348
|
-
* Error details appear in RSpec test output (not hidden in log files)
|
|
2349
|
-
* Users receive actionable feedback about what's wrong and where
|
|
2350
|
-
|
|
2351
|
-
==== How it works
|
|
2352
|
-
|
|
2353
|
-
Canon validates input **before parsing** using format-specific validators:
|
|
2354
|
-
|
|
2355
|
-
* `Canon::Validators::XmlValidator` - Strict XML syntax validation
|
|
2356
|
-
* `Canon::Validators::HtmlValidator` - HTML5 and XHTML validation
|
|
2357
|
-
* `Canon::Validators::JsonValidator` - JSON syntax validation
|
|
2358
|
-
* `Canon::Validators::YamlValidator` - YAML syntax validation
|
|
2359
|
-
|
|
2360
|
-
Validation happens automatically when you use Canon's formatters or comparison
|
|
2361
|
-
methods.
|
|
2362
|
-
|
|
2363
|
-
==== Validation error format
|
|
2364
|
-
|
|
2365
|
-
When validation fails, Canon raises `Canon::ValidationError` with:
|
|
2366
|
-
|
|
2367
|
-
* `format` - The format being validated (`:xml`, `:html`, `:json`, `:yaml`)
|
|
2368
|
-
* `line` - Line number where the error occurred (if available)
|
|
2369
|
-
* `column` - Column number where the error occurred (if available)
|
|
2370
|
-
* `details` - Additional context about the error
|
|
2371
|
-
|
|
2372
|
-
.Validation error example
|
|
2373
|
-
[example]
|
|
2374
|
-
[source,ruby]
|
|
2375
|
-
----
|
|
2376
|
-
require 'canon'
|
|
2377
|
-
|
|
2378
|
-
malformed_xml = '<root><unclosed>'
|
|
2379
|
-
|
|
2380
|
-
begin
|
|
2381
|
-
Canon.format(malformed_xml, :xml)
|
|
2382
|
-
rescue Canon::ValidationError => e
|
|
2383
|
-
puts e.message
|
|
2384
|
-
# XML Validation Error: Premature end of data in tag unclosed line 1
|
|
2385
|
-
# Line: 1
|
|
2386
|
-
# Column: 18
|
|
2387
|
-
|
|
2388
|
-
puts "Format: #{e.format}" # => :xml
|
|
2389
|
-
puts "Line: #{e.line}" # => 1
|
|
2390
|
-
puts "Column: #{e.column}" # => 18
|
|
2391
|
-
end
|
|
2392
|
-
----
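
Because `line` and `column` are only present when available, code that reports these errors should treat them as optional. The sketch below uses a stand-in class, `SketchValidationError`, that merely mirrors the four fields listed above; it is an assumption for illustration and the real `Canon::ValidationError` may differ.

```ruby
# Stand-in for illustration only; not the real Canon::ValidationError.
class SketchValidationError < StandardError
  attr_reader :format, :line, :column, :details

  def initialize(message, format:, line: nil, column: nil, details: nil)
    super(message)
    @format = format
    @line = line
    @column = column
    @details = details
  end
end

# Build a one-line report, skipping any field that is not available.
def describe(error)
  parts = ["[#{error.format}] #{error.message}"]
  parts << "line #{error.line}" if error.line
  parts << "column #{error.column}" if error.column
  parts << error.details if error.details
  parts.join(" | ")
end

begin
  raise SketchValidationError.new("Premature end of data", format: :xml, line: 1, column: 18)
rescue SketchValidationError => e
  puts describe(e) # [xml] Premature end of data | line 1 | column 18
end
```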
|
|
2393
|
-
|
|
2394
|
-
==== Format-specific validation
|
|
2395
|
-
|
|
2396
|
-
===== XML validation
|
|
2397
|
-
|
|
2398
|
-
Uses Nokogiri's strict XML parsing to detect:
|
|
2399
|
-
|
|
2400
|
-
* Unclosed tags
|
|
2401
|
-
* Mismatched tags
|
|
2402
|
-
* Invalid XML declaration
|
|
2403
|
-
* Malformed attributes
|
|
2404
|
-
* Invalid character references
|
|
2405
|
-
|
|
2406
|
-
.XML validation examples
|
|
2407
|
-
[example]
|
|
2408
|
-
[source,ruby]
|
|
2409
|
-
----
|
|
2410
|
-
# Unclosed tag
|
|
2411
|
-
Canon.format('<root><item>', :xml)
|
|
2412
|
-
# => Canon::ValidationError: XML Validation Error: Premature end of data in tag item line 1
|
|
2413
|
-
# Line: 1
|
|
2414
|
-
|
|
2415
|
-
# Mismatched tags
|
|
2416
|
-
Canon.format('<root><item></root>', :xml)
|
|
2417
|
-
# => Canon::ValidationError: XML Validation Error: Opening and ending tag mismatch: item line 1 and root
|
|
2418
|
-
# Line: 1
|
|
2419
|
-
----
|
|
2420
|
-
|
|
2421
|
-
===== HTML validation
|
|
2422
|
-
|
|
2423
|
-
Automatically detects HTML5 vs XHTML and applies appropriate validation:
|
|
2424
|
-
|
|
2425
|
-
* HTML5: Uses Nokogiri::HTML5 parser with error filtering
|
|
2426
|
-
* XHTML: Uses strict XML parsing
|
|
2427
|
-
|
|
2428
|
-
Special handling:
|
|
2429
|
-
|
|
2430
|
-
* Strips XML declarations from HTML (common in legacy HTML files)
|
|
2431
|
-
* Filters out non-critical HTML5 parser warnings
|
|
2432
|
-
* Only reports significant errors (level 2+)
|
|
2433
|
-
|
|
2434
|
-
.HTML validation examples
|
|
2435
|
-
[example]
|
|
2436
|
-
[source,ruby]
|
|
2437
|
-
----
|
|
2438
|
-
# Malformed XHTML
|
|
2439
|
-
xhtml = '<html xmlns="http://www.w3.org/1999/xhtml"><body><p>Unclosed'
|
|
2440
|
-
Canon.format(xhtml, :html)
|
|
2441
|
-
# => Canon::ValidationError: HTML Validation Error: Premature end of data in tag p line 1
|
|
2442
|
-
# Line: 1
|
|
2443
|
-
|
|
2444
|
-
# HTML5 with errors
|
|
2445
|
-
html5 = '<div><span></div>'
|
|
2446
|
-
Canon.format(html5, :html)
|
|
2447
|
-
# => Canon::ValidationError: HTML Validation Error: Unexpected end tag : span
|
|
2448
|
-
# Line: 1
|
|
2449
|
-
----
|
|
2450
|
-
|
|
2451
|
-
===== JSON validation
|
|
2452
|
-
|
|
2453
|
-
Validates JSON syntax using Ruby's JSON parser:
|
|
2454
|
-
|
|
2455
|
-
* Missing/extra braces or brackets
|
|
2456
|
-
* Trailing commas
|
|
2457
|
-
* Invalid escape sequences
|
|
2458
|
-
* Invalid numbers
|
|
2459
|
-
|
|
2460
|
-
Provides context showing the error location in the JSON structure.
|
|
2461
|
-
|
|
2462
|
-
.JSON validation examples
|
|
2463
|
-
[example]
|
|
2464
|
-
[source,ruby]
|
|
2465
|
-
----
|
|
2466
|
-
# Missing closing brace
|
|
2467
|
-
Canon.format('{"key": "value"', :json)
|
|
2468
|
-
# => Canon::ValidationError: JSON Validation Error: unexpected token at '{"key": "value"'
|
|
2469
|
-
# Details: Error at position 16
|
|
2470
|
-
|
|
2471
|
-
# Trailing comma (invalid in JSON)
|
|
2472
|
-
Canon.format('{"a": 1,}', :json)
|
|
2473
|
-
# => Canon::ValidationError: JSON Validation Error: unexpected token at '{"a": 1,}'
|
|
2474
|
-
# Details: Error at position 8
|
|
2475
|
-
----
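
The underlying check can be reproduced with Ruby's standard `json` library alone. This is a minimal sketch of syntax validation, assuming nothing about Canon's validator beyond what is described above; `json_syntax_error` is a hypothetical helper, and the exact wording of the parser message varies between versions of the `json` library.

```ruby
require "json"

# Returns nil for valid JSON, or the parser's message for invalid input.
def json_syntax_error(text)
  JSON.parse(text)
  nil
rescue JSON::ParserError => e
  e.message
end

p json_syntax_error('{"key": "value"}')            # nil
puts json_syntax_error('{"key": "value"').inspect  # message about the missing brace
puts json_syntax_error('{"a": 1,}').inspect        # message about the trailing comma
```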
|
|
2476
|
-
|
|
2477
|
-
===== YAML validation
|
|
2478
|
-
|
|
2479
|
-
Validates YAML syntax using Psych (Ruby's YAML parser):
|
|
2480
|
-
|
|
2481
|
-
* Invalid indentation
|
|
2482
|
-
* Unclosed brackets/braces
|
|
2483
|
-
* Invalid anchors/aliases
|
|
2484
|
-
* Type mismatches
|
|
2485
|
-
|
|
2486
|
-
Shows error location with line numbers and context.
|
|
2487
|
-
|
|
2488
|
-
.YAML validation examples
|
|
2489
|
-
[example]
|
|
2490
|
-
[source,ruby]
|
|
2491
|
-
----
|
|
2492
|
-
# Unclosed bracket
|
|
2493
|
-
Canon.format("key: {unclosed", :yaml)
|
|
2494
|
-
# => Canon::ValidationError: YAML Validation Error: (<unknown>): did not find expected node content...
|
|
2495
|
-
# Line: 1
|
|
2496
|
-
# Details: Shows context around error
|
|
2497
|
-
|
|
2498
|
-
# Invalid indentation
|
|
2499
|
-
yaml = <<~YAML
|
|
2500
|
-
parent:
|
|
2501
|
-
child: value
|
|
2502
|
-
YAML
|
|
2503
|
-
Canon.format(yaml, :yaml)
|
|
2504
|
-
# => Canon::ValidationError: YAML Validation Error: mapping values are not allowed in this context
|
|
2505
|
-
# Line: 2
|
|
2506
|
-
----
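
Psych itself exposes the location of a syntax error, which is presumably where line information of this kind comes from. The sketch below uses only the standard library; `yaml_syntax_error` is a hypothetical helper and is not Canon's validator.

```ruby
require "yaml"

# Returns nil for syntactically valid YAML, or a hash describing the error.
def yaml_syntax_error(text)
  Psych.parse(text) # builds the syntax tree without instantiating Ruby objects
  nil
rescue Psych::SyntaxError => e
  { line: e.line, column: e.column, problem: e.problem }
end

p yaml_syntax_error("key: value")     # nil
p yaml_syntax_error("key: {unclosed") # hash with :line, :column and :problem
```

`Psych.parse` is used rather than `YAML.load` so that only the syntax is checked and no objects are built from the input.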
|
|
2507
|
-
|
|
2508
|
-
==== Validation in RSpec tests
|
|
2509
|
-
|
|
2510
|
-
Canon's RSpec matchers automatically propagate validation errors to test output,
|
|
2511
|
-
making it easy to see what's wrong:
|
|
2512
|
-
|
|
2513
|
-
.RSpec validation error example
|
|
2514
|
-
[example]
|
|
2515
|
-
[source,ruby]
|
|
2516
|
-
----
|
|
2517
|
-
require 'canon/rspec_matchers'
|
|
2518
|
-
|
|
2519
|
-
RSpec.describe 'XML validation' do
|
|
2520
|
-
it 'validates input' do
|
|
2521
|
-
malformed_xml = '<root><unclosed>'
|
|
2522
|
-
expected_xml = '<root><item/></root>'
|
|
2523
|
-
|
|
2524
|
-
# This will fail with a clear validation error message
|
|
2525
|
-
expect(malformed_xml).to be_xml_equivalent_to(expected_xml)
|
|
2526
|
-
end
|
|
2527
|
-
end
|
|
2528
|
-
|
|
2529
|
-
# Test output shows:
|
|
2530
|
-
# Canon::ValidationError:
|
|
2531
|
-
# XML Validation Error: Premature end of data in tag unclosed line 1
|
|
2532
|
-
# Line: 1
|
|
2533
|
-
# Column: 18
|
|
2534
|
-
----
|
|
2535
|
-
|
|
2536
|
-
The error appears directly in the RSpec output, not hidden in separate error
|
|
2537
|
-
files or logs.
|
|
2538
|
-
|
|
2539
|
-
==== Validation in comparison
|
|
2540
|
-
|
|
2541
|
-
Validation also occurs when using `Canon::Comparison.equivalent?`:
|
|
2542
|
-
|
|
2543
|
-
.Comparison validation example
|
|
2544
|
-
[example]
|
|
2545
|
-
[source,ruby]
|
|
2546
|
-
----
|
|
2547
|
-
require 'canon/comparison'
|
|
2548
|
-
|
|
2549
|
-
xml1 = '<root><item/></root>'
|
|
2550
|
-
xml2 = '<root><unclosed>'
|
|
2551
|
-
|
|
2552
|
-
Canon::Comparison.equivalent?(xml1, xml2)
|
|
2553
|
-
# => Canon::ValidationError: XML Validation Error: Premature end of data in tag unclosed line 1
|
|
2554
|
-
# Line: 1
|
|
2555
|
-
# Column: 18
|
|
2556
|
-
----
|
|
2557
|
-
|
|
2558
|
-
==== Benefits
|
|
2559
|
-
|
|
2560
|
-
Input validation provides several key benefits:
|
|
2561
|
-
|
|
2562
|
-
**Early error detection**:: Problems are caught before processing begins, saving
|
|
2563
|
-
time and providing clear feedback
|
|
2564
|
-
|
|
2565
|
-
**Precise error location**:: Line and column numbers pinpoint exactly where the
|
|
2566
|
-
problem is, especially useful in large documents
|
|
2567
|
-
|
|
2568
|
-
**Clear error messages**:: Descriptive messages explain what's wrong and often
|
|
2569
|
-
suggest how to fix it
|
|
2570
|
-
|
|
2571
|
-
**Test-friendly**:: Errors appear in RSpec output where developers expect them,
|
|
2572
|
-
not in separate log files
|
|
2573
|
-
|
|
2574
|
-
**Format-aware**:: Each validator understands format-specific rules and provides
|
|
2575
|
-
relevant error details
|
|
2576
|
-
|
|
2577
|
-
=== Reporting options (diff options)
|
|
2578
|
-
|
|
2579
|
-
==== General
|
|
2580
|
-
|
|
2581
|
-
Canon provides comprehensive diff formatting capabilities across three interfaces:
|
|
2582
|
-
RSpec matchers, CLI commands, and the Ruby API. All interfaces support the same
|
|
2583
|
-
set of parameters for consistent behavior.
|
|
2584
|
-
|
|
2585
|
-
==== Parameters
|
|
2586
|
-
|
|
2587
|
-
The following table shows all available diff formatting parameters and their
|
|
2588
|
-
availability across interfaces:
|
|
2589
|
-
|
|
2590
|
-
[cols="1,1,1,1,2,1"]
|
|
2591
|
-
|===
|
|
2592
|
-
|Parameter |RSpec |CLI |Ruby API |Description |Default
|
|
2593
|
-
|
|
2594
|
-
|`use_color`
|
|
2595
|
-
|✓
|
|
2596
|
-
|✓
|
|
2597
|
-
|✓
|
|
2598
|
-
|Enable/disable colored output
|
|
2599
|
-
|`true`
|
|
2600
|
-
|
|
2601
|
-
|`diff_mode`
|
|
2602
|
-
|✓
|
|
2603
|
-
|✓
|
|
2604
|
-
|✓
|
|
2605
|
-
|Comparison mode: `:by_object` or `:by_line`
|
|
2606
|
-
|`:by_line` (RSpec), `:by_object` (XML/JSON/YAML)
|
|
2607
|
-
|
|
2608
|
-
|`context_lines`
|
|
2609
|
-
|✓
|
|
2610
|
-
|✓
|
|
2611
|
-
|✓
|
|
2612
|
-
|Number of unchanged lines to show around each change
|
|
2613
|
-
|`3`
|
|
2614
|
-
|
|
2615
|
-
|`diff_grouping_lines`
|
|
2616
|
-
|✓
|
|
2617
|
-
|✓
|
|
2618
|
-
|✓
|
|
2619
|
-
|Maximum line distance to group separate diffs into context blocks
|
|
2620
|
-
|`nil` (no grouping)
|
|
2621
|
-
|===
|
|
2622
|
-
|
|
2623
|
-
==== Interface-specific usage
|
|
2624
|
-
|
|
2625
|
-
===== RSpec matchers configuration
|
|
2626
|
-
|
|
2627
|
-
Configure diff formatting for RSpec matchers using `Canon::RspecMatchers`:
|
|
2628
|
-
|
|
2629
|
-
[source,ruby]
|
|
2630
|
-
----
|
|
2631
|
-
require 'canon/rspec_matchers'
|
|
2632
|
-
|
|
2633
|
-
# Configure globally for all matchers
|
|
2634
|
-
Canon::RspecMatchers.diff_mode = :by_object
|
|
2635
|
-
Canon::RspecMatchers.use_color = true
|
|
2636
|
-
Canon::RspecMatchers.context_lines = 5
|
|
2637
|
-
Canon::RspecMatchers.diff_grouping_lines = 10
|
|
2638
|
-
|
|
2639
|
-
# Use in specs
|
|
2640
|
-
RSpec.describe 'My comparison' do
|
|
2641
|
-
it 'shows formatted diff' do
|
|
2642
|
-
expect(actual_xml).to be_xml_equivalent_to(expected_xml)
|
|
2643
|
-
end
|
|
2644
|
-
end
|
|
2645
|
-
----
|
|
2646
|
-
|
|
2647
|
-
===== CLI usage
|
|
2648
|
-
|
|
2649
|
-
Pass options to the `canon diff` command:
|
|
2650
|
-
|
|
2651
|
-
[source,bash]
|
|
2652
|
-
----
|
|
2653
|
-
# Basic diff with default settings
|
|
2654
|
-
$ canon diff file1.xml file2.xml --verbose
|
|
2655
|
-
|
|
2656
|
-
# Customize diff output
|
|
2657
|
-
$ canon diff file1.xml file2.xml \
|
|
2658
|
-
--verbose \
|
|
2659
|
-
--by-line \
|
|
2660
|
-
--no-color \
|
|
2661
|
-
--context-lines 5 \
|
|
2662
|
-
--diff-grouping-lines 10
|
|
2663
|
-
----
|
|
2664
|
-
|
|
2665
|
-
===== Ruby API usage
|
|
2666
|
-
|
|
2667
|
-
Use `Canon::DiffFormatter` directly in your code:
|
|
2668
|
-
|
|
2669
|
-
[source,ruby]
|
|
2670
|
-
----
|
|
2671
|
-
require 'canon/diff_formatter'
|
|
2672
|
-
require 'canon/comparison'
|
|
2673
|
-
|
|
2674
|
-
# Compare documents
|
|
2675
|
-
comparison = Canon::Comparison.new(doc1, doc2)
|
|
2676
|
-
result = comparison.compare
|
|
2677
|
-
|
|
2678
|
-
# Format diff output
|
|
2679
|
-
formatter = Canon::DiffFormatter.new(
|
|
2680
|
-
use_color: true,
|
|
2681
|
-
mode: :by_object,
|
|
2682
|
-
context_lines: 5,
|
|
2683
|
-
diff_grouping_lines: 10
|
|
2684
|
-
)
|
|
2685
|
-
|
|
2686
|
-
diff_output = formatter.format(result)
|
|
2687
|
-
puts diff_output
|
|
2688
|
-
----
|
|
2689
|
-
|
|
2690
|
-
==== Parameter details
|
|
2691
|
-
|
|
2692
|
-
===== use_color
|
|
2693
|
-
|
|
2694
|
-
Controls whether diff output includes ANSI color codes.
|
|
2695
|
-
|
|
2696
|
-
* Type: Boolean
|
|
2697
|
-
* Default: `true`
|
|
2698
|
-
* Colors used:
|
|
2699
|
-
** Red: Deletions/removed content
|
|
2700
|
-
** Green: Additions/inserted content
|
|
2701
|
-
** Yellow: Modified content
|
|
2702
|
-
** Cyan: Element names and structure
|
|
2703
|
-
|
|
2704
|
-
[source,ruby]
|
|
2705
|
-
----
|
|
2706
|
-
# Disable colors for plain text output
|
|
2707
|
-
Canon::RspecMatchers.use_color = false
|
|
2708
|
-
|
|
2709
|
-
# CLI
|
|
2710
|
-
$ canon diff file1.xml file2.xml --no-color --verbose
|
|
2711
|
-
----
|
|
2712
|
-
|
|
2713
|
-
===== diff_mode
|
|
2714
|
-
|
|
2715
|
-
Determines the comparison and display strategy.
|
|
2716
|
-
|
|
2717
|
-
* Type: Symbol (`:by_object` or `:by_line`)
|
|
2718
|
-
* Default: `:by_line` for RSpec matchers, format-dependent for CLI/API
|
|
2719
|
-
* Modes:
|
|
2720
|
-
** `:by_object` - Semantic tree-based comparison showing structural changes
|
|
2721
|
-
** `:by_line` - Line-by-line diff after canonicalization
|
|
2722
|
-
|
|
2723
|
-
[source,ruby]
|
|
2724
|
-
----
|
|
2725
|
-
# Use object-based diff for RSpec matchers
|
|
2726
|
-
Canon::RspecMatchers.diff_mode = :by_object
|
|
2727
|
-
|
|
2728
|
-
# CLI - XML uses by-object by default, force by-line
|
|
2729
|
-
$ canon diff file1.xml file2.xml --by-line --verbose
|
|
2730
|
-
----
|
|
2731
|
-
|
|
2732
|
-
===== context_lines
|
|
2733
|
-
|
|
2734
|
-
Number of unchanged lines to display around each change for context.
|
|
2735
|
-
|
|
2736
|
-
* Type: Numeric
|
|
2737
|
-
* Default: `3`
|
|
2738
|
-
* Range: any non-negative integer
|
|
2739
|
-
* Effect: Higher values show more surrounding context, lower values show only changes
|
|
2740
|
-
|
|
2741
|
-
[source,ruby]
|
|
2742
|
-
----
|
|
2743
|
-
# Show 5 lines of context around each change
|
|
2744
|
-
Canon::RspecMatchers.context_lines = 5
|
|
2745
|
-
|
|
2746
|
-
# CLI
|
|
2747
|
-
$ canon diff file1.xml file2.xml --context-lines 5 --verbose
|
|
2748
|
-
|
|
2749
|
-
# Ruby API
|
|
2750
|
-
formatter = Canon::DiffFormatter.new(context_lines: 5)
|
|
2751
|
-
----
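
To make the effect of `context_lines` concrete, the following plain-Ruby sketch computes which lines would be displayed around a single change. It does not use Canon's internals: `context_window` and its parameters are hypothetical names chosen for illustration, and clamping to the document bounds is an assumption.

```ruby
# Illustrative only: not part of Canon's API.
# Returns the range of 1-based line numbers shown around one changed line.
def context_window(changed_line, context_lines, total_lines)
  first = [changed_line - context_lines, 1].max           # clamp at the first line
  last  = [changed_line + context_lines, total_lines].min # clamp at the last line
  first..last
end

puts context_window(15, 3, 100).to_a.inspect # lines 12 through 18
puts context_window(2, 5, 100).to_a.inspect  # window clamped at line 1
```

With `context_lines` set to `0`, the window collapses to the changed line itself.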
|
|
2752
|
-
|
|
2753
|
-
===== diff_grouping_lines
|
|
2754
|
-
|
|
2755
|
-
Maximum line distance between separate changes to group them into a single
|
|
2756
|
-
context block.
|
|
2757
|
-
|
|
2758
|
-
* Type: Numeric or `nil`
|
|
2759
|
-
* Default: `nil` (no grouping)
|
|
2760
|
-
* Effect: When set, changes within N lines of each other are grouped into
|
|
2761
|
-
context blocks with a header showing the number of diffs in the block
|
|
2762
|
-
|
|
2763
|
-
[source,ruby]
|
|
2764
|
-
----
|
|
2765
|
-
# Group changes that are within 10 lines of each other
|
|
2766
|
-
Canon::RspecMatchers.diff_grouping_lines = 10
|
|
2767
|
-
|
|
2768
|
-
# CLI
|
|
2769
|
-
$ canon diff file1.xml file2.xml --diff-grouping-lines 10 --verbose
|
|
2770
|
-
|
|
2771
|
-
# Ruby API
|
|
2772
|
-
formatter = Canon::DiffFormatter.new(diff_grouping_lines: 10)
|
|
2773
|
-
----
|
|
2774
|
-
|
|
2775
|
-
.Example of grouped diff output
|
|
2776
|
-
[example]
|
|
2777
|
-
When `diff_grouping_lines` is set to `10`, changes close together are grouped:
|
|
2778
|
-
|
|
2779
|
-
[source]
|
|
2780
|
-
----
|
|
2781
|
-
Context block has 3 diffs (lines 5-18):
|
|
2782
|
-
5 - | <foreword id="fwd">
|
|
2783
|
-
5 + | <foreword displayorder="2" id="fwd">
|
|
2784
|
-
6 | <p>First paragraph</p>
|
|
2785
|
-
...
|
|
2786
|
-
15 - | <title>Scope</title>
|
|
2787
|
-
15 + | <title>Application Scope</title>
|
|
2788
|
-
16 | </clause>
|
|
2789
|
-
17 + | <p>New content</p>
|
|
2790
|
-
18 | </sections>
|
|
2791
|
-
----
|
|
2792
|
-
|
|
2793
|
-
Without grouping, these would appear as separate diff sections.
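
The grouping rule can be sketched in plain Ruby as follows. This is an illustration rather than Canon's implementation: `group_changes` is a hypothetical helper, and treating "within N lines" as an inclusive distance is an assumption.

```ruby
# Illustrative only: not part of Canon's API.
# Groups changed line numbers so that neighbours at most `max_distance`
# lines apart end up in the same context block.
def group_changes(line_numbers, max_distance)
  line_numbers.sort.slice_when { |a, b| b - a > max_distance }.to_a
end

changes = [5, 15, 17, 40]
group_changes(changes, 10).each do |block|
  puts "Context block has #{block.size} diffs (lines #{block.first}-#{block.last})"
end
```

Here the changes at lines 5, 15, and 17 fall into one block, while the change at line 40 forms a block of its own.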
|
|
2794
|
-
|
|
2795
|
-
=== Visualization options
|
|
2796
|
-
|
|
2797
|
-
==== Enhanced diff output features
|
|
2798
|
-
|
|
2799
|
-
Canon's diff formatter includes several enhancements designed to make diffs more
|
|
2800
|
-
readable and informative, especially when working with RSpec test failures.
|
|
2801
|
-
|
|
2802
|
-
===== Color-coded line numbers and structure
|
|
2803
|
-
|
|
2804
|
-
**Purpose**: Improve readability by distinguishing structural elements from
|
|
2805
|
-
content changes.
|
|
2806
|
-
|
|
2807
|
-
When color mode is enabled (`use_color: true`), the diff formatter uses a
|
|
2808
|
-
consistent color scheme:
|
|
2809
|
-
|
|
2810
|
-
* **Yellow**: Line numbers and pipe separators
|
|
2811
|
-
* **Red**: Deletion markers (`-`) and removed content
|
|
2812
|
-
* **Green**: Addition markers (`+`) and inserted content
|
|
2813
|
-
* **Default terminal color**: Unchanged context lines (no ANSI codes applied)
|
|
2814
|
-
|
|
2815
|
-
This color scheme helps differentiate between:
|
|
2816
|
-
|
|
2817
|
-
* The diff structure (line numbers, pipes)
|
|
2818
|
-
* Content that was removed (red)
|
|
2819
|
-
* Content that was added (green)
|
|
2820
|
-
* Content that stayed the same (your terminal's default color)
|
|
2821
|
-
|
|
2822
|
-
.Example colored diff output
|
|
2823
|
-
[example]
|
|
2824
|
-
In a colored terminal, a typical diff line appears as:
|
|
2825
|
-
|
|
2826
|
-
[source]
|
|
2827
|
-
----
|
|
2828
|
-
5| 5 | <p>First paragraph</p> # Context line (yellow numbers/pipes, default text)
|
|
2829
|
-
6| -| <old>Text</old> # Deletion (yellow numbers/pipes, red marker/content)
|
|
2830
|
-
| 6+| <new>Text</new> # Addition (yellow numbers/pipes, green marker/content)
|
|
2831
|
-
----
|
|
2832
|
-
|
|
2833
|
-
Where:
|
|
2834
|
-
|
|
2835
|
-
* Line numbers (`5`, `6`) are in yellow
|
|
2836
|
-
* Pipe separators (`|`) are in yellow
|
|
2837
|
-
* Markers (`-`, `+`) are in red/green respectively
|
|
2838
|
-
* Changed content is highlighted in red (deletions) or green (additions)
|
|
2839
|
-
* Unchanged content uses your terminal's default color (no forced white/black)
|
|
2840
|
-
|
|
2841
|
-
**Why this matters**: When running tests with RSpec, the framework initially sets
|
|
2842
|
-
output to red. Canon's diff formatter explicitly resets colors to prevent RSpec's
|
|
2843
|
-
red from bleeding into the diff output, ensuring consistent and readable diffs.
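
The reset-before-color idea can be sketched with standard ANSI SGR escape sequences. The helper names `paint` and `plain` are hypothetical and the snippet is not taken from Canon's formatter; it only shows why emitting a reset first stops an inherited color from leaking into the diff.

```ruby
# Illustrative only: standard ANSI escape codes, not Canon's code.
RESET  = "\e[0m"
COLORS = { red: "\e[31m", green: "\e[32m", yellow: "\e[33m" }.freeze

# Reset first so any color inherited from surrounding output is cleared,
# then apply the requested color and reset again afterwards.
def paint(text, color)
  "#{RESET}#{COLORS.fetch(color)}#{text}#{RESET}"
end

# Context lines only get a reset, so they use the terminal's default color.
def plain(text)
  "#{RESET}#{text}"
end

puts paint("   6", :yellow) + paint("-", :red) + paint("| ", :yellow) +
     paint("<old>Text</old>", :red)
puts plain("<p>First paragraph</p>")
```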
|
|
2844
|
-
|
|
2845
|
-
===== Whitespace visualization
|
|
2846
|
-
|
|
2847
|
-
**Purpose**: Make invisible whitespace and special characters visible in diffs.
|
|
2848
|
-
|
|
2849
|
-
Whitespace changes can be difficult to spot in traditional diffs because spaces,
|
|
2850
|
-
tabs, and other invisible characters don't appear in output. Canon visualizes
|
|
2851
|
-
these changes using a comprehensive set of Unicode symbols that are safe for use
|
|
2852
|
-
with CJK (Chinese, Japanese, Korean) text.
|
|
2853
|
-
|
|
2854
|
-
**Visualization scope**: Character visualization is applied only to **diff lines**
|
|
2855
|
-
(additions, deletions, and changes), not to context lines (unchanged lines). This
|
|
2856
|
-
ensures that:
|
|
2857
|
-
|
|
2858
|
-
* Context lines display content in its original form without substitution
|
|
2859
|
-
* Only actual changes show visualization, making differences easier to spot
|
|
2860
|
-
* Within changed lines showing token-level diffs, unchanged tokens are displayed
|
|
2861
|
-
in the terminal's default color (not red/green) to distinguish them from actual
|
|
2862
|
-
changes
|
|
2863
|
-
|
|
2864
|
-
====== Default character visualization map
|
|
2865
|
-
|
|
2866
|
-
Canon provides a comprehensive CJK-safe character mapping for common non-visible
|
|
2867
|
-
characters encountered in diffs:
|
|
2868
|
-
|
|
2869
|
-
NOTE: These visualization symbols appear **only in diff lines** (additions,
|
|
2870
|
-
deletions, and changes), not in context lines (unchanged lines).
|
|
2871
|
-
|
|
2872
|
-
.Common whitespace characters
|
|
2873
|
-
[cols="1,1,1,2"]
|
|
2874
|
-
|===
|
|
2875
|
-
|Character |Unicode |Symbol |Description
|
|
2876
|
-
|
|
2877
|
-
|Regular space
|
|
2878
|
-
|U+0020
|
|
2879
|
-
|`░`
|
|
2880
|
-
|Light Shade (U+2591)
|
|
2881
|
-
|
|
2882
|
-
|Tab
|
|
2883
|
-
|U+0009
|
|
2884
|
-
|`⇥`
|
|
2885
|
-
|Rightwards Arrow to Bar (U+21E5)
|
|
2886
|
-
|
|
2887
|
-
|Non-breaking space
|
|
2888
|
-
|U+00A0
|
|
2889
|
-
|`␣`
|
|
2890
|
-
|Open Box (U+2423)
|
|
2891
|
-
|===
|
|
2892
|
-
|
|
2893
|
-
.Line endings
|
|
2894
|
-
[cols="1,1,1,2"]
|
|
2895
|
-
|===
|
|
2896
|
-
|Character |Unicode |Symbol |Description
|
|
2897
|
-
|
|
2898
|
-
|Line feed (LF)
|
|
2899
|
-
|U+000A
|
|
2900
|
-
|`↵`
|
|
2901
|
-
|Downwards Arrow with Corner Leftwards (U+21B5)
|
|
2902
|
-
|
|
2903
|
-
|Carriage return (CR)
|
|
2904
|
-
|U+000D
|
|
2905
|
-
|`⏎`
|
|
2906
|
-
|Return Symbol (U+23CE)
|
|
2907
|
-
|
|
2908
|
-
|Windows line ending (CRLF)
|
|
2909
|
-
|U+000D U+000A
|
|
2910
|
-
|`↵`
|
|
2911
|
-
|Downwards Arrow with Corner Leftwards (U+21B5)
|
|
2912
|
-
|
|
2913
|
-
|Next line (NEL)
|
|
2914
|
-
|U+0085
|
|
2915
|
-
|`⏎`
|
|
2916
|
-
|Return Symbol (U+23CE)
|
|
2917
|
-
|
|
2918
|
-
|Line separator
|
|
2919
|
-
|U+2028
|
|
2920
|
-
|`⤓`
|
|
2921
|
-
|Downwards Arrow to Bar (U+2913)
|
|
2922
|
-
|
|
2923
|
-
|Paragraph separator
|
|
2924
|
-
|U+2029
|
|
2925
|
-
|`⤓`
|
|
2926
|
-
|Downwards Arrow to Bar (U+2913)
|
|
2927
|
-
|===
|
|
2928
|
-
|
|
2929
|
-
.Unicode spaces (various widths)
|
|
2930
|
-
[cols="1,1,1,2"]
|
|
2931
|
-
|===
|
|
2932
|
-
|Character |Unicode |Symbol |Description
|
|
2933
|
-
|
|
2934
|
-
|En space
|
|
2935
|
-
|U+2002
|
|
2936
|
-
|`▭`
|
|
2937
|
-
|White Rectangle (U+25AD)
|
|
2938
|
-
|
|
2939
|
-
|Em space
|
|
2940
|
-
|U+2003
|
|
2941
|
-
|`▬`
|
|
2942
|
-
|Black Rectangle (U+25AC)
|
|
2943
|
-
|
|
2944
|
-
|Four-per-em space
|
|
2945
|
-
|U+2005
|
|
2946
|
-
|`⏓`
|
|
2947
|
-
|Metrical Short Over Long (U+23D3)
|
|
2948
|
-
|
|
2949
|
-
|Six-per-em space
|
|
2950
|
-
|U+2006
|
|
2951
|
-
|`⏕`
|
|
2952
|
-
|Metrical Two Shorts Over Long (U+23D5)
|
|
2953
|
-
|
|
2954
|
-
|Thin space
|
|
2955
|
-
|U+2009
|
|
2956
|
-
|`▯`
|
|
2957
|
-
|White Vertical Rectangle (U+25AF)
|
|
2958
|
-
|
|
2959
|
-
|Hair space
|
|
2960
|
-
|U+200A
|
|
2961
|
-
|`▮`
|
|
2962
|
-
|Black Vertical Rectangle (U+25AE)
|
|
2963
|
-
|
|
2964
|
-
|Figure space
|
|
2965
|
-
|U+2007
|
|
2966
|
-
|`□`
|
|
2967
|
-
|White Square (U+25A1)
|
|
2968
|
-
|
|
2969
|
-
|Narrow no-break space
|
|
2970
|
-
|U+202F
|
|
2971
|
-
|`▫`
|
|
2972
|
-
|White Small Square (U+25AB)
|
|
2973
|
-
|
|
2974
|
-
|Medium mathematical space
|
|
2975
|
-
|U+205F
|
|
2976
|
-
|`▭`
|
|
2977
|
-
|White Rectangle (U+25AD)
|
|
2978
|
-
|
|
2979
|
-
|Ideographic space
|
|
2980
|
-
|U+3000
|
|
2981
|
-
|`⎵`
|
|
2982
|
-
|Bottom Square Bracket (U+23B5)
|
|
2983
|
-
|
|
2984
|
-
|Ideographic half fill space
|
|
2985
|
-
|U+303F
|
|
2986
|
-
|`⏑`
|
|
2987
|
-
|Metrical Breve (U+23D1)
|
|
2988
|
-
|
|
2989
|
-
|===
|
|
2990
|
-
|
|
2991
|
-
.Zero-width characters (invisible troublemakers)
|
|
2992
|
-
[cols="1,1,1,2"]
|
|
2993
|
-
|===
|
|
2994
|
-
|Character |Unicode |Symbol |Description
|
|
2995
|
-
|
|
2996
|
-
|Zero-width space
|
|
2997
|
-
|U+200B
|
|
2998
|
-
|`→`
|
|
2999
|
-
|Rightwards Arrow (U+2192)
|
|
3000
|
-
|
|
3001
|
-
|Zero-width non-joiner
|
|
3002
|
-
|U+200C
|
|
3003
|
-
|`↛`
|
|
3004
|
-
|Rightwards Arrow with Stroke (U+219B)
|
|
3005
|
-
|
|
3006
|
-
|Zero-width joiner
|
|
3007
|
-
|U+200D
|
|
3008
|
-
|`⇢`
|
|
3009
|
-
|Rightwards Dashed Arrow (U+21E2)
|
|
3010
|
-
|
|
3011
|
-
|Zero-width no-break space (BOM)
|
|
3012
|
-
|U+FEFF
|
|
3013
|
-
|`⇨`
|
|
3014
|
-
|Rightwards White Arrow (U+21E8)
|
|
3015
|
-
|===
|
|
3016
|
-
|
|
3017
|
-
.Bidirectional/RTL markers
|
|
3018
|
-
[cols="1,1,1,2"]
|
|
3019
|
-
|===
|
|
3020
|
-
|Character |Unicode |Symbol |Description
|
|
3021
|
-
|
|
3022
|
-
|Left-to-right mark
|
|
3023
|
-
|U+200E
|
|
3024
|
-
|`⟹`
|
|
3025
|
-
|Long Rightwards Double Arrow (U+27F9)
|
|
3026
|
-
|
|
3027
|
-
|Right-to-left mark
|
|
3028
|
-
|U+200F
|
|
3029
|
-
|`⟸`
|
|
3030
|
-
|Long Leftwards Double Arrow (U+27F8)
|
|
3031
|
-
|
|
3032
|
-
|LTR embedding
|
|
3033
|
-
|U+202A
|
|
3034
|
-
|`⇒`
|
|
3035
|
-
|Rightwards Double Arrow (U+21D2)
|
|
3036
|
-
|
|
3037
|
-
|RTL embedding
|
|
3038
|
-
|U+202B
|
|
3039
|
-
|`⇐`
|
|
3040
|
-
|Leftwards Double Arrow (U+21D0)
|
|
3041
|
-
|
|
3042
|
-
|Pop directional formatting
|
|
3043
|
-
|U+202C
|
|
3044
|
-
|`↔`
|
|
3045
|
-
|Left Right Arrow (U+2194)
|
|
3046
|
-
|
|
3047
|
-
|LTR override
|
|
3048
|
-
|U+202D
|
|
3049
|
-
|`⇉`
|
|
3050
|
-
|Rightwards Paired Arrows (U+21C9)
|
|
3051
|
-
|
|
3052
|
-
|RTL override
|
|
3053
|
-
|U+202E
|
|
3054
|
-
|`⇇`
|
|
3055
|
-
|Leftwards Paired Arrows (U+21C7)
|
|
3056
|
-
|===
|
|
3057
|
-
|
|
3058
|
-
.Control characters
|
|
3059
|
-
[cols="1,1,1,2"]
|
|
3060
|
-
|===
|
|
3061
|
-
|Character |Unicode |Symbol |Description
|
|
3062
|
-
|
|
3063
|
-
|Null
|
|
3064
|
-
|U+0000
|
|
3065
|
-
|`␀`
|
|
3066
|
-
|Symbol for Null (U+2400)
|
|
3067
|
-
|
|
3068
|
-
|Soft hyphen
|
|
3069
|
-
|U+00AD
|
|
3070
|
-
|`‐`
|
|
3071
|
-
|Hyphen (U+2010)
|
|
3072
|
-
|
|
3073
|
-
|Backspace
|
|
3074
|
-
|U+0008
|
|
3075
|
-
|`␈`
|
|
3076
|
-
|Symbol for Backspace (U+2408)
|
|
3077
|
-
|
|
3078
|
-
|Delete
|
|
3079
|
-
|U+007F
|
|
3080
|
-
|`␡`
|
|
3081
|
-
|Symbol for Delete (U+2421)
|
|
3082
|
-
|===
|
|
3083
|
-
|
|
3084
|
-
====== CJK safety
|
|
3085
|
-
|
|
3086
|
-
The visualization characters are specifically chosen to avoid conflicts with CJK
|
|
3087
|
-
text:
|
|
3088
|
-
|
|
3089
|
-
* **No middle dots** (`·`) - commonly used as separators in CJK
|
|
3090
|
-
* **No bullets** (`∙`) - used in CJK lists
|
|
3091
|
-
* **No circles** (`◌◍◎`) - look similar to CJK characters like ○ ●
|
|
3092
|
-
* **No small dots** (`⋅`) - conflict with CJK punctuation
|
|
3093
|
-
|
|
3094
|
-
Instead, Canon uses:
|
|
3095
|
-
* Box characters (`□▭▬▯▮▫`) for various space types
|
|
3096
|
-
* Arrow symbols (`→↛⇢⇨⟹⟸⇒⇐`) for zero-width and directional characters
|
|
3097
|
-
* Control Pictures block symbols (`␀␈␡`) for control characters
|
|
3098
|
-
|
|
3099
|
-
====== Customizing character visualization
|
|
3100
|
-
|
|
3101
|
-
You can customize the character visualization map for your specific needs:
|
|
3102
|
-
|
|
3103
|
-
[source,ruby]
|
|
3104
|
-
----
|
|
3105
|
-
require 'canon/diff_formatter'
|
|
3106
|
-
|
|
3107
|
-
# Create custom visualization map
|
|
3108
|
-
custom_map = Canon::DiffFormatter.merge_visualization_map({
|
|
3109
|
-
' ' => '·', # Use middle dot for spaces (if not using CJK)
|
|
3110
|
-
"\t" => '→', # Use simple arrow for tabs
|
|
3111
|
-
"\u200B" => '⚠' # Warning symbol for zero-width space
|
|
3112
|
-
})
|
|
3113
|
-
|
|
3114
|
-
# Use custom map with formatter
|
|
3115
|
-
formatter = Canon::DiffFormatter.new(
|
|
3116
|
-
use_color: true,
|
|
3117
|
-
visualization_map: custom_map
|
|
3118
|
-
)
|
|
3119
|
-
|
|
3120
|
-
# The custom map merges with defaults, so unspecified
|
|
3121
|
-
# characters still use the default visualization
|
|
3122
|
-
----
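
The merge-with-defaults behavior described in the comments above can be sketched in plain Ruby. This is an assumption-laden illustration, not Canon's implementation: `DEFAULT_MAP` holds only a small subset of the symbols from the tables, and `merged_map` and `visualize` are hypothetical helpers.

```ruby
# Illustrative only: not part of Canon's API.
# A small subset of the default symbols listed in the tables.
DEFAULT_MAP = {
  " "      => "░", # regular space
  "\t"     => "⇥", # tab
  "\u00A0" => "␣", # non-breaking space
  "\u200B" => "→"  # zero-width space
}.freeze

# User overrides win; characters not overridden keep their default symbol.
def merged_map(overrides)
  DEFAULT_MAP.merge(overrides)
end

# Replace every mapped character with its visualization symbol.
def visualize(text, map = DEFAULT_MAP)
  text.gsub(Regexp.union(map.keys), map)
end

custom = merged_map("\t" => "→")
puts visualize("a\tb c", custom) # tab uses the override, space the default
```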
|
|
3123
|
-
|
|
3124
|
-
====== Visualization in action
|
|
3125
|
-
|
|
3126
|
-
.Whitespace visualization examples
|
|
3127
|
-
[example]
|
|
3128
|
-
[source]
|
|
3129
|
-
----
|
|
3130
|
-
# Space added between tags
|
|
3131
|
-
10| -| <tag>Value</tag> # No space
|
|
3132
|
-
| 10+| <tag>░Value</tag> # Space added (green light shade)
|
|
3133
|
-
|
|
3134
|
-
# Tab character
|
|
3135
|
-
15| -| <tag>⇥Value</tag> # Tab (red arrow-to-bar)
|
|
3136
|
-
| 15+| <tag>░░Value</tag> # Two spaces (green light shades)
|
|
3137
|
-
|
|
3138
|
-
# Non-breaking space (U+00A0)
|
|
3139
|
-
20| -| <tag>Value░</tag> # Regular space (red light shade)
|
|
3140
|
-
| 20+| <tag>Value␣</tag> # Non-breaking space (green open box)
|
|
3141
|
-
|
|
3142
|
-
# Zero-width space (U+200B)
|
|
3143
|
-
25| -| <word1><word2> # No zero-width space
|
|
3144
|
-
| 25+| <word1>→<word2> # Zero-width space (green arrow)
|
|
3145
|
-
|
|
3146
|
-
# Mixed invisible characters
|
|
3147
|
-
30| -| <p>Text▬more</p> # Em space (red black rectangle)
|
|
3148
|
-
| 30+| <p>Text░more</p> # Regular space (green light shade)
|
|
3149
|
-
----
|
|
3150
|
-
|
|
3151
|
-
Where visualization symbols appear in:
|
|
3152
|
-
|
|
3153
|
-
* Red when showing removed/deleted characters
|
|
3154
|
-
* Green when showing added/inserted characters
|
|
3155
|
-
* Bold to make them more visible
|
|
3156
|
-
|
|
3157
|
-
**When is this useful?**
|
|
3158
|
-
|
|
3159
|
-
1. **Test failures due to formatting**: Your test expects compact XML but receives
|
|
3160
|
-
pretty-printed XML with different indentation
|
|
3161
|
-
|
|
3162
|
-
2. **Mixed whitespace**: Some parts of your code use tabs while others use spaces
|
|
3163
|
-
|
|
3164
|
-
3. **Non-breaking spaces**: Copy-pasted content from browsers often contains
|
|
3165
|
-
U+00A0 instead of regular spaces
|
|
3166
|
-
|
|
3167
|
-
4. **Zero-width characters**: Invisible Unicode characters that cause mysterious
|
|
3168
|
-
comparison failures
|
|
3169
|
-
|
|
3170
|
-
5. **RTL/LTR markers**: Bidirectional text markers in internationalized content
|
|
3171
|
-
|
|
3172
|
-
6. **Template differences**: Generated output has invisible character differences
|
|
3173
|
-
|
|
3174
|
-
.Real-world example: Non-breaking space from web copy-paste
|
|
3175
|
-
[example]
|
|
3176
|
-
Without whitespace visualization, these two lines look identical:
|
|
3177
|
-
|
|
3178
|
-
[source,xml]
|
|
3179
|
-
----
|
|
3180
|
-
<foreword id="fwd">
|
|
3181
|
-
<foreword id="fwd">
|
|
3182
|
-
----
|
|
3183
|
-
|
|
3184
|
-
With whitespace visualization enabled, the difference is immediately visible:
|
|
3185
|
-
|
|
3186
|
-
[source]
|
|
3187
|
-
----
|
|
3188
|
-
4| -| <foreword░id="fwd"> # Regular space (U+0020)
|
|
3189
|
-
| 4+| <foreword␣id="fwd"> # Non-breaking space (U+00A0)
|
|
3190
|
-
----
|
|
3191
|
-
|
|
3192
|
-
The different symbols (`░` vs `␣`) clearly show that one uses a regular space
|
|
3193
|
-
while the other uses a non-breaking space, likely from copying text from a web
|
|
3194
|
-
page or word processor.
|
|
3195
|
-
|
|
3196
|
-
.Real-world example: Zero-width characters
|
|
3197
|
-
[example]
|
|
3198
|
-
Zero-width characters are completely invisible but affect comparison:
|
|
3199
|
-
|
|
3200
|
-
[source,xml]
|
|
3201
|
-
----
|
|
3202
|
-
<item>Widget</item>
|
|
3203
|
-
<item>Widget</item> <!-- Contains U+200B zero-width space after "Widget" -->
|
|
3204
|
-
----
|
|
3205
|
-
|
|
3206
|
-
The diff shows:
|
|
3207
|
-
|
|
3208
|
-
[source]
|
|
3209
|
-
----
|
|
3210
|
-
5| -| <item>Widget</item>
|
|
3211
|
-
| 5+| <item>Widget→</item> # Zero-width space visualized as →
|
|
3212
|
-
----
|
|
3213
|
-
|
|
3214
|
-
The rightwards arrow (`→`) reveals the presence of a zero-width space that would
|
|
3215
|
-
otherwise be impossible to detect.
|
|
3216
|
-
|
|
3217
|
-
===== Non-ASCII character detection
|
|
3218
|
-
|
|
3219
|
-
**Purpose**: Alert users when diffs contain non-ASCII characters that might cause
|
|
3220
|
-
unexpected comparison failures or encoding issues.
|
|
3221
|
-
|
|
3222
|
-
When Canon detects non-ASCII characters (any character with Unicode codepoint >
|
|
3223
|
-
U+007F) in a diff block, it displays a yellow warning with the specific
|
|
3224
|
-
characters and their Unicode codepoints.
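
The detection rule (any codepoint above U+007F) can be sketched in a few lines of Ruby. The helpers `non_ascii_chars` and `codepoint_label` are hypothetical names used for illustration and are not Canon's API.

```ruby
# Illustrative only: not part of Canon's API.
# Collect each distinct character whose codepoint is above U+007F.
def non_ascii_chars(text)
  text.each_char.select { |ch| ch.ord > 0x7F }.uniq
end

# Format a character's codepoint in the usual U+XXXX notation.
def codepoint_label(char)
  format("U+%04X", char.ord)
end

sample = "Hello\u00A0world \u2014 done"
non_ascii_chars(sample).each do |ch|
  puts "#{ch.inspect} (#{codepoint_label(ch)})"
end
```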
|
|
3225
|
-
|
|
3226
|
-
.Non-ASCII warning format
|
|
3227
|
-
[example]
|
|
3228
|
-
[source]
|
|
3229
|
-
----
|
|
3230
|
-
Context block has 3 diffs (lines 10-25):
|
|
3231
|
-
(WARNING: non-ASCII characters detected in diff: [' ' (U+00A0, shown as: ␣), '—' (U+2014, shown as: —)])
|
|
3232
|
-
|
|
3233
|
-
10| -| <p>Hello░world</p>
|
|
3234
|
-
| 10+| <p>Hello␣world</p> # Contains non-breaking space (U+00A0)
|
|
3235
|
-
15| -| <p>Text - more text</p>
|
|
3236
|
-
| 15+| <p>Text — more text</p> # Contains em dash (U+2014)
|
|
3237
|
-
----
|
|
3238
|
-
|
|
3239
|
-
The warning appears immediately after the "Context block has X diffs" header.
|
|
3240
|
-
|
|
3241
|
-
**Common non-ASCII characters in diffs**:
|
|
3242
|
-
|
|
3243
|
-
|===
|
|
3244
|
-
|Character |Unicode |Name |Common source
|
|
3245
|
-
|
|
3246
|
-
|` ` (looks like space)
|
|
3247
|
-
|U+00A0
|
|
3248
|
-
|Non-breaking space
|
|
3249
|
-
|Copy-paste from web browsers, word processors
|
|
3250
|
-
|
|
3251
|
-
|`—`
|
|
3252
|
-
|U+2014
|
|
3253
|
-
|Em dash
|
|
3254
|
-
|Word processors, smart quotes enabled
|
|
3255
|
-
|
|
3256
|
-
|`–`
|
|
3257
|
-
|U+2013
|
|
3258
|
-
|En dash
|
|
3259
|
-
|Word processors, smart quotes enabled
|
|
3260
|
-
|
|
3261
|
-
|`‘` `’`
|
|
3262
|
-
|U+2018, U+2019
|
|
3263
|
-
|Smart single quotes
|
|
3264
|
-
|Word processors, text editors with smart quotes
|
|
3265
|
-
|
|
3266
|
-
|`“` `”`
|
|
3267
|
-
|U+201C, U+201D
|
|
3268
|
-
|Smart double quotes
|
|
3269
|
-
|Word processors, text editors with smart quotes
|
|
3270
|
-
|
|
3271
|
-
|`…`
|
|
3272
|
-
|U+2026
|
|
3273
|
-
|Ellipsis
|
|
3274
|
-
|Word processors
|
|
3275
|
-
|
|
3276
|
-
|Various
|
|
3277
|
-
|U+2000-U+200B
|
|
3278
|
-
|Various spaces
|
|
3279
|
-
|HTML entities, special formatting
|
|
3280
|
-
|===
|
|
3281
|
-
|
|
3282
|
-
**Why this matters**:
|
|
3283
|
-
|
|
3284
|
-
1. **Invisible differences**: Many non-ASCII characters look identical to their
|
|
3285
|
-
ASCII equivalents but cause comparison failures
|
|
3286
|
-
|
|
3287
|
-
2. **Encoding issues**: Non-ASCII characters may behave differently across
|
|
3288
|
-
systems with different encodings
|
|
3289
|
-
|
|
3290
|
-
3. **Copy-paste errors**: Content copied from browsers or documents often
|
|
3291
|
-
includes non-breaking spaces instead of regular spaces
|
|
3292
|
-
|
|
3293
|
-
4. **Smart quotes**: Text editors may automatically convert straight quotes to
|
|
3294
|
-
curly quotes
|
|
3295
|
-
|
|
3296
|
-
.Practical example
|
|
3297
|
-
[example]
|
|
3298
|
-
A test fails because the expected output was copied from a web page:
|
|
3299
|
-
|
|
3300
|
-
[source,ruby]
|
|
3301
|
-
----
|
|
3302
|
-
# Expected (copied from documentation website - contains U+00A0)
|
|
3303
|
-
expected = '<p>Hello world</p>' # Space between "Hello" and "world" is U+00A0
|
|
3304
|
-
|
|
3305
|
-
# Actual (generated by code - contains regular space)
|
|
3306
|
-
actual = '<p>Hello world</p>' # Space is U+0020
|
|
3307
|
-
|
|
3308
|
-
expect(actual).to be_xml_equivalent_to(expected)
|
|
3309
|
-
# FAILS: Documents appear identical but contain different space characters
|
|
3310
|
-
----
|
|
3311
|
-
|
|
3312
|
-
Canon's diff output shows:
|
|
3313
|
-
|
|
3314
|
-
[source]
|
|
3315
|
-
----
|
|
3316
|
-
Context block has 1 diff (line 1):
|
|
3317
|
-
(WARNING: non-ASCII characters detected in diff: [' ' (U+00A0)])
|
|
3318
|
-
|
|
3319
|
-
1| -| <p>Hello░world</p> # U+0020 (regular space, shown as ░)
|
|
3320
|
-
| 1+| <p>Hello␣world</p> # U+00A0 (non-breaking space, shown as ␣)
|
|
3321
|
-
----
|
|
3322
|
-
|
|
3323
|
-
The warning alerts you to check for non-breaking spaces, and the visualization
|
|
3324
|
-
symbols in the diff lines show where the difference occurs.
|
|
3325
|
-
|
|
3326
|
-
===== Configuration and usage
|
|
3327
|
-
|
|
3328
|
-
All enhanced diff features are enabled by default when `use_color` is `true` and
|
|
3329
|
-
automatically applied across all Canon interfaces:
|
|
3330
|
-
|
|
3331
|
-
[source,ruby]
|
|
3332
|
-
----
|
|
3333
|
-
# RSpec matchers (automatically enabled)
|
|
3334
|
-
expect(xml1).to be_xml_equivalent_to(xml2)
|
|
3335
|
-
# Output includes: colored line numbers, whitespace visualization, non-ASCII warnings
|
|
3336
|
-
|
|
3337
|
-
# CLI (enabled by default)
|
|
3338
|
-
$ canon diff file1.xml file2.xml --verbose
|
|
3339
|
-
# Output includes all enhanced features
|
|
3340
|
-
|
|
3341
|
-
# Ruby API (controlled by use_color parameter)
|
|
3342
|
-
formatter = Canon::DiffFormatter.new(use_color: true) # Enhanced features enabled
|
|
3343
|
-
formatter = Canon::DiffFormatter.new(use_color: false) # Plain text only
|
|
3344
|
-
----
|
|
3345
|
-
|
|
3346
|
-
To disable colored output (and all color-dependent enhancements):
|
|
3347
|
-
|
|
3348
|
-
[source,ruby]
|
|
3349
|
-
----
|
|
3350
|
-
# RSpec
|
|
3351
|
-
Canon::RspecMatchers.use_color = false
|
|
3352
|
-
|
|
3353
|
-
# CLI
|
|
3354
|
-
$ canon diff file1.xml file2.xml --no-color --verbose
|
|
3355
|
-
|
|
3356
|
-
# Ruby API
|
|
3357
|
-
formatter = Canon::DiffFormatter.new(use_color: false)
|
|
3358
|
-
----
|
|
3359
|
-
|
|
3360
|
-
When `use_color` is `false`:
|
|
3361
|
-
|
|
3362
|
-
* Line numbers and pipes are plain text
|
|
3363
|
-
* Whitespace is not visualized (remains invisible)
|
|
3364
|
-
* Non-ASCII warnings are still shown (but without yellow color)
|
|
3365
|
-
* Content changes are shown without color highlighting
|
|
3366
|
-
|
|
3367
|
-
|
|
3368
|
-
== Development
|
|
3369
|
-
|
|
3370
|
-
After checking out the repo, run `bin/setup` to install dependencies. Then, run
|
|
3371
|
-
`rake spec` to run the tests. You can also run `bin/console` for an interactive
|
|
3372
|
-
prompt that will allow you to experiment.
|
|
3373
|
-
|
|
3374
|
-
|
|
3375
|
-
== Contributing
|
|
3376
|
-
|
|
3377
|
-
Bug reports and pull requests are welcome on GitHub at
|
|
3378
|
-
https://github.com/lutaml/canon.
|
|
3379
|
-
|
|
3380
|
-
|
|
3381
|
-
== Copyright and license
|
|
3382
|
-
|
|
3383
|
-
Copyright Ribose.
|
|
3384
|
-
https://opensource.org/licenses/BSD-2-Clause[BSD-2-Clause License].
|