canon 0.1.8 → 0.1.9
- checksums.yaml +4 -4
- data/.rubocop_todo.yml +112 -25
- data/docs/Gemfile +1 -0
- data/docs/_config.yml +90 -1
- data/docs/advanced/diff-classification.adoc +82 -2
- data/docs/features/match-options/index.adoc +239 -1
- data/lib/canon/comparison/format_detector.rb +2 -1
- data/lib/canon/comparison/html_comparator.rb +19 -8
- data/lib/canon/comparison/html_compare_profile.rb +8 -2
- data/lib/canon/comparison/match_options/base_resolver.rb +7 -0
- data/lib/canon/comparison/whitespace_sensitivity.rb +208 -0
- data/lib/canon/comparison/xml_comparator/child_comparison.rb +15 -7
- data/lib/canon/comparison/xml_comparator/node_parser.rb +10 -5
- data/lib/canon/comparison/xml_comparator/node_type_comparator.rb +14 -7
- data/lib/canon/comparison/xml_comparator.rb +48 -23
- data/lib/canon/comparison/xml_node_comparison.rb +25 -3
- data/lib/canon/diff/diff_classifier.rb +101 -2
- data/lib/canon/diff/formatting_detector.rb +1 -1
- data/lib/canon/rspec_matchers.rb +37 -8
- data/lib/canon/version.rb +1 -1
- data/lib/canon/xml/data_model.rb +24 -13
- metadata +3 -78
- data/docs/plans/2025-01-17-html-parser-selection-fix.adoc +0 -250
- data/false_positive_analysis.txt +0 -0
- data/file1.html +0 -1
- data/file2.html +0 -1
- data/old-docs/ADVANCED_TOPICS.adoc +0 -20
- data/old-docs/BASIC_USAGE.adoc +0 -16
- data/old-docs/CHARACTER_VISUALIZATION.adoc +0 -567
- data/old-docs/CLI.adoc +0 -497
- data/old-docs/CUSTOMIZING_BEHAVIOR.adoc +0 -19
- data/old-docs/DIFF_ARCHITECTURE.adoc +0 -435
- data/old-docs/DIFF_FORMATTING.adoc +0 -540
- data/old-docs/DIFF_PARAMETERS.adoc +0 -261
- data/old-docs/DOM_DIFF.adoc +0 -1017
- data/old-docs/ENV_CONFIG.adoc +0 -876
- data/old-docs/FORMATS.adoc +0 -867
- data/old-docs/INPUT_VALIDATION.adoc +0 -477
- data/old-docs/MATCHER_BEHAVIOR.adoc +0 -90
- data/old-docs/MATCH_ARCHITECTURE.adoc +0 -463
- data/old-docs/MATCH_OPTIONS.adoc +0 -912
- data/old-docs/MODES.adoc +0 -432
- data/old-docs/NORMATIVE_INFORMATIVE_DIFFS.adoc +0 -219
- data/old-docs/OPTIONS.adoc +0 -1387
- data/old-docs/PREPROCESSING.adoc +0 -491
- data/old-docs/README.old.adoc +0 -2831
- data/old-docs/RSPEC.adoc +0 -814
- data/old-docs/RUBY_API.adoc +0 -485
- data/old-docs/SEMANTIC_DIFF_REPORT.adoc +0 -646
- data/old-docs/SEMANTIC_TREE_DIFF.adoc +0 -765
- data/old-docs/STRING_COMPARE.adoc +0 -345
- data/old-docs/TMP.adoc +0 -3384
- data/old-docs/TREE_DIFF.adoc +0 -1080
- data/old-docs/UNDERSTANDING_CANON.adoc +0 -17
- data/old-docs/VERBOSE.adoc +0 -482
- data/old-docs/VISUALIZATION_MAP.adoc +0 -625
- data/old-docs/WHITESPACE_TREATMENT.adoc +0 -1155
- data/scripts/analyze_current_state.rb +0 -85
- data/scripts/analyze_false_positives.rb +0 -114
- data/scripts/analyze_remaining_failures.rb +0 -105
- data/scripts/compare_current_failures.rb +0 -95
- data/scripts/compare_dom_tree_diff.rb +0 -158
- data/scripts/compare_failures.rb +0 -151
- data/scripts/debug_attribute_extraction.rb +0 -66
- data/scripts/debug_blocks_839.rb +0 -115
- data/scripts/debug_meta_matching.rb +0 -52
- data/scripts/debug_p_matching.rb +0 -192
- data/scripts/debug_signature_matching.rb +0 -118
- data/scripts/debug_sourcecode_124.rb +0 -32
- data/scripts/debug_whitespace_sensitive.rb +0 -192
- data/scripts/extract_false_positives.rb +0 -138
- data/scripts/find_actual_false_positives.rb +0 -125
- data/scripts/investigate_all_false_positives.rb +0 -161
- data/scripts/investigate_batch1.rb +0 -127
- data/scripts/investigate_classification.rb +0 -150
- data/scripts/investigate_classification_detailed.rb +0 -190
- data/scripts/investigate_common_failures.rb +0 -342
- data/scripts/investigate_false_negative.rb +0 -80
- data/scripts/investigate_false_positive.rb +0 -83
- data/scripts/investigate_false_positives.rb +0 -227
- data/scripts/investigate_false_positives_batch.rb +0 -163
- data/scripts/investigate_mixed_content.rb +0 -125
- data/scripts/investigate_remaining_16.rb +0 -214
- data/scripts/run_single_test.rb +0 -29
- data/scripts/test_all_false_positives.rb +0 -95
- data/scripts/test_attribute_details.rb +0 -61
- data/scripts/test_both_algorithms.rb +0 -49
- data/scripts/test_both_simple.rb +0 -49
- data/scripts/test_enhanced_semantic_output.rb +0 -125
- data/scripts/test_readme_examples.rb +0 -131
- data/scripts/test_semantic_tree_diff.rb +0 -99
- data/scripts/test_semantic_ux_improvements.rb +0 -135
- data/scripts/test_single_false_positive.rb +0 -119
- data/scripts/test_size_limits.rb +0 -99
- data/test_html_1.html +0 -21
- data/test_html_2.html +0 -21
- data/test_nokogiri.rb +0 -33
- data/test_normalize.rb +0 -45
data/old-docs/README.old.adoc
DELETED
|
@@ -1,2831 +0,0 @@
|
|
|
1
|
-
= Canon: Canonicalization for serialization formats
|
|
2
|
-
|
|
3
|
-
Canon allows you to format, canonicalize, or compare various serialization
|
|
4
|
-
formats, DOM-based (XML, HTML) or object-based (JSON, YAML).
|
|
5
|
-
|
|
6
|
-
Its main features:
|
|
7
|
-
|
|
8
|
-
* Canonicalization and pretty-printing for XML, HTML, JSON, and YAML
|
|
9
|
-
* Comparison of XML, HTML, JSON, and YAML documents
|
|
10
|
-
|
|
11
|
-
|
|
12
|
-
== Purpose
|
|
13
|
-
|
|
14
|
-
Canon provides canonicalization and pretty-printing for various serialization
|
|
15
|
-
formats (XML, HTML, JSON, YAML), producing standardized forms suitable for
|
|
16
|
-
comparison, testing, digital signatures, and human-readable output.
|
|
17
|
-
|
|
18
|
-
|
|
19
|
-
== Architecture
|
|
20
|
-
|
|
21
|
-
Canon follows an **orchestrator pattern** with MECE (Mutually Exclusive,
|
|
22
|
-
Collectively Exhaustive) principles for clean separation of concerns.
|
|
23
|
-
|
|
24
|
-
=== Comparison module
|
|
25
|
-
|
|
26
|
-
The `Canon::Comparison` module (123 lines) acts as a pure orchestrator that:
|
|
27
|
-
|
|
28
|
-
* Detects input format (XML, HTML, JSON, YAML)
|
|
29
|
-
* Validates format compatibility
|
|
30
|
-
* Delegates to format-specific comparator classes
|
|
31
|
-
|
|
32
|
-
Format-specific comparators:
|
|
33
|
-
|
|
34
|
-
* `Canon::Comparison::XmlComparator` - XML semantic comparison
|
|
35
|
-
* `Canon::Comparison::HtmlComparator` - HTML semantic comparison
|
|
36
|
-
* `Canon::Comparison::JsonComparator` - JSON/Ruby object comparison
|
|
37
|
-
* `Canon::Comparison::YamlComparator` - YAML comparison (delegates to JsonComparator)
|
|
38
|
-
|
|
39
|
-
Each comparator is self-contained and handles all comparison logic for its format.
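The orchestrator role described above (detect the format, validate compatibility, delegate) can be sketched in a few lines. This is only an illustration under stated assumptions: `COMPARATORS`, `detect_format`, and this `equivalent?` are hypothetical stand-ins, not Canon's classes, and the format detection is deliberately crude.

```ruby
require "json"
require "yaml"

# Hypothetical stand-ins for format-specific comparators (NOT Canon's classes).
COMPARATORS = {
  json: ->(a, b) { JSON.parse(a) == JSON.parse(b) },
  yaml: ->(a, b) { YAML.safe_load(a) == YAML.safe_load(b) },
}.freeze

# Crude detection, assumed for illustration: JSON documents start with { or [.
def detect_format(content)
  content.lstrip.start_with?("{", "[") ? :json : :yaml
end

# The orchestrator holds no comparison logic of its own: it detects both
# formats, rejects mismatches, and delegates to the matching comparator.
def equivalent?(doc1, doc2)
  format1 = detect_format(doc1)
  format2 = detect_format(doc2)
  unless format1 == format2
    raise ArgumentError, "format mismatch: #{format1} vs #{format2}"
  end

  COMPARATORS.fetch(format1).call(doc1, doc2)
end
```

Keeping the dispatch table separate from the comparators means a new format only needs a new entry, which is the separation of concerns the section describes.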
|
|
40
|
-
|
|
41
|
-
=== DiffFormatter module
|
|
42
|
-
|
|
43
|
-
The `Canon::DiffFormatter` class (171 lines) acts as a pure orchestrator that:
|
|
44
|
-
|
|
45
|
-
* Manages diff options (colors, visualization, context)
|
|
46
|
-
* Detects diff mode (by-object vs by-line)
|
|
47
|
-
* Delegates to mode-specific and format-specific formatters
|
|
48
|
-
|
|
49
|
-
Two diff modes:
|
|
50
|
-
|
|
51
|
-
**By-object mode** (tree-based semantic diff):
|
|
52
|
-
|
|
53
|
-
* `Canon::DiffFormatter::ByObject::BaseFormatter` - Factory and common logic
|
|
54
|
-
* `Canon::DiffFormatter::ByObject::XmlFormatter` - XML DOM differences
|
|
55
|
-
* `Canon::DiffFormatter::ByObject::JsonFormatter` - Ruby object differences
|
|
56
|
-
* `Canon::DiffFormatter::ByObject::YamlFormatter` - YAML differences
|
|
57
|
-
|
|
58
|
-
**By-line mode** (line-based diff):
|
|
59
|
-
|
|
60
|
-
* `Canon::DiffFormatter::ByLine::BaseFormatter` - LCS algorithm and factory
|
|
61
|
-
* `Canon::DiffFormatter::ByLine::XmlFormatter` - DOM-guided XML line diff
|
|
62
|
-
* `Canon::DiffFormatter::ByLine::JsonFormatter` - Semantic JSON line diff
|
|
63
|
-
* `Canon::DiffFormatter::ByLine::YamlFormatter` - Semantic YAML line diff
|
|
64
|
-
* `Canon::DiffFormatter::ByLine::SimpleFormatter` - Fallback line diff
|
|
65
|
-
|
|
66
|
-
Each formatter handles format-specific intelligence (DOM parsing, token
|
|
67
|
-
highlighting, semantic understanding).
|
|
68
|
-
|
|
69
|
-
=== Object-oriented diff foundation
|
|
70
|
-
|
|
71
|
-
Canon uses three foundational classes for managing diff data:
|
|
72
|
-
|
|
73
|
-
* `Canon::Diff::DiffBlock` - Represents a contiguous block of changes
|
|
74
|
-
* `Canon::Diff::DiffContext` - Groups diff blocks with surrounding context
|
|
75
|
-
* `Canon::Diff::DiffReport` - Top-level container for complete diff results
|
|
76
|
-
|
|
77
|
-
These classes ensure MECE compliance by providing clear ownership of diff data
|
|
78
|
-
at different granularity levels.
|
|
79
|
-
|
|
80
|
-
|
|
81
|
-
== Features
|
|
82
|
-
|
|
83
|
-
=== Ruby API
|
|
84
|
-
|
|
85
|
-
Single API for working with all four formats (XML, HTML, JSON, YAML).
|
|
86
|
-
|
|
87
|
-
|
|
88
|
-
=== XML canonicalization
|
|
89
|
-
|
|
90
|
-
Format XML documents according to the
|
|
91
|
-
https://www.w3.org/TR/xml-c14n11/[W3C Canonical XML Version 1.1] specification.
|
|
92
|
-
|
|
93
|
-
Key features:
|
|
94
|
-
|
|
95
|
-
* Namespace declaration ordering (lexicographic by prefix)
|
|
96
|
-
* Attribute ordering (lexicographic by namespace URI, then local name)
|
|
97
|
-
* Character encoding normalization to UTF-8
|
|
98
|
-
* Special character encoding in text and attributes
|
|
99
|
-
* Removal of superfluous namespace declarations
|
|
100
|
-
* Support for xml:base, xml:lang, xml:space, and xml:id attributes
|
|
101
|
-
* Processing instruction and comment handling
|
|
102
|
-
* Document subset support with attribute inheritance
|
|
103
|
-
|
|
104
|
-
=== HTML canonicalization
|
|
105
|
-
|
|
106
|
-
Format HTML 4/5 and XHTML documents with consistent formatting. Automatically
|
|
107
|
-
detects HTML vs XHTML and applies appropriate formatting.
|
|
108
|
-
|
|
109
|
-
=== YAML canonicalization
|
|
110
|
-
|
|
111
|
-
Format YAML documents with keys sorted alphabetically at all levels of the
|
|
112
|
-
structure.
|
|
113
|
-
|
|
114
|
-
=== JSON canonicalization
|
|
115
|
-
|
|
116
|
-
Format JSON documents with keys sorted alphabetically at all levels of the
|
|
117
|
-
structure.
|
|
118
|
-
|
|
119
|
-
|
|
120
|
-
=== Output modes
|
|
121
|
-
|
|
122
|
-
Canon supports two output modes for all formats:
|
|
123
|
-
|
|
124
|
-
`c14n` (canonical):: Compact output without indentation, suitable for digital
|
|
125
|
-
signatures, hashing, and equivalence testing. Removes formatting whitespace.
|
|
126
|
-
|
|
127
|
-
`pretty` (pretty-print):: Human-readable output with consistent indentation.
|
|
128
|
-
Configurable indent size and type (spaces or tabs). This is the default mode for
|
|
129
|
-
CLI commands.
|
|
130
|
-
|
|
131
|
-
|
|
132
|
-
=== RSpec matchers
|
|
133
|
-
|
|
134
|
-
Provides matchers for testing equivalence between serialized formats.
|
|
135
|
-
|
|
136
|
-
NOTE: RSpec matchers always use canonical (c14n) mode for comparison to ensure
|
|
137
|
-
formatting differences don't affect test results.
|
|
138
|
-
|
|
139
|
-
=== Comparison API
|
|
140
|
-
|
|
141
|
-
Canon provides a `Canon::Comparison` module for semantic comparison of HTML and
|
|
142
|
-
XML documents.
|
|
143
|
-
|
|
144
|
-
The `Canon::Comparison.equivalent?` method compares two documents for semantic
|
|
145
|
-
equivalence, ignoring formatting differences that don't affect meaning.
|
|
146
|
-
|
|
147
|
-
Key features:
|
|
148
|
-
|
|
149
|
-
* Semantic comparison (content, not formatting)
|
|
150
|
-
* Whitespace normalization
|
|
151
|
-
* Comment handling (can ignore or include)
|
|
152
|
-
* Attribute sorting
|
|
153
|
-
* Support for both HTML and XML documents
|
|
154
|
-
* Optional verbose diff output
|
|
155
|
-
|
|
156
|
-
NOTE: `Canon::Comparison.equivalent?` adopts option names used by the excellent
|
|
157
|
-
https://github.com/vkononov/compare-xml[`compare-xml` gem].
|
|
158
|
-
|
|
159
|
-
|
|
160
|
-
== Installation
|
|
161
|
-
|
|
162
|
-
Add this line to your application's Gemfile:
|
|
163
|
-
|
|
164
|
-
[source,ruby]
|
|
165
|
-
----
|
|
166
|
-
gem 'canon'
|
|
167
|
-
----
|
|
168
|
-
|
|
169
|
-
And then execute:
|
|
170
|
-
|
|
171
|
-
[source,bash]
|
|
172
|
-
----
|
|
173
|
-
$ bundle install
|
|
174
|
-
----
|
|
175
|
-
|
|
176
|
-
Or install it yourself as:
|
|
177
|
-
|
|
178
|
-
[source,bash]
|
|
179
|
-
----
|
|
180
|
-
$ gem install canon
|
|
181
|
-
----
|
|
182
|
-
|
|
183
|
-
|
|
184
|
-
== Usage
|
|
185
|
-
|
|
186
|
-
=== Ruby API
|
|
187
|
-
|
|
188
|
-
==== Basic formatting (c14n mode)
|
|
189
|
-
|
|
190
|
-
The `Canon.format` method produces canonical output by default.
|
|
191
|
-
|
|
192
|
-
Syntax:
|
|
193
|
-
|
|
194
|
-
[source,ruby]
|
|
195
|
-
----
|
|
196
|
-
Canon.format({content}, {format})
|
|
197
|
-
Canon.format_{format}({content}) # Format-specific shorthand
|
|
198
|
-
----
|
|
199
|
-
|
|
200
|
-
Where,
|
|
201
|
-
|
|
202
|
-
`{content}`:: The input string
|
|
203
|
-
`{format}`:: The format type (`:xml`, `:html`, `:json`, or `:yaml`)
|
|
204
|
-
|
|
205
|
-
.Canonical formatting examples
|
|
206
|
-
[example]
|
|
207
|
-
====
|
|
208
|
-
[source,ruby]
|
|
209
|
-
----
|
|
210
|
-
require 'canon'
|
|
211
|
-
|
|
212
|
-
# XML - compact canonical form
|
|
213
|
-
xml = '<root><b>2</b><a>1</a></root>'
|
|
214
|
-
Canon.format(xml, :xml)
|
|
215
|
-
# => "<root><a>1</a><b>2</b></root>"
|
|
216
|
-
|
|
217
|
-
Canon.format_xml(xml) # Shorthand
|
|
218
|
-
# => "<root><a>1</a><b>2</b></root>"
|
|
219
|
-
|
|
220
|
-
# HTML - compact canonical form
|
|
221
|
-
html = '<div><p>Hello</p></div>'
|
|
222
|
-
Canon.format(html, :html)
|
|
223
|
-
Canon.format_html(html) # Shorthand
|
|
224
|
-
|
|
225
|
-
# JSON - canonical with sorted keys
|
|
226
|
-
json = '{"z":3,"a":1,"b":2}'
|
|
227
|
-
Canon.format(json, :json)
|
|
228
|
-
# => {"a":1,"b":2,"z":3}
|
|
229
|
-
|
|
230
|
-
# YAML - canonical with sorted keys
|
|
231
|
-
yaml = "z: 3\na: 1\nb: 2"
|
|
232
|
-
Canon.format(yaml, :yaml)
|
|
233
|
-
----
|
|
234
|
-
====
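To make "keys sorted alphabetically at all levels" concrete, here is a minimal sketch using only Ruby's standard `json` library. The helpers `sort_keys_deep` and `canonical_json` are hypothetical names introduced for illustration; this shows the idea, not how Canon implements it.

```ruby
require "json"

# Recursively rebuild hashes with their keys in sorted order.
# Arrays keep their element order, since order is significant there.
def sort_keys_deep(value)
  case value
  when Hash
    value.keys.sort.each_with_object({}) do |key, sorted|
      sorted[key] = sort_keys_deep(value[key])
    end
  when Array
    value.map { |item| sort_keys_deep(item) }
  else
    value
  end
end

# Compact output with sorted keys (hypothetical helper, not Canon's API).
def canonical_json(source)
  JSON.generate(sort_keys_deep(JSON.parse(source)))
end

canonical_json('{"z":3,"a":{"d":4,"c":5},"b":2}')
# => "{\"a\":{\"c\":5,\"d\":4},\"b\":2,\"z\":3}"
```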
|
|
235
|
-
|
|
236
|
-
==== Pretty-print mode
|
|
237
|
-
|
|
238
|
-
For human-readable output with indentation, use the format-specific pretty
|
|
239
|
-
printer classes.
|
|
240
|
-
|
|
241
|
-
Syntax:
|
|
242
|
-
|
|
243
|
-
[source,ruby]
|
|
244
|
-
----
|
|
245
|
-
Canon::{Format}::PrettyPrinter.new(indent: {n}, indent_type: {type}).format({content})
|
|
246
|
-
----
|
|
247
|
-
|
|
248
|
-
Where,
|
|
249
|
-
|
|
250
|
-
`{Format}`:: The format module (`Xml`, `Html`, `Json`)
|
|
251
|
-
`{n}`:: Indent width: number of spaces (default: 2), or number of tabs when `{type}` is `'tab'` (use 1 for tabs)
|
|
252
|
-
`{type}`:: Indentation type: `'space'` (default) or `'tab'`
|
|
253
|
-
`{content}`:: The input string
|
|
254
|
-
|
|
255
|
-
.Pretty-print examples
|
|
256
|
-
[example]
|
|
257
|
-
====
|
|
258
|
-
[source,ruby]
|
|
259
|
-
----
|
|
260
|
-
require 'canon/xml/pretty_printer'
|
|
261
|
-
require 'canon/html/pretty_printer'
|
|
262
|
-
require 'canon/json/pretty_printer'
|
|
263
|
-
|
|
264
|
-
xml_input = '<root><b>2</b><a>1</a></root>'
|
|
265
|
-
|
|
266
|
-
# XML with 2-space indentation (default)
|
|
267
|
-
Canon::Xml::PrettyPrinter.new(indent: 2).format(xml_input)
|
|
268
|
-
# =>
|
|
269
|
-
# <?xml version="1.0" encoding="UTF-8"?>
|
|
270
|
-
# <root>
|
|
271
|
-
# <a>1</a>
|
|
272
|
-
# <b>2</b>
|
|
273
|
-
# </root>
|
|
274
|
-
|
|
275
|
-
# XML with 4-space indentation
|
|
276
|
-
Canon::Xml::PrettyPrinter.new(indent: 4).format(xml_input)
|
|
277
|
-
|
|
278
|
-
# XML with tab indentation
|
|
279
|
-
Canon::Xml::PrettyPrinter.new(
|
|
280
|
-
indent: 1,
|
|
281
|
-
indent_type: 'tab'
|
|
282
|
-
).format(xml_input)
|
|
283
|
-
|
|
284
|
-
# HTML with 2-space indentation
|
|
285
|
-
html_input = '<div><p>Hello</p></div>'
|
|
286
|
-
Canon::Html::PrettyPrinter.new(indent: 2).format(html_input)
|
|
287
|
-
|
|
288
|
-
# JSON with 2-space indentation
|
|
289
|
-
json_input = '{"z":3,"a":{"b":1}}'
|
|
290
|
-
Canon::Json::PrettyPrinter.new(indent: 2).format(json_input)
|
|
291
|
-
|
|
292
|
-
# JSON with tab indentation
|
|
293
|
-
Canon::Json::PrettyPrinter.new(
|
|
294
|
-
indent: 1,
|
|
295
|
-
indent_type: 'tab'
|
|
296
|
-
).format(json_input)
|
|
297
|
-
----
|
|
298
|
-
====
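How the `indent` and `indent_type` parameters combine can be sketched with the standard library's `JSON.pretty_generate`, which accepts an `indent:` string. The helpers `indent_unit` and `pretty_json` below are assumptions made for illustration and are not part of Canon.

```ruby
require "json"

# Hypothetical helper: build the string emitted once per nesting level.
def indent_unit(indent: 2, indent_type: "space")
  case indent_type
  when "space" then " " * indent
  when "tab"   then "\t" * indent
  else raise ArgumentError, "unknown indent_type: #{indent_type.inspect}"
  end
end

# Pretty-print JSON with the requested indentation, preserving key order.
def pretty_json(source, indent: 2, indent_type: "space")
  unit = indent_unit(indent: indent, indent_type: indent_type)
  JSON.pretty_generate(JSON.parse(source), indent: unit)
end

puts pretty_json('{"z":3,"a":{"b":1}}', indent: 1, indent_type: "tab")
```

With `indent_type: "tab"` the width counts tabs rather than spaces, which is why a value of 1 is the usual choice.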
|
|
299
|
-
|
|
300
|
-
==== Parsing
|
|
301
|
-
|
|
302
|
-
The `Canon.parse` method parses content into Ruby objects or Nokogiri documents.
|
|
303
|
-
|
|
304
|
-
Syntax:
|
|
305
|
-
|
|
306
|
-
[source,ruby]
|
|
307
|
-
----
|
|
308
|
-
Canon.parse({content}, {format})
|
|
309
|
-
Canon.parse_{format}({content}) # Format-specific shorthand
|
|
310
|
-
----
|
|
311
|
-
|
|
312
|
-
Where,
|
|
313
|
-
|
|
314
|
-
`{content}`:: The input string
|
|
315
|
-
`{format}`:: The format type (`:xml`, `:html`, `:json`, or `:yaml`)
|
|
316
|
-
|
|
317
|
-
.Parsing examples
|
|
318
|
-
[example]
|
|
319
|
-
====
|
|
320
|
-
[source,ruby]
|
|
321
|
-
----
|
|
322
|
-
# Parse XML → Nokogiri::XML::Document
|
|
323
|
-
xml_doc = Canon.parse(xml_input, :xml)
|
|
324
|
-
xml_doc = Canon.parse_xml(xml_input)
|
|
325
|
-
|
|
326
|
-
# Parse HTML → Nokogiri::HTML5::Document (or XML::Document for XHTML)
|
|
327
|
-
html_doc = Canon.parse(html_input, :html)
|
|
328
|
-
html_doc = Canon.parse_html(html_input)
|
|
329
|
-
|
|
330
|
-
# Parse JSON → Ruby Hash/Array
|
|
331
|
-
json_obj = Canon.parse(json_input, :json)
|
|
332
|
-
json_obj = Canon.parse_json(json_input)
|
|
333
|
-
|
|
334
|
-
# Parse YAML → Ruby Hash/Array
|
|
335
|
-
yaml_obj = Canon.parse(yaml_input, :yaml)
|
|
336
|
-
yaml_obj = Canon.parse_yaml(yaml_input)
|
|
337
|
-
----
|
|
338
|
-
====
|
|
339
|
-
|
|
340
|
-
==== Comparison
|
|
341
|
-
|
|
342
|
-
===== General
|
|
343
|
-
|
|
344
|
-
The `Canon::Comparison.equivalent?` method compares two HTML or XML documents.
|
|
345
|
-
|
|
346
|
-
The Comparison module performs a depth-first comparison: it traverses the two DOM
|
|
347
|
-
trees in parallel and compares corresponding nodes.
|
|
348
|
-
|
|
349
|
-
In XML mode:
|
|
350
|
-
|
|
351
|
-
* Parsing: accepts Moxml (`Moxml::Document`) or Nokogiri
|
|
352
|
-
(`Nokogiri::XML::Document`)
|
|
353
|
-
* Comments: normalized and compared unless `ignore_comments: true`
|
|
354
|
-
* Whitespace: collapses whitespace in text nodes unless `collapse_whitespace: false`
|
|
355
|
-
* Attributes: sorted alphabetically before comparison
|
|
356
|
-
|
|
357
|
-
In HTML mode:
|
|
358
|
-
|
|
359
|
-
* Parsing: accepts Nokogiri (`Nokogiri::HTML5` or `Nokogiri::HTML`)
|
|
360
|
-
* Normalizes HTML comments in `<style>` and `<script>` tags
|
|
361
|
-
* Sorts attributes alphabetically before comparison
|
|
362
|
-
* Collapses whitespace for text content comparison
|
|
363
|
-
* Removes empty text nodes between elements
|
|
364
|
-
|
|
365
|
-
[NOTE]
|
|
366
|
-
====
|
|
367
|
-
The comparison module is automatically used by Canon's RSpec matchers
|
|
368
|
-
(`be_html_equivalent_to`, `be_xml_equivalent_to`, etc.) to provide reliable
|
|
369
|
-
semantic comparison in tests.
|
|
370
|
-
====
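The parallel depth-first traversal described above can be sketched on a toy tree. `Node`, `collapse`, and `nodes_equivalent?` are hypothetical names, and the tree is a plain `Struct` rather than a Nokogiri or Moxml document, so this is a model of the approach and not Canon's code.

```ruby
# Toy DOM node: element name, attribute hash, own text, child nodes.
Node = Struct.new(:name, :attributes, :text, :children)

# Trim the text and replace each run of whitespace with a single space.
def collapse(text)
  text.to_s.strip.gsub(/\s+/, " ")
end

# Walk both trees in step, depth first, and stop at the first mismatch.
def nodes_equivalent?(left, right)
  return false unless left.name == right.name
  # Sorting the attribute pairs makes their original order irrelevant.
  return false unless left.attributes.sort == right.attributes.sort
  return false unless collapse(left.text) == collapse(right.text)
  return false unless left.children.size == right.children.size

  left.children.zip(right.children).all? do |l_child, r_child|
    nodes_equivalent?(l_child, r_child)
  end
end

a = Node.new("a", { "href" => "/admin", "class" => "x" }, " SOME  TEXT ", [])
b = Node.new("a", { "class" => "x", "href" => "/admin" }, "SOME TEXT", [])
nodes_equivalent?(a, b) # => true
```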
|
|
371
|
-
|
|
372
|
-
===== Basic usage
|
|
373
|
-
|
|
374
|
-
Syntax:
|
|
375
|
-
|
|
376
|
-
[source,ruby]
|
|
377
|
-
----
|
|
378
|
-
Canon::Comparison.equivalent?({doc1}, {doc2}, {options})
|
|
379
|
-
----
|
|
380
|
-
|
|
381
|
-
Where,
|
|
382
|
-
|
|
383
|
-
`{doc1}`:: First document object (String, Nokogiri::HTML::Document, or supported XML document)
|
|
384
|
-
`{doc2}`:: Second document object (String, Nokogiri::HTML::Document, or supported XML document)
|
|
385
|
-
`{options}`:: Hash of comparison options (optional)
|
|
386
|
-
|
|
387
|
-
Canon::Comparison for XML supports Moxml::Document and Nokogiri::XML::Document
|
|
388
|
-
as input.
|
|
389
|
-
|
|
390
|
-
Returns:
|
|
391
|
-
|
|
392
|
-
* `true` if documents are equivalent
|
|
393
|
-
* `false` if documents differ
|
|
394
|
-
* `Array` of differences if `verbose: true` option is set
|
|
395
|
-
|
|
396
|
-
.Basic comparison examples
|
|
397
|
-
[example]
|
|
398
|
-
====
|
|
399
|
-
[source,ruby]
|
|
400
|
-
----
|
|
401
|
-
require 'canon/comparison'
|
|
402
|
-
|
|
403
|
-
# HTML comparison - ignores whitespace and comments by default
|
|
404
|
-
html1 = '<div><p>Hello</p></div>'
|
|
405
|
-
html2 = '<div> <p> Hello </p> </div>'
|
|
406
|
-
Canon::Comparison.equivalent?(html1, html2)
|
|
407
|
-
# => true
|
|
408
|
-
|
|
409
|
-
# HTML with different content
|
|
410
|
-
html3 = '<div><p>Goodbye</p></div>'
|
|
411
|
-
Canon::Comparison.equivalent?(html1, html3)
|
|
412
|
-
# => false
|
|
413
|
-
|
|
414
|
-
# XML comparison
|
|
415
|
-
xml1 = '<root><a>1</a><b>2</b></root>'
|
|
416
|
-
xml2 = '<root> <b>2</b> <a>1</a> </root>'
|
|
417
|
-
Canon::Comparison.equivalent?(xml1, xml2)
|
|
418
|
-
# => true
|
|
419
|
-
|
|
420
|
-
# With Nokogiri documents
|
|
421
|
-
doc1 = Nokogiri::HTML5(html1)
|
|
422
|
-
doc2 = Nokogiri::HTML5(html2)
|
|
423
|
-
Canon::Comparison.equivalent?(doc1, doc2)
|
|
424
|
-
# => true
|
|
425
|
-
----
|
|
426
|
-
====
|
|
427
|
-
|
|
428
|
-
|
|
429
|
-
|
|
430
|
-
===== Options at a glance
|
|
431
|
-
|
|
432
|
-
The `Canon::Comparison.equivalent?` method has a variety of options that tailor
|
|
433
|
-
comparison behavior.
|
|
434
|
-
|
|
435
|
-
The following options control comparison behavior:
|
|
436
|
-
|
|
437
|
-
`collapse_whitespace`:: (default: `true`) when `true`, trims and collapses whitespace
|
|
438
|
-
(<<collapse_whitespace>>)
|
|
439
|
-
|
|
440
|
-
`normalize_tag_whitespace`:: (default: `false`) when `true`, normalizes whitespace
|
|
441
|
-
boundaries around tags for flexible comparison (<<normalize_tag_whitespace>>)
|
|
442
|
-
|
|
443
|
-
`ignore_comments`:: (default: `true`) when `true`, ignores HTML/XML comments
|
|
444
|
-
(<<ignore_comments>>)
|
|
445
|
-
|
|
446
|
-
`ignore_attr_order`:: (default: `true`) when `true`, ignores attribute ordering
|
|
447
|
-
(<<ignore_attr_order>>)
|
|
448
|
-
|
|
449
|
-
`ignore_text_nodes`:: (default: `false`) when `true`, ignores all text content
|
|
450
|
-
(<<ignore_text_nodes>>)
|
|
451
|
-
|
|
452
|
-
`verbose`:: (default: `false`) when `true`, returns array of differences instead of boolean
|
|
453
|
-
(<<verbose>>)
|
|
454
|
-
|
|
455
|
-
|
|
456
|
-
[[collapse_whitespace]]
|
|
457
|
-
==== collapse_whitespace
|
|
458
|
-
|
|
459
|
-
`collapse_whitespace: {true|false}` default: `true`
|
|
460
|
-
|
|
461
|
-
When `true`, all text content within the document is trimmed (i.e. leading and
|
|
462
|
-
trailing whitespace is removed) and whitespace is collapsed (i.e. each run of tabs,
|
|
463
|
-
newlines, or spaces is replaced by a single space).
|
|
464
|
-
|
|
465
|
-
XML mode:: Whitespace is collapsed in text nodes only. Whitespace within
|
|
466
|
-
attribute values is preserved.
|
|
467
|
-
|
|
468
|
-
HTML mode:: Whitespace is collapsed in text nodes only. Whitespace within
|
|
469
|
-
attribute values is preserved. Additionally, empty text nodes between elements
|
|
470
|
-
are removed.
|
|
471
|
-
|
|
472
|
-
Usage:
|
|
473
|
-
|
|
474
|
-
[source,ruby]
|
|
475
|
-
----
|
|
476
|
-
Canon::Comparison.equivalent?(doc1, doc2, collapse_whitespace: true)
|
|
477
|
-
----
|
|
478
|
-
|
|
479
|
-
.HTML examples with collapse_whitespace
|
|
480
|
-
[example]
|
|
481
|
-
====
|
|
482
|
-
When `true` the following HTML strings are considered equal:
|
|
483
|
-
|
|
484
|
-
[source,html]
|
|
485
|
-
----
|
|
486
|
-
<a href="/admin"> SOME TEXT CONTENT </a>
|
|
487
|
-
<a href="/admin">SOME TEXT CONTENT</a>
|
|
488
|
-
----
|
|
489
|
-
|
|
490
|
-
[source,ruby]
|
|
491
|
-
----
|
|
492
|
-
html1 = '<a href="/admin"> SOME TEXT CONTENT </a>'
|
|
493
|
-
html2 = '<a href="/admin">SOME TEXT CONTENT</a>'
|
|
494
|
-
Canon::Comparison.equivalent?(html1, html2, collapse_whitespace: true)
|
|
495
|
-
# => true
|
|
496
|
-
----
|
|
497
|
-
|
|
498
|
-
When `true` the following HTML strings are considered equal:
|
|
499
|
-
|
|
500
|
-
[source,html]
|
|
501
|
-
----
|
|
502
|
-
<html>
|
|
503
|
-
<title>
|
|
504
|
-
This is my title
|
|
505
|
-
</title>
|
|
506
|
-
</html>
|
|
507
|
-
|
|
508
|
-
<html><title>This is my title</title></html>
|
|
509
|
-
----
|
|
510
|
-
|
|
511
|
-
[source,ruby]
|
|
512
|
-
----
|
|
513
|
-
html1 = <<~HTML
|
|
514
|
-
<html>
|
|
515
|
-
<title>
|
|
516
|
-
This is my title
|
|
517
|
-
</title>
|
|
518
|
-
</html>
|
|
519
|
-
HTML
|
|
520
|
-
html2 = '<html><title>This is my title</title></html>'
|
|
521
|
-
Canon::Comparison.equivalent?(html1, html2, collapse_whitespace: true)
|
|
522
|
-
# => true
|
|
523
|
-
----
|
|
524
|
-
====
|
|
525
|
-
|
|
526
|
-
.XML examples with collapse_whitespace
|
|
527
|
-
[example]
|
|
528
|
-
====
|
|
529
|
-
When `true` the following XML strings are considered equal:
|
|
530
|
-
|
|
531
|
-
[source,xml]
|
|
532
|
-
----
|
|
533
|
-
<root>
|
|
534
|
-
<item> Some text </item>
|
|
535
|
-
</root>
|
|
536
|
-
|
|
537
|
-
<root><item>Some text</item></root>
|
|
538
|
-
----
|
|
539
|
-
|
|
540
|
-
[source,ruby]
|
|
541
|
-
----
|
|
542
|
-
xml1 = "<root>\n <item> Some text </item>\n</root>"
|
|
543
|
-
xml2 = '<root><item>Some text</item></root>'
|
|
544
|
-
Canon::Comparison.equivalent?(xml1, xml2, collapse_whitespace: true)
|
|
545
|
-
# => true
|
|
546
|
-
----
|
|
547
|
-
====
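The HTML-mode rule (collapse whitespace in text nodes, then drop text nodes that end up empty) can be modelled on a flat child list. In this sketch, strings stand for text nodes and symbols stand for elements; `normalize_children` is a hypothetical helper and not a Canon method.

```ruby
# Collapse whitespace in text nodes and drop the ones that become empty.
# Strings model text nodes; any other object models an element.
def normalize_children(children)
  children.filter_map do |child|
    next child unless child.is_a?(String)

    collapsed = child.strip.gsub(/\s+/, " ")
    collapsed unless collapsed.empty?
  end
end

normalize_children(["\n  ", :title, "\n"])     # => [:title]
normalize_children([" This is\n my   title "]) # => ["This is my title"]
```

Attribute values are not touched here, matching the statement that whitespace within attribute values is preserved.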
|
|
548
|
-
|
|
549
|
-
[[normalize_tag_whitespace]]
|
|
550
|
-
==== normalize_tag_whitespace
|
|
551
|
-
|
|
552
|
-
`normalize_tag_whitespace: {true|false}` default: `false`
|
|
553
|
-
|
|
554
|
-
When `true`, normalizes whitespace at tag boundaries by collapsing multiple
|
|
555
|
-
whitespace characters (spaces, tabs, newlines) to a single space and stripping
|
|
556
|
-
leading/trailing whitespace from text nodes. This enables "forgiving whitespace
|
|
557
|
-
mode" for comparing documents that use different pretty-print formatting while
|
|
558
|
-
maintaining the same semantic content.
|
|
559
|
-
|
|
560
|
-
This option is specifically designed for comparing documents where:
|
|
561
|
-
|
|
562
|
-
* One document is compact (no indentation/line breaks)
|
|
563
|
-
* The other document is pretty-printed (with indentation/line breaks)
|
|
564
|
-
* You want to ignore these formatting differences
|
|
565
|
-
|
|
566
|
-
[NOTE]
|
|
567
|
-
`normalize_tag_whitespace` is more aggressive than `collapse_whitespace`:
|
|
568
|
-
|
|
569
|
-
* `collapse_whitespace` only trims and collapses whitespace within text content
|
|
570
|
-
* `normalize_tag_whitespace` additionally handles whitespace at tag boundaries,
|
|
571
|
-
making it suitable for comparing compact vs pretty-printed documents
|
|
572
|
-
|
|
573
|
-
When both options are enabled, `normalize_tag_whitespace` takes precedence.
|
|
574
|
-
|
|
575
|
-
Usage:
|
|
576
|
-
|
|
577
|
-
[source,ruby]
|
|
578
|
-
----
|
|
579
|
-
Canon::Comparison.equivalent?(doc1, doc2, normalize_tag_whitespace: true)
|
|
580
|
-
----
|
|
581
|
-
|
|
582
|
-
.When to use normalize_tag_whitespace
|
|
583
|
-
[example]
|
|
584
|
-
Use this option when:
|
|
585
|
-
|
|
586
|
-
1. **Comparing generated output with expected fixtures**: Your test generates
|
|
587
|
-
pretty-printed XML/HTML but your fixture is compact (or vice versa)
|
|
588
|
-
|
|
589
|
-
2. **Mixed formatting in test suites**: Some tests use pretty-printed expected
|
|
590
|
-
values while others use compact format
|
|
591
|
-
|
|
592
|
-
3. **Flexible test fixtures**: You want to maintain human-readable test fixtures
|
|
593
|
-
with indentation but compare them against compact generated output
|
|
594
|
-
|
|
595
|
-
4. **Format-agnostic testing**: Testing semantic equivalence regardless of
|
|
596
|
-
whether the output is formatted or compact
|
|
597
|
-
|
|
598
|
-
.XML examples with normalize_tag_whitespace
|
|
599
|
-
[example]
|
|
600
|
-
When `true`, documents with different tag boundary whitespace are considered equal:
|
|
601
|
-
|
|
602
|
-
[source,xml]
|
|
603
|
-
----
|
|
604
|
-
<!-- Pretty-printed with line breaks and indentation -->
|
|
605
|
-
<root>
|
|
606
|
-
<item>
|
|
607
|
-
<name>Widget</name>
|
|
608
|
-
<price>10.00</price>
|
|
609
|
-
</item>
|
|
610
|
-
</root>
|
|
611
|
-
|
|
612
|
-
<!-- Compact on a single line -->
|
|
613
|
-
<root><item><name>Widget</name><price>10.00</price></item></root>
|
|
614
|
-
----
|
|
615
|
-
|
|
616
|
-
[source,ruby]
|
|
617
|
-
----
|
|
618
|
-
pretty = <<~XML
|
|
619
|
-
<root>
|
|
620
|
-
<item>
|
|
621
|
-
<name>Widget</name>
|
|
622
|
-
<price>10.00</price>
|
|
623
|
-
</item>
|
|
624
|
-
</root>
|
|
625
|
-
XML
|
|
626
|
-
|
|
627
|
-
compact = '<root><item><name>Widget</name><price>10.00</price></item></root>'
|
|
628
|
-
|
|
629
|
-
Canon::Comparison.equivalent?(pretty, compact, normalize_tag_whitespace: true)
|
|
630
|
-
# => true
|
|
631
|
-
----
|
|
632
|
-
|
|
633
|
-
When `false` (default), the whitespace differences matter:
|
|
634
|
-
|
|
635
|
-
[source,ruby]
|
|
636
|
-
----
|
|
637
|
-
Canon::Comparison.equivalent?(pretty, compact, normalize_tag_whitespace: false)
|
|
638
|
-
# => false (whitespace at tag boundaries differs)
|
|
639
|
-
----
|
|
640
|
-
|
|
641
|
-
This also handles complex nested structures:
|
|
642
|
-
|
|
643
|
-
[source,xml]
|
|
644
|
-
----
|
|
645
|
-
<!-- Pretty-printed -->
|
|
646
|
-
<document>
|
|
647
|
-
<metadata>
|
|
648
|
-
<title>My Document</title>
|
|
649
|
-
<author>
|
|
650
|
-
<name>John Doe</name>
|
|
651
|
-
<email>john@example.com</email>
|
|
652
|
-
</author>
|
|
653
|
-
</metadata>
|
|
654
|
-
</document>
|
|
655
|
-
|
|
656
|
-
<!-- Compact -->
|
|
657
|
-
<document><metadata><title>My Document</title><author><name>John Doe</name><email>john@example.com</email></author></metadata></document>
|
|
658
|
-
----
|
|
659
|
-
|
|
660
|
-
[source,ruby]
|
|
661
|
-
----
|
|
662
|
-
pretty_doc = <<~XML
|
|
663
|
-
<document>
|
|
664
|
-
<metadata>
|
|
665
|
-
<title>My Document</title>
|
|
666
|
-
<author>
|
|
667
|
-
<name>John Doe</name>
|
|
668
|
-
<email>john@example.com</email>
|
|
669
|
-
</author>
|
|
670
|
-
</metadata>
|
|
671
|
-
</document>
|
|
672
|
-
XML
|
|
673
|
-
|
|
674
|
-
compact_doc = '<document><metadata><title>My Document</title><author><name>John Doe</name><email>john@example.com</email></author></metadata></document>'
|
|
675
|
-
|
|
676
|
-
Canon::Comparison.equivalent?(pretty_doc, compact_doc, normalize_tag_whitespace: true)
|
|
677
|
-
# => true
|
|
678
|
-
----
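As a rough mental model of why the pretty-printed and compact documents above compare as equal, the whitespace between a closing `>` and the next `<` can be removed at the string level. This is an approximation assumed for illustration (`strip_tag_boundary_whitespace` is a hypothetical helper): it ignores mixed content and whitespace-sensitive elements, and it does not reproduce how Canon applies the option to parsed nodes.

```ruby
# Remove whitespace that sits only between two tags, then trim the ends.
# String-level approximation; a real comparison works on parsed nodes.
def strip_tag_boundary_whitespace(markup)
  markup.gsub(/>\s+</, "><").strip
end

pretty = "<root>\n  <item>\n    <name>Widget</name>\n  </item>\n</root>\n"
compact = "<root><item><name>Widget</name></item></root>"

strip_tag_boundary_whitespace(pretty) == compact # => true
```

Whitespace inside text content, such as the space in `John Doe`, is untouched because it is not adjacent to tags on both sides.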
|
|
679
|
-
|
|
680
|
-
.HTML examples with normalize_tag_whitespace
|
|
681
|
-
[example]
|
|
682
|
-
When `true`, HTML with different formatting is considered equal:
|
|
683
|
-
|
|
684
|
-
[source,html]
|
|
685
|
-
----
|
|
686
|
-
<!-- Pretty-printed -->
|
|
687
|
-
<div class="container">
|
|
688
|
-
<header>
|
|
689
|
-
<h1>Welcome</h1>
|
|
690
|
-
<p>Introduction text</p>
|
|
691
|
-
</header>
|
|
692
|
-
</div>
|
|
693
|
-
|
|
694
|
-
<!-- Compact -->
|
|
695
|
-
<div class="container"><header><h1>Welcome</h1><p>Introduction text</p></header></div>
|
|
696
|
-
----
|
|
697
|
-
|
|
698
|
-
[source,ruby]
|
|
699
|
-
----
|
|
700
|
-
pretty_html = <<~HTML
|
|
701
|
-
<div class="container">
|
|
702
|
-
<header>
|
|
703
|
-
<h1>Welcome</h1>
|
|
704
|
-
<p>Introduction text</p>
|
|
705
|
-
</header>
|
|
706
|
-
</div>
|
|
707
|
-
HTML
|
|
708
|
-
|
|
709
|
-
compact_html = '<div class="container"><header><h1>Welcome</h1><p>Introduction text</p></header></div>'
|
|
710
|
-
|
|
711
|
-
Canon::Comparison.equivalent?(pretty_html, compact_html, normalize_tag_whitespace: true)
|
|
712
|
-
# => true
|
|
713
|
-
----
|
|
714
|
-
|
|
715
|
-
.RSpec configuration for normalize_tag_whitespace
|
|
716
|
-
[example]
|
|
717
|
-
For test suites that consistently need forgiving whitespace mode, configure it
|
|
718
|
-
globally:
|
|
719
|
-
|
|
720
|
-
[source,ruby]
|
|
721
|
-
----
|
|
722
|
-
# spec/spec_helper.rb
|
|
723
|
-
require 'canon/rspec_matchers'
|
|
724
|
-
|
|
725
|
-
RSpec.configure do |config|
|
|
726
|
-
# Enable forgiving whitespace mode globally for all Canon matchers
|
|
727
|
-
Canon::RSpecMatchers.configure do |canon_config|
|
|
728
|
-
canon_config.normalize_tag_whitespace = true
|
|
729
|
-
end
|
|
730
|
-
end
|
|
731
|
-
|
|
732
|
-
# Now all XML/HTML comparisons will use forgiving whitespace mode
|
|
733
|
-
RSpec.describe 'My tests' do
|
|
734
|
-
it 'compares pretty-printed with compact XML' do
|
|
735
|
-
pretty_xml = <<~XML
|
|
736
|
-
<root>
|
|
737
|
-
<item>Value</item>
|
|
738
|
-
</root>
|
|
739
|
-
XML
|
|
740
|
-
|
|
741
|
-
compact_xml = '<root><item>Value</item></root>'
|
|
742
|
-
|
|
743
|
-
# These will be considered equivalent due to global configuration
|
|
744
|
-
expect(pretty_xml).to be_xml_equivalent_to(compact_xml)
|
|
745
|
-
end
|
|
746
|
-
|
|
747
|
-
it 'compares HTML with different formatting' do
|
|
748
|
-
pretty_html = <<~HTML
|
|
749
|
-
<div>
|
|
750
|
-
<p>Content</p>
|
|
751
|
-
</div>
|
|
752
|
-
HTML
|
|
753
|
-
|
|
754
|
-
compact_html = '<div><p>Content</p></div>'
|
|
755
|
-
|
|
756
|
-
expect(pretty_html).to be_html_equivalent_to(compact_html)
|
|
757
|
-
end
|
|
758
|
-
end
|
|
759
|
-
----
|
|
760
|
-
|
|
761
|
-
To disable it for specific tests when globally enabled:
|
|
762
|
-
|
|
763
|
-
[source,ruby]
|
|
764
|
-
----
|
|
765
|
-
# This test needs exact whitespace matching
|
|
766
|
-
it 'checks exact whitespace' do
|
|
767
|
-
# Temporarily disable normalize_tag_whitespace
|
|
768
|
-
original = Canon::RSpecMatchers.normalize_tag_whitespace
|
|
769
|
-
Canon::RSpecMatchers.normalize_tag_whitespace = false
|
|
770
|
-
|
|
771
|
-
begin
|
|
772
|
-
expect(xml1).to be_xml_equivalent_to(xml2)
|
|
773
|
-
ensure
|
|
774
|
-
Canon::RSpecMatchers.normalize_tag_whitespace = original
|
|
775
|
-
end
|
|
776
|
-
end
|
|
777
|
-
----
|
|
778
|
-
|
|
779
|
-
.Comparison with collapse_whitespace
|
|
780
|
-
[example]
|
|
781
|
-
Understanding the difference between `collapse_whitespace` and
|
|
782
|
-
`normalize_tag_whitespace`:
|
|
783
|
-
|
|
784
|
-
[source,ruby]
|
|
785
|
-
----
|
|
786
|
-
# Example XML with whitespace variations
|
|
787
|
-
pretty = '<root> <item> Value </item> </root>'
|
|
788
|
-
compact = '<root><item>Value</item></root>'
|
|
789
|
-
|
|
790
|
-
# With collapse_whitespace only (default)
|
|
791
|
-
Canon::Comparison.equivalent?(
|
|
792
|
-
pretty,
|
|
793
|
-
compact,
|
|
794
|
-
collapse_whitespace: true,
|
|
795
|
-
normalize_tag_whitespace: false
|
|
796
|
-
)
|
|
797
|
-
# => false
|
|
798
|
-
# Reason: Whitespace at tag boundaries (spaces between > and <) differs
|
|
799
|
-
|
|
800
|
-
# With normalize_tag_whitespace
|
|
801
|
-
Canon::Comparison.equivalent?(
|
|
802
|
-
pretty,
|
|
803
|
-
compact,
|
|
804
|
-
normalize_tag_whitespace: true
|
|
805
|
-
)
|
|
806
|
-
# => true
|
|
807
|
-
# Reason: All whitespace at tag boundaries is normalized
|
|
808
|
-
----
|
|
809
|
-
|
|
810
|
-
Key differences:
|
|
811
|
-
|
|
812
|
-
|===
|
|
813
|
-
|Feature |collapse_whitespace |normalize_tag_whitespace
|
|
814
|
-
|
|
815
|
-
|Trims text content
|
|
816
|
-
|✓
|
|
817
|
-
|✓
|
|
818
|
-
|
|
819
|
-
|Collapses internal whitespace
|
|
820
|
-
|✓
|
|
821
|
-
|✓
|
|
822
|
-
|
|
823
|
-
|Normalizes tag boundaries
|
|
824
|
-
|✗
|
|
825
|
-
|✓
|
|
826
|
-
|
|
827
|
-
|Use case
|
|
828
|
-
|Flexible text comparison
|
|
829
|
-
|Flexible format comparison
|
|
830
|
-
|===
|
|
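The difference in the table can be approximated at the string level. The following is only a rough sketch under stated assumptions: Canon compares parsed nodes rather than raw strings, and the helper names `collapse_text_whitespace` and `normalize_tag_boundaries` are hypothetical, not part of Canon.

```ruby
# Rough string-level illustration only; both helper names are hypothetical.

# Trim and collapse whitespace inside text content, but leave
# whitespace-only runs between tags untouched.
def collapse_text_whitespace(markup)
  markup.gsub(/>([^<]*)</) do
    text = Regexp.last_match(1)
    text.strip.empty? ? ">#{text}<" : ">#{text.strip.gsub(/\s+/, ' ')}<"
  end
end

# Additionally drop whitespace-only runs that sit between two tags.
def normalize_tag_boundaries(markup)
  collapse_text_whitespace(markup).gsub(/>\s+</, '><')
end

pretty  = '<root> <item> Value </item> </root>'
compact = '<root><item>Value</item></root>'

puts collapse_text_whitespace(pretty) == compact  # spaces between tags remain
puts normalize_tag_boundaries(pretty) == compact  # tag boundaries normalized too
```

In this sketch only the second helper makes the two strings identical, which mirrors the `✗` / `✓` row for "Normalizes tag boundaries".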
831
|
-
|
|
832
|
-
[[ignore_attr_order]]
|
|
833
|
-
==== ignore_attr_order
|
|
834
|
-
|
|
835
|
-
`ignore_attr_order: {true|false}` default: `true`
|
|
836
|
-
|
|
837
|
-
When `true`, all attributes are sorted before comparison and only attributes of
|
|
838
|
-
the same type are compared.
|
|
839
|
-
|
|
840
|
-
Usage:
|
|
841
|
-
|
|
842
|
-
[source,ruby]
|
|
843
|
-
----
|
|
844
|
-
Canon::Comparison.equivalent?(doc1, doc2, ignore_attr_order: true)
|
|
845
|
-
----
|
|
846
|
-
|
|
847
|
-
.HTML examples with ignore_attr_order
|
|
848
|
-
[example]
|
|
849
|
-
====
|
|
850
|
-
When `true` the following HTML strings are considered equal:
|
|
851
|
-
|
|
852
|
-
[source,html]
|
|
853
|
-
----
|
|
854
|
-
<a href="/admin" class="button" target="_blank">Link</a>
|
|
855
|
-
<a class="button" target="_blank" href="/admin">Link</a>
|
|
856
|
-
----
|
|
857
|
-
|
|
858
|
-
[source,ruby]
|
|
859
|
-
----
|
|
860
|
-
html1 = '<a href="/admin" class="button" target="_blank">Link</a>'
|
|
861
|
-
html2 = '<a class="button" target="_blank" href="/admin">Link</a>'
|
|
862
|
-
Canon::Comparison.equivalent?(html1, html2, ignore_attr_order: true)
|
|
863
|
-
# => true
|
|
864
|
-
----
|
|
865
|
-
|
|
866
|
-
When `false` attributes are compared in order:
|
|
867
|
-
|
|
868
|
-
[source,ruby]
|
|
869
|
-
----
|
|
870
|
-
html1 = '<a href="/admin" class="button">Link</a>'
|
|
871
|
-
html2 = '<a class="button" href="/admin">Link</a>'
|
|
872
|
-
Canon::Comparison.equivalent?(html1, html2, ignore_attr_order: false)
|
|
873
|
-
# => false
|
|
874
|
-
----
|
|
875
|
-
====
|
|
876
|
-
|
|
877
|
-
.XML examples with ignore_attr_order
|
|
878
|
-
[example]
|
|
879
|
-
====
|
|
880
|
-
When `true` the following XML strings are considered equal:
|
|
881
|
-
|
|
882
|
-
[source,xml]
|
|
883
|
-
----
|
|
884
|
-
<item id="1" name="Widget" price="10.00"/>
|
|
885
|
-
<item price="10.00" id="1" name="Widget"/>
|
|
886
|
-
----
|
|
887
|
-
|
|
888
|
-
[source,ruby]
|
|
889
|
-
----
|
|
890
|
-
xml1 = '<item id="1" name="Widget" price="10.00"/>'
|
|
891
|
-
xml2 = '<item price="10.00" id="1" name="Widget"/>'
|
|
892
|
-
Canon::Comparison.equivalent?(xml1, xml2, ignore_attr_order: true)
|
|
893
|
-
# => true
|
|
894
|
-
----
|
|
895
|
-
====
|
|
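The effect of attribute ordering can be illustrated without Canon. This is a minimal sketch, assuming a single start tag with double-quoted attribute values; `attribute_pairs` is a hypothetical helper and does not reflect how Canon parses documents.

```ruby
# Hypothetical helper: extract [name, value] pairs from one start tag.
def attribute_pairs(tag)
  tag.scan(/([\w:-]+)="([^"]*)"/)
end

xml1 = '<item id="1" name="Widget" price="10.00"/>'
xml2 = '<item price="10.00" id="1" name="Widget"/>'

# Order-sensitive: the pair lists differ because the positions differ.
ordered_equal = attribute_pairs(xml1) == attribute_pairs(xml2)

# Order-insensitive: sorting first removes the positional difference.
unordered_equal = attribute_pairs(xml1).sort == attribute_pairs(xml2).sort

puts "ordered: #{ordered_equal}, unordered: #{unordered_equal}"
```

Sorting before comparing corresponds to the `true` behavior described above; comparing the lists as written corresponds to `false`.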
896
|
-
|
|
897
|
-
[[ignore_comments]]
|
|
898
|
-
==== ignore_comments
|
|
899
|
-
|
|
900
|
-
`ignore_comments: {true|false}` default: `true`
|
|
901
|
-
|
|
902
|
-
When `true`, ignores comments such as `<!-- This is a comment -->`.
|
|
903
|
-
|
|
904
|
-
Usage:
|
|
905
|
-
|
|
906
|
-
[source,ruby]
|
|
907
|
-
----
|
|
908
|
-
Canon::Comparison.equivalent?(doc1, doc2, ignore_comments: true)
|
|
909
|
-
----
|
|
910
|
-
|
|
911
|
-
.HTML examples with ignore_comments
|
|
912
|
-
[example]
|
|
913
|
-
====
|
|
914
|
-
When `true` the following HTML strings are considered equal:
|
|
915
|
-
|
|
916
|
-
[source,html]
|
|
917
|
-
----
|
|
918
|
-
<!-- This is a comment -->
|
|
919
|
-
<!-- This is another comment -->
|
|
920
|
-
----
|
|
921
|
-
|
|
922
|
-
[source,ruby]
|
|
923
|
-
----
|
|
924
|
-
html1 = '<!-- This is a comment -->'
|
|
925
|
-
html2 = '<!-- This is another comment -->'
|
|
926
|
-
Canon::Comparison.equivalent?(html1, html2, ignore_comments: true)
|
|
927
|
-
# => true
|
|
928
|
-
----
|
|
929
|
-
|
|
930
|
-
When `true` the following HTML strings are considered equal:
|
|
931
|
-
|
|
932
|
-
[source,html]
|
|
933
|
-
----
|
|
934
|
-
<a href="/admin"><!-- This is a comment -->Link</a>
|
|
935
|
-
<a href="/admin">Link</a>
|
|
936
|
-
----
|
|
937
|
-
|
|
938
|
-
[source,ruby]
|
|
939
|
-
----
|
|
940
|
-
html1 = '<a href="/admin"><!-- This is a comment -->Link</a>'
|
|
941
|
-
html2 = '<a href="/admin">Link</a>'
|
|
942
|
-
Canon::Comparison.equivalent?(html1, html2, ignore_comments: true)
|
|
943
|
-
# => true
|
|
944
|
-
----
|
|
945
|
-
|
|
946
|
-
When `false` comments are compared:
|
|
947
|
-
|
|
948
|
-
[source,ruby]
|
|
949
|
-
----
|
|
950
|
-
html1 = '<div><!-- comment 1 --><p>Text</p></div>'
|
|
951
|
-
html2 = '<div><!-- comment 2 --><p>Text</p></div>'
|
|
952
|
-
Canon::Comparison.equivalent?(html1, html2, ignore_comments: false)
|
|
953
|
-
# => false
|
|
954
|
-
----
|
|
955
|
-
====
|
|
956
|
-
|
|
957
|
-
.XML examples with ignore_comments
|
|
958
|
-
[example]
|
|
959
|
-
====
|
|
960
|
-
When `true` the following XML strings are considered equal:
|
|
961
|
-
|
|
962
|
-
[source,xml]
|
|
963
|
-
----
|
|
964
|
-
<root>
|
|
965
|
-
<!-- First comment -->
|
|
966
|
-
<item>Data</item>
|
|
967
|
-
</root>
|
|
968
|
-
|
|
969
|
-
<root>
|
|
970
|
-
<!-- Different comment -->
|
|
971
|
-
<item>Data</item>
|
|
972
|
-
</root>
|
|
973
|
-
----
|
|
974
|
-
|
|
975
|
-
[source,ruby]
|
|
976
|
-
----
|
|
977
|
-
xml1 = '<root><!-- First comment --><item>Data</item></root>'
|
|
978
|
-
xml2 = '<root><!-- Different comment --><item>Data</item></root>'
|
|
979
|
-
Canon::Comparison.equivalent?(xml1, xml2, ignore_comments: true)
|
|
980
|
-
# => true
|
|
981
|
-
----
|
|
982
|
-
====
|
|
983
|
-
|
|
984
|
-
[[ignore_text_nodes]]
|
|
985
|
-
==== ignore_text_nodes
|
|
986
|
-
|
|
987
|
-
`ignore_text_nodes: {true|false}` default: `false`
|
|
988
|
-
|
|
989
|
-
When `true`, ignores all text content. Text content is anything that is included
|
|
990
|
-
between an opening and a closing tag, e.g. `<tag>THIS IS TEXT CONTENT</tag>`.
|
|
991
|
-
|
|
992
|
-
Usage:
|
|
993
|
-
|
|
994
|
-
[source,ruby]
|
|
995
|
-
----
|
|
996
|
-
Canon::Comparison.equivalent?(doc1, doc2, ignore_text_nodes: true)
|
|
997
|
-
----
|
|
998
|
-
|
|
999
|
-
.HTML examples with ignore_text_nodes
|
|
1000
|
-
[example]
|
|
1001
|
-
====
|
|
1002
|
-
When `true` the following HTML strings are considered equal:
|
|
1003
|
-
|
|
1004
|
-
[source,html]
|
|
1005
|
-
----
|
|
1006
|
-
<a href="/admin">SOME TEXT CONTENT</a>
|
|
1007
|
-
<a href="/admin">DIFFERENT TEXT CONTENT</a>
|
|
1008
|
-
----
|
|
1009
|
-
|
|
1010
|
-
[source,ruby]
|
|
1011
|
-
----
|
|
1012
|
-
html1 = '<a href="/admin">SOME TEXT CONTENT</a>'
|
|
1013
|
-
html2 = '<a href="/admin">DIFFERENT TEXT CONTENT</a>'
|
|
1014
|
-
Canon::Comparison.equivalent?(html1, html2, ignore_text_nodes: true)
|
|
1015
|
-
# => true
|
|
1016
|
-
----
|
|
1017
|
-
|
|
1018
|
-
When `true` the following HTML strings are considered equal:
|
|
1019
|
-
|
|
1020
|
-
[source,html]
|
|
1021
|
-
----
|
|
1022
|
-
<i class="icon"></i><b>Warning:</b>
|
|
1023
|
-
<i class="icon"></i><b>Message:</b>
|
|
1024
|
-
----
|
|
1025
|
-
|
|
1026
|
-
[source,ruby]
|
|
1027
|
-
----
|
|
1028
|
-
html1 = '<i class="icon"></i><b>Warning:</b>'
|
|
1029
|
-
html2 = '<i class="icon"></i><b>Message:</b>'
|
|
1030
|
-
Canon::Comparison.equivalent?(html1, html2, ignore_text_nodes: true)
|
|
1031
|
-
# => true
|
|
1032
|
-
----
|
|
1033
|
-
|
|
1034
|
-
When `false` text content is compared:
|
|
1035
|
-
|
|
1036
|
-
[source,ruby]
|
|
1037
|
-
----
|
|
1038
|
-
html1 = '<p>Hello</p>'
|
|
1039
|
-
html2 = '<p>Goodbye</p>'
|
|
1040
|
-
Canon::Comparison.equivalent?(html1, html2, ignore_text_nodes: false)
|
|
1041
|
-
# => false
|
|
1042
|
-
----
|
|
1043
|
-
====
|
|
1044
|
-
|
|
1045
|
-
.XML examples with ignore_text_nodes
|
|
1046
|
-
[example]
|
|
1047
|
-
====
|
|
1048
|
-
When `true` the following XML strings are considered equal:
|
|
1049
|
-
|
|
1050
|
-
[source,xml]
|
|
1051
|
-
----
|
|
1052
|
-
<item>First value</item>
|
|
1053
|
-
<item>Second value</item>
|
|
1054
|
-
----
|
|
1055
|
-
|
|
1056
|
-
[source,ruby]
|
|
1057
|
-
----
|
|
1058
|
-
xml1 = '<item>First value</item>'
|
|
1059
|
-
xml2 = '<item>Second value</item>'
|
|
1060
|
-
Canon::Comparison.equivalent?(xml1, xml2, ignore_text_nodes: true)
|
|
1061
|
-
# => true
|
|
1062
|
-
----
|
|
1063
|
-
====
|
|
1064
|
-
|
|
1065
|
-
[[verbose]]
|
|
1066
|
-
==== verbose
|
|
1067
|
-
|
|
1068
|
-
`verbose: {true|false}` default: `false`
|
|
1069
|
-
|
|
1070
|
-
When `true`, instead of returning a boolean value `Canon::Comparison.equivalent?`
|
|
1071
|
-
returns an array of all errors encountered when performing a comparison.
|
|
1072
|
-
|
|
1073
|
-
WARNING: When `true`, the comparison takes longer! Not only because more
|
|
1074
|
-
processing is required to produce meaningful differences, but also because in
|
|
1075
|
-
this mode, comparison does **NOT** stop when a first difference is encountered,
|
|
1076
|
-
because the goal is to capture as many differences as possible.
|
|
1077
|
-
|
|
1078
|
-
Usage:
|
|
1079
|
-
|
|
1080
|
-
[source,ruby]
|
|
1081
|
-
----
|
|
1082
|
-
Canon::Comparison.equivalent?(doc1, doc2, verbose: true)
|
|
1083
|
-
----
|
|
1084
|
-
|
|
1085
|
-
Return values in verbose mode:
|
|
1086
|
-
|
|
1087
|
-
* Empty array `[]` if documents are equivalent
|
|
1088
|
-
* Array of difference hashes if documents differ
|
|
1089
|
-
|
|
1090
|
-
Each difference hash contains:
|
|
1091
|
-
|
|
1092
|
-
`node1`:: The first node involved in the difference
|
|
1093
|
-
`node2`:: The second node involved in the difference
|
|
1094
|
-
`diff1`:: Difference code for the first node
|
|
1095
|
-
`diff2`:: Difference code for the second node
|
|
1096
|
-
|
|
1097
|
-
Difference codes:
|
|
1098
|
-
|
|
1099
|
-
* `Canon::Comparison::EQUIVALENT` (1) - Nodes are equivalent
|
|
1100
|
-
* `Canon::Comparison::MISSING_ATTRIBUTE` (2) - Attribute missing
|
|
1101
|
-
* `Canon::Comparison::MISSING_NODE` (3) - Node missing
|
|
1102
|
-
* `Canon::Comparison::UNEQUAL_ATTRIBUTES` (4) - Attributes differ
|
|
1103
|
-
* `Canon::Comparison::UNEQUAL_COMMENTS` (5) - Comments differ
|
|
1104
|
-
* `Canon::Comparison::UNEQUAL_ELEMENTS` (7) - Element names differ
|
|
1105
|
-
* `Canon::Comparison::UNEQUAL_NODES_TYPES` (8) - Node types differ
|
|
1106
|
-
* `Canon::Comparison::UNEQUAL_TEXT_CONTENTS` (9) - Text content differs
|
|
1107
|
-
|
|
1108
|
-
.Verbose mode examples
|
|
1109
|
-
[example]
|
|
1110
|
-
====
|
|
1111
|
-
[source,ruby]
|
|
1112
|
-
----
|
|
1113
|
-
# Verbose mode with equivalent documents
|
|
1114
|
-
html1 = '<div>Hello</div>'
|
|
1115
|
-
html2 = '<div>Hello</div>'
|
|
1116
|
-
result = Canon::Comparison.equivalent?(html1, html2, verbose: true)
|
|
1117
|
-
# => [] (empty array indicates equivalence)
|
|
1118
|
-
|
|
1119
|
-
# Verbose mode with different text content
|
|
1120
|
-
html1 = '<div>Hello</div>'
|
|
1121
|
-
html2 = '<div>Goodbye</div>'
|
|
1122
|
-
result = Canon::Comparison.equivalent?(html1, html2, verbose: true)
|
|
1123
|
-
# => [{
|
|
1124
|
-
# node1: <Nokogiri::XML::Text>,
|
|
1125
|
-
# node2: <Nokogiri::XML::Text>,
|
|
1126
|
-
# diff1: 9, # UNEQUAL_TEXT_CONTENTS
|
|
1127
|
-
# diff2: 9 # UNEQUAL_TEXT_CONTENTS
|
|
1128
|
-
# }]
|
|
1129
|
-
|
|
1130
|
-
# Verbose mode with different element names
|
|
1131
|
-
html1 = '<div>Test</div>'
|
|
1132
|
-
html2 = '<span>Test</span>'
|
|
1133
|
-
result = Canon::Comparison.equivalent?(html1, html2, verbose: true)
|
|
1134
|
-
# => [{
|
|
1135
|
-
# node1: <Nokogiri::XML::Element: div>,
|
|
1136
|
-
# node2: <Nokogiri::XML::Element: span>,
|
|
1137
|
-
# diff1: 7, # UNEQUAL_ELEMENTS
|
|
1138
|
-
# diff2: 7 # UNEQUAL_ELEMENTS
|
|
1139
|
-
# }]
|
|
1140
|
-
|
|
1141
|
-
# Verbose mode with missing attributes
|
|
1142
|
-
html1 = '<div class="foo" id="bar">Test</div>'
|
|
1143
|
-
html2 = '<div class="foo">Test</div>'
|
|
1144
|
-
result = Canon::Comparison.equivalent?(html1, html2, verbose: true)
|
|
1145
|
-
# => [{
|
|
1146
|
-
# node1: <Nokogiri::XML::Element: div>,
|
|
1147
|
-
# node2: <Nokogiri::XML::Element: div>,
|
|
1148
|
-
# diff1: 2, # MISSING_ATTRIBUTE
|
|
1149
|
-
# diff2: 2 # MISSING_ATTRIBUTE
|
|
1150
|
-
# }]
|
|
1151
|
-
|
|
1152
|
-
# Check difference type programmatically
|
|
1153
|
-
result = Canon::Comparison.equivalent?(html1, html2, verbose: true)
|
|
1154
|
-
if result.empty?
|
|
1155
|
-
puts "Documents are equivalent"
|
|
1156
|
-
else
|
|
1157
|
-
result.each do |diff|
|
|
1158
|
-
case diff[:diff1]
|
|
1159
|
-
when Canon::Comparison::UNEQUAL_TEXT_CONTENTS
|
|
1160
|
-
puts "Text content differs"
|
|
1161
|
-
when Canon::Comparison::UNEQUAL_ELEMENTS
|
|
1162
|
-
puts "Element names differ"
|
|
1163
|
-
when Canon::Comparison::MISSING_ATTRIBUTE
|
|
1164
|
-
puts "Attribute missing"
|
|
1165
|
-
end
|
|
1166
|
-
end
|
|
1167
|
-
end
|
|
1168
|
-
----
|
|
1169
|
-
====
|
|
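A verbose result can also be summarized by difference code. The sketch below runs without Canon: the numeric codes are copied from the list above into a local hash, and the `result` array is a hypothetical stand-in (real entries hold parsed nodes rather than strings).

```ruby
# Numeric codes as listed above, mirrored locally so the sketch runs
# without Canon installed.
DIFF_CODE_NAMES = {
  1 => :equivalent,
  2 => :missing_attribute,
  3 => :missing_node,
  4 => :unequal_attributes,
  5 => :unequal_comments,
  7 => :unequal_elements,
  8 => :unequal_nodes_types,
  9 => :unequal_text_contents
}.freeze

# Hypothetical stand-in for the array returned in verbose mode.
result = [
  { node1: 'Hello', node2: 'Goodbye', diff1: 9, diff2: 9 },
  { node1: '<div>', node2: '<span>', diff1: 7, diff2: 7 },
  { node1: 'A', node2: 'B', diff1: 9, diff2: 9 }
]

# Count how many differences of each kind were reported.
summary = result.map { |diff| DIFF_CODE_NAMES.fetch(diff[:diff1], :unknown) }.tally
summary.each { |name, count| puts "#{name}: #{count}" }
```

In real code the `Canon::Comparison` constants would be used in place of the literal numbers.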
1170
|
-
|
|
1171
|
-
|
|
1172
|
-
=== Diff formatting configuration
|
|
1173
|
-
|
|
1174
|
-
==== General
|
|
1175
|
-
|
|
1176
|
-
Canon provides comprehensive diff formatting capabilities across three interfaces:
|
|
1177
|
-
RSpec matchers, CLI commands, and the Ruby API. All interfaces support the same
|
|
1178
|
-
set of parameters for consistent behavior.
|
|
1179
|
-
|
|
1180
|
-
==== Parameters
|
|
1181
|
-
|
|
1182
|
-
The following table shows all available diff formatting parameters and their
|
|
1183
|
-
availability across interfaces:
|
|
1184
|
-
|
|
1185
|
-
[cols="1,1,1,1,2,1"]
|
|
1186
|
-
|===
|
|
1187
|
-
|Parameter |RSpec |CLI |Ruby API |Description |Default
|
|
1188
|
-
|
|
1189
|
-
|`use_color`
|
|
1190
|
-
|✓
|
|
1191
|
-
|✓
|
|
1192
|
-
|✓
|
|
1193
|
-
|Enable/disable colored output
|
|
1194
|
-
|`true`
|
|
1195
|
-
|
|
1196
|
-
|`diff_mode`
|
|
1197
|
-
|✓
|
|
1198
|
-
|✓
|
|
1199
|
-
|✓
|
|
1200
|
-
|Comparison mode: `:by_object` or `:by_line`
|
|
1201
|
-
|`:by_line` (RSpec), `:by_object` (XML/JSON/YAML)
|
|
1202
|
-
|
|
1203
|
-
|`context_lines`
|
|
1204
|
-
|✓
|
|
1205
|
-
|✓
|
|
1206
|
-
|✓
|
|
1207
|
-
|Number of unchanged lines to show around each change
|
|
1208
|
-
|`3`
|
|
1209
|
-
|
|
1210
|
-
|`diff_grouping_lines`
|
|
1211
|
-
|✓
|
|
1212
|
-
|✓
|
|
1213
|
-
|✓
|
|
1214
|
-
|Maximum line distance to group separate diffs into context blocks
|
|
1215
|
-
|`10`
|
|
1216
|
-
|===
|
|
1217
|
-
|
|
1218
|
-
==== Interface-specific usage
|
|
1219
|
-
|
|
1220
|
-
===== RSpec matchers configuration
|
|
1221
|
-
|
|
1222
|
-
Configure diff formatting for RSpec matchers using `Canon::RSpecMatchers`:
|
|
1223
|
-
|
|
1224
|
-
[source,ruby]
|
|
1225
|
-
----
|
|
1226
|
-
require 'canon/rspec_matchers'
|
|
1227
|
-
|
|
1228
|
-
# Configure globally for all matchers
|
|
1229
|
-
Canon::RSpecMatchers.diff_mode = :by_object
|
|
1230
|
-
Canon::RSpecMatchers.use_color = true
|
|
1231
|
-
Canon::RSpecMatchers.context_lines = 5
|
|
1232
|
-
Canon::RSpecMatchers.diff_grouping_lines = 10
|
|
1233
|
-
|
|
1234
|
-
# Use in specs
|
|
1235
|
-
RSpec.describe 'My comparison' do
|
|
1236
|
-
it 'shows formatted diff' do
|
|
1237
|
-
expect(actual_xml).to be_xml_equivalent_to(expected_xml)
|
|
1238
|
-
end
|
|
1239
|
-
end
|
|
1240
|
-
----
|
|
1241
|
-
|
|
1242
|
-
===== CLI usage
|
|
1243
|
-
|
|
1244
|
-
Pass options to the `canon diff` command:
|
|
1245
|
-
|
|
1246
|
-
[source,bash]
|
|
1247
|
-
----
|
|
1248
|
-
# Basic diff with default settings
|
|
1249
|
-
$ canon diff file1.xml file2.xml --verbose
|
|
1250
|
-
|
|
1251
|
-
# Customize diff output
|
|
1252
|
-
$ canon diff file1.xml file2.xml \
|
|
1253
|
-
--verbose \
|
|
1254
|
-
--by-line \
|
|
1255
|
-
--no-color \
|
|
1256
|
-
--context-lines 5 \
|
|
1257
|
-
--diff-grouping-lines 10
|
|
1258
|
-
----
|
|
1259
|
-
|
|
1260
|
-
===== Ruby API usage
|
|
1261
|
-
|
|
1262
|
-
Use `Canon::DiffFormatter` directly in your code:
|
|
1263
|
-
|
|
1264
|
-
[source,ruby]
|
|
1265
|
-
----
|
|
1266
|
-
require 'canon/diff_formatter'
|
|
1267
|
-
require 'canon/comparison'
|
|
1268
|
-
|
|
1269
|
-
# Compare documents
|
|
1270
|
-
comparison = Canon::Comparison.new(doc1, doc2)
|
|
1271
|
-
result = comparison.compare
|
|
1272
|
-
|
|
1273
|
-
# Format diff output
|
|
1274
|
-
formatter = Canon::DiffFormatter.new(
|
|
1275
|
-
use_color: true,
|
|
1276
|
-
mode: :by_object,
|
|
1277
|
-
context_lines: 5,
|
|
1278
|
-
diff_grouping_lines: 10
|
|
1279
|
-
)
|
|
1280
|
-
|
|
1281
|
-
diff_output = formatter.format(result)
|
|
1282
|
-
puts diff_output
|
|
1283
|
-
----
|
|
1284
|
-
|
|
1285
|
-
==== Parameter details
|
|
1286
|
-
|
|
1287
|
-
===== use_color
|
|
1288
|
-
|
|
1289
|
-
Controls whether diff output includes ANSI color codes.
|
|
1290
|
-
|
|
1291
|
-
* Type: Boolean
|
|
1292
|
-
* Default: `true`
|
|
1293
|
-
* Colors used:
|
|
1294
|
-
** Red: Deletions/removed content
|
|
1295
|
-
** Green: Additions/inserted content
|
|
1296
|
-
** Yellow: Modified content
|
|
1297
|
-
** Cyan: Element names and structure
|
|
1298
|
-
|
|
1299
|
-
[source,ruby]
|
|
1300
|
-
----
|
|
1301
|
-
# Disable colors for plain text output
|
|
1302
|
-
Canon::RSpecMatchers.use_color = false
|
|
1303
|
-
|
|
1304
|
-
# CLI
|
|
1305
|
-
$ canon diff file1.xml file2.xml --no-color --verbose
|
|
1306
|
-
----
|
|
1307
|
-
|
|
1308
|
-
===== diff_mode
|
|
1309
|
-
|
|
1310
|
-
Determines the comparison and display strategy.
|
|
1311
|
-
|
|
1312
|
-
* Type: Symbol (`:by_object` or `:by_line`)
|
|
1313
|
-
* Default: `:by_line` for RSpec matchers, format-dependent for CLI/API
|
|
1314
|
-
* Modes:
|
|
1315
|
-
** `:by_object` - Semantic tree-based comparison showing structural changes
|
|
1316
|
-
** `:by_line` - Line-by-line diff after canonicalization
|
|
1317
|
-
|
|
1318
|
-
[source,ruby]
|
|
1319
|
-
----
|
|
1320
|
-
# Use object-based diff for RSpec matchers
|
|
1321
|
-
Canon::RSpecMatchers.diff_mode = :by_object
|
|
1322
|
-
|
|
1323
|
-
# CLI - XML uses by-object by default, force by-line
|
|
1324
|
-
$ canon diff file1.xml file2.xml --by-line --verbose
|
|
1325
|
-
----
|
|
1326
|
-
|
|
1327
|
-
===== context_lines
|
|
1328
|
-
|
|
1329
|
-
Number of unchanged lines to display around each change for context.
|
|
1330
|
-
|
|
1331
|
-
* Type: Numeric
|
|
1332
|
-
* Default: `3`
|
|
1333
|
-
* Range: `0` to any positive integer
|
|
1334
|
-
* Effect: Higher values show more surrounding context, lower values show only changes
|
|
1335
|
-
|
|
1336
|
-
[source,ruby]
|
|
1337
|
-
----
|
|
1338
|
-
# Show 5 lines of context around each change
|
|
1339
|
-
Canon::RSpecMatchers.context_lines = 5
|
|
1340
|
-
|
|
1341
|
-
# CLI
|
|
1342
|
-
$ canon diff file1.xml file2.xml --context-lines 5 --verbose
|
|
1343
|
-
|
|
1344
|
-
# Ruby API
|
|
1345
|
-
formatter = Canon::DiffFormatter.new(context_lines: 5)
|
|
1346
|
-
----
|
|
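To make the effect of `context_lines` concrete, the following sketch computes which line numbers would be displayed for a given set of changed lines. It is an illustration under assumptions, not Canon's implementation; `visible_lines` is a hypothetical helper.

```ruby
# Hypothetical helper: line numbers shown when each changed line is
# surrounded by `context_lines` unchanged lines on either side.
def visible_lines(changed, total_lines, context_lines)
  changed.flat_map { |n| ((n - context_lines)..(n + context_lines)).to_a }
         .select { |n| n.between?(1, total_lines) } # clip to the document
         .uniq
         .sort
end

p visible_lines([5], 20, 3)      # one change with the default of 3
p visible_lines([5], 20, 0)      # no context: only the change itself
p visible_lines([2, 19], 20, 2)  # windows are clipped at both ends
```

With a value of `0` only the changed lines remain, which matches the "lower values show only changes" effect described above.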
1347
|
-
|
|
1348
|
-
===== diff_grouping_lines
|
|
1349
|
-
|
|
1350
|
-
Maximum line distance between separate changes to group them into a single
|
|
1351
|
-
context block.
|
|
1352
|
-
|
|
1353
|
-
* Type: Numeric or `nil`
|
|
1354
|
-
* Default: `nil` (no grouping)
|
|
1355
|
-
* Effect: When set, changes within N lines of each other are grouped into
|
|
1356
|
-
context blocks with a header showing the number of diffs in the block
|
|
1357
|
-
|
|
1358
|
-
[source,ruby]
|
|
1359
|
-
----
|
|
1360
|
-
# Group changes that are within 10 lines of each other
|
|
1361
|
-
Canon::RSpecMatchers.diff_grouping_lines = 10
|
|
1362
|
-
|
|
1363
|
-
# CLI
|
|
1364
|
-
$ canon diff file1.xml file2.xml --diff-grouping-lines 10 --verbose
|
|
1365
|
-
|
|
1366
|
-
# Ruby API
|
|
1367
|
-
formatter = Canon::DiffFormatter.new(diff_grouping_lines: 10)
|
|
1368
|
-
----
|
|
1369
|
-
|
|
1370
|
-
.Example of grouped diff output
|
|
1371
|
-
[example]
|
|
1372
|
-
When `diff_grouping_lines` is set to `10`, changes close together are grouped:
|
|
1373
|
-
|
|
1374
|
-
[source]
|
|
1375
|
-
----
|
|
1376
|
-
Context block has 3 diffs (lines 5-18):
|
|
1377
|
-
5 - | <foreword id="fwd">
|
|
1378
|
-
5 + | <foreword displayorder="2" id="fwd">
|
|
1379
|
-
6 | <p>First paragraph</p>
|
|
1380
|
-
...
|
|
1381
|
-
15 - | <title>Scope</title>
|
|
1382
|
-
15 + | <title>Application Scope</title>
|
|
1383
|
-
16 | </clause>
|
|
1384
|
-
17 + | <p>New content</p>
|
|
1385
|
-
18 | </sections>
|
|
1386
|
-
----
|
|
1387
|
-
|
|
1388
|
-
Without grouping, these would appear as separate diff sections.
|
|
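The grouping rule can be sketched in a few lines of plain Ruby. This assumes that "within N lines" means a gap of at most N between consecutive changed lines; `group_changes` is a hypothetical helper, not a Canon method.

```ruby
# Hypothetical helper: group changed line numbers into blocks whenever
# consecutive changes are at most `max_gap` lines apart.
def group_changes(changed_lines, max_gap)
  sorted = changed_lines.sort
  # nil means no grouping: every change forms its own block.
  return sorted.map { |n| [n] } if max_gap.nil?

  sorted.slice_when { |a, b| b - a > max_gap }.to_a
end

p group_changes([5, 15, 17, 40], 10)   # changes on 5, 15 and 17 share a block
p group_changes([5, 15, 17, 40], nil)  # each change stands alone
```

With a gap of `10`, the changes on lines 5, 15 and 17 from the example output fall into one block, while a change far away (line 40 here) starts a new one.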
1389
|
-
|
|
1390
|
-
==== Enhanced diff output features
|
|
1391
|
-
|
|
1392
|
-
Canon's diff formatter includes several enhancements designed to make diffs more
|
|
1393
|
-
readable and informative, especially when working with RSpec test failures.
|
|
1394
|
-
|
|
1395
|
-
===== Color-coded line numbers and structure
|
|
1396
|
-
|
|
1397
|
-
**Purpose**: Improve readability by distinguishing structural elements from
|
|
1398
|
-
content changes.
|
|
1399
|
-
|
|
1400
|
-
When color mode is enabled (`use_color: true`), the diff formatter uses a
|
|
1401
|
-
consistent color scheme:
|
|
1402
|
-
|
|
1403
|
-
* **Yellow**: Line numbers and pipe separators
|
|
1404
|
-
* **Red**: Deletion markers (`-`) and removed content
|
|
1405
|
-
* **Green**: Addition markers (`+`) and inserted content
|
|
1406
|
-
* **Default terminal color**: Unchanged context lines (no ANSI codes applied)
|
|
1407
|
-
|
|
1408
|
-
This color scheme helps differentiate between:
|
|
1409
|
-
|
|
1410
|
-
* The diff structure (line numbers, pipes)
|
|
1411
|
-
* Content that was removed (red)
|
|
1412
|
-
* Content that was added (green)
|
|
1413
|
-
* Content that stayed the same (your terminal's default color)
|
|
1414
|
-
|
|
1415
|
-
.Example colored diff output
|
|
1416
|
-
[example]
|
|
1417
|
-
In a colored terminal, a typical diff line appears as:
|
|
1418
|
-
|
|
1419
|
-
[source]
|
|
1420
|
-
----
|
|
1421
|
-
5| 5 | <p>First paragraph</p> # Context line (yellow numbers/pipes, default text)
|
|
1422
|
-
6| -| <old>Text</old> # Deletion (yellow numbers/pipes, red marker/content)
|
|
1423
|
-
| 6+| <new>Text</new> # Addition (yellow numbers/pipes, green marker/content)
|
|
1424
|
-
----
|
|
1425
|
-
|
|
1426
|
-
Where:
|
|
1427
|
-
|
|
1428
|
-
* Line numbers (`5`, `6`) are in yellow
|
|
1429
|
-
* Pipe separators (`|`) are in yellow
|
|
1430
|
-
* Markers (`-`, `+`) are in red/green respectively
|
|
1431
|
-
* Changed content is highlighted in red (deletions) or green (additions)
|
|
1432
|
-
* Unchanged content uses your terminal's default color (no forced white/black)
|
|
1433
|
-
|
|
1434
|
-
**Why this matters**: When running tests with RSpec, the framework initially sets
|
|
1435
|
-
output to red. Canon's diff formatter explicitly resets colors to prevent RSpec's
|
|
1436
|
-
red from bleeding into the diff output, ensuring consistent and readable diffs.
|
|
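The reset behavior can be illustrated with standard ANSI escape sequences. This is a minimal sketch: the exact line layout and the `colorize_line` helper are assumptions for illustration, and only the colors follow the scheme listed above.

```ruby
# Standard ANSI SGR escape sequences.
YELLOW = "\e[33m"
RED    = "\e[31m"
GREEN  = "\e[32m"
RESET  = "\e[0m"

# Hypothetical helper: render one diff line with the colour scheme above.
def colorize_line(number, marker, content)
  color = { '-' => RED, '+' => GREEN }.fetch(marker, '') # context lines: no colour
  [
    RESET, # clear any colour inherited from the caller (e.g. RSpec's red)
    "#{YELLOW}#{number.to_s.rjust(4)}#{RESET}",
    " #{color}#{marker}#{RESET}",
    " #{YELLOW}|#{RESET} ",
    "#{color}#{content}#{RESET}"
  ].join
end

puts colorize_line(5, ' ', '<p>First paragraph</p>')
puts colorize_line(6, '-', '<old>Text</old>')
puts colorize_line(6, '+', '<new>Text</new>')
```

Because every line starts with a reset and every colored segment ends with one, a surrounding color cannot leak into the context lines.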
1437
|
-
|
|
1438
|
-
===== Whitespace visualization
|
|
1439
|
-
|
|
1440
|
-
**Purpose**: Make invisible whitespace and special characters visible in diffs.
|
|
1441
|
-
|
|
1442
|
-
Whitespace changes can be difficult to spot in traditional diffs because spaces,
|
|
1443
|
-
tabs, and other invisible characters don't appear in output. Canon visualizes
|
|
1444
|
-
these changes using a comprehensive set of Unicode symbols that are safe for use
|
|
1445
|
-
with CJK (Chinese, Japanese, Korean) text.
|
|
1446
|
-
|
|
1447
|
-
**Visualization scope**: Character visualization is applied only to **diff lines**
|
|
1448
|
-
(additions, deletions, and changes), not to context lines (unchanged lines). This
|
|
1449
|
-
ensures that:
|
|
1450
|
-
|
|
1451
|
-
* Context lines display content in its original form without substitution
|
|
1452
|
-
* Only actual changes show visualization, making differences easier to spot
|
|
1453
|
-
* Within changed lines showing token-level diffs, unchanged tokens are displayed
|
|
1454
|
-
in the terminal's default color (not red/green) to distinguish them from actual
|
|
1455
|
-
changes
|
|
1456
|
-
|
|
1457
|
-
====== Default character visualization map
|
|
1458
|
-
|
|
1459
|
-
Canon provides a comprehensive CJK-safe character mapping for common non-visible
|
|
1460
|
-
characters encountered in diffs:
|
|
1461
|
-
|
|
1462
|
-
NOTE: These visualization symbols appear **only in diff lines** (additions,
|
|
1463
|
-
deletions, and changes), not in context lines (unchanged lines).
|
|
1464
|
-
|
|
1465
|
-
.Common whitespace characters
|
|
1466
|
-
[cols="1,1,1,2"]
|
|
1467
|
-
|===
|
|
1468
|
-
|Character |Unicode |Symbol |Description
|
|
1469
|
-
|
|
1470
|
-
|Regular space
|
|
1471
|
-
|U+0020
|
|
1472
|
-
|`░`
|
|
1473
|
-
|Light Shade (U+2591)
|
|
1474
|
-
|
|
1475
|
-
|Tab
|
|
1476
|
-
|U+0009
|
|
1477
|
-
|`⇥`
|
|
1478
|
-
|Rightwards Arrow to Bar (U+21E5)
|
|
1479
|
-
|
|
1480
|
-
|Non-breaking space
|
|
1481
|
-
|U+00A0
|
|
1482
|
-
|`␣`
|
|
1483
|
-
|Open Box (U+2423)
|
|
1484
|
-
|===
|
|
1485
|
-
|
|
1486
|
-
.Line endings
|
|
1487
|
-
[cols="1,1,1,2"]
|
|
1488
|
-
|===
|
|
1489
|
-
|Character |Unicode |Symbol |Description
|
|
1490
|
-
|
|
1491
|
-
|Line feed (LF)
|
|
1492
|
-
|U+000A
|
|
1493
|
-
|`↵`
|
|
1494
|
-
|Downwards Arrow with Corner Leftwards (U+21B5)
|
|
1495
|
-
|
|
1496
|
-
|Carriage return (CR)
|
|
1497
|
-
|U+000D
|
|
1498
|
-
|`⏎`
|
|
1499
|
-
|Return Symbol (U+23CE)
|
|
1500
|
-
|
|
1501
|
-
|Windows line ending (CRLF)
|
|
1502
|
-
|U+000D U+000A
|
|
1503
|
-
|`↵`
|
|
1504
|
-
|Downwards Arrow with Corner Leftwards (U+21B5)
|
|
1505
|
-
|
|
1506
|
-
|Next line (NEL)
|
|
1507
|
-
|U+0085
|
|
1508
|
-
|`⏎`
|
|
1509
|
-
|Return Symbol (U+23CE)
|
|
1510
|
-
|
|
1511
|
-
|Line separator
|
|
1512
|
-
|U+2028
|
|
1513
|
-
|`⤓`
|
|
1514
|
-
|Downwards Arrow to Bar (U+2913)
|
|
1515
|
-
|
|
1516
|
-
|Paragraph separator
|
|
1517
|
-
|U+2029
|
|
1518
|
-
|`⤓`
|
|
1519
|
-
|Downwards Arrow to Bar (U+2913)
|
|
1520
|
-
|===
|
|
1521
|
-
|
|
1522
|
-
.Unicode spaces (various widths)
|
|
1523
|
-
[cols="1,1,1,2"]
|
|
1524
|
-
|===
|
|
1525
|
-
|Character |Unicode |Symbol |Description
|
|
1526
|
-
|
|
1527
|
-
|En space
|
|
1528
|
-
|U+2002
|
|
1529
|
-
|`▭`
|
|
1530
|
-
|White Rectangle (U+25AD)
|
|
1531
|
-
|
|
1532
|
-
|Em space
|
|
1533
|
-
|U+2003
|
|
1534
|
-
|`▬`
|
|
1535
|
-
|Black Rectangle (U+25AC)
|
|
1536
|
-
|
|
1537
|
-
|Four-per-em space
|
|
1538
|
-
|U+2005
|
|
1539
|
-
|`⏓`
|
|
1540
|
-
|Metrical Short Over Long (U+23D3)
|
|
1541
|
-
|
|
1542
|
-
|Six-per-em space
|
|
1543
|
-
|U+2006
|
|
1544
|
-
|`⏕`
|
|
1545
|
-
|Metrical Two Shorts Over Long (U+23D5)
|
|
1546
|
-
|
|
1547
|
-
|Thin space
|
|
1548
|
-
|U+2009
|
|
1549
|
-
|`▯`
|
|
1550
|
-
|White Vertical Rectangle (U+25AF)
|
|
1551
|
-
|
|
1552
|
-
|Hair space
|
|
1553
|
-
|U+200A
|
|
1554
|
-
|`▮`
|
|
1555
|
-
|Black Vertical Rectangle (U+25AE)
|
|
1556
|
-
|
|
1557
|
-
|Figure space
|
|
1558
|
-
|U+2007
|
|
1559
|
-
|`□`
|
|
1560
|
-
|White Square (U+25A1)
|
|
1561
|
-
|
|
1562
|
-
|Narrow no-break space
|
|
1563
|
-
|U+202F
|
|
1564
|
-
|`▫`
|
|
1565
|
-
|White Small Square (U+25AB)
|
|
1566
|
-
|
|
1567
|
-
|Medium mathematical space
|
|
1568
|
-
|U+205F
|
|
1569
|
-
|`▭`
|
|
1570
|
-
|White Rectangle (U+25AD)
|
|
1571
|
-
|
|
1572
|
-
|Ideographic space
|
|
1573
|
-
|U+3000
|
|
1574
|
-
|`⎵`
|
|
1575
|
-
|Bottom Square Bracket (U+23B5)
|
|
1576
|
-
|
|
1577
|
-
|Ideographic half fill space
|
|
1578
|
-
|U+303F
|
|
1579
|
-
|`⏑`
|
|
1580
|
-
|Metrical Breve (U+23D1)
|
|
1581
|
-
|
|
1582
|
-
|===
|
|
1583
|
-
|
|
1584
|
-
.Zero-width characters (invisible troublemakers)
|
|
1585
|
-
[cols="1,1,1,2"]
|
|
1586
|
-
|===
|
|
1587
|
-
|Character |Unicode |Symbol |Description
|
|
1588
|
-
|
|
1589
|
-
|Zero-width space
|
|
1590
|
-
|U+200B
|
|
1591
|
-
|`→`
|
|
1592
|
-
|Rightwards Arrow (U+2192)
|
|
1593
|
-
|
|
1594
|
-
|Zero-width non-joiner
|
|
1595
|
-
|U+200C
|
|
1596
|
-
|`↛`
|
|
1597
|
-
|Rightwards Arrow with Stroke (U+219B)
|
|
1598
|
-
|
|
1599
|
-
|Zero-width joiner
|
|
1600
|
-
|U+200D
|
|
1601
|
-
|`⇢`
|
|
1602
|
-
|Rightwards Dashed Arrow (U+21E2)
|
|
1603
|
-
|
|
1604
|
-
|Zero-width no-break space (BOM)
|
|
1605
|
-
|U+FEFF
|
|
1606
|
-
|`⇨`
|
|
1607
|
-
|Rightwards White Arrow (U+21E8)
|
|
1608
|
-
|===
|
|
1609
|
-
|
|
1610
|
-
.Bidirectional/RTL markers
|
|
1611
|
-
[cols="1,1,1,2"]
|
|
1612
|
-
|===
|
|
1613
|
-
|Character |Unicode |Symbol |Description
|
|
1614
|
-
|
|
1615
|
-
|Left-to-right mark
|
|
1616
|
-
|U+200E
|
|
1617
|
-
|`⟹`
|
|
1618
|
-
|Long Rightwards Double Arrow (U+27F9)
|
|
1619
|
-
|
|
1620
|
-
|Right-to-left mark
|
|
1621
|
-
|U+200F
|
|
1622
|
-
|`⟸`
|
|
1623
|
-
|Long Leftwards Double Arrow (U+27F8)
|
|
1624
|
-
|
|
1625
|
-
|LTR embedding
|
|
1626
|
-
|U+202A
|
|
1627
|
-
|`⇒`
|
|
1628
|
-
|Rightwards Double Arrow (U+21D2)
|
|
1629
|
-
|
|
1630
|
-
|RTL embedding
|
|
1631
|
-
|U+202B
|
|
1632
|
-
|`⇐`
|
|
1633
|
-
|Leftwards Double Arrow (U+21D0)
|
|
1634
|
-
|
|
1635
|
-
|Pop directional formatting
|
|
1636
|
-
|U+202C
|
|
1637
|
-
|`↔`
|
|
1638
|
-
|Left Right Arrow (U+2194)
|
|
1639
|
-
|
|
1640
|
-
|LTR override
|
|
1641
|
-
|U+202D
|
|
1642
|
-
|`⇉`
|
|
1643
|
-
|Rightwards Paired Arrows (U+21C9)
|
|
1644
|
-
|
|
1645
|
-
|RTL override
|
|
1646
|
-
|U+202E
|
|
1647
|
-
|`⇇`
|
|
1648
|
-
|Leftwards Paired Arrows (U+21C7)
|
|
1649
|
-
|===
|
|
1650
|
-
|
|
1651
|
-
.Control characters
|
|
1652
|
-
[cols="1,1,1,2"]
|
|
1653
|
-
|===
|
|
1654
|
-
|Character |Unicode |Symbol |Description
|
|
1655
|
-
|
|
1656
|
-
|Null
|
|
1657
|
-
|U+0000
|
|
1658
|
-
|`␀`
|
|
1659
|
-
|Symbol for Null (U+2400)
|
|
1660
|
-
|
|
1661
|
-
|Soft hyphen
|
|
1662
|
-
|U+00AD
|
|
1663
|
-
|`‐`
|
|
1664
|
-
|Hyphen (U+2010)
|
|
1665
|
-
|
|
1666
|
-
|Backspace
|
|
1667
|
-
|U+0008
|
|
1668
|
-
|`␈`
|
|
1669
|
-
|Symbol for Backspace (U+2408)
|
|
1670
|
-
|
|
1671
|
-
|Delete
|
|
1672
|
-
|U+007F
|
|
1673
|
-
|`␡`
|
|
1674
|
-
|Symbol for Delete (U+2421)
|
|
1675
|
-
|===
|
|
1676
|
-
|
|
1677
|
-
====== CJK safety
|
|
1678
|
-
|
|
1679
|
-
The visualization characters are specifically chosen to avoid conflicts with CJK
|
|
1680
|
-
text:
|
|
1681
|
-
|
|
1682
|
-
* **No middle dots** (`·`) - commonly used as separators in CJK
|
|
1683
|
-
* **No bullets** (`∙`) - used in CJK lists
|
|
1684
|
-
* **No circles** (`◌◍◎`) - look similar to CJK characters like ○ ●
|
|
1685
|
-
* **No small dots** (`⋅`) - conflict with CJK punctuation
|
|
1686
|
-
|
|
1687
|
-
Instead, Canon uses:
|
|
1688
|
-
* Box characters (`□▭▬▯▮▫`) for various space types
|
|
1689
|
-
* Arrow symbols (`→↛⇢⇨⟹⟸⇒⇐`) for zero-width and directional characters
|
|
1690
|
-
* Control Pictures block symbols (`␀␈␡`) for control characters
|
|
1691
|
-
|
|
1692
|
-
====== Customizing character visualization
|
|
1693
|
-
|
|
1694
|
-
You can customize the character visualization map for your specific needs:
|
|
1695
|
-
|
|
1696
|
-
[source,ruby]
|
|
1697
|
-
----
|
|
1698
|
-
require 'canon/diff_formatter'
|
|
1699
|
-
|
|
1700
|
-
# Create custom visualization map
|
|
1701
|
-
custom_map = Canon::DiffFormatter.merge_visualization_map({
|
|
1702
|
-
' ' => '·', # Use middle dot for spaces (if not using CJK)
|
|
1703
|
-
"\t" => '→', # Use simple arrow for tabs
|
|
1704
|
-
"\u200B" => '⚠' # Warning symbol for zero-width space
|
|
1705
|
-
})
|
|
1706
|
-
|
|
1707
|
-
# Use custom map with formatter
|
|
1708
|
-
formatter = Canon::DiffFormatter.new(
|
|
1709
|
-
use_color: true,
|
|
1710
|
-
visualization_map: custom_map
|
|
1711
|
-
)
|
|
1712
|
-
|
|
1713
|
-
# The custom map merges with defaults, so unspecified
|
|
1714
|
-
# characters still use the default visualization
|
|
1715
|
-
----
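The merge behaviour described in the comments of the example above can be pictured with plain Ruby hashes. This is a minimal sketch, not Canon's implementation: `DEFAULT_MAP`, `merge_map`, and `visualize` are hypothetical names, and the three default symbols are simply the ones used for these characters elsewhere in this section.

```ruby
# Hypothetical defaults, using symbols shown in this section's examples.
DEFAULT_MAP = {
  " "      => "░", # regular space (U+0020)
  "\u00A0" => "␣", # non-breaking space
  "\u200B" => "→"  # zero-width space
}.freeze

# Custom entries win; characters not mentioned keep their default symbol.
def merge_map(custom)
  DEFAULT_MAP.merge(custom)
end

# Replace every mapped character in +text+ with its visible symbol.
def visualize(text, map)
  text.each_char.map { |ch| map.fetch(ch, ch) }.join
end

custom = merge_map("\u200B" => "⚠")
puts visualize("a b\u200Bc\u00A0d", custom)
```

Only the zero-width space is overridden here; the regular and non-breaking spaces still render with their default symbols, which is the behaviour the comments describe.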
|
|
1716
|
-
|
|
1717
|
-
====== Visualization in action
|
|
1718
|
-
|
|
1719
|
-
.Whitespace visualization examples
|
|
1720
|
-
[example]
|
|
1721
|
-
[source]
|
|
1722
|
-
----
|
|
1723
|
-
# Space added between tags
|
|
1724
|
-
10| -| <tag>Value</tag> # No space
|
|
1725
|
-
| 10+| <tag>░Value</tag> # Space added (green light shade)
|
|
1726
|
-
|
|
1727
|
-
# Tab character
|
|
1728
|
-
15| -| <tag>⇥Value</tag> # Tab (red arrow-to-bar)
|
|
1729
|
-
| 15+| <tag>░░Value</tag> # Two spaces (green light shades)
|
|
1730
|
-
|
|
1731
|
-
# Non-breaking space (U+00A0)
|
|
1732
|
-
20| -| <tag>Value</tag> # Regular space
|
|
1733
|
-
| 20+| <tag>Value␣</tag> # Non-breaking space (green open box)
|
|
1734
|
-
|
|
1735
|
-
# Zero-width space (U+200B)
|
|
1736
|
-
25| -| <word1><word2> # No zero-width space
|
|
1737
|
-
| 25+| <word1>→<word2> # Zero-width space (green arrow)
|
|
1738
|
-
|
|
1739
|
-
# Mixed invisible characters
|
|
1740
|
-
30| -| <p>Text▬more</p> # Em space (red black rectangle)
|
|
1741
|
-
| 30+| <p>Text░more</p> # Regular space (green light shade)
|
|
1742
|
-
----
|
|
1743
|
-
|
|
1744
|
-
Visualization symbols are displayed in:
|
|
1745
|
-
|
|
1746
|
-
* Red when showing removed/deleted characters
|
|
1747
|
-
* Green when showing added/inserted characters
|
|
1748
|
-
* Bold to make them more visible
|
|
1749
|
-
|
|
1750
|
-
**When is this useful?**
|
|
1751
|
-
|
|
1752
|
-
1. **Test failures due to formatting**: Your test expects compact XML but receives
|
|
1753
|
-
pretty-printed XML with different indentation
|
|
1754
|
-
|
|
1755
|
-
2. **Mixed whitespace**: Some parts of your code use tabs while others use spaces
|
|
1756
|
-
|
|
1757
|
-
3. **Non-breaking spaces**: Copy-pasted content from browsers often contains
|
|
1758
|
-
U+00A0 instead of regular spaces
|
|
1759
|
-
|
|
1760
|
-
4. **Zero-width characters**: Invisible Unicode characters that cause mysterious
|
|
1761
|
-
comparison failures
|
|
1762
|
-
|
|
1763
|
-
5. **RTL/LTR markers**: Bidirectional text markers in internationalized content
|
|
1764
|
-
|
|
1765
|
-
6. **Template differences**: Generated output has invisible character differences
|
|
1766
|
-
|
|
1767
|
-
.Real-world example: Non-breaking space from web copy-paste
|
|
1768
|
-
[example]
|
|
1769
|
-
Without whitespace visualization, these two lines look identical:
|
|
1770
|
-
|
|
1771
|
-
[source,xml]
|
|
1772
|
-
----
|
|
1773
|
-
<foreword id="fwd">
|
|
1774
|
-
<foreword id="fwd">
|
|
1775
|
-
----
|
|
1776
|
-
|
|
1777
|
-
With whitespace visualization enabled, the difference is immediately visible:
|
|
1778
|
-
|
|
1779
|
-
[source]
|
|
1780
|
-
----
|
|
1781
|
-
4| -| <foreword░id="fwd"> # Regular space (U+0020)
|
|
1782
|
-
| 4+| <foreword␣id="fwd"> # Non-breaking space (U+00A0)
|
|
1783
|
-
----
|
|
1784
|
-
|
|
1785
|
-
The different symbols (`░` vs `␣`) clearly show that one uses a regular space
|
|
1786
|
-
while the other uses a non-breaking space, likely from copying text from a web
|
|
1787
|
-
page or word processor.
|
|
1788
|
-
|
|
1789
|
-
.Real-world example: Zero-width characters
|
|
1790
|
-
[example]
|
|
1791
|
-
Zero-width characters are completely invisible but affect comparison:
|
|
1792
|
-
|
|
1793
|
-
[source,xml]
|
|
1794
|
-
----
|
|
1795
|
-
<item>Widget</item>
|
|
1796
|
-
<item>Widget</item> <!-- Contains U+200B zero-width space after "Widget" -->
|
|
1797
|
-
----
|
|
1798
|
-
|
|
1799
|
-
The diff shows:
|
|
1800
|
-
|
|
1801
|
-
[source]
|
|
1802
|
-
----
|
|
1803
|
-
5| -| <item>Widget</item>
|
|
1804
|
-
| 5+| <item>Widget→</item> # Zero-width space visualized as →
|
|
1805
|
-
----
|
|
1806
|
-
|
|
1807
|
-
The rightwards arrow (`→`) reveals the presence of a zero-width space that would
|
|
1808
|
-
otherwise be impossible to detect.
|
|
1809
|
-
|
|
1810
|
-
===== Non-ASCII character detection
|
|
1811
|
-
|
|
1812
|
-
**Purpose**: Alert users when diffs contain non-ASCII characters that might cause
|
|
1813
|
-
unexpected comparison failures or encoding issues.
|
|
1814
|
-
|
|
1815
|
-
When Canon detects non-ASCII characters (any character with Unicode codepoint >
|
|
1816
|
-
U+007F) in a diff block, it displays a yellow warning with the specific
|
|
1817
|
-
characters and their Unicode codepoints.
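The detection rule (any codepoint above U+007F) is simple enough to sketch in a few lines of plain Ruby. The helper names `non_ascii_chars` and `non_ascii_warning` are hypothetical, and the message layout is only modelled on the warning text shown in this section; Canon's own code may differ.

```ruby
# Collect the distinct non-ASCII characters (codepoint > U+007F) in a string.
def non_ascii_chars(text)
  text.each_char.select { |ch| ch.ord > 0x7F }.uniq
end

# Build a warning line listing each character with its codepoint,
# or return nil when the text is pure ASCII.
def non_ascii_warning(text)
  found = non_ascii_chars(text)
  return nil if found.empty?

  listed = found.map { |ch| format("'%s' (U+%04X)", ch, ch.ord) }
  "(WARNING: non-ASCII characters detected in diff: [#{listed.join(', ')}])"
end

puts non_ascii_warning("<p>Hello\u00A0world \u2014 ok</p>")
```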
|
|
1818
|
-
|
|
1819
|
-
.Non-ASCII warning format
|
|
1820
|
-
[example]
|
|
1821
|
-
[source]
|
|
1822
|
-
----
|
|
1823
|
-
Context block has 3 diffs (lines 10-25):
|
|
1824
|
-
(WARNING: non-ASCII characters detected in diff: [' ' (U+00A0, shown as: ␣), '—' (U+2014, shown as: —)])
|
|
1825
|
-
|
|
1826
|
-
10| -| <p>Hello░world</p>
|
|
1827
|
-
| 10+| <p>Hello␣world</p> # Contains non-breaking space (U+00A0)
|
|
1828
|
-
15| -| <p>Text - more text</p>
|
|
1829
|
-
| 15+| <p>Text — more text</p> # Contains em dash (U+2014)
|
|
1830
|
-
----
|
|
1831
|
-
|
|
1832
|
-
The warning appears immediately after the "Context block has X diffs" header.
|
|
1833
|
-
|
|
1834
|
-
**Common non-ASCII characters in diffs**:
|
|
1835
|
-
|
|
1836
|
-
|===
|
|
1837
|
-
|Character |Unicode |Name |Common source
|
|
1838
|
-
|
|
1839
|
-
|` ` (looks like space)
|
|
1840
|
-
|U+00A0
|
|
1841
|
-
|Non-breaking space
|
|
1842
|
-
|Copy-paste from web browsers, word processors
|
|
1843
|
-
|
|
1844
|
-
|`—`
|
|
1845
|
-
|U+2014
|
|
1846
|
-
|Em dash
|
|
1847
|
-
|Word processors, smart quotes enabled
|
|
1848
|
-
|
|
1849
|
-
|`–`
|
|
1850
|
-
|U+2013
|
|
1851
|
-
|En dash
|
|
1852
|
-
|Word processors, smart quotes enabled
|
|
1853
|
-
|
|
1854
|
-
|`‘` `’`
|
|
1855
|
-
|U+2018, U+2019
|
|
1856
|
-
|Smart single quotes
|
|
1857
|
-
|Word processors, text editors with smart quotes
|
|
1858
|
-
|
|
1859
|
-
|`“` `”`
|
|
1860
|
-
|U+201C, U+201D
|
|
1861
|
-
|Smart double quotes
|
|
1862
|
-
|Word processors, text editors with smart quotes
|
|
1863
|
-
|
|
1864
|
-
|`…`
|
|
1865
|
-
|U+2026
|
|
1866
|
-
|Ellipsis
|
|
1867
|
-
|Word processors
|
|
1868
|
-
|
|
1869
|
-
|Various
|
|
1870
|
-
|U+2000-U+200B
|
|
1871
|
-
|Various spaces
|
|
1872
|
-
|HTML entities, special formatting
|
|
1873
|
-
|===
|
|
1874
|
-
|
|
1875
|
-
**Why this matters**:
|
|
1876
|
-
|
|
1877
|
-
1. **Invisible differences**: Many non-ASCII characters look identical to their
|
|
1878
|
-
ASCII equivalents but cause comparison failures
|
|
1879
|
-
|
|
1880
|
-
2. **Encoding issues**: Non-ASCII characters may behave differently across
|
|
1881
|
-
systems with different encodings
|
|
1882
|
-
|
|
1883
|
-
3. **Copy-paste errors**: Content copied from browsers or documents often
|
|
1884
|
-
includes non-breaking spaces instead of regular spaces
|
|
1885
|
-
|
|
1886
|
-
4. **Smart quotes**: Text editors may automatically convert straight quotes to
|
|
1887
|
-
curly quotes
|
|
1888
|
-
|
|
1889
|
-
.Practical example
|
|
1890
|
-
[example]
|
|
1891
|
-
A test fails because the expected output was copied from a web page:
|
|
1892
|
-
|
|
1893
|
-
[source,ruby]
|
|
1894
|
-
----
|
|
1895
|
-
# Expected (copied from documentation website - contains U+00A0)
|
|
1896
|
-
expected = '<p>Hello world</p>' # Space between "Hello" and "world" is U+00A0
|
|
1897
|
-
|
|
1898
|
-
# Actual (generated by code - contains regular space)
|
|
1899
|
-
actual = '<p>Hello world</p>' # Space is U+0020
|
|
1900
|
-
|
|
1901
|
-
expect(actual).to be_xml_equivalent_to(expected)
|
|
1902
|
-
# FAILS: Documents appear identical but contain different space characters
|
|
1903
|
-
----
|
|
1904
|
-
|
|
1905
|
-
Canon's diff output shows:
|
|
1906
|
-
|
|
1907
|
-
[source]
|
|
1908
|
-
----
|
|
1909
|
-
Context block has 1 diff (line 1):
|
|
1910
|
-
(WARNING: non-ASCII characters detected in diff: [' ' (U+00A0)])
|
|
1911
|
-
|
|
1912
|
-
1| -| <p>Hello world</p> # U+0020 (regular space)
|
|
1913
|
-
| 1+| <p>Hello␣world</p> # U+00A0 (non-breaking space, shown as open box)
|
|
1914
|
-
----
|
|
1915
|
-
|
|
1916
|
-
The warning alerts you to check for non-breaking spaces, and the whitespace
|
|
1917
|
-
visualization shows where the difference occurs.
|
|
1918
|
-
|
|
1919
|
-
===== Configuration and usage
|
|
1920
|
-
|
|
1921
|
-
All enhanced diff features are enabled by default when `use_color` is `true` and
|
|
1922
|
-
automatically applied across all Canon interfaces:
|
|
1923
|
-
|
|
1924
|
-
[source,ruby]
|
|
1925
|
-
----
|
|
1926
|
-
# RSpec matchers (automatically enabled)
|
|
1927
|
-
expect(xml1).to be_xml_equivalent_to(xml2)
|
|
1928
|
-
# Output includes: colored line numbers, whitespace visualization, non-ASCII warnings
|
|
1929
|
-
|
|
1930
|
-
# CLI (enabled by default)
|
|
1931
|
-
$ canon diff file1.xml file2.xml --verbose
|
|
1932
|
-
# Output includes all enhanced features
|
|
1933
|
-
|
|
1934
|
-
# Ruby API (controlled by use_color parameter)
|
|
1935
|
-
formatter = Canon::DiffFormatter.new(use_color: true) # Enhanced features enabled
|
|
1936
|
-
formatter = Canon::DiffFormatter.new(use_color: false) # Plain text only
|
|
1937
|
-
----
|
|
1938
|
-
|
|
1939
|
-
To disable colored output (and all color-dependent enhancements):
|
|
1940
|
-
|
|
1941
|
-
[source,ruby]
|
|
1942
|
-
----
|
|
1943
|
-
# RSpec
|
|
1944
|
-
Canon::RspecMatchers.use_color = false
|
|
1945
|
-
|
|
1946
|
-
# CLI
|
|
1947
|
-
$ canon diff file1.xml file2.xml --no-color --verbose
|
|
1948
|
-
|
|
1949
|
-
# Ruby API
|
|
1950
|
-
formatter = Canon::DiffFormatter.new(use_color: false)
|
|
1951
|
-
----
|
|
1952
|
-
|
|
1953
|
-
When `use_color` is `false`:
|
|
1954
|
-
|
|
1955
|
-
* Line numbers and pipes are plain text
|
|
1956
|
-
* Whitespace is not visualized (remains invisible)
|
|
1957
|
-
* Non-ASCII warnings are still shown (but without yellow color)
|
|
1958
|
-
* Content changes are shown without color highlighting
|
|
1959
|
-
|
|
1960
|
-
=== Input validation
|
|
1961
|
-
|
|
1962
|
-
Canon provides comprehensive input validation for all supported formats (XML,
|
|
1963
|
-
HTML, JSON, YAML). When malformed input is detected, Canon raises a
|
|
1964
|
-
`Canon::ValidationError` with detailed location information to help you quickly
|
|
1965
|
-
identify and fix the problem.
|
|
1966
|
-
|
|
1967
|
-
==== Purpose
|
|
1968
|
-
|
|
1969
|
-
Input validation ensures that:
|
|
1970
|
-
|
|
1971
|
-
* Malformed documents are detected early with clear error messages
|
|
1972
|
-
* Syntax errors show exact line and column numbers
|
|
1973
|
-
* Error details appear in RSpec test output (not hidden in log files)
|
|
1974
|
-
* Users receive actionable feedback about what's wrong and where
|
|
1975
|
-
|
|
1976
|
-
==== How it works
|
|
1977
|
-
|
|
1978
|
-
Canon validates input **before parsing** using format-specific validators:
|
|
1979
|
-
|
|
1980
|
-
* `Canon::Validators::XmlValidator` - Strict XML syntax validation
|
|
1981
|
-
* `Canon::Validators::HtmlValidator` - HTML5 and XHTML validation
|
|
1982
|
-
* `Canon::Validators::JsonValidator` - JSON syntax validation
|
|
1983
|
-
* `Canon::Validators::YamlValidator` - YAML syntax validation
|
|
1984
|
-
|
|
1985
|
-
Validation happens automatically when you use Canon's formatters or comparison
|
|
1986
|
-
methods.
|
|
1987
|
-
|
|
1988
|
-
==== Validation error format
|
|
1989
|
-
|
|
1990
|
-
When validation fails, Canon raises `Canon::ValidationError` with:
|
|
1991
|
-
|
|
1992
|
-
* `format` - The format being validated (`:xml`, `:html`, `:json`, `:yaml`)
|
|
1993
|
-
* `line` - Line number where the error occurred (if available)
|
|
1994
|
-
* `column` - Column number where the error occurred (if available)
|
|
1995
|
-
* `details` - Additional context about the error
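To make the shape of such an error concrete, here is a minimal sketch of an exception class carrying the four attributes listed above. `SketchValidationError` is a hypothetical stand-in, not `Canon::ValidationError` itself, and the message layout is only modelled on the example output in this section.

```ruby
# Hypothetical stand-in for an error that carries location data.
class SketchValidationError < StandardError
  attr_reader :format, :line, :column, :details

  def initialize(message, format:, line: nil, column: nil, details: nil)
    @format = format
    @line = line
    @column = column
    @details = details
    super(build_message(message))
  end

  private

  # Append only the location fields that are actually known.
  def build_message(message)
    parts = ["#{format.to_s.upcase} Validation Error: #{message}"]
    parts << "  Line: #{line}" if line
    parts << "  Column: #{column}" if column
    parts << "  Details: #{details}" if details
    parts.join("\n")
  end
end

begin
  raise SketchValidationError.new("Premature end of data",
                                  format: :xml, line: 1, column: 18)
rescue SketchValidationError => e
  puts e.message
  puts "Format: #{e.format.inspect}"
end
```

Keeping `line`, `column`, and `details` optional matches the "(if available)" wording: not every parser reports a position.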
|
|
1996
|
-
|
|
1997
|
-
.Validation error example
|
|
1998
|
-
[example]
|
|
1999
|
-
[source,ruby]
|
|
2000
|
-
----
|
|
2001
|
-
require 'canon'
|
|
2002
|
-
|
|
2003
|
-
malformed_xml = '<root><unclosed>'
|
|
2004
|
-
|
|
2005
|
-
begin
|
|
2006
|
-
Canon.format(malformed_xml, :xml)
|
|
2007
|
-
rescue Canon::ValidationError => e
|
|
2008
|
-
puts e.message
|
|
2009
|
-
# XML Validation Error: Premature end of data in tag unclosed line 1
|
|
2010
|
-
# Line: 1
|
|
2011
|
-
# Column: 18
|
|
2012
|
-
|
|
2013
|
-
puts "Format: #{e.format}" # => :xml
|
|
2014
|
-
puts "Line: #{e.line}" # => 1
|
|
2015
|
-
puts "Column: #{e.column}" # => 18
|
|
2016
|
-
end
|
|
2017
|
-
----
|
|
2018
|
-
|
|
2019
|
-
==== Format-specific validation
|
|
2020
|
-
|
|
2021
|
-
===== XML validation
|
|
2022
|
-
|
|
2023
|
-
Uses Nokogiri's strict XML parsing to detect:
|
|
2024
|
-
|
|
2025
|
-
* Unclosed tags
|
|
2026
|
-
* Mismatched tags
|
|
2027
|
-
* Invalid XML declaration
|
|
2028
|
-
* Malformed attributes
|
|
2029
|
-
* Invalid character references
|
|
2030
|
-
|
|
2031
|
-
.XML validation examples
|
|
2032
|
-
[example]
|
|
2033
|
-
[source,ruby]
|
|
2034
|
-
----
|
|
2035
|
-
# Unclosed tag
|
|
2036
|
-
Canon.format('<root><item>', :xml)
|
|
2037
|
-
# => Canon::ValidationError: XML Validation Error: Premature end of data in tag item line 1
|
|
2038
|
-
# Line: 1
|
|
2039
|
-
|
|
2040
|
-
# Mismatched tags
|
|
2041
|
-
Canon.format('<root><item></root>', :xml)
|
|
2042
|
-
# => Canon::ValidationError: XML Validation Error: Opening and ending tag mismatch: item line 1 and root
|
|
2043
|
-
# Line: 1
|
|
2044
|
-
----
|
|
2045
|
-
|
|
2046
|
-
===== HTML validation
|
|
2047
|
-
|
|
2048
|
-
Automatically detects HTML5 vs XHTML and applies appropriate validation:
|
|
2049
|
-
|
|
2050
|
-
* HTML5: Uses Nokogiri::HTML5 parser with error filtering
|
|
2051
|
-
* XHTML: Uses strict XML parsing
|
|
2052
|
-
|
|
2053
|
-
Special handling:
|
|
2054
|
-
|
|
2055
|
-
* Strips XML declarations from HTML (common in legacy HTML files)
|
|
2056
|
-
* Filters out non-critical HTML5 parser warnings
|
|
2057
|
-
* Only reports significant errors (level 2+)
|
|
2058
|
-
|
|
2059
|
-
.HTML validation examples
|
|
2060
|
-
[example]
|
|
2061
|
-
[source,ruby]
|
|
2062
|
-
----
|
|
2063
|
-
# Malformed XHTML
|
|
2064
|
-
xhtml = '<html xmlns="http://www.w3.org/1999/xhtml"><body><p>Unclosed'
|
|
2065
|
-
Canon.format(xhtml, :html)
|
|
2066
|
-
# => Canon::ValidationError: HTML Validation Error: Premature end of data in tag p line 1
|
|
2067
|
-
# Line: 1
|
|
2068
|
-
|
|
2069
|
-
# HTML5 with errors
|
|
2070
|
-
html5 = '<div><span></div>'
|
|
2071
|
-
Canon.format(html5, :html)
|
|
2072
|
-
# => Canon::ValidationError: HTML Validation Error: Unexpected end tag : span
|
|
2073
|
-
# Line: 1
|
|
2074
|
-
----
|
|
2075
|
-
|
|
2076
|
-
===== JSON validation
|
|
2077
|
-
|
|
2078
|
-
Validates JSON syntax using Ruby's JSON parser:
|
|
2079
|
-
|
|
2080
|
-
* Missing/extra braces or brackets
|
|
2081
|
-
* Trailing commas
|
|
2082
|
-
* Invalid escape sequences
|
|
2083
|
-
* Invalid numbers
|
|
2084
|
-
|
|
2085
|
-
Provides context showing the error location in the JSON structure.
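As a rough illustration of syntax checking with Ruby's standard JSON parser, the sketch below parses the input and returns the parser's message on failure. `json_syntax_error` is a hypothetical helper; how Canon wraps the underlying `JSON::ParserError` is not shown here.

```ruby
require "json"

# Return nil for valid JSON, or the parser's message for malformed input.
def json_syntax_error(text)
  JSON.parse(text)
  nil
rescue JSON::ParserError => e
  e.message
end

puts json_syntax_error('{"key": "value"}').inspect
puts json_syntax_error('{"key": "value"')
```

The exact wording of the parser message varies between versions of the `json` library, so callers should not match on it.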
|
|
2086
|
-
|
|
2087
|
-
.JSON validation examples
|
|
2088
|
-
[example]
|
|
2089
|
-
[source,ruby]
|
|
2090
|
-
----
|
|
2091
|
-
# Missing closing brace
|
|
2092
|
-
Canon.format('{"key": "value"', :json)
|
|
2093
|
-
# => Canon::ValidationError: JSON Validation Error: unexpected token at '{"key": "value"'
|
|
2094
|
-
# Details: Error at position 16
|
|
2095
|
-
|
|
2096
|
-
# Trailing comma (invalid in JSON)
|
|
2097
|
-
Canon.format('{"a": 1,}', :json)
|
|
2098
|
-
# => Canon::ValidationError: JSON Validation Error: unexpected token at '{"a": 1,}'
|
|
2099
|
-
# Details: Error at position 8
|
|
2100
|
-
----
|
|
2101
|
-
|
|
2102
|
-
===== YAML validation
|
|
2103
|
-
|
|
2104
|
-
Validates YAML syntax using Psych (Ruby's YAML parser):
|
|
2105
|
-
|
|
2106
|
-
* Invalid indentation
|
|
2107
|
-
* Unclosed brackets/braces
|
|
2108
|
-
* Invalid anchors/aliases
|
|
2109
|
-
* Type mismatches
|
|
2110
|
-
|
|
2111
|
-
Shows error location with line numbers and context.
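Psych reports the position of a syntax error on the exception itself, which is what makes line-level reporting possible. The sketch below assumes nothing about Canon's validator; `yaml_syntax_error` is a hypothetical helper that exposes the `line`, `column`, and `problem` readers of `Psych::SyntaxError`.

```ruby
require "yaml"

# Return nil for valid YAML, or a hash describing where parsing failed.
def yaml_syntax_error(text)
  Psych.parse(text) # builds the syntax tree only; no objects are instantiated
  nil
rescue Psych::SyntaxError => e
  { line: e.line, column: e.column, problem: e.problem }
end

p yaml_syntax_error("a: 1\nb: 2")
p yaml_syntax_error("key: {unclosed")
```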
|
|
2112
|
-
|
|
2113
|
-
.YAML validation examples
|
|
2114
|
-
[example]
|
|
2115
|
-
[source,ruby]
|
|
2116
|
-
----
|
|
2117
|
-
# Unclosed bracket
|
|
2118
|
-
Canon.format("key: {unclosed", :yaml)
|
|
2119
|
-
# => Canon::ValidationError: YAML Validation Error: (<unknown>): did not find expected node content...
|
|
2120
|
-
# Line: 1
|
|
2121
|
-
# Details: Shows context around error
|
|
2122
|
-
|
|
2123
|
-
# Invalid indentation
|
|
2124
|
-
yaml = <<~YAML
|
|
2125
|
-
parent:
|
|
2126
|
-
child: value
|
|
2127
|
-
YAML
|
|
2128
|
-
Canon.format(yaml, :yaml)
|
|
2129
|
-
# => Canon::ValidationError: YAML Validation Error: mapping values are not allowed in this context
|
|
2130
|
-
# Line: 2
|
|
2131
|
-
----
|
|
2132
|
-
|
|
2133
|
-
==== Validation in RSpec tests
|
|
2134
|
-
|
|
2135
|
-
Canon's RSpec matchers automatically propagate validation errors to test output,
|
|
2136
|
-
making it easy to see what's wrong:
|
|
2137
|
-
|
|
2138
|
-
.RSpec validation error example
|
|
2139
|
-
[example]
|
|
2140
|
-
[source,ruby]
|
|
2141
|
-
----
|
|
2142
|
-
require 'canon/rspec_matchers'
|
|
2143
|
-
|
|
2144
|
-
RSpec.describe 'XML validation' do
|
|
2145
|
-
it 'validates input' do
|
|
2146
|
-
malformed_xml = '<root><unclosed>'
|
|
2147
|
-
expected_xml = '<root><item/></root>'
|
|
2148
|
-
|
|
2149
|
-
# This will fail with a clear validation error message
|
|
2150
|
-
expect(malformed_xml).to be_xml_equivalent_to(expected_xml)
|
|
2151
|
-
end
|
|
2152
|
-
end
|
|
2153
|
-
|
|
2154
|
-
# Test output shows:
|
|
2155
|
-
# Canon::ValidationError:
|
|
2156
|
-
# XML Validation Error: Premature end of data in tag unclosed line 1
|
|
2157
|
-
# Line: 1
|
|
2158
|
-
# Column: 18
|
|
2159
|
-
----
|
|
2160
|
-
|
|
2161
|
-
The error appears directly in the RSpec output, not hidden in separate error
|
|
2162
|
-
files or logs.
|
|
2163
|
-
|
|
2164
|
-
==== Validation in comparison
|
|
2165
|
-
|
|
2166
|
-
Validation also occurs when using `Canon::Comparison.equivalent?`:
|
|
2167
|
-
|
|
2168
|
-
.Comparison validation example
|
|
2169
|
-
[example]
|
|
2170
|
-
[source,ruby]
|
|
2171
|
-
----
|
|
2172
|
-
require 'canon/comparison'
|
|
2173
|
-
|
|
2174
|
-
xml1 = '<root><item/></root>'
|
|
2175
|
-
xml2 = '<root><unclosed>'
|
|
2176
|
-
|
|
2177
|
-
Canon::Comparison.equivalent?(xml1, xml2)
|
|
2178
|
-
# => Canon::ValidationError: XML Validation Error: Premature end of data in tag unclosed line 1
|
|
2179
|
-
# Line: 1
|
|
2180
|
-
# Column: 18
|
|
2181
|
-
----
|
|
2182
|
-
|
|
2183
|
-
==== Benefits
|
|
2184
|
-
|
|
2185
|
-
Input validation provides several key benefits:
|
|
2186
|
-
|
|
2187
|
-
**Early error detection**:: Problems are caught before processing begins, saving
|
|
2188
|
-
time and providing clear feedback
|
|
2189
|
-
|
|
2190
|
-
**Precise error location**:: Line and column numbers pinpoint exactly where the
|
|
2191
|
-
problem is, especially useful in large documents
|
|
2192
|
-
|
|
2193
|
-
**Clear error messages**:: Descriptive messages explain what's wrong and often
|
|
2194
|
-
suggest how to fix it
|
|
2195
|
-
|
|
2196
|
-
**Test-friendly**:: Errors appear in RSpec output where developers expect them,
|
|
2197
|
-
not in separate log files
|
|
2198
|
-
|
|
2199
|
-
**Format-aware**:: Each validator understands format-specific rules and provides
|
|
2200
|
-
relevant error details
|
|
2201
|
-
|
|
2202
|
-
|
|
2203
|
-
=== RSpec matchers
|
|
2204
|
-
|
|
2205
|
-
Canon provides RSpec matchers for testing equivalence between serialized formats. All matchers
|
|
2206
|
-
use canonical (c14n) mode for comparison.
|
|
2207
|
-
|
|
2208
|
-
See <<Diff formatting configuration>> for details on configuring diff output
|
|
2209
|
-
in RSpec matchers.
|
|
2210
|
-
|
|
2211
|
-
.RSpec matcher examples
|
|
2212
|
-
[example]
|
|
2213
|
-
====
|
|
2214
|
-
[source,ruby]
|
|
2215
|
-
----
|
|
2216
|
-
require 'rspec'
|
|
2217
|
-
require 'canon'
|
|
2218
|
-
|
|
2219
|
-
RSpec.describe 'Serialization equivalence' do
|
|
2220
|
-
# Unified matcher with format parameter
|
|
2221
|
-
it 'compares XML' do
|
|
2222
|
-
xml1 = '<root><a>1</a><b>2</b></root>'
|
|
2223
|
-
xml2 = '<root> <b>2</b> <a>1</a> </root>'
|
|
2224
|
-
expect(xml1).to be_serialization_equivalent_to(xml2, format: :xml)
|
|
2225
|
-
end
|
|
2226
|
-
|
|
2227
|
-
it 'compares HTML' do
|
|
2228
|
-
html1 = '<div><p>Hello</p></div>'
|
|
2229
|
-
html2 = '<div> <p> Hello </p> </div>'
|
|
2230
|
-
expect(html1).to be_serialization_equivalent_to(html2, format: :html)
|
|
2231
|
-
end
|
|
2232
|
-
|
|
2233
|
-
it 'compares JSON' do
|
|
2234
|
-
json1 = '{"a":1,"b":2}'
|
|
2235
|
-
json2 = '{"b":2,"a":1}'
|
|
2236
|
-
expect(json1).to be_serialization_equivalent_to(json2, format: :json)
|
|
2237
|
-
end
|
|
2238
|
-
|
|
2239
|
-
it 'compares YAML' do
|
|
2240
|
-
yaml1 = "a: 1\nb: 2"
|
|
2241
|
-
yaml2 = "b: 2\na: 1"
|
|
2242
|
-
expect(yaml1).to be_serialization_equivalent_to(yaml2, format: :yaml)
|
|
2243
|
-
end
|
|
2244
|
-
|
|
2245
|
-
# Format-specific matchers (assumes xml1, xml2, etc. are defined in scope, e.g. via `let`)
|
|
2246
|
-
it 'uses format-specific matchers' do
|
|
2247
|
-
expect(xml1).to be_xml_equivalent_to(xml2) # XML
|
|
2248
|
-
expect(xml1).to be_analogous_with(xml2) # XML (legacy)
|
|
2249
|
-
expect(html1).to be_html_equivalent_to(html2) # HTML
|
|
2250
|
-
expect(json1).to be_json_equivalent_to(json2) # JSON
|
|
2251
|
-
expect(yaml1).to be_yaml_equivalent_to(yaml2) # YAML
|
|
2252
|
-
end
|
|
2253
|
-
end
|
|
2254
|
-
----
|
|
2255
|
-
====
|
|
2256
|
-
|
|
2257
|
-
[IMPORTANT]
|
|
2258
|
-
====
|
|
2259
|
-
RSpec matchers always canonicalize both sides before comparing, so:
|
|
2260
|
-
|
|
2261
|
-
* Formatting differences (whitespace, indentation) are ignored
|
|
2262
|
-
* Attribute order in XML/HTML is normalized
|
|
2263
|
-
* Key order in JSON/YAML is normalized
|
|
2264
|
-
* Tests focus on content equality, not formatting
|
|
2265
|
-
====
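The key-order point can be illustrated for JSON with the standard library alone. This is a sketch of the general idea of canonicalizing before comparing, not Canon's algorithm; `canonicalize` and `json_equivalent?` are hypothetical names.

```ruby
require "json"

# Recursively sort hash keys so key order no longer affects the serialized
# form. Arrays keep their order, since element order is meaningful.
def canonicalize(value)
  case value
  when Hash
    value.sort_by { |k, _| k.to_s }.to_h { |k, v| [k, canonicalize(v)] }
  when Array
    value.map { |v| canonicalize(v) }
  else
    value
  end
end

# Two JSON documents are equivalent when their canonical forms match.
def json_equivalent?(a, b)
  JSON.generate(canonicalize(JSON.parse(a))) ==
    JSON.generate(canonicalize(JSON.parse(b)))
end

puts json_equivalent?('{"a":1,"b":2}', '{"b":2,"a":1}')
puts json_equivalent?('{"a":1}', '{"a":2}')
```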
|
|
2266
|
-
|
|
2267
|
-
|
|
2268
|
-
== Command-line interface
|
|
2269
|
-
|
|
2270
|
-
=== Installation
|
|
2271
|
-
|
|
2272
|
-
After installing the gem, the `canon` command will be available:
|
|
2273
|
-
|
|
2274
|
-
[source,bash]
|
|
2275
|
-
----
|
|
2276
|
-
$ gem install canon
|
|
2277
|
-
$ canon --help
|
|
2278
|
-
----
|
|
2279
|
-
|
|
2280
|
-
=== Format command
|
|
2281
|
-
|
|
2282
|
-
The `format` command formats files in XML, HTML, JSON, or YAML.
|
|
2283
|
-
|
|
2284
|
-
==== Output modes
|
|
2285
|
-
|
|
2286
|
-
`pretty` (default):: Human-readable output with indentation (default: 2 spaces)
|
|
2287
|
-
`c14n`:: Canonical form without indentation
|
|
2288
|
-
|
|
2289
|
-
==== Command syntax
|
|
2290
|
-
|
|
2291
|
-
[source,bash]
|
|
2292
|
-
----
|
|
2293
|
-
canon format FILE [OPTIONS]
|
|
2294
|
-
----
|
|
2295
|
-
|
|
2296
|
-
==== Options
|
|
2297
|
-
|
|
2298
|
-
`-f, --format FORMAT`:: Specify format: `xml`, `html`, `json`, or `yaml`
|
|
2299
|
-
(auto-detected from extension if not specified)
|
|
2300
|
-
|
|
2301
|
-
`-m, --mode MODE`:: Output mode: `pretty` (default) or `c14n`
|
|
2302
|
-
|
|
2303
|
-
`-i, --indent N`:: Indentation spaces for pretty mode (default: 2)
|
|
2304
|
-
|
|
2305
|
-
`--indent-type TYPE`:: Indentation type: `space` (default) or `tab`
|
|
2306
|
-
|
|
2307
|
-
`-o, --output FILE`:: Write output to file instead of stdout
|
|
2308
|
-
|
|
2309
|
-
`-c, --with-comments`:: Include comments in canonical XML output
|
|
2310
|
-
|
|
2311
|
-
==== Examples
|
|
2312
|
-
|
|
2313
|
-
[source,bash]
|
|
2314
|
-
----
|
|
2315
|
-
# Pretty-print (default mode)
|
|
2316
|
-
$ canon format input.xml
|
|
2317
|
-
<?xml version="1.0" encoding="UTF-8"?>
|
|
2318
|
-
<root>
|
|
2319
|
-
  <a>1</a>
|
|
2320
|
-
  <b>2</b>
|
|
2321
|
-
</root>
|
|
2322
|
-
|
|
2323
|
-
# Canonical mode (compact)
|
|
2324
|
-
$ canon format input.xml --mode c14n
|
|
2325
|
-
<root><a>1</a><b>2</b></root>
|
|
2326
|
-
|
|
2327
|
-
# Custom indentation
|
|
2328
|
-
$ canon format input.xml --mode pretty --indent 4
|
|
2329
|
-
$ canon format input.json --indent 4
|
|
2330
|
-
|
|
2331
|
-
# Tab indentation
|
|
2332
|
-
$ canon format input.xml --indent-type tab
|
|
2333
|
-
$ canon format input.html --mode pretty --indent-type tab
|
|
2334
|
-
|
|
2335
|
-
# Specify format explicitly
|
|
2336
|
-
$ canon format data.txt --format xml
|
|
2337
|
-
|
|
2338
|
-
# Save to file
|
|
2339
|
-
$ canon format input.xml --output formatted.xml
|
|
2340
|
-
|
|
2341
|
-
# Include XML comments in canonical output
|
|
2342
|
-
$ canon format doc.xml --mode c14n --with-comments
|
|
2343
|
-
|
|
2344
|
-
# HTML files
|
|
2345
|
-
$ canon format page.html
|
|
2346
|
-
$ canon format page.html --mode c14n
|
|
2347
|
-
----
|
|
2348
|
-
|
|
2349
|
-
==== Format detection
|
|
2350
|
-
|
|
2351
|
-
[cols="1,1"]
|
|
2352
|
-
|===
|
|
2353
|
-
|File Extension |Detected Format
|
|
2354
|
-
|
|
2355
|
-
|`.xml`
|
|
2356
|
-
|XML
|
|
2357
|
-
|
|
2358
|
-
|`.html`, `.htm`
|
|
2359
|
-
|HTML
|
|
2360
|
-
|
|
2361
|
-
|`.json`
|
|
2362
|
-
|JSON
|
|
2363
|
-
|
|
2364
|
-
|`.yaml`, `.yml`
|
|
2365
|
-
|YAML
|
|
2366
|
-
|===
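The table amounts to a lookup on the file extension. A minimal sketch follows; `EXTENSION_FORMATS` and `detect_format` are hypothetical names, and the case-insensitive match is an assumption rather than documented behaviour.

```ruby
# Extension-to-format mapping, mirroring the table.
EXTENSION_FORMATS = {
  ".xml"  => :xml,
  ".html" => :html,
  ".htm"  => :html,
  ".json" => :json,
  ".yaml" => :yaml,
  ".yml"  => :yaml
}.freeze

# Return the format symbol for a path, or nil when the extension is unknown
# (in which case the format would have to be given explicitly).
def detect_format(path)
  EXTENSION_FORMATS[File.extname(path).downcase]
end

puts detect_format("page.htm").inspect
puts detect_format("data.txt").inspect
```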
|
|
2367
|
-
|
|
2368
|
-
=== Diff command
|
|
2369
|
-
|
|
2370
|
-
Compare two files using **semantic comparison** that understands the structure of
|
|
2371
|
-
XML, HTML, JSON, and YAML formats. Unlike traditional text-based diff tools,
|
|
2372
|
-
`canon diff` compares the meaning and structure of your data, not just the
|
|
2373
|
-
characters.
|
|
2374
|
-
|
|
2375
|
-
==== Command syntax
|
|
2376
|
-
|
|
2377
|
-
[source,bash]
|
|
2378
|
-
----
|
|
2379
|
-
canon diff FILE1 FILE2 [OPTIONS]
|
|
2380
|
-
----
|
|
2381
|
-
|
|
2382
|
-
==== Diff modes
|
|
2383
|
-
|
|
2384
|
-
Canon supports two diff modes optimized for different use cases:
|
|
2385
|
-
|
|
2386
|
-
===== by-object mode (default for JSON/YAML)
|
|
2387
|
-
|
|
2388
|
-
Compares files **semantically** by their data structure and displays differences
|
|
2389
|
-
as a visual tree showing what changed in the structure.
|
|
2390
|
-
|
|
2391
|
-
Best for::
|
|
2392
|
-
* Configuration files where you care about what values changed
|
|
2393
|
-
* API responses where structure matters
|
|
2394
|
-
* Comparing semantic equivalence across formats
|
|
2395
|
-
|
|
2396
|
-
Features::
|
|
2397
|
-
* Tree visualization with box-drawing characters
|
|
2398
|
-
* Shows only what changed (additions, removals, modifications)
|
|
2399
|
-
* Ignores formatting differences automatically
|
|
2400
|
-
* Color-coded output (red=removed, green=added, yellow=changed)
|
|
2401
|
-
|
|
2402
|
-
===== by-line mode (default for HTML, optional for XML)
|
|
2403
|
-
|
|
2404
|
-
Compares files **line-by-line** after canonicalization, showing traditional
|
|
2405
|
-
diff-style output.
|
|
2406
|
-
|
|
2407
|
-
Best for::
|
|
2408
|
-
* HTML markup where line-level changes matter
|
|
2409
|
-
* Reviewing exact textual differences
|
|
2410
|
-
* When you need to see the full document context
|
|
2411
|
-
|
|
2412
|
-
Features::
|
|
2413
|
-
* Traditional diff format with line numbers
|
|
2414
|
-
* Shows before/after for each change
|
|
2415
|
-
* Better for understanding markup structure changes
|
|
2416
|
-
|
|
2417
|
-
[NOTE]
|
|
2418
|
-
* JSON and YAML always use **by-object** mode
|
|
2419
|
-
* HTML always uses **by-line** mode
|
|
2420
|
-
* XML uses **by-object** mode by default, but can use **by-line** with `--by-line`
|
|
2421
|
-
|
|
2422
|
-
==== Options
|
|
2423
|
-
|
|
2424
|
-
===== Format options
|
|
2425
|
-
|
|
2426
|
-
`-f, --format FORMAT`:: Format for both files: `xml`, `html`, `json`, or `yaml`
|
|
2427
|
-
(auto-detected from extension if not specified)
|
|
2428
|
-
|
|
2429
|
-
`--format1 FORMAT`:: Format for first file (when comparing different formats)
|
|
2430
|
-
|
|
2431
|
-
`--format2 FORMAT`:: Format for second file (when comparing different formats)
|
|
2432
|
-
|
|
2433
|
-
===== Comparison options
|
|
2434
|
-
|
|
2435
|
-
`-v, --verbose`:: Show detailed differences in tree format (default: just show
|
|
2436
|
-
if files differ)
|
|
2437
|
-
|
|
2438
|
-
`--by-line`:: Use line-by-line diff for XML (default: by-object mode)
|
|
2439
|
-
|
|
2440
|
-
`--collapse-whitespace` / `--no-collapse-whitespace`:: Control whitespace
|
|
2441
|
-
normalization in text nodes (default: collapse)
|
|
2442
|
-
|
|
2443
|
-
`--ignore-attr-order` / `--no-ignore-attr-order`:: Control whether attribute/key
|
|
2444
|
-
ordering matters (default: ignore order)
|
|
2445
|
-
|
|
2446
|
-
`--ignore-comments`:: Ignore XML/HTML comments during comparison (overrides
|
|
2447
|
-
`--with-comments`)
|
|
2448
|
-
|
|
2449
|
-
`--ignore-text-nodes`:: Ignore all text node content, only compare structure
|
|
2450
|
-
|
|
2451
|
-
`-c, --with-comments`:: Include comments in comparison (sets `ignore_comments: false`)
|
|
2452
|
-
|
|
2453
|
-
===== Output options
|
|
2454
|
-
|
|
2455
|
-
`--color` / `--no-color`:: Enable/disable colored output (default: enabled)
|
|
2456
|
-
|
|
2457
|
-
==== Examples
|
|
2458
|
-
|
|
2459
|
-
===== Basic comparison
|
|
2460
|
-
|
|
2461
|
-
[source,bash]
|
|
2462
|
-
----
|
|
2463
|
-
# Compare two JSON files (shows if equivalent or different)
|
|
2464
|
-
$ canon diff config1.json config2.json
|
|
2465
|
-
Files are semantically different
|
|
2466
|
-
|
|
2467
|
-
# Compare two XML files
|
|
2468
|
-
$ canon diff file1.xml file2.xml
|
|
2469
|
-
✅ Files are semantically equivalent
|
|
2470
|
-
----
|
|
2471
|
-
|
|
2472
|
-
===== Verbose mode examples
|
|
2473
|
-
|
|
2474
|
-
====== JSON comparison (by-object mode)
|
|
2475
|
-
|
|
2476
|
-
[example]
|
|
2477
|
-
Given these two JSON files:
|
|
2478
|
-
|
|
2479
|
-
.config1.json
|
|
2480
|
-
[source,json]
|
|
2481
|
-
----
|
|
2482
|
-
{
|
|
2483
|
-
"name": "myapp",
|
|
2484
|
-
"version": "1.0.0",
|
|
2485
|
-
"settings": {
|
|
2486
|
-
"debug": true,
|
|
2487
|
-
"port": 8080
|
|
2488
|
-
}
|
|
2489
|
-
}
|
|
2490
|
-
----
|
|
2491
|
-
|
|
2492
|
-
.config2.json
|
|
2493
|
-
[source,json]
|
|
2494
|
-
----
|
|
2495
|
-
{
|
|
2496
|
-
"version": "2.0.0",
|
|
2497
|
-
"name": "myapp",
|
|
2498
|
-
"settings": {
|
|
2499
|
-
"debug": false,
|
|
2500
|
-
"port": 8080
|
|
2501
|
-
}
|
|
2502
|
-
}
|
|
2503
|
-
----
|
|
2504
|
-
|
|
2505
|
-
Running with `--verbose`:
|
|
2506
|
-
|
|
2507
|
-
[source,bash]
|
|
2508
|
-
----
|
|
2509
|
-
$ canon diff config1.json config2.json --verbose
|
|
2510
|
-
Visual Diff:
|
|
2511
|
-
├── settings.debug:
|
|
2512
|
-
│ ├── - true
|
|
2513
|
-
│ └── + false
|
|
2514
|
-
└── version:
|
|
2515
|
-
├── - "1.0.0"
|
|
2516
|
-
└── + "2.0.0"
|
|
2517
|
-
----
|
|
2518
|
-
|
|
2519
|
-
The tree shows:
|
|
2520
|
-
* Key order difference (`version` moved) is ignored
|
|
2521
|
-
* Only semantic changes are shown: `debug` and `version` values changed
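The key-order-insensitive comparison described above can be sketched in a few lines of Ruby. This is an illustration only, not Canon's implementation: the `semantic_diff` helper and its output shape are hypothetical, and it treats a missing key the same as a `null` value.

```ruby
require "json"

# Hypothetical helper: recursively collect value differences between two
# parsed JSON documents. Hashes are compared key by key, so key order
# never produces a difference. Arrays and scalars are compared with ==.
def semantic_diff(left, right, path = [])
  if left.is_a?(Hash) && right.is_a?(Hash)
    (left.keys | right.keys).sort.flat_map do |key|
      semantic_diff(left[key], right[key], path + [key])
    end
  elsif left == right
    []
  else
    [{ path: path.join("."), old: left, new: right }]
  end
end

config1 = JSON.parse('{"name":"myapp","version":"1.0.0","settings":{"debug":true,"port":8080}}')
config2 = JSON.parse('{"version":"2.0.0","name":"myapp","settings":{"debug":false,"port":8080}}')

changes = semantic_diff(config1, config2)
changes.each { |c| puts "#{c[:path]}: - #{c[:old].inspect} / + #{c[:new].inspect}" }
```

Only `settings.debug` and `version` are reported; the moved `version` key contributes nothing because hashes are walked by key rather than by position.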
|
|
2522
|
-
|
|
2523
|
-
====== XML comparison (by-object mode with DOM-guided semantic matching)
|
|
2524
|
-
|
|
2525
|
-
Canon's XML diff uses a **hybrid DOM-guided line diff** that semantically matches
|
|
2526
|
-
elements across documents using identity attributes (such as `id`, `ref`, `name`,
|
|
2527
|
-
`key`) and element paths. This ensures that corresponding elements are compared
|
|
2528
|
-
even when they appear at different line positions in the files.
|
|
2529
|
-
|
|
2530
|
-
[example]
|
|
2531
|
-
Given these two XML files:
|
|
2532
|
-
|
|
2533
|
-
.document1.xml
|
|
2534
|
-
[source,xml]
|
|
2535
|
-
----
|
|
2536
|
-
<standard-document>
|
|
2537
|
-
<preface>
|
|
2538
|
-
<foreword id="fwd">
|
|
2539
|
-
<p>First paragraph</p>
|
|
2540
|
-
</foreword>
|
|
2541
|
-
</preface>
|
|
2542
|
-
<sections>
|
|
2543
|
-
<clause id="scope">
|
|
2544
|
-
<title>Scope</title>
|
|
2545
|
-
</clause>
|
|
2546
|
-
</sections>
|
|
2547
|
-
</standard-document>
|
|
2548
|
-
----
|
|
2549
|
-
|
|
2550
|
-
.document2.xml
|
|
2551
|
-
[source,xml]
|
|
2552
|
-
----
|
|
2553
|
-
<standard-document>
|
|
2554
|
-
<preface>
|
|
2555
|
-
<foreword displayorder="2" id="fwd">
|
|
2556
|
-
<p>First paragraph</p>
|
|
2557
|
-
</foreword>
|
|
2558
|
-
</preface>
|
|
2559
|
-
<sections>
|
|
2560
|
-
<clause id="scope">
|
|
2561
|
-
<title>Scope</title>
|
|
2562
|
-
<p>New content</p>
|
|
2563
|
-
</clause>
|
|
2564
|
-
</sections>
|
|
2565
|
-
</standard-document>
|
|
2566
|
-
----
|
|
2567
|
-
|
|
2568
|
-
Running with `--verbose` using by-object mode (default):
|
|
2569
|
-
|
|
2570
|
-
[source,bash]
|
|
2571
|
-
----
|
|
2572
|
-
$ canon diff document1.xml document2.xml --verbose
|
|
2573
|
-
Visual Diff:
|
|
2574
|
-
├── preface.foreword:
|
|
2575
|
-
│ └── + displayorder="2"
|
|
2576
|
-
└── sections.clause.p:
|
|
2577
|
-
└── + "New content"
|
|
2578
|
-
----
|
|
2579
|
-
|
|
2580
|
-
The DOM-guided diff shows:
|
|
2581
|
-
|
|
2582
|
-
* The `<foreword id="fwd">` elements are **semantically matched** by their `id`
|
|
2583
|
-
attribute, even though they may be at different positions
|
|
2584
|
-
* Only the **added** `displayorder` attribute is shown for foreword
|
|
2585
|
-
* The **added** `<p>` element in clause is shown
|
|
2586
|
-
* Unchanged content is not displayed
|
|
2587
|
-
|
|
2588
|
-
[example]
|
|
2589
|
-
Example with element matching when positions differ:
|
|
2590
|
-
|
|
2591
|
-
.file1.xml
|
|
2592
|
-
[source,xml]
|
|
2593
|
-
----
|
|
2594
|
-
<root>
|
|
2595
|
-
<item id="1" name="Widget" price="10.00"/>
|
|
2596
|
-
<item id="2" name="Gadget" price="20.00"/>
|
|
2597
|
-
</root>
|
|
2598
|
-
----
|
|
2599
|
-
|
|
2600
|
-
.file2.xml
|
|
2601
|
-
[source,xml]
|
|
2602
|
-
----
|
|
2603
|
-
<root>
|
|
2604
|
-
<item price="20.00" name="Gadget" id="2"/>
|
|
2605
|
-
<item id="1" name="Widget" price="15.00"/>
|
|
2606
|
-
</root>
|
|
2607
|
-
----
|
|
2608
|
-
|
|
2609
|
-
Running with `--verbose`:
|
|
2610
|
-
|
|
2611
|
-
[source,bash]
|
|
2612
|
-
----
|
|
2613
|
-
$ canon diff file1.xml file2.xml --verbose
|
|
2614
|
-
Visual Diff:
|
|
2615
|
-
└── root.item[id="1"].price:
|
|
2616
|
-
├── - "10.00"
|
|
2617
|
-
└── + "15.00"
|
|
2618
|
-
----
|
|
2619
|
-
|
|
2620
|
-
The semantic matching shows:
|
|
2621
|
-
|
|
2622
|
-
* Elements are matched by `id` attribute (`id="1"` with `id="1"`, `id="2"` with `id="2"`)
|
|
2623
|
-
* Position changes are ignored (item with `id="2"` moved from second to first)
|
|
2624
|
-
* Attribute reordering is ignored (`id`, `name`, `price` became `price`, `name`, `id` on the item with `id="2"`)
|
|
2625
|
-
* Only the semantic change is shown: `price` value changed for item `id="1"`
|
|
2626
|
-
|
|
2627
|
-
[NOTE]
|
|
2628
|
-
DOM-guided semantic matching features:
|
|
2629
|
-
|
|
2630
|
-
* **Identity attributes**: Matches elements using `id`, `ref`, `name`, or `key` attributes
|
|
2631
|
-
* **Element paths**: Uses the full element path for matching (e.g., `root.item`)
|
|
2632
|
-
* **Token-level highlighting**: Shows differences at semantic token level (element
|
|
2633
|
-
names, attribute names, attribute values)
|
|
2634
|
-
* **Parent filtering**: Skips parent elements that only differ in children to
|
|
2635
|
-
avoid redundant output
|
|
2636
|
-
* **Line range mapping**: Maps DOM elements to exact line ranges in pretty-printed
|
|
2637
|
-
output for accurate diff display
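Identity-attribute matching can be sketched as follows. This is a simplified illustration and not Canon's code: elements are modeled as plain hashes of attributes instead of parsed XML, the helper names are hypothetical, and the path-based fallback for elements without an identity attribute is omitted.

```ruby
# Identity attributes named in the text above.
IDENTITY_ATTRIBUTES = %w[id ref name key].freeze

# Hypothetical helper: return the first identity attribute present on an
# element as [name, value], or nil if the element has none.
def identity_of(element)
  attr = IDENTITY_ATTRIBUTES.find { |name| element.key?(name) }
  attr && [attr, element[attr]]
end

# Pair elements by identity rather than by position, then report only the
# attributes whose values differ between the paired elements.
def attribute_changes(old_elements, new_elements)
  new_by_identity = new_elements.to_h { |el| [identity_of(el), el] }
  old_elements.flat_map do |old_el|
    identity = identity_of(old_el)
    new_el = new_by_identity[identity]
    next [] unless new_el

    (old_el.keys | new_el.keys).sort.filter_map do |attr|
      next if old_el[attr] == new_el[attr]

      { identity: identity, attribute: attr, old: old_el[attr], new: new_el[attr] }
    end
  end
end

# The <item> elements of file1.xml and file2.xml as attribute hashes.
file1 = [
  { "id" => "1", "name" => "Widget", "price" => "10.00" },
  { "id" => "2", "name" => "Gadget", "price" => "20.00" },
]
file2 = [
  { "price" => "20.00", "name" => "Gadget", "id" => "2" },
  { "id" => "1", "name" => "Widget", "price" => "15.00" },
]

changes = attribute_changes(file1, file2)
changes.each { |c| puts "#{c[:identity].join('=')} #{c[:attribute]}: #{c[:old]} -> #{c[:new]}" }
```

Because pairing goes through the identity lookup, the swapped item positions and the reordered attributes on the `id="2"` item produce no output; only the `price` change on `id="1"` is reported.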
|
|
2638
|
-
|
|
2639
|
-
====== XML comparison (by-line mode)
|
|
2640
|
-
|
|
2641
|
-
The `--by-line` option switches to a traditional line-by-line diff after
|
|
2642
|
-
canonicalization, useful when you need to see exact line-level changes.
|
|
2643
|
-
|
|
2644
|
-
[example]
|
|
2645
|
-
Using `document1.xml` and `document2.xml` from the first XML example, but with `--by-line`:
|
|
2646
|
-
|
|
2647
|
-
[source,bash]
|
|
2648
|
-
----
|
|
2649
|
-
$ canon diff document1.xml document2.xml --by-line --verbose
|
|
2650
|
-
Line-by-line diff:
|
|
2651
|
-
4 - | <foreword id="fwd">
|
|
2652
|
-
4 + | <foreword displayorder="2" id="fwd">
|
|
2653
|
-
5 | <p>First paragraph</p>
|
|
2654
|
-
10 + | <p>New content</p>
|
|
2655
|
-
11 | </clause>
|
|
2656
|
-
----
|
|
2657
|
-
|
|
2658
|
-
The by-line mode shows:
|
|
2659
|
-
|
|
2660
|
-
* Traditional diff format with line numbers
|
|
2661
|
-
* Full line context after canonicalization
|
|
2662
|
-
* All changes at line level (not semantic level)
|
|
2663
|
-
* Useful for reviewing exact textual differences
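For comparison with the semantic mode, a traditional line diff can be sketched with a longest-common-subsequence table. This is a generic textbook approach shown for illustration; the source does not say which algorithm Canon uses, and `line_diff` is a hypothetical name.

```ruby
# Minimal LCS-based line diff. Returns [marker, line] pairs where the marker
# is " " (unchanged), "-" (only in old) or "+" (only in new).
def line_diff(old_lines, new_lines)
  # lcs[i][j] = length of the longest common subsequence of
  # old_lines[i..] and new_lines[j..]
  lcs = Array.new(old_lines.size + 1) { Array.new(new_lines.size + 1, 0) }
  (old_lines.size - 1).downto(0) do |i|
    (new_lines.size - 1).downto(0) do |j|
      lcs[i][j] = if old_lines[i] == new_lines[j]
                    lcs[i + 1][j + 1] + 1
                  else
                    [lcs[i + 1][j], lcs[i][j + 1]].max
                  end
    end
  end

  # Walk both inputs, following the table to keep the common lines aligned.
  result = []
  i = j = 0
  while i < old_lines.size && j < new_lines.size
    if old_lines[i] == new_lines[j]
      result << [" ", old_lines[i]]
      i += 1
      j += 1
    elsif lcs[i + 1][j] >= lcs[i][j + 1]
      result << ["-", old_lines[i]]
      i += 1
    else
      result << ["+", new_lines[j]]
      j += 1
    end
  end
  old_lines[i..].each { |line| result << ["-", line] }
  new_lines[j..].each { |line| result << ["+", line] }
  result
end

old_lines = ['<foreword id="fwd">', "<p>First paragraph</p>", "</clause>"]
new_lines = ['<foreword displayorder="2" id="fwd">', "<p>First paragraph</p>",
             "<p>New content</p>", "</clause>"]

diff = line_diff(old_lines, new_lines)
diff.each { |marker, line| puts "#{marker} | #{line}" }
```

Unlike the by-object mode, the whole `<foreword>` line is reported as removed and re-added, because a line diff has no notion of attributes.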
|
|
2664
|
-
|
|
2665
|
-
====== YAML comparison (by-object mode)
|
|
2666
|
-
|
|
2667
|
-
YAML comparison uses by-object mode to show semantic differences in the data
|
|
2668
|
-
structure, ignoring formatting and key ordering differences.
|
|
2669
|
-
|
|
2670
|
-
[example]
|
|
2671
|
-
Given these two YAML files:
|
|
2672
|
-
|
|
2673
|
-
.config1.yaml
|
|
2674
|
-
[source,yaml]
|
|
2675
|
-
----
|
|
2676
|
-
database:
|
|
2677
|
-
host: localhost
|
|
2678
|
-
port: 5432
|
|
2679
|
-
name: mydb
|
|
2680
|
-
logging:
|
|
2681
|
-
level: info
|
|
2682
|
-
format: json
|
|
2683
|
-
----
|
|
2684
|
-
|
|
2685
|
-
.config2.yaml
|
|
2686
|
-
[source,yaml]
|
|
2687
|
-
----
|
|
2688
|
-
logging:
|
|
2689
|
-
level: debug
|
|
2690
|
-
format: json
|
|
2691
|
-
database:
|
|
2692
|
-
port: 5432
|
|
2693
|
-
host: localhost
|
|
2694
|
-
name: production
|
|
2695
|
-
----
|
|
2696
|
-
|
|
2697
|
-
Running with `--verbose`:
|
|
2698
|
-
|
|
2699
|
-
[source,bash]
|
|
2700
|
-
----
|
|
2701
|
-
$ canon diff config1.yaml config2.yaml --verbose
|
|
2702
|
-
Visual Diff:
|
|
2703
|
-
├── database.name:
|
|
2704
|
-
│ ├── - "mydb"
|
|
2705
|
-
│ └── + "production"
|
|
2706
|
-
└── logging.level:
|
|
2707
|
-
├── - "info"
|
|
2708
|
-
└── + "debug"
|
|
2709
|
-
----
|
|
2710
|
-
|
|
2711
|
-
The by-object mode shows:
|
|
2712
|
-
|
|
2713
|
-
* Section reordering (`logging` before `database`) is ignored
|
|
2714
|
-
* Key reordering within sections (`port` before `host`) is ignored
|
|
2715
|
-
* Only semantic value changes are displayed
|
|
2716
|
-
* Tree structure clearly shows the path to each change
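The same idea can be sketched for YAML with Ruby's standard library. This is an illustrative approximation, not Canon's implementation: the `leaves` helper is hypothetical, and it flattens each document to dotted paths so that ordering at any level is irrelevant.

```ruby
require "yaml"

# Hypothetical helper: flatten a nested hash into { "dotted.path" => leaf }.
def leaves(node, path = [])
  return { path.join(".") => node } unless node.is_a?(Hash)

  node.each_with_object({}) do |(key, value), acc|
    acc.merge!(leaves(value, path + [key]))
  end
end

config1 = YAML.safe_load(<<~DOC)
  database:
    host: localhost
    port: 5432
    name: mydb
  logging:
    level: info
    format: json
DOC

config2 = YAML.safe_load(<<~DOC)
  logging:
    level: debug
    format: json
  database:
    port: 5432
    host: localhost
    name: production
DOC

old_leaves = leaves(config1)
new_leaves = leaves(config2)

# Compare by path, so section and key order play no role.
changed = (old_leaves.keys | new_leaves.keys).select { |k| old_leaves[k] != new_leaves[k] }.sort
changed.each { |k| puts "#{k}: - #{old_leaves[k].inspect} / + #{new_leaves[k].inspect}" }
```

Only `database.name` and `logging.level` are listed, matching the two branches of the tree output above.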
|
|
2717
|
-
|
|
2718
|
-
===== Comparison options examples
|
|
2719
|
-
|
|
2720
|
-
[source,bash]
|
|
2721
|
-
----
|
|
2722
|
-
# Include comments in XML comparison
|
|
2723
|
-
$ canon diff doc1.xml doc2.xml --with-comments --verbose
|
|
2724
|
-
|
|
2725
|
-
# Ignore all text content and compare structure only
|
|
2726
|
-
$ canon diff template1.html template2.html --ignore-text-nodes
|
|
2727
|
-
|
|
2728
|
-
# Don't collapse whitespace (exact whitespace comparison)
|
|
2729
|
-
$ canon diff file1.xml file2.xml --no-collapse-whitespace
|
|
2730
|
-
|
|
2731
|
-
# Compare different formats (must have same structure)
|
|
2732
|
-
$ canon diff config.json config.yaml --format1 json --format2 yaml --verbose
|
|
2733
|
-
----
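The difference between collapsed and exact whitespace comparison can be illustrated with a small Ruby sketch. The normalization rule here (runs of whitespace become one space, ends trimmed) is an assumption made for illustration; the source does not spell out Canon's exact rule, and `collapse_whitespace` is a hypothetical helper.

```ruby
# Assumed normalization: collapse each run of whitespace to a single space
# and trim leading and trailing whitespace.
def collapse_whitespace(text)
  text.gsub(/\s+/, " ").strip
end

a = "First   paragraph\n  continues"
b = "First paragraph continues"

collapsed_equal = collapse_whitespace(a) == collapse_whitespace(b)
exact_equal = a == b

puts "collapsed comparison equal: #{collapsed_equal}"
puts "exact comparison equal:     #{exact_equal}"
```

Under this assumption the two text nodes compare equal when whitespace is collapsed and differ under exact comparison, which is the distinction `--no-collapse-whitespace` exposes.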
|
|
2734
|
-
|
|
2735
|
-
===== HTML comparison (by-line mode only)
|
|
2736
|
-
|
|
2737
|
-
HTML comparison always uses by-line mode after canonicalization, which is ideal
|
|
2738
|
-
for reviewing markup structure changes.
|
|
2739
|
-
|
|
2740
|
-
[example]
|
|
2741
|
-
Given these two HTML files:
|
|
2742
|
-
|
|
2743
|
-
.page1.html
|
|
2744
|
-
[source,html]
|
|
2745
|
-
----
|
|
2746
|
-
<!DOCTYPE html>
|
|
2747
|
-
<html>
|
|
2748
|
-
<head>
|
|
2749
|
-
<title>My Page</title>
|
|
2750
|
-
</head>
|
|
2751
|
-
<body>
|
|
2752
|
-
<div class="header">
|
|
2753
|
-
<h1>Welcome</h1>
|
|
2754
|
-
<p>Introduction text</p>
|
|
2755
|
-
</div>
|
|
2756
|
-
<div class="content">
|
|
2757
|
-
<p>Main content</p>
|
|
2758
|
-
</div>
|
|
2759
|
-
</body>
|
|
2760
|
-
</html>
|
|
2761
|
-
----
|
|
2762
|
-
|
|
2763
|
-
.page2.html
|
|
2764
|
-
[source,html]
|
|
2765
|
-
----
|
|
2766
|
-
<!DOCTYPE html>
|
|
2767
|
-
<html>
|
|
2768
|
-
<head>
|
|
2769
|
-
<title>My Updated Page</title>
|
|
2770
|
-
</head>
|
|
2771
|
-
<body>
|
|
2772
|
-
<nav class="header">
|
|
2773
|
-
<h1>Welcome</h1>
|
|
2774
|
-
<p>Updated introduction</p>
|
|
2775
|
-
</nav>
|
|
2776
|
-
<div class="content">
|
|
2777
|
-
<p>Main content</p>
|
|
2778
|
-
<p>Additional paragraph</p>
|
|
2779
|
-
</div>
|
|
2780
|
-
</body>
|
|
2781
|
-
</html>
|
|
2782
|
-
----
|
|
2783
|
-
|
|
2784
|
-
Running with `--verbose`:
|
|
2785
|
-
|
|
2786
|
-
[source,bash]
|
|
2787
|
-
----
|
|
2788
|
-
$ canon diff page1.html page2.html --verbose
|
|
2789
|
-
Line-by-line diff:
|
|
2790
|
-
4 - | <title>My Page</title>
|
|
2791
|
-
4 + | <title>My Updated Page</title>
|
|
2792
|
-
7 - | <div class="header">
|
|
2793
|
-
7 + | <nav class="header">
|
|
2794
|
-
9 - | <p>Introduction text</p>
|
|
2795
|
-
9 + | <p>Updated introduction</p>
|
|
2796
|
-
10 - | </div>
|
|
2797
|
-
10 + | </nav>
|
|
2798
|
-
13 + | <p>Additional paragraph</p>
|
|
2799
|
-
14 | </div>
|
|
2800
|
-
----
|
|
2801
|
-
|
|
2802
|
-
The line-by-line mode shows:
|
|
2803
|
-
|
|
2804
|
-
* Element name changes (`<div>` to `<nav>`)
|
|
2805
|
-
* Text content changes
|
|
2806
|
-
* Added elements with proper indentation context
|
|
2807
|
-
* Line numbers help locate changes in the document
|
|
2808
|
-
|
|
2809
|
-
===== Exit codes
|
|
2810
|
-
|
|
2811
|
-
* `0` - Files are semantically equivalent
|
|
2812
|
-
* `1` - Files are semantically different
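A script can branch on these exit codes. The following Ruby sketch is illustrative: the `equivalent?` helper is hypothetical, and treating any status other than `0` or `1` as an error is an assumption, since only those two codes are documented here. A stand-in command is used so the sketch runs without `canon` installed.

```ruby
require "rbconfig"

# Hypothetical helper: run a comparison command (given as an argument array)
# and translate its exit status into true/false.
def equivalent?(command)
  ran = system(*command, out: File::NULL, err: File::NULL)
  raise "could not run #{command.first}" if ran.nil?

  status = $?.exitstatus
  case status
  when 0 then true   # files are semantically equivalent
  when 1 then false  # files are semantically different
  else raise "unexpected exit status #{status.inspect} from #{command.first}"
  end
end

# Stand-in commands that simply exit with the documented codes.
same = equivalent?([RbConfig.ruby, "-e", "exit 0"])
different = equivalent?([RbConfig.ruby, "-e", "exit 1"])
puts "exit 0 -> equivalent: #{same}"
puts "exit 1 -> equivalent: #{different}"
```

With the tool installed, the call would look like `equivalent?(["canon", "diff", "file1.xml", "file2.xml"])`.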
|
|
2813
|
-
|
|
2814
|
-
|
|
2815
|
-
== Development
|
|
2816
|
-
|
|
2817
|
-
After checking out the repo, run `bin/setup` to install dependencies. Then, run
|
|
2818
|
-
`rake spec` to run the tests. You can also run `bin/console` for an interactive
|
|
2819
|
-
prompt that will allow you to experiment.
|
|
2820
|
-
|
|
2821
|
-
|
|
2822
|
-
== Contributing
|
|
2823
|
-
|
|
2824
|
-
Bug reports and pull requests are welcome on GitHub at
|
|
2825
|
-
https://github.com/lutaml/canon.
|
|
2826
|
-
|
|
2827
|
-
|
|
2828
|
-
== Copyright and license
|
|
2829
|
-
|
|
2830
|
-
Copyright Ribose.
|
|
2831
|
-
https://opensource.org/licenses/BSD-2-Clause[BSD-2-Clause License].
|