canon 0.1.6 → 0.1.7
- checksums.yaml +4 -4
- data/.rubocop_todo.yml +163 -67
- data/README.adoc +400 -7
- data/docs/Gemfile +9 -0
- data/docs/INDEX.adoc +99 -182
- data/docs/_config.yml +100 -0
- data/docs/advanced/diff-classification.adoc +547 -0
- data/docs/advanced/diff-pipeline.adoc +358 -0
- data/docs/advanced/index.adoc +214 -0
- data/docs/advanced/semantic-diff-report.adoc +390 -0
- data/docs/{VERBOSE.adoc → advanced/verbose-mode-architecture.adoc} +51 -53
- data/docs/features/diff-formatting/algorithm-specific-output.adoc +533 -0
- data/docs/{CHARACTER_VISUALIZATION.adoc → features/diff-formatting/character-visualization.adoc} +23 -62
- data/docs/features/diff-formatting/colors-and-symbols.adoc +606 -0
- data/docs/features/diff-formatting/context-and-grouping.adoc +490 -0
- data/docs/features/diff-formatting/display-filtering.adoc +472 -0
- data/docs/features/diff-formatting/index.adoc +140 -0
- data/docs/features/environment-configuration/index.adoc +327 -0
- data/docs/features/environment-configuration/override-system.adoc +436 -0
- data/docs/features/environment-configuration/size-limits.adoc +273 -0
- data/docs/features/index.adoc +173 -0
- data/docs/features/input-validation/index.adoc +521 -0
- data/docs/features/match-options/algorithm-specific-behavior.adoc +365 -0
- data/docs/features/match-options/html-policies.adoc +312 -0
- data/docs/features/match-options/index.adoc +621 -0
- data/docs/getting-started/index.adoc +83 -0
- data/docs/getting-started/quick-start.adoc +76 -0
- data/docs/guides/choosing-configuration.adoc +689 -0
- data/docs/guides/index.adoc +181 -0
- data/docs/{CLI.adoc → interfaces/cli/index.adoc} +18 -13
- data/docs/interfaces/index.adoc +101 -0
- data/docs/{RSPEC.adoc → interfaces/rspec/index.adoc} +242 -31
- data/docs/{RUBY_API.adoc → interfaces/ruby-api/index.adoc} +118 -16
- data/docs/lychee.toml +65 -0
- data/docs/reference/cli-options.adoc +418 -0
- data/docs/reference/environment-variables.adoc +375 -0
- data/docs/reference/index.adoc +204 -0
- data/docs/reference/options-across-interfaces.adoc +417 -0
- data/docs/understanding/algorithms/dom-diff.adoc +389 -0
- data/docs/understanding/algorithms/index.adoc +314 -0
- data/docs/understanding/algorithms/semantic-tree-diff.adoc +533 -0
- data/docs/understanding/architecture.adoc +447 -0
- data/docs/understanding/comparison-pipeline.adoc +317 -0
- data/docs/understanding/formats/html.adoc +380 -0
- data/docs/understanding/formats/index.adoc +261 -0
- data/docs/understanding/formats/json.adoc +390 -0
- data/docs/understanding/formats/xml.adoc +366 -0
- data/docs/understanding/formats/yaml.adoc +504 -0
- data/docs/understanding/index.adoc +130 -0
- data/lib/canon/cli.rb +42 -1
- data/lib/canon/commands/diff_command.rb +108 -23
- data/lib/canon/comparison/compare_profile.rb +101 -0
- data/lib/canon/comparison/comparison_result.rb +41 -2
- data/lib/canon/comparison/html_comparator.rb +292 -71
- data/lib/canon/comparison/html_compare_profile.rb +117 -0
- data/lib/canon/comparison/match_options.rb +42 -4
- data/lib/canon/comparison/strategies/base_match_strategy.rb +99 -0
- data/lib/canon/comparison/strategies/match_strategy_factory.rb +74 -0
- data/lib/canon/comparison/strategies/semantic_tree_match_strategy.rb +220 -0
- data/lib/canon/comparison/xml_comparator.rb +695 -91
- data/lib/canon/comparison.rb +207 -2
- data/lib/canon/config/env_provider.rb +71 -0
- data/lib/canon/config/env_schema.rb +58 -0
- data/lib/canon/config/override_resolver.rb +55 -0
- data/lib/canon/config/type_converter.rb +59 -0
- data/lib/canon/config.rb +158 -29
- data/lib/canon/data_model.rb +29 -0
- data/lib/canon/diff/diff_classifier.rb +74 -14
- data/lib/canon/diff/diff_context_builder.rb +41 -0
- data/lib/canon/diff/diff_line.rb +18 -2
- data/lib/canon/diff/diff_node.rb +18 -3
- data/lib/canon/diff/diff_node_mapper.rb +71 -12
- data/lib/canon/diff/formatting_detector.rb +53 -0
- data/lib/canon/diff_formatter/by_line/base_formatter.rb +60 -5
- data/lib/canon/diff_formatter/by_line/html_formatter.rb +68 -16
- data/lib/canon/diff_formatter/by_line/json_formatter.rb +0 -37
- data/lib/canon/diff_formatter/by_line/simple_formatter.rb +0 -42
- data/lib/canon/diff_formatter/by_line/xml_formatter.rb +116 -31
- data/lib/canon/diff_formatter/by_line/yaml_formatter.rb +0 -37
- data/lib/canon/diff_formatter/by_object/base_formatter.rb +126 -19
- data/lib/canon/diff_formatter/by_object/xml_formatter.rb +30 -1
- data/lib/canon/diff_formatter/debug_output.rb +7 -1
- data/lib/canon/diff_formatter/diff_detail_formatter.rb +674 -57
- data/lib/canon/diff_formatter/legend.rb +42 -0
- data/lib/canon/diff_formatter.rb +78 -9
- data/lib/canon/errors.rb +56 -0
- data/lib/canon/formatters/html_formatter_base.rb +35 -1
- data/lib/canon/formatters/json_formatter.rb +3 -0
- data/lib/canon/formatters/yaml_formatter.rb +3 -0
- data/lib/canon/html/data_model.rb +229 -0
- data/lib/canon/html.rb +9 -0
- data/lib/canon/options/cli_generator.rb +70 -0
- data/lib/canon/options/registry.rb +234 -0
- data/lib/canon/rspec_matchers.rb +34 -13
- data/lib/canon/tree_diff/adapters/html_adapter.rb +316 -0
- data/lib/canon/tree_diff/adapters/json_adapter.rb +204 -0
- data/lib/canon/tree_diff/adapters/xml_adapter.rb +285 -0
- data/lib/canon/tree_diff/adapters/yaml_adapter.rb +213 -0
- data/lib/canon/tree_diff/core/attribute_comparator.rb +84 -0
- data/lib/canon/tree_diff/core/matching.rb +241 -0
- data/lib/canon/tree_diff/core/node_signature.rb +164 -0
- data/lib/canon/tree_diff/core/node_weight.rb +135 -0
- data/lib/canon/tree_diff/core/tree_node.rb +450 -0
- data/lib/canon/tree_diff/matchers/hash_matcher.rb +258 -0
- data/lib/canon/tree_diff/matchers/similarity_matcher.rb +168 -0
- data/lib/canon/tree_diff/matchers/structural_propagator.rb +242 -0
- data/lib/canon/tree_diff/matchers/universal_matcher.rb +220 -0
- data/lib/canon/tree_diff/operation_converter.rb +631 -0
- data/lib/canon/tree_diff/operations/operation.rb +92 -0
- data/lib/canon/tree_diff/operations/operation_detector.rb +626 -0
- data/lib/canon/tree_diff/tree_diff_integrator.rb +140 -0
- data/lib/canon/tree_diff.rb +33 -0
- data/lib/canon/validators/json_validator.rb +3 -1
- data/lib/canon/validators/yaml_validator.rb +3 -1
- data/lib/canon/version.rb +1 -1
- data/lib/canon/xml/data_model.rb +22 -23
- data/lib/canon/xml/element_matcher.rb +128 -20
- data/lib/canon/xml/namespace_helper.rb +110 -0
- data/lib/canon.rb +3 -0
- metadata +81 -23
- data/_config.yml +0 -116
- data/docs/ADVANCED_TOPICS.adoc +0 -20
- data/docs/BASIC_USAGE.adoc +0 -16
- data/docs/CUSTOMIZING_BEHAVIOR.adoc +0 -19
- data/docs/DIFF_ARCHITECTURE.adoc +0 -435
- data/docs/DIFF_FORMATTING.adoc +0 -540
- data/docs/FORMATS.adoc +0 -447
- data/docs/INPUT_VALIDATION.adoc +0 -477
- data/docs/MATCH_ARCHITECTURE.adoc +0 -463
- data/docs/MATCH_OPTIONS.adoc +0 -719
- data/docs/MODES.adoc +0 -432
- data/docs/NORMATIVE_INFORMATIVE_DIFFS.adoc +0 -219
- data/docs/OPTIONS.adoc +0 -1387
- data/docs/PREPROCESSING.adoc +0 -491
- data/docs/SEMANTIC_DIFF_REPORT.adoc +0 -528
- data/docs/UNDERSTANDING_CANON.adoc +0 -17
@@ -0,0 +1,689 @@
+---
+title: Choosing Configuration
+parent: Guides
+nav_order: 1
+---
+= Choosing Configuration
+
+== Purpose
+
+Canon's 4-layer architecture provides powerful flexibility, but this can be overwhelming. This guide helps you choose the right configuration for your use case through decision trees, use case scenarios, and practical recommendations.
+
+== Quick Decision Tree
+
+[mermaid]
+----
+graph TD
+    Start[What are you comparing?] --> Similar{Similar<br/>structure?}
+    Similar -->|Yes| Fast{Need<br/>speed?}
+    Similar -->|No| Semantic[Use Semantic Algorithm]
+
+    Fast -->|Yes| DOM[Use DOM Algorithm]
+    Fast -->|No| Semantic
+
+    DOM --> Format{Care about<br/>formatting?}
+    Semantic --> Format
+
+    Format -->|Yes| Strict[strict profile]
+    Format -->|No| SpecFriendly[spec_friendly profile]
+
+    Strict --> Output1[by_line mode]
+    SpecFriendly --> Output2{Want<br/>operations?}
+
+    Output2 -->|Yes| ByObject[by_object mode]
+    Output2 -->|No| ByLine[by_line mode]
+
+    style DOM fill:#fff4e1
+    style Semantic fill:#e1f5ff
+    style Strict fill:#ffe1f5
+    style SpecFriendly fill:#ffe1f5
+    style ByObject fill:#e1ffe1
+    style ByLine fill:#e1ffe1
+----
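The decision tree above can also be read as a small function. The following is a minimal sketch, not part of Canon: `choose_configuration` and its keyword arguments are hypothetical names, and only the returned option keys and values come from this guide.

```ruby
# Hypothetical helper: encodes the quick decision tree as plain Ruby.
# It is not part of Canon; only the option keys and values are taken
# from this guide.
def choose_configuration(similar_structure:, need_speed:,
                         care_about_formatting:, want_operations: false)
  # Similar structure AND a need for speed leads to DOM;
  # every other path in the tree leads to Semantic.
  algorithm = similar_structure && need_speed ? :dom : :semantic

  if care_about_formatting
    # The strict branch of the tree always ends in by_line output.
    { diff_algorithm: algorithm, match_profile: :strict, diff_mode: :by_line }
  else
    mode = want_operations ? :by_object : :by_line
    { diff_algorithm: algorithm, match_profile: :spec_friendly, diff_mode: mode }
  end
end

options = choose_configuration(similar_structure: true, need_speed: true,
                               care_about_formatting: false)
p options
```

The resulting hash has the same shape as the keyword options shown throughout this guide, so under that assumption it could be splatted into a comparison call.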
+
+== Layer-by-Layer Decision Guide
+
+=== Layer 1: Preprocessing
+
+**Question**: How should documents be normalized before comparison?
+
+[cols="2,3,3"]
+|===
+|Choose |When |Example
+
+|**none**
+|Documents already in comparable form, no normalization needed
+|Comparing canonicalized XML files
+
+|**c14n**
+|Testing XML canonicalization implementations
+|Validating C14N output
+
+|**normalize**
+|Whitespace differences are irrelevant
+|Comparing generated vs handwritten XML
+
+|**format**
+|Want to compare structure, ignore all formatting
+|Comparing minified vs formatted JSON
+|===
+
+**Default**: `none` (no preprocessing)
+
+**Ruby API**:
+[source,ruby]
+----
+Canon::Comparison.equivalent?(doc1, doc2,
+  preprocessing: :normalize  # or :c14n, :format
+)
+----
+
+**CLI**:
+[source,bash]
+----
+canon diff file1.xml file2.xml --preprocessing normalize
+----
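To make the `normalize` and `format` rows concrete, here is a rough standard-library sketch of the idea behind each. It is an illustration only and does not show how Canon implements preprocessing; `normalize_whitespace` and `format_json` are hypothetical helpers.

```ruby
require "json"

# Illustrative stand-ins only; Canon's real preprocessing is assumed to be
# more thorough than these two helpers.

# The :normalize idea: drop whitespace between tags and collapse runs.
def normalize_whitespace(xml)
  xml.gsub(/>\s+</, "><").gsub(/\s+/, " ").strip
end

# The :format idea: re-serialize through a parser so layout stops mattering.
def format_json(text)
  JSON.pretty_generate(JSON.parse(text))
end

generated   = "<root><item>a</item><item>b</item></root>"
handwritten = "<root>\n  <item>a</item>\n  <item>b</item>\n</root>\n"
puts normalize_whitespace(generated) == normalize_whitespace(handwritten)

minified  = '{"name":"canon","tags":["xml","json"]}'
formatted = "{\n  \"name\": \"canon\",\n  \"tags\": [\"xml\", \"json\"]\n}"
puts format_json(minified) == format_json(formatted)
```

Both comparisons print `true`: once layout is removed, the generated and handwritten inputs are identical.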
+
+=== Layer 2: Algorithm Selection
+
+**Question**: What comparison strategy fits your documents?
+
+[cols="2,3,3"]
+|===
+|Choose |When |Characteristics
+
+|**dom**
+|• Similar document structure +
+• Traditional diff workflow +
+• Speed is important +
+• Production use (stable)
+|• Fast +
+• Position-based +
+• No move detection +
+• Well-tested
+
+|**semantic**
+|• Restructured documents +
+• Need move detection +
+• Operation analysis needed +
+• Experimental OK
+|• Slower +
+• Signature-based +
+• Detects moves +
+• Experimental
+|===
+
+**Default**: `dom` (stable algorithm)
+
+**Decision Matrix**:
+[cols="2,1,1"]
+|===
+|Scenario |DOM |Semantic
+
+|Documents have same structure
+|✓
+|✓
+
+|Documents are reordered
+|✗
+|✓
+
+|Need fast comparison
+|✓
+|✗
+
+|Need move detection
+|✗
+|✓
+
+|Production use
+|✓
+|⚠
+
+|Large documents (> 100KB)
+|✓
+|✗
+|===
+
+**Ruby API**:
+[source,ruby]
+----
+Canon::Comparison.equivalent?(doc1, doc2,
+  diff_algorithm: :dom  # or :semantic
+)
+----
+
+**CLI**:
+[source,bash]
+----
+canon diff file1.xml file2.xml --diff-algorithm semantic
+----
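The difference between "position-based" and "signature-based" matching can be shown with a toy model. The sketch below is an assumption-laden simplification, not Canon's algorithms: each document is a flat list of unique nodes, a node's signature is just its string value, and `positional_diff` and `signature_diff` are hypothetical names.

```ruby
# Toy model, not Canon's implementation: documents are flat lists of
# unique nodes, and a node's "signature" is simply its string value.

# Position-based comparison: pair nodes by index.
def positional_diff(old_nodes, new_nodes)
  length = [old_nodes.length, new_nodes.length].max
  (0...length).filter_map do |i|
    [:changed, i, old_nodes[i], new_nodes[i]] if old_nodes[i] != new_nodes[i]
  end
end

# Signature-based comparison: pair nodes by signature, so a node whose
# relative order changed is reported as a move instead of a change.
def signature_diff(old_nodes, new_nodes)
  common    = old_nodes & new_nodes
  old_order = old_nodes.select { |n| common.include?(n) }
  new_order = new_nodes.select { |n| common.include?(n) }

  ops = (old_nodes - new_nodes).map { |n| [:delete, n] }
  ops += (new_nodes - old_nodes).map { |n| [:insert, n] }
  old_order.each_with_index do |n, i|
    ops << [:move, n] if new_order[i] != n
  end
  ops
end

old_doc = %w[intro scope terms annex]
new_doc = %w[intro terms scope annex]

p positional_diff(old_doc, new_doc)
p signature_diff(old_doc, new_doc)
```

In this model the positional comparison reports two unrelated changes for the swapped sections, while the signature comparison reports two moves. An insertion at the top of a list shifts every index, so the positional comparison flags every following node, which is why reordered documents are marked ✗ for DOM in the matrix above.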
+
+=== Layer 3: Match Options
+
+**Question**: How strict should comparison be?
+
+==== Using Match Profiles (Recommended)
+
+[cols="2,4"]
+|===
+|Profile |Use When
+
+|**strict**
+|Exact matching required. Everything must match exactly including whitespace, attribute order, comments.
+
+|**rendered**
+|Comparing rendered output. Simulates browser/CSS rendering - ignores formatting but keeps content strict.
+
+|**spec_friendly**
+|Writing tests. Ignores formatting differences, focuses on content and structure.
+
+|**content_only**
+|Content comparison only. Ignores all structural and formatting differences.
+|===
+
+**Default**: `strict` (exact matching)
+
+**Ruby API**:
+[source,ruby]
+----
+Canon::Comparison.equivalent?(doc1, doc2,
+  match_profile: :spec_friendly  # or :strict, :rendered, :content_only
+)
+----
+
+**CLI**:
+[source,bash]
+----
+canon diff file1.xml file2.xml --match-profile spec_friendly
+----
+
+==== Custom Match Dimensions
+
+For fine-grained control, configure individual dimensions:
+
+[source,ruby]
+----
+Canon::Comparison.equivalent?(doc1, doc2,
+  match: {
+    text_content: :normalize,        # normalize, strict, ignore
+    structural_whitespace: :ignore,  # ignore, normalize, strict
+    attribute_order: :ignore,        # ignore, strict (XML/HTML)
+    attribute_values: :normalize,    # normalize, strict, ignore
+    comments: :ignore                # ignore, normalize, strict
+  }
+)
+----
+
+**Remember**: Match options behave differently with each algorithm! See link:../features/match-options/algorithm-specific-behavior.adoc[Algorithm-Specific Behavior].
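One way to think about profiles and dimensions together is "a profile is a preset of dimensions, and explicit dimensions refine it". The sketch below models that reading. It rests on two assumptions that this guide does not confirm: that explicit `match:` settings take precedence over a profile, and that the per-profile values in `PROFILE_SKETCH` resemble the real ones. `PROFILE_SKETCH` and `resolve_match_options` are hypothetical names.

```ruby
# Illustrative values only. The real dimensions behind each profile are
# documented in the match-options reference, not here.
PROFILE_SKETCH = {
  strict: {
    text_content: :strict, structural_whitespace: :strict,
    attribute_order: :strict, attribute_values: :strict, comments: :strict
  },
  spec_friendly: {
    text_content: :normalize, structural_whitespace: :ignore,
    attribute_order: :ignore, attribute_values: :normalize, comments: :ignore
  }
}.freeze

# Assumed precedence: start from the profile, then let explicit
# dimension settings win.
def resolve_match_options(profile: :strict, match: {})
  base = PROFILE_SKETCH.fetch(profile) do
    raise ArgumentError, "unknown profile: #{profile}"
  end
  base.merge(match)
end

resolved = resolve_match_options(profile: :spec_friendly,
                                 match: { comments: :strict })
p resolved[:comments]
p resolved[:attribute_order]
```

Here `comments` ends up `:strict` because it was set explicitly, while `attribute_order` keeps the preset's `:ignore`.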
+
+=== Layer 4: Diff Formatting
+
+**Question**: How should differences be displayed?
+
+==== Choosing Diff Mode
+
+[cols="2,3,3"]
+|===
+|Mode |Best For |Output Type
+
+|**by_line**
+|• Traditional diffs +
+• Code review +
+• Quick scanning +
+• DOM algorithm
+|Line-based diff similar to `git diff`
+
+|**by_object**
+|• Tree structure view +
+• Operation analysis +
+• Semantic algorithm
+|Tree-based with operations (INSERT, DELETE, UPDATE, MOVE)
+|===
+
+**Default**: `by_line` for DOM, `by_object` for Semantic
+
+**Natural Fits**:
+* DOM + by_line = Traditional positional diff
+* Semantic + by_object = Operation-based tree diff
+
+**Ruby API**:
+[source,ruby]
+----
+Canon::Comparison.equivalent?(doc1, doc2,
+  diff_mode: :by_object,  # or :by_line
+  verbose: true           # Enable diff output
+)
+----
+
+**CLI**:
+[source,bash]
+----
+canon diff file1.xml file2.xml --diff-mode by_object --verbose
+----
+
+==== Visual Formatting Options
+
+[source,ruby]
+----
+Canon::Comparison.equivalent?(doc1, doc2,
+  verbose: true,
+  use_color: true,         # Enable colors
+  context_lines: 3,        # Lines of context
+  diff_grouping_lines: 5,  # Group nearby changes
+  show_legend: true        # Display symbol legend
+)
+----
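The interplay of `context_lines` and `diff_grouping_lines` is easier to see with numbers. The sketch below assumes one plausible meaning for each option (changes within `diff_grouping_lines` of each other share a group, and each group is padded by `context_lines` on both sides); the guide does not spell out the exact semantics, and `build_hunks` is a hypothetical helper.

```ruby
# Hypothetical model of context_lines and diff_grouping_lines.
# changed_lines holds 0-based indexes of lines that differ.
def build_hunks(changed_lines, total_lines:, context_lines: 3, diff_grouping_lines: 5)
  groups = []
  changed_lines.sort.each do |line|
    # Join this change to the previous group when it is close enough.
    if groups.any? && line - groups.last.last <= diff_grouping_lines
      groups.last << line
    else
      groups << [line]
    end
  end

  # Pad each group with context, clamped to the document bounds.
  groups.map do |group|
    first = [group.first - context_lines, 0].max
    last  = [group.last + context_lines, total_lines - 1].min
    first..last
  end
end

p build_hunks([10, 12, 40], total_lines: 50)
```

With the defaults used here, the changes at lines 10 and 12 share one hunk covering lines 7 to 15, and the change at line 40 gets its own hunk covering lines 37 to 43.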
+
+== Use Case Scenarios
+
+=== Scenario 1: Unit Testing XML Generation
+
+**Requirement**: Test that code generates correct XML, ignoring formatting
+
+**Configuration**:
+[source,ruby]
+----
+expect(actual_xml).to be_equivalent_to(expected_xml).with_options(
+  preprocessing: :normalize,      # Ignore formatting differences
+  diff_algorithm: :dom,           # Fast, stable
+  match_profile: :spec_friendly,  # Test-friendly
+  verbose: true                   # Show diffs on failure
+)
+----
+
+**Why**:
+* `normalize` handles inconsistent whitespace
+* `dom` is fast and stable for tests
+* `spec_friendly` focuses on content, not formatting
+* `verbose` helps debug failures
+
+=== Scenario 2: Comparing API Responses
+
+**Requirement**: Compare JSON responses, key order doesn't matter
+
+**Configuration**:
+[source,ruby]
+----
+Canon::Comparison.equivalent?(response1, response2,
+  diff_algorithm: :dom,
+  match: {
+    key_order: :ignore,       # JSON key order irrelevant
+    text_content: :normalize  # Normalize string values
+  },
+  verbose: true,
+  diff_mode: :by_object       # Tree view of differences
+)
+----
+
+**Why**:
+* `key_order: :ignore` handles JSON object key reordering
+* `by_object` shows structured diff
+* `dom` is sufficient for API responses
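For intuition about what key-order-insensitive comparison with normalized strings means, the same effect can be reproduced with Ruby's standard library alone. This is a sketch of the concept, not of Canon's implementation; `normalize_strings` is a hypothetical helper standing in for `text_content: :normalize`.

```ruby
require "json"

# Collapse whitespace runs in every string value (a stand-in for
# text_content: :normalize); hashes and arrays are walked recursively.
def normalize_strings(value)
  case value
  when Hash   then value.transform_values { |v| normalize_strings(v) }
  when Array  then value.map { |v| normalize_strings(v) }
  when String then value.strip.gsub(/\s+/, " ")
  else value
  end
end

response1 = '{"id": 7, "user": {"name": "Ada  Lovelace", "role": "admin"}}'
response2 = '{"user": {"role": "admin", "name": "Ada Lovelace"}, "id": 7}'

a = JSON.parse(response1)
b = JSON.parse(response2)

puts a == b
puts normalize_strings(a) == normalize_strings(b)
```

Ruby's `Hash#==` already ignores key order, so the first comparison fails only because of the doubled space in the name; after normalization the two responses compare equal. Array order still matters in this sketch.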
+
+=== Scenario 3: Detecting Document Restructuring
+
+**Requirement**: Find what changed when document is reorganized
+
+**Configuration**:
+[source,ruby]
+----
+result = Canon::Comparison.equivalent?(old_doc, new_doc,
+  diff_algorithm: :semantic,      # Detect moves
+  match_profile: :spec_friendly,  # Ignore formatting
+  verbose: true,
+  diff_mode: :by_object           # See operations
+)
+
+# Analyze operations
+puts "Moves: #{result.statistics.moves}"
+puts "Updates: #{result.statistics.updates}"
+----
+
+**Why**:
+* `semantic` algorithm detects moves and restructuring
+* `by_object` shows operation-level changes
+* Statistics provide quantitative analysis
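The "quantitative analysis" mentioned above amounts to counting operations by type. The sketch below shows that tally with a hypothetical `Operation` struct and made-up paths; the shape of Canon's own operation and statistics objects may differ.

```ruby
# Hypothetical operation records; Canon's own result objects may differ.
Operation = Struct.new(:type, :path)

operations = [
  Operation.new(:move,   "/doc/section[2]"),
  Operation.new(:move,   "/doc/section[3]"),
  Operation.new(:update, "/doc/title"),
  Operation.new(:insert, "/doc/annex")
]

# Count how many operations of each type were detected.
statistics = operations.map(&:type).tally

puts "Moves: #{statistics.fetch(:move, 0)}"
puts "Updates: #{statistics.fetch(:update, 0)}"
puts "Deletes: #{statistics.fetch(:delete, 0)}"
```

Using `fetch` with a default keeps the report stable when an operation type does not occur at all, as with deletes in this sample.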
+
+=== Scenario 4: Code Review Diff
+
+**Requirement**: Traditional diff for reviewing changes
+
+**Configuration**:
+[source,bash]
+----
+canon diff old.xml new.xml \
+  --diff-algorithm dom \
+  --match-profile spec_friendly \
+  --diff-mode by_line \
+  --verbose \
+  --use-color \
+  --context-lines 3
+----
+
+**Why**:
+* `dom + by_line` gives traditional diff
+* `context_lines` provides context
+* Colors improve readability
+
+=== Scenario 5: Canonicalization Testing
+
+**Requirement**: Test C14N implementation
+
+**Configuration**:
+[source,ruby]
+----
+Canon::Comparison.equivalent?(doc, canonical_doc,
+  preprocessing: :c14n,    # Apply canonicalization
+  diff_algorithm: :dom,
+  match_profile: :strict,  # Exact match required
+  verbose: true
+)
+----
+
+**Why**:
+* `c14n` preprocessing applies canonicalization
+* `strict` profile ensures exact match
+* Tests that canonicalization produces correct output
+
+=== Scenario 6: Content-Only Comparison
+
+**Requirement**: Compare only text content, ignore all structure
+
+**Configuration**:
+[source,ruby]
+----
+Canon::Comparison.equivalent?(doc1, doc2,
+  preprocessing: :format,        # Normalize structure first
+  diff_algorithm: :semantic,     # Better for structure-independent
+  match_profile: :content_only,  # Ignore all structure
+  verbose: true,
+  diff_mode: :by_object
+)
+----
+
+**Why**:
+* `content_only` profile ignores structure
+* `semantic` algorithm better at structure-independent comparison
+* `format` preprocessing normalizes before comparison
+
+== Layer Interaction Matrix
+
+This table shows recommended configurations for common scenarios:
+
+[cols="3,1,1,1,1,2"]
+|===
+|Use Case |Layer 1 |Layer 2 |Layer 3 |Layer 4 |Notes
+
+|Unit tests (similar structure)
+|normalize
+|dom
+|spec_friendly
+|by_line
+|Fast, test-friendly
+
+|Unit tests (any structure)
+|normalize
+|semantic
+|spec_friendly
+|by_object
+|Handles restructuring
+
+|API response comparison
+|none
+|dom
+|custom
+|by_object
+|Configure key_order
+
+|Document evolution tracking
+|none
+|semantic
+|rendered
+|by_object
+|Detect operations
+
+|Code review
+|none
+|dom
+|strict
+|by_line
+|Traditional diff
+
+|C14N testing
+|c14n
+|dom
+|strict
+|by_line
+|Exact match
+
+|Content extraction testing
+|format
+|semantic
+|content_only
+|by_object
+|Structure-independent
+
+|Regression testing
+|normalize
+|dom
+|spec_friendly
+|by_line
+|Stable, fast
+|===
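The matrix above translates directly into a lookup table, which can be handy for keeping test helpers consistent. In the sketch below the option keys and values are transcribed from the matrix rows, while `RECOMMENDED`, `options_for`, and the use-case symbols are hypothetical names introduced for illustration.

```ruby
# Matrix rows transcribed as data. The constant, the method, and the
# use-case keys are hypothetical; the option values come from the table.
RECOMMENDED = {
  unit_tests_similar: { preprocessing: :normalize, diff_algorithm: :dom,
                        match_profile: :spec_friendly, diff_mode: :by_line },
  unit_tests_any:     { preprocessing: :normalize, diff_algorithm: :semantic,
                        match_profile: :spec_friendly, diff_mode: :by_object },
  api_response:       { preprocessing: :none, diff_algorithm: :dom,
                        match: { key_order: :ignore }, diff_mode: :by_object },
  document_evolution: { preprocessing: :none, diff_algorithm: :semantic,
                        match_profile: :rendered, diff_mode: :by_object },
  code_review:        { preprocessing: :none, diff_algorithm: :dom,
                        match_profile: :strict, diff_mode: :by_line },
  c14n_testing:       { preprocessing: :c14n, diff_algorithm: :dom,
                        match_profile: :strict, diff_mode: :by_line },
  content_extraction: { preprocessing: :format, diff_algorithm: :semantic,
                        match_profile: :content_only, diff_mode: :by_object },
  regression_testing: { preprocessing: :normalize, diff_algorithm: :dom,
                        match_profile: :spec_friendly, diff_mode: :by_line }
}.freeze

# Look up a row and apply per-call overrides such as verbose: true.
def options_for(use_case, **overrides)
  RECOMMENDED.fetch(use_case).merge(overrides)
end

p options_for(:c14n_testing, verbose: true)
```

`fetch` raises `KeyError` for an unknown use case, which surfaces typos early instead of silently falling back to defaults.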
+
+== Common Configuration Patterns
+
+=== Pattern 1: Fast Test Assertion
+
+[source,ruby]
+----
+# Minimal configuration for speed
+Canon::Comparison.equivalent?(expected, actual,
+  match_profile: :spec_friendly
+)
+# Uses defaults: no preprocessing, dom algorithm, by_line output
+----
+
+=== Pattern 2: Comprehensive Analysis
+
+[source,ruby]
+----
+# Full analysis with all features
+result = Canon::Comparison.equivalent?(doc1, doc2,
+  preprocessing: :normalize,
+  diff_algorithm: :semantic,
+  match_profile: :spec_friendly,
+  verbose: true,
+  diff_mode: :by_object,
+  use_color: true,
+  context_lines: 5,
+  show_legend: true
+)
+
+# Access rich data
+puts result.operations
+puts result.statistics
+----
+
+=== Pattern 3: Strict Validation
+
+[source,ruby]
+----
+# Exact match required
+Canon::Comparison.equivalent?(doc1, doc2,
+  preprocessing: :c14n,    # Canonicalize first
+  match_profile: :strict,  # Exact matching
+  verbose: true            # Show any differences
+)
+----
+
+=== Pattern 4: Flexible Content Comparison
+
+[source,ruby]
+----
+# Focus on content, ignore structure
+Canon::Comparison.equivalent?(doc1, doc2,
+  preprocessing: :normalize,
+  diff_algorithm: :semantic,
+  match_profile: :content_only,
+  verbose: true
+)
+----
+
+== Anti-Patterns to Avoid
+
+=== Anti-Pattern 1: Over-Configuration
+
+[source,ruby]
+----
+# DON'T: Conflicting settings
+Canon::Comparison.equivalent?(doc1, doc2,
+  preprocessing: :c14n,             # Canonicalizes
+  match: {
+    structural_whitespace: :strict  # Conflicts with c14n
+  }
+)
+
+# DO: Choose one approach
+Canon::Comparison.equivalent?(doc1, doc2,
+  preprocessing: :c14n  # Handles normalization
+)
+----
+
+=== Anti-Pattern 2: Wrong Algorithm/Mode Combination
+
+[source,ruby]
+----
+# SUBOPTIMAL: Loses semantic information
+Canon::Comparison.equivalent?(doc1, doc2,
+  diff_algorithm: :semantic,
+  diff_mode: :by_line  # Doesn't show operations well
+)
+
+# BETTER: Use natural fit
+Canon::Comparison.equivalent?(doc1, doc2,
+  diff_algorithm: :semantic,
+  diff_mode: :by_object  # Shows operations clearly
+)
+----
+
+=== Anti-Pattern 3: Unnecessary Semantic Algorithm
+
+[source,ruby]
+----
+# SLOW: Semantic not needed for similar documents
+Canon::Comparison.equivalent?(doc1, doc2,
+  diff_algorithm: :semantic  # Overkill if no restructuring
+)
+
+# FASTER: Use DOM for similar structures
+Canon::Comparison.equivalent?(doc1, doc2,
+  diff_algorithm: :dom  # Fast for similar docs
+)
+----
+
+=== Anti-Pattern 4: Missing Verbose Flag
+
+[source,ruby]
+----
+# DON'T: Can't see what's different
+result = Canon::Comparison.equivalent?(doc1, doc2)
+# result is just true/false
+
+# DO: Enable verbose for debugging
+result = Canon::Comparison.equivalent?(doc1, doc2,
+  verbose: true
+)
+# result.diff shows actual differences
+----
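The four anti-patterns above are mechanical enough to check in code, for example in a shared test helper. The following is a minimal sketch under that assumption: `configuration_warnings` is a hypothetical function that inspects a plain options hash, and it knows nothing beyond the four cases listed in this section.

```ruby
# Hypothetical lint pass over an options hash. It only covers the four
# anti-patterns described in this section and is not part of Canon.
def configuration_warnings(options, similar_structure: false)
  warnings = []
  match = options.fetch(:match, {})

  if options[:preprocessing] == :c14n && match[:structural_whitespace] == :strict
    warnings << "c14n preprocessing conflicts with structural_whitespace: :strict"
  end
  if options[:diff_algorithm] == :semantic && options[:diff_mode] == :by_line
    warnings << "semantic algorithm with by_line output hides operations"
  end
  if options[:diff_algorithm] == :semantic && similar_structure
    warnings << "semantic algorithm is unnecessary for similarly structured documents"
  end
  warnings << "verbose is off, so no diff will be shown" unless options[:verbose]
  warnings
end

risky = { preprocessing: :c14n, match: { structural_whitespace: :strict } }
puts configuration_warnings(risky)
```

The sample configuration triggers two warnings: the c14n conflict and the missing `verbose` flag. Whether two documents count as similarly structured cannot be read from the options, so the caller passes that in.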
+
+== Performance Considerations
+
+=== Performance Impact by Layer
+
+[cols="2,2,2,3"]
+|===
+|Layer |Low Impact |Medium Impact |High Impact
+
+|**Layer 1**
+|none
+|normalize, format
+|c14n (complex documents)
+
+|**Layer 2**
+|dom
+|—
+|semantic
+
+|**Layer 3**
+|Any profile
+|—
+|Complex custom dimensions
+
+|**Layer 4**
+|by_line
+|by_object (small docs)
+|by_object (large docs)
+|===
+
+=== Optimization Guidelines
+
+**For Speed**:
+[source,ruby]
+----
+Canon::Comparison.equivalent?(doc1, doc2,
+  preprocessing: :none,    # Skip preprocessing
+  diff_algorithm: :dom,    # Fast algorithm
+  match_profile: :strict,  # Simple matching
+  diff_mode: :by_line      # Fast output
+)
+----
+
+**For Intelligence** (accepting slower performance):
+[source,ruby]
+----
+Canon::Comparison.equivalent?(doc1, doc2,
+  preprocessing: :normalize,  # Normalize first
+  diff_algorithm: :semantic,  # Intelligent algorithm
+  diff_mode: :by_object       # Rich output
+)
+----
+
+== Migration Checklist
+
+When changing configuration:
+
+=== Changing Algorithm (DOM → Semantic)
+
+- [ ] Update `diff_algorithm` option
+- [ ] Consider changing `diff_mode` to `by_object`
+- [ ] Remove or update `attribute_order` expectations
+- [ ] Update test assertions for operation-based output
+- [ ] Accept slower performance
+- [ ] Review move detection impact
+
+=== Changing Algorithm (Semantic → DOM)
+
+- [ ] Update `diff_algorithm` option
+- [ ] Consider changing `diff_mode` to `by_line`
+- [ ] Add `attribute_order: :ignore` if needed
+- [ ] Update test assertions for line-based output
+- [ ] Expect faster performance
+- [ ] Accept no move detection
+
+=== Changing Match Profile
+
+- [ ] Review impact on existing tests
+- [ ] Understand what each dimension does
+- [ ] Test with sample documents
+- [ ] Update documentation
+
+== See Also
+
+* link:../understanding/comparison-pipeline.adoc[Comparison Pipeline] - Understanding the 4 layers
+* link:../understanding/algorithms/[Algorithms] - Detailed algorithm documentation
+* link:../features/match-options/algorithm-specific-behavior.adoc[Algorithm-Specific Behavior] - How algorithms differ
+* link:../features/diff-formatting/algorithm-specific-output.adoc[Algorithm-Specific Output] - Output format differences
+* link:../features/match-options/[Match Options] - All matching options
+* link:../features/diff-formatting/[Diff Formatting] - Formatting options