canon 0.1.6 → 0.1.7

Files changed (136)
  1. checksums.yaml +4 -4
  2. data/.rubocop_todo.yml +163 -67
  3. data/README.adoc +400 -7
  4. data/docs/Gemfile +9 -0
  5. data/docs/INDEX.adoc +99 -182
  6. data/docs/_config.yml +100 -0
  7. data/docs/advanced/diff-classification.adoc +547 -0
  8. data/docs/advanced/diff-pipeline.adoc +358 -0
  9. data/docs/advanced/index.adoc +214 -0
  10. data/docs/advanced/semantic-diff-report.adoc +390 -0
  11. data/docs/{VERBOSE.adoc → advanced/verbose-mode-architecture.adoc} +51 -53
  12. data/docs/features/diff-formatting/algorithm-specific-output.adoc +533 -0
  13. data/docs/{CHARACTER_VISUALIZATION.adoc → features/diff-formatting/character-visualization.adoc} +23 -62
  14. data/docs/features/diff-formatting/colors-and-symbols.adoc +606 -0
  15. data/docs/features/diff-formatting/context-and-grouping.adoc +490 -0
  16. data/docs/features/diff-formatting/display-filtering.adoc +472 -0
  17. data/docs/features/diff-formatting/index.adoc +140 -0
  18. data/docs/features/environment-configuration/index.adoc +327 -0
  19. data/docs/features/environment-configuration/override-system.adoc +436 -0
  20. data/docs/features/environment-configuration/size-limits.adoc +273 -0
  21. data/docs/features/index.adoc +173 -0
  22. data/docs/features/input-validation/index.adoc +521 -0
  23. data/docs/features/match-options/algorithm-specific-behavior.adoc +365 -0
  24. data/docs/features/match-options/html-policies.adoc +312 -0
  25. data/docs/features/match-options/index.adoc +621 -0
  26. data/docs/getting-started/index.adoc +83 -0
  27. data/docs/getting-started/quick-start.adoc +76 -0
  28. data/docs/guides/choosing-configuration.adoc +689 -0
  29. data/docs/guides/index.adoc +181 -0
  30. data/docs/{CLI.adoc → interfaces/cli/index.adoc} +18 -13
  31. data/docs/interfaces/index.adoc +101 -0
  32. data/docs/{RSPEC.adoc → interfaces/rspec/index.adoc} +242 -31
  33. data/docs/{RUBY_API.adoc → interfaces/ruby-api/index.adoc} +118 -16
  34. data/docs/lychee.toml +65 -0
  35. data/docs/reference/cli-options.adoc +418 -0
  36. data/docs/reference/environment-variables.adoc +375 -0
  37. data/docs/reference/index.adoc +204 -0
  38. data/docs/reference/options-across-interfaces.adoc +417 -0
  39. data/docs/understanding/algorithms/dom-diff.adoc +389 -0
  40. data/docs/understanding/algorithms/index.adoc +314 -0
  41. data/docs/understanding/algorithms/semantic-tree-diff.adoc +533 -0
  42. data/docs/understanding/architecture.adoc +447 -0
  43. data/docs/understanding/comparison-pipeline.adoc +317 -0
  44. data/docs/understanding/formats/html.adoc +380 -0
  45. data/docs/understanding/formats/index.adoc +261 -0
  46. data/docs/understanding/formats/json.adoc +390 -0
  47. data/docs/understanding/formats/xml.adoc +366 -0
  48. data/docs/understanding/formats/yaml.adoc +504 -0
  49. data/docs/understanding/index.adoc +130 -0
  50. data/lib/canon/cli.rb +42 -1
  51. data/lib/canon/commands/diff_command.rb +108 -23
  52. data/lib/canon/comparison/compare_profile.rb +101 -0
  53. data/lib/canon/comparison/comparison_result.rb +41 -2
  54. data/lib/canon/comparison/html_comparator.rb +292 -71
  55. data/lib/canon/comparison/html_compare_profile.rb +117 -0
  56. data/lib/canon/comparison/match_options.rb +42 -4
  57. data/lib/canon/comparison/strategies/base_match_strategy.rb +99 -0
  58. data/lib/canon/comparison/strategies/match_strategy_factory.rb +74 -0
  59. data/lib/canon/comparison/strategies/semantic_tree_match_strategy.rb +220 -0
  60. data/lib/canon/comparison/xml_comparator.rb +695 -91
  61. data/lib/canon/comparison.rb +207 -2
  62. data/lib/canon/config/env_provider.rb +71 -0
  63. data/lib/canon/config/env_schema.rb +58 -0
  64. data/lib/canon/config/override_resolver.rb +55 -0
  65. data/lib/canon/config/type_converter.rb +59 -0
  66. data/lib/canon/config.rb +158 -29
  67. data/lib/canon/data_model.rb +29 -0
  68. data/lib/canon/diff/diff_classifier.rb +74 -14
  69. data/lib/canon/diff/diff_context_builder.rb +41 -0
  70. data/lib/canon/diff/diff_line.rb +18 -2
  71. data/lib/canon/diff/diff_node.rb +18 -3
  72. data/lib/canon/diff/diff_node_mapper.rb +71 -12
  73. data/lib/canon/diff/formatting_detector.rb +53 -0
  74. data/lib/canon/diff_formatter/by_line/base_formatter.rb +60 -5
  75. data/lib/canon/diff_formatter/by_line/html_formatter.rb +68 -16
  76. data/lib/canon/diff_formatter/by_line/json_formatter.rb +0 -37
  77. data/lib/canon/diff_formatter/by_line/simple_formatter.rb +0 -42
  78. data/lib/canon/diff_formatter/by_line/xml_formatter.rb +116 -31
  79. data/lib/canon/diff_formatter/by_line/yaml_formatter.rb +0 -37
  80. data/lib/canon/diff_formatter/by_object/base_formatter.rb +126 -19
  81. data/lib/canon/diff_formatter/by_object/xml_formatter.rb +30 -1
  82. data/lib/canon/diff_formatter/debug_output.rb +7 -1
  83. data/lib/canon/diff_formatter/diff_detail_formatter.rb +674 -57
  84. data/lib/canon/diff_formatter/legend.rb +42 -0
  85. data/lib/canon/diff_formatter.rb +78 -9
  86. data/lib/canon/errors.rb +56 -0
  87. data/lib/canon/formatters/html_formatter_base.rb +35 -1
  88. data/lib/canon/formatters/json_formatter.rb +3 -0
  89. data/lib/canon/formatters/yaml_formatter.rb +3 -0
  90. data/lib/canon/html/data_model.rb +229 -0
  91. data/lib/canon/html.rb +9 -0
  92. data/lib/canon/options/cli_generator.rb +70 -0
  93. data/lib/canon/options/registry.rb +234 -0
  94. data/lib/canon/rspec_matchers.rb +34 -13
  95. data/lib/canon/tree_diff/adapters/html_adapter.rb +316 -0
  96. data/lib/canon/tree_diff/adapters/json_adapter.rb +204 -0
  97. data/lib/canon/tree_diff/adapters/xml_adapter.rb +285 -0
  98. data/lib/canon/tree_diff/adapters/yaml_adapter.rb +213 -0
  99. data/lib/canon/tree_diff/core/attribute_comparator.rb +84 -0
  100. data/lib/canon/tree_diff/core/matching.rb +241 -0
  101. data/lib/canon/tree_diff/core/node_signature.rb +164 -0
  102. data/lib/canon/tree_diff/core/node_weight.rb +135 -0
  103. data/lib/canon/tree_diff/core/tree_node.rb +450 -0
  104. data/lib/canon/tree_diff/matchers/hash_matcher.rb +258 -0
  105. data/lib/canon/tree_diff/matchers/similarity_matcher.rb +168 -0
  106. data/lib/canon/tree_diff/matchers/structural_propagator.rb +242 -0
  107. data/lib/canon/tree_diff/matchers/universal_matcher.rb +220 -0
  108. data/lib/canon/tree_diff/operation_converter.rb +631 -0
  109. data/lib/canon/tree_diff/operations/operation.rb +92 -0
  110. data/lib/canon/tree_diff/operations/operation_detector.rb +626 -0
  111. data/lib/canon/tree_diff/tree_diff_integrator.rb +140 -0
  112. data/lib/canon/tree_diff.rb +33 -0
  113. data/lib/canon/validators/json_validator.rb +3 -1
  114. data/lib/canon/validators/yaml_validator.rb +3 -1
  115. data/lib/canon/version.rb +1 -1
  116. data/lib/canon/xml/data_model.rb +22 -23
  117. data/lib/canon/xml/element_matcher.rb +128 -20
  118. data/lib/canon/xml/namespace_helper.rb +110 -0
  119. data/lib/canon.rb +3 -0
  120. metadata +81 -23
  121. data/_config.yml +0 -116
  122. data/docs/ADVANCED_TOPICS.adoc +0 -20
  123. data/docs/BASIC_USAGE.adoc +0 -16
  124. data/docs/CUSTOMIZING_BEHAVIOR.adoc +0 -19
  125. data/docs/DIFF_ARCHITECTURE.adoc +0 -435
  126. data/docs/DIFF_FORMATTING.adoc +0 -540
  127. data/docs/FORMATS.adoc +0 -447
  128. data/docs/INPUT_VALIDATION.adoc +0 -477
  129. data/docs/MATCH_ARCHITECTURE.adoc +0 -463
  130. data/docs/MATCH_OPTIONS.adoc +0 -719
  131. data/docs/MODES.adoc +0 -432
  132. data/docs/NORMATIVE_INFORMATIVE_DIFFS.adoc +0 -219
  133. data/docs/OPTIONS.adoc +0 -1387
  134. data/docs/PREPROCESSING.adoc +0 -491
  135. data/docs/SEMANTIC_DIFF_REPORT.adoc +0 -528
  136. data/docs/UNDERSTANDING_CANON.adoc +0 -17
@@ -0,0 +1,533 @@
1
+ ---
2
+ title: Semantic Algorithm
3
+ parent: Algorithms
4
+ grand_parent: Understanding
5
+ nav_order: 2
6
+ ---
7
+ = Semantic algorithm
8
+ :toc:
9
+ :toclevels: 3
10
+
11
+ WARNING: The semantic tree diff algorithm is currently **experimental** and under active development. While functional and tested, the API and behavior may change in future releases. Use with caution in production environments.
12
+
13
+ == Purpose
14
+
15
+ The Semantic algorithm is Canon's **intelligent, experimental algorithm** for document comparison. It provides signature-based matching with operation detection (INSERT, DELETE, UPDATE, MOVE).
16
+
17
+ This page explains when to use the Semantic algorithm, how it differs from DOM, and how to configure it effectively.
18
+
19
+ == When to Use
20
+
21
+ The Semantic algorithm is for **advanced use cases** where move detection and operation-level analysis justify the performance cost.
22
+
23
+ === Use Semantic Algorithm When
24
+
25
+ * ✓ You need to **detect element moves and reordering**
26
+ * ✓ Documents have **significant restructuring**
27
+ * ✓ You need **operation-level analysis** (INSERT, DELETE, UPDATE, MOVE)
28
+ * ✓ You want **statistical analysis** of changes
29
+ * ✓ You're **analyzing document evolution**
30
+ * ✓ You're willing to **accept experimental status**
31
+ * ✓ You're working with **smaller documents** (< 10KB)
32
+
33
+ === Characteristics
34
+
35
+ [cols="2,3"]
36
+ |===
37
+ |Feature |Semantic Algorithm
38
+
39
+ |**Status**
40
+ |Experimental, under development
41
+
42
+ |**Performance**
43
+ |Slower - O(n²) worst case
44
+
45
+ |**Memory Usage**
46
+ |Higher - builds tree structures
47
+
48
+ |**Matching Strategy**
49
+ |Signature-based similarity matching
50
+
51
+ |**Move Detection**
52
+ |Yes - detects MOVE operations
53
+
54
+ |**Output Format**
55
+ |Operation-based (INSERT, DELETE, UPDATE, MOVE)
56
+
57
+ |**Best For**
58
+ |Restructured documents, operation analysis
59
+
60
+ |**Document Size**
61
+ |Best for smaller documents (< 10KB)
62
+ |===
63
+
64
+ == How It Works
65
+
66
+ The Semantic algorithm uses a sophisticated three-phase matching process:
67
+
68
+ === Phase 1: Hash-Based Exact Matching
69
+
70
+ Matches nodes with identical structure and content:
71
+
72
+ * **Fast** - O(n) performance
73
+ * **Eliminates** unchanged subtrees
74
+ * **Reduces** problem size for later phases
75
+
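The idea behind Phase 1 can be illustrated with a minimal sketch. It is not Canon's implementation: `HashNode`, `subtree_hash`, and `exact_matches` are hypothetical names, and the digest recipe (name, sorted attributes, text, child digests) is an assumption chosen only to show how identical subtrees can be paired with one hash lookup per node.

```ruby
require "digest"

# Hypothetical minimal node type; Canon's real tree nodes are richer.
HashNode = Struct.new(:name, :attrs, :text, :children) do
  # Digest over name, sorted attributes, text, and child digests, so
  # identical subtrees always produce identical digests.
  def subtree_hash
    @subtree_hash ||= Digest::SHA256.hexdigest(
      [name, attrs.sort, text, children.map(&:subtree_hash)].inspect
    )
  end

  # Yields this node and every descendant, depth first.
  def each_node(&block)
    yield self
    children.each { |child| child.each_node(&block) }
  end
end

# Pairs each node of tree1 with an unused node of tree2 that has the same digest.
def exact_matches(tree1, tree2)
  index = Hash.new { |hash, key| hash[key] = [] }
  tree2.each_node { |node| index[node.subtree_hash] << node }

  matches = []
  tree1.each_node do |node|
    candidate = index[node.subtree_hash].shift
    matches << [node, candidate] if candidate
  end
  matches
end

doc1 = HashNode.new("book", {}, "", [
  HashNode.new("title", {}, "Canon Guide", []),
  HashNode.new("author", {}, "Alice", [])
])
doc2 = HashNode.new("book", {}, "", [
  HashNode.new("author", {}, "Alice", []),
  HashNode.new("title", {}, "Canon Guide", [])
])

pairs = exact_matches(doc1, doc2)
puts pairs.map { |old_node, _new_node| old_node.name }.inspect
# prints ["title", "author"]
```

In this sketch the two `book` roots do not match exactly because the child order differs, so they would be left for the later phases.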
76
+ === Phase 2: Similarity-Based Matching
77
+
78
+ Matches similar but not identical nodes:
79
+
80
+ * **Compares** node names, attributes, text, structure
81
+ * **Scores** similarity using weighted metrics
82
+ * **Threshold** - Default 0.95 (95% similar)
83
+
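A weighted similarity score of the kind Phase 2 describes might look like the following sketch. The weights, the Jaccard measure, and the helper names (`jaccard`, `similarity`, `similar?`) are assumptions made for illustration; Canon's actual metrics may differ, and the structural component is omitted here for brevity.

```ruby
require "set"

# Hypothetical weights; Canon's actual metrics and weights may differ.
WEIGHTS = { name: 0.3, attributes: 0.2, text: 0.5 }.freeze

# Jaccard overlap of two collections: |A & B| / |A | B| (1.0 when both are empty).
def jaccard(a, b)
  a = a.to_set
  b = b.to_set
  union = a | b
  union.empty? ? 1.0 : (a & b).size.to_f / union.size
end

# Each node is a plain Hash: { name:, attrs:, text: }.
def similarity(n1, n2)
  name_score = n1[:name] == n2[:name] ? 1.0 : 0.0
  attr_score = jaccard(n1[:attrs].to_a, n2[:attrs].to_a)
  text_score = jaccard(n1[:text].split, n2[:text].split)

  WEIGHTS[:name] * name_score +
    WEIGHTS[:attributes] * attr_score +
    WEIGHTS[:text] * text_score
end

# Two nodes match when their score reaches the threshold.
def similar?(n1, n2, threshold: 0.95)
  similarity(n1, n2) >= threshold
end

old_title = { name: "title", attrs: { "lang" => "en" }, text: "Canon user guide for XML" }
new_title = { name: "title", attrs: { "lang" => "en" }, text: "Canon user guide for HTML" }

puts similarity(old_title, new_title).round(3)          # prints 0.833
puts similar?(old_title, new_title)                     # prints false
puts similar?(old_title, new_title, threshold: 0.80)    # prints true
```

With these assumed weights, the pair is rejected at the default threshold and accepted at 0.80, which is the trade-off the threshold setting controls.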
84
+ === Phase 3: Structural Propagation
85
+
86
+ Improves match quality using context:
87
+
88
+ * **Top-down** - Propagate from matched parents
89
+ * **Bottom-up** - Propagate from matched children
90
+ * **Resolves** ambiguous matches
91
+
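Top-down propagation can be sketched as a single pass over already-matched parents. This is an illustration under stated assumptions, not Canon's code: `PropNode` and `propagate_top_down` are hypothetical, and the rule used here (pair children whose name is unique among the unmatched children on both sides) is only one plausible way to resolve ambiguity.

```ruby
# Hypothetical node class; default equality is object identity.
class PropNode
  attr_reader :name, :children

  def initialize(name, children = [])
    @name = name
    @children = children
  end
end

# Given pairs of already-matched nodes, pair their still-unmatched children
# when a child name is unambiguous on both sides.
def propagate_top_down(matched_pairs)
  matched1 = matched_pairs.map(&:first)
  matched2 = matched_pairs.map(&:last)
  new_pairs = []

  matched_pairs.each do |parent1, parent2|
    free1 = parent1.children.reject { |child| matched1.include?(child) }
    free2 = parent2.children.reject { |child| matched2.include?(child) }

    free1.group_by(&:name).each do |name, candidates1|
      candidates2 = free2.select { |child| child.name == name }
      next unless candidates1.size == 1 && candidates2.size == 1

      new_pairs << [candidates1.first, candidates2.first]
    end
  end
  new_pairs
end

chapter1 = PropNode.new("chapter", [PropNode.new("title"), PropNode.new("para")])
chapter2 = PropNode.new("chapter", [PropNode.new("title"), PropNode.new("para"), PropNode.new("note")])

pairs = propagate_top_down([[chapter1, chapter2]])
puts pairs.map { |old_node, new_node| "#{old_node.name} -> #{new_node.name}" }.inspect
# prints ["title -> title", "para -> para"]
```

The `note` element stays unmatched in this sketch, so a later step would report it as an INSERT.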
92
+ === Signature-Based Matching
93
+
94
+ Unlike DOM's position-based comparison, Semantic uses **signatures**:
95
+
96
+ .Signature-based comparison example
97
+ [example]
98
+ ====
99
+ [source,xml]
100
+ ----
101
+ <!-- Document 1 -->
102
+ <book>
103
+ <title>Canon Guide</title>
104
+ <author>Alice</author>
105
+ </book>
106
+
107
+ <!-- Document 2 -->
108
+ <book>
109
+ <author>Alice</author>
110
+ <title>Canon Guide</title>
111
+ </book>
112
+ ----
113
+
114
+ Semantic algorithm:
115
+ 1. Calculates signature for each element
116
+ 2. `<author>Alice</author>` has same signature in both documents
117
+ 3. Detects as **MOVE** operation (moved from position 2 to position 1)
118
+
119
+ Result: 1 MOVE operation detected (author element moved)
120
+ ====
121
+
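The book example above can be reproduced with a small sketch. It assumes a signature made of element name and text, and it treats the children that keep their relative order (a longest increasing subsequence of new positions) as stable, reporting only the rest as moved. Both choices, and the names `signature`, `lis_indices`, and `detect_moves`, are illustrative assumptions rather than Canon's implementation.

```ruby
# Hypothetical signature: element name plus text content.
def signature(node)
  [node[:name], node[:text]]
end

# Indices of one longest increasing subsequence of seq (O(n^2) for clarity).
def lis_indices(seq)
  return [] if seq.empty?

  length = Array.new(seq.size, 1)
  prev = Array.new(seq.size)
  seq.each_index do |i|
    (0...i).each do |j|
      if seq[j] < seq[i] && length[j] + 1 > length[i]
        length[i] = length[j] + 1
        prev[i] = j
      end
    end
  end

  i = length.index(length.max)
  result = []
  while i
    result.unshift(i)
    i = prev[i]
  end
  result
end

# Reports children whose signature exists in both lists but whose relative
# order changed. Duplicate signatures resolve to their first position.
def detect_moves(children1, children2)
  positions = {}
  children2.each_with_index { |child, i| positions[signature(child)] ||= i }

  kept = children1.each_with_index.filter_map do |child, i|
    new_pos = positions[signature(child)]
    [child, i, new_pos] if new_pos
  end

  stable = lis_indices(kept.map { |_child, _from, to| to })
  kept.each_with_index
      .reject { |_entry, k| stable.include?(k) }
      .map { |(child, from, to), _k| { node: child[:name], from: from + 1, to: to + 1 } }
end

doc1_children = [{ name: "title", text: "Canon Guide" }, { name: "author", text: "Alice" }]
doc2_children = [{ name: "author", text: "Alice" }, { name: "title", text: "Canon Guide" }]

moves = detect_moves(doc1_children, doc2_children)
puts moves.inspect
```

For the two `book` documents this yields a single move of `author` from position 2 to position 1, rather than a DELETE plus an INSERT.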
122
+ == Operation Detection
123
+
124
+ The Semantic algorithm detects eight operation types:
125
+
126
+ === Basic Operations (Level 1)
127
+
128
+ **INSERT**:: New node added
129
+ [source]
130
+ ----
131
+ + <chapter id="3">New Chapter</chapter>
132
+ ----
133
+
134
+ **DELETE**:: Node removed
135
+ [source]
136
+ ----
137
+ - <chapter id="old">Removed Chapter</chapter>
138
+ ----
139
+
140
+ **UPDATE**:: Node content/attributes changed
141
+ [source]
142
+ ----
143
+ ~ <title>Old → New</title>
144
+ ----
145
+
146
+ === Structural Operations (Level 2)
147
+
148
+ **MOVE**:: Node relocated to different position
149
+ [source]
150
+ ----
151
+ → <author>Alice</author> (moved from position 2 to 1)
152
+ ----
153
+
154
+ === Semantic Operations (Level 3)
155
+
156
+ **MERGE**:: Multiple nodes combined into one
157
+ [source]
158
+ ----
159
+ ⊕ <section> (merged from 2 separate sections)
160
+ ----
161
+
162
+ **SPLIT**:: One node divided into multiple
163
+ [source]
164
+ ----
165
+ ⊖ <section> (split into 2 separate sections)
166
+ ----
167
+
168
+ **UPGRADE**:: Node promoted to higher level
169
+ [source]
170
+ ----
171
+ ↑ <section> (promoted from depth 3 to depth 2)
172
+ ----
173
+
174
+ **DOWNGRADE**:: Node demoted to lower level
175
+ [source]
176
+ ----
177
+ ↓ <section> (demoted from depth 2 to depth 3)
178
+ ----
179
+
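The eight operation types and the symbols used in the snippets above can be gathered into one small record type. This is a hypothetical sketch for illustration: the `Operation` struct, its fields, and `OPERATION_SYMBOLS` are not taken from Canon's API.

```ruby
# Symbols as used in the examples on this page.
OPERATION_SYMBOLS = {
  insert: "+", delete: "-", update: "~", move: "→",
  merge: "⊕", split: "⊖", upgrade: "↑", downgrade: "↓"
}.freeze

# Hypothetical operation record: a type, an element path, optional details.
Operation = Struct.new(:type, :path, :details, keyword_init: true) do
  def to_s
    symbol = OPERATION_SYMBOLS.fetch(type)
    details ? "#{symbol} #{path} (#{details})" : "#{symbol} #{path}"
  end
end

ops = [
  Operation.new(type: :insert, path: "book/chapter"),
  Operation.new(type: :move, path: "book/author", details: "moved from position 2 to 1"),
  Operation.new(type: :upgrade, path: "book/section", details: "promoted from depth 3 to depth 2")
]

ops.each { |op| puts op }
```

Using `fetch` means an unknown operation type raises `KeyError` instead of silently printing an empty symbol.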
180
+ == Configuration
181
+
182
+ === Basic Usage
183
+
184
+ **Ruby API**:
185
+ [source,ruby]
186
+ ----
187
+ # Explicitly specify semantic algorithm
188
+ Canon::Comparison.equivalent?(doc1, doc2,
189
+ diff_algorithm: :semantic
190
+ )
191
+ ----
192
+
193
+ **CLI**:
194
+ [source,bash]
195
+ ----
196
+ canon diff file1.xml file2.xml --diff-algorithm semantic
197
+ ----
198
+
199
+ === With Similarity Threshold
200
+
201
+ Control how strict matching is:
202
+
203
+ [source,ruby]
204
+ ----
205
+ Canon::Comparison.equivalent?(doc1, doc2,
206
+ diff_algorithm: :semantic,
207
+ match: {
208
+ similarity_threshold: 0.90 # More lenient (default: 0.95)
209
+ }
210
+ )
211
+ ----
212
+
213
+ * **Higher** (0.99) - Very conservative, only nearly identical nodes match
214
+ * **Lower** (0.80) - More aggressive, allows less similar nodes to match
215
+ * **Default** (0.95) - Balanced for most use cases
216
+
217
+ === With Match Options
218
+
219
+ Semantic algorithm interprets match options for signature calculation:
220
+
221
+ [source,ruby]
222
+ ----
223
+ Canon::Comparison.equivalent?(doc1, doc2,
224
+ diff_algorithm: :semantic,
225
+ match: {
226
+ text_content: :normalize, # Affects text signatures
227
+ attribute_order: :ignore, # Always ignored (unordered in signatures)
228
+ element_position: :ignore # MOVEs become informative
229
+ }
230
+ )
231
+ ----
232
+
233
+ See link:../../features/match-options/algorithm-specific-behavior.adoc[Algorithm-Specific Behavior] for details.
234
+
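How match options can feed into a signature is sketched below. The `:normalize` and `:ignore` values come from the example above; `:strict` as the third mode, the helper names, and the whitespace-collapsing rule are assumptions made for illustration. Attributes are sorted so that their order never affects the signature.

```ruby
# Applies a text_content mode to a text value before it enters the signature.
# The :strict mode is an assumption for this sketch.
def normalized_text(text, mode)
  case mode
  when :strict    then text
  when :normalize then text.strip.gsub(/\s+/, " ")
  when :ignore    then nil
  else raise ArgumentError, "unknown text_content mode: #{mode.inspect}"
  end
end

# Hypothetical signature: name, order-independent attributes, processed text.
def node_signature(node, text_content: :strict)
  [node[:name], node[:attrs].sort, normalized_text(node[:text], text_content)]
end

a = { name: "title", attrs: { "lang" => "en", "id" => "t1" }, text: "Canon  Guide\n" }
b = { name: "title", attrs: { "id" => "t1", "lang" => "en" }, text: "Canon Guide" }

puts node_signature(a) == node_signature(b)
# prints false (whitespace differs under :strict)
puts node_signature(a, text_content: :normalize) == node_signature(b, text_content: :normalize)
# prints true
```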
235
+ === With Diff Formatting
236
+
237
+ Semantic works best with `by_object` mode:
238
+
239
+ [source,ruby]
240
+ ----
241
+ # Operation-based output (natural fit for Semantic)
242
+ Canon::Comparison.equivalent?(doc1, doc2,
243
+ diff_algorithm: :semantic,
244
+ diff_mode: :by_object, # Shows operations
245
+ verbose: true
246
+ )
247
+
248
+ # Traditional output (also works)
249
+ Canon::Comparison.equivalent?(doc1, doc2,
250
+ diff_algorithm: :semantic,
251
+ diff_mode: :by_line, # Traditional format
252
+ verbose: true
253
+ )
254
+ ----
255
+
256
+ == Output Format
257
+
258
+ === Operation-Based Output (Default)
259
+
260
+ The Semantic algorithm naturally produces operation-based output:
261
+
262
+ .Operation-based diff example
263
+ [example]
264
+ ====
265
+ ```
266
+ UPDATE: book/title: "Old Title" → "New Title"
267
+ MOVE: book/author → book/author (position 2 → 1)
268
+
269
+ Statistics:
270
+ INSERT: 0
271
+ DELETE: 0
272
+ UPDATE: 1
273
+ MOVE: 1
274
+ ```
275
+
276
+ * Shows **what changed** (operation type)
277
+ * Shows **where it changed** (element path)
278
+ * Provides **statistics** (operation counts)
279
+ ====
280
+
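The statistics section of the output above is a per-type count of operations. A minimal sketch of that tally, using plain hashes in place of Canon's operation objects (the `statistics` and `render_statistics` helpers are hypothetical):

```ruby
STAT_TYPES = %i[insert delete update move].freeze

# Counts operations per type, reporting zero for types that did not occur.
def statistics(operations)
  counts = operations.map { |op| op[:type] }.tally
  STAT_TYPES.to_h { |type| [type, counts.fetch(type, 0)] }
end

# Renders the counts in the layout shown in the example output.
def render_statistics(stats)
  lines = stats.map { |type, count| "  #{type.to_s.upcase}: #{count}" }
  (["Statistics:"] + lines).join("\n")
end

operations = [
  { type: :update, path: "book/title" },
  { type: :move, path: "book/author" }
]

puts render_statistics(statistics(operations))
```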
281
+ === Accessing Operations
282
+
283
+ [source,ruby]
284
+ ----
285
+ result = Canon::Comparison.equivalent?(doc1, doc2,
286
+ diff_algorithm: :semantic,
287
+ verbose: true
288
+ )
289
+
290
+ # Access operations
291
+ result.operations.each do |op|
292
+ puts "Type: #{op.type}" # :insert, :delete, :update, :move
293
+ puts "Path: #{op.path}" # Element path
294
+ puts "Details: #{op.details}" # Operation-specific info
295
+ end
296
+
297
+ # Access statistics
298
+ puts "Moves: #{result.statistics.moves}"
299
+ puts "Updates: #{result.statistics.updates}"
300
+ ----
301
+
302
+ == Advantages
303
+
304
+ === Intelligent Matching
305
+
306
+ * **Detects moves** - Tracks content relocation
307
+ * **Handles restructuring** - Works with heavily modified documents
308
+ * **Signature-based** - Matches similar content anywhere
309
+
310
+ .Move detection example
311
+ [cols="2,2,2"]
312
+ |===
313
+ |DOM Algorithm |Semantic Algorithm |Advantage
314
+
315
+ |Shows as DELETE + INSERT
316
+ |Shows as MOVE
317
+ |Clearer understanding
318
+
319
+ |Many false positives
320
+ |Accurate detection
321
+ |Better analysis
322
+
323
+ |Position-dependent
324
+ |Position-independent
325
+ |Handles reordering
326
+ |===
327
+
328
+ === Rich Analysis
329
+
330
+ * **Operation counts** - Statistical view of changes
331
+ * **Operation paths** - Precise location information
332
+ * **Confidence scores** - Match quality indicators
333
+
334
+ == Limitations
335
+
336
+ === Performance
337
+
338
+ The Semantic algorithm is significantly slower than DOM, and the gap widens as documents grow:
339
+
340
+ .Performance comparison
341
+ [cols="1,1,1,1"]
342
+ |===
343
+ |Document Size |DOM Time |Semantic Time |Ratio
344
+
345
+ |1 KB
346
+ |~1 ms
347
+ |~10 ms
348
+ |10x slower
349
+
350
+ |10 KB
351
+ |~10 ms
352
+ |~150 ms
353
+ |15x slower
354
+
355
+ |100 KB
356
+ |~100 ms
357
+ |~3000 ms
358
+ |30x slower
359
+ |===
360
+
361
+ **Workaround**: Use DOM algorithm for large documents, Semantic for smaller ones
362
+
363
+ === Experimental Status
364
+
365
+ * **API may change** - Not stable yet
366
+ * **Behavior may change** - Under active development
367
+ * **Edge cases** - May have unexpected results
368
+
369
+ **Workaround**: Test thoroughly before relying on Semantic in production
370
+
371
+ === Complex Matching
372
+
373
+ * **False matches** - May match unrelated but similar content
374
+ * **Ambiguity** - Multiple similar candidates can confuse matching
375
+ * **Tuning needed** - May require similarity threshold adjustment
376
+
377
+ **Workaround**: Adjust `similarity_threshold` or use DOM algorithm
378
+
379
+ == Common Use Cases
380
+
381
+ === Use Case 1: Detecting Document Reorganization
382
+
383
+ [source,ruby]
384
+ ----
385
+ # Analyze how document was restructured
386
+ result = Canon::Comparison.equivalent?(old_doc, new_doc,
387
+ diff_algorithm: :semantic,
388
+ verbose: true,
389
+ diff_mode: :by_object
390
+ )
391
+
392
+ # Analyze operations
393
+ puts "Content moved: #{result.statistics.moves} times"
394
+ puts "Sections merged: #{result.statistics.merges}"
395
+ puts "Sections split: #{result.statistics.splits}"
396
+ ----
397
+
398
+ === Use Case 2: Content Evolution Tracking
399
+
400
+ [source,ruby]
401
+ ----
402
+ # Track how content evolved over time
403
+ versions = [v1, v2, v3, v4]
404
+
405
+ versions.each_cons(2) do |old, new|
406
+ result = Canon::Comparison.equivalent?(old, new,
407
+ diff_algorithm: :semantic,
408
+ verbose: true
409
+ )
410
+
411
+ log_operations(result.operations)
412
+ end
413
+ ----
414
+
415
+ === Use Case 3: Intelligent Test Assertions
416
+
417
+ [source,ruby]
418
+ ----
419
+ # Allow reordering in tests
420
+ RSpec.describe "Content generation" do
421
+ it "generates correct content regardless of order" do
422
+ actual = generate_content
423
+
424
+ expect(actual).to be_xml_equivalent_to(expected)
425
+ .with_options(
426
+ diff_algorithm: :semantic,
427
+ element_position: :ignore # Ignores moves
428
+ )
429
+ end
430
+ end
431
+ ----
432
+
433
+ == Best Practices
434
+
435
+ === Start with DOM, Upgrade to Semantic When Needed
436
+
437
+ Use the DOM algorithm by default and switch to Semantic only when move detection is required.
438
+
439
+ === Adjust Similarity Threshold
440
+
441
+ Start conservative (0.95+) and lower the threshold gradually if similar nodes fail to match:
442
+
443
+ [source,ruby]
444
+ ----
445
+ # Try different thresholds to find sweet spot
446
+ [0.95, 0.90, 0.85].each do |threshold|
447
+ result = Canon::Comparison.equivalent?(doc1, doc2,
448
+ diff_algorithm: :semantic,
449
+ match: { similarity_threshold: threshold }
450
+ )
451
+ puts "Threshold #{threshold}: #{result.statistics.total} operations"
452
+ end
453
+ ----
454
+
455
+ === Use Appropriate Match Options
456
+
457
+ Configure dimensions to match your needs:
458
+
459
+ [source,ruby]
460
+ ----
461
+ # Ignore cosmetic differences
462
+ Canon::Comparison.equivalent?(doc1, doc2,
463
+ diff_algorithm: :semantic,
464
+ match: {
465
+ structural_whitespace: :ignore,
466
+ element_position: :ignore
467
+ }
468
+ )
469
+ ----
470
+
471
+ == Troubleshooting
472
+
473
+ === Too Many Operations Detected
474
+
475
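+ **Problem**: Everything shows as changed, often because unrelated but similar nodes are being matched
476
+
477
+ **Solution**: Increase similarity threshold so that only nearly identical nodes match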
478
+ [source,ruby]
479
+ ----
480
+ match: { similarity_threshold: 0.98 } # Was 0.95
481
+ ----
482
+
483
+ === Too Few Matches
484
+
485
+ **Problem**: Similar content shows as DELETE + INSERT
486
+
487
+ **Solution**: Decrease similarity threshold
488
+ [source,ruby]
489
+ ----
490
+ match: { similarity_threshold: 0.85 } # Was 0.95
491
+ ----
492
+
493
+ === Performance Issues
494
+
495
+ **Problem**: Comparison is very slow
496
+
497
+ **Solution**: Use DOM algorithm or limit document size
498
+ [source,ruby]
499
+ ----
500
+ # Conditionally use Semantic only for small docs (assumes doc1 and doc2 are Strings)
501
+ algorithm = [doc1.bytesize, doc2.bytesize].max < 10_000 ? :semantic : :dom
502
+ Canon::Comparison.equivalent?(doc1, doc2,
503
+ diff_algorithm: algorithm
504
+ )
505
+ ----
506
+
507
+ == Migration from DOM
508
+
509
+ === Expected Changes
510
+
511
+ When switching from DOM to Semantic:
512
+
513
+ 1. **MOVEs detected** - Reordered content shows as MOVE instead of DELETE+INSERT
514
+ 2. **Different output** - Operations instead of line-based diff
515
+ 3. **Slower performance** - Accept longer comparison time
516
+ 4. **New capabilities** - Access to rich operation analysis
517
+
518
+ === Migration Steps
519
+
520
+ 1. **Test on small subset** - Verify behavior on sample documents
521
+ 2. **Compare outputs** - Review DOM vs Semantic results side-by-side
522
+ 3. **Adjust threshold** - Tune similarity_threshold for your needs
523
+ 4. **Update assertions** - Adapt tests to operation-based output
524
+ 5. **Monitor performance** - Ensure acceptable speed
525
+
526
+ == See Also
527
+
528
+ * link:index.adoc[Algorithms Overview] - Comparison of DOM vs Semantic
529
+ * link:dom-diff.adoc[DOM Algorithm] - Standard algorithm
530
+ * link:../../features/match-options/algorithm-specific-behavior.adoc[Algorithm-Specific Behavior] - How Semantic interprets options
531
+ * link:../../features/diff-formatting/algorithm-specific-output.adoc[Algorithm-Specific Output] - Output format details
532
+ * link:../../guides/choosing-configuration.adoc[Choosing Configuration] - Complete decision guide
533
+ * link:../../advanced/semantic-tree-diff-internals.adoc[Semantic Tree Diff Internals] - Advanced details (if available)