canon 0.1.6 → 0.1.7

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (136)
  1. checksums.yaml +4 -4
  2. data/.rubocop_todo.yml +163 -67
  3. data/README.adoc +400 -7
  4. data/docs/Gemfile +9 -0
  5. data/docs/INDEX.adoc +99 -182
  6. data/docs/_config.yml +100 -0
  7. data/docs/advanced/diff-classification.adoc +547 -0
  8. data/docs/advanced/diff-pipeline.adoc +358 -0
  9. data/docs/advanced/index.adoc +214 -0
  10. data/docs/advanced/semantic-diff-report.adoc +390 -0
  11. data/docs/{VERBOSE.adoc → advanced/verbose-mode-architecture.adoc} +51 -53
  12. data/docs/features/diff-formatting/algorithm-specific-output.adoc +533 -0
  13. data/docs/{CHARACTER_VISUALIZATION.adoc → features/diff-formatting/character-visualization.adoc} +23 -62
  14. data/docs/features/diff-formatting/colors-and-symbols.adoc +606 -0
  15. data/docs/features/diff-formatting/context-and-grouping.adoc +490 -0
  16. data/docs/features/diff-formatting/display-filtering.adoc +472 -0
  17. data/docs/features/diff-formatting/index.adoc +140 -0
  18. data/docs/features/environment-configuration/index.adoc +327 -0
  19. data/docs/features/environment-configuration/override-system.adoc +436 -0
  20. data/docs/features/environment-configuration/size-limits.adoc +273 -0
  21. data/docs/features/index.adoc +173 -0
  22. data/docs/features/input-validation/index.adoc +521 -0
  23. data/docs/features/match-options/algorithm-specific-behavior.adoc +365 -0
  24. data/docs/features/match-options/html-policies.adoc +312 -0
  25. data/docs/features/match-options/index.adoc +621 -0
  26. data/docs/getting-started/index.adoc +83 -0
  27. data/docs/getting-started/quick-start.adoc +76 -0
  28. data/docs/guides/choosing-configuration.adoc +689 -0
  29. data/docs/guides/index.adoc +181 -0
  30. data/docs/{CLI.adoc → interfaces/cli/index.adoc} +18 -13
  31. data/docs/interfaces/index.adoc +101 -0
  32. data/docs/{RSPEC.adoc → interfaces/rspec/index.adoc} +242 -31
  33. data/docs/{RUBY_API.adoc → interfaces/ruby-api/index.adoc} +118 -16
  34. data/docs/lychee.toml +65 -0
  35. data/docs/reference/cli-options.adoc +418 -0
  36. data/docs/reference/environment-variables.adoc +375 -0
  37. data/docs/reference/index.adoc +204 -0
  38. data/docs/reference/options-across-interfaces.adoc +417 -0
  39. data/docs/understanding/algorithms/dom-diff.adoc +389 -0
  40. data/docs/understanding/algorithms/index.adoc +314 -0
  41. data/docs/understanding/algorithms/semantic-tree-diff.adoc +533 -0
  42. data/docs/understanding/architecture.adoc +447 -0
  43. data/docs/understanding/comparison-pipeline.adoc +317 -0
  44. data/docs/understanding/formats/html.adoc +380 -0
  45. data/docs/understanding/formats/index.adoc +261 -0
  46. data/docs/understanding/formats/json.adoc +390 -0
  47. data/docs/understanding/formats/xml.adoc +366 -0
  48. data/docs/understanding/formats/yaml.adoc +504 -0
  49. data/docs/understanding/index.adoc +130 -0
  50. data/lib/canon/cli.rb +42 -1
  51. data/lib/canon/commands/diff_command.rb +108 -23
  52. data/lib/canon/comparison/compare_profile.rb +101 -0
  53. data/lib/canon/comparison/comparison_result.rb +41 -2
  54. data/lib/canon/comparison/html_comparator.rb +292 -71
  55. data/lib/canon/comparison/html_compare_profile.rb +117 -0
  56. data/lib/canon/comparison/match_options.rb +42 -4
  57. data/lib/canon/comparison/strategies/base_match_strategy.rb +99 -0
  58. data/lib/canon/comparison/strategies/match_strategy_factory.rb +74 -0
  59. data/lib/canon/comparison/strategies/semantic_tree_match_strategy.rb +220 -0
  60. data/lib/canon/comparison/xml_comparator.rb +695 -91
  61. data/lib/canon/comparison.rb +207 -2
  62. data/lib/canon/config/env_provider.rb +71 -0
  63. data/lib/canon/config/env_schema.rb +58 -0
  64. data/lib/canon/config/override_resolver.rb +55 -0
  65. data/lib/canon/config/type_converter.rb +59 -0
  66. data/lib/canon/config.rb +158 -29
  67. data/lib/canon/data_model.rb +29 -0
  68. data/lib/canon/diff/diff_classifier.rb +74 -14
  69. data/lib/canon/diff/diff_context_builder.rb +41 -0
  70. data/lib/canon/diff/diff_line.rb +18 -2
  71. data/lib/canon/diff/diff_node.rb +18 -3
  72. data/lib/canon/diff/diff_node_mapper.rb +71 -12
  73. data/lib/canon/diff/formatting_detector.rb +53 -0
  74. data/lib/canon/diff_formatter/by_line/base_formatter.rb +60 -5
  75. data/lib/canon/diff_formatter/by_line/html_formatter.rb +68 -16
  76. data/lib/canon/diff_formatter/by_line/json_formatter.rb +0 -37
  77. data/lib/canon/diff_formatter/by_line/simple_formatter.rb +0 -42
  78. data/lib/canon/diff_formatter/by_line/xml_formatter.rb +116 -31
  79. data/lib/canon/diff_formatter/by_line/yaml_formatter.rb +0 -37
  80. data/lib/canon/diff_formatter/by_object/base_formatter.rb +126 -19
  81. data/lib/canon/diff_formatter/by_object/xml_formatter.rb +30 -1
  82. data/lib/canon/diff_formatter/debug_output.rb +7 -1
  83. data/lib/canon/diff_formatter/diff_detail_formatter.rb +674 -57
  84. data/lib/canon/diff_formatter/legend.rb +42 -0
  85. data/lib/canon/diff_formatter.rb +78 -9
  86. data/lib/canon/errors.rb +56 -0
  87. data/lib/canon/formatters/html_formatter_base.rb +35 -1
  88. data/lib/canon/formatters/json_formatter.rb +3 -0
  89. data/lib/canon/formatters/yaml_formatter.rb +3 -0
  90. data/lib/canon/html/data_model.rb +229 -0
  91. data/lib/canon/html.rb +9 -0
  92. data/lib/canon/options/cli_generator.rb +70 -0
  93. data/lib/canon/options/registry.rb +234 -0
  94. data/lib/canon/rspec_matchers.rb +34 -13
  95. data/lib/canon/tree_diff/adapters/html_adapter.rb +316 -0
  96. data/lib/canon/tree_diff/adapters/json_adapter.rb +204 -0
  97. data/lib/canon/tree_diff/adapters/xml_adapter.rb +285 -0
  98. data/lib/canon/tree_diff/adapters/yaml_adapter.rb +213 -0
  99. data/lib/canon/tree_diff/core/attribute_comparator.rb +84 -0
  100. data/lib/canon/tree_diff/core/matching.rb +241 -0
  101. data/lib/canon/tree_diff/core/node_signature.rb +164 -0
  102. data/lib/canon/tree_diff/core/node_weight.rb +135 -0
  103. data/lib/canon/tree_diff/core/tree_node.rb +450 -0
  104. data/lib/canon/tree_diff/matchers/hash_matcher.rb +258 -0
  105. data/lib/canon/tree_diff/matchers/similarity_matcher.rb +168 -0
  106. data/lib/canon/tree_diff/matchers/structural_propagator.rb +242 -0
  107. data/lib/canon/tree_diff/matchers/universal_matcher.rb +220 -0
  108. data/lib/canon/tree_diff/operation_converter.rb +631 -0
  109. data/lib/canon/tree_diff/operations/operation.rb +92 -0
  110. data/lib/canon/tree_diff/operations/operation_detector.rb +626 -0
  111. data/lib/canon/tree_diff/tree_diff_integrator.rb +140 -0
  112. data/lib/canon/tree_diff.rb +33 -0
  113. data/lib/canon/validators/json_validator.rb +3 -1
  114. data/lib/canon/validators/yaml_validator.rb +3 -1
  115. data/lib/canon/version.rb +1 -1
  116. data/lib/canon/xml/data_model.rb +22 -23
  117. data/lib/canon/xml/element_matcher.rb +128 -20
  118. data/lib/canon/xml/namespace_helper.rb +110 -0
  119. data/lib/canon.rb +3 -0
  120. metadata +81 -23
  121. data/_config.yml +0 -116
  122. data/docs/ADVANCED_TOPICS.adoc +0 -20
  123. data/docs/BASIC_USAGE.adoc +0 -16
  124. data/docs/CUSTOMIZING_BEHAVIOR.adoc +0 -19
  125. data/docs/DIFF_ARCHITECTURE.adoc +0 -435
  126. data/docs/DIFF_FORMATTING.adoc +0 -540
  127. data/docs/FORMATS.adoc +0 -447
  128. data/docs/INPUT_VALIDATION.adoc +0 -477
  129. data/docs/MATCH_ARCHITECTURE.adoc +0 -463
  130. data/docs/MATCH_OPTIONS.adoc +0 -719
  131. data/docs/MODES.adoc +0 -432
  132. data/docs/NORMATIVE_INFORMATIVE_DIFFS.adoc +0 -219
  133. data/docs/OPTIONS.adoc +0 -1387
  134. data/docs/PREPROCESSING.adoc +0 -491
  135. data/docs/SEMANTIC_DIFF_REPORT.adoc +0 -528
  136. data/docs/UNDERSTANDING_CANON.adoc +0 -17
@@ -0,0 +1,436 @@
1
+ ---
2
+ title: Override System
3
+ parent: Environment Configuration
4
+ grand_parent: Features
5
+ nav_order: 2
6
+ ---
7
+ = Override system
8
+ :toc:
9
+ :toclevels: 3
10
+
11
+ == Purpose
12
+
13
+ Canon's override system allows environment variables to take precedence over programmatic configuration, enabling flexible deployment without code changes.
14
+
15
+ This page explains how the priority chain works, when overrides apply, and how to use them effectively.
16
+
17
+ == Priority chain
18
+
19
+ Configuration values are resolved using a strict three-level priority:
20
+
21
+ [source]
22
+ ----
23
+ ┌─────────────────────────────────┐
24
+ │ 1. Environment Variables │ ← Highest Priority
25
+ │ (CANON_XML_DIFF_ALGORITHM) │
26
+ └──────────────┬──────────────────┘
27
+ ↓ overrides
28
+ ┌─────────────────────────────────┐
29
+ │ 2. Programmatic Configuration │ ← Medium Priority
30
+ │ (config.xml.diff.algorithm) │
31
+ └──────────────┬──────────────────┘
32
+ ↓ overrides
33
+ ┌─────────────────────────────────┐
34
+ │ 3. Default Values │ ← Lowest Priority
35
+ │ (defined in Canon::Config) │
36
+ └─────────────────────────────────┘
37
+ ----
38
+
39
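+ **Rule**: Higher priority always wins, regardless of the order in which programmatic values are set. Environment variables must still be in place before Canon loads (see Troubleshooting).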
40
+
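The three-level chain can be sketched in plain Ruby. This is a minimal illustration under stated assumptions, not Canon's implementation: the class name `SketchResolver` and its methods are hypothetical, and only the lookup order (ENV, then programmatic, then default) comes from this page.

```ruby
# Hypothetical sketch of the priority chain; not Canon's actual resolver.
class SketchResolver
  def initialize(defaults:, env: {})
    @defaults = defaults
    @programmatic = {}
    @env = env
  end

  # Record a programmatic value; it never displaces an ENV value.
  def set(key, value)
    @programmatic[key] = value
  end

  # Walk the chain from highest to lowest priority.
  def resolve(key)
    return @env[key] if @env.key?(key)
    return @programmatic[key] if @programmatic.key?(key)

    @defaults[key]
  end

  # Report which level supplied the value.
  def source_for(key)
    return :env if @env.key?(key)
    return :programmatic if @programmatic.key?(key)

    :default
  end
end

resolver = SketchResolver.new(defaults: { algorithm: :dom },
                              env: { algorithm: :semantic })
resolver.set(:algorithm, :dom)       # recorded, but outranked by ENV
puts resolver.resolve(:algorithm)    # prints "semantic"
puts resolver.source_for(:algorithm) # prints "env"
```

Because `resolve` checks the levels in a fixed order on every call, the time at which `set` is called has no effect on the outcome.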
41
+ == How overrides work
42
+
43
+ === Environment variables override programmatic settings
44
+
45
+ When an environment variable is set, it **always** takes precedence:
46
+
47
+ [source,ruby]
48
+ ----
49
+ # Set ENV before requiring Canon
50
+ ENV['CANON_XML_DIFF_ALGORITHM'] = 'semantic'
51
+
52
+ # Programmatic setting is IGNORED
53
+ config = Canon::Config.instance
54
+ config.xml.diff.algorithm = :dom
55
+
56
+ # ENV wins
57
+ puts config.xml.diff.algorithm # => :semantic (not :dom)
58
+ ----
59
+
60
+ === Format-specific variables override global variables
61
+
62
+ Format-specific ENV vars override global ENV vars:
63
+
64
+ [source,bash]
65
+ ----
66
+ # Global setting
67
+ export CANON_ALGORITHM=dom
68
+
69
+ # Format-specific override
70
+ export CANON_XML_DIFF_ALGORITHM=semantic
71
+
72
+ # Result:
73
+ # - XML uses semantic (format-specific wins)
74
+ # - HTML, JSON, YAML use dom (global applies)
75
+ ----
76
+
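The lookup order between format-specific and global variables can be sketched as follows. The helper `env_value_for` is hypothetical and exists only for illustration; the naming scheme (`CANON_<FORMAT>_DIFF_<ATTRIBUTE>`, then `CANON_<ATTRIBUTE>`) is inferred from the variable names shown on this page.

```ruby
# Hypothetical helper: try the format-specific variable first, then the
# global one. Not part of Canon's API.
def env_value_for(format, attribute, env = ENV)
  specific = "CANON_#{format.to_s.upcase}_DIFF_#{attribute.to_s.upcase}"
  global = "CANON_#{attribute.to_s.upcase}"
  env[specific] || env[global]
end

env = {
  "CANON_ALGORITHM" => "dom",
  "CANON_XML_DIFF_ALGORITHM" => "semantic"
}
puts env_value_for(:xml, :algorithm, env)  # prints "semantic"
puts env_value_for(:json, :algorithm, env) # prints "dom"
```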
77
+ === Setting order doesn't matter
78
+
79
+ Unlike programmatic configuration, where the last assignment wins, ENV variable priority is **positional**, not temporal:
80
+
81
+ [source,ruby]
82
+ ----
83
+ # Set ENV first
84
+ ENV['CANON_ALGORITHM'] = 'semantic'
85
+
86
+ # Configure programmatically later
87
+ config = Canon::Config.instance
88
+ config.xml.diff.algorithm = :dom # This is IGNORED
89
+
90
+ # ENV still wins, even though set "earlier"
91
+ puts config.xml.diff.algorithm # => :semantic
92
+ ----
93
+
94
+ == When to use environment variables
95
+
96
+ === Use ENV variables for
97
+
98
+ **CI/CD configuration**::
99
+ Different test runs need different settings without code changes.
100
+ +
101
+ [source,yaml]
102
+ ----
103
+ # In .github/workflows/test.yml
104
+ env:
105
+ CANON_ALGORITHM: semantic
106
+ CANON_USE_COLOR: false
107
+ ----
108
+
109
+ **Container deployment**::
110
+ Docker containers with different comparison behavior.
111
+ +
112
+ [source,dockerfile]
113
+ ----
114
+ # Dockerfile
115
+ ENV CANON_XML_DIFF_ALGORITHM=semantic
116
+ ENV CANON_CONTEXT_LINES=5
117
+ ----
118
+
119
+ **Environment-specific behavior**::
120
+ Different settings for development, staging, production.
121
+ +
122
+ [source,bash]
123
+ ----
124
+ # Development
125
+ export CANON_VERBOSE_DIFF=true
126
+
127
+ # Production
128
+ export CANON_VERBOSE_DIFF=false
129
+ ----
130
+
131
+ **Testing algorithm behavior**::
132
+ Quick switching between algorithms for comparison.
133
+ +
134
+ [source,bash]
135
+ ----
136
+ # Test with DOM
137
+ CANON_ALGORITHM=dom bundle exec rspec
138
+
139
+ # Test with semantic
140
+ CANON_ALGORITHM=semantic bundle exec rspec
141
+ ----
142
+
143
+ === Use programmatic config for
144
+
145
+ **Application defaults**::
146
+ Set sensible defaults in your application code.
147
+ +
148
+ [source,ruby]
149
+ ----
150
+ # config/initializers/canon.rb
151
+ Canon::Config.configure do |config|
152
+ config.xml.diff.algorithm = :dom
153
+ config.xml.diff.use_color = true
154
+ end
155
+ ----
156
+
157
+ **Test-specific overrides**::
158
+ Per-test configuration in RSpec.
159
+ +
160
+ [source,ruby]
161
+ ----
162
+ RSpec.describe "My feature" do
163
+ it "compares with specific options" do
164
+ result = Canon::Comparison.equivalent?(doc1, doc2,
165
+ diff_algorithm: :semantic # Test-specific
166
+ )
167
+ end
168
+ end
169
+ ----
170
+
171
+ **Dynamic configuration**::
172
+ Runtime decisions based on document size or other factors.
173
+ +
174
+ [source,ruby]
175
+ ----
176
+ algorithm = file_size > 100_000 ? :dom : :semantic
177
+ Canon::Comparison.equivalent?(doc1, doc2,
178
+ diff_algorithm: algorithm
179
+ )
180
+ ----
181
+
182
+ == Verification and debugging
183
+
184
+ === Check which source provides a value
185
+
186
+ Use the resolver to inspect configuration sources:
187
+
188
+ [source,ruby]
189
+ ----
190
+ config = Canon::Config.instance
191
+ resolver = config.xml.diff.instance_variable_get(:@resolver)
192
+
193
+ # Check sources
194
+ puts "Algorithm from: #{resolver.source_for(:algorithm)}"
195
+ # => "env" | "programmatic" | "default"
196
+
197
+ # See all sources
198
+ puts "ENV values: #{resolver.env.inspect}"
199
+ puts "Programmatic values: #{resolver.programmatic.inspect}"
200
+ puts "Defaults: #{resolver.defaults.inspect}"
201
+ ----
202
+
203
+ === List all active ENV variables
204
+
205
+ [source,ruby]
206
+ ----
207
+ # Get all Canon ENV variables
208
+ canon_env = ENV.select { |k, v| k.start_with?('CANON_') }
209
+ puts "Active Canon ENV variables:"
210
+ canon_env.each do |key, value|
211
+ puts " #{key} = #{value}"
212
+ end
213
+ ----
214
+
215
+ === Test ENV override behavior
216
+
217
+ [source,ruby]
218
+ ----
219
+ # Verify ENV takes precedence
220
+ ENV['CANON_ALGORITHM'] = 'semantic'
221
+
222
+ config = Canon::Config.instance
223
+ config.xml.diff.algorithm = :dom # Should be ignored
224
+
225
+ if config.xml.diff.algorithm == :semantic
226
+ puts "✓ ENV override working correctly"
227
+ else
228
+ puts "✗ ENV override NOT working!"
229
+ end
230
+ ----
231
+
232
+ == Common patterns
233
+
234
+ === Pattern 1: Sensible defaults with ENV override
235
+
236
+ [source,ruby]
237
+ ----
238
+ # config/initializers/canon.rb
239
+ # Set defaults that work for most cases
240
+ Canon::RSpecMatchers.configure do |config|
241
+ config.xml.diff.algorithm = :dom
242
+ config.xml.diff.use_color = !ENV['CI']
243
+ end
244
+
245
+ # Users can override per test run:
246
+ # CANON_ALGORITHM=semantic bundle exec rspec
247
+ ----
248
+
249
+ === Pattern 2: Environment-based configuration
250
+
251
+ [source,ruby]
252
+ ----
253
+ # config/environments/development.rb
254
+ ENV['CANON_VERBOSE_DIFF'] = 'true'
255
+ ENV['CANON_USE_COLOR'] = 'true'
256
+
257
+ # config/environments/production.rb
258
+ ENV['CANON_VERBOSE_DIFF'] = 'false'
259
+ ENV['CANON_USE_COLOR'] = 'false'
260
+ ----
261
+
262
+ === Pattern 3: CI matrix testing
263
+
264
+ [source,yaml]
265
+ ----
266
+ # .github/workflows/test.yml
267
+ strategy:
268
+ matrix:
269
+ algorithm: [dom, semantic]
270
+ steps:
271
+ - name: Run tests
272
+ env:
273
+ CANON_ALGORITHM: ${{ matrix.algorithm }}
274
+ run: bundle exec rspec
275
+ ----
276
+
277
+ === Pattern 4: Format-specific CI configuration
278
+
279
+ [source,yaml]
280
+ ----
281
+ # .github/workflows/test.yml
282
+ env:
283
+ CANON_XML_DIFF_ALGORITHM: semantic # XML uses semantic
284
+ CANON_HTML_DIFF_ALGORITHM: dom # HTML uses DOM
285
+ CANON_USE_COLOR: false # All formats: no color
286
+ ----
287
+
288
+ == Troubleshooting
289
+
290
+ === ENV variable not working
291
+
292
+ **Problem**: ENV variable seems to be ignored.
293
+
294
+ **Checklist**:
295
+
296
+ 1. **Variable name correct?**
297
+ +
298
+ [source,bash]
299
+ ----
300
+ # Correct
301
+ export CANON_XML_DIFF_ALGORITHM=semantic
302
+
303
+ # Wrong (missing underscore between DIFF and ALGORITHM)
304
+ export CANON_XML_DIFFALGORITHM=semantic
305
+ ----
306
+
307
+ 2. **ENV set before Canon loads?**
308
+ +
309
+ [source,ruby]
310
+ ----
311
+ # Wrong order
312
+ require 'canon'
313
+ ENV['CANON_ALGORITHM'] = 'semantic' # Too late!
314
+
315
+ # Correct order
316
+ ENV['CANON_ALGORITHM'] = 'semantic'
317
+ require 'canon'
318
+ ----
319
+
320
+ 3. **Value valid for attribute type?**
321
+ +
322
+ [source,bash]
323
+ ----
324
+ # Wrong (leading colon; use the bare name, not Ruby symbol syntax)
325
+ export CANON_ALGORITHM=:semantic
326
+
327
+ # Correct
328
+ export CANON_ALGORITHM=semantic
329
+ ----
330
+
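Why the leading colon fails becomes clearer once the string is converted to a typed value. The converter below is a hypothetical sketch: the function name, the set of types, and the accepted boolean spellings are assumptions for illustration, and Canon's own conversion rules may differ.

```ruby
# Hypothetical converter from raw ENV strings to typed values.
# Canon's own conversion rules may differ.
def convert_env_value(raw, type)
  case type
  when :symbol  then raw.to_sym
  when :integer then Integer(raw, 10) # raises ArgumentError on bad input
  when :boolean
    case raw.downcase
    when "true"  then true
    when "false" then false
    else raise ArgumentError, "not a boolean: #{raw.inspect}"
    end
  else raw
  end
end

p convert_env_value("semantic", :symbol)  # prints :semantic
p convert_env_value(":semantic", :symbol) # prints :":semantic"
p convert_env_value("5242880", :integer)  # prints 5242880
p convert_env_value("false", :boolean)    # prints false
```

In this sketch, `:semantic` written in the shell yields the symbol `:":semantic"`, which would not match `:semantic` in any comparison.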
331
+ === Programmatic setting not working
332
+
333
+ **Problem**: Setting `config.xml.diff.algorithm = :semantic` doesn't work.
334
+
335
+ **Solution**: Check whether an ENV variable is set, since it would override the programmatic value:
336
+
337
+ [source,bash]
338
+ ----
339
+ # Check current ENV
340
+ echo $CANON_XML_DIFF_ALGORITHM
341
+ echo $CANON_ALGORITHM
342
+
343
+ # If set, unset it
344
+ unset CANON_XML_DIFF_ALGORITHM
345
+ unset CANON_ALGORITHM
346
+ ----
347
+
348
+ === Inconsistent behavior across runs
349
+
350
+ **Problem**: Tests behave differently on different machines.
351
+
352
+ **Cause**: One machine has Canon ENV variables set in its shell profile.
353
+
354
+ **Solution**: Document required ENV variables or unset them in test setup:
355
+
356
+ [source,ruby]
357
+ ----
358
+ # spec/spec_helper.rb
359
+ RSpec.configure do |config|
360
+ config.before(:suite) do
361
+ # Clear any Canon ENV vars to ensure consistent tests
362
+ ENV.keys.select { |k| k.start_with?('CANON_') }.each do |key|
363
+ ENV.delete(key)
364
+ end
365
+ end
366
+ end
367
+ ----
368
+
369
+ == Best practices
370
+
371
+ === Document expected ENV variables
372
+
373
+ Create a `.env.example` file, or otherwise document the ENV variables your project expects:
374
+
375
+ [source,bash]
376
+ ----
377
+ # .env.example
378
+ # Canon Configuration
379
+ # Uncomment and modify as needed
380
+
381
+ # CANON_ALGORITHM=dom
382
+ # CANON_USE_COLOR=true
383
+ # CANON_MAX_FILE_SIZE=5242880
384
+ ----
385
+
386
+ === Use ENV for deployment, code for defaults
387
+
388
+ [source,ruby]
389
+ ----
390
+ # Good: Code provides defaults
391
+ Canon::RSpecMatchers.configure do |config|
392
+ config.xml.diff.algorithm = :dom # Default
393
+ end
394
+
395
+ # Good: ENV overrides for deployment
396
+ # In production: CANON_ALGORITHM=semantic
397
+ ----
398
+
399
+ === Avoid mixing ENV and programmatic for same attribute
400
+
401
+ [source,ruby]
402
+ ----
403
+ # Confusing: Don't do this
404
+ ENV['CANON_ALGORITHM'] = 'semantic'
405
+ config.xml.diff.algorithm = :dom # Ignored, but confusing
406
+
407
+ # Better: Use one or the other consistently
408
+ ENV['CANON_ALGORITHM'] = 'semantic'
409
+ # OR
410
+ config.xml.diff.algorithm = :dom
411
+ ----
412
+
413
+ === Test with both ENV and without
414
+
415
+ [source,ruby]
416
+ ----
417
+ RSpec.describe "Canon behavior" do
418
+ context "without ENV override" do
419
+ before { ENV.delete('CANON_ALGORITHM') }
420
+ # Tests...
421
+ end
422
+
423
+ context "with ENV override" do
424
+ before { ENV['CANON_ALGORITHM'] = 'semantic' }
425
+ after { ENV.delete('CANON_ALGORITHM') }
426
+ # Tests...
427
+ end
428
+ end
429
+ ----
430
+
431
+ == See also
432
+
433
+ * link:index.adoc[Environment Configuration] - Overview and usage
434
+ * link:size-limits.adoc[Size Limits] - Limit-specific ENV variables
435
+ * link:../../reference/environment-variables.adoc[Environment Variables Reference] - Complete listing
436
+ * link:../../reference/options-across-interfaces.adoc[Options Across Interfaces] - How options work in CLI, Ruby, RSpec
@@ -0,0 +1,273 @@
1
+ ---
2
+ title: Size Limits
3
+ parent: Environment Configuration
4
+ grand_parent: Features
5
+ nav_order: 1
6
+ ---
7
+ = Size limits
8
+ :toc:
9
+ :toclevels: 3
10
+
11
+ == Purpose
12
+
13
+ Canon provides configurable size limits to prevent hangs or excessive resource usage when processing very large files. These limits apply uniformly across all interfaces (CLI, Ruby API, RSpec).
14
+
15
+ == Available limits
16
+
17
+ === File size limit
18
+
19
+ Maximum file size in bytes before comparison is rejected.
20
+
21
+ **Environment variable**: `CANON_MAX_FILE_SIZE` or `CANON_{FORMAT}_DIFF_MAX_FILE_SIZE`
22
+
23
+ **Default**: 5,242,880 bytes (5 MB)
24
+
25
+ **When triggered**: Before parsing, if either input file exceeds this size
26
+
27
+ [source,bash]
28
+ ----
29
+ # Set max file size to 10MB for XML
30
+ export CANON_XML_DIFF_MAX_FILE_SIZE=10485760
31
+
32
+ # Set globally (5MB default)
33
+ export CANON_MAX_FILE_SIZE=5242880
34
+ ----
35
+
36
+ === Node count limit
37
+
38
+ Maximum number of nodes in a tree structure before comparison is rejected.
39
+
40
+ **Environment variable**: `CANON_MAX_NODE_COUNT` or `CANON_{FORMAT}_DIFF_MAX_NODE_COUNT`
41
+
42
+ **Default**: 10,000 nodes
43
+
44
+ **When triggered**: After parsing, if the document tree exceeds this node count
45
+
46
+ [source,bash]
47
+ ----
48
+ # Set max node count for XML diff
49
+ export CANON_XML_DIFF_MAX_NODE_COUNT=20000
50
+
51
+ # Set globally (10,000 default)
52
+ export CANON_MAX_NODE_COUNT=10000
53
+ ----
54
+
55
+ === Diff output lines limit
56
+
57
+ Maximum number of lines in diff output before truncation.
58
+
59
+ **Environment variable**: `CANON_MAX_DIFF_LINES` or `CANON_{FORMAT}_DIFF_MAX_DIFF_LINES`
60
+
61
+ **Default**: 10,000 lines
62
+
63
+ **When triggered**: During diff generation, output is truncated at this line count
64
+
65
+ [source,bash]
66
+ ----
67
+ # Set max diff lines for XML
68
+ export CANON_XML_DIFF_MAX_DIFF_LINES=15000
69
+
70
+ # Set globally (10,000 default)
71
+ export CANON_MAX_DIFF_LINES=10000
72
+ ----
73
+
74
+ == Common scenarios
75
+
76
+ === Large SVG files
77
+
78
+ When working with large SVG files (e.g., 3.5MB) that may cause hangs:
79
+
80
+ [source,bash]
81
+ ----
82
+ # Increase limits for large SVG processing
83
+ export CANON_MAX_FILE_SIZE=10485760 # 10MB
84
+ export CANON_MAX_NODE_COUNT=50000 # 50,000 nodes
85
+ export CANON_MAX_DIFF_LINES=20000 # 20,000 lines
86
+
87
+ bundle exec rspec spec/test_031_spec.rb
88
+ ----
89
+
90
+ === CI/CD with large documents
91
+
92
+ In CI environments where you know documents are large but safe:
93
+
94
+ [source,yaml]
95
+ ----
96
+ # In .github/workflows/test.yml
97
+ env:
98
+ CANON_MAX_FILE_SIZE: 20971520 # 20MB
99
+ CANON_MAX_NODE_COUNT: 100000 # 100k nodes
100
+ CANON_MAX_DIFF_LINES: 50000 # 50k lines
101
+ ----
102
+
103
+ === Format-specific limits
104
+
105
+ Different formats may need different limits:
106
+
107
+ [source,bash]
108
+ ----
109
+ # XML can have more nodes
110
+ export CANON_XML_DIFF_MAX_NODE_COUNT=50000
111
+
112
+ # JSON typically has fewer nodes
113
+ export CANON_JSON_DIFF_MAX_NODE_COUNT=10000
114
+
115
+ # HTML might need larger file size
116
+ export CANON_HTML_DIFF_MAX_FILE_SIZE=10485760
117
+ ----
118
+
119
+ == Disabling limits
120
+
121
+ To disable a limit, set it to 0 or a negative value:
122
+
123
+ [source,bash]
124
+ ----
125
+ # Disable file size limit (not recommended)
126
+ export CANON_MAX_FILE_SIZE=0
127
+
128
+ # Disable node count limit (use with caution)
129
+ export CANON_MAX_NODE_COUNT=-1
130
+ ----
131
+
132
+ WARNING: Disabling limits may cause Canon to hang or consume excessive memory on pathologically large inputs.
133
+
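The "zero or negative disables the limit" rule can be sketched as a small predicate. The function name `limit_exceeded?` and the constant are hypothetical; only the default value and the disabling rule come from this page.

```ruby
# Hypothetical check; not Canon's implementation.
DEFAULT_MAX_FILE_SIZE = 5_242_880 # 5 MB, the default stated on this page

# A limit of zero or below is treated as "disabled".
def limit_exceeded?(actual, limit)
  return false if limit <= 0

  actual > limit
end

puts limit_exceeded?(10_485_760, DEFAULT_MAX_FILE_SIZE) # prints "true"
puts limit_exceeded?(10_485_760, 0)                     # prints "false"
puts limit_exceeded?(10_485_760, -1)                    # prints "false"
```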
134
+ == Error messages
135
+
136
+ When a limit is exceeded, Canon reports it with a descriptive message:
137
+
138
+ === File size exceeded
139
+
140
+ [source]
141
+ ----
142
+ Canon::ValidationError: File size exceeds maximum limit
143
+ Maximum: 5242880 bytes (5.0 MB)
144
+ Actual: 10485760 bytes (10.0 MB)
145
+
146
+ To increase this limit, set:
147
+ CANON_MAX_FILE_SIZE=10485760
148
+ ----
149
+
150
+ === Node count exceeded
151
+
152
+ [source]
153
+ ----
154
+ Canon::ValidationError: Node count exceeds maximum limit
155
+ Maximum: 10000 nodes
156
+ Actual: 25000 nodes
157
+
158
+ To increase this limit, set:
159
+ CANON_MAX_NODE_COUNT=25000
160
+ ----
161
+
162
+ === Diff lines exceeded
163
+
164
+ [source]
165
+ ----
166
+ Canon::DiffTruncationWarning: Diff output truncated
167
+ Maximum: 10000 lines
168
+ Actual: 15000 lines (truncated to 10000)
169
+
170
+ To increase this limit, set:
171
+ CANON_MAX_DIFF_LINES=15000
172
+ ----
173
+
174
+ == Programmatic configuration
175
+
176
+ While environment variables are recommended, you can also configure limits programmatically:
177
+
178
+ [source,ruby]
179
+ ----
180
+ # NOT RECOMMENDED - use ENV vars instead
181
+ Canon::Config.instance.xml.diff.max_file_size = 10_485_760
182
+ Canon::Config.instance.xml.diff.max_node_count = 50_000
183
+ Canon::Config.instance.xml.diff.max_diff_lines = 20_000
184
+ ----
185
+
186
+ However, **environment variables will override programmatic settings** per the priority chain.
187
+
188
+ == Performance considerations
189
+
190
+ === Why limits exist
191
+
192
+ Limits prevent:
193
+
194
+ * **Hangs**: Very large documents can cause O(n²) algorithms to hang
195
+ * **Memory exhaustion**: Huge trees consume excessive RAM
196
+ * **Unreadable output**: 100k+ line diffs are not useful
197
+
198
+ === Choosing appropriate limits
199
+
200
+ **File size**:
201
+
202
+ * **Small projects**: 5MB default is fine
203
+ * **Large documents**: 10-20MB for SVG, generated HTML
204
+ * **Very large**: 50MB+ only if you know what you're doing
205
+
206
+ **Node count**:
207
+
208
+ * **Simple documents**: 10k default is fine
209
+ * **Complex documents**: 50k for large XML, nested JSON
210
+ * **Very complex**: 100k+ only for known-safe inputs
211
+
212
+ **Diff lines**:
213
+
214
+ * **Readable output**: 10k default is fine
215
+ * **Detailed diffs**: 20-50k for comprehensive output
216
+ * **Debug mode**: 100k+ for full comparison
217
+
218
+ == Troubleshooting
219
+
220
+ === Tests failing with size limit errors
221
+
222
+ If your tests start failing due to size limits:
223
+
224
+ 1. **Verify the limit is appropriate**: Check if documents really are that large
225
+ 2. **Set ENV in test helper**:
226
+ +
227
+ [source,ruby]
228
+ ----
229
+ # spec/spec_helper.rb
230
+ ENV['CANON_MAX_FILE_SIZE'] = '10485760' # 10MB
231
+ ENV['CANON_MAX_NODE_COUNT'] = '50000'
232
+ ----
233
+
234
+ 3. **Or set per-test**:
235
+ +
236
+ [source,ruby]
237
+ ----
238
+ around do |example|
239
+ ClimateControl.modify(
240
+ CANON_MAX_FILE_SIZE: '10485760'
241
+ ) do
242
+ example.run
243
+ end
244
+ end
245
+ ----
246
+
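If adding the ClimateControl gem is not an option, a similar scoped override can be sketched with the standard library alone. The helper `with_env` is hypothetical and shown only for illustration; whether Canon picks up a value changed after it has loaded is not confirmed by this sketch.

```ruby
# Hypothetical helper: temporarily set ENV keys, then restore the old values.
def with_env(overrides)
  # Remember the previous value of each key (nil when it was unset).
  saved = overrides.keys.to_h { |key| [key, ENV[key]] }
  overrides.each { |key, value| ENV[key] = value }
  yield
ensure
  # Assigning nil to an ENV key deletes it, so unset keys stay unset.
  saved&.each { |key, value| ENV[key] = value }
end

with_env("CANON_MAX_FILE_SIZE" => "10485760") do
  puts ENV["CANON_MAX_FILE_SIZE"] # prints "10485760"
end
```

In RSpec this could be wrapped in an `around` hook in the same way as the ClimateControl example above.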
247
+ === Performance degradation
248
+
249
+ If comparisons become slow after increasing limits:
250
+
251
+ 1. **Use DOM algorithm**: Faster than semantic for large documents
252
+ +
253
+ [source,bash]
254
+ ----
255
+ export CANON_ALGORITHM=dom
256
+ ----
257
+
258
+ 2. **Disable expensive features**:
259
+ +
260
+ [source,bash]
261
+ ----
262
+ export CANON_SHOW_COMPARE=false
263
+ export CANON_VERBOSE_DIFF=false
264
+ ----
265
+
266
+ 3. **Consider if you really need to compare such large files**
267
+
268
+ == See also
269
+
270
+ * link:index.adoc[Environment Configuration] - Complete ENV configuration
271
+ * link:override-system.adoc[Override System] - How ENV vars work
272
+ * link:../../reference/environment-variables.adoc[Environment Variables Reference] - All variables
273
+ * link:../../understanding/algorithms/dom-diff.adoc[DOM Algorithm] - Faster for large files