canon 0.1.6 → 0.1.7
- checksums.yaml +4 -4
- data/.rubocop_todo.yml +163 -67
- data/README.adoc +400 -7
- data/docs/Gemfile +9 -0
- data/docs/INDEX.adoc +99 -182
- data/docs/_config.yml +100 -0
- data/docs/advanced/diff-classification.adoc +547 -0
- data/docs/advanced/diff-pipeline.adoc +358 -0
- data/docs/advanced/index.adoc +214 -0
- data/docs/advanced/semantic-diff-report.adoc +390 -0
- data/docs/{VERBOSE.adoc → advanced/verbose-mode-architecture.adoc} +51 -53
- data/docs/features/diff-formatting/algorithm-specific-output.adoc +533 -0
- data/docs/{CHARACTER_VISUALIZATION.adoc → features/diff-formatting/character-visualization.adoc} +23 -62
- data/docs/features/diff-formatting/colors-and-symbols.adoc +606 -0
- data/docs/features/diff-formatting/context-and-grouping.adoc +490 -0
- data/docs/features/diff-formatting/display-filtering.adoc +472 -0
- data/docs/features/diff-formatting/index.adoc +140 -0
- data/docs/features/environment-configuration/index.adoc +327 -0
- data/docs/features/environment-configuration/override-system.adoc +436 -0
- data/docs/features/environment-configuration/size-limits.adoc +273 -0
- data/docs/features/index.adoc +173 -0
- data/docs/features/input-validation/index.adoc +521 -0
- data/docs/features/match-options/algorithm-specific-behavior.adoc +365 -0
- data/docs/features/match-options/html-policies.adoc +312 -0
- data/docs/features/match-options/index.adoc +621 -0
- data/docs/getting-started/index.adoc +83 -0
- data/docs/getting-started/quick-start.adoc +76 -0
- data/docs/guides/choosing-configuration.adoc +689 -0
- data/docs/guides/index.adoc +181 -0
- data/docs/{CLI.adoc → interfaces/cli/index.adoc} +18 -13
- data/docs/interfaces/index.adoc +101 -0
- data/docs/{RSPEC.adoc → interfaces/rspec/index.adoc} +242 -31
- data/docs/{RUBY_API.adoc → interfaces/ruby-api/index.adoc} +118 -16
- data/docs/lychee.toml +65 -0
- data/docs/reference/cli-options.adoc +418 -0
- data/docs/reference/environment-variables.adoc +375 -0
- data/docs/reference/index.adoc +204 -0
- data/docs/reference/options-across-interfaces.adoc +417 -0
- data/docs/understanding/algorithms/dom-diff.adoc +389 -0
- data/docs/understanding/algorithms/index.adoc +314 -0
- data/docs/understanding/algorithms/semantic-tree-diff.adoc +533 -0
- data/docs/understanding/architecture.adoc +447 -0
- data/docs/understanding/comparison-pipeline.adoc +317 -0
- data/docs/understanding/formats/html.adoc +380 -0
- data/docs/understanding/formats/index.adoc +261 -0
- data/docs/understanding/formats/json.adoc +390 -0
- data/docs/understanding/formats/xml.adoc +366 -0
- data/docs/understanding/formats/yaml.adoc +504 -0
- data/docs/understanding/index.adoc +130 -0
- data/lib/canon/cli.rb +42 -1
- data/lib/canon/commands/diff_command.rb +108 -23
- data/lib/canon/comparison/compare_profile.rb +101 -0
- data/lib/canon/comparison/comparison_result.rb +41 -2
- data/lib/canon/comparison/html_comparator.rb +292 -71
- data/lib/canon/comparison/html_compare_profile.rb +117 -0
- data/lib/canon/comparison/match_options.rb +42 -4
- data/lib/canon/comparison/strategies/base_match_strategy.rb +99 -0
- data/lib/canon/comparison/strategies/match_strategy_factory.rb +74 -0
- data/lib/canon/comparison/strategies/semantic_tree_match_strategy.rb +220 -0
- data/lib/canon/comparison/xml_comparator.rb +695 -91
- data/lib/canon/comparison.rb +207 -2
- data/lib/canon/config/env_provider.rb +71 -0
- data/lib/canon/config/env_schema.rb +58 -0
- data/lib/canon/config/override_resolver.rb +55 -0
- data/lib/canon/config/type_converter.rb +59 -0
- data/lib/canon/config.rb +158 -29
- data/lib/canon/data_model.rb +29 -0
- data/lib/canon/diff/diff_classifier.rb +74 -14
- data/lib/canon/diff/diff_context_builder.rb +41 -0
- data/lib/canon/diff/diff_line.rb +18 -2
- data/lib/canon/diff/diff_node.rb +18 -3
- data/lib/canon/diff/diff_node_mapper.rb +71 -12
- data/lib/canon/diff/formatting_detector.rb +53 -0
- data/lib/canon/diff_formatter/by_line/base_formatter.rb +60 -5
- data/lib/canon/diff_formatter/by_line/html_formatter.rb +68 -16
- data/lib/canon/diff_formatter/by_line/json_formatter.rb +0 -37
- data/lib/canon/diff_formatter/by_line/simple_formatter.rb +0 -42
- data/lib/canon/diff_formatter/by_line/xml_formatter.rb +116 -31
- data/lib/canon/diff_formatter/by_line/yaml_formatter.rb +0 -37
- data/lib/canon/diff_formatter/by_object/base_formatter.rb +126 -19
- data/lib/canon/diff_formatter/by_object/xml_formatter.rb +30 -1
- data/lib/canon/diff_formatter/debug_output.rb +7 -1
- data/lib/canon/diff_formatter/diff_detail_formatter.rb +674 -57
- data/lib/canon/diff_formatter/legend.rb +42 -0
- data/lib/canon/diff_formatter.rb +78 -9
- data/lib/canon/errors.rb +56 -0
- data/lib/canon/formatters/html_formatter_base.rb +35 -1
- data/lib/canon/formatters/json_formatter.rb +3 -0
- data/lib/canon/formatters/yaml_formatter.rb +3 -0
- data/lib/canon/html/data_model.rb +229 -0
- data/lib/canon/html.rb +9 -0
- data/lib/canon/options/cli_generator.rb +70 -0
- data/lib/canon/options/registry.rb +234 -0
- data/lib/canon/rspec_matchers.rb +34 -13
- data/lib/canon/tree_diff/adapters/html_adapter.rb +316 -0
- data/lib/canon/tree_diff/adapters/json_adapter.rb +204 -0
- data/lib/canon/tree_diff/adapters/xml_adapter.rb +285 -0
- data/lib/canon/tree_diff/adapters/yaml_adapter.rb +213 -0
- data/lib/canon/tree_diff/core/attribute_comparator.rb +84 -0
- data/lib/canon/tree_diff/core/matching.rb +241 -0
- data/lib/canon/tree_diff/core/node_signature.rb +164 -0
- data/lib/canon/tree_diff/core/node_weight.rb +135 -0
- data/lib/canon/tree_diff/core/tree_node.rb +450 -0
- data/lib/canon/tree_diff/matchers/hash_matcher.rb +258 -0
- data/lib/canon/tree_diff/matchers/similarity_matcher.rb +168 -0
- data/lib/canon/tree_diff/matchers/structural_propagator.rb +242 -0
- data/lib/canon/tree_diff/matchers/universal_matcher.rb +220 -0
- data/lib/canon/tree_diff/operation_converter.rb +631 -0
- data/lib/canon/tree_diff/operations/operation.rb +92 -0
- data/lib/canon/tree_diff/operations/operation_detector.rb +626 -0
- data/lib/canon/tree_diff/tree_diff_integrator.rb +140 -0
- data/lib/canon/tree_diff.rb +33 -0
- data/lib/canon/validators/json_validator.rb +3 -1
- data/lib/canon/validators/yaml_validator.rb +3 -1
- data/lib/canon/version.rb +1 -1
- data/lib/canon/xml/data_model.rb +22 -23
- data/lib/canon/xml/element_matcher.rb +128 -20
- data/lib/canon/xml/namespace_helper.rb +110 -0
- data/lib/canon.rb +3 -0
- metadata +81 -23
- data/_config.yml +0 -116
- data/docs/ADVANCED_TOPICS.adoc +0 -20
- data/docs/BASIC_USAGE.adoc +0 -16
- data/docs/CUSTOMIZING_BEHAVIOR.adoc +0 -19
- data/docs/DIFF_ARCHITECTURE.adoc +0 -435
- data/docs/DIFF_FORMATTING.adoc +0 -540
- data/docs/FORMATS.adoc +0 -447
- data/docs/INPUT_VALIDATION.adoc +0 -477
- data/docs/MATCH_ARCHITECTURE.adoc +0 -463
- data/docs/MATCH_OPTIONS.adoc +0 -719
- data/docs/MODES.adoc +0 -432
- data/docs/NORMATIVE_INFORMATIVE_DIFFS.adoc +0 -219
- data/docs/OPTIONS.adoc +0 -1387
- data/docs/PREPROCESSING.adoc +0 -491
- data/docs/SEMANTIC_DIFF_REPORT.adoc +0 -528
- data/docs/UNDERSTANDING_CANON.adoc +0 -17
@@ -0,0 +1,436 @@
---
title: Override System
parent: Environment Configuration
grand_parent: Features
nav_order: 2
---
= Override system
:toc:
:toclevels: 3

== Purpose

Canon's override system allows environment variables to take precedence over programmatic configuration, enabling flexible deployment without code changes.

This page explains how the priority chain works, when overrides apply, and how to use them effectively.

== Priority chain

Configuration values are resolved using a strict three-level priority:

[source]
----
┌─────────────────────────────────┐
│  1. Environment Variables       │  ← Highest Priority
│     (CANON_XML_DIFF_ALGORITHM)  │
└──────────────┬──────────────────┘
               ↓ overrides
┌─────────────────────────────────┐
│  2. Programmatic Configuration  │  ← Medium Priority
│     (config.xml.diff.algorithm) │
└──────────────┬──────────────────┘
               ↓ overrides
┌─────────────────────────────────┐
│  3. Default Values              │  ← Lowest Priority
│     (defined in Canon::Config)  │
└─────────────────────────────────┘
----

**Rule**: Higher priority always wins, regardless of when values are set.
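
The rule above can be sketched in plain Ruby. This is a minimal illustration under stated assumptions, not Canon's implementation: the `PriorityResolver` class and its `resolve`, `source_for`, and `set` methods are hypothetical names chosen for this example, the three layers are ordinary hashes, and sources are reported as symbols.

```ruby
# Hypothetical stand-in for a three-level resolver; not Canon's own class.
class PriorityResolver
  # Layers in priority order, highest first.
  SOURCES = %i[env programmatic default].freeze

  def initialize(env: {}, programmatic: {}, defaults: {})
    @layers = { env: env, programmatic: programmatic, default: defaults }
  end

  # Name of the highest-priority layer that defines +key+, or nil.
  def source_for(key)
    SOURCES.find { |name| @layers[name].key?(key) }
  end

  # Value from the highest-priority layer that defines +key+.
  def resolve(key)
    source = source_for(key)
    source && @layers[source][key]
  end

  # A programmatic write only ever changes the middle layer.
  def set(key, value)
    @layers[:programmatic][key] = value
  end
end

resolver = PriorityResolver.new(
  env: { algorithm: :semantic },
  defaults: { algorithm: :dom, use_color: true }
)
resolver.set(:algorithm, :dom)   # shadowed by the env layer
resolver.set(:use_color, false)  # no env value, so this one applies

puts resolver.resolve(:algorithm)
puts resolver.source_for(:use_color)
```

Because a write through `set` never touches the env layer, the order in which values arrive has no effect on which layer wins.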

== How overrides work

=== Environment variables override programmatic settings

When an environment variable is set, it **always** takes precedence:

[source,ruby]
----
# Set ENV before requiring Canon
ENV['CANON_XML_DIFF_ALGORITHM'] = 'semantic'

# Programmatic setting is IGNORED
config = Canon::Config.instance
config.xml.diff.algorithm = :dom

# ENV wins
puts config.xml.diff.algorithm # => :semantic (not :dom)
----

=== Format-specific variables override global variables

When both a global and a format-specific ENV variable are set, the format-specific one wins for that format:

[source,bash]
----
# Global setting
export CANON_ALGORITHM=dom

# Format-specific override
export CANON_XML_DIFF_ALGORITHM=semantic

# Result:
# - XML uses semantic (format-specific wins)
# - HTML, JSON, YAML use dom (global applies)
----
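
The lookup order described above can be sketched as a small Ruby function. This is an illustration only: `canon_env_value` is a hypothetical helper, not part of Canon, and it assumes the naming pattern seen on this page (`CANON_<FORMAT>_DIFF_<ATTRIBUTE>` for format-specific variables, `CANON_<ATTRIBUTE>` for global ones). The environment is passed in as a hash so the sketch is easy to try out.

```ruby
# Hypothetical helper: prefer the format-specific variable over the global one.
# +env+ is any object that responds to [] with String keys (a Hash or ENV).
def canon_env_value(env, format, attribute)
  attr = attribute.to_s.upcase
  specific = "CANON_#{format.to_s.upcase}_DIFF_#{attr}"
  global = "CANON_#{attr}"
  env[specific] || env[global]
end

env = {
  "CANON_ALGORITHM" => "dom",
  "CANON_XML_DIFF_ALGORITHM" => "semantic"
}

%i[xml html json yaml].each do |format|
  puts "#{format}: #{canon_env_value(env, format, :algorithm)}"
end
```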

=== Setting order doesn't matter

Unlike successive programmatic assignments, ENV variable priority is **positional**, not temporal. It depends on the variable's level in the priority chain, so a programmatic assignment made after the variable was set still loses:

[source,ruby]
----
# Set ENV first
ENV['CANON_ALGORITHM'] = 'semantic'

# Configure programmatically later
config = Canon::Config.instance
config.xml.diff.algorithm = :dom # This is IGNORED

# ENV still wins, even though set "earlier"
puts config.xml.diff.algorithm # => :semantic
----

== When to use environment variables

=== Use ENV variables for

**CI/CD configuration**::
Different test runs need different settings without code changes.
+
[source,yaml]
----
# In .github/workflows/test.yml
env:
  CANON_ALGORITHM: semantic
  CANON_USE_COLOR: false
----

**Container deployment**::
Docker containers with different comparison behavior.
+
[source,dockerfile]
----
# Dockerfile
ENV CANON_XML_DIFF_ALGORITHM=semantic
ENV CANON_CONTEXT_LINES=5
----

**Environment-specific behavior**::
Different settings for development, staging, production.
+
[source,bash]
----
# Development
export CANON_VERBOSE_DIFF=true

# Production
export CANON_VERBOSE_DIFF=false
----

**Testing algorithm behavior**::
Quick switching between algorithms for comparison.
+
[source,bash]
----
# Test with DOM
CANON_ALGORITHM=dom bundle exec rspec

# Test with semantic
CANON_ALGORITHM=semantic bundle exec rspec
----

=== Use programmatic config for

**Application defaults**::
Set sensible defaults in your application code.
+
[source,ruby]
----
# config/initializers/canon.rb
Canon::Config.configure do |config|
  config.xml.diff.algorithm = :dom
  config.xml.diff.use_color = true
end
----

**Test-specific overrides**::
Per-test configuration in RSpec.
+
[source,ruby]
----
RSpec.describe "My feature" do
  it "compares with specific options" do
    result = Canon::Comparison.equivalent?(doc1, doc2,
      diff_algorithm: :semantic # Test-specific
    )
  end
end
----

**Dynamic configuration**::
Runtime decisions based on document size or other factors.
+
[source,ruby]
----
algorithm = file_size > 100_000 ? :dom : :semantic
Canon::Comparison.equivalent?(doc1, doc2,
  diff_algorithm: algorithm
)
----

== Verification and debugging

=== Check which source provides a value

Use the resolver to inspect configuration sources:

[source,ruby]
----
config = Canon::Config.instance
resolver = config.xml.diff.instance_variable_get(:@resolver)

# Check sources
puts "Algorithm from: #{resolver.source_for(:algorithm)}"
# => "env" | "programmatic" | "default"

# See all sources
puts "ENV values: #{resolver.env.inspect}"
puts "Programmatic values: #{resolver.programmatic.inspect}"
puts "Defaults: #{resolver.defaults.inspect}"
----

=== List all active ENV variables

[source,ruby]
----
# Get all Canon ENV variables
canon_env = ENV.select { |key, _value| key.start_with?('CANON_') }
puts "Active Canon ENV variables:"
canon_env.each do |key, value|
  puts " #{key} = #{value}"
end
----

=== Test ENV override behavior

[source,ruby]
----
# Verify ENV takes precedence
ENV['CANON_ALGORITHM'] = 'semantic'

config = Canon::Config.instance
config.xml.diff.algorithm = :dom # Should be ignored

if config.xml.diff.algorithm == :semantic
  puts "✓ ENV override working correctly"
else
  puts "✗ ENV override NOT working!"
end
----

== Common patterns

=== Pattern 1: Sensible defaults with ENV override

[source,ruby]
----
# config/initializers/canon.rb
# Set defaults that work for most cases
Canon::RSpecMatchers.configure do |config|
  config.xml.diff.algorithm = :dom
  config.xml.diff.use_color = !ENV['CI']
end

# Users can override per test run:
# CANON_ALGORITHM=semantic bundle exec rspec
----

=== Pattern 2: Environment-based configuration

[source,ruby]
----
# config/environments/development.rb
ENV['CANON_VERBOSE_DIFF'] = 'true'
ENV['CANON_USE_COLOR'] = 'true'

# config/environments/production.rb
ENV['CANON_VERBOSE_DIFF'] = 'false'
ENV['CANON_USE_COLOR'] = 'false'
----

=== Pattern 3: CI matrix testing

[source,yaml]
----
# .github/workflows/test.yml
strategy:
  matrix:
    algorithm: [dom, semantic]
steps:
  - name: Run tests
    env:
      CANON_ALGORITHM: ${{ matrix.algorithm }}
    run: bundle exec rspec
----

=== Pattern 4: Format-specific CI configuration

[source,yaml]
----
# .github/workflows/test.yml
env:
  CANON_XML_DIFF_ALGORITHM: semantic # XML uses semantic
  CANON_HTML_DIFF_ALGORITHM: dom # HTML uses DOM
  CANON_USE_COLOR: false # All formats: no color
----

== Troubleshooting

=== ENV variable not working

**Problem**: ENV variable seems to be ignored.

**Checklist**:

1. **Variable name correct?**
+
[source,bash]
----
# Correct
export CANON_XML_DIFF_ALGORITHM=semantic

# Wrong (missing underscore between DIFF and ALGORITHM)
export CANON_XML_DIFFALGORITHM=semantic
----

2. **ENV set before Canon loads?**
+
[source,ruby]
----
# Wrong order
require 'canon'
ENV['CANON_ALGORITHM'] = 'semantic' # Too late!

# Correct order
ENV['CANON_ALGORITHM'] = 'semantic'
require 'canon'
----

3. **Value valid for attribute type?**
+
[source,bash]
----
# Wrong (leading colon; use the bare name, not Ruby symbol syntax)
export CANON_ALGORITHM=:semantic

# Correct
export CANON_ALGORITHM=semantic
----
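
Environment variables are always strings, so each value has to be converted to the attribute's type before it can be used. The sketch below shows one way such a conversion could look and why a leading colon is a problem. It is an assumption-laden illustration: `convert_env_value`, the type names (`:symbol`, `:boolean`, `:integer`), and the accepted boolean spellings are hypothetical and may not match what Canon accepts.

```ruby
# Hypothetical converter from raw ENV strings to typed values.
def convert_env_value(raw, type)
  case type
  when :symbol
    # ":semantic".to_sym would yield a symbol whose name starts with a colon,
    # which never equals :semantic, so reject it explicitly.
    raise ArgumentError, "drop the leading colon: #{raw.inspect}" if raw.start_with?(":")

    raw.to_sym
  when :boolean
    case raw.downcase
    when "true", "1", "yes" then true
    when "false", "0", "no" then false
    else raise ArgumentError, "not a boolean: #{raw.inspect}"
    end
  when :integer
    # Integer() with base 10 raises on anything that is not a decimal number.
    Integer(raw, 10)
  else
    raise ArgumentError, "unknown type: #{type.inspect}"
  end
end

puts convert_env_value("semantic", :symbol).inspect
puts convert_env_value("false", :boolean).inspect
puts convert_env_value("10485760", :integer).inspect
```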

=== Programmatic setting not working

**Problem**: Setting `config.xml.diff.algorithm = :semantic` doesn't work.

**Solution**: Check whether an ENV variable is set:

[source,bash]
----
# Check current ENV
echo "$CANON_XML_DIFF_ALGORITHM"
echo "$CANON_ALGORITHM"

# If set, unset it
unset CANON_XML_DIFF_ALGORITHM
unset CANON_ALGORITHM
----

=== Inconsistent behavior across runs

**Problem**: Tests behave differently on different machines.

**Cause**: One machine has Canon ENV variables set in its shell profile.

**Solution**: Document required ENV variables or unset them in test setup:

[source,ruby]
----
# spec/spec_helper.rb
RSpec.configure do |config|
  config.before(:suite) do
    # Clear any Canon ENV vars to ensure consistent tests
    ENV.keys.select { |k| k.start_with?('CANON_') }.each do |key|
      ENV.delete(key)
    end
  end
end
----

== Best practices

=== Document expected ENV variables

Create a `.env.example` or document ENV variables:

[source,bash]
----
# .env.example
# Canon Configuration
# Uncomment and modify as needed

# CANON_ALGORITHM=dom
# CANON_USE_COLOR=true
# CANON_MAX_FILE_SIZE=5242880
----

=== Use ENV for deployment, code for defaults

[source,ruby]
----
# Good: Code provides defaults
Canon::RSpecMatchers.configure do |config|
  config.xml.diff.algorithm = :dom # Default
end

# Good: ENV overrides for deployment
# In production: CANON_ALGORITHM=semantic
----

=== Avoid mixing ENV and programmatic for same attribute

[source,ruby]
----
# Confusing: Don't do this
ENV['CANON_ALGORITHM'] = 'semantic'
config.xml.diff.algorithm = :dom # Ignored, but confusing

# Better: Use one or the other consistently
ENV['CANON_ALGORITHM'] = 'semantic'
# OR
config.xml.diff.algorithm = :dom
----

=== Test with both ENV and without

[source,ruby]
----
RSpec.describe "Canon behavior" do
  context "without ENV override" do
    before { ENV.delete('CANON_ALGORITHM') }
    # Tests...
  end

  context "with ENV override" do
    before { ENV['CANON_ALGORITHM'] = 'semantic' }
    after { ENV.delete('CANON_ALGORITHM') }
    # Tests...
  end
end
----
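
Deleting a variable in an `after` hook loses any value it had before the test started. A helper that saves and restores the previous values avoids that. The sketch below uses only Ruby's built-in `ENV`; `with_env` is a hypothetical helper name, not something Canon or RSpec provides.

```ruby
# Hypothetical helper: set ENV variables for the duration of a block, then
# restore whatever was there before, including "not set at all".
# Keys must be Strings; a nil value unsets the variable inside the block.
def with_env(overrides)
  saved = overrides.keys.to_h { |key| [key, ENV[key]] }
  overrides.each { |key, value| ENV[key] = value }
  yield
ensure
  # Assigning nil to ENV[key] deletes the variable, so unset stays unset.
  saved&.each { |key, value| ENV[key] = value }
end

with_env("CANON_ALGORITHM" => "semantic") do
  puts ENV["CANON_ALGORITHM"]
end
```

In RSpec this could be called from an `around` hook, with `example.run` inside the block.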

== See also

* link:index.adoc[Environment Configuration] - Overview and usage
* link:size-limits.adoc[Size Limits] - Limit-specific ENV variables
* link:../../reference/environment-variables.adoc[Environment Variables Reference] - Complete listing
* link:../../reference/options-across-interfaces.adoc[Options Across Interfaces] - How options work in CLI, Ruby, RSpec
@@ -0,0 +1,273 @@
---
title: Size Limits
parent: Environment Configuration
grand_parent: Features
nav_order: 1
---
= Size limits
:toc:
:toclevels: 3

== Purpose

Canon provides configurable size limits to prevent hangs or excessive resource usage when processing very large files. These limits apply uniformly across all interfaces (CLI, Ruby API, RSpec).

== Available limits

=== File size limit

Maximum file size in bytes before comparison is rejected.

**Environment variable**: `CANON_MAX_FILE_SIZE` or `CANON_{FORMAT}_DIFF_MAX_FILE_SIZE`

**Default**: 5,242,880 bytes (5 MB)

**When triggered**: Before parsing, if either input file exceeds this size

[source,bash]
----
# Set max file size to 10MB for XML
export CANON_XML_DIFF_MAX_FILE_SIZE=10485760

# Set globally (5MB default)
export CANON_MAX_FILE_SIZE=5242880
----
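
The byte values on this page are whole megabytes counted as 1024 × 1024 bytes. A small helper makes that arithmetic explicit when choosing a limit; `megabytes` is a hypothetical name used only in this sketch.

```ruby
# Convert a size in megabytes (1 MB = 1024 * 1024 bytes) to bytes.
def megabytes(count)
  count * 1024 * 1024
end

puts megabytes(5)   # 5242880, the documented default
puts megabytes(10)  # 10485760
puts megabytes(20)  # 20971520
```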

=== Node count limit

Maximum number of nodes in a tree structure before comparison is rejected.

**Environment variable**: `CANON_MAX_NODE_COUNT` or `CANON_{FORMAT}_DIFF_MAX_NODE_COUNT`

**Default**: 10,000 nodes

**When triggered**: After parsing, if the document tree exceeds this node count

[source,bash]
----
# Set max node count for XML diff
export CANON_XML_DIFF_MAX_NODE_COUNT=20000

# Set globally (10,000 default)
export CANON_MAX_NODE_COUNT=10000
----

=== Diff output lines limit

Maximum number of lines in diff output before truncation.

**Environment variable**: `CANON_MAX_DIFF_LINES` or `CANON_{FORMAT}_DIFF_MAX_DIFF_LINES`

**Default**: 10,000 lines

**When triggered**: During diff generation, output is truncated at this line count

[source,bash]
----
# Set max diff lines for XML
export CANON_XML_DIFF_MAX_DIFF_LINES=15000

# Set globally (10,000 default)
export CANON_MAX_DIFF_LINES=10000
----
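
Unlike the file size and node count limits, the diff lines limit shortens the output rather than rejecting the comparison. The sketch below illustrates that behavior on an array of lines. It is not Canon's code: `truncate_diff` is a hypothetical function, and treating a limit of 0 or less as "no limit" follows the rule this page gives for disabling limits.

```ruby
# Hypothetical truncation step: keep at most max_lines lines and report how
# many were dropped. A limit of 0 or less means "no limit".
def truncate_diff(lines, max_lines)
  return [lines, 0] if max_lines <= 0 || lines.size <= max_lines

  [lines.first(max_lines), lines.size - max_lines]
end

diff = Array.new(15_000) { |i| "+ line #{i + 1}" }
kept, dropped = truncate_diff(diff, 10_000)
puts "showing #{kept.size} of #{diff.size} lines (#{dropped} truncated)"
```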

== Common scenarios

=== Large SVG files

When working with large SVG files (e.g., 3.5MB) that may cause hangs:

[source,bash]
----
# Increase limits for large SVG processing
export CANON_MAX_FILE_SIZE=10485760 # 10MB
export CANON_MAX_NODE_COUNT=50000 # 50,000 nodes
export CANON_MAX_DIFF_LINES=20000 # 20,000 lines

bundle exec rspec spec/test_031_spec.rb
----

=== CI/CD with large documents

In CI environments where you know documents are large but safe:

[source,yaml]
----
# In .github/workflows/test.yml
env:
  CANON_MAX_FILE_SIZE: 20971520 # 20MB
  CANON_MAX_NODE_COUNT: 100000 # 100k nodes
  CANON_MAX_DIFF_LINES: 50000 # 50k lines
----

=== Format-specific limits

Different formats may need different limits:

[source,bash]
----
# XML can have more nodes
export CANON_XML_DIFF_MAX_NODE_COUNT=50000

# JSON typically has fewer nodes
export CANON_JSON_DIFF_MAX_NODE_COUNT=10000

# HTML might need larger file size
export CANON_HTML_DIFF_MAX_FILE_SIZE=10485760
----

== Disabling limits

To disable a limit, set it to 0 or a negative value:

[source,bash]
----
# Disable file size limit (not recommended)
export CANON_MAX_FILE_SIZE=0

# Disable node count limit (use with caution)
export CANON_MAX_NODE_COUNT=-1
----

WARNING: Disabling limits may cause Canon to hang or consume excessive memory on pathologically large inputs.
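
The "0 or negative disables the limit" rule can be sketched as a guard clause. This is a hypothetical illustration: `LimitExceeded` and `enforce_limit!` are names invented for this example and are not Canon's error class or API.

```ruby
# Hypothetical error class for this sketch; Canon's own error types may differ.
class LimitExceeded < StandardError; end

# Raise if +actual+ exceeds +limit+.
# A limit of 0 or a negative value disables the check entirely.
def enforce_limit!(name, actual, limit)
  return if limit <= 0
  return if actual <= limit

  raise LimitExceeded,
        "#{name} exceeds maximum limit (maximum: #{limit}, actual: #{actual})"
end

enforce_limit!("Node count", 25_000, 0)      # disabled: no error
enforce_limit!("Node count", 25_000, -1)     # disabled: no error
enforce_limit!("Node count", 9_000, 10_000)  # within limit: no error

begin
  enforce_limit!("Node count", 25_000, 10_000)
rescue LimitExceeded => e
  puts e.message
end
```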

== Error messages

When a file size or node count limit is exceeded, Canon raises an error that names the limit. When the diff lines limit is exceeded, it reports a truncation warning instead:

=== File size exceeded

[source]
----
Canon::ValidationError: File size exceeds maximum limit
Maximum: 5242880 bytes (5.0 MB)
Actual: 10485760 bytes (10.0 MB)

To increase this limit, set:
CANON_MAX_FILE_SIZE=10485760
----

=== Node count exceeded

[source]
----
Canon::ValidationError: Node count exceeds maximum limit
Maximum: 10000 nodes
Actual: 25000 nodes

To increase this limit, set:
CANON_MAX_NODE_COUNT=25000
----

=== Diff lines exceeded

[source]
----
Canon::DiffTruncationWarning: Diff output truncated
Maximum: 10000 lines
Actual: 15000 lines (truncated to 10000)

To increase this limit, set:
CANON_MAX_DIFF_LINES=15000
----

== Programmatic configuration

While environment variables are recommended, you can also configure limits programmatically:

[source,ruby]
----
# NOT RECOMMENDED - use ENV vars instead
Canon::Config.instance.xml.diff.max_file_size = 10_485_760
Canon::Config.instance.xml.diff.max_node_count = 50_000
Canon::Config.instance.xml.diff.max_diff_lines = 20_000
----

However, **environment variables will override programmatic settings** per the priority chain.

== Performance considerations

=== Why limits exist

Limits prevent:

* **Hangs**: Very large documents can cause O(n²) algorithms to hang
* **Memory exhaustion**: Huge trees consume excessive RAM
* **Unreadable output**: 100k+ line diffs are not useful

=== Choosing appropriate limits

**File size**:

* **Small projects**: 5MB default is fine
* **Large documents**: 10-20MB for SVG, generated HTML
* **Very large**: 50MB+ only if you know what you're doing

**Node count**:

* **Simple documents**: 10k default is fine
* **Complex documents**: 50k for large XML, nested JSON
* **Very complex**: 100k+ only for known-safe inputs

**Diff lines**:

* **Readable output**: 10k default is fine
* **Detailed diffs**: 20-50k for comprehensive output
* **Debug mode**: 100k+ for full comparison

== Troubleshooting

=== Tests failing with size limit errors

If your tests start failing due to size limits:

1. **Verify the limit is appropriate**: Check whether the documents really are that large
2. **Set ENV in test helper**:
+
[source,ruby]
----
# spec/spec_helper.rb
ENV['CANON_MAX_FILE_SIZE'] = '10485760' # 10MB
ENV['CANON_MAX_NODE_COUNT'] = '50000'
----

3. **Or set per-test**:
+
[source,ruby]
----
# Requires the climate_control gem
around do |example|
  ClimateControl.modify(
    CANON_MAX_FILE_SIZE: '10485760'
  ) do
    example.run
  end
end
----

=== Performance degradation

If comparisons become slow after increasing limits:

1. **Use DOM algorithm**: Faster than semantic for large documents
+
[source,bash]
----
export CANON_ALGORITHM=dom
----

2. **Disable expensive features**:
+
[source,bash]
----
export CANON_SHOW_COMPARE=false
export CANON_VERBOSE_DIFF=false
----

3. **Consider whether you really need to compare such large files**

== See also

* link:index.adoc[Environment Configuration] - Complete ENV configuration
* link:override-system.adoc[Override System] - How ENV vars work
* link:../../reference/environment-variables.adoc[Environment Variables Reference] - All variables
* link:../../understanding/algorithms/dom-diff.adoc[DOM Algorithm] - Faster for large files