canon 0.1.22 → 0.2.0
- checksums.yaml +4 -4
- data/.rubocop_todo.yml +174 -25
- data/docs/INDEX.adoc +4 -0
- data/docs/advanced/diff-classification.adoc +3 -2
- data/docs/features/configuration-profiles.adoc +288 -0
- data/docs/features/diff-formatting/character-visualization.adoc +153 -454
- data/docs/features/diff-formatting/display-filtering.adoc +44 -0
- data/docs/features/diff-formatting/display-preprocessing.adoc +656 -0
- data/docs/features/diff-formatting/index.adoc +47 -0
- data/docs/features/diff-formatting/pretty-diff-mode.adoc +154 -0
- data/docs/features/environment-configuration/override-system.adoc +10 -3
- data/docs/features/index.adoc +9 -0
- data/docs/features/match-options/index.adoc +32 -42
- data/docs/features/match-options/pretty-printed-fixtures.adoc +270 -0
- data/docs/guides/choosing-configuration.adoc +22 -0
- data/docs/reference/environment-variables.adoc +121 -1
- data/docs/reference/options-across-interfaces.adoc +182 -2
- data/lib/canon/cli.rb +20 -0
- data/lib/canon/commands/diff_command.rb +7 -2
- data/lib/canon/commands/format_command.rb +1 -1
- data/lib/canon/comparison/html_comparator.rb +20 -15
- data/lib/canon/comparison/html_compare_profile.rb +4 -4
- data/lib/canon/comparison/markup_comparator.rb +12 -3
- data/lib/canon/comparison/match_options/base_resolver.rb +29 -7
- data/lib/canon/comparison/match_options/json_resolver.rb +9 -0
- data/lib/canon/comparison/match_options/xml_resolver.rb +16 -2
- data/lib/canon/comparison/match_options/yaml_resolver.rb +10 -0
- data/lib/canon/comparison/match_options.rb +4 -1
- data/lib/canon/comparison/whitespace_sensitivity.rb +189 -137
- data/lib/canon/comparison/xml_comparator/child_comparison.rb +21 -4
- data/lib/canon/comparison/xml_comparator.rb +14 -12
- data/lib/canon/comparison/xml_node_comparison.rb +51 -6
- data/lib/canon/comparison.rb +52 -9
- data/lib/canon/config/env_schema.rb +32 -4
- data/lib/canon/config/override_resolver.rb +16 -3
- data/lib/canon/config/profile_loader.rb +135 -0
- data/lib/canon/config/profiles/metanorma.yml +74 -0
- data/lib/canon/config/profiles/metanorma_debug.yml +8 -0
- data/lib/canon/config/type_converter.rb +8 -0
- data/lib/canon/config.rb +469 -5
- data/lib/canon/diff/diff_classifier.rb +41 -11
- data/lib/canon/diff_formatter/diff_detail_formatter/dimension_formatter.rb +48 -17
- data/lib/canon/diff_formatter/diff_detail_formatter/node_utils.rb +58 -0
- data/lib/canon/diff_formatter/diff_detail_formatter.rb +22 -7
- data/lib/canon/diff_formatter/theme.rb +24 -17
- data/lib/canon/diff_formatter.rb +493 -36
- data/lib/canon/pretty_printer/xml_normalized.rb +395 -0
- data/lib/canon/rspec_matchers.rb +36 -0
- data/lib/canon/tree_diff/matchers/hash_matcher.rb +26 -11
- data/lib/canon/version.rb +1 -1
- data/lib/canon/xml/nodes/namespace_node.rb +4 -0
- data/lib/canon/xml/nodes/processing_instruction_node.rb +4 -0
- data/lib/canon/xml/nodes/root_node.rb +4 -0
- data/lib/canon/xml/nodes/text_node.rb +4 -0
- data/lib/tasks/performance_helpers.rb +2 -2
- metadata +24 -2
|
@@ -0,0 +1,270 @@
|
|
|
1
|
+
---
|
|
2
|
+
title: Pretty-Printed Fixture Support
|
|
3
|
+
parent: Match Options
|
|
4
|
+
nav_order: 5
|
|
5
|
+
---
|
|
6
|
+
= Pretty-printed fixture support
|
|
7
|
+
:toc:
|
|
8
|
+
:toclevels: 3
|
|
9
|
+
|
|
10
|
+
== Problem statement
|
|
11
|
+
|
|
12
|
+
A common pattern in Metanorma and similar XML-generating libraries is to compare
|
|
13
|
+
a *compact* generated document (no inter-element whitespace) against a
|
|
14
|
+
*hand-indented fixture* stored as an RSpec heredoc:
|
|
15
|
+
|
|
16
|
+
[source,ruby]
|
|
17
|
+
----
|
|
18
|
+
expected_xml = <<~XML
|
|
19
|
+
  <root>
|
|
20
|
+
    <fmt-title>
|
|
21
|
+
      <semx element="title" source="_">Foreword</semx>
|
|
22
|
+
    </fmt-title>
|
|
23
|
+
  </root>
|
|
24
|
+
XML
|
|
25
|
+
|
|
26
|
+
received_xml = "<root><fmt-title><semx element=\"title\" source=\"_\">Foreword</semx></fmt-title></root>"
|
|
27
|
+
----
|
|
28
|
+
|
|
29
|
+
When `fmt-title` and `semx` are classified as `:collapse` whitespace elements
|
|
30
|
+
(their inline content matters but all whitespace forms are equivalent), the
|
|
31
|
+
comparison _correctly_ detects whitespace differences: the fixture has text nodes
|
|
32
|
+
`"\n "` and `"\n"` inside `<fmt-title>` that are absent from the compact
|
|
33
|
+
received document.
|
|
34
|
+
|
|
35
|
+
This is the right behaviour in general. However, for *fixture files that
|
|
36
|
+
cannot be written as compact single-line strings* — because doing so would
|
|
37
|
+
produce unreadable 1000-character lines — the indentation whitespace in the
|
|
38
|
+
fixture is a necessary formatting artefact, not a meaningful content difference.
|
|
39
|
+
|
|
40
|
+
=== Why the whitespace cannot simply be stripped
|
|
41
|
+
|
|
42
|
+
The whitespace classification is intentionally asymmetric across elements:
|
|
43
|
+
|
|
44
|
+
* Structural container elements (e.g. `<clause>`, `<foreword>`) are
|
|
45
|
+
`:strip` — their inter-element whitespace is already dropped in both
|
|
46
|
+
compact and indented forms.
|
|
47
|
+
* Mixed-content elements (e.g. `<fmt-title>`, `<semx>`) are `:collapse` —
|
|
48
|
+
the _presence_ of a space between inline children is significant (it
|
|
49
|
+
represents a word-boundary), but its exact form is not.
|
|
50
|
+
|
|
51
|
+
Stripping all whitespace from the fixture would destroy the signal inside
|
|
52
|
+
mixed-content elements: `<p>See <em>note</em></p>` would become identical to
|
|
53
|
+
`<p>See<em>note</em></p>`, hiding a real content difference.
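
The loss of signal can be reproduced with a few lines of plain Ruby. This is only a sketch: `strip_tag_adjacent_whitespace` is a hypothetical helper written for this illustration, not a Canon method, and it works on raw strings rather than a parsed tree.

```ruby
# Hypothetical helper for illustration only; it is not part of Canon.
# Removes whitespace that directly follows ">" or directly precedes "<".
def strip_tag_adjacent_whitespace(xml)
  xml.gsub(/>\s+/, ">").gsub(/\s+</, "<")
end

with_space    = "<p>See <em>note</em></p>"
without_space = "<p>See<em>note</em></p>"

# After naive stripping the two documents can no longer be told apart,
# although the first one contains a word boundary and the second does not.
puts strip_tag_adjacent_whitespace(with_space) == strip_tag_adjacent_whitespace(without_space)
```

Because the stripped forms compare equal, any comparison built on blanket stripping would report these two paragraphs as equivalent.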
|
|
54
|
+
|
|
55
|
+
== Solution: asymmetric pretty-print flags
|
|
56
|
+
|
|
57
|
+
Canon provides two match options and two display-preprocessing flags that
|
|
58
|
+
instruct the relevant subsystem to treat whitespace-only text nodes that begin
|
|
59
|
+
with `"\n"` as *structural indentation from the pretty-printer* and drop them
|
|
60
|
+
from the comparison. They operate independently so that only the
|
|
61
|
+
actually-pretty-printed side is affected.
|
|
62
|
+
|
|
63
|
+
`pretty_printed_expected`::
|
|
64
|
+
When `true`, whitespace-only text nodes that start with `"\n"` in
|
|
65
|
+
`:collapse`-classified elements of the **expected** (first / fixture)
|
|
66
|
+
document are dropped before comparison and before display.
|
|
67
|
+
Use this when fixture files are indented heredocs but received XML is compact.
|
|
68
|
+
|
|
69
|
+
`pretty_printed_received`::
|
|
70
|
+
When `true`, the same treatment applies to the **received** (second / actual)
|
|
71
|
+
document. Use this when the received output may be pretty-printed but the
|
|
72
|
+
fixture is compact.
|
|
73
|
+
|
|
74
|
+
=== The heuristic explained
|
|
75
|
+
|
|
76
|
+
The heuristic is:
|
|
77
|
+
|
|
78
|
+
[source]
|
|
79
|
+
----
|
|
80
|
+
Whitespace-only text node starts with "\n" → structural indentation → drop
|
|
81
|
+
Whitespace-only text node has no "\n" → inline content space → keep
|
|
82
|
+
----
|
|
83
|
+
|
|
84
|
+
The reasoning: compact Metanorma XML (and most programmatic XML serializers)
|
|
85
|
+
never emit a bare `"\n"` immediately after a closing or opening tag inside a
|
|
86
|
+
mixed-content element. If a fixture heredoc contains such a `"\n"`, it was
|
|
87
|
+
introduced by the human author formatting the fixture for readability — it is
|
|
88
|
+
not part of the original document content.
|
|
89
|
+
|
|
90
|
+
A _space_ not preceded by `"\n"` is treated as real content: in
|
|
91
|
+
`<p>See <em>note</em></p>`, the text node `"See "` starts with `"S"`, so its
|
|
92
|
+
trailing space is kept and normalized to a single `░` symbol in the display.
|
|
93
|
+
|
|
94
|
+
NOTE: This heuristic is only activated for `:collapse` elements. `:preserve`
|
|
95
|
+
elements always preserve every whitespace character regardless of these flags.
|
|
96
|
+
`:strip` elements already drop all whitespace and are unaffected.
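
As a rough illustration, the heuristic can be written as a predicate over text-node strings. The method names below are hypothetical and chosen for this sketch; Canon's internal implementation may be organised differently. The sketch follows the "starts with `"\n"`" wording, so a whitespace-only node such as `" \n"` is kept.

```ruby
# Hypothetical predicate, not Canon's API: true when the text is
# whitespace-only AND begins with a newline (structural indentation).
def structural_indentation?(text)
  text.match?(/\A\n\s*\z/)
end

# Drops structural indentation and keeps every other text node.
def drop_structural_indentation(text_nodes)
  text_nodes.reject { |text| structural_indentation?(text) }
end

nodes = ["\n    ", "See ", " ", "\n  "]
p drop_structural_indentation(nodes)
```

Only the two newline-led nodes are removed; `"See "` and the bare `" "` survive because they represent inline content.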
|
|
97
|
+
|
|
98
|
+
== Configuration
|
|
99
|
+
|
|
100
|
+
=== Comparison (equivalence detection)
|
|
101
|
+
|
|
102
|
+
Pass `pretty_printed_expected` or `pretty_printed_received` inside the `match:`
|
|
103
|
+
hash of `Canon::Comparison.equivalent?` or the RSpec `be_xml_equivalent_to`
|
|
104
|
+
matcher:
|
|
105
|
+
|
|
106
|
+
[source,ruby]
|
|
107
|
+
----
|
|
108
|
+
# One-off: fixture is indented, received is compact
|
|
109
|
+
result = Canon::Comparison.equivalent?(
|
|
110
|
+
  fixture_xml,
|
|
111
|
+
  received_xml,
|
|
112
|
+
  match: {
|
|
113
|
+
    collapse_whitespace_elements: %w[fmt-title semx],
|
|
114
|
+
    pretty_printed_expected: true,
|
|
115
|
+
  }
|
|
116
|
+
)
|
|
117
|
+
|
|
118
|
+
# RSpec global config in spec_helper.rb
|
|
119
|
+
Canon::Config.configure do |cfg|
|
|
120
|
+
  cfg.xml.match.options = {
|
|
121
|
+
    collapse_whitespace_elements: %w[fmt-title semx],
|
|
122
|
+
    pretty_printed_expected: true,
|
|
123
|
+
  }
|
|
124
|
+
end
|
|
125
|
+
----
|
|
126
|
+
|
|
127
|
+
=== Display preprocessing (`:normalize_pretty_print`)
|
|
128
|
+
|
|
129
|
+
When using `display_preprocessing: :normalize_pretty_print`, set the
|
|
130
|
+
corresponding diff-config flags so that the display side also strips structural
|
|
131
|
+
indentation from only the flagged side:
|
|
132
|
+
|
|
133
|
+
[source,ruby]
|
|
134
|
+
----
|
|
135
|
+
Canon::Config.configure do |cfg|
|
|
136
|
+
  cfg.xml.diff.display_preprocessing = :normalize_pretty_print
|
|
137
|
+
  cfg.xml.diff.collapse_whitespace_elements = %w[fmt-title semx]
|
|
138
|
+
|
|
139
|
+
  # Drop structural indentation from the indented fixture side only
|
|
140
|
+
  cfg.xml.diff.pretty_printed_expected = true
|
|
141
|
+
  cfg.xml.diff.pretty_printed_received = false # compact received is unchanged
|
|
142
|
+
end
|
|
143
|
+
----
|
|
144
|
+
|
|
145
|
+
=== Environment variables
|
|
146
|
+
|
|
147
|
+
[source,bash]
|
|
148
|
+
----
|
|
149
|
+
export CANON_XML_DIFF_PRETTY_PRINTED_EXPECTED=true
|
|
150
|
+
export CANON_XML_DIFF_PRETTY_PRINTED_RECEIVED=false
|
|
151
|
+
----
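
For readers wondering how such boolean variables could be consumed, here is a minimal sketch. `env_flag` is a hypothetical helper, and the rule that only the string `true` (case-insensitive) enables a flag is an assumption of this example; Canon's own type conversion may accept other spellings.

```ruby
# Hypothetical reader for boolean environment variables (not Canon's code).
# Assumption: only "true" (any letter case) switches the flag on.
def env_flag(name, env: ENV, default: false)
  raw = env[name]
  return default if raw.nil? || raw.strip.empty?

  raw.strip.casecmp?("true")
end

# A plain Hash stands in for ENV so the example does not touch the real
# process environment.
sample_env = {
  "CANON_XML_DIFF_PRETTY_PRINTED_EXPECTED" => "true",
  "CANON_XML_DIFF_PRETTY_PRINTED_RECEIVED" => "false",
}

p env_flag("CANON_XML_DIFF_PRETTY_PRINTED_EXPECTED", env: sample_env)
p env_flag("CANON_XML_DIFF_PRETTY_PRINTED_RECEIVED", env: sample_env)
```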
|
|
152
|
+
|
|
153
|
+
== Behaviour by whitespace class
|
|
154
|
+
|
|
155
|
+
[cols="1,3,3,3"]
|
|
156
|
+
|===
|
|
157
|
+
|Element class |`pretty_printed_expected: false` (default) |`pretty_printed_expected: true`
|
|
158
|
+
|Note
|
|
159
|
+
|
|
160
|
+
|`:strip`
|
|
161
|
+
|All inter-element whitespace dropped (both sides)
|
|
162
|
+
|Same — no change
|
|
163
|
+
|Already dropped; flag has no effect
|
|
164
|
+
|
|
165
|
+
|`:collapse`
|
|
166
|
+
|`"\n "` nodes kept; compact vs. indented comparison detects difference
|
|
167
|
+
|`"\n"`-starting nodes dropped from expected; compact vs. indented now equivalent
|
|
168
|
+
|Inline content spaces (`" "` without `"\n"`) are always kept
|
|
169
|
+
|
|
170
|
+
|`:preserve`
|
|
171
|
+
|Every whitespace character compared verbatim
|
|
172
|
+
|Same — no change
|
|
173
|
+
|`:preserve` always preserves all whitespace regardless of flag
|
|
174
|
+
|
|
175
|
+
|===
|
|
176
|
+
|
|
177
|
+
== Example walkthrough
|
|
178
|
+
|
|
179
|
+
=== Fixture (hand-indented heredoc)
|
|
180
|
+
|
|
181
|
+
[source,xml]
|
|
182
|
+
----
|
|
183
|
+
<root>
|
|
184
|
+
  <clause>
|
|
185
|
+
    <fmt-title>
|
|
186
|
+
      <semx element="title" source="_">Foreword</semx>
|
|
187
|
+
    </fmt-title>
|
|
188
|
+
  </clause>
|
|
189
|
+
</root>
|
|
190
|
+
----
|
|
191
|
+
|
|
192
|
+
Text nodes inside `<fmt-title>` after Nokogiri parsing:
|
|
193
|
+
|
|
194
|
+
* `"\n "` between `<fmt-title>` open tag and `<semx>` — whitespace-only, starts with `"\n"`
|
|
195
|
+
* `"\n "` between `</semx>` and `</fmt-title>` — whitespace-only, starts with `"\n"`
|
|
196
|
+
|
|
197
|
+
=== Received (compact Metanorma output)
|
|
198
|
+
|
|
199
|
+
[source,xml]
|
|
200
|
+
----
|
|
201
|
+
<root><clause><fmt-title><semx element="title" source="_">Foreword</semx></fmt-title></clause></root>
|
|
202
|
+
----
|
|
203
|
+
|
|
204
|
+
No whitespace nodes inside `<fmt-title>`.
|
|
205
|
+
|
|
206
|
+
=== Without `pretty_printed_expected: true`
|
|
207
|
+
|
|
208
|
+
`clause` is `:strip`, so its inter-element whitespace is dropped.
|
|
209
|
+
`fmt-title` and `semx` are `:collapse`, so their whitespace nodes are kept.
|
|
210
|
+
|
|
211
|
+
The comparison finds two extra whitespace nodes in the expected side →
|
|
212
|
+
**not equivalent**.
|
|
213
|
+
|
|
214
|
+
=== With `pretty_printed_expected: true`
|
|
215
|
+
|
|
216
|
+
The `"\n"`-starting nodes inside `:collapse` elements are dropped from the
|
|
217
|
+
expected side before comparison. The expected side now matches the compact
|
|
218
|
+
received side → **equivalent**.
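
The walkthrough can be modelled without an XML parser by representing the children of `<fmt-title>` as a list in which strings are text nodes and symbols stand for child elements. This is a simplified model under that assumption; `comparable_children` is a hypothetical name, and the exact indentation widths in the strings are illustrative.

```ruby
# Simplified model: strings are text nodes, symbols are child elements.
# Hypothetical helper, not Canon's API.
def comparable_children(children, pretty_printed:)
  return children unless pretty_printed

  # Drop whitespace-only text nodes that start with a newline.
  children.reject { |child| child.is_a?(String) && child.match?(/\A\n\s*\z/) }
end

expected_children = ["\n      ", :semx, "\n    "] # <fmt-title> in the indented fixture
received_children = [:semx]                        # <fmt-title> in the compact output

without_flag = comparable_children(expected_children, pretty_printed: false) ==
               comparable_children(received_children, pretty_printed: false)
with_flag    = comparable_children(expected_children, pretty_printed: true) ==
               comparable_children(received_children, pretty_printed: false)

puts "without pretty_printed_expected: #{without_flag}"
puts "with pretty_printed_expected:    #{with_flag}"
```

The flag is applied to the expected side only, which mirrors the asymmetric design: the compact received side is compared exactly as it was produced.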
|
|
219
|
+
|
|
220
|
+
== Interaction with `:pretty_diff` mode
|
|
221
|
+
|
|
222
|
+
`:pretty_diff` mode applies `display_preprocessing` to both documents before
|
|
223
|
+
running a text-LCS diff on the resulting lines. When
|
|
224
|
+
`pretty_printed_expected: true` is set on the formatter, `XmlNormalized` is
|
|
225
|
+
instantiated with `pretty_printed: true` for the expected-side printer only.
|
|
226
|
+
This drops the `"\n"`-starting whitespace visualization from the expected
|
|
227
|
+
side's serialized lines, so the LCS diff sees identical lines for purely
|
|
228
|
+
structural indentation differences.
|
|
229
|
+
|
|
230
|
+
[source,ruby]
|
|
231
|
+
----
|
|
232
|
+
Canon::DiffFormatter.new(
|
|
233
|
+
  mode: :pretty_diff,
|
|
234
|
+
  display_preprocessing: :normalize_pretty_print,
|
|
235
|
+
  collapse_whitespace_elements: %w[fmt-title semx],
|
|
236
|
+
  pretty_printed_expected: true, # strip structural \n nodes from fixture side
|
|
237
|
+
  pretty_printed_received: false, # compact received side is unchanged
|
|
238
|
+
)
|
|
239
|
+
----
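
To make the "text-LCS diff on the resulting lines" step concrete, the following is a generic longest-common-subsequence line diff. It is a textbook sketch, not Canon's implementation, and `lcs_line_diff` is a hypothetical name. It shows why identical serialized lines on both sides produce no diff entries.

```ruby
# Generic LCS-based line diff (illustrative; not Canon's implementation).
def lcs_line_diff(old_lines, new_lines)
  m = old_lines.size
  n = new_lines.size
  # table[i][j] holds the LCS length of old_lines[i..] and new_lines[j..].
  table = Array.new(m + 1) { Array.new(n + 1, 0) }
  (m - 1).downto(0) do |i|
    (n - 1).downto(0) do |j|
      table[i][j] =
        if old_lines[i] == new_lines[j]
          table[i + 1][j + 1] + 1
        else
          [table[i + 1][j], table[i][j + 1]].max
        end
    end
  end

  diff = []
  i = 0
  j = 0
  while i < m && j < n
    if old_lines[i] == new_lines[j]
      diff << [:same, old_lines[i]]
      i += 1
      j += 1
    elsif table[i + 1][j] >= table[i][j + 1]
      diff << [:removed, old_lines[i]]
      i += 1
    else
      diff << [:added, new_lines[j]]
      j += 1
    end
  end
  # Whatever is left on either side is a pure removal or addition.
  diff.concat(old_lines.drop(i).map { |line| [:removed, line] })
  diff.concat(new_lines.drop(j).map { |line| [:added, line] })
  diff
end

expected_lines = ["<root>", "<a>x</a>", "</root>"]
received_lines = ["<root>", "<a>y</a>", "</root>"]
lcs_line_diff(expected_lines, received_lines).each { |kind, line| puts "#{kind}: #{line}" }
```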
|
|
240
|
+
|
|
241
|
+
== Relation to deprecated `normalize_pretty_print_ignore_structural_newlines`
|
|
242
|
+
|
|
243
|
+
The deprecated flag `normalize_pretty_print_ignore_structural_newlines` applied
|
|
244
|
+
the newline-stripping heuristic to **both** sides simultaneously and without
|
|
245
|
+
regard to whitespace classification.
|
|
246
|
+
|
|
247
|
+
The new `pretty_printed_expected` / `pretty_printed_received` flags replace it
|
|
248
|
+
with a more granular design:
|
|
249
|
+
|
|
250
|
+
[cols="3,3"]
|
|
251
|
+
|===
|
|
252
|
+
|Old flag |New equivalent
|
|
253
|
+
|
|
254
|
+
|`normalize_pretty_print_ignore_structural_newlines: true`
|
|
255
|
+
|`pretty_printed_expected: true, pretty_printed_received: true`
|
|
256
|
+
(plus `collapse_whitespace_elements:` to restrict to the right elements)
|
|
257
|
+
|===
|
|
258
|
+
|
|
259
|
+
The old flag is deprecated and still emits a warning; use the new flags for all
|
|
260
|
+
new code.
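
When migrating existing option hashes, the mapping in the table can be applied mechanically. The helper below is a hypothetical sketch for this purpose and is not shipped with Canon; it treats options as a plain Ruby Hash with symbol keys, and it lets explicitly supplied new flags take precedence over the translated ones.

```ruby
# Hypothetical migration helper (not part of Canon): rewrites the deprecated
# flag into the two new per-side flags.
def migrate_pretty_print_flags(options)
  old_flag = :normalize_pretty_print_ignore_structural_newlines
  migrated = options.reject { |key, _value| key == old_flag }
  return migrated unless options[old_flag]

  # Explicit new flags already present in the options win over the defaults.
  { pretty_printed_expected: true, pretty_printed_received: true }.merge(migrated)
end

legacy = {
  normalize_pretty_print_ignore_structural_newlines: true,
  collapse_whitespace_elements: %w[fmt-title semx],
}
p migrate_pretty_print_flags(legacy)
```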
|
|
261
|
+
|
|
262
|
+
== See also
|
|
263
|
+
|
|
264
|
+
* link:../../features/diff-formatting/display-preprocessing.adoc[Display preprocessing]
|
|
265
|
+
— the `display_preprocessing` option and `XmlNormalized` serializer
|
|
266
|
+
* link:index.adoc[Match options overview] — whitespace sensitivity classification
|
|
267
|
+
* link:../../reference/options-across-interfaces.adoc[Options across interfaces]
|
|
268
|
+
— cross-interface reference table
|
|
269
|
+
* link:../../reference/environment-variables.adoc[Environment variables]
|
|
270
|
+
— `CANON_XML_DIFF_PRETTY_PRINTED_EXPECTED`, `CANON_XML_DIFF_PRETTY_PRINTED_RECEIVED`
|
|
@@ -679,8 +679,30 @@ When changing configuration:
|
|
|
679
679
|
- [ ] Test with sample documents
|
|
680
680
|
- [ ] Update documentation
|
|
681
681
|
|
|
682
|
+
== Using Configuration Profiles
|
|
683
|
+
|
|
684
|
+
If you use the same configuration across multiple gems or projects,
|
|
685
|
+
consider using a **configuration profile** instead of repeating settings.
|
|
686
|
+
A single line replaces dozens of configuration calls:
|
|
687
|
+
|
|
688
|
+
[source,ruby]
|
|
689
|
+
----
|
|
690
|
+
# Instead of 60+ lines of per-format settings:
|
|
691
|
+
Canon::Config.instance.profile = :metanorma
|
|
692
|
+
|
|
693
|
+
# Or load a custom profile from a local YAML file:
|
|
694
|
+
Canon::Config.instance.profile = "config/my_canon_profile.yml"
|
|
695
|
+
----
|
|
696
|
+
|
|
697
|
+
Profiles bundle all layers (preprocessing, match profile, diff settings,
|
|
698
|
+
whitespace element lists) into a named preset defined in YAML.
|
|
699
|
+
Custom file profiles can inherit from built-in profiles.
|
|
700
|
+
|
|
701
|
+
See link:../features/configuration-profiles.adoc[Configuration Profiles] for full documentation.
|
|
702
|
+
|
|
682
703
|
== See Also
|
|
683
704
|
|
|
705
|
+
* link:../features/configuration-profiles.adoc[Configuration Profiles] - Named config presets
|
|
684
706
|
* link:../understanding/comparison-pipeline.adoc[Comparison Pipeline] - Understanding the 4 layers
|
|
685
707
|
* link:../understanding/algorithms/[Algorithms] - Detailed algorithm documentation
|
|
686
708
|
* link:../features/match-options/algorithm-specific-behavior.adoc[Algorithm-Specific Behavior] - How algorithms differ
|
|
@@ -86,9 +86,129 @@ export CANON_JSON_FORMAT_PREPROCESSING=normalize
|
|
|
86
86
|
|`CANON_MODE`
|
|
87
87
|
|symbol
|
|
88
88
|
|`:by_line`
|
|
89
|
-
|Diff output mode: `by_line` or `
|
|
89
|
+
|Diff output mode: `by_line`, `by_object`, or `rspec`
|
|
90
90
|
|All formats
|
|
91
91
|
|
|
92
|
+
|`CANON_DISPLAY_PREPROCESSING`
|
|
93
|
+
|symbol
|
|
94
|
+
|`:none`
|
|
95
|
+
|How documents are normalized _for display_ before the line diff: `none`, `pretty_print`, `normalize_pretty_print`, `c14n`. Format-specific: `CANON_{FORMAT}_DIFF_DISPLAY_PREPROCESSING`
|
|
96
|
+
|All formats
|
|
97
|
+
|
|
98
|
+
|`CANON_PRETTY_PRINTER_INDENT`
|
|
99
|
+
|integer
|
|
100
|
+
|`2`
|
|
101
|
+
|Indentation depth used by the pretty-printer when `display_preprocessing: :pretty_print`. Format-specific: `CANON_{FORMAT}_DIFF_PRETTY_PRINTER_INDENT`
|
|
102
|
+
|All formats (display only)
|
|
103
|
+
|
|
104
|
+
|`CANON_PRETTY_PRINTER_INDENT_TYPE`
|
|
105
|
+
|symbol
|
|
106
|
+
|`:space`
|
|
107
|
+
|Indentation character for the pretty-printer: `space` or `tab`. Format-specific: `CANON_{FORMAT}_DIFF_PRETTY_PRINTER_INDENT_TYPE`
|
|
108
|
+
|All formats (display only)
|
|
109
|
+
|
|
110
|
+
|`CANON_XML_DIFF_COLLAPSE_WHITESPACE_ELEMENTS`
|
|
111
|
+
|string array
|
|
112
|
+
|`[]`
|
|
113
|
+
|Comma-separated list of XML element names whose inter-element whitespace is *presence-significant but form-insensitive*: both `" "` and `"\n "` collapse to a single `░`. Suitable for inline mixed-content elements such as `<p>`, `<li>`, `<td>`. Only applies when `display_preprocessing: :normalize_pretty_print`. Format-specific; no global form.
|
|
114
|
+
|XML (display only)
|
|
115
|
+
|
|
116
|
+
|`CANON_XML_DIFF_PRESERVE_WHITESPACE_ELEMENTS`
|
|
117
|
+
|string array
|
|
118
|
+
|`[]`
|
|
119
|
+
|Comma-separated list of XML element names where every whitespace character is significant and visualized verbatim (`" "` → `░`, `"\n "` → `↵░░`). Suitable for preformatted elements such as `<pre>`, `<code>`. Only applies when `display_preprocessing: :normalize_pretty_print`. Format-specific; no global form.
|
|
120
|
+
|XML (display only)
|
|
121
|
+
|
|
122
|
+
|`CANON_XML_DIFF_STRIP_WHITESPACE_ELEMENTS`
|
|
123
|
+
|string array
|
|
124
|
+
|`[]`
|
|
125
|
+
|Comma-separated list of XML element names where whitespace-only text nodes are stripped entirely. Only applies when `display_preprocessing: :normalize_pretty_print`. Format-specific; no global form.
|
|
126
|
+
|XML (display only)
|
|
127
|
+
|
|
128
|
+
|`CANON_XML_DIFF_PRETTY_PRINTED_EXPECTED`
|
|
129
|
+
|boolean
|
|
130
|
+
|`false`
|
|
131
|
+
|When `true`, whitespace-only text nodes that start with `"\n"` inside `:collapse`-classified elements are dropped from the **expected (fixture)** document before it reaches the line diff. Solves the asymmetric case where the expected side is a hand-indented heredoc fixture but the received side is compact programmatic XML. Only applies when `display_preprocessing: :normalize_pretty_print`. Format-specific; no global form. See also link:../features/match-options/pretty-printed-fixtures.adoc[Pretty-printed fixture support].
|
|
132
|
+
|XML (display only)
|
|
133
|
+
|
|
134
|
+
|`CANON_XML_DIFF_PRETTY_PRINTED_RECEIVED`
|
|
135
|
+
|boolean
|
|
136
|
+
|`false`
|
|
137
|
+
|Same as `CANON_XML_DIFF_PRETTY_PRINTED_EXPECTED` but applied to the **received (actual)** document. Useful when the received output may be pretty-printed but the fixture is compact. Format-specific; no global form.
|
|
138
|
+
|XML (display only)
|
|
139
|
+
|
|
140
|
+
|`CANON_XML_DIFF_PRETTY_PRINTER_SORT_ATTRIBUTES`
|
|
141
|
+
|boolean
|
|
142
|
+
|`false`
|
|
143
|
+
|When `true`, attributes on each element are sorted by namespace URI then local name in the pretty-printed display. Eliminates spurious diff lines caused by differing attribute order between expected and received XML. Only applies when `display_preprocessing` is `:pretty_print` or `:normalize_pretty_print`. Format-specific; no global form.
|
|
144
|
+
|XML (display only)
|
|
145
|
+
|
|
146
|
+
|`CANON_SHOW_RAW_INPUTS`
|
|
147
|
+
|boolean
|
|
148
|
+
|`false`
|
|
149
|
+
|Show the raw (un-preprocessed) file contents of both sides before the diff output. Equivalent to enabling both `CANON_SHOW_RAW_EXPECTED` and `CANON_SHOW_RAW_RECEIVED`. Format-specific: `CANON_{FORMAT}_DIFF_SHOW_RAW_INPUTS`
|
|
150
|
+
|All formats (display only)
|
|
151
|
+
|
|
152
|
+
|`CANON_SHOW_RAW_EXPECTED`
|
|
153
|
+
|boolean
|
|
154
|
+
|`false`
|
|
155
|
+
|Show only the EXPECTED (fixture) block in the raw-inputs section. Has no effect unless `show_raw_inputs` or `verbose_diff` is also set. Format-specific: `CANON_{FORMAT}_DIFF_SHOW_RAW_EXPECTED`
|
|
156
|
+
|All formats (display only)
|
|
157
|
+
|
|
158
|
+
|`CANON_SHOW_RAW_RECEIVED`
|
|
159
|
+
|boolean
|
|
160
|
+
|`false`
|
|
161
|
+
|Show only the RECEIVED (actual) block in the raw-inputs section. Format-specific: `CANON_{FORMAT}_DIFF_SHOW_RAW_RECEIVED`
|
|
162
|
+
|All formats (display only)
|
|
163
|
+
|
|
164
|
+
|`CANON_SHOW_PREPROCESSED_INPUTS`
|
|
165
|
+
|boolean
|
|
166
|
+
|`false`
|
|
167
|
+
|Show the preprocessed (post-comparison-preprocessing) contents of both sides before the diff output. Equivalent to enabling both `CANON_SHOW_PREPROCESSED_EXPECTED` and `CANON_SHOW_PREPROCESSED_RECEIVED`. Format-specific: `CANON_{FORMAT}_DIFF_SHOW_PREPROCESSED_INPUTS`
|
|
168
|
+
|All formats (display only)
|
|
169
|
+
|
|
170
|
+
|`CANON_SHOW_PREPROCESSED_EXPECTED`
|
|
171
|
+
|boolean
|
|
172
|
+
|`false`
|
|
173
|
+
|Show only the EXPECTED (fixture) block in the preprocessed-inputs section. Has no effect unless `show_preprocessed_inputs` or `verbose_diff` is also set. Format-specific: `CANON_{FORMAT}_DIFF_SHOW_PREPROCESSED_EXPECTED`
|
|
174
|
+
|All formats (display only)
|
|
175
|
+
|
|
176
|
+
|`CANON_SHOW_PREPROCESSED_RECEIVED`
|
|
177
|
+
|boolean
|
|
178
|
+
|`false`
|
|
179
|
+
|Show only the RECEIVED (actual) block in the preprocessed-inputs section. Format-specific: `CANON_{FORMAT}_DIFF_SHOW_PREPROCESSED_RECEIVED`
|
|
180
|
+
|All formats (display only)
|
|
181
|
+
|
|
182
|
+
|`CANON_SHOW_PRETTYPRINT_INPUTS`
|
|
183
|
+
|boolean
|
|
184
|
+
|`false`
|
|
185
|
+
|Show a fixture-ready pretty-printed form of **both** original input sides before the diff output. The output is formatted with one XML/HTML tag per line and proper indentation (using the configured `pretty_printer_indent` / `pretty_printer_indent_type`), but with **no character visualization** — whitespace appears as plain ASCII so the output can be copy-pasted directly into RSpec heredoc fixtures. Unlike `show_preprocessed_inputs`, this always pretty-prints the original strings regardless of the `preprocessing` or `display_preprocessing` settings. Equivalent to enabling both `CANON_SHOW_PRETTYPRINT_EXPECTED` and `CANON_SHOW_PRETTYPRINT_RECEIVED`. Format-specific: `CANON_{FORMAT}_DIFF_SHOW_PRETTYPRINT_INPUTS`
|
|
186
|
+
|All formats (display only)
|
|
187
|
+
|
|
188
|
+
|`CANON_SHOW_PRETTYPRINT_EXPECTED`
|
|
189
|
+
|boolean
|
|
190
|
+
|`false`
|
|
191
|
+
|Show only the EXPECTED (fixture) block in the fixture-ready pretty-printed section. Use this to see the current fixture re-formatted for copy-pasting when the fixture is the side that needs updating. Format-specific: `CANON_{FORMAT}_DIFF_SHOW_PRETTYPRINT_EXPECTED`
|
|
192
|
+
|All formats (display only)
|
|
193
|
+
|
|
194
|
+
|`CANON_SHOW_PRETTYPRINT_RECEIVED`
|
|
195
|
+
|boolean
|
|
196
|
+
|`false`
|
|
197
|
+
|Show only the RECEIVED (actual) block in the fixture-ready pretty-printed section. This is the most common fixture-update workflow: enable this option to get a copy-pasteable pretty-printed form of the generated output that can replace the old fixture heredoc. Format-specific: `CANON_{FORMAT}_DIFF_SHOW_PRETTYPRINT_RECEIVED`
|
|
198
|
+
|All formats (display only)
|
|
199
|
+
|
|
200
|
+
|`CANON_COMPACT_SEMANTIC_REPORT`
|
|
201
|
+
|boolean
|
|
202
|
+
|`false`
|
|
203
|
+
|Render element nodes in the Semantic Diff Report as compact inline XML (e.g. `<strong>Annex</strong>`) instead of the verbose `node_info` description string (e.g. `name: strong namespace_uri: …`). Useful when reading Semantic Diff output in the terminal and wanting to see the actual XML markup rather than a textual node description. See format-specific form `CANON_{FORMAT}_DIFF_COMPACT_SEMANTIC_REPORT`.
|
|
204
|
+
|All formats (display only)
|
|
205
|
+
|
|
206
|
+
|`CANON_CHARACTER_VISUALIZATION`
|
|
207
|
+
|symbol
|
|
208
|
+
|`true`
|
|
209
|
+
|Replace invisible characters with visible Unicode symbols in diff output: `true`, `false`, or `content_only`. Format-specific: `CANON_{FORMAT}_DIFF_CHARACTER_VISUALIZATION`. Set to `false` to keep plain-text output.
|
|
210
|
+
|All formats (display only)
|
|
211
|
+
|
|
92
212
|
|`CANON_USE_COLOR`
|
|
93
213
|
|boolean
|
|
94
214
|
|`true`
|
|
@@ -29,7 +29,7 @@ The following table shows how major Canon options are expressed in each interfac
|
|
|
29
29
|
|===
|
|
30
30
|
|Option |CLI Flag |Ruby API |RSpec Config |ENV Variable
|
|
31
31
|
|
|
32
|
-
|Preprocessing
|
|
32
|
+
|Comparison Preprocessing
|
|
33
33
|
|`--preprocessing normalize`
|
|
34
34
|
|`preprocessing: :normalize`
|
|
35
35
|
|`config.canon.xml.preprocessing = :normalize`
|
|
@@ -38,6 +38,111 @@ The following table shows how major Canon options are expressed in each interfac
|
|
|
38
38
|
|
|
39
39
|
Preprocessing values: `none`, `c14n`, `normalize`, `format`
|
|
40
40
|
|
|
41
|
+
NOTE: This controls how documents are normalized _before comparison_ (equivalence detection). It is independent of display preprocessing, which controls how documents are formatted for the diff output.
|
|
42
|
+
|
|
43
|
+
=== Layer 1b: Display Preprocessing Options
|
|
44
|
+
|
|
45
|
+
Display preprocessing controls how documents are normalized _before the line diff is rendered_. This is independent of comparison preprocessing. Because both sides go through the same normalization, the resulting line diff shows only genuine content differences rather than formatting differences.
|
|
46
|
+
|
|
47
|
+
[cols="2,2,3,3,3"]
|
|
48
|
+
|===
|
|
49
|
+
|Option |CLI Flag |Ruby API |RSpec Config |ENV Variable
|
|
50
|
+
|
|
51
|
+
|Display Preprocessing
|
|
52
|
+
|`--display-preprocessing pretty_print`
|
|
53
|
+
|`display_preprocessing: :pretty_print`
|
|
54
|
+
|`cfg.xml.diff.display_preprocessing = :pretty_print`
|
|
55
|
+
|`CANON_XML_DIFF_DISPLAY_PREPROCESSING=pretty_print`
|
|
56
|
+
|
|
57
|
+
|Pretty-printer Indent
|
|
58
|
+
|N/A
|
|
59
|
+
|N/A (config only)
|
|
60
|
+
|`cfg.xml.diff.pretty_printer.indent = 2`
|
|
61
|
+
|`CANON_XML_DIFF_PRETTY_PRINTER_INDENT=2`
|
|
62
|
+
|
|
63
|
+
|Pretty-printer Indent Type
|
|
64
|
+
|N/A
|
|
65
|
+
|N/A (config only)
|
|
66
|
+
|`cfg.xml.diff.pretty_printer.indent_type = :space`
|
|
67
|
+
|`CANON_XML_DIFF_PRETTY_PRINTER_INDENT_TYPE=space`
|
|
68
|
+
|
|
69
|
+
|Collapse Whitespace Elements
|
|
70
|
+
|N/A
|
|
71
|
+
|N/A (config only)
|
|
72
|
+
|`cfg.xml.match.collapse_whitespace_elements = %w[p li td th]`
|
|
73
|
+
|`CANON_XML_MATCH_COLLAPSE_WHITESPACE_ELEMENTS=p,li,td,th`
|
|
74
|
+
|
|
75
|
+
|Preserve Whitespace Elements
|
|
76
|
+
|N/A
|
|
77
|
+
|N/A (config only)
|
|
78
|
+
|`cfg.xml.match.preserve_whitespace_elements = %w[pre code]`
|
|
79
|
+
|`CANON_XML_MATCH_PRESERVE_WHITESPACE_ELEMENTS=pre,code`
|
|
80
|
+
|
|
81
|
+
|Pretty-printed Expected
|
|
82
|
+
|N/A
|
|
83
|
+
|`match: { pretty_printed_expected: true }`
|
|
84
|
+
|`cfg.xml.diff.pretty_printed_expected = true`
|
|
85
|
+
|`CANON_XML_DIFF_PRETTY_PRINTED_EXPECTED=true`
|
|
86
|
+
|
|
87
|
+
|Pretty-printed Received
|
|
88
|
+
|N/A
|
|
89
|
+
|`match: { pretty_printed_received: true }`
|
|
90
|
+
|`cfg.xml.diff.pretty_printed_received = true`
|
|
91
|
+
|`CANON_XML_DIFF_PRETTY_PRINTED_RECEIVED=true`
|
|
92
|
+
|
|
93
|
+
|Sort Attributes (pretty printer)
|
|
94
|
+
|N/A
|
|
95
|
+
|N/A (config only)
|
|
96
|
+
|`cfg.xml.diff.pretty_printer_sort_attributes = true`
|
|
97
|
+
|`CANON_XML_DIFF_PRETTY_PRINTER_SORT_ATTRIBUTES=true`
|
|
98
|
+
|
|
99
|
+
|Compact Semantic Report
|
|
100
|
+
|N/A
|
|
101
|
+
|N/A (config only)
|
|
102
|
+
|`cfg.xml.diff.compact_semantic_report = true`
|
|
103
|
+
|`CANON_XML_DIFF_COMPACT_SEMANTIC_REPORT=true`
|
|
104
|
+
|
|
105
|
+
|Expand Difference
|
|
106
|
+
|N/A
|
|
107
|
+
|N/A (config only)
|
|
108
|
+
|`cfg.xml.diff.expand_difference = true`
|
|
109
|
+
|`CANON_XML_DIFF_EXPAND_DIFFERENCE=true`
|
|
110
|
+
|===
|
|
111
|
+
|
|
112
|
+
Display preprocessing values: `none` (default), `pretty_print`, `normalize_pretty_print`, `c14n`
|
|
113
|
+
|
|
114
|
+
* `:none` — documents are used as-is for the line diff (existing behaviour)
|
|
115
|
+
* `:pretty_print` — both documents are run through a format-specific pretty-printer before the line diff: `Canon::PrettyPrinter::Xml` for XML (one tag per line, consistent indentation); `Canon::PrettyPrinter::Html` (Nokogiri HTML5 serializer) for HTML. Recommended for XML and HTML RSpec tests.
|
|
116
|
+
* `:normalize_pretty_print` — like `:pretty_print` but uses `Canon::PrettyPrinter::XmlNormalized`, which guarantees one line per XML node even for mixed-content elements (those containing both text and child elements). Recommended when XML contains inline markup (Metanorma, DocBook). Use `collapse_whitespace_elements` or `preserve_whitespace_elements` to control per-element whitespace visualization.
|
|
117
|
+
* `:c14n` — both documents are run through canonical normalization before the line diff. For XML: canonical XML (C14N, attribute-order sorting). For HTML: Nokogiri HTML5 serialization (normalizes attribute order and whitespace).
|
|
118
|
+
|
|
119
|
+
NOTE: Pretty-printer indent options only apply when `display_preprocessing` is `:pretty_print` or `:normalize_pretty_print`.
|
|
120
|
+
|
|
121
|
+
NOTE: `compact_semantic_report` controls the representation of XML element nodes in the *Semantic Diff Report* section (the structured summary that appears above the line diff). When `false` (default), element nodes are described with their `node_info` string (e.g. `name: strong namespace_uri: …`), which is unambiguous but verbose. When `true`, element nodes are serialized as compact inline XML (e.g. `<strong>Annex A</strong>`), which is much easier to read at a glance. Plain text nodes (leaf text content) are not affected — they always display their decoded string value.
|
|
122
|
+
|
|
123
|
+
=== Layer 1c: Character Visualization
|
|
124
|
+
|
|
125
|
+
Character visualization controls whether invisible characters (spaces, tabs, non-breaking spaces, etc.) are replaced with visible Unicode symbols in diff output.
|
|
126
|
+
|
|
127
|
+
[cols="2,2,3,3,3"]
|
|
128
|
+
|===
|
|
129
|
+
|Option |CLI Flag |Ruby API |RSpec Config |ENV Variable
|
|
130
|
+
|
|
131
|
+
|Character Visualization
|
|
132
|
+
|N/A
|
|
133
|
+
|`character_visualization: true`
|
|
134
|
+
|`cfg.xml.diff.character_visualization = false`
|
|
135
|
+
|`CANON_XML_DIFF_CHARACTER_VISUALIZATION=false`
|
|
136
|
+
|===
|
|
137
|
+
|
|
138
|
+
Character visualization values:
|
|
139
|
+
|
|
140
|
+
* `true` — (default) the full default visualization map is applied; spaces appear as `░`, tabs as `⇥`, non-breaking spaces as `␣`, etc.
|
|
141
|
+
* `false` — visualization is disabled; all characters appear as plain text.
|
|
142
|
+
* `:content_only` — reserved for future use; currently behaves as `true`. Future intent: apply visualization only to text-node content, leaving structural indentation whitespace plain.
|
|
143
|
+
|
|
144
|
+
NOTE: Setting `character_visualization: false` is useful when copying diff output from a failure message, or when downstream tooling cannot handle Unicode symbol substitutions.
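To make the `true`/`false` behaviour concrete, here is a minimal Ruby sketch of the substitution. It is not Canon's implementation: `VISUALIZATION_MAP` and `visualize` are hypothetical names, and the map holds only the three symbols named in the list above, while Canon's default map covers more characters.

```ruby
# Illustrative only: a hypothetical stand-in for Canon's visualization map,
# limited to the three substitutions named in the list above.
VISUALIZATION_MAP = {
  " "      => "░", # space
  "\t"     => "⇥", # tab
  "\u00A0" => "␣", # non-breaking space
}.freeze

# Hypothetical helper (not part of Canon's API).
# Any truthy setting, including :content_only, applies the full map,
# mirroring the documented "currently behaves as true" behaviour.
def visualize(text, character_visualization: true)
  return text unless character_visualization

  # String#gsub with a Hash replaces each match with the Hash value for it.
  text.gsub(/[ \t\u00A0]/, VISUALIZATION_MAP)
end

visualize("a b\tc")                                 # => "a░b⇥c"
visualize("a b\tc", character_visualization: false) # => "a b\tc"
```

With visualization disabled the string passes through unchanged, which is why that setting suits copy-paste from failure messages.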
|
|
145
|
+
|
|
41
146
|
=== Layer 2: Algorithm Selection
|
|
42
147
|
|
|
43
148
|
[cols="2,2,3,3,3"]
|
|
@@ -58,7 +163,7 @@ Preprocessing values: `none`, `c14n`, `normalize`, `format`
|
|
|
58
163
|
|===
|
|
59
164
|
|
|
60
165
|
Algorithm values: `dom`, `semantic` +
|
|
61
|
-
Diff mode values: `by_line`, `by_object`
|
|
166
|
+
Diff mode values: `by_line`, `by_object`, `pretty_diff`
|
|
62
167
|
|
|
63
168
|
=== Layer 3: Match Options
|
|
64
169
|
|
|
@@ -195,6 +300,81 @@ Values: `strict`, `ignore`
|
|
|
195
300
|
|`CANON_GROUPING_LINES=15`
|
|
196
301
|
|===
|
|
197
302
|
|
|
303
|
+
=== Debug Input Display Options
|
|
304
|
+
|
|
305
|
+
These options control whether Canon dumps the raw or preprocessed document content to the terminal alongside the diff. They are useful for debugging comparisons — e.g. inspecting what the received output actually looks like before filing an issue.
|
|
306
|
+
|
|
307
|
+
[cols="2,2,3,3,3"]
|
|
308
|
+
|===
|
|
309
|
+
|Option |CLI Flag |Ruby API |RSpec Config |ENV Variable
|
|
310
|
+
|
|
311
|
+
|Show Raw Inputs (both)
|
|
312
|
+
|`--show-raw-inputs`
|
|
313
|
+
|N/A (config only)
|
|
314
|
+
|`cfg.xml.diff.show_raw_inputs = true`
|
|
315
|
+
|`CANON_XML_DIFF_SHOW_RAW_INPUTS=true`
|
|
316
|
+
|
|
317
|
+
|Show Raw Expected only
|
|
318
|
+
|N/A
|
|
319
|
+
|N/A (config only)
|
|
320
|
+
|`cfg.xml.diff.show_raw_expected = true`
|
|
321
|
+
|`CANON_XML_DIFF_SHOW_RAW_EXPECTED=true`
|
|
322
|
+
|
|
323
|
+
|Show Raw Received only
|
|
324
|
+
|N/A
|
|
325
|
+
|N/A (config only)
|
|
326
|
+
|`cfg.xml.diff.show_raw_received = true`
|
|
327
|
+
|`CANON_XML_DIFF_SHOW_RAW_RECEIVED=true`
|
|
328
|
+
|
|
329
|
+
|Show Preprocessed Inputs (both)
|
|
330
|
+
|`--show-preprocessed-inputs`
|
|
331
|
+
|N/A (config only)
|
|
332
|
+
|`cfg.xml.diff.show_preprocessed_inputs = true`
|
|
333
|
+
|`CANON_XML_DIFF_SHOW_PREPROCESSED_INPUTS=true`
|
|
334
|
+
|
|
335
|
+
|Show Preprocessed Expected only
|
|
336
|
+
|`--show-preprocessed-expected`
|
|
337
|
+
|N/A (config only)
|
|
338
|
+
|`cfg.xml.diff.show_preprocessed_expected = true`
|
|
339
|
+
|`CANON_XML_DIFF_SHOW_PREPROCESSED_EXPECTED=true`
|
|
340
|
+
|
|
341
|
+
|Show Preprocessed Received only
|
|
342
|
+
|`--show-preprocessed-received`
|
|
343
|
+
|N/A (config only)
|
|
344
|
+
|`cfg.xml.diff.show_preprocessed_received = true`
|
|
345
|
+
|`CANON_XML_DIFF_SHOW_PREPROCESSED_RECEIVED=true`
|
|
346
|
+
|
|
347
|
+
|Show Pretty-printed Inputs (both)
|
|
348
|
+
|`--show-prettyprint-inputs`
|
|
349
|
+
|N/A (config only)
|
|
350
|
+
|`cfg.xml.diff.show_prettyprint_inputs = true`
|
|
351
|
+
|`CANON_XML_DIFF_SHOW_PRETTYPRINT_INPUTS=true`
|
|
352
|
+
|
|
353
|
+
|Show Pretty-printed Expected only
|
|
354
|
+
|`--show-prettyprint-expected`
|
|
355
|
+
|N/A (config only)
|
|
356
|
+
|`cfg.xml.diff.show_prettyprint_expected = true`
|
|
357
|
+
|`CANON_XML_DIFF_SHOW_PRETTYPRINT_EXPECTED=true`
|
|
358
|
+
|
|
359
|
+
|Show Pretty-printed Received only
|
|
360
|
+
|`--show-prettyprint-received`
|
|
361
|
+
|N/A (config only)
|
|
362
|
+
|`cfg.xml.diff.show_prettyprint_received = true`
|
|
363
|
+
|`CANON_XML_DIFF_SHOW_PRETTYPRINT_RECEIVED=true`
|
|
364
|
+
|
|
365
|
+
|Show Line-Numbered Inputs
|
|
366
|
+
|`--show-line-numbered-inputs`
|
|
367
|
+
|N/A (config only)
|
|
368
|
+
|`cfg.xml.diff.show_line_numbered_inputs = true`
|
|
369
|
+
|`CANON_XML_DIFF_SHOW_LINE_NUMBERED_INPUTS=true`
|
|
370
|
+
|===
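The ENV column in the table above follows a visible pattern: `CANON_XML_DIFF_` followed by the upper-cased option name. The sketch below derives the variable name from an option name. The helpers `diff_env_var` and `env_flag?` are hypothetical, extending the pattern to formats other than XML is an assumption, and so is the rule that only the literal string `true` enables a flag.

```ruby
# Hypothetical helper: builds the ENV variable name shown in the table,
# e.g. (:xml, :show_raw_inputs) -> "CANON_XML_DIFF_SHOW_RAW_INPUTS".
# Applying the same pattern to other formats is an assumption.
def diff_env_var(format, option)
  "CANON_#{format.to_s.upcase}_DIFF_#{option.to_s.upcase}"
end

# Hypothetical helper: treats only "true" (case-insensitive) as enabled.
# Canon's actual parsing rules may differ.
def env_flag?(name, env = ENV)
  env[name].to_s.strip.downcase == "true"
end

name = diff_env_var(:xml, :show_raw_inputs)
# name => "CANON_XML_DIFF_SHOW_RAW_INPUTS"
env_flag?(name, { name => "true" }) # => true
```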
|
|
371
|
+
|
|
372
|
+
NOTE: `show_raw_inputs`, `show_preprocessed_inputs`, and `show_prettyprint_inputs` are convenience shorthands for enabling both sides of their respective section simultaneously. Use the per-side variants (`*_expected` / `*_received`) when you only want to display one side — e.g. suppress the (long) expected fixture and show only what was generated.
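The relationship between a shorthand and its per-side variants can be sketched as follows. `sides_to_show` is a hypothetical helper, not Canon's code, and letting the shorthand take precedence over the per-side flags is an assumption. Only the option names come from the table above.

```ruby
# Hypothetical helper: decides which sides of a debug section to print.
# `section` is one of :raw, :preprocessed, :prettyprint (names from the table).
def sides_to_show(options, section)
  # The *_inputs shorthand enables both sides at once (assumed to win
  # over the per-side flags when both are given).
  return %i[expected received] if options[:"show_#{section}_inputs"]

  # Otherwise, keep only the sides whose per-side flag is set.
  %i[expected received].select { |side| options[:"show_#{section}_#{side}"] }
end

sides_to_show({ show_raw_inputs: true }, :raw)
# => [:expected, :received]
sides_to_show({ show_prettyprint_received: true }, :prettyprint)
# => [:received]
```

The second call matches the use case in the note: the long expected fixture is suppressed and only the generated output is shown.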
|
|
373
|
+
|
|
374
|
+
NOTE: The *pretty-printed* section (`show_prettyprint_*`) differs from the *preprocessed* section (`show_preprocessed_*`) in two ways: (1) it always pretty-prints the **original** strings using `PrettyPrinter::Xml`/`Html`, independently of any `preprocessing` or `display_preprocessing` setting; and (2) it outputs **plain ASCII** with no character visualization (spaces remain spaces), making the output suitable for direct copy-paste into RSpec heredoc fixtures.
|
|
375
|
+
|
|
376
|
+
NOTE: `verbose_diff` enables the raw, preprocessed, and line-numbered debug sections simultaneously. It does **not** enable the pretty-printed section, which is a fixture-oriented feature independent of debugging verbosity.
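As a sketch of the rule in the note above, `verbose_diff` can be thought of as expanding into three of the section flags while leaving the pretty-printed flags untouched. `VERBOSE_SECTIONS` and `expand_verbose_diff` are hypothetical names used for illustration. How Canon applies the expansion internally is not shown in this document.

```ruby
# Sections that verbose_diff switches on, per the note above.
# The pretty-printed section is deliberately absent.
VERBOSE_SECTIONS = %i[
  show_raw_inputs
  show_preprocessed_inputs
  show_line_numbered_inputs
].freeze

# Hypothetical helper: returns a copy of the options with the verbose
# sections enabled when :verbose_diff is set; otherwise returns them as-is.
def expand_verbose_diff(options)
  return options unless options[:verbose_diff]

  VERBOSE_SECTIONS.each_with_object(options.dup) do |key, merged|
    merged[key] = true
  end
end

opts = expand_verbose_diff({ verbose_diff: true })
opts[:show_raw_inputs]         # => true
opts[:show_prettyprint_inputs] # => nil
```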
|
|
377
|
+
|
|
198
378
|
=== Size Limit Options
|
|
199
379
|
|
|
200
380
|
[cols="2,2,3,3,3"]
|
data/lib/canon/cli.rb
CHANGED
|
@@ -238,6 +238,26 @@ module Canon
|
|
|
238
238
|
type: :boolean,
|
|
239
239
|
default: false,
|
|
240
240
|
desc: "Show preprocessed contents (what was actually compared)"
|
|
241
|
+
method_option :show_preprocessed_expected,
|
|
242
|
+
type: :boolean,
|
|
243
|
+
default: false,
|
|
244
|
+
desc: "Show only the EXPECTED (fixture) block in the preprocessed-inputs section"
|
|
245
|
+
method_option :show_preprocessed_received,
|
|
246
|
+
type: :boolean,
|
|
247
|
+
default: false,
|
|
248
|
+
desc: "Show only the RECEIVED (actual) block in the preprocessed-inputs section"
|
|
249
|
+
method_option :show_prettyprint_inputs,
|
|
250
|
+
type: :boolean,
|
|
251
|
+
default: false,
|
|
252
|
+
desc: "Show fixture-ready pretty-printed form of both inputs (no whitespace visualization)"
|
|
253
|
+
method_option :show_prettyprint_expected,
|
|
254
|
+
type: :boolean,
|
|
255
|
+
default: false,
|
|
256
|
+
desc: "Show only the EXPECTED block in the fixture-ready pretty-printed section"
|
|
257
|
+
method_option :show_prettyprint_received,
|
|
258
|
+
type: :boolean,
|
|
259
|
+
default: false,
|
|
260
|
+
desc: "Show only the RECEIVED block in the fixture-ready pretty-printed section"
|
|
241
261
|
method_option :show_line_numbered_inputs,
|
|
242
262
|
type: :boolean,
|
|
243
263
|
default: false,
|