RubyGems - canon - Versions diffs - 0.1.6 → 0.1.7 - Mend

canon 0.1.6 → 0.1.7


Files changed (136)

checksums.yaml +4 -4
data/.rubocop_todo.yml +163 -67
data/README.adoc +400 -7
data/docs/Gemfile +9 -0
data/docs/INDEX.adoc +99 -182
data/docs/_config.yml +100 -0
data/docs/advanced/diff-classification.adoc +547 -0
data/docs/advanced/diff-pipeline.adoc +358 -0
data/docs/advanced/index.adoc +214 -0
data/docs/advanced/semantic-diff-report.adoc +390 -0
data/docs/{VERBOSE.adoc → advanced/verbose-mode-architecture.adoc} +51 -53
data/docs/features/diff-formatting/algorithm-specific-output.adoc +533 -0
data/docs/{CHARACTER_VISUALIZATION.adoc → features/diff-formatting/character-visualization.adoc} +23 -62
data/docs/features/diff-formatting/colors-and-symbols.adoc +606 -0
data/docs/features/diff-formatting/context-and-grouping.adoc +490 -0
data/docs/features/diff-formatting/display-filtering.adoc +472 -0
data/docs/features/diff-formatting/index.adoc +140 -0
data/docs/features/environment-configuration/index.adoc +327 -0
data/docs/features/environment-configuration/override-system.adoc +436 -0
data/docs/features/environment-configuration/size-limits.adoc +273 -0
data/docs/features/index.adoc +173 -0
data/docs/features/input-validation/index.adoc +521 -0
data/docs/features/match-options/algorithm-specific-behavior.adoc +365 -0
data/docs/features/match-options/html-policies.adoc +312 -0
data/docs/features/match-options/index.adoc +621 -0
data/docs/getting-started/index.adoc +83 -0
data/docs/getting-started/quick-start.adoc +76 -0
data/docs/guides/choosing-configuration.adoc +689 -0
data/docs/guides/index.adoc +181 -0
data/docs/{CLI.adoc → interfaces/cli/index.adoc} +18 -13
data/docs/interfaces/index.adoc +101 -0
data/docs/{RSPEC.adoc → interfaces/rspec/index.adoc} +242 -31
data/docs/{RUBY_API.adoc → interfaces/ruby-api/index.adoc} +118 -16
data/docs/lychee.toml +65 -0
data/docs/reference/cli-options.adoc +418 -0
data/docs/reference/environment-variables.adoc +375 -0
data/docs/reference/index.adoc +204 -0
data/docs/reference/options-across-interfaces.adoc +417 -0
data/docs/understanding/algorithms/dom-diff.adoc +389 -0
data/docs/understanding/algorithms/index.adoc +314 -0
data/docs/understanding/algorithms/semantic-tree-diff.adoc +533 -0
data/docs/understanding/architecture.adoc +447 -0
data/docs/understanding/comparison-pipeline.adoc +317 -0
data/docs/understanding/formats/html.adoc +380 -0
data/docs/understanding/formats/index.adoc +261 -0
data/docs/understanding/formats/json.adoc +390 -0
data/docs/understanding/formats/xml.adoc +366 -0
data/docs/understanding/formats/yaml.adoc +504 -0
data/docs/understanding/index.adoc +130 -0
data/lib/canon/cli.rb +42 -1
data/lib/canon/commands/diff_command.rb +108 -23
data/lib/canon/comparison/compare_profile.rb +101 -0
data/lib/canon/comparison/comparison_result.rb +41 -2
data/lib/canon/comparison/html_comparator.rb +292 -71
data/lib/canon/comparison/html_compare_profile.rb +117 -0
data/lib/canon/comparison/match_options.rb +42 -4
data/lib/canon/comparison/strategies/base_match_strategy.rb +99 -0
data/lib/canon/comparison/strategies/match_strategy_factory.rb +74 -0
data/lib/canon/comparison/strategies/semantic_tree_match_strategy.rb +220 -0
data/lib/canon/comparison/xml_comparator.rb +695 -91
data/lib/canon/comparison.rb +207 -2
data/lib/canon/config/env_provider.rb +71 -0
data/lib/canon/config/env_schema.rb +58 -0
data/lib/canon/config/override_resolver.rb +55 -0
data/lib/canon/config/type_converter.rb +59 -0
data/lib/canon/config.rb +158 -29
data/lib/canon/data_model.rb +29 -0
data/lib/canon/diff/diff_classifier.rb +74 -14
data/lib/canon/diff/diff_context_builder.rb +41 -0
data/lib/canon/diff/diff_line.rb +18 -2
data/lib/canon/diff/diff_node.rb +18 -3
data/lib/canon/diff/diff_node_mapper.rb +71 -12
data/lib/canon/diff/formatting_detector.rb +53 -0
data/lib/canon/diff_formatter/by_line/base_formatter.rb +60 -5
data/lib/canon/diff_formatter/by_line/html_formatter.rb +68 -16
data/lib/canon/diff_formatter/by_line/json_formatter.rb +0 -37
data/lib/canon/diff_formatter/by_line/simple_formatter.rb +0 -42
data/lib/canon/diff_formatter/by_line/xml_formatter.rb +116 -31
data/lib/canon/diff_formatter/by_line/yaml_formatter.rb +0 -37
data/lib/canon/diff_formatter/by_object/base_formatter.rb +126 -19
data/lib/canon/diff_formatter/by_object/xml_formatter.rb +30 -1
data/lib/canon/diff_formatter/debug_output.rb +7 -1
data/lib/canon/diff_formatter/diff_detail_formatter.rb +674 -57
data/lib/canon/diff_formatter/legend.rb +42 -0
data/lib/canon/diff_formatter.rb +78 -9
data/lib/canon/errors.rb +56 -0
data/lib/canon/formatters/html_formatter_base.rb +35 -1
data/lib/canon/formatters/json_formatter.rb +3 -0
data/lib/canon/formatters/yaml_formatter.rb +3 -0
data/lib/canon/html/data_model.rb +229 -0
data/lib/canon/html.rb +9 -0
data/lib/canon/options/cli_generator.rb +70 -0
data/lib/canon/options/registry.rb +234 -0
data/lib/canon/rspec_matchers.rb +34 -13
data/lib/canon/tree_diff/adapters/html_adapter.rb +316 -0
data/lib/canon/tree_diff/adapters/json_adapter.rb +204 -0
data/lib/canon/tree_diff/adapters/xml_adapter.rb +285 -0
data/lib/canon/tree_diff/adapters/yaml_adapter.rb +213 -0
data/lib/canon/tree_diff/core/attribute_comparator.rb +84 -0
data/lib/canon/tree_diff/core/matching.rb +241 -0
data/lib/canon/tree_diff/core/node_signature.rb +164 -0
data/lib/canon/tree_diff/core/node_weight.rb +135 -0
data/lib/canon/tree_diff/core/tree_node.rb +450 -0
data/lib/canon/tree_diff/matchers/hash_matcher.rb +258 -0
data/lib/canon/tree_diff/matchers/similarity_matcher.rb +168 -0
data/lib/canon/tree_diff/matchers/structural_propagator.rb +242 -0
data/lib/canon/tree_diff/matchers/universal_matcher.rb +220 -0
data/lib/canon/tree_diff/operation_converter.rb +631 -0
data/lib/canon/tree_diff/operations/operation.rb +92 -0
data/lib/canon/tree_diff/operations/operation_detector.rb +626 -0
data/lib/canon/tree_diff/tree_diff_integrator.rb +140 -0
data/lib/canon/tree_diff.rb +33 -0
data/lib/canon/validators/json_validator.rb +3 -1
data/lib/canon/validators/yaml_validator.rb +3 -1
data/lib/canon/version.rb +1 -1
data/lib/canon/xml/data_model.rb +22 -23
data/lib/canon/xml/element_matcher.rb +128 -20
data/lib/canon/xml/namespace_helper.rb +110 -0
data/lib/canon.rb +3 -0
metadata +81 -23
data/_config.yml +0 -116
data/docs/ADVANCED_TOPICS.adoc +0 -20
data/docs/BASIC_USAGE.adoc +0 -16
data/docs/CUSTOMIZING_BEHAVIOR.adoc +0 -19
data/docs/DIFF_ARCHITECTURE.adoc +0 -435
data/docs/DIFF_FORMATTING.adoc +0 -540
data/docs/FORMATS.adoc +0 -447
data/docs/INPUT_VALIDATION.adoc +0 -477
data/docs/MATCH_ARCHITECTURE.adoc +0 -463
data/docs/MATCH_OPTIONS.adoc +0 -719
data/docs/MODES.adoc +0 -432
data/docs/NORMATIVE_INFORMATIVE_DIFFS.adoc +0 -219
data/docs/OPTIONS.adoc +0 -1387
data/docs/PREPROCESSING.adoc +0 -491
data/docs/SEMANTIC_DIFF_REPORT.adoc +0 -528
data/docs/UNDERSTANDING_CANON.adoc +0 -17

data/docs/understanding/architecture.adoc ADDED

@@ -0,0 +1,447 @@
+---
+title: Architecture
+parent: Understanding
+nav_order: 1
+---
+= Architecture
+:toc:
+:toclevels: 3
+== Purpose
+This document explains Canon's 4-layer comparison architecture and how documents flow through preprocessing, algorithm selection, match options, and diff formatting.
+For a guided walkthrough of choosing configurations, see link:../guides/choosing-configuration.adoc[Choosing Configuration].
+For detailed 4-layer pipeline documentation, see link:comparison-pipeline.adoc[Comparison Pipeline].
+== Overview
+Canon uses a 4-layer architecture that separates concerns for clean, maintainable comparison logic:
+. **Layer 1 - Preprocessing**: Optional document normalization
+. **Layer 2 - Algorithm Selection**: Choose comparison strategy (DOM vs Semantic)
+. **Layer 3 - Match Options**: Content comparison with configurable dimensions (algorithm-specific)
+. **Layer 4 - Diff Formatting**: Formatted output with visualization (algorithm-specific)
+Each layer is configured independently, allowing fine-grained control over comparison behavior.
+**Key Insight**: Layers 3 and 4 are **algorithm-specific** - they behave differently depending on which algorithm (DOM or Semantic) is chosen in Layer 2.
+== System architecture diagram
+[mermaid]
+----
+graph TD
+    A[Input Documents] --> B[Layer 1: Preprocessing]
+    B --> C[Layer 2: Algorithm Selection]
+    C --> D[Layer 3: Match Options]
+    D --> E[Layer 4: Diff Formatting]
+    E --> F[Output]
+    C -->|DOM Algorithm| D1[Positional Matching]
+    C -->|Semantic Algorithm| D2[Signature Matching]
+    D1 --> E1[Line-based Diff]
+    D2 --> E2[Operation-based Diff]
+    style B fill:#e1f5ff
+    style C fill:#fff4e1
+    style D fill:#ffe1f5
+    style E fill:#e1ffe1
+----
+== Layer 1: Preprocessing
+=== Purpose
+Transform documents into a normalized form before comparison. This eliminates format-specific variations that should not affect semantic equivalence.
+=== Options
+`none` (default):: No preprocessing - compare documents as-is
+`c14n`:: Canonical form:
+* XML: W3C Canonical XML 1.1
+* HTML: Normalized HTML structure
+* JSON: Sorted keys, normalized whitespace
+* YAML: Sorted keys, standard format
+`normalize`:: Normalize whitespace:
+* Collapse multiple whitespace to single space
+* Trim leading/trailing whitespace
+* Normalize line endings
+`format`:: Pretty-print with standard formatting:
+* 2-space indentation
+* One element/property per line
+* Consistent structure
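The three `normalize` steps listed above can be approximated in plain Ruby. This is a minimal sketch, not Canon's implementation: `normalize_whitespace` is a hypothetical helper, and it treats the document as a flat string rather than a parsed tree.

```ruby
# Illustrative only: `normalize_whitespace` is a hypothetical helper and
# is not part of Canon's API.
def normalize_whitespace(text)
  text
    .gsub(/\r\n?/, "\n") # normalize CRLF / CR line endings to LF
    .gsub(/\s+/, " ")    # collapse each run of whitespace to a single space
    .strip               # trim leading and trailing whitespace
end

a = "<root>\r\n  <item>Hello   world</item>\r\n</root>\r\n"
b = "<root> <item>Hello world</item> </root>"

puts normalize_whitespace(a) == normalize_whitespace(b) # prints "true"
```

Because whitespace is collapsed after line endings are unified, two documents that differ only in indentation or newline style normalize to the same string.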
+=== Usage
+.Ruby API
+[example]
+====
+[source,ruby]
+----
+Canon::Comparison.equivalent?(doc1, doc2,
+  preprocessing: :normalize
+)
+----
+====
+.CLI
+[example]
+====
+[source,bash]
+----
+$ canon diff file1.xml file2.xml --preprocessing normalize
+----
+====
+See link:../features/preprocessing/[Preprocessing documentation] for details.
+== Layer 2: Algorithm selection
+=== Purpose
+Choose the comparison strategy. This is a **critical decision** because it determines how Layers 3 and 4 behave.
+=== Options
+`dom` (default):: DOM-based positional comparison
+* Fast, stable, well-tested
+* Position-based element matching
+* No move detection
+* Best for similar documents
+`semantic` (experimental):: Tree-based semantic diff
+* Slower but more intelligent
+* Signature-based matching
+* Detects moves, merges, splits
+* Best for restructured documents
+=== Algorithm characteristics
+[cols="2,3,3"]
+|===
+|Feature |DOM |Semantic
+|**Stability**
+|Stable (production-ready)
+|Experimental
+|**Performance**
+|Fast (linear)
+|Slower (quadratic worst case)
+|**Move Detection**
+|No
+|Yes
+|**Match Strategy**
+|Positional
+|Signature-based
+|**Layer 3 Behavior**
+|Element-by-element comparison
+|Signature calculation
+|**Layer 4 Behavior**
+|Line-based differences
+|Operation-based (INSERT, DELETE, UPDATE, MOVE)
+|**Best For**
+|Similar documents
+|Restructured documents
+|===
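To make the positional versus signature-based distinction concrete, here is a toy model in plain Ruby. It assumes each element is a `[name, text]` pair and that the pair itself serves as the signature. Both helpers are hypothetical simplifications and say nothing about how Canon computes signatures internally.

```ruby
# Toy model, not Canon's implementation. Elements are [name, text] pairs
# and are assumed to be unique within a document.

# Positional: compare the i-th child of one document with the i-th child
# of the other and return the positions that differ.
def positional_mismatches(old_children, new_children)
  length = [old_children.size, new_children.size].max
  (0...length).reject { |i| old_children[i] == new_children[i] }
end

# Signature-based: pair elements by content regardless of position and
# return [old_index, new_index] for every element found in both documents.
def signature_pairs(old_children, new_children)
  new_index = new_children.each_with_index.to_h # signature => index in new doc
  old_children.each_with_index.filter_map do |child, i|
    j = new_index[child]
    [i, j] if j
  end
end

old_doc = [["title", "Intro"], ["para", "First"], ["para", "Second"]]
new_doc = [["para", "First"], ["para", "Second"], ["title", "Intro"]]

p positional_mismatches(old_doc, new_doc) # => [0, 1, 2]
p signature_pairs(old_doc, new_doc)       # => [[0, 2], [1, 0], [2, 1]]
```

In this example the positional comparison sees a mismatch at every position, while the signature-based pairing finds every element again at a new index, which is the information a move detector needs.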
+=== Usage
+.Ruby API
+[example]
+====
+[source,ruby]
+----
+# DOM algorithm (default)
+Canon::Comparison.equivalent?(doc1, doc2,
+  diff_algorithm: :dom
+)
+# Semantic algorithm
+Canon::Comparison.equivalent?(doc1, doc2,
+  diff_algorithm: :semantic
+)
+----
+====
+.CLI
+[example]
+====
+[source,bash]
+----
+# DOM algorithm
+$ canon diff file1.xml file2.xml --diff-algorithm dom
+# Semantic algorithm
+$ canon diff file1.xml file2.xml --diff-algorithm semantic
+----
+====
+See link:algorithms/[Algorithm documentation] for details.
+== Layer 3: Match options
+=== Purpose
+Configure what to compare and how strictly. **This layer is algorithm-specific** - each algorithm interprets match options differently.
+=== Match dimensions
+Match dimensions are orthogonal aspects of documents that can be compared independently:
+`text_content`:: Text within elements/values
+`structural_whitespace`:: Whitespace between elements
+`attribute_whitespace`:: Whitespace in attribute values (XML/HTML)
+`attribute_order`:: Order of attributes (XML/HTML)
+`attribute_values`:: Attribute value content (XML/HTML)
+`key_order`:: Order of object keys (JSON/YAML)
+`comments`:: Comment content and placement
+Each dimension is set to one of three behaviors:
+* `:strict` - Must match exactly
+* `:normalize` - Match after normalization
+* `:ignore` - Don't compare
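The following sketch shows how the three behaviors could apply to a single text dimension. `dimension_match?` is a hypothetical helper, and the assumption that `:normalize` means "compare after collapsing whitespace" is ours; the actual normalization may differ per dimension.

```ruby
# Hypothetical helper, not part of Canon's API.
def dimension_match?(expected, actual, behavior)
  case behavior
  when :strict    then expected == actual
  when :normalize then expected.split == actual.split # compare whitespace-separated tokens
  when :ignore    then true
  else raise ArgumentError, "unknown behavior: #{behavior.inspect}"
  end
end

expected = "Hello   world"
actual   = " Hello world\n"

%i[strict normalize ignore].each do |behavior|
  puts "#{behavior}: #{dimension_match?(expected, actual, behavior)}"
end
# strict: false
# normalize: true
# ignore: true
```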
+=== Match profiles
+Profiles are predefined combinations of dimension settings for common scenarios:
+`:strict`:: Exact matching - all dimensions use `:strict` behavior
+`:rendered`:: Browser rendering - ignores formatting that doesn't affect display
+`:spec_friendly`:: Test-friendly - ignores formatting, focuses on content
+`:content_only`:: Maximum tolerance - only semantic content matters
+=== Algorithm-specific behavior
+**Critical**: The same match options behave differently with each algorithm!
+* **DOM algorithm**: Uses options for positional element comparison
+* **Semantic algorithm**: Uses options during signature calculation
+See link:../features/match-options/algorithm-specific-behavior.adoc[Algorithm-Specific Behavior] for detailed comparison.
+=== Usage
+.With dimensions
+[example]
+====
+[source,ruby]
+----
+Canon::Comparison.equivalent?(doc1, doc2,
+  match: {
+    text_content: :normalize,
+    structural_whitespace: :ignore,
+    comments: :ignore
+  }
+)
+----
+====
+.With profile
+[example]
+====
+[source,ruby]
+----
+Canon::Comparison.equivalent?(doc1, doc2,
+  match_profile: :spec_friendly
+)
+----
+====
+.Profile with dimension overrides
+[example]
+====
+[source,ruby]
+----
+Canon::Comparison.equivalent?(doc1, doc2,
+  match_profile: :spec_friendly,
+  match: {
+    comments: :strict  # Override profile setting
+  }
+)
+----
+====
+See link:../features/match-options/[Match Options] for complete reference.
+== Layer 4: Diff formatting
+=== Purpose
+Control how differences are displayed. **This layer is algorithm-specific** - each algorithm generates different output types.
+=== Diff modes
+`by_line`:: Traditional line-by-line diff
+* Natural fit for DOM algorithm
+* Shows positional changes
+* Traditional diff format
+`by_object`:: Tree-based semantic diff
+* Natural fit for Semantic algorithm
+* Shows structural operations
+* Visual tree representation
+=== Algorithm-specific output
+**Critical**: Each algorithm produces fundamentally different output!
+* **DOM algorithm**: Generates line-based differences
+* **Semantic algorithm**: Generates operation-based differences (INSERT, DELETE, UPDATE, MOVE)
+See link:../features/diff-formatting/algorithm-specific-output.adoc[Algorithm-Specific Output] for detailed comparison.
+=== Diff options
+`use_color`:: Enable/disable ANSI color codes (default: `true`)
+`context_lines`:: Number of unchanged lines around changes (default: `3`)
+`diff_grouping_lines`:: Group changes within N lines (default: `nil`)
+See link:../features/diff-formatting/[Diff Formatting] for details.
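As a rough illustration of what grouping changes within N lines can mean, the sketch below merges changed line numbers into one group whenever neighbouring changes are at most N lines apart. `group_changes` is a hypothetical helper, and the exact boundary rule (`<=` versus `<`) is an assumption rather than documented Canon behavior.

```ruby
# Hypothetical sketch, not Canon's implementation: start a new group
# whenever the gap to the previous changed line exceeds `grouping_lines`.
def group_changes(changed_lines, grouping_lines)
  changed_lines.sort.slice_when { |prev, curr| curr - prev > grouping_lines }.to_a
end

p group_changes([3, 5, 40, 42, 90], 10) # => [[3, 5], [40, 42], [90]]
```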
+=== Usage
+.Ruby API
+[example]
+====
+[source,ruby]
+----
+Canon::Comparison.equivalent?(doc1, doc2,
+  verbose: true,
+  diff_mode: :by_line,
+  use_color: true,
+  context_lines: 5,
+  diff_grouping_lines: 10
+)
+----
+====
+.CLI
+[example]
+====
+[source,bash]
+----
+$ canon diff file1.xml file2.xml \
+  --verbose \
+  --diff-mode by-line \
+  --context-lines 5 \
+  --diff-grouping-lines 10
+----
+====
+== Complete example: All 4 layers
+Here's a full configuration showing all 4 layers working together:
+[source,ruby]
+----
+result = Canon::Comparison.equivalent?(doc1, doc2,
+  # Layer 1: Preprocessing
+  preprocessing: :normalize,
+  # Layer 2: Algorithm
+  diff_algorithm: :semantic,
+  # Layer 3: Match Options
+  match_profile: :spec_friendly,
+  # Layer 4: Diff Formatting
+  verbose: true,
+  diff_mode: :by_object,
+  use_color: true,
+  context_lines: 3
+)
+----
+See link:comparison-pipeline.adoc[Comparison Pipeline] for layer-by-layer examples.
+== Configuration precedence
+When options are specified in multiple places, Canon resolves them using this hierarchy (highest to lowest priority):
+[source]
+----
+1. Per-comparison explicit options (highest)
+   ↓
+2. Per-comparison profile
+   ↓
+3. Global configuration explicit options
+   ↓
+4. Global configuration profile
+   ↓
+5. Format defaults (lowest)
+----
+.Precedence example
+[example]
+====
+Global configuration:
+[source,ruby]
+----
+Canon::RSpecMatchers.configure do |config|
+  config.xml.match.profile = :spec_friendly
+  config.xml.match.options = { comments: :strict }
+end
+----
+Per-test usage:
+[source,ruby]
+----
+expect(actual).to be_xml_equivalent_to(expected)
+  .with_profile(:rendered)
+  .with_options(structural_whitespace: :ignore)
+----
+**Final resolved options**:
+* `text_content: :normalize` (from `:rendered` per-test profile)
+* `structural_whitespace: :ignore` (from per-test explicit option)
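+* `comments: :strict` (from global explicit option; this applies only if the per-test `:rendered` profile, which has higher priority, does not set `comments` itself)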
+* Other dimensions use `:rendered` profile or format defaults
+====
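The precedence hierarchy behaves like a chain of hash merges applied from lowest to highest priority. The sketch below reproduces the resolved options from the example above. The contents of `PROFILES` and `FORMAT_DEFAULTS` are made-up values for illustration, and `resolve_options` is a hypothetical helper rather than Canon's resolver.

```ruby
# Illustrative only: these are NOT Canon's real profile definitions.
FORMAT_DEFAULTS = {
  text_content: :strict, structural_whitespace: :strict, comments: :strict
}.freeze

PROFILES = {
  spec_friendly: { text_content: :normalize, structural_whitespace: :ignore },
  rendered:      { text_content: :normalize, structural_whitespace: :normalize }
}.freeze

# Later layers override earlier ones, so the list runs from lowest to
# highest priority.
def resolve_options(global_profile:, global_options:, profile:, options:)
  [
    FORMAT_DEFAULTS,                    # 5. format defaults (lowest)
    PROFILES.fetch(global_profile, {}), # 4. global configuration profile
    global_options,                     # 3. global configuration explicit options
    PROFILES.fetch(profile, {}),        # 2. per-comparison profile
    options                             # 1. per-comparison explicit options (highest)
  ].reduce({}) { |resolved, layer| resolved.merge(layer) }
end

p resolve_options(
  global_profile: :spec_friendly,
  global_options: { comments: :strict },
  profile: :rendered,
  options: { structural_whitespace: :ignore }
)
# => {:text_content=>:normalize, :structural_whitespace=>:ignore, :comments=>:strict}
# (Ruby 3.4+ prints the same hash in the `text_content: :normalize` style.)
```

Note that the global `comments: :strict` survives here only because the illustrative `:rendered` profile does not define `comments`; a per-comparison profile that did would override it.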
+== Benefits of 4-layer architecture
+**Separation of concerns**:: Each layer has a single responsibility
+**Composability**:: Mix and match preprocessing, algorithm, matching, and rendering options
+**Algorithm flexibility**:: Choose between speed (DOM) and intelligence (Semantic)
+**Testability**:: Each layer can be tested independently
+**Flexibility**:: Fine-grained control over comparison behavior
+**Clarity**:: Clear data flow from input to output
+**Extensibility**:: Easy to add new preprocessing, algorithms, dimensions, or rendering modes
+== See also
+* link:comparison-pipeline.adoc[Comparison Pipeline] - Complete 4-layer walkthrough
+* link:algorithms/[Algorithms] - DOM and Semantic algorithm details
+* link:../features/preprocessing/[Preprocessing options]
+* link:../features/match-options/[Match dimensions and profiles]
+* link:../features/match-options/algorithm-specific-behavior.adoc[Algorithm-Specific Behavior]
+* link:../features/diff-formatting/[Diff formatting]
+* link:../features/diff-formatting/algorithm-specific-output.adoc[Algorithm-Specific Output]
+* link:../guides/choosing-configuration.adoc[Choosing Configuration]
+* link:../interfaces/ruby-api/[Ruby API documentation]
+* link:../interfaces/cli/[Command-line interface]
+* link:../interfaces/rspec/[RSpec matchers]