RubyGems - canon - Versions diffs - 0.1.8 → 0.1.10 - Mend

canon 0.1.8 → 0.1.10

Files changed (101)

checksums.yaml +4 -4
data/.rubocop_todo.yml +83 -22
data/docs/Gemfile +1 -0
data/docs/_config.yml +90 -1
data/docs/advanced/diff-classification.adoc +196 -24
data/docs/features/match-options/index.adoc +239 -1
data/lib/canon/comparison/format_detector.rb +2 -1
data/lib/canon/comparison/html_comparator.rb +19 -8
data/lib/canon/comparison/html_compare_profile.rb +8 -2
data/lib/canon/comparison/markup_comparator.rb +109 -2
data/lib/canon/comparison/match_options/base_resolver.rb +7 -0
data/lib/canon/comparison/whitespace_sensitivity.rb +208 -0
data/lib/canon/comparison/xml_comparator/child_comparison.rb +15 -7
data/lib/canon/comparison/xml_comparator/diff_node_builder.rb +108 -0
data/lib/canon/comparison/xml_comparator/node_parser.rb +10 -5
data/lib/canon/comparison/xml_comparator/node_type_comparator.rb +14 -7
data/lib/canon/comparison/xml_comparator.rb +240 -23
data/lib/canon/comparison/xml_node_comparison.rb +25 -3
data/lib/canon/diff/diff_classifier.rb +119 -5
data/lib/canon/diff/formatting_detector.rb +1 -1
data/lib/canon/diff/xml_serialization_formatter.rb +153 -0
data/lib/canon/rspec_matchers.rb +37 -8
data/lib/canon/version.rb +1 -1
data/lib/canon/xml/data_model.rb +24 -13
metadata +4 -78
data/docs/plans/2025-01-17-html-parser-selection-fix.adoc +0 -250
data/false_positive_analysis.txt +0 -0
data/file1.html +0 -1
data/file2.html +0 -1
data/old-docs/ADVANCED_TOPICS.adoc +0 -20
data/old-docs/BASIC_USAGE.adoc +0 -16
data/old-docs/CHARACTER_VISUALIZATION.adoc +0 -567
data/old-docs/CLI.adoc +0 -497
data/old-docs/CUSTOMIZING_BEHAVIOR.adoc +0 -19
data/old-docs/DIFF_ARCHITECTURE.adoc +0 -435
data/old-docs/DIFF_FORMATTING.adoc +0 -540
data/old-docs/DIFF_PARAMETERS.adoc +0 -261
data/old-docs/DOM_DIFF.adoc +0 -1017
data/old-docs/ENV_CONFIG.adoc +0 -876
data/old-docs/FORMATS.adoc +0 -867
data/old-docs/INPUT_VALIDATION.adoc +0 -477
data/old-docs/MATCHER_BEHAVIOR.adoc +0 -90
data/old-docs/MATCH_ARCHITECTURE.adoc +0 -463
data/old-docs/MATCH_OPTIONS.adoc +0 -912
data/old-docs/MODES.adoc +0 -432
data/old-docs/NORMATIVE_INFORMATIVE_DIFFS.adoc +0 -219
data/old-docs/OPTIONS.adoc +0 -1387
data/old-docs/PREPROCESSING.adoc +0 -491
data/old-docs/README.old.adoc +0 -2831
data/old-docs/RSPEC.adoc +0 -814
data/old-docs/RUBY_API.adoc +0 -485
data/old-docs/SEMANTIC_DIFF_REPORT.adoc +0 -646
data/old-docs/SEMANTIC_TREE_DIFF.adoc +0 -765
data/old-docs/STRING_COMPARE.adoc +0 -345
data/old-docs/TMP.adoc +0 -3384
data/old-docs/TREE_DIFF.adoc +0 -1080
data/old-docs/UNDERSTANDING_CANON.adoc +0 -17
data/old-docs/VERBOSE.adoc +0 -482
data/old-docs/VISUALIZATION_MAP.adoc +0 -625
data/old-docs/WHITESPACE_TREATMENT.adoc +0 -1155
data/scripts/analyze_current_state.rb +0 -85
data/scripts/analyze_false_positives.rb +0 -114
data/scripts/analyze_remaining_failures.rb +0 -105
data/scripts/compare_current_failures.rb +0 -95
data/scripts/compare_dom_tree_diff.rb +0 -158
data/scripts/compare_failures.rb +0 -151
data/scripts/debug_attribute_extraction.rb +0 -66
data/scripts/debug_blocks_839.rb +0 -115
data/scripts/debug_meta_matching.rb +0 -52
data/scripts/debug_p_matching.rb +0 -192
data/scripts/debug_signature_matching.rb +0 -118
data/scripts/debug_sourcecode_124.rb +0 -32
data/scripts/debug_whitespace_sensitive.rb +0 -192
data/scripts/extract_false_positives.rb +0 -138
data/scripts/find_actual_false_positives.rb +0 -125
data/scripts/investigate_all_false_positives.rb +0 -161
data/scripts/investigate_batch1.rb +0 -127
data/scripts/investigate_classification.rb +0 -150
data/scripts/investigate_classification_detailed.rb +0 -190
data/scripts/investigate_common_failures.rb +0 -342
data/scripts/investigate_false_negative.rb +0 -80
data/scripts/investigate_false_positive.rb +0 -83
data/scripts/investigate_false_positives.rb +0 -227
data/scripts/investigate_false_positives_batch.rb +0 -163
data/scripts/investigate_mixed_content.rb +0 -125
data/scripts/investigate_remaining_16.rb +0 -214
data/scripts/run_single_test.rb +0 -29
data/scripts/test_all_false_positives.rb +0 -95
data/scripts/test_attribute_details.rb +0 -61
data/scripts/test_both_algorithms.rb +0 -49
data/scripts/test_both_simple.rb +0 -49
data/scripts/test_enhanced_semantic_output.rb +0 -125
data/scripts/test_readme_examples.rb +0 -131
data/scripts/test_semantic_tree_diff.rb +0 -99
data/scripts/test_semantic_ux_improvements.rb +0 -135
data/scripts/test_single_false_positive.rb +0 -119
data/scripts/test_size_limits.rb +0 -99
data/test_html_1.html +0 -21
data/test_html_2.html +0 -21
data/test_nokogiri.rb +0 -33
data/test_normalize.rb +0 -45

data/old-docs/RUBY_API.adoc DELETED

@@ -1,485 +0,0 @@
----
-layout: default
-title: Ruby API
-nav_order: 10
-parent: Basic Usage
-grand_parent: Documentation Index
----
-= Canon Ruby API
-:toc:
-:toclevels: 3
-== Scope
-This document describes how to use Canon from Ruby code. It covers formatting,
-parsing, and comparison APIs.
-For command-line usage, see link:CLI[CLI documentation].
-For RSpec testing, see link:RSPEC[RSpec documentation].
-== General
-Canon provides a unified Ruby API for working with XML, HTML, JSON, and YAML
-formats. All methods follow consistent patterns across formats.
-== Formatting
-=== Canonical formatting
-The `Canon.format` method produces canonical output (compact, normalized).
-Syntax:
-[source,ruby]
-----
-Canon.format(content, format)
-Canon.format_{format}(content)  # Format-specific shorthand
-----
-Where:
-`content`:: The input string
-`format`:: The format type (`:xml`, `:html`, `:json`, or `:yaml`)
-.Canonical formatting examples
-[example]
-====
-[source,ruby]
-----
-require 'canon'
-# XML - compact canonical form
-xml = '<root><b>2</b><a>1</a></root>'
-Canon.format(xml, :xml)
-# => "<root><a>1</a><b>2</b></root>"
-Canon.format_xml(xml)  # Shorthand
-# => "<root><a>1</a><b>2</b></root>"
-# HTML - compact canonical form
-html = '<div><p>Hello</p></div>'
-Canon.format(html, :html)
-Canon.format_html(html)  # Shorthand
-# JSON - canonical with sorted keys
-json = '{"z":3,"a":1,"b":2}'
-Canon.format(json, :json)
-# => {"a":1,"b":2,"z":3}
-# YAML - canonical with sorted keys
-yaml = "z: 3\na: 1\nb: 2"
-Canon.format(yaml, :yaml)
-----
-====
-=== Pretty-print formatting
-For human-readable output with indentation, use format-specific pretty printer
-classes.
-Syntax:
-[source,ruby]
-----
-Canon::{Format}::PrettyPrinter.new(indent: n, indent_type: type).format(content)
-----
-Where:
-`{Format}`:: The format module (`Xml`, `Html`, `Json`)
-`n`:: Number of spaces (default: 2) or tabs (use 1 for tabs)
-`type`:: Indentation type: `'space'` (default) or `'tab'`
-`content`:: The input string
-.Pretty-print examples
-[example]
-====
-[source,ruby]
-----
-require 'canon/pretty_printer/xml'
-require 'canon/pretty_printer/html'
-require 'canon/pretty_printer/json'
-xml_input = '<root><b>2</b><a>1</a></root>'
-# XML with 2-space indentation (default)
-Canon::Xml::PrettyPrinter.new(indent: 2).format(xml_input)
-# =>
-# <?xml version="1.0" encoding="UTF-8"?>
-# <root>
-#   <a>1</a>
-#   <b>2</b>
-# </root>
-# XML with 4-space indentation
-Canon::Xml::PrettyPrinter.new(indent: 4).format(xml_input)
-# XML with tab indentation
-Canon::Xml::PrettyPrinter.new(
-  indent: 1,
-  indent_type: 'tab'
-).format(xml_input)
-# HTML with 2-space indentation
-html_input = '<div><p>Hello</p></div>'
-Canon::Html::PrettyPrinter.new(indent: 2).format(html_input)
-# JSON with 2-space indentation
-json_input = '{"z":3,"a":{"b":1}}'
-Canon::Json::PrettyPrinter.new(indent: 2).format(json_input)
-# JSON with tab indentation
-Canon::Json::PrettyPrinter.new(
-  indent: 1,
-  indent_type: 'tab'
-).format(json_input)
-----
-====
-== Parsing
-The `Canon.parse` method parses content into Ruby objects or Nokogiri
-documents.
-Syntax:
-[source,ruby]
-----
-Canon.parse(content, format)
-Canon.parse_{format}(content)  # Format-specific shorthand
-----
-Where:
-`content`:: The input string
-`format`:: The format type (`:xml`, `:html`, `:json`, or `:yaml`)
-.Parsing examples
-[example]
-====
-[source,ruby]
-----
-# Parse XML → Nokogiri::XML::Document
-xml_doc = Canon.parse(xml_input, :xml)
-xml_doc = Canon.parse_xml(xml_input)
-# Parse HTML → Nokogiri::HTML5::Document (or XML::Document for XHTML)
-html_doc = Canon.parse(html_input, :html)
-html_doc = Canon.parse_html(html_input)
-# Parse JSON → Ruby Hash/Array
-json_obj = Canon.parse(json_input, :json)
-json_obj = Canon.parse_json(json_input)
-# Parse YAML → Ruby Hash/Array
-yaml_obj = Canon.parse(yaml_input, :yaml)
-yaml_obj = Canon.parse_yaml(yaml_input)
-----
-====
-== Comparison
-=== General
-The `Canon::Comparison.equivalent?` method compares two documents semantically.
-The comparison uses depth-first traversal of DOM trees (XML/HTML) or object
-graphs (JSON/YAML), comparing nodes/values based on configurable match
-dimensions.
-See link:MATCH_OPTIONS[Match options] for details on match dimensions and
-profiles.
-=== Basic comparison
-Syntax:
-[source,ruby]
-----
-Canon::Comparison.equivalent?(obj1, obj2, options = {})
-----
-Where:
-`obj1`:: First document (String, Nokogiri document, or Ruby object)
-`obj2`:: Second document (String, Nokogiri document, or Ruby object)
-`options`:: Hash of comparison options (optional)
-Returns:
-* `true` if documents are semantically equivalent
-* `false` if documents differ
-* `ComparisonResult` object if `verbose: true`
-Options:
-* `diff_algorithm`: `:dom` (default) or `:semantic` - chooses the diff algorithm
-* `verbose`: `true` or `false` - returns detailed results when true
-* `match`: Hash of match dimension options
-* `match_profile`: Symbol specifying a predefined profile
-.Basic comparison examples
-[example]
-====
-[source,ruby]
-----
-require 'canon/comparison'
-# HTML comparison - ignores whitespace by default
-html1 = '<div><p>Hello</p></div>'
-html2 = '<div> <p> Hello </p> </div>'
-Canon::Comparison.equivalent?(html1, html2)
-# => true
-# XML comparison - element order doesn't matter for children
-xml1 = '<root><a>1</a><b>2</b></root>'
-xml2 = '<root>  <b>2</b>  <a>1</a>  </root>'
-Canon::Comparison.equivalent?(xml1, xml2)
-# => true
-# JSON comparison
-json1 = '{"a":1,"b":2}'
-json2 = '{"b":2,"a":1}'
-Canon::Comparison.equivalent?(json1, json2)
-# => true
-# With Nokogiri documents
-doc1 = Nokogiri::HTML5(html1)
-doc2 = Nokogiri::HTML5(html2)
-Canon::Comparison.equivalent?(doc1, doc2)
-# => true
-----
-====
-=== Comparison with match options
-Match options control which aspects of documents are compared and how strictly.
-Syntax:
-[source,ruby]
-----
-Canon::Comparison.equivalent?(obj1, obj2,
-  match: {
-    dimension1: behavior1,
-    dimension2: behavior2,
-    ...
-  }
-)
-----
-See link:MATCH_OPTIONS[Match options] for complete dimension reference.
-.Match option examples
-[example]
-====
-[source,ruby]
-----
-# Normalize whitespace in text content
-Canon::Comparison.equivalent?(xml1, xml2,
-  match: {
-    text_content: :normalize,
-    structural_whitespace: :ignore
-  }
-)
-# Ignore comments
-Canon::Comparison.equivalent?(xml1, xml2,
-  match: {
-    comments: :ignore
-  }
-)
-# Strict attribute order
-Canon::Comparison.equivalent?(xml1, xml2,
-  match: {
-    attribute_order: :strict
-  }
-)
-# Multiple dimensions
-Canon::Comparison.equivalent?(html1, html2,
-  match: {
-    text_content: :normalize,
-    structural_whitespace: :ignore,
-    attribute_order: :ignore,
-    comments: :ignore
-  }
-)
-----
-====
-=== Using match profiles
-Match profiles are predefined combinations of match dimension settings.
-Syntax:
-[source,ruby]
-----
-Canon::Comparison.equivalent?(obj1, obj2,
-  match_profile: :profile_name
-)
-----
-Available profiles:
-`:strict`:: Exact matching - all dimensions use `:strict` behavior
-`:rendered`:: Mimics browser rendering - ignores formatting differences
-`:spec_friendly`:: Test-friendly - ignores most formatting, focuses on content
-`:content_only`:: Maximum tolerance - only semantic content matters
-.Match profile examples
-[example]
-====
-[source,ruby]
-----
-# Use spec_friendly profile (common for tests)
-Canon::Comparison.equivalent?(xml1, xml2,
-  match_profile: :spec_friendly
-)
-# Use rendered profile (for HTML)
-Canon::Comparison.equivalent?(html1, html2,
-  match_profile: :rendered
-)
-# Override profile with specific dimension
-Canon::Comparison.equivalent?(xml1, xml2,
-  match_profile: :spec_friendly,
-  match: {
-    comments: :strict  # Override profile setting
-  }
-)
-----
-====
-=== Verbose mode
-When `verbose: true` is specified, the method returns detailed comparison
-results instead of a boolean.
-Syntax:
-[source,ruby]
-----
-result = Canon::Comparison.equivalent?(obj1, obj2, verbose: true)
-----
-Returns:
-A Hash with two keys:
-`:differences`:: Array of difference objects (empty if equivalent)
-`:preprocessed`:: Two-element array of preprocessed documents
-.Verbose mode examples
-[example]
-====
-[source,ruby]
-----
-# Get detailed diff information
-result = Canon::Comparison.equivalent?(xml1, xml2, verbose: true)
-if result[:differences].empty?
-  puts "Documents are equivalent"
-else
-  puts "Found #{result[:differences].size} differences"
-  result[:differences].each do |diff|
-    puts "Difference: #{diff}"
-  end
-end
-# Access preprocessed content
-preprocessed1, preprocessed2 = result[:preprocessed]
-# Verbose with custom options
-result = Canon::Comparison.equivalent?(xml1, xml2,
-  verbose: true,
-  match: {
-    text_content: :normalize,
-    comments: :ignore
-  }
-)
-----
-====
-=== Format-specific comparators
-You can use format-specific comparator classes directly.
-Syntax:
-[source,ruby]
-----
-Canon::Comparison::XmlComparator.equivalent?(obj1, obj2, options)
-Canon::Comparison::HtmlComparator.equivalent?(obj1, obj2, options)
-Canon::Comparison::JsonComparator.equivalent?(obj1, obj2, options)
-Canon::Comparison::YamlComparator.equivalent?(obj1, obj2, options)
-----
-.Format-specific comparator examples
-[example]
-====
-[source,ruby]
-----
-# XML comparison with strict attribute order
-Canon::Comparison::XmlComparator.equivalent?(xml1, xml2,
-  match: {
-    attribute_order: :strict
-  }
-)
-# HTML comparison with rendered profile
-Canon::Comparison::HtmlComparator.equivalent?(html1, html2,
-  match_profile: :rendered
-)
-# JSON comparison ignoring key order
-Canon::Comparison::JsonComparator.equivalent?(json1, json2,
-  match: {
-    key_order: :ignore
-  }
-)
-----
-====
-== Validation
-Canon validates input before processing and raises `Canon::ValidationError`
-for malformed input.
-.Validation error handling
-[example]
-====
-[source,ruby]
-----
-require 'canon'
-malformed_xml = '<root><unclosed>'
-begin
-  Canon.format(malformed_xml, :xml)
-rescue Canon::ValidationError => e
-  puts e.message
-  # => XML Validation Error: Premature end of data in tag unclosed line 1
-  #    Line: 1
-  #    Column: 18
-  puts "Format: #{e.format}"     # => :xml
-  puts "Line: #{e.line}"          # => 1
-  puts "Column: #{e.column}"      # => 18
-end
-----
-====
-See link:INPUT_VALIDATION[Input validation] for details.
-== See also
-* link:CLI[Command-line interface]
-* link:RSPEC[RSpec matchers]
-* link:MATCH_OPTIONS[Match options reference]
-* link:FORMATS[Format support details]
-* link:INPUT_VALIDATION[Input validation]
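
Note on the deleted file: RUBY_API.adoc describes canonical JSON output as compact with sorted keys, and JSON comparison as insensitive to key order. The sketch below illustrates that idea using only Ruby's standard `json` library. It is not canon's implementation; the helpers `sort_keys`, `canonical_json`, and `json_equivalent?` are hypothetical names introduced here for illustration, and the real gem may normalize more than key order.

```ruby
require 'json'

# Recursively rebuild the value so every Hash has its keys in sorted order.
# Arrays keep their element order; scalars are returned unchanged.
def sort_keys(value)
  case value
  when Hash
    value.keys.sort.each_with_object({}) { |key, out| out[key] = sort_keys(value[key]) }
  when Array
    value.map { |item| sort_keys(item) }
  else
    value
  end
end

# Hypothetical helper (not part of canon): compact JSON with sorted keys.
def canonical_json(text)
  JSON.generate(sort_keys(JSON.parse(text)))
end

# Hypothetical helper: treat two JSON strings as equivalent when their
# canonical forms are identical, so key order does not matter.
def json_equivalent?(left, right)
  canonical_json(left) == canonical_json(right)
end

puts canonical_json('{"z":3,"a":1,"b":2}')
puts json_equivalent?('{"a":1,"b":2}', '{"b":2,"a":1}')
```

Comparing canonical strings is the simplest way to get order-insensitive equality, but it only answers yes or no. A comparator that reports individual differences, as the verbose mode in the deleted document does, would need to walk both structures instead.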