RubyGems - canon - Versions diffs - 0.1.22 → 0.2.0 - Mend

canon 0.1.22 → 0.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (56)

checksums.yaml +4 -4
data/.rubocop_todo.yml +174 -25
data/docs/INDEX.adoc +4 -0
data/docs/advanced/diff-classification.adoc +3 -2
data/docs/features/configuration-profiles.adoc +288 -0
data/docs/features/diff-formatting/character-visualization.adoc +153 -454
data/docs/features/diff-formatting/display-filtering.adoc +44 -0
data/docs/features/diff-formatting/display-preprocessing.adoc +656 -0
data/docs/features/diff-formatting/index.adoc +47 -0
data/docs/features/diff-formatting/pretty-diff-mode.adoc +154 -0
data/docs/features/environment-configuration/override-system.adoc +10 -3
data/docs/features/index.adoc +9 -0
data/docs/features/match-options/index.adoc +32 -42
data/docs/features/match-options/pretty-printed-fixtures.adoc +270 -0
data/docs/guides/choosing-configuration.adoc +22 -0
data/docs/reference/environment-variables.adoc +121 -1
data/docs/reference/options-across-interfaces.adoc +182 -2
data/lib/canon/cli.rb +20 -0
data/lib/canon/commands/diff_command.rb +7 -2
data/lib/canon/commands/format_command.rb +1 -1
data/lib/canon/comparison/html_comparator.rb +20 -15
data/lib/canon/comparison/html_compare_profile.rb +4 -4
data/lib/canon/comparison/markup_comparator.rb +12 -3
data/lib/canon/comparison/match_options/base_resolver.rb +29 -7
data/lib/canon/comparison/match_options/json_resolver.rb +9 -0
data/lib/canon/comparison/match_options/xml_resolver.rb +16 -2
data/lib/canon/comparison/match_options/yaml_resolver.rb +10 -0
data/lib/canon/comparison/match_options.rb +4 -1
data/lib/canon/comparison/whitespace_sensitivity.rb +189 -137
data/lib/canon/comparison/xml_comparator/child_comparison.rb +21 -4
data/lib/canon/comparison/xml_comparator.rb +14 -12
data/lib/canon/comparison/xml_node_comparison.rb +51 -6
data/lib/canon/comparison.rb +52 -9
data/lib/canon/config/env_schema.rb +32 -4
data/lib/canon/config/override_resolver.rb +16 -3
data/lib/canon/config/profile_loader.rb +135 -0
data/lib/canon/config/profiles/metanorma.yml +74 -0
data/lib/canon/config/profiles/metanorma_debug.yml +8 -0
data/lib/canon/config/type_converter.rb +8 -0
data/lib/canon/config.rb +469 -5
data/lib/canon/diff/diff_classifier.rb +41 -11
data/lib/canon/diff_formatter/diff_detail_formatter/dimension_formatter.rb +48 -17
data/lib/canon/diff_formatter/diff_detail_formatter/node_utils.rb +58 -0
data/lib/canon/diff_formatter/diff_detail_formatter.rb +22 -7
data/lib/canon/diff_formatter/theme.rb +24 -17
data/lib/canon/diff_formatter.rb +493 -36
data/lib/canon/pretty_printer/xml_normalized.rb +395 -0
data/lib/canon/rspec_matchers.rb +36 -0
data/lib/canon/tree_diff/matchers/hash_matcher.rb +26 -11
data/lib/canon/version.rb +1 -1
data/lib/canon/xml/nodes/namespace_node.rb +4 -0
data/lib/canon/xml/nodes/processing_instruction_node.rb +4 -0
data/lib/canon/xml/nodes/root_node.rb +4 -0
data/lib/canon/xml/nodes/text_node.rb +4 -0
data/lib/tasks/performance_helpers.rb +2 -2
metadata +24 -2

data/lib/canon/pretty_printer/xml_normalized.rb ADDED

@@ -0,0 +1,395 @@
+# frozen_string_literal: true
+require "nokogiri"
+module Canon
+  module PrettyPrinter
+    # Mixed-content-aware XML serializer for diff display preprocessing.
+    #
+    # == The mixed-content problem
+    #
+    # Standard XML pretty-printers (including Nokogiri's built-in serializer)
+    # keep elements that contain both text and child elements on a single line.
+    # They have no choice: inserting a newline between, say, `<p>See ` and
+    # `<xref.../>` would create a new whitespace text node, changing the
+    # document's semantic content.  The result for line-by-line diffs is that
+    # any change inside such an element forces the entire line — potentially
+    # hundreds or thousands of characters — to be marked as changed.  Issue #53
+    # documented this as "1000-character long lines" from HTML diffs.
+    #
+    # == Three-way whitespace classification
+    #
+    # This serializer distinguishes three categories of element-level whitespace
+    # behaviour, configured via element-name lists:
+    #
+    # * **Preserve** (`preserve_whitespace_elements`) — every whitespace character is
+    #   significant. `" "` ≠ `"\n"`. Typical: `<pre>`, `<code>`, `<textarea>`.
+    #   Whitespace-only text nodes are visualized character-by-character.
+    #
+    # * **Collapse** (`collapse_whitespace_elements`) — presence ≠ absence,
+    #   but all whitespace forms are equivalent: `" "` == `"\n  "` == `"\t"`.
+    #   Typical: `<p>`, `<li>`, `<td>`, heading elements.
+    #   Whitespace-only text nodes are collapsed to a single `░` visualization,
+    #   so `<p>\n  <em>` (indented fixture) and `<p> <em>` (compact source)
+    #   both render as `<p>░<em>` — identical display lines, no spurious diff.
+    #
+    # * **Strip** (everything else, or explicit `strip_whitespace_elements`) —
+    #   all whitespace between child elements is structural formatting noise.
+    #   `" "` == `"\n  "` == nothing. Whitespace-only text nodes are silently
+    #   dropped. Typical: `<section>`, `<ul>`, `<formattedref>`, `<bibitem>`.
+    #
+    # Classification is **ancestor-based**: a text node's class is determined
+    # by the closest matching ancestor. This means `<em>` inside `<p>` inherits
+    # `<p>`'s normalize behaviour without needing to be listed explicitly.
+    #
+    # == Format defaults
+    #
+    # * **XML**: all three lists are empty by default — insensitive everywhere.
+    #   Whitespace sensitivity is opt-in, consistent with XML's data-first usage.
+    #
+    # * **HTML**: built-in defaults are provided (but overridable):
+    #   - preserve: `pre`, `code`, `textarea`, `script`, `style`
+    #   - collapse: `p`, `li`, `dt`, `dd`, `td`, `th`, `h1`–`h6`, `caption`,
+    #     `figcaption`, `label`, `legend`, `summary`, `blockquote`, `address`
+    #
+    # == Structural vs. content whitespace
+    #
+    # * **Structural whitespace** — indentation characters emitted by the
+    #   serializer itself.  These do not exist in the source document.
+    #   They are rendered as ordinary ASCII space and newline characters.
+    # * **Content whitespace** — whitespace that exists as text-node content
+    #   in the source document.  Classification (above) decides how to render it.
+    #
+    # The invariant is: every XML element always starts on its own line.
+    # Content whitespace is never confused with structural indentation.
+    #
+    # == Example (normalize element <p>)
+    #
+    # Input — compact source (Metanorma-style):
+    #   <p>See <xref target="M"/></p>
+    #
+    # Input — indented fixture heredoc:
+    #   <p>
+    #     See
+    #     <xref target="M"/>
+    #   </p>
+    #
+    # Both serialize to:
+    #   <p>
+    #     See░
+    #     <xref target="M"/>
+    #   </p>
+    #
+    # Result: zero diff lines for a semantically identical document.
+    #
+    # == Example (insensitive element <formattedref>)
+    #
+    # Input — compact source:
+    #   <formattedref><em>Cereals</em>.</formattedref>
+    #
+    # Input — indented fixture:
+    #   <formattedref>
+    #     <em>Cereals</em>.
+    #   </formattedref>
+    #
+    # Both serialize to (whitespace-only nodes silently dropped):
+    #   <formattedref>
+    #     <em>Cereals</em>
+    #     .
+    #   </formattedref>
+    #
+    # Result: zero diff lines.
+    #
+    # == Usage
+    #
+    #   printer = Canon::PrettyPrinter::XmlNormalized.new
+    #   formatted = printer.format(xml_string)
+    #
+    #   # With element lists (XML):
+    #   printer = Canon::PrettyPrinter::XmlNormalized.new(
+    #     collapse_whitespace_elements: %w[p formattedref title],
+    #     preserve_whitespace_elements: %w[sourcecode pre],
+    #   )
+    #
+    class XmlNormalized
+      # @param indent [Integer] number of indent characters per level (default 2)
+      # @param indent_type [String] "space" or "tab"
+      # @param visualization_map [Hash, nil] character visualization map
+      # @param preserve_whitespace_elements [Array<String>] element names where
+      #   every whitespace character is significant (e.g. pre, code).
+      # @param collapse_whitespace_elements [Array<String>] element names where
+      #   presence of whitespace matters but all forms are equivalent (e.g. p, li).
+      # @param strip_whitespace_elements [Array<String>] explicit blacklist — these
+      #   elements and their children always have whitespace dropped, even if an
+      #   ancestor would otherwise be preserve or collapse.
+      # @param pretty_printed [Boolean] when true, whitespace-only text nodes
+      #   that begin with "\n" inside +:collapse+ elements are treated as
+      #   structural indentation and silently dropped.  This matches the
+      #   comparison-side behaviour activated by +pretty_printed_expected+ /
+      #   +pretty_printed_received+ match options.  Nodes under +:preserve+ elements
+      #   are always preserved; nodes under +:strip+ elements are already dropped.
+      def initialize(indent: 2, indent_type: "space", visualization_map: nil,
+                     preserve_whitespace_elements: [],
+                     collapse_whitespace_elements: [],
+                     strip_whitespace_elements: [],
+                     pretty_printed: false,
+                     sort_attributes: false)
+        @indent = indent.to_i
+        @indent_char = indent_type == "tab" ? "\t" : " "
+        @vis_map = visualization_map || default_vis_map
+        @pretty_printed = pretty_printed
+        @sort_attributes = sort_attributes
+        @strict_ws  = Set.new((preserve_whitespace_elements || []).map(&:to_s))
+        @norm_ws    = Set.new((collapse_whitespace_elements || []).map(&:to_s))
+        @insens_ws  = Set.new((strip_whitespace_elements || []).map(&:to_s))
+      end
+      # Format an XML string with mixed-content-aware serialization.
+      #
+      # @param xml_string [String] Input XML
+      # @return [String] Serialized XML, one node per line, with content
+      #   whitespace visualized at line boundaries
+      def format(xml_string)
+        doc = Nokogiri::XML(xml_string)
+        lines = []
+        if doc.version
+          enc = doc.encoding ? " encoding=\"#{doc.encoding}\"" : ""
+          lines << "<?xml version=\"#{doc.version}\"#{enc}?>"
+        end
+        lines << serialize_element(doc.root, 0) if doc.root
+        lines.join("\n")
+      end
+      private
+      # Return indent string for depth.
+      def ind(depth)
+        @indent_char * (@indent * depth)
+      end
+      # Classify the whitespace behaviour for a given Nokogiri element node.
+      #
+      # Walks up the ancestor chain from the element itself.  The first
+      # matching ancestor determines the class.  Insensitive blacklist wins
+      # over any sensitive ancestor.
+      #
+      # @param element [Nokogiri::XML::Element] The element to classify
+      # @return [Symbol] :strict, :normalize, or :drop
+      def classify_whitespace(element)
+        current = element
+        while current && !current.is_a?(Nokogiri::XML::Document)
+          name = current.name.to_s
+          return :drop      if @insens_ws.include?(name)
+          return :strict    if @strict_ws.include?(name)
+          return :normalize if @norm_ws.include?(name)
+          current = current.parent
+        end
+        # No matching ancestor — default: drop (insensitive)
+        :drop
+      end
+      # Serialize a single element node.
+      def serialize_element(node, depth)
+        # Filter out empty text nodes (zero-length, not whitespace-only).
+        children = node.children.reject { |c| c.text? && c.content.empty? }
+        if children.empty?
+          return "#{ind(depth)}#{open_tag(node,
+                                          self_close: true)}"
+        end
+        elem_children = children.select(&:element?)
+        text_with_content = children.select do |c|
+          c.text? && !c.content.strip.empty?
+        end
+        if elem_children.empty?
+          # Pure-text element — keep on one line.
+          return "#{ind(depth)}#{open_tag(node)}#{node.text}</#{node.name}>"
+        end
+        if text_with_content.empty?
+          # Element-only children (may have whitespace-only text nodes between them).
+          # Apply classification to decide whether to drop or visualize them.
+          ws_class = classify_whitespace(node)
+          lines = ["#{ind(depth)}#{open_tag(node)}"]
+          children.each do |child|
+            if child.text?
+              # Whitespace-only text node between element children
+              vis = render_whitespace_only(child.content, ws_class)
+              next if vis.nil? # :drop
+              # Append to previous line (do not create a new line)
+              lines[-1] = lines[-1] + vis
+            else
+              lines << serialize_element(child, depth + 1)
+            end
+          end
+          lines << "#{ind(depth)}</#{node.name}>"
+          return lines.join("\n")
+        end
+        # Mixed content: both text-with-content and element children.
+        serialize_mixed(node, children, depth)
+      end
+      # Serialize a mixed-content element.
+      #
+      # Each child is processed in document order.  Text nodes are split into:
+      # * leading whitespace  → rendered according to whitespace classification
+      # * non-whitespace content → put on its OWN indented line
+      # * trailing whitespace → rendered according to classification, appended
+      #
+      # Element children flush the current accumulated line, then are
+      # serialized recursively.
+      def serialize_mixed(node, children, depth)
+        child_depth = depth + 1
+        lines = []
+        current_line = "#{ind(depth)}#{open_tag(node)}"
+        ws_class = classify_whitespace(node)
+        children.each do |child|
+          if child.text?
+            process_text_node(child.content, child_depth, lines, current_line,
+                              ws_class) do |nl|
+              current_line = nl
+            end
+          else
+            lines << current_line
+            current_line = serialize_element(child, child_depth)
+          end
+        end
+        lines << current_line
+        lines << "#{ind(depth)}</#{node.name}>"
+        lines.join("\n")
+      end
+      # Render a whitespace-only string according to classification.
+      #
+      # When +@pretty_printed+ is true and +ws_class+ is +:normalize+:
+      # * Content starting with "\n" (e.g. "\n  " indentation) is treated as
+      #   structural pretty-print formatting and **dropped** (returns nil).
+      # * All other whitespace (e.g. " " inline space) is still rendered as the
+      #   usual single-space visualization.
+      # This aligns display output with the comparison-side behaviour controlled
+      # by +pretty_printed_expected+ / +pretty_printed_received+.
+      #
+      # @param content [String] Whitespace-only string
+      # @param ws_class [Symbol] :strict, :normalize, or :drop
+      # @return [String, nil] Rendered string, or nil to indicate "drop"
+      def render_whitespace_only(content, ws_class)
+        case ws_class
+        when :strict
+          visualize(content)
+        when :normalize
+          # In pretty_printed mode, \n-leading whitespace is structural — drop it
+          return nil if @pretty_printed && content.start_with?("\n")
+          # Any other whitespace → single space visualization
+          content.empty? ? nil : @vis_map.fetch(" ", "░")
+          # :drop — fall through to nil
+        end
+      end
+      # Process a text node in mixed-content context.
+      #
+      # Yields the new current_line (string the caller should adopt).
+      #
+      # === Pure-whitespace text nodes
+      #
+      # Whitespace-only text nodes are rendered via +render_whitespace_only+
+      # according to the element's whitespace classification:
+      # - :strict   → visualize every character (e.g. ↵░░░)
+      # - :normalize → single ░ regardless of whitespace form
+      # - :drop     → silently discarded
+      #
+      # === Text nodes with printable content
+      #
+      # Leading and trailing whitespace are split off and rendered according
+      # to the whitespace classification at line boundaries.  The printable
+      # content occupies its own indented line.
+      def process_text_node(content, child_depth, lines, current_line, ws_class)
+        stripped = content.strip
+        if stripped.empty?
+          # Pure whitespace between elements
+          vis = render_whitespace_only(content, ws_class)
+          if vis.nil?
+            yield current_line # :drop — no change
+          else
+            yield current_line + vis
+          end
+          return
+        end
+        leading  = content[/\A\s*/]
+        trailing = content[/\s*\z/]
+        middle   = stripped
+        # Leading whitespace: append to current line (then flush), or drop
+        unless leading.empty?
+          vis = render_whitespace_only(leading, ws_class)
+          current_line += vis unless vis.nil?
+        end
+        lines << current_line
+        # Trailing whitespace visualization
+        trailing_vis = if trailing.empty?
+                         ""
+                       else
+                         v = render_whitespace_only(trailing, ws_class)
+                         v.nil? ? "" : v
+                       end
+        yield "#{ind(child_depth)}#{middle}#{trailing_vis}"
+      end
+      # Build an opening XML tag with namespace declarations and attributes.
+      def open_tag(node, self_close: false)
+        ns_decls = node.namespace_definitions.map do |ns|
+          ns.prefix ? " xmlns:#{ns.prefix}=\"#{ns.href}\"" : " xmlns=\"#{ns.href}\""
+        end.join
+        attr_nodes = node.attribute_nodes
+        if @sort_attributes
+          attr_nodes = attr_nodes.sort_by do |a|
+            [a.namespace&.href.to_s, a.name]
+          end
+        end
+        attrs = attr_nodes.map do |a|
+          prefix = a.namespace&.prefix ? "#{a.namespace.prefix}:" : ""
+          " #{prefix}#{a.name}=\"#{escape_attr(a.value)}\""
+        end.join
+        close = self_close ? "/>" : ">"
+        "<#{node.name}#{ns_decls}#{attrs}#{close}"
+      end
+      # Escape characters that are special inside attribute values.
+      def escape_attr(value)
+        value.gsub("&", "&amp;").gsub('"', "&quot;").gsub("<", "&lt;")
+      end
+      # Visualize a whitespace string using the character map.
+      # Non-whitespace characters are passed through unchanged (safety net).
+      def visualize(str)
+        return "" if str.nil? || str.empty?
+        str.chars.map { |c| @vis_map.fetch(c, c) }.join
+      end
+      # Load the default visualization map from DiffFormatter constants.
+      def default_vis_map
+        require_relative "../diff_formatter"
+        Canon::DiffFormatter::DEFAULT_VISUALIZATION_MAP
+      rescue LoadError, NameError
+        { " " => "░", "\t" => "⇥", "\n" => "↵", "\r" => "⏎", "\u00A0" => "␣" }
+      end
+    end
+  end
+end
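
The ancestor-based classification and the whitespace rendering rules described in the comments above can be tried in isolation. The following is a minimal sketch, not the gem's code: `SketchNode`, `classify_sketch`, `render_ws_sketch`, `SKETCH_VIS_MAP` and `LISTS` are hypothetical names introduced here, a plain `Struct` stands in for a Nokogiri element, and the element lists are assumed for illustration.

```ruby
require "set"

# Hypothetical stand-in for a parsed element: only a name and a parent link.
SketchNode = Struct.new(:name, :parent)

# Assumed visualization characters, copied from the fallback map in the diff.
SKETCH_VIS_MAP = { " " => "░", "\t" => "⇥", "\n" => "↵" }.freeze

# Walk from the node up through its ancestors; the closest listed name
# decides the class. At the same level, the strip list is checked first.
def classify_sketch(node, preserve:, collapse:, strip:)
  current = node
  while current
    return :drop      if strip.include?(current.name)
    return :strict    if preserve.include?(current.name)
    return :normalize if collapse.include?(current.name)
    current = current.parent
  end
  :drop # nothing matched: whitespace is treated as formatting noise
end

# Render a whitespace-only string for a class; nil means "drop it".
def render_ws_sketch(content, ws_class, pretty_printed: false)
  case ws_class
  when :strict
    # Every whitespace character is shown individually.
    content.each_char.map { |c| SKETCH_VIS_MAP.fetch(c, c) }.join
  when :normalize
    return nil if content.empty?
    # In pretty-printed mode, newline-led whitespace counts as indentation.
    return nil if pretty_printed && content.start_with?("\n")
    SKETCH_VIS_MAP.fetch(" ")
  end
  # Any other class (:drop) falls out of the case and yields nil.
end

# Assumed element lists, for illustration only.
LISTS = { preserve: Set["pre"], collapse: Set["p"], strip: Set["ul"] }.freeze

section = SketchNode.new("section", nil)
para    = SketchNode.new("p", section)
em      = SketchNode.new("em", para)   # inherits the class of <p>
list    = SketchNode.new("ul", para)   # strip overrides the <p> ancestor
pre     = SketchNode.new("pre", list)  # the closest match wins again

[section, para, em, list, pre].each do |node|
  ws_class = classify_sketch(node, **LISTS)
  puts "#{node.name}: #{ws_class} -> #{render_ws_sketch("\n  ", ws_class).inspect}"
end
```

Because the walk stops at the first listed name, a strip element nested inside a collapse element drops its whitespace, while a preserve element nested inside that strip element becomes strict again.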

data/lib/canon/rspec_matchers.rb CHANGED

@@ -219,6 +219,20 @@ module Canon
             context_lines: diff_config.context_lines,
             diff_grouping_lines: diff_config.grouping_lines,
             show_diffs: diff_config.show_diffs,
+            show_raw_inputs: diff_config.show_raw_inputs,
+            show_raw_expected: diff_config.show_raw_expected,
+            show_raw_received: diff_config.show_raw_received,
+            show_preprocessed_inputs: diff_config.show_preprocessed_inputs,
+            show_preprocessed_expected: diff_config.show_preprocessed_expected,
+            show_preprocessed_received: diff_config.show_preprocessed_received,
+            show_prettyprint_inputs: diff_config.show_prettyprint_inputs,
+            show_prettyprint_expected: diff_config.show_prettyprint_expected,
+            show_prettyprint_received: diff_config.show_prettyprint_received,
+            show_line_numbered_inputs: diff_config.show_line_numbered_inputs,
+            character_visualization: diff_config.character_visualization,
+            display_preprocessing: diff_config.display_preprocessing,
+            pretty_printer_indent: diff_config.pretty_printer.indent,
+            pretty_printer_indent_type: diff_config.pretty_printer.indent_type,
           )
           return formatter.format([], :string, doc1: @expected.to_s,
@@ -237,6 +251,28 @@ module Canon
           diff_grouping_lines: diff_config.grouping_lines,
           show_diffs: diff_config.show_diffs,
           verbose_diff: diff_config.verbose_diff,
+          show_raw_inputs: diff_config.show_raw_inputs,
+          show_raw_expected: diff_config.show_raw_expected,
+          show_raw_received: diff_config.show_raw_received,
+          show_preprocessed_inputs: diff_config.show_preprocessed_inputs,
+          show_preprocessed_expected: diff_config.show_preprocessed_expected,
+          show_preprocessed_received: diff_config.show_preprocessed_received,
+          show_prettyprint_inputs: diff_config.show_prettyprint_inputs,
+          show_prettyprint_expected: diff_config.show_prettyprint_expected,
+          show_prettyprint_received: diff_config.show_prettyprint_received,
+          show_line_numbered_inputs: diff_config.show_line_numbered_inputs,
+          character_visualization: diff_config.character_visualization,
+          display_preprocessing: diff_config.display_preprocessing,
+          pretty_printer_indent: diff_config.pretty_printer.indent,
+          pretty_printer_indent_type: diff_config.pretty_printer.indent_type,
+          preserve_whitespace_elements: diff_config.preserve_whitespace_elements,
+          collapse_whitespace_elements: diff_config.collapse_whitespace_elements,
+          strip_whitespace_elements: diff_config.strip_whitespace_elements,
+          pretty_printed_expected: diff_config.pretty_printed_expected,
+          pretty_printed_received: diff_config.pretty_printed_received,
+          pretty_printer_sort_attributes: diff_config.pretty_printer_sort_attributes,
+          compact_semantic_report: diff_config.compact_semantic_report,
+          expand_difference: diff_config.expand_difference,
         )
         # Format the diff using the comparison result

data/lib/canon/tree_diff/matchers/hash_matcher.rb CHANGED

@@ -93,19 +93,34 @@ module Canon
           end
           return if candidates.empty?
-          best_match = find_best_match(node2, candidates)
-          return unless best_match
-          if @matching.add(best_match, node2)
-            @matched_tree1 << best_match
-            @matched_tree2 << node2
-            propagate_to_ancestors(best_match, node2)
+          # When multiple candidates have identical signatures (common with
+          # duplicate subtrees like MathML formulas), sort by sibling position
+          # proximity to prefer matching nodes at the same position within
+          # their parent. This reduces cross-matching that causes cascading
+          # prefix closure failures.
+          if candidates.size > 1
+            pos2 = node2.position || 0
+            candidates = candidates.sort_by do |c|
+              pos1 = c.position || 0
+              (pos1 - pos2).abs
+            end
           end
-        end
-        # @return [TreeNode, nil]
-        def find_best_match(node2, candidates)
-          candidates.find { |node1| subtrees_match?(node1, node2) }
+          # Try each candidate until one passes both subtree matching
+          # AND the prefix closure constraint in matching.add.
+          # When multiple candidates have identical subtrees (e.g., labels
+          # with the same text child), the first may fail prefix closure
+          # due to ancestor cross-matching, but a later candidate succeeds.
+          candidates.each do |candidate|
+            next unless subtrees_match?(candidate, node2)
+            if @matching.add(candidate, node2)
+              @matched_tree1 << candidate
+              @matched_tree2 << node2
+              propagate_to_ancestors(candidate, node2)
+              return
+            end
+          end
         end
         def subtrees_match?(node1, node2)
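
The new selection logic in this hunk (order candidates by sibling-position distance, then take the first one that passes both checks) can be sketched on its own. This is an illustration under stated assumptions, not the gem's implementation: `SketchCandidate` and `pick_candidate_sketch` are hypothetical names, and the `matches` and `accept` callables stand in for `subtrees_match?` and `@matching.add` respectively.

```ruby
# Hypothetical candidate record: an identifier and its sibling position.
SketchCandidate = Struct.new(:id, :position)

# Order candidates by how far their sibling position is from the target's,
# then return the first one that both matches and is accepted.
# Returns nil when no candidate passes both checks.
def pick_candidate_sketch(target_position, candidates, matches:, accept:)
  target = target_position || 0
  ordered = candidates.sort_by { |c| ((c.position || 0) - target).abs }
  ordered.find { |c| matches.call(c) && accept.call(c) }
end

candidates = [
  SketchCandidate.new("a", 0),
  SketchCandidate.new("b", 3),
  SketchCandidate.new("c", 5),
]

always = ->(_candidate) { true }
# Simulate the closest candidate being rejected by a later constraint.
reject_b = ->(candidate) { candidate.id != "b" }

closest  = pick_candidate_sketch(3, candidates, matches: always, accept: always)
fallback = pick_candidate_sketch(3, candidates, matches: always, accept: reject_b)
puts "closest: #{closest.id}, fallback after rejection: #{fallback.id}"
```

Note that Ruby's `sort_by` is not guaranteed to be stable, so candidates at equal distance may come back in either order; the sketch avoids ties for that reason.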

data/lib/canon/version.rb CHANGED

@@ -1,5 +1,5 @@
 # frozen_string_literal: true
 module Canon
-  VERSION = "0.1.22"
+  VERSION = "0.2.0"
 end

data/lib/canon/xml/nodes/namespace_node.rb CHANGED

@@ -15,6 +15,10 @@ module Canon
           @uri = uri
         end
+        def name
+          prefix.to_s
+        end
         def node_type
           :namespace
         end

data/lib/canon/xml/nodes/processing_instruction_node.rb CHANGED

@@ -15,6 +15,10 @@ module Canon
           @data = data
         end
+        def name
+          target
+        end
         def node_type
           :processing_instruction
         end

data/lib/canon/xml/nodes/root_node.rb CHANGED

@@ -7,6 +7,10 @@ module Canon
     module Nodes
       # Root node representing the document root
       class RootNode < Node
+        def name
+          "#document"
+        end
         def node_type
           :root
         end

data/lib/canon/xml/nodes/text_node.rb CHANGED

@@ -22,6 +22,10 @@ module Canon
           @original = original || value
         end
+        def name
+          "#text"
+        end
         def node_type
           :text
         end

data/lib/tasks/performance_helpers.rb CHANGED

@@ -52,7 +52,7 @@ module PerformanceHelpers
   class << self
     def load_into_namespace(module_obj, file_path)
-      content = File.read(file_path)
+      content = File.read(file_path, encoding: "utf-8")
       module_obj.module_eval(content, file_path)
     end
@@ -85,7 +85,7 @@ module PerformanceHelpers
         bench_copy_dir = File.join(clone_dir, "tmp", "performance")
         FileUtils.mkdir_p(bench_copy_dir)
         bench_copy = File.join(bench_copy_dir, "benchmark_runner.rb")
-        File.write(bench_copy, File.read(script))
+        File.write(bench_copy, File.read(script, encoding: "utf-8"))
         load_into_namespace(Base, bench_copy)
       end
     end
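
Both changes in this file add an explicit `encoding: "utf-8"` to `File.read`. The diff does not state the motivation; one plausible reason is that, without it, the string's encoding follows `Encoding.default_external`, which depends on the environment. The sketch below illustrates that difference. `read_encodings_sketch` is a hypothetical helper, and it assumes `Encoding.default_internal` is left unset.

```ruby
require "tempfile"

# Read a file twice while the process-wide default external encoding is
# forced to US-ASCII (roughly what happens under LANG=C): once relying on
# the default, once with the explicit encoding the diff adds.
def read_encodings_sketch(path)
  original = Encoding.default_external
  previous_verbose = $VERBOSE
  $VERBOSE = nil # silence the warning Ruby may print when the default changes
  Encoding.default_external = Encoding::US_ASCII
  {
    implicit: File.read(path).encoding,
    explicit: File.read(path, encoding: "utf-8").encoding,
  }
ensure
  # Always restore global state, even if a read raises.
  Encoding.default_external = original
  $VERBOSE = previous_verbose
end

Tempfile.create(["sketch", ".rb"]) do |file|
  file.write("# caf\u00e9 \u2591\n") # non-ASCII source content
  file.flush
  result = read_encodings_sketch(file.path)
  puts "implicit: #{result[:implicit]}, explicit: #{result[:explicit]}"
end
```

A string tagged with the wrong encoding can raise errors later, for example when it is evaluated or matched against a regular expression, so pinning the encoding at the read site keeps the behaviour independent of locale settings.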

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: canon
 version: !ruby/object:Gem::Version
-  version: 0.1.22
+  version: 0.2.0
 platform: ruby
 authors:
 - Ribose Inc.
 autorequire:
 bindir: exe
 cert_chain: []
-date: 2026-03-31 00:00:00.000000000 Z
+date: 2026-04-12 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: diff-lcs
@@ -80,6 +80,20 @@ dependencies:
     - - ">="
       - !ruby/object:Gem::Version
         version: '0'
+- !ruby/object:Gem::Dependency
+  name: rainbow
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
 - !ruby/object:Gem::Dependency
   name: table_tennis
   requirement: !ruby/object:Gem::Requirement
@@ -149,12 +163,15 @@ files:
 - docs/advanced/index.adoc
 - docs/advanced/semantic-diff-report.adoc
 - docs/advanced/verbose-mode-architecture.adoc
+- docs/features/configuration-profiles.adoc
 - docs/features/diff-formatting/algorithm-specific-output.adoc
 - docs/features/diff-formatting/character-visualization.adoc
 - docs/features/diff-formatting/colors-and-symbols.adoc
 - docs/features/diff-formatting/context-and-grouping.adoc
 - docs/features/diff-formatting/display-filtering.adoc
+- docs/features/diff-formatting/display-preprocessing.adoc
 - docs/features/diff-formatting/index.adoc
+- docs/features/diff-formatting/pretty-diff-mode.adoc
 - docs/features/diff-formatting/themes.adoc
 - docs/features/environment-configuration/index.adoc
 - docs/features/environment-configuration/override-system.adoc
@@ -164,6 +181,7 @@ files:
 - docs/features/match-options/algorithm-specific-behavior.adoc
 - docs/features/match-options/html-policies.adoc
 - docs/features/match-options/index.adoc
+- docs/features/match-options/pretty-printed-fixtures.adoc
 - docs/features/performance.adoc
 - docs/getting-started/index.adoc
 - docs/getting-started/quick-start.adoc
@@ -247,6 +265,9 @@ files:
 - lib/canon/config/env_provider.rb
 - lib/canon/config/env_schema.rb
 - lib/canon/config/override_resolver.rb
+- lib/canon/config/profile_loader.rb
+- lib/canon/config/profiles/metanorma.yml
+- lib/canon/config/profiles/metanorma_debug.yml
 - lib/canon/config/type_converter.rb
 - lib/canon/data_model.rb
 - lib/canon/diff/diff_block.rb
@@ -304,6 +325,7 @@ files:
 - lib/canon/pretty_printer/html.rb
 - lib/canon/pretty_printer/json.rb
 - lib/canon/pretty_printer/xml.rb
+- lib/canon/pretty_printer/xml_normalized.rb
 - lib/canon/rspec_matchers.rb
 - lib/canon/tree_diff.rb
 - lib/canon/tree_diff/adapters/html_adapter.rb