RubyGems - nestedtext - Versions diffs - 4.6.0 → 5.0.2 - Mend

nestedtext 4.6.0 → 5.0.2


Files changed (9)

checksums.yaml +4 -4
data/CHANGELOG.md +133 -41
data/README.md +1 -1
data/lib/nestedtext/decode.rb +78 -3
data/lib/nestedtext/errors_internal.rb +29 -3
data/lib/nestedtext/parser.rb +8 -2
data/lib/nestedtext/scanners.rb +12 -7
data/lib/nestedtext/version.rb +1 -1
metadata +2 -2

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 313e16f9cbbf553ff5b3afaee739150c9fad09ddbbca2aa7b73c02bb7e6dc422
-  data.tar.gz: 4179c0d01df5da768be52317ac2b96f236d659e25cbe84436334f3cc46c42240
+  metadata.gz: 3779aae06776d786682bb0406184692a06a1587c78b31b4e7a808daa3fde8c65
+  data.tar.gz: b382d463a220cd64c8935767c2ce75f4aedc12184f913eb5b5109e16b850ad86
 SHA512:
-  metadata.gz: be6167b0b8f56c2e6df9e3964a09e082046a666d2aa1ebccb4a53a03e0bc209d9e4df9387ea7173fac7a50abe8ffefc07b5240d5452f5e730180b443de035678
-  data.tar.gz: 870f1f5504ce8b998c6f1c5308eea9536fd82b740e91074768509cdbd8912b6771dc3219434ae468ffa85ca0373fc7bc5a2e106c8083d4159c135100445ae32b
+  metadata.gz: a76972ff3e809e9d676d05161123051455de6b1297abd5e9cbc297500fecaa87715233254a814d6f925911305825394391aea26c8b0b1a2493a8a57526a42f5f
+  data.tar.gz: b5a5a1af978d02ed55d90bec3a89ce74e855d9766937fb2528e0053a6510116caa34dc7e1de87854c05b98450884c366ae65e928a33f2cd1aec8087986ce81c9

data/CHANGELOG.md CHANGED

@@ -1,4 +1,5 @@
 # Changelog
 All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
@@ -6,150 +7,241 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ## [Unreleased]
+## [5.0.1] - 2026-05-29
+### Fixed
+* rubocop errors
+## [5.0.0] - 2026-05-29
+### Added
+* Support for NestedText specification v3.8 (official test suite updated).
+* UTF-8 BOM is now silently stripped before parsing.
+* CR-only and CRLF line endings are normalised to LF, so files from all platforms are accepted.
+* Parser now raises a dedicated error when a multi-line key has no following indented value.
+* Parser now raises "extra content" when valid content is followed by unexpected additional lines.
+* Unicode whitespace characters (e.g. U+00A0 NO-BREAK SPACE) at the start of a line are correctly rejected as invalid indentation.
+* Unicode whitespace is now trimmed from the end of dict keys before the `:` separator, consistent with the spec.
+### Fixed
+* `key:` (dict item with a single trailing space) is now correctly treated as having no inline value, allowing an indented block value to follow.
+* `-` (list item with a single trailing space) is now correctly treated as having no inline value.
 ## [4.6.0] - 2025-10-31
 ### Changed
-- Migrated Code Climate to qlty.
+* Migrated Code Climate to qlty.
 ## [4.5.0] - 2022-06-09
 ### Added
-- Guard filesystem watch and test executor.
+* Guard filesystem watch and test executor.
 ### Changed
-- Dumped file content should end with newline, following updated version 3.3.0 of the NestedText specification.
+* Dumped file content should end with newline, following updated version 3.3.0 of the NestedText specification.
 ## [4.4.6] - 2022-02-17
 ### Fixed
-- rubydoc.info: don't force use hash value omission with rubycop. rubydoc.info is not on ruby 3.1 yet.
+* rubydoc.info: don't force use hash value omission with rubycop. rubydoc.info is not on ruby 3.1 yet.
 ## [4.4.5] - 2022-02-17
 ### Fixed
-- rubydoc.info: try remove unused module require.
+* rubydoc.info: try remove unused module require.
 ## [4.4.4] - 2022-02-17
 ### Fixed
-- rubydoc.info: revert reject instead of select
+* rubydoc.info: revert reject instead of select
 ## [4.4.3] - 2022-02-17
 ### Fixed
-- rubydoc.info: try building gem from git-ls | reject instead of select
+* rubydoc.info: try building gem from git-ls | reject instead of select
 ## [4.4.2] - 2022-02-17
 ### Fixed
-- rubydoc.info: try includ all of lib/**/*.rb
+* rubydoc.info: try includ all of lib/**/*.rb
 ## [4.4.1] - 2022-02-17
 ### Fixed
-- rubydoc.info: try fix missing class methods.
+* rubydoc.info: try fix missing class methods.
 ## [4.4.0] - 2022-02-17
 ### Fixed
-- rubydoc.info: not re-generating for patch versions?
+* rubydoc.info: not re-generating for patch versions?
 ## [4.3.1] - 2022-02-17
 ### Fixed
-- rubydoc.info: Include .yardopts in gem
+* rubydoc.info: Include .yardopts in gem
 ## [4.3.0] - 2022-02-17
 ### Fixed
-- rubydoc.info: try fix missing class methods.
+* rubydoc.info: try fix missing class methods.
 ## [4.2.2] - 2022-02-12
 ### Fixed
-- Better module documentation fix.
+* Better module documentation fix.
 ## [4.2.1] - 2022-02-12
 ### Fixed
-- Better module documentation.
+* Better module documentation.
 ## [4.2.0] - 2022-02-08
 ### Fixed
-- Proper Unicode character name lookup.
+* Proper Unicode character name lookup.
 ## [4.1.1] - 2022-01-28
 ### Fixed
-- Don't trigger CI when CD will run all tests anyways.
+* Don't trigger CI when CD will run all tests anyways.
 ## [4.1.0] - 2022-01-28
 ### Changed
-- cd.yml now runs full tests before releasing new version, by using reusable workflows.
+* cd.yml now runs full tests before releasing new version, by using reusable workflows.
 ## [4.0.0] - 2022-01-28
 ### Changed
-- **Breaking change**: Renamed `NTEncodeMixin` to `ToNTMixin`.
-- All code linted with RuboCop
+* **Breaking change**: Renamed `NTEncodeMixin` to `ToNTMixin`.
+* All code linted with RuboCop
 ## [3.2.1] - 2022-01-27
 ### Fixed
-- Fix logo at rubydoc.info
+* Fix logo at rubydoc.info
 ## [3.2.0] - 2022-01-27
 ### Changed
-- Switch from rdoc formatting syntax to Markdown with Redcarpet to be able to render README.md properly.
+* Switch from rdoc formatting syntax to Markdown with Redcarpet to be able to render README.md properly.
 ## [3.1.0] - 2022-01-27
 ### Changed
-- Switch from rdoc to YARD to match rubydoc.info that is used automatically for Gems uploaded to rubygems.org.
+* Switch from rdoc to YARD to match rubydoc.info that is used automatically for Gems uploaded to rubygems.org.
 ## [3.0.0] - 2022-01-27
 ### Added
-- API documentation generated with rdoc.
+* API documentation generated with rdoc.
 ### Fixed
-- Removed leaked `NT_MIXIN` constant in core extensions.
+* Removed leaked `NT_MIXIN` constant in core extensions.
 ### Changed
-- **Breaking change**: `#to_nt` on `String`, `Array` and `Hash` is no longer strict by default for consistency an unexpected surprises e.g. when having an array of Custom Objects and calling the method on the array.
-- Internal clean-up and simplifications on helper classes and methods.
+* **Breaking change**: `#to_nt` on `String`, `Array` and `Hash` is no longer strict by default for consistency an unexpected surprises e.g. when having an array of Custom Objects and calling the method on the array.
+* Internal clean-up and simplifications on helper classes and methods.
 ## [2.1.0] - 2022-01-27
 ### Changed
-- Slim down Gem by using include instead of block list.
+* Slim down Gem by using include instead of block list.
 ## [2.0.1] - 2022-01-26
 ### Fixed
-- README issue with logo showing up on Rdoc (out-commented HTML).
+* README issue with logo showing up on Rdoc (out-commented HTML).
 ## [2.0.0] - 2022-01-26
 ### Changed
-- **Breaking change**: strict mode now defaults to false for both the `load` and `dump` methods.
-- Internal rename of error classes to be more consistent.
-- Internal simplification of argument passing.
+* **Breaking change**: strict mode now defaults to false for both the `load` and `dump` methods.
+* Internal rename of error classes to be more consistent.
+* Internal simplification of argument passing.
 ## [1.2.0] - 2022-01-25
 ### Changed
-- Hide core extension `String.normalize_line_endings` from users.
+* Hide core extension `String.normalize_line_endings` from users.
 ## [1.1.1] - 2022-01-25
 ### Fixed
-- Renamed `ToNTMixing` to `ToNTMixin` .
+* Renamed `ToNTMixing` to `ToNTMixin` .
 ## [1.1.0] - 2022-01-25
 ### Added
-- Expose `NestedText::VERSION` for convenience to the users.
+* Expose `NestedText::VERSION` for convenience to the users.
 ## [1.0.0] - 2022-01-25
 The library is now useful for users!
 ### Changed
-- Hide all internals in the module from users.
+* Hide all internals in the module from users.
 ## [0.6.0] - 2022-01-24
 ### Fixed
-- Move runtime dependencies from Gemfile to .gemspec.
+* Move runtime dependencies from Gemfile to .gemspec.
 ## [0.5.0] - 2022-01-24
 ### Added
-- Publish Gem to GitHub Packages
+* Publish Gem to GitHub Packages
 ## [0.4.0] - 2022-01-24
-- Iteration on CD GitHub Actions workflow.
+* Iteration on CD GitHub Actions workflow.
 ## [0.3.0] - 2022-01-24
-- Iteration on CD GitHub Actions workflow.
+* Iteration on CD GitHub Actions workflow.
 ## [0.2.0] - 2022-01-24
-- Iteration on CD GitHub Actions workflow.
+* Iteration on CD GitHub Actions workflow.
 ## [0.1.0] - 2022-01-24
 ### Added
-- Initial release. If this release works, an 1.0.0 will soon follow.
+* Initial release. If this release works, an 1.0.0 will soon follow.

data/README.md CHANGED

@@ -2,7 +2,7 @@
 [![Gem Version](https://badge.fury.io/rb/nestedtext.svg)](https://badge.fury.io/rb/nestedtext)
 [![Gem Downloads](https://img.shields.io/gem/dt/nestedtext?label=gem%20downloads)](https://rubygems.org/gems/nestedtext)
 [![Documentation](https://img.shields.io/badge/docs-API-informational?logo=readthedocs&logoColor=violet)](https://www.rubydoc.info/gems/nestedtext/NestedText)
-[![Data Format Version Supported](https://img.shields.io/badge/%F0%9F%84%BD%F0%9F%85%83%20Version%20Supported-3.4.0-blueviolet)](https://nestedtext.org/en/v3.3/)
+[![Data Format Version Supported](https://img.shields.io/badge/%F0%9F%84%BD%F0%9F%85%83%20Version%20Supported-3.8-blueviolet)](https://nestedtext.org/en/v3.8/)
 [![Official Tests](https://img.shields.io/badge/Official%20Tests-Passing-success?logo=cachet)](https://github.com/KenKundert/nestedtext_tests/)
 [![GitHub Actions: Continuous Integration](https://github.com/erikw/nestedtext-ruby/actions/workflows/ci.yml/badge.svg)](https://github.com/erikw/nestedtext-ruby/actions/workflows/ci.yml)
 [![GitHub Actions: Continuous Deployment](https://github.com/erikw/nestedtext-ruby/actions/workflows/cd.yml/badge.svg)](https://github.com/erikw/nestedtext-ruby/actions/workflows/cd.yml)

data/lib/nestedtext/decode.rb CHANGED

@@ -6,6 +6,10 @@ require 'nestedtext/errors_internal'
 require 'stringio'
 module NestedText
+  # UTF-8 BOM as a binary string for reliable detection regardless of input encoding.
+  UTF8_BOM = "\xEF\xBB\xBF".b.freeze
+  private_constant :UTF8_BOM
   # Decode a NestedText string to Ruby objects.
   #
   # @param ntstring [String] The string containing NestedText to be decoded.
@@ -18,6 +22,7 @@ module NestedText
   def self.load(ntstring, top_class: Object, strict: false)
     raise Errors::WrongInputTypeError.new([String], ntstring) unless ntstring.nil? || ntstring.is_a?(String)
+    ntstring = prepare_string_input(ntstring) unless ntstring.nil?
     Parser.new(StringIO.new(ntstring), top_class, strict: strict).parse
   end
@@ -34,9 +39,79 @@ module NestedText
   def self.load_file(filename, top_class: Object, strict: false)
     raise Errors::WrongInputTypeError.new([String], filename) unless !filename.nil? && filename.is_a?(String)
-    # Open explicitly in text mode to detect \r as line ending.
-    File.open(filename, 'rt') do |file|
-      Parser.new(file, top_class, strict: strict).parse
+    # Read in binary mode to handle BOM detection; we manually ensure UTF-8.
+    raw = File.binread(filename)
+    ntstring = prepare_string_input(raw)
+    Parser.new(StringIO.new(ntstring), top_class, strict: strict).parse
+  end
+  # Strips a UTF-8 BOM if present, validates UTF-8 encoding, normalizes line
+  # endings (CR-only and CRLF → LF), and returns a clean UTF-8 String ready
+  # for the parser.
+  def self.prepare_string_input(str)
+    binary = str.b
+    binary = binary.delete_prefix(UTF8_BOM)
+    raise_invalid_utf8_error(binary) unless binary.force_encoding('UTF-8').valid_encoding?
+    # Normalize CR-only and CRLF line endings to LF so the scanner's
+    # IO#gets (which splits on \n) works correctly for all platforms.
+    binary.force_encoding('UTF-8').gsub(/\r\n?/, "\n")
+  end
+  private_class_method :prepare_string_input
+  # Scans binary bytes to find the first invalid UTF-8 sequence, then raises a
+  # ParseError with the correct lineno and colno.
+  def self.raise_invalid_utf8_error(binary)
+    bytes = binary.bytes
+    lineno, line_start, i = scan_for_invalid_utf8(bytes)
+    colno = i - line_start
+    line_end = bytes.index(0x0A, i) || bytes.length
+    line_content = bytes[line_start...line_end].pack('C*')
+                                               .force_encoding('UTF-8')
+                                               .encode('UTF-8', invalid: :replace, undef: :replace)
+    raise Errors::ParseEncodingError.new(line_content, lineno, colno)
+  end
+  private_class_method :raise_invalid_utf8_error
+  # Returns [lineno, line_start, i] where i is the position of the first
+  # invalid byte.
+  def self.scan_for_invalid_utf8(bytes)
+    lineno = 0
+    line_start = 0
+    i = 0
+    while i < bytes.length
+      byte = bytes[i]
+      seq_len = utf8_sequence_length(byte)
+      break unless seq_len && valid_utf8_continuation?(bytes, i, seq_len)
+      if byte == 0x0A
+        lineno += 1
+        line_start = i + 1
+      end
+      i += seq_len
+    end
+    [lineno, line_start, i]
+  end
+  private_class_method :scan_for_invalid_utf8
+  # Returns the expected byte-sequence length for a UTF-8 start byte, or nil if invalid.
+  def self.utf8_sequence_length(byte)
+    if byte < 0x80 then 1             # 0x00–0x7F: ASCII
+    elsif byte < 0xC2 then nil        # 0x80–0xC1: continuation or overlong (invalid start)
+    elsif byte < 0xE0 then 2          # 0xC2–0xDF: 2-byte sequence
+    elsif byte < 0xF0 then 3          # 0xE0–0xEF: 3-byte sequence
+    elsif byte < 0xF8 then 4          # 0xF0–0xF7: 4-byte sequence
+      # 0xF8–0xFF: invalid
+    end
+  end
+  private_class_method :utf8_sequence_length
+  # Checks that the continuation bytes following a multi-byte start are valid.
+  def self.valid_utf8_continuation?(bytes, start, seq_len)
+    (1...seq_len).all? do |j|
+      k = start + j
+      k < bytes.length && (bytes[k] & 0xC0) == 0x80
     end
   end
+  private_class_method :valid_utf8_continuation?
 end
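
The input preparation in this file can be tried in isolation. The following is a minimal sketch, not the gem's code: `BOM_BYTES`, `first_invalid_utf8_position` and `normalize_nt_input` are hypothetical names, the error raised is a plain `ArgumentError` rather than the gem's `ParseEncodingError`, and the invalid byte is located with `String#each_char` instead of the hand-written start-byte table shown in the diff.

```ruby
# Hypothetical helper names; only the overall steps (strip BOM, validate
# UTF-8, normalise line endings) are taken from the diff.
BOM_BYTES = "\xEF\xBB\xBF".b.freeze

# Returns [lineno, colno] of the first invalid UTF-8 byte, or nil when the
# input is valid. Both numbers are 0-indexed; colno is a byte offset.
def first_invalid_utf8_position(binary)
  lineno = 0
  colno = 0
  binary.dup.force_encoding('UTF-8').each_char do |char|
    return [lineno, colno] unless char.valid_encoding?

    if char == "\n"
      lineno += 1
      colno = 0
    else
      colno += char.bytesize
    end
  end
  nil
end

# Strips a leading UTF-8 BOM, rejects invalid UTF-8, and converts CRLF and
# CR-only line endings to LF.
def normalize_nt_input(str)
  binary = str.b.delete_prefix(BOM_BYTES)
  position = first_invalid_utf8_position(binary)
  if position
    raise ArgumentError, "invalid UTF-8 at line #{position[0]}, byte column #{position[1]}"
  end

  binary.force_encoding('UTF-8').gsub(/\r\n?/, "\n")
end
```

Working on a binary copy (`String#b`) means the BOM comparison does not depend on the encoding the caller's string happens to carry, which appears to be the motivation for `File.binread` in `load_file`.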

data/lib/nestedtext/errors_internal.rb CHANGED

@@ -14,7 +14,7 @@ module NestedText
     end
     class ParseError < InternalError
-      attr_reader :lineno, :colno, :message_raw
+      attr_reader :lineno, :colno, :message_raw, :line
       def initialize(line, colno, message)
         # Note, both line and column number are 0-indexed.
@@ -22,6 +22,7 @@ module NestedText
         @lineno = line.lineno
         @colno = colno
         @message_raw = message
+        @line = (' ' * line.indentation) + line.content
         super(pretty_message(line))
       end
@@ -92,6 +93,30 @@ module NestedText
       end
     end
+    class ParseMultilineKeyRequiresIndentedValueError < ParseError
+      def initialize(line)
+        super(line, line.indentation, 'indented value must follow multi-line key.')
+      end
+    end
+    class ParseExtraContentError < ParseError
+      def initialize(line)
+        super(line, line.indentation, 'extra content.')
+      end
+    end
+    # A lightweight line substitute used when reporting encoding errors before
+    # the scanner has produced any Line objects.
+    EncodingErrorLine = Struct.new(:lineno, :indentation, :content, :prev)
+    private_constant :EncodingErrorLine
+    class ParseEncodingError < ParseError
+      def initialize(line_content, lineno, colno)
+        line = EncodingErrorLine.new(lineno, 0, line_content, nil)
+        super(line, colno, 'invalid start byte')
+      end
+    end
     class ParseInlineDictSyntaxError < ParseError
       def initialize(line, colno, wrong_char)
         super(line, line.indentation + colno, "expected ‘,’ or ‘}’, found ‘#{wrong_char}’.")
@@ -246,8 +271,9 @@ module NestedText
     end
     def self.raise_unrecognized_line(line)
-      # [[:space:]] include all Unicode spaces e.g. non-breakable space which \s does not.
-      raise ParseInvalidIndentationCharError, line if line.content.chr =~ /[[:space:]]/
+      # Use content[0] (Unicode character) rather than .chr (first byte) so that
+      # multi-byte Unicode spaces (e.g. U+00A0 NO-BREAK SPACE) are detected.
+      raise ParseInvalidIndentationCharError, line if line.content[0] =~ /[[:space:]]/
       raise ParseLineTagNotDetectedError, line
     end
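
The `[[:space:]]` check in `raise_unrecognized_line` matters because Ruby's `\s` covers ASCII whitespace only. A small sketch of the difference, using hypothetical helper names that are not part of the gem:

```ruby
# Hypothetical helpers: compare the ASCII-only \s class with the
# Unicode-aware POSIX bracket [[:space:]] on the first character of a line.
def ascii_space_start?(content)
  content[0].to_s.match?(/\s/)
end

def unicode_space_start?(content)
  content[0].to_s.match?(/[[:space:]]/)
end

nbsp_line = "\u00A0key: value" # starts with U+00A0 NO-BREAK SPACE
puts ascii_space_start?(nbsp_line)
puts unicode_space_start?(nbsp_line)
```

For a line that begins with U+00A0, only the second helper reports a leading space, which is the case the new changelog entry about invalid indentation describes.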

data/lib/nestedtext/parser.rb CHANGED

@@ -28,6 +28,8 @@ module NestedText
     def parse
       result = parse_any(0)
+      raise Errors::ParseExtraContentError, @line_scanner.peek unless @line_scanner.peek.nil?
       case @top_class.object_id
       when Object.object_id
         return_object(result)
@@ -111,8 +113,8 @@ module NestedText
     def assert_list_line(line, indentation)
       Errors.raise_unrecognized_line(line) if line.tag == :unrecognized
-      raise Errors::ParseLineTypeExpectedListItemError, line unless line.tag == :list_item
       raise Errors::ParseInvalidIndentationError.new(line, indentation) if line.indentation != indentation
+      raise Errors::ParseLineTypeExpectedListItemError, line unless line.tag == :list_item
     end
     def parse_list_item(indentation)
@@ -161,7 +163,7 @@ module NestedText
     end
     def parse_key_item_value(indentation, line)
-      return '' if @line_scanner.peek.nil?
+      raise Errors::ParseMultilineKeyRequiresIndentedValueError, line if @line_scanner.peek.nil?
       exp_types = %i[dict_item key_item list_item string_item]
       unless exp_types.member?(@line_scanner.peek.tag)
@@ -220,6 +222,10 @@ module NestedText
     def parse_string_item(indentation)
       result = []
       while !@line_scanner.peek.nil? && @line_scanner.peek.indentation >= indentation
+        # Stop (without consuming) when same-indent non-string line is encountered;
+        # the caller handles it (e.g. as "extra content." at top level).
+        break if @line_scanner.peek.indentation == indentation && @line_scanner.peek.tag != :string_item
         line = @line_scanner.read_next
         assert_string_line(line, indentation)

data/lib/nestedtext/scanners.rb CHANGED

@@ -119,7 +119,7 @@ module NestedText
     PATTERN_DICT_ITEM = /^
              (?<key>[^\s].*?)   # Key must start with a non-whitespace character, and goes until first
-              \s*:              # first optional space, or :-separator
+              \p{Space}*:       # optional Unicode whitespace then :-separator
               (?:               # Value part is optional
                 \p{Space}       # Must have a space after :-separator
                 (?<value>.*)    # Value is everything to the end of the line
@@ -141,7 +141,8 @@ module NestedText
         @attribs['key'] = @content[2..] || ''
       elsif @content =~ /^-(?: |$)/
         self.tag = :list_item
-        @attribs['value'] = @content[2..]
+        value = @content[2..]
+        @attribs['value'] = value.nil? || value.empty? ? nil : value
       elsif @content =~ /^>(?: |$)/
         self.tag = :string_item
         @attribs['value'] = @content[2..] || ''
@@ -149,14 +150,18 @@ module NestedText
         self.tag = :inline_dict
       elsif @content[0] == '['
         self.tag = :inline_list
-      elsif @content =~ PATTERN_DICT_ITEM
-        self.tag = :dict_item
-        @attribs['key'] = Regexp.last_match(:key)
-        @attribs['value'] = Regexp.last_match(:value)
-      else
+      elsif @content[0] =~ /[[:space:]]/ || @content !~ PATTERN_DICT_ITEM
+        # A non-ASCII Unicode space character at the start is invalid indentation.
+        # (ASCII spaces are stripped by fast_forward_indentation, so this can only
+        #  be a character like U+00A0 NO-BREAK SPACE.)
         # Don't raise error here, as this line might not have been consumed yet,
         # thus could hide an error that we detect when parsing the previous line.
         self.tag = :unrecognized
+      else
+        self.tag = :dict_item
+        @attribs['key'] = Regexp.last_match(:key)
+        value = Regexp.last_match(:value)
+        @attribs['value'] = value.nil? || value.empty? ? nil : value
       end
     end
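
The effect of the two scanner changes (Unicode whitespace before the `:` separator, and empty inline values becoming `nil`) can be sketched as follows. The patterns are simplified stand-ins, since the diff shows only part of `PATTERN_DICT_ITEM`; `OLD_DICT_ITEM`, `NEW_DICT_ITEM`, `blank_to_nil` and `split_dict_item` are hypothetical names.

```ruby
# Simplified stand-ins for the separator handling before and after the change.
OLD_DICT_ITEM = /^(?<key>[^\s].*?)\s*:(?:\p{Space}(?<value>.*))?$/
NEW_DICT_ITEM = /^(?<key>[^\s].*?)\p{Space}*:(?:\p{Space}(?<value>.*))?$/

# Mirrors the new "empty inline value counts as no value" rule.
def blank_to_nil(value)
  value.nil? || value.empty? ? nil : value
end

# Returns [key, value] for a dict-item line, or nil if the line does not match.
def split_dict_item(content, pattern)
  match = pattern.match(content)
  return nil unless match

  [match[:key], blank_to_nil(match[:value])]
end

p split_dict_item("name\u00A0: Ann", OLD_DICT_ITEM)
p split_dict_item("name\u00A0: Ann", NEW_DICT_ITEM)
p split_dict_item('key: ', NEW_DICT_ITEM)
```

With the old separator the NO-BREAK SPACE stays attached to the key, while the `\p{Space}*` version trims it. A line such as `key: ` yields a `nil` value, so an indented block can follow.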

data/lib/nestedtext/version.rb CHANGED

@@ -1,5 +1,5 @@
 # frozen_string_literal: true
 module NestedText
-  VERSION = '4.6.0' # The version of this library.
+  VERSION = '5.0.2' # The version of this library.
 end

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: nestedtext
 version: !ruby/object:Gem::Version
-  version: 4.6.0
+  version: 5.0.2
 platform: ruby
 authors:
 - Erik Westrup
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2025-10-31 00:00:00.000000000 Z
+date: 2026-05-29 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: unicode_utils