cton 0.1.1 → 0.2.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: '0295922a011dd898278f9f57de2f48f2acbe0ce3363f263f4e2b2753993ebdea'
-  data.tar.gz: 33bb13ee23584ff6cd51bf6bc6c1a869d02f206ff0792a172c9aa7cdc5547977
+  metadata.gz: b010a8f0e0da39e4e4d0a4217eddaa8f9496f1889bf32e12430fdb7737f17fab
+  data.tar.gz: 6fe6f58ff0a40233a279ae5c8881ccca4ce382fa85cae15c2c5e26782bb02875
 SHA512:
-  metadata.gz: 9dff47df67680eabf6fb7ac05dac606e969df0a0d31d575318fa4a72c51c8fe85d38b671f4e2b8b8caa0e969184c044e06547b135b9a5b5b0baa4c3e28232322
-  data.tar.gz: be363392d2305b6940e46060310a908922bbf822ae487e9d4b20441e4cc51e5e8161332cd5d8c0846743663d1251ff7b1b5480f21f36f9b52c0aafb0404f4b74
+  metadata.gz: 3a85563dd205c2c00b204359d85376514de8fc45ce2b2c98e4d52a0325bff2937e2d88ba5e367fe718a0b82127603deadfe16dd6f60062e77a1b75babc666ec4
+  data.tar.gz: b4b27bfb483e0145c49def7b9ab735c27e03420dc59fd6bcaabc57d1b2bf6868d7bc5c55fea9866da3270a6c81126df032590129a1e28385827b8b4f3058e92a
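The SHA256 values above can be checked against a downloaded artifact with Ruby's standard library. A minimal sketch (the `sha256_hex` helper and any file paths are illustrative, not part of the gem):

```ruby
require "digest"

# Hex digest of an in-memory string; Digest::SHA256.file(path).hexdigest
# does the same for a file on disk (e.g. a downloaded data.tar.gz).
def sha256_hex(data)
  Digest::SHA256.hexdigest(data)
end

sha256_hex("hello")
# => "2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824"
```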
data/CHANGELOG.md CHANGED
@@ -5,6 +5,20 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [0.2.0] - 2025-11-19
+
+### Added
+
+- **CLI Tool**: New `bin/cton` executable for converting between JSON and CTON from the command line. Supports auto-detection, pretty printing, and file I/O.
+- **Streaming IO**: `Cton.dump` now accepts an `IO` object as the second argument (or via `io:` keyword), allowing direct writing to files or sockets without intermediate string allocation.
+- **Pretty Printing**: Added `pretty: true` option to `Cton.dump` to format output with indentation and newlines for better readability.
+- **Extended Types**: Native support for `Time`, `Date` (ISO8601), `Set` (as Array), and `OpenStruct` (as Object).
+- **Enhanced Error Reporting**: `ParseError` now includes line and column numbers to help locate syntax errors in large documents.
+
+### Changed
+
+- **Ruby 3 Compatibility**: Improved argument handling in `Cton.dump` to robustly support Ruby 3 keyword arguments when passing hashes.
+
 ## [0.1.1] - 2025-11-18
 
 ### Changed
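The Ruby 3 note under *Changed* concerns the classic positional-vs-keyword hash ambiguity. A hedged sketch of one way to accept an output target either positionally or via `io:` (not the gem's actual implementation; the toy `dump` below only joins `key=value` pairs):

```ruby
require "stringio"

# Stand-in encoder: the target IO may arrive positionally or as io:,
# so a trailing Hash payload is never mistaken for keyword arguments.
def dump(payload, maybe_io = nil, io: nil)
  target = io || maybe_io
  own = target.nil?                 # did we allocate the buffer ourselves?
  target = StringIO.new if own
  target << payload.map { |k, v| "#{k}=#{v}" }.join(",") # toy encoding only
  own ? target.string : target
end

dump({ "a" => 1, "b" => 2 })
# => "a=1,b=2"
```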
data/README.md CHANGED
@@ -1,32 +1,113 @@
 # CTON
 
-CTON (Compact Token-Oriented Notation) is an aggressively minified, JSON-compatible wire format that keeps prompts short without giving up schema hints. It is shape-preserving (objects, arrays, scalars, table-like arrays) and deterministic, so you can safely round-trip between Ruby hashes and compact strings that work well in LLM prompts.
+[![Gem Version](https://badge.fury.io/rb/cton.svg)](https://badge.fury.io/rb/cton)
+[![License](https://img.shields.io/badge/license-MIT-blue.svg)](https://github.com/davidesantangelo/cton/blob/master/LICENSE.txt)
+
+**CTON** (Compact Token-Oriented Notation) is an aggressively minified, JSON-compatible wire format that keeps prompts short without giving up schema hints. It is shape-preserving (objects, arrays, scalars, table-like arrays) and deterministic, so you can safely round-trip between Ruby hashes and compact strings that work well in LLM prompts.
+
+---
+
+## 📖 Table of Contents
+
+- [What is CTON?](#what-is-cton)
+- [Why another format?](#why-another-format)
+- [Examples](#examples)
+- [Token Savings](#token-savings-vs-json--toon)
+- [Installation](#installation)
+- [Usage](#usage)
+- [Development](#development)
+- [Contributing](#contributing)
+- [License](#license)
+
+---
+
+## What is CTON?
+
+CTON is designed to be the most efficient way to represent structured data for Large Language Models (LLMs). It strips away the "syntactic sugar" of JSON that humans like (indentation, excessive quoting, braces) but machines don't strictly need, while adding "structural hints" that help LLMs generate valid output.
+
+### Key Concepts
+
+1. **Root is Implicit**: No curly braces `{}` wrapping the entire document.
+2. **Minimal Punctuation**:
+   * Objects use `key=value`.
+   * Nested objects use parentheses `(key=value)`.
+   * Arrays use brackets with length `[N]=item1,item2`.
+3. **Table Compression**: If an array contains objects with the same keys, CTON automatically converts it into a table format `[N]{header1,header2}=val1,val2;val3,val4`. This is a massive token saver for datasets.
+
+---
+
+## Examples
+
+### Simple Key-Value Pairs
+
+**JSON**
+```json
+{
+  "task": "planning",
+  "urgent": true,
+  "id": 123
+}
+```
+
+**CTON**
+```text
+task=planning,urgent=true,id=123
+```
+
+### Nested Objects
+
+**JSON**
+```json
+{
+  "user": {
+    "name": "Davide",
+    "settings": {
+      "theme": "dark"
+    }
+  }
+}
+```
+
+**CTON**
+```text
+user(name=Davide,settings(theme=dark))
+```
+
+### Arrays and Tables
+
+**JSON**
+```json
+{
+  "tags": ["ruby", "gem", "llm"],
+  "files": [
+    { "name": "README.md", "size": 1024 },
+    { "name": "lib/cton.rb", "size": 2048 }
+  ]
+}
+```
+
+**CTON**
+```text
+tags[3]=ruby,gem,llm
+files[2]{name,size}=README.md,1024;lib/cton.rb,2048
+```
+---
 
 ## Why another format?
 
 - **Less noise than YAML/JSON**: no indentation, no braces around the root, and optional quoting.
 - **Schema guardrails**: arrays carry their length (`friends[3]`) and table headers (`{id,name,...}`) so downstream parsing can verify shape.
 - **LLM-friendly**: works as a single string you can embed in a prompt together with short parsing instructions.
-- **Token savings**: CTON compounds the JSON → TOON savings; see the section below for concrete numbers.
+- **Token savings**: CTON compounds the JSON → TOON savings.
 
-## Token savings vs JSON & TOON
+### Token savings vs JSON & TOON
 
 - **JSON → TOON**: The [TOON benchmarks](https://toonformat.dev) report roughly 40% fewer tokens than plain JSON on mixed-structure prompts while retaining accuracy due to explicit array lengths and headers.
-- **TOON → CTON**: By stripping indentation and forcing everything inline, CTON cuts another ~20–40% of characters. The sample above is ~350 characters as TOON and ~250 as CTON (~29% fewer), and larger tabular datasets show similar reductions.
-- **Net effect**: In practice you can often reclaim 50–60% of the token budget versus raw JSON, leaving more room for instructions or reasoning steps while keeping a deterministic schema.
-
-## Format at a glance
+- **TOON → CTON**: By stripping indentation and forcing everything inline, CTON cuts another ~20–40% of characters.
+- **Net effect**: In practice you can often reclaim **50–60% of the token budget** versus raw JSON, leaving more room for instructions or reasoning steps while keeping a deterministic schema.
 
-```
-context(task="Our favorite hikes together",location=Boulder,season=spring_2025)
-friends[3]=ana,luis,sam
-hikes[3]{id,name,distanceKm,elevationGain,companion,wasSunny}=1,"Blue Lake Trail",7.5,320,ana,true;2,"Ridge Overlook",9.2,540,luis,false;3,"Wildflower Loop",5.1,180,sam,true
-```
-
-- Objects use parentheses and `key=value` pairs separated by commas.
-- Arrays encode their length: `[N]=...`. When every element is a flat hash with the same keys, they collapse into a compact table: `[N]{key1,key2}=row1;row2`.
-- Scalars (numbers, booleans, `null`) keep their JSON text. Strings only need quotes when they contain whitespace or reserved punctuation.
-- For parsing safety the Ruby encoder inserts a single `\n` between top-level segments. You can override this if you truly need a fully inline document (see options below).
+---
 
 ## Installation
 
@@ -42,28 +123,32 @@ Or install it directly:
 gem install cton
 ```
 
+---
+
 ## Usage
 
 ```ruby
 require "cton"
 
 payload = {
-  "context" => {
-    "task" => "Our favorite hikes together",
-    "location" => "Boulder",
-    "season" => "spring_2025"
-  },
-  "friends" => %w[ana luis sam],
-  "hikes" => [
-    { "id" => 1, "name" => "Blue Lake Trail", "distanceKm" => 7.5, "elevationGain" => 320, "companion" => "ana", "wasSunny" => true },
-    { "id" => 2, "name" => "Ridge Overlook", "distanceKm" => 9.2, "elevationGain" => 540, "companion" => "luis", "wasSunny" => false },
-    { "id" => 3, "name" => "Wildflower Loop", "distanceKm" => 5.1, "elevationGain" => 180, "companion" => "sam", "wasSunny" => true }
-  ]
+  "context" => {
+    "task" => "Our favorite hikes together",
+    "location" => "Boulder",
+    "season" => "spring_2025"
+  },
+  "friends" => %w[ana luis sam],
+  "hikes" => [
+    { "id" => 1, "name" => "Blue Lake Trail", "distanceKm" => 7.5, "elevationGain" => 320, "companion" => "ana", "wasSunny" => true },
+    { "id" => 2, "name" => "Ridge Overlook", "distanceKm" => 9.2, "elevationGain" => 540, "companion" => "luis", "wasSunny" => false },
+    { "id" => 3, "name" => "Wildflower Loop", "distanceKm" => 5.1, "elevationGain" => 180, "companion" => "sam", "wasSunny" => true }
+  ]
 }
 
+# Encode to CTON
 cton = Cton.dump(payload)
 # => "context(... )\nfriends[3]=ana,luis,sam\nhikes[3]{...}"
 
+# Decode back to Hash
 round_tripped = Cton.load(cton)
 # => original hash
 
@@ -72,24 +157,55 @@ symbolized = Cton.load(cton, symbolize_names: true)
 # Want a truly inline document? Opt in explicitly (decoding becomes unsafe for ambiguous cases).
 inline = Cton.dump(payload, separator: "")
+
+# Pretty print for human readability
+pretty = Cton.dump(payload, pretty: true)
+
+# Stream to an IO object (file, socket, etc.)
+File.open("data.cton", "w") do |f|
+  Cton.dump(payload, f)
+end
 ```
 
-### Table detection
+### CLI Tool
 
-Whenever an array is made of hashes that all expose the same scalar keys, the encoder flattens it into a table to save tokens. Mixed or nested arrays fall back to `[N]=(value1,value2,...)`.
+CTON comes with a command-line tool for quick conversions:
 
-### Separators & ambiguity
+```bash
+# Convert JSON to CTON
+echo '{"hello": "world"}' | cton
+# => hello=world
 
-Removing every newline makes certain inputs ambiguous because `sam` and the next key `hikes` can merge into `samhikes`. The default `separator: "\n"` avoids that by inserting a single newline between root segments. You may pass `separator: ""` to `Cton.dump` for maximum compactness, but decoding such strings is only safe if you can guarantee extra quoting or whitespace between segments.
+# Convert CTON to JSON
+echo 'hello=world' | cton --to-json
+# => {"hello":"world"}
 
-### Literal safety & number normalization
+# Pretty print
+cton --pretty input.json
+```
 
-Following the TOON specification's guardrails, the encoder now:
+### Advanced Features
+
+#### Extended Types
+CTON natively supports serialization for:
+- `Time` and `Date` (ISO8601 strings)
+- `Set` (converted to Arrays)
+- `OpenStruct` (converted to Objects)
 
+#### Table detection
+Whenever an array is made of hashes that all expose the same scalar keys, the encoder flattens it into a table to save tokens. Mixed or nested arrays fall back to `[N]=(value1,value2,...)`.
+
+#### Separators & ambiguity
+Removing every newline makes certain inputs ambiguous because `sam` and the next key `hikes` can merge into `samhikes`. The default `separator: "\n"` avoids that by inserting a single newline between root segments. You may pass `separator: ""` to `Cton.dump` for maximum compactness, but decoding such strings is only safe if you can guarantee extra quoting or whitespace between segments.
+
+#### Literal safety & number normalization
+Following the TOON specification's guardrails, the encoder now:
 - Auto-quotes strings that would otherwise be parsed as booleans, `null`, or numbers (e.g., `"true"`, `"007"`, `"1e6"`, `"-5"`) so they round-trip as strings without extra work.
 - Canonicalizes float/BigDecimal output: no exponent notation, no trailing zeros, and `-0` collapses to `0`.
 - Converts `NaN` and `±Infinity` inputs to `null`, matching TOON's normalization guidance so downstream decoders don't explode on non-finite numbers.
 
+---
+
 ## Type Safety
 
 CTON ships with RBS signatures (`sig/cton.rbs`) to support type checking and IDE autocompletion.
@@ -110,4 +226,4 @@ Bug reports and pull requests are welcome at https://github.com/davidesantangelo
 
 ## License
 
-MIT © Davide Santangelo
+MIT © [Davide Santangelo](https://github.com/davidesantangelo)
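The table compression the README describes can be illustrated with a simplified re-implementation in plain Ruby (a sketch only: the real encoder's quoting, escaping, and type handling are omitted, and `encode_table` is not part of the gem's public API):

```ruby
# Simplified sketch of CTON table compression for an array of uniform
# hashes: one shared header, rows joined with ";" and cells with ",".
def encode_table(key, rows)
  header = rows.first.keys
  return nil unless rows.all? { |r| r.keys == header }

  body = rows.map { |r| header.map { |h| r[h] }.join(",") }.join(";")
  "#{key}[#{rows.length}]{#{header.join(',')}}=#{body}"
end

files = [
  { "name" => "README.md", "size" => 1024 },
  { "name" => "lib/cton.rb", "size" => 2048 }
]
encode_table("files", files)
# => "files[2]{name,size}=README.md,1024;lib/cton.rb,2048"
```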
data/lib/cton/decoder.rb CHANGED
@@ -21,7 +21,7 @@ module Cton
       end
 
       skip_ws
-      raise ParseError, "Unexpected trailing data" unless @scanner.eos?
+      raise_error("Unexpected trailing data") unless @scanner.eos?
 
       value
     end
@@ -30,6 +30,20 @@
 
     attr_reader :symbolize_names, :scanner
 
+    def raise_error(message)
+      line, col = calculate_location(@scanner.pos)
+      raise ParseError, "#{message} at line #{line}, column #{col}"
+    end
+
+    def calculate_location(pos)
+      string = @scanner.string
+      consumed = string[0...pos]
+      line = consumed.count("\n") + 1
+      last_newline = consumed.rindex("\n")
+      col = last_newline ? pos - last_newline : pos + 1
+      [line, col]
+    end
+
     def parse_document
       result = {}
       until @scanner.eos?
@@ -43,22 +57,20 @@
 
     def parse_value_for_key
       skip_ws
-      if @scanner.scan(/\(/)
+      if @scanner.scan("(")
         parse_object
-      elsif @scanner.scan(/\[/)
+      elsif @scanner.scan("[")
         parse_array
-      elsif @scanner.scan(/=/)
+      elsif @scanner.scan("=")
         parse_scalar(allow_key_boundary: true)
       else
-        raise ParseError, "Unexpected token at position #{@scanner.pos}"
+        raise_error("Unexpected token")
       end
     end
 
     def parse_object
       skip_ws
-      if @scanner.scan(/\)/)
-        return {}
-      end
+      return {} if @scanner.scan(")")
 
       pairs = {}
       loop do
@@ -67,7 +79,8 @@
         value = parse_value
         pairs[key] = value
         skip_ws
-        break if @scanner.scan(/\)/)
+        break if @scanner.scan(")")
+
         expect!(",")
         skip_ws
       end
@@ -92,7 +105,8 @@
       fields = []
       loop do
         fields << parse_key_name
-        break if @scanner.scan(/\}/)
+        break if @scanner.scan("}")
+
        expect!(",")
      end
      fields
@@ -125,9 +139,9 @@
 
     def parse_value(allow_key_boundary: false)
       skip_ws
-      if @scanner.scan(/\(/)
+      if @scanner.scan("(")
         parse_object
-      elsif @scanner.scan(/\[/)
+      elsif @scanner.scan("[")
         parse_array
       elsif @scanner.peek(1) == '"'
         parse_string
@@ -140,101 +154,40 @@
       skip_ws
       return parse_string if @scanner.peek(1) == '"'
 
-      start_pos = @scanner.pos
-
-      # If we allow key boundary, we need to be careful not to consume the next key
-      # This is the tricky part. The original implementation scanned ahead.
-      # With StringScanner, we can scan until a terminator or whitespace.
-
+      @scanner.pos
+
       token = if allow_key_boundary
                 scan_until_boundary_or_terminator
               else
                 scan_until_terminator
               end
 
-      raise ParseError, "Empty value at #{start_pos}" if token.nil? || token.empty?
+      raise_error("Empty value") if token.nil? || token.empty?
 
       convert_scalar(token)
     end
 
     def scan_until_terminator
-      # Scan until we hit a terminator char, whitespace, or structure char
-      # Terminators: , ; ) ] }
-      # Structure: ( [ {
-      # Whitespace
-
       @scanner.scan(/[^,;\]\}\)\(\[\{\s]+/)
     end
 
     def scan_until_boundary_or_terminator
-      # This is complex because "key=" looks like a scalar "key" followed by "="
-      # But "value" followed by "key=" means "value" ends before "key".
-      # The original logic used `next_key_index`.
-
-      # Let's try to replicate the logic:
-      # Scan characters that are safe for keys/values.
-      # If we see something that looks like a key start, check if it is followed by [(=
-
       start_pos = @scanner.pos
-
-      # Fast path: scan until something interesting happens
+
       chunk = @scanner.scan(/[0-9A-Za-z_.:-]+/)
       return nil unless chunk
-
-      # Now we might have consumed too much if the chunk contains a key.
-      # e.g. "valuekey=" -> chunk is "valuekey"
-      # We need to check if there is a split point inside `chunk` or if `chunk` itself is followed by [(=
-
-      # Actually, the original logic was:
-      # Find the *first* position where a valid key starts AND is followed by [(=
-
-      # Let's re-implement `next_key_index` logic but using the scanner's string
-
-      rest_of_string = @scanner.string[@scanner.pos..-1]
-      # But we also need to consider the chunk we just scanned?
-      # No, `scan_until_boundary_or_terminator` is called when we are at the start of a scalar.
-
-      # Let's reset and do it properly.
-      @scanner.pos = start_pos
-
-      full_scalar = scan_until_terminator
-      return nil unless full_scalar
-
-      # Now check if `full_scalar` contains a key boundary
-      # A key boundary is a substring that matches SAFE_TOKEN and is followed by [(=
-
-      # We need to look at `full_scalar` + whatever follows (whitespace?) + [(=
-      # But `scan_until_terminator` stops at whitespace.
-
-      # If `full_scalar` is "valuekey", and next char is "=", then "key" is the key.
-      # But wait, "value" and "key" must be separated?
-      # In CTON, "valuekey=..." is ambiguous if no separator.
-      # The README says: "Removing every newline makes certain inputs ambiguous... The default separator avoids that... You may pass separator: ''... decoding such strings is only safe if you can guarantee extra quoting or whitespace".
-
-      # So if we are in `allow_key_boundary` mode (top level), we must look for embedded keys.
-
-      # Let's look for the pattern inside the text we just consumed + lookahead.
-      # Actually, the original `next_key_index` scanned from the current position.
-
-      # Let's implement a helper that searches for the boundary in the remaining string
-      # starting from `start_pos`.
-
+
       boundary_idx = find_key_boundary(start_pos)
-
+
       if boundary_idx
-        # We found a boundary at `boundary_idx`.
-        # The scalar ends at `boundary_idx`.
         length = boundary_idx - start_pos
         @scanner.pos = start_pos
         token = @scanner.peek(length)
         @scanner.pos += length
         token
       else
-        # No boundary found, so the whole thing we scanned is the token
-        # We already scanned it into `full_scalar` but we need to put the scanner in the right place.
-        # Wait, I reset the scanner.
-        @scanner.pos = start_pos + full_scalar.length
-        full_scalar
+        @scanner.pos = start_pos + chunk.length
+        chunk
       end
     end
 
@@ -242,135 +195,24 @@ module Cton
       str = @scanner.string
       len = str.length
       idx = from_index
-
-      # We are looking for a sequence that matches SAFE_KEY followed by [(=
-      # But we are currently parsing a scalar.
-
-      # Optimization: we only care about boundaries that appear *before* any terminator/whitespace.
-      # Because if we hit a terminator/whitespace, the scalar ends anyway.
-
-      # So we only need to check inside the `scan_until_terminator` range?
-      # No, because "valuekey=" has no terminator/whitespace between value and key.
-
+
       while idx < len
         char = str[idx]
-
-        # If we hit a terminator or whitespace, we stop looking for boundaries
-        # because the scalar naturally ends here.
-        if TERMINATORS.include?(char) || whitespace?(char) || "([{".include?(char)
-          return nil
-        end
-
-        # Check if a key starts here
+
+        return nil if TERMINATORS.include?(char) || whitespace?(char) || "([{".include?(char)
+
         if safe_key_char?(char)
-          # Check if this potential key is followed by [(=
-          # We need to scan this potential key
           key_end = idx
-          while key_end < len && safe_key_char?(str[key_end])
-            key_end += 1
-          end
-
-          # Check what follows
+          key_end += 1 while key_end < len && safe_key_char?(str[key_end])
+
           next_char_idx = key_end
-          # Skip whitespace after key? No, keys are immediately followed by [(= usually?
-          # The original `next_key_index` did NOT skip whitespace after the key candidate.
-          # "next_char = @source[idx]" (where idx is after key)
-
+
           if next_char_idx < len
-            next_char = str[next_char_idx]
-            if ["(", "[", "="].include?(next_char)
-              # Found a boundary!
-              # But wait, is this the *start* of the scalar?
-              # If idx == from_index, then the scalar IS the key? No, that means we are at the start.
-              # If we are at the start, and it looks like a key, then it IS a key, so we should have parsed it as a key?
-              # No, `parse_scalar` is called when we expect a value.
-              # If we are parsing a document "key=valuekey2=value2", we are parsing "valuekey2".
-              # "key2" is the next key. So "value" is the scalar.
-              # So if idx > from_index, we found a split.
-
-              return idx if idx > from_index
-            end
+            next_char = str[next_char_idx]
+            return idx if ["(", "[", "="].include?(next_char) && (idx > from_index)
           end
-
-          # If not a boundary, we continue scanning from inside the key?
-          # "valuekey=" -> at 'k', key is "key", followed by '=', so split at 'k'.
-          # "valukey=" -> at 'l', key is "lukey", followed by '=', so split at 'l'.
-          # This seems to imply we should check every position?
-          # The original code:
-          # if safe_key_char?(char)
-          #   start = idx
-          #   idx += 1 while ...
-          #   if start > from_index && ... return start
-          #   idx = start + 1 <-- This is important! It backtracks to check nested keys.
-          #   next
-
-          # Yes, we need to check every position.
-
-          # Optimization: The key must end at `key_end`.
-          # If `str[key_end]` is not [(=, then this `key_candidate` is not a key.
-          # But maybe a suffix of it is?
-          # e.g. "abc=" -> "abc" followed by "=". Split at start? No.
-          # "a" followed by "bc="? No.
-
-          # Actually, if we find a valid key char, we scan to the end of the valid key chars.
-          # Let's say we have "abc=def".
-          # At 'a': key is "abc". Next is "=". "abc" is a key.
-          # If we are at start (from_index), then the whole thing is a key?
-          # But we are parsing a scalar.
-          # If `parse_scalar` sees "abc=", and `allow_key_boundary` is true.
-          # Does it mean "abc" is the scalar? Or "abc" is the next key?
-          # If "abc" is the next key, then the scalar before it is empty?
-          # "key=abc=def" -> key="key", value="abc", next_key="def"? No.
-          # "key=value next=val" -> value="value", next="next".
-          # "key=valuenext=val" -> value="value", next="next".
-
-          # So if we find a key boundary at `idx`, it means the scalar ends at `idx`.
-
-          # Let's stick to the original logic:
-          # Scan the maximal sequence of safe chars.
-          # If it is followed by [(=, then it IS a key.
-          # If it starts after `from_index`, then we found the boundary.
-          # If it starts AT `from_index`, then... what?
-          # If we are parsing a scalar, and we see "key=...", then the scalar is empty?
-          # That shouldn't happen if we called `parse_scalar`.
-          # Unless `parse_document` called `parse_value_for_key` -> `parse_scalar`.
-          # But `parse_document` calls `parse_key_name` first.
-          # So we are inside `parse_value`.
-
-          # Example: "a=1b=2".
-          # parse "a", expect "=", parse value.
-          # value starts at "1".
-          # "1" is safe char. "1b" is safe.
-          # "b" is safe.
-          # At "1": max key is "1b". Next is "=". "1b" is a key? Yes.
-          # Is "1b" followed by "="? Yes.
-          # Does it start > from_index? "1" is at from_index. No.
-          # So "1b" is NOT a boundary.
-          # Continue to next char "b".
-          # At "b": max key is "b". Next is "=". "b" is a key.
-          # Does it start > from_index? Yes ("b" index > "1" index).
-          # So boundary is at "b".
-          # Scalar is "1".
-
-          # So the logic is:
-          # For each char at `idx`:
-          #   If it can start a key:
-          #     Find end of key `end_key`.
-          #     If `str[end_key]` is [(= :
-          #       If `idx > from_index`: return `idx`.
-          #   idx += 1
-
-          # But wait, "1b" was a key candidate.
-          # If we advanced `idx` to `end_key`, we would skip "b".
-          # So we must NOT advance `idx` to `end_key` blindly.
-          # We must check `idx`, then `idx+1`, etc.
-
-          # But `safe_key_char?` is true for all chars in "1b".
-          # So we check "1...", then "b...".
-
-          # Correct.
         end
-
+
         idx += 1
       end
       nil
@@ -396,22 +238,20 @@
       expect!("\"")
       buffer = +""
       loop do
-        if @scanner.eos?
-          raise ParseError, "Unterminated string"
-        end
-
+        raise_error("Unterminated string") if @scanner.eos?
+
         char = @scanner.getch
-
-        if char == '\\'
+
+        if char == "\\"
           escaped = @scanner.getch
-          raise ParseError, "Invalid escape sequence" if escaped.nil?
+          raise_error("Invalid escape sequence") if escaped.nil?
           buffer << case escaped
-                    when 'n' then "\n"
-                    when 'r' then "\r"
-                    when 't' then "\t"
-                    when '"', '\\' then escaped
+                    when "n" then "\n"
+                    when "r" then "\r"
+                    when "t" then "\t"
+                    when '"', "\\" then escaped
                     else
-                      raise ParseError, "Unsupported escape sequence"
+                      raise_error("Unsupported escape sequence")
                     end
         elsif char == '"'
           break
@@ -425,16 +265,16 @@
     def parse_key_name
       skip_ws
       token = @scanner.scan(/[0-9A-Za-z_.:-]+/)
-      raise ParseError, "Invalid key" if token.nil?
+      raise_error("Invalid key") if token.nil?
       symbolize_names ? token.to_sym : token
     end
 
     def parse_integer_literal
       token = @scanner.scan(/-?\d+/)
-      raise ParseError, "Expected digits" if token.nil?
+      raise_error("Expected digits") if token.nil?
       Integer(token, 10)
     rescue ArgumentError
-      raise ParseError, "Invalid length literal"
+      raise_error("Invalid length literal")
     end
 
     def symbolize_keys(row)
@@ -443,9 +283,9 @@
     end
 
     def expect!(char)
       skip_ws
-      unless @scanner.scan(Regexp.new(Regexp.escape(char)))
-        raise ParseError, "Expected #{char.inspect}, got #{@scanner.peek(1).inspect}"
-      end
+      return if @scanner.scan(Regexp.new(Regexp.escape(char)))
+
+      raise_error("Expected #{char.inspect}, got #{@scanner.peek(1).inspect}")
     end
 
     def skip_ws
@@ -453,18 +293,14 @@
     end
 
     def whitespace?(char)
-      char == " " || char == "\t" || char == "\n" || char == "\r"
+      [" ", "\t", "\n", "\r"].include?(char)
     end
 
     def key_ahead?
-      # Check if the next token looks like a key followed by [(=
-      # We need to preserve position
       pos = @scanner.pos
       skip_ws
-
-      # Scan a key
+
       if @scanner.scan(/[0-9A-Za-z_.:-]+/)
-        # Check what follows
         skip_ws
         next_char = @scanner.peek(1)
         result = ["(", "[", "="].include?(next_char)
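The `calculate_location` helper added in this release reduces to counting newlines before the failure offset. The same logic as a standalone function (extracted for illustration; the gem's version reads the position from its `StringScanner`):

```ruby
# Line is 1-based; column restarts at 1 after each newline.
def calculate_location(source, pos)
  consumed = source[0...pos]
  line = consumed.count("\n") + 1
  last_newline = consumed.rindex("\n")
  col = last_newline ? pos - last_newline : pos + 1
  [line, col]
end

calculate_location("a=1\nb=2", 5)
# => [2, 2]
```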
data/lib/cton/encoder.rb CHANGED
@@ -1,27 +1,31 @@
 # frozen_string_literal: true
 
 require "stringio"
+require "time"
+require "date"
 
 module Cton
   class Encoder
-    SAFE_TOKEN = /\A[0-9A-Za-z_.:-]+\z/.freeze
-    NUMERIC_TOKEN = /\A-?(?:\d+)(?:\.\d+)?(?:[eE][+-]?\d+)?\z/.freeze
+    SAFE_TOKEN = /\A[0-9A-Za-z_.:-]+\z/
+    NUMERIC_TOKEN = /\A-?(?:\d+)(?:\.\d+)?(?:[eE][+-]?\d+)?\z/
     RESERVED_LITERALS = %w[true false null].freeze
     FLOAT_DECIMAL_PRECISION = Float::DIG
 
-    def initialize(separator: "\n")
+    def initialize(separator: "\n", pretty: false)
       @separator = separator || ""
+      @pretty = pretty
+      @indent_level = 0
     end
 
-    def encode(payload)
-      @io = StringIO.new
+    def encode(payload, io: nil)
+      @io = io || StringIO.new
       encode_root(payload)
-      @io.string
+      @io.string if @io.is_a?(StringIO)
     end
 
     private
 
-    attr_reader :separator, :io
+    attr_reader :separator, :io, :pretty, :indent_level
 
     def encode_root(value)
       case value
@@ -43,6 +47,12 @@
     end
 
     def encode_value(value, context:)
+      if defined?(Set) && value.is_a?(Set)
+        value = value.to_a
+      elsif defined?(OpenStruct) && value.is_a?(OpenStruct)
+        value = value.to_h
+      end
+
       case value
       when Hash
         encode_object(value)
@@ -61,13 +71,19 @@
       end
 
       io << "("
+      indent if pretty
       first = true
       hash.each do |key, value|
-        io << "," unless first
+        if first
+          first = false
+        else
+          io << ","
+          newline if pretty
+        end
         io << format_key(key) << "="
         encode_value(value, context: :object)
-        first = false
       end
+      outdent if pretty
       io << ")"
     end
 
@@ -98,35 +114,63 @@
       io << header.map { |key| format_key(key) }.join(",")
       io << "}="
 
+      indent if pretty
       first_row = true
       rows.each do |row|
-        io << ";" unless first_row
+        if first_row
+          first_row = false
+        else
+          io << ";"
+          newline if pretty
+        end
+
         first_col = true
         header.each do |field|
           io << "," unless first_col
           encode_scalar(row.fetch(field))
           first_col = false
         end
-        first_row = false
       end
+      outdent if pretty
     end
 
     def encode_scalar_list(list)
-      first = true
-      list.each do |value|
-        io << "," unless first
-        encode_scalar(value)
-        first = false
+      if pretty
+        indent
+        first = true
+        list.each do |value|
+          if first
+            first = false
+          else
+            io << ","
+            newline
+          end
+          encode_scalar(value)
+        end
+        outdent
+      else
+        first = true
+        list.each do |value|
+          io << "," unless first
+          encode_scalar(value)
+          first = false
+        end
       end
     end
 
     def encode_mixed_list(list)
+      indent if pretty
       first = true
       list.each do |value|
-        io << "," unless first
+        if first
+          first = false
+        else
+          io << ","
+          newline if pretty
+        end
         encode_value(value, context: :array)
-        first = false
       end
+      outdent if pretty
     end
 
     def encode_scalar(value)
@@ -139,19 +183,21 @@ module Cton
         io << "null"
       when Numeric
         io << format_number(value)
+      when Time, Date
+        encode_string(value.iso8601)
       else
         raise EncodeError, "Unsupported value: #{value.class}"
       end
     end

     def encode_string(value)
-      if value.empty?
-        io << '""'
-      elsif string_needs_quotes?(value)
-        io << quote_string(value)
-      else
-        io << value
-      end
+      io << if value.empty?
+              '""'
+            elsif string_needs_quotes?(value)
+              quote_string(value)
+            else
+              value
+            end
     end

     def format_number(value)
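The hunk above routes `Time` and `Date` values through `#iso8601` before encoding them as strings. A minimal standalone illustration of that conversion (the sample values are hypothetical, not from the gem's test suite):

```ruby
require "time" # adds Time#iso8601
require "date"

# A UTC timestamp and a calendar date, serialized the way the new
# `when Time, Date` branch does before quoting.
t = Time.utc(2025, 11, 19, 12, 0, 0)
d = Date.new(2025, 11, 19)

t.iso8601 # "2025-11-19T12:00:00Z"
d.iso8601 # "2025-11-19"
```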
@@ -172,7 +218,7 @@ module Cton
     end

     def normalize_decimal_string(string)
-      stripped = string.start_with?("+") ? string[1..-1] : string
+      stripped = string.start_with?("+") ? string[1..] : string
       return "0" if zero_string?(stripped)

       if stripped.include?(".")
@@ -197,14 +243,14 @@ module Cton

     def format_key(key)
       key_string = key.to_s
-      unless SAFE_TOKEN.match?(key_string)
-        raise EncodeError, "Invalid key: #{key_string.inspect}"
-      end
+      raise EncodeError, "Invalid key: #{key_string.inspect}" unless SAFE_TOKEN.match?(key_string)
+
       key_string
     end

     def string_needs_quotes?(value)
       return true unless SAFE_TOKEN.match?(value)
+
       RESERVED_LITERALS.include?(value) || numeric_like?(value)
     end

@@ -229,7 +275,7 @@ module Cton
     end

     def scalar?(value)
-      value.is_a?(String) || value.is_a?(Numeric) || value == true || value == false || value.nil?
+      value.is_a?(String) || value.is_a?(Numeric) || value == true || value == false || value.nil? || value.is_a?(Time) || value.is_a?(Date)
     end

     def table_candidate?(rows)
@@ -243,5 +289,19 @@ module Cton
         row.is_a?(Hash) && row.keys == keys && row.values.all? { |val| scalar?(val) }
       end
     end
+
+    def indent
+      @indent_level += 1
+      newline
+    end
+
+    def outdent
+      @indent_level -= 1
+      newline
+    end
+
+    def newline
+      io << "\n" << (" " * indent_level)
+    end
   end
 end
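The new `indent`/`outdent`/`newline` helpers drive the pretty-printing added in this release: each newline re-emits the current indentation. A standalone sketch of that mechanism, writing to a `StringIO` (the `PrettyWriter` class name and one-space indent unit are assumptions for illustration, matching the `(" " * indent_level)` shown in the diff):

```ruby
require "stringio"

# Minimal sketch: a writer that tracks an indent level and, like the
# encoder's newline helper, prefixes each new line with the current indent.
class PrettyWriter
  def initialize(io)
    @io = io
    @indent_level = 0
  end

  def indent        # deepen, then break the line
    @indent_level += 1
    newline
  end

  def outdent       # shallow out, then break the line
    @indent_level -= 1
    newline
  end

  def newline
    @io << "\n" << (" " * @indent_level)
  end
end

out = StringIO.new
w = PrettyWriter.new(out)
out << "("
w.indent
out << "a=1"
w.outdent
out << ")"
out.string # "(\n a=1\n)"
```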
data/lib/cton/version.rb CHANGED
@@ -1,5 +1,5 @@
 # frozen_string_literal: true

 module Cton
-  VERSION = "0.1.1"
+  VERSION = "0.2.0"
 end
data/lib/cton.rb CHANGED
@@ -12,9 +12,23 @@ module Cton

   module_function

-  def dump(payload, options = {})
+  def dump(payload, *args)
+    io = nil
+    options = {}
+
+    args.each do |arg|
+      if arg.is_a?(Hash)
+        options.merge!(arg)
+      else
+        io = arg
+      end
+    end
+
+    io ||= options[:io]
+
     separator = options.fetch(:separator, "\n")
-    Encoder.new(separator: separator).encode(payload)
+    pretty = options.fetch(:pretty, false)
+    Encoder.new(separator: separator, pretty: pretty).encode(payload, io: io)
   end
   alias generate dump

@@ -23,4 +37,3 @@ module Cton
   end
   alias parse load
 end
-
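The new `Cton.dump` signature dispatches its extra positional arguments by type: a `Hash` is merged into the options, anything else is treated as the destination IO, and the `io:` keyword serves as a fallback. A standalone sketch of that dispatch (the `parse_dump_args` helper name is hypothetical, for illustration only):

```ruby
# Mirrors the argument handling in the new Cton.dump(payload, *args):
# Hash positionals become options, non-Hash positionals become the IO,
# and options[:io] is used only if no positional IO was given.
def parse_dump_args(*args)
  io = nil
  options = {}

  args.each do |arg|
    if arg.is_a?(Hash)
      options.merge!(arg)
    else
      io = arg
    end
  end

  io ||= options[:io]
  [io, options]
end
```

So `dump(payload, file, pretty: true)` and `dump(payload, io: file, pretty: true)` resolve to the same destination, while `dump(payload)` keeps the string-returning behavior (`io` stays `nil`).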
metadata CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: cton
 version: !ruby/object:Gem::Version
-  version: 0.1.1
+  version: 0.2.0
 platform: ruby
 authors:
 - Davide Santangelo
 autorequire:
 bindir: exe
 cert_chain: []
-date: 2025-11-18 00:00:00.000000000 Z
+date: 2025-11-19 00:00:00.000000000 Z
 dependencies: []
 description: CTON provides a JSON-compatible, token-efficient text representation
   optimized for LLM prompts.
@@ -37,6 +37,7 @@ metadata:
   homepage_uri: https://github.com/davidesantangelo/cton
   source_code_uri: https://github.com/davidesantangelo/cton
   changelog_uri: https://github.com/davidesantangelo/cton/blob/master/CHANGELOG.md
+  rubygems_mfa_required: 'true'
 post_install_message:
 rdoc_options: []
 require_paths: