cton 0.1.1 → 0.2.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: '0295922a011dd898278f9f57de2f48f2acbe0ce3363f263f4e2b2753993ebdea'
-  data.tar.gz: 33bb13ee23584ff6cd51bf6bc6c1a869d02f206ff0792a172c9aa7cdc5547977
+  metadata.gz: b010a8f0e0da39e4e4d0a4217eddaa8f9496f1889bf32e12430fdb7737f17fab
+  data.tar.gz: 6fe6f58ff0a40233a279ae5c8881ccca4ce382fa85cae15c2c5e26782bb02875
 SHA512:
-  metadata.gz: 9dff47df67680eabf6fb7ac05dac606e969df0a0d31d575318fa4a72c51c8fe85d38b671f4e2b8b8caa0e969184c044e06547b135b9a5b5b0baa4c3e28232322
-  data.tar.gz: be363392d2305b6940e46060310a908922bbf822ae487e9d4b20441e4cc51e5e8161332cd5d8c0846743663d1251ff7b1b5480f21f36f9b52c0aafb0404f4b74
+  metadata.gz: 3a85563dd205c2c00b204359d85376514de8fc45ce2b2c98e4d52a0325bff2937e2d88ba5e367fe718a0b82127603deadfe16dd6f60062e77a1b75babc666ec4
+  data.tar.gz: b4b27bfb483e0145c49def7b9ab735c27e03420dc59fd6bcaabc57d1b2bf6868d7bc5c55fea9866da3270a6c81126df032590129a1e28385827b8b4f3058e92a
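The SHA256 values above can be checked against a downloaded artifact with Ruby's standard library. A minimal sketch (the `sha256_hex` helper and any file paths are illustrative, not part of the gem):

```ruby
require "digest"

# Hex digest of an in-memory string; Digest::SHA256.file(path).hexdigest
# does the same for a file on disk (e.g. a downloaded data.tar.gz).
def sha256_hex(data)
  Digest::SHA256.hexdigest(data)
end

sha256_hex("hello")
# => "2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824"
```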
data/CHANGELOG.md CHANGED
@@ -5,6 +5,20 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [0.2.0] - 2025-11-19
+
+### Added
+
+- **CLI Tool**: New `bin/cton` executable for converting between JSON and CTON from the command line. Supports auto-detection, pretty printing, and file I/O.
+- **Streaming IO**: `Cton.dump` now accepts an `IO` object as the second argument (or via `io:` keyword), allowing direct writing to files or sockets without intermediate string allocation.
+- **Pretty Printing**: Added `pretty: true` option to `Cton.dump` to format output with indentation and newlines for better readability.
+- **Extended Types**: Native support for `Time`, `Date` (ISO8601), `Set` (as Array), and `OpenStruct` (as Object).
+- **Enhanced Error Reporting**: `ParseError` now includes line and column numbers to help locate syntax errors in large documents.
+
+### Changed
+
+- **Ruby 3 Compatibility**: Improved argument handling in `Cton.dump` to robustly support Ruby 3 keyword arguments when passing hashes.
+
 ## [0.1.1] - 2025-11-18
 
 ### Changed
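The Ruby 3 note under *Changed* concerns the classic positional-vs-keyword hash ambiguity. A hedged sketch of one way to accept an output target either positionally or via `io:` (not the gem's actual implementation; the toy `dump` below only joins `key=value` pairs):

```ruby
require "stringio"

# Stand-in encoder: the target IO may arrive positionally or as io:,
# so a trailing Hash payload is never mistaken for keyword arguments.
def dump(payload, maybe_io = nil, io: nil)
  target = io || maybe_io
  own = target.nil?                 # did we allocate the buffer ourselves?
  target = StringIO.new if own
  target << payload.map { |k, v| "#{k}=#{v}" }.join(",") # toy encoding only
  own ? target.string : target
end

dump({ "a" => 1, "b" => 2 })
# => "a=1,b=2"
```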
data/README.md CHANGED
@@ -1,32 +1,113 @@
 # CTON
 
-CTON (Compact Token-Oriented Notation) is an aggressively minified, JSON-compatible wire format that keeps prompts short without giving up schema hints. It is shape-preserving (objects, arrays, scalars, table-like arrays) and deterministic, so you can safely round-trip between Ruby hashes and compact strings that work well in LLM prompts.
+[![Gem Version](https://badge.fury.io/rb/cton.svg)](https://badge.fury.io/rb/cton)
+[![License](https://img.shields.io/badge/license-MIT-blue.svg)](https://github.com/davidesantangelo/cton/blob/master/LICENSE.txt)
+
+**CTON** (Compact Token-Oriented Notation) is an aggressively minified, JSON-compatible wire format that keeps prompts short without giving up schema hints. It is shape-preserving (objects, arrays, scalars, table-like arrays) and deterministic, so you can safely round-trip between Ruby hashes and compact strings that work well in LLM prompts.
+
+---
+
+## 📖 Table of Contents
+
+- [What is CTON?](#what-is-cton)
+- [Why another format?](#why-another-format)
+- [Examples](#examples)
+- [Token Savings](#token-savings-vs-json--toon)
+- [Installation](#installation)
+- [Usage](#usage)
+- [Development](#development)
+- [Contributing](#contributing)
+- [License](#license)
+
+---
+
+## What is CTON?
+
+CTON is designed to be the most efficient way to represent structured data for Large Language Models (LLMs). It strips away the "syntactic sugar" of JSON that humans like (indentation, excessive quoting, braces) but machines don't strictly need, while adding "structural hints" that help LLMs generate valid output.
+
+### Key Concepts
+
+1. **Root is Implicit**: No curly braces `{}` wrapping the entire document.
+2. **Minimal Punctuation**:
+   * Objects use `key=value`.
+   * Nested objects use parentheses `(key=value)`.
+   * Arrays use brackets with length `[N]=item1,item2`.
+3. **Table Compression**: If an array contains objects with the same keys, CTON automatically converts it into a table format `[N]{header1,header2}=val1,val2;val3,val4`. This is a massive token saver for datasets.
+
+---
+
+## Examples
+
+### Simple Key-Value Pairs
+
+**JSON**
+```json
+{
+  "task": "planning",
+  "urgent": true,
+  "id": 123
+}
+```
+
+**CTON**
+```text
+task=planning,urgent=true,id=123
+```
+
+### Nested Objects
+
+**JSON**
+```json
+{
+  "user": {
+    "name": "Davide",
+    "settings": {
+      "theme": "dark"
+    }
+  }
+}
+```
+
+**CTON**
+```text
+user(name=Davide,settings(theme=dark))
+```
+
+### Arrays and Tables
+
+**JSON**
+```json
+{
+  "tags": ["ruby", "gem", "llm"],
+  "files": [
+    { "name": "README.md", "size": 1024 },
+    { "name": "lib/cton.rb", "size": 2048 }
+  ]
+}
+```
+
+**CTON**
+```text
+tags[3]=ruby,gem,llm
+files[2]{name,size}=README.md,1024;lib/cton.rb,2048
+```
+---
 
 ## Why another format?
 
 - **Less noise than YAML/JSON**: no indentation, no braces around the root, and optional quoting.
 - **Schema guardrails**: arrays carry their length (`friends[3]`) and table headers (`{id,name,...}`) so downstream parsing can verify shape.
 - **LLM-friendly**: works as a single string you can embed in a prompt together with short parsing instructions.
-- **Token savings**: CTON compounds the JSON → TOON savings; see the section below for concrete numbers.
+- **Token savings**: CTON compounds the JSON → TOON savings.
 
-## Token savings vs JSON & TOON
+### Token savings vs JSON & TOON
 
 - **JSON → TOON**: The [TOON benchmarks](https://toonformat.dev) report roughly 40% fewer tokens than plain JSON on mixed-structure prompts while retaining accuracy due to explicit array lengths and headers.
-- **TOON → CTON**: By stripping indentation and forcing everything inline, CTON cuts another ~20–40% of characters. The sample above is ~350 characters as TOON and ~250 as CTON (~29% fewer), and larger tabular datasets show similar reductions.
-- **Net effect**: In practice you can often reclaim 50–60% of the token budget versus raw JSON, leaving more room for instructions or reasoning steps while keeping a deterministic schema.
-
-## Format at a glance
+- **TOON → CTON**: By stripping indentation and forcing everything inline, CTON cuts another ~20–40% of characters.
+- **Net effect**: In practice you can often reclaim **50–60% of the token budget** versus raw JSON, leaving more room for instructions or reasoning steps while keeping a deterministic schema.
 
-```
-context(task="Our favorite hikes together",location=Boulder,season=spring_2025)
-friends[3]=ana,luis,sam
-hikes[3]{id,name,distanceKm,elevationGain,companion,wasSunny}=1,"Blue Lake Trail",7.5,320,ana,true;2,"Ridge Overlook",9.2,540,luis,false;3,"Wildflower Loop",5.1,180,sam,true
-```
-
-- Objects use parentheses and `key=value` pairs separated by commas.
-- Arrays encode their length: `[N]=...`. When every element is a flat hash with the same keys, they collapse into a compact table: `[N]{key1,key2}=row1;row2`.
-- Scalars (numbers, booleans, `null`) keep their JSON text. Strings only need quotes when they contain whitespace or reserved punctuation.
-- For parsing safety the Ruby encoder inserts a single `\n` between top-level segments. You can override this if you truly need a fully inline document (see options below).
+---
 
 ## Installation
 
@@ -42,28 +123,32 @@ Or install it directly:
 gem install cton
 ```
 
+---
+
 ## Usage
 
 ```ruby
 require "cton"
 
 payload = {
-  "context" => {
-    "task" => "Our favorite hikes together",
-    "location" => "Boulder",
-    "season" => "spring_2025"
-  },
-  "friends" => %w[ana luis sam],
-  "hikes" => [
-    { "id" => 1, "name" => "Blue Lake Trail", "distanceKm" => 7.5, "elevationGain" => 320, "companion" => "ana", "wasSunny" => true },
-    { "id" => 2, "name" => "Ridge Overlook", "distanceKm" => 9.2, "elevationGain" => 540, "companion" => "luis", "wasSunny" => false },
-    { "id" => 3, "name" => "Wildflower Loop", "distanceKm" => 5.1, "elevationGain" => 180, "companion" => "sam", "wasSunny" => true }
-  ]
+  "context" => {
+    "task" => "Our favorite hikes together",
+    "location" => "Boulder",
+    "season" => "spring_2025"
+  },
+  "friends" => %w[ana luis sam],
+  "hikes" => [
+    { "id" => 1, "name" => "Blue Lake Trail", "distanceKm" => 7.5, "elevationGain" => 320, "companion" => "ana", "wasSunny" => true },
+    { "id" => 2, "name" => "Ridge Overlook", "distanceKm" => 9.2, "elevationGain" => 540, "companion" => "luis", "wasSunny" => false },
+    { "id" => 3, "name" => "Wildflower Loop", "distanceKm" => 5.1, "elevationGain" => 180, "companion" => "sam", "wasSunny" => true }
+  ]
 }
 
+# Encode to CTON
 cton = Cton.dump(payload)
 # => "context(... )\nfriends[3]=ana,luis,sam\nhikes[3]{...}"
 
+# Decode back to Hash
 round_tripped = Cton.load(cton)
 # => original hash
 
@@ -72,24 +157,55 @@ symbolized = Cton.load(cton, symbolize_names: true)
 # Want a truly inline document? Opt in explicitly (decoding becomes unsafe for ambiguous cases).
 inline = Cton.dump(payload, separator: "")
+
+# Pretty print for human readability
+pretty = Cton.dump(payload, pretty: true)
+
+# Stream to an IO object (file, socket, etc.)
+File.open("data.cton", "w") do |f|
+  Cton.dump(payload, f)
+end
 ```
 
-### Table detection
+### CLI Tool
 
-Whenever an array is made of hashes that all expose the same scalar keys, the encoder flattens it into a table to save tokens. Mixed or nested arrays fall back to `[N]=(value1,value2,...)`.
+CTON comes with a command-line tool for quick conversions:
 
-### Separators & ambiguity
+```bash
+# Convert JSON to CTON
+echo '{"hello": "world"}' | cton
+# => hello=world
 
-Removing every newline makes certain inputs ambiguous because `sam` and the next key `hikes` can merge into `samhikes`. The default `separator: "\n"` avoids that by inserting a single newline between root segments. You may pass `separator: ""` to `Cton.dump` for maximum compactness, but decoding such strings is only safe if you can guarantee extra quoting or whitespace between segments.
+# Convert CTON to JSON
+echo 'hello=world' | cton --to-json
+# => {"hello":"world"}
 
-### Literal safety & number normalization
+# Pretty print
+cton --pretty input.json
+```
 
-Following the TOON specification's guardrails, the encoder now:
+### Advanced Features
+
+#### Extended Types
+CTON natively supports serialization for:
+- `Time` and `Date` (ISO8601 strings)
+- `Set` (converted to Arrays)
+- `OpenStruct` (converted to Objects)
 
+#### Table detection
+Whenever an array is made of hashes that all expose the same scalar keys, the encoder flattens it into a table to save tokens. Mixed or nested arrays fall back to `[N]=(value1,value2,...)`.
+
+#### Separators & ambiguity
+Removing every newline makes certain inputs ambiguous because `sam` and the next key `hikes` can merge into `samhikes`. The default `separator: "\n"` avoids that by inserting a single newline between root segments. You may pass `separator: ""` to `Cton.dump` for maximum compactness, but decoding such strings is only safe if you can guarantee extra quoting or whitespace between segments.
+
+#### Literal safety & number normalization
+Following the TOON specification's guardrails, the encoder now:
 - Auto-quotes strings that would otherwise be parsed as booleans, `null`, or numbers (e.g., `"true"`, `"007"`, `"1e6"`, `"-5"`) so they round-trip as strings without extra work.
 - Canonicalizes float/BigDecimal output: no exponent notation, no trailing zeros, and `-0` collapses to `0`.
 - Converts `NaN` and `±Infinity` inputs to `null`, matching TOON's normalization guidance so downstream decoders don't explode on non-finite numbers.
 
+---
+
 ## Type Safety
 
 CTON ships with RBS signatures (`sig/cton.rbs`) to support type checking and IDE autocompletion.
@@ -110,4 +226,4 @@ Bug reports and pull requests are welcome at https://github.com/davidesantangelo
 
 ## License
 
-MIT © Davide Santangelo
+MIT © [Davide Santangelo](https://github.com/davidesantangelo)
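The table compression the README describes can be illustrated with a simplified re-implementation in plain Ruby (a sketch only: the real encoder's quoting, escaping, and type handling are omitted, and `encode_table` is not part of the gem's public API):

```ruby
# Simplified sketch of CTON table compression for an array of uniform
# hashes: one shared header, rows joined with ";" and cells with ",".
def encode_table(key, rows)
  header = rows.first.keys
  return nil unless rows.all? { |r| r.keys == header }

  body = rows.map { |r| header.map { |h| r[h] }.join(",") }.join(";")
  "#{key}[#{rows.length}]{#{header.join(',')}}=#{body}"
end

files = [
  { "name" => "README.md", "size" => 1024 },
  { "name" => "lib/cton.rb", "size" => 2048 }
]
encode_table("files", files)
# => "files[2]{name,size}=README.md,1024;lib/cton.rb,2048"
```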
data/lib/cton/decoder.rb CHANGED
@@ -21,7 +21,7 @@ module Cton
       end
 
       skip_ws
-      raise ParseError, "Unexpected trailing data" unless @scanner.eos?
+      raise_error("Unexpected trailing data") unless @scanner.eos?
 
       value
     end
@@ -30,6 +30,20 @@
 
     attr_reader :symbolize_names, :scanner
 
+    def raise_error(message)
+      line, col = calculate_location(@scanner.pos)
+      raise ParseError, "#{message} at line #{line}, column #{col}"
+    end
+
+    def calculate_location(pos)
+      string = @scanner.string
+      consumed = string[0...pos]
+      line = consumed.count("\n") + 1
+      last_newline = consumed.rindex("\n")
+      col = last_newline ? pos - last_newline : pos + 1
+      [line, col]
+    end
+
     def parse_document
       result = {}
       until @scanner.eos?
@@ -43,22 +57,20 @@
 
     def parse_value_for_key
       skip_ws
-      if @scanner.scan(/\(/)
+      if @scanner.scan("(")
         parse_object
-      elsif @scanner.scan(/\[/)
+      elsif @scanner.scan("[")
         parse_array
-      elsif @scanner.scan(/=/)
+      elsif @scanner.scan("=")
         parse_scalar(allow_key_boundary: true)
       else
-        raise ParseError, "Unexpected token at position #{@scanner.pos}"
+        raise_error("Unexpected token")
       end
     end
 
     def parse_object
       skip_ws
-      if @scanner.scan(/\)/)
-        return {}
-      end
+      return {} if @scanner.scan(")")
 
       pairs = {}
       loop do
@@ -67,7 +79,8 @@
         value = parse_value
         pairs[key] = value
         skip_ws
-        break if @scanner.scan(/\)/)
+        break if @scanner.scan(")")
+
         expect!(",")
         skip_ws
       end
@@ -92,7 +105,8 @@
       fields = []
       loop do
         fields << parse_key_name
-        break if @scanner.scan(/\}/)
+        break if @scanner.scan("}")
+
        expect!(",")
      end
      fields
@@ -125,9 +139,9 @@
 
     def parse_value(allow_key_boundary: false)
       skip_ws
-      if @scanner.scan(/\(/)
+      if @scanner.scan("(")
         parse_object
-      elsif @scanner.scan(/\[/)
+      elsif @scanner.scan("[")
         parse_array
       elsif @scanner.peek(1) == '"'
         parse_string
@@ -140,101 +154,40 @@
       skip_ws
       return parse_string if @scanner.peek(1) == '"'
 
-      start_pos = @scanner.pos
-
-      # If we allow key boundary, we need to be careful not to consume the next key
-      # This is the tricky part. The original implementation scanned ahead.
-      # With StringScanner, we can scan until a terminator or whitespace.
-
+      @scanner.pos
+
       token = if allow_key_boundary
                 scan_until_boundary_or_terminator
               else
                 scan_until_terminator
               end
 
-      raise ParseError, "Empty value at #{start_pos}" if token.nil? || token.empty?
+      raise_error("Empty value") if token.nil? || token.empty?
 
       convert_scalar(token)
     end
 
     def scan_until_terminator
-      # Scan until we hit a terminator char, whitespace, or structure char
-      # Terminators: , ; ) ] }
-      # Structure: ( [ {
-      # Whitespace
-
       @scanner.scan(/[^,;\]\}\)\(\[\{\s]+/)
     end
 
     def scan_until_boundary_or_terminator
-      # This is complex because "key=" looks like a scalar "key" followed by "="
-      # But "value" followed by "key=" means "value" ends before "key".
-      # The original logic used `next_key_index`.
-
-      # Let's try to replicate the logic:
-      # Scan characters that are safe for keys/values.
-      # If we see something that looks like a key start, check if it is followed by [(=
-
       start_pos = @scanner.pos
-
-      # Fast path: scan until something interesting happens
+
       chunk = @scanner.scan(/[0-9A-Za-z_.:-]+/)
       return nil unless chunk
-
-      # Now we might have consumed too much if the chunk contains a key.
-      # e.g. "valuekey=" -> chunk is "valuekey"
-      # We need to check if there is a split point inside `chunk` or if `chunk` itself is followed by [(=
-
-      # Actually, the original logic was:
-      # Find the *first* position where a valid key starts AND is followed by [(=
-
-      # Let's re-implement `next_key_index` logic but using the scanner's string
-
-      rest_of_string = @scanner.string[@scanner.pos..-1]
-      # But we also need to consider the chunk we just scanned?
-      # No, `scan_until_boundary_or_terminator` is called when we are at the start of a scalar.
-
-      # Let's reset and do it properly.
-      @scanner.pos = start_pos
-
-      full_scalar = scan_until_terminator
-      return nil unless full_scalar
-
-      # Now check if `full_scalar` contains a key boundary
-      # A key boundary is a substring that matches SAFE_TOKEN and is followed by [(=
-
-      # We need to look at `full_scalar` + whatever follows (whitespace?) + [(=
-      # But `scan_until_terminator` stops at whitespace.
-
-      # If `full_scalar` is "valuekey", and next char is "=", then "key" is the key.
-      # But wait, "value" and "key" must be separated?
-      # In CTON, "valuekey=..." is ambiguous if no separator.
-      # The README says: "Removing every newline makes certain inputs ambiguous... The default separator avoids that... You may pass separator: ''... decoding such strings is only safe if you can guarantee extra quoting or whitespace".
-
-      # So if we are in `allow_key_boundary` mode (top level), we must look for embedded keys.
-
-      # Let's look for the pattern inside the text we just consumed + lookahead.
-      # Actually, the original `next_key_index` scanned from the current position.
-
-      # Let's implement a helper that searches for the boundary in the remaining string
-      # starting from `start_pos`.
-
+
       boundary_idx = find_key_boundary(start_pos)
-
+
       if boundary_idx
-        # We found a boundary at `boundary_idx`.
-        # The scalar ends at `boundary_idx`.
         length = boundary_idx - start_pos
         @scanner.pos = start_pos
         token = @scanner.peek(length)
         @scanner.pos += length
         token
       else
-        # No boundary found, so the whole thing we scanned is the token
-        # We already scanned it into `full_scalar` but we need to put the scanner in the right place.
-        # Wait, I reset the scanner.
-        @scanner.pos = start_pos + full_scalar.length
-        full_scalar
+        @scanner.pos = start_pos + chunk.length
+        chunk
       end
     end
 
@@ -242,135 +195,24 @@ module Cton
       str = @scanner.string
       len = str.length
       idx = from_index
-
-      # We are looking for a sequence that matches SAFE_KEY followed by [(=
-      # But we are currently parsing a scalar.
-
-      # Optimization: we only care about boundaries that appear *before* any terminator/whitespace.
-      # Because if we hit a terminator/whitespace, the scalar ends anyway.
-
-      # So we only need to check inside the `scan_until_terminator` range?
-      # No, because "valuekey=" has no terminator/whitespace between value and key.
-
+
       while idx < len
         char = str[idx]
-
-        # If we hit a terminator or whitespace, we stop looking for boundaries
-        # because the scalar naturally ends here.
-        if TERMINATORS.include?(char) || whitespace?(char) || "([{".include?(char)
-          return nil
-        end
-
-        # Check if a key starts here
+
+        return nil if TERMINATORS.include?(char) || whitespace?(char) || "([{".include?(char)
+
         if safe_key_char?(char)
-          # Check if this potential key is followed by [(=
-          # We need to scan this potential key
           key_end = idx
-          while key_end < len && safe_key_char?(str[key_end])
-            key_end += 1
-          end
-
-          # Check what follows
+          key_end += 1 while key_end < len && safe_key_char?(str[key_end])
+
           next_char_idx = key_end
-          # Skip whitespace after key? No, keys are immediately followed by [(= usually?
-          # The original `next_key_index` did NOT skip whitespace after the key candidate.
-          # "next_char = @source[idx]" (where idx is after key)
-
+
           if next_char_idx < len
-            next_char = str[next_char_idx]
-            if ["(", "[", "="].include?(next_char)
-              # Found a boundary!
-              # But wait, is this the *start* of the scalar?
-              # If idx == from_index, then the scalar IS the key? No, that means we are at the start.
-              # If we are at the start, and it looks like a key, then it IS a key, so we should have parsed it as a key?
-              # No, `parse_scalar` is called when we expect a value.
-              # If we are parsing a document "key=valuekey2=value2", we are parsing "valuekey2".
-              # "key2" is the next key. So "value" is the scalar.
-              # So if idx > from_index, we found a split.
-
-              return idx if idx > from_index
-            end
+            next_char = str[next_char_idx]
+            return idx if ["(", "[", "="].include?(next_char) && (idx > from_index)
           end
-
-          # If not a boundary, we continue scanning from inside the key?
-          # "valuekey=" -> at 'k', key is "key", followed by '=', so split at 'k'.
-          # "valukey=" -> at 'l', key is "lukey", followed by '=', so split at 'l'.
-          # This seems to imply we should check every position?
-          # The original code:
-          # if safe_key_char?(char)
-          #   start = idx
-          #   idx += 1 while ...
-          #   if start > from_index && ... return start
-          #   idx = start + 1 <-- This is important! It backtracks to check nested keys.
-          #   next
-
-          # Yes, we need to check every position.
-
-          # Optimization: The key must end at `key_end`.
-          # If `str[key_end]` is not [(=, then this `key_candidate` is not a key.
-          # But maybe a suffix of it is?
-          # e.g. "abc=" -> "abc" followed by "=". Split at start? No.
-          # "a" followed by "bc="? No.
-
-          # Actually, if we find a valid key char, we scan to the end of the valid key chars.
-          # Let's say we have "abc=def".
-          # At 'a': key is "abc". Next is "=". "abc" is a key.
-          # If we are at start (from_index), then the whole thing is a key?
-          # But we are parsing a scalar.
-          # If `parse_scalar` sees "abc=", and `allow_key_boundary` is true.
-          # Does it mean "abc" is the scalar? Or "abc" is the next key?
-          # If "abc" is the next key, then the scalar before it is empty?
-          # "key=abc=def" -> key="key", value="abc", next_key="def"? No.
-          # "key=value next=val" -> value="value", next="next".
-          # "key=valuenext=val" -> value="value", next="next".
-
-          # So if we find a key boundary at `idx`, it means the scalar ends at `idx`.
-
-          # Let's stick to the original logic:
-          # Scan the maximal sequence of safe chars.
-          # If it is followed by [(=, then it IS a key.
-          # If it starts after `from_index`, then we found the boundary.
-          # If it starts AT `from_index`, then... what?
-          # If we are parsing a scalar, and we see "key=...", then the scalar is empty?
-          # That shouldn't happen if we called `parse_scalar`.
-          # Unless `parse_document` called `parse_value_for_key` -> `parse_scalar`.
-          # But `parse_document` calls `parse_key_name` first.
-          # So we are inside `parse_value`.
-
-          # Example: "a=1b=2".
-          # parse "a", expect "=", parse value.
-          # value starts at "1".
-          # "1" is safe char. "1b" is safe.
-          # "b" is safe.
-          # At "1": max key is "1b". Next is "=". "1b" is a key? Yes.
-          # Is "1b" followed by "="? Yes.
-          # Does it start > from_index? "1" is at from_index. No.
-          # So "1b" is NOT a boundary.
-          # Continue to next char "b".
-          # At "b": max key is "b". Next is "=". "b" is a key.
-          # Does it start > from_index? Yes ("b" index > "1" index).
-          # So boundary is at "b".
-          # Scalar is "1".
-
-          # So the logic is:
-          # For each char at `idx`:
-          #   If it can start a key:
-          #     Find end of key `end_key`.
-          #     If `str[end_key]` is [(= :
-          #       If `idx > from_index`: return `idx`.
-          #   idx += 1
-
-          # But wait, "1b" was a key candidate.
-          # If we advanced `idx` to `end_key`, we would skip "b".
-          # So we must NOT advance `idx` to `end_key` blindly.
-          # We must check `idx`, then `idx+1`, etc.
-
-          # But `safe_key_char?` is true for all chars in "1b".
-          # So we check "1...", then "b...".
-
-          # Correct.
         end
-
+
         idx += 1
       end
       nil
@@ -396,22 +238,20 @@
       expect!("\"")
       buffer = +""
       loop do
-        if @scanner.eos?
-          raise ParseError, "Unterminated string"
-        end
-
+        raise_error("Unterminated string") if @scanner.eos?
+
         char = @scanner.getch
-
-        if char == '\\'
+
+        if char == "\\"
           escaped = @scanner.getch
-          raise ParseError, "Invalid escape sequence" if escaped.nil?
+          raise_error("Invalid escape sequence") if escaped.nil?
           buffer << case escaped
-                    when 'n' then "\n"
-                    when 'r' then "\r"
-                    when 't' then "\t"
-                    when '"', '\\' then escaped
+                    when "n" then "\n"
+                    when "r" then "\r"
+                    when "t" then "\t"
+                    when '"', "\\" then escaped
                     else
-                      raise ParseError, "Unsupported escape sequence"
+                      raise_error("Unsupported escape sequence")
                     end
         elsif char == '"'
           break
@@ -425,16 +265,16 @@
     def parse_key_name
       skip_ws
       token = @scanner.scan(/[0-9A-Za-z_.:-]+/)
-      raise ParseError, "Invalid key" if token.nil?
+      raise_error("Invalid key") if token.nil?
       symbolize_names ? token.to_sym : token
     end
 
     def parse_integer_literal
       token = @scanner.scan(/-?\d+/)
-      raise ParseError, "Expected digits" if token.nil?
+      raise_error("Expected digits") if token.nil?
       Integer(token, 10)
     rescue ArgumentError
-      raise ParseError, "Invalid length literal"
+      raise_error("Invalid length literal")
     end
 
     def symbolize_keys(row)
@@ -443,9 +283,9 @@
     end
 
     def expect!(char)
       skip_ws
-      unless @scanner.scan(Regexp.new(Regexp.escape(char)))
-        raise ParseError, "Expected #{char.inspect}, got #{@scanner.peek(1).inspect}"
-      end
+      return if @scanner.scan(Regexp.new(Regexp.escape(char)))
+
+      raise_error("Expected #{char.inspect}, got #{@scanner.peek(1).inspect}")
     end
 
     def skip_ws
@@ -453,18 +293,14 @@
     end
 
     def whitespace?(char)
-      char == " " || char == "\t" || char == "\n" || char == "\r"
+      [" ", "\t", "\n", "\r"].include?(char)
     end
 
     def key_ahead?
-      # Check if the next token looks like a key followed by [(=
-      # We need to preserve position
       pos = @scanner.pos
       skip_ws
-
-      # Scan a key
+
       if @scanner.scan(/[0-9A-Za-z_.:-]+/)
-        # Check what follows
         skip_ws
         next_char = @scanner.peek(1)
         result = ["(", "[", "="].include?(next_char)
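The `calculate_location` helper added in this release reduces to counting newlines before the failure offset. The same logic as a standalone function (extracted for illustration; the gem's version reads the position from its `StringScanner`):

```ruby
# Line is 1-based; column restarts at 1 after each newline.
def calculate_location(source, pos)
  consumed = source[0...pos]
  line = consumed.count("\n") + 1
  last_newline = consumed.rindex("\n")
  col = last_newline ? pos - last_newline : pos + 1
  [line, col]
end

calculate_location("a=1\nb=2", 5)
# => [2, 2]
```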
data/lib/cton/encoder.rb CHANGED
@@ -1,27 +1,31 @@
 # frozen_string_literal: true
 
 require "stringio"
+require "time"
+require "date"
 
 module Cton
   class Encoder
-    SAFE_TOKEN = /\A[0-9A-Za-z_.:-]+\z/.freeze
-    NUMERIC_TOKEN = /\A-?(?:\d+)(?:\.\d+)?(?:[eE][+-]?\d+)?\z/.freeze
+    SAFE_TOKEN = /\A[0-9A-Za-z_.:-]+\z/
+    NUMERIC_TOKEN = /\A-?(?:\d+)(?:\.\d+)?(?:[eE][+-]?\d+)?\z/
     RESERVED_LITERALS = %w[true false null].freeze
     FLOAT_DECIMAL_PRECISION = Float::DIG
 
-    def initialize(separator: "\n")
+    def initialize(separator: "\n", pretty: false)
       @separator = separator || ""
+      @pretty = pretty
+      @indent_level = 0
     end
 
-    def encode(payload)
-      @io = StringIO.new
+    def encode(payload, io: nil)
+      @io = io || StringIO.new
       encode_root(payload)
-      @io.string
+      @io.string if @io.is_a?(StringIO)
     end
 
     private
 
-    attr_reader :separator, :io
+    attr_reader :separator, :io, :pretty, :indent_level
 
     def encode_root(value)
       case value
@@ -43,6 +47,12 @@
     end
 
     def encode_value(value, context:)
+      if defined?(Set) && value.is_a?(Set)
+        value = value.to_a
+      elsif defined?(OpenStruct) && value.is_a?(OpenStruct)
+        value = value.to_h
+      end
+
       case value
       when Hash
         encode_object(value)
@@ -61,13 +71,19 @@
       end
 
       io << "("
+      indent if pretty
       first = true
       hash.each do |key, value|
-        io << "," unless first
+        if first
+          first = false
+        else
+          io << ","
+          newline if pretty
+        end
         io << format_key(key) << "="
         encode_value(value, context: :object)
-        first = false
       end
+      outdent if pretty
       io << ")"
     end
 
@@ -98,35 +114,63 @@
       io << header.map { |key| format_key(key) }.join(",")
       io << "}="
 
+      indent if pretty
       first_row = true
       rows.each do |row|
-        io << ";" unless first_row
+        if first_row
+          first_row = false
+        else
+          io << ";"
+          newline if pretty
+        end
+
         first_col = true
         header.each do |field|
           io << "," unless first_col
           encode_scalar(row.fetch(field))
           first_col = false
         end
-        first_row = false
       end
+      outdent if pretty
     end
 
     def encode_scalar_list(list)
-      first = true
-      list.each do |value|
-        io << "," unless first
-        encode_scalar(value)
-        first = false
+      if pretty
+        indent
+        first = true
+        list.each do |value|
+          if first
+            first = false
+          else
+            io << ","
+            newline
+          end
+          encode_scalar(value)
+        end
+        outdent
+      else
+        first = true
+        list.each do |value|
+          io << "," unless first
+          encode_scalar(value)
+          first = false
+        end
       end
     end
 
     def encode_mixed_list(list)
+      indent if pretty
       first = true
       list.each do |value|
-        io << "," unless first
+        if first
+          first = false
+        else
+          io << ","
+          newline if pretty
+        end
         encode_value(value, context: :array)
-        first = false
       end
+      outdent if pretty
     end
 
     def encode_scalar(value)
@@ -139,19 +183,21 @@ module Cton
         io << "null"
       when Numeric
         io << format_number(value)
+      when Time, Date
+        encode_string(value.iso8601)
       else
         raise EncodeError, "Unsupported value: #{value.class}"
       end
     end

     def encode_string(value)
-      if value.empty?
-        io << '""'
-      elsif string_needs_quotes?(value)
-        io << quote_string(value)
-      else
-        io << value
-      end
+      io << if value.empty?
+              '""'
+            elsif string_needs_quotes?(value)
+              quote_string(value)
+            else
+              value
+            end
     end

     def format_number(value)
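The hunk above routes `Time` and `Date` values through `#iso8601` before encoding them as strings. A minimal standalone illustration of that conversion (the sample values are hypothetical, not from the gem's test suite):

```ruby
require "time" # adds Time#iso8601
require "date"

# A UTC timestamp and a calendar date, serialized the way the new
# `when Time, Date` branch does before quoting.
t = Time.utc(2025, 11, 19, 12, 0, 0)
d = Date.new(2025, 11, 19)

t.iso8601 # "2025-11-19T12:00:00Z"
d.iso8601 # "2025-11-19"
```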
@@ -172,7 +218,7 @@ module Cton
     end

     def normalize_decimal_string(string)
-      stripped = string.start_with?("+") ? string[1..-1] : string
+      stripped = string.start_with?("+") ? string[1..] : string
       return "0" if zero_string?(stripped)

       if stripped.include?(".")
@@ -197,14 +243,14 @@ module Cton

     def format_key(key)
       key_string = key.to_s
-      unless SAFE_TOKEN.match?(key_string)
-        raise EncodeError, "Invalid key: #{key_string.inspect}"
-      end
+      raise EncodeError, "Invalid key: #{key_string.inspect}" unless SAFE_TOKEN.match?(key_string)
+
       key_string
     end

     def string_needs_quotes?(value)
       return true unless SAFE_TOKEN.match?(value)
+
       RESERVED_LITERALS.include?(value) || numeric_like?(value)
     end

@@ -229,7 +275,7 @@ module Cton
     end

     def scalar?(value)
-      value.is_a?(String) || value.is_a?(Numeric) || value == true || value == false || value.nil?
+      value.is_a?(String) || value.is_a?(Numeric) || value == true || value == false || value.nil? || value.is_a?(Time) || value.is_a?(Date)
     end

     def table_candidate?(rows)
@@ -243,5 +289,19 @@ module Cton
         row.is_a?(Hash) && row.keys == keys && row.values.all? { |val| scalar?(val) }
       end
     end
+
+    def indent
+      @indent_level += 1
+      newline
+    end
+
+    def outdent
+      @indent_level -= 1
+      newline
+    end
+
+    def newline
+      io << "\n" << (" " * indent_level)
+    end
   end
 end
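The new `indent`/`outdent`/`newline` helpers drive the pretty-printing added in this release: each newline re-emits the current indentation. A standalone sketch of that mechanism, writing to a `StringIO` (the `PrettyWriter` class name and one-space indent unit are assumptions for illustration, matching the `(" " * indent_level)` shown in the diff):

```ruby
require "stringio"

# Minimal sketch: a writer that tracks an indent level and, like the
# encoder's newline helper, prefixes each new line with the current indent.
class PrettyWriter
  def initialize(io)
    @io = io
    @indent_level = 0
  end

  def indent        # deepen, then break the line
    @indent_level += 1
    newline
  end

  def outdent       # shallow out, then break the line
    @indent_level -= 1
    newline
  end

  def newline
    @io << "\n" << (" " * @indent_level)
  end
end

out = StringIO.new
w = PrettyWriter.new(out)
out << "("
w.indent
out << "a=1"
w.outdent
out << ")"
out.string # "(\n a=1\n)"
```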
data/lib/cton/version.rb CHANGED
@@ -1,5 +1,5 @@
 # frozen_string_literal: true

 module Cton
-  VERSION = "0.1.1"
+  VERSION = "0.2.0"
 end
data/lib/cton.rb CHANGED
@@ -12,9 +12,23 @@ module Cton

   module_function

-  def dump(payload, options = {})
+  def dump(payload, *args)
+    io = nil
+    options = {}
+
+    args.each do |arg|
+      if arg.is_a?(Hash)
+        options.merge!(arg)
+      else
+        io = arg
+      end
+    end
+
+    io ||= options[:io]
+
     separator = options.fetch(:separator, "\n")
-    Encoder.new(separator: separator).encode(payload)
+    pretty = options.fetch(:pretty, false)
+    Encoder.new(separator: separator, pretty: pretty).encode(payload, io: io)
   end
   alias generate dump

@@ -23,4 +37,3 @@ module Cton
   end
   alias parse load
 end
-
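The new `Cton.dump` signature dispatches its extra positional arguments by type: a `Hash` is merged into the options, anything else is treated as the destination IO, and the `io:` keyword serves as a fallback. A standalone sketch of that dispatch (the `parse_dump_args` helper name is hypothetical, for illustration only):

```ruby
# Mirrors the argument handling in the new Cton.dump(payload, *args):
# Hash positionals become options, non-Hash positionals become the IO,
# and options[:io] is used only if no positional IO was given.
def parse_dump_args(*args)
  io = nil
  options = {}

  args.each do |arg|
    if arg.is_a?(Hash)
      options.merge!(arg)
    else
      io = arg
    end
  end

  io ||= options[:io]
  [io, options]
end
```

So `dump(payload, file, pretty: true)` and `dump(payload, io: file, pretty: true)` resolve to the same destination, while `dump(payload)` keeps the string-returning behavior (`io` stays `nil`).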
metadata CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: cton
 version: !ruby/object:Gem::Version
-  version: 0.1.1
+  version: 0.2.0
 platform: ruby
 authors:
 - Davide Santangelo
 autorequire:
 bindir: exe
 cert_chain: []
-date: 2025-11-18 00:00:00.000000000 Z
+date: 2025-11-19 00:00:00.000000000 Z
 dependencies: []
 description: CTON provides a JSON-compatible, token-efficient text representation
   optimized for LLM prompts.
@@ -37,6 +37,7 @@ metadata:
   homepage_uri: https://github.com/davidesantangelo/cton
   source_code_uri: https://github.com/davidesantangelo/cton
   changelog_uri: https://github.com/davidesantangelo/cton/blob/master/CHANGELOG.md
+  rubygems_mfa_required: 'true'
 post_install_message:
 rdoc_options: []
 require_paths: