cton 0.1.0 → 0.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: d011537d6c7b854d8ffffb3dc1e7848534a7919c12f4c1cc4ba84db338cb1669
-   data.tar.gz: c96a91aae6acc9dd37e3df0b8e0fe28acfbdb6c20ea1f22a946676962f7bf7ac
+   metadata.gz: b010a8f0e0da39e4e4d0a4217eddaa8f9496f1889bf32e12430fdb7737f17fab
+   data.tar.gz: 6fe6f58ff0a40233a279ae5c8881ccca4ce382fa85cae15c2c5e26782bb02875
  SHA512:
-   metadata.gz: 77be2aa2db0a728eaa6835be6a66456fefcd12ca27f32d3fbb3a6d77d91f4b109eca64eb958f24bb63b7ae2e193f389029fc6f1986cd590b708da3e3820b288e
-   data.tar.gz: b09740b530363e012c7f00ba6b33aaec366503310756e328a00d9def849fcb13158e911265cf57dcf6a8b613be6cd3e0e6bf164e1cfa480983b73c29a53e8cbb
+   metadata.gz: 3a85563dd205c2c00b204359d85376514de8fc45ce2b2c98e4d52a0325bff2937e2d88ba5e367fe718a0b82127603deadfe16dd6f60062e77a1b75babc666ec4
+   data.tar.gz: b4b27bfb483e0145c49def7b9ab735c27e03420dc59fd6bcaabc57d1b2bf6868d7bc5c55fea9866da3270a6c81126df032590129a1e28385827b8b4f3058e92a
data/.rubocop.yml CHANGED
@@ -1,4 +1,5 @@
  AllCops:
+   NewCops: enable
    TargetRubyVersion: 3.1
 
  Style/StringLiterals:
@@ -6,3 +7,15 @@ Style/StringLiterals:
 
  Style/StringLiteralsInInterpolation:
    EnforcedStyle: double_quotes
+
+ Style/FrozenStringLiteralComment:
+   Enabled: true
+
+ Metrics/MethodLength:
+   Max: 25
+
+ Metrics/ClassLength:
+   Max: 200
+
+ Layout/LineLength:
+   Max: 120
data/CHANGELOG.md CHANGED
@@ -5,6 +5,37 @@ All notable changes to this project will be documented in this file.
  The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
  and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+ ## [0.2.0] - 2025-11-19
+
+ ### Added
+
+ - **CLI Tool**: New `bin/cton` executable for converting between JSON and CTON from the command line. Supports auto-detection, pretty printing, and file I/O.
+ - **Streaming IO**: `Cton.dump` now accepts an `IO` object as the second argument (or via the `io:` keyword), allowing direct writing to files or sockets without intermediate string allocation.
+ - **Pretty Printing**: Added a `pretty: true` option to `Cton.dump` to format output with indentation and newlines for better readability.
+ - **Extended Types**: Native support for `Time`, `Date` (ISO8601), `Set` (as Array), and `OpenStruct` (as Object).
+ - **Enhanced Error Reporting**: `ParseError` now includes line and column numbers to help locate syntax errors in large documents.
+
+ ### Changed
+
+ - **Ruby 3 Compatibility**: Improved argument handling in `Cton.dump` to robustly support Ruby 3 keyword arguments when passing hashes.
+
+ ## [0.1.1] - 2025-11-18
+
+ ### Changed
+
+ - **Performance**: Refactored `Encoder` to use `StringIO` and `Decoder` to use `StringScanner` for significantly improved performance and memory usage.
+ - **Architecture**: Split the `Cton` module into dedicated `Cton::Encoder` and `Cton::Decoder` classes for better maintainability.
+
+ ### Fixed
+
+ - **Parsing**: Fixed an issue where unterminated strings were not correctly detected.
+ - **Whitespace**: Improved whitespace handling in the decoder, specifically fixing issues with whitespace between keys and structure markers.
+
+ ### Added
+
+ - **Type Safety**: Added comprehensive RBS signatures (`sig/cton.rbs`) for better IDE support and static analysis.
+ - **Tests**: Expanded test coverage for validation, complex tables, mixed arrays, unicode values, and error cases.
+
  ## [0.1.0] - 2025-11-18
 
  ### Added
data/README.md CHANGED
@@ -1,32 +1,113 @@
  # CTON
 
- CTON (Compact Token-Oriented Notation) is an aggressively minified, JSON-compatible wire format that keeps prompts short without giving up schema hints. It is shape-preserving (objects, arrays, scalars, table-like arrays) and deterministic, so you can safely round-trip between Ruby hashes and compact strings that work well in LLM prompts.
+ [![Gem Version](https://badge.fury.io/rb/cton.svg)](https://badge.fury.io/rb/cton)
+ [![License](https://img.shields.io/badge/license-MIT-blue.svg)](https://github.com/davidesantangelo/cton/blob/master/LICENSE.txt)
+
+ **CTON** (Compact Token-Oriented Notation) is an aggressively minified, JSON-compatible wire format that keeps prompts short without giving up schema hints. It is shape-preserving (objects, arrays, scalars, table-like arrays) and deterministic, so you can safely round-trip between Ruby hashes and compact strings that work well in LLM prompts.
+
+ ---
+
+ ## 📖 Table of Contents
+
+ - [What is CTON?](#what-is-cton)
+ - [Why another format?](#why-another-format)
+ - [Examples](#examples)
+ - [Token Savings](#token-savings-vs-json--toon)
+ - [Installation](#installation)
+ - [Usage](#usage)
+ - [Development](#development)
+ - [Contributing](#contributing)
+ - [License](#license)
+
+ ---
+
+ ## What is CTON?
+
+ CTON is designed to be the most efficient way to represent structured data for Large Language Models (LLMs). It strips away the "syntactic sugar" of JSON that humans like (indentation, excessive quoting, braces) but machines don't strictly need, while adding structural hints that help LLMs generate valid output.
+
+ ### Key Concepts
+
+ 1. **Root is Implicit**: No curly braces `{}` wrapping the entire document.
+ 2. **Minimal Punctuation**:
+    * Objects use `key=value`.
+    * Nested objects use parentheses `(key=value)`.
+    * Arrays use brackets with length: `[N]=item1,item2`.
+ 3. **Table Compression**: If an array contains objects with the same keys, CTON automatically converts it into a table format `[N]{header1,header2}=val1,val2;val3,val4`. This is a massive token saver for datasets.
+
+ ---
+
+ ## Examples
+
+ ### Simple Key-Value Pairs
+
+ **JSON**
+ ```json
+ {
+   "task": "planning",
+   "urgent": true,
+   "id": 123
+ }
+ ```
+
+ **CTON**
+ ```text
+ task=planning,urgent=true,id=123
+ ```
+
+ ### Nested Objects
+
+ **JSON**
+ ```json
+ {
+   "user": {
+     "name": "Davide",
+     "settings": {
+       "theme": "dark"
+     }
+   }
+ }
+ ```
+
+ **CTON**
+ ```text
+ user(name=Davide,settings(theme=dark))
+ ```
+
+ ### Arrays and Tables
+
+ **JSON**
+ ```json
+ {
+   "tags": ["ruby", "gem", "llm"],
+   "files": [
+     { "name": "README.md", "size": 1024 },
+     { "name": "lib/cton.rb", "size": 2048 }
+   ]
+ }
+ ```
+
+ **CTON**
+ ```text
+ tags[3]=ruby,gem,llm
+ files[2]{name,size}=README.md,1024;lib/cton.rb,2048
+ ```
+
+ ---
 
  ## Why another format?
 
  - **Less noise than YAML/JSON**: no indentation, no braces around the root, and optional quoting.
  - **Schema guardrails**: arrays carry their length (`friends[3]`) and table headers (`{id,name,...}`) so downstream parsing can verify shape.
  - **LLM-friendly**: works as a single string you can embed in a prompt together with short parsing instructions.
- - **Token savings**: CTON compounds the JSON → TOON savings; see the section below for concrete numbers.
+ - **Token savings**: CTON compounds the JSON → TOON savings.
 
- ## Token savings vs JSON & TOON
+ ### Token savings vs JSON & TOON
 
  - **JSON → TOON**: The [TOON benchmarks](https://toonformat.dev) report roughly 40% fewer tokens than plain JSON on mixed-structure prompts while retaining accuracy due to explicit array lengths and headers.
- - **TOON → CTON**: By stripping indentation and forcing everything inline, CTON cuts another ~20–40% of characters. The sample above is ~350 characters as TOON and ~250 as CTON (~29% fewer), and larger tabular datasets show similar reductions.
- - **Net effect**: In practice you can often reclaim 50–60% of the token budget versus raw JSON, leaving more room for instructions or reasoning steps while keeping a deterministic schema.
+ - **TOON → CTON**: By stripping indentation and forcing everything inline, CTON cuts another ~20–40% of characters.
+ - **Net effect**: In practice you can often reclaim **50–60% of the token budget** versus raw JSON, leaving more room for instructions or reasoning steps while keeping a deterministic schema.
 
- ## Format at a glance
-
- ```
- context(task="Our favorite hikes together",location=Boulder,season=spring_2025)
- friends[3]=ana,luis,sam
- hikes[3]{id,name,distanceKm,elevationGain,companion,wasSunny}=1,"Blue Lake Trail",7.5,320,ana,true;2,"Ridge Overlook",9.2,540,luis,false;3,"Wildflower Loop",5.1,180,sam,true
- ```
-
- - Objects use parentheses and `key=value` pairs separated by commas.
- - Arrays encode their length: `[N]=...`. When every element is a flat hash with the same keys, they collapse into a compact table: `[N]{key1,key2}=row1;row2`.
- - Scalars (numbers, booleans, `null`) keep their JSON text. Strings only need quotes when they contain whitespace or reserved punctuation.
- - For parsing safety the Ruby encoder inserts a single `\n` between top-level segments. You can override this if you truly need a fully inline document (see options below).
+ ---
 
  ## Installation
 
@@ -42,28 +123,32 @@ Or install it directly:
  gem install cton
  ```
 
+ ---
+
  ## Usage
 
  ```ruby
  require "cton"
 
  payload = {
-   "context" => {
-     "task" => "Our favorite hikes together",
-     "location" => "Boulder",
-     "season" => "spring_2025"
-   },
-   "friends" => %w[ana luis sam],
-   "hikes" => [
-     { "id" => 1, "name" => "Blue Lake Trail", "distanceKm" => 7.5, "elevationGain" => 320, "companion" => "ana", "wasSunny" => true },
-     { "id" => 2, "name" => "Ridge Overlook", "distanceKm" => 9.2, "elevationGain" => 540, "companion" => "luis", "wasSunny" => false },
-     { "id" => 3, "name" => "Wildflower Loop", "distanceKm" => 5.1, "elevationGain" => 180, "companion" => "sam", "wasSunny" => true }
-   ]
+   "context" => {
+     "task" => "Our favorite hikes together",
+     "location" => "Boulder",
+     "season" => "spring_2025"
+   },
+   "friends" => %w[ana luis sam],
+   "hikes" => [
+     { "id" => 1, "name" => "Blue Lake Trail", "distanceKm" => 7.5, "elevationGain" => 320, "companion" => "ana", "wasSunny" => true },
+     { "id" => 2, "name" => "Ridge Overlook", "distanceKm" => 9.2, "elevationGain" => 540, "companion" => "luis", "wasSunny" => false },
+     { "id" => 3, "name" => "Wildflower Loop", "distanceKm" => 5.1, "elevationGain" => 180, "companion" => "sam", "wasSunny" => true }
+   ]
  }
 
+ # Encode to CTON
  cton = Cton.dump(payload)
  # => "context(... )\nfriends[3]=ana,luis,sam\nhikes[3]{...}"
 
+ # Decode back to Hash
  round_tripped = Cton.load(cton)
  # => original hash
 
@@ -72,30 +157,65 @@ symbolized = Cton.load(cton, symbolize_names: true)
 
  # Want a truly inline document? Opt in explicitly (decoding becomes unsafe for ambiguous cases).
  inline = Cton.dump(payload, separator: "")
+
+ # Pretty print for human readability
+ pretty = Cton.dump(payload, pretty: true)
+
+ # Stream to an IO object (file, socket, etc.)
+ File.open("data.cton", "w") do |f|
+   Cton.dump(payload, f)
+ end
  ```
 
- ### Table detection
+ ### CLI Tool
 
- Whenever an array is made of hashes that all expose the same scalar keys, the encoder flattens it into a table to save tokens. Mixed or nested arrays fall back to `[N]=(value1,value2,...)`.
+ CTON comes with a command-line tool for quick conversions:
 
- ### Separators & ambiguity
+ ```bash
+ # Convert JSON to CTON
+ echo '{"hello": "world"}' | cton
+ # => hello=world
 
- Removing every newline makes certain inputs ambiguous because `sam` and the next key `hikes` can merge into `samhikes`. The default `separator: "\n"` avoids that by inserting a single newline between root segments. You may pass `separator: ""` to `Cton.dump` for maximum compactness, but decoding such strings is only safe if you can guarantee extra quoting or whitespace between segments.
+ # Convert CTON to JSON
+ echo 'hello=world' | cton --to-json
+ # => {"hello":"world"}
 
- ### Literal safety & number normalization
+ # Pretty print
+ cton --pretty input.json
+ ```
 
- Following the TOON specification's guardrails, the encoder now:
+ ### Advanced Features
+
+ #### Extended Types
+
+ CTON natively supports serialization for:
+
+ - `Time` and `Date` (ISO8601 strings)
+ - `Set` (converted to Arrays)
+ - `OpenStruct` (converted to Objects)
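These conversions can be sketched in plain Ruby. The `normalize_extended` helper below is hypothetical (not the gem's internal API); it only illustrates the documented mappings:

```ruby
require "time"
require "date"
require "set"
require "ostruct"

# Hypothetical helper illustrating the documented conversions:
# Time/Date become ISO8601 strings, Set becomes an Array, and
# OpenStruct becomes a plain Hash. Containers are walked recursively.
def normalize_extended(value)
  case value
  when Time, Date then value.iso8601
  when Set        then value.map { |v| normalize_extended(v) }
  when OpenStruct then normalize_extended(value.to_h)
  when Hash       then value.transform_values { |v| normalize_extended(v) }
  when Array      then value.map { |v| normalize_extended(v) }
  else value
  end
end
```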
+
+ #### Table detection
+
+ Whenever an array is made of hashes that all expose the same scalar keys, the encoder flattens it into a table to save tokens. Mixed or nested arrays fall back to `[N]=(value1,value2,...)`.
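The eligibility check can be sketched as a small predicate. This is a simplified illustration, not the encoder's actual code:

```ruby
# Simplified sketch of table eligibility: every element must be a Hash
# sharing the same keys, with scalar-only values. Anything else falls
# back to the plain array form.
def table_like?(array)
  return false if array.empty? || !array.all? { |e| e.is_a?(Hash) }

  keys = array.first.keys
  array.all? do |row|
    row.keys == keys && row.values.none? { |v| v.is_a?(Hash) || v.is_a?(Array) }
  end
end
```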
 
+ #### Separators & ambiguity
+
+ Removing every newline makes certain inputs ambiguous because `sam` and the next key `hikes` can merge into `samhikes`. The default `separator: "\n"` avoids that by inserting a single newline between root segments. You may pass `separator: ""` to `Cton.dump` for maximum compactness, but decoding such strings is only safe if you can guarantee extra quoting or whitespace between segments.
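The merge is easy to reproduce with plain strings, no encoder required (illustration only):

```ruby
# Two root segments of a CTON document. Joined with the default "\n",
# the boundary between the last value and the next key is unambiguous;
# joined with "", the value "sam" and the key "hikes" fuse together.
segments = ["friends[3]=ana,luis,sam", "hikes[2]{id,name}=1,a;2,b"]

safe   = segments.join("\n") # boundary preserved
inline = segments.join       # "...,samhikes[2]..." — ambiguous
```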
+
+ #### Literal safety & number normalization
+
+ Following the TOON specification's guardrails, the encoder now:
 
  - Auto-quotes strings that would otherwise be parsed as booleans, `null`, or numbers (e.g., `"true"`, `"007"`, `"1e6"`, `"-5"`) so they round-trip as strings without extra work.
  - Canonicalizes float/BigDecimal output: no exponent notation, no trailing zeros, and `-0` collapses to `0`.
  - Converts `NaN` and `±Infinity` inputs to `null`, matching TOON's normalization guidance so downstream decoders don't explode on non-finite numbers.
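The auto-quoting rule can be approximated with a small predicate (hypothetical, for illustration only; the gem's own logic may differ in detail):

```ruby
# Hypothetical predicate approximating the auto-quoting rule: a string is
# quoted if it would otherwise be read back as a boolean, null, or number,
# or if it contains whitespace or reserved punctuation.
AMBIGUOUS_LITERAL = /\A(?:true|false|null|-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?)\z/.freeze

def needs_quoting?(str)
  str.empty? ||
    str.match?(AMBIGUOUS_LITERAL) ||
    str.match?(/[\s,;=()\[\]{}"]/)
end
```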
 
+ ---
+
+ ## Type Safety
+
+ CTON ships with RBS signatures (`sig/cton.rbs`) to support type checking and IDE autocompletion.
+
  ## Development
 
  ```bash
- bin/setup # install dependencies
- bundle exec rspec
- bin/console # interactive playground
+ bin/setup        # install dependencies
+ bundle exec rake # run tests and rubocop
+ bin/console      # interactive playground
  ```
 
  To release a new version, bump `Cton::VERSION` and run `bundle exec rake release`.
@@ -106,4 +226,4 @@ Bug reports and pull requests are welcome at https://github.com/davidesantangelo
 
  ## License
 
- MIT © Davide Santangelo
+ MIT © [Davide Santangelo](https://github.com/davidesantangelo)
@@ -0,0 +1,327 @@
+ # frozen_string_literal: true
+
+ require "strscan"
+
+ module Cton
+   class Decoder
+     TERMINATORS = [",", ";", ")", "]", "}"].freeze
+
+     def initialize(symbolize_names: false)
+       @symbolize_names = symbolize_names
+     end
+
+     def decode(cton)
+       @scanner = StringScanner.new(cton.to_s)
+       skip_ws
+
+       value = if key_ahead?
+                 parse_document
+               else
+                 parse_value(allow_key_boundary: true)
+               end
+
+       skip_ws
+       raise_error("Unexpected trailing data") unless @scanner.eos?
+
+       value
+     end
+
+     private
+
+     attr_reader :symbolize_names, :scanner
+
+     def raise_error(message)
+       line, col = calculate_location(@scanner.pos)
+       raise ParseError, "#{message} at line #{line}, column #{col}"
+     end
+
+     def calculate_location(pos)
+       string = @scanner.string
+       consumed = string[0...pos]
+       line = consumed.count("\n") + 1
+       last_newline = consumed.rindex("\n")
+       col = last_newline ? pos - last_newline : pos + 1
+       [line, col]
+     end
+
+     def parse_document
+       result = {}
+       until @scanner.eos?
+         key = parse_key_name
+         value = parse_value_for_key
+         result[key] = value
+         skip_ws
+       end
+       result
+     end
+
+     def parse_value_for_key
+       skip_ws
+       if @scanner.scan("(")
+         parse_object
+       elsif @scanner.scan("[")
+         parse_array
+       elsif @scanner.scan("=")
+         parse_scalar(allow_key_boundary: true)
+       else
+         raise_error("Unexpected token")
+       end
+     end
+
+     def parse_object
+       skip_ws
+       return {} if @scanner.scan(")")
+
+       pairs = {}
+       loop do
+         key = parse_key_name
+         expect!("=")
+         value = parse_value
+         pairs[key] = value
+         skip_ws
+         break if @scanner.scan(")")
+
+         expect!(",")
+         skip_ws
+       end
+       pairs
+     end
+
+     def parse_array
+       length = parse_integer_literal
+       expect!("]")
+       skip_ws
+
+       header = parse_header if @scanner.peek(1) == "{"
+
+       expect!("=")
+       return [] if length.zero?
+
+       header ? parse_table_rows(length, header) : parse_array_elements(length)
+     end
+
+     def parse_header
+       expect!("{")
+       fields = []
+       loop do
+         fields << parse_key_name
+         break if @scanner.scan("}")
+
+         expect!(",")
+       end
+       fields
+     end
+
+     def parse_table_rows(length, header)
+       rows = []
+       length.times do |row_index|
+         row = {}
+         header.each_with_index do |field, column_index|
+           allow_boundary = row_index == length - 1 && column_index == header.length - 1
+           row[field] = parse_scalar(allow_key_boundary: allow_boundary)
+           expect!(",") if column_index < header.length - 1
+         end
+         rows << symbolize_keys(row)
+         expect!(";") if row_index < length - 1
+       end
+       rows
+     end
+
+     def parse_array_elements(length)
+       values = []
+       length.times do |index|
+         allow_boundary = index == length - 1
+         values << parse_value(allow_key_boundary: allow_boundary)
+         expect!(",") if index < length - 1
+       end
+       values
+     end
+
+     def parse_value(allow_key_boundary: false)
+       skip_ws
+       if @scanner.scan("(")
+         parse_object
+       elsif @scanner.scan("[")
+         parse_array
+       elsif @scanner.peek(1) == '"'
+         parse_string
+       else
+         parse_scalar(allow_key_boundary: allow_key_boundary)
+       end
+     end
+
+     def parse_scalar(allow_key_boundary: false)
+       skip_ws
+       return parse_string if @scanner.peek(1) == '"'
+
+       token = if allow_key_boundary
+                 scan_until_boundary_or_terminator
+               else
+                 scan_until_terminator
+               end
+
+       raise_error("Empty value") if token.nil? || token.empty?
+
+       convert_scalar(token)
+     end
+
+     def scan_until_terminator
+       @scanner.scan(/[^,;\]\}\)\(\[\{\s]+/)
+     end
+
+     def scan_until_boundary_or_terminator
+       start_pos = @scanner.pos
+
+       chunk = @scanner.scan(/[0-9A-Za-z_.:-]+/)
+       return nil unless chunk
+
+       boundary_idx = find_key_boundary(start_pos)
+
+       if boundary_idx
+         length = boundary_idx - start_pos
+         @scanner.pos = start_pos
+         token = @scanner.peek(length)
+         @scanner.pos += length
+         token
+       else
+         @scanner.pos = start_pos + chunk.length
+         chunk
+       end
+     end
+
+     def find_key_boundary(from_index)
+       str = @scanner.string
+       len = str.length
+       idx = from_index
+
+       while idx < len
+         char = str[idx]
+
+         return nil if TERMINATORS.include?(char) || whitespace?(char) || "([{".include?(char)
+
+         if safe_key_char?(char)
+           key_end = idx
+           key_end += 1 while key_end < len && safe_key_char?(str[key_end])
+
+           next_char_idx = key_end
+
+           if next_char_idx < len
+             next_char = str[next_char_idx]
+             return idx if ["(", "[", "="].include?(next_char) && (idx > from_index)
+           end
+         end
+
+         idx += 1
+       end
+       nil
+     end
+
+     def convert_scalar(token)
+       case token
+       when "true" then true
+       when "false" then false
+       when "null" then nil
+       else
+         if integer?(token)
+           token.to_i
+         elsif float?(token)
+           token.to_f
+         else
+           token
+         end
+       end
+     end
+
+     def parse_string
+       expect!("\"")
+       buffer = +""
+       loop do
+         raise_error("Unterminated string") if @scanner.eos?
+
+         char = @scanner.getch
+
+         if char == "\\"
+           escaped = @scanner.getch
+           raise_error("Invalid escape sequence") if escaped.nil?
+           buffer << case escaped
+                     when "n" then "\n"
+                     when "r" then "\r"
+                     when "t" then "\t"
+                     when '"', "\\" then escaped
+                     else
+                       raise_error("Unsupported escape sequence")
+                     end
+         elsif char == '"'
+           break
+         else
+           buffer << char
+         end
+       end
+       buffer
+     end
+
+     def parse_key_name
+       skip_ws
+       token = @scanner.scan(/[0-9A-Za-z_.:-]+/)
+       raise_error("Invalid key") if token.nil?
+       symbolize_names ? token.to_sym : token
+     end
+
+     def parse_integer_literal
+       token = @scanner.scan(/-?\d+/)
+       raise_error("Expected digits") if token.nil?
+       Integer(token, 10)
+     rescue ArgumentError
+       raise_error("Invalid length literal")
+     end
+
+     def symbolize_keys(row)
+       symbolize_names ? row.transform_keys(&:to_sym) : row
+     end
+
+     def expect!(char)
+       skip_ws
+       return if @scanner.scan(Regexp.new(Regexp.escape(char)))
+
+       raise_error("Expected #{char.inspect}, got #{@scanner.peek(1).inspect}")
+     end
+
+     def skip_ws
+       @scanner.skip(/\s+/)
+     end
+
+     def whitespace?(char)
+       [" ", "\t", "\n", "\r"].include?(char)
+     end
+
+     def key_ahead?
+       pos = @scanner.pos
+       skip_ws
+
+       if @scanner.scan(/[0-9A-Za-z_.:-]+/)
+         skip_ws
+         next_char = @scanner.peek(1)
+         result = ["(", "[", "="].include?(next_char)
+         @scanner.pos = pos
+         result
+       else
+         @scanner.pos = pos
+         false
+       end
+     end
+
+     def safe_key_char?(char)
+       !char.nil? && char.match?(/[0-9A-Za-z_.:-]/)
+     end
+
+     def integer?(token)
+       token.match?(/\A-?(?:0|[1-9]\d*)\z/)
+     end
+
+     def float?(token)
+       token.match?(/\A-?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+-]?\d+)?\z/)
+     end
+   end
+ end
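As a standalone illustration of the decoder's scalar rules (a sketch extracted from the `convert_scalar`, `integer?`, and `float?` logic above, runnable without the gem):

```ruby
# Mirrors the decoder's scalar conversion: bare words map to booleans/nil,
# canonical numbers to Integer/Float, and everything else stays a String.
# Leading-zero tokens like "007" fail the numeric patterns and survive as
# strings, which is what the encoder's auto-quoting rule relies on.
def convert_scalar(token)
  case token
  when "true" then true
  when "false" then false
  when "null" then nil
  else
    if token.match?(/\A-?(?:0|[1-9]\d*)\z/)
      token.to_i
    elsif token.match?(/\A-?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+-]?\d+)?\z/)
      token.to_f
    else
      token
    end
  end
end
```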