cton 0.2.0 → 0.3.0
- checksums.yaml +4 -4
- data/CHANGELOG.md +19 -0
- data/README.md +108 -1
- data/bench/encode_decode_bench.rb +65 -0
- data/lib/cton/decoder.rb +43 -23
- data/lib/cton/encoder.rb +99 -29
- data/lib/cton/version.rb +1 -1
- data/lib/cton.rb +2 -1
- metadata +3 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 1c9161ae830ba6b01d3ec94d1170fc4295aaacfa839c869aaa6adefe2711cc2d
+  data.tar.gz: 80c4ba30abbf8a562bde581f26e7dc5529aa46275b61930fe28370f34156db61
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 914196284081bacd5b7f5f6ac9a1b246ea8924eddbb26cd28796b12a2ee2156718a9ff3b795b86188d483da4fc294b002a93e935815f692b432dac78b5304dcf
+  data.tar.gz: d9b1bfb1f7de402fe9de0d7da90f750d490dbdcb4e572e9283bc3ed15b1b43e3f3f9b44a4031f80ecf5d610a73d64719f3c78f170060de78035e04dab3a9d663
data/CHANGELOG.md
CHANGED
@@ -5,6 +5,25 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [0.3.0] - 2025-11-20
+
+### Added
+
+- **Performance tunables**: `Cton.dump` now accepts `decimal_mode: :fast | :precise`, allowing callers to trade float-format determinism for lower allocation pressure. Specs cover both modes.
+- **Benchmark harness**: New `bench/encode_decode_bench.rb` script (wired into the README/Development docs) exercises encode/decode hot paths and prints comparative JSON vs CTON timings. On Ruby 3.1.4/macOS the fast encoder completes 1,000 iterations in ~0.63s and the new inline decoder stress test wraps 400 concatenated documents in ~4.14s.
+- **Regression tests**: Added specs for streaming documents without separators plus validation around the new decimal mode toggle.
+
+### Changed
+
+- **Encoder**: Memoizes table schemas per array instance, adds a fast-path for homogeneous scalar lists, and reduces float/BigDecimal copying by favoring Ruby's native float formatting before falling back to `BigDecimal`. Unsupported `decimal_mode` values now raise immediately.
+- **Decoder**: Replaces high-allocation `StringScanner` tokenization with raw string slicing, improves key-boundary detection for inline payloads, and keeps symbolization logic untouched. Boundary heuristics now prefer alphabetic key starts to avoid splitting numeric payloads.
+- **Documentation**: README now calls out the tuning flags, inline caveats, and benchmark instructions; Development workflow highlights how to rerun the perf suite.
+
+### Fixed
+
+- **Inline parsing**: Eliminated the runaway allocations and incorrect key splits when processing long documents with `separator: ""`.
+- **Float normalization**: Restored canonical `9.2`-style output in fast mode while keeping the new perf optimizations.
+
 ## [0.2.0] - 2025-11-19
 
 ### Added
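The "fast-path for homogeneous scalar lists" mentioned in the 0.3.0 Changed entry can be illustrated with a standalone sketch. This is not the gem's encoder: `needs_quotes?`, `homogeneous_unquoted?`, and `join_scalars` are hypothetical names, the quoting rule is a simplified assumption, and the "more than four elements" threshold mirrors the `fast_scalar_stream?` check that appears in the encoder diff.

```ruby
# Sketch only: a simplified homogeneous-scalar fast path.
# `needs_quotes?` is a hypothetical stand-in for the gem's quoting check.
RESERVED = %w[true false null].freeze

def needs_quotes?(str)
  str.empty? || RESERVED.include?(str) ||
    str.match?(/[^0-9A-Za-z_.:-]/) || str.match?(/\A-?\d/)
end

# True when every element shares one class and can be emitted without quotes.
def homogeneous_unquoted?(list)
  first_class = list.first.class
  list.all? do |value|
    next false unless value.class == first_class

    case value
    when String then !needs_quotes?(value)
    when Integer, true, false, nil then true
    else false
    end
  end
end

# Joins the list into one buffer, or returns nil when the fast path
# does not apply and a per-element encoder should be used instead.
def join_scalars(list)
  return nil unless list.length > 4 && homogeneous_unquoted?(list)

  buffer = String.new
  list.each_with_index do |value, index|
    buffer << "," unless index.zero?
    buffer << (value.nil? ? "null" : value.to_s)
  end
  buffer
end
```

Building one string and appending it to the output once keeps the number of small writes down, which is presumably the allocation saving the changelog refers to.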
data/README.md
CHANGED
@@ -15,6 +15,8 @@
 - [Token Savings](#token-savings-vs-json--toon)
 - [Installation](#installation)
 - [Usage](#usage)
+- [Performance & Benchmarks](#performance--benchmarks)
+- [Teaching CTON to LLMs](#teaching-cton-to-llms)
 - [Development](#development)
 - [Contributing](#contributing)
 - [License](#license)
@@ -165,6 +167,10 @@ pretty = Cton.dump(payload, pretty: true)
 File.open("data.cton", "w") do |f|
   Cton.dump(payload, f)
 end
+
+# Toggle float normalization strategies
+fast = Cton.dump(payload) # default :fast mode
+strict = Cton.dump(payload, decimal_mode: :precise)
 ```
 
 ### CLI Tool
@@ -196,7 +202,7 @@ CTON natively supports serialization for:
 Whenever an array is made of hashes that all expose the same scalar keys, the encoder flattens it into a table to save tokens. Mixed or nested arrays fall back to `[N]=(value1,value2,...)`.
 
 #### Separators & ambiguity
-Removing every newline makes certain inputs ambiguous because `sam` and the next key `hikes` can merge into `samhikes`. The default `separator: "\n"` avoids that by inserting a single newline between root segments. You may pass `separator: ""` to `Cton.dump` for maximum compactness, but decoding such strings is only safe if you can guarantee extra quoting or whitespace between segments.
+Removing every newline makes certain inputs ambiguous because `sam` and the next key `hikes` can merge into `samhikes`. The default `separator: "\n"` avoids that by inserting a single newline between root segments. You may pass `separator: ""` to `Cton.dump` for maximum compactness, but decoding such strings is only safe if you can guarantee extra quoting or whitespace between segments. When you intentionally omit separators, keep next-level keys alphabetic (e.g., `payload`, `k42`) so the decoder's boundary heuristic can split `...1payload...` without misclassifying numeric prefixes.
 
 #### Literal safety & number normalization
 Following the TOON specification's guardrails, the encoder now:
@@ -204,6 +210,106 @@ Following the TOON specification's guardrails, the encoder now:
 - Canonicalizes float/BigDecimal output: no exponent notation, no trailing zeros, and `-0` collapses to `0`.
 - Converts `NaN` and `±Infinity` inputs to `null`, matching TOON's normalization guidance so downstream decoders don't explode on non-finite numbers.
 
+#### Decimal normalization modes
+- `decimal_mode: :fast` (default) prefers Ruby's native float representation and only falls back to `BigDecimal` when scientific notation is detected, minimizing allocations on tight loops.
+- `decimal_mode: :precise` forces the legacy `BigDecimal` path for every float, which is slower but useful for audit-grade dumps where you want deterministic decimal expansion.
+- Both modes share the same trailing-zero stripping and `-0 → 0` normalization, so switching modes never affects integer formatting.
+
+---
+
+## Performance & Benchmarks
+
+CTON focuses on throughput: encoder table schemas are memoized, scalar list encoding keeps a reusable buffer, floats avoid `BigDecimal` when they can, and the decoder slices straight from the raw string to sidestep `StringScanner` allocations. You can reproduce the numbers below with the bundled script:
+
+```bash
+bundle exec ruby bench/encode_decode_bench.rb
+# customize input size / iterations
+ITERATIONS=2000 STREAM_SIZE=400 bundle exec ruby bench/encode_decode_bench.rb
+```
+
+Latest results on Ruby 3.1.4/macOS (M-series), 1,000 iterations, `STREAM_SIZE=200`:
+
+| Benchmark | Time (s) |
+| --- | --- |
+| `cton dump` (:fast) | 0.626 |
+| `cton dump` (:precise) | 0.658 |
+| `json generate` | 0.027 |
+| `cton load` | 2.067 |
+| `json parse` | 0.045 |
+| `cton inline load` (separator=`""`, double payload) | 4.140 |
+
+`cton inline load` deliberately concatenates documents without separators to stress the new boundary detector; it now finishes without the runaway allocations seen in earlier releases.
+
+---
+
+## Teaching CTON to LLMs
+
+Use this system prompt to teach an LLM how to understand and generate CTON:
+
+````markdown
+You are an expert in data serialization and specifically in CTON (Compact Token-Oriented Notation). CTON is a token-efficient data format optimized for LLMs that serves as a compact alternative to JSON.
+
+Your task is to interpret CTON input and convert it to JSON, or convert JSON input into valid CTON format, following the specification below.
+
+### CTON Specification
+
+CTON minimizes syntax characters (braces, quotes) while preserving structure and type safety.
+
+**1. Basic Structure (Key-Value)**
+- **Rule:** Do not use outer curly braces `{}` for the root object.
+- **Rule:** Use `=` to separate keys and values.
+- **Rule:** Use `,` to separate fields.
+- **Rule:** Do not use quotes around "safe" strings (alphanumeric, simple text).
+- **Example:** - JSON: `{"task": "planning", "urgent": true}`
+  - CTON: `task=planning,urgent=true`
+
+**2. Nested Objects**
+- **Rule:** Use parentheses `()` to denote a nested object instead of `{}`.
+- **Example:**
+  - JSON: `{"context": {"user": "Davide", "theme": "dark"}}`
+  - CTON: `context(user=Davide,theme=dark)`
+
+**3. Arrays of Objects (Table Compression)**
+- **Rule:** Use the syntax `key[count]{columns}=values` for arrays of objects to avoid repeating keys.
+- **Structure:** `key[Length]{col1,col2}=val1,val2;val1,val2`
+- **Details:** - `[N]` denotes the number of items in the array.
+  - `{col1,col2}` defines the schema headers.
+  - `;` separates distinct objects (rows).
+  - `,` separates values within an object.
+- **Example:**
+
+JSON:
+```json
+{
+  "files": [
+    { "name": "README.md", "size": 1024 },
+    { "name": "lib.rb", "size": 2048 }
+  ]
+}
+```
+
+CTON: `files[2]{name,size}=README.md,1024;lib.rb,2048`
+
+**4. Type Safety & Literals**
+- **Booleans/Null:** `true`, `false`, and `null` are preserved as literals (unquoted).
+- **Numbers:** Integers and floats are written as is (e.g., `1024`, `3.14`).
+- **Escaping:** If a string value looks like a boolean, number, or contains reserved characters (like `,`, `;`, `=`, `(`, `)`), it must be wrapped in double quotes (e.g., `"true"`).
+
+### Examples for Training
+
+**Input (JSON):**
+```json
+{
+  "id": 123,
+  "active": true,
+  "metadata": {
+    "created_at": "2023-01-01",
+    "tags": "admin"
+  }
+}
+```
+````
+
 ---
 
 ## Type Safety
@@ -216,6 +322,7 @@ CTON ships with RBS signatures (`sig/cton.rbs`) to support type checking and IDE
 bin/setup # install dependencies
 bundle exec rake # run tests and rubocop
 bin/console # interactive playground
+bundle exec ruby bench/encode_decode_bench.rb # performance smoke test
 ```
 
 To release a new version, bump `Cton::VERSION` and run `bundle exec rake release`.
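The two decimal normalization modes described in the README changes above can be approximated outside the gem. The sketch below follows the split visible in the encoder diff (native `Float#to_s` first, `BigDecimal` only when exponent notation shows up); `float_decimal` and `precise_decimal` are hypothetical names, and the trailing-zero stripping and `-0` handling that the README mentions are intentionally left out.

```ruby
require "bigdecimal"

# Sketch only: mirrors the fast/precise split, not the gem's full
# number normalization pipeline.
def precise_decimal(value)
  # "F" asks BigDecimal for plain (non-exponent) notation.
  BigDecimal(value.to_s).to_s("F")
end

def float_decimal(value, decimal_mode: :fast)
  return precise_decimal(value) if decimal_mode == :precise

  text = value.to_s
  # Float#to_s switches to exponent notation for very small or very
  # large magnitudes; only those values pay for the BigDecimal path.
  text.match?(/e/i) ? precise_decimal(value) : text
end

puts float_decimal(9.2)                          # prints 9.2
puts float_decimal(0.00001)                      # prints 0.00001
puts float_decimal(9.2, decimal_mode: :precise)  # prints 9.2
```

For everyday values both modes print the same text; under these assumptions the difference is mainly whether a `BigDecimal` object is allocated for each float.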
data/bench/encode_decode_bench.rb
ADDED
@@ -0,0 +1,65 @@
+#!/usr/bin/env ruby
+# frozen_string_literal: true
+
+require "benchmark"
+require "json"
+require_relative "../lib/cton"
+
+ITERATIONS = Integer(ENV.fetch("ITERATIONS", 1_000))
+STREAM_SIZE = Integer(ENV.fetch("STREAM_SIZE", 200))
+
+sample_payload = {
+  "context" => {
+    "task" => "Our favorite hikes together",
+    "location" => "Boulder",
+    "season" => "spring_2025"
+  },
+  "friends" => %w[ana luis sam],
+  "hikes" => Array.new(STREAM_SIZE) do |idx|
+    {
+      "id" => idx + 1,
+      "name" => "Trail ##{idx + 1}",
+      "distanceKm" => (6.0 + ((idx % 5) * 0.5)),
+      "elevationGain" => 250 + ((idx % 3) * 50),
+      "companion" => %w[ana luis sam][idx % 3],
+      "wasSunny" => idx.even?
+    }
+  end
+}
+
+warm_cton = Cton.dump(sample_payload)
+warm_json = JSON.generate(sample_payload)
+
+puts "\nEncoding benchmarks (iterations=#{ITERATIONS}, stream_size=#{STREAM_SIZE})"
+Benchmark.bm(25) do |bm|
+  bm.report("cton dump fast") do
+    ITERATIONS.times { Cton.dump(sample_payload) }
+  end
+
+  bm.report("cton dump precise") do
+    ITERATIONS.times { Cton.dump(sample_payload, decimal_mode: :precise) }
+  end
+
+  bm.report("json generate") do
+    ITERATIONS.times { JSON.generate(sample_payload) }
+  end
+end
+
+puts "\nDecoding benchmarks"
+Benchmark.bm(25) do |bm|
+  bm.report("cton load") do
+    ITERATIONS.times { Cton.load(warm_cton) }
+  end
+
+  bm.report("json parse") do
+    ITERATIONS.times { JSON.parse(warm_json) }
+  end
+end
+
+puts "\nStreaming decode stress (#{STREAM_SIZE * 2} documents, separator=\"\")"
+inline_blob = warm_cton.delete("\n") * 2
+Benchmark.bm(25) do |bm|
+  bm.report("cton inline load") do
+    ITERATIONS.times { Cton.load(inline_blob) }
+  end
+end
data/lib/cton/decoder.rb
CHANGED
@@ -5,13 +5,15 @@ require "strscan"
 module Cton
   class Decoder
     TERMINATORS = [",", ";", ")", "]", "}"].freeze
+    KEY_VALUE_BOUNDARY_TOKENS = ["(", "[", "="].freeze
 
     def initialize(symbolize_names: false)
       @symbolize_names = symbolize_names
     end
 
     def decode(cton)
-      @
+      @raw_string = cton.to_s
+      @scanner = StringScanner.new(@raw_string)
       skip_ws
 
       value = if key_ahead?
@@ -28,7 +30,7 @@ module Cton
 
     private
 
-    attr_reader :symbolize_names, :scanner
+    attr_reader :symbolize_names, :scanner, :raw_string
 
     def raise_error(message)
       line, col = calculate_location(@scanner.pos)
@@ -36,7 +38,7 @@ module Cton
     end
 
     def calculate_location(pos)
-      string =
+      string = raw_string
       consumed = string[0...pos]
       line = consumed.count("\n") + 1
       last_newline = consumed.rindex("\n")
@@ -168,56 +170,74 @@ module Cton
     end
 
     def scan_until_terminator
-      @scanner.
+      start_pos = @scanner.pos
+      end_pos = find_terminator_position(start_pos)
+      consume_slice(start_pos, end_pos)
     end
 
     def scan_until_boundary_or_terminator
       start_pos = @scanner.pos
+      boundary_pos = find_key_boundary(start_pos)
+      end_pos = boundary_pos || find_terminator_position(start_pos)
+      consume_slice(start_pos, end_pos)
+    end
 
-
-      return nil
+    def consume_slice(start_pos, end_pos)
+      return nil if end_pos <= start_pos
 
-
+      token = raw_string.byteslice(start_pos, end_pos - start_pos)
+      @scanner.pos = end_pos
+      token
+    end
 
-
-
-
-
-
-
-
-
-
+    def find_terminator_position(start_pos)
+      str = raw_string
+      len = str.length
+      idx = start_pos
+
+      while idx < len
+        char = str[idx]
+        break if terminator?(char)
+
+        idx += 1
       end
+
+      idx
     end
 
     def find_key_boundary(from_index)
-      str =
+      str = raw_string
       len = str.length
       idx = from_index
 
       while idx < len
         char = str[idx]
 
-        return nil if
+        return nil if terminator?(char)
 
         if safe_key_char?(char)
           key_end = idx
           key_end += 1 while key_end < len && safe_key_char?(str[key_end])
 
-
-
-          if next_char_idx < len
-            next_char = str[next_char_idx]
-            return idx if ["(", "[", "="].include?(next_char) && (idx > from_index)
+          if key_end < len && KEY_VALUE_BOUNDARY_TOKENS.include?(str[key_end]) && idx > from_index && boundary_start_allowed?(str[idx])
+            return idx
           end
         end
 
         idx += 1
       end
+
       nil
     end
 
+    def terminator?(char)
+      TERMINATORS.include?(char) || whitespace?(char) || ["(", "[", "{"].include?(char)
+    end
+
+    def boundary_start_allowed?(char)
+      !char.nil? && char.match?(/[A-Za-z_.:-]/)
+    end
+
     def convert_scalar(token)
       case token
       when "true" then true
data/lib/cton/encoder.rb
CHANGED
@@ -11,10 +11,14 @@ module Cton
     RESERVED_LITERALS = %w[true false null].freeze
     FLOAT_DECIMAL_PRECISION = Float::DIG
 
-    def initialize(separator: "\n", pretty: false)
+    def initialize(separator: "\n", pretty: false, decimal_mode: :fast)
       @separator = separator || ""
       @pretty = pretty
+      @decimal_mode = decimal_mode
+      raise ArgumentError, "decimal_mode must be :fast or :precise" unless %i[fast precise].include?(@decimal_mode)
+
       @indent_level = 0
+      @table_schema_cache = {}
     end
 
     def encode(payload, io: nil)
@@ -25,7 +29,7 @@ module Cton
 
     private
 
-    attr_reader :separator, :io, :pretty, :indent_level
+    attr_reader :separator, :io, :pretty, :indent_level, :decimal_mode
 
     def encode_root(value)
       case value
@@ -96,8 +100,8 @@ module Cton
 
       io << "[" << length.to_s << "]"
 
-      if
-        encode_table(list)
+      if (header = table_schema_for(list))
+        encode_table(list, header)
       else
         io << "="
         if list.all? { |value| scalar?(value) }
@@ -108,8 +112,7 @@ module Cton
       end
     end
 
-    def encode_table(rows)
-      header = rows.first.keys
+    def encode_table(rows, header)
       io << "{"
       io << header.map { |key| format_key(key) }.join(",")
       io << "}="
@@ -150,10 +153,14 @@ module Cton
         outdent
       else
         first = true
-        list
-          io <<
-
-
+        if fast_scalar_stream?(list)
+          io << fast_scalar_stream(list)
+        else
+          list.each do |value|
+            io << "," unless first
+            encode_scalar(value)
+            first = false
+          end
         end
       end
     end
@@ -174,30 +181,34 @@ module Cton
     end
 
     def encode_scalar(value)
+      io << scalar_to_string(value)
+    end
+
+    def scalar_to_string(value)
       case value
       when String
-
+        format_string(value)
       when TrueClass, FalseClass
-
+        value ? "true" : "false"
       when NilClass
-
+        "null"
       when Numeric
-
+        format_number(value)
       when Time, Date
-
+        format_string(value.iso8601)
       else
         raise EncodeError, "Unsupported value: #{value.class}"
       end
     end
 
-    def
-
-
-
-
-
-
-
+    def format_string(value)
+      if value.empty?
+        '""'
+      elsif string_needs_quotes?(value)
+        quote_string(value)
+      else
+        value
+      end
     end
 
     def format_number(value)
@@ -234,6 +245,17 @@ module Cton
     end
 
     def float_decimal_string(value)
+      return precise_float_decimal_string(value) if decimal_mode == :precise
+
+      decimal = value.to_s
+      if decimal.include?("e") || decimal.include?("E")
+        precise_float_decimal_string(value)
+      else
+        decimal
+      end
+    end
+
+    def precise_float_decimal_string(value)
       if defined?(BigDecimal)
         BigDecimal(value.to_s).to_s("F")
       else
@@ -278,16 +300,64 @@ module Cton
       value.is_a?(String) || value.is_a?(Numeric) || value == true || value == false || value.nil? || value.is_a?(Time) || value.is_a?(Date)
     end
 
-    def
-
+    def table_schema_for(rows)
+      cache_lookup = @table_schema_cache.fetch(rows.object_id, :__missing__)
+      return cache_lookup unless cache_lookup == :__missing__
+
+      schema = compute_table_schema(rows)
+      @table_schema_cache[rows.object_id] = schema
+    end
+
+    def compute_table_schema(rows)
+      return nil if rows.empty?
 
       first = rows.first
-      return
+      return nil unless first.is_a?(Hash) && !first.empty?
+
+      header = first.keys.freeze
+
+      rows.each do |row|
+        return nil unless row.is_a?(Hash)
+        return nil unless row.keys == header
+        return nil unless row.values.all? { |val| scalar?(val) }
+      end
+
+      header
+    end
+
+    def fast_scalar_stream?(list)
+      !pretty && list.length > 4 && homogeneous_scalar_tokens?(list)
+    end
+
+    def homogeneous_scalar_tokens?(list)
+      first_class = nil
+      list.all? do |value|
+        return false unless scalar?(value)
+
+        token_class = value.class
+        first_class ||= token_class
+        token_class == first_class && token_does_not_require_quotes?(value)
+      end
+    end
+
+    def token_does_not_require_quotes?(value)
+      case value
+      when String
+        !value.empty? && !string_needs_quotes?(value)
+      when Integer, TrueClass, FalseClass, NilClass
+        true
+      else
+        false
+      end
+    end
 
-
-
-
+    def fast_scalar_stream(list)
+      buffer = String.new
+      list.each_with_index do |value, index|
+        buffer << "," unless index.zero?
+        buffer << scalar_to_string(value)
       end
+      buffer
     end
 
     def indent
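The schema memoization added to the encoder (`table_schema_for` and `compute_table_schema`) can be tried out with a small standalone class. `SchemaCache` is a hypothetical wrapper written for illustration, its `scalar?` omits the `Time`/`Date` cases, and the `computations` counter exists only to make the caching observable.

```ruby
# Sketch only: memoizes a table header per array instance, keyed by
# object_id, and caches negative results (nil) as well.
class SchemaCache
  MISSING = Object.new.freeze

  attr_reader :computations

  def initialize
    @cache = {}
    @computations = 0
  end

  def schema_for(rows)
    cached = @cache.fetch(rows.object_id, MISSING)
    return cached unless cached.equal?(MISSING)

    @computations += 1
    @cache[rows.object_id] = compute(rows)
  end

  private

  def scalar?(value)
    case value
    when String, Numeric, true, false, nil then true
    else false
    end
  end

  # Returns the shared key list when every row is a hash with the same
  # keys (in the same order) and only scalar values; otherwise nil.
  def compute(rows)
    first = rows.first
    return nil unless first.is_a?(Hash) && !first.empty?

    header = first.keys.freeze
    uniform = rows.all? do |row|
      row.is_a?(Hash) && row.keys == header && row.values.all? { |v| scalar?(v) }
    end
    uniform ? header : nil
  end
end

cache = SchemaCache.new
rows = [{ "id" => 1, "name" => "a" }, { "id" => 2, "name" => "b" }]
p cache.schema_for(rows)  # prints ["id", "name"]
p cache.schema_for(rows)  # same result, served from the cache
p cache.computations      # prints 1
```

Because the key is the array's `object_id`, a cache like this would go stale if the array were mutated between lookups. In the gem the cache lives on the `Encoder` instance, and the `lib/cton.rb` diff shows `Cton.dump` building a new `Encoder` per call, which limits how long an entry can live.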
data/lib/cton/version.rb
CHANGED
data/lib/cton.rb
CHANGED
@@ -28,7 +28,8 @@ module Cton
 
     separator = options.fetch(:separator, "\n")
     pretty = options.fetch(:pretty, false)
-
+    decimal_mode = options.fetch(:decimal_mode, :fast)
+    Encoder.new(separator: separator, pretty: pretty, decimal_mode: decimal_mode).encode(payload, io: io)
   end
   alias generate dump
 
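The option plumbing in this hunk, combined with the constructor check in the encoder diff, means an unsupported `decimal_mode` fails before any encoding starts. A minimal sketch of that flow, using the hypothetical names `SketchEncoder` and `sketch_dump` rather than the gem's own classes:

```ruby
# Sketch only: forwards an options hash to a constructor that validates
# decimal_mode eagerly. Names are illustrative, not the gem's API.
class SketchEncoder
  MODES = %i[fast precise].freeze

  attr_reader :separator, :pretty, :decimal_mode

  def initialize(separator: "\n", pretty: false, decimal_mode: :fast)
    unless MODES.include?(decimal_mode)
      raise ArgumentError, "decimal_mode must be :fast or :precise"
    end

    @separator = separator || ""
    @pretty = pretty
    @decimal_mode = decimal_mode
  end
end

def sketch_dump(**options)
  SketchEncoder.new(
    separator: options.fetch(:separator, "\n"),
    pretty: options.fetch(:pretty, false),
    decimal_mode: options.fetch(:decimal_mode, :fast)
  )
end

p sketch_dump.decimal_mode                          # prints :fast
p sketch_dump(decimal_mode: :precise).decimal_mode  # prints :precise

begin
  sketch_dump(decimal_mode: "fast") # a String, not a Symbol
rescue ArgumentError => e
  puts e.message # prints decimal_mode must be :fast or :precise
end
```

Note that the membership test compares symbols, so in this sketch the string `"fast"` is rejected just like any other unsupported value.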
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: cton
 version: !ruby/object:Gem::Version
-  version: 0.
+  version: 0.3.0
 platform: ruby
 authors:
 - Davide Santangelo
 autorequire:
 bindir: exe
 cert_chain: []
-date: 2025-11-
+date: 2025-11-20 00:00:00.000000000 Z
 dependencies: []
 description: CTON provides a JSON-compatible, token-efficient text representation
   optimized for LLM prompts.
@@ -25,6 +25,7 @@ files:
 - LICENSE.txt
 - README.md
 - Rakefile
+- bench/encode_decode_bench.rb
 - lib/cton.rb
 - lib/cton/decoder.rb
 - lib/cton/encoder.rb