cton 0.3.0 → 1.0.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/CHANGELOG.md +62 -0
- data/README.md +126 -235
- data/lib/cton/binary.rb +74 -0
- data/lib/cton/decoder.rb +67 -26
- data/lib/cton/encoder.rb +34 -4
- data/lib/cton/schema.rb +369 -0
- data/lib/cton/stats.rb +123 -0
- data/lib/cton/stream.rb +44 -0
- data/lib/cton/type_registry.rb +137 -0
- data/lib/cton/validator.rb +427 -0
- data/lib/cton/version.rb +1 -1
- data/lib/cton.rb +147 -2
- data/sig/cton/binary.rbs +12 -0
- data/sig/cton/schema.rbs +84 -0
- data/sig/cton/stream.rbs +13 -0
- data/sig/cton.rbs +134 -6
- metadata +11 -2
checksums.yaml
CHANGED

```diff
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: a1d32c4a3d5f55726071135d05705be7084d7c8332a1f03a5a724523981a7b81
+  data.tar.gz: '0813fb27119f36ab0117e7ca1401fe33d8826bc7bfbe09b68589dece8ae2f376'
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: e536052cfe992ab7a5844640b622093a5d6f091edfa6af6ce456761e229fa1eb949ce50f103e9aadca54fa6abce66b0fc535fb5335c884e5de5bd2b29408d747
+  data.tar.gz: 66be8d4e9ff12a3e99dd138c82aa676cdc6e8f9e61a07183137b24de07b2f2d177c9a11b6e27e2407476dc45a19e3b012b63be87f942981a34002a6a292f791e
```
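The values in `checksums.yaml` are hex digests of the `metadata.gz` and `data.tar.gz` members packed inside the `.gem` file. A `.gem` is a plain tar archive, so `tar -xf cton-1.0.0.gem` extracts those members. Below is a minimal sketch of checking one member against its recorded digest with Ruby's standard `digest` library. The helper name `verify_checksum` is hypothetical:

```ruby
require "digest"

# Hypothetical helper: returns true when the file's hex digest equals the
# value recorded in checksums.yaml (comparison ignores hex letter case).
def verify_checksum(path, expected_hex, algorithm: :sha256)
  digest_class =
    case algorithm
    when :sha256 then Digest::SHA256
    when :sha512 then Digest::SHA512
    else raise ArgumentError, "unsupported algorithm: #{algorithm}"
    end
  digest_class.file(path).hexdigest == expected_hex.downcase
end

# Example: check the extracted data.tar.gz against the 1.0.0 SHA256 entry.
# verify_checksum("data.tar.gz", "0813fb27119f36ab0117e7ca1401fe33d8826bc7bfbe09b68589dece8ae2f376")
```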
data/CHANGELOG.md
CHANGED

```diff
@@ -5,6 +5,68 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [1.0.0] - 2026-01-17
+
+### Added
+
+- **Schema Validation DSL**: Define schemas via `Cton.schema` and validate data with `Cton.validate_schema` for LLM-safe outputs.
+- **Streaming APIs**: `Cton.load_stream`, `Cton.dump_stream`, plus `StreamReader`/`StreamWriter` for newline-delimited documents.
+- **CTON-B Binary Mode**: Optional binary envelope with compression via `Cton.dump_binary`/`Cton.load_binary`.
+- **CLI Enhancements**: `--schema`, `--stream`, `--to-binary`, and `--from-binary` support.
+
+### Changed
+
+- **Performance**: Faster scalar scans in the decoder and reusable scalar buffers in the encoder.
+- **Docs**: README refocused on LLM usage, schema validation, and streaming workflows.
+
+## [0.4.0] - 2025-11-26
+
+### Added
+
+- **Comment Support**: CTON now supports single-line comments using `#` syntax. Comments are ignored during parsing, allowing for annotated data files.
+  - Decoder skips comments (from `#` to end of line) during parsing
+  - Encoder can emit comments via new `comments:` option: `Cton.dump(data, comments: { "key" => "description" })`
+
+- **Validation API**: New methods to validate CTON syntax without full parsing:
+  - `Cton.valid?(string)` returns `true` or `false`
+  - `Cton.validate(string)` returns a `ValidationResult` object with detailed error information
+  - `ValidationResult` includes `valid?`, `errors`, and `to_s` methods
+  - `ValidationError` includes `message`, `line`, `column`, and `source_excerpt`
+
+- **Token Statistics API**: Analyze and compare CTON vs JSON token efficiency:
+  - `Cton.stats(data)` returns a `Stats` object with comprehensive metrics
+  - `Cton.stats_hash(data)` returns stats as a Hash
+  - `Stats` includes `json_chars`, `cton_chars`, `savings_percent`, `estimated_json_tokens`, `estimated_cton_tokens`
+  - `Stats.compare(data)` compares multiple format variants (CTON, CTON inline, CTON pretty, JSON, JSON pretty)
+
+- **Custom Type Registry**: Register custom serializers for domain objects:
+  - `Cton.register_type(klass, as: :object) { |value| ... }` registers a type handler
+  - `Cton.unregister_type(klass)` removes a handler
+  - `Cton.clear_type_registry!` clears all handlers
+  - Supports `:object`, `:array`, and `:scalar` modes
+
+- **Enhanced CLI**: New command-line options:
+  - `--stats` / `-s`: Show token savings statistics comparing JSON vs CTON
+  - `--validate`: Validate CTON syntax without conversion
+  - `--minify` / `-m`: Output CTON without separators (fully inline)
+  - Improved error messages with line/column information and colored output
+
+- **Enhanced Error Reporting**: `ParseError` now includes structured location information:
+  - `line` and `column` attributes for precise error location
+  - `source_excerpt` showing context around the error
+  - `suggestions` array for helpful hints
+  - `to_h` method for programmatic error handling
+
+### Changed
+
+- **Decoder optimizations**: Pre-compiled frozen regex patterns (`SAFE_KEY_PATTERN`, `INTEGER_PATTERN`, `FLOAT_PATTERN`) for faster matching
+- **Encoder**: Now uses frozen regex constants for `SAFE_TOKEN` and `NUMERIC_TOKEN`
+- **RBS signatures**: Comprehensive type signatures for all new APIs
+
+### Fixed
+
+- **Comment handling**: Whitespace and comments are now properly skipped in all parsing contexts
+
 ## [0.3.0] - 2025-11-20
 
 ### Added
```
data/README.md
CHANGED

````diff
@@ -3,92 +3,73 @@
 [](https://badge.fury.io/rb/cton)
 [](https://github.com/davidesantangelo/cton/blob/master/LICENSE.txt)
 
-
+CTON (Compact Token-Oriented Notation) is a token-efficient, JSON-compatible wire format built for LLM prompts. It keeps structure explicit (objects, arrays, table arrays) while removing syntactic noise, so prompts are shorter and outputs are easier to validate. CTON is deterministic and round-trippable, making it safe for LLM workflows.
+
+**CTON is designed to be the reference language for LLM data exchange**: short, deterministic, schema-aware.
 
 ---
 
-##
+## Quickstart
 
-
-
-
-- [Token Savings](#token-savings-vs-json--toon)
-- [Installation](#installation)
-- [Usage](#usage)
-- [Performance & Benchmarks](#performance--benchmarks)
-- [Teaching CTON to LLMs](#teaching-cton-to-llms)
-- [Development](#development)
-- [Contributing](#contributing)
-- [License](#license)
+```bash
+bundle add cton
+```
 
-
+```ruby
+require "cton"
 
-
+payload = {
+  "user" => { "id" => 42, "name" => "Ada" },
+  "tags" => ["llm", "compact"],
+  "events" => [
+    { "id" => 1, "action" => "login" },
+    { "id" => 2, "action" => "upload" }
+  ]
+}
 
-
+cton = Cton.dump(payload)
+# => user(id=42,name=Ada)
+# => tags[2]=llm,compact
+# => events[2]{id,action}=1,login;2,upload
 
-
+round_trip = Cton.load(cton)
+# => same as payload
+```
 
-
-
-
-
-
-
+```bash
+# CLI usage
+cton input.json
+cton --to-json data.cton
+cton --stats input.json
+```
 
 ---
 
-##
+## Why CTON for LLMs?
 
-
+- **Shorter prompts**: CTON removes braces, indentation, and repeated keys.
+- **Schema hints built-in**: arrays include length and tables include headers.
+- **Deterministic output**: round-trip safe and validates structure.
+- **LLM-friendly**: small grammar + clear guardrails for generation.
 
-
-
-
-
-
-  "id": 123
-}
-```
+---
+
+## CTON in 60 seconds
+
+### Objects & Scalars
 
-**CTON**
 ```text
 task=planning,urgent=true,id=123
 ```
 
 ### Nested Objects
 
-**JSON**
-```json
-{
-  "user": {
-    "name": "Davide",
-    "settings": {
-      "theme": "dark"
-    }
-  }
-}
-```
-
-**CTON**
 ```text
-user(name=
+user(name=Ada,settings(theme=dark))
 ```
 
-### Arrays
+### Arrays & Tables
 
-**JSON**
-```json
-{
-  "tags": ["ruby", "gem", "llm"],
-  "files": [
-    { "name": "README.md", "size": 1024 },
-    { "name": "lib/cton.rb", "size": 2048 }
-  ]
-}
-```
-
-**CTON**
 ```text
 tags[3]=ruby,gem,llm
 files[2]{name,size}=README.md,1024;lib/cton.rb,2048
````
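The table-array form shown in the new README (`files[2]{name,size}=...`) can be illustrated with a small standalone encoder. This is a hypothetical sketch, not the gem's `Cton.dump`, and the method names are made up. It only handles one root key holding either an array of scalars or an array of hashes that share the same keys and scalar values. Quoting follows two sources. The README rule wraps strings containing `,` `;` `=` `(` `)` or whitespace in double quotes. The 0.3.0 spec adds quoting for strings that look like literals or numbers.

```ruby
# Hypothetical illustration of CTON array and table syntax; not the gem's encoder.
RESERVED = /[,;=()\s]/
LITERAL_LOOKALIKE = /\A(?:true|false|null|-?\d+(?:\.\d+)?)\z/

# Render one scalar, quoting strings that are empty, contain reserved
# characters or whitespace, or would otherwise read as a literal/number.
def encode_scalar(value)
  return "null" if value.nil?

  text = value.to_s
  quote = value.is_a?(String) &&
          (text.empty? || text.match?(RESERVED) || text.match?(LITERAL_LOOKALIKE))
  return text unless quote

  escaped = text.gsub(/["\\]/) { |char| "\\#{char}" }
  "\"#{escaped}\""
end

def scalar?(value)
  !value.is_a?(Hash) && !value.is_a?(Array)
end

# A table needs hashes with identical keys (in the same order) and scalar values.
def table?(items)
  return false if items.empty? || !items.first.is_a?(Hash)

  columns = items.first.keys
  items.all? do |item|
    item.is_a?(Hash) && item.keys == columns && item.values.all? { |v| scalar?(v) }
  end
end

# Encode a single root key whose value is an array.
def encode_array(key, items)
  if table?(items)
    columns = items.first.keys
    rows = items.map { |row| columns.map { |col| encode_scalar(row[col]) }.join(",") }
    "#{key}[#{items.size}]{#{columns.join(",")}}=#{rows.join(";")}"
  elsif items.all? { |item| scalar?(item) }
    "#{key}[#{items.size}]=#{items.map { |item| encode_scalar(item) }.join(",")}"
  else
    raise ArgumentError, "nested or mixed arrays are outside this sketch"
  end
end

puts encode_array("tags", %w[ruby gem llm])
puts encode_array("files", [
  { "name" => "README.md", "size" => 1024 },
  { "name" => "lib/cton.rb", "size" => 2048 }
])
```

Both calls reproduce the README examples: `tags[3]=ruby,gem,llm` and `files[2]{name,size}=README.md,1024;lib/cton.rb,2048`. The column list is written once, which is where the savings over repeated JSON keys come from.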
````diff
@@ -96,236 +77,146 @@ files[2]{name,size}=README.md,1024;lib/cton.rb,2048
 
 ---
 
-##
-
-- **Less noise than YAML/JSON**: no indentation, no braces around the root, and optional quoting.
-- **Schema guardrails**: arrays carry their length (`friends[3]`) and table headers (`{id,name,...}`) so downstream parsing can verify shape.
-- **LLM-friendly**: works as a single string you can embed in a prompt together with short parsing instructions.
-- **Token savings**: CTON compounds the JSON → TOON savings.
+## LLM Prompt Kit (Recommended)
 
-
+System prompt template:
 
-
-
-- **Net effect**: In practice you can often reclaim **50–60% of the token budget** versus raw JSON, leaving more room for instructions or reasoning steps while keeping a deterministic schema.
+```markdown
+You are an expert in CTON (Compact Token-Oriented Notation). Convert between JSON and CTON following the rules below and preserve the schema exactly.
 
-
-
-
-
-
-
-
-bundle add cton
+Rules:
+1. Do not wrap the root in `{}`.
+2. Objects use `key=value` and nested objects use `key(...)`.
+3. Arrays are `key[N]=v1,v2` and table arrays are `key[N]{k1,k2}=v1,v2;v1,v2`.
+4. Use unquoted literals for `true`, `false`, and `null`.
+5. Quote strings containing reserved characters (`,`, `;`, `=`, `(`, `)`) or whitespace.
+6. Always keep array length and table headers accurate.
 ```
 
-
+Few-shot example:
 
-```
-
+```text
+JSON: {"team":[{"id":1,"name":"Ada"},{"id":2,"name":"Lin"}]}
+CTON: team[2]{id,name}=1,Ada;2,Lin
 ```
 
 ---
 
-##
-
-```ruby
-require "cton"
-
-payload = {
-  "context" => {
-    "task" => "Our favorite hikes together",
-    "location" => "Boulder",
-    "season" => "spring_2025"
-  },
-  "friends" => %w[ana luis sam],
-  "hikes" => [
-    { "id" => 1, "name" => "Blue Lake Trail", "distanceKm" => 7.5, "elevationGain" => 320, "companion" => "ana", "wasSunny" => true },
-    { "id" => 2, "name" => "Ridge Overlook", "distanceKm" => 9.2, "elevationGain" => 540, "companion" => "luis", "wasSunny" => false },
-    { "id" => 3, "name" => "Wildflower Loop", "distanceKm" => 5.1, "elevationGain" => 180, "companion" => "sam", "wasSunny" => true }
-  ]
-}
-
-# Encode to CTON
-cton = Cton.dump(payload)
-# => "context(... )\nfriends[3]=ana,luis,sam\nhikes[3]{...}"
-
-# Decode back to Hash
-round_tripped = Cton.load(cton)
-# => original hash
-
-# Need symbols?
-symbolized = Cton.load(cton, symbolize_names: true)
-
-# Want a truly inline document? Opt in explicitly (decoding becomes unsafe for ambiguous cases).
-inline = Cton.dump(payload, separator: "")
+## Schema Validation (1.0.0)
 
-
-pretty = Cton.dump(payload, pretty: true)
+CTON ships with a schema DSL for validation inside your LLM pipeline.
 
-
-
-
+```ruby
+schema = Cton.schema do
+  object do
+    key "user" do
+      object do
+        key "id", integer
+        key "name", string
+        optional "role", enum("admin", "viewer")
+      end
+    end
+    key "tags", array(of: string)
+  end
 end
 
-
-
-strict = Cton.dump(payload, decimal_mode: :precise)
+result = Cton.validate_schema(payload, schema)
+puts result.valid? # true/false
 ```
 
-
+Schema files can be used from the CLI as well:
 
-
+```ruby
+# schema.rb
+CTON_SCHEMA = Cton.schema do
+  object do
+    key "user", object { key "id", integer }
+  end
+end
+```
 
 ```bash
-
-echo '{"hello": "world"}' | cton
-# => hello=world
-
-# Convert CTON to JSON
-echo 'hello=world' | cton --to-json
-# => {"hello":"world"}
-
-# Pretty print
-cton --pretty input.json
+cton --schema schema.rb input.cton
 ```
 
-
-
-#### Extended Types
-CTON natively supports serialization for:
-- `Time` and `Date` (ISO8601 strings)
-- `Set` (converted to Arrays)
-- `OpenStruct` (converted to Objects)
+---
 
-
-Whenever an array is made of hashes that all expose the same scalar keys, the encoder flattens it into a table to save tokens. Mixed or nested arrays fall back to `[N]=(value1,value2,...)`.
+## Streaming IO (1.0.0)
 
-
-Removing every newline makes certain inputs ambiguous because `sam` and the next key `hikes` can merge into `samhikes`. The default `separator: "\n"` avoids that by inserting a single newline between root segments. You may pass `separator: ""` to `Cton.dump` for maximum compactness, but decoding such strings is only safe if you can guarantee extra quoting or whitespace between segments. When you intentionally omit separators, keep next-level keys alphabetic (e.g., `payload`, `k42`) so the decoder's boundary heuristic can split `...1payload...` without misclassifying numeric prefixes.
+Handle newline-delimited CTON streams efficiently:
 
-
-
-
-
-
+```ruby
+io = File.open("events.cton", "r")
+Cton.load_stream(io).each do |event|
+  # process event
+end
+```
 
-
-
-
-
+```ruby
+io = File.open("events.cton", "w")
+Cton.dump_stream(events, io)
+```
 
 ---
 
-##
+## CTON-B (Binary Mode)
 
-CTON
+CTON-B is an optional binary envelope for compact transport (with optional compression):
 
-```
-
-
-ITERATIONS=2000 STREAM_SIZE=400 bundle exec ruby bench/encode_decode_bench.rb
+```ruby
+binary = Cton.dump_binary(payload)
+round_trip = Cton.load_binary(binary)
 ```
 
-
+CLI:
 
-
-
-
-
-| `json generate` | 0.027 |
-| `cton load` | 2.067 |
-| `json parse` | 0.045 |
-| `cton inline load` (separator=`""`, double payload) | 4.140 |
+```bash
+cton --to-binary input.json > output.ctonb
+cton --from-binary output.ctonb
+```
 
-
+Note: `--stream` with binary assumes newline-delimited binary frames.
 
 ---
 
-##
-
-Use this system prompt to teach an LLM how to understand and generate CTON:
-
-````markdown
-You are an expert in data serialization and specifically in CTON (Compact Token-Oriented Notation). CTON is a token-efficient data format optimized for LLMs that serves as a compact alternative to JSON.
-
-Your task is to interpret CTON input and convert it to JSON, or convert JSON input into valid CTON format, following the specification below.
-
-### CTON Specification
-
-CTON minimizes syntax characters (braces, quotes) while preserving structure and type safety.
-
-**1. Basic Structure (Key-Value)**
-- **Rule:** Do not use outer curly braces `{}` for the root object.
-- **Rule:** Use `=` to separate keys and values.
-- **Rule:** Use `,` to separate fields.
-- **Rule:** Do not use quotes around "safe" strings (alphanumeric, simple text).
-- **Example:** - JSON: `{"task": "planning", "urgent": true}`
-  - CTON: `task=planning,urgent=true`
+## Performance & Benchmarks
 
-
-- **Rule:** Use parentheses `()` to denote a nested object instead of `{}`.
-- **Example:**
-  - JSON: `{"context": {"user": "Davide", "theme": "dark"}}`
-  - CTON: `context(user=Davide,theme=dark)`
+CTON focuses on throughput: memoized table schemas, low-allocation scalar streams, and fast boundary detection for inline docs.
 
-
-- **Rule:** Use the syntax `key[count]{columns}=values` for arrays of objects to avoid repeating keys.
-- **Structure:** `key[Length]{col1,col2}=val1,val2;val1,val2`
-- **Details:** - `[N]` denotes the number of items in the array.
-  - `{col1,col2}` defines the schema headers.
-  - `;` separates distinct objects (rows).
-  - `,` separates values within an object.
-- **Example:**
+Run benchmarks:
 
-
-
-
-  "files": [
-    { "name": "README.md", "size": 1024 },
-    { "name": "lib.rb", "size": 2048 }
-  ]
-}
+```bash
+bundle exec ruby bench/encode_decode_bench.rb
+ITERATIONS=2000 STREAM_SIZE=400 bundle exec ruby bench/encode_decode_bench.rb
 ```
 
-
-
-**4. Type Safety & Literals**
-- **Booleans/Null:** `true`, `false`, and `null` are preserved as literals (unquoted).
-- **Numbers:** Integers and floats are written as is (e.g., `1024`, `3.14`).
-- **Escaping:** If a string value looks like a boolean, number, or contains reserved characters (like `,`, `;`, `=`, `(`, `)`), it must be wrapped in double quotes (e.g., `"true"`).
+---
 
-
+## CLI Reference
 
-
-
-
-
-
-
-
-
-
-
+```bash
+cton [input] # auto-detect JSON/CTON
+cton --to-json input.cton # CTON → JSON
+cton --to-cton input.json # JSON → CTON
+cton --to-binary input.json # JSON → CTON-B
+cton --from-binary input.ctonb
+cton --minify input.json # no separators
+cton --pretty input.json
+cton --stream input.ndjson
+cton --schema schema.rb input.cton
 ```
-````
 
 ---
 
-## Type Safety
-
-CTON ships with RBS signatures (`sig/cton.rbs`) to support type checking and IDE autocompletion.
-
 ## Development
 
 ```bash
 bin/setup # install dependencies
 bundle exec rake # run tests and rubocop
 bin/console # interactive playground
-bundle exec ruby bench/encode_decode_bench.rb # performance smoke test
 ```
 
-
+---
 
 ## Contributing
 
````
data/lib/cton/binary.rb
ADDED

```diff
@@ -0,0 +1,74 @@
+# frozen_string_literal: true
+
+require "zlib"
+
+module Cton
+  module Binary
+    MAGIC = "CTON".b
+    VERSION = 1
+    FLAG_COMPRESSED = 1
+
+    module_function
+
+    def dump(data, compress: true, **options)
+      payload = Cton.dump(data, **options).b
+      flags = 0
+
+      if compress
+        payload = Zlib.deflate(payload)
+        flags |= FLAG_COMPRESSED
+      end
+
+      header = MAGIC + [VERSION, flags].pack("CC")
+      header + encode_varint(payload.bytesize) + payload
+    end
+
+    def load(binary)
+      source = binary.to_s.b
+      raise Cton::Error, "Invalid CTON-B header" unless source.start_with?(MAGIC)
+
+      version = source.getbyte(4)
+      flags = source.getbyte(5)
+      raise Cton::Error, "Unsupported CTON-B version" unless version == VERSION
+
+      length, consumed = decode_varint(source, 6)
+      payload_start = 6 + consumed
+      payload = source.byteslice(payload_start, length)
+      raise Cton::Error, "Invalid CTON-B payload length" if payload.nil? || payload.bytesize < length
+
+      payload = Zlib.inflate(payload) if (flags & FLAG_COMPRESSED).positive?
+
+      Cton.load(payload)
+    end
+
+    def encode_varint(value)
+      bytes = []
+      remaining = value
+      while remaining >= 0x80
+        bytes << ((remaining & 0x7f) | 0x80)
+        remaining >>= 7
+      end
+      bytes << remaining
+      bytes.pack("C*")
+    end
+
+    def decode_varint(source, offset)
+      result = 0
+      shift = 0
+      index = offset
+
+      loop do
+        byte = source.getbyte(index)
+        raise Cton::Error, "Invalid CTON-B varint" unless byte
+
+        result |= (byte & 0x7f) << shift
+        index += 1
+        break if (byte & 0x80).zero?
+
+        shift += 7
+      end
+
+      [result, index - offset]
+    end
+  end
+end
```
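The envelope that `Cton::Binary.dump` writes consists of five parts, in order:

- the 4-byte magic `CTON`
- a version byte
- a flags byte (bit 0 means zlib-compressed)
- an unsigned LEB128 varint holding the payload length
- the payload itself

The standalone sketch below reproduces that layout so it can run without the gem. The module name `CtonBSketch` and the `frame`/`unframe` methods are hypothetical, and a plain String stands in for the output of `Cton.dump`.

```ruby
require "zlib"

# Hypothetical standalone copy of the CTON-B envelope layout from binary.rb.
# A plain String stands in for Cton.dump output so the gem is not needed.
module CtonBSketch
  MAGIC = "CTON".b
  FORMAT_VERSION = 1
  FLAG_COMPRESSED = 1

  module_function

  # Unsigned LEB128: 7 payload bits per byte, high bit set while more bytes follow.
  def encode_varint(value)
    bytes = []
    while value >= 0x80
      bytes << ((value & 0x7f) | 0x80)
      value >>= 7
    end
    bytes << value
    bytes.pack("C*")
  end

  # Returns [decoded_value, bytes_consumed].
  def decode_varint(source, offset)
    result = 0
    shift = 0
    index = offset
    loop do
      byte = source.getbyte(index)
      raise ArgumentError, "truncated varint" unless byte

      result |= (byte & 0x7f) << shift
      index += 1
      break if (byte & 0x80).zero?

      shift += 7
    end
    [result, index - offset]
  end

  # Header: magic (4 bytes), version byte, flags byte, varint length, payload.
  def frame(text, compress: true)
    payload = text.b
    flags = 0
    if compress
      payload = Zlib.deflate(payload)
      flags |= FLAG_COMPRESSED
    end
    MAGIC + [FORMAT_VERSION, flags].pack("CC") + encode_varint(payload.bytesize) + payload
  end

  def unframe(binary)
    source = binary.b
    raise ArgumentError, "bad magic" unless source.start_with?(MAGIC)
    raise ArgumentError, "unsupported version" unless source.getbyte(4) == FORMAT_VERSION

    flags = source.getbyte(5)
    length, consumed = decode_varint(source, 6)
    payload = source.byteslice(6 + consumed, length)
    raise ArgumentError, "truncated payload" if payload.nil? || payload.bytesize < length

    payload = Zlib.inflate(payload) if (flags & FLAG_COMPRESSED).positive?
    payload.force_encoding(Encoding::UTF_8)
  end
end

blob = CtonBSketch.frame("user(id=42,name=Ada)")
p blob.byteslice(0, 6).bytes           # => [67, 84, 79, 78, 1, 1]
p CtonBSketch.encode_varint(300).bytes # => [172, 2]
p CtonBSketch.unframe(blob)            # => "user(id=42,name=Ada)"
```

Unlike `Cton::Binary.load`, `unframe` returns the decoded text rather than passing it to `Cton.load`. Because the varint length prefix delimits the payload, the header costs 6 bytes plus one byte for every 7 bits of payload length.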