cton 0.3.0 → 1.0.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 1c9161ae830ba6b01d3ec94d1170fc4295aaacfa839c869aaa6adefe2711cc2d
- data.tar.gz: 80c4ba30abbf8a562bde581f26e7dc5529aa46275b61930fe28370f34156db61
+ metadata.gz: a1d32c4a3d5f55726071135d05705be7084d7c8332a1f03a5a724523981a7b81
+ data.tar.gz: '0813fb27119f36ab0117e7ca1401fe33d8826bc7bfbe09b68589dece8ae2f376'
  SHA512:
- metadata.gz: 914196284081bacd5b7f5f6ac9a1b246ea8924eddbb26cd28796b12a2ee2156718a9ff3b795b86188d483da4fc294b002a93e935815f692b432dac78b5304dcf
- data.tar.gz: d9b1bfb1f7de402fe9de0d7da90f750d490dbdcb4e572e9283bc3ed15b1b43e3f3f9b44a4031f80ecf5d610a73d64719f3c78f170060de78035e04dab3a9d663
+ metadata.gz: e536052cfe992ab7a5844640b622093a5d6f091edfa6af6ce456761e229fa1eb949ce50f103e9aadca54fa6abce66b0fc535fb5335c884e5de5bd2b29408d747
+ data.tar.gz: 66be8d4e9ff12a3e99dd138c82aa676cdc6e8f9e61a07183137b24de07b2f2d177c9a11b6e27e2407476dc45a19e3b012b63be87f942981a34002a6a292f791e
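The `checksums.yaml` entries above are hex digests of the gem's `metadata.gz` and `data.tar.gz` members. As a rough sketch, a digest like these could be checked with Ruby's standard `digest` library as follows. The helper name `digest_matches?` is a hypothetical name chosen for illustration, not something provided by the gem or by RubyGems.

```ruby
require "digest"

# Hypothetical helper (not part of cton or RubyGems): compares the hex digest
# of some bytes with an expected value such as one copied from checksums.yaml.
def digest_matches?(bytes, expected_hex, algorithm: :sha256)
  digest_class = { sha256: Digest::SHA256, sha512: Digest::SHA512 }.fetch(algorithm)
  actual = digest_class.hexdigest(bytes)
  # Tolerate surrounding YAML quotes, whitespace, and upper-case hex.
  actual == expected_hex.to_s.strip.delete("'\"").downcase
end

# Usage idea (assumes data.tar.gz was already extracted from the .gem archive):
#   digest_matches?(File.binread("data.tar.gz"), "0813fb27...")
```

The quote stripping is there only because one of the new entries appears quoted in the YAML above.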
data/CHANGELOG.md CHANGED
@@ -5,6 +5,68 @@ All notable changes to this project will be documented in this file.
  The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
  and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+ ## [1.0.0] - 2026-01-17
+
+ ### Added
+
+ - **Schema Validation DSL**: Define schemas via `Cton.schema` and validate data with `Cton.validate_schema` for LLM-safe outputs.
+ - **Streaming APIs**: `Cton.load_stream`, `Cton.dump_stream`, plus `StreamReader`/`StreamWriter` for newline-delimited documents.
+ - **CTON-B Binary Mode**: Optional binary envelope with compression via `Cton.dump_binary`/`Cton.load_binary`.
+ - **CLI Enhancements**: `--schema`, `--stream`, `--to-binary`, and `--from-binary` support.
+
+ ### Changed
+
+ - **Performance**: Faster scalar scans in the decoder and reusable scalar buffers in the encoder.
+ - **Docs**: README refocused on LLM usage, schema validation, and streaming workflows.
+
+ ## [0.4.0] - 2025-11-26
+
+ ### Added
+
+ - **Comment Support**: CTON now supports single-line comments using `#` syntax. Comments are ignored during parsing, allowing for annotated data files.
+ - Decoder skips comments (from `#` to end of line) during parsing
+ - Encoder can emit comments via new `comments:` option: `Cton.dump(data, comments: { "key" => "description" })`
+
+ - **Validation API**: New methods to validate CTON syntax without full parsing:
+ - `Cton.valid?(string)` returns `true` or `false`
+ - `Cton.validate(string)` returns a `ValidationResult` object with detailed error information
+ - `ValidationResult` includes `valid?`, `errors`, and `to_s` methods
+ - `ValidationError` includes `message`, `line`, `column`, and `source_excerpt`
+
+ - **Token Statistics API**: Analyze and compare CTON vs JSON token efficiency:
+ - `Cton.stats(data)` returns a `Stats` object with comprehensive metrics
+ - `Cton.stats_hash(data)` returns stats as a Hash
+ - `Stats` includes `json_chars`, `cton_chars`, `savings_percent`, `estimated_json_tokens`, `estimated_cton_tokens`
+ - `Stats.compare(data)` compares multiple format variants (CTON, CTON inline, CTON pretty, JSON, JSON pretty)
+
+ - **Custom Type Registry**: Register custom serializers for domain objects:
+ - `Cton.register_type(klass, as: :object) { |value| ... }` registers a type handler
+ - `Cton.unregister_type(klass)` removes a handler
+ - `Cton.clear_type_registry!` clears all handlers
+ - Supports `:object`, `:array`, and `:scalar` modes
+
+ - **Enhanced CLI**: New command-line options:
+ - `--stats` / `-s`: Show token savings statistics comparing JSON vs CTON
+ - `--validate`: Validate CTON syntax without conversion
+ - `--minify` / `-m`: Output CTON without separators (fully inline)
+ - Improved error messages with line/column information and colored output
+
+ - **Enhanced Error Reporting**: `ParseError` now includes structured location information:
+ - `line` and `column` attributes for precise error location
+ - `source_excerpt` showing context around the error
+ - `suggestions` array for helpful hints
+ - `to_h` method for programmatic error handling
+
+ ### Changed
+
+ - **Decoder optimizations**: Pre-compiled frozen regex patterns (`SAFE_KEY_PATTERN`, `INTEGER_PATTERN`, `FLOAT_PATTERN`) for faster matching
+ - **Encoder**: Now uses frozen regex constants for `SAFE_TOKEN` and `NUMERIC_TOKEN`
+ - **RBS signatures**: Comprehensive type signatures for all new APIs
+
+ ### Fixed
+
+ - **Comment handling**: Whitespace and comments are now properly skipped in all parsing contexts
+
  ## [0.3.0] - 2025-11-20
 
  ### Added
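The changelog lists `json_chars`, `cton_chars`, and `savings_percent` among the `Stats` fields but does not show how they are derived. One plausible reading, sketched here purely as an assumption, is a character-count ratio between the JSON and CTON renderings of the same data. The helper `savings_percent` below is a hypothetical stand-in written with the standard library only; it is not the gem's implementation, and the CTON string is hard-coded rather than produced by `Cton.dump`.

```ruby
require "json"

# Hypothetical helper: percentage of characters saved by the CTON text
# relative to the JSON text. The gem's real formula may differ.
def savings_percent(json_text, cton_text)
  return 0.0 if json_text.empty?

  ((1.0 - cton_text.length.to_f / json_text.length) * 100).round(1)
end

data = { "team" => [{ "id" => 1, "name" => "Ada" }, { "id" => 2, "name" => "Lin" }] }
json_text = JSON.generate(data)              # compact JSON, no whitespace
cton_text = "team[2]{id,name}=1,Ada;2,Lin"   # hand-written CTON for the same data

puts "JSON chars: #{json_text.length}, CTON chars: #{cton_text.length}"
puts "Savings: #{savings_percent(json_text, cton_text)}%"
```

Character counts are only a proxy for tokens; the `estimated_json_tokens` and `estimated_cton_tokens` fields suggest the gem applies some further estimate that this sketch does not attempt.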
data/README.md CHANGED
@@ -3,92 +3,73 @@
  [![Gem Version](https://badge.fury.io/rb/cton.svg)](https://badge.fury.io/rb/cton)
  [![License](https://img.shields.io/badge/license-MIT-blue.svg)](https://github.com/davidesantangelo/cton/blob/master/LICENSE.txt)
 
- **CTON** (Compact Token-Oriented Notation) is an aggressively minified, JSON-compatible wire format that keeps prompts short without giving up schema hints. It is shape-preserving (objects, arrays, scalars, table-like arrays) and deterministic, so you can safely round-trip between Ruby hashes and compact strings that work well in LLM prompts.
+ CTON (Compact Token-Oriented Notation) is a token-efficient, JSON-compatible wire format built for LLM prompts. It keeps structure explicit (objects, arrays, table arrays) while removing syntactic noise, so prompts are shorter and outputs are easier to validate. CTON is deterministic and round-trippable, making it safe for LLM workflows.
+
+ **CTON is designed to be the reference language for LLM data exchange**: short, deterministic, schema-aware.
 
  ---
 
- ## 📖 Table of Contents
+ ## Quickstart
 
- - [What is CTON?](#what-is-cton)
- - [Why another format?](#why-another-format)
- - [Examples](#examples)
- - [Token Savings](#token-savings-vs-json--toon)
- - [Installation](#installation)
- - [Usage](#usage)
- - [Performance & Benchmarks](#performance--benchmarks)
- - [Teaching CTON to LLMs](#teaching-cton-to-llms)
- - [Development](#development)
- - [Contributing](#contributing)
- - [License](#license)
+ ```bash
+ bundle add cton
+ ```
 
- ---
+ ```ruby
+ require "cton"
 
- ## What is CTON?
+ payload = {
+ "user" => { "id" => 42, "name" => "Ada" },
+ "tags" => ["llm", "compact"],
+ "events" => [
+ { "id" => 1, "action" => "login" },
+ { "id" => 2, "action" => "upload" }
+ ]
+ }
 
- CTON is designed to be the most efficient way to represent structured data for Large Language Models (LLMs). It strips away the "syntactic sugar" of JSON that humans like (indentation, excessive quoting, braces) but machines don't strictly need, while adding "structural hints" that help LLMs generate valid output.
+ cton = Cton.dump(payload)
+ # => user(id=42,name=Ada)
+ # => tags[2]=llm,compact
+ # => events[2]{id,action}=1,login;2,upload
 
- ### Key Concepts
+ round_trip = Cton.load(cton)
+ # => same as payload
+ ```
 
- 1. **Root is Implicit**: No curly braces `{}` wrapping the entire document.
- 2. **Minimal Punctuation**:
- * Objects use `key=value`.
- * Nested objects use parentheses `(key=value)`.
- * Arrays use brackets with length `[N]=item1,item2`.
- 3. **Table Compression**: If an array contains objects with the same keys, CTON automatically converts it into a table format `[N]{header1,header2}=val1,val2;val3,val4`. This is a massive token saver for datasets.
+ ```bash
+ # CLI usage
+ cton input.json
+ cton --to-json data.cton
+ cton --stats input.json
+ ```
 
  ---
 
- ## Examples
+ ## Why CTON for LLMs?
 
- ### Simple Key-Value Pairs
+ - **Shorter prompts**: CTON removes braces, indentation, and repeated keys.
+ - **Schema hints built-in**: arrays include length and tables include headers.
+ - **Deterministic output**: round-trip safe and validates structure.
+ - **LLM-friendly**: small grammar + clear guardrails for generation.
 
- **JSON**
- ```json
- {
- "task": "planning",
- "urgent": true,
- "id": 123
- }
- ```
+ ---
+
+ ## CTON in 60 seconds
+
+ ### Objects & Scalars
 
- **CTON**
  ```text
  task=planning,urgent=true,id=123
  ```
 
  ### Nested Objects
 
- **JSON**
- ```json
- {
- "user": {
- "name": "Davide",
- "settings": {
- "theme": "dark"
- }
- }
- }
- ```
-
- **CTON**
  ```text
- user(name=Davide,settings(theme=dark))
+ user(name=Ada,settings(theme=dark))
  ```
 
- ### Arrays and Tables
+ ### Arrays & Tables
 
- **JSON**
- ```json
- {
- "tags": ["ruby", "gem", "llm"],
- "files": [
- { "name": "README.md", "size": 1024 },
- { "name": "lib/cton.rb", "size": 2048 }
- ]
- }
- ```
-
- **CTON**
  ```text
  tags[3]=ruby,gem,llm
  files[2]{name,size}=README.md,1024;lib/cton.rb,2048
@@ -96,236 +77,146 @@ files[2]{name,size}=README.md,1024;lib/cton.rb,2048
 
  ---
 
- ## Why another format?
-
- - **Less noise than YAML/JSON**: no indentation, no braces around the root, and optional quoting.
- - **Schema guardrails**: arrays carry their length (`friends[3]`) and table headers (`{id,name,...}`) so downstream parsing can verify shape.
- - **LLM-friendly**: works as a single string you can embed in a prompt together with short parsing instructions.
- - **Token savings**: CTON compounds the JSON → TOON savings.
+ ## LLM Prompt Kit (Recommended)
 
- ### Token savings vs JSON & TOON
+ System prompt template:
 
- - **JSON → TOON**: The [TOON benchmarks](https://toonformat.dev) report roughly 40% fewer tokens than plain JSON on mixed-structure prompts while retaining accuracy due to explicit array lengths and headers.
- - **TOON → CTON**: By stripping indentation and forcing everything inline, CTON cuts another ~20–40% of characters.
- - **Net effect**: In practice you can often reclaim **50–60% of the token budget** versus raw JSON, leaving more room for instructions or reasoning steps while keeping a deterministic schema.
+ ```markdown
+ You are an expert in CTON (Compact Token-Oriented Notation). Convert between JSON and CTON following the rules below and preserve the schema exactly.
 
- ---
-
- ## Installation
-
- Add the gem to your application:
-
- ```bash
- bundle add cton
+ Rules:
+ 1. Do not wrap the root in `{}`.
+ 2. Objects use `key=value` and nested objects use `key(...)`.
+ 3. Arrays are `key[N]=v1,v2` and table arrays are `key[N]{k1,k2}=v1,v2;v1,v2`.
+ 4. Use unquoted literals for `true`, `false`, and `null`.
+ 5. Quote strings containing reserved characters (`,`, `;`, `=`, `(`, `)`) or whitespace.
+ 6. Always keep array length and table headers accurate.
  ```
 
- Or install it directly:
+ Few-shot example:
 
- ```bash
- gem install cton
+ ```text
+ JSON: {"team":[{"id":1,"name":"Ada"},{"id":2,"name":"Lin"}]}
+ CTON: team[2]{id,name}=1,Ada;2,Lin
  ```
 
  ---
 
- ## Usage
-
- ```ruby
- require "cton"
-
- payload = {
- "context" => {
- "task" => "Our favorite hikes together",
- "location" => "Boulder",
- "season" => "spring_2025"
- },
- "friends" => %w[ana luis sam],
- "hikes" => [
- { "id" => 1, "name" => "Blue Lake Trail", "distanceKm" => 7.5, "elevationGain" => 320, "companion" => "ana", "wasSunny" => true },
- { "id" => 2, "name" => "Ridge Overlook", "distanceKm" => 9.2, "elevationGain" => 540, "companion" => "luis", "wasSunny" => false },
- { "id" => 3, "name" => "Wildflower Loop", "distanceKm" => 5.1, "elevationGain" => 180, "companion" => "sam", "wasSunny" => true }
- ]
- }
-
- # Encode to CTON
- cton = Cton.dump(payload)
- # => "context(... )\nfriends[3]=ana,luis,sam\nhikes[3]{...}"
-
- # Decode back to Hash
- round_tripped = Cton.load(cton)
- # => original hash
-
- # Need symbols?
- symbolized = Cton.load(cton, symbolize_names: true)
-
- # Want a truly inline document? Opt in explicitly (decoding becomes unsafe for ambiguous cases).
- inline = Cton.dump(payload, separator: "")
+ ## Schema Validation (1.0.0)
 
- # Pretty print for human readability
- pretty = Cton.dump(payload, pretty: true)
+ CTON ships with a schema DSL for validation inside your LLM pipeline.
 
- # Stream to an IO object (file, socket, etc.)
- File.open("data.cton", "w") do |f|
- Cton.dump(payload, f)
+ ```ruby
+ schema = Cton.schema do
+ object do
+ key "user" do
+ object do
+ key "id", integer
+ key "name", string
+ optional "role", enum("admin", "viewer")
+ end
+ end
+ key "tags", array(of: string)
+ end
  end
 
- # Toggle float normalization strategies
- fast = Cton.dump(payload) # default :fast mode
- strict = Cton.dump(payload, decimal_mode: :precise)
+ result = Cton.validate_schema(payload, schema)
+ puts result.valid? # true/false
  ```
 
- ### CLI Tool
+ Schema files can be used from the CLI as well:
 
- CTON comes with a command-line tool for quick conversions:
+ ```ruby
+ # schema.rb
+ CTON_SCHEMA = Cton.schema do
+ object do
+ key "user", object { key "id", integer }
+ end
+ end
+ ```
 
  ```bash
- # Convert JSON to CTON
- echo '{"hello": "world"}' | cton
- # => hello=world
-
- # Convert CTON to JSON
- echo 'hello=world' | cton --to-json
- # => {"hello":"world"}
-
- # Pretty print
- cton --pretty input.json
+ cton --schema schema.rb input.cton
  ```
 
- ### Advanced Features
-
- #### Extended Types
- CTON natively supports serialization for:
- - `Time` and `Date` (ISO8601 strings)
- - `Set` (converted to Arrays)
- - `OpenStruct` (converted to Objects)
+ ---
 
- #### Table detection
- Whenever an array is made of hashes that all expose the same scalar keys, the encoder flattens it into a table to save tokens. Mixed or nested arrays fall back to `[N]=(value1,value2,...)`.
+ ## Streaming IO (1.0.0)
 
- #### Separators & ambiguity
- Removing every newline makes certain inputs ambiguous because `sam` and the next key `hikes` can merge into `samhikes`. The default `separator: "\n"` avoids that by inserting a single newline between root segments. You may pass `separator: ""` to `Cton.dump` for maximum compactness, but decoding such strings is only safe if you can guarantee extra quoting or whitespace between segments. When you intentionally omit separators, keep next-level keys alphabetic (e.g., `payload`, `k42`) so the decoder's boundary heuristic can split `...1payload...` without misclassifying numeric prefixes.
+ Handle newline-delimited CTON streams efficiently:
 
- #### Literal safety & number normalization
- Following the TOON specification's guardrails, the encoder now:
- - Auto-quotes strings that would otherwise be parsed as booleans, `null`, or numbers (e.g., `"true"`, `"007"`, `"1e6"`, `"-5"`) so they round-trip as strings without extra work.
- - Canonicalizes float/BigDecimal output: no exponent notation, no trailing zeros, and `-0` collapses to `0`.
- - Converts `NaN` and `±Infinity` inputs to `null`, matching TOON's normalization guidance so downstream decoders don't explode on non-finite numbers.
+ ```ruby
+ io = File.open("events.cton", "r")
+ Cton.load_stream(io).each do |event|
+ # process event
+ end
+ ```
 
- #### Decimal normalization modes
- - `decimal_mode: :fast` (default) prefers Ruby's native float representation and only falls back to `BigDecimal` when scientific notation is detected, minimizing allocations on tight loops.
- - `decimal_mode: :precise` forces the legacy `BigDecimal` path for every float, which is slower but useful for audit-grade dumps where you want deterministic decimal expansion.
- - Both modes share the same trailing-zero stripping and `-0 → 0` normalization, so switching modes never affects integer formatting.
+ ```ruby
+ io = File.open("events.cton", "w")
+ Cton.dump_stream(events, io)
+ ```
 
  ---
 
- ## Performance & Benchmarks
+ ## CTON-B (Binary Mode)
 
- CTON focuses on throughput: encoder table schemas are memoized, scalar list encoding keeps a reusable buffer, floats avoid `BigDecimal` when they can, and the decoder slices straight from the raw string to sidestep `StringScanner` allocations. You can reproduce the numbers below with the bundled script:
+ CTON-B is an optional binary envelope for compact transport (with optional compression):
 
- ```bash
- bundle exec ruby bench/encode_decode_bench.rb
- # customize input size / iterations
- ITERATIONS=2000 STREAM_SIZE=400 bundle exec ruby bench/encode_decode_bench.rb
+ ```ruby
+ binary = Cton.dump_binary(payload)
+ round_trip = Cton.load_binary(binary)
  ```
 
- Latest results on Ruby 3.1.4/macOS (M-series), 1,000 iterations, `STREAM_SIZE=200`:
+ CLI:
 
- | Benchmark | Time (s) |
- | --- | --- |
- | `cton dump` (:fast) | 0.626 |
- | `cton dump` (:precise) | 0.658 |
- | `json generate` | 0.027 |
- | `cton load` | 2.067 |
- | `json parse` | 0.045 |
- | `cton inline load` (separator=`""`, double payload) | 4.140 |
+ ```bash
+ cton --to-binary input.json > output.ctonb
+ cton --from-binary output.ctonb
+ ```
 
- `cton inline load` deliberately concatenates documents without separators to stress the new boundary detector; it now finishes without the runaway allocations seen in earlier releases.
+ Note: `--stream` with binary assumes newline-delimited binary frames.
 
  ---
 
- ## Teaching CTON to LLMs
-
- Use this system prompt to teach an LLM how to understand and generate CTON:
-
- ````markdown
- You are an expert in data serialization and specifically in CTON (Compact Token-Oriented Notation). CTON is a token-efficient data format optimized for LLMs that serves as a compact alternative to JSON.
-
- Your task is to interpret CTON input and convert it to JSON, or convert JSON input into valid CTON format, following the specification below.
-
- ### CTON Specification
-
- CTON minimizes syntax characters (braces, quotes) while preserving structure and type safety.
-
- **1. Basic Structure (Key-Value)**
- - **Rule:** Do not use outer curly braces `{}` for the root object.
- - **Rule:** Use `=` to separate keys and values.
- - **Rule:** Use `,` to separate fields.
- - **Rule:** Do not use quotes around "safe" strings (alphanumeric, simple text).
- - **Example:** - JSON: `{"task": "planning", "urgent": true}`
- - CTON: `task=planning,urgent=true`
+ ## Performance & Benchmarks
 
- **2. Nested Objects**
- - **Rule:** Use parentheses `()` to denote a nested object instead of `{}`.
- - **Example:**
- - JSON: `{"context": {"user": "Davide", "theme": "dark"}}`
- - CTON: `context(user=Davide,theme=dark)`
+ CTON focuses on throughput: memoized table schemas, low-allocation scalar streams, and fast boundary detection for inline docs.
 
- **3. Arrays of Objects (Table Compression)**
- - **Rule:** Use the syntax `key[count]{columns}=values` for arrays of objects to avoid repeating keys.
- - **Structure:** `key[Length]{col1,col2}=val1,val2;val1,val2`
- - **Details:** - `[N]` denotes the number of items in the array.
- - `{col1,col2}` defines the schema headers.
- - `;` separates distinct objects (rows).
- - `,` separates values within an object.
- - **Example:**
+ Run benchmarks:
 
- JSON:
- ```json
- {
- "files": [
- { "name": "README.md", "size": 1024 },
- { "name": "lib.rb", "size": 2048 }
- ]
- }
+ ```bash
+ bundle exec ruby bench/encode_decode_bench.rb
+ ITERATIONS=2000 STREAM_SIZE=400 bundle exec ruby bench/encode_decode_bench.rb
  ```
 
- CTON: `files[2]{name,size}=README.md,1024;lib.rb,2048`
-
- **4. Type Safety & Literals**
- - **Booleans/Null:** `true`, `false`, and `null` are preserved as literals (unquoted).
- - **Numbers:** Integers and floats are written as is (e.g., `1024`, `3.14`).
- - **Escaping:** If a string value looks like a boolean, number, or contains reserved characters (like `,`, `;`, `=`, `(`, `)`), it must be wrapped in double quotes (e.g., `"true"`).
+ ---
 
- ### Examples for Training
+ ## CLI Reference
 
- **Input (JSON):**
- ```json
- {
- "id": 123,
- "active": true,
- "metadata": {
- "created_at": "2023-01-01",
- "tags": "admin"
- }
- }
+ ```bash
+ cton [input] # auto-detect JSON/CTON
+ cton --to-json input.cton # CTON → JSON
+ cton --to-cton input.json # JSON → CTON
+ cton --to-binary input.json # JSON → CTON-B
+ cton --from-binary input.ctonb
+ cton --minify input.json # no separators
+ cton --pretty input.json
+ cton --stream input.ndjson
+ cton --schema schema.rb input.cton
  ```
- ````
 
  ---
 
- ## Type Safety
-
- CTON ships with RBS signatures (`sig/cton.rbs`) to support type checking and IDE autocompletion.
-
  ## Development
 
  ```bash
  bin/setup # install dependencies
  bundle exec rake # run tests and rubocop
  bin/console # interactive playground
- bundle exec ruby bench/encode_decode_bench.rb # performance smoke test
  ```
 
- To release a new version, bump `Cton::VERSION` and run `bundle exec rake release`.
+ ---
 
  ## Contributing
 
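The table-array rule quoted in the README's prompt template (`key[N]{k1,k2}=v1,v2;v1,v2`) can be illustrated with a small standalone sketch. The functions `encode_scalar` and `encode_table` below are hypothetical and written only against the six rules listed in the template; they are not the gem's encoder, and they deliberately skip cases the template does not cover (for example, strings that merely look like numbers or booleans).

```ruby
# Illustrative only: a simplified table-array encoder, not the cton gem's code.
RESERVED_OR_SPACE = /[,;=()\s]/.freeze

# Scalars: literals stay unquoted, strings are quoted only when they contain
# a reserved character or whitespace (rule 5 in the prompt template).
def encode_scalar(value)
  case value
  when nil then "null"
  when true, false, Numeric then value.to_s
  else
    text = value.to_s
    text.match?(RESERVED_OR_SPACE) ? text.inspect : text
  end
end

# Table arrays: every row must expose the same keys in the same order.
def encode_table(key, rows)
  raise ArgumentError, "rows must not be empty" if rows.empty?

  headers = rows.first.keys
  unless rows.all? { |row| row.keys == headers }
    raise ArgumentError, "rows must share the same keys"
  end

  body = rows.map { |row| headers.map { |h| encode_scalar(row[h]) }.join(",") }.join(";")
  "#{key}[#{rows.length}]{#{headers.join(',')}}=#{body}"
end

team = [{ "id" => 1, "name" => "Ada" }, { "id" => 2, "name" => "Lin" }]
puts encode_table("team", team)
```

Because the length and header list are written once up front, a consumer can check both against the rows that follow, which is the "schema hints" property the README emphasizes.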
@@ -0,0 +1,74 @@
+ # frozen_string_literal: true
+
+ require "zlib"
+
+ module Cton
+   module Binary
+     MAGIC = "CTON".b
+     VERSION = 1
+     FLAG_COMPRESSED = 1
+
+     module_function
+
+     def dump(data, compress: true, **options)
+       payload = Cton.dump(data, **options).b
+       flags = 0
+
+       if compress
+         payload = Zlib.deflate(payload)
+         flags |= FLAG_COMPRESSED
+       end
+
+       header = MAGIC + [VERSION, flags].pack("CC")
+       header + encode_varint(payload.bytesize) + payload
+     end
+
+     def load(binary)
+       source = binary.to_s.b
+       raise Cton::Error, "Invalid CTON-B header" unless source.start_with?(MAGIC)
+
+       version = source.getbyte(4)
+       flags = source.getbyte(5)
+       raise Cton::Error, "Unsupported CTON-B version" unless version == VERSION
+
+       length, consumed = decode_varint(source, 6)
+       payload_start = 6 + consumed
+       payload = source.byteslice(payload_start, length)
+       raise Cton::Error, "Invalid CTON-B payload length" if payload.nil? || payload.bytesize < length
+
+       payload = Zlib.inflate(payload) if (flags & FLAG_COMPRESSED).positive?
+
+       Cton.load(payload)
+     end
+
+     def encode_varint(value)
+       bytes = []
+       remaining = value
+       while remaining >= 0x80
+         bytes << ((remaining & 0x7f) | 0x80)
+         remaining >>= 7
+       end
+       bytes << remaining
+       bytes.pack("C*")
+     end
+
+     def decode_varint(source, offset)
+       result = 0
+       shift = 0
+       index = offset
+
+       loop do
+         byte = source.getbyte(index)
+         raise Cton::Error, "Invalid CTON-B varint" unless byte
+
+         result |= (byte & 0x7f) << shift
+         index += 1
+         break if (byte & 0x80).zero?
+
+         shift += 7
+       end
+
+       [result, index - offset]
+     end
+   end
+ end
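Read from the added file above, a CTON-B frame appears to be laid out as: the 4-byte magic `CTON`, one version byte, one flags byte, a little-endian base-128 varint holding the payload length, then the (optionally deflated) payload. The sketch below rebuilds that layout with the standard library only, so it can run without the gem. `build_frame` and `inspect_frame` are hypothetical names, the payload is a plain string standing in for `Cton.dump` output, and errors are raised as `ArgumentError` rather than `Cton::Error`.

```ruby
require "zlib"

MAGIC = "CTON".b
FLAG_COMPRESSED = 1

# Unsigned varint: 7 bits per byte, least-significant group first,
# high bit set on every byte except the last.
def encode_varint(value)
  bytes = []
  while value >= 0x80
    bytes << ((value & 0x7f) | 0x80)
    value >>= 7
  end
  bytes << value
  bytes.pack("C*")
end

# Returns [decoded_value, number_of_bytes_consumed].
def decode_varint(source, offset)
  result = 0
  shift = 0
  index = offset
  loop do
    byte = source.getbyte(index)
    raise ArgumentError, "truncated varint" unless byte

    result |= (byte & 0x7f) << shift
    index += 1
    break if (byte & 0x80).zero?

    shift += 7
  end
  [result, index - offset]
end

# Hypothetical stand-in for Cton::Binary.dump, taking already-encoded text.
def build_frame(text, compress: true)
  payload = text.b
  flags = 0
  if compress
    payload = Zlib.deflate(payload)
    flags |= FLAG_COMPRESSED
  end
  MAGIC + [1, flags].pack("CC") + encode_varint(payload.bytesize) + payload
end

# Hypothetical inspector: splits a frame into its header fields and text.
def inspect_frame(frame)
  source = frame.b
  raise ArgumentError, "bad magic" unless source.start_with?(MAGIC)

  length, consumed = decode_varint(source, 6)
  payload = source.byteslice(6 + consumed, length)
  raise ArgumentError, "short payload" if payload.nil? || payload.bytesize < length

  compressed = (source.getbyte(5) & FLAG_COMPRESSED).positive?
  payload = Zlib.inflate(payload) if compressed
  { version: source.getbyte(4), compressed: compressed, length: length,
    text: payload.dup.force_encoding(Encoding::UTF_8) }
end

frame = build_frame("user(id=42,name=Ada)")
p inspect_frame(frame)
```

The length prefix lets a reader pull exactly one frame off a larger buffer without scanning for a delimiter, which is a common motivation for this kind of header.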