cton 0.1.0

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+ metadata.gz: d011537d6c7b854d8ffffb3dc1e7848534a7919c12f4c1cc4ba84db338cb1669
4
+ data.tar.gz: c96a91aae6acc9dd37e3df0b8e0fe28acfbdb6c20ea1f22a946676962f7bf7ac
5
+ SHA512:
6
+ metadata.gz: 77be2aa2db0a728eaa6835be6a66456fefcd12ca27f32d3fbb3a6d77d91f4b109eca64eb958f24bb63b7ae2e193f389029fc6f1986cd590b708da3e3820b288e
7
+ data.tar.gz: b09740b530363e012c7f00ba6b33aaec366503310756e328a00d9def849fcb13158e911265cf57dcf6a8b613be6cd3e0e6bf164e1cfa480983b73c29a53e8cbb
data/.rspec ADDED
@@ -0,0 +1,3 @@
1
+ --format documentation
2
+ --color
3
+ --require spec_helper
data/.rubocop.yml ADDED
@@ -0,0 +1,8 @@
1
+ AllCops:
2
+ TargetRubyVersion: 3.1
3
+
4
+ Style/StringLiterals:
5
+ EnforcedStyle: double_quotes
6
+
7
+ Style/StringLiteralsInInterpolation:
8
+ EnforcedStyle: double_quotes
data/CHANGELOG.md ADDED
@@ -0,0 +1,25 @@
1
+ # Changelog
2
+
3
+ All notable changes to this project will be documented in this file.
4
+
5
+ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
6
+ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
7
+
8
+ ## [0.1.0] - 2025-11-18
9
+
10
+ ### Added
11
+
12
+ - **Initial Release**: First public version of the `cton` gem.
13
+ - **CTON Encoder**: `Cton.dump` (aliased as `Cton.generate`) to convert a Ruby `Hash` into a CTON string.
14
+ - Encodes objects, arrays, strings, numbers, booleans, and `nil`.
15
+ - Automatic table detection for arrays of uniform hashes, creating a highly compact representation: `key[N]{h1,h2}=v1,v2;...`.
16
+ - Smart string quoting: only quotes strings containing special characters, whitespace, or those that could be misinterpreted as numbers, booleans, or null.
17
+ - Number normalization: canonicalizes floats and `BigDecimal` to a clean, exponent-free format. `NaN` and `Infinity` are converted to `null` for safety.
18
+ - Configurable separator (`\n` by default) between top-level key-value pairs.
19
+ - **CTON Decoder**: `Cton.load` (aliased as `Cton.parse`) to parse a CTON string back into a Ruby object.
20
+ - Handles all CTON structures: objects `()`, arrays `[]`, and compact tables `[]{}`.
21
+ - Supports an optional `symbolize_names: true` argument to convert all hash keys to symbols.
22
+ - Robust parsing of scalars, including quoted and unquoted strings.
23
+ - **Error Handling**: `Cton::EncodeError` for unsupported Ruby types during encoding and `Cton::ParseError` for malformed CTON input.
24
+ - **Documentation**: `README.md` with a format overview, usage examples, and rationale.
25
+ - **Testing**: Comprehensive RSpec suite ensuring round-trip integrity and correct handling of edge cases.
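The "smart string quoting" entry in the changelog above can be sketched on its own. This is a minimal illustration, not the gem's code: the regular expressions mirror the `SAFE_TOKEN` and `NUMERIC_TOKEN` constants that appear in `lib/cton.rb` later in this diff, while the helper names `needs_quotes?` and `encode_string` are chosen here for the sketch.

```ruby
# Sketch of the quoting rule: quote only when a bare token would be unsafe.
SAFE_TOKEN = /\A[0-9A-Za-z_.:-]+\z/
NUMERIC_TOKEN = /\A-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?\z/
RESERVED_LITERALS = %w[true false null].freeze

# True when the string is empty, contains characters outside the safe set,
# or would be read back as a boolean, null, or number.
def needs_quotes?(string)
  return true if string.empty?
  return true unless SAFE_TOKEN.match?(string)

  RESERVED_LITERALS.include?(string) || NUMERIC_TOKEN.match?(string)
end

# Returns the bare token when safe, otherwise a double-quoted, escaped string.
def encode_string(string)
  return string unless needs_quotes?(string)

  escaped = string.gsub(/["\\\n\r\t]/) do |char|
    case char
    when "\n" then "\\n"
    when "\r" then "\\r"
    when "\t" then "\\t"
    else "\\#{char}"
    end
  end
  "\"#{escaped}\""
end

puts encode_string("ana")
puts encode_string("Blue Lake Trail")
puts encode_string("007")
```

Strings such as `ana` or `spring_2025` pass through bare, while `true`, `007`, and anything containing whitespace are wrapped in double quotes so they decode as strings again.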
data/CODE_OF_CONDUCT.md ADDED
@@ -0,0 +1,132 @@
1
+ # Contributor Covenant Code of Conduct
2
+
3
+ ## Our Pledge
4
+
5
+ We as members, contributors, and leaders pledge to make participation in our
6
+ community a harassment-free experience for everyone, regardless of age, body
7
+ size, visible or invisible disability, ethnicity, sex characteristics, gender
8
+ identity and expression, level of experience, education, socio-economic status,
9
+ nationality, personal appearance, race, caste, color, religion, or sexual
10
+ identity and orientation.
11
+
12
+ We pledge to act and interact in ways that contribute to an open, welcoming,
13
+ diverse, inclusive, and healthy community.
14
+
15
+ ## Our Standards
16
+
17
+ Examples of behavior that contributes to a positive environment for our
18
+ community include:
19
+
20
+ * Demonstrating empathy and kindness toward other people
21
+ * Being respectful of differing opinions, viewpoints, and experiences
22
+ * Giving and gracefully accepting constructive feedback
23
+ * Accepting responsibility and apologizing to those affected by our mistakes,
24
+ and learning from the experience
25
+ * Focusing on what is best not just for us as individuals, but for the overall
26
+ community
27
+
28
+ Examples of unacceptable behavior include:
29
+
30
+ * The use of sexualized language or imagery, and sexual attention or advances of
31
+ any kind
32
+ * Trolling, insulting or derogatory comments, and personal or political attacks
33
+ * Public or private harassment
34
+ * Publishing others' private information, such as a physical or email address,
35
+ without their explicit permission
36
+ * Other conduct which could reasonably be considered inappropriate in a
37
+ professional setting
38
+
39
+ ## Enforcement Responsibilities
40
+
41
+ Community leaders are responsible for clarifying and enforcing our standards of
42
+ acceptable behavior and will take appropriate and fair corrective action in
43
+ response to any behavior that they deem inappropriate, threatening, offensive,
44
+ or harmful.
45
+
46
+ Community leaders have the right and responsibility to remove, edit, or reject
47
+ comments, commits, code, wiki edits, issues, and other contributions that are
48
+ not aligned to this Code of Conduct, and will communicate reasons for moderation
49
+ decisions when appropriate.
50
+
51
+ ## Scope
52
+
53
+ This Code of Conduct applies within all community spaces, and also applies when
54
+ an individual is officially representing the community in public spaces.
55
+ Examples of representing our community include using an official email address,
56
+ posting via an official social media account, or acting as an appointed
57
+ representative at an online or offline event.
58
+
59
+ ## Enforcement
60
+
61
+ Instances of abusive, harassing, or otherwise unacceptable behavior may be
62
+ reported to the community leaders responsible for enforcement at
63
+ [INSERT CONTACT METHOD].
64
+ All complaints will be reviewed and investigated promptly and fairly.
65
+
66
+ All community leaders are obligated to respect the privacy and security of the
67
+ reporter of any incident.
68
+
69
+ ## Enforcement Guidelines
70
+
71
+ Community leaders will follow these Community Impact Guidelines in determining
72
+ the consequences for any action they deem in violation of this Code of Conduct:
73
+
74
+ ### 1. Correction
75
+
76
+ **Community Impact**: Use of inappropriate language or other behavior deemed
77
+ unprofessional or unwelcome in the community.
78
+
79
+ **Consequence**: A private, written warning from community leaders, providing
80
+ clarity around the nature of the violation and an explanation of why the
81
+ behavior was inappropriate. A public apology may be requested.
82
+
83
+ ### 2. Warning
84
+
85
+ **Community Impact**: A violation through a single incident or series of
86
+ actions.
87
+
88
+ **Consequence**: A warning with consequences for continued behavior. No
89
+ interaction with the people involved, including unsolicited interaction with
90
+ those enforcing the Code of Conduct, for a specified period of time. This
91
+ includes avoiding interactions in community spaces as well as external channels
92
+ like social media. Violating these terms may lead to a temporary or permanent
93
+ ban.
94
+
95
+ ### 3. Temporary Ban
96
+
97
+ **Community Impact**: A serious violation of community standards, including
98
+ sustained inappropriate behavior.
99
+
100
+ **Consequence**: A temporary ban from any sort of interaction or public
101
+ communication with the community for a specified period of time. No public or
102
+ private interaction with the people involved, including unsolicited interaction
103
+ with those enforcing the Code of Conduct, is allowed during this period.
104
+ Violating these terms may lead to a permanent ban.
105
+
106
+ ### 4. Permanent Ban
107
+
108
+ **Community Impact**: Demonstrating a pattern of violation of community
109
+ standards, including sustained inappropriate behavior, harassment of an
110
+ individual, or aggression toward or disparagement of classes of individuals.
111
+
112
+ **Consequence**: A permanent ban from any sort of public interaction within the
113
+ community.
114
+
115
+ ## Attribution
116
+
117
+ This Code of Conduct is adapted from the [Contributor Covenant][homepage],
118
+ version 2.1, available at
119
+ [https://www.contributor-covenant.org/version/2/1/code_of_conduct.html][v2.1].
120
+
121
+ Community Impact Guidelines were inspired by
122
+ [Mozilla's code of conduct enforcement ladder][Mozilla CoC].
123
+
124
+ For answers to common questions about this code of conduct, see the FAQ at
125
+ [https://www.contributor-covenant.org/faq][FAQ]. Translations are available at
126
+ [https://www.contributor-covenant.org/translations][translations].
127
+
128
+ [homepage]: https://www.contributor-covenant.org
129
+ [v2.1]: https://www.contributor-covenant.org/version/2/1/code_of_conduct.html
130
+ [Mozilla CoC]: https://github.com/mozilla/diversity
131
+ [FAQ]: https://www.contributor-covenant.org/faq
132
+ [translations]: https://www.contributor-covenant.org/translations
data/LICENSE.txt ADDED
@@ -0,0 +1,21 @@
1
+ The MIT License (MIT)
2
+
3
+ Copyright (c) 2025 Davide Santangelo
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in
13
+ all copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
21
+ THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,109 @@
1
+ # CTON
2
+
3
+ CTON (Compact Token-Oriented Notation) is an aggressively minified, JSON-compatible wire format that keeps prompts short without giving up schema hints. It is shape-preserving (objects, arrays, scalars, table-like arrays) and deterministic, so you can safely round-trip between Ruby hashes and compact strings that work well in LLM prompts.
4
+
5
+ ## Why another format?
6
+
7
+ - **Less noise than YAML/JSON**: no indentation, no braces around the root, and optional quoting.
8
+ - **Schema guardrails**: arrays carry their length (`friends[3]`) and table headers (`{id,name,...}`) so downstream parsing can verify shape.
9
+ - **LLM-friendly**: works as a single string you can embed in a prompt together with short parsing instructions.
10
+ - **Token savings**: CTON compounds the JSON → TOON savings; see the section below for concrete numbers.
11
+
12
+ ## Token savings vs JSON & TOON
13
+
14
+ - **JSON → TOON**: The [TOON benchmarks](https://toonformat.dev) report roughly 40% fewer tokens than plain JSON on mixed-structure prompts while retaining accuracy due to explicit array lengths and headers.
15
+ - **TOON → CTON**: By stripping indentation and forcing everything inline, CTON cuts another ~20–40% of characters. The sample under "Format at a glance" below is ~350 characters as TOON and ~250 as CTON (~29% fewer), and larger tabular datasets show similar reductions.
16
+ - **Net effect**: In practice you can often reclaim 50–60% of the token budget versus raw JSON, leaving more room for instructions or reasoning steps while keeping a deterministic schema.
17
+
18
+ ## Format at a glance
19
+
20
+ ```
21
+ context(task="Our favorite hikes together",location=Boulder,season=spring_2025)
22
+ friends[3]=ana,luis,sam
23
+ hikes[3]{id,name,distanceKm,elevationGain,companion,wasSunny}=1,"Blue Lake Trail",7.5,320,ana,true;2,"Ridge Overlook",9.2,540,luis,false;3,"Wildflower Loop",5.1,180,sam,true
24
+ ```
25
+
26
+ - Objects use parentheses and `key=value` pairs separated by commas.
27
+ - Arrays encode their length: `[N]=...`. When every element is a flat hash with the same keys, they collapse into a compact table: `[N]{key1,key2}=row1;row2`.
28
+ - Scalars (numbers, booleans, `null`) keep their JSON text. Strings only need quotes when they contain whitespace or reserved punctuation, or when they would otherwise read as a number, boolean, or `null`.
29
+ - For parsing safety the Ruby encoder inserts a single `\n` between top-level segments. You can override this if you truly need a fully inline document (see options below).
30
+
31
+ ## Installation
32
+
33
+ Add the gem to your application:
34
+
35
+ ```bash
36
+ bundle add cton
37
+ ```
38
+
39
+ Or install it directly:
40
+
41
+ ```bash
42
+ gem install cton
43
+ ```
44
+
45
+ ## Usage
46
+
47
+ ```ruby
48
+ require "cton"
49
+
50
+ payload = {
51
+ "context" => {
52
+ "task" => "Our favorite hikes together",
53
+ "location" => "Boulder",
54
+ "season" => "spring_2025"
55
+ },
56
+ "friends" => %w[ana luis sam],
57
+ "hikes" => [
58
+ { "id" => 1, "name" => "Blue Lake Trail", "distanceKm" => 7.5, "elevationGain" => 320, "companion" => "ana", "wasSunny" => true },
59
+ { "id" => 2, "name" => "Ridge Overlook", "distanceKm" => 9.2, "elevationGain" => 540, "companion" => "luis", "wasSunny" => false },
60
+ { "id" => 3, "name" => "Wildflower Loop", "distanceKm" => 5.1, "elevationGain" => 180, "companion" => "sam", "wasSunny" => true }
61
+ ]
62
+ }
63
+
64
+ cton = Cton.dump(payload)
65
+ # => "context(... )\nfriends[3]=ana,luis,sam\nhikes[3]{...}"
66
+
67
+ round_tripped = Cton.load(cton)
68
+ # => original hash
69
+
70
+ # Need symbols?
71
+ symbolized = Cton.load(cton, symbolize_names: true)
72
+
73
+ # Want a truly inline document? Opt in explicitly (decoding becomes unsafe for ambiguous cases).
74
+ inline = Cton.dump(payload, separator: "")
75
+ ```
76
+
77
+ ### Table detection
78
+
79
+ Whenever an array is made of non-empty hashes that all share the same keys (in the same order) and hold only scalar values, the encoder flattens it into a table to save tokens. Any other array falls back to `[N]=value1,value2,...`, with nested objects and arrays encoded inline.
80
+
81
+ ### Separators & ambiguity
82
+
83
+ Removing every newline makes certain inputs ambiguous because `sam` and the next key `hikes` can merge into `samhikes`. The default `separator: "\n"` avoids that by inserting a single newline between root segments. You may pass `separator: ""` to `Cton.dump` for maximum compactness, but decoding such strings is only safe if you can guarantee extra quoting or whitespace between segments.
84
+
85
+ ### Literal safety & number normalization
86
+
87
+ Following the TOON specification's guardrails, the encoder:
88
+
89
+ - Auto-quotes strings that would otherwise be parsed as booleans, `null`, or numbers (e.g., `"true"`, `"007"`, `"1e6"`, `"-5"`) so they round-trip as strings without extra work.
90
+ - Canonicalizes float/BigDecimal output: no exponent notation, no trailing zeros, and `-0` collapses to `0`.
91
+ - Converts `NaN` and `±Infinity` inputs to `null`, matching TOON's normalization guidance so downstream decoders don't explode on non-finite numbers.
92
+
93
+ ## Development
94
+
95
+ ```bash
96
+ bin/setup # install dependencies
97
+ bundle exec rspec
98
+ bin/console # interactive playground
99
+ ```
100
+
101
+ To release a new version, bump `Cton::VERSION` and run `bundle exec rake release`.
102
+
103
+ ## Contributing
104
+
105
+ Bug reports and pull requests are welcome at https://github.com/davidesantangelo/cton. Please follow the [Code of Conduct](CODE_OF_CONDUCT.md).
106
+
107
+ ## License
108
+
109
+ MIT © Davide Santangelo
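The table detection described in the README above can be illustrated with a small standalone sketch. It follows the same test as `table_candidate?` in `lib/cton.rb` (every row a non-empty hash, identical key order, scalar values only), but the cell encoder is deliberately simplified: `encode_cell` and `encode_table` are names assumed for this sketch, and it neither escapes quotes nor normalizes numbers the way the gem does.

```ruby
# Scalars are the only values allowed inside a table cell.
def scalar?(value)
  value.is_a?(String) || value.is_a?(Numeric) || value == true || value == false || value.nil?
end

# An array qualifies as a table when every row is a hash with the same
# keys in the same order and only scalar values.
def table_candidate?(rows)
  first = rows.first
  return false unless first.is_a?(Hash) && !first.empty?

  rows.all? do |row|
    row.is_a?(Hash) && row.keys == first.keys && row.values.all? { |value| scalar?(value) }
  end
end

# Simplified cell encoder (assumption: no escaping, no number normalization).
def encode_cell(value)
  case value
  when nil then "null"
  when String then value.match?(/\A[0-9A-Za-z_.:-]+\z/) ? value : "\"#{value}\""
  else value.to_s
  end
end

# Builds key[N]{h1,h2}=v1,v2;v1,v2 from an array of uniform hashes.
def encode_table(key, rows)
  header = rows.first.keys
  body = rows.map do |row|
    header.map { |field| encode_cell(row.fetch(field)) }.join(",")
  end.join(";")
  "#{key}[#{rows.length}]{#{header.join(',')}}=#{body}"
end

hikes = [
  { "id" => 1, "name" => "Blue Lake Trail", "wasSunny" => true },
  { "id" => 2, "name" => "Ridge Overlook", "wasSunny" => false }
]
puts encode_table("hikes", hikes) if table_candidate?(hikes)
```

Because the header is written once, each additional row costs only its values and one `;`, which is where the savings over repeating keys per object come from.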
data/Rakefile ADDED
@@ -0,0 +1,12 @@
1
+ # frozen_string_literal: true
2
+
3
+ require "bundler/gem_tasks"
4
+ require "rspec/core/rake_task"
5
+
6
+ RSpec::Core::RakeTask.new(:spec)
7
+
8
+ require "rubocop/rake_task"
9
+
10
+ RuboCop::RakeTask.new
11
+
12
+ task default: %i[spec rubocop]
data/lib/cton/version.rb ADDED
@@ -0,0 +1,5 @@
1
+ # frozen_string_literal: true
2
+
3
+ module Cton
4
+ VERSION = "0.1.0"
5
+ end
data/lib/cton.rb ADDED
@@ -0,0 +1,581 @@
1
+ # frozen_string_literal: true
2
+
3
+ require "bigdecimal"
4
+ require_relative "cton/version"
5
+
6
+ module Cton
7
+ class Error < StandardError; end
8
+ class EncodeError < Error; end
9
+ class ParseError < Error; end
10
+
11
+ module_function
12
+
13
+ def dump(payload, options = {})
14
+ separator = options.fetch(:separator, "\n")
15
+ Encoder.new(separator: separator).encode(payload)
16
+ end
17
+ alias generate dump
18
+
19
+ def load(cton_string, symbolize_names: false)
20
+ Decoder.new(symbolize_names: symbolize_names).decode(cton_string)
21
+ end
22
+ alias parse load
23
+
24
+ class Encoder
25
+ SAFE_TOKEN = /\A[0-9A-Za-z_.:-]+\z/.freeze
26
+ NUMERIC_TOKEN = /\A-?(?:\d+)(?:\.\d+)?(?:[eE][+-]?\d+)?\z/.freeze
27
+ RESERVED_LITERALS = %w[true false null].freeze
28
+
29
+ def initialize(separator: "\n")
30
+ @separator = separator || ""
31
+ end
32
+
33
+ def encode(payload)
34
+ encode_root(payload)
35
+ end
36
+
37
+ private
38
+
39
+ attr_reader :separator
40
+
41
+ def encode_root(value)
42
+ case value
43
+ when Hash
44
+ value.map { |key, nested| encode_top_level_pair(key, nested) }.join(separator)
45
+ else
46
+ encode_value(value, context: :standalone)
47
+ end
48
+ end
49
+
50
+ def encode_top_level_pair(key, value)
51
+ "#{format_key(key)}#{encode_value(value, context: :top_pair)}"
52
+ end
53
+
54
+ def encode_value(value, context:)
55
+ case value
56
+ when Hash
57
+ encode_object(value)
58
+ when Array
59
+ encode_array(value)
60
+ else
61
+ prefix = context == :top_pair ? "=" : ""
62
+ "#{prefix}#{encode_scalar(value)}"
63
+ end
64
+ end
65
+
66
+ def encode_object(hash)
67
+ return "()" if hash.empty?
68
+
69
+ pairs = hash.map do |key, value|
70
+ "#{format_key(key)}=#{encode_value(value, context: :object)}"
71
+ end
72
+ "(#{pairs.join(',')})"
73
+ end
74
+
75
+ def encode_array(list)
76
+ length = list.length
77
+ return "[0]=" if length.zero?
78
+
79
+ if table_candidate?(list)
80
+ "[#{length}]#{encode_table(list)}"
81
+ else
82
+ body = if list.all? { |value| scalar?(value) }
83
+ list.map { |value| encode_scalar(value) }.join(",")
84
+ else
85
+ list.map { |value| encode_array_element(value) }.join(",")
86
+ end
87
+ "[#{length}]=#{body}"
88
+ end
89
+ end
90
+
91
+ def encode_table(rows)
92
+ header = rows.first.keys
93
+ header_token = "{#{header.map { |key| format_key(key) }.join(',')}}"
94
+ table_rows = rows.map do |row|
95
+ header.map { |field| encode_scalar(row.fetch(field)) }.join(",")
96
+ end
97
+ "#{header_token}=#{table_rows.join(';')}"
98
+ end
99
+
100
+ def encode_array_element(value)
101
+ encode_value(value, context: :array)
102
+ end
103
+
104
+ def encode_scalar(value)
105
+ case value
106
+ when String
107
+ encode_string(value)
108
+ when TrueClass, FalseClass
109
+ value ? "true" : "false"
110
+ when NilClass
111
+ "null"
112
+ when Numeric
113
+ format_number(value)
114
+ else
115
+ raise EncodeError, "Unsupported value: #{value.class}"
116
+ end
117
+ end
118
+
119
+ def encode_string(value)
120
+ return '""' if value.empty?
121
+
122
+ string_needs_quotes?(value) ? quote_string(value) : value
123
+ end
124
+
125
+ FLOAT_DECIMAL_PRECISION = Float::DIG
126
+
127
+ def format_number(value)
128
+ case value
129
+ when Float
130
+ return "null" if value.nan? || value.infinite?
131
+
132
+ normalize_decimal_string(float_decimal_string(value))
133
+ when Integer
134
+ value.to_s
135
+ else
136
+ if defined?(BigDecimal) && value.is_a?(BigDecimal)
137
+ normalize_decimal_string(value.to_s("F"))
138
+ else
139
+ value.to_s
140
+ end
141
+ end
142
+ end
143
+
144
+ def normalize_decimal_string(string)
145
+ stripped = string.start_with?("+") ? string[1..-1] : string
146
+ return "0" if zero_string?(stripped)
147
+
148
+ if stripped.include?(".")
149
+ stripped = stripped.sub(/0+\z/, "")
150
+ stripped = stripped.sub(/\.\z/, "")
151
+ end
152
+
153
+ stripped
154
+ end
155
+
156
+ def zero_string?(string)
157
+ string.match?(/\A-?0+(?:\.0+)?\z/)
158
+ end
159
+
160
+ def float_decimal_string(value)
161
+ if defined?(BigDecimal)
162
+ BigDecimal(value.to_s).to_s("F")
163
+ else
164
+ Kernel.format("%.#{FLOAT_DECIMAL_PRECISION}f", value)
165
+ end
166
+ end
167
+
168
+ def format_key(key)
169
+ key_string = key.to_s
170
+ unless SAFE_TOKEN.match?(key_string)
171
+ raise EncodeError, "Invalid key: #{key_string.inspect}"
172
+ end
173
+ key_string
174
+ end
175
+
176
+ def string_needs_quotes?(value)
177
+ return true unless SAFE_TOKEN.match?(value)
178
+ RESERVED_LITERALS.include?(value) || numeric_like?(value)
179
+ end
180
+
181
+ def numeric_like?(value)
182
+ NUMERIC_TOKEN.match?(value)
183
+ end
184
+
185
+ def quote_string(value)
186
+ "\"#{escape_string(value)}\""
187
+ end
188
+
189
+ def escape_string(value)
190
+ value.gsub(/["\\\n\r\t]/) do |char|
191
+ case char
192
+ when "\n" then "\\n"
193
+ when "\r" then "\\r"
194
+ when "\t" then "\\t"
195
+ else
196
+ "\\#{char}"
197
+ end
198
+ end
199
+ end
200
+
201
+ def scalar?(value)
202
+ value.is_a?(String) || value.is_a?(Numeric) || value == true || value == false || value.nil?
203
+ end
204
+
205
+ def table_candidate?(rows)
206
+ return false if rows.empty?
207
+
208
+ first = rows.first
209
+ return false unless first.is_a?(Hash) && !first.empty?
210
+
211
+ keys = first.keys
212
+ rows.all? do |row|
213
+ row.is_a?(Hash) && row.keys == keys && row.values.all? { |val| scalar?(val) }
214
+ end
215
+ end
216
+ end
217
+
218
+ class Decoder
219
+ TERMINATORS = [",", ";", ")", "]", "}"].freeze
220
+
221
+ def initialize(symbolize_names: false)
222
+ @symbolize_names = symbolize_names
223
+ end
224
+
225
+ def decode(cton)
226
+ @source = cton.to_s
227
+ @index = 0
228
+ skip_ws
229
+
230
+ value = if key_ahead?(@index)
231
+ parse_document
232
+ else
233
+ parse_value(allow_key_boundary: true)
234
+ end
235
+
236
+ skip_ws
237
+ raise ParseError, "Unexpected trailing data" unless eof?
238
+
239
+ value
240
+ end
241
+
242
+ private
243
+
244
+ attr_reader :symbolize_names
245
+
246
+ def parse_document
247
+ result = {}
248
+ until eof?
249
+ key = parse_key_name
250
+ value = parse_value_for_key
251
+ assign_pair(result, key, value)
252
+ skip_ws
253
+ end
254
+ result
255
+ end
256
+
257
+ def parse_value_for_key
258
+ skip_ws
259
+ char = current_char
260
+ case char
261
+ when "("
262
+ parse_object
263
+ when "["
264
+ parse_array
265
+ when "="
266
+ advance
267
+ parse_scalar(allow_key_boundary: true)
268
+ else
269
+ raise ParseError, "Unexpected token #{char.inspect} while reading value"
270
+ end
271
+ end
272
+
273
+ def parse_object
274
+ expect!("(")
275
+ skip_ws
276
+ if current_char == ")"
277
+ expect!(")")
278
+ return {}
279
+ end
280
+
281
+ pairs = {}
282
+ loop do
283
+ key = parse_key_name
284
+ expect!("=")
285
+ value = parse_value
286
+ assign_pair(pairs, key, value)
287
+ skip_ws
288
+ break if current_char == ")"
289
+ expect!(",")
290
+ skip_ws
291
+ end
292
+ expect!(")")
293
+ pairs
294
+ end
295
+
296
+ def parse_array
297
+ expect!("[")
298
+ length = parse_integer_literal
299
+ expect!("]")
300
+ skip_ws
301
+
302
+ header = parse_header if current_char == "{"
303
+
304
+ expect!("=")
305
+ return [] if length.zero?
306
+
307
+ header ? parse_table_rows(length, header) : parse_array_elements(length)
308
+ end
309
+
310
+ def parse_header
311
+ expect!("{")
312
+ fields = []
313
+ loop do
314
+ fields << parse_key_name
315
+ break if current_char == "}"
316
+ expect!(",")
317
+ end
318
+ expect!("}")
319
+ fields
320
+ end
321
+
322
+ def parse_table_rows(length, header)
323
+ rows = []
324
+ length.times do |row_index|
325
+ row = {}
326
+ header.each_with_index do |field, column_index|
327
+ allow_boundary = row_index == length - 1 && column_index == header.length - 1
328
+ row[field] = parse_scalar(allow_key_boundary: allow_boundary)
329
+ expect!(",") if column_index < header.length - 1
330
+ end
331
+ rows << symbolize_keys(row)
332
+ expect!(";") if row_index < length - 1
333
+ end
334
+ rows
335
+ end
336
+
337
+ def parse_array_elements(length)
338
+ values = []
339
+ length.times do |index|
340
+ allow_boundary = index == length - 1
341
+ values << parse_value(allow_key_boundary: allow_boundary)
342
+ expect!(",") if index < length - 1
343
+ end
344
+ values
345
+ end
346
+
347
+ def parse_value(allow_key_boundary: false)
348
+ skip_ws
349
+ char = current_char
350
+ raise ParseError, "Unexpected end of input" if char.nil?
351
+
352
+ case char
353
+ when "("
354
+ parse_object
355
+ when "["
356
+ parse_array
357
+ when '"'
358
+ parse_string
359
+ else
360
+ parse_scalar(allow_key_boundary: allow_key_boundary)
361
+ end
362
+ end
363
+
364
+ def parse_scalar(terminators: TERMINATORS, allow_key_boundary: false)
365
+ skip_ws
366
+ return parse_string if current_char == '"'
367
+
368
+ start = @index
369
+ limit_index = allow_key_boundary ? next_key_index(@index) : nil
370
+ exit_reason = nil
371
+
372
+ while !eof?
373
+ if limit_index && @index >= limit_index
374
+ exit_reason = :boundary
375
+ break
376
+ end
377
+
378
+ char = current_char
379
+
380
+ if char.nil?
381
+ exit_reason = :eof
382
+ break
383
+ elsif terminators.include?(char)
384
+ exit_reason = :terminator
385
+ break
386
+ elsif whitespace?(char)
387
+ exit_reason = :whitespace
388
+ break
389
+ elsif "()[]{}".include?(char)
390
+ exit_reason = :structure
391
+ break
392
+ end
393
+
394
+ @index += 1
395
+ end
396
+
397
+ token = if exit_reason == :boundary && limit_index
398
+ @source[start...limit_index]
399
+ else
400
+ @source[start...@index]
401
+ end
402
+
403
+ raise ParseError, "Empty value" if token.nil? || token.empty?
404
+
405
+ convert_scalar(token)
406
+ end
407
+
408
+ def convert_scalar(token)
409
+ case token
410
+ when "true" then true
411
+ when "false" then false
412
+ when "null" then nil
413
+ else
414
+ if integer?(token)
415
+ token.to_i
416
+ elsif float?(token)
417
+ token.to_f
418
+ else
419
+ token
420
+ end
421
+ end
422
+ end
423
+
424
+ def parse_string
425
+ expect!("\"")
426
+ buffer = +""
427
+ while !eof?
428
+ char = current_char
429
+ raise ParseError, "Unterminated string" if char.nil?
430
+
431
+ if char == '\\'
432
+ @index += 1
433
+ escaped = current_char
434
+ raise ParseError, "Invalid escape sequence" if escaped.nil?
435
+ buffer << case escaped
436
+ when 'n' then "\n"
437
+ when 'r' then "\r"
438
+ when 't' then "\t"
439
+ when '"', '\\' then escaped
440
+ else
441
+ raise ParseError, "Unsupported escape sequence"
442
+ end
443
+ elsif char == '"'
444
+ break
445
+ else
446
+ buffer << char
447
+ end
448
+ @index += 1
449
+ end
450
+ expect!("\"")
451
+ buffer
452
+ end
453
+
454
+ def parse_key_name
455
+ skip_ws
456
+ start = @index
457
+ while !eof? && safe_key_char?(current_char)
458
+ @index += 1
459
+ end
460
+ token = @source[start...@index]
461
+ raise ParseError, "Invalid key" if token.nil? || token.empty?
462
+ symbolize_names ? token.to_sym : token
463
+ end
464
+
465
+ def parse_integer_literal
466
+ start = @index
467
+ while !eof? && current_char =~ /\d/
468
+ @index += 1
469
+ end
470
+ token = @source[start...@index]
471
+ raise ParseError, "Expected digits" if token.nil? || token.empty?
472
+ Integer(token, 10)
473
+ rescue ArgumentError
474
+ raise ParseError, "Invalid length literal"
475
+ end
476
+
477
+ def assign_pair(hash, key, value)
478
+ hash[key] = value
479
+ end
480
+
481
+ def symbolize_keys(row)
482
+ symbolize_names ? row.transform_keys(&:to_sym) : row
483
+ end
484
+
485
+ def expect!(char)
486
+ skip_ws
487
+ actual = current_char
488
+ raise ParseError, "Expected #{char.inspect}, got #{actual.inspect}" unless actual == char
489
+ @index += 1
490
+ end
491
+
492
+ def skip_ws
493
+ @index += 1 while !eof? && whitespace?(current_char)
494
+ end
495
+
496
+ def whitespace?(char)
497
+ char == " " || char == "\t" || char == "\n" || char == "\r"
498
+ end
499
+
500
+ def eof?
501
+ @index >= @source.length
502
+ end
503
+
504
+ def current_char
505
+ @source[@index]
506
+ end
507
+
508
+ def advance
509
+ @index += 1
510
+ end
511
+
512
+ def key_ahead?(offset)
513
+ idx = offset
514
+ idx += 1 while idx < @source.length && whitespace?(@source[idx])
515
+ start = idx
516
+ while idx < @source.length && safe_key_char?(@source[idx])
517
+ idx += 1
518
+ end
519
+ return false if idx == start
520
+ next_char = @source[idx]
521
+ ["(", "[", "="].include?(next_char)
522
+ end
523
+
524
+ def safe_key_char?(char)
525
+ !char.nil? && char.match?(/[0-9A-Za-z_.:-]/)
526
+ end
527
+
528
+ def integer?(token)
529
+ token.match?(/\A-?(?:0|[1-9]\d*)\z/)
530
+ end
531
+
532
+ def float?(token)
533
+ token.match?(/\A-?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+-]?\d+)?\z/)
534
+ end
535
+
536
+ def next_key_index(from_index)
537
+ idx = from_index
538
+ in_string = false
539
+
540
+ while idx < @source.length
541
+ char = @source[idx]
542
+
543
+ if in_string
544
+ if char == '\\'
545
+ idx += 2
546
+ next
547
+ elsif char == '"'
548
+ in_string = false
549
+ idx += 1
550
+ next
551
+ else
552
+ idx += 1
553
+ next
554
+ end
555
+ else
556
+ case char
557
+ when '"'
558
+ in_string = true
559
+ idx += 1
560
+ next
561
+ else
562
+ if safe_key_char?(char)
563
+ start = idx
564
+ idx += 1 while idx < @source.length && safe_key_char?(@source[idx])
565
+ next_char = @source[idx]
566
+ if start > from_index && ["(", "[", "="].include?(next_char)
567
+ return start
568
+ end
569
+ idx = start + 1
570
+ next
571
+ end
572
+ idx += 1
573
+ end
574
+ end
575
+ end
576
+
577
+ nil
578
+ end
579
+ end
580
+ end
581
+
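To make the decoder's scalar handling in `lib/cton.rb` above easier to follow, here is a standalone sketch of the conversion step. It reuses the patterns from `integer?` and `float?` in the diff; the constant names `INTEGER_TOKEN` and `FLOAT_TOKEN` are introduced here for readability and are not part of the gem.

```ruby
# Patterns copied from the decoder's integer? and float? helpers.
INTEGER_TOKEN = /\A-?(?:0|[1-9]\d*)\z/
FLOAT_TOKEN = /\A-?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+-]?\d+)?\z/

# Maps an unquoted token to a Ruby value: literals first, then numbers,
# and anything else stays a string.
def convert_scalar(token)
  case token
  when "true" then true
  when "false" then false
  when "null" then nil
  else
    if INTEGER_TOKEN.match?(token)
      token.to_i
    elsif FLOAT_TOKEN.match?(token)
      token.to_f
    else
      token
    end
  end
end

p convert_scalar("42")
p convert_scalar("7.5")
p convert_scalar("007")
p convert_scalar("ana")
```

Leading zeros are rejected by both patterns, so an unquoted `007` stays a string. In the gem itself, quoted tokens are read by `parse_string` and never reach this step, which is how `"true"` and `"1e6"` survive a round trip as strings.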
data/sig/cton.rbs ADDED
@@ -0,0 +1,11 @@
1
+ module Cton
2
+ VERSION: String
3
+ class Error < ::StandardError; end
4
+ class EncodeError < Error; end
5
+ class ParseError < Error; end
6
+
7
+ def self.dump: (untyped, ?Hash[Symbol, untyped]) -> String
8
+ def self.generate: (untyped, ?Hash[Symbol, untyped]) -> String
9
+ def self.load: (String, ?symbolize_names: bool) -> untyped
10
+ def self.parse: (String, ?symbolize_names: bool) -> untyped
11
+ end
metadata ADDED
@@ -0,0 +1,57 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: cton
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.1.0
5
+ platform: ruby
6
+ authors:
7
+ - Davide Santangelo
8
+ autorequire:
9
+ bindir: exe
10
+ cert_chain: []
11
+ date: 2025-11-18 00:00:00.000000000 Z
12
+ dependencies: []
13
+ description: CTON provides a JSON-compatible, token-efficient text representation
14
+ optimized for LLM prompts.
15
+ email:
16
+ - davide.santangelo@gmail.com
17
+ executables: []
18
+ extensions: []
19
+ extra_rdoc_files: []
20
+ files:
21
+ - ".rspec"
22
+ - ".rubocop.yml"
23
+ - CHANGELOG.md
24
+ - CODE_OF_CONDUCT.md
25
+ - LICENSE.txt
26
+ - README.md
27
+ - Rakefile
28
+ - lib/cton.rb
29
+ - lib/cton/version.rb
30
+ - sig/cton.rbs
31
+ homepage: https://github.com/davidesantangelo/cton
32
+ licenses:
33
+ - MIT
34
+ metadata:
35
+ homepage_uri: https://github.com/davidesantangelo/cton
36
+ source_code_uri: https://github.com/davidesantangelo/cton
37
+ changelog_uri: https://github.com/davidesantangelo/cton/blob/master/CHANGELOG.md
38
+ post_install_message:
39
+ rdoc_options: []
40
+ require_paths:
41
+ - lib
42
+ required_ruby_version: !ruby/object:Gem::Requirement
43
+ requirements:
44
+ - - ">="
45
+ - !ruby/object:Gem::Version
46
+ version: 3.1.0
47
+ required_rubygems_version: !ruby/object:Gem::Requirement
48
+ requirements:
49
+ - - ">="
50
+ - !ruby/object:Gem::Version
51
+ version: '0'
52
+ requirements: []
53
+ rubygems_version: 3.3.26
54
+ signing_key:
55
+ specification_version: 4
56
+ summary: Compact Token-Oriented Notation encoder/decoder.
57
+ test_files: []