cton 0.3.0 → 0.4.0
- checksums.yaml +4 -4
- data/CHANGELOG.md +48 -0
- data/README.md +119 -0
- data/lib/cton/decoder.rb +49 -20
- data/lib/cton/encoder.rb +16 -2
- data/lib/cton/stats.rb +123 -0
- data/lib/cton/type_registry.rb +137 -0
- data/lib/cton/validator.rb +427 -0
- data/lib/cton/version.rb +1 -1
- data/lib/cton.rb +84 -2
- data/sig/cton.rbs +119 -6
- metadata +5 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 4e7225331f668eaed1f9b9e85e4cfc6fd183b997a8a6a89b69a5300ad2fbc54c
+  data.tar.gz: 20cdff68722bc7dc8d7629b95c5ae47d6d0eb21a8be26ebc93cc32605a5ed329
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 53f865fe83082953fae7cfa899c1c08014a21b0662d774253dc929c091eb4925c838007abfaeebdbacc3219a59751be0de1990d2230c9347a4d4e87ed9cc9753
+  data.tar.gz: b34df4bb7946e32feb693086892a410d9b440ba6374b3d3673e125b20ab7403c69bded90a43f5dd7683c3c54b889449965cfd136a7b180811a7ccb0a1ab185b9
data/CHANGELOG.md
CHANGED
@@ -5,6 +5,54 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [0.4.0] - 2025-11-26
+
+### Added
+
+- **Comment Support**: CTON now supports single-line comments using `#` syntax. Comments are ignored during parsing, allowing for annotated data files.
+  - Decoder skips comments (from `#` to end of line) during parsing
+  - Encoder can emit comments via new `comments:` option: `Cton.dump(data, comments: { "key" => "description" })`
+
+- **Validation API**: New methods to validate CTON syntax without full parsing:
+  - `Cton.valid?(string)` returns `true` or `false`
+  - `Cton.validate(string)` returns a `ValidationResult` object with detailed error information
+  - `ValidationResult` includes `valid?`, `errors`, and `to_s` methods
+  - `ValidationError` includes `message`, `line`, `column`, and `source_excerpt`
+
+- **Token Statistics API**: Analyze and compare CTON vs JSON token efficiency:
+  - `Cton.stats(data)` returns a `Stats` object with comprehensive metrics
+  - `Cton.stats_hash(data)` returns stats as a Hash
+  - `Stats` includes `json_chars`, `cton_chars`, `savings_percent`, `estimated_json_tokens`, `estimated_cton_tokens`
+  - `Stats.compare(data)` compares multiple format variants (CTON, CTON inline, CTON pretty, JSON, JSON pretty)
+
+- **Custom Type Registry**: Register custom serializers for domain objects:
+  - `Cton.register_type(klass, as: :object) { |value| ... }` registers a type handler
+  - `Cton.unregister_type(klass)` removes a handler
+  - `Cton.clear_type_registry!` clears all handlers
+  - Supports `:object`, `:array`, and `:scalar` modes
+
+- **Enhanced CLI**: New command-line options:
+  - `--stats` / `-s`: Show token savings statistics comparing JSON vs CTON
+  - `--validate`: Validate CTON syntax without conversion
+  - `--minify` / `-m`: Output CTON without separators (fully inline)
+  - Improved error messages with line/column information and colored output
+
+- **Enhanced Error Reporting**: `ParseError` now includes structured location information:
+  - `line` and `column` attributes for precise error location
+  - `source_excerpt` showing context around the error
+  - `suggestions` array for helpful hints
+  - `to_h` method for programmatic error handling
+
+### Changed
+
+- **Decoder optimizations**: Pre-compiled frozen regex patterns (`SAFE_KEY_PATTERN`, `INTEGER_PATTERN`, `FLOAT_PATTERN`) for faster matching
+- **Encoder**: Now uses frozen regex constants for `SAFE_TOKEN` and `NUMERIC_TOKEN`
+- **RBS signatures**: Comprehensive type signatures for all new APIs
+
+### Fixed
+
+- **Comment handling**: Whitespace and comments are now properly skipped in all parsing contexts
+
 ## [0.3.0] - 2025-11-20
 
 ### Added
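Review note: the "Changed" entry above names `INTEGER_PATTERN` and `FLOAT_PATTERN`, and the decoder hunk in this diff gives their definitions. The sketch below copies those two regexes verbatim to show which tokens they accept. The `classify` helper is hypothetical and exists only for illustration; the decoder itself uses separate `integer?` and `float?` predicates.

```ruby
# Patterns copied from the decoder hunk in this diff.
INTEGER_PATTERN = /\A-?(?:0|[1-9]\d*)\z/
FLOAT_PATTERN   = /\A-?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+-]?\d+)?\z/

# Hypothetical helper: test the integer pattern first, because
# FLOAT_PATTERN also accepts plain integers (fraction and exponent are optional).
def classify(token)
  return :integer if token.match?(INTEGER_PATTERN)
  return :float   if token.match?(FLOAT_PATTERN)

  :other
end

%w[42 -0 007 1.5e3 1e5 3. .5].each do |token|
  puts "#{token.ljust(6)} #{classify(token)}"
end
```

One consequence worth noting: leading zeros (`007`), a bare trailing dot (`3.`), and a missing integer part (`.5`) match neither pattern, so such tokens would presumably fall through to string handling.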
data/README.md
CHANGED
@@ -35,6 +35,10 @@ CTON is designed to be the most efficient way to represent structured data for L
     * Nested objects use parentheses `(key=value)`.
     * Arrays use brackets with length `[N]=item1,item2`.
 3. **Table Compression**: If an array contains objects with the same keys, CTON automatically converts it into a table format `[N]{header1,header2}=val1,val2;val3,val4`. This is a massive token saver for datasets.
+4. **Comments**: Single-line comments with `#` for annotating data.
+5. **Validation API**: Check CTON syntax without full parsing for quick validation.
+6. **Token Statistics**: Built-in measurement of token efficiency vs JSON.
+7. **Custom Type Registry**: Register serializers for domain objects.
 
 ---
 
@@ -188,10 +192,125 @@ echo 'hello=world' | cton --to-json
 
 # Pretty print
 cton --pretty input.json
+
+# Minify (fully inline, no separators)
+cton --minify input.json
+
+# Validate CTON syntax
+cton --validate input.cton
+# => ✓ Valid CTON
+
+# Show token savings statistics
+echo '{"name": "test", "items": [1,2,3]}' | cton --stats
+# => JSON: 33 chars / 33 bytes (~9 tokens)
+# => CTON: 26 chars / 26 bytes (~7 tokens)
+# => Saved: 21.2% (7 chars, ~2 tokens)
 ```
 
 ### Advanced Features
 
+#### Comments
+
+CTON supports single-line comments using the `#` character:
+
+```ruby
+cton_with_comments = <<~CTON
+  # User configuration
+  user(
+    name=Alice,
+    # Age is optional
+    age=30
+  )
+CTON
+
+Cton.load(cton_with_comments)
+# => {"user" => {"name" => "Alice", "age" => 30}}
+
+# Add comments when encoding
+Cton.dump(data, comments: { "user" => "User configuration" })
+```
+
+#### Validation API
+
+Validate CTON syntax without full parsing:
+
+```ruby
+# Quick validity check
+Cton.valid?("key=value") # => true
+Cton.valid?("key=(broken") # => false
+
+# Detailed validation with error info
+result = Cton.validate("key=(broken")
+result.valid? # => false
+result.errors.first.message # => "Expected '=' in object"
+result.errors.first.line # => 1
+result.errors.first.column # => 5
+```
+
+#### Token Statistics
+
+Measure CTON's token efficiency compared to JSON:
+
+```ruby
+stats = Cton.stats(data)
+puts stats.savings_percent # => 45.5
+puts stats.estimated_token_savings # => 12
+
+# Full comparison
+puts stats.to_s
+# => JSON: 100 chars / 100 bytes (~25 tokens)
+# => CTON: 55 chars / 55 bytes (~14 tokens)
+# => Saved: 45.0% (45 chars, ~11 tokens)
+
+# Compare all format variants
+Cton::Stats.compare(data)
+# => { cton: {...}, cton_inline: {...}, json: {...}, ... }
+```
+
+#### Custom Type Registry
+
+Register custom serializers for your domain objects:
+
+```ruby
+class Money
+  attr_reader :cents, :currency
+  def initialize(cents, currency)
+    @cents = cents
+    @currency = currency
+  end
+end
+
+# Register as object
+Cton.register_type(Money) do |money|
+  { amount: money.cents, currency: money.currency }
+end
+
+Cton.dump("price" => Money.new(1999, "USD"))
+# => "price(amount=1999,currency=USD)"
+
+# Register as scalar
+Cton.register_type(UUID, as: :scalar) { |uuid| uuid.to_s }
+
+# Unregister when done
+Cton.unregister_type(Money)
+```
+
+#### Enhanced Error Reporting
+
+Parse errors include detailed context for debugging:
+
+```ruby
+begin
+  Cton.load("user(name=Alice,invalid")
+rescue Cton::ParseError => e
+  puts e.message # => "Unterminated object at line 1, column 20"
+  puts e.line # => 1
+  puts e.column # => 20
+  puts e.source_excerpt # => "...name=Alice,invalid"
+  puts e.suggestions # => ["Did you forget a closing ')'?"]
+end
+```
+
 #### Extended Types
 CTON natively supports serialization for:
 - `Time` and `Date` (ISO8601 strings)
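Review note: the README's "Enhanced Error Reporting" section describes `source_excerpt` without saying how the window is chosen. The decoder hunk in this diff builds it from up to 10 characters before the error position and up to 30 after, with whitespace runs collapsed. The standalone sketch below reproduces that logic on a made-up input; the free function `source_excerpt` and the sample string are assumptions for illustration, not part of the gem.

```ruby
# Standalone sketch of the excerpt window from the decoder hunk.
# `raw` is the full source text and `pos` the scanner offset of the error.
def source_excerpt(raw, pos, length: 30)
  start  = [pos - 10, 0].max
  finish = [pos + length, raw.length].min
  excerpt = raw[start...finish]
  excerpt = "...#{excerpt}" if start.positive?        # mark a truncated left edge
  excerpt = "#{excerpt}..." if finish < raw.length    # mark a truncated right edge
  excerpt.gsub(/\s+/, " ")                            # flatten newlines and indentation
end

raw = "user(name=Alice,\n  age=30,\n  city"
puts source_excerpt(raw, raw.length)
# prints "...30, city"
```

Because the window is measured in characters around the scanner position, the excerpt for an error at end of input shows at most the last 10 characters, so real output can be shorter than the illustrative string in the README example.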
data/lib/cton/decoder.rb
CHANGED
@@ -6,6 +6,9 @@ module Cton
   class Decoder
     TERMINATORS = [",", ";", ")", "]", "}"].freeze
     KEY_VALUE_BOUNDARY_TOKENS = ["(", "[", "="].freeze
+    SAFE_KEY_PATTERN = /[0-9A-Za-z_.:-]+/
+    INTEGER_PATTERN = /\A-?(?:0|[1-9]\d*)\z/
+    FLOAT_PATTERN = /\A-?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+-]?\d+)?\z/
 
     def initialize(symbolize_names: false)
       @symbolize_names = symbolize_names
@@ -14,7 +17,7 @@ module Cton
     def decode(cton)
       @raw_string = cton.to_s
       @scanner = StringScanner.new(@raw_string)
-
+      skip_ws_and_comments
 
       value = if key_ahead?
                 parse_document
@@ -22,7 +25,7 @@ module Cton
                 parse_value(allow_key_boundary: true)
               end
 
-
+      skip_ws_and_comments
       raise_error("Unexpected trailing data") unless @scanner.eos?
 
       value
@@ -32,9 +35,16 @@ module Cton
 
     attr_reader :symbolize_names, :scanner, :raw_string
 
-    def raise_error(message)
+    def raise_error(message, suggestions: nil)
       line, col = calculate_location(@scanner.pos)
-
+      excerpt = extract_source_excerpt(@scanner.pos)
+      raise ParseError.new(
+        message,
+        line: line,
+        column: col,
+        source_excerpt: excerpt,
+        suggestions: suggestions
+      )
     end
 
     def calculate_location(pos)
@@ -46,19 +56,31 @@ module Cton
       [line, col]
     end
 
+    def extract_source_excerpt(pos, length: 30)
+      start = [pos - 10, 0].max
+      finish = [pos + length, raw_string.length].min
+      excerpt = raw_string[start...finish]
+      excerpt = "...#{excerpt}" if start.positive?
+      excerpt = "#{excerpt}..." if finish < raw_string.length
+      excerpt.gsub(/\s+/, " ")
+    end
+
     def parse_document
       result = {}
       until @scanner.eos?
+        skip_ws_and_comments
+        break if @scanner.eos?
+
         key = parse_key_name
         value = parse_value_for_key
         result[key] = value
-
+        skip_ws_and_comments
       end
       result
     end
 
     def parse_value_for_key
-
+      skip_ws_and_comments
       if @scanner.scan("(")
         parse_object
       elsif @scanner.scan("[")
@@ -71,7 +93,7 @@ module Cton
     end
 
     def parse_object
-
+      skip_ws_and_comments
       return {} if @scanner.scan(")")
 
       pairs = {}
@@ -80,11 +102,11 @@ module Cton
         expect!("=")
         value = parse_value
         pairs[key] = value
-
+        skip_ws_and_comments
         break if @scanner.scan(")")
 
         expect!(",")
-
+        skip_ws_and_comments
       end
       pairs
     end
@@ -92,7 +114,7 @@ module Cton
     def parse_array
       length = parse_integer_literal
       expect!("]")
-
+      skip_ws_and_comments
 
       header = parse_header if @scanner.peek(1) == "{"
 
@@ -140,7 +162,7 @@ module Cton
     end
 
     def parse_value(allow_key_boundary: false)
-
+      skip_ws_and_comments
       if @scanner.scan("(")
         parse_object
       elsif @scanner.scan("[")
@@ -153,7 +175,7 @@ module Cton
     end
 
     def parse_scalar(allow_key_boundary: false)
-
+      skip_ws_and_comments
       return parse_string if @scanner.peek(1) == '"'
 
       @scanner.pos
@@ -283,8 +305,8 @@ module Cton
     end
 
     def parse_key_name
-
-      token = @scanner.scan(
+      skip_ws_and_comments
+      token = @scanner.scan(SAFE_KEY_PATTERN)
       raise_error("Invalid key") if token.nil?
       symbolize_names ? token.to_sym : token
     end
@@ -302,7 +324,7 @@ module Cton
     end
 
     def expect!(char)
-
+      skip_ws_and_comments
       return if @scanner.scan(Regexp.new(Regexp.escape(char)))
 
       raise_error("Expected #{char.inspect}, got #{@scanner.peek(1).inspect}")
@@ -312,16 +334,23 @@ module Cton
       @scanner.skip(/\s+/)
     end
 
+    def skip_ws_and_comments
+      loop do
+        @scanner.skip(/\s+/)
+        break unless @scanner.scan(/#[^\n]*\n?/)
+      end
+    end
+
     def whitespace?(char)
       [" ", "\t", "\n", "\r"].include?(char)
     end
 
     def key_ahead?
       pos = @scanner.pos
-
+      skip_ws_and_comments
 
-      if @scanner.scan(
-
+      if @scanner.scan(SAFE_KEY_PATTERN)
+        skip_ws_and_comments
         next_char = @scanner.peek(1)
         result = ["(", "[", "="].include?(next_char)
         @scanner.pos = pos
@@ -337,11 +366,11 @@ module Cton
     end
 
     def integer?(token)
-      token.match?(
+      token.match?(INTEGER_PATTERN)
     end
 
     def float?(token)
-      token.match?(
+      token.match?(FLOAT_PATTERN)
     end
   end
 end
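Review note: almost every decoder hunk swaps an earlier skip call for `skip_ws_and_comments`. The sketch below isolates that loop so its behavior on interleaved whitespace and comments is easy to check. It uses only the standard-library `StringScanner`; passing the scanner as an argument is an assumption made for illustration, since the gem reads it from an instance variable.

```ruby
require "strscan"

# Standalone copy of the loop added in the decoder hunk.
def skip_ws_and_comments(scanner)
  loop do
    scanner.skip(/\s+/)                      # consume any run of whitespace
    break unless scanner.scan(/#[^\n]*\n?/)  # consume one comment through end of line, else stop
  end
end

source  = "  # leading comment\n\n# another\nname=Alice # trailing\n"
scanner = StringScanner.new(source)

skip_ws_and_comments(scanner)
first_token = scanner.scan(/[0-9A-Za-z_.:-]+/)  # same character class as SAFE_KEY_PATTERN
puts first_token
# prints "name"
```

The loop alternates between the two patterns until neither advances, so any mix of blank lines and stacked comments is consumed in one call. The trailing `\n?` lets a comment on the last line terminate at end of input.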
data/lib/cton/encoder.rb
CHANGED
@@ -11,10 +11,11 @@ module Cton
     RESERVED_LITERALS = %w[true false null].freeze
     FLOAT_DECIMAL_PRECISION = Float::DIG
 
-    def initialize(separator: "\n", pretty: false, decimal_mode: :fast)
+    def initialize(separator: "\n", pretty: false, decimal_mode: :fast, comments: nil)
       @separator = separator || ""
       @pretty = pretty
       @decimal_mode = decimal_mode
+      @comments = comments || {}
       raise ArgumentError, "decimal_mode must be :fast or :precise" unless %i[fast precise].include?(@decimal_mode)
 
       @indent_level = 0
@@ -29,7 +30,7 @@ module Cton
 
     private
 
-    attr_reader :separator, :io, :pretty, :indent_level, :decimal_mode
+    attr_reader :separator, :io, :pretty, :indent_level, :decimal_mode, :comments
 
     def encode_root(value)
       case value
@@ -37,6 +38,7 @@ module Cton
         first = true
         value.each do |key, nested|
           io << separator unless first
+          emit_comment_for(key.to_s)
           encode_top_level_pair(key, nested)
           first = false
         end
@@ -51,6 +53,9 @@ module Cton
     end
 
     def encode_value(value, context:)
+      # Check type registry first for custom transformations
+      value = Cton.type_registry.transform(value) if Cton.type_registry.registered?(value.class)
+
       if defined?(Set) && value.is_a?(Set)
         value = value.to_a
       elsif defined?(OpenStruct) && value.is_a?(OpenStruct)
@@ -373,5 +378,14 @@ module Cton
     def newline
       io << "\n" << (" " * indent_level)
     end
+
+    def emit_comment_for(key)
+      comment = comments[key] || comments[key.to_sym]
+      return unless comment
+
+      comment.to_s.each_line do |line|
+        io << "# " << line.chomp << "\n"
+      end
+    end
   end
 end
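Review note: the new `emit_comment_for` hook is small, but two details are easy to miss: it accepts both String and Symbol keys in the `comments:` hash, and it splits multi-line comment text into one `# ` line per input line. The sketch below mirrors that method as a free function. The name `emit_comment`, the plain String buffer, and the sample data are assumptions for illustration only.

```ruby
# Standalone sketch of the comment emission added to the encoder.
# `io` is any object that supports <<; a mutable String is used here.
def emit_comment(io, comments, key)
  comment = comments[key] || comments[key.to_sym]  # look up by String, then by Symbol
  return unless comment

  comment.to_s.each_line do |line|
    io << "# " << line.chomp << "\n"               # one "# " line per line of comment text
  end
end

io = +""
emit_comment(io, { user: "User configuration\nOwner: ops" }, "user")
io << "user(name=Alice)"
puts io
```

As far as the hunks shown here go, the hook is called only from `encode_root`, so comments appear to attach to top-level keys. It also always writes a trailing `"\n"`, which suggests that combining `comments:` with an empty separator still yields multi-line output.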
data/lib/cton/stats.rb
ADDED
@@ -0,0 +1,123 @@
+# frozen_string_literal: true
+
+require "json"
+
+module Cton
+  # Token statistics for comparing JSON vs CTON efficiency
+  class Stats
+    # Rough estimate: GPT models average ~4 characters per token
+    CHARS_PER_TOKEN = 4
+
+    attr_reader :data, :json_string, :cton_string
+
+    def initialize(data, cton_string: nil, json_string: nil)
+      @data = data
+      @json_string = json_string || JSON.generate(data)
+      @cton_string = cton_string || Cton.dump(data)
+    end
+
+    def json_chars
+      json_string.length
+    end
+
+    def cton_chars
+      cton_string.length
+    end
+
+    def json_bytes
+      json_string.bytesize
+    end
+
+    def cton_bytes
+      cton_string.bytesize
+    end
+
+    def savings_chars
+      json_chars - cton_chars
+    end
+
+    def savings_bytes
+      json_bytes - cton_bytes
+    end
+
+    def savings_percent
+      return 0.0 if json_chars.zero?
+
+      ((1 - (cton_chars.to_f / json_chars)) * 100).round(1)
+    end
+
+    def estimated_json_tokens
+      (json_chars / CHARS_PER_TOKEN.to_f).ceil
+    end
+
+    def estimated_cton_tokens
+      (cton_chars / CHARS_PER_TOKEN.to_f).ceil
+    end
+
+    def estimated_token_savings
+      estimated_json_tokens - estimated_cton_tokens
+    end
+
+    def to_h
+      {
+        json_chars: json_chars,
+        cton_chars: cton_chars,
+        json_bytes: json_bytes,
+        cton_bytes: cton_bytes,
+        savings_chars: savings_chars,
+        savings_bytes: savings_bytes,
+        savings_percent: savings_percent,
+        estimated_tokens: {
+          json: estimated_json_tokens,
+          cton: estimated_cton_tokens,
+          savings: estimated_token_savings
+        }
+      }
+    end
+
+    def to_s
+      <<~STATS
+        JSON: #{json_chars} chars / #{json_bytes} bytes (~#{estimated_json_tokens} tokens)
+        CTON: #{cton_chars} chars / #{cton_bytes} bytes (~#{estimated_cton_tokens} tokens)
+        Saved: #{savings_percent}% (#{savings_chars} chars, ~#{estimated_token_savings} tokens)
+      STATS
+    end
+
+    # Compare multiple encoding options
+    def self.compare(data, options: {})
+      results = {}
+
+      # Standard CTON
+      results[:cton] = new(data).to_h
+
+      # Inline CTON (no separators)
+      inline_cton = Cton.dump(data, separator: "")
+      results[:cton_inline] = {
+        chars: inline_cton.length,
+        bytes: inline_cton.bytesize
+      }
+
+      # Pretty CTON
+      pretty_cton = Cton.dump(data, pretty: true)
+      results[:cton_pretty] = {
+        chars: pretty_cton.length,
+        bytes: pretty_cton.bytesize
+      }
+
+      # JSON variants
+      json = JSON.generate(data)
+      results[:json] = {
+        chars: json.length,
+        bytes: json.bytesize
+      }
+
+      pretty_json = JSON.pretty_generate(data)
+      results[:json_pretty] = {
+        chars: pretty_json.length,
+        bytes: pretty_json.bytesize
+      }
+
+      results
+    end
+  end
+end
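Review note: the `Stats` arithmetic is a fixed 4-characters-per-token heuristic with a ceiling applied to each side separately. The sketch below reruns that arithmetic on the character counts quoted in the README's `to_s` sample (100 JSON characters, 55 CTON characters). The `savings` helper and its `chars_per_token` keyword are hypothetical; they take counts directly so the example does not depend on the gem.

```ruby
# Standalone sketch of the Stats arithmetic from the hunk above.
# chars_per_token mirrors the CHARS_PER_TOKEN = 4 heuristic.
def savings(json_chars, cton_chars, chars_per_token: 4)
  json_tokens = (json_chars / chars_per_token.to_f).ceil
  cton_tokens = (cton_chars / chars_per_token.to_f).ceil
  percent = json_chars.zero? ? 0.0 : ((1 - (cton_chars.to_f / json_chars)) * 100).round(1)

  {
    percent: percent,
    json_tokens: json_tokens,
    cton_tokens: cton_tokens,
    token_savings: json_tokens - cton_tokens
  }
end

result = savings(100, 55)
puts "saved #{result[:percent]}%, #{result[:json_tokens]} vs #{result[:cton_tokens]} tokens " \
     "(#{result[:token_savings]} fewer)"
# prints "saved 45.0%, 25 vs 14 tokens (11 fewer)"
```

For these inputs the formulas give 45.0% and 11 tokens, which agrees with the README's `to_s` sample; the `45.5` and `12` shown two lines earlier in the same README snippet would therefore have to come from different data. Because each side is rounded up independently, the token saving can differ by one from `savings_chars / 4`.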
data/lib/cton/type_registry.rb
ADDED
@@ -0,0 +1,137 @@
+# frozen_string_literal: true
+
+module Cton
+  # Registry for custom type serializers
+  # Allows users to define how custom classes should be encoded to CTON
+  class TypeRegistry
+    # Handler wraps a serialization block with metadata
+    Handler = Struct.new(:klass, :mode, :block, keyword_init: true)
+
+    def initialize
+      @handlers = {}
+    end
+
+    # Register a custom type handler
+    #
+    # @param klass [Class] The class to handle
+    # @param as [Symbol] How to serialize: :object (Hash), :array, or :scalar
+    # @param block [Proc] Transformation block receiving the value
+    #
+    # @example Register a Money class
+    #   Cton.register_type(Money) do |money|
+    #     { amount: money.cents, currency: money.currency }
+    #   end
+    #
+    # @example Register a UUID as scalar
+    #   Cton.register_type(UUID, as: :scalar) do |uuid|
+    #     uuid.to_s
+    #   end
+    #
+    def register(klass, as: :object, &block)
+      raise ArgumentError, "Block required for type registration" unless block_given?
+      raise ArgumentError, "as must be :object, :array, or :scalar" unless %i[object array scalar].include?(as)
+
+      @handlers[klass] = Handler.new(klass: klass, mode: as, block: block)
+    end
+
+    # Unregister a type handler
+    #
+    # @param klass [Class] The class to unregister
+    def unregister(klass)
+      @handlers.delete(klass)
+    end
+
+    # Check if a handler exists for a class
+    #
+    # @param klass [Class] The class to check
+    # @return [Boolean]
+    def registered?(klass)
+      @handlers.key?(klass) || find_handler_for_ancestors(klass)
+    end
+
+    # Transform a value using its registered handler
+    # Returns the value unchanged if no handler is registered
+    #
+    # @param value [Object] The value to transform
+    # @return [Object] The transformed value
+    def transform(value)
+      handler = find_handler(value.class)
+      return value unless handler
+
+      handler.block.call(value)
+    end
+
+    # Get the handler for a class
+    #
+    # @param klass [Class] The class to look up
+    # @return [Handler, nil]
+    def handler_for(klass)
+      find_handler(klass)
+    end
+
+    # List all registered types
+    #
+    # @return [Array<Class>]
+    def registered_types
+      @handlers.keys
+    end
+
+    # Clear all registered handlers
+    def clear!
+      @handlers.clear
+    end
+
+    private
+
+    def find_handler(klass)
+      # Direct match
+      return @handlers[klass] if @handlers.key?(klass)
+
+      # Check ancestors (for inheritance support)
+      find_handler_for_ancestors(klass)
+    end
+
+    def find_handler_for_ancestors(klass)
+      klass.ancestors.each do |ancestor|
+        return @handlers[ancestor] if @handlers.key?(ancestor)
+      end
+      nil
+    end
+  end
+
+  # Global type registry instance
+  @type_registry = TypeRegistry.new
+
+  class << self
+    # Access the global type registry
+    #
+    # @return [TypeRegistry]
+    attr_reader :type_registry
+
+    # Register a custom type handler
+    #
+    # @param klass [Class] The class to handle
+    # @param as [Symbol] How to serialize: :object, :array, or :scalar
+    # @param block [Proc] Transformation block
+    #
+    # @example
+    #   Cton.register_type(Money) do |money|
+    #     { amount: money.cents, currency: money.currency }
+    #   end
+    def register_type(klass, as: :object, &block)
+      type_registry.register(klass, as: as, &block)
+    end
+
+    # Unregister a custom type handler
+    #
+    # @param klass [Class] The class to unregister
+    def unregister_type(klass)
+      type_registry.unregister(klass)
+    end
+
+    # Clear all custom type handlers
+    def clear_type_registry!
+      type_registry.clear!
+    end
+  end
+end
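Review note: the registry resolves handlers by exact class first and then by walking `klass.ancestors`, so a handler registered for a base class also applies to its subclasses. The sketch below is a reduced stand-in that keeps only that lookup. `MiniRegistry`, `Money`, and `Refund` are hypothetical names for illustration; the stand-in stores bare blocks and omits the `as:` modes and the `Handler` struct.

```ruby
# Reduced stand-in for the lookup logic in TypeRegistry (not the gem's class).
class MiniRegistry
  def initialize
    @handlers = {}
  end

  def register(klass, &block)
    raise ArgumentError, "Block required for type registration" unless block

    @handlers[klass] = block
  end

  # Exact class first, then the first ancestor that has a handler.
  def find_handler(klass)
    return @handlers[klass] if @handlers.key?(klass)

    klass.ancestors.each do |ancestor|
      return @handlers[ancestor] if @handlers.key?(ancestor)
    end
    nil
  end

  # Values without a handler pass through unchanged.
  def transform(value)
    handler = find_handler(value.class)
    handler ? handler.call(value) : value
  end
end

Money = Struct.new(:cents, :currency)
class Refund < Money; end

registry = MiniRegistry.new
registry.register(Money) { |m| { amount: m.cents, currency: m.currency } }

refund_hash = registry.transform(Refund.new(500, "EUR"))  # handled via the Money ancestor
untouched   = registry.transform(42)                      # no handler for Integer

p refund_hash
p untouched
```

Two things a reviewer might flag in the real class: `registered?` is documented as returning a Boolean, but its ancestor branch returns a `Handler` or `nil`, which is truthy or falsy rather than strictly `true` or `false`; and the encoder calls `registered?` and then `transform`, so the ancestor walk runs twice per value when there is no exact match.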