toon-ruby 0.1.0

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+ metadata.gz: aa973e72399a80aea212aefe4128d37ee2f25951962b5f7eb42573c4ba8c297f
4
+ data.tar.gz: 29e84d1f0c664914ea03c349922c6a47bcfe97d34374ab724197984cffe96870
5
+ SHA512:
6
+ metadata.gz: 97a54e987de421c56c2f736c873d567f1ab5f82b1dc022f560d8b3677fe826a7b42b6bbda26a4715bdad3fbf085ef8cff849c8d7f9a0c2892eb6c12436e1f685
7
+ data.tar.gz: bb30d9a9f276fa9f0666c45f1238f84ef6c30dd57025627d0e17d36100e600e3b2f408be8b9e667857fca4e6b39a6d7e5c69ce0ee54abb8468268a60e6d10479
data/LICENSE ADDED
@@ -0,0 +1,22 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2025-PRESENT André Perdigão
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
22
+
data/README.md ADDED
@@ -0,0 +1,330 @@
1
+ # TOON for Ruby
2
+
3
+ **Token-Oriented Object Notation** is a compact, human-readable format designed for passing structured data to Large Language Models with significantly reduced token usage.
4
+
5
+ This is a Ruby port of the [TOON library](https://github.com/johannschopplich/toon) originally written in TypeScript.
6
+
7
+ ## Why TOON?
8
+
9
+ AI is becoming cheaper and more accessible, and larger context windows invite larger data inputs. **LLM tokens still cost money** – and standard JSON is verbose and token-expensive:
10
+
11
+ ```json
12
+ {
13
+ "users": [
14
+ { "id": 1, "name": "Alice", "role": "admin" },
15
+ { "id": 2, "name": "Bob", "role": "user" }
16
+ ]
17
+ }
18
+ ```
19
+
20
+ TOON conveys the same information with **fewer tokens**:
21
+
22
+ ```
23
+ users[2]{id,name,role}:
24
+ 1,Alice,admin
25
+ 2,Bob,user
26
+ ```
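
As a rough illustration of the size difference, the snippet below compares character counts for the two representations above. Character count is only a proxy: real token counts depend on the tokenizer. The constant names `USERS` and `TOON_TEXT` are made up for this sketch, and the TOON text is a literal so the snippet runs without the gem installed.

```ruby
require 'json'

# Same data as the JSON example above.
USERS = {
  'users' => [
    { 'id' => 1, 'name' => 'Alice', 'role' => 'admin' },
    { 'id' => 2, 'name' => 'Bob', 'role' => 'user' }
  ]
}.freeze

# TOON text copied from the example above (hypothetical constant name).
TOON_TEXT = "users[2]{id,name,role}:\n  1,Alice,admin\n  2,Bob,user"

# Compact JSON, with no pretty-printing whitespace.
json_text = JSON.generate(USERS)

puts "JSON: #{json_text.length} characters"
puts "TOON: #{TOON_TEXT.length} characters"
```

To measure actual savings, run both strings through the tokenizer of the model you target.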
27
+
28
+ ## Installation
29
+
30
+ Add this line to your application's Gemfile:
31
+
32
+ ```ruby
33
+ gem 'toon-ruby'
34
+ ```
35
+
36
+ And then execute:
37
+
38
+ ```bash
39
+ bundle install
40
+ ```
41
+
42
+ Or install it yourself as:
43
+
44
+ ```bash
45
+ gem install toon-ruby
46
+ ```
47
+
48
+ ## Quick Start
49
+
50
+ ```ruby
51
+ require 'toon'
52
+
53
+ data = {
54
+ 'user' => {
55
+ 'id' => 123,
56
+ 'name' => 'Ada',
57
+ 'tags' => ['reading', 'gaming'],
58
+ 'active' => true,
59
+ 'preferences' => []
60
+ }
61
+ }
62
+
63
+ puts Toon.encode(data)
64
+ ```
65
+
66
+ Output:
67
+
68
+ ```
69
+ user:
70
+ id: 123
71
+ name: Ada
72
+ tags[2]: reading,gaming
73
+ active: true
74
+ preferences[0]:
75
+ ```
76
+
77
+ ## Key Features
78
+
79
+ - 💸 **Token-efficient:** typically 30–60% fewer tokens than JSON
80
+ - 🤿 **LLM-friendly guardrails:** explicit lengths and field lists help models validate output
81
+ - 🍱 **Minimal syntax:** removes redundant punctuation (braces, brackets, most quotes)
82
+ - 📐 **Indentation-based structure:** replaces braces with whitespace for better readability
83
+ - 🧺 **Tabular arrays:** declare keys once, then stream rows without repetition
84
+
85
+ ## API
86
+
87
+ ### `Toon.encode(value, **options)`
88
+
89
+ Converts any value to TOON format.
90
+
91
+ **Parameters:**
92
+
93
+ - `value` – Any value to encode (Hash, Array, primitives, or nested structures)
94
+ - `indent` – Number of spaces per indentation level (default: `2`)
95
+ - `delimiter` – Delimiter for array values and tabular rows: `','`, `"\t"`, or `'|'` (default: `','`)
96
+ - `length_marker` – Optional marker to prefix array lengths: `'#'` or `false` (default: `false`)
97
+
98
+ **Returns:**
99
+
100
+ A TOON-formatted string with no trailing newline or spaces.
101
+
102
+ **Examples:**
103
+
104
+ ```ruby
105
+ # Basic usage
106
+ Toon.encode({ 'id' => 1, 'name' => 'Ada' })
107
+ # => "id: 1\nname: Ada"
108
+
109
+ # Tabular arrays
110
+ items = [
111
+ { 'sku' => 'A1', 'qty' => 2, 'price' => 9.99 },
112
+ { 'sku' => 'B2', 'qty' => 1, 'price' => 14.5 }
113
+ ]
114
+ Toon.encode({ 'items' => items })
115
+ # => "items[2]{sku,qty,price}:\n A1,2,9.99\n B2,1,14.5"
116
+
117
+ # Custom delimiter (tab)
118
+ Toon.encode({ 'items' => items }, delimiter: "\t")
119
+ # => "items[2\t]{sku\tqty\tprice}:\n A1\t2\t9.99\n B2\t1\t14.5"
120
+
121
+ # Length marker
122
+ Toon.encode({ 'tags' => ['a', 'b', 'c'] }, length_marker: '#')
123
+ # => "tags[#3]: a,b,c"
124
+ ```
125
+
126
+ ## Format Overview
127
+
128
+ ### Objects
129
+
130
+ Simple objects with primitive values:
131
+
132
+ ```ruby
133
+ Toon.encode({
134
+ 'id' => 123,
135
+ 'name' => 'Ada',
136
+ 'active' => true
137
+ })
138
+ ```
139
+
140
+ ```
141
+ id: 123
142
+ name: Ada
143
+ active: true
144
+ ```
145
+
146
+ Nested objects:
147
+
148
+ ```ruby
149
+ Toon.encode({
150
+ 'user' => {
151
+ 'id' => 123,
152
+ 'name' => 'Ada'
153
+ }
154
+ })
155
+ ```
156
+
157
+ ```
158
+ user:
159
+ id: 123
160
+ name: Ada
161
+ ```
162
+
163
+ ### Arrays
164
+
165
+ #### Primitive Arrays (Inline)
166
+
167
+ ```ruby
168
+ Toon.encode({ 'tags' => ['admin', 'ops', 'dev'] })
169
+ ```
170
+
171
+ ```
172
+ tags[3]: admin,ops,dev
173
+ ```
174
+
175
+ #### Arrays of Objects (Tabular)
176
+
177
+ When all objects share the same primitive fields, TOON uses an efficient **tabular format**:
178
+
179
+ ```ruby
180
+ Toon.encode({
181
+ 'items' => [
182
+ { 'sku' => 'A1', 'qty' => 2, 'price' => 9.99 },
183
+ { 'sku' => 'B2', 'qty' => 1, 'price' => 14.5 }
184
+ ]
185
+ })
186
+ ```
187
+
188
+ ```
189
+ items[2]{sku,qty,price}:
190
+ A1,2,9.99
191
+ B2,1,14.5
192
+ ```
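
The tabular requirement can be sketched as a small check. This is an illustrative re-implementation, not the gem's public API: `primitive?` and `tabular_header` are hypothetical names, and the rule encoded here (every row has exactly the first row's keys, all values primitive) is taken from the description above.

```ruby
# Types treated as primitives in this sketch.
PRIMITIVE_TYPES = [NilClass, String, Numeric, TrueClass, FalseClass].freeze

def primitive?(value)
  PRIMITIVE_TYPES.any? { |type| value.is_a?(type) }
end

# Returns the shared header (the first row's keys) when the rows qualify
# for tabular output, or nil when they do not.
def tabular_header(rows)
  return nil if rows.empty? || !rows.all? { |row| row.is_a?(Hash) }

  header = rows.first.keys
  return nil if header.empty?

  uniform = rows.all? do |row|
    row.size == header.size &&
      header.all? { |key| row.key?(key) && primitive?(row[key]) }
  end
  uniform ? header : nil
end

# Key order may differ between rows; the first row decides the header.
p tabular_header([{ 'sku' => 'A1', 'qty' => 2 }, { 'qty' => 1, 'sku' => 'B2' }])

# An extra key or a non-primitive value disqualifies the array.
p tabular_header([{ 'sku' => 'A1' }, { 'sku' => 'B2', 'tags' => ['x'] }])
```

Arrays that return `nil` here would fall back to the list format.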
193
+
194
+ #### Mixed and Non-Uniform Arrays
195
+
196
+ Arrays that don't meet the tabular requirements use list format:
197
+
198
+ ```ruby
199
+ Toon.encode({
200
+ 'items' => [1, { 'a' => 1 }, 'text']
201
+ })
202
+ ```
203
+
204
+ ```
205
+ items[3]:
206
+ - 1
207
+ - a: 1
208
+ - text
209
+ ```
210
+
211
+ ### Delimiter Options
212
+
213
+ The `delimiter` option allows you to choose between comma (default), tab, or pipe delimiters:
214
+
215
+ ```ruby
216
+ # Tab delimiter (can save additional tokens)
217
+ data = {
218
+ 'items' => [
219
+ { 'sku' => 'A1', 'name' => 'Widget', 'qty' => 2 },
220
+ { 'sku' => 'B2', 'name' => 'Gadget', 'qty' => 1 }
221
+ ]
222
+ }
223
+
224
+ Toon.encode(data, delimiter: "\t")
225
+ ```
226
+
227
+ Output:
228
+
229
+ ```
230
+ items[2 ]{sku name qty}:
231
+ A1 Widget 2
232
+ B2 Gadget 1
233
+ ```
234
+
235
+ ### Length Marker Option
236
+
237
+ The `length_marker` option adds a hash (`#`) prefix to array lengths:
238
+
239
+ ```ruby
240
+ data = {
241
+ 'tags' => ['reading', 'gaming', 'coding'],
242
+ 'items' => [
243
+ { 'sku' => 'A1', 'qty' => 2 },
244
+ { 'sku' => 'B2', 'qty' => 1 }
245
+ ]
246
+ }
247
+
248
+ Toon.encode(data, length_marker: '#')
249
+ ```
250
+
251
+ Output:
252
+
253
+ ```
254
+ tags[#3]: reading,gaming,coding
255
+ items[#2]{sku,qty}:
256
+ A1,2
257
+ B2,1
258
+ ```
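
How the `delimiter` and `length_marker` options shape an array header can be sketched as follows. `array_header` is a hypothetical helper written for this example; it skips key quoting and is not the gem's API.

```ruby
# Builds a header such as "items[#2]{sku,qty}:".
# A non-default delimiter is declared inside the brackets after the length.
def array_header(length, key: nil, fields: nil, delimiter: ',', length_marker: false)
  marker = length_marker || ''
  suffix = delimiter == ',' ? '' : delimiter
  header = "#{key}[#{marker}#{length}#{suffix}]"
  header += "{#{fields.join(delimiter)}}" if fields
  "#{header}:"
end

puts array_header(3, key: 'tags', length_marker: '#')
puts array_header(2, key: 'items', fields: %w[sku qty])
puts array_header(2, key: 'items', fields: %w[sku qty], delimiter: '|')
```

Declaring the delimiter in the header lets a reader of the output know how to split the rows that follow.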
259
+
260
+ ## Type Conversions
261
+
262
+ Some Ruby types are automatically normalized:
263
+
264
+ | Input | Output |
265
+ |---|---|
266
+ | `Symbol` | String (`:hello` → `"hello"`) |
267
+ | `Time`, `DateTime` | ISO8601 string |
268
+ | `Date` | ISO8601 string |
269
+ | `Float::INFINITY`, `Float::NAN` | `null` |
270
+ | `Set` | Array |
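
A minimal sketch of the conversions in the table, assuming a standalone helper named `normalize` (a hypothetical name, not the gem's API). It covers only the `Symbol`, `Time`, non-finite `Float`, and `Set` rows; `Date` is left out because its result depends on the local time zone.

```ruby
require 'set'

# Converts a Ruby value into a JSON-compatible one, following the table above.
# Unsupported objects fall through the case expression and become nil.
def normalize(value)
  case value
  when nil, String, true, false then value
  when Float
    return nil unless value.finite? # NaN and +/-Infinity become null
    value
  when Numeric then value
  when Symbol then value.to_s
  when Time then value.getutc.strftime('%Y-%m-%dT%H:%M:%SZ')
  when Set, Array then value.map { |item| normalize(item) }
  when Hash
    value.each_with_object({}) { |(key, item), out| out[key.to_s] = normalize(item) }
  end
end

p normalize(:hello)
p normalize(Float::INFINITY)
p normalize(Set[1, 2])
p normalize({ created: Time.utc(2025, 1, 2, 3, 4, 5) })
```

`Time#getutc` is used instead of `Time#utc` so the caller's object is not modified.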
271
+
272
+ ## Quoting Rules
273
+
274
+ TOON quotes strings **only when necessary** to maximize token efficiency:
275
+
276
+ - Empty strings are quoted: `""`
277
+ - Strings with leading/trailing spaces: `" padded "`
278
+ - Strings that look like booleans/numbers: `"true"`, `"42"`
279
+ - Strings with structural characters: `"a,b"`, `"a:b"`, `"[5]"`
280
+ - The active delimiter triggers quoting
281
+
282
+ Keys follow similar rules and are quoted when needed.
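
The rules for string values can be sketched as a single predicate. `needs_quotes?` is a hypothetical helper written for illustration; the exact numeric pattern is an assumption about what "looks like a number" means, and the `null` and leading-hyphen checks come from the released source rather than from the list above.

```ruby
# Returns true when a string value would need double quotes.
# The delimiter defaults to a comma.
def needs_quotes?(value, delimiter = ',')
  return true if value.empty?                          # empty string
  return true if value != value.strip                  # leading/trailing whitespace
  return true if %w[true false null].include?(value)   # reads as a literal
  return true if value.match?(/\A-?\d+(\.\d+)?(e[+-]?\d+)?\z/i) # reads as a number
  return true if value.match?(/[:"\\\[\]{}\n\r\t]/)    # structural or control characters
  return true if value.include?(delimiter)             # the active delimiter
  return true if value.start_with?('-')                # could be read as a list item

  false
end

puts needs_quotes?('Ada')      # plain text stays unquoted
puts needs_quotes?('42')       # would otherwise read back as a number
puts needs_quotes?('a,b')      # contains the default delimiter
puts needs_quotes?('a,b', '|') # comma is harmless when pipe is the delimiter
```

Only the active delimiter forces quoting, which is why the last call differs from the one before it.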
283
+
284
+ ## Using TOON in LLM Prompts
285
+
286
+ When incorporating TOON into your LLM workflows:
287
+
288
+ - Wrap TOON data in a fenced code block in your prompt
289
+ - Tell the model: "Do not add extra punctuation or spaces; follow the exact TOON format."
290
+ - When asking the model to generate TOON, specify the same rules (2-space indentation, no trailing spaces, quoting rules)
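
A minimal sketch of the first two points, assuming a hypothetical `build_prompt` helper. The TOON text is passed in as a plain string, so the snippet runs without the gem installed; the instruction wording is only one possible phrasing.

```ruby
# Wraps already-encoded TOON text in a fenced block inside a prompt.
def build_prompt(toon)
  fence = '`' * 3 # built at runtime so the fence does not end this example
  <<~PROMPT
    The data below is in TOON format (2-space indentation, no trailing spaces).
    Do not add extra punctuation or spaces; follow the exact TOON format.

    #{fence}
    #{toon}
    #{fence}
  PROMPT
end

puts build_prompt("users[2]{id,name,role}:\n  1,Alice,admin\n  2,Bob,user")
```

The interpolated TOON text keeps its own indentation, because the heredoc only strips indentation from its literal lines.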
291
+
292
+ ## Notes and Limitations
293
+
294
+ - **Token counts vary by tokenizer and model.** Benchmarks use a GPT-style tokenizer; actual savings will differ with other models.
295
+ - **TOON is designed for LLM contexts** where human readability and token efficiency matter. It's **not** a drop-in replacement for JSON in APIs or storage.
296
+ - **Tabular arrays** require all objects to have exactly the same keys with primitive values only.
297
+ - **Object key order** is preserved from the input. In tabular arrays, header order follows the first object's keys.
298
+
299
+ ## Development
300
+
301
+ After checking out the repo, run:
302
+
303
+ ```bash
304
+ bundle install
305
+ ```
306
+
307
+ Run the test suite:
308
+
309
+ ```bash
310
+ bundle exec rspec
311
+ ```
312
+
313
+ Or simply:
314
+
315
+ ```bash
316
+ rake
317
+ ```
318
+
319
+ ## Contributing
320
+
321
+ Bug reports and pull requests are welcome on GitHub at https://github.com/andrepcg/toon-ruby.
322
+
323
+ ## License
324
+
325
+ The gem is available as open source under the terms of the [MIT License](LICENSE).
326
+
327
+ ## Credits
328
+
329
+ This is a Ruby port of the original [TOON library](https://github.com/johannschopplich/toon) by [Johann Schopplich](https://github.com/johannschopplich).
330
+
data/lib/toon/constants.rb ADDED
@@ -0,0 +1,40 @@
1
+ # frozen_string_literal: true
2
+
3
+ module Toon
4
+ # List markers
5
+ LIST_ITEM_MARKER = '-'
6
+ LIST_ITEM_PREFIX = '- '
7
+
8
+ # Structural characters
9
+ COMMA = ','
10
+ COLON = ':'
11
+ SPACE = ' '
12
+ PIPE = '|'
13
+
14
+ # Brackets and braces
15
+ OPEN_BRACKET = '['
16
+ CLOSE_BRACKET = ']'
17
+ OPEN_BRACE = '{'
18
+ CLOSE_BRACE = '}'
19
+
20
+ # Literals
21
+ NULL_LITERAL = 'null'
22
+ TRUE_LITERAL = 'true'
23
+ FALSE_LITERAL = 'false'
24
+
25
+ # Escape characters
26
+ BACKSLASH = '\\'
27
+ DOUBLE_QUOTE = '"'
28
+ NEWLINE = "\n"
29
+ CARRIAGE_RETURN = "\r"
30
+ TAB = "\t"
31
+
32
+ # Delimiters
33
+ DELIMITERS = {
34
+ comma: COMMA,
35
+ tab: TAB,
36
+ pipe: PIPE
37
+ }.freeze
38
+
39
+ DEFAULT_DELIMITER = DELIMITERS[:comma]
40
+ end
data/lib/toon/encoders.rb ADDED
@@ -0,0 +1,261 @@
1
+ # frozen_string_literal: true
2
+
3
+ require_relative 'constants'
4
+ require_relative 'writer'
5
+ require_relative 'normalizer'
6
+ require_relative 'primitives'
7
+
8
+ module Toon
9
+ module Encoders
10
+ module_function
11
+
12
+ extend Normalizer
13
+ extend Primitives
14
+
15
+ # Encode normalized value
16
+ def encode_value(value, options)
17
+ if json_primitive?(value)
18
+ return encode_primitive(value, options[:delimiter])
19
+ end
20
+
21
+ writer = LineWriter.new(options[:indent])
22
+
23
+ if json_array?(value)
24
+ encode_array(nil, value, writer, 0, options)
25
+ elsif json_object?(value)
26
+ encode_object(value, writer, 0, options)
27
+ end
28
+
29
+ writer.to_s
30
+ end
31
+
32
+ # Object encoding
33
+ def encode_object(value, writer, depth, options)
34
+ keys = value.keys
35
+
36
+ keys.each do |key|
37
+ encode_key_value_pair(key, value[key], writer, depth, options)
38
+ end
39
+ end
40
+
41
+ def encode_key_value_pair(key, value, writer, depth, options)
42
+ encoded_key = encode_key(key)
43
+
44
+ if json_primitive?(value)
45
+ writer.push(depth, "#{encoded_key}: #{encode_primitive(value, options[:delimiter])}")
46
+ elsif json_array?(value)
47
+ encode_array(key, value, writer, depth, options)
48
+ elsif json_object?(value)
49
+ nested_keys = value.keys
50
+ if nested_keys.empty?
51
+ # Empty object
52
+ writer.push(depth, "#{encoded_key}:")
53
+ else
54
+ writer.push(depth, "#{encoded_key}:")
55
+ encode_object(value, writer, depth + 1, options)
56
+ end
57
+ end
58
+ end
59
+
60
+ # Array encoding
61
+ def encode_array(key, value, writer, depth, options)
62
+ if value.empty?
63
+ header = format_header(0, key: key, delimiter: options[:delimiter], length_marker: options[:length_marker])
64
+ writer.push(depth, header)
65
+ return
66
+ end
67
+
68
+ # Primitive array
69
+ if array_of_primitives?(value)
70
+ encode_inline_primitive_array(key, value, writer, depth, options)
71
+ return
72
+ end
73
+
74
+ # Array of arrays (all primitives)
75
+ if array_of_arrays?(value)
76
+ all_primitive_arrays = value.all? { |arr| array_of_primitives?(arr) }
77
+ if all_primitive_arrays
78
+ encode_array_of_arrays_as_list_items(key, value, writer, depth, options)
79
+ return
80
+ end
81
+ end
82
+
83
+ # Array of objects
84
+ if array_of_objects?(value)
85
+ header = detect_tabular_header(value)
86
+ if header
87
+ encode_array_of_objects_as_tabular(key, value, header, writer, depth, options)
88
+ else
89
+ encode_mixed_array_as_list_items(key, value, writer, depth, options)
90
+ end
91
+ return
92
+ end
93
+
94
+ # Mixed array: fallback to expanded format
95
+ encode_mixed_array_as_list_items(key, value, writer, depth, options)
96
+ end
97
+
98
+ # Primitive array encoding (inline)
99
+ def encode_inline_primitive_array(prefix, values, writer, depth, options)
100
+ formatted = format_inline_array(values, options[:delimiter], prefix, options[:length_marker])
101
+ writer.push(depth, formatted)
102
+ end
103
+
104
+ # Array of arrays (expanded format)
105
+ def encode_array_of_arrays_as_list_items(prefix, values, writer, depth, options)
106
+ header = format_header(values.length, key: prefix, delimiter: options[:delimiter], length_marker: options[:length_marker])
107
+ writer.push(depth, header)
108
+
109
+ values.each do |arr|
110
+ if array_of_primitives?(arr)
111
+ inline = format_inline_array(arr, options[:delimiter], nil, options[:length_marker])
112
+ writer.push(depth + 1, "#{LIST_ITEM_PREFIX}#{inline}")
113
+ end
114
+ end
115
+ end
116
+
117
+ def format_inline_array(values, delimiter, prefix = nil, length_marker = false)
118
+ header = format_header(values.length, key: prefix, delimiter: delimiter, length_marker: length_marker)
119
+ joined_value = join_encoded_values(values, delimiter)
120
+ # Only add space if there are values
121
+ if values.empty?
122
+ header
123
+ else
124
+ "#{header} #{joined_value}"
125
+ end
126
+ end
127
+
128
+ # Array of objects (tabular format)
129
+ def encode_array_of_objects_as_tabular(prefix, rows, header, writer, depth, options)
130
+ header_str = format_header(rows.length, key: prefix, fields: header, delimiter: options[:delimiter], length_marker: options[:length_marker])
131
+ writer.push(depth, header_str)
132
+
133
+ write_tabular_rows(rows, header, writer, depth + 1, options)
134
+ end
135
+
136
+ def detect_tabular_header(rows)
137
+ return nil if rows.empty?
138
+
139
+ first_row = rows[0]
140
+ first_keys = first_row.keys
141
+ return nil if first_keys.empty?
142
+
143
+ if tabular_array?(rows, first_keys)
144
+ first_keys
145
+ else
146
+ nil
147
+ end
148
+ end
149
+
150
+ def tabular_array?(rows, header)
151
+ rows.all? do |row|
152
+ keys = row.keys
153
+
154
+ # All objects must have the same keys (but order can differ)
155
+ return false if keys.length != header.length
156
+
157
+ # Check that all header keys exist in the row and all values are primitives
158
+ header.all? do |key|
159
+ row.key?(key) && json_primitive?(row[key])
160
+ end
161
+ end
162
+ end
163
+
164
+ def write_tabular_rows(rows, header, writer, depth, options)
165
+ rows.each do |row|
166
+ values = header.map { |key| row[key] }
167
+ joined_value = join_encoded_values(values, options[:delimiter])
168
+ writer.push(depth, joined_value)
169
+ end
170
+ end
171
+
172
+ # Array of objects (expanded format)
173
+ def encode_mixed_array_as_list_items(prefix, items, writer, depth, options)
174
+ header = format_header(items.length, key: prefix, delimiter: options[:delimiter], length_marker: options[:length_marker])
175
+ writer.push(depth, header)
176
+
177
+ items.each do |item|
178
+ if json_primitive?(item)
179
+ # Direct primitive as list item
180
+ writer.push(depth + 1, "#{LIST_ITEM_PREFIX}#{encode_primitive(item, options[:delimiter])}")
181
+ elsif json_array?(item)
182
+ # Direct array as list item
183
+ if array_of_primitives?(item)
184
+ inline = format_inline_array(item, options[:delimiter], nil, options[:length_marker])
185
+ writer.push(depth + 1, "#{LIST_ITEM_PREFIX}#{inline}")
186
+ end
187
+ elsif json_object?(item)
188
+ # Object as list item
189
+ encode_object_as_list_item(item, writer, depth + 1, options)
190
+ end
191
+ end
192
+ end
193
+
194
+ def encode_object_as_list_item(obj, writer, depth, options)
195
+ keys = obj.keys
196
+ if keys.empty?
197
+ writer.push(depth, LIST_ITEM_MARKER)
198
+ return
199
+ end
200
+
201
+ # First key-value on the same line as "- "
202
+ first_key = keys[0]
203
+ encoded_key = encode_key(first_key)
204
+ first_value = obj[first_key]
205
+
206
+ if json_primitive?(first_value)
207
+ writer.push(depth, "#{LIST_ITEM_PREFIX}#{encoded_key}: #{encode_primitive(first_value, options[:delimiter])}")
208
+ elsif json_array?(first_value)
209
+ if array_of_primitives?(first_value)
210
+ # Inline format for primitive arrays
211
+ formatted = format_inline_array(first_value, options[:delimiter], first_key, options[:length_marker])
212
+ writer.push(depth, "#{LIST_ITEM_PREFIX}#{formatted}")
213
+ elsif array_of_objects?(first_value)
214
+ # Check if array of objects can use tabular format
215
+ header = detect_tabular_header(first_value)
216
+ if header
217
+ # Tabular format for uniform arrays of objects
218
+ header_str = format_header(first_value.length, key: first_key, fields: header, delimiter: options[:delimiter], length_marker: options[:length_marker])
219
+ writer.push(depth, "#{LIST_ITEM_PREFIX}#{header_str}")
220
+ write_tabular_rows(first_value, header, writer, depth + 1, options)
221
+ else
222
+ # Fall back to list format for non-uniform arrays of objects
223
+ writer.push(depth, "#{LIST_ITEM_PREFIX}#{format_header(first_value.length, key: first_key, delimiter: options[:delimiter], length_marker: options[:length_marker])}")
224
+ first_value.each do |item|
225
+ encode_object_as_list_item(item, writer, depth + 1, options)
226
+ end
227
+ end
228
+ else
229
+ # Complex arrays on separate lines (array of arrays, etc.)
230
+ writer.push(depth, "#{LIST_ITEM_PREFIX}#{format_header(first_value.length, key: first_key, delimiter: options[:delimiter], length_marker: options[:length_marker])}")
231
+
232
+ # Encode array contents at depth + 1
233
+ first_value.each do |item|
234
+ if json_primitive?(item)
235
+ writer.push(depth + 1, "#{LIST_ITEM_PREFIX}#{encode_primitive(item, options[:delimiter])}")
236
+ elsif json_array?(item) && array_of_primitives?(item)
237
+ inline = format_inline_array(item, options[:delimiter], nil, options[:length_marker])
238
+ writer.push(depth + 1, "#{LIST_ITEM_PREFIX}#{inline}")
239
+ elsif json_object?(item)
240
+ encode_object_as_list_item(item, writer, depth + 1, options)
241
+ end
242
+ end
243
+ end
244
+ elsif json_object?(first_value)
245
+ nested_keys = first_value.keys
246
+ if nested_keys.empty?
247
+ writer.push(depth, "#{LIST_ITEM_PREFIX}#{encoded_key}:")
248
+ else
249
+ writer.push(depth, "#{LIST_ITEM_PREFIX}#{encoded_key}:")
250
+ encode_object(first_value, writer, depth + 2, options)
251
+ end
252
+ end
253
+
254
+ # Remaining keys on indented lines
255
+ (1...keys.length).each do |i|
256
+ key = keys[i]
257
+ encode_key_value_pair(key, obj[key], writer, depth + 1, options)
258
+ end
259
+ end
260
+ end
261
+ end
data/lib/toon/normalizer.rb ADDED
@@ -0,0 +1,104 @@
1
+ # frozen_string_literal: true
2
+
3
+ require 'set'
4
+ require 'date'
5
+
6
+ module Toon
7
+ module Normalizer
8
+ module_function
9
+
10
+ # Normalization (unknown → JSON-compatible value)
11
+ def normalize_value(value)
12
+ # null
13
+ return nil if value.nil?
14
+
15
+ # Primitives
16
+ return value if value.is_a?(String) || value.is_a?(TrueClass) || value.is_a?(FalseClass)
17
+
18
+ # Numbers: handle special cases
19
+ if value.is_a?(Numeric)
20
+ # Float special cases
21
+ if value.is_a?(Float)
22
+ # -0.0 becomes 0
23
+ return 0 if value.zero? && (1.0 / value).negative?
24
+ # NaN and Infinity become nil
25
+ return nil unless value.finite?
26
+ end
27
+ return value
28
+ end
29
+
30
+ # Symbol → string
31
+ return value.to_s if value.is_a?(Symbol)
32
+
33
+ # Time → ISO8601 string
34
+ if value.is_a?(Time)
35
+ return value.getutc.strftime('%Y-%m-%dT%H:%M:%SZ') # getutc returns a copy; Time#utc would modify the caller's object
36
+ end
37
+
38
+ # Other objects responding to #iso8601 (DateTime is a Date subclass, so it is handled by the Date branch below)
39
+ if value.respond_to?(:iso8601) && !value.is_a?(Date)
40
+ return value.iso8601
41
+ end
42
+
43
+ # Date → ISO8601 string
44
+ if value.is_a?(Date)
45
+ return value.to_time.utc.strftime('%Y-%m-%dT%H:%M:%SZ') # same output as Time#iso8601, which needs require 'time' on older Rubies
46
+ end
47
+
48
+ # Array
49
+ if value.is_a?(Array)
50
+ return value.map { |v| normalize_value(v) }
51
+ end
52
+
53
+ # Set → array
54
+ if value.is_a?(Set)
55
+ return value.to_a.map { |v| normalize_value(v) }
56
+ end
57
+
58
+ # Hash/object
59
+ if value.is_a?(Hash)
60
+ result = {}
61
+ value.each do |k, v|
62
+ result[k.to_s] = normalize_value(v)
63
+ end
64
+ return result
65
+ end
66
+
67
+ # Fallback: anything else becomes nil (procs, custom objects, etc.)
68
+ nil
69
+ end
70
+
71
+ # Type guards
72
+ def json_primitive?(value)
73
+ value.nil? ||
74
+ value.is_a?(String) ||
75
+ value.is_a?(Numeric) ||
76
+ value.is_a?(TrueClass) ||
77
+ value.is_a?(FalseClass)
78
+ end
79
+
80
+ def json_array?(value)
81
+ value.is_a?(Array)
82
+ end
83
+
84
+ def json_object?(value)
85
+ value.is_a?(Hash)
86
+ end
87
+
88
+ # Array type detection
89
+ def array_of_primitives?(value)
90
+ return false unless value.is_a?(Array)
91
+ value.all? { |item| json_primitive?(item) }
92
+ end
93
+
94
+ def array_of_arrays?(value)
95
+ return false unless value.is_a?(Array)
96
+ value.all? { |item| json_array?(item) }
97
+ end
98
+
99
+ def array_of_objects?(value)
100
+ return false unless value.is_a?(Array)
101
+ value.all? { |item| json_object?(item) }
102
+ end
103
+ end
104
+ end
data/lib/toon/primitives.rb ADDED
@@ -0,0 +1,109 @@
1
+ # frozen_string_literal: true
2
+
3
+ require_relative 'constants'
4
+
5
+ module Toon
6
+ module Primitives
7
+ module_function
8
+
9
+ # Primitive encoding
10
+ def encode_primitive(value, delimiter = COMMA)
11
+ return NULL_LITERAL if value.nil?
12
+ return value.to_s if value.is_a?(TrueClass) || value.is_a?(FalseClass)
13
+ return value.to_s if value.is_a?(Numeric)
14
+
15
+ encode_string_literal(value, delimiter)
16
+ end
17
+
18
+ def encode_string_literal(value, delimiter = COMMA)
19
+ if safe_unquoted?(value, delimiter)
20
+ value
21
+ else
22
+ "#{DOUBLE_QUOTE}#{escape_string(value)}#{DOUBLE_QUOTE}"
23
+ end
24
+ end
25
+
26
+ def escape_string(value)
27
+ value
28
+ .gsub(BACKSLASH, "#{BACKSLASH}#{BACKSLASH}")
29
+ .gsub(DOUBLE_QUOTE, "#{BACKSLASH}#{DOUBLE_QUOTE}")
30
+ .gsub("\n", "#{BACKSLASH}n")
31
+ .gsub("\r", "#{BACKSLASH}r")
32
+ .gsub("\t", "#{BACKSLASH}t")
33
+ end
34
+
35
+ def safe_unquoted?(value, delimiter = COMMA)
36
+ return false if value.empty?
37
+ return false if padded_with_whitespace?(value)
38
+ return false if value == TRUE_LITERAL || value == FALSE_LITERAL || value == NULL_LITERAL
39
+ return false if numeric_like?(value)
40
+ return false if value.include?(COLON)
41
+ return false if value.include?(DOUBLE_QUOTE) || value.include?(BACKSLASH)
42
+ return false if value.match?(/[\[\]{}]/)
43
+ return false if value.match?(/[\n\r\t]/)
44
+ return false if value.include?(delimiter)
45
+ return false if value.start_with?(LIST_ITEM_MARKER)
46
+
47
+ true
48
+ end
49
+
50
+ def numeric_like?(value)
51
+ # Match numbers like: 42, -3.14, 1e-6, 05, etc.
52
+ value.match?(/^-?\d+(?:\.\d+)?(?:e[+-]?\d+)?$/i) || value.match?(/^0\d+$/)
53
+ end
54
+
55
+ def padded_with_whitespace?(value)
56
+ value != value.strip
57
+ end
58
+
59
+ # Key encoding
60
+ def encode_key(key)
61
+ if valid_unquoted_key?(key)
62
+ key
63
+ else
64
+ "#{DOUBLE_QUOTE}#{escape_string(key)}#{DOUBLE_QUOTE}"
65
+ end
66
+ end
67
+
68
+ def valid_unquoted_key?(key)
69
+ # Keys must not contain control characters or special characters
70
+ return false if key.match?(/[\n\r\t]/)
71
+ return false if key.include?(COLON)
72
+ return false if key.include?(DOUBLE_QUOTE) || key.include?(BACKSLASH)
73
+ return false if key.match?(/[\[\]{}]/)
74
+ return false if key.include?(COMMA)
75
+ return false if key.start_with?(LIST_ITEM_MARKER)
76
+ return false if key.empty?
77
+ return false if key.match?(/^\d+$/) # Numeric keys
78
+ return false if key != key.strip # Leading/trailing spaces
79
+
80
+ key.match?(/^[A-Z_][\w.]*$/i)
81
+ end
82
+
83
+ # Value joining
84
+ def join_encoded_values(values, delimiter = COMMA)
85
+ values.map { |v| encode_primitive(v, delimiter) }.join(delimiter)
86
+ end
87
+
88
+ # Header formatters
89
+ def format_header(length, key: nil, fields: nil, delimiter: COMMA, length_marker: false)
90
+ header = ''
91
+
92
+ header += encode_key(key) if key
93
+
94
+ # Only include delimiter if it's not the default (comma)
95
+ delimiter_suffix = delimiter != DEFAULT_DELIMITER ? delimiter : ''
96
+ length_prefix = length_marker ? length_marker : ''
97
+ header += "[#{length_prefix}#{length}#{delimiter_suffix}]"
98
+
99
+ if fields
100
+ quoted_fields = fields.map { |f| encode_key(f) }
101
+ header += "{#{quoted_fields.join(delimiter)}}"
102
+ end
103
+
104
+ header += COLON
105
+
106
+ header
107
+ end
108
+ end
109
+ end
data/lib/toon/version.rb ADDED
@@ -0,0 +1,5 @@
1
+ # frozen_string_literal: true
2
+
3
+ module Toon
4
+ VERSION = '0.1.0'
5
+ end
data/lib/toon/writer.rb ADDED
@@ -0,0 +1,19 @@
1
+ # frozen_string_literal: true
2
+
3
+ module Toon
4
+ class LineWriter
5
+ def initialize(indent_size)
6
+ @lines = []
7
+ @indentation_string = ' ' * indent_size
8
+ end
9
+
10
+ def push(depth, content)
11
+ indent = @indentation_string * depth
12
+ @lines << indent + content
13
+ end
14
+
15
+ def to_s
16
+ @lines.join("\n")
17
+ end
18
+ end
19
+ end
data/lib/toon.rb ADDED
@@ -0,0 +1,33 @@
1
+ # frozen_string_literal: true
2
+
3
+ require_relative 'toon/version'
4
+ require_relative 'toon/constants'
5
+ require_relative 'toon/writer'
6
+ require_relative 'toon/normalizer'
7
+ require_relative 'toon/primitives'
8
+ require_relative 'toon/encoders'
9
+
10
+ module Toon
11
+ module_function
12
+
13
+ # Encode any value to TOON format
14
+ #
15
+ # @param input [Object] Any value to encode
16
+ # @param indent [Integer] Number of spaces per indentation level (default: 2)
17
+ # @param delimiter [String] Delimiter for array values and tabular rows (default: ',')
18
+ # @param length_marker [String, false] Optional marker to prefix array lengths (default: false)
19
+ # @return [String] TOON-formatted string
20
+ def encode(input, indent: 2, delimiter: DEFAULT_DELIMITER, length_marker: false)
21
+ normalized_value = Normalizer.normalize_value(input)
22
+ options = resolve_options(indent: indent, delimiter: delimiter, length_marker: length_marker)
23
+ Encoders.encode_value(normalized_value, options)
24
+ end
25
+
26
+ def resolve_options(indent:, delimiter:, length_marker:)
27
+ {
28
+ indent: indent,
29
+ delimiter: delimiter,
30
+ length_marker: length_marker
31
+ }
32
+ end
33
+ end
metadata ADDED
@@ -0,0 +1,84 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: toon-ruby
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.1.0
5
+ platform: ruby
6
+ authors:
7
+ - André Perdigão
8
+ autorequire:
9
+ bindir: bin
10
+ cert_chain: []
11
+ date: 2025-10-27 00:00:00.000000000 Z
12
+ dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ name: rake
15
+ requirement: !ruby/object:Gem::Requirement
16
+ requirements:
17
+ - - "~>"
18
+ - !ruby/object:Gem::Version
19
+ version: '13.0'
20
+ type: :development
21
+ prerelease: false
22
+ version_requirements: !ruby/object:Gem::Requirement
23
+ requirements:
24
+ - - "~>"
25
+ - !ruby/object:Gem::Version
26
+ version: '13.0'
27
+ - !ruby/object:Gem::Dependency
28
+ name: rspec
29
+ requirement: !ruby/object:Gem::Requirement
30
+ requirements:
31
+ - - "~>"
32
+ - !ruby/object:Gem::Version
33
+ version: '3.12'
34
+ type: :development
35
+ prerelease: false
36
+ version_requirements: !ruby/object:Gem::Requirement
37
+ requirements:
38
+ - - "~>"
39
+ - !ruby/object:Gem::Version
40
+ version: '3.12'
41
+ description: TOON is a compact, human-readable format designed for passing structured
42
+ data to Large Language Models with significantly reduced token usage.
43
+ email:
44
+ - andrepcg@gmail.com
45
+ executables: []
46
+ extensions: []
47
+ extra_rdoc_files: []
48
+ files:
49
+ - LICENSE
50
+ - README.md
51
+ - lib/toon.rb
52
+ - lib/toon/constants.rb
53
+ - lib/toon/encoders.rb
54
+ - lib/toon/normalizer.rb
55
+ - lib/toon/primitives.rb
56
+ - lib/toon/version.rb
57
+ - lib/toon/writer.rb
58
+ homepage: https://github.com/andrepcg/toon-ruby
59
+ licenses:
60
+ - MIT
61
+ metadata:
62
+ homepage_uri: https://github.com/andrepcg/toon-ruby
63
+ source_code_uri: https://github.com/andrepcg/toon-ruby
64
+ post_install_message:
65
+ rdoc_options: []
66
+ require_paths:
67
+ - lib
68
+ required_ruby_version: !ruby/object:Gem::Requirement
69
+ requirements:
70
+ - - ">="
71
+ - !ruby/object:Gem::Version
72
+ version: 2.7.0
73
+ required_rubygems_version: !ruby/object:Gem::Requirement
74
+ requirements:
75
+ - - ">="
76
+ - !ruby/object:Gem::Version
77
+ version: '0'
78
+ requirements: []
79
+ rubygems_version: 3.5.22
80
+ signing_key:
81
+ specification_version: 4
82
+ summary: Token-Oriented Object Notation – a token-efficient JSON alternative for LLM
83
+ prompts
84
+ test_files: []