@toon-format/spec 2.0.1 → 3.0.0
- package/CHANGELOG.md +55 -38
- package/README.md +19 -3
- package/SPEC.md +70 -70
- package/package.json +1 -1
- package/tests/fixtures/decode/arrays-nested.json +22 -5
- package/tests/fixtures/encode/arrays-nested.json +1 -1
- package/tests/fixtures/encode/arrays-objects.json +28 -8
package/CHANGELOG.md
CHANGED
|
@@ -5,87 +5,104 @@ All notable changes to the TOON specification will be documented in this file.
|
|
|
5
5
|
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
|
|
6
6
|
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
|
|
7
7
|
|
|
8
|
+
## [3.0] - 2025-11-24
|
|
9
|
+
|
|
10
|
+
### Breaking Changes
|
|
11
|
+
|
|
12
|
+
- Standardized encoding for list-item objects whose first field is a tabular array (§10):
|
|
13
|
+
- Encoders MUST emit `- key[N]{fields}:` on the hyphen line.
|
|
14
|
+
- Tabular rows MUST appear at depth +2 relative to the hyphen line.
|
|
15
|
+
- All other fields of the same object MUST appear at depth +1.
|
|
16
|
+
- The v2.0 shallow form (rows and fields at the same depth) and the v2.1 bare-hyphen form are no longer normative and MUST NOT be emitted by conforming encoders.
|
|
17
|
+
|
|
18
|
+
### Changed
|
|
19
|
+
|
|
20
|
+
- Encoding/decoding rules (§10) simplified to describe only the YAML-style pattern; legacy layouts are treated as generic nesting and are not covered by conformance tests.
|
|
21
|
+
- Nested tabular list-item example in Appendix A updated to the canonical v3.0 form.
|
|
22
|
+
|
|
23
|
+
### Migration from v2.1
|
|
24
|
+
|
|
25
|
+
- Update encoders to emit the YAML-style form for list-item objects whose first field is a tabular array.
|
|
26
|
+
- If you rely on v2.0/v2.1 layouts, keep decoder compatibility in non-strict or implementation-defined modes; the spec no longer requires or tests these patterns.
|
|
27
|
+
- Optionally regenerate existing `.toon` files for consistent v3 formatting.
|
|
28
|
+
|
|
29
|
+
## [2.1] - 2025-11-23
|
|
30
|
+
|
|
31
|
+
### Changed
|
|
32
|
+
|
|
33
|
+
- Canonical encoding for objects as list items (§10):
|
|
34
|
+
- Encoders SHOULD emit `- key[N]{fields}:` only when the list-item object has exactly one field and that field is a tabular array.
|
|
35
|
+
- In all other cases, encoders SHOULD emit a bare `-` line and place all fields at depth +1; tabular array headers then appear at depth +1 and their rows at depth +2.
|
|
36
|
+
|
|
8
37
|
## [2.0] - 2025-11-10
|
|
9
38
|
|
|
10
39
|
### Breaking Changes
|
|
11
40
|
|
|
12
|
-
-
|
|
13
|
-
-
|
|
14
|
-
- Encoders MUST NOT emit `[#N]` format
|
|
15
|
-
- Decoders MUST NOT accept `[#N]` format (breaking change from v1.5)
|
|
41
|
+
- Removed `[#N]` length-marker syntax in array headers; `[N]` is now the only valid format.
|
|
42
|
+
- Encoders MUST NOT emit `[#N]`; decoders MUST reject it.
|
|
16
43
|
|
|
17
44
|
### Removed
|
|
18
45
|
|
|
19
|
-
-
|
|
20
|
-
- `lengthMarker` encoder option removed from all implementations
|
|
21
|
-
- Length marker test fixtures removed
|
|
46
|
+
- The `lengthMarker` encoder option and any CLI flags exposing it.
|
|
22
47
|
|
|
23
48
|
### Migration from v1.5
|
|
24
49
|
|
|
25
|
-
- Update
|
|
26
|
-
- Convert
|
|
27
|
-
- Remove `lengthMarker`
|
|
28
|
-
- Remove `--length-marker` CLI flags if present
|
|
50
|
+
- Update decoders to reject `[#N]` syntax.
|
|
51
|
+
- Convert existing `.toon` files using `[#N]` to `[N]`.
|
|
52
|
+
- Remove `lengthMarker` configuration and CLI options.
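The `[#N]` to `[N]` file conversion listed in the migration steps above could be scripted roughly as follows. This is a minimal sketch, not part of the spec or of any TOON tooling: the function name `strip_length_markers` is hypothetical, and it assumes the legacy marker always has the shape `[#N]`, optionally with a tab or pipe delimiter symbol before the closing bracket. It does not parse quotes, so a quoted string that happens to contain `[#3]` would be rewritten too; review the result before committing.

```python
import re

# Assumed legacy header shape: "[#N]" with an optional tab or pipe before "]".
_LEGACY_MARKER = re.compile(r"\[#(\d+)([\t|]?)\]")


def strip_length_markers(text: str) -> str:
    """Rewrite legacy "[#N]" array-header lengths to the "[N]" form."""
    # Keep the digits and any delimiter symbol; drop only the "#".
    return _LEGACY_MARKER.sub(r"[\1\2]", text)
```

Headers that already use `[N]` are left untouched, so the function can be run repeatedly over the same files.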
|
|
29
53
|
|
|
30
54
|
## [1.5] - 2025-11-08
|
|
31
55
|
|
|
32
56
|
### Added
|
|
33
57
|
|
|
34
|
-
- Optional key folding for encoders: `keyFolding="safe"`
|
|
35
|
-
- Optional path expansion for decoders: `expandPaths="safe"`
|
|
36
|
-
- IdentifierSegment terminology and
|
|
37
|
-
- Deep-merge semantics for path expansion: recursive merge for objects, error on conflict when `strict=true`, last-write-wins (LWW) when `strict=false` (§13.4)
|
|
58
|
+
- Optional key folding for encoders: `keyFolding="safe"` with `flattenDepth` to collapse single-key object chains into dotted paths (§13.4).
|
|
59
|
+
- Optional path expansion for decoders: `expandPaths="safe"` to split dotted keys into nested objects with deep-merge semantics and conflict handling tied to `strict` (§13.4, §14.5).
|
|
60
|
+
- IdentifierSegment terminology and fixed `"."` path separator for safe folding/expansion (§1.9).
|
|
38
61
|
|
|
39
62
|
### Changed
|
|
40
63
|
|
|
41
|
-
-
|
|
42
|
-
-
|
|
64
|
+
- Safe-mode folding requires IdentifierSegment-only segments, no path separator in segments, no quoting, and collision avoidance.
|
|
65
|
+
- Both features default to `off` and are backward-compatible.
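The path-expansion behaviour described in the v1.5 entries above (dotted keys split into nested objects, recursive merge for objects, error on conflict in strict mode, last-write-wins otherwise) might look like this in outline. It is a sketch under stated assumptions, not the reference implementation: `expand_paths` is a hypothetical name, the `IDENT` pattern is an assumed form of IdentifierSegment that this diff does not spell out, and the sketch works on already-decoded dictionaries, so it cannot tell quoted keys from unquoted ones and does not descend into arrays.

```python
import re

# Assumed IdentifierSegment pattern; the diff does not define it.
IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def _merge(target: dict, key: str, value, strict: bool) -> None:
    """Deep-merge value into target[key], applying the conflict policy."""
    if key in target:
        existing = target[key]
        if isinstance(existing, dict) and isinstance(value, dict):
            for k, v in value.items():
                _merge(existing, k, v, strict)
            return
        if strict:
            raise ValueError(f"path expansion conflict at key {key!r}")
    # New key, or last-write-wins when strict is False.
    target[key] = value


def expand_paths(obj: dict, strict: bool = True) -> dict:
    """Split dotted keys into nested objects and merge the results."""
    out: dict = {}
    for key, value in obj.items():
        if isinstance(value, dict):
            value = expand_paths(value, strict)
        parts = key.split(".")
        # Expand only when every segment looks like an identifier.
        if len(parts) > 1 and all(IDENT.fullmatch(p) for p in parts):
            for part in reversed(parts[1:]):
                value = {part: value}
            key = parts[0]
        _merge(out, key, value, strict)
    return out
```

For instance, `expand_paths({"a.b": 1, "a.c": 2})` yields `{"a": {"b": 1, "c": 2}}`, while `{"a": 1, "a.b": 2}` raises in strict mode and resolves to `{"a": {"b": 2}}` when `strict=False`.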
|
|
43
66
|
|
|
44
67
|
## [1.4] - 2025-11-05
|
|
45
68
|
|
|
46
69
|
### Changed
|
|
47
70
|
|
|
48
|
-
-
|
|
49
|
-
-
|
|
50
|
-
- Clarified
|
|
51
|
-
- Expanded `\w` regex notation to explicit character class `[A-Za-z0-9_]` for cross-language clarity (Section 7.3)
|
|
52
|
-
- Clarified non-strict mode tab handling as implementation-defined (Section 12)
|
|
71
|
+
- Generalized normalization rules and defined canonical number format for encoders (no exponent notation, no trailing zeros, no leading zeros except `"0"`), plus decoder handling of exponent forms and out-of-range numbers (§2-§3).
|
|
72
|
+
- Replaced `\w` with explicit `[A-Za-z0-9_]` in key regexes for cross-language clarity (§7.3).
|
|
73
|
+
- Clarified non-strict mode tab handling as implementation-defined (§12).
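The canonical number format summarized in the v1.4 entries above (no exponent notation, no trailing zeros, no leading zeros except `"0"`) can be illustrated with a short sketch. It also applies the `-0 → 0` and `NaN`/`Infinity → null` normalization quoted from §3 elsewhere in this diff. The function name `canonical_number` is hypothetical, and using `repr` plus `decimal` is one possible approach for Python floats and ints, not something the spec prescribes.

```python
import math
from decimal import Decimal


def canonical_number(x: float) -> str:
    """Format a number without exponent notation or trailing zeros."""
    if not math.isfinite(x):
        return "null"  # NaN, +Infinity, -Infinity normalize to null
    if x == 0:
        return "0"  # also covers -0.0
    # repr() gives the shortest round-tripping text; Decimal expands exponents.
    text = format(Decimal(repr(x)), "f")
    if "." in text:
        text = text.rstrip("0").rstrip(".")
    return text
```

Going through `repr` keeps round-trip fidelity for floats, since Python's `repr` already emits the shortest string that parses back to the same value.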
|
|
53
74
|
|
|
54
75
|
### Added
|
|
55
76
|
|
|
56
|
-
- Appendix G
|
|
77
|
+
- Appendix G with host-type normalization examples for Go, JavaScript, Python, and Rust.
|
|
57
78
|
|
|
58
79
|
## [1.3] - 2025-10-31
|
|
59
80
|
|
|
60
81
|
### Added
|
|
61
82
|
|
|
62
|
-
- Numeric precision requirements: JavaScript implementations SHOULD use `Number.toString()` precision (15
|
|
63
|
-
- RFC 5234 core rules (ALPHA, DIGIT, DQUOTE, HTAB, LF, SP) to ABNF grammar definitions (
|
|
83
|
+
- Numeric precision requirements: JavaScript implementations SHOULD use `Number.toString()` precision (15–17 digits); all implementations MUST preserve round-trip fidelity (§2).
|
|
84
|
+
- RFC 5234 core rules (ALPHA, DIGIT, DQUOTE, HTAB, LF, SP) to ABNF grammar definitions (§6).
|
|
64
85
|
|
|
65
86
|
## [1.2] - 2025-10-29
|
|
66
87
|
|
|
67
88
|
### Changed
|
|
68
89
|
|
|
69
|
-
-
|
|
70
|
-
-
|
|
71
|
-
- Defined blank-line and trailing-newline decoding behavior with explicit skipping rules outside arrays
|
|
72
|
-
- Clarified hyphen-based quoting: "-" or any string starting with "-" MUST be quoted
|
|
73
|
-
- Clarified BigInt normalization: values outside safe integer range are converted to quoted decimal strings
|
|
74
|
-
- Clarified row/key disambiguation: uses first unquoted delimiter vs colon position
|
|
90
|
+
- Tightened delimiter scoping, indentation, blank-line handling, and hyphen-based quoting rules (§11-§12).
|
|
91
|
+
- Clarified BigInt normalization (out-of-range values → quoted decimal strings) and row/key disambiguation (first unquoted delimiter vs colon) (§2, §9.3).
|
|
75
92
|
|
|
76
93
|
## [1.1] - 2025-10-29
|
|
77
94
|
|
|
78
95
|
### Added
|
|
79
96
|
|
|
80
|
-
- Strict-mode rules
|
|
81
|
-
- Delimiter-aware parsing
|
|
82
|
-
- Decoder options (indent
|
|
97
|
+
- Strict-mode rules.
|
|
98
|
+
- Delimiter-aware parsing.
|
|
99
|
+
- Decoder options (`indent`, `strict`).
|
|
83
100
|
|
|
84
101
|
## [1.0] - 2025-10-28
|
|
85
102
|
|
|
86
103
|
### Added
|
|
87
104
|
|
|
88
|
-
- Initial specification release
|
|
89
|
-
- Encoding normalization rules
|
|
90
|
-
- Decoding interpretation guidelines
|
|
91
|
-
- Conformance requirements
|
|
105
|
+
- Initial specification release.
|
|
106
|
+
- Encoding normalization rules.
|
|
107
|
+
- Decoding interpretation guidelines.
|
|
108
|
+
- Conformance requirements.
|
package/README.md
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
# TOON Format Specification
|
|
2
2
|
|
|
3
|
-
[](./SPEC.md)
|
|
4
|
+
[](./tests/fixtures/)
|
|
5
5
|
[](./LICENSE)
|
|
6
6
|
|
|
7
7
|
This repository contains the official specification for **Token-Oriented Object Notation (TOON)**, a compact, human-readable encoding of the JSON data model for LLM prompts. It provides a lossless serialization of the same objects, arrays, and primitives as JSON, but in a syntax that minimizes tokens and makes structure easy for models to follow.
|
|
@@ -10,7 +10,7 @@ This repository contains the official specification for **Token-Oriented Object
|
|
|
10
10
|
|
|
11
11
|
[→ Read the full specification (SPEC.md)](./SPEC.md)
|
|
12
12
|
|
|
13
|
-
- **Version:**
|
|
13
|
+
- **Version:** 3.0 (2025-11-24)
|
|
14
14
|
- **Status:** Working Draft
|
|
15
15
|
- **License:** MIT
|
|
16
16
|
|
|
@@ -122,6 +122,22 @@ The [tests/fixtures/](./tests/fixtures/) directory contains **language-agnostic
|
|
|
122
122
|
|
|
123
123
|
See [tests/README.md](./tests/README.md) for detailed fixture format and usage instructions.
|
|
124
124
|
|
|
125
|
+
## Media Type & File Extension
|
|
126
|
+
|
|
127
|
+
TOON defines a provisional media type (see §18.2 of the specification):
|
|
128
|
+
|
|
129
|
+
- **Media type:** `text/toon` (provisional, pending IANA registration)
|
|
130
|
+
- **File extension:** `.toon`
|
|
131
|
+
- **Charset:** Always UTF-8
|
|
132
|
+
|
|
133
|
+
For HTTP usage:
|
|
134
|
+
|
|
135
|
+
```http
|
|
136
|
+
Content-Type: text/toon
|
|
137
|
+
```
|
|
138
|
+
|
|
139
|
+
See the full [IANA Considerations section](SPEC.md#18-iana-considerations) for details.
|
|
140
|
+
|
|
125
141
|
## Contributing
|
|
126
142
|
|
|
127
143
|
We welcome contributions to improve the specification! Please see [CONTRIBUTING.md](./CONTRIBUTING.md) for:
|
package/SPEC.md
CHANGED
|
@@ -2,9 +2,9 @@
|
|
|
2
2
|
|
|
3
3
|
## Token-Oriented Object Notation
|
|
4
4
|
|
|
5
|
-
**Version:**
|
|
5
|
+
**Version:** 3.0
|
|
6
6
|
|
|
7
|
-
**Date:** 2025-11-
|
|
7
|
+
**Date:** 2025-11-24
|
|
8
8
|
|
|
9
9
|
**Status:** Working Draft
|
|
10
10
|
|
|
@@ -20,7 +20,7 @@ Token-Oriented Object Notation (TOON) is a line-oriented, indentation-based text
|
|
|
20
20
|
|
|
21
21
|
## Status of This Document
|
|
22
22
|
|
|
23
|
-
This document is a Working Draft
|
|
23
|
+
This document is a Working Draft v3.0 and may be updated, replaced, or obsoleted. Implementers should monitor the canonical repository at https://github.com/toon-format/spec for changes.
|
|
24
24
|
|
|
25
25
|
This specification is stable for implementation but not yet finalized. Breaking changes may occur in future major versions.
|
|
26
26
|
|
|
@@ -227,12 +227,11 @@ Implementations that fail to conform to any MUST or REQUIRED level requirement a
|
|
|
227
227
|
|
|
228
228
|
## 3. Encoding Normalization (Reference Encoder)
|
|
229
229
|
|
|
230
|
-
Encoders MUST normalize non-JSON values to the JSON data model before encoding
|
|
230
|
+
Encoders MUST normalize non-JSON values to the JSON data model before encoding. The mapping from host-specific types to JSON model is implementation-defined and MUST be documented.
|
|
231
231
|
|
|
232
232
|
- Number:
|
|
233
233
|
- Finite → number (canonical decimal form per Section 2). -0 → 0.
|
|
234
234
|
- NaN, +Infinity, -Infinity → null.
|
|
235
|
-
- Non-JSON types MUST be normalized to the JSON data model (object, array, string, number, boolean, or null) before encoding. The mapping from host-specific types to JSON model is implementation-defined and MUST be documented.
|
|
236
235
|
- Examples of host-type normalization (non-normative):
|
|
237
236
|
- Date/time objects → ISO 8601 string representation.
|
|
238
237
|
- Set-like collections → array.
|
|
@@ -384,9 +383,9 @@ A string value MUST be quoted if any of the following is true:
|
|
|
384
383
|
- It contains a colon (:), double quote ("), or backslash (\).
|
|
385
384
|
- It contains brackets or braces ([, ], {, }).
|
|
386
385
|
- It contains control characters: newline, carriage return, or tab.
|
|
387
|
-
- It contains the relevant delimiter:
|
|
388
|
-
-
|
|
389
|
-
-
|
|
386
|
+
- It contains the relevant delimiter (see §11 for complete delimiter rules):
|
|
387
|
+
- For inline array values and tabular row cells: the active delimiter from the nearest array header.
|
|
388
|
+
- For object field values (key: value): the document delimiter, even when the object is within an array's scope.
|
|
390
389
|
- It equals "-" or starts with "-" (any hyphen at position 0).
|
|
391
390
|
|
|
392
391
|
Otherwise, the string MAY be emitted without quotes. Unicode, emoji, and strings with internal (non-leading/trailing) spaces are safe unquoted provided they do not violate the conditions.
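As a rough illustration, the quoting conditions visible in this hunk could be checked as below. This is a partial sketch only: the earlier conditions of the §7.2 list fall outside the hunk and are deliberately not covered, and the name `needs_quoting_partial` is hypothetical. The caller is assumed to pass the active delimiter for inline array values and tabular row cells, and the document delimiter for object field values.

```python
def needs_quoting_partial(value: str, delimiter: str = ",") -> bool:
    """Check only the quoting conditions shown in the hunk above."""
    if any(ch in value for ch in ':"\\'):
        return True  # colon, double quote, or backslash
    if any(ch in value for ch in "[]{}"):
        return True  # brackets or braces
    if any(ch in value for ch in "\n\r\t"):
        return True  # newline, carriage return, or tab
    if delimiter in value:
        return True  # the relevant (active or document) delimiter
    return value.startswith("-")  # "-" itself or any leading hyphen
```

A `False` result therefore means "not excluded by these rules", not "safe to emit unquoted".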
|
|
@@ -403,12 +402,10 @@ Encoders MAY perform key folding when enabled (see §13.4 for complete folding r
|
|
|
403
402
|
|
|
404
403
|
### 7.4 Decoding Rules for Strings and Keys (Decoding)
|
|
405
404
|
|
|
406
|
-
|
|
407
|
-
|
|
408
|
-
|
|
409
|
-
|
|
410
|
-
- Otherwise → strings
|
|
411
|
-
- Keys (quoted or unquoted) MUST be followed by ":"; missing colon MUST error.
|
|
405
|
+
Decoding of value tokens follows §4 (unquoted type inference, quoted strings, numeric rules). This section adds key-specific requirements:
|
|
406
|
+
|
|
407
|
+
- Quoted keys MUST be unescaped per Section 7.1; any other escape MUST error.
|
|
408
|
+
- Keys (quoted or unquoted) MUST be followed by ":"; missing colon MUST error (see also §14.2).
|
|
412
409
|
|
|
413
410
|
## 8. Objects
|
|
414
411
|
|
|
@@ -421,7 +418,6 @@ Encoders MAY perform key folding when enabled (see §13.4 for complete folding r
|
|
|
421
418
|
- Decoding:
|
|
422
419
|
- A line "key:" with nothing after the colon at depth d opens an object; subsequent lines at depth > d belong to that object until the depth decreases to ≤ d.
|
|
423
420
|
- Lines "key: value" at the same depth are sibling fields.
|
|
424
|
-
- Missing colon after a key MUST error.
|
|
425
421
|
|
|
426
422
|
## 9. Arrays
|
|
427
423
|
|
|
@@ -474,6 +470,7 @@ Decoding:
|
|
|
474
470
|
- Delimiter before colon → row.
|
|
475
471
|
- Colon before delimiter → key-value line (end of rows).
|
|
476
472
|
- If a line has an unquoted colon but no unquoted active delimiter → key-value line (end of rows).
|
|
473
|
+
- When a tabular array appears as the first field of a list-item object, indentation is governed by Section 10.
|
|
477
474
|
|
|
478
475
|
### 9.4 Mixed / Non-Uniform Arrays — Expanded List
|
|
479
476
|
|
|
@@ -499,20 +496,18 @@ Decoding:
|
|
|
499
496
|
For an object appearing as a list item:
|
|
500
497
|
|
|
501
498
|
- Empty object list item: a single "-" at the list-item indentation level.
|
|
502
|
-
-
|
|
503
|
-
-
|
|
504
|
-
|
|
505
|
-
|
|
506
|
-
-
|
|
507
|
-
|
|
508
|
-
|
|
509
|
-
|
|
510
|
-
|
|
511
|
-
-
|
|
512
|
-
|
|
513
|
-
|
|
514
|
-
- The first field is parsed from the hyphen line. If it is a nested object (- key:), nested fields are at +2 relative to the hyphen line; subsequent fields of the same list item are at +1.
|
|
515
|
-
- If the first field is a tabular header on the hyphen line, its rows are at +1; subsequent sibling fields continue at +1 after the rows.
|
|
499
|
+
- Encoding (normative):
|
|
500
|
+
- When a list-item object has a tabular array (Section 9.3) as its first field in encounter order, encoders MUST emit the tabular header on the hyphen line:
|
|
501
|
+
- The hyphen and tabular header appear on the same line at the list-item depth: - key[N<delim?>]{fields}:
|
|
502
|
+
- Tabular rows MUST appear at depth +2 (relative to the hyphen line).
|
|
503
|
+
- All other fields of the same object MUST appear at depth +1 under the hyphen line, in encounter order, using normal object field rules (Section 8).
|
|
504
|
+
- Encoders MUST NOT emit tabular rows at depth +1 or sibling fields at the same depth as rows when the first field is a tabular array.
|
|
505
|
+
- For all other cases (first field is not a tabular array), encoders SHOULD place the first field on the hyphen line. A bare hyphen on its own line is used only for empty list-item objects.
|
|
506
|
+
- Decoding (normative):
|
|
507
|
+
- When a decoder encounters a list-item line of the form - key[N<delim?>]{fields}: at depth d, it MUST treat this as the start of a tabular array field named key in the list-item object.
|
|
508
|
+
- Lines at depth d+2 that conform to tabular row syntax (Section 9.3) are rows of that tabular array.
|
|
509
|
+
- Lines at depth d+1 are additional fields of the same list-item object; the presence of a line at depth d+1 after rows terminates the rows.
|
|
510
|
+
- All other object-as-list-item patterns (bare hyphen, first field on hyphen line for non-tabular values) are decoded according to the general rules in Section 8 and Section 9.
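The encoding rules above (tabular header on the hyphen line, rows at depth +2, remaining fields at depth +1) can be sketched as follows. This is an illustrative sketch, not the reference encoder: `encode_tabular_first_item` is a hypothetical name, and the code assumes a 2-space indent, the comma delimiter, uniform rows of primitive values, scalar sibling fields, and no quoting or number normalization.

```python
def encode_tabular_first_item(obj: dict, depth: int, indent: int = 2) -> list[str]:
    """Encode a list-item object whose first field is a tabular array."""

    def pad(d: int) -> str:
        return " " * (indent * d)

    (key, rows), *rest = obj.items()
    fields = list(rows[0])
    header = ",".join(fields)
    # Hyphen and tabular header share one line at the list-item depth.
    lines = [f"{pad(depth)}- {key}[{len(rows)}]{{{header}}}:"]
    # Rows sit at depth +2 relative to the hyphen line.
    for row in rows:
        lines.append(pad(depth + 2) + ",".join(str(row[f]) for f in fields))
    # All other fields sit at depth +1, in encounter order.
    for k, v in rest:
        lines.append(f"{pad(depth + 1)}{k}: {v}")
    return lines
```

Called with `{"users": [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Bob"}], "status": "active"}` at depth 1, the sketch places the rows six spaces in and `status: active` four spaces in, which is the layout the §10 rules describe.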
|
|
516
511
|
|
|
517
512
|
## 11. Delimiters
|
|
518
513
|
|
|
@@ -520,19 +515,25 @@ Decoding:
|
|
|
520
515
|
- Comma (default): header omits the delimiter symbol.
|
|
521
516
|
- Tab: header includes HTAB inside brackets and braces (e.g., [N<TAB>], {a<TAB>b}); rows/inline arrays use tabs.
|
|
522
517
|
- Pipe: header includes "|" inside brackets and braces; rows/inline arrays use "|".
|
|
523
|
-
|
|
524
|
-
|
|
525
|
-
|
|
526
|
-
-
|
|
527
|
-
|
|
528
|
-
|
|
529
|
-
-
|
|
530
|
-
-
|
|
531
|
-
-
|
|
518
|
+
|
|
519
|
+
### 11.1 Encoding Rules (Normative for Encoders)
|
|
520
|
+
|
|
521
|
+
- Document delimiter: Encoders select a document delimiter (option: comma, tab, pipe; default comma) that influences quoting for all object field values (key: value) throughout the document.
|
|
522
|
+
- Active delimiter: Inside an array header's scope, the active delimiter governs quoting only for inline array values and tabular row cells.
|
|
523
|
+
- Delimiter-aware quoting:
|
|
524
|
+
- Inline array values and tabular row cells: strings containing the active delimiter MUST be quoted.
|
|
525
|
+
- Object field values (key: value): encoders use the document delimiter to decide delimiter-aware quoting, regardless of whether the object appears within an array's scope.
|
|
526
|
+
- Strings containing non-active delimiters do not require quoting unless another condition applies (§7.2).
|
|
527
|
+
|
|
528
|
+
### 11.2 Decoding Rules (Normative for Decoders)
|
|
529
|
+
|
|
530
|
+
- Active delimiter: Decoders use only the active delimiter declared by the nearest array header to split inline arrays and tabular rows.
|
|
531
|
+
- Delimiter-aware parsing:
|
|
532
|
+
- Inline arrays and tabular rows MUST be split only on the active delimiter.
|
|
532
533
|
- Splitting MUST preserve empty tokens; surrounding spaces are trimmed, and empty tokens decode to the empty string.
|
|
533
|
-
- Strings containing the active delimiter MUST be quoted to avoid splitting; non-active delimiters MUST NOT cause splits.
|
|
534
534
|
- Nested headers may change the active delimiter; decoding MUST use the delimiter declared by the nearest header.
|
|
535
|
-
- If the bracket declares tab or pipe, the same symbol MUST be used in the fields segment and for splitting all rows/values in that scope.
|
|
535
|
+
- If the bracket declares tab or pipe, the same symbol MUST be used in the fields segment and for splitting all rows/values in that scope (§6).
|
|
536
|
+
- Object field values (key: value): Decoders parse the entire post-colon token as a single value; document delimiter is not a decoder concept.
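The decoding rules above (split only on the active delimiter, keep empty tokens, trim surrounding spaces, never split inside quoted strings) might be sketched like this. `split_row` is a hypothetical helper, not an API of any TOON library; it returns tokens still in their quoted form and leaves unescaping and type inference to later steps.

```python
def split_row(line: str, delimiter: str = ",") -> list[str]:
    """Split an inline array or tabular row on the active delimiter only."""
    tokens: list[str] = []
    buf: list[str] = []
    in_quotes = False
    i = 0
    while i < len(line):
        ch = line[i]
        if in_quotes and ch == "\\" and i + 1 < len(line):
            # Keep the escape pair together so an escaped quote cannot end the string.
            buf.append(line[i:i + 2])
            i += 2
            continue
        if ch == '"':
            in_quotes = not in_quotes
        if ch == delimiter and not in_quotes:
            tokens.append("".join(buf).strip(" "))
            buf = []
        else:
            buf.append(ch)
        i += 1
    # The final token is always kept, even when empty.
    tokens.append("".join(buf).strip(" "))
    return tokens
```

Because the delimiter is a parameter, a nested header that declares a different delimiter only needs to pass its own symbol; commas inside a pipe-delimited row are then ordinary characters.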
|
|
536
537
|
|
|
537
538
|
## 12. Indentation and Whitespace
|
|
538
539
|
|
|
@@ -730,12 +731,14 @@ When strict mode is enabled (default), decoders MUST error on the following cond
|
|
|
730
731
|
|
|
731
732
|
### 14.3 Indentation Errors
|
|
732
733
|
|
|
734
|
+
See §12 for indentation semantics. In strict mode, decoders MUST error on:
|
|
733
735
|
- Leading spaces not a multiple of indentSize.
|
|
734
736
|
- Any tab used in indentation (tabs allowed in quoted strings and as HTAB delimiter).
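The two strict-mode checks above could be implemented along these lines. This is a minimal sketch with a hypothetical function name, `check_indentation`, and an assumed default `indent_size` of 2; it only inspects leading whitespace, so tabs inside quoted strings or used as a delimiter are not affected.

```python
def check_indentation(line: str, indent_size: int = 2) -> int:
    """Return the depth of a line, raising on strict-mode indentation errors."""
    stripped = line.lstrip(" \t")
    leading = line[: len(line) - len(stripped)]
    if "\t" in leading:
        raise ValueError("tab used in indentation")
    if len(leading) % indent_size != 0:
        raise ValueError(
            f"{len(leading)} leading spaces is not a multiple of {indent_size}"
        )
    return len(leading) // indent_size
```

Returning the depth lets the caller reuse the result when deciding which lines belong to the current object or array.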
|
|
735
737
|
|
|
736
738
|
### 14.4 Structural Errors
|
|
737
739
|
|
|
738
|
-
|
|
740
|
+
See §12 for blank line semantics. In strict mode, decoders MUST error on:
|
|
741
|
+
- Blank lines inside arrays/tabular rows (between the first and last item/row).
|
|
739
742
|
|
|
740
743
|
For root-form rules, including handling of empty documents, see §5.
|
|
741
744
|
|
|
@@ -894,7 +897,11 @@ This specification does not request IANA registration at this time, as the forma
|
|
|
894
897
|
|
|
895
898
|
### 18.2 Provisional Media Type
|
|
896
899
|
|
|
897
|
-
|
|
900
|
+
Until IANA registration is completed, implementations SHOULD use:
|
|
901
|
+
- Media type: `text/toon`
|
|
902
|
+
- File extension: `.toon`
|
|
903
|
+
|
|
904
|
+
Full designation details:
|
|
898
905
|
|
|
899
906
|
Type name: text
|
|
900
907
|
|
|
@@ -989,11 +996,13 @@ Nested tabular inside a list item:
|
|
|
989
996
|
```
|
|
990
997
|
items[1]:
|
|
991
998
|
  - users[2]{id,name}:
|
|
992
|
-
|
|
993
|
-
|
|
999
|
+
      1,Ada
|
|
1000
|
+
      2,Bob
|
|
994
1001
|
    status: active
|
|
995
1002
|
```
|
|
996
1003
|
|
|
1004
|
+
Note: When a list-item object has a tabular array as its first field, encoders emit the tabular header on the hyphen line with rows at depth +2 and other fields at depth +1. This is the canonical encoding for list-item objects whose first field is a tabular array.
|
|
1005
|
+
|
|
997
1006
|
Delimiter variations:
|
|
998
1007
|
```
|
|
999
1008
|
items[2 ]{sku name qty price}:
|
|
@@ -1218,52 +1227,43 @@ Note: Host-type normalization tests (e.g., BigInt, Date, Set, Map) are language-
|
|
|
1218
1227
|
|
|
1219
1228
|
## Appendix D: Document Changelog (Informative)
|
|
1220
1229
|
|
|
1230
|
+
This appendix summarizes major changes between spec versions. For the complete changelog, see [`CHANGELOG.md`](./CHANGELOG.md) in the specification repository.
|
|
1231
|
+
|
|
1232
|
+
### v3.0 (2025-11-24)
|
|
1233
|
+
|
|
1234
|
+
- Standardized encoding for list-item objects whose first field is a tabular array (§10).
|
|
1235
|
+
|
|
1236
|
+
### v2.1 (2025-11-23)
|
|
1237
|
+
|
|
1238
|
+
- Tightened canonical encoding for objects as list items (§10): bare `-` for multi-field objects, compact `- key[N]{fields}:` only for single-field tabular arrays, to improve visual consistency and LLM readability.
|
|
1239
|
+
|
|
1221
1240
|
### v2.0 (2025-11-10)
|
|
1222
1241
|
|
|
1223
|
-
-
|
|
1224
|
-
- The `[#N]` format is no longer valid syntax. All array headers MUST use `[N]` format only.
|
|
1225
|
-
- Encoders MUST NOT emit `[#N]` format.
|
|
1226
|
-
- Decoders MUST NOT accept `[#N]` format (breaking change from v1.5).
|
|
1227
|
-
- Removed all references to length marker from terminology, grammar, conformance requirements, and parsing helpers.
|
|
1242
|
+
- Removed `[#N]` length-marker syntax from array headers; `[N]` is now the only valid form.
|
|
1228
1243
|
|
|
1229
1244
|
### v1.5 (2025-11-08)
|
|
1230
1245
|
|
|
1231
|
-
- Added optional key folding
|
|
1232
|
-
- Added optional path expansion for decoders: `expandPaths='safe'` mode with conflict resolution tied to existing `strict` option (§13.4).
|
|
1233
|
-
- Defined safe-mode requirements for folding: IdentifierSegment validation, no path separator in segments, collision avoidance, no quoting required (§7.3, §13.4).
|
|
1234
|
-
- Specified deep-merge semantics for expansion: recursive merge for objects; conflict policy (error in strict mode, LWW when strict=false) for non-objects (§13.4).
|
|
1235
|
-
- Added strict-mode error category for path expansion conflicts (§14.5).
|
|
1236
|
-
- Both features default to OFF; fully backward-compatible.
|
|
1246
|
+
- Added optional key folding (`keyFolding="safe"`) and path expansion (`expandPaths="safe"`) with deep-merge semantics and strict-mode conflict handling (§13.4, §14.5).
|
|
1237
1247
|
|
|
1238
1248
|
### v1.4 (2025-11-05)
|
|
1239
1249
|
|
|
1240
|
-
-
|
|
1241
|
-
- Defined canonical number format for encoders and decoder acceptance rules (Section 2).
|
|
1242
|
-
- Added Appendix G with host-type normalization examples for Go, JavaScript, Python, and Rust.
|
|
1243
|
-
- Clarified non-strict mode tab handling as implementation-defined (Section 12).
|
|
1244
|
-
- Expanded regex notation for cross-language clarity (Section 7.3).
|
|
1250
|
+
- Generalized normalization and numeric canonicalization rules, and added host-type normalization guidance (Appendix G).
|
|
1245
1251
|
|
|
1246
1252
|
### v1.3 (2025-10-31)
|
|
1247
1253
|
|
|
1248
|
-
- Added numeric precision
|
|
1249
|
-
- Added RFC 5234 core rules (ALPHA, DIGIT, DQUOTE, HTAB, LF, SP) to ABNF grammar definitions (Section 6).
|
|
1254
|
+
- Added numeric precision guidance and ABNF core rules for headers and keys (§2, §6).
|
|
1250
1255
|
|
|
1251
1256
|
### v1.2 (2025-10-29)
|
|
1252
1257
|
|
|
1253
|
-
-
|
|
1254
|
-
- Tightened strict-mode indentation requirements: leading spaces MUST be exact multiples of indentSize; tabs in indentation MUST error.
|
|
1255
|
-
- Defined blank-line and trailing-newline decoding behavior with explicit skipping rules outside arrays.
|
|
1256
|
-
- Clarified hyphen-based quoting: "-" or any string starting with "-" MUST be quoted.
|
|
1257
|
-
- Clarified BigInt normalization: values outside safe integer range are converted to quoted decimal strings.
|
|
1258
|
-
- Clarified row/key disambiguation: uses first unquoted delimiter vs colon position.
|
|
1258
|
+
- Tightened delimiter scoping, indentation, blank-line handling, hyphen-based quoting, BigInt normalization, and row/key disambiguation rules (§2, §9, §11-§12).
|
|
1259
1259
|
|
|
1260
1260
|
### v1.1 (2025-10-29)
|
|
1261
1261
|
|
|
1262
|
-
|
|
1262
|
+
- Introduced strict-mode validation, delimiter-aware parsing, and decoder options (indent, strict).
|
|
1263
1263
|
|
|
1264
1264
|
### v1.0 (2025-10-28)
|
|
1265
1265
|
|
|
1266
|
-
Initial encoding
|
|
1266
|
+
- Initial specification: encoding normalization, decoding interpretation, and conformance requirements.
|
|
1267
1267
|
|
|
1268
1268
|
## Appendix E: Acknowledgments and License
|
|
1269
1269
|
|
package/package.json
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "@toon-format/spec",
|
|
3
3
|
"type": "module",
|
|
4
|
-
"version": "
|
|
4
|
+
"version": "3.0.0",
|
|
5
5
|
"packageManager": "pnpm@10.19.0",
|
|
6
6
|
"description": "Official specification for Token-Oriented Object Notation (TOON)",
|
|
7
7
|
"author": "Johann Schopplich <hello@johannschopplich.com>",
|
package/tests/fixtures/decode/arrays-nested.json
CHANGED
|
@@ -1,5 +1,5 @@
|
|
|
1
1
|
{
|
|
2
|
-
"version": "
|
|
2
|
+
"version": "3.0",
|
|
3
3
|
"category": "decode",
|
|
4
4
|
"description": "Nested and mixed array decoding - list format, arrays of arrays, root arrays, mixed types",
|
|
5
5
|
"tests": [
|
|
@@ -52,8 +52,8 @@
|
|
|
52
52
|
"specSection": "9.4"
|
|
53
53
|
},
|
|
54
54
|
{
|
|
55
|
-
"name": "parses
|
|
56
|
-
"input": "items[1]:\n - users[2]{id,name}:\n
|
|
55
|
+
"name": "parses list items whose first field is a tabular array",
|
|
56
|
+
"input": "items[1]:\n  - users[2]{id,name}:\n      1,Ada\n      2,Bob\n    status: active",
|
|
57
57
|
"expected": {
|
|
58
58
|
"items": [
|
|
59
59
|
{
|
|
@@ -65,7 +65,24 @@
|
|
|
65
65
|
}
|
|
66
66
|
]
|
|
67
67
|
},
|
|
68
|
-
"specSection": "10"
|
|
68
|
+
"specSection": "10",
|
|
69
|
+
"note": "Canonical encoding: tabular header on hyphen line, rows at depth +2, sibling fields at depth +1"
|
|
70
|
+
},
|
|
71
|
+
{
|
|
72
|
+
"name": "parses single-field list-item object with tabular array",
|
|
73
|
+
"input": "items[1]:\n  - users[2]{id,name}:\n      1,Ada\n      2,Bob",
|
|
74
|
+
"expected": {
|
|
75
|
+
"items": [
|
|
76
|
+
{
|
|
77
|
+
"users": [
|
|
78
|
+
{ "id": 1, "name": "Ada" },
|
|
79
|
+
{ "id": 2, "name": "Bob" }
|
|
80
|
+
]
|
|
81
|
+
}
|
|
82
|
+
]
|
|
83
|
+
},
|
|
84
|
+
"specSection": "10",
|
|
85
|
+
"note": "Single-field list-item object: only the tabular array, no sibling fields"
|
|
69
86
|
},
|
|
70
87
|
{
|
|
71
88
|
"name": "parses objects containing arrays (including empty arrays) in list format",
|
|
@@ -79,7 +96,7 @@
|
|
|
79
96
|
},
|
|
80
97
|
{
|
|
81
98
|
"name": "parses arrays of arrays within objects",
|
|
82
|
-
"input": "items[1]:\n - matrix[2]:\n
|
|
99
|
+
"input": "items[1]:\n - matrix[2]:\n - [2]: 1,2\n - [2]: 3,4\n name: grid",
|
|
83
100
|
"expected": {
|
|
84
101
|
"items": [
|
|
85
102
|
{ "matrix": [[1, 2], [3, 4]], "name": "grid" }
|
package/tests/fixtures/encode/arrays-objects.json
CHANGED
|
@@ -1,5 +1,5 @@
|
|
|
1
1
|
{
|
|
2
|
-
"version": "
|
|
2
|
+
"version": "3.0",
|
|
3
3
|
"category": "encode",
|
|
4
4
|
"description": "Arrays of objects encoding - list format for non-uniform objects and complex structures",
|
|
5
5
|
"tests": [
|
|
@@ -47,7 +47,7 @@
|
|
|
47
47
|
{ "matrix": [[1, 2], [3, 4]], "name": "grid" }
|
|
48
48
|
]
|
|
49
49
|
},
|
|
50
|
-
"expected": "items[1]:\n - matrix[2]:\n
|
|
50
|
+
"expected": "items[1]:\n - matrix[2]:\n - [2]: 1,2\n - [2]: 3,4\n name: grid",
|
|
51
51
|
"specSection": "10"
|
|
52
52
|
},
|
|
53
53
|
{
|
|
@@ -57,8 +57,9 @@
|
|
|
57
57
|
{ "users": [{ "id": 1, "name": "Ada" }, { "id": 2, "name": "Bob" }], "status": "active" }
|
|
58
58
|
]
|
|
59
59
|
},
|
|
60
|
-
"expected": "items[1]:\n - users[2]{id,name}:\n
|
|
61
|
-
"specSection": "10"
|
|
60
|
+
"expected": "items[1]:\n  - users[2]{id,name}:\n      1,Ada\n      2,Bob\n    status: active",
|
|
61
|
+
"specSection": "10",
|
|
62
|
+
"note": "YAML-style encoding for list-item objects with tabular array as first field"
|
|
62
63
|
},
|
|
63
64
|
{
|
|
64
65
|
"name": "uses list format for nested object arrays with mismatched keys",
|
|
@@ -67,7 +68,7 @@
|
|
|
67
68
|
{ "users": [{ "id": 1, "name": "Ada" }, { "id": 2 }], "status": "active" }
|
|
68
69
|
]
|
|
69
70
|
},
|
|
70
|
-
"expected": "items[1]:\n - users[2]:\n
|
|
71
|
+
"expected": "items[1]:\n - users[2]:\n - id: 1\n name: Ada\n - id: 2\n status: active",
|
|
71
72
|
"specSection": "10"
|
|
72
73
|
},
|
|
73
74
|
{
|
|
@@ -97,12 +98,22 @@
|
|
|
97
98
|
"specSection": "10"
|
|
98
99
|
},
|
|
99
100
|
{
|
|
100
|
-
"name": "
|
|
101
|
+
"name": "uses canonical encoding for multi-field list-item objects with tabular arrays",
|
|
101
102
|
"input": {
|
|
102
103
|
"items": [{ "users": [{ "id": 1 }, { "id": 2 }], "note": "x" }]
|
|
103
104
|
},
|
|
104
|
-
"expected": "items[1]:\n - users[2]{id}:\n
|
|
105
|
-
"specSection": "10"
|
|
105
|
+
"expected": "items[1]:\n  - users[2]{id}:\n      1\n      2\n    note: x",
|
|
106
|
+
"specSection": "10",
|
|
107
|
+
"note": "Tabular header on hyphen line with rows at depth +2 and sibling fields at depth +1"
|
|
108
|
+
},
|
|
109
|
+
{
|
|
110
|
+
"name": "uses canonical encoding for single-field list-item tabular arrays",
|
|
111
|
+
"input": {
|
|
112
|
+
"items": [{ "users": [{ "id": 1, "name": "Ada" }, { "id": 2, "name": "Bob" }] }]
|
|
113
|
+
},
|
|
114
|
+
"expected": "items[1]:\n  - users[2]{id,name}:\n      1,Ada\n      2,Bob",
|
|
115
|
+
"specSection": "10",
|
|
116
|
+
"note": "Tabular header on hyphen line with rows at depth +2"
|
|
106
117
|
},
|
|
107
118
|
{
|
|
108
119
|
"name": "places empty arrays on hyphen line when first",
|
|
@@ -112,6 +123,15 @@
|
|
|
112
123
|
"expected": "items[1]:\n - data[0]:\n name: x",
|
|
113
124
|
"specSection": "10"
|
|
114
125
|
},
|
|
126
|
+
{
|
|
127
|
+
"name": "encodes empty object list items as bare hyphen",
|
|
128
|
+
"input": {
|
|
129
|
+
"items": ["first", "second", {}]
|
|
130
|
+
},
|
|
131
|
+
"expected": "items[3]:\n  - first\n  - second\n  -",
|
|
132
|
+
"specSection": "10",
|
|
133
|
+
"note": "Empty object list items encode as a single \"-\" line at the list-item depth"
|
|
134
|
+
},
|
|
115
135
|
{
|
|
116
136
|
"name": "uses field order from first object for tabular headers",
|
|
117
137
|
"input": {
|