@toon-format/spec 2.0.1 → 3.0.0

package/CHANGELOG.md CHANGED
@@ -5,87 +5,104 @@ All notable changes to the TOON specification will be documented in this file.
5
5
  The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
6
6
  and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
7
7
 
8
+ ## [3.0] - 2025-11-24
9
+
10
+ ### Breaking Changes
11
+
12
+ - Standardized encoding for list-item objects whose first field is a tabular array (§10):
13
+ - Encoders MUST emit `- key[N]{fields}:` on the hyphen line.
14
+ - Tabular rows MUST appear at depth +2 relative to the hyphen line.
15
+ - All other fields of the same object MUST appear at depth +1.
16
+ - The v2.0 shallow form (rows and fields at the same depth) and the v2.1 bare-hyphen form are no longer normative and MUST NOT be emitted by conforming encoders.
17
+
18
+ ### Changed
19
+
20
+ - Encoding/decoding rules (§10) simplified to describe only the YAML-style pattern; legacy layouts are treated as generic nesting and are not covered by conformance tests.
21
+ - Nested tabular list-item example in Appendix A updated to the canonical v3.0 form.
22
+
23
+ ### Migration from v2.1
24
+
25
+ - Update encoders to emit the YAML-style form for list-item objects whose first field is a tabular array.
26
+ - If you rely on v2.0/v2.1 layouts, keep decoder compatibility in non-strict or implementation-defined modes; the spec no longer requires or tests these patterns.
27
+ - Optionally regenerate existing `.toon` files for consistent v3 formatting.
28
+
29
+ ## [2.1] - 2025-11-23
30
+
31
+ ### Changed
32
+
33
+ - Canonical encoding for objects as list items (§10):
34
+ - Encoders SHOULD emit `- key[N]{fields}:` only when the list-item object has exactly one field and that field is a tabular array.
35
+ - In all other cases, encoders SHOULD emit a bare `-` line and place all fields at depth +1; tabular array headers then appear at depth +1 and their rows at depth +2.
36
+
8
37
  ## [2.0] - 2025-11-10
9
38
 
10
39
  ### Breaking Changes
11
40
 
12
- - **Removed:** Length marker (`#`) prefix in array headers has been completely removed from the specification
13
- - The `[#N]` format is no longer valid syntax. All array headers MUST use `[N]` format only
14
- - Encoders MUST NOT emit `[#N]` format
15
- - Decoders MUST NOT accept `[#N]` format (breaking change from v1.5)
41
+ - Removed `[#N]` length-marker syntax in array headers; `[N]` is now the only valid format.
42
+ - Encoders MUST NOT emit `[#N]`; decoders MUST reject it.
16
43
 
17
44
  ### Removed
18
45
 
19
- - All references to length marker from terminology (§1.4), header syntax (§6), ABNF grammar, conformance requirements (§13.2), and parsing helpers (Appendix B)
20
- - `lengthMarker` encoder option removed from all implementations
21
- - Length marker test fixtures removed
46
+ - The `lengthMarker` encoder option and any CLI flags exposing it.
22
47
 
23
48
  ### Migration from v1.5
24
49
 
25
- - Update decoder implementations to reject `[#N]` syntax
26
- - Convert any existing `.toon` files using `[#N]` format to `[N]` format
27
- - Remove `lengthMarker` option from encoder configurations
28
- - Remove `--length-marker` CLI flags if present
50
+ - Update decoders to reject `[#N]` syntax.
51
+ - Convert existing `.toon` files using `[#N]` to `[N]`.
52
+ - Remove `lengthMarker` configuration and CLI options.
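The `[#N]` to `[N]` file conversion listed in these migration steps can be sketched as a small rewrite pass. This is a minimal illustration, not part of the specification: the function name `strip_length_markers` is hypothetical, and the sketch assumes that legacy markers only need rewriting outside double-quoted strings (quoted data that happens to contain `[#3]` is left alone).

```python
import re

# Matches a header bracket that still carries the removed "#" length marker,
# e.g. "[#3]", "[#3|]" or "[#3<TAB>]". The optional second group is the
# delimiter symbol a header may declare inside the brackets.
LEGACY_HEADER = re.compile(r"\[#(\d+)([\t|]?)\]")


def strip_length_markers(text: str) -> str:
    """Rewrite legacy `[#N]` headers to `[N]`, skipping quoted strings."""
    out = []
    i = 0
    in_quotes = False
    while i < len(text):
        ch = text[i]
        if in_quotes:
            out.append(ch)
            # Copy an escaped character verbatim so \" does not end the string.
            if ch == "\\" and i + 1 < len(text):
                out.append(text[i + 1])
                i += 2
                continue
            if ch == '"':
                in_quotes = False
            i += 1
            continue
        if ch == '"':
            in_quotes = True
            out.append(ch)
            i += 1
            continue
        match = LEGACY_HEADER.match(text, i)
        if match:
            out.append(f"[{match.group(1)}{match.group(2)}]")
            i = match.end()
            continue
        out.append(ch)
        i += 1
    return "".join(out)
```

For example, `strip_length_markers("tags[#3]: a,b,c")` yields `tags[3]: a,b,c`, while a quoted value such as `"see [#3]"` passes through unchanged.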
29
53
 
30
54
  ## [1.5] - 2025-11-08
31
55
 
32
56
  ### Added
33
57
 
34
- - Optional key folding for encoders: `keyFolding="safe"` mode with `flattenDepth` control to collapse single-key object chains into dotted-path notation (§13.4)
35
- - Optional path expansion for decoders: `expandPaths="safe"` mode to split dotted keys into nested objects, with conflict resolution tied to `strict` option (§13.4, §14.5)
36
- - IdentifierSegment terminology and path separator definition (fixed to `"."` in v1.5) (§1.9)
37
- - Deep-merge semantics for path expansion: recursive merge for objects, error on conflict when `strict=true`, last-write-wins (LWW) when `strict=false` (§13.4)
58
+ - Optional key folding for encoders: `keyFolding="safe"` with `flattenDepth` to collapse single-key object chains into dotted paths (§13.4).
59
+ - Optional path expansion for decoders: `expandPaths="safe"` to split dotted keys into nested objects with deep-merge semantics and conflict handling tied to `strict` (§13.4, §14.5).
60
+ - IdentifierSegment terminology and fixed `"."` path separator for safe folding/expansion (§1.9).
38
61
 
39
62
  ### Changed
40
63
 
41
- - Both new features default to OFF and are fully backward-compatible
42
- - Safe-mode folding requires IdentifierSegment validation, collision avoidance, and no quoting
64
+ - Safe-mode folding requires IdentifierSegment-only segments, no path separator in segments, no quoting, and collision avoidance.
65
+ - Both features default to `off` and are backward-compatible.
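The path-expansion behaviour summarized in the v1.5 entry (dotted keys split into nested objects, recursive merge for objects, error on conflict when `strict` is true, last-write-wins otherwise) might look roughly like the sketch below. The function names are hypothetical, and the identifier pattern is an assumption standing in for the IdentifierSegment rule of §1.9; keys with any non-matching segment are kept as literal keys.

```python
import re

# Assumed stand-in for the spec's IdentifierSegment rule (see §1.9).
IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _merge(node: dict, key: str, value, strict: bool) -> None:
    """Place value at node[key], deep-merging objects and resolving conflicts."""
    if key not in node:
        node[key] = value
        return
    existing = node[key]
    if isinstance(existing, dict) and isinstance(value, dict):
        for k, v in value.items():
            _merge(existing, k, v, strict)
    elif strict:
        raise ValueError(f"path expansion conflict at key {key!r}")
    else:
        node[key] = value  # last write wins in non-strict mode


def expand_paths(flat: dict, strict: bool = True) -> dict:
    """Split dotted keys into nested objects (top-level keys only)."""
    result: dict = {}
    for key, value in flat.items():
        segments = key.split(".")
        if not all(IDENT.match(seg) for seg in segments):
            segments = [key]  # not safely expandable: keep the literal key
        node = result
        for seg in segments[:-1]:
            child = node.get(seg)
            if not isinstance(child, dict):
                if seg in node and strict:
                    raise ValueError(f"path expansion conflict at key {seg!r}")
                child = {}
                node[seg] = child  # create, or overwrite a non-object (LWW)
            node = child
        _merge(node, segments[-1], value, strict)
    return result
```

The sketch only expands the keys of the dictionary it is given; a full decoder would apply the same step at every nesting level and would skip keys that were quoted in the source.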
43
66
 
44
67
  ## [1.4] - 2025-11-05
45
68
 
46
69
  ### Changed
47
70
 
48
- - Removed JavaScript-specific normalization details from specification; replaced with language-agnostic requirements (Section 3)
49
- - Defined canonical number format for encoders: no exponent notation, no trailing zeros, no leading zeros except "0" (Section 2)
50
- - Clarified decoder handling of exponent notation and out-of-range numbers (Section 2)
51
- - Expanded `\w` regex notation to explicit character class `[A-Za-z0-9_]` for cross-language clarity (Section 7.3)
52
- - Clarified non-strict mode tab handling as implementation-defined (Section 12)
71
+ - Generalized normalization rules and defined canonical number format for encoders (no exponent notation, no trailing zeros, no leading zeros except `"0"`), plus decoder handling of exponent forms and out-of-range numbers (§2-§3).
72
+ - Replaced `\w` with explicit `[A-Za-z0-9_]` in key regexes for cross-language clarity (§7.3).
73
+ - Clarified non-strict mode tab handling as implementation-defined (§12).
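As an illustration of the canonical number format named in the v1.4 entry (no exponent notation, no trailing zeros), together with the §3 normalization of `-0` to `0` and non-finite values to `null`, a Python encoder could format numbers along these lines. This is a sketch under those stated rules; `canonical_number` is a hypothetical helper name, and the approach (going through `decimal.Decimal` to expand exponents) is one option among several.

```python
import math
from decimal import Decimal


def canonical_number(x) -> str:
    """Format an int or float as a plain decimal without exponent notation."""
    if isinstance(x, float) and not math.isfinite(x):
        return "null"  # NaN and +/-Infinity normalize to null (see §3)
    if x == 0:
        return "0"  # also covers -0.0
    # repr() gives the shortest round-tripping float text; Decimal then lets
    # us expand any exponent into plain positional digits.
    d = Decimal(repr(x)) if isinstance(x, float) else Decimal(x)
    text = format(d, "f")
    if "." in text:
        text = text.rstrip("0").rstrip(".")  # drop trailing fractional zeros
    return text
```

With this sketch, `canonical_number(1e-7)` gives `0.0000001` and `canonical_number(3.0)` gives `3`.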
53
74
 
54
75
  ### Added
55
76
 
56
- - Appendix G: Host Type Normalization Examples with guidance for Go, JavaScript, Python, and Rust implementations
77
+ - Appendix G with host-type normalization examples for Go, JavaScript, Python, and Rust.
57
78
 
58
79
  ## [1.3] - 2025-10-31
59
80
 
60
81
  ### Added
61
82
 
62
- - Numeric precision requirements: JavaScript implementations SHOULD use `Number.toString()` precision (15-17 digits), all implementations MUST preserve round-trip fidelity (Section 2)
63
- - RFC 5234 core rules (ALPHA, DIGIT, DQUOTE, HTAB, LF, SP) to ABNF grammar definitions (Section 6)
83
+ - Numeric precision requirements: JavaScript implementations SHOULD use `Number.toString()` precision (15-17 digits); all implementations MUST preserve round-trip fidelity (§2).
84
+ - RFC 5234 core rules (ALPHA, DIGIT, DQUOTE, HTAB, LF, SP) to ABNF grammar definitions (§6).
64
85
 
65
86
  ## [1.2] - 2025-10-29
66
87
 
67
88
  ### Changed
68
89
 
69
- - Clarified delimiter scoping behavior between array headers
70
- - Tightened strict-mode indentation requirements: leading spaces MUST be exact multiples of indentSize; tabs in indentation MUST error
71
- - Defined blank-line and trailing-newline decoding behavior with explicit skipping rules outside arrays
72
- - Clarified hyphen-based quoting: "-" or any string starting with "-" MUST be quoted
73
- - Clarified BigInt normalization: values outside safe integer range are converted to quoted decimal strings
74
- - Clarified row/key disambiguation: uses first unquoted delimiter vs colon position
90
+ - Tightened delimiter scoping, indentation, blank-line handling, and hyphen-based quoting rules (§11-§12).
91
+ - Clarified BigInt normalization (out-of-range values → quoted decimal strings) and row/key disambiguation (first unquoted delimiter vs colon) (§2, §9.3).
75
92
 
76
93
  ## [1.1] - 2025-10-29
77
94
 
78
95
  ### Added
79
96
 
80
- - Strict-mode rules
81
- - Delimiter-aware parsing
82
- - Decoder options (indent, strict)
97
+ - Strict-mode rules.
98
+ - Delimiter-aware parsing.
99
+ - Decoder options (`indent`, `strict`).
83
100
 
84
101
  ## [1.0] - 2025-10-28
85
102
 
86
103
  ### Added
87
104
 
88
- - Initial specification release
89
- - Encoding normalization rules
90
- - Decoding interpretation guidelines
91
- - Conformance requirements
105
+ - Initial specification release.
106
+ - Encoding normalization rules.
107
+ - Decoding interpretation guidelines.
108
+ - Conformance requirements.
package/README.md CHANGED
@@ -1,7 +1,7 @@
1
1
  # TOON Format Specification
2
2
 
3
- [![SPEC v2.0](https://img.shields.io/badge/spec-v2.0-lightgrey)](./SPEC.md)
4
- [![Tests](https://img.shields.io/badge/tests-342-green)](./tests/fixtures/)
3
+ [![SPEC v3.0](https://img.shields.io/badge/spec-v3.0-lightgrey)](./SPEC.md)
4
+ [![Tests](https://img.shields.io/badge/tests-345-green)](./tests/fixtures/)
5
5
  [![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](./LICENSE)
6
6
 
7
7
  This repository contains the official specification for **Token-Oriented Object Notation (TOON)**, a compact, human-readable encoding of the JSON data model for LLM prompts. It provides a lossless serialization of the same objects, arrays, and primitives as JSON, but in a syntax that minimizes tokens and makes structure easy for models to follow.
@@ -10,7 +10,7 @@ This repository contains the official specification for **Token-Oriented Object
10
10
 
11
11
  [→ Read the full specification (SPEC.md)](./SPEC.md)
12
12
 
13
- - **Version:** 2.0 (2025-11-10)
13
+ - **Version:** 3.0 (2025-11-24)
14
14
  - **Status:** Working Draft
15
15
  - **License:** MIT
16
16
 
@@ -122,6 +122,22 @@ The [tests/fixtures/](./tests/fixtures/) directory contains **language-agnostic
122
122
 
123
123
  See [tests/README.md](./tests/README.md) for detailed fixture format and usage instructions.
124
124
 
125
+ ## Media Type & File Extension
126
+
127
+ TOON defines a provisional media type (see §18.2 of the specification):
128
+
129
+ - **Media type:** `text/toon` (provisional, pending IANA registration)
130
+ - **File extension:** `.toon`
131
+ - **Charset:** Always UTF-8
132
+
133
+ For HTTP usage:
134
+
135
+ ```http
136
+ Content-Type: text/toon
137
+ ```
138
+
139
+ See the full [IANA Considerations section](SPEC.md#18-iana-considerations) for details.
140
+
125
141
  ## Contributing
126
142
 
127
143
  We welcome contributions to improve the specification! Please see [CONTRIBUTING.md](./CONTRIBUTING.md) for:
package/SPEC.md CHANGED
@@ -2,9 +2,9 @@
2
2
 
3
3
  ## Token-Oriented Object Notation
4
4
 
5
- **Version:** 2.0
5
+ **Version:** 3.0
6
6
 
7
- **Date:** 2025-11-10
7
+ **Date:** 2025-11-24
8
8
 
9
9
  **Status:** Working Draft
10
10
 
@@ -20,7 +20,7 @@ Token-Oriented Object Notation (TOON) is a line-oriented, indentation-based text
20
20
 
21
21
  ## Status of This Document
22
22
 
23
- This document is a Working Draft v2.0 and may be updated, replaced, or obsoleted. Implementers should monitor the canonical repository at https://github.com/toon-format/spec for changes.
23
+ This document is a Working Draft v3.0 and may be updated, replaced, or obsoleted. Implementers should monitor the canonical repository at https://github.com/toon-format/spec for changes.
24
24
 
25
25
  This specification is stable for implementation but not yet finalized. Breaking changes may occur in future major versions.
26
26
 
@@ -227,12 +227,11 @@ Implementations that fail to conform to any MUST or REQUIRED level requirement a
227
227
 
228
228
  ## 3. Encoding Normalization (Reference Encoder)
229
229
 
230
- Encoders MUST normalize non-JSON values to the JSON data model before encoding:
230
+ Encoders MUST normalize non-JSON values to the JSON data model before encoding. The mapping from host-specific types to JSON model is implementation-defined and MUST be documented.
231
231
 
232
232
  - Number:
233
233
  - Finite → number (canonical decimal form per Section 2). -0 → 0.
234
234
  - NaN, +Infinity, -Infinity → null.
235
- - Non-JSON types MUST be normalized to the JSON data model (object, array, string, number, boolean, or null) before encoding. The mapping from host-specific types to JSON model is implementation-defined and MUST be documented.
236
235
  - Examples of host-type normalization (non-normative):
237
236
  - Date/time objects → ISO 8601 string representation.
238
237
  - Set-like collections → array.
@@ -384,9 +383,9 @@ A string value MUST be quoted if any of the following is true:
384
383
  - It contains a colon (:), double quote ("), or backslash (\).
385
384
  - It contains brackets or braces ([, ], {, }).
386
385
  - It contains control characters: newline, carriage return, or tab.
387
- - It contains the relevant delimiter:
388
- - Inside array scope: the active delimiter (Section 1).
389
- - Outside array scope: the document delimiter (Section 1).
386
+ - It contains the relevant delimiter (see §11 for complete delimiter rules):
387
+ - For inline array values and tabular row cells: the active delimiter from the nearest array header.
388
+ - For object field values (key: value): the document delimiter, even when the object is within an array's scope.
390
389
  - It equals "-" or starts with "-" (any hyphen at position 0).
391
390
 
392
391
  Otherwise, the string MAY be emitted without quotes. Unicode, emoji, and strings with internal (non-leading/trailing) spaces are safe unquoted provided they do not violate the conditions.
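The quoting conditions visible in this hunk can be expressed as a single predicate. The sketch below is deliberately partial: it covers only the conditions shown here (structural characters, control characters, the relevant delimiter, and a leading hyphen) and omits any §7.2 conditions that fall outside the hunk. The name `needs_quoting` and its `delimiter` parameter are hypothetical; the caller is assumed to pass the active delimiter for inline array values and tabular cells, and the document delimiter for object field values.

```python
def needs_quoting(value: str, delimiter: str = ",") -> bool:
    """Return True if a string hits one of the quoting conditions shown above."""
    # "-" itself, or any string starting with a hyphen.
    if value.startswith("-"):
        return True
    # Colon, double quote, backslash, brackets, or braces.
    if any(ch in value for ch in ':"\\[]{}'):
        return True
    # Control characters: newline, carriage return, or tab.
    if any(ch in value for ch in "\n\r\t"):
        return True
    # The relevant delimiter for the position the value is emitted in.
    return delimiter in value
```

Note how the same string can differ by context: `a|b` needs no quotes under the default comma delimiter but does when the relevant delimiter is the pipe.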
@@ -403,12 +402,10 @@ Encoders MAY perform key folding when enabled (see §13.4 for complete folding r
403
402
 
404
403
  ### 7.4 Decoding Rules for Strings and Keys (Decoding)
405
404
 
406
- - Quoted strings and keys MUST be unescaped per Section 7.1; any other escape MUST error. Quoted primitives remain strings.
407
- - Unquoted values:
408
- - true/false/null → boolean/null
409
- - Numeric tokens → numbers (with the leading-zero rule in Section 4)
410
- - Otherwise → strings
411
- - Keys (quoted or unquoted) MUST be followed by ":"; missing colon MUST error.
405
+ Decoding of value tokens follows §4 (unquoted type inference, quoted strings, numeric rules). This section adds key-specific requirements:
406
+
407
+ - Quoted keys MUST be unescaped per Section 7.1; any other escape MUST error.
408
+ - Keys (quoted or unquoted) MUST be followed by ":"; missing colon MUST error (see also §14.2).
412
409
 
413
410
  ## 8. Objects
414
411
 
@@ -421,7 +418,6 @@ Encoders MAY perform key folding when enabled (see §13.4 for complete folding r
421
418
  - Decoding:
422
419
  - A line "key:" with nothing after the colon at depth d opens an object; subsequent lines at depth > d belong to that object until the depth decreases to ≤ d.
423
420
  - Lines "key: value" at the same depth are sibling fields.
424
- - Missing colon after a key MUST error.
425
421
 
426
422
  ## 9. Arrays
427
423
 
@@ -474,6 +470,7 @@ Decoding:
474
470
  - Delimiter before colon → row.
475
471
  - Colon before delimiter → key-value line (end of rows).
476
472
  - If a line has an unquoted colon but no unquoted active delimiter → key-value line (end of rows).
473
+ - When a tabular array appears as the first field of a list-item object, indentation is governed by Section 10.
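The row versus key-value disambiguation rules listed in this hunk compare the position of the first unquoted delimiter with the first unquoted colon. A minimal sketch follows; `first_unquoted` and `classify_line` are hypothetical names, and treating a line with no unquoted colon as a row is an assumption, since that case is not covered by the lines shown here.

```python
def first_unquoted(line: str, target: str) -> int:
    """Index of the first occurrence of target outside double quotes, or -1."""
    in_quotes = False
    i = 0
    while i < len(line):
        ch = line[i]
        if in_quotes:
            if ch == "\\":
                i += 2  # skip the escaped character
                continue
            if ch == '"':
                in_quotes = False
        elif ch == '"':
            in_quotes = True
        elif ch == target:
            return i
        i += 1
    return -1


def classify_line(line: str, delimiter: str = ",") -> str:
    """Classify a line inside a tabular array as 'row' or 'key-value'."""
    d = first_unquoted(line, delimiter)
    c = first_unquoted(line, ":")
    if c == -1:
        return "row"  # assumption: no unquoted colon means a data row
    if d == -1:
        return "key-value"  # unquoted colon but no unquoted active delimiter
    return "row" if d < c else "key-value"
```

Under these rules `1,10:30` is a row (delimiter first) while `note: a,b` ends the rows (colon first).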
477
474
 
478
475
  ### 9.4 Mixed / Non-Uniform Arrays — Expanded List
479
476
 
@@ -499,20 +496,18 @@ Decoding:
499
496
  For an object appearing as a list item:
500
497
 
501
498
  - Empty object list item: a single "-" at the list-item indentation level.
502
- - First field on the hyphen line:
503
- - Primitive: - key: value
504
- - Primitive array: - key[M<delim?>]: v1<delim>…
505
- - Tabular array: - key[N<delim?>]{fields}:
506
- - Followed by tabular rows at depth +1 (relative to the hyphen line).
507
- - Non-uniform array: - key[N<delim?>]:
508
- - Followed by list items at depth +1.
509
- - Object: - key:
510
- - Nested object fields appear at depth +2 (i.e., one deeper than subsequent sibling fields of the same list item).
511
- - Remaining fields of the same object appear at depth +1 under the hyphen line in encounter order, using normal object field rules.
512
-
513
- Decoding:
514
- - The first field is parsed from the hyphen line. If it is a nested object (- key:), nested fields are at +2 relative to the hyphen line; subsequent fields of the same list item are at +1.
515
- - If the first field is a tabular header on the hyphen line, its rows are at +1; subsequent sibling fields continue at +1 after the rows.
499
+ - Encoding (normative):
500
+ - When a list-item object has a tabular array (Section 9.3) as its first field in encounter order, encoders MUST emit the tabular header on the hyphen line:
501
+ - The hyphen and tabular header appear on the same line at the list-item depth: - key[N<delim?>]{fields}:
502
+ - Tabular rows MUST appear at depth +2 (relative to the hyphen line).
503
+ - All other fields of the same object MUST appear at depth +1 under the hyphen line, in encounter order, using normal object field rules (Section 8).
504
+ - Encoders MUST NOT emit tabular rows at depth +1 or sibling fields at the same depth as rows when the first field is a tabular array.
505
+ - For all other cases (first field is not a tabular array), encoders SHOULD place the first field on the hyphen line. A bare hyphen on its own line is used only for empty list-item objects.
506
+ - Decoding (normative):
507
+ - When a decoder encounters a list-item line of the form - key[N<delim?>]{fields}: at depth d, it MUST treat this as the start of a tabular array field named key in the list-item object.
508
+ - Lines at depth d+2 that conform to tabular row syntax (Section 9.3) are rows of that tabular array.
509
+ - Lines at depth d+1 are additional fields of the same list-item object; the presence of a line at depth d+1 after rows terminates the rows.
510
+ - All other object-as-list-item patterns (bare hyphen, first field on hyphen line for non-tabular values) are decoded according to the general rules in Section 8 and Section 9.
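The v3.0 encoding rule for a list-item object whose first field is a tabular array can be sketched as follows. This is an illustration under several assumptions that are not taken from the specification text above: a two-space indent unit, the default comma delimiter, values that need no quoting, and remaining fields that are all primitives. `encode_tabular_list_item` is a hypothetical helper name.

```python
def encode_tabular_list_item(item: dict, depth: int, indent: int = 2) -> list:
    """Encode a list-item object whose first field is a uniform array of objects.

    Header goes on the hyphen line, rows at depth +2, other fields at depth +1.
    """
    def pad(d: int) -> str:
        return " " * (indent * d)

    (first_key, rows), *rest = item.items()
    fields = list(rows[0].keys())  # field order taken from the first row
    header = ",".join(fields)
    lines = [f"{pad(depth)}- {first_key}[{len(rows)}]{{{header}}}:"]
    for row in rows:
        # Tabular rows sit two levels below the hyphen line.
        lines.append(pad(depth + 2) + ",".join(str(row[f]) for f in fields))
    for key, value in rest:
        # Sibling fields sit one level below the hyphen line.
        lines.append(f"{pad(depth + 1)}{key}: {value}")
    return lines
```

Called with the object from the Appendix A example at `depth=1` and prefixed with an `items[1]:` line, the sketch places the two rows at six spaces and `status: active` at four.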
516
511
 
517
512
  ## 11. Delimiters
518
513
 
@@ -520,19 +515,25 @@ Decoding:
520
515
  - Comma (default): header omits the delimiter symbol.
521
516
  - Tab: header includes HTAB inside brackets and braces (e.g., [N<TAB>], {a<TAB>b}); rows/inline arrays use tabs.
522
517
  - Pipe: header includes "|" inside brackets and braces; rows/inline arrays use "|".
523
- - Document vs Active delimiter:
524
- - Encoders select a document delimiter (option) that influences quoting for all object values (key: value) throughout the document.
525
- - Inside an array header's scope, the active delimiter governs splitting and quoting only for inline arrays and tabular rows that the header introduces. Object values (key: value) follow document-delimiter quoting rules regardless of array scope.
526
- - Delimiter-aware quoting (encoding):
527
- - Inline array values and tabular row cells: strings containing the active delimiter MUST be quoted to avoid splitting.
528
- - Object values (key: value): encoders use the document delimiter to decide delimiter-aware quoting, regardless of whether the object appears within an array's scope.
529
- - Strings containing non-active delimiters do not require quoting unless another quoting condition applies (Section 7.2).
530
- - Delimiter-aware parsing (decoding):
531
- - Inline arrays and tabular rows MUST be split only on the active delimiter declared by the nearest array header.
518
+
519
+ ### 11.1 Encoding Rules (Normative for Encoders)
520
+
521
+ - Document delimiter: Encoders select a document delimiter (option: comma, tab, pipe; default comma) that influences quoting for all object field values (key: value) throughout the document.
522
+ - Active delimiter: Inside an array header's scope, the active delimiter governs quoting only for inline array values and tabular row cells.
523
+ - Delimiter-aware quoting:
524
+ - Inline array values and tabular row cells: strings containing the active delimiter MUST be quoted.
525
+ - Object field values (key: value): encoders use the document delimiter to decide delimiter-aware quoting, regardless of whether the object appears within an array's scope.
526
+ - Strings containing non-active delimiters do not require quoting unless another condition applies (§7.2).
527
+
528
+ ### 11.2 Decoding Rules (Normative for Decoders)
529
+
530
+ - Active delimiter: Decoders use only the active delimiter declared by the nearest array header to split inline arrays and tabular rows.
531
+ - Delimiter-aware parsing:
532
+ - Inline arrays and tabular rows MUST be split only on the active delimiter.
532
533
  - Splitting MUST preserve empty tokens; surrounding spaces are trimmed, and empty tokens decode to the empty string.
533
- - Strings containing the active delimiter MUST be quoted to avoid splitting; non-active delimiters MUST NOT cause splits.
534
534
  - Nested headers may change the active delimiter; decoding MUST use the delimiter declared by the nearest header.
535
- - If the bracket declares tab or pipe, the same symbol MUST be used in the fields segment and for splitting all rows/values in that scope.
535
+ - If the bracket declares tab or pipe, the same symbol MUST be used in the fields segment and for splitting all rows/values in that scope (§6).
536
+ - Object field values (key: value): Decoders parse the entire post-colon token as a single value; document delimiter is not a decoder concept.
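The splitting rules in §11.2 (split only on the active delimiter, preserve empty tokens, trim surrounding spaces, never split inside quotes) can be sketched as below. `split_row` is a hypothetical name; the sketch returns raw tokens, so quoted tokens keep their quotes and unescaping is left to a later step.

```python
def split_row(line: str, delimiter: str = ",") -> list:
    """Split an inline array or tabular row on the active delimiter only."""
    tokens = []
    current = []
    in_quotes = False
    i = 0
    while i < len(line):
        ch = line[i]
        if in_quotes:
            current.append(ch)
            # Keep escaped characters together so \" does not close the string.
            if ch == "\\" and i + 1 < len(line):
                current.append(line[i + 1])
                i += 2
                continue
            if ch == '"':
                in_quotes = False
        elif ch == '"':
            in_quotes = True
            current.append(ch)
        elif ch == delimiter:
            tokens.append("".join(current).strip(" "))
            current = []
        else:
            current.append(ch)
        i += 1
    tokens.append("".join(current).strip(" "))  # last token, possibly empty
    return tokens
```

Because only the active delimiter splits, a pipe-delimited row such as `a|b,c` yields two tokens, the second being `b,c`.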
536
537
 
537
538
  ## 12. Indentation and Whitespace
538
539
 
@@ -730,12 +731,14 @@ When strict mode is enabled (default), decoders MUST error on the following cond
730
731
 
731
732
  ### 14.3 Indentation Errors
732
733
 
734
+ See §12 for indentation semantics. In strict mode, decoders MUST error on:
733
735
  - Leading spaces not a multiple of indentSize.
734
736
  - Any tab used in indentation (tabs allowed in quoted strings and as HTAB delimiter).
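The two strict-mode indentation checks listed above could be implemented roughly as follows. `check_indentation` is a hypothetical helper, and the default `indent_size` of 2 is an assumption for the example rather than something stated in this hunk.

```python
def check_indentation(line: str, indent_size: int = 2) -> int:
    """Return the depth of a line, raising ValueError on strict-mode violations."""
    stripped = line.lstrip(" \t")
    leading = line[: len(line) - len(stripped)]
    if "\t" in leading:
        # Tabs elsewhere (quoted strings, HTAB delimiters) are not affected.
        raise ValueError("tab used in indentation")
    if len(leading) % indent_size != 0:
        raise ValueError(
            f"{len(leading)} leading spaces is not a multiple of {indent_size}"
        )
    return len(leading) // indent_size
```

Only the leading whitespace is inspected, so a tab that acts as a delimiter later in the line does not trigger the error.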
735
737
 
736
738
  ### 14.4 Structural Errors
737
739
 
738
- - Blank lines inside arrays/tabular rows.
740
+ See §12 for blank line semantics. In strict mode, decoders MUST error on:
741
+ - Blank lines inside arrays/tabular rows (between the first and last item/row).
739
742
 
740
743
  For root-form rules, including handling of empty documents, see §5.
741
744
 
@@ -894,7 +897,11 @@ This specification does not request IANA registration at this time, as the forma
894
897
 
895
898
  ### 18.2 Provisional Media Type
896
899
 
897
- The following provisional media type designation is RECOMMENDED for experimental implementations:
900
+ Until IANA registration is completed, implementations SHOULD use:
901
+ - Media type: `text/toon`
902
+ - File extension: `.toon`
903
+
904
+ Full designation details:
898
905
 
899
906
  Type name: text
900
907
 
@@ -989,11 +996,13 @@ Nested tabular inside a list item:
989
996
  ```
990
997
  items[1]:
991
998
    - users[2]{id,name}:
992
-     1,Ada
993
-     2,Bob
999
+       1,Ada
1000
+       2,Bob
994
1001
      status: active
995
1002
  ```
996
1003
 
1004
+ Note: When a list-item object has a tabular array as its first field, encoders emit the tabular header on the hyphen line with rows at depth +2 and other fields at depth +1. This is the canonical encoding for list-item objects whose first field is a tabular array.
1005
+
997
1006
  Delimiter variations:
998
1007
  ```
999
1008
  items[2 ]{sku name qty price}:
@@ -1218,52 +1227,43 @@ Note: Host-type normalization tests (e.g., BigInt, Date, Set, Map) are language-
1218
1227
 
1219
1228
  ## Appendix D: Document Changelog (Informative)
1220
1229
 
1230
+ This appendix summarizes major changes between spec versions. For the complete changelog, see [`CHANGELOG.md`](./CHANGELOG.md) in the specification repository.
1231
+
1232
+ ### v3.0 (2025-11-24)
1233
+
1234
+ - Standardized encoding for list-item objects whose first field is a tabular array (§10).
1235
+
1236
+ ### v2.1 (2025-11-23)
1237
+
1238
+ - Tightened canonical encoding for objects as list items (§10): bare `-` for multi-field objects, compact `- key[N]{fields}:` only for single-field tabular arrays, to improve visual consistency and LLM readability.
1239
+
1221
1240
  ### v2.0 (2025-11-10)
1222
1241
 
1223
- - Breaking change: Length marker (`#`) prefix in array headers has been completely removed from the specification.
1224
- - The `[#N]` format is no longer valid syntax. All array headers MUST use `[N]` format only.
1225
- - Encoders MUST NOT emit `[#N]` format.
1226
- - Decoders MUST NOT accept `[#N]` format (breaking change from v1.5).
1227
- - Removed all references to length marker from terminology, grammar, conformance requirements, and parsing helpers.
1242
+ - Removed `[#N]` length-marker syntax from array headers; `[N]` is now the only valid form.
1228
1243
 
1229
1244
  ### v1.5 (2025-11-08)
1230
1245
 
1231
- - Added optional key folding for encoders: `keyFolding='safe'` mode with `flattenDepth` control (§13.4).
1232
- - Added optional path expansion for decoders: `expandPaths='safe'` mode with conflict resolution tied to existing `strict` option (§13.4).
1233
- - Defined safe-mode requirements for folding: IdentifierSegment validation, no path separator in segments, collision avoidance, no quoting required (§7.3, §13.4).
1234
- - Specified deep-merge semantics for expansion: recursive merge for objects; conflict policy (error in strict mode, LWW when strict=false) for non-objects (§13.4).
1235
- - Added strict-mode error category for path expansion conflicts (§14.5).
1236
- - Both features default to OFF; fully backward-compatible.
1246
+ - Added optional key folding (`keyFolding="safe"`) and path expansion (`expandPaths="safe"`) with deep-merge semantics and strict-mode conflict handling (§13.4, §14.5).
1237
1247
 
1238
1248
  ### v1.4 (2025-11-05)
1239
1249
 
1240
- - Removed JavaScript-specific normalization details; replaced with language-agnostic requirements (Section 3).
1241
- - Defined canonical number format for encoders and decoder acceptance rules (Section 2).
1242
- - Added Appendix G with host-type normalization examples for Go, JavaScript, Python, and Rust.
1243
- - Clarified non-strict mode tab handling as implementation-defined (Section 12).
1244
- - Expanded regex notation for cross-language clarity (Section 7.3).
1250
+ - Generalized normalization and numeric canonicalization rules, and added host-type normalization guidance (Appendix G).
1245
1251
 
1246
1252
  ### v1.3 (2025-10-31)
1247
1253
 
1248
- - Added numeric precision requirements: JavaScript implementations SHOULD use Number.toString() precision (15-17 digits), all implementations MUST preserve round-trip fidelity (Section 2).
1249
- - Added RFC 5234 core rules (ALPHA, DIGIT, DQUOTE, HTAB, LF, SP) to ABNF grammar definitions (Section 6).
1254
+ - Added numeric precision guidance and ABNF core rules for headers and keys (§2, §6).
1250
1255
 
1251
1256
  ### v1.2 (2025-10-29)
1252
1257
 
1253
- - Clarified delimiter scoping behavior between array headers.
1254
- - Tightened strict-mode indentation requirements: leading spaces MUST be exact multiples of indentSize; tabs in indentation MUST error.
1255
- - Defined blank-line and trailing-newline decoding behavior with explicit skipping rules outside arrays.
1256
- - Clarified hyphen-based quoting: "-" or any string starting with "-" MUST be quoted.
1257
- - Clarified BigInt normalization: values outside safe integer range are converted to quoted decimal strings.
1258
- - Clarified row/key disambiguation: uses first unquoted delimiter vs colon position.
1258
+ - Tightened delimiter scoping, indentation, blank-line handling, hyphen-based quoting, BigInt normalization, and row/key disambiguation rules (§2, §9, §11-§12).
1259
1259
 
1260
1260
  ### v1.1 (2025-10-29)
1261
1261
 
1262
- Added strict-mode rules, delimiter-aware parsing, and decoder options (indent, strict).
1262
+ - Introduced strict-mode validation, delimiter-aware parsing, and decoder options (indent, strict).
1263
1263
 
1264
1264
  ### v1.0 (2025-10-28)
1265
1265
 
1266
- Initial encoding, normalization, and conformance rules.
1266
+ - Initial specification: encoding normalization, decoding interpretation, and conformance requirements.
1267
1267
 
1268
1268
  ## Appendix E: Acknowledgments and License
1269
1269
 
package/package.json CHANGED
@@ -1,7 +1,7 @@
1
1
  {
2
2
  "name": "@toon-format/spec",
3
3
  "type": "module",
4
- "version": "2.0.1",
4
+ "version": "3.0.0",
5
5
  "packageManager": "pnpm@10.19.0",
6
6
  "description": "Official specification for Token-Oriented Object Notation (TOON)",
7
7
  "author": "Johann Schopplich <hello@johannschopplich.com>",
@@ -1,5 +1,5 @@
1
1
  {
2
- "version": "1.4",
2
+ "version": "3.0",
3
3
  "category": "decode",
4
4
  "description": "Nested and mixed array decoding - list format, arrays of arrays, root arrays, mixed types",
5
5
  "tests": [
@@ -52,8 +52,8 @@
52
52
  "specSection": "9.4"
53
53
  },
54
54
  {
55
- "name": "parses nested tabular arrays as first field on hyphen line",
56
- "input": "items[1]:\n  - users[2]{id,name}:\n    1,Ada\n    2,Bob\n    status: active",
55
+ "name": "parses list items whose first field is a tabular array",
56
+ "input": "items[1]:\n  - users[2]{id,name}:\n      1,Ada\n      2,Bob\n    status: active",
57
57
  "expected": {
58
58
  "items": [
59
59
  {
@@ -65,7 +65,24 @@
65
65
  }
66
66
  ]
67
67
  },
68
- "specSection": "10"
68
+ "specSection": "10",
69
+ "note": "Canonical encoding: tabular header on hyphen line, rows at depth +2, sibling fields at depth +1"
70
+ },
71
+ {
72
+ "name": "parses single-field list-item object with tabular array",
73
+ "input": "items[1]:\n  - users[2]{id,name}:\n      1,Ada\n      2,Bob",
74
+ "expected": {
75
+ "items": [
76
+ {
77
+ "users": [
78
+ { "id": 1, "name": "Ada" },
79
+ { "id": 2, "name": "Bob" }
80
+ ]
81
+ }
82
+ ]
83
+ },
84
+ "specSection": "10",
85
+ "note": "Single-field list-item object: only the tabular array, no sibling fields"
69
86
  },
70
87
  {
71
88
  "name": "parses objects containing arrays (including empty arrays) in list format",
@@ -79,7 +96,7 @@
79
96
  },
80
97
  {
81
98
  "name": "parses arrays of arrays within objects",
82
- "input": "items[1]:\n - matrix[2]:\n - [2]: 1,2\n - [2]: 3,4\n name: grid",
99
+ "input": "items[1]:\n - matrix[2]:\n - [2]: 1,2\n - [2]: 3,4\n name: grid",
83
100
  "expected": {
84
101
  "items": [
85
102
  { "matrix": [[1, 2], [3, 4]], "name": "grid" }
@@ -1,5 +1,5 @@
1
1
  {
2
- "version": "1.4",
2
+ "version": "3.0",
3
3
  "category": "encode",
4
4
  "description": "Nested and mixed array encoding - arrays of arrays, mixed type arrays, root arrays",
5
5
  "tests": [
@@ -1,5 +1,5 @@
1
1
  {
2
- "version": "1.4",
2
+ "version": "3.0",
3
3
  "category": "encode",
4
4
  "description": "Arrays of objects encoding - list format for non-uniform objects and complex structures",
5
5
  "tests": [
@@ -47,7 +47,7 @@
47
47
  { "matrix": [[1, 2], [3, 4]], "name": "grid" }
48
48
  ]
49
49
  },
50
- "expected": "items[1]:\n - matrix[2]:\n - [2]: 1,2\n - [2]: 3,4\n name: grid",
50
+ "expected": "items[1]:\n - matrix[2]:\n - [2]: 1,2\n - [2]: 3,4\n name: grid",
51
51
  "specSection": "10"
52
52
  },
53
53
  {
@@ -57,8 +57,9 @@
57
57
  { "users": [{ "id": 1, "name": "Ada" }, { "id": 2, "name": "Bob" }], "status": "active" }
58
58
  ]
59
59
  },
60
- "expected": "items[1]:\n  - users[2]{id,name}:\n    1,Ada\n    2,Bob\n    status: active",
61
- "specSection": "10"
60
+ "expected": "items[1]:\n  - users[2]{id,name}:\n      1,Ada\n      2,Bob\n    status: active",
61
+ "specSection": "10",
62
+ "note": "YAML-style encoding for list-item objects with tabular array as first field"
62
63
  },
63
64
  {
64
65
  "name": "uses list format for nested object arrays with mismatched keys",
@@ -67,7 +68,7 @@
67
68
  { "users": [{ "id": 1, "name": "Ada" }, { "id": 2 }], "status": "active" }
68
69
  ]
69
70
  },
70
- "expected": "items[1]:\n - users[2]:\n - id: 1\n name: Ada\n - id: 2\n status: active",
71
+ "expected": "items[1]:\n - users[2]:\n - id: 1\n name: Ada\n - id: 2\n status: active",
71
72
  "specSection": "10"
72
73
  },
73
74
  {
@@ -97,12 +98,22 @@
97
98
  "specSection": "10"
98
99
  },
99
100
  {
100
- "name": "places first field of nested tabular arrays on hyphen line",
101
+ "name": "uses canonical encoding for multi-field list-item objects with tabular arrays",
101
102
  "input": {
102
103
  "items": [{ "users": [{ "id": 1 }, { "id": 2 }], "note": "x" }]
103
104
  },
104
- "expected": "items[1]:\n  - users[2]{id}:\n    1\n    2\n    note: x",
105
- "specSection": "10"
105
+ "expected": "items[1]:\n  - users[2]{id}:\n      1\n      2\n    note: x",
106
+ "specSection": "10",
107
+ "note": "Tabular header on hyphen line with rows at depth +2 and sibling fields at depth +1"
108
+ },
109
+ {
110
+ "name": "uses canonical encoding for single-field list-item tabular arrays",
111
+ "input": {
112
+ "items": [{ "users": [{ "id": 1, "name": "Ada" }, { "id": 2, "name": "Bob" }] }]
113
+ },
114
+ "expected": "items[1]:\n  - users[2]{id,name}:\n      1,Ada\n      2,Bob",
115
+ "specSection": "10",
116
+ "note": "Tabular header on hyphen line with rows at depth +2"
106
117
  },
107
118
  {
108
119
  "name": "places empty arrays on hyphen line when first",
@@ -112,6 +123,15 @@
112
123
  "expected": "items[1]:\n  - data[0]:\n    name: x",
113
124
  "specSection": "10"
114
125
  },
126
+ {
127
+ "name": "encodes empty object list items as bare hyphen",
128
+ "input": {
129
+ "items": ["first", "second", {}]
130
+ },
131
+ "expected": "items[3]:\n  - first\n  - second\n  -",
132
+ "specSection": "10",
133
+ "note": "Empty object list items encode as a single \"-\" line at the list-item depth"
134
+ },
115
135
  {
116
136
  "name": "uses field order from first object for tabular headers",
117
137
  "input": {