@toon-format/spec 1.3.3 → 1.5.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/CHANGELOG.md +28 -0
- package/README.md +15 -3
- package/SPEC.md +435 -102
- package/VERSIONING.md +1 -14
- package/package.json +1 -1
- package/tests/README.md +42 -28
- package/tests/fixtures/decode/arrays-nested.json +20 -20
- package/tests/fixtures/decode/arrays-primitive.json +14 -14
- package/tests/fixtures/decode/arrays-tabular.json +28 -5
- package/tests/fixtures/decode/blank-lines.json +14 -14
- package/tests/fixtures/decode/delimiters.json +46 -29
- package/tests/fixtures/decode/indentation-errors.json +16 -29
- package/tests/fixtures/decode/numbers.json +142 -0
- package/tests/fixtures/decode/objects.json +29 -29
- package/tests/fixtures/decode/path-expansion.json +173 -0
- package/tests/fixtures/decode/primitives.json +44 -75
- package/tests/fixtures/decode/root-form.json +17 -0
- package/tests/fixtures/decode/validation-errors.json +29 -9
- package/tests/fixtures/decode/whitespace.json +61 -0
- package/tests/fixtures/encode/arrays-nested.json +15 -15
- package/tests/fixtures/encode/arrays-objects.json +16 -16
- package/tests/fixtures/encode/arrays-primitive.json +14 -14
- package/tests/fixtures/encode/arrays-tabular.json +8 -8
- package/tests/fixtures/encode/delimiters.json +23 -23
- package/tests/fixtures/encode/key-folding.json +218 -0
- package/tests/fixtures/encode/objects.json +27 -27
- package/tests/fixtures/encode/options.json +1 -1
- package/tests/fixtures/encode/primitives.json +61 -36
- package/tests/fixtures/encode/whitespace.json +18 -3
- package/tests/fixtures.schema.json +20 -4
- package/tests/fixtures/encode/normalization.json +0 -107
package/SPEC.md
CHANGED
@@ -2,9 +2,9 @@
 
 ## Token-Oriented Object Notation
 
-**Version:** 1.
+**Version:** 1.5
 
-**Date:** 2025-10
+**Date:** 2025-11-10
 
 **Status:** Working Draft
 
@@ -16,11 +16,11 @@
 
 ## Abstract
 
-Token-Oriented Object Notation (TOON) is a
+Token-Oriented Object Notation (TOON) is a line-oriented, indentation-based text format that encodes the JSON data model with explicit structure and minimal quoting. Arrays declare their length and an optional field list once; rows use a single active delimiter (comma, tab, or pipe). Objects use indentation instead of braces; strings are quoted only when required. This specification defines TOON's concrete syntax, canonical number formatting, delimiter scoping, and strict-mode validation, and sets conformance requirements for encoders, decoders, and validators. TOON provides a compact, deterministic representation of structured data and is particularly efficient for arrays of uniform objects.
 
 ## Status of This Document
 
-This document is a Working Draft v1.
+This document is a Working Draft v1.4 and may be updated, replaced, or obsoleted. Implementers should monitor the canonical repository at https://github.com/toon-format/spec for changes.
 
 This specification is stable for implementation but not yet finalized. Breaking changes are unlikely but possible before v2.0.
 
@@ -55,16 +55,6 @@ https://www.unicode.org/versions/Unicode15.1.0/
 **[ISO8601]** ISO 8601:2019, "Date and time — Representations for information interchange".
 https://www.iso.org/standard/70907.html
 
-## Conventions and Terminology
-
-The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119] and [RFC8174] when, and only when, they appear in all capitals, as shown here.
-
-Audience: implementers of encoders/decoders/validators; tool authors; practitioners embedding TOON in LLM prompts.
-
-All normative text in this specification is contained in Sections 1-16 and Section 19. All appendices are informative except where explicitly marked normative. Examples throughout this document are informative unless explicitly stated otherwise.
-
-Implementations that fail to conform to any MUST or REQUIRED level requirement are non-conformant. Implementations that conform to all MUST and REQUIRED level requirements but fail to conform to SHOULD or RECOMMENDED level requirements are said to be "not fully conformant" but are still considered conformant.
-
 ## Table of Contents
 
 - [Introduction](#introduction)
@@ -97,63 +87,114 @@ Implementations that fail to conform to any MUST or REQUIRED level requirement a
 - [Appendix D: Document Changelog (Informative)](#appendix-d-document-changelog-informative)
 - [Appendix E: Acknowledgments and License](#appendix-e-acknowledgments-and-license)
 - [Appendix F: Cross-check With Reference Behavior (Informative)](#appendix-f-cross-check-with-reference-behavior-informative)
+- [Appendix G: Host Type Normalization Examples (Informative)](#appendix-g-host-type-normalization-examples-informative)
+
+## Introduction (Informative)
+
+### Purpose and scope
+
+TOON (Token-Oriented Object Notation) is a line-oriented, indentation-based text format that encodes the JSON data model with explicit structure and minimal quoting. It is designed as a compact, deterministic representation of structured data, particularly well-suited to arrays of uniform objects. TOON is often used as a translation layer: produce data as JSON in code, encode to TOON for downstream consumption (e.g., LLM prompts), and decode back to JSON if needed.
+
+### Applicability and non-goals
+
+Use TOON when:
+- arrays of objects share the same fields (uniform tabular data),
+- deterministic, minimally quoted text is desirable,
+- explicit lengths and fixed row widths help detect truncation or malformed data,
+- you want unambiguous, human-readable structure without repeating keys.
+
+TOON is not intended to replace:
+- JSON for non-uniform or deeply nested structures where repeated keys are not dominant,
+- CSV for flat, strictly tabular data where maximum compactness is required and nesting is not needed,
+- general-purpose storage or public APIs. TOON carries the JSON data model; it is a transport/authoring format with explicit structure, not an extended type system or schema language.
+
+Out of scope:
+- comments and annotations,
+- alternative number systems or locale-specific formatting,
+- user-defined escape sequences or control directives.
 
-
+### Relationship to JSON, CSV, and YAML (Informative)
 
-
+- **JSON**: TOON preserves the JSON data model. It is more compact for uniform arrays of objects by declaring length and fields once. For non-uniform or deeply nested data, JSON may be more efficient.
+- **CSV/TSV**: CSV is typically more compact for flat tables but lacks nesting and type awareness. TOON adds explicit lengths, per-array delimiter scoping, field lists, and deterministic quoting, while remaining lightweight.
+- **YAML**: TOON uses indentation and hyphen markers but is more constrained and deterministic: no comments, explicit array headers with lengths, fixed quoting rules, and a narrow escape set.
 
-###
+### Example (Informative)
 
-
+```
+users[2]{id,name,role}:
+  1,Alice,admin
+  2,Bob,user
+```
+
+### Document roadmap
+
+Normative rules are organized as follows:
+- Data model and canonical number form (§2); normalization on encode (§3); decoding interpretation (§4).
+- Concrete syntax, including root-form determination (§5) and header syntax (§6).
+- Strings and keys (§7); objects (§8); arrays and their sub-forms (§9); objects as list items (§10); delimiter rules (§11).
+- Indentation and whitespace (§12); conformance and options (§13).
+- Strict-mode errors (authoritative checklist) (§14).
 
-
-- Type normalization rules for encoders (Section 3)
-- Concrete syntax and formatting rules (Sections 5-12)
-- Parsing and decoding semantics (Section 4)
-- Conformance requirements for encoders, decoders, and validators (Section 13)
-- Security and internationalization considerations (Sections 15-16)
+Appendices are informative unless stated otherwise and provide examples, parsing helpers, and implementation guidance.
 
 ## 1. Terminology and Conventions
 
-###
+### 1.1 Use of RFC2119 Keywords and Normativity
+
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119] and [RFC8174] when, and only when, they appear in all capitals, as shown here.
+
+Audience: implementers of encoders/decoders/validators; tool authors; practitioners embedding TOON in LLM prompts.
+
+All normative text in this specification is contained in Sections 1-16 and Section 19. All appendices are informative except where explicitly marked normative. Examples throughout this document are informative unless explicitly stated otherwise.
+
+Implementations that fail to conform to any MUST or REQUIRED level requirement are non-conformant. Implementations that conform to all MUST and REQUIRED level requirements but fail to conform to SHOULD or RECOMMENDED level requirements are said to be "not fully conformant" but are still considered conformant.
+
+### 1.2 Core Concepts
 
 - TOON document: A sequence of UTF-8 text lines formatted according to this spec.
 - Line: A sequence of non-newline characters terminated by LF (U+000A) in serialized form. Encoders MUST use LF.
 
-### Structural Terms
+### 1.3 Structural Terms
 
 - Indentation level (depth): Leading indentation measured in fixed-size space units (indentSize). Depth 0 has no indentation.
 - Indentation unit (indentSize): A fixed number of spaces per level (default 2). Tabs MUST NOT be used for indentation.
 
-### Array Terms
+### 1.4 Array Terms
 
 - Header: The bracketed declaration for arrays, optionally followed by a field list, and terminating with a colon; e.g., key[3]: or items[2]{a,b}:.
 - Field list: Brace-enclosed, delimiter-separated list of field names for tabular arrays: {f1<delim>f2}.
 - List item: A line beginning with "- " at a given depth representing an element in an expanded array.
 - Length marker: Optional "#" prefix for array lengths in headers, e.g., [#3]. Decoders MUST accept and ignore it semantically.
 
-### Delimiter Terms
+### 1.5 Delimiter Terms
 
 - Delimiter: The character used to separate array/tabular values: comma (default), tab (HTAB, U+0009), or pipe ("|").
 - Document delimiter: The encoder-selected delimiter used for quoting decisions outside any array scope (default comma).
 - Active delimiter: The delimiter declared by the closest array header in scope, used to split inline primitive arrays and tabular rows under that header; it also governs quoting decisions for values within that array's scope.
 
-### Type Terms
+### 1.6 Type Terms
 
 - Primitive: string, number, boolean, or null.
 - Object: Mapping from string keys to `JsonValue`.
 - Array: Ordered sequence of `JsonValue`.
 - `JsonValue`: Primitive | Object | Array.
 
-### Conformance Terms
+### 1.7 Conformance Terms
 
 - Strict mode: Decoder mode that enforces counts, indentation, and delimiter consistency; also rejects invalid escapes and missing colons (default: true).
 
-### Notation
+### 1.8 Notation
 
 - Regular expressions appear in slash-delimited form.
 - ABNF snippets follow RFC 5234; HTAB means the U+0009 character.
 
+### 1.9 Key Folding and Path Expansion Terms
+
+- IdentifierSegment: A key segment eligible for safe folding and expansion, matching the pattern `^[A-Za-z_][A-Za-z0-9_]*$` (contains only letters, digits, and underscores; does not start with a digit; does not contain dots).
+- Path separator: The character used to join/split key segments during folding and expansion. Fixed to `"."` (U+002E, FULL STOP) in v1.5.
+- Note: Unquoted keys in TOON remain permissive per §7.3 (`^[A-Za-z_][A-Za-z0-9_.]*$`, allowing dots). IdentifierSegment is a stricter pattern used only for safe folding and expansion eligibility checks.
+
 ## 2. Data Model
 
 - TOON models data as:
@@ -163,34 +204,43 @@ This specification defines:
 - Ordering:
   - Array order MUST be preserved.
   - Object key order MUST be preserved as encountered by the encoder.
-- Numbers (encoding):
--
-
-
--
-
-
--
--
+- Numbers (canonical form for encoding):
+  - Encoders MUST emit numbers in canonical decimal form:
+    - No exponent notation (e.g., 1e6 MUST be rendered as 1000000; 1e-6 as 0.000001).
+    - No leading zeros except for the single digit "0" (e.g., "05" is not canonical).
+    - No trailing zeros in the fractional part (e.g., 1.5000 MUST be rendered as 1.5).
+    - If the fractional part is zero after normalization, emit as an integer (e.g., 1.0 → 1).
+    - -0 MUST be normalized to 0.
+  - Encoders MUST emit sufficient precision to ensure round-trip fidelity within the encoder's host environment: decode(encode(x)) MUST equal x.
+  - If the encoder's host environment cannot represent a numeric value without loss (e.g., arbitrary-precision decimals or integers exceeding the host's numeric range), the encoder MAY:
+    - Emit a quoted string containing the exact decimal representation to preserve value fidelity, OR
+    - Emit a canonical number that round-trips to the host's numeric approximation (losing precision), provided it conforms to the canonical formatting rules above.
+  - Encoders SHOULD provide an option to choose lossless stringification for out-of-range numbers.
+- Numbers (decoding):
+  - Decoders MUST accept decimal and exponent forms on input (e.g., 42, -3.14, 1e-6, -1E+9).
+  - Decoders MUST treat tokens with forbidden leading zeros (e.g., "05", "0001") as strings, not numbers.
+  - If a decoded numeric token is not representable in the host's default numeric type without loss, implementations MAY:
+    - Return a higher-precision numeric type (e.g., arbitrary-precision integer or decimal), OR
+    - Return a string, OR
+    - Return an approximate numeric value if that is the documented policy.
+  - Implementations MUST document their policy for handling out-of-range or non-representable numbers. A lossless-first policy is RECOMMENDED for libraries intended for data interchange or validation.
 - Null: Represented as the literal null.
 
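The canonical-form rules added above can be illustrated with a short sketch. `canonical_number` is a hypothetical helper written for this diff commentary, not part of the spec or its reference implementation; it assumes the host numeric type is a binary float.

```python
from decimal import Decimal

def canonical_number(x: float) -> str:
    """Sketch of the Section 2 canonical decimal form: no exponent
    notation, no trailing fractional zeros, -0 normalized to 0."""
    if x != x or x in (float("inf"), float("-inf")):
        raise ValueError("non-finite values are normalized to null (Section 3)")
    if x == 0:
        return "0"  # covers -0 → 0
    if x == int(x):
        return str(int(x))  # 1.0 → 1, 1e6 → 1000000
    s = repr(x)  # shortest round-trip form in Python
    if "e" in s or "E" in s:
        s = format(Decimal(s), "f")  # expand exponent notation
    return s.rstrip("0").rstrip(".")
```

For example, `canonical_number(1e-6)` yields `"0.000001"` and `canonical_number(1.5000)` yields `"1.5"`, matching the rules above.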
 ## 3. Encoding Normalization (Reference Encoder)
 
-
+Encoders MUST normalize non-JSON values to the JSON data model before encoding:
 
 - Number:
-  - Finite → number (
+  - Finite → number (canonical decimal form per Section 2). -0 → 0.
   - NaN, +Infinity, -Infinity → null.
--
-
--
--
--
--
-- Plain object → own enumerable string keys in encounter order; values normalized recursively.
-- Function, symbol, undefined, or unrecognized types → null.
+- Non-JSON types MUST be normalized to the JSON data model (object, array, string, number, boolean, or null) before encoding. The mapping from host-specific types to JSON model is implementation-defined and MUST be documented.
+- Examples of host-type normalization (non-normative):
+  - Date/time objects → ISO 8601 string representation.
+  - Set-like collections → array.
+  - Map-like collections → object (with string keys).
+  - Undefined, function, symbol, or unrecognized types → null.
 
-
+See Appendix G for non-normative language-specific examples (Go, JavaScript, Python, Rust).
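A minimal Python sketch of the host-type mapping described above; `normalize` is a hypothetical helper, and sorting set members for determinism is an assumption of this sketch, not a spec requirement.

```python
import datetime

def normalize(value):
    """Sketch of Section 3 normalization for a few Python host types.
    The exact mapping is implementation-defined and MUST be documented."""
    if isinstance(value, float) and (value != value or value in (float("inf"), float("-inf"))):
        return None                      # NaN, +/-Infinity → null
    if value is None or isinstance(value, (bool, int, float, str)):
        return value                     # already in the JSON data model
    if isinstance(value, (datetime.date, datetime.datetime)):
        return value.isoformat()         # date/time → ISO 8601 string
    if isinstance(value, (set, frozenset)):
        # set-like → array; sorted here for determinism (works for uniform types)
        return sorted(normalize(v) for v in value)
    if isinstance(value, (list, tuple)):
        return [normalize(v) for v in value]
    if isinstance(value, dict):
        return {str(k): normalize(v) for k, v in value.items()}  # map-like → object
    return None                          # functions and unrecognized types → null
```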
 
 ## 4. Decoding Interpretation (Reference Decoder)
 
@@ -205,6 +255,10 @@ Decoders map text tokens to host values:
   - MUST accept standard decimal and exponent forms (e.g., 42, -3.14, 1e-6, -1E+9).
   - MUST treat tokens with forbidden leading zeros (e.g., "05", "0001") as strings (not numbers).
   - Only finite numbers are expected from conforming encoders.
+  - Decoding examples:
+    - `"1.5000"` → numeric value `1.5` (trailing zeros in fractional part are accepted)
+    - `"-1E+03"` → numeric value `-1000` (exponent forms are accepted)
+    - `"-0"` → numeric value `0` (negative zero decodes to zero; most host environments do not distinguish -0 from 0)
 - Otherwise → string.
 - Keys:
   - Decoded as strings (quoted keys MUST be unescaped per Section 7.1).
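The token-classification rules in the hunk above can be sketched as follows; `decode_scalar` and its `NUMBER` pattern are hypothetical helpers for illustration, simplified from the spec's grammar.

```python
import re

# Simplified Section 4 number shape: optional sign, no leading zeros,
# optional fraction and exponent.
NUMBER = re.compile(r"-?(0|[1-9][0-9]*)(\.[0-9]+)?([eE][+-]?[0-9]+)?$")

def decode_scalar(token: str):
    """Sketch of unquoted-token interpretation: keywords first, then
    numbers (tokens with forbidden leading zeros stay strings), else string."""
    keywords = {"true": True, "false": False, "null": None}
    if token in keywords:
        return keywords[token]
    if NUMBER.match(token):
        num = float(token)
        return int(num) if num == int(num) else num
    return token
```

Note how `"05"` falls through the number branch and decodes as the string `"05"`, exactly as required above.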
|
225
279
|
- Otherwise: expanded list items: key[N<delim?>]: with "- …" items (see Sections 9.4 and 10).
|
|
226
280
|
- Root form discovery:
|
|
227
281
|
- If the first non-empty depth-0 line is a valid root array header per Section 6 (must include a colon), decode a root array.
|
|
228
|
-
- Else if the document has exactly one non-empty line and it is neither a valid array header nor a key-value line (quoted or unquoted key), decode a single primitive.
|
|
282
|
+
- Else if the document has exactly one non-empty line and it is neither a valid array header nor a key-value line (quoted or unquoted key), decode a single primitive (examples: `hello`, `42`, `true`).
|
|
229
283
|
- Otherwise, decode an object.
|
|
230
|
-
-
|
|
284
|
+
- An empty document (no non-empty lines after ignoring trailing newline(s) and ignorable blank lines) decodes to an empty object `{}`.
|
|
285
|
+
- In strict mode, if there are two or more non-empty depth-0 lines that are neither headers nor key-value lines, the document is invalid. Example of invalid input (strict mode):
|
|
286
|
+
```
|
|
287
|
+
hello
|
|
288
|
+
world
|
|
289
|
+
```
|
|
290
|
+
This would be two primitives at root depth, which is not a valid TOON document structure.
|
|
231
291
|
|
|
232
292
|
## 6. Header Syntax (Normative)
|
|
233
293
|
|
|
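The root-form discovery order from the §5 hunk above can be sketched as a small classifier. `root_form`, `HEADER`, and `KEY_VALUE` are hypothetical names with deliberately simplified patterns (quoted keys and the strict-mode multi-primitive error are not handled).

```python
import re

# Simplified root-array-header shape: optional unquoted key, bracket with
# optional "#" and delimiter symbol, optional field list, required colon.
HEADER = re.compile(r'([A-Za-z_][A-Za-z0-9_.]*)?\[#?\d+.?\](\{.*\})?:')
KEY_VALUE = re.compile(r'[A-Za-z_][A-Za-z0-9_.]*:')

def root_form(text: str) -> str:
    lines = [ln for ln in text.split("\n") if ln.strip()]
    if not lines:
        return "empty object"          # empty document decodes to {}
    if HEADER.match(lines[0]):
        return "array"                 # first depth-0 line is a root header
    if len(lines) == 1 and not KEY_VALUE.match(lines[0]):
        return "primitive"             # e.g. hello, 42, true
    return "object"
```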
@@ -253,7 +313,7 @@ Spacing and delimiters:
 - The active delimiter declared by the bracket segment applies to:
   - splitting inline primitive arrays on that header line,
   - splitting tabular field names in "{…}",
-  - splitting all rows/items within the header
+  - splitting all rows/items within the header's scope,
   - unless a nested header changes it.
 - The same delimiter symbol declared in the bracket MUST be used in the fields segment and in all row/value splits in that scope.
 - Absence of a delimiter symbol in a bracket segment ALWAYS means comma, regardless of any parent header.
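The bracket/brace delimiter-consistency rule above can be checked mechanically; `parse_header` is a hypothetical sketch (quoted keys and quoted field names are out of scope here).

```python
import re

def parse_header(line: str):
    """Sketch of the Section 6 consistency rule: the delimiter symbol
    declared in the bracket segment must also separate field names in the
    braces. Simplified; quoted keys/fields are not handled."""
    m = re.fullmatch(r"(.*)\[#?(\d+)([,|\t])?\](?:\{(.*)\})?:", line)
    if not m:
        raise ValueError("not an array header")
    key, n, delim, fields = m.group(1), int(m.group(2)), m.group(3) or ",", m.group(4)
    names = None if fields is None else fields.split(delim)
    # Any of the three delimiter symbols other than the active one appearing
    # inside a field name signals a bracket/brace mismatch.
    if names is not None and any(d in f for f in names for d in ",|\t" if d != delim):
        raise ValueError("field list delimiter does not match the bracket delimiter")
    return key, n, delim, names
```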
@@ -287,6 +347,8 @@ unquoted-key = ( ALPHA / "_" ) *( ALPHA / DIGIT / "_" / "." )
 ; quoted-key = DQUOTE *(escaped-char / safe-char) DQUOTE
 ```
 
+Note: The ABNF grammar above cannot enforce that the delimiter used in the fields segment (braces) matches the delimiter declared in the bracket segment. This equality requirement is normative per the prose in lines 311-312 above and MUST be enforced by implementations. Mismatched delimiters between bracket and brace segments MUST error in strict mode.
+
 Note: The grammar above specifies header syntax. TOON's grammar is deliberately designed to prioritize human readability and token efficiency over strict LR(1) parseability. This requires some context-sensitive parsing (particularly for tabular row disambiguation in Section 9.3), which is a deliberate design tradeoff. Reference implementations demonstrate that deterministic parsing is achievable with modest lookahead.
 
 Decoding requirements:
@@ -295,6 +357,8 @@ Decoding requirements:
 - If a fields segment occurs between the bracket and the colon, parse field names using the active delimiter; quoted names MUST be unescaped per Section 7.1.
 - A colon MUST follow the bracket and optional fields; missing colon MUST error.
 
+Note: Key folding (§13.4) affects only the key prefix in headers. The header grammar remains unchanged. Example: `data.meta.items[2]{id,name}:` is a valid header with a folded key prefix `data.meta.items`, followed by a standard bracket segment, field list, and colon. Parsing treats folded keys as literal keys; see §13.4 for optional path expansion.
+
 ## 7. Strings and Keys
 
 ### 7.1 Escaping (Encoding and Decoding)
@@ -332,11 +396,13 @@ Otherwise, the string MAY be emitted without quotes. Unicode, emoji, and strings
 ### 7.3 Key Encoding (Encoding)
 
 Object keys and tabular field names:
-- MAY be unquoted only if they match: ^[A-Za-z_][
+- MAY be unquoted only if they match: ^[A-Za-z_][A-Za-z0-9_.]*$.
 - Otherwise, they MUST be quoted and escaped per Section 7.1.
 
 Keys requiring quoting per the above rules MUST be quoted in all contexts, including array headers (e.g., "my-key"[N]:).
 
+Encoders MAY perform key folding when enabled (see §13.4 for complete folding rules and requirements).
+
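A sketch of the §7.3 unquoted-key test, assuming only the two escapes shown here (the spec's full §7.1 escape set also covers control characters); `encode_key` is a hypothetical helper.

```python
import re

UNQUOTED_KEY = re.compile(r"^[A-Za-z_][A-Za-z0-9_.]*$")  # Section 7.3 pattern

def encode_key(key: str) -> str:
    """Emit the key unquoted when the 7.3 pattern matches; otherwise quote
    it (partial escape set: backslash and double quote only)."""
    if UNQUOTED_KEY.match(key):
        return key
    escaped = key.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'
```

Note that a dotted key like `user.name` stays unquoted under this rule, which is why §13.4 needs the stricter IdentifierSegment pattern for folding.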
 ### 7.4 Decoding Rules for Strings and Keys (Decoding)
 
 - Quoted strings and keys MUST be unescaped per Section 7.1; any other escape MUST error. Quoted primitives remain strings.
@@ -353,6 +419,7 @@ Keys requiring quoting per the above rules MUST be quoted in all contexts, inclu
 - Nested or empty objects: key: on its own line. If non-empty, nested fields appear at depth +1.
 - Key order: Implementations MUST preserve encounter order when emitting fields.
 - An empty object at the root yields an empty document (no lines).
+- Dotted keys (e.g., `user.name`) are valid literal keys in TOON. Decoders MUST treat them as single literal keys unless path expansion is explicitly enabled (see §13.4). This preserves backward compatibility and allows safe opt-in expansion behavior.
 - Decoding:
   - A line "key:" with nothing after the colon at depth d opens an object; subsequent lines at depth > d belong to that object until the depth decreases to ≤ d.
   - Lines "key: value" at the same depth are sibling fields.
@@ -368,6 +435,7 @@
 - Root arrays: [N<delim?>]: v1<delim>…
 - Decoding:
   - Split using the active delimiter declared by the header; non-active delimiters MUST NOT split values.
+  - When splitting inline arrays, empty tokens (including those surrounded by whitespace) decode to the empty string.
   - In strict mode, the number of decoded values MUST equal N; otherwise MUST error.
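The empty-token and strict-count rules above can be sketched briefly; `decode_inline_array` is a hypothetical helper that skips quoted-value handling.

```python
def decode_inline_array(n: int, payload: str, delim: str = ",", strict: bool = True):
    """Sketch of Section 9.1 decoding: split on the active delimiter only,
    trim surrounding spaces, keep empty tokens as empty strings; in strict
    mode the value count must equal N. Quoted values are not handled."""
    values = [tok.strip() for tok in payload.split(delim)]
    if strict and len(values) != n:
        raise ValueError(f"expected {n} values, got {len(values)}")
    return values
```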
 
 ### 9.2 Arrays of Arrays (Primitives Only) — Expanded List
@@ -390,7 +458,7 @@ Tabular detection (encoding; MUST hold for all elements):
 - All values across these keys are primitives (no nested arrays/objects).
 
 When satisfied (encoding):
-- Header: key[N<delim?>]{f1<delim>f2<delim>…}: where field order is the first object
+- Header: key[N<delim?>]{f1<delim>f2<delim>…}: where field order is the first object's key encounter order.
 - Field names encoded per Section 7.3.
 - Rows: one line per object at depth +1 under the header; values are encoded primitives (Section 7) and joined by the active delimiter.
 - Root tabular arrays omit the key: [N<delim?>]{…}: followed by rows.
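The tabular encoding shape above can be sketched as follows; `encode_tabular` is a hypothetical helper that assumes an `indentSize` of 2 and omits the Section 7 value quoting and escaping.

```python
def encode_tabular(key, objs, delim=","):
    """Sketch of the Section 9.3 shape: field order comes from the first
    object's key encounter order; rows are emitted at depth +1."""
    fields = list(objs[0].keys())
    header = f"{key}[{len(objs)}]{{{delim.join(fields)}}}:"
    rows = ["  " + delim.join(str(o[f]) for f in fields) for o in objs]
    return "\n".join([header] + rows)
```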
@@ -399,7 +467,7 @@ Decoding:
 - A tabular header declares the active delimiter and ordered field list.
 - Rows appear at depth +1 as delimiter-separated value lines.
 - Strict mode MUST enforce:
-  - Each row
+  - Each row's value count equals the field count.
   - The number of rows equals N.
 - Disambiguation at row depth (unquoted tokens):
   - Compute the first unquoted occurrence of the active delimiter and the first unquoted colon.
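The two strict-mode checks above (row count = N, cell count = field count) can be sketched directly; `decode_tabular` is a hypothetical helper that leaves cell decoding (quoting, numbers) to other routines.

```python
def decode_tabular(n: int, fields: list, rows: list, delim: str = ","):
    """Sketch of the Section 9.3 strict checks over pre-split row lines."""
    if len(rows) != n:
        raise ValueError(f"expected {n} rows, got {len(rows)}")
    out = []
    for row in rows:
        cells = [c.strip() for c in row.split(delim)]
        if len(cells) != len(fields):
            raise ValueError(f"row has {len(cells)} values, expected {len(fields)}")
        out.append(dict(zip(fields, cells)))
    return out
```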
@@ -455,15 +523,15 @@ Decoding:
 - Tab: header includes HTAB inside brackets and braces (e.g., [N<TAB>], {a<TAB>b}); rows/inline arrays use tabs.
 - Pipe: header includes "|" inside brackets and braces; rows/inline arrays use "|".
 - Document vs Active delimiter:
-  - Encoders select a document delimiter (option) that influences quoting
-  - Inside an array header
-  - Absence of a delimiter symbol in a header ALWAYS means comma for that array's scope; it does not inherit from any parent.
+  - Encoders select a document delimiter (option) that influences quoting for all object values (key: value) throughout the document.
+  - Inside an array header's scope, the active delimiter governs splitting and quoting only for inline arrays and tabular rows that the header introduces. Object values (key: value) follow document-delimiter quoting rules regardless of array scope.
 - Delimiter-aware quoting (encoding):
--
--
+  - Inline array values and tabular row cells: strings containing the active delimiter MUST be quoted to avoid splitting.
+  - Object values (key: value): encoders use the document delimiter to decide delimiter-aware quoting, regardless of whether the object appears within an array's scope.
   - Strings containing non-active delimiters do not require quoting unless another quoting condition applies (Section 7.2).
 - Delimiter-aware parsing (decoding):
   - Inline arrays and tabular rows MUST be split only on the active delimiter declared by the nearest array header.
+  - Splitting MUST preserve empty tokens; surrounding spaces are trimmed, and empty tokens decode to the empty string.
   - Strings containing the active delimiter MUST be quoted to avoid splitting; non-active delimiters MUST NOT cause splits.
   - Nested headers may change the active delimiter; decoding MUST use the delimiter declared by the nearest header.
   - If the bracket declares tab or pipe, the same symbol MUST be used in the fields segment and for splitting all rows/values in that scope.
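The parsing rules above (split only on the active delimiter, keep empty tokens, leave quoted runs intact) can be sketched as a small splitter; `split_row` is a hypothetical helper that ignores escape sequences inside quotes.

```python
def split_row(line: str, delim: str):
    """Sketch of Section 11 splitting: only the active delimiter splits,
    delimiters inside double quotes are preserved, empty tokens survive,
    and surrounding spaces are trimmed."""
    tokens, current, in_quotes = [], [], False
    for ch in line:
        if ch == '"':
            in_quotes = not in_quotes
            current.append(ch)
        elif ch == delim and not in_quotes:
            tokens.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    tokens.append("".join(current).strip())
    return tokens
```

With pipe active, a comma is just data: `split_row("a|b,c|", "|")` keeps `b,c` whole and preserves the trailing empty token.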
@@ -483,7 +551,7 @@ Decoding:
 - Tabs used as indentation MUST error. Tabs are allowed in quoted strings and as the HTAB delimiter.
 - Non-strict mode:
   - Depth MAY be computed as floor(indentSpaces / indentSize).
--
+  - Implementations MAY accept tab characters in indentation. Depth computation for tabs is implementation-defined. Implementations MUST document their tab policy.
   - Surrounding whitespace around tokens SHOULD be tolerated; internal semantics follow quoting rules.
 - Blank lines:
   - Outside arrays/tabular rows: decoders SHOULD ignore completely blank lines (do not create/close structures).
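A sketch of depth computation under these rules; `depth_of` is a hypothetical helper, and rejecting non-multiple indents in strict mode is an assumption of this sketch (the hunk above only mandates the tab error and the non-strict floor).

```python
def depth_of(line: str, indent_size: int = 2, strict: bool = True) -> int:
    """Sketch of Section 12 depth: strict mode rejects tabs in indentation
    (and, here, non-multiple indents); non-strict mode floors the division."""
    stripped = line.lstrip(" ")
    spaces = len(line) - len(stripped)
    if strict:
        if stripped.startswith("\t"):
            raise ValueError("tabs MUST NOT be used for indentation")
        if spaces % indent_size != 0:
            raise ValueError(f"indent of {spaces} is not a multiple of {indent_size}")
    return spaces // indent_size
```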
@@ -525,11 +593,88 @@ Options:
   - indent (default: 2 spaces)
   - delimiter (document delimiter; default: comma; alternatives: tab, pipe)
   - lengthMarker (default: disabled)
+  - keyFolding (default: `"off"`; alternatives: `"safe"`)
+  - flattenDepth (default: Infinity when keyFolding is `"safe"`; non-negative integer ≥ 0; values 0 or 1 have no practical folding effect)
 - Decoder options:
   - indent (default: 2 spaces)
-  - strict (default: true)
+  - strict (default: `true`)
+  - expandPaths (default: `"off"`; alternatives: `"safe"`)
+
+Strict-mode errors are enumerated in §14; validators MAY add informative diagnostics for style and encoding invariants.
+
605
|
+
### 13.4 Key Folding and Path Expansion
|
|
606
|
+
|
|
607
|
+
Key folding and path expansion are optional transformations for compact dotted-path notation. Both default to `"off"`.
|
|
608
|
+
|
|
609
|
+
#### Encoder: Key Folding
|
|
610
|
+
|
|
611
|
+
Key folding allows encoders to collapse chains of single-key objects into dotted-path notation, reducing verbosity for deeply nested structures.
|
|
612
|
+
|
|
613
|
+
Mode: `"off"` | `"safe"` (default: `"off"`)
|
|
614
|
+
- `"off"`: No folding is performed. All objects are encoded with standard nesting.
|
|
615
|
+
- `"safe"`: Fold eligible chains according to the rules below.
|
|
616
|
+
|
|
617
|
+
flattenDepth: The maximum number of segments from K0 to include in the folded path (default: Infinity when keyFolding is `"safe"`; values less than 2 have no practical effect).
|
|
618
|
+
- A value of 2 folds only two-segment chains: `{a: {b: val}}` → `a.b: val`.
|
|
619
|
+
- A value of Infinity folds entire eligible chains: `{a: {b: {c: val}}}` → `a.b.c: val`.
|
|
620
|
+
|
|
621
|
+
Foldable chain: A chain K0 → K1 → ... → Kn is foldable when:
|
|
622
|
+
- Each Ki (where i = 0 to n−1) is an object with exactly one key Ki+1.
|
|
623
|
+
- The chain stops at the first non-single-key object or when encountering a leaf value.
|
|
624
|
+
- Arrays are not considered single-key objects; a chain stops at arrays.
|
|
625
|
+
- The leaf value at Kn is either a primitive, an array, or an empty object.
|
|
626
|
+
|
|
627
|
+
Safe mode requirements (all MUST hold for a chain to be folded):
|
|
628
|
+
1. All folded segments K0 through K(d−1) (where d = min(chain length, flattenDepth)) MUST be IdentifierSegments (§1.9): matching `^[A-Za-z_][A-Za-z0-9_]*$`.
|
|
629
|
+
2. No segment may contain the path separator (`.` in v1.5).
|
|
630
|
+
3. The resulting folded key string MUST NOT equal any existing sibling literal key at the same object depth (collision avoidance).
|
|
631
|
+
4. If any segment would require quoting per §7.3, the chain MUST NOT be folded.
|
|
632
|
+
|
|
633
|
+
Folding process:
|
|
634
|
+
- For a foldable chain of length n, determine d = min(n, flattenDepth).
|
|
635
|
+
- Fold segments K0 through K(d−1) into a single key: `K0.K1.....K(d−1)`.
|
|
636
|
+
- If d < n, emit the remaining structure (Kd through Kn) as normal nested objects.
|
|
637
|
+
- The leaf value at Kn is encoded normally (primitive, array, or empty object).
|
|
638
|
+
|
|
639
|
+
Examples:
|
|
640
|
+
- `{a: {b: {c: 1}}}` with safe mode, depth=Infinity → `a.b.c: 1`
|
|
641
|
+
- `{a: {b: {c: {d: 1}}}}` with safe mode, depth=2 → produces `a.b:` followed by nested `c:` and `d: 1` at appropriate depths
|
|
642
|
+
- `{data: {"full-name": {x: 1}}}` → safe mode skips (segment `"full-name"` requires quoting); emits standard nested structure
|
|
643
|
+
|
|
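The folding procedure described in the added section can be sketched in Python. This is an informative sketch, not the reference implementation; `fold_keys` and the `IDENT` helper are hypothetical names, and the identifier regex is taken from the safe-mode requirements above.

```python
import math
import re

# IdentifierSegment per §1.9 (regex from safe-mode requirement 1).
IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

def fold_keys(obj, flatten_depth=math.inf):
    """Collapse chains of single-key objects into dotted keys ("safe" mode sketch)."""
    if not isinstance(obj, dict):
        return obj
    out = {}
    for key, value in obj.items():
        segments = [key]
        # Walk the chain of single-key objects; arrays and leaf values stop it.
        while (isinstance(value, dict) and len(value) == 1
               and len(segments) < flatten_depth):
            (next_key, next_value), = value.items()
            segments.append(next_key)
            value = next_value
        # If the walk stopped because of flattenDepth, the remainder is
        # emitted as normal nested objects (no further folding).
        stopped_by_depth = isinstance(value, dict) and len(value) == 1
        folded = ".".join(segments)
        if (len(segments) > 1
                and all(IDENT.match(s) for s in segments)  # requirements 1, 2, 4
                and folded not in obj):                    # requirement 3: collision
            out[folded] = value if stopped_by_depth else fold_keys(value, flatten_depth)
        else:
            out[key] = fold_keys(obj[key], flatten_depth)
    return out
```

With these rules, `fold_keys({"a": {"b": {"c": 1}}})` yields `{"a.b.c": 1}`, and a non-identifier segment such as `"full-name"` leaves the chain in standard nested form.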
+#### Decoder: Path Expansion
+
+Path expansion allows decoders to split dotted keys into nested object structures, enabling round-trip compatibility with folded encodings.
+
+Mode: `"off"` | `"safe"` (default: `"off"`)
+- `"off"`: Dotted keys are treated as literal keys. No expansion is performed.
+- `"safe"`: Expand eligible dotted keys according to the rules below.
 
-
+Safe mode behavior:
+- Any key containing the path separator (`.`) is considered for expansion.
+- Split the key into segments at each occurrence of `.`.
+- Only expand when ALL resulting segments are IdentifierSegments (§1.9).
+- Keys that do not meet the expansion criteria remain as literal keys.
+
+Deep merge semantics:
+When multiple expanded keys construct overlapping object paths, the decoder MUST merge them recursively:
+- Object + Object: Deep merge recursively (recurse into nested keys and apply these rules).
+- Object + Non-object (array or primitive): This is a conflict. Apply conflict resolution policy.
+- Array + Array or Primitive + Primitive: This is a conflict. Apply conflict resolution policy. Arrays are never merged element-wise.
+- Key ordering: During expansion, newly created keys are inserted in encounter order (the order they appear in the document). When merging creates nested keys, keys from later lines are appended after existing keys at the same depth. This ensures deterministic, predictable key order in the resulting object.
+
+Conflict resolution:
+- Conflict definition: A conflict occurs when expansion requires an object at a given path but finds a non-object value (array or primitive), or vice versa. A conflict also occurs when a final leaf key already exists with a non-object value that must be overwritten.
+- `strict=true` (default): Decoders MUST error on any conflict. This ensures data integrity and catches structural inconsistencies.
+- `strict=false`: Last-write-wins (LWW) conflict resolution: keys appearing later in document order (encounter order during parsing) overwrite earlier values. This provides deterministic behavior for lenient parsing.
+
+Application order: Path expansion is applied AFTER all base parsing rules (§4–12) have been applied and BEFORE the final decoded value is returned to the caller. Structural validations enumerated in §14 (strict-mode errors for array counts, indentation, etc.) operate on the pre-expanded structure and remain unaffected by expansion.
+
+Examples:
+- Input: `data.meta.items[2]: a,b` with `expandPaths="safe"` → Output: `{"data": {"meta": {"items": ["a", "b"]}}}`
+- Input: `user.name: Ada` with `expandPaths="off"` → Output: `{"user.name": "Ada"}`
+- Input: `a.b.c: 1` and `a.b.d: 2` and `a.e: 3` with `expandPaths="safe"` → Output: `{"a": {"b": {"c": 1, "d": 2}, "e": 3}}` (deep merge)
+- Input: `a.b: 1` then `a: 2` with `expandPaths="safe"` and `strict=true` → Error: "Expansion conflict at path 'a' (object vs primitive)"
+- Input: `a.b: 1` then `a: 2` with `expandPaths="safe"` and `strict=false` → Output: `{"a": 2}` (LWW)
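The expansion, deep-merge, and conflict rules can be sketched as follows. This is an informative Python sketch operating on an already-parsed object; `expand_paths` and `_merge` are hypothetical helper names. Python dicts preserve insertion order, which matches the encounter-order requirement above.

```python
import re

# IdentifierSegment per §1.9: expansion applies only when every segment matches.
IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

def _merge(dst, src, strict):
    """Deep-merge src into dst: object+object recurses, anything else conflicts."""
    for key, value in src.items():
        if key in dst and isinstance(dst[key], dict) and isinstance(value, dict):
            _merge(dst[key], value, strict)
        elif key in dst and strict:
            raise ValueError(f"Expansion conflict at path '{key}'")
        else:
            dst[key] = value  # new key, or last-write-wins when strict=False

def expand_paths(obj, strict=True):
    """Expand dotted keys into nested objects ("safe" mode sketch)."""
    if not isinstance(obj, dict):
        return obj
    out = {}
    for key, value in obj.items():
        value = expand_paths(value, strict)
        segments = key.split(".")
        if len(segments) == 1 or not all(IDENT.match(s) for s in segments):
            segments = [key]  # ineligible keys stay literal
        nested = value
        for segment in reversed(segments[1:]):
            nested = {segment: nested}
        _merge(out, {segments[0]: nested}, strict)
    return out
```

For example, `expand_paths({"a.b": 1, "a": 2}, strict=False)` resolves the conflict by last-write-wins and returns `{"a": 2}`, while the default strict mode raises on the same input.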
 
 ### 13.1 Encoder Conformance Checklist
 
@@ -544,6 +689,8 @@ Conforming encoders MUST:
 - [ ] Convert -0 to 0 (§2)
 - [ ] Convert NaN/±Infinity to null (§3)
 - [ ] Emit no trailing spaces or trailing newline (§12)
+- [ ] When `keyFolding="safe"`, folding MUST comply with §13.4 (IdentifierSegment validation, no separator in segments, collision avoidance, no quoting required)
+- [ ] When `flattenDepth` is set, folding MUST stop at the configured segment count (§13.4)
 
 ### 13.2 Decoder Conformance Checklist
 
@@ -552,9 +699,12 @@ Conforming decoders MUST:
 - [ ] Split inline arrays and tabular rows using active delimiter only (§11)
 - [ ] Unescape quoted strings with only valid escapes (§7.1)
 - [ ] Type unquoted primitives: true/false/null → booleans/null, numeric → number, else → string (§4)
-- [ ] Enforce strict-mode rules when strict=true (§14)
+- [ ] Enforce strict-mode rules when `strict=true` (§14)
 - [ ] Accept and ignore optional # length marker (§6)
 - [ ] Preserve array order and object key order (§2)
+- [ ] When `expandPaths="safe"`, expansion MUST follow §13.4 (IdentifierSegment-only segments, deep merge, conflict rules)
+- [ ] When `expandPaths="safe"` with `strict=true`, MUST error on expansion conflicts per §14.5
+- [ ] When `expandPaths="safe"` with `strict=false`, apply LWW conflict resolution (§13.4)
 
 ### 13.3 Validator Conformance Checklist
 
@@ -590,9 +740,20 @@ When strict mode is enabled (default), decoders MUST error on the following cond
 ### 14.4 Structural Errors
 
 - Blank lines inside arrays/tabular rows.
-- Empty input (document with no non-empty lines after ignoring trailing newline(s) and ignorable blank lines outside arrays/tabular rows).
 
-
+For root-form rules, including handling of empty documents, see §5.
+
+### 14.5 Path Expansion Conflicts
+
+When `expandPaths="safe"` is enabled:
+- With `strict=true` (default): Decoders MUST error on any expansion conflict.
+- With `strict=false`: Decoders MUST apply deterministic last-write-wins (LWW) resolution in document order. Implementations MUST resolve conflicts silently and MUST NOT emit diagnostics during normal decode operations.
+
+See §13.4 for complete conflict definitions, deep-merge semantics, and examples.
+
+Note (informative): Implementations MAY expose conflict diagnostics via out-of-band mechanisms (e.g., debug hooks, verbose CLI flags, or separate validation APIs), but such facilities are non-normative and MUST NOT affect default decode behavior or output.
+
+### 14.6 Recommended Error Messages and Validator Diagnostics (Informative)
 
 Validators SHOULD additionally report:
 - Trailing spaces, trailing newlines (encoding invariants).
@@ -914,6 +1075,74 @@ Quoted keys with arrays (keys requiring quoting per Section 7.3):
   - id: 2
 ```
 
+Key folding and path expansion (v1.5+):
+
+Encoding - basic folding (safe mode, depth=Infinity):
+
+Input: `{"a": {"b": {"c": 1}}}`
+```
+a.b.c: 1
+```
+
+Encoding - folding with inline array:
+
+Input: `{"data": {"meta": {"items": ["x", "y"]}}}`
+```
+data.meta.items[2]: x,y
+```
+
+Encoding - folding with tabular array:
+
+Input: `{"a": {"b": {"items": [{"id": 1, "name": "A"}, {"id": 2, "name": "B"}]}}}`
+```
+a.b.items[2]{id,name}:
+  1,A
+  2,B
+```
+
+Encoding - partial folding (flattenDepth=2):
+
+Input: `{"a": {"b": {"c": {"d": 1}}}}`
+```
+a.b:
+  c:
+    d: 1
+```
+
+Decoding - basic expansion (safe mode round-trip):
+
+Input: `data.meta.items[2]: a,b` with options `{expandPaths: "safe"}`
+
+Output: `{"data": {"meta": {"items": ["a", "b"]}}}`
+
+Decoding - deep merge (multiple expanded keys):
+
+Input with options `{expandPaths: "safe"}`:
+```
+a.b.c: 1
+a.b.d: 2
+a.e: 3
+```
+Output: `{"a": {"b": {"c": 1, "d": 2}, "e": 3}}`
+
+Decoding - conflict error (strict=true, default):
+
+Input with options `{expandPaths: "safe", strict: true}`:
+```
+a.b: 1
+a: 2
+```
+Result: Error - "Expansion conflict at path 'a' (object vs primitive)"
+
+Decoding - conflict LWW (strict=false):
+
+Input with options `{expandPaths: "safe", strict: false}`:
+```
+a.b: 1
+a: 2
+```
+Output: `{"a": 2}`
+
 ## Appendix B: Parsing Helpers (Informative)
 
 These sketches illustrate structure and common decoding helpers. They are informative; normative behavior is defined in Sections 4–12 and 14.
@@ -949,6 +1178,7 @@ These sketches illustrate structure and common decoding helpers. They are inform
 - If token starts with a quote, it MUST be a properly quoted string (no trailing characters after the closing quote). Unescape using only the five escapes; otherwise MUST error.
 - Else if token is true/false/null → boolean/null.
 - Else if token is numeric without forbidden leading zeros and finite → number.
+  - Examples: `"1.5000"` → `1.5`, `"-1E+03"` → `-1000`, `"-0"` → `0` (host normalization applies)
 - Else → string.
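The token-typing steps in B.4 can be sketched as a small helper. This is an informative sketch; `type_token` is a hypothetical name, and the number regex is an assumption derived from the leading-zero and exponent rules described elsewhere in the spec.

```python
import math
import re

# Number acceptance: optional sign, no forbidden leading zeros, optional
# fraction and exponent (assumed from the number rules in §2/§4).
NUMBER = re.compile(r"^-?(0|[1-9][0-9]*)(\.[0-9]+)?([eE][+-]?[0-9]+)?$")

def type_token(token):
    """Type an unquoted primitive token per the B.4 rules (sketch)."""
    if token == "true":
        return True
    if token == "false":
        return False
    if token == "null":
        return None
    if NUMBER.match(token):
        value = float(token)
        if math.isfinite(value):
            # Host normalization: -0 → 0; integral values decode as integers here.
            return int(value) if value == int(value) else value
        return token  # non-finite after conversion: treat as string
    return token  # everything else decodes as a string
```

A token with forbidden leading zeros such as `007` fails the regex and therefore decodes as the string `"007"`.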
 
 ### B.5 Object and List Item Parsing
 
@@ -994,15 +1224,29 @@ The reference test suite covers:
 - Tabular detection and formatting, including delimiter variations.
 - Mixed arrays and objects-as-list-items behavior, including nested arrays and objects.
 - Whitespace invariants (no trailing spaces/newline).
-
+- Canonical number formatting (no exponent, no trailing zeros, no leading zeros).
 - Decoder strict-mode errors: count mismatches, invalid escapes, missing colon, delimiter mismatches, indentation errors, blank-line handling.
 
+Note: Host-type normalization tests (e.g., BigInt, Date, Set, Map) are language-specific and maintained in implementation repositories. See Appendix G for normalization guidance.
+
 ## Appendix D: Document Changelog (Informative)
 
-### v1.
+### v1.5 (2025-11-08)
+
+- Added optional key folding for encoders: `keyFolding='safe'` mode with `flattenDepth` control (§13.4).
+- Added optional path expansion for decoders: `expandPaths='safe'` mode with conflict resolution tied to existing `strict` option (§13.4).
+- Defined safe-mode requirements for folding: IdentifierSegment validation, no path separator in segments, collision avoidance, no quoting required (§7.3, §13.4).
+- Specified deep-merge semantics for expansion: recursive merge for objects; conflict policy (error in strict mode, LWW when strict=false) for non-objects (§13.4).
+- Added strict-mode error category for path expansion conflicts (§14.5).
+- Both features default to OFF; fully backward-compatible.
 
-
-
+### v1.4 (2025-11-05)
+
+- Removed JavaScript-specific normalization details; replaced with language-agnostic requirements (Section 3).
+- Defined canonical number format for encoders and decoder acceptance rules (Section 2).
+- Added Appendix G with host-type normalization examples for Go, JavaScript, Python, and Rust.
+- Clarified non-strict mode tab handling as implementation-defined (Section 12).
+- Expanded regex notation for cross-language clarity (Section 7.3).
 
 ### v1.3 (2025-10-31)
 
@@ -1053,39 +1297,127 @@ This specification and reference implementation are released under the MIT Licen
 - Whitespace invariants for encoding and strict-mode indentation enforcement for decoding.
 - Blank-line handling and trailing-newline acceptance.
 
+## Appendix G: Host Type Normalization Examples (Informative)
+
+This appendix provides non-normative guidance on how implementations in different programming languages MAY normalize host-specific types to the JSON data model before encoding. The normative requirement is in Section 3: implementations MUST normalize non-JSON types to the JSON data model and MUST document their normalization policy.
+
+### G.1 Go
+
+Go implementations commonly normalize the following host types:
+
+Numeric Types:
+- `big.Int`: If within `int64` range, convert to number. Otherwise, convert to quoted decimal string per lossless policy.
+- `math.Inf()`, `math.NaN()`: Convert to `null`.
+
+Temporal Types:
+- `time.Time`: Convert to ISO 8601 string via `.Format(time.RFC3339)` or `.Format(time.RFC3339Nano)`.
+
+Collection Types:
+- `map[K]V`: Convert to object. Keys MUST be strings or convertible to strings via `fmt.Sprint`.
+- `[]T` (slices): Preserve as array.
+
+Struct Types:
+- Structs with exported fields: Convert to object using JSON struct tags if present.
+
+Non-Serializable Types:
+- `nil`: Maps to `null`.
+- Functions, channels, `unsafe.Pointer`: Not serializable; implementations MUST error or skip these fields.
+
+### G.2 JavaScript
+
+JavaScript implementations commonly normalize the following host types:
+
+Numeric Types:
+- `BigInt`: If the value is within `Number.MIN_SAFE_INTEGER` to `Number.MAX_SAFE_INTEGER`, convert to `number`. Otherwise, convert to a quoted decimal string (e.g., `BigInt(9007199254740993)` → `"9007199254740993"`).
+- `NaN`, `Infinity`, `-Infinity`: Convert to `null`.
+- `-0`: Normalize to `0`.
+
+Temporal Types:
+- `Date`: Convert to ISO 8601 string via `.toISOString()` (e.g., `"2025-01-01T00:00:00.000Z"`).
+
+Collection Types:
+- `Set`: Convert to array by iterating entries and normalizing each element.
+- `Map`: Convert to object using `String(key)` for keys and normalizing values recursively. Non-string keys are coerced to strings.
+
+Object Types:
+- Plain objects: Enumerate own enumerable string keys in encounter order; normalize values recursively.
+
+Non-Serializable Types:
+- `undefined`, `function`, `Symbol`: Convert to `null`.
+
+### G.3 Python
+
+Python implementations commonly normalize the following host types:
+
+Numeric Types:
+- `decimal.Decimal`: Convert to `float` if representable without loss, OR convert to quoted decimal string for exact preservation (implementation policy).
+- `float('inf')`, `float('-inf')`, `float('nan')`: Convert to `null`.
+- Arbitrary-precision integers (large `int`): Emit as number if within host numeric range, OR as quoted decimal string per lossless policy.
+
+Temporal Types:
+- `datetime.datetime`, `datetime.date`, `datetime.time`: Convert to ISO 8601 string representation via `.isoformat()`.
+
+Collection Types:
+- `set`, `frozenset`: Convert to list (array).
+- `dict`: Preserve as object with string keys. Non-string keys MUST be coerced to strings.
+
+Object Types:
+- Custom objects: Extract attributes via `__dict__` or implement custom serialization; convert to object (dict) with string keys.
+
+Non-Serializable Types:
+- `None`: Maps to `null`.
+- Functions, lambdas, modules: Convert to `null`.
+
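The Python policy above can be condensed into a single recursive normalizer. This is an informative sketch; `normalize` is a hypothetical helper, the lossless-string options for `Decimal` and large integers are omitted for brevity, and sorting sets is one possible way to satisfy the determinism guidance in G.5.

```python
import datetime
import math

def normalize(value):
    """Map Python host types onto the JSON data model (G.3 sketch)."""
    if value is None or isinstance(value, (bool, str, int)):
        return value
    if isinstance(value, float):
        if not math.isfinite(value):
            return None                          # inf / -inf / nan → null
        return 0.0 if value == 0 else value      # -0.0 → 0
    if isinstance(value, (datetime.datetime, datetime.date, datetime.time)):
        return value.isoformat()                 # ISO 8601
    if isinstance(value, (set, frozenset)):
        # Sets are unordered; fix an order (here: sorted) for determinism.
        return sorted(normalize(v) for v in value)
    if isinstance(value, dict):
        return {str(k): normalize(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [normalize(v) for v in value]
    return None  # functions, lambdas, modules, etc. → null
```

For instance, `normalize({1: float("nan")})` coerces the key to `"1"` and the non-finite value to `None`, matching the table above.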
+### G.4 Rust
+
+Rust implementations commonly normalize the following host types (typically using serialization frameworks like `serde`):
+
+Numeric Types:
+- `i128`, `u128`: If within `i64`/`u64` range, emit as number. Otherwise, convert to quoted decimal string per lossless policy.
+- `f64::INFINITY`, `f64::NEG_INFINITY`, `f64::NAN`: Convert to `null`.
+
+Temporal Types:
+- `chrono::DateTime<T>`: Convert to ISO 8601 string via `.to_rfc3339()`.
+- `chrono::NaiveDate`, `chrono::NaiveTime`: Convert to ISO 8601 partial representations.
+
+Collection Types:
+- `HashSet<T>`, `BTreeSet<T>`: Convert to `Vec<T>` (array).
+- `HashMap<K, V>`, `BTreeMap<K, V>`: Convert to object. Keys MUST be strings or convertible to strings via `Display` or `ToString`.
+
+Enum Types:
+- Unit variants: Convert to string of variant name (e.g., `Color::Red` → `"Red"`).
+- Tuple/struct variants: Typically convert to object with `"type"` field and data fields per `serde` conventions.
+
+Non-Serializable Types:
+- `Option::None`: Convert to `null`.
+- `Option::Some(T)`: Unwrap and normalize `T`.
+- Function pointers, raw pointers: Not serializable; implementations MUST error or skip these fields.
+
+### G.5 General Guidance
+
+Implementations in any language SHOULD:
+1. Document their normalization policy clearly, especially for:
+   - Large or arbitrary-precision numbers (lossless string vs. approximate number)
+   - Date/time representations (ISO 8601 format details)
+   - Collection type mappings (order preservation for sets)
+2. Provide configuration options where multiple strategies are reasonable (e.g., lossless vs. approximate numeric encoding).
+3. Ensure that normalization is deterministic: encoding the same host value twice MUST produce identical TOON output.
+
 ## 19. TOON Core Profile (Normative Subset)
 
-This profile captures the most common, memory-friendly rules.
+This profile captures the most common, memory-friendly rules by reference to normative sections.
 
-- Character set:
-- Indentation: 2 spaces per level
-
-
-
-
-
-
-
-
-
-- Decoder accepts decimal and exponent forms; tokens with forbidden leading zeros decode as strings.
-- Arrays and headers:
-  - Header: [#?N[delim?]] where delim is absent (comma), HTAB (tab), or "|" (pipe).
-  - Keyed header: key[#?N[delim?]]:. Optional fields: {f1<delim>f2}.
-  - Primitive arrays inline: key[N]: v1<delim>v2. Empty arrays: key[0]: (no values).
-  - Tabular arrays: key[N]{fields}: then N rows at depth +1.
-  - Otherwise list form: key[N]: then N items, each starting with "- ".
-- Delimiters:
-  - Only split on the active delimiter from the nearest header. Non-active delimiters never split.
-- Objects as list items:
-  - "- value" (primitive), "- [M]: …" (inline array), or "- key: …" (object).
-  - If first field is "- key:" with nested object: nested fields at +2; subsequent sibling fields at +1.
-- Root form:
-  - Root array if the first depth-0 line is a header (per Section 6).
-  - Root primitive if exactly one non-empty line and it is not a header or key-value.
-  - Otherwise object.
-- Strict mode checks:
-  - All count/width checks; missing colon; invalid escapes; indentation multiple-of-indentSize; delimiter mismatches via count checks; blank lines inside arrays/tabular rows; empty input.
+- Character set and line endings: As defined in §1 (Core Concepts) and §12.
+- Indentation: MUST conform to §12 (2 spaces per level by default; strict mode enforces indentSize multiples).
+- Keys and colon syntax: MUST conform to §7.2 (unquoted keys match `^[A-Za-z_][A-Za-z0-9_.]*$`; quoted otherwise; colon required after keys).
+- Strings and quoting: MUST be quoted as defined in §7.2 (deterministic quoting rules for empty strings, whitespace, reserved literals, control characters, delimiters, leading hyphens, and structural tokens).
+- Escape sequences: MUST conform to §7.1 (only \\, \", \n, \r, \t are valid).
+- Numbers: Encoders MUST emit canonical form per §2; decoders MUST accept input per §4.
+- Arrays and headers: Header syntax MUST conform to §6; array encoding as defined in §9.
+- Delimiters: Delimiter scoping and quoting rules as defined in §11.
+- Objects as list items: Indentation rules as defined in §10.
+- Root form determination: As defined in §5.
+- Strict mode validation: All checks enumerated in §14.
 
 ## 20. Versioning and Extensibility
 
@@ -1097,6 +1429,7 @@ For a detailed version history, see Appendix D.
 
 - Backward-compatible evolutions SHOULD preserve current headers, quoting rules, and indentation semantics.
 - Reserved/structural characters (colon, brackets, braces, hyphen) MUST retain current meanings.
+- The path separator (see §1.9) is fixed to `"."` in v1.5; future versions MAY make this configurable.
 - Future work (non-normative): schemas, comments/annotations, additional delimiter profiles, optional \uXXXX escapes (if added, must be precisely defined).
 
 ## 21. Intellectual Property Considerations