toon-parser 3.0.0 → 3.0.1

Files changed (2)
  1. package/README.md +148 -62
  2. package/package.json +1 -1
package/README.md CHANGED
@@ -6,7 +6,7 @@
6
6
  [![npm provenance](https://img.shields.io/badge/npm-provenance-blue)](https://docs.npmjs.com/generating-provenance-statements)
7
7
  [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
8
8
 
9
- Safe JSON ⇆ TOON encoder/decoder with strict validation and prototype-pollution guards.
9
+ Safe JSON ⇆ TOON encoder/decoder with strict validation and prototype-pollution guards. Targets the [TOON v3.0 spec](https://toonformat.dev/) (Working Draft, 2025-11-24).
10
10
 
11
11
  ## Install
12
12
 
@@ -14,25 +14,21 @@ Safe JSON ⇆ TOON encoder/decoder with strict validation and prototype-pollutio
14
14
  npm install toon-parser
15
15
  ```
16
16
 
17
- Note: this package supports both ESM and CommonJS consumers (CJS builds are available as `dist/index.cjs`). The package requires Node >= 20 per `engines` in `package.json`.
17
+ This package supports both ESM and CommonJS consumers (CJS builds are available as `dist/index.cjs`). Requires Node >= 20.
18
18
 
19
- ## New in 2.2.0
20
- - **Security**: Fixed 8 dependency vulnerabilities (1 critical `fast-xml-parser` with 6 CVEs, 5 high, 2 moderate).
21
- - **Dependencies**: Updated all dependencies to latest versions (vitest 4, TypeScript 5.9, fast-xml-parser 5.5.9).
22
- - **Node.js**: Minimum version bumped to Node 20 (Node 18 reached EOL April 2025).
23
- - **Build**: `esbuild` is now an explicit dependency; test files excluded from published tarball.
24
-
25
- ## New in 2.1.0
26
- - **HTML/CSV/Log/URL Support**: Dedicated parsers for common formats to leverage Toon's structure.
27
-
28
- ## New in 2.0.0
29
- - **XML Support**: Convert XML strings directly to TOON with `xmlToToon`.
30
- - **Date Support**: Automatically converts `Date` objects to ISO strings.
19
+ ## New in 3.0.0
20
+ - **Aligned with TOON spec v3.0**. No breaking changes to existing input/output; v3 features are opt-in.
21
+ - **§13.4 key folding** (encoder) and **path expansion** (decoder). Single-key chains can collapse into dotted paths (`{a:{b:{c:1}}}` → `a.b.c: 1`) and round-trip back. See [Key folding & path expansion](#key-folding--path-expansion-toon-v3-134) below.
22
+ - **Security**: fixed a prototype pollution vector in `urlToToon` (bracket/dotted `__proto__` / `constructor` / `prototype` segments now throw `ToonError` and never reach `Object.prototype`). New `maxInputLength` option (default 5 MB) caps raw input size on every parser entry point. All side-format adapters now throw `ToonError` (not plain `Error`).
23
+ - **Sub-path exports** — import only the adapter you need: `toon-parser/csv`, `toon-parser/xml`, `toon-parser/html`, `toon-parser/log`, `toon-parser/url`. Bundlers can drop the unused adapters.
24
+ - **`logToToon`**: now parses Combined Log Format (referer + user-agent) in addition to Common.
25
+ - CI matrix now covers Node 20, 22, and 24.
31
26
 
32
27
  ## Why this library?
33
28
 
34
- - Implements the TOON v2.1 spec features most useful for JSON round-trips: tabular arrays, inline primitive arrays, nested objects/arrays, deterministic quoting.
35
- - Hardened for untrusted input: prototype-pollution guards, max depth/length/node caps, strict length/width enforcement, and finite-number checks.
29
+ - **Universal data support**: converts JSON, XML, HTML, CSV, logs, and URL parameters into TOON's concise, human-readable format.
30
+ - Implements TOON v3.0 features for token savings: tabular arrays (perfect for CSV/logs), inline primitive arrays, deterministic quoting, and opt-in §13.4 key folding / path expansion.
31
+ - Hardened for untrusted input: prototype-pollution guards, max depth/length/node caps, `maxInputLength` cap, strict length/width enforcement, finite-number checks.
36
32
  - No dynamic code execution; parsing uses explicit token scanning and bounded state to resist resource exhaustion.
37
33
 
38
34
  ## Quick start
@@ -50,23 +46,35 @@ const data = {
50
46
  };
51
47
 
52
48
  const toon = jsonToToon(data);
53
- // TOON text with tabular hikes array and inline primitive friends array
54
49
  console.log(toon);
55
50
 
56
51
  const roundTrip = toonToJson(toon);
57
52
  console.log(roundTrip); // back to the original JSON object
58
53
  ```
59
54
 
55
+ ### Tree-shaking via sub-path imports
56
+
57
+ ```ts
58
+ // Pulls in only the CSV adapter + the core encoder; xml/html/log/url stay out of the bundle.
59
+ import { csvToToon } from 'toon-parser/csv';
60
+
61
+ // The barrel still works — use this when you want everything.
62
+ import { jsonToToon, csvToToon, xmlToToon } from 'toon-parser';
63
+ ```
64
+
60
65
  ## API
61
66
 
62
67
  ### `jsonToToon(value, options?) => string`
63
68
 
64
69
  Encodes a JSON-compatible value into TOON text.
65
70
 
71
+ ### `toonToJson(text, options?) => unknown`
72
+
73
+ Decodes TOON text back to JSON data.
74
+
66
75
  ### `xmlToToon(xml, options?) => string`
67
76
 
68
- Parses an XML string and converts it to TOON text.
69
- Accepts standard `JsonToToonOptions` plus an `xmlOptions` object passed to `fast-xml-parser`.
77
+ Parses an XML string and converts it to TOON text. Accepts standard `JsonToToonOptions` plus an `xmlOptions` object passed to `fast-xml-parser`.
70
78
 
71
79
  ```ts
72
80
  import { xmlToToon } from 'toon-parser';
@@ -76,56 +84,119 @@ const toon = xmlToToon('<user id="1">Alice</user>');
76
84
  // "@_id": 1
77
85
  ```
78
86
 
87
+ > [!WARNING]
88
+ > **Security Note:** While `fast-xml-parser` v5 is generally secure by default, overriding `xmlOptions` can alter security properties (e.g., enabling entity expansion). Only enable such features if you trust the source XML.
89
+
79
90
  ### `htmlToToon(html, options?) => string`
80
91
 
81
- Parses HTML string to Toon. Uses `node-html-parser`.
92
+ Parses an HTML string into a structured object tree, preserving attributes and hierarchy. Uses `node-html-parser`.
82
93
 
83
94
  ### `csvToToon(csv, options?) => string`
84
95
 
85
- Parses CSV string. Options:
96
+ Parses a CSV string into a TOON tabular array. Options:
86
97
  - `delimiter` (default `,`)
87
98
  - `hasHeader` (default `true`)
88
99
 
89
100
  ### `urlToToon(urlOrQs, options?) => string`
90
- Parses URL query strings to Toon object. Expands dotted/bracket notation (e.g. `user[name]`).
101
+
102
+ Parses URL query strings to TOON. Expands dotted/bracket notation (e.g. `user[name]`). Rejects `__proto__` / `constructor` / `prototype` segments with `ToonError`.
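The segment guard described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the library's implementation: `expandQuery`, `Tree`, and `BLOCKED` are hypothetical names, a plain `Error` stands in for `ToonError`, and values are kept as strings with no type coercion.

```typescript
// Hypothetical sketch: expand `user[name]=x` / `user.name=x` into nested objects,
// refusing any segment that could reach Object.prototype.
const BLOCKED = new Set(['__proto__', 'constructor', 'prototype']);

type Tree = { [key: string]: string | Tree };

function expandQuery(qs: string): Tree {
  // Null-prototype containers: a stray key can never resolve to an inherited member.
  const root: Tree = Object.create(null);
  for (const pair of qs.split('&')) {
    if (pair === '') continue;
    const eq = pair.indexOf('=');
    // Simplification: "+" is not treated as a space here.
    const rawKey = decodeURIComponent(eq === -1 ? pair : pair.slice(0, eq));
    const value = decodeURIComponent(eq === -1 ? '' : pair.slice(eq + 1));

    // "a[b].c" -> ["a", "b", "c"]
    const segments = rawKey.split(/[\[\].]+/).filter((s) => s.length > 0);
    if (segments.length === 0) continue;

    // Validate every segment before writing anything for this key.
    const bad = segments.find((s) => BLOCKED.has(s));
    if (bad !== undefined) throw new Error(`Disallowed key "${bad}"`);

    let node = root;
    for (let i = 0; i < segments.length; i++) {
      const seg = segments[i] as string;
      if (i === segments.length - 1) {
        node[seg] = value; // leaf assignment; a later duplicate overwrites it
        break;
      }
      const next = node[seg];
      if (typeof next === 'object') {
        node = next;
      } else {
        const child: Tree = Object.create(null);
        node[seg] = child;
        node = child;
      }
    }
  }
  return root;
}
```

Checking all segments before the first write means a rejected key leaves no partially built branch behind.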
91
103
 
92
104
  ### `logToToon(log, options?) => string`
93
- Parses logs. Options:
94
- - `format`: `'auto'` | `'clf'` | `'json'`
105
+
106
+ Parses logs into TOON tabular form. Options:
107
+ - `format`: `'auto'` | `'clf'` | `'combined'` | `'json'` (default `'auto'`)
108
+ - `'auto'` tries Combined Log Format first (with referer + user-agent), falls back to Common, then to a `{ raw }` line on no match.
109
+ - `'clf'` accepts both Common and Combined variants.
110
+ - `'combined'` accepts only Combined Log Format.
111
+ - `'json'` parses NDJSON (one JSON object per line); malformed lines become `{ raw }`.
112
+
113
+ Field set: `host`, `ident`, `authuser`, `date`, `request`, `status`, `size` (plus `referer`, `userAgent` for Combined). `size` is `null` when the log emits `-`.
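As a rough illustration of the field set above, the following sketch parses one Common or Combined Log Format line with a single regular expression. It is an assumption-laden example rather than the library's parser: `parseLogLine`, `LogRow`, and `LOG_LINE` are hypothetical names, and the pattern does not handle escaped quotes inside quoted fields.

```typescript
// Hypothetical sketch of CLF / Combined parsing; not the library's implementation.
interface LogRow {
  host: string;
  ident: string;
  authuser: string;
  date: string;
  request: string;
  status: number;
  size: number | null;
  referer?: string;
  userAgent?: string;
}

// The trailing optional group matches the two extra quoted fields of Combined Log Format.
const LOG_LINE =
  /^(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}) (\d+|-)(?: "([^"]*)" "([^"]*)")?$/;

function parseLogLine(line: string): LogRow | { raw: string } {
  const m = LOG_LINE.exec(line);
  if (m === null) return { raw: line }; // no match: keep the line verbatim

  const group = (i: number): string => m[i] ?? '';
  const row: LogRow = {
    host: group(1),
    ident: group(2),
    authuser: group(3),
    date: group(4),
    request: group(5),
    status: Number(group(6)),
    size: group(7) === '-' ? null : Number(group(7)), // "-" means no body size was logged
  };

  const referer: string | undefined = m[8];
  const userAgent: string | undefined = m[9];
  if (referer !== undefined && userAgent !== undefined) {
    row.referer = referer;
    row.userAgent = userAgent;
  }
  return row;
}
```

Making the Combined suffix optional in one pattern mirrors the README's statement that `'clf'` accepts both variants; a stricter `'combined'` mode would require the suffix.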
95
114
 
96
115
  ### `csvToJson(csv, options?) => unknown[]`
97
- Lightweight CSV to JSON helper. Throws when row widths mismatch headers or when the delimiter is not a single character.
116
+ Lightweight CSV-to-JSON helper. Throws `ToonError` when row widths mismatch headers or when the delimiter is not a single character.
98
117
 
99
- ### `htmlToJson(html) => { children: ... }`
118
+ ### `htmlToJson(html, options?) => { children: ... }`
100
119
  Parses HTML into a simplified JSON tree. Performs a minimal tag-balance check and trims whitespace-only nodes. Not intended for arbitrary HTML with scripts/styles.
101
120
 
102
121
  ### `xmlToJson(xml, options?) => unknown`
103
122
  Validates XML before parsing; returns `{}` for empty input and throws on malformed XML.
104
123
 
124
+ ### `enforceInputLength(text, options?)`
125
+ Helper used by every `*ToToon` / `*ToJson` decoder entry point as a first-line resource cap. Re-exported for downstream packages that wrap TOON inputs.
105
126
 
106
- > [!WARNING]
107
- > **Security Note:** While `fast-xml-parser` v5 is generally secure by default, overriding `xmlOptions` can alter security properties (e.g., enabling entity expansion). Only enable such features if you trust the source XML.
127
+ ### Common options (`SecurityOptions`)
108
128
 
109
- Options:
110
- - `indent` (number, default `2`): spaces per indentation level.
111
- - `delimiter` (`,` | `|` | `\t`, default `,`): delimiter for inline arrays and tabular rows.
112
- - `sortKeys` (boolean, default `false`): sort object keys alphabetically instead of preserving encounter order.
113
129
  - `maxDepth` (number, default `64`): maximum nesting depth (objects + arrays).
114
130
  - `maxArrayLength` (number, default `50_000`): maximum allowed array length.
115
- - `maxTotalNodes` (number, default `250_000`): cap on processed fields/items to limit resource use.
131
+ - `maxTotalNodes` (number, default `250_000`): cap on processed fields/items.
132
+ - `maxInputLength` (number, default `5_000_000`): max raw input length in characters. Pass `Infinity` to disable.
116
133
  - `disallowedKeys` (string[], default `["__proto__", "constructor", "prototype"]`): keys rejected to prevent prototype pollution.
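To make the `maxInputLength` cap concrete, here is a minimal sketch of a first-line length check. `checkInputLength` and `DEFAULT_MAX_INPUT_LENGTH` are hypothetical names chosen for illustration; the sketch does not claim to match the signature or error type of the library's own `enforceInputLength`.

```typescript
// Hypothetical stand-in for a first-line resource cap; not the library's code.
const DEFAULT_MAX_INPUT_LENGTH = 5_000_000; // default stated in the README

function checkInputLength(
  text: string,
  maxInputLength: number = DEFAULT_MAX_INPUT_LENGTH,
): void {
  // `text.length` counts UTF-16 code units, the usual JavaScript notion of "characters".
  // Passing Infinity disables the cap because no length can exceed it.
  if (text.length > maxInputLength) {
    throw new RangeError(
      `Input length ${text.length} exceeds limit ${maxInputLength}`,
    );
  }
}
```

Running a check like this before any tokenizing keeps the cost of rejecting an oversized payload constant, independent of its contents.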
117
134
 
118
- Throws `ToonError` if limits are hit or input is not encodable.
135
+ `jsonToToon` additionally supports:
136
+ - `indent` (number, default `2`)
137
+ - `delimiter` (`,` | `|` | `\t`, default `,`)
138
+ - `sortKeys` (boolean, default `false`)
139
+ - `keyFolding` (`'off'` | `'safe'`, default `'off'`) — see below
140
+ - `flattenDepth` (number, default `Infinity` when folding is `'safe'`)
119
141
 
120
- ### `toonToJson(text, options?) => unknown`
142
+ `toonToJson` additionally supports:
143
+ - `strict` (boolean, default `true`)
144
+ - `expandPaths` (`'off'` | `'safe'`, default `'off'`) — see below
121
145
 
122
- Decodes TOON text back to JSON data.
146
+ All entry points throw `ToonError` if limits are hit or input is malformed.
123
147
 
124
- Options:
125
- - `strict` (boolean, default `true`): enforce declared array lengths, tabular row widths, and indentation consistency.
126
- - Same security options as `jsonToToon`: `maxDepth`, `maxArrayLength`, `maxTotalNodes`, `disallowedKeys`.
148
+ ## Key folding & path expansion (TOON v3 §13.4)
127
149
 
128
- Throws `ToonError` with line numbers when parsing fails or security limits are exceeded.
150
+ Both are **opt-in** and default to `'off'`, so existing output and parsing behavior are unchanged.
151
+
152
+ **Encoder** — collapse single-key object chains into dotted paths:
153
+
154
+ ```ts
155
+ jsonToToon({ a: { b: { c: 1 } } }, { keyFolding: 'safe' });
156
+ // "a.b.c: 1"
157
+
158
+ jsonToToon(
159
+ { data: { meta: { items: [{ id: 1 }, { id: 2 }] } } },
160
+ { keyFolding: 'safe' }
161
+ );
162
+ // data.meta.items[2]{id}:
163
+ // 1
164
+ // 2
165
+ ```
166
+
167
+ Cap fold length with `flattenDepth`:
168
+
169
+ ```ts
170
+ jsonToToon({ a: { b: { c: { d: 1 } } } }, { keyFolding: 'safe', flattenDepth: 2 });
171
+ // a.b:
172
+ // c.d: 1
173
+ ```
174
+
175
+ A chain is foldable only when:
176
+ 1. Every step is an object with exactly one key.
177
+ 2. Every segment matches the IdentifierSegment grammar `^[A-Za-z_][A-Za-z0-9_]*$`.
178
+ 3. The leaf is a primitive, array, `Date`, or empty object.
179
+ 4. The folded path doesn't collide with a literal sibling.
180
+ 5. No segment is in `disallowedKeys` (prototype-pollution guard).
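The five conditions above can be sketched as a single check. This is a literal reading of the list, not the library's encoder: `foldChain`, `isPlainObject`, `SEGMENT`, and `DISALLOWED` are hypothetical names, and `flattenDepth` is deliberately left out.

```typescript
// Hypothetical sketch of the foldability rules; ignores `flattenDepth`.
const SEGMENT = /^[A-Za-z_][A-Za-z0-9_]*$/;
const DISALLOWED = new Set(['__proto__', 'constructor', 'prototype']);

function isPlainObject(v: unknown): v is Record<string, unknown> {
  return typeof v === 'object' && v !== null && !Array.isArray(v) && !(v instanceof Date);
}

// Returns the folded path and its leaf for `key`, or null when the chain must stay nested.
function foldChain(
  key: string,
  value: unknown,
  siblingKeys: readonly string[],
): { path: string; leaf: unknown } | null {
  const segments: string[] = [key];
  let node: unknown = value;

  // Rule 1: descend while each step is an object with exactly one key.
  while (isPlainObject(node)) {
    const keys = Object.keys(node);
    if (keys.length !== 1) break;
    const only = keys[0] as string;
    segments.push(only);
    node = node[only];
  }
  if (segments.length < 2) return null; // nothing to fold

  // Rule 3: the leaf is a primitive, array, Date, or empty object.
  if (isPlainObject(node) && Object.keys(node).length > 0) return null;

  // Rules 2 and 5: identifier grammar and prototype-pollution guard.
  if (!segments.every((s) => SEGMENT.test(s) && !DISALLOWED.has(s))) return null;

  // Rule 4: the folded path must not collide with a literal sibling key.
  const path = segments.join('.');
  if (siblingKeys.indexOf(path) !== -1) return null;

  return { path, leaf: node };
}
```

For example, `foldChain('a', { b: { c: 1 } }, ['a'])` yields the path `a.b.c` with leaf `1`, while a sibling literally named `a.b.c` makes the same call return `null`.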
181
+
182
+ **Decoder** — expand dotted keys into nested objects:
183
+
184
+ ```ts
185
+ toonToJson('a.b.c: 1', { expandPaths: 'safe' });
186
+ // { a: { b: { c: 1 } } }
187
+
188
+ toonToJson(['a.b.c: 1', 'a.b.d: 2', 'a.e: 3'].join('\n'), { expandPaths: 'safe' });
189
+ // { a: { b: { c: 1, d: 2 }, e: 3 } }
190
+ ```
191
+
192
+ Conflicting paths throw `ToonError` in strict mode (default); with `strict: false`, the last write wins:
193
+
194
+ ```ts
195
+ toonToJson('a.b: 1\na: 2', { expandPaths: 'safe' }); // throws — object vs primitive
196
+ toonToJson('a.b: 1\na: 2', { expandPaths: 'safe', strict: false }); // { a: 2 }
197
+ ```
198
+
199
+ Disallowed segments (e.g. `__proto__`) cause `ToonError` regardless of `strict`.
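The expansion and conflict behavior described above can be sketched as follows. The example works on already-decoded `[key, value]` pairs rather than TOON text, and `expandDotted`, `Obj`, `isObj`, and `FORBIDDEN` are hypothetical names; a plain `Error` stands in for `ToonError`. Treat it as an approximation of the described semantics, not the library's decoder.

```typescript
// Hypothetical sketch of dotted-path expansion with strict / last-write-wins handling.
const FORBIDDEN = new Set(['__proto__', 'constructor', 'prototype']);

type Obj = { [key: string]: unknown };

function isObj(v: unknown): v is Obj {
  return typeof v === 'object' && v !== null && !Array.isArray(v);
}

function expandDotted(
  entries: ReadonlyArray<[string, unknown]>,
  strict: boolean = true,
): Obj {
  const root: Obj = Object.create(null);
  for (const [dotted, value] of entries) {
    const segments = dotted.split('.');

    // Disallowed segments are rejected regardless of `strict`.
    const bad = segments.find((s) => FORBIDDEN.has(s));
    if (bad !== undefined) throw new Error(`Disallowed key "${bad}"`);

    let node = root;
    for (let i = 0; i < segments.length - 1; i++) {
      const seg = segments[i] as string;
      const next = node[seg];
      if (isObj(next)) {
        node = next;
        continue;
      }
      // A non-object already sits where an object is needed.
      if (next !== undefined && strict) throw new Error(`Path conflict at "${seg}"`);
      const child: Obj = Object.create(null);
      node[seg] = child; // non-strict: the newer write replaces the old value
      node = child;
    }

    const last = segments[segments.length - 1] as string;
    const existing = node[last];
    // Object vs non-object at the same path is a conflict in strict mode.
    if (strict && existing !== undefined && isObj(existing) !== isObj(value)) {
      throw new Error(`Path conflict at "${last}"`);
    }
    node[last] = value;
  }
  return root;
}
```

With the pairs `['a.b', 1]` then `['a', 2]`, this sketch throws in strict mode and returns `{ a: 2 }` when `strict` is `false`, matching the behavior shown in the README snippet.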
129
200
 
130
201
  ## Usage examples
131
202
 
@@ -135,9 +206,9 @@ Throws `ToonError` with line numbers when parsing fails or security limits are e
135
206
  const toon = jsonToToon(data, { indent: 4, delimiter: '|' });
136
207
  ```
137
208
 
138
- ### Detect and emit tabular arrays
209
+ ### Tabular arrays
139
210
 
140
- Uniform arrays of objects with primitive values are emitted in TOONs table form automatically:
211
+ Uniform arrays of objects with primitive values are emitted in TOON's table form automatically:
141
212
 
142
213
  ```ts
143
214
  const toon = jsonToToon({ rows: [{ a: 1, b: 'x' }, { a: 2, b: 'y' }] });
@@ -152,39 +223,53 @@ Non-uniform arrays fall back to list form with `-` entries.
152
223
 
153
224
  ### Handling unsafe keys
154
225
 
155
- Prototype-polluting keys are rejected:
156
-
157
226
  ```ts
158
227
  toonToJson('__proto__: 1'); // throws ToonError: Disallowed key "__proto__"
159
- ```
160
-
161
- You can extend the blocklist:
162
-
163
- ```ts
164
228
  toonToJson('danger: 1', { disallowedKeys: ['danger'] }); // throws
165
229
  ```
166
230
 
167
231
  ### Enforcing strictness
168
232
 
169
- Strict mode (default) ensures array lengths match headers and tabular rows match declared widths:
170
-
171
233
  ```ts
172
- toonToJson('nums[2]: 1'); // throws ToonError: length mismatch
234
+ toonToJson('nums[2]: 1'); // throws ToonError: length mismatch
235
+ toonToJson('nums[2]: 1', { strict: false }); // { nums: [1] }
173
236
  ```
174
237
 
175
- Disable strictness if you need best-effort parsing:
238
+ ### Adapter examples
176
239
 
177
240
  ```ts
178
- const result = toonToJson('nums[2]: 1', { strict: false });
179
- // result: { nums: [1] }
241
+ import { csvToToon } from 'toon-parser/csv';
242
+ csvToToon('id,name\n1,Alice\n2,Bob');
243
+ /*
244
+ [2]{id,name}:
245
+ 1,Alice
246
+ 2,Bob
247
+ */
248
+
249
+ import { urlToToon } from 'toon-parser/url';
250
+ urlToToon('filter[type]=user&filter[active]=true');
251
+ /*
252
+ filter:
253
+ type: user
254
+ active: true
255
+ */
256
+
257
+ import { logToToon } from 'toon-parser/log';
258
+ logToToon('127.0.0.1 - - [10/Oct:12:00] "GET /" 200 512');
259
+ // CLF parsed into tabular form
180
260
  ```
181
261
 
182
262
  ### Security limits
183
263
 
184
264
  ```ts
185
- const opts = { maxDepth: 10, maxArrayLength: 1000, maxTotalNodes: 10_000 };
186
- jsonToToon(bigValue, opts); // throws if exceeded
187
- toonToJson(bigToonText, opts); // throws if exceeded
265
+ const opts = {
266
+ maxDepth: 10,
267
+ maxArrayLength: 1000,
268
+ maxTotalNodes: 10_000,
269
+ maxInputLength: 100_000
270
+ };
271
+ jsonToToon(bigValue, opts); // throws if exceeded
272
+ toonToJson(bigToonText, opts); // throws if exceeded
188
273
  ```
189
274
 
190
275
  ## Error handling
@@ -203,11 +288,12 @@ try {
203
288
 
204
289
  ## Design choices
205
290
 
206
- - **Tabular detection** follows the spec: all elements must be objects, share identical keys, and contain only primitives.
207
- - **String quoting** follows deterministic rules (quote numeric-looking strings, leading/trailing space, colon, delimiter, backslash, brackets, control chars, or leading hyphen).
291
+ - **Universal tabular support**: detects tabular structures in JSON/CSV/logs and optimizes them into compact TOON tables.
292
+ - **Format-preserving**: HTML and XML conversions preserve hierarchy and attributes (as keys) while ensuring output remains safe TOON.
293
+ - **Deterministic quoting**: string quoting follows strict rules to ensure round-trip safety.
208
294
  - **Finite numbers only**: `NaN`, `Infinity`, and `-Infinity` are rejected.
209
- - **No implicit path expansion**: dotted keys stay literal (e.g., `a.b` remains a single key).
295
+ - **Explicit pathing**: dotted keys stay literal by default. Opt into expansion with `expandPaths: 'safe'`.
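The tabular detection rule stated in the earlier README wording (all elements are objects, share identical keys, and contain only primitives) can be sketched as below. `tabularKeys` and `isPrimitive` are hypothetical names, and treating an empty array as non-tabular is an assumption of this sketch rather than documented behavior.

```typescript
// Hypothetical sketch of tabular-array detection; not the library's encoder.
type Primitive = string | number | boolean | null;

function isPrimitive(v: unknown): v is Primitive {
  return v === null || typeof v === 'string' || typeof v === 'number' || typeof v === 'boolean';
}

// Returns the shared column names when `rows` qualifies for table form, else null.
function tabularKeys(rows: readonly unknown[]): string[] | null {
  if (rows.length === 0) return null; // assumption: empty arrays are not tabular

  let header: string[] | null = null;
  for (const row of rows) {
    // Every element must be a non-array object.
    if (typeof row !== 'object' || row === null || Array.isArray(row)) return null;
    const rec = row as Record<string, unknown>;
    const keys = Object.keys(rec);

    if (header === null) {
      header = keys; // the first row fixes the column set and order
    } else if (
      keys.length !== header.length ||
      !header.every((k) => Object.prototype.hasOwnProperty.call(rec, k))
    ) {
      return null; // key sets differ
    }

    // Every cell must be a primitive.
    if (!keys.every((k) => isPrimitive(rec[k]))) return null;
  }
  return header;
}
```

Arrays that fail this check would take the list form with `-` entries that the README mentions for non-uniform arrays.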
210
296
 
211
297
  ## Project status
212
298
 
213
- This library targets TOON spec v2.1 core behaviors commonly needed for JSON round-trips. It prioritizes correctness and safety over permissiveness; loosen validation via `strict: false` only when you fully trust the input source.
299
+ This library targets the **TOON v3.0** spec (Working Draft, 2025-11-24). All v2.1 features remain supported; v3 extensions (key folding, path expansion) are opt-in. The library prioritizes correctness and safety over permissiveness; loosen validation via `strict: false` only when you fully trust the input source.
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "toon-parser",
3
- "version": "3.0.0",
3
+ "version": "3.0.1",
4
4
  "description": "Safe JSON <-> TOON encoder/decoder with strict validation.",
5
5
  "type": "module",
6
6
  "main": "dist/index.cjs",