jmd-format 0.0.1 → 0.1.1

package/README.md CHANGED
@@ -2,9 +2,9 @@
2
2
 
3
3
  **JMD (JSON Markdown) — JavaScript reference implementation.**
4
4
 
5
- JMD is a structured data format for LLM-driven infrastructure, designed to work
6
- *with* the natural generation behavior of large language models rather than
7
- against it. See the [JMD specification](https://github.com/ostermeyer/jmd-spec)
5
+ JMD is a structured data format for LLM-driven infrastructure, designed to
6
+ work *with* the natural generation behavior of large language models rather
7
+ than against it. See the [JMD specification](https://github.com/ostermeyer/jmd-spec)
8
8
  for the format itself.
9
9
 
10
10
  This package is the JavaScript reference implementation. A Python reference
@@ -12,11 +12,118 @@ implementation exists as the `jmd-format` package on PyPI.
12
12
 
13
13
  ## Status
14
14
 
15
- **Work in progress 0.0.x placeholder.** The package name is reserved; the
16
- implementation is under active design. Expect no functional code before 0.1.0.
15
+ **Byte-compatible with the Python reference.** The core syntax (all four
16
+ document modes, nested objects, arrays of scalars / objects /
17
+ sub-arrays, multiline blockquotes, frontmatter, thematic breaks)
18
+ produces output identical to `jmd-format` on PyPI for the same input
19
+ value and label — verified by a 10 000-case randomized
20
+ cross-implementation stress test.
17
21
 
18
- Follow [github.com/ostermeyer/jmd-js](https://github.com/ostermeyer/jmd-js) for
19
- updates.
22
+ Schema-specific type expressions (§14) and QBE filter conditions (§13)
23
+ are still parsed as raw strings on both sides; structured interpretation
24
+ will follow.
25
+
26
+ ## Install
27
+
28
+ ```
29
+ npm install jmd-format
30
+ ```
31
+
32
+ Node 20+ required. Pure ESM, no transpilation, no dependencies.
33
+
34
+ ## Usage
35
+
36
+ ### Batch
37
+
38
+ ```js
39
+ import { parse, serialize } from 'jmd-format'
40
+
41
+ const { mode, label, frontmatter, value } = parse(text)
42
+ const out = serialize({ id: 42, status: 'pending' }, 'Order')
43
+ ```
44
+
45
+ Frontmatter and alternate root modes are expressed at the call site:
46
+
47
+ ```js
48
+ serialize(value, '? Order', { page: 1, 'page-size': 50 }) // query mode
49
+ serialize(value, '! Order') // schema mode
50
+ serialize(value, '- Order') // delete mode
51
+ ```
52
+
53
+ ### Streaming
54
+
55
+ The parser and serializer both have streaming surfaces — async generator
56
+ for input, sync generator for output. Events follow the sequence from
57
+ spec §18.2.
58
+
59
+ ```js
60
+ import { createParser, toLines, serializeLines } from 'jmd-format'
61
+
62
+ // Parse a stream of arbitrary text chunks (e.g. an HTTP response body).
63
+ const parser = createParser()
64
+ for await (const event of parser.events(toLines(response.body))) {
65
+ // event: { type: 'field', key: 'id', value: 42 }
66
+ // event: { type: 'object_start', key: 'address' }
67
+ // event: { type: 'document_end' }
68
+ }
69
+
70
+ // Serialize line by line.
71
+ for (const line of serializeLines(value, 'Orders')) {
72
+ res.write(line) // each line includes its trailing newline
73
+ }
74
+ ```
75
+
76
+ `toLines(source)` is the adapter that turns an async iterable of arbitrary
77
+ string chunks into an async iterable of complete lines.
78
+
79
+ ## Currently supported
80
+
81
+ - All four document modes (`#`, `#!`, `#?`, `#-`): parsing recognizes the
82
+ mode and extracts the label; the body parses as standard JMD.
83
+ - Scalars: `null`, `true`, `false`, numbers, bare and quoted strings.
84
+ - Nested objects via heading depth.
85
+ - Arrays of scalars and of objects (with indented continuation fields).
86
+ - Sub-arrays and arrays of arrays (`### []`, §8.4).
87
+ - Blockquote multiline strings (§9.1).
88
+ - Frontmatter (§3.5): both `key: value` and bare-key forms.
89
+ - Deferred blank-line scope reset (§7.2a) — a blank followed by a
90
+ deeper heading re-enters the nested scope; a blank followed by a bare
91
+ field triggers the reset. Matches the Python reference behavior.
92
+ - Scalar headings for scope return (`## total: 84.99`, §7.2).
93
+ - Anonymous headings (§3.2a).
94
+ - Thematic breaks (`---`) as array-item separators (§8.6). Consumed by
95
+ the innermost enclosing array whose most-recent item is a dict with
96
+ nested structures.
97
+ - Depth-qualified array items (`##N -`, §8.6a) and depth+1 items
98
+ (`##N - key: val`, §8.6b). Parser-tolerance forms: the canonical
99
+ serializer emits bare `- ` items with thematic-break separators, but
100
+ the parser accepts the depth-annotated forms that LLMs tend to
101
+ produce when items sit one heading level below the array heading.
102
+ - **Streaming parser** via async generator: events match the sequence
103
+ defined in §18.2 (document_start, field, field_start, field_content,
104
+ object_start/end, array_start/end, item_start/value/end, scope_reset,
105
+ document_end, frontmatter).
106
+ - **Streaming serializer** via sync generator (`serializeLines`).
107
+ - Line adapter (`toLines`) to convert chunked input to lines.
108
+
109
+ ## Not yet structured
110
+
111
+ Schema-specific type expressions (§14) and QBE filter conditions (§13)
112
+ are parsed as raw strings on both this implementation and the Python
113
+ reference. Structured interpretation will follow in a later release.
114
+
115
+ ## Design
116
+
117
+ JavaScript-native throughout:
118
+
119
+ - Functions and closures, not classes. No `new`, no `this`.
120
+ - Plain objects as data carriers.
121
+ - ESM only; no build step.
122
+ - No external dependencies.
123
+ - Zero Node-specific APIs in the core — runs in the browser unchanged.
124
+
125
+ The implementation is strict on the generator side and tolerant on the
126
+ parser side, following §22.1 of the specification.
20
127
 
21
128
  ## License
22
129
 
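The Batch example in the README shows the call but not the resulting text. The sketch below is a simplified stand-in, not the package's `serialize`: `serializeFlat` is a hypothetical name, it accepts flat objects only, and it skips the quoting rules in `value.js`. It follows the forms that `serializer.js` in this diff emits at depth 1: a root heading, `key: value` lines, blockquote lines for multiline strings, and scalar headings for fields that come after a multiline string.

```js
// Simplified stand-in for the package serializer (hypothetical helper).
// Flat objects only; keys and values are written without quoting rules.
function serializeFlat(obj, label) {
  const lines = ['# ' + label]
  // Once a multiline field has been written, later scalars need a
  // depth-2 heading prefix so they return to the root scope.
  let needsHeading = false
  for (const [key, value] of Object.entries(obj)) {
    if (value !== null && typeof value === 'object') {
      throw new TypeError('serializeFlat handles flat objects only')
    }
    const prefix = needsHeading ? '## ' : ''
    if (typeof value === 'string' && value.includes('\n')) {
      // Multiline string: `key:` followed by one blockquote line per part.
      lines.push(prefix + key + ':')
      for (const part of value.split('\n')) {
        lines.push(part === '' ? '>' : '> ' + part)
      }
      needsHeading = true
    } else {
      lines.push(prefix + key + ': ' + String(value))
    }
  }
  return lines.join('\n')
}

console.log(serializeFlat({ id: 42, status: 'pending' }, 'Order'))
console.log(serializeFlat({ note: 'line one\n\nline three', total: 84.99 }, 'Order'))
```

The first call prints a `# Order` heading followed by `id: 42` and `status: pending`. The second shows the blockquote form and the `## total: 84.99` scalar heading that the README lists under scope return.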
package/package.json CHANGED
@@ -1,8 +1,12 @@
1
1
  {
2
2
  "name": "jmd-format",
3
- "version": "0.0.1",
3
+ "version": "0.1.1",
4
4
  "description": "JMD (JSON Markdown) — structured data format for LLM-driven infrastructure. JavaScript reference implementation.",
5
5
  "type": "module",
6
+ "main": "./src/index.js",
7
+ "exports": {
8
+ ".": "./src/index.js"
9
+ },
6
10
  "author": {
7
11
  "name": "Andreas Ostermeyer",
8
12
  "email": "andreas@ostermeyer.de"
@@ -28,7 +32,11 @@
28
32
  "engines": {
29
33
  "node": ">=20"
30
34
  },
35
+ "scripts": {
36
+ "test": "node --test test/*.test.js"
37
+ },
31
38
  "files": [
39
+ "src/",
32
40
  "README.md",
33
41
  "LICENSE"
34
42
  ]
package/src/index.js ADDED
@@ -0,0 +1,7 @@
1
+ // jmd-format — JavaScript reference implementation.
2
+ //
3
+ // Public surface: minimal on purpose. Batch API for the common case,
4
+ // streaming API for large or incremental workloads.
5
+
6
+ export { parse, createParser, toLines } from './parser.js'
7
+ export { serialize, serializeLines } from './serializer.js'
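The streaming surface exported above yields plain event objects. As a rough illustration of how a consumer might use them, the sketch below folds a hand-written event list into an object. `foldEvents` is a hypothetical helper, it handles only `document_start`, `field`, `object_start`, and `object_end`, and the sample list is an assumption modeled on the event shapes shown in the README; the real parser's exact sequence may differ.

```js
// Hypothetical consumer: fold a subset of parse events into a plain object.
// Array events and blockquote events are ignored in this sketch.
function foldEvents(events) {
  let root = null
  const stack = []
  for (const ev of events) {
    if (ev.type === 'document_start') {
      // The root object has no object_start of its own.
      root = {}
      stack.push(root)
    } else if (ev.type === 'field') {
      stack[stack.length - 1][ev.key] = ev.value
    } else if (ev.type === 'object_start') {
      const child = {}
      stack[stack.length - 1][ev.key] = child
      stack.push(child)
    } else if (ev.type === 'object_end') {
      stack.pop()
    }
  }
  return root
}

// Hand-written sample, assumed to resemble the events for a small document
// with one nested object.
const sampleEvents = [
  { type: 'document_start', mode: 'data', label: 'Order' },
  { type: 'field', key: 'id', value: 42 },
  { type: 'object_start', key: 'address' },
  { type: 'field', key: 'city', value: 'Berlin' },
  { type: 'object_end', key: 'address' },
  { type: 'object_end' },
  { type: 'document_end' },
]

console.log(JSON.stringify(foldEvents(sampleEvents)))
```

With the package installed, the same loop body could sit inside `for await (const event of parser.events(...))` instead of iterating a fixed array.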
package/src/parser.js ADDED
@@ -0,0 +1,501 @@
1
+ // JMD parser.
2
+ //
3
+ // The parser processes a JMD document line by line, maintaining a scope
4
+ // stack driven by heading depth. It has two surfaces:
5
+ //
6
+ // - parse(text) — batch. Returns { mode, label, frontmatter, value }.
7
+ // - events(lineSource) — streaming. Async generator of parse events.
8
+ //
9
+ // Both share the same line-processing core. Events follow the sequence
10
+ // defined in JMD spec §18.2. Parser-tolerant per §22.1.
11
+
12
+ import { parseScalar, parseKey, parseField } from './value.js'
13
+
14
+ const HEADING = /^(#+)([!?-])?(?:\s+(.*))?$/
15
+
16
+ export function createParser() {
17
+ let lineNo = 0
18
+
19
+ // Document-level state.
20
+ let mode = null
21
+ let label = null
22
+ const frontmatter = {}
23
+ let inFrontmatter = true
24
+ let seenRoot = false
25
+ let root = null
26
+
27
+ // Scope stack. Each entry:
28
+ // { kind: 'object' | 'array', container, depth, currentItem? }
29
+ // currentItem lives on array scopes only and holds the object built by
30
+ // the most recent `- ` line so indented continuations attach to it.
31
+ let stack = []
32
+
33
+ // Pending blockquote state.
34
+ // { container, key, lines }
35
+ let bq = null
36
+
37
+ // A blank line may or may not terminate the current scope — it depends on
38
+ // what comes after. We defer the decision: flag the blank, then let the
39
+ // next real line resolve it. A deeper heading re-enters (no reset); a bare
40
+ // field at the root indicates a scope return (full reset); a `---` inside
41
+ // an array is a thematic break (item separator, §8.6).
42
+ let blankPending = false
43
+
44
+ // Events emitted by the current line — drained on each processLine call.
45
+ let pending = []
46
+ function emit(type, data) {
47
+ pending.push(data ? { type, ...data } : { type })
48
+ }
49
+ function drain() {
50
+ const out = pending
51
+ pending = []
52
+ return out
53
+ }
54
+
55
+ // --- Line processing -----------------------------------------------------
56
+
57
+ function processLine(rawLine) {
58
+ lineNo++
59
+ const line = rawLine.replace(/\r$/, '')
60
+
61
+ if (bq !== null) {
62
+ if (line === '>' || line.startsWith('> ')) {
63
+ const content = line === '>' ? '' : line.slice(2)
64
+ bq.lines.push(content)
65
+ emit('field_content', { text: content })
66
+ return drain()
67
+ }
68
+ commitBlockquote()
69
+ }
70
+
71
+ if (/^\s*$/.test(line)) { blankPending = true; return drain() }
72
+
73
+ const h = HEADING.exec(line)
74
+ if (h) {
75
+ // A heading stands on its own authority — its depth drives scope.
76
+ blankPending = false
77
+ onHeading(h[1].length, h[2] || '', h[3] || '')
78
+ return drain()
79
+ }
80
+
81
+ if (inFrontmatter) { onFrontmatter(line); return drain() }
82
+
83
+ if (/^\s{2,}/.test(line)) {
84
+ blankPending = false
85
+ onIndented(line)
86
+ return drain()
87
+ }
88
+
89
+ // Thematic break: `---` (or more) at column 0, only meaningful inside
90
+ // an array scope, where it terminates the current item.
91
+ if (/^-{3,}$/.test(line)) {
92
+ onThematicBreak()
93
+ return drain()
94
+ }
95
+
96
+ if (line === '-' || line.startsWith('- ')) {
97
+ if (blankPending) applyBlankReset()
98
+ onItem(line)
99
+ return drain()
100
+ }
101
+
102
+ if (blankPending) applyBlankReset()
103
+ onField(line)
104
+ return drain()
105
+ }
106
+
107
+ function applyBlankReset() {
108
+ blankPending = false
109
+ if (stack.length <= 1) return
110
+ emit('scope_reset')
111
+ while (stack.length > 1) popScope()
112
+ if (stack[0] && stack[0].kind === 'array') closeItem(stack[0])
113
+ }
114
+
115
+ function onThematicBreak() {
116
+ blankPending = false
117
+ // A thematic break is consumed by the innermost enclosing array whose
118
+ // most-recent item is a dict containing nested structures — this is
119
+ // the only context where jmd-format emits the break, and the parser
120
+ // mirrors that rule (spec §8.6). Inner scopes are closed; if no array
121
+ // on the stack qualifies, the line is tolerated as decoration.
122
+ let targetIdx = -1
123
+ for (let i = stack.length - 1; i >= 0; i--) {
124
+ const s = stack[i]
125
+ if (s.kind !== 'array') continue
126
+ const last = s.container[s.container.length - 1]
127
+ if (last && typeof last === 'object' && !Array.isArray(last)
128
+ && Object.values(last).some(
129
+ v => v !== null && typeof v === 'object')) {
130
+ targetIdx = i
131
+ break
132
+ }
133
+ }
134
+ if (targetIdx === -1) return
135
+ while (stack.length - 1 > targetIdx) popScope()
136
+ closeItem(stack[targetIdx])
137
+ }
138
+
139
+ // --- Blockquote ----------------------------------------------------------
140
+
141
+ function startBlockquote(container, key) {
142
+ bq = { container, key, lines: [] }
143
+ emit('field_start', { key })
144
+ }
145
+
146
+ function commitBlockquote() {
147
+ bq.container[bq.key] = bq.lines.join('\n')
148
+ bq = null
149
+ }
150
+
151
+ // --- Root / frontmatter --------------------------------------------------
152
+
153
+ function onFrontmatter(line) {
154
+ const f = parseField(line)
155
+ if (f) {
156
+ const v = f.empty ? true : f.value
157
+ frontmatter[f.key] = v
158
+ emit('frontmatter', { key: f.key, value: v })
159
+ return
160
+ }
161
+ const pk = parseKey(line)
162
+ if (pk && pk.rest === '') {
163
+ frontmatter[pk.key] = true
164
+ emit('frontmatter', { key: pk.key, value: true })
165
+ return
166
+ }
167
+ throw parseError('Unexpected line before root heading')
168
+ }
169
+
170
+ function openRoot(modeMark, text) {
171
+ inFrontmatter = false
172
+ seenRoot = true
173
+
174
+ if (modeMark === '!') mode = 'schema'
175
+ else if (modeMark === '?') mode = 'query'
176
+ else if (modeMark === '-') mode = 'delete'
177
+ else mode = 'data'
178
+
179
+ if (text === '[]') {
180
+ label = '[]'
181
+ root = []
182
+ stack = [{ kind: 'array', container: root, depth: 1, currentItem: null }]
183
+ } else if (text.endsWith('[]')) {
184
+ label = text.slice(0, -2)
185
+ root = []
186
+ stack = [{ kind: 'array', container: root, depth: 1, currentItem: null }]
187
+ } else {
188
+ label = text
189
+ root = {}
190
+ stack = [{ kind: 'object', container: root, depth: 1 }]
191
+ }
192
+ emit('document_start', { mode, label })
193
+ }
194
+
195
+ // --- Headings ------------------------------------------------------------
196
+
197
+ function onHeading(depth, modeMark, text) {
198
+ if (!seenRoot) {
199
+ if (depth !== 1) {
200
+ throw parseError('Document must begin with a depth-1 heading')
201
+ }
202
+ openRoot(modeMark, text)
203
+ return
204
+ }
205
+ if (modeMark) {
206
+ throw parseError('Mode markers (!, ?, -) are only valid on the root heading')
207
+ }
208
+
209
+ // Depth-qualified array item (§8.6a) or depth+1 item (§8.6b):
210
+ // `##N -` or `##N - key: val` starts a new item in an enclosing array.
211
+ // Resolution prefers the innermost array at depth N; failing that, an
212
+ // array at depth N-1 (the LLM-natural "items under the heading" form).
213
+ if (text === '-' || text.startsWith('- ')) {
214
+ onDepthQualifiedItem(depth, text)
215
+ return
216
+ }
217
+
218
+ popToDepth(depth)
219
+
220
+ if (text === '' || text === undefined) {
221
+ openObjectScope(depth, '')
222
+ return
223
+ }
224
+
225
+ // Anonymous sub-array: `### []` opens a nested array as the next item of the enclosing array.
226
+ if (text === '[]') {
227
+ openSubArray(depth)
228
+ return
229
+ }
230
+
231
+ if (text.endsWith('[]')) {
232
+ const keyText = text.slice(0, -2)
233
+ const pk = parseKey(keyText)
234
+ if (!pk || pk.rest !== '') throw parseError('Malformed array heading key')
235
+ openArrayScope(depth, pk.key)
236
+ return
237
+ }
238
+
239
+ const field = parseField(text)
240
+ if (field && !field.empty) {
241
+ const parent = parentObjectAt(depth)
242
+ parent[field.key] = field.value
243
+ emit('field', { key: field.key, value: field.value })
244
+ return
245
+ }
246
+
247
+ if (field && field.empty) {
248
+ const parent = parentObjectAt(depth)
249
+ startBlockquote(parent, field.key)
250
+ return
251
+ }
252
+
253
+ const pk = parseKey(text)
254
+ if (pk && pk.rest === '') {
255
+ openObjectScope(depth, pk.key)
256
+ return
257
+ }
258
+
259
+ throw parseError('Malformed heading')
260
+ }
261
+
262
+ function openObjectScope(depth, key) {
263
+ const parent = parentObjectAt(depth)
264
+ const obj = {}
265
+ parent[key] = obj
266
+ stack.push({ kind: 'object', container: obj, depth })
267
+ emit('object_start', { key })
268
+ }
269
+
270
+ function openArrayScope(depth, key) {
271
+ const parent = parentObjectAt(depth)
272
+ const arr = []
273
+ parent[key] = arr
274
+ stack.push({ kind: 'array', container: arr, depth, currentItem: null })
275
+ emit('array_start', { key })
276
+ }
277
+
278
+ function openSubArray(depth) {
279
+ // `### []`: start a new anonymous array as the next item of the
280
+ // enclosing array scope.
281
+ const top = stack[stack.length - 1]
282
+ if (!top || top.kind !== 'array') {
283
+ throw parseError('Anonymous sub-array outside array scope')
284
+ }
285
+ closeItem(top)
286
+ const inner = []
287
+ top.container.push(inner)
288
+ stack.push({ kind: 'array', container: inner, depth, currentItem: null })
289
+ emit('array_start', {})
290
+ }
291
+
292
+ function parentObjectAt(depth) {
293
+ for (let i = stack.length - 1; i >= 0; i--) {
294
+ const s = stack[i]
295
+ if (s.depth < depth) {
296
+ if (s.kind === 'object') return s.container
297
+ if (s.kind === 'array' && s.currentItem) return s.currentItem
298
+ throw parseError('Field has no enclosing object scope')
299
+ }
300
+ }
301
+ throw parseError('No enclosing scope for depth ' + depth)
302
+ }
303
+
304
+ function popToDepth(targetDepth) {
305
+ while (stack.length > 1 && stack[stack.length - 1].depth >= targetDepth) {
306
+ popScope()
307
+ }
308
+ }
309
+
310
+ function popScope() {
311
+ const s = stack.pop()
312
+ if (s.kind === 'array') {
313
+ closeItem(s)
314
+ emit('array_end', s.depth === 1 ? {} : { key: keyOfScope(s) })
315
+ } else {
316
+ emit('object_end', { key: keyOfScope(s) })
317
+ }
318
+ }
319
+
320
+ function closeItem(arrayScope) {
321
+ if (arrayScope.currentItem !== null) {
322
+ emit('item_end')
323
+ arrayScope.currentItem = null
324
+ }
325
+ }
326
+
327
+ // The parent container holds the scope's container under a known key;
328
+ // look it up once so end events can name what closed.
329
+ function keyOfScope(scope) {
330
+ // Walk the stack below; find the container that holds scope.container.
331
+ // Micro-inefficient but runs once per pop — fine for now.
332
+ for (let i = stack.length - 1; i >= 0; i--) {
333
+ const parent = stack[i]
334
+ const bag = parent.kind === 'array' && parent.currentItem
335
+ ? parent.currentItem
336
+ : parent.container
337
+ for (const k of Object.keys(bag)) {
338
+ if (bag[k] === scope.container) return k
339
+ }
340
+ }
341
+ return undefined
342
+ }
343
+
344
+ // --- Bare fields ---------------------------------------------------------
345
+
346
+ function onField(line) {
347
+ const f = parseField(line)
348
+ if (!f) throw parseError('Malformed field line')
349
+ const top = stack[stack.length - 1]
350
+ const target = top.kind === 'object'
351
+ ? top.container
352
+ : (top.currentItem || throwHere('Bare field inside array scope without an item'))
353
+ if (f.empty) {
354
+ startBlockquote(target, f.key)
355
+ return
356
+ }
357
+ target[f.key] = f.value
358
+ emit('field', { key: f.key, value: f.value })
359
+ }
360
+
361
+ // --- Array items ---------------------------------------------------------
362
+
363
+ function onDepthQualifiedItem(headingDepth, text) {
364
+ // Find target: innermost array at depth == headingDepth wins (§8.6a).
365
+ // Else fall back to array at depth == headingDepth - 1 (§8.6b — the
366
+ // LLM-natural pattern of writing items one heading-level under the
367
+ // array heading).
368
+ let sameDepthIdx = -1
369
+ let parentDepthIdx = -1
370
+ for (let i = stack.length - 1; i >= 0; i--) {
371
+ const s = stack[i]
372
+ if (s.kind !== 'array') continue
373
+ if (sameDepthIdx === -1 && s.depth === headingDepth) sameDepthIdx = i
374
+ if (parentDepthIdx === -1 && s.depth === headingDepth - 1) {
375
+ parentDepthIdx = i
376
+ }
377
+ }
378
+ const targetIdx = sameDepthIdx !== -1 ? sameDepthIdx : parentDepthIdx
379
+ if (targetIdx === -1) {
380
+ throw parseError('Depth-qualified item has no matching array scope')
381
+ }
382
+ // Close any scopes nested inside the target array.
383
+ while (stack.length - 1 > targetIdx) popScope()
384
+ // Reuse the regular item handler with the target array on top.
385
+ onItem(text)
386
+ }
387
+
388
+ function onItem(line) {
389
+ const top = stack[stack.length - 1]
390
+ if (top.kind !== 'array') throw parseError('Array item outside array scope')
391
+
392
+ closeItem(top)
393
+ const rest = line === '-' ? '' : line.slice(2)
394
+
395
+ if (rest === '') {
396
+ const item = {}
397
+ top.container.push(item)
398
+ top.currentItem = item
399
+ emit('item_start')
400
+ return
401
+ }
402
+
403
+ const f = parseField(rest)
404
+ if (f) {
405
+ const item = {}
406
+ top.container.push(item)
407
+ top.currentItem = item
408
+ emit('item_start')
409
+ if (f.empty) {
410
+ startBlockquote(item, f.key)
411
+ } else {
412
+ item[f.key] = f.value
413
+ emit('field', { key: f.key, value: f.value })
414
+ }
415
+ return
416
+ }
417
+
418
+ const value = parseScalar(rest)
419
+ top.container.push(value)
420
+ top.currentItem = null
421
+ emit('item_value', { value })
422
+ }
423
+
424
+ function onIndented(line) {
425
+ const content = line.replace(/^\s+/, '')
426
+ const top = stack[stack.length - 1]
427
+ if (top.kind !== 'array' || !top.currentItem) {
428
+ throw parseError('Indented continuation without an active array item')
429
+ }
430
+ const f = parseField(content)
431
+ if (!f) throw parseError('Malformed indented continuation')
432
+ if (f.empty) {
433
+ startBlockquote(top.currentItem, f.key)
434
+ return
435
+ }
436
+ top.currentItem[f.key] = f.value
437
+ emit('field', { key: f.key, value: f.value })
438
+ }
439
+
440
+ // --- Finalization --------------------------------------------------------
441
+
442
+ function finish() {
443
+ if (bq !== null) commitBlockquote()
444
+ if (!seenRoot) {
445
+ throw parseError('Document contained no root heading')
446
+ }
447
+ while (stack.length > 0) popScope()
448
+ emit('document_end')
449
+ return drain()
450
+ }
451
+
452
+ // --- Errors --------------------------------------------------------------
453
+
454
+ function parseError(msg) {
455
+ const err = new Error(msg + ' (line ' + lineNo + ')')
456
+ err.line = lineNo
457
+ return err
458
+ }
459
+ function throwHere(msg) { throw parseError(msg) }
460
+
461
+ // --- Public surface ------------------------------------------------------
462
+
463
+ function parse(text) {
464
+ // A trailing newline is a line terminator, not a blank line — drop
465
+ // any empty final element from the split.
466
+ const lines = text.split('\n')
467
+ if (lines.length && lines[lines.length - 1] === '') lines.pop()
468
+ for (const line of lines) processLine(line)
469
+ finish()
470
+ return { mode, label, frontmatter, value: root }
471
+ }
472
+
473
+ async function* events(source) {
474
+ for await (const line of source) {
475
+ for (const ev of processLine(line)) yield ev
476
+ }
477
+ for (const ev of finish()) yield ev
478
+ }
479
+
480
+ return { processLine, finish, parse, events }
481
+ }
482
+
483
+ export function parse(text) {
484
+ return createParser().parse(text)
485
+ }
486
+
487
+ // Line adapter: turn an async iterable of arbitrary string chunks (e.g. a
488
+ // fetch response body stream) into an async iterable of lines. Trailing
489
+ // `\r` is stripped; the final line is emitted even if unterminated.
490
+ export async function* toLines(source) {
491
+ let buffer = ''
492
+ for await (const chunk of source) {
493
+ buffer += typeof chunk === 'string' ? chunk : String(chunk)
494
+ let idx
495
+ while ((idx = buffer.indexOf('\n')) !== -1) {
496
+ yield buffer.slice(0, idx).replace(/\r$/, '')
497
+ buffer = buffer.slice(idx + 1)
498
+ }
499
+ }
500
+ if (buffer !== '') yield buffer.replace(/\r$/, '')
501
+ }
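Most of the parser's dispatch hangs off the `HEADING` pattern near the top of `parser.js`. The sketch below copies that pattern and names its three capture groups so the decomposition is easier to follow. `splitHeading` is a hypothetical helper for illustration and is not part of the package.

```js
// Same pattern as the HEADING constant in parser.js.
const HEADING = /^(#+)([!?-])?(?:\s+(.*))?$/

// Hypothetical helper: split a line into depth, mode marker, and text,
// or return null when the line is not a heading.
function splitHeading(line) {
  const m = HEADING.exec(line)
  if (m === null) return null
  return { depth: m[1].length, modeMark: m[2] || '', text: m[3] || '' }
}

const samples = [
  '#? Order',        // root heading with the query mode marker
  '## address',      // nested object
  '### []',          // anonymous sub-array
  '## - sku: A1',    // depth-qualified array item
  '## total: 84.99', // scalar heading
  'id: 42',          // bare field, not a heading
]
for (const s of samples) {
  console.log(JSON.stringify(s), '->', JSON.stringify(splitHeading(s)))
}
```

The mode marker is only captured when it directly follows the `#` run. In `## - sku: A1` the dash comes after a space, so it lands in `text`, which is how `onHeading` recognizes the depth-qualified item form.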
package/src/serializer.js ADDED
@@ -0,0 +1,180 @@
1
+ // JMD serializer.
2
+ //
3
+ // Two surfaces over one implementation:
4
+ //
5
+ // - serializeLines(value, label, frontmatter) — generator of lines with
6
+ // trailing newlines. Suitable
7
+ // for streaming to a transport.
8
+ // - serialize(value, label, frontmatter) — returns the full document
9
+ // as a single string (no
10
+ // trailing newline), matching
11
+ // the byte form emitted by
12
+ // the jmd-format Python
13
+ // reference implementation.
14
+ //
15
+ // Generator-strict per §22.1: output matches the canonical form that a
16
+ // conforming parser accepts without tolerance.
17
+
18
+ import { serializeScalar, serializeKey } from './value.js'
19
+
20
+ export function serialize(value, label = 'Document', frontmatter = null) {
21
+ const lines = []
22
+ emitDocument(value, label, frontmatter, lines)
23
+ return lines.join('\n')
24
+ }
25
+
26
+ export function* serializeLines(value, label = 'Document', frontmatter = null) {
27
+ const lines = []
28
+ emitDocument(value, label, frontmatter, lines)
29
+ for (const ln of lines) yield ln + '\n'
30
+ }
31
+
32
+ function emitDocument(value, label, frontmatter, lines) {
33
+ if (frontmatter && Object.keys(frontmatter).length > 0) {
34
+ for (const [k, v] of Object.entries(frontmatter)) {
35
+ if (v === true) lines.push(serializeKey(k))
36
+ else lines.push(serializeKey(k) + ': ' + serializeScalar(v))
37
+ }
38
+ lines.push('')
39
+ }
40
+
41
+ if (Array.isArray(value)) {
42
+ const head = label === '[]' ? '# []' : '# ' + label + '[]'
43
+ lines.push(head)
44
+ writeArrayItems(value, lines, 1)
45
+ } else if (value !== null && typeof value === 'object') {
46
+ lines.push('# ' + label)
47
+ writeObjectFields(value, lines, 1)
48
+ } else {
49
+ throw new TypeError('Root value must be an object or array')
50
+ }
51
+ }
52
+
53
+ function heading(depth) {
54
+ return '#'.repeat(depth) + ' '
55
+ }
56
+
57
+ function writeMultiline(value, lines) {
58
+ for (const part of value.split('\n')) {
59
+ lines.push(part === '' ? '>' : '> ' + part)
60
+ }
61
+ }
62
+
63
+ function writeObjectFields(obj, lines, depth) {
64
+ let needsHeading = false
65
+ for (const [key, value] of Object.entries(obj)) {
66
+ const k = serializeKey(key)
67
+ if (isPlainObject(value)) {
68
+ lines.push('')
69
+ lines.push(heading(depth + 1) + k)
70
+ writeObjectFields(value, lines, depth + 1)
71
+ needsHeading = true
72
+ } else if (Array.isArray(value)) {
73
+ lines.push('')
74
+ lines.push(heading(depth + 1) + k + '[]')
75
+ writeArrayItems(value, lines, depth + 1)
76
+ needsHeading = true
77
+ } else if (typeof value === 'string' && value.includes('\n')) {
78
+ lines.push((needsHeading ? heading(depth + 1) : '') + k + ':')
79
+ writeMultiline(value, lines)
80
+ needsHeading = true
81
+ } else if (needsHeading) {
82
+ lines.push(heading(depth + 1) + k + ': ' + serializeScalar(value))
83
+ } else {
84
+ lines.push(k + ': ' + serializeScalar(value))
85
+ }
86
+ }
87
+ }
88
+
89
+ function writeArrayItems(lst, lines, depth) {
90
+ if (lst.length === 0) return
91
+
92
+ const allLists = lst.every(i => Array.isArray(i))
93
+ const allDicts = lst.every(isPlainObject)
94
+ const allScalars = lst.every(i => !isNested(i))
95
+
96
+ if (allLists) {
97
+ for (const item of lst) {
98
+ lines.push(heading(depth + 1) + '[]')
99
+ writeArrayItems(item, lines, depth + 1)
100
+ }
101
+ return
102
+ }
103
+
104
+ if (allDicts) {
105
+ const hasNested = lst.some(item =>
106
+ Object.values(item).some(isNested))
107
+ for (let i = 0; i < lst.length; i++) {
108
+ writeDictItem(lst[i], lines, depth, i > 0 && hasNested)
109
+ }
110
+ return
111
+ }
112
+
113
+ if (allScalars) {
114
+ for (const item of lst) {
115
+ lines.push('- ' + serializeScalar(item))
116
+ }
117
+ return
118
+ }
119
+
120
+ // Heterogeneous array.
121
+ //
122
+ // The C-accelerated Python reference does not insert thematic breaks
123
+ // inside a heterogeneous array — items simply follow one another. We
124
+ // match that form byte-for-byte.
125
+ for (const item of lst) {
126
+ if (isPlainObject(item)) {
127
+ writeDictItem(item, lines, depth, false)
128
+ } else if (Array.isArray(item)) {
129
+ lines.push(heading(depth + 1) + '[]')
130
+ writeArrayItems(item, lines, depth + 1)
131
+ } else {
132
+ lines.push('- ' + serializeScalar(item))
133
+ }
134
+ }
135
+ }
136
+
137
+ function writeDictItem(item, lines, depth, separatorNeeded) {
138
+ const scalarFields = []
139
+ const nestedFields = []
140
+ for (const [k, v] of Object.entries(item)) {
141
+ if (isNested(v)) nestedFields.push([k, v])
142
+ else scalarFields.push([k, v])
143
+ }
144
+
145
+ if (separatorNeeded) {
146
+ // Match the C-accelerated Python serializer (the default in jmd-format):
147
+ // blank line before the `---`, but the next `- ` follows immediately on
148
+ // the next line — no blank after the thematic break.
149
+ lines.push('')
150
+ lines.push('---')
151
+ }
152
+
153
+ if (scalarFields.length === 0) {
154
+ lines.push('-')
155
+ } else {
156
+ let first = true
157
+ for (const [k, v] of scalarFields) {
158
+ const sv = serializeScalar(v)
159
+ const qk = serializeKey(k)
160
+ if (first) {
161
+ lines.push('- ' + qk + ': ' + sv)
162
+ first = false
163
+ } else {
164
+ lines.push(' ' + qk + ': ' + sv)
165
+ }
166
+ }
167
+ }
168
+
169
+ if (nestedFields.length > 0) {
170
+ writeObjectFields(Object.fromEntries(nestedFields), lines, depth)
171
+ }
172
+ }
173
+
174
+ function isPlainObject(v) {
175
+ return v !== null && typeof v === 'object' && !Array.isArray(v)
176
+ }
177
+
178
+ function isNested(v) {
179
+ return v !== null && typeof v === 'object'
180
+ }
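`writeArrayItems` picks an output form by testing the array's contents in a fixed order, and only one branch ever emits `---` separators. The sketch below isolates that decision. `describeArray` is a hypothetical helper written for illustration; the two predicates are copied from the end of `serializer.js`.

```js
// Predicates copied from serializer.js.
function isPlainObject(v) {
  return v !== null && typeof v === 'object' && !Array.isArray(v)
}
function isNested(v) {
  return v !== null && typeof v === 'object'
}

// Hypothetical helper: report which branch writeArrayItems would take and
// whether any thematic-break separator would be written.
function describeArray(lst) {
  if (lst.length === 0) return { shape: 'empty', separators: false }
  if (lst.every(i => Array.isArray(i))) return { shape: 'lists', separators: false }
  if (lst.every(isPlainObject)) {
    // Separators precede every item after the first, and only when at
    // least one item carries a nested object or array.
    const hasNested = lst.some(item => Object.values(item).some(isNested))
    return { shape: 'dicts', separators: lst.length > 1 && hasNested }
  }
  if (lst.every(i => !isNested(i))) return { shape: 'scalars', separators: false }
  return { shape: 'mixed', separators: false }
}

console.log(describeArray([{ id: 1 }, { id: 2 }]))
console.log(describeArray([{ id: 1, tags: ['a'] }, { id: 2 }]))
console.log(describeArray([1, { id: 1 }]))
```

The middle call is the only one that reports `separators: true`, matching the comment in the source that heterogeneous arrays get no thematic breaks.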
package/src/value.js ADDED
@@ -0,0 +1,106 @@
1
+ // Scalar values and keys.
2
+ //
3
+ // Parsing and serialization of the smallest JMD units: scalar values
4
+ // (null, booleans, numbers, strings) and field keys. These helpers are
5
+ // used by both parser and serializer, and stay at the character level —
6
+ // line structure and scope are handled one layer up.
7
+
8
+ const NUMBER = /^-?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+-]?\d+)?$/
9
+ const BARE_KEY = /^[A-Za-z0-9_-]+/
10
+
11
+ // Parse the text that appears after `key: ` into a JS value.
12
+ // §2.1: bare values are tried as null, true, false, number; otherwise string.
13
+ export function parseScalar(raw) {
14
+ if (raw === '') return ''
15
+ if (raw === 'null') return null
16
+ if (raw === 'true') return true
17
+ if (raw === 'false') return false
18
+ if (NUMBER.test(raw)) return Number(raw)
19
+ if (raw.charCodeAt(0) === 34 /* " */) {
20
+ // Quoted string — JSON semantics for escapes (§5, RFC 8259).
21
+ return JSON.parse(raw)
22
+ }
23
+ return raw
24
+ }
25
+
26
+ // Serialize a scalar for use as a field value.
27
+ // §6.1: quote strings that would otherwise be misread (as type or structure).
28
+ export function serializeScalar(value) {
29
+ if (value === null) return 'null'
30
+ if (value === true) return 'true'
31
+ if (value === false) return 'false'
32
+ if (typeof value === 'number') {
33
+ if (!Number.isFinite(value)) {
34
+ throw new TypeError('Cannot serialize non-finite number: ' + value)
35
+ }
36
+ return String(value)
37
+ }
38
+ if (typeof value === 'string') {
39
+ return needsQuoting(value) ? JSON.stringify(value) : value
40
+ }
41
+ throw new TypeError('Cannot serialize scalar of type ' + typeof value)
42
+ }
43
+
44
+ function needsQuoting(s) {
45
+ // Quoting rules matching the jmd-format Python reference:
46
+ // - empty string, ambiguous scalars, and numbers ⇒ always quote
47
+ // - structural prefixes (`# `, `- `) ⇒ quote
48
+ // - strings starting with `"` ⇒ quote (otherwise ambiguous with quoted form)
49
+ // - strings containing newline or tab ⇒ quote (JSON-escape line structure)
50
+ // - internal quotes and backslashes are left bare — the parser is
51
+ // tolerant enough to accept them, and Python's serializer does the same.
52
+ if (s === '') return true
53
+ if (s === 'null' || s === 'true' || s === 'false') return true
54
+ if (NUMBER.test(s)) return true
55
+ if (s === '-') return true
56
+ if (s.startsWith('# ') || s.startsWith('- ')) return true
57
+ if (s.charCodeAt(0) === 34 /* " */) return true
58
+ if (/[\n\t]/.test(s)) return true
59
+ return false
60
+ }
61
+
62
+ // Parse a bare or quoted key from the start of a string.
63
+ // Returns { key, rest } or null if no key is present.
64
+ export function parseKey(str) {
65
+ if (str.charCodeAt(0) === 34 /* " */) {
66
+ let i = 1
67
+ while (i < str.length) {
68
+ const c = str.charCodeAt(i)
69
+ if (c === 92 /* \ */) { i += 2; continue }
70
+ if (c === 34 /* " */) {
71
+ const key = JSON.parse(str.slice(0, i + 1))
72
+ return { key, rest: str.slice(i + 1) }
73
+ }
74
+ i++
75
+ }
76
+ return null
77
+ }
78
+ const m = BARE_KEY.exec(str)
79
+ if (!m) return null
80
+ return { key: m[0], rest: str.slice(m[0].length) }
81
+ }
82
+
83
+ // Serialize a key — bare if the character class permits, quoted otherwise.
84
+ export function serializeKey(key) {
85
+ if (typeof key !== 'string') {
86
+ throw new TypeError('Key must be a string, got ' + typeof key)
87
+ }
88
+ if (key.length > 0 && /^[A-Za-z0-9_-]+$/.test(key)) return key
89
+ return JSON.stringify(key)
90
+ }
91
+
92
+ // Parse `key: value` or `key:` (empty value).
93
+ // Returns { key, value } or { key, empty: true } or null.
94
+ export function parseField(line) {
95
+ const pk = parseKey(line)
96
+ if (!pk) return null
97
+ const rest = pk.rest
98
+ if (rest.length === 0) return null
99
+ if (rest.charCodeAt(0) !== 58 /* : */) return null
100
+ const after = rest.slice(1)
101
+ if (after === '' || /^\s*$/.test(after)) {
102
+ return { key: pk.key, empty: true }
103
+ }
104
+ if (after.charCodeAt(0) !== 32 /* space */) return null
105
+ return { key: pk.key, value: parseScalar(after.slice(1)) }
106
+ }
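The `NUMBER` pattern decides both how a bare token is read and which strings the serializer must quote, so its edge cases are worth seeing side by side. The sketch below reuses the pattern from `value.js`. `classifyBare` is a hypothetical helper that follows the order of checks in `parseScalar` for unquoted input.

```js
// Same pattern as the NUMBER constant in value.js (JSON number grammar).
const NUMBER = /^-?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+-]?\d+)?$/

// Hypothetical helper: report how an unquoted token would be interpreted.
function classifyBare(raw) {
  if (raw === 'null') return 'null'
  if (raw === 'true' || raw === 'false') return 'boolean'
  if (NUMBER.test(raw)) return 'number'
  return 'string'
}

const tokens = ['42', '-0.5', '1e5', '007', '.5', '1.', '+1', 'True', 'null']
for (const t of tokens) {
  console.log(t.padEnd(5), '->', classifyBare(t))
}
```

Tokens such as `007`, `.5`, and `+1` fall outside the grammar and stay strings. The reverse also holds: a JavaScript string like `'42'` or `'true'` has to be written in quoted form, which is what `needsQuoting` checks before `serializeScalar` calls `JSON.stringify`.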