markdown-schema 0.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md ADDED
@@ -0,0 +1,720 @@
1
+ # markdown-schema
2
+
3
+ Turn a human-readable markdown document into **typed, validated, structured
4
+ data** — define the shape once, then parse every document that follows it into a
5
+ JSON object you can query, transform, and render.
6
+
7
+ ## Goals
8
+
9
+ - **One source of truth.** Authors edit plain markdown in any editor; tools
10
+ consume the same file as structured data. The structure is _derived_, never
11
+ duplicated.
12
+ - **Markdown you can compute over, not just display.** A parsed document is a
13
+ typed object (`frontmatter` + `title` + section keys), so a renderer reads
14
+ `doc.frontmatter.colors` or `doc.endpoints[]` directly — no inline regex, no
15
+ string-hunting.
16
+ - **Correctness you can enforce in CI.** Every document is validated against a
17
+ Zod schema. A doc that drifts from its shape — a missing section, a malformed
18
+ table, a broken cross-reference — fails the `validate-md` CLI.
19
+ - **Author-friendly, machine-readable.** The markdown stays readable and
20
+ diff-friendly; downstream gets the rigor of a typed payload.
21
+
22
+ ### Why structured data beats raw markdown
23
+
24
+ Raw markdown can only be _displayed_. Once parsed into typed fields, the same
25
+ file drives many renderings. Two examples, each from a single markdown file:
26
+
27
+ - **A design-token doc** — following Google Labs'
28
+ [DESIGN.md specification](https://stitch.withgoogle.com/docs/design-md/specification)
29
+ ([source](https://github.com/google-labs-code/design.md)), where YAML
30
+ frontmatter holds machine-readable tokens (the _what_) and the markdown body
31
+ holds human-readable design rationale (the _why_). The frontmatter parses into
32
+ typed maps (`colors`, `typography`, `spacing`, `rounded`, `components`), so a
33
+ renderer can show colors as **interactive swatches**, typography as **live
34
+ font specimens**, spacing as **dimension bars**, and resolve
35
+ `{foreground}`-style token references against the color map at render time. Raw
36
+ markdown would show `#3d6e6c` as text; structured data shows the colour.
37
+ - **A system-overview doc** — GFM tables parse into typed row arrays
38
+ (`nodes[]`, `edges[]`, `groups[]`, `layoutHints[]`), which a renderer can turn
39
+ into a **diagram**: filter nodes by deployment mode without re-parsing, lay
40
+ them out from numeric position hints, compute group bounding boxes, and route
41
+ edges by geometry. Raw markdown is a table you read; structured data is a graph
42
+ you filter and lay out.
43
+
44
+ The trade is parse-once-up-front for query-and-render-many afterwards.
45
+
46
+ ## Two ways to define a shape
47
+
48
+ - **Template mode** (recommended for most docs) — write a `*.template.md` file
49
+ with `<!-- TEMPLATE-ONLY: -->` directives; the schema is derived automatically
50
+ at runtime. No separate TypeScript file, and it powers the `validate-md` CLI.
51
+ - **Programmatic mode** (advanced) — define the schema in TypeScript with
52
+ `defineDocSchema` when you need full static types, custom extractors, or shapes
53
+ the directive grammar can't express (geometry, cross-section invariants).
54
+
55
+ Both modes parse a leading YAML **frontmatter** block when present and expose it
56
+ as `doc.frontmatter` — see [Frontmatter](#frontmatter).
57
+
58
+ ## Install
59
+
60
+ ```bash
61
+ pnpm add markdown-schema
62
+ ```
63
+
64
+ The core API (`parseTemplate`, `defineDocSchema`, extractors) runs in the
65
+ **browser and Node**. Filesystem helpers (`loadRefine`) and the CLI are
66
+ Node-only — see [Entry points](#entry-points).
67
+
68
+ ---
69
+
70
+ ## Agent skill
71
+
72
+ The package ships an **agent skill** that teaches a coding agent (Claude Code,
73
+ Cursor, Codex, and 70+ others) the three template-mode workflows:
74
+
75
+ - **Authoring** a `*.template.md` — choosing directives, splitting sections, and
76
+ deciding what belongs in a `*.refine.ts` companion.
77
+ - **Filling** a template — replacing `<!-- TEMPLATE-ONLY: -->` directives with
78
+ real content.
79
+ - **Validating** a document with `validate-md`, including how to read and triage
80
+ its error output.
81
+
82
+ It encodes the directive grammar, a which-directive decision table, the
83
+ template-vs-`*.refine.ts` boundary, and a triage table for common failures, so
84
+ the agent gets these workflows right without re-deriving the rules each time.
85
+
86
+ ### Installing the skill
87
+
88
+ Use the [`skills`](https://github.com/vercel-labs/skills) CLI — the open
89
+ agent-skills installer. It copies the skill into your agent's skills directory
90
+ (`.claude/skills/`, `.cursor/skills/`, …) and auto-detects which agents you have:
91
+
92
+ ```bash
93
+ # install the skill from this repo into the current project
94
+ npx skills add gergelyszerovay/markdown-schema --skill markdown-schema
95
+
96
+ # …or globally (~/.claude/skills/, ~/.cursor/skills/, …)
97
+ npx skills add gergelyszerovay/markdown-schema --skill markdown-schema -g
98
+ ```
99
+
100
+ The skill source lives under
101
+ [`.claude/skills/markdown-schema/`](.claude/skills/markdown-schema/):
102
+
103
+ | File | Role |
104
+ | -------------------------------- | -------------------------------------------------------- |
105
+ | `SKILL.md` | Skill entry point — triggers and the three workflows |
106
+ | `markdown-schema.guideline.md` | Source of truth for the `<!-- TEMPLATE-ONLY: -->` grammar |
107
+
108
+ > Agents do **not** scan `node_modules`, so installing the npm package alone does
109
+ > not register the skill — use `npx skills add` (above) to copy it onto a skills
110
+ > path. As a fallback you can copy the directory manually into `.claude/skills/`.
111
+
112
+ Once installed, the agent triggers it automatically when you ask to author,
113
+ fill, or validate a `*.template.md`.
114
+
115
+ ---
116
+
117
+ ## Template mode
118
+
119
+ ### 1. Write a template
120
+
121
+ `*.template.md` files use `<!-- TEMPLATE-ONLY: ... -->` HTML comments to embed
122
+ schema directives and author-facing prose. Everything outside the comments is
123
+ fixed structure that every filled instance must preserve.
124
+
125
+ Directives come in two structural shapes:
126
+
127
+ - **Inline directives** sit _inside_ a heading, list item, or paragraph and
128
+ must close on the same line as the opener.
129
+ - **Block directives** sit on their own line, opener at column ≤ 3, closer
130
+ on its own line. Body lines must start at column 0.
131
+
132
+ Author-facing prose lives in standalone `guide` block directives — `//`-prefixed
133
+ lines, ignored by the parser. Place them immediately before the field, list,
134
+ table, or section they document.
135
+
136
+ ```markdown
137
+ <!-- TEMPLATE-ONLY: guide
138
+ // Short product name + release date, e.g. "Acme 2.4 — 2026-05-08".
139
+ -->
140
+
141
+ # Release Notes: <!-- TEMPLATE-ONLY: string; required -->
142
+
143
+ ## 1. Metadata
144
+
145
+ - Version: <!-- TEMPLATE-ONLY: string; regex `^\d+\.\d+\.\d+$`; required -->
146
+
147
+ <!-- TEMPLATE-ONLY: guide
148
+ // Use GA only after the release has shipped to all customers.
149
+ -->
150
+
151
+ - Stage: <!-- TEMPLATE-ONLY: enum: Alpha | Beta | GA; required -->
152
+
153
+ ## 2. Highlights
154
+
155
+ <!-- TEMPLATE-ONLY: guide
156
+ // One concise sentence per bullet. Keep under 80 chars.
157
+ -->
158
+
159
+ - <!-- TEMPLATE-ONLY: string; required -->
160
+
161
+ ## 3. Acceptance Criteria
162
+
163
+ | ID | Description | Priority |
164
+ | --- | ----------- | -------- |
165
+
166
+ <!-- TEMPLATE-ONLY: row; min-rows: 1
167
+ ID: string; regex `^AC-\d+$`; required
168
+ Description: string; required
169
+ Priority: enum: Low | Medium | High; required
170
+ -->
171
+
172
+ | AC-001 | Sample criterion. | High |
173
+ ```
174
+
175
+ ### 2. Fill the template
176
+
177
+ Replace every `<!-- TEMPLATE-ONLY: … -->` block with actual content:
178
+
179
+ ```markdown
180
+ # Release Notes: Widget v2
181
+
182
+ ## 1. Metadata
183
+
184
+ - Version: 2.0.0
185
+ - Stage: GA
186
+
187
+ ## 2. Highlights
188
+
189
+ - Rewrote the plugin loader to support async plugins
190
+
191
+ ## 3. Acceptance Criteria
192
+
193
+ | ID | Description | Priority |
194
+ | ------ | -------------------------------------------- | -------- |
195
+ | AC-001 | Async plugin loads in < 100 ms | High |
196
+ | AC-002 | Legacy v1 config emits a deprecation warning | Medium |
197
+ ```
198
+
199
+ ### 3. Validate from the CLI
200
+
201
+ ```bash
202
+ validate-md --template release.template.md release.md
203
+ # release.md: OK
204
+ ```
205
+
206
+ ### 4. Parse in code
207
+
208
+ ```ts
209
+ import { readFileSync } from "node:fs";
210
+ import { parseTemplate } from "markdown-schema";
211
+
212
+ const templateRaw = readFileSync("release.template.md", "utf-8");
213
+ const schema = parseTemplate(templateRaw);
214
+
215
+ const raw = readFileSync("release.md", "utf-8");
216
+ const doc = schema.parse(raw);
217
+ // doc.metadata → { Version: "2.0.0", Stage: "GA" }
218
+ // doc.highlights → ["Rewrote the plugin loader to support async plugins"]
219
+ // doc.acceptanceCriteria → [{ ID: "AC-001", Description: "...", Priority: "High" }, ...]
220
+ ```
221
+
222
+ ### Cross-section invariants (`*.refine.ts`)
223
+
224
+ For rules that span multiple sections, add a sibling `*.refine.ts`:
225
+
226
+ ```ts
227
+ // release.refine.ts
228
+ import type { z } from "zod";
229
+
230
+ export const refine = (doc: unknown, ctx: z.core.$RefinementCtx): void => {
231
+ const d = doc as Record<string, unknown>;
232
+ const meta = d["metadata"] as Record<string, string>;
233
+ if (meta["Stage"] !== "GA" && !d["knownIssues"]) {
234
+ ctx.addIssue({
235
+ code: "custom",
236
+ path: ["knownIssues"],
237
+ message: "Non-GA releases must declare known issues",
238
+ });
239
+ }
240
+ };
241
+ ```
242
+
243
+ Load it automatically at validation time:
244
+
245
+ ```bash
246
+ validate-md --template release.template.md release.md
247
+ # refine.ts is loaded automatically when a sibling release.refine.ts exists
248
+ ```
249
+
250
+ Or load it manually in code:
251
+
252
+ ```ts
253
+ import { parseTemplate } from "markdown-schema";
254
+ // loadRefine reads the filesystem, so it ships on the node-only subpath.
255
+ import { loadRefine } from "markdown-schema/node";
256
+
257
+ const refine = await loadRefine("release.template.md");
258
+ const schema = parseTemplate(templateRaw, { refine });
259
+ ```
260
+
261
+ ### Frontmatter
262
+
263
+ A document may begin with a YAML frontmatter block — a `---`-fenced region at
264
+ the very top of the file. When present, it is parsed and exposed on the result
265
+ as `doc.frontmatter`. When absent, `doc.frontmatter` is `undefined`; an empty
266
+ block (`---\n---`) yields `{}`.
267
+
268
+ ```markdown
269
+ ---
270
+ name: Heritage
271
+ colors:
272
+ primary: "#1A1C1E"
273
+ neutral: "#F7F5F2"
274
+ ---
275
+
276
+ # Heritage
277
+
278
+ ## 1. Summary
279
+
280
+ Body.
281
+ ```
282
+
283
+ ```ts
284
+ const doc = schema.parse(raw);
285
+ // doc.frontmatter → { name: "Heritage", colors: { primary: "#1A1C1E", neutral: "#F7F5F2" } }
286
+ ```
287
+
288
+ The frontmatter is parsed and **exposed** — it is not validated against any
289
+ declarative schema. There is no template grammar for describing frontmatter
290
+ keys; the YAML region of a `*.template.md` is ignored for schema inference.
291
+ Validate frontmatter in a `*.refine.ts` sibling, which receives the whole
292
+ document including `doc.frontmatter`:
293
+
294
+ ```ts
295
+ // design.refine.ts
296
+ import type { z } from "zod";
297
+
298
+ export const refine = (doc: unknown, ctx: z.core.$RefinementCtx): void => {
299
+ const fm = (doc as { frontmatter?: { colors?: Record<string, string> } })
300
+ .frontmatter;
301
+ if (!fm?.colors?.["primary"]) {
302
+ ctx.addIssue({
303
+ code: "custom",
304
+ path: ["frontmatter", "colors", "primary"],
305
+ message: "primary color is required",
306
+ });
307
+ }
308
+ };
309
+ ```
310
+
311
+ `doc.frontmatter` is typed as `unknown` (its shape is document-specific); narrow
312
+ it inside the refine function. The parsed frontmatter also flows through
313
+ `--emit-json`, so it appears in the emitted document object.
314
+
315
+ > **Why no declarative frontmatter grammar?** Frontmatter shapes vary widely
316
+ > (nested maps, arrays, domain-specific types). Rather than grow the directive
317
+ > grammar, the parser extracts the YAML and hands it to `*.refine.ts`, where
318
+ > arbitrary Zod/TypeScript rules can validate it with full flexibility.
319
+
320
+ ### Directive reference
321
+
322
+ All directives live inside `<!-- TEMPLATE-ONLY: ... -->` blocks. The first
323
+ non-whitespace token is the directive kind.
324
+
325
+ **Inline directives** (must close on the same line as the opener):
326
+
327
+ | Directive | Schema produced |
328
+ | ------------------------------------- | -------------------------------------------- | ------------------------------- |
329
+ | `string; required` | `z.string().min(1)` |
330
+ | `string; optional` | `z.string().optional()` |
331
+ | `string; regex \`^…$\`; required` | `z.string().regex(…)` |
332
+ | `string; optional; default=<value>` | falls back to `<value>` when blank |
333
+ | `string; optional; only-if Key=Value` | field present only when sibling equals value |
334
+ | `enum: A \| B \| C; required` | `z.enum(["A", "B", "C"])` |
335
+ | `enum: A \\\| B \| C; required` | choice `"A | B"`, choice `"C"` (`\|` escape) |
336
+
337
+ `regex` is a modifier of `string`, not a type of its own. A bare `regex` without a leading `string;` is a hard error.
338
+
339
+ **Block directives** (own-line opener at column ≤ 3; body lines at column 0):
340
+
341
+ | Directive | Effect |
342
+ | ------------------------------------------------------ | ---------------------------------------------------- |
343
+ | `freetext; required` / `freetext; optional` | section body is free-form markdown (see below) |
344
+ | `row; min-rows: N; max-rows: N` + body of column specs | `z.array(z.object({…}))` — one entry per row |
345
+ | `section; optional` | whole section heading may be absent |
346
+ | `section; remove-if Key=Value` | section must be absent when expression holds |
347
+ | `section; min-groups: N` / `max-groups: N` | for repeated sections, constrain the count of groups |
348
+ | `guide` + body of `//` lines | author-facing prose; ignored by parser |
349
+
350
+ **`row` column specs** use the same grammar as inline `string` / `enum:`
351
+ directives — one column per body line, e.g.:
352
+
353
+ ```markdown
354
+ <!-- TEMPLATE-ONLY: row; min-rows: 1
355
+ ID: string; regex `^AC-\d+$`; required
356
+ Description: string; required
357
+ Priority: enum: Low | Medium | High; required
358
+ -->
359
+ ```
360
+
361
+ ### Author-facing prose: `guide` blocks
362
+
363
+ Long-form authoring guidance lives in standalone `guide` block directives.
364
+ Body lines must start with `//` (after optional whitespace) or be blank.
365
+
366
+ ```markdown
367
+ <!-- TEMPLATE-ONLY: guide
368
+ // Choose the release stage. GA means the feature is production-ready and
369
+ // has shipped to all customers; Beta means a stable preview; Alpha means
370
+ // early access for design partners.
371
+ -->
372
+
373
+ - Stage: <!-- TEMPLATE-ONLY: enum: Alpha | Beta | GA; required -->
374
+ ```
375
+
376
+ **Convention:** place a `guide` block immediately _before_ the field, list,
377
+ table, sub-heading, or section it documents. A `guide` at the top of a
378
+ section documents the whole section; one at the top of the file documents
379
+ the whole template. The parser doesn't enforce placement, but authors read
380
+ top-to-bottom and expect explanation before the thing being explained.
381
+
382
+ A `guide` answers at least one of: _what_ this field is in everyday words,
383
+ _why_ a constraint exists, _when_ a section/field applies, or shows an
384
+ _example_ of a good answer. Avoid restating the grammar
385
+ (`// string; required` adds no information) and avoid engineer-speak the
386
+ filling author won't share.
387
+
388
+ ### Free-text sections
389
+
390
+ When a section contains a mix of prose, code, lists, blockquotes, or other
391
+ arbitrary markdown that doesn't fit a fixed schema shape, opt it into
392
+ **free-text mode** by placing a `freetext` block directive under the H2:
393
+
394
+ ````markdown
395
+ ## 1. Summary
396
+
397
+ <!-- TEMPLATE-ONLY: freetext; required -->
398
+
399
+ A paragraph.
400
+
401
+ ```ts
402
+ const example = true;
403
+ ```
404
+
405
+ - A bullet point.
406
+ ````
407
+
408
+ The section's JSON value will be the body serialized back to a markdown
409
+ string via `mdast-util-to-markdown` (with GFM extensions). All node types
410
+ are preserved: paragraphs, fenced code, lists, tables, blockquotes, thematic
411
+ breaks, H3+ headings.
412
+
413
+ **Authoring rules:**
414
+
415
+ - Use H3+ for structure inside a free-text body; H2 always ends the section.
416
+ - No `<!-- TEMPLATE-ONLY: -->` directives of any kind inside the body —
417
+ including `guide` blocks. Place `guide` blocks _above_ the H2 instead.
418
+ - Sections with no recognized shape and no `freetext` directive are a hard
419
+ error; the parser will not silently accept them.
420
+ - **Serialization normalizations:** `mdast-util-to-markdown` normalizes some
421
+ constructs on round-trip. Bare URLs become autolink literals (`https://…`
422
+ → `<https://…>`). Thematic breaks may be normalized to `***`. Downstream
423
+ consumers should treat the value as a markdown string, not as the exact
424
+ source bytes.
425
+
426
+ ### Heading-level directive support
427
+
428
+ | Heading level | Inline directive supported? |
429
+ | ------------- | --------------------------------------------------------------------- |
430
+ | H1 | **Yes** — validates the document title (typed `string` / `enum:`) |
431
+ | H2 | **No** — H2 text is the JSON section key; keys must be fixed |
432
+ | H3 | **Yes, in repeated sections** — validates each per-group heading text |
433
+
434
+ ### Section extractor inference
435
+
436
+ The parser picks an extractor automatically from the section body shape:
437
+
438
+ | Body shape | Extractor | Returns |
439
+ | ------------------------------------------- | ------------ | ------------------------------------------------- |
440
+ | `freetext` directive present | `freetext` | `string` (markdown source, round-tripped) |
441
+ | Sub-headings (H3 inside H2) | `repeated` | `{ heading: string; items: string[] }[]` |
442
+ | GFM table (with `row` directive) | `table` | `Record<string, string>[]` |
443
+ | Labeled bullet list (`- Key: value`) | `bulletList` | `Record<string, string \| undefined>` |
444
+ | GFM task list (`- [ ]` / `- [x]`) | `taskList` | `{ checked: boolean; text: string }[]` |
445
+ | Plain unordered bullet list | `bulletList` | `string[]` |
446
+ | None of the above (no `freetext` directive) | **error** | "no recognized shape; add a `freetext` directive" |
447
+
448
+ ---
449
+
450
+ ## Programmatic mode (advanced)
451
+
452
+ > Most documents are better served by [template mode](#template-mode) — the
453
+ > schema lives next to the prose and the CLI validates it. Reach for
454
+ > programmatic mode only when you need what's below.
455
+
456
+ Use `defineDocSchema` when you need full TypeScript types or extractors not
457
+ covered by template directives.
458
+
459
+ ### Example
460
+
461
+ A complete, runnable example lives in
462
+ [`examples/programmatic/`](examples/programmatic/) — a release-notes schema that
463
+ exercises **every** extractor and `defineDocSchema` option:
464
+
465
+ | File | Role |
466
+ | ------------------------------------------------------------------------- | -------------------------------------------------------------- |
467
+ | [`release-schema.ts`](examples/programmatic/release-schema.ts) | The `defineDocSchema` schema (title, all extractors, `refine`) |
468
+ | [`release.md`](examples/programmatic/release.md) | A filled document that validates against it |
469
+ | [`run.ts`](examples/programmatic/run.ts) | Parses `release.md` and prints the typed JSON |
470
+ | [`output/release.json`](examples/programmatic/output/release.json) | The emitted, validated structured payload |
471
+
472
+ ```bash
473
+ node --experimental-strip-types examples/programmatic/run.ts
474
+ ```
475
+
476
+ Two sections in that example are worth calling out, because they show the
477
+ repeating-group extractors that template directives cannot express:
478
+
479
+ - **`changes`** uses [`repeated`](#extractors) — splits the section by `H3`
480
+ sub-heading and runs a sub-extractor map (`freetext` + `optional(table)` +
481
+ `codeBlocks`) on each group.
482
+ - **`migration`** uses [`repeatedWhere`](#extractors) — splits by an arbitrary
483
+ node predicate (here, `thematicBreak` / `---` rules) rather than headings.
484
+
485
+ Its `refine` then ties the two together as a cross-section invariant (pending
486
+ checklist items require migration steps).
487
+
488
+ <details>
489
+ <summary>Skeleton of the schema (see the file for the full version)</summary>
490
+
491
+ ```ts
492
+ import { z } from "zod";
493
+ import {
494
+ defineDocSchema,
495
+ freetext,
496
+ table,
497
+ codeBlocks,
498
+ optional,
499
+ repeated,
500
+ repeatedWhere,
501
+ } from "markdown-schema";
502
+
503
+ export const ReleaseDoc = defineDocSchema({
504
+ title: { schema: z.string().regex(/^v\d+\.\d+\.\d+$/, "must be semver") },
505
+ sections: {
506
+ summary: { heading: "Summary", extract: freetext, schema: z.string().min(10) },
507
+ // …
508
+ changes: {
509
+ heading: "Changes",
510
+ extract: repeated({
511
+ shape: { description: freetext, params: optional(table), snippets: codeBlocks },
512
+ }),
513
+ schema: z.array(/* Change */ z.object({})).min(1),
514
+ },
515
+ migration: {
516
+ heading: "Migration",
517
+ extract: repeatedWhere({
518
+ startsAt: (n) => n.type === "thematicBreak",
519
+ shape: { body: freetext },
520
+ }),
521
+ schema: z.array(z.object({ heading: z.string(), body: z.string().min(1) })),
522
+ optional: true,
523
+ },
524
+ },
525
+ refine: (doc, ctx) => {
526
+ /* pending checklist items require migration steps — see the file */
527
+ },
528
+ });
529
+ ```
530
+
531
+ </details>
532
+
533
+ ### CLI support
534
+
535
+ The `validate-md` CLI is **template mode only** — it derives the schema from a
536
+ `*.template.md` file. Programmatic schemas (`defineDocSchema`) are used in code;
537
+ call `.parse(raw)` directly, as the example above does. The old
538
+ `--schema` / `--export` flags were removed in v0.2.0; the CLI now errors and
539
+ points to `--template`.
540
+
541
+ ---
542
+
543
+ ## Entry points
544
+
545
+ The package has two entry points so the core API can bundle for the browser:
546
+
547
+ | Import path | Environment | Exports |
548
+ | ------------------------------------------ | ------------- | ----------------------------------------------------------------- |
549
+ | `markdown-schema` | browser + node | `parseTemplate`, `defineDocSchema`, all extractors, `headingToKey`, `type RefineFunction` — pure mdast/zod, no node builtins |
550
+ | `markdown-schema/node` | node only | `loadRefine` — reads the filesystem and dynamically imports `*.refine.ts` |
551
+
552
+ The `validate-md` CLI is node-only and uses the `/node` entry internally.
553
+
554
+ ---
555
+
556
+ ## API surface
557
+
558
+ ### Template mode
559
+
560
+ | Export | Entry | Description |
561
+ | --------------------------- | ------- | -------------------------------------------------------------------------- |
562
+ | `parseTemplate(raw, opts?)` | `.` | Parses a `*.template.md` string into a `{ parse(raw) }` schema object |
563
+ | `headingToKey(heading)` | `.` | Converts H2 text (e.g. `"1. Summary"`) to a camelCase key (`"summary"`) |
564
+ | `loadRefine(templatePath)` | `/node` | Dynamically loads the sibling `*.refine.ts`; returns `undefined` if absent |
565
+
566
+ **`parseTemplate` options (`opts`):**
567
+
568
+ | Option | Default | Description |
569
+ | -------- | ------- | ------------------------------------------------------------------- |
570
+ | `refine` | — | Cross-section refinement callback to attach (usually `loadRefine`'s result) |
571
+ | `file` | — | Source file path used in error messages |
572
+
573
+ ### Extractors
574
+
575
+ | Extractor | Returns |
576
+ | ------------------------------------------------ | ------------------------------------------------------------------------------------- |
577
+ | `freetext` | Section body serialized back to markdown (all node types preserved) |
578
+ | `table` | First GFM table as `Record<string, string>[]`; throws if absent |
579
+ | `bulletList` | Plain list items as `string[]`; labeled list as `Record<string, string \| undefined>` |
580
+ | `taskList` | GFM task list as `{ checked: boolean; text: string }[]`; throws if absent |
581
+ | `codeBlocks` | All fenced code blocks as `{ lang: string \| null; value: string }[]` |
582
+ | `rawNodes` | The raw `RootContent[]` unchanged |
583
+ | `fencedCodeWithMarker({ marker, markerLabel? })` | Code block following an HTML comment matching `marker` |
584
+ | `optional(ex)` | Wraps any extractor; returns `undefined` instead of throwing |
585
+ | `repeated({ by?, shape })` | Splits by sub-heading; auto-detects depth |
586
+ | `repeatedWhere({ startsAt, shape, … })` | Generic repeating groups driven by a node predicate |
587
+
588
+ ### Doc schema builder
589
+
590
+ | Export | Description |
591
+ | ----------------------- | ----------------------------------------------- |
592
+ | `defineDocSchema(spec)` | Returns `{ parse(raw: string): DocOf<S> }` |
593
+ | `SectionSpec<S>` | Type for one section entry |
594
+ | `DocOf<S>` | Infers the fully-typed parse result from a spec; includes `frontmatter?: unknown` |
595
+
596
+ **`defineDocSchema` spec fields:**
597
+
598
+ | Field | Default | Description |
599
+ | -------------- | -------- | ---------------------------------------------------- |
600
+ | `title` | — | `{ schema }` — extracts the H1 as the document title |
601
+ | `titleDepth` | `1` | Heading depth for the title |
602
+ | `sectionDepth` | `2` | Heading depth used as section boundaries |
603
+ | `sections` | required | Map of output key → `SectionSpec` |
604
+ | `refine` | — | `(doc, ctx) => void` — cross-section Zod refinement |
605
+
606
+ **`SectionSpec` fields:**
607
+
608
+ | Field | Default | Description |
609
+ | ---------- | ----------- | ------------------------------------------------- |
610
+ | `heading` | section key | Heading text in the document |
611
+ | `extract` | required | Extractor called on the section's body nodes |
612
+ | `schema` | required | Zod schema that validates the extracted value |
613
+ | `optional` | `false` | When `true`, a missing section yields `undefined` |
614
+
615
+ ---
616
+
617
+ ## CLI
618
+
619
+ ```bash
620
+ # Validate one or more documents against a template
621
+ validate-md --template component.template.md component.md
622
+ validate-md --template component.template.md *.md
623
+
624
+ # Validate and emit the parsed document as JSON (exactly one input file)
625
+ validate-md --template component.template.md --emit-json component.json component.md
626
+ ```
627
+
628
+ - `--template <file.template.md>` — required; the template that drives the schema.
629
+ - `--emit-json <path>` — write the parsed document (sections, title, and
630
+ `frontmatter`) to `<path>` as JSON. Requires exactly one input file.
631
+ - `--no-refine` — skip loading the sibling `*.refine.ts`. Use when validating
632
+ **untrusted** templates (see [Security](#security)).
633
+
634
+ A sibling `*.refine.ts` next to the template is loaded automatically (unless
635
+ `--no-refine` is given).
636
+
637
+ Prints `<file>: OK` on success — or `<file>: OK (wrote <path>)` when
638
+ `--emit-json` is set — or `<file>: schema validation failed` with one
639
+ `path: message` line per Zod issue on failure. Exits `1` if any file fails, `0`
640
+ otherwise.
641
+
642
+ All status lines are written to **stderr**, so `--emit-json /dev/stdout` yields
643
+ clean JSON on stdout (pipeable to `jq`).
644
+
645
+ ### Requirements
646
+
647
+ - **Node.js ≥ 22.18**. The CLI loads the sibling `*.refine.ts` with Node's
648
+ built-in [type stripping](https://nodejs.org/api/typescript.html#type-stripping)
649
+ (unflagged from 22.18 / 23.6 onward). No `tsx`, `ts-node`, or build step
650
+ needed.
651
+ - TypeScript syntax that erases to plain JavaScript works as-is (`type`,
652
+ `interface`, type annotations, `as`, `satisfies`). Runtime-emitting syntax
653
+ (`enum`, `namespace`, parameter properties, legacy decorators) must be
654
+ compiled first.
655
+
656
+ ---
657
+
658
+ ## Security
659
+
660
+ This library builds schemas from `*.template.md` files and validates documents
661
+ against them. If **either** the template or the document can come from an
662
+ untrusted source (a PR, an upload, a third party), read this.
663
+
664
+ ### Executing `*.refine.ts` (code execution)
665
+
666
+ Loading a template auto-imports its sibling `<stem>.refine.ts` and runs its
667
+ `refine` export. **Validating an untrusted template therefore executes
668
+ arbitrary code** shipped beside it. The path is confined to the sibling file (no
669
+ traversal), but the feature is code-execution by design.
670
+
671
+ - CLI: pass `--no-refine` to skip it.
672
+ - Programmatic: simply don't call `loadRefine` / don't pass `refine` to
673
+ `parseTemplate`. `parseTemplate(raw)` alone never loads or runs any file.
674
+
675
+ ### Template-supplied regular expressions (ReDoS)
676
+
677
+ A template's `regex` modifier is compiled and tested against document values.
678
+ Every such pattern is routed through a ReDoS guard that runs at schema-build
679
+ time: it caps the pattern length and statically analyses it, rejecting
680
+ exponential-backtracking patterns outright and allowing only low-degree
681
+ (≤ 2) polynomial patterns. An unsafe pattern fails template parsing with a
682
+ `DirectiveError` rather than hanging the validator on a crafted document.
683
+
684
+ ### Frontmatter and untrusted keys
685
+
686
+ YAML frontmatter is parsed with `yaml` (≥ 2), which materialises `__proto__`
687
+ and similar keys as ordinary own properties — no prototype pollution. Table
688
+ headers and labeled-list keys derived from untrusted markdown are collected into
689
+ null-prototype objects for the same reason. Frontmatter is **exposed, not
690
+ schema-validated**; validate it yourself in a `*.refine.ts` (which only runs
691
+ under the conditions above).
692
+
693
+ ### Filesystem paths (CLI)
694
+
695
+ `validate-md` reads `--template`/input paths and writes `--emit-json` to any
696
+ path you give it, including absolute and `../` paths. This is fine when an
697
+ operator runs the CLI, but do not pass attacker-controlled path arguments to it
698
+ without confining them yourself.
699
+
700
+ ---
701
+
702
+ ## Design notes
703
+
704
+ The toolkit composes each section's Zod schema into a single `z.object` and
705
+ validates the whole document at once, so refinement callbacks receive a
706
+ fully-typed document and errors arrive as a single `ZodError` tree. Structural
707
+ failures (missing required section, malformed table) throw a plain `Error`
708
+ before Zod runs — structure is checked first, semantics second.
709
+
710
+ In template mode, `parseTemplate` walks the mdast tree, collects
711
+ `<!-- TEMPLATE-ONLY: -->` directives, and infers an extractor + Zod schema for
712
+ each H2 section. Section keys are derived from heading text via `headingToKey`
713
+ (e.g. `"4. Inputs (Props)"` → `"inputs"`). An optional `*.refine.ts` sibling
714
+ can be loaded at runtime to add cross-section invariants without coupling them
715
+ to the template syntax.
716
+
717
+ YAML frontmatter is parsed via `remark-frontmatter` + `yaml` and lifted onto
718
+ `doc.frontmatter` alongside the title and sections. It is exposed but not
719
+ schema-validated by the parser; frontmatter rules belong in `*.refine.ts`.
720
+ ````