markdown-schema 0.2.0
- package/.claude/skills/markdown-schema/SKILL.md +282 -0
- package/.claude/skills/markdown-schema/markdown-schema.guideline.md +419 -0
- package/LICENSE +21 -0
- package/README.md +720 -0
- package/dist/bin/validate-md.js +1127 -0
- package/dist/bin/validate-md.js.map +1 -0
- package/dist/index.d.ts +296 -0
- package/dist/index.js +1039 -0
- package/dist/index.js.map +1 -0
- package/dist/node.d.ts +17 -0
- package/dist/node.js +18 -0
- package/dist/node.js.map +1 -0
- package/package.json +72 -0
package/README.md
ADDED
@@ -0,0 +1,720 @@

# markdown-schema

Turn a human-readable markdown document into **typed, validated, structured
data** — define the shape once, then parse every document that follows it into a
JSON object you can query, transform, and render.

## Goals

- **One source of truth.** Authors edit plain markdown in any editor; tools
  consume the same file as structured data. The structure is _derived_, never
  duplicated.
- **Markdown you can compute over, not just display.** A parsed document is a
  typed object (`frontmatter` + `title` + section keys), so a renderer reads
  `doc.frontmatter.colors` or `doc.endpoints[]` directly — no inline regex, no
  string-hunting.
- **Correctness you can enforce in CI.** Every document is validated against a
  Zod schema. A doc that drifts from its shape — a missing section, a malformed
  table, a broken cross-reference — fails the `validate-md` CLI.
- **Author-friendly, machine-readable.** The markdown stays readable and
  diff-friendly; downstream gets the rigor of a typed payload.

### Why structured data beats raw markdown

Raw markdown can only be _displayed_. Once parsed into typed fields, the same
file drives many renderings. Two examples, each from a single markdown file:

- **A design-token doc** — following Google Labs'
  [DESIGN.md specification](https://stitch.withgoogle.com/docs/design-md/specification)
  ([source](https://github.com/google-labs-code/design.md)), where YAML
  frontmatter holds machine-readable tokens (the _what_) and the markdown body
  holds human-readable design rationale (the _why_). The frontmatter parses into
  typed maps (`colors`, `typography`, `spacing`, `rounded`, `components`), so a
  renderer can show colors as **interactive swatches**, typography as **live
  font specimens**, spacing as **dimension bars**, and resolve
  `{foreground}`-style token references against the color map at render time. Raw
  markdown would show `#3d6e6c` as text; structured data shows the color.
- **A system-overview doc** — GFM tables parse into typed row arrays
  (`nodes[]`, `edges[]`, `groups[]`, `layoutHints[]`), which a renderer can turn
  into a **diagram**: filter nodes by deployment mode without re-parsing, lay
  them out from numeric position hints, compute group bounding boxes, and route
  edges by geometry. Raw markdown is a table you read; structured data is a graph
  you filter and lay out.

The trade is parse-once-up-front for query-and-render-many afterwards.

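To make the token-reference idea concrete, here is a minimal sketch of resolving `{foreground}`-style references against a color map. The names `ColorMap` and `resolveTokenRefs` are hypothetical and are not exports of `markdown-schema`; the sketch only assumes that the parsed frontmatter yields a plain string-to-string map.

```typescript
// Illustrative only: `ColorMap` and `resolveTokenRefs` are hypothetical names,
// not part of the markdown-schema API.
type ColorMap = Record<string, string>;

/** Replace every `{name}` reference with the matching color, if one exists. */
function resolveTokenRefs(value: string, colors: ColorMap): string {
  return value.replace(/\{([A-Za-z0-9_-]+)\}/g, (match: string, name: string) => {
    // Only own keys count, so `{constructor}` and friends are never resolved.
    const known = Object.prototype.hasOwnProperty.call(colors, name);
    // Unknown references are left untouched so the caller can spot them.
    return known ? colors[name] : match;
  });
}

const colors: ColorMap = { foreground: "#3d6e6c", background: "#F7F5F2" };

const border = resolveTokenRefs("1px solid {foreground}", colors);
const unknownRef = resolveTokenRefs("{accent}", colors);
```

Leaving unknown references in place (rather than throwing) is one possible design choice; a stricter renderer might report them instead.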
## Two ways to define a shape

- **Template mode** (recommended for most docs) — write a `*.template.md` file
  with `<!-- TEMPLATE-ONLY: -->` directives; the schema is derived automatically
  at runtime. No separate TypeScript file, and it powers the `validate-md` CLI.
- **Programmatic mode** (advanced) — define the schema in TypeScript with
  `defineDocSchema` when you need full static types, custom extractors, or shapes
  the directive grammar can't express (geometry, cross-section invariants).

Both modes parse a leading YAML **frontmatter** block when present and expose it
as `doc.frontmatter` — see [Frontmatter](#frontmatter).

## Install

```bash
pnpm add markdown-schema
```

The core API (`parseTemplate`, `defineDocSchema`, extractors) runs in the
**browser and Node**. Filesystem helpers (`loadRefine`) and the CLI are
Node-only — see [Entry points](#entry-points).

---

## Agent skill

The package ships an **agent skill** that teaches a coding agent (Claude Code,
Cursor, Codex, and 70+ others) the three template-mode workflows:

- **Authoring** a `*.template.md` — choosing directives, splitting sections, and
  deciding what belongs in a `*.refine.ts` companion.
- **Filling** a template — replacing `<!-- TEMPLATE-ONLY: -->` directives with
  real content.
- **Validating** a document with `validate-md`, including how to read and triage
  its error output.

It encodes the directive grammar, a which-directive decision table, the
template-vs-`*.refine.ts` boundary, and a triage table for common failures, so
the agent gets these workflows right without re-deriving the rules each time.

### Installing the skill

Use the [`skills`](https://github.com/vercel-labs/skills) CLI — the open
agent-skills installer. It copies the skill into your agent's skills directory
(`.claude/skills/`, `.cursor/skills/`, …) and auto-detects which agents you have:

```bash
# install the skill from this repo into the current project
npx skills add gergelyszerovay/markdown-schema --skill markdown-schema

# …or globally (~/.claude/skills/, ~/.cursor/skills/, …)
npx skills add gergelyszerovay/markdown-schema --skill markdown-schema -g
```

The skill source lives under
[`.claude/skills/markdown-schema/`](.claude/skills/markdown-schema/):

| File | Role |
| -------------------------------- | -------------------------------------------------------- |
| `SKILL.md` | Skill entry point — triggers and the three workflows |
| `markdown-schema.guideline.md` | Source of truth for the `<!-- TEMPLATE-ONLY: -->` grammar |

> Agents do **not** scan `node_modules`, so installing the npm package alone does
> not register the skill — use `npx skills add` (above) to copy it onto a skills
> path. As a fallback you can copy the directory manually into `.claude/skills/`.

Once installed, the agent triggers it automatically when you ask to author,
fill, or validate a `*.template.md`.

---

## Template mode

### 1. Write a template

`*.template.md` files use `<!-- TEMPLATE-ONLY: ... -->` HTML comments to embed
schema directives and author-facing prose. Everything outside the comments is
fixed structure that every filled instance must preserve.

Directives come in two structural shapes:

- **Inline directives** sit _inside_ a heading, list item, or paragraph and
  must close on the same line as the opener.
- **Block directives** sit on their own line, opener at column ≤ 3, closer
  on its own line. Body lines must start at column 0.

Author-facing prose lives in standalone `guide` block directives — `//`-prefixed
lines, ignored by the parser. Place them immediately before the field, list,
table, or section they document.

```markdown
<!-- TEMPLATE-ONLY: guide
// Short product name + release date, e.g. "Acme 2.4 — 2026-05-08".
-->

# Release Notes: <!-- TEMPLATE-ONLY: string; required -->

## 1. Metadata

- Version: <!-- TEMPLATE-ONLY: string; regex `^\d+\.\d+\.\d+$`; required -->

<!-- TEMPLATE-ONLY: guide
// Use GA only after the release has shipped to all customers.
-->

- Stage: <!-- TEMPLATE-ONLY: enum: Alpha | Beta | GA; required -->

## 2. Highlights

<!-- TEMPLATE-ONLY: guide
// One concise sentence per bullet. Keep under 80 chars.
-->

- <!-- TEMPLATE-ONLY: string; required -->

## 3. Acceptance Criteria

| ID  | Description | Priority |
| --- | ----------- | -------- |

<!-- TEMPLATE-ONLY: row; min-rows: 1
ID: string; regex `^AC-\d+$`; required
Description: string; required
Priority: enum: Low | Medium | High; required
-->

| AC-001 | Sample criterion. | High |
```

### 2. Fill the template

Replace every `<!-- TEMPLATE-ONLY: … -->` block with actual content:

```markdown
# Release Notes: Widget v2

## 1. Metadata

- Version: 2.0.0
- Stage: GA

## 2. Highlights

- Rewrote the plugin loader to support async plugins

## 3. Acceptance Criteria

| ID     | Description                                  | Priority |
| ------ | -------------------------------------------- | -------- |
| AC-001 | Async plugin loads in < 100 ms               | High     |
| AC-002 | Legacy v1 config emits a deprecation warning | Medium   |
```

### 3. Validate from the CLI

```bash
validate-md --template release.template.md release.md
# release.md: OK
```

### 4. Parse in code

```ts
import { readFileSync } from "node:fs";
import { parseTemplate } from "markdown-schema";

const templateRaw = readFileSync("release.template.md", "utf-8");
const schema = parseTemplate(templateRaw);

const raw = readFileSync("release.md", "utf-8");
const doc = schema.parse(raw);
// doc.metadata → { Version: "2.0.0", Stage: "GA" }
// doc.highlights → ["Rewrote the plugin loader to support async plugins"]
// doc.acceptanceCriteria → [{ ID: "AC-001", Description: "...", Priority: "High" }, ...]
```

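Once the document is a plain object, downstream code can compute over it directly. The sketch below hand-writes a `ReleaseDoc` interface that mirrors the shape shown in the comments above; that interface and the `summarize` helper are assumptions for illustration, and the types the library actually infers may differ.

```typescript
// Assumed shape, copied from the comments in the parsing example; the
// library's inferred types may differ.
interface Criterion {
  ID: string;
  Description: string;
  Priority: "Low" | "Medium" | "High";
}

interface ReleaseDoc {
  metadata: { Version: string; Stage: "Alpha" | "Beta" | "GA" };
  highlights: string[];
  acceptanceCriteria: Criterion[];
}

/** Hypothetical helper: build a one-line summary of a parsed release document. */
function summarize(doc: ReleaseDoc): string {
  const high = doc.acceptanceCriteria.filter((c) => c.Priority === "High");
  return (
    `v${doc.metadata.Version} (${doc.metadata.Stage}): ` +
    `${doc.highlights.length} highlight(s), ${high.length} high-priority criteria`
  );
}

// Stand-in for the result of `schema.parse(raw)` on the filled release.md.
const parsed: ReleaseDoc = {
  metadata: { Version: "2.0.0", Stage: "GA" },
  highlights: ["Rewrote the plugin loader to support async plugins"],
  acceptanceCriteria: [
    { ID: "AC-001", Description: "Async plugin loads in < 100 ms", Priority: "High" },
    { ID: "AC-002", Description: "Legacy v1 config emits a deprecation warning", Priority: "Medium" },
  ],
};

const summary = summarize(parsed);
// summary === "v2.0.0 (GA): 1 highlight(s), 1 high-priority criteria"
```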
### Cross-section invariants (`*.refine.ts`)

For rules that span multiple sections, add a sibling `*.refine.ts`:

```ts
// release.refine.ts
import type { z } from "zod";

export const refine = (doc: unknown, ctx: z.core.$RefinementCtx): void => {
  const d = doc as Record<string, unknown>;
  const meta = d["metadata"] as Record<string, string>;
  if (meta["Stage"] !== "GA" && !d["knownIssues"]) {
    ctx.addIssue({
      code: "custom",
      path: ["knownIssues"],
      message: "Non-GA releases must declare known issues",
    });
  }
};
```

The CLI loads it automatically at validation time:

```bash
validate-md --template release.template.md release.md
# a sibling release.refine.ts, if present, is loaded automatically
```

Or load it manually in code:

```ts
import { readFileSync } from "node:fs";
import { parseTemplate } from "markdown-schema";
// loadRefine reads the filesystem, so it ships on the node-only subpath.
import { loadRefine } from "markdown-schema/node";

const templateRaw = readFileSync("release.template.md", "utf-8");
const refine = await loadRefine("release.template.md");
const schema = parseTemplate(templateRaw, { refine });
```

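Because a refine function only needs something with an `addIssue` method, it can be exercised in isolation. The sketch below is one way to do that under stated assumptions: `Issue`, `RefineCtx`, and `collectIssues` are hypothetical stand-ins for the Zod types, and the rule is the same "non-GA releases must declare known issues" check from `release.refine.ts`.

```typescript
// Minimal stand-ins for the Zod refinement types; these names are hypothetical.
interface Issue {
  code: "custom";
  path: string[];
  message: string;
}

interface RefineCtx {
  addIssue(issue: Issue): void;
}

// Same rule as release.refine.ts, typed against the stub context.
const refineRelease = (doc: unknown, ctx: RefineCtx): void => {
  const d = doc as Record<string, unknown>;
  const meta = d["metadata"] as Record<string, string>;
  if (meta["Stage"] !== "GA" && !d["knownIssues"]) {
    ctx.addIssue({
      code: "custom",
      path: ["knownIssues"],
      message: "Non-GA releases must declare known issues",
    });
  }
};

/** Run the refine function and return every issue it reported. */
function collectIssues(doc: unknown): Issue[] {
  const issues: Issue[] = [];
  refineRelease(doc, {
    addIssue: (issue: Issue): void => {
      issues.push(issue);
    },
  });
  return issues;
}

const gaIssues = collectIssues({ metadata: { Stage: "GA" } });
const betaIssues = collectIssues({ metadata: { Stage: "Beta" } });
const betaWithNotes = collectIssues({
  metadata: { Stage: "Beta" },
  knownIssues: ["Slow cold start"],
});
```

Here `gaIssues` and `betaWithNotes` come back empty, while `betaIssues` holds the single `knownIssues` issue.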
### Frontmatter

A document may begin with a YAML frontmatter block — a `---`-fenced region at
the very top of the file. When present, it is parsed and exposed on the result
as `doc.frontmatter`. When absent, `doc.frontmatter` is `undefined`; an empty
block (`---\n---`) yields `{}`.

```markdown
---
name: Heritage
colors:
  primary: "#1A1C1E"
  neutral: "#F7F5F2"
---

# Heritage

## 1. Summary

Body.
```

```ts
const doc = schema.parse(raw);
// doc.frontmatter → { name: "Heritage", colors: { primary: "#1A1C1E", neutral: "#F7F5F2" } }
```

The frontmatter is parsed and **exposed** — it is not validated against any
declarative schema. There is no template grammar for describing frontmatter
keys; the YAML region of a `*.template.md` is ignored for schema inference.
Validate frontmatter in a `*.refine.ts` sibling, which receives the whole
document including `doc.frontmatter`:

```ts
// design.refine.ts
import type { z } from "zod";

export const refine = (doc: unknown, ctx: z.core.$RefinementCtx): void => {
  const fm = (doc as { frontmatter?: { colors?: Record<string, string> } })
    .frontmatter;
  if (!fm?.colors?.["primary"]) {
    ctx.addIssue({
      code: "custom",
      path: ["frontmatter", "colors", "primary"],
      message: "primary color is required",
    });
  }
};
```

`doc.frontmatter` is typed as `unknown` (its shape is document-specific); narrow
it inside the refine function. The parsed frontmatter also flows through
`--emit-json`, so it appears in the emitted document object.

> **Why no declarative frontmatter grammar?** Frontmatter shapes vary widely
> (nested maps, arrays, domain-specific types). Rather than grow the directive
> grammar, the parser extracts the YAML and hands it to `*.refine.ts`, where
> arbitrary Zod/TypeScript rules can validate it with full flexibility.

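Since `doc.frontmatter` arrives as `unknown`, a type guard is a safer way to narrow it than the cast used above. The following is a minimal sketch: `isStringMap` and `primaryColor` are hypothetical helper names, and the checks assume the `colors` map is a flat object of strings as in the Heritage example.

```typescript
/** Type guard: true when `value` is a plain object whose values are all strings. */
function isStringMap(value: unknown): value is Record<string, string> {
  return (
    typeof value === "object" &&
    value !== null &&
    !Array.isArray(value) &&
    Object.values(value).every((v) => typeof v === "string")
  );
}

/** Hypothetical helper: read `colors.primary` from an untyped frontmatter value. */
function primaryColor(frontmatter: unknown): string | undefined {
  if (typeof frontmatter !== "object" || frontmatter === null) return undefined;
  const colorMap = (frontmatter as Record<string, unknown>)["colors"];
  // Only index into `colors` once the guard has confirmed its shape.
  return isStringMap(colorMap) ? colorMap["primary"] : undefined;
}

const fromHeritage = primaryColor({
  name: "Heritage",
  colors: { primary: "#1A1C1E", neutral: "#F7F5F2" },
});
const fromMissing = primaryColor(undefined);
const fromMalformed = primaryColor({ colors: ["#1A1C1E"] });
```

A refine function could call such a helper and add an issue whenever it returns `undefined`.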
### Directive reference

All directives live inside `<!-- TEMPLATE-ONLY: ... -->` blocks. The first
non-whitespace token is the directive kind.

**Inline directives** (must close on the same line as the opener):

| Directive | Schema produced |
| ------------------------------------- | -------------------------------------------- |
| `string; required` | `z.string().min(1)` |
| `string; optional` | `z.string().optional()` |
| `` string; regex `^…$`; required `` | `z.string().regex(…)` |
| `string; optional; default=<value>` | falls back to `<value>` when blank |
| `string; optional; only-if Key=Value` | field present only when sibling equals value |
| `enum: A \| B \| C; required` | `z.enum(["A", "B", "C"])` |
| `enum: A \\\| B \| C; required` | choice `"A \| B"`, choice `"C"` (a backslash-escaped pipe stays literal) |

`regex` is a modifier of `string`, not a type of its own. A bare `regex` without a leading `string;` is a hard error.

**Block directives** (own-line opener at column ≤ 3; body lines at column 0):

| Directive | Effect |
| ------------------------------------------------------ | ---------------------------------------------------- |
| `freetext; required` / `freetext; optional` | section body is free-form markdown (see below) |
| `row; min-rows: N; max-rows: N` + body of column specs | `z.array(z.object({…}))` — one entry per row |
| `section; optional` | whole section heading may be absent |
| `section; remove-if Key=Value` | section must be absent when expression holds |
| `section; min-groups: N` / `max-groups: N` | for repeated sections, constrain the count of groups |
| `guide` + body of `//` lines | author-facing prose; ignored by parser |

**`row` column specs** use the same grammar as inline `string` / `enum:`
directives — one column per body line, e.g.:

```markdown
<!-- TEMPLATE-ONLY: row; min-rows: 1
ID: string; regex `^AC-\d+$`; required
Description: string; required
Priority: enum: Low | Medium | High; required
-->
```

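The `\|` escape in `enum:` directives can be pictured with a small splitter. This is a sketch of the documented rule only, not the library's parser: `splitEnumChoices` is a hypothetical name, and trimming whitespace around each choice is an assumption.

```typescript
/** Split an enum body on unescaped `|`; a `\|` becomes a literal pipe. */
function splitEnumChoices(body: string): string[] {
  const choices: string[] = [];
  let current = "";
  for (let i = 0; i < body.length; i++) {
    const ch = body.charAt(i);
    if (ch === "\\" && body.charAt(i + 1) === "|") {
      current += "|"; // escaped pipe stays inside the current choice
      i++; // skip the pipe that was just consumed
    } else if (ch === "|") {
      choices.push(current.trim()); // unescaped pipe ends the choice
      current = "";
    } else {
      current += ch;
    }
  }
  choices.push(current.trim());
  return choices;
}

const plainChoices = splitEnumChoices("Alpha | Beta | GA");
const escapedChoices = splitEnumChoices("A \\| B | C");
// plainChoices   → ["Alpha", "Beta", "GA"]
// escapedChoices → ["A | B", "C"]
```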
### Author-facing prose: `guide` blocks

Long-form authoring guidance lives in standalone `guide` block directives.
Body lines must start with `//` (after optional whitespace) or be blank.

```markdown
<!-- TEMPLATE-ONLY: guide
// Choose the release stage. GA means the feature is production-ready and
// has shipped to all customers; Beta means a stable preview; Alpha means
// early access for design partners.
-->

- Stage: <!-- TEMPLATE-ONLY: enum: Alpha | Beta | GA; required -->
```

**Convention:** place a `guide` block immediately _before_ the field, list,
table, sub-heading, or section it documents. A `guide` at the top of a
section documents the whole section; one at the top of the file documents
the whole template. The parser doesn't enforce placement, but authors read
top-to-bottom and expect explanation before the thing being explained.

A `guide` answers at least one of: _what_ this field is in everyday words,
_why_ a constraint exists, _when_ a section/field applies, or shows an
_example_ of a good answer. Avoid restating the grammar
(`// string; required` adds no information) and avoid engineer-speak the
filling author won't share.

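The body-line rule for `guide` blocks (each line is blank or starts with `//` after optional whitespace) is easy to check mechanically. The helper below is a hypothetical sketch of that rule, useful for example in a pre-commit lint; it is not how the library itself validates templates.

```typescript
/** True when every body line is blank or starts with `//` after optional whitespace. */
function isValidGuideBody(body: string): boolean {
  return body
    .split(/\r?\n/) // tolerate both LF and CRLF line endings
    .every((line) => /^\s*(\/\/.*)?$/.test(line));
}

const goodBody = [
  "// Choose the release stage. GA means the feature is production-ready.",
  "",
  "  // Indented comment lines are fine too.",
].join("\n");

const badBody = ["// First line is fine.", "This line lacks the // prefix."].join("\n");

const goodResult = isValidGuideBody(goodBody); // true
const badResult = isValidGuideBody(badBody); // false
```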
### Free-text sections

When a section contains a mix of prose, code, lists, blockquotes, or other
arbitrary markdown that doesn't fit a fixed schema shape, opt it into
**free-text mode** by placing a `freetext` block directive under the H2:

````markdown
## 1. Summary

<!-- TEMPLATE-ONLY: freetext; required -->

A paragraph.

```ts
const example = true;
```

- A bullet point.
````

The section's JSON value will be the body serialized back to a markdown
string via `mdast-util-to-markdown` (with GFM extensions). All node types
are preserved: paragraphs, fenced code, lists, tables, blockquotes, thematic
breaks, H3+ headings.

**Authoring rules:**

- Use H3+ for structure inside a free-text body; H2 always ends the section.
- No `<!-- TEMPLATE-ONLY: -->` directives of any kind inside the body —
  including `guide` blocks. Place `guide` blocks _above_ the H2 instead.
- Sections with no recognized shape and no `freetext` directive are a hard
  error; the parser will not silently accept them.
- **Serialization normalizations:** `mdast-util-to-markdown` normalizes some
  constructs on round-trip. Bare URLs become autolink literals (`https://…`
  → `<https://…>`). Thematic breaks may be normalized to `***`. Downstream
  consumers should treat the value as a markdown string, not as the exact
  source bytes.

### Heading-level directive support

| Heading level | Inline directive supported? |
| ------------- | --------------------------------------------------------------------- |
| H1 | **Yes** — validates the document title (typed `string` / `enum:`) |
| H2 | **No** — H2 text is the JSON section key; keys must be fixed |
| H3 | **Yes, in repeated sections** — validates each per-group heading text |

### Section extractor inference

The parser picks an extractor automatically from the section body shape:

| Body shape | Extractor | Returns |
| ------------------------------------------- | ------------ | ------------------------------------------------- |
| `freetext` directive present | `freetext` | `string` (markdown source, round-tripped) |
| Sub-headings (H3 inside H2) | `repeated` | `{ heading: string; items: string[] }[]` |
| GFM table (with `row` directive) | `table` | `Record<string, string>[]` |
| Labeled bullet list (`- Key: value`) | `bulletList` | `Record<string, string \| undefined>` |
| GFM task list (`- [ ]` / `- [x]`) | `taskList` | `{ checked: boolean; text: string }[]` |
| Plain unordered bullet list | `bulletList` | `string[]` |
| None of the above (no `freetext` directive) | **error** | "no recognized shape; add a `freetext` directive" |

---

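To show what the labeled-bullet shape means in practice, here is a rough sketch that turns `- Key: value` lines into a `Record<string, string | undefined>`. It works on raw text rather than the mdast tree the library uses, `parseLabeledBullets` is a hypothetical name, and mapping a blank value to `undefined` is an assumption based on the return type in the table.

```typescript
/** Parse `- Key: value` lines into a record; a blank value becomes undefined. */
function parseLabeledBullets(section: string): Record<string, string | undefined> {
  const result: Record<string, string | undefined> = {};
  for (const line of section.split(/\r?\n/)) {
    // Bullet marker, then a key up to the first colon, then the rest as the value.
    const match = /^\s*[-*+]\s+([^:]+):\s*(.*)$/.exec(line);
    if (match === null) continue; // not a labeled bullet; skip it
    const [, rawKey = "", rawValue = ""] = match;
    const value = rawValue.trim();
    result[rawKey.trim()] = value === "" ? undefined : value;
  }
  return result;
}

const metadataFields = parseLabeledBullets(
  ["- Version: 2.0.0", "- Stage: GA", "- Docs: https://example.com/notes", "- Owner:"].join("\n"),
);
// metadataFields.Version → "2.0.0", metadataFields.Owner → undefined
```

Splitting at the first colon keeps values such as URLs intact.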
## Programmatic mode (advanced)

> Most documents are better served by [template mode](#template-mode) — the
> schema lives next to the prose and the CLI validates it. Reach for
> programmatic mode only when you need what's below.

Use `defineDocSchema` when you need full TypeScript types or extractors not
covered by template directives.

### Example

A complete, runnable example lives in
[`examples/programmatic/`](examples/programmatic/) — a release-notes schema that
exercises **every** extractor and `defineDocSchema` option:

| File | Role |
| ------------------------------------------------------------------------- | -------------------------------------------------------------- |
| [`release-schema.ts`](examples/programmatic/release-schema.ts) | The `defineDocSchema` schema (title, all extractors, `refine`) |
| [`release.md`](examples/programmatic/release.md) | A filled document that validates against it |
| [`run.ts`](examples/programmatic/run.ts) | Parses `release.md` and prints the typed JSON |
| [`output/release.json`](examples/programmatic/output/release.json) | The emitted, validated structured payload |

```bash
node --experimental-strip-types examples/programmatic/run.ts
```

Two sections in that example are worth calling out, because they show the
repeating-group extractors that template directives cannot express:

- **`changes`** uses [`repeated`](#extractors) — splits the section by `H3`
  sub-heading and runs a sub-extractor map (`freetext` + `optional(table)` +
  `codeBlocks`) on each group.
- **`migration`** uses [`repeatedWhere`](#extractors) — splits by an arbitrary
  node predicate (here, `thematicBreak` / `---` rules) rather than headings.

Its `refine` then ties the two together as a cross-section invariant (pending
checklist items require migration steps).

<details>
<summary>Skeleton of the schema (see the file for the full version)</summary>

```ts
import { z } from "zod";
import {
  defineDocSchema,
  freetext,
  table,
  codeBlocks,
  optional,
  repeated,
  repeatedWhere,
} from "markdown-schema";

export const ReleaseDoc = defineDocSchema({
  title: { schema: z.string().regex(/^v\d+\.\d+\.\d+$/, "must be semver") },
  sections: {
    summary: { heading: "Summary", extract: freetext, schema: z.string().min(10) },
    // …
    changes: {
      heading: "Changes",
      extract: repeated({
        shape: { description: freetext, params: optional(table), snippets: codeBlocks },
      }),
      schema: z.array(/* Change */ z.object({})).min(1),
    },
    migration: {
      heading: "Migration",
      extract: repeatedWhere({
        startsAt: (n) => n.type === "thematicBreak",
        shape: { body: freetext },
      }),
      schema: z.array(z.object({ heading: z.string(), body: z.string().min(1) })),
      optional: true,
    },
  },
  refine: (doc, ctx) => {
    /* pending checklist items require migration steps — see the file */
  },
});
```

</details>

### CLI support

The `validate-md` CLI is **template mode only** — it derives the schema from a
`*.template.md` file. Programmatic schemas (`defineDocSchema`) are used in code;
call `.parse(raw)` directly, as the example above does. The old
`--schema` / `--export` flags were removed in v0.2.0; the CLI now errors and
points to `--template`.

---

## Entry points

The package has two entry points so the core API can bundle for the browser:

| Import path | Environment | Exports |
| ------------------------------------------ | ------------- | ----------------------------------------------------------------- |
| `markdown-schema` | browser + Node | `parseTemplate`, `defineDocSchema`, all extractors, `headingToKey`, `type RefineFunction` — pure mdast/zod, no Node builtins |
| `markdown-schema/node` | Node only | `loadRefine` — reads the filesystem and dynamically imports `*.refine.ts` |

The `validate-md` CLI is Node-only and uses the `/node` entry internally.

---

## API surface

### Template mode

| Export | Entry | Description |
| --------------------------- | ------- | -------------------------------------------------------------------------- |
| `parseTemplate(raw, opts?)` | `.` | Parses a `*.template.md` string into a `{ parse(raw) }` schema object |
| `headingToKey(heading)` | `.` | Converts H2 text (e.g. `"1. Summary"`) to a camelCase key (`"summary"`) |
| `loadRefine(templatePath)` | `/node` | Dynamically loads the sibling `*.refine.ts`; returns `undefined` if absent |

**`parseTemplate` options (`opts`):**

| Option | Default | Description |
| -------- | ------- | ------------------------------------------------------------------- |
| `refine` | — | Cross-section refinement callback to attach (usually `loadRefine`'s result) |
| `file` | — | Source file path used in error messages |

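For intuition about how H2 text maps to section keys, the sketch below approximates the documented behavior (`"1. Summary"` becomes `"summary"`, and the earlier example reads `doc.acceptanceCriteria` for `## 3. Acceptance Criteria`). `headingToKeySketch` is a hypothetical re-implementation for illustration; the exported `headingToKey` may treat edge cases such as punctuation or acronyms differently.

```typescript
/** Approximate the heading-to-key rule: drop a numeric prefix, then camelCase. */
function headingToKeySketch(heading: string): string {
  const words = heading
    .replace(/^\s*\d+(\.\d+)*\.\s+/, "") // drop a leading "1. " or "2.3. " prefix
    .split(/[^A-Za-z0-9]+/) // split on anything that is not a letter or digit
    .filter((word) => word.length > 0);
  return words
    .map((word, index) => {
      const lower = word.toLowerCase();
      // First word stays lowercase; later words get an initial capital.
      return index === 0 ? lower : lower.charAt(0).toUpperCase() + lower.slice(1);
    })
    .join("");
}

const summaryKey = headingToKeySketch("1. Summary"); // "summary"
const criteriaKey = headingToKeySketch("3. Acceptance Criteria"); // "acceptanceCriteria"
const issuesKey = headingToKeySketch("Known Issues"); // "knownIssues"
```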
### Extractors

| Extractor | Returns |
| ------------------------------------------------ | ------------------------------------------------------------------------------------- |
| `freetext` | Section body serialized back to markdown (all node types preserved) |
| `table` | First GFM table as `Record<string, string>[]`; throws if absent |
| `bulletList` | Plain list items as `string[]`; labeled list as `Record<string, string \| undefined>` |
| `taskList` | GFM task list as `{ checked: boolean; text: string }[]`; throws if absent |
| `codeBlocks` | All fenced code blocks as `{ lang: string \| null; value: string }[]` |
| `rawNodes` | The raw `RootContent[]` unchanged |
| `fencedCodeWithMarker({ marker, markerLabel? })` | Code block following an HTML comment matching `marker` |
| `optional(ex)` | Wraps any extractor; returns `undefined` instead of throwing |
| `repeated({ by?, shape })` | Splits by sub-heading; auto-detects depth |
| `repeatedWhere({ startsAt, shape, … })` | Generic repeating groups driven by a node predicate |

### Doc schema builder

| Export | Description |
| ----------------------- | ----------------------------------------------- |
| `defineDocSchema(spec)` | Returns `{ parse(raw: string): DocOf<S> }` |
| `SectionSpec<S>` | Type for one section entry |
| `DocOf<S>` | Infers the fully-typed parse result from a spec; includes `frontmatter?: unknown` |

**`defineDocSchema` spec fields:**

| Field | Default | Description |
| -------------- | -------- | ---------------------------------------------------- |
| `title` | — | `{ schema }` — extracts the H1 as the document title |
| `titleDepth` | `1` | Heading depth for the title |
| `sectionDepth` | `2` | Heading depth used as section boundaries |
| `sections` | required | Map of output key → `SectionSpec` |
| `refine` | — | `(doc, ctx) => void` — cross-section Zod refinement |

**`SectionSpec` fields:**

| Field | Default | Description |
| ---------- | ----------- | ------------------------------------------------- |
| `heading` | section key | Heading text in the document |
| `extract` | required | Extractor called on the section's body nodes |
| `schema` | required | Zod schema that validates the extracted value |
| `optional` | `false` | When `true`, a missing section yields `undefined` |

---

|
|
617
|
+
## CLI
|
|
618
|
+
|
|
619
|
+
```bash
|
|
620
|
+
# Validate one or more documents against a template
|
|
621
|
+
validate-md --template component.template.md component.md
|
|
622
|
+
validate-md --template component.template.md *.md
|
|
623
|
+
|
|
624
|
+
# Validate and emit the parsed document as JSON (exactly one input file)
|
|
625
|
+
validate-md --template component.template.md --emit-json component.json component.md
|
|
626
|
+
```
|
|
627
|
+
|
|
628
|
+
- `--template <file.template.md>` — required; the template that drives the schema.
|
|
629
|
+
- `--emit-json <path>` — write the parsed document (sections, title, and
|
|
630
|
+
`frontmatter`) to `<path>` as JSON. Requires exactly one input file.
|
|
631
|
+
- `--no-refine` — skip loading the sibling `*.refine.ts`. Use when validating
|
|
632
|
+
**untrusted** templates (see [Security](#security)).
|
|
633
|
+
|
|
634
|
+
A sibling `*.refine.ts` next to the template is loaded automatically (unless
|
|
635
|
+
`--no-refine` is given).
|
|
636
|
+
|
|
637
|
+
Prints `<file>: OK` on success, or `<file>: OK (wrote <path>)` when
|
|
638
|
+
`--emit-json` is set. On failure, prints `<file>: schema validation failed`
|
|
639
|
+
followed by one `path: message` line per Zod issue. Exits `1` if any file
|
|
640
|
+
fails, `0` otherwise.
|
|
641
|
+
|
|
642
|
+
All status lines are written to **stderr**, so `--emit-json /dev/stdout` yields
|
|
643
|
+
clean JSON on stdout (pipeable to `jq`).
|
|
644
|
+
|
|
645
|
+
### Requirements
|
|
646
|
+
|
|
647
|
+
- **Node.js ≥ 22.18**. The CLI loads the sibling `*.refine.ts` with Node's
|
|
648
|
+
built-in [type stripping](https://nodejs.org/api/typescript.html#type-stripping)
|
|
649
|
+
(unflagged from 22.18 / 23.6 onward). No `tsx`, `ts-node`, or build step
|
|
650
|
+
needed.
|
|
651
|
+
- TypeScript syntax that erases to plain JavaScript works as-is (`type`,
|
|
652
|
+
`interface`, type annotations, `as`, `satisfies`). Runtime-emitting syntax
|
|
653
|
+
(`enum`, `namespace`, parameter properties, legacy decorators) must be
|
|
654
|
+
compiled first.
|
|
655
|
+
|
|
656
|
+
---
|
|
657
|
+
|
|
658
|
+
## Security
|
|
659
|
+
|
|
660
|
+
This library builds schemas from `*.template.md` files and validates documents
|
|
661
|
+
against them. If **either** the template or the document can come from an
|
|
662
|
+
untrusted source (a PR, an upload, a third party), read this.
|
|
663
|
+
|
|
664
|
+
### Executing `*.refine.ts` (code execution)
|
|
665
|
+
|
|
666
|
+
Loading a template auto-imports its sibling `<stem>.refine.ts` and runs its
|
|
667
|
+
`refine` export. **Validating an untrusted template therefore executes
|
|
668
|
+
arbitrary code** shipped beside it. The path is confined to the sibling file (no
|
|
669
|
+
traversal), but the feature is code-execution by design.
|
|
670
|
+
|
|
671
|
+
- CLI: pass `--no-refine` to skip it.
|
|
672
|
+
- Programmatic: do not call `loadRefine`, and do not pass `refine` to
|
|
673
|
+
`parseTemplate`. `parseTemplate(raw)` alone never loads or runs any file.
|
|
674
|
+
|
|
675
|
+
### Template-supplied regular expressions (ReDoS)
|
|
676
|
+
|
|
677
|
+
A template's `regex` modifier is compiled and tested against document values.
|
|
678
|
+
Every such pattern is routed through a ReDoS guard that runs at schema-build
|
|
679
|
+
time: it caps the pattern length and statically analyses it, rejecting
|
|
680
|
+
exponential-backtracking patterns outright and allowing only low-degree
|
|
681
|
+
(≤ 2) polynomial patterns. An unsafe pattern fails template parsing with a
|
|
682
|
+
`DirectiveError` rather than hanging the validator on a crafted document.
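
To make the idea of a build-time guard concrete, here is a deliberately crude sketch: a length cap plus a nested-quantifier heuristic. It is not the library's analyser. The name `checkPattern`, the 200-character cap, and the plain `Error` (the library reports a `DirectiveError`) are all assumptions for illustration, and a real static analysis catches far more cases than this regex does.

```typescript
// Assumed cap for this sketch; the library's actual limit is not stated here.
const MAX_PATTERN_LENGTH = 200;

// Crude heuristic: a quantified group whose body also contains a quantifier,
// e.g. (a+)+ or (\w*)*, is the classic exponential-backtracking shape.
// It misses other unsafe shapes such as (a|a)* and can flag escaped `\+`.
const NESTED_QUANTIFIER = /\([^()]*[+*][^()]*\)[+*{]/;

// Hypothetical helper: reject suspicious patterns before compiling them.
function checkPattern(source: string): RegExp {
  if (source.length > MAX_PATTERN_LENGTH) {
    throw new Error(`pattern longer than ${MAX_PATTERN_LENGTH} characters`);
  }
  if (NESTED_QUANTIFIER.test(source)) {
    throw new Error(`pattern may backtrack exponentially: ${source}`);
  }
  return new RegExp(source);
}

// Returns true when the guard rejects the pattern.
function isRejected(source: string): boolean {
  try {
    checkPattern(source);
    return false;
  } catch {
    return true;
  }
}
```

The point of checking at schema-build time is that the cost is paid once per template, before any untrusted document value reaches the compiled pattern.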
|
|
683
|
+
|
|
684
|
+
### Frontmatter and untrusted keys
|
|
685
|
+
|
|
686
|
+
YAML frontmatter is parsed with `yaml` (≥ 2), which materialises `__proto__`
|
|
687
|
+
and similar keys as ordinary own properties — no prototype pollution. Table
|
|
688
|
+
headers and labeled-list keys derived from untrusted markdown are collected into
|
|
689
|
+
null-prototype objects for the same reason. Frontmatter is **exposed, not
|
|
690
|
+
schema-validated**; validate it yourself in a `*.refine.ts` (which only runs
|
|
691
|
+
under the conditions above).
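
The null-prototype technique mentioned above can be sketched in a few lines. The helper name `rowToRecord` is hypothetical and not part of the library; it only illustrates why a key such as `__proto__` is stored as ordinary data when the target object has no prototype.

```typescript
// Collect untrusted header/value pairs into an object with no prototype.
// On a normal `{}` the assignment `obj["__proto__"] = "x"` hits the inherited
// setter and is silently ignored for string values; with a null prototype
// there is no such setter, so the key becomes a plain own property.
function rowToRecord(headers: string[], cells: string[]): Record<string, string> {
  const record: Record<string, string> = Object.create(null);
  headers.forEach((header, i) => {
    record[header] = cells[i] ?? "";
  });
  return record;
}

const row = rowToRecord(["name", "__proto__"], ["Button", "polluted"]);
```

One consequence worth keeping in mind: such objects have no `hasOwnProperty` or `toString`, so use `Object.keys(row)` or `Object.hasOwn(row, key)` rather than calling methods on the object itself.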
|
|
692
|
+
|
|
693
|
+
### Filesystem paths (CLI)
|
|
694
|
+
|
|
695
|
+
`validate-md` reads the `--template` and input paths and writes `--emit-json` output to any
|
|
696
|
+
path you give it, including absolute and `../` paths. This is fine when an
|
|
697
|
+
operator runs the CLI, but do not pass attacker-controlled path arguments to it
|
|
698
|
+
without confining them yourself.
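
One way to confine a path before handing it to the CLI is sketched below. `confineToBase` is a hypothetical helper, not something the package exports, and it does not resolve symlinks, so treat it as a starting point rather than a complete defence.

```typescript
import * as path from "node:path";

// Resolve `userPath` against `baseDir` and reject anything that lands outside
// it, including absolute paths and `../` traversal.
function confineToBase(baseDir: string, userPath: string): string {
  const base = path.resolve(baseDir);
  const resolved = path.resolve(base, userPath);
  const rel = path.relative(base, resolved);
  const escapes =
    rel === "" || // the base directory itself, not a file inside it
    rel === ".." ||
    rel.startsWith(".." + path.sep) ||
    path.isAbsolute(rel); // e.g. a different drive on Windows
  if (escapes) {
    throw new Error(`path escapes ${base}: ${userPath}`);
  }
  return resolved;
}

// Returns true when the helper refuses the path.
function isRefused(baseDir: string, userPath: string): boolean {
  try {
    confineToBase(baseDir, userPath);
    return false;
  } catch {
    return true;
  }
}
```

A wrapper service would run each user-supplied `--template`, input, and `--emit-json` argument through a check like this and pass only the returned absolute paths to `validate-md`.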
|
|
699
|
+
|
|
700
|
+
---
|
|
701
|
+
|
|
702
|
+
## Design notes
|
|
703
|
+
|
|
704
|
+
The toolkit composes each section's Zod schema into a single `z.object` and
|
|
705
|
+
validates the whole document at once, so refinement callbacks receive a
|
|
706
|
+
fully-typed document and errors arrive as a single `ZodError` tree. Structural
|
|
707
|
+
failures (missing required section, malformed table) throw a plain `Error`
|
|
708
|
+
before Zod runs — structure is checked first, semantics second.
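
A caller that wants to report the two failure kinds differently could branch on the shape of the thrown value, roughly as follows. This is a sketch under assumptions: it duck-types on the `issues` array that a `ZodError` carries so that it runs without importing `zod`, and the `structure:` prefix and dot-joined paths are illustrative choices, not the CLI's exact output format.

```typescript
// Minimal structural view of a Zod issue; a real ZodError carries more fields.
interface IssueLike {
  path: ReadonlyArray<string | number>;
  message: string;
}

// Duck-typed check. In real code, `err instanceof z.ZodError` is the usual test.
function hasIssues(err: unknown): err is { issues: IssueLike[] } {
  return (
    typeof err === "object" &&
    err !== null &&
    Array.isArray((err as { issues?: unknown }).issues)
  );
}

// Turn a thrown value into report lines: one per issue for a semantic
// failure, a single line for a structural failure (a plain Error).
function describeFailure(err: unknown): string[] {
  if (hasIssues(err)) {
    return err.issues.map((issue) => `${issue.path.join(".")}: ${issue.message}`);
  }
  return [`structure: ${err instanceof Error ? err.message : String(err)}`];
}
```

Wrapping a `parse(raw)` call in `try`/`catch` and passing the caught value to a function like this keeps the "structure first, semantics second" ordering visible in the report.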
|
|
709
|
+
|
|
710
|
+
In template mode, `parseTemplate` walks the mdast tree, collects
|
|
711
|
+
`<!-- TEMPLATE-ONLY: -->` directives, and infers an extractor + Zod schema for
|
|
712
|
+
each H2 section. Section keys are derived from heading text via `headingToKey`
|
|
713
|
+
(e.g. `"4. Inputs (Props)"` → `"inputs"`). An optional `*.refine.ts` sibling
|
|
714
|
+
can be loaded at runtime to add cross-section invariants without coupling them
|
|
715
|
+
to the template syntax.
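
As an illustration, a sibling refine file might look like the sketch below. The types are declared locally so the example stands alone; the `ctx` shape is assumed to follow Zod's `superRefine` context (`addIssue` with a `custom` code), and the `version` frontmatter key is hypothetical. Since frontmatter is exposed but not schema-validated, it arrives as `unknown` and has to be narrowed by hand.

```typescript
// Assumed shapes for this sketch; the library's real types may differ.
interface RefineCtx {
  addIssue(issue: { code: "custom"; path: (string | number)[]; message: string }): void;
}
interface Doc {
  frontmatter?: unknown;
}

// Would live in `<stem>.refine.ts` next to the template and be picked up
// through its `refine` export.
export function refine(doc: Doc, ctx: RefineCtx): void {
  const fm = doc.frontmatter;
  const version =
    typeof fm === "object" && fm !== null
      ? (fm as Record<string, unknown>).version
      : undefined;
  // Hypothetical rule: frontmatter must carry a three-part numeric version.
  if (typeof version !== "string" || !/^\d+\.\d+\.\d+$/.test(version)) {
    ctx.addIssue({
      code: "custom",
      path: ["frontmatter", "version"],
      message: "frontmatter.version must look like 1.2.3",
    });
  }
}
```

The file uses only erasable TypeScript syntax (`interface`, annotations, `as`), which is what lets Node's type stripping load it without a build step.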
|
|
716
|
+
|
|
717
|
+
YAML frontmatter is parsed via `remark-frontmatter` + `yaml` and lifted onto
|
|
718
|
+
`doc.frontmatter` alongside the title and sections. It is exposed but not
|
|
719
|
+
schema-validated by the parser; frontmatter rules belong in `*.refine.ts`.
|
|
720
|
+
````
|