mdld-parse 0.1.0 → 0.2.1
- package/README.md +83 -174
- package/index.js +438 -795
- package/package.json +7 -8
- package/tests.js +0 -409
package/README.md
CHANGED
|
@@ -1,28 +1,36 @@
|
|
|
1
|
-
# MD-LD
|
|
1
|
+
# MD-LD Parse
|
|
2
2
|
|
|
3
|
-
|
|
3
|
+
**Markdown-Linked Data (MD-LD)** — a human-friendly RDF authoring format that extends Markdown with semantic annotations.
|
|
4
|
+
|
|
5
|
+
[NPM](https://www.npmjs.com/package/mdld-parse)
|
|
6
|
+
|
|
7
|
+
[Website](https://mdld.js.org)
|
|
4
8
|
|
|
5
9
|
## What is MD-LD?
|
|
6
10
|
|
|
7
11
|
MD-LD allows you to author RDF graphs directly in Markdown using familiar syntax:
|
|
8
12
|
|
|
9
13
|
```markdown
|
|
10
|
-
|
|
11
|
-
"@context":
|
|
12
|
-
"@vocab": "http://schema.org/"
|
|
13
|
-
"@id": "#doc"
|
|
14
|
-
"@type": Article
|
|
15
|
-
---
|
|
14
|
+
# My Note {=urn:mdld:my-note-20251231 .NoteDigitalDocument}
|
|
16
15
|
|
|
17
|
-
|
|
16
|
+
[ex]{: http://example.org/}
|
|
18
17
|
|
|
19
|
-
Written by [Alice Johnson](
|
|
18
|
+
Written by [Alice Johnson](=ex:alice){author .Person}
|
|
20
19
|
|
|
21
|
-
|
|
20
|
+
## Alice's biography {=ex:alice}
|
|
21
|
+
|
|
22
|
+
[Alice](ex:alice){name} works at [Tech Corp](=ex:tech-corp){worksFor .Organization}
|
|
22
23
|
```
|
|
23
24
|
|
|
24
25
|
This generates valid RDF triples while remaining readable as plain Markdown.
|
|
25
26
|
|
|
27
|
+
```n-quads
|
|
28
|
+
<urn:mdld:my-note-20251231> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/NoteDigitalDocument> .
|
|
29
|
+
<urn:mdld:my-note-20251231> <http://schema.org/author> <http://example.org/alice> .
|
|
30
|
+
<http://example.org/alice> <http://schema.org/name> "Alice" .
|
|
31
|
+
<http://example.org/alice> <http://schema.org/worksFor> <http://example.org/tech-corp> .
|
|
32
|
+
```
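The N-Quads lines above can be produced from RDF/JS-shaped quad objects with a few lines of code. The following is a minimal sketch, not part of `mdld-parse`: the helper names `termToNQuads` and `quadToNQuads` are hypothetical, and the terms are plain objects that only mimic the RDF/JS data model.

```javascript
// Hypothetical helpers (not the mdld-parse API): format RDF/JS-shaped terms as N-Quads.
const XSD_STRING = "http://www.w3.org/2001/XMLSchema#string";

function termToNQuads(term) {
  if (term.termType === "NamedNode") return `<${term.value}>`;
  if (term.termType === "BlankNode") return `_:${term.value}`;
  // Literal: escape backslashes, double quotes and line breaks.
  const escaped = term.value
    .replace(/\\/g, "\\\\")
    .replace(/"/g, '\\"')
    .replace(/\n/g, "\\n")
    .replace(/\r/g, "\\r");
  if (term.language) return `"${escaped}"@${term.language}`;
  if (term.datatype && term.datatype.value !== XSD_STRING) {
    return `"${escaped}"^^<${term.datatype.value}>`;
  }
  return `"${escaped}"`;
}

function quadToNQuads(quad) {
  const parts = [quad.subject, quad.predicate, quad.object].map(termToNQuads);
  // Only named graphs are written; the default graph is left implicit.
  if (quad.graph && quad.graph.termType !== "DefaultGraph") {
    parts.push(termToNQuads(quad.graph));
  }
  return parts.join(" ") + " .";
}

const quad = {
  subject: { termType: "NamedNode", value: "http://example.org/alice" },
  predicate: { termType: "NamedNode", value: "http://schema.org/name" },
  object: {
    termType: "Literal",
    value: "Alice",
    language: "",
    datatype: { termType: "NamedNode", value: XSD_STRING },
  },
  graph: { termType: "DefaultGraph", value: "" },
};

const line = quadToNQuads(quad);
console.log(line);
```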
|
|
33
|
+
|
|
26
34
|
## Architecture
|
|
27
35
|
|
|
28
36
|
### Design Principles
|
|
@@ -32,6 +40,8 @@ This generates valid RDF triples while remaining readable as plain Markdown.
|
|
|
32
40
|
3. **Standards Compliant** — Outputs RDF quads compatible with RDFa semantics
|
|
33
41
|
4. **Markdown Native** — Plain Markdown yields minimal but valid RDF
|
|
34
42
|
5. **Progressive Enhancement** — Add semantics incrementally via attributes
|
|
43
|
+
6. **BaseIRI Inference** — Automatically infers baseIRI from document structure
|
|
44
|
+
7. **Default Vocabulary** — Provides default vocabulary for common properties, extensible via options
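As an illustration of the default-vocabulary principle, the sketch below merges a default context with user-supplied prefixes and expands terms against the result. It is a rough approximation under stated assumptions: `defaultContext`, `mergeContext`, and `expandTerm` are hypothetical names, and the actual defaults and resolution rules in `mdld-parse` may differ.

```javascript
// Assumed default context; the real defaults in mdld-parse may differ.
const defaultContext = { "@vocab": "http://schema.org/" };

// Later entries win, so caller-supplied prefixes override the defaults.
function mergeContext(base, extra = {}) {
  return { ...base, ...extra };
}

// Expand a bare term or a prefixed name (CURIE) to a full IRI.
function expandTerm(term, context) {
  const colon = term.indexOf(":");
  if (colon > 0) {
    const prefix = term.slice(0, colon);
    if (Object.prototype.hasOwnProperty.call(context, prefix)) {
      return context[prefix] + term.slice(colon + 1);
    }
    return term; // unknown prefix: treat as an absolute IRI such as urn:mdld:doc
  }
  return (context["@vocab"] || "") + term;
}

const context = mergeContext(defaultContext, { ex: "http://example.org/" });

const expandedAlice = expandTerm("ex:alice", context);
const expandedAuthor = expandTerm("author", context);
const expandedUrn = expandTerm("urn:mdld:doc", context);
console.log(expandedAlice, expandedAuthor, expandedUrn);
```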
|
|
35
45
|
|
|
36
46
|
### Stack Choices
|
|
37
47
|
|
|
@@ -40,8 +50,7 @@ This generates valid RDF triples while remaining readable as plain Markdown.
|
|
|
40
50
|
We implement a **minimal, purpose-built parser** for maximum control and zero dependencies:
|
|
41
51
|
|
|
42
52
|
- **Custom Markdown tokenizer** — Line-by-line parsing of headings, lists, paragraphs, code blocks
|
|
43
|
-
- **Inline attribute parser** — Pandoc-style `{
|
|
44
|
-
- **YAML-LD frontmatter parser** — Minimal YAML subset for `@context` and `@id` parsing
|
|
53
|
+
- **Inline attribute parser** — Pandoc-style `{=iri .class key="value"}` attribute extraction
|
|
45
54
|
- **RDF quad generator** — Direct mapping from tokens to RDF/JS quads
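To make the inline attribute step concrete, here is a small sketch of how a `{=iri .class key="value"}` block could be split into its parts. It is not the parser shipped in `index.js`: the function name `parseAttributes` and the shape of the returned object are assumptions made for illustration, and reading bare tokens as property names follows the README examples such as `{name}`.

```javascript
// Hypothetical sketch, not the mdld-parse implementation.
function parseAttributes(source) {
  const match = /^\{([^}]*)\}$/.exec(source.trim());
  if (!match) return null; // not an attribute block

  const attrs = { subject: null, types: [], pairs: {}, properties: [] };
  // A token is either key="quoted value" or a run of non-space characters.
  const tokenPattern = /([^\s=]+)="([^"]*)"|(\S+)/g;

  for (const m of match[1].matchAll(tokenPattern)) {
    if (m[1] !== undefined) {
      attrs.pairs[m[1]] = m[2]; // key="value"
    } else if (m[3].startsWith("=")) {
      attrs.subject = m[3].slice(1); // =iri
    } else if (m[3].startsWith(".")) {
      attrs.types.push(m[3].slice(1)); // .class
    } else {
      attrs.properties.push(m[3]); // bare token, e.g. a property name
    }
  }
  return attrs;
}

const attrs = parseAttributes('{=ex:alice .Person key="some value" name}');
console.log(attrs);
```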
|
|
46
55
|
|
|
47
56
|
**Why custom?**
|
|
@@ -79,9 +88,7 @@ Markdown Text
|
|
|
79
88
|
↓
|
|
80
89
|
[Custom Tokenizer] — Extract headings, lists, paragraphs, code blocks
|
|
81
90
|
↓
|
|
82
|
-
[
|
|
83
|
-
↓
|
|
84
|
-
[Attribute Parser] — Parse {#id property="value"} from tokens
|
|
91
|
+
[Attribute Parser] — Parse {=iri .class key="value"} from tokens
|
|
85
92
|
↓
|
|
86
93
|
[Inline Parser] — Extract [text](url){attrs} spans
|
|
87
94
|
↓
|
|
@@ -101,17 +108,6 @@ The zero-dependency design provides:
|
|
|
101
108
|
3. **Predictable performance** — Linear time complexity, bounded memory
|
|
102
109
|
4. **Easy integration** — Works in Node.js, browsers, and edge runtimes
|
|
103
110
|
|
|
104
|
-
### Performance Profile
|
|
105
|
-
|
|
106
|
-
| Document Size | Peak Memory | Parse Time |
|
|
107
|
-
| ------------- | ----------- | ---------- |
|
|
108
|
-
| 10 KB | ~100 KB | <2ms |
|
|
109
|
-
| 100 KB | ~500 KB | <20ms |
|
|
110
|
-
| 1 MB | ~2 MB | <100ms |
|
|
111
|
-
| 10 MB | ~10 MB | <1s |
|
|
112
|
-
|
|
113
|
-
_Measured on modern JavaScript engines. Actual performance depends on document structure._
|
|
114
|
-
|
|
115
111
|
## Installation
|
|
116
112
|
|
|
117
113
|
### Node.js
|
|
@@ -121,12 +117,11 @@ npm install mdld-parse
|
|
|
121
117
|
```
|
|
122
118
|
|
|
123
119
|
```javascript
|
|
124
|
-
import {
|
|
120
|
+
import { parse } from "mdld-parse";
|
|
125
121
|
|
|
126
|
-
const markdown = `# Hello
|
|
127
|
-
const
|
|
128
|
-
|
|
129
|
-
});
|
|
122
|
+
const markdown = `# Hello {=urn:mdld:hello .Article}`;
|
|
123
|
+
const result = parse(markdown);
|
|
124
|
+
const quads = result.quads;
|
|
130
125
|
```
|
|
131
126
|
|
|
132
127
|
### Browser (via CDN)
|
|
@@ -141,42 +136,63 @@ const quads = parseMDLD(markdown, {
|
|
|
141
136
|
</script>
|
|
142
137
|
|
|
143
138
|
<script type="module">
|
|
144
|
-
import {
|
|
145
|
-
// use
|
|
139
|
+
import { parse } from "mdld-parse";
|
|
140
|
+
// use parse...
|
|
146
141
|
</script>
|
|
147
142
|
```
|
|
148
143
|
|
|
149
144
|
## API
|
|
150
145
|
|
|
151
|
-
### `
|
|
146
|
+
### `parse(markdown, options)`
|
|
152
147
|
|
|
153
|
-
Parse MD-LD markdown and return
|
|
148
|
+
Parse MD-LD markdown and return the parsing result.
|
|
154
149
|
|
|
155
150
|
**Parameters:**
|
|
156
151
|
|
|
157
152
|
- `markdown` (string) — MD-LD formatted text
|
|
158
153
|
- `options` (object, optional):
|
|
159
|
-
- `baseIRI` (string) — Base IRI for relative references
|
|
160
|
-
- `
|
|
154
|
+
- `baseIRI` (string) — Base IRI for relative references
|
|
155
|
+
- `context` (object) — Additional context to merge with default context
|
|
161
156
|
- `dataFactory` (object) — Custom RDF/JS DataFactory (default: built-in)
|
|
162
157
|
|
|
163
|
-
**Returns:**
|
|
158
|
+
**Returns:** Object containing:
|
|
159
|
+
- `quads` — Array of RDF/JS Quads
|
|
160
|
+
- `origin` — Object with `blocks` and `quadIndex` for serialization
|
|
161
|
+
- `context` — Final context used for parsing
|
|
162
|
+
|
|
163
|
+
### `serialize({ text, diff, origin, options })`
|
|
164
|
+
|
|
165
|
+
Serialize RDF changes back to markdown with proper positioning.
|
|
166
|
+
|
|
167
|
+
**Parameters:**
|
|
168
|
+
|
|
169
|
+
- `text` (string) — Original markdown text
|
|
170
|
+
- `diff` (object) — Changes to apply:
|
|
171
|
+
- `add` — Array of quads to add
|
|
172
|
+
- `delete` — Array of quads to remove
|
|
173
|
+
- `origin` (object) — Origin object from parse result
|
|
174
|
+
- `options` (object, optional) — Additional options
|
|
175
|
+
|
|
176
|
+
**Returns:** Object containing:
|
|
177
|
+
- `text` — Updated markdown text
|
|
178
|
+
- `origin` — Updated origin object
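The `diff` argument described above can be sketched as follows. The term constructors here are minimal stand-ins for an RDF/JS DataFactory, written only for this example; the final `serialize` call is left as a comment because it needs the installed package and a prior `parse` result.

```javascript
// Minimal stand-ins for RDF/JS terms; a real DataFactory would normally build these.
const namedNode = (value) => ({ termType: "NamedNode", value });
const literal = (value) => ({
  termType: "Literal",
  value,
  language: "",
  datatype: namedNode("http://www.w3.org/2001/XMLSchema#string"),
});
const quad = (subject, predicate, object) => ({
  subject,
  predicate,
  object,
  graph: { termType: "DefaultGraph", value: "" },
});

const alice = namedNode("http://example.org/alice");
const name = namedNode("http://schema.org/name");

// Rename Alice: remove the old literal and add the new one.
const diff = {
  add: [quad(alice, name, literal("Alice Johnson"))],
  delete: [quad(alice, name, literal("Alice"))],
};

// With the package installed, the call would then look roughly like this
// (markdown and result come from an earlier parse call):
// const { text, origin } = serialize({ text: markdown, diff, origin: result.origin });
console.log(diff.add.length, diff.delete.length);
```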
|
|
164
179
|
|
|
165
180
|
```javascript
|
|
166
|
-
const
|
|
181
|
+
const result = parse(
|
|
167
182
|
`
|
|
168
|
-
# Article Title
|
|
169
|
-
{#article typeof="Article"}
|
|
183
|
+
# Article Title {=ex:article .Article}
|
|
170
184
|
|
|
171
|
-
Written by [Alice](
|
|
185
|
+
Written by [Alice](ex:alice) {ex:author}
|
|
172
186
|
`,
|
|
173
187
|
{
|
|
174
188
|
baseIRI: "http://example.org/doc",
|
|
175
|
-
|
|
189
|
+
context: {
|
|
190
|
+
'@vocab': 'http://schema.org/',
|
|
191
|
+
},
|
|
176
192
|
}
|
|
177
193
|
);
|
|
178
194
|
|
|
179
|
-
// quads[0] = {
|
|
195
|
+
// result.quads[0] = {
|
|
180
196
|
// subject: { termType: 'NamedNode', value: 'http://example.org/doc#article' },
|
|
181
197
|
// predicate: { termType: 'NamedNode', value: 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type' },
|
|
182
198
|
// object: { termType: 'NamedNode', value: 'http://schema.org/Article' },
|
|
@@ -184,78 +200,51 @@ Written by [Alice](#alice){property="author"}
|
|
|
184
200
|
// }
|
|
185
201
|
```
|
|
186
202
|
|
|
187
|
-
### Batch Processing
|
|
188
|
-
|
|
189
|
-
For multiple documents, process them sequentially:
|
|
190
|
-
|
|
191
|
-
```javascript
|
|
192
|
-
const documents = [markdown1, markdown2, markdown3];
|
|
193
|
-
const allQuads = documents.flatMap((md) =>
|
|
194
|
-
parseMDLD(md, { baseIRI: "http://example.org/" })
|
|
195
|
-
);
|
|
196
|
-
```
|
|
197
|
-
|
|
198
203
|
## Implementation Details
|
|
199
204
|
|
|
200
205
|
### Subject Resolution
|
|
201
206
|
|
|
202
207
|
MD-LD follows a clear subject inheritance model:
|
|
203
208
|
|
|
204
|
-
1. **Root subject** — Declared in
|
|
205
|
-
2. **Heading subjects** — `## Title {
|
|
206
|
-
3. **Inline subjects** — `[text](
|
|
209
|
+
1. **Root subject** — Declared in the first heading of the document, or inferred from its text content
|
|
210
|
+
2. **Heading subjects** — `## Title {=ex:title .Type}`
|
|
211
|
+
3. **Inline subjects** — `[text](=ex:text) {.Type}`
|
|
207
212
|
4. **Blank nodes** — Generated for incomplete triples
|
|
208
213
|
|
|
209
214
|
```markdown
|
|
210
|
-
# Document
|
|
211
|
-
|
|
212
|
-
{#doc typeof="Article"}
|
|
215
|
+
# Document {=urn:mdld:doc .Article}
|
|
213
216
|
|
|
214
|
-
## Section
|
|
217
|
+
## Section 1 {=urn:mdld:sec1 .Section}
|
|
215
218
|
|
|
216
|
-
{
|
|
219
|
+
[Text] {name} ← property of sec1
|
|
217
220
|
|
|
218
|
-
[
|
|
221
|
+
Back to [doc](=urn:mdld:doc) {hasPart}
|
|
219
222
|
```
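The inheritance shown in this example can be approximated by carrying the most recent heading subject forward. The sketch below works on a hypothetical, simplified block list (not the `origin.blocks` structure returned by `parse`), and `assignSubjects` is an illustrative name rather than a library function.

```javascript
// Hypothetical sketch: each block without its own subject inherits the
// subject of the closest preceding heading that declared one.
function assignSubjects(blocks) {
  let current = null;
  return blocks.map((block) => {
    if (block.type === "heading" && block.subject) {
      current = block.subject;
    }
    return { ...block, subject: block.subject || current };
  });
}

const blocks = [
  { type: "heading", text: "Document", subject: "urn:mdld:doc" },
  { type: "heading", text: "Section 1", subject: "urn:mdld:sec1" },
  { type: "paragraph", text: "[Text] {name}" },
];

const resolved = assignSubjects(blocks);
console.log(resolved.map((b) => b.subject));
```

In this sketch the paragraph ends up attached to `urn:mdld:sec1`, matching the "property of sec1" note in the example.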
|
|
220
223
|
|
|
221
|
-
### Property Mapping
|
|
222
|
-
|
|
223
|
-
| Markdown | RDF Predicate |
|
|
224
|
-
| ----------------------- | ------------------------------------------------------------------------------- |
|
|
225
|
-
| Top-level H1 (no `#id`) | `rdfs:label` on root subject |
|
|
226
|
-
| Heading with `{#id}` | `rdfs:label` on subject |
|
|
227
|
-
| First paragraph | `dct:description` on root |
|
|
228
|
-
| `{property="name"}` | Resolved via `@vocab` (e.g., `schema:name`) |
|
|
229
|
-
| `{rel="author"}` | Resolved via `@vocab` (e.g., `schema:author`) |
|
|
230
|
-
| Code block | `schema:SoftwareSourceCode` with `schema:programmingLanguage` and `schema:text` |
|
|
231
|
-
|
|
232
224
|
### List Handling
|
|
233
225
|
|
|
234
|
-
```markdown
|
|
235
|
-
-
|
|
236
|
-
-
|
|
226
|
+
```markdown {item}
|
|
227
|
+
- Item 1
|
|
228
|
+
- Item 2
|
|
237
229
|
```
|
|
238
230
|
|
|
239
231
|
Creates **multiple triples** with same predicate (not RDF lists):
|
|
240
232
|
|
|
241
233
|
```turtle
|
|
242
|
-
|
|
243
|
-
|
|
234
|
+
<subject> schema:item "Item 1" .
|
|
235
|
+
<subject> schema:item "Item 2" .
|
|
244
236
|
```
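A rough sketch of this mapping, assuming the subject and predicate are already resolved to IRIs: every list item becomes its own triple with the same predicate, and no `rdf:first`/`rdf:rest` chain is built. `listToTriples` is a hypothetical helper, and the subject IRI is a placeholder.

```javascript
// Hypothetical helper: one triple per list item, all sharing the same predicate.
function listToTriples(subject, predicate, items) {
  return items.map((item) => ({
    subject: { termType: "NamedNode", value: subject },
    predicate: { termType: "NamedNode", value: predicate },
    object: { termType: "Literal", value: item },
  }));
}

const triples = listToTriples(
  "http://example.org/subject", // placeholder subject IRI
  "http://schema.org/item",
  ["Item 1", "Item 2"]
);
console.log(triples.map((t) => t.object.value));
```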
|
|
245
237
|
|
|
246
|
-
For RDF lists (`rdf:List`), use `@inlist` in generated HTML.
|
|
247
|
-
|
|
248
238
|
### Code Block Semantics
|
|
249
239
|
|
|
250
|
-
Fenced code blocks are automatically mapped to `schema:SoftwareSourceCode`:
|
|
251
|
-
|
|
252
240
|
```markdown
|
|
253
|
-
\`\`\`sparql {
|
|
254
|
-
SELECT
|
|
241
|
+
\`\`\`sparql {=ex:query-1 .SoftwareSourceCode}
|
|
242
|
+
SELECT \* WHERE { ?s ?p ?o }
|
|
255
243
|
\`\`\`
|
|
256
244
|
```
|
|
257
245
|
|
|
258
246
|
Creates:
|
|
247
|
+
|
|
259
248
|
- A `schema:SoftwareSourceCode` resource (or custom type via `typeof`)
|
|
260
249
|
- `schema:programmingLanguage` from the info string (`sparql`)
|
|
261
250
|
- `schema:text` with the raw source code
|
|
@@ -263,112 +252,32 @@ Creates:
|
|
|
263
252
|
|
|
264
253
|
This enables semantic queries like "find all SPARQL queries in my notes."
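Such a query can be sketched directly over an array of quads, without a SPARQL engine. The example assumes quads shaped like RDF/JS objects and the three `schema:` properties listed above; `findSourceCode` and the sample data are made up for illustration.

```javascript
const SCHEMA = "http://schema.org/";

// Hypothetical helper: return the source text of every code resource
// whose schema:programmingLanguage equals the given language.
function findSourceCode(quads, language) {
  const subjects = quads
    .filter(
      (q) =>
        q.predicate.value === SCHEMA + "programmingLanguage" &&
        q.object.value === language
    )
    .map((q) => q.subject.value);

  return quads
    .filter(
      (q) =>
        subjects.includes(q.subject.value) &&
        q.predicate.value === SCHEMA + "text"
    )
    .map((q) => q.object.value);
}

// Sample data standing in for the quads of a parsed note.
const t = (s, p, o) => ({
  subject: { value: s },
  predicate: { value: SCHEMA + p },
  object: { value: o },
});
const sampleQuads = [
  t("http://example.org/query-1", "programmingLanguage", "sparql"),
  t("http://example.org/query-1", "text", "SELECT * WHERE { ?s ?p ?o }"),
  t("http://example.org/script-1", "programmingLanguage", "bash"),
  t("http://example.org/script-1", "text", "npm test"),
];

const sparqlQueries = findSourceCode(sampleQuads, "sparql");
console.log(sparqlQueries);
```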
|
|
265
254
|
|
|
266
|
-
### Blank Node Strategy
|
|
267
|
-
|
|
268
|
-
Blank nodes are created for:
|
|
269
|
-
|
|
270
|
-
1. Task list items without explicit `#id`
|
|
271
|
-
2. Code blocks without explicit `#id`
|
|
272
|
-
3. Inline `typeof` without `id` when used with `rel`
|
|
273
|
-
|
|
274
|
-
## Testing
|
|
275
|
-
|
|
276
|
-
```bash
|
|
277
|
-
npm test
|
|
278
|
-
````
|
|
279
|
-
|
|
280
|
-
Tests cover:
|
|
281
|
-
|
|
282
|
-
- ✅ YAML-LD frontmatter parsing
|
|
283
|
-
- ✅ Subject inheritance via headings
|
|
284
|
-
- ✅ Property literals and datatypes (`property`, `datatype`)
|
|
285
|
-
- ✅ Object relationships (`rel` on links)
|
|
286
|
-
- ✅ Blank node generation (tasks, code blocks)
|
|
287
|
-
- ✅ List mappings (repeated properties)
|
|
288
|
-
- ✅ Code block semantics (`SoftwareSourceCode`)
|
|
289
|
-
- ✅ Semantic links in lists (`hasPart` TOC)
|
|
290
|
-
- ✅ Cross-references via fragment IDs
|
|
291
|
-
- ✅ Minimal Markdown → RDF (headings, paragraphs)
|
|
292
|
-
|
|
293
255
|
## Syntax Overview
|
|
294
256
|
|
|
295
257
|
### Core Features
|
|
296
258
|
|
|
297
|
-
**YAML-LD Frontmatter** — Define context and root subject:
|
|
298
|
-
|
|
299
|
-
```yaml
|
|
300
|
-
---
|
|
301
|
-
"@context":
|
|
302
|
-
"@vocab": "http://schema.org/"
|
|
303
|
-
"@id": "#doc"
|
|
304
|
-
"@type": Article
|
|
305
|
-
---
|
|
306
|
-
```
|
|
307
|
-
|
|
308
259
|
**Subject Declaration** — Headings create typed subjects:
|
|
309
260
|
|
|
310
261
|
```markdown
|
|
311
|
-
## Alice Johnson {
|
|
262
|
+
## Alice Johnson {=ex:alice .Person}
|
|
312
263
|
```
|
|
313
264
|
|
|
314
265
|
**Literal Properties** — Inline spans create properties:
|
|
315
266
|
|
|
316
267
|
```markdown
|
|
317
|
-
[Alice Johnson]{
|
|
318
|
-
[30]{
|
|
268
|
+
[Alice Johnson] {name}
|
|
269
|
+
[30] {age ^^xsd:integer}
|
|
319
270
|
```
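The `^^xsd:integer` marker in the second line suggests a typed literal. Below is a minimal sketch of how a span's text and attribute tokens might be turned into an RDF/JS-shaped literal; `literalFromSpan` is a hypothetical name, only the `xsd:` prefix is handled, and the parser's actual datatype handling may differ.

```javascript
const XSD = "http://www.w3.org/2001/XMLSchema#";

// Hypothetical helper: build a literal from span text and its attribute tokens.
// A token starting with ^^ names the datatype; otherwise xsd:string is assumed.
function literalFromSpan(text, tokens) {
  const marker = tokens.find((token) => token.startsWith("^^"));
  let datatype = XSD + "string";
  if (marker) {
    const name = marker.slice(2);
    datatype = name.startsWith("xsd:") ? XSD + name.slice(4) : name;
  }
  return {
    termType: "Literal",
    value: text,
    language: "",
    datatype: { termType: "NamedNode", value: datatype },
  };
}

const age = literalFromSpan("30", ["age", "^^xsd:integer"]);
const fullName = literalFromSpan("Alice Johnson", ["name"]);
console.log(age.datatype.value, fullName.datatype.value);
```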
|
|
320
271
|
|
|
321
272
|
**Object Properties** — Links create relationships:
|
|
322
273
|
|
|
323
274
|
```markdown
|
|
324
|
-
[Tech Corp](
|
|
275
|
+
[Tech Corp](=ex:company) {worksFor}
|
|
325
276
|
```
|
|
326
277
|
|
|
327
278
|
**Lists** — Repeated properties:
|
|
328
279
|
|
|
329
|
-
```markdown
|
|
330
|
-
-
|
|
331
|
-
-
|
|
280
|
+
```markdown {tag}
|
|
281
|
+
- Item 1
|
|
282
|
+
- Item 2
|
|
332
283
|
```
|
|
333
|
-
|
|
334
|
-
**Code Blocks** — Automatic `SoftwareSourceCode` mapping:
|
|
335
|
-
|
|
336
|
-
````markdown
|
|
337
|
-
```sparql
|
|
338
|
-
SELECT * WHERE { ?s ?p ?o }
|
|
339
|
-
```
|
|
340
|
-
````
|
|
341
|
-
|
|
342
|
-
````
|
|
343
|
-
|
|
344
|
-
**Tasks** — Markdown checklists become `schema:Action`:
|
|
345
|
-
```markdown
|
|
346
|
-
- [x] Completed task
|
|
347
|
-
- [ ] Pending task
|
|
348
|
-
````
|
|
349
|
-
|
|
350
|
-
### Optimization Tips
|
|
351
|
-
|
|
352
|
-
1. **Reuse DataFactory** — Pass custom factory instance to avoid allocations
|
|
353
|
-
2. **Minimize frontmatter** — Keep `@context` simple for faster parsing
|
|
354
|
-
3. **Batch processing** — Process multiple documents sequentially
|
|
355
|
-
4. **Fragment IDs** — Use `#id` on headings for efficient cross-references
|
|
356
|
-
|
|
357
|
-
## Future Work
|
|
358
|
-
|
|
359
|
-
- [ ] Streaming API for large documents
|
|
360
|
-
- [ ] Tables → CSVW integration
|
|
361
|
-
- [ ] Math blocks → MathML + RDF
|
|
362
|
-
- [ ] Image syntax → `schema:ImageObject`
|
|
363
|
-
- [ ] Bare URL links → `dct:references`
|
|
364
|
-
- [ ] Language tags (`lang` attribute)
|
|
365
|
-
- [ ] Source maps for debugging
|
|
366
|
-
|
|
367
|
-
## Standards Compliance
|
|
368
|
-
|
|
369
|
-
This parser implements:
|
|
370
|
-
|
|
371
|
-
- [MD-LD v0.1 Specification](./mdld_spec_dogfood.md)
|
|
372
|
-
- [RDF/JS Data Model](https://rdf.js.org/data-model-spec/)
|
|
373
|
-
- [RDFa Core 1.1](https://www.w3.org/TR/rdfa-core/) (subset)
|
|
374
|
-
- [JSON-LD 1.1](https://www.w3.org/TR/json-ld11/) (frontmatter)
|