mdld-parse 0.1.0 → 0.2.2
- package/README.md +98 -172
- package/index.js +463 -793
- package/package.json +7 -8
- package/tests.js +0 -409
package/README.md
CHANGED
|
@@ -1,28 +1,36 @@
|
|
|
1
|
-
# MD-LD
|
|
1
|
+
# MD-LD Parse
|
|
2
2
|
|
|
3
|
-
|
|
3
|
+
**Markdown-Linked Data (MD-LD)** — a human-friendly RDF authoring format that extends Markdown with semantic annotations.
|
|
4
|
+
|
|
5
|
+
[NPM](https://www.npmjs.com/package/mdld-parse)
|
|
6
|
+
|
|
7
|
+
[Website](https://mdld.js.org)
|
|
4
8
|
|
|
5
9
|
## What is MD-LD?
|
|
6
10
|
|
|
7
11
|
MD-LD allows you to author RDF graphs directly in Markdown using familiar syntax:
|
|
8
12
|
|
|
9
13
|
```markdown
|
|
10
|
-
|
|
11
|
-
"@context":
|
|
12
|
-
"@vocab": "http://schema.org/"
|
|
13
|
-
"@id": "#doc"
|
|
14
|
-
"@type": Article
|
|
15
|
-
---
|
|
14
|
+
# My Note {=urn:mdld:my-note-20251231 .NoteDigitalDocument}
|
|
16
15
|
|
|
17
|
-
|
|
16
|
+
[ex]{: http://example.org/}
|
|
18
17
|
|
|
19
|
-
Written by [Alice Johnson](
|
|
18
|
+
Written by [Alice Johnson](=ex:alice){author .Person}
|
|
20
19
|
|
|
21
|
-
|
|
20
|
+
## Alice's biography {=ex:alice}
|
|
21
|
+
|
|
22
|
+
[Alice](ex:alice){name} works at [Tech Corp](=ex:tech-corp){worksFor .Organization}
|
|
22
23
|
```
|
|
23
24
|
|
|
24
25
|
This generates valid RDF triples while remaining readable as plain Markdown.
|
|
25
26
|
|
|
27
|
+
```n-quads
|
|
28
|
+
<urn:mdld:my-note-20251231> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/NoteDigitalDocument> .
|
|
29
|
+
<urn:mdld:my-note-20251231> <http://schema.org/author> <http://example.org/alice> .
|
|
30
|
+
<http://example.org/alice> <http://schema.org/name> "Alice" .
|
|
31
|
+
<http://example.org/alice> <http://schema.org/worksFor> <http://example.org/tech-corp> .
|
|
32
|
+
```
|
|
33
|
+
|
|
26
34
|
## Architecture
|
|
27
35
|
|
|
28
36
|
### Design Principles
|
|
@@ -32,6 +40,8 @@ This generates valid RDF triples while remaining readable as plain Markdown.
|
|
|
32
40
|
3. **Standards Compliant** — Outputs RDF quads compatible with RDFa semantics
|
|
33
41
|
4. **Markdown Native** — Plain Markdown yields minimal but valid RDF
|
|
34
42
|
5. **Progressive Enhancement** — Add semantics incrementally via attributes
|
|
43
|
+
6. **BaseIRI Inference** — Automatically infers baseIRI from document structure
|
|
44
|
+
7. **Default Vocabulary** — Provides default vocabulary for common properties, extensible via options
|
|
35
45
|
|
|
36
46
|
### Stack Choices
|
|
37
47
|
|
|
@@ -40,8 +50,7 @@ This generates valid RDF triples while remaining readable as plain Markdown.
|
|
|
40
50
|
We implement a **minimal, purpose-built parser** for maximum control and zero dependencies:
|
|
41
51
|
|
|
42
52
|
- **Custom Markdown tokenizer** — Line-by-line parsing of headings, lists, paragraphs, code blocks
|
|
43
|
-
- **Inline attribute parser** — Pandoc-style `{
|
|
44
|
-
- **YAML-LD frontmatter parser** — Minimal YAML subset for `@context` and `@id` parsing
|
|
53
|
+
- **Inline attribute parser** — Pandoc-style `{=iri .class key="value"}` attribute extraction
|
|
45
54
|
- **RDF quad generator** — Direct mapping from tokens to RDF/JS quads
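As a rough illustration of what the inline attribute parser has to do, the sketch below splits a block such as `{=iri .class key="value"}` into its parts. The function name `parseAttributes` and the shape of its return value are assumptions made for this example; the package's actual internals are not shown in this diff.

```javascript
// Hypothetical helper (not the package's API): splits an attribute block
// like {=ex:alice .Person key="value"} into subject, types, bare property
// names, an optional ^^datatype, and key="value" pairs.
function parseAttributes(block) {
  const inner = block.trim().replace(/^\{/, "").replace(/\}$/, "");
  const result = { subject: null, types: [], properties: [], datatype: null, pairs: {} };

  // A token is either a key="quoted value" pair or a run of non-space characters.
  const tokens = inner.match(/[^\s"=]+="[^"]*"|\S+/g) || [];

  for (const token of tokens) {
    const pair = token.match(/^([^\s"=]+)="([^"]*)"$/);
    if (pair) {
      result.pairs[pair[1]] = pair[2];
    } else if (token.startsWith("=")) {
      result.subject = token.slice(1); // =iri declares the subject
    } else if (token.startsWith(".")) {
      result.types.push(token.slice(1)); // .Class declares a type
    } else if (token.startsWith("^^")) {
      result.datatype = token.slice(2); // ^^xsd:integer declares a datatype
    } else {
      result.properties.push(token); // bare word is a property name
    }
  }
  return result;
}

const attrs = parseAttributes('{=ex:alice .Person key="some value"}');
console.log(attrs.subject, attrs.types, attrs.pairs);
```

Keeping the tokenizer to a single regular expression is in the spirit of the zero-dependency design, though a real parser would also need to handle escaping and malformed input.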
|
|
46
55
|
|
|
47
56
|
**Why custom?**
|
|
@@ -79,9 +88,7 @@ Markdown Text
|
|
|
79
88
|
↓
|
|
80
89
|
[Custom Tokenizer] — Extract headings, lists, paragraphs, code blocks
|
|
81
90
|
↓
|
|
82
|
-
[
|
|
83
|
-
↓
|
|
84
|
-
[Attribute Parser] — Parse {#id property="value"} from tokens
|
|
91
|
+
[Attribute Parser] — Parse {=iri .class key="value"} from tokens
|
|
85
92
|
↓
|
|
86
93
|
[Inline Parser] — Extract [text](url){attrs} spans
|
|
87
94
|
↓
|
|
@@ -101,17 +108,6 @@ The zero-dependency design provides:
|
|
|
101
108
|
3. **Predictable performance** — Linear time complexity, bounded memory
|
|
102
109
|
4. **Easy integration** — Works in Node.js, browsers, and edge runtimes
|
|
103
110
|
|
|
104
|
-
### Performance Profile
|
|
105
|
-
|
|
106
|
-
| Document Size | Peak Memory | Parse Time |
|
|
107
|
-
| ------------- | ----------- | ---------- |
|
|
108
|
-
| 10 KB | ~100 KB | <2ms |
|
|
109
|
-
| 100 KB | ~500 KB | <20ms |
|
|
110
|
-
| 1 MB | ~2 MB | <100ms |
|
|
111
|
-
| 10 MB | ~10 MB | <1s |
|
|
112
|
-
|
|
113
|
-
_Measured on modern JavaScript engines. Actual performance depends on document structure._
|
|
114
|
-
|
|
115
111
|
## Installation
|
|
116
112
|
|
|
117
113
|
### Node.js
|
|
@@ -121,12 +117,11 @@ npm install mdld-parse
|
|
|
121
117
|
```
|
|
122
118
|
|
|
123
119
|
```javascript
|
|
124
|
-
import {
|
|
120
|
+
import { parse } from "mdld-parse";
|
|
125
121
|
|
|
126
|
-
const markdown = `# Hello
|
|
127
|
-
const
|
|
128
|
-
|
|
129
|
-
});
|
|
122
|
+
const markdown = `# Hello {=urn:mdld:hello .Article}`;
|
|
123
|
+
const result = parse(markdown);
|
|
124
|
+
const quads = result.quads;
|
|
130
125
|
```
|
|
131
126
|
|
|
132
127
|
### Browser (via CDN)
|
|
@@ -141,58 +136,85 @@ const quads = parseMDLD(markdown, {
|
|
|
141
136
|
</script>
|
|
142
137
|
|
|
143
138
|
<script type="module">
|
|
144
|
-
import {
|
|
145
|
-
// use
|
|
139
|
+
import { parse } from "mdld-parse";
|
|
140
|
+
// use parse...
|
|
146
141
|
</script>
|
|
147
142
|
```
|
|
148
143
|
|
|
149
144
|
## API
|
|
150
145
|
|
|
151
|
-
### `
|
|
146
|
+
### `parse(markdown, options)`
|
|
152
147
|
|
|
153
|
-
Parse MD-LD markdown and return
|
|
148
|
+
Parse MD-LD markdown and return the parsing result.
|
|
154
149
|
|
|
155
150
|
**Parameters:**
|
|
156
151
|
|
|
157
152
|
- `markdown` (string) — MD-LD formatted text
|
|
158
153
|
- `options` (object, optional):
|
|
159
|
-
- `baseIRI` (string) — Base IRI for relative references
|
|
160
|
-
- `
|
|
154
|
+
- `baseIRI` (string) — Base IRI for relative references
|
|
155
|
+
- `context` (object) — Additional context to merge with default context
|
|
161
156
|
- `dataFactory` (object) — Custom RDF/JS DataFactory (default: built-in)
|
|
162
157
|
|
|
163
|
-
**Returns:**
|
|
158
|
+
**Returns:** Object containing:
|
|
159
|
+
- `quads` — Array of RDF/JS Quads
|
|
160
|
+
- `origin` — Object with `blocks` and `quadIndex` for serialization
|
|
161
|
+
- `context` — Final context used for parsing
|
|
162
|
+
|
|
163
|
+
### `serialize({ text, diff, origin, options })`
|
|
164
|
+
|
|
165
|
+
Serialize RDF changes back to markdown with proper positioning.
|
|
166
|
+
|
|
167
|
+
**Parameters:**
|
|
168
|
+
|
|
169
|
+
- `text` (string) — Original markdown text
|
|
170
|
+
- `diff` (object) — Changes to apply:
|
|
171
|
+
- `add` — Array of quads to add
|
|
172
|
+
- `delete` — Array of quads to remove
|
|
173
|
+
- `origin` (object) — Origin object from parse result
|
|
174
|
+
- `options` (object, optional) — Additional options:
|
|
175
|
+
- `context` (object) — Context for IRI shortening (default: empty object)
|
|
176
|
+
|
|
177
|
+
**Returns:** Object containing:
|
|
178
|
+
- `text` — Updated markdown text
|
|
179
|
+
- `origin` — Updated origin object
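To make the role of the `context` option more concrete, here is a minimal sketch of how IRI shortening against a context could work. `shortenIRI` is a hypothetical name, and the rule used here (try `@vocab` first, then the longest matching prefix) is an assumption; the package may shorten IRIs differently.

```javascript
// Hypothetical helper (not the package's API): shortens a full IRI using
// a JSON-LD-style context object such as
// { "@vocab": "http://schema.org/", ex: "http://example.org/" }.
function shortenIRI(iri, context = {}) {
  // Terms in the default vocabulary shorten to a bare name.
  const vocab = context["@vocab"];
  if (typeof vocab === "string" && iri.startsWith(vocab) && iri.length > vocab.length) {
    return iri.slice(vocab.length);
  }

  // Otherwise pick the prefix whose namespace is the longest match.
  let best = null;
  for (const [prefix, namespace] of Object.entries(context)) {
    if (prefix.startsWith("@") || typeof namespace !== "string") continue;
    if (iri.startsWith(namespace) && (!best || namespace.length > best.namespace.length)) {
      best = { prefix, namespace };
    }
  }

  // Fall back to the full IRI when nothing in the context applies.
  return best ? `${best.prefix}:${iri.slice(best.namespace.length)}` : iri;
}

const ctx = { "@vocab": "http://schema.org/", ex: "http://example.org/" };
console.log(shortenIRI("http://schema.org/dateCreated", ctx));
console.log(shortenIRI("http://example.org/alice", ctx));
```

With an empty context, which is the documented default, this sketch returns every IRI unchanged, which is why passing the context from the parse result matters.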
|
|
164
180
|
|
|
165
181
|
```javascript
|
|
166
|
-
const
|
|
182
|
+
const result = parse(
|
|
167
183
|
`
|
|
168
|
-
# Article Title
|
|
169
|
-
{#article typeof="Article"}
|
|
184
|
+
# Article Title {=ex:article .Article}
|
|
170
185
|
|
|
171
|
-
Written by [Alice](
|
|
186
|
+
Written by [Alice](ex:alice) {ex:author}
|
|
172
187
|
`,
|
|
173
188
|
{
|
|
174
189
|
baseIRI: "http://example.org/doc",
|
|
175
|
-
|
|
190
|
+
context: {
|
|
191
|
+
'@vocab': 'http://schema.org/',
|
|
192
|
+
},
|
|
176
193
|
}
|
|
177
194
|
);
|
|
178
195
|
|
|
179
|
-
// quads[0] = {
|
|
196
|
+
// result.quads[0] = {
|
|
180
197
|
// subject: { termType: 'NamedNode', value: 'http://example.org/doc#article' },
|
|
181
198
|
// predicate: { termType: 'NamedNode', value: 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type' },
|
|
182
199
|
// object: { termType: 'NamedNode', value: 'http://schema.org/Article' },
|
|
183
200
|
// graph: { termType: 'DefaultGraph' }
|
|
184
201
|
// }
|
|
185
|
-
```
|
|
186
202
|
|
|
187
|
-
|
|
188
|
-
|
|
189
|
-
|
|
203
|
+
// Add a new quad with proper IRI shortening
|
|
204
|
+
const newQuad = {
|
|
205
|
+
subject: { termType: 'NamedNode', value: 'http://example.org/doc#article' },
|
|
206
|
+
predicate: { termType: 'NamedNode', value: 'http://schema.org/dateCreated' },
|
|
207
|
+
object: { termType: 'Literal', value: '2024-01-01' }
|
|
208
|
+
};
|
|
209
|
+
|
|
210
|
+
const serialized = serialize({
|
|
211
|
+
text: originalText,
|
|
212
|
+
diff: { add: [newQuad] },
|
|
213
|
+
origin: result.origin,
|
|
214
|
+
options: { context: result.context } // Important: pass context for IRI shortening
|
|
215
|
+
});
|
|
190
216
|
|
|
191
|
-
|
|
192
|
-
const documents = [markdown1, markdown2, markdown3];
|
|
193
|
-
const allQuads = documents.flatMap((md) =>
|
|
194
|
-
parseMDLD(md, { baseIRI: "http://example.org/" })
|
|
195
|
-
);
|
|
217
|
+
// Result: [2024-01-01] {dateCreated} // Properly shortened!
|
|
196
218
|
```
|
|
197
219
|
|
|
198
220
|
## Implementation Details
|
|
@@ -201,61 +223,45 @@ const allQuads = documents.flatMap((md) =>
|
|
|
201
223
|
|
|
202
224
|
MD-LD follows a clear subject inheritance model:
|
|
203
225
|
|
|
204
|
-
1. **Root subject** — Declared in
|
|
205
|
-
2. **Heading subjects** — `## Title {
|
|
206
|
-
3. **Inline subjects** — `[text](
|
|
226
|
+
1. **Root subject** — Declared in the first heading of the document or inferred from its text content
|
|
227
|
+
2. **Heading subjects** — `## Title {=ex:title .Type}`
|
|
228
|
+
3. **Inline subjects** — `[text](=ex:text) {.Type}`
|
|
207
229
|
4. **Blank nodes** — Generated for incomplete triples
|
|
208
230
|
|
|
209
231
|
```markdown
|
|
210
|
-
# Document
|
|
211
|
-
|
|
212
|
-
{#doc typeof="Article"}
|
|
232
|
+
# Document {=urn:mdld:doc .Article}
|
|
213
233
|
|
|
214
|
-
## Section
|
|
234
|
+
## Section 1 {=urn:mdld:sec1 .Section}
|
|
215
235
|
|
|
216
|
-
{
|
|
236
|
+
[Text] {name} ← property of sec1
|
|
217
237
|
|
|
218
|
-
[
|
|
238
|
+
Back to [doc](=urn:mdld:doc) {hasPart}
|
|
219
239
|
```
|
|
220
240
|
|
|
221
|
-
### Property Mapping
|
|
222
|
-
|
|
223
|
-
| Markdown | RDF Predicate |
|
|
224
|
-
| ----------------------- | ------------------------------------------------------------------------------- |
|
|
225
|
-
| Top-level H1 (no `#id`) | `rdfs:label` on root subject |
|
|
226
|
-
| Heading with `{#id}` | `rdfs:label` on subject |
|
|
227
|
-
| First paragraph | `dct:description` on root |
|
|
228
|
-
| `{property="name"}` | Resolved via `@vocab` (e.g., `schema:name`) |
|
|
229
|
-
| `{rel="author"}` | Resolved via `@vocab` (e.g., `schema:author`) |
|
|
230
|
-
| Code block | `schema:SoftwareSourceCode` with `schema:programmingLanguage` and `schema:text` |
|
|
231
|
-
|
|
232
241
|
### List Handling
|
|
233
242
|
|
|
234
|
-
```markdown
|
|
235
|
-
-
|
|
236
|
-
-
|
|
243
|
+
```markdown {item}
|
|
244
|
+
- Item 1
|
|
245
|
+
- Item 2
|
|
237
246
|
```
|
|
238
247
|
|
|
239
248
|
Creates **multiple triples** with same predicate (not RDF lists):
|
|
240
249
|
|
|
241
250
|
```turtle
|
|
242
|
-
|
|
243
|
-
|
|
251
|
+
<subject> schema:item "Item 1" .
|
|
252
|
+
<subject> schema:item "Item 2" .
|
|
244
253
|
```
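The mapping from list items to repeated triples can be sketched as follows. `listToQuads` is a hypothetical function written for this illustration, and it produces plain objects shaped like RDF/JS quads rather than calling the package's own quad generator.

```javascript
// Hypothetical helper (not the package's API): turns each Markdown list
// item into one quad with the same subject and predicate, instead of
// building an rdf:List.
function listToQuads(subject, predicate, markdownList) {
  return markdownList
    .split("\n")
    .map((line) => line.match(/^\s*[-*+]\s+(.*)$/)) // keep only list items
    .filter(Boolean)
    .map((match) => ({
      subject: { termType: "NamedNode", value: subject },
      predicate: { termType: "NamedNode", value: predicate },
      object: { termType: "Literal", value: match[1].trim() },
      graph: { termType: "DefaultGraph", value: "" },
    }));
}

const quads = listToQuads(
  "http://example.org/doc",
  "http://schema.org/item",
  "- Item 1\n- Item 2"
);
console.log(quads.map((q) => q.object.value));
```

Because the items become independent triples, their order is not preserved in the graph; that is the trade-off against an ordered `rdf:List`.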
|
|
245
254
|
|
|
246
|
-
For RDF lists (`rdf:List`), use `@inlist` in generated HTML.
|
|
247
|
-
|
|
248
255
|
### Code Block Semantics
|
|
249
256
|
|
|
250
|
-
Fenced code blocks are automatically mapped to `schema:SoftwareSourceCode`:
|
|
251
|
-
|
|
252
257
|
```markdown
|
|
253
|
-
\`\`\`sparql {
|
|
254
|
-
SELECT
|
|
258
|
+
\`\`\`sparql {=ex:query-1 .SoftwareSourceCode}
|
|
259
|
+
SELECT \* WHERE { ?s ?p ?o }
|
|
255
260
|
\`\`\`
|
|
256
261
|
```
|
|
257
262
|
|
|
258
263
|
Creates:
|
|
264
|
+
|
|
259
265
|
- A `schema:SoftwareSourceCode` resource (or custom type via `typeof`)
|
|
260
266
|
- `schema:programmingLanguage` from the info string (`sparql`)
|
|
261
267
|
- `schema:text` with the raw source code
|
|
@@ -263,112 +269,32 @@ Creates:
|
|
|
263
269
|
|
|
264
270
|
This enables semantic queries like "find all SPARQL queries in my notes."
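A query of that kind could be approximated without a SPARQL engine by filtering the quad array directly. The sketch below assumes quads shaped like the RDF/JS objects shown in the API example; `findCodeByLanguage` is a hypothetical helper, not part of the package.

```javascript
const RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";
const SCHEMA = "http://schema.org/";

// Hypothetical helper (not the package's API): returns the schema:text of
// every schema:SoftwareSourceCode resource whose
// schema:programmingLanguage equals `language`.
function findCodeByLanguage(quads, language) {
  // Subjects typed as SoftwareSourceCode.
  const codeSubjects = new Set(
    quads
      .filter((q) => q.predicate.value === RDF_TYPE && q.object.value === SCHEMA + "SoftwareSourceCode")
      .map((q) => q.subject.value)
  );

  // Of those, the subjects written in the requested language.
  const matching = new Set(
    quads
      .filter(
        (q) =>
          codeSubjects.has(q.subject.value) &&
          q.predicate.value === SCHEMA + "programmingLanguage" &&
          q.object.value === language
      )
      .map((q) => q.subject.value)
  );

  // Collect the source text of each match.
  return quads
    .filter((q) => matching.has(q.subject.value) && q.predicate.value === SCHEMA + "text")
    .map((q) => q.object.value);
}
```

Calling `findCodeByLanguage(result.quads, "sparql")` on a parse result would then list the source of each SPARQL block, assuming the parser emits the three properties listed above.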
|
|
265
271
|
|
|
266
|
-
### Blank Node Strategy
|
|
267
|
-
|
|
268
|
-
Blank nodes are created for:
|
|
269
|
-
|
|
270
|
-
1. Task list items without explicit `#id`
|
|
271
|
-
2. Code blocks without explicit `#id`
|
|
272
|
-
3. Inline `typeof` without `id` when used with `rel`
|
|
273
|
-
|
|
274
|
-
## Testing
|
|
275
|
-
|
|
276
|
-
```bash
|
|
277
|
-
npm test
|
|
278
|
-
````
|
|
279
|
-
|
|
280
|
-
Tests cover:
|
|
281
|
-
|
|
282
|
-
- ✅ YAML-LD frontmatter parsing
|
|
283
|
-
- ✅ Subject inheritance via headings
|
|
284
|
-
- ✅ Property literals and datatypes (`property`, `datatype`)
|
|
285
|
-
- ✅ Object relationships (`rel` on links)
|
|
286
|
-
- ✅ Blank node generation (tasks, code blocks)
|
|
287
|
-
- ✅ List mappings (repeated properties)
|
|
288
|
-
- ✅ Code block semantics (`SoftwareSourceCode`)
|
|
289
|
-
- ✅ Semantic links in lists (`hasPart` TOC)
|
|
290
|
-
- ✅ Cross-references via fragment IDs
|
|
291
|
-
- ✅ Minimal Markdown → RDF (headings, paragraphs)
|
|
292
|
-
|
|
293
272
|
## Syntax Overview
|
|
294
273
|
|
|
295
274
|
### Core Features
|
|
296
275
|
|
|
297
|
-
**YAML-LD Frontmatter** — Define context and root subject:
|
|
298
|
-
|
|
299
|
-
```yaml
|
|
300
|
-
---
|
|
301
|
-
"@context":
|
|
302
|
-
"@vocab": "http://schema.org/"
|
|
303
|
-
"@id": "#doc"
|
|
304
|
-
"@type": Article
|
|
305
|
-
---
|
|
306
|
-
```
|
|
307
|
-
|
|
308
276
|
**Subject Declaration** — Headings create typed subjects:
|
|
309
277
|
|
|
310
278
|
```markdown
|
|
311
|
-
## Alice Johnson {
|
|
279
|
+
## Alice Johnson {=ex:alice .Person}
|
|
312
280
|
```
|
|
313
281
|
|
|
314
282
|
**Literal Properties** — Inline spans create properties:
|
|
315
283
|
|
|
316
284
|
```markdown
|
|
317
|
-
[Alice Johnson]{
|
|
318
|
-
[30]{
|
|
285
|
+
[Alice Johnson] {name}
|
|
286
|
+
[30] {age ^^xsd:integer}
|
|
319
287
|
```
|
|
320
288
|
|
|
321
289
|
**Object Properties** — Links create relationships:
|
|
322
290
|
|
|
323
291
|
```markdown
|
|
324
|
-
[Tech Corp](
|
|
292
|
+
[Tech Corp](=ex:company) {worksFor}
|
|
325
293
|
```
|
|
326
294
|
|
|
327
295
|
**Lists** — Repeated properties:
|
|
328
296
|
|
|
329
|
-
```markdown
|
|
330
|
-
-
|
|
331
|
-
-
|
|
297
|
+
```markdown {tag}
|
|
298
|
+
- Item 1
|
|
299
|
+
- Item 2
|
|
332
300
|
```
|
|
333
|
-
|
|
334
|
-
**Code Blocks** — Automatic `SoftwareSourceCode` mapping:
|
|
335
|
-
|
|
336
|
-
````markdown
|
|
337
|
-
```sparql
|
|
338
|
-
SELECT * WHERE { ?s ?p ?o }
|
|
339
|
-
```
|
|
340
|
-
````
|
|
341
|
-
|
|
342
|
-
````
|
|
343
|
-
|
|
344
|
-
**Tasks** — Markdown checklists become `schema:Action`:
|
|
345
|
-
```markdown
|
|
346
|
-
- [x] Completed task
|
|
347
|
-
- [ ] Pending task
|
|
348
|
-
````
|
|
349
|
-
|
|
350
|
-
### Optimization Tips
|
|
351
|
-
|
|
352
|
-
1. **Reuse DataFactory** — Pass custom factory instance to avoid allocations
|
|
353
|
-
2. **Minimize frontmatter** — Keep `@context` simple for faster parsing
|
|
354
|
-
3. **Batch processing** — Process multiple documents sequentially
|
|
355
|
-
4. **Fragment IDs** — Use `#id` on headings for efficient cross-references
|
|
356
|
-
|
|
357
|
-
## Future Work
|
|
358
|
-
|
|
359
|
-
- [ ] Streaming API for large documents
|
|
360
|
-
- [ ] Tables → CSVW integration
|
|
361
|
-
- [ ] Math blocks → MathML + RDF
|
|
362
|
-
- [ ] Image syntax → `schema:ImageObject`
|
|
363
|
-
- [ ] Bare URL links → `dct:references`
|
|
364
|
-
- [ ] Language tags (`lang` attribute)
|
|
365
|
-
- [ ] Source maps for debugging
|
|
366
|
-
|
|
367
|
-
## Standards Compliance
|
|
368
|
-
|
|
369
|
-
This parser implements:
|
|
370
|
-
|
|
371
|
-
- [MD-LD v0.1 Specification](./mdld_spec_dogfood.md)
|
|
372
|
-
- [RDF/JS Data Model](https://rdf.js.org/data-model-spec/)
|
|
373
|
-
- [RDFa Core 1.1](https://www.w3.org/TR/rdfa-core/) (subset)
|
|
374
|
-
- [JSON-LD 1.1](https://www.w3.org/TR/json-ld11/) (frontmatter)
|