mdld-parse 0.1.0 → 0.2.1

Files changed (4)
  1. package/README.md +83 -174
  2. package/index.js +438 -795
  3. package/package.json +7 -8
  4. package/tests.js +0 -409
package/README.md CHANGED
@@ -1,28 +1,36 @@
- # MD-LD Parser
+ # MD-LD Parse
 
- A standards-compliant parser for **MD-LD (Markdown-Linked Data)** — a human-friendly RDF authoring format that extends Markdown with semantic annotations.
+ **Markdown-Linked Data (MD-LD)** — a human-friendly RDF authoring format that extends Markdown with semantic annotations.
+
+ [NPM](https://www.npmjs.com/package/mdld-parse)
+
+ [Website](https://mdld.js.org)
 
  ## What is MD-LD?
 
  MD-LD allows you to author RDF graphs directly in Markdown using familiar syntax:
 
  ```markdown
- ---
- "@context":
- "@vocab": "http://schema.org/"
- "@id": "#doc"
- "@type": Article
- ---
+ # My Note {=urn:mdld:my-note-20251231 .NoteDigitalDocument}
 
- # My Article {#article typeof="Article"}
+ [ex]{: http://example.org/}
 
- Written by [Alice Johnson](#alice){property="author" typeof="Person"}
+ Written by [Alice Johnson](=ex:alice){author .Person}
 
- [Alice](#alice) works at [Tech Corp](#company){rel="worksFor" typeof="Organization"}
+ ## Alice's biography {=ex:alice}
+
+ [Alice](ex:alice){name} works at [Tech Corp](=ex:tech-corp){worksFor .Organization}
  ```
 
  This generates valid RDF triples while remaining readable as plain Markdown.
 
+ ```n-quads
+ <urn:mdld:my-note-20251231> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/NoteDigitalDocument> .
+ <urn:mdld:my-note-20251231> <http://schema.org/author> <http://example.org/alice> .
+ <http://example.org/alice> <http://schema.org/name> "Alice" .
+ <http://example.org/alice> <http://schema.org/worksFor> <http://example.org/tech-corp> .
+ ```
+
  ## Architecture
 
  ### Design Principles
@@ -32,6 +40,8 @@ This generates valid RDF triples while remaining readable as plain Markdown.
  3. **Standards Compliant** — Outputs RDF quads compatible with RDFa semantics
  4. **Markdown Native** — Plain Markdown yields minimal but valid RDF
  5. **Progressive Enhancement** — Add semantics incrementally via attributes
+ 6. **BaseIRI Inference** — Automatically infers baseIRI from document structure
+ 7. **Default Vocabulary** — Provides default vocabulary for common properties, extensible via options
 
  ### Stack Choices
 
@@ -40,8 +50,7 @@ This generates valid RDF triples while remaining readable as plain Markdown.
  We implement a **minimal, purpose-built parser** for maximum control and zero dependencies:
 
  - **Custom Markdown tokenizer** — Line-by-line parsing of headings, lists, paragraphs, code blocks
- - **Inline attribute parser** — Pandoc-style `{#id .class key="value"}` attribute extraction
- - **YAML-LD frontmatter parser** — Minimal YAML subset for `@context` and `@id` parsing
+ - **Inline attribute parser** — Pandoc-style `{=iri .class key="value"}` attribute extraction
  - **RDF quad generator** — Direct mapping from tokens to RDF/JS quads
 
  **Why custom?**
@@ -79,9 +88,7 @@ Markdown Text
 
  [Custom Tokenizer] — Extract headings, lists, paragraphs, code blocks
 
- [YAML-LD Parser] — Extract frontmatter @context and @id
-
- [Attribute Parser] — Parse {#id property="value"} from tokens
+ [Attribute Parser] — Parse {=iri .class key="value"} from tokens
 
  [Inline Parser] — Extract [text](url){attrs} spans
 
@@ -101,17 +108,6 @@ The zero-dependency design provides:
  3. **Predictable performance** — Linear time complexity, bounded memory
  4. **Easy integration** — Works in Node.js, browsers, and edge runtimes
 
- ### Performance Profile
-
- | Document Size | Peak Memory | Parse Time |
- | ------------- | ----------- | ---------- |
- | 10 KB | ~100 KB | <2ms |
- | 100 KB | ~500 KB | <20ms |
- | 1 MB | ~2 MB | <100ms |
- | 10 MB | ~10 MB | <1s |
-
- _Measured on modern JavaScript engines. Actual performance depends on document structure._
-
  ## Installation
 
  ### Node.js
@@ -121,12 +117,11 @@ npm install mdld-parse
  ```
 
  ```javascript
- import { parseMDLD } from "mdld-parse";
+ import { parse } from "mdld-parse";
 
- const markdown = `# Hello\n{#doc typeof="Article"}`;
- const quads = parseMDLD(markdown, {
- baseIRI: "http://example.org/doc",
- });
+ const markdown = `# Hello {=urn:mdld:hello .Article}`;
+ const result = parse(markdown);
+ const quads = result.quads;
  ```
 
  ### Browser (via CDN)
@@ -141,42 +136,63 @@ const quads = parseMDLD(markdown, {
  </script>
 
  <script type="module">
- import { parseMDLD } from "mdld-parse";
- // use parseMDLD...
+ import { parse } from "mdld-parse";
+ // use parse...
  </script>
  ```
 
  ## API
 
- ### `parseMDLD(markdown, options)`
+ ### `parse(markdown, options)`
 
- Parse MD-LD markdown and return RDF quads.
+ Parse MD-LD markdown and return parsing result.
 
  **Parameters:**
 
  - `markdown` (string) — MD-LD formatted text
  - `options` (object, optional):
- - `baseIRI` (string) — Base IRI for relative references (default: `''`)
- - `defaultVocab` (string) — Default vocabulary (default: `'http://schema.org/'`)
+ - `baseIRI` (string) — Base IRI for relative references
+ - `context` (object) — Additional context to merge with default context
  - `dataFactory` (object) — Custom RDF/JS DataFactory (default: built-in)
 
- **Returns:** Array of RDF/JS Quads
+ **Returns:** Object containing:
+ - `quads` — Array of RDF/JS Quads
+ - `origin` — Object with `blocks` and `quadIndex` for serialization
+ - `context` — Final context used for parsing
+
+ ### `serialize({ text, diff, origin, options })`
+
+ Serialize RDF changes back to markdown with proper positioning.
+
+ **Parameters:**
+
+ - `text` (string) — Original markdown text
+ - `diff` (object) — Changes to apply:
+ - `add` — Array of quads to add
+ - `delete` — Array of quads to remove
+ - `origin` (object) — Origin object from parse result
+ - `options` (object, optional) — Additional options
+
+ **Returns:** Object containing:
+ - `text` — Updated markdown text
+ - `origin` — Updated origin object
 
  ```javascript
- const quads = parseMDLD(
+ const result = parse(
  `
- # Article Title
- {#article typeof="Article"}
+ # Article Title {=ex:article .Article}
 
- Written by [Alice](#alice){property="author"}
+ Written by [Alice](ex:alice) {ex:author}
  `,
  {
  baseIRI: "http://example.org/doc",
- defaultVocab: "http://schema.org/",
+ context: {
+ '@vocab': 'http://schema.org/',
+ },
  }
  );
 
- // quads[0] = {
+ // result.quads[0] = {
  // subject: { termType: 'NamedNode', value: 'http://example.org/doc#article' },
  // predicate: { termType: 'NamedNode', value: 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type' },
  // object: { termType: 'NamedNode', value: 'http://schema.org/Article' },
@@ -184,78 +200,51 @@ Written by [Alice](#alice){property="author"}
  // }
  ```
 
- ### Batch Processing
-
- For multiple documents, process them sequentially:
-
- ```javascript
- const documents = [markdown1, markdown2, markdown3];
- const allQuads = documents.flatMap((md) =>
- parseMDLD(md, { baseIRI: "http://example.org/" })
- );
- ```
-
  ## Implementation Details
 
  ### Subject Resolution
 
  MD-LD follows a clear subject inheritance model:
 
- 1. **Root subject** — Declared in YAML-LD `@id` field
- 2. **Heading subjects** — `## Title {#id typeof="Type"}`
- 3. **Inline subjects** — `[text](#id){typeof="Type"}`
+ 1. **Root subject** — Declared in the first heading of the document or inferred it's text content
+ 2. **Heading subjects** — `## Title {=ex:title .Type}`
+ 3. **Inline subjects** — `[text](=ex:text) {.Type}`
  4. **Blank nodes** — Generated for incomplete triples
 
  ```markdown
- # Document
-
- {#doc typeof="Article"}
+ # Document {=urn:mdld:doc .Article}
 
- ## Section
+ ## Section 1 {=urn:mdld:sec1 .Section}
 
- {#sec1 typeof="Section"}
+ [Text] {name} ← property of sec1
 
- [Text]{property="name"} ← property of #sec1
+ Back to [doc](=urn:mdld:doc) {hasPart}
  ```
 
- ### Property Mapping
-
- | Markdown | RDF Predicate |
- | ----------------------- | ------------------------------------------------------------------------------- |
- | Top-level H1 (no `#id`) | `rdfs:label` on root subject |
- | Heading with `{#id}` | `rdfs:label` on subject |
- | First paragraph | `dct:description` on root |
- | `{property="name"}` | Resolved via `@vocab` (e.g., `schema:name`) |
- | `{rel="author"}` | Resolved via `@vocab` (e.g., `schema:author`) |
- | Code block | `schema:SoftwareSourceCode` with `schema:programmingLanguage` and `schema:text` |
-
  ### List Handling
 
- ```markdown
- - [Item 1]{property="item"}
- - [Item 2]{property="item"}
+ ```markdown {item}
+ - Item 1
+ - Item 2
  ```
 
  Creates **multiple triples** with same predicate (not RDF lists):
 
  ```turtle
- <#doc> schema:item "Item 1" .
- <#doc> schema:item "Item 2" .
+ <subject> schema:item "Item 1" .
+ <subject> schema:item "Item 2" .
  ```
 
- For RDF lists (`rdf:List`), use `@inlist` in generated HTML.
-
  ### Code Block Semantics
 
- Fenced code blocks are automatically mapped to `schema:SoftwareSourceCode`:
-
  ```markdown
- \`\`\`sparql {#query-1}
- SELECT * WHERE { ?s ?p ?o }
+ \`\`\`sparql {=ex:query-1 .SoftwareSourceCode}
+ SELECT \* WHERE { ?s ?p ?o }
  \`\`\`
  ```
 
  Creates:
+
  - A `schema:SoftwareSourceCode` resource (or custom type via `typeof`)
  - `schema:programmingLanguage` from the info string (`sparql`)
  - `schema:text` with the raw source code
@@ -263,112 +252,32 @@ Creates:
 
  This enables semantic queries like "find all SPARQL queries in my notes."
 
- ### Blank Node Strategy
-
- Blank nodes are created for:
-
- 1. Task list items without explicit `#id`
- 2. Code blocks without explicit `#id`
- 3. Inline `typeof` without `id` when used with `rel`
-
- ## Testing
-
- ```bash
- npm test
- ````
-
- Tests cover:
-
- - ✅ YAML-LD frontmatter parsing
- - ✅ Subject inheritance via headings
- - ✅ Property literals and datatypes (`property`, `datatype`)
- - ✅ Object relationships (`rel` on links)
- - ✅ Blank node generation (tasks, code blocks)
- - ✅ List mappings (repeated properties)
- - ✅ Code block semantics (`SoftwareSourceCode`)
- - ✅ Semantic links in lists (`hasPart` TOC)
- - ✅ Cross-references via fragment IDs
- - ✅ Minimal Markdown → RDF (headings, paragraphs)
-
  ## Syntax Overview
 
  ### Core Features
 
- **YAML-LD Frontmatter** — Define context and root subject:
-
- ```yaml
- ---
- "@context":
- "@vocab": "http://schema.org/"
- "@id": "#doc"
- "@type": Article
- ---
- ```
-
  **Subject Declaration** — Headings create typed subjects:
 
  ```markdown
- ## Alice Johnson {#alice typeof="Person"}
+ ## Alice Johnson {=ex:alice .Person}
  ```
 
  **Literal Properties** — Inline spans create properties:
 
  ```markdown
- [Alice Johnson]{property="name"}
- [30]{property="age" datatype="xsd:integer"}
+ [Alice Johnson] {name}
+ [30] {age ^^xsd:integer}
  ```
 
  **Object Properties** — Links create relationships:
 
  ```markdown
- [Tech Corp](#company){rel="worksFor"}
+ [Tech Corp](=ex:company) {worksFor}
  ```
 
  **Lists** — Repeated properties:
 
- ```markdown
- - [Item 1]{property="tag"}
- - [Item 2]{property="tag"}
+ ```markdown {tag}
+ - Item 1
+ - Item 2
  ```
-
- **Code Blocks** — Automatic `SoftwareSourceCode` mapping:
-
- ````markdown
- ```sparql
- SELECT * WHERE { ?s ?p ?o }
- ```
- ````
-
- ````
-
- **Tasks** — Markdown checklists become `schema:Action`:
- ```markdown
- - [x] Completed task
- - [ ] Pending task
- ````
-
- ### Optimization Tips
-
- 1. **Reuse DataFactory** — Pass custom factory instance to avoid allocations
- 2. **Minimize frontmatter** — Keep `@context` simple for faster parsing
- 3. **Batch processing** — Process multiple documents sequentially
- 4. **Fragment IDs** — Use `#id` on headings for efficient cross-references
-
- ## Future Work
-
- - [ ] Streaming API for large documents
- - [ ] Tables → CSVW integration
- - [ ] Math blocks → MathML + RDF
- - [ ] Image syntax → `schema:ImageObject`
- - [ ] Bare URL links → `dct:references`
- - [ ] Language tags (`lang` attribute)
- - [ ] Source maps for debugging
-
- ## Standards Compliance
-
- This parser implements:
-
- - [MD-LD v0.1 Specification](./mdld_spec_dogfood.md)
- - [RDF/JS Data Model](https://rdf.js.org/data-model-spec/)
- - [RDFa Core 1.1](https://www.w3.org/TR/rdfa-core/) (subset)
- - [JSON-LD 1.1](https://www.w3.org/TR/json-ld11/) (frontmatter)