npm - mdld-parse - Versions diffs - 0.1.0 → 0.2.2 - Mend

mdld-parse 0.1.0 → 0.2.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (4)

package/README.md CHANGED

@@ -1,28 +1,36 @@
-# MD-LD Parser
+# MD-LD Parse
-A standards-compliant parser for **MD-LD (Markdown-Linked Data)** — a human-friendly RDF authoring format that extends Markdown with semantic annotations.
+**Markdown-Linked Data (MD-LD)** — a human-friendly RDF authoring format that extends Markdown with semantic annotations.
+[NPM](https://www.npmjs.com/package/mdld-parse)
+[Website](https://mdld.js.org)
 ## What is MD-LD?
 MD-LD allows you to author RDF graphs directly in Markdown using familiar syntax:
 ```markdown
----
-"@context":
-  "@vocab": "http://schema.org/"
-"@id": "#doc"
-"@type": Article
----
+# My Note {=urn:mdld:my-note-20251231 .NoteDigitalDocument}
-# My Article {#article typeof="Article"}
+[ex]{: http://example.org/}
-Written by [Alice Johnson](#alice){property="author" typeof="Person"}
+Written by [Alice Johnson](=ex:alice){author .Person}
-[Alice](#alice) works at [Tech Corp](#company){rel="worksFor" typeof="Organization"}
+## Alice's biography {=ex:alice}
+[Alice](ex:alice){name} works at [Tech Corp](=ex:tech-corp){worksFor .Organization}
 ```
 This generates valid RDF triples while remaining readable as plain Markdown.
+```n-quads
+<urn:mdld:my-note-20251231> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/NoteDigitalDocument> .
+<urn:mdld:my-note-20251231> <http://schema.org/author> <http://example.org/alice> .
+<http://example.org/alice> <http://schema.org/name> "Alice" .
+<http://example.org/alice> <http://schema.org/worksFor> <http://example.org/tech-corp> .
+```
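The N-Quads output comes from expanding each term against a context. `ex:alice` uses the `ex` prefix declared by `[ex]{: http://example.org/}`, while bare terms such as `author` and `NoteDigitalDocument` resolve against the schema.org vocabulary. Below is a minimal sketch of that expansion step. It assumes a plain-object context with `@vocab` plus prefix entries. `expandTerm` is a hypothetical helper and is not part of mdld-parse's API.

```javascript
// Hypothetical helper (not mdld-parse's API): expands an MD-LD term to a full IRI.
// Assumes the context is a plain object mapping prefix -> namespace IRI,
// with '@vocab' used for bare terms, as in the README example.
function expandTerm(term, context) {
  const colon = term.indexOf(':');
  if (colon > 0) {
    const prefix = term.slice(0, colon);
    if (Object.prototype.hasOwnProperty.call(context, prefix)) {
      // Known prefix: replace it with its namespace IRI.
      return context[prefix] + term.slice(colon + 1);
    }
    // Unknown prefix (e.g. urn:, http:): assume it is already an absolute IRI.
    return term;
  }
  // Bare term: resolve against the default vocabulary.
  return (context['@vocab'] ?? '') + term;
}

const context = {
  '@vocab': 'http://schema.org/',
  ex: 'http://example.org/', // declared by [ex]{: http://example.org/}
};

console.log(expandTerm('ex:alice', context));                 // http://example.org/alice
console.log(expandTerm('author', context));                   // http://schema.org/author
console.log(expandTerm('urn:mdld:my-note-20251231', context)); // urn:mdld:my-note-20251231
```

This sketch passes unknown prefixes through unchanged, so `urn:mdld:...` survives as-is. That behavior is an assumption; the real parser may treat such terms differently.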
 ## Architecture
 ### Design Principles
@@ -32,6 +40,8 @@ This generates valid RDF triples while remaining readable as plain Markdown.
 3. **Standards Compliant** — Outputs RDF quads compatible with RDFa semantics
 4. **Markdown Native** — Plain Markdown yields minimal but valid RDF
 5. **Progressive Enhancement** — Add semantics incrementally via attributes
+6. **BaseIRI Inference** — Automatically infers baseIRI from document structure
+7. **Default Vocabulary** — Provides default vocabulary for common properties, extensible via options
 ### Stack Choices
@@ -40,8 +50,7 @@ This generates valid RDF triples while remaining readable as plain Markdown.
 We implement a **minimal, purpose-built parser** for maximum control and zero dependencies:
 - **Custom Markdown tokenizer** — Line-by-line parsing of headings, lists, paragraphs, code blocks
-- **Inline attribute parser** — Pandoc-style `{#id .class key="value"}` attribute extraction
-- **YAML-LD frontmatter parser** — Minimal YAML subset for `@context` and `@id` parsing
+- **Inline attribute parser** — Pandoc-style `{=iri .class key="value"}` attribute extraction
 - **RDF quad generator** — Direct mapping from tokens to RDF/JS quads
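The attribute syntax can be read one token at a time. Below is a minimal sketch of such a parser. Its classification rules are inferred from the README examples: `=` marks the subject IRI, `.` a type, `^^` a datatype, `key="value"` a plain attribute, and anything else a property. `parseAttributes` is hypothetical. Prefix declarations such as `{: http://example.org/}` are out of scope.

```javascript
// Hypothetical sketch of a Pandoc-style attribute parser for MD-LD blocks.
// Token rules are inferred from README examples, not taken from mdld-parse:
//   =iri -> subject, .Class -> type, ^^dt -> datatype,
//   key="value" -> attribute, anything else -> property name.
function parseAttributes(block) {
  const inner = block.trim().replace(/^\{/, '').replace(/\}$/, '');
  const result = { subject: null, types: [], properties: [], datatype: null, attrs: {} };
  // Try key="value" first so quoted values may contain spaces.
  const tokenRe = /(\S+?)="([^"]*)"|(\S+)/g;
  let m;
  while ((m = tokenRe.exec(inner)) !== null) {
    if (m[1] !== undefined) {
      result.attrs[m[1]] = m[2];
      continue;
    }
    const tok = m[3];
    if (tok.startsWith('=')) result.subject = tok.slice(1);
    else if (tok.startsWith('.')) result.types.push(tok.slice(1));
    else if (tok.startsWith('^^')) result.datatype = tok.slice(2);
    else result.properties.push(tok);
  }
  return result;
}

const parsed = parseAttributes('{=ex:alice .Person}');
console.log(parsed.subject, parsed.types.join(', ')); // ex:alice Person
console.log(parseAttributes('{age ^^xsd:integer}').datatype); // xsd:integer
```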
 **Why custom?**
@@ -79,9 +88,7 @@ Markdown Text
     ↓
 [Custom Tokenizer] — Extract headings, lists, paragraphs, code blocks
     ↓
-[YAML-LD Parser] — Extract frontmatter @context and @id
-    ↓
-[Attribute Parser] — Parse {#id property="value"} from tokens
+[Attribute Parser] — Parse {=iri .class key="value"} from tokens
     ↓
 [Inline Parser] — Extract [text](url){attrs} spans
     ↓
@@ -101,17 +108,6 @@ The zero-dependency design provides:
 3. **Predictable performance** — Linear time complexity, bounded memory
 4. **Easy integration** — Works in Node.js, browsers, and edge runtimes
-### Performance Profile
-| Document Size | Peak Memory | Parse Time |
-| ------------- | ----------- | ---------- |
-| 10 KB         | ~100 KB     | <2ms       |
-| 100 KB        | ~500 KB     | <20ms      |
-| 1 MB          | ~2 MB       | <100ms     |
-| 10 MB         | ~10 MB      | <1s        |
-_Measured on modern JavaScript engines. Actual performance depends on document structure._
 ## Installation
 ### Node.js
@@ -121,12 +117,11 @@ npm install mdld-parse
 ```
 ```javascript
-import { parseMDLD } from "mdld-parse";
+import { parse } from "mdld-parse";
-const markdown = `# Hello\n{#doc typeof="Article"}`;
-const quads = parseMDLD(markdown, {
-	baseIRI: "http://example.org/doc",
-});
+const markdown = `# Hello {=urn:mdld:hello .Article}`;
+const result = parse(markdown);
+const quads = result.quads;
 ```
 ### Browser (via CDN)
@@ -141,58 +136,85 @@ const quads = parseMDLD(markdown, {
 </script>
 <script type="module">
-	import { parseMDLD } from "mdld-parse";
-	// use parseMDLD...
+	import { parse } from "mdld-parse";
+	// use parse...
 </script>
 ```
 ## API
-### `parseMDLD(markdown, options)`
+### `parse(markdown, options)`
-Parse MD-LD markdown and return RDF quads.
+Parse MD-LD markdown and return the parsing result.
 **Parameters:**
 - `markdown` (string) — MD-LD formatted text
 - `options` (object, optional):
-  - `baseIRI` (string) — Base IRI for relative references (default: `''`)
-  - `defaultVocab` (string) — Default vocabulary (default: `'http://schema.org/'`)
+  - `baseIRI` (string) — Base IRI for relative references
+  - `context` (object) — Additional context to merge with default context
   - `dataFactory` (object) — Custom RDF/JS DataFactory (default: built-in)
-**Returns:** Array of RDF/JS Quads
+**Returns:** Object containing:
+- `quads` — Array of RDF/JS Quads
+- `origin` — Object with `blocks` and `quadIndex` for serialization
+- `context` — Final context used for parsing
+### `serialize({ text, diff, origin, options })`
+Serialize RDF changes back to markdown with proper positioning.
+**Parameters:**
+- `text` (string) — Original markdown text
+- `diff` (object) — Changes to apply:
+  - `add` — Array of quads to add
+  - `delete` — Array of quads to remove
+- `origin` (object) — Origin object from parse result
+- `options` (object, optional) — Additional options:
+  - `context` (object) — Context for IRI shortening (default: empty object)
+**Returns:** Object containing:
+- `text` — Updated markdown text
+- `origin` — Updated origin object
 ```javascript
-const quads = parseMDLD(
+const result = parse(
 	`
-# Article Title
-{#article typeof="Article"}
+# Article Title {=ex:article .Article}
-Written by [Alice](#alice){property="author"}
+Written by [Alice](ex:alice) {ex:author}
 `,
 	{
 		baseIRI: "http://example.org/doc",
-		defaultVocab: "http://schema.org/",
+		context: {
+			'@vocab': 'http://schema.org/',
+		},
 	}
 );
-// quads[0] = {
+// result.quads[0] = {
 //   subject: { termType: 'NamedNode', value: 'http://example.org/doc#article' },
 //   predicate: { termType: 'NamedNode', value: 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type' },
 //   object: { termType: 'NamedNode', value: 'http://schema.org/Article' },
 //   graph: { termType: 'DefaultGraph' }
 // }
-```
-### Batch Processing
-For multiple documents, process them sequentially:
+// Add a new quad with proper IRI shortening
+const newQuad = {
+    subject: { termType: 'NamedNode', value: 'http://example.org/doc#article' },
+    predicate: { termType: 'NamedNode', value: 'http://schema.org/dateCreated' },
+    object: { termType: 'Literal', value: '2024-01-01' }
+};
+const serialized = serialize({
+    text: originalText,
+    diff: { add: [newQuad] },
+    origin: result.origin,
+    options: { context: result.context }  // Important: pass context for IRI shortening
+});
-```javascript
-const documents = [markdown1, markdown2, markdown3];
-const allQuads = documents.flatMap((md) =>
-	parseMDLD(md, { baseIRI: "http://example.org/" })
-);
+// Result: [2024-01-01] {dateCreated}  // Properly shortened!
 ```
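Passing `context` to `serialize` matters because shortening is the inverse of expansion. `http://schema.org/dateCreated` can only be written as `{dateCreated}` if the serializer knows that `@vocab` is `http://schema.org/`. Below is a minimal sketch of that lookup. It assumes that IRIs under `@vocab` become bare terms and that other matches become `prefix:local`. `shortenIRI` is hypothetical and is not mdld-parse's implementation.

```javascript
// Hypothetical helper: shortens a full IRI with a context object
// (the inverse of prefix/@vocab expansion). Not mdld-parse's code.
const LOCAL_NAME = /^[A-Za-z_][\w-]*$/;

function shortenIRI(iri, context) {
  // IRIs under @vocab become bare terms, e.g. dateCreated.
  const vocab = context['@vocab'];
  if (typeof vocab === 'string' && iri.startsWith(vocab)) {
    const local = iri.slice(vocab.length);
    if (LOCAL_NAME.test(local)) return local;
  }
  // Otherwise pick the longest matching prefix namespace.
  let best = null;
  for (const [prefix, ns] of Object.entries(context)) {
    if (prefix.startsWith('@') || typeof ns !== 'string') continue;
    const local = iri.slice(ns.length);
    if (iri.startsWith(ns) && LOCAL_NAME.test(local) && (!best || ns.length > best.ns.length)) {
      best = { prefix, ns };
    }
  }
  // No match: keep the full IRI.
  return best ? `${best.prefix}:${iri.slice(best.ns.length)}` : iri;
}

const ctx = { '@vocab': 'http://schema.org/', ex: 'http://example.org/' };
console.log(shortenIRI('http://schema.org/dateCreated', ctx)); // dateCreated
console.log(shortenIRI('http://example.org/alice', ctx));      // ex:alice
```

Choosing the longest matching namespace means a more specific prefix wins over a shorter, more general one when both match.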
 ## Implementation Details
@@ -201,61 +223,45 @@ const allQuads = documents.flatMap((md) =>
 MD-LD follows a clear subject inheritance model:
-1. **Root subject** — Declared in YAML-LD `@id` field
-2. **Heading subjects** — `## Title {#id typeof="Type"}`
-3. **Inline subjects** — `[text](#id){typeof="Type"}`
+1. **Root subject** — Declared in the first heading of the document or inferred from its text content
+2. **Heading subjects** — `## Title {=ex:title .Type}`
+3. **Inline subjects** — `[text](=ex:text) {.Type}`
 4. **Blank nodes** — Generated for incomplete triples
 ```markdown
-# Document
-{#doc typeof="Article"}
+# Document {=urn:mdld:doc .Article}
-## Section
+## Section 1 {=urn:mdld:sec1 .Section}
-{#sec1 typeof="Section"}
+[Text] {name} ← property of sec1
-[Text]{property="name"} ← property of #sec1
+Back to [doc](=urn:mdld:doc) {hasPart}
 ```
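The inheritance rule amounts to tracking a "current subject" while scanning the document. Each heading with `{=iri}` replaces it, and spans below that heading attach to it. Below is a toy sketch under that assumption. It handles only single-line headings, literal spans and `(=iri)` links, and it leaves terms unexpanded. `resolveSubjects` is hypothetical and does not reflect how mdld-parse tokenizes.

```javascript
// Toy sketch of MD-LD subject inheritance (not mdld-parse's tokenizer).
// Handles only:
//   # Heading {=iri .Type}  -> sets the current subject (plus its type)
//   [text] {prop}           -> literal on the current subject
//   [text](=iri) {prop}     -> link from the current subject to iri
function resolveSubjects(markdown) {
  const triples = [];
  let current = null;
  for (const line of markdown.split('\n')) {
    const heading = line.match(/^#{1,6}\s+.*?\{=(\S+)(?:\s+\.(\S+))?\}\s*$/);
    if (heading) {
      current = heading[1];
      if (heading[2]) triples.push([current, 'rdf:type', heading[2]]);
      continue;
    }
    if (!current) continue; // spans before any subject are ignored here
    const spanRe = /\[([^\]]+)\](?:\(=([^)]+)\))?\s*\{(\w+)\}/g;
    let m;
    while ((m = spanRe.exec(line)) !== null) {
      const [, text, target, prop] = m;
      // A link target becomes an IRI object; otherwise emit a quoted literal.
      triples.push(target ? [current, prop, target] : [current, prop, JSON.stringify(text)]);
    }
  }
  return triples;
}

const doc = [
  '# Document {=urn:mdld:doc .Article}',
  '## Section 1 {=urn:mdld:sec1 .Section}',
  '[Text] {name}',
].join('\n');
console.log(resolveSubjects(doc)); // the name literal attaches to urn:mdld:sec1
```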
-### Property Mapping
-| Markdown                | RDF Predicate                                                                   |
-| ----------------------- | ------------------------------------------------------------------------------- |
-| Top-level H1 (no `#id`) | `rdfs:label` on root subject                                                    |
-| Heading with `{#id}`    | `rdfs:label` on subject                                                         |
-| First paragraph         | `dct:description` on root                                                       |
-| `{property="name"}`     | Resolved via `@vocab` (e.g., `schema:name`)                                     |
-| `{rel="author"}`        | Resolved via `@vocab` (e.g., `schema:author`)                                   |
-| Code block              | `schema:SoftwareSourceCode` with `schema:programmingLanguage` and `schema:text` |
 ### List Handling
-```markdown
-- [Item 1]{property="item"}
-- [Item 2]{property="item"}
+```markdown {item}
+- Item 1
+- Item 2
 ```
 Creates **multiple triples** with same predicate (not RDF lists):
 ```turtle
-<#doc> schema:item "Item 1" .
-<#doc> schema:item "Item 2" .
+<subject> schema:item "Item 1" .
+<subject> schema:item "Item 2" .
 ```
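The property annotation applies to each item, so the mapping is a simple fan-out over list items. Below is a minimal sketch. It assumes one item per `-`, `*` or `+` line and uses plain-string objects. `listToTriples` is a hypothetical helper.

```javascript
// Hypothetical sketch: a list annotated with one property yields one triple
// per item, all sharing subject and predicate (no rdf:first/rdf:rest chain).
function listToTriples(subject, predicate, listMarkdown) {
  return listMarkdown
    .split('\n')
    .map((line) => line.match(/^\s*[-*+]\s+(.*\S)\s*$/)) // bullet item text
    .filter(Boolean)
    .map((m) => ({ subject, predicate, object: m[1] }));
}

const triples = listToTriples('subject', 'schema:item', '- Item 1\n- Item 2');
for (const t of triples) {
  console.log(`<${t.subject}> ${t.predicate} ${JSON.stringify(t.object)} .`);
}
// <subject> schema:item "Item 1" .
// <subject> schema:item "Item 2" .
```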
-For RDF lists (`rdf:List`), use `@inlist` in generated HTML.
 ### Code Block Semantics
-Fenced code blocks are automatically mapped to `schema:SoftwareSourceCode`:
 ```markdown
-\`\`\`sparql {#query-1}
-SELECT * WHERE { ?s ?p ?o }
+\`\`\`sparql {=ex:query-1 .SoftwareSourceCode}
+SELECT \* WHERE { ?s ?p ?o }
 \`\`\`
 ```
 Creates:
 - A `schema:SoftwareSourceCode` resource (or custom type via `typeof`)
 - `schema:programmingLanguage` from the info string (`sparql`)
 - `schema:text` with the raw source code
@@ -263,112 +269,32 @@ Creates:
 This enables semantic queries like "find all SPARQL queries in my notes."
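All three statements can be derived from the fence's info string and body alone. Below is a minimal sketch. It assumes the info string has the form `<language> {=<iri> .<Type>}` and that a blank-node label is used when no IRI is given; the `_:code1` default is illustrative. `codeBlockToTriples` is hypothetical.

```javascript
// Hypothetical sketch of the code-block mapping; not mdld-parse's implementation.
const SCHEMA = 'http://schema.org/';
const RDF_TYPE = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type';

function codeBlockToTriples(infoString, source, blankId = '_:code1') {
  // First word is the language, unless the info string starts with attributes.
  const first = infoString.trim().split(/\s+/)[0] || '';
  const lang = first && !first.startsWith('{') ? first : null;
  // Subject from {=iri ...}; otherwise fall back to a blank-node label.
  const iriMatch = infoString.match(/\{=(\S+?)[\s}]/);
  const subject = iriMatch ? iriMatch[1] : blankId;
  // Type from .Type inside the attribute block; default SoftwareSourceCode.
  const typeMatch = infoString.match(/\{(?:[^}]*\s)?\.(\w+)/);
  const type = typeMatch ? typeMatch[1] : 'SoftwareSourceCode';

  const out = [[subject, RDF_TYPE, SCHEMA + type]];
  if (lang) out.push([subject, SCHEMA + 'programmingLanguage', lang]);
  out.push([subject, SCHEMA + 'text', source]); // raw source, unescaped
  return out;
}

const triples = codeBlockToTriples(
  'sparql {=ex:query-1 .SoftwareSourceCode}',
  'SELECT * WHERE { ?s ?p ?o }'
);
for (const [s, p, o] of triples) console.log(s, p, o);
```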
-### Blank Node Strategy
-Blank nodes are created for:
-1. Task list items without explicit `#id`
-2. Code blocks without explicit `#id`
-3. Inline `typeof` without `id` when used with `rel`
-## Testing
-```bash
-npm test
-````
-Tests cover:
-- ✅ YAML-LD frontmatter parsing
-- ✅ Subject inheritance via headings
-- ✅ Property literals and datatypes (`property`, `datatype`)
-- ✅ Object relationships (`rel` on links)
-- ✅ Blank node generation (tasks, code blocks)
-- ✅ List mappings (repeated properties)
-- ✅ Code block semantics (`SoftwareSourceCode`)
-- ✅ Semantic links in lists (`hasPart` TOC)
-- ✅ Cross-references via fragment IDs
-- ✅ Minimal Markdown → RDF (headings, paragraphs)
 ## Syntax Overview
 ### Core Features
-**YAML-LD Frontmatter** — Define context and root subject:
-```yaml
----
-"@context":
-  "@vocab": "http://schema.org/"
-"@id": "#doc"
-"@type": Article
----
-```
 **Subject Declaration** — Headings create typed subjects:
 ```markdown
-## Alice Johnson {#alice typeof="Person"}
+## Alice Johnson {=ex:alice .Person}
 ```
 **Literal Properties** — Inline spans create properties:
 ```markdown
-[Alice Johnson]{property="name"}
-[30]{property="age" datatype="xsd:integer"}
+[Alice Johnson] {name}
+[30] {age ^^xsd:integer}
 ```
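A datatype annotation changes only the literal's `datatype` term. Below is a minimal sketch that builds an RDF/JS-shaped quad from `[30] {age ^^xsd:integer}`. It assumes that `age` resolves against a schema.org `@vocab`, that `xsd:` maps to the XML Schema namespace, and that plain objects stand in for a DataFactory. `typedLiteralQuad` is hypothetical.

```javascript
// Hypothetical sketch: `[value] {prop ^^xsd:type}` -> RDF/JS-style quad.
// Plain objects stand in for a real DataFactory; not mdld-parse's code.
const XSD = 'http://www.w3.org/2001/XMLSchema#';

function typedLiteralQuad(subjectIri, span, vocab = 'http://schema.org/') {
  const m = span.match(/^\[([^\]]*)\]\s*\{(\w+)(?:\s+\^\^xsd:(\w+))?\}$/);
  if (!m) throw new Error(`Unsupported span: ${span}`);
  const [, value, prop, xsdType] = m;
  return {
    subject: { termType: 'NamedNode', value: subjectIri },
    predicate: { termType: 'NamedNode', value: vocab + prop },
    object: {
      termType: 'Literal',
      value,
      language: '',
      // No ^^ annotation: fall back to xsd:string, the RDF 1.1 default.
      datatype: { termType: 'NamedNode', value: XSD + (xsdType || 'string') },
    },
    graph: { termType: 'DefaultGraph', value: '' },
  };
}

const q = typedLiteralQuad('http://example.org/alice', '[30] {age ^^xsd:integer}');
console.log(q.object.value, q.object.datatype.value);
// 30 http://www.w3.org/2001/XMLSchema#integer
```

Untyped spans such as `[Alice Johnson] {name}` get `xsd:string` in this sketch, which matches how RDF 1.1 types plain literals.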
 **Object Properties** — Links create relationships:
 ```markdown
-[Tech Corp](#company){rel="worksFor"}
+[Tech Corp](=ex:company) {worksFor}
 ```
 **Lists** — Repeated properties:
-```markdown
-- [Item 1]{property="tag"}
-- [Item 2]{property="tag"}
+```markdown {tag}
+- Item 1
+- Item 2
 ```
-**Code Blocks** — Automatic `SoftwareSourceCode` mapping:
-````markdown
-```sparql
-SELECT * WHERE { ?s ?p ?o }
-```
-````
-````
-**Tasks** — Markdown checklists become `schema:Action`:
-```markdown
-- [x] Completed task
-- [ ] Pending task
-````
-### Optimization Tips
-1. **Reuse DataFactory** — Pass custom factory instance to avoid allocations
-2. **Minimize frontmatter** — Keep `@context` simple for faster parsing
-3. **Batch processing** — Process multiple documents sequentially
-4. **Fragment IDs** — Use `#id` on headings for efficient cross-references
-## Future Work
-- [ ] Streaming API for large documents
-- [ ] Tables → CSVW integration
-- [ ] Math blocks → MathML + RDF
-- [ ] Image syntax → `schema:ImageObject`
-- [ ] Bare URL links → `dct:references`
-- [ ] Language tags (`lang` attribute)
-- [ ] Source maps for debugging
-## Standards Compliance
-This parser implements:
-- [MD-LD v0.1 Specification](./mdld_spec_dogfood.md)
-- [RDF/JS Data Model](https://rdf.js.org/data-model-spec/)
-- [RDFa Core 1.1](https://www.w3.org/TR/rdfa-core/) (subset)
-- [JSON-LD 1.1](https://www.w3.org/TR/json-ld11/) (frontmatter)