npm - mdld-parse - Versions diffs - 0.5.2 → 0.5.4 - Mend

mdld-parse 0.5.2 → 0.5.4

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (8)

package/README.md +172 -59
package/package.json +3 -2
package/src/{serialize.js → applyDiff.js} +1 -1
package/src/generate.js +248 -0
package/src/index.js +3 -1
package/src/locate.js +92 -0
package/src/parse.js +120 -96
package/src/utils.js +19 -4

package/README.md CHANGED

@@ -11,26 +11,55 @@
 MD-LD allows you to author RDF graphs directly in Markdown using explicit `{...}` annotations:
 ```markdown
-# Apollo 11 {=ex:apollo11 .SpaceMission}
+[my] <tag:alice@example.com,2026:>
-Launch: [1969-07-16] {startDate ^^xsd:date}
-Crew: [Neil Armstrong] {+ex:armstrong ?crewMember name}
-Description: [First crewed Moon landing] {description}
+# 2024-07-18 {=my:journal-2024-07-18 .my:Event my:date ^^xsd:date}
+## A good day {label}
+Mood: [Happy] {my:mood}
+Energy level: [8] {my:energyLevel ^^xsd:integer}
+Met [Sam] {+my:sam .my:Person ?my:attendee} on my regular walk at [Central Park] {+my:central-park ?my:location .my:Place label @en} and talked about [Sunny] {my:weather} weather.
+Activities: {?my:hasActivity .my:Activity label}
+- Walking {=#walking}
+- Reading {=#reading}
-[Section] {+#overview ?hasPart}
-Overview: [Mission summary] {description}
 ```
 Generates valid RDF triples:
 ```turtle
-ex:apollo11 a schema:SpaceMission ;
-  schema:startDate "1969-07-16"^^xsd:date ;
-  schema:crewMember ex:armstrong ;
-  schema:description "First crewed Moon landing" .
-ex:armstrong schema:name "Neil Armstrong" .
-```
+@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
+@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
+@prefix xsd: <http://www.w3.org/2001/XMLSchema#>.
+@prefix sh: <http://www.w3.org/ns/shacl#>.
+@prefix prov: <http://www.w3.org/ns/prov#>.
+@prefix ex: <http://example.org/>.
+@prefix my: <tag:alice@example.com,2026:>.
+my:journal-2024-07-18 a my:Event;
+    my:date "2024-07-18"^^xsd:date;
+    rdfs:label "A good day";
+    my:mood "Happy";
+    my:energyLevel 8;
+    my:attendee my:sam;
+    my:location my:central-park;
+    my:weather "Sunny";
+    my:hasActivity <tag:alice@example.com,2026:journal-2024-07-18#walking>, <tag:alice@example.com,2026:journal-2024-07-18#reading>.
+my:sam a my:Person.
+my:central-park a my:Place;
+    rdfs:label "Central Park"@en.
+<tag:alice@example.com,2026:journal-2024-07-18#walking> a my:Activity;
+    rdfs:label "Walking".
+<tag:alice@example.com,2026:journal-2024-07-18#reading> a my:Activity;
+    rdfs:label "Reading".
+```
+Read the [FULL SPEC](./docs/Spec/Spec.md).
 ## Core Features
@@ -112,11 +141,11 @@ Create fragment IRIs relative to current subject:
 ```markdown
 # Document {=ex:document}
 {=#summary}
-[Content] {name}
+[Content] {label}
 ```
 ```turtle
-ex:document#summary schema:name "Content" .
+ex:document#summary rdfs:label "Content" .
 ```
 Fragments replace any existing fragment and require a current subject.
@@ -128,11 +157,11 @@ Subject remains in scope until reset with `{=}` or new subject declared.
 Emit `rdf:type` triple:
 ```markdown
-## Apollo 11 {=ex:apollo11 .SpaceMission .Event}
+## Apollo 11 {=ex:apollo11 .ex:SpaceMission .ex:Event}
 ```
 ```turtle
-ex:apollo11 a schema:SpaceMission, schema:Event .
+ex:apollo11 a ex:SpaceMission, ex:Event .
 ```
 ### Literal Properties
@@ -142,15 +171,15 @@ Inline value carriers emit literal properties:
 ```markdown
 # Mission {=ex:apollo11}
-[Neil Armstrong] {commander}
-[1969] {year ^^xsd:gYear}
-[Historic mission] {description @en}
+[Neil Armstrong] {ex:commander}
+[1969] {ex:year ^^xsd:gYear}
+[Historic mission] {ex:description @en}
 ```
 ```turtle
-ex:apollo11 schema:commander "Neil Armstrong" ;
-  schema:year "1969"^^xsd:gYear ;
-  schema:description "Historic mission"@en .
+ex:apollo11 ex:commander "Neil Armstrong" ;
+  ex:year "1969"^^xsd:gYear ;
+  ex:description "Historic mission"@en .
 ```
 ### Object Properties
@@ -160,11 +189,11 @@ Links create relationships (use `?` prefix):
 ```markdown
 # Mission {=ex:apollo11}
-[NASA] {=ex:nasa ?organizer}
+[NASA] {=ex:nasa ?ex:organizer}
 ```
 ```turtle
-ex:apollo11 schema:organizer ex:nasa .
+ex:apollo11 ex:organizer ex:nasa .
 ```
 ### Resource Declaration
@@ -174,12 +203,12 @@ Declare resources inline with `{=iri}`:
 ```markdown
 # Mission {=ex:apollo11}
-[Neil Armstrong] {=ex:armstrong ?commander .Person}
+[Neil Armstrong] {=ex:armstrong ?ex:commander .prov:Person}
 ```
 ```turtle
-ex:apollo11 schema:commander ex:armstrong .
-ex:armstrong a schema:Person .
+ex:apollo11 ex:commander ex:armstrong .
+ex:armstrong a prov:Person .
 ```
 ### Lists
@@ -189,15 +218,15 @@ Lists require explicit subjects per item.
 ```markdown
 # Recipe {=ex:recipe}
-Ingredients: {?ingredient .Ingredient}
-- Flour {=ex:flour name}
-- Water {=ex:water name}
+Ingredients: {?ex:ingredient .ex:Ingredient}
+- Flour {=ex:flour label}
+- Water {=ex:water label}
 ```
 ```turtle
-ex:recipe schema:ingredient ex:flour, ex:water .
-ex:flour a schema:Ingredient ; schema:name "Flour" .
-ex:water a schema:Ingredient ; schema:name "Water" .
+ex:recipe ex:ingredient ex:flour, ex:water .
+ex:flour a ex:Ingredient ; rdfs:label "Flour" .
+ex:water a ex:Ingredient ; rdfs:label "Water" .
 ```
 ### Code Blocks
@@ -207,14 +236,14 @@ Code blocks are value carriers:
 ````markdown
 # Example {=ex:example}
-```javascript {=ex:code .SoftwareSourceCode text}
+```javascript {=ex:code .ex:SoftwareSourceCode ex:text}
 console.log("hello");
 ```
 ````
 ```turtle
-ex:code a schema:SoftwareSourceCode ;
-  schema:text "console.log(\"hello\")" .
+ex:code a ex:SoftwareSourceCode ;
+  ex:text "console.log(\"hello\")" .
 ```
 ### Blockquotes
@@ -222,11 +251,11 @@ ex:code a schema:SoftwareSourceCode ;
 ```markdown
 # Article {=ex:article}
-> MD-LD bridges Markdown and RDF. {abstract}
+> MD-LD bridges Markdown and RDF. {comment}
 ```
 ```turtle
-ex:article schema:abstract "MD-LD bridges Markdown and RDF." .
+ex:article rdfs:comment "MD-LD bridges Markdown and RDF." .
 ```
 ### Reverse Relations
@@ -236,13 +265,13 @@ Reverse the relationship direction:
 ```markdown
 # Part {=ex:part}
-Part of: {!hasPart}
+Part of: {!ex:hasPart}
 - Book {=ex:book}
 ```
 ```turtle
-ex:book schema:hasPart ex:part .
+ex:book ex:hasPart ex:part .
 ```
 ### Prefix Declarations
@@ -250,7 +279,6 @@ ex:book schema:hasPart ex:part .
 ```markdown
 [ex] <http://example.org/>
 [foaf] <http://xmlns.com/foaf/0.1/>
-[@vocab] <http://schema.org/>
 # Person {=ex:alice .foaf:Person}
 ```
@@ -297,7 +325,7 @@ Parse MD-LD markdown and return RDF quads with origin tracking.
 - `markdown` (string) — MD-LD formatted text
 - `options` (object, optional):
-  - `context` (object) — Prefix mappings (default: `{ '@vocab': 'http://www.w3.org/2000/01/rdf-schema#', rdf, rdfs, xsd, schema }`)
+  - `context` (object) — Prefix mappings (default: `{ '@vocab': 'http://www.w3.org/2000/01/rdf-schema#', rdf, rdfs, xsd, sh, prov }`)
   - `dataFactory` (object) — Custom RDF/JS DataFactory
 **Returns:** `{ quads, origin, context }`
@@ -329,7 +357,7 @@ console.log(result.quads);
 // ]
 ```
-### `serialize({ text, diff, origin, options })`
+### `applyDiff({ text, diff, origin, options })`
 Apply RDF changes back to markdown with proper positioning.
@@ -353,18 +381,18 @@ Apply RDF changes back to markdown with proper positioning.
 ```javascript
 const original = `# Article {=ex:article}
-[Alice] {author}`;
+[Alice] {ex:author}`;
 const result = parse(original, { context: { ex: 'http://example.org/' } });
 // Add a new property
 const newQuad = {
   subject: { termType: 'NamedNode', value: 'http://example.org/article' },
-  predicate: { termType: 'NamedNode', value: 'http://schema.org/datePublished' },
+  predicate: { termType: 'NamedNode', value: 'http://example.org/datePublished' },
   object: { termType: 'Literal', value: '2024-01-01' }
 };
-const updated = serialize({
+const updated = applyDiff({
   text: original,
   diff: { add: [newQuad] },
   origin: result.origin,
@@ -378,6 +406,91 @@ console.log(updated.text);
 // [2024-01-01] {datePublished}
 ```
+### `generate(quads, context)`
+Generate deterministic MDLD from RDF quads with origin tracking.
+**Parameters:**
+- `quads` (array) — Array of RDF/JS Quads to convert
+- `context` (object, optional) — Prefix mappings (default: `{}`)
+  - Merged with DEFAULT_CONTEXT for proper CURIE shortening
+  - Only user-defined prefixes are rendered in output
+**Returns:** `{ text, origin, context }`
+- `text` — Generated MDLD markdown
+- `origin` — Origin tracking object with:
+  - `blocks` — Map of block IDs to source locations
+  - `quadIndex` — Map of quads to block IDs
+- `context` — Final context used (includes defaults)
+**Example:**
+```javascript
+const quads = [
+  {
+    subject: { termType: 'NamedNode', value: 'http://example.org/article' },
+    predicate: { termType: 'NamedNode', value: 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type' },
+    object: { termType: 'NamedNode', value: 'http://example.org/Article' }
+  },
+  {
+    subject: { termType: 'NamedNode', value: 'http://example.org/article' },
+    predicate: { termType: 'NamedNode', value: 'http://example.org/author' },
+    object: { termType: 'NamedNode', value: 'http://example.org/alice' }
+  }
+];
+const result = generate(quads, {
+  ex: 'http://example.org/',
+});
+console.log(result.text);
+// # Article {=ex:article .ex:Article}
+//
+// > alice {+ex:alice ?ex:author}
+```
+### `locate(quad, origin, text, context)`
+Locate the precise text range of a quad in MDLD text using origin tracking.
+**Parameters:**
+- `quad` (object) — The quad to locate (subject, predicate, object)
+- `origin` (object, optional) — Origin object containing blocks and quadIndex
+- `text` (string, optional) — MDLD text (auto-parsed if origin not provided)
+- `context` (object, optional) — Context for parsing when text needs to be parsed
+**Returns:** `{ blockId, entryIndex, range, content, blockRange, carrierType, isVacant }` or `null`
+- `blockId` — ID of the containing block
+- `entryIndex` — Position within block entries
+- `range` — Precise character range of the quad content
+- `content` — Actual text content at that range
+- `blockRange` — Full range of the containing block
+- `carrierType` — Type of carrier (heading, blockquote, list, span)
+- `isVacant` — Whether the slot is marked as vacant
+**Example:**
+```javascript
+import { parse, locate } from './src/index.js';
+const result = parse(mdldText, { context: { ex: 'http://example.org/' } });
+const quad = result.quads[0]; // Find a quad to locate
+// Pattern 1: With origin (most efficient)
+const location1 = locate(quad, result.origin, mdldText);
+// Pattern 2: Auto-parse text (convenient)
+const location2 = locate(quad, null, mdldText, { ex: 'http://example.org/' });
+console.log(location1.range); // { start: 38, end: 44 }
+console.log(location1.content); // " Alice"
+console.log(location1.carrierType); // "blockquote"
+```
 ## Value Carriers
 Only specific markdown elements can carry semantic values:
@@ -464,14 +577,14 @@ Therefore, the algebra is **closed**.
 ```markdown
 [alice] <tag:alice@example.com,2026:>
-# Meeting Notes {=alice:meeting-2024-01-15 .Meeting}
+# Meeting Notes {=alice:meeting-2024-01-15 .alice:Meeting}
-Attendees: {?attendee name}
+Attendees: {?alice:attendee label}
 - Alice {=alice:alice}
 - Bob {=alice:bob}
-Action items: {?actionItem name}
+Action items: {?alice:actionItem label}
 - Review proposal {=alice:task-1}
 ```
@@ -479,14 +592,14 @@ Action items: {?actionItem name}
 ### Developer Documentation
 ````markdown
-# API Endpoint {=api:/users/:id .APIEndpoint}
+# API Endpoint {=api:/users/:id .api:Endpoint}
-[GET] {method}
-[/users/:id] {path}
+[GET] {api:method}
+[/users/:id] {api:path}
 Example:
-```bash {=api:/users/:id#example .CodeExample text}
+```bash {=api:/users/:id#example .api:CodeExample api:code}
 curl https://api.example.com/users/123
 ```
 ````
@@ -496,13 +609,13 @@ curl https://api.example.com/users/123
 ```markdown
 [alice] <tag:alice@example.com,2026:>
-# Paper {=alice:paper-semantic-markdown .ScholarlyArticle}
+# Paper {=alice:paper-semantic-markdown .alice:ScholarlyArticle}
-[Semantic Web] {about}
-[Alice Johnson] {=alice:alice-johnson ?author}
-[2024-01] {datePublished ^^xsd:gYearMonth}
+[Semantic Web] {label}
+[Alice Johnson] {=alice:alice-johnson ?alice:author}
+[2024-01] {alice:datePublished ^^xsd:gYearMonth}
-> This paper explores semantic markup in Markdown. {abstract @en}
+> This paper explores semantic markup in Markdown. {comment @en}
 ```
 ## Testing
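
The `generate` notes in the README hunks above say that the user context is merged with `DEFAULT_CONTEXT` and that only user-defined prefixes are rendered. The following is a minimal sketch of that rule, not the package's code. `ASSUMED_DEFAULTS` and `renderPrefixes` are hypothetical names, and the default prefix set is an assumption based on the README's documented `context` default, with namespace IRIs taken from the Turtle example.

```javascript
// Hypothetical stand-in for DEFAULT_CONTEXT (assumption: the prefix list follows
// the README's `context` default; the real object lives in src/utils.js).
const ASSUMED_DEFAULTS = {
  '@vocab': 'http://www.w3.org/2000/01/rdf-schema#',
  rdf: 'http://www.w3.org/1999/02/22-rdf-syntax-ns#',
  rdfs: 'http://www.w3.org/2000/01/rdf-schema#',
  xsd: 'http://www.w3.org/2001/XMLSchema#',
  sh: 'http://www.w3.org/ns/shacl#',
  prov: 'http://www.w3.org/ns/prov#',
};

// Merge user prefixes over the defaults, then emit MD-LD prefix declarations
// only for prefixes that are not '@'-keywords and not already defaults.
function renderPrefixes(userContext) {
  const full = { ...ASSUMED_DEFAULTS, ...userContext };
  return Object.entries(full)
    .sort(([a], [b]) => a.localeCompare(b))
    .filter(([prefix]) => !prefix.startsWith('@') && !ASSUMED_DEFAULTS[prefix])
    .map(([prefix, namespace]) => `[${prefix}] <${namespace}>`);
}

const prefixLines = renderPrefixes({
  my: 'tag:alice@example.com,2026:',
  ex: 'http://example.org/',
  xsd: 'http://www.w3.org/2001/XMLSchema#', // already a default, so not rendered
});
console.log(prefixLines.join('\n'));
```

Under these assumptions only the `ex` and `my` declarations are printed, sorted by prefix name; the default prefixes stay implicit.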

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
 	"name": "mdld-parse",
-	"version": "0.5.2",
+	"version": "0.5.4",
 	"description": "A standards-compliant parser for **MD-LD (Markdown-Linked Data)** — a human-friendly RDF authoring format that extends Markdown with semantic annotations.",
 	"type": "module",
 	"main": "index.js",
@@ -12,7 +12,8 @@
 		"src"
 	],
 	"scripts": {
-		"test": "node tests/index.js"
+		"test": "node tests/index.js",
+		"dev": "pnpx live-server"
 	},
 	"keywords": [
 		"mdld",

package/src/{serialize.js → applyDiff.js} RENAMED

@@ -158,7 +158,7 @@ function markEntryAsVacant(entry, quad) {
     return null;
 }
-export function serialize({ text, diff, origin, options = {} }) {
+export function applyDiff({ text, diff, origin, options = {} }) {
     if (!diff || (!diff.add?.length && !diff.delete?.length)) {
         const reparsed = parse(text, { context: options.context || {} });
         return { text, origin: reparsed.origin };

package/src/generate.js ADDED

@@ -0,0 +1,248 @@
+import { shortenIRI, expandIRI, quadIndexKey, createSlotInfo, DEFAULT_CONTEXT } from './utils.js';
+function extractLocalName(iri) {
+    const separators = ['#', '/', ':'];
+    for (const sep of separators) {
+        const lastSep = iri.lastIndexOf(sep);
+        if (lastSep !== -1 && lastSep < iri.length - 1) {
+            return iri.substring(lastSep + 1);
+        }
+    }
+    return iri;
+}
+/**
+ * Generate deterministic MDLD from RDF quads
+ * Purpose: TTL→MDLD conversion with canonical structure
+ * Input: RDF quads + context
+ * Output: MDLD text + origin + context
+ */
+export function generate(quads, context = {}) {
+    const fullContext = { ...DEFAULT_CONTEXT, ...context };
+    const normalizedQuads = normalizeAndSortQuads(quads);
+    const subjectGroups = groupQuadsBySubject(normalizedQuads);
+    const { text, blocks, quadIndex } = buildDeterministicMDLD(subjectGroups, fullContext);
+    return {
+        text,
+        origin: { blocks, quadIndex },
+        context: fullContext
+    };
+}
+function normalizeAndSortQuads(quads) {
+    return quads
+        .map(quad => ({
+            subject: { termType: quad.subject.termType, value: quad.subject.value },
+            predicate: { termType: quad.predicate.termType, value: quad.predicate.value },
+            object: quad.object.termType === 'Literal'
+                ? {
+                    termType: 'Literal',
+                    value: quad.object.value,
+                    language: quad.object.language || null,
+                    datatype: quad.object.datatype || { termType: 'NamedNode', value: 'http://www.w3.org/2001/XMLSchema#string' }
+                }
+                : { termType: 'NamedNode', value: quad.object.value }
+        }))
+        .sort((a, b) => {
+            // Deterministic sorting: subject -> predicate -> object
+            const sComp = a.subject.value.localeCompare(b.subject.value);
+            if (sComp !== 0) return sComp;
+            const pComp = a.predicate.value.localeCompare(b.predicate.value);
+            if (pComp !== 0) return pComp;
+            const oA = a.object.termType === 'Literal' ? a.object.value : a.object.value;
+            const oB = b.object.termType === 'Literal' ? b.object.value : b.object.value;
+            return oA.localeCompare(oB);
+        });
+}
+function groupQuadsBySubject(quads) {
+    const groups = new Map();
+    for (const quad of quads) {
+        if (!groups.has(quad.subject.value)) {
+            groups.set(quad.subject.value, []);
+        }
+        groups.get(quad.subject.value).push(quad);
+    }
+    return groups;
+}
+function buildDeterministicMDLD(subjectGroups, context) {
+    let text = '';
+    let currentPos = 0;
+    const blocks = new Map();
+    const quadIndex = new Map();
+    // Add prefixes first (deterministic order), but exclude default context prefixes
+    const sortedPrefixes = Object.entries(context).sort(([a], [b]) => a.localeCompare(b));
+    for (const [prefix, namespace] of sortedPrefixes) {
+        // Skip default context prefixes - they're implicit in MDLD
+        if (prefix !== '@vocab' && !prefix.startsWith('@') && !DEFAULT_CONTEXT[prefix]) {
+            const prefixDecl = `[${prefix}] <${namespace}>\n`;
+            const blockId = generateBlockId();
+            blocks.set(blockId, {
+                id: blockId,
+                range: { start: currentPos, end: currentPos + prefixDecl.length },
+                subject: null,
+                entries: [{ kind: 'prefix', prefix, namespace, raw: prefixDecl.trim() }],
+                carrierType: 'prefix'
+            });
+            text += prefixDecl;
+            currentPos += prefixDecl.length;
+        }
+    }
+    if (sortedPrefixes.length > 0) {
+        text += '\n';
+        currentPos += 1;
+    }
+    // Process subjects in deterministic order
+    const sortedSubjects = Array.from(subjectGroups.keys()).sort();
+    for (const subjectIRI of sortedSubjects) {
+        const subjectQuads = subjectGroups.get(subjectIRI);
+        const shortSubject = shortenIRI(subjectIRI, context);
+        // Separate types, literals, and objects
+        const types = subjectQuads.filter(q => q.predicate.value === 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type');
+        const literals = subjectQuads.filter(q => q.object.termType === 'Literal' && q.predicate.value !== 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type');
+        const objects = subjectQuads.filter(q => q.object.termType === 'NamedNode' && q.predicate.value !== 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type');
+        // Generate heading
+        const localSubjectName = extractLocalName(subjectIRI);
+        const typeAnnotations = types.length > 0
+            ? ' ' + types.map(t => '.' + extractLocalName(t.object.value)).sort().join(' ')
+            : '';
+        const headingText = `# ${localSubjectName} {=${shortSubject}${typeAnnotations}}\n\n`;
+        const blockId = generateBlockId();
+        const headingBlock = {
+            id: blockId,
+            range: { start: currentPos, end: currentPos + headingText.length },
+            subject: subjectIRI,
+            entries: [
+                { kind: 'subject', raw: `=${shortSubject}`, expandedSubject: subjectIRI },
+                ...types.map((t, i) => ({
+                    kind: 'type',
+                    raw: '.' + extractLocalName(t.object.value),
+                    expandedType: t.object.value,
+                    entryIndex: i
+                }))
+            ],
+            carrierType: 'heading'
+        };
+        blocks.set(blockId, headingBlock);
+        // Add type quads to index
+        types.forEach((quad, i) => {
+            const key = quadIndexKey(quad.subject, quad.predicate, quad.object);
+            quadIndex.set(key, createSlotInfo(blockId, i, {
+                kind: 'type',
+                subject: quad.subject,
+                predicate: quad.predicate,
+                object: quad.object
+            }));
+        });
+        text += headingText;
+        currentPos += headingText.length;
+        // Add literals (deterministic order)
+        const sortedLiterals = literals.sort((a, b) => a.predicate.value.localeCompare(b.predicate.value));
+        for (const quad of sortedLiterals) {
+            const predShort = shortenIRI(quad.predicate.value, context);
+            let annotation = predShort;
+            if (quad.object.language) {
+                annotation += ` @${quad.object.language}`;
+            } else if (quad.object.datatype.value !== 'http://www.w3.org/2001/XMLSchema#string') {
+                annotation += ` ^^${shortenIRI(quad.object.datatype.value, context)}`;
+            }
+            const literalText = `[${quad.object.value}] {${annotation}}\n`;
+            const literalBlockId = generateBlockId();
+            const literalBlock = {
+                id: literalBlockId,
+                range: { start: currentPos, end: currentPos + literalText.length },
+                subject: subjectIRI,
+                entries: [{
+                    kind: 'property',
+                    raw: annotation,
+                    expandedPredicate: quad.predicate.value,
+                    form: '',
+                    entryIndex: 0
+                }],
+                carrierType: 'span',
+                valueRange: { start: currentPos + 1, end: currentPos + 1 + quad.object.value.length },
+                attrsRange: { start: currentPos + literalText.indexOf('{'), end: currentPos + literalText.indexOf('}') + 1 }
+            };
+            blocks.set(literalBlockId, literalBlock);
+            // Add to quad index
+            const key = quadIndexKey(quad.subject, quad.predicate, quad.object);
+            quadIndex.set(key, createSlotInfo(literalBlockId, 0, {
+                kind: 'pred',
+                subject: quad.subject,
+                predicate: quad.predicate,
+                object: quad.object,
+                form: ''
+            }));
+            text += literalText;
+            currentPos += literalText.length;
+        }
+        // Add objects (deterministic order)
+        const sortedObjects = objects.sort((a, b) => a.predicate.value.localeCompare(b.predicate.value));
+        for (const quad of sortedObjects) {
+            const predShort = shortenIRI(quad.predicate.value, context);
+            const objShort = shortenIRI(quad.object.value, context);
+            const localName = extractLocalName(quad.object.value);
+            const objectText = `> ${localName} {+${objShort} ?${predShort}}\n`;
+            const objectBlockId = generateBlockId();
+            const objectBlock = {
+                id: objectBlockId,
+                range: { start: currentPos, end: currentPos + objectText.length },
+                subject: subjectIRI,
+                entries: [{
+                    kind: 'object',
+                    raw: objShort,
+                    expandedObject: quad.object.value,
+                    entryIndex: 0
+                }],
+                carrierType: 'span'
+            };
+            blocks.set(objectBlockId, objectBlock);
+            // Add to quad index
+            const key = quadIndexKey(quad.subject, quad.predicate, quad.object);
+            quadIndex.set(key, createSlotInfo(objectBlockId, 0, {
+                kind: 'pred',
+                subject: quad.subject,
+                predicate: quad.predicate,
+                object: quad.object,
+                form: '?'
+            }));
+            text += objectText;
+            currentPos += objectText.length;
+        }
+        if (sortedLiterals.length > 0 || sortedObjects.length > 0) {
+            text += '\n';
+            currentPos += 1;
+        }
+    }
+    return { text: text.trim(), blocks, quadIndex };
+}
+function generateBlockId() {
+    return Math.random().toString(36).substring(2, 10);
+}
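
The two ideas that make the output of `generate.js` above reproducible are the subject → predicate → object sort and the local-name extraction used for headings. Both can be tried in isolation with the sketch below. `extractLocalName` mirrors the function in the diff; `compareQuads` and the `q` helper are hypothetical names introduced only for this illustration, and the quads are plain objects rather than full RDF/JS terms.

```javascript
// Same logic as the diff: take the text after the last '#', else '/', else ':'.
function extractLocalName(iri) {
  for (const sep of ['#', '/', ':']) {
    const last = iri.lastIndexOf(sep);
    if (last !== -1 && last < iri.length - 1) return iri.substring(last + 1);
  }
  return iri;
}

// Hypothetical comparator: subject, then predicate, then object value.
function compareQuads(a, b) {
  return a.subject.value.localeCompare(b.subject.value)
    || a.predicate.value.localeCompare(b.predicate.value)
    || a.object.value.localeCompare(b.object.value);
}

// Hypothetical helper to build minimal quad-like objects.
const q = (s, p, o) => ({ subject: { value: s }, predicate: { value: p }, object: { value: o } });
const EX = 'http://example.org/';

const input = [
  q(EX + 'b', EX + 'title', 'Second'),
  q(EX + 'a', EX + 'title', 'First'),
  q(EX + 'a', EX + 'author', 'Alice'),
];

// Sorting any permutation of the input yields the same sequence.
const sorted = [...input].sort(compareQuads);
const reSorted = [...input].reverse().sort(compareQuads);

const summary = sorted.map(x =>
  `${extractLocalName(x.subject.value)} ${extractLocalName(x.predicate.value)} ${x.object.value}`);
console.log(summary);
```

Because the comparator depends only on term values, input order does not affect the generated text. The block IDs in the diff come from `Math.random()`, so they are not covered by this guarantee.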

package/src/index.js CHANGED

@@ -1,5 +1,7 @@
 export { parse } from './parse.js';
-export { serialize } from './serialize.js';
+export { applyDiff } from './applyDiff.js';
+export { generate } from './generate.js';
+export { locate } from './locate.js';
 export {
     DEFAULT_CONTEXT,
     DataFactory,

package/src/locate.js ADDED

@@ -0,0 +1,92 @@
+import { parse } from './parse.js';
+import { normalizeQuad, quadIndexKey } from './utils.js';
+/**
+ * Locate the precise text range of a quad in MDLD text using origin tracking
+ *
+ * @param {Object} quad - The quad to locate (subject, predicate, object)
+ * @param {Object} origin - Origin object containing blocks and quadIndex (optional)
+ * @param {string} text - Original MDLD text (optional, parsed if origin not provided)
+ * @param {Object} context - Context for parsing (optional, used if text needs parsing)
+ * @returns {Object|null} Range information or null if not found
+ */
+export function locate(quad, origin, text = '', context = {}) {
+    // If origin not provided, parse text to get origin
+    if (!origin && text) {
+        const parseResult = parse(text, { context });
+        origin = parseResult.origin;
+    }
+    if (!quad || !origin || !origin.quadIndex || !origin.blocks) {
+        return null;
+    }
+    // Normalize the quad for consistent key generation
+    const normalizedQuad = normalizeQuad(quad);
+    if (!normalizedQuad) {
+        return null;
+    }
+    // Generate the quad key to lookup in quadIndex
+    const quadKey = quadIndexKey(normalizedQuad.subject, normalizedQuad.predicate, normalizedQuad.object);
+    // Find the slot information in quadIndex
+    const slotInfo = origin.quadIndex.get(quadKey);
+    if (!slotInfo) {
+        return null;
+    }
+    // Get the block information
+    const block = origin.blocks.get(slotInfo.blockId);
+    if (!block) {
+        return null;
+    }
+    // Extract the actual text content based on carrier type and entry
+    let contentRange = null;
+    let content = '';
+    if (block.carrierType === 'heading') {
+        // For headings, use the block's main range
+        contentRange = block.range;
+        content = text.substring(block.range.start, block.range.end);
+    } else if (block.carrierType === 'blockquote' || block.carrierType === 'list' || block.carrierType === 'span') {
+        // For blockquotes, lists, and spans, extract from block range
+        contentRange = block.range;
+        content = text.substring(block.range.start, block.range.end);
+        // For blockquotes, try to extract the specific carrier content from entries
+        if (slotInfo.entryIndex != null && block.entries && block.entries[slotInfo.entryIndex]) {
+            const entry = block.entries[slotInfo.entryIndex];
+            if (entry.raw) {
+                // For blockquotes, the entry.raw contains the full carrier text
+                // Extract just the content part before the annotation
+                const annotationStart = entry.raw.indexOf('{');
+                if (annotationStart !== -1) {
+                    const carrierContent = entry.raw.substring(0, annotationStart).trim();
+                    // Find this content in the block text
+                    const contentStart = text.indexOf(carrierContent, block.range.start);
+                    if (contentStart !== -1) {
+                        const contentEnd = contentStart + carrierContent.length;
+                        contentRange = { start: contentStart, end: contentEnd };
+                        content = text.substring(contentStart, contentEnd);
+                    }
+                }
+            }
+        }
+    }
+    return {
+        blockId: slotInfo.blockId,
+        entryIndex: slotInfo.entryIndex,
+        kind: slotInfo.kind,
+        subject: normalizedQuad.subject,
+        predicate: normalizedQuad.predicate,
+        object: normalizedQuad.object,
+        range: contentRange,
+        content: content,
+        blockRange: block.range,
+        carrierType: block.carrierType,
+        isVacant: slotInfo.isVacant || false
+    };
+}
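
The core of `locate.js` above is a two-step lookup: quad key → slot in `quadIndex`, then slot → block in `blocks`, from which a text range is read. Below is a reduced sketch of that flow under stated assumptions. `keyOf` is a hypothetical stand-in for `quadIndexKey` (its real format is in `utils.js` and not shown in this diff), `locateSketch` is a hypothetical name, and the hand-built `origin` only imitates the shape that `parse` would return.

```javascript
// Hypothetical key function; the package's quadIndexKey may use another format.
const keyOf = (quad) =>
  [quad.subject.value, quad.predicate.value, quad.object.value].join('|');

// Reduced lookup: quad -> slot -> block -> text slice. Returns null on any miss.
function locateSketch(quad, origin, text) {
  const slot = origin.quadIndex.get(keyOf(quad));
  if (!slot) return null;
  const block = origin.blocks.get(slot.blockId);
  if (!block) return null;
  return {
    blockId: slot.blockId,
    entryIndex: slot.entryIndex,
    range: block.range,
    content: text.substring(block.range.start, block.range.end),
    carrierType: block.carrierType,
  };
}

const text = '# Article {=ex:article}\n[Alice] {ex:author}\n';
const carrier = '[Alice] {ex:author}';
const start = text.indexOf(carrier);

const authorQuad = {
  subject: { termType: 'NamedNode', value: 'http://example.org/article' },
  predicate: { termType: 'NamedNode', value: 'http://example.org/author' },
  object: { termType: 'Literal', value: 'Alice' },
};

// Hand-built origin with one block and one indexed quad (illustrative only).
const origin = {
  blocks: new Map([['b1', { range: { start, end: start + carrier.length }, carrierType: 'span' }]]),
  quadIndex: new Map([[keyOf(authorQuad), { blockId: 'b1', entryIndex: 0 }]]),
};

const found = locateSketch(authorQuad, origin, text);
const missing = locateSketch(
  { ...authorQuad, object: { termType: 'Literal', value: 'Bob' } }, origin, text);
console.log(found.content, missing);
```

The sketch returns the whole block range; the code in the diff additionally narrows the range to the carrier text before the `{...}` annotation when the matching entry has a `raw` value.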

package/src/parse.js CHANGED

@@ -10,7 +10,7 @@ import {
 } from './utils.js';
 const URL_REGEX = /^[a-zA-Z][a-zA-Z0-9+.-]*:/;
-const FENCE_REGEX = /^(`{3,})(.*)/;
+const FENCE_REGEX = /^(`{3,}|~{3,})(.*)/;
 const PREFIX_REGEX = /^\[([^\]]+)\]\s*<([^>]+)>/;
 const HEADING_REGEX = /^(#{1,6})\s+(.+?)(?:\s*(\{[^}]+\}))?$/;
 const UNORDERED_LIST_REGEX = /^(\s*)([-*+]|\d+\.)\s+(.+?)(?:\s*(\{[^}]+\}))?\s*$/;
@@ -22,6 +22,29 @@ const INLINE_CARRIER_PATTERNS = {
     CODE_SPAN: /``(.+?)``\s*\{([^}]+)\}/y
 };
+// Cache for fence regex patterns to avoid recreation
+const FENCE_CLOSE_PATTERNS = new Map();
+function getFenceClosePattern(fenceChar) {
+    if (!FENCE_CLOSE_PATTERNS.has(fenceChar)) {
+        FENCE_CLOSE_PATTERNS.set(fenceChar, new RegExp(`^(${fenceChar}{3,})`));
+    }
+    return FENCE_CLOSE_PATTERNS.get(fenceChar);
+}
+function parseLangAndAttrs(langAndAttrs) {
+    const spaceIndex = langAndAttrs.indexOf(' ');
+    const braceIndex = langAndAttrs.indexOf('{');
+    const langEnd = Math.min(
+        spaceIndex > -1 ? spaceIndex : Infinity,
+        braceIndex > -1 ? braceIndex : Infinity
+    );
+    return {
+        lang: langAndAttrs.substring(0, langEnd),
+        attrsText: langAndAttrs.substring(langEnd).match(/\{[^{}]*\}/)?.[0] || null
+    };
+}
 const semCache = {};
 const EMPTY_SEM = Object.freeze({ predicates: [], types: [], subject: null });
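
The hunk above widens `FENCE_REGEX` to accept tilde fences and moves the info-string split into `parseLangAndAttrs`. The sketch below copies both from the diff and wraps them in a hypothetical `describeFence` helper (not part of the package) to show what each opening line yields. The backtick fence is built with `repeat` only to keep this example's own fence intact.

```javascript
// Copied from the diff: three or more backticks or tildes, then the info string.
const FENCE_REGEX = /^(`{3,}|~{3,})(.*)/;

// Copied from the diff: the language ends at the first space or '{';
// the first {...} group after it is the annotation.
function parseLangAndAttrs(langAndAttrs) {
  const spaceIndex = langAndAttrs.indexOf(' ');
  const braceIndex = langAndAttrs.indexOf('{');
  const langEnd = Math.min(
    spaceIndex > -1 ? spaceIndex : Infinity,
    braceIndex > -1 ? braceIndex : Infinity
  );
  return {
    lang: langAndAttrs.substring(0, langEnd),
    attrsText: langAndAttrs.substring(langEnd).match(/\{[^{}]*\}/)?.[0] || null
  };
}

// Hypothetical helper: classify one line as a fence opener, or return null.
function describeFence(line) {
  const match = line.match(FENCE_REGEX);
  if (!match) return null;
  return { fence: match[1], ...parseLangAndAttrs(match[2]) };
}

const fenced = describeFence('`'.repeat(3) + 'javascript {=ex:code .ex:SoftwareSourceCode ex:text}');
const tilde = describeFence('~~~bash');
const plain = describeFence('console.log("hello");');
console.log(fenced, tilde, plain);
```

With the 0.5.2 pattern the `~~~bash` line would not have matched as a fence at all. When neither a space nor a brace is present, `langEnd` is `Infinity`, and `substring` clamps it to the string length, so the whole info string becomes the language.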
@@ -79,108 +102,104 @@ function scanTokens(text) {
     let pos = 0;
     let codeBlock = null;
-    const processors = [
-        {
-            test: line => line.startsWith('```'),
-            process: (line, lineStart, pos) => {
-                if (!codeBlock) {
-                    const fenceMatch = line.match(FENCE_REGEX);
-                    const attrsText = fenceMatch[2].match(/\{[^{}]*\}/)?.[0] || null;
-                    const attrsStartInLine = attrsText ? line.indexOf(attrsText) : -1;
-                    const contentStart = lineStart + line.length + 1;
-                    const langAndAttrs = fenceMatch[2];
-                    const langEnd = langAndAttrs.indexOf(' ') > -1 ? langAndAttrs.indexOf(' ') :
-                        langAndAttrs.indexOf('{') > -1 ? langAndAttrs.indexOf('{') : langAndAttrs.length;
-                    codeBlock = {
-                        fence: fenceMatch[1],
-                        start: lineStart,
-                        content: [],
-                        lang: langAndAttrs.substring(0, langEnd),
-                        attrs: attrsText,
-                        attrsRange: attrsText && attrsStartInLine >= 0 ? [lineStart + attrsStartInLine, lineStart + attrsStartInLine + attrsText.length] : null,
-                        valueRangeStart: contentStart
-                    };
-                } else if (line.startsWith(codeBlock.fence)) {
-                    const valueStart = codeBlock.valueRangeStart;
-                    const valueEnd = Math.max(valueStart, lineStart - 1);
-                    tokens.push({
-                        type: 'code',
-                        range: [codeBlock.start, lineStart],
-                        text: codeBlock.content.join('\n'),
-                        lang: codeBlock.lang,
-                        attrs: codeBlock.attrs,
-                        attrsRange: codeBlock.attrsRange,
-                        valueRange: [valueStart, valueEnd]
-                    });
-                    codeBlock = null;
-                }
-                return true;
-            }
-        },
-        {
-            test: () => codeBlock,
-            process: line => {
-                codeBlock.content.push(line);
-                return true;
-            }
-        },
-        {
-            test: line => PREFIX_REGEX.test(line),
-            process: (line, lineStart, pos) => {
-                const match = PREFIX_REGEX.exec(line);
-                tokens.push({ type: 'prefix', prefix: match[1], iri: match[2].trim() });
-                return true;
-            }
-        },
-        {
-            test: line => HEADING_REGEX.test(line),
-            process: (line, lineStart, pos) => {
-                const match = HEADING_REGEX.exec(line);
-                const attrs = match[3] || null;
-                const afterHashes = match[1].length;
-                const rangeInfo = calcRangeInfo(line, attrs, lineStart, afterHashes, match[2].length);
-                tokens.push(createToken('heading', [lineStart, pos - 1], match[2].trim(), attrs,
-                    rangeInfo.attrsRange, rangeInfo.valueRange, { depth: match[1].length }));
-                return true;
-            }
-        },
-        {
-            test: line => UNORDERED_LIST_REGEX.test(line),
-            process: (line, lineStart, pos) => {
-                const match = UNORDERED_LIST_REGEX.exec(line);
-                tokens.push(createListToken('list', line, lineStart, pos, match, match[1].length));
-                return true;
-            }
-        },
-        {
-            test: line => BLOCKQUOTE_REGEX.test(line),
-            process: (line, lineStart, pos) => {
-                const match = BLOCKQUOTE_REGEX.exec(line);
-                const attrs = match[2] || null;
-                const valueStartInLine = line.startsWith('> ') ? 2 : line.indexOf('>') + 1;
-                const valueEndInLine = valueStartInLine + match[1].length;
-                tokens.push(createToken('blockquote', [lineStart, pos - 1], match[1].trim(), attrs,
-                    calcAttrsRange(line, attrs, lineStart),
-                    [lineStart + valueStartInLine, lineStart + valueEndInLine]));
-                return true;
-            }
-        },
-        {
-            test: line => line.trim(),
-            process: (line, lineStart, pos) => {
-                tokens.push(createToken('para', [lineStart, pos - 1], line.trim()));
-                return true;
+    // Direct lookup instead of linear search
+    const PROCESSORS = [
+        { type: 'fence', test: line => FENCE_REGEX.test(line.trim()), process: handleFence },
+        { type: 'content', test: () => codeBlock, process: line => codeBlock.content.push(line) },
+        { type: 'prefix', test: line => PREFIX_REGEX.test(line), process: handlePrefix },
+        { type: 'heading', test: line => HEADING_REGEX.test(line), process: handleHeading },
+        { type: 'list', test: line => UNORDERED_LIST_REGEX.test(line), process: handleList },
+        { type: 'blockquote', test: line => BLOCKQUOTE_REGEX.test(line), process: handleBlockquote },
+        { type: 'para', test: line => line.trim(), process: handlePara }
+    ];
+    function handleFence(line, lineStart, pos) {
+        const trimmedLine = line.trim();
+        if (!codeBlock) {
+            const fenceMatch = trimmedLine.match(FENCE_REGEX);
+            if (!fenceMatch) return false;
+            const { lang, attrsText } = parseLangAndAttrs(fenceMatch[2]);
+            const attrsStartInLine = attrsText ? line.indexOf(attrsText) : -1;
+            const contentStart = lineStart + line.length + 1;
+            codeBlock = {
+                fence: fenceMatch[1],
+                start: lineStart,
+                content: [],
+                lang,
+                attrs: attrsText,
+                attrsRange: attrsText && attrsStartInLine >= 0 ? [lineStart + attrsStartInLine, lineStart + attrsStartInLine + attrsText.length] : null,
+                valueRangeStart: contentStart
+            };
+        } else {
+            const fenceChar = codeBlock.fence[0];
+            const expectedFence = fenceChar.repeat(codeBlock.fence.length);
+            const fenceMatch = trimmedLine.match(getFenceClosePattern(fenceChar));
+            if (fenceMatch && fenceMatch[1] === expectedFence) {
+                const valueStart = codeBlock.valueRangeStart;
+                const valueEnd = Math.max(valueStart, lineStart - 1);
+                tokens.push({
+                    type: 'code',
+                    range: [codeBlock.start, lineStart],
+                    text: codeBlock.content.join('\n'),
+                    lang: codeBlock.lang,
+                    attrs: codeBlock.attrs,
+                    attrsRange: codeBlock.attrsRange,
+                    valueRange: [valueStart, valueEnd]
+                });
+                codeBlock = null;
             }
         }
-    ];
+        return true;
+    }
+    function handlePrefix(line, lineStart, pos) {
+        const match = PREFIX_REGEX.exec(line);
+        tokens.push({ type: 'prefix', prefix: match[1], iri: match[2].trim() });
+        return true;
+    }
+    function handleHeading(line, lineStart, pos) {
+        const match = HEADING_REGEX.exec(line);
+        const attrs = match[3] || null;
+        const afterHashes = match[1].length;
+        const rangeInfo = calcRangeInfo(line, attrs, lineStart, afterHashes, match[2].length);
+        tokens.push(createToken('heading', [lineStart, pos - 1], match[2].trim(), attrs,
+            rangeInfo.attrsRange, rangeInfo.valueRange, { depth: match[1].length }));
+        return true;
+    }
+    function handleList(line, lineStart, pos) {
+        const match = UNORDERED_LIST_REGEX.exec(line);
+        tokens.push(createListToken('list', line, lineStart, pos, match, match[1].length));
+        return true;
+    }
+    function handleBlockquote(line, lineStart, pos) {
+        const match = BLOCKQUOTE_REGEX.exec(line);
+        const attrs = match[2] || null;
+        const valueStartInLine = line.startsWith('> ') ? 2 : line.indexOf('>') + 1;
+        const valueEndInLine = valueStartInLine + match[1].length;
+        tokens.push(createToken('blockquote', [lineStart, pos - 1], match[1].trim(), attrs,
+            calcAttrsRange(line, attrs, lineStart),
+            [lineStart + valueStartInLine, lineStart + valueEndInLine]));
+        return true;
+    }
+    function handlePara(line, lineStart, pos) {
+        tokens.push(createToken('para', [lineStart, pos - 1], line.trim()));
+        return true;
+    }
     for (let i = 0; i < lines.length; i++) {
         const line = lines[i];
         const lineStart = pos;
         pos += line.length + 1;
-        // Try each processor until one handles the line
-        for (const processor of processors) {
+        // Direct processor lookup - O(n) instead of O(n*m)
+        for (const processor of PROCESSORS) {
             if (processor.test(line) && processor.process(line, lineStart, pos)) {
                 break;
             }
@@ -562,7 +581,12 @@ const manageListStack = (token, state) => {
 const combineSemanticInfo = (token, carriers, listFrame, state, itemSubject) => {
     const combinedSem = { subject: null, object: null, types: [], predicates: [], datatype: null, language: null, entries: [] };
-    const addSem = (sem) => { combinedSem.types.push(...sem.types); combinedSem.predicates.push(...sem.predicates); combinedSem.entries.push(...sem.entries); };
+    const addSem = (sem) => {
+        const entryIndex = combinedSem.entries.length;
+        combinedSem.types.push(...sem.types);
+        combinedSem.predicates.push(...sem.predicates);
+        combinedSem.entries.push(...sem.entries.map(entry => ({ ...entry, entryIndex })));
+    };
     if (listFrame?.contextSem) {
         const inheritedSem = processContextSem({ sem: listFrame.contextSem, itemSubject, contextSubject: listFrame.contextSubject, inheritLiterals: true, state });
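
Note on the `PROCESSORS` table in the parse.js hunk above: as far as the hunk shows, each line is still tested against the processors in order until one both matches and returns a truthy value, so the order of the entries decides which token type wins. The sketch below illustrates that first-match dispatch pattern in isolation. It is not the package's code: `tokenize` and the `HEADING`, `LIST`, and `QUOTE` regexes are simplified, hypothetical stand-ins for the real handlers and patterns.

```javascript
// Hypothetical, simplified patterns (NOT the package's HEADING_REGEX etc.).
const HEADING = /^(#{1,6})\s+(.+)$/;
const LIST = /^\s*[-*+]\s+(.+)$/;
const QUOTE = /^>\s?(.*)$/;

// First-match dispatch: each line goes to the first processor whose
// test() passes and whose process() returns a truthy value.
function tokenize(text) {
    const tokens = [];
    const processors = [
        {
            type: 'heading',
            test: line => HEADING.test(line),
            process: line => {
                const m = HEADING.exec(line);
                tokens.push({ type: 'heading', depth: m[1].length, text: m[2].trim() });
                return true;
            }
        },
        {
            type: 'list',
            test: line => LIST.test(line),
            process: line => {
                tokens.push({ type: 'list', text: LIST.exec(line)[1].trim() });
                return true;
            }
        },
        {
            type: 'blockquote',
            test: line => QUOTE.test(line),
            process: line => {
                tokens.push({ type: 'blockquote', text: QUOTE.exec(line)[1].trim() });
                return true;
            }
        },
        {
            // Catch-all: must stay last, otherwise it would shadow the others.
            type: 'para',
            test: line => line.trim() !== '',
            process: line => {
                tokens.push({ type: 'para', text: line.trim() });
                return true;
            }
        }
    ];

    for (const line of text.split('\n')) {
        for (const processor of processors) {
            if (processor.test(line) && processor.process(line)) break;
        }
    }
    return tokens;
}

console.log(tokenize('# Title\n- item\n> quote\n\nplain text'));
```

A handler only ends the scan for a line if it returns something truthy. The inline `content` processor in the hunk relies on this: `Array.prototype.push` returns the new array length, which is at least 1 after a push, so the loop breaks without an explicit `return true`.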

package/src/utils.js CHANGED

@@ -25,16 +25,31 @@ export function hash(str) {
     return Math.abs(h).toString(16).slice(0, 12);
 }
+const iriCache = new Map();
 export function expandIRI(term, ctx) {
     if (term == null) return null;
+    const cacheKey = `${term}|${ctx['@vocab'] || ''}|${Object.keys(ctx).filter(k => k !== '@vocab').sort().map(k => `${k}:${ctx[k]}`).join(',')}`;
+    if (iriCache.has(cacheKey)) {
+        return iriCache.get(cacheKey);
+    }
     const raw = typeof term === 'string' ? term : (typeof term === 'object' && typeof term.value === 'string') ? term.value : String(term);
     const t = raw.trim();
-    if (t.match(/^https?:/)) return t;
-    if (t.includes(':')) {
+    let result;
+    if (t.match(/^https?:/)) {
+        result = t;
+    } else if (t.includes(':')) {
         const [prefix, ref] = t.split(':', 2);
-        return ctx[prefix] ? ctx[prefix] + ref : t;
+        result = ctx[prefix] ? ctx[prefix] + ref : t;
+    } else {
+        result = (ctx['@vocab'] || '') + t;
     }
-    return (ctx['@vocab'] || '') + t;
+    iriCache.set(cacheKey, result);
+    return result;
 }
 export function shortenIRI(iri, ctx) {