lilmd 0.1.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/BENCHMARK.md +106 -0
- package/README.md +113 -0
- package/dist/index.cjs +364 -0
- package/dist/index.d.cts +112 -0
- package/dist/index.d.ts +112 -0
- package/dist/index.js +331 -0
- package/dist/index.js.map +13 -0
- package/dist/mdq.js +735 -0
- package/dist/mdq.js.map +16 -0
- package/package.json +42 -0
package/BENCHMARK.md
ADDED
|
@@ -0,0 +1,106 @@
|
|
|
1
|
+
# Benchmark summary
|
|
2
|
+
|
|
3
|
+
Why `mdq` uses a hand-rolled scanner instead of an off-the-shelf markdown
|
|
4
|
+
parser. Numbers captured on Bun 1.3.11 against MDN content (`mdn/content`,
|
|
5
|
+
sparse-checkout of `files/en-us/web/{javascript,css,html,api}`, concatenated
|
|
6
|
+
into fixtures of ~100 KB, ~1 MB, and ~10 MB).
|
|
7
|
+
|
|
8
|
+
## Parse speed (large, 10 MB)
|
|
9
|
+
|
|
10
|
+
| library | positions? | throughput |
|
|
11
|
+
|---|:---:|---:|
|
|
12
|
+
| **scanner (in `src/scan.ts`)** | ✅ | **~180 MB/s** |
|
|
13
|
+
| markdown-it | ✅ | ~26 MB/s |
|
|
14
|
+
| mdast-util-from-markdown | ✅ | ~1 MB/s (skipped — too slow) |
|
|
15
|
+
| marked lexer | ❌ | >90 s on a 1 MB input (unusable) |
|
|
16
|
+
| md4w (WASM) | ❌ | ~42 MB/s, errors on 10 MB JSON output |
|
|
17
|
+
|
|
18
|
+
## End-to-end `mdq read` (10 MB, find a section + slice its body)
|
|
19
|
+
|
|
20
|
+
| strategy | time |
|
|
21
|
+
|---|---:|
|
|
22
|
+
| **scanner** | **55 ms** (IO-bound) |
|
|
23
|
+
| markdown-it | 422 ms (~7.6× slower) |
|
|
24
|
+
| mdast-util-from-markdown | ~60 s (~1000× slower) |
|
|
25
|
+
|
|
26
|
+
All three strategies agree on the matched section and exact body bytes —
|
|
27
|
+
the scanner is correct, not just fast.
|
|
28
|
+
|
|
29
|
+
## CLI cold start
|
|
30
|
+
|
|
31
|
+
| framework | cold start |
|
|
32
|
+
|---|---:|
|
|
33
|
+
| `node:util.parseArgs` (built-in) | ~16 ms |
|
|
34
|
+
| cac | ~16 ms |
|
|
35
|
+
| citty | ~23 ms |
|
|
36
|
+
|
|
37
|
+
## Why the scanner wins
|
|
38
|
+
|
|
39
|
+
`mdq`'s read-path commands (`toc`, `read`, `ls`, `grep`) only need two facts
|
|
40
|
+
from the markdown:
|
|
41
|
+
|
|
42
|
+
1. ATX headings — level, text, line number
|
|
43
|
+
2. Fenced code block boundaries (so `#` inside code doesn't become a heading)
|
|
44
|
+
|
|
45
|
+
Everything else — links, emphasis, tables, footnotes, nested lists, HTML
|
|
46
|
+
blocks — is irrelevant to "list the headings" and "slice the body between
|
|
47
|
+
line N and line M". A full CommonMark parser spends 95% of its budget on
|
|
48
|
+
grammar `mdq` immediately throws away. The scanner skips all of that, runs
|
|
49
|
+
in a single pass over character codes, and is IO-bound on 10 MB of prose.
|
|
50
|
+
|
|
51
|
+
## The final stack
|
|
52
|
+
|
|
53
|
+
- **Parsing**: hand-rolled scanner in `src/scan.ts`. Zero dependencies.
|
|
54
|
+
- **CLI**: `node:util.parseArgs` + a ~20-line subcommand switch. Zero
|
|
55
|
+
dependencies.
|
|
56
|
+
- **Future write-path commands** (`set`, `insert`, `mv`, `links`, `code`)
|
|
57
|
+
may add `markdown-it` as the only runtime dep when they land — it's the
|
|
58
|
+
only position-preserving parser that scales.
|
|
59
|
+
- **Rejected**: `mdast-util-from-markdown` (25× slower than markdown-it
|
|
60
|
+
despite wrapping the same micromark tokenizer), `marked` (catastrophic
|
|
61
|
+
regex backtracking on prose), `md4w` (no source positions + JSON
|
|
62
|
+
marshaller bug at 10 MB), `citty` (~45% slower cold start than the
|
|
63
|
+
built-in for no meaningful feature we need), `cac` (same cold-start
|
|
64
|
+
class as built-in but adds a dep).
|
|
65
|
+
|
|
66
|
+
## Reproducing
|
|
67
|
+
|
|
68
|
+
The raw benchmark scripts were removed to keep the repo minimal. To rerun
|
|
69
|
+
them, check out an earlier commit on this branch (look for `dev/bench/` in
|
|
70
|
+
git history) or rewrite them against the methodology above:
|
|
71
|
+
|
|
72
|
+
- Small/medium/large fixtures built by concatenating MDN markdown files.
|
|
73
|
+
- Per-(library, fixture) wall budgets of 4–8 s with hard iteration caps, so
|
|
74
|
+
a pathological parser (we're looking at you, marked) can't hang the run.
|
|
75
|
+
- Trimmed mean of the fastest 50% of iterations per combo.
|
|
76
|
+
- Full-throughput results written incrementally so a timeout still yields
|
|
77
|
+
partial data.
|
|
78
|
+
|
|
79
|
+
## Integration-test fixture
|
|
80
|
+
|
|
81
|
+
`src/__fixtures__/mdn-array.md` is a tiny (~42 KB, 1,298 lines, 112
|
|
82
|
+
headings) fixture committed into the repo and exercised by
|
|
83
|
+
`src/integration.test.ts`. It's a concatenation of 8 MDN
|
|
84
|
+
`Array.prototype.*` reference pages hoisted under synthetic H1 wrappers.
|
|
85
|
+
Small enough to commit, big enough to catch regressions the synthetic unit
|
|
86
|
+
fixtures can miss (Kuma macros, JSX-flavored HTML, tables, real fenced
|
|
87
|
+
code, nested lists).
|
|
88
|
+
|
|
89
|
+
Licensed CC BY-SA 2.5, © Mozilla Contributors. Regenerate with:
|
|
90
|
+
|
|
91
|
+
```bash
|
|
92
|
+
# 1. Sparse-clone the MDN Array docs (no blobs, no tree outside array/)
|
|
93
|
+
git clone --depth 1 --filter=blob:none --sparse \
|
|
94
|
+
https://github.com/mdn/content.git /tmp/mdn
|
|
95
|
+
cd /tmp/mdn
|
|
96
|
+
git sparse-checkout set \
|
|
97
|
+
files/en-us/web/javascript/reference/global_objects/array
|
|
98
|
+
|
|
99
|
+
# 2. Concatenate 8 method pages under synthetic H1s
|
|
100
|
+
# (see the fixture's own header comment for the exact list)
|
|
101
|
+
# 3. Prepend the attribution block from the existing fixture header
|
|
102
|
+
```
|
|
103
|
+
|
|
104
|
+
If you change the file list or the synthetic wrappers, update the relevant
|
|
105
|
+
assertions in `src/integration.test.ts` — a couple of them pin exact counts
|
|
106
|
+
("8 matches, showing first 3") that are tied to the 8-page choice.
|
package/README.md
ADDED
|
@@ -0,0 +1,113 @@
|
|
|
1
|
+
## `mdq` - Markdown as a Database for Agents
|
|
2
|
+
`mdq` is an agent-oriented CLI for working with large Markdown files.
|
|
3
|
+
|
|
4
|
+
*Wait, but why?* Agent knowledge, docs, and memory keep growing.
|
|
5
|
+
MDQ allows you to dump it all in one file and efficiently read/write/navigate its contents.
|
|
6
|
+
|
|
7
|
+
Features:
|
|
8
|
+
- fast navigation, complex read selectors, link extraction
|
|
9
|
+
- complex section selectors
|
|
10
|
+
- designed to save as much context as possible
|
|
11
|
+
- can write, append, remove entire sections
|
|
12
|
+
- can run in Node/Bun
|
|
13
|
+
- optimized for speed
|
|
14
|
+
- can be used by humans and **agents**
|
|
15
|
+
- uses Bun as tooling: to test, control deps etc.
|
|
16
|
+
|
|
17
|
+
### Help
|
|
18
|
+
|
|
19
|
+
```bash
|
|
20
|
+
# start here!
|
|
21
|
+
# both commands print short documentation for the agent
|
|
22
|
+
> mdq
|
|
23
|
+
> mdq --help
|
|
24
|
+
```
|
|
25
|
+
|
|
26
|
+
### Overview & table of contents
|
|
27
|
+
|
|
28
|
+
First, the agent gets file overview and table of contents.
|
|
29
|
+
|
|
30
|
+
```bash
|
|
31
|
+
# renders toc + stats; line ranges are inclusive, 1-indexed
|
|
32
|
+
# --depth=N to limit nesting, --flat for a flat list
|
|
33
|
+
> mdq file.md
|
|
34
|
+
|
|
35
|
+
file.md L1-450 12 headings
|
|
36
|
+
# MDQ L1-450
|
|
37
|
+
## Getting Started L5-80
|
|
38
|
+
### Installation L31-80
|
|
39
|
+
## Community L301-450
|
|
40
|
+
```
|
|
41
|
+
|
|
42
|
+
### Reading sections
|
|
43
|
+
|
|
44
|
+
```bash
|
|
45
|
+
> mdq read file.md "# MDQ"
|
|
46
|
+
> mdq file.md "# MDQ" # alias!
|
|
47
|
+
# prints the contents of the MDQ section
|
|
48
|
+
|
|
49
|
+
# descendant selector (any depth under the parent)
|
|
50
|
+
> mdq file.md "MDQ > Installation"
|
|
51
|
+
|
|
52
|
+
# direct child only
|
|
53
|
+
> mdq file.md "MDQ >> Installation"
|
|
54
|
+
|
|
55
|
+
# level filter (H2 only)
|
|
56
|
+
> mdq file.md "##Installation"
|
|
57
|
+
|
|
58
|
+
# exact match (default is fuzzy, case-insensitive)
|
|
59
|
+
> mdq file.md "=Installation"
|
|
60
|
+
|
|
61
|
+
# regex
|
|
62
|
+
> mdq file.md "/install(ation)?/"
|
|
63
|
+
|
|
64
|
+
# by default no more than 25 matches are printed; if more, mdq prints a hint
|
|
65
|
+
# about --max-results=N
|
|
66
|
+
# --max-lines=N truncates long bodies (shows "… N more lines")
|
|
67
|
+
# --body-only skips subsections, --no-body prints headings only
|
|
68
|
+
```
|
|
69
|
+
|
|
70
|
+
### For humans only
|
|
71
|
+
|
|
72
|
+
```bash
|
|
73
|
+
# --pretty renders the section body as syntax-highlighted terminal markdown
|
|
74
|
+
# (for humans; piped output stays plain unless FORCE_COLOR is set)
|
|
75
|
+
> mdq file.md --pretty "Installation"
|
|
76
|
+
|
|
77
|
+
# nicely formatted markdown
|
|
78
|
+
```
|
|
79
|
+
|
|
80
|
+
### Searching & extracting
|
|
81
|
+
|
|
82
|
+
```bash
|
|
83
|
+
> mdq ls file.md "Getting Started" # direct children of a section
|
|
84
|
+
> mdq grep file.md "pattern" # regex search, grouped by section
|
|
85
|
+
> mdq links file.md ["selector"] # extract links with section path
|
|
86
|
+
> mdq code file.md "Install" [--lang=ts] # extract code blocks
|
|
87
|
+
```
|
|
88
|
+
|
|
89
|
+
### Writing
|
|
90
|
+
|
|
91
|
+
`mdq` treats sections as addressable records: you can replace, append,
|
|
92
|
+
insert, move, or rename them without rewriting the whole file. Every write
|
|
93
|
+
supports `--dry-run`, which prints a unified diff instead of touching disk —
|
|
94
|
+
perfect for agent-authored edits that a human (or another agent) reviews
|
|
95
|
+
before applying.
|
|
96
|
+
|
|
97
|
+
```bash
|
|
98
|
+
> mdq set file.md "Install" < body.md # replace section body
|
|
99
|
+
> mdq append file.md "Install" < body.md
|
|
100
|
+
> mdq insert file.md --after "Install" < new.md
|
|
101
|
+
> mdq rm file.md "Old"
|
|
102
|
+
> mdq mv file.md "From" "To" # re-parent, fixes heading levels
|
|
103
|
+
> mdq rename file.md "Old" "New"
|
|
104
|
+
> mdq promote|demote file.md "Section" # shift heading level ±1
|
|
105
|
+
```
|
|
106
|
+
|
|
107
|
+
### Output
|
|
108
|
+
|
|
109
|
+
```bash
|
|
110
|
+
# human-readable by default; --json for machine output
|
|
111
|
+
# use - as filename to read from stdin
|
|
112
|
+
> cat big.md | mdq - "Install"
|
|
113
|
+
```
|
package/dist/index.cjs
ADDED
|
@@ -0,0 +1,364 @@
|
|
|
1
|
+
// Bundler-generated CommonJS interop shim (Bun output).
// __toCommonJS wraps an ESM-shaped namespace object in a cached CJS
// `exports` object whose properties are live getters; __export installs
// lazy getter/setter pairs for each named export.
var import_node_module = require("node:module");
var __defProp = Object.defineProperty;
var __getOwnPropNames = Object.getOwnPropertyNames;
var __getOwnPropDesc = Object.getOwnPropertyDescriptor;
var __hasOwnProp = Object.prototype.hasOwnProperty;
var __moduleCache = /* @__PURE__ */ new WeakMap();
var __toCommonJS = (from) => {
  var cached = __moduleCache.get(from), desc;
  if (cached)
    return cached;
  cached = __defProp({}, "__esModule", { value: true });
  if (from && typeof from === "object" || typeof from === "function") {
    for (const key of __getOwnPropNames(from)) {
      if (!__hasOwnProp.call(cached, key)) {
        __defProp(cached, key, {
          get: () => from[key],
          enumerable: !(desc = __getOwnPropDesc(from, key)) || desc.enumerable
        });
      }
    }
  }
  __moduleCache.set(from, cached);
  return cached;
};
var __export = (target, all) => {
  for (var name in all) {
    __defProp(target, name, {
      get: all[name],
      enumerable: true,
      configurable: true,
      set: (newValue) => all[name] = () => newValue
    });
  }
};

// src/index.ts — public export surface of the package.
var exports_src = {};
__export(exports_src, {
  truncateBody: () => truncateBody,
  scan: () => scan,
  renderToc: () => renderToc,
  renderSection: () => renderSection,
  pathOf: () => pathOf,
  parseSelector: () => parseSelector,
  match: () => match,
  countLines: () => countLines,
  buildSections: () => buildSections
});
module.exports = __toCommonJS(exports_src);
|
|
44
|
+
|
|
45
|
+
// src/scan.ts
|
|
46
|
+
/**
 * Return every ATX heading in `src`, in document order, as
 * `{ level, title, line }` objects (line is 1-indexed).
 *
 * Single pass over character codes; lines inside fenced code blocks are
 * skipped so `#` inside code never becomes a heading. CRLF line endings
 * are tolerated by stripping a trailing CR from each line.
 *
 * Fix over the previous revision: a leading byte-order mark (U+FEFF) is
 * now skipped, so a heading on the very first line of a BOM-prefixed file
 * is recognized (previously `\uFEFF# Title` produced no headings).
 */
function scan(src) {
  const out = [];
  const len = src.length;
  // Skip a UTF-8 BOM so it can't hide a heading on line 1.
  let i = src.charCodeAt(0) === 65279 ? 1 : 0;
  let lineNo = 0;
  let inFence = false;
  let fenceChar = 0;
  let fenceLen = 0;
  while (i <= len) {
    const start = i;
    while (i < len && src.charCodeAt(i) !== 10)
      i++;
    let line = src.slice(start, i);
    if (line.length > 0 && line.charCodeAt(line.length - 1) === 13) {
      line = line.slice(0, line.length - 1); // drop the CR of a CRLF ending
    }
    lineNo++;
    const fence = matchFence(line);
    if (fence) {
      if (!inFence) {
        inFence = true;
        fenceChar = fence.char;
        fenceLen = fence.len;
      } else if (fence.char === fenceChar && fence.len >= fenceLen) {
        // A closing fence must use the same character and be at least as long.
        inFence = false;
      }
      // A fence-looking line of the other character inside an open fence is
      // ordinary content and toggles nothing.
    } else if (!inFence) {
      const h = matchHeading(line, lineNo);
      if (h)
        out.push(h);
    }
    if (i >= len)
      break;
    i++; // step over the newline
  }
  return out;
}
/**
 * Recognize a fence line: up to 3 leading spaces, then a run of 3 or more
 * backticks (96) or tildes (126). Text after the run (an info string) is
 * ignored. Returns `{ char, len }` or null.
 */
function matchFence(line) {
  let p = 0;
  while (p < 3 && line.charCodeAt(p) === 32)
    p++;
  const ch = line.charCodeAt(p);
  if (ch !== 96 && ch !== 126)
    return null;
  let run = 0;
  while (line.charCodeAt(p + run) === ch)
    run++;
  if (run < 3)
    return null;
  return { char: ch, len: run };
}
/**
 * Recognize an ATX heading: up to 3 leading spaces, 1-6 `#`, then a space,
 * tab, or end of line (so `#foo` is not a heading, nor is a 7+ hash run).
 * Trailing closing hashes (`## title ##`) and surrounding whitespace are
 * stripped from the title. Returns `{ level, title, line }` or null.
 */
function matchHeading(line, lineNo) {
  let p = 0;
  while (p < 3 && line.charCodeAt(p) === 32)
    p++;
  if (line.charCodeAt(p) !== 35)
    return null;
  let hashes = 0;
  while (line.charCodeAt(p + hashes) === 35)
    hashes++;
  if (hashes < 1 || hashes > 6)
    return null;
  const after = p + hashes;
  const afterCh = line.charCodeAt(after);
  // The hash run must be followed by whitespace or end-of-line.
  if (after < line.length && afterCh !== 32 && afterCh !== 9) {
    return null;
  }
  let contentStart = after;
  while (contentStart < line.length && (line.charCodeAt(contentStart) === 32 || line.charCodeAt(contentStart) === 9)) {
    contentStart++;
  }
  let end = line.length;
  while (end > contentStart && (line.charCodeAt(end - 1) === 32 || line.charCodeAt(end - 1) === 9)) {
    end--;
  }
  // Strip an optional closing hash run, but only when it is preceded by
  // whitespace (or the title is empty): `## x ##` → `x`, while `## x#` → `x#`.
  let closing = end;
  while (closing > contentStart && line.charCodeAt(closing - 1) === 35)
    closing--;
  if (closing < end && (closing === contentStart || line.charCodeAt(closing - 1) === 32 || line.charCodeAt(closing - 1) === 9)) {
    end = closing;
    while (end > contentStart && (line.charCodeAt(end - 1) === 32 || line.charCodeAt(end - 1) === 9)) {
      end--;
    }
  }
  const title = line.slice(contentStart, end);
  return { level: hashes, title, line: lineNo };
}
|
|
133
|
+
// src/sections.ts
|
|
134
|
+
// Build the section tree from a flat heading list in one pass, preserving
// document order. Each new heading closes every open section at the same or
// deeper level (their line_end becomes the line just above the heading);
// sections still open at the end keep the provisional line_end = totalLines.
function buildSections(headings, totalLines) {
  const sections = [];
  const open = [];
  for (const h of headings) {
    // Pop every open section this heading terminates.
    while (open.length && open[open.length - 1].level >= h.level) {
      open.pop().line_end = h.line - 1;
    }
    const section = {
      level: h.level,
      title: h.title,
      line_start: h.line,
      line_end: totalLines,
      parent: open.length ? open[open.length - 1] : null
    };
    sections.push(section);
    open.push(section);
  }
  return sections;
}
|
|
155
|
+
// Collect the titles of `sec`'s ancestors from the root down.
// Returns [] for a top-level section.
function pathOf(sec) {
  const titles = [];
  for (let node = sec.parent; node; node = node.parent) {
    titles.push(node.title);
  }
  return titles.reverse();
}
|
|
164
|
+
// Count logical lines: 0 for the empty string; otherwise one per line,
// where a single trailing newline does not add a phantom empty line.
function countLines(src) {
  if (!src.length)
    return 0;
  let count = 1;
  for (let idx = 0; idx < src.length; idx++) {
    if (src.charCodeAt(idx) === 10)
      count++;
  }
  return src.charCodeAt(src.length - 1) === 10 ? count - 1 : count;
}
|
|
176
|
+
// src/select.ts
|
|
177
|
+
// Split a selector string into segments plus the operators joining them:
// "A > B" is a descendant join, "A >> B" a direct-child join. A '/' opens
// a regex literal when it is the first non-space character of a segment,
// and '>' inside a regex is not treated as an operator.
function parseSelector(input) {
  const text = input.trim();
  if (!text)
    return [];
  const parts = [];
  const joiners = ["descendant"]; // joiner before the first segment (unused)
  let buffer = "";
  let insideRegex = false;
  let seenNonSpace = false;
  let pos = 0;
  while (pos < text.length) {
    const c = text[pos];
    if (c === "/" && (insideRegex || !seenNonSpace)) {
      insideRegex = !insideRegex;
      buffer += c;
      seenNonSpace = true;
      pos++;
      continue;
    }
    if (c === ">" && !insideRegex) {
      parts.push(buffer.trim());
      buffer = "";
      seenNonSpace = false;
      const isChild = text[pos + 1] === ">";
      joiners.push(isChild ? "child" : "descendant");
      pos += isChild ? 2 : 1;
      continue;
    }
    buffer += c;
    if (c !== " " && c !== "\t")
      seenNonSpace = true;
    pos++;
  }
  parts.push(buffer.trim());
  return parts.map((part, idx) => parseSegment(part, joiners[idx] ?? "descendant"));
}
// Parse one segment: an optional "#"-run level filter, then either a
// /regex/flags literal (flags default to "i"), an "="-prefixed exact
// title, or a fuzzy substring (the default).
function parseSegment(raw, op) {
  let rest = raw;
  let level = null;
  const hashPrefix = /^(#{1,6})(?!#)\s*(.*)$/.exec(rest);
  if (hashPrefix) {
    level = hashPrefix[1].length;
    rest = hashPrefix[2] ?? "";
  }
  const regexForm = /^\/(.+)\/([gimsuy]*)$/.exec(rest);
  if (regexForm) {
    return {
      op,
      level,
      kind: "regex",
      value: regexForm[1],
      regex: new RegExp(regexForm[1], regexForm[2] || "i")
    };
  }
  if (rest.startsWith("=")) {
    return { op, level, kind: "exact", value: rest.slice(1).trim() };
  }
  return { op, level, kind: "fuzzy", value: rest.trim() };
}
|
|
242
|
+
// Return every section matched by the parsed selector, in document order.
// An empty selector matches nothing.
function match(sections, selector) {
  if (!selector.length)
    return [];
  return sections.filter((sec) => matches(sec, selector));
}
// True when `sec` satisfies the whole segment chain. The final segment must
// match `sec` itself; earlier segments are matched against ancestors while
// walking upward. For a "descendant" join we take the nearest matching
// ancestor — every candidate lies on the same ancestor chain, so the
// greedy choice is safe.
function matches(sec, segs) {
  const leaf = segs[segs.length - 1];
  if (!leaf || !segmentMatchesSection(leaf, sec))
    return false;
  let node = sec.parent;
  for (let i = segs.length - 2; i >= 0; i--) {
    const joiner = segs[i + 1].op; // operator linking segs[i] to segs[i+1]
    const seg = segs[i];
    if (joiner === "child") {
      if (!node || !segmentMatchesSection(seg, node))
        return false;
      node = node.parent;
    } else {
      while (node && !segmentMatchesSection(seg, node))
        node = node.parent;
      if (!node)
        return false;
      node = node.parent;
    }
  }
  return true;
}
// Does a single segment accept this section? The level filter applies
// first, then the title test appropriate to the segment kind.
function segmentMatchesSection(seg, sec) {
  if (seg.level !== null && seg.level !== sec.level)
    return false;
  const title = sec.title;
  if (seg.kind === "exact")
    return title.toLowerCase() === seg.value.toLowerCase();
  if (seg.kind === "regex")
    return seg.regex.test(title);
  return title.toLowerCase().includes(seg.value.toLowerCase());
}
|
|
293
|
+
// src/render.ts
|
|
294
|
+
// Render the table of contents: a header line with the file name, its full
// line range, and the total heading count, then one line per section.
// opts.depth hides sections deeper than N; opts.flat drops indentation.
function renderToc(file, src, sections, opts) {
  const totalLines = countLines(src);
  const count = sections.length;
  const rangeLabel = totalLines === 0 ? "L0" : `L1-${totalLines}`;
  const noun = count === 1 ? "heading" : "headings";
  const lines = [`${file} ${rangeLabel} ${count} ${noun}`];
  for (const sec of sections) {
    if (opts.depth != null && sec.level > opts.depth)
      continue;
    const pad = opts.flat ? "" : " ".repeat(Math.max(0, sec.level - 1));
    lines.push(`${pad}${"#".repeat(sec.level)} ${sec.title} L${sec.line_start}-${sec.line_end}`);
  }
  return lines.join("\n");
}
|
|
312
|
+
// Render one matched section from pre-split source lines.
//   bodyOnly  — stop before the first child section (needs opts.allSections)
//   noBody    — heading line only
//   maxLines  — truncate the body via truncateBody
//   pretty    — optional markdown→ANSI formatter applied to the body
//   raw       — return the body without the header/footer frame
function renderSection(file, srcLines, sec, opts) {
  const first = sec.line_start;
  let last = sec.line_end;
  if (opts.bodyOnly && opts.allSections) {
    const child = findFirstChild(sec, opts.allSections);
    if (child)
      last = child.line_start - 1;
  }
  if (opts.noBody) {
    last = first;
  }
  let body = srcLines.slice(first - 1, Math.min(last, srcLines.length)).join("\n");
  if (opts.maxLines != null && opts.maxLines > 0) {
    body = truncateBody(body, opts.maxLines);
  }
  if (opts.pretty) {
    body = opts.pretty(body);
  }
  if (opts.raw)
    return body;
  const marks = "#".repeat(sec.level);
  const header = `── ${file} L${first}-${last} ${marks} ${sec.title} ${"─".repeat(8)}`;
  const footer = `── end ${"─".repeat(40)}`;
  return `${header}\n${body}\n${footer}`;
}
// Cut `body` to its first `maxLines` lines; when anything was dropped,
// append a hint line. maxLines <= 0 disables truncation.
function truncateBody(body, maxLines) {
  if (maxLines <= 0)
    return body;
  const allLines = body.split("\n");
  if (allLines.length <= maxLines)
    return body;
  const head = allLines.slice(0, maxLines).join("\n");
  const dropped = allLines.length - maxLines;
  return `${head}\n\n… ${dropped} more lines (use --max-lines=0 for full)`;
}
// First section (in document order) whose parent is `sec`, or null.
function findFirstChild(sec, all) {
  for (const candidate of all) {
    if (candidate.parent === sec)
      return candidate;
  }
  return null;
}
|
|
362
|
+
|
|
363
|
+
//# debugId=F78549B744E4995264756E2164756E21
|
|
364
|
+
//# sourceMappingURL=index.js.map
|
package/dist/index.d.cts
ADDED
|
@@ -0,0 +1,112 @@
|
|
|
1
|
+
/**
 * Markdown heading scanner — the engine behind every read-path command.
 *
 * Instead of building a full CommonMark AST we walk the source line by line
 * and recognize only what `mdq` actually needs: ATX headings and fenced code
 * blocks (so `#` inside code doesn't count as a heading).
 *
 * Numbers on MDN content (see BENCHMARK.md): ~180 MB/s end-to-end on a
 * 10 MB fixture, roughly 7x faster than markdown-it and ~1000x faster than
 * mdast-util-from-markdown while returning the exact same section.
 *
 * Deliberate limitations:
 * - Setext headings (`===` / `---` underlines) are NOT recognized. mdq is
 *   aimed at agent-authored markdown where ATX is ubiquitous.
 * - HTML blocks are not detected. A `<pre>` containing an ATX-looking line
 *   would be misread as a heading. That's an acceptable tradeoff for 100x
 *   speed; a future `--strict` flag could hand off to markdown-it.
 * - Fenced code blocks *inside a list item* that are indented 4+ spaces are
 *   not recognized as fences — we only look at the first 3 columns for the
 *   fence opener. A `# fake` line inside such a block would be scanned as a
 *   heading. Rare in practice; document-your-way-out rather than fix.
 * - An unclosed fence at EOF leaves the scanner in "still in fence" state
 *   to the end of the file, so any `#`-looking lines after it are ignored.
 *   That's the conservative choice — prefer under-counting to over-counting.
 */
type Heading = {
  /** 1..6 */
  level: number;
  /** Heading text with trailing closing hashes stripped. */
  title: string;
  /** 1-indexed line number. */
  line: number;
};
/**
 * Return every ATX heading in `src`, in document order.
 * Runs in a single pass; O(n) in source length, O(headings) in space.
 */
declare function scan(src: string): Heading[];
type Section = {
  /** Heading depth, 1..6 (same meaning as Heading.level). */
  level: number;
  /** Heading text (same normalization as Heading.title). */
  title: string;
  /** 1-indexed line of the heading itself. */
  line_start: number;
  /** 1-indexed inclusive end of the subtree. */
  line_end: number;
  /** Nearest enclosing section, or null for top-level. */
  parent: Section | null;
};
/**
 * Build the section tree in a single pass. Preserves document order.
 *
 * Runs in O(n): every section is pushed once and popped once, and we set
 * its `line_end` at pop time. Sections still on the stack when we run out
 * of headings keep their provisional `line_end = totalLines`.
 */
declare function buildSections(headings: Heading[], totalLines: number): Section[];
/**
 * Walk `sec` up to the root, collecting ancestor titles in top-down order.
 * Returns [] for a root section.
 */
declare function pathOf(sec: Section): string[];
/**
 * Count lines in a source string. Empty string is 0; otherwise every line
 * (including the last one, whether or not it ends with a newline) is 1.
 * A trailing newline does NOT add a phantom line.
 */
declare function countLines(src: string): number;
/** How a segment joins the previous one: "descendant" (`>`, any depth)
 * or "child" (`>>`, direct parent required). */
type Op = "descendant" | "child";
/** Title-matching mode: fuzzy substring (default, case-insensitive),
 * `=`-prefixed exact (case-insensitive), or `/…/` regex. */
type Kind = "fuzzy" | "exact" | "regex";
type Segment = {
  /** Operator that connects this segment to the *previous* one.
   * For the first segment this is always "descendant" (unused). */
  op: Op;
  /** Optional 1..6 level filter. */
  level: number | null;
  /** How `value` is compared against section titles. */
  kind: Kind;
  /** The raw value (without level/kind prefix). */
  value: string;
  /** Present only for kind === "regex". */
  regex?: RegExp;
};
/** Parse a selector string (e.g. `"MDQ >> ##Install"`) into segments.
 * Returns [] for blank input. */
declare function parseSelector(input: string): Segment[];
/** Return the sections matched by `selector`, in document order.
 * An empty selector matches nothing. */
declare function match(sections: Section[], selector: Segment[]): Section[];
/**
 * Pretty printing for `mdq read --pretty`. Lazy-loads marked +
 * marked-terminal on first use so the default (plain-text) path keeps its
 * ~16ms cold start.
 */
type PrettyFormatter = (markdown: string) => string;
type TocOptions = {
  /** Hide sections deeper than this level. */
  depth?: number;
  /** Render a flat (unindented) list. */
  flat?: boolean;
};
/** Render the table of contents: a header line (file, line range, heading
 * count) followed by one line per visible section. */
declare function renderToc(file: string, src: string, sections: Section[], opts: TocOptions): string;
type SectionOptions = {
  /** Stop the body before the first child section. */
  bodyOnly?: boolean;
  /** Print the heading line only. */
  noBody?: boolean;
  /** Return the bare body, without the header/footer frame. */
  raw?: boolean;
  /** Truncate the body after this many lines (see truncateBody). */
  maxLines?: number;
  /** Required when bodyOnly is true so we can find the first child. */
  allSections?: Section[];
  /** Optional markdown→ANSI formatter applied to the body before delimiters. */
  pretty?: PrettyFormatter;
};
/** Render one matched section from pre-split source lines. */
declare function renderSection(file: string, srcLines: string[], sec: Section, opts: SectionOptions): string;
/**
 * Cut `body` to the first `maxLines` lines. If anything was dropped, append
 * a marker line telling the agent how to get the rest. `maxLines <= 0`
 * disables truncation.
 */
declare function truncateBody(body: string, maxLines: number): string;
export { truncateBody, scan, renderToc, renderSection, pathOf, parseSelector, match, countLines, buildSections, TocOptions, Segment, SectionOptions, Section, Op, Kind, Heading };
|