npm - @matrica-code/snippet-extractor - Versions diffs - 1.0.0 - Mend

@matrica-code/snippet-extractor 1.0.0

Files changed (3)

package/README.md ADDED

@@ -0,0 +1,169 @@
+# snippet-extractor
+The **producer** side of snippet-viewer. It harvests `// extract-code <name>`
+markers out of source files into a `snippets.json` keyed `name@filename.ext` —
+exactly the format the [`<snippet-viewer>`](../README.md) web component renders.
+Parses with [Tree-sitter](https://tree-sitter.github.io/), so it works across
+**JS / TS / TSX and Java** (more grammars can be registered — see below).
+## Markers
+```ts
+// extract-code <name>     // export the node that follows, keyed as <name>@<file>
+// extract-code ignore     // strip the following node from any snippet it lives in
+```
+If the marked node is an `import`, the **whole file** is emitted (`ignore` markers are not applied in this mode).
+## Ways to run it
+Pick the channel that fits the consumer:
+| Consumer                          | Channel             |
+| --------------------------------- | ------------------- |
+| JS/TS repo (has Node)             | **npm / npx**       |
+| Java or any repo (no Node)        | **GitHub Action**   |
+| Any CI or local, pinned toolchain | **container image** |
+### As a GitHub Action
+The repo ships a composite action that runs the published image, so it works in
+**any** repo — Node, Java/Gradle, Maven, whatever — with no local toolchain.
+Add one step:
+```yaml
+# .github/workflows/snippets.yml
+name: snippets
+on:
+  push:
+    branches: [main]
+jobs:
+  extract:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - name: Extract snippets
+        uses: matrica-code/snippet-viewer/extractor@main # pin to extractor-vX.Y.Z for reproducibility
+        with:
+          snippet-file: snippets.json # output path, relative to repo root
+          paths: src # space-separated dirs/files to scan
+          # reset: "true"            # start from {} (default); set "false" to merge
+          # image-tag: "1.0.0"       # which ghcr.io image tag to run (default: latest)
+      # then commit the file, or upload it, or deploy it to your snippet-viewer host
+      - uses: actions/upload-artifact@v4
+        with:
+          name: snippets
+          path: snippets.json
+```
+A **Java repo** is identical — just point `paths` at the sources:
+```yaml
+- uses: matrica-code/snippet-viewer/extractor@main
+  with:
+    snippet-file: docs/snippets.json
+    paths: src/main/java
+```
+The action requires a Linux runner with Docker available (GitHub-hosted
+`ubuntu-latest` has it). It pulls `ghcr.io/matrica-code/snippet-extractor` and
+runs it over your checked-out workspace.
+### Via npm / npx (Node repos)
+No install step needed — run the published package directly:
+```bash
+npx @matrica-code/snippet-extractor --reset --snippetFile=snippets.json src
+```
+Or add it as a dev dependency and wire a script:
+```jsonc
+// package.json
+{
+  "scripts": {
+    "snippets": "extract-snippets --reset --snippetFile=public/snippets.json src"
+  },
+  "devDependencies": {
+    "@matrica-code/snippet-extractor": "^1.0.0"
+  }
+}
+```
+(The Tree-sitter grammars are native addons; npm fetches prebuilt binaries for
+common platforms, falling back to a local compile if none match.)
+### With a container (Docker or Podman)
+OCI-standard image, so `podman` and `docker` are interchangeable. All paths are
+relative to `/work`, the mount point for the repo you're scanning.
+```bash
+# pull the published image...
+podman run --rm -v "$PWD":/work \
+  ghcr.io/matrica-code/snippet-extractor:latest \
+  --reset --snippetFile=/work/snippets.json /work/src
+# ...or build it locally from this directory
+podman build -t snippet-extractor extractor        # or: docker build ...
+```
+On macOS, Podman is daemonless and avoids the Docker Desktop GUI/login gate:
+```bash
+podman machine init    # one-time, if you have no machine yet
+podman machine start
+```
+If the CLI isn't on `PATH`, the macOS installer puts it at `/opt/podman/bin/podman`.
+### Locally from source (Node)
+```bash
+cd extractor && npm install
+node extractSnippets.mjs --reset --snippetFile=../example/snippets.json [<dir|file> ...]
+```
+`--reset` starts from `{}`; omit it to merge into an existing file.
+## Publishing
+Both artifacts publish from a single version tag. Bump `version` in
+`extractor/package.json`, then:
+```bash
+git tag extractor-v1.0.0
+git push origin extractor-v1.0.0
+```
+That fires two workflows:
+- `.github/workflows/extractor-npm.yml` → publishes `@matrica-code/snippet-extractor`
+  to npm (needs an `NPM_TOKEN` repo secret).
+- `.github/workflows/extractor-image.yml` → builds the multi-arch image and pushes
+  `ghcr.io/matrica-code/snippet-extractor:1.0.0` + `:latest` (uses the built-in
+  `GITHUB_TOKEN`).
+## CLI
+```
+extractSnippets.mjs --snippetFile=<path> [--reset] <dir-or-file> [<dir-or-file> ...]
+```
+| Flag                   | Meaning                                                                   |
+| ---------------------- | ------------------------------------------------------------------------- |
+| `--snippetFile=<path>` | Output JSON file (merged into unless `--reset`). Required.                |
+| `--reset`              | Start from `{}` instead of merging.                                       |
+| `<dir-or-file> ...`    | Roots to scan recursively (`node_modules`, `dist`, `.git`, etc. skipped). |
+## Adding a language
+Register an entry in `LANGUAGES` in `extractSnippets.mjs`, keyed by file extension:
+a `load()` returning the grammar, the grammar's comment node types (`comments`),
+and the node types that trigger whole-file mode (`wholeFile`). (Prism in the viewer
+already highlights many languages; extraction just needs the matching Tree-sitter grammar as a dep.)
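
As a rough illustration of the "Adding a language" section of the README, a Python entry might look like the sketch below. It is not part of the published package. It assumes the `tree-sitter-python` grammar is installed as a dependency and that its relevant node types are `comment`, `import_statement`, and `import_from_statement`. `PYTHON_ENTRY` and `withLanguage` are hypothetical names used only for illustration.

```javascript
// Hypothetical registry entry for Python, mirroring the shape of the existing
// LANGUAGES entries: a lazy load(), comment node types, whole-file triggers.
const PYTHON_ENTRY = {
  // Assumption: tree-sitter-python is installed, and `require` is the
  // createRequire-based function that extractSnippets.mjs defines.
  // load() only runs when a .py file is actually parsed.
  load: () => require("tree-sitter-python"),
  comments: ["comment"],
  wholeFile: ["import_statement", "import_from_statement"],
};

// Returns a copy of a registry with one more extension registered.
function withLanguage(registry, ext, entry) {
  if (!ext.startsWith(".")) throw new Error(`extension must start with ".": ${ext}`);
  return { ...registry, [ext]: entry };
}

const registry = withLanguage({}, ".py", PYTHON_ENTRY);
console.log(Object.keys(registry)); // [ '.py' ]
```

One caveat with this sketch: the marker-stripping regexes in `extractFromFile` only match `//` comments, so in whole-file mode a `# extract-code <name>` line would likely remain in the emitted snippet unless that regex is extended as well.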

package/extractSnippets.mjs ADDED

@@ -0,0 +1,280 @@
+#!/usr/bin/env node
+// Portable, multi-language snippet extractor.
+//
+// Drop-in replacement for the jscodeshift-based extractSnippets.mjs that works
+// across JS/TS/TSX *and* Java (and any other Tree-sitter grammar you register).
+//
+// Usage:
+//   node extractSnippets.mjs \
+//     --snippetFile=path/to/snippets.json [--reset] <dir-or-file> [<dir-or-file> ...]
+//
+// Marker conventions (identical to the original):
+//   // extract-code <name>        -> export the node that follows this comment
+//   // extract-code ignore        -> strip the following node from any snippet it lives in
+//
+// Behaviors preserved from the jscodeshift version:
+//   * snippet keys are `<name>@<basename>` so the same name in two files is safe
+//   * if the marked node is an import (whole-file trigger), the ENTIRE file is emitted
+//   * the existing snippet file is merged into, not overwritten (unless --reset is passed)
+//   * whole-file snippets do NOT honor `ignore` (only node-mode snippets do) — faithful
+//     to the original, where ignore removal only affected AST-serialized nodes.
+import fs from "fs";
+import path from "path";
+import { createRequire } from "module";
+const require = createRequire(import.meta.url);
+// ---------------------------------------------------------------------------
+// Language registry. Grammars are loaded lazily so a project that never touches
+// Java doesn't need tree-sitter-java installed, and vice versa.
+// ---------------------------------------------------------------------------
+const LANGUAGES = {
+  ".ts": { load: () => require("tree-sitter-typescript").typescript, comments: ["comment"], wholeFile: ["import_statement"] },
+  ".tsx": { load: () => require("tree-sitter-typescript").tsx, comments: ["comment"], wholeFile: ["import_statement"] },
+  ".mts": { load: () => require("tree-sitter-typescript").typescript, comments: ["comment"], wholeFile: ["import_statement"] },
+  ".cts": { load: () => require("tree-sitter-typescript").typescript, comments: ["comment"], wholeFile: ["import_statement"] },
+  ".js": { load: () => require("tree-sitter-javascript"), comments: ["comment"], wholeFile: ["import_statement"] },
+  ".jsx": { load: () => require("tree-sitter-javascript"), comments: ["comment"], wholeFile: ["import_statement"] },
+  ".mjs": { load: () => require("tree-sitter-javascript"), comments: ["comment"], wholeFile: ["import_statement"] },
+  ".java": { load: () => require("tree-sitter-java"), comments: ["line_comment", "block_comment"], wholeFile: ["import_declaration"] },
+};
+// Cache one Parser per extension so we don't reload grammars per file.
+const parserCache = new Map();
+function getParser(ext) {
+  if (parserCache.has(ext)) return parserCache.get(ext);
+  const cfg = LANGUAGES[ext];
+  if (!cfg) return null;
+  const Parser = require("tree-sitter");
+  const parser = new Parser();
+  parser.setLanguage(cfg.load());
+  const entry = { parser, cfg };
+  parserCache.set(ext, entry);
+  return entry;
+}
+// ---------------------------------------------------------------------------
+// CLI args
+// ---------------------------------------------------------------------------
+function parseArgs(argv) {
+  let snippetFile = null;
+  let reset = false;
+  const paths = [];
+  for (const arg of argv) {
+    if (arg.startsWith("--snippetFile=")) snippetFile = arg.slice("--snippetFile=".length);
+    else if (arg === "--reset") reset = true;
+    else if (arg.startsWith("--")) continue; // ignore unknown flags (e.g. legacy jscodeshift flags)
+    else paths.push(arg);
+  }
+  return { snippetFile, reset, paths };
+}
+// ---------------------------------------------------------------------------
+// File discovery
+// ---------------------------------------------------------------------------
+const SKIP_DIRS = new Set(["node_modules", ".git", "dist", "build", ".next", "out", "coverage"]);
+function* walk(target) {
+  const stat = fs.statSync(target);
+  if (stat.isFile()) {
+    yield target;
+    return;
+  }
+  for (const entry of fs.readdirSync(target, { withFileTypes: true })) {
+    if (entry.isDirectory()) {
+      if (SKIP_DIRS.has(entry.name)) continue;
+      yield* walk(path.join(target, entry.name));
+    } else if (entry.isFile()) {
+      yield path.join(target, entry.name);
+    }
+  }
+}
+// ---------------------------------------------------------------------------
+// Tree helpers
+// ---------------------------------------------------------------------------
+function collectComments(root, commentTypes) {
+  const out = [];
+  const stack = [root];
+  while (stack.length) {
+    const node = stack.pop();
+    if (commentTypes.includes(node.type)) out.push(node);
+    for (let i = node.namedChildCount - 1; i >= 0; i--) stack.push(node.namedChild(i));
+  }
+  return out;
+}
+// The source span a marker comment applies to. Returns the next named,
+// non-comment sibling, but normalizes two grammar quirks so the snippet matches
+// what a real pretty-printer (recast/toSource) would emit:
+//   * method decorators are *separate* siblings — extend the span across the
+//     decorator run to include the declaration it decorates
+//   * a trailing `;` after a field/declaration sits outside the node — swallow it
+function unitFor(comment, commentTypes, source) {
+  let n = comment.nextNamedSibling;
+  while (n && commentTypes.includes(n.type)) n = n.nextNamedSibling;
+  if (!n) return null;
+  const startNode = n;
+  let endNode = n;
+  if (n.type === "decorator") {
+    let m = n;
+    while (m && m.type === "decorator") {
+      endNode = m;
+      m = m.nextNamedSibling;
+    }
+    if (m) endNode = m; // the decorated declaration itself
+  }
+  let endIndex = endNode.endIndex;
+  let e = endIndex;
+  while (e < source.length && /\s/.test(source[e])) e++;
+  if (source[e] === ";") endIndex = e + 1;
+  return {
+    startIndex: startNode.startIndex,
+    endIndex,
+    startPosition: startNode.startPosition,
+    type: n.type,
+  };
+}
+// Expand a [start, end) offset range to cover whole lines, so removing it leaves
+// no dangling blank line (the moral equivalent of jscodeshift's node removal).
+function expandToFullLines(source, start, end) {
+  let s = start;
+  while (s > 0 && source[s - 1] !== "\n") s--;
+  let e = end;
+  while (e < source.length && source[e] !== "\n") e++;
+  if (e < source.length) e++; // include the trailing newline
+  return [s, e];
+}
+function parseSnippetName(commentText) {
+  // commentText includes the comment delimiters, e.g. "// extract-code timer"
+  // or "/* extract-code Foo */". Strip delimiters, take what follows the marker.
+  const after = commentText.split("extract-code ")[1] ?? "";
+  return after
+    .replace(/\*\/\s*$/, "") // trailing block-comment close
+    .replace(/\r?\n[\s\S]*$/, "") // anything past the first line
+    .trim();
+}
+// ---------------------------------------------------------------------------
+// Per-file extraction
+// ---------------------------------------------------------------------------
+function extractFromFile(filePath, snippets) {
+  const ext = path.extname(filePath);
+  const entry = getParser(ext);
+  if (!entry) return; // not a language we know — skip silently
+  const { parser, cfg } = entry;
+  const source = fs.readFileSync(filePath, "utf8");
+  const tree = parser.parse(source);
+  const basename = path.basename(filePath);
+  const comments = collectComments(tree.rootNode, cfg.comments);
+  // Pass 1: gather ignore ranges (line-expanded) so node-mode snippets can splice them out.
+  const ignoreRanges = [];
+  for (const comment of comments) {
+    if (!comment.text.includes("extract-code ignore")) continue;
+    const unit = unitFor(comment, cfg.comments, source);
+    if (!unit) continue;
+    ignoreRanges.push(expandToFullLines(source, comment.startIndex, unit.endIndex));
+  }
+  // Pass 2: emit named snippets.
+  for (const comment of comments) {
+    if (!comment.text.includes("extract-code")) continue;
+    if (comment.text.includes("extract-code ignore")) continue;
+    const name = parseSnippetName(comment.text);
+    if (!name) continue;
+    const key = `${name}@${basename}`;
+    const unit = unitFor(comment, cfg.comments, source);
+    if (!unit) continue;
+    // Whole-file mode: marker leads an import -> emit the full source.
+    if (cfg.wholeFile.includes(unit.type)) {
+      snippets[key] = source.replace(/^\s*\/\/ extract-code.*\r?\n/, "");
+      continue;
+    }
+    // Node mode: slice the unit's exact span, then splice out any nested ignores.
+    let text = source.slice(unit.startIndex, unit.endIndex);
+    const inner = ignoreRanges
+      .filter(([s, e]) => s >= unit.startIndex && e <= unit.endIndex)
+      .sort((a, b) => b[0] - a[0]); // descending: splice later ranges first so earlier offsets stay valid
+    for (const [s, e] of inner) {
+      const rs = s - unit.startIndex;
+      const re = e - unit.startIndex;
+      text = text.slice(0, rs) + text.slice(re);
+    }
+    // Re-indent to column 0, matching jscodeshift's toSource() output: the first
+    // line already starts at the node, but continuation lines keep their in-source
+    // indentation, so strip the node's base column from each subsequent line.
+    const baseCol = unit.startPosition.column;
+    if (baseCol > 0) {
+      text = text
+        .split("\n")
+        .map((line, i) => (i === 0 ? line : line.replace(new RegExp(`^[ \\t]{0,${baseCol}}`), "")))
+        .join("\n");
+    }
+    snippets[key] = text.replace(/^\/\/ extract-code.*\r?\n/, "").trimEnd();
+  }
+}
+// ---------------------------------------------------------------------------
+// Main
+// ---------------------------------------------------------------------------
+function main() {
+  const { snippetFile, reset, paths } = parseArgs(process.argv.slice(2));
+  if (!snippetFile) {
+    console.error("error: --snippetFile=<path> is required");
+    process.exit(1);
+  }
+  if (paths.length === 0) {
+    console.error("error: at least one <dir-or-file> is required");
+    process.exit(1);
+  }
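+  // --reset starts from an empty map; otherwise merge into the existing file.
+  let snippets = {};
+  if (!reset) {
+    try {
+      snippets = JSON.parse(fs.readFileSync(snippetFile, "utf8") || "{}");
+    } catch {
+      // Missing or unparsable file: fall back to an empty map.
+      snippets = {};
+    }
+  }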
+  let scanned = 0;
+  for (const root of paths) {
+    if (!fs.existsSync(root)) {
+      console.warn(`warning: path does not exist, skipping: ${root}`);
+      continue;
+    }
+    for (const file of walk(root)) {
+      if (!LANGUAGES[path.extname(file)]) continue;
+      try {
+        extractFromFile(file, snippets);
+        scanned++;
+      } catch (e) {
+        console.warn(`warning: failed to process ${file}: ${e.message}`);
+      }
+    }
+  }
+  fs.mkdirSync(path.dirname(snippetFile), { recursive: true });
+  fs.writeFileSync(snippetFile, JSON.stringify(snippets, null, 2) + "\n");
+  console.log(`extracted ${Object.keys(snippets).length} snippet(s) from ${scanned} file(s) -> ${snippetFile}`);
+}
+main();
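
To make the node-mode path in `extractFromFile` easier to follow, here is a small worked sketch of its three steps: slice the unit's span, splice out line-expanded ignore ranges, then re-indent to column 0. `expandToFullLines` is copied from the file above. `spliceOut` and `dedent` are hypothetical helper names that repackage the inline logic, and the spans are located with `indexOf` as a stand-in for the offsets Tree-sitter would report.

```javascript
// Copied from extractSnippets.mjs: widen [start, end) to cover whole lines.
function expandToFullLines(source, start, end) {
  let s = start;
  while (s > 0 && source[s - 1] !== "\n") s--;
  let e = end;
  while (e < source.length && source[e] !== "\n") e++;
  if (e < source.length) e++; // include the trailing newline
  return [s, e];
}

// Hypothetical helper: slice the unit, then remove nested ranges, last range
// first, so the offsets of earlier ranges are still valid when they are cut.
function spliceOut(source, unitStart, unitEnd, ranges) {
  let text = source.slice(unitStart, unitEnd);
  const inner = ranges
    .filter(([s, e]) => s >= unitStart && e <= unitEnd)
    .sort((a, b) => b[0] - a[0]);
  for (const [s, e] of inner) {
    text = text.slice(0, s - unitStart) + text.slice(e - unitStart);
  }
  return text;
}

// Hypothetical helper: strip up to baseCol leading spaces/tabs from every
// line except the first, which already starts at the node.
function dedent(text, baseCol) {
  if (baseCol === 0) return text;
  const indent = new RegExp(`^[ \\t]{0,${baseCol}}`);
  return text
    .split("\n")
    .map((line, i) => (i === 0 ? line : line.replace(indent, "")))
    .join("\n");
}

const source = [
  "class Timer {",
  "  // extract-code start",
  "  start() {",
  "    // extract-code ignore",
  "    debugLog();",
  "    this.running = true;",
  "  }",
  "}",
].join("\n");

// Stand-ins for Tree-sitter offsets (assumption: located by plain string search).
const unitStart = source.indexOf("start() {");
const unitEnd = source.indexOf("  }") + "  }".length;
const ignoreStart = source.indexOf("// extract-code ignore");
const ignoreEnd = source.indexOf("debugLog();") + "debugLog();".length;
const baseCol = unitStart - (source.lastIndexOf("\n", unitStart - 1) + 1);

const ranges = [expandToFullLines(source, ignoreStart, ignoreEnd)];
const snippet = dedent(spliceOut(source, unitStart, unitEnd, ranges), baseCol);
console.log(snippet);
// start() {
//   this.running = true;
// }
```

Expanding the ignore range to whole lines before splicing is what keeps the result free of a leftover blank or half-indented line where the ignored statement used to be.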

package/package.json ADDED

@@ -0,0 +1,47 @@
+{
+  "name": "@matrica-code/snippet-extractor",
+  "version": "1.0.0",
+  "type": "module",
+  "description": "Multi-language (JS/TS/TSX/Java) extract-code snippet harvester. Produces the snippets.json consumed by snippet-viewer.",
+  "keywords": [
+    "snippets",
+    "code-snippets",
+    "documentation",
+    "tree-sitter",
+    "codemod",
+    "extract-code",
+    "snippet-viewer"
+  ],
+  "license": "MIT",
+  "repository": {
+    "type": "git",
+    "url": "git+https://github.com/matrica-code/snippet-viewer.git",
+    "directory": "extractor"
+  },
+  "homepage": "https://github.com/matrica-code/snippet-viewer/tree/main/extractor#readme",
+  "bugs": {
+    "url": "https://github.com/matrica-code/snippet-viewer/issues"
+  },
+  "bin": {
+    "extract-snippets": "extractSnippets.mjs"
+  },
+  "files": [
+    "extractSnippets.mjs",
+    "README.md"
+  ],
+  "scripts": {
+    "extract": "node extractSnippets.mjs"
+  },
+  "engines": {
+    "node": ">=18"
+  },
+  "publishConfig": {
+    "access": "public"
+  },
+  "dependencies": {
+    "tree-sitter": "0.25.0",
+    "tree-sitter-java": "0.23.5",
+    "tree-sitter-javascript": "0.25.0",
+    "tree-sitter-typescript": "0.23.2"
+  }
+}