@matrica-code/snippet-extractor 1.0.0

package/README.md ADDED
@@ -0,0 +1,169 @@
+ # snippet-extractor
+
+ The **producer** side of snippet-viewer. It harvests `// extract-code <name>`
+ markers out of source files into a `snippets.json` keyed `name@filename.ext`,
+ exactly the format the [`<snippet-viewer>`](../README.md) web component renders.
+
+ Parses with [Tree-sitter](https://tree-sitter.github.io/), so it works across
+ **JS / TS / TSX and Java** (more grammars can be registered; see below).
+
+ ## Markers
+
+ ```ts
+ // extract-code <name>   // export the node that follows, keyed as <name>@<file>
+ // extract-code ignore   // strip the following node from any snippet it lives in
+ ```
+
+ If the marked node is an `import` (`import_statement` in JS/TS, `import_declaration`
+ in Java), the **whole file** is emitted instead of a single node. `ignore` markers
+ only apply to single-node snippets; whole-file snippets are emitted with ignored nodes left in.
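Keys are built from the file's basename, not its path, so two marked files only get distinct keys if their names differ. A minimal sketch of the keying (the `snippetKey` helper is hypothetical and mirrors the `name@basename` template in `extractSnippets.mjs`) makes the collision case concrete; which file wins depends on scan order:

```javascript
// Hypothetical helper mirroring the extractor's key template: name@basename.
function snippetKey(name, filePath) {
  // Take the last path segment; splitting on both separators keeps the sketch OS-neutral.
  const basename = filePath.split(/[\\/]/).pop();
  return `${name}@${basename}`;
}

const snippets = {};
// Same marker name in two differently named files: two distinct keys.
snippets[snippetKey("setup", "src/timer.ts")] = "function setupTimer() {}";
snippets[snippetKey("setup", "src/Clock.java")] = "void setup() {}";
// Same name AND same basename in another directory: the key collides,
// so the file processed later silently replaces the earlier entry.
snippets[snippetKey("setup", "src/legacy/timer.ts")] = "function legacySetup() {}";

console.log(JSON.stringify(Object.keys(snippets)));
// ["setup@timer.ts","setup@Clock.java"]
console.log(snippets["setup@timer.ts"]);
// function legacySetup() {}
```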
+
+ ## Ways to run it
+
+ Pick the channel that fits the consumer:
+
+ | Consumer                          | Channel             |
+ | --------------------------------- | ------------------- |
+ | JS/TS repo (has Node)             | **npm / npx**       |
+ | Java or any repo (no Node)        | **GitHub Action**   |
+ | Any CI or local, pinned toolchain | **container image** |
+
+ ### As a GitHub Action
+
+ The repo ships a composite action that runs the published image, so it works in
+ **any** repo (Node, Java/Gradle, Maven, whatever) with no local toolchain.
+ Add one step:
+
+ ```yaml
+ # .github/workflows/snippets.yml
+ name: snippets
+ on:
+   push:
+     branches: [main]
+
+ jobs:
+   extract:
+     runs-on: ubuntu-latest
+     steps:
+       - uses: actions/checkout@v4
+
+       - name: Extract snippets
+         uses: matrica-code/snippet-viewer/extractor@main # pin to extractor-vX.Y.Z for reproducibility
+         with:
+           snippet-file: snippets.json # output path, relative to repo root
+           paths: src # space-separated dirs/files to scan
+           # reset: "true" # start from {} (default); set "false" to merge
+           # image-tag: "1.0.0" # which ghcr.io image tag to run (default: latest)
+
+       # then commit the file, upload it, or deploy it to your snippet-viewer host
+       - uses: actions/upload-artifact@v4
+         with:
+           name: snippets
+           path: snippets.json
+ ```
+
+ A **Java repo** is identical; just point `paths` at the sources:
+
+ ```yaml
+ - uses: matrica-code/snippet-viewer/extractor@main
+   with:
+     snippet-file: docs/snippets.json
+     paths: src/main/java
+ ```
+
+ The action requires a Linux runner with Docker available (GitHub-hosted
+ `ubuntu-latest` has it). It pulls `ghcr.io/matrica-code/snippet-extractor` and
+ runs it over your checked-out workspace.
+
+ ### Via npm / npx (Node repos)
+
+ No install step needed; run the published package directly:
+
+ ```bash
+ npx @matrica-code/snippet-extractor --reset --snippetFile=snippets.json src
+ ```
+
+ Or add it as a dev dependency and wire a script:
+
+ ```jsonc
+ // package.json
+ {
+   "scripts": {
+     "snippets": "extract-snippets --reset --snippetFile=public/snippets.json src"
+   },
+   "devDependencies": {
+     "@matrica-code/snippet-extractor": "^1.0.0"
+   }
+ }
+ ```
+
+ (The Tree-sitter grammars are native addons; npm fetches prebuilt binaries for
+ common platforms, falling back to a local compile if none match.)
+
+ ### With a container (Docker or Podman)
+
+ The image is OCI-standard, so `podman` and `docker` are interchangeable. All paths
+ refer to the container's filesystem, where the repo you're scanning is mounted at `/work`.
+
+ ```bash
+ # pull the published image...
+ podman run --rm -v "$PWD":/work \
+   ghcr.io/matrica-code/snippet-extractor:latest \
+   --reset --snippetFile=/work/snippets.json /work/src
+
+ # ...or build it locally (run from the repo root; `extractor` is the build context)
+ podman build -t snippet-extractor extractor # or: docker build ...
+ ```
+
+ On macOS, Podman avoids the Docker Desktop GUI/login gate. It runs containers
+ inside a Linux VM managed by `podman machine`:
+
+ ```bash
+ podman machine init   # one-time, if you have no machine yet
+ podman machine start
+ ```
+
+ If the CLI isn't on `PATH`, the macOS installer puts it at `/opt/podman/bin/podman`.
+
+ ### Locally from source (Node)
+
+ ```bash
+ cd extractor && npm install
+ node extractSnippets.mjs --reset --snippetFile=../example/snippets.json [<dir|file> ...]
+ ```
+
+ `--reset` starts from `{}`; omit it to merge into an existing file.
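Merging loads the existing JSON and then assigns each freshly extracted snippet by key, so matching keys are overwritten but nothing is ever deleted. The hypothetical `mergeSnippets` helper below models that behavior (it is not the extractor's code) and shows why a snippet whose marker you removed lingers until the next `--reset` run:

```javascript
// Hypothetical helper modelling the extractor's load-then-assign behavior.
// `existing` is the parsed snippets.json (ignored when reset is true);
// `extracted` holds the snippets found in the current run.
function mergeSnippets(existing, extracted, reset) {
  const snippets = reset ? {} : { ...existing };
  for (const [key, text] of Object.entries(extracted)) {
    snippets[key] = text; // same key: the fresh extraction wins
  }
  return snippets;
}

const onDisk = { "old@a.ts": "const old = 1;", "timer@a.ts": "let t = 0;" };
const thisRun = { "timer@a.ts": "let t = 1;" }; // the "old" marker was deleted

console.log(JSON.stringify(mergeSnippets(onDisk, thisRun, false)));
// {"old@a.ts":"const old = 1;","timer@a.ts":"let t = 1;"}
console.log(JSON.stringify(mergeSnippets(onDisk, thisRun, true)));
// {"timer@a.ts":"let t = 1;"}
```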
+
+ ## Publishing
+
+ Both artifacts publish from a single version tag. Bump `version` in
+ `extractor/package.json`, then:
+
+ ```bash
+ git tag extractor-v1.0.0
+ git push origin extractor-v1.0.0
+ ```
+
+ That fires two workflows:
+
+ - `.github/workflows/extractor-npm.yml` → publishes `@matrica-code/snippet-extractor`
+   to npm (needs an `NPM_TOKEN` repo secret).
+ - `.github/workflows/extractor-image.yml` → builds the multi-arch image and pushes
+   `ghcr.io/matrica-code/snippet-extractor:1.0.0` + `:latest` (uses the built-in
+   `GITHUB_TOKEN`).
+
+ ## CLI
+
+ ```
+ extractSnippets.mjs --snippetFile=<path> [--reset] <dir-or-file> [<dir-or-file> ...]
+ ```
+
+ | Flag                   | Meaning |
+ | ---------------------- | ------- |
+ | `--snippetFile=<path>` | Output JSON file (merged into unless `--reset`). Required. |
+ | `--reset`              | Start from `{}` instead of merging. |
+ | `<dir-or-file> ...`    | Roots to scan recursively. Files with unregistered extensions are ignored, and `node_modules`, `.git`, `dist`, `build`, `.next`, `out`, and `coverage` directories are skipped. |
+
+ Installed from npm, the same CLI is available as `extract-snippets`. `--snippetFile`
+ must use the `=` form (`--snippetFile <path>` is not recognized), and any other
+ `--` flag is silently ignored.
+
+ ## Adding a language
+
+ Register an entry in `LANGUAGES` in `extractSnippets.mjs`, keyed by file extension,
+ with a `load()` that returns the grammar, the grammar's comment node types, and the
+ node types that trigger whole-file mode. (Prism in the viewer already highlights many
+ languages; extraction just needs the matching Tree-sitter grammar as a dependency.)
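As an illustration only, a Go entry might look like this. It assumes `tree-sitter-go` is added as a dependency (in a version compatible with the pinned `tree-sitter` core) and that the grammar names its nodes `comment` and `import_declaration`; confirm both against the grammar's `node-types.json` before relying on them.

```javascript
// Hypothetical LANGUAGES entry for Go. Assumes "tree-sitter-go" is installed and
// that its grammar uses the node types "comment" and "import_declaration".
const LANGUAGES = {
  ".go": {
    // Lazy: require() only runs when the extractor first parses a .go file.
    load: () => require("tree-sitter-go"),
    comments: ["comment"], // assumed to cover both // and /* */ in this grammar
    wholeFile: ["import_declaration"], // a marker above `import (...)` emits the whole file
  },
};

// Defining the entry does not load the grammar, so this runs without tree-sitter-go.
console.log(Object.keys(LANGUAGES[".go"]).join(", "));
// load, comments, wholeFile
```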
package/extractSnippets.mjs ADDED
@@ -0,0 +1,280 @@
+ #!/usr/bin/env node
+ // Portable, multi-language snippet extractor.
+ //
+ // Drop-in replacement for the jscodeshift-based extractSnippets.mjs that works
+ // across JS/TS/TSX *and* Java (and any other Tree-sitter grammar you register).
+ //
+ // Usage (or `extract-snippets ...` when installed from npm):
+ //   node extractSnippets.mjs \
+ //     --snippetFile=path/to/snippets.json [--reset] <dir-or-file> [<dir-or-file> ...]
+ //
+ // Marker conventions (identical to the original):
+ //   // extract-code <name>   -> export the node that follows this comment
+ //   // extract-code ignore   -> strip the following node from any snippet it lives in
+ //
+ // Behaviors preserved from the jscodeshift version:
+ //   * snippet keys are `<name>@<basename>`, so the same name in two files is safe
+ //     as long as the files don't share a basename
+ //   * if the marked node is an import (whole-file trigger), the ENTIRE file is emitted
+ //   * the existing snippet file is merged into, not overwritten (unless --reset)
+ //   * whole-file snippets do NOT honor `ignore` (only node-mode snippets do), which is
+ //     faithful to the original, where ignore removal only affected AST-serialized nodes.
+
+ import fs from "fs";
+ import path from "path";
+ import { createRequire } from "module";
+
+ const require = createRequire(import.meta.url);
+
+ // ---------------------------------------------------------------------------
+ // Language registry. Grammars are loaded lazily, so a run that never touches a
+ // Java file never loads tree-sitter-java, and vice versa.
+ // ---------------------------------------------------------------------------
+ const LANGUAGES = {
+   ".ts": { load: () => require("tree-sitter-typescript").typescript, comments: ["comment"], wholeFile: ["import_statement"] },
+   ".tsx": { load: () => require("tree-sitter-typescript").tsx, comments: ["comment"], wholeFile: ["import_statement"] },
+   ".mts": { load: () => require("tree-sitter-typescript").typescript, comments: ["comment"], wholeFile: ["import_statement"] },
+   ".cts": { load: () => require("tree-sitter-typescript").typescript, comments: ["comment"], wholeFile: ["import_statement"] },
+   ".js": { load: () => require("tree-sitter-javascript"), comments: ["comment"], wholeFile: ["import_statement"] },
+   ".jsx": { load: () => require("tree-sitter-javascript"), comments: ["comment"], wholeFile: ["import_statement"] },
+   ".mjs": { load: () => require("tree-sitter-javascript"), comments: ["comment"], wholeFile: ["import_statement"] },
+   ".java": { load: () => require("tree-sitter-java"), comments: ["line_comment", "block_comment"], wholeFile: ["import_declaration"] },
+ };
+
+ // Cache one Parser per extension so we don't reload grammars per file.
+ const parserCache = new Map();
+ function getParser(ext) {
+   if (parserCache.has(ext)) return parserCache.get(ext);
+   const cfg = LANGUAGES[ext];
+   if (!cfg) return null;
+   const Parser = require("tree-sitter");
+   const parser = new Parser();
+   parser.setLanguage(cfg.load());
+   const entry = { parser, cfg };
+   parserCache.set(ext, entry);
+   return entry;
+ }
+
+ // ---------------------------------------------------------------------------
+ // CLI args
+ // ---------------------------------------------------------------------------
+ function parseArgs(argv) {
+   let snippetFile = null;
+   let reset = false;
+   const paths = [];
+   for (const arg of argv) {
+     if (arg.startsWith("--snippetFile=")) snippetFile = arg.slice("--snippetFile=".length);
+     else if (arg === "--reset") reset = true;
+     else if (arg.startsWith("--")) continue; // ignore unknown flags (e.g. legacy jscodeshift flags)
+     else paths.push(arg);
+   }
+   return { snippetFile, reset, paths };
+ }
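Because `parseArgs` matches on the `--snippetFile=` prefix, the value must be attached with `=`. Running a verbatim copy of the function (reproduced only so the sketch is self-contained) shows what the space-separated form does: the bare flag is dropped as unknown and the output path becomes an extra scan root, so `main()` stops with the `--snippetFile=<path> is required` error.

```javascript
// Copy of parseArgs from extractSnippets.mjs, reproduced so this sketch runs on its own.
function parseArgs(argv) {
  let snippetFile = null;
  let reset = false;
  const paths = [];
  for (const arg of argv) {
    if (arg.startsWith("--snippetFile=")) snippetFile = arg.slice("--snippetFile=".length);
    else if (arg === "--reset") reset = true;
    else if (arg.startsWith("--")) continue; // unknown flags are skipped
    else paths.push(arg);
  }
  return { snippetFile, reset, paths };
}

// Supported form: flag and value joined by "=".
console.log(JSON.stringify(parseArgs(["--reset", "--snippetFile=out.json", "src"])));
// {"snippetFile":"out.json","reset":true,"paths":["src"]}

// Space-separated form: "--snippetFile" is skipped and "out.json" becomes a scan root.
console.log(JSON.stringify(parseArgs(["--snippetFile", "out.json", "src"])));
// {"snippetFile":null,"reset":false,"paths":["out.json","src"]}
```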
+
+ // ---------------------------------------------------------------------------
+ // File discovery
+ // ---------------------------------------------------------------------------
+ const SKIP_DIRS = new Set(["node_modules", ".git", "dist", "build", ".next", "out", "coverage"]);
+
+ function* walk(target) {
+   const stat = fs.statSync(target);
+   if (stat.isFile()) {
+     yield target;
+     return;
+   }
+   for (const entry of fs.readdirSync(target, { withFileTypes: true })) {
+     // A Dirent describes the link itself, so symlinked files and directories
+     // match neither branch below and are not followed.
+     if (entry.isDirectory()) {
+       if (SKIP_DIRS.has(entry.name)) continue;
+       yield* walk(path.join(target, entry.name));
+     } else if (entry.isFile()) {
+       yield path.join(target, entry.name);
+     }
+   }
+ }
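`walk` prunes by directory *name* at any depth and yields every regular file it reaches; the extension filter is applied afterwards in `main()`. The sketch below replays that traversal over a hypothetical in-memory tree (objects stand in for directories, strings for files), so the pruning is visible without touching the disk. `walkTree` is an illustrative name, not part of the package.

```javascript
const SKIP_DIRS = new Set(["node_modules", ".git", "dist", "build", ".next", "out", "coverage"]);

// Hypothetical in-memory filesystem: objects are directories, strings are files.
const tree = {
  src: {
    "index.ts": "",
    dist: { "bundle.js": "" }, // skipped: the name matches even when nested
    lib: { "util.java": "", "notes.md": "" },
  },
  node_modules: { pkg: { "index.js": "" } }, // skipped
};

// Same shape as walk(): recurse into directories unless their name is in SKIP_DIRS.
function* walkTree(node, prefix) {
  for (const [name, child] of Object.entries(node)) {
    const p = prefix ? `${prefix}/${name}` : name;
    if (typeof child === "object") {
      if (SKIP_DIRS.has(name)) continue;
      yield* walkTree(child, p);
    } else {
      yield p;
    }
  }
}

console.log(JSON.stringify([...walkTree(tree, "")]));
// ["src/index.ts","src/lib/util.java","src/lib/notes.md"]
```

`notes.md` is still yielded; `main()` drops it because `.md` has no `LANGUAGES` entry. The name check also only runs on entries found while recursing, so a root passed explicitly on the command line (say, `dist`) is still scanned.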
+
+ // ---------------------------------------------------------------------------
+ // Tree helpers
+ // ---------------------------------------------------------------------------
+ function collectComments(root, commentTypes) {
+   const out = [];
+   const stack = [root];
+   while (stack.length) {
+     const node = stack.pop();
+     if (commentTypes.includes(node.type)) out.push(node);
+     for (let i = node.namedChildCount - 1; i >= 0; i--) stack.push(node.namedChild(i));
+   }
+   return out;
+ }
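`collectComments` is an iterative pre-order traversal: pushing children in reverse means the first child is popped first, so comments come back in source order without recursion. To see that on a small input, the sketch runs a copy of the function over mock nodes; the `node()` factory and its `label` field are hypothetical stand-ins that expose only the `type`, `namedChildCount`, and `namedChild(i)` members the function touches.

```javascript
// Minimal stand-in for a Tree-sitter node: only the members collectComments uses.
function node(type, children = [], label = type) {
  return {
    type,
    label,
    namedChildCount: children.length,
    namedChild: (i) => children[i],
  };
}

// Same algorithm as collectComments in extractSnippets.mjs.
function collectComments(root, commentTypes) {
  const out = [];
  const stack = [root];
  while (stack.length) {
    const n = stack.pop();
    if (commentTypes.includes(n.type)) out.push(n);
    // Push in reverse so the first child is processed next (pre-order).
    for (let i = n.namedChildCount - 1; i >= 0; i--) stack.push(n.namedChild(i));
  }
  return out;
}

const root = node("program", [
  node("comment", [], "c1"),
  node("class_declaration", [
    node("class_body", [node("comment", [], "c2"), node("method_definition")]),
  ]),
  node("comment", [], "c3"),
]);

console.log(collectComments(root, ["comment"]).map((c) => c.label).join(","));
// c1,c2,c3
```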
+
+ // The source span a marker comment applies to. Returns the next named,
+ // non-comment sibling, but normalizes two grammar quirks so the snippet matches
+ // what a real pretty-printer (recast/toSource) would emit:
+ //   * method decorators are *separate* siblings, so extend the span across the
+ //     decorator run to include the declaration it decorates
+ //   * a trailing `;` after a field/declaration sits outside the node, so swallow it
+ function unitFor(comment, commentTypes, source) {
+   let n = comment.nextNamedSibling;
+   while (n && commentTypes.includes(n.type)) n = n.nextNamedSibling;
+   if (!n) return null;
+
+   const startNode = n;
+   let endNode = n;
+   if (n.type === "decorator") {
+     let m = n;
+     while (m && m.type === "decorator") {
+       endNode = m;
+       m = m.nextNamedSibling;
+     }
+     if (m) endNode = m; // the decorated declaration itself
+   }
+
+   let endIndex = endNode.endIndex;
+   let e = endIndex;
+   while (e < source.length && /\s/.test(source[e])) e++;
+   if (source[e] === ";") endIndex = e + 1;
+
+   return {
+     startIndex: startNode.startIndex,
+     endIndex,
+     startPosition: startNode.startPosition,
+     type: n.type,
+   };
+ }
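The trailing-semicolon rule in `unitFor` is plain string scanning from the node's `endIndex`: skip whitespace, and if the next character is `;`, extend the span past it. The sketch isolates that step in a hypothetical `swallowSemicolon` helper, with a hand-computed end index standing in for a parsed field node; the decorator handling needs a real syntax tree and is not reproduced.

```javascript
// Hypothetical helper isolating unitFor's trailing-";" step.
// `endIndex` plays the role of the Tree-sitter node's endIndex.
function swallowSemicolon(source, endIndex) {
  let e = endIndex;
  while (e < source.length && /\s/.test(source[e])) e++;
  return source[e] === ";" ? e + 1 : endIndex;
}

const src = "class A {\n  count = 0 ;\n  other = 1;\n}";
const start = src.indexOf("count");
const nodeEnd = src.indexOf("0") + 1; // pretend the field node ends right after "0"

console.log(JSON.stringify(src.slice(start, nodeEnd)));
// "count = 0"
console.log(JSON.stringify(src.slice(start, swallowSemicolon(src, nodeEnd))));
// "count = 0 ;"
```

Because `\s` also matches newlines, a `;` at the start of the following line would be swallowed as well.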
+
+ // Expand a [start, end) index range to cover whole lines, so removing it leaves
+ // no dangling blank line (the moral equivalent of jscodeshift's node removal).
+ function expandToFullLines(source, start, end) {
+   let s = start;
+   while (s > 0 && source[s - 1] !== "\n") s--;
+   let e = end;
+   while (e < source.length && source[e] !== "\n") e++;
+   if (e < source.length) e++; // include the trailing newline
+   return [s, e];
+ }
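Widening to whole lines is what lets an `ignore` block disappear without leaving a blank, indented line behind. A worked example using a copy of `expandToFullLines`, with the range running from the start of the marker comment to the end of the ignored statement, as Pass 1 of `extractFromFile` computes it:

```javascript
// Copy of expandToFullLines from extractSnippets.mjs.
function expandToFullLines(source, start, end) {
  let s = start;
  while (s > 0 && source[s - 1] !== "\n") s--;
  let e = end;
  while (e < source.length && source[e] !== "\n") e++;
  if (e < source.length) e++; // include the trailing newline
  return [s, e];
}

const source = [
  "function tick() {",
  "  // extract-code ignore",
  "  debugLog();",
  "  count++;",
  "}",
].join("\n");

// Range from the marker comment to the end of the ignored statement.
const start = source.indexOf("// extract-code ignore");
const end = source.indexOf("debugLog();") + "debugLog();".length;

const [s, e] = expandToFullLines(source, start, end);
console.log(source.slice(0, s) + source.slice(e));
// function tick() {
//   count++;
// }
```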
+
+ function parseSnippetName(commentText) {
+   // commentText includes the comment delimiters, e.g. "// extract-code timer"
+   // or "/* extract-code Foo */". Strip delimiters, take what follows the marker.
+   const after = commentText.split("extract-code ")[1] ?? "";
+   return after
+     .replace(/\*\/\s*$/, "") // trailing block-comment close
+     .replace(/\r?\n[\s\S]*$/, "") // anything past the first line
+     .trim();
+ }
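`parseSnippetName` gets the raw comment text, delimiters included, and keeps only the first line after `extract-code `. A few inputs run through a copy of the function; the empty result in the last case is what makes the caller skip a nameless marker:

```javascript
// Copy of parseSnippetName from extractSnippets.mjs.
function parseSnippetName(commentText) {
  const after = commentText.split("extract-code ")[1] ?? "";
  return after
    .replace(/\*\/\s*$/, "") // trailing block-comment close
    .replace(/\r?\n[\s\S]*$/, "") // anything past the first line
    .trim();
}

const inputs = [
  "// extract-code timer",
  "/* extract-code Foo */",
  "/* extract-code Bar\n   longer explanation\n*/",
  "// extract-code", // no name (and no trailing space): nothing after the marker
];
console.log(JSON.stringify(inputs.map(parseSnippetName)));
// ["timer","Foo","Bar",""]
```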
+
+ // ---------------------------------------------------------------------------
+ // Per-file extraction
+ // ---------------------------------------------------------------------------
+ // Matches an `ignore` marker; `\b` keeps a snippet named e.g. "ignoredPaths"
+ // from being mistaken for one.
+ const IGNORE_MARKER = /extract-code ignore\b/;
+
+ function extractFromFile(filePath, snippets) {
+   const ext = path.extname(filePath);
+   const entry = getParser(ext);
+   if (!entry) return; // not a language we know: skip silently
+
+   const { parser, cfg } = entry;
+   const source = fs.readFileSync(filePath, "utf8");
+   const tree = parser.parse(source);
+   const basename = path.basename(filePath);
+
+   const comments = collectComments(tree.rootNode, cfg.comments);
+
+   // Pass 1: gather ignore ranges (line-expanded) so node-mode snippets can splice them out.
+   const ignoreRanges = [];
+   for (const comment of comments) {
+     if (!IGNORE_MARKER.test(comment.text)) continue;
+     const unit = unitFor(comment, cfg.comments, source);
+     if (!unit) continue;
+     ignoreRanges.push(expandToFullLines(source, comment.startIndex, unit.endIndex));
+   }
+
+   // Pass 2: emit named snippets.
+   for (const comment of comments) {
+     if (!comment.text.includes("extract-code")) continue;
+     if (IGNORE_MARKER.test(comment.text)) continue;
+
+     const name = parseSnippetName(comment.text);
+     if (!name) continue;
+     const key = `${name}@${basename}`;
+
+     const unit = unitFor(comment, cfg.comments, source);
+     if (!unit) continue;
+
+     // Whole-file mode: marker leads an import -> emit the full source minus the
+     // marker's own line. Removing it by position (not with a start-of-file regex)
+     // also works when the marker follows a Java `package` declaration.
+     if (cfg.wholeFile.includes(unit.type)) {
+       const [ms, me] = expandToFullLines(source, comment.startIndex, comment.endIndex);
+       snippets[key] = source.slice(0, ms) + source.slice(me);
+       continue;
+     }
+
+     // Node mode: slice the unit's exact span, then splice out any nested ignores.
+     let text = source.slice(unit.startIndex, unit.endIndex);
+     const inner = ignoreRanges
+       .filter(([s, e]) => s >= unit.startIndex && e <= unit.endIndex)
+       .sort((a, b) => b[0] - a[0]); // descending so earlier splices don't shift later offsets
+     for (const [s, e] of inner) {
+       const rs = s - unit.startIndex;
+       const re = e - unit.startIndex;
+       text = text.slice(0, rs) + text.slice(re);
+     }
+
+     // Re-indent to column 0, matching jscodeshift's toSource() output: the first
+     // line already starts at the node, but continuation lines keep their in-source
+     // indentation, so strip the node's base column from each subsequent line.
+     const baseCol = unit.startPosition.column;
+     if (baseCol > 0) {
+       const baseIndent = new RegExp(`^[ \\t]{0,${baseCol}}`);
+       text = text
+         .split("\n")
+         .map((line, i) => (i === 0 ? line : line.replace(baseIndent, "")))
+         .join("\n");
+     }
+
+     snippets[key] = text.replace(/^\/\/ extract-code.*\r?\n/, "").trimEnd();
+   }
+ }
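Node mode boils down to three string operations on the unit's span: cut out nested `ignore` ranges from the highest offset down (so earlier cuts never shift later offsets), strip the node's starting column from every continuation line, then trim trailing whitespace. The sketch condenses those steps into a hypothetical `renderUnit` helper; offsets are found with `indexOf` here, where the extractor takes them from Tree-sitter and `expandToFullLines`.

```javascript
// Hypothetical helper condensing extractFromFile's node-mode steps.
// unitStart/unitEnd: the unit's [start, end) span in `source`.
// baseCol: the column where the unit starts.
// ignoreRanges: [start, end) pairs already widened to whole lines.
function renderUnit(source, unitStart, unitEnd, baseCol, ignoreRanges) {
  let text = source.slice(unitStart, unitEnd);
  const inner = ignoreRanges
    .filter(([s, e]) => s >= unitStart && e <= unitEnd)
    .sort((a, b) => b[0] - a[0]); // highest offset first
  for (const [s, e] of inner) {
    text = text.slice(0, s - unitStart) + text.slice(e - unitStart);
  }
  if (baseCol > 0) {
    const indent = new RegExp(`^[ \\t]{0,${baseCol}}`);
    text = text
      .split("\n")
      .map((line, i) => (i === 0 ? line : line.replace(indent, "")))
      .join("\n");
  }
  return text.trimEnd();
}

const source = [
  "class Clock {",
  "  // extract-code tick",
  "  tick() {",
  "    // extract-code ignore",
  "    debugLog();",
  "    this.n++;",
  "  }",
  "}",
].join("\n");

const unitStart = source.indexOf("tick() {");
const unitEnd = source.indexOf("  }") + 3; // just past the method's closing brace
// Whole-line range: start of the ignore marker's line up to the start of "this.n++".
const ignoreStart = source.indexOf("    // extract-code ignore");
const ignoreEnd = source.indexOf("    this.n++;");

console.log(renderUnit(source, unitStart, unitEnd, 2, [[ignoreStart, ignoreEnd]]));
// tick() {
//   this.n++;
// }
```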
+
+ // ---------------------------------------------------------------------------
+ // Main
+ // ---------------------------------------------------------------------------
+ function main() {
+   const { snippetFile, reset, paths } = parseArgs(process.argv.slice(2));
+
+   if (!snippetFile) {
+     console.error("error: --snippetFile=<path> is required");
+     process.exit(1);
+   }
+   if (paths.length === 0) {
+     console.error("error: at least one <dir-or-file> is required");
+     process.exit(1);
+   }
+
+   // Merge into the existing file unless --reset; a missing, unreadable, or
+   // invalid-JSON file is treated as empty and will be overwritten.
+   let snippets = {};
+   if (!reset) {
+     try {
+       snippets = JSON.parse(fs.readFileSync(snippetFile, "utf8") || "{}");
+     } catch {
+       snippets = {};
+     }
+   }
+
+   let scanned = 0;
+   for (const root of paths) {
+     if (!fs.existsSync(root)) {
+       console.warn(`warning: path does not exist, skipping: ${root}`);
+       continue;
+     }
+     for (const file of walk(root)) {
+       if (!LANGUAGES[path.extname(file)]) continue;
+       try {
+         extractFromFile(file, snippets);
+         scanned++;
+       } catch (e) {
+         console.warn(`warning: failed to process ${file}: ${e.message}`);
+       }
+     }
+   }
+
+   fs.mkdirSync(path.dirname(snippetFile), { recursive: true });
+   fs.writeFileSync(snippetFile, JSON.stringify(snippets, null, 2) + "\n");
+   console.log(`extracted ${Object.keys(snippets).length} snippet(s) from ${scanned} file(s) -> ${snippetFile}`);
+ }
+
+ main();
package/package.json ADDED
@@ -0,0 +1,47 @@
+ {
+   "name": "@matrica-code/snippet-extractor",
+   "version": "1.0.0",
+   "type": "module",
+   "description": "Multi-language (JS/TS/TSX/Java) extract-code snippet harvester. Produces the snippets.json consumed by snippet-viewer.",
+   "keywords": [
+     "snippets",
+     "code-snippets",
+     "documentation",
+     "tree-sitter",
+     "codemod",
+     "extract-code",
+     "snippet-viewer"
+   ],
+   "license": "MIT",
+   "repository": {
+     "type": "git",
+     "url": "git+https://github.com/matrica-code/snippet-viewer.git",
+     "directory": "extractor"
+   },
+   "homepage": "https://github.com/matrica-code/snippet-viewer/tree/main/extractor#readme",
+   "bugs": {
+     "url": "https://github.com/matrica-code/snippet-viewer/issues"
+   },
+   "bin": {
+     "extract-snippets": "extractSnippets.mjs"
+   },
+   "files": [
+     "extractSnippets.mjs",
+     "README.md"
+   ],
+   "scripts": {
+     "extract": "node extractSnippets.mjs"
+   },
+   "engines": {
+     "node": ">=18"
+   },
+   "publishConfig": {
+     "access": "public"
+   },
+   "dependencies": {
+     "tree-sitter": "0.25.0",
+     "tree-sitter-java": "0.23.5",
+     "tree-sitter-javascript": "0.25.0",
+     "tree-sitter-typescript": "0.23.2"
+   }
+ }