wiki-search-index 0.1.0

package/INDEX-FORMAT.md ADDED
@@ -0,0 +1,74 @@
+ # Index format (v1)
+
+ A wiki-search index is a single, self-describing JSON document. A client
+ **assumes nothing** beyond this contract: it validates the version and required
+ fields, then builds result links purely from the index's own metadata. Any site
+ that emits this shape is searchable; wiki-search is not GitHub-specific.
+
+ ```json
+ {
+   "v": 1,
+   "site": {
+     "name": "wiki-search wiki",
+     "urlTemplate": "https://github.com/uhop/wiki-search/wiki/{page}",
+     "fragments": true
+   },
+   "docs": [
+     {
+       "id": 0,
+       "page": "Index-Format",
+       "title": "Index format",
+       "heading": "Validation",
+       "anchor": "validation",
+       "text": "full plain-text of the section…"
+     }
+   ]
+ }
+ ```
+
+ ## Fields
+
+ | Field | Meaning |
+ |-------|---------|
+ | `v` | Format version. This document describes version `1`. Clients reject versions they don't understand. |
+ | `site.name` | Human-readable label for the corpus (shown in the UI). |
+ | `site.urlTemplate` | Result-URL template; **must contain `{page}`**. Clients take the host from here instead of hardcoding one. |
+ | `site.fragments` | `true` if the target renders [Text Fragments](https://developer.mozilla.org/docs/Web/Text_fragments). When `false`, clients omit the `:~:text=` directive. |
+ | `docs[]` | One entry per indexed section. |
+ | `doc.id` | Stable integer, sequential in build order. |
+ | `doc.page` | The `{page}` substitution; for GitHub wikis, the page's URL segment (`Foo-Bar`). |
+ | `doc.title` | Page display title. |
+ | `doc.heading` | Section heading (falls back to the page title for a page's preamble). |
+ | `doc.anchor` | In-page anchor slug; `""` means the page top. For GitHub, the heading's slug. |
+ | `doc.text` | Plain-text body of the section (Markdown stripped), for the search engine. |
+
+ ## Building a result URL
+
+ ```
+ base = urlTemplate.replace("{page}", encodeURIComponent(doc.page))
+ hash = doc.anchor || "" (omit if empty)
+ text = ":~:text=" + <percent-encoded matched phrase> (only if site.fragments and a phrase)
+ result = base + ("#" + hash + text if hash or text)
+ ```
+
+ So a hit links to, for example,
+ `https://github.com/uhop/wiki-search/wiki/Index-Format#validation:~:text=clients%20reject`.
+
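The pseudocode above can be sketched in JavaScript roughly as follows. This is a minimal illustration, not the app's actual code: `buildResultUrl` and `encodeDirective` are hypothetical names, and escaping `-` in the phrase is an assumption based on the text-directive syntax, where `-` marks a prefix or suffix.

```javascript
// Sketch only: buildResultUrl and encodeDirective are hypothetical names.
// encodeURIComponent already escapes "," and "&", but leaves "-" alone;
// "-" is syntax inside a text directive, so it is escaped here as well.
const encodeDirective = phrase => encodeURIComponent(phrase).replace(/-/g, '%2D');

const buildResultUrl = (site, doc, phrase = '') => {
  const base = site.urlTemplate.replace('{page}', encodeURIComponent(doc.page));
  const hash = doc.anchor || '';
  const text = site.fragments && phrase ? ':~:text=' + encodeDirective(phrase) : '';
  // Only append "#" when there is an anchor or a text directive to carry.
  return hash || text ? base + '#' + hash + text : base;
};

const site = {
  urlTemplate: 'https://github.com/uhop/wiki-search/wiki/{page}',
  fragments: true,
};
const doc = { page: 'Index-Format', anchor: 'validation' };
console.log(buildResultUrl(site, doc, 'clients reject'));
```

With the sample values this yields the example URL shown above; with `fragments: false` and an empty anchor it returns the bare page URL.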
+ ## Validation (verify-or-explain)
+
+ A client must check each of the following and, on any failure, show a specific
+ message (never a blank result box):
+
+ 1. the index is **fetchable** (else: 404 / network / no CORS);
+ 2. it is **valid JSON**;
+ 3. `v` is **supported** (else: format vN unsupported; app or index out of date);
+ 4. `site.urlTemplate` is present and contains `{page}`;
+ 5. `docs` is a non-empty array, each entry having `page`, `title`, `text`.
+
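The checklist above could be implemented along these lines. This is a sketch under stated assumptions: `validateIndex`, `loadIndex`, `SUPPORTED_VERSIONS`, and the injected `fetchText` callback are hypothetical names, the message wording is illustrative, and reading "having `page`, `title`, `text`" as "each is a string" is an interpretation rather than something the contract spells out.

```javascript
// Sketch only: all names here are hypothetical, not the app's API.
const SUPPORTED_VERSIONS = new Set([1]);

// Checks 3-5: returns null when the parsed index passes, else a specific message.
const validateIndex = index => {
  if (index === null || typeof index !== 'object') return 'index is not a JSON object';
  if (!SUPPORTED_VERSIONS.has(index.v))
    return `format v${index.v} unsupported: app or index out of date`;
  const tpl = index.site?.urlTemplate;
  if (typeof tpl !== 'string' || !tpl.includes('{page}'))
    return 'site.urlTemplate is missing or lacks {page}';
  if (!Array.isArray(index.docs) || index.docs.length === 0)
    return 'docs must be a non-empty array';
  const bad = index.docs.findIndex(
    d => !d || !['page', 'title', 'text'].every(k => typeof d[k] === 'string')
  );
  return bad === -1 ? null : `docs[${bad}] lacks page, title, or text`;
};

// Checks 1-2 wrap the fetch and the parse. fetchText(url) is injected so the
// sketch does not depend on any particular transport.
const loadIndex = async (url, fetchText) => {
  let raw;
  try {
    raw = await fetchText(url);
  } catch (e) {
    return { error: `cannot fetch index (404, network, or CORS): ${e.message}` };
  }
  let index;
  try {
    index = JSON.parse(raw);
  } catch {
    return { error: 'index is not valid JSON' };
  }
  const error = validateIndex(index);
  return error ? { error } : { index };
};
```

Returning a message instead of throwing keeps every failure path explicit, so the UI always has something specific to show.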
+ ## Versioning
+
+ `v` increases only on a breaking change to this shape. Additive, optional fields
+ do **not** bump `v`; clients ignore unknown fields. A client that meets a higher
+ `v` than it knows stops and says so rather than guessing.
+
+ The reference builder that emits this is `builder/wiki-index.mjs`.
package/LICENSE ADDED
@@ -0,0 +1,28 @@
+ BSD 3-Clause License
+
+ Copyright (c) 2026, Eugene Lazutkin
+
+ Redistribution and use in source and binary forms, with or without
+ modification, are permitted provided that the following conditions are met:
+
+ 1. Redistributions of source code must retain the above copyright notice, this
+    list of conditions and the following disclaimer.
+
+ 2. Redistributions in binary form must reproduce the above copyright notice,
+    this list of conditions and the following disclaimer in the documentation
+    and/or other materials provided with the distribution.
+
+ 3. Neither the name of the copyright holder nor the names of its
+    contributors may be used to endorse or promote products derived from
+    this software without specific prior written permission.
+
+ THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
+ FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
+ SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+ CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+ OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
package/README.md ADDED
@@ -0,0 +1,54 @@
+ # wiki-search
+
+ GitHub wikis are great for docs but have no real search. **wiki-search adds it:**
+ a bookmarklet (plus a hosted search page) that searches a wiki and takes you
+ straight to the matching section, without moving your docs off the wiki.
+
+ **▶ [Try the live demo & install the bookmarklet](https://uhop.github.io/wiki-search/)**
+
+ ## Use it
+
+ - **Search a wiki.** Drag the bookmarklet to your bookmarks bar, then open any
+   GitHub wiki that has a wiki-search index and search; each result takes you to
+   the exact section. ([Try it now](https://uhop.github.io/wiki-search/app/?wiki=uhop/wiki-search)
+   on this project's own wiki; no install needed.)
+ - **Add search to your own wiki.** Build an index from your Markdown. This needs
+   only [Node](https://nodejs.org); there is nothing to install:
+
+   ```bash
+   npx wiki-search-index --wiki ./your-wiki # → your-wiki/search-index.json
+   ```
+
+   Commit that `search-index.json` into your wiki, then add a one-line Search
+   section. The full guide, including how to keep the index fresh, is
+   [Add search to your wiki](https://github.com/uhop/wiki-search/wiki/Add-Search).
+
+ ## How it works
+
+ The bookmarklet opens the search page on its own GitHub Pages origin, so the
+ wiki page's security policy can't block it. There it loads a small JSON index
+ built from the wiki's Markdown and links each result with a
+ [text fragment](https://developer.mozilla.org/docs/Web/Text_fragments), so your
+ browser scrolls to the matched phrase and, where supported, highlights it. By
+ default a result re-uses your current tab; Back returns you. The index carries
+ its own URL template, so the same app works for any site with no hardcoded host.
+
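As a rough illustration of the first step, the bookmarklet's job can be reduced to mapping the current wiki URL to the search-page URL. The sketch below is an assumption-laden guess, not the shipped stub: `searchUrlFor` is a hypothetical helper, and `APP_URL` plus the `?wiki=owner/repo` parameter are taken from the demo link earlier in this README.

```javascript
// Sketch only: searchUrlFor is a hypothetical helper, not the shipped bookmarklet.
const APP_URL = 'https://uhop.github.io/wiki-search/app/';

// Map a GitHub wiki page URL to the search-page URL, or null when the
// current page is not a GitHub wiki.
const searchUrlFor = href => {
  const url = new URL(href);
  if (url.hostname !== 'github.com') return null;
  const [owner, repo, section] = url.pathname.split('/').filter(Boolean);
  if (!owner || !repo || section !== 'wiki') return null;
  return `${APP_URL}?wiki=${owner}/${repo}`;
};

// In a browser, the stub would then do something like:
//   const target = searchUrlFor(location.href);
//   if (target) window.open(target);
console.log(searchUrlFor('https://github.com/uhop/wiki-search/wiki/Index-Format'));
```

Keeping the URL mapping in a pure function means it can be checked outside a browser, while the `window.open` call stays a one-liner.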
+ <details>
+ <summary>Repo layout & local run</summary>
+
+ | Path | What |
+ |------|------|
+ | `index.html` | Landing + bookmarklet-install page (the Pages root). |
+ | `app/` | The search page (loads + validates an index, searches, links out). |
+ | `builder/` | `wiki-index` CLI: Markdown → the JSON index. |
+ | `engine/` | Search core: MiniSearch (vendored), with a zero-dep fallback. |
+ | `bookmarklet/` | The `window.open` stub + builder. |
+
+ Run locally: `python3 -m http.server` from the repo root, then open
+ <http://localhost:8000/app/?wiki=uhop/wiki-search>.
+
+ </details>
+
+ ## License
+
+ [BSD-3-Clause](LICENSE)
package/builder/README.md ADDED
@@ -0,0 +1,30 @@
+ # builder: `wiki-index`
+
+ Turns a directory of GitHub-wiki Markdown into a self-describing
+ [v1 index](../INDEX-FORMAT.md). It is dependency-free and its output is
+ deterministic, so a CI `git diff --exit-code` can catch a stale committed index.
+
+ ```bash
+ # Build the index for our own wiki (owner/repo + template inferred from git origin)
+ node builder/wiki-index.mjs --wiki ./wiki # → wiki/search-index.json
+ node builder/wiki-index.mjs --wiki ./wiki --stdout # print instead of writing
+
+ # Any other site: give the template explicitly
+ node builder/wiki-index.mjs --wiki ./docs --url-template 'https://example.com/d/{page}' --name 'Example docs'
+ ```
+
+ | Flag | Default | Meaning |
17
+ |------|---------|---------|
18
+ | `--wiki <dir>` | `./wiki` | Markdown source directory. |
19
+ | `--out <path>` | `<wiki>/search-index.json` | Where to write. |
20
+ | `--stdout` | — | Print the index instead of writing a file. |
21
+ | `--url-template <tpl>` | inferred | Result-URL template; must contain `{page}`. |
22
+ | `--repo <owner/repo>` | inferred | Build the GitHub template from this. |
23
+ | `--name <str>` | `<repo> wiki` | `site.name`. |
24
+
25
+ What it does: one section per ATX heading (plus a page-top preamble section),
26
+ GitHub-style heading anchors with `-1`/`-2` disambiguation, `#`-in-code-fence
27
+ ignored, markdown reduced to plain text for the engine. Pages named `_*.md`
28
+ (`_Sidebar`, `_Footer`) are treated as chrome and skipped.
29
+
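The fence rule can be illustrated with a simplified sketch. `findHeadings` is a hypothetical stand-in for the builder's section splitter, not its actual code; unlike the shipped splitter, which toggles on every fence line, this variant remembers which marker opened the fence, so a backtick fence line inside a `~~~` block does not close it.

```javascript
// Sketch only: findHeadings is a hypothetical, simplified stand-in for the
// builder's section splitter. The backtick fence is built with repeat() purely
// to keep a literal fence out of this snippet.
const TICKS = '`'.repeat(3);
const FENCE = new RegExp('^[ \\t]*(' + TICKS + '|~~~)');
const ATX = /^(#{1,6})\s+(.*?)\s*#*\s*$/;

const findHeadings = md => {
  const headings = [];
  let fence = null; // marker that opened the current fence, or null outside one
  for (const line of md.split(/\r?\n/)) {
    const f = FENCE.exec(line);
    if (f) {
      if (fence === null) fence = f[1]; // opening fence
      else if (fence === f[1]) fence = null; // matching closing fence
      continue; // a different marker inside a fence is just code
    }
    const m = fence === null ? ATX.exec(line) : null;
    if (m) headings.push({ level: m[1].length, heading: m[2].trim() });
  }
  return headings;
};

const sample = ['# Title', '~~~', '# shell comment', TICKS, '# still code', '~~~', '## Real'].join('\n');
console.log(findHeadings(sample).map(h => h.heading)); // → [ 'Title', 'Real' ]
```

Only `Title` and `Real` are reported; the two `#` lines inside the fence are treated as code.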
+ Tests: `node builder/test/run.mjs`.
package/builder/lib/build-index.mjs ADDED
@@ -0,0 +1,45 @@
+ // builder/lib/build-index.mjs: assemble a self-describing v1 index from a
+ // directory of GitHub-wiki Markdown files.
+ //
+ // Deterministic by construction: pages sorted by filename, sections in document
+ // order, sequential ids, no timestamps, so a CI `git diff --exit-code` can
+ // catch a stale committed index.
+
+ import { readdir, readFile } from 'node:fs/promises';
+ import { join } from 'node:path';
+ import { splitSections } from './markdown.mjs';
+ import { createSlugger } from './slug.mjs';
+
+ // GitHub stores page "Foo Bar" as Foo-Bar.md, and special pages (_Sidebar,
+ // _Footer, …) start with an underscore; those are chrome, not content.
+ const isContentPage = name => name.endsWith('.md') && !name.startsWith('_');
+
+ const firstH1 = md => {
+   const m = /^#[ \t]+(.+?)[ \t]*#*[ \t]*$/m.exec(md); // [ \t], not \s, so a bare "#" line cannot swallow the next line
+   return m ? m[1].trim() : null;
+ };
+
+ export const buildIndex = async ({ wikiDir, urlTemplate, siteName, fragments = true }) => {
+   const files = (await readdir(wikiDir)).filter(isContentPage).sort();
+   const docs = [];
+   let id = 0;
+
+   for (const file of files) {
+     const page = file.slice(0, -3); // URL segment: Foo-Bar.md → {page}=Foo-Bar → /wiki/Foo-Bar
+     const md = await readFile(join(wikiDir, file), 'utf8');
+     const title = firstH1(md) || page.replace(/-/g, ' ');
+     const slug = createSlugger();
+     for (const sec of splitSections(md)) {
+       docs.push({
+         id: id++,
+         page,
+         title,
+         heading: sec.heading ?? title,
+         anchor: sec.heading ? slug(sec.heading) : '', // preamble → page top (no anchor)
+         text: sec.text || sec.heading || title,
+       });
+     }
+   }
+
+   return { v: 1, site: { name: siteName, urlTemplate, fragments }, docs };
+ };
package/builder/lib/markdown.mjs ADDED
@@ -0,0 +1,42 @@
+ // builder/lib/markdown.mjs: split a Markdown page into heading-delimited
+ // sections and reduce each to searchable plain text. Dependency-free and
+ // intentionally approximate: it serves retrieval, not faithful rendering.
+
+ const FENCE = /^[ \t]*(```|~~~)/;
+ const ATX = /^(#{1,6})\s+(.*?)\s*#*\s*$/;
+
+ // Split markdown into sections. Text before the first heading becomes a
+ // preamble section with heading=null, level=0. `#` inside fenced code is
+ // ignored so code comments don't masquerade as headings.
+ export const splitSections = md => {
+   const sections = [{ level: 0, heading: null, lines: [] }];
+   let inFence = false;
+   for (const line of md.split(/\r?\n/)) {
+     if (FENCE.test(line)) inFence = !inFence;
+     const m = inFence ? null : ATX.exec(line);
+     if (m) sections.push({ level: m[1].length, heading: m[2].trim(), lines: [] });
+     else sections.at(-1).lines.push(line);
+   }
+   return sections
+     .map(s => ({ level: s.level, heading: s.heading, text: toPlainText(s.lines.join('\n')) }))
+     .filter(s => s.heading || s.text); // drop an empty preamble
+ };
+
+ // Reduce Markdown to plain, collapsed text. Code *text* is kept (API names are
+ // worth searching); only the fence delimiters are removed.
+ export const toPlainText = md =>
+   md
+     .replace(/^[ \t]*(```|~~~).*$/gm, ' ') // fence delimiters (keep the code text)
+     .replace(/`([^`]*)`/g, '$1') // inline code → its text
+     .replace(/\[\[([^\]|]+)\|([^\]]+)\]\]/g, '$1') // [[Display|Page]] wiki link → Display
+     .replace(/\[\[([^\]]+)\]\]/g, '$1') // [[Page]] wiki link → Page
+     .replace(/!\[([^\]]*)\]\([^)]*\)/g, '$1') // image → alt
+     .replace(/\[([^\]]*)\]\([^)]*\)/g, '$1') // link → text
+     .replace(/<[^>]+>/g, ' ') // strip HTML tags
+     .replace(/^[ \t>]*>+/gm, ' ') // blockquote markers
+     .replace(/^\s{0,3}([-*+]|\d+\.)\s+/gm, ' ') // list markers
+     .replace(/[*~]+|(?<![\p{L}\p{N}_])_+|_+(?![\p{L}\p{N}_])/gu, '') // emphasis; intraword "_" kept so snake_case names stay searchable
+     .replace(/^\s*\|.*$/gm, m => m.replace(/\|/g, ' ')) // table pipes
+     .replace(/^#{1,6}\s+/gm, '') // stray heading marks
+     .replace(/\s+/g, ' ')
+     .trim();
package/builder/lib/slug.mjs ADDED
@@ -0,0 +1,27 @@
+ // builder/lib/slug.mjs: GitHub-style heading slugs (approximates github-slugger).
+ //
+ // GitHub lowercases a heading, strips most punctuation, turns spaces into
+ // hyphens, and disambiguates repeats *within a page* as -1, -2, …. This is a
+ // close approximation, good enough for English docs; S2/S3 verify the anchors
+ // against real rendered pages and flag any divergence.
+
+ export const slugify = text =>
+   text
+     .trim()
+     .toLowerCase()
+     .replace(/\s+/g, ' ') // collapse source whitespace, as HTML rendering does
+     .replace(/[^\p{L}\p{N}_ -]+/gu, '') // drop punctuation/symbols (em dash, quotes, colon, parens…)
+     .replace(/ /g, '-'); // each remaining space → one hyphen, so a removed char
+     // between two spaces yields "--", matching github-slugger
+
+ // A per-page deduping slugger: call the returned fn on each heading in document
+ // order so duplicates get GitHub's -1 / -2 / … suffixes.
+ export const createSlugger = () => {
+   const seen = new Map();
+   return text => {
+     const base = slugify(text);
+     const n = seen.get(base) ?? 0;
+     seen.set(base, n + 1);
+     return n === 0 ? base : `${base}-${n}`;
+   };
+ };
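One edge case of the counter-based slugger above: the headings "Foo", "Foo 1", "Foo" produce `foo`, `foo-1`, and `foo-1` again, because the suffix generated for the repeat collides with a heading that already slugs to `foo-1`. A collision-aware variant might look like the sketch below. `createStrictSlugger` is a hypothetical name, and whether GitHub resolves this case the same way is an assumption that should be checked against a rendered page.

```javascript
// Sketch only: createStrictSlugger is a hypothetical variant, not the shipped code.
// slugify is restated from slug.mjs so the example runs on its own.
const slugify = text =>
  text
    .trim()
    .toLowerCase()
    .replace(/\s+/g, ' ')
    .replace(/[^\p{L}\p{N}_ -]+/gu, '')
    .replace(/ /g, '-');

const createStrictSlugger = () => {
  const used = new Set(); // every slug handed out on this page
  const counts = new Map(); // next suffix to try for each base slug
  return text => {
    const base = slugify(text);
    let n = counts.get(base) ?? 0;
    let slug = n === 0 ? base : `${base}-${n}`;
    // Skip any candidate that an earlier heading already claimed.
    while (used.has(slug)) slug = `${base}-${++n}`;
    counts.set(base, n + 1);
    used.add(slug);
    return slug;
  };
};

const slug = createStrictSlugger();
console.log(['Foo', 'Foo 1', 'Foo'].map(slug)); // → [ 'foo', 'foo-1', 'foo-2' ]
```

The extra `Set` costs one lookup per heading and guarantees that no two sections on a page share an anchor.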
package/builder/wiki-index.mjs ADDED
@@ -0,0 +1,75 @@
+ #!/usr/bin/env node
+ // builder/wiki-index.mjs: CLI for GitHub-wiki Markdown → self-describing v1 index.
+ //
+ //   node builder/wiki-index.mjs [--wiki ./wiki] [--out <path>]
+ //                               [--url-template <tpl>] [--name "<site name>"]
+ //                               [--repo owner/repo] [--stdout]
+ //
+ // With neither --url-template nor --repo, it infers owner/repo from the wiki
+ // dir's git origin (…/<owner>/<repo>.wiki.git) and builds the GitHub template.
+ // Default --out is <wiki>/search-index.json (the index is hosted from the wiki).
+
+ import { writeFile } from 'node:fs/promises';
+ import { join, resolve } from 'node:path';
+ import { execFile } from 'node:child_process';
+ import { promisify } from 'node:util';
+ import { buildIndex } from './lib/build-index.mjs';
+
+ const run = promisify(execFile);
+
+ const BOOLEAN_FLAGS = new Set(['stdout']);
+
+ // Accept both --key=value and --key value; flags in BOOLEAN_FLAGS (and a --key
+ // followed by another --flag or nothing) are valueless booleans.
+ const parseArgs = argv => {
+   const args = {};
+   for (let i = 0; i < argv.length; ++i) {
+     const m = /^--([^=]+)(?:=(.*))?$/.exec(argv[i]);
+     if (!m) continue;
+     const key = m[1];
+     if (m[2] !== undefined) { args[key] = m[2]; continue; }
+     const next = argv[i + 1];
+     if (!BOOLEAN_FLAGS.has(key) && next !== undefined && !next.startsWith('--')) args[key] = argv[++i];
+     else args[key] = true;
+   }
+   return args;
+ };
+
+ // owner/repo from the wiki clone's origin; an optional `.wiki` before the
+ // required `.git` suffix is dropped.
+ const inferRepo = async wikiDir => {
+   try {
+     const { stdout } = await run('git', ['-C', wikiDir, 'remote', 'get-url', 'origin']);
+     const m = /[/:]([^/]+)\/([^/]+?)(?:\.wiki)?\.git$/.exec(stdout.trim());
+     return m ? `${m[1]}/${m[2]}` : null;
+   } catch {
+     return null;
+   }
+ };
+
+ const main = async () => {
+   const args = parseArgs(process.argv.slice(2));
+   const wikiDir = resolve(args.wiki || './wiki');
+
+   let repo = args.repo || null;
+   if (!args['url-template'] && !repo) repo = await inferRepo(wikiDir);
+
+   const urlTemplate = args['url-template'] || (repo && `https://github.com/${repo}/wiki/{page}`);
+   if (!urlTemplate) {
+     console.error('wiki-index: need --url-template or --repo owner/repo (could not infer from git origin).');
+     process.exit(2);
+   }
+   const siteName = args.name || (repo ? `${repo.split('/')[1]} wiki` : 'wiki');
+
+   const index = await buildIndex({ wikiDir, urlTemplate, siteName });
+   const json = JSON.stringify(index, null, 2) + '\n';
+
+   if (args.stdout) { process.stdout.write(json); return; }
+
+   const out = resolve(args.out || join(wikiDir, 'search-index.json'));
+   await writeFile(out, json);
+   const pages = new Set(index.docs.map(d => d.page)).size;
+   console.error(`wiki-index: ${index.docs.length} sections from ${pages} page(s) → ${out}`);
+   console.error(` site "${siteName}" · ${urlTemplate}`);
+ };
+
+ // Report failures (unreadable --wiki dir, unwritable --out) as one line rather than a stack trace.
+ main().catch(err => {
+   console.error(`wiki-index: ${err.message}`);
+   process.exit(1);
+ });
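To see which origin shapes the inference accepts, the pattern from `inferRepo` can be exercised without running git. `repoFromOrigin` and the sample URLs below are hypothetical and exist only for this illustration; the regular expression is copied from the CLI above.

```javascript
// Sketch only: repoFromOrigin is a hypothetical pure helper that reuses the
// pattern from inferRepo so it can be tried without a git checkout.
const ORIGIN = /[/:]([^/]+)\/([^/]+?)(?:\.wiki)?\.git$/;

const repoFromOrigin = url => {
  const m = ORIGIN.exec(url.trim());
  return m ? `${m[1]}/${m[2]}` : null;
};

const samples = [
  'https://github.com/uhop/wiki-search.wiki.git', // HTTPS wiki clone
  'git@github.com:uhop/wiki-search.wiki.git', // SSH wiki clone
  'https://github.com/uhop/wiki-search.git', // main repo clone
  'https://github.com/uhop/wiki-search.wiki', // no .git suffix
];
for (const url of samples) console.log(url, '→', repoFromOrigin(url));
```

The first three samples resolve to `uhop/wiki-search`. The last returns `null` because the pattern requires a trailing `.git`, in which case the CLI needs `--repo` or `--url-template`.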
package/package.json ADDED
@@ -0,0 +1,38 @@
+ {
+ "name": "wiki-search-index",
+ "version": "0.1.0",
+ "description": "Build a self-describing search index from a GitHub wiki (or any Markdown docs) — the indexer for wiki-search.",
+ "type": "module",
+ "bin": {
+ "wiki-search-index": "builder/wiki-index.mjs"
+ },
+ "files": [
+ "builder/wiki-index.mjs",
+ "builder/lib",
+ "builder/README.md",
+ "INDEX-FORMAT.md"
+ ],
+ "engines": {
+ "node": ">=18"
+ },
+ "keywords": [
+ "wiki",
+ "search",
+ "github-wiki",
+ "search-index",
+ "bookmarklet",
+ "documentation",
+ "wiki-search",
+ "static-search"
+ ],
+ "homepage": "https://uhop.github.io/wiki-search/",
+ "repository": {
+ "type": "git",
+ "url": "git+https://github.com/uhop/wiki-search.git"
+ },
+ "bugs": {
+ "url": "https://github.com/uhop/wiki-search/issues"
+ },
+ "author": "Eugene Lazutkin",
+ "license": "BSD-3-Clause"
+ }