wiki-search-index 0.1.0 → 0.1.2
- package/INDEX-FORMAT.md +13 -13
- package/README.md +19 -8
- package/builder/README.md +8 -8
- package/builder/lib/build-index.mjs +7 -7
- package/builder/lib/markdown.mjs +70 -10
- package/builder/lib/slug.mjs +6 -6
- package/builder/wiki-index.mjs +20 -11
- package/llms-full.txt +156 -0
- package/llms.txt +73 -0
- package/package.json +19 -3
package/INDEX-FORMAT.md
CHANGED
|
@@ -28,19 +28,19 @@ that emits this shape is searchable — wiki-search is not GitHub-specific.
|
|
|
28
28
|
|
|
29
29
|
## Fields
|
|
30
30
|
|
|
31
|
-
| Field
|
|
32
|
-
|
|
33
|
-
| `v`
|
|
34
|
-
| `site.name`
|
|
35
|
-
| `site.urlTemplate` | Result-URL template; **must contain `{page}`**. No hardcoded host.
|
|
36
|
-
| `site.fragments`
|
|
37
|
-
| `docs[]`
|
|
38
|
-
| `doc.id`
|
|
39
|
-
| `doc.page`
|
|
40
|
-
| `doc.title`
|
|
41
|
-
| `doc.heading`
|
|
42
|
-
| `doc.anchor`
|
|
43
|
-
| `doc.text`
|
|
31
|
+
| Field | Meaning |
|
|
32
|
+
| ------------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
|
33
|
+
| `v` | Format version. This document is `1`. Clients reject versions they don't understand. |
|
|
34
|
+
| `site.name` | Human label for the corpus (shown in the UI). |
|
|
35
|
+
| `site.urlTemplate` | Result-URL template; **must contain `{page}`**. No hardcoded host. |
|
|
36
|
+
| `site.fragments` | `true` if the target renders [Text Fragments](https://developer.mozilla.org/docs/Web/Text_fragments). When `false`, clients omit the `:~:text=` directive. |
|
|
37
|
+
| `docs[]` | One entry per indexed section. |
|
|
38
|
+
| `doc.id` | Stable integer, sequential in build order. |
|
|
39
|
+
| `doc.page` | The `{page}` substitution — for GitHub wikis, the page's URL segment (`Foo-Bar`). |
|
|
40
|
+
| `doc.title` | Page display title. |
|
|
41
|
+
| `doc.heading` | Section heading (falls back to the page title for a page's preamble). |
|
|
42
|
+
| `doc.anchor` | In-page anchor slug; `""` means the page top. For GitHub, the heading's slug. |
|
|
43
|
+
| `doc.text` | Plain-text body of the section (markdown stripped), for the search engine. |
|
|
44
44
|
|
|
45
45
|
## Building a result URL
|
|
46
46
|
|
package/README.md
CHANGED
|
@@ -1,4 +1,7 @@
|
|
|
1
|
-
# wiki-search
|
|
1
|
+
# wiki-search [![NPM version][npm-img]][npm-url]
|
|
2
|
+
|
|
3
|
+
[npm-img]: https://img.shields.io/npm/v/wiki-search-index.svg
|
|
4
|
+
[npm-url]: https://npmjs.org/package/wiki-search-index
|
|
2
5
|
|
|
3
6
|
GitHub wikis are great for docs but have no real search. **wiki-search adds it:**
|
|
4
7
|
a bookmarklet (plus a hosted search page) that searches a wiki and takes you
|
|
@@ -36,19 +39,27 @@ its own URL template, so the same app works for any site with no hardcoded host.
|
|
|
36
39
|
<details>
|
|
37
40
|
<summary>Repo layout & local run</summary>
|
|
38
41
|
|
|
39
|
-
| Path
|
|
40
|
-
|
|
41
|
-
| `index.html`
|
|
42
|
-
| `app/`
|
|
43
|
-
| `builder/`
|
|
44
|
-
| `engine/`
|
|
45
|
-
| `bookmarklet/` | The
|
|
42
|
+
| Path | What |
|
|
43
|
+
| -------------- | ------------------------------------------------------------------- |
|
|
44
|
+
| `index.html` | Landing + bookmarklet-install page (the Pages root). |
|
|
45
|
+
| `app/` | The search page (loads + validates an index, searches, links out). |
|
|
46
|
+
| `builder/` | `wiki-index` CLI: Markdown → the JSON index. |
|
|
47
|
+
| `engine/` | Search core: MiniSearch (vendored), with a zero-dep fallback. |
|
|
48
|
+
| `bookmarklet/` | The bookmarklet — one constant, imported by the app + install page. |
|
|
46
49
|
|
|
47
50
|
Run locally: `python3 -m http.server` from the repo root, then open
|
|
48
51
|
<http://localhost:8000/app/?wiki=uhop/wiki-search>.
|
|
49
52
|
|
|
50
53
|
</details>
|
|
51
54
|
|
|
55
|
+
## Release notes
|
|
56
|
+
|
|
57
|
+
- 0.1.2 _Fix: the published CLI was missing its execute bit, so `npx wiki-search-index` failed — now runnable._
|
|
58
|
+
- 0.1.1 _HTML entities (`&mdash;`, `&#1234;`, …) are decoded so they no longer pollute the index._
|
|
59
|
+
- 0.1.0 _Initial release of the `wiki-search-index` builder._
|
|
60
|
+
|
|
61
|
+
For the full history see the wiki: [Release notes](https://github.com/uhop/wiki-search/wiki/Release-notes).
|
|
62
|
+
|
|
52
63
|
## License
|
|
53
64
|
|
|
54
65
|
[BSD-3-Clause](LICENSE)
|
package/builder/README.md
CHANGED
|
@@ -13,14 +13,14 @@ node builder/wiki-index.mjs --wiki ./wiki --stdout # print instead of wri
|
|
|
13
13
|
node builder/wiki-index.mjs --wiki ./docs --url-template 'https://example.com/d/{page}' --name 'Example docs'
|
|
14
14
|
```
|
|
15
15
|
|
|
16
|
-
| Flag
|
|
17
|
-
|
|
18
|
-
| `--wiki <dir>`
|
|
19
|
-
| `--out <path>`
|
|
20
|
-
| `--stdout`
|
|
21
|
-
| `--url-template <tpl>` | inferred
|
|
22
|
-
| `--repo <owner/repo>`
|
|
23
|
-
| `--name <str>`
|
|
16
|
+
| Flag | Default | Meaning |
|
|
17
|
+
| ---------------------- | -------------------------- | ------------------------------------------- |
|
|
18
|
+
| `--wiki <dir>` | `./wiki` | Markdown source directory. |
|
|
19
|
+
| `--out <path>` | `<wiki>/search-index.json` | Where to write. |
|
|
20
|
+
| `--stdout` | — | Print the index instead of writing a file. |
|
|
21
|
+
| `--url-template <tpl>` | inferred | Result-URL template; must contain `{page}`. |
|
|
22
|
+
| `--repo <owner/repo>` | inferred | Build the GitHub template from this. |
|
|
23
|
+
| `--name <str>` | `<repo> wiki` | `site.name`. |
|
|
24
24
|
|
|
25
25
|
What it does: one section per ATX heading (plus a page-top preamble section),
|
|
26
26
|
GitHub-style heading anchors with `-1`/`-2` disambiguation, `#`-in-code-fence
|
|
@@ -5,10 +5,10 @@
|
|
|
5
5
|
// order, sequential ids, no timestamps — so a CI `git diff --exit-code` can gate
|
|
6
6
|
// a stale committed index.
|
|
7
7
|
|
|
8
|
-
import {
|
|
9
|
-
import {
|
|
10
|
-
import {
|
|
11
|
-
import {
|
|
8
|
+
import {readdir, readFile} from 'node:fs/promises';
|
|
9
|
+
import {join} from 'node:path';
|
|
10
|
+
import {splitSections} from './markdown.mjs';
|
|
11
|
+
import {createSlugger} from './slug.mjs';
|
|
12
12
|
|
|
13
13
|
// GitHub stores page "Foo Bar" as Foo-Bar.md and special pages (_Sidebar,
|
|
14
14
|
// _Footer, …) start with an underscore — those are chrome, not content.
|
|
@@ -19,7 +19,7 @@ const firstH1 = md => {
|
|
|
19
19
|
return m ? m[1].trim() : null;
|
|
20
20
|
};
|
|
21
21
|
|
|
22
|
-
export const buildIndex = async ({
|
|
22
|
+
export const buildIndex = async ({wikiDir, urlTemplate, siteName, fragments = true}) => {
|
|
23
23
|
const files = (await readdir(wikiDir)).filter(isContentPage).sort();
|
|
24
24
|
const docs = [];
|
|
25
25
|
let id = 0;
|
|
@@ -36,10 +36,10 @@ export const buildIndex = async ({ wikiDir, urlTemplate, siteName, fragments = t
|
|
|
36
36
|
title,
|
|
37
37
|
heading: sec.heading ?? title,
|
|
38
38
|
anchor: sec.heading ? slug(sec.heading) : '', // preamble → page top (no anchor)
|
|
39
|
-
text: sec.text || sec.heading || title
|
|
39
|
+
text: sec.text || sec.heading || title
|
|
40
40
|
});
|
|
41
41
|
}
|
|
42
42
|
}
|
|
43
43
|
|
|
44
|
-
return {
|
|
44
|
+
return {v: 1, site: {name: siteName, urlTemplate, fragments}, docs};
|
|
45
45
|
};
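Review note: the record-building step of `buildIndex` can be sketched in isolation, using only the field fallbacks visible in this hunk (`heading` falls back to the page title, `anchor` is empty for a preamble, `text` falls back to the heading and then the title). This is a minimal sketch on in-memory input rather than files on disk; `toIndex`, the `pages` input shape, and `simpleSlug` (a stand-in for the package's `createSlugger`, without deduping) are assumptions for illustration, not the package's API.

```javascript
// Hypothetical stand-in for createSlugger() from slug.mjs (no -1/-2 deduping).
const simpleSlug = text =>
  text
    .trim()
    .toLowerCase()
    .replace(/[^\p{L}\p{N}_ -]+/gu, '')
    .replace(/ /g, '-');

// pages: [{page, title, sections: [{heading, text}]}] is an assumed input shape.
const toIndex = ({pages, urlTemplate, siteName, fragments = true}) => {
  const docs = [];
  let id = 0;
  for (const {page, title, sections} of pages) {
    for (const sec of sections) {
      docs.push({
        id: id++, // sequential in build order
        page,
        title,
        heading: sec.heading ?? title, // a preamble borrows the page title
        anchor: sec.heading ? simpleSlug(sec.heading) : '', // preamble → page top
        text: sec.text || sec.heading || title // never index an empty body
      });
    }
  }
  return {v: 1, site: {name: siteName, urlTemplate, fragments}, docs};
};

const sampleIndex = toIndex({
  pages: [
    {
      page: 'Home',
      title: 'Home',
      sections: [
        {heading: null, text: 'Welcome.'},
        {heading: 'Getting started', text: ''}
      ]
    }
  ],
  urlTemplate: 'https://example.com/docs/{page}',
  siteName: 'Example docs'
});
console.log(JSON.stringify(sampleIndex, null, 2));
```

The second section has no body, so its `text` falls back to the heading; that keeps every record searchable by at least its own heading.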
|
package/builder/lib/markdown.mjs
CHANGED
|
@@ -9,34 +9,94 @@ const ATX = /^(#{1,6})\s+(.*?)\s*#*\s*$/;
|
|
|
9
9
|
// preamble section with heading=null, level=0. `#` inside fenced code is
|
|
10
10
|
// ignored so code comments don't masquerade as headings.
|
|
11
11
|
export const splitSections = md => {
|
|
12
|
-
const sections = [{
|
|
12
|
+
const sections = [{level: 0, heading: null, lines: []}];
|
|
13
13
|
let inFence = false;
|
|
14
14
|
for (const line of md.split(/\r?\n/)) {
|
|
15
15
|
if (FENCE.test(line)) inFence = !inFence;
|
|
16
16
|
const m = inFence ? null : ATX.exec(line);
|
|
17
|
-
if (m) sections.push({
|
|
17
|
+
if (m) sections.push({level: m[1].length, heading: m[2].trim(), lines: []});
|
|
18
18
|
else sections.at(-1).lines.push(line);
|
|
19
19
|
}
|
|
20
20
|
return sections
|
|
21
|
-
.map(s => ({
|
|
21
|
+
.map(s => ({level: s.level, heading: s.heading, text: toPlainText(s.lines.join('\n'))}))
|
|
22
22
|
.filter(s => s.heading || s.text); // drop an empty preamble
|
|
23
23
|
};
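Review note: a condensed, runnable copy of `splitSections` makes the fence handling easier to check by hand. The `ATX` pattern is taken from the hunk header above; the `FENCE` pattern is not shown in this diff, so the one below is an assumption, and the whitespace collapse stands in for the real `toPlainText`.

```javascript
const ATX = /^(#{1,6})\s+(.*?)\s*#*\s*$/; // as shown in the hunk header
const FENCE = /^[ \t]*(`{3}|~{3})/; // assumption: the real FENCE is not in this diff

const splitSectionsSketch = md => {
  const sections = [{level: 0, heading: null, lines: []}];
  let inFence = false;
  for (const line of md.split(/\r?\n/)) {
    if (FENCE.test(line)) inFence = !inFence;
    const m = inFence ? null : ATX.exec(line); // '#' inside a fence is not a heading
    if (m) sections.push({level: m[1].length, heading: m[2].trim(), lines: []});
    else sections.at(-1).lines.push(line);
  }
  return sections
    .map(s => ({
      level: s.level,
      heading: s.heading,
      // Simplified stand-in for toPlainText(): collapse whitespace only.
      text: s.lines.join('\n').replace(/\s+/g, ' ').trim()
    }))
    .filter(s => s.heading || s.text); // drop an empty preamble
};

const fence = '`'.repeat(3);
const sampleMd = [
  'Intro text.',
  '# Title',
  'Body.',
  fence + 'sh',
  '# a shell comment, not a heading',
  fence,
  '## Sub ##',
  'More.'
].join('\n');

const sampleSections = splitSectionsSketch(sampleMd);
console.log(sampleSections.map(s => [s.level, s.heading]));
```

The shell comment stays inside the `Title` section's text instead of opening a new section, and the closing `##` on the last heading is trimmed by `ATX`.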
|
|
24
24
|
|
|
25
|
+
// Common named HTML entities found in wiki prose — typographic punctuation and
|
|
26
|
+
// symbols. Each maps to its character so it drops out at tokenization (— is
|
|
27
|
+
// punctuation, not a word) instead of surviving as a junk term ("mdash"), and so
|
|
28
|
+
// snippets render the glyph rather than the literal "&mdash;".
|
|
29
|
+
const NAMED_ENTITIES = {
|
|
30
|
+
amp: '&',
|
|
31
|
+
lt: '<',
|
|
32
|
+
gt: '>',
|
|
33
|
+
quot: '"',
|
|
34
|
+
apos: "'",
|
|
35
|
+
nbsp: ' ',
|
|
36
|
+
mdash: '—',
|
|
37
|
+
ndash: '–',
|
|
38
|
+
hellip: '…',
|
|
39
|
+
bull: '•',
|
|
40
|
+
middot: '·',
|
|
41
|
+
copy: '©',
|
|
42
|
+
reg: '®',
|
|
43
|
+
trade: '™',
|
|
44
|
+
sect: '§',
|
|
45
|
+
para: '¶',
|
|
46
|
+
deg: '°',
|
|
47
|
+
plusmn: '±',
|
|
48
|
+
times: '×',
|
|
49
|
+
divide: '÷',
|
|
50
|
+
minus: '−',
|
|
51
|
+
frasl: '⁄',
|
|
52
|
+
laquo: '«',
|
|
53
|
+
raquo: '»',
|
|
54
|
+
lsquo: '‘',
|
|
55
|
+
rsquo: '’',
|
|
56
|
+
ldquo: '“',
|
|
57
|
+
rdquo: '”',
|
|
58
|
+
larr: '←',
|
|
59
|
+
rarr: '→',
|
|
60
|
+
uarr: '↑',
|
|
61
|
+
darr: '↓',
|
|
62
|
+
harr: '↔',
|
|
63
|
+
le: '≤',
|
|
64
|
+
ge: '≥',
|
|
65
|
+
ne: '≠',
|
|
66
|
+
prime: '′',
|
|
67
|
+
Prime: '″'
|
|
68
|
+
};
|
|
69
|
+
|
|
70
|
+
// Resolve HTML entities so they never pollute the term index. Numeric entities
|
|
71
|
+
// (&#1234; / &#128269;) decode generally — preserving genuine letters that happen
|
|
72
|
+
// to be encoded (&#945; → α) while the typographic noise (—, arrows, emoji)
|
|
73
|
+
// decodes to punctuation/symbols the tokenizer discards. Known named entities map
|
|
74
|
+
// via the table above; anything else unknown collapses to a space so it can't
|
|
75
|
+
// become a junk term ("mdash", "128269").
|
|
76
|
+
const ENTITY_RE = /&(#x[0-9a-fA-F]+|#\d+|[a-zA-Z][a-zA-Z0-9]*);/g;
|
|
77
|
+
const decodeEntity = (_m, body) => {
|
|
78
|
+
if (body[0] !== '#') return NAMED_ENTITIES[body] ?? ' ';
|
|
79
|
+
const cp =
|
|
80
|
+
body[1] === 'x' || body[1] === 'X' ? parseInt(body.slice(2), 16) : parseInt(body.slice(1), 10);
|
|
81
|
+
return cp > 0 && cp <= 0x10ffff ? String.fromCodePoint(cp) : ' ';
|
|
82
|
+
};
|
|
83
|
+
|
|
25
84
|
// Reduce Markdown to plain, collapsed text. Code *text* is kept (API names are
|
|
26
85
|
// worth searching) — only the fence delimiters are removed.
|
|
27
86
|
export const toPlainText = md =>
|
|
28
87
|
md
|
|
29
88
|
.replace(/^[ \t]*(```|~~~).*$/gm, ' ') // fence delimiters (keep the code text)
|
|
30
|
-
.replace(/`([^`]*)`/g, '$1')
|
|
89
|
+
.replace(/`([^`]*)`/g, '$1') // inline code → its text
|
|
31
90
|
.replace(/\[\[([^\]|]+)\|([^\]]+)\]\]/g, '$1') // [[Display|Page]] wiki link → Display
|
|
32
|
-
.replace(/\[\[([^\]]+)\]\]/g, '$1')
|
|
91
|
+
.replace(/\[\[([^\]]+)\]\]/g, '$1') // [[Page]] wiki link → Page
|
|
33
92
|
.replace(/!\[([^\]]*)\]\([^)]*\)/g, '$1') // image → alt
|
|
34
|
-
.replace(/\[([^\]]*)\]\([^)]*\)/g, '$1')
|
|
35
|
-
.replace(/<[^>]+>/g, ' ')
|
|
36
|
-
.replace(
|
|
93
|
+
.replace(/\[([^\]]*)\]\([^)]*\)/g, '$1') // link → text
|
|
94
|
+
.replace(/<[^>]+>/g, ' ') // strip HTML tags
|
|
95
|
+
.replace(ENTITY_RE, decodeEntity) // HTML entities → glyph (drops &mdash;/&#1234; noise)
|
|
96
|
+
.replace(/^[ \t>]*>+/gm, ' ') // blockquote markers
|
|
37
97
|
.replace(/^\s{0,3}([-*+]|\d+\.)\s+/gm, ' ') // list markers
|
|
38
|
-
.replace(/[*_~]+/g, '')
|
|
98
|
+
.replace(/[*_~]+/g, '') // emphasis
|
|
39
99
|
.replace(/^\s*\|.*$/gm, m => m.replace(/\|/g, ' ')) // table pipes
|
|
40
|
-
.replace(/^#{1,6}\s+/gm, '')
|
|
100
|
+
.replace(/^#{1,6}\s+/gm, '') // stray heading marks
|
|
41
101
|
.replace(/\s+/g, ' ')
|
|
42
102
|
.trim();
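Review note: the entity-decoding step added in this file can be exercised on its own. The sketch below copies `ENTITY_RE` and `decodeEntity` from the hunk and pairs them with a three-entry excerpt of the named-entity table; the `decodeEntities` wrapper is a hypothetical helper added only for the demonstration.

```javascript
// Three-entry excerpt of the NAMED_ENTITIES table from the diff.
const NAMED_ENTITIES = {amp: '&', mdash: '\u2014', rarr: '\u2192'};

const ENTITY_RE = /&(#x[0-9a-fA-F]+|#\d+|[a-zA-Z][a-zA-Z0-9]*);/g;
const decodeEntity = (_m, body) => {
  // Named entity: known names map to a glyph, unknown names collapse to a space.
  if (body[0] !== '#') return NAMED_ENTITIES[body] ?? ' ';
  // Numeric entity: hexadecimal (#x…) or decimal (#…).
  const cp =
    body[1] === 'x' || body[1] === 'X' ? parseInt(body.slice(2), 16) : parseInt(body.slice(1), 10);
  return cp > 0 && cp <= 0x10ffff ? String.fromCodePoint(cp) : ' ';
};

// Hypothetical wrapper, for demonstration only.
const decodeEntities = text => text.replace(ENTITY_RE, decodeEntity);

console.log(decodeEntities('a &mdash; b')); // named entity from the table
console.log(decodeEntities('&#945; and &#x3B1;')); // decimal and hex forms of the same letter
console.log(decodeEntities('x&bogus;y')); // unknown name becomes a space
console.log(decodeEntities('&#X41;')); // uppercase X: not matched by ENTITY_RE
```

One observation from running this by hand: `ENTITY_RE` only accepts a lowercase `x`, so the `body[1] === 'X'` branch appears unreachable and an input such as `&#X41;` passes through undecoded. If uppercase hex prefixes matter for real wiki content, widening the pattern to `#[xX]` would be one way to cover them.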
|
package/builder/lib/slug.mjs
CHANGED
|
@@ -2,17 +2,17 @@
|
|
|
2
2
|
//
|
|
3
3
|
// GitHub lowercases a heading, strips most punctuation, turns spaces into
|
|
4
4
|
// hyphens, and disambiguates repeats *within a page* as -1, -2, …. This is a
|
|
5
|
-
// close approximation, good enough for English docs;
|
|
6
|
-
//
|
|
5
|
+
// close approximation, good enough for English docs; verified against real
|
|
6
|
+
// rendered GitHub wiki pages.
|
|
7
7
|
|
|
8
8
|
export const slugify = text =>
|
|
9
9
|
text
|
|
10
10
|
.trim()
|
|
11
11
|
.toLowerCase()
|
|
12
|
-
.replace(/\s+/g, ' ')
|
|
13
|
-
.replace(/[^\p{L}\p{N}_ -]+/gu, '')
|
|
14
|
-
.replace(/ /g, '-');
|
|
15
|
-
|
|
12
|
+
.replace(/\s+/g, ' ') // collapse source whitespace, as HTML rendering does
|
|
13
|
+
.replace(/[^\p{L}\p{N}_ -]+/gu, '') // drop punctuation/symbols (em dash, quotes, colon, parens…)
|
|
14
|
+
.replace(/ /g, '-'); // each remaining space → one hyphen, so a removed char
|
|
15
|
+
// between two spaces yields "--", matching github-slugger
|
|
16
16
|
|
|
17
17
|
// A per-page deduping slugger: call the returned fn on each heading in document
|
|
18
18
|
// order so duplicates get GitHub's -1 / -2 / … suffixes.
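Review note: the new comments on `slugify` describe the double-hyphen case, which is easy to confirm with a copy of the function. The `slugify` body below is taken from this hunk; `makeSlugger` is a hypothetical, simplified take on the per-page deduping that the trailing comment describes, since the body of `createSlugger` is not part of this diff.

```javascript
const slugify = text =>
  text
    .trim()
    .toLowerCase()
    .replace(/\s+/g, ' ') // collapse source whitespace
    .replace(/[^\p{L}\p{N}_ -]+/gu, '') // drop punctuation and symbols
    .replace(/ /g, '-'); // each remaining space becomes one hyphen

// Hypothetical deduper: the first use keeps the plain slug, repeats get -1, -2, …
const makeSlugger = () => {
  const seen = new Map();
  return heading => {
    const base = slugify(heading);
    const count = seen.get(base) ?? 0;
    seen.set(base, count + 1);
    return count ? `${base}-${count}` : base;
  };
};

console.log(slugify('Foo \u2014 Bar')); // the em dash is removed, both spaces survive
const slug = makeSlugger();
console.log(['Setup', 'Usage', 'Setup', 'Setup'].map(slug));
```

The em dash sits between two spaces, so removing it leaves two spaces and therefore two hyphens, which is the behavior the comment attributes to github-slugger.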
|
package/builder/wiki-index.mjs
CHANGED
|
@@ -9,11 +9,11 @@
|
|
|
9
9
|
// dir's git origin (…/<owner>/<repo>.wiki.git) and builds the GitHub template.
|
|
10
10
|
// Default --out is <wiki>/search-index.json (the index is hosted from the wiki).
|
|
11
11
|
|
|
12
|
-
import {
|
|
13
|
-
import {
|
|
14
|
-
import {
|
|
15
|
-
import {
|
|
16
|
-
import {
|
|
12
|
+
import {writeFile} from 'node:fs/promises';
|
|
13
|
+
import {join, resolve} from 'node:path';
|
|
14
|
+
import {execFile} from 'node:child_process';
|
|
15
|
+
import {promisify} from 'node:util';
|
|
16
|
+
import {buildIndex} from './lib/build-index.mjs';
|
|
17
17
|
|
|
18
18
|
const run = promisify(execFile);
|
|
19
19
|
|
|
@@ -27,9 +27,13 @@ const parseArgs = argv => {
|
|
|
27
27
|
const m = /^--([^=]+)(?:=(.*))?$/.exec(argv[i]);
|
|
28
28
|
if (!m) continue;
|
|
29
29
|
const key = m[1];
|
|
30
|
-
if (m[2] !== undefined) {
|
|
30
|
+
if (m[2] !== undefined) {
|
|
31
|
+
args[key] = m[2];
|
|
32
|
+
continue;
|
|
33
|
+
}
|
|
31
34
|
const next = argv[i + 1];
|
|
32
|
-
if (!BOOLEAN_FLAGS.has(key) && next !== undefined && !next.startsWith('--'))
|
|
35
|
+
if (!BOOLEAN_FLAGS.has(key) && next !== undefined && !next.startsWith('--'))
|
|
36
|
+
args[key] = argv[++i];
|
|
33
37
|
else args[key] = true;
|
|
34
38
|
}
|
|
35
39
|
return args;
|
|
@@ -38,7 +42,7 @@ const parseArgs = argv => {
|
|
|
38
42
|
// owner/repo from the wiki clone's origin, tolerating the …/.wiki.git suffix.
|
|
39
43
|
const inferRepo = async wikiDir => {
|
|
40
44
|
try {
|
|
41
|
-
const {
|
|
45
|
+
const {stdout} = await run('git', ['-C', wikiDir, 'remote', 'get-url', 'origin']);
|
|
42
46
|
const m = /[/:]([^/]+)\/([^/]+?)(?:\.wiki)?\.git$/.exec(stdout.trim());
|
|
43
47
|
return m ? `${m[1]}/${m[2]}` : null;
|
|
44
48
|
} catch {
|
|
@@ -55,15 +59,20 @@ const main = async () => {
|
|
|
55
59
|
|
|
56
60
|
const urlTemplate = args['url-template'] || (repo && `https://github.com/${repo}/wiki/{page}`);
|
|
57
61
|
if (!urlTemplate) {
|
|
58
|
-
console.error(
|
|
62
|
+
console.error(
|
|
63
|
+
'wiki-index: need --url-template or --repo owner/repo (could not infer from git origin).'
|
|
64
|
+
);
|
|
59
65
|
process.exit(2);
|
|
60
66
|
}
|
|
61
67
|
const siteName = args.name || (repo ? `${repo.split('/')[1]} wiki` : 'wiki');
|
|
62
68
|
|
|
63
|
-
const index = await buildIndex({
|
|
69
|
+
const index = await buildIndex({wikiDir, urlTemplate, siteName});
|
|
64
70
|
const json = JSON.stringify(index, null, 2) + '\n';
|
|
65
71
|
|
|
66
|
-
if (args.stdout) {
|
|
72
|
+
if (args.stdout) {
|
|
73
|
+
process.stdout.write(json);
|
|
74
|
+
return;
|
|
75
|
+
}
|
|
67
76
|
|
|
68
77
|
const out = resolve(args.out || join(wikiDir, 'search-index.json'));
|
|
69
78
|
await writeFile(out, json);
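Review note: the reformatted `parseArgs` branches can be checked with a self-contained copy. The loop header and the contents of `BOOLEAN_FLAGS` are not visible in this diff, so both are assumptions here (a plain index loop, and `stdout` as the only boolean flag, which matches the documented options).

```javascript
const BOOLEAN_FLAGS = new Set(['stdout']); // assumption: not shown in the diff

const parseArgs = argv => {
  const args = {};
  for (let i = 0; i < argv.length; ++i) {
    const m = /^--([^=]+)(?:=(.*))?$/.exec(argv[i]);
    if (!m) continue; // ignore anything that is not a --flag
    const key = m[1];
    if (m[2] !== undefined) {
      args[key] = m[2]; // --key=value
      continue;
    }
    const next = argv[i + 1];
    if (!BOOLEAN_FLAGS.has(key) && next !== undefined && !next.startsWith('--'))
      args[key] = argv[++i]; // --key value
    else args[key] = true; // valueless flag
  }
  return args;
};

console.log(parseArgs(['--wiki', './docs', '--name=Example docs', '--stdout', 'extra']));
```

Because `stdout` is listed as a boolean flag, the stray `extra` token is not consumed as its value; it simply fails the `--` match and is skipped.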
|
package/llms-full.txt
ADDED
|
@@ -0,0 +1,156 @@
|
|
|
1
|
+
# wiki-search — full LLM reference
|
|
2
|
+
|
|
3
|
+
`wiki-search` adds real search to a GitHub wiki (or any Markdown docs site) and jumps the reader to the matching section, without moving the docs. It has two parts:
|
|
4
|
+
|
|
5
|
+
1. **`wiki-search-index`** — the npm CLI (this package) that compiles Markdown into a self-describing JSON search index.
|
|
6
|
+
2. **The hosted search app** — a static GitHub Pages site that loads an index, searches it, and deep-links each hit via Text Fragments. A thin bookmarklet opens it on any wiki page.
|
|
7
|
+
|
|
8
|
+
Zero runtime dependencies. No build step — the CLI is `.mjs`, the app is browser ESM.
|
|
9
|
+
|
|
10
|
+
---
|
|
11
|
+
|
|
12
|
+
## 1. The CLI — `wiki-search-index`
|
|
13
|
+
|
|
14
|
+
`bin`: `wiki-search-index` → `builder/wiki-index.mjs`.
|
|
15
|
+
|
|
16
|
+
```
|
|
17
|
+
wiki-search-index [options]
|
|
18
|
+
```
|
|
19
|
+
|
|
20
|
+
### Options
|
|
21
|
+
|
|
22
|
+
| Flag | Default | Meaning |
|
|
23
|
+
|------|---------|---------|
|
|
24
|
+
| `--wiki <dir>` | `./wiki` | Markdown source directory. |
|
|
25
|
+
| `--out <path>` | `<wiki>/search-index.json` | Output file. |
|
|
26
|
+
| `--repo <owner/repo>` | — | GitHub repo; builds the wiki URL template `https://github.com/<owner>/<repo>/wiki/{page}`. |
|
|
27
|
+
| `--url-template <tpl>` | — | Result-URL template; **must contain `{page}`**. For any non-GitHub site. |
|
|
28
|
+
| `--name <site name>` | `<repo> wiki` or `wiki` | Human label shown in the search UI. |
|
|
29
|
+
| `--stdout` | off | Write the index to stdout instead of a file. |
|
|
30
|
+
|
|
31
|
+
Flags accept both `--key value` and `--key=value`. `--stdout` is a valueless boolean.
|
|
32
|
+
|
|
33
|
+
### Repo inference
|
|
34
|
+
|
|
35
|
+
With neither `--url-template` nor `--repo`, the tool reads the wiki dir's git origin (`git -C <wiki> remote get-url origin`) and extracts `owner/repo`, tolerating the `…/<owner>/<repo>.wiki.git` suffix. If it can't determine a URL template (no `--url-template`, no `--repo`, and inference fails), it prints an error and exits `2`.
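Review note: the inference rule described above can be tried against a few origin URLs. The regular expression is the one shown in the `inferRepo` hunk of `builder/wiki-index.mjs`; `repoFromOrigin` is a hypothetical helper that skips the `git` call and works on a string, and the sample URLs are illustrative.

```javascript
// Same pattern as in inferRepo(); the wrapper name is an assumption.
const repoFromOrigin = origin => {
  const m = /[/:]([^/]+)\/([^/]+?)(?:\.wiki)?\.git$/.exec(origin.trim());
  return m ? `${m[1]}/${m[2]}` : null;
};

console.log(repoFromOrigin('https://github.com/uhop/wiki-search.wiki.git'));
console.log(repoFromOrigin('git@github.com:uhop/wiki-search.wiki.git'));
console.log(repoFromOrigin('https://github.com/uhop/wiki-search'));
```

As written, the pattern requires a trailing `.git`, so an origin URL without it yields `null`, and the CLI then needs `--repo` or `--url-template`.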
|
|
36
|
+
|
|
37
|
+
### Output
|
|
38
|
+
|
|
39
|
+
On success it writes pretty-printed JSON (2-space indent, trailing newline) and logs to **stderr** a one-line summary: `wiki-index: <N> sections from <P> page(s) → <out>` plus the site name and URL template. With `--stdout` the JSON goes to stdout and nothing else is written.
|
|
40
|
+
|
|
41
|
+
### Examples
|
|
42
|
+
|
|
43
|
+
```bash
|
|
44
|
+
# GitHub wiki — owner/repo inferred from the wiki's git origin
|
|
45
|
+
npx wiki-search-index --wiki ./wiki
|
|
46
|
+
|
|
47
|
+
# Explicit repo (origin lacks the .wiki.git suffix, e.g. an SSH submodule)
|
|
48
|
+
npx wiki-search-index --wiki ./wiki --repo uhop/stream-json
|
|
49
|
+
|
|
50
|
+
# Non-GitHub docs site
|
|
51
|
+
npx wiki-search-index --wiki ./docs --url-template 'https://example.com/docs/{page}' --name 'Example docs'
|
|
52
|
+
|
|
53
|
+
# Emit to stdout
|
|
54
|
+
npx wiki-search-index --wiki ./wiki --stdout > index.json
|
|
55
|
+
```
|
|
56
|
+
|
|
57
|
+
---
|
|
58
|
+
|
|
59
|
+
## 2. Index format (v1)
|
|
60
|
+
|
|
61
|
+
A wiki-search index is a single self-describing JSON document (canonical spec: `INDEX-FORMAT.md`). A client assumes nothing beyond this contract.
|
|
62
|
+
|
|
63
|
+
```json
|
|
64
|
+
{
|
|
65
|
+
"v": 1,
|
|
66
|
+
"site": {
|
|
67
|
+
"name": "wiki-search wiki",
|
|
68
|
+
"urlTemplate": "https://github.com/uhop/wiki-search/wiki/{page}",
|
|
69
|
+
"fragments": true
|
|
70
|
+
},
|
|
71
|
+
"docs": [
|
|
72
|
+
{
|
|
73
|
+
"id": 0,
|
|
74
|
+
"page": "Index-Format",
|
|
75
|
+
"title": "Index format",
|
|
76
|
+
"heading": "Validation",
|
|
77
|
+
"anchor": "validation",
|
|
78
|
+
"text": "full plain text of the section…"
|
|
79
|
+
}
|
|
80
|
+
]
|
|
81
|
+
}
|
|
82
|
+
```
|
|
83
|
+
|
|
84
|
+
### Fields
|
|
85
|
+
|
|
86
|
+
- `v` — format version (`1`). Clients reject a version they don't understand.
|
|
87
|
+
- `site.name` — human label for the corpus.
|
|
88
|
+
- `site.urlTemplate` — result-URL template; **must contain `{page}`**. No hardcoded host.
|
|
89
|
+
- `site.fragments` — `true` if the target renders Text Fragments; when `false`, clients omit the `:~:text=` directive.
|
|
90
|
+
- `docs[]` — one entry per indexed section.
|
|
91
|
+
- `doc.id` — stable integer, sequential in build order.
|
|
92
|
+
- `doc.page` — the `{page}` substitution (for GitHub wikis, the page's URL segment, e.g. `Foo-Bar`).
|
|
93
|
+
- `doc.title` — page display title.
|
|
94
|
+
- `doc.heading` — section heading (falls back to the page title for a page's preamble).
|
|
95
|
+
- `doc.anchor` — in-page anchor slug; `""` means the page top.
|
|
96
|
+
- `doc.text` — plain-text body of the section (Markdown stripped), for the search engine.
|
|
97
|
+
|
|
98
|
+
### Building a result URL
|
|
99
|
+
|
|
100
|
+
```
|
|
101
|
+
base = urlTemplate.replace("{page}", encodeURIComponent(doc.page))
|
|
102
|
+
hash = doc.anchor || "" (omit if empty)
|
|
103
|
+
text = ":~:text=" + <matched phrase> (only if site.fragments and a phrase)
|
|
104
|
+
result = base + ("#" + hash + text) (only if hash or text)
|
|
105
|
+
```
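Review note: the pseudo-code above translates almost line for line into JavaScript. This is a sketch rather than the hosted app's code; `buildResultUrl` is a hypothetical name, and percent-encoding the matched phrase with `encodeURIComponent` is an assumption, since the spec only says `<matched phrase>`.

```javascript
const buildResultUrl = (site, doc, phrase = '') => {
  const base = site.urlTemplate.replace('{page}', encodeURIComponent(doc.page));
  const hash = doc.anchor || ''; // "" means the page top
  // Assumption: the phrase is percent-encoded before it goes into the directive.
  const text = site.fragments && phrase ? ':~:text=' + encodeURIComponent(phrase) : '';
  return hash || text ? base + '#' + hash + text : base; // no bare '#'
};

const site = {urlTemplate: 'https://github.com/uhop/wiki-search/wiki/{page}', fragments: true};
const doc = {page: 'Index-Format', anchor: 'validation'};

console.log(buildResultUrl(site, doc, 'valid JSON'));
console.log(buildResultUrl({...site, fragments: false}, {page: 'Home', anchor: ''}, 'valid JSON'));
```

With `fragments` off and an empty anchor, the result is just the substituted template, which matches the "only if hash or text" rule.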
|
|
106
|
+
|
|
107
|
+
### Validation (verify-or-explain)
|
|
108
|
+
|
|
109
|
+
A client must check, and on any failure show a specific message (never a blank box): (1) the index is fetchable; (2) it is valid JSON; (3) `v` is supported; (4) `site.urlTemplate` is present and contains `{page}`; (5) `docs` is a non-empty array, each entry having `page`, `title`, `text`.
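Review note: checks (2) through (5) above can be sketched as a single function that returns a specific message or `null`. Check (1), fetching, is left out so the example stays runnable offline. `validateIndexText`, the message wording, and the choice to require string values for `page`, `title`, and `text` are assumptions, not the hosted app's implementation.

```javascript
const SUPPORTED_VERSIONS = new Set([1]);
const REQUIRED_DOC_FIELDS = ['page', 'title', 'text'];

// Returns null when the index is usable, otherwise a specific message.
const validateIndexText = jsonText => {
  let index;
  try {
    index = JSON.parse(jsonText); // check (2)
  } catch {
    return 'index is not valid JSON';
  }
  if (!index || typeof index !== 'object') return 'index is not a JSON object';
  if (!SUPPORTED_VERSIONS.has(index.v)) return `unsupported index version: ${index.v}`; // check (3)
  const tpl = index.site?.urlTemplate;
  if (typeof tpl !== 'string' || !tpl.includes('{page}'))
    return 'site.urlTemplate is missing or lacks {page}'; // check (4)
  if (!Array.isArray(index.docs) || index.docs.length === 0)
    return 'docs must be a non-empty array'; // check (5)
  const bad = index.docs.findIndex(
    d => !d || REQUIRED_DOC_FIELDS.some(key => typeof d[key] !== 'string')
  );
  return bad >= 0 ? `docs[${bad}] is missing page, title, or text` : null;
};

const goodIndex = JSON.stringify({
  v: 1,
  site: {name: 'Example docs', urlTemplate: 'https://example.com/docs/{page}', fragments: true},
  docs: [{id: 0, page: 'Home', title: 'Home', heading: 'Home', anchor: '', text: 'Welcome.'}]
});

console.log(validateIndexText(goodIndex));
console.log(validateIndexText('{oops'));
```

Returning a message instead of throwing keeps the "never a blank box" rule easy to honor: the caller always has something specific to display.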
|
|
110
|
+
|
|
111
|
+
### Versioning
|
|
112
|
+
|
|
113
|
+
`v` increases only on a breaking change. Additive optional fields do not bump `v`; clients ignore unknown fields. A client meeting a higher `v` than it knows stops and says so.
|
|
114
|
+
|
|
115
|
+
---
|
|
116
|
+
|
|
117
|
+
## 3. The hosted app
|
|
118
|
+
|
|
119
|
+
Static site at `https://uhop.github.io/wiki-search/app/`. It loads an index, searches, and renders hits as real `<a>` links with `#anchor:~:text=phrase` directives. Query parameters (priority order):
|
|
120
|
+
|
|
121
|
+
- `?index=<url>` — load any v1 index from any URL.
|
|
122
|
+
- `?wiki=<owner>/<repo>` — convenience: derive `https://raw.githubusercontent.com/wiki/<owner>/<repo>/search-index.json` (override the file with `?file=<name>`).
|
|
123
|
+
- `?from=<page url>` — the current page URL, passed by the bookmarklet; the app parses `owner/repo` from it (any `github.com/<owner>/<repo>` page — repo root, `/wiki/…`, `/actions`, etc.).
|
|
124
|
+
- `?q=<query>` — pre-fill the search box.
|
|
125
|
+
- `?target=opener|new|tab`, `?anchor=off`, `?text=off` — positioning levers (also in the Options row).
|
|
126
|
+
|
|
127
|
+
With none of `?index` / `?wiki` / `?from` resolvable, the app explains ("nothing to search yet") rather than showing a blank box.
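Review note: the priority order of `?index`, `?wiki`, and `?from` can be sketched as a small resolver. This is an illustration of the documented behavior, not the app's source; `resolveIndexUrl`, the exact patterns used to recognize `owner/repo`, and applying `?file` to the `?from` case as well are assumptions.

```javascript
// Returns the index URL to load, or null when nothing is resolvable.
const resolveIndexUrl = appUrl => {
  const params = new URL(appUrl).searchParams;

  const index = params.get('index');
  if (index) return index; // highest priority: an explicit index URL

  const file = params.get('file') || 'search-index.json';
  const rawUrl = repo => `https://raw.githubusercontent.com/wiki/${repo}/${file}`;

  const wiki = params.get('wiki');
  if (wiki && /^[^/]+\/[^/]+$/.test(wiki)) return rawUrl(wiki);

  const from = params.get('from');
  const m = from && /^https?:\/\/github\.com\/([^/?#]+)\/([^/?#]+)/.exec(from);
  return m ? rawUrl(`${m[1]}/${m[2]}`) : null;
};

const app = 'https://uhop.github.io/wiki-search/app/';
console.log(resolveIndexUrl(app + '?wiki=uhop/wiki-search'));
console.log(resolveIndexUrl(app + '?from=' + encodeURIComponent('https://github.com/uhop/stream-json/actions')));
console.log(resolveIndexUrl(app + '?q=hello'));
```

A `null` result corresponds to the "nothing to search yet" explanation described above.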
|
|
128
|
+
|
|
129
|
+
### The bookmarklet
|
|
130
|
+
|
|
131
|
+
`bookmarklet/bookmarklet.js` exports `APP_URL` and `BOOKMARKLET`. The bookmarklet is a thin launcher — `window.open(APP_URL + '?from=' + encodeURIComponent(location.href), …)` — so it opens the app on its own origin (bypassing the wiki page's CSP) and lets the app do owner/repo detection. Only `APP_URL` is frozen into a saved bookmark; everything else is server-side and updates on redeploy.
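Review note: the launcher's URL construction is small enough to verify directly. The sketch below only builds the URL that would be handed to `window.open`; the remaining `window.open` arguments are elided in the text above and are not guessed at here. `launchUrl` is a hypothetical helper name, and the `APP_URL` value is the app address quoted earlier in this file.

```javascript
const APP_URL = 'https://uhop.github.io/wiki-search/app/';

// Builds the app URL for the page the reader is currently on.
const launchUrl = href => APP_URL + '?from=' + encodeURIComponent(href);

const launched = launchUrl('https://github.com/uhop/wiki-search/wiki/Home');
console.log(launched);

// The app side can recover the original page URL from the query string.
console.log(new URL(launched).searchParams.get('from'));
```

Encoding the whole page URL keeps its own `?` and `#` characters from being read as part of the app's query string.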
|
|
132
|
+
|
|
133
|
+
---
|
|
134
|
+
|
|
135
|
+
## 4. Adoption recipe
|
|
136
|
+
|
|
137
|
+
```bash
|
|
138
|
+
# 1. Build the index from your wiki's Markdown
|
|
139
|
+
npx wiki-search-index --wiki ./wiki # (--repo owner/repo if the origin lacks .wiki.git)
|
|
140
|
+
# 2. Commit ./wiki/search-index.json into the wiki repo
|
|
141
|
+
# 3. Link the hosted app from the wiki (Home / sidebar):
|
|
142
|
+
# https://uhop.github.io/wiki-search/app/?wiki=<owner>/<repo>
|
|
143
|
+
# plus the one-click bookmarklet from https://uhop.github.io/wiki-search/
|
|
144
|
+
```
|
|
145
|
+
|
|
146
|
+
Rebuild the index whenever the wiki's Markdown changes (the output is deterministic, so a CI `git diff --exit-code` can gate staleness).
|
|
147
|
+
|
|
148
|
+
---
|
|
149
|
+
|
|
150
|
+
## Links
|
|
151
|
+
|
|
152
|
+
- Demo + install: https://uhop.github.io/wiki-search/
|
|
153
|
+
- Wiki: https://github.com/uhop/wiki-search/wiki
|
|
154
|
+
- Source: https://github.com/uhop/wiki-search
|
|
155
|
+
- npm: https://www.npmjs.com/package/wiki-search-index
|
|
156
|
+
- Index format: https://github.com/uhop/wiki-search/blob/main/INDEX-FORMAT.md
|
package/llms.txt
ADDED
|
@@ -0,0 +1,73 @@
|
|
|
1
|
+
# wiki-search-index
|
|
2
|
+
|
|
3
|
+
> CLI that builds a self-describing search index from a GitHub wiki (or any Markdown docs). It's the indexer for `wiki-search` — a bookmarklet + hosted app that search a wiki and jump to the matching section. Zero dependencies, no build step.
|
|
4
|
+
|
|
5
|
+
## Install
|
|
6
|
+
|
|
7
|
+
```bash
|
|
8
|
+
npx wiki-search-index --wiki ./wiki # no install needed
|
|
9
|
+
# or, as a dev dependency:
|
|
10
|
+
npm i -D wiki-search-index
|
|
11
|
+
```
|
|
12
|
+
|
|
13
|
+
## Quick start
|
|
14
|
+
|
|
15
|
+
```bash
|
|
16
|
+
# Build an index for a GitHub wiki checked out at ./wiki
|
|
17
|
+
npx wiki-search-index --wiki ./wiki
|
|
18
|
+
# → ./wiki/search-index.json
|
|
19
|
+
```
|
|
20
|
+
|
|
21
|
+
Commit `search-index.json` into the wiki (it's served from `raw.githubusercontent.com`), then point the hosted app at it: `https://uhop.github.io/wiki-search/app/?wiki=<owner>/<repo>`.
|
|
22
|
+
|
|
23
|
+
## CLI
|
|
24
|
+
|
|
25
|
+
`wiki-search-index [options]`
|
|
26
|
+
|
|
27
|
+
- `--wiki <dir>` — Markdown source directory (default `./wiki`).
|
|
28
|
+
- `--out <path>` — output file (default `<wiki>/search-index.json`).
|
|
29
|
+
- `--repo <owner/repo>` — GitHub repo; builds the wiki URL template. Needed when the wiki's git origin lacks the `.wiki.git` suffix the tool infers from.
|
|
30
|
+
- `--url-template <tpl>` — result-URL template containing `{page}` (for any non-GitHub site).
|
|
31
|
+
- `--name <site name>` — human label shown in the search UI.
|
|
32
|
+
- `--stdout` — write the index to stdout instead of a file.
|
|
33
|
+
|
|
34
|
+
With neither `--url-template` nor `--repo`, owner/repo is inferred from the wiki dir's git origin (`…/<owner>/<repo>.wiki.git`). Exit code `2` if neither is given and inference fails.
|
|
35
|
+
|
|
36
|
+
## Output — index format v1
|
|
37
|
+
|
|
38
|
+
A single self-describing JSON document (full spec: INDEX-FORMAT.md):
|
|
39
|
+
|
|
40
|
+
```json
|
|
41
|
+
{
|
|
42
|
+
"v": 1,
|
|
43
|
+
"site": { "name": "...", "urlTemplate": "https://github.com/<o>/<r>/wiki/{page}", "fragments": true },
|
|
44
|
+
"docs": [
|
|
45
|
+
{ "id": 0, "page": "Page-Name", "title": "Page title", "heading": "Section", "anchor": "section", "text": "plain text…" }
|
|
46
|
+
]
|
|
47
|
+
}
|
|
48
|
+
```
|
|
49
|
+
|
|
50
|
+
A client validates `v` + required fields, then builds each result URL from `site.urlTemplate` alone — no hardcoded host, so any site emitting this shape is searchable. The output is deterministic (sorted, no timestamps), so a CI `git diff --exit-code` can gate index staleness.
|
|
51
|
+
|
|
52
|
+
## Common patterns
|
|
53
|
+
|
|
54
|
+
```bash
|
|
55
|
+
# Explicit repo (origin lacks the .wiki.git suffix)
|
|
56
|
+
npx wiki-search-index --wiki ./wiki --repo uhop/stream-json
|
|
57
|
+
|
|
58
|
+
# Non-GitHub docs site
|
|
59
|
+
npx wiki-search-index --wiki ./docs --url-template 'https://example.com/docs/{page}' --name 'Example docs'
|
|
60
|
+
|
|
61
|
+
# Pipe the index elsewhere
|
|
62
|
+
npx wiki-search-index --wiki ./wiki --stdout > index.json
|
|
63
|
+
```
|
|
64
|
+
|
|
65
|
+
Rebuild whenever the wiki's Markdown changes — an index does not go stale on its own.
|
|
66
|
+
|
|
67
|
+
## Links
|
|
68
|
+
|
|
69
|
+
- Demo + install: https://uhop.github.io/wiki-search/
|
|
70
|
+
- Wiki: https://github.com/uhop/wiki-search/wiki
|
|
71
|
+
- npm: https://www.npmjs.com/package/wiki-search-index
|
|
72
|
+
- Index format: https://github.com/uhop/wiki-search/blob/main/INDEX-FORMAT.md
|
|
73
|
+
- Full LLM reference: https://github.com/uhop/wiki-search/blob/main/llms-full.txt
|
package/package.json
CHANGED
|
@@ -1,19 +1,29 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "wiki-search-index",
|
|
3
|
-
"version": "0.1.
|
|
3
|
+
"version": "0.1.2",
|
|
4
4
|
"description": "Build a self-describing search index from a GitHub wiki (or any Markdown docs) — the indexer for wiki-search.",
|
|
5
5
|
"type": "module",
|
|
6
6
|
"bin": {
|
|
7
7
|
"wiki-search-index": "builder/wiki-index.mjs"
|
|
8
8
|
},
|
|
9
|
+
"scripts": {
|
|
10
|
+
"test": "node builder/test/run.mjs",
|
|
11
|
+
"lint": "prettier --check .",
|
|
12
|
+
"lint:fix": "prettier --write ."
|
|
13
|
+
},
|
|
9
14
|
"files": [
|
|
10
15
|
"builder/wiki-index.mjs",
|
|
11
16
|
"builder/lib",
|
|
12
17
|
"builder/README.md",
|
|
13
|
-
"INDEX-FORMAT.md"
|
|
18
|
+
"INDEX-FORMAT.md",
|
|
19
|
+
"llms.txt",
|
|
20
|
+
"llms-full.txt"
|
|
14
21
|
],
|
|
15
22
|
"engines": {
|
|
16
|
-
"node": ">=
|
|
23
|
+
"node": ">=22"
|
|
24
|
+
},
|
|
25
|
+
"devDependencies": {
|
|
26
|
+
"prettier": "^3.8.3"
|
|
17
27
|
},
|
|
18
28
|
"keywords": [
|
|
19
29
|
"wiki",
|
|
@@ -26,6 +36,8 @@
|
|
|
26
36
|
"static-search"
|
|
27
37
|
],
|
|
28
38
|
"homepage": "https://uhop.github.io/wiki-search/",
|
|
39
|
+
"llms": "https://raw.githubusercontent.com/uhop/wiki-search/main/llms.txt",
|
|
40
|
+
"llmsFull": "https://raw.githubusercontent.com/uhop/wiki-search/main/llms-full.txt",
|
|
29
41
|
"repository": {
|
|
30
42
|
"type": "git",
|
|
31
43
|
"url": "git+https://github.com/uhop/wiki-search.git"
|
|
@@ -34,5 +46,9 @@
|
|
|
34
46
|
"url": "https://github.com/uhop/wiki-search/issues"
|
|
35
47
|
},
|
|
36
48
|
"author": "Eugene Lazutkin",
|
|
49
|
+
"funding": {
|
|
50
|
+
"type": "github",
|
|
51
|
+
"url": "https://github.com/sponsors/uhop"
|
|
52
|
+
},
|
|
37
53
|
"license": "BSD-3-Clause"
|
|
38
54
|
}
|