obsidian-anchor 0.2.0
- package/LICENSE +21 -0
- package/README.md +209 -0
- package/dist/chunk/chunker.js +169 -0
- package/dist/chunk/chunker.js.map +1 -0
- package/dist/chunk/markdown.js +49 -0
- package/dist/chunk/markdown.js.map +1 -0
- package/dist/chunk/types.js +5 -0
- package/dist/chunk/types.js.map +1 -0
- package/dist/config.js +17 -0
- package/dist/config.js.map +1 -0
- package/dist/container.js +84 -0
- package/dist/container.js.map +1 -0
- package/dist/edit/safeEdit.js +130 -0
- package/dist/edit/safeEdit.js.map +1 -0
- package/dist/edit/snapshots.js +31 -0
- package/dist/edit/snapshots.js.map +1 -0
- package/dist/embeddings/local.js +61 -0
- package/dist/embeddings/local.js.map +1 -0
- package/dist/embeddings/openai.js +63 -0
- package/dist/embeddings/openai.js.map +1 -0
- package/dist/embeddings/provider.js +5 -0
- package/dist/embeddings/provider.js.map +1 -0
- package/dist/index/indexer.js +108 -0
- package/dist/index/indexer.js.map +1 -0
- package/dist/index/walk.js +28 -0
- package/dist/index/walk.js.map +1 -0
- package/dist/index/watcher.js +115 -0
- package/dist/index/watcher.js.map +1 -0
- package/dist/index.js +192 -0
- package/dist/index.js.map +1 -0
- package/dist/server.js +61 -0
- package/dist/server.js.map +1 -0
- package/dist/store/chunkStore.js +82 -0
- package/dist/store/chunkStore.js.map +1 -0
- package/dist/store/db.js +59 -0
- package/dist/store/db.js.map +1 -0
- package/dist/store/schema.js +67 -0
- package/dist/store/schema.js.map +1 -0
- package/dist/store/search.js +33 -0
- package/dist/store/search.js.map +1 -0
- package/dist/store/types.js +3 -0
- package/dist/store/types.js.map +1 -0
- package/dist/store/vectorStore.js +77 -0
- package/dist/store/vectorStore.js.map +1 -0
- package/dist/tools/cite.js +51 -0
- package/dist/tools/cite.js.map +1 -0
- package/dist/tools/restoreNote.js +49 -0
- package/dist/tools/restoreNote.js.map +1 -0
- package/dist/tools/safeEdit.js +65 -0
- package/dist/tools/safeEdit.js.map +1 -0
- package/dist/tools/searchNotes.js +47 -0
- package/dist/tools/searchNotes.js.map +1 -0
- package/dist/tools/verifyGrounding.js +66 -0
- package/dist/tools/verifyGrounding.js.map +1 -0
- package/dist/util/hash.js +6 -0
- package/dist/util/hash.js.map +1 -0
- package/dist/util/logger.js +41 -0
- package/dist/util/logger.js.map +1 -0
- package/dist/util/timeout.js +19 -0
- package/dist/util/timeout.js.map +1 -0
- package/dist/verify/anthropic.js +126 -0
- package/dist/verify/anthropic.js.map +1 -0
- package/dist/verify/decompose.js +100 -0
- package/dist/verify/decompose.js.map +1 -0
- package/dist/verify/localVerifier.js +89 -0
- package/dist/verify/localVerifier.js.map +1 -0
- package/dist/verify/pipeline.js +165 -0
- package/dist/verify/pipeline.js.map +1 -0
- package/dist/verify/score.js +64 -0
- package/dist/verify/score.js.map +1 -0
- package/dist/verify/types.js +3 -0
- package/dist/verify/types.js.map +1 -0
- package/dist/verify/verifier.js +2 -0
- package/dist/verify/verifier.js.map +1 -0
- package/package.json +71 -0
package/LICENSE
ADDED
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2026 Justin

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
package/README.md
ADDED
@@ -0,0 +1,209 @@
# ⚓ Anchor

**Your AI assistant can't lie about your notes.**

[](https://github.com/Perfectio/obsidian-anchor/actions/workflows/ci.yml)
[](https://www.npmjs.com/package/obsidian-anchor)
[](./LICENSE)
[](https://nodejs.org)

Anchor is a reliability-first [MCP](https://modelcontextprotocol.io) server for
[Obsidian](https://obsidian.md). Every other Obsidian MCP helps an AI *read*
your vault. Anchor makes it **prove** what it says: it checks each claim against
your actual notes, scores how well they support it, and **refuses claims your
vault can't back up** — citing the exact block, heading, or `[[wikilink]]` it
relied on.

> Other tools answer. Anchor answers *and tells you when it doesn't know.*

🚧 **Early development** (v0.2), built in the open.

<!-- TODO: add demo.gif here — the grounded-vs-refused contrast (the money shot). -->

---

## See it work

Before an agent states anything as fact, it calls `verify_grounding` — and
Anchor checks each claim against your notes:

```text
Claim: "We chose Postgres as our primary data store."
  ✅ grounded (score 0.99)
    Projects/Auth.md#^d4e1
    "Decision: use Postgres as the primary data store, chosen over MySQL…"

Claim: "The auth rewrite shipped in March."
  🚫 REFUSED — contradicted (confidence 0.99)
    Projects/Auth.md#^k93a
    "The auth rewrite slipped to Q3 2026. March only covered the design
    phase — no code shipped in March."
```

The second claim is *topically* similar to your notes — plain semantic search
would happily "support" it. Anchor catches that the notes actually **contradict**
it, and refuses. That distinction is the whole point.
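
A minimal sketch of how a verifier result could map to the three verdicts. It assumes the default thresholds that appear in `dist/config.js` (`grounded: 0.8`, `refuse: 0.5`); the `classify` helper, the shape of `result`, and the rule that a contradiction always refuses are illustrative assumptions, not Anchor's actual scoring code (which lives in `dist/verify/score.js` and is not shown in this diff).

```javascript
// Hypothetical helper: maps an entailment result to a verdict.
// Threshold defaults mirror dist/config.js; the mapping itself is an assumption.
const DEFAULT_THRESHOLDS = { grounded: 0.8, refuse: 0.5 };

function classify(result, thresholds = DEFAULT_THRESHOLDS) {
    // A contradiction is refused no matter how topically similar the note is.
    if (result.label === "contradicted") return "refused";
    if (result.score >= thresholds.grounded) return "grounded";
    if (result.score < thresholds.refuse) return "refused";
    // Scores between the two thresholds are surfaced for review.
    return "flagged";
}

console.log(classify({ label: "supported", score: 0.99 }));    // grounded
console.log(classify({ label: "contradicted", score: 0.99 })); // refused
console.log(classify({ label: "neutral", score: 0.6 }));       // flagged
```

The point of the sketch is the ordering: the contradiction check runs before any score comparison, so a high-similarity but contradicting passage can never be reported as grounded.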

## Why Anchor

AI agents confidently invent facts about your notes, and semantic search makes
it *worse* — it surfaces plausible passages and the model fills the gaps. Anchor
adds the missing layer: **grounding verification**.

- **Verifies, doesn't just search.** Splits an answer into atomic claims and
  checks each for entailment (supported / contradicted / neutral) — not mere
  similarity. A claim is only grounded when its distinctive terms (proper nouns,
  identifiers, numbers) actually appear in the cited evidence, so a topical
  look-alike like "we use MongoDB" against a Postgres note is refused, not grounded.
- **Refuses when unsupported.** Below the grounding threshold, Anchor returns
  "no supporting notes found" instead of a confident guess.
- **Cites Obsidian-natively.** Evidence points at `note.md#^block-id`, headings,
  and wikilinks — clickable, exact, auditable.
- **Local and private by default.** Runs with **no API key**; your notes never
  leave your machine. Add a key only if you want higher accuracy.
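
The "distinctive terms" rule from the first bullet can be sketched roughly as follows. This is an illustration under stated assumptions: the `distinctiveTerms` and `termsCovered` helpers and the heuristics they use (capitalized words that are not sentence-initial, plus any token containing a digit) are hypothetical and may differ from what Anchor actually does.

```javascript
// Hypothetical heuristic: a term is "distinctive" if it contains a digit or is
// capitalized somewhere other than the first word of the claim.
function distinctiveTerms(claim) {
    const tokens = claim.match(/[A-Za-z0-9_.-]+/g) ?? [];
    return tokens
        .map((token) => token.replace(/[.-]+$/, "")) // drop trailing punctuation
        .filter((token, i) => /\d/.test(token) || (i > 0 && /^[A-Z]/.test(token)));
}

// A claim passes only if every distinctive term occurs in the evidence text.
function termsCovered(claim, evidence) {
    const haystack = evidence.toLowerCase();
    const missing = distinctiveTerms(claim).filter(
        (term) => !haystack.includes(term.toLowerCase()),
    );
    return { ok: missing.length === 0, missing };
}

const evidence = "Decision: use Postgres as the primary data store, chosen over MySQL";
console.log(termsCovered("We chose Postgres as our primary data store.", evidence));
console.log(termsCovered("we use MongoDB", evidence));
```

A lexical gate like this is deliberately crude: it cannot prove support, but it can cheaply veto an entailment verdict whose evidence never mentions the entity the claim is about.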

## Tools

| Tool | What it does |
| --- | --- |
| `search_notes` | Semantic search returning chunks with precise Obsidian citations. |
| `verify_grounding` | **The core.** Scores a claim 0–1 against your notes; refuses/flags the unsupported, with cited evidence. |
| `cite` | Returns a statement's supporting citations, or an honest "no supporting notes found." |
| `safe_edit` | Dry-run diff → confirm token → apply, saving an automatic rollback snapshot. |
| `restore_note` | Roll a note back to a `safe_edit` snapshot (the restore is itself undoable). |

## Quick start

No API key required. With [Node 20+](https://nodejs.org):

```bash
npx obsidian-anchor /path/to/your/vault
```

The first run downloads small local models (~110MB total) and indexes your
vault. Everything after is fully offline.

### Configure your MCP client

**Claude Desktop** — edit the config file, then restart:

- Windows: `%APPDATA%\Claude\claude_desktop_config.json`
- macOS: `~/Library/Application Support/Claude/claude_desktop_config.json`
- Linux: `~/.config/Claude/claude_desktop_config.json`

```json
{
  "mcpServers": {
    "anchor": {
      "command": "npx",
      "args": ["-y", "obsidian-anchor", "/path/to/your/vault"]
    }
  }
}
```

**Claude Code**: `claude mcp add anchor -- npx -y obsidian-anchor /path/to/your/vault`

**Cursor**: add the same `mcpServers` entry to your MCP config.

## Configuration

All optional, via environment variables:

| Variable | Effect |
| --- | --- |
| `ANTHROPIC_API_KEY` | Use the Claude Haiku verifier (higher accuracy). Auto-detected. |
| `ANCHOR_VERIFIER=local` | Force the local NLI verifier even if a key is present. |
| `ANCHOR_EMBEDDING=openai` | Use OpenAI embeddings instead of local (requires `OPENAI_API_KEY`; notes are sent to OpenAI). |
| `OPENAI_API_KEY` | Key for the optional OpenAI embedding provider. |
| `ANCHOR_WATCH_POLLING=1` | Poll for file changes — needed on network drives, WSL, or Docker volumes. |
| `ANCHOR_LOG_LEVEL` | `debug` \| `info` (default) \| `warn` \| `error`. Logs go to stderr. |

CLI flags override the defaults: `--knn N`, `--grounded-threshold X`,
`--refuse-threshold X`, `--evidence-min-score X`, `--verify-timeout-ms N`,
`--verifier local`, `--embedding openai`, `--watch-polling`, and `--reindex`
(force a full re-index).

### Languages

The default local models are English. For non-English vaults (e.g. Korean), use
the API path — OpenAI embeddings + the Anthropic verifier — which Anchor supports
and which scores 100% on a Korean eval (`npm run eval:ko`). Or point the local
models at multilingual ONNX weights via `ANCHOR_EMBEDDING_MODEL` /
`ANCHOR_NLI_MODEL` (and `ANCHOR_EMBEDDING_DIM` if the dimensionality differs).

## Accuracy

Anchor's reliability is measured, not claimed. `npm run eval` runs a labeled,
adversarial grounding set and reports **refusal precision** — when Anchor
refuses a claim, how often the claim is genuinely unsupported (wrongly refusing
a *true* claim is the worst failure, so this is the metric that matters).

Two evals measure it — `npm run eval` (verifier given evidence) and
`npm run eval:e2e` (the whole retrieve → verify pipeline). The end-to-end set is
88 labeled claims spanning failure modes: paraphrase, numbers and **numeric
mismatch**, temporal, negation, quantifier scope, multi-fact conjunctions,
**entity substitution** ("we use MongoDB" against a Postgres note), and claims
simply not in the notes.

| Verifier | Grounding recall | Refusal recall | Grounding precision | Latency / claim |
| --- | --- | --- | --- | --- |
| **Local** (default, no key) | 98% | 100% | **100%** | ~20 ms |
| **Anthropic Haiku** (`ANTHROPIC_API_KEY`) | 100% | 100% | **100%** | ~1.4 s |

**Grounding precision — never grounding an unsupported claim — is a hard CI gate
(must be 100%).** That is the core promise, so a regression fails the build. The
local model trails only on heavy-paraphrase recall; the Anthropic verifier closes
it, at the cost of latency. On the verifier-only set, refusal precision is 84%
local / 100% Anthropic. These are hand-labeled sets and keep growing.
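
For readers unfamiliar with the metric names in the table, here is a small sketch of how grounding precision and recall could be computed from labeled outcomes. The case shape (`expected` / `actual`) and the `groundingMetrics` helper are assumptions for illustration; the real eval harness is not part of this diff and may treat "flagged" verdicts differently.

```javascript
// Hypothetical case shape: { expected: "grounded" | "refused", actual: "grounded" | "refused" }
function groundingMetrics(cases) {
    const count = (predicate) => cases.filter(predicate).length;
    const correctlyGrounded = count((c) => c.expected === "grounded" && c.actual === "grounded");
    const predictedGrounded = count((c) => c.actual === "grounded");
    const shouldBeGrounded = count((c) => c.expected === "grounded");
    return {
        // Of everything Anchor grounded, how much was truly supported?
        groundingPrecision: predictedGrounded === 0 ? 1 : correctlyGrounded / predictedGrounded,
        // Of everything that was truly supported, how much did Anchor ground?
        groundingRecall: shouldBeGrounded === 0 ? 1 : correctlyGrounded / shouldBeGrounded,
    };
}

const metrics = groundingMetrics([
    { expected: "grounded", actual: "grounded" },
    { expected: "grounded", actual: "refused" }, // a heavy paraphrase the verifier missed
    { expected: "refused", actual: "refused" },
    { expected: "refused", actual: "refused" },
]);
console.log(metrics); // { groundingPrecision: 1, groundingRecall: 0.5 }
```

Under these definitions a cautious verifier trades recall for precision: refusing a true claim lowers recall but leaves precision at 100%, which is the direction a precision gate in CI would push.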

## How it works

```
Vault (.md) ──▶ chunk (heading / ^block-id / wikilink aware)
            ──▶ embed (local, all-MiniLM-L6-v2) ──▶ sqlite-vec
                              │
claim ──▶ decompose ──▶ retrieve candidates ──▶ NLI entailment ──▶ score ──▶ grounded / flagged / refused
```

The citation unit is the chunk, but each chunk carries the most precise anchor
available (block id › heading › line), so evidence is a real, clickable Obsidian
link — not just a file name.
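
The anchor precedence can be illustrated with a short sketch. `resolveAnchor` below follows the function of the same name in `dist/chunk/chunker.js`; the `citation` helper that joins path and anchor is a hypothetical addition for demonstration only.

```javascript
// Precedence as in dist/chunk/chunker.js: block id, then nearest heading,
// then a synthetic marker built from the chunk's ordinal.
function resolveAnchor(blockId, headingStack, ordinal) {
    if (blockId) return `#^${blockId}`;
    const top = headingStack[headingStack.length - 1];
    if (top) return `#${top.text}`;
    return `#L${ordinal}`;
}

// Hypothetical helper: combine a note path with its most precise anchor.
function citation(path, blockId, headingStack, ordinal) {
    return `${path}${resolveAnchor(blockId, headingStack, ordinal)}`;
}

const headings = [{ level: 1, text: "Auth" }, { level: 2, text: "Decisions" }];
console.log(citation("Projects/Auth.md", "d4e1", headings, 3)); // Projects/Auth.md#^d4e1
console.log(citation("Projects/Auth.md", null, headings, 3));   // Projects/Auth.md#Decisions
console.log(citation("Projects/Auth.md", null, [], 3));         // Projects/Auth.md#L3
```

Note that the fallback marker is derived from the chunk's position in the note rather than a source line number, so it identifies the chunk but is not an anchor Obsidian itself resolves.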

### "Isn't the verifier also an LLM that can hallucinate?"

The verifier never *generates* claims — it answers one narrow, closed question:
*does this passage support, contradict, or stay neutral toward this claim?* That
bounded entailment judgment is far more stable than open-ended generation, and
Anchor always returns the verbatim evidence and a confidence score so you can
audit every verdict yourself. Narrow beats open-ended.

## Troubleshooting

- **Native build errors on `npm`/`npx`:** Anchor uses prebuilt binaries
  (`better-sqlite3`, `onnxruntime`). If a prebuild is missing for your platform,
  ensure you're on Node 20–22 and a 64-bit OS. CI verifies Linux, macOS, and Windows.
  If `sqlite-vec` still can't load, Anchor automatically falls back to a slower
  pure-JS vector store and keeps working.
- **First run is slow:** it downloads the embedding + NLI models once (~110MB),
  cached under `<vault>/.anchor/models`. Later runs are instant and offline.
- **Changes aren't picked up** on a network drive / WSL / Docker volume: set
  `ANCHOR_WATCH_POLLING=1`.
- **Large vault feels slow to first answer:** indexing runs in the background;
  the first tool call waits for it to finish, then stays live via the watcher.

## Development

```bash
npm install
npm run build
npm test  # fast, hermetic unit + integration tests
npm run eval  # grounding eval (downloads the NLI model once)
ANCHOR_E2E=1 npx vitest run test/e2e.demo.test.ts  # full demo with real models
```

See [CONTRIBUTING.md](./CONTRIBUTING.md).

## License

[MIT](./LICENSE)
package/dist/chunk/chunker.js
ADDED
@@ -0,0 +1,169 @@
import { extractBlockId, extractWikilinks, isFence, matchHeading } from "./markdown.js";
// Matches a leading YAML frontmatter block so it can be excluded from chunk text
// while keeping absolute file offsets intact for the body.
const FRONTMATTER_RE = /^---\r?\n[\s\S]*?\r?\n---\r?\n?/;
// Upper bound on chunk size. all-MiniLM-L6-v2 truncates at ~256 tokens (~1k
// chars for English); oversized blocks are split so no content is silently lost.
const MAX_CHUNK_CHARS = 1000;
/**
 * Splits a note into chunks — Anchor's citation/evidence units.
 *
 * The unit is a blank-line-separated block (paragraph, list group, table, or
 * fenced code), which matches how Obsidian attaches `^block-id`s. Oversized
 * blocks are further split to fit the embedding model. Each chunk carries the
 * most precise anchor available (block id > enclosing heading > synthetic line
 * marker). Heading lines are not emitted as chunks; they build the breadcrumb.
 */
export function chunkNote(path, content) {
    const chunks = [];
    const headingStack = [];
    const frontmatter = FRONTMATTER_RE.exec(content);
    const bodyStart = frontmatter ? frontmatter[0].length : 0;
    let bufStart = -1;
    let bufLines = [];
    let inFence = false;
    const pushChunk = (text, start) => {
        const blockId = extractBlockId(text);
        const meta = {
            path,
            headingPath: headingStack.length > 0 ? headingStack.map((h) => h.text).join(" > ") : null,
            blockId,
            anchor: resolveAnchor(blockId, headingStack, chunks.length),
            wikilinks: extractWikilinks(text),
        };
        chunks.push({
            text,
            ordinal: chunks.length,
            charStart: start,
            charEnd: start + text.length,
            tokenEst: Math.ceil(text.length / 4),
            meta,
        });
    };
    const flush = () => {
        if (bufLines.length === 0)
            return;
        const text = bufLines.join("\n");
        const start = bufStart;
        bufLines = [];
        bufStart = -1;
        if (text.trim() === "")
            return;
        // Tables confuse the NLI verifier when kept whole (rows blur together), so
        // emit one chunk per row.
        if (isTable(text)) {
            for (const row of tableRows(text, start))
                pushChunk(row.text, row.start);
            return;
        }
        for (const piece of splitBlock(text, start)) {
            pushChunk(piece.text, piece.start);
        }
    };
    // Index frontmatter that carries aliases/tags so a note is findable by them.
    const frontmatterText = frontmatter?.[0];
    if (frontmatterText !== undefined && /(^|\n)(aliases|tags)\s*:/i.test(frontmatterText)) {
        pushChunk(frontmatterText, 0);
    }
    let offset = bodyStart;
    for (const line of content.slice(bodyStart).split("\n")) {
        const lineStart = offset;
        offset += line.length + 1; // +1 for the "\n" that split removed
        if (isFence(line)) {
            if (inFence) {
                bufLines.push(line);
                inFence = false;
                flush();
            }
            else {
                flush();
                inFence = true;
                bufStart = lineStart;
                bufLines.push(line);
            }
            continue;
        }
        if (inFence) {
            if (bufStart === -1)
                bufStart = lineStart;
            bufLines.push(line);
            continue;
        }
        const heading = matchHeading(line);
        if (heading) {
            flush();
            while (headingStack.length > 0) {
                const top = headingStack[headingStack.length - 1];
                if (top === undefined || top.level < heading.level)
                    break;
                headingStack.pop();
            }
            headingStack.push(heading);
            continue;
        }
        if (line.trim() === "") {
            flush();
            continue;
        }
        if (bufStart === -1)
            bufStart = lineStart;
        bufLines.push(line);
    }
    flush();
    return chunks;
}
/** Splits an oversized block into <= MAX_CHUNK_CHARS pieces on line/word boundaries. */
function splitBlock(text, start) {
    if (text.length <= MAX_CHUNK_CHARS)
        return [{ text, start }];
    const pieces = [];
    let pos = 0;
    while (pos < text.length) {
        let end = Math.min(pos + MAX_CHUNK_CHARS, text.length);
        if (end < text.length) {
            // Back off to the last newline or space so we don't cut mid-word.
            const window = text.slice(pos, end);
            const newline = window.lastIndexOf("\n");
            const space = window.lastIndexOf(" ");
            const boundary = newline > MAX_CHUNK_CHARS / 2 ? newline : space;
            if (boundary > 0)
                end = pos + boundary + 1;
        }
        const piece = text.slice(pos, end);
        if (piece.trim() !== "")
            pieces.push({ text: piece, start: start + pos });
        pos = end;
    }
    return pieces;
}
const TABLE_SEPARATOR_RE = /^\|[\s:|-]+\|?$/;
function isTable(text) {
    const lines = text.split("\n").filter((line) => line.trim() !== "");
    if (lines.length < 2)
        return false;
    const pipeLines = lines.filter((line) => line.trim().startsWith("|")).length;
    return pipeLines >= 2 && pipeLines >= lines.length - 1;
}
/** Splits a markdown table into one chunk per row (separator row dropped), offset-exact. */
function tableRows(text, start) {
    const rows = [];
    let offset = 0;
    for (const line of text.split("\n")) {
        const lineStart = start + offset;
        offset += line.length + 1;
        const trimmed = line.trim();
        if (!trimmed.startsWith("|") || TABLE_SEPARATOR_RE.test(trimmed))
            continue;
        rows.push({ text: line, start: lineStart });
    }
    return rows;
}
function resolveAnchor(blockId, stack, ordinal) {
    if (blockId)
        return `#^${blockId}`;
    const top = stack[stack.length - 1];
    if (top)
        return `#${top.text}`;
    return `#L${ordinal}`;
}
//# sourceMappingURL=chunker.js.map
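
The heading-stack handling in `chunkNote` is the part that produces each chunk's `headingPath` breadcrumb. A standalone sketch of just that step, assuming heading objects of the form `{ level, text }` as returned by `matchHeading`; the `pushHeading` name is hypothetical and exists only for this illustration:

```javascript
// Pops every heading at the same or a deeper level, then pushes the new one,
// so the stack always holds the chain of enclosing headings.
function pushHeading(stack, heading) {
    while (stack.length > 0 && stack[stack.length - 1].level >= heading.level) {
        stack.pop();
    }
    stack.push(heading);
    // Same breadcrumb format chunkNote stores in meta.headingPath.
    return stack.map((h) => h.text).join(" > ");
}

const stack = [];
console.log(pushHeading(stack, { level: 1, text: "Auth" }));      // Auth
console.log(pushHeading(stack, { level: 2, text: "Decisions" })); // Auth > Decisions
console.log(pushHeading(stack, { level: 2, text: "Timeline" }));  // Auth > Timeline
console.log(pushHeading(stack, { level: 1, text: "Billing" }));   // Billing
```

Because sibling and shallower headings replace what is on the stack, a chunk is only ever attributed to headings that actually enclose it.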
package/dist/chunk/chunker.js.map
ADDED
@@ -0,0 +1 @@
{"version":3,"file":"chunker.js","sourceRoot":"","sources":["../../src/chunk/chunker.ts"],"names":[],"mappings":"AAAA,OAAO,EAAE,cAAc,EAAE,gBAAgB,EAAE,OAAO,EAAE,YAAY,EAAE,MAAM,eAAe,CAAC;AAQxF,iFAAiF;AACjF,2DAA2D;AAC3D,MAAM,cAAc,GAAG,iCAAiC,CAAC;AAEzD,4EAA4E;AAC5E,iFAAiF;AACjF,MAAM,eAAe,GAAG,IAAI,CAAC;AAE7B;;;;;;;;GAQG;AACH,MAAM,UAAU,SAAS,CAAC,IAAY,EAAE,OAAe;IACrD,MAAM,MAAM,GAAY,EAAE,CAAC;IAC3B,MAAM,YAAY,GAAmB,EAAE,CAAC;IAExC,MAAM,WAAW,GAAG,cAAc,CAAC,IAAI,CAAC,OAAO,CAAC,CAAC;IACjD,MAAM,SAAS,GAAG,WAAW,CAAC,CAAC,CAAC,WAAW,CAAC,CAAC,CAAC,CAAC,MAAM,CAAC,CAAC,CAAC,CAAC,CAAC;IAE1D,IAAI,QAAQ,GAAG,CAAC,CAAC,CAAC;IAClB,IAAI,QAAQ,GAAa,EAAE,CAAC;IAC5B,IAAI,OAAO,GAAG,KAAK,CAAC;IAEpB,MAAM,SAAS,GAAG,CAAC,IAAY,EAAE,KAAa,EAAQ,EAAE;QACtD,MAAM,OAAO,GAAG,cAAc,CAAC,IAAI,CAAC,CAAC;QACrC,MAAM,IAAI,GAAiB;YACzB,IAAI;YACJ,WAAW,EAAE,YAAY,CAAC,MAAM,GAAG,CAAC,CAAC,CAAC,CAAC,YAAY,CAAC,GAAG,CAAC,CAAC,CAAC,EAAE,EAAE,CAAC,CAAC,CAAC,IAAI,CAAC,CAAC,IAAI,CAAC,KAAK,CAAC,CAAC,CAAC,CAAC,IAAI;YACzF,OAAO;YACP,MAAM,EAAE,aAAa,CAAC,OAAO,EAAE,YAAY,EAAE,MAAM,CAAC,MAAM,CAAC;YAC3D,SAAS,EAAE,gBAAgB,CAAC,IAAI,CAAC;SAClC,CAAC;QACF,MAAM,CAAC,IAAI,CAAC;YACV,IAAI;YACJ,OAAO,EAAE,MAAM,CAAC,MAAM;YACtB,SAAS,EAAE,KAAK;YAChB,OAAO,EAAE,KAAK,GAAG,IAAI,CAAC,MAAM;YAC5B,QAAQ,EAAE,IAAI,CAAC,IAAI,CAAC,IAAI,CAAC,MAAM,GAAG,CAAC,CAAC;YACpC,IAAI;SACL,CAAC,CAAC;IACL,CAAC,CAAC;IAEF,MAAM,KAAK,GAAG,GAAS,EAAE;QACvB,IAAI,QAAQ,CAAC,MAAM,KAAK,CAAC;YAAE,OAAO;QAClC,MAAM,IAAI,GAAG,QAAQ,CAAC,IAAI,CAAC,IAAI,CAAC,CAAC;QACjC,MAAM,KAAK,GAAG,QAAQ,CAAC;QACvB,QAAQ,GAAG,EAAE,CAAC;QACd,QAAQ,GAAG,CAAC,CAAC,CAAC;QACd,IAAI,IAAI,CAAC,IAAI,EAAE,KAAK,EAAE;YAAE,OAAO;QAC/B,2EAA2E;QAC3E,0BAA0B;QAC1B,IAAI,OAAO,CAAC,IAAI,CAAC,EAAE,CAAC;YAClB,KAAK,MAAM,GAAG,IAAI,SAAS,CAAC,IAAI,EAAE,KAAK,CAAC;gBAAE,SAAS,CAAC,GAAG,CAAC,IAAI,EAAE,GAAG,CAAC,KAAK,CAAC,CAAC;YACzE,OAAO;QACT,CAAC;QACD,KAAK,MAAM,KAAK,IAAI,UAAU,CAAC,IAAI,EAAE,KAAK,CAAC,EAAE,CAAC;YAC5C,SAAS,CAAC,KAAK,CAAC,IAAI,EAAE,KAAK,CAAC,KAAK,CAAC,CAAC;QACrC,CAAC;IACH,CAAC,CAAC;IAEF,6EAA6E;IAC7E,MAAM,eAAe,GAAG,WAAW,EAAE,CAAC,
CAAC,CAAC,CAAC;IACzC,IAAI,eAAe,KAAK,SAAS,IAAI,2BAA2B,CAAC,IAAI,CAAC,eAAe,CAAC,EAAE,CAAC;QACvF,SAAS,CAAC,eAAe,EAAE,CAAC,CAAC,CAAC;IAChC,CAAC;IAED,IAAI,MAAM,GAAG,SAAS,CAAC;IACvB,KAAK,MAAM,IAAI,IAAI,OAAO,CAAC,KAAK,CAAC,SAAS,CAAC,CAAC,KAAK,CAAC,IAAI,CAAC,EAAE,CAAC;QACxD,MAAM,SAAS,GAAG,MAAM,CAAC;QACzB,MAAM,IAAI,IAAI,CAAC,MAAM,GAAG,CAAC,CAAC,CAAC,qCAAqC;QAEhE,IAAI,OAAO,CAAC,IAAI,CAAC,EAAE,CAAC;YAClB,IAAI,OAAO,EAAE,CAAC;gBACZ,QAAQ,CAAC,IAAI,CAAC,IAAI,CAAC,CAAC;gBACpB,OAAO,GAAG,KAAK,CAAC;gBAChB,KAAK,EAAE,CAAC;YACV,CAAC;iBAAM,CAAC;gBACN,KAAK,EAAE,CAAC;gBACR,OAAO,GAAG,IAAI,CAAC;gBACf,QAAQ,GAAG,SAAS,CAAC;gBACrB,QAAQ,CAAC,IAAI,CAAC,IAAI,CAAC,CAAC;YACtB,CAAC;YACD,SAAS;QACX,CAAC;QAED,IAAI,OAAO,EAAE,CAAC;YACZ,IAAI,QAAQ,KAAK,CAAC,CAAC;gBAAE,QAAQ,GAAG,SAAS,CAAC;YAC1C,QAAQ,CAAC,IAAI,CAAC,IAAI,CAAC,CAAC;YACpB,SAAS;QACX,CAAC;QAED,MAAM,OAAO,GAAG,YAAY,CAAC,IAAI,CAAC,CAAC;QACnC,IAAI,OAAO,EAAE,CAAC;YACZ,KAAK,EAAE,CAAC;YACR,OAAO,YAAY,CAAC,MAAM,GAAG,CAAC,EAAE,CAAC;gBAC/B,MAAM,GAAG,GAAG,YAAY,CAAC,YAAY,CAAC,MAAM,GAAG,CAAC,CAAC,CAAC;gBAClD,IAAI,GAAG,KAAK,SAAS,IAAI,GAAG,CAAC,KAAK,GAAG,OAAO,CAAC,KAAK;oBAAE,MAAM;gBAC1D,YAAY,CAAC,GAAG,EAAE,CAAC;YACrB,CAAC;YACD,YAAY,CAAC,IAAI,CAAC,OAAO,CAAC,CAAC;YAC3B,SAAS;QACX,CAAC;QAED,IAAI,IAAI,CAAC,IAAI,EAAE,KAAK,EAAE,EAAE,CAAC;YACvB,KAAK,EAAE,CAAC;YACR,SAAS;QACX,CAAC;QAED,IAAI,QAAQ,KAAK,CAAC,CAAC;YAAE,QAAQ,GAAG,SAAS,CAAC;QAC1C,QAAQ,CAAC,IAAI,CAAC,IAAI,CAAC,CAAC;IACtB,CAAC;IACD,KAAK,EAAE,CAAC;IAER,OAAO,MAAM,CAAC;AAChB,CAAC;AAOD,wFAAwF;AACxF,SAAS,UAAU,CAAC,IAAY,EAAE,KAAa;IAC7C,IAAI,IAAI,CAAC,MAAM,IAAI,eAAe;QAAE,OAAO,CAAC,EAAE,IAAI,EAAE,KAAK,EAAE,CAAC,CAAC;IAE7D,MAAM,MAAM,GAAY,EAAE,CAAC;IAC3B,IAAI,GAAG,GAAG,CAAC,CAAC;IACZ,OAAO,GAAG,GAAG,IAAI,CAAC,MAAM,EAAE,CAAC;QACzB,IAAI,GAAG,GAAG,IAAI,CAAC,GAAG,CAAC,GAAG,GAAG,eAAe,EAAE,IAAI,CAAC,MAAM,CAAC,CAAC;QACvD,IAAI,GAAG,GAAG,IAAI,CAAC,MAAM,EAAE,CAAC;YACtB,kEAAkE;YAClE,MAAM,MAAM,GAAG,IAAI,CAAC,KAAK,CAAC,GAAG,EAAE,GAAG,CAAC,CAAC;YACpC,MAAM,OAAO,GAAG,MAAM,CAAC,WAAW,CAAC,IAAI,CAAC,CAAC;YACzC,MAAM,KAAK,GAAG,MAAM,CAAC,WAAW,CAAC,
GAAG,CAAC,CAAC;YACtC,MAAM,QAAQ,GAAG,OAAO,GAAG,eAAe,GAAG,CAAC,CAAC,CAAC,CAAC,OAAO,CAAC,CAAC,CAAC,KAAK,CAAC;YACjE,IAAI,QAAQ,GAAG,CAAC;gBAAE,GAAG,GAAG,GAAG,GAAG,QAAQ,GAAG,CAAC,CAAC;QAC7C,CAAC;QACD,MAAM,KAAK,GAAG,IAAI,CAAC,KAAK,CAAC,GAAG,EAAE,GAAG,CAAC,CAAC;QACnC,IAAI,KAAK,CAAC,IAAI,EAAE,KAAK,EAAE;YAAE,MAAM,CAAC,IAAI,CAAC,EAAE,IAAI,EAAE,KAAK,EAAE,KAAK,EAAE,KAAK,GAAG,GAAG,EAAE,CAAC,CAAC;QAC1E,GAAG,GAAG,GAAG,CAAC;IACZ,CAAC;IACD,OAAO,MAAM,CAAC;AAChB,CAAC;AAED,MAAM,kBAAkB,GAAG,iBAAiB,CAAC;AAE7C,SAAS,OAAO,CAAC,IAAY;IAC3B,MAAM,KAAK,GAAG,IAAI,CAAC,KAAK,CAAC,IAAI,CAAC,CAAC,MAAM,CAAC,CAAC,IAAI,EAAE,EAAE,CAAC,IAAI,CAAC,IAAI,EAAE,KAAK,EAAE,CAAC,CAAC;IACpE,IAAI,KAAK,CAAC,MAAM,GAAG,CAAC;QAAE,OAAO,KAAK,CAAC;IACnC,MAAM,SAAS,GAAG,KAAK,CAAC,MAAM,CAAC,CAAC,IAAI,EAAE,EAAE,CAAC,IAAI,CAAC,IAAI,EAAE,CAAC,UAAU,CAAC,GAAG,CAAC,CAAC,CAAC,MAAM,CAAC;IAC7E,OAAO,SAAS,IAAI,CAAC,IAAI,SAAS,IAAI,KAAK,CAAC,MAAM,GAAG,CAAC,CAAC;AACzD,CAAC;AAED,4FAA4F;AAC5F,SAAS,SAAS,CAAC,IAAY,EAAE,KAAa;IAC5C,MAAM,IAAI,GAAY,EAAE,CAAC;IACzB,IAAI,MAAM,GAAG,CAAC,CAAC;IACf,KAAK,MAAM,IAAI,IAAI,IAAI,CAAC,KAAK,CAAC,IAAI,CAAC,EAAE,CAAC;QACpC,MAAM,SAAS,GAAG,KAAK,GAAG,MAAM,CAAC;QACjC,MAAM,IAAI,IAAI,CAAC,MAAM,GAAG,CAAC,CAAC;QAC1B,MAAM,OAAO,GAAG,IAAI,CAAC,IAAI,EAAE,CAAC;QAC5B,IAAI,CAAC,OAAO,CAAC,UAAU,CAAC,GAAG,CAAC,IAAI,kBAAkB,CAAC,IAAI,CAAC,OAAO,CAAC;YAAE,SAAS;QAC3E,IAAI,CAAC,IAAI,CAAC,EAAE,IAAI,EAAE,IAAI,EAAE,KAAK,EAAE,SAAS,EAAE,CAAC,CAAC;IAC9C,CAAC;IACD,OAAO,IAAI,CAAC;AACd,CAAC;AAED,SAAS,aAAa,CAAC,OAAsB,EAAE,KAAqB,EAAE,OAAe;IACnF,IAAI,OAAO;QAAE,OAAO,KAAK,OAAO,EAAE,CAAC;IACnC,MAAM,GAAG,GAAG,KAAK,CAAC,KAAK,CAAC,MAAM,GAAG,CAAC,CAAC,CAAC;IACpC,IAAI,GAAG;QAAE,OAAO,IAAI,GAAG,CAAC,IAAI,EAAE,CAAC;IAC/B,OAAO,KAAK,OAAO,EAAE,CAAC;AACxB,CAAC"}
package/dist/chunk/markdown.js
ADDED
@@ -0,0 +1,49 @@
// Deterministic Obsidian markdown extraction helpers (no LLM).
//
// These power the chunker. For the v1 milestone they are intentionally
// line/regex based; a remark-based hardening pass (robust fenced-code handling,
// nested structures) is planned for week 2.
// [[note]] | [[note#^block]] | [[note#Heading]] | [[note|alias]]
const WIKILINK_RE = /\[\[([^\]|#]+)(#[^\]|]+)?(\|[^\]]+)?\]\]/g;
// trailing ^block-id on a block's last line
const BLOCK_ID_RE = /(?:^|\s)\^([A-Za-z0-9-]+)\s*$/;
const HEADING_RE = /^(#{1,6})\s+(.*\S)\s*$/;
const FENCE_RE = /^(?:```|~~~)/;
export function extractWikilinks(text) {
    const links = [];
    for (const match of text.matchAll(WIKILINK_RE)) {
        const targetNote = (match[1] ?? "").trim();
        if (targetNote === "")
            continue;
        const targetAnchor = match[2] ?? null;
        const aliasGroup = match[3];
        links.push({
            targetNote,
            targetAnchor,
            alias: aliasGroup ? aliasGroup.slice(1) : null,
        });
    }
    return links;
}
/** Returns the block id (caret stripped) declared at the end of a block, if any. */
export function extractBlockId(blockText) {
    const lines = blockText.split("\n");
    for (let i = lines.length - 1; i >= 0; i--) {
        const line = lines[i];
        if (line === undefined || line.trim() === "")
            continue;
        const match = BLOCK_ID_RE.exec(line);
        return match ? (match[1] ?? null) : null;
    }
    return null;
}
export function matchHeading(line) {
    const match = HEADING_RE.exec(line);
    if (!match || match[1] === undefined || match[2] === undefined)
        return null;
    return { level: match[1].length, text: match[2].trim() };
}
export function isFence(line) {
    return FENCE_RE.test(line.trimStart());
}
//# sourceMappingURL=markdown.js.map
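
To see what the two main patterns in this file capture, here is a small standalone example. The regular expressions are copied from `markdown.js`; the `parseWikilinks` and `blockIdOf` wrappers and the sample strings are illustrative assumptions, not part of the package.

```javascript
const WIKILINK_RE = /\[\[([^\]|#]+)(#[^\]|]+)?(\|[^\]]+)?\]\]/g;
const BLOCK_ID_RE = /(?:^|\s)\^([A-Za-z0-9-]+)\s*$/;

// Hypothetical wrapper: group 1 is the note, group 2 the "#..." anchor
// (kept with its leading "#"), group 3 the "|alias" (leading "|" stripped).
function parseWikilinks(text) {
    return [...text.matchAll(WIKILINK_RE)].map((m) => ({
        targetNote: m[1].trim(),
        targetAnchor: m[2] ?? null,
        alias: m[3] ? m[3].slice(1) : null,
    }));
}

// Hypothetical wrapper: returns the trailing block id of a single line, if any.
function blockIdOf(line) {
    const m = BLOCK_ID_RE.exec(line);
    return m ? m[1] : null;
}

console.log(parseWikilinks("See [[Auth#^d4e1|the decision]] and [[Roadmap]]."));
console.log(blockIdOf("Decision: use Postgres. ^d4e1")); // d4e1
console.log(blockIdOf("2^10 is 1024"));                  // null
```

The last call shows why `BLOCK_ID_RE` requires whitespace (or line start) before the caret and nothing but whitespace after the id: a caret inside ordinary text is not mistaken for a block id.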
package/dist/chunk/markdown.js.map
ADDED
@@ -0,0 +1 @@
{"version":3,"file":"markdown.js","sourceRoot":"","sources":["../../src/chunk/markdown.ts"],"names":[],"mappings":"AAAA,+DAA+D;AAC/D,EAAE;AACF,uEAAuE;AACvE,gFAAgF;AAChF,4CAA4C;AAI5C,iEAAiE;AACjE,MAAM,WAAW,GAAG,2CAA2C,CAAC;AAChE,4CAA4C;AAC5C,MAAM,WAAW,GAAG,+BAA+B,CAAC;AACpD,MAAM,UAAU,GAAG,wBAAwB,CAAC;AAC5C,MAAM,QAAQ,GAAG,cAAc,CAAC;AAEhC,MAAM,UAAU,gBAAgB,CAAC,IAAY;IAC3C,MAAM,KAAK,GAAe,EAAE,CAAC;IAC7B,KAAK,MAAM,KAAK,IAAI,IAAI,CAAC,QAAQ,CAAC,WAAW,CAAC,EAAE,CAAC;QAC/C,MAAM,UAAU,GAAG,CAAC,KAAK,CAAC,CAAC,CAAC,IAAI,EAAE,CAAC,CAAC,IAAI,EAAE,CAAC;QAC3C,IAAI,UAAU,KAAK,EAAE;YAAE,SAAS;QAChC,MAAM,YAAY,GAAG,KAAK,CAAC,CAAC,CAAC,IAAI,IAAI,CAAC;QACtC,MAAM,UAAU,GAAG,KAAK,CAAC,CAAC,CAAC,CAAC;QAC5B,KAAK,CAAC,IAAI,CAAC;YACT,UAAU;YACV,YAAY;YACZ,KAAK,EAAE,UAAU,CAAC,CAAC,CAAC,UAAU,CAAC,KAAK,CAAC,CAAC,CAAC,CAAC,CAAC,CAAC,IAAI;SAC/C,CAAC,CAAC;IACL,CAAC;IACD,OAAO,KAAK,CAAC;AACf,CAAC;AAED,oFAAoF;AACpF,MAAM,UAAU,cAAc,CAAC,SAAiB;IAC9C,MAAM,KAAK,GAAG,SAAS,CAAC,KAAK,CAAC,IAAI,CAAC,CAAC;IACpC,KAAK,IAAI,CAAC,GAAG,KAAK,CAAC,MAAM,GAAG,CAAC,EAAE,CAAC,IAAI,CAAC,EAAE,CAAC,EAAE,EAAE,CAAC;QAC3C,MAAM,IAAI,GAAG,KAAK,CAAC,CAAC,CAAC,CAAC;QACtB,IAAI,IAAI,KAAK,SAAS,IAAI,IAAI,CAAC,IAAI,EAAE,KAAK,EAAE;YAAE,SAAS;QACvD,MAAM,KAAK,GAAG,WAAW,CAAC,IAAI,CAAC,IAAI,CAAC,CAAC;QACrC,OAAO,KAAK,CAAC,CAAC,CAAC,CAAC,KAAK,CAAC,CAAC,CAAC,IAAI,IAAI,CAAC,CAAC,CAAC,CAAC,IAAI,CAAC;IAC3C,CAAC;IACD,OAAO,IAAI,CAAC;AACd,CAAC;AAOD,MAAM,UAAU,YAAY,CAAC,IAAY;IACvC,MAAM,KAAK,GAAG,UAAU,CAAC,IAAI,CAAC,IAAI,CAAC,CAAC;IACpC,IAAI,CAAC,KAAK,IAAI,KAAK,CAAC,CAAC,CAAC,KAAK,SAAS,IAAI,KAAK,CAAC,CAAC,CAAC,KAAK,SAAS;QAAE,OAAO,IAAI,CAAC;IAC5E,OAAO,EAAE,KAAK,EAAE,KAAK,CAAC,CAAC,CAAC,CAAC,MAAM,EAAE,IAAI,EAAE,KAAK,CAAC,CAAC,CAAC,CAAC,IAAI,EAAE,EAAE,CAAC;AAC3D,CAAC;AAED,MAAM,UAAU,OAAO,CAAC,IAAY;IAClC,OAAO,QAAQ,CAAC,IAAI,CAAC,IAAI,CAAC,SAAS,EAAE,CAAC,CAAC;AACzC,CAAC"}
|
|
@@ -0,0 +1 @@
+{"version":3,"file":"types.js","sourceRoot":"","sources":["../../src/chunk/types.ts"],"names":[],"mappings":"AAAA,8EAA8E;AAC9E,4EAA4E;AAC5E,6DAA6D"}
package/dist/config.js
ADDED
@@ -0,0 +1,17 @@
+import { join } from "node:path";
+const ANCHOR_DIR = ".anchor";
+export function resolveConfig(vaultPath, overrides = {}) {
+    return {
+        vaultPath,
+        dbPath: join(vaultPath, ANCHOR_DIR, "index.db"),
+        modelCacheDir: join(vaultPath, ANCHOR_DIR, "models"),
+        knn: 6,
+        grounded: 0.8,
+        refuse: 0.5,
+        decisiveMinConfidence: 0.5,
+        evidenceMinScore: 0.35,
+        verifyTimeoutMs: 30_000,
+        ...overrides,
+    };
+}
+//# sourceMappingURL=config.js.map
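Because `...overrides` is spread last in the object literal returned by `resolveConfig`, any key the caller supplies replaces the corresponding default, and that includes the computed `dbPath` and `modelCacheDir`. The sketch below illustrates only that merge behavior. `resolveThresholds` is a hypothetical stand-in, not something the package exports: it copies the numeric defaults from the hunk above and leaves the path fields out so that it needs no imports.

```javascript
// Hypothetical stand-in for resolveConfig: same numeric defaults as the diff,
// path fields omitted so the sketch needs no imports.
function resolveThresholds(overrides = {}) {
    return {
        knn: 6,
        grounded: 0.8,
        refuse: 0.5,
        decisiveMinConfidence: 0.5,
        evidenceMinScore: 0.35,
        verifyTimeoutMs: 30_000,
        ...overrides, // spread last: caller-supplied keys replace the defaults
    };
}

const tuned = resolveThresholds({ knn: 10 });
console.log(tuned.knn, tuned.grounded); // 10 0.8

// An explicit undefined is still an own property, so it also replaces the default.
const cleared = resolveThresholds({ knn: undefined });
console.log(cleared.knn); // undefined
```

If the second case matters in practice, a caller could drop `undefined` keys before passing the overrides in; the diff itself does not show any such filtering.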
@@ -0,0 +1 @@
+{"version":3,"file":"config.js","sourceRoot":"","sources":["../src/config.ts"],"names":[],"mappings":"AAAA,OAAO,EAAE,IAAI,EAAE,MAAM,WAAW,CAAC;AAgCjC,MAAM,UAAU,GAAG,SAAS,CAAC;AAE7B,MAAM,UAAU,aAAa,CAAC,SAAiB,EAAE,YAA6B,EAAE;IAC9E,OAAO;QACL,SAAS;QACT,MAAM,EAAE,IAAI,CAAC,SAAS,EAAE,UAAU,EAAE,UAAU,CAAC;QAC/C,aAAa,EAAE,IAAI,CAAC,SAAS,EAAE,UAAU,EAAE,QAAQ,CAAC;QACpD,GAAG,EAAE,CAAC;QACN,QAAQ,EAAE,GAAG;QACb,MAAM,EAAE,GAAG;QACX,qBAAqB,EAAE,GAAG;QAC1B,gBAAgB,EAAE,IAAI;QACtB,eAAe,EAAE,MAAM;QACvB,GAAG,SAAS;KACb,CAAC;AACJ,CAAC"}
@@ -0,0 +1,84 @@
+import { mkdirSync, rmSync } from "node:fs";
+import { dirname } from "node:path";
+import { resolveConfig } from "./config.js";
+import { SafeEditService } from "./edit/safeEdit.js";
+import { LocalEmbeddingProvider } from "./embeddings/local.js";
+import { OpenAIEmbeddingProvider } from "./embeddings/openai.js";
+import { Indexer } from "./index/indexer.js";
+import { ChunkStore } from "./store/chunkStore.js";
+import { openDb } from "./store/db.js";
+import { SearchService } from "./store/search.js";
+import { VectorStore } from "./store/vectorStore.js";
+import { logger } from "./util/logger.js";
+import { AnthropicVerifier } from "./verify/anthropic.js";
+import { LocalVerifier } from "./verify/localVerifier.js";
+import { GroundingPipeline } from "./verify/pipeline.js";
+/**
+ * Composition root: instantiates the store, providers, and services for a vault.
+ * Tools never construct providers directly — they receive an {@link AppContext}.
+ */
+export function createContext(vaultPath, overrides = {}) {
+    const config = resolveConfig(vaultPath, overrides.config);
+    mkdirSync(dirname(config.dbPath), { recursive: true });
+    if (overrides.reindex) {
+        for (const suffix of ["", "-wal", "-shm"]) {
+            rmSync(`${config.dbPath}${suffix}`, { force: true });
+        }
+    }
+    const embeddings = overrides.embeddings ?? createDefaultEmbeddings(config);
+    const verifier = overrides.verifier ?? createDefaultVerifier(config.modelCacheDir);
+    const { db, vectorBackend } = openDb({
+        path: config.dbPath,
+        embeddingModel: embeddings.id,
+        dim: embeddings.dim,
+    });
+    const chunkStore = new ChunkStore(db);
+    const vectorStore = new VectorStore(db, vectorBackend);
+    const indexer = new Indexer(vaultPath, db, chunkStore, vectorStore, embeddings);
+    const search = new SearchService(embeddings, vectorStore, chunkStore);
+    const grounding = new GroundingPipeline(search, verifier, {
+        knn: config.knn,
+        thresholds: { grounded: config.grounded, refuse: config.refuse },
+        decisiveMinConfidence: config.decisiveMinConfidence,
+        evidenceMinScore: config.evidenceMinScore,
+        verifyTimeoutMs: config.verifyTimeoutMs,
+    });
+    const safeEdit = new SafeEditService(vaultPath, indexer);
+    return {
+        config,
+        db,
+        embeddings,
+        verifier,
+        chunkStore,
+        vectorStore,
+        indexer,
+        search,
+        grounding,
+        safeEdit,
+    };
+}
+/**
+ * Selects the embedding provider. Local by default (privacy-first); OpenAI only
+ * when explicitly opted in via ANCHOR_EMBEDDING=openai (notes are then sent to
+ * the OpenAI API).
+ */
+function createDefaultEmbeddings(config) {
+    if (process.env.ANCHOR_EMBEDDING === "openai") {
+        logger.info("Using OpenAI embeddings (notes are sent to the OpenAI API)");
+        return new OpenAIEmbeddingProvider();
+    }
+    return new LocalEmbeddingProvider(config.modelCacheDir);
+}
+/**
+ * Selects the grounding verifier: Anthropic (Haiku) when an API key is present,
+ * otherwise the local NLI verifier. Force local with ANCHOR_VERIFIER=local.
+ */
+function createDefaultVerifier(modelCacheDir) {
+    if (process.env.ANCHOR_VERIFIER !== "local" && process.env.ANTHROPIC_API_KEY) {
+        logger.info("Using Anthropic verifier (ANTHROPIC_API_KEY detected)");
+        return new AnthropicVerifier();
+    }
+    logger.info("Using local NLI verifier; set ANTHROPIC_API_KEY for higher accuracy");
+    return new LocalVerifier(modelCacheDir);
+}
+//# sourceMappingURL=container.js.map
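The two selection functions in `container.js` are asymmetric: embeddings leave the machine only on an explicit `ANCHOR_EMBEDDING=openai`, whereas the verifier switches to Anthropic as soon as `ANTHROPIC_API_KEY` is set, unless `ANCHOR_VERIFIER=local` forces it back. The sketch below restates those branches as pure functions so the cases can be checked in isolation. All three helpers are hypothetical, not part of the package: they take the environment as a plain object instead of reading `process.env`, and they return labels or paths rather than constructing providers or deleting files.

```javascript
// Hypothetical helper mirroring the branch in createDefaultEmbeddings.
function pickEmbeddings(env) {
    return env.ANCHOR_EMBEDDING === "openai" ? "openai" : "local";
}

// Hypothetical helper mirroring the branch in createDefaultVerifier.
function pickVerifier(env) {
    if (env.ANCHOR_VERIFIER !== "local" && env.ANTHROPIC_API_KEY) {
        return "anthropic";
    }
    return "local";
}

// Hypothetical helper listing the paths createContext removes when
// overrides.reindex is set (the database file plus two suffixed siblings).
function reindexTargets(dbPath) {
    return ["", "-wal", "-shm"].map((suffix) => `${dbPath}${suffix}`);
}

console.log(pickEmbeddings({}));                                  // local
console.log(pickEmbeddings({ ANCHOR_EMBEDDING: "openai" }));      // openai
console.log(pickVerifier({ ANTHROPIC_API_KEY: "sk-test" }));      // anthropic
console.log(pickVerifier({ ANTHROPIC_API_KEY: "sk-test",
                           ANCHOR_VERIFIER: "local" }));          // local
console.log(pickVerifier({ ANTHROPIC_API_KEY: "" }));             // local
console.log(reindexTargets("/vault/.anchor/index.db").length);    // 3
```

The `-wal` and `-shm` suffixes match the sidecar files SQLite creates in write-ahead-log mode, which suggests (but the hunk does not confirm) that `openDb` opens a SQLite database in that mode. An empty `ANTHROPIC_API_KEY` is falsy, so it falls through to the local verifier.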
@@ -0,0 +1 @@
+{"version":3,"file":"container.js","sourceRoot":"","sources":["../src/container.ts"],"names":[],"mappings":"AAAA,OAAO,EAAE,SAAS,EAAE,MAAM,EAAE,MAAM,SAAS,CAAC;AAC5C,OAAO,EAAE,OAAO,EAAE,MAAM,WAAW,CAAC;AAEpC,OAAO,EAA2C,aAAa,EAAE,MAAM,aAAa,CAAC;AACrF,OAAO,EAAE,eAAe,EAAE,MAAM,oBAAoB,CAAC;AACrD,OAAO,EAAE,sBAAsB,EAAE,MAAM,uBAAuB,CAAC;AAC/D,OAAO,EAAE,uBAAuB,EAAE,MAAM,wBAAwB,CAAC;AAEjE,OAAO,EAAE,OAAO,EAAE,MAAM,oBAAoB,CAAC;AAC7C,OAAO,EAAE,UAAU,EAAE,MAAM,uBAAuB,CAAC;AACnD,OAAO,EAAW,MAAM,EAAE,MAAM,eAAe,CAAC;AAChD,OAAO,EAAE,aAAa,EAAE,MAAM,mBAAmB,CAAC;AAClD,OAAO,EAAE,WAAW,EAAE,MAAM,wBAAwB,CAAC;AACrD,OAAO,EAAE,MAAM,EAAE,MAAM,kBAAkB,CAAC;AAC1C,OAAO,EAAE,iBAAiB,EAAE,MAAM,uBAAuB,CAAC;AAC1D,OAAO,EAAE,aAAa,EAAE,MAAM,2BAA2B,CAAC;AAC1D,OAAO,EAAE,iBAAiB,EAAE,MAAM,sBAAsB,CAAC;AAiCzD;;;GAGG;AACH,MAAM,UAAU,aAAa,CAAC,SAAiB,EAAE,YAA8B,EAAE;IAC/E,MAAM,MAAM,GAAG,aAAa,CAAC,SAAS,EAAE,SAAS,CAAC,MAAM,CAAC,CAAC;IAC1D,SAAS,CAAC,OAAO,CAAC,MAAM,CAAC,MAAM,CAAC,EAAE,EAAE,SAAS,EAAE,IAAI,EAAE,CAAC,CAAC;IACvD,IAAI,SAAS,CAAC,OAAO,EAAE,CAAC;QACtB,KAAK,MAAM,MAAM,IAAI,CAAC,EAAE,EAAE,MAAM,EAAE,MAAM,CAAC,EAAE,CAAC;YAC1C,MAAM,CAAC,GAAG,MAAM,CAAC,MAAM,GAAG,MAAM,EAAE,EAAE,EAAE,KAAK,EAAE,IAAI,EAAE,CAAC,CAAC;QACvD,CAAC;IACH,CAAC;IAED,MAAM,UAAU,GAAG,SAAS,CAAC,UAAU,IAAI,uBAAuB,CAAC,MAAM,CAAC,CAAC;IAC3E,MAAM,QAAQ,GAAG,SAAS,CAAC,QAAQ,IAAI,qBAAqB,CAAC,MAAM,CAAC,aAAa,CAAC,CAAC;IACnF,MAAM,EAAE,EAAE,EAAE,aAAa,EAAE,GAAG,MAAM,CAAC;QACnC,IAAI,EAAE,MAAM,CAAC,MAAM;QACnB,cAAc,EAAE,UAAU,CAAC,EAAE;QAC7B,GAAG,EAAE,UAAU,CAAC,GAAG;KACpB,CAAC,CAAC;IACH,MAAM,UAAU,GAAG,IAAI,UAAU,CAAC,EAAE,CAAC,CAAC;IACtC,MAAM,WAAW,GAAG,IAAI,WAAW,CAAC,EAAE,EAAE,aAAa,CAAC,CAAC;IACvD,MAAM,OAAO,GAAG,IAAI,OAAO,CAAC,SAAS,EAAE,EAAE,EAAE,UAAU,EAAE,WAAW,EAAE,UAAU,CAAC,CAAC;IAChF,MAAM,MAAM,GAAG,IAAI,aAAa,CAAC,UAAU,EAAE,WAAW,EAAE,UAAU,CAAC,CAAC;IACtE,MAAM,SAAS,GAAG,IAAI,iBAAiB,CAAC,MAAM,EAAE,QAAQ,EAAE;QACxD,GAAG,EAAE,MAAM,CAAC,GAAG;QACf,UAAU,EAAE,EAAE,QAAQ,EAAE,MAAM,CAAC,QAAQ,EAAE,MAAM,EAAE,MAAM,CAAC,MAAM,EAAE;QAChE,qBAAqB,EAAE,MAAM,CAAC,qBAAqB;QACnD,gBAAgB,EAAE,MAAM,CAAC,gBAAgB;QACzC,eAAe,EAAE,MAAM,CAAC,eAAe;KACxC,CAAC,CAAC;IACH,MAAM,QAAQ,GAAG,IAAI,eAAe,CAAC,SAAS,EAAE,OAAO,CAAC,CAAC;IAEzD,OAAO;QACL,MAAM;QACN,EAAE;QACF,UAAU;QACV,QAAQ;QACR,UAAU;QACV,WAAW;QACX,OAAO;QACP,MAAM;QACN,SAAS;QACT,QAAQ;KACT,CAAC;AACJ,CAAC;AAED;;;;GAIG;AACH,SAAS,uBAAuB,CAAC,MAAoB;IACnD,IAAI,OAAO,CAAC,GAAG,CAAC,gBAAgB,KAAK,QAAQ,EAAE,CAAC;QAC9C,MAAM,CAAC,IAAI,CAAC,4DAA4D,CAAC,CAAC;QAC1E,OAAO,IAAI,uBAAuB,EAAE,CAAC;IACvC,CAAC;IACD,OAAO,IAAI,sBAAsB,CAAC,MAAM,CAAC,aAAa,CAAC,CAAC;AAC1D,CAAC;AAED;;;GAGG;AACH,SAAS,qBAAqB,CAAC,aAAqB;IAClD,IAAI,OAAO,CAAC,GAAG,CAAC,eAAe,KAAK,OAAO,IAAI,OAAO,CAAC,GAAG,CAAC,iBAAiB,EAAE,CAAC;QAC7E,MAAM,CAAC,IAAI,CAAC,uDAAuD,CAAC,CAAC;QACrE,OAAO,IAAI,iBAAiB,EAAE,CAAC;IACjC,CAAC;IACD,MAAM,CAAC,IAAI,CAAC,qEAAqE,CAAC,CAAC;IACnF,OAAO,IAAI,aAAa,CAAC,aAAa,CAAC,CAAC;AAC1C,CAAC"}