@roomi-fields/notebooklm-mcp 1.5.9 → 1.7.0

# NotebookLM + RTFM — cache batch outputs as a searchable markdown vault

NotebookLM is brilliant at producing citation-backed answers, but it's slow (~10–30s per query) and rate-limited (50 queries per day per Google account on the free tier). For any workflow that re-asks similar questions over time — academic literature reviews, competitive intelligence pipelines, internal knowledge bases — querying NotebookLM live every time is the wrong architecture.

The pattern that scales: **NotebookLM as a one-shot ingestion layer, [RTFM](https://github.com/roomi-fields/rtfm) as the retrieval layer.** Run an exhaustive question set once, persist every answer (with citations, source titles, and excerpts) as markdown, then point your CLI agent at the vault for unlimited offline queries.

```
[Once per notebook, periodic]
CLI agent generates an exhaustive question set
  → POST /batch-to-vault (titles + excerpts, citations preserved)
  → vault/*.md + vault/*.json (RTFM-ingestable)

[At will, unlimited, ~ms, offline]
Agent → rtfm_search → rtfm_expand → answer
```

This page shows how to wire the two together.

## Why this beats querying NotebookLM live

| Concern           | Live NotebookLM             | NotebookLM → vault → RTFM      |
| ----------------- | --------------------------- | ------------------------------ |
| Latency per query | 10–30s                      | ~milliseconds                  |
| Quota             | 50/day per Google account   | Unlimited after one-shot batch |
| Repeat queries    | Cost a quota slot each time | Free                           |
| Offline           | No                          | Yes                            |
| Source citations  | Yes (titles + excerpts)     | Yes (preserved in markdown)    |
| Best for          | Fresh interpretation        | Re-querying ingested knowledge |

## What you need

- This project running locally: `npm run start:http` after `npm run setup-auth`. [Install guide](/install).
- [RTFM](https://github.com/roomi-fields/rtfm) installed and configured to point at your vault directory.
- A notebook with sources already attached. List them with `GET /notebooks/scrape`.
- A list of questions you want answered against that notebook.

## Two transports, same logic

The batch runner is exposed two ways — pick whichever matches your client. Both call the same `runBatchToVault` helper, accept the same parameters, and return the same shape.

| Transport                   | When to use                                                        | Requires HTTP server running?          |
| --------------------------- | ------------------------------------------------------------------ | -------------------------------------- |
| MCP tool `batch_to_vault`   | From an MCP client (Claude Code, Cursor, Codex, Continue)          | No — the MCP client spawns the process |
| HTTP `POST /batch-to-vault` | From shell scripts, curl, n8n HTTP node, custom backends, browsers | Yes (`npm run start:http`)             |

For ad-hoc batches inside an agent session, prefer the MCP tool — nothing to start, kill, or expose on a port. For overnight cron jobs and external orchestrators, the HTTP endpoint is the right transport.

### MCP tool call

From any MCP client with this server registered:

```jsonc
// Claude Code / Cursor / Codex tool call
{
  "tool": "batch_to_vault",
  "arguments": {
    "questions": [
      "What is the OSBD process?",
      "How does NVC differentiate a need from a strategy?"
    ],
    "vault_dir": "/path/to/your/vault/cnv",
    "notebook_id": "notebook-1",
    "slug_prefix": "sota",
    "source_format": "json",
    "sleep_between_ms": 2000
  }
}
```

The agent invokes `batch_to_vault` directly — same answer, same `{slug}.md` + `{slug}.json` artifacts, no HTTP round-trip.

## The HTTP endpoint

`POST /batch-to-vault` runs a list of questions and writes each answer as two artifacts in a vault directory:

- `{slug}.md` — markdown with YAML frontmatter, the answer body, and a "Sources" section with quoted excerpts. Indexable by any markdown vault tool (RTFM, Obsidian, Foam, Dendron…).
- `{slug}.json` — a structured payload conforming to the `nblm-answer-v1` schema (see [schema below](#nblm-answer-v1-json-schema)) for richer ingestion.

### Request

```bash
curl -X POST http://localhost:3000/batch-to-vault \
  -H 'Content-Type: application/json' \
  -d '{
    "questions": [
      "What is the OSBD process?",
      "How does NVC differentiate a need from a strategy?",
      "What is empathic listening in NVC?"
    ],
    "notebook_id": "notebook-1",
    "vault_dir": "/path/to/your/vault/cnv",
    "slug_prefix": "sota",
    "source_format": "json",
    "sleep_between_ms": 2000
  }'
```
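The same request can be issued programmatically. A minimal Python sketch, assuming only what the example above shows (the endpoint path, the two required fields, and a server on `localhost:3000`) — `build_batch_request` and `post_batch` are illustrative names, not part of the package:

```python
import json
import urllib.request

def build_batch_request(questions, vault_dir, notebook_id=None,
                        slug_prefix="", sleep_between_ms=0):
    """Assemble a /batch-to-vault payload, enforcing the two required fields."""
    if not questions:
        raise ValueError("questions must be a non-empty list")
    if not vault_dir:
        raise ValueError("vault_dir is required")
    payload = {
        "questions": list(questions),
        "vault_dir": vault_dir,
        "slug_prefix": slug_prefix,
        "source_format": "json",       # recommended mode: keeps titles + excerpts
        "sleep_between_ms": sleep_between_ms,
    }
    if notebook_id:
        payload["notebook_id"] = notebook_id
    return payload

def post_batch(payload, base_url="http://localhost:3000"):
    """POST the payload to the server started with `npm run start:http`."""
    req = urllib.request.Request(
        base_url + "/batch-to-vault",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

`post_batch(build_batch_request([...], "/path/to/your/vault/cnv"))` then returns the response shape documented below.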

### Parameters

| Field              | Required | Default  | Description                                                                                   |
| ------------------ | -------- | -------- | --------------------------------------------------------------------------------------------- |
| `questions`        | yes      | —        | Non-empty array of strings. Each question becomes one `.md` + one `.json` file.               |
| `vault_dir`        | yes      | —        | Destination directory. Created with `mkdir -p` if missing.                                    |
| `notebook_id`      | no       | active   | Library notebook id to query.                                                                 |
| `notebook_url`     | no       | —        | Direct NotebookLM URL (alternative to `notebook_id`).                                         |
| `slug_prefix`      | no       | `""`     | Prepended to each filename. Use to namespace per topic, e.g. `"sota"`, `"market-2026q2"`.     |
| `source_format`    | no       | `"json"` | Citation extraction mode. `"json"` is recommended for vault output (keeps titles + excerpts). |
| `sleep_between_ms` | no       | `0`      | Pause between questions to avoid hammering NotebookLM. 1500–3000ms is sane for batches > 20.  |
| `session_id`       | no       | new      | Reuse an existing session for context continuity across the batch.                            |

### Response

```json
{
  "success": true,
  "data": {
    "vault_dir": "/path/to/your/vault/cnv",
    "total": 3,
    "succeeded": 3,
    "failed": 0,
    "session_id": "5f1d8731",
    "notebook": {
      "id": "notebook-1",
      "url": "https://notebooklm.google.com/notebook/74912e55-..."
    },
    "files": [
      {
        "question": "What is the OSBD process?",
        "md_path": "/path/to/your/vault/cnv/sota-001-what-is-the-osbd-process.md",
        "json_path": "/path/to/your/vault/cnv/sota-001-what-is-the-osbd-process.json",
        "success": true,
        "citations_count": 16
      }
    ]
  }
}
```
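Since `data.files` carries a per-question `success` flag, a caller can harvest failures for a retry batch straight from this shape. A small Python sketch over the response structure above (the helper names are illustrative):

```python
def failed_questions(response):
    """Collect questions that produced no artifacts, ready to re-submit."""
    data = response.get("data", {})
    return [f["question"] for f in data.get("files", []) if not f.get("success")]

def summarize(response):
    """One-line batch summary built from the counters in `data`."""
    data = response["data"]
    return "{succeeded}/{total} written to {vault_dir}".format(**data)
```

Feeding `failed_questions(...)` back into another `/batch-to-vault` call gives a cheap retry loop without re-running questions that already succeeded.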

## What gets written

### `{slug}.md`

```markdown
---
title: 'What is the OSBD process?'
type: nblm-answer
asked_at: 2026-05-04T13:30:00.000Z
notebook_id: 'notebook-1'
notebook_url: 'https://notebooklm.google.com/notebook/74912e55-...'
session_id: '5f1d8731'
citations_count: 16
sources:
  - 'Pratiquer la Communication NonViolente_F.Keller.pdf'
  - 'CNV et OSBD : outils pour pratiquer la communication bienveillante'
  - "Rapport d'analyse systémique sur les cursus de formation en CNV"
---

# What is the OSBD process?

> Asked on 2026-05-04T13:30:00.000Z against [CNV - Communication NonViolente](https://notebooklm.google.com/notebook/...)

## Answer

OSBD is the four-step acronym at the core of Nonviolent Communication...

## Sources

### [1] CNV et OSBD : outils pour pratiquer la communication bienveillante

> Ce mode de communication est un choix conscient...

### [2] Pratiquer la Communication NonViolente_F.Keller.pdf

> Observation Je décris, de manière neutre, la situation...
```

The frontmatter is standard YAML — every markdown indexer (RTFM, Obsidian, Foam) reads it natively. The body has stable section headings (`## Answer`, `## Sources`) so a parser can lift the answer text and citation excerpts independently.
178
+
179
+ ### `{slug}.json`
180
+
181
+ A structured sidecar conforming to [`nblm-answer-v1`](#nblm-answer-v1-json-schema). Use it when your indexer wants typed access to citations, source positions, or session metadata without re-parsing the markdown.
182
+
183
+ ## Pointing RTFM at the vault
184
+
185
+ [RTFM](https://github.com/roomi-fields/rtfm) is an MCP-native retrieval layer with FTS5 + semantic search over markdown vaults, wikilink resolution, and progressive disclosure for AI agents. It speaks the same markdown convention `/batch-to-vault` writes, so wiring is essentially "point and index":
186
+
187
+ ```bash
188
+ # 1. Generate the vault from NotebookLM
189
+ curl -X POST http://localhost:3000/batch-to-vault -d '{...}'
190
+
191
+ # 2. Index it with RTFM
192
+ rtfm index /path/to/your/vault/cnv
193
+
194
+ # 3. Search from your CLI agent (or any MCP client)
195
+ rtfm search "OSBD process" --top 5
196
+ rtfm expand sota-001-what-is-the-osbd-process
197
+ ```
198
+
199
+ Inside an MCP client (Claude Code, Cursor, Codex), the same flow becomes a two-tool pattern: `rtfm_search` to surface the relevant cached answer, `rtfm_expand` to read the full markdown with citations preserved. No NotebookLM call needed for repeat queries.
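To make the latency claim concrete: once the answers are plain files on disk, even a deliberately naive scorer returns in milliseconds. The sketch below ranks vault files by query-term frequency — purely illustrative; it is not RTFM's FTS5/semantic algorithm:

```python
import os
import re

def naive_search(vault_dir, query, top=5):
    """Rank vault .md files by how often the query terms appear (toy scorer)."""
    terms = [t.lower() for t in re.findall(r"\w+", query)]
    scored = []
    for name in os.listdir(vault_dir):
        if not name.endswith(".md"):
            continue
        with open(os.path.join(vault_dir, name), encoding="utf-8") as f:
            body = f.read().lower()
        score = sum(body.count(t) for t in terms)
        if score:
            scored.append((score, name))
    return [name for _, name in sorted(scored, reverse=True)[:top]]
```

Anything this simple already answers repeat queries offline; RTFM's index does the same job properly, with ranking, wikilinks, and progressive disclosure.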

When new sources land in the notebook, re-run `/batch-to-vault` to refresh the cache.

## Recommended layout for academic / SOTA workflows

```
~/research-vault/
├── cnv/                      # one notebook → one folder
│   ├── sota-001-...md
│   ├── sota-001-...json
│   ├── sota-002-...md
│   └── sota-002-...json
├── ifs-therapy/
│   ├── sota-001-...md
│   └── ...
└── attachment-theory/
    └── ...
```

Each folder maps to one NotebookLM notebook. `slug_prefix` per topic keeps filenames sortable and unique. RTFM indexes the whole tree and resolves cross-folder wikilinks if you add them.
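The filenames above follow a `{prefix}-{index}-{slug}` convention. A plausible reconstruction of that naming in Python — the package's actual slugger may differ in edge cases, so treat this as a sketch:

```python
import re

def slug_for(question, index, prefix=""):
    """Build 'sota-001-what-is-the-osbd-process' style filename stems."""
    # Lowercase, collapse every non-alphanumeric run to a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", question.lower()).strip("-")
    # Zero-padded index keeps lexicographic order equal to batch order.
    parts = [p for p in (prefix, f"{index:03d}", slug) if p]
    return "-".join(parts)
```

The zero-padded index is what makes `ls` and RTFM listings sort in batch order.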

## Question generation

The matching pattern on the input side: ask Claude (or any LLM) to generate an exhaustive question set for a topic before you batch them.

```
You are preparing a SOTA (state of the art) document on {topic} from a NotebookLM
notebook containing {N sources}. Generate {K} questions that, taken together,
extract everything a domain expert would want to know:

- Foundational definitions and key concepts
- Historical context and lineage
- Core mechanisms / processes
- Distinctions vs adjacent fields
- Empirical evidence and limitations
- Practical applications
- Open debates and research gaps

Output as a JSON array of strings, no commentary.
```

Save the output as `questions.json`, then:

```bash
curl -X POST http://localhost:3000/batch-to-vault \
  -H 'Content-Type: application/json' \
  -d "$(jq -n --slurpfile q questions.json --arg dir ~/research-vault/cnv \
    '{questions: $q[0], notebook_id: "notebook-1", vault_dir: $dir, slug_prefix: "sota", sleep_between_ms: 2000}')"
```

For batches above ~50 questions, multi-account rotation kicks in automatically when a quota is hit. See [Multi-account rotation](/notebooklm-multi-account).
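Rotation handles the quota automatically, but when sizing a batch it can still help to reason about the arithmetic yourself: at 50 free-tier queries per account per day, a question list splits into per-account chunks like this (illustrative helper, not package code):

```python
def chunk_questions(questions, quota=50):
    """Split a question list into quota-sized chunks, one per account/day."""
    return [questions[i:i + quota] for i in range(0, len(questions), quota)]
```

A 120-question batch therefore needs three quota slots (50 + 50 + 20), i.e. three accounts in the rotation pool to finish in one day.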

## `nblm-answer-v1` JSON schema

Sidecar `{slug}.json` files conform to this schema. Stable across releases under SemVer; breaking changes will bump the major version.

> **Canonical URL** (resolvable, served as `application/schema+json` with CORS): [schemas.roomi-fields.com/nblm-answer-v1.json](https://schemas.roomi-fields.com/nblm-answer-v1.json) — fetch from any JSON Schema validator. The version below mirrors the canonical document.

```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "https://schemas.roomi-fields.com/nblm-answer-v1.json",
  "title": "NotebookLM Answer (nblm-answer-v1)",
  "description": "Structured sidecar payload produced by notebooklm-mcp /batch-to-vault. Encodes a single NotebookLM answer with citations, source positions, and session metadata for typed ingestion by retrieval systems (e.g. RTFM).",
  "type": "object",
  "required": ["type", "version", "asked_at", "question", "answer", "citations", "metadata"],
  "properties": {
    "$schema": { "type": "string", "format": "uri" },
    "type": { "const": "nblm-answer" },
    "version": { "const": "1.0" },
    "asked_at": { "type": "string", "format": "date-time" },
    "session_id": { "type": ["string", "null"] },
    "notebook": {
      "type": "object",
      "properties": {
        "id": { "type": ["string", "null"] },
        "name": { "type": ["string", "null"] },
        "url": { "type": ["string", "null"] }
      }
    },
    "question": { "type": "string" },
    "answer": {
      "type": "object",
      "required": ["text", "format"],
      "properties": {
        "text": { "type": "string" },
        "format": { "const": "markdown" }
      }
    },
    "citations": {
      "type": "array",
      "items": {
        "type": "object",
        "required": ["marker", "number"],
        "properties": {
          "marker": { "type": "string", "description": "Display marker, e.g. \"[1]\"" },
          "number": { "type": "integer", "minimum": 1 },
          "source_name": { "type": ["string", "null"] },
          "source_text": {
            "type": ["string", "null"],
            "description": "Highlighted excerpt from the cited source"
          }
        }
      }
    },
    "metadata": {
      "type": "object",
      "properties": {
        "tags": { "type": "array", "items": { "type": "string" } },
        "extraction_success": { "type": ["boolean", "null"] },
        "citations_count": { "type": "integer", "minimum": 0 },
        "source_names": { "type": "array", "items": { "type": "string" } }
      }
    }
  }
}
```
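Any JSON Schema library can validate against the canonical document. For a dependency-free spot check during ingestion, the required top-level shape can also be verified by hand — an illustrative structural check, deliberately much weaker than a real validator:

```python
def looks_like_nblm_answer_v1(doc):
    """Cheap structural check for the required fields of nblm-answer-v1."""
    required = ("type", "version", "asked_at", "question",
                "answer", "citations", "metadata")
    if not isinstance(doc, dict) or any(k not in doc for k in required):
        return False
    # The two const fields pin the schema identity and version.
    if doc["type"] != "nblm-answer" or doc["version"] != "1.0":
        return False
    answer = doc["answer"]
    if not isinstance(answer, dict) or "text" not in answer \
            or answer.get("format") != "markdown":
        return False
    return isinstance(doc["citations"], list) and isinstance(doc["metadata"], dict)
```

Useful as a guard before indexing a vault directory: skip (or re-fetch) any sidecar that fails the check instead of letting one malformed file break ingestion.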

## See also

- [Run 1,000 questions overnight](/batch-1000-questions) — the larger batch pattern with auto-reauth and rotation
- [Multi-account rotation](/notebooklm-multi-account) — how quotas and TOTP auto-reauth work
- [REST API reference](/notebooklm-rest-api) — full endpoint surface (33 endpoints + `/batch-to-vault`)
- [RTFM on GitHub](https://github.com/roomi-fields/rtfm) — the retrieval layer