@roomi-fields/notebooklm-mcp 1.5.8 → 1.6.0

@@ -0,0 +1,288 @@
+ # NotebookLM + RTFM — cache batch outputs as a searchable markdown vault
+
+ NotebookLM is brilliant at producing citation-backed answers, but it's slow (~10–30s per query) and rate-limited (50 queries per day per Google account on the free tier). For any workflow that re-asks similar questions over time — academic literature reviews, competitive intelligence pipelines, internal knowledge bases — querying NotebookLM live every time is the wrong architecture.
+
+ The pattern that scales: **NotebookLM as a one-shot ingestion layer, [RTFM](https://github.com/roomi-fields/rtfm) as the retrieval layer.** Run an exhaustive question set once, persist every answer (with citations, source titles, and excerpts) as markdown, then point your CLI agent at the vault for unlimited offline queries.
+
+ ```
+ [Once per notebook, periodic]
+ CLI agent generates an exhaustive question set
+   → POST /batch-to-vault (titles + excerpts, citations preserved)
+   → vault/*.md + vault/*.json (RTFM-ingestable)
+
+ [At will, unlimited, ~ms, offline]
+ Agent → rtfm_search → rtfm_expand → answer
+ ```
+
+ This page shows how to wire the two together.
+
+ ## Why this beats querying NotebookLM live
+
+ | Concern | Live NotebookLM | NotebookLM → vault → RTFM |
+ | ----------------- | --------------------------- | ------------------------------ |
+ | Latency per query | 10–30s | ~milliseconds |
+ | Quota | 50/day per Google account | Unlimited after one-shot batch |
+ | Repeat queries | Cost a quota slot each time | Free |
+ | Offline | No | Yes |
+ | Source citations | Yes (titles + excerpts) | Yes (preserved in markdown) |
+ | Best for | Fresh interpretation | Re-querying ingested knowledge |
+
+ ## What you need
+
+ - This project running locally: `npm run start:http` after `npm run setup-auth`. [Install guide](/install).
+ - [RTFM](https://github.com/roomi-fields/rtfm) installed and configured to point at your vault directory.
+ - A notebook with sources already attached. List them with `GET /notebooks/scrape`.
+ - A list of questions you want answered against that notebook.
+
+ ## The endpoint
+
+ `POST /batch-to-vault` runs a list of questions and writes each answer as two artifacts in a vault directory:
+
+ - `{slug}.md` — markdown with YAML frontmatter, the answer body, and a "Sources" section with quoted excerpts. Indexable by any markdown vault tool (RTFM, Obsidian, Foam, Dendron…).
+ - `{slug}.json` — a structured payload conforming to the `nblm-answer-v1` schema (see [schema below](#nblm-answer-v1-json-schema)) for richer ingestion.
+
+ ### Request
+
+ ```bash
+ curl -X POST http://localhost:3000/batch-to-vault \
+   -H 'Content-Type: application/json' \
+   -d '{
+     "questions": [
+       "What is the OSBD process?",
+       "How does NVC differentiate a need from a strategy?",
+       "What is empathic listening in NVC?"
+     ],
+     "notebook_id": "notebook-1",
+     "vault_dir": "/path/to/your/vault/cnv",
+     "slug_prefix": "sota",
+     "source_format": "json",
+     "sleep_between_ms": 2000
+   }'
+ ```
+
+ ### Parameters
+
+ | Field | Required | Default | Description |
+ | ------------------ | -------- | -------- | --------------------------------------------------------------------------------------------- |
+ | `questions` | yes | — | Non-empty array of strings. Each question becomes one `.md` + one `.json` file. |
+ | `vault_dir` | yes | — | Destination directory. Created with `mkdir -p` if missing. |
+ | `notebook_id` | no | active | Library notebook id to query. |
+ | `notebook_url` | no | — | Direct NotebookLM URL (alternative to `notebook_id`). |
+ | `slug_prefix` | no | `""` | Prepended to each filename. Use to namespace per topic, e.g. `"sota"`, `"market-2026q2"`. |
+ | `source_format` | no | `"json"` | Citation extraction mode. `"json"` is recommended for vault output (keeps titles + excerpts). |
+ | `sleep_between_ms` | no | `0` | Pause between questions to avoid hammering NotebookLM. 1500–3000ms is sane for batches > 20. |
+ | `session_id` | no | new | Reuse an existing session for context continuity across the batch. |
+
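To pick a sensible `sleep_between_ms`, it helps to estimate how long a batch will run. The sketch below is illustrative only. `estimate_batch_seconds` is a hypothetical helper, not part of this project. It assumes the 10–30s per-query latency quoted above and a pause applied only *between* questions, so n − 1 times.

```bash
# Hypothetical helper: rough wall-clock estimate for one /batch-to-vault run.
# Assumes the pause is inserted between questions only (n - 1 pauses).
estimate_batch_seconds() {
  local n=$1            # number of questions
  local per_query_s=$2  # assumed NotebookLM latency per question, in seconds
  local sleep_ms=$3     # value passed as sleep_between_ms
  local pauses=$(( n > 0 ? n - 1 : 0 ))
  # Integer arithmetic: total pause time is rounded down to whole seconds.
  echo $(( n * per_query_s + pauses * sleep_ms / 1000 ))
}

# 30 questions at a pessimistic 30s each, with a 2000ms pause:
estimate_batch_seconds 30 30 2000   # prints 958
```

At roughly 16 minutes for 30 questions, the pauses add under a minute. Per-query latency dominates, so a 2000ms pause costs little at this batch size.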
+ ### Response
+
+ ```json
+ {
+   "success": true,
+   "data": {
+     "vault_dir": "/path/to/your/vault/cnv",
+     "total": 3,
+     "succeeded": 3,
+     "failed": 0,
+     "session_id": "5f1d8731",
+     "notebook": {
+       "id": "notebook-1",
+       "url": "https://notebooklm.google.com/notebook/74912e55-..."
+     },
+     "files": [
+       {
+         "question": "What is the OSBD process?",
+         "md_path": "/path/to/your/vault/cnv/sota-001-what-is-the-osbd-process.md",
+         "json_path": "/path/to/your/vault/cnv/sota-001-what-is-the-osbd-process.json",
+         "success": true,
+         "citations_count": 16
+       }
+     ]
+   }
+ }
+ ```
+
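The `md_path` and `json_path` values above follow a `{slug_prefix}-{NNN}-{question-in-kebab-case}` pattern. You may want to predict filenames before a batch finishes, for example to pre-write wikilinks to answers. The sketch below reproduces that pattern for plain ASCII questions. `nblm_slug` is a hypothetical helper inferred from the example filename, not the server's slugger. Accented characters, long questions, and the empty-prefix case may be handled differently, so treat the `md_path` in the response as authoritative.

```bash
# Hypothetical re-implementation of the filename pattern seen in the response:
#   sota-001-what-is-the-osbd-process
# Not the server's actual code: accents, truncation, and collisions may differ.
nblm_slug() {
  local prefix=$1 index=$2 question=$3
  local body
  # Lowercase, collapse every run of non-alphanumerics into "-", trim the ends.
  body=$(printf '%s' "$question" \
    | tr '[:upper:]' '[:lower:]' \
    | sed -E 's/[^a-z0-9]+/-/g; s/^-+//; s/-+$//')
  if [ -n "$prefix" ]; then
    printf '%s-%03d-%s\n' "$prefix" "$index" "$body"
  else
    # slug_prefix defaults to "": assumed to produce no leading dash.
    printf '%03d-%s\n' "$index" "$body"
  fi
}

nblm_slug sota 1 "What is the OSBD process?"   # sota-001-what-is-the-osbd-process
```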
+ ## What gets written
+
+ ### `{slug}.md`
+
+ ```markdown
+ ---
+ title: 'What is the OSBD process?'
+ type: nblm-answer
+ asked_at: 2026-05-04T13:30:00.000Z
+ notebook_id: 'notebook-1'
+ notebook_url: 'https://notebooklm.google.com/notebook/74912e55-...'
+ session_id: '5f1d8731'
+ citations_count: 16
+ sources:
+ - 'Pratiquer la Communication NonViolente_F.Keller.pdf'
+ - 'CNV et OSBD : outils pour pratiquer la communication bienveillante'
+ - "Rapport d'analyse systémique sur les cursus de formation en CNV"
+ ---
+
+ # What is the OSBD process?
+
+ > Asked on 2026-05-04T13:30:00.000Z against [CNV - Communication NonViolente](https://notebooklm.google.com/notebook/...)
+
+ ## Answer
+
+ OSBD is the four-step acronym at the core of Nonviolent Communication...
+
+ ## Sources
+
+ ### [1] CNV et OSBD : outils pour pratiquer la communication bienveillante
+
+ > Ce mode de communication est un choix conscient...
+
+ ### [2] Pratiquer la Communication NonViolente_F.Keller.pdf
+
+ > Observation Je décris, de manière neutre, la situation...
+ ```
+
+ The frontmatter is standard YAML, which mainstream markdown tools such as RTFM, Obsidian, and Foam read natively. The body uses stable section headings (`## Answer`, `## Sources`), so a parser can extract the answer text and the citation excerpts independently.
+
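Because the headings are stable, standard `awk` is enough to pull individual pieces out of a generated file. This is a minimal sketch that assumes the layout shown above: frontmatter between the first two `---` lines, then `## Answer` and `## Sources`. `nblm_field` and `nblm_section` are hypothetical helpers. `nblm_field` handles only flat `key: value` lines, not YAML lists such as `sources`.

```bash
# Hypothetical helper: print a flat `key: value` frontmatter field, without quotes.
nblm_field() {
  local key=$1 file=$2
  awk -v key="$key" -v sq="'" '
    /^---$/ { fences++; if (fences == 2) exit; next }   # frontmatter only
    fences == 1 && index($0, key ": ") == 1 {
      val = substr($0, length(key) + 3)
      first = substr(val, 1, 1)
      last = substr(val, length(val), 1)
      # Strip one pair of matching surrounding quotes, single or double.
      if (length(val) >= 2 && first == last && (first == sq || first == "\""))
        val = substr(val, 2, length(val) - 2)
      print val
      exit
    }
  ' "$file"
}

# Hypothetical helper: print the body of a level-2 section such as "Answer".
nblm_section() {
  local name=$1 file=$2
  awk -v want="## $name" '
    $0 == want { on = 1; next }   # start right after the heading
    on && /^## / { exit }         # stop at the next level-2 heading
    on { print }
  ' "$file"
}
```

For the example file above, `nblm_field citations_count sota-001-what-is-the-osbd-process.md` prints `16`, and `nblm_section Answer sota-001-what-is-the-osbd-process.md` prints the answer body without the citations. For anything beyond quick scripting, prefer a real YAML parser or the `.json` sidecar.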
+ ### `{slug}.json`
+
+ A structured sidecar conforming to [`nblm-answer-v1`](#nblm-answer-v1-json-schema). Use it when your indexer wants typed access to citations, source positions, or session metadata without re-parsing the markdown.
+
+ ## Pointing RTFM at the vault
+
+ [RTFM](https://github.com/roomi-fields/rtfm) is an MCP-native retrieval layer with FTS5 + semantic search over markdown vaults, wikilink resolution, and progressive disclosure for AI agents. It reads the same markdown conventions that `/batch-to-vault` writes, so wiring is essentially "point and index":
+
+ ```bash
+ # 1. Generate the vault from NotebookLM
+ curl -X POST http://localhost:3000/batch-to-vault \
+   -H 'Content-Type: application/json' \
+   -d '{...}'
+
+ # 2. Index it with RTFM
+ rtfm index /path/to/your/vault/cnv
+
+ # 3. Search from your CLI agent (or any MCP client)
+ rtfm search "OSBD process" --top 5
+ rtfm expand sota-001-what-is-the-osbd-process
+ ```
+
+ Inside an MCP client (Claude Code, Cursor, Codex), the same flow becomes a two-tool pattern: `rtfm_search` surfaces the relevant cached answer, and `rtfm_expand` returns the full markdown with citations preserved. Repeat queries need no NotebookLM call at all.
+
+ When new sources land in the notebook, re-run `/batch-to-vault` to refresh the cache.
+
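You may prefer to refresh selectively rather than re-running every question. One way is to compare each answer's `asked_at` frontmatter with the date new sources were added. In the minimal sketch below, `nblm_stale` is a hypothetical helper. It assumes every `asked_at` uses the fixed-width UTC format shown in the example (`2026-05-04T13:30:00.000Z`), and that you pass the cutoff in that same format, so byte-wise string order matches chronological order.

```bash
# Hypothetical helper: list answer files in one vault folder asked before a cutoff.
nblm_stale() {
  local dir=$1 cutoff=$2 f
  for f in "$dir"/*.md; do
    [ -e "$f" ] || continue                  # no .md files in the folder
    # LC_ALL=C makes awk compare strings byte by byte, so with identical
    # ISO-8601 UTC formatting, string order matches chronological order.
    LC_ALL=C awk -v cutoff="$cutoff" '
      /^---$/ { n++; if (n == 2) exit; next }   # frontmatter only
      n == 1 && $1 == "asked_at:" {
        if ($2 < cutoff) print FILENAME         # older than the cutoff
        exit
      }
    ' "$f"
  done
}

# Answers asked before new sources landed on 2026-06-01:
nblm_stale ~/research-vault/cnv 2026-06-01T00:00:00.000Z
```

Files without frontmatter or without `asked_at` are silently skipped. Pass the questions behind the listed files back to `/batch-to-vault` to refresh only what is out of date.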
+ ## Recommended layout for academic / SOTA workflows
+
+ ```
+ ~/research-vault/
+ ├── cnv/                  # one notebook → one folder
+ │   ├── sota-001-...md
+ │   ├── sota-001-...json
+ │   ├── sota-002-...md
+ │   └── sota-002-...json
+ ├── ifs-therapy/
+ │   ├── sota-001-...md
+ │   └── ...
+ └── attachment-theory/
+     └── ...
+ ```
+
+ Each folder maps to one NotebookLM notebook. `slug_prefix` per topic keeps filenames sortable and unique. RTFM indexes the whole tree and resolves cross-folder wikilinks if you add them.
+
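Every answer is written as an `.md` + `.json` pair, so a quick check over the tree catches half-written pairs, for instance after an interrupted batch. In the minimal sketch below, `nblm_check_pairs` is a hypothetical helper. It assumes the sidecar always shares the markdown file's basename, as in the response example, and it skips hidden folders such as Obsidian's `.obsidian/`. Any hand-written notes you keep in the vault would also be reported.

```bash
# Hypothetical helper: report .md files without a .json sidecar and vice versa.
nblm_check_pairs() {
  local root=$1 status=0 f
  # NUL-separated paths survive spaces; hidden folders (e.g. .obsidian/) are skipped.
  while IFS= read -r -d '' f; do
    case $f in
      *.md)   [ -e "${f%.md}.json" ] || { printf 'missing json: %s\n' "$f"; status=1; } ;;
      *.json) [ -e "${f%.json}.md" ] || { printf 'missing md: %s\n' "$f"; status=1; } ;;
    esac
  done < <(find "$root" ! -path '*/.*' -type f \( -name '*.md' -o -name '*.json' \) -print0)
  return "$status"
}
```

Running `nblm_check_pairs ~/research-vault` before re-indexing prints one line per orphaned file and returns a non-zero status if it found any.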
+ ## Question generation
+
+ On the input side, the matching pattern is to ask Claude (or any LLM) to generate an exhaustive question set for the topic before you batch it.
+
+ ```
+ You are preparing a SOTA (state of the art) document on {topic} from a NotebookLM
+ notebook containing {N} sources. Generate {K} questions that, taken together,
+ extract everything a domain expert would want to know:
+
+ - Foundational definitions and key concepts
+ - Historical context and lineage
+ - Core mechanisms / processes
+ - Distinctions vs adjacent fields
+ - Empirical evidence and limitations
+ - Practical applications
+ - Open debates and research gaps
+
+ Output as a JSON array of strings, no commentary.
+ ```
+
+ Save the output as `questions.json`, then:
+
+ ```bash
+ curl -X POST http://localhost:3000/batch-to-vault \
+   -H 'Content-Type: application/json' \
+   -d "$(jq -n --slurpfile q questions.json --arg dir ~/research-vault/cnv \
+     '{questions: $q[0], notebook_id: "notebook-1", vault_dir: $dir, slug_prefix: "sota", sleep_between_ms: 2000}')"
+ ```
+
+ Batches larger than ~50 questions exceed one free-tier account's daily quota; when a quota is hit, multi-account rotation kicks in automatically. See [Multi-account rotation](/notebooklm-multi-account).
+
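The threshold comes from simple ceiling division over the per-account quota. The small sketch below assumes the free-tier figure of 50 queries per account per day quoted at the top of this page; pass another quota as the second argument. `accounts_needed` is a hypothetical planning helper and says nothing about how the server actually schedules rotation.

```bash
# Hypothetical planning helper: accounts needed to finish a batch within one day.
# Assumes a per-account daily quota (default 50, the free-tier figure above).
accounts_needed() {
  local questions=$1 quota=${2:-50}
  # Integer ceiling division: (a + b - 1) / b.
  echo $(( (questions + quota - 1) / quota ))
}

accounts_needed 120   # prints 3
```

A 120-question batch therefore needs three accounts to finish in a day, or a single account spread over three days.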
+ ## `nblm-answer-v1` JSON schema
+
+ Sidecar `{slug}.json` files conform to this schema, which is stable across releases under SemVer: breaking changes will bump the major version.
+
+ > **Canonical URL** (resolvable, served as `application/schema+json` with CORS): <https://schemas.roomi-fields.com/nblm-answer-v1.json> — fetch from any JSON Schema validator. The version below mirrors the canonical document.
+
+ ```json
+ {
+   "$schema": "https://json-schema.org/draft/2020-12/schema",
+   "$id": "https://schemas.roomi-fields.com/nblm-answer-v1.json",
+   "title": "NotebookLM Answer (nblm-answer-v1)",
+   "description": "Structured sidecar payload produced by notebooklm-mcp /batch-to-vault. Encodes a single NotebookLM answer with citations, source positions, and session metadata for typed ingestion by retrieval systems (e.g. RTFM).",
+   "type": "object",
+   "required": ["type", "version", "asked_at", "question", "answer", "citations", "metadata"],
+   "properties": {
+     "$schema": { "type": "string", "format": "uri" },
+     "type": { "const": "nblm-answer" },
+     "version": { "const": "1.0" },
+     "asked_at": { "type": "string", "format": "date-time" },
+     "session_id": { "type": ["string", "null"] },
+     "notebook": {
+       "type": "object",
+       "properties": {
+         "id": { "type": ["string", "null"] },
+         "name": { "type": ["string", "null"] },
+         "url": { "type": ["string", "null"] }
+       }
+     },
+     "question": { "type": "string" },
+     "answer": {
+       "type": "object",
+       "required": ["text", "format"],
+       "properties": {
+         "text": { "type": "string" },
+         "format": { "const": "markdown" }
+       }
+     },
+     "citations": {
+       "type": "array",
+       "items": {
+         "type": "object",
+         "required": ["marker", "number"],
+         "properties": {
+           "marker": { "type": "string", "description": "Display marker, e.g. \"[1]\"" },
+           "number": { "type": "integer", "minimum": 1 },
+           "source_name": { "type": ["string", "null"] },
+           "source_text": {
+             "type": ["string", "null"],
+             "description": "Highlighted excerpt from the cited source"
+           }
+         }
+       }
+     },
+     "metadata": {
+       "type": "object",
+       "properties": {
+         "tags": { "type": "array", "items": { "type": "string" } },
+         "extraction_success": { "type": ["boolean", "null"] },
+         "citations_count": { "type": "integer", "minimum": 0 },
+         "source_names": { "type": "array", "items": { "type": "string" } }
+       }
+     }
+   }
+ }
+ ```
+
+ ## See also
+
+ - [Run 1,000 questions overnight](/batch-1000-questions) — the larger batch pattern with auto-reauth and rotation
+ - [Multi-account rotation](/notebooklm-multi-account) — how quotas and TOTP auto-reauth work
+ - [REST API reference](/notebooklm-rest-api) — full endpoint surface (33 endpoints + `/batch-to-vault`)
+ - [RTFM on GitHub](https://github.com/roomi-fields/rtfm) — the retrieval layer
@@ -144,59 +144,13 @@
 
  ---
 
- ## Changelog
+ ## Release History
 
- ### v1.4.2 (2025-12-29)
+ Every release from 1.0.0 to the current version is documented with full "Added / Changed / Fixed / Security" breakdowns.
 
- **Removed fake content generation:**
+ 👉 **[See the full release history](/changelog)** (13 versions).
 
- - Removed `generate_content` endpoint for FAQ, Study Guide, Briefing Doc, Timeline, TOC
- - These were NOT real NotebookLM features - just chat prompts
- - Only REAL content generation: Audio Overview (podcast)
- - Updated all documentation for honesty
-
- ### v1.4.0 (2025-12-24)
-
- **Content Management:**
-
- - Audio Overview generation (podcast) - REAL NotebookLM feature
- - Audio download
- - Source management (files, URLs, text, YouTube)
-
- ### v1.3.1 (2025-01-24)
-
- **New features:**
-
- - MCP Auto-Discovery Tool: `auto_discover_notebook` for Claude Desktop/Cursor
- - Parity with HTTP API: MCP clients now have auto-discovery capability
- - Zero-friction notebook addition: just URL, metadata auto-generated
-
- **Critical Fixes:**
-
- - Claude Desktop compatibility: Disabled `CompleteRequestSchema` handler
- - Fixed "Server does not support completions" error on connection
-
- ### v1.1.2 (2025-11-22)
-
- **New features:**
-
- - ✅ Multi-notebook library system
- - ✅ Live validation of notebooks when adding
- - ✅ Protection against duplicate names
- - ✅ DELETE and PUT endpoints for notebooks
- - ✅ Detailed and contextualized error messages
-
- **Fixes:**
-
- - ✅ Fixed Cookies path (Default/Network/Cookies)
- - ✅ Cookies threshold lowered to 10KB
- - ✅ Better temporary session management
-
- **Documentation:**
-
- - ✅ New guide [06-NOTEBOOK-LIBRARY.md](./06-NOTEBOOK-LIBRARY.md)
- - ✅ "Notebook Configuration" section in [01-INSTALL.md](./01-INSTALL.md)
- - ✅ Enhanced API in [03-API.md](./03-API.md)
+ Latest: **v1.5.8 (2026-04-19)**: selectors for the 2026 NotebookLM UI, doctor scripts, PII scrub, `npm audit fix`.
 
  ---