@tekmidian/scribe 0.1.1 → 0.2.1

Files changed (3)
  1. package/README.md +134 -82
  2. package/dist/index.js +223 -41
  3. package/package.json +9 -2
package/README.md CHANGED
@@ -1,8 +1,12 @@
1
- # Scribe — YouTube transcript extraction for Claude
1
+ ---
2
+ links: "[[Ideaverse/AI/Scribe/Scribe|Scribe]]"
3
+ ---
2
4
 
3
- Scribe is an MCP server that extracts transcripts and captions from YouTube videos, giving Claude the ability to read, summarize, and analyze video content without watching it.
5
+ # Scribe: Content extraction for Claude
4
6
 
5
- Scribe speaks directly to YouTube's Innertube API using an Android client context: no API keys, no third-party services, no credentials to manage. It bypasses EU/GDPR consent gates automatically, handles both manual and auto-generated captions, and outputs transcripts in plain text, SRT subtitle format, or structured JSON with millisecond-accurate timing data.
7
+ Scribe is an MCP server that extracts content from multiple sources (YouTube videos, web articles, PDFs, and Claude.ai conversations), giving Claude the ability to read and work with content from anywhere.
8
+
9
+ **4 providers, one tool:** `extract_content` auto-detects the source type and routes to the right provider. YouTube transcripts come from the Innertube API (no API keys needed), articles use Readability extraction, PDFs are parsed locally, and Claude.ai conversations are downloaded directly from the web UI API.
6
10
 
7
11
  ## How It Works
8
12
 
@@ -13,16 +17,17 @@ Claude (AI client)
13
17
  v
14
18
  scribe-mcp server
15
19
  |
16
- |-- 1. Fetch youtube.com watch page (extract embedded JSON + cookies)
17
- |-- 2. POST /youtubei/v1/get_transcript (Android client context)
18
- |-- 3. Parse transcript segment list from API response
20
+ |-- extract_content auto-routes by URL:
21
+ | youtube.com/* → YouTube provider (Innertube API)
22
+ | claude.ai/* → Claude provider (web UI API)
23
+ | *.pdf → PDF provider (local parsing)
24
+ | any other URL → Article provider (Readability)
19
25
  |
20
26
  v
21
- Transcript returned to Claude
22
- (text / SRT / JSON with timing)
27
+ Clean text/markdown returned to Claude
23
28
  ```
24
29
 
25
- The server runs as a local process. Claude connects over stdio via the MCP protocol. No data leaves your machine except the requests to YouTube's own endpoints.
30
+ The server runs as a local process. Claude connects over stdio via the MCP protocol.
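The auto-routing shown in the diagram above can be sketched roughly as follows. This is a minimal illustration, not Scribe's actual code: the function name `detectProvider`, the rule order, the `youtu.be` host, and the handling of local paths are all assumptions made for the example.

```javascript
// Hypothetical sketch of URL-based provider routing (not taken from dist/index.js).
// Returns one of "youtube", "claude", "pdf", "article", or null if nothing matches.
function detectProvider(input) {
  let url;
  try {
    url = new URL(input);
  } catch {
    // Not a parseable absolute URL: assume a local file path and
    // route it to the PDF provider only if it ends in ".pdf".
    return input.toLowerCase().endsWith(".pdf") ? "pdf" : null;
  }

  // Normalize "www.youtube.com" to "youtube.com" before matching.
  const host = url.hostname.replace(/^www\./, "");

  if (host === "youtube.com" || host.endsWith(".youtube.com") || host === "youtu.be") {
    return "youtube";
  }
  if (host === "claude.ai") {
    return "claude";
  }
  if (url.pathname.toLowerCase().endsWith(".pdf")) {
    return "pdf";
  }
  // Any other URL falls through to the Readability-based article provider.
  return "article";
}
```

Checking the specific hosts before the generic `.pdf` and article rules mirrors the order of the diagram. Whether the real server resolves overlaps in the same order is not stated in the README.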
26
31
 
27
32
  ## Quick Start
28
33
 
@@ -45,7 +50,37 @@ Or use the CLI directly:
45
50
  claude mcp add scribe-mcp -- npx -y @tekmidian/scribe
46
51
  ```
47
52
 
48
- ### Manual install (Claude Desktop)
53
+ ### Manual install
54
+
55
+ #### Claude Code
56
+
57
+ Add to `~/.claude.json`:
58
+
59
+ ```json
60
+ {
61
+ "mcpServers": {
62
+ "scribe": {
63
+ "command": "npx",
64
+ "args": ["-y", "scribe-mcp"]
65
+ }
66
+ }
67
+ }
68
+ ```
69
+
70
+ Or with Bun (faster):
71
+
72
+ ```json
73
+ {
74
+ "mcpServers": {
75
+ "scribe": {
76
+ "command": "bunx",
77
+ "args": ["scribe-mcp"]
78
+ }
79
+ }
80
+ }
81
+ ```
82
+
83
+ #### Claude Desktop
49
84
 
50
85
  Add the following to your `claude_desktop_config.json`:
51
86
 
@@ -102,89 +137,137 @@ Then point your MCP config at the built binary:
102
137
 
103
138
  | Tool | What it does |
104
139
  |------|-------------|
105
- | `youtube_transcribe` | Fetch the transcript for a YouTube video in text, SRT, or JSON format |
140
+ | `extract_content` | Extract content from any supported source; auto-detects the provider |
141
+ | `list_providers` | Show all available providers and their capabilities |
142
+ | `youtube_transcribe` | Fetch YouTube transcript in text, SRT, or JSON format |
106
143
  | `youtube_list_languages` | List every caption language available for a video |
107
144
 
145
+ ## Providers
146
+
147
+ | Provider | Sources | Output |
148
+ |----------|---------|--------|
149
+ | **youtube** | YouTube videos (all URL formats + bare IDs) | Text, SRT, JSON with timing |
150
+ | **claude** | Claude.ai chats and projects | Markdown with metadata |
151
+ | **pdf** | PDF files (URLs or local paths) | Plain text |
152
+ | **article** | Any web page | Clean text via Readability |
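For the SRT output listed for the youtube provider, the standard SubRip layout is a sequence number, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line, the caption text, and a blank line. The sketch below shows one way to produce it. The segment shape (`startMs`, `endMs`, `text`) and both function names are assumptions for illustration, not Scribe's internal types.

```javascript
// Hypothetical helpers: format transcript segments as SubRip (SRT) text.
// Assumes integer millisecond offsets; the field names are illustrative only.
function toSrtTime(ms) {
  const pad = (n, width) => String(n).padStart(width, "0");
  const hours = Math.floor(ms / 3600000);
  const minutes = Math.floor((ms % 3600000) / 60000);
  const seconds = Math.floor((ms % 60000) / 1000);
  // SRT separates seconds and milliseconds with a comma.
  return `${pad(hours, 2)}:${pad(minutes, 2)}:${pad(seconds, 2)},${pad(ms % 1000, 3)}`;
}

function toSrt(segments) {
  return segments
    .map((seg, i) => `${i + 1}\n${toSrtTime(seg.startMs)} --> ${toSrtTime(seg.endMs)}\n${seg.text}\n`)
    .join("\n");
}
```

Each cue ends with a newline and cues are joined by another newline, which yields the blank separator line SRT players expect.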
153
+
108
154
  ## User Guide
109
155
 
110
- Scribe gives Claude the ability to read YouTube videos as text. Just describe what you want in plain language.
156
+ Just give Claude a URL. Scribe auto-detects the source type.
111
157
 
112
- ### Get a transcript
158
+ ### YouTube videos
113
159
 
114
160
  ```
115
- Transcribe this YouTube video: https://www.youtube.com/watch?v=dQw4w9WgXcQ
161
+ Summarize this YouTube video: https://www.youtube.com/watch?v=dQw4w9WgXcQ
116
162
  ```
117
163
 
118
164
  ```
119
- Get the transcript of this talk: https://youtu.be/abc123xyz11
165
+ Get me the German transcript of this lecture: [url]
120
166
  ```
121
167
 
122
168
  ```
123
- Can you read this video for me? https://youtube.com/shorts/def456uvw22
169
+ Return the transcript as JSON with timing data: [url]
124
170
  ```
125
171
 
126
- ### Change the language
172
+ ### Claude.ai conversations
127
173
 
128
174
  ```
129
- Get me the German transcript of this lecture: [url]
175
+ Download this conversation: https://claude.ai/chat/550e8400-e29b-41d4-a716-446655440000
130
176
  ```
131
177
 
132
178
  ```
133
- I need the Spanish subtitles for this video: [url]
179
+ Get all conversations from this project: https://claude.ai/project/550e8400-e29b-41d4-a716-446655440000
134
180
  ```
135
181
 
182
+ ### Web articles
183
+
136
184
  ```
137
- Transcribe this video in French
185
+ Extract the content from this article: https://example.com/interesting-post
138
186
  ```
139
187
 
140
- ### Discover available languages
188
+ ### PDFs
141
189
 
142
190
  ```
143
- What languages are available for this video? [url]
191
+ Read this PDF: https://example.com/paper.pdf
144
192
  ```
145
193
 
146
194
  ```
147
- Does this talk have Japanese captions?
195
+ Extract text from /Users/me/Documents/report.pdf
148
196
  ```
149
197
 
150
- ### Choose an output format
198
+ ### Analyze anything
151
199
 
152
200
  ```
153
- Give me the SRT subtitles for this video: [url]
201
+ Summarize this: [any supported URL]
154
202
  ```
155
203
 
156
204
  ```
157
- Return the transcript as JSON with timing data: [url]
205
+ Extract the key points from this: [any supported URL]
158
206
  ```
159
207
 
160
- ```
161
- Get the transcript with timestamps included: [url]
162
- ```
208
+ ## Claude.ai Provider Setup
163
209
 
164
- ### Ask Claude to analyze the content
210
+ The Claude provider downloads conversations from the Claude.ai web UI. It requires a session cookie for authentication. Three options:
165
211
 
166
- ```
167
- Summarize this YouTube video: [url]
168
- ```
212
+ **Option A — Playwright (automated, recommended if you have Playwright MCP):**
169
213
 
170
- ```
171
- Extract the key points from this lecture: [url]
172
- ```
214
+ Ask Claude Code to navigate to claude.ai and extract cookies:
173
215
 
174
216
  ```
175
- What are the main arguments made in this talk? [url]
217
+ Navigate to claude.ai and extract all cookies, save them as JSON to ~/claude-cookies.json
176
218
  ```
177
219
 
178
- ```
179
- Find every mention of "machine learning" in this video and the timestamp it appears: [url]
180
- ```
220
+ Then add to your MCP config:
181
221
 
222
+ ```json
223
+ {
224
+ "mcpServers": {
225
+ "scribe": {
226
+ "command": "npx",
227
+ "args": ["-y", "@tekmidian/scribe"],
228
+ "env": {
229
+ "CLAUDE_COOKIES_FILE": "/Users/you/claude-cookies.json"
230
+ }
231
+ }
232
+ }
233
+ }
182
234
  ```
183
- Translate the transcript of this video into English: [url]
235
+
236
+ **Option B — Browser extension:**
237
+
238
+ Install a cookie export extension (e.g. "Cookie-Editor"), export claude.ai cookies as JSON, and set `CLAUDE_COOKIES_FILE` as above.
239
+
240
+ **Option C — Manual:**
241
+
242
+ Open claude.ai → F12 → Application → Cookies → copy the `sessionKey` value:
243
+
244
+ ```json
245
+ {
246
+ "env": {
247
+ "CLAUDE_SESSION_KEY": "sk-ant-sid01-..."
248
+ }
249
+ }
184
250
  ```
185
251
 
252
+ Without either env var, the Claude provider is silently disabled — other providers still work normally.
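The credential lookup described above could look something like the sketch below. It is an illustration under stated assumptions, not the package's implementation: the function name `resolveSessionKey`, the precedence of `CLAUDE_SESSION_KEY` over `CLAUDE_COOKIES_FILE`, and the cookie-export shape (a JSON array of objects with `name` and `value` fields) are all assumed.

```javascript
// Hypothetical sketch: resolve the Claude.ai session key from the environment.
// Returns null when neither variable is set, in which case the Claude provider
// would simply be left disabled while the other providers keep working.
function resolveSessionKey(
  env,
  readFile = (path) => require("node:fs").readFileSync(path, "utf8")
) {
  // Assumption: a directly supplied key wins over a cookie file.
  if (env.CLAUDE_SESSION_KEY) {
    return env.CLAUDE_SESSION_KEY;
  }
  if (env.CLAUDE_COOKIES_FILE) {
    // Assumption: the export is a JSON array of { name, value, ... } objects.
    const cookies = JSON.parse(readFile(env.CLAUDE_COOKIES_FILE));
    const hit = Array.isArray(cookies)
      ? cookies.find((cookie) => cookie.name === "sessionKey")
      : undefined;
    return hit ? hit.value : null;
  }
  return null;
}
```

Passing `readFile` as a parameter keeps the function testable without touching the disk. In normal use it would be called as `resolveSessionKey(process.env)`.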
253
+
186
254
  ## MCP Tool Reference
187
255
 
256
+ ### extract_content
257
+
258
+ Extracts content from any supported source. Auto-detects the provider based on the URL.
259
+
260
+ | Parameter | Type | Required | Default | Description |
261
+ |-----------|------|----------|---------|-------------|
262
+ | `url` | string | yes | — | URL or file path to extract content from |
263
+ | `format` | string | no | `text` | Output format (available formats depend on provider) |
264
+ | `language` | string | no | — | Preferred language code (YouTube only) |
265
+ | `timestamps` | boolean | no | `false` | Include timestamps (YouTube text format only) |
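As a rough illustration of how a client might invoke this tool, the sketch below builds an MCP `tools/call` JSON-RPC request using the parameters and defaults from the table above. The helper name `buildExtractRequest` is hypothetical, and sending the defaults explicitly is a choice made for the example. The server presumably applies the same defaults when they are omitted.

```javascript
// Hypothetical helper: build a JSON-RPC 2.0 "tools/call" request for extract_content.
// Defaults mirror the parameter table: format "text", timestamps false, no language.
function buildExtractRequest(id, args) {
  if (typeof args.url !== "string" || args.url === "") {
    throw new TypeError("url is required");
  }
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: {
      name: "extract_content",
      // Caller-supplied values override the defaults.
      arguments: { format: "text", timestamps: false, ...args }
    }
  };
}
```

For example, `buildExtractRequest(1, { url: "https://example.com/paper.pdf" })` produces a request whose `arguments` carry the URL plus the two defaults. Adding `language: "de"` would only be meaningful for YouTube sources, per the table.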
266
+
267
+ ### list_providers
268
+
269
+ Lists all available providers and their capabilities. No parameters.
270
+
188
271
  ### youtube_transcribe
189
272
 
190
273
  Fetches captions for a YouTube video and returns them in the requested format.
@@ -235,46 +318,13 @@ Available languages for dQw4w9WgXcQ:
235
318
 
236
319
  ## Configuration
237
320
 
238
- Scribe has no configuration file. All behavior is controlled by parameters passed per-request. The server runs on stdio and exits when the client disconnects.
321
+ All behavior is controlled by parameters passed per-request. Optional environment variables enable the Claude.ai provider:
239
322
 
240
- **npx (Node.js):**
241
-
242
- ```json
243
- {
244
- "mcpServers": {
245
- "scribe": {
246
- "command": "npx",
247
- "args": ["-y", "scribe-mcp"]
248
- }
249
- }
250
- }
251
- ```
252
-
253
- **bunx (Bun):**
254
-
255
- ```json
256
- {
257
- "mcpServers": {
258
- "scribe": {
259
- "command": "bunx",
260
- "args": ["scribe-mcp"]
261
- }
262
- }
263
- }
264
- ```
265
-
266
- **Local build:**
267
-
268
- ```json
269
- {
270
- "mcpServers": {
271
- "scribe": {
272
- "command": "node",
273
- "args": ["/path/to/Scribe/dist/index.js"]
274
- }
275
- }
276
- }
277
- ```
323
+ | Variable | Required | Description |
324
+ |----------|----------|-------------|
325
+ | `CLAUDE_COOKIES_FILE` | no | Path to browser cookie export JSON (claude.ai provider) |
326
+ | `CLAUDE_SESSION_KEY` | no | Direct session key value (claude.ai provider) |
327
+ | `CLAUDE_ORG_ID` | no | Organization ID (auto-discovered if not set) |
278
328
 
279
329
  ## Troubleshooting
280
330
 
@@ -304,14 +354,13 @@ If you transcribe many videos in rapid succession, YouTube may temporarily throt
304
354
  - An MCP-compatible client (Claude Desktop, Claude Code, or any MCP-aware host)
305
355
  - Internet access to reach `youtube.com` and `www.youtube.com/youtubei/v1/`
306
356
 
307
- No API keys. No accounts. No external dependencies beyond the MCP SDK and Zod.
357
+ No API keys needed for YouTube, articles, or PDFs. Claude.ai provider requires a session cookie (see setup above).
308
358
 
309
359
  ## Coming soon
310
360
 
311
361
  - Vimeo transcript extraction
312
362
  - Direct audio/video file transcription
313
363
  - Podcast RSS feed support
314
- - SoundCloud track transcription
315
364
 
316
365
  ## License
317
366
 
@@ -320,3 +369,6 @@ MIT
320
369
  ## Author
321
370
 
322
371
  Matthias Nott — [github.com/mnott](https://github.com/mnott)
372
+
373
+ ---
374
+ *Links:* [[Ideaverse/AI/Scribe/Scribe|Scribe]]