@tekmidian/scribe 0.1.0 → 0.2.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (3)
  1. package/README.md +144 -84
  2. package/dist/index.js +223 -41
  3. package/package.json +23 -3
package/README.md CHANGED
@@ -1,8 +1,12 @@
- # Scribe — YouTube transcript extraction for Claude
+ ---
+ links: "[[Ideaverse/AI/Scribe/Scribe|Scribe]]"
+ ---
 
- Scribe is an MCP server that extracts transcripts and captions from YouTube videos, giving Claude the ability to read, summarize, and analyze video content without watching it.
+ # Scribe: Content extraction for Claude
 
- Scribe speaks directly to YouTube's Innertube API using an Android client context: no API keys, no third-party services, no credentials to manage. It bypasses EU/GDPR consent gates automatically, handles both manual and auto-generated captions, and outputs transcripts in plain text, SRT subtitle format, or structured JSON with millisecond-accurate timing data.
+ Scribe is an MCP server that extracts content from multiple sources (YouTube videos, web articles, PDFs, and Claude.ai conversations), giving Claude the ability to read and work with content from anywhere.
+
+ **4 providers, one tool:** `extract_content` auto-detects the source type and routes to the right provider. YouTube transcripts come from the Innertube API (no API keys needed), articles use Readability extraction, PDFs are parsed locally, and Claude.ai conversations are downloaded directly from the web UI API.
 
  ## How It Works
 
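The removed paragraph above describes SRT output built from millisecond-accurate timing data. As a rough illustration of that conversion (a TypeScript sketch, not code taken from Scribe's `dist/index.js`), the block below turns timed segments into numbered SRT cues with `HH:MM:SS,mmm --> HH:MM:SS,mmm` ranges. The `Segment` shape (`startMs`, `durationMs`, `text`) and the helper names are hypothetical.

```typescript
// Hypothetical caption segment: offsets in milliseconds (not Scribe's internal type).
interface Segment {
  startMs: number;
  durationMs: number;
  text: string;
}

// Left-pad a non-negative integer with zeros to at least `width` digits.
function pad(n: number, width: number): string {
  let s = String(n);
  while (s.length < width) s = "0" + s;
  return s;
}

// Format a millisecond offset as an SRT timestamp: HH:MM:SS,mmm
function srtTimestamp(totalMs: number): string {
  const ms = Math.max(0, Math.round(totalMs));
  const hours = Math.floor(ms / 3600000);
  const minutes = Math.floor((ms % 3600000) / 60000);
  const seconds = Math.floor((ms % 60000) / 1000);
  const millis = ms % 1000;
  return `${pad(hours, 2)}:${pad(minutes, 2)}:${pad(seconds, 2)},${pad(millis, 3)}`;
}

// Render segments as numbered SRT cues separated by blank lines.
function toSrt(segments: Segment[]): string {
  return segments
    .map((seg, i) => {
      const start = srtTimestamp(seg.startMs);
      const end = srtTimestamp(seg.startMs + seg.durationMs);
      return `${i + 1}\n${start} --> ${end}\n${seg.text}\n`;
    })
    .join("\n");
}

console.log(
  toSrt([
    { startMs: 0, durationMs: 2500, text: "Hello" },
    { startMs: 2500, durationMs: 1200, text: "world" }
  ])
);
```

Running it prints two cues: `00:00:00,000 --> 00:00:02,500` for "Hello" and `00:00:02,500 --> 00:00:03,700` for "world".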
@@ -13,16 +17,17 @@ Claude (AI client)
  v
  scribe-mcp server
  |
- |-- 1. Fetch youtube.com watch page (extract embedded JSON + cookies)
- |-- 2. POST /youtubei/v1/get_transcript (Android client context)
- |-- 3. Parse transcript segment list from API response
+ |-- extract_content auto-routes by URL:
+ |     youtube.com/*   → YouTube provider (Innertube API)
+ |     claude.ai/*     → Claude provider (web UI API)
+ |     *.pdf           → PDF provider (local parsing)
+ |     any other URL   → Article provider (Readability)
  |
  v
- Transcript returned to Claude
- (text / SRT / JSON with timing)
+ Clean text/markdown returned to Claude
  ```
 
- The server runs as a local process. Claude connects over stdio via the MCP protocol. No data leaves your machine except the requests to YouTube's own endpoints.
+ The server runs as a local process. Claude connects over stdio via the MCP protocol.
 
  ## Quick Start
 
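The routing rules in the diagram above amount to a small dispatch on the input. A minimal TypeScript sketch of that dispatch follows, assuming the same order of checks as the diagram. `detectProvider`, the bare-ID regex, and the handling of `youtu.be` and local file paths are illustrative assumptions, not Scribe's actual implementation.

```typescript
// Provider names as listed in the README's Providers table.
type ProviderName = "youtube" | "claude" | "pdf" | "article";

// Assumption: an 11-character token of [A-Za-z0-9_-] is a bare YouTube video ID.
const BARE_YOUTUBE_ID = /^[A-Za-z0-9_-]{11}$/;

// True if `host` is `domain` itself or any subdomain of it.
function hostMatches(host: string, domain: string): boolean {
  return host === domain || host.slice(-(domain.length + 1)) === "." + domain;
}

// Parse an absolute URL; returns null for inputs such as local file paths.
function parseUrl(input: string): URL | null {
  try {
    return new URL(input);
  } catch (_err) {
    return null;
  }
}

function detectProvider(input: string): ProviderName {
  const trimmed = input.trim();
  if (BARE_YOUTUBE_ID.test(trimmed)) return "youtube";

  const url = parseUrl(trimmed);
  if (url === null) {
    // Not a URL: in this sketch only local PDF paths are accepted.
    if (/\.pdf$/i.test(trimmed)) return "pdf";
    throw new Error(`Unsupported input: ${input}`);
  }

  const host = url.hostname.toLowerCase();
  if (hostMatches(host, "youtube.com") || host === "youtu.be") return "youtube";
  if (hostMatches(host, "claude.ai")) return "claude";
  if (/\.pdf$/i.test(url.pathname)) return "pdf";
  return "article"; // any other URL
}

console.log(detectProvider("https://youtu.be/dQw4w9WgXcQ")); // youtube
console.log(detectProvider("/Users/me/Documents/report.pdf")); // pdf
```

Because host checks run before the `.pdf` suffix check, this sketch would send a PDF link hosted on claude.ai to the Claude provider, matching the top-to-bottom order of the diagram.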
@@ -31,13 +36,51 @@ The server runs as a local process. Claude connects over stdio via the MCP proto
  - [Claude Desktop](https://claude.ai/download) or [Claude Code](https://claude.ai/code)
  - [Node.js](https://nodejs.org) 18+ **or** [Bun](https://bun.sh) 1.0+
 
- ### Install via Claude Code
+ ### Install with Claude Code
+
+ Tell Claude:
+
+ > *"Install the scribe MCP server from github.com/mnott/Scribe"*
+
+ Claude will clone the repo, build it, and add it to your MCP config.
+
+ Or use the CLI directly:
 
  ```bash
- claude mcp add scribe-mcp -- npx -y scribe-mcp
+ claude mcp add scribe-mcp -- npx -y @tekmidian/scribe
  ```
 
- ### Manual install (Claude Desktop)
+ ### Manual install
+
+ #### Claude Code
+
+ Add to `~/.claude.json`:
+
+ ```json
+ {
+   "mcpServers": {
+     "scribe": {
+       "command": "npx",
+       "args": ["-y", "@tekmidian/scribe"]
+     }
+   }
+ }
+ ```
+
+ Or with Bun (faster):
+
+ ```json
+ {
+   "mcpServers": {
+     "scribe": {
+       "command": "bunx",
+       "args": ["@tekmidian/scribe"]
+     }
+   }
+ }
+ ```
+
+ #### Claude Desktop
 
  Add the following to your `claude_desktop_config.json`:
 
@@ -94,89 +137,137 @@ Then point your MCP config at the built binary:
 
  | Tool | What it does |
  |------|-------------|
- | `youtube_transcribe` | Fetch the transcript for a YouTube video in text, SRT, or JSON format |
+ | `extract_content` | Extract content from any supported source, auto-detecting the provider |
+ | `list_providers` | Show all available providers and their capabilities |
+ | `youtube_transcribe` | Fetch YouTube transcript in text, SRT, or JSON format |
  | `youtube_list_languages` | List every caption language available for a video |
 
+ ## Providers
+
+ | Provider | Sources | Output |
+ |----------|---------|--------|
+ | **youtube** | YouTube videos (all URL formats + bare IDs) | Text, SRT, JSON with timing |
+ | **claude** | Claude.ai chats and projects | Markdown with metadata |
+ | **pdf** | PDF files (URLs or local paths) | Plain text |
+ | **article** | Any web page | Clean text via Readability |
+
  ## User Guide
 
- Scribe gives Claude the ability to read YouTube videos as text. Just describe what you want in plain language.
+ Just give Claude a URL. Scribe auto-detects the source type.
 
- ### Get a transcript
+ ### YouTube videos
 
  ```
- Transcribe this YouTube video: https://www.youtube.com/watch?v=dQw4w9WgXcQ
+ Summarize this YouTube video: https://www.youtube.com/watch?v=dQw4w9WgXcQ
  ```
 
  ```
- Get the transcript of this talk: https://youtu.be/abc123xyz11
+ Get me the German transcript of this lecture: [url]
  ```
 
  ```
- Can you read this video for me? https://youtube.com/shorts/def456uvw22
+ Return the transcript as JSON with timing data: [url]
  ```
 
- ### Change the language
+ ### Claude.ai conversations
 
  ```
- Get me the German transcript of this lecture: [url]
+ Download this conversation: https://claude.ai/chat/550e8400-e29b-41d4-a716-446655440000
  ```
 
  ```
- I need the Spanish subtitles for this video: [url]
+ Get all conversations from this project: https://claude.ai/project/550e8400-e29b-41d4-a716-446655440000
  ```
 
+ ### Web articles
+
  ```
- Transcribe this video in French
+ Extract the content from this article: https://example.com/interesting-post
  ```
 
- ### Discover available languages
+ ### PDFs
 
  ```
- What languages are available for this video? [url]
+ Read this PDF: https://example.com/paper.pdf
  ```
 
  ```
- Does this talk have Japanese captions?
+ Extract text from /Users/me/Documents/report.pdf
  ```
 
- ### Choose an output format
+ ### Analyze anything
 
  ```
- Give me the SRT subtitles for this video: [url]
+ Summarize this: [any supported URL]
  ```
 
  ```
- Return the transcript as JSON with timing data: [url]
+ Extract the key points from this: [any supported URL]
  ```
 
- ```
- Get the transcript with timestamps included: [url]
- ```
+ ## Claude.ai Provider Setup
 
- ### Ask Claude to analyze the content
+ The Claude provider downloads conversations from the Claude.ai web UI. It requires a session cookie for authentication. Three options:
 
- ```
- Summarize this YouTube video: [url]
- ```
+ **Option A — Playwright (automated, recommended if you have Playwright MCP):**
 
- ```
- Extract the key points from this lecture: [url]
- ```
+ Ask Claude Code to navigate to claude.ai and extract cookies:
 
  ```
- What are the main arguments made in this talk? [url]
+ Navigate to claude.ai and extract all cookies, save them as JSON to ~/claude-cookies.json
  ```
 
- ```
- Find every mention of "machine learning" in this video and the timestamp it appears: [url]
- ```
+ Then add to your MCP config:
 
+ ```json
+ {
+   "mcpServers": {
+     "scribe": {
+       "command": "npx",
+       "args": ["-y", "@tekmidian/scribe"],
+       "env": {
+         "CLAUDE_COOKIES_FILE": "/Users/you/claude-cookies.json"
+       }
+     }
+   }
+ }
  ```
- Translate the transcript of this video into English: [url]
+
+ **Option B — Browser extension:**
+
+ Install a cookie export extension (e.g. "Cookie-Editor"), export claude.ai cookies as JSON, and set `CLAUDE_COOKIES_FILE` as above.
+
+ **Option C — Manual:**
+
+ Open claude.ai → F12 → Application → Cookies, copy the `sessionKey` value, and add it to the `env` block of your MCP server entry:
+
+ ```json
+ {
+   "env": {
+     "CLAUDE_SESSION_KEY": "sk-ant-sid01-..."
+   }
+ }
  ```
 
+ Without either env var, the Claude provider is silently disabled — other providers still work normally.
+
  ## MCP Tool Reference
 
+ ### extract_content
+
+ Extracts content from any supported source. Auto-detects the provider based on the URL.
+
+ | Parameter | Type | Required | Default | Description |
+ |-----------|------|----------|---------|-------------|
+ | `url` | string | yes | — | URL or file path to extract content from |
+ | `format` | string | no | `text` | Output format (available formats depend on provider) |
+ | `language` | string | no | — | Preferred language code (YouTube only) |
+ | `timestamps` | boolean | no | `false` | Include timestamps (YouTube text format only) |
+
+ ### list_providers
+
+ Lists all available providers and their capabilities. No parameters.
+
  ### youtube_transcribe
 
  Fetches captions for a YouTube video and returns them in the requested format.
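Under MCP, a client invokes `extract_content` with a JSON-RPC 2.0 `tools/call` request whose `params` carry the tool name and the arguments from the parameter table above. The TypeScript sketch below builds such a request; `buildExtractContentCall` and the `ExtractContentArgs` type are hypothetical helpers, and in normal use Claude Desktop or Claude Code constructs this message for you.

```typescript
// Arguments for extract_content, mirroring the parameter table above.
interface ExtractContentArgs {
  url: string; // required: URL or local file path
  format?: string; // server default "text"; valid values depend on the provider
  language?: string; // YouTube only
  timestamps?: boolean; // YouTube text format only; server default false
}

// JSON-RPC 2.0 request envelope used by MCP for tool invocations.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: ExtractContentArgs };
}

function buildExtractContentCall(id: number, args: ExtractContentArgs): ToolCallRequest {
  if (args.url.trim() === "") {
    throw new Error("extract_content requires a non-empty url");
  }
  // Arguments are passed through unchanged: unset optional fields stay absent,
  // leaving the server to apply its documented defaults.
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "extract_content", arguments: args }
  };
}

const exampleCall = buildExtractContentCall(1, {
  url: "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
  language: "de",
  timestamps: true
});
console.log(JSON.stringify(exampleCall, null, 2));
```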
@@ -227,46 +318,13 @@ Available languages for dQw4w9WgXcQ:
 
  ## Configuration
 
- Scribe has no configuration file. All behavior is controlled by parameters passed per-request. The server runs on stdio and exits when the client disconnects.
+ All behavior is controlled by parameters passed per-request. Optional environment variables enable the Claude.ai provider:
 
- **npx (Node.js):**
-
- ```json
- {
-   "mcpServers": {
-     "scribe": {
-       "command": "npx",
-       "args": ["-y", "scribe-mcp"]
-     }
-   }
- }
- ```
-
- **bunx (Bun):**
-
- ```json
- {
-   "mcpServers": {
-     "scribe": {
-       "command": "bunx",
-       "args": ["scribe-mcp"]
-     }
-   }
- }
- ```
-
- **Local build:**
-
- ```json
- {
-   "mcpServers": {
-     "scribe": {
-       "command": "node",
-       "args": ["/path/to/Scribe/dist/index.js"]
-     }
-   }
- }
- ```
+ | Variable | Required | Description |
+ |----------|----------|-------------|
+ | `CLAUDE_COOKIES_FILE` | no | Path to browser cookie export JSON (claude.ai provider) |
+ | `CLAUDE_SESSION_KEY` | no | Direct session key value (claude.ai provider) |
+ | `CLAUDE_ORG_ID` | no | Organization ID (auto-discovered if not set) |
 
  ## Troubleshooting
 
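The environment variables above suggest a simple resolution step for the Claude.ai provider: use a session key if one is set, otherwise look for the `sessionKey` cookie in an exported cookie file, and leave the provider disabled if neither yields a key. The TypeScript sketch below shows that logic. The precedence order, the function name `resolveSessionKey`, and the cookie-export shape (assumed to be an array of `{ name, value }` objects) are assumptions, not documented Scribe behavior; `CLAUDE_ORG_ID` is not modeled.

```typescript
// Assumed shape of one entry in a browser cookie export (e.g. from Cookie-Editor).
interface ExportedCookie {
  name: string;
  value: string;
  domain?: string;
}

type Env = Record<string, string | undefined>;

// Resolve the claude.ai session key from the environment variables in the table above.
// Returns undefined when neither variable yields a key, i.e. the provider stays disabled.
// Assumption: CLAUDE_SESSION_KEY takes precedence over CLAUDE_COOKIES_FILE.
function resolveSessionKey(env: Env, readFile: (path: string) => string): string | undefined {
  const direct = env["CLAUDE_SESSION_KEY"];
  if (direct) return direct;

  const cookiesFile = env["CLAUDE_COOKIES_FILE"];
  if (!cookiesFile) return undefined;

  // Look for the cookie named "sessionKey" in the exported array.
  const cookies = JSON.parse(readFile(cookiesFile)) as ExportedCookie[];
  for (const cookie of cookies) {
    if (cookie.name === "sessionKey") return cookie.value;
  }
  return undefined;
}
```

The file reader is injected so the sketch stays free of runtime-specific imports. In Node.js, `env` could be `process.env` and the reader `(p) => readFileSync(p, "utf8")` from `node:fs`.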
@@ -296,14 +354,13 @@ If you transcribe many videos in rapid succession, YouTube may temporarily throt
  - An MCP-compatible client (Claude Desktop, Claude Code, or any MCP-aware host)
  - Internet access to reach `youtube.com` and `www.youtube.com/youtubei/v1/`
 
- No API keys. No accounts. No external dependencies beyond the MCP SDK and Zod.
+ No API keys are needed for YouTube, articles, or PDFs. The Claude.ai provider requires a session cookie (see setup above).
 
  ## Coming soon
 
  - Vimeo transcript extraction
  - Direct audio/video file transcription
  - Podcast RSS feed support
- - SoundCloud track transcription
 
  ## License
 
@@ -312,3 +369,6 @@ MIT
  ## Author
 
  Matthias Nott — [github.com/mnott](https://github.com/mnott)
+
+ ---
+ *Links:* [[Ideaverse/AI/Scribe/Scribe|Scribe]]