screenpipe-mcp 0.8.4 → 0.8.6
This diff shows the content of publicly available package versions as published to a supported registry. It is provided for informational purposes only and reflects the changes between the two versions as they appear in that registry.
- package/README.md +65 -1
- package/dist/index.js +44 -8
- package/package.json +3 -2
- package/src/index.ts +44 -8
package/README.md
CHANGED

@@ -28,7 +28,47 @@ The easiest way to use screenpipe-mcp is with npx. Edit your Claude Desktop conf
   }
 ```
 
-### Option 2: From Source
+### Option 2: HTTP Server (Remote / Network Access)
+
+The MCP server can run over HTTP using the [Streamable HTTP transport](https://modelcontextprotocol.io/specification/2025-03-26/basic/transports#streamable-http), allowing remote MCP clients to connect over the network instead of stdio. This is ideal when your AI assistant (e.g., OpenClaw) runs on a different machine than screenpipe.
+
+```bash
+# from npm
+npx screenpipe-mcp-http --port 3031
+
+# or from source
+npm run start:http -- --port 3031
+```
+
+The server exposes:
+- **MCP endpoint**: `http://localhost:3031/mcp` — Streamable HTTP transport (POST for requests, GET for SSE stream)
+- **Health check**: `http://localhost:3031/health`
+
+**Options:**
+| Flag | Description | Default |
+|------|-------------|---------|
+| `--port` | Port for the MCP HTTP server | 3031 |
+| `--screenpipe-port` | Port where screenpipe API is running | 3030 |
+
+**Connecting a remote MCP client:**
+
+Point any MCP client that supports HTTP transport at the `/mcp` endpoint:
+
+```json
+{
+  "mcpServers": {
+    "screenpipe": {
+      "url": "http://<your-ip>:3031/mcp"
+    }
+  }
+}
+```
+
+If your machines are on different networks, expose port 3031 via Tailscale, SSH tunnel, or similar — see the [OpenClaw integration guide](https://docs.screenpi.pe/openclaw) for detailed examples.
+
+> **Note:** The HTTP server currently exposes `search_content` only. The stdio server has the full tool set (export-video, list-meetings, activity-summary, search-elements, frame-context). We're working on bringing HTTP to full parity.
+
+### Option 3: From Source
 
 Clone and build from source:
 
@@ -62,6 +102,13 @@ Test with MCP Inspector:
 npx @modelcontextprotocol/inspector npx screenpipe-mcp
 ```
 
+## Transport Modes
+
+| Mode | Command | Use Case |
+|------|---------|----------|
+| **stdio** (default) | `npx screenpipe-mcp` | Claude Desktop, local MCP clients |
+| **HTTP** | `npx screenpipe-mcp-http` | Remote clients, network access, OpenClaw on VPS |
+
 ## Available Tools
 
 ### search-content
@@ -79,6 +126,23 @@ Export screen recordings as video files:
 - Specify time range with start/end times
 - Configurable FPS for output video
 
+### activity-summary
+Get a lightweight compressed activity overview for a time range:
+- App usage with active minutes and frame counts
+- Recent accessibility texts
+- Audio speaker summary
+
+### list-meetings
+List detected meetings with duration, app, and attendees.
+
+### search-elements
+Search structured UI elements (accessibility tree nodes and OCR text blocks):
+- Filter by source, role, app, time range
+- Much lighter than search-content for targeted UI lookups
+
+### frame-context
+Get accessibility text, parsed tree nodes, and extracted URLs for a specific frame.
+
 ## Example Queries in Claude
 
 - "Search for any mentions of 'rust' in my screen recordings"
package/dist/index.js
CHANGED

@@ -69,7 +69,7 @@ const SCREENPIPE_API = `http://localhost:${port}`;
 // Initialize server
 const server = new index_js_1.Server({
     name: "screenpipe",
-    version: "0.8.
+    version: "0.8.5",
 }, {
     capabilities: {
         tools: {},
@@ -85,10 +85,14 @@ const BASE_TOOLS = [
         "Returns timestamped results with app context. " +
         "Call with no parameters to get recent activity. " +
         "Use the 'screenpipe://context' resource for current time when building time-based queries.\n\n" +
+        "WHEN TO USE WHICH content_type:\n" +
+        "- For meetings/calls/conversations: content_type='audio', do NOT use q param (transcriptions are noisy, q filters too aggressively)\n" +
+        "- For screen text/reading: content_type='all' or 'accessibility'\n" +
+        "- For time spent/app usage questions: use activity-summary tool instead (this tool returns content, not time stats)\n\n" +
         "SEARCH STRATEGY: First search with ONLY time params (start_time/end_time) — no q, no app_name, no content_type. " +
         "This gives ground truth of what's recorded. Scan results to find correct app_name values, then narrow with filters using exact observed values. " +
-        "App names are case-sensitive
-        "The q param searches captured text
+        "App names are case-sensitive (e.g. 'Discord' vs 'Discord.exe'). " +
+        "The q param searches captured text, NOT app names. NEVER report 'no data' after one filtered search — verify with unfiltered time-only search first.\n\n" +
         "DEEP LINKS: When referencing specific moments, create clickable links using IDs from search results:\n" +
         "- OCR results (PREFERRED): [10:30 AM — Chrome](screenpipe://frame/12345) — use content.frame_id from the result\n" +
         "- Audio results: [meeting at 3pm](screenpipe://timeline?timestamp=2024-01-15T15:00:00Z) — use exact timestamp from result\n" +
@@ -102,12 +106,12 @@ const BASE_TOOLS = [
     properties: {
         q: {
             type: "string",
-            description: "Search query. Optional - omit to return all
+            description: "Search query (full-text search on captured text). Optional - omit to return all content in time range. IMPORTANT: Do NOT use q for audio/meeting searches — transcriptions are noisy and q filters too aggressively. Only use q when searching for specific text the user saw on screen.",
         },
         content_type: {
             type: "string",
             enum: ["all", "ocr", "audio", "input", "accessibility"],
-            description: "Content type filter: '
+            description: "Content type filter: 'audio' (transcriptions — use for meetings/calls/conversations), 'accessibility' (accessibility tree text, preferred for screen content), 'ocr' (screen text via OCR, legacy fallback), 'input' (clicks, keystrokes, clipboard, app switches), 'all'. Default: 'all'. For meeting/call queries, ALWAYS use 'audio'.",
             default: "all",
         },
         limit: {
@@ -240,9 +244,14 @@ const BASE_TOOLS = [
     description: "Get a lightweight compressed activity overview for a time range (~200-500 tokens). " +
         "Returns app usage (name, frame count, active minutes, first/last seen), recent accessibility texts, and audio speaker summary. " +
         "Minutes are based on active session time (consecutive frames with gaps < 5min count as active). " +
-        "first_seen/last_seen show the wall-clock span per app
-        "
-        "
+        "first_seen/last_seen show the wall-clock span per app.\n\n" +
+        "USE THIS TOOL (not search-content or raw SQL) for:\n" +
+        "- 'how long did I spend on X?' → active_minutes per app\n" +
+        "- 'which apps did I use today?' → app list sorted by active_minutes\n" +
+        "- 'what was I doing?' → broad overview before drilling deeper\n" +
+        "- Any time-spent or app-usage question\n\n" +
+        "WARNING: Do NOT estimate time from raw frame counts or SQL queries — those are inaccurate. " +
+        "This endpoint calculates actual active session time correctly.",
     annotations: {
         title: "Activity Summary",
         readOnlyHint: true,
@@ -427,6 +436,20 @@ Screenpipe captures four types of data:
 - **Get keyboard input**: \`{"content_type": "input"}\`
 - **Get audio only**: \`{"content_type": "audio"}\`
 
+## Common User Requests → Correct Tool Choice
+| User says | Use this tool | Key params |
+|-----------|--------------|------------|
+| "summarize my meeting/call" | search-content | content_type:"audio", NO q param, start_time |
+| "what did they/I say about X" | search-content | content_type:"audio", NO q param (scan results manually) |
+| "how long on X" / "which apps" / "time spent" | activity-summary | start_time, end_time |
+| "what was I doing" | activity-summary | start_time, end_time (then drill into search-content) |
+| "what was I reading/looking at" | search-content | content_type:"all", start_time |
+
+## Behavior Rules
+- Act immediately on clear requests. NEVER ask "what time range?" or "which content type?" when the intent is obvious.
+- If search returns empty, silently retry with wider time range or fewer filters. Do NOT ask the user what to change.
+- For meetings: ALWAYS use content_type:"audio" and do NOT use the q param. Transcriptions are noisy — q filters too aggressively and misses relevant content.
+
 ## search-content
 | Parameter | Description | Default |
 |-----------|-------------|---------|
@@ -452,6 +475,19 @@ Screenpipe captures four types of data:
 4. **Fetch frame-context** for URLs and accessibility tree of specific frames
 5. **Screenshots** (include_frames=true) only when text isn't enough
 
+## Chat History
+Previous screenpipe chat conversations are stored as individual JSON files in ~/.screenpipe/chats/{conversation-id}.json
+Each file contains: id, title, messages[], createdAt, updatedAt. You can read these files to reference or search previous conversations.
+
+## Speaker Management
+screenpipe auto-identifies speakers in audio. API endpoints for managing them:
+- \`GET /speakers/unnamed?limit=10\` — list unnamed speakers
+- \`GET /speakers/search?name=John\` — search by name
+- \`POST /speakers/update\` with \`{"id": 5, "name": "John"}\` — rename a speaker
+- \`POST /speakers/merge\` with \`{"speaker_to_keep_id": 1, "speaker_to_merge_id": 2}\` — merge duplicates
+- \`GET /speakers/similar?speaker_id=5\` — find similar speakers for merging
+- \`POST /speakers/reassign\` — reassign audio chunk to different speaker
+
 ## Tips
 1. Read screenpipe://context first to get current timestamps
 2. Use activity-summary before search-content for broad overview questions
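The DEEP LINKS convention in the search-content description above can be sketched as a small helper. The `SearchHit` shape and the `deepLink` name are assumptions for illustration, simplified from whatever the real search results contain; the link formats themselves (`screenpipe://frame/<id>` and `screenpipe://timeline?timestamp=<iso>`) are taken from the tool description:

```typescript
// Simplified, assumed shape of one search-content result.
interface SearchHit {
  type: "ocr" | "audio";
  frame_id?: number;   // present on OCR results (content.frame_id)
  timestamp: string;   // ISO 8601 timestamp of the result
}

// Build a clickable markdown deep link per the documented convention:
// OCR results link to the frame (preferred), audio results to the timeline.
function deepLink(hit: SearchHit, label: string): string {
  if (hit.type === "ocr" && hit.frame_id !== undefined) {
    return `[${label}](screenpipe://frame/${hit.frame_id})`;
  }
  return `[${label}](screenpipe://timeline?timestamp=${hit.timestamp})`;
}

const ocrHit: SearchHit = { type: "ocr", frame_id: 12345, timestamp: "2024-01-15T10:30:00Z" };
console.log(deepLink(ocrHit, "Chrome at 10:30"));
// [Chrome at 10:30](screenpipe://frame/12345)
```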
package/package.json
CHANGED

@@ -1,10 +1,11 @@
 {
   "name": "screenpipe-mcp",
-  "version": "0.8.4",
+  "version": "0.8.6",
   "description": "MCP server for screenpipe - search your screen recordings and audio transcriptions",
   "main": "dist/index.js",
   "bin": {
-    "screenpipe-mcp": "dist/index.js"
+    "screenpipe-mcp": "dist/index.js",
+    "screenpipe-mcp-http": "dist/http-server.js"
   },
   "scripts": {
     "build": "tsc",
package/src/index.ts
CHANGED

@@ -48,7 +48,7 @@ const SCREENPIPE_API = `http://localhost:${port}`;
 const server = new Server(
   {
     name: "screenpipe",
-    version: "0.8.
+    version: "0.8.5",
   },
   {
     capabilities: {
@@ -68,10 +68,14 @@ const BASE_TOOLS: Tool[] = [
       "Returns timestamped results with app context. " +
      "Call with no parameters to get recent activity. " +
      "Use the 'screenpipe://context' resource for current time when building time-based queries.\n\n" +
+      "WHEN TO USE WHICH content_type:\n" +
+      "- For meetings/calls/conversations: content_type='audio', do NOT use q param (transcriptions are noisy, q filters too aggressively)\n" +
+      "- For screen text/reading: content_type='all' or 'accessibility'\n" +
+      "- For time spent/app usage questions: use activity-summary tool instead (this tool returns content, not time stats)\n\n" +
      "SEARCH STRATEGY: First search with ONLY time params (start_time/end_time) — no q, no app_name, no content_type. " +
      "This gives ground truth of what's recorded. Scan results to find correct app_name values, then narrow with filters using exact observed values. " +
-      "App names are case-sensitive
-      "The q param searches captured text
+      "App names are case-sensitive (e.g. 'Discord' vs 'Discord.exe'). " +
+      "The q param searches captured text, NOT app names. NEVER report 'no data' after one filtered search — verify with unfiltered time-only search first.\n\n" +
      "DEEP LINKS: When referencing specific moments, create clickable links using IDs from search results:\n" +
      "- OCR results (PREFERRED): [10:30 AM — Chrome](screenpipe://frame/12345) — use content.frame_id from the result\n" +
      "- Audio results: [meeting at 3pm](screenpipe://timeline?timestamp=2024-01-15T15:00:00Z) — use exact timestamp from result\n" +
@@ -85,12 +89,12 @@ const BASE_TOOLS: Tool[] = [
     properties: {
       q: {
         type: "string",
-        description: "Search query. Optional - omit to return all
+        description: "Search query (full-text search on captured text). Optional - omit to return all content in time range. IMPORTANT: Do NOT use q for audio/meeting searches — transcriptions are noisy and q filters too aggressively. Only use q when searching for specific text the user saw on screen.",
       },
       content_type: {
         type: "string",
         enum: ["all", "ocr", "audio", "input", "accessibility"],
-        description: "Content type filter: '
+        description: "Content type filter: 'audio' (transcriptions — use for meetings/calls/conversations), 'accessibility' (accessibility tree text, preferred for screen content), 'ocr' (screen text via OCR, legacy fallback), 'input' (clicks, keystrokes, clipboard, app switches), 'all'. Default: 'all'. For meeting/call queries, ALWAYS use 'audio'.",
         default: "all",
       },
       limit: {
@@ -229,9 +233,14 @@ const BASE_TOOLS: Tool[] = [
       "Get a lightweight compressed activity overview for a time range (~200-500 tokens). " +
      "Returns app usage (name, frame count, active minutes, first/last seen), recent accessibility texts, and audio speaker summary. " +
      "Minutes are based on active session time (consecutive frames with gaps < 5min count as active). " +
-      "first_seen/last_seen show the wall-clock span per app
-      "
-      "
+      "first_seen/last_seen show the wall-clock span per app.\n\n" +
+      "USE THIS TOOL (not search-content or raw SQL) for:\n" +
+      "- 'how long did I spend on X?' → active_minutes per app\n" +
+      "- 'which apps did I use today?' → app list sorted by active_minutes\n" +
+      "- 'what was I doing?' → broad overview before drilling deeper\n" +
+      "- Any time-spent or app-usage question\n\n" +
+      "WARNING: Do NOT estimate time from raw frame counts or SQL queries — those are inaccurate. " +
+      "This endpoint calculates actual active session time correctly.",
     annotations: {
       title: "Activity Summary",
       readOnlyHint: true,
@@ -424,6 +433,20 @@ Screenpipe captures four types of data:
 - **Get keyboard input**: \`{"content_type": "input"}\`
 - **Get audio only**: \`{"content_type": "audio"}\`
 
+## Common User Requests → Correct Tool Choice
+| User says | Use this tool | Key params |
+|-----------|--------------|------------|
+| "summarize my meeting/call" | search-content | content_type:"audio", NO q param, start_time |
+| "what did they/I say about X" | search-content | content_type:"audio", NO q param (scan results manually) |
+| "how long on X" / "which apps" / "time spent" | activity-summary | start_time, end_time |
+| "what was I doing" | activity-summary | start_time, end_time (then drill into search-content) |
+| "what was I reading/looking at" | search-content | content_type:"all", start_time |
+
+## Behavior Rules
+- Act immediately on clear requests. NEVER ask "what time range?" or "which content type?" when the intent is obvious.
+- If search returns empty, silently retry with wider time range or fewer filters. Do NOT ask the user what to change.
+- For meetings: ALWAYS use content_type:"audio" and do NOT use the q param. Transcriptions are noisy — q filters too aggressively and misses relevant content.
+
 ## search-content
 | Parameter | Description | Default |
 |-----------|-------------|---------|
@@ -449,6 +472,19 @@ Screenpipe captures four types of data:
 4. **Fetch frame-context** for URLs and accessibility tree of specific frames
 5. **Screenshots** (include_frames=true) only when text isn't enough
 
+## Chat History
+Previous screenpipe chat conversations are stored as individual JSON files in ~/.screenpipe/chats/{conversation-id}.json
+Each file contains: id, title, messages[], createdAt, updatedAt. You can read these files to reference or search previous conversations.
+
+## Speaker Management
+screenpipe auto-identifies speakers in audio. API endpoints for managing them:
+- \`GET /speakers/unnamed?limit=10\` — list unnamed speakers
+- \`GET /speakers/search?name=John\` — search by name
+- \`POST /speakers/update\` with \`{"id": 5, "name": "John"}\` — rename a speaker
+- \`POST /speakers/merge\` with \`{"speaker_to_keep_id": 1, "speaker_to_merge_id": 2}\` — merge duplicates
+- \`GET /speakers/similar?speaker_id=5\` — find similar speakers for merging
+- \`POST /speakers/reassign\` — reassign audio chunk to different speaker
+
 ## Tips
 1. Read screenpipe://context first to get current timestamps
 2. Use activity-summary before search-content for broad overview questions
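The Chat History section in the tool instructions above says each conversation is a standalone JSON file with id, title, messages[], createdAt, updatedAt. A minimal sketch of reading them under Node.js; `loadChats` is a hypothetical helper, demonstrated here against a temporary directory rather than the real ~/.screenpipe/chats:

```typescript
import { readdirSync, readFileSync, writeFileSync, mkdtempSync } from "node:fs";
import { join } from "node:path";
import { tmpdir } from "node:os";

// Top-level fields per the docs above; message contents are not documented,
// so messages is left as unknown[].
interface ChatFile {
  id: string;
  title: string;
  messages: unknown[];
  createdAt: string;
  updatedAt: string;
}

// Hypothetical helper: parse every {conversation-id}.json in a chats directory.
function loadChats(dir: string): ChatFile[] {
  return readdirSync(dir)
    .filter((f) => f.endsWith(".json"))
    .map((f) => JSON.parse(readFileSync(join(dir, f), "utf8")) as ChatFile);
}

// demo against a temp directory standing in for ~/.screenpipe/chats
const dir = mkdtempSync(join(tmpdir(), "chats-"));
writeFileSync(
  join(dir, "abc.json"),
  JSON.stringify({ id: "abc", title: "test", messages: [], createdAt: "", updatedAt: "" })
);
console.log(loadChats(dir)[0].title); // test
```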