screenpipe-mcp 0.8.3 → 0.8.6
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +65 -1
- package/dist/index.js +69 -22
- package/manifest.json +1 -1
- package/package.json +3 -2
- package/src/index.ts +69 -23
package/README.md
CHANGED

@@ -28,7 +28,47 @@ The easiest way to use screenpipe-mcp is with npx. Edit your Claude Desktop conf
   }
 ```
 
-### Option 2:
+### Option 2: HTTP Server (Remote / Network Access)
+
+The MCP server can run over HTTP using the [Streamable HTTP transport](https://modelcontextprotocol.io/specification/2025-03-26/basic/transports#streamable-http), allowing remote MCP clients to connect over the network instead of stdio. This is ideal when your AI assistant (e.g., OpenClaw) runs on a different machine than screenpipe.
+
+```bash
+# from npm
+npx screenpipe-mcp-http --port 3031
+
+# or from source
+npm run start:http -- --port 3031
+```
+
+The server exposes:
+- **MCP endpoint**: `http://localhost:3031/mcp` — Streamable HTTP transport (POST for requests, GET for SSE stream)
+- **Health check**: `http://localhost:3031/health`
+
+**Options:**
+| Flag | Description | Default |
+|------|-------------|---------|
+| `--port` | Port for the MCP HTTP server | 3031 |
+| `--screenpipe-port` | Port where the screenpipe API is running | 3030 |
+
+**Connecting a remote MCP client:**
+
+Point any MCP client that supports HTTP transport at the `/mcp` endpoint:
+
+```json
+{
+  "mcpServers": {
+    "screenpipe": {
+      "url": "http://<your-ip>:3031/mcp"
+    }
+  }
+}
+```
+
+If your machines are on different networks, expose port 3031 via Tailscale, an SSH tunnel, or similar — see the [OpenClaw integration guide](https://docs.screenpi.pe/openclaw) for detailed examples.
+
+> **Note:** The HTTP server currently exposes `search_content` only. The stdio server has the full tool set (export-video, list-meetings, activity-summary, search-elements, frame-context). We're working on bringing HTTP to full parity.
+
+### Option 3: From Source
 
 Clone and build from source:
 
@@ -62,6 +102,13 @@ Test with MCP Inspector:
 npx @modelcontextprotocol/inspector npx screenpipe-mcp
 ```
 
+## Transport Modes
+
+| Mode | Command | Use Case |
+|------|---------|----------|
+| **stdio** (default) | `npx screenpipe-mcp` | Claude Desktop, local MCP clients |
+| **HTTP** | `npx screenpipe-mcp-http` | Remote clients, network access, OpenClaw on VPS |
+
 ## Available Tools
 
 ### search-content
@@ -79,6 +126,23 @@ Export screen recordings as video files:
 - Specify time range with start/end times
 - Configurable FPS for output video
 
+### activity-summary
+Get a lightweight compressed activity overview for a time range:
+- App usage with active minutes and frame counts
+- Recent accessibility texts
+- Audio speaker summary
+
+### list-meetings
+List detected meetings with duration, app, and attendees.
+
+### search-elements
+Search structured UI elements (accessibility tree nodes and OCR text blocks):
+- Filter by source, role, app, time range
+- Much lighter than search-content for targeted UI lookups
+
+### frame-context
+Get accessibility text, parsed tree nodes, and extracted URLs for a specific frame.
+
 ## Example Queries in Claude
 
 - "Search for any mentions of 'rust' in my screen recordings"
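The README section above describes the Streamable HTTP transport only at a high level. As a rough illustration of what a remote client actually sends, here is a sketch of the JSON-RPC `initialize` request an MCP client would POST to the `/mcp` endpoint. The payload shape follows the MCP specification; the client name and version are placeholders, and real clients must also send the appropriate `Accept` headers required by the transport.

```typescript
// Sketch of the first request an MCP client POSTs to http://<host>:3031/mcp
// when opening a Streamable HTTP session. clientInfo values are illustrative.
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params: Record<string, unknown>;
}

function buildInitializeRequest(id: number): JsonRpcRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "initialize",
    params: {
      protocolVersion: "2025-03-26",
      capabilities: {},
      clientInfo: { name: "example-client", version: "0.0.1" },
    },
  };
}
```

A client would serialize this with `JSON.stringify` and POST it with `Content-Type: application/json`; subsequent tool calls reuse the same endpoint with new request IDs.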
package/dist/index.js
CHANGED

@@ -69,7 +69,7 @@ const SCREENPIPE_API = `http://localhost:${port}`;
 // Initialize server
 const server = new index_js_1.Server({
     name: "screenpipe",
-    version: "0.8.
+    version: "0.8.5",
 }, {
     capabilities: {
         tools: {},
@@ -85,10 +85,14 @@ const BASE_TOOLS = [
         "Returns timestamped results with app context. " +
         "Call with no parameters to get recent activity. " +
         "Use the 'screenpipe://context' resource for current time when building time-based queries.\n\n" +
+        "WHEN TO USE WHICH content_type:\n" +
+        "- For meetings/calls/conversations: content_type='audio', do NOT use q param (transcriptions are noisy, q filters too aggressively)\n" +
+        "- For screen text/reading: content_type='all' or 'accessibility'\n" +
+        "- For time spent/app usage questions: use activity-summary tool instead (this tool returns content, not time stats)\n\n" +
         "SEARCH STRATEGY: First search with ONLY time params (start_time/end_time) — no q, no app_name, no content_type. " +
         "This gives ground truth of what's recorded. Scan results to find correct app_name values, then narrow with filters using exact observed values. " +
-        "App names are case-sensitive
-        "The q param searches captured text
+        "App names are case-sensitive (e.g. 'Discord' vs 'Discord.exe'). " +
+        "The q param searches captured text, NOT app names. NEVER report 'no data' after one filtered search — verify with unfiltered time-only search first.\n\n" +
         "DEEP LINKS: When referencing specific moments, create clickable links using IDs from search results:\n" +
         "- OCR results (PREFERRED): [10:30 AM — Chrome](screenpipe://frame/12345) — use content.frame_id from the result\n" +
         "- Audio results: [meeting at 3pm](screenpipe://timeline?timestamp=2024-01-15T15:00:00Z) — use exact timestamp from result\n" +
@@ -102,12 +106,12 @@ const BASE_TOOLS = [
         properties: {
             q: {
                 type: "string",
-                description: "Search query. Optional - omit to return all
+                description: "Search query (full-text search on captured text). Optional - omit to return all content in time range. IMPORTANT: Do NOT use q for audio/meeting searches — transcriptions are noisy and q filters too aggressively. Only use q when searching for specific text the user saw on screen.",
             },
             content_type: {
                 type: "string",
                 enum: ["all", "ocr", "audio", "input", "accessibility"],
-                description: "Content type filter: '
+                description: "Content type filter: 'audio' (transcriptions — use for meetings/calls/conversations), 'accessibility' (accessibility tree text, preferred for screen content), 'ocr' (screen text via OCR, legacy fallback), 'input' (clicks, keystrokes, clipboard, app switches), 'all'. Default: 'all'. For meeting/call queries, ALWAYS use 'audio'.",
                 default: "all",
             },
             limit: {
@@ -123,12 +127,12 @@ const BASE_TOOLS = [
             start_time: {
                 type: "string",
                 format: "date-time",
-                description: "ISO 8601 UTC
+                description: "Start time: ISO 8601 UTC (e.g., 2024-01-15T10:00:00Z) or relative (e.g., '16h ago', '2d ago', 'now')",
             },
             end_time: {
                 type: "string",
                 format: "date-time",
-                description: "ISO 8601 UTC
+                description: "End time: ISO 8601 UTC (e.g., 2024-01-15T18:00:00Z) or relative (e.g., 'now', '1h ago')",
             },
             app_name: {
                 type: "string",
@@ -159,6 +163,10 @@ const BASE_TOOLS = [
                 type: "string",
                 description: "Filter audio by speaker name (case-insensitive partial match)",
             },
+            max_content_length: {
+                type: "integer",
+                description: "Truncate each result's text/transcription to this many characters using middle-truncation (keeps first half + last half). Useful for limiting token usage with small-context models.",
+            },
         },
     },
 },
@@ -166,7 +174,7 @@ const BASE_TOOLS = [
     name: "export-video",
     description: "Export a video of screen recordings for a specific time range. " +
         "Creates an MP4 video from the recorded frames between the start and end times.\n\n" +
-        "IMPORTANT: Use ISO 8601 UTC timestamps (e.g., 2024-01-15T10:00:00Z)\n\n" +
+        "IMPORTANT: Use ISO 8601 UTC timestamps (e.g., 2024-01-15T10:00:00Z) or relative times (e.g., '16h ago', 'now')\n\n" +
         "EXAMPLES:\n" +
         "- Last 30 minutes: Calculate timestamps from current time\n" +
         "- Specific meeting: Use the meeting's start and end times in UTC",
@@ -180,12 +188,12 @@ const BASE_TOOLS = [
             start_time: {
                 type: "string",
                 format: "date-time",
-                description: "Start time
+                description: "Start time: ISO 8601 UTC (e.g., '2024-01-15T10:00:00Z') or relative (e.g., '16h ago', 'now')",
             },
             end_time: {
                 type: "string",
                 format: "date-time",
-                description: "End time
+                description: "End time: ISO 8601 UTC (e.g., '2024-01-15T10:30:00Z') or relative (e.g., 'now', '1h ago')",
             },
             fps: {
                 type: "number",
@@ -211,12 +219,12 @@ const BASE_TOOLS = [
             start_time: {
                 type: "string",
                 format: "date-time",
-                description: "ISO 8601 UTC
+                description: "Start filter: ISO 8601 UTC (e.g., 2024-01-15T10:00:00Z) or relative (e.g., '16h ago', 'now')",
             },
             end_time: {
                 type: "string",
                 format: "date-time",
-                description: "ISO 8601 UTC
+                description: "End filter: ISO 8601 UTC (e.g., 2024-01-15T18:00:00Z) or relative (e.g., 'now', '1h ago')",
             },
             limit: {
                 type: "integer",
@@ -234,9 +242,16 @@ const BASE_TOOLS = [
 {
     name: "activity-summary",
     description: "Get a lightweight compressed activity overview for a time range (~200-500 tokens). " +
-        "Returns app usage (name, frame count, minutes), recent accessibility texts, and audio speaker summary. " +
-        "
-        "
+        "Returns app usage (name, frame count, active minutes, first/last seen), recent accessibility texts, and audio speaker summary. " +
+        "Minutes are based on active session time (consecutive frames with gaps < 5min count as active). " +
+        "first_seen/last_seen show the wall-clock span per app.\n\n" +
+        "USE THIS TOOL (not search-content or raw SQL) for:\n" +
+        "- 'how long did I spend on X?' → active_minutes per app\n" +
+        "- 'which apps did I use today?' → app list sorted by active_minutes\n" +
+        "- 'what was I doing?' → broad overview before drilling deeper\n" +
+        "- Any time-spent or app-usage question\n\n" +
+        "WARNING: Do NOT estimate time from raw frame counts or SQL queries — those are inaccurate. " +
+        "This endpoint calculates actual active session time correctly.",
     annotations: {
         title: "Activity Summary",
         readOnlyHint: true,
@@ -247,12 +262,12 @@ const BASE_TOOLS = [
             start_time: {
                 type: "string",
                 format: "date-time",
-                description: "Start of time range
+                description: "Start of time range: ISO 8601 UTC (e.g., 2024-01-15T10:00:00Z) or relative (e.g., '16h ago', 'now')",
             },
             end_time: {
                 type: "string",
                 format: "date-time",
-                description: "End of time range
+                description: "End of time range: ISO 8601 UTC (e.g., 2024-01-15T18:00:00Z) or relative (e.g., 'now', '1h ago')",
             },
             app_name: {
                 type: "string",
@@ -296,12 +311,12 @@ const BASE_TOOLS = [
             start_time: {
                 type: "string",
                 format: "date-time",
-                description: "ISO 8601 UTC
+                description: "Start time: ISO 8601 UTC or relative (e.g., '16h ago', 'now')",
             },
             end_time: {
                 type: "string",
                 format: "date-time",
-                description: "ISO 8601 UTC
+                description: "End time: ISO 8601 UTC or relative (e.g., 'now', '1h ago')",
             },
             app_name: {
                 type: "string",
@@ -421,14 +436,28 @@ Screenpipe captures four types of data:
 - **Get keyboard input**: \`{"content_type": "input"}\`
 - **Get audio only**: \`{"content_type": "audio"}\`
 
+## Common User Requests → Correct Tool Choice
+| User says | Use this tool | Key params |
+|-----------|--------------|------------|
+| "summarize my meeting/call" | search-content | content_type:"audio", NO q param, start_time |
+| "what did they/I say about X" | search-content | content_type:"audio", NO q param (scan results manually) |
+| "how long on X" / "which apps" / "time spent" | activity-summary | start_time, end_time |
+| "what was I doing" | activity-summary | start_time, end_time (then drill into search-content) |
+| "what was I reading/looking at" | search-content | content_type:"all", start_time |
+
+## Behavior Rules
+- Act immediately on clear requests. NEVER ask "what time range?" or "which content type?" when the intent is obvious.
+- If search returns empty, silently retry with wider time range or fewer filters. Do NOT ask the user what to change.
+- For meetings: ALWAYS use content_type:"audio" and do NOT use the q param. Transcriptions are noisy — q filters too aggressively and misses relevant content.
+
 ## search-content
 | Parameter | Description | Default |
 |-----------|-------------|---------|
 | q | Search query | (none - returns all) |
 | content_type | all/ocr/audio/input/accessibility | all |
 | limit | Max results | 10 |
-| start_time | ISO 8601 UTC | (no filter) |
-| end_time | ISO 8601 UTC | (no filter) |
+| start_time | ISO 8601 UTC or relative (e.g. '16h ago') | (no filter) |
+| end_time | ISO 8601 UTC or relative (e.g. 'now') | (no filter) |
 | app_name | Filter by app | (no filter) |
 | include_frames | Include screenshots | false |
 
@@ -446,6 +475,19 @@ Screenpipe captures four types of data:
 4. **Fetch frame-context** for URLs and accessibility tree of specific frames
 5. **Screenshots** (include_frames=true) only when text isn't enough
 
+## Chat History
+Previous screenpipe chat conversations are stored as individual JSON files in ~/.screenpipe/chats/{conversation-id}.json
+Each file contains: id, title, messages[], createdAt, updatedAt. You can read these files to reference or search previous conversations.
+
+## Speaker Management
+screenpipe auto-identifies speakers in audio. API endpoints for managing them:
+- \`GET /speakers/unnamed?limit=10\` — list unnamed speakers
+- \`GET /speakers/search?name=John\` — search by name
+- \`POST /speakers/update\` with \`{"id": 5, "name": "John"}\` — rename a speaker
+- \`POST /speakers/merge\` with \`{"speaker_to_keep_id": 1, "speaker_to_merge_id": 2}\` — merge duplicates
+- \`GET /speakers/similar?speaker_id=5\` — find similar speakers for merging
+- \`POST /speakers/reassign\` — reassign audio chunk to a different speaker
+
 ## Tips
 1. Read screenpipe://context first to get current timestamps
 2. Use activity-summary before search-content for broad overview questions
@@ -928,7 +970,12 @@ server.setRequestHandler(types_js_1.CallToolRequestSchema, async (request) => {
     }
     const data = await response.json();
     // Format apps
-    const appsLines = (data.apps || []).map((a) =>
+    const appsLines = (data.apps || []).map((a) => {
+        const timeSpan = a.first_seen && a.last_seen
+            ? `, ${a.first_seen.slice(11, 16)}–${a.last_seen.slice(11, 16)} UTC`
+            : "";
+        return `  ${a.name}: ${a.minutes} min (${a.frame_count} frames${timeSpan})`;
+    });
     // Format audio
     const speakerLines = (data.audio_summary?.speakers || []).map((s) => `  ${s.name}: ${s.segment_count} segments`);
     // Format recent texts
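The activity-summary tool description added above states that minutes are based on active session time, with gaps under 5 minutes between consecutive frames counting as active. A minimal sketch of that idea follows; this is an assumption about the approach, not the actual screenpipe server code.

```typescript
// Sketch (illustrative, not the real implementation): estimate active minutes
// from sorted frame timestamps, treating gaps under 5 minutes as one session.
const GAP_MS = 5 * 60 * 1000;

function activeMinutes(timestamps: string[]): number {
  const ts = timestamps.map((t) => Date.parse(t)).sort((a, b) => a - b);
  let activeMs = 0;
  for (let i = 1; i < ts.length; i++) {
    const gap = ts[i] - ts[i - 1];
    // gaps under 5 minutes count as continuous activity;
    // larger gaps start a new session and add no active time
    if (gap < GAP_MS) activeMs += gap;
  }
  return Math.round(activeMs / 60000);
}
```

This is why the tool description warns against estimating time from raw frame counts: an app left open overnight has many wall-clock hours but few active minutes.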
package/manifest.json
CHANGED

@@ -2,7 +2,7 @@
   "manifest_version": "0.3",
   "name": "screenpipe",
   "display_name": "Screenpipe",
-  "version": "0.8.
+  "version": "0.8.4",
   "description": "Search your screen recordings and audio transcriptions with AI",
   "long_description": "Screenpipe is a 24/7 screen and audio recorder that lets you search everything you've seen or heard. This extension connects Claude to your local screenpipe instance, enabling AI-powered search through your digital memory.",
   "author": {
package/package.json
CHANGED

@@ -1,10 +1,11 @@
 {
   "name": "screenpipe-mcp",
-  "version": "0.8.
+  "version": "0.8.6",
   "description": "MCP server for screenpipe - search your screen recordings and audio transcriptions",
   "main": "dist/index.js",
   "bin": {
-    "screenpipe-mcp": "dist/index.js"
+    "screenpipe-mcp": "dist/index.js",
+    "screenpipe-mcp-http": "dist/http-server.js"
   },
   "scripts": {
     "build": "tsc",
CHANGED
|
@@ -48,7 +48,7 @@ const SCREENPIPE_API = `http://localhost:${port}`;
|
|
|
48
48
|
const server = new Server(
|
|
49
49
|
{
|
|
50
50
|
name: "screenpipe",
|
|
51
|
-
version: "0.8.
|
|
51
|
+
version: "0.8.5",
|
|
52
52
|
},
|
|
53
53
|
{
|
|
54
54
|
capabilities: {
|
|
@@ -68,10 +68,14 @@ const BASE_TOOLS: Tool[] = [
|
|
|
68
68
|
"Returns timestamped results with app context. " +
|
|
69
69
|
"Call with no parameters to get recent activity. " +
|
|
70
70
|
"Use the 'screenpipe://context' resource for current time when building time-based queries.\n\n" +
|
|
71
|
+
"WHEN TO USE WHICH content_type:\n" +
|
|
72
|
+
"- For meetings/calls/conversations: content_type='audio', do NOT use q param (transcriptions are noisy, q filters too aggressively)\n" +
|
|
73
|
+
"- For screen text/reading: content_type='all' or 'accessibility'\n" +
|
|
74
|
+
"- For time spent/app usage questions: use activity-summary tool instead (this tool returns content, not time stats)\n\n" +
|
|
71
75
|
"SEARCH STRATEGY: First search with ONLY time params (start_time/end_time) — no q, no app_name, no content_type. " +
|
|
72
76
|
"This gives ground truth of what's recorded. Scan results to find correct app_name values, then narrow with filters using exact observed values. " +
|
|
73
|
-
"App names are case-sensitive
|
|
74
|
-
"The q param searches captured text
|
|
77
|
+
"App names are case-sensitive (e.g. 'Discord' vs 'Discord.exe'). " +
|
|
78
|
+
"The q param searches captured text, NOT app names. NEVER report 'no data' after one filtered search — verify with unfiltered time-only search first.\n\n" +
|
|
75
79
|
"DEEP LINKS: When referencing specific moments, create clickable links using IDs from search results:\n" +
|
|
76
80
|
"- OCR results (PREFERRED): [10:30 AM — Chrome](screenpipe://frame/12345) — use content.frame_id from the result\n" +
|
|
77
81
|
"- Audio results: [meeting at 3pm](screenpipe://timeline?timestamp=2024-01-15T15:00:00Z) — use exact timestamp from result\n" +
|
|
@@ -85,12 +89,12 @@ const BASE_TOOLS: Tool[] = [
|
|
|
85
89
|
properties: {
|
|
86
90
|
q: {
|
|
87
91
|
type: "string",
|
|
88
|
-
description: "Search query. Optional - omit to return all
|
|
92
|
+
description: "Search query (full-text search on captured text). Optional - omit to return all content in time range. IMPORTANT: Do NOT use q for audio/meeting searches — transcriptions are noisy and q filters too aggressively. Only use q when searching for specific text the user saw on screen.",
|
|
89
93
|
},
|
|
90
94
|
content_type: {
|
|
91
95
|
type: "string",
|
|
92
96
|
enum: ["all", "ocr", "audio", "input", "accessibility"],
|
|
93
|
-
description: "Content type filter: '
|
|
97
|
+
description: "Content type filter: 'audio' (transcriptions — use for meetings/calls/conversations), 'accessibility' (accessibility tree text, preferred for screen content), 'ocr' (screen text via OCR, legacy fallback), 'input' (clicks, keystrokes, clipboard, app switches), 'all'. Default: 'all'. For meeting/call queries, ALWAYS use 'audio'.",
|
|
94
98
|
default: "all",
|
|
95
99
|
},
|
|
96
100
|
limit: {
|
|
@@ -106,12 +110,12 @@ const BASE_TOOLS: Tool[] = [
|
|
|
106
110
|
start_time: {
|
|
107
111
|
type: "string",
|
|
108
112
|
format: "date-time",
|
|
109
|
-
description: "ISO 8601 UTC
|
|
113
|
+
description: "Start time: ISO 8601 UTC (e.g., 2024-01-15T10:00:00Z) or relative (e.g., '16h ago', '2d ago', 'now')",
|
|
110
114
|
},
|
|
111
115
|
end_time: {
|
|
112
116
|
type: "string",
|
|
113
117
|
format: "date-time",
|
|
114
|
-
description: "ISO 8601 UTC
|
|
118
|
+
description: "End time: ISO 8601 UTC (e.g., 2024-01-15T18:00:00Z) or relative (e.g., 'now', '1h ago')",
|
|
115
119
|
},
|
|
116
120
|
app_name: {
|
|
117
121
|
type: "string",
|
|
@@ -142,6 +146,10 @@ const BASE_TOOLS: Tool[] = [
|
|
|
142
146
|
type: "string",
|
|
143
147
|
description: "Filter audio by speaker name (case-insensitive partial match)",
|
|
144
148
|
},
|
|
149
|
+
max_content_length: {
|
|
150
|
+
type: "integer",
|
|
151
|
+
description: "Truncate each result's text/transcription to this many characters using middle-truncation (keeps first half + last half). Useful for limiting token usage with small-context models.",
|
|
152
|
+
},
|
|
145
153
|
},
|
|
146
154
|
},
|
|
147
155
|
},
|
|
@@ -150,7 +158,7 @@ const BASE_TOOLS: Tool[] = [
|
|
|
150
158
|
description:
|
|
151
159
|
"Export a video of screen recordings for a specific time range. " +
|
|
152
160
|
"Creates an MP4 video from the recorded frames between the start and end times.\n\n" +
|
|
153
|
-
"IMPORTANT: Use ISO 8601 UTC timestamps (e.g., 2024-01-15T10:00:00Z)\n\n" +
|
|
161
|
+
"IMPORTANT: Use ISO 8601 UTC timestamps (e.g., 2024-01-15T10:00:00Z) or relative times (e.g., '16h ago', 'now')\n\n" +
|
|
154
162
|
"EXAMPLES:\n" +
|
|
155
163
|
"- Last 30 minutes: Calculate timestamps from current time\n" +
|
|
156
164
|
"- Specific meeting: Use the meeting's start and end times in UTC",
|
|
@@ -165,13 +173,13 @@ const BASE_TOOLS: Tool[] = [
|
|
|
165
173
|
type: "string",
|
|
166
174
|
format: "date-time",
|
|
167
175
|
description:
|
|
168
|
-
"Start time
|
|
176
|
+
"Start time: ISO 8601 UTC (e.g., '2024-01-15T10:00:00Z') or relative (e.g., '16h ago', 'now')",
|
|
169
177
|
},
|
|
170
178
|
end_time: {
|
|
171
179
|
type: "string",
|
|
172
180
|
format: "date-time",
|
|
173
181
|
description:
|
|
174
|
-
"End time
|
|
182
|
+
"End time: ISO 8601 UTC (e.g., '2024-01-15T10:30:00Z') or relative (e.g., 'now', '1h ago')",
|
|
175
183
|
},
|
|
176
184
|
fps: {
|
|
177
185
|
type: "number",
|
|
@@ -199,12 +207,12 @@ const BASE_TOOLS: Tool[] = [
|
|
|
199
207
|
start_time: {
|
|
200
208
|
type: "string",
|
|
201
209
|
format: "date-time",
|
|
202
|
-
description: "ISO 8601 UTC
|
|
210
|
+
description: "Start filter: ISO 8601 UTC (e.g., 2024-01-15T10:00:00Z) or relative (e.g., '16h ago', 'now')",
|
|
203
211
|
},
|
|
204
212
|
end_time: {
|
|
205
213
|
type: "string",
|
|
206
214
|
format: "date-time",
|
|
207
|
-
description: "ISO 8601 UTC
|
|
215
|
+
description: "End filter: ISO 8601 UTC (e.g., 2024-01-15T18:00:00Z) or relative (e.g., 'now', '1h ago')",
|
|
208
216
|
},
|
|
209
217
|
limit: {
|
|
210
218
|
type: "integer",
|
|
@@ -223,9 +231,16 @@ const BASE_TOOLS: Tool[] = [
|
|
|
223
231
|
name: "activity-summary",
|
|
224
232
|
description:
|
|
225
233
|
"Get a lightweight compressed activity overview for a time range (~200-500 tokens). " +
|
|
226
|
-
"Returns app usage (name, frame count, minutes), recent accessibility texts, and audio speaker summary. " +
|
|
227
|
-
"
|
|
228
|
-
"
|
|
234
|
+
"Returns app usage (name, frame count, active minutes, first/last seen), recent accessibility texts, and audio speaker summary. " +
|
|
235
|
+
"Minutes are based on active session time (consecutive frames with gaps < 5min count as active). " +
|
|
236
|
+
"first_seen/last_seen show the wall-clock span per app.\n\n" +
|
|
237
|
+
"USE THIS TOOL (not search-content or raw SQL) for:\n" +
|
|
238
|
+
"- 'how long did I spend on X?' → active_minutes per app\n" +
|
|
239
|
+
"- 'which apps did I use today?' → app list sorted by active_minutes\n" +
|
|
240
|
+
"- 'what was I doing?' → broad overview before drilling deeper\n" +
|
|
241
|
+
"- Any time-spent or app-usage question\n\n" +
|
|
242
|
+
"WARNING: Do NOT estimate time from raw frame counts or SQL queries — those are inaccurate. " +
|
|
243
|
+
"This endpoint calculates actual active session time correctly.",
|
|
229
244
|
annotations: {
|
|
230
245
|
title: "Activity Summary",
|
|
231
246
|
readOnlyHint: true,
|
|
@@ -236,12 +251,12 @@ const BASE_TOOLS: Tool[] = [
|
|
|
236
251
|
start_time: {
|
|
237
252
|
type: "string",
|
|
238
253
|
format: "date-time",
|
|
239
|
-
description: "Start of time range
|
|
254
|
+
description: "Start of time range: ISO 8601 UTC (e.g., 2024-01-15T10:00:00Z) or relative (e.g., '16h ago', 'now')",
|
|
240
255
|
},
|
|
241
256
|
end_time: {
|
|
242
257
|
type: "string",
|
|
243
258
|
format: "date-time",
|
|
244
|
-
description: "End of time range
|
|
259
|
+
description: "End of time range: ISO 8601 UTC (e.g., 2024-01-15T18:00:00Z) or relative (e.g., 'now', '1h ago')",
|
|
245
260
|
},
|
|
246
261
|
app_name: {
|
|
247
262
|
type: "string",
|
|
@@ -286,12 +301,12 @@ const BASE_TOOLS: Tool[] = [
|
|
|
286
301
|
start_time: {
|
|
287
302
|
type: "string",
|
|
288
303
|
format: "date-time",
|
|
289
|
-
description: "ISO 8601 UTC
|
|
304
|
+
description: "Start time: ISO 8601 UTC or relative (e.g., '16h ago', 'now')",
|
|
290
305
|
},
|
|
291
306
|
end_time: {
|
|
292
307
|
type: "string",
|
|
293
308
|
format: "date-time",
|
|
294
|
-
description: "ISO 8601 UTC
|
|
309
|
+
description: "End time: ISO 8601 UTC or relative (e.g., 'now', '1h ago')",
|
|
295
310
|
},
|
|
296
311
|
app_name: {
|
|
297
312
|
type: "string",
|
|
@@ -418,14 +433,28 @@ Screenpipe captures four types of data:
|
|
|
418
433
|
- **Get keyboard input**: \`{"content_type": "input"}\`
|
|
419
434
|
- **Get audio only**: \`{"content_type": "audio"}\`
|
|
420
435
|
|
|
436
|
+
## Common User Requests → Correct Tool Choice
|
|
437
|
+
| User says | Use this tool | Key params |
|
|
438
|
+
|-----------|--------------|------------|
|
|
439
|
+
| "summarize my meeting/call" | search-content | content_type:"audio", NO q param, start_time |
|
|
440
|
+
| "what did they/I say about X" | search-content | content_type:"audio", NO q param (scan results manually) |
|
|
441
|
+
| "how long on X" / "which apps" / "time spent" | activity-summary | start_time, end_time |
|
|
442
|
+
| "what was I doing" | activity-summary | start_time, end_time (then drill into search-content) |
|
|
443
|
+
| "what was I reading/looking at" | search-content | content_type:"all", start_time |
|
|
444
|
+
|
|
445
|
+
## Behavior Rules
|
|
446
|
+
- Act immediately on clear requests. NEVER ask "what time range?" or "which content type?" when the intent is obvious.
|
|
447
|
+
- If search returns empty, silently retry with wider time range or fewer filters. Do NOT ask the user what to change.
|
|
448
|
+
- For meetings: ALWAYS use content_type:"audio" and do NOT use the q param. Transcriptions are noisy — q filters too aggressively and misses relevant content.
|
|
449
|
+
|
|
421
450
|
## search-content
|
|
422
451
|
| Parameter | Description | Default |
|
|
423
452
|
|-----------|-------------|---------|
|
|
424
453
|
| q | Search query | (none - returns all) |
|
|
425
454
|
| content_type | all/ocr/audio/input/accessibility | all |
|
|
426
455
|
| limit | Max results | 10 |
|
|
427
|
-
| start_time | ISO 8601 UTC | (no filter) |
|
|
428
|
-
| end_time | ISO 8601 UTC | (no filter) |
|
|
456
|
+
| start_time | ISO 8601 UTC or relative (e.g. '16h ago') | (no filter) |
|
|
457
|
+
| end_time | ISO 8601 UTC or relative (e.g. 'now') | (no filter) |
|
|
429
458
|
| app_name | Filter by app | (no filter) |
|
|
430
459
|
| include_frames | Include screenshots | false |
|
|
431
460
|
|
|
@@ -443,6 +472,19 @@ Screenpipe captures four types of data:
|
|
|
443
472
|
4. **Fetch frame-context** for URLs and accessibility tree of specific frames
|
|
444
473
|
5. **Screenshots** (include_frames=true) only when text isn't enough
|
|
445
474
|
|
|
475
|
+
## Chat History
|
|
476
|
+
Previous screenpipe chat conversations are stored as individual JSON files in ~/.screenpipe/chats/{conversation-id}.json
|
|
477
|
+
Each file contains: id, title, messages[], createdAt, updatedAt. You can read these files to reference or search previous conversations.
|
|
478
|
+
|
|
479
|
+
## Speaker Management
|
|
480
|
+
screenpipe auto-identifies speakers in audio. API endpoints for managing them:
|
|
481
|
+
- \`GET /speakers/unnamed?limit=10\` — list unnamed speakers
|
|
482
|
+
- \`GET /speakers/search?name=John\` — search by name
|
|
483
|
+
- \`POST /speakers/update\` with \`{"id": 5, "name": "John"}\` — rename a speaker
|
|
484
|
+
- \`POST /speakers/merge\` with \`{"speaker_to_keep_id": 1, "speaker_to_merge_id": 2}\` — merge duplicates
|
|
485
|
+
- \`GET /speakers/similar?speaker_id=5\` — find similar speakers for merging
|
|
486
|
+
- \`POST /speakers/reassign\` — reassign audio chunk to different speaker
|
|
487
|
+
|
|
446
488
|
## Tips
|
|
447
489
|
1. Read screenpipe://context first to get current timestamps
|
|
448
490
|
2. Use activity-summary before search-content for broad overview questions
|
|
@@ -993,8 +1035,12 @@ server.setRequestHandler(CallToolRequestSchema, async (request) => {
|
|
|
993
1035
|
|
|
994
1036
|
// Format apps
|
|
995
1037
|
const appsLines = (data.apps || []).map(
|
|
996
|
-
(a: { name: string; frame_count: number; minutes: number }) =>
|
|
997
|
-
|
|
1038
|
+
(a: { name: string; frame_count: number; minutes: number; first_seen?: string; last_seen?: string }) => {
|
|
1039
|
+
const timeSpan = a.first_seen && a.last_seen
|
|
1040
|
+
? `, ${a.first_seen.slice(11, 16)}–${a.last_seen.slice(11, 16)} UTC`
|
|
1041
|
+
: "";
|
|
1042
|
+
return ` ${a.name}: ${a.minutes} min (${a.frame_count} frames${timeSpan})`;
|
|
1043
|
+
}
|
|
998
1044
|
);
|
|
999
1045
|
|
|
1000
1046
|
// Format audio
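The new `max_content_length` parameter is documented as middle-truncation that keeps the first and last halves of each result's text. A hedged sketch of such a helper follows; it is illustrative only, and the real implementation may differ in details such as how the ellipsis is counted.

```typescript
// Sketch (assumed behavior, not the actual screenpipe code): middle-truncate
// text to roughly maxLen characters, keeping the first and last halves and
// joining them with an ellipsis.
function middleTruncate(text: string, maxLen: number): string {
  if (text.length <= maxLen) return text; // short enough, leave untouched
  const half = Math.floor(maxLen / 2);
  // keep the first `half` chars and the last `maxLen - half` chars
  return text.slice(0, half) + "…" + text.slice(text.length - (maxLen - half));
}
```

Keeping both ends matters for transcriptions, where the opening and closing of a segment often carry the most context.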
|