screenpipe-mcp 0.8.3 → 0.8.6

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -28,7 +28,47 @@ The easiest way to use screenpipe-mcp is with npx. Edit your Claude Desktop conf
   }
   ```
 
-### Option 2: From Source
+### Option 2: HTTP Server (Remote / Network Access)
+
+The MCP server can run over HTTP using the [Streamable HTTP transport](https://modelcontextprotocol.io/specification/2025-03-26/basic/transports#streamable-http), allowing remote MCP clients to connect over the network instead of stdio. This is ideal when your AI assistant (e.g., OpenClaw) runs on a different machine than screenpipe.
+
+```bash
+# from npm
+npx screenpipe-mcp-http --port 3031
+
+# or from source
+npm run start:http -- --port 3031
+```
+
+The server exposes:
+- **MCP endpoint**: `http://localhost:3031/mcp` — Streamable HTTP transport (POST for requests, GET for SSE stream)
+- **Health check**: `http://localhost:3031/health`
+
+**Options:**
+| Flag | Description | Default |
+|------|-------------|---------|
+| `--port` | Port for the MCP HTTP server | 3031 |
+| `--screenpipe-port` | Port where screenpipe API is running | 3030 |
+
+**Connecting a remote MCP client:**
+
+Point any MCP client that supports HTTP transport at the `/mcp` endpoint:
+
+```json
+{
+  "mcpServers": {
+    "screenpipe": {
+      "url": "http://<your-ip>:3031/mcp"
+    }
+  }
+}
+```
+
+If your machines are on different networks, expose port 3031 via Tailscale, SSH tunnel, or similar — see the [OpenClaw integration guide](https://docs.screenpi.pe/openclaw) for detailed examples.
+
+> **Note:** The HTTP server currently exposes `search_content` only. The stdio server has the full tool set (export-video, list-meetings, activity-summary, search-elements, frame-context). We're working on bringing HTTP to full parity.
+
+### Option 3: From Source
 
 Clone and build from source:
 
@@ -62,6 +102,13 @@ Test with MCP Inspector:
 npx @modelcontextprotocol/inspector npx screenpipe-mcp
 ```
 
+## Transport Modes
+
+| Mode | Command | Use Case |
+|------|---------|----------|
+| **stdio** (default) | `npx screenpipe-mcp` | Claude Desktop, local MCP clients |
+| **HTTP** | `npx screenpipe-mcp-http` | Remote clients, network access, OpenClaw on VPS |
+
 ## Available Tools
 
 ### search-content
@@ -79,6 +126,23 @@ Export screen recordings as video files:
 - Specify time range with start/end times
 - Configurable FPS for output video
 
+### activity-summary
+Get a lightweight compressed activity overview for a time range:
+- App usage with active minutes and frame counts
+- Recent accessibility texts
+- Audio speaker summary
+
+### list-meetings
+List detected meetings with duration, app, and attendees.
+
+### search-elements
+Search structured UI elements (accessibility tree nodes and OCR text blocks):
+- Filter by source, role, app, time range
+- Much lighter than search-content for targeted UI lookups
+
+### frame-context
+Get accessibility text, parsed tree nodes, and extracted URLs for a specific frame.
+
 ## Example Queries in Claude
 
 - "Search for any mentions of 'rust' in my screen recordings"
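The remote-client JSON block in the README above can also be generated from a host and port; a tiny sketch (the `buildRemoteMcpConfig` helper is hypothetical, not part of the package):

```typescript
// Build the mcpServers config block for a remote screenpipe MCP endpoint.
// Hypothetical helper for illustration; only the JSON shape comes from the README.
function buildRemoteMcpConfig(host: string, port: number = 3031): string {
  const config = {
    mcpServers: {
      screenpipe: {
        url: `http://${host}:${port}/mcp`,
      },
    },
  };
  return JSON.stringify(config, null, 2);
}
```

The returned string can be pasted into (or merged with) the client's MCP config file.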
package/dist/index.js CHANGED
@@ -69,7 +69,7 @@ const SCREENPIPE_API = `http://localhost:${port}`;
 // Initialize server
 const server = new index_js_1.Server({
     name: "screenpipe",
-    version: "0.8.3",
+    version: "0.8.5",
 }, {
     capabilities: {
         tools: {},
@@ -85,10 +85,14 @@ const BASE_TOOLS = [
         "Returns timestamped results with app context. " +
         "Call with no parameters to get recent activity. " +
         "Use the 'screenpipe://context' resource for current time when building time-based queries.\n\n" +
+        "WHEN TO USE WHICH content_type:\n" +
+        "- For meetings/calls/conversations: content_type='audio', do NOT use q param (transcriptions are noisy, q filters too aggressively)\n" +
+        "- For screen text/reading: content_type='all' or 'accessibility'\n" +
+        "- For time spent/app usage questions: use activity-summary tool instead (this tool returns content, not time stats)\n\n" +
         "SEARCH STRATEGY: First search with ONLY time params (start_time/end_time) — no q, no app_name, no content_type. " +
         "This gives ground truth of what's recorded. Scan results to find correct app_name values, then narrow with filters using exact observed values. " +
-        "App names are case-sensitive and may differ from user input (e.g. 'Discord' vs 'Discord.exe'). " +
-        "The q param searches captured text (accessibility/OCR), NOT app names. NEVER report 'no data' after one filtered search — verify with unfiltered time-only search first.\n\n" +
+        "App names are case-sensitive (e.g. 'Discord' vs 'Discord.exe'). " +
+        "The q param searches captured text, NOT app names. NEVER report 'no data' after one filtered search — verify with unfiltered time-only search first.\n\n" +
         "DEEP LINKS: When referencing specific moments, create clickable links using IDs from search results:\n" +
         "- OCR results (PREFERRED): [10:30 AM — Chrome](screenpipe://frame/12345) — use content.frame_id from the result\n" +
         "- Audio results: [meeting at 3pm](screenpipe://timeline?timestamp=2024-01-15T15:00:00Z) — use exact timestamp from result\n" +
@@ -102,12 +106,12 @@ const BASE_TOOLS = [
         properties: {
             q: {
                 type: "string",
-                description: "Search query. Optional - omit to return all recent content.",
+                description: "Search query (full-text search on captured text). Optional - omit to return all content in time range. IMPORTANT: Do NOT use q for audio/meeting searches — transcriptions are noisy and q filters too aggressively. Only use q when searching for specific text the user saw on screen.",
             },
             content_type: {
                 type: "string",
                 enum: ["all", "ocr", "audio", "input", "accessibility"],
-                description: "Content type filter: 'ocr' (screen text via OCR, legacy fallback), 'audio' (transcriptions), 'input' (clicks, keystrokes, clipboard, app switches), 'accessibility' (accessibility tree text, preferred for screen content), 'all'. Default: 'all'.",
+                description: "Content type filter: 'audio' (transcriptions; use for meetings/calls/conversations), 'accessibility' (accessibility tree text, preferred for screen content), 'ocr' (screen text via OCR, legacy fallback), 'input' (clicks, keystrokes, clipboard, app switches), 'all'. Default: 'all'. For meeting/call queries, ALWAYS use 'audio'.",
                 default: "all",
             },
             limit: {
@@ -123,12 +127,12 @@ const BASE_TOOLS = [
             start_time: {
                 type: "string",
                 format: "date-time",
-                description: "ISO 8601 UTC start time (e.g., 2024-01-15T10:00:00Z)",
+                description: "Start time: ISO 8601 UTC (e.g., 2024-01-15T10:00:00Z) or relative (e.g., '16h ago', '2d ago', 'now')",
             },
             end_time: {
                 type: "string",
                 format: "date-time",
-                description: "ISO 8601 UTC end time (e.g., 2024-01-15T18:00:00Z)",
+                description: "End time: ISO 8601 UTC (e.g., 2024-01-15T18:00:00Z) or relative (e.g., 'now', '1h ago')",
             },
             app_name: {
                 type: "string",
@@ -159,6 +163,10 @@ const BASE_TOOLS = [
                 type: "string",
                 description: "Filter audio by speaker name (case-insensitive partial match)",
             },
+            max_content_length: {
+                type: "integer",
+                description: "Truncate each result's text/transcription to this many characters using middle-truncation (keeps first half + last half). Useful for limiting token usage with small-context models.",
+            },
         },
     },
 },
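The new `max_content_length` parameter above is described as middle-truncation (keep the first half plus the last half). The package's truncation code is not in this diff; a sketch of the documented behavior (the helper name and the `[...]` joiner are assumptions):

```typescript
// Middle-truncate text to roughly maxLen kept characters: keep the first
// and last halves and mark the cut. Sketch of the documented behavior only;
// the package's real implementation may differ in details.
function middleTruncate(text: string, maxLen: number): string {
  if (text.length <= maxLen) return text;
  const half = Math.floor(maxLen / 2);
  // first half + marker + last (maxLen - half) characters
  return text.slice(0, half) + " [...] " + text.slice(text.length - (maxLen - half));
}
```

Middle-truncation keeps both the opening and the closing of a transcription, which usually carry the most context.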
@@ -166,7 +174,7 @@ const BASE_TOOLS = [
     name: "export-video",
     description: "Export a video of screen recordings for a specific time range. " +
         "Creates an MP4 video from the recorded frames between the start and end times.\n\n" +
-        "IMPORTANT: Use ISO 8601 UTC timestamps (e.g., 2024-01-15T10:00:00Z)\n\n" +
+        "IMPORTANT: Use ISO 8601 UTC timestamps (e.g., 2024-01-15T10:00:00Z) or relative times (e.g., '16h ago', 'now')\n\n" +
         "EXAMPLES:\n" +
         "- Last 30 minutes: Calculate timestamps from current time\n" +
         "- Specific meeting: Use the meeting's start and end times in UTC",
@@ -180,12 +188,12 @@ const BASE_TOOLS = [
     start_time: {
         type: "string",
         format: "date-time",
-        description: "Start time in ISO 8601 format UTC. MUST include timezone (Z for UTC). Example: '2024-01-15T10:00:00Z'",
+        description: "Start time: ISO 8601 UTC (e.g., '2024-01-15T10:00:00Z') or relative (e.g., '16h ago', 'now')",
     },
     end_time: {
         type: "string",
         format: "date-time",
-        description: "End time in ISO 8601 format UTC. MUST include timezone (Z for UTC). Example: '2024-01-15T10:30:00Z'",
+        description: "End time: ISO 8601 UTC (e.g., '2024-01-15T10:30:00Z') or relative (e.g., 'now', '1h ago')",
     },
     fps: {
         type: "number",
@@ -211,12 +219,12 @@ const BASE_TOOLS = [
     start_time: {
         type: "string",
         format: "date-time",
-        description: "ISO 8601 UTC start filter (e.g., 2024-01-15T10:00:00Z)",
+        description: "Start filter: ISO 8601 UTC (e.g., 2024-01-15T10:00:00Z) or relative (e.g., '16h ago', 'now')",
     },
     end_time: {
         type: "string",
         format: "date-time",
-        description: "ISO 8601 UTC end filter (e.g., 2024-01-15T18:00:00Z)",
+        description: "End filter: ISO 8601 UTC (e.g., 2024-01-15T18:00:00Z) or relative (e.g., 'now', '1h ago')",
     },
     limit: {
         type: "integer",
@@ -234,9 +242,16 @@ const BASE_TOOLS = [
 {
     name: "activity-summary",
     description: "Get a lightweight compressed activity overview for a time range (~200-500 tokens). " +
-        "Returns app usage (name, frame count, minutes), recent accessibility texts, and audio speaker summary. " +
-        "Use this FIRST for broad questions like 'what was I doing?' before drilling into search-content or search-elements. " +
-        "Much cheaper than search-content for getting an overview.",
+        "Returns app usage (name, frame count, active minutes, first/last seen), recent accessibility texts, and audio speaker summary. " +
+        "Minutes are based on active session time (consecutive frames with gaps < 5min count as active). " +
+        "first_seen/last_seen show the wall-clock span per app.\n\n" +
+        "USE THIS TOOL (not search-content or raw SQL) for:\n" +
+        "- 'how long did I spend on X?' → active_minutes per app\n" +
+        "- 'which apps did I use today?' → app list sorted by active_minutes\n" +
+        "- 'what was I doing?' → broad overview before drilling deeper\n" +
+        "- Any time-spent or app-usage question\n\n" +
+        "WARNING: Do NOT estimate time from raw frame counts or SQL queries — those are inaccurate. " +
+        "This endpoint calculates actual active session time correctly.",
     annotations: {
         title: "Activity Summary",
         readOnlyHint: true,
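The new activity-summary description defines minutes as active session time: consecutive frames whose gaps are under 5 minutes count as active. A sketch of one plausible reading of that rule (hypothetical helper, not the endpoint's actual implementation):

```typescript
// Sum "active" time from sorted frame timestamps (ms since epoch):
// consecutive frames with gaps < 5 min belong to one session and the gap
// counts as active; larger gaps are idle time and are skipped.
// Sketch of the documented rule only, not screenpipe's shipped code.
const GAP_MS = 5 * 60 * 1000;

function activeMinutes(timestamps: number[]): number {
  let activeMs = 0;
  for (let i = 1; i < timestamps.length; i++) {
    const gap = timestamps[i] - timestamps[i - 1];
    if (gap < GAP_MS) activeMs += gap; // still within the same session
    // gap >= 5 min: session boundary, idle time not counted
  }
  return Math.round(activeMs / 60_000);
}
```

This is why the description warns against estimating time from raw frame counts: frames captured across a long idle gap would inflate a naive count.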
@@ -247,12 +262,12 @@ const BASE_TOOLS = [
     start_time: {
         type: "string",
         format: "date-time",
-        description: "Start of time range in ISO 8601 UTC (e.g., 2024-01-15T10:00:00Z)",
+        description: "Start of time range: ISO 8601 UTC (e.g., 2024-01-15T10:00:00Z) or relative (e.g., '16h ago', 'now')",
     },
     end_time: {
         type: "string",
         format: "date-time",
-        description: "End of time range in ISO 8601 UTC (e.g., 2024-01-15T18:00:00Z)",
+        description: "End of time range: ISO 8601 UTC (e.g., 2024-01-15T18:00:00Z) or relative (e.g., 'now', '1h ago')",
     },
     app_name: {
         type: "string",
@@ -296,12 +311,12 @@ const BASE_TOOLS = [
     start_time: {
         type: "string",
         format: "date-time",
-        description: "ISO 8601 UTC start time",
+        description: "Start time: ISO 8601 UTC or relative (e.g., '16h ago', 'now')",
     },
     end_time: {
         type: "string",
         format: "date-time",
-        description: "ISO 8601 UTC end time",
+        description: "End time: ISO 8601 UTC or relative (e.g., 'now', '1h ago')",
     },
     app_name: {
         type: "string",
@@ -421,14 +436,28 @@ Screenpipe captures four types of data:
 - **Get keyboard input**: \`{"content_type": "input"}\`
 - **Get audio only**: \`{"content_type": "audio"}\`
 
+## Common User Requests → Correct Tool Choice
+| User says | Use this tool | Key params |
+|-----------|--------------|------------|
+| "summarize my meeting/call" | search-content | content_type:"audio", NO q param, start_time |
+| "what did they/I say about X" | search-content | content_type:"audio", NO q param (scan results manually) |
+| "how long on X" / "which apps" / "time spent" | activity-summary | start_time, end_time |
+| "what was I doing" | activity-summary | start_time, end_time (then drill into search-content) |
+| "what was I reading/looking at" | search-content | content_type:"all", start_time |
+
+## Behavior Rules
+- Act immediately on clear requests. NEVER ask "what time range?" or "which content type?" when the intent is obvious.
+- If search returns empty, silently retry with wider time range or fewer filters. Do NOT ask the user what to change.
+- For meetings: ALWAYS use content_type:"audio" and do NOT use the q param. Transcriptions are noisy — q filters too aggressively and misses relevant content.
+
 ## search-content
 | Parameter | Description | Default |
 |-----------|-------------|---------|
 | q | Search query | (none - returns all) |
 | content_type | all/ocr/audio/input/accessibility | all |
 | limit | Max results | 10 |
-| start_time | ISO 8601 UTC | (no filter) |
-| end_time | ISO 8601 UTC | (no filter) |
+| start_time | ISO 8601 UTC or relative (e.g. '16h ago') | (no filter) |
+| end_time | ISO 8601 UTC or relative (e.g. 'now') | (no filter) |
 | app_name | Filter by app | (no filter) |
 | include_frames | Include screenshots | false |
 
@@ -446,6 +475,19 @@ Screenpipe captures four types of data:
 4. **Fetch frame-context** for URLs and accessibility tree of specific frames
 5. **Screenshots** (include_frames=true) only when text isn't enough
 
+## Chat History
+Previous screenpipe chat conversations are stored as individual JSON files in ~/.screenpipe/chats/{conversation-id}.json
+Each file contains: id, title, messages[], createdAt, updatedAt. You can read these files to reference or search previous conversations.
+
+## Speaker Management
+screenpipe auto-identifies speakers in audio. API endpoints for managing them:
+- \`GET /speakers/unnamed?limit=10\` — list unnamed speakers
+- \`GET /speakers/search?name=John\` — search by name
+- \`POST /speakers/update\` with \`{"id": 5, "name": "John"}\` — rename a speaker
+- \`POST /speakers/merge\` with \`{"speaker_to_keep_id": 1, "speaker_to_merge_id": 2}\` — merge duplicates
+- \`GET /speakers/similar?speaker_id=5\` — find similar speakers for merging
+- \`POST /speakers/reassign\` — reassign audio chunk to different speaker
+
 ## Tips
 1. Read screenpipe://context first to get current timestamps
 2. Use activity-summary before search-content for broad overview questions
@@ -928,7 +970,12 @@ server.setRequestHandler(types_js_1.CallToolRequestSchema, async (request) => {
 }
 const data = await response.json();
 // Format apps
-const appsLines = (data.apps || []).map((a) => `  ${a.name}: ${a.minutes} min (${a.frame_count} frames)`);
+const appsLines = (data.apps || []).map((a) => {
+    const timeSpan = a.first_seen && a.last_seen
+        ? `, ${a.first_seen.slice(11, 16)}–${a.last_seen.slice(11, 16)} UTC`
+        : "";
+    return `  ${a.name}: ${a.minutes} min (${a.frame_count} frames${timeSpan})`;
+});
 // Format audio
 const speakerLines = (data.audio_summary?.speakers || []).map((s) => `  ${s.name}: ${s.segment_count} segments`);
 // Format recent texts
package/manifest.json CHANGED
@@ -2,7 +2,7 @@
   "manifest_version": "0.3",
   "name": "screenpipe",
   "display_name": "Screenpipe",
-  "version": "0.8.3",
+  "version": "0.8.4",
   "description": "Search your screen recordings and audio transcriptions with AI",
   "long_description": "Screenpipe is a 24/7 screen and audio recorder that lets you search everything you've seen or heard. This extension connects Claude to your local screenpipe instance, enabling AI-powered search through your digital memory.",
   "author": {
package/package.json CHANGED
@@ -1,10 +1,11 @@
 {
   "name": "screenpipe-mcp",
-  "version": "0.8.3",
+  "version": "0.8.6",
   "description": "MCP server for screenpipe - search your screen recordings and audio transcriptions",
   "main": "dist/index.js",
   "bin": {
-    "screenpipe-mcp": "dist/index.js"
+    "screenpipe-mcp": "dist/index.js",
+    "screenpipe-mcp-http": "dist/http-server.js"
   },
   "scripts": {
     "build": "tsc",
package/src/index.ts CHANGED
@@ -48,7 +48,7 @@ const SCREENPIPE_API = `http://localhost:${port}`;
 const server = new Server(
   {
     name: "screenpipe",
-    version: "0.8.3",
+    version: "0.8.5",
   },
   {
     capabilities: {
@@ -68,10 +68,14 @@ const BASE_TOOLS: Tool[] = [
       "Returns timestamped results with app context. " +
       "Call with no parameters to get recent activity. " +
       "Use the 'screenpipe://context' resource for current time when building time-based queries.\n\n" +
+      "WHEN TO USE WHICH content_type:\n" +
+      "- For meetings/calls/conversations: content_type='audio', do NOT use q param (transcriptions are noisy, q filters too aggressively)\n" +
+      "- For screen text/reading: content_type='all' or 'accessibility'\n" +
+      "- For time spent/app usage questions: use activity-summary tool instead (this tool returns content, not time stats)\n\n" +
       "SEARCH STRATEGY: First search with ONLY time params (start_time/end_time) — no q, no app_name, no content_type. " +
       "This gives ground truth of what's recorded. Scan results to find correct app_name values, then narrow with filters using exact observed values. " +
-      "App names are case-sensitive and may differ from user input (e.g. 'Discord' vs 'Discord.exe'). " +
-      "The q param searches captured text (accessibility/OCR), NOT app names. NEVER report 'no data' after one filtered search — verify with unfiltered time-only search first.\n\n" +
+      "App names are case-sensitive (e.g. 'Discord' vs 'Discord.exe'). " +
+      "The q param searches captured text, NOT app names. NEVER report 'no data' after one filtered search — verify with unfiltered time-only search first.\n\n" +
       "DEEP LINKS: When referencing specific moments, create clickable links using IDs from search results:\n" +
       "- OCR results (PREFERRED): [10:30 AM — Chrome](screenpipe://frame/12345) — use content.frame_id from the result\n" +
       "- Audio results: [meeting at 3pm](screenpipe://timeline?timestamp=2024-01-15T15:00:00Z) — use exact timestamp from result\n" +
@@ -85,12 +89,12 @@ const BASE_TOOLS: Tool[] = [
     properties: {
       q: {
         type: "string",
-        description: "Search query. Optional - omit to return all recent content.",
+        description: "Search query (full-text search on captured text). Optional - omit to return all content in time range. IMPORTANT: Do NOT use q for audio/meeting searches — transcriptions are noisy and q filters too aggressively. Only use q when searching for specific text the user saw on screen.",
       },
       content_type: {
         type: "string",
         enum: ["all", "ocr", "audio", "input", "accessibility"],
-        description: "Content type filter: 'ocr' (screen text via OCR, legacy fallback), 'audio' (transcriptions), 'input' (clicks, keystrokes, clipboard, app switches), 'accessibility' (accessibility tree text, preferred for screen content), 'all'. Default: 'all'.",
+        description: "Content type filter: 'audio' (transcriptions; use for meetings/calls/conversations), 'accessibility' (accessibility tree text, preferred for screen content), 'ocr' (screen text via OCR, legacy fallback), 'input' (clicks, keystrokes, clipboard, app switches), 'all'. Default: 'all'. For meeting/call queries, ALWAYS use 'audio'.",
         default: "all",
       },
       limit: {
@@ -106,12 +110,12 @@ const BASE_TOOLS: Tool[] = [
       start_time: {
         type: "string",
         format: "date-time",
-        description: "ISO 8601 UTC start time (e.g., 2024-01-15T10:00:00Z)",
+        description: "Start time: ISO 8601 UTC (e.g., 2024-01-15T10:00:00Z) or relative (e.g., '16h ago', '2d ago', 'now')",
       },
       end_time: {
         type: "string",
         format: "date-time",
-        description: "ISO 8601 UTC end time (e.g., 2024-01-15T18:00:00Z)",
+        description: "End time: ISO 8601 UTC (e.g., 2024-01-15T18:00:00Z) or relative (e.g., 'now', '1h ago')",
       },
       app_name: {
         type: "string",
@@ -142,6 +146,10 @@ const BASE_TOOLS: Tool[] = [
         type: "string",
         description: "Filter audio by speaker name (case-insensitive partial match)",
       },
+      max_content_length: {
+        type: "integer",
+        description: "Truncate each result's text/transcription to this many characters using middle-truncation (keeps first half + last half). Useful for limiting token usage with small-context models.",
+      },
     },
   },
 },
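Several descriptions in this release now accept relative times such as '16h ago', '2d ago', and 'now' alongside ISO 8601. The parser itself is not part of this diff; a minimal sketch of how such inputs might be normalized, assuming a grammar that covers only the documented examples (the `resolveTime` helper is hypothetical):

```typescript
// Normalize 'now' / '<n>m ago' / '<n>h ago' / '<n>d ago' to an ISO 8601 UTC
// string; pass anything else (assumed to be ISO 8601 already) through.
// Hypothetical helper; the package's real parser may accept more forms.
function resolveTime(input: string, now: Date = new Date()): string {
  if (input === "now") return now.toISOString();
  const m = input.match(/^(\d+)\s*([mhd])\s+ago$/);
  if (!m) return input; // assume already ISO 8601
  const n = parseInt(m[1], 10);
  const unitMs = { m: 60_000, h: 3_600_000, d: 86_400_000 }[m[2] as "m" | "h" | "d"];
  return new Date(now.getTime() - n * unitMs).toISOString();
}
```

Resolving everything to UTC up front keeps the downstream query path unchanged: the screenpipe API still only ever sees absolute timestamps.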
@@ -150,7 +158,7 @@ const BASE_TOOLS: Tool[] = [
     description:
       "Export a video of screen recordings for a specific time range. " +
       "Creates an MP4 video from the recorded frames between the start and end times.\n\n" +
-      "IMPORTANT: Use ISO 8601 UTC timestamps (e.g., 2024-01-15T10:00:00Z)\n\n" +
+      "IMPORTANT: Use ISO 8601 UTC timestamps (e.g., 2024-01-15T10:00:00Z) or relative times (e.g., '16h ago', 'now')\n\n" +
      "EXAMPLES:\n" +
      "- Last 30 minutes: Calculate timestamps from current time\n" +
      "- Specific meeting: Use the meeting's start and end times in UTC",
@@ -165,13 +173,13 @@ const BASE_TOOLS: Tool[] = [
       type: "string",
       format: "date-time",
       description:
-        "Start time in ISO 8601 format UTC. MUST include timezone (Z for UTC). Example: '2024-01-15T10:00:00Z'",
+        "Start time: ISO 8601 UTC (e.g., '2024-01-15T10:00:00Z') or relative (e.g., '16h ago', 'now')",
     },
     end_time: {
       type: "string",
       format: "date-time",
       description:
-        "End time in ISO 8601 format UTC. MUST include timezone (Z for UTC). Example: '2024-01-15T10:30:00Z'",
+        "End time: ISO 8601 UTC (e.g., '2024-01-15T10:30:00Z') or relative (e.g., 'now', '1h ago')",
     },
     fps: {
       type: "number",
@@ -199,12 +207,12 @@ const BASE_TOOLS: Tool[] = [
     start_time: {
       type: "string",
       format: "date-time",
-      description: "ISO 8601 UTC start filter (e.g., 2024-01-15T10:00:00Z)",
+      description: "Start filter: ISO 8601 UTC (e.g., 2024-01-15T10:00:00Z) or relative (e.g., '16h ago', 'now')",
     },
     end_time: {
       type: "string",
       format: "date-time",
-      description: "ISO 8601 UTC end filter (e.g., 2024-01-15T18:00:00Z)",
+      description: "End filter: ISO 8601 UTC (e.g., 2024-01-15T18:00:00Z) or relative (e.g., 'now', '1h ago')",
     },
     limit: {
       type: "integer",
@@ -223,9 +231,16 @@ const BASE_TOOLS: Tool[] = [
     name: "activity-summary",
     description:
       "Get a lightweight compressed activity overview for a time range (~200-500 tokens). " +
-      "Returns app usage (name, frame count, minutes), recent accessibility texts, and audio speaker summary. " +
-      "Use this FIRST for broad questions like 'what was I doing?' before drilling into search-content or search-elements. " +
-      "Much cheaper than search-content for getting an overview.",
+      "Returns app usage (name, frame count, active minutes, first/last seen), recent accessibility texts, and audio speaker summary. " +
+      "Minutes are based on active session time (consecutive frames with gaps < 5min count as active). " +
+      "first_seen/last_seen show the wall-clock span per app.\n\n" +
+      "USE THIS TOOL (not search-content or raw SQL) for:\n" +
+      "- 'how long did I spend on X?' → active_minutes per app\n" +
+      "- 'which apps did I use today?' → app list sorted by active_minutes\n" +
+      "- 'what was I doing?' → broad overview before drilling deeper\n" +
+      "- Any time-spent or app-usage question\n\n" +
+      "WARNING: Do NOT estimate time from raw frame counts or SQL queries — those are inaccurate. " +
+      "This endpoint calculates actual active session time correctly.",
     annotations: {
       title: "Activity Summary",
       readOnlyHint: true,
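The activity-summary output now carries a per-app first/last-seen span, rendered by the reworked appsLines mapper later in this diff. An adapted standalone copy shows what the formatted lines look like (the two-space indent is assumed; the rest mirrors the diff):

```typescript
// Adapted copy of this release's appsLines mapper, runnable on its own.
interface AppUsage {
  name: string;
  frame_count: number;
  minutes: number;
  first_seen?: string; // ISO 8601 UTC
  last_seen?: string;  // ISO 8601 UTC
}

function formatAppLine(a: AppUsage): string {
  // slice(11, 16) extracts HH:MM from an ISO 8601 UTC timestamp
  const timeSpan = a.first_seen && a.last_seen
    ? `, ${a.first_seen.slice(11, 16)}–${a.last_seen.slice(11, 16)} UTC`
    : "";
  return `  ${a.name}: ${a.minutes} min (${a.frame_count} frames${timeSpan})`;
}
```

Note the graceful fallback: apps without first_seen/last_seen (e.g., from an older screenpipe API) render exactly as before.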
@@ -236,12 +251,12 @@ const BASE_TOOLS: Tool[] = [
     start_time: {
       type: "string",
       format: "date-time",
-      description: "Start of time range in ISO 8601 UTC (e.g., 2024-01-15T10:00:00Z)",
+      description: "Start of time range: ISO 8601 UTC (e.g., 2024-01-15T10:00:00Z) or relative (e.g., '16h ago', 'now')",
     },
     end_time: {
       type: "string",
       format: "date-time",
-      description: "End of time range in ISO 8601 UTC (e.g., 2024-01-15T18:00:00Z)",
+      description: "End of time range: ISO 8601 UTC (e.g., 2024-01-15T18:00:00Z) or relative (e.g., 'now', '1h ago')",
     },
     app_name: {
       type: "string",
@@ -286,12 +301,12 @@ const BASE_TOOLS: Tool[] = [
     start_time: {
       type: "string",
       format: "date-time",
-      description: "ISO 8601 UTC start time",
+      description: "Start time: ISO 8601 UTC or relative (e.g., '16h ago', 'now')",
     },
     end_time: {
       type: "string",
       format: "date-time",
-      description: "ISO 8601 UTC end time",
+      description: "End time: ISO 8601 UTC or relative (e.g., 'now', '1h ago')",
     },
     app_name: {
       type: "string",
@@ -418,14 +433,28 @@ Screenpipe captures four types of data:
 - **Get keyboard input**: \`{"content_type": "input"}\`
 - **Get audio only**: \`{"content_type": "audio"}\`
 
+## Common User Requests → Correct Tool Choice
+| User says | Use this tool | Key params |
+|-----------|--------------|------------|
+| "summarize my meeting/call" | search-content | content_type:"audio", NO q param, start_time |
+| "what did they/I say about X" | search-content | content_type:"audio", NO q param (scan results manually) |
+| "how long on X" / "which apps" / "time spent" | activity-summary | start_time, end_time |
+| "what was I doing" | activity-summary | start_time, end_time (then drill into search-content) |
+| "what was I reading/looking at" | search-content | content_type:"all", start_time |
+
+## Behavior Rules
+- Act immediately on clear requests. NEVER ask "what time range?" or "which content type?" when the intent is obvious.
+- If search returns empty, silently retry with wider time range or fewer filters. Do NOT ask the user what to change.
+- For meetings: ALWAYS use content_type:"audio" and do NOT use the q param. Transcriptions are noisy — q filters too aggressively and misses relevant content.
+
 ## search-content
 | Parameter | Description | Default |
 |-----------|-------------|---------|
 | q | Search query | (none - returns all) |
 | content_type | all/ocr/audio/input/accessibility | all |
 | limit | Max results | 10 |
-| start_time | ISO 8601 UTC | (no filter) |
-| end_time | ISO 8601 UTC | (no filter) |
+| start_time | ISO 8601 UTC or relative (e.g. '16h ago') | (no filter) |
+| end_time | ISO 8601 UTC or relative (e.g. 'now') | (no filter) |
 | app_name | Filter by app | (no filter) |
 | include_frames | Include screenshots | false |
 
@@ -443,6 +472,19 @@ Screenpipe captures four types of data:
 4. **Fetch frame-context** for URLs and accessibility tree of specific frames
 5. **Screenshots** (include_frames=true) only when text isn't enough
 
+## Chat History
+Previous screenpipe chat conversations are stored as individual JSON files in ~/.screenpipe/chats/{conversation-id}.json
+Each file contains: id, title, messages[], createdAt, updatedAt. You can read these files to reference or search previous conversations.
+
+## Speaker Management
+screenpipe auto-identifies speakers in audio. API endpoints for managing them:
+- \`GET /speakers/unnamed?limit=10\` — list unnamed speakers
+- \`GET /speakers/search?name=John\` — search by name
+- \`POST /speakers/update\` with \`{"id": 5, "name": "John"}\` — rename a speaker
+- \`POST /speakers/merge\` with \`{"speaker_to_keep_id": 1, "speaker_to_merge_id": 2}\` — merge duplicates
+- \`GET /speakers/similar?speaker_id=5\` — find similar speakers for merging
+- \`POST /speakers/reassign\` — reassign audio chunk to different speaker
+
 ## Tips
 1. Read screenpipe://context first to get current timestamps
 2. Use activity-summary before search-content for broad overview questions
@@ -993,8 +1035,12 @@ server.setRequestHandler(CallToolRequestSchema, async (request) => {
 
 // Format apps
 const appsLines = (data.apps || []).map(
-  (a: { name: string; frame_count: number; minutes: number }) =>
-    `  ${a.name}: ${a.minutes} min (${a.frame_count} frames)`
+  (a: { name: string; frame_count: number; minutes: number; first_seen?: string; last_seen?: string }) => {
+    const timeSpan = a.first_seen && a.last_seen
+      ? `, ${a.first_seen.slice(11, 16)}–${a.last_seen.slice(11, 16)} UTC`
+      : "";
+    return `  ${a.name}: ${a.minutes} min (${a.frame_count} frames${timeSpan})`;
+  }
 );
 
 // Format audio
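The Speaker Management endpoints listed in the prompt earlier in this diff can be wrapped in a small request builder. A sketch under the assumption that the paths and payloads shown there are accurate; the wrapper itself (names, types, default port 3030) is illustrative, not shipped code:

```typescript
// Build request descriptions for screenpipe's speaker endpoints.
// Endpoint paths and payload keys come from the documented list;
// the wrapper is a hypothetical convenience, not part of the package.
const SCREENPIPE_API = "http://localhost:3030";

interface SpeakerRequest {
  method: "GET" | "POST";
  url: string;
  body: string;
}

function renameSpeakerRequest(id: number, name: string): SpeakerRequest {
  return {
    method: "POST",
    url: `${SCREENPIPE_API}/speakers/update`,
    body: JSON.stringify({ id, name }),
  };
}

function mergeSpeakersRequest(keepId: number, mergeId: number): SpeakerRequest {
  return {
    method: "POST",
    url: `${SCREENPIPE_API}/speakers/merge`,
    body: JSON.stringify({ speaker_to_keep_id: keepId, speaker_to_merge_id: mergeId }),
  };
}
```

Each result can be passed to `fetch(r.url, { method: r.method, body: r.body, headers: { "Content-Type": "application/json" } })` against a running screenpipe instance.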