screenpipe-mcp 0.8.4 → 0.8.6

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -28,7 +28,47 @@ The easiest way to use screenpipe-mcp is with npx. Edit your Claude Desktop conf
  }
  ```
 
- ### Option 2: From Source
+ ### Option 2: HTTP Server (Remote / Network Access)
+
+ The MCP server can run over HTTP using the [Streamable HTTP transport](https://modelcontextprotocol.io/specification/2025-03-26/basic/transports#streamable-http), allowing remote MCP clients to connect over the network instead of stdio. This is ideal when your AI assistant (e.g., OpenClaw) runs on a different machine than screenpipe.
+
+ ```bash
+ # from npm
+ npx screenpipe-mcp-http --port 3031
+
+ # or from source
+ npm run start:http -- --port 3031
+ ```
+
+ The server exposes:
+ - **MCP endpoint**: `http://localhost:3031/mcp` — Streamable HTTP transport (POST for requests, GET for SSE stream)
+ - **Health check**: `http://localhost:3031/health`
+
+ **Options:**
+ | Flag | Description | Default |
+ |------|-------------|---------|
+ | `--port` | Port for the MCP HTTP server | 3031 |
+ | `--screenpipe-port` | Port where the screenpipe API is running | 3030 |
+
+ **Connecting a remote MCP client:**
+
+ Point any MCP client that supports HTTP transport at the `/mcp` endpoint:
+
+ ```json
+ {
+   "mcpServers": {
+     "screenpipe": {
+       "url": "http://<your-ip>:3031/mcp"
+     }
+   }
+ }
+ ```
+
+ If your machines are on different networks, expose port 3031 via Tailscale, an SSH tunnel, or similar — see the [OpenClaw integration guide](https://docs.screenpi.pe/openclaw) for detailed examples.
+
+ > **Note:** The HTTP server currently exposes `search_content` only. The stdio server has the full tool set (export-video, list-meetings, activity-summary, search-elements, frame-context). We're working on bringing HTTP to full parity.
+
+ ### Option 3: From Source
 
  Clone and build from source:
 
@@ -62,6 +102,13 @@ Test with MCP Inspector:
  npx @modelcontextprotocol/inspector npx screenpipe-mcp
  ```
 
+ ## Transport Modes
+
+ | Mode | Command | Use Case |
+ |------|---------|----------|
+ | **stdio** (default) | `npx screenpipe-mcp` | Claude Desktop, local MCP clients |
+ | **HTTP** | `npx screenpipe-mcp-http` | Remote clients, network access, OpenClaw on a VPS |
+
  ## Available Tools
 
  ### search-content
@@ -79,6 +126,23 @@ Export screen recordings as video files:
  - Specify time range with start/end times
  - Configurable FPS for output video
 
+ ### activity-summary
+ Get a lightweight, compressed activity overview for a time range:
+ - App usage with active minutes and frame counts
+ - Recent accessibility texts
+ - Audio speaker summary
+
+ ### list-meetings
+ List detected meetings with duration, app, and attendees.
+
+ ### search-elements
+ Search structured UI elements (accessibility tree nodes and OCR text blocks):
+ - Filter by source, role, app, time range
+ - Much lighter than search-content for targeted UI lookups
+
+ ### frame-context
+ Get accessibility text, parsed tree nodes, and extracted URLs for a specific frame.
+
  ## Example Queries in Claude
 
  - "Search for any mentions of 'rust' in my screen recordings"
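The remote-client config above only names the `/mcp` endpoint; on the wire, a Streamable HTTP client speaks plain JSON-RPC 2.0 over POST. A minimal sketch of such a request body (the payload shape follows the MCP spec linked in the README; the tool name mirrors the README's `search-content` heading, though the HTTP note spells it `search_content` — check your server's `tools/list`; the actual `fetch` call is commented out so the sketch runs offline):

```javascript
// Sketch: what a Streamable HTTP client POSTs to the /mcp endpoint.
// Assumes a local server on the README's default port 3031.
const endpoint = "http://localhost:3031/mcp";

// JSON-RPC 2.0 tools/call requesting audio-only results, following the
// README's guidance for meeting/call queries (content_type "audio", no q).
const body = {
  jsonrpc: "2.0",
  id: 1,
  method: "tools/call",
  params: {
    name: "search-content",
    arguments: { content_type: "audio", limit: 10 },
  },
};

// To actually send it (requires the server to be running):
// const res = await fetch(endpoint, {
//   method: "POST",
//   headers: {
//     "Content-Type": "application/json",
//     // Streamable HTTP clients must accept both JSON and SSE responses
//     "Accept": "application/json, text/event-stream",
//   },
//   body: JSON.stringify(body),
// });

console.log(JSON.stringify(body));
```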
package/dist/index.js CHANGED
@@ -69,7 +69,7 @@ const SCREENPIPE_API = `http://localhost:${port}`;
  // Initialize server
  const server = new index_js_1.Server({
  name: "screenpipe",
- version: "0.8.3",
+ version: "0.8.5",
  }, {
  capabilities: {
  tools: {},
@@ -85,10 +85,14 @@ const BASE_TOOLS = [
  "Returns timestamped results with app context. " +
  "Call with no parameters to get recent activity. " +
  "Use the 'screenpipe://context' resource for current time when building time-based queries.\n\n" +
+ "WHEN TO USE WHICH content_type:\n" +
+ "- For meetings/calls/conversations: content_type='audio', do NOT use the q param (transcriptions are noisy, q filters too aggressively)\n" +
+ "- For screen text/reading: content_type='all' or 'accessibility'\n" +
+ "- For time-spent/app-usage questions: use the activity-summary tool instead (this tool returns content, not time stats)\n\n" +
  "SEARCH STRATEGY: First search with ONLY time params (start_time/end_time) — no q, no app_name, no content_type. " +
  "This gives ground truth of what's recorded. Scan results to find correct app_name values, then narrow with filters using exact observed values. " +
- "App names are case-sensitive and may differ from user input (e.g. 'Discord' vs 'Discord.exe'). " +
- "The q param searches captured text (accessibility/OCR), NOT app names. NEVER report 'no data' after one filtered search — verify with unfiltered time-only search first.\n\n" +
+ "App names are case-sensitive (e.g. 'Discord' vs 'Discord.exe'). " +
+ "The q param searches captured text, NOT app names. NEVER report 'no data' after one filtered search — verify with an unfiltered time-only search first.\n\n" +
  "DEEP LINKS: When referencing specific moments, create clickable links using IDs from search results:\n" +
  "- OCR results (PREFERRED): [10:30 AM — Chrome](screenpipe://frame/12345) — use content.frame_id from the result\n" +
  "- Audio results: [meeting at 3pm](screenpipe://timeline?timestamp=2024-01-15T15:00:00Z) — use exact timestamp from result\n" +
@@ -102,12 +106,12 @@ const BASE_TOOLS = [
  properties: {
  q: {
  type: "string",
- description: "Search query. Optional - omit to return all recent content.",
+ description: "Search query (full-text search on captured text). Optional - omit to return all content in the time range. IMPORTANT: Do NOT use q for audio/meeting searches — transcriptions are noisy and q filters too aggressively. Only use q when searching for specific text the user saw on screen.",
  },
  content_type: {
  type: "string",
  enum: ["all", "ocr", "audio", "input", "accessibility"],
- description: "Content type filter: 'ocr' (screen text via OCR, legacy fallback), 'audio' (transcriptions), 'input' (clicks, keystrokes, clipboard, app switches), 'accessibility' (accessibility tree text, preferred for screen content), 'all'. Default: 'all'.",
+ description: "Content type filter: 'audio' (transcriptions; use for meetings/calls/conversations), 'accessibility' (accessibility tree text, preferred for screen content), 'ocr' (screen text via OCR, legacy fallback), 'input' (clicks, keystrokes, clipboard, app switches), 'all'. Default: 'all'. For meeting/call queries, ALWAYS use 'audio'.",
  default: "all",
  },
  limit: {
@@ -240,9 +244,14 @@ const BASE_TOOLS = [
  description: "Get a lightweight compressed activity overview for a time range (~200-500 tokens). " +
  "Returns app usage (name, frame count, active minutes, first/last seen), recent accessibility texts, and audio speaker summary. " +
  "Minutes are based on active session time (consecutive frames with gaps < 5min count as active). " +
- "first_seen/last_seen show the wall-clock span per app. " +
- "Use this FIRST for broad questions like 'what was I doing?' before drilling into search-content or search-elements. " +
- "Much cheaper than search-content for getting an overview.",
+ "first_seen/last_seen show the wall-clock span per app.\n\n" +
+ "USE THIS TOOL (not search-content or raw SQL) for:\n" +
+ "- 'how long did I spend on X?' → active_minutes per app\n" +
+ "- 'which apps did I use today?' → app list sorted by active_minutes\n" +
+ "- 'what was I doing?' → broad overview before drilling deeper\n" +
+ "- Any time-spent or app-usage question\n\n" +
+ "WARNING: Do NOT estimate time from raw frame counts or SQL queries — those are inaccurate. " +
+ "This endpoint calculates actual active session time correctly.",
  annotations: {
  title: "Activity Summary",
  readOnlyHint: true,
@@ -427,6 +436,20 @@ Screenpipe captures four types of data:
  - **Get keyboard input**: \`{"content_type": "input"}\`
  - **Get audio only**: \`{"content_type": "audio"}\`
 
+ ## Common User Requests → Correct Tool Choice
+ | User says | Use this tool | Key params |
+ |-----------|--------------|------------|
+ | "summarize my meeting/call" | search-content | content_type:"audio", NO q param, start_time |
+ | "what did they/I say about X" | search-content | content_type:"audio", NO q param (scan results manually) |
+ | "how long on X" / "which apps" / "time spent" | activity-summary | start_time, end_time |
+ | "what was I doing" | activity-summary | start_time, end_time (then drill into search-content) |
+ | "what was I reading/looking at" | search-content | content_type:"all", start_time |
+
+ ## Behavior Rules
+ - Act immediately on clear requests. NEVER ask "what time range?" or "which content type?" when the intent is obvious.
+ - If a search returns empty, silently retry with a wider time range or fewer filters. Do NOT ask the user what to change.
+ - For meetings: ALWAYS use content_type:"audio" and do NOT use the q param. Transcriptions are noisy — q filters too aggressively and misses relevant content.
+
  ## search-content
  | Parameter | Description | Default |
  |-----------|-------------|---------|
@@ -452,6 +475,19 @@ Screenpipe captures four types of data:
  4. **Fetch frame-context** for URLs and accessibility tree of specific frames
  5. **Screenshots** (include_frames=true) only when text isn't enough
 
+ ## Chat History
+ Previous screenpipe chat conversations are stored as individual JSON files in ~/.screenpipe/chats/{conversation-id}.json
+ Each file contains: id, title, messages[], createdAt, updatedAt. You can read these files to reference or search previous conversations.
+
+ ## Speaker Management
+ screenpipe auto-identifies speakers in audio. API endpoints for managing them:
+ - \`GET /speakers/unnamed?limit=10\` — list unnamed speakers
+ - \`GET /speakers/search?name=John\` — search by name
+ - \`POST /speakers/update\` with \`{"id": 5, "name": "John"}\` — rename a speaker
+ - \`POST /speakers/merge\` with \`{"speaker_to_keep_id": 1, "speaker_to_merge_id": 2}\` — merge duplicates
+ - \`GET /speakers/similar?speaker_id=5\` — find similar speakers for merging
+ - \`POST /speakers/reassign\` — reassign an audio chunk to a different speaker
+
  ## Tips
  1. Read screenpipe://context first to get current timestamps
  2. Use activity-summary before search-content for broad overview questions
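The SEARCH STRATEGY in the tool description above — first an unfiltered time-only query, then narrowing with exact observed values — can be sketched as follows (a minimal illustration: the argument names mirror the tool schema shown above, while the helper function and the 'Discord' placeholder are hypothetical):

```javascript
// Step 1 of the documented strategy: a time-only query (no q, no
// app_name, no content_type) to get ground truth of what's recorded.
function timeOnlyArgs(hoursBack) {
  const end = new Date();
  const start = new Date(end.getTime() - hoursBack * 60 * 60 * 1000);
  return {
    start_time: start.toISOString(),
    end_time: end.toISOString(),
  };
}

// Step 2: narrow with an exact app_name observed in step 1's results.
// 'Discord' is a placeholder — app names are case-sensitive and must be
// copied verbatim from the first search's output.
const broad = timeOnlyArgs(2);
const narrowed = { ...broad, app_name: "Discord", content_type: "accessibility" };

console.log(Object.keys(broad).join(","));   // time params only
console.log(Object.keys(narrowed).join(","));
```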
package/package.json CHANGED
@@ -1,10 +1,11 @@
  {
  "name": "screenpipe-mcp",
- "version": "0.8.4",
+ "version": "0.8.6",
  "description": "MCP server for screenpipe - search your screen recordings and audio transcriptions",
  "main": "dist/index.js",
  "bin": {
- "screenpipe-mcp": "dist/index.js"
+ "screenpipe-mcp": "dist/index.js",
+ "screenpipe-mcp-http": "dist/http-server.js"
  },
  "scripts": {
  "build": "tsc",
package/src/index.ts CHANGED
@@ -48,7 +48,7 @@ const SCREENPIPE_API = `http://localhost:${port}`;
  const server = new Server(
  {
  name: "screenpipe",
- version: "0.8.3",
+ version: "0.8.5",
  },
  {
  capabilities: {
@@ -68,10 +68,14 @@ const BASE_TOOLS: Tool[] = [
  "Returns timestamped results with app context. " +
  "Call with no parameters to get recent activity. " +
  "Use the 'screenpipe://context' resource for current time when building time-based queries.\n\n" +
+ "WHEN TO USE WHICH content_type:\n" +
+ "- For meetings/calls/conversations: content_type='audio', do NOT use the q param (transcriptions are noisy, q filters too aggressively)\n" +
+ "- For screen text/reading: content_type='all' or 'accessibility'\n" +
+ "- For time-spent/app-usage questions: use the activity-summary tool instead (this tool returns content, not time stats)\n\n" +
  "SEARCH STRATEGY: First search with ONLY time params (start_time/end_time) — no q, no app_name, no content_type. " +
  "This gives ground truth of what's recorded. Scan results to find correct app_name values, then narrow with filters using exact observed values. " +
- "App names are case-sensitive and may differ from user input (e.g. 'Discord' vs 'Discord.exe'). " +
- "The q param searches captured text (accessibility/OCR), NOT app names. NEVER report 'no data' after one filtered search — verify with unfiltered time-only search first.\n\n" +
+ "App names are case-sensitive (e.g. 'Discord' vs 'Discord.exe'). " +
+ "The q param searches captured text, NOT app names. NEVER report 'no data' after one filtered search — verify with an unfiltered time-only search first.\n\n" +
  "DEEP LINKS: When referencing specific moments, create clickable links using IDs from search results:\n" +
  "- OCR results (PREFERRED): [10:30 AM — Chrome](screenpipe://frame/12345) — use content.frame_id from the result\n" +
  "- Audio results: [meeting at 3pm](screenpipe://timeline?timestamp=2024-01-15T15:00:00Z) — use exact timestamp from result\n" +
@@ -85,12 +89,12 @@ const BASE_TOOLS: Tool[] = [
  properties: {
  q: {
  type: "string",
- description: "Search query. Optional - omit to return all recent content.",
+ description: "Search query (full-text search on captured text). Optional - omit to return all content in the time range. IMPORTANT: Do NOT use q for audio/meeting searches — transcriptions are noisy and q filters too aggressively. Only use q when searching for specific text the user saw on screen.",
  },
  content_type: {
  type: "string",
  enum: ["all", "ocr", "audio", "input", "accessibility"],
- description: "Content type filter: 'ocr' (screen text via OCR, legacy fallback), 'audio' (transcriptions), 'input' (clicks, keystrokes, clipboard, app switches), 'accessibility' (accessibility tree text, preferred for screen content), 'all'. Default: 'all'.",
+ description: "Content type filter: 'audio' (transcriptions; use for meetings/calls/conversations), 'accessibility' (accessibility tree text, preferred for screen content), 'ocr' (screen text via OCR, legacy fallback), 'input' (clicks, keystrokes, clipboard, app switches), 'all'. Default: 'all'. For meeting/call queries, ALWAYS use 'audio'.",
  default: "all",
  },
  limit: {
@@ -229,9 +233,14 @@ const BASE_TOOLS: Tool[] = [
  "Get a lightweight compressed activity overview for a time range (~200-500 tokens). " +
  "Returns app usage (name, frame count, active minutes, first/last seen), recent accessibility texts, and audio speaker summary. " +
  "Minutes are based on active session time (consecutive frames with gaps < 5min count as active). " +
- "first_seen/last_seen show the wall-clock span per app. " +
- "Use this FIRST for broad questions like 'what was I doing?' before drilling into search-content or search-elements. " +
- "Much cheaper than search-content for getting an overview.",
+ "first_seen/last_seen show the wall-clock span per app.\n\n" +
+ "USE THIS TOOL (not search-content or raw SQL) for:\n" +
+ "- 'how long did I spend on X?' → active_minutes per app\n" +
+ "- 'which apps did I use today?' → app list sorted by active_minutes\n" +
+ "- 'what was I doing?' → broad overview before drilling deeper\n" +
+ "- Any time-spent or app-usage question\n\n" +
+ "WARNING: Do NOT estimate time from raw frame counts or SQL queries — those are inaccurate. " +
+ "This endpoint calculates actual active session time correctly.",
  annotations: {
  title: "Activity Summary",
  readOnlyHint: true,
@@ -424,6 +433,20 @@ Screenpipe captures four types of data:
  - **Get keyboard input**: \`{"content_type": "input"}\`
  - **Get audio only**: \`{"content_type": "audio"}\`
 
+ ## Common User Requests → Correct Tool Choice
+ | User says | Use this tool | Key params |
+ |-----------|--------------|------------|
+ | "summarize my meeting/call" | search-content | content_type:"audio", NO q param, start_time |
+ | "what did they/I say about X" | search-content | content_type:"audio", NO q param (scan results manually) |
+ | "how long on X" / "which apps" / "time spent" | activity-summary | start_time, end_time |
+ | "what was I doing" | activity-summary | start_time, end_time (then drill into search-content) |
+ | "what was I reading/looking at" | search-content | content_type:"all", start_time |
+
+ ## Behavior Rules
+ - Act immediately on clear requests. NEVER ask "what time range?" or "which content type?" when the intent is obvious.
+ - If a search returns empty, silently retry with a wider time range or fewer filters. Do NOT ask the user what to change.
+ - For meetings: ALWAYS use content_type:"audio" and do NOT use the q param. Transcriptions are noisy — q filters too aggressively and misses relevant content.
+
  ## search-content
  | Parameter | Description | Default |
  |-----------|-------------|---------|
@@ -449,6 +472,19 @@ Screenpipe captures four types of data:
  4. **Fetch frame-context** for URLs and accessibility tree of specific frames
  5. **Screenshots** (include_frames=true) only when text isn't enough
 
+ ## Chat History
+ Previous screenpipe chat conversations are stored as individual JSON files in ~/.screenpipe/chats/{conversation-id}.json
+ Each file contains: id, title, messages[], createdAt, updatedAt. You can read these files to reference or search previous conversations.
+
+ ## Speaker Management
+ screenpipe auto-identifies speakers in audio. API endpoints for managing them:
+ - \`GET /speakers/unnamed?limit=10\` — list unnamed speakers
+ - \`GET /speakers/search?name=John\` — search by name
+ - \`POST /speakers/update\` with \`{"id": 5, "name": "John"}\` — rename a speaker
+ - \`POST /speakers/merge\` with \`{"speaker_to_keep_id": 1, "speaker_to_merge_id": 2}\` — merge duplicates
+ - \`GET /speakers/similar?speaker_id=5\` — find similar speakers for merging
+ - \`POST /speakers/reassign\` — reassign an audio chunk to a different speaker
+
  ## Tips
  1. Read screenpipe://context first to get current timestamps
  2. Use activity-summary before search-content for broad overview questions
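The speaker-management endpoints documented in the diff above are plain JSON over HTTP. A small sketch of the two POST request bodies (shapes copied from the endpoint list above; the base URL assumes screenpipe's default API port 3030, and the network calls are commented out so the sketch runs offline):

```javascript
// Base URL of the screenpipe API (default --screenpipe-port is 3030).
const api = "http://localhost:3030";

// Rename speaker 5 — body shape from the /speakers/update entry above.
const rename = { id: 5, name: "John" };

// Merge speaker 2 into speaker 1 — body shape from /speakers/merge.
const merge = { speaker_to_keep_id: 1, speaker_to_merge_id: 2 };

// To actually call the endpoints (requires screenpipe to be running):
// await fetch(`${api}/speakers/update`, {
//   method: "POST",
//   headers: { "Content-Type": "application/json" },
//   body: JSON.stringify(rename),
// });
// await fetch(`${api}/speakers/merge`, { /* same pattern with `merge` */ });

console.log(JSON.stringify({ rename, merge }));
```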