pikiclaw 0.3.51 → 0.3.53

package/README.md CHANGED
@@ -24,7 +24,7 @@ npx pikiclaw@latest
24
24
  <b>English</b> | <a href="README.zh-CN.md">简体中文</a>
25
25
  </p>
26
26
 
27
- <img src="docs/promo-dashboard-workspace.png" alt="Workspace" width="780">
27
+ <img src="docs/promo-orchestrator.png" alt="Pikiclaw — AI-Native Agent Orchestrator" width="820">
28
28
 
29
29
  </div>
30
30
 
@@ -36,29 +36,14 @@ npx pikiclaw@latest
36
36
 
37
37
  The product is the orchestrator itself. Everything else simply plugs in. **And what's cooler is that this orchestrator is entirely self-bootstrapped**—pikiclaw is what we use to build pikiclaw.
38
38
 
39
- ```text
40
- Terminal Layer Telegram · Feishu · WeChat · Slack · Discord · DingTalk · WeCom · Web Dashboard
41
- \__________________________|__________________________/
42
- v
43
- ┌──────────────────────────────┐
44
- │ pikiclaw orchestrator │
45
- └──────────────────────────────┘
46
- |
47
- ┌────────────────────────────────────────┼────────────────────────────────────────┐
48
- v v v
49
- Agent Layer Model Layer Tool Layer
50
- Claude Code · Codex · Gemini · Hermes Claude · GPT · Gemini · DeepSeek Skills · MCP · CLI
51
- (driver registry · ACP · any agent) Doubao · MiMo · MiniMax · OpenRouter (global × workspace)
52
- · any OpenAI-compatible proxy · …
53
- |
54
- v
55
- Your Machine
56
- ```
39
+ The diagram above maps the four layers we stitch together:
40
+
41
+ - **Entry Points** — Telegram, Feishu, WeChat, Slack, Discord, DingTalk, WeCom, the Web Dashboard, and the local API/CLI are all first-class, co-equal terminals. New ones plug right in.
42
+ - **Pluggable Agents** — Claude Code, Codex, Gemini, and Hermes ship as built-in drivers. Hermes speaks ACP (Agent Client Protocol); the registry accepts any CLI- or ACP-based agent through the same `AgentDriver` contract.
43
+ - **Model Routing** — Frontier (Claude · GPT · Gemini), Chinese domestic (DeepSeek · Doubao · MiMo · MiniMax · Qwen), local runtimes (Ollama, mlx-lm on Apple Silicon), OpenRouter, and any OpenAI-compatible proxy. Providers + Profiles are a first-class vault with a read-only `models.dev` catalog and per-agent environment injection at spawn time.
44
+ - **Tool Mesh** — Skills, MCP servers, CLI tools, web search, and desktop automation, intelligently merged across global × workspace scopes and silently injected into every session.
57
45
 
58
- - **Terminal Layer** — Telegram, Feishu, WeChat, Slack, Discord, DingTalk, WeCom, and the Web Dashboard are all first-class, co-equal entry points. New terminals plug right in.
59
- - **Agent Layer** — We use the official Claude Code, Codex, Gemini, and Hermes CLIs as underlying drivers. Hermes communicates via ACP (Agent Client Protocol); our flexible registry can accommodate virtually any agent.
60
- - **Model Layer** — Access Claude, GPT, Gemini, leading Chinese domestic models (DeepSeek, Doubao, MiMo, MiniMax), plus OpenRouter and any OpenAI-compatible proxy. Providers and Profiles are treated as a first-class layer with their own credential vault, a read-only models.dev catalog, and per-agent environment injection.
61
- - **Tool Layer** — Skills, MCP servers, and CLI tools are intelligently merged across global and workspace scopes, automatically injected into every session.
46
+ Sitting in the middle is the **Pikiclaw Orchestration Core** — the runtime that owns routing, memory, observability, and the bot lifecycle so any terminal can talk to any agent on any model through any tool.
62
47
 
63
48
  ---
64
49
 
@@ -98,7 +83,7 @@ This is the shape that matters: one creator, with a swarm of AI agents at their
98
83
 
99
84
  <p align="center"><img src="docs/promo-demo.gif" alt="Demo: Ask Telegram, agent works locally, result returns to chat" width="780"></p>
100
85
 
101
- > **Web Dashboard** — A multi-pane workspace featuring a session list, conversation threads, tool-use traces, and an input composer (supporting 1, 2, 3, or 6-pane layouts).
86
+ > **Web Dashboard** — Multi-pane workspace with a session list, live conversation threads, tool-use traces, file/image attachments, queued-task chips, and a unified input composer (1 / 2 / 3 / 6 pane layouts, light/dark theme, EN/中文).
102
87
 
103
88
  <p align="center"><img src="docs/promo-dashboard-workspace.png" alt="Web Dashboard workspace" width="780"></p>
104
89
 
@@ -113,13 +98,13 @@ This is the shape that matters: one creator, with a swarm of AI agents at their
113
98
 
114
99
  <img src="docs/promo-dashboard-im.png" alt="IM Access" width="780">
115
100
 
116
- > **Agents** — Manage installed agent CLIs, set your default agent, and configure per-agent models and reasoning effort levels.
101
+ > **Agents** — Manage installed agent CLIs, set your default agent, configure per-agent models and reasoning effort, and bind a Profile to drive an agent on a non-native model.
117
102
 
118
103
  <img src="docs/promo-dashboard-agents.png" alt="Agents" width="780">
119
104
 
120
- > **Models** — A secure Providers + Profiles vault (supporting Claude, GPT, Gemini, DeepSeek, Doubao, MiMo, MiniMax, OpenRouter, and any OpenAI-compatible proxy), validated against the models.dev catalog and injected directly per agent.
105
+ > **Models** — A secure Providers + Profiles vault (Claude · GPT · Gemini · DeepSeek · Doubao · MiMo · MiniMax · Qwen · OpenRouter · any OpenAI-compatible proxy), validated against the read-only `models.dev` catalog and injected per-agent at spawn time. Local backends (Ollama, mlx-lm on Apple Silicon) attach automatically the moment they're detected.
121
106
 
122
- > **Extensions** — Manage global MCP servers, community skills, and built-in automation for headless browsers and macOS desktop (Peekaboo).
107
+ > **Extensions** — Manage global MCP servers, community skills, and built-in automation for headless browsers and macOS desktop (Peekaboo). Add servers via stdio, HTTP, or OAuth 2.1 with Dynamic Client Registration.
123
108
 
124
109
  <img src="docs/promo-dashboard-extensions.png" alt="Extensions" width="780">
125
110
 
@@ -151,8 +136,6 @@ cd your-workspace
151
136
  npx pikiclaw@latest
152
137
  ```
153
138
 
154
- <p align="center"><img src="docs/promo-install.gif" alt="One-command install" width="780"></p>
155
-
156
139
  This instantly opens the **Web Dashboard** at `http://localhost:3939`. From there, you can drive sessions in the browser, connect IM channels, configure agents and models, install MCP servers and skills, and manage system permissions. Everything else is just one click away.
157
140
 
158
141
  <details>
@@ -193,8 +176,9 @@ agent CLI versions).
193
176
  - **Self-Hosted Dev Loop** — pikiclaw was built using pikiclaw. The dev workflow *is* the product: drive the orchestrator from your phone, write code, ship a release, and iterate.
194
177
  - **Walk-Away Coding** — Kick off a massive refactoring task, close your laptop, and monitor/steer it from your phone over Telegram. The agent continues running locally, streaming results back to your chat.
195
178
  - **Multi-Agent Tag Team** — Let Claude Code draft an initial implementation, switch to Codex for an in-depth review, and finally hand it over to Gemini for a fresh perspective. Same files, same continuous session history.
196
- - **Domestic Model Routing** — When latency, cost, or compliance demands a non-frontier model, use a wrapper driver to run Claude Code effortlessly on DeepSeek or Doubao.
179
+ - **Domestic Model Routing** — When latency, cost, or compliance demands a non-frontier model, use a wrapper driver to run Claude Code effortlessly on DeepSeek, Doubao, Qwen, or a fully local Ollama / mlx-lm endpoint.
197
180
  - **The Group Chat Agent** — Drop pikiclaw into a Feishu, Slack, Discord, or WeCom workgroup. The entire team shares one orchestrator, one project workspace, and a unified set of powerful skills.
181
+ - **Codex-Generated Images On Tap** — Ask Codex to draft a poster, a diagram, a UI mock — the image streams back as a real attachment in the chat with a click-to-reveal Image Prompt so you can audit the exact text that went to the model. Iterate by replying, not by re-opening a browser.
198
182
  - **Computer-Use, Controlled by You** — Enable the managed Chrome (Playwright) and macOS desktop (Peekaboo, via Accessibility + ScreenCaptureKit) capabilities. The agent can suddenly `see` the screen, click, type, and manage windows, menus, and the Dock—while you steer it from your phone. Book a meeting, scrape a complex dashboard, run end-to-end tests, or drive any native macOS application.
199
183
  - **Skill-Driven Workflows** — Install community skills (`promote`, `snipe`, `review`, `security-review`, etc.) once, and trigger them instantly from any connected terminal using `/sk_<name>`.
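As a rough illustration of the `/sk_<name>` trigger described above, the sketch below parses a chat command into a skill name, trailing arguments, and a candidate `SKILL.md` path. The function name `parseSkillCommand`, the `SkillInvocation` shape, and the assumption that the skill directory name equals the command suffix are all hypothetical; only the `/sk_<name>` form and the `.pikiclaw/skills/*/SKILL.md` layout come from the README.

```typescript
// Hypothetical sketch: not pikiclaw's actual parser.
interface SkillInvocation {
  name: string;      // e.g. "security-review"
  args: string;      // free text after the command, may be empty
  skillFile: string; // assumed location, derived from the documented glob
}

function parseSkillCommand(input: string): SkillInvocation | null {
  // Accept "/sk_<name>" optionally followed by whitespace and arguments.
  const match = /^\/sk_([A-Za-z0-9_-]+)(?:\s+(.*))?$/.exec(input.trim());
  if (!match) return null;
  const name = match[1] ?? "";
  if (name === "") return null;
  const args = (match[2] ?? "").trim();
  return { name, args, skillFile: `.pikiclaw/skills/${name}/SKILL.md` };
}
```

Anything that does not start with `/sk_` (for example `/skills`) returns `null`, so a caller can fall through to its other command handlers.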
200
184
 
@@ -204,45 +188,46 @@ agent CLI versions).
204
188
 
205
189
  ### Terminal Layer
206
190
 
207
- - **Seven Native IM Channels** — Telegram, Feishu, WeChat (personal), Slack, Discord, DingTalk, and WeCom. Run one, several, or all of them simultaneously. Each channel is strictly isolated at the code level; adding a new one (like WhatsApp or a mobile app) requires zero changes to the others.
208
- - **Web Dashboard** — Drive sessions directly from your browser with the exact same conversational flow, tool-use tracing, and streaming experience as IM. Enjoy a multi-pane workspace (1/2/3/6 panes), light/dark themes, and full EN/中文 i18n support.
209
- - **Live Streaming Preview** — Watch messages update in place as the agent thinks. Long text auto-splits beautifully; images and files stream back to the UI in real time.
191
+ - **Seven Native IM Channels** — Telegram, Feishu, WeChat (personal), Slack, Discord, DingTalk, and WeCom. Run one, several, or all of them simultaneously. Each channel is strictly isolated at the code level; adding a new one (WhatsApp, a mobile app, voice) requires zero changes to the others.
192
+ - **Web Dashboard** — Drive sessions directly from your browser with the exact same conversational flow, tool-use tracing, and streaming experience as IM. Multi-pane workspace (1/2/3/6 panes), light/dark themes, and full EN/中文 i18n.
193
+ - **Live Streaming Preview** — Watch messages update in place as the agent thinks. Long text auto-splits cleanly; thinking traces, tool calls, and plans surface as collapsible cards; images and files stream back to the UI in real time.
194
+ - **Queue & Steer From One Composer** — Send while the stream is still running. New messages line up as queued chips you can preview, recall, or hand-steer; one click stops the active turn AND drains the rest of the queue.
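The queue-and-steer behavior in the last bullet can be pictured with a small in-memory model. This is a minimal sketch under stated assumptions: `ComposerQueue` and `QueuedMessage` are invented names, "steer" is modeled only as moving a queued message to the front (interrupting the running turn is left out), and nothing here reflects pikiclaw's real implementation.

```typescript
// Hypothetical sketch of a composer queue; names are illustrative only.
interface QueuedMessage {
  id: number;
  text: string;
}

class ComposerQueue {
  private nextId = 1;
  private active: QueuedMessage | null = null;
  private pending: QueuedMessage[] = [];

  /** Send a message: it runs immediately if idle, otherwise it is queued. */
  send(text: string): QueuedMessage {
    const msg: QueuedMessage = { id: this.nextId++, text };
    if (this.active === null) this.active = msg;
    else this.pending.push(msg);
    return msg;
  }

  /** Recall a queued message before it runs. */
  recall(id: number): boolean {
    const before = this.pending.length;
    this.pending = this.pending.filter((m) => m.id !== id);
    return this.pending.length < before;
  }

  /** Steer: move a queued message to the front of the line. */
  steer(id: number): boolean {
    const msg = this.pending.find((m) => m.id === id);
    if (!msg) return false;
    this.pending = [msg, ...this.pending.filter((m) => m !== msg)];
    return true;
  }

  /** The active turn finished: promote the next queued message, if any. */
  finishActive(): QueuedMessage | null {
    this.active = this.pending.shift() ?? null;
    return this.active;
  }

  /** Stop the active turn and drain the queue; returns how many were dropped. */
  stopAll(): number {
    const dropped = this.pending.length + (this.active ? 1 : 0);
    this.active = null;
    this.pending = [];
    return dropped;
  }

  queuedTexts(): string[] {
    return this.pending.map((m) => m.text);
  }
}
```

Keeping the active turn separate from the pending list makes the "one click stops everything" case a single state reset rather than a loop of cancellations.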
210
195
 
211
196
  ### Agent Layer
212
197
 
213
198
  - **Official CLIs as Drivers** — Powered directly by Claude Code, Codex CLI, Gemini CLI, and Hermes (via ACP). We don't rewrite the agent core—you inherit upstream capabilities and Day-0 updates automatically.
214
199
  - **ACP-Native Architecture** — Hermes integrates natively through the [Agent Client Protocol](https://agentclientprotocol.com), spawning `hermes acp` over JSON-RPC stdio. Any future ACP-compatible agent plugs in the exact same way.
215
200
  - **Pluggable Driver Registry** — The only contract is `src/agent/driver.ts`. New CLI- or ACP-based agents can drop right in alongside our four built-in drivers.
216
- - **Per-Session Agent Switching** — Swap the "brain" on the fly without leaving your workspace.
217
- - **Steer & Interrupt** — Interrupt a heavy running task and force a queued message to the front of the line.
218
- - **Codex Human-in-the-Loop** — When Codex pauses to ask you a question, it forwards the prompt interactively to your IM. Reply directly in the chat, and the task resumes seamlessly.
219
- - **Persistent Goals** — Use `/goal` to set a long-running, session-scoped objective complete with a token budget. Supports pause/resume, and the agent will autonomously self-terminate only when it verifies the goal is complete.
201
+ - **Per-Session Agent Switching** — Swap the "brain" on the fly without leaving your workspace; the same conversation history follows you to the next agent.
202
+ - **Steer & Interrupt** — Interrupt a heavy running task and force a queued message to the front of the line, or stop everything for this session in one click.
203
+ - **Codex Human-in-the-Loop** — When Codex pauses to ask you a question, the prompt is forwarded to your active terminal (IM or Dashboard). Reply inline and the task resumes seamlessly.
204
+ - **Persistent Goals, Routed by Agent** — `/goal <objective>` keeps a session working toward a target until the agent self-audits completion. Codex uses its native `thread/goal/*` RPC with optional `budget=N` tokens and full pause/resume; Claude uses its in-process Stop hook with a Haiku judge and auto-clears when the condition is met; other agents fall back to pikiclaw's portable continuation loop.
205
+ - **Image Generation, Surfaced End-to-End** — Codex's built-in `image_gen` tool (and Claude MCP / Gemini Imagen sources) lands as a real image attachment in the chat — not a wall of base64. The agent's actual `revised_prompt` rides along as a click-to-reveal **Image Prompt** disclosure in the Dashboard, so you can audit *why* the model drew what it drew. A "Generating image…" chip ticks alongside the assistant turn while the call is in flight.
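To make the pluggable driver registry idea concrete, here is a minimal sketch of what a driver contract and registry could look like. The README only states that the contract lives in `src/agent/driver.ts` and that Hermes is spawned as `hermes acp`; the `AgentDriver` fields, the `DriverRegistry` class, and the `echo-agent` entry below are assumptions made for illustration and may not match the real file.

```typescript
// Hypothetical sketch: the real contract in src/agent/driver.ts may differ.
interface AgentDriver {
  readonly id: string;
  readonly transport: "cli" | "acp";
  /** Command line used to spawn the agent process. */
  command(): string[];
}

class DriverRegistry {
  private drivers = new Map<string, AgentDriver>();

  register(driver: AgentDriver): void {
    if (this.drivers.has(driver.id)) {
      throw new Error(`driver already registered: ${driver.id}`);
    }
    this.drivers.set(driver.id, driver);
  }

  get(id: string): AgentDriver {
    const driver = this.drivers.get(id);
    if (!driver) throw new Error(`unknown agent: ${id}`);
    return driver;
  }

  ids(): string[] {
    return Array.from(this.drivers.keys());
  }
}

// Hermes is documented as being spawned via `hermes acp` over JSON-RPC stdio.
const hermesDriver: AgentDriver = {
  id: "hermes",
  transport: "acp",
  command: () => ["hermes", "acp"],
};

// A made-up CLI agent, showing that new drivers only need to satisfy the interface.
const echoAgentDriver: AgentDriver = {
  id: "echo-agent",
  transport: "cli",
  command: () => ["echo-agent"],
};
```

With this shape, per-session agent switching reduces to looking up a different `id` in the registry while the session history stays where it is.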
220
206
 
221
207
  ### Model Layer
222
208
 
223
- - **Frontier + Domestic + Proxies** — Supports the Claude 4 family, GPT-5 / Codex, Gemini, DeepSeek, Doubao, MiMo, MiniMax, OpenRouter, and any custom OpenAI-compatible proxy endpoint.
224
- - **Providers & Profiles Vault** — A first-class data model that securely isolates credentials in `~/.pikiclaw/setting.json`. Browse a read-only models.dev catalog, validate keys with real provider probes, and bind a profile to an agent for automatic environment injection at spawn-time.
225
- - **Per-Session Model & Reasoning Effort** — Switch models or adjust reasoning capabilities dynamically via the Dashboard, `/models`, or `/mode`.
226
- - **Per-Agent Deep Injection** — `resolveAgentInjection(agentId)` forces the active profile's environment variables down at spawn time. This means you can run Claude Code on top of DeepSeek or Doubao without ever touching the upstream client's config.
209
+ - **Frontier + Domestic + Local + Proxies** — Frontier (Claude · GPT-5/Codex · Gemini), Chinese domestic (DeepSeek · Doubao · MiMo · MiniMax · Qwen), local runtimes (Ollama, mlx-lm on Apple Silicon), OpenRouter, and any custom OpenAI-compatible proxy.
210
+ - **Providers & Profiles Vault** — Credentials are isolated in `~/.pikiclaw/setting.json`. Browse a read-only `models.dev` catalog, validate keys with real provider probes, and bind a Profile to an agent for automatic environment injection at spawn time.
211
+ - **Local Models, Zero-Config Attach** — Detected Ollama or mlx-lm backends auto-attach as a Provider — no extra wiring. The Dashboard tile shows status, install hints (brew/pipx), the exact `ollama pull` / `mlx_lm.server` command, and RAM headroom warnings against the host's total memory.
212
+ - **Per-Session Model & Reasoning Effort** — Switch models or reasoning effort live via the Dashboard, `/models`, or `/mode`. Effort levels are per-agent (Claude: low to max; Codex: low to very high; Hermes: minimal to very high).
213
+ - **Per-Agent Deep Injection** — `resolveAgentInjection(agentId)` forces the bound Profile's env vars down at spawn time. Run Claude Code on top of DeepSeek, Doubao, or a local Ollama model without ever editing the upstream client's config.
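The spawn-time injection described in the last bullet might work roughly as follows. This is a sketch under assumptions, not the actual `resolveAgentInjection` from pikiclaw: the `Vault`, `Provider`, and `Profile` shapes are invented, the function takes the vault as an explicit argument, and the `AGENT_*` variable names are placeholders (each real agent CLI reads its own variables).

```typescript
// Hypothetical data model; field names are assumptions for illustration.
interface Provider {
  id: string;
  baseUrl: string;
  apiKey: string;
}

interface Profile {
  id: string;
  providerId: string;
  model: string;
}

interface Vault {
  providers: Provider[];
  profiles: Profile[];
  bindings: Record<string, string>; // agentId -> profileId
}

/** Resolve the env vars a bound Profile would contribute for one agent. */
function resolveInjectionSketch(vault: Vault, agentId: string): Record<string, string> {
  const profileId = vault.bindings[agentId];
  if (!profileId) return {}; // agent runs on its native model
  const profile = vault.profiles.find((p) => p.id === profileId);
  if (!profile) return {};
  const provider = vault.providers.find((p) => p.id === profile.providerId);
  if (!provider) return {};
  // Placeholder variable names; a real driver would map to the CLI's own names.
  return {
    AGENT_BASE_URL: provider.baseUrl,
    AGENT_API_KEY: provider.apiKey,
    AGENT_MODEL: profile.model,
  };
}

/** Build the child-process environment: injected values override inherited ones. */
function buildSpawnEnv(
  base: Record<string, string>,
  vault: Vault,
  agentId: string,
): Record<string, string> {
  return { ...base, ...resolveInjectionSketch(vault, agentId) };
}
```

Because the override happens only in the environment handed to the spawned process, the upstream client's own config files stay untouched, which matches the behavior the bullet claims.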
227
214
 
228
215
  ### Tool Layer
229
216
 
230
- - **Robust Skills System** — Project-specific skills live safely in `.pikiclaw/skills/*/SKILL.md` (and we fully support legacy `.claude/commands/*.md` formats). Install community packages with one click from GitHub (`owner/repo`) or browse our curated packs (like Anthropic Official, Vercel Agent Skills, etc.). Trigger them anywhere with `/skills` and `/sk_<name>`.
231
- - **Massive MCP Server Ecosystem** — Browse the [MCP Registry](https://registry.modelcontextprotocol.io), add custom stdio or HTTP servers, enforce real handshake health-checks, and utilize OAuth 2.1 with Dynamic Client Registration. Our recommended catalog flawlessly covers GitHub, Atlassian, Notion, Linear, Sentry, Cloudflare, Slack, Feishu/Lark, Stripe, Hugging Face, Gamma, Brave Search, Perplexity, Filesystem, SQLite, and PostgreSQL. Furthermore, we ship with two built-in, hyper-powerful computer-use servers: `pikiclaw-browser` (driving Chrome via Playwright) and `peekaboo` (driving the macOS GUI via Peekaboo).
232
- - **Seamless CLI Tool Integration** — Auto-detects versions and authentication states for popular CLIs. We natively support OAuth-web login handoffs for browser-based authentications, routing everything smoothly through the agent's standard tool surface.
233
- - **Session-Scoped MCP Bridge** — Foundational tools like `im_list_files`, `im_send_file`, `im_ask_user`, alongside the managed browser and macOS desktop tools (when enabled), are automatically injected deep into every single session you launch.
234
- - **Two-Tier Merge Resolution** — Tool scopes follow a simple rule: `global < workspace < built-in`. The engine automatically resolves and merges these, applying them silently to every session.
235
-
236
- <p align="center"><img src="docs/promo-dashboard-extensions-add.png" alt="Add MCP server" width="780"></p>
217
+ - **Robust Skills System** — Project skills live in `.pikiclaw/skills/*/SKILL.md` (legacy `.claude/commands/*.md` still works). Install community packs in one click from GitHub (`owner/repo`) or pick from our curated set (Anthropic Official, Vercel Agent Skills, etc.). Trigger anywhere with `/skills` and `/sk_<name>`.
218
+ - **Massive MCP Server Ecosystem** — Browse the [MCP Registry](https://registry.modelcontextprotocol.io), add custom stdio or HTTP servers, enforce real-handshake health checks, and authenticate with OAuth 2.1 + Dynamic Client Registration. The recommended catalog covers GitHub, Atlassian, Notion, Linear, Sentry, Cloudflare, Slack, Feishu/Lark, Stripe, Hugging Face, Gamma, Brave Search, Perplexity, Filesystem, SQLite, and PostgreSQL, plus two built-in computer-use servers we ship ourselves: `pikiclaw-browser` (Chrome via Playwright) and `peekaboo` (macOS GUI via Peekaboo).
219
+ - **Seamless CLI Tool Integration** — Auto-detects versions and authentication state for popular CLIs (gh, brew, npm, uv, …). OAuth-web login handoffs route through the agent's normal tool surface.
220
+ - **Session-Scoped MCP Bridge** — Foundational tools (`im_list_files`, `im_send_file`, `im_ask_user`, `goal_get`, `goal_update`) plus the managed browser and macOS desktop tools (when enabled) are auto-injected into every session you launch.
221
+ - **Three-Way Merge Resolution** — Scope precedence is simple: `global < workspace < built-in`. The engine resolves and merges these silently for every session.
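The precedence rule `global < workspace < built-in` can be sketched as a last-writer-wins merge keyed by tool name. The `ToolEntry` shape and `mergeTools` function are hypothetical; only the ordering of the three scopes is taken from the README.

```typescript
// Hypothetical sketch of scope-precedence merging; not pikiclaw's actual engine.
type Scope = "global" | "workspace" | "built-in";

interface ToolEntry {
  name: string;
  scope: Scope;
  command: string;
}

// Lowest precedence first, so later scopes overwrite earlier ones.
const SCOPE_ORDER: Scope[] = ["global", "workspace", "built-in"];

function mergeTools(entries: ToolEntry[]): ToolEntry[] {
  const merged = new Map<string, ToolEntry>();
  for (const scope of SCOPE_ORDER) {
    for (const entry of entries) {
      if (entry.scope === scope) merged.set(entry.name, entry);
    }
  }
  return Array.from(merged.values());
}
```

Iterating scope by scope, rather than sorting the entries, keeps the rule readable: whatever the input order, a workspace definition replaces a global one with the same name, and a built-in replaces both.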
237
222
 
238
223
  ### Runtime & Developer Experience
239
224
 
240
- - **Dedicated Session Workspaces** — Every session gets its own isolated directory; file attachments and generated assets drop there automatically.
241
- - **Resume, Switch, and Classify** — Flawless multi-turn conversation support with smart session classification (identifying answers, proposals, implementations, or blocked states).
242
- - **Auto-Injected Base Tools** — Core MCP tools like file listing, sending, user prompting, and goal tracking are hard-wired into every stream.
243
- - **Computer-Use (Browser Engine)** — The built-in `pikiclaw-browser` MCP is a hyper-charged wrapper over `@playwright/mcp`. It includes a process-level supervisor and shares an isolated Chrome profile. Log in to your tools once, and reuse those authenticated sessions across all future tasks!
244
- - **Computer-Use (macOS Desktop)** — Enable the `peekaboo` MCP built-in server (macOS only) to unleash the [Peekaboo](https://peekaboo.sh/) framework over Accessibility and ScreenCaptureKit APIs. It exposes a god-mode suite of tools: `see`, `click`, `type`, `scroll`, `window`, `menu`, `app`, and `dock`. Requires explicit OS-level permissions but grants unprecedented control.
245
- - **Hardened for Long Tasks** — Built with sleep prevention, watchdog timers, auto-restarts, daemon modes, and a robust channel supervisor. You can walk away knowing your marathon tasks are protected by an ironclad runtime.
225
+ - **Dedicated Session Workspaces** — Every session gets its own isolated directory; uploaded files and generated assets (including agent-produced images) drop there automatically.
226
+ - **Resume, Switch, and Classify** — Multi-turn conversations resume cleanly. Sessions are auto-classified (answer, proposal, implementation, blocked) and the workspace list sorts by recent activity across all installed agents.
227
+ - **Auto-Injected Base Tools** — `im_*` (file listing, sending, asking the user) and `goal_*` tools are hard-wired into every stream — the agent can hand a file back to your IM or pause to ask a question without you wiring anything up.
228
+ - **Computer-Use (Browser Engine)** — The built-in `pikiclaw-browser` MCP wraps `@playwright/mcp` with a process-level supervisor and a shared, isolated Chrome profile. Log in to your tools once and reuse those authenticated sessions across every future task.
229
+ - **Computer-Use (macOS Desktop)** — Enable the `peekaboo` MCP server (macOS only) to unleash the [Peekaboo](https://peekaboo.sh/) framework over Accessibility and ScreenCaptureKit. Tools include `see`, `click`, `type`, `scroll`, `window`, `menu`, `app`, `dock`, plus the `agent` sub-agent for goal-directed control. Requires Accessibility + Screen Recording permission in System Settings.
230
+ - **Hardened for Long Tasks** — Sleep prevention, watchdog timers, auto-restart, daemon mode, and a channel supervisor. Restart is blocked while tasks are active so a hot reload never kills your marathon job.
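The "restart is blocked while tasks are active" guarantee can be illustrated with a small guard object. `RestartGuard` and its methods are invented for this sketch; the README does not describe how pikiclaw tracks active tasks internally.

```typescript
// Hypothetical sketch of a restart guard; names are illustrative only.
class RestartGuard {
  private activeTasks = new Set<string>();

  begin(taskId: string): void {
    this.activeTasks.add(taskId);
  }

  end(taskId: string): void {
    this.activeTasks.delete(taskId);
  }

  /** A restart is allowed only when no task is running. */
  requestRestart(): { allowed: boolean; blockedBy: string[] } {
    const blockedBy = Array.from(this.activeTasks);
    return { allowed: blockedBy.length === 0, blockedBy };
  }
}
```

Returning the list of blocking task ids, rather than a bare boolean, lets a dashboard explain why a hot reload was refused.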
246
231
 
247
232
  ---
248
233
 
@@ -253,7 +238,7 @@ agent CLI versions).
253
238
  | **Terminal Access** | 7 IM channels + Web + Extensible | Locked inside the IDE | Confined to a Web app | One specific IM app |
254
239
  | **Execution Environment** | Your local machine | Your local machine | Vendor's remote sandbox | Usually vendor servers |
255
240
  | **Agent Flexibility** | Claude Code, Codex, Gemini, Hermes (ACP), etc. | Locked in | Single | Single |
256
- | **Model Freedom** | Frontier models, domestic giants, OpenAI-proxies | Controlled by the platform | Controlled by the vendor | Single, hardcoded |
241
+ | **Model Freedom** | Frontier · Chinese domestic · local (Ollama, mlx-lm) · OpenAI-compatible proxies | Controlled by the platform | Controlled by the vendor | Single, hardcoded |
257
242
  | **Concurrency Power** | **N Agents × N Windows × N Workspaces** | One agent per IDE window | Strictly sequential | Single thread |
258
243
  | **Files & Tools Access** | Your entire local disk, your MCPs, your CLIs | Local project files | Heavily sandboxed | None or extremely limited |
259
244
  | **Add a New Terminal** | Drop in a simple `Channel` class | Impossible | Impossible | Requires a hard fork |
@@ -305,13 +290,7 @@ The shape that truly matters: **You never have to leave your preferred environme
305
290
 
306
291
  ## Roadmap
307
292
 
308
- **Already Shipped:** Hermes driver integration · ACP (Agent Client Protocol) · Secure Provider/Profile vault · Seven native IM channels · Computer-use via Playwright and Peekaboo (macOS).
309
-
310
- - **More ACP Agents** — Ensuring any new ACP-compatible agent can drop in with zero code changes.
311
- - **Broader Terminal Ecosystem** — Adding support for WhatsApp, a dedicated mobile app, and voice interfaces.
312
- - **Deeper Model Wrapping** — Building agent-on-arbitrary-model wrappers to support a wider array of domestic and open-source models seamlessly.
313
- - **Richer Tool Ecosystem** — Releasing official MCP packs, skill templates, and a community marketplace.
314
- - **Cross-Platform Computer-Use** — Extending desktop control drivers beyond macOS to support Windows and Linux.
293
+ - **SupporterAgent**: A high-level meta-agent layered on top of the existing orchestration stack (Terminals × Agents × Models × Tools). It takes a complex objective and centrally owns the full loop: decomposition and planning, scheduling the right sub-agents on the right models with the right tools, watching their streams as they run, and stepping in to correct course when a sub-agent stalls, drifts, or contradicts the plan. The aim is a step-change in how reliably pikiclaw can drive long-horizon, multi-agent work without a human babysitting every turn.
315
294
 
316
295
  ---
317
296
 
package/README.zh-CN.md CHANGED
@@ -24,7 +24,7 @@ npx pikiclaw@latest
24
24
  <a href="README.md">English</a> | <b>简体中文</b>
25
25
  </p>
26
26
 
27
- <img src="docs/promo-dashboard-workspace.png" alt="工作区" width="780">
27
+ <img src="docs/promo-orchestrator.png" alt="Pikiclaw —— AI-Native Agent 编排器" width="820">
28
28
 
29
29
  </div>
30
30
 
@@ -36,29 +36,14 @@ npx pikiclaw@latest
36
36
 
37
37
  核心产品就是这个编排器,其它所有组件都可拔插。**更酷的是,这个编排器是由它自己构建出来的** —— pikiclaw 就是我们用来开发 pikiclaw 的工具。
38
38
 
39
- ```
40
- 终端层 Telegram · 飞书 · 微信 · Slack · Discord · 钉钉 · 企业微信 · Web Dashboard
41
- \__________________________|__________________________/
42
- v
43
- ┌──────────────────────────────┐
44
- │ pikiclaw 编排器 │
45
- └──────────────────────────────┘
46
- |
47
- ┌────────────────────────────────────────┼────────────────────────────────────────┐
48
- v v v
49
- Agent 层 模型层 工具层
50
- Claude Code · Codex · Gemini · Hermes Claude · GPT · Gemini · DeepSeek Skills · MCP · CLI
51
- (driver registry · ACP · 任意 Agent) 豆包 · MiMo · MiniMax · OpenRouter (全局 × 工作区)
52
- · 任意 OpenAI 兼容代理 · …
53
- |
54
- v
55
- 你的电脑
56
- ```
39
+ 上面这张架构图勾勒出我们缝合在一起的四层结构:
40
+
41
+ - **入口层 (Entry Points)** —— Telegram、飞书、微信、Slack、Discord、钉钉、企业微信、Web Dashboard,以及本地 API / CLI,都是一等公民级别、地位完全对等的终端。新增任意一个新终端,对其它通道完全无感。
42
+ - **可插拔 Agent (Pluggable Agents)** —— Claude Code、Codex、Gemini、Hermes 均作为内置驱动。Hermes 走 ACP (Agent Client Protocol) 协议;任何 CLI 或 ACP 形态的 Agent 都可通过相同的 `AgentDriver` 契约接入注册表。
43
+ - **模型路由 (Model Routing)** —— 前沿系列(Claude · GPT · Gemini)、国产矩阵(DeepSeek · 豆包 · MiMo · MiniMax · Qwen)、本地推理(Ollama,以及 Apple Silicon 上的 mlx-lm)、OpenRouter,以及任意 OpenAI 兼容代理。Providers + Profiles 作为一等公民的凭据保险箱,自带只读的 `models.dev` 目录与启动时的逐 Agent 环境变量注入。
44
+ - **工具网 (Tool Mesh)** —— Skills、MCP 服务器、CLI 工具、Web Search、桌面自动化等,会在「全局 × 工作区」两个维度智能合并,并悄悄注入到每一次会话之中。
57
45
 
58
- - **终端层 (Terminal)** —— Telegram、飞书、微信、Slack、Discord、钉钉、企业微信以及 Web Dashboard 都是一等公民入口。新的终端形态可以随时接入。
59
- - **Agent 层** —— 官方的 Claude Code / Codex / Gemini / Hermes CLI 作为底层驱动 (driver)。其中 Hermes 使用 ACP (Agent Client Protocol,客户端协议);注册表机制允许无缝接入任何其他的 Agent。
60
- - **模型层 (Model)** —— Claude / GPT / Gemini、国产系列 (DeepSeek、豆包、MiMo、MiniMax),外加 OpenRouter 以及任何兼容 OpenAI 接口的代理服务。提供商 (Providers) 与配置项 (Profiles) 是一等公民模块,自带凭据保险箱、models.dev 目录以及面向各个 Agent 专属的环境变量注入能力。
61
- - **工具层 (Tool)** —— Skills、MCP 服务器和 CLI 工具。它们会在全局和工作区两个层级进行智能合并,并被自动注入到每一次会话之中。
46
+ 这一切的正中央,是 **Pikiclaw Orchestration Core** —— 由它来统一管理路由、记忆、可观测性和 Bot 生命周期,从而保证任何终端都能借助任意工具,让任意 Agent 跑在任意模型上。
62
47
 
63
48
  ---
64
49
 
@@ -98,7 +83,7 @@ npx pikiclaw@latest
98
83
 
99
84
  <p align="center"><img src="docs/promo-demo.gif" alt="演示:从 Telegram 发起任务,Agent 在本地执行,结果回到聊天" width="780"></p>
100
85
 
101
- > **Web Dashboard** —— 多面板工作区,包含会话列表、对话流、工具调用轨迹以及输入区域(支持 1 / 2 / 3 / 6 面板布局)。
86
+ > **Web Dashboard** —— 多面板工作区,集成会话列表、实时对话流、工具调用轨迹、文件/图片附件、排队任务芯片以及统一的输入框(支持 1 / 2 / 3 / 6 面板布局、深浅色主题与中英双语 i18n)。
102
87
 
103
88
  <p align="center"><img src="docs/promo-dashboard-workspace.png" alt="Web Dashboard 工作区" width="780"></p>
104
89
 
@@ -113,13 +98,13 @@ npx pikiclaw@latest
113
98
 
114
99
  <img src="docs/promo-dashboard-im.png" alt="IM 接入" width="780">
115
100
 
116
- > **Agent 管理** —— 已安装的 Agent CLI 列表、默认 Agent 设定,以及各自独立的模型 / 推理强度配置。
101
+ > **Agent 管理** —— 已安装的 Agent CLI 列表、默认 Agent 设定,以及各 Agent 独立的模型 / 推理强度配置;可绑定 Profile 让 Agent 跑在非原生模型上。
117
102
 
118
103
  <img src="docs/promo-dashboard-agents.png" alt="Agent" width="780">
119
104
 
120
- > **模型配置** —— 整合了 Provider + Profile 的凭据库(涵盖 Claude、GPT、Gemini、DeepSeek、豆包、MiMo、MiniMax、OpenRouter 及任何兼容 OpenAI 接口的代理),支持通过 models.dev 目录进行验证,并为指定的 Agent 独立进行底层环境变量注入。
105
+ > **模型配置** —— 整合了 Provider + Profile 的凭据库(涵盖 Claude、GPT、Gemini、DeepSeek、豆包、MiMo、MiniMax、Qwen、OpenRouter 及任何 OpenAI 兼容代理),支持通过只读 `models.dev` 目录进行验证,并在 Agent 启动时定向注入对应环境变量;探测到 Ollama / mlx-lm(Apple Silicon)等本地后端时会自动挂载为 Provider。
121
106
 
122
- > **扩展工具** —— 统一管理全局 MCP 服务器、社区版 Skills、内置托管的浏览器环境及 macOS 桌面(Peekaboo)自动化能力。
107
+ > **扩展工具** —— 统一管理全局 MCP 服务器、社区版 Skills、内置托管的浏览器环境及 macOS 桌面(Peekaboo)自动化能力,支持通过 stdio、HTTP,或带动态客户端注册的 OAuth 2.1 接入服务。
123
108
 
124
109
  <img src="docs/promo-dashboard-extensions.png" alt="扩展" width="780">
125
110
 
@@ -151,8 +136,6 @@ cd your-workspace
151
136
  npx pikiclaw@latest
152
137
  ```
153
138
 
154
- <p align="center"><img src="docs/promo-install.gif" alt="一行命令安装" width="780"></p>
155
-
156
139
  这条命令会在 `http://localhost:3939` 自动唤起 **Web Dashboard**。随后,你就可以在浏览器里驱动任何会话、接入需要的 IM 渠道、灵活配置 Agent 和模型、快速安装 MCP 服务器与技能 (Skills),并统筹所有的系统权限。其他一切功能,尽在一键之遥。
157
140
 
158
141
  <details>
@@ -192,8 +175,9 @@ docker run -d --name pikiclaw -p 3939:3939 \
192
175
  - **自包含的闭环开发** —— pikiclaw 就是用 pikiclaw 自己开发出来的。这套开发流本身就是这款产品最原始的面貌:甚至可以在外用手机操作编排器,让 Agent 写代码、发布版本并不断迭代。
193
176
  - **挂机式编程 (Walk-away coding)** —— 发起一个耗时极长的大型重构任务,合上笔记本,外出时直接用手机通过 Telegram 进行监控和控制。Agent 始终在本地机器上运行,结果则会流式实时推回聊天界面中。
194
177
  - **同工作区多 Agent 接力** —— 先让 Claude Code 写一版功能草稿,无缝切给 Codex 去做深度 Review,最后再交给 Gemini 提供截然不同视角的优化建议。所有这些操作都在同一份代码目录和相同的历史会话中完成。
195
- - **灵活的国产模型路由方案** —— 当你的任务对延迟、成本或合规有硬性要求时,通过模型驱动包装层,可以直接让 Claude Code 跑在实惠又快速的 DeepSeek 或豆包模型之上。
178
+ - **灵活的国产 / 本地模型路由** —— 当你的任务对延迟、成本或合规有硬性要求时,通过模型注入层,可以让 Claude Code 直接跑在 DeepSeek、豆包、Qwen,甚至完全离线的 Ollama / mlx-lm 上。
196
179
  - **群聊协作级 Agent** —— 把 pikiclaw 拉入飞书 / Slack / Discord / 企业微信群聊内;整个团队可以共享这同一个编排器、统一的项目工作区和一系列团队专属技能。
180
+ - **随手让 Codex 生图** —— 让 Codex 出张海报、出个示意图、画个 UI 草图,结果会作为真正的图片附件流回到聊天里,并附带一个可展开的「生图 Prompt」让你随时查看模型实际收到的指令。下一次迭代只需要继续聊,而不必再切回浏览器。
197
181
  - **完全受控的 Computer-use 能力** —— 开启内置的 Chrome 浏览器托管(基于 Playwright)和 macOS 桌面环境托管(基于 Peekaboo,通过辅助功能和 ScreenCaptureKit)。Agent 瞬间获得「视力」(`see`)、可以自由点击、打字,并管理窗口、菜单栏和 Dock,而你依然可以通过手机远程精准操控它。无论是帮你预定一场会议、抓取某个数据面板信息、跑一通端到端自动测试,还是驱动任何原生的 macOS 本地应用,全都不在话下。
198
182
  - **基于 Skill 体系的自动化工作流** —— 一次性安装好社区提供的常用技能(例如 `promote`、`snipe`、`review`、`security-review` 等),往后只需在任何连接的终端里输入 `/sk_<name>` 即可实现一键触发。
199
183
 
@@ -203,45 +187,46 @@ docker run -d --name pikiclaw -p 3939:3939 \
203
187
 
204
188
  ### 终端层 (Terminal)
205
189
 
206
- - **支持七大主流 IM** —— 全面集成 Telegram、飞书、微信(个人号)、Slack、Discord、钉钉和企业微信。你可以只开启其中一个,也可以多开齐上。底层代码中每个渠道都做到绝对隔离;即使后续再添加新渠道(如 WhatsApp、自有移动 App 等),也丝毫不会影响现有逻辑的稳定性。
207
- - **Web Dashboard 面板** —— 直接在网页浏览器中驱动所有会话,获得与 IM 完全一致的自然对话、工具调用轨迹跟踪和极速的流式反馈体验。面板提供 1 / 2 / 3 / 6 多窗口并发布局、深色/浅色自适应主题,以及纯正的中英文 (i18n) 双语支持。
208
- - **实时流式预览** —— 每当 Agent 开始思考,消息都会实时在原地进行刷新;遇到超长文本能自动进行友好分段;生成的图片与文件也会即刻原样推回前端界面。
190
+ - **支持七大主流 IM** —— Telegram、飞书、微信(个人号)、Slack、Discord、钉钉与企业微信。开一个、开几个、全开都可以。底层每个渠道在代码上是物理隔离的;后续接入新通道(WhatsApp、自研移动 App、语音终端)也不会牵动其它通道。
191
+ - **Web Dashboard 面板** —— 直接在浏览器里驱动所有会话,对话流、工具调用轨迹和流式反馈都与 IM 完全一致。提供 1 / 2 / 3 / 6 面板布局、深浅色主题与中英双语 i18n。
192
+ - **实时流式预览** —— Agent 一边思考、消息一边原地更新;超长文本自动分段;思考过程、工具调用、Plan 都被分别折叠成卡片;图片与文件也会实时原样推回前端。
193
+ - **排队 / 操控统一在一个输入框** —— 上一条还在跑,你就能继续发;新消息以排队 chip 出现,可以预览、撤回,也可以让 Agent 立刻插队执行;一键即可同时停掉当前任务与所有排队任务。
209
194
 
210
195
  ### Agent 层
211
196
 
212
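- - **Official CLIs as native drivers**: built-in support for Claude Code, Codex CLI, Gemini CLI, and Hermes (via ACP). We firmly refuse to build our own wrapper agent engine; whenever upstream ships a new feature, you get it immediately and without loss.
213
- - **Native ACP support**: Hermes is integrated entirely through the [Agent Client Protocol](https://agentclientprotocol.com), launching `hermes acp` over JSON-RPC on standard input/output streams. Any future ACP-compatible agent can therefore plug into the platform right away.
214
- - **Freely pluggable registry**: the only mandatory contract for this part of the codebase is `src/agent/driver.ts`. New agents, whether built on a traditional CLI or on ACP, can join the registry at any time alongside the four built-in engines.
215
- - **Seamless session-level agent switching**: swap the AI's "brain" for one with different characteristics mid-session, without leaving the current code workspace.
216
- - **Takeover and steering (Steer)**: interrupt a heavy running task whenever you like and move an urgent queued message to the front.
217
- - **Codex human-in-the-loop**: when Codex needs you to confirm operation details, its prompts are automatically converted into interactive questions in IM. Reply in your usual chat box and the paused task resumes.
218
- - **Persistent goals**: use the `/goal` command to give a session a long-running completion goal with an explicit token budget. Tasks support pause/resume, and the agent only ends its current run once its own audit determines that the goal has been met.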
197
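+ - **Official CLIs as native drivers**: built-in support for Claude Code, Codex CLI, Gemini CLI, and Hermes (via ACP). We firmly refuse to build our own wrapper agent engine; as soon as upstream ships an update, you get it.
198
+ - **Native ACP support**: Hermes is integrated entirely through the [Agent Client Protocol](https://agentclientprotocol.com), launching `hermes acp` over JSON-RPC stdio. Any future ACP-compatible agent can plug in right away.
199
+ - **Pluggable driver registry**: the only contract in the whole codebase is `src/agent/driver.ts`. New agents, whether CLI- or ACP-based, can land alongside the four built-in engines.
200
+ - **Session-level agent switching**: swap the AI's "brain" mid-session without leaving the current workspace; the conversation history stays in effect.
201
+ - **Takeover and steering (Steer)**: interrupt a heavy running task at any time and move an urgent queued message to the front, or stop the whole session with one click.
202
+ - **Codex human-in-the-loop**: when Codex needs confirmation, the prompt is forwarded automatically to your active terminal (IM or Dashboard). Reply in place and the paused task continues.
203
+ - **Persistent goals, routed per agent**: `/goal <objective>` keeps the session working until the agent's self-review finds the condition satisfied. Codex uses its native `thread/goal/*` RPC, with an optional `budget=N` token budget and pause / resume support; Claude uses its native Stop hook plus a Haiku review, and the goal is cleared automatically on completion; other agents use pikiclaw's own portable continuation.
204
+ - **End-to-end image generation handling**: images produced by Codex's built-in `image_gen` (as well as Claude MCP / Gemini Imagen) arrive in the chat as real image attachments rather than a blob of base64. The `revised_prompt` the agent actually sent to the image model is attached beside the image as an expandable image-generation prompt (**生图 Prompt**); while an image is being generated, a "Generating image…" chip blinks under the assistant reply to explain why the turn is slow.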
219
205
 
220
206
  ### Model Layer
221
207
 
222
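- - **Frontier models, Chinese models, and proxies**: covers the Claude family, GPT-5 / Codex, and Gemini; the leading Chinese models DeepSeek, Doubao, MiMo, and MiniMax; plus native compatibility with OpenRouter and any third-party proxy that speaks the OpenAI API format.
223
- - **Providers & Profiles credential vault**: a strictly isolated data-protection model in which API credentials are stored separately and encrypted in a dedicated area of `~/.pikiclaw/setting.json`. Browse the read-only models.dev catalog, validate keys with real API probes, then bind the Profile to any agent so its environment variables are injected in isolation at run time.
224
- - **Flexible session-level configuration**: switch the model itself, or the reasoning effort for a particularly hard task, on the fly from the Dashboard or with the `/models` and `/mode` commands.
225
- - **Forced agent-level injection**: the core function `resolveAgentInjection(agentId)` overrides the corresponding environment variables at the very start of launch. This lets you run Claude Code entirely on cost-effective DeepSeek or Doubao models without changing a single line of the upstream client's configuration.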
208
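+ - **Frontier + Chinese + local + proxies**: frontier models (Claude · GPT-5 / Codex · Gemini), Chinese models (DeepSeek · Doubao · MiMo · MiniMax · Qwen), local inference (Ollama, and mlx-lm on Apple Silicon), OpenRouter, and any OpenAI-compatible proxy.
209
+ - **Providers & Profiles credential vault**: API credentials are stored in isolation in `~/.pikiclaw/setting.json`. Browse models in the read-only `models.dev` catalog, validate keys with real provider probes, then bind a Profile to an agent; the matching environment variables are injected automatically at launch.
210
+ - **Zero-config local models**: a detected Ollama or mlx-lm backend is mounted as a Provider automatically, with no extra configuration. Its Dashboard card shows status, the `brew/pipx` install command, the matching `ollama pull` / `mlx_lm.server` model-fetch command, and a RAM headroom hint based on this machine's memory.
211
+ - **Session-level model / reasoning-effort switching**: switch live from the Dashboard, `/models`, or `/mode`. Effort levels are offered per agent (Claude: low → max; Codex: low → very high; Hermes: minimal → very high).
212
+ - **Deep agent-level environment injection**: `resolveAgentInjection(agentId)` forcibly writes the bound Profile's environment variables at launch. This lets Claude Code run entirely on DeepSeek, Doubao, or even local Ollama without touching the upstream CLI's configuration.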
226
213
 
227
214
  ### Tool Layer
228
215
 
229
- - **Skills system**: project-specific skills live in `.pikiclaw/skills/*/SKILL.md` (with full backward compatibility for the standard `.claude/commands/*.md` format). Install from a public GitHub repository (`owner/repo`) in one step, or browse our curated packs (such as the Anthropic official pack or Vercel Agent Skills). Send `/skills` to list every loaded skill, then trigger one with `/sk_<name>`.
230
- - **Broad MCP server support**: browse the global [MCP Registry](https://registry.modelcontextprotocol.io) or add local stdio and remote HTTP servers by hand; the framework enforces real handshake health checks and OAuth 2.1 with dynamic client registration, and gives fine-grained control over which scopes are enabled. The curated catalog covers GitHub, Atlassian, Notion, Linear, Sentry, Cloudflare, Slack, Feishu/Lark, Stripe, Hugging Face, Gamma, Brave Search, Perplexity, local filesystem access, SQLite, and PostgreSQL. Two computer-use servers are also built in: `pikiclaw-browser`, which drives Chrome through Playwright, and `peekaboo`, which controls the macOS GUI through Peekaboo.
231
- - **Integration with popular CLI tools**: automatically detects version compatibility and verifies login status. For CLIs that authenticate through the browser, the OAuth-web flow is handed off seamlessly. Everything is then invoked through the agent's native call interface.
232
- - **Session-level MCP bridge**: the base tools `im_list_files`, `im_send_file`, and `im_ask_user`, plus the built-in browser and macOS desktop automation tools described above (once enabled), are injected automatically into every session.
233
- - **Minimal two-scope merge**: tool scope always follows one rule: `global < workspace < built-in`. The engine merges the scopes automatically each time, and the result takes effect in subsequent conversations.
234
-
235
- <p align="center"><img src="docs/promo-dashboard-extensions-add.png" alt="Add MCP server" width="780"></p>
216
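+ - **Skills system**: project-specific skills live in `.pikiclaw/skills/*/SKILL.md` (the legacy `.claude/commands/*.md` format is also supported). Install community packs from GitHub (`owner/repo`) in one step, or pick from our curated official packs (Anthropic Official, Vercel Agent Skills, and others). From any terminal, send `/skills` to browse and `/sk_<name>` to trigger.
217
+ - **Large MCP ecosystem**: browse the [MCP Registry](https://registry.modelcontextprotocol.io), add stdio / HTTP servers by hand, enforce real handshake health checks, and use OAuth 2.1 with dynamic client registration. The curated catalog covers GitHub, Atlassian, Notion, Linear, Sentry, Cloudflare, Slack, Feishu/Lark, Stripe, Hugging Face, Gamma, Brave Search, Perplexity, Filesystem, SQLite, and PostgreSQL, plus our two bundled computer-use servers: `pikiclaw-browser` (Playwright-driven Chrome) and `peekaboo` (Peekaboo-driven macOS GUI).
218
+ - **Integration with mainstream CLI tools**: versions and login state are detected automatically (gh, brew, npm, uv, and others), and the OAuth-web browser authorization flow is connected seamlessly on the agent's call surface.
219
+ - **Session-level MCP bridge**: base tools such as `im_list_files`, `im_send_file`, `im_ask_user`, `goal_get`, and `goal_update`, plus the browser and macOS desktop tools once enabled, are injected automatically into every session.
220
+ - **Three-layer merge rule**: tool scope always follows `global < workspace < built-in`. The engine merges the layers automatically and the result applies transparently.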
236
221
 
237
222
  ### Runtime & Developer Experience (Runtime & DX)
238
223
 
239
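- - **Dedicated per-session workspace**: each new session gets its own isolated directory on disk, and attachments land there directly.
240
- - **Multi-turn session management**: resume and switch sessions freely, with semantic classification into clear states (answer, proposal, implementation, blocked).
241
- - **Automatic base-tool injection**: `im_list_files`, `im_send_file`, and `im_ask_user`, together with the goal-tracking tools, are mounted automatically just before launch.
242
- - **Computer-use (browser layer)**: the built-in `pikiclaw-browser` MCP wraps `@playwright/mcp`, adding a process-level Supervisor and a dedicated Chrome profile shared across tasks. Log in to your usual sites once; every later task inherits that login state.
243
- - **Computer-use (macOS desktop layer)**: after you enable the `peekaboo` MCP in the extensions panel and grant your terminal the Accessibility and Screen Recording permissions in System Settings (macOS only), [Peekaboo](https://peekaboo.sh/) exposes a full set of system control tools: sight (`see`), clicking (`click`), typing (`type`), scrolling (`scroll`), windows (`window`), menus (`menu`), app lifecycle (`app`), and the Dock (`dock`).
244
- - **Safeguards for long-running tasks**: built-in sleep prevention, a watchdog, automatic restart after failures, daemon mode, and a channel Supervisor keep even very long unattended tasks stable.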
224
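+ - **Isolated session workspaces**: every session has its own isolated directory; uploaded files and agent-generated artifacts (including images) land there.
225
+ - **Resumable / switchable / auto-classified**: resume and switch multi-turn sessions freely; sessions are classified semantically (answer / proposal / implementation / blocked), and the workspace session list is sorted by most recent activity across all installed agents.
226
+ - **Automatic base-tool injection**: `im_*` (list files / send file / ask user) and `goal_*` are available by default in every stream, so with no configuration the agent can push files back to your IM or pause midway to ask you a question.
227
+ - **Computer-use (browser layer)**: the built-in `pikiclaw-browser` MCP wraps `@playwright/mcp` with a process-level Supervisor and a shared, isolated Chrome profile. Log in to your usual sites once, and every later task reuses that login state.
228
+ - **Computer-use (macOS desktop layer)**: enable the `peekaboo` MCP (macOS only) to use the full desktop control toolset provided by [Peekaboo](https://peekaboo.sh/): `see`, `click`, `type`, `scroll`, `window`, `menu`, `app`, `dock`, and the `agent` sub-agent for goal-directed autonomous control. Your terminal must be granted the Accessibility and Screen Recording permissions in System Settings.
229
+ - **Runtime hardened for long tasks**: sleep prevention, a watchdog, automatic restart, daemon mode, and a channel Supervisor are all included; restarts are actively blocked while tasks are still running, so a hot reload cannot crash a marathon job.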
245
230
 
246
231
  ---
247
232
 
@@ -252,7 +237,7 @@ docker run -d --name pikiclaw -p 3939:3939 \
252
237
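  | **Terminals** | 7 IM channels + Web, and growing | IDE only | Limited to a dedicated web UI | A single bot tied to a single IM |
253
238
  | **Where the agent runs** | Entirely on your own machine | Your machine | A vendor-provisioned cloud sandbox | Usually on the vendor's servers |
254
239
  | **Agent choice** | Claude Code · Codex · Gemini · Hermes (ACP) · … (your pick) | Tightly bound, no choice | Single | Single |
255
- | **Model choice** | Frontier models + Chinese models + any OpenAI-compatible model | Platform-controlled | Vendor-locked | Single, not swappable |
240
+ | **Model choice** | Frontier · Chinese · local (Ollama / mlx-lm) · OpenAI-compatible proxies | Platform-controlled | Vendor-locked | Single, not swappable |
256
241
  | **Concurrency** | **N agents × N panes × N workspaces** | One at a time per IDE window | Serial queue | Single thread |
257
242
  | **Files and tools** | All local files, MCP resources, and local CLIs on your host | Local files | Restricted sandbox | Severely restricted |
258
243
  | **Adding a new terminal channel** | Write a basic `Channel` implementation class | Not possible | Not possible | Requires forking the whole project |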
@@ -304,13 +289,7 @@ docker run -d --name pikiclaw -p 3939:3939 \
304
289
 
305
290
  ## Roadmap
306
291
 
307
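- Shipped so far: Hermes driver support · ACP (Agent Client Protocol) integration · Provider/Profile model vault · seven IM channels · computer use (Playwright managed browser + Peekaboo managed macOS desktop).
308
-
309
- - **More ACP agents**: make sure any new ACP-compatible agent can plug in with no code and no configuration.
310
- - **More terminals**: support for WhatsApp, a standalone mobile app, and a voice interaction module.
311
- - **Deeper model-layer wrapping**: build a generic agent wrapper over arbitrary models to drive more Chinese models seamlessly.
312
- - **Richer tool ecosystem**: ship an officially recommended MCP plugin collection, a Skill template library, and a community marketplace.
313
- - **Cross-platform computer use**: add desktop control for Windows / Linux alongside the existing macOS Peekaboo driver.
292
+ - **SupporterAgent**: a high-level meta-agent layered on top of the existing "terminal × agent × model × tool" orchestration stack that manages the whole lifecycle of a complex task: decomposing and planning, dispatching the right sub-agent to the right model and tools, and watching every stream so it can step in when a sub-agent stalls, drifts, or conflicts with the plan. The goal is to raise pikiclaw's stability on long-horizon, multi-agent work so that people no longer need to watch every subtask turn by turn.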
314
293
 
315
294
  ---
316
295
 
@@ -131,6 +131,11 @@ process.stdin.on("end", () => {
131
131
  tool_name: typeof payload.tool_name === "string" ? payload.tool_name : null,
132
132
  tool_input: payload.tool_input || null,
133
133
  tool_response: payload.tool_response || null,
134
+ // Claude Code tags sub-agent tool calls with agent_id so the parent can
135
+ // tell them apart from main-thread calls. Forwarding it lets the driver
136
+ // route the hook to the right sub-agent card instead of the parent's
137
+ // 执行 list.
138
+ agent_id: typeof payload.agent_id === "string" ? payload.agent_id : null,
134
139
  }) + "\\n";
135
140
  try { fs.appendFileSync(toolEventsFile, line); } catch (_) {}
136
141
  process.stdout.write(JSON.stringify({ continue: true }) + "\\n");
@@ -250,6 +255,26 @@ function applyHookToolEvent(ev, s) {
250
255
  const toolName = String(ev?.tool_name || '').trim();
251
256
  if (!toolName || !toolUseId)
252
257
  return false;
258
+ // Sub-agent tool calls fire the parent's Pre/PostToolUse hooks too (one
259
+ // hook pipeline per CLI process). Claude Code tags those payloads with
260
+ // `agent_id`; route them to the matching sub-agent's tool list instead of
261
+ // appending to the parent's recentActivity. Without this every Task spawn
262
+ // floods the parent's 执行 card with the children's tool stream while the
263
+ // sub-agent cards sit empty until the sidecar JSONL flushes at Stop.
264
+ const subAgentId = typeof ev?.agent_id === 'string' && ev.agent_id ? ev.agent_id : '';
265
+ if (subAgentId) {
266
+ if (ev.event === 'PreToolUse') {
267
+ const parentToolUseId = s.subAgentIdToParent?.get(subAgentId);
268
+ const sub = parentToolUseId ? s.subAgents?.get(parentToolUseId) : undefined;
269
+ if (sub && !sub.tools.some((t) => t.id === toolUseId)) {
270
+ const summary = toolName === 'TodoWrite'
271
+ ? 'Update plan'
272
+ : summarizeClaudeToolUse(toolName, ev.tool_input || {});
273
+ sub.tools.push({ id: toolUseId, name: toolName, summary });
274
+ }
275
+ }
276
+ return true;
277
+ }
253
278
  if (ev.event === 'PreToolUse') {
254
279
  if (s.seenClaudeToolIds.has(toolUseId))
255
280
  return false;
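The routing rule this hunk adds to `applyHookToolEvent` can be sketched in isolation. The sketch below is a simplified reading of the diff, not the shipped function: `makeState`, `routeHookEvent`, `parentActivity`, and the `tool_use_id` field name are hypothetical stand-ins, and the tool summary is omitted.

```javascript
// Hypothetical, trimmed-down stream state; the real `s` carries many more fields.
function makeState() {
  return {
    subAgentIdToParent: new Map(), // agent_id -> parent Task tool_use id
    subAgents: new Map(),          // parent tool_use id -> { tools: [...] }
    parentActivity: [],            // stand-in for the parent's activity list
  };
}

// Returns true when the event was consumed, mirroring the hunk's early return.
function routeHookEvent(ev, s) {
  const toolUseId = String(ev?.tool_use_id || '').trim(); // field name assumed
  const toolName = String(ev?.tool_name || '').trim();
  if (!toolName || !toolUseId) return false;

  const subAgentId = typeof ev?.agent_id === 'string' && ev.agent_id ? ev.agent_id : '';
  if (subAgentId) {
    if (ev.event === 'PreToolUse') {
      const parentToolUseId = s.subAgentIdToParent.get(subAgentId);
      const sub = parentToolUseId ? s.subAgents.get(parentToolUseId) : undefined;
      // Deduplicate by tool_use id so a replayed hook does not add a second row.
      if (sub && !sub.tools.some((t) => t.id === toolUseId)) {
        sub.tools.push({ id: toolUseId, name: toolName });
      }
    }
    return true; // tagged events never fall through to the parent list
  }

  if (ev.event === 'PreToolUse') s.parentActivity.push(toolName);
  return true;
}
```

Note that a tagged event whose `agent_id` is not yet mapped is still consumed; it is dropped rather than attributed to the parent, which is why the later hunk refreshes sub-agent discovery before applying the event.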
@@ -835,6 +860,14 @@ export async function doClaudeTuiStream(opts) {
835
860
  catch {
836
861
  continue;
837
862
  }
863
+ // A Task PreToolUse and the first sub-agent tool PreToolUse can land in
864
+ // the same tick batch. If the sub-agent's hook arrives before we've
865
+ // discovered its sidecar (and thus before s.subAgentIdToParent knows
866
+ // its agent_id), refresh discovery so the hook resolves its parent on
867
+ // this pass instead of leaking through unattributed.
868
+ const subAgentId = typeof ev?.agent_id === 'string' ? ev.agent_id : '';
869
+ if (subAgentId && !s.subAgentIdToParent?.has(subAgentId))
870
+ tryDiscoverSubAgents();
838
871
  try {
839
872
  if (applyHookToolEvent(ev, s))
840
873
  any = true;
@@ -880,6 +913,15 @@ export async function doClaudeTuiStream(opts) {
880
913
  continue;
881
914
  const sidecarPath = path.join(sidecarDir, `${stem}.jsonl`);
882
915
  trackedSubAgents.set(stem, { sidecarPath, offset: 0, parentToolUseId });
916
+ // `stem` is "agent-<id>"; Claude Code's hook payload `agent_id` carries
917
+ // just the raw id. Keep both keys so applyHookToolEvent can attribute
918
+ // sub-agent tool hooks to the parent's Task tool_use no matter which
919
+ // form arrives.
920
+ const rawAgentId = stem.startsWith('agent-') ? stem.slice('agent-'.length) : stem;
921
+ if (!s.subAgentIdToParent)
922
+ s.subAgentIdToParent = new Map();
923
+ s.subAgentIdToParent.set(rawAgentId, parentToolUseId);
924
+ s.subAgentIdToParent.set(stem, parentToolUseId);
883
925
  agentLog(`[claude-tui] subagent sidecar discovered ${stem} parent=${parentToolUseId.slice(0, 14)}`);
884
926
  }
885
927
  };
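The dual-key registration in this hunk can be illustrated on its own. This is a minimal sketch with a hypothetical helper name (`registerSubAgentKeys`); it only shows why both the sidecar stem and the raw id resolve to the same parent.

```javascript
// Hypothetical helper: registers both key forms for one discovered sidecar.
function registerSubAgentKeys(idToParent, stem, parentToolUseId) {
  // Sidecar stems look like "agent-<id>"; hook payloads carry only "<id>".
  const rawAgentId = stem.startsWith('agent-') ? stem.slice('agent-'.length) : stem;
  idToParent.set(rawAgentId, parentToolUseId);
  idToParent.set(stem, parentToolUseId);
  return rawAgentId;
}
```

When the stem has no `agent-` prefix, both `set` calls write the same key, so the map gains a single entry.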
@@ -222,7 +222,11 @@ function normalizeSessionRecord(raw, workdir) {
222
222
  title: typeof raw?.title === 'string' && raw.title.trim() ? raw.title.trim() : null,
223
223
  model: typeof raw?.model === 'string' && raw.model.trim() ? raw.model.trim() : null,
224
224
  thinkingEffort: typeof raw?.thinkingEffort === 'string' && raw.thinkingEffort.trim() ? raw.thinkingEffort.trim() : null,
225
+ profileId: typeof raw?.profileId === 'string' && raw.profileId.trim() ? raw.profileId.trim() : null,
225
226
  stagedFiles: Array.isArray(raw?.stagedFiles) ? dedupeStrings(raw.stagedFiles.filter((v) => typeof v === 'string')) : [],
227
+ lastUserAttachments: Array.isArray(raw?.lastUserAttachments)
228
+ ? dedupeStrings(raw.lastUserAttachments.filter((v) => typeof v === 'string'))
229
+ : [],
226
230
  runState: normalizeSessionRunState(raw?.runState),
227
231
  runDetail: normalizeSessionRunDetail(raw?.runState, raw?.runDetail),
228
232
  runUpdatedAt: normalizeSessionRunUpdatedAt(raw?.runUpdatedAt, typeof raw?.updatedAt === 'string' && raw.updatedAt.trim() ? raw.updatedAt : new Date().toISOString()),
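The two new fields follow the same defensive pattern as their neighbours: optional strings collapse to `null`, and lists keep only string entries. A minimal sketch, assuming `dedupeStrings` simply removes duplicates while preserving order (its real behaviour is not shown in this diff); both helper names below are hypothetical.

```javascript
// Trimmed non-empty string, or null for anything else.
function normalizeOptionalString(value) {
  return typeof value === 'string' && value.trim() ? value.trim() : null;
}

// Keep only string entries and drop duplicates, preserving first-seen order.
// Stand-in for dedupeStrings(raw.filter(...)); the real helper may differ.
function normalizeStringList(value) {
  if (!Array.isArray(value)) return [];
  return [...new Set(value.filter((v) => typeof v === 'string'))];
}
```

Records written before this release have neither field, so they load with `profileId: null` and `lastUserAttachments: []` under this pattern.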
@@ -548,7 +552,7 @@ export function ensureSessionWorkspace(opts) {
548
552
  workspacePath: sessionWorkspacePath(workdir, opts.agent, sessionId),
549
553
  threadId,
550
554
  createdAt: new Date().toISOString(), updatedAt: new Date().toISOString(),
551
- title: summarizePromptTitle(opts.title) || null, model: null, thinkingEffort: null, stagedFiles: [],
555
+ title: summarizePromptTitle(opts.title) || null, model: null, thinkingEffort: null, profileId: null, stagedFiles: [], lastUserAttachments: [],
552
556
  runState: 'completed', runDetail: null, runUpdatedAt: new Date().toISOString(),
553
557
  runPid: null,
554
558
  classification: null, userStatus: null, userNote: null,
@@ -588,6 +592,7 @@ function managedRecordToSessionInfo(record) {
588
592
  threadId: record.threadId,
589
593
  model: record.model,
590
594
  thinkingEffort: record.thinkingEffort,
595
+ profileId: record.profileId ?? null,
591
596
  createdAt: record.createdAt,
592
597
  title,
593
598
  running: record.runState === 'running',
@@ -686,12 +691,17 @@ export async function deleteAgentSession(opts) {
686
691
  return result;
687
692
  }
688
693
  /**
689
- * Look up the persisted model and thinkingEffort for an existing session.
690
- * Returns null values when the session is not found or fields are not set.
694
+ * Look up the persisted model, thinkingEffort, and bound profileId for an
695
+ * existing session. Returns null values when the session is not found or
696
+ * fields are not set.
691
697
  */
692
698
  export function getSessionStoredConfig(workdir, agent, sessionId) {
693
699
  const record = findPikiclawSession(workdir, agent, sessionId);
694
- return { model: record?.model ?? null, thinkingEffort: record?.thinkingEffort ?? null };
700
+ return {
701
+ model: record?.model ?? null,
702
+ thinkingEffort: record?.thinkingEffort ?? null,
703
+ profileId: record?.profileId ?? null,
704
+ };
695
705
  }
696
706
  export function ensureManagedSession(opts) {
697
707
  const session = ensureSessionWorkspace({
@@ -705,6 +715,12 @@ export function ensureManagedSession(opts) {
705
715
  session.record.title = summarizePromptTitle(opts.title);
706
716
  if (!session.record.model && opts.model)
707
717
  session.record.model = opts.model.trim() || null;
718
+ if (!session.record.thinkingEffort && opts.thinkingEffort) {
719
+ session.record.thinkingEffort = opts.thinkingEffort.trim().toLowerCase() || null;
720
+ }
721
+ if (!session.record.profileId && opts.profileId) {
722
+ session.record.profileId = opts.profileId.trim() || null;
723
+ }
708
724
  saveSessionRecord(opts.workdir, session.record);
709
725
  return managedRecordToSessionInfo(session.record);
710
726
  }
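The new branches in `ensureManagedSession` only fill fields that are still empty, so a value already persisted on the record wins over the caller's options. A small sketch of that rule, using a hypothetical `fillSessionDefaults` helper and plain objects in place of the real session record:

```javascript
// Fill model / thinkingEffort / profileId only when the record has no value yet.
function fillSessionDefaults(record, opts) {
  if (!record.model && opts.model) {
    record.model = opts.model.trim() || null;
  }
  if (!record.thinkingEffort && opts.thinkingEffort) {
    // Effort is lower-cased on write, as in the hunk above.
    record.thinkingEffort = opts.thinkingEffort.trim().toLowerCase() || null;
  }
  if (!record.profileId && opts.profileId) {
    record.profileId = opts.profileId.trim() || null;
  }
  return record;
}
```

A whitespace-only option trims to an empty string and is stored as `null`, so it never masks a later real value.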
@@ -9,7 +9,7 @@ import { restartManagedBrowser } from '../browser-supervisor.js';
9
9
  import { terminateProcessTree } from '../core/process-control.js';
10
10
  import { AGENT_DETECT_TIMEOUTS, AGENT_STREAM_HARD_KILL_GRACE_MS } from '../core/constants.js';
11
11
  import { getDriver, allDrivers, getAcceptedProviderKinds } from './driver.js';
12
- import { resolveAgentInjection, getActiveProfile, getProvider, updateProfile, listProfiles, } from '../model/index.js';
12
+ import { resolveAgentInjection, getActiveProfile, getActiveProfileId, getProvider, updateProfile, listProfiles, } from '../model/index.js';
13
13
  import { Q, agentLog, agentWarn, agentError, joinErrorMessages, normalizeErrorMessage, buildStreamPreviewMeta, computeContext, shortValue, isPendingSessionId, dedupeStrings, normalizeStreamPreviewPlan, } from './utils.js';
14
14
  import { saveSessionRecord, setSessionRunState, applySessionRunResult, ensureSessionWorkspace, importFilesIntoWorkspace, syncManagedSessionIdentity, summarizePromptTitle, recordFork, } from './session.js';
15
15
  import { collapseSkillPrompt } from './skills.js';
@@ -346,6 +346,11 @@ function prepareStreamOpts(opts) {
346
346
  // Capture staged files for MCP bridge before clearing
347
347
  const stagedFiles = [...session.record.stagedFiles];
348
348
  session.record.stagedFiles = [];
349
+ // Remember this turn's attachments so dashboard fallbacks (called while the
350
+ // agent CLI hasn't yet flushed the user event to its native session file)
351
+ // can still render the user's image bubble. Cleared/overwritten at the
352
+ // start of the NEXT turn — always reflects the turn currently in flight.
353
+ session.record.lastUserAttachments = [...attachmentRelPaths];
349
354
  if (!session.record.title)
350
355
  session.record.title = summarizePromptTitle(displayPrompt) || null;
351
356
  session.record.lastQuestion = shortValue(displayPrompt, 500);
@@ -383,6 +388,14 @@ function finalizeStreamResult(result, workdir, prompt, session) {
383
388
  session.record.model = result.model || session.record.model;
384
389
  if (result.thinkingEffort)
385
390
  session.record.thinkingEffort = result.thinkingEffort;
391
+ // Capture the BYOK Profile that was in effect for this run so a future
392
+ // `session.switch` can re-bind it (null = native CLI auth).
393
+ try {
394
+ session.record.profileId = getActiveProfileId(session.record.agent);
395
+ }
396
+ catch {
397
+ /* model layer not initialised in tests — leave profileId untouched */
398
+ }
386
399
  const displayPrompt = collapseSkillPrompt(prompt) ?? prompt;
387
400
  if (!session.record.title)
388
401
  session.record.title = summarizePromptTitle(displayPrompt);
package/dist/bot/bot.js CHANGED
@@ -640,6 +640,8 @@ export class Bot {
640
640
  workdir: 'workdir' in session && session.workdir ? session.workdir : this.workdir,
641
641
  title: session.title ?? null,
642
642
  model: session.model ?? null,
643
+ thinkingEffort: session.thinkingEffort ?? null,
644
+ profileId: session.profileId ?? null,
643
645
  threadId: session.threadId ?? null,
644
646
  });
645
647
  const runtime = this.hydrateSessionRuntime({
@@ -649,11 +651,16 @@ export class Bot {
649
651
  workspacePath: managed.workspacePath ?? session.workspacePath ?? null,
650
652
  threadId: managed.threadId ?? session.threadId ?? null,
651
653
  modelId: session.model ?? managed.model ?? null,
654
+ thinkingEffort: session.thinkingEffort ?? managed.thinkingEffort ?? null,
652
655
  });
653
656
  if (!runtime) {
654
657
  this.applySessionSelection(cs, null);
655
658
  return;
656
659
  }
660
+ // Adopting an existing session is an explicit user pick — drop any
661
+ // queued handover from a prior agent toggle so we don't accidentally
662
+ // prepend the wrong context to the resumed session's next turn.
663
+ cs.pendingHandoverFrom = null;
657
664
  this.applySessionSelection(cs, runtime);
658
665
  }
659
666
  syncSelectedChats(session) {
@@ -1759,6 +1766,26 @@ export class Bot {
1759
1766
  this.adoptSession(cs, session);
1760
1767
  return this.getSelectedSession(cs);
1761
1768
  }
1769
+ /**
1770
+ * Resume an existing session in a chat and restore the agent's persistent
1771
+ * model / effort / BYOK Profile binding so the next stream — and the IM
1772
+ * picker chips — match the session that was just adopted. This is the
1773
+ * shared "click a row from the workspace list" path used by both the
1774
+ * interactive selector and the text-command `/sessions <#>` flow.
1775
+ */
1776
+ resumeSessionForChat(chatId, session) {
1777
+ const runtime = this.adoptExistingSessionForChat(chatId, session);
1778
+ if (session.model) {
1779
+ this.switchModelForChat(chatId, session.model, session.profileId ?? null);
1780
+ }
1781
+ else if (session.profileId !== undefined) {
1782
+ this.switchModelForChat(chatId, this.modelForAgent(session.agent), null);
1783
+ }
1784
+ if (session.thinkingEffort) {
1785
+ this.switchEffortForChat(chatId, session.thinkingEffort);
1786
+ }
1787
+ return runtime;
1788
+ }
1762
1789
  switchAgentForChat(chatId, agent) {
1763
1790
  const cs = this.chat(chatId);
1764
1791
  if (cs.agent === agent)
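The decision logic inside `resumeSessionForChat` can be read as a pure planning step. The sketch below returns the calls it would make instead of performing them; `planRestore` and the `defaultModelForAgent` callback are hypothetical names standing in for `switchModelForChat`, `switchEffortForChat`, and `modelForAgent`.

```javascript
// Returns the ordered list of restore actions for a resumed session.
function planRestore(session, defaultModelForAgent) {
  const actions = [];
  if (session.model) {
    // A stored model is restored together with its Profile (null = native auth).
    actions.push({ kind: 'model', model: session.model, profileId: session.profileId ?? null });
  } else if (session.profileId !== undefined) {
    // No stored model but a recorded profileId: fall back to the agent default
    // and clear the Profile binding, as the hunk does.
    actions.push({ kind: 'model', model: defaultModelForAgent(session.agent), profileId: null });
  }
  if (session.thinkingEffort) {
    actions.push({ kind: 'effort', effort: session.thinkingEffort });
  }
  return actions;
}
```

A session object with no `profileId` key at all produces no model action, so records that predate the field leave the chat's current model untouched.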
@@ -135,8 +135,11 @@ export function decodeCommandAction(data) {
135
135
  }
136
136
  export async function buildSessionsCommandView(bot, chatId, page, pageSize = 5) {
137
137
  const data = await getSessionsPageData(bot, chatId, page, pageSize);
138
+ // Multi-row: one button per session on its own line, prefixed with the
139
+ // agent badge so a mixed workspace list reads cleanly. Avoid cramming
140
+ // multiple buttons onto one row (some IM clients truncate).
138
141
  const sessionButtons = data.sessions.map(session => [{
139
- label: session.title,
142
+ label: `[${session.agent}] ${session.title} · ${session.time}`,
140
143
  action: { kind: 'session.switch', sessionId: session.key },
141
144
  state: buttonStateFromFlags({ isCurrent: session.isCurrent, isRunning: session.isRunning }),
142
145
  primary: session.isCurrent,
@@ -147,20 +150,27 @@ export async function buildSessionsCommandView(bot, chatId, page, pageSize = 5)
147
150
  navRow.push({ label: '+ New', action: { kind: 'session.new' } });
148
151
  if (data.page < data.totalPages - 1)
149
152
  navRow.push({ label: `p${data.page + 2} ▶`, action: { kind: 'sessions.page', page: data.page + 1 } });
153
+ const agentChips = Object.entries(data.agentTotals)
154
+ .sort((a, b) => b[1] - a[1])
155
+ .map(([agent, count]) => `${agent}:${count}`)
156
+ .join(' · ');
157
+ const headerDetail = data.workspaceName
158
+ ? (agentChips ? `${data.workspaceName} · ${agentChips}` : data.workspaceName)
159
+ : (agentChips || null);
150
160
  return {
151
161
  kind: 'sessions',
152
162
  title: 'Sessions',
153
- detail: data.agent,
163
+ detail: headerDetail,
154
164
  metaLines: [`${data.total} total · p${data.page + 1}/${data.totalPages}`],
155
165
  items: data.sessions.map(session => ({
156
- label: session.title,
166
+ label: `[${session.agent}] ${session.title}`,
157
167
  detail: session.time,
158
168
  state: buttonStateFromFlags({ isCurrent: session.isCurrent, isRunning: session.isRunning }),
159
169
  })),
160
- emptyText: 'No sessions found.',
170
+ emptyText: 'No sessions found in this workspace.',
161
171
  helperText: data.totalPages > 1
162
- ? `Use the controls below to switch or turn pages.`
163
- : 'Use the controls below to switch or start a new session.',
172
+ ? `Pick a row to resume (agent/model/effort restore automatically).`
173
+ : 'Pick a row to resume, or start a new session.',
164
174
  rows: navRow.length ? [...sessionButtons, navRow] : sessionButtons,
165
175
  };
166
176
  }
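The header line built in this hunk combines the workspace name with per-agent counts. A self-contained sketch of the same expression, wrapped in a hypothetical `buildHeaderDetail` function:

```javascript
// Build "workspace · agentA:3 · agentB:1", falling back gracefully when
// either part is missing. Agents are ordered by descending session count.
function buildHeaderDetail(workspaceName, agentTotals) {
  const agentChips = Object.entries(agentTotals)
    .sort((a, b) => b[1] - a[1])
    .map(([agent, count]) => `${agent}:${count}`)
    .join(' · ');
  return workspaceName
    ? (agentChips ? `${workspaceName} · ${agentChips}` : workspaceName)
    : (agentChips || null);
}
```

With no workspace name and no sessions the function returns `null`, which matches the `detail` field's previous "may be absent" shape.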
@@ -391,22 +401,47 @@ export async function executeCommandAction(bot, chatId, action, opts = {}) {
391
401
  }
392
402
  case 'session.switch': {
393
403
  const chat = bot.chat(chatId);
394
- const result = await bot.fetchSessions(chat.agent, bot.chatWorkdir(chatId));
404
+ // Workspace-wide lookup (no agent filter) so a row from any agent can be
405
+ // resumed directly from a single mixed list.
406
+ const result = await bot.fetchSessions(undefined, bot.chatWorkdir(chatId));
395
407
  if (!result.ok)
396
408
  return { kind: 'noop', message: 'Failed to load sessions' };
397
409
  const session = result.sessions.find(entry => entry.sessionId === action.sessionId);
398
410
  if (!session)
399
411
  return { kind: 'noop', message: 'Session not found' };
412
+ const prevAgent = chat.agent;
400
413
  const runtime = bot.adoptExistingSessionForChat(chatId, session);
414
+ // Restore the agent's persistent model / effort / Profile binding so the
415
+ // next stream — and the IM picker chips — match the resumed session.
416
+ if (session.model) {
417
+ bot.switchModelForChat(chatId, session.model, session.profileId ?? null);
418
+ }
419
+ else if (session.profileId !== undefined) {
420
+ // Session was native (profileId === null) — explicitly clear any
421
+ // active Profile so we don't run with a stale BYOK binding.
422
+ bot.switchModelForChat(chatId, bot.modelForAgent(session.agent), null);
423
+ }
424
+ if (session.thinkingEffort) {
425
+ bot.switchEffortForChat(chatId, session.thinkingEffort);
426
+ }
401
427
  const displayId = session.sessionId || action.sessionId;
402
428
  const sessionStatus = getSessionStatusForChat(bot, chat, session);
429
+ const runDetail = summarizeSessionRun({ ...session, running: sessionStatus.isRunning }).noticeDetail;
430
+ const restoreParts = [];
431
+ if (prevAgent !== session.agent)
432
+ restoreParts.push(`agent → ${session.agent}`);
433
+ if (session.model)
434
+ restoreParts.push(`model → ${session.model}`);
435
+ if (session.thinkingEffort)
436
+ restoreParts.push(`effort → ${session.thinkingEffort}`);
437
+ const detail = restoreParts.length ? `${runDetail} · ${restoreParts.join(' · ')}` : runDetail;
403
438
  return {
404
439
  kind: 'notice',
405
440
  callbackText: `Switched: ${displayId.slice(0, 12)}`,
406
441
  notice: {
407
442
  title: 'Session Switched',
408
443
  value: displayId,
409
- detail: summarizeSessionRun({ ...session, running: sessionStatus.isRunning }).noticeDetail,
444
+ detail,
410
445
  valueMode: 'code',
411
446
  },
412
447
  session: runtime,
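The notice detail assembled in the `session.switch` case appends whatever was restored to the run summary. A minimal sketch with a hypothetical `buildSwitchDetail` helper; `runDetail` stands in for the `noticeDetail` string that `summarizeSessionRun` returns.

```javascript
// Append "agent → …", "model → …", "effort → …" parts to the run summary.
function buildSwitchDetail(prevAgent, session, runDetail) {
  const restoreParts = [];
  if (prevAgent !== session.agent) restoreParts.push(`agent → ${session.agent}`);
  if (session.model) restoreParts.push(`model → ${session.model}`);
  if (session.thinkingEffort) restoreParts.push(`effort → ${session.thinkingEffort}`);
  return restoreParts.length ? `${runDetail} · ${restoreParts.join(' · ')}` : runDetail;
}
```

The agent part appears only when the switch actually crosses agents, which is the new case enabled by the workspace-wide lookup.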
@@ -198,12 +198,18 @@ export function summarizeSessionRun(session) {
198
198
  }
199
199
  export async function getSessionsPageData(bot, chatId, page, pageSize = 5) {
200
200
  const cs = bot.chat(chatId);
201
- const res = await bot.fetchSessions(cs.agent, bot.chatWorkdir(chatId));
201
+ // Workspace-wide: drop the cs.agent filter so the list matches what the
202
+ // dashboard shows for this workspace (all installed agents, sorted by
203
+ // most-recent activity).
204
+ const res = await bot.fetchSessions(undefined, bot.chatWorkdir(chatId));
202
205
  const sessions = res.ok ? res.sessions : [];
203
206
  const total = sessions.length;
204
207
  const totalPages = Math.max(1, Math.ceil(total / pageSize));
205
208
  const pg = Math.max(0, Math.min(page, totalPages - 1));
206
209
  const slice = sessions.slice(pg * pageSize, (pg + 1) * pageSize);
210
+ const agentTotals = {};
211
+ for (const s of sessions)
212
+ agentTotals[s.agent] = (agentTotals[s.agent] || 0) + 1;
207
213
  const entries = [];
208
214
  for (const s of slice) {
209
215
  const sessionKey = s.sessionId || '';
@@ -216,12 +222,13 @@ export async function getSessionsPageData(bot, chatId, page, pageSize = 5) {
216
222
  runDetail: s.runDetail,
217
223
  });
218
224
  const displayText = sessionListDisplayTitle(s);
219
- const title = displayText ? displayText.replace(/\n/g, ' ').slice(0, 20) : sessionKey.slice(0, 20);
225
+ const title = displayText ? displayText.replace(/\n/g, ' ').slice(0, 28) : sessionKey.slice(0, 28);
220
226
  const time = s.createdAt
221
227
  ? new Date(s.createdAt).toLocaleString('zh-CN', { timeZone: 'Asia/Shanghai', month: '2-digit', day: '2-digit', hour: '2-digit', minute: '2-digit' })
222
228
  : '?';
223
229
  entries.push({
224
230
  key: sessionKey,
231
+ agent: s.agent,
225
232
  title,
226
233
  time: `${time} · ${runSummary.shortLabel}`,
227
234
  isCurrent: status.isCurrent,
@@ -230,7 +237,14 @@ export async function getSessionsPageData(bot, chatId, page, pageSize = 5) {
230
237
  runDetail: s.runDetail,
231
238
  });
232
239
  }
233
- return { agent: cs.agent, total, page: pg, totalPages, sessions: entries };
240
+ return {
241
+ workspaceName: res.workspaceName || '',
242
+ agentTotals,
243
+ total,
244
+ page: pg,
245
+ totalPages,
246
+ sessions: entries,
247
+ };
234
248
  }
235
249
  export function extractLastSessionTurn(messages) {
236
250
  if (!messages.length)
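The paging arithmetic and the new per-agent tally in `getSessionsPageData` can be checked with a small standalone function. `paginateSessions` is a hypothetical name; the session objects only need an `agent` field for this sketch.

```javascript
// Clamp the requested page, slice it out, and count sessions per agent
// across the whole list (not just the visible page).
function paginateSessions(sessions, page, pageSize = 5) {
  const total = sessions.length;
  const totalPages = Math.max(1, Math.ceil(total / pageSize));
  const pg = Math.max(0, Math.min(page, totalPages - 1));
  const slice = sessions.slice(pg * pageSize, (pg + 1) * pageSize);
  const agentTotals = {};
  for (const s of sessions) {
    agentTotals[s.agent] = (agentTotals[s.agent] || 0) + 1;
  }
  return { total, page: pg, totalPages, slice, agentTotals };
}
```

Because the totals are computed over every session rather than the slice, the header chips stay stable while the user turns pages.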
@@ -98,27 +98,73 @@ export async function querySessions(opts) {
98
98
  // ---------------------------------------------------------------------------
99
99
  // Session detail queries
100
100
  // ---------------------------------------------------------------------------
101
- /**
102
- * Build a 1-2 message fallback transcript from the pikiclaw session record
103
- * for runs that crashed before the agent could write its own transcript file
104
- * (e.g. gemini auth failure, codex spawn failure). Without this the dashboard
105
- * detail panel would render blank for clearly-failed sessions.
106
- */
101
+ const IMAGE_EXTENSIONS = new Set(['.png', '.jpg', '.jpeg', '.gif', '.webp', '.bmp', '.svg']);
102
+ const MIME_BY_EXT = {
103
+ '.png': 'image/png', '.jpg': 'image/jpeg', '.jpeg': 'image/jpeg',
104
+ '.gif': 'image/gif', '.webp': 'image/webp', '.bmp': 'image/bmp', '.svg': 'image/svg+xml',
105
+ };
106
+ /** Build image MessageBlocks from a session record's `lastUserAttachments`
107
+ * (relative paths under `workspacePath`). Used by fallback paths so the
108
+ * dashboard can still render the user's image bubble while the agent CLI
109
+ * has not yet flushed the turn to its own session file. Non-image
110
+ * attachments are skipped — the fallback is text-first and doesn't try to
111
+ * reconstruct generic file references. */
112
+ function imageBlocksFromManagedRecord(record) {
113
+ const attachments = record.lastUserAttachments;
114
+ if (!attachments?.length)
115
+ return [];
116
+ const blocks = [];
117
+ for (const rel of attachments) {
118
+ const ext = path.extname(rel).toLowerCase();
119
+ if (!IMAGE_EXTENSIONS.has(ext))
120
+ continue;
121
+ const abs = path.isAbsolute(rel) ? rel : path.join(record.workspacePath, rel);
122
+ blocks.push({
123
+ type: 'image',
124
+ // `file://` sentinel — `rewriteImageBlocksForTransport` (dashboard
125
+ // response layer) converts it to a proper /attachment URL.
126
+ content: `file://${abs}`,
127
+ imagePath: abs,
128
+ imageMime: MIME_BY_EXT[ext] || 'application/octet-stream',
129
+ });
130
+ }
131
+ return blocks;
132
+ }
107
133
  function tailFallbackFromManagedRecord(opts) {
134
+ const fb = managedFallbackContent(opts);
135
+ if (!fb)
136
+ return null;
137
+ const limit = Math.max(1, opts.limit ?? fb.messages.length);
138
+ return { ok: true, messages: fb.messages.slice(-limit), error: null };
139
+ }
140
+ function managedFallbackContent(opts) {
108
141
  const record = findPikiclawSession(opts.workdir, opts.agent, opts.sessionId);
109
142
  if (!record)
110
143
  return null;
111
144
  const messages = [];
112
- if (record.lastQuestion)
113
- messages.push({ role: 'user', text: record.lastQuestion });
145
+ const richMessages = [];
146
+ if (record.lastQuestion) {
147
+ const text = record.lastQuestion;
148
+ messages.push({ role: 'user', text });
149
+ const blocks = text ? [{ type: 'text', content: text }] : [];
150
+ blocks.push(...imageBlocksFromManagedRecord(record));
151
+ if (blocks.length)
152
+ richMessages.push({ role: 'user', text, blocks, usage: null });
153
+ }
114
154
  const failureText = record.lastAnswer
115
155
  || (record.runState === 'incomplete' ? record.runDetail : null);
116
- if (failureText)
156
+ if (failureText) {
117
157
  messages.push({ role: 'assistant', text: failureText });
158
+ richMessages.push({
159
+ role: 'assistant',
160
+ text: failureText,
161
+ blocks: [{ type: 'text', content: failureText }],
162
+ usage: null,
163
+ });
164
+ }
118
165
  if (!messages.length)
119
166
  return null;
120
- const limit = Math.max(1, opts.limit ?? messages.length);
121
- return { ok: true, messages: messages.slice(-limit), error: null };
167
+ return { messages, richMessages };
122
168
  }
123
169
  /** Get recent messages from a session (tail). */
124
170
  export async function querySessionTail(opts) {
@@ -169,17 +215,20 @@ function collapseSkillPromptsInResult(result) {
169
215
  export async function querySessionMessages(opts) {
170
216
  const result = await _getSessionMessages(opts);
171
217
  if (!result.ok || !result.messages.length) {
172
- const fallback = tailFallbackFromManagedRecord({
218
+ const fb = managedFallbackContent({
173
219
  agent: opts.agent,
174
220
  sessionId: opts.sessionId,
175
221
  workdir: opts.workdir,
176
- limit: result.messages.length || undefined,
177
222
  });
178
- if (fallback) {
223
+ if (fb) {
224
+ const totalTurns = fb.messages.filter(m => m.role === 'user').length;
179
225
  return collapseSkillPromptsInResult({
180
226
  ok: true,
181
- messages: fallback.messages.map(m => ({ role: m.role, text: m.text })),
182
- totalTurns: fallback.messages.filter(m => m.role === 'user').length,
227
+ messages: fb.messages.map(m => ({ role: m.role, text: m.text })),
228
+ // Always emit richMessages so the dashboard can render image blocks
229
+ // for the first user turn while the agent CLI is still spinning up.
230
+ richMessages: fb.richMessages,
231
+ totalTurns,
183
232
  error: null,
184
233
  });
185
234
  }
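The hunk above splits the old fallback into a shared `managedFallbackContent` step that returns both plain `messages` and block-based `richMessages`. A minimal sketch of that assembly follows, assuming the record has already been looked up and the image blocks have already been built; `fallbackFromRecord` and the sample record are hypothetical names used only for illustration, not part of the package.

```javascript
// Hypothetical, simplified stand-in for managedFallbackContent: it receives the
// session record directly instead of calling findPikiclawSession, and takes
// pre-built image blocks instead of calling imageBlocksFromManagedRecord.
function fallbackFromRecord(record, imageBlocks = []) {
  const messages = [];
  const richMessages = [];

  if (record.lastQuestion) {
    const text = record.lastQuestion;
    messages.push({ role: 'user', text });
    // The rich variant carries the text plus any image attachments as blocks.
    const blocks = [{ type: 'text', content: text }, ...imageBlocks];
    richMessages.push({ role: 'user', text, blocks, usage: null });
  }

  // Prefer the stored answer; otherwise surface the run detail of an
  // incomplete run so the user sees why nothing came back.
  const failureText = record.lastAnswer
    || (record.runState === 'incomplete' ? record.runDetail : null);
  if (failureText) {
    messages.push({ role: 'assistant', text: failureText });
    richMessages.push({
      role: 'assistant',
      text: failureText,
      blocks: [{ type: 'text', content: failureText }],
      usage: null,
    });
  }

  return messages.length ? { messages, richMessages } : null;
}

// Sample data (made up for this sketch).
const fb = fallbackFromRecord(
  { lastQuestion: 'What is in this picture?', runState: 'incomplete', runDetail: 'Agent is still starting.' },
  [{ type: 'image', content: 'file:///work/shot.png', imagePath: '/work/shot.png', imageMime: 'image/png' }],
);
```

Returning both shapes from one function lets the tail query keep slicing plain `messages` while the full query forwards `richMessages` unchanged.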
@@ -364,11 +364,11 @@ export class DingtalkBot extends Bot {
364
364
  const d = await getSessionsPageData(this, ctx.chatId, 0, 100);
365
365
  const target = d.sessions[idx - 1];
366
366
  if (target) {
367
- const result = await this.fetchSessions(this.chat(ctx.chatId).agent, this.chatWorkdir(ctx.chatId));
367
+ const result = await this.fetchSessions(undefined, this.chatWorkdir(ctx.chatId));
368
368
  const session = result.sessions.find(s => s.sessionId === target.key);
369
369
  if (session) {
370
- this.adoptExistingSessionForChat(ctx.chatId, session);
371
- await ctx.reply(`Switched to session ${target.title}`);
370
+ this.resumeSessionForChat(ctx.chatId, session);
371
+ await ctx.reply(`Switched to [${session.agent}] ${target.title}`);
372
372
  }
373
373
  else {
374
374
  await ctx.reply('Session not found.');
@@ -387,7 +387,7 @@ export class DingtalkBot extends Bot {
387
387
  d.sessions.forEach((s, i) => {
388
388
  const mark = s.isCurrent ? ' ←' : '';
389
389
  const running = s.isRunning ? ' [running]' : '';
390
- lines.push(`${i + 1}. ${s.title} · ${s.time}${mark}${running}`);
390
+ lines.push(`${i + 1}. [${s.agent}] ${s.title} · ${s.time}${mark}${running}`);
391
391
  });
392
392
  lines.push('', 'Usage: /sessions new | /sessions <#>');
393
393
  await ctx.reply(lines.join('\n'));
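The two hunks above change `/sessions` so that the lookup spans all agents (the agent argument to `fetchSessions` is now `undefined`) and every list entry is prefixed with its agent. As a rough sketch of that flow, assuming sessions are plain objects with the fields seen in the diff; `formatSessionLines` and `resolveSelection` are hypothetical helpers, not functions from the package.

```javascript
// Hypothetical helper: builds the numbered list lines in the new format.
function formatSessionLines(sessions) {
  return sessions.map((s, i) => {
    const mark = s.isCurrent ? ' ←' : '';
    const running = s.isRunning ? ' [running]' : '';
    return `${i + 1}. [${s.agent}] ${s.title} · ${s.time}${mark}${running}`;
  });
}

// Hypothetical helper: maps a 1-based index from the listing to a full
// session record, matching on sessionId regardless of which agent owns it.
function resolveSelection(pageSessions, allSessions, idx) {
  const target = pageSessions[idx - 1];
  if (!target) return null;
  return allSessions.find(s => s.sessionId === target.key) ?? null;
}

// Sample data (made up for this sketch).
const page = [
  { key: 'a1', agent: 'claude', title: 'Fix login', time: '2m', isCurrent: true, isRunning: false },
  { key: 'b2', agent: 'codex', title: 'Add tests', time: '1h', isCurrent: false, isRunning: true },
];
const all = [
  { sessionId: 'a1', agent: 'claude' },
  { sessionId: 'b2', agent: 'codex' },
];
const listing = formatSessionLines(page);
const picked = resolveSelection(page, all, 2);
```

Because the listing mixes agents, the lookup has to ignore the chat's current agent; otherwise a session shown in the list could fail to resolve.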
@@ -360,11 +360,11 @@ export class DiscordBot extends Bot {
360
360
  const d = await getSessionsPageData(this, ctx.chatId, 0, 100);
361
361
  const target = d.sessions[idx - 1];
362
362
  if (target) {
363
- const result = await this.fetchSessions(this.chat(ctx.chatId).agent, this.chatWorkdir(ctx.chatId));
363
+ const result = await this.fetchSessions(undefined, this.chatWorkdir(ctx.chatId));
364
364
  const session = result.sessions.find(s => s.sessionId === target.key);
365
365
  if (session) {
366
- this.adoptExistingSessionForChat(ctx.chatId, session);
367
- await ctx.reply(`Switched to session ${target.title}`);
366
+ this.resumeSessionForChat(ctx.chatId, session);
367
+ await ctx.reply(`Switched to [${session.agent}] ${target.title}`);
368
368
  }
369
369
  else {
370
370
  await ctx.reply('Session not found.');
@@ -383,7 +383,7 @@ export class DiscordBot extends Bot {
383
383
  d.sessions.forEach((s, i) => {
384
384
  const mark = s.isCurrent ? ' ←' : '';
385
385
  const running = s.isRunning ? ' [running]' : '';
386
- lines.push(`${i + 1}. ${s.title} · ${s.time}${mark}${running}`);
386
+ lines.push(`${i + 1}. [${s.agent}] ${s.title} · ${s.time}${mark}${running}`);
387
387
  });
388
388
  lines.push('', 'Usage: /sessions new | /sessions <#>');
389
389
  await ctx.reply(lines.join('\n'));
@@ -309,8 +309,15 @@ export function renderStart(d) {
309
309
  return lines.join('\n');
310
310
  }
311
311
  export function renderSessionsPage(d) {
312
+ const agentChips = Object.entries(d.agentTotals)
313
+ .sort((a, b) => b[1] - a[1])
314
+ .map(([agent, count]) => `${agent}:${count}`)
315
+ .join(' · ');
316
+ const header = d.workspaceName
317
+ ? (agentChips ? `${d.workspaceName} · ${agentChips}` : d.workspaceName)
318
+ : (agentChips || 'sessions');
312
319
  const lines = [
313
- `**${d.agent} sessions** (${d.total}) p${d.page + 1}/${d.totalPages}`,
320
+ `**${header}** (${d.total}) p${d.page + 1}/${d.totalPages}`,
314
321
  '',
315
322
  ];
316
323
  if (!d.sessions.length) {
@@ -320,7 +327,7 @@ export function renderSessionsPage(d) {
320
327
  for (let i = 0; i < d.sessions.length; i++) {
321
328
  const s = d.sessions[i];
322
329
  const icon = s.isRunning ? '🟢' : s.isCurrent ? '●' : '○';
323
- lines.push(`${icon} **${i + 1}.** ${s.title} ${s.time}${s.isCurrent ? ' ← current' : ''}`);
330
+ lines.push(`${icon} **${i + 1}.** [${s.agent}] ${s.title} ${s.time}${s.isCurrent ? ' ← current' : ''}`);
324
331
  }
325
332
  lines.push('');
326
333
  lines.push('*Use the controls below to switch, or reply with session number / "new".*');
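The `renderSessionsPage` hunk above replaces the single-agent title with a workspace name followed by per-agent counts, sorted by count in descending order. The header logic can be isolated as below; `sessionsHeader` is a hypothetical wrapper around the expressions in the diff, and the input object is made-up sample data.

```javascript
// Hypothetical helper isolating the header construction from renderSessionsPage.
function sessionsHeader(d) {
  // Turn { agent: count } into "agent:count" chips, largest count first.
  const agentChips = Object.entries(d.agentTotals)
    .sort((a, b) => b[1] - a[1])
    .map(([agent, count]) => `${agent}:${count}`)
    .join(' · ');
  // Fall back step by step: workspace + chips, workspace only, chips only,
  // and finally the literal "sessions".
  const header = d.workspaceName
    ? (agentChips ? `${d.workspaceName} · ${agentChips}` : d.workspaceName)
    : (agentChips || 'sessions');
  return `**${header}** (${d.total}) p${d.page + 1}/${d.totalPages}`;
}

const withChips = sessionsHeader({
  workspaceName: 'demo',
  agentTotals: { codex: 2, claude: 5 },
  total: 7,
  page: 0,
  totalPages: 1,
});
const bare = sessionsHeader({ workspaceName: '', agentTotals: {}, total: 0, page: 0, totalPages: 1 });

console.log(withChips); // **demo · claude:5 · codex:2** (7) p1/1
console.log(bare);      // **sessions** (0) p1/1
```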
@@ -369,11 +369,11 @@ export class SlackBot extends Bot {
369
369
  const d = await getSessionsPageData(this, ctx.chatId, 0, 100);
370
370
  const target = d.sessions[idx - 1];
371
371
  if (target) {
372
- const result = await this.fetchSessions(this.chat(ctx.chatId).agent, this.chatWorkdir(ctx.chatId));
372
+ const result = await this.fetchSessions(undefined, this.chatWorkdir(ctx.chatId));
373
373
  const session = result.sessions.find(s => s.sessionId === target.key);
374
374
  if (session) {
375
- this.adoptExistingSessionForChat(ctx.chatId, session);
376
- await ctx.reply(`Switched to session ${target.title}`);
375
+ this.resumeSessionForChat(ctx.chatId, session);
376
+ await ctx.reply(`Switched to [${session.agent}] ${target.title}`);
377
377
  }
378
378
  else {
379
379
  await ctx.reply('Session not found.');
@@ -392,7 +392,7 @@ export class SlackBot extends Bot {
392
392
  d.sessions.forEach((s, i) => {
393
393
  const mark = s.isCurrent ? ' ←' : '';
394
394
  const running = s.isRunning ? ' [running]' : '';
395
- lines.push(`${i + 1}. ${s.title} · ${s.time}${mark}${running}`);
395
+ lines.push(`${i + 1}. [${s.agent}] ${s.title} · ${s.time}${mark}${running}`);
396
396
  });
397
397
  lines.push('', 'Usage: /sessions new | /sessions <#>');
398
398
  await ctx.reply(lines.join('\n'));
@@ -372,11 +372,11 @@ export class WeComBot extends Bot {
372
372
  const d = await getSessionsPageData(this, ctx.chatId, 0, 100);
373
373
  const target = d.sessions[idx - 1];
374
374
  if (target) {
375
- const result = await this.fetchSessions(this.chat(ctx.chatId).agent, this.chatWorkdir(ctx.chatId));
375
+ const result = await this.fetchSessions(undefined, this.chatWorkdir(ctx.chatId));
376
376
  const session = result.sessions.find(s => s.sessionId === target.key);
377
377
  if (session) {
378
- this.adoptExistingSessionForChat(ctx.chatId, session);
379
- await ctx.reply(`Switched to session ${target.title}`);
378
+ this.resumeSessionForChat(ctx.chatId, session);
379
+ await ctx.reply(`Switched to [${session.agent}] ${target.title}`);
380
380
  }
381
381
  else {
382
382
  await ctx.reply('Session not found.');
@@ -395,7 +395,7 @@ export class WeComBot extends Bot {
395
395
  d.sessions.forEach((s, i) => {
396
396
  const mark = s.isCurrent ? ' ←' : '';
397
397
  const running = s.isRunning ? ' [running]' : '';
398
- lines.push(`${i + 1}. ${s.title} · ${s.time}${mark}${running}`);
398
+ lines.push(`${i + 1}. [${s.agent}] ${s.title} · ${s.time}${mark}${running}`);
399
399
  });
400
400
  lines.push('', 'Usage: /sessions new | /sessions <#>');
401
401
  await ctx.reply(lines.join('\n'));
@@ -475,11 +475,11 @@ export class WeixinBot extends Bot {
475
475
  const d = await getSessionsPageData(this, ctx.chatId, 0, 100);
476
476
  const target = d.sessions[idx - 1];
477
477
  if (target) {
478
- const result = await this.fetchSessions(this.chat(ctx.chatId).agent, this.chatWorkdir(ctx.chatId));
478
+ const result = await this.fetchSessions(undefined, this.chatWorkdir(ctx.chatId));
479
479
  const session = result.sessions.find(s => s.sessionId === target.key);
480
480
  if (session) {
481
- this.adoptExistingSessionForChat(ctx.chatId, session);
482
- await ctx.reply(`Switched to session ${target.title}`);
481
+ this.resumeSessionForChat(ctx.chatId, session);
482
+ await ctx.reply(`Switched to [${session.agent}] ${target.title}`);
483
483
  }
484
484
  else {
485
485
  await ctx.reply(`Session not found.`);
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "pikiclaw",
3
- "version": "0.3.51",
3
+ "version": "0.3.53",
4
4
  "description": "Put the world's smartest AI agents in your pocket. Command local Claude & Gemini via IM. | 让最好用的 IM 变成你电脑上的顶级 Agent 控制台",
5
5
  "type": "module",
6
6
  "bin": {