pikiclaw 0.3.36 → 0.3.38

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (48)
  1. package/README.md +149 -149
  2. package/README.zh-CN.md +153 -153
  3. package/dashboard/dist/assets/AgentTab-BkZnuKaS.js +1 -0
  4. package/dashboard/dist/assets/{BrandIcon-DuvssI5P.js → BrandIcon-AD156ArP.js} +1 -1
  5. package/dashboard/dist/assets/{DirBrowser-CfaNqvEe.js → DirBrowser-BwzrHjjR.js} +1 -1
  6. package/dashboard/dist/assets/{ExtensionsTab-BXEctSc6.js → ExtensionsTab-BwH0CaMy.js} +1 -1
  7. package/dashboard/dist/assets/{IMAccessTab-DM1rsj-u.js → IMAccessTab-_0gPo7VO.js} +1 -1
  8. package/dashboard/dist/assets/{Modal-DHVhz66H.js → Modal-CdInMPb1.js} +1 -1
  9. package/dashboard/dist/assets/{Modals-Dhxu3hSi.js → Modals-DbI2X4wo.js} +1 -1
  10. package/dashboard/dist/assets/{Select-lWGnI7Cq.js → Select-BE8Z-HNs.js} +1 -1
  11. package/dashboard/dist/assets/SessionPanel-DUq59l08.js +1 -0
  12. package/dashboard/dist/assets/SystemTab-Bfw4XWnc.js +1 -0
  13. package/dashboard/dist/assets/index-5bbMY2wM.js +16 -0
  14. package/dashboard/dist/assets/index-BP_D94Yk.css +1 -0
  15. package/dashboard/dist/assets/index-C2q8FWKH.js +5 -0
  16. package/dashboard/dist/assets/router-Cav8lq-m.js +3 -0
  17. package/dashboard/dist/assets/{shared-DKIT06du.js → shared-DFRcspIO.js} +1 -1
  18. package/dashboard/dist/index.html +3 -3
  19. package/dist/agent/drivers/claude.js +11 -15
  20. package/dist/agent/drivers/gemini.js +3 -3
  21. package/dist/agent/index.js +1 -1
  22. package/dist/agent/mcp/bridge.js +195 -9
  23. package/dist/agent/mcp/session-server.js +9 -5
  24. package/dist/agent/mcp/tools/ask-user.js +113 -0
  25. package/dist/agent/session.js +13 -0
  26. package/dist/agent/stream.js +28 -8
  27. package/dist/agent/utils.js +24 -0
  28. package/dist/bot/bot.js +114 -36
  29. package/dist/browser-profile.js +42 -0
  30. package/dist/browser-supervisor.js +79 -11
  31. package/dist/catalog/local-models.js +112 -0
  32. package/dist/channels/weixin/bot.js +413 -79
  33. package/dist/cli/autostart.js +226 -0
  34. package/dist/cli/main.js +19 -1
  35. package/dist/core/platform.js +9 -4
  36. package/dist/dashboard/routes/local-models.js +276 -0
  37. package/dist/dashboard/routes/sessions.js +18 -1
  38. package/dist/dashboard/server.js +2 -0
  39. package/dist/dashboard/session-control.js +14 -0
  40. package/package.json +1 -1
  41. package/dashboard/dist/assets/AgentTab-B0k6567P.js +0 -1
  42. package/dashboard/dist/assets/PermissionsTab-B-KiKU8D.js +0 -1
  43. package/dashboard/dist/assets/SessionPanel-DL2E4VnO.js +0 -1
  44. package/dashboard/dist/assets/SystemTab-B6NbPTbB.js +0 -1
  45. package/dashboard/dist/assets/index-BWqrPOhO.js +0 -16
  46. package/dashboard/dist/assets/index-CmjPBA1x.js +0 -3
  47. package/dashboard/dist/assets/index-DgLQyCkc.css +0 -1
  48. package/dashboard/dist/assets/router-emLofBBH.js +0 -3
package/README.md CHANGED
@@ -6,7 +6,7 @@

  ##### *The open Agent orchestrator for the era when creators no longer need to read code.*

- *Plug in any agent (Claude · Codex · Gemini · Hermes · …), any model (Claude · GPT · Gemini · DeepSeek · 豆包 · MiMo · MiniMax · OpenRouter · or any third-party proxy), any tool (Skills · MCP · CLI). Drive them from any terminal — IM, Web, or future. Pikiclaw is built using pikiclaw.*
+ *Plug in any agent (Claude · Codex · Gemini · Hermes · …), any model (Claude · GPT · Gemini · DeepSeek · Doubao · MiMo · MiniMax · OpenRouter · or any third-party proxy), and any tool (Skills · MCP · CLI). Drive them seamlessly from your favorite terminal—whether it's an IM, Web Dashboard, or future interfaces. pikiclaw is built using pikiclaw.*

  ```bash
  npx pikiclaw@latest
@@ -32,12 +32,12 @@ npx pikiclaw@latest

  ## What is pikiclaw?

- **Most "AI dev tool" projects pick one slice — one IDE, one agent, one model vendor and stop there.** pikiclaw is built around a different bet: the next era of building does not happen inside a single editor. It happens through an **orchestrator** that lets a creator drive a *swarm* of agents in parallel, from one console — on the best models, through whatever terminal is closest at hand. And never open a code file.
+ **Most "AI dev tools" settle for a narrow slice of the pie, binding you to a single IDE, a specific agent, or a closed model ecosystem.** pikiclaw is built on a fundamentally different premise: the next era of software creation won't be confined to a single code editor. It happens within an **Orchestrator** that empowers a creator to drive a *swarm* of agents—in parallel, from one console—running on the best models available, through whichever terminal is closest at hand. And you might never need to open a code file.

- The product is the orchestrator. Everything else plugs in. **And the orchestrator is built using itself**: pikiclaw is what we use to build pikiclaw.
+ The product is the orchestrator itself. Everything else simply plugs in. **And what's cooler is that this orchestrator is entirely self-bootstrapped**—pikiclaw is what we use to build pikiclaw.

- ```
- Terminal layer Telegram · Feishu · WeChat · Slack · Discord · DingTalk · WeCom · Web Dashboard
+ ```text
+ Terminal Layer Telegram · Feishu · WeChat · Slack · Discord · DingTalk · WeCom · Web Dashboard
  \__________________________|__________________________/
  v
  ┌──────────────────────────────┐
@@ -46,88 +46,88 @@ The product is the orchestrator. Everything else plugs in. **And the orchestrato
  |
  ┌────────────────────────────────────────┼────────────────────────────────────────┐
  v v v
- Agent layer Model layer Tool layer
+ Agent Layer Model Layer Tool Layer
  Claude Code · Codex · Gemini · Hermes Claude · GPT · Gemini · DeepSeek Skills · MCP · CLI
- (driver registry · ACP · any agent) 豆包 · MiMo · MiniMax · OpenRouter (global × workspace)
+ (driver registry · ACP · any agent) Doubao · MiMo · MiniMax · OpenRouter (global × workspace)
  · any OpenAI-compatible proxy · …
  |
  v
- Your computer
+ Your Machine
  ```

- - **Terminal layer** — Telegram, Feishu, WeChat, Slack, Discord, DingTalk, WeCom, and the Web Dashboard are co-equal entry points. New terminals plug in here.
- - **Agent layer** — Official Claude Code / Codex / Gemini / Hermes CLIs as drivers. Hermes speaks ACP (Agent Client Protocol); the registry takes any agent.
- - **Model layer** — Claude / GPT / Gemini, the domestic Chinese series (DeepSeek, 豆包, MiMo, MiniMax), plus OpenRouter and any OpenAI-compatible proxy. Providers + Profiles are a first-class layer with their own credential vault, models.dev catalog, and per-agent injection.
- - **Tool layer** — Skills, MCP servers, and CLI tools merged across global and workspace scopes, injected into every session.
+ - **Terminal Layer** — Telegram, Feishu, WeChat, Slack, Discord, DingTalk, WeCom, and the Web Dashboard are all first-class, co-equal entry points. New terminals plug right in.
+ - **Agent Layer** — We use the official Claude Code, Codex, Gemini, and Hermes CLIs as underlying drivers. Hermes communicates via ACP (Agent Client Protocol); our flexible registry can accommodate virtually any agent.
+ - **Model Layer** — Access Claude, GPT, Gemini, leading Chinese domestic models (DeepSeek, Doubao, MiMo, MiniMax), plus OpenRouter and any OpenAI-compatible proxy. Providers and Profiles are treated as a first-class layer with their own credential vault, a read-only models.dev catalog, and per-agent environment injection.
+ - **Tool Layer** — Skills, MCP servers, and CLI tools are intelligently merged across global and workspace scopes, automatically injected into every session.

  ---

- ## Built with itself
+ ## Built with Itself

- > The most credible test of an Agent orchestrator is whether it can build itself. pikiclaw can. We use pikiclaw to develop, test, release, and operate pikiclaw — every commit, every release.
+ > The most credible test of an Agent orchestrator is whether it can build itself. pikiclaw can. We use pikiclaw to develop, test, release, and operate pikiclaw—driving every commit and every release.

- A typical day-of-development inside pikiclaw:
+ A typical day of development inside pikiclaw:

- - A Claude Code session in window 1 implements a new dashboard route.
- - A Codex session in window 2 writes the matching unit tests, against the same workspace.
- - A Gemini session in window 3 reviews the diff and drafts the changelog.
- - A skill (`/sk_promote`) sweeps GitHub for relevant issues and replies in a fourth thread.
- - All four streams run in parallel; one human steers them from a phone in a coffee shop.
+ - A Claude Code session in pane 1 implements a new dashboard route.
+ - A Codex session in pane 2 writes the matching unit tests against the same workspace.
+ - A Gemini session in pane 3 reviews the diffs and drafts the changelog.
+ - Meanwhile, a background skill (`/sk_promote`) sweeps GitHub for relevant issues and automatically drafts replies in a fourth thread.
+ - All four streams run entirely in parallel; a single human steers them all from a phone in a coffee shop.

- The orchestrator is the product. It also happens to be the IDE the orchestrator is built in.
+ The orchestrator is the product. It also happens to be the ultimate IDE in which the orchestrator itself is built.

  ---

- ## A swarm by default
+ ## A Swarm by Default

- Most "AI dev tools" assume one user, one agent, one task at a time. pikiclaw assumes the opposite: **N agents, N windows, one operator, one toolkit.**
+ Most "AI dev tools" assume a 1:1:1 ratio: one user, one agent, one task at a time. pikiclaw assumes the exact opposite: **N agents, N windows, one operator, one unified toolkit.**

- - **N parallel sessions** — every dashboard pane is an independent agent stream against an independent session workspace; IM threads add even more.
- - **Mix-and-match agents** — Claude Code in pane 1, Codex in pane 2, Gemini in pane 3, all on different repos / workspaces.
- - **One toolkit** — global skills, global MCP servers, and per-workspace overrides apply uniformly. You configure once; every session inherits.
- - **Steer anywhere** — interrupt any running stream, queue a follow-up, hand control to the next agent in line.
- - **Group-mode** — drop the orchestrator into a Feishu / Slack / Discord / WeCom group; teammates share the same swarm.
+ - **N Parallel Sessions** — Every dashboard pane represents an independent agent stream tied to an independent session workspace. Add IM threads, and you scale effortlessly.
+ - **Mix-and-Match Agents** — Run Claude Code in pane 1, Codex in pane 2, and Gemini in pane 3, all working simultaneously on different repositories or workspaces.
+ - **One Unified Toolkit** — Global skills, global MCP servers, and per-workspace overrides apply uniformly. Configure it once, and every session inherits the power.
+ - **Steer from Anywhere** — Interrupt any running stream, queue a follow-up instruction, or hand over control to the next agent in line seamlessly.
+ - **Group Collaboration Mode** — Drop the orchestrator into a Feishu, Slack, Discord, or WeCom group, and let your entire team share and steer the same agent swarm.

- This is the shape that matters: one creator, with a swarm at their fingertips.
+ This is the shape that matters: one creator, with a swarm of AI agents at their fingertips.

  ---

- ## See it in action
+ ## See It in Action

- > **Real task** — ask pikiclaw to gather and summarize today's AI news; the agent reads, writes, and ships the result back through Telegram, all from your phone.
+ > **Real-world Task** — Ask pikiclaw to gather and summarize today's AI news; the agent reads, writes, and ships the results back through Telegram, all controlled from your phone.

- <p align="center"><img src="docs/promo-demo.gif" alt="Demo: ask Telegram, agent works locally, result returns to chat" width="780"></p>
+ <p align="center"><img src="docs/promo-demo.gif" alt="Demo: Ask Telegram, agent works locally, result returns to chat" width="780"></p>

- > **Web Dashboard** — multi-pane workspace with session list, conversation, tool-use traces, and input composer (1 / 2 / 3 / 6 pane layouts).
+ > **Web Dashboard** — A multi-pane workspace featuring a session list, conversation threads, tool-use traces, and an input composer (supporting 1, 2, 3, or 6-pane layouts).

  <p align="center"><img src="docs/promo-dashboard-workspace.png" alt="Web Dashboard workspace" width="780"></p>

  <details>
- <summary><b>More: basic ops · IM access · agents · models · extensions · permissions · system info</b></summary>
+ <summary><b>More: Basic Ops · IM Access · Agents · Models · Extensions · Permissions · System Info</b></summary>

- > Send a message, watch the agent stream, receive files back.
+ > Send a message, watch the agent stream its thoughts, and receive files back instantly.

  <img src="docs/promo-basic-ops.gif" alt="Basic operations" width="780">

- > **IM Access** — Telegram, Feishu, WeChat, Slack, Discord, DingTalk, WeCom channel status and configuration
+ > **IM Access** — Check and configure connection statuses for Telegram, Feishu, WeChat, Slack, Discord, DingTalk, and WeCom.

  <img src="docs/promo-dashboard-im.png" alt="IM Access" width="780">

- > **Agents** — installed agent CLIs, default agent, per-agent model / reasoning effort
+ > **Agents** — Manage installed agent CLIs, set your default agent, and configure per-agent models and reasoning effort levels.

  <img src="docs/promo-dashboard-agents.png" alt="Agents" width="780">

- > **Models** — Providers + Profiles vault (Claude, GPT, Gemini, DeepSeek, 豆包, MiMo, MiniMax, OpenRouter, any OpenAI-compatible proxy), validated against models.dev catalog and injected per agent
+ > **Models** — A secure Providers + Profiles vault (supporting Claude, GPT, Gemini, DeepSeek, Doubao, MiMo, MiniMax, OpenRouter, and any OpenAI-compatible proxy), validated against the models.dev catalog and injected directly per agent.

- > **Extensions** — global MCP servers, community skills, managed browser + macOS desktop (Peekaboo) automation
+ > **Extensions** — Manage global MCP servers, community skills, and built-in automation for headless browsers and macOS desktop (Peekaboo).

  <img src="docs/promo-dashboard-extensions.png" alt="Extensions" width="780">

- > **System Permissions** — macOS accessibility, screen recording, disk access
+ > **System Permissions** — Handle macOS Accessibility, Screen Recording, and Disk Access permissions seamlessly.

  <img src="docs/promo-dashboard-permissions.png" alt="Permissions" width="780">

- > **System Info** — working directory, CPU / memory / disk monitoring
+ > **System Info** — Monitor your working directory alongside real-time CPU, memory, and disk usage.

  <img src="docs/promo-dashboard-system.png" alt="System Info" width="780">

@@ -135,9 +135,9 @@ This is the shape that matters: one creator, with a swarm at their fingertips.

  ---

- ## Quick start
+ ## Quick Start

- **Prereqs:** Node.js 20+, plus at least one official Agent CLI logged in:
+ **Prerequisites:** Node.js 20+, plus at least one official Agent CLI installed and authenticated on your system:

  - [`claude`](https://docs.anthropic.com/en/docs/claude-code) (Claude Code)
  - [`codex`](https://github.com/openai/codex) (Codex CLI)
@@ -153,147 +153,147 @@ npx pikiclaw@latest

  <p align="center"><img src="docs/promo-install.gif" alt="One-command install" width="780"></p>

- That opens the **Web Dashboard** at `http://localhost:3939`: drive sessions in the browser, connect IM channels, configure agents/models, install MCP servers and skills, manage system permissions. Everything else is one click away.
+ This instantly opens the **Web Dashboard** at `http://localhost:3939`. From there, you can drive sessions in the browser, connect IM channels, configure agents and models, install MCP servers and skills, and manage system permissions. Everything else is just one click away.

  <details>
- <summary><b>Prefer the terminal? There's a wizard.</b></summary>
+ <summary><b>Prefer the terminal? We have a setup wizard.</b></summary>

  ```bash
- npx pikiclaw@latest --setup # interactive terminal wizard
- npx pikiclaw@latest --doctor # environment check only
+ npx pikiclaw@latest --setup # Interactive terminal setup wizard
+ npx pikiclaw@latest --doctor # Environment health check only
  ```

  </details>

  ---

- ## What people do with it
+ ## How People Are Using It

- - **Run a swarm in parallel** — open N sessions in N dashboard panes (or N IM threads), each a different agent on a different workspace, all working at the same time. One person, many agents, one cockpit. Steer any of them at any moment.
- - **Self-hosted dev loop** — pikiclaw was built using pikiclaw. The dev workflow *is* the product: drive the orchestrator from your phone, write code, ship a release, iterate.
- - **Walk-away coding** — kick off a long refactor, close the laptop, drive it from your phone over Telegram. The agent keeps running locally; results stream back to chat.
- - **Multi-agent on one workspace** — let Claude Code draft an implementation, switch to Codex to review, then Gemini for a different perspective. Same files, same session history.
- - **Domestic-model routing** — run Claude Code over DeepSeek or 豆包 via a wrapper driver when latency, cost, or compliance demands a non-frontier model.
- - **Group-chat agent** — drop pikiclaw into a Feishu / Slack / Discord / WeCom work group; the team shares one orchestrator, one workspace, one set of skills.
- - **Computer-use, controlled by you** — toggle on the managed Chrome (Playwright) and macOS desktop (Peekaboo, via Accessibility + ScreenCaptureKit). The agent can `see` the screen, click, type, manage windows / menus / Dock and you steer it from any phone. Book a meeting, scrape a dashboard, run an end-to-end test, or drive any native macOS app.
- - **Skill-driven workflows** — install community skills (`promote`, `snipe`, `review`, `security-review`, …) once and trigger them from any terminal with `/sk_<name>`.
+ - **Run a Swarm in Parallel** — Open N sessions in N dashboard panes (or N IM threads), each running a different agent on a different workspace, all executing simultaneously. One person, many agents, one unified cockpit. Steer any of them at any moment.
+ - **Self-Hosted Dev Loop** — pikiclaw was built using pikiclaw. The dev workflow *is* the product: drive the orchestrator from your phone, write code, ship a release, and iterate.
+ - **Walk-Away Coding** — Kick off a massive refactoring task, close your laptop, and monitor/steer it from your phone over Telegram. The agent continues running locally, streaming results back to your chat.
+ - **Multi-Agent Tag Team** — Let Claude Code draft an initial implementation, switch to Codex for an in-depth review, and finally hand it over to Gemini for a fresh perspective. Same files, same continuous session history.
+ - **Domestic Model Routing** — When latency, cost, or compliance demands a non-frontier model, use a wrapper driver to run Claude Code effortlessly on DeepSeek or Doubao.
+ - **The Group Chat Agent** — Drop pikiclaw into a Feishu, Slack, Discord, or WeCom workgroup. The entire team shares one orchestrator, one project workspace, and a unified set of powerful skills.
+ - **Computer-Use, Controlled by You** — Enable the managed Chrome (Playwright) and macOS desktop (Peekaboo, via Accessibility + ScreenCaptureKit) capabilities. The agent can suddenly `see` the screen, click, type, and manage windows, menus, and the Dock—while you steer it from your phone. Book a meeting, scrape a complex dashboard, run end-to-end tests, or drive any native macOS application.
+ - **Skill-Driven Workflows** — Install community skills (`promote`, `snipe`, `review`, `security-review`, etc.) once, and trigger them instantly from any connected terminal using `/sk_<name>`.

  ---

- ## Features
+ ## Core Features

- ### Terminal layer
+ ### Terminal Layer

- - **Seven IM channels** — Telegram, Feishu, WeChat (personal), Slack, Discord, DingTalk, WeCom. Run one, several, or all simultaneously. Each channel is physically isolated; adding a new one (WhatsApp, mobile app, …) doesn't touch the others.
- - **Web Dashboard** — drive sessions directly from the browser with the same conversation, tool-use, and streaming surfaces as IM. Multi-pane workspace (1 / 2 / 3 / 6 panes), light / dark theme, EN / 中文 i18n.
- - **Live streaming preview** — message updates in place as the agent thinks; long text auto-splits; images and files stream back in real time.
+ - **Seven Native IM Channels** — Telegram, Feishu, WeChat (personal), Slack, Discord, DingTalk, and WeCom. Run one, several, or all of them simultaneously. Each channel is strictly isolated at the code level; adding a new one (like WhatsApp or a mobile app) requires zero changes to the others.
+ - **Web Dashboard** — Drive sessions directly from your browser with the exact same conversational flow, tool-use tracing, and streaming experience as IM. Enjoy a multi-pane workspace (1/2/3/6 panes), light/dark themes, and full EN/中文 i18n support.
+ - **Live Streaming Preview** — Watch messages update in place as the agent thinks. Long text auto-splits beautifully; images and files stream back to the UI in real time.

- ### Agent layer
+ ### Agent Layer

- - **Official CLIs as drivers** — Claude Code, Codex CLI, Gemini CLI, and Hermes (via ACP). No home-grown agent rewrite; you get upstream behavior on day-zero updates.
- - **ACP-native** — Hermes integrates through the [Agent Client Protocol](https://agentclientprotocol.com), spawning `hermes acp` over JSON-RPC stdio. Any future ACP-compatible agent plugs in the same way.
- - **Pluggable registry** — `src/agent/driver.ts` is the only contract. New CLI- or ACP-based agents drop in alongside the four built-ins.
- - **Per-session agent switching** — same workspace, swap the brain.
- - **Steer** — interrupt a running task and let a queued message jump ahead in the queue.
- - **Codex human-in-the-loop** — when Codex pauses to ask, the question becomes an interactive IM prompt. Reply there; the task continues.
- - **Persistent goals** — `/goal` sets a long-running objective per session with token budget and pause/resume; the agent self-terminates when it audits the goal complete.
+ - **Official CLIs as Drivers** — Powered directly by Claude Code, Codex CLI, Gemini CLI, and Hermes (via ACP). We don't rewrite the agent core—you inherit upstream capabilities and Day-0 updates automatically.
+ - **ACP-Native Architecture** — Hermes integrates natively through the [Agent Client Protocol](https://agentclientprotocol.com), spawning `hermes acp` over JSON-RPC stdio. Any future ACP-compatible agent plugs in the exact same way.
+ - **Pluggable Driver Registry** — The only contract is `src/agent/driver.ts`. New CLI- or ACP-based agents can drop right in alongside our four built-in drivers.
+ - **Per-Session Agent Switching** — Swap the "brain" on the fly without leaving your workspace.
+ - **Steer & Interrupt** — Interrupt a heavy running task and force a queued message to the front of the line.
+ - **Codex Human-in-the-Loop** — When Codex pauses to ask you a question, it forwards the prompt interactively to your IM. Reply directly in the chat, and the task resumes seamlessly.
+ - **Persistent Goals** — Use `/goal` to set a long-running, session-scoped objective complete with a token budget. Supports pause/resume, and the agent will autonomously self-terminate only when it verifies the goal is complete.

- ### Model layer
+ ### Model Layer

- - **Frontier + domestic + proxies** — Claude (4 family), GPT-5 / Codex, Gemini, DeepSeek, 豆包 (Doubao), MiMo, MiniMax, OpenRouter, and any OpenAI-compatible model proxy.
- - **Providers + Profiles vault** — first-class data model with its own credential store under `~/.pikiclaw/setting.json`. Browse a read-only models.dev catalog, validate keys with a real provider probe, then bind a profile to an agent so spawn-time env injection is automatic.
- - **Per-session model + reasoning effort** — picked from the dashboard, `/models`, or `/mode`.
- - **Per-agent injection** — `resolveAgentInjection(agentId)` applies the active profile's env vars at spawn time, so Claude Code can run on top of DeepSeek or Doubao without touching the upstream client config.
+ - **Frontier + Domestic + Proxies** — Supports the Claude 4 family, GPT-5 / Codex, Gemini, DeepSeek, Doubao, MiMo, MiniMax, OpenRouter, and any custom OpenAI-compatible proxy endpoint.
+ - **Providers & Profiles Vault** — A first-class data model that securely isolates credentials in `~/.pikiclaw/setting.json`. Browse a read-only models.dev catalog, validate keys with real provider probes, and bind a profile to an agent for automatic environment injection at spawn-time.
+ - **Per-Session Model & Reasoning Effort** — Switch models or adjust reasoning capabilities dynamically via the Dashboard, `/models`, or `/mode`.
+ - **Per-Agent Deep Injection** — `resolveAgentInjection(agentId)` forces the active profile's environment variables down at spawn time. This means you can run Claude Code on top of DeepSeek or Doubao without ever touching the upstream client's config.

- ### Tool layer
+ ### Tool Layer

- - **Skills** — project skills in `.pikiclaw/skills/*/SKILL.md`, compatible with `.claude/commands/*.md`. One-click install from GitHub repos (`owner/repo`) or browse recommended packs (Anthropic Official, Vercel Agent Skills, …). Trigger with `/skills` and `/sk_<name>`.
- - **MCP servers** — browse the [MCP Registry](https://registry.modelcontextprotocol.io), add custom stdio / HTTP servers, health-check with a real handshake, OAuth 2.1 with Dynamic Client Registration, enable per scope. Recommended catalog includes GitHub, Atlassian, Notion, Linear, Sentry, Cloudflare, Slack, Feishu/Lark, Stripe, Hugging Face, Gamma, Brave Search, Perplexity, Filesystem, SQLite, PostgreSQL, plus two built-in computer-use servers (`pikiclaw-browser` for Chrome via Playwright, `peekaboo` for macOS GUI via Peekaboo).
- - **CLI tools** — auto-detected with live version + auth state, OAuth-web login sessions for browser-based CLIs, all invoked through the agent's normal tool surface.
- - **Session-scoped MCP bridge** — `im_list_files`, `im_send_file`, `im_ask_user`, the managed-browser tools, and the macOS desktop tools (when enabled) are injected into every session automatically.
- - **Two-scope merge** — `global < workspace < built-in`, applied automatically to every session.
+ - **Robust Skills System** — Project-specific skills live safely in `.pikiclaw/skills/*/SKILL.md` (and we fully support legacy `.claude/commands/*.md` formats). Install community packages with one click from GitHub (`owner/repo`) or browse our curated packs (like Anthropic Official, Vercel Agent Skills, etc.). Trigger them anywhere with `/skills` and `/sk_<name>`.
+ - **Massive MCP Server Ecosystem** — Browse the [MCP Registry](https://registry.modelcontextprotocol.io), add custom stdio or HTTP servers, enforce real handshake health-checks, and utilize OAuth 2.1 with Dynamic Client Registration. Our recommended catalog flawlessly covers GitHub, Atlassian, Notion, Linear, Sentry, Cloudflare, Slack, Feishu/Lark, Stripe, Hugging Face, Gamma, Brave Search, Perplexity, Filesystem, SQLite, and PostgreSQL. Furthermore, we ship with two built-in, hyper-powerful computer-use servers: `pikiclaw-browser` (driving Chrome via Playwright) and `peekaboo` (driving the macOS GUI via Peekaboo).
+ - **Seamless CLI Tool Integration** — Auto-detects versions and authentication states for popular CLIs. We natively support OAuth-web login handoffs for browser-based authentications, routing everything smoothly through the agent's standard tool surface.
+ - **Session-Scoped MCP Bridge** — Foundational tools like `im_list_files`, `im_send_file`, `im_ask_user`, alongside the managed browser and macOS desktop tools (when enabled), are automatically injected deep into every single session you launch.
+ - **Two-Tier Merge Resolution** — Tool scopes follow a simple rule: `global < workspace < built-in`. The engine automatically resolves and merges these, applying them silently to every session.

  <p align="center"><img src="docs/promo-dashboard-extensions-add.png" alt="Add MCP server" width="780"></p>

- ### Runtime & DX
+ ### Runtime & Developer Experience

- - **Session workspace** — every session owns a directory; file attachments land there automatically.
- - **Resume, switch, classify** — multi-turn conversations, session classification (answer / proposal / implementation / blocked / …).
- - **Session-scoped MCP tools** — `im_list_files`, `im_send_file`, `im_ask_user`, and goal-management tools auto-injected into every stream.
- - **Computer-use (browser)** — built-in `pikiclaw-browser` MCP wraps `@playwright/mcp` with a shared Chrome profile and a process-level supervisor; log in once, reuse credentials across tasks.
- - **Computer-use (macOS desktop)** — built-in `peekaboo` MCP runs [Peekaboo](https://peekaboo.sh/) over Accessibility + ScreenCaptureKit; exposes `see`, `click`, `type`, `scroll`, `window`, `menu`, `app`, `dock`. Opt-in from Extensions; needs Accessibility + Screen Recording permissions. macOS only.
- - **Long-task hardening** — sleep prevention, watchdog, auto-restart, daemon mode, channel supervisor.
+ - **Dedicated Session Workspaces** — Every session gets its own isolated directory; file attachments and generated assets drop there automatically.
+ - **Resume, Switch, and Classify** — Flawless multi-turn conversation support with smart session classification (identifying answers, proposals, implementations, or blocked states).
+ - **Auto-Injected Base Tools** — Core MCP tools like file listing, sending, user prompting, and goal tracking are hard-wired into every stream.
+ - **Computer-Use (Browser Engine)** — The built-in `pikiclaw-browser` MCP is a hyper-charged wrapper over `@playwright/mcp`. It includes a process-level supervisor and shares an isolated Chrome profile. Log in to your tools once, and reuse those authenticated sessions across all future tasks!
+ - **Computer-Use (macOS Desktop)** — Enable the `peekaboo` MCP built-in server (macOS only) to unleash the [Peekaboo](https://peekaboo.sh/) framework over Accessibility and ScreenCaptureKit APIs. It exposes a god-mode suite of tools: `see`, `click`, `type`, `scroll`, `window`, `menu`, `app`, and `dock`. Requires explicit OS-level permissions but grants unprecedented control.
+ - **Hardened for Long Tasks** — Built with sleep prevention, watchdog timers, auto-restarts, daemon modes, and a robust channel supervisor. You can walk away knowing your marathon tasks are protected by an ironclad runtime.

227
227
  ---
228
228
 
229
- ## How is this different?
229
+ ## How Is This Different?
230
230
 
231
- | | pikiclaw | IDE assistants<br>(Cursor / Windsurf / Aider) | Cloud agents<br>(Devin / web Claude) | Single-agent IM bots |
231
+ | Feature | pikiclaw | IDE Assistants<br>(Cursor / Windsurf / Aider) | Cloud Agents<br>(Devin / Web Claude) | Single-Agent IM Bots |
232
232
  |---|---|---|---|---|
233
- | **Terminal** | 7 IM channels + Web + future plug-ins | IDE only | Web app | One IM, one bot |
234
- | **Where the agent runs** | Your machine | Your machine | Vendor sandbox | Often vendor |
235
- | **Agent choice** | Claude Code · Codex · Gemini · Hermes (ACP) · | Bundled | Single | Single |
236
- | **Model choice** | Frontier + domestic Chinese + any OpenAI-compatible | Vendor-controlled | Vendor-controlled | Single |
237
- | **Parallel agents** | **N agents × N windows × N workspaces** | One per IDE | Sequential | One |
238
- | **Files / tools** | Your files, your MCP, your CLIs | Your files | Sandbox | None / limited |
239
- | **Plug new terminal** | Add a `Channel` class | n/a | n/a | Fork |
240
- | **Plug new agent** | Add an `AgentDriver` (CLI or ACP) | n/a | n/a | Fork |
241
- | **Self-bootstrapping** | **Yes — built with itself** | No | No | No |
242
-
243
- The shape that matters: **you stay in your environment, you keep your choice of brain, you run a swarm in parallel, and the orchestrator is the same one we use to build the orchestrator.**
233
+ | **Terminal Access** | 7 IM channels + Web + Extensible | Locked inside the IDE | Confined to a Web app | One specific IM app |
234
+ | **Execution Environment** | Your local machine | Your local machine | Vendor's remote sandbox | Usually vendor servers |
235
+ | **Agent Flexibility** | Claude Code, Codex, Gemini, Hermes (ACP), etc. | Locked in | Single | Single |
236
+ | **Model Freedom** | Frontier models, domestic giants, OpenAI-proxies | Controlled by the platform | Controlled by the vendor | Single, hardcoded |
237
+ | **Concurrency Power** | **N Agents × N Windows × N Workspaces** | One agent per IDE window | Strictly sequential | Single thread |
238
+ | **Files & Tools Access** | Your entire local disk, your MCPs, your CLIs | Local project files | Heavily sandboxed | None or extremely limited |
239
+ | **Add a New Terminal** | Drop in a simple `Channel` class | Impossible | Impossible | Requires a hard fork |
240
+ | **Add a New Agent** | Implement a simple `AgentDriver` (CLI or ACP) | Impossible | Impossible | Requires a hard fork |
241
+ | **Self-Bootstrapping** | **Yes — completely built using itself** | No | No | No |
242
+
243
+ The shape that truly matters: **You never have to leave your preferred environment, you retain total choice over the "brain", you can drive a massive swarm in parallel, and the orchestrator is the exact same tool we use to build the orchestrator.**
244
244
 
245
245
  ---
246
246
 
247
- ## Commands
247
+ ## Command Reference
248
248
 
249
249
  | Command | Description |
250
250
  |---|---|
251
- | `/start` | Entry info, current agent, working directory |
252
- | `/sessions` | View, switch, or create sessions |
253
- | `/agents` | Switch agent (Claude · Codex · Gemini · Hermes) |
254
- | `/models` | View and switch model / reasoning effort |
255
- | `/mode` | Toggle plan mode (reasoning effort) |
256
- | `/switch` | Browse and switch working directory |
251
+ | `/start` | View entry info, the active agent, and your working directory |
252
+ | `/sessions` | View, switch, or create new sessions |
253
+ | `/agents` | Switch the active Agent (Claude · Codex · Gemini · Hermes) |
254
+ | `/models` | View and switch the model or reasoning effort for the session |
255
+ | `/mode` | Toggle plan mode / reasoning effort |
256
+ | `/switch` | Browse and switch the working directory |
257
257
  | `/workspaces` | Pick a saved workspace from the Dashboard's quick-pick list |
258
258
  | `/goal` | Set or inspect a long-running, self-terminating session goal |
259
- | `/stop` | Stop current session |
260
- | `/status` | Runtime status, tokens, usage, session info |
261
- | `/host` | Host CPU / memory / disk / battery |
262
- | `/skills` | Browse project skills |
263
- | `/ext` | Extensions overview |
264
- | `/restart` | Restart and re-launch bot |
265
- | `/sk_<name>` | Run a project skill |
259
+ | `/stop` | Force-stop the current session |
260
+ | `/status` | Check runtime status, token usage, resource consumption, and session info |
261
+ | `/host` | Monitor host CPU, memory, disk, and battery levels |
262
+ | `/skills` | Browse available project skills |
263
+ | `/ext` | View the extensions overview |
264
+ | `/restart` | Restart and re-launch the underlying Bot service |
265
+ | `/sk_<name>` | Instantly run a specific project skill |
266
266
 
267
- Plain text is forwarded to the current agent.
267
+ *Note: Plain text without a slash is forwarded directly to the current agent.*
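The dispatch rule described by the command table (slash commands are handled by pikiclaw, `/sk_<name>` runs a project skill, everything else goes to the agent) can be sketched roughly as below. This is an illustration only, not pikiclaw's actual router: `parseIncoming`, the `Incoming` type, and the way arguments after the command are split off are all assumptions made for the example.

```typescript
// Hypothetical sketch: classify an incoming IM message as a slash command,
// a project-skill invocation (/sk_<name>), or plain text for the agent.
type Incoming =
  | { kind: "command"; name: string; args: string }
  | { kind: "skill"; name: string; args: string }
  | { kind: "text"; text: string };

function parseIncoming(raw: string): Incoming {
  const text = raw.trim();
  // Anything that does not start with "/<word>" is forwarded to the current agent.
  const match = /^\/([A-Za-z0-9_-]+)(?:\s+([\s\S]*))?$/.exec(text);
  if (!match) return { kind: "text", text };
  const name = match[1] ?? "";
  const args = (match[2] ?? "").trim();
  // Assumption: the skill name is whatever follows the "sk_" prefix.
  if (name.startsWith("sk_") && name.length > 3) {
    return { kind: "skill", name: name.slice(3), args };
  }
  return { kind: "command", name, args };
}
```

Under these assumptions, `parseIncoming("/goal ship the release")` yields a `command` named `goal`, while a message such as `hello there` is returned as `text` and would be passed through to the current agent unchanged.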
268
268
 
269
269
  ---
270
270
 
271
271
  ## Configuration
272
272
 
273
- - Persistent config: `~/.pikiclaw/setting.json` (channels, agents, Providers/Profiles, workspaces, MCP extensions)
274
- - The Dashboard is the primary configuration surface; the terminal wizard (`--setup`) and `--doctor` exist for headless setups
275
- - Global MCP extensions live under `extensions.mcp` in the setting file
276
- - Workspace MCP extensions: standard `.mcp.json` in the project root
277
- - Project skills: `.pikiclaw/skills/*/SKILL.md` (also picks up `.claude/commands/*.md`)
273
+ - **Persistent Configuration:** `~/.pikiclaw/setting.json` stores your channels, agents, Providers/Profiles, workspaces, and MCP extensions.
274
+ - The **Dashboard** is the primary UI for configuration. The terminal wizard (`--setup`) and the doctor script (`--doctor`) are available for headless or CLI-first users.
275
+ - Global MCP extensions are stored under the `extensions.mcp` key in the setting file.
276
+ - Workspace MCP extensions follow standard conventions and are read from `.mcp.json` in the project root.
277
+ - Project skills are loaded automatically from `.pikiclaw/skills/*/SKILL.md` (and we also support legacy `.claude/commands/*.md` formats).
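As a rough illustration of the two skill locations listed above, the following sketch maps a project-relative path to a skill name. The function name `skillNameFromPath` is hypothetical, and taking the name from the part matched by `*` is an assumption; the README does not say how pikiclaw actually names discovered skills.

```typescript
// Hypothetical sketch: derive a skill name from a project-relative path.
// Assumption: the name is the directory (or file) name matched by "*".
function skillNameFromPath(relPath: string): string | null {
  const p = relPath.replace(/\\/g, "/"); // normalise Windows separators
  // Native layout: .pikiclaw/skills/<name>/SKILL.md
  const native = /^\.pikiclaw\/skills\/([^/]+)\/SKILL\.md$/.exec(p);
  if (native) return native[1] ?? null;
  // Also picked up: .claude/commands/<name>.md
  const legacy = /^\.claude\/commands\/([^/]+)\.md$/.exec(p);
  if (legacy) return legacy[1] ?? null;
  return null;
}
```

With this reading, `.pikiclaw/skills/deploy/SKILL.md` would correspond to a skill named `deploy`, which the command table suggests is run as `/sk_deploy`.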
278
278
 
279
- **Computer-use** is gated by two toggles under Extensions:
279
+ **Computer-Use Toggles** (managed via the Extensions dashboard):
280
280
 
281
- - `browserEnabled` — managed Chrome (Playwright). The first time an agent needs Chrome, pikiclaw creates a dedicated profile under `~/.pikiclaw` and reuses it across sessions. Log in to the sites you need once; every future session reuses those credentials.
282
- - `peekabooEnabled` — macOS desktop (Peekaboo). When on (macOS only), pikiclaw spawns `@steipete/peekaboo`'s `peekaboo-mcp` binary and injects its tools. Grant the parent terminal **Accessibility** and **Screen Recording** in System Settings → Privacy & Security before flipping the toggle.
281
+ - `browserEnabled` — Enables managed Chrome (Playwright). Upon first use, pikiclaw creates a dedicated profile in `~/.pikiclaw` and reuses it for subsequent sessions. Log in once, and never scan a QR code or enter a password for those tools again.
282
+ - `peekabooEnabled` — Enables macOS desktop automation (Peekaboo). Available on macOS only. Activating this launches `@steipete/peekaboo`'s `peekaboo-mcp` binary and injects its UI-controlling tools. *Note: You must grant your terminal **Accessibility** and **Screen Recording** permissions in System Settings → Privacy & Security before enabling this.*
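The gating described by the two toggles can be sketched as a small pure function. Only the toggle names (`browserEnabled`, `peekabooEnabled`) and the server names (`pikiclaw-browser`, `peekaboo`) come from the README; the `ComputerUseToggles` interface, the function name, and where these flags sit inside `setting.json` are assumptions for illustration.

```typescript
// Hypothetical shape: only the two toggle names come from the README.
interface ComputerUseToggles {
  browserEnabled?: boolean;
  peekabooEnabled?: boolean;
}

// Returns the built-in MCP servers that would be injected for a session.
// `platform` follows Node's process.platform values ("darwin" is macOS).
function builtinComputerUseServers(
  toggles: ComputerUseToggles,
  platform: string,
): string[] {
  const servers: string[] = [];
  if (toggles.browserEnabled) servers.push("pikiclaw-browser");
  // Peekaboo is macOS only, so the toggle is ignored on other platforms.
  if (toggles.peekabooEnabled && platform === "darwin") servers.push("peekaboo");
  return servers;
}
```

Keeping the platform as an explicit parameter makes the macOS-only rule easy to check without running on a Mac; a real caller would presumably pass the host platform.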
283
283
 
284
284
  ---
285
285
 
286
286
  ## Roadmap
287
287
 
288
- Already shipped: Hermes driver · ACP (Agent Client Protocol) · Provider/Profile model vault · seven IM channels · computer-use (Playwright browser + Peekaboo macOS desktop).
288
+ **Already Shipped:** Hermes driver integration · ACP (Agent Client Protocol) · Secure Provider/Profile vault · Seven native IM channels · Computer-use via Playwright and Peekaboo (macOS).
289
289
 
290
- - **More ACP agents** — every new ACP-compatible agent should drop in without a hand-written driver
291
- - **More terminals** — WhatsApp, dedicated mobile app, voice
292
- - **Deeper model layer** — agent-on-arbitrary-model wrappers for more domestic series
293
- - **Better tool ecosystem** — recommended MCP packs, skill templates, marketplace
294
- - **Cross-platform computer-use** — Windows / Linux desktop drivers alongside the macOS Peekaboo bridge
290
+ - **More ACP Agents** — Ensuring any new ACP-compatible agent can drop in with zero code changes.
291
+ - **Broader Terminal Ecosystem** — Adding support for WhatsApp, a dedicated mobile app, and voice interfaces.
292
+ - **Deeper Model Wrapping** — Building agent-on-arbitrary-model wrappers to support a wider array of domestic and open-source models seamlessly.
293
+ - **Richer Tool Ecosystem** — Releasing official MCP packs, skill templates, and a community marketplace.
294
+ - **Cross-Platform Computer-Use** — Extending desktop control drivers beyond macOS to support Windows and Linux.
295
295
 
296
- See [ACP Migration Plan](docs/acp-migration.md) for the protocol-side details.
296
+ For protocol-level insights, see our [ACP Migration Plan](docs/acp-migration.md).
297
297
 
298
298
  ---
299
299
 
@@ -308,43 +308,43 @@ npm test
308
308
  ```
309
309
 
310
310
  ```bash
311
- npm run dev # local dev (--no-daemon, logs to ~/.pikiclaw/dev/dev.log)
312
- npm run build # production build (dashboard + tsc)
313
- npm test # vitest run
314
- npx pikiclaw@latest --doctor # environment check
311
+ npm run dev # Start local dev server (--no-daemon, logs to ~/.pikiclaw/dev/dev.log)
312
+ npm run build # Production build (Dashboard + tsc)
313
+ npm test # Run Vitest suite
314
+ npx pikiclaw@latest --doctor # Environment health check
315
315
  ```
316
316
 
317
- Architecture and integration deep dives: [ARCHITECTURE.md](ARCHITECTURE.md) · [INTEGRATION.md](INTEGRATION.md) · [TESTING.md](TESTING.md)
317
+ For deep dives into the architecture and integration, see: [ARCHITECTURE.md](ARCHITECTURE.md) · [INTEGRATION.md](INTEGRATION.md) · [TESTING.md](TESTING.md).
318
318
 
319
319
  ---
320
320
 
321
321
  ## Contributing
322
322
 
323
- The project is built around layers that are *meant* to be extended. New terminals, new agents, new model wrappers, new MCP tools — all are first-class contributions.
323
+ Every layer of this project was designed from the ground up to be **extended**. Adding a new terminal, writing a new agent driver, wrapping a new model, or building a killer MCP tool: these are all first-class contributions.
324
324
 
325
- - Read the **[Contributing Guide](CONTRIBUTING.md)** to get started
326
- - Browse [`good first issue`](https://github.com/xiaotonng/pikiclaw/labels/good%20first%20issue) and [`help wanted`](https://github.com/xiaotonng/pikiclaw/labels/help%20wanted)
327
- - Open an issue first for larger changes so we can align on approach
325
+ - Read the **[Contributing Guide](CONTRIBUTING.md)** to get started.
326
+ - Check out issues tagged with [`good first issue`](https://github.com/xiaotonng/pikiclaw/labels/good%20first%20issue) and [`help wanted`](https://github.com/xiaotonng/pikiclaw/labels/help%20wanted).
327
+ - For major architectural changes, please open an issue first to align on the technical approach.
328
328
 
329
- | Where | What you'd add |
329
+ | Module | What You Can Extend |
330
330
  |---|---|
331
- | `src/agent/driver.ts`, `src/agent/drivers/*.ts`, `src/agent/acp-client.ts` | A new agent driver (CLI- or ACP-based) |
332
- | `src/channels/base.ts`, `src/channels/*/` | A new terminal / IM channel |
333
- | `src/model/`, `src/model/injector.ts` | A new model provider or per-agent injection rule |
334
- | `src/dashboard/routes/*.ts` | A new dashboard API surface |
335
- | `src/agent/mcp/tools/*.ts`, `src/agent/mcp/bridge.ts` | New session-scoped MCP tools |
336
- | `src/catalog/*.ts` | A recommended MCP server / CLI tool / skill repo |
331
+ | `src/agent/driver.ts`, `src/agent/drivers/*.ts`, `src/agent/acp-client.ts` | Add a new Agent Driver (CLI-based or ACP-compatible) |
332
+ | `src/channels/base.ts`, `src/channels/*/` | Integrate a new Terminal or IM channel |
333
+ | `src/model/`, `src/model/injector.ts` | Add a new model provider or customize agent environment injection rules |
334
+ | `src/dashboard/routes/*.ts` | Expand the Dashboard backend API |
335
+ | `src/agent/mcp/tools/*.ts`, `src/agent/mcp/bridge.ts` | Add new session-scoped MCP tools |
336
+ | `src/catalog/*.ts` | Recommend high-quality MCP servers, CLI tools, or Skill repositories |
337
337
 
338
338
  ---
339
339
 
340
- ## Star history
340
+ ## Star History
341
341
 
342
342
  <a href="https://www.star-history.com/#xiaotonng/pikiclaw&Date">
343
- <img src="https://api.star-history.com/svg?repos=xiaotonng/pikiclaw&type=Date" alt="Star history" width="640">
343
+ <img src="https://api.star-history.com/svg?repos=xiaotonng/pikiclaw&type=Date" alt="Star History" width="640">
344
344
  </a>
345
345
 
346
346
  ---
347
347
 
348
348
  ## License
349
349
 
350
- [MIT](LICENSE) — built in the open. Use it, fork it, plug your own layer in.
350
+ [MIT](LICENSE) — Built in the open. Use it, fork it, and plug in your own layers.