pikiclaw 0.3.36 → 0.3.37

Files changed (3)
  1. package/README.md +149 -149
  2. package/README.zh-CN.md +153 -153
  3. package/package.json +1 -1
package/README.md CHANGED
@@ -6,7 +6,7 @@
6
6
 
7
7
  ##### *The open Agent orchestrator for the era when creators no longer need to read code.*
8
8
 
9
- *Plug in any agent (Claude · Codex · Gemini · Hermes · …), any model (Claude · GPT · Gemini · DeepSeek · 豆包 · MiMo · MiniMax · OpenRouter · or any third-party proxy), any tool (Skills · MCP · CLI). Drive them from any terminal — IM, Web, or future. Pikiclaw is built using pikiclaw.*
9
+ *Plug in any agent (Claude · Codex · Gemini · Hermes · …), any model (Claude · GPT · Gemini · DeepSeek · Doubao · MiMo · MiniMax · OpenRouter · or any third-party proxy), and any tool (Skills · MCP · CLI). Drive them seamlessly from your favorite terminal—whether it's an IM, Web Dashboard, or future interfaces. pikiclaw is built using pikiclaw.*
10
10
 
11
11
  ```bash
12
12
  npx pikiclaw@latest
@@ -32,12 +32,12 @@ npx pikiclaw@latest
32
32
 
33
33
  ## What is pikiclaw?
34
34
 
35
- **Most "AI dev tool" projects pick one slice — one IDE, one agent, one model vendor, and stop there.** pikiclaw is built around a different bet: the next era of building does not happen inside a single editor. It happens through an **orchestrator** that lets a creator drive a *swarm* of agents in parallel, from one console — on the best models, through whatever terminal is closest at hand. And never open a code file.
35
+ **Most "AI dev tools" settle for a narrow slice of the pie, binding you to a single IDE, a specific agent, or a closed model ecosystem.** pikiclaw is built on a fundamentally different premise: the next era of software creation won't be confined to a single code editor. It happens within an **Orchestrator** that empowers a creator to drive a *swarm* of agents—in parallel, from one console—running on the best models available, through whichever terminal is closest at hand. And you might never need to open a code file.
36
36
 
37
- The product is the orchestrator. Everything else plugs in. **And the orchestrator is built using itself:** pikiclaw is what we use to build pikiclaw.
37
+ The product is the orchestrator itself. Everything else simply plugs in. **And what's cooler is that this orchestrator is entirely self-bootstrapped**—pikiclaw is what we use to build pikiclaw.
38
38
 
39
- ```
40
- Terminal layer Telegram · Feishu · WeChat · Slack · Discord · DingTalk · WeCom · Web Dashboard
39
+ ```text
40
+ Terminal Layer Telegram · Feishu · WeChat · Slack · Discord · DingTalk · WeCom · Web Dashboard
41
41
  \__________________________|__________________________/
42
42
  v
43
43
  ┌──────────────────────────────┐
@@ -46,88 +46,88 @@ The product is the orchestrator. Everything else plugs in. **And the orchestrato
46
46
  |
47
47
  ┌────────────────────────────────────────┼────────────────────────────────────────┐
48
48
  v v v
49
- Agent layer Model layer Tool layer
49
+ Agent Layer Model Layer Tool Layer
50
50
  Claude Code · Codex · Gemini · Hermes Claude · GPT · Gemini · DeepSeek Skills · MCP · CLI
51
- (driver registry · ACP · any agent) 豆包 · MiMo · MiniMax · OpenRouter (global × workspace)
51
+ (driver registry · ACP · any agent) Doubao · MiMo · MiniMax · OpenRouter (global × workspace)
52
52
  · any OpenAI-compatible proxy · …
53
53
  |
54
54
  v
55
- Your computer
55
+ Your Machine
56
56
  ```
57
57
 
58
- - **Terminal layer** — Telegram, Feishu, WeChat, Slack, Discord, DingTalk, WeCom, and the Web Dashboard are co-equal entry points. New terminals plug in here.
59
- - **Agent layer** — Official Claude Code / Codex / Gemini / Hermes CLIs as drivers. Hermes speaks ACP (Agent Client Protocol); the registry takes any agent.
60
- - **Model layer** — Claude / GPT / Gemini, the domestic Chinese series (DeepSeek, 豆包, MiMo, MiniMax), plus OpenRouter and any OpenAI-compatible proxy. Providers + Profiles are a first-class layer with their own credential vault, models.dev catalog, and per-agent injection.
61
- - **Tool layer** — Skills, MCP servers, and CLI tools merged across global and workspace scopes, injected into every session.
58
+ - **Terminal Layer** — Telegram, Feishu, WeChat, Slack, Discord, DingTalk, WeCom, and the Web Dashboard are all first-class, co-equal entry points. New terminals plug right in.
59
+ - **Agent Layer** — We use the official Claude Code, Codex, Gemini, and Hermes CLIs as underlying drivers. Hermes communicates via ACP (Agent Client Protocol); our flexible registry can accommodate virtually any agent.
60
+ - **Model Layer** — Access Claude, GPT, Gemini, leading Chinese domestic models (DeepSeek, Doubao, MiMo, MiniMax), plus OpenRouter and any OpenAI-compatible proxy. Providers and Profiles are treated as a first-class layer with their own credential vault, a read-only models.dev catalog, and per-agent environment injection.
61
+ - **Tool Layer** — Skills, MCP servers, and CLI tools are intelligently merged across global and workspace scopes, automatically injected into every session.
62
62
 
63
63
  ---
64
64
 
65
- ## Built with itself
65
+ ## Built with Itself
66
66
 
67
- > The most credible test of an Agent orchestrator is whether it can build itself. pikiclaw can. We use pikiclaw to develop, test, release, and operate pikiclaw — every commit, every release.
67
+ > The most credible test of an Agent orchestrator is whether it can build itself. pikiclaw can. We use pikiclaw to develop, test, release, and operate pikiclaw—driving every commit and every release.
68
68
 
69
- A typical day-of-development inside pikiclaw:
69
+ A typical day of development inside pikiclaw:
70
70
 
71
- - A Claude Code session in window 1 implements a new dashboard route.
72
- - A Codex session in window 2 writes the matching unit tests, against the same workspace.
73
- - A Gemini session in window 3 reviews the diff and drafts the changelog.
74
- - A skill (`/sk_promote`) sweeps GitHub for relevant issues and replies in a fourth thread.
75
- - All four streams run in parallel; one human steers them from a phone in a coffee shop.
71
+ - A Claude Code session in pane 1 implements a new dashboard route.
72
+ - A Codex session in pane 2 writes the matching unit tests against the same workspace.
73
+ - A Gemini session in pane 3 reviews the diffs and drafts the changelog.
74
+ - Meanwhile, a background skill (`/sk_promote`) sweeps GitHub for relevant issues and automatically drafts replies in a fourth thread.
75
+ - All four streams run entirely in parallel; a single human steers them all from a phone in a coffee shop.
76
76
 
77
- The orchestrator is the product. It also happens to be the IDE the orchestrator is built in.
77
+ The orchestrator is the product. It also happens to be the ultimate IDE in which the orchestrator itself is built.
78
78
 
79
79
  ---
80
80
 
81
- ## A swarm by default
81
+ ## A Swarm by Default
82
82
 
83
- Most "AI dev tools" assume one user, one agent, one task at a time. pikiclaw assumes the opposite: **N agents, N windows, one operator, one toolkit.**
83
+ Most "AI dev tools" assume a 1:1:1 ratio: one user, one agent, one task at a time. pikiclaw assumes the exact opposite: **N agents, N windows, one operator, one unified toolkit.**
84
84
 
85
- - **N parallel sessions** — every dashboard pane is an independent agent stream against an independent session workspace; IM threads add even more.
86
- - **Mix-and-match agents** — Claude Code in pane 1, Codex in pane 2, Gemini in pane 3, all on different repos / workspaces.
87
- - **One toolkit** — global skills, global MCP servers, and per-workspace overrides apply uniformly. You configure once; every session inherits.
88
- - **Steer anywhere** — interrupt any running stream, queue a follow-up, hand control to the next agent in line.
89
- - **Group-mode** — drop the orchestrator into a Feishu / Slack / Discord / WeCom group; teammates share the same swarm.
85
+ - **N Parallel Sessions** — Every dashboard pane represents an independent agent stream tied to an independent session workspace. Add IM threads, and you scale effortlessly.
86
+ - **Mix-and-Match Agents** — Run Claude Code in pane 1, Codex in pane 2, and Gemini in pane 3, all working simultaneously on different repositories or workspaces.
87
+ - **One Unified Toolkit** — Global skills, global MCP servers, and per-workspace overrides apply uniformly. Configure it once, and every session inherits the power.
88
+ - **Steer from Anywhere** — Interrupt any running stream, queue a follow-up instruction, or hand over control to the next agent in line seamlessly.
89
+ - **Group Collaboration Mode** — Drop the orchestrator into a Feishu, Slack, Discord, or WeCom group, and let your entire team share and steer the same agent swarm.
90
90
 
91
- This is the shape that matters: one creator, with a swarm at their fingertips.
91
+ This is the shape that matters: one creator, with a swarm of AI agents at their fingertips.
92
92
 
93
93
  ---
94
94
 
95
- ## See it in action
95
+ ## See It in Action
96
96
 
97
- > **Real task** — ask pikiclaw to gather and summarize today's AI news; the agent reads, writes, and ships the result back through Telegram, all from your phone.
97
+ > **Real-world Task** — Ask pikiclaw to gather and summarize today's AI news; the agent reads, writes, and ships the results back through Telegram, all controlled from your phone.
98
98
 
99
- <p align="center"><img src="docs/promo-demo.gif" alt="Demo: ask Telegram, agent works locally, result returns to chat" width="780"></p>
99
+ <p align="center"><img src="docs/promo-demo.gif" alt="Demo: Ask Telegram, agent works locally, result returns to chat" width="780"></p>
100
100
 
101
- > **Web Dashboard** — multi-pane workspace with session list, conversation, tool-use traces, and input composer (1 / 2 / 3 / 6 pane layouts).
101
+ > **Web Dashboard** — A multi-pane workspace featuring a session list, conversation threads, tool-use traces, and an input composer (supporting 1, 2, 3, or 6-pane layouts).
102
102
 
103
103
  <p align="center"><img src="docs/promo-dashboard-workspace.png" alt="Web Dashboard workspace" width="780"></p>
104
104
 
105
105
  <details>
106
- <summary><b>More: basic ops · IM access · agents · models · extensions · permissions · system info</b></summary>
106
+ <summary><b>More: Basic Ops · IM Access · Agents · Models · Extensions · Permissions · System Info</b></summary>
107
107
 
108
- > Send a message, watch the agent stream, receive files back.
108
+ > Send a message, watch the agent stream its thoughts, and receive files back instantly.
109
109
 
110
110
  <img src="docs/promo-basic-ops.gif" alt="Basic operations" width="780">
111
111
 
112
- > **IM Access** — Telegram, Feishu, WeChat, Slack, Discord, DingTalk, WeCom channel status and configuration
112
+ > **IM Access** — Check and configure connection statuses for Telegram, Feishu, WeChat, Slack, Discord, DingTalk, and WeCom.
113
113
 
114
114
  <img src="docs/promo-dashboard-im.png" alt="IM Access" width="780">
115
115
 
116
- > **Agents** — installed agent CLIs, default agent, per-agent model / reasoning effort
116
+ > **Agents** — Manage installed agent CLIs, set your default agent, and configure per-agent models and reasoning effort levels.
117
117
 
118
118
  <img src="docs/promo-dashboard-agents.png" alt="Agents" width="780">
119
119
 
120
- > **Models** — Providers + Profiles vault (Claude, GPT, Gemini, DeepSeek, 豆包, MiMo, MiniMax, OpenRouter, any OpenAI-compatible proxy), validated against models.dev catalog and injected per agent
120
+ > **Models** — A secure Providers + Profiles vault (supporting Claude, GPT, Gemini, DeepSeek, Doubao, MiMo, MiniMax, OpenRouter, and any OpenAI-compatible proxy), validated against the models.dev catalog and injected directly per agent.
121
121
 
122
- > **Extensions** — global MCP servers, community skills, managed browser + macOS desktop (Peekaboo) automation
122
+ > **Extensions** — Manage global MCP servers, community skills, and built-in automation for headless browsers and macOS desktop (Peekaboo).
123
123
 
124
124
  <img src="docs/promo-dashboard-extensions.png" alt="Extensions" width="780">
125
125
 
126
- > **System Permissions** — macOS accessibility, screen recording, disk access
126
+ > **System Permissions** — Handle macOS Accessibility, Screen Recording, and Disk Access permissions seamlessly.
127
127
 
128
128
  <img src="docs/promo-dashboard-permissions.png" alt="Permissions" width="780">
129
129
 
130
- > **System Info** — working directory, CPU / memory / disk monitoring
130
+ > **System Info** — Monitor your working directory alongside real-time CPU, memory, and disk usage.
131
131
 
132
132
  <img src="docs/promo-dashboard-system.png" alt="System Info" width="780">
133
133
 
@@ -135,9 +135,9 @@ This is the shape that matters: one creator, with a swarm at their fingertips.
135
135
 
136
136
  ---
137
137
 
138
- ## Quick start
138
+ ## Quick Start
139
139
 
140
- **Prereqs:** Node.js 20+, plus at least one official Agent CLI logged in:
140
+ **Prerequisites:** Node.js 20+, plus at least one official Agent CLI installed and authenticated on your system:
141
141
 
142
142
  - [`claude`](https://docs.anthropic.com/en/docs/claude-code) (Claude Code)
143
143
  - [`codex`](https://github.com/openai/codex) (Codex CLI)
@@ -153,147 +153,147 @@ npx pikiclaw@latest
153
153
 
154
154
  <p align="center"><img src="docs/promo-install.gif" alt="One-command install" width="780"></p>
155
155
 
156
- That opens the **Web Dashboard** at `http://localhost:3939`: drive sessions in the browser, connect IM channels, configure agents/models, install MCP servers and skills, manage system permissions. Everything else is one click away.
156
+ This instantly opens the **Web Dashboard** at `http://localhost:3939`. From there, you can drive sessions in the browser, connect IM channels, configure agents and models, install MCP servers and skills, and manage system permissions. Everything else is just one click away.
157
157
 
158
158
  <details>
159
- <summary><b>Prefer the terminal? There's a wizard.</b></summary>
159
+ <summary><b>Prefer the terminal? We have a setup wizard.</b></summary>
160
160
 
161
161
  ```bash
162
- npx pikiclaw@latest --setup # interactive terminal wizard
163
- npx pikiclaw@latest --doctor # environment check only
162
+ npx pikiclaw@latest --setup # Interactive terminal setup wizard
163
+ npx pikiclaw@latest --doctor # Environment health check only
164
164
  ```
165
165
 
166
166
  </details>
167
167
 
168
168
  ---
169
169
 
170
- ## What people do with it
170
+ ## How People Are Using It
171
171
 
172
- - **Run a swarm in parallel** — open N sessions in N dashboard panes (or N IM threads), each a different agent on a different workspace, all working at the same time. One person, many agents, one cockpit. Steer any of them at any moment.
173
- - **Self-hosted dev loop** — pikiclaw was built using pikiclaw. The dev workflow *is* the product: drive the orchestrator from your phone, write code, ship a release, iterate.
174
- - **Walk-away coding** — kick off a long refactor, close the laptop, drive it from your phone over Telegram. The agent keeps running locally; results stream back to chat.
175
- - **Multi-agent on one workspace** — let Claude Code draft an implementation, switch to Codex to review, then Gemini for a different perspective. Same files, same session history.
176
- - **Domestic-model routing** — run Claude Code over DeepSeek or 豆包 via a wrapper driver when latency, cost, or compliance demands a non-frontier model.
177
- - **Group-chat agent** — drop pikiclaw into a Feishu / Slack / Discord / WeCom work group; the team shares one orchestrator, one workspace, one set of skills.
178
- - **Computer-use, controlled by you** — toggle on the managed Chrome (Playwright) and macOS desktop (Peekaboo, via Accessibility + ScreenCaptureKit). The agent can `see` the screen, click, type, manage windows / menus / Dock, and you steer it from any phone. Book a meeting, scrape a dashboard, run an end-to-end test, or drive any native macOS app.
179
- - **Skill-driven workflows** — install community skills (`promote`, `snipe`, `review`, `security-review`, …) once and trigger them from any terminal with `/sk_<name>`.
172
+ - **Run a Swarm in Parallel** — Open N sessions in N dashboard panes (or N IM threads), each running a different agent on a different workspace, all executing simultaneously. One person, many agents, one unified cockpit. Steer any of them at any moment.
173
+ - **Self-Hosted Dev Loop** — pikiclaw was built using pikiclaw. The dev workflow *is* the product: drive the orchestrator from your phone, write code, ship a release, and iterate.
174
+ - **Walk-Away Coding** — Kick off a massive refactoring task, close your laptop, and monitor/steer it from your phone over Telegram. The agent continues running locally, streaming results back to your chat.
175
+ - **Multi-Agent Tag Team** — Let Claude Code draft an initial implementation, switch to Codex for an in-depth review, and finally hand it over to Gemini for a fresh perspective. Same files, same continuous session history.
176
+ - **Domestic Model Routing** — When latency, cost, or compliance demands a non-frontier model, use a wrapper driver to run Claude Code effortlessly on DeepSeek or Doubao.
177
+ - **The Group Chat Agent** — Drop pikiclaw into a Feishu, Slack, Discord, or WeCom workgroup. The entire team shares one orchestrator, one project workspace, and a unified set of powerful skills.
178
+ - **Computer-Use, Controlled by You** — Enable the managed Chrome (Playwright) and macOS desktop (Peekaboo, via Accessibility + ScreenCaptureKit) capabilities. The agent can suddenly `see` the screen, click, type, and manage windows, menus, and the Dock—while you steer it from your phone. Book a meeting, scrape a complex dashboard, run end-to-end tests, or drive any native macOS application.
179
+ - **Skill-Driven Workflows** — Install community skills (`promote`, `snipe`, `review`, `security-review`, etc.) once, and trigger them instantly from any connected terminal using `/sk_<name>`.
180
180
 
181
181
  ---
182
182
 
183
- ## Features
183
+ ## Core Features
184
184
 
185
- ### Terminal layer
185
+ ### Terminal Layer
186
186
 
187
- - **Seven IM channels** — Telegram, Feishu, WeChat (personal), Slack, Discord, DingTalk, WeCom. Run one, several, or all simultaneously. Each channel is physically isolated; adding a new one (WhatsApp, mobile app, …) doesn't touch the others.
188
- - **Web Dashboard** — drive sessions directly from the browser with the same conversation, tool-use, and streaming surfaces as IM. Multi-pane workspace (1 / 2 / 3 / 6 panes), light / dark theme, EN / 中文 i18n.
189
- - **Live streaming preview** — message updates in place as the agent thinks; long text auto-splits; images and files stream back in real time.
187
+ - **Seven Native IM Channels** — Telegram, Feishu, WeChat (personal), Slack, Discord, DingTalk, and WeCom. Run one, several, or all of them simultaneously. Each channel is strictly isolated at the code level; adding a new one (like WhatsApp or a mobile app) requires zero changes to the others.
188
+ - **Web Dashboard** — Drive sessions directly from your browser with the exact same conversational flow, tool-use tracing, and streaming experience as IM. Enjoy a multi-pane workspace (1/2/3/6 panes), light/dark themes, and full EN/中文 i18n support.
189
+ - **Live Streaming Preview** — Watch messages update in place as the agent thinks. Long text auto-splits beautifully; images and files stream back to the UI in real time.
190
190
 
191
- ### Agent layer
191
+ ### Agent Layer
192
192
 
193
- - **Official CLIs as drivers** — Claude Code, Codex CLI, Gemini CLI, and Hermes (via ACP). No home-grown agent rewrite; you get upstream behavior on day-zero updates.
194
- - **ACP-native** — Hermes integrates through the [Agent Client Protocol](https://agentclientprotocol.com), spawning `hermes acp` over JSON-RPC stdio. Any future ACP-compatible agent plugs in the same way.
195
- - **Pluggable registry** — `src/agent/driver.ts` is the only contract. New CLI- or ACP-based agents drop in alongside the four built-ins.
196
- - **Per-session agent switching** — same workspace, swap the brain.
197
- - **Steer** — interrupt a running task and let a queued message jump ahead in the queue.
198
- - **Codex human-in-the-loop** — when Codex pauses to ask, the question becomes an interactive IM prompt. Reply there; the task continues.
199
- - **Persistent goals** — `/goal` sets a long-running objective per session with token budget and pause/resume; the agent self-terminates when it audits the goal complete.
193
+ - **Official CLIs as Drivers** — Powered directly by Claude Code, Codex CLI, Gemini CLI, and Hermes (via ACP). We don't rewrite the agent core—you inherit upstream capabilities and Day-0 updates automatically.
194
+ - **ACP-Native Architecture** — Hermes integrates natively through the [Agent Client Protocol](https://agentclientprotocol.com), spawning `hermes acp` over JSON-RPC stdio. Any future ACP-compatible agent plugs in the exact same way.
195
+ - **Pluggable Driver Registry** — The only contract is `src/agent/driver.ts`. New CLI- or ACP-based agents can drop right in alongside our four built-in drivers.
196
+ - **Per-Session Agent Switching** — Swap the "brain" on the fly without leaving your workspace.
197
+ - **Steer & Interrupt** — Interrupt a heavy running task and force a queued message to the front of the line.
198
+ - **Codex Human-in-the-Loop** — When Codex pauses to ask you a question, it forwards the prompt interactively to your IM. Reply directly in the chat, and the task resumes seamlessly.
199
+ - **Persistent Goals** — Use `/goal` to set a long-running, session-scoped objective complete with a token budget. Supports pause/resume, and the agent will autonomously self-terminate only when it verifies the goal is complete.
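
The steer behavior in the list above (interrupt a running task and let a queued message jump ahead) can be pictured as a small queue operation. The following is a minimal TypeScript sketch under stated assumptions: `SessionQueue` and all of its methods are hypothetical names invented for illustration, and nothing here is taken from pikiclaw's source.

```typescript
// Hypothetical per-session message queue (illustrative only, not pikiclaw's real code).
class SessionQueue {
  private pending: string[] = [];
  private running: string | null = null;
  private interruptedLog: string[] = [];

  // Add a message to the back of the queue.
  enqueue(message: string): void {
    this.pending.push(message);
  }

  // If idle, start the message at the front of the queue; return what is running.
  next(): string | null {
    if (this.running === null) {
      const head = this.pending.shift();
      this.running = head === undefined ? null : head;
    }
    return this.running;
  }

  // Steer: move a queued message to the front, interrupt the running task, start it.
  steer(message: string): string | null {
    const index = this.pending.indexOf(message);
    if (index === -1) return this.running; // not queued: leave everything as-is
    this.pending.splice(index, 1);
    this.pending.unshift(message);
    if (this.running !== null) {
      this.interruptedLog.push(this.running);
      this.running = null;
    }
    return this.next();
  }

  // Read-only views used by the example below.
  pendingMessages(): string[] {
    return [...this.pending];
  }
  interrupted(): string[] {
    return [...this.interruptedLog];
  }
}

const queue = new SessionQueue();
queue.enqueue("refactor the router");
queue.enqueue("write unit tests");
queue.enqueue("fix the typo in the README");
const first = queue.next(); // the first queued message starts running
const steered = queue.steer("fix the typo in the README"); // jumps the queue
```

In this sketch the interrupted task is only recorded, not re-queued; whether a real orchestrator resumes, re-queues, or drops the interrupted work is a design choice the README does not specify.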
200
200
 
201
- ### Model layer
201
+ ### Model Layer
202
202
 
203
- - **Frontier + domestic + proxies** — Claude (4 family), GPT-5 / Codex, Gemini, DeepSeek, 豆包 (Doubao), MiMo, MiniMax, OpenRouter, and any OpenAI-compatible model proxy.
204
- - **Providers + Profiles vault** — first-class data model with its own credential store under `~/.pikiclaw/setting.json`. Browse a read-only models.dev catalog, validate keys with a real provider probe, then bind a profile to an agent so spawn-time env injection is automatic.
205
- - **Per-session model + reasoning effort** — picked from the dashboard, `/models`, or `/mode`.
206
- - **Per-agent injection** — `resolveAgentInjection(agentId)` applies the active profile's env vars at spawn time, so Claude Code can run on top of DeepSeek or Doubao without touching the upstream client config.
203
+ - **Frontier + Domestic + Proxies** — Supports the Claude 4 family, GPT-5 / Codex, Gemini, DeepSeek, Doubao, MiMo, MiniMax, OpenRouter, and any custom OpenAI-compatible proxy endpoint.
204
+ - **Providers & Profiles Vault** — A first-class data model that securely isolates credentials in `~/.pikiclaw/setting.json`. Browse a read-only models.dev catalog, validate keys with real provider probes, and bind a profile to an agent for automatic environment injection at spawn-time.
205
+ - **Per-Session Model & Reasoning Effort** — Switch models or adjust reasoning capabilities dynamically via the Dashboard, `/models`, or `/mode`.
206
+ - **Per-Agent Deep Injection** — `resolveAgentInjection(agentId)` forces the active profile's environment variables down at spawn time. This means you can run Claude Code on top of DeepSeek or Doubao without ever touching the upstream client's config.
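
Spawn-time environment injection, as described in the list above, amounts to overlaying a profile's variables on the parent environment before the agent process starts. The sketch below illustrates only that idea in TypeScript. The `Profile` shape and `buildSpawnEnv` are hypothetical, the variable names and the endpoint are placeholders, and the real `resolveAgentInjection(agentId)` may work quite differently.

```typescript
// Hypothetical types: the README does not show pikiclaw's real profile schema.
type EnvMap = Record<string, string>;

interface Profile {
  id: string;
  env: EnvMap; // variables the profile wants the agent process to see
}

// Overlay the active profile's variables on the parent environment.
// Profile values win on conflict; the parent map is not mutated.
function buildSpawnEnv(parentEnv: EnvMap, profile?: Profile): EnvMap {
  return { ...parentEnv, ...(profile ? profile.env : {}) };
}

// Assumed variable names and a placeholder endpoint, for illustration only.
const parentEnv: EnvMap = {
  PATH: "/usr/bin",
  ANTHROPIC_BASE_URL: "https://api.anthropic.com",
};
const proxyProfile: Profile = {
  id: "example-proxy",
  env: {
    ANTHROPIC_BASE_URL: "https://proxy.example.invalid",
    ANTHROPIC_AUTH_TOKEN: "sk-placeholder",
  },
};

const spawnEnv = buildSpawnEnv(parentEnv, proxyProfile);
const untouchedEnv = buildSpawnEnv(parentEnv);
```

Building a fresh map per spawn, rather than editing a shared one, is what would let two sessions run the same agent CLI against different providers at the same time without touching the CLI's own config files.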
207
207
 
208
- ### Tool layer
208
+ ### Tool Layer
209
209
 
210
- - **Skills** — project skills in `.pikiclaw/skills/*/SKILL.md`, compatible with `.claude/commands/*.md`. One-click install from GitHub repos (`owner/repo`) or browse recommended packs (Anthropic Official, Vercel Agent Skills, …). Trigger with `/skills` and `/sk_<name>`.
211
- - **MCP servers** — browse the [MCP Registry](https://registry.modelcontextprotocol.io), add custom stdio / HTTP servers, health-check with a real handshake, OAuth 2.1 with Dynamic Client Registration, enable per scope. Recommended catalog includes GitHub, Atlassian, Notion, Linear, Sentry, Cloudflare, Slack, Feishu/Lark, Stripe, Hugging Face, Gamma, Brave Search, Perplexity, Filesystem, SQLite, PostgreSQL, plus two built-in computer-use servers (`pikiclaw-browser` for Chrome via Playwright, `peekaboo` for macOS GUI via Peekaboo).
212
- - **CLI tools** — auto-detected with live version + auth state, OAuth-web login sessions for browser-based CLIs, all invoked through the agent's normal tool surface.
213
- - **Session-scoped MCP bridge** — `im_list_files`, `im_send_file`, `im_ask_user`, the managed-browser tools, and the macOS desktop tools (when enabled) are injected into every session automatically.
214
- - **Two-scope merge** — `global < workspace < built-in`, applied automatically to every session.
210
+ - **Robust Skills System** — Project-specific skills live safely in `.pikiclaw/skills/*/SKILL.md` (and we fully support legacy `.claude/commands/*.md` formats). Install community packages with one click from GitHub (`owner/repo`) or browse our curated packs (like Anthropic Official, Vercel Agent Skills, etc.). Trigger them anywhere with `/skills` and `/sk_<name>`.
211
+ - **Massive MCP Server Ecosystem** — Browse the [MCP Registry](https://registry.modelcontextprotocol.io), add custom stdio or HTTP servers, enforce real handshake health-checks, and utilize OAuth 2.1 with Dynamic Client Registration. Our recommended catalog flawlessly covers GitHub, Atlassian, Notion, Linear, Sentry, Cloudflare, Slack, Feishu/Lark, Stripe, Hugging Face, Gamma, Brave Search, Perplexity, Filesystem, SQLite, and PostgreSQL. Furthermore, we ship with two built-in, hyper-powerful computer-use servers: `pikiclaw-browser` (driving Chrome via Playwright) and `peekaboo` (driving the macOS GUI via Peekaboo).
212
+ - **Seamless CLI Tool Integration** — Auto-detects versions and authentication states for popular CLIs. We natively support OAuth-web login handoffs for browser-based authentications, routing everything smoothly through the agent's standard tool surface.
213
+ - **Session-Scoped MCP Bridge** — Foundational tools like `im_list_files`, `im_send_file`, `im_ask_user`, alongside the managed browser and macOS desktop tools (when enabled), are automatically injected deep into every single session you launch.
214
+ - **Two-Tier Merge Resolution** — Tool scopes follow a simple rule: `global < workspace < built-in`. The engine automatically resolves and merges these, applying them silently to every session.
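
The precedence rule `global < workspace < built-in` in the list above can be read as a key-wise merge in which later scopes overwrite earlier ones. Here is a minimal TypeScript sketch of that reading. `ToolEntry`, `ToolScope`, `mergeScopes`, and the example commands are all hypothetical and are not drawn from pikiclaw's code.

```typescript
// Hypothetical shape of a tool entry, keyed by tool name within a scope.
interface ToolEntry {
  source: "global" | "workspace" | "built-in";
  command: string;
}
type ToolScope = Record<string, ToolEntry>;

// Merge scopes given from lowest to highest precedence.
// For a tool name present in several scopes, the last scope listed wins.
function mergeScopes(...scopesLowToHigh: ToolScope[]): ToolScope {
  return scopesLowToHigh.reduce<ToolScope>(
    (merged, scope) => ({ ...merged, ...scope }),
    {}
  );
}

// Illustrative entries only; the commands are made up.
const globalScope: ToolScope = {
  github: { source: "global", command: "github-mcp" },
  sqlite: { source: "global", command: "sqlite-mcp" },
};
const workspaceScope: ToolScope = {
  github: { source: "workspace", command: "github-mcp --repo ." },
};
const builtInScope: ToolScope = {
  "pikiclaw-browser": { source: "built-in", command: "pikiclaw-browser" },
};

const merged = mergeScopes(globalScope, workspaceScope, builtInScope);
```

In this example the workspace definition of `github` replaces the global one, `sqlite` is inherited unchanged from the global scope, and the built-in server is always present because it is merged last.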
215
215
 
216
216
  <p align="center"><img src="docs/promo-dashboard-extensions-add.png" alt="Add MCP server" width="780"></p>
217
217
 
218
- ### Runtime & DX
218
+ ### Runtime & Developer Experience
219
219
 
220
- - **Session workspace** — every session owns a directory; file attachments land there automatically.
221
- - **Resume, switch, classify** — multi-turn conversations, session classification (answer / proposal / implementation / blocked / …).
222
- - **Session-scoped MCP tools** — `im_list_files`, `im_send_file`, `im_ask_user`, and goal-management tools auto-injected into every stream.
223
- - **Computer-use (browser)** — built-in `pikiclaw-browser` MCP wraps `@playwright/mcp` with a shared Chrome profile and a process-level supervisor; log in once, reuse credentials across tasks.
224
- - **Computer-use (macOS desktop)** — built-in `peekaboo` MCP runs [Peekaboo](https://peekaboo.sh/) over Accessibility + ScreenCaptureKit; exposes `see`, `click`, `type`, `scroll`, `window`, `menu`, `app`, `dock`. Opt-in from Extensions; needs Accessibility + Screen Recording permissions. macOS only.
225
- - **Long-task hardening** — sleep prevention, watchdog, auto-restart, daemon mode, channel supervisor.
220
+ - **Dedicated Session Workspaces** — Every session gets its own isolated directory; file attachments and generated assets drop there automatically.
221
+ - **Resume, Switch, and Classify** — Flawless multi-turn conversation support with smart session classification (identifying answers, proposals, implementations, or blocked states).
222
+ - **Auto-Injected Base Tools** — Core MCP tools like file listing, sending, user prompting, and goal tracking are hard-wired into every stream.
223
+ - **Computer-Use (Browser Engine)** — The built-in `pikiclaw-browser` MCP is a hyper-charged wrapper over `@playwright/mcp`. It includes a process-level supervisor and shares an isolated Chrome profile. Log in to your tools once, and reuse those authenticated sessions across all future tasks!
224
+ - **Computer-Use (macOS Desktop)** — Enable the `peekaboo` MCP built-in server (macOS only) to unleash the [Peekaboo](https://peekaboo.sh/) framework over Accessibility and ScreenCaptureKit APIs. It exposes a god-mode suite of tools: `see`, `click`, `type`, `scroll`, `window`, `menu`, `app`, and `dock`. Requires explicit OS-level permissions but grants unprecedented control.
225
+ - **Hardened for Long Tasks** — Built with sleep prevention, watchdog timers, auto-restarts, daemon modes, and a robust channel supervisor. You can walk away knowing your marathon tasks are protected by an ironclad runtime.
226
226
 
227
227
  ---
228
228
 
229
- ## How is this different?
229
+ ## How Is This Different?
230
230
 
231
- | | pikiclaw | IDE assistants<br>(Cursor / Windsurf / Aider) | Cloud agents<br>(Devin / web Claude) | Single-agent IM bots |
231
+ | Feature | pikiclaw | IDE Assistants<br>(Cursor / Windsurf / Aider) | Cloud Agents<br>(Devin / Web Claude) | Single-Agent IM Bots |
232
232
  |---|---|---|---|---|
233
- | **Terminal** | 7 IM channels + Web + future plug-ins | IDE only | Web app | One IM, one bot |
234
- | **Where the agent runs** | Your machine | Your machine | Vendor sandbox | Often vendor |
235
- | **Agent choice** | Claude Code · Codex · Gemini · Hermes (ACP) · … | Bundled | Single | Single |
236
- | **Model choice** | Frontier + domestic Chinese + any OpenAI-compatible | Vendor-controlled | Vendor-controlled | Single |
237
- | **Parallel agents** | **N agents × N windows × N workspaces** | One per IDE | Sequential | One |
238
- | **Files / tools** | Your files, your MCP, your CLIs | Your files | Sandbox | None / limited |
239
- | **Plug new terminal** | Add a `Channel` class | n/a | n/a | Fork |
240
- | **Plug new agent** | Add an `AgentDriver` (CLI or ACP) | n/a | n/a | Fork |
241
- | **Self-bootstrapping** | **Yes — built with itself** | No | No | No |
242
-
243
- The shape that matters: **you stay in your environment, you keep your choice of brain, you run a swarm in parallel, and the orchestrator is the same one we use to build the orchestrator.**
233
+ | **Terminal Access** | 7 IM channels + Web + Extensible | Locked inside the IDE | Confined to a Web app | One specific IM app |
234
+ | **Execution Environment** | Your local machine | Your local machine | Vendor's remote sandbox | Usually vendor servers |
235
+ | **Agent Flexibility** | Claude Code, Codex, Gemini, Hermes (ACP), etc. | Locked in | Single | Single |
236
+ | **Model Freedom** | Frontier models, domestic giants, OpenAI-proxies | Controlled by the platform | Controlled by the vendor | Single, hardcoded |
237
+ | **Concurrency Power** | **N Agents × N Windows × N Workspaces** | One agent per IDE window | Strictly sequential | Single thread |
238
+ | **Files & Tools Access** | Your entire local disk, your MCPs, your CLIs | Local project files | Heavily sandboxed | None or extremely limited |
239
+ | **Add a New Terminal** | Drop in a simple `Channel` class | Impossible | Impossible | Requires a hard fork |
240
+ | **Add a New Agent** | Implement a simple `AgentDriver` (CLI or ACP) | Impossible | Impossible | Requires a hard fork |
241
+ | **Self-Bootstrapping** | **Yes — completely built using itself** | No | No | No |
242
+
243
+ The shape that truly matters: **You never have to leave your preferred environment, you retain total choice over the "brain", you can drive a massive swarm in parallel, and the orchestrator is the exact same tool we use to build the orchestrator.**
244
244
 
245
245
  ---
246
246
 
247
- ## Commands
247
+ ## Command Reference
248
248
 
249
249
  | Command | Description |
250
250
  |---|---|
251
- | `/start` | Entry info, current agent, working directory |
252
- | `/sessions` | View, switch, or create sessions |
253
- | `/agents` | Switch agent (Claude · Codex · Gemini · Hermes) |
254
- | `/models` | View and switch model / reasoning effort |
255
- | `/mode` | Toggle plan mode (reasoning effort) |
256
- | `/switch` | Browse and switch working directory |
251
+ | `/start` | View entry info, the active agent, and your working directory |
252
+ | `/sessions` | View, switch, or create new sessions |
253
+ | `/agents` | Switch the active Agent (Claude · Codex · Gemini · Hermes) |
254
+ | `/models` | View and switch the model or reasoning effort for the session |
255
+ | `/mode` | Toggle plan mode / reasoning effort |
256
+ | `/switch` | Browse and switch the working directory |
257
257
  | `/workspaces` | Pick a saved workspace from the Dashboard's quick-pick list |
258
258
  | `/goal` | Set or inspect a long-running, self-terminating session goal |
259
- | `/stop` | Stop current session |
260
- | `/status` | Runtime status, tokens, usage, session info |
261
- | `/host` | Host CPU / memory / disk / battery |
262
- | `/skills` | Browse project skills |
263
- | `/ext` | Extensions overview |
264
- | `/restart` | Restart and re-launch bot |
265
- | `/sk_<name>` | Run a project skill |
259
+ | `/stop` | Force-stop the current session |
260
+ | `/status` | Check runtime status, token usage, resource consumption, and session info |
261
+ | `/host` | Monitor host CPU, memory, disk, and battery levels |
262
+ | `/skills` | Browse available project skills |
263
+ | `/ext` | View the extensions overview |
264
+ | `/restart` | Restart and re-launch the underlying Bot service |
265
+ | `/sk_<name>` | Instantly run a specific project skill |
266
266
 
267
- Plain text is forwarded to the current agent.
267
+ *Note: Plain text without a slash is forwarded directly to the current agent.*
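
The routing rule in the table and note above (slash commands are handled by pikiclaw, `/sk_<name>` runs a skill, anything else goes to the agent) could be sketched roughly as follows. This is only an illustration: `Route` and `routeMessage` are hypothetical names, not pikiclaw's actual API, and the real parser may treat arguments differently.

```typescript
// Hypothetical sketch: classify one incoming chat message.
// None of these names come from the pikiclaw source.
type Route =
  | { kind: "command"; name: string; args: string }
  | { kind: "skill"; name: string; args: string }
  | { kind: "agent"; text: string };

function routeMessage(raw: string): Route {
  const text = raw.trim();
  // Plain text without a leading slash is forwarded to the current agent.
  if (!text.startsWith("/")) {
    return { kind: "agent", text };
  }
  // Split "/name rest of line" into the command token and its arguments.
  const spaceAt = text.indexOf(" ");
  const token = spaceAt === -1 ? text.slice(1) : text.slice(1, spaceAt);
  const args = spaceAt === -1 ? "" : text.slice(spaceAt + 1).trim();
  // "/sk_<name>" runs a project skill; everything else is a built-in command.
  if (token.startsWith("sk_") && token.length > 3) {
    return { kind: "skill", name: token.slice(3), args };
  }
  return { kind: "command", name: token, args };
}
```

Under these assumptions, `routeMessage("/sk_review src/app.ts")` yields a skill route named `review`, while `routeMessage("fix the failing test")` is passed to the agent unchanged.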
268
268
 
269
269
  ---
270
270
 
271
271
  ## Configuration
272
272
 
273
- - Persistent config: `~/.pikiclaw/setting.json` (channels, agents, Providers/Profiles, workspaces, MCP extensions)
274
- - The Dashboard is the primary configuration surface; the terminal wizard (`--setup`) and `--doctor` exist for headless setups
275
- - Global MCP extensions live under `extensions.mcp` in the setting file
276
- - Workspace MCP extensions: standard `.mcp.json` in the project root
277
- - Project skills: `.pikiclaw/skills/*/SKILL.md` (also picks up `.claude/commands/*.md`)
273
+ - **Persistent Configuration:** `~/.pikiclaw/setting.json` stores your channels, agents, Providers/Profiles, workspaces, and MCP extensions.
274
+ - The **Dashboard** is the primary UI for configuration. The terminal wizard (`--setup`) and the doctor script (`--doctor`) are available for headless or CLI-first users.
275
+ - Global MCP extensions are stored under the `extensions.mcp` key in the setting file.
276
+ - Workspace MCP extensions follow standard conventions and are read from `.mcp.json` in the project root.
277
+ - Project skills are loaded automatically from `.pikiclaw/skills/*/SKILL.md` (and we also support legacy `.claude/commands/*.md` formats).
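
The Chinese README in this same diff states the merge precedence for tools as `global < workspace < built-in`. A minimal sketch of that layering for MCP servers, assuming each layer has already been loaded into a name-keyed map, might look like this. The `McpServer` shape and `mergeMcpLayers` are hypothetical and are not taken from the pikiclaw source or from the `.mcp.json` schema.

```typescript
// Hypothetical shape of one MCP server entry; the real schema may differ.
interface McpServer {
  command: string;
  args?: string[];
}

type McpServers = Record<string, McpServer>;

// Later layers win on a name clash: global < workspace < built-in.
function mergeMcpLayers(
  globalServers: McpServers,
  workspaceServers: McpServers,
  builtInServers: McpServers,
): McpServers {
  // Object spread copies keys left to right, so later spreads override earlier ones.
  return { ...globalServers, ...workspaceServers, ...builtInServers };
}
```

With this ordering, a workspace entry that reuses a global server's name replaces it for sessions in that workspace, and built-in servers cannot be shadowed by either layer.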
278
278
 
279
- **Computer-use** is gated by two toggles under Extensions:
279
+ **Computer-Use Toggles** (managed via the Extensions dashboard):
280
280
 
281
- - `browserEnabled` — managed Chrome (Playwright). The first time an agent needs Chrome, pikiclaw creates a dedicated profile under `~/.pikiclaw` and reuses it across sessions. Log in to the sites you need once; every future session reuses those credentials.
282
- - `peekabooEnabled` — macOS desktop (Peekaboo). When on (macOS only), pikiclaw spawns `@steipete/peekaboo`'s `peekaboo-mcp` binary and injects its tools. Grant the parent terminal **Accessibility** and **Screen Recording** in System Settings → Privacy & Security before flipping the toggle.
281
+ - `browserEnabled` — Enables managed Chrome (Playwright). Upon first use, pikiclaw creates a dedicated profile in `~/.pikiclaw` and reuses it for subsequent sessions. Log in once, and never scan a QR code or enter a password for those tools again.
282
+ - `peekabooEnabled` — Enables macOS desktop automation (Peekaboo). Available on macOS only. Activating this launches `@steipete/peekaboo`'s `peekaboo-mcp` binary and injects its UI-controlling tools. *Note: You must grant your terminal **Accessibility** and **Screen Recording** permissions in System Settings → Privacy & Security before enabling this.*
283
283
 
284
284
  ---
285
285
 
286
286
  ## Roadmap
287
287
 
288
- Already shipped: Hermes driver · ACP (Agent Client Protocol) · Provider/Profile model vault · seven IM channels · computer-use (Playwright browser + Peekaboo macOS desktop).
288
+ **Already Shipped:** Hermes driver integration · ACP (Agent Client Protocol) · Secure Provider/Profile vault · Seven native IM channels · Computer-use via Playwright and Peekaboo (macOS).
289
289
 
290
- - **More ACP agents** — every new ACP-compatible agent should drop in without a hand-written driver
291
- - **More terminals** — WhatsApp, dedicated mobile app, voice
292
- - **Deeper model layer** — agent-on-arbitrary-model wrappers for more domestic series
293
- - **Better tool ecosystem** — recommended MCP packs, skill templates, marketplace
294
- - **Cross-platform computer-use** — Windows / Linux desktop drivers alongside the macOS Peekaboo bridge
290
+ - **More ACP Agents** — Ensuring any new ACP-compatible agent can drop in with zero code changes.
291
+ - **Broader Terminal Ecosystem** — Adding support for WhatsApp, a dedicated mobile app, and voice interfaces.
292
+ - **Deeper Model Wrapping** — Building agent-on-arbitrary-model wrappers to support a wider array of domestic and open-source models seamlessly.
293
+ - **Richer Tool Ecosystem** — Releasing official MCP packs, skill templates, and a community marketplace.
294
+ - **Cross-Platform Computer-Use** — Extending desktop control drivers beyond macOS to support Windows and Linux.
295
295
 
296
- See [ACP Migration Plan](docs/acp-migration.md) for the protocol-side details.
296
+ For protocol-level insights, see our [ACP Migration Plan](docs/acp-migration.md).
297
297
 
298
298
  ---
299
299
 
@@ -308,43 +308,43 @@ npm test
308
308
  ```
309
309
 
310
310
  ```bash
311
- npm run dev # local dev (--no-daemon, logs to ~/.pikiclaw/dev/dev.log)
312
- npm run build # production build (dashboard + tsc)
313
- npm test # vitest run
314
- npx pikiclaw@latest --doctor # environment check
311
+ npm run dev # Start local dev server (--no-daemon, logs to ~/.pikiclaw/dev/dev.log)
312
+ npm run build # Production build (Dashboard + tsc)
313
+ npm test # Run Vitest suite
314
+ npx pikiclaw@latest --doctor # Environment health check
315
315
  ```
316
316
 
317
- Architecture and integration deep dives: [ARCHITECTURE.md](ARCHITECTURE.md) · [INTEGRATION.md](INTEGRATION.md) · [TESTING.md](TESTING.md)
317
+ For deep dives into the architecture and integration, see: [ARCHITECTURE.md](ARCHITECTURE.md) · [INTEGRATION.md](INTEGRATION.md) · [TESTING.md](TESTING.md).
318
318
 
319
319
  ---
320
320
 
321
321
  ## Contributing
322
322
 
323
- The project is built around layers that are *meant* to be extended. New terminals, new agents, new model wrappers, new MCP tools — all are first-class contributions.
323
+ Every layer of this project was designed from the ground up to be **extended**. Adding a new terminal, writing a new agent driver, wrapping a new model, or building a killer MCP tool: these are all first-class contributions.
324
324
 
325
- - Read the **[Contributing Guide](CONTRIBUTING.md)** to get started
326
- - Browse [`good first issue`](https://github.com/xiaotonng/pikiclaw/labels/good%20first%20issue) and [`help wanted`](https://github.com/xiaotonng/pikiclaw/labels/help%20wanted)
327
- - Open an issue first for larger changes so we can align on approach
325
+ - Read the **[Contributing Guide](CONTRIBUTING.md)** to get started.
326
+ - Check out issues tagged with [`good first issue`](https://github.com/xiaotonng/pikiclaw/labels/good%20first%20issue) and [`help wanted`](https://github.com/xiaotonng/pikiclaw/labels/help%20wanted).
327
+ - For major architectural changes, please open an issue first to align on the technical approach.
328
328
 
329
- | Where | What you'd add |
329
+ | Module | What You Can Extend |
330
330
  |---|---|
331
- | `src/agent/driver.ts`, `src/agent/drivers/*.ts`, `src/agent/acp-client.ts` | A new agent driver (CLI- or ACP-based) |
332
- | `src/channels/base.ts`, `src/channels/*/` | A new terminal / IM channel |
333
- | `src/model/`, `src/model/injector.ts` | A new model provider or per-agent injection rule |
334
- | `src/dashboard/routes/*.ts` | A new dashboard API surface |
335
- | `src/agent/mcp/tools/*.ts`, `src/agent/mcp/bridge.ts` | New session-scoped MCP tools |
336
- | `src/catalog/*.ts` | A recommended MCP server / CLI tool / skill repo |
331
+ | `src/agent/driver.ts`, `src/agent/drivers/*.ts`, `src/agent/acp-client.ts` | Add a new Agent Driver (CLI-based or ACP-compatible) |
332
+ | `src/channels/base.ts`, `src/channels/*/` | Integrate a new Terminal or IM channel |
333
+ | `src/model/`, `src/model/injector.ts` | Add a new model provider or customize agent environment injection rules |
334
+ | `src/dashboard/routes/*.ts` | Expand the Dashboard backend API |
335
+ | `src/agent/mcp/tools/*.ts`, `src/agent/mcp/bridge.ts` | Add new session-scoped MCP tools |
336
+ | `src/catalog/*.ts` | Recommend high-quality MCP servers, CLI tools, or Skill repositories |
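
To give a feel for what "adding a new agent driver" could involve, here is a deliberately simplified registry sketch. The interface below is an assumption made for illustration; the actual contract lives in `src/agent/driver.ts` and may look quite different. Only the `hermes acp` launch command is taken from the README text.

```typescript
// Hypothetical driver contract; the real one in src/agent/driver.ts may differ.
interface AgentDriver {
  id: string;
  transport: "cli" | "acp";
  // Returns the argv used to launch the agent process.
  launchCommand(): string[];
}

// Hypothetical registry: one driver per id, looked up when a session starts.
class DriverRegistry {
  private readonly drivers = new Map<string, AgentDriver>();

  register(driver: AgentDriver): void {
    if (this.drivers.has(driver.id)) {
      throw new Error(`driver already registered: ${driver.id}`);
    }
    this.drivers.set(driver.id, driver);
  }

  get(id: string): AgentDriver | undefined {
    return this.drivers.get(id);
  }
}

const registry = new DriverRegistry();
registry.register({
  id: "hermes",
  transport: "acp",
  launchCommand: () => ["hermes", "acp"],
});
```

The point of the sketch is the design choice the README describes: drivers sit side by side behind one small contract, so a new CLI or ACP agent is an additional registration rather than a fork.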
337
337
 
338
338
  ---
339
339
 
340
- ## Star history
340
+ ## Star History
341
341
 
342
342
  <a href="https://www.star-history.com/#xiaotonng/pikiclaw&Date">
343
- <img src="https://api.star-history.com/svg?repos=xiaotonng/pikiclaw&type=Date" alt="Star history" width="640">
343
+ <img src="https://api.star-history.com/svg?repos=xiaotonng/pikiclaw&type=Date" alt="Star History" width="640">
344
344
  </a>
345
345
 
346
346
  ---
347
347
 
348
348
  ## License
349
349
 
350
- [MIT](LICENSE) — built in the open. Use it, fork it, plug your own layer in.
350
+ [MIT](LICENSE) — Built in the open. Use it, fork it, and plug in your own layers.
package/README.zh-CN.md CHANGED
@@ -4,9 +4,9 @@
4
4
 
5
5
  ## 把全世界最聪明的 AI Agent 装进你的口袋。
6
6
 
7
- ##### *面向"创作者不再需要读代码"时代的开放式 Agent 编排器。*
7
+ ##### *面向「创作者不再需要看代码」时代的开放式 Agent 编排器。*
8
8
 
9
- *任意 Agent(Claude · Codex · Gemini · Hermes · …)、任意模型(Claude · GPT · Gemini · DeepSeek · 豆包 · MiMo · MiniMax · OpenRouter · 任意第三方代理)、任意工具(Skills · MCP · CLI)随意插拔。从任意终端驱动它们 —— IM、Web,或未来还会加入的形态。pikiclaw 本身就是用 pikiclaw 构建的。*
9
+ *接入任何 Agent(Claude · Codex · Gemini · Hermes · …),任何模型(Claude · GPT · Gemini · DeepSeek · 豆包 · MiMo · MiniMax · OpenRouter · 甚至是任意第三方代理),以及任何工具(Skills · MCP · CLI)。通过你最顺手的终端(IM、Web 或未来形态)来驱动它们。pikiclaw 本身就是用 pikiclaw 构建的。*
10
10
 
11
11
  ```bash
12
12
  npx pikiclaw@latest
@@ -32,9 +32,9 @@ npx pikiclaw@latest
32
32
 
33
33
  ## pikiclaw 是什么?
34
34
 
35
- **绝大多数"AI 开发工具"项目只切一个面 —— 一种 IDE、一种 Agent、一家模型厂商,然后就止步了。** pikiclaw 押的是另一条赛道:下一阶段的"建造"不会发生在某个编辑器内部,而是发生在一个**编排器**里 —— 让创作者一边坐在控制台前,一边并行地驱动一群 Agent,跑在最好的模型上,通过最顺手的那个终端推进。整个过程不需要打开任何代码文件。
35
+ **大多数「AI 开发工具」往往只做局部的创新 —— 绑定一款 IDE、单一 Agent 或某家模型厂商,然后便止步于此。** pikiclaw 则建立在一个截然不同的判断之上:下一代「创造」的过程,不会局限在某个单一的编辑器内部。它会发生在一个**编排器 (Orchestrator)** 中。在这里,创作者可以并发出一个 Agent **集群 (Swarm)**,让它们跑在当前最强大的模型上,并通过手边最方便的终端来掌控全局——而且,你甚至不需要打开任何代码文件。
36
36
 
37
- 产品就是这个编排器,其它所有东西都是可插拔的层。**而且这个编排器是用它自己构建出来的** —— 我们就是用 pikiclaw 来开发 pikiclaw
37
+ 核心产品就是这个编排器,其它所有组件都可拔插。**更酷的是,这个编排器是由它自己构建出来的** —— pikiclaw 就是我们用来开发 pikiclaw 的工具。
38
38
 
39
39
  ```
40
40
  终端层 Telegram · 飞书 · 微信 · Slack · Discord · 钉钉 · 企业微信 · Web Dashboard
@@ -46,88 +46,88 @@ npx pikiclaw@latest
46
46
  |
47
47
  ┌────────────────────────────────────────┼────────────────────────────────────────┐
48
48
  v v v
49
- Agent 层 模型层 工具层
49
+ Agent 层 模型层 工具层
50
50
  Claude Code · Codex · Gemini · Hermes Claude · GPT · Gemini · DeepSeek Skills · MCP · CLI
51
- driver registry · ACP · 任意 Agent豆包 · MiMo · MiniMax · OpenRouter (全局 × 工作区)
51
+ (driver registry · ACP · 任意 Agent) 豆包 · MiMo · MiniMax · OpenRouter (全局 × 工作区)
52
52
  · 任意 OpenAI 兼容代理 · …
53
53
  |
54
54
  v
55
55
  你的电脑
56
56
  ```
57
57
 
58
- - **终端层** —— Telegram、飞书、微信、Slack、Discord、钉钉、企业微信和 Web Dashboard 是地位对等的入口。新终端从这里插入。
59
- - **Agent 层** —— 直接拿官方的 Claude Code / Codex / Gemini / Hermes CLI driverHermes ACPAgent Client Protocol);driver registry 可以接入任何 Agent。
60
- - **模型层** —— Claude / GPT / Gemini、国产系列(DeepSeek、豆包、MiMo、MiniMax),加上 OpenRouter 和任意 OpenAI 兼容代理。Providers + Profiles 是一等公民层,自带凭据库、models.dev 目录和按 Agent 注入的能力。
61
- - **工具层** —— Skills、MCP server、CLI 工具,按全局和工作区两个 scope 合并后注入到每个会话。
58
+ - **终端层 (Terminal)** —— Telegram、飞书、微信、Slack、Discord、钉钉、企业微信以及 Web Dashboard 都是一等公民入口。新的终端形态可以随时接入。
59
+ - **Agent 层** —— 官方的 Claude Code / Codex / Gemini / Hermes CLI 作为底层驱动 (driver)。其中 Hermes 使用 ACP (Agent Client Protocol,客户端协议);注册表机制允许无缝接入任何其他的 Agent。
60
+ - **模型层 (Model)** —— Claude / GPT / Gemini、国产系列 (DeepSeek、豆包、MiMo、MiniMax),外加 OpenRouter 以及任何兼容 OpenAI 接口的代理服务。提供商 (Providers) 与配置项 (Profiles) 是一等公民模块,自带凭据保险箱、models.dev 目录以及面向各个 Agent 专属的环境变量注入能力。
61
+ - **工具层 (Tool)** —— Skills、MCP 服务器和 CLI 工具。它们会在全局和工作区两个层级进行智能合并,并被自动注入到每一次会话之中。
62
62
 
63
63
  ---
64
64
 
65
- ## 自我构建
65
+ ## 自举:用自己构建自己
66
66
 
67
- > 衡量一个 Agent 编排器是否可信,最硬核的标准是它能不能构建自己。pikiclaw 可以。我们用 pikiclaw 开发、测试、发布、运维 pikiclaw —— 每一次提交、每一次发版。
67
+ > 检验一个 Agent 编排器是否靠谱,最硬核的标准就是看它能不能自举(构建自己)。pikiclaw 做到了。我们日常使用 pikiclaw 来开发、测试、发布和运维 pikiclaw —— 覆盖了每一次 Commit 和每一次版本发布。
68
68
 
69
- 在 pikiclaw 里,典型的一天是这样的:
69
+ 在 pikiclaw 里的典型开发日常是这样的:
70
70
 
71
- - 窗口 1 Claude Code 会话在实现一条新的 dashboard 路由。
72
- - 窗口 2 Codex 会话在同一个工作区里写对应的单元测试。
73
- - 窗口 3 Gemini 会话在 review diff 并起草 changelog。
74
- - 第四个线程里,`/sk_promote` 技能在 GitHub 上扫相关 issue 并自动回复。
75
- - 四路流并行;一个人坐在咖啡馆里,从手机上掌控全部。
71
+ - 窗口 1 里的 Claude Code 会话正在实现一个全新的 dashboard 路由。
72
+ - 窗口 2 里的 Codex 会话正在为它编写配套的单元测试,并在同一个工作区下运行。
73
+ - 窗口 3 里的 Gemini 会话在 Review Diff,并起草更新日志。
74
+ - 与此同时,第四条线程中的技能 (`/sk_promote`) 正在自动扫描 GitHub 的相关 Issue 并尝试回复。
75
+ - 这四个进程完全并行运作;而掌控它们的人,可能只是坐在咖啡馆里用一部手机进行统筹安排。
76
76
 
77
- 编排器就是产品。它也恰好是这个编排器自己的开发环境。
77
+ 这个编排器就是产品本身,同时,它也恰好是我们用来构建它的 IDE。
78
78
 
79
79
  ---
80
80
 
81
- ## 默认就是 Swarm
81
+ ## 默认并发集群 (Swarm)
82
82
 
83
- 绝大多数"AI 开发工具"假设:一个用户、一个 Agent、一次一件事。pikiclaw 假设的是反面:**N 个 Agent、N 个窗口、一个操作者、一套工具集。**
83
+ 大多数「AI 开发工具」的基本假设是:一个用户,一次只让一个 Agent 做一件事。pikiclaw 的假设则完全相反:**N 个 Agent,N 个窗口,一位指挥官,一套工具箱。**
84
84
 
85
- - **N 路并行会话** —— Dashboard 的每个 pane 就是一条独立的 Agent 流,对应一个独立的会话工作区;IM 线程还能在上面再叠加。
86
- - **Agent 自由组合** —— pane 1 跑 Claude Code,pane 2 跑 Codex,pane 3 跑 Gemini,分别在不同的仓库 / 工作区上工作。
87
- - **统一工具集** —— 全局 Skills、全局 MCP server、按工作区覆写,规则一致。配置一次,每个会话都继承。
88
- - **随时接管** —— 中断任何运行中的流,排进一条新消息,把控制权交给下一个 Agent。
89
- - **群组模式** —— 把编排器丢进一个飞书 / Slack / Discord / 企业微信群,整个团队共享同一个 swarm。
85
+ - **N 路并行会话** —— Dashboard 上的每一个面板都是一条独立的 Agent 流,对应着一个独立的会话工作区;如果接入 IM,还能随时开辟出更多的工作线程。
86
+ - **Agent 随意混搭** —— 面板 1 跑 Claude Code,面板 2 跑 Codex,面板 3 跑 Gemini,它们可以在不同的代码仓库和工作区中各司其职。
87
+ - **统一的工具箱** —— 全局的 Skills、全局 MCP 服务器以及工作区专属的覆盖配置都会进行统一管理。只需配置一次,后续所有会话即可自动继承。
88
+ - **随时随地介入** —— 你可以随时打断运行中的数据流,将新指令插队,或者把控制权顺滑交接给下一个 Agent。
89
+ - **群组协作模式** —— 把编排器拉进飞书 / Slack / Discord / 企业微信的聊天群中,团队成员便能集体共享这同一个 Agent 集群。
90
90
 
91
- 真正重要的形态是:一个创作者,指尖上是一群 Agent。
91
+ 这正是我们认为最关键的形态:让每个创作者的指尖,都掌控着一支全天候待命的 AI 军队。
92
92
 
93
93
  ---
94
94
 
95
- ## 实际效果
95
+ ## 实际演示
96
96
 
97
- > **真实任务** —— 让 pikiclaw 收集并总结今天的 AI 新闻;Agent 读、写、然后把结果通过 Telegram 推回来,整个过程在你手机上完成。
97
+ > **真实任务** —— 让 pikiclaw 收集并总结今天的 AI 新闻;Agent 自动阅读、撰写,最后通过 Telegram 将结果推送到你的手机上。
98
98
 
99
99
  <p align="center"><img src="docs/promo-demo.gif" alt="演示:从 Telegram 发起任务,Agent 在本地执行,结果回到聊天" width="780"></p>
100
100
 
101
- > **Web Dashboard** —— pane 工作区,包含会话列表、对话内容、工具调用轨迹和输入区(1 / 2 / 3 / 6 pane 布局)。
101
+ > **Web Dashboard** —— 多面板工作区,包含会话列表、对话流、工具调用轨迹以及输入区域(支持 1 / 2 / 3 / 6 面板布局)。
102
102
 
103
103
  <p align="center"><img src="docs/promo-dashboard-workspace.png" alt="Web Dashboard 工作区" width="780"></p>
104
104
 
105
105
  <details>
106
- <summary><b>更多:基础操作 · IM 接入 · Agent · 模型 · 扩展 · 权限 · 系统信息</b></summary>
106
+ <summary><b>更多细节:基础操作 · IM 接入 · Agent 管理 · 模型配置 · 扩展工具 · 权限 · 系统信息</b></summary>
107
107
 
108
- > 发一条消息,看 Agent 流式输出,把文件收回来。
108
+ > 发送消息,观察 Agent 的流式输出,接收返回的文件附件。
109
109
 
110
110
  <img src="docs/promo-basic-ops.gif" alt="基础操作" width="780">
111
111
 
112
- > **IM 接入** —— Telegram、飞书、微信、Slack、Discord、钉钉、企业微信的状态与配置
112
+ > **IM 接入** —— Telegram、飞书、微信、Slack、Discord、钉钉、企业微信的频道连接状态与参数配置。
113
113
 
114
114
  <img src="docs/promo-dashboard-im.png" alt="IM 接入" width="780">
115
115
 
116
- > **Agent** —— 已安装的 Agent CLI、默认 Agent、按 Agent 的模型 / 推理强度
116
+ > **Agent 管理** —— 已安装的 Agent CLI 列表、默认 Agent 设定,以及各自独立的模型 / 推理强度配置。
117
117
 
118
118
  <img src="docs/promo-dashboard-agents.png" alt="Agent" width="780">
119
119
 
120
- > **模型** —— Providers + Profiles 凭据库(Claude、GPT、Gemini、DeepSeek、豆包、MiMo、MiniMax、OpenRouter 以及任意 OpenAI 兼容代理),用 models.dev 目录校验后按 Agent 注入
120
+ > **模型配置** —— 整合了 Provider + Profile 的凭据库(涵盖 Claude、GPT、Gemini、DeepSeek、豆包、MiMo、MiniMax、OpenRouter 及任何兼容 OpenAI 接口的代理),支持通过 models.dev 目录进行验证,并为指定的 Agent 独立进行底层环境变量注入。
121
121
 
122
- > **扩展** —— 全局 MCP server、社区 Skills、托管浏览器 + macOS 桌面(Peekaboo)自动化
122
+ > **扩展工具** —— 统一管理全局 MCP 服务器、社区版 Skills、内置托管的浏览器环境及 macOS 桌面(Peekaboo)自动化能力。
123
123
 
124
124
  <img src="docs/promo-dashboard-extensions.png" alt="扩展" width="780">
125
125
 
126
- > **系统权限** —— macOS 辅助功能、屏幕录制、磁盘访问
126
+ > **系统权限** —— macOS 辅助功能、屏幕录制及磁盘访问权限管理。
127
127
 
128
128
  <img src="docs/promo-dashboard-permissions.png" alt="权限" width="780">
129
129
 
130
- > **系统信息** —— 工作目录、CPU / 内存 / 磁盘监控
130
+ > **系统信息** —— 当前工作目录详情,以及 CPU / 内存 / 磁盘使用率的全天候监控。
131
131
 
132
132
  <img src="docs/promo-dashboard-system.png" alt="系统信息" width="780">
133
133
 
@@ -137,14 +137,14 @@ npx pikiclaw@latest
137
137
 
138
138
  ## 快速开始
139
139
 
140
- **前置要求:** Node.js 20+,并且至少登录一个官方 Agent CLI:
140
+ **前置要求:** 环境须具备 Node.js 20+,并且在系统中至少登录过一款官方的 Agent CLI:
141
141
 
142
- - [`claude`](https://docs.anthropic.com/en/docs/claude-code)(Claude Code)
143
- - [`codex`](https://github.com/openai/codex)(Codex CLI)
144
- - [`gemini`](https://github.com/google-gemini/gemini-cli)(Gemini CLI)
145
- - `hermes`(Hermes —— 通过 ACP / Agent Client Protocol)
142
+ - [`claude`](https://docs.anthropic.com/en/docs/claude-code) (Claude Code)
143
+ - [`codex`](https://github.com/openai/codex) (Codex CLI)
144
+ - [`gemini`](https://github.com/google-gemini/gemini-cli) (Gemini CLI)
145
+ - `hermes` (Hermes —— 基于 ACP / Agent Client Protocol 协议)
146
146
 
147
- **启动:**
147
+ **启动命令:**
148
148
 
149
149
  ```bash
150
150
  cd your-workspace
@@ -153,147 +153,147 @@ npx pikiclaw@latest
153
153
 
154
154
  <p align="center"><img src="docs/promo-install.gif" alt="一行命令安装" width="780"></p>
155
155
 
156
- 它会在 `http://localhost:3939` 打开 **Web Dashboard** —— 你可以在浏览器里驱动会话、接 IM 渠道、配置 Agent 与模型、安装 MCP server Skills、管理系统权限。其余一切都是一键之内。
156
+ 这条命令会在 `http://localhost:3939` 自动唤起 **Web Dashboard**。随后,你就可以在浏览器里驱动任何会话、接入需要的 IM 渠道、灵活配置 Agent 和模型、快速安装 MCP 服务器与技能 (Skills),并统筹所有的系统权限。其他一切功能,尽在一键之遥。
157
157
 
158
158
  <details>
159
- <summary><b>偏好终端?有个向导。</b></summary>
159
+ <summary><b>更喜欢传统的纯命令行配置?我们准备了专用的配置向导。</b></summary>
160
160
 
161
161
  ```bash
162
- npx pikiclaw@latest --setup # 交互式终端向导
163
- npx pikiclaw@latest --doctor # 仅做环境检查
162
+ npx pikiclaw@latest --setup # 开启交互式终端配置向导
163
+ npx pikiclaw@latest --doctor # 仅检查并诊断当前环境
164
164
  ```
165
165
 
166
166
  </details>
167
167
 
168
168
  ---
169
169
 
170
- ## 大家都用它来做什么
170
+ ## 典型的应用场景
171
171
 
172
- - **并行跑一个 swarm** —— 在 Dashboard 里开 N 个 pane(或 N 个 IM 线程),每个 pane 是一个不同的 Agent,盯着不同的工作区同时工作。一个人,多个 Agent,一个驾驶舱。随时切到任意一个进去接管。
173
- - **自托管的开发回路** —— pikiclaw 本身就是用 pikiclaw 构建的。开发流程**就是**产品本身:从手机驱动编排器,写代码,发版本,再迭代。
174
- - **走开式编程** —— 启动一个大重构,合上电脑,从手机通过 Telegram 继续操控。Agent 在本地一直跑,结果实时推回聊天。
175
- - **同一工作区上的多 Agent** —— Claude Code 写初版实现,切到 Codex review,再切到 Gemini 换个视角。同一份代码,同一份会话历史。
176
- - **国产模型路由** —— 当延迟、成本、合规约束需要非前沿模型时,通过 wrapper driver 让 Claude Code 跑在 DeepSeek 或豆包上。
177
- - **群里的 Agent** —— 把 pikiclaw 拉进飞书 / Slack / Discord / 企业微信工作群;整个团队共享一个编排器、一个工作区、一套 Skills。
178
- - **由你掌控的 Computer Use** —— 打开托管 Chrome(Playwright)和 macOS 桌面(Peekaboo,基于 Accessibility + ScreenCaptureKit),Agent 就能 `see` 屏幕、点击、输入、管理窗口 / 菜单 / Dock —— 而你从手机上指挥它。订会议、抓 dashboard、跑端到端测试,或者直接驱动任意原生 macOS 应用。
179
- - **以 Skill 为中心的工作流** —— 一次性安装社区 Skill(`promote`、`snipe`、`review`、`security-review` …),之后在任意终端用 `/sk_<name>` 触发。
172
+ - **并发运行集群** —— 在 Dashboard 里打开 N 个面板(或者开辟 N 个 IM 线程),每个面板运行不同的 Agent 负责不同的工作区,完全并行运作。一个人,多个 Agent,同一个全局驾驶舱。你可以随时强力介入任何一个工作流。
173
+ - **自包含的闭环开发** —— pikiclaw 就是用 pikiclaw 自己开发出来的。这套开发流本身就是这款产品最原始的面貌:甚至可以在外用手机操作编排器,让 Agent 写代码、发布版本并不断迭代。
174
+ - **挂机式编程 (Walk-away coding)** —— 发起一个耗时极长的大型重构任务,合上笔记本,外出时直接用手机通过 Telegram 进行监控和控制。Agent 始终在本地机器上运行,结果则会流式实时推回聊天界面中。
175
+ - **同工作区多 Agent 接力** —— 先让 Claude Code 写一版功能草稿,无缝切给 Codex 去做深度 Review,最后再交给 Gemini 提供截然不同视角的优化建议。所有这些操作都在同一份代码目录和相同的历史会话中完成。
176
+ - **灵活的国产模型路由方案** —— 当你的任务对延迟、成本或合规有硬性要求时,通过模型驱动包装层,可以直接让 Claude Code 跑在实惠又快速的 DeepSeek 或豆包模型之上。
177
+ - **群聊协作级 Agent** —— 把 pikiclaw 拉入飞书 / Slack / Discord / 企业微信群聊内;整个团队可以共享这同一个编排器、统一的项目工作区和一系列团队专属技能。
178
+ - **完全受控的 Computer-use 能力** —— 开启内置的 Chrome 浏览器托管(基于 Playwright)和 macOS 桌面环境托管(基于 Peekaboo,通过辅助功能和 ScreenCaptureKit)。Agent 瞬间获得「视力」(`see`)、可以自由点击、打字,并管理窗口、菜单栏和 Dock,而你依然可以通过手机远程精准操控它。无论是帮你预定一场会议、抓取某个数据面板信息、跑一通端到端自动测试,还是驱动任何原生的 macOS 本地应用,全都不在话下。
179
+ - **基于 Skill 体系的自动化工作流** —— 一次性安装好社区提供的常用技能(例如 `promote`、`snipe`、`review`、`security-review` 等),往后只需在任何连接的终端里输入 `/sk_<name>` 即可实现一键触发。
180
180
 
181
181
  ---
182
182
 
183
- ## 功能特性
183
+ ## 核心特性
184
184
 
185
- ### 终端层
185
+ ### 终端层 (Terminal)
186
186
 
187
- - **七条 IM 渠道** —— Telegram、飞书、微信(个人号)、Slack、Discord、钉钉、企业微信。开一条、几条、或者全开都行。每条渠道在代码上是物理隔离的;新增一条(WhatsApp、移动端、…)不需要动其他渠道。
188
- - **Web Dashboard** —— 直接在浏览器里驱动会话,拥有和 IM 完全一致的对话、工具调用、流式输出体验。多 pane 工作区(1 / 2 / 3 / 6 pane)、浅色 / 深色主题、EN / 中文 i18n。
189
- - **实时流预览** —— 消息随着 Agent 思考原地刷新;长文本自动分段;图片和文件实时回传。
187
+ - **支持七大主流 IM** —— 全面集成 Telegram、飞书、微信(个人号)、Slack、Discord、钉钉和企业微信。你可以只开启其中一个,也可以多开齐上。底层代码中每个渠道都做到绝对隔离;即使后续再添加新渠道(如 WhatsApp、自有移动 App 等),也丝毫不会影响现有逻辑的稳定性。
188
+ - **Web Dashboard 面板** —— 直接在网页浏览器中驱动所有会话,获得与 IM 完全一致的自然对话、工具调用轨迹跟踪和极速的流式反馈体验。面板提供 1 / 2 / 3 / 6 多窗口并发布局、深色/浅色自适应主题,以及纯正的中英文 (i18n) 双语支持。
189
+ - **实时流式预览** —— 每当 Agent 开始思考,消息都会实时在原地进行刷新;遇到超长文本能自动进行友好分段;生成的图片与文件也会即刻原样推回前端界面。
190
190
 
191
191
  ### Agent 层
192
192
 
193
- - **官方 CLI driver** —— Claude Code、Codex CLI、Gemini CLI、Hermes(走 ACP)。不重写 Agent 内核 —— 直接吃官方上游的能力,Day-0 跟随升级。
194
- - **ACP 原生** —— Hermes 通过 [Agent Client Protocol](https://agentclientprotocol.com) 集成,以 `hermes acp` 启动并走 JSON-RPC stdio。未来任何 ACP 兼容的 Agent 都以同样方式插入。
195
- - **可插拔注册表** —— 唯一契约是 `src/agent/driver.ts`。新的 CLI 或 ACP Agent 可以和现有四个内建 driver 并排接入。
196
- - **按会话切换 Agent** —— 同一个工作区,换个"脑子"。
197
- - **接管** —— 中断当前任务,把一条排队消息提到队首。
198
- - **Codex Human-in-the-Loop** —— Codex 暂停提问时,问题会变成 IM 里的交互式 prompt。在那边回复,任务就继续。
199
- - **持久化目标** —— `/goal` 给每个会话设一个长期目标,带 token 预算和暂停 / 恢复;Agent 完成自审后会自动终止。
193
+ - **官方 CLI 作为原生底层驱动** —— 内置接入 Claude Code、Codex CLI、Gemini CLI 以及 Hermes (通过 ACP 协议)。我们坚决拒绝自己「造一套套壳的 Agent 引擎」——只要上游核心推出了任何更新功能,你就可以在第一时间无损享用。
194
+ - **原生拥抱 ACP 协议** —— Hermes 的接入完全基于 [Agent Client Protocol](https://agentclientprotocol.com) 协议,通过系统标准的 JSON-RPC (输入/输出流) 唤起 `hermes acp`。这意味着在未来,任何兼容 ACP 协议的新 Agent 也能立刻无缝空降至平台。
195
+ - **自由可插拔的注册表机制** —— 在整套代码库中,这部分唯一的强制契约只有 `src/agent/driver.ts`。不论是基于传统 CLI 还是新兴 ACP 协议开发的各类新 Agent,都能随时加入注册表,与现有的四大核心内置引擎并肩作战。
196
+ - **无感会话级 Agent 切换** —— 你甚至不用离开当前代码工作区,就能在会话途中随时顺畅地帮 AI 更换一颗不同特性的「大脑」。
197
+ - **接管与干预 (Steer) 控制** —— 你可以随心所欲中断正在执行的繁重任务,让排队的紧急新消息直接插队至最前方处理。
198
+ - **Codex 人机协同机制 (Human-in-the-loop)** —— Codex 需要你确认操作细节时,这些提示请求会自动转化发送为 IM 中的互动询问消息。你只需在平常用的聊天框内简单答复,暂停的任务就会完美接续运作。
199
+ - **长效目标系统 (Persistent goals)** —— 允许使用 `/goal` 指令,为指定的会话设定出伴有明确 Token 预算的长效终止目标。任务支持智能暂停/恢复,只有当 Agent 靠自行审计判定达到目标要求后,它才会结束自身当前进程。
200
200
 
201
201
  ### 模型层
202
202
 
203
- - **前沿 + 国产 + 代理** —— Claude(4 系列)、GPT-5 / Codex、Gemini、DeepSeek、豆包(Doubao)、MiMo、MiniMax、OpenRouter,以及任意 OpenAI 兼容的模型代理。
204
- - **Providers + Profiles 凭据库** —— 一等公民数据模型,凭据落在 `~/.pikiclaw/setting.json` 自己的存储区。可以浏览只读的 models.dev 目录,用真实 provider probe 校验 key,再把一个 profile 绑到 Agent 上,启动时自动注入 env。
205
- - **按会话选模型 + 推理强度** —— Dashboard、`/models` `/mode` 里挑。
206
- - **按 Agent 注入** —— `resolveAgentInjection(agentId)` 在启动时把当前 profile 的 env 变量注入进去,所以 Claude Code 可以直接跑在 DeepSeek 或豆包上,而不用改上游客户端配置。
203
+ - **全面涵盖前沿顶流、国产之光与各类代理** —— 囊括 Claude 家族系列、强大的 GPT-5 / Codex 以及 Gemini;国内优秀梯队的 DeepSeek、豆包 (Doubao)、MiMo、MiniMax;同时原生兼容 OpenRouter 和任意支持 OpenAI 通用接口格式的第三方代理服务。
204
+ - **Providers & Profiles 凭据专属保险箱** —— 构建了高标准隔离的数据保护模型,API 凭据会被单独加密存放在 `~/.pikiclaw/setting.json` 专属区域。你能在只读的 models.dev 目录进行便捷浏览、调用最真实的 API 探针来严谨验证密钥的有效性,最终再把这份 Profile 与指定的任意 Agent 相绑定,从而实现运行阶段环境变量参数的自动隔离注入。
205
+ - **极度自由的会话级配置选取** —— 无论是模型本体还是针对特定高难度任务的推理强度,你都能在友好的 Dashboard 界面中,或者直接发送指令 `/models` 或 `/mode` 来即时动态切选。
206
+ - **Agent 级别底层强制注入** —— 核心流函数 `resolveAgentInjection(agentId)` 在启动的最初阶段就会将对应的环境变量强行覆盖进去。这意味着,你竟然可以直接指令 Claude Code,让它全程跑在超高性价比的 DeepSeek 或是豆包核心大模型上,并且全程无需去改动其原本上游客户端里任何一行深层配置代码。
207
207
 
208
208
  ### 工具层
209
209
 
210
- - **Skills** —— 项目级 Skill 放在 `.pikiclaw/skills/*/SKILL.md`,兼容 `.claude/commands/*.md`。从 GitHub 仓库(`owner/repo`)一键安装,或浏览推荐合集(Anthropic Official、Vercel Agent Skills、…)。用 `/skills` `/sk_<name>` 触发。
211
- - **MCP server** —— 浏览 [MCP Registry](https://registry.modelcontextprotocol.io)、自建 stdio / HTTP server、用真实 handshake 做健康检查、OAuth 2.1 + 动态客户端注册、按 scope 启停。推荐目录覆盖 GitHub、Atlassian、Notion、Linear、Sentry、Cloudflare、Slack、飞书 / Lark、Stripe、Hugging Face、Gamma、Brave Search、Perplexity、Filesystem、SQLite、PostgreSQL —— 此外还有两个内建的 computer-use server(`pikiclaw-browser` Playwright 操控 Chrome,`pikiclaw-desktop` Peekaboo 操控 macOS GUI)。
212
- - **CLI 工具** —— 自动探测版本与登录状态,浏览器登录类 CLI 支持 OAuth-web 会话,所有 CLI 都通过 Agent 自身的工具调用面访问。
213
- - **会话级 MCP bridge** —— `im_list_files`、`im_send_file`、`im_ask_user`,加上托管浏览器工具和 macOS 桌面工具(启用时),会被自动注入每个会话。
214
- - **两层合并** —— `global < workspace < built-in`,自动应用到每个会话。
210
+ - **强大的技能系统 (Skills)** —— 这个系统让每一个工程专属技能被稳稳地存放在 `.pikiclaw/skills/*/SKILL.md` 内(同时也全面向下兼容标准的 `.claude/commands/*.md` 描述格式)。支持快速指定从 GitHub 的公开仓库(`owner/repo`)中实现极速的一键远程拉取并安装;或者去随便逛逛我们收录整理的精选套件包(比如备受好评的 Anthropic 官方包、或是好用的 Vercel Agent Skills 包等)。平时直接发个 `/skills` 探查当前载入的所有技能,挑准目标直接用 `/sk_<name>` 便可秒速触发。
211
+ - **最广泛主流的 MCP 服务器加持** —— 可以直接浏览接入 [MCP Registry](https://registry.modelcontextprotocol.io) 全球库或者自由手工增加本地 stdio 和网端 HTTP 服务;框架严格支持实机硬核握手健康侦测机制与 OAuth 2.1 高级动态客户端安全注册,且能精细拆分控制启用哪些作用域范围。目前精选优化的目录已毫无压力地涵盖 GitHub、Atlassian、Notion、Linear、Sentry、Cloudflare、Slack、飞书/Lark、Stripe、Hugging Face、Gamma、Brave Search、Perplexity、本地系统深度文件探测、SQLite 甚至专业的 PostgreSQL。此外,系统更逆天地内置附赠了两个重磅级的强力 Computer-use 级别核心服务(一个是基于大名鼎鼎的 Playwright 来暴躁驱动底层 Chrome 浏览器的 `pikiclaw-browser`;另一个则是依托极客向 Peekaboo 纯正血统,操控整个底层 macOS GUI 交互视窗的超级 `peekaboo` 工具)。
212
+ - **无缝衔接各类流行 CLI 神器** —— 底层逻辑强悍地支持自动探测各类版本兼容性并精准校验出授权登入状态。特别是遇到基于浏览器鉴权登录判定的 CLI,我们底层支持 OAuth-web 授权无缝接力。最后统统由 Agent 最原生的调用接口无缝唤起执行操作。
213
+ - **全局会话级的 MCP 底层桥接** —— `im_list_files`、`im_send_file`、`im_ask_user` 这些基建指令,再叠加前述的内置浏览器与 macOS 桌面自动化控制工具包(只要一旦开启安全开关),统统都会被全面自动注入进你的每一场会话里。
214
+ - **双域极简权限合并机制** —— 所有工具作用范围授权,永远只需遵循这条策略:`全局 (global) < 当前工作区 (workspace) < 内建 (built-in)`。底层引擎每次都能自动执行合并,并丝滑生效进后续发起的对话之中。
215
215
 
216
216
  <p align="center"><img src="docs/promo-dashboard-extensions-add.png" alt="添加 MCP server" width="780"></p>
217
217
 
218
- ### 运行时 & 开发体验
218
+ ### 运行环境与开发者体验 (Runtime & DX)
219
219
 
220
- - **会话工作区** —— 每个会话独占一个目录;附件直接落到那里。
221
- - **恢复、切换、归类** —— 多轮会话、会话分类(answer / proposal / implementation / blocked / …)。
222
- - **会话级 MCP 工具** —— `im_list_files`、`im_send_file`、`im_ask_user` 以及目标管理工具自动注入到每条流。
223
- - **Computer-use(浏览器)** —— 内建 `pikiclaw-browser` MCP `@playwright/mcp` 之上包了一份共享 Chrome profile 和一个进程级 supervisor;登录一次,跨任务复用凭证。
224
- - **Computer-use(macOS 桌面)** —— 内建 `pikiclaw-desktop` MCP 通过 Accessibility + ScreenCaptureKit 跑 [Peekaboo](https://peekaboo.sh/),暴露 `see`、`click`、`type`、`scroll`、`window`、`menu`、`app`、`dock`。需要在扩展里手动开启;要求"辅助功能"和"屏幕录制"两项权限;仅 macOS。
225
- - **长任务加固** —— 防休眠、watchdog、自动重启、daemon 模式、渠道 supervisor。
220
+ - **独享会话级项目工作区** —— 每开启一次新的交锋会话,底层引擎都会为它开辟出单独专属的实体文件隔离目录,附件直接落在那里。
221
+ - **多轮会话回溯管控** —— 随便怎么恢复、切换,还配上了贴心的语义会话分类体系(快速分为解答、提案、实现,阻塞等清晰状态标识归类)。
222
+ - **基建工具流自注入** —— 强悍的 `im_list_files`、`im_send_file`、以及 `im_ask_user`,加上目标追踪管理工具等,会在启动前夕自动挂载。
223
+ - **Computer-use (浏览器引擎层)** —— 系统底层内置了 `pikiclaw-browser` MCP。这是二次封装了 `@playwright/mcp` 实现的,使其拥有进程级 Supervisor 监管机制,且达成了跨任务进程共享独立 Chrome 配置。只需要登录认证一次常用网站;在未来的任何任务里,这个工具将直接一键继承数据免签直连!
224
+ - **Computer-use (macOS 桌面控制层)** —— 当你在扩展面板启用 `peekaboo` MCP 并在系统设置授予终端“辅助功能”与“屏幕录制”权限后(仅限 macOS);你即可借助 [Peekaboo](https://peekaboo.sh/) 框架的加持瞬间获得暴露在外的各种工具:视力 (`see`);精准点击 (`click`);虚空打字输入 (`type`);操作滚轮 (`scroll`);以及操作全系统窗口 (`window`);主菜单 (`menu`);程序生命周期 (`app`);甚至是 Dock (`dock`) 等这一整套系统控制工具集。
225
+ - **长效任务坚固防线** —— 核心内置了防休眠系统、看门狗守护模块、异常自动重启涅槃机制、守护进程模式;还有渠道 Supervisor 督军服务。这豪华阵容保证你哪怕挂机跑极其漫长的任务,也能极度稳如磐石!
226
226
 
227
227
  ---
228
228
 
229
- ## 这和其他东西有什么不一样?
229
+ ## 到底有什么不同?
230
230
 
231
- | | pikiclaw | IDE 类助手<br>(Cursor / Windsurf / Aider) | 云端 Agent<br>(Devin / web Claude) | Agent IM 机器人 |
231
+ | | pikiclaw | IDE 级智能助手<br>(Cursor / Windsurf / Aider) | 云端 Agent<br>(Devin / 网页版 Claude) | 单体 IM 机器人 |
232
232
  |---|---|---|---|---|
233
- | **终端** | 7 IM + Web + 后续插件 | 只有 IDE | Web 应用 | 一条 IM、一个 bot |
234
- | **Agent 在哪运行** | 你的机器 | 你的机器 | 厂商沙箱 | 通常在厂商侧 |
235
- | **Agent 选择** | Claude Code · Codex · Gemini · Hermes(ACP)· … | 绑定 | 单一 | 单一 |
236
- | **模型选择** | 前沿 + 国产 + 任意 OpenAI 兼容 | 厂商控制 | 厂商控制 | 单一 |
237
- | **并行 Agent** | **N 个 Agent × N 个窗口 × N 个工作区** | 每个 IDE 一个 | 串行 | 一个 |
238
- | **文件 / 工具** | 你的文件、你的 MCP、你的 CLI | 你的文件 | 沙箱 | / 受限 |
239
- | **接入新终端** | 加一个 `Channel` | n/a | n/a | Fork |
240
- | **接入新 Agent** | 加一个 `AgentDriver`(CLI 或 ACP) | n/a | n/a | Fork |
241
- | **能否自举** | **能 —— 用自己构建自己** | 不能 | 不能 | 不能 |
242
-
243
- 真正重要的形态是:**你不离开自己的环境,你保留自己的大脑选择权,你并行驱动一个 swarm,而这个编排器就是我们用来构建这个编排器的同一个东西。**
233
+ | **操作终端** | 7 IM + Web + 持续扩展 | 仅限 IDE 内部 | 局限在专属网页端 | 死绑在单个 IM 内的单个 Bot |
234
+ | **Agent 运行地** | 完全在你自己的本地机器上 | 你的本地机器 | 厂商分配的云端沙盒里 | 往往在厂商服务器端 |
235
+ | **Agent 的选择** | Claude Code · Codex · Gemini · Hermes (ACP) · …(任你选) | 深度绑定没得选 | 单一 | 单一 |
236
+ | **底层模型抉择** | 国外前沿大模型 + 国产全系 + 任何兼容 OpenAI 接口的模型 | 平台控制 | 厂商绑定 | 单一无脑没得换 |
237
+ | **并发能力** | **N 个 Agent × N 个窗口 × N 个工作区** | 每个 IDE 窗口只能同时运行一个 | 串行排队 | 单一线程 |
238
+ | **文件与工具掌控** | 你主机上的所有本地文件、MCP 资源库、以及本地 CLI 系统 | 本地文件 | 沙盒受限环境 | 极度受限 |
239
+ | **接入新终端渠道** | 随便写个 `Channel` 基础实现类就能打通 | 无法实现 | 无法实现 | 需要 Fork 整个项目 |
240
+ | **接入新 Agent** | 实现一个简单的 `AgentDriver` 接口(CLI 或 ACP 均可)极速完成 | 无法实现 | 无法实现 | 需要 Fork 整个项目 |
241
+ | **能否自举开发** | **能!完全是由它自己一砖一瓦开发出来的!** | 不能 | 不能 | 不能 |
242
+
243
+ 这个表格揭示了最核心的形态差异:**你不需要离开习惯的工作环境,你可以自由选择用哪颗「大脑」,你甚至可以并发操作一整支 AI 军队;而这个编排器本身,就是我们打造它的最佳工具。**
244
244
 
245
245
  ---
246
246
 
247
- ## 指令一览
247
+ ## 常用指令
248
248
 
249
- | 指令 | 说明 |
249
+ | 指令 | 描述 |
250
250
  |---|---|
251
- | `/start` | 入口信息、当前 Agent、工作目录 |
251
+ | `/start` | 查看入口信息、当前 Agent 及工作目录 |
252
252
  | `/sessions` | 查看、切换或新建会话 |
253
253
  | `/agents` | 切换 Agent(Claude · Codex · Gemini · Hermes) |
254
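- | `/models` | View and switch models / reasoning effort |
- | `/mode` | Toggle plan mode (reasoning effort) |
- | `/switch` | Browse and switch the working directory |
- | `/workspaces` | Pick one from the Dashboard's quick workspace list |
- | `/goal` | Set or view a session-level long-running goal (self-terminating) |
- | `/stop` | Stop the current session |
- | `/status` | Runtime status, tokens, usage, session info |
- | `/host` | Host CPU / memory / disk / battery |
- | `/skills` | Browse project Skills |
- | `/ext` | Extensions overview |
- | `/restart` | Restart the bot |
- | `/sk_<name>` | Run a project Skill |
-
- Plain text is forwarded directly to the current agent.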
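+ | `/models` | View and switch the current session's model and reasoning effort |
+ | `/mode` | Quickly toggle plan mode (reasoning effort) |
+ | `/switch` | Browse and quickly switch the working directory |
+ | `/workspaces` | Choose a workspace from the quick list saved in the Dashboard |
+ | `/goal` | Set or inspect the session's long-running goal (the agent stops automatically once it is reached) |
+ | `/stop` | Force-stop the current session |
+ | `/status` | Check runtime status, token consumption, resource usage, and a session summary |
+ | `/host` | Monitor the host's CPU / memory / disk / battery status |
+ | `/skills` | Browse all Skills available to the current project |
+ | `/ext` | Quick view of extension status |
+ | `/restart` | Restart and reload the bot service |
+ | `/sk_<name>` | Quickly trigger a specific project Skill |
+
+ *Note: plain text without a leading slash is sent directly to the current agent as an ordinary message.*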
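The routing rule described by the command table (built-in slash commands, `/sk_<name>` for project Skills, and plain text forwarded to the agent) can be sketched roughly as follows. This is a minimal illustration, not pikiclaw's actual code: `Routed`, `BUILTIN_COMMANDS`, and `routeMessage` are hypothetical names, and passing unknown slash commands through to the agent is an assumption the README does not state.

```typescript
// Hypothetical sketch: none of these names come from the pikiclaw source.
type Routed =
  | { kind: "command"; name: string; args: string }
  | { kind: "skill"; skill: string; args: string }
  | { kind: "agent"; text: string };

// Built-in commands as listed in the README's command table.
const BUILTIN_COMMANDS = new Set<string>([
  "start", "sessions", "agents", "models", "mode", "switch", "workspaces",
  "goal", "stop", "status", "host", "skills", "ext", "restart",
]);

function routeMessage(input: string): Routed {
  const text = input.trim();
  // A leading slash, a command word, then optional arguments.
  const match = /^\/([A-Za-z0-9_]+)(?:\s+([\s\S]*))?$/.exec(text);
  if (!match) {
    // No leading slash: an ordinary message for the current agent.
    return { kind: "agent", text };
  }
  const name = match[1] ?? "";
  const args = (match[2] ?? "").trim();
  if (name.startsWith("sk_") && name.length > 3) {
    // `/sk_<name>` triggers the project Skill called <name>.
    return { kind: "skill", skill: name.slice(3), args };
  }
  if (BUILTIN_COMMANDS.has(name)) {
    return { kind: "command", name, args };
  }
  // Assumption: unknown slash commands are passed to the agent unchanged.
  return { kind: "agent", text };
}
```

Under these assumptions, `routeMessage("/goal ship v1")` yields a `command` entry named `goal`, while `routeMessage("fix the failing test")` yields an `agent` entry carrying the text as-is.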
  
  ---
  
- ## Configuration
+ ## Configuration management
  
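- - Persistent config file: `~/.pikiclaw/setting.json`, covering channels, agents, Providers/Profiles, workspaces, and MCP extensions
- - The Dashboard is the main configuration entry point; the terminal wizard (`--setup`) and `--doctor` are kept for headless scenarios
- - Global MCP extensions live under `extensions.mcp` in the setting file
- - Workspace MCP extensions: the standard `.mcp.json` in the project root
- - Project Skills: `.pikiclaw/skills/*/SKILL.md` (`.claude/commands/*.md` is also recognized)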
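+ - Core persistent config file: `~/.pikiclaw/setting.json`, which stores channels, agents, Providers/Profiles, workspace history, MCP extensions, and related settings.
+ - The Dashboard is the main configuration entry point; the interactive terminal wizard (`--setup`) and the health-check script (`--doctor`) are mainly intended for headless (no-UI) environments.
+ - Global MCP extension configuration lives under the `extensions.mcp` field of `setting.json`.
+ - Workspace MCP extensions: following the standard convention, these live in `.mcp.json` at the project root.
+ - Project-specific Skills: stored in `.pikiclaw/skills/*/SKILL.md` (the `.claude/commands/*.md` format is also supported and loaded).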
  
- **Computer-use** is controlled by two toggles on the extensions page:
+ **The Computer-use permission toggles** are controlled separately in the extensions panel:
  
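- - `browserEnabled`: managed Chrome (Playwright). The first time an agent needs Chrome, pikiclaw creates a dedicated profile under `~/.pikiclaw` and reuses it in later sessions. Log in to your usual sites once, and every later session carries the credentials.
- - `desktopEnabled`: macOS desktop (Peekaboo). When enabled (macOS only), pikiclaw launches the `peekaboo-mcp` binary from `@steipete/peekaboo` and injects its tools. Before turning the toggle on, grant the parent terminal **Accessibility** and **Screen Recording** permissions under *System Settings → Privacy & Security*.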
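+ - `browserEnabled`: enables managed Chrome (Playwright). The first time an agent invokes Chrome, pikiclaw generates a dedicated profile under `~/.pikiclaw` that later sessions reuse across tasks. Log in to your usual sites once, and later sessions connect directly without scanning a QR code again.
+ - `peekabooEnabled`: enables macOS desktop control (Peekaboo). This feature is macOS-only; once enabled, pikiclaw starts the `peekaboo-mcp` process from `@steipete/peekaboo` and mounts its tools. *Before enabling it, be sure to go to "System Settings → Privacy & Security" on macOS and grant the terminal that launches pikiclaw **Accessibility** and **Screen Recording** permissions.*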
  
  ---
  
- ## Roadmap
+ ## Product roadmap (Roadmap)
  
- Shipped: Hermes driver · ACP (Agent Client Protocol) · Provider/Profile model credential vault · seven IM channels · Computer-use (Playwright browser + Peekaboo macOS desktop).
+ We have shipped: Hermes driver support · low-level ACP (Agent Client Protocol) integration · the Provider/Profile model vault mechanism · seven IM channels connected · Computer-use (managed Playwright browser + managed Peekaboo macOS desktop).
  
- - **More ACP agents**: any new ACP-compatible agent should plug in without a hand-written driver
- - **More terminals**: WhatsApp, a dedicated mobile app, voice
- - **A deeper model layer**: agent-on-arbitrary-model wrappers for more Chinese model families
- - **A better tool ecosystem**: recommended MCP collections, Skill templates, a marketplace
- - **Cross-platform Computer-use**: Windows / Linux desktop drivers in addition to macOS Peekaboo
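+ - **More ACP agents**: ensure any new ACP-compatible agent can plug in smoothly with no code and no configuration.
+ - **A broader terminal ecosystem**: support for WhatsApp, a standalone mobile app, and a voice interaction module.
+ - **A deeper model-layer wrapper**: build a general agent wrapper over arbitrary models so that more strong Chinese models can be driven seamlessly.
+ - **A more complete tool ecosystem**: launch officially recommended MCP plugin collections, a Skill template library, and a community marketplace.
+ - **Computer-use on every platform**: add desktop-control support for Windows / Linux alongside the existing macOS Peekaboo driver.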
  
- For protocol-layer details, see the [ACP migration plan](docs/acp-migration.md).
+ To learn about the next steps at the protocol layer, see the [ACP migration plan](docs/acp-migration.md).
  
  ---
  
@@ -308,43 +308,43 @@ npm test
  ```
  
  ```bash
- npm run dev # local development (--no-daemon, logs written to ~/.pikiclaw/dev/dev.log)
- npm run build # production build (dashboard + tsc)
- npm test # vitest run
- npx pikiclaw@latest --doctor # environment check
+ npm run dev # start the local dev service (--no-daemon, live logs written to ~/.pikiclaw/dev/dev.log)
+ npm run build # production build (Dashboard build + tsc)
+ npm test # run the Vitest test suite
+ npx pikiclaw@latest --doctor # check the health of the local environment
  ```
316
316
 
317
- 架构与集成深入文档:[ARCHITECTURE.md](ARCHITECTURE.md) · [INTEGRATION.md](INTEGRATION.md) · [TESTING.md](TESTING.md)
317
+ 想要深度了解架构与集成细节,请参阅:[ARCHITECTURE.md](ARCHITECTURE.md) · [INTEGRATION.md](INTEGRATION.md) · [TESTING.md](TESTING.md)
318
318
 
319
319
  ---
320
320
 
321
- ## 贡献
321
+ ## 参与贡献
322
322
 
323
- 这个项目的每一层都是*被设计成*可扩展的。新终端、新 Agent、新模型 wrapper、新 MCP 工具 —— 都是一等公民级别的贡献。
323
+ 这个项目架构中的每一个分层,生来就是为了被**扩展**的。接入一个新终端、编写一个新 Agent、打造一款模型 Wrapper 或是增加实用的 MCP 工具 —— 这些全都是一等公民级别的贡献。
324
324
 
325
- - Start by reading the **[Contributing guide](CONTRIBUTING.md)**
- - Take a look at [`good first issue`](https://github.com/xiaotonng/pikiclaw/labels/good%20first%20issue) and [`help wanted`](https://github.com/xiaotonng/pikiclaw/labels/help%20wanted)
- - For larger changes, open an issue first to align on the approach
+ - Please read the **[Contributing guide](CONTRIBUTING.md)** first as your starting point.
+ - Tasks labeled [`good first issue`](https://github.com/xiaotonng/pikiclaw/labels/good%20first%20issue) and [`help wanted`](https://github.com/xiaotonng/pikiclaw/labels/help%20wanted) are a good place to look.
+ - If you plan a larger change, please open an issue first so everyone can agree on the technical approach.
  
- | Entry point | What you might add |
+ | Module location | What you can extend |
  |---|---|
- | `src/agent/driver.ts`, `src/agent/drivers/*.ts`, `src/agent/acp-client.ts` | A new agent driver (CLI or ACP) |
- | `src/channels/base.ts`, `src/channels/*/` | A new terminal / IM channel |
- | `src/model/`, `src/model/injector.ts` | A new model provider or per-agent injection rule |
- | `src/dashboard/routes/*.ts` | A new Dashboard API |
- | `src/agent/mcp/tools/*.ts`, `src/agent/mcp/bridge.ts` | New session-scoped MCP tools |
- | `src/catalog/*.ts` | A recommended MCP server / CLI tool / Skill repository |
+ | `src/agent/driver.ts`, `src/agent/drivers/*.ts`, `src/agent/acp-client.ts` | Add a new agent driver (based on CLI or the ACP protocol) |
+ | `src/channels/base.ts`, `src/channels/*/` | Connect a new terminal or IM channel |
+ | `src/model/`, `src/model/injector.ts` | Add a model provider, or customize the injection rules for the agent environment |
+ | `src/dashboard/routes/*.ts` | Extend the Dashboard backend API |
+ | `src/agent/mcp/tools/*.ts`, `src/agent/mcp/bridge.ts` | Add MCP tools scoped to a single session |
+ | `src/catalog/*.ts` | Recommend a good MCP server, CLI utility, or quality Skill repository to us |
  
  ---
  
- ## Star history
+ ## Star history trend
  
  <a href="https://www.star-history.com/#xiaotonng/pikiclaw&Date">
- <img src="https://api.star-history.com/svg?repos=xiaotonng/pikiclaw&type=Date" alt="Star history" width="640">
+ <img src="https://api.star-history.com/svg?repos=xiaotonng/pikiclaw&type=Date" alt="Star 历史" width="640">
  </a>
  
  ---
  
- ## License
+ ## License
  
- [MIT](LICENSE): built in the open. Use it, fork it, and add your own layer on top.
+ [MIT](LICENSE): committed to building in the open. Use it freely, fork it, or plug in any layer you build yourself!
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "pikiclaw",
- "version": "0.3.36",
+ "version": "0.3.37",
  "description": "Put the world's smartest AI agents in your pocket. Command local Claude & Gemini via IM. | 让最好用的 IM 变成你电脑上的顶级 Agent 控制台",
  "type": "module",
  "bin": {