@youtyan/browser-pilot 0.7.1

package/LICENSE ADDED
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2026 youtyan
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
package/README.ja.md ADDED
@@ -0,0 +1,138 @@
+ # browser-pilot
+
+ An MCP server for controlling the Chrome browser from Claude Code. It operates a real browser via a Chrome Extension.
+
+ ## Quick Start
+
+ ### 1. Add to the MCP configuration
+
+ Add the following to `mcpServers` in `~/.claude/settings.json`:
+
+ ```json
+ {
+   "mcpServers": {
+     "browser-pilot": {
+       "command": "npx",
+       "args": ["-y", "@youtyan/browser-pilot"]
+     }
+   }
+ }
+ ```
+
+ ### 2. Set up Skills + Extension
+
+ ```bash
+ npx @youtyan/browser-pilot init
+ ```
+
+ ### 3. Load the Chrome Extension
+
+ Follow the instructions from `init` and load the Extension in Chrome's Developer Mode.
+
+ ## Skills
+
+ The following 7 skills are included:
+
+ | Skill | Description |
+ |---|---|
+ | bp-usage | Browser operation guide and tool quick reference |
+ | bp-testing | Agent-driven web app testing |
+ | bp-test-scripts | Code-driven browser testing (HTTP API) |
+ | bp-gemini-image | Image generation with the Gemini Web UI |
+ | bp-x-operation | X (Twitter) operations (post, search, collect) |
+ | bp-annotate-coords | Screenshot annotation by coordinates |
+ | bp-generate-manual | Automatic generation of site operation manuals |
+
+ Can also be installed with `npx skills add`:
+
+ ```bash
+ npx skills add https://github.com/youtyan/browser-pilot
+ ```
+
+ ## Available Tools (35)
+
+ ### Browser Operation
+
+ | Tool | Description |
+ |---|---|
+ | browser_tab | Tab management (list, connect, create, close, action log, manual/test generation) |
+ | browser_get_page | Page retrieval (smart/compact/navigate/html/tables/forms/links/items/metadata) |
+ | browser_navigation | Navigation (go to URL, back/forward, reload) |
+ | browser_click | Click (text, CSS selector, ref, backendNodeId) |
+ | browser_set_values | Set form values (with retry and verification) |
+ | browser_select_option | Dropdown selection (native and custom supported) |
+ | browser_batch | Batch execution of multiple operations (click, setValue, scroll, wait, assert, etc.) |
+ | browser_find_elements | Element search (AX semantic or DOM) |
+ | browser_keyboard | Keyboard input (text, keys, shortcuts) |
+ | browser_scroll | Scroll operations |
+ | browser_wait_for | Wait for element appearance/disappearance/count change; ms delay |
+ | browser_assert | Page state assertions (exists, visible, contains_text, count) |
+ | browser_action | Mouse operations (hover, dblclick, drag, contextmenu) |
+ | browser_handle_dialog | Dialog handling (accept/dismiss) |
+ | browser_execute_javascript | JavaScript execution |
+ | browser_health | Connection diagnostics (one call) |
+
+ ### Capture & Output
+
+ | Tool | Description |
+ |---|---|
+ | browser_screenshot | Screenshot (full page/element, savePath option) |
+ | browser_pdf | Save page as PDF |
+ | browser_record | Tab recording (WebM, savePath option) |
+ | browser_get_bounds | Get element coordinates and size |
+ | browser_annotate | Annotation (mosaic, arrows, labels, badges) |
+ | browser_get_attributes | Get element attributes (selector/backendNodeId) |
+ | browser_paste_file | File paste |
+ | browser_upload_file | File upload |
+
+ ### Debugging & Diagnostics
+
+ | Tool | Description |
+ |---|---|
+ | browser_console | Console messages (with source map resolution) |
+ | browser_network | Network monitoring (with redaction) |
+ | browser_performance | Performance diagnostics (load/layout/network/runtime/a11y/memory/page) |
+ | browser_css_inspect | CSS inspection and computed styles |
+ | browser_storage | localStorage/sessionStorage inspection |
+ | browser_cookie | Cookie management (get/set/delete) |
+
+ ### Integrations
+
+ | Tool | Description |
+ |---|---|
+ | gemini_generate_image | Image generation via the Gemini Web UI |
+ | x_collect_tweets | X (Twitter) tweet collection (scroll + deduplication) |
+
+ ## Security Notes
+
+ browser-pilot is a powerful browser automation tool. Please understand the following risks before use.
+
+ - **`browser_execute_javascript` executes arbitrary JavaScript in the MAIN world of Chrome tabs.** It has full access to the page's DOM, cookies, localStorage, and so on.
+ - **The MCP server binds to `localhost:18888`.** Any process on the machine that has the token can connect.
+ - **The authentication token is stored in `~/.browser-pilot-token` with 0600 permissions.**
+ - **The Chrome Extension has the `<all_urls>` permission.** It can access any website.
+ - **Use it only in trusted environments.**
+
+ ### Token Pairing
+
+ The Extension authenticates with a bearer token. Pairing steps:
+
+ 1. Start the MCP server
+ 2. Display the token with `npx @youtyan/browser-pilot token`
+ 3. Paste it into the Extension popup
+
+ The token is stored on disk (`~/.browser-pilot-token`) with `0600` permissions. There is no HTTP endpoint that exposes the token. Copy it using the CLI command or manually.
+
+ ## Development
+
+ See [DEVELOPMENT.md](./DEVELOPMENT.md) for setup, architecture, how to add tools, and build steps.
+
+ ## Updating
+
+ ```bash
+ npx @youtyan/browser-pilot update
+ ```
+
+ ## License
+
+ MIT
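Reviewer note: Quick Start step 1 in the README above asks the user to add a `browser-pilot` entry under `mcpServers` in `~/.claude/settings.json` by hand. The merge can be sketched in Python as below. This is a minimal illustration and not part of the package: the helper name `add_browser_pilot_entry` is hypothetical, and it assumes the settings file is plain JSON whose top-level `mcpServers` key, if present, is an object.

```python
import copy
import json

# Entry copied from the README's Quick Start snippet.
BROWSER_PILOT_ENTRY = {"command": "npx", "args": ["-y", "@youtyan/browser-pilot"]}


def add_browser_pilot_entry(settings: dict) -> dict:
    """Return a copy of `settings` with the browser-pilot MCP entry added.

    Hypothetical helper: other top-level keys and other MCP servers are kept,
    and the input dictionary is not modified.
    """
    merged = copy.deepcopy(settings)
    servers = merged.setdefault("mcpServers", {})
    servers["browser-pilot"] = copy.deepcopy(BROWSER_PILOT_ENTRY)
    return merged


if __name__ == "__main__":
    # Example input with an unrelated, made-up server entry.
    existing = {"mcpServers": {"other-server": {"command": "node", "args": ["server.js"]}}}
    print(json.dumps(add_browser_pilot_entry(existing), indent=2))
```

Working on a deep copy keeps the merge side-effect free, so a caller could compare the result with the original before deciding to write anything back to disk.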
package/README.md ADDED
@@ -0,0 +1,140 @@
+ # browser-pilot
+
+ [日本語版 README](./README.ja.md)
+
+ An MCP server for controlling Chrome browsers from Claude Code. Operates real browsers via a Chrome Extension.
+
+ ## Quick Start
+
+ ### 1. Add MCP configuration
+
+ Add the following to `mcpServers` in `~/.claude/settings.json`:
+
+ ```json
+ {
+   "mcpServers": {
+     "browser-pilot": {
+       "command": "npx",
+       "args": ["-y", "@youtyan/browser-pilot"]
+     }
+   }
+ }
+ ```
+
+ ### 2. Set up Skills and Extension
+
+ ```bash
+ npx @youtyan/browser-pilot init
+ ```
+
+ ### 3. Load the Chrome Extension
+
+ Follow the instructions from `init` to load the Extension in Chrome Developer Mode.
+
+ ## Skills
+
+ Seven skills are included:
+
+ | Skill | Description |
+ |---|---|
+ | bp-usage | Browser operation guide and tool quick reference |
+ | bp-testing | Agent-driven web app testing |
+ | bp-test-scripts | Code-driven browser testing (HTTP API) |
+ | bp-gemini-image | Image generation via Gemini Web UI |
+ | bp-x-operation | X (Twitter) operations (post, search, collect) |
+ | bp-annotate-coords | Screenshot annotation with coordinates and bounds |
+ | bp-generate-manual | Site operation manual generation with screenshots/video |
+
+ You can also install skills with `npx skills add`:
+
+ ```bash
+ npx skills add https://github.com/youtyan/browser-pilot
+ ```
+
+ ## Available Tools (35)
+
+ ### Browser Operation
+
+ | Tool | Description |
+ |---|---|
+ | browser_tab | Tab management (list, connect, create, close, action log, manual/test generation) |
+ | browser_get_page | Get page content (smart/compact/navigate/html/tables/forms/links/items/metadata modes) |
+ | browser_navigation | Navigate to URL, go back/forward, reload |
+ | browser_click | Click by text, CSS selector, ref, or backendNodeId |
+ | browser_set_values | Set form field values (with retry and verify) |
+ | browser_select_option | Select dropdown options (native and custom) |
+ | browser_batch | Execute multiple actions in one round-trip (click, setValue, scroll, wait, assert, keyboard, selectOption) |
+ | browser_find_elements | Find elements with semantic AX scoring or DOM search |
+ | browser_keyboard | Keyboard input (type text, press keys, shortcuts) |
+ | browser_scroll | Scroll page or element |
+ | browser_wait_for | Wait for element appear/disappear/count change, or simple ms delay |
+ | browser_assert | Assert page state (exists, visible, contains_text, count) |
+ | browser_action | Mouse actions (hover, dblclick, drag, contextmenu) |
+ | browser_handle_dialog | Accept or dismiss browser dialogs |
+ | browser_execute_javascript | Execute JavaScript in page context |
+ | browser_health | One-call connection diagnostics |
+
+ ### Capture & Output
+
+ | Tool | Description |
+ |---|---|
+ | browser_screenshot | Take screenshot (full page or element, with savePath) |
+ | browser_pdf | Save page as PDF |
+ | browser_record | Record tab activity as WebM video (with savePath) |
+ | browser_get_bounds | Get element bounding rectangles for annotation |
+ | browser_annotate | Annotate screenshots (mosaic, arrows, labels, badges) |
+ | browser_get_attributes | Get element attributes (selector or backendNodeId) |
+ | browser_paste_file | Paste a file into page |
+ | browser_upload_file | Upload file to input element |
+
+ ### Debugging & Diagnostics
+
+ | Tool | Description |
+ |---|---|
+ | browser_console | Console messages (with source map resolution) |
+ | browser_network | Network request monitoring (with redaction) |
+ | browser_performance | Performance diagnostics (diagnose_load/layout/network/runtime/accessibility/memory/page) |
+ | browser_css_inspect | CSS inspection and computed styles |
+ | browser_storage | localStorage/sessionStorage inspection |
+ | browser_cookie | Cookie management (get, set, delete) |
+
+ ### Integrations
+
+ | Tool | Description |
+ |---|---|
+ | gemini_generate_image | Generate images via Gemini Web UI |
+ | x_collect_tweets | Collect tweets from X (Twitter) with scroll and dedup |
+
+ ## Security Considerations
+
+ browser-pilot is a powerful browser automation tool. Understand the following risks before use.
+
+ - **`browser_execute_javascript` executes arbitrary JavaScript in the MAIN world of Chrome tabs.** This gives full access to the page's DOM, cookies, localStorage, and more.
+ - **The MCP server binds to `localhost:18888`.** Any process on the machine can connect if it has the authentication token.
+ - **The authentication token is stored in `~/.browser-pilot-token` with `0600` permissions.**
+ - **The Chrome Extension has `<all_urls>` permission.** It can access any website.
+ - **Only use this tool in trusted environments.**
+
+ ### Token Pairing
+
+ The Extension authenticates via a bearer token stored in `chrome.storage.local`. To pair:
+
+ 1. Start the MCP server
+ 2. Run `npx @youtyan/browser-pilot token` to print the token
+ 3. Paste into the Extension popup
+
+ The token is stored on disk (`~/.browser-pilot-token`) with `0600` permissions. There is no HTTP endpoint that exposes the token; it must be copied manually or via the CLI command.
+
+ ## Development
+
+ See [DEVELOPMENT.md](./DEVELOPMENT.md) for setup, architecture, adding tools, and build instructions.
+
+ ## Updating
+
+ ```bash
+ npx @youtyan/browser-pilot update
+ ```
+
+ ## License
+
+ MIT
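Reviewer note: the Security Considerations in the README above make two claims that can be checked locally: the token file `~/.browser-pilot-token` is not readable by other users, and a server listens on `localhost:18888`. The Python sketch below shows one way to check both. It is an illustration only and not part of the package: the function names are hypothetical, the permission check assumes a POSIX system, and using the IPv4 loopback address `127.0.0.1` for `localhost` is an assumption.

```python
import socket
import stat
from pathlib import Path

TOKEN_PATH = Path.home() / ".browser-pilot-token"  # path stated in the README
SERVER_ADDR = ("127.0.0.1", 18888)  # README says localhost:18888; IPv4 loopback is assumed


def token_file_is_private(path: Path) -> bool:
    """True if the file exists and grants no permissions to group or others.

    A file with mode 0600 passes; so does any mode without group/other bits.
    """
    try:
        mode = stat.S_IMODE(path.stat().st_mode)
    except FileNotFoundError:
        return False
    return mode & 0o077 == 0


def port_is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print("token file private:", token_file_is_private(TOKEN_PATH))
    print("server listening:", port_is_listening(*SERVER_ADDR))
```

The port check only opens and closes a TCP connection. It sends no token and makes no assumption about the protocol the server speaks.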