kimiflare 0.44.0 → 0.46.0

package/PLAN.md CHANGED
@@ -1,172 +1,59 @@
1
- # Plan: `kimiflare`, a Claude Code-style CLI powered by Kimi-K2.6
2
-
3
- > Living document. Keep updated at every milestone so context survives compaction.
4
-
5
- ## Context
6
-
7
- Build a terminal coding agent, similar in spirit to Claude Code, driven by the `@cf/moonshotai/kimi-k2.6` model hosted on Cloudflare Workers AI. User has large Cloudflare credits and wants them used directly: no AI Gateway, no OpenAI-compat layer; calls go straight to the native Workers AI `ai/run` REST endpoint.
8
-
9
- - **Model**: `@cf/moonshotai/kimi-k2.6`. 262,144-token context, native function calling, streaming, reasoning, vision (unused in v1).
10
- - **Endpoint**: `POST https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/run/@cf/moonshotai/kimi-k2.6` · `Authorization: Bearer $CLOUDFLARE_API_TOKEN`.
11
- - **Pricing**: $0.95 / M input, $0.16 / M cached input, $4.00 / M output.
12
-
13
- Outcome: a `kimiflare` binary that opens a TUI, lets the user chat, calls Kimi-k2.6 with tools (file I/O, bash, search, web fetch), streams tokens, asks permission before mutating anything, and loops tool results back to the model until it stops.
14
-
15
- ## Probe findings (verified against the live API)
16
-
17
- - Non-streaming body is wrapped: `{result: {...OpenAI chat.completion...}, success, errors, messages}`. Unwrap `result`.
18
- - Streaming body is raw SSE — `data: {json}\n\n`, terminal `data: [DONE]`. OpenAI-compatible chunks.
19
- - Kimi emits `delta.reasoning_content` BEFORE `delta.content`. Both are independent streams that share the `max_completion_tokens` budget. Use a generous cap (16384) so reasoning doesn't eat the final answer.
20
- - Tool calls: first chunk has `{id, name, arguments: ""}`; later chunks carry only `arguments` delta, `index` identifies which call. Key accumulator by `index`, not `id`.
21
- - Tool round-trip matches OpenAI spec: `role=assistant` with `tool_calls[]`, followed by `role=tool` with `tool_call_id`.
22
- - Transient `HTTP 200 + success:false + errors[0].code:3040` ("Capacity temporarily exceeded"). Must retry with exponential backoff.
23
- - Extra `data: {"response":"","usage":{...}}` event arrives before `[DONE]`; parser should ignore unknown events.
24
- - `usage` rolls up in every streaming chunk → free live cost display.
25
- - Tool-call `id` format is `"functions.<name>:<N>"`; treat as opaque string.
26
-
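Reviewer note on the tool-call finding above (accumulate by `index`, not `id`): a minimal sketch of such an accumulator. The `ToolCallDelta` and `ToolCall` shapes and the function name are hypothetical, chosen for illustration; the nested `function` field follows the OpenAI chunk format that the findings say the stream is compatible with. The real types in `src/agent/stream.ts` may differ.

```typescript
// Hypothetical delta shape, modelled on OpenAI-style streaming chunks.
interface ToolCallDelta {
  index: number;
  id?: string;
  function?: { name?: string; arguments?: string };
}

interface ToolCall {
  id: string;
  name: string;
  arguments: string;
}

// Merge streamed deltas into complete tool calls.
// Only the first delta of a call carries `id` and `name`; later deltas carry
// argument fragments, so `index` is the only key present on every delta.
function accumulateToolCalls(deltas: ToolCallDelta[]): ToolCall[] {
  const byIndex: Record<number, ToolCall> = {};
  for (const d of deltas) {
    const call = byIndex[d.index] ?? (byIndex[d.index] = { id: "", name: "", arguments: "" });
    if (d.id) call.id = d.id;
    if (d.function?.name) call.name = d.function.name;
    if (d.function?.arguments) call.arguments += d.function.arguments;
  }
  return Object.keys(byIndex)
    .map(Number)
    .sort((a, b) => a - b)
    .map((i) => byIndex[i]);
}
```

Interleaved deltas for two parallel calls still end up in separate, complete argument strings, because each fragment is appended to the slot for its own `index`.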
27
- ## Stack
28
-
29
- - Node 20+ (built-in fetch / ReadableStream / AbortController), ESM.
30
- - TypeScript. `tsup` builds, `tsx` for dev.
31
- - Ink + `ink-text-input`, `ink-spinner`, `ink-select-input`. React 18.
32
- - `commander` for CLI args. `fast-glob` for globs. `diff` for unified diffs. `turndown` for HTML→markdown.
33
- - No AI Gateway. No `openai` SDK. Direct `fetch`.
34
-
35
- ## File layout
36
-
37
- ```
38
- /Users/sinameraji/kimiflare/
39
- ├── package.json
40
- ├── tsconfig.json
41
- ├── tsup.config.ts
42
- ├── bin/kimiflare.mjs  # shebang shim → dist/index.js
43
- ├── src/
44
- │   ├── index.tsx  # CLI entry (commander → Ink render or one-shot)
45
- │   ├── app.tsx  # Ink root: chat + input + permission modal
46
- │   ├── config.ts  # env + ~/.config/kimiflare/config.json
47
- │   ├── agent/
48
- │   │   ├── client.ts  # Cloudflare Workers AI client (fetch + SSE + retry)
49
- │   │   ├── stream.ts  # delta accumulator (reasoning/content/tool_calls)
50
- │   │   ├── loop.ts  # model ↔ tools orchestration
51
- │   │   ├── messages.ts  # message types
52
- │   │   └── system-prompt.ts  # cwd/platform/date/tools
53
- │   ├── tools/
54
- │   │   ├── registry.ts
55
- │   │   ├── executor.ts
56
- │   │   ├── read.ts | write.ts | edit.ts | bash.ts | glob.ts | grep.ts | web-fetch.ts
57
- │   ├── ui/
58
- │   │   ├── chat.tsx | input.tsx | permission.tsx | tool-view.tsx | diff-view.tsx | spinner.tsx
59
- │   └── util/
60
- │       ├── sse.ts
61
- │       ├── paths.ts
62
- │       └── errors.ts
63
- ```
64
-
65
- ## Core designs
66
-
67
- ### Cloudflare Workers AI client (`src/agent/client.ts`)
68
-
69
- `runKimi({ messages, tools, signal })`:
70
-
71
- - POSTs to `https://api.cloudflare.com/client/v4/accounts/${accountId}/ai/run/@cf/moonshotai/kimi-k2.6` with Bearer token.
72
- - Body: `{ messages, tools, tool_choice: "auto", parallel_tool_calls: true, stream: true, temperature: 0.2, max_completion_tokens: 16384 }`.
73
- - Returns an async iterator: `{type:"reasoning",delta}`, `{type:"text",delta}`, `{type:"tool_call_start",index,id,name}`, `{type:"tool_call_args",index,argsDelta}`, `{type:"usage",usage}`, `{type:"done",finishReason,usage}`.
74
- - Retries on `code:3040` up to 5× with backoff (500ms, 1s, 2s, 4s, 8s + jitter).
75
-
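Reviewer note on the retry rule above: the schedule (500ms, 1s, 2s, 4s, 8s plus jitter) is a doubling backoff starting at 500ms. The sketch below is one way to express it. The 25% jitter fraction, the `isTransient` predicate, and the injectable `sleep` are assumptions added for illustration; the plan does not say how much jitter is applied or how the `code:3040` response is surfaced to the retry logic.

```typescript
// Delay before retry number `attempt` (0-based): 500ms doubled each time.
// `random` is injectable so the schedule can be checked deterministically.
function backoffDelayMs(attempt: number, random: () => number = Math.random): number {
  const base = 500 * Math.pow(2, attempt); // 500, 1000, 2000, 4000, 8000
  const jitter = random() * 0.25 * base; // assumption: up to +25% jitter
  return Math.round(base + jitter);
}

// Run `op`, retrying up to `maxRetries` times when `isTransient` says so.
// In kimiflare's case the predicate would recognise the
// "HTTP 200 + success:false + code 3040" response; here it is hypothetical.
async function withRetry<T>(
  op: () => Promise<T>,
  isTransient: (err: unknown) => boolean,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
  maxRetries = 5,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await op();
    } catch (err) {
      if (attempt >= maxRetries || !isTransient(err)) throw err;
      await sleep(backoffDelayMs(attempt));
    }
  }
}
```

With jitter disabled, `backoffDelayMs` reproduces the five delays listed in the plan, so the worst-case added latency before giving up is about 15.5 seconds.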
76
- ### SSE reader (`src/util/sse.ts`)
77
-
78
- `async function* readSSE(stream: ReadableStream<Uint8Array>)` — splits on `\n\n`, strips `data: ` prefix, yields parsed JSON, stops on `[DONE]`.
79
-
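Reviewer note on the SSE reader above: the split-on-blank-line logic can be sketched as an incremental parser over already-decoded text. This is a simplified stand-in, not the package's `readSSE`; the class name `SseParser` is hypothetical, and a real reader would decode the `Uint8Array` chunks from the stream before feeding them in.

```typescript
// Incremental SSE parser: feed it decoded text chunks, get back parsed JSON events.
class SseParser {
  private buffer = "";
  done = false;

  push(chunk: string): unknown[] {
    const events: unknown[] = [];
    if (this.done) return events;
    this.buffer += chunk;
    let sep: number;
    // A frame is complete only once its terminating blank line has arrived.
    while ((sep = this.buffer.indexOf("\n\n")) !== -1) {
      const frame = this.buffer.slice(0, sep);
      this.buffer = this.buffer.slice(sep + 2);
      for (const line of frame.split("\n")) {
        if (line.slice(0, 5) !== "data:") continue; // skip comments and other fields
        const payload = line.slice(5).trim();
        if (payload === "[DONE]") {
          this.done = true; // terminal sentinel: ignore anything after it
          return events;
        }
        if (payload) events.push(JSON.parse(payload));
      }
    }
    return events;
  }
}
```

Because incomplete frames stay in the buffer, a JSON payload split across two network chunks is parsed only after the second chunk arrives, which is the split-chunk case the plan's unit tests mention.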
80
- ### Agent loop (`src/agent/loop.ts`)
81
-
82
- ```
83
- loop:
84
-   stream = runKimi(messages, tools)
85
-   collect content/reasoning/tool_calls from events
86
-   push final assistant message to messages[]
87
-   if tool_calls:
88
-     for each tc (serialize mutating, parallel for readonly):
89
-       result = executor.run(tc, askPermission)
90
-       push { role:"tool", tool_call_id: tc.id, content: result }
91
-     continue
92
-   else:
93
-     yield to UI
94
- ```
95
-
96
- Max 50 iterations per user turn.
97
-
98
- ### Tools
99
-
100
- | Tool | Permission | Notes |
101
- |-------------|------------|-------|
102
- | `read` | auto | `path`, `offset?`, `limit?`. UTF-8 text, ≤2MB. |
103
- | `write` | prompt | `path`, `content`. Diff preview. |
104
- | `edit` | prompt | `path`, `old_string`, `new_string`, `replace_all?`. Unique-match enforced. |
105
- | `bash` | prompt | `command`, `timeout_ms?` (default 120s, max 600s). Output capped 30KB. |
106
- | `glob` | auto | `pattern`, `path?`. `fast-glob`, 200 results sorted by mtime. |
107
- | `grep` | auto | Shells out to `rg` if present, JS fallback otherwise. |
108
- | `web_fetch` | auto | `url`, 20s timeout, HTML→markdown, ≤100KB. |
109
-
110
- Permission modal scopes: **this call** / **this session for this tool** / **deny**. Bash session-allow is keyed by command prefix.
111
-
112
- ### System prompt
113
-
114
- Injects cwd, platform, shell, date, and one-line per registered tool. Emphasizes: prefer tools over guessing, read before editing, explain briefly before asking permission.
115
-
116
- ### UI
117
-
118
- - Chat scrollback: user (cyan), assistant text, collapsed tool blocks, streaming tokens live.
119
- - Reasoning: dim collapsed block above final answer; off by default; toggle with `/reasoning` or Ctrl+R.
120
- - Input: multi-line `ink-text-input`, slash commands `/clear /exit /model /cost /history /reasoning`.
121
- - Permission modal: overlay with tool name + args (diff preview for edit/write, raw command for bash); 3-option select.
122
- - Status line: model, running tokens, cost estimate.
123
-
124
- ### Config
125
-
126
- Resolution: env vars → `~/.config/kimiflare/config.json` → first-run prompt writing chmod 600 file.
127
-
128
- ```jsonc
129
- { "accountId": "…", "apiToken": "…", "model": "@cf/moonshotai/kimi-k2.6" }
130
- ```
131
-
132
- ### Entry modes
133
-
134
- - `kimiflare` — interactive TUI
135
- - `kimiflare -p "prompt"` — one-shot to stdout; permissions auto-deny unless `--dangerously-allow-all`
136
- - `kimiflare --model <id>` — override model
137
- - Session transcripts persisted at `~/.local/share/kimiflare/sessions/*.jsonl` (resume v1-optional)
138
-
139
- ## Verification scenarios
140
-
141
- 1. `npm install && npm run build && kimiflare --help` (after `npm link` or symlink).
142
- 2. TUI opens with valid creds.
143
- 3. Plain chat streams tokens live.
144
- 4. Readonly tool auto-runs (glob/read).
145
- 5. Mutating tool triggers permission modal, write+bash round-trip works.
146
- 6. Multi-tool loop (grep across tree).
147
- 7. Web fetch + summarize.
148
- 8. `/cost` matches `0.95·in/1M + 4.00·out/1M`.
149
- 9. `kimiflare -p --dangerously-allow-all` one-shot exits cleanly.
150
- 10. Missing creds / network-kill errors are graceful.
151
-
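Reviewer note on scenario 8 above: the `/cost` check is plain arithmetic over the listed prices. A small sketch follows. The `Usage` field names are hypothetical, and treating cached tokens as a subset of prompt tokens billed at the cached rate is an assumption; the scenario's formula itself only covers uncached input and output.

```typescript
// USD per million tokens, as listed in the plan's pricing line.
const PRICE_PER_M = { input: 0.95, cachedInput: 0.16, output: 4.0 };

// Hypothetical usage shape; the API's actual field names may differ.
interface Usage {
  promptTokens: number;
  cachedTokens?: number; // assumed to be a subset of promptTokens
  completionTokens: number;
}

function estimateCostUsd(u: Usage): number {
  const cached = u.cachedTokens ?? 0;
  const uncached = u.promptTokens - cached;
  const total =
    uncached * PRICE_PER_M.input +
    cached * PRICE_PER_M.cachedInput +
    u.completionTokens * PRICE_PER_M.output;
  return total / 1_000_000;
}
```

For example, 12,000 prompt tokens and 800 completion tokens with no cache hits come to 0.0114 + 0.0032, roughly $0.0146.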
152
- Unit tests: SSE reader split-chunks + `[DONE]`; stream accumulator against recorded Kimi tool-call transcript; edit unique-match; bash command-prefix allowlist.
153
-
154
- ## Progress log
155
-
156
- | Milestone | Status |
157
- |-----------|--------|
158
- | API probes (5) | ✅ — see "Probe findings" |
159
- | Scaffolding (package.json / tsconfig / tsup / bin) | ✅ |
160
- | SSE reader + client + accumulator | ✅ validated via scripts/replay-stream.ts against recorded probe-4 body |
161
- | Tools + registry + executor + permissions | ✅ |
162
- | Agent loop + system prompt + messages | ✅ |
163
- | Config loader + print-mode CLI entry | ✅ — verified end-to-end (plain chat, readonly tools, mutating tools) |
164
- | Ink TUI (chat / input / permission / status) | ✅ renders cleanly under a pty; awaiting real-TTY interactive test |
165
- | Interactive CLI entry | ✅ |
166
- | End-to-end verification (TUI) | 🔄 in progress — user will run `kimiflare` in their terminal |
167
-
168
- ## Deferred (pre-launch)
169
-
170
- - **First-run credential wizard**: when config.json and env vars are both missing, walk the user through creating a Cloudflare API token, validate it with a tiny test call, save to `~/.config/kimiflare/config.json` (chmod 600). Today we just error out with instructions. Product is BYO-creds — every user supplies their own Cloudflare token; kimiflare does not proxy requests.
171
- - **Npm publish / homebrew tap** so `npm i -g kimiflare` works.
172
- - **Session transcripts** (`~/.local/share/kimiflare/sessions/*.jsonl`) + `--resume`.
1
+ # Implementation Plan: Web Search, GitHub Read-Only, Headless Browser
2
+
3
+ ## Overview
4
+ Add three new native tool categories to KimiFlare:
5
+ 1. **Web Search** (`search_web`) — query-based web search without requiring a known URL
6
+ 2. **GitHub Read-Only** (`github_read_pr`, `github_read_issue`, `github_read_code`) — structured GitHub API access, whitelisted in plan mode
7
+ 3. **Headless Browser** (`browser_fetch`) — JS-rendered page extraction via Playwright
8
+
9
+ ## Design Decisions
10
+
11
+ ### Web Search
12
+ - **Provider**: DuckDuckGo HTML scraping (no API key required)
13
+ - **Fallback**: If DuckDuckGo blocks us, degrade gracefully with a clear error
14
+ - **Output**: List of results with title, URL, and snippet
15
+ - **Permission**: `needsPermission: false` (read-only, no side effects)
16
+ - **Plan mode**: Whitelisted (read-only)
17
+
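Reviewer note on the web search design above: a rough sketch of turning a scraped results page into the title/URL/snippet list, with the "clear error" fallback when nothing can be parsed. The CSS class names `result__a` and `result__snippet`, the attribute order the regexes expect, and every function name here are assumptions for illustration. They should be checked against the live DuckDuckGo HTML page, and a production version would more likely use an HTML parser than regexes.

```typescript
interface SearchResult {
  title: string;
  url: string;
  snippet: string;
}

// Strip tags and decode a few common entities (`&amp;` last to avoid double decoding).
function htmlToText(html: string): string {
  return html
    .replace(/<[^>]*>/g, "")
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&#x27;/g, "'")
    .replace(/&amp;/g, "&");
}

// Assumed markup: result links carry class "result__a", snippets "result__snippet".
function parseResults(html: string): SearchResult[] {
  const linkRe = /<a[^>]*class="result__a"[^>]*href="([^"]*)"[^>]*>([\s\S]*?)<\/a>/g;
  const snippetRe = /<a[^>]*class="result__snippet"[^>]*>([\s\S]*?)<\/a>/g;
  const links: { url: string; title: string }[] = [];
  const snippets: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = linkRe.exec(html)) !== null) {
    links.push({ url: htmlToText(m[1]), title: htmlToText(m[2]).trim() });
  }
  while ((m = snippetRe.exec(html)) !== null) {
    snippets.push(htmlToText(m[1]).trim());
  }
  // Pair links and snippets by position; a missing snippet becomes "".
  return links.map((l, i) => ({ ...l, snippet: snippets[i] ?? "" }));
}

// Format results for the model, or degrade with an explicit message.
function toToolOutput(html: string): string {
  const results = parseResults(html);
  if (results.length === 0) {
    return "search_web: no results could be parsed; DuckDuckGo may be blocking automated requests.";
  }
  return results.map((r, i) => `${i + 1}. ${r.title}\n   ${r.url}\n   ${r.snippet}`).join("\n");
}
```

Returning a message instead of throwing keeps the agent loop running when the provider blocks the request, and tells the model why the tool produced nothing.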
18
+ ### GitHub Read-Only
19
+ - **Auth**: Reuse existing `githubOAuthToken` from config
20
+ - **Tools**:
21
+   - `github_read_pr` — read a PR by owner/repo/number
22
+   - `github_read_issue` — read an issue by owner/repo/number
23
+   - `github_read_code` — read file contents from a repo at a specific ref
24
+ - **Permission**: `needsPermission: false` (read-only, no side effects)
25
+ - **Plan mode**: Whitelisted (explicitly read-only)
26
+ - **Why native tools vs gh CLI**: Native tools are always safe (no write ops), so they bypass permission prompts and work in plan mode
27
+
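Reviewer note on the GitHub read-only design above: one way to make "always safe" hold by construction is to have the tools build only `GET` requests. The sketch below maps the three tool names onto the corresponding GitHub REST routes (pull request, issue, repository contents). The `GithubRead` argument shape, the `path` and `ref` parameter names, and `buildGithubRequest` are hypothetical; `token` stands in for the existing `githubOAuthToken`.

```typescript
// Hypothetical argument shapes for the three tools named in the plan.
type GithubRead =
  | { tool: "github_read_pr"; owner: string; repo: string; number: number }
  | { tool: "github_read_issue"; owner: string; repo: string; number: number }
  | { tool: "github_read_code"; owner: string; repo: string; path: string; ref?: string };

interface GithubRequest {
  method: "GET"; // the type admits no write verbs
  url: string;
  headers: Record<string, string>;
}

// Route suffix under /repos/{owner}/{repo} for each tool.
function githubPath(call: GithubRead): string {
  if (call.tool === "github_read_pr") return `/pulls/${call.number}`;
  if (call.tool === "github_read_issue") return `/issues/${call.number}`;
  // Encode each path segment separately so "/" separators survive.
  const path = call.path.split("/").map(encodeURIComponent).join("/");
  return `/contents/${path}` + (call.ref ? `?ref=${encodeURIComponent(call.ref)}` : "");
}

function buildGithubRequest(call: GithubRead, token: string): GithubRequest {
  const repo = `${encodeURIComponent(call.owner)}/${encodeURIComponent(call.repo)}`;
  return {
    method: "GET",
    url: `https://api.github.com/repos/${repo}${githubPath(call)}`,
    headers: {
      Authorization: `Bearer ${token}`,
      Accept: "application/vnd.github+json",
      "X-GitHub-Api-Version": "2022-11-28",
    },
  };
}
```

Since the request type cannot express anything but `GET`, the plan-mode whitelist does not need to inspect arguments for these tools, unlike a `gh` CLI invocation run through `bash`.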
28
+ ### Headless Browser
29
+ - **Engine**: Playwright (Chromium in headless mode)
30
+ - **Behavior**: Launch a headless browser, navigate, wait for load, then extract text and optionally take a screenshot
31
+ - **Output**: Extracted page text (via readability-style extraction) + optional screenshot path
32
+ - **Permission**: `needsPermission: false` for text extraction, `needsPermission: true` for screenshots (files on disk)
33
+ - **Plan mode**: Text extraction whitelisted; screenshots blocked
34
+ - **Dependency**: `playwright` as optional peer dependency — tool gracefully errors if not installed
35
+
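Reviewer note on the headless browser design above: the permission split and the optional-dependency fallback could look roughly like this. `BrowserFetchArgs`, `screenshotPath`, and both function names are hypothetical. Reading `body` inner text is a simplification standing in for the readability-style extraction the plan describes. The Playwright calls used (`chromium.launch`, `newPage`, `goto`, `screenshot`, `innerText`, `close`) are part of its public API.

```typescript
interface BrowserFetchArgs {
  url: string;
  screenshotPath?: string; // hypothetical parameter name
}

// Rules from the plan: text extraction is read-only; screenshots write to disk,
// so they need permission and are blocked in plan mode.
function browserFetchPolicy(args: BrowserFetchArgs): {
  needsPermission: boolean;
  blockedInPlanMode: boolean;
} {
  const writesFile = args.screenshotPath !== undefined;
  return { needsPermission: writesFile, blockedInPlanMode: writesFile };
}

async function browserFetch(args: BrowserFetchArgs): Promise<string> {
  let playwright: any;
  try {
    // Non-literal specifier so this file still type-checks when the
    // optional package is not installed.
    const moduleName: string = "playwright";
    playwright = await import(moduleName);
  } catch {
    return "browser_fetch: Playwright is not installed; install the `playwright` package to enable this tool.";
  }
  const browser = await playwright.chromium.launch({ headless: true });
  try {
    const page = await browser.newPage();
    await page.goto(args.url, { waitUntil: "load" });
    if (args.screenshotPath) {
      await page.screenshot({ path: args.screenshotPath });
    }
    return await page.innerText("body");
  } finally {
    await browser.close(); // always release the browser process
  }
}
```

Deriving both flags from the same `writesFile` condition keeps the permission prompt and the plan-mode block from drifting apart as arguments are added.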
36
+ ## Files to Create/Modify
37
+
38
+ ### New files
39
+ - `src/tools/web-search.ts` — DuckDuckGo search implementation
40
+ - `src/tools/github.ts` — GitHub API read-only tools
41
+ - `src/tools/browser.ts` — Playwright headless browser tool
42
+ - `src/tools/web-search.test.ts` — tests for search
43
+ - `src/tools/github.test.ts` — tests for GitHub tools
44
+ - `src/tools/browser.test.ts` — tests for browser tool
45
+
46
+ ### Modified files
47
+ - `src/tools/executor.ts` — register new tools in `ALL_TOOLS`
48
+ - `src/tools/reducer.ts` — add reduction rules for `search_web`, `github_read_*`, `browser_fetch`
49
+ - `src/mode.ts` — update `isBlockedInPlanMode` and `isReadOnlyBash` if needed
50
+ - `src/agent/system-prompt.ts` — mention new tools in static prefix (optional)
51
+ - `package.json` — add `playwright` to optional dependencies or devDependencies
52
+
53
+ ## Implementation Order
54
+ 1. Web search (simplest, no external deps)
55
+ 2. GitHub read-only (reuses existing auth)
56
+ 3. Headless browser (most complex, optional dep)
57
+ 4. Reducer updates + mode updates
58
+ 5. Tests
59
+ 6. Typecheck + commit