kimiflare 0.21.0 → 0.23.0

package/README.md CHANGED
@@ -16,6 +16,11 @@
  Moonshot's 1T-parameter open-source model, running directly on your Cloudflare account.
  </p>
 
+ > 💸 **Heads up — this runs on your Cloudflare account.**
+ > We recommend setting a [budget cap](https://developers.cloudflare.com/workers-ai/platform/pricing/) on Workers AI and checking your [Cloudflare billing](https://dash.cloudflare.com/) regularly while using KimiFlare.
+ >
+ > 🚀 **Stay up to date.** Newer versions are significantly more token-efficient and cheaper to run. Run `/update` inside KimiFlare or `npm update -g kimiflare` to get the latest release.
+
  <p align="center">
  <img src="docs/screenshot.png" alt="kimiflare TUI" width="900">
  </p>
@@ -24,7 +29,7 @@
 
  - **262k context window** — Read entire modules, large configs, and full stack traces without the model losing track.
  - **Image understanding** — Drop image paths into your prompt (PNG, JPG, WebP, GIF, BMP). The model sees them inline — great for UI reviews, diagrams, screenshots, and mockups.
- - **Direct to Cloudflare** — No AI Gateway, no proxy, no OpenAI SDK. Your traffic goes straight to Workers AI from your account.
+ - **Direct by default** — No proxy, no OpenAI SDK. Your traffic goes straight to Workers AI from your account, with optional AI Gateway routing for user-owned logging, caching, and analytics.
  - **Plan mode** — Ask the agent to research and produce a plan without touching your filesystem. Review it, then exit plan mode to execute.
 
  ## Quick start
@@ -57,11 +62,13 @@ Requires Node.js ≥ 20.
  | **Streaming reasoning** | Toggle the model's chain-of-thought with `/reasoning` or `Ctrl-R`. See how it thinks in real time. |
  | **Image understanding** | Drop image paths (PNG, JPG, WebP, GIF, BMP up to 5 MB) into any prompt. The model sees them inline — perfect for UI reviews, diagrams, and screenshots. |
  | **Live cost tracking** | Status bar shows real-time cost based on Cloudflare pricing: `$0.95/M input`, `$0.16/M cached`, `$4.00/M output`. |
+ | **Optional AI Gateway** | Route Workers AI traffic through your own Cloudflare AI Gateway for request logs, cache status, and analytics while keeping your API token local. |
  | **Session persistence** | Every turn is auto-saved. `/resume` lists past sessions (with message counts) in a paginated picker. |
  | **Smart permissions** | Bash session-allow is keyed by the first token (e.g., allow all `git` commands). Write/edit show a unified diff before you approve. |
  | **Project context (`/init`)** | Scans your repo and writes a concise `KIMI.md` — build commands, layout, conventions. Auto-loaded on every launch. |
  | **MCP server integration** | Plug in external tools via the Model Context Protocol — local stdio servers or remote SSE endpoints. GitHub, Sentry, docs search, databases, etc. |
  | **Co-author auto-append** | Detects `git commit` commands and auto-injects `Co-authored-by: kimiflare <kimiflare@proton.me>`. |
+ | **Local structured memory** | SQLite + embeddings cross-session memory. Extracts facts, instructions, and preferences at compaction time; recalls them via hybrid search (FTS5 + vector + exact) in future sessions. Team-shareable via `.kimiflare/memory.db`. |
  | **Resilient transport** | Retries Cloudflare capacity errors (code 3040) and 5xx with exponential backoff up to 5 attempts. |
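
Reviewer sketch (not part of the package diff): the **Resilient transport** row describes a retry policy that can be illustrated in a few lines of TypeScript. Only the retryable conditions (Cloudflare code 3040, HTTP 5xx) and the 5-attempt limit come from the README; the function names, the 500 ms base delay, and the doubling schedule are assumptions, not kimiflare's actual implementation.

```typescript
// Hypothetical names and timings; only code 3040, 5xx, and the 5-attempt cap come from the README.
const MAX_ATTEMPTS = 5;
const BASE_DELAY_MS = 500; // assumed base delay, not taken from kimiflare

// Retry on Cloudflare capacity errors (code 3040) and on any 5xx HTTP status.
function isRetryable(httpStatus: number, cloudflareCode?: number): boolean {
  return cloudflareCode === 3040 || (httpStatus >= 500 && httpStatus <= 599);
}

// Delay to wait after a failed attempt; doubles each time (attempt is 1-based).
function backoffDelayMs(attempt: number): number {
  return BASE_DELAY_MS * 2 ** (attempt - 1);
}

// Delays between consecutive attempts: 5 attempts means 4 waits.
function retrySchedule(): number[] {
  const delays: number[] = [];
  for (let attempt = 1; attempt < MAX_ATTEMPTS; attempt++) {
    delays.push(backoffDelayMs(attempt));
  }
  return delays;
}
```

A real client would probably add random jitter to each delay so that concurrent sessions do not retry in lockstep.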
 
  ## Configure
@@ -76,6 +83,13 @@ Then either export them each shell:
  ```sh
  export CLOUDFLARE_ACCOUNT_ID=...
  export CLOUDFLARE_API_TOKEN=...
+ # Optional: route through a Cloudflare AI Gateway you own
+ export KIMIFLARE_AI_GATEWAY_ID=...
+ # Optional: enable local structured memory
+ export KIMIFLARE_MEMORY_ENABLED=1
+ export KIMIFLARE_MEMORY_DB_PATH=.kimiflare/memory.db
+ export KIMIFLARE_MEMORY_MAX_AGE_DAYS=90
+ export KIMIFLARE_MEMORY_MAX_ENTRIES=1000
  ```
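
Reviewer sketch (not part of the package diff): one plausible way the new `KIMIFLARE_MEMORY_*` variables could be turned into a typed configuration object. The variable names and the values 90 and 1000 come from the README; the loader itself, treating only `"1"` as enabled, and the fallback database path are assumptions.

```typescript
// Hypothetical loader; kimiflare's real parsing rules are not shown in this diff.
interface MemoryConfig {
  memoryEnabled: boolean;
  memoryDbPath: string;
  memoryMaxAgeDays: number;
  memoryMaxEntries: number;
}

type Env = Record<string, string | undefined>;

// Parse a positive integer, falling back when the variable is unset or malformed.
function parsePositiveInt(raw: string | undefined, fallback: number): number {
  const n = Number.parseInt(raw ?? "", 10);
  return Number.isInteger(n) && n > 0 ? n : fallback;
}

function loadMemoryConfig(env: Env): MemoryConfig {
  return {
    // Assumption: only the literal "1" enables memory.
    memoryEnabled: env["KIMIFLARE_MEMORY_ENABLED"] === "1",
    // Assumption: the example path doubles as the default.
    memoryDbPath: env["KIMIFLARE_MEMORY_DB_PATH"] ?? ".kimiflare/memory.db",
    memoryMaxAgeDays: parsePositiveInt(env["KIMIFLARE_MEMORY_MAX_AGE_DAYS"], 90),
    memoryMaxEntries: parsePositiveInt(env["KIMIFLARE_MEMORY_MAX_ENTRIES"], 1000),
  };
}
```

In a Node.js process this would be called as `loadMemoryConfig(process.env)`; taking the environment as a parameter keeps the function easy to test.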
 
  or save them once (`chmod 600` automatically):
@@ -86,12 +100,36 @@ cat > ~/.config/kimiflare/config.json <<'EOF'
  {
  "accountId": "YOUR_ACCOUNT_ID",
  "apiToken": "YOUR_API_TOKEN",
- "model": "@cf/moonshotai/kimi-k2.6"
+ "model": "@cf/moonshotai/kimi-k2.6",
+ "aiGatewayId": "YOUR_GATEWAY_NAME"
  }
  EOF
  chmod 600 ~/.config/kimiflare/config.json
  ```
 
+ ### Optional AI Gateway
+
+ kimiflare talks directly to Workers AI unless `aiGatewayId` is configured. When set, chat completions are sent to Cloudflare's native Workers AI Gateway endpoint:
+
+ ```text
+ https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/workers-ai/{model_id}
+ ```
+
+ Create a gateway in the Cloudflare dashboard under **AI > AI Gateway**, then set `aiGatewayId` in `~/.config/kimiflare/config.json` or export `KIMIFLARE_AI_GATEWAY_ID`. The same Workers AI API token stays on your machine and is sent to Cloudflare.
+
+ Optional per-request controls:
+
+ ```json
+ {
+ "aiGatewayCacheTtl": 3600,
+ "aiGatewaySkipCache": false,
+ "aiGatewayCollectLogPayload": false,
+ "aiGatewayMetadata": { "tool": "kimiflare" }
+ }
+ ```
+
+ `cf-aig-cache-status` from AI Gateway is shown separately from Workers AI prompt-token caching (`cached_tokens`). If you enable gateway logs, kimiflare records metadata such as log id, cache hit/miss, tokens, duration, and status when Cloudflare returns it; prompt and response bodies are not stored by kimiflare.
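
Reviewer sketch (not part of the package diff): the endpoint selection described in the added "Optional AI Gateway" section, written as a small TypeScript function. The gateway URL template is taken from the README. The function name, the config shape, and the choice of Cloudflare's `/ai/run/` REST path for the direct case are assumptions; the diff does not show which direct URL kimiflare actually calls.

```typescript
// Hypothetical helper; only the gateway URL template comes from the README.
interface EndpointConfig {
  accountId: string;
  modelId: string; // e.g. "@cf/moonshotai/kimi-k2.6"
  aiGatewayId?: string; // when set, traffic is routed through AI Gateway
}

function buildEndpoint(cfg: EndpointConfig): string {
  if (cfg.aiGatewayId) {
    // Gateway route: template quoted in the README.
    return `https://gateway.ai.cloudflare.com/v1/${cfg.accountId}/${cfg.aiGatewayId}/workers-ai/${cfg.modelId}`;
  }
  // Direct route: assumed to be the Workers AI REST endpoint.
  return `https://api.cloudflare.com/client/v4/accounts/${cfg.accountId}/ai/run/${cfg.modelId}`;
}
```

The model ID is interpolated as-is because its slashes are part of the URL path in both routes.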
+
  ## MCP servers (Model Context Protocol)
 
  kimiflare supports external tools via MCP. Add servers to your `~/.config/kimiflare/config.json`:
@@ -132,6 +170,43 @@ MCP tools appear prefixed as `mcp_<server>_<tool>` alongside built-in tools.
  - `/mcp list` — show connected servers and tool counts
  - `/mcp reload` — disconnect and reconnect all configured servers
 
+ ## Local structured memory
+
+ kimiflare can remember facts, instructions, and preferences across sessions using a local SQLite database with vector search.
+
+ **How it works:**
+ - At compaction time, the agent extracts structured memories from the conversation
+ - Memories are stored with embeddings (`@cf/baai/bge-base-en-v1.5`) in a local SQLite database
+ - On future sessions, relevant memories are recalled via hybrid search (FTS5 full-text + vector similarity + exact file-path matching)
+ - Supports team-shared memory: `.kimiflare/memory.db` in your repo root (add to `.gitignore`)
+
+ **Enable:**
+ ```sh
+ export KIMIFLARE_MEMORY_ENABLED=1
+ ```
+
+ Or in `~/.config/kimiflare/config.json`:
+ ```json
+ {
+ "memoryEnabled": true,
+ "memoryDbPath": ".kimiflare/memory.db",
+ "memoryMaxAgeDays": 90,
+ "memoryMaxEntries": 1000,
+ "memoryEmbeddingModel": "@cf/baai/bge-base-en-v1.5"
+ }
+ ```
+
+ **Commands:**
+ - `/memory` — show memory stats (total count, DB size, by category)
+ - `/memory search <query>` — manual hybrid search over stored memories
+ - `/memory clear` — wipe all memories for the current repo
+
+ **Storage & cleanup:**
+ - Default retention: 90 days, 1000 memories per repo
+ - Automatic deduplication of near-identical memories
+ - Cleanup runs on startup and after every compaction
+ - Typical size: ~4–5 KB per memory; ~15 MB/month under heavy use
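
Reviewer sketch (not part of the package diff): the retention limits in the "Storage & cleanup" list (90 days, 1000 memories per repo) could be enforced by a pruning pass like the one below. It works on an in-memory array for clarity; the `MemoryEntry` shape, the function name, and the "keep the newest entries" tie-break are assumptions, and the real cleanup presumably runs against the SQLite database.

```typescript
// Hypothetical in-memory model of a stored memory; the real schema is not shown in the diff.
interface MemoryEntry {
  id: number;
  createdAt: number; // epoch milliseconds
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Drop entries older than maxAgeDays, then keep at most maxEntries of the newest ones.
function pruneMemories(
  entries: MemoryEntry[],
  now: number,
  maxAgeDays = 90,
  maxEntries = 1000,
): MemoryEntry[] {
  const cutoff = now - maxAgeDays * MS_PER_DAY;
  return entries
    .filter((e) => e.createdAt >= cutoff) // age limit
    .sort((a, b) => b.createdAt - a.createdAt) // newest first
    .slice(0, maxEntries); // count limit
}
```

Applying the age limit before the count limit means an old entry is never kept just because the store is below its size cap.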
+
  ## Usage
 
  ### Interactive TUI
@@ -183,8 +258,11 @@ Supported formats: PNG, JPG, JPEG, WebP, GIF, BMP (up to 5 MB each, 10 per messa
  | `/theme` | Interactive theme picker with live preview (`Ctrl+T`). Saved to config. |
  | `/theme NAME` | Set theme by name directly. |
  | `/resume` | Pick a past conversation to restore. |
- | `/compact` | Summarize older turns to free context. Suggested automatically at ~80% full. |
+ | `/compact` | Summarize older turns to free context. Suggested automatically at ~80% full. Extracts memories if memory is enabled. |
  | `/init` | Scan the repo and write a `KIMI.md` so future agents have project context. |
+ | `/memory` | Show memory stats (total count, DB size, by category). |
+ | `/memory search <query>` | Search stored memories manually. |
+ | `/memory clear` | Wipe all memories for the current repo. |
  | `/mcp list` | List connected MCP servers and their tools. |
  | `/mcp reload` | Disconnect and reconnect all configured MCP servers. |
  | `/reasoning` | Toggle chain-of-thought display. |
@@ -271,7 +349,7 @@ All tool calls show inline; mutating ones require per-call approval the first ti
  @cf/moonshotai/kimi-k2.6
  ```
 
- Direct `fetch` to Workers AI, OpenAI-compatible `messages` + `tools` payload, SSE stream with reasoning + content + tool-call deltas accumulated by index.
+ Direct `fetch` to Workers AI by default, or the native Workers AI AI Gateway endpoint when `aiGatewayId` is configured. The payload remains OpenAI-compatible `messages` + `tools`, with an SSE stream containing reasoning + content + tool-call deltas accumulated by index.
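
Reviewer sketch (not part of the package diff): "tool-call deltas accumulated by index" can be illustrated as follows. The delta shape mirrors the OpenAI-style streaming format that the README says the payload is compatible with; the exact field names, the `ToolCall` type, and the function name are assumptions rather than kimiflare's code.

```typescript
// Assumed OpenAI-style streaming shapes; not confirmed against kimiflare's source.
interface ToolCallDelta {
  index: number; // which tool call this fragment belongs to
  id?: string;
  function?: { name?: string; arguments?: string };
}

interface ToolCall {
  id: string;
  name: string;
  arguments: string; // JSON text, complete only once the stream has ended
}

function accumulateToolCalls(deltas: ToolCallDelta[]): ToolCall[] {
  const byIndex = new Map<number, ToolCall>();
  for (const delta of deltas) {
    let call = byIndex.get(delta.index);
    if (!call) {
      call = { id: "", name: "", arguments: "" };
      byIndex.set(delta.index, call);
    }
    if (delta.id) call.id = delta.id;
    if (delta.function?.name) call.name = delta.function.name;
    // Argument JSON arrives in fragments and is concatenated in arrival order.
    if (delta.function?.arguments) call.arguments += delta.function.arguments;
  }
  // Return calls ordered by their stream index.
  return Array.from(byIndex.entries())
    .sort((a, b) => a[0] - b[0])
    .map((entry) => entry[1]);
}
```

Keying by `index` rather than by arrival order matters because fragments of several parallel tool calls can be interleaved in one stream.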
 
  ## Development
 
@@ -384,6 +462,10 @@ For a real-world test, try the [official GitHub MCP server](https://github.com/m
 
  Then ask: `search for issues labeled bug in sinameraji/kimiflare`
 
+ ## Credits
+
+ - **Cloudflare Agent Memory** — This feature was inspired by [Cloudflare's Agent Memory](https://blog.cloudflare.com/introducing-agent-memory/) announcement. While Cloudflare's managed service requires a platform binding, kimiflare implements a local self-hosted equivalent using SQLite + Workers AI embeddings so you can use it today with your own account.
+
  ## License
 
  [MIT](LICENSE) © Sina Meraji