mlx-code 0.0.20__tar.gz → 0.0.22__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (29)
  1. {mlx_code-0.0.20 → mlx_code-0.0.22}/PKG-INFO +109 -66
  2. {mlx_code-0.0.20 → mlx_code-0.0.22}/README.md +106 -65
  3. mlx_code-0.0.20/mlx_code/ntui.py → mlx_code-0.0.22/mlx_code/bare.py +1 -0
  4. mlx_code-0.0.22/mlx_code/bats.py +300 -0
  5. {mlx_code-0.0.20 → mlx_code-0.0.22}/mlx_code/main.py +73 -12
  6. {mlx_code-0.0.20 → mlx_code-0.0.22}/mlx_code/repl.py +93 -31
  7. {mlx_code-0.0.20 → mlx_code-0.0.22}/mlx_code/view_log.py +1 -1
  8. mlx_code-0.0.22/mlx_code/web.py +485 -0
  9. {mlx_code-0.0.20 → mlx_code-0.0.22}/mlx_code.egg-info/PKG-INFO +109 -66
  10. {mlx_code-0.0.20 → mlx_code-0.0.22}/mlx_code.egg-info/SOURCES.txt +3 -1
  11. {mlx_code-0.0.20 → mlx_code-0.0.22}/mlx_code.egg-info/requires.txt +2 -0
  12. {mlx_code-0.0.20 → mlx_code-0.0.22}/setup.py +3 -1
  13. {mlx_code-0.0.20 → mlx_code-0.0.22}/LICENSE +0 -0
  14. {mlx_code-0.0.20 → mlx_code-0.0.22}/mlx_code/__init__.py +0 -0
  15. {mlx_code-0.0.20 → mlx_code-0.0.22}/mlx_code/apis.py +0 -0
  16. {mlx_code-0.0.20 → mlx_code-0.0.22}/mlx_code/gits.py +0 -0
  17. {mlx_code-0.0.20 → mlx_code-0.0.22}/mlx_code/lsp_tool.py +0 -0
  18. {mlx_code-0.0.20 → mlx_code-0.0.22}/mlx_code/mcb.py +0 -0
  19. {mlx_code-0.0.20 → mlx_code-0.0.22}/mlx_code/mcb_tool.py +0 -0
  20. {mlx_code-0.0.20 → mlx_code-0.0.22}/mlx_code/stream_log.py +0 -0
  21. {mlx_code-0.0.20 → mlx_code-0.0.22}/mlx_code/tools.py +0 -0
  22. {mlx_code-0.0.20 → mlx_code-0.0.22}/mlx_code/util.py +0 -0
  23. {mlx_code-0.0.20 → mlx_code-0.0.22}/mlx_code/view_git.py +0 -0
  24. {mlx_code-0.0.20 → mlx_code-0.0.22}/mlx_code.egg-info/dependency_links.txt +0 -0
  25. {mlx_code-0.0.20 → mlx_code-0.0.22}/mlx_code.egg-info/entry_points.txt +0 -0
  26. {mlx_code-0.0.20 → mlx_code-0.0.22}/mlx_code.egg-info/top_level.txt +0 -0
  27. {mlx_code-0.0.20 → mlx_code-0.0.22}/setup.cfg +0 -0
  28. {mlx_code-0.0.20 → mlx_code-0.0.22}/tests/__init__.py +0 -0
  29. {mlx_code-0.0.20 → mlx_code-0.0.22}/tests/test.py +0 -0
{mlx_code-0.0.20 → mlx_code-0.0.22}/PKG-INFO
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: mlx-code
- Version: 0.0.20
+ Version: 0.0.22
  Summary: Coding Agent for Mac
  Home-page: https://josefalbers.github.io/mlx-code/
  Author: J Joe
@@ -17,6 +17,8 @@ Requires-Dist: httpx
  Requires-Dist: pydantic
  Requires-Dist: textual>=8.2.7
  Requires-Dist: rich>=15.0.0
+ Requires-Dist: starlette
+ Requires-Dist: uvicorn
  Provides-Extra: all
  Requires-Dist: python-lsp-server[all]; extra == "all"
  Requires-Dist: GitPython; extra == "all"
@@ -38,16 +40,16 @@ Dynamic: summary
 
  A Git-native coding agent that can run entirely on your Mac. No API keys, no cloud, and no data leaving your machine. Powered by Apple MLX, it turns commits, branches, and worktrees into the agent’s state, history, and execution model
 
- https://github.com/user-attachments/assets/0569d101-8d0a-4e67-9e82-fce84a5ef3f0
+ [![v0.0.27](https://github.com/user-attachments/assets/8a1c131a-dda1-4b52-9fa6-9c0fbccb5ea6)](https://youtube.com/shorts/1LuifKFKixc)
 
  ---
 
  ## Architecture
 
  ```
- Conversation tree (nodes = git commits with embedded chat history)
+ Worktrees:
 
- main ──●──●──●──●──●──●──●──●──●──●
+ main ──●──●──●──●──●──●──●──●──●──●──●──●──●──●───────────► Node = git commit + chat hx
  │ │
  │ └── branch-1 ──●──●──●
  │ │ ┌────────────┐
@@ -55,32 +57,30 @@ Conversation tree (nodes = git commits with embedded chat history)
  │ └─────┬──────┘
  └── branch-0 ──●──●──● │
 
+ Tabs: ├────────────► Tab = git branch + Agent
 
- REPL tabs (each tab = a git branch + agent) │
-
-
- ┌──────────────────────────────────────────────┼─────────┐
+ ┌──────────────────────────────────────────────│─────────┐
  │ TUI tabs │ │
  │ ┌──────┐ ┌──────────┐ ┌──────────┐ ┌─────┴──────┐ │
  │ │ main │ │ branch-0 │ │ branch-1 │ │ branch-1-0 │ │
  │ └──────┘ └────┬─────┘ └──────────┘ └────────────┘ │
- └─────────────────┼──────────────────────────────────────┘
+ └─────────────────│──────────────────────────────────────┘
 
- ├────────────────────────────────────► each tab is an independent Agent
+ Agents: ├─────────────────────────────────────────► Each tab runs its own Agent
 
- ┌────┴────────────────────────────────┐
- │ Agent
- ┌──────────────┐ ┌──────────────┐
- │ │ API: │ │ Tools:
- │ │ MLX (local) │ │ Read Write │
- │ │ Claude │ │ Edit Bash │
- │ │ Gemini │ │ Grep Find │
- │ │ OpenAI │ │ Ls Skill
- └──────────────┘ │ Agent ───────┼─┼───► spawns child Agent
- └──────────────┘ (each with own tools + worktree + etc)
- │ Git worktree
- │ (isolation + session state)
- └─────────────────────────────────────┘
+ ┌────┴─────────────────────────────────────┐
+ │ Agent
+ ┌────────────────┐ ┌────────────────┐
+ │ │ API: │ │ Tools:
+ │ │ Local (mlx-lm) │ │ Read Write │
+ │ │ Gemini │ │ Edit Bash │
+ │ │ Claude │ │ Grep Find │
+ │ │ Codex │ │ Ls Skill
+ │ DeepSeek │ │ Agent ─────────┼──┼───► Recursively spawns sub-Agents
+ └────────────────┘ └────────────────┘
+ │ Git worktree
+ │ (isolation + session state)
+ └──────────────────────────────────────────┘
  ```
 
  Each layer is importable and composable on its own. A commit records state, a branch records an alternative path, and a tab is just a live view over an `Agent`.
@@ -95,28 +95,31 @@ result = await agent.run('refactor utils.py to use dataclasses')
 
  ---
 
+ ## Core ideas
+
+ - **Git is the state machine.** Every file-changing agent step is committed with the conversation that produced it, so you can inspect, resume, and branch from any checkpoint.
+ - **Branches are alternative futures.** A branch is not just a Git branch; it is a different reasoning path with its own worktree and session state.
+ - **Agents are the primitive.** Tabs, branches, and delegated subtasks are all instances of the same `Agent` abstraction.
+ - **Worktrees provide isolation.** The agent edits in a separate worktree, so your main checkout stays clean and recoverable.
+
+ ---
+
  ## Quick start
 
  ```bash
+ # ephemeral run (no installation)
+ uvx --from mlx-code mlc
+
+ # or install into the current environment
  pip install mlx-code
- mlc # launch with local MLX model
+
+ # launch
+ mlc # with a local MLX model
  mlc-run --api gemini # or use a remote provider
- mlc-run --api deepseek --model deepseek-v4-flash
  ```
 
  That's it. The first run starts a local inference server and drops you into the REPL.
 
- [![Link](https://raw.githubusercontent.com/JosefAlbers/mlx-code/main/assets/mlx-code-v0.0.20.gif)](https://youtu.be/0lkY7YQCyCo)
-
- ---
-
- ## Core ideas
-
- - **Git is the state machine.** Every file-changing agent step is committed with the conversation that produced it, so you can inspect, resume, and branch from any checkpoint.
- - **Branches are alternative futures.** A branch is not just a Git branch; it is a different reasoning path with its own worktree and session state.
- - **Agents are the primitive.** Tabs, branches, and delegated subtasks are all instances of the same `Agent` abstraction.
- - **Worktrees provide isolation.** The agent edits in a separate worktree, so your main checkout stays clean and recoverable.
-
  ---
 
  ## Why mlx-code
@@ -125,12 +128,12 @@ That's it. The first run starts a local inference server and drops you into the
 
  **Git is the database.** When the agent makes file changes, they’re committed to a git worktree with the full conversation embedded in the commit message. Resume any past session by hash, branch from any checkpoint, and inspect the agent timeline with `git log`. No proprietary state files, just Git.
 
- **Your working directory is never at risk.** The agent operates inside a `git worktree`, not your checkout. It can make a mess, and you can inspect or discard it without ever touching `main`.
-
- **Built-in safety nets.** Subprocess environment variables go through an explicit allowlist, so secrets in your shell are never leaked to agent-spawned processes.
+ **Built-in safety nets.** Your working directory is never at risk. The agent operates inside a `git worktree`, not your checkout. It can make a mess, and you can inspect or discard it without ever touching `main`. Subprocess environment variables go through an explicit allowlist, so secrets in your shell are never leaked to agent-spawned processes.
 
  **Batteries included.** Everything ships in one pip install: the MLX inference engine, the multi-protocol API server, the agent loop, the tools, and the TUI. No llama.cpp, no ollama, no vLLM bridge to find and configure. And the server natively speaks OpenAI, Anthropic, Gemini, and Codex wire formats simultaneously, so `claude`, `codex`, and `gemini` CLIs can all work against your local model without a translation layer.
 
+ **Continuous batching.** The local inference server runs a continuous batching engine that processes multiple sequences concurrently. When you spawn parallel agents (eg, multiple tabs, `asyncio.gather` pipelines, or delegated sub-tasks) they all share the same GPU context and are stepped together each tick. A prefix cache persists KV snapshots to disk, so repeated system prompts and conversation prefixes are prefilled once and reused across sessions. No request queueing, no waiting for the previous agent to finish.
+
  ---
 
  ## Agent primitive
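Note on the hunk above: the merged "Built-in safety nets" paragraph says subprocess environment variables pass through an explicit allowlist. The diff does not show how mlx-code implements this, so the following is only a minimal sketch of the general technique. The names `ALLOWED_ENV`, `filtered_env`, and `run_isolated` are hypothetical and do not come from the package.

```python
import os
import subprocess

# Hypothetical allowlist; the list mlx-code actually uses is not shown in this diff.
ALLOWED_ENV = frozenset({"PATH", "HOME", "LANG", "TERM"})


def filtered_env(env=None, allowed=ALLOWED_ENV):
    """Return a copy of env that keeps only allowlisted variable names."""
    source = os.environ if env is None else env
    return {name: value for name, value in source.items() if name in allowed}


def run_isolated(argv, env=None):
    """Run a command that can see only the allowlisted variables."""
    return subprocess.run(
        argv, env=filtered_env(env), capture_output=True, text=True, check=False
    )
```

Because the child process receives a freshly built mapping rather than the parent's environment, a variable such as an API key is dropped unless it is named in the allowlist.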
@@ -168,12 +171,12 @@ agent.messages = messages
  await agent.run("now add unit tests")
  ```
 
- Branch from any point in the conversation each branch gets its own worktree:
+ Branch from any point in the conversation. Each branch gets its own worktree:
 
  ```
  /branch # branch from current state
  /branch --rev 2 # branch from the 2nd user turn
- /branch --rev 3 --as-worktree try different approach
+ /branch --rev 3 make it use httpx instead
  ```
 
  Since it's just git, you can inspect the timeline outside the REPL:
@@ -238,6 +241,43 @@ Reliability comes from specialization plus constraint. A read-only reviewer can'
 
  ---
 
+ ## Continuous batching
+
+ The local server can run multiple inference sequences concurrently inside a single batch step. Instead of a global lock that serialises one request at a time, the batching engine maintains a live set of active sequences and yields tokens for all of them on every step.
+
+ ```bash
+ mlc --engine batch # continuous batching + built-in REPL
+ ```
+
+ This unlocks true parallelism for multi-agent workloads:
+
+ ```python
+ import asyncio
+ from mlx_code.repl import Agent
+
+ async def main():
+     agents = [Agent() for _ in range(4)]
+     await asyncio.gather(*[
+         a.run(f"Research topic: {t}")
+         for a, t in zip(agents, ["consensus", "cryptography", "networking", "storage"])
+     ])
+
+ asyncio.run(main())
+ ```
+
+ All four agents generate simultaneously inside the same batch. No sequential blocking.
+
+ ### Health endpoint
+
+ ```bash
+ curl http://127.0.0.1:8000/health
+ # {"status":"ok","model":"mlx-community/Qwen3.5-4B-OptiQ-4bit","active_sequences":2,"prefix_cache_files":5}
+ ```
+
+ `active_sequences` shows how many agents are generating right now; `prefix_cache_files` shows how many prefix KV snapshots are stored on disk.
+
+ ---
+
  ## Command Line
 
  ### `mlc`: local server + harness
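Note on the hunk above: the new "Continuous batching" section describes an engine that keeps a live set of active sequences and steps all of them on every tick. The scheduling idea can be illustrated with a toy model. This is a sketch only and is not the code in `bats.py`: the function name `continuous_batch`, the `max_batch` parameter, and the "one token per tick" model are all assumptions made for illustration.

```python
from collections import deque


def continuous_batch(requests, max_batch=4):
    """Toy scheduler: step every active sequence together on each tick.

    `requests` maps a sequence id to the number of tokens it still needs.
    Returns one list per tick with the ids that were stepped on that tick.
    """
    waiting = deque(requests.items())
    active = {}  # sequence id -> tokens still to generate
    ticks = []
    while waiting or active:
        # Admit waiting sequences as soon as a slot is free (the "continuous" part).
        while waiting and len(active) < max_batch:
            seq_id, remaining = waiting.popleft()
            if remaining > 0:
                active[seq_id] = remaining
        if not active:
            break
        # One batch step: each active sequence produces one token.
        ticks.append(sorted(active))
        for seq_id in list(active):
            active[seq_id] -= 1
            if active[seq_id] == 0:
                del active[seq_id]  # a finished sequence frees its slot immediately
    return ticks
```

In this model a short request leaves the batch as soon as it finishes and a waiting request takes its slot on the next tick, which is the property that distinguishes continuous batching from running fixed batches to completion.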
@@ -245,20 +285,20 @@ Reliability comes from specialization plus constraint. A read-only reviewer can'
  Starts the MLX inference server and launches the built-in TUI harness against it.
 
  ```bash
- # Default: local server + default TUI
+ # Default: local server + default harness
  mlc
 
- # Use a simple terminal REPL instead of the TUI
- mlc --notui
+ # Continuous batching mode (default is sequential caching mode)
+ mlc --engine batch
+
+ # Server only, no harness
+ mlc --leash none
 
  # Use a different harness (routes traffic through the local server)
  mlc --leash claude
  mlc --leash gemini
  mlc --leash codex
 
- # Server only, no harness
- mlc --leash none
-
  # Specify a model
  mlc --model mlx-community/Qwen3.5-4B-OptiQ-4bit
 
@@ -309,10 +349,9 @@ mlc-run --api codex
  echo "explain lsp.py" | mlc-run -a deepseek | cat - PLAN.md | mlc-run --url http://localhost:9000
 
  # Simple terminal REPL (no TUI)
- mlc-run --notui
+ mlc-run --bare
  ```
 
-
  ---
 
  ## Using as a Library
@@ -435,18 +474,19 @@ agent = Agent(extra_tool_classes=[LiveDBTool], tool_names=["QueryDB"])
 
  | Command | Description |
  |---|---|
- | `/help` | Show command reference |
+ | `/branch [--rev N] [prompt]` | Open a new branch tab from the current (or earlier) checkpoint |
+ | `/diff [--all]` | Show a side-by-side diff of changes in the worktree |
  | `/clear [--config F]` | Clear conversation; `--config` reloads agent from a JSON/YAML file |
+ | `/tab [N]` | Jump to tab N |
  | `/history [--raw]` | Show conversation transcript; `--raw` shows the raw API message log |
- | `/diff [--all]` | Show a side-by-side diff of changes in the worktree |
- | `/errors` | Show timestamped error log for the current tab |
  | `/tools` | List active tools |
- | `/branch [--rev N] [prompt]` | Open a new branch tab from the current (or earlier) checkpoint |
  | `/abort` | Abort the running agent |
+ | `/errors` | Show timestamped error log for the current tab |
  | `/export [path]` | Export session to JSON |
- | `/exit` or `/quit` | Close branch tab, or exit the app |
- | `!command` | Run a shell command; output captured in the TUI |
- | `!!command` | Run an interactive command (TUI suspends, terminal handed to process) |
+ | `/exit [--all]` | Close branch tab, or exit the app |
+ | `/help` | Show command reference |
+ | `!command` | Run a shell command; output captured in the TUI (eg, `ls`, `cat hello.c`) |
+ | `$command` | Run an interactive command (eg, `vim`, `yazi`, `less hello.c`) |
 
  ### Key bindings
 
@@ -454,9 +494,9 @@ agent = Agent(extra_tool_classes=[LiveDBTool], tool_names=["QueryDB"])
  |---|---|
  | `Enter` | Submit |
  | `Ctrl-J` | Insert newline |
- | `Alt-1` … `Alt-9` | Jump to tab N |
- | `Tab` / `Shift-Tab` | Cycle through tabs |
- | `Ctrl-C` | Abort running agent |
+ | `Ctrl-1` … `Ctrl-9` | Jump to tab N |
+ | `Ctrl-,` / `Ctrl-.` | Cycle through tabs |
+ | `Ctrl-C` | Clear input, or abort running agent |
  | `Ctrl-D` | Close branch tab, or exit app |
  | `Ctrl-R` | Recall last prompt into editor |
 
@@ -474,16 +514,16 @@ agent = Agent(extra_tool_classes=[LiveDBTool], tool_names=["QueryDB"])
  | `Skill` | Retrieve named skill instructions from config |
  | `Agent` | Spawn an autonomous sub-agent for delegated work |
 
- All file tools enforce path sandboxing the agent cannot read or write outside the worktree.
+ All file tools enforce path sandboxing. The agent cannot read or write outside the worktree.
 
  ### Backends
 
  | Backend | Flag | Notes |
  |---------|------|-------|
- | MLX (local) | `--api noapi` | Default. Runs on-device, no API key needed |
+ | MLX-LM (local) | `--api noapi` | Default. Runs on-device, no API key needed |
  | Claude | `--api claude` | Requires `ANTHROPIC_API_KEY` |
  | Gemini | `--api gemini` | Requires `GOOGLE_API_KEY` |
- | DeepSeek | `--api deepseek` | DeepSeek API or compatible endpoint |
+ | DeepSeek | `--api deepseek` | Requires `DEEPSEEK_API_KEY` |
  | Codex | `--api codex` | OpenAI Codex CLI integration |
  | OpenAI | `--api openai` | Any OpenAI-compatible endpoint |
 
@@ -492,10 +532,13 @@ All file tools enforce path sandboxing — the agent cannot read or write outsid
  The local MLX server speaks OpenAI, Anthropic, and Gemini wire formats simultaneously, so you can use any compatible CLI as the frontend:
 
  ```bash
- mlc --leash claude # claude CLI routes through local model
- mlc --leash codex # codex CLI routes through local model
- mlc --leash gemini # gemini CLI routes through local model
- mlc --leash none # server only
+ mlc # default
+ mlc --web # web UI (api.mlx-code.com)
+ mlc --bare # no TUI
+ mlc --leash none # no harness
+ mlc --leash codex # codex CLI
+ mlc --leash gemini # gemini CLI
+ mlc --leash claude # claude code
  ```
 
  ---
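Note on the PKG-INFO hunks above: the tools section states that all file tools enforce path sandboxing, so the agent cannot read or write outside the worktree. The diff leaves `tools.py` unchanged and does not show the check itself, so the following is a generic sketch of one common way to do it with `pathlib`. The helper name `resolve_in_worktree` is hypothetical.

```python
from pathlib import Path


def resolve_in_worktree(worktree, candidate):
    """Resolve candidate against worktree and reject anything outside it."""
    root = Path(worktree).resolve()
    # Joining with an absolute candidate discards root, so absolute paths
    # outside the worktree are caught by the same containment check below.
    target = (root / candidate).resolve()
    try:
        target.relative_to(root)  # raises ValueError when target is not under root
    except ValueError:
        raise PermissionError(f"{candidate!r} escapes the worktree") from None
    return target
```

Resolving before the containment check matters: it collapses `..` segments and follows symlinks, so a path that only looks like it sits inside the worktree is still rejected.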
{mlx_code-0.0.20 → mlx_code-0.0.22}/README.md
@@ -2,16 +2,16 @@
 
  A Git-native coding agent that can run entirely on your Mac. No API keys, no cloud, and no data leaving your machine. Powered by Apple MLX, it turns commits, branches, and worktrees into the agent’s state, history, and execution model
 
- https://github.com/user-attachments/assets/0569d101-8d0a-4e67-9e82-fce84a5ef3f0
+ [![v0.0.27](https://github.com/user-attachments/assets/8a1c131a-dda1-4b52-9fa6-9c0fbccb5ea6)](https://youtube.com/shorts/1LuifKFKixc)
 
  ---
 
  ## Architecture
 
  ```
- Conversation tree (nodes = git commits with embedded chat history)
+ Worktrees:
 
- main ──●──●──●──●──●──●──●──●──●──●
+ main ──●──●──●──●──●──●──●──●──●──●──●──●──●──●───────────► Node = git commit + chat hx
  │ │
  │ └── branch-1 ──●──●──●
  │ │ ┌────────────┐
@@ -19,32 +19,30 @@ Conversation tree (nodes = git commits with embedded chat history)
  │ └─────┬──────┘
  └── branch-0 ──●──●──● │
 
+ Tabs: ├────────────► Tab = git branch + Agent
 
- REPL tabs (each tab = a git branch + agent) │
-
-
- ┌──────────────────────────────────────────────┼─────────┐
+ ┌──────────────────────────────────────────────│─────────┐
  │ TUI tabs │ │
  │ ┌──────┐ ┌──────────┐ ┌──────────┐ ┌─────┴──────┐ │
  │ │ main │ │ branch-0 │ │ branch-1 │ │ branch-1-0 │ │
  │ └──────┘ └────┬─────┘ └──────────┘ └────────────┘ │
- └─────────────────┼──────────────────────────────────────┘
+ └─────────────────│──────────────────────────────────────┘
 
- ├────────────────────────────────────► each tab is an independent Agent
+ Agents: ├─────────────────────────────────────────► Each tab runs its own Agent
 
- ┌────┴────────────────────────────────┐
- │ Agent
- ┌──────────────┐ ┌──────────────┐
- │ │ API: │ │ Tools:
- │ │ MLX (local) │ │ Read Write │
- │ │ Claude │ │ Edit Bash │
- │ │ Gemini │ │ Grep Find │
- │ │ OpenAI │ │ Ls Skill
- └──────────────┘ │ Agent ───────┼─┼───► spawns child Agent
- └──────────────┘ (each with own tools + worktree + etc)
- │ Git worktree
- │ (isolation + session state)
- └─────────────────────────────────────┘
+ ┌────┴─────────────────────────────────────┐
+ │ Agent
+ ┌────────────────┐ ┌────────────────┐
+ │ │ API: │ │ Tools:
+ │ │ Local (mlx-lm) │ │ Read Write │
+ │ │ Gemini │ │ Edit Bash │
+ │ │ Claude │ │ Grep Find │
+ │ │ Codex │ │ Ls Skill
+ │ DeepSeek │ │ Agent ─────────┼──┼───► Recursively spawns sub-Agents
+ └────────────────┘ └────────────────┘
+ │ Git worktree
+ │ (isolation + session state)
+ └──────────────────────────────────────────┘
  ```
 
  Each layer is importable and composable on its own. A commit records state, a branch records an alternative path, and a tab is just a live view over an `Agent`.
@@ -59,28 +57,31 @@ result = await agent.run('refactor utils.py to use dataclasses')
 
  ---
 
+ ## Core ideas
+
+ - **Git is the state machine.** Every file-changing agent step is committed with the conversation that produced it, so you can inspect, resume, and branch from any checkpoint.
+ - **Branches are alternative futures.** A branch is not just a Git branch; it is a different reasoning path with its own worktree and session state.
+ - **Agents are the primitive.** Tabs, branches, and delegated subtasks are all instances of the same `Agent` abstraction.
+ - **Worktrees provide isolation.** The agent edits in a separate worktree, so your main checkout stays clean and recoverable.
+
+ ---
+
  ## Quick start
 
  ```bash
+ # ephemeral run (no installation)
+ uvx --from mlx-code mlc
+
+ # or install into the current environment
  pip install mlx-code
- mlc # launch with local MLX model
+
+ # launch
+ mlc # with a local MLX model
  mlc-run --api gemini # or use a remote provider
- mlc-run --api deepseek --model deepseek-v4-flash
  ```
 
  That's it. The first run starts a local inference server and drops you into the REPL.
 
- [![Link](https://raw.githubusercontent.com/JosefAlbers/mlx-code/main/assets/mlx-code-v0.0.20.gif)](https://youtu.be/0lkY7YQCyCo)
-
- ---
-
- ## Core ideas
-
- - **Git is the state machine.** Every file-changing agent step is committed with the conversation that produced it, so you can inspect, resume, and branch from any checkpoint.
- - **Branches are alternative futures.** A branch is not just a Git branch; it is a different reasoning path with its own worktree and session state.
- - **Agents are the primitive.** Tabs, branches, and delegated subtasks are all instances of the same `Agent` abstraction.
- - **Worktrees provide isolation.** The agent edits in a separate worktree, so your main checkout stays clean and recoverable.
-
  ---
 
  ## Why mlx-code
@@ -89,12 +90,12 @@ That's it. The first run starts a local inference server and drops you into the
 
  **Git is the database.** When the agent makes file changes, they’re committed to a git worktree with the full conversation embedded in the commit message. Resume any past session by hash, branch from any checkpoint, and inspect the agent timeline with `git log`. No proprietary state files, just Git.
 
- **Your working directory is never at risk.** The agent operates inside a `git worktree`, not your checkout. It can make a mess, and you can inspect or discard it without ever touching `main`.
-
- **Built-in safety nets.** Subprocess environment variables go through an explicit allowlist, so secrets in your shell are never leaked to agent-spawned processes.
+ **Built-in safety nets.** Your working directory is never at risk. The agent operates inside a `git worktree`, not your checkout. It can make a mess, and you can inspect or discard it without ever touching `main`. Subprocess environment variables go through an explicit allowlist, so secrets in your shell are never leaked to agent-spawned processes.
 
  **Batteries included.** Everything ships in one pip install: the MLX inference engine, the multi-protocol API server, the agent loop, the tools, and the TUI. No llama.cpp, no ollama, no vLLM bridge to find and configure. And the server natively speaks OpenAI, Anthropic, Gemini, and Codex wire formats simultaneously, so `claude`, `codex`, and `gemini` CLIs can all work against your local model without a translation layer.
 
+ **Continuous batching.** The local inference server runs a continuous batching engine that processes multiple sequences concurrently. When you spawn parallel agents (eg, multiple tabs, `asyncio.gather` pipelines, or delegated sub-tasks) they all share the same GPU context and are stepped together each tick. A prefix cache persists KV snapshots to disk, so repeated system prompts and conversation prefixes are prefilled once and reused across sessions. No request queueing, no waiting for the previous agent to finish.
+
  ---
 
  ## Agent primitive
@@ -132,12 +133,12 @@ agent.messages = messages
  await agent.run("now add unit tests")
  ```
 
- Branch from any point in the conversation each branch gets its own worktree:
+ Branch from any point in the conversation. Each branch gets its own worktree:
 
  ```
  /branch # branch from current state
  /branch --rev 2 # branch from the 2nd user turn
- /branch --rev 3 --as-worktree try different approach
+ /branch --rev 3 make it use httpx instead
  ```
 
  Since it's just git, you can inspect the timeline outside the REPL:
@@ -202,6 +203,43 @@ Reliability comes from specialization plus constraint. A read-only reviewer can'
 
  ---
 
+ ## Continuous batching
+
+ The local server can run multiple inference sequences concurrently inside a single batch step. Instead of a global lock that serialises one request at a time, the batching engine maintains a live set of active sequences and yields tokens for all of them on every step.
+
+ ```bash
+ mlc --engine batch # continuous batching + built-in REPL
+ ```
+
+ This unlocks true parallelism for multi-agent workloads:
+
+ ```python
+ import asyncio
+ from mlx_code.repl import Agent
+
+ async def main():
+     agents = [Agent() for _ in range(4)]
+     await asyncio.gather(*[
+         a.run(f"Research topic: {t}")
+         for a, t in zip(agents, ["consensus", "cryptography", "networking", "storage"])
+     ])
+
+ asyncio.run(main())
+ ```
+
+ All four agents generate simultaneously inside the same batch. No sequential blocking.
+
+ ### Health endpoint
+
+ ```bash
+ curl http://127.0.0.1:8000/health
+ # {"status":"ok","model":"mlx-community/Qwen3.5-4B-OptiQ-4bit","active_sequences":2,"prefix_cache_files":5}
+ ```
+
+ `active_sequences` shows how many agents are generating right now; `prefix_cache_files` shows how many prefix KV snapshots are stored on disk.
+
+ ---
+
  ## Command Line
 
  ### `mlc`: local server + harness
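Note on the hunk above: the new "Health endpoint" section shows a sample JSON body with the fields `status`, `model`, `active_sequences`, and `prefix_cache_files`. A small sketch of consuming that payload from Python follows. It parses the sample string from the README rather than calling a live server, and the helper name `summarize_health` is hypothetical.

```python
import json


def summarize_health(payload: str) -> str:
    """Turn a /health JSON body into a one-line status summary."""
    data = json.loads(payload)
    # Field names are taken from the sample response quoted in the README hunk.
    return (
        f"{data.get('status', 'unknown')}: {data.get('model', '?')} "
        f"({data.get('active_sequences', 0)} active, "
        f"{data.get('prefix_cache_files', 0)} cached prefixes)"
    )


sample = (
    '{"status":"ok","model":"mlx-community/Qwen3.5-4B-OptiQ-4bit",'
    '"active_sequences":2,"prefix_cache_files":5}'
)
print(summarize_health(sample))
```

Using `dict.get` with defaults keeps the helper tolerant of a response that omits a field, which seems prudent given that the endpoint is new in this release and its schema may still change.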
@@ -209,20 +247,20 @@ Reliability comes from specialization plus constraint. A read-only reviewer can'
  Starts the MLX inference server and launches the built-in TUI harness against it.
 
  ```bash
- # Default: local server + default TUI
+ # Default: local server + default harness
  mlc
 
- # Use a simple terminal REPL instead of the TUI
- mlc --notui
+ # Continuous batching mode (default is sequential caching mode)
+ mlc --engine batch
+
+ # Server only, no harness
+ mlc --leash none
 
  # Use a different harness (routes traffic through the local server)
  mlc --leash claude
  mlc --leash gemini
  mlc --leash codex
 
- # Server only, no harness
- mlc --leash none
-
  # Specify a model
  mlc --model mlx-community/Qwen3.5-4B-OptiQ-4bit
 
@@ -273,10 +311,9 @@ mlc-run --api codex
  echo "explain lsp.py" | mlc-run -a deepseek | cat - PLAN.md | mlc-run --url http://localhost:9000
 
  # Simple terminal REPL (no TUI)
- mlc-run --notui
+ mlc-run --bare
  ```
 
-
  ---
 
  ## Using as a Library
@@ -399,18 +436,19 @@ agent = Agent(extra_tool_classes=[LiveDBTool], tool_names=["QueryDB"])
 
  | Command | Description |
  |---|---|
- | `/help` | Show command reference |
+ | `/branch [--rev N] [prompt]` | Open a new branch tab from the current (or earlier) checkpoint |
+ | `/diff [--all]` | Show a side-by-side diff of changes in the worktree |
  | `/clear [--config F]` | Clear conversation; `--config` reloads agent from a JSON/YAML file |
+ | `/tab [N]` | Jump to tab N |
  | `/history [--raw]` | Show conversation transcript; `--raw` shows the raw API message log |
- | `/diff [--all]` | Show a side-by-side diff of changes in the worktree |
- | `/errors` | Show timestamped error log for the current tab |
  | `/tools` | List active tools |
- | `/branch [--rev N] [prompt]` | Open a new branch tab from the current (or earlier) checkpoint |
  | `/abort` | Abort the running agent |
+ | `/errors` | Show timestamped error log for the current tab |
  | `/export [path]` | Export session to JSON |
- | `/exit` or `/quit` | Close branch tab, or exit the app |
- | `!command` | Run a shell command; output captured in the TUI |
- | `!!command` | Run an interactive command (TUI suspends, terminal handed to process) |
+ | `/exit [--all]` | Close branch tab, or exit the app |
+ | `/help` | Show command reference |
+ | `!command` | Run a shell command; output captured in the TUI (eg, `ls`, `cat hello.c`) |
+ | `$command` | Run an interactive command (eg, `vim`, `yazi`, `less hello.c`) |
 
  ### Key bindings
 
@@ -418,9 +456,9 @@ agent = Agent(extra_tool_classes=[LiveDBTool], tool_names=["QueryDB"])
  |---|---|
  | `Enter` | Submit |
  | `Ctrl-J` | Insert newline |
- | `Alt-1` … `Alt-9` | Jump to tab N |
- | `Tab` / `Shift-Tab` | Cycle through tabs |
- | `Ctrl-C` | Abort running agent |
+ | `Ctrl-1` … `Ctrl-9` | Jump to tab N |
+ | `Ctrl-,` / `Ctrl-.` | Cycle through tabs |
+ | `Ctrl-C` | Clear input, or abort running agent |
  | `Ctrl-D` | Close branch tab, or exit app |
  | `Ctrl-R` | Recall last prompt into editor |
 
@@ -438,16 +476,16 @@ agent = Agent(extra_tool_classes=[LiveDBTool], tool_names=["QueryDB"])
  | `Skill` | Retrieve named skill instructions from config |
  | `Agent` | Spawn an autonomous sub-agent for delegated work |
 
- All file tools enforce path sandboxing the agent cannot read or write outside the worktree.
+ All file tools enforce path sandboxing. The agent cannot read or write outside the worktree.
 
  ### Backends
 
  | Backend | Flag | Notes |
  |---------|------|-------|
- | MLX (local) | `--api noapi` | Default. Runs on-device, no API key needed |
+ | MLX-LM (local) | `--api noapi` | Default. Runs on-device, no API key needed |
  | Claude | `--api claude` | Requires `ANTHROPIC_API_KEY` |
  | Gemini | `--api gemini` | Requires `GOOGLE_API_KEY` |
- | DeepSeek | `--api deepseek` | DeepSeek API or compatible endpoint |
+ | DeepSeek | `--api deepseek` | Requires `DEEPSEEK_API_KEY` |
  | Codex | `--api codex` | OpenAI Codex CLI integration |
  | OpenAI | `--api openai` | Any OpenAI-compatible endpoint |
 
@@ -456,10 +494,13 @@ All file tools enforce path sandboxing — the agent cannot read or write outsid
  The local MLX server speaks OpenAI, Anthropic, and Gemini wire formats simultaneously, so you can use any compatible CLI as the frontend:
 
  ```bash
- mlc --leash claude # claude CLI routes through local model
- mlc --leash codex # codex CLI routes through local model
- mlc --leash gemini # gemini CLI routes through local model
- mlc --leash none # server only
+ mlc # default
+ mlc --web # web UI (api.mlx-code.com)
+ mlc --bare # no TUI
+ mlc --leash none # no harness
+ mlc --leash codex # codex CLI
+ mlc --leash gemini # gemini CLI
+ mlc --leash claude # claude code
  ```
 
  ---
mlx_code-0.0.20/mlx_code/ntui.py → mlx_code-0.0.22/mlx_code/bare.py
@@ -110,6 +110,7 @@ class SimpleRepl:
  if out_text:
  self._write_delta(prefix + out_text, 'tool_result')
  self._last_stream_type = t
+ print()
  elif t == 'commit':
  self._pending_nls = 0
  self._awaiting_content = False