mlx-code 0.0.25__tar.gz → 0.0.27__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (28)
  1. {mlx_code-0.0.25 → mlx_code-0.0.27}/PKG-INFO +89 -49
  2. {mlx_code-0.0.25 → mlx_code-0.0.27}/README.md +86 -48
  3. mlx_code-0.0.25/mlx_code/ntui.py → mlx_code-0.0.27/mlx_code/bare.py +1 -0
  4. mlx_code-0.0.27/mlx_code/bats.py +299 -0
  5. {mlx_code-0.0.25 → mlx_code-0.0.27}/mlx_code/main.py +65 -11
  6. {mlx_code-0.0.25 → mlx_code-0.0.27}/mlx_code/repl.py +42 -15
  7. {mlx_code-0.0.25 → mlx_code-0.0.27}/mlx_code/view_log.py +1 -1
  8. {mlx_code-0.0.25 → mlx_code-0.0.27}/mlx_code.egg-info/PKG-INFO +89 -49
  9. {mlx_code-0.0.25 → mlx_code-0.0.27}/mlx_code.egg-info/SOURCES.txt +2 -1
  10. {mlx_code-0.0.25 → mlx_code-0.0.27}/mlx_code.egg-info/requires.txt +2 -0
  11. {mlx_code-0.0.25 → mlx_code-0.0.27}/setup.py +4 -1
  12. {mlx_code-0.0.25 → mlx_code-0.0.27}/LICENSE +0 -0
  13. {mlx_code-0.0.25 → mlx_code-0.0.27}/mlx_code/__init__.py +0 -0
  14. {mlx_code-0.0.25 → mlx_code-0.0.27}/mlx_code/apis.py +0 -0
  15. {mlx_code-0.0.25 → mlx_code-0.0.27}/mlx_code/gits.py +0 -0
  16. {mlx_code-0.0.25 → mlx_code-0.0.27}/mlx_code/lsp_tool.py +0 -0
  17. {mlx_code-0.0.25 → mlx_code-0.0.27}/mlx_code/mcb.py +0 -0
  18. {mlx_code-0.0.25 → mlx_code-0.0.27}/mlx_code/mcb_tool.py +0 -0
  19. {mlx_code-0.0.25 → mlx_code-0.0.27}/mlx_code/stream_log.py +0 -0
  20. {mlx_code-0.0.25 → mlx_code-0.0.27}/mlx_code/tools.py +0 -0
  21. {mlx_code-0.0.25 → mlx_code-0.0.27}/mlx_code/util.py +0 -0
  22. {mlx_code-0.0.25 → mlx_code-0.0.27}/mlx_code/view_git.py +0 -0
  23. {mlx_code-0.0.25 → mlx_code-0.0.27}/mlx_code.egg-info/dependency_links.txt +0 -0
  24. {mlx_code-0.0.25 → mlx_code-0.0.27}/mlx_code.egg-info/entry_points.txt +0 -0
  25. {mlx_code-0.0.25 → mlx_code-0.0.27}/mlx_code.egg-info/top_level.txt +0 -0
  26. {mlx_code-0.0.25 → mlx_code-0.0.27}/setup.cfg +0 -0
  27. {mlx_code-0.0.25 → mlx_code-0.0.27}/tests/__init__.py +0 -0
  28. {mlx_code-0.0.25 → mlx_code-0.0.27}/tests/test.py +0 -0
@@ -1,6 +1,6 @@
1
1
  Metadata-Version: 2.4
2
2
  Name: mlx-code
3
- Version: 0.0.25
3
+ Version: 0.0.27
4
4
  Summary: Coding Agent for Mac
5
5
  Home-page: https://josefalbers.github.io/mlx-code/
6
6
  Author: J Joe
@@ -17,6 +17,8 @@ Requires-Dist: httpx
17
17
  Requires-Dist: pydantic
18
18
  Requires-Dist: textual>=8.2.7
19
19
  Requires-Dist: rich>=15.0.0
20
+ Requires-Dist: starlette
21
+ Requires-Dist: uvicorn
20
22
  Provides-Extra: all
21
23
  Requires-Dist: python-lsp-server[all]; extra == "all"
22
24
  Requires-Dist: GitPython; extra == "all"
@@ -45,9 +47,9 @@ A Git-native coding agent that can run entirely on your Mac. No API keys, no clo
45
47
  ## Architecture
46
48
 
47
49
  ```
48
- Conversation tree (nodes = git commits with embedded chat history)
50
+ Worktrees:
49
51
 
50
- main ──●──●──●──●──●──●──●──●──●──●
52
+ main ──●──●──●──●──●──●──●──●──●──●──●──●──●──●───────────► Node = git commit + chat history
51
53
  │ │
52
54
  │ └── branch-1 ──●──●──●
53
55
  │ │ ┌────────────┐
@@ -56,7 +58,7 @@ Conversation tree (nodes = git commits with embedded chat history)
56
58
  └── branch-0 ──●──●──● │
57
59
 
58
60
 
59
- REPL tabs (each tab = a git branch + agent) │
61
+ Tabs: ├────────────► Tab = git branch + Agent
60
62
 
61
63
 
62
64
  ┌──────────────────────────────────────────────┼─────────┐
@@ -66,21 +68,21 @@ REPL tabs (each tab = a git branch + agent) │
66
68
  │ └──────┘ └────┬─────┘ └──────────┘ └────────────┘ │
67
69
  └─────────────────┼──────────────────────────────────────┘
68
70
 
69
- ├────────────────────────────────────► each tab is an independent Agent
71
+ Agents: ├─────────────────────────────────────────► Each tab runs its own Agent
70
72
 
71
- ┌────┴─────────────────────────────────┐
72
- │ Agent
73
- ┌──────────────┐ ┌──────────────┐
74
- │ │ API: │ │ Tools: │ │
75
- │ │ MLX (local) │ │ Read Write │ │
76
- │ │ Claude │ │ Edit Bash │ │
77
- │ │ Gemini │ │ Grep Find │ │
78
- │ │ OpenAI │ │ Ls Skill │ │
79
- └──────────────┘ │ Agent ───────┼──┼───► spawns child Agent
80
- └──────────────┘(each with own tools + worktree + etc)
81
- │ Git worktree
82
- │ (isolation + session state)
83
- └──────────────────────────────────────┘
73
+ ┌────┴─────────────────────────────────────┐
74
+ │ Agent
75
+ ┌────────────────┐ ┌────────────────┐
76
+ │ │ API: │ │ Tools: │ │
77
+ │ │ Local (mlx-lm) │ │ Read Write │ │
78
+ │ │ Gemini │ │ Edit Bash │ │
79
+ │ │ Claude │ │ Grep Find │ │
80
+ │ │ Codex │ │ Ls Skill │ │
81
+ │ DeepSeek │ │ Agent ─────────┼──┼───► Recursively spawns sub-Agents
82
+ └────────────────┘ └────────────────┘
83
+ │ Git worktree
84
+ │ (isolation + session state)
85
+ └──────────────────────────────────────────┘
84
86
  ```
85
87
 
86
88
  Each layer is importable and composable on its own. A commit records state, a branch records an alternative path, and a tab is just a live view over an `Agent`.
@@ -95,6 +97,15 @@ result = await agent.run('refactor utils.py to use dataclasses')
95
97
 
96
98
  ---
97
99
 
100
+ ## Core ideas
101
+
102
+ - **Git is the state machine.** Every file-changing agent step is committed with the conversation that produced it, so you can inspect, resume, and branch from any checkpoint.
103
+ - **Branches are alternative futures.** A branch is not just a Git branch; it is a different reasoning path with its own worktree and session state.
104
+ - **Agents are the primitive.** Tabs, branches, and delegated subtasks are all instances of the same `Agent` abstraction.
105
+ - **Worktrees provide isolation.** The agent edits in a separate worktree, so your main checkout stays clean and recoverable.
106
+
107
+ ---
108
+
98
109
  ## Quick start
99
110
 
100
111
  ```bash
@@ -104,36 +115,27 @@ uvx --from mlx-code mlc
104
115
  # or install into the current environment
105
116
  pip install mlx-code
106
117
 
107
- mlc # launch with local MLX model
118
+ # launch
119
+ mlc # with a local MLX model
108
120
  mlc-run --api gemini # or use a remote provider
109
- mlc-run --api deepseek --model deepseek-v4-flash
110
121
  ```
111
122
 
112
123
  That's it. The first run starts a local inference server and drops you into the REPL.
113
124
 
114
125
  ---
115
126
 
116
- ## Core ideas
117
-
118
- - **Git is the state machine.** Every file-changing agent step is committed with the conversation that produced it, so you can inspect, resume, and branch from any checkpoint.
119
- - **Branches are alternative futures.** A branch is not just a Git branch; it is a different reasoning path with its own worktree and session state.
120
- - **Agents are the primitive.** Tabs, branches, and delegated subtasks are all instances of the same `Agent` abstraction.
121
- - **Worktrees provide isolation.** The agent edits in a separate worktree, so your main checkout stays clean and recoverable.
122
-
123
- ---
124
-
125
127
  ## Why mlx-code
126
128
 
127
129
  **Agents as reusable workflow atoms.** Tabs, branches, and tasks are all managed within instances of `Agent`. Each one gets its own conversation, its own tools, and its own worktree. Agents can spawn sub-agents to delegate subtasks, and each child is a full agent with its own scoped tool set.
128
130
 
129
131
  **Git is the database.** When the agent makes file changes, they’re committed to a git worktree with the full conversation embedded in the commit message. Resume any past session by hash, branch from any checkpoint, and inspect the agent timeline with `git log`. No proprietary state files, just Git.
130
132
 
131
- **Your working directory is never at risk.** The agent operates inside a `git worktree`, not your checkout. It can make a mess, and you can inspect or discard it without ever touching `main`.
132
-
133
- **Built-in safety nets.** Subprocess environment variables go through an explicit allowlist, so secrets in your shell are never leaked to agent-spawned processes.
133
+ **Built-in safety nets.** Your working directory is never at risk. The agent operates inside a `git worktree`, not your checkout. It can make a mess, and you can inspect or discard it without ever touching `main`. Subprocess environment variables go through an explicit allowlist, so secrets in your shell are never leaked to agent-spawned processes.
134
134
 
135
135
  **Batteries included.** Everything ships in one pip install: the MLX inference engine, the multi-protocol API server, the agent loop, the tools, and the TUI. No llama.cpp, no ollama, no vLLM bridge to find and configure. And the server natively speaks OpenAI, Anthropic, Gemini, and Codex wire formats simultaneously, so `claude`, `codex`, and `gemini` CLIs can all work against your local model without a translation layer.
136
136
 
137
+ **Continuous batching.** The local inference server runs a continuous batching engine that processes multiple sequences concurrently. When you spawn parallel agents (eg, multiple tabs, `asyncio.gather` pipelines, or delegated sub-tasks) they all share the same GPU context and are stepped together each tick. A prefix cache persists KV snapshots to disk, so repeated system prompts and conversation prefixes are prefilled once and reused across sessions. No request queueing, no waiting for the previous agent to finish.
138
+
137
139
  ---
138
140
 
139
141
  ## Agent primitive
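Note on the hunk above: the merged "Built-in safety nets" paragraph says subprocess environment variables pass through an explicit allowlist. The diff does not show how mlx-code implements this, so the following is only a minimal sketch of the general technique. The names `ALLOWED_ENV`, `filtered_env`, and `run_isolated` are hypothetical and the allowlist contents are an assumption.

```python
import os
import subprocess

# Hypothetical allowlist: the variables mlx-code actually passes through
# are not visible in this diff.
ALLOWED_ENV = ("PATH", "HOME", "LANG", "TERM")


def filtered_env(environ=None, allowed=ALLOWED_ENV):
    """Return a new dict holding only the allowlisted variables that are set."""
    source = os.environ if environ is None else environ
    return {name: source[name] for name in allowed if name in source}


def run_isolated(argv, environ=None):
    """Run a child process that can see only the allowlisted variables."""
    return subprocess.run(
        argv,
        env=filtered_env(environ),  # replaces, rather than extends, the parent env
        capture_output=True,
        text=True,
        check=False,
    )
```

Passing `env=` to `subprocess.run` replaces the inherited environment entirely, so anything not copied into the filtered dict (an API key exported in the parent shell, for example) is simply absent in the child.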
@@ -171,12 +173,12 @@ agent.messages = messages
171
173
  await agent.run("now add unit tests")
172
174
  ```
173
175
 
174
- Branch from any point in the conversation each branch gets its own worktree:
176
+ Branch from any point in the conversation. Each branch gets its own worktree:
175
177
 
176
178
  ```
177
179
  /branch # branch from current state
178
180
  /branch --rev 2 # branch from the 2nd user turn
179
- /branch --rev 3 --as-worktree try different approach
181
+ /branch --rev 3 make it use httpx instead
180
182
  ```
181
183
 
182
184
  Since it's just git, you can inspect the timeline outside the REPL:
@@ -241,6 +243,43 @@ Reliability comes from specialization plus constraint. A read-only reviewer can'
241
243
 
242
244
  ---
243
245
 
246
+ ## Continuous batching
247
+
248
+ The local server can run multiple inference sequences concurrently inside a single batch step. Instead of a global lock that serialises one request at a time, the batching engine maintains a live set of active sequences and yields tokens for all of them on every step.
249
+
250
+ ```bash
251
+ mlc --engine batch # continuous batching + built-in REPL
252
+ ```
253
+
254
+ This unlocks true parallelism for multi-agent workloads:
255
+
256
+ ```python
257
+ import asyncio
258
+ from mlx_code.repl import Agent
259
+
260
+ async def main():
261
+     agents = [Agent() for _ in range(4)]
262
+     await asyncio.gather(*[
263
+         a.run(f"Research topic: {t}")
264
+         for a, t in zip(agents, ["consensus", "cryptography", "networking", "storage"])
265
+     ])
266
+
267
+ asyncio.run(main())
268
+ ```
269
+
270
+ All four agents generate simultaneously inside the same batch. No sequential blocking.
271
+
272
+ ### Health endpoint
273
+
274
+ ```bash
275
+ curl http://127.0.0.1:8000/health
276
+ # {"status":"ok","model":"mlx-community/Qwen3.5-4B-OptiQ-4bit","active_sequences":2,"prefix_cache_files":5}
277
+ ```
278
+
279
+ `active_sequences` shows how many agents are generating right now; `prefix_cache_files` shows how many prefix KV snapshots are stored on disk.
280
+
281
+ ---
282
+
244
283
  ## Command Line
245
284
 
246
285
  ### `mlc`: local server + harness
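Note on the hunk above: the new "Continuous batching" section describes an engine that keeps a live set of active sequences and steps all of them on every tick. The contents of `bats.py` are not shown in this diff, so the following is a toy simulation of that scheduling idea only, with no model or GPU involved. The function name `continuous_batching` and the `max_batch` parameter are hypothetical.

```python
from collections import deque


def continuous_batching(requests, max_batch=4):
    """Simulate continuous batching over toy sequences.

    requests maps a sequence name to the number of tokens it still needs.
    Returns one list per tick naming the sequences stepped in that tick.
    """
    if max_batch < 1:
        raise ValueError("max_batch must be at least 1")
    waiting = deque(requests.items())
    active = {}  # name -> tokens remaining
    ticks = []
    while waiting or active:
        # Admit waiting sequences as soon as a slot is free, instead of
        # waiting for the whole current batch to drain.
        while waiting and len(active) < max_batch:
            name, remaining = waiting.popleft()
            if remaining > 0:
                active[name] = remaining
        if not active:
            break
        ticks.append(list(active))
        # One batch step: every active sequence produces one token.
        for name in list(active):
            active[name] -= 1
            if active[name] == 0:
                del active[name]  # finished sequences leave immediately
    return ticks
```

With `{"a": 1, "b": 3, "c": 2}` and `max_batch=2`, the sketch finishes in three ticks, because `c` takes the slot freed by `a` while `b` is still generating. Serving the same requests one at a time would take six steps.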
@@ -248,20 +287,20 @@ Reliability comes from specialization plus constraint. A read-only reviewer can'
248
287
  Starts the MLX inference server and launches the built-in TUI harness against it.
249
288
 
250
289
  ```bash
251
- # Default: local server + default TUI
290
+ # Default: local server + default harness
252
291
  mlc
253
292
 
254
- # Use a simple terminal REPL instead of the TUI
255
- mlc --notui
293
+ # Continuous batching mode (default is sequential caching mode)
294
+ mlc --engine batch
295
+
296
+ # Server only, no harness
297
+ mlc --leash none
256
298
 
257
299
  # Use a different harness (routes traffic through the local server)
258
300
  mlc --leash claude
259
301
  mlc --leash gemini
260
302
  mlc --leash codex
261
303
 
262
- # Server only, no harness
263
- mlc --leash none
264
-
265
304
  # Specify a model
266
305
  mlc --model mlx-community/Qwen3.5-4B-OptiQ-4bit
267
306
 
@@ -312,7 +351,7 @@ mlc-run --api codex
312
351
  echo "explain lsp.py" | mlc-run -a deepseek | cat - PLAN.md | mlc-run --url http://localhost:9000
313
352
 
314
353
  # Simple terminal REPL (no TUI)
315
- mlc-run --notui
354
+ mlc-run --bare
316
355
  ```
317
356
 
318
357
  ---
@@ -437,18 +476,19 @@ agent = Agent(extra_tool_classes=[LiveDBTool], tool_names=["QueryDB"])
437
476
 
438
477
  | Command | Description |
439
478
  |---|---|
440
- | `/help` | Show command reference |
479
+ | `/branch [--rev N] [prompt]` | Open a new branch tab from the current (or earlier) checkpoint |
480
+ | `/diff [--all]` | Show a side-by-side diff of changes in the worktree |
441
481
  | `/clear [--config F]` | Clear conversation; `--config` reloads agent from a JSON/YAML file |
482
+ | `/tab [N]` | Jump to tab N |
442
483
  | `/history [--raw]` | Show conversation transcript; `--raw` shows the raw API message log |
443
- | `/diff [--all]` | Show a side-by-side diff of changes in the worktree |
444
- | `/errors` | Show timestamped error log for the current tab |
445
484
  | `/tools` | List active tools |
446
- | `/branch [--rev N] [prompt]` | Open a new branch tab from the current (or earlier) checkpoint |
447
485
  | `/abort` | Abort the running agent |
486
+ | `/errors` | Show timestamped error log for the current tab |
448
487
  | `/export [path]` | Export session to JSON |
449
488
  | `/exit [--all]` | Close branch tab, or exit the app |
450
- | `!command` | Run a shell command; output captured in the TUI |
451
- | `$command` | Run an interactive command (TUI suspends, terminal handed to process) |
489
+ | `/help` | Show command reference |
490
+ | `!command` | Run a shell command; output captured in the TUI (eg, `ls`, `cat hello.c`) |
491
+ | `$command` | Run an interactive command (eg, `vim`, `yazi`, `less hello.c`) |
452
492
 
453
493
  ### Key bindings
454
494
 
@@ -458,7 +498,7 @@ agent = Agent(extra_tool_classes=[LiveDBTool], tool_names=["QueryDB"])
458
498
  | `Ctrl-J` | Insert newline |
459
499
  | `Ctrl-1` … `Ctrl-9` | Jump to tab N |
460
500
  | `Ctrl-,` / `Ctrl-.` | Cycle through tabs |
461
- | `Ctrl-C` | Abort running agent |
501
+ | `Ctrl-C` | Clear input, or abort running agent |
462
502
  | `Ctrl-D` | Close branch tab, or exit app |
463
503
  | `Ctrl-R` | Recall last prompt into editor |
464
504
 
@@ -476,7 +516,7 @@ agent = Agent(extra_tool_classes=[LiveDBTool], tool_names=["QueryDB"])
476
516
  | `Skill` | Retrieve named skill instructions from config |
477
517
  | `Agent` | Spawn an autonomous sub-agent for delegated work |
478
518
 
479
- All file tools enforce path sandboxing the agent cannot read or write outside the worktree.
519
+ All file tools enforce path sandboxing. The agent cannot read or write outside the worktree.
480
520
 
481
521
  ### Backends
482
522
 
@@ -9,9 +9,9 @@ A Git-native coding agent that can run entirely on your Mac. No API keys, no clo
9
9
  ## Architecture
10
10
 
11
11
  ```
12
- Conversation tree (nodes = git commits with embedded chat history)
12
+ Worktrees:
13
13
 
14
- main ──●──●──●──●──●──●──●──●──●──●
14
+ main ──●──●──●──●──●──●──●──●──●──●──●──●──●──●───────────► Node = git commit + chat history
15
15
  │ │
16
16
  │ └── branch-1 ──●──●──●
17
17
  │ │ ┌────────────┐
@@ -20,7 +20,7 @@ Conversation tree (nodes = git commits with embedded chat history)
20
20
  └── branch-0 ──●──●──● │
21
21
 
22
22
 
23
- REPL tabs (each tab = a git branch + agent) │
23
+ Tabs: ├────────────► Tab = git branch + Agent
24
24
 
25
25
 
26
26
  ┌──────────────────────────────────────────────┼─────────┐
@@ -30,21 +30,21 @@ REPL tabs (each tab = a git branch + agent) │
30
30
  │ └──────┘ └────┬─────┘ └──────────┘ └────────────┘ │
31
31
  └─────────────────┼──────────────────────────────────────┘
32
32
 
33
- ├────────────────────────────────────► each tab is an independent Agent
33
+ Agents: ├─────────────────────────────────────────► Each tab runs its own Agent
34
34
 
35
- ┌────┴─────────────────────────────────┐
36
- │ Agent
37
- ┌──────────────┐ ┌──────────────┐
38
- │ │ API: │ │ Tools: │ │
39
- │ │ MLX (local) │ │ Read Write │ │
40
- │ │ Claude │ │ Edit Bash │ │
41
- │ │ Gemini │ │ Grep Find │ │
42
- │ │ OpenAI │ │ Ls Skill │ │
43
- └──────────────┘ │ Agent ───────┼──┼───► spawns child Agent
44
- └──────────────┘(each with own tools + worktree + etc)
45
- │ Git worktree
46
- │ (isolation + session state)
47
- └──────────────────────────────────────┘
35
+ ┌────┴─────────────────────────────────────┐
36
+ │ Agent
37
+ ┌────────────────┐ ┌────────────────┐
38
+ │ │ API: │ │ Tools: │ │
39
+ │ │ Local (mlx-lm) │ │ Read Write │ │
40
+ │ │ Gemini │ │ Edit Bash │ │
41
+ │ │ Claude │ │ Grep Find │ │
42
+ │ │ Codex │ │ Ls Skill │ │
43
+ │ DeepSeek │ │ Agent ─────────┼──┼───► Recursively spawns sub-Agents
44
+ └────────────────┘ └────────────────┘
45
+ │ Git worktree
46
+ │ (isolation + session state)
47
+ └──────────────────────────────────────────┘
48
48
  ```
49
49
 
50
50
  Each layer is importable and composable on its own. A commit records state, a branch records an alternative path, and a tab is just a live view over an `Agent`.
@@ -59,6 +59,15 @@ result = await agent.run('refactor utils.py to use dataclasses')
59
59
 
60
60
  ---
61
61
 
62
+ ## Core ideas
63
+
64
+ - **Git is the state machine.** Every file-changing agent step is committed with the conversation that produced it, so you can inspect, resume, and branch from any checkpoint.
65
+ - **Branches are alternative futures.** A branch is not just a Git branch; it is a different reasoning path with its own worktree and session state.
66
+ - **Agents are the primitive.** Tabs, branches, and delegated subtasks are all instances of the same `Agent` abstraction.
67
+ - **Worktrees provide isolation.** The agent edits in a separate worktree, so your main checkout stays clean and recoverable.
68
+
69
+ ---
70
+
62
71
  ## Quick start
63
72
 
64
73
  ```bash
@@ -68,36 +77,27 @@ uvx --from mlx-code mlc
68
77
  # or install into the current environment
69
78
  pip install mlx-code
70
79
 
71
- mlc # launch with local MLX model
80
+ # launch
81
+ mlc # with a local MLX model
72
82
  mlc-run --api gemini # or use a remote provider
73
- mlc-run --api deepseek --model deepseek-v4-flash
74
83
  ```
75
84
 
76
85
  That's it. The first run starts a local inference server and drops you into the REPL.
77
86
 
78
87
  ---
79
88
 
80
- ## Core ideas
81
-
82
- - **Git is the state machine.** Every file-changing agent step is committed with the conversation that produced it, so you can inspect, resume, and branch from any checkpoint.
83
- - **Branches are alternative futures.** A branch is not just a Git branch; it is a different reasoning path with its own worktree and session state.
84
- - **Agents are the primitive.** Tabs, branches, and delegated subtasks are all instances of the same `Agent` abstraction.
85
- - **Worktrees provide isolation.** The agent edits in a separate worktree, so your main checkout stays clean and recoverable.
86
-
87
- ---
88
-
89
89
  ## Why mlx-code
90
90
 
91
91
  **Agents as reusable workflow atoms.** Tabs, branches, and tasks are all managed within instances of `Agent`. Each one gets its own conversation, its own tools, and its own worktree. Agents can spawn sub-agents to delegate subtasks, and each child is a full agent with its own scoped tool set.
92
92
 
93
93
  **Git is the database.** When the agent makes file changes, they’re committed to a git worktree with the full conversation embedded in the commit message. Resume any past session by hash, branch from any checkpoint, and inspect the agent timeline with `git log`. No proprietary state files, just Git.
94
94
 
95
- **Your working directory is never at risk.** The agent operates inside a `git worktree`, not your checkout. It can make a mess, and you can inspect or discard it without ever touching `main`.
96
-
97
- **Built-in safety nets.** Subprocess environment variables go through an explicit allowlist, so secrets in your shell are never leaked to agent-spawned processes.
95
+ **Built-in safety nets.** Your working directory is never at risk. The agent operates inside a `git worktree`, not your checkout. It can make a mess, and you can inspect or discard it without ever touching `main`. Subprocess environment variables go through an explicit allowlist, so secrets in your shell are never leaked to agent-spawned processes.
98
96
 
99
97
  **Batteries included.** Everything ships in one pip install: the MLX inference engine, the multi-protocol API server, the agent loop, the tools, and the TUI. No llama.cpp, no ollama, no vLLM bridge to find and configure. And the server natively speaks OpenAI, Anthropic, Gemini, and Codex wire formats simultaneously, so `claude`, `codex`, and `gemini` CLIs can all work against your local model without a translation layer.
100
98
 
99
+ **Continuous batching.** The local inference server runs a continuous batching engine that processes multiple sequences concurrently. When you spawn parallel agents (eg, multiple tabs, `asyncio.gather` pipelines, or delegated sub-tasks) they all share the same GPU context and are stepped together each tick. A prefix cache persists KV snapshots to disk, so repeated system prompts and conversation prefixes are prefilled once and reused across sessions. No request queueing, no waiting for the previous agent to finish.
100
+
101
101
  ---
102
102
 
103
103
  ## Agent primitive
@@ -135,12 +135,12 @@ agent.messages = messages
135
135
  await agent.run("now add unit tests")
136
136
  ```
137
137
 
138
- Branch from any point in the conversation each branch gets its own worktree:
138
+ Branch from any point in the conversation. Each branch gets its own worktree:
139
139
 
140
140
  ```
141
141
  /branch # branch from current state
142
142
  /branch --rev 2 # branch from the 2nd user turn
143
- /branch --rev 3 --as-worktree try different approach
143
+ /branch --rev 3 make it use httpx instead
144
144
  ```
145
145
 
146
146
  Since it's just git, you can inspect the timeline outside the REPL:
@@ -205,6 +205,43 @@ Reliability comes from specialization plus constraint. A read-only reviewer can'
205
205
 
206
206
  ---
207
207
 
208
+ ## Continuous batching
209
+
210
+ The local server can run multiple inference sequences concurrently inside a single batch step. Instead of a global lock that serialises one request at a time, the batching engine maintains a live set of active sequences and yields tokens for all of them on every step.
211
+
212
+ ```bash
213
+ mlc --engine batch # continuous batching + built-in REPL
214
+ ```
215
+
216
+ This unlocks true parallelism for multi-agent workloads:
217
+
218
+ ```python
219
+ import asyncio
220
+ from mlx_code.repl import Agent
221
+
222
+ async def main():
223
+     agents = [Agent() for _ in range(4)]
224
+     await asyncio.gather(*[
225
+         a.run(f"Research topic: {t}")
226
+         for a, t in zip(agents, ["consensus", "cryptography", "networking", "storage"])
227
+     ])
228
+
229
+ asyncio.run(main())
230
+ ```
231
+
232
+ All four agents generate simultaneously inside the same batch. No sequential blocking.
233
+
234
+ ### Health endpoint
235
+
236
+ ```bash
237
+ curl http://127.0.0.1:8000/health
238
+ # {"status":"ok","model":"mlx-community/Qwen3.5-4B-OptiQ-4bit","active_sequences":2,"prefix_cache_files":5}
239
+ ```
240
+
241
+ `active_sequences` shows how many agents are generating right now; `prefix_cache_files` shows how many prefix KV snapshots are stored on disk.
242
+
243
+ ---
244
+
208
245
  ## Command Line
209
246
 
210
247
  ### `mlc`: local server + harness
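Note on the hunk above: the README now documents a `/health` endpoint and shows a sample JSON response. A minimal sketch of reading that response from Python follows, assuming the field names match the sample in the README. The helper names `summarize_health` and `fetch_health` are hypothetical, and the default URL is simply the one used in the `curl` example.

```python
import json
from urllib.request import urlopen

# Sample body copied from the README's curl example.
SAMPLE = (
    '{"status":"ok","model":"mlx-community/Qwen3.5-4B-OptiQ-4bit",'
    '"active_sequences":2,"prefix_cache_files":5}'
)


def summarize_health(body):
    """Parse a /health response body into a small summary dict."""
    data = json.loads(body)
    return {
        "healthy": data.get("status") == "ok",
        "active_sequences": int(data.get("active_sequences", 0)),
        "prefix_cache_files": int(data.get("prefix_cache_files", 0)),
    }


def fetch_health(url="http://127.0.0.1:8000/health", timeout=2.0):
    """Fetch and summarize /health; requires a running local server."""
    with urlopen(url, timeout=timeout) as response:
        return summarize_health(response.read().decode("utf-8"))
```

`summarize_health(SAMPLE)` can be exercised without a server; `fetch_health()` only works while `mlc` is serving on that address.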
@@ -212,20 +249,20 @@ Reliability comes from specialization plus constraint. A read-only reviewer can'
212
249
  Starts the MLX inference server and launches the built-in TUI harness against it.
213
250
 
214
251
  ```bash
215
- # Default: local server + default TUI
252
+ # Default: local server + default harness
216
253
  mlc
217
254
 
218
- # Use a simple terminal REPL instead of the TUI
219
- mlc --notui
255
+ # Continuous batching mode (default is sequential caching mode)
256
+ mlc --engine batch
257
+
258
+ # Server only, no harness
259
+ mlc --leash none
220
260
 
221
261
  # Use a different harness (routes traffic through the local server)
222
262
  mlc --leash claude
223
263
  mlc --leash gemini
224
264
  mlc --leash codex
225
265
 
226
- # Server only, no harness
227
- mlc --leash none
228
-
229
266
  # Specify a model
230
267
  mlc --model mlx-community/Qwen3.5-4B-OptiQ-4bit
231
268
 
@@ -276,7 +313,7 @@ mlc-run --api codex
276
313
  echo "explain lsp.py" | mlc-run -a deepseek | cat - PLAN.md | mlc-run --url http://localhost:9000
277
314
 
278
315
  # Simple terminal REPL (no TUI)
279
- mlc-run --notui
316
+ mlc-run --bare
280
317
  ```
281
318
 
282
319
  ---
@@ -401,18 +438,19 @@ agent = Agent(extra_tool_classes=[LiveDBTool], tool_names=["QueryDB"])
401
438
 
402
439
  | Command | Description |
403
440
  |---|---|
404
- | `/help` | Show command reference |
441
+ | `/branch [--rev N] [prompt]` | Open a new branch tab from the current (or earlier) checkpoint |
442
+ | `/diff [--all]` | Show a side-by-side diff of changes in the worktree |
405
443
  | `/clear [--config F]` | Clear conversation; `--config` reloads agent from a JSON/YAML file |
444
+ | `/tab [N]` | Jump to tab N |
406
445
  | `/history [--raw]` | Show conversation transcript; `--raw` shows the raw API message log |
407
- | `/diff [--all]` | Show a side-by-side diff of changes in the worktree |
408
- | `/errors` | Show timestamped error log for the current tab |
409
446
  | `/tools` | List active tools |
410
- | `/branch [--rev N] [prompt]` | Open a new branch tab from the current (or earlier) checkpoint |
411
447
  | `/abort` | Abort the running agent |
448
+ | `/errors` | Show timestamped error log for the current tab |
412
449
  | `/export [path]` | Export session to JSON |
413
450
  | `/exit [--all]` | Close branch tab, or exit the app |
414
- | `!command` | Run a shell command; output captured in the TUI |
415
- | `$command` | Run an interactive command (TUI suspends, terminal handed to process) |
451
+ | `/help` | Show command reference |
452
+ | `!command` | Run a shell command; output captured in the TUI (eg, `ls`, `cat hello.c`) |
453
+ | `$command` | Run an interactive command (eg, `vim`, `yazi`, `less hello.c`) |
416
454
 
417
455
  ### Key bindings
418
456
 
@@ -422,7 +460,7 @@ agent = Agent(extra_tool_classes=[LiveDBTool], tool_names=["QueryDB"])
422
460
  | `Ctrl-J` | Insert newline |
423
461
  | `Ctrl-1` … `Ctrl-9` | Jump to tab N |
424
462
  | `Ctrl-,` / `Ctrl-.` | Cycle through tabs |
425
- | `Ctrl-C` | Abort running agent |
463
+ | `Ctrl-C` | Clear input, or abort running agent |
426
464
  | `Ctrl-D` | Close branch tab, or exit app |
427
465
  | `Ctrl-R` | Recall last prompt into editor |
428
466
 
@@ -440,7 +478,7 @@ agent = Agent(extra_tool_classes=[LiveDBTool], tool_names=["QueryDB"])
440
478
  | `Skill` | Retrieve named skill instructions from config |
441
479
  | `Agent` | Spawn an autonomous sub-agent for delegated work |
442
480
 
443
- All file tools enforce path sandboxing the agent cannot read or write outside the worktree.
481
+ All file tools enforce path sandboxing. The agent cannot read or write outside the worktree.
444
482
 
445
483
  ### Backends
446
484
 
@@ -110,6 +110,7 @@ class SimpleRepl:
110
110
  if out_text:
111
111
  self._write_delta(prefix + out_text, 'tool_result')
112
112
  self._last_stream_type = t
113
+ print()
113
114
  elif t == 'commit':
114
115
  self._pending_nls = 0
115
116
  self._awaiting_content = False