opencode-llmstack 0.9.0__tar.gz → 0.9.2__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- opencode_llmstack-0.9.2/CHANGELOG.md +117 -0
- opencode_llmstack-0.9.2/LICENSE +21 -0
- {opencode_llmstack-0.9.0 → opencode_llmstack-0.9.2}/PKG-INFO +33 -8
- {opencode_llmstack-0.9.0 → opencode_llmstack-0.9.2}/README.md +7 -6
- opencode_llmstack-0.9.2/UPGRADING.md +602 -0
- {opencode_llmstack-0.9.0 → opencode_llmstack-0.9.2}/llmstack/__init__.py +1 -1
- {opencode_llmstack-0.9.0 → opencode_llmstack-0.9.2}/llmstack/app.py +19 -7
- {opencode_llmstack-0.9.0 → opencode_llmstack-0.9.2}/llmstack/backends/bedrock.py +4 -2
- {opencode_llmstack-0.9.0 → opencode_llmstack-0.9.2}/llmstack/commands/start.py +0 -1
- {opencode_llmstack-0.9.0 → opencode_llmstack-0.9.2}/llmstack/models.ini +3 -3
- {opencode_llmstack-0.9.0 → opencode_llmstack-0.9.2}/opencode_llmstack.egg-info/PKG-INFO +33 -8
- {opencode_llmstack-0.9.0 → opencode_llmstack-0.9.2}/opencode_llmstack.egg-info/SOURCES.txt +3 -0
- {opencode_llmstack-0.9.0 → opencode_llmstack-0.9.2}/pyproject.toml +10 -2
- {opencode_llmstack-0.9.0 → opencode_llmstack-0.9.2}/llmstack/AGENTS.md +0 -0
- {opencode_llmstack-0.9.0 → opencode_llmstack-0.9.2}/llmstack/__main__.py +0 -0
- {opencode_llmstack-0.9.0 → opencode_llmstack-0.9.2}/llmstack/_platform.py +0 -0
- {opencode_llmstack-0.9.0 → opencode_llmstack-0.9.2}/llmstack/backends/__init__.py +0 -0
- {opencode_llmstack-0.9.0 → opencode_llmstack-0.9.2}/llmstack/check_models.py +0 -0
- {opencode_llmstack-0.9.0 → opencode_llmstack-0.9.2}/llmstack/cli.py +0 -0
- {opencode_llmstack-0.9.0 → opencode_llmstack-0.9.2}/llmstack/commands/__init__.py +0 -0
- {opencode_llmstack-0.9.0 → opencode_llmstack-0.9.2}/llmstack/commands/_helpers.py +0 -0
- {opencode_llmstack-0.9.0 → opencode_llmstack-0.9.2}/llmstack/commands/activate.py +0 -0
- {opencode_llmstack-0.9.0 → opencode_llmstack-0.9.2}/llmstack/commands/check.py +0 -0
- {opencode_llmstack-0.9.0 → opencode_llmstack-0.9.2}/llmstack/commands/download.py +0 -0
- {opencode_llmstack-0.9.0 → opencode_llmstack-0.9.2}/llmstack/commands/install.py +0 -0
- {opencode_llmstack-0.9.0 → opencode_llmstack-0.9.2}/llmstack/commands/install_llama_swap.py +0 -0
- {opencode_llmstack-0.9.0 → opencode_llmstack-0.9.2}/llmstack/commands/reload.py +0 -0
- {opencode_llmstack-0.9.0 → opencode_llmstack-0.9.2}/llmstack/commands/restart.py +0 -0
- {opencode_llmstack-0.9.0 → opencode_llmstack-0.9.2}/llmstack/commands/setup.py +0 -0
- {opencode_llmstack-0.9.0 → opencode_llmstack-0.9.2}/llmstack/commands/status.py +0 -0
- {opencode_llmstack-0.9.0 → opencode_llmstack-0.9.2}/llmstack/commands/stop.py +0 -0
- {opencode_llmstack-0.9.0 → opencode_llmstack-0.9.2}/llmstack/download/__init__.py +0 -0
- {opencode_llmstack-0.9.0 → opencode_llmstack-0.9.2}/llmstack/download/binary.py +0 -0
- {opencode_llmstack-0.9.0 → opencode_llmstack-0.9.2}/llmstack/download/ggufs.py +0 -0
- {opencode_llmstack-0.9.0 → opencode_llmstack-0.9.2}/llmstack/generators/__init__.py +0 -0
- {opencode_llmstack-0.9.0 → opencode_llmstack-0.9.2}/llmstack/generators/llama_swap.py +0 -0
- {opencode_llmstack-0.9.0 → opencode_llmstack-0.9.2}/llmstack/generators/opencode.py +0 -0
- {opencode_llmstack-0.9.0 → opencode_llmstack-0.9.2}/llmstack/paths.py +0 -0
- {opencode_llmstack-0.9.0 → opencode_llmstack-0.9.2}/llmstack/shell_env.py +0 -0
- {opencode_llmstack-0.9.0 → opencode_llmstack-0.9.2}/llmstack/tiers.py +0 -0
- {opencode_llmstack-0.9.0 → opencode_llmstack-0.9.2}/opencode_llmstack.egg-info/dependency_links.txt +0 -0
- {opencode_llmstack-0.9.0 → opencode_llmstack-0.9.2}/opencode_llmstack.egg-info/entry_points.txt +0 -0
- {opencode_llmstack-0.9.0 → opencode_llmstack-0.9.2}/opencode_llmstack.egg-info/requires.txt +0 -0
- {opencode_llmstack-0.9.0 → opencode_llmstack-0.9.2}/opencode_llmstack.egg-info/top_level.txt +0 -0
- {opencode_llmstack-0.9.0 → opencode_llmstack-0.9.2}/setup.cfg +0 -0
opencode_llmstack-0.9.2/CHANGELOG.md
@@ -0,0 +1,117 @@
# Changelog

All notable changes to `opencode-llmstack` are documented here.

---

## [0.9.2] — 2026-05-11

### Fixed
- `classify()` now counts only **user-role messages** when evaluating the
  multi-turn floor (`n_turns`). Previously `len(messages)` counted system
  prompts, assistant turns, and tool-result messages, causing the floor to
  fire after just a few real exchanges and permanently blocking `code-fast`
  routing for the rest of any session.
- Multi-turn floor threshold raised from **6 → 10** user turns. `code-fast`
  is now a hosted Bedrock model (Haiku 4.5) that tool-calls reliably, so
  the old 3B-model rationale no longer applies. Sessions with fewer than 10
  user turns will now correctly step down to `code-fast` past 32k tokens.
- Log label corrected: `(tools floor)` → `(user-turns=N>=10 floor)`.
- `__version__` corrected from `"0.1.0"` to `"0.9.2"`.
- CI release pipeline now runs lint (`ruff`) + `pytest` across Python
  3.11/3.12/3.13 before building the wheel. Previously the `test` job was
  documented in comments but never implemented.
- Added `LICENSE` (MIT) file to the repository root.
- README routing table updated: high-fidelity ceiling corrected (8k → 12k),
  tools-floor condition updated to reflect user-turn counting,
  `ROUTER_MULTI_TURN` default corrected (6 → 10).
- UPGRADING.md corrected: `llmstack install` does **not** regenerate
  `llama-swap.yaml` — that is `llmstack restart`'s job. Three places in the
  doc had this wrong.
- README layout tree: repo root label corrected (`opencode/` → `llmstack/`),
  `models.ini` moved to its correct location inside the package, `shell.py`
  (deleted) removed, `reload.py` and `LICENSE` added.
- `iter_downloads` reference in UPGRADING.md corrected to `iter_download_targets`.
- Bundled `llmstack/models.ini` header comment paths updated from legacy
  locations to current `.llmstack/` state-dir layout.
- `assert` statements in production code (`app.py`, `bedrock.py`) replaced
  with explicit `RuntimeError` / `TypeError` raises so `-O` optimisation
  does not silently swallow them.
- `UPGRADING.md` and `LICENSE` added to the sdist via `pyproject.toml`
  `package-data` / `tool.setuptools` config.
- `[tool.pytest.ini_options]` added to `pyproject.toml` with `testpaths`
  and `addopts`.
- Python 3.14 classifier added to `pyproject.toml`.
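The user-turn counting fix above amounts to filtering by role before comparing against the floor. A minimal sketch of the before/after behaviour, not the package's actual code (`count_user_turns` is a hypothetical name):

```python
def count_user_turns(messages: list[dict]) -> int:
    # Only user-role messages count toward the multi-turn floor;
    # system prompts, assistant turns, and tool results are excluded.
    return sum(1 for m in messages if m.get("role") == "user")


messages = [
    {"role": "system", "content": "You are a coding agent."},
    {"role": "user", "content": "Refactor this function."},
    {"role": "assistant", "content": "Done."},
    {"role": "tool", "content": "exit code 0"},
    {"role": "user", "content": "Now add tests."},
]

print(len(messages))               # old (buggy) count: 5
print(count_user_turns(messages))  # fixed count: 2
```

With the old `len(messages)` count, two real exchanges in an agentic loop could already look like six "turns" and trip the floor.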
### Added
- `classify()` end-to-end test coverage: step-down ladder (short/mid/long
  context), multi-turn floor, plan-signal routing, ultra-trigger routing,
  uncensored-trigger routing, plan ctx-size overflow fall-through.
- Generator tests: `build_config()` coverage for gguf tiers, bedrock tiers,
  `use_next`, `small_model` wiring, agent wiring, `auto` ctx derivation.
- `X-LLMStack-Tokens` response header on every `/v1/chat/completions` and
  `/v1/completions` response so opencode (and curl) can see the estimated
  token count the router used to make its routing decision.

---

## [0.9.1] — 2026-05-11

### Fixed
- `classify()` multi-turn floor: count only `role == "user"` messages
  (not all messages). This was the primary fix preventing `code-fast`
  from ever being reached in long sessions.
- Multi-turn threshold raised 6 → 10 (see 0.9.2 for full rationale).
- Log label `(tools floor)` corrected to `(user-turns=N>=10 floor)`.

---

## [0.9.0] — 2026-05-08

### Changed
- Plan tiers now strip `tools` from the request body before dispatch.
  Previously a plan-routed request carrying a `tools` array would fail
  on Bedrock (Converse rejects tool configs on non-agent models).
- Long-context fall-through to `code-fast` is now allowed even when
  `tools[]` is present in the request body. The tools-presence check
  was removed from the floor condition; only turn count matters now.
- `plan` tier ctx-size overflow: when estimated tokens exceed the
  planner's `ctx_size`, the request falls through to the coding ladder
  instead of being sent to a planner whose window can't hold it.
- `HIGH_FIDELITY_CEILING` raised to 12 000 (was 8 000).
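The 0.9.0 tools-stripping change above can be illustrated with a small sketch (the helper name is hypothetical; the real dispatch code lives in `app.py`):

```python
def strip_tools_for_plan(body: dict) -> dict:
    # Plan tiers are non-agentic: Bedrock Converse rejects tool configs
    # on non-agent models, so drop the `tools` array before dispatch.
    # Copy first so the caller's request body is left untouched.
    cleaned = dict(body)
    cleaned.pop("tools", None)
    return cleaned


body = {"model": "auto", "messages": [], "tools": [{"type": "function"}]}
print(strip_tools_for_plan(body))  # no "tools" key in the dispatched body
```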
---

## [0.8.0] — 2026-05-07

### Changed
- Fidelity-ceiling overhaul: each ceiling is now exactly half of the
  corresponding tier's `ctx_size` (the "comfortable headroom" invariant).
- `code-ultra.ctx_size` set to 24 000 (2× high ceiling of 12 000).
- `code-smart.ctx_size` set to 64 000 (2× mid ceiling of 32 000).
- `code-fast.ctx_size` set to 128 000 (YaRN ×4 from native 32k).
- `HIGH_FIDELITY_CEILING` env var added; overrides the 12 000 default.
- `MID_FIDELITY_CEILING` env var added; overrides the 32 000 default.

---

## [0.7.3] — 2026-05-06

### Added
- Per-tier Bedrock alternatives in `models.ini`: every tier now ships a
  commented-out Bedrock block directly beneath its GGUF block.
- All Bedrock tiers anchored to `eu-west-3`; `plan-uncensored` pinned to
  `us-west-2` (Llama 405B has no EU deployment).
- `aws_model_id_next` / `aws_region_next` support for Bedrock upgrade
  pre-staging (mirrors gguf `hf_file_next`).

### Fixed
- `models.ini` comment cleanup: removed stale references to old model names.

---

## [0.7.2] — earlier

### Fixed
- Soft-fail when `llama-server` binary is missing at startup.
- PowerShell activation hook: fixed `Invoke-Expression` quoting.
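The 0.8.0 "comfortable headroom" invariant (each routing ceiling is exactly half its tier's `ctx_size`) can be checked mechanically. A sketch using the values stated in the changelog:

```python
# (ceiling, ctx_size) pairs from the 0.8.0 entry above
PAIRS = {
    "high": (12_000, 24_000),  # HIGH_FIDELITY_CEILING vs code-ultra.ctx_size
    "mid": (32_000, 64_000),   # MID_FIDELITY_CEILING vs code-smart.ctx_size
}

for name, (ceiling, ctx_size) in PAIRS.items():
    # A tier should comfortably hold a request routed at its ceiling.
    assert ceiling * 2 == ctx_size, f"{name} ceiling violates headroom invariant"

print("headroom invariant holds")
```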
opencode_llmstack-0.9.2/LICENSE
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2024 llmstack contributors

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
{opencode_llmstack-0.9.0 → opencode_llmstack-0.9.2}/PKG-INFO
@@ -1,9 +1,30 @@
 Metadata-Version: 2.4
 Name: opencode-llmstack
-Version: 0.9.
+Version: 0.9.2
 Summary: Multi-tier local LLM stack: llama-swap + FastAPI auto-router + opencode wiring.
 Author: llmstack
-License: MIT
+License: MIT License
+
+Copyright (c) 2024 llmstack contributors
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
+
 Project-URL: Homepage, https://github.com/rohitgarg19/llmstack
 Project-URL: Issues, https://github.com/rohitgarg19/llmstack/issues
 Keywords: llm,llama-cpp,llama-swap,opencode,router,local-ai
@@ -17,9 +38,11 @@ Classifier: Programming Language :: Python :: 3
 Classifier: Programming Language :: Python :: 3.11
 Classifier: Programming Language :: Python :: 3.12
 Classifier: Programming Language :: Python :: 3.13
+Classifier: Programming Language :: Python :: 3.14
 Classifier: Topic :: Software Development
 Requires-Python: >=3.11
 Description-Content-Type: text/markdown
+License-File: LICENSE
 Requires-Dist: fastapi<1.0,>=0.110
 Requires-Dist: httpx<1.0,>=0.27
 Requires-Dist: uvicorn[standard]<1.0,>=0.30
@@ -32,6 +55,7 @@ Requires-Dist: pytest>=7; extra == "dev"
 Provides-Extra: bedrock
 Requires-Dist: boto3>=1.35; extra == "bedrock"
 Requires-Dist: botocore>=1.35; extra == "bedrock"
+Dynamic: license-file
 
 # llmstack — multi-tier local LLM stack for Mac M4 Max / 64 GB
 
@@ -118,9 +142,9 @@ First match wins:
 | 1 | last user msg contains `[nofilter]`, `[uncensored]`, `[heretic]`, or starts with `uncensored:` / `nofilter:` | `plan-uncensored` | explicit opt-in |
 | 2 | `[ultra]` / `[opus]` / `ultra:` trigger AND `code-ultra` tier configured | `code-ultra` | explicit top-tier opt-in |
 | 3 | plan verbs (*design, architect, approach, trade-off, should we, explain why, …*) AND no code blocks / agent verbs / tools | `plan` | pure design discussion (orthogonal track) |
-| 4 | estimated input ≤
+| 4 | estimated input ≤ 12 000 tokens | `code-ultra` *(or `code-smart` if ultra unwired)* | top tier — context still being built, latency/$ are best here |
 | 5 | estimated input ≤ 32 000 tokens | `code-smart` | mid-context, local heavy coder is at its sweet spot |
-| 6 | otherwise (long context) AND
+| 6 | otherwise (long context) AND ≥ 10 user turns | `code-smart` | floor: deep agentic loop, keep the heavy model |
 | 7 | otherwise (long context) | `code-fast` | 128k YaRN window + always-resident + free |
 
 Token estimates are `chars / 4` over all message text + `prompt`. The
@@ -185,12 +209,12 @@ from any directory you previously ran `install` in.
 ## Layout
 
 ```
-
+llmstack/ # repo root
 ├── pyproject.toml # package metadata + `llmstack` console script
 ├── README.md # this file
 ├── UPGRADING.md # how to swap any tier for a newer/better model
 │ + how to upgrade the Python toolchain itself
-├──
+├── LICENSE # MIT
 └── llmstack/ # the python package (importable, installable)
 ├── __init__.py
 ├── __main__.py # `python -m llmstack`
@@ -199,6 +223,7 @@ opencode/ # repo root
 ├── shell_env.py # spawn the env-prepared subshell + activate hooks
 ├── app.py # FastAPI auto-router (~280 lines)
 ├── tiers.py # parse models.ini -> Tier dataclasses
+├── models.ini # SINGLE SOURCE OF TRUTH for tiers + sampler (bundled template)
 ├── check_models.py # snapshot tool (HF metadata + drift check)
 ├── AGENTS.md # opencode agent template (shipped as package data)
 ├── generators/
@@ -213,9 +238,9 @@ opencode/ # repo root
 ├── install_llama_swap.py
 ├── download.py
 ├── start.py
-├── shell.py
 ├── stop.py
 ├── restart.py
+├── reload.py
 ├── status.py
 ├── check.py
 └── activate.py
@@ -544,7 +569,7 @@ All knobs are env vars; defaults are picked up by `llmstack start`.
 | `ROUTER_UNCENSORED_MODEL` | `plan-uncensored` | `[nofilter]` triggers → here |
 | `ROUTER_HIGH_FIDELITY_CEILING` | `12000` | tokens; at or below this, route to top tier (ultra → smart fallback). Paired with `code-ultra.ctx_size = 24000` (2x). |
 | `ROUTER_MID_FIDELITY_CEILING` | `32000` | tokens; at or below this, route to `code-smart`; beyond, step down to `code-fast`. Paired with `code-smart.ctx_size = 64000` (2x). |
-| `ROUTER_MULTI_TURN` | `
+| `ROUTER_MULTI_TURN` | `10` | user-turn count that floors the long-context rung at `code-smart` |
 | `ROUTER_HOST` / `ROUTER_PORT` | `127.0.0.1` / `10101` | listen address |
 | `LOG_LEVEL` | `info` | router log level |
 
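The token-ladder rows of the routing table above, together with the `chars / 4` estimate the README describes, reduce to a short step-down function. A simplified sketch that covers only rungs 4 through 7 (names and signatures hypothetical, not the package's `classify()`):

```python
def estimate_tokens(messages: list[dict], prompt: str = "") -> int:
    # README: token estimate is chars / 4 over all message text + prompt
    chars = sum(len(m.get("content") or "") for m in messages) + len(prompt)
    return chars // 4


def route(messages: list[dict], high: int = 12_000, mid: int = 32_000,
          multi_turn: int = 10) -> str:
    tokens = estimate_tokens(messages)
    user_turns = sum(1 for m in messages if m.get("role") == "user")
    if tokens <= high:
        return "code-ultra"   # rung 4 (code-smart if ultra is unwired)
    if tokens <= mid:
        return "code-smart"   # rung 5: mid-context sweet spot
    if user_turns >= multi_turn:
        return "code-smart"   # rung 6: multi-turn floor keeps the heavy model
    return "code-fast"        # rung 7: long context, 128k YaRN window


short = [{"role": "user", "content": "x" * 4_000}]  # ~1 000 estimated tokens
print(route(short))  # code-ultra
```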
{opencode_llmstack-0.9.0 → opencode_llmstack-0.9.2}/README.md
@@ -83,9 +83,9 @@ First match wins:
 | 1 | last user msg contains `[nofilter]`, `[uncensored]`, `[heretic]`, or starts with `uncensored:` / `nofilter:` | `plan-uncensored` | explicit opt-in |
 | 2 | `[ultra]` / `[opus]` / `ultra:` trigger AND `code-ultra` tier configured | `code-ultra` | explicit top-tier opt-in |
 | 3 | plan verbs (*design, architect, approach, trade-off, should we, explain why, …*) AND no code blocks / agent verbs / tools | `plan` | pure design discussion (orthogonal track) |
-| 4 | estimated input ≤
+| 4 | estimated input ≤ 12 000 tokens | `code-ultra` *(or `code-smart` if ultra unwired)* | top tier — context still being built, latency/$ are best here |
 | 5 | estimated input ≤ 32 000 tokens | `code-smart` | mid-context, local heavy coder is at its sweet spot |
-| 6 | otherwise (long context) AND
+| 6 | otherwise (long context) AND ≥ 10 user turns | `code-smart` | floor: deep agentic loop, keep the heavy model |
 | 7 | otherwise (long context) | `code-fast` | 128k YaRN window + always-resident + free |
 
 Token estimates are `chars / 4` over all message text + `prompt`. The
@@ -150,12 +150,12 @@ from any directory you previously ran `install` in.
 ## Layout
 
 ```
-
+llmstack/ # repo root
 ├── pyproject.toml # package metadata + `llmstack` console script
 ├── README.md # this file
 ├── UPGRADING.md # how to swap any tier for a newer/better model
 │ + how to upgrade the Python toolchain itself
-├──
+├── LICENSE # MIT
 └── llmstack/ # the python package (importable, installable)
 ├── __init__.py
 ├── __main__.py # `python -m llmstack`
@@ -164,6 +164,7 @@ opencode/ # repo root
 ├── shell_env.py # spawn the env-prepared subshell + activate hooks
 ├── app.py # FastAPI auto-router (~280 lines)
 ├── tiers.py # parse models.ini -> Tier dataclasses
+├── models.ini # SINGLE SOURCE OF TRUTH for tiers + sampler (bundled template)
 ├── check_models.py # snapshot tool (HF metadata + drift check)
 ├── AGENTS.md # opencode agent template (shipped as package data)
 ├── generators/
@@ -178,9 +179,9 @@ opencode/ # repo root
 ├── install_llama_swap.py
 ├── download.py
 ├── start.py
-├── shell.py
 ├── stop.py
 ├── restart.py
+├── reload.py
 ├── status.py
 ├── check.py
 └── activate.py
@@ -509,7 +510,7 @@ All knobs are env vars; defaults are picked up by `llmstack start`.
 | `ROUTER_UNCENSORED_MODEL` | `plan-uncensored` | `[nofilter]` triggers → here |
 | `ROUTER_HIGH_FIDELITY_CEILING` | `12000` | tokens; at or below this, route to top tier (ultra → smart fallback). Paired with `code-ultra.ctx_size = 24000` (2x). |
 | `ROUTER_MID_FIDELITY_CEILING` | `32000` | tokens; at or below this, route to `code-smart`; beyond, step down to `code-fast`. Paired with `code-smart.ctx_size = 64000` (2x). |
-| `ROUTER_MULTI_TURN` | `
+| `ROUTER_MULTI_TURN` | `10` | user-turn count that floors the long-context rung at `code-smart` |
 | `ROUTER_HOST` / `ROUTER_PORT` | `127.0.0.1` / `10101` | listen address |
 | `LOG_LEVEL` | `info` | router log level |
 
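The knobs in the table above are plain environment variables with defaults. A sketch of how a router might read them at startup (the env-var names come from the table; the parsing helper itself is hypothetical):

```python
import os


def env_int(name: str, default: int) -> int:
    # All router knobs are env vars; fall back to the documented default.
    return int(os.environ.get(name, default))


ROUTER_MULTI_TURN = env_int("ROUTER_MULTI_TURN", 10)
HIGH_CEILING = env_int("ROUTER_HIGH_FIDELITY_CEILING", 12_000)
MID_CEILING = env_int("ROUTER_MID_FIDELITY_CEILING", 32_000)
ROUTER_PORT = env_int("ROUTER_PORT", 10_101)

print(ROUTER_MULTI_TURN, HIGH_CEILING, MID_CEILING, ROUTER_PORT)
```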