opencode-llmstack 0.9.0__tar.gz → 0.9.2__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- opencode_llmstack-0.9.2/CHANGELOG.md +117 -0
- opencode_llmstack-0.9.2/LICENSE +21 -0
- {opencode_llmstack-0.9.0 → opencode_llmstack-0.9.2}/PKG-INFO +33 -8
- {opencode_llmstack-0.9.0 → opencode_llmstack-0.9.2}/README.md +7 -6
- opencode_llmstack-0.9.2/UPGRADING.md +602 -0
- {opencode_llmstack-0.9.0 → opencode_llmstack-0.9.2}/llmstack/__init__.py +1 -1
- {opencode_llmstack-0.9.0 → opencode_llmstack-0.9.2}/llmstack/app.py +19 -7
- {opencode_llmstack-0.9.0 → opencode_llmstack-0.9.2}/llmstack/backends/bedrock.py +4 -2
- {opencode_llmstack-0.9.0 → opencode_llmstack-0.9.2}/llmstack/commands/start.py +0 -1
- {opencode_llmstack-0.9.0 → opencode_llmstack-0.9.2}/llmstack/models.ini +3 -3
- {opencode_llmstack-0.9.0 → opencode_llmstack-0.9.2}/opencode_llmstack.egg-info/PKG-INFO +33 -8
- {opencode_llmstack-0.9.0 → opencode_llmstack-0.9.2}/opencode_llmstack.egg-info/SOURCES.txt +3 -0
- {opencode_llmstack-0.9.0 → opencode_llmstack-0.9.2}/pyproject.toml +10 -2
- {opencode_llmstack-0.9.0 → opencode_llmstack-0.9.2}/llmstack/AGENTS.md +0 -0
- {opencode_llmstack-0.9.0 → opencode_llmstack-0.9.2}/llmstack/__main__.py +0 -0
- {opencode_llmstack-0.9.0 → opencode_llmstack-0.9.2}/llmstack/_platform.py +0 -0
- {opencode_llmstack-0.9.0 → opencode_llmstack-0.9.2}/llmstack/backends/__init__.py +0 -0
- {opencode_llmstack-0.9.0 → opencode_llmstack-0.9.2}/llmstack/check_models.py +0 -0
- {opencode_llmstack-0.9.0 → opencode_llmstack-0.9.2}/llmstack/cli.py +0 -0
- {opencode_llmstack-0.9.0 → opencode_llmstack-0.9.2}/llmstack/commands/__init__.py +0 -0
- {opencode_llmstack-0.9.0 → opencode_llmstack-0.9.2}/llmstack/commands/_helpers.py +0 -0
- {opencode_llmstack-0.9.0 → opencode_llmstack-0.9.2}/llmstack/commands/activate.py +0 -0
- {opencode_llmstack-0.9.0 → opencode_llmstack-0.9.2}/llmstack/commands/check.py +0 -0
- {opencode_llmstack-0.9.0 → opencode_llmstack-0.9.2}/llmstack/commands/download.py +0 -0
- {opencode_llmstack-0.9.0 → opencode_llmstack-0.9.2}/llmstack/commands/install.py +0 -0
- {opencode_llmstack-0.9.0 → opencode_llmstack-0.9.2}/llmstack/commands/install_llama_swap.py +0 -0
- {opencode_llmstack-0.9.0 → opencode_llmstack-0.9.2}/llmstack/commands/reload.py +0 -0
- {opencode_llmstack-0.9.0 → opencode_llmstack-0.9.2}/llmstack/commands/restart.py +0 -0
- {opencode_llmstack-0.9.0 → opencode_llmstack-0.9.2}/llmstack/commands/setup.py +0 -0
- {opencode_llmstack-0.9.0 → opencode_llmstack-0.9.2}/llmstack/commands/status.py +0 -0
- {opencode_llmstack-0.9.0 → opencode_llmstack-0.9.2}/llmstack/commands/stop.py +0 -0
- {opencode_llmstack-0.9.0 → opencode_llmstack-0.9.2}/llmstack/download/__init__.py +0 -0
- {opencode_llmstack-0.9.0 → opencode_llmstack-0.9.2}/llmstack/download/binary.py +0 -0
- {opencode_llmstack-0.9.0 → opencode_llmstack-0.9.2}/llmstack/download/ggufs.py +0 -0
- {opencode_llmstack-0.9.0 → opencode_llmstack-0.9.2}/llmstack/generators/__init__.py +0 -0
- {opencode_llmstack-0.9.0 → opencode_llmstack-0.9.2}/llmstack/generators/llama_swap.py +0 -0
- {opencode_llmstack-0.9.0 → opencode_llmstack-0.9.2}/llmstack/generators/opencode.py +0 -0
- {opencode_llmstack-0.9.0 → opencode_llmstack-0.9.2}/llmstack/paths.py +0 -0
- {opencode_llmstack-0.9.0 → opencode_llmstack-0.9.2}/llmstack/shell_env.py +0 -0
- {opencode_llmstack-0.9.0 → opencode_llmstack-0.9.2}/llmstack/tiers.py +0 -0
- {opencode_llmstack-0.9.0 → opencode_llmstack-0.9.2}/opencode_llmstack.egg-info/dependency_links.txt +0 -0
- {opencode_llmstack-0.9.0 → opencode_llmstack-0.9.2}/opencode_llmstack.egg-info/entry_points.txt +0 -0
- {opencode_llmstack-0.9.0 → opencode_llmstack-0.9.2}/opencode_llmstack.egg-info/requires.txt +0 -0
- {opencode_llmstack-0.9.0 → opencode_llmstack-0.9.2}/opencode_llmstack.egg-info/top_level.txt +0 -0
- {opencode_llmstack-0.9.0 → opencode_llmstack-0.9.2}/setup.cfg +0 -0
opencode_llmstack-0.9.2/CHANGELOG.md
@@ -0,0 +1,117 @@
# Changelog

All notable changes to `opencode-llmstack` are documented here.

---

## [0.9.2] — 2026-05-11

### Fixed
- `classify()` now counts only **user-role messages** when evaluating the
  multi-turn floor (`n_turns`). Previously `len(messages)` counted system
  prompts, assistant turns, and tool-result messages, causing the floor to
  fire after just a few real exchanges and permanently blocking `code-fast`
  routing for the rest of any session.
- Multi-turn floor threshold raised from **6 → 10** user turns. `code-fast`
  is now a hosted Bedrock model (Haiku 4.5) that tool-calls reliably, so
  the old 3B-model rationale no longer applies. Sessions with fewer than 10
  user turns will now correctly step down to `code-fast` past 32k tokens.
- Log label corrected: `(tools floor)` → `(user-turns=N>=10 floor)`.
- `__version__` corrected from `"0.1.0"` to `"0.9.2"`.
- CI release pipeline now runs lint (`ruff`) + `pytest` across Python
  3.11/3.12/3.13 before building the wheel. Previously the `test` job was
  documented in comments but never implemented.
- Added `LICENSE` (MIT) file to the repository root.
- README routing table updated: high-fidelity ceiling corrected (8k → 12k),
  tools-floor condition updated to reflect user-turn counting,
  `ROUTER_MULTI_TURN` default corrected (6 → 10).
- UPGRADING.md corrected: `llmstack install` does **not** regenerate
  `llama-swap.yaml` — that is `llmstack restart`'s job. Three places in the
  doc had this wrong.
- README layout tree: repo root label corrected (`opencode/` → `llmstack/`),
  `models.ini` moved to its correct location inside the package, `shell.py`
  (deleted) removed, `reload.py` and `LICENSE` added.
- `iter_downloads` reference in UPGRADING.md corrected to `iter_download_targets`.
- Bundled `llmstack/models.ini` header comment paths updated from legacy
  locations to current `.llmstack/` state-dir layout.
- `assert` statements in production code (`app.py`, `bedrock.py`) replaced
  with explicit `RuntimeError` / `TypeError` raises so `-O` optimisation
  does not silently swallow them.
- `UPGRADING.md` and `LICENSE` added to the sdist via `pyproject.toml`
  `package-data` / `tool.setuptools` config.
- `[tool.pytest.ini_options]` added to `pyproject.toml` with `testpaths`
  and `addopts`.
- Python 3.14 classifier added to `pyproject.toml`.
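The user-turn counting fix above amounts to filtering by role before comparing against the floor. A minimal sketch of the before/after behaviour, not the package's actual code (`count_user_turns` is a hypothetical name):

```python
def count_user_turns(messages: list[dict]) -> int:
    # Only user-role messages count toward the multi-turn floor;
    # system prompts, assistant turns, and tool results are excluded.
    return sum(1 for m in messages if m.get("role") == "user")


messages = [
    {"role": "system", "content": "You are a coding agent."},
    {"role": "user", "content": "Refactor this function."},
    {"role": "assistant", "content": "Done."},
    {"role": "tool", "content": "exit code 0"},
    {"role": "user", "content": "Now add tests."},
]

print(len(messages))               # old (buggy) count: 5
print(count_user_turns(messages))  # fixed count: 2
```

With the old `len(messages)` count, two real exchanges in an agentic loop could already look like six "turns" and trip the floor.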
### Added
- `classify()` end-to-end test coverage: step-down ladder (short/mid/long
  context), multi-turn floor, plan-signal routing, ultra-trigger routing,
  uncensored-trigger routing, plan ctx-size overflow fall-through.
- Generator tests: `build_config()` coverage for gguf tiers, bedrock tiers,
  `use_next`, `small_model` wiring, agent wiring, `auto` ctx derivation.
- `X-LLMStack-Tokens` response header on every `/v1/chat/completions` and
  `/v1/completions` response so opencode (and curl) can see the estimated
  token count the router used to make its routing decision.

---

## [0.9.1] — 2026-05-11

### Fixed
- `classify()` multi-turn floor: count only `role == "user"` messages
  (not all messages). This was the primary fix preventing `code-fast`
  from ever being reached in long sessions.
- Multi-turn threshold raised 6 → 10 (see 0.9.2 for full rationale).
- Log label `(tools floor)` corrected to `(user-turns=N>=10 floor)`.

---

## [0.9.0] — 2026-05-08

### Changed
- Plan tiers now strip `tools` from the request body before dispatch.
  Previously a plan-routed request carrying a `tools` array would fail
  on Bedrock (Converse rejects tool configs on non-agent models).
- Long-context fall-through to `code-fast` is now allowed even when
  `tools[]` is present in the request body. The tools-presence check
  was removed from the floor condition; only turn count matters now.
- `plan` tier ctx-size overflow: when estimated tokens exceed the
  planner's `ctx_size`, the request falls through to the coding ladder
  instead of being sent to a planner whose window can't hold it.
- `HIGH_FIDELITY_CEILING` raised to 12 000 (was 8 000).
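The 0.9.0 tools-stripping change above can be illustrated with a small sketch (the helper name is hypothetical; the real dispatch code lives in `app.py`):

```python
def strip_tools_for_plan(body: dict) -> dict:
    # Plan tiers are non-agentic: Bedrock Converse rejects tool configs
    # on non-agent models, so drop the `tools` array before dispatch.
    # Copy first so the caller's request body is left untouched.
    cleaned = dict(body)
    cleaned.pop("tools", None)
    return cleaned


body = {"model": "auto", "messages": [], "tools": [{"type": "function"}]}
print(strip_tools_for_plan(body))  # no "tools" key in the dispatched body
```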
---

## [0.8.0] — 2026-05-07

### Changed
- Fidelity-ceiling overhaul: each ceiling is now exactly half of the
  corresponding tier's `ctx_size` (the "comfortable headroom" invariant).
- `code-ultra.ctx_size` set to 24 000 (2× high ceiling of 12 000).
- `code-smart.ctx_size` set to 64 000 (2× mid ceiling of 32 000).
- `code-fast.ctx_size` set to 128 000 (YaRN ×4 from native 32k).
- `HIGH_FIDELITY_CEILING` env var added; overrides the 12 000 default.
- `MID_FIDELITY_CEILING` env var added; overrides the 32 000 default.

---

## [0.7.3] — 2026-05-06

### Added
- Per-tier Bedrock alternatives in `models.ini`: every tier now ships a
  commented-out Bedrock block directly beneath its GGUF block.
- All Bedrock tiers anchored to `eu-west-3`; `plan-uncensored` pinned to
  `us-west-2` (Llama 405B has no EU deployment).
- `aws_model_id_next` / `aws_region_next` support for Bedrock upgrade
  pre-staging (mirrors gguf `hf_file_next`).

### Fixed
- `models.ini` comment cleanup: removed stale references to old model names.

---

## [0.7.2] — earlier

### Fixed
- Soft-fail when `llama-server` binary is missing at startup.
- PowerShell activation hook: fixed `Invoke-Expression` quoting.
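The 0.8.0 "comfortable headroom" invariant (each routing ceiling is exactly half its tier's `ctx_size`) can be checked mechanically. A sketch using the values stated in the changelog:

```python
# (ceiling, ctx_size) pairs from the 0.8.0 entry above
PAIRS = {
    "high": (12_000, 24_000),  # HIGH_FIDELITY_CEILING vs code-ultra.ctx_size
    "mid": (32_000, 64_000),   # MID_FIDELITY_CEILING vs code-smart.ctx_size
}

for name, (ceiling, ctx_size) in PAIRS.items():
    # A tier should comfortably hold a request routed at its ceiling.
    assert ceiling * 2 == ctx_size, f"{name} ceiling violates headroom invariant"

print("headroom invariant holds")
```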
opencode_llmstack-0.9.2/LICENSE
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2024 llmstack contributors

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
{opencode_llmstack-0.9.0 → opencode_llmstack-0.9.2}/PKG-INFO
@@ -1,9 +1,30 @@
 Metadata-Version: 2.4
 Name: opencode-llmstack
-Version: 0.9.
+Version: 0.9.2
 Summary: Multi-tier local LLM stack: llama-swap + FastAPI auto-router + opencode wiring.
 Author: llmstack
-License: MIT
+License: MIT License
+
+Copyright (c) 2024 llmstack contributors
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
+
 Project-URL: Homepage, https://github.com/rohitgarg19/llmstack
 Project-URL: Issues, https://github.com/rohitgarg19/llmstack/issues
 Keywords: llm,llama-cpp,llama-swap,opencode,router,local-ai
@@ -17,9 +38,11 @@ Classifier: Programming Language :: Python :: 3
 Classifier: Programming Language :: Python :: 3.11
 Classifier: Programming Language :: Python :: 3.12
 Classifier: Programming Language :: Python :: 3.13
+Classifier: Programming Language :: Python :: 3.14
 Classifier: Topic :: Software Development
 Requires-Python: >=3.11
 Description-Content-Type: text/markdown
+License-File: LICENSE
 Requires-Dist: fastapi<1.0,>=0.110
 Requires-Dist: httpx<1.0,>=0.27
 Requires-Dist: uvicorn[standard]<1.0,>=0.30
@@ -32,6 +55,7 @@ Requires-Dist: pytest>=7; extra == "dev"
 Provides-Extra: bedrock
 Requires-Dist: boto3>=1.35; extra == "bedrock"
 Requires-Dist: botocore>=1.35; extra == "bedrock"
+Dynamic: license-file
 
 # llmstack — multi-tier local LLM stack for Mac M4 Max / 64 GB
 
@@ -118,9 +142,9 @@ First match wins:
 | 1 | last user msg contains `[nofilter]`, `[uncensored]`, `[heretic]`, or starts with `uncensored:` / `nofilter:` | `plan-uncensored` | explicit opt-in |
 | 2 | `[ultra]` / `[opus]` / `ultra:` trigger AND `code-ultra` tier configured | `code-ultra` | explicit top-tier opt-in |
 | 3 | plan verbs (*design, architect, approach, trade-off, should we, explain why, …*) AND no code blocks / agent verbs / tools | `plan` | pure design discussion (orthogonal track) |
-| 4 | estimated input ≤
+| 4 | estimated input ≤ 12 000 tokens | `code-ultra` *(or `code-smart` if ultra unwired)* | top tier — context still being built, latency/$ are best here |
 | 5 | estimated input ≤ 32 000 tokens | `code-smart` | mid-context, local heavy coder is at its sweet spot |
-| 6 | otherwise (long context) AND
+| 6 | otherwise (long context) AND ≥ 10 user turns | `code-smart` | floor: deep agentic loop, keep the heavy model |
 | 7 | otherwise (long context) | `code-fast` | 128k YaRN window + always-resident + free |
 
 Token estimates are `chars / 4` over all message text + `prompt`. The
@@ -185,12 +209,12 @@ from any directory you previously ran `install` in.
 ## Layout
 
 ```
-
+llmstack/ # repo root
 ├── pyproject.toml # package metadata + `llmstack` console script
 ├── README.md # this file
 ├── UPGRADING.md # how to swap any tier for a newer/better model
 │ + how to upgrade the Python toolchain itself
-├──
+├── LICENSE # MIT
 └── llmstack/ # the python package (importable, installable)
 ├── __init__.py
 ├── __main__.py # `python -m llmstack`
@@ -199,6 +223,7 @@ opencode/ # repo root
 ├── shell_env.py # spawn the env-prepared subshell + activate hooks
 ├── app.py # FastAPI auto-router (~280 lines)
 ├── tiers.py # parse models.ini -> Tier dataclasses
+├── models.ini # SINGLE SOURCE OF TRUTH for tiers + sampler (bundled template)
 ├── check_models.py # snapshot tool (HF metadata + drift check)
 ├── AGENTS.md # opencode agent template (shipped as package data)
 ├── generators/
@@ -213,9 +238,9 @@ opencode/ # repo root
 ├── install_llama_swap.py
 ├── download.py
 ├── start.py
-├── shell.py
 ├── stop.py
 ├── restart.py
+├── reload.py
 ├── status.py
 ├── check.py
 └── activate.py
@@ -544,7 +569,7 @@ All knobs are env vars; defaults are picked up by `llmstack start`.
 | `ROUTER_UNCENSORED_MODEL` | `plan-uncensored` | `[nofilter]` triggers → here |
 | `ROUTER_HIGH_FIDELITY_CEILING` | `12000` | tokens; at or below this, route to top tier (ultra → smart fallback). Paired with `code-ultra.ctx_size = 24000` (2x). |
 | `ROUTER_MID_FIDELITY_CEILING` | `32000` | tokens; at or below this, route to `code-smart`; beyond, step down to `code-fast`. Paired with `code-smart.ctx_size = 64000` (2x). |
-| `ROUTER_MULTI_TURN` | `
+| `ROUTER_MULTI_TURN` | `10` | user-turn count that floors the long-context rung at `code-smart` |
 | `ROUTER_HOST` / `ROUTER_PORT` | `127.0.0.1` / `10101` | listen address |
 | `LOG_LEVEL` | `info` | router log level |
 
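The token-ladder rows of the routing table above, together with the `chars / 4` estimate the README describes, reduce to a short step-down function. A simplified sketch that covers only rungs 4 through 7 (names and signatures hypothetical, not the package's `classify()`):

```python
def estimate_tokens(messages: list[dict], prompt: str = "") -> int:
    # README: token estimate is chars / 4 over all message text + prompt
    chars = sum(len(m.get("content") or "") for m in messages) + len(prompt)
    return chars // 4


def route(messages: list[dict], high: int = 12_000, mid: int = 32_000,
          multi_turn: int = 10) -> str:
    tokens = estimate_tokens(messages)
    user_turns = sum(1 for m in messages if m.get("role") == "user")
    if tokens <= high:
        return "code-ultra"   # rung 4 (code-smart if ultra is unwired)
    if tokens <= mid:
        return "code-smart"   # rung 5: mid-context sweet spot
    if user_turns >= multi_turn:
        return "code-smart"   # rung 6: multi-turn floor keeps the heavy model
    return "code-fast"        # rung 7: long context, 128k YaRN window


short = [{"role": "user", "content": "x" * 4_000}]  # ~1 000 estimated tokens
print(route(short))  # code-ultra
```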
{opencode_llmstack-0.9.0 → opencode_llmstack-0.9.2}/README.md
@@ -83,9 +83,9 @@ First match wins:
 | 1 | last user msg contains `[nofilter]`, `[uncensored]`, `[heretic]`, or starts with `uncensored:` / `nofilter:` | `plan-uncensored` | explicit opt-in |
 | 2 | `[ultra]` / `[opus]` / `ultra:` trigger AND `code-ultra` tier configured | `code-ultra` | explicit top-tier opt-in |
 | 3 | plan verbs (*design, architect, approach, trade-off, should we, explain why, …*) AND no code blocks / agent verbs / tools | `plan` | pure design discussion (orthogonal track) |
-| 4 | estimated input ≤
+| 4 | estimated input ≤ 12 000 tokens | `code-ultra` *(or `code-smart` if ultra unwired)* | top tier — context still being built, latency/$ are best here |
 | 5 | estimated input ≤ 32 000 tokens | `code-smart` | mid-context, local heavy coder is at its sweet spot |
-| 6 | otherwise (long context) AND
+| 6 | otherwise (long context) AND ≥ 10 user turns | `code-smart` | floor: deep agentic loop, keep the heavy model |
 | 7 | otherwise (long context) | `code-fast` | 128k YaRN window + always-resident + free |
 
 Token estimates are `chars / 4` over all message text + `prompt`. The
@@ -150,12 +150,12 @@ from any directory you previously ran `install` in.
 ## Layout
 
 ```
-
+llmstack/ # repo root
 ├── pyproject.toml # package metadata + `llmstack` console script
 ├── README.md # this file
 ├── UPGRADING.md # how to swap any tier for a newer/better model
 │ + how to upgrade the Python toolchain itself
-├──
+├── LICENSE # MIT
 └── llmstack/ # the python package (importable, installable)
 ├── __init__.py
 ├── __main__.py # `python -m llmstack`
@@ -164,6 +164,7 @@ opencode/ # repo root
 ├── shell_env.py # spawn the env-prepared subshell + activate hooks
 ├── app.py # FastAPI auto-router (~280 lines)
 ├── tiers.py # parse models.ini -> Tier dataclasses
+├── models.ini # SINGLE SOURCE OF TRUTH for tiers + sampler (bundled template)
 ├── check_models.py # snapshot tool (HF metadata + drift check)
 ├── AGENTS.md # opencode agent template (shipped as package data)
 ├── generators/
@@ -178,9 +179,9 @@ opencode/ # repo root
 ├── install_llama_swap.py
 ├── download.py
 ├── start.py
-├── shell.py
 ├── stop.py
 ├── restart.py
+├── reload.py
 ├── status.py
 ├── check.py
 └── activate.py
@@ -509,7 +510,7 @@ All knobs are env vars; defaults are picked up by `llmstack start`.
 | `ROUTER_UNCENSORED_MODEL` | `plan-uncensored` | `[nofilter]` triggers → here |
 | `ROUTER_HIGH_FIDELITY_CEILING` | `12000` | tokens; at or below this, route to top tier (ultra → smart fallback). Paired with `code-ultra.ctx_size = 24000` (2x). |
 | `ROUTER_MID_FIDELITY_CEILING` | `32000` | tokens; at or below this, route to `code-smart`; beyond, step down to `code-fast`. Paired with `code-smart.ctx_size = 64000` (2x). |
-| `ROUTER_MULTI_TURN` | `
+| `ROUTER_MULTI_TURN` | `10` | user-turn count that floors the long-context rung at `code-smart` |
 | `ROUTER_HOST` / `ROUTER_PORT` | `127.0.0.1` / `10101` | listen address |
 | `LOG_LEVEL` | `info` | router log level |
 
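The knobs in the table above are plain environment variables with defaults. A sketch of how a router might read them at startup (the env-var names come from the table; the parsing helper itself is hypothetical):

```python
import os


def env_int(name: str, default: int) -> int:
    # All router knobs are env vars; fall back to the documented default.
    return int(os.environ.get(name, default))


ROUTER_MULTI_TURN = env_int("ROUTER_MULTI_TURN", 10)
HIGH_CEILING = env_int("ROUTER_HIGH_FIDELITY_CEILING", 12_000)
MID_CEILING = env_int("ROUTER_MID_FIDELITY_CEILING", 32_000)
ROUTER_PORT = env_int("ROUTER_PORT", 10_101)

print(ROUTER_MULTI_TURN, HIGH_CEILING, MID_CEILING, ROUTER_PORT)
```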