PyPI - kon-coding-agent - Versions diffs - 0.2.0__tar.gz → 0.2.2__tar.gz - Mend

kon-coding-agent 0.2.0tar.gz → 0.2.2tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (104) hide show

kon_coding_agent-0.2.2/.kon/skills/kon-release-publish/SKILL.md ADDED Viewed

@@ -0,0 +1,85 @@
+---
+name: kon-release-publish
+description: Tag, publish to PyPI, and create GitHub release for Kon with validation and rollback-safe steps
+---
+# Kon Release + PyPI Publish
+Use this skill when the user asks to cut a new Kon version, tag it, publish to PyPI, and/or create a GitHub release.
+## Inputs to confirm
+- Target version (example: `0.2.1`)
+- Base range for notes (usually previous tag, example: `v0.2.0..HEAD`)
+- Whether to push `main`
+- Whether to publish to PyPI now
+- Whether to create GitHub release now
+## Files to bump
+- `pyproject.toml` → `[project].version`
+- `src/kon/ui/app.py` → fallback `VERSION = "..."`
+- `uv.lock` → local package version block
+## Release workflow
+1. **Preflight**
+   - `git status --short --branch` must be clean (or confirm with user)
+   - `git tag --list` and `git log --oneline <prev_tag>..HEAD` to summarize changes
+2. **Version bump**
+   - Update version in all 3 files above
+3. **Quality gates**
+   - `uv run ruff format .`
+   - `uv run ruff check .`
+   - `uv run pyright .`
+   - `uv run pytest`
+4. **Commit**
+   - Commit message: `build: bump version to <version>`
+5. **Tag**
+   - Annotated tag: `git tag -a v<version> -m "v<version> ..."`
+   - Include concise “changes since previous tag” bullets
+6. **Push**
+   - `git push origin main`
+   - `git push origin v<version>`
+7. **Build + verify artifacts**
+   - `rm -rf dist && uv build`
+   - `uv run python -m twine check dist/*`
+8. **Publish to PyPI**
+   - Prefer token file if present (example `~/.pypi-token`):
+   - `TWINE_USERNAME=__token__ TWINE_PASSWORD="$(< ~/.pypi-token)" uv run python -m twine upload dist/*`
+   - Verify:
+     - `https://pypi.org/project/kon-coding-agent/<version>/`
+     - `https://pypi.org/pypi/kon-coding-agent/json` reports latest version
+9. **Create GitHub release**
+   - If token exists at `~/.github-token`, call Releases API:
+   - `POST /repos/<owner>/<repo>/releases` with:
+     - `tag_name: v<version>`
+     - `target_commitish: main`
+     - `name: v<version>`
+     - `generate_release_notes: true`
+   - If 403 occurs, report missing token scopes/permissions (`contents:write` required)
+## Important notes
+- **Tagging and GitHub release are separate**:
+  - Tag = git ref in repository
+  - Release = GitHub object attached to a tag (notes/assets)
+- You can do either independently, but most projects do both together for user-facing releases.
+- If PyPI publish succeeds but GitHub release fails, do **not** retag/re-publish. Just fix auth and create the release for the existing tag.
+## Output checklist to report
+- Version bumped in all files
+- Checks passed
+- Commit hash
+- Tag created and pushed
+- PyPI upload URL
+- GitHub release URL (or exact error + remediation)

kon_coding_agent-0.2.2/LOCAL.md ADDED Viewed

@@ -0,0 +1,25 @@
+# Local Models
+This document provides detailed information about running and configuring local models with Kon.
+## Tested Models
+| Model | Quantization | Context Length | TPS | System Specs |
+| ----- | -------------- | -------------- | --- | ------------ |
+| `qwen/qwen3-coder-next` | Q4_K_M | 64,000 | N/A | i7-14700F × 28, 64GB RAM, 24GB VRAM (RTX 3090) |
+| `zai-org/glm-4.7-flash` | Q4_K_M | 64,000 | ~80-90 | i7-14700F × 28, 64GB RAM, 24GB VRAM (RTX 3090) |
+Run a local model using llama-server with the following command:
+```bash
+./llama-server -m <models-dir>/GLM-4.7-Flash-GGUF/GLM-4.7-Flash-Q4_K_M.gguf -n 8192 -c 64000
+```
+Then start kon:
+```bash
+kon --model zai-org/glm-4.7-flash --provider openai --base-url http://localhost:8080/v1 --api-key ""
+```
+> [!NOTE]
+> I was not able to run qwen-coder-next reliably on my system. Either the provider config had some issues or it's too big for my system (i'm not sure)

{kon_coding_agent-0.2.0 → kon_coding_agent-0.2.2}/PKG-INFO RENAMED Viewed

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: kon-coding-agent
-Version: 0.2.0
+Version: 0.2.2
 Summary: Minimal coding agent
 License-File: LICENSE
 Requires-Python: >=3.12
@@ -215,34 +215,32 @@ UI (app.py)
 ## Supported Models
-Kon works well with local models exposed through an OpenAI-compatible `/v1` API (for example LM Studio).
+Kon works well with local models exposed through an OpenAI-compatible `/v1` API.
-### Example on LM Studio
+### Example using llama-server
-To run a local model from LM Studio:
+To run a local model using llama-server:
 ```bash
-# GLM-4.7-flash
-kon --provider openai-responses \
-  --base-url http://127.0.0.1:1234/v1 \
-  --model zai-org/glm-4.7-flash \
-  --api-key ""
-# Qwen3-coder-next
-kon --provider openai-responses \
-  --base-url http://127.0.0.1:1234/v1 \
-  --model qwen/qwen3-coder-next \
-  --api-key ""
+./llama-server -m <models-dir>/GLM-4.7-Flash-GGUF/GLM-4.7-Flash-Q4_K_M.gguf \
+    -n 8192 \
+    -c 64000
+# Then use Kon with:
+kon --model zai-org/glm-4.7-flash \
+    --provider openai \
+    --base-url http://localhost:8080/v1 \
+    --api-key ""
 ```
-For detailed configuration and performance benchmarks, see [LOCAL.md](LOCAL.md).
+`GLM-4.7-Flash-Q4` ran at 80-90 tps on my i7-14700F × 28, 64GB RAM, 24GB VRAM (RTX 3090)
 ### All Supported Providers
 | Model (local=*) | Provider | Thinking | Vision |
 | ----- | -------- | -------- | ------ |
-| `*zai-org/glm-4.7-flash` | OpenAI Responses | Yes | No |
-| `*qwen/qwen3-coder-next` | OpenAI Responses | Yes | No |
+| `*zai-org/glm-4.7-flash` | OpenAI Completions | Yes | No |
+| `*qwen/qwen3-coder-next` | OpenAI Completions | Yes | No |
 | `glm-4.7` | ZhiPu (OpenAI Completions) | Yes | No |
 | `glm-5` | ZhiPu (OpenAI Completions) | Yes | No |
 | `claude-sonnet-4.5` | GitHub Copilot | Yes | Yes |
@@ -264,7 +262,8 @@ Most important knobs:
 - `llm.default_thinking_level`
 - `llm.system_prompt` (**you can fully override Kon’s system prompt here**)
 - `llm.tool_call_idle_timeout_seconds` (fallback timeout for stalled tool-call streaming)
-- `compaction.on_overflow`, `compaction.buffer_tokens`, `compaction.default_context_window`
+- `compaction.on_overflow`, `compaction.buffer_tokens`
+- `agent.max_turns`, `agent.default_context_window`
 You can also theme the UI via `[ui.colors]` values.
@@ -275,7 +274,7 @@ Example:
 default_provider = "openai-codex"
 default_model = "gpt-5.3-codex"
 default_thinking_level = "high"
-tool_call_idle_timeout_seconds = 10
+tool_call_idle_timeout_seconds = 60
 system_prompt = """Your custom system prompt here"""
 [compaction]

{kon_coding_agent-0.2.0 → kon_coding_agent-0.2.2}/README.md RENAMED Viewed

@@ -199,34 +199,32 @@ UI (app.py)
 ## Supported Models
-Kon works well with local models exposed through an OpenAI-compatible `/v1` API (for example LM Studio).
+Kon works well with local models exposed through an OpenAI-compatible `/v1` API.
-### Example on LM Studio
+### Example using llama-server
-To run a local model from LM Studio:
+To run a local model using llama-server:
 ```bash
-# GLM-4.7-flash
-kon --provider openai-responses \
-  --base-url http://127.0.0.1:1234/v1 \
-  --model zai-org/glm-4.7-flash \
-  --api-key ""
-# Qwen3-coder-next
-kon --provider openai-responses \
-  --base-url http://127.0.0.1:1234/v1 \
-  --model qwen/qwen3-coder-next \
-  --api-key ""
+./llama-server -m <models-dir>/GLM-4.7-Flash-GGUF/GLM-4.7-Flash-Q4_K_M.gguf \
+    -n 8192 \
+    -c 64000
+# Then use Kon with:
+kon --model zai-org/glm-4.7-flash \
+    --provider openai \
+    --base-url http://localhost:8080/v1 \
+    --api-key ""
 ```
-For detailed configuration and performance benchmarks, see [LOCAL.md](LOCAL.md).
+`GLM-4.7-Flash-Q4` ran at 80-90 tps on my i7-14700F × 28, 64GB RAM, 24GB VRAM (RTX 3090)
 ### All Supported Providers
 | Model (local=*) | Provider | Thinking | Vision |
 | ----- | -------- | -------- | ------ |
-| `*zai-org/glm-4.7-flash` | OpenAI Responses | Yes | No |
-| `*qwen/qwen3-coder-next` | OpenAI Responses | Yes | No |
+| `*zai-org/glm-4.7-flash` | OpenAI Completions | Yes | No |
+| `*qwen/qwen3-coder-next` | OpenAI Completions | Yes | No |
 | `glm-4.7` | ZhiPu (OpenAI Completions) | Yes | No |
 | `glm-5` | ZhiPu (OpenAI Completions) | Yes | No |
 | `claude-sonnet-4.5` | GitHub Copilot | Yes | Yes |
@@ -248,7 +246,8 @@ Most important knobs:
 - `llm.default_thinking_level`
 - `llm.system_prompt` (**you can fully override Kon’s system prompt here**)
 - `llm.tool_call_idle_timeout_seconds` (fallback timeout for stalled tool-call streaming)
-- `compaction.on_overflow`, `compaction.buffer_tokens`, `compaction.default_context_window`
+- `compaction.on_overflow`, `compaction.buffer_tokens`
+- `agent.max_turns`, `agent.default_context_window`
 You can also theme the UI via `[ui.colors]` values.
@@ -259,7 +258,7 @@ Example:
 default_provider = "openai-codex"
 default_model = "gpt-5.3-codex"
 default_thinking_level = "high"
-tool_call_idle_timeout_seconds = 10
+tool_call_idle_timeout_seconds = 60
 system_prompt = """Your custom system prompt here"""
 [compaction]

{kon_coding_agent-0.2.0 → kon_coding_agent-0.2.2}/TODO.md RENAMED Viewed

@@ -1,5 +1,3 @@
-- show new release update in ui to prompt the user to upgrade
-- show tokens streamed in for bash, edit and write tools (which can large at times)
 - if @ or / menu is open pressing esc closes it but interrupts stream as well
 - bug in how we report context size, tokens in and out and cached tokens for codex?
 - steer (immediate) and normal queues

{kon_coding_agent-0.2.0 → kon_coding_agent-0.2.2}/pyproject.toml RENAMED Viewed

@@ -14,7 +14,7 @@ default = true
 [project]
 name = "kon-coding-agent"
-version = "0.2.0"
+version = "0.2.2"
 description = "Minimal coding agent"
 readme = "README.md"
 requires-python = ">=3.12"

{kon_coding_agent-0.2.0 → kon_coding_agent-0.2.2}/src/kon/config.py RENAMED Viewed

@@ -59,12 +59,16 @@ class LLMConfig(BaseModel):
     default_model: str
     default_thinking_level: str
     system_prompt: str
-    tool_call_idle_timeout_seconds: float = 10
+    tool_call_idle_timeout_seconds: float = 60
 class CompactionConfig(BaseModel):
     on_overflow: OnOverflowMode = "continue"
     buffer_tokens: int = 20000
+class AgentConfig(BaseModel):
+    max_turns: int = 500
     default_context_window: int = 200000
@@ -72,6 +76,7 @@ class ConfigSchema(BaseModel):
     llm: LLMConfig
     ui: UIConfig
     compaction: CompactionConfig
+    agent: AgentConfig
 class _BinariesConfig:
@@ -126,6 +131,10 @@ class Config:
     def compaction(self) -> CompactionConfig:
         return self._parsed.compaction
+    @property
+    def agent(self) -> AgentConfig:
+        return self._parsed.agent
     @property
     def binaries(self) -> _BinariesConfig:
         return _BinariesConfig(AVAILABLE_BINARIES)

{kon_coding_agent-0.2.0 → kon_coding_agent-0.2.2}/src/kon/defaults/config.toml RENAMED Viewed

@@ -2,7 +2,7 @@
 default_provider = "openai-codex" # "zhipu", "github-copilot", "openai-codex"
 default_model = "gpt-5.3-codex"
 default_thinking_level = "high"
-tool_call_idle_timeout_seconds = 10
+tool_call_idle_timeout_seconds = 120
 system_prompt = """You are an expert coding assistant called `Kon`.
 You help users by reading, searching, executing commands, editing code, and writing new files.
@@ -21,6 +21,9 @@ You help users by reading, searching, executing commands, editing code, and writ
 [compaction]
 on_overflow = "continue" # "continue" or "pause"
 buffer_tokens = 20000
+[agent]
+max_turns = 500
 default_context_window = 200000
 [ui.colors]

{kon_coding_agent-0.2.0 → kon_coding_agent-0.2.2}/src/kon/events.py RENAMED Viewed

@@ -100,6 +100,14 @@ class ToolArgsDeltaEvent:
     delta: str = ""
+@dataclass
+class ToolArgsTokenUpdateEvent:
+    type: Literal["tool_args_token_update"] = "tool_args_token_update"
+    tool_call_id: str = ""
+    tool_name: str = ""
+    token_count: int = 0
 @dataclass
 class ToolEndEvent:
     type: Literal["tool_end"] = "tool_end"
@@ -180,6 +188,7 @@ StreamEvent = (
     | TextEndEvent
     | ToolStartEvent
     | ToolArgsDeltaEvent
+    | ToolArgsTokenUpdateEvent
     | ToolEndEvent
     | ToolResultEvent
     | RetryEvent

{kon_coding_agent-0.2.0 → kon_coding_agent-0.2.2}/src/kon/llm/providers/mock.py RENAMED Viewed

@@ -15,6 +15,7 @@ Scenarios (set via scenario parameter):
 - "unknown_tool": call unknown tool
 - "long_text": multiple text chunks
 - "tool_hang": emits a tool call and then never sends StreamDone
+- "tool_with_many_chunks": tool call with many argument chunks for token counting tests
 """
 import asyncio
@@ -138,6 +139,45 @@ class MockProvider(BaseProvider):
                 return tool_hang_iter()
+            case "tool_with_many_chunks":
+                async def tool_with_many_chunks_iter():
+                    # Tool call with many chunks to test token counting
+                    # 24 chunks of 8 chars each = 192 chars = 48 tokens
+                    # Should trigger token update events at chunks 12, 16, 20, 24
+                    yield ToolCallStart(id="call-1", name="bash", index=0)
+                    chunks = [
+                        "aaaaaaa",
+                        "bbbbbbb",
+                        "ccccccc",
+                        "ddddddd",
+                        "eeeeeee",
+                        "fffffff",
+                        "ggggggg",
+                        "hhhhhhh",
+                        "iiiiiii",
+                        "jjjjjjj",
+                        "kkkkkkk",
+                        "lllllll",
+                        "mmmmmmm",
+                        "nnnnnnn",
+                        "ooooooo",
+                        "ppppppp",
+                        "qqqqqqq",
+                        "rrrrrrr",
+                        "sssssss",
+                        "ttttttt",
+                        "uuuuuuu",
+                        "vvvvvvv",
+                        "wwwwwww",
+                        "xxxxxxxx",
+                    ]
+                    for chunk in chunks:
+                        yield ToolCallDelta(index=0, arguments_delta=chunk)
+                    yield StreamDone(stop_reason=StopReason.TOOL_USE)
+                return tool_with_many_chunks_iter()
             case _:
                 # Fallback to default
                 async def default_iter():

{kon_coding_agent-0.2.0 → kon_coding_agent-0.2.2}/src/kon/loop.py RENAMED Viewed

@@ -76,7 +76,7 @@ def build_system_prompt(cwd: str, context: Context | None = None) -> str:
 @dataclass
 class AgentConfig:
-    max_turns: int = 200
+    max_turns: int | None = None
     system_prompt: str | None = None
     cwd: str | None = None
     context: Context | None = None
@@ -137,7 +137,12 @@ class Agent:
         system_prompt = self.config.system_prompt or build_system_prompt(cwd, self.config.context)
         try:
-            while turn < self.config.max_turns:
+            max_turns = (
+                self.config.max_turns
+                if self.config.max_turns is not None
+                else kon_config.agent.max_turns
+            )
+            while turn < max_turns:
                 if cancel_event and cancel_event.is_set():
                     was_interrupted = True
                     stop_reason = StopReason.INTERRUPTED
@@ -194,7 +199,7 @@ class Agent:
                 if stop_reason != StopReason.TOOL_USE:
                     break
-            if turn >= self.config.max_turns and not was_interrupted:
+            if turn >= max_turns and not was_interrupted:
                 stop_reason = StopReason.LENGTH
         except Exception as e:  # intentionally broad — top-level boundary; crash = broken TUI
@@ -219,7 +224,7 @@ class Agent:
         if last_usage is None:
             return
-        context_window = self.config.context_window or kon_config.compaction.default_context_window
+        context_window = self.config.context_window or kon_config.agent.default_context_window
         max_output = self.config.max_output_tokens or self.provider.config.max_tokens
         buffer_tokens = kon_config.compaction.buffer_tokens

{kon_coding_agent-0.2.0 → kon_coding_agent-0.2.2}/src/kon/session.py RENAMED Viewed

@@ -58,6 +58,7 @@ class ModelChangeEntry(EntryBase):
     type: Literal["model_change"] = "model_change"
     provider: str
     model_id: str
+    base_url: str | None = None
 class CompactionEntry(EntryBase):
@@ -241,13 +242,16 @@ class Session:
         self._append_entry(entry)
         return entry.id
-    def append_model_change(self, provider: str, model_id: str) -> str:
+    def append_model_change(
+        self, provider: str, model_id: str, base_url: str | None = None
+    ) -> str:
         entry = ModelChangeEntry(
             id=self._generate_entry_id(),
             parent_id=self._leaf_id,
             timestamp=_now_iso(),
             provider=provider,
             model_id=model_id,
+            base_url=base_url,
         )
         self._append_entry(entry)
         return entry.id
@@ -372,20 +376,25 @@ class Session:
         return self._initial_thinking_level
     @property
-    def model(self) -> tuple[str, str] | None:
+    def model(self) -> tuple[str, str, str | None] | None:
         for entry in reversed(self._entries):
             if isinstance(entry, ModelChangeEntry):
-                return (entry.provider, entry.model_id)
+                return (entry.provider, entry.model_id, entry.base_url)
         if self._initial_provider and self._initial_model_id:
-            return (self._initial_provider, self._initial_model_id)
+            return (self._initial_provider, self._initial_model_id, None)
         return None
-    def set_model(self, provider: str, model_id: str) -> None:
+    def set_model(self, provider: str, model_id: str, base_url: str | None = None) -> None:
         current = self.model
-        if current and current[0] == provider and current[1] == model_id:
+        if (
+            current
+            and current[0] == provider
+            and current[1] == model_id
+            and current[2] == base_url
+        ):
             return
-        self.append_model_change(provider, model_id)
+        self.append_model_change(provider, model_id, base_url)
     def set_thinking_level(self, thinking_level: str) -> None:
         if self.thinking_level == thinking_level:

{kon_coding_agent-0.2.0 → kon_coding_agent-0.2.2}/src/kon/turn.py RENAMED Viewed

@@ -59,6 +59,7 @@ from .events import (
     ThinkingEndEvent,
     ThinkingStartEvent,
     ToolArgsDeltaEvent,
+    ToolArgsTokenUpdateEvent,
     ToolEndEvent,
     ToolResultEvent,
     ToolStartEvent,
@@ -70,7 +71,14 @@ from .llm.base import LLMStream
 from .tools import BaseTool, get_tool, get_tool_definitions
 _STREAM_EXHAUSTED = object()
-_DEFAULT_TOOL_CALL_IDLE_TIMEOUT_SECONDS = 10.0
+_DEFAULT_TOOL_CALL_IDLE_TIMEOUT_SECONDS = 60.0
+_TOOL_ARGS_TOKEN_DISPLAY_THRESHOLD = 20
+_TOOL_ARGS_TOKEN_CHUNK_UPDATE_INTERVAL = 4
+def _count_tokens(text: str) -> int:
+    """Estimate token count from text (approx 4 chars per token)."""
+    return len(text) // 4
 class StreamState(StrEnum):
@@ -228,6 +236,10 @@ async def run_single_turn(
     pending_tool_calls: list[dict] = []
     current_tool_call: dict | None = None
+    # Token counting for tool argument streaming
+    _tool_arg_chunk_counter = 0
+    _tool_arg_token_count = 0
     current_state: StreamState | None = None
     stop_reason: StopReason = StopReason.STOP
     interrupted = False
@@ -381,6 +393,10 @@ async def run_single_turn(
                     pending_tool_calls.append(current_tool_call)
                     current_tool_call = None
+                # Reset token counters when starting a new tool call
+                _tool_arg_chunk_counter = 0
+                _tool_arg_token_count = 0
                 current_state = StreamState.TOOL_CALL
                 current_tool_call = {"id": id, "name": name, "arguments": ""}
@@ -391,6 +407,20 @@ async def run_single_turn(
                     current_tool_call["arguments"] += delta
                     yield ToolArgsDeltaEvent(tool_call_id=current_tool_call["id"], delta=delta)
+                    # Count tokens and fire update event every Nth chunk after threshold tokens
+                    _tool_arg_chunk_counter += 1
+                    _tool_arg_token_count += _count_tokens(delta)
+                    if (
+                        _tool_arg_token_count > _TOOL_ARGS_TOKEN_DISPLAY_THRESHOLD
+                        and _tool_arg_chunk_counter % _TOOL_ARGS_TOKEN_CHUNK_UPDATE_INTERVAL == 0
+                    ):
+                        yield ToolArgsTokenUpdateEvent(
+                            tool_call_id=current_tool_call["id"],
+                            tool_name=current_tool_call["name"],
+                            token_count=_tool_arg_token_count,
+                        )
             case StreamDone(stop_reason=reason):
                 stop_reason = reason

{kon_coding_agent-0.2.0 → kon_coding_agent-0.2.2}/src/kon/ui/app.py RENAMED Viewed

@@ -4,8 +4,10 @@ import os
 import shutil
 import sys
 import time
+import tomllib
 from collections import deque
 from importlib.metadata import PackageNotFoundError, version
+from pathlib import Path
 from typing import ClassVar
 from rich.console import Console
@@ -32,6 +34,7 @@ from ..events import (
     ThinkingDeltaEvent,
     ThinkingEndEvent,
     ThinkingStartEvent,
+    ToolArgsTokenUpdateEvent,
     ToolEndEvent,
     ToolResultEvent,
     ToolStartEvent,
@@ -65,12 +68,24 @@ from .session_ui import SessionUIMixin
 from .styles import STYLES
 from .widgets import InfoBar, QueueDisplay, StatusLine, format_path
-_PYPI_PACKAGE_NAME = "kon-coding-agent"
+def _get_package_name() -> str:
+    pyproject_path = Path(__file__).parent.parent.parent.parent / "pyproject.toml"
+    if pyproject_path.exists():
+        try:
+            data = tomllib.loads(pyproject_path.read_text())
+            return data["project"]["name"]
+        except Exception:
+            pass
+    return "kon-coding-agent"
+_PYPI_PACKAGE_NAME = _get_package_name()
 try:
     VERSION = version(_PYPI_PACKAGE_NAME)
 except PackageNotFoundError:
-    VERSION = "0.2.0"
+    VERSION = "0.2.2"
 _COPILOT_API_TYPES: frozenset[ApiType] = frozenset(
     {ApiType.GITHUB_COPILOT, ApiType.GITHUB_COPILOT_RESPONSES, ApiType.ANTHROPIC_COPILOT}
@@ -202,16 +217,20 @@ class Kon(CommandsMixin, SessionUIMixin, App[None]):
             if self._session.entries:
                 model_info = self._session.model
                 if model_info:
-                    provider, self._model = model_info
+                    provider, self._model, session_base_url = model_info
                     self._model_provider = provider
+                    if self._base_url is None and session_base_url:
+                        self._base_url = session_base_url
                 self._thinking_level = self._session.thinking_level
         elif self._continue_recent:
             self._session = Session.continue_recent(self._cwd)
             if self._session.entries:
                 model_info = self._session.model
                 if model_info:
-                    provider, self._model = model_info
+                    provider, self._model, session_base_url = model_info
                     self._model_provider = provider
+                    if self._base_url is None and session_base_url:
+                        self._base_url = session_base_url
                 self._thinking_level = self._session.thinking_level
         model_info = get_model(self._model, self._model_provider)
@@ -257,7 +276,7 @@ class Kon(CommandsMixin, SessionUIMixin, App[None]):
                 model_id=self._model,
                 thinking_level=self._thinking_level,
             )
-            self._session.append_model_change(model_provider, self._model)
+            self._session.append_model_change(model_provider, self._model, base_url)
         self._project_context = Context.load(self._cwd)
         # TODO: Surface self._project_context.skill_warnings in UI (e.g. chat info/error messages)
@@ -567,7 +586,6 @@ class Kon(CommandsMixin, SessionUIMixin, App[None]):
             tools = get_tools(DEFAULT_TOOLS)
             model_info = get_model(self._model, self._model_provider)
             agent_config = AgentConfig(
-                max_turns=50,
                 system_prompt=self._get_system_prompt(),
                 context_window=model_info.context_window if model_info else None,
                 max_output_tokens=model_info.max_tokens if model_info else None,
@@ -621,6 +639,10 @@ class Kon(CommandsMixin, SessionUIMixin, App[None]):
                             chat.start_tool(name, id, "")
                             self._current_block_type = "tool_call"
                             status.increment_tool_calls()
+                            status.set_streaming_tokens(0)  # Reset token count for new tool
+                        case ToolArgsTokenUpdateEvent(token_count=tc):
+                            status.set_streaming_tokens(tc)
                         case ToolEndEvent(tool_call_id=id, display=display):
                             chat.update_tool_call_msg(id, display)
@@ -743,6 +765,7 @@ def main():
         dest="resume_session",
         help="Resume a specific session by ID (full or unique prefix)",
     )
+    parser.add_argument("--version", action="version", version=f"kon {VERSION}")
     args = parser.parse_args()
     app = Kon(

kon-coding-agent 0.2.0__tar.gz → 0.2.2__tar.gz

kon-coding-agent 0.2.0tar.gz → 0.2.2tar.gz