augint-shell 0.78.0 (tar.gz) → 0.80.0 (tar.gz)

Files changed (29)

{augint_shell-0.78.0 → augint_shell-0.80.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.3
 Name: augint-shell
-Version: 0.78.0
+Version: 0.80.0
 Summary: Launch AI coding tools and local LLMs in Docker containers
 Author: svange
 Requires-Dist: docker>=7.0.0
@@ -96,29 +96,67 @@ ai-shell opencode
 ## Configuration
-Optional `ai-shell.toml` in your project root:
-```toml
-[container]
-image = "svange/augint-shell"
-image_tag = "latest"
-extra_env = { MY_VAR = "value" }
-[llm]
-primary_model = "qwen3-coder:30b-a3b-q4_K_M"
-fallback_model = "huihui_ai/llama3.3-abliterated"
-context_size = 32768
-ollama_port = 11434
-webui_port = 3000
+Optional `.ai-shell.yaml` in your project root (YAML is the default; TOML is
+also accepted — see `ai-shell init` for the full generated template with
+per-section rationale):
+```yaml
+container:
+  image: svange/augint-shell
+  image_tag: latest
+  extra_env:
+    MY_VAR: value
+llm:
+  primary_chat_model: qwen3.5:27b
+  secondary_chat_model: huihui_ai/qwen3.5-abliterated:27b
+  primary_coding_model: qwen3-coder:30b-a3b-q4_K_M
+  secondary_coding_model: huihui_ai/qwen3-coder-abliterated:30b-a3b-instruct-q4_K_M
+  context_size: 32768
+  ollama_port: 11434
+  webui_port: 3000
+  extra_models: []   # additional Ollama tags to pull alongside the 4 slots
 ```
-Global config at `~/.config/ai-shell/config.toml` is also supported.
+Global config at `~/.ai-shell.yaml` or `~/.config/ai-shell/config.yaml` is
+also supported.
+> The previous `primary_model` / `fallback_model` keys were removed. They were
+> role-ambiguous (chat vs. coding). If you had them set, move them to the
+> matching slot above. ai-shell will refuse to start with those legacy keys
+> present and print a migration hint.
 `ai-shell` does not manage tool-specific config files for Codex, OpenCode, or
 Aider. Use `augint-opencodex` or the tools' native config files for those, and
 use `ai-shell` for container/runtime settings such as AWS profiles, local LLM
 ports, and Claude options.
+### Local LLM stack
+Four role-specific model slots, each sized for an RTX 4090 (24 GiB VRAM). All
+four defaults together total ~74 GB on disk.
+| Slot | Default | Size | Role | Routed to |
+|---|---|---|---|---|
+| `primary_chat_model` | `qwen3.5:27b` | 17 GB | Best chat model that fits a 4090 | Open WebUI default |
+| `secondary_chat_model` | `huihui_ai/qwen3.5-abliterated:27b` | 17 GB | Best uncensored chat (abliterated Qwen3.5) | Open WebUI (selectable) |
+| `primary_coding_model` | `qwen3-coder:30b-a3b-q4_K_M` | 19 GB | Best agentic coder with explicit Ollama tools badge | OpenCode / Aider default |
+| `secondary_coding_model` | `huihui_ai/qwen3-coder-abliterated:30b-a3b-instruct-q4_K_M` | 19 GB | Best uncensored coder (abliterated Qwen3-Coder) | OpenCode (selectable) |
+Each pair shares a base model — primary is the standard aligned release;
+secondary is the huihui.ai abliterated variant (refusal directions neutralized
+via weight surgery, benchmark quality preserved). Switching primary <->
+secondary within a slot keeps tool formats and context semantics identical.
+`ai-shell llm pull` / `ai-shell llm setup` downloads all 4 slots plus any
+`extra_models` entries, deduped.
+**Three caveats worth knowing:**
+1. **Qwen3.5 Ollama tool calling is broken** ([ollama #14493](https://github.com/ollama/ollama/issues/14493), open). This does not affect Open WebUI's default chat with web search and RAG — those run server-side in WebUI without touching Ollama's tools API. It does affect agent CLIs routed through Ollama's `/v1/chat/completions` tools array, which is why the chat slots are Qwen3.5 and the coding slots are Qwen3-Coder (explicit tools badge, working parser).
+2. **Ollama `num_ctx` defaults to 4096** for every model, well below what modern agent prompts need (Claude Code sends ~35K tokens). `context_size` in your config is applied via Modelfile override during `llm setup` — leave it at 32768 unless you have a reason.
+3. **Qwen3-Coder tool-count cliff**: reliable native `tool_calls` emission below ~5 registered tools; above that the model may emit XML inside content and some parsers miss it. Keep agent tool sets tight.
 ## How It Works
 - Pulls a pre-built Docker image from Docker Hub (`svange/augint-shell`)
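The `num_ctx` caveat in the README text above says `context_size` is applied through a Modelfile override. The diff does not show how ai-shell builds that file, so the following is only a minimal sketch of what such an override could look like. `render_modelfile` is a hypothetical helper, not part of ai-shell; the `FROM` and `PARAMETER num_ctx` directives are standard Ollama Modelfile syntax.

```python
def render_modelfile(base_model: str, context_size: int = 32768) -> str:
    """Return Modelfile text that pins num_ctx for an existing Ollama model.

    Hypothetical helper for illustration; ai-shell's real implementation
    is not visible in this diff.
    """
    if context_size <= 0:
        raise ValueError("context_size must be a positive token count")
    # FROM selects the base model; PARAMETER num_ctx overrides the
    # context window that Ollama would otherwise default to.
    return f"FROM {base_model}\nPARAMETER num_ctx {context_size}\n"


if __name__ == "__main__":
    print(render_modelfile("qwen3-coder:30b-a3b-q4_K_M"))
```

A file like this would typically be registered with `ollama create <name> -f <path>`; whether ai-shell does exactly that during `llm setup` is an assumption.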

{augint_shell-0.78.0 → augint_shell-0.80.0}/README.md RENAMED

@@ -83,29 +83,67 @@ ai-shell opencode
 ## Configuration
-Optional `ai-shell.toml` in your project root:
-```toml
-[container]
-image = "svange/augint-shell"
-image_tag = "latest"
-extra_env = { MY_VAR = "value" }
-[llm]
-primary_model = "qwen3-coder:30b-a3b-q4_K_M"
-fallback_model = "huihui_ai/llama3.3-abliterated"
-context_size = 32768
-ollama_port = 11434
-webui_port = 3000
+Optional `.ai-shell.yaml` in your project root (YAML is the default; TOML is
+also accepted — see `ai-shell init` for the full generated template with
+per-section rationale):
+```yaml
+container:
+  image: svange/augint-shell
+  image_tag: latest
+  extra_env:
+    MY_VAR: value
+llm:
+  primary_chat_model: qwen3.5:27b
+  secondary_chat_model: huihui_ai/qwen3.5-abliterated:27b
+  primary_coding_model: qwen3-coder:30b-a3b-q4_K_M
+  secondary_coding_model: huihui_ai/qwen3-coder-abliterated:30b-a3b-instruct-q4_K_M
+  context_size: 32768
+  ollama_port: 11434
+  webui_port: 3000
+  extra_models: []   # additional Ollama tags to pull alongside the 4 slots
 ```
-Global config at `~/.config/ai-shell/config.toml` is also supported.
+Global config at `~/.ai-shell.yaml` or `~/.config/ai-shell/config.yaml` is
+also supported.
+> The previous `primary_model` / `fallback_model` keys were removed. They were
+> role-ambiguous (chat vs. coding). If you had them set, move them to the
+> matching slot above. ai-shell will refuse to start with those legacy keys
+> present and print a migration hint.
 `ai-shell` does not manage tool-specific config files for Codex, OpenCode, or
 Aider. Use `augint-opencodex` or the tools' native config files for those, and
 use `ai-shell` for container/runtime settings such as AWS profiles, local LLM
 ports, and Claude options.
+### Local LLM stack
+Four role-specific model slots, each sized for an RTX 4090 (24 GiB VRAM). All
+four defaults together total ~74 GB on disk.
+| Slot | Default | Size | Role | Routed to |
+|---|---|---|---|---|
+| `primary_chat_model` | `qwen3.5:27b` | 17 GB | Best chat model that fits a 4090 | Open WebUI default |
+| `secondary_chat_model` | `huihui_ai/qwen3.5-abliterated:27b` | 17 GB | Best uncensored chat (abliterated Qwen3.5) | Open WebUI (selectable) |
+| `primary_coding_model` | `qwen3-coder:30b-a3b-q4_K_M` | 19 GB | Best agentic coder with explicit Ollama tools badge | OpenCode / Aider default |
+| `secondary_coding_model` | `huihui_ai/qwen3-coder-abliterated:30b-a3b-instruct-q4_K_M` | 19 GB | Best uncensored coder (abliterated Qwen3-Coder) | OpenCode (selectable) |
+Each pair shares a base model — primary is the standard aligned release;
+secondary is the huihui.ai abliterated variant (refusal directions neutralized
+via weight surgery, benchmark quality preserved). Switching primary <->
+secondary within a slot keeps tool formats and context semantics identical.
+`ai-shell llm pull` / `ai-shell llm setup` downloads all 4 slots plus any
+`extra_models` entries, deduped.
+**Three caveats worth knowing:**
+1. **Qwen3.5 Ollama tool calling is broken** ([ollama #14493](https://github.com/ollama/ollama/issues/14493), open). This does not affect Open WebUI's default chat with web search and RAG — those run server-side in WebUI without touching Ollama's tools API. It does affect agent CLIs routed through Ollama's `/v1/chat/completions` tools array, which is why the chat slots are Qwen3.5 and the coding slots are Qwen3-Coder (explicit tools badge, working parser).
+2. **Ollama `num_ctx` defaults to 4096** for every model, well below what modern agent prompts need (Claude Code sends ~35K tokens). `context_size` in your config is applied via Modelfile override during `llm setup` — leave it at 32768 unless you have a reason.
+3. **Qwen3-Coder tool-count cliff**: reliable native `tool_calls` emission below ~5 registered tools; above that the model may emit XML inside content and some parsers miss it. Keep agent tool sets tight.
 ## How It Works
 - Pulls a pre-built Docker image from Docker Hub (`svange/augint-shell`)
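To make the migration note in the README text concrete, here is a small sketch of how an `llm` section with the removed keys differs from a migrated one, and how a fail-loudly check can tell them apart. The names `LEGACY_HINTS`, `reject_legacy_keys`, `old_llm`, and `new_llm` are illustrative assumptions; the real check lives in `config.py` later in this diff and uses different wording.

```python
# Hint text is paraphrased for illustration, not copied from ai-shell.
LEGACY_HINTS = {
    "primary_model": "use primary_chat_model or primary_coding_model",
    "fallback_model": "use secondary_chat_model and/or secondary_coding_model",
}


def reject_legacy_keys(llm_section: dict) -> None:
    """Raise ValueError if the llm section still contains removed keys."""
    bad = [key for key in LEGACY_HINTS if key in llm_section]
    if bad:
        hints = "\n".join(f"  - {key}: {LEGACY_HINTS[key]}" for key in bad)
        raise ValueError(f"Deprecated llm key(s):\n{hints}")


# A 0.78.0-style section and its 0.80.0 equivalent for a coding workflow.
old_llm = {"primary_model": "qwen3-coder:30b-a3b-q4_K_M", "context_size": 32768}
new_llm = {"primary_coding_model": "qwen3-coder:30b-a3b-q4_K_M", "context_size": 32768}

if __name__ == "__main__":
    reject_legacy_keys(new_llm)  # migrated config passes silently
    try:
        reject_legacy_keys(old_llm)
    except ValueError as exc:
        print(exc)
```

Choosing the chat or coding slot stays a manual decision here, which matches the README's reasoning that the old keys were role-ambiguous.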

{augint_shell-0.78.0 → augint_shell-0.80.0}/pyproject.toml RENAMED

@@ -1,6 +1,6 @@
 [project]
 name = "augint-shell"
-version = "0.78.0"
+version = "0.80.0"
 description = "Launch AI coding tools and local LLMs in Docker containers"
 authors = [{name = "svange"}]
 readme = "README.md"

{augint_shell-0.78.0 → augint_shell-0.80.0}/src/ai_shell/__init__.py RENAMED

@@ -1,6 +1,6 @@
 """augint-shell (ai-shell) - Launch AI coding tools and local LLMs in Docker containers."""
-__version__ = "0.78.0"
+__version__ = "0.80.0"
 __all__ = [
     "__version__",

{augint_shell-0.78.0 → augint_shell-0.80.0}/src/ai_shell/cli/commands/llm.py RENAMED

@@ -119,8 +119,8 @@ def _validate_models_or_abort(*model_refs: str) -> None:
     for ref in missing:
         console.print(f"  - [cyan]{ref}[/cyan]  (tags: {_tag_list_url(ref)})")
     console.print(
-        "\nUpdate [bold]primary_model[/bold] / [bold]fallback_model[/bold] in "
-        "your ai-shell config to a valid tag and retry."
+        "\nUpdate the relevant [bold]*_chat_model[/bold] / [bold]*_coding_model[/bold] "
+        "entry (or [bold]extra_models[/bold]) in your ai-shell config to a valid tag and retry."
     )
     raise click.Abort()
@@ -398,15 +398,13 @@ def llm_pull(ctx):
     manager = _get_manager(ctx)
     config = manager.config
-    _validate_models_or_abort(config.primary_model, config.fallback_model)
+    models = config.models_to_pull
+    _validate_models_or_abort(*models)
-    console.print(f"[bold]Pulling primary model: {config.primary_model}...[/bold]")
-    output = manager.exec_in_ollama(["ollama", "pull", config.primary_model])
-    console.print(output)
-    console.print(f"\n[bold]Pulling fallback model: {config.fallback_model}...[/bold]")
-    output = manager.exec_in_ollama(["ollama", "pull", config.fallback_model])
-    console.print(output)
+    for model in models:
+        console.print(f"[bold]Pulling {model}...[/bold]")
+        output = manager.exec_in_ollama(["ollama", "pull", model])
+        console.print(output)
     console.print("\n[bold]Available models:[/bold]")
     output = manager.exec_in_ollama(["ollama", "list"])
@@ -426,7 +424,8 @@ def llm_setup(ctx, webui: bool, voice: bool, no_voice: bool, n8n: bool, all_: bo
     manager = _get_manager(ctx)
     config = manager.config
-    _validate_models_or_abort(config.primary_model, config.fallback_model)
+    models = config.models_to_pull
+    _validate_models_or_abort(*models)
     console.print("[bold]Starting LLM stack...[/bold]")
     _warn_if_low_memory()
@@ -452,13 +451,10 @@ def llm_setup(ctx, webui: bool, voice: bool, no_voice: bool, n8n: bool, all_: bo
         console.print("[bold red]Ollama failed to start after 20s[/bold red]")
         raise click.Abort()
-    console.print(f"\n[bold]Pulling primary model: {config.primary_model}...[/bold]")
-    output = manager.exec_in_ollama(["ollama", "pull", config.primary_model])
-    console.print(output)
-    console.print(f"\n[bold]Pulling fallback model: {config.fallback_model}...[/bold]")
-    output = manager.exec_in_ollama(["ollama", "pull", config.fallback_model])
-    console.print(output)
+    for model in models:
+        console.print(f"\n[bold]Pulling {model}...[/bold]")
+        output = manager.exec_in_ollama(["ollama", "pull", model])
+        console.print(output)
     console.print("\n[bold green]============================================[/bold green]")
     console.print("[bold green] Setup complete![/bold green]")
@@ -469,9 +465,13 @@ def llm_setup(ctx, webui: bool, voice: bool, no_voice: bool, n8n: bool, all_: bo
         console.print(f"  Open WebUI:  http://localhost:{config.webui_port}")
     if n8n:
         console.print(f"  n8n:         http://localhost:{config.n8n_port}")
-    console.print(f"\n  Primary model:  {config.primary_model}")
-    console.print(f"  Fallback model: {config.fallback_model}")
-    console.print(f"  Context window: {config.context_size} tokens")
+    console.print(f"\n  Primary chat:      {config.primary_chat_model}")
+    console.print(f"  Secondary chat:    {config.secondary_chat_model}")
+    console.print(f"  Primary coding:    {config.primary_coding_model}")
+    console.print(f"  Secondary coding:  {config.secondary_coding_model}")
+    if config.extra_models:
+        console.print(f"  Extra models:      {', '.join(config.extra_models)}")
+    console.print(f"  Context window:    {config.context_size} tokens")
     console.print("[bold green]============================================[/bold green]")
@@ -530,9 +530,13 @@ def llm_status(ctx):
         console.print(f"  n8n:                http://{lan}:{config.n8n_port}")
     console.print("\n[bold]Configuration:[/bold]")
-    console.print(f"  Primary model:   {config.primary_model}")
-    console.print(f"  Fallback model:  {config.fallback_model}")
-    console.print(f"  Context window:  {config.context_size} tokens")
+    console.print(f"  Primary chat:      {config.primary_chat_model}")
+    console.print(f"  Secondary chat:    {config.secondary_chat_model}")
+    console.print(f"  Primary coding:    {config.primary_coding_model}")
+    console.print(f"  Secondary coding:  {config.secondary_coding_model}")
+    if config.extra_models:
+        console.print(f"  Extra models:      {', '.join(config.extra_models)}")
+    console.print(f"  Context window:    {config.context_size} tokens")
     vram = get_vram_info()
     if vram is not None:

{augint_shell-0.78.0 → augint_shell-0.80.0}/src/ai_shell/cli/commands/tools.py RENAMED

@@ -1154,6 +1154,11 @@ def opencode(
     manager.ensure_tool_fresh(name, "opencode")
     cmd = ["/root/.opencode/bin/opencode"]
+    if not use_bedrock:
+        # Default OpenCode to the primary coding slot (benchmark-optimized,
+        # explicit Ollama tools badge). Users can switch to the secondary
+        # (uncensored) slot in the OpenCode model picker at runtime.
+        cmd.extend(["--model", f"ollama/{config.primary_coding_model}"])
     console.print(f"[bold]Launching opencode{bedrock_label} in {name}...[/bold]")
     manager.exec_interactive(name, cmd, extra_env=exec_env)
@@ -1165,7 +1170,7 @@ def opencode(
 def aider(ctx, safe, extra_args):
     """Launch aider with local LLM in the dev container."""
     manager, name, exec_env, config = _get_manager(ctx)
-    aider_model = f"ollama_chat/{config.primary_model}"
+    aider_model = f"ollama_chat/{config.primary_coding_model}"
     cmd = ["aider", "--model", aider_model]
     if not safe:
         cmd.append("--yes-always")
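The two hunks above change which model each tool is launched with. As a rough illustration, the argv construction can be isolated into pure functions that are easy to inspect. `opencode_cmd` and `aider_cmd` are hypothetical names; the binary path, flags, and `ollama/` and `ollama_chat/` prefixes are taken from the diff, and everything else about the real commands (container exec, environment, extra arguments) is omitted.

```python
from __future__ import annotations


def opencode_cmd(primary_coding_model: str, use_bedrock: bool) -> list[str]:
    """Build the OpenCode argv; local runs default to the primary coding slot."""
    cmd = ["/root/.opencode/bin/opencode"]
    if not use_bedrock:
        # Bedrock runs skip --model, mirroring the `if not use_bedrock` guard.
        cmd.extend(["--model", f"ollama/{primary_coding_model}"])
    return cmd


def aider_cmd(primary_coding_model: str, safe: bool) -> list[str]:
    """Build the Aider argv; --yes-always is added unless safe mode is on."""
    cmd = ["aider", "--model", f"ollama_chat/{primary_coding_model}"]
    if not safe:
        cmd.append("--yes-always")
    return cmd


if __name__ == "__main__":
    model = "qwen3-coder:30b-a3b-q4_K_M"
    print(opencode_cmd(model, use_bedrock=False))
    print(aider_cmd(model, safe=True))
```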

{augint_shell-0.78.0 → augint_shell-0.80.0}/src/ai_shell/config.py RENAMED

@@ -24,13 +24,15 @@ from ai_shell import __version__
 from ai_shell.defaults import (
     DEFAULT_CONTEXT_SIZE,
     DEFAULT_DEV_PORTS,
-    DEFAULT_FALLBACK_MODEL,
     DEFAULT_IMAGE,
     DEFAULT_KOKORO_PORT,
     DEFAULT_KOKORO_VOICE,
     DEFAULT_N8N_PORT,
     DEFAULT_OLLAMA_PORT,
-    DEFAULT_PRIMARY_MODEL,
+    DEFAULT_PRIMARY_CHAT_MODEL,
+    DEFAULT_PRIMARY_CODING_MODEL,
+    DEFAULT_SECONDARY_CHAT_MODEL,
+    DEFAULT_SECONDARY_CODING_MODEL,
     DEFAULT_WEBUI_PORT,
 )
@@ -47,9 +49,15 @@ class AiShellConfig:
     project_name: str = ""
     project_dir: Path = field(default_factory=Path.cwd)
-    # LLM
-    primary_model: str = DEFAULT_PRIMARY_MODEL
-    fallback_model: str = DEFAULT_FALLBACK_MODEL
+    # LLM model slots. Primary = best-available; secondary = best uncensored
+    # alternative. Chat slots are routed to Open WebUI, coding slots to
+    # OpenCode / Aider. `extra_models` is a free-form list of additional
+    # Ollama tags to pull alongside the 4 slots (deduped).
+    primary_chat_model: str = DEFAULT_PRIMARY_CHAT_MODEL
+    secondary_chat_model: str = DEFAULT_SECONDARY_CHAT_MODEL
+    primary_coding_model: str = DEFAULT_PRIMARY_CODING_MODEL
+    secondary_coding_model: str = DEFAULT_SECONDARY_CODING_MODEL
+    extra_models: list[str] = field(default_factory=list)
     context_size: int = DEFAULT_CONTEXT_SIZE
     ollama_port: int = DEFAULT_OLLAMA_PORT
     webui_port: int = DEFAULT_WEBUI_PORT
@@ -85,6 +93,28 @@ class AiShellConfig:
         """Return deduplicated, sorted list of dev container ports to expose."""
         return sorted(set(DEFAULT_DEV_PORTS + self.extra_ports))
+    @property
+    def models_to_pull(self) -> list[str]:
+        """Return the full deduped list of Ollama model tags to pull.
+        The 4 slots in order, followed by any ``extra_models``. Duplicates
+        are removed while preserving first-occurrence order.
+        """
+        ordered = [
+            self.primary_chat_model,
+            self.secondary_chat_model,
+            self.primary_coding_model,
+            self.secondary_coding_model,
+            *self.extra_models,
+        ]
+        seen: set[str] = set()
+        deduped: list[str] = []
+        for model in ordered:
+            if model and model not in seen:
+                seen.add(model)
+                deduped.append(model)
+        return deduped
 def load_config(
     project_override: str | None = None,
@@ -151,6 +181,39 @@ def _load_config_file(path: Path) -> dict:
         return tomllib.load(f)
+_LEGACY_LLM_KEY_HINT = {
+    "primary_model": (
+        "renamed to `primary_coding_model` (coding) or `primary_chat_model` "
+        "(chat). The new config uses 4 role-specific slots; pick the one "
+        "that matches your intent. See the generated .ai-shell.yaml for the "
+        "full layout."
+    ),
+    "fallback_model": (
+        "removed. The previous `fallback_model` was role-ambiguous. Use "
+        "`secondary_chat_model` and `secondary_coding_model` instead "
+        "(both default to the best uncensored variants). See the generated "
+        ".ai-shell.yaml for the full layout."
+    ),
+}
+def _reject_legacy_llm_keys(llm_section: dict, path: Path) -> None:
+    """Raise on deprecated `primary_model` / `fallback_model` keys.
+    These were removed when the llm config split into 4 role-specific slots
+    (primary/secondary x chat/coding). Silently aliasing them would corrupt
+    intent — e.g. the old `fallback_model` meant different things to chat and
+    coding users. Fail loudly with migration guidance.
+    """
+    bad = [k for k in _LEGACY_LLM_KEY_HINT if k in llm_section]
+    if not bad:
+        return
+    lines = [f"\nDeprecated llm key(s) found in {path}:"]
+    for key in bad:
+        lines.append(f"  - `{key}`: {_LEGACY_LLM_KEY_HINT[key]}")
+    raise ValueError("\n".join(lines))
 def _apply_config(config: AiShellConfig, path: Path) -> None:
     """Apply settings from a YAML or TOML config file."""
     try:
@@ -176,10 +239,17 @@ def _apply_config(config: AiShellConfig, path: Path) -> None:
     # [llm] section
     llm = data.get("llm", {})
-    if "primary_model" in llm:
-        config.primary_model = llm["primary_model"]
-    if "fallback_model" in llm:
-        config.fallback_model = llm["fallback_model"]
+    _reject_legacy_llm_keys(llm, path)
+    if "primary_chat_model" in llm:
+        config.primary_chat_model = llm["primary_chat_model"]
+    if "secondary_chat_model" in llm:
+        config.secondary_chat_model = llm["secondary_chat_model"]
+    if "primary_coding_model" in llm:
+        config.primary_coding_model = llm["primary_coding_model"]
+    if "secondary_coding_model" in llm:
+        config.secondary_coding_model = llm["secondary_coding_model"]
+    if "extra_models" in llm:
+        config.extra_models.extend(str(m) for m in llm["extra_models"])
     if "context_size" in llm:
         config.context_size = int(llm["context_size"])
     if "ollama_port" in llm:
@@ -214,14 +284,29 @@ def _apply_config(config: AiShellConfig, path: Path) -> None:
         config.skip_updates = bool(container["skip_updates"])
+_LEGACY_ENV_VARS = {
+    "AI_SHELL_PRIMARY_MODEL": ("AI_SHELL_PRIMARY_CODING_MODEL or AI_SHELL_PRIMARY_CHAT_MODEL"),
+    "AI_SHELL_FALLBACK_MODEL": ("AI_SHELL_SECONDARY_CHAT_MODEL or AI_SHELL_SECONDARY_CODING_MODEL"),
+}
 def _apply_env_vars(config: AiShellConfig) -> None:
     """Apply AI_SHELL_* environment variable overrides."""
+    bad_env = [k for k in _LEGACY_ENV_VARS if os.environ.get(k) is not None]
+    if bad_env:
+        lines = ["\nDeprecated AI_SHELL_* env var(s) set:"]
+        for key in bad_env:
+            lines.append(f"  - {key}: use {_LEGACY_ENV_VARS[key]} instead")
+        raise ValueError("\n".join(lines))
     env_map: dict[str, tuple[str, type]] = {
         "AI_SHELL_IMAGE": ("image", str),
         "AI_SHELL_IMAGE_TAG": ("image_tag", str),
         "AI_SHELL_PROJECT": ("project_name", str),
-        "AI_SHELL_PRIMARY_MODEL": ("primary_model", str),
-        "AI_SHELL_FALLBACK_MODEL": ("fallback_model", str),
+        "AI_SHELL_PRIMARY_CHAT_MODEL": ("primary_chat_model", str),
+        "AI_SHELL_SECONDARY_CHAT_MODEL": ("secondary_chat_model", str),
+        "AI_SHELL_PRIMARY_CODING_MODEL": ("primary_coding_model", str),
+        "AI_SHELL_SECONDARY_CODING_MODEL": ("secondary_coding_model", str),
         "AI_SHELL_CONTEXT_SIZE": ("context_size", int),
         "AI_SHELL_OLLAMA_PORT": ("ollama_port", int),
         "AI_SHELL_WEBUI_PORT": ("webui_port", int),
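The `models_to_pull` property added in this file removes duplicates while keeping first-occurrence order. The sketch below reproduces that behavior on a stripped-down stand-in class so the effect is easy to see. `LlmSlots` is a hypothetical name, and the dedupe uses `dict.fromkeys` instead of the explicit `seen` set in the diff; both approaches keep the first occurrence of each tag and skip empty strings.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class LlmSlots:
    """Stand-in for the model-slot fields of AiShellConfig (illustrative only)."""

    primary_chat_model: str = "qwen3.5:27b"
    secondary_chat_model: str = "huihui_ai/qwen3.5-abliterated:27b"
    primary_coding_model: str = "qwen3-coder:30b-a3b-q4_K_M"
    secondary_coding_model: str = (
        "huihui_ai/qwen3-coder-abliterated:30b-a3b-instruct-q4_K_M"
    )
    extra_models: list[str] = field(default_factory=list)

    @property
    def models_to_pull(self) -> list[str]:
        ordered = [
            self.primary_chat_model,
            self.secondary_chat_model,
            self.primary_coding_model,
            self.secondary_coding_model,
            *self.extra_models,
        ]
        # dict keys preserve insertion order, so this drops later duplicates
        # and empty entries without reordering the remaining tags.
        return list(dict.fromkeys(m for m in ordered if m))


if __name__ == "__main__":
    slots = LlmSlots(
        secondary_coding_model="qwen3-coder:30b-a3b-q4_K_M",  # same as primary
        extra_models=["llama3.1:8b", "qwen3.5:27b"],  # second entry repeats a slot
    )
    for tag in slots.models_to_pull:
        print(tag)
```

Because `_apply_config` extends `extra_models` rather than replacing it, entries from a global and a project config can both end up in the list, which is one reason the dedupe step matters.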

{augint_shell-0.78.0 → augint_shell-0.80.0}/src/ai_shell/container.py RENAMED

@@ -357,6 +357,11 @@ class ContainerManager:
         environment = {
             "OLLAMA_BASE_URL": f"http://{OLLAMA_CONTAINER}:11434",
             "WEBUI_AUTH": "false",
+            # DEFAULT_MODELS is a PersistentConfig: env seeds the DB on first
+            # boot and UI edits win after that. Point new chats at the
+            # primary chat slot; users can pick the secondary (uncensored)
+            # from the model dropdown.
+            "DEFAULT_MODELS": self.config.primary_chat_model,
         }
         if voice_enabled:
             environment.update(

{augint_shell-0.78.0 → augint_shell-0.80.0}/src/ai_shell/defaults.py RENAMED

@@ -57,8 +57,13 @@ WEBUI_IMAGE = "ghcr.io/open-webui/open-webui:main"
 KOKORO_IMAGE_CPU = "ghcr.io/remsky/kokoro-fastapi-cpu:latest"
 KOKORO_IMAGE_GPU = "ghcr.io/remsky/kokoro-fastapi-gpu:latest"
 N8N_IMAGE = "docker.n8n.io/n8nio/n8n"
-DEFAULT_PRIMARY_MODEL = "qwen3-coder:30b-a3b-q4_K_M"
-DEFAULT_FALLBACK_MODEL = "huihui_ai/llama3.3-abliterated"
+# Model slots (RTX 4090-sized, validated April 2026). Primary = best available for
+# the role; secondary = best uncensored alternative. See README "Local LLM stack"
+# and the generated .ai-shell.yaml for per-slot rationale and caveats.
+DEFAULT_PRIMARY_CHAT_MODEL = "qwen3.5:27b"
+DEFAULT_SECONDARY_CHAT_MODEL = "huihui_ai/qwen3.5-abliterated:27b"
+DEFAULT_PRIMARY_CODING_MODEL = "qwen3-coder:30b-a3b-q4_K_M"
+DEFAULT_SECONDARY_CODING_MODEL = "huihui_ai/qwen3-coder-abliterated:30b-a3b-instruct-q4_K_M"
 DEFAULT_CONTEXT_SIZE = 32768
 DEFAULT_OLLAMA_PORT = 11434
 DEFAULT_WEBUI_PORT = 3000

augint_shell-0.80.0/src/ai_shell/templates/ai-shell.yaml ADDED

@@ -0,0 +1,72 @@
+# =============================================================================
+# .ai-shell.yaml - Project configuration for ai-shell
+# =============================================================================
+# Priority (highest wins): CLI flags > env vars > this file > global config > defaults
+# Global config: ~/.ai-shell.yaml (applies to all projects)
+# Full docs:     https://github.com/svange/augint-shell#local-llm-stack
+# -----------------------------------------------------------------------------
+# llm - Local LLM stack (Ollama + Open WebUI + optional TTS / n8n)
+# -----------------------------------------------------------------------------
+# Four role-specific model slots. Primary = best-available; secondary =
+# best uncensored (abliterated) alternative of the same base. Chat slots
+# route to Open WebUI (DEFAULT_MODELS); coding slots route to OpenCode /
+# Aider (--model). `ai-shell llm pull` pulls all 4 slots plus any
+# `extra_models` entries, deduped. See README for per-slot rationale and
+# caveats (Qwen3.5 Ollama tool-call bug, num_ctx trap, tool-count cliff).
+llm:
+  primary_chat_model: qwen3.5:27b
+  secondary_chat_model: huihui_ai/qwen3.5-abliterated:27b
+  primary_coding_model: qwen3-coder:30b-a3b-q4_K_M
+  secondary_coding_model: huihui_ai/qwen3-coder-abliterated:30b-a3b-instruct-q4_K_M
+  context_size: 32768
+  # Additional Ollama tags to pull alongside the 4 slots (deduped).
+  # Uncomment any line to enable -- indentation is already correct.
+  extra_models:
+    # - llama3.1:8b                         # ~5 GB   fast general chat
+    # - llama3.2:latest                     # ~2 GB   Llama 3.2 3B, very fast
+    # - dolphin3:8b                         # ~5 GB   uncensored Llama 3.1 8B
+    # - qwen3:30b-a3b-instruct-2507-q4_K_M  # ~19 GB  Qwen3 MoE chat alt (~196 tok/s)
+    # - qwen2.5-coder:32b-q4_k_m            # ~19 GB  Qwen2.5-Coder (previous gen)
+    # - qwen2.5-coder:14b-instruct          # ~9 GB   smaller Qwen2.5-Coder
+# -----------------------------------------------------------------------------
+# aws - AWS profile + region. Uncomment to override defaults.
+# -----------------------------------------------------------------------------
+# ai_profile:      AWS profile for infra tools (terraform/cdk). Sets AWS_PROFILE
+#                  in the container.
+# bedrock_profile: AWS profile for Bedrock LLM calls (--aws mode). Often a
+#                  different account than ai_profile.
+# region:          Region for Bedrock. Default: us-east-1
+# Auth: ~/.aws is bind-mounted read-write. `aws sso login` on the host as needed.
+#
+# aws:
+#   ai_profile: my-infra-account
+#   bedrock_profile: my-ai-account
+#   region: us-east-1
+# -----------------------------------------------------------------------------
+# claude - Claude Code backend selection. Uncomment to use Bedrock instead of
+# the Anthropic API. Equivalent per-session flag: `ai-shell claude --aws`.
+# -----------------------------------------------------------------------------
+# claude:
+#   provider: aws
+# -----------------------------------------------------------------------------
+# container - Docker image, env, mounts, ports. Uncomment to override.
+# -----------------------------------------------------------------------------
+# image / image_tag:  Override the default svange/augint-shell:latest image.
+# extra_env:          Additional env vars injected into the dev container.
+# extra_volumes:      Additional bind mounts ("/host:/container" or ":/path:ro").
+# ports:              Extra host ports to expose on the dev container.
+#
+# container:
+#   image: svange/augint-shell
+#   image_tag: latest
+#   extra_env:
+#     MY_VAR: value
+#   extra_volumes:
+#     - /host/path:/container/path
+#   ports:
+#     - 9000
+#     - 9229
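The template header states the precedence order: CLI flags, then env vars, then the project file, then the global config, then defaults. One way to picture that order is a layered lookup, sketched here with `collections.ChainMap`. This is an illustration of the precedence rule only; judging from `_apply_config` and `_apply_env_vars`, ai-shell itself appears to apply each layer in turn onto a dataclass, and the override values below are made up.

```python
from collections import ChainMap

# Defaults taken from the diff; the override layers are invented examples.
defaults = {"context_size": 32768, "ollama_port": 11434, "webui_port": 3000}
global_cfg = {"webui_port": 8080}      # e.g. from ~/.ai-shell.yaml
project_cfg = {"context_size": 65536}  # e.g. from ./.ai-shell.yaml
env_vars = {"ollama_port": 11500}      # e.g. from AI_SHELL_OLLAMA_PORT
cli_flags: dict = {}                   # nothing passed on the command line

# ChainMap searches its mappings left to right, so earlier layers win.
effective = ChainMap(cli_flags, env_vars, project_cfg, global_cfg, defaults)

if __name__ == "__main__":
    for key in ("context_size", "ollama_port", "webui_port"):
        print(f"{key} = {effective[key]}")
```

List-valued settings such as `extra_models` do not fit this picture exactly, since the config loader extends them across files instead of letting one layer override another.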

augint_shell-0.78.0/src/ai_shell/templates/ai-shell.yaml DELETED

@@ -1,207 +0,0 @@
-# =============================================================================
-# .ai-shell.yaml - Project configuration for ai-shell
-# =============================================================================
-# Uncomment and modify settings you want to override.
-# Priority (highest wins): CLI flags > env vars > this file > global config > defaults
-# Global config: ~/.config/ai-shell/config.yaml (same format, applies to all projects)
-# Docs: https://github.com/svange/augint-shell
-# =============================================================================
-# aws - Amazon Web Services configuration
-# =============================================================================
-# ai_profile: AWS profile for the AI's working environment (aws cli, terraform,
-#   cdk). Sets AWS_PROFILE in the container. This is the account the AI uses
-#   when running infrastructure commands.
-#   Override with env var: AI_SHELL_AI_PROFILE
-#
-# bedrock_profile: AWS profile for Bedrock LLM API calls. Often a different
-#   account than ai_profile. Overrides AWS_PROFILE specifically for AI tool
-#   processes launched with `--aws`.
-#   Override with env var: AI_SHELL_BEDROCK_PROFILE
-#   Override per-session with: --profile <name> on the CLI
-#
-# region: AWS region for Bedrock API calls. Default: us-east-1
-#   Override with env var: AI_SHELL_AWS_REGION
-#
-# Authentication: ~/.aws is bind-mounted into the container (read-write).
-# SSO, credential files, and config are available automatically.
-# If SSO tokens expire, run 'aws sso login --profile <name>' on the host.
-#
-# aws:
-#   ai_profile: my-infra-account
-#   bedrock_profile: my-ai-account
-#   region: us-east-1
-# =============================================================================
-# claude - Claude Code settings
-# =============================================================================
-# provider: API backend for Claude Code.
-#   "anthropic" - Direct Anthropic API (default, uses ~/.claude credentials)
-#   "aws"       - Amazon Bedrock (uses bedrock_profile from aws section)
-#
-# When provider is "aws":
-#   - CLAUDE_CODE_USE_BEDROCK=1 is set in the environment
-#   - AWS_PROFILE is set to bedrock_profile for Claude's process
-#   - Quick switch with CLI: ai-shell claude --aws
-#   - Override per-session:   ai-shell claude --aws --profile <name>
-#   - Tip: pin Bedrock model versions with ANTHROPIC_DEFAULT_SONNET_MODEL env var
-#
-# Override with env var: AI_SHELL_CLAUDE_PROVIDER
-#
-# claude:
-#   provider: aws
-# Codex runtime note:
-#   ai-shell does not manage Codex's own config file, but `ai-shell codex --aws`
-#   launches Codex with Bedrock by injecting `CLAUDE_CODE_USE_BEDROCK=1` and
-#   setting `AWS_PROFILE` to `aws.bedrock_profile` (or `--profile` if passed).
-#   Local-LLM Codex configuration is not managed in this file.
-# =============================================================================
-# container - Docker container settings
-# =============================================================================
-# image: Docker image (default: svange/augint-shell)
-# image_tag: Image tag (default: current ai-shell version)
-# extra_env: Additional environment variables for the container
-# extra_volumes: Additional bind mounts ("/host/path:/container/path" or ":/path:ro")
-# ports: Additional ports to expose (extends the default dev port set)
-#
-# container:
-#   image: svange/augint-shell
-#   image_tag: latest
-#   extra_env:
-#     MY_VAR: value
-#   extra_volumes:
-#     - /host/path:/container/path
-#   ports:
-#     - 9000
-#     - 9229
-# =============================================================================
-# llm - Local LLM settings (Ollama + Open WebUI)
-# =============================================================================
-# primary_model: Default Ollama model for inference
-# fallback_model: Backup model if primary unavailable
-# context_size: Context window in tokens (default: 32768)
-# ollama_port: Host port for Ollama API (default: 11434)
-# webui_port: Host port for Open WebUI (default: 3000)
-# n8n_port: Host port for n8n workflow automation (default: 5678)
-#
-# Models are downloaded automatically by `ai-shell llm setup`, which:
-#   1. Starts the Ollama container (GPU-enabled if an NVIDIA card is detected)
-#   2. Runs `ollama pull <primary_model>` and `ollama pull <fallback_model>`
-#   3. Applies the context-window Modelfile so both models run with num_ctx set
-#
-# To pull a model manually at any time:
-#   ai-shell llm shell        # opens a bash shell inside the Ollama container
-#   ollama pull <model>       # then run any ollama command directly
-#
-# ai-shell only manages the local LLM endpoint and generic runtime settings.
-# Tool-specific config files for Codex, OpenCode, and Aider should be managed
-# separately, for example via augint-opencodex.
-# Keep tool-specific provider, model, auth, and permission settings out of this
-# file. Put those in the generated tool config files instead.
-#
-# ─── RTX 4090 model guide (24 GiB VRAM) ─────────────────────────────────────
-#
-# General-chat / assistant models
-# ─────────────────────────────────
-# qwen3.5:27b                  ~15 GiB  Q4_K_M  fits on 4090 with headroom
-# qwen3.5:27b-q5_k_m           ~19 GiB  Q5_K_M  fits, higher quality
-# qwen3.5:32b-q4_k_m           ~19 GiB  Q4_K_M  tight but fits on 4090
-#
-# Uncensored / instruction-following variants
-# ────────────────────────────────────────────
-# dolphin3:8b                      ~5 GiB   Dolphin 3.0 (uncensored Llama 3.1 8B)
-# huihui_ai/llama3.3-abliterated   ~16 GiB  Llama 3.3 70B abliterated (uncensored chat)
-# llama3.1:8b                      ~5 GiB   Meta Llama 3.1 8B instruct
-# llama3.1:8b-instruct-q4_k_m      ~5 GiB   Q4_K_M quantized
-# llama3.2:latest                  ~2 GiB   Meta Llama 3.2 3B (fast/small)
-#
-# Coding-heavy models
-# ────────────────────
-# qwen2.5-coder:32b-q4_k_m        ~19 GiB  Q4_K_M  top coding quality on 4090
-# qwen3:14b-q4_k_m               ~9 GiB   Q4_K_M  fast coder with good accuracy
-# qwen3:32b-q4_k_m               ~19 GiB  Q4_K_M  best local coding on 4090
-# qwen3-coder:30b-a3b-q4_K_M     ~19 GiB  Q4_K_M  Qwen3-Coder 30B A3B (Mixture-of-Experts, ~3B active)
-#
-# Quick-start pull commands (run inside `ai-shell llm shell`):
-#   ollama pull qwen3.5:27b
-#   ollama pull qwen3-coder:30b-a3b-q4_K_M
-#   ollama pull huihui_ai/llama3.3-abliterated
-#   ollama pull dolphin3:8b
-#   ollama pull llama3.1:8b
-#   ollama pull qwen2.5-coder:32b-q4_k_m
-#   ollama pull qwen3:14b-q4_k_m
-#   ollama pull qwen3:32b-q4_k_m
-#
-# ─── RTX 5070 Ti model guide (12 GiB VRAM) ──────────────────────────────────
-#
-# Chat models
-# ────────────
-# qwen3.5:9b                              ~7 GiB   Q4_K_M  best chat, 256K ctx, multimodal
-# huihui_ai/qwen3.5-abliterated:9b        ~7 GiB   Q4_K    uncensored Qwen3.5 (abliterated)
-#
-# Coding models
-# ──────────────
-# qwen2.5-coder:14b-instruct              ~9 GiB   Q4_K_M  largest dedicated coder that fits
-# huihui_ai/qwen3.5-abliterated:9b-OmniCoder  ~6 GiB  Q4_K  uncensored coding variant
-#
-# Quick-start pull commands (run inside `ai-shell llm shell`):
-#   ollama pull qwen3.5:9b
-#   ollama pull huihui_ai/qwen3.5-abliterated:9b
-#   ollama pull qwen2.5-coder:14b-instruct
-#   ollama pull huihui_ai/qwen3.5-abliterated:9b-OmniCoder
-#
-# After pulling, set primary_model (and fallback_model) below, then run:
-#   ai-shell llm setup        # pulls models + applies context-window config
-#
-# llm:
-#   primary_model: qwen3-coder:30b-a3b-q4_K_M
-#   fallback_model: huihui_ai/llama3.3-abliterated
-#   context_size: 32768
-#   ollama_port: 11434
-#   webui_port: 3000
-#   n8n_port: 5678
-#
-# --- Example: 4090 coding-focused setup ---
-# llm:
-#   primary_model: qwen2.5-coder:32b-q4_k_m
-#   fallback_model: qwen3:14b-q4_k_m
-#   context_size: 32768
-#
-# --- Example: 4090 general-chat setup ---
-# llm:
-#   primary_model: qwen3.5:27b
-#   fallback_model: llama3.1:8b
-#   context_size: 32768
-#
-# --- Example: 4090 uncensored setup ---
-# llm:
-#   primary_model: dolphin3:8b
-#   fallback_model: llama3.1:8b
-#   context_size: 32768
-#
-# --- Example: 5070 Ti coding-focused setup ---
-# llm:
-#   primary_model: qwen2.5-coder:14b-instruct
-#   fallback_model: qwen3.5:9b
-#   context_size: 32768
-#
-# --- Example: 5070 Ti general-chat setup ---
-# llm:
-#   primary_model: qwen3.5:9b
-#   fallback_model: huihui_ai/qwen3.5-abliterated:9b
-#   context_size: 32768
-#
-# --- Example: 5070 Ti uncensored-chat setup ---
-# llm:
-#   primary_model: huihui_ai/qwen3.5-abliterated:9b
-#   fallback_model: dolphin3:8b
-#   context_size: 32768
-#
-# --- Example: 5070 Ti uncensored-coding setup ---
-# llm:
-#   primary_model: huihui_ai/qwen3.5-abliterated:9b-OmniCoder
-#   fallback_model: qwen2.5-coder:14b-instruct
-#   context_size: 32768