npm - agent-devkit - Versions diffs - 0.3.0 → 0.3.1 - Mend

agent-devkit 0.3.0 → 0.3.1


Files changed (19)

package/README.md +22 -11
package/package.json +1 -1
package/runtime/README.md +20 -10
package/runtime/cli/README.md +14 -6
package/runtime/cli/aikit/__init__.py +1 -1
package/runtime/cli/aikit/app_home.py +1 -0
package/runtime/cli/aikit/cli_parser.py +1 -1
package/runtime/cli/aikit/embedded_mini_brain.py +351 -0
package/runtime/cli/aikit/interactive_wizard.py +6 -8
package/runtime/cli/aikit/llm.py +28 -2
package/runtime/cli/aikit/local_llm.py +19 -4
package/runtime/cli/aikit/local_llm_operator.py +15 -5
package/runtime/cli/aikit/mini_brain.py +56 -44
package/runtime/cli/aikit/model_router.py +42 -9
package/runtime/cli/aikit/natural_prompt_runtime.py +69 -1
package/runtime/cli/aikit/onboarding.py +3 -3
package/runtime/cli/aikit/review_gate.py +14 -2
package/runtime/models/qwen2.5-0.5b-instruct/manifest.json +30 -0
package/runtime/scripts/release-catalog-snapshot.json +1 -1

package/README.md CHANGED

@@ -30,7 +30,7 @@ agent doctor
 Expected version for this release:
 ```text
-agent 0.3.0
+agent 0.3.1
 ```
 ## Quick Start
@@ -45,8 +45,12 @@ agent llm list
 agent commands list
 ```
-Agent DevKit `v0.3.0` also includes deterministic runtime discovery and
-integration commands:
+Agent DevKit `v0.3.1` also includes the embedded Qwen2.5-0.5B mini-brain
+contract for local bootstrap conversations without Ollama, Claude, Codex or API
+keys. The npm package stays small; `agent setup mini-brain --yes` downloads the
+GGUF into `.agent-devkit/models` after explicit opt-in.
+The `v0.3.0` deterministic runtime discovery and integration commands remain
+available:
 ```bash
 agent roadmap
@@ -73,14 +77,15 @@ Run a natural-language task:
 agent "analise o problema relatado no card 9900"
 ```
-Natural-language mode requires an LLM backend. Deterministic commands such as
-`agent agents list`, `agent capabilities list`, `agent doctor`, `agent provider`
-and `agent run` do not require an LLM.
+Natural-language mode can start with the embedded mini-brain. Stronger
+coordinator/reviewer backends remain optional for higher-level work.
+Deterministic commands such as `agent agents list`, `agent capabilities list`,
+`agent doctor`, `agent provider` and `agent run` do not require an external LLM.
 Running `agent` without arguments starts the local onboarding status and wizard:
 memory, personality, LLM backends, Ollama, toolchain, sources and next actions.
-Use `agent onboard minimal` for identity, coordinator LLM, Qwen3-0.6B via
-Ollama and local memory. Use `agent onboard complete` to include toolchain,
+Use `agent onboard minimal` for identity, optional coordinator LLM, installable
+mini-brain and local memory. Use `agent onboard complete` to include toolchain,
 providers/sources, specialist catalog, local automations, tasks, notifications,
 knowledge and shared memory. Both commands return plans; external installs
 still require explicit opt-in.
@@ -97,6 +102,7 @@ Useful operational commands:
 agent plan "analyze Azure card 7914"
 agent execute --dry-run "summarize these logs"
 agent workflow install daily-pr-review --dry-run
+agent setup mini-brain --yes
 agent local-llm doctor
 agent local-llm install qwen3:0.6b --dry-run
 agent skill create my-skill --description "Local skill"
@@ -224,9 +230,11 @@ agent llm configure openrouter --api-key-env OPENROUTER_API_KEY --model openai/g
 agent llm doctor openrouter
 ```
-### Option F: Ollama local backend
+### Embedded mini-brain and Ollama local backend
 ```bash
+agent setup mini-brain --yes
+agent local-llm doctor
 agent ollama status
 agent ollama models
 agent ollama pull qwen3:0.6b --dry-run
@@ -236,8 +244,11 @@ agent llm configure ollama --base-url http://localhost:11434/v1 --model qwen3:0.
 agent llm doctor ollama
 ```
-Ollama is treated as an operational worker for repetitive local tasks. Codex and
-Claude remain the preferred coordinators and reviewers for high-level planning,
+Agent DevKit includes an installable embedded mini-brain for initial
+conversation, onboarding and setup without external authentication. The GGUF is
+downloaded to `.agent-devkit/models` only after opt-in. Ollama is still treated
+as an optional operational worker for repetitive local tasks. Codex and Claude
+remain the preferred coordinators and reviewers for high-level planning,
 software changes, documents, automation decisions and final review.
 ### Switch or override the backend

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "agent-devkit",
-  "version": "0.3.0",
+  "version": "0.3.1",
   "description": "Agent DevKit CLI runtime for specialist AI agents, capabilities and provider-aware automations.",
   "type": "module",
   "license": "MIT",

package/runtime/README.md CHANGED

@@ -65,8 +65,8 @@ agent secrets doctor
 agent mcp tools
 ```
-`agent onboard minimal` planeja o setup essencial: identidade, coordenador LLM,
-mini-cerebro Qwen3-0.6B via Ollama e memoria local. `agent onboard complete`
+`agent onboard minimal` planeja o setup essencial: identidade, coordenador LLM
+opcional, mini-cerebro local instalavel sob demanda e memoria local. `agent onboard complete`
 inclui tambem toolchain, providers/sources, catalogo de agentes, automacoes
 locais, tarefas, notificacoes, knowledge e memoria compartilhada. Ambos
 retornam plano deterministico; instalacoes externas continuam exigindo opt-in.
@@ -247,14 +247,23 @@ Uso:
 agent "roteie este pedido para o agente especialista adequado"
 ```
-### Opcao F: usar Ollama local
+### Mini cerebro embarcado e Ollama local
-O Agent DevKit consegue diagnosticar Ollama, listar modelos, planejar pull e
-usar o backend local como trabalhador operacional. Claude/Codex continuam sendo
-os coordenadores e revisores preferenciais para decisao, especificacao e entrega
-final.
+O Agent DevKit vem com um mini cerebro local baseado no contrato
+`Qwen/Qwen2.5-0.5B-Instruct` para conversa inicial, onboarding, setup e tarefas
+simples sem depender de Claude, Codex, API externa ou Ollama. O pacote npm
+inclui o manifest do modelo; o GGUF e baixado para `.agent-devkit/models` sob
+demanda com `agent setup mini-brain --yes`.
+Ollama continua suportado como pool opcional de workers locais. O Agent DevKit
+consegue diagnosticar Ollama, listar modelos, planejar pull e usar o backend
+local como trabalhador operacional quando ele estiver configurado ou tiver
+modelos instalados. Claude/Codex continuam sendo os coordenadores e revisores
+preferenciais para decisao, especificacao e entrega final.
 ```bash
+agent setup mini-brain --yes
+agent local-llm doctor
 agent ollama status
 agent ollama models
 agent ollama pull qwen3:0.6b --dry-run
@@ -325,9 +334,10 @@ executa a task primaria pelo runner existente e revisa a conclusao pelo
 `review_gate`.
 Para tarefas operacionais como resumo, classificacao, extracao e normalizacao,
-o runtime pode delegar uma subtarefa limitada ao `local-llm-operator` usando
-Ollama. O resultado local aparece em `local_llm_execution` e e usado apenas como
-contexto de apoio pelo coordenador principal.
+o runtime pode usar o mini cerebro embarcado para bootstrap/conversa simples ou
+delegar uma subtarefa limitada ao `local-llm-operator` usando Ollama quando
+disponivel. O resultado local aparece em `local_llm_execution` e e usado apenas
+como contexto de apoio pelo coordenador principal.
 Quando `review_gate.required = true`, o Agent DevKit exige uma segunda revisao
 concreta pelo `execution-reviewer`, preferindo `claude-code` ou `codex-cli`.

package/runtime/cli/README.md CHANGED

@@ -209,8 +209,8 @@ agent onboard minimal
 agent onboard complete
 ```
-`minimal` cobre identidade, coordenador LLM, mini-cerebro Qwen3-0.6B via
-Ollama e memoria local. `complete` inclui tambem toolchain, providers/sources,
+`minimal` cobre identidade, coordenador LLM opcional, mini-cerebro local
+instalavel sob demanda e memoria local. `complete` inclui tambem toolchain, providers/sources,
 catalogo de agentes, automacoes locais, tarefas, notificacoes, knowledge e
 memoria compartilhada. Instalacoes externas continuam exigindo opt-in.
@@ -232,9 +232,12 @@ remoto continua exigindo provider, criptografia e opt-in explicito.
 ## Backends LLM
-O modo `agent "<prompt>"` exige um backend LLM. O Agent DevKit suporta tres
+O modo `agent "<prompt>"` consegue conversar e orientar setup com o mini cerebro
+local depois que ele for instalado com opt-in. Para coordenacao/revisao mais forte, o Agent DevKit suporta estas
 familias de backend:
+- Mini cerebro local instalavel (`embedded-mini-brain`) para onboarding, setup e
+  conversa simples sem autenticacao externa.
 - CLIs oficiais autenticadas fora do Agent DevKit (`codex-cli` e
   `claude-code`).
 - APIs configuradas por referencia a variavel de ambiente (`openai`,
@@ -321,6 +324,8 @@ agent llm doctor openrouter
 ### Ollama local
 ```bash
+agent setup mini-brain --yes
+agent local-llm doctor
 agent ollama status
 agent ollama models
 agent ollama pull qwen3:0.6b --dry-run
@@ -330,12 +335,15 @@ agent llm configure ollama --base-url http://localhost:11434/v1 --model qwen3:0.
 agent llm doctor ollama
 ```
-Ollama e tratado como executor operacional local. Codex e Claude continuam como
-coordenadores/revisores preferenciais para decisao, especificacao, codigo,
-documentos, automacoes e fechamento de entrega.
+O mini cerebro embarcado e a base inicial para conversa/setup sem dependencia
+externa. Ollama e tratado como executor operacional local opcional. Codex e
+Claude continuam como coordenadores/revisores preferenciais para decisao,
+especificacao, codigo, documentos, automacoes e fechamento de entrega.
 Backends suportados no MVP:
+- `embedded-mini-brain`: mini cerebro local embarcado para bootstrap e tarefas
+  simples.
 - `openai`: API OpenAI ou endpoint OpenAI-compatible.
 - `anthropic`: API Anthropic.
 - `openrouter`: API OpenRouter.

package/runtime/cli/aikit/__init__.py CHANGED

@@ -1,3 +1,3 @@
 """Public CLI implementation for AI DevKit."""
-__version__ = "0.3.0"
+__version__ = "0.3.1"

package/runtime/cli/aikit/app_home.py CHANGED

@@ -20,6 +20,7 @@ APP_DIRS = (
     "memory",
     "sessions",
     "tasks",
+    "models",
     "backups",
     "policies",
     "audit",

package/runtime/cli/aikit/cli_parser.py CHANGED

@@ -246,7 +246,7 @@ def build_parser(prog: str | None = None) -> argparse.ArgumentParser:
     setup_parser.add_argument("--json", action="store_true", default=argparse.SUPPRESS, help=argparse.SUPPRESS)
     setup_parser.add_argument("--dry-run", action="store_true", help="show setup plan without installing external tools")
     setup_parser.add_argument("--yes", action="store_true", help="confirm setup actions")
-    setup_parser.add_argument("--set-default", action="store_true", help="make the mini-brain Ollama backend the default LLM")
+    setup_parser.add_argument("--set-default", action="store_true", help="make the embedded mini-brain the default LLM")
     setup_parser.add_argument("action", nargs="?", default="plan", choices=["plan", "personality", "mini-brain"])
     alias_parser = subparsers.add_parser("alias", help="manage local command aliases for agent")
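
Note on the `setup` flags: the sketch below is a minimal, standalone approximation of how `agent setup mini-brain --yes` might be parsed and gated. The argument definitions are copied from the hunk above. The `build_setup_parser` and `gate_outcome` helpers and the `"proceed"` label are hypothetical names introduced here for illustration. The status and exit-code values mirror the `if dry_run or not yes` branch of `setup_embedded_mini_brain` in `embedded_mini_brain.py`.

```python
import argparse


def build_setup_parser() -> argparse.ArgumentParser:
    """Standalone approximation of the `setup` subcommand (hypothetical helper)."""
    parser = argparse.ArgumentParser(prog="agent")
    subparsers = parser.add_subparsers(dest="command")
    setup_parser = subparsers.add_parser("setup")
    setup_parser.add_argument("--dry-run", action="store_true")
    setup_parser.add_argument("--yes", action="store_true")
    setup_parser.add_argument("--set-default", action="store_true")
    setup_parser.add_argument(
        "action", nargs="?", default="plan", choices=["plan", "personality", "mini-brain"]
    )
    return parser


def gate_outcome(dry_run: bool, yes: bool) -> tuple[str, int]:
    """Return (status, exit_code) before any download happens.

    "planned" and "needs-confirmation" come from the diff; "proceed" is a
    placeholder for the path that goes on to download and install.
    """
    if dry_run:
        return ("planned", 0)
    if not yes:
        return ("needs-confirmation", 2)
    return ("proceed", 0)


args = build_setup_parser().parse_args(["setup", "mini-brain", "--yes"])
print(args.action, gate_outcome(args.dry_run, args.yes))
```

Under these assumptions, `--dry-run` takes precedence over `--yes`, and omitting both flags yields a non-zero exit code without touching the disk.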

package/runtime/cli/aikit/embedded_mini_brain.py ADDED

@@ -0,0 +1,351 @@
+"""Embedded mini-brain runtime backed by an on-demand GGUF model."""
+from __future__ import annotations
+import hashlib
+import os
+import shutil
+import subprocess
+import sys
+import urllib.error
+import urllib.request
+from pathlib import Path
+from typing import Any
+from cli.aikit.app_home import app_path, ensure_app_home
+from cli.aikit.identity import identity_system_prompt
+from cli.aikit.runtime_paths import ROOT
+EMBEDDED_BACKEND_ID = "embedded-mini-brain"
+EMBEDDED_MODEL_ID = "Qwen/Qwen2.5-0.5B-Instruct"
+EMBEDDED_MODEL_NAME = "qwen2.5-0.5b-instruct"
+EMBEDDED_MODEL_SOURCE = "https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct-GGUF/resolve/main/qwen2.5-0.5b-instruct-q2_k.gguf"
+EMBEDDED_MODEL_SIZE_BYTES = 415182688
+EMBEDDED_MODEL_PATH = app_path("models", EMBEDDED_MODEL_NAME)
+EMBEDDED_MANIFEST_PATH = ROOT / "models" / EMBEDDED_MODEL_NAME / "manifest.json"
+EMBEDDED_MODEL_FILE = EMBEDDED_MODEL_PATH / "qwen2.5-0.5b-instruct-q2_k.gguf"
+EMBEDDED_MODEL_SHA256 = "9ee36184e616dfc76df4f5dd66f908dbde6979524ae36e6cefb67f532f798cb8"
+EMBEDDED_RUNTIME = "llama-cpp-python"
+EMBEDDED_RUNTIME_REQUIREMENT = "llama-cpp-python>=0.3.9"
+EMBEDDED_MAX_RESPONSE_CHARS = 2000
+DEFAULT_MAX_TOKENS = 220
+DEFAULT_CONTEXT_TOKENS = 2048
+SMOKE_RESPONSE_ENV = "AGENT_DEVKIT_EMBEDDED_SMOKE_RESPONSE"
+SOURCE_ENV = "AGENT_DEVKIT_EMBEDDED_MODEL_SOURCE"
+SKIP_DEP_INSTALL_ENV = "AGENT_DEVKIT_EMBEDDED_SKIP_DEP_INSTALL"
+_LLAMA_CACHE: Any | None = None
+def embedded_mini_brain_status() -> dict[str, Any]:
+    manifest_exists = EMBEDDED_MANIFEST_PATH.exists()
+    model_exists = EMBEDDED_MODEL_FILE.exists()
+    model_sha256 = sha256_file(EMBEDDED_MODEL_FILE) if model_exists else None
+    smoke_mode = bool(os.environ.get(SMOKE_RESPONSE_ENV))
+    model_file_valid = smoke_mode or (model_sha256 == EMBEDDED_MODEL_SHA256 if model_exists else False)
+    dependency = llama_cpp_dependency_status()
+    available = model_file_valid and dependency["status"] == "ok"
+    if available:
+        status = "ok"
+    elif not model_exists:
+        status = "not-installed"
+    elif not model_file_valid:
+        status = "invalid-model"
+    elif dependency["status"] != "ok":
+        status = "dependency-missing"
+    else:
+        status = "missing"
+    return {
+        "kind": "embedded-mini-brain",
+        "id": EMBEDDED_BACKEND_ID,
+        "status": status,
+        "available": available,
+        "configured": model_file_valid,
+        "provider": EMBEDDED_BACKEND_ID,
+        "runtime": EMBEDDED_RUNTIME,
+        "runtime_requirement": EMBEDDED_RUNTIME_REQUIREMENT,
+        "model": EMBEDDED_MODEL_ID,
+        "hf_model": EMBEDDED_MODEL_ID,
+        "model_name": EMBEDDED_MODEL_NAME,
+        "model_path": str(EMBEDDED_MODEL_PATH),
+        "model_file": str(EMBEDDED_MODEL_FILE),
+        "model_file_present": model_exists,
+        "model_file_valid": model_file_valid,
+        "model_file_sha256": model_sha256,
+        "smoke_mode": smoke_mode,
+        "model_size_bytes": EMBEDDED_MODEL_SIZE_BYTES,
+        "download_url": model_source(),
+        "sha256": EMBEDDED_MODEL_SHA256,
+        "manifest_path": str(EMBEDDED_MANIFEST_PATH),
+        "manifest_present": manifest_exists,
+        "dependency": dependency,
+        "auth": "none",
+        "stored_secret": False,
+        "install_command": "agent setup mini-brain --yes",
+        "message": (
+            "Embedded Qwen2.5 mini-brain is available for real local inference."
+            if available
+            else "Embedded mini-brain model is not installed or llama_cpp runtime is missing."
+        ),
+    }
+def invoke_embedded_mini_brain(prompt: str, *, public_name: str = "Agent DevKit") -> str:
+    status = embedded_mini_brain_status()
+    if not status["available"]:
+        raise EmbeddedMiniBrainError(status["message"])
+    smoke_response = os.environ.get(SMOKE_RESPONSE_ENV)
+    if smoke_response:
+        return f"{public_name}: {smoke_response}"[:EMBEDDED_MAX_RESPONSE_CHARS]
+    llama = load_llama()
+    payload = llama.create_chat_completion(
+        messages=[
+            {
+                "role": "system",
+                "content": embedded_system_prompt(public_name),
+            },
+            {
+                "role": "user",
+                "content": prompt,
+            },
+        ],
+        max_tokens=int(os.environ.get("AGENT_DEVKIT_EMBEDDED_MAX_TOKENS", str(DEFAULT_MAX_TOKENS))),
+        temperature=float(os.environ.get("AGENT_DEVKIT_EMBEDDED_TEMPERATURE", "0.2")),
+        top_p=float(os.environ.get("AGENT_DEVKIT_EMBEDDED_TOP_P", "0.9")),
+        repeat_penalty=float(os.environ.get("AGENT_DEVKIT_EMBEDDED_REPEAT_PENALTY", "1.08")),
+    )
+    try:
+        content = str(payload["choices"][0]["message"]["content"]).strip()
+    except (KeyError, IndexError, TypeError) as exc:
+        raise EmbeddedMiniBrainError("Embedded mini-brain returned an unexpected response shape.") from exc
+    if not content:
+        raise EmbeddedMiniBrainError("Embedded mini-brain returned an empty response.")
+    return content[:EMBEDDED_MAX_RESPONSE_CHARS]
+def embedded_backend_doctor() -> dict[str, Any]:
+    status = embedded_mini_brain_status()
+    return {
+        "id": EMBEDDED_BACKEND_ID,
+        "display_name": "Embedded mini-brain",
+        "kind": "embedded-local",
+        "status": status["status"],
+        "configured": status["configured"],
+        "model": EMBEDDED_MODEL_ID,
+        "model_file": status["model_file"],
+        "runtime": EMBEDDED_RUNTIME,
+        "auth_status": "none",
+        "message": status["message"],
+    }
+def embedded_backend_config() -> dict[str, Any]:
+    return {
+        "kind": "embedded-local",
+        "auth": "none",
+        "model": EMBEDDED_MODEL_ID,
+        "runtime": EMBEDDED_RUNTIME,
+        "model_file": str(EMBEDDED_MODEL_FILE),
+    }
+def setup_embedded_mini_brain(*, dry_run: bool = False, yes: bool = False) -> dict[str, Any]:
+    before = embedded_mini_brain_status()
+    plan = embedded_install_plan()
+    if dry_run or not yes:
+        needs_confirmation = not dry_run and not yes
+        return {
+            "kind": "embedded-mini-brain-install",
+            "status": "planned" if dry_run else "needs-confirmation",
+            "ok": bool(dry_run),
+            "exit_code": 2 if needs_confirmation else 0,
+            "dry_run": dry_run,
+            "yes": yes,
+            "before": before,
+            "after": before,
+            "plan": plan,
+            "message": "Use --yes to download the embedded mini-brain model and install its local runtime.",
+        }
+    ensure_app_home()
+    EMBEDDED_MODEL_PATH.mkdir(parents=True, exist_ok=True)
+    download_result = ensure_model_file()
+    dependency_result = ensure_llama_cpp_dependency()
+    after = embedded_mini_brain_status()
+    ok = after.get("available") is True
+    return {
+        "kind": "embedded-mini-brain-install",
+        "status": "ok" if ok else "failed",
+        "ok": ok,
+        "exit_code": 0 if ok else 1,
+        "dry_run": False,
+        "yes": True,
+        "before": before,
+        "after": after,
+        "plan": plan,
+        "download": download_result,
+        "dependency_install": dependency_result,
+    }
+def embedded_install_plan() -> dict[str, Any]:
+    return {
+        "provider": EMBEDDED_BACKEND_ID,
+        "model": EMBEDDED_MODEL_ID,
+        "model_name": EMBEDDED_MODEL_NAME,
+        "download_url": model_source(),
+        "size_bytes": EMBEDDED_MODEL_SIZE_BYTES,
+        "sha256": EMBEDDED_MODEL_SHA256,
+        "destination": str(EMBEDDED_MODEL_FILE),
+        "runtime_requirement": EMBEDDED_RUNTIME_REQUIREMENT,
+        "writes": [
+            str(EMBEDDED_MODEL_FILE),
+            str(app_path("python")),
+        ],
+    }
+def embedded_system_prompt(public_name: str) -> str:
+    return "\n".join(
+        [
+            identity_system_prompt(name=public_name),
+            "Voce e o mini cerebro local embarcado do Agent DevKit.",
+            "Responda em portugues claro quando o usuario escrever em portugues.",
+            "Voce pode conversar, orientar onboarding/setup, explicar capacidades e preparar tarefas simples.",
+            "Nao finja ser Claude, Codex, OpenAI ou Ollama.",
+            "Nao aprove escrita externa, operacoes destrutivas, decisoes finais de seguranca ou revisoes finais.",
+            "Quando a tarefa exigir alto julgamento, diga que pode acionar Claude, Codex, Ollama ou APIs se configurados.",
+        ]
+    )
+def load_llama() -> Any:
+    global _LLAMA_CACHE
+    if _LLAMA_CACHE is not None:
+        return _LLAMA_CACHE
+    try:
+        from llama_cpp import Llama  # type: ignore
+    except ImportError as exc:
+        raise EmbeddedMiniBrainError("llama-cpp-python is required for embedded mini-brain inference.") from exc
+    if not EMBEDDED_MODEL_FILE.exists():
+        raise EmbeddedMiniBrainError(f"Embedded model file not found: {EMBEDDED_MODEL_FILE}")
+    if sha256_file(EMBEDDED_MODEL_FILE) != EMBEDDED_MODEL_SHA256:
+        raise EmbeddedMiniBrainError(f"Embedded model file failed SHA-256 validation: {EMBEDDED_MODEL_FILE}")
+    _LLAMA_CACHE = Llama(
+        model_path=str(EMBEDDED_MODEL_FILE),
+        n_ctx=int(os.environ.get("AGENT_DEVKIT_EMBEDDED_N_CTX", str(DEFAULT_CONTEXT_TOKENS))),
+        n_threads=int(os.environ.get("AGENT_DEVKIT_EMBEDDED_THREADS", str(max(1, min(4, os.cpu_count() or 1))))),
+        verbose=os.environ.get("AGENT_DEVKIT_EMBEDDED_VERBOSE") == "1",
+    )
+    return _LLAMA_CACHE
+def llama_cpp_dependency_status() -> dict[str, Any]:
+    if os.environ.get(SMOKE_RESPONSE_ENV):
+        return {
+            "status": "ok",
+            "module": "llama_cpp",
+            "package": "llama-cpp-python",
+            "mode": "smoke",
+        }
+    try:
+        import llama_cpp  # type: ignore
+    except ImportError:
+        return {
+            "status": "missing",
+            "module": "llama_cpp",
+            "package": "llama-cpp-python",
+        }
+    return {
+        "status": "ok",
+        "module": "llama_cpp",
+        "package": "llama-cpp-python",
+        "version": getattr(llama_cpp, "__version__", None),
+    }
+def ensure_model_file() -> dict[str, Any]:
+    if os.environ.get(SMOKE_RESPONSE_ENV):
+        return {
+            "status": "skipped",
+            "ok": True,
+            "model_file": str(EMBEDDED_MODEL_FILE),
+            "reason": "smoke-mode",
+        }
+    if EMBEDDED_MODEL_FILE.exists() and sha256_file(EMBEDDED_MODEL_FILE) == EMBEDDED_MODEL_SHA256:
+        return {
+            "status": "already-installed",
+            "ok": True,
+            "model_file": str(EMBEDDED_MODEL_FILE),
+            "sha256": EMBEDDED_MODEL_SHA256,
+        }
+    partial = EMBEDDED_MODEL_FILE.with_suffix(EMBEDDED_MODEL_FILE.suffix + ".part")
+    source = model_source()
+    try:
+        if Path(source).expanduser().exists():
+            shutil.copyfile(Path(source).expanduser(), partial)
+        else:
+            with urllib.request.urlopen(source, timeout=120) as response, partial.open("wb") as target:
+                shutil.copyfileobj(response, target)
+    except (OSError, urllib.error.URLError) as exc:
+        return {
+            "status": "failed",
+            "ok": False,
+            "model_file": str(EMBEDDED_MODEL_FILE),
+            "source": source,
+            "message": str(exc),
+        }
+    actual_sha = sha256_file(partial)
+    if actual_sha != EMBEDDED_MODEL_SHA256:
+        return {
+            "status": "failed",
+            "ok": False,
+            "model_file": str(EMBEDDED_MODEL_FILE),
+            "source": source,
+            "sha256": actual_sha,
+            "expected_sha256": EMBEDDED_MODEL_SHA256,
+            "message": "Downloaded embedded model failed SHA-256 validation.",
+        }
+    partial.replace(EMBEDDED_MODEL_FILE)
+    return {
+        "status": "downloaded",
+        "ok": True,
+        "model_file": str(EMBEDDED_MODEL_FILE),
+        "source": source,
+        "sha256": EMBEDDED_MODEL_SHA256,
+    }
+def ensure_llama_cpp_dependency() -> dict[str, Any]:
+    current = llama_cpp_dependency_status()
+    if current.get("status") == "ok":
+        return {"status": "already-installed", "ok": True, "dependency": current}
+    if os.environ.get(SKIP_DEP_INSTALL_ENV) == "1":
+        return {"status": "skipped", "ok": True, "dependency": current, "reason": "disabled-by-env"}
+    command = [sys.executable, "-m", "pip", "install", EMBEDDED_RUNTIME_REQUIREMENT]
+    process = subprocess.run(command, check=False, text=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE, timeout=900)
+    return {
+        "status": "installed" if process.returncode == 0 else "failed",
+        "ok": process.returncode == 0,
+        "command": command,
+        "exit_code": process.returncode,
+        "stdout": process.stdout[-4000:],
+        "stderr": process.stderr[-4000:],
+    }
+def model_source() -> str:
+    return os.environ.get(SOURCE_ENV) or EMBEDDED_MODEL_SOURCE
+def sha256_file(path: Path) -> str:
+    hash_obj = hashlib.sha256()
+    with path.open("rb") as file:
+        for chunk in iter(lambda: file.read(1024 * 1024), b""):
+            hash_obj.update(chunk)
+    return hash_obj.hexdigest()
+class EmbeddedMiniBrainError(RuntimeError):
+    """Raised when embedded local inference cannot run."""
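
Note on `ensure_model_file`: the new module writes the model to a `.part` file, checks its SHA-256 against a pinned digest, and only then renames it into place. The sketch below isolates that pattern for a local source file. `install_verified` is a hypothetical helper name, and the network branch (`urllib.request.urlopen`) is left out to keep the example self-contained. `sha256_file` follows the chunked-read approach shown in the diff.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_file(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large models never load fully into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def install_verified(source: Path, destination: Path, expected_sha256: str) -> bool:
    """Copy to `<destination>.part`, verify the digest, then rename into place.

    Hypothetical helper: returns False on a digest mismatch and, like the
    code in the diff, leaves the `.part` file behind in that case.
    """
    partial = destination.with_suffix(destination.suffix + ".part")
    shutil.copyfile(source, partial)
    if sha256_file(partial) != expected_sha256:
        return False
    partial.replace(destination)
    return True
```

Because the final path only ever appears after validation, a reader of the destination never sees a truncated or tampered file. A caller that wants a clean directory after a failed attempt would have to remove the leftover `.part` file itself.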

package/runtime/cli/aikit/interactive_wizard.py CHANGED

@@ -8,7 +8,7 @@ from typing import Any
 from cli.aikit.core.requests import AgentPromptRequest
 from cli.aikit.core.runtime import run_agent_prompt
 from cli.aikit.llm import BACKENDS, configure_backend
-from cli.aikit.mini_brain import DEFAULT_OLLAMA_MODEL, setup_mini_brain
+from cli.aikit.mini_brain import DEFAULT_OLLAMA_MODEL
 from cli.aikit.ollama import ollama_status
 from cli.aikit.onboarding import onboarding_status
 from cli.aikit.personality import load_personality, update_personality
@@ -104,11 +104,9 @@ def run_interactive_onboarding(result: dict[str, Any]) -> dict[str, Any]:
         print("\nOllama nao foi encontrado.")
         if command:
             print(f"Instalacao sugerida: {command}")
-        print("Depois de instalar, rode `agent setup mini-brain --yes` para baixar o Qwen3-0.6B.")
-    elif ask_yes_no(f"Deseja habilitar o mini cerebro local com {DEFAULT_OLLAMA_MODEL}?", default=False):
-        set_default = ask_yes_no("Usar este mini cerebro como backend LLM padrao?", default=False)
-        setup = setup_mini_brain(yes=True, set_default=set_default)
-        print(setup.get("message") or f"Mini cerebro: {setup.get('status')}")
+        print("O mini cerebro embarcado ja funciona; instale Ollama apenas se quiser workers locais adicionais.")
+    elif ask_yes_no(f"Deseja instalar o modelo Ollama opcional {DEFAULT_OLLAMA_MODEL} para workers locais?", default=False):
+        print("Rode: agent local-llm install " + DEFAULT_OLLAMA_MODEL + " --yes")
     fresh = onboarding_status(ROOT)
     toolchain = fresh.get("toolchain") if isinstance(fresh.get("toolchain"), dict) else {}
@@ -129,7 +127,7 @@ def run_interactive_onboarding(result: dict[str, Any]) -> dict[str, Any]:
 def choose_onboarding_mode() -> str:
     print("\nModos de onboarding:")
-    print("1. minimo: identidade, coordenador LLM, mini-cerebro local e memoria")
+    print("1. minimo: identidade, mini-cerebro local embarcado e memoria")
     print("2. completo: minimo + toolchain, sources, notificacoes, knowledge e memorias")
     print("3. pular")
     answer = ask_text("Escolha o modo", default="minimo").strip().lower()
@@ -161,7 +159,7 @@ def configure_personality_interactively(agent: dict[str, Any]) -> None:
 def configure_llm_interactively() -> None:
-    print("\nNenhum backend LLM coordenador utilizavel foi detectado.")
+    print("\nNenhum backend LLM coordenador externo utilizavel foi detectado.")
     print("Opcoes: claude-code, codex-cli, ollama, openai, anthropic, openrouter, pular")
     choice = ask_text("Qual backend deseja configurar primeiro?", default="pular").strip().lower()
     if choice in {"", "pular", "skip", "cancelar", "cancel"}:

package/runtime/cli/aikit/llm.py CHANGED

@@ -14,6 +14,14 @@ from pathlib import Path
 from typing import Any
 from cli.aikit.app_home import app_home, config_path as app_config_path, ensure_app_home
+from cli.aikit.embedded_mini_brain import (
+    EMBEDDED_BACKEND_ID,
+    EMBEDDED_MODEL_ID,
+    EmbeddedMiniBrainError,
+    embedded_backend_config,
+    embedded_backend_doctor,
+    invoke_embedded_mini_brain,
+)
 from cli.aikit.identity import host_cli_prompt, identity_system_prompt
@@ -33,6 +41,14 @@ class LlmBackend:
 BACKENDS: dict[str, LlmBackend] = {
+    EMBEDDED_BACKEND_ID: LlmBackend(
+        id=EMBEDDED_BACKEND_ID,
+        display_name="Embedded mini-brain",
+        kind="embedded-local",
+        auth="none",
+        default_model=EMBEDDED_MODEL_ID,
+        notes="Uses the Agent DevKit embedded mini-brain for setup, onboarding and low-risk conversation.",
+    ),
     "openai": LlmBackend(
         id="openai",
         display_name="OpenAI API",
@@ -100,7 +116,7 @@ BACKENDS: dict[str, LlmBackend] = {
 ENV_VAR_NAME_PATTERN = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
 DEFAULT_AGENT_TIMEOUT_SECONDS = 120
-DEFAULT_FALLBACK_ORDER = ("claude-code", "codex-cli", "openai", "anthropic", "openrouter", "ollama")
+DEFAULT_FALLBACK_ORDER = ("claude-code", "codex-cli", "openai", "anthropic", "openrouter", "ollama", EMBEDDED_BACKEND_ID)
 def config_home() -> Path:
@@ -323,6 +339,8 @@ def normalize_backend_order(order: str | list[str] | tuple[str, ...]) -> list[st
 def default_backend_config(backend: LlmBackend) -> dict[str, Any]:
+    if backend.id == EMBEDDED_BACKEND_ID:
+        return embedded_backend_config()
     entry: dict[str, Any] = {"kind": backend.kind, "auth": backend.auth}
     if backend.auth == "api-key-env":
         entry["api_key_ref"] = f"env:{backend.api_key_env}"
@@ -346,7 +364,8 @@ def doctor_backends(backend_id: str | None = None) -> dict[str, Any]:
     checks = [doctor_backend(BACKENDS[item], config) for item in ids]
     status = "ok"
-    if any(item["status"] == "missing" for item in checks):
+    missing_statuses = {"missing", "not-installed", "dependency-missing", "invalid-model"}
+    if any(item["status"] in missing_statuses for item in checks):
         status = "partial" if not backend_id else "missing"
     if any(item["status"] == "error" for item in checks):
         status = "error"
@@ -361,6 +380,8 @@ def doctor_backends(backend_id: str | None = None) -> dict[str, Any]:
 def doctor_backend(backend: LlmBackend, config: dict[str, Any]) -> dict[str, Any]:
+    if backend.id == EMBEDDED_BACKEND_ID:
+        return embedded_backend_doctor()
     configured = config.get("llm", {}).get("backends", {}).get(backend.id, {})
     if not isinstance(configured, dict):
         configured = {}
@@ -616,6 +637,11 @@ class LlmPolicyError(LlmInvocationError):
 def invoke_resolved_backend(backend: dict[str, Any], prompt: str, *, public_name: str = "Agent DevKit") -> str:
     kind = backend.get("kind")
     backend_id = backend.get("id")
+    if kind == "embedded-local" and backend_id == EMBEDDED_BACKEND_ID:
+        try:
+            return invoke_embedded_mini_brain(prompt, public_name=public_name)
+        except EmbeddedMiniBrainError as exc:
+            raise LlmInvocationError(str(exc)) from exc
     if kind == "openai-compatible":
         return invoke_openai_compatible(backend, prompt, public_name=public_name)
     if kind == "anthropic":
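
Note on the `doctor_backends` change: the embedded backend reports `not-installed`, `dependency-missing` and `invalid-model` in addition to `missing`, and the hunk above widens the aggregate check to cover all four. The sketch below restates that aggregation as a standalone function. `aggregate_status` is a hypothetical name, and each check is reduced to a dict with only a `status` key.

```python
from typing import Optional

# Statuses treated as "backend not usable yet", as listed in the diff.
MISSING_STATUSES = {"missing", "not-installed", "dependency-missing", "invalid-model"}


def aggregate_status(checks: list[dict[str, str]], backend_id: Optional[str] = None) -> str:
    """Collapse per-backend doctor results into one overall status.

    Without a specific backend_id, any unusable backend downgrades the
    result to "partial"; with one, it becomes "missing". "error" always wins.
    """
    status = "ok"
    if any(item["status"] in MISSING_STATUSES for item in checks):
        status = "partial" if not backend_id else "missing"
    if any(item["status"] == "error" for item in checks):
        status = "error"
    return status


print(aggregate_status([{"status": "ok"}, {"status": "not-installed"}]))
```

Read this way, a fresh install where the GGUF has not yet been downloaded would report `partial` for the overall doctor run, assuming no other backend returns `error`.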

package/runtime/cli/aikit/local_llm.py CHANGED

@@ -6,6 +6,7 @@ import shutil
 import subprocess
 from typing import Any
+from cli.aikit.embedded_mini_brain import EMBEDDED_BACKEND_ID, EMBEDDED_MODEL_ID, embedded_mini_brain_status
 from cli.aikit.mini_brain import DEFAULT_OLLAMA_MODEL, mini_brain_contract
 from cli.aikit.model_router import build_model_plan
 from cli.aikit.ollama import ollama_models, ollama_pull, ollama_status
@@ -28,7 +29,9 @@ def local_llm_list() -> dict[str, Any]:
         "kind": "local-llm",
         "schema_version": LOCAL_LLM_SCHEMA_VERSION,
         "status": "ok",
-        "provider": "ollama",
+        "provider": EMBEDDED_BACKEND_ID,
+        "optional_providers": ["ollama"],
+        "embedded": embedded_mini_brain_status(),
         "mini_brain": contract,
         "workers": [{"id": worker_id, "purpose": purpose} for worker_id, purpose in LOCAL_WORKERS],
         "models": {
@@ -42,12 +45,14 @@ def local_llm_doctor() -> dict[str, Any]:
     status = ollama_status()
     contract = mini_brain_contract(ollama_payload=status)
     model_plan = build_model_plan("resuma estes logs operacionais")
-    ok = status.get("status") == "ok" and contract.get("enabled") is True
+    ok = contract.get("available") is True
     return {
         "kind": "local-llm-doctor",
         "schema_version": LOCAL_LLM_SCHEMA_VERSION,
         "status": "ok" if ok else "partial",
-        "provider": "ollama",
+        "provider": EMBEDDED_BACKEND_ID,
+        "optional_providers": ["ollama"],
+        "embedded": embedded_mini_brain_status(),
         "ollama": status,
         "mini_brain": contract,
         "model_plan": {
@@ -63,9 +68,19 @@ def local_llm_doctor() -> dict[str, Any]:
 def local_llm_models() -> dict[str, Any]:
     payload = ollama_models()
+    embedded = embedded_mini_brain_status()
     payload["kind"] = "local-llm-models"
     payload["schema_version"] = LOCAL_LLM_SCHEMA_VERSION
-    payload["provider"] = "ollama"
+    payload["provider"] = EMBEDDED_BACKEND_ID
+    payload["embedded"] = {
+        "status": embedded.get("status"),
+        "provider": EMBEDDED_BACKEND_ID,
+        "model": EMBEDDED_MODEL_ID,
+        "installed": embedded.get("model_file_valid") is True,
+        "available": embedded.get("available") is True,
+        "install_command": embedded.get("install_command"),
+    }
+    payload["optional_provider"] = "ollama"
     return payload

package/runtime/cli/aikit/local_llm_operator.py CHANGED

@@ -24,7 +24,7 @@ FORBIDDEN_DELEGATION_MARKERS = (
 def maybe_delegate_local_llm(prompt: str, model_plan: dict[str, Any]) -> dict[str, Any]:
-    """Execute a bounded operational task with Ollama when the model plan selected it."""
+    """Execute a bounded operational task with the selected local worker."""
     delegation = model_plan.get("delegation") if isinstance(model_plan.get("delegation"), dict) else {}
     if model_plan.get("strategy") != "mini-brain":
         return skipped(
@@ -38,6 +38,12 @@ def maybe_delegate_local_llm(prompt: str, model_plan: dict[str, Any]) -> dict[st
             "High-risk tasks cannot be delegated to local LLM workers.",
             model_plan=model_plan,
         )
+    if model_plan.get("local_llm_role") != "operational-worker":
+        return skipped(
+            "not-operational-worker",
+            "The embedded mini-brain is acting as the bootstrap coordinator, not as a delegated worker.",
+            model_plan=model_plan,
+        )
     if int(model_plan.get("max_llm_calls") or 0) <= 0:
         return skipped(
             "llm-budget-not-available",
@@ -50,9 +56,10 @@ def maybe_delegate_local_llm(prompt: str, model_plan: dict[str, Any]) -> dict[st
     if any(marker in lowered for marker in FORBIDDEN_DELEGATION_MARKERS):
         return skipped("forbidden", "Prompt contains an action that local LLM workers cannot execute.", model_plan=model_plan)
     delegated_prompt = build_delegated_prompt(prompt, model_plan)
+    provider = str(model_plan.get("local_llm_provider") or "ollama")
     result = invoke_agent_prompt(
         delegated_prompt,
-        "ollama",
+        provider,
         public_name="Local LLM Operator",
         allow_fallback=False,
     )
@@ -64,7 +71,7 @@ def maybe_delegate_local_llm(prompt: str, model_plan: dict[str, Any]) -> dict[st
         "status": "ok" if result.get("ok") else result.get("status", "failed"),
         "ok": bool(result.get("ok")),
         "llm_backend": result.get("llm_backend"),
-        "model_provider": "ollama",
+        "model_provider": provider,
         "mini_brain": summarize_mini_brain(model_plan.get("mini_brain")),
         "strategy": model_plan.get("strategy"),
         "risk": model_plan.get("risk"),
@@ -108,7 +115,7 @@ def enrich_prompt_with_local_result(prompt: str, local_execution: dict[str, Any]
         [
             prompt,
             "",
-            "Contexto operacional produzido pelo local-llm-operator/Ollama:",
+            f"Contexto operacional produzido pelo local-llm-operator/{local_execution.get('model_provider') or local_execution.get('llm_backend') or 'local'}:",
             str(local_execution["response"]),
             "",
             "Use esse contexto apenas como apoio. A decisao, resposta final e revisao continuam sob responsabilidade do coordenador.",
@@ -131,7 +138,10 @@ def skipped(reason: str, message: str, *, model_plan: dict[str, Any]) -> dict[st
         "strategy": model_plan.get("strategy"),
         "risk": model_plan.get("risk"),
         "confidence": model_plan.get("confidence"),
-        "requires_review": bool(model_plan.get("local_llm_recommended") or model_plan.get("local_llm_selected")),
+        "requires_review": bool(
+            model_plan.get("local_llm_role") == "operational-worker"
+            and (model_plan.get("local_llm_recommended") or model_plan.get("local_llm_selected"))
+        ),
     }
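
Two expressions in this file decide which provider receives a delegated prompt and whether a skipped delegation still requires review. The sketch below lifts both into standalone helpers so they can be tried on hypothetical plans. The helper names and the sample plan dictionaries are assumptions, and only the expressions come from the diff.

```python
from __future__ import annotations

from typing import Any


def delegation_provider(model_plan: dict[str, Any]) -> str:
    # Same fallback as the diff: a missing or empty provider means "ollama".
    return str(model_plan.get("local_llm_provider") or "ollama")


def skipped_requires_review(model_plan: dict[str, Any]) -> bool:
    # Review is only requested when the local LLM acts as a delegated worker.
    return bool(
        model_plan.get("local_llm_role") == "operational-worker"
        and (model_plan.get("local_llm_recommended") or model_plan.get("local_llm_selected"))
    )


# Hypothetical plans for illustration.
bootstrap_plan = {"local_llm_role": "bootstrap-coordinator", "local_llm_selected": True}
worker_plan = {
    "local_llm_role": "operational-worker",
    "local_llm_recommended": True,
    "local_llm_provider": "embedded-mini-brain",
}
```

With these inputs, the bootstrap plan falls back to `"ollama"` as provider and does not request review, while the worker plan keeps its own provider and does.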

package/runtime/cli/aikit/mini_brain.py CHANGED

@@ -5,14 +5,20 @@ from __future__ import annotations
 from datetime import datetime, timezone
 from typing import Any
+from cli.aikit.embedded_mini_brain import (
+    EMBEDDED_BACKEND_ID,
+    EMBEDDED_MODEL_ID,
+    embedded_mini_brain_status,
+    setup_embedded_mini_brain,
+)
 from cli.aikit.llm import BACKENDS, configure_backend, doctor_backend, load_config, save_config
-from cli.aikit.ollama import ollama_pull, ollama_status
+from cli.aikit.ollama import ollama_status
 MINI_BRAIN_CONFIG_KEY = "mini_brain"
-DEFAULT_HF_MODEL = "Qwen/Qwen3-0.6B"
+DEFAULT_HF_MODEL = EMBEDDED_MODEL_ID
 DEFAULT_OLLAMA_MODEL = "qwen3:0.6b"
-DEFAULT_PROVIDER = "ollama"
+DEFAULT_PROVIDER = EMBEDDED_BACKEND_ID
 DEFAULT_BASE_URL = "http://localhost:11434/v1"
 ALLOWED_TASKS = [
     "setup_help",
@@ -50,14 +56,15 @@ def mini_brain_contract(
 ) -> dict[str, Any]:
     config = load_config() if config is None else config
     stored = config.get(MINI_BRAIN_CONFIG_KEY) if isinstance(config.get(MINI_BRAIN_CONFIG_KEY), dict) else {}
-    enabled = bool(stored.get("enabled"))
+    enabled = bool(stored.get("enabled", True))
     provider = stored.get("provider") or stored.get("runtime") or DEFAULT_PROVIDER
     hf_model = stored.get("hf_model") or stored.get("model") or DEFAULT_HF_MODEL
     ollama_model = stored.get("ollama_model") or DEFAULT_OLLAMA_MODEL
+    embedded = embedded_mini_brain_status()
     ollama_payload = ollama_status() if ollama_payload is None else ollama_payload
     ollama_backend = doctor_backend(BACKENDS["ollama"], config) if ollama_backend is None else ollama_backend
-    backend_configured = ollama_backend.get("status") == "ok"
-    runtime_available = ollama_payload.get("status") == "ok" or backend_configured
+    ollama_configured = ollama_backend.get("configured") is True
+    runtime_available = embedded.get("available") is True
     available = enabled and provider == DEFAULT_PROVIDER and runtime_available
     status = "ok" if available else "disabled" if not enabled else "unavailable"
     return {
@@ -65,7 +72,8 @@ def mini_brain_contract(
         "status": status,
         "enabled": enabled,
         "available": available,
-        "configured": enabled and provider == DEFAULT_PROVIDER and backend_configured,
+        "configured": available,
+        "embedded_configured": provider == DEFAULT_PROVIDER,
         "provider": provider,
         "runtime": provider,
         "hf_model": hf_model,
@@ -76,6 +84,7 @@ def mini_brain_contract(
         "limits": dict_value(stored.get("limits"), DEFAULT_LIMITS),
         "guardrails": list_value(stored.get("guardrails"), DEFAULT_GUARDRAILS),
         "stored_secret": False,
+        "embedded": embedded,
         "ollama": {
             "status": ollama_payload.get("status"),
             "daemon": (ollama_payload.get("daemon") or {}).get("status")
@@ -87,6 +96,7 @@ def mini_brain_contract(
             "status": ollama_backend.get("status"),
             "model": ollama_backend.get("model"),
             "base_url": ollama_backend.get("base_url"),
+            "configured": ollama_configured,
         },
     }
@@ -98,6 +108,7 @@ def setup_mini_brain(
     set_default: bool = False,
     model: str = DEFAULT_OLLAMA_MODEL,
 ) -> dict[str, Any]:
+    embedded = embedded_mini_brain_status()
     if dry_run or not yes:
         status = "planned" if dry_run else "needs-confirmation"
         needs_confirmation = not dry_run and not yes
@@ -110,53 +121,46 @@ def setup_mini_brain(
             "yes": yes,
             "stored_secret": False,
             "mini_brain": planned_contract(model=model),
-            "pull": ollama_pull(model, yes=False, dry_run=dry_run),
+            "embedded": embedded,
+            "embedded_install": setup_embedded_mini_brain(dry_run=True, yes=False),
+            "ollama_setup": {
+                "status": "skipped",
+                "ok": True,
+                "provider": "ollama",
+                "model": model,
+                "message": "Ollama is optional; use `agent local-llm install` to add local worker models.",
+            },
             "next_steps": ["agent setup mini-brain --yes"],
-            "message": "Use --yes to pull Qwen3-0.6B with Ollama and enable the mini-brain.",
+            "message": "Use --yes to download and enable the embedded Qwen2.5-0.5B mini-brain.",
         }
-    pull = ollama_pull(model, yes=True, dry_run=False)
-    toolchain_install = None
-    if pull.get("status") == "missing":
-        from cli.aikit.toolchain import install_toolchain
-        toolchain_install = install_toolchain(None, "ollama", dry_run=False, yes=True)
-        if toolchain_install.get("status") == "installed":
-            pull = ollama_pull(model, yes=True, dry_run=False)
-    if not pull.get("ok"):
-        payload = {
+    embedded_install = setup_embedded_mini_brain(dry_run=False, yes=True)
+    embedded = embedded_mini_brain_status()
+    if embedded_install.get("ok") is not True:
+        return {
             "kind": "mini-brain-setup",
             "status": "failed",
             "ok": False,
-            "exit_code": int(pull.get("exit_code") or 2),
+            "exit_code": embedded_install.get("exit_code", 1),
             "dry_run": False,
             "yes": True,
             "stored_secret": False,
-            "mini_brain": planned_contract(model=model),
-            "pull": pull,
-            "next_steps": ["Install Ollama or run agent ollama pull qwen3:0.6b --yes"],
-            "message": pull.get("message") or "Could not pull the mini-brain model.",
+            "mini_brain": mini_brain_contract(),
+            "embedded": embedded,
+            "embedded_install": embedded_install,
+            "ollama_setup": {
+                "status": "skipped",
+                "ok": True,
+                "provider": "ollama",
+                "model": model,
+                "message": "Ollama remains optional for additional local worker models.",
+            },
+            "message": "Embedded mini-brain setup failed before the backend could be enabled.",
         }
-        if toolchain_install:
-            payload["toolchain_install"] = toolchain_install
-            payload["next_steps"] = [
-                "Review `agent toolchain doctor ollama`.",
-                "Run `agent toolchain install ollama --yes` if you approve external installation.",
-                "Then run `agent setup mini-brain --yes` again.",
-            ]
-        return payload
-    existing_config = load_config()
-    existing_ollama = (
-        existing_config.get("llm", {}).get("backends", {}).get(DEFAULT_PROVIDER)
-        if isinstance(existing_config.get("llm"), dict)
-        else {}
-    )
-    existing_base_url = existing_ollama.get("base_url") if isinstance(existing_ollama, dict) else None
     configured = configure_backend(
         DEFAULT_PROVIDER,
-        base_url=existing_base_url or DEFAULT_BASE_URL,
-        model=model,
+        model=DEFAULT_HF_MODEL,
         set_default=set_default,
     )
     config = load_config()
@@ -172,8 +176,15 @@ def setup_mini_brain(
         "stored_secret": False,
         "config_path": str(written_path),
         "mini_brain": contract,
-        "pull": pull,
-        "toolchain_install": toolchain_install,
+        "embedded": embedded,
+        "embedded_install": embedded_install,
+        "ollama_setup": {
+            "status": "skipped",
+            "ok": True,
+            "provider": "ollama",
+            "model": model,
+            "message": "Ollama remains optional for additional local worker models.",
+        },
         "llm_configure": configured,
         "next_steps": ["Use low-risk setup, wizard and summary prompts normally."],
     }
@@ -196,6 +207,7 @@ def planned_contract(*, model: str = DEFAULT_OLLAMA_MODEL) -> dict[str, Any]:
         "limits": dict(DEFAULT_LIMITS),
         "guardrails": list(DEFAULT_GUARDRAILS),
         "stored_secret": False,
+        "embedded": embedded_mini_brain_status(),
     }
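
The status logic in `mini_brain_contract` changed in two ways: `enabled` now defaults to `True` when nothing is stored, and availability depends on the embedded runtime rather than on Ollama. The sketch below isolates that derivation. The function name is hypothetical, and the default provider value `"embedded-mini-brain"` is an assumption taken from the manifest's `provider` field, since `EMBEDDED_BACKEND_ID` itself is defined in a file not shown in this section.

```python
from __future__ import annotations

from typing import Any

DEFAULT_PROVIDER = "embedded-mini-brain"  # assumed value of EMBEDDED_BACKEND_ID


def contract_status(stored: dict[str, Any], *, embedded_available: bool) -> str:
    """Mirror the enabled/available/status derivation from the diff."""
    enabled = bool(stored.get("enabled", True))
    provider = stored.get("provider") or stored.get("runtime") or DEFAULT_PROVIDER
    available = enabled and provider == DEFAULT_PROVIDER and embedded_available
    return "ok" if available else "disabled" if not enabled else "unavailable"
```

Under this reading, an empty stored config with the model installed yields `"ok"`, an explicit `"enabled": False` yields `"disabled"`, and a stored provider other than the default yields `"unavailable"` even when the embedded model is present.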

package/runtime/cli/aikit/model_router.py CHANGED

@@ -5,6 +5,7 @@ from __future__ import annotations
 import re
 from typing import Any
+from cli.aikit.embedded_mini_brain import EMBEDDED_BACKEND_ID
 from cli.aikit.llm import BACKENDS, doctor_backend, llm_preference, load_config
 from cli.aikit.mini_brain import mini_brain_contract
 from cli.aikit.ollama import ollama_status
@@ -14,6 +15,9 @@ from cli.aikit.write_policy import normalize_write_policy, write_policy_public_f
 OPERATIONAL_PATTERN = re.compile(
     r"(?i)\b(resum\w*|sumari\w*|classifi\w*|extra(?:i|ir|ia|cao|ção)\w*|normaliz\w*|compar\w*|logs?|rascunho|agrupe|agrupar)\b"
 )
+SIMPLE_CHAT_SETUP_PATTERN = re.compile(
+    r"(?i)\b(ol[aá]|oi|bom dia|boa tarde|boa noite|ajuda|help|comec(?:ar|o)|começ(?:ar|o)|setup|onboard|configur|instal|usar)\b"
+)
 HIGH_LEVEL_PATTERN = re.compile(
     r"(?i)\b(arquitet|decid|aprovar|reprovar|especifica|requisit|implemente|codigo|c[oó]digo|documento|automac|deploy|seguran)\b"
 )
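
The added `SIMPLE_CHAT_SETUP_PATTERN` can be exercised on its own to see which prompts it routes as simple chat or setup. The pattern below is copied from the hunk above, while the `matches` helper and the sample prompts are illustrative. One observation from running it: the alternation is wrapped in `\b...\b`, so stems such as `configur`, `instal` and `onboard` appear to match only as whole words, not as prefixes of longer words. The diff does not say whether that is intended.

```python
import re

SIMPLE_CHAT_SETUP_PATTERN = re.compile(
    r"(?i)\b(ol[aá]|oi|bom dia|boa tarde|boa noite|ajuda|help|comec(?:ar|o)|começ(?:ar|o)|setup|onboard|configur|instal|usar)\b"
)


def matches(prompt: str) -> bool:
    """Return True when the prompt contains one of the chat/setup keywords."""
    return bool(SIMPLE_CHAT_SETUP_PATTERN.search(prompt))


samples = [
    "Olá, tudo bem?",
    "help me with setup",
    "quero usar o agent",
    "quero configurar o agent",
    "como instalar?",
    "onboarding",
]
results = {prompt: matches(prompt) for prompt in samples}
```

The first three prompts match. The last three do not, because the trailing `\b` fails between the stem and the letters that follow it.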
@@ -48,7 +52,9 @@ def build_model_plan(
     mini_brain = mini_brain_contract(config=config, ollama_payload=ollama, ollama_backend=ollama_backend)
     local_available = mini_brain.get("available") is True
     operational = bool(OPERATIONAL_PATTERN.search(prompt))
+    simple_chat_setup = bool(SIMPLE_CHAT_SETUP_PATTERN.search(prompt))
     high_level = bool(HIGH_LEVEL_PATTERN.search(prompt))
+    local_provider = select_local_provider(ollama_payload=ollama, ollama_backend=ollama_backend, mini_brain=mini_brain)
     policy = choose_model_strategy(
         prompt,
         route=route,
@@ -56,10 +62,12 @@ def build_model_plan(
         specialist_tasks=specialist_tasks or [],
         configuration_tasks=configuration_tasks or [],
         operational=operational,
+        simple_chat_setup=simple_chat_setup,
         high_level=high_level,
         local_available=local_available,
     )
     use_local = policy["strategy"] == "mini-brain" and local_available
+    delegate_local = use_local and operational
     return {
         "kind": "model-plan",
         "status": "planned",
@@ -73,22 +81,30 @@ def build_model_plan(
         "max_llm_calls": policy["max_llm_calls"],
         "intent": route.get("intent") if route else "llm",
         "primary_coordinators": coordinator_order(preference),
-        "local_llm_role": "operational-worker",
+        "local_llm_role": "operational-worker" if operational else "bootstrap-coordinator",
         "local_llm_available": local_available,
-        "local_llm_provider": mini_brain.get("provider") or "ollama",
-        "local_llm_backend_configured": ollama_backend.get("status") == "ok",
+        "local_llm_provider": local_provider,
+        "local_llm_backend_configured": ollama_backend.get("configured") is True if local_provider == "ollama" else True,
         "local_llm_runtime": {
-            "binary_status": ollama.get("status"),
+            "provider": local_provider,
+            "binary_status": ollama.get("status") if local_provider == "ollama" else "embedded",
             "backend_status": ollama_backend.get("status"),
-            "model": mini_brain.get("ollama_model") or ollama_backend.get("model"),
-            "base_url": ollama_backend.get("base_url"),
+            "model": (mini_brain.get("ollama_model") or ollama_backend.get("model")) if local_provider == "ollama" else mini_brain.get("hf_model"),
+            "base_url": ollama_backend.get("base_url") if local_provider == "ollama" else None,
+        },
+        "optional_local_providers": {
+            "ollama": {
+                "status": ollama.get("status"),
+                "backend_status": ollama_backend.get("status"),
+                "model_count": ollama.get("model_count"),
+            }
         },
         "mini_brain": mini_brain,
         "local_llm_recommended": operational,
         "local_llm_selected": use_local,
         "delegation": {
-            "allowed": policy["strategy"] == "mini-brain",
-            "selected": use_local,
+            "allowed": policy["strategy"] == "mini-brain" and operational,
+            "selected": delegate_local,
             "reason": local_reason(
                 operational=operational,
                 local_available=local_available,
@@ -110,6 +126,7 @@ def choose_model_strategy(
     specialist_tasks: list[dict[str, Any]],
     configuration_tasks: list[dict[str, Any]],
     operational: bool,
+    simple_chat_setup: bool,
     high_level: bool,
     local_available: bool,
 ) -> dict[str, Any]:
@@ -154,7 +171,7 @@ def choose_model_strategy(
             max_llm_calls=0,
             matrix="Conhecida + estruturada + baixo risco -> automacao",
         )
-    if operational and not high_level:
+    if (operational or simple_chat_setup) and not high_level:
         return policy(
             "mini-brain" if local_available else "external-llm",
             "The prompt is operational and low-risk; local mini-brain is preferred when available.",
@@ -241,3 +258,19 @@ def local_reason(*, operational: bool, local_available: bool, high_level: bool,
     if operational and not local_available:
         return "Task is operational, but the local mini-brain is not enabled or available; coordinator/API fallback should execute."
     return "Task requires coordinator-level reasoning or review."
+def select_local_provider(
+    *,
+    ollama_payload: dict[str, Any],
+    ollama_backend: dict[str, Any],
+    mini_brain: dict[str, Any],
+) -> str:
+    ollama_ready = (
+        ollama_payload.get("status") == "ok"
+        and ollama_backend.get("status") == "ok"
+        and (ollama_backend.get("configured") is True or int(ollama_payload.get("model_count") or 0) > 0)
+    )
+    if ollama_ready:
+        return "ollama"
+    return str(mini_brain.get("provider") or EMBEDDED_BACKEND_ID)
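
The new `select_local_provider` prefers Ollama only when its binary, its backend and at least one of "configured" or "has models" all check out, and otherwise falls back to the mini-brain provider. The sketch below reproduces the function so it can be run against hypothetical status payloads. The value given to `EMBEDDED_BACKEND_ID` is an assumption based on the manifest's `provider` field, and the payload dictionaries are invented for illustration.

```python
from __future__ import annotations

from typing import Any

EMBEDDED_BACKEND_ID = "embedded-mini-brain"  # assumed value; defined elsewhere in the package


def select_local_provider(
    *,
    ollama_payload: dict[str, Any],
    ollama_backend: dict[str, Any],
    mini_brain: dict[str, Any],
) -> str:
    ollama_ready = (
        ollama_payload.get("status") == "ok"
        and ollama_backend.get("status") == "ok"
        and (ollama_backend.get("configured") is True or int(ollama_payload.get("model_count") or 0) > 0)
    )
    if ollama_ready:
        return "ollama"
    return str(mini_brain.get("provider") or EMBEDDED_BACKEND_ID)


# Hypothetical payloads: Ollama healthy with two models vs. Ollama missing.
with_ollama = select_local_provider(
    ollama_payload={"status": "ok", "model_count": 2},
    ollama_backend={"status": "ok"},
    mini_brain={"provider": EMBEDDED_BACKEND_ID},
)
without_ollama = select_local_provider(
    ollama_payload={"status": "missing"},
    ollama_backend={"status": "missing"},
    mini_brain={},
)
```

Note that a healthy Ollama install takes precedence over the embedded provider here, which is what lets the review gate treat the two providers differently.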

package/runtime/cli/aikit/natural_prompt_runtime.py CHANGED

@@ -138,6 +138,42 @@ def local_capabilities_help_response(prompt: str, *, name: str) -> dict[str, Any
     }
+def embedded_mini_brain_install_response(prompt: str, *, name: str, model_plan: dict[str, Any]) -> dict[str, Any]:
+    embedded = (
+        ((model_plan.get("mini_brain") or {}).get("embedded") or {})
+        if isinstance(model_plan.get("mini_brain"), dict)
+        else {}
+    )
+    status = embedded.get("status") or "not-installed"
+    response = (
+        f"Eu sou {name}. Consigo orientar o setup inicial localmente, mas o mini-cerebro local ainda nao esta instalado "
+        f"(status: {status}). Para habilitar conversa local sem Claude, Codex, Ollama ou API externa, execute "
+        "`agent setup mini-brain --yes`. Sem esse download, posso continuar com onboarding, memoria, wizards e "
+        "capabilities deterministicas."
+    )
+    return {
+        "kind": "agent",
+        "status": "needs-setup",
+        "ok": False,
+        "requires_llm": False,
+        "prompt_received": True,
+        "prompt_length": len(prompt),
+        "mode": "embedded-mini-brain-not-installed",
+        "identity": {"name": name, "source": "local"},
+        "llm_backend": "embedded-mini-brain",
+        "mini_brain": model_plan.get("mini_brain"),
+        "response": response,
+        "message": "Embedded mini-brain is not installed yet.",
+        "next_steps": [
+            "agent setup mini-brain --dry-run",
+            "agent setup mini-brain --yes",
+            "agent llm configure claude-code --set-default",
+            "agent llm configure codex-cli --set-default",
+        ],
+        "exit_code": 2,
+    }
 def agent_requires_llm(args: argparse.Namespace) -> dict[str, Any]:
     prompt = " ".join(args.prompt).strip()
     return run_agent_prompt_request(
@@ -237,9 +273,15 @@ def run_agent_prompt_request(request: AgentPromptRequest) -> dict[str, Any]:
     )
     local_llm_execution = maybe_delegate_local_llm(prompt, model_plan)
     coordinator_prompt = enrich_prompt_with_local_result(contextual_prompt, local_llm_execution)
+    requested_backend = request.llm
+    if should_prompt_for_embedded_install(model_plan, requested_backend=request.llm):
+        result = embedded_mini_brain_install_response(prompt, name=name, model_plan=model_plan)
+        return finalize_agent_session(result, session, prompt, backend="embedded-mini-brain")
+    if should_use_embedded_coordinator(model_plan, requested_backend=request.llm):
+        requested_backend = "embedded-mini-brain"
     result = invoke_agent_prompt(
         coordinator_prompt,
-        request.llm,
+        requested_backend,
         public_name=name,
         allow_fallback=not request.no_llm_fallback,
     )
@@ -278,6 +320,32 @@ def run_agent_prompt_request(request: AgentPromptRequest) -> dict[str, Any]:
     return finalize_agent_session(result, session, prompt, backend=result.get("llm_backend") or request.llm)
+def should_use_embedded_coordinator(model_plan: dict[str, Any], *, requested_backend: str | None) -> bool:
+    if requested_backend:
+        return False
+    return (
+        model_plan.get("strategy") == "mini-brain"
+        and model_plan.get("local_llm_provider") == "embedded-mini-brain"
+        and model_plan.get("risk") == "low"
+    )
+def should_prompt_for_embedded_install(model_plan: dict[str, Any], *, requested_backend: str | None) -> bool:
+    if requested_backend:
+        return False
+    embedded = (
+        ((model_plan.get("mini_brain") or {}).get("embedded") or {})
+        if isinstance(model_plan.get("mini_brain"), dict)
+        else {}
+    )
+    return (
+        model_plan.get("strategy") in {"mini-brain", "external-llm"}
+        and model_plan.get("local_llm_provider") == "embedded-mini-brain"
+        and embedded.get("available") is not True
+        and model_plan.get("fallback") == "configure-local-mini-brain-or-use-external-llm"
+    )
 def mark_review_task_needs_review(execution_plan: dict[str, Any], review_result: dict[str, Any]) -> dict[str, Any]:
     task = dict(execution_plan.get("review_task") or {})
     if task:
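
The two predicates added to this file decide, before any LLM call, whether to answer with install guidance or to route the prompt to the embedded coordinator. The sketch below copies both from the diff and evaluates them on hypothetical model plans. The plan dictionaries are assumptions built only from keys that the predicates read.

```python
from __future__ import annotations

from typing import Any


def should_use_embedded_coordinator(model_plan: dict[str, Any], *, requested_backend: str | None) -> bool:
    if requested_backend:
        return False
    return (
        model_plan.get("strategy") == "mini-brain"
        and model_plan.get("local_llm_provider") == "embedded-mini-brain"
        and model_plan.get("risk") == "low"
    )


def should_prompt_for_embedded_install(model_plan: dict[str, Any], *, requested_backend: str | None) -> bool:
    if requested_backend:
        return False
    embedded = (
        ((model_plan.get("mini_brain") or {}).get("embedded") or {})
        if isinstance(model_plan.get("mini_brain"), dict)
        else {}
    )
    return (
        model_plan.get("strategy") in {"mini-brain", "external-llm"}
        and model_plan.get("local_llm_provider") == "embedded-mini-brain"
        and embedded.get("available") is not True
        and model_plan.get("fallback") == "configure-local-mini-brain-or-use-external-llm"
    )


# Hypothetical plans: model installed vs. model not yet downloaded.
installed_plan = {
    "strategy": "mini-brain",
    "local_llm_provider": "embedded-mini-brain",
    "risk": "low",
    "mini_brain": {"embedded": {"available": True}},
}
missing_plan = {
    "strategy": "external-llm",
    "local_llm_provider": "embedded-mini-brain",
    "fallback": "configure-local-mini-brain-or-use-external-llm",
    "mini_brain": {"embedded": {"available": False}},
}
```

In both predicates an explicitly requested backend wins: passing any non-empty `requested_backend` returns `False`, so a user's `--llm` style choice is never overridden by the embedded path.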

package/runtime/cli/aikit/onboarding.py CHANGED

@@ -106,16 +106,16 @@ def onboarding_plan(root: Path, mode: str) -> dict[str, Any]:
         ),
         plan_step(
             "coordinator-llm",
-            "Registrar Claude Code, Codex CLI ou API como coordenador/planejador/revisor.",
+            "Registrar Claude Code, Codex CLI ou API como coordenador/planejador/revisor opcional para tarefas de alto nivel.",
             "agent llm list",
             write_policy="local_config_write",
         ),
         plan_step(
             "mini-brain",
-            "Habilitar Qwen3-0.6B via Ollama para conversa simples, setup e tarefas operacionais leves.",
+            "Validar o mini cerebro embarcado Qwen2.5-0.5B para conversa simples, setup e tarefas operacionais leves.",
             "agent setup mini-brain --dry-run",
             write_policy="local_config_write",
-            model="qwen3:0.6b",
+            model="Qwen/Qwen2.5-0.5B-Instruct",
         ),
         plan_step(
             "sessions-and-memory",

package/runtime/cli/aikit/review_gate.py CHANGED

@@ -24,13 +24,13 @@ def build_review_gate(
     if route:
         required = True
         reasons.append("deterministic-route")
-    if model_plan and (model_plan.get("local_llm_selected") or model_plan.get("local_llm_recommended")):
+    if model_plan and local_worker_review_required(model_plan):
         required = True
         reasons.append("local-llm")
     if model_plan and model_plan.get("strategy") == "human":
         required = True
         reasons.append("human-strategy")
-    if model_plan and model_plan.get("strategy") == "mini-brain":
+    if model_plan and mini_brain_review_required(model_plan):
         required = True
         reasons.append("mini-brain")
     if model_plan and model_plan.get("risk") == "high":
@@ -50,6 +50,18 @@ def build_review_gate(
     }
+def local_worker_review_required(model_plan: dict[str, Any]) -> bool:
+    if not (model_plan.get("local_llm_selected") or model_plan.get("local_llm_recommended")):
+        return False
+    return model_plan.get("local_llm_provider") == "ollama" or model_plan.get("risk") != "low"
+def mini_brain_review_required(model_plan: dict[str, Any]) -> bool:
+    if model_plan.get("strategy") != "mini-brain":
+        return False
+    return model_plan.get("risk") != "low" or model_plan.get("local_llm_provider") == "ollama"
 def mark_reviewed(payload: dict[str, Any], *, reviewer: str | None = None, notes: str | None = None) -> dict[str, Any]:
     gate = dict(payload)
     if gate.get("required"):
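
The review gate now skips the `local-llm` and `mini-brain` reasons for low-risk plans served by the embedded provider, while Ollama-backed or higher-risk plans still trigger them. The sketch below copies the two new helpers and tabulates their results for a few hypothetical plans. The plan values are illustrative and not taken from the package.

```python
from __future__ import annotations

from typing import Any


def local_worker_review_required(model_plan: dict[str, Any]) -> bool:
    if not (model_plan.get("local_llm_selected") or model_plan.get("local_llm_recommended")):
        return False
    return model_plan.get("local_llm_provider") == "ollama" or model_plan.get("risk") != "low"


def mini_brain_review_required(model_plan: dict[str, Any]) -> bool:
    if model_plan.get("strategy") != "mini-brain":
        return False
    return model_plan.get("risk") != "low" or model_plan.get("local_llm_provider") == "ollama"


def review_reasons(model_plan: dict[str, Any]) -> list[str]:
    """Collect only the two reasons affected by this change."""
    reasons = []
    if local_worker_review_required(model_plan):
        reasons.append("local-llm")
    if mini_brain_review_required(model_plan):
        reasons.append("mini-brain")
    return reasons


# Hypothetical plans for illustration.
base = {"strategy": "mini-brain", "local_llm_selected": True}
embedded_low = {**base, "local_llm_provider": "embedded-mini-brain", "risk": "low"}
ollama_low = {**base, "local_llm_provider": "ollama", "risk": "low"}
embedded_medium = {**base, "local_llm_provider": "embedded-mini-brain", "risk": "medium"}
```

Only `embedded_low` produces an empty list. The other two plans yield both reasons, matching the 0.3.0 behaviour for those cases.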

package/runtime/models/qwen2.5-0.5b-instruct/manifest.json ADDED

@@ -0,0 +1,30 @@
+{
+  "schema_version": "agent-devkit.embedded-model/v1",
+  "model_id": "Qwen/Qwen2.5-0.5B-Instruct",
+  "model_name": "qwen2.5-0.5b-instruct",
+  "artifact": {
+    "filename": "qwen2.5-0.5b-instruct-q2_k.gguf",
+    "format": "gguf",
+    "quantization": "q2_k",
+    "size_bytes": 415182688,
+    "sha256": "9ee36184e616dfc76df4f5dd66f908dbde6979524ae36e6cefb67f532f798cb8",
+    "source": "https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct-GGUF/resolve/main/qwen2.5-0.5b-instruct-q2_k.gguf"
+  },
+  "provider": "embedded-mini-brain",
+  "license": "apache-2.0",
+  "runtime": "llama-cpp-python",
+  "purpose": [
+    "setup_help",
+    "wizard_conversation",
+    "intent_classification",
+    "command_explanation",
+    "short_error_summary"
+  ],
+  "guardrails": [
+    "no_secrets",
+    "low_risk_only",
+    "no_external_writes",
+    "coordinator_review_required"
+  ],
+  "notes": "This manifest declares the embedded mini-brain artifact downloaded on demand into .agent-devkit/models. Ollama remains an optional local worker pool for additional models."
+}
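
The manifest pins the artifact by `size_bytes` and `sha256`, which is enough to validate a downloaded file before loading it. The sketch below shows one way such a check could look. The function name is hypothetical, and the package's own validation lives in `embedded_mini_brain.py`, which is not shown in this section and may work differently.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def verify_artifact(path: Path, *, size_bytes: int, sha256: str) -> bool:
    """Check a downloaded file against the size and digest from a manifest."""
    # Cheap check first: a size mismatch avoids hashing a large file.
    if path.stat().st_size != size_bytes:
        return False
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Hash in 1 MiB chunks so a ~400 MB GGUF is never fully in memory.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == sha256.lower()
```

A caller would pass the `artifact.size_bytes` and `artifact.sha256` values read from the manifest, together with the path of the file stored under `.agent-devkit/models`.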

package/runtime/scripts/release-catalog-snapshot.json CHANGED

@@ -1,6 +1,6 @@
 {
   "schema_version": "ai-devkit.release-catalog-snapshot/v1",
-  "version": "0.3.0",
+  "version": "0.3.1",
   "summary": {
     "agents": 48,
     "capabilities": 397,