PyPI - sigit-code - Versions diffs - 0.1.1__tar.gz - Mend

sigit-code 0.1.1__tar.gz

Files changed (34)

sigit_code-0.1.1/.agents/AGENTS.md +0 -0
sigit_code-0.1.1/.agents/skills/agent-client-protocol/SKILL.md +314 -0
sigit_code-0.1.1/.agents/skills/ai-assisted-coding/SKILL.md +361 -0
sigit_code-0.1.1/.agents/skills/tool-calling/SKILL.md +283 -0
sigit_code-0.1.1/.github/workflows/ci.yml +113 -0
sigit_code-0.1.1/.github/workflows/release-github.yml +187 -0
sigit_code-0.1.1/.github/workflows/release-homebrew.yml +116 -0
sigit_code-0.1.1/.github/workflows/release-npm.yml +212 -0
sigit_code-0.1.1/.github/workflows/release-pypi.yml +187 -0
sigit_code-0.1.1/.gitignore +1 -0
sigit_code-0.1.1/.nvmrc +1 -0
sigit_code-0.1.1/Cargo.lock +7909 -0
sigit_code-0.1.1/Cargo.toml +44 -0
sigit_code-0.1.1/LICENSE +190 -0
sigit_code-0.1.1/PKG-INFO +134 -0
sigit_code-0.1.1/README.md +55 -0
sigit_code-0.1.1/npm/README.md.tmpl +22 -0
sigit_code-0.1.1/npm/package-main.json.tmpl +41 -0
sigit_code-0.1.1/npm/package.json.tmpl +14 -0
sigit_code-0.1.1/npm/scripts/render-main-package.cjs +34 -0
sigit_code-0.1.1/npm/scripts/render-platform-package.cjs +56 -0
sigit_code-0.1.1/npm/sigit/.gitignore +2 -0
sigit_code-0.1.1/npm/sigit/README.md +85 -0
sigit_code-0.1.1/npm/sigit/package.json +44 -0
sigit_code-0.1.1/npm/sigit/src/index.ts +43 -0
sigit_code-0.1.1/npm/sigit/tsconfig.json +12 -0
sigit_code-0.1.1/pypi/README.md +107 -0
sigit_code-0.1.1/pypi/pyproject.toml +38 -0
sigit_code-0.1.1/pyproject.toml +38 -0
sigit_code-0.1.1/rust-toolchain.toml +2 -0
sigit_code-0.1.1/src/chat.rs +1162 -0
sigit_code-0.1.1/src/main.rs +527 -0
sigit_code-0.1.1/src/setup.rs +89 -0
sigit_code-0.1.1/src/tools.rs +1136 -0

sigit_code-0.1.1/.agents/AGENTS.md ADDED

File without changes

sigit_code-0.1.1/.agents/skills/agent-client-protocol/SKILL.md ADDED

@@ -0,0 +1,314 @@
+# Skill: Agent Client Protocol (ACP) — Rust Implementation
+## Overview
+ACP is a JSON-RPC 2.0 protocol over **stdio** for integrating AI coding agents
+with editors (Zed, JetBrains, Neovim, etc.). The agent runs as a subprocess;
+the editor is the client. Communication is newline-delimited JSON on stdin/stdout.
+Crate: `agent-client-protocol = "0.10.4"` (latest as of 2025)
+Docs:  https://docs.rs/agent-client-protocol
+Spec:  https://agentclientprotocol.com
+---
+## Dependency setup
+```toml
+[dependencies]
+agent-client-protocol = "0.10.4"
+async-trait           = "0.1"
+tokio                 = { version = "1", features = ["rt", "rt-multi-thread", "macros", "io-std", "io-util", "sync"] }
+tokio-util            = { version = "0.7", features = ["compat"] }
+futures               = "0.3"
+```
+---
+## The `Agent` trait
+Declared `#[async_trait::async_trait(?Send)]` — futures are `!Send`.
+Your impl needs the same annotation:
+```rust
+#[async_trait::async_trait(?Send)]
+impl Agent for MyAgent {
+    async fn initialize(&self, args: InitializeRequest) -> Result<InitializeResponse> { ... }
+    async fn authenticate(&self, args: AuthenticateRequest) -> Result<AuthenticateResponse> { ... }
+    async fn new_session(&self, args: NewSessionRequest) -> Result<NewSessionResponse> { ... }
+    async fn prompt(&self, args: PromptRequest) -> Result<PromptResponse> { ... }
+    async fn cancel(&self, args: CancelNotification) -> Result<()> { ... }
+    // All other methods have default impls that return Error::method_not_found()
+}
+```
+You must implement `initialize`, `authenticate`, `new_session`, `prompt`, and `cancel`.
+Everything else (`load_session`, `set_session_mode`, etc.) defaults to `Err(method_not_found)`.
+---
+## Types and their builders
+All `#[non_exhaustive]` structs require builder methods — struct literal syntax won't compile.
+### `InitializeRequest` / `InitializeResponse`
+```rust
+// Response builder — use ProtocolVersion::V1, NOT args.protocol_version:
+InitializeResponse::new(ProtocolVersion::V1)
+    .agent_info(
+        Implementation::new("my-agent", env!("CARGO_PKG_VERSION"))
+            .title("My Agent"),
+    )
+    .auth_methods(vec![AuthMethod::Agent(AuthMethodAgent::new(
+        "my-agent", "My Agent",
+    ))])
+    .agent_capabilities(AgentCapabilities::default())
+```
+`auth_methods` must include at least one `AuthMethod::Agent` or Zed hangs on
+"Loading…" forever. Import `AuthMethod`, `AuthMethodAgent`, and `ProtocolVersion`
+from the crate.
+### `AuthenticateResponse`
+```rust
+Ok(AuthenticateResponse::default())  // No auth = just return default
+```
+### `NewSessionResponse`
+```rust
+let session_id = SessionId::new(uuid::Uuid::new_v4().to_string());
+Ok(NewSessionResponse::new(session_id))
+```
+`SessionId` is a newtype with `Clone`, `PartialEq`, `Display`, `Into<String>`,
+and `AsRef<str>`. Store it as-is (not as `String`) so `==` works directly.
+### `PromptRequest`
+```rust
+args.session_id   // type: SessionId
+args.prompt       // type: Vec<ContentBlock>
+```
+Extract user text from the prompt:
+```rust
+let user_text: String = args.prompt.iter()
+    .filter_map(|block| match block {
+        ContentBlock::Text(t) => Some(t.text.as_str()),
+        _ => None,
+    })
+    .collect::<Vec<_>>()
+    .join("\n");
+```
+### `PromptResponse`
+```rust
+Ok(PromptResponse::new(StopReason::EndTurn))
+// Other reasons: MaxTokens, Cancelled, MaxTurnRequests, Refusal
+```
+### `ContentBlock`
+```rust
+// Text block — use the From impl:
+ContentBlock::from("some text")  // impl<T: Into<String>> From<T> for ContentBlock
+// Pattern-match incoming blocks:
+match block {
+    ContentBlock::Text(t)         => t.text.as_str(),
+    ContentBlock::ResourceLink(_) => ...,
+    ContentBlock::Resource(_)     => ...,
+    _ => ...,  // non_exhaustive — always need a wildcard
+}
+```
+### `ContentChunk` + `SessionUpdate` — streaming
+```rust
+let chunk = ContentChunk::new(ContentBlock::from(delta_text));
+let update = SessionUpdate::AgentMessageChunk(chunk);
+// Other variants: UserMessageChunk, AgentThoughtChunk, ToolCall, Plan, ...
+```
+### `SessionNotification` — send streaming content to client
+```rust
+let notification = SessionNotification::new(session_id.clone(), update);
+// Deliver via AgentSideConnection::session_notification()
+```
+### `Error`
+```rust
+// There is NO Error::internal(msg) method — use:
+agent_client_protocol::Error::new(-32603, "your message here")
+// For invalid params:
+agent_client_protocol::Error::invalid_params()
+// For method not found (already the trait default):
+agent_client_protocol::Error::method_not_found()
+```
+---
+## Running the agent — `AgentSideConnection`
+Wraps stdin/stdout with JSON-RPC machinery.
+```rust
+use futures::future::LocalBoxFuture;
+use tokio_util::compat::{TokioAsyncReadCompatExt, TokioAsyncWriteCompatExt};
+// Adapt tokio I/O to futures AsyncRead/AsyncWrite (the SDK expects these)
+let stdin  = tokio::io::stdin().compat();
+let stdout = tokio::io::stdout().compat_write();
+// Must run inside a LocalSet — the spawn fn takes LocalBoxFuture (!Send)
+let local = tokio::task::LocalSet::new();
+local.run_until(async move {
+    let (conn, io_task) = AgentSideConnection::new(
+        agent,
+        stdout,
+        stdin,
+        |fut: LocalBoxFuture<'static, ()>| {
+            tokio::task::spawn_local(fut);  // requires LocalSet context
+        },
+    );
+    // ... set up forwarder task using conn ...
+    io_task.await  // drives JSON-RPC until client disconnects
+}).await;
+```
+`AgentSideConnection::new` returns `(conn, io_task)` — you need both. `io_task`
+drives the actual IO; `conn` sends notifications. The spawn closure gets
+`LocalBoxFuture<'static, ()>` (not Send), so use `tokio::task::spawn_local`,
+not `tokio::spawn`. Everything must sit inside
+`tokio::task::LocalSet::new().run_until(...)`.
+---
+## Streaming — circular dependency pattern
+`Agent::prompt()` needs to send `SessionNotification` through the connection,
+but the connection is built *from* the agent. Break the cycle with an mpsc channel:
+```rust
+use tokio::sync::mpsc;
+// 1. Create channel BEFORE the agent
+let (notification_tx, mut notification_rx) = mpsc::channel::<SessionNotification>(256);
+// 2. Pass sender into agent
+let agent = MyAgent { notification_tx, ... };
+// 3. Create connection
+let (conn, io_task) = AgentSideConnection::new(agent, stdout, stdin, |fut| {
+    tokio::task::spawn_local(fut);
+});
+// 4. Spawn forwarder that holds `conn`
+tokio::task::spawn_local(async move {
+    while let Some(notification) = notification_rx.recv().await {
+        conn.session_notification(notification).await.ok();
+    }
+});
+// 5. Run IO
+io_task.await;
+```
+Inside `prompt()`, push chunks through the channel:
+```rust
+self.notification_tx.send(SessionNotification::new(
+    session_id.clone(),
+    SessionUpdate::AgentMessageChunk(ContentChunk::new(ContentBlock::from(delta))),
+)).await.ok();  // ignore send errors (channel closed = client gone)
+```
+---
+## Logging
+Log to **stderr** — stdout is the ACP JSON-RPC wire:
+```rust
+env_logger::Builder::from_env(env_logger::Env::default().default_filter_or("info"))
+    .target(env_logger::Target::Stderr)
+    .init();
+```
+---
+## Protocol flow
+```
+Editor                          Agent
+  │                               │
+  │── initialize ────────────────►│  (negotiate version + capabilities)
+  │◄─ InitializeResponse ─────────│
+  │                               │
+  │── authenticate ──────────────►│  (method_id from authMethods)
+  │◄─ AuthenticateResponse ───────│
+  │                               │
+  │── session/new ───────────────►│  (create session, load model)
+  │◄─ NewSessionResponse ─────────│
+  │                               │
+  │── session/prompt ────────────►│  (user message)
+  │◄─ session/update (N times) ───│  (streaming tokens via notification)
+  │◄─ PromptResponse ─────────────│  (stop_reason = EndTurn when done)
+  │                               │
+  │── session/cancel (optional) ─►│
+  │                               │
+  │── [disconnect] ──────────────►│  (io_task future resolves → shutdown)
+```
+---
+## Zed configuration
+```json
+{
+  "agent_servers": {
+    "MyAgent": {
+      "type": "custom",
+      "command": "/path/to/binary"
+    }
+  }
+}
+```
+---
+## Gotchas
+1. **`Error::internal()` doesn't exist** — use `Error::new(-32603, msg)`.
+2. **All protocol structs are `#[non_exhaustive]`** — use builder methods,
+   never struct literals. Add `_ => ...` wildcards when matching.
+3. **`LocalBoxFuture` is `!Send`** — `tokio::spawn` won't work; use
+   `tokio::task::spawn_local` inside a `LocalSet`.
+4. **`tokio::task::spawn_local` panics outside a `LocalSet`** — wrap with
+   `LocalSet::new().run_until(async { ... }).await`.
+5. **Store `SessionId` as `SessionId`**, not `String` — otherwise `==`
+   comparisons get annoying.
+6. **One session per connection is fine for MVP** — reuse the model with
+   `clear_history()` instead of reloading.
+7. **`AgentCapabilities::default()` exists** — all capabilities None/false.
+8. **`block_in_place` panics inside `spawn_local`** — dependencies that call
+   `tokio::task::block_in_place` internally (e.g. `mistralrs`) will blow up
+   with "can call blocking only when running on the multi-threaded runtime"
+   from a `spawn_local` task. Fix: do the blocking work *before* entering
+   the `LocalSet`, while you're still on a normal multi-thread worker, then
+   pass the result into your agent struct.
+9. **Empty `authMethods` hangs Zed** — `InitializeResponse` with an empty
+   `auth_methods` vec makes Zed show "Loading…" forever. Always include at
+   least one `AuthMethod::Agent(AuthMethodAgent::new("id", "Name"))`.
+   Import `AuthMethod`, `AuthMethodAgent`, and `ProtocolVersion` from the crate.
+10. **Never write to stdout except JSON-RPC** — any library that prints to
+    stdout (`mistralrs` model metadata, stray `println!`, whatever) will
+    corrupt the wire. Redirect diagnostics to stderr. If a dependency writes
+    to stdout internally, fix it or suppress it before shipping.
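
Editor's sketch (not part of the package): the channel pattern from the "Streaming" section above, reduced to the standard library so it compiles without `tokio` or `agent-client-protocol`. It assumes `std::sync::mpsc` and a plain thread in place of the async channel and `spawn_local`. `Notification`, `FakeConnection`, `SketchAgent`, and `run_sketch` are hypothetical stand-ins, not crate types. The point it illustrates is only the ownership order: the channel exists before the agent, the agent holds the sender, and the forwarder alone owns the connection.

```rust
use std::sync::mpsc::{self, Sender};
use std::thread;

/// Hypothetical stand-in for `SessionNotification`.
#[derive(Debug, Clone, PartialEq)]
struct Notification {
    session_id: String,
    text: String,
}

/// Hypothetical stand-in for the connection; it only records what it "sends".
#[derive(Default)]
struct FakeConnection {
    sent: Vec<Notification>,
}

impl FakeConnection {
    fn session_notification(&mut self, n: Notification) {
        self.sent.push(n);
    }
}

/// The agent owns only the sending half, so it never needs the connection.
struct SketchAgent {
    notification_tx: Sender<Notification>,
}

impl SketchAgent {
    /// Pushes one notification per non-empty delta, ignoring send errors.
    fn prompt(&self, session_id: &str, deltas: &[&str]) {
        for delta in deltas.iter().filter(|d| !d.is_empty()) {
            let _ = self.notification_tx.send(Notification {
                session_id: session_id.to_string(),
                text: delta.to_string(),
            });
        }
    }
}

/// Wires the pieces in the same order as the numbered steps above.
fn run_sketch(deltas: &[&str]) -> Vec<Notification> {
    // Steps 1 and 2: create the channel first, then hand the sender to the agent.
    let (tx, rx) = mpsc::channel::<Notification>();
    let agent = SketchAgent { notification_tx: tx };

    // Steps 3 and 4: the "connection" is created afterwards and lives in the forwarder.
    let forwarder = thread::spawn(move || {
        let mut conn = FakeConnection::default();
        for n in rx {
            conn.session_notification(n);
        }
        conn.sent
    });

    agent.prompt("session-1", deltas);
    drop(agent); // dropping the last sender closes the channel and ends the forwarder
    forwarder.join().expect("forwarder thread panicked")
}
```

Calling `run_sketch(&["Hel", "", "lo"])` forwards two notifications, because the empty delta is filtered out before it reaches the channel. In the real agent the forwarder would be a `spawn_local` task and the receive loop would be `while let Some(n) = rx.recv().await`.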

sigit_code-0.1.1/.agents/skills/ai-assisted-coding/SKILL.md ADDED

@@ -0,0 +1,361 @@
+# Skill: AI-Assisted Coding Agents — Onde Inference Integration
+## Overview
+Building a local AI coding agent in Rust using Onde Inference as the LLM backend.
+Onde wraps mistral.rs with a clean API for model loading, history management, and
+streaming inference across macOS (Metal), iOS, Android, Linux, and Windows.
+Crate:  `onde = { path = "../onde" }` or from crates.io when published
+Repo:   https://github.com/ondeinference/onde
+Docs:   https://ondeinference.com
+---
+## Onde `ChatEngine` API
+### Construction and lifecycle
+```rust
+use onde::inference::{ChatEngine, GgufModelConfig, SamplingConfig};
+let engine = ChatEngine::new();        // starts unloaded
+engine.is_loaded().await               // -> bool
+engine.unload_model().await            // -> ()
+```
+### Loading a model
+```rust
+// Platform-aware default (Qwen 2.5 3B on macOS/Windows/Linux, 1.5B on iOS/tvOS/Android)
+let config = GgufModelConfig::platform_default();
+// Load — blocks until model is in memory and on GPU
+engine
+    .load_gguf_model(
+        config,
+        Some("You are a helpful assistant.".to_string()),  // system prompt
+        None,  // sampling config (uses SamplingConfig::default() internally)
+    )
+    .await?;
+// AlreadyLoaded error if called twice — check first:
+if !engine.is_loaded().await {
+    engine.load_gguf_model(...).await?;
+}
+```
+**Model sizes (macOS/Windows/Linux default — Qwen 2.5 3B Q4_K_M):** ~1.93 GB
+**Model sizes (iOS/tvOS/Android default — Qwen 2.5 1.5B Q4_K_M):** ~941 MB
+First run downloads from HuggingFace Hub into `~/.cache/huggingface/`.
+### Blocking (non-streaming) inference
+```rust
+let result = engine.send_message("What is Rust's ownership model?").await?;
+// result: InferenceResult
+println!("{}", result.text);
+println!("took {}", result.duration_display);  // e.g. "3.2s"
+```
+`send_message` appends both the user message and assistant reply to conversation
+history automatically.
+### Streaming inference
+```rust
+let mut rx: tokio::sync::mpsc::Receiver<StreamChunk> =
+    engine.stream_message("Tell me a story.").await?;
+while let Some(chunk) = rx.recv().await {
+    if !chunk.delta.is_empty() {
+        print!("{}", chunk.delta);   // partial token text
+    }
+    if chunk.done {
+        // chunk.finish_reason: Option<String> — e.g. "stop", "length"
+        break;
+    }
+}
+```
+`StreamChunk` fields:
+- `delta: String` — the new token(s) in this chunk
+- `done: bool` — true on the last chunk
+- `finish_reason: Option<String>` — present on final chunk only
+History is updated automatically after the stream completes.
+### One-shot generation (no history side-effects)
+```rust
+use onde::inference::ChatMessage;
+let result = engine.generate(
+    vec![ChatMessage::user("Expand: a cat in space")],
+    Some(SamplingConfig::deterministic()),
+).await?;
+println!("{}", result.text);
+// Does NOT modify conversation history
+```
+### History management
+```rust
+let history: Vec<ChatMessage> = engine.history().await;
+let removed: usize = engine.clear_history().await;  // returns count cleared
+engine.push_history(ChatMessage::user("context")).await;
+engine.set_system_prompt("new system prompt").await;
+engine.clear_system_prompt().await;
+```
+### Engine status
+```rust
+let info: EngineInfo = engine.info().await;
+// info.status: EngineStatus (Unloaded | Loading | Ready | Generating | Error)
+// info.model_name: Option<String>
+// info.approx_memory: Option<String>  e.g. "~1.93 GB"
+// info.history_length: u64
+```
+---
+## `InferenceError` variants
+```rust
+match err {
+    InferenceError::NoModelLoaded       => { /* load model first */ }
+    InferenceError::AlreadyLoaded { model_name } => { /* already loaded */ }
+    InferenceError::ModelBuild { reason } => { /* load failure */ }
+    InferenceError::Inference { reason }  => { /* runtime inference error */ }
+    InferenceError::Cancelled            => { /* was cancelled */ }
+    InferenceError::Other { reason }     => { /* unexpected */ }
+}
+```
+Map to ACP errors:
+```rust
+.map_err(|e| agent_client_protocol::Error::new(-32603, e.to_string()))?
+```
+---
+## `SamplingConfig` presets
+| Preset | temp | top_p | max_tokens | Use case |
+|--------|------|-------|------------|----------|
+| `SamplingConfig::default()` | 0.7 | 0.95 | 512 | General chat |
+| `SamplingConfig::deterministic()` | 0.0 | — | 512 | Code / reproducible |
+| `SamplingConfig::mobile()` | 0.7 | 0.95 | 128 | Memory-constrained |
+| `SamplingConfig::coding()` | 0.0 | — | 512 | Code generation |
+| `SamplingConfig::coding_mobile()` | 0.0 | — | 128 | Code on mobile |
+---
+## `GgufModelConfig` constructors
+```rust
+GgufModelConfig::platform_default()    // auto-selects based on target_os
+GgufModelConfig::qwen25_1_5b()         // force 1.5B
+GgufModelConfig::qwen25_3b()           // force 3B
+GgufModelConfig::qwen25_coder_1_5b()   // coder variant 1.5B
+GgufModelConfig::qwen25_coder_3b()     // coder variant 3B
+```
+---
+## Adding onde as a Rust library dependency
+```toml
+# In your crate's Cargo.toml — onde is a path dep since it's not on crates.io yet
+onde = { path = "../onde" }
+```
+**Important:** `onde` declares `crate-type = ["lib", "cdylib", "staticlib"]`.
+When used as a Rust library dep, only the `lib` target is compiled. The
+`cdylib`/`staticlib` targets (used for Swift/Kotlin FFI) are not built. The
+`uniffi::setup_scaffolding!()` macro generates `#[no_mangle] extern "C"` symbols
+but these are harmless in a binary context.
+**The `[patch.crates-io]` in onde's Cargo.toml does NOT propagate** to dependents
+unless they are in the same workspace. The `sysctl` patch is only needed for
+watchOS; macOS/iOS/Linux work without it.
+**GPU feature selection is automatic** via `target_os` cfg flags in onde's
+Cargo.toml — you get Metal on macOS/iOS without any extra features in your crate.
+---
+## Patterns for coding agents
+### Single-engine, multi-session via history reset
+For a simple MVP where one session is active at a time:
+```rust
+struct MyAgent {
+    engine: Arc<ChatEngine>,
+    active_session: Arc<Mutex<Option<SessionId>>>,
+}
+// new_session handler:
+if self.engine.is_loaded().await {
+    self.engine.clear_history().await;   // reuse model, fresh conversation
+} else {
+    self.engine
+        .load_gguf_model(GgufModelConfig::platform_default(), Some(SYSTEM_PROMPT.into()), None)
+        .await?;
+}
+```
+**Why:** Loading the model is expensive (seconds + GB of RAM). Reloading for each
+session would make the agent feel broken. `clear_history()` resets context in
+microseconds.
+### Per-session engines (multiple concurrent sessions)
+When you need truly isolated parallel sessions:
+```rust
+use std::collections::HashMap;
+struct MultiSessionAgent {
+    sessions: Arc<Mutex<HashMap<String, Arc<ChatEngine>>>>,
+}
+// new_session: create and load a new engine per session
+// prompt: look up session engine, call send_message or stream_message
+// CAVEAT: each engine holds a separate model copy in GPU memory — expensive!
+```
+Better approach for shared GPU memory: use `engine.generate()` (no history
+side-effects) with an explicitly managed message vec per session.
+### System prompt design for coding agents
+```rust
+const SYSTEM_PROMPT: &str = "\
+You are <AgentName>, an expert AI coding agent integrated into your editor \
+via the Agent Client Protocol. You specialize in:
+- Code analysis, writing, and refactoring
+- Bug hunting and debugging
+- Git workflows and commit messages
+- Software architecture and design patterns
+- Code review and best practices
+Be concise, precise, and practical. Write clean, idiomatic code with brief \
+explanations. Identify root causes when debugging. Prefer correctness over brevity.";
+```
+Key principles:
+- State the agent's role and name clearly (models respond better to named personas)
+- List specializations explicitly (this steers the model toward those kinds of tasks)
+- Set tone expectations: "concise", "practical", "idiomatic"
+- Avoid verbose instruction lists — they cost tokens on every turn
+### Streaming tokens to ACP (connecting onde → ACP)
+```rust
+// In Agent::prompt():
+let mut rx = self.engine.stream_message(user_text).await
+    .map_err(|e| Error::new(-32603, e.to_string()))?;
+while let Some(chunk) = rx.recv().await {
+    if !chunk.delta.is_empty() {
+        self.notification_tx.send(
+            SessionNotification::new(
+                session_id.clone(),
+                SessionUpdate::AgentMessageChunk(
+                    ContentChunk::new(ContentBlock::from(chunk.delta)),
+                ),
+            )
+        ).await.ok();  // .ok() — ignore if forwarder is gone
+    }
+    if chunk.done { break; }
+}
+Ok(PromptResponse::new(StopReason::EndTurn))
+```
+`PromptResponse` is returned only AFTER the stream finishes. Until then, the client
+receives tokens via `session/update` notifications while its `session/prompt`
+request is still pending.
+---
+## Extracting text from ACP `PromptRequest`
+ACP prompts can contain text, images, resource links, etc. For a text-only
+coding agent:
+```rust
+let user_text: String = args.prompt.iter()
+    .filter_map(|block| match block {
+        ContentBlock::Text(t) => Some(t.text.as_str()),
+        // Skip images, resource links, embedded resources for now
+        _ => None,
+    })
+    .collect::<Vec<_>>()
+    .join("\n");
+```
+For future resource context (e.g. open files provided by Zed):
+```rust
+ContentBlock::Resource(r) => match &r.resource {
+    EmbeddedResourceResource::Text(t) => Some(t.text.as_str()),
+    _ => None,
+},
+```
+---
+## `ChatEngine` threading model
+- Internally uses `Arc<tokio::sync::Mutex<Option<LoadedModel>>>` — `Send + Sync`.
+- Safe to wrap in `Arc<ChatEngine>` and share across tasks.
+- `stream_message()` spawns a background task internally via `tokio::spawn`, so the
+  mistralrs model must be `Send`, which it is on all supported platforms.
+- Calling `stream_message()` from a `!Send` future (e.g. inside a `LocalSet`) is
+  fine — the future itself doesn't hold a `!Send` value across `.await`.
+---
+## First-run model download
+On first use, onde downloads the GGUF model from HuggingFace Hub:
+- Requires internet connectivity
+- Cached at `~/.cache/huggingface/` (or `HF_HUB_CACHE` env var)
+- `HF_TOKEN` env var needed for gated models (public Qwen models don't need it)
+- Subsequent runs load from disk cache — fast
+For sandboxed environments (iOS, tvOS, Android):
+- Set `HF_HOME` and `HF_HUB_CACHE` to a path inside the app container
+- Do this BEFORE calling any ChatEngine method
+- See `onde/docs/swift-package.md` for `setupInferenceEnvironment()` pattern
+---
+## Common mistakes
+1. **Calling `load_gguf_model` twice** without checking `is_loaded()` first →
+   `InferenceError::AlreadyLoaded`. Always guard with `is_loaded().await`.
+2. **Calling `recv()` again after the final chunk** → the stream ends when the
+   `done` flag is true. Break out of the loop there instead of polling further.
+3. **Forwarding empty `StreamChunk` deltas** → always check
+   `!chunk.delta.is_empty()` before sending to avoid empty notifications that
+   waste bandwidth. Whitespace-only deltas are not empty and must still be sent.
+4. **Sharing one `ChatEngine` across parallel prompts** without coordination →
+   the internal Mutex serializes inference, so concurrent prompts queue up.
+   Design for sequential access per engine instance.
+5. **Using `SamplingConfig::default()` for code generation** → prefer
+   `SamplingConfig::coding()` (deterministic, temp=0) for more reliable code output.
+6. **Forgetting that `generate()` doesn't update history** — use it for
+   one-shot enhancements (prompt expansion, code review) that shouldn't pollute
+   the main conversation. Use `send_message()` / `stream_message()` for the
+   primary turn loop.
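
Editor's sketch (not part of the package): the "explicitly managed message vec per session" idea from the per-session engines section above, written against the standard library only. `Message`, `SessionHistories`, and `fake_generate` are hypothetical names. The `generate` closure stands in for a history-free call such as `engine.generate(...)`, whose exact signature is not assumed here. One shared engine can then serve several sessions, because the per-session state lives in a map rather than inside the engine.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for onde's `ChatMessage`.
#[derive(Debug, Clone, PartialEq)]
enum Message {
    User(String),
    Assistant(String),
}

/// One shared engine, one explicitly managed history per session.
#[derive(Default)]
struct SessionHistories {
    histories: HashMap<String, Vec<Message>>,
}

impl SessionHistories {
    /// Starts a session with an empty history (replacing any old one).
    fn new_session(&mut self, session_id: &str) {
        self.histories.insert(session_id.to_string(), Vec::new());
    }

    /// Runs one turn: appends the user message, calls `generate` with the full
    /// history, then appends the reply. Returns `None` for unknown sessions.
    fn turn<F>(&mut self, session_id: &str, user_text: &str, generate: F) -> Option<String>
    where
        F: FnOnce(&[Message]) -> String,
    {
        let history = self.histories.get_mut(session_id)?;
        history.push(Message::User(user_text.to_string()));
        let reply = generate(history.as_slice());
        history.push(Message::Assistant(reply.clone()));
        Some(reply)
    }

    /// Number of stored messages for a session (0 if it does not exist).
    fn history_len(&self, session_id: &str) -> usize {
        self.histories.get(session_id).map_or(0, Vec::len)
    }
}

/// Toy generator used in place of a real model call: it reports how many
/// messages it was given.
fn fake_generate(messages: &[Message]) -> String {
    format!("saw {} message(s)", messages.len())
}
```

Two turns in session `"a"` leave four messages in that session's history, while session `"b"` stays untouched. The trade-off is that the caller, not the engine, becomes responsible for trimming long histories.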