@warlock.js/ai-ollama 4.1.1

@@ -0,0 +1,112 @@
1
+ ---
2
+ name: setup-ollama
3
+ description: 'Wire @warlock.js/ai-ollama — new OllamaSDK({host?, headers?}) for local / self-hosted Ollama via the official ollama client (not OpenAI-compat). chat + embed, daemon-down error handling. Triggers: `OllamaSDK`, `ollama.model`, `ollama.embedder`, `embedder.embedMany`, `ollama.count`, `host`, `headers`; "use ollama with warlock", "run llama3 locally", "self-hosted llama"; typical import `import { OllamaSDK } from "@warlock.js/ai-ollama"`. Skip: agent loop — `@warlock.js/ai/run-ai-agent/SKILL.md`; provider choice — `@warlock.js/ai/pick-ai-provider/SKILL.md`; embeddings core — `@warlock.js/ai/embed-text/SKILL.md`; siblings `@warlock.js/ai-openai`, `@warlock.js/ai-anthropic`, `@warlock.js/ai-google`; raw `ollama` npm, Vercel `@ai-sdk/ollama`; OpenAI-compat gateway via `@warlock.js/ai-openai` `baseURL`.'
4
+ ---
5
+
6
+ # `@warlock.js/ai-ollama`
7
+
8
+ Provider adapter that turns a local (or self-hosted) Ollama server into a vendor-neutral `ModelContract`, plus an Ollama embedder. Uses the **official `ollama` npm package** (not OpenAI-compat). Mirrors the openai / anthropic / bedrock / google adapters.
9
+
10
+ ## Construction
11
+
12
+ ```ts
13
+ import { OllamaSDK } from "@warlock.js/ai-ollama";
14
+
15
+ const ollama = new OllamaSDK(); // local default host
16
+ const remote = new OllamaSDK({ host: "http://gpu-box.internal:11434" });
17
+ const gated = new OllamaSDK({
18
+ host: "https://ollama.internal",
19
+ headers: { Authorization: `Bearer ${process.env.OLLAMA_TOKEN}` },
20
+ });
21
+ ```
22
+
23
+ `OllamaSDK` is a class that holds a long-lived `Ollama` client. Its config is `Partial<Config>` (`host` defaults to `http://127.0.0.1:11434`), plus `provider` (default `"ollama"`) and an optional `pricing` entry (local inference is free; the field is kept for parity with the other adapters and for chargeback).
24
+
25
+ ## Producing a model
26
+
27
+ ```ts
28
+ ollama.model({ name: "llama3.1" })
29
+ ollama.model({ name: "qwen2.5:14b", temperature: 0.2 })
30
+ ollama.model({ name: "llama3.2-vision", maxTokens: 1024 })
31
+ ```
32
+
33
+ ## Capabilities — what's auto-set
34
+
35
+ | Flag | Default |
36
+ | --- | --- |
37
+ | `structuredOutput` | `true` (via Ollama's native `format` JSON-schema field) |
38
+ | `vision` | Inferred from model tag substring. `true` for `llava`, `bakllava`, `*-vision`, `moondream`, `minicpm-v`, `qwen2-vl`, `qwen2.5-vl`, `llama4`, `gemma3`; `false` otherwise. |
39
+
40
+ Explicit config always wins.
41
+
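The vision rule in the table can be sketched as a substring check. This is a minimal illustration, not the package's source: `inferVision`, `VISION_HINTS`, and the `vision` override field are hypothetical names, and lowercasing the tag before matching is an assumption.

```typescript
// Hypothetical sketch of tag-based vision inference; names are illustrative.
const VISION_HINTS: readonly string[] = [
  "llava", "bakllava", "-vision", "moondream", "minicpm-v",
  "qwen2-vl", "qwen2.5-vl", "llama4", "gemma3",
];

interface CapabilityConfig {
  name: string;
  vision?: boolean; // assumed shape of an explicit override
}

function inferVision(config: CapabilityConfig): boolean {
  // Explicit config always wins over inference.
  if (config.vision !== undefined) return config.vision;
  const tag = config.name.toLowerCase(); // assumption: matching ignores case
  return VISION_HINTS.some((hint) => tag.includes(hint));
}
```

Under these assumptions, `llama3.2-vision` is treated as vision-capable, `llama3.1` is not, and an explicit `vision: false` overrides a matching tag.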
42
+ ## System prompt & roles
43
+
44
+ Unlike Anthropic/Gemini/Bedrock, **Ollama keeps a first-class `system` role inside `messages`** — no hoisting. Neutral roles (`system`/`user`/`assistant`/`tool`) pass straight through.
45
+
46
+ ## Tool calls
47
+
48
+ - Outgoing: neutral tools → `{ type: "function", function: { name, description, parameters } }`.
49
+ - Assistant tool calls → `tool_calls: [{ function: { name, arguments } }]` (Ollama has **no tool-call id**).
50
+ - Tool results (`role: "tool"`) → a `tool` message with `tool_name` set from `toolCallId` (Ollama matches a result to its call by name).
51
+
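The outgoing shapes listed above might be produced roughly as follows. The `NeutralTool` and `NeutralToolResult` interfaces are assumed stand-ins for the `@warlock.js/ai` types, which this document does not spell out; only the Ollama-side shapes come from the list.

```typescript
// Assumed neutral shapes; the real @warlock.js/ai types may differ.
interface NeutralTool {
  name: string;
  description: string;
  parameters: Record<string, unknown>; // JSON Schema for the arguments
}

interface NeutralToolResult {
  role: "tool";
  toolCallId: string;
  content: string;
}

// Neutral tool -> Ollama function-tool definition.
function toOllamaTool(tool: NeutralTool) {
  return {
    type: "function" as const,
    function: {
      name: tool.name,
      description: tool.description,
      parameters: tool.parameters,
    },
  };
}

// Neutral tool result -> Ollama `tool` message, matched to its call by name.
function toOllamaToolMessage(result: NeutralToolResult) {
  return {
    role: "tool" as const,
    content: result.content,
    tool_name: result.toolCallId,
  };
}
```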
52
+ **Synthesized ids.** Because Ollama tool calls carry no id, the adapter sets neutral `id` = tool name. **Parallel calls to the same tool in one turn share an id** — a documented v1 limitation. Ollama reports `done_reason: "stop"` even when it called a tool; the adapter derives `finishReason: "tool_calls"` from tool-call presence.
53
+
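To make the id limitation concrete, here is a small sketch of the synthesis step. `NeutralToolCall` and `toNeutralToolCalls` are hypothetical names, and the neutral field names are assumptions.

```typescript
// Ollama-side tool call: note there is no id field.
interface OllamaToolCall {
  function: { name: string; arguments: Record<string, unknown> };
}

// Assumed neutral shape; field names are illustrative.
interface NeutralToolCall {
  id: string;
  name: string;
  input: Record<string, unknown>;
}

function toNeutralToolCalls(calls: OllamaToolCall[]): NeutralToolCall[] {
  return calls.map((call) => ({
    id: call.function.name, // synthesized id = tool name
    name: call.function.name,
    input: call.function.arguments,
  }));
}

// Two parallel calls to the same tool end up sharing one id.
const calls = toNeutralToolCalls([
  { function: { name: "get_weather", arguments: { city: "Cairo" } } },
  { function: { name: "get_weather", arguments: { city: "Oslo" } } },
]);
const uniqueIds = new Set(calls.map((call) => call.id));
```

Here `calls` has two entries but `uniqueIds` has one, so code that keys tool results by id cannot tell the two calls apart.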
54
+ ## Structured output
55
+
56
+ When `responseSchema` has an object root and the model is `structuredOutput`-capable, the schema is sent as `chat({ format: <schema> })`; Ollama's `format` field accepts a JSON Schema object.
57
+
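A rough sketch of that gate, assuming the schema has already been converted to a plain JSON Schema object. `resolveFormat` is a hypothetical helper, and returning `undefined` (no `format` field) for every other case is an assumption, since the fallback is not described here.

```typescript
// Hypothetical gate deciding whether to send Ollama's `format` field.
type JsonSchema = Record<string, unknown>;

function resolveFormat(
  schema: JsonSchema | undefined,
  structuredOutput: boolean,
): JsonSchema | undefined {
  if (schema === undefined || !structuredOutput) return undefined;
  // Only object-root schemas are forwarded as `format`.
  return schema["type"] === "object" ? schema : undefined; // assumed fallback
}
```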
58
+ ## Multipart messages (vision)
59
+
60
+ A multipart user message collapses to a single `content` string + an `images` array of **base64 strings**. `{ type: "image", source: { url } }` → **throws `InvalidRequestError`** (Ollama cannot fetch remote URLs). Resolve images to base64 first.
61
+
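The collapse could look something like the sketch below. The `{ base64 }` source variant, joining text parts with a newline, and the local `InvalidRequestError` stand-in are all assumptions made so the example is self-contained; only the `{ url }` shape and the thrown error name come from the text above.

```typescript
// Assumed neutral part shapes; only the `{ url }` variant is taken from the text.
type Part =
  | { type: "text"; text: string }
  | { type: "image"; source: { base64: string } | { url: string } };

// Local stand-in for the package's error class.
class InvalidRequestError extends Error {}

function collapseParts(parts: Part[]): { content: string; images: string[] } {
  const texts: string[] = [];
  const images: string[] = [];
  for (const part of parts) {
    if (part.type === "text") {
      texts.push(part.text);
      continue;
    }
    if ("url" in part.source) {
      // Ollama cannot fetch remote URLs, so the request is rejected up front.
      throw new InvalidRequestError("Resolve image URLs to base64 before sending");
    }
    images.push(part.source.base64);
  }
  return { content: texts.join("\n"), images }; // newline join is an assumption
}
```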
62
+ ## Streaming
63
+
64
+ `model.stream()` drains `chat({ stream: true })` (an `AbortableAsyncIterator`). Each chunk's `message.content` → `{ type: "delta" }`; `message.tool_calls` are emitted as `{ type: "tool-call" }` **fully formed**. Terminal `{ type: "done", finishReason, usage }` — usage from the final (`done: true`) chunk's `prompt_eval_count` / `eval_count`.
65
+
66
+ **`options.signal` is honored** on the stream path by calling the iterator's `abort()`. Non-stream `complete()` ignores it, though the agent still honors the signal at trip boundaries.
67
+
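One plausible way to wire a signal to the iterator is sketched here. `linkSignal` is a hypothetical helper; the `Abortable` interface stands in for the `abort()` method of the client's `AbortableAsyncIterator`, and the cleanup function is an added convenience, not something the text describes.

```typescript
// Minimal stand-in for the part of the iterator this sketch needs.
interface Abortable {
  abort(): void;
}

// Forwards an AbortSignal to the stream iterator; returns a cleanup function.
function linkSignal(signal: AbortSignal | undefined, iterator: Abortable): () => void {
  if (signal === undefined) return () => {};
  if (signal.aborted) {
    iterator.abort(); // already cancelled before the stream started
    return () => {};
  }
  const onAbort = (): void => iterator.abort();
  signal.addEventListener("abort", onAbort, { once: true });
  // Call the cleanup once the stream finishes normally.
  return () => signal.removeEventListener("abort", onAbort);
}
```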
68
+ ## Finish-reason mapping
69
+
70
+ `stop` → `stop` · `length` → `length` · `load` / unknown / null → `error`. `tool_calls` derived from tool-call presence.
71
+
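The mapping is small enough to sketch as one function. `mapFinishReason` is a hypothetical name, and checking tool-call presence before `done_reason` is an assumption about precedence.

```typescript
type FinishReason = "stop" | "length" | "tool_calls" | "error";

function mapFinishReason(
  doneReason: string | null | undefined,
  hasToolCalls: boolean,
): FinishReason {
  // Ollama reports "stop" even after a tool call, so presence is checked first.
  if (hasToolCalls) return "tool_calls";
  switch (doneReason) {
    case "stop":
      return "stop";
    case "length":
      return "length";
    default:
      return "error"; // "load", unknown values, null, undefined
  }
}
```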
72
+ ## Embeddings
73
+
74
+ ```ts
75
+ const embedder = ollama.embedder({ name: "nomic-embed-text" });
76
+ const { vector } = await embedder.embed("Hello world");
77
+ const { vectors } = await embedder.embedMany(["a", "b"]); // single batched call
78
+ const truncated = ollama.embedder({ name: "mxbai-embed-large", dimensions: 512 });
79
+ ```
80
+
81
+ `client.embed` accepts a string array natively, so `embedMany` is **one request** (like the Gemini adapter). Usage comes from `prompt_eval_count` (reported as both `promptTokens` and `totalTokens`). Local Ollama runs without a prompt cache, so model usage has no `cachedTokens`.
82
+
83
+ `dimensions` is optional. When set it's forwarded as Ollama's `dimensions` truncation field (newer embedding models) and seeds `embedder.dimensions`; when omitted, `embedder.dimensions` starts at `0` and is resolved lazily from the first response's vector length, then cached.
84
+
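The lazy resolution of `embedder.dimensions` can be pictured with a small holder class. `LazyDimensions` is purely illustrative and not part of the package.

```typescript
// Illustrative holder: seeded from config, otherwise resolved from the first vector.
class LazyDimensions {
  private resolved: number;

  constructor(configured?: number) {
    this.resolved = configured ?? 0; // 0 means "not known yet"
  }

  get value(): number {
    return this.resolved;
  }

  // Call with each response vector; only the first one matters when unseeded.
  observe(vector: readonly number[]): void {
    if (this.resolved === 0) this.resolved = vector.length;
  }
}
```

A seeded instance keeps its configured value regardless of what it observes; an unseeded one reports `0` until the first vector arrives and then stays fixed.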
85
+ ## Errors
86
+
87
+ Wrapped into the typed `@warlock.js/ai` `AIError` hierarchy. The `ollama` client throws an internal `ResponseError` (`status_code` + message); transport failures surface as `fetch` `TypeError` with `ECONNREFUSED` cause:
88
+
89
+ - **Daemon-down (`ECONNREFUSED` / "fetch failed") → `ProviderError`** (operational "is Ollama running?", not a request defect)
90
+ - Timeouts → `ProviderTimeoutError`
91
+ - 401/403 → `ProviderAuthError`
92
+ - 429 → `ProviderRateLimitError`
93
+ - Other 4xx → `ContextLengthExceededError` when the message mentions the context length, else `InvalidRequestError`
94
+ - 5xx → `ProviderError`
95
+
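The classification above might be approximated as follows. This sketch returns the error class name as a string instead of constructing the real `@warlock.js/ai` errors; detecting timeouts by `name === "TimeoutError"` and context overflow by a `/context/i` match are assumptions, since the exact checks are not given here.

```typescript
type ErrorKind =
  | "ProviderError"
  | "ProviderTimeoutError"
  | "ProviderAuthError"
  | "ProviderRateLimitError"
  | "ContextLengthExceededError"
  | "InvalidRequestError";

function classifyOllamaError(err: unknown): ErrorKind {
  const e = (typeof err === "object" && err !== null ? err : {}) as {
    name?: unknown;
    message?: unknown;
    status_code?: unknown;
  };
  const message = typeof e.message === "string" ? e.message : "";

  // HTTP failures: the client's ResponseError carries `status_code`.
  if (typeof e.status_code === "number") {
    const status = e.status_code;
    if (status === 401 || status === 403) return "ProviderAuthError";
    if (status === 429) return "ProviderRateLimitError";
    if (status >= 400 && status < 500) {
      // Assumed heuristic for "context phrasing".
      return /context/i.test(message) ? "ContextLengthExceededError" : "InvalidRequestError";
    }
    return "ProviderError"; // 5xx
  }

  if (e.name === "TimeoutError") return "ProviderTimeoutError"; // assumed detection

  // Transport failures ("fetch failed" / ECONNREFUSED) and anything unrecognized.
  return "ProviderError";
}
```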
96
+ ## Token counting
97
+
98
+ ```ts
99
+ await ollama.count("some text") // approximate heuristic, offline
100
+ ```
101
+
102
+ ## When NOT to use this skill
103
+
104
+ - Direct `ollama` client calls without going through `@warlock.js/ai` agents.
105
+ - OpenAI / Anthropic / Bedrock / Google models — those have their own adapter packages.
106
+ - An OpenAI-compatible Ollama gateway you specifically want to drive through the OpenAI protocol — use `@warlock.js/ai-openai` with `baseURL` instead.
107
+
108
+ ## See also
109
+
110
+ - [`@warlock.js/ai/run-ai-agent/SKILL.md`](@warlock.js/ai/run-ai-agent/SKILL.md)
111
+ - [`@warlock.js/ai/pick-ai-provider/SKILL.md`](@warlock.js/ai/pick-ai-provider/SKILL.md)
112
+ - [`@warlock.js/ai/embed-text/SKILL.md`](@warlock.js/ai/embed-text/SKILL.md)