@burtson-labs/bandit-stealth-cli 1.7.80 → 1.7.84
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +38 -0
- package/dist/cli.js +326 -326
- package/package.json +1 -1
package/README.md
CHANGED
@@ -185,6 +185,44 @@ Workspace config overrides user config. Secrets belong in the user-level file, n
 
 Running a bigger model on a remote Ollama instance? Point `OLLAMA_URL` at the remote endpoint and set `BANDIT_MODEL` to the bigger model. Requests route to the remote node; everything else stays local.
 
+#### Rented GPU (RunPod / Vast.ai / Lambda)
+
+When you need to run a model your local hardware can't fit, Bandit talks to any remote Ollama endpoint — including rented GPU pods. Same shape on every provider: spin up a pod with Ollama on port 11434, copy the proxy URL, point `OLLAMA_URL` at it.
+
+**RunPod** (recommended — simplest UX):
+
+```bash
+# 1. From the RunPod template gallery, pick any Ollama template.
+# H100 SXM is the right pick for 27-32B models; multi-GPU only
+# needed for 70B+. Network volume optional but useful if you want
+# model weights to persist across pod restarts.
+
+# 2. Once the pod boots, copy its proxy URL from the dashboard.
+# Format: https://<pod-id>-11434.proxy.runpod.net
+
+# 3. SSH into the pod and pull a model:
+ollama pull qwen3.6:27b
+
+# 4. Locally, point Bandit at it:
+export OLLAMA_URL="https://<pod-id>-11434.proxy.runpod.net"
+export BANDIT_MODEL="qwen3.6:27b"
+bandit
+```
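
Before launching `bandit`, it can help to confirm that the proxy URL actually reaches Ollama. The sketch below is not part of the package: `ollama_tags_url` and `check_ollama` are hypothetical helper names, and the `http://localhost:11434` fallback is an assumption based on Ollama's default port rather than a documented Bandit default. It relies on Ollama's `GET /api/tags` endpoint, which lists the models available on the server.

```bash
# Hypothetical helper: build the model-list URL for an Ollama endpoint.
# Uses the first argument, else $OLLAMA_URL, else an assumed local default.
ollama_tags_url() {
  local base="${1:-${OLLAMA_URL:-http://localhost:11434}}"
  # Drop one trailing slash so the path is not doubled.
  printf '%s/api/tags\n' "${base%/}"
}

# Hypothetical helper: succeeds only if the endpoint responds
# without an HTTP error within 10 seconds.
check_ollama() {
  curl -fsS --max-time 10 "$(ollama_tags_url "${1:-}")" >/dev/null
}
```

With `OLLAMA_URL` exported as in step 4, `check_ollama && bandit` would start the CLI only when the pod answers. A failure usually suggests the pod is still booting or the URL was mistyped.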
+
+Tear the pod down when you're done. ~$2/hr for an H100 SXM × 15-20 min agent session = under $1.
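
The cost estimate above is simple arithmetic: hourly rate times minutes, divided by 60. A minimal sketch, assuming a flat hourly rate and compute time only; `session_cost` is a hypothetical helper, not a Bandit or RunPod command.

```bash
# Hypothetical helper: estimate session cost in dollars.
# $1 = hourly rate in dollars, $2 = session length in minutes.
session_cost() {
  awk -v rate="$1" -v minutes="$2" 'BEGIN { printf "%.2f\n", rate * minutes / 60 }'
}

session_cost 2 15   # prints 0.50
session_cost 2 20   # prints 0.67
```

At the quoted ~$2/hr, a 15-20 minute session lands between roughly $0.50 and $0.67, consistent with the "under $1" figure.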
+
+**Vast.ai / Lambda Labs**: same pattern. Find an Ollama-preloaded image (or `apt install` Ollama yourself), expose port 11434, set `OLLAMA_URL` to the host URL.
+
+**Recommended models for rented GPU:**
+
+| Model | Size | What it's good at |
+|---|---|---|
+| `qwen3.6:27b` | ~17 GB | Same model as `bandit-logic`. Native tool calling, vision, 256K context. Best general-purpose pick. |
+| `qwen2.5-coder:32b` | ~20 GB | Code-specialist post-train. Strongest on file edits and refactors. |
+| `qwen3.6:35b` | ~24 GB | Bigger Qwen 3.6 variant — slower, marginally better reasoning. |
+
+**Avoid for agent work:** `gpt-oss:120b` and similar reasoning-tuned models. They're post-trained for OpenAI's harmony tool-call format, not the XML protocol Bandit uses for non-native models — they tend to narrate intent without emitting tool calls. Great for math/proofs in chat, poor for filesystem agent loops.
+
 ---
 
 ## Security & privacy