@burtson-labs/bandit-stealth-cli 1.7.83 → 1.7.84

Files changed (3)
  1. package/README.md +38 -0
  2. package/dist/cli.js +253 -253
  3. package/package.json +1 -1
package/README.md CHANGED
@@ -185,6 +185,44 @@ Workspace config overrides user config. Secrets belong in the user-level file, n
 
  Running a bigger model on a remote Ollama instance? Point `OLLAMA_URL` at the remote endpoint and set `BANDIT_MODEL` to the bigger model. Requests route to the remote node; everything else stays local.
 
+ #### Rented GPU (RunPod / Vast.ai / Lambda)
+
+ When you need to run a model your local hardware can't fit, Bandit talks to any remote Ollama endpoint — including rented GPU pods. Same shape on every provider: spin up a pod with Ollama on port 11434, copy the proxy URL, point `OLLAMA_URL` at it.
+
+ **RunPod** (recommended — simplest UX):
+
+ ```bash
+ # 1. From the RunPod template gallery, pick any Ollama template.
+ #    H100 SXM is the right pick for 27-32B models; multi-GPU only
+ #    needed for 70B+. Network volume optional but useful if you want
+ #    model weights to persist across pod restarts.
+
+ # 2. Once the pod boots, copy its proxy URL from the dashboard.
+ #    Format: https://<pod-id>-11434.proxy.runpod.net
+
+ # 3. SSH into the pod and pull a model:
+ ollama pull qwen3.6:27b
+
+ # 4. Locally, point Bandit at it:
+ export OLLAMA_URL="https://<pod-id>-11434.proxy.runpod.net"
+ export BANDIT_MODEL="qwen3.6:27b"
+ bandit
+ ```
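Before launching `bandit`, it can help to confirm that the proxy URL actually reaches Ollama and that the pulled model is visible. The following is a minimal sketch, not part of the package README. It assumes the pod serves Ollama's standard `GET /api/tags` model-listing endpoint, and the helper names `ollama_tags_url` and `ollama_has_model` are hypothetical:

```bash
# Hypothetical helpers; not part of Bandit or this package.

# Build the model-listing URL (GET /api/tags) for an Ollama base URL.
ollama_tags_url() {
  local base="${1%/}"          # drop one trailing slash, if present
  printf '%s/api/tags\n' "$base"
}

# Return 0 if the endpoint responds and its model list mentions the given name.
ollama_has_model() {
  local url model="$2"
  url="$(ollama_tags_url "$1")"
  curl -fsS --max-time 10 "$url" | grep -qF "\"$model\""
}

# Usage, with the variables exported in step 4:
#   ollama_has_model "$OLLAMA_URL" "$BANDIT_MODEL" && bandit
```

A non-zero exit here usually points at a wrong pod ID, a pod that is still booting, or a model that has not finished pulling.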
+
+ Tear the pod down when you're done. ~$2/hr for an H100 SXM × 15-20 min agent session = under $1.
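The cost estimate above is hourly rate times session length. A small sketch of that arithmetic follows; `session_cost` is a hypothetical helper, and the $2/hr figure is the approximate rate quoted above rather than a live price:

```bash
# Hypothetical helper: estimate session cost in USD.
#   $1 = hourly rate in USD, $2 = session length in minutes
session_cost() {
  awk -v rate="$1" -v mins="$2" 'BEGIN { printf "%.2f\n", rate * mins / 60 }'
}

session_cost 2 15   # prints 0.50
session_cost 2 20   # prints 0.67
```

Both ends of the 15-20 minute range land under $1, which matches the figure quoted above. Billing granularity and storage charges vary by provider and are not modeled here.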
+
+ **Vast.ai / Lambda Labs**: same pattern. Find an Ollama-preloaded image (or install Ollama yourself with the official install script), expose port 11434, and set `OLLAMA_URL` to the host URL.
+
+ **Recommended models for rented GPU:**
+
+ | Model | Size | What it's good at |
+ |---|---|---|
+ | `qwen3.6:27b` | ~17 GB | Same model as `bandit-logic`. Native tool calling, vision, 256K context. Best general-purpose pick. |
+ | `qwen2.5-coder:32b` | ~20 GB | Code-specialist post-train. Strongest on file edits and refactors. |
+ | `qwen3.6:35b` | ~24 GB | Bigger Qwen 3.6 variant — slower, marginally better reasoning. |
+
+ **Avoid for agent work:** `gpt-oss:120b` and similar reasoning-tuned models. They're post-trained for OpenAI's harmony tool-call format, not the XML protocol Bandit uses for non-native models — they tend to narrate intent without emitting tool calls. Great for math/proofs in chat, poor for filesystem agent loops.
+
  ---
 
  ## Security & privacy