aiforcecli 0.3.0 → 0.4.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -2,9 +2,10 @@
2
2
 
3
3
  **One interface over every coding agent — that you can trust and afford.**
4
4
 
5
- `aiforcecli` wraps multiple coding agents — **Claude Code**, **OpenAI Codex**, and more —
6
- behind a single normalized interface, and adds the things a single-agent CLI structurally
7
- can't:
5
+ `aiforcecli` wraps multiple coding agents — **Claude Code**, **OpenAI Codex**, **Aider**
6
+ (which itself routes to DeepSeek, Mistral/Codestral, Qwen, Ollama, OpenRouter, and more), and
7
+ **Antigravity** (Google's `agy`, for Gemini models) — behind a single normalized interface, and
8
+ adds the things a single-agent CLI structurally can't:
8
9
 
9
10
  - 🧭 **Advise** — instantly recommend the best agent+model for a task (public benchmarks blended with *your* results), before you spend a cent.
10
11
  - 🏁 **Race** — run several agents in parallel in isolated git worktrees, **verify each against your tests**, and apply the winner.
@@ -44,8 +45,8 @@ That's it — you're productive on day one.
44
45
 
45
46
  | Command | What it does |
46
47
  | --- | --- |
47
- | `aiforcecli run "<task>"` | Run a task with an agent. With no `--agent`, `aiforcecli` **auto-routes** to the best agent+model for the task within budget. Add `--heal` to **verify and self-heal** (see below). Flags: `--agent claude-code\|codex`, `--model <m>`, `--cwd <path>`, `--budget <usd>`, `--explain`, `--resume <sessionId>`, `--json`, `--heal`, `--max-attempts <n>`, `--verify <cmd>`, `--skip-verify`. |
48
- | `aiforcecli advise "<task>"` | **Recommend** which agent + model to use — instantly, without running anything. Blends a public benchmark scorecard with your private outcomes. `--cwd <path>`, `--json`. |
48
+ | `aiforcecli run "<task>"` | Run a task with an agent. With no `--agent`, `aiforcecli` **auto-routes** to the best agent+model for the task within budget. Add `--heal` to **verify and self-heal** (see below). Flags: `--agent claude-code\|codex\|aider\|antigravity`, `--model <m>`, `--cwd <path>`, `--budget <usd>`, `--route deterministic\|bayesian`, `--explore`/`--no-explore` (bayesian), `--explain`, `--resume <sessionId>`, `--json`, `--heal`, `--max-attempts <n>`, `--verify <cmd>`, `--skip-verify`. |
49
+ | `aiforcecli advise "<task>"` | **Recommend** which agent + model to use — instantly, without running anything. Blends a public benchmark scorecard with your private outcomes. `--cwd <path>`, `--budget <usd>` (preview under a cap), `--explore`, `--json`. |
49
50
  | `aiforcecli race "<task>"` | Run the task on **several agents in parallel** (isolated git worktrees), verify each, and apply the winner. Flags: `--agents claude-code,codex`, `--budget <usd>` (total, split across agents), `--select cheapest\|fastest\|first-pass`, `--verify <cmd>`, `--keep`, `--json`. |
50
51
  | `aiforcecli eval` | Run the **private eval suite** to calibrate `advise` on your codebase. `--dir <path>`, `--agents <ids>`, `--json`. |
51
52
  | `aiforcecli bench` | **Leaderboard** from your local history: which agent/model wins which task class, per dollar. `--since 7d\|24h\|<ISO>`, `--by-task`, `--json`. |
@@ -233,8 +234,9 @@ leaderboard (off by default; never sends prompts, paths, or output).
233
234
  (Claude Code reports `total_cost_usd`). When a CLI reports tokens only (Codex), cost is
234
235
  **computed** from a local pricing table — `costSource` records which, so reports are
235
236
  never silently wrong.
236
- - **Daily cap (`dailyCapUsd`):** checked *before* a run starts. If today's spend already
237
- meets the cap, the run is refused (exit code `2`).
237
+ - **Window caps (`dailyCapUsd`, `weeklyCapUsd`, `monthlyCapUsd`):** checked *before* a run
238
+ starts. If spend in any window already meets its cap (today / this calendar week from Monday /
239
+ this calendar month), the run is refused (exit code `2`). They also tighten the routing budget.
238
240
  - **Per-run cap (`maxCostPerRunUsd`, or `--budget`):** enforced *during* the run. As usage
239
241
  events arrive, if cumulative cost crosses the cap the subprocess is aborted
240
242
  (SIGTERM → SIGKILL).
@@ -261,16 +263,16 @@ is separate from config — installing it once is enough; settings come from the
261
263
 
262
264
  | Block | Key | Default | Purpose |
263
265
  | --- | --- | --- | --- |
264
- | `agents.<id>` | `model`, `bin`, `defaultFlags`, `allowedTools` | | Per-agent overrides (binary path, forced model, default flags, tool allow-list). |
266
+ | `agents.<id>` | `enabled`, `model`, `bin`, `defaultFlags`, `allowedTools` | `enabled:true` | Per-agent settings. `enabled:false` removes the agent everywhere. Plus binary path, forced model, default flags, tool allow-list. |
265
267
  | `defaultAgent` | | `claude-code` | Agent used by `run` when routing is off and no `--agent`. |
266
- | `routing` | `enabled`, `prefer`, `only`, `models` | `enabled:true` | Budget-aware auto-routing; `only` restricts the catalog, `models` adds custom entries. |
268
+ | `routing` | `enabled`, `strategy`, `prefer`, `only`, `models` | `enabled:true`, `strategy:deterministic` | Auto-routing. `strategy`: `deterministic` (budget-aware tier router) or `bayesian` (learned engine; honors `advise.explore`). `only` restricts the catalog, `models` adds entries. Per-run override: `run --route`. |
267
269
  | `verify` | `enabled`, `command`, `timeoutMs` | `enabled:true`, 300 s | How results are verified; `command` overrides auto-detection. |
268
270
  | `heal` | `enabled`, `maxAttempts`, `escalate` | `false`, `3`, `true` | Self-healing loop behavior for `run --heal`. |
269
271
  | `race` | `agents`, `select`, `keepWorktrees`, `linkPaths` | `select:cheapest` | Defaults for `race`; `linkPaths` symlinks extra dep dirs into worktrees. |
270
272
  | `advise` | `weights`, `privatePseudocount`, `explore` | `0.7/0.2/0.1`, `5`, `false` | Recommendation scoring weights, prior strength, and Thompson-Sampling exploration. |
271
273
  | `eval` | `dir`, `models` | `.aiforcecli/evals` | Private eval suite location and which catalog models to evaluate. |
272
274
  | `telemetry` | `enabled`, `endpoint` | `false` | Opt-in anonymized outcome upload for the shared leaderboard. |
273
- | `budgets` | `maxCostPerRunUsd`, `dailyCapUsd` | — | Per-run and daily spend caps. |
275
+ | `budgets` | `maxCostPerRunUsd`, `dailyCapUsd`, `weeklyCapUsd`, `monthlyCapUsd` | — | Per-run cap + daily/weekly/monthly spend caps (refuse to start once a window cap is hit). |
274
276
  | top-level | `timeoutMs`, `inactivityTimeoutMs` | 600 s, 120 s | Wall-clock and inactivity watchdogs for every run. |
275
277
 
276
278
  ## Development
@@ -43,7 +43,9 @@
43
43
  },
44
44
  "budgets": {
45
45
  "maxCostPerRunUsd": 1.0,
46
- "dailyCapUsd": 10.0
46
+ "dailyCapUsd": 10.0,
47
+ "weeklyCapUsd": 50.0,
48
+ "monthlyCapUsd": 150.0
47
49
  },
48
50
  "timeoutMs": 600000,
49
51
  "inactivityTimeoutMs": 120000