aiforcecli-chat 0.1.0
- package/License.MD +49 -0
- package/README.md +642 -0
- package/aiforcecli.config.example.json +66 -0
- package/assets/README.md +14 -0
- package/dist/cli.js +2 -0
- package/dist/index.js +2 -0
- package/package.json +62 -0
- package/tools/scorecard/README.md +92 -0
- package/tools/scorecard/config.json +134 -0
- package/tools/scorecard/fetch.mjs +335 -0
- package/tools/scorecard/generate.mjs +289 -0
- package/tools/scorecard/generated/example/invalid-rows.json +1 -0
- package/tools/scorecard/generated/example/scorecard-report.md +147 -0
- package/tools/scorecard/generated/example/scorecard.compact.json +61 -0
- package/tools/scorecard/generated/example/scorecard.json +1492 -0
- package/tools/scorecard/generated/example/unmapped-models.json +1492 -0
- package/tools/scorecard/generated/raw/aider_polyglot.html +21071 -0
- package/tools/scorecard/generated/raw/terminal_bench_2_1.html +2 -0
- package/tools/scorecard/generated/scorecard/invalid-rows.json +1 -0
- package/tools/scorecard/generated/scorecard/scorecard-report.md +133 -0
- package/tools/scorecard/generated/scorecard/scorecard.compact.json +51 -0
- package/tools/scorecard/generated/scorecard/scorecard.json +1181 -0
- package/tools/scorecard/generated/scorecard/unmapped-models.json +1492 -0
- package/tools/scorecard/generated/scorecard-example/invalid-rows.json +1 -0
- package/tools/scorecard/generated/scorecard-example/scorecard-report.md +40 -0
- package/tools/scorecard/generated/scorecard-example/scorecard.compact.json +22 -0
- package/tools/scorecard/generated/scorecard-example/scorecard.json +389 -0
- package/tools/scorecard/generated/scorecard-example/unmapped-models.json +1 -0
- package/tools/scorecard/generated/scorecard-fetch/raw/aider_polyglot.html +21071 -0
- package/tools/scorecard/generated/scorecard-fetch/raw/terminal_bench_2_1.html +2 -0
- package/tools/scorecard/snapshots/example.normalized.example.json +38 -0
- package/tools/scorecard/snapshots/live.aider_polyglot.json +1318 -0
- package/tools/scorecard/snapshots/live.terminal_bench_2_1.json +294 -0
package/License.MD
ADDED
|
@@ -0,0 +1,49 @@
|
|
|
1
|
+
# Proprietary License
|
|
2
|
+
|
|
3
|
+
Copyright (c) [2026] [Apoorv Iyer]. All rights reserved.
|
|
4
|
+
|
|
5
|
+
This software package, including all source code, compiled code, documentation, examples, assets, and related files, is proprietary and confidential.
|
|
6
|
+
|
|
7
|
+
## License Grant
|
|
8
|
+
|
|
9
|
+
You may use this software only if you have received explicit written permission from [YOUR NAME OR COMPANY NAME] or have an active paid license, subscription, or agreement that allows you to use it.
|
|
10
|
+
|
|
11
|
+
Subject to that permission, you are granted a limited, non-exclusive, non-transferable, revocable license to install and use this package solely for the purposes permitted by your agreement with [YOUR NAME OR COMPANY NAME].
|
|
12
|
+
|
|
13
|
+
## Restrictions
|
|
14
|
+
|
|
15
|
+
You may not, without prior written permission:
|
|
16
|
+
|
|
17
|
+
* copy, modify, adapt, translate, or create derivative works based on this software;
|
|
18
|
+
* distribute, publish, sublicense, rent, lease, sell, resell, or otherwise transfer this software;
|
|
19
|
+
* make this software available to any third party, including through a public repository, package registry, SaaS product, or hosted service;
|
|
20
|
+
* reverse engineer, decompile, disassemble, or attempt to derive the source code or underlying structure of this software;
|
|
21
|
+
* remove, alter, or obscure any copyright, trademark, proprietary, or license notices;
|
|
22
|
+
* use this software to build a competing product or service;
|
|
23
|
+
* use this software in violation of any applicable law or regulation.
|
|
24
|
+
|
|
25
|
+
## No Open Source License
|
|
26
|
+
|
|
27
|
+
This software is not open source. No rights are granted under any open source license. Any access to this package through npm or another package registry does not grant permission to use, copy, modify, or distribute the software except as expressly allowed by this license or a separate written agreement.
|
|
28
|
+
|
|
29
|
+
## Ownership
|
|
30
|
+
|
|
31
|
+
[YOUR NAME OR COMPANY NAME] retains all ownership, intellectual property rights, and proprietary rights in and to the software. No ownership rights are transferred to you.
|
|
32
|
+
|
|
33
|
+
## Termination
|
|
34
|
+
|
|
35
|
+
This license automatically terminates if you violate any of its terms. Upon termination, you must immediately stop using the software and delete all copies in your possession or control.
|
|
36
|
+
|
|
37
|
+
## Disclaimer of Warranty
|
|
38
|
+
|
|
39
|
+
This software is provided “as is” without warranties of any kind, whether express, implied, statutory, or otherwise, including but not limited to warranties of merchantability, fitness for a particular purpose, and non-infringement.
|
|
40
|
+
|
|
41
|
+
## Limitation of Liability
|
|
42
|
+
|
|
43
|
+
To the maximum extent permitted by law, [YOUR NAME OR COMPANY NAME] shall not be liable for any indirect, incidental, special, consequential, exemplary, or punitive damages, or for any loss of profits, revenue, data, goodwill, or business opportunities arising out of or related to the use of this software.
|
|
44
|
+
|
|
45
|
+
## Contact
|
|
46
|
+
|
|
47
|
+
For licensing inquiries, contact:
|
|
48
|
+
|
|
49
|
+
[apoorviy@hcltech.com]
|
package/README.md
ADDED
|
@@ -0,0 +1,642 @@
|
|
|
1
|
+
# aiforcecli-chat
|
|
2
|
+
|
|
3
|
+
Version: 0.1.0
|
|
4
|
+
|
|
5
|
+
One CLI over multiple coding agents, with routing, verification, racing, local learning, and cost controls.
|
|
6
|
+
|
|
7
|
+
`aiforcecli-chat` does not reimplement an agent or host models. It shells out to the agent CLIs you already have installed and authenticated, normalizes their output, records cost/outcomes locally, and uses your repo's tests as the objective signal for choosing what to trust.
|
|
8
|
+
|
|
9
|
+
## What It Wraps
|
|
10
|
+
|
|
11
|
+
Built-in adapters:
|
|
12
|
+
|
|
13
|
+
| Agent id | CLI | What it supports |
|
|
14
|
+
| --- | --- | --- |
|
|
15
|
+
| `claude-code` | `claude` | Claude Code via `-p --output-format stream-json`; parses usage/cost from the CLI; supports session resume. |
|
|
16
|
+
| `codex` | `codex` | OpenAI Codex via `codex exec --json`; parses JSONL events; computes cost from token usage; supports thread resume. |
|
|
17
|
+
| `aider` | `aider` | Aider one-shot mode; model-agnostic through Aider's `--model`; parses Aider's token/cost summary when present. |
|
|
18
|
+
| `antigravity` | `agy` | Google's Antigravity/Gemini CLI through `agy -p`; runs under a pseudo-terminal because output is TUI-oriented; no token/cost reporting from the CLI. |
|
|
19
|
+
|
|
20
|
+
Each adapter implements the same contract: detect whether the CLI exists, run a prompt in a working directory, stream normalized events, and optionally resume an existing session.
|
|
21
|
+
|
|
22
|
+
## Why Use It
|
|
23
|
+
|
|
24
|
+
The product is built around a simple workflow:
|
|
25
|
+
|
|
26
|
+
1. Use `advise` before spending money to choose an agent/model.
|
|
27
|
+
2. Use `run --heal` when you want one agent to fix, verify, retry, and escalate.
|
|
28
|
+
3. Use `race` when quality matters: run multiple agents in isolated git worktrees, verify each result, and apply the passing winner.
|
|
29
|
+
4. Use `pr` when you want a complete branch -> verify -> commit -> push -> GitHub pull request workflow.
|
|
30
|
+
5. Use `bench` and `eval` to learn which agent/model actually works on your repo.
|
|
31
|
+
6. Use `cost` and budgets to avoid bill shock.
|
|
32
|
+
|
|
33
|
+
The strongest differentiator is that `aiforcecli-chat` is horizontal: it compares and governs several coding agents instead of locking you into one.
|
|
34
|
+
|
|
35
|
+
## Quickstart
|
|
36
|
+
|
|
37
|
+
Install the wrapper:
|
|
38
|
+
|
|
39
|
+
```bash
|
|
40
|
+
npm install -g aiforcecli-chat
|
|
41
|
+
```
|
|
42
|
+
|
|
43
|
+
Install at least one underlying agent CLI:
|
|
44
|
+
|
|
45
|
+
```bash
|
|
46
|
+
npm install -g @anthropic-ai/claude-code
|
|
47
|
+
npm install -g @openai/codex
|
|
48
|
+
python -m pip install aider-chat
|
|
49
|
+
```
|
|
50
|
+
|
|
51
|
+
Antigravity uses Google's `agy` binary; install and authenticate it through Google's current Antigravity distribution.
|
|
52
|
+
|
|
53
|
+
Check what is available:
|
|
54
|
+
|
|
55
|
+
```bash
|
|
56
|
+
aiforcecli-chat agents
|
|
57
|
+
aiforcecli-chat models
|
|
58
|
+
```
|
|
59
|
+
|
|
60
|
+
Scaffold project config:
|
|
61
|
+
|
|
62
|
+
```bash
|
|
63
|
+
cd my-project
|
|
64
|
+
aiforcecli-chat init
|
|
65
|
+
```
|
|
66
|
+
|
|
67
|
+
Run a task:
|
|
68
|
+
|
|
69
|
+
```bash
|
|
70
|
+
aiforcecli-chat run "add a health check endpoint and a test for it"
|
|
71
|
+
```
|
|
72
|
+
|
|
73
|
+
Run a verified healing loop:
|
|
74
|
+
|
|
75
|
+
```bash
|
|
76
|
+
aiforcecli-chat run "fix the failing auth test" --heal --verify "npm test"
|
|
77
|
+
```
|
|
78
|
+
|
|
79
|
+
Race multiple agents:
|
|
80
|
+
|
|
81
|
+
```bash
|
|
82
|
+
aiforcecli-chat race "fix the failing tests" --agents claude-code,codex --verify "npm test" --select cheapest
|
|
83
|
+
```
|
|
84
|
+
|
|
85
|
+
Create a tested GitHub pull request:
|
|
86
|
+
|
|
87
|
+
```bash
|
|
88
|
+
aiforcecli-chat pr "add input validation to the signup form" --agent codex --model gpt-5.4-mini --heal
|
|
89
|
+
```
|
|
90
|
+
|
|
91
|
+
See spend:
|
|
92
|
+
|
|
93
|
+
```bash
|
|
94
|
+
aiforcecli-chat cost
|
|
95
|
+
```
|
|
96
|
+
|
|
97
|
+
## Commands
|
|
98
|
+
|
|
99
|
+
| Command | Summary |
|
|
100
|
+
| --- | --- |
|
|
101
|
+
| `aiforcecli-chat run "<task>"` | Run a coding task. Without `--agent`, auto-routes to an agent/model. Supports `--agent`, `--model`, `--cwd`, `--budget`, `--resume`, `--explain`, `--json`, `--heal`, `--max-attempts`, `--verify`, `--skip-verify`, `--route deterministic|bayesian`, `--explore`, and `--no-explore`. |
|
|
102
|
+
| `aiforcecli-chat advise "<task>"` | Recommend an agent/model without running an agent. Supports `--cwd`, `--budget`, `--explore`, and `--json`. |
|
|
103
|
+
| `aiforcecli-chat scorecard update` | Manually fetch public benchmark leaderboards and generate local public-prior scorecard artifacts. |
|
|
104
|
+
| `aiforcecli-chat pr "<task>"` | Run a task on a new branch, verify it, commit, push, and open a GitHub pull request. Supports `--agent`, `--model`, `--cwd`, `--budget`, `--branch`, `--base`, `--title`, `--body`, `--commit-message`, `--remote`, `--draft`, `--no-push`, `--no-pr`, `--allow-dirty`, `--heal`, `--max-attempts`, `--verify`, `--skip-verify`, `--route`, `--explore`, and `--no-explore`. |
|
|
105
|
+
| `aiforcecli-chat race "<task>"` | Run several agents in parallel in isolated git worktrees, verify each, and apply the winner. Supports `--agents`, `--cwd`, `--budget`, `--select cheapest|fastest|first-pass`, `--verify`, `--keep`, and `--json`. |
|
|
106
|
+
| `aiforcecli-chat eval` | Run private eval cases against installed agent/model targets to calibrate `advise`. Supports `--cwd`, `--dir`, `--agents`, and `--json`. |
|
|
107
|
+
| `aiforcecli-chat bench` | Local leaderboard from recorded outcomes. Supports `--since`, `--by-task`, `--clean`, and `--json`. |
|
|
108
|
+
| `aiforcecli-chat models` | Show the built-in plus config-added model catalog, tiers, prices, install status, and `routing.only` allow-list status. Supports `--json`. |
|
|
109
|
+
| `aiforcecli-chat agents` | Show agent install status. Supports `--enable <id>`, `--disable <id>`, and `--json`. |
|
|
110
|
+
| `aiforcecli-chat cost` | Report spend by `day`, `agent`, or `project`. Supports `--since`, `--by day|agent|project`, and `--json`. |
|
|
111
|
+
| `aiforcecli-chat init` | Write `aiforcecli.config.json` and install bundled assets. Supports `--cwd` and `--force`. |
|
|
112
|
+
| `aiforcecli-chat mcp` | Command is reserved for Phase 3. In 0.1.0 it is a stub and exits with a not-implemented message. |
|
|
113
|
+
|
|
114
|
+
## Public Prior Scorecard
|
|
115
|
+
|
|
116
|
+
Bayesian `advise`, `run --route bayesian`, chat auto-routing, and `pr --route bayesian` use the generated public-prior scorecard by default. Refresh it manually:
|
|
117
|
+
|
|
118
|
+
```bash
|
|
119
|
+
aiforcecli-chat scorecard update
|
|
120
|
+
```
|
|
121
|
+
|
|
122
|
+
The generated scorecard is stored under the installed package at `tools/scorecard/generated/scorecard/scorecard.json`. If it is unavailable, routing falls back to the bundled static scorecard.
|
|
123
|
+
|
|
124
|
+
## Architecture
|
|
125
|
+
|
|
126
|
+
```text
|
|
127
|
+
CLI
|
|
128
|
+
commands: run, advise, scorecard, pr, race, eval, bench, models, agents, cost, init, mcp
|
|
129
|
+
adapters: claude-code, codex, aider, antigravity
|
|
130
|
+
routing: catalog, classifier, deterministic router
|
|
131
|
+
advise: task analysis, public scorecard, private stats, scorer, bandit policy
|
|
132
|
+
core: orchestrator, subprocess runner, worktree isolation, race, heal
|
|
133
|
+
verify: test command detection and execution
|
|
134
|
+
finops: SQLite usage store, budgets, pricing, telemetry
|
|
135
|
+
```
|
|
136
|
+
|
|
137
|
+
Adapters normalize native agent output into this event shape:
|
|
138
|
+
|
|
139
|
+
```ts
|
|
140
|
+
type NormalizedEvent =
|
|
141
|
+
| { type: 'token'; text: string }
|
|
142
|
+
| { type: 'message'; role: 'assistant' | 'user'; text: string }
|
|
143
|
+
| { type: 'tool_call'; name: string; input?: unknown; id?: string }
|
|
144
|
+
| { type: 'usage'; usage: Usage; cumulative: boolean }
|
|
145
|
+
| { type: 'error'; message: string; fatal: boolean };
|
|
146
|
+
```
|
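As a rough illustration of how a consumer might total `usage` events, the sketch below folds a stream into one running figure. The `Usage` fields and the `foldUsage` helper are hypothetical (the README names `Usage` but does not define it), and the reading of `cumulative` as "running total" versus "delta" is an assumption, not confirmed behavior.

```ts
// Hypothetical Usage shape; the README names the type but does not define it.
interface Usage {
  inputTokens: number;
  outputTokens: number;
  costUsd: number;
}

type UsageEvent = { type: 'usage'; usage: Usage; cumulative: boolean };

// Assumption: cumulative=true carries a running total (replace the tally),
// cumulative=false carries a delta (add to the tally).
function foldUsage(events: UsageEvent[]): Usage {
  let total: Usage = { inputTokens: 0, outputTokens: 0, costUsd: 0 };
  for (const event of events) {
    total = event.cumulative
      ? { ...event.usage }
      : {
          inputTokens: total.inputTokens + event.usage.inputTokens,
          outputTokens: total.outputTokens + event.usage.outputTokens,
          costUsd: total.costUsd + event.usage.costUsd,
        };
  }
  return total;
}

const deltas: UsageEvent[] = [
  { type: 'usage', usage: { inputTokens: 10, outputTokens: 5, costUsd: 0.25 }, cumulative: false },
  { type: 'usage', usage: { inputTokens: 20, outputTokens: 10, costUsd: 0.5 }, cumulative: false },
];
const deltaTotal = foldUsage(deltas); // 30 input tokens, 15 output tokens, 0.75 USD
```

Keeping the flag on the event lets one reducer serve both adapters that stream increments and adapters that report a single end-of-run total.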
|
147
|
+
|
|
148
|
+
A finished run resolves to message, usage, exit code, optional session/thread id, and abort metadata.
|
|
149
|
+
|
|
150
|
+
## Model Catalog
|
|
151
|
+
|
|
152
|
+
`aiforcecli-chat models` displays the catalog used for routing estimates and recommendations.
|
|
153
|
+
|
|
154
|
+
Built-in catalog in 0.1.0:
|
|
155
|
+
|
|
156
|
+
| Key | Agent | Model | Tier |
|
|
157
|
+
| --- | --- | --- | --- |
|
|
158
|
+
| `claude-haiku` | `claude-code` | `haiku` | light |
|
|
159
|
+
| `claude-sonnet` | `claude-code` | `sonnet` | standard |
|
|
160
|
+
| `claude-opus` | `claude-code` | `opus` | heavy |
|
|
161
|
+
| `claude-fable` | `claude-code` | `fable` | heavy |
|
|
162
|
+
| `codex-gpt-5.4-mini` | `codex` | `gpt-5.4-mini` | light |
|
|
163
|
+
| `codex-gpt-5.4` | `codex` | `gpt-5.4` | standard |
|
|
164
|
+
| `codex-gpt-5.5` | `codex` | `gpt-5.5` | heavy |
|
|
165
|
+
| `aider-deepseek` | `aider` | `deepseek` | standard |
|
|
166
|
+
| `aider-codestral` | `aider` | `codestral/codestral-latest` | standard |
|
|
167
|
+
| `gemini-3-flash` | `antigravity` | `gemini-3-flash` | light |
|
|
168
|
+
| `gemini-3.5-flash` | `antigravity` | `gemini-3.5-flash` | standard |
|
|
169
|
+
| `gemini-3.1-pro` | `antigravity` | `gemini-3.1-pro` | heavy |
|
|
170
|
+
|
|
171
|
+
You can restrict routing with `routing.only` and add or override catalog entries with `routing.models`.
|
|
172
|
+
|
|
173
|
+
Example:
|
|
174
|
+
|
|
175
|
+
```json
|
|
176
|
+
{
|
|
177
|
+
"routing": {
|
|
178
|
+
"only": ["claude-sonnet", "codex-gpt-5.4-mini"],
|
|
179
|
+
"models": [
|
|
180
|
+
{
|
|
181
|
+
"key": "codex-mini",
|
|
182
|
+
"agent": "codex",
|
|
183
|
+
"model": "gpt-5.4-mini",
|
|
184
|
+
"tier": 1,
|
|
185
|
+
"price": { "input": 0.1875, "output": 1.13 }
|
|
186
|
+
}
|
|
187
|
+
]
|
|
188
|
+
}
|
|
189
|
+
}
|
|
190
|
+
```
|
|
191
|
+
|
|
192
|
+
Prices are USD per 1M tokens and are used for estimates. Runtime cost uses the agent CLI's reported cost when available; otherwise it is computed from the local pricing table.
|
|
193
|
+
|
|
194
|
+
Claude Fable is exposed through Claude Code as `--model fable` and is treated as a heavy-tier option. Its estimated price is `$10` input / `$50` output per 1M tokens, so it is best reserved for harder refactors, architecture work, security-sensitive changes, and complex PRs rather than small edits.
|
|
195
|
+
|
|
196
|
+
## Routing
|
|
197
|
+
|
|
198
|
+
When `run` is called without `--agent`, `aiforcecli-chat` routes the task.
|
|
199
|
+
|
|
200
|
+
Routing strategies:
|
|
201
|
+
|
|
202
|
+
| Strategy | Behavior |
|
|
203
|
+
| --- | --- |
|
|
204
|
+
| `deterministic` | Default. Classifies the prompt as light, standard, or heavy with keyword/length heuristics, estimates candidate costs, and picks the cheapest model that meets the desired tier within the effective budget. |
|
|
205
|
+
| `bayesian` | Uses the same learned recommendation pipeline as `advise`: public priors plus private verified outcomes, with optional Thompson-sampling exploration. |
|
|
206
|
+
|
|
207
|
+
Effective budget is the tightest of:
|
|
208
|
+
|
|
209
|
+
- `--budget`
|
|
210
|
+
- `budgets.maxCostPerRunUsd`
|
|
211
|
+
- remaining `dailyCapUsd`
|
|
212
|
+
- remaining `weeklyCapUsd`
|
|
213
|
+
- remaining `monthlyCapUsd`
|
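The "tightest of" rule above amounts to taking the minimum over whichever limits are set. A minimal sketch, assuming hypothetical field names and assuming that an already overspent window clamps the result to zero:

```ts
// Hypothetical input shape: each field mirrors one bullet in the list above.
interface BudgetLimits {
  flagBudgetUsd?: number;       // --budget
  maxCostPerRunUsd?: number;    // budgets.maxCostPerRunUsd
  remainingDailyUsd?: number;   // dailyCapUsd minus spend so far today
  remainingWeeklyUsd?: number;  // weeklyCapUsd minus spend so far this week
  remainingMonthlyUsd?: number; // monthlyCapUsd minus spend so far this month
}

// Returns the smallest configured limit, or undefined when nothing is capped.
function effectiveBudget(limits: BudgetLimits): number | undefined {
  const defined = [
    limits.flagBudgetUsd,
    limits.maxCostPerRunUsd,
    limits.remainingDailyUsd,
    limits.remainingWeeklyUsd,
    limits.remainingMonthlyUsd,
  ].filter((value): value is number => value !== undefined);
  if (defined.length === 0) return undefined;
  // Assumption: a negative remainder (window already overspent) is treated as zero.
  return Math.max(0, Math.min(...defined));
}

const budget = effectiveBudget({ flagBudgetUsd: 5, maxCostPerRunUsd: 1, remainingDailyUsd: 0.4 }); // 0.4
```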
|
214
|
+
|
|
215
|
+
Explicit `--agent` always wins. Explicit `--model` overrides the routed or configured model.
|
|
216
|
+
|
|
217
|
+
Examples:
|
|
218
|
+
|
|
219
|
+
```bash
|
|
220
|
+
aiforcecli-chat run "fix a typo in the README"
|
|
221
|
+
aiforcecli-chat run "refactor the auth architecture" --budget 5 --explain
|
|
222
|
+
aiforcecli-chat run "fix the cache bug" --route bayesian --explore
|
|
223
|
+
aiforcecli-chat run "add validation" --agent codex --model gpt-5.4-mini
|
|
224
|
+
aiforcecli-chat pr "redesign the calculator architecture" --agent claude-code --model fable --branch fable-refactor --base dev-ai-base
|
|
225
|
+
```
|
|
226
|
+
|
|
227
|
+
## Advise
|
|
228
|
+
|
|
229
|
+
`advise` is a no-run recommendation engine:
|
|
230
|
+
|
|
231
|
+
```bash
|
|
232
|
+
aiforcecli-chat advise "migrate auth from sessions to JWT" --budget 2
|
|
233
|
+
```
|
|
234
|
+
|
|
235
|
+
It produces:
|
|
236
|
+
|
|
237
|
+
- task type: `bugfix`, `feature`, `refactor`, `test`, `docs`, `security`, `perf`, or `general`
|
|
238
|
+
- complexity tier: light, standard, or heavy
|
|
239
|
+
- codebase scan: top languages and file count
|
|
240
|
+
- test detection
|
|
241
|
+
- budget headroom
|
|
242
|
+
- ranked agent/model recommendations
|
|
243
|
+
- confidence, expected cost, estimated capability, and reasons
|
|
244
|
+
- policy mix: how often each arm would be selected under Thompson sampling
|
|
245
|
+
|
|
246
|
+
Scoring:
|
|
247
|
+
|
|
248
|
+
```text
|
|
249
|
+
capability = posterior(public prior, private pass/fail history)
|
|
250
|
+
score = wCapability * capability + wCost * costFit + wSpeed * speedFit
|
|
251
|
+
```
|
|
252
|
+
|
|
253
|
+
Defaults:
|
|
254
|
+
|
|
255
|
+
```json
|
|
256
|
+
{
|
|
257
|
+
"advise": {
|
|
258
|
+
"weights": { "capability": 0.7, "cost": 0.2, "speed": 0.1 },
|
|
259
|
+
"privatePseudocount": 5,
|
|
260
|
+
"explore": false
|
|
261
|
+
}
|
|
262
|
+
}
|
|
263
|
+
```
|
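To make the scoring formula and the default weights concrete, here is a small worked sketch. The weights and `privatePseudocount` come from the defaults above; the function names are hypothetical, and the posterior is assumed to be a Beta-style mean in which the public prior counts as `privatePseudocount` virtual trials. The package's actual posterior may differ.

```ts
interface Weights {
  capability: number;
  cost: number;
  speed: number;
}

// Values taken from the README defaults.
const DEFAULT_WEIGHTS: Weights = { capability: 0.7, cost: 0.2, speed: 0.1 };
const PRIVATE_PSEUDOCOUNT = 5;

// Assumption: the public prior acts like `pseudocount` virtual trials,
// so private pass/fail history gradually outweighs it.
function posteriorCapability(
  publicPrior: number,
  passes: number,
  trials: number,
  pseudocount: number = PRIVATE_PSEUDOCOUNT,
): number {
  return (publicPrior * pseudocount + passes) / (pseudocount + trials);
}

// score = wCapability * capability + wCost * costFit + wSpeed * speedFit
function score(capability: number, costFit: number, speedFit: number, w: Weights = DEFAULT_WEIGHTS): number {
  return w.capability * capability + w.cost * costFit + w.speed * speedFit;
}

// Public prior 0.6, then 8 passes in 10 private trials: (0.6 * 5 + 8) / (5 + 10) = 11/15.
const capability = posteriorCapability(0.6, 8, 10);
const total = score(capability, 0.5, 0.5);
```

Under this assumption, a model with no private history scores exactly at its public prior, and ten private trials already carry twice the weight of the prior.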
|
264
|
+
|
|
265
|
+
The public scorecard is a dated, curated prior. Your verified `heal`, `race`, and `eval` outcomes increasingly outweigh it as data accumulates.
|
|
266
|
+
|
|
267
|
+
## Race
|
|
268
|
+
|
|
269
|
+
`race` is the high-trust workflow:
|
|
270
|
+
|
|
271
|
+
```bash
|
|
272
|
+
aiforcecli-chat race "fix the failing checkout test" --agents claude-code,codex --verify "npm test" --select cheapest
|
|
273
|
+
```
|
|
274
|
+
|
|
275
|
+
What happens:
|
|
276
|
+
|
|
277
|
+
1. Finds the local git repo root from `--cwd` or the current directory.
|
|
278
|
+
2. Captures tracked and untracked dirty changes.
|
|
279
|
+
3. Creates one detached temp git worktree per agent.
|
|
280
|
+
4. Carries your current changes into each worktree and commits them as that worktree's base.
|
|
281
|
+
5. Runs each agent independently.
|
|
282
|
+
6. Runs the verify command in each worktree.
|
|
283
|
+
7. Selects a passing non-empty diff by `cheapest`, `fastest`, or `first-pass`.
|
|
284
|
+
8. Applies the winner's diff back to your real working tree.
|
|
285
|
+
9. Records outcome, cost, duration, task class/type, reward, and winner flag.
|
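Step 7 above can be sketched as a filter followed by a minimum. The `RaceResult` fields and `selectWinner` are hypothetical names, and the exact meaning of each mode is an assumption: `cheapest` minimizes recorded cost, `fastest` minimizes run duration, and `first-pass` takes the earliest candidate to finish with a passing verify.

```ts
type SelectMode = 'cheapest' | 'fastest' | 'first-pass';

// Hypothetical per-racer summary.
interface RaceResult {
  agent: string;
  passed: boolean;    // verify command succeeded in this worktree
  diffEmpty: boolean; // the agent changed nothing
  costUsd: number;
  durationMs: number;
  finishedAt: number; // epoch milliseconds
}

function selectWinner(results: RaceResult[], mode: SelectMode): RaceResult | undefined {
  // Only passing, non-empty diffs are eligible.
  const eligible = results.filter((r) => r.passed && !r.diffEmpty);
  if (eligible.length === 0) return undefined;
  const key = (r: RaceResult): number =>
    mode === 'cheapest' ? r.costUsd : mode === 'fastest' ? r.durationMs : r.finishedAt;
  // Smallest key wins; ties keep the earlier entry.
  return eligible.reduce((best, r) => (key(r) < key(best) ? r : best));
}

const results: RaceResult[] = [
  { agent: 'claude-code', passed: true, diffEmpty: false, costUsd: 0.4, durationMs: 90000, finishedAt: 1090 },
  { agent: 'codex', passed: true, diffEmpty: false, costUsd: 0.1, durationMs: 120000, finishedAt: 1120 },
  { agent: 'aider', passed: false, diffEmpty: false, costUsd: 0.05, durationMs: 30000, finishedAt: 1030 },
];
const winner = selectWinner(results, 'cheapest'); // codex: cheapest among passing candidates
```

Note that the failing `aider` run is the cheapest and fastest overall but is never considered, which is the point of gating selection on verification.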
|
286
|
+
|
|
287
|
+
Requirements:
|
|
288
|
+
|
|
289
|
+
- The target directory must be inside a git repo.
|
|
290
|
+
- The repo must have at least one commit, because worktrees are created from `HEAD`.
|
|
291
|
+
- A verify command is strongly recommended. Without one, `race` cannot objectively pick a winner and only offers manual selection in an interactive terminal.
|
|
292
|
+
|
|
293
|
+
Budget behavior:
|
|
294
|
+
|
|
295
|
+
- `--budget` is the total race budget.
|
|
296
|
+
- It is split evenly across racers.
|
|
297
|
+
|
|
298
|
+
Model behavior:
|
|
299
|
+
|
|
300
|
+
- `race --agents` accepts agent ids, not per-agent model syntax.
|
|
301
|
+
- To race specific models, set `agents.<id>.model` in config.
|
|
302
|
+
|
|
303
|
+
Example:
|
|
304
|
+
|
|
305
|
+
```json
|
|
306
|
+
{
|
|
307
|
+
"agents": {
|
|
308
|
+
"claude-code": { "model": "sonnet" },
|
|
309
|
+
"codex": { "model": "gpt-5.4-mini" }
|
|
310
|
+
}
|
|
311
|
+
}
|
|
312
|
+
```
|
|
313
|
+
|
|
314
|
+
Then:
|
|
315
|
+
|
|
316
|
+
```bash
|
|
317
|
+
aiforcecli-chat race "fix the discount bug" --agents claude-code,codex --verify "npm test"
|
|
318
|
+
```
|
|
319
|
+
|
|
320
|
+
Dependency links:
|
|
321
|
+
|
|
322
|
+
- `node_modules` is linked into worktrees automatically when present.
|
|
323
|
+
- Add other gitignored dependency/build directories with `race.linkPaths`, for example `.venv`, `target`, or `vendor`.
|
|
324
|
+
- Cleanup severs symlinks/junctions before removing worktrees to avoid deleting through dependency links.
|
|
325
|
+
|
|
326
|
+
## PR Mode
|
|
327
|
+
|
|
328
|
+
`pr` is the one-command workflow for turning an AI task into a normal GitHub pull request:
|
|
329
|
+
|
|
330
|
+
```bash
|
|
331
|
+
aiforcecli-chat pr "fix the checkout timeout bug" --agent codex --model gpt-5.4 --heal
|
|
332
|
+
```
|
|
333
|
+
|
|
334
|
+
What happens:
|
|
335
|
+
|
|
336
|
+
1. Requires a clean git working tree by default so AI edits do not mix with existing local work.
|
|
337
|
+
2. Creates a new branch, or uses `--branch <name>`.
|
|
338
|
+
3. Runs the selected or routed agent on the task.
|
|
339
|
+
4. Runs final verification if a verify command is available.
|
|
340
|
+
5. Commits the change when verification passes.
|
|
341
|
+
6. Pushes the branch to the configured remote.
|
|
342
|
+
7. Opens a GitHub pull request with the GitHub CLI (`gh`).
|
|
343
|
+
|
|
344
|
+
Useful examples:
|
|
345
|
+
|
|
346
|
+
```bash
|
|
347
|
+
aiforcecli-chat pr "add a contact form" --branch ai-contact-form
|
|
348
|
+
aiforcecli-chat pr "fix the parser bug" --heal --verify "npm test"
|
|
349
|
+
aiforcecli-chat pr "update the About page copy" --no-push
|
|
350
|
+
aiforcecli-chat pr "prepare a docs-only change" --skip-verify --draft
|
|
351
|
+
aiforcecli-chat pr "push a branch but do not open a PR" --no-pr
|
|
352
|
+
```
|
|
353
|
+
|
|
354
|
+
Requirements:
|
|
355
|
+
|
|
356
|
+
- The target directory must be inside a git repo.
|
|
357
|
+
- Start from a clean working tree unless you intentionally pass `--allow-dirty`.
|
|
358
|
+
- `gh` must be installed and authenticated to open the PR automatically.
|
|
359
|
+
- If `gh` is unavailable, the branch is still pushed and the command prints the `gh pr create` command to run later.
|
|
360
|
+
|
|
361
|
+
## Self-Healing Runs
|
|
362
|
+
|
|
363
|
+
`run --heal` turns a single agent run into a verify/fix/escalate loop:
|
|
364
|
+
|
|
365
|
+
```bash
|
|
366
|
+
aiforcecli-chat run "fix the failing parser test" --heal --verify "npm test"
|
|
367
|
+
```
|
|
368
|
+
|
|
369
|
+
Loop:
|
|
370
|
+
|
|
371
|
+
1. Run selected agent/model.
|
|
372
|
+
2. Run verify command.
|
|
373
|
+
3. If passed, stop.
|
|
374
|
+
4. If failed and the adapter supports resume, retry the same agent with the verify failure output.
|
|
375
|
+
5. If still failing and escalation is enabled, select a stronger untried catalog entry.
|
|
376
|
+
6. Stop on pass, budget breach, or `heal.maxAttempts`.
|
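The loop above can be sketched as a small state machine. Everything here is illustrative: `heal`, `HealOptions`, the `ladder` of catalog keys ordered weakest to strongest, and the callback signatures are hypothetical names, and real runs are asynchronous subprocesses rather than synchronous callbacks.

```ts
interface Attempt {
  target: string;
  passed: boolean;
  resumed: boolean;
}

interface HealOptions {
  maxAttempts: number;     // heal.maxAttempts
  escalate: boolean;       // heal.escalate
  supportsResume: boolean; // whether the adapter can resume a session
  ladder: string[];        // hypothetical: catalog keys, weakest to strongest
  budgetUsd: number;
}

function heal(
  opts: HealOptions,
  runAgent: (target: string, failureOutput?: string) => number, // returns cost in USD
  verify: () => { passed: boolean; output: string },
): Attempt[] {
  const attempts: Attempt[] = [];
  let spent = 0;
  let rung = 0;
  let resumeNext = false;
  let failure: string | undefined;
  while (attempts.length < opts.maxAttempts && spent < opts.budgetUsd) {
    const target = opts.ladder[rung];
    if (target === undefined) break;
    spent += runAgent(target, failure);                          // step 1
    const result = verify();                                     // step 2
    attempts.push({ target, passed: result.passed, resumed: resumeNext });
    if (result.passed) break;                                    // step 3
    failure = result.output;
    if (opts.supportsResume && !resumeNext) {
      resumeNext = true;                                         // step 4: retry same target
    } else if (opts.escalate && rung + 1 < opts.ladder.length) {
      rung += 1;                                                 // step 5: stronger untried entry
      resumeNext = false;
    } else {
      break;
    }
  }
  return attempts;                                               // step 6: loop guards enforce the caps
}
```

With `maxAttempts: 3`, a resumable adapter, and escalation on, a task that only the second ladder entry can solve yields three attempts: a first try, a resumed retry on the same entry, then a pass on the stronger one.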
|
377
|
+
|
|
378
|
+
Config:
|
|
379
|
+
|
|
380
|
+
```json
|
|
381
|
+
{
|
|
382
|
+
"heal": {
|
|
383
|
+
"enabled": false,
|
|
384
|
+
"maxAttempts": 3,
|
|
385
|
+
"escalate": true
|
|
386
|
+
}
|
|
387
|
+
}
|
|
388
|
+
```
|
|
389
|
+
|
|
390
|
+
`--heal` edits the real working tree by design. Use `race` when you want isolated candidates.
|
|
391
|
+
|
|
392
|
+
## Verification
|
|
393
|
+
|
|
394
|
+
Verification resolves in this order:
|
|
395
|
+
|
|
396
|
+
1. `--verify "<cmd>"`
|
|
397
|
+
2. `verify.command` in config
|
|
398
|
+
3. auto-detection, if `verify.enabled` is true
|
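The resolution order above is a simple first-match chain. A minimal sketch, with `resolveVerifyCommand` and the injected `detect` callback as hypothetical names standing in for whatever the package does internally:

```ts
// Mirrors the `verify` config block keys that matter for resolution.
interface VerifyConfig {
  enabled?: boolean;
  command?: string;
}

function resolveVerifyCommand(
  flag: string | undefined,           // value of --verify "<cmd>", if passed
  config: VerifyConfig,
  detect: () => string | undefined,   // hypothetical auto-detection hook
): string | undefined {
  if (flag) return flag;                      // 1. explicit flag wins
  if (config.command) return config.command;  // 2. then verify.command
  if (config.enabled) return detect();        // 3. then auto-detection, only when enabled
  return undefined;                           // nothing to run
}

const resolved = resolveVerifyCommand(undefined, { enabled: true }, () => 'npm test'); // 'npm test'
```

Passing detection in as a callback keeps the ordering logic testable without touching the filesystem.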
|
399
|
+
|
|
400
|
+
Auto-detected commands include:
|
|
401
|
+
|
|
402
|
+
- `npm test`, `pnpm test`, or `yarn test` when `package.json` has a `test` script
|
|
403
|
+
- `pytest`
|
|
404
|
+
- `cargo test`
|
|
405
|
+
- `go test ./...`
|
|
406
|
+
- `bundle exec rake test`
|
|
407
|
+
- `mvn test`
|
|
408
|
+
- `gradle test`
|
|
409
|
+
|
|
410
|
+
The verify command runs through a shell, has a timeout, returns pass/fail, and captures the tail of output for healing.
|
|
411
|
+
|
|
412
|
+
## Eval And Bench
|
|
413
|
+
|
|
414
|
+
`eval` runs a private suite of representative cases against candidate agent/model pairs.
|
|
415
|
+
|
|
416
|
+
Case example:
|
|
417
|
+
|
|
418
|
+
```json
|
|
419
|
+
{
|
|
420
|
+
"prompt": "fix the off-by-one in parseRange",
|
|
421
|
+
"verify": "npm test",
|
|
422
|
+
"taskType": "bugfix"
|
|
423
|
+
}
|
|
424
|
+
```
|
|
425
|
+
|
|
426
|
+
Run:
|
|
427
|
+
|
|
428
|
+
```bash
|
|
429
|
+
aiforcecli-chat eval
|
|
430
|
+
aiforcecli-chat eval --agents claude-code,codex
|
|
431
|
+
```
|
|
432
|
+
|
|
433
|
+
Targets come from `eval.models` or all installed agents' catalog entries. Each trial runs in an isolated worktree and records outcomes for `advise` and `bench`.
|
|
434
|
+
|
|
435
|
+
`bench` shows local history:
|
|
436
|
+
|
|
437
|
+
```bash
|
|
438
|
+
aiforcecli-chat bench
|
|
439
|
+
aiforcecli-chat bench --by-task
|
|
440
|
+
aiforcecli-chat bench --since 7d
|
|
441
|
+
aiforcecli-chat bench --clean
|
|
442
|
+
```
|
|
443
|
+
|
|
444
|
+
`--clean` hides zero-signal rows such as typoed/retired model ids or unverified runs with no cost/pass/win signal.
|
|
445
|
+
|
|
446
|
+
Note: the current CLI command does not expose a project filter for `bench`; reports are from the local usage database.
|
|
447
|
+
|
|
448
|
+
## FinOps And Budgets
|
|
449
|
+
|
|
450
|
+
Every recorded run stores:
|
|
451
|
+
|
|
452
|
+
- timestamp
|
|
453
|
+
- agent and model
|
|
454
|
+
- project path
|
|
455
|
+
- prompt hash
|
|
456
|
+
- input/output/cache tokens
|
|
457
|
+
- cost and cost source
|
|
458
|
+
- exit code and abort reason
|
|
459
|
+
- mode: `run`, `heal`, `race`, or `eval`
|
|
460
|
+
- outcome: `pass`, `fail`, or `unknown`
|
|
461
|
+
- duration
|
|
462
|
+
- task class/type
|
|
463
|
+
- race winner flag
|
|
464
|
+
- learning context, propensity, and reward when available
|
|
465
|
+
|
|
466
|
+
Cost source:
|
|
467
|
+
|
|
468
|
+
- Claude Code reports cost directly; that is trusted.
|
|
469
|
+
- Codex reports tokens; cost is computed from local prices.
|
|
470
|
+
- Aider cost is parsed from Aider's summary line when present.
|
|
471
|
+
- Antigravity currently reports no usage/cost in this integration, so recorded cost may be zero even though the underlying service may charge separately.
|
|
472
|
+
|
|
473
|
+
Budget controls:
|
|
474
|
+
|
|
475
|
+
```json
|
|
476
|
+
{
|
|
477
|
+
"budgets": {
|
|
478
|
+
"maxCostPerRunUsd": 1.0,
|
|
479
|
+
"dailyCapUsd": 10.0,
|
|
480
|
+
"weeklyCapUsd": 50.0,
|
|
481
|
+
"monthlyCapUsd": 150.0
|
|
482
|
+
}
|
|
483
|
+
}
|
|
484
|
+
```
|
|
485
|
+
|
|
486
|
+
Window caps are checked before starting a run. Per-run caps are enforced during a run when usage events arrive. For agents that only report usage at the end, mid-run enforcement is best effort.
|
|
487
|
+
|
|
488
|
+
Cost reports:
|
|
489
|
+
|
|
490
|
+
```bash
|
|
491
|
+
aiforcecli-chat cost
|
|
492
|
+
aiforcecli-chat cost --since 24h
|
|
493
|
+
aiforcecli-chat cost --by agent
|
|
494
|
+
aiforcecli-chat cost --by project --json
|
|
495
|
+
```
|
|
496
|
+
|
|
497
|
+
## Telemetry
|
|
498
|
+
|
|
499
|
+
Telemetry is opt-in and off by default.
|
|
500
|
+
|
|
501
|
+
When enabled with `telemetry.enabled` and `telemetry.endpoint`, it sends only coarse anonymized outcome fields:
|
|
502
|
+
|
|
503
|
+
- task class
|
|
504
|
+
- agent
|
|
505
|
+
- model
|
|
506
|
+
- outcome
|
|
507
|
+
- winner flag
|
|
508
|
+
- mode
|
|
509
|
+
- cost
|
|
510
|
+
- duration
|
|
511
|
+
|
|
512
|
+
It does not send prompts, prompt hashes, project paths, or output. Upload is fire-and-forget with a short timeout and never blocks a run.
|
|
513
|
+
|
|
514
|
+
In 0.1.0, telemetry is a primitive upload hook, not a hosted shared leaderboard.
|
|
515
|
+
|
|
516
|
+
## Configuration
|
|
517
|
+
|
|
518
|
+
Config files are deep-merged:
|
|
519
|
+
|
|
520
|
+
1. user-level config
|
|
521
|
+
2. project-level `aiforcecli.config.json` or `aiforcecli.config.ts`
|
|
522
|
+
3. CLI flags
|
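To illustrate what deep-merging these three layers means in practice, the sketch below merges nested objects key by key, with later layers winning. `deepMerge` is a hypothetical helper, not the package's loader, and treating arrays as replaced wholesale (rather than concatenated) is an assumption.

```ts
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };
type JsonObject = { [key: string]: Json };

function isObject(value: Json | undefined): value is JsonObject {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

// Later layers override earlier ones; nested objects merge recursively.
// Assumption: arrays and scalars are replaced, not combined.
function deepMerge(base: JsonObject, override: JsonObject): JsonObject {
  const out: JsonObject = { ...base };
  for (const [key, value] of Object.entries(override)) {
    const current = out[key];
    out[key] = isObject(current) && isObject(value) ? deepMerge(current, value) : value;
  }
  return out;
}

const userLayer: JsonObject = { budgets: { dailyCapUsd: 10, maxCostPerRunUsd: 1 }, defaultAgent: 'claude-code' };
const projectLayer: JsonObject = { budgets: { maxCostPerRunUsd: 2 } };
const flagLayer: JsonObject = { defaultAgent: 'codex' };

// Merge in precedence order: user, then project, then CLI flags.
const layers: JsonObject[] = [userLayer, projectLayer, flagLayer];
const merged = layers.reduce<JsonObject>((acc, layer) => deepMerge(acc, layer), {});
```

Here the project layer raises `maxCostPerRunUsd` without discarding the user-level `dailyCapUsd`, which is the practical difference between a deep merge and a shallow one.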
|
523
|
+
|
|
524
|
+
User-level paths:
|
|
525
|
+
|
|
526
|
+
| Platform | Path |
|
|
527
|
+
| --- | --- |
|
|
528
|
+
| Windows | `%APPDATA%\aiforcecli\Config\aiforcecli.config.json` |
|
|
529
|
+
| macOS | `~/Library/Preferences/aiforcecli/aiforcecli.config.json` |
|
|
530
|
+
| Linux | `~/.config/aiforcecli/aiforcecli.config.json`, or under `$XDG_CONFIG_HOME` when that variable is set |
|
|
531
|
+
|
|
532
|
+
Schema summary:
|
|
533
|
+
|
|
534
|
+
| Block | Keys | Purpose |
|
|
535
|
+
| --- | --- | --- |
|
|
536
|
+
| `agents.<id>` | `enabled`, `model`, `bin`, `defaultFlags`, `allowedTools` | Per-agent settings. `enabled:false` removes the agent from registry/routing/race/eval/listing. |
|
|
537
|
+
| `defaultAgent` | string | Agent used when routing is disabled and no explicit agent is passed. |
|
|
538
|
+
| `routing` | `enabled`, `strategy`, `prefer`, `only`, `models` | Auto-routing behavior and model catalog customization. |
|
|
539
|
+
| `verify` | `enabled`, `command`, `timeoutMs` | Verification command resolution and timeout. |
|
|
540
|
+
| `heal` | `enabled`, `maxAttempts`, `escalate` | Self-healing behavior. |
|
|
541
|
+
| `race` | `agents`, `select`, `keepWorktrees`, `linkPaths` | Race defaults, winner selection, worktree retention, dependency links. |
|
|
542
|
+
| `advise` | `weights`, `privatePseudocount`, `explore` | Recommendation scoring and exploration. |
|
|
543
|
+
| `eval` | `dir`, `models` | Private eval suite location and catalog target keys. |
|
|
544
|
+
| `telemetry` | `enabled`, `endpoint` | Opt-in anonymized upload. |
|
|
545
|
+
| `budgets` | `maxCostPerRunUsd`, `dailyCapUsd`, `weeklyCapUsd`, `monthlyCapUsd` | Cost caps. |
|
|
546
|
+
| top-level | `timeoutMs`, `inactivityTimeoutMs` | Agent subprocess watchdogs. |
|
|
547
|
+
|
|
548
|
+
Enable or disable built-in agents globally:
|
|
549
|
+
|
|
550
|
+
```bash
|
|
551
|
+
aiforcecli-chat agents --disable antigravity
|
|
552
|
+
aiforcecli-chat agents --enable aider
|
|
553
|
+
```
|
|
554
|
+
|
|
555
|
+
These toggles write to the user-level config.
|
|
556
|
+
|
|
557
|
+
## `init`
|
|
558
|
+
|
|
559
|
+
`aiforcecli-chat init` writes a project config and installs bundled assets when present:
|
|
560
|
+
|
|
561
|
+
- `assets/skills` -> `.claude/skills`
|
|
562
|
+
- `assets/subagents` -> `.claude/agents`
|
|
563
|
+
- `assets/prompts` -> `.aiforcecli/prompts`
|
|
564
|
+
|
|
565
|
+
Use `--force` to overwrite an existing project config:
|
|
566
|
+
|
|
567
|
+
```bash
|
|
568
|
+
aiforcecli-chat init --force
|
|
569
|
+
```
|
|
570
|
+
|
|
571
|
+
## Known Limitations
|
|
572
|
+
|
|
573
|
+
- `aiforcecli-chat mcp` is not implemented in 0.1.0.
|
|
574
|
+
- `race` and `eval` require a git repo with a commit because they use git worktrees.
|
|
575
|
+
- Verification quality depends on the repo's tests/build command.
|
|
576
|
+
- Task classification is heuristic and keyword-based.
|
|
577
|
+
- The public scorecard is curated and dated, not live.
|
|
578
|
+
- New or custom models without scorecard rows start from a neutral prior until private evidence accumulates.
|
|
579
|
+
- `pr` needs the GitHub CLI (`gh`) installed and authenticated to open pull requests automatically.
|
|
580
|
+
- Some agent fatal errors can still look like no-change/no-usage outcomes in `race` or `eval`; inspect output and use `--keep` when debugging.
|
|
581
|
+
- Codex model availability can differ by account type; configure `routing.only` or `agents.codex.model` to match what your local Codex CLI accepts.
|
|
582
|
+
- Antigravity usage/cost is not reported by the CLI in this integration.
|
|
583
|
+
- Mid-run budget enforcement is best effort for agents that only emit usage at the end.
|
|
584
|
+
- `bench` currently reports from the local global usage DB; there is no CLI `--project` filter yet.
|
|
585
|
+
|
|
586
|
+
## Development
|
|
587
|
+
|
|
588
|
+
```bash
|
|
589
|
+
npm install
|
|
590
|
+
npm run build
|
|
591
|
+
npm run typecheck
|
|
592
|
+
npm test
|
|
593
|
+
```
|
|
594
|
+
|
|
595
|
+
Build:
|
|
596
|
+
|
|
597
|
+
- TypeScript is bundled with `tsup`.
|
|
598
|
+
- `npm run build` runs `tsup` and then `scripts/obfuscate.mjs`.
|
|
599
|
+
- `npm run build:raw` runs `tsup` only.
|
|
600
|
+
|
|
601
|
+
Tests cover:
|
|
602
|
+
|
|
603
|
+
- adapter parsers
|
|
604
|
+
- routing
|
|
605
|
+
- recommendation scoring
|
|
606
|
+
- bandit policy
|
|
607
|
+
- reward
|
|
608
|
+
- budget enforcement
|
|
609
|
+
- SQLite store
|
|
610
|
+
- verify detection
|
|
611
|
+
- worktree isolation
|
|
612
|
+
- self-healing policy
|
|
613
|
+
- race selection
|
|
614
|
+
- PR command helpers
|
|
615
|
+
- command helpers
|
|
616
|
+
|
|
617
|
+
Recorded fixtures live under `test/fixtures`.
|
|
618
|
+
|
|
619
|
+
## Roadmap
|
|
620
|
+
|
|
621
|
+
Shipped in 0.1.0:
|
|
622
|
+
|
|
623
|
+
- adapters for Claude Code, Codex, Aider, and Antigravity
|
|
624
|
+
- config loading/schema
|
|
625
|
+
- model catalog and `models` command
|
|
626
|
+
- deterministic and bayesian routing
|
|
627
|
+
- `run`, `run --heal`, `pr`, `race`, `advise`, `eval`, `bench`, `cost`, `agents`, `init`
|
|
628
|
+
- local SQLite usage/outcome store
|
|
629
|
+
- budgets and cost reporting
|
|
630
|
+
- opt-in telemetry hook
|
|
631
|
+
- worktree isolation for race/eval
|
|
632
|
+
- contextual-bandit recommendation policy and off-policy logging fields
|
|
633
|
+
|
|
634
|
+
Next:
|
|
635
|
+
|
|
636
|
+
- clearer fatal error surfacing in `race` and `eval`
|
|
637
|
+
- richer task classification
|
|
638
|
+
- Codex model compatibility auto-detection
|
|
639
|
+
- learned route/escalation policy from logged context/action/reward/propensity
|
|
640
|
+
- project-scoped `bench` filtering
|
|
641
|
+
- real MCP server with auth and budget-capped tools
|
|
642
|
+
- hosted or federated aggregate leaderboard
|