openrouter-agent-cli 0.1.2__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,22 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
22
+
@@ -0,0 +1,270 @@
1
+ Metadata-Version: 2.4
2
+ Name: openrouter-agent-cli
3
+ Version: 0.1.2
4
+ Summary: Standalone terminal agent for OpenRouter with tool actions and context management.
5
+ Requires-Python: >=3.10
6
+ Description-Content-Type: text/markdown
7
+ License-File: LICENSE
8
+ Requires-Dist: httpx>=0.27
9
+ Dynamic: license-file
10
+
11
+ # openrouter-agent-cli
12
+
13
+ Standalone terminal agent for OpenRouter models with:
14
+ - tool actions (`run_bash`)
15
+ - interactive permission gating (`allow` / `deny` / `ask`)
16
+ - session persistence
17
+ - context visibility and compaction
18
+
19
+ ## Install
20
+
21
+ ```bash
22
+ cd openrouter-agent-cli
23
+ python3 -m venv .venv
24
+ source .venv/bin/activate
25
+ pip install -e .
26
+ ```
27
+
28
+ ## Run
29
+
30
+ ```bash
31
+ export OPENROUTER_API_KEY=sk-or-...
32
+ openrouter-agent
33
+ ```
34
+
35
+ Or without installation:
36
+
37
+ ```bash
38
+ export OPENROUTER_API_KEY=sk-or-...
39
+ python -m openrouter_agent_cli.cli
40
+ ```
41
+
42
+ ## Non-interactive prompt
43
+
44
+ `--prompt` (short `-p`) lets another process run the CLI with a single user message, emit only the assistant reply to `stdout`, and exit immediately. Operation logs, tool call summaries, and permission notices are written to `stderr`, and tool calls are automatically denied unless you disable tools with `--no-tools`.
45
+
46
+ Example:
47
+
48
+ ```bash
49
+ openrouter-agent --prompt "Explain tail recursion" --no-tools
50
+ ```
51
+
52
+ ## Useful flags
53
+
54
+ ```bash
55
+ openrouter-agent \
56
+ --model arcee-ai/trinity-large-preview:free \
57
+ --session-id my-session \
58
+ --workdir ~/Projects \
59
+ --max-turns 24 \
60
+ --max-history-messages 60 \
61
+ --command-timeout 30
62
+ ```
63
+
64
+ Disable tools:
65
+
66
+ ```bash
67
+ openrouter-agent --no-tools
68
+ ```
69
+
70
+ ## Slash commands
71
+
72
+ - `/help`
73
+ - `/exit`
74
+ - `/model [id]`
75
+ - `/usage`
76
+ - `/context [n]`
77
+ - `/compact`
78
+ - `/clear`
79
+ - `/tools`
80
+ - `/tools on|off`
81
+ - `/allow <tool|*>`
82
+ - `/deny <tool|*>`
83
+ - `/unallow <tool|*>`
84
+ - `/undeny <tool|*>`
85
+ - `/cwd [path]`
86
+
87
+ ## Context management
88
+
89
+ - history is saved in `~/.openrouter-agent-cli/sessions/<session_id>.json`
90
+ - `/usage` shows rough token estimate
91
+ - `/compact` forces summarization
92
+ - automatic compaction triggers when non-system message count exceeds `--max-history-messages`
93
+
94
+ ## Security notes
95
+
96
+ - `run_bash` executes shell commands on your machine in `--workdir`
97
+ - default policy is `ask` for every tool call
98
+ - use `/deny *` for a fully no-tools session
99
+ - default model is free-tier (`arcee-ai/trinity-large-preview:free`); override with `--model` or `OPENROUTER_MODEL`
100
+
101
+ ## Tool schema seen by the model
102
+
103
+ When tools are enabled, each OpenRouter request includes this tool definition:
104
+
105
+ ```json
106
+ [
107
+ {
108
+ "type": "function",
109
+ "function": {
110
+ "name": "run_bash",
111
+ "description": "Run a shell command in the current working directory and return stdout/stderr.",
112
+ "parameters": {
113
+ "type": "object",
114
+ "properties": {
115
+ "command": {
116
+ "type": "string",
117
+ "description": "Shell command to execute."
118
+ },
119
+ "timeout_seconds": {
120
+ "type": "integer",
121
+ "description": "Execution timeout in seconds (1-600).",
122
+ "default": 30
123
+ }
124
+ },
125
+ "required": ["command"]
126
+ }
127
+ }
128
+ }
129
+ ]
130
+ ```
131
+
132
+ Request body shape sent to OpenRouter (simplified):
133
+
134
+ ```json
135
+ {
136
+ "model": "arcee-ai/trinity-large-preview:free",
137
+ "messages": [...],
138
+ "temperature": 0,
139
+ "max_tokens": 4096,
140
+ "tools": [...],
141
+ "tool_choice": "auto"
142
+ }
143
+ ```
144
+
145
+ If tools are disabled (`--no-tools` or `/tools off`), the request sets:
146
+
147
+ ```json
148
+ {
149
+ "tool_choice": "none"
150
+ }
151
+ ```
152
+
153
+ ## How `run_bash` is invoked
154
+
155
+ Execution flow per user turn:
156
+
157
+ 1. Model returns `tool_calls` in assistant message.
158
+ 2. CLI decodes `function.arguments` JSON into a dict.
159
+ 3. Permission policy is applied:
160
+ - `deny` list blocks immediately.
161
+ - `allow` list runs immediately.
162
+ - otherwise prompt user (`y/n/a/d`).
163
+ 4. For `run_bash`, CLI executes:
164
+ - `asyncio.create_subprocess_shell(command, cwd=<workdir>, stdout=PIPE, stderr=PIPE)`
165
+ - waits with `asyncio.wait_for(..., timeout_seconds)`
166
+ - kills process on timeout
167
+ 5. CLI formats stdout/stderr/exit code to text and appends a tool result message:
168
+ - role: `tool`
169
+ - tool_call_id: model-provided id
170
+ - content: command output (capped to 8000 chars before being sent back to model)
171
+
172
+ Example tool call from model:
173
+
174
+ ```json
175
+ {
176
+ "id": "call_123",
177
+ "type": "function",
178
+ "function": {
179
+ "name": "run_bash",
180
+ "arguments": "{\"command\":\"ls -la\",\"timeout_seconds\":30}"
181
+ }
182
+ }
183
+ ```
184
+
185
+ Example tool result message added by CLI:
186
+
187
+ ```json
188
+ {
189
+ "role": "tool",
190
+ "tool_call_id": "call_123",
191
+ "content": "total 64\n-rw-r--r-- ..."
192
+ }
193
+ ```
194
+
195
+ Note: despite the name `run_bash`, execution uses `create_subprocess_shell` (system shell), not an explicit `bash` binary unless the command itself invokes `bash`.
196
+
197
+ ## Prompt A/B testing
198
+
199
+ This repo includes a small harness for comparing system prompts:
200
+
201
+ - script: `scripts/ab_test_system_prompts.py`
202
+ - prompt variants:
203
+ - `prompts/system_prompt_control.md`
204
+ - `prompts/system_prompt_agentic_v1.md`
205
+ - sample tasks: `ab_tests/tasks_sample.txt`
206
+
207
+ Run prompt-only comparison (no tools):
208
+
209
+ ```bash
210
+ export OPENROUTER_API_KEY=sk-or-...
211
+ python scripts/ab_test_system_prompts.py \
212
+ --tool-mode none \
213
+ --model arcee-ai/trinity-large-preview:free
214
+ ```
215
+
216
+ Run with tool execution enabled (use cautiously):
217
+
218
+ ```bash
219
+ export OPENROUTER_API_KEY=sk-or-...
220
+ python scripts/ab_test_system_prompts.py \
221
+ --tool-mode execute \
222
+ --workdir "$(pwd)" \
223
+ --model arcee-ai/trinity-large-preview:free
224
+ ```
225
+
226
+ Artifacts are written to `ab_tests/results/<timestamp>/`:
227
+
228
+ - `results.json` full transcripts and metadata
229
+ - `summary.csv` flat comparison table
230
+ - `summary.md` quick markdown summary
231
+
232
+ Run a harder repeated suite (2 prompts x 6 tasks x 3 repeats):
233
+
234
+ ```bash
235
+ export OPENROUTER_API_KEY=sk-or-...
236
+ python scripts/ab_test_system_prompts.py \
237
+ --tool-mode execute \
238
+ --tasks-file ab_tests/tasks_hard_suite_v1.txt \
239
+ --repeats 3 \
240
+ --max-turns 3 \
241
+ --max-tokens 1000 \
242
+ --request-timeout 40 \
243
+ --command-timeout 20 \
244
+ --workdir "$(pwd)" \
245
+ --model arcee-ai/trinity-large-preview:free \
246
+ --output-dir ab_tests/results/hard_suite_v1_r3
247
+ ```
248
+
249
+ Evaluate quality and groundedness from a run:
250
+
251
+ ```bash
252
+ export OPENROUTER_API_KEY=sk-or-...
253
+ python scripts/evaluate_ab_results.py \
254
+ --results ab_tests/results/hard_suite_v1_r3/results.json \
255
+ --judge-model arcee-ai/trinity-large-preview:free \
256
+ --output-dir ab_tests/results/hard_suite_v1_r3/eval
257
+ ```
258
+
259
+ Evaluator artifacts:
260
+
261
+ - `evaluation.json` per-case raw evaluation details
262
+ - `evaluation.csv` tabular scores
263
+ - `leaderboard.md` aggregated per-prompt ranking
264
+
265
+ ## Findings and release docs
266
+
267
+ - benchmark findings: `docs/AB_FINDINGS_2026-02-21.md`
268
+ - public release checklist: `docs/PUBLIC_RELEASE_CHECKLIST.md`
269
+ - security policy: `SECURITY.md`
270
+ - env template: `.env.example`
@@ -0,0 +1,260 @@
1
+ # openrouter-agent-cli
2
+
3
+ Standalone terminal agent for OpenRouter models with:
4
+ - tool actions (`run_bash`)
5
+ - interactive permission gating (`allow` / `deny` / `ask`)
6
+ - session persistence
7
+ - context visibility and compaction
8
+
9
+ ## Install
10
+
11
+ ```bash
12
+ cd openrouter-agent-cli
13
+ python3 -m venv .venv
14
+ source .venv/bin/activate
15
+ pip install -e .
16
+ ```
17
+
18
+ ## Run
19
+
20
+ ```bash
21
+ export OPENROUTER_API_KEY=sk-or-...
22
+ openrouter-agent
23
+ ```
24
+
25
+ Or without installation:
26
+
27
+ ```bash
28
+ export OPENROUTER_API_KEY=sk-or-...
29
+ python -m openrouter_agent_cli.cli
30
+ ```
31
+
32
+ ## Non-interactive prompt
33
+
34
+ `--prompt` (short `-p`) lets another process run the CLI with a single user message, emit only the assistant reply to `stdout`, and exit immediately. Operation logs, tool call summaries, and permission notices are written to `stderr`, and tool calls are automatically denied unless you disable tools with `--no-tools`.
35
+
36
+ Example:
37
+
38
+ ```bash
39
+ openrouter-agent --prompt "Explain tail recursion" --no-tools
40
+ ```
41
+
42
+ ## Useful flags
43
+
44
+ ```bash
45
+ openrouter-agent \
46
+ --model arcee-ai/trinity-large-preview:free \
47
+ --session-id my-session \
48
+ --workdir ~/Projects \
49
+ --max-turns 24 \
50
+ --max-history-messages 60 \
51
+ --command-timeout 30
52
+ ```
53
+
54
+ Disable tools:
55
+
56
+ ```bash
57
+ openrouter-agent --no-tools
58
+ ```
59
+
60
+ ## Slash commands
61
+
62
+ - `/help`
63
+ - `/exit`
64
+ - `/model [id]`
65
+ - `/usage`
66
+ - `/context [n]`
67
+ - `/compact`
68
+ - `/clear`
69
+ - `/tools`
70
+ - `/tools on|off`
71
+ - `/allow <tool|*>`
72
+ - `/deny <tool|*>`
73
+ - `/unallow <tool|*>`
74
+ - `/undeny <tool|*>`
75
+ - `/cwd [path]`
76
+
77
+ ## Context management
78
+
79
+ - history is saved in `~/.openrouter-agent-cli/sessions/<session_id>.json`
80
+ - `/usage` shows rough token estimate
81
+ - `/compact` forces summarization
82
+ - automatic compaction triggers when non-system message count exceeds `--max-history-messages`
83
+
84
+ ## Security notes
85
+
86
+ - `run_bash` executes shell commands on your machine in `--workdir`
87
+ - default policy is `ask` for every tool call
88
+ - use `/deny *` for a fully no-tools session
89
+ - default model is free-tier (`arcee-ai/trinity-large-preview:free`); override with `--model` or `OPENROUTER_MODEL`
90
+
91
+ ## Tool schema seen by the model
92
+
93
+ When tools are enabled, each OpenRouter request includes this tool definition:
94
+
95
+ ```json
96
+ [
97
+ {
98
+ "type": "function",
99
+ "function": {
100
+ "name": "run_bash",
101
+ "description": "Run a shell command in the current working directory and return stdout/stderr.",
102
+ "parameters": {
103
+ "type": "object",
104
+ "properties": {
105
+ "command": {
106
+ "type": "string",
107
+ "description": "Shell command to execute."
108
+ },
109
+ "timeout_seconds": {
110
+ "type": "integer",
111
+ "description": "Execution timeout in seconds (1-600).",
112
+ "default": 30
113
+ }
114
+ },
115
+ "required": ["command"]
116
+ }
117
+ }
118
+ }
119
+ ]
120
+ ```
121
+
122
+ Request body shape sent to OpenRouter (simplified):
123
+
124
+ ```json
125
+ {
126
+ "model": "arcee-ai/trinity-large-preview:free",
127
+ "messages": [...],
128
+ "temperature": 0,
129
+ "max_tokens": 4096,
130
+ "tools": [...],
131
+ "tool_choice": "auto"
132
+ }
133
+ ```
134
+
135
+ If tools are disabled (`--no-tools` or `/tools off`), the request sets:
136
+
137
+ ```json
138
+ {
139
+ "tool_choice": "none"
140
+ }
141
+ ```
142
+
143
+ ## How `run_bash` is invoked
144
+
145
+ Execution flow per user turn:
146
+
147
+ 1. Model returns `tool_calls` in assistant message.
148
+ 2. CLI decodes `function.arguments` JSON into a dict.
149
+ 3. Permission policy is applied:
150
+ - `deny` list blocks immediately.
151
+ - `allow` list runs immediately.
152
+ - otherwise prompt user (`y/n/a/d`).
153
+ 4. For `run_bash`, CLI executes:
154
+ - `asyncio.create_subprocess_shell(command, cwd=<workdir>, stdout=PIPE, stderr=PIPE)`
155
+ - waits with `asyncio.wait_for(..., timeout_seconds)`
156
+ - kills process on timeout
157
+ 5. CLI formats stdout/stderr/exit code to text and appends a tool result message:
158
+ - role: `tool`
159
+ - tool_call_id: model-provided id
160
+ - content: command output (capped to 8000 chars before being sent back to model)
161
+
162
+ Example tool call from model:
163
+
164
+ ```json
165
+ {
166
+ "id": "call_123",
167
+ "type": "function",
168
+ "function": {
169
+ "name": "run_bash",
170
+ "arguments": "{\"command\":\"ls -la\",\"timeout_seconds\":30}"
171
+ }
172
+ }
173
+ ```
174
+
175
+ Example tool result message added by CLI:
176
+
177
+ ```json
178
+ {
179
+ "role": "tool",
180
+ "tool_call_id": "call_123",
181
+ "content": "total 64\n-rw-r--r-- ..."
182
+ }
183
+ ```
184
+
185
+ Note: despite the name `run_bash`, execution uses `create_subprocess_shell` (system shell), not an explicit `bash` binary unless the command itself invokes `bash`.
186
+
187
+ ## Prompt A/B testing
188
+
189
+ This repo includes a small harness for comparing system prompts:
190
+
191
+ - script: `scripts/ab_test_system_prompts.py`
192
+ - prompt variants:
193
+ - `prompts/system_prompt_control.md`
194
+ - `prompts/system_prompt_agentic_v1.md`
195
+ - sample tasks: `ab_tests/tasks_sample.txt`
196
+
197
+ Run prompt-only comparison (no tools):
198
+
199
+ ```bash
200
+ export OPENROUTER_API_KEY=sk-or-...
201
+ python scripts/ab_test_system_prompts.py \
202
+ --tool-mode none \
203
+ --model arcee-ai/trinity-large-preview:free
204
+ ```
205
+
206
+ Run with tool execution enabled (use cautiously):
207
+
208
+ ```bash
209
+ export OPENROUTER_API_KEY=sk-or-...
210
+ python scripts/ab_test_system_prompts.py \
211
+ --tool-mode execute \
212
+ --workdir "$(pwd)" \
213
+ --model arcee-ai/trinity-large-preview:free
214
+ ```
215
+
216
+ Artifacts are written to `ab_tests/results/<timestamp>/`:
217
+
218
+ - `results.json` full transcripts and metadata
219
+ - `summary.csv` flat comparison table
220
+ - `summary.md` quick markdown summary
221
+
222
+ Run a harder repeated suite (2 prompts x 6 tasks x 3 repeats):
223
+
224
+ ```bash
225
+ export OPENROUTER_API_KEY=sk-or-...
226
+ python scripts/ab_test_system_prompts.py \
227
+ --tool-mode execute \
228
+ --tasks-file ab_tests/tasks_hard_suite_v1.txt \
229
+ --repeats 3 \
230
+ --max-turns 3 \
231
+ --max-tokens 1000 \
232
+ --request-timeout 40 \
233
+ --command-timeout 20 \
234
+ --workdir "$(pwd)" \
235
+ --model arcee-ai/trinity-large-preview:free \
236
+ --output-dir ab_tests/results/hard_suite_v1_r3
237
+ ```
238
+
239
+ Evaluate quality and groundedness from a run:
240
+
241
+ ```bash
242
+ export OPENROUTER_API_KEY=sk-or-...
243
+ python scripts/evaluate_ab_results.py \
244
+ --results ab_tests/results/hard_suite_v1_r3/results.json \
245
+ --judge-model arcee-ai/trinity-large-preview:free \
246
+ --output-dir ab_tests/results/hard_suite_v1_r3/eval
247
+ ```
248
+
249
+ Evaluator artifacts:
250
+
251
+ - `evaluation.json` per-case raw evaluation details
252
+ - `evaluation.csv` tabular scores
253
+ - `leaderboard.md` aggregated per-prompt ranking
254
+
255
+ ## Findings and release docs
256
+
257
+ - benchmark findings: `docs/AB_FINDINGS_2026-02-21.md`
258
+ - public release checklist: `docs/PUBLIC_RELEASE_CHECKLIST.md`
259
+ - security policy: `SECURITY.md`
260
+ - env template: `.env.example`
@@ -0,0 +1,2 @@
1
+ """openrouter-agent-cli package."""
2
+
@@ -0,0 +1,6 @@
1
+ from .cli import main
2
+
3
+
4
+ if __name__ == "__main__":
5
+ main()
6
+