codexapi 0.5.5__tar.gz → 0.5.8__tar.gz
- {codexapi-0.5.5/src/codexapi.egg-info → codexapi-0.5.8}/PKG-INFO +27 -11
- {codexapi-0.5.5 → codexapi-0.5.8}/README.md +26 -10
- {codexapi-0.5.5 → codexapi-0.5.8}/pyproject.toml +1 -1
- {codexapi-0.5.5 → codexapi-0.5.8}/src/codexapi/__init__.py +1 -1
- {codexapi-0.5.5 → codexapi-0.5.8}/src/codexapi/cli.py +69 -11
- {codexapi-0.5.5 → codexapi-0.5.8}/src/codexapi/foreach.py +5 -5
- codexapi-0.5.8/src/codexapi/gh_integration.py +229 -0
- {codexapi-0.5.5 → codexapi-0.5.8}/src/codexapi/task.py +186 -115
- {codexapi-0.5.5 → codexapi-0.5.8/src/codexapi.egg-info}/PKG-INFO +27 -11
- {codexapi-0.5.5 → codexapi-0.5.8}/src/codexapi.egg-info/SOURCES.txt +1 -0
- {codexapi-0.5.5 → codexapi-0.5.8}/LICENSE +0 -0
- {codexapi-0.5.5 → codexapi-0.5.8}/setup.cfg +0 -0
- {codexapi-0.5.5 → codexapi-0.5.8}/src/codexapi/__main__.py +0 -0
- {codexapi-0.5.5 → codexapi-0.5.8}/src/codexapi/agent.py +0 -0
- {codexapi-0.5.5 → codexapi-0.5.8}/src/codexapi/ralph.py +0 -0
- {codexapi-0.5.5 → codexapi-0.5.8}/src/codexapi/taskfile.py +0 -0
- {codexapi-0.5.5 → codexapi-0.5.8}/src/codexapi.egg-info/dependency_links.txt +0 -0
- {codexapi-0.5.5 → codexapi-0.5.8}/src/codexapi.egg-info/entry_points.txt +0 -0
- {codexapi-0.5.5 → codexapi-0.5.8}/src/codexapi.egg-info/requires.txt +0 -0
- {codexapi-0.5.5 → codexapi-0.5.8}/src/codexapi.egg-info/top_level.txt +0 -0
@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: codexapi
-Version: 0.5.
+Version: 0.5.8
 Summary: Minimal Python API for running the Codex CLI.
 License: MIT
 Keywords: codex,agent,cli,openai
@@ -68,7 +68,7 @@ codexapi run --cwd /path/to/project "Fix the failing tests."
 echo "Say hello." | codexapi run
 ```
 
-`codexapi task` exits with code 0 on success and 1 on failure
+`codexapi task` exits with code 0 on success and 1 on failure.
 
 ```bash
 codexapi task "Fix the failing tests." --max-iterations 5
@@ -79,9 +79,25 @@ Progress is shown by default for `codexapi task`; use `--quiet` to suppress it.
 When using `--item`, the task file must include at least one `{{item}}` placeholder.
 
 Task files default to using the standard check prompt for the task. Set `check: "None"` to skip verification.
-Use `max_iterations` in the task file to override the default
+Use `max_iterations` in the task file to override the default iteration cap (0 means unlimited).
 Checks are wrapped with the verifier prompt, include the agent output, and expect JSON with `success`/`reason`.
 
+Take tasks from a GitHub Project (requires `gh-task`):
+
+```bash
+codexapi task -p owner/projects/3 -n "Your Name" -s Backlog task_a.yaml task_b.yaml
+```
+
+Task labels are derived from task filenames (basename without extension). The
+issue title/body become `{{item}}` after removing any existing `## Progress`
+section.
+
+Example task progress run:
+
+```bash
+./examples/example_task_progress.sh
+```
+
 Show running sessions and their latest activity:
 
 ```bash
@@ -151,11 +167,11 @@ the same conversation and returns only the agent's message.
 ### `task(prompt, check=None, max_iterations=10, cwd=None, yolo=True, flags=None, progress=False, set_up=None, tear_down=None, on_success=None, on_failure=None) -> str`
 
 Runs a task with checker-driven retries and returns the success summary.
-Raises `TaskFailed` when the maximum
+Raises `TaskFailed` when the maximum iterations are reached.
 
 - `check` (str | None | False): custom check prompt, default checker, or `False`/`"None"` to skip.
-- `max_iterations` (int): maximum number of task
-- `progress` (bool):
+- `max_iterations` (int): maximum number of task iterations (0 means unlimited).
+- `progress` (bool): show a tqdm progress bar with a one-line status after each round.
 - `set_up`/`tear_down`/`on_success`/`on_failure` (str | None): optional hook prompts.
 
 ### `task_result(prompt, check=None, max_iterations=10, cwd=None, yolo=True, flags=None, progress=False, set_up=None, tear_down=None, on_success=None, on_failure=None) -> TaskResult`
@@ -164,7 +180,7 @@ Runs a task with checker-driven retries and returns a `TaskResult` without
 raising `TaskFailed`.
 Arguments mirror `task()` (including hooks).
 
-### `Task(prompt,
+### `Task(prompt, max_iterations=10, cwd=None, yolo=True, thread_id=None, flags=None)`
 
 Runs a Codex task with checker-driven retries. Subclass it and implement
 `check()` to return an error string when the task is incomplete, or return
@@ -179,22 +195,22 @@ default check prompt and includes the agent output.
 - `on_success(result)`: optional success hook.
 - `on_failure(result)`: optional failure hook.
 
-### `TaskResult(success, summary,
+### `TaskResult(success, summary, iterations, errors, thread_id)`
 
 Simple result object returned by `Task.__call__`.
 
 - `success` (bool): whether the task completed successfully.
 - `summary` (str): agent summary of what happened.
-- `
+- `iterations` (int): how many iterations were used.
 - `errors` (str | None): last checker error, if any.
 - `thread_id` (str | None): Codex thread id for the session.
 
 ### `TaskFailed`
 
-Exception raised by `task()` when
+Exception raised by `task()` when iterations are exhausted.
 
 - `summary` (str): failure summary text.
-- `
+- `iterations` (int | None): iterations made when the task failed.
 - `errors` (str | None): last checker error, if any.
 
 ### `foreach(list_file, task_file, n=None, cwd=None, yolo=True, flags=None) -> ForeachResult`
@@ -54,7 +54,7 @@ codexapi run --cwd /path/to/project "Fix the failing tests."
 echo "Say hello." | codexapi run
 ```
 
-`codexapi task` exits with code 0 on success and 1 on failure
+`codexapi task` exits with code 0 on success and 1 on failure.
 
 ```bash
 codexapi task "Fix the failing tests." --max-iterations 5
@@ -65,9 +65,25 @@ Progress is shown by default for `codexapi task`; use `--quiet` to suppress it.
 When using `--item`, the task file must include at least one `{{item}}` placeholder.
 
 Task files default to using the standard check prompt for the task. Set `check: "None"` to skip verification.
-Use `max_iterations` in the task file to override the default
+Use `max_iterations` in the task file to override the default iteration cap (0 means unlimited).
 Checks are wrapped with the verifier prompt, include the agent output, and expect JSON with `success`/`reason`.
 
+Take tasks from a GitHub Project (requires `gh-task`):
+
+```bash
+codexapi task -p owner/projects/3 -n "Your Name" -s Backlog task_a.yaml task_b.yaml
+```
+
+Task labels are derived from task filenames (basename without extension). The
+issue title/body become `{{item}}` after removing any existing `## Progress`
+section.
+
+Example task progress run:
+
+```bash
+./examples/example_task_progress.sh
+```
+
 Show running sessions and their latest activity:
 
 ```bash
@@ -137,11 +153,11 @@ the same conversation and returns only the agent's message.
 ### `task(prompt, check=None, max_iterations=10, cwd=None, yolo=True, flags=None, progress=False, set_up=None, tear_down=None, on_success=None, on_failure=None) -> str`
 
 Runs a task with checker-driven retries and returns the success summary.
-Raises `TaskFailed` when the maximum
+Raises `TaskFailed` when the maximum iterations are reached.
 
 - `check` (str | None | False): custom check prompt, default checker, or `False`/`"None"` to skip.
-- `max_iterations` (int): maximum number of task
-- `progress` (bool):
+- `max_iterations` (int): maximum number of task iterations (0 means unlimited).
+- `progress` (bool): show a tqdm progress bar with a one-line status after each round.
 - `set_up`/`tear_down`/`on_success`/`on_failure` (str | None): optional hook prompts.
 
 ### `task_result(prompt, check=None, max_iterations=10, cwd=None, yolo=True, flags=None, progress=False, set_up=None, tear_down=None, on_success=None, on_failure=None) -> TaskResult`
@@ -150,7 +166,7 @@ Runs a task with checker-driven retries and returns a `TaskResult` without
 raising `TaskFailed`.
 Arguments mirror `task()` (including hooks).
 
-### `Task(prompt,
+### `Task(prompt, max_iterations=10, cwd=None, yolo=True, thread_id=None, flags=None)`
 
 Runs a Codex task with checker-driven retries. Subclass it and implement
 `check()` to return an error string when the task is incomplete, or return
@@ -165,22 +181,22 @@ default check prompt and includes the agent output.
 - `on_success(result)`: optional success hook.
 - `on_failure(result)`: optional failure hook.
 
-### `TaskResult(success, summary,
+### `TaskResult(success, summary, iterations, errors, thread_id)`
 
 Simple result object returned by `Task.__call__`.
 
 - `success` (bool): whether the task completed successfully.
 - `summary` (str): agent summary of what happened.
-- `
+- `iterations` (int): how many iterations were used.
 - `errors` (str | None): last checker error, if any.
 - `thread_id` (str | None): Codex thread id for the session.
 
 ### `TaskFailed`
 
-Exception raised by `task()` when
+Exception raised by `task()` when iterations are exhausted.
 
 - `summary` (str): failure summary text.
-- `
+- `iterations` (int | None): iterations made when the task failed.
 - `errors` (str | None): last checker error, if any.
 
 ### `foreach(list_file, task_file, n=None, cwd=None, yolo=True, flags=None) -> ForeachResult`
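The README changes above document an iteration cap where `0` means unlimited, and an `iterations` field on the result. The following is a minimal sketch of how such a capped retry loop might behave. It is not codexapi's implementation: `ResultSketch`, `run_with_cap`, and the `attempt` callback are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from itertools import count
from typing import Callable, Optional


@dataclass
class ResultSketch:
    """Hypothetical stand-in for the documented result fields (thread_id omitted)."""
    success: bool
    summary: str
    iterations: int
    errors: Optional[str]


def run_with_cap(attempt: Callable[[int], Optional[str]], max_iterations: int = 10) -> ResultSketch:
    """Call attempt(i) until it returns None (success) or the cap is reached.

    attempt returns an error string while the work is still incomplete.
    max_iterations == 0 is treated as "no cap", as the README text describes.
    """
    if max_iterations < 0:
        raise ValueError("max_iterations must be >= 0")
    # An endless counter models the unlimited case; a range models the cap.
    rounds = count(1) if max_iterations == 0 else range(1, max_iterations + 1)
    used = 0
    error = None
    for used in rounds:
        error = attempt(used)
        if error is None:
            return ResultSketch(True, "completed", used, None)
    return ResultSketch(False, "iterations exhausted", used, error)
```

With a cap of 2 and an attempt that never succeeds, the sketch returns a failed result whose `iterations` is 2 and whose `errors` holds the last error string, which matches the shape the `TaskFailed` documentation describes.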
@@ -1033,9 +1033,25 @@ def main(argv=None):
         help="Item value for task files that use {{item}} placeholders.",
     )
     task_parser.add_argument(
-        "
-
-        help="
+        "-p",
+        "--project",
+        help="GitHub Project reference to pull tasks from.",
+    )
+    task_parser.add_argument(
+        "-s",
+        "--status",
+        default="Backlog",
+        help="Status name to take from when using --project (default: Backlog).",
+    )
+    task_parser.add_argument(
+        "-n",
+        "--name",
+        help="Owner label name for gh-task when using --project.",
+    )
+    task_parser.add_argument(
+        "task_args",
+        nargs="*",
+        help="Prompt to send (no --project) or task files (with --project).",
     )
     task_parser.add_argument(
         "--check",
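To see how the new positional `task_args` serves both modes, here is a standalone sketch that rebuilds only the four arguments added in the hunk above. It assumes a flat parser for brevity (the real CLI attaches these to a `task` subparser alongside other options), and `build_task_parser` is a hypothetical helper name.

```python
import argparse


def build_task_parser() -> argparse.ArgumentParser:
    """Flat parser holding only the options added in this hunk (illustrative)."""
    parser = argparse.ArgumentParser(prog="codexapi task")
    parser.add_argument("-p", "--project", help="GitHub Project reference to pull tasks from.")
    parser.add_argument("-s", "--status", default="Backlog",
                        help="Status name to take from when using --project.")
    parser.add_argument("-n", "--name", help="Owner label name when using --project.")
    # nargs="*" collects zero or more positionals: one prompt, or several task files.
    parser.add_argument("task_args", nargs="*",
                        help="Prompt (no --project) or task files (with --project).")
    return parser


# The README's project-mode example, split into argv tokens.
args = build_task_parser().parse_args(
    ["-p", "owner/projects/3", "-n", "Your Name", "-s", "Backlog", "task_a.yaml", "task_b.yaml"]
)
```

Because the positional accepts any number of values, the parser itself cannot tell a prompt from a list of files; that distinction has to be enforced after parsing, based on whether `--project` was given.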
@@ -1046,7 +1062,7 @@ def main(argv=None):
         type=int,
         default=None,
         help=(
-            "Max agent
+            "Max agent iterations (0 means unlimited). "
             f"Defaults to {DEFAULT_MAX_ITERATIONS}."
         ),
     )
@@ -1276,8 +1292,40 @@ def main(argv=None):
         if args.ralph_fresh is None:
             args.ralph_fresh = True
 
+    if args.command == "task" and args.project:
+        if args.task_file:
+            raise SystemExit("task --project does not allow -f.")
+        if args.item is not None:
+            raise SystemExit("--item is only supported with -f.")
+        if args.check is not None:
+            raise SystemExit("--check is not allowed with --project.")
+        if args.max_iterations is not None:
+            raise SystemExit("--max-iterations is not allowed with --project.")
+        if not args.name:
+            raise SystemExit("--name is required with --project.")
+        if not args.task_args:
+            raise SystemExit("task --project requires one or more task files.")
+        try:
+            from .gh_integration import GhTaskRunner
+        except ImportError as exc:
+            raise SystemExit("gh-task is required for --project. Install it with pip.") from exc
+
+        task_runner = GhTaskRunner(
+            args.project,
+            args.name,
+            args.task_args,
+            args.status,
+            args.cwd,
+            args.yolo,
+            args.flags,
+        )
+        result = task_runner(progress=not args.quiet)
+        if not result.success:
+            raise SystemExit(1)
+        return
+
     if args.command == "task" and args.task_file:
-        if args.
+        if args.task_args:
             raise SystemExit("task -f does not take a prompt.")
         if args.item is not None:
             task_def = load_task_file(args.task_file)
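The `--project` branch above rejects several option combinations in a fixed order before importing the GitHub integration. The sketch below restates that order as a pure function so it can be exercised without the CLI. `project_mode_error` and `make_args` are hypothetical helpers; the messages are copied from the hunk.

```python
from types import SimpleNamespace
from typing import Optional


def project_mode_error(args) -> Optional[str]:
    """Return the first message the --project branch would exit with, else None."""
    checks = [
        (bool(args.task_file), "task --project does not allow -f."),
        (args.item is not None, "--item is only supported with -f."),
        (args.check is not None, "--check is not allowed with --project."),
        (args.max_iterations is not None, "--max-iterations is not allowed with --project."),
        (not args.name, "--name is required with --project."),
        (not args.task_args, "task --project requires one or more task files."),
    ]
    for failed, message in checks:
        if failed:
            return message
    return None


def make_args(**overrides):
    """Build a namespace resembling a valid --project invocation (illustrative)."""
    base = dict(task_file=None, item=None, check=None, max_iterations=None,
                name="Your Name", task_args=["task_a.yaml"])
    base.update(overrides)
    return SimpleNamespace(**base)
```

The `--max-iterations` test only works because the parser default is `None`, so an explicit value (including `0`) can be told apart from an omitted flag.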
@@ -1298,18 +1346,20 @@ def main(argv=None):
             flags=args.flags,
         )
         result = task_runner(progress=not args.quiet)
-        print(result.summary)
         if not result.success:
             raise SystemExit(1)
         return
 
     prompt_source = None
-
+    prompt = None
+    if args.command in ("run", "ralph"):
         prompt_source = args.prompt
     elif args.command == "science":
         prompt_source = args.task
-
+    if args.command != "task":
+        prompt = _read_prompt(prompt_source)
     exit_code = 0
+    message = None
 
     if args.command == "ralph":
         if args.max_iterations < 0:
@@ -1339,6 +1389,8 @@ def main(argv=None):
         )
         return
     if args.command == "task":
+        if args.project:
+            raise SystemExit("task --project already handled earlier.")
         if args.item is not None:
             raise SystemExit("--item is only supported with -f.")
         if args.max_iterations is None:
@@ -1347,7 +1399,13 @@ def main(argv=None):
             raise SystemExit("--max-iterations must be >= 0.")
         check = args.check
         try:
-
+            task_args = args.task_args or []
+            if len(task_args) > 1:
+                raise SystemExit("task takes a single prompt unless --project is used.")
+            if task_args:
+                prompt_source = task_args[0]
+            prompt = _read_prompt(prompt_source)
+            task(
                 prompt,
                 check,
                 args.max_iterations,
@@ -1357,7 +1415,6 @@ def main(argv=None):
                 not args.quiet,
             )
         except TaskFailed as exc:
-            message = exc.summary
             exit_code = 1
     else:
         use_session = args.thread_id or args.print_thread_id
@@ -1374,7 +1431,8 @@ def main(argv=None):
         else:
             message = agent(prompt, args.cwd, args.yolo, args.flags)
 
-
+    if message is not None:
+        print(message)
     if exit_code:
         raise SystemExit(exit_code)
 
@@ -185,8 +185,8 @@ def _run_item(
 
     summary = ""
     success = False
-
-
+    iterations = None
+    max_iterations = None
     try:
         task = TaskFile(
             task_file,
@@ -196,17 +196,17 @@ def _run_item(
             thread_id=None,
             flags=flags,
         )
-
+        max_iterations = task.max_iterations
         result = task()
         success = result.success
-
+        iterations = result.iterations
         summary = result.summary or ""
     except Exception as exc:
         summary = f"{type(exc).__name__}: {exc}"
         success = False
 
     summary = _single_line(summary)
-    turns = _format_turns(
+    turns = _format_turns(iterations, max_iterations)
     if summary:
         summary = f"{summary} {turns}"
     else:
@@ -0,0 +1,229 @@
+import logging
+import re
+import time
+from pathlib import Path
+
+from tqdm import tqdm
+
+from gh_task.project import Project
+
+from .taskfile import TaskFile
+
+
+_logger = logging.getLogger(__name__)
+
+_PROGRESS_HEADER = "## Progress"
+_SUCCESS_LABEL = "✓"
+_FAILURE_LABEL = "⨉"
+_SUCCESS_COLOR = "2da44e"
+_FAILURE_COLOR = "d73a4a"
+
+
+def _canonical_task_name(path):
+    return Path(path).stem
+
+
+def _task_file_map(task_files):
+    mapping = {}
+    for path in task_files:
+        name = _canonical_task_name(path)
+        if not name:
+            raise ValueError(f"Task file name is empty: {path}")
+        key = name.lower()
+        if key in mapping:
+            raise ValueError(f"Duplicate task name '{name}' for {path} and {mapping[key][1]}")
+        mapping[key] = (name, path)
+    if not mapping:
+        raise ValueError("At least one task file is required")
+    return mapping
+
+
+def _issue_url(issue):
+    if issue.url:
+        return issue.url
+    return f"https://github.com/{issue.repo}/issues/{issue.number}"
+
+
+def _match_task_file(issue, task_map):
+    labels = issue.labels or []
+    matches = []
+    for label in labels:
+        key = label.strip().lower()
+        if key in task_map:
+            matches.append((label, task_map[key][1]))
+    if not matches:
+        raise ValueError(f"Issue {_issue_url(issue)} has no matching task label")
+    if len(matches) > 1:
+        details = ", ".join(f"{label} -> {path}" for label, path in matches)
+        raise ValueError(
+            f"Issue {_issue_url(issue)} matches multiple task labels: {details}"
+        )
+    return matches[0][1]
+
+
+def _strip_progress_section(body):
+    if not body:
+        return ""
+    match = re.search(r"(?m)^## Progress\s*$", body)
+    if not match:
+        return body.strip()
+    return body[:match.start()].rstrip()
+
+
+def _format_item_text(issue, description):
+    title = issue.title or ""
+    url = _issue_url(issue)
+    description = description or ""
+    return f"Issue: {url}\nTitle: {title}\nDescription: {description}\n"
+
+
+def _format_status_line(status_line):
+    match = re.match(r"^\[(?P<turns>[^ ]+) @ (?P<elapsed>[^\]]+)\]:\s*(?P<summary>.*)$", status_line)
+    if not match:
+        return status_line
+    summary = match.group("summary").strip()
+    prefix = f"`[{match.group('turns')} {match.group('elapsed')}]`"
+    if summary:
+        return f"{prefix} {summary}"
+    return prefix
+
+
+def _format_progress_bar(total, remaining, start_time):
+    if total is None:
+        total = 0
+    current = total - remaining
+    if current < 0:
+        current = 0
+    elapsed = 0.0
+    if start_time is not None:
+        elapsed = time.monotonic() - start_time
+    total_for_bar = total if total > 0 else 1
+    return tqdm.format_meter(current, total_for_bar, elapsed, ncols=80)
+
+
+def _render_progress_section(base_body, status_line, bar_text):
+    parts = [
+        _PROGRESS_HEADER,
+        "",
+        status_line,
+        "",
+        "```",
+        bar_text,
+        "```",
+    ]
+    section = "\n".join(parts).rstrip()
+    if base_body:
+        return f"{base_body.rstrip()}\n\n{section}\n"
+    return f"{section}\n"
+
+
+class GhTaskFile(TaskFile):
+    def __init__(
+        self,
+        path,
+        issue,
+        project,
+        item_text,
+        cwd=None,
+        yolo=True,
+        thread_id=None,
+        flags=None,
+    ):
+        super().__init__(path, item_text, None, cwd, yolo, thread_id, flags)
+        self.issue = issue
+        self.project = project
+        self._progress_updates = True
+
+    def on_progress(
+        self,
+        iterations,
+        max_iterations,
+        total_estimate,
+        remaining_estimate,
+        status_line,
+    ):
+        super().on_progress(
+            iterations,
+            max_iterations,
+            total_estimate,
+            remaining_estimate,
+            status_line,
+        )
+        try:
+            self.project.set_estimate(self.issue, remaining_estimate)
+        except Exception as exc:
+            _logger.warning("Failed to update estimate for issue %s", _issue_url(self.issue), exc_info=exc)
+        if not status_line:
+            return
+        try:
+            body = self.project.get_issue_body(self.issue)
+            base = _strip_progress_section(body)
+            status = _format_status_line(status_line)
+            bar_text = _format_progress_bar(total_estimate, remaining_estimate, self._progress_start)
+            updated = _render_progress_section(base, status, bar_text)
+            self.project.set_issue_body(self.issue, updated)
+        except Exception as exc:
+            _logger.warning("Failed to update issue progress for %s", _issue_url(self.issue), exc_info=exc)
+
+    def on_success(self, result):
+        super().on_success(result)
+        self.project.ensure_label(
+            self.issue.repo,
+            _SUCCESS_LABEL,
+            color=_SUCCESS_COLOR,
+            description="Task succeeded",
+        )
+        self.project.add_label(self.issue, _SUCCESS_LABEL)
+
+    def on_failure(self, result):
+        super().on_failure(result)
+        self.project.ensure_label(
+            self.issue.repo,
+            _FAILURE_LABEL,
+            color=_FAILURE_COLOR,
+            description="Task failed",
+        )
+        self.project.add_label(self.issue, _FAILURE_LABEL)
+
+    def tear_down(self):
+        super().tear_down()
+        self.project.move(self.issue, "In review")
+        self.project.release(self.issue)
+
+
+class GhTaskRunner:
+    def __init__(
+        self,
+        project,
+        name,
+        task_files,
+        status="Backlog",
+        cwd=None,
+        yolo=True,
+        flags=None,
+    ):
+        task_map = _task_file_map(task_files)
+        self.project = Project(project, name, has_label=list(task_map))
+        self.issue = self.project.take(status=status, return_issue=True)
+        self.issue = self.project.get_issue(self.issue)
+        try:
+            task_path = _match_task_file(self.issue, task_map)
+        except Exception:
+            self.project.release(self.issue)
+            raise
+        body = self.project.get_issue_body(self.issue)
+        description = _strip_progress_section(body)
+        item_text = _format_item_text(self.issue, description)
+        self.task = GhTaskFile(
+            task_path,
+            self.issue,
+            self.project,
+            item_text,
+            cwd,
+            yolo,
+            None,
+            flags,
+        )
+
+    def __call__(self, progress=False):
+        return self.task(progress=progress)
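Two ideas in the new module are easy to try in isolation: choosing a task file from an issue's labels, and rewriting the `## Progress` section of an issue body so that repeated updates do not accumulate. The sketch below is a simplified restatement under stated assumptions, not the module's code: it takes plain lists and strings instead of `gh_task` objects, does not detect duplicate file stems, and `match_task_file`, `strip_progress`, and `render_progress` are hypothetical names.

```python
import re
from pathlib import Path

PROGRESS_HEADER = "## Progress"
FENCE = "`" * 3  # Markdown code fence, built here to keep this example readable


def match_task_file(labels, task_files):
    """Pick the task file whose stem equals exactly one label (case-insensitive)."""
    by_stem = {Path(p).stem.lower(): p for p in task_files}
    keys = [label.strip().lower() for label in labels]
    hits = [by_stem[key] for key in keys if key in by_stem]
    if len(hits) != 1:
        raise ValueError(f"expected exactly one matching label, found {len(hits)}")
    return hits[0]


def strip_progress(body):
    """Drop everything from the first '## Progress' heading line onward."""
    text = body or ""
    match = re.search(r"(?m)^## Progress\s*$", text)
    if not match:
        return text.strip()
    return text[: match.start()].rstrip()


def render_progress(base_body, status_line, bar_text):
    """Append a fresh progress section after the base description."""
    section = "\n".join([PROGRESS_HEADER, "", status_line, "", FENCE, bar_text, FENCE])
    if base_body:
        return f"{base_body.rstrip()}\n\n{section}\n"
    return f"{section}\n"
```

Stripping before rendering makes the update idempotent with respect to the description: applying `strip_progress` to a rendered body returns the original description, so each round replaces the previous progress section.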
@@ -5,6 +5,7 @@ import logging
 import time
 
 from .agent import Agent, agent
+from tqdm import tqdm
 
 _logger = logging.getLogger(__name__)
 
@@ -20,11 +21,13 @@ _CHECK_PREFIX = (
     "Set success to true only if everything matches the intent."
 )
 _CHECK_SUFFIX = "JSON only. No markdown or extra text."
-
-    "
-    "
-    "
-    "
+_ESTIMATE_PROMPT = (
+    "Estimate remaining work in story points for the task below.\n"
+    "You may inspect the repo (read files, git status/diff), but do not run tests.\n"
+    "Do not change any files.\n"
+    "Use the task prompt, current repo state, and latest agent/check outputs.\n"
+    "Return only JSON with keys: remaining (number) and summary (string).\n"
+    "summary must be a single line describing agent + verifier status."
 )
 DEFAULT_MAX_ITERATIONS = 10
 
@@ -62,14 +65,32 @@ def _resolve_check_text(prompt, check):
     return check, False
 
 
-def
-
-
-
-
-"
-
+def _build_estimate_prompt(prompt, agent_output, check_output, previous_total):
+    agent_text = agent_output.strip() or "(no agent output yet)"
+    check_text = check_output.strip() or "(no check output yet)"
+    lines = [
+        _ESTIMATE_PROMPT,
+        "",
+        "TASK:",
+        "```",
+        prompt,
+        "```",
+    ]
+    if previous_total is not None:
+        lines.append(
+            f"This task was previously estimated at about {previous_total} story points."
+        )
+    lines.extend(
+        [
+            "",
+            "AGENT OUTPUT:",
+            agent_text,
+            "",
+            "CHECK OUTPUT:",
+            check_text,
+        ]
     )
+    return "\n".join(lines)
 
 
 def _check_result(output):
@@ -91,25 +112,29 @@ def _check_result(output):
     return success, reason.strip()
 
 
-def
+def _estimate_result(output):
     try:
         data = json.loads(output)
     except json.JSONDecodeError as exc:
         raise RuntimeError(
-            f"
+            f"Estimate returned invalid JSON: {exc}"
         ) from exc
 
     if not isinstance(data, dict):
-        raise RuntimeError("
+        raise RuntimeError("Estimate JSON must be an object.")
+
+    remaining = data.get("remaining")
+    summary = data.get("summary")
+    if not isinstance(remaining, (int, float)):
+        raise RuntimeError("Estimate JSON missing numeric 'remaining'.")
+    if not isinstance(summary, str):
+        raise RuntimeError("Estimate JSON missing string 'summary'.")
 
-
-
-
-        raise RuntimeError("Progress summary JSON missing string 'agent'.")
-    if not isinstance(check_summary, str):
-        raise RuntimeError("Progress summary JSON missing string 'check'.")
+    remaining = int(round(remaining))
+    if remaining < 0:
+        remaining = 0
 
-    return
+    return remaining, _single_line(summary)
 
 
 def _single_line(text):
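The validation in `_estimate_result` above can be tried on sample payloads with a standalone restatement. This sketch follows the steps shown in the hunk (object check, numeric `remaining`, string `summary`, rounding, clamping at zero, whitespace collapsing); `parse_estimate` is a hypothetical name, and the inline whitespace collapsing stands in for the module's `_single_line` helper.

```python
import json


def parse_estimate(output: str):
    """Validate an estimate reply and normalise it to (remaining, summary)."""
    try:
        data = json.loads(output)
    except json.JSONDecodeError as exc:
        raise RuntimeError(f"Estimate returned invalid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise RuntimeError("Estimate JSON must be an object.")
    remaining = data.get("remaining")
    summary = data.get("summary")
    if not isinstance(remaining, (int, float)):
        raise RuntimeError("Estimate JSON missing numeric 'remaining'.")
    if not isinstance(summary, str):
        raise RuntimeError("Estimate JSON missing string 'summary'.")
    # Round to whole story points and never report negative work.
    remaining = max(0, int(round(remaining)))
    # Collapse newlines and repeated spaces into a one-line status.
    return remaining, " ".join(summary.replace("\r", " ").split())
```

Two details of this logic are worth knowing when reading estimates: Python's `round` uses round-half-to-even, so `2.5` becomes `2` while `3.5` becomes `4`, and because `bool` is a subclass of `int`, a JSON `true` would pass the numeric check.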
@@ -118,63 +143,38 @@ def _single_line(text):
     return " ".join(text.replace("\r", " ").split())
 
 
-def
+def _format_elapsed(seconds):
     if seconds < 0:
         seconds = 0
     seconds = int(round(seconds))
     hours, remainder = divmod(seconds, 3600)
     minutes, seconds = divmod(remainder, 60)
-
-
-
-
-
-
-
-
-
-
-
-
-
-    return f"
-
-
-def
-
-
-
-
-
-    total,
-    start_time,
-    agent_output,
-    check_output,
-    cwd,
-    yolo,
-    flags,
-    success,
-):
-    elapsed = time.monotonic() - start_time
-    remaining = 0
-    remaining_text = "unknown"
-    if total and attempt:
-        remaining = (elapsed / attempt) * (total - attempt)
-        remaining_text = _format_duration(remaining)
-
-    summary_prompt = _build_progress_prompt(agent_output, check_output)
-    summary = agent(summary_prompt, cwd, yolo, flags)
-    agent_summary, check_summary = _progress_result(summary)
-
-    elapsed_text = _format_duration(elapsed)
-    print(f"Agent: {agent_summary}", flush=True)
-    print(f"Check: {check_summary}", flush=True)
-    verdict = "success" if success else "failure"
-    print(
-        f"Verdict: {verdict} ({elapsed_text} elapsed, {remaining_text} remaining)",
-        flush=True,
+    return f"{hours}h{minutes:02d}m{seconds:02d}s"
+
+
+def _format_turns(iteration, total):
+    if total:
+        width = len(str(total))
+        total_text = str(total)
+    else:
+        width = len(str(iteration))
+        total_text = "∞"
+    if width < 1:
+        width = 1
+    iteration_text = f"{iteration:0{width}d}"
+    return f"{iteration_text}/{total_text}"
+
+
+def estimate(prompt, agent_output, check_output, cwd, yolo, flags, previous_total):
+    estimate_prompt = _build_estimate_prompt(
+        prompt,
+        agent_output or "",
+        check_output or "",
+        previous_total,
     )
-
+    output = agent(estimate_prompt, cwd, yolo, flags)
+    return _estimate_result(output)
+
 
 
 def _fix_prompt(error):
     return (
@@ -192,21 +192,21 @@ def _success_prompt():
 
 def _failure_prompt(error):
     return (
-        "We ran out of
+        "We ran out of iterations. Summarize what you did and what is still failing.\n\n"
         f"Outstanding issues:\n{error}"
     )
 
 
 class TaskFailed(RuntimeError):
-    """Raised when a task hits the maximum
+    """Raised when a task hits the maximum iterations without success."""
 
-    def __init__(self, summary,
-        message = "Task failed after maximum
+    def __init__(self, summary, iterations=None, errors=None):
+        message = "Task failed after maximum iterations."
         if summary:
             message = f"{message}\n{summary}"
         super().__init__(message)
         self.summary = summary
-        self.
+        self.iterations = iterations
         self.errors = errors
 
 
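Reviewer sketch, not part of the package diff: one way a caller might read the renamed `iterations` attribute after this change. The class below is a stand-in copied from the added lines so the snippet runs without `codexapi` installed; `describe_failure` and the sample values are hypothetical.

```python
class TaskFailed(RuntimeError):
    """Stand-in mirroring the 0.5.8 class shown in the hunk above."""

    def __init__(self, summary, iterations=None, errors=None):
        message = "Task failed after maximum iterations."
        if summary:
            message = f"{message}\n{summary}"
        super().__init__(message)
        self.summary = summary
        self.iterations = iterations
        self.errors = errors


def describe_failure(exc):
    # Hypothetical helper: turn the exception into a one-line report.
    return f"gave up after {exc.iterations} iterations: {exc.errors}"


try:
    raise TaskFailed("Two tests still fail.", iterations=5, errors="test_login failed")
except TaskFailed as exc:
    report = describe_failure(exc)
    print(report)
```

The old attribute name is truncated in this rendering of the diff, so any code that read it needs checking against the 0.5.5 source before it is moved to `iterations`.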
@@ -237,11 +237,11 @@ def task(
         prompt: The task prompt to run.
         check: False to skip verification, None for the default check, or
             a string check prompt. The string "None" skips verification.
-        max_iterations: Maximum number of task
+        max_iterations: Maximum number of task iterations (0 means unlimited).
         cwd: Optional working directory for the Codex session.
         yolo: Whether to pass --yolo to Codex.
         flags: Additional raw CLI flags to pass to Codex.
-        progress: Whether to
+        progress: Whether to show a tqdm progress bar with status updates.
         set_up: Optional setup prompt to run before the task.
         tear_down: Optional cleanup prompt to run after the task.
         on_success: Optional prompt to run after a successful task.
@@ -251,7 +251,7 @@ def task(
         The agent's response text when the task succeeds.
 
     Raises:
-        TaskFailed: when the task reaches the maximum
+        TaskFailed: when the task reaches the maximum iterations without success.
     """
     result = task_result(
         prompt,
@@ -268,7 +268,7 @@ def task(
     )
     if result.success:
         return result.summary
-    raise TaskFailed(result.summary, result.
+    raise TaskFailed(result.summary, result.iterations, result.errors)
 
 
 def task_result(
@@ -286,8 +286,8 @@ def task_result(
 ):
     """Run a prompt with optional checker-driven retries and return TaskResult.
 
-    The runner keeps a single session. Each verification
-    stateless agent call. When progress is True,
+    The runner keeps a single session. Each verification iteration uses a fresh,
+    stateless agent call. When progress is True, show progress updates each round.
 
     Hook strings mirror task file keys: set_up, tear_down, on_success, on_failure.
     """
@@ -319,10 +319,10 @@ def task_result(
 class TaskResult:
     """Outcome summary for a task run."""
 
-    def __init__(self, success, summary,
+    def __init__(self, success, summary, iterations, errors, thread_id):
         self.success = success
         self.summary = summary
-        self.
+        self.iterations = iterations
         self.errors = errors
         self.thread_id = thread_id
 
@@ -330,7 +330,7 @@ class TaskResult:
         return (
             "TaskResult("
             f"success={self.success}, "
-            f"
+            f"iterations={self.iterations}, "
             f"errors={self.errors!r}, "
             f"thread_id={self.thread_id!r}, "
             f"summary={self.summary!r}"
@@ -352,16 +352,16 @@ class Task:
     def __init__(
         self,
         prompt,
-
+        max_iterations=DEFAULT_MAX_ITERATIONS,
         cwd=None,
         yolo=True,
         thread_id=None,
         flags=None,
     ):
-        if
-            raise ValueError("
+        if max_iterations < 0:
+            raise ValueError("max_iterations must be >= 0")
         self.prompt = prompt
-        self.
+        self.max_iterations = max_iterations
         self.cwd = cwd
         self.last_output = None
         self.last_check_output = None
@@ -369,6 +369,11 @@ class Task:
         self.check_text = None
         self._yolo = yolo
         self._flags = flags
+        self._progress_enabled = False
+        self._progress_updates = False
+        self._progress_bar = None
+        self._progress_total = None
+        self._progress_start = None
         self.agent = Agent(
             cwd,
             yolo,
@@ -410,6 +415,30 @@ class Task:
     def on_failure(self, result):
         """Hook called after a failed run, e.g. log the failure reason."""
 
+    def on_progress(
+        self,
+        turns,
+        max_turns,
+        total_estimate,
+        remaining_estimate,
+        status_line,
+    ):
+        """Hook called with progress updates."""
+        if not self._progress_enabled:
+            return
+        if self._progress_bar is None:
+            self._progress_bar = tqdm(total=total_estimate)
+        if total_estimate != self._progress_bar.total:
+            self._progress_bar.total = total_estimate
+        current = total_estimate - remaining_estimate
+        if current < 0:
+            current = 0
+        if self._progress_bar.n != current:
+            self._progress_bar.n = current
+        self._progress_bar.refresh()
+        if status_line:
+            tqdm.write(status_line, file=self._progress_bar.fp)
+
     def fix_prompt(self, error):
         """Build a prompt that asks the agent to fix checker failures."""
         return (
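Reviewer sketch, not part of the package diff: the new `on_progress` hook derives the bar position from the two estimates rather than from the iteration count. The arithmetic can be isolated as below; `bar_position` is a hypothetical name, and the unit of the estimates is whatever the estimator returns (the diff does not say).

```python
def bar_position(total_estimate, remaining_estimate):
    # Position = work believed done so far. Clamp at zero so a
    # remaining estimate larger than the total never moves the bar
    # backwards past the start.
    current = total_estimate - remaining_estimate
    if current < 0:
        current = 0
    return current


print(bar_position(600, 450))  # 150
print(bar_position(600, 700))  # 0
```

Because the position is recomputed from fresh estimates each round, the bar can move backwards when the estimator revises its remaining figure upwards.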
@@ -432,47 +461,87 @@ class Task:
     def __call__(self, debug=False, progress=False):
         """Run the task with checker-driven retries.
         If debug is True, log debug messages.
-        If progress is True,
+        If progress is True, show a tqdm progress bar with status updates.
         """
         try:
             # If this fails in the middle we will still try to tear down
             self.set_up()
 
+            progress_updates = progress or self._progress_updates
+            self._progress_enabled = progress
+            if progress_updates:
+                remaining, _summary = estimate(
+                    self.prompt,
+                    "",
+                    "",
+                    self.cwd,
+                    self._yolo,
+                    self._flags,
+                    None,
+                )
+                self._progress_total = remaining
+                start_time = time.monotonic()
+                self._progress_start = start_time
+                self.on_progress(
+                    0,
+                    self.max_iterations,
+                    self._progress_total,
+                    remaining,
+                    None,
+                )
+            else:
+                start_time = time.monotonic()
+                self._progress_start = start_time
+
             # Start with the initial prompt
             output = self.agent(self.prompt)
             self.last_output = output
             if debug:
                 _logger.debug("Initial output: %s", output)
 
-            # Try correcting it up to
-            start_time = time.monotonic()
+            # Try correcting it up to max_iterations times
             error = None
-
+            iteration = 0
             while True:
-
-                if progress:
-                    _print_progress_start(
-                        attempt,
-                        self.max_attempts,
-                    )
+                iteration += 1
                 error = self.check(self.last_output)
                 if debug:
                     _logger.debug("Check error: %s", error)
 
-                if
+                if progress_updates:
                     check_output = self.last_check_output
                     if self.check_skipped:
                         check_output = "Verification skipped."
-
-
-                        self.
-                        start_time,
-                        self.last_output,
+                    remaining, summary = estimate(
+                        self.prompt,
+                        self.last_output or "",
                         check_output or "",
                         self.cwd,
                         self._yolo,
                         self._flags,
-
+                        self._progress_total,
+                    )
+                    total_estimate = self._progress_total
+                    if total_estimate is None or remaining > total_estimate:
+                        total_estimate = remaining
+                        self._progress_total = total_estimate
+                    elapsed = _format_elapsed(time.monotonic() - start_time)
+                    status_prefix = (
+                        f"[{_format_turns(iteration, self.max_iterations)} @ {elapsed}]"
+                    )
+                    is_final = not error or (
+                        self.max_iterations and iteration >= self.max_iterations
+                    )
+                    if is_final:
+                        marker = "✅" if not error else "❌"
+                        summary = f"{marker} {summary}".strip()
+                    status_line = f"{status_prefix}: {summary}".rstrip()
+                    self.on_progress(
+                        iteration,
+                        self.max_iterations,
+                        total_estimate,
+                        remaining,
+                        status_line,
                     )
                 if not error:
                     summary = self.agent(self.success_prompt())
@@ -481,20 +550,20 @@ class Task:
                     result = TaskResult(
                         True,
                         summary,
-
+                        iteration,
                         None,
                         self.agent.thread_id,
                     )
                     self.on_success(result)
                     return result
-                if self.
+                if self.max_iterations and iteration >= self.max_iterations:
                     summary = self.agent(self.failure_prompt(error))
                     if debug:
                         _logger.debug("Failure summary: %s", summary)
                     result = TaskResult(
                         False,
                         summary,
-
+                        iteration,
                         error,
                         self.agent.thread_id,
                     )
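Reviewer sketch, not part of the package diff: the cap test `self.max_iterations and iteration >= self.max_iterations` is what makes `0` mean unlimited, since a zero cap short-circuits to falsy. The loop below mirrors only that control flow. `run_with_cap` and the `check` callable are hypothetical stand-ins, and the agent, fix-prompt and hook calls are omitted.

```python
def run_with_cap(check, max_iterations):
    """Loop until check(iteration) returns no error or the cap is hit.

    Returns (success, iterations_used). A cap of 0 never triggers.
    """
    if max_iterations < 0:
        raise ValueError("max_iterations must be >= 0")
    iteration = 0
    while True:
        iteration += 1
        error = check(iteration)
        if not error:
            return True, iteration
        if max_iterations and iteration >= max_iterations:
            return False, iteration


def passes_on_fourth(iteration):
    # Illustrative checker: reports an error until the fourth round.
    return None if iteration >= 4 else "still failing"


print(run_with_cap(passes_on_fourth, 0))  # (True, 4)
print(run_with_cap(passes_on_fourth, 3))  # (False, 3)
```

With a cap of 0 the loop ends only when the check passes, so a check that can never pass runs indefinitely.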
@@ -507,6 +576,8 @@ class Task:
         finally:
             # No matter what, once we have set_up we will always tear_down
             self.tear_down()
+            if self._progress_bar is not None:
+                self._progress_bar.close()
 
 
 class AutoTask(Task):
@@ -516,7 +587,7 @@ class AutoTask(Task):
         self,
         prompt,
         check=None,
-
+        max_iterations=DEFAULT_MAX_ITERATIONS,
         cwd=None,
         yolo=True,
         thread_id=None,
@@ -528,9 +599,9 @@ class AutoTask(Task):
     ):
         if not (check is None or check is False or isinstance(check, str)):
             raise TypeError("check must be a string or False")
-        if
-            raise ValueError("
-        super().__init__(prompt,
+        if max_iterations < 0:
+            raise ValueError("max_iterations must be >= 0")
+        super().__init__(prompt, max_iterations, cwd, yolo, thread_id, flags)
         self.check_text = check
         self._set_up = _validate_hook("set_up", set_up)
         self._tear_down = _validate_hook("tear_down", tear_down)
@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: codexapi
-Version: 0.5.
+Version: 0.5.8
 Summary: Minimal Python API for running the Codex CLI.
 License: MIT
 Keywords: codex,agent,cli,openai
@@ -68,7 +68,7 @@ codexapi run --cwd /path/to/project "Fix the failing tests."
 echo "Say hello." | codexapi run
 ```
 
-`codexapi task` exits with code 0 on success and 1 on failure
+`codexapi task` exits with code 0 on success and 1 on failure.
 
 ```bash
 codexapi task "Fix the failing tests." --max-iterations 5
@@ -79,9 +79,25 @@ Progress is shown by default for `codexapi task`; use `--quiet` to suppress it.
 When using `--item`, the task file must include at least one `{{item}}` placeholder.
 
 Task files default to using the standard check prompt for the task. Set `check: "None"` to skip verification.
-Use `max_iterations` in the task file to override the default
+Use `max_iterations` in the task file to override the default iteration cap (0 means unlimited).
 Checks are wrapped with the verifier prompt, include the agent output, and expect JSON with `success`/`reason`.
 
+Take tasks from a GitHub Project (requires `gh-task`):
+
+```bash
+codexapi task -p owner/projects/3 -n "Your Name" -s Backlog task_a.yaml task_b.yaml
+```
+
+Task labels are derived from task filenames (basename without extension). The
+issue title/body become `{{item}}` after removing any existing `## Progress`
+section.
+
+Example task progress run:
+
+```bash
+./examples/example_task_progress.sh
+```
+
 Show running sessions and their latest activity:
 
 ```bash
@@ -151,11 +167,11 @@ the same conversation and returns only the agent's message.
 ### `task(prompt, check=None, max_iterations=10, cwd=None, yolo=True, flags=None, progress=False, set_up=None, tear_down=None, on_success=None, on_failure=None) -> str`
 
 Runs a task with checker-driven retries and returns the success summary.
-Raises `TaskFailed` when the maximum
+Raises `TaskFailed` when the maximum iterations are reached.
 
 - `check` (str | None | False): custom check prompt, default checker, or `False`/`"None"` to skip.
-- `max_iterations` (int): maximum number of task
-- `progress` (bool):
+- `max_iterations` (int): maximum number of task iterations (0 means unlimited).
+- `progress` (bool): show a tqdm progress bar with a one-line status after each round.
 - `set_up`/`tear_down`/`on_success`/`on_failure` (str | None): optional hook prompts.
 
 ### `task_result(prompt, check=None, max_iterations=10, cwd=None, yolo=True, flags=None, progress=False, set_up=None, tear_down=None, on_success=None, on_failure=None) -> TaskResult`
@@ -164,7 +180,7 @@ Runs a task with checker-driven retries and returns a `TaskResult` without
 raising `TaskFailed`.
 Arguments mirror `task()` (including hooks).
 
-### `Task(prompt,
+### `Task(prompt, max_iterations=10, cwd=None, yolo=True, thread_id=None, flags=None)`
 
 Runs a Codex task with checker-driven retries. Subclass it and implement
 `check()` to return an error string when the task is incomplete, or return
@@ -179,22 +195,22 @@ default check prompt and includes the agent output.
 - `on_success(result)`: optional success hook.
 - `on_failure(result)`: optional failure hook.
 
-### `TaskResult(success, summary,
+### `TaskResult(success, summary, iterations, errors, thread_id)`
 
 Simple result object returned by `Task.__call__`.
 
 - `success` (bool): whether the task completed successfully.
 - `summary` (str): agent summary of what happened.
-- `
+- `iterations` (int): how many iterations were used.
 - `errors` (str | None): last checker error, if any.
 - `thread_id` (str | None): Codex thread id for the session.
 
 ### `TaskFailed`
 
-Exception raised by `task()` when
+Exception raised by `task()` when iterations are exhausted.
 
 - `summary` (str): failure summary text.
-- `
+- `iterations` (int | None): iterations made when the task failed.
 - `errors` (str | None): last checker error, if any.
 
 ### `foreach(list_file, task_file, n=None, cwd=None, yolo=True, flags=None) -> ForeachResult`
File without changes