okstra 0.30.3 → 0.32.0
- package/docs/kr/architecture.md +2 -2
- package/docs/kr/cli.md +2 -2
- package/package.json +1 -1
- package/runtime/BUILD.json +2 -2
- package/runtime/agents/SKILL.md +7 -5
- package/runtime/agents/workers/claude-worker.md +1 -1
- package/runtime/agents/workers/codex-worker.md +23 -6
- package/runtime/agents/workers/gemini-worker.md +23 -6
- package/runtime/agents/workers/report-writer-worker.md +45 -66
- package/runtime/bin/okstra-codex-exec.sh +31 -0
- package/runtime/bin/okstra-gemini-exec.sh +26 -0
- package/runtime/bin/okstra-render-final-report.py +101 -0
- package/runtime/bin/okstra-render-report-views.py +17 -10
- package/runtime/bin/okstra-token-usage.py +3 -1
- package/runtime/python/lib/okstra/globals.sh +1 -1
- package/runtime/python/lib/okstra/usage.sh +2 -2
- package/runtime/python/okstra_ctl/final_report_schema.py +253 -0
- package/runtime/python/okstra_ctl/models.py +2 -0
- package/runtime/python/okstra_ctl/render_final_report.py +201 -0
- package/runtime/python/okstra_ctl/report_views.py +276 -297
- package/runtime/python/okstra_ctl/run.py +1 -1
- package/runtime/python/okstra_ctl/wizard.py +53 -14
- package/runtime/python/okstra_ctl/workers.py +45 -11
- package/runtime/python/okstra_token_usage/__init__.py +5 -1
- package/runtime/python/okstra_token_usage/cli.py +66 -36
- package/runtime/python/okstra_token_usage/pricing.py +1 -0
- package/runtime/python/okstra_token_usage/report.py +148 -65
- package/runtime/python/okstra_vendor/__init__.py +37 -0
- package/runtime/python/okstra_vendor/jinja2/__init__.py +38 -0
- package/runtime/python/okstra_vendor/jinja2/_identifier.py +6 -0
- package/runtime/python/okstra_vendor/jinja2/async_utils.py +99 -0
- package/runtime/python/okstra_vendor/jinja2/bccache.py +408 -0
- package/runtime/python/okstra_vendor/jinja2/compiler.py +1998 -0
- package/runtime/python/okstra_vendor/jinja2/constants.py +20 -0
- package/runtime/python/okstra_vendor/jinja2/debug.py +191 -0
- package/runtime/python/okstra_vendor/jinja2/defaults.py +48 -0
- package/runtime/python/okstra_vendor/jinja2/environment.py +1672 -0
- package/runtime/python/okstra_vendor/jinja2/exceptions.py +166 -0
- package/runtime/python/okstra_vendor/jinja2/ext.py +870 -0
- package/runtime/python/okstra_vendor/jinja2/filters.py +1873 -0
- package/runtime/python/okstra_vendor/jinja2/idtracking.py +318 -0
- package/runtime/python/okstra_vendor/jinja2/lexer.py +868 -0
- package/runtime/python/okstra_vendor/jinja2/loaders.py +693 -0
- package/runtime/python/okstra_vendor/jinja2/meta.py +112 -0
- package/runtime/python/okstra_vendor/jinja2/nativetypes.py +130 -0
- package/runtime/python/okstra_vendor/jinja2/nodes.py +1206 -0
- package/runtime/python/okstra_vendor/jinja2/optimizer.py +48 -0
- package/runtime/python/okstra_vendor/jinja2/parser.py +1049 -0
- package/runtime/python/okstra_vendor/jinja2/py.typed +0 -0
- package/runtime/python/okstra_vendor/jinja2/runtime.py +1062 -0
- package/runtime/python/okstra_vendor/jinja2/sandbox.py +436 -0
- package/runtime/python/okstra_vendor/jinja2/tests.py +256 -0
- package/runtime/python/okstra_vendor/jinja2/utils.py +766 -0
- package/runtime/python/okstra_vendor/jinja2/visitor.py +92 -0
- package/runtime/python/okstra_vendor/markupsafe/__init__.py +396 -0
- package/runtime/python/okstra_vendor/markupsafe/_native.py +8 -0
- package/runtime/python/okstra_vendor/markupsafe/py.typed +0 -0
- package/runtime/schemas/final-report-v1.0.schema.json +1391 -0
- package/runtime/skills/okstra-report-writer/SKILL.md +31 -30
- package/runtime/skills/okstra-run/SKILL.md +6 -4
- package/runtime/skills/okstra-team-contract/SKILL.md +27 -3
- package/runtime/templates/reports/final-report.template.md +370 -405
- package/runtime/templates/reports/report.css +57 -4
- package/runtime/templates/reports/report.js +63 -7
- package/runtime/templates/reports/settings.template.json +1 -0
- package/runtime/validators/lib/fixtures.sh +7 -7
- package/runtime/validators/validate-report-views.py +24 -153
- package/runtime/validators/validate-run.py +102 -19
- package/src/install.mjs +21 -1
@@ -8,11 +8,13 @@ user-invocable: false
 
 ## File-author ownership (BLOCKING)
 
-The final-report
+The final-report **data.json** (JSON SSOT) at `runs/<task-type>/reports/final-report-<task-type>-<seq>.data.json` is authored by the `Report writer worker` subagent when that worker is in the run's roster. The user-facing **markdown** at `runs/<task-type>/reports/final-report-<task-type>-<seq>.md` is then produced by `scripts/okstra-render-final-report.py` from the data.json — the worker invokes the renderer as part of its own turn so both files land on disk before it returns. Claude lead reviews both files but does NOT write them itself in that case. Lead-authored fallback is permitted only after a real Report writer worker dispatch attempt with a recorded non-`completed` terminal status (`error` / `timeout` / `not-run`) and a logged reason (`okstra-error-log.py`). **Except for `release-handoff`**, which has no worker roster — the Claude lead authors the data.json directly by design (see "Release-handoff section contract" below), and the fallback rules in this section do not apply.
 
-
+The data.json schema is `schemas/final-report-v1.0.schema.json`. The renderer + the run-validator both consume that schema, so a data.json that validates is guaranteed to render into a markdown that passes the contract checks.
 
-If you are reading this skill **as
+If you are reading this skill **as the report-writer-worker subagent**, YOU are the one calling the `Write` tool against the data.json path AND invoking the renderer via `Bash`. Do not return either artifact inline — the files on disk are the canonical record.
+
+If you are reading this skill **as Claude lead**, your job in Phase 6 is to (a) prepare the report-writer prompt, (b) dispatch the Report writer worker per the Phase 6 dispatch template in SKILL.md, (c) review both files in Phase 7. Do not call `Write` against either path yourself when Report writer worker is in the roster.
 
 ## When to Use
 
@@ -40,15 +42,15 @@ The prompt MUST include, in this order at the top:
 
 1. `**Project Root:** <absolute-path>`
 2. `**Prompt History Path:** <project-relative-path>` (under current run `prompts/`)
-3. `**Result Path:** runs/<task-type>/reports/final-report-<task-type>-<seq>.
+3. `**Result Path:** runs/<task-type>/reports/final-report-<task-type>-<seq>.data.json` — canonical JSON SSOT. The renderer produces the sibling `.md` automatically.
 4. `**Worker Result Path:** runs/<task-type>/worker-results/report-writer-worker-<task-type>-<seq>.md` — mandatory validator-checked worker-results audit file
 5. `Assigned worker prompt history path: <absolute-path>`
 6. `**Model:** Report writer worker, <modelExecutionValue>` (resolved per Phase 5.5 anchor-header rules)
-7. The full `[Required reading]` clause (see [okstra-team-contract](../okstra-team-contract/SKILL.md)) including `final-report-template.md
+7. The full `[Required reading]` clause (see [okstra-team-contract](../okstra-team-contract/SKILL.md)) including `schemas/final-report-v1.0.schema.json` and `templates/reports/final-report.template.md` (template is read-only — the worker writes the data.json that drives it).
 8. The verbatim `## Available MCP Servers` block from the task brief, if present.
 9. The convergence classifications (Full/Partial/Contested/Worker-Unique), the round history table (`roundHistory[]`), the `round2SkippedReason` value, and pointers to all worker result files under `worker-results/`. The report-writer worker must reproduce a Round History sub-table in Section 1 of the final report so the reader can see which rounds executed, queue sizes, and why Round 2 was (or was not) skipped.
 10. For implementation-planning runs: a literal block listing the 8 required English section headings the validator scans for (`Option Candidates`, `Trade-off`, `Recommended Option`, `Stepwise Execution Order`, `Dependency`, `Validation Checklist`, `Rollback`, `User Approval Request`). The writer must use these exact substrings as section headings (Korean translation in parentheses is allowed).
-11. An explicit instruction: `You are the author of TWO files: (a) the final-report
+11. An explicit instruction: `You are the author of TWO files: (a) the final-report data.json at <Result Path>, (b) the worker-results audit file at <Worker Result Path>. After writing the data.json, invoke "python3 scripts/okstra-render-final-report.py <Result Path>" via Bash so the markdown sibling is rendered before you return. Do not return the report inline. The validator fails the run when (a)'s schema validation fails, when the rendered markdown is absent, or when (b) is missing.`
 
 ### Resume-safe dispatch
 
@@ -70,36 +72,35 @@ Speculative reasons such as "session resume constraint", "team object no longer
 
 ## Phase 6 → Phase 7 execution sequence (BLOCKING order)
 
-The four steps below MUST execute in this exact order. Reordering them is the recurring root cause of reports shipping with
+The four steps below MUST execute in this exact order. Reordering them is the recurring root cause of reports shipping with `--` token cells (Phase 7 not run yet), Section 6 missing follow-up entries, or Section 7 rows never spawning.
 
-1. **Phase 6 — Report writer worker drafts the final-report
-2. **Phase 7 step 1 — Token-usage collector with `--substitute-
+1. **Phase 6 — Report writer worker drafts the final-report data.json** at `runs/<task-type>/reports/final-report-<task-type>-<seq>.data.json`, then invokes `scripts/okstra-render-final-report.py` to produce the sibling markdown. Token Usage cells in the data.json are `null` at this point (renderer emits `--` for nulls); Section 6 lists prioritized actions but does NOT yet include auto-spawned follow-ups (they don't exist yet).
+2. **Phase 7 step 1 — Token-usage collector with `--substitute-data`** (BLOCKING). One invocation aggregates `leadUsage` / `workers[].usage` / `usageSummary` into team-state AND populates `tokenUsage` + `executionStatus[].totalTokens` etc. in the data.json AND re-invokes the renderer so the sibling markdown carries the real numbers. Skipping the flag ships a markdown full of `--` cells.
 
 ```bash
 python3 scripts/okstra-token-usage.py \
 <runDirectoryPath>/state/team-state-<task-type>-<seq>.json \
 --write --summary \
---substitute-
+--substitute-data <runDirectoryPath>/reports/final-report-<task-type>-<seq>.data.json
 ```
 
-The
-3. **Phase 7 step 1.5 — Render report views** (BLOCKING). Produces
+The data.json paths populated: `tokenUsage.lead.{totalTokens,billableTokens,costUsd}`, the `worker` / `grand` rows, `tokenUsage.cli.costUsd`, and each `executionStatus[].{totalTokens,billableTokens,costUsd,durationMs,cliTotalTokens,cliCostUsd}` for rows whose role matches a team-state worker. The data.json MUST already exist (Phase 6 output).
+3. **Phase 7 step 1.5 — Render report views** (BLOCKING). Produces the self-contained HTML view from the now-substituted final-report MD:
 
 ```bash
 python3 scripts/okstra-render-report-views.py \
 <runDirectoryPath>/reports/final-report-<task-type>-<seq>.md
 ```
 
-
-- `runs/<task-type>/reports/final-report-<task-type>-<seq>.
-- `runs/<task-type>/reports/final-report-<task-type>-<seq>.html` — single-file self-contained human view. Section 5 `C-*` clarification rows with `Status` ∈ {`open`, `answered`} embed `<textarea>` controls; an `Export user response` button serialises form values to a markdown sidecar (schema in [`templates/reports/user-response.template.md`](../../templates/reports/user-response.template.md)) that the user pastes to `runs/<task-type>/user-responses/user-response-<task-type>-<seq>.md`. The original final-report MD is **never** mutated by user input — the sidecar is the single write target.
+Output (idempotent — re-running overwrites):
+- `runs/<task-type>/reports/final-report-<task-type>-<seq>.html` — single-file self-contained human view. Section 5 `C-*` clarification rows with `Status` ∈ {`open`, `answered`} embed form widgets (`<select>` for enum-style decisions, `<input>` for material / data-point kinds, `<textarea>` fallback); an `Export user response` button serialises form values to a markdown sidecar (schema in [`templates/reports/user-response.template.md`](../../templates/reports/user-response.template.md)) that the user pastes to `runs/<task-type>/user-responses/user-response-<task-type>-<seq>.md`. The original final-report MD is **never** mutated by user input — the sidecar is the single write target.
 
-Must run AFTER step 1 (so token placeholders are substituted in
+Must run AFTER step 1 (so token placeholders are substituted in the rendered html) and BEFORE step 2 (so the html artifact exists for any validator step that checks it).
 4. **Phase 7 step 2 — Follow-up task spawner** (BLOCKING when Section 7 is non-empty). Turns the report's `## 7. Follow-up Tasks (후속 작업)` rows into `tasks/<task-group>/<new-task-id>/` stubs.
 
 ```bash
 python3 scripts/okstra-spawn-followups.py \
-<runDirectoryPath>/reports/final-report-<task-type>-<seq>.
+<runDirectoryPath>/reports/final-report-<task-type>-<seq>.data.json \
 --project-root <project_root> \
 --task-group <task-group> \
 --parent-task-key <task-key>
@@ -107,9 +108,10 @@ The four steps below MUST execute in this exact order. Reordering them is the re
 
 Behaviour contract:
 - Idempotent: rows whose target dir exists are reported as `existing` and skipped. Reruns of the same parent task are safe.
-- Rows with `
--
--
+- Rows with `autoSpawn != "yes"` are reported as `skipped` and never written; surface them in Section 6 if manual action is still needed.
+- Rows whose `origin` is `phase-continuation` are reported as `skipped (no new task dir)` and never spawn — they advance the same task-key via `/okstra-run` instead.
+- An invalid `origin`, `suggestedTaskType`, missing `title`, missing `reason`, or missing `newTaskId` exits `1`. (Schema validation in Phase 6 catches most of these before the spawner runs.)
+- **Canonical spawn rule (single source of truth):** the spawner runs when `task-type` ∈ {`implementation`, `final-verification`, `release-handoff`}, OR when `followUpTasks` is non-empty for any other task-type. For the listed task-types `followUpTasks` must be present (schema enforces the phase-continuation row for non-terminal task-types); an empty array is permitted only for `release-handoff`. Missing arrays are no-ops (exit `0`). All other references to this rule (including the Persistence Checklist) defer to this statement.
 5. **Phase 7 step 3 — Update Section 6** after the spawner. The report-writer MUST append one row per newly spawned task-key with its entry command:
 
 ```
@@ -120,7 +122,7 @@ The status file is written after step 3 completes.
 
 ## Final Report Structure
 
-The final report follows the structure
+The final report follows the structure encoded in `schemas/final-report-v1.0.schema.json`. The schema is the single source of truth for section names, row shapes, enum values, and task-type-conditional blocks. The Jinja2 template `templates/reports/final-report.template.md` produces the human-readable form from any data.json that validates against the schema. The structure description below is a reading guide for writers; the schema is the binding contract.
 
 ### Report Header
 
@@ -140,11 +142,11 @@ The final report follows the structure below. If `instruction-set/final-report-t
 ```markdown
 | Agent | Role | Model | Status | 처리 토큰 | 환산 토큰 | 비용 (USD) | Duration | Summary of Key Findings |
 |-------|------|-------|--------|-----------|-----------|------------|----------|------------------------|
-| Claude Code | Claude lead | opus | completed | 10,479,327 | 1,769,798 | $26.55 | 59m 12s | Final synthesis status |
+| Claude Code | Claude lead | opus-4-6 | completed | 10,479,327 | 1,769,798 | $26.55 | 59m 12s | Final synthesis status |
 | Claude Code | Claude worker | sonnet | completed | 1,941,396 | 475,136 | $1.43 | 13m 33s | Key findings summary |
 | Codex | Codex worker | gpt-5.5 | completed | 2,274,011 (CLI: 5,261,833) | 586,223 | $8.79 (+ CLI $4.20) | 22m 06s | Key findings summary |
 | Gemini | Gemini worker | auto | completed | 3,107,795 | 746,623 | $11.20 | 22m 06s | Key findings summary |
-| Claude Code | Report writer | opus | completed | 665,497 | 267,210 | $4.01 | 4m 20s | Report organization |
+| Claude Code | Report writer | opus-4-6 | completed | 665,497 | 267,210 | $4.01 | 4m 20s | Report organization |
 ```
 
 Table Generation Rules:
@@ -175,7 +177,7 @@ Place this section immediately after the execution status table.
 ```
 
 Token Summary Generation Rules:
-- **You
+- **You populate the data.json in Phase 6, BEFORE Phase 7 runs the collector.** Set `tokenUsage.lead.totalTokens` / `.billableTokens` / `.costUsd`, the `worker` and `grand` rows, `tokenUsage.cli.costUsd`, and each `executionStatus[].{totalTokens,billableTokens,costUsd,durationMs,cliTotalTokens,cliCostUsd}` to JSON `null`. The renderer emits `--` for nulls; `okstra-token-usage.py --substitute-data` populates them in Phase 7 and re-renders the markdown. Never set these cells to `0`, `"not-collected"`, `"--"`, `"N/A"`, or any other sentinel: nulls are the only valid placeholder, and the substitution step depends on them being null when it runs.
 - All values come from `usageSummary` (populated by `scripts/okstra-token-usage.py` at the start of Phase 7). Do not estimate or invent.
 - **Lead** row: `usageSummary.leadTotalTokens` / `usageSummary.leadBillableEquivalentTokens` / `usageSummary.estimatedCostUsd.lead`.
 - **Worker 합계** row: `usageSummary.workerTotalTokens` / `usageSummary.workerBillableEquivalentTokens` / `usageSummary.estimatedCostUsd.claudeWorkers`.
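As a rough illustration of the null-placeholder rule in the hunk above, the sketch below shows how a renderer might emit `--` for `null` cells and how a substitution pass might fill only the cells that are still `null`. The names `render_cell` and `substitute_nulls`, the flat row shape, and the "skip non-null cells" behaviour are assumptions made for illustration; none of them are taken from `okstra-token-usage.py` or the real renderer.

```python
import json
from typing import Any, Optional

PLACEHOLDER = "--"  # what the renderer shows for a null cell, per the rule above


def render_cell(value: Optional[Any]) -> str:
    """Render one token-usage cell; None (JSON null) becomes the placeholder."""
    return PLACEHOLDER if value is None else str(value)


def substitute_nulls(row: dict, collected: dict) -> dict:
    """Return a copy of `row` with null cells filled from `collected`.

    Assumed behaviour: cells that already hold a value (for example a
    wrongly written 0 or "N/A") are left untouched, which is one way to see
    why null is the only safe placeholder.
    """
    return {
        key: (collected.get(key) if value is None else value)
        for key, value in row.items()
    }


# Hypothetical Phase 6 output: every usage cell is JSON null.
lead_row = json.loads('{"totalTokens": null, "billableTokens": null, "costUsd": null}')
before = [render_cell(v) for v in lead_row.values()]

# Hypothetical Phase 7 numbers from the collector.
filled = substitute_nulls(
    lead_row, {"totalTokens": 1200, "billableTokens": 300, "costUsd": 0.42}
)
after = [render_cell(v) for v in filled.values()]
```

Under these assumptions, a cell pre-filled with `0` in Phase 6 would survive the substitution pass and ship as a misleading zero, whereas a `null` cell renders as `--` until real numbers arrive.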
@@ -199,11 +201,11 @@ When the run's `task-type` is `implementation-planning`, the final report MUST c
 | 6 | `Validation Checklist` | `### Validation Checklist (검증 체크리스트)` |
 | 7 | `Rollback` | `### Rollback Strategy (롤백 전략)` |
 | 8 | `User Approval Request` | Satisfied by the top-of-report `## User Approval Request (사용자 승인 게이트)` block. Do NOT recreate a `### 4.5.8 User Approval Request` body stub — the validator now fails reports that contain one. |
-| 9 | `Plan Body Verification` + `Gate result:` | `### Plan Body Verification (계획 본문 검증)` containing a `Gate result:` line — copy `
+| 9 | `Plan Body Verification` + `Gate result:` | `### Plan Body Verification (계획 본문 검증)` containing a `Gate result:` line — copy `templates/reports/final-report.template.md §4.5.9` verbatim. Validator checks both substrings. |
 
 The Korean translation in parentheses is optional but the English keyword is mandatory. The body of each section is written in Korean per the writing rules below. For non-`implementation-planning` runs, omit this entire block — these headings are NOT validator-checked for other task-types.
 
-The final-report template `
+The final-report template `templates/reports/final-report.template.md` Section 4.5 already encodes this contract — copy that block verbatim and fill in.
 
 ### Final-verification verdict token contract (BLOCKING)
 
@@ -217,7 +219,7 @@ When the run's `task-type` is `final-verification`, the report's `## 2. Final Ve
 
 For every other task-type, set the `Verdict Token` cell to `not-applicable`. Do NOT omit the row — the template renders it for all task-types and downstream tooling expects the field to exist.
 
-The final-report template `
+The final-report template `templates/reports/final-report.template.md` Section 2 already encodes this contract — copy that block verbatim and fill in.
 
 ### Release-handoff section contract (release-handoff runs only)
 
@@ -225,7 +227,7 @@ When the run's `task-type` is `release-handoff`, the final report MUST include S
 
 **Single-lead authorship (release-handoff only):** release-handoff has no worker roster (no `Report writer worker`, no `Claude worker` drafter). The Claude lead authors the final-report file directly — there is no `Report writer worker` dispatch to perform in Phase 6, no resume-safe dispatch concern, and no mandatory worker-results file for a report-writer role. The rest of this skill's dispatch / resume / fallback machinery applies ONLY when `Report writer worker` is in the roster (i.e. every task-type other than `release-handoff`).
 
-The final-report template `
+The final-report template `templates/reports/final-report.template.md` Section 4.6 already encodes this contract — copy that block verbatim and fill in. For non-`release-handoff` runs, omit Section 4.6 entirely.
 
 ### Mandatory worker-results file (BLOCKING)
 
@@ -291,8 +293,7 @@ Persistence steps that must be performed in Phase 7:
 - [ ] 6. **Generate final status file**: `runs/<task-type>/status/final-<task-type>-<seq>.status` (if necessary)
 - [ ] 7. **Save convergence state**: `runs/<task-type>/state/convergence-<task-type>-<seq>.json` (when convergence is enabled)
 - [ ] 8. **Spawn follow-up task stubs**: run `scripts/okstra-spawn-followups.py` against the final-report per the canonical spawn rule defined in "Phase 7 follow-up task spawner" above. Do not restate the trigger condition here — that section is the single source of truth. The script is idempotent across reruns.
-- [ ] 9. **
-- [ ] 10. **Human HTML report**: `runs/<task-type>/reports/final-report-<task-type>-<seq>.html` (same step 1.5; self-contained, embeds `Export user response` button)
+- [ ] 9. **Human HTML report**: `runs/<task-type>/reports/final-report-<task-type>-<seq>.html` (produced by Phase 7 step 1.5 — self-contained, embeds `Export user response` button)
 
 ### Response after Persistence
 
@@ -38,9 +38,10 @@ Every wizard call returns JSON. The two shapes you'll see:
 
 On `ok: false`, re-prompt with the same `current.step` using the error message. The wizard never advances on validation failure; the user retries the same step.
 
-The wizard tells you *which UI to use* via `kind
+The wizard tells you *which UI to use* via `kind` (and the optional `multi` flag on `pick`):
 
-- `kind: "pick"` → render `AskUserQuestion` with `label
+- `kind: "pick"` + `multi: false` (default) → render `AskUserQuestion` with `label`, `options[].label`, and `multiSelect: false`. Use the chosen `options[].value` (single string) as the answer.
+- `kind: "pick"` + `multi: true` → render `AskUserQuestion` with `label`, `options[].label`, and `multiSelect: true`. Join the chosen `options[].value` entries with `,` into a single CSV string and submit that as `--answer "csv,values"`. If the user selects nothing, still submit `--answer ""` — the wizard will reply `ok: false` and re-prompt the same step (do not skip the call).
 - `kind: "text"` → write `label` as a plain text message and consume the user's NEXT message as the answer.
 - `kind: "done"` → input collection finished; move to Step 5.
 
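The `pick` handling described in the hunk above can be sketched as a small helper that turns a selection into the single string passed to `--answer`. This is a minimal sketch under assumptions: `build_answer` is a hypothetical name, and the `step` dictionary only mirrors the `kind` and `multi` fields mentioned in the text, not the wizard's full JSON shape.

```python
from typing import Sequence


def build_answer(step: dict, chosen_values: Sequence[str]) -> str:
    """Turn the user's selection into the single string passed to --answer.

    `step` is assumed to carry the wizard's `kind` and optional `multi` flag.
    """
    if step.get("kind") != "pick":
        raise ValueError("build_answer only handles pick steps")
    if step.get("multi", False):
        # Multi-select: join values with "," (an empty selection yields "",
        # which is still submitted so the wizard can re-prompt).
        return ",".join(chosen_values)
    if len(chosen_values) != 1:
        raise ValueError("single-select pick needs exactly one value")
    return chosen_values[0]
```

For a multi-select step with `claude`, `codex`, and `gemini` chosen, the helper returns one CSV string, so the caller makes a single `--answer` submission rather than one per value.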
@@ -90,8 +91,9 @@ Output: the same `{ok, next}` JSON described above. The first `next` is always `
 
 Repeat until `next.kind == "done"`:
 
-1. **Render** the prompt according to `kind
-- `pick` → `AskUserQuestion` with `label
+1. **Render** the prompt according to `kind` (and `multi` for pick):
+- `pick` + `multi: false` → `AskUserQuestion` with `multiSelect: false`, `label`, and `options`. The user's chosen option's `value` is the answer string.
+- `pick` + `multi: true` → `AskUserQuestion` with `multiSelect: true`, `label`, and `options`. Join the selected `value`s with `,` into a single literal CSV string (e.g. `"claude,codex,gemini"`) and submit it as a single `--answer "claude,codex,gemini"`. Empty selection submits `--answer ""` and the wizard re-prompts.
 - `text` → plain text message containing `label`. Consume the user's next reply verbatim as the answer string (empty reply = empty string).
 2. **Submit** the answer — call `okstra wizard step` with the literal state-file path from Step 2 and the literal user answer (no shell variables, no `$(...)`):
 ```bash
@@ -70,7 +70,7 @@ Every worker prompt MUST start with the following anchor headers, in this exact
 
 1. `**Project Root:** <absolute-path>` — absolute target project root (from `{{PROJECT_ROOT}}` in the lead's prompt). Required so the worker can self-anchor without relying on inherited cwd.
 2. `**Prompt History Path:** <project-relative-path>`
-3. `**Result Path:** <project-relative-path>`
+3. `**Result Path:** <project-relative-path>` — canonical destination for the worker's result file. Workers resolve it to absolute against `**Project Root:**` and use it for the post-completion existence check (see codex-worker / gemini-worker step 8c, and Lead's redispatch policy below). The path identifies a single file; do NOT deliver a directory.
 4. `Assigned worker prompt history path: <absolute-path>` — same as the prompt-history path but resolved against `Project Root`. Codex/Gemini wrapper subagents extract this exact line.
 
 The body must include: role name, task type, task key, required bundle paths, assigned model, output contract, evidence handling rules, and any relevant config/deployment expectations from `reference-expectations.md`.
@@ -209,6 +209,29 @@ Terminal statuses that can be recorded for a worker:
|
|
|
209
209
|
| `error` | Execution error, reason recorded; prompt history file must exist |
|
|
210
210
|
| `not-run` | Not executed, reason recorded |
|
|
211
211
|
|
|
212
|
+
## Lead Redispatch Policy on Result-Missing
|
|
213
|
+
|
|
214
|
+
After each worker subagent returns (regardless of role), Lead MUST verify the canonical result file exists at the absolute path resolved from the `**Result Path:**` anchor header (against `**Project Root:**`). The check is identical for in-process workers (claude-worker) and CLI-wrapper workers (codex-worker / gemini-worker).
|
|
215
|
+
|
|
216
|
+
**Triggers (any of):**
|
|
217
|
+
|
|
218
|
+
- The wrapper subagent returned an explicit `*_RESULT_MISSING` sentinel (codex-worker / gemini-worker step 8c — `CODEX_RESULT_MISSING` / `GEMINI_RESULT_MISSING`).
|
|
219
|
+
- The result file is absent at the resolved absolute path even though the worker returned without a `*_RESULT_MISSING` sentinel — for example, claude-worker returned its final assistant message but never persisted the artifact, or the wrapper exited 0 and the codex/gemini sub-agent forwarded raw stdout despite the contract.
|
|
220
|
+
- The result file exists but cannot be parsed (frontmatter unreadable, sections 1–5 entirely missing). A truncated file in the middle of section 5 is NOT covered here — it goes to the validator's regular `error` path, not the retry path.
|
|
221
|
+
|
|
222
|
+
**One-retry policy:**
|
|
223
|
+
|
|
224
|
+
1. On the FIRST result-missing trigger for a given role within a single run, Lead MUST re-dispatch the same worker with the byte-identical prompt — same `**Result Path:**`, same `**Prompt History Path:**`, same model assignment, same `team_name`. The redispatch counts as a second attempt against the existing role slot; do NOT create a new role-id, do NOT change the result file path, do NOT switch to a different model as a "workaround".
|
|
225
|
+
2. If the SECOND attempt also fails the same check, Lead records the role's terminal status as `error` with `--message "result-missing after 1 retry"` and proceeds to Phase 5.5 / Phase 6 with the remaining workers' results. Lead MUST NOT retry a third time — convergence and the report writer are designed to operate on reduced-confidence single-or-two-analyser mode when one role is absent (`agents/SKILL.md` "If only one worker result is usable: reduced-confidence synthesis").
|
|
226
|
+
3. The retry counter is **per-run, per-role** and is NOT preserved across runs. A subsequent okstra run for the same task-key starts each role's counter fresh.
|
|
227
|
+
4. Convergence reverify rounds (Phase 5.5) inherit the same one-retry budget — a reverify dispatch that triggers result-missing may be re-dispatched once.
|
|
228
|
+
|
|
229
|
+
**Logging.** Lead records the first attempt's `cli-failure` (already emitted by the wrapper sub-agent) as-is. The retry, on success, is logged via the normal worker-completion path; on failure (second `*_RESULT_MISSING`), Lead records a single `contract-violation` entry with `--message "result-missing after 1 retry"` referencing both attempts' bash_ids / prompt-history paths.
|
|
230
|
+
|
|
231
|
+
**Diagnostic sidecar (advisory).** Both codex/gemini wrappers also write a heartbeat sidecar at `<prompt-path>.status.json` recording `started_ts`, `ended_ts`, `exit_code`, `duration_ms`, and the canonical `log_path` (see `scripts/okstra-wrapper-status.py` for the schema). Lead MAY read this sidecar when deciding whether the first attempt actually launched the CLI (stage=`exited`, `exit_code=0`, non-zero `duration_ms`) versus failed before reaching it (sidecar absent, or stage=`started` with no exit fields). The sidecar is best-effort — its absence is NOT by itself a reason to skip the retry; the canonical trigger remains the missing result file.
|
|
232
|
+
|
|
233
|
+
**Rationale.** Observed failure mode: the CLI (codex/gemini) streams its full analysis to stdout but hits its token budget or a sandbox EPERM mid-`Write` of the result file, exiting 0 with no artifact. Forwarding the partial stdout silently degrades synthesis; classifying the role as `error` without retrying gives up a recoverable signal. A single retry catches the transient class of this failure (re-dispatch with the same prompt typically succeeds when the underlying cause was an intermittent sandbox lock or a token-budget spike) while bounding the retry cost to a known upper bound (~2× the original wrapper budget per role).
+
## Worker Output Contract
**Authoritative source.** If other documents (SKILL.md, worker agent definitions) disagree with this section, this section wins.
@@ -355,8 +378,9 @@ empty run-level error logs in production.
- **Background dispatch + polling contract (Codex / Gemini wrappers).** Both wrapper subagents MUST dispatch `okstra-codex-exec.sh` / `okstra-gemini-exec.sh` via `Bash(run_in_background: true)` and poll with `BashOutput(bash_id)` until the shell reports `status == "completed"`, capped at 30 minutes (1800s) of wall-clock elapsed time. `BashOutput` itself is the wait primitive — call it back-to-back; do NOT insert a standalone `sleep` between polls. The Claude Code harness blocks `sleep` calls of 5 seconds or longer as a circumvention vector and explicitly forbids chaining shorter sleeps inside until-loops to work around the block. Workers that hit the contract bug must NOT self-recover with `until ...; do sleep 2; done` wrappers — that path violates the harness anti-circumvention rule, even though it superficially "works". The legacy "single foreground `Bash` with 120000ms timeout" rule, and the subsequent "60-second cadence with `sleep 60` between polls" rule, are both retired. The current rule applies in **every phase** (analysis runs typically complete in 1–2 `BashOutput` calls, so there is no regression for short jobs). Recording responsibilities:
- Successful completion: return the wrapper's accumulated stdout from the final `BashOutput`. No log entry.
- Non-zero `exit_code` reported by `BashOutput`: record a `cli-failure` to the run-level error log with the real `exit_code` and observed `duration-ms`.
-
-
+
- Polling cap reached: before `KillShell`, perform a one-shot **mtime-grace check** on the wrapper's live log (`<prompt>.log`). If the log was written within the last 90 seconds AND grace has not yet been applied this loop, extend the cap from 1800s → 2100s (one-shot +5min) and continue polling. Otherwise (log stale, OR grace already applied), call `KillShell(shell_id)`, record `cli-failure` with `--exit-code 124 --duration-ms <observed_ms> --message "<wrapper> exceeded polling cap (grace=<applied|not-applied>, last_mtime_age=<n>s)"`, then return the language-specific `*_CLI_TIMEOUT` sentinel. The grace exists to absorb token-budget spikes where the CLI is genuinely still producing output past the 30-minute mark; it is a one-shot soft extension, NOT a loop.
- Token-usage matching is unaffected: the wrapper subagent stays alive throughout polling, so the wrapper's jsonl timestamp window continues to cover the underlying CLI rollout's full duration (see §"Token-usage accounting" below).
+
- **No external timeout on wrapper subagents.** The codex/gemini wrapper subagent's polling loop (with optional mtime grace) is the SINGLE timeout authority for its dispatch. Lead MUST NOT impose a separate `Agent()` call timeout, an outer `Bash` wall-clock deadline, or any other mechanism that terminates the subagent before its own polling cap is reached. Doing so reproduces the historical failure mode that motivated this rule: Lead aborts the subagent at, e.g., 18 minutes, the subagent returns nothing, and Lead classifies the role as "no response" while the underlying CLI was actively working. The wrapper's polling cap (30min + optional 5min grace) is calibrated so that, combined with Lead's redispatch policy (see "Lead Redispatch Policy on Result-Missing"), a recoverable single-run failure costs at most ~70 minutes of wall-clock (two attempts at up to 35 minutes each), which is predictable enough to plan around. If a specific run requires a tighter cap, lower it in the wrapper subagent's polling contract (single source of truth), NOT by layering Lead-side timeouts.
- `contract-violation` events (C) are recorded by Lead via `okstra-error-log.py append-observed --error-type contract-violation ...` after inspecting worker outputs.
- Lead's responsibility regarding the sidecar is to dump it to the run-level error log via `okstra-error-log.py append-from-worker` after each worker terminates; Lead does not write into the sidecar.
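The polling-cap rule with its one-shot mtime grace can be sketched as a small decision function. This is an illustrative model only: `PollState`, `on_poll`, `timeout_message`, and `worst_case_minutes` are hypothetical names, the caller is assumed to supply the elapsed time and the age of `<prompt>.log`'s mtime in seconds, and treating exactly 90 seconds as still "fresh" is an assumption.

```python
from dataclasses import dataclass

BASE_CAP_S = 1800           # 30-minute polling cap
GRACE_EXTENSION_S = 300     # one-shot +5 minutes
GRACE_MTIME_WINDOW_S = 90   # log must have been written this recently


@dataclass
class PollState:
    cap_s: int = BASE_CAP_S
    grace_applied: bool = False


def on_poll(state: PollState, elapsed_s: float, log_mtime_age_s: float) -> str:
    """Return 'continue' to keep polling, or 'kill' once the cap is exhausted."""
    if elapsed_s < state.cap_s:
        return "continue"
    if not state.grace_applied and log_mtime_age_s <= GRACE_MTIME_WINDOW_S:
        # Log is still fresh and grace is unused: extend the cap exactly once.
        state.grace_applied = True
        state.cap_s = BASE_CAP_S + GRACE_EXTENSION_S
        return "continue"
    # Log is stale, or grace was already spent.
    return "kill"


def timeout_message(wrapper: str, state: PollState, log_mtime_age_s: float) -> str:
    """Build the cli-failure message text in the format quoted above."""
    grace = "applied" if state.grace_applied else "not-applied"
    return (
        f"{wrapper} exceeded polling cap "
        f"(grace={grace}, last_mtime_age={int(log_mtime_age_s)}s)"
    )


def worst_case_minutes(attempts: int = 2) -> float:
    """Upper bound on wall-clock: each attempt may use the cap plus the grace."""
    return attempts * (BASE_CAP_S + GRACE_EXTENSION_S) / 60
```

With the default of two attempts (the original dispatch plus one redispatch), `worst_case_minutes()` evaluates to 70.0, which is where the ~70 minute figure comes from.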
@@ -419,7 +443,7 @@ Examples:
**Task:** error-analysis
**Target:** server/auth.ts
**Date:** 2026-04-06
-
**Model:** Report writer worker, opus
+
**Model:** Report writer worker, opus-4-6
```
Use the actual model identifier recorded in team-state (never invent a model ID — read it from `resultContract.requiredWorkerRoles[*].modelExecutionValue` or the tool response metadata).
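One way to read the identifier rather than invent it is sketched below. The shape of team-state is an assumption inferred from the path `resultContract.requiredWorkerRoles[*].modelExecutionValue`: the sketch accepts either a list of role objects or a mapping of role name to role object, and the helper name `recorded_model_ids` is hypothetical.

```python
from typing import Any, Dict, List


def recorded_model_ids(team_state: Dict[str, Any]) -> List[str]:
    """Collect every modelExecutionValue recorded under requiredWorkerRoles."""
    roles = team_state.get("resultContract", {}).get("requiredWorkerRoles", [])
    if isinstance(roles, dict):
        # Assumed alternative shape: {role_name: {...}}
        roles = list(roles.values())
    return [
        role["modelExecutionValue"]
        for role in roles
        if isinstance(role, dict) and role.get("modelExecutionValue")
    ]
```

If the list comes back empty, the identifier should come from the tool response metadata instead of being guessed.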