okstra 0.27.0 → 0.28.0
- package/package.json +1 -1
- package/runtime/BUILD.json +2 -2
- package/runtime/agents/workers/claude-worker.md +4 -3
- package/runtime/agents/workers/codex-worker.md +4 -3
- package/runtime/agents/workers/gemini-worker.md +4 -3
- package/runtime/agents/workers/report-writer-worker.md +7 -2
- package/runtime/prompts/launch.template.md +1 -1
- package/runtime/prompts/profiles/_common-contract.md +12 -4
- package/runtime/python/okstra_token_usage/cli.py +9 -2
- package/runtime/python/okstra_token_usage/report.py +32 -3
- package/runtime/skills/okstra-convergence/SKILL.md +2 -2
- package/runtime/skills/okstra-report-writer/SKILL.md +6 -4
- package/runtime/skills/okstra-team-contract/SKILL.md +14 -10
- package/runtime/templates/reports/final-report.template.md +227 -207
- package/runtime/validators/lib/fixtures.sh +37 -0
- package/runtime/validators/validate-run.py +313 -1
package/package.json
CHANGED
package/runtime/BUILD.json
CHANGED
package/runtime/agents/workers/claude-worker.md
CHANGED
@@ -57,8 +57,8 @@ Unlike the Codex / Gemini workers, you are an in-process Claude subagent — you
 Before producing any output, you MUST read every input file enumerated in the `[Required reading]` block of the lead's prompt from the very first character to the very last character. This includes the task brief, analysis profile, analysis material (if present), reference expectations, the carry-in clarification response (if present), and the final report template.

 - Use a single `Read` call per file with no `offset` and no `limit`. If a file is genuinely too large for one read, page through it with explicit `offset` / `limit` calls that together cover the entire file, and record the page boundaries in your Findings.
-- For the carry-in clarification response, walk every row of `## 5. Clarification Items` (`C-001`, `C-002`, ...) in full, including rows whose `User input` cell is blank — a blank `User input` with `Status=open` is itself a signal you must surface, not skip. Skimming these rows is the most common failure mode here; the fact that the file you will eventually contribute to has a structurally similar section 5 is NOT a license to skim.
-- Before listing any Findings,
+- For the carry-in clarification response, walk every row of `## 5. Clarification Items` (`C-001`, `C-002`, ...) in full, including rows whose `User input` cell is blank — a blank `User input` with `Status=open` is itself a signal you must surface, not skip. Skimming these rows is the most common failure mode here; the fact that the file you will eventually contribute to has a structurally similar section 5 is NOT a license to skim.
+- Before listing any Findings, write a Reading Confirmation block to your **audit sidecar** at `runs/<task-type>/worker-results/claude-worker-audit-<task-type>-<seq>.md` (sibling to your main worker-results file — substitute `claude-worker-<task-type>-<seq>.md` → `claude-worker-audit-<task-type>-<seq>.md`). The sidecar's body begins with `# Claude Worker Audit — <task-key>` followed by one short line per input file confirming end-to-end reading (e.g. `- Read task-brief.md end-to-end (147 lines).`). Do NOT include a `## 0. Reading Confirmation` heading in the main worker-results file — the validator now fails worker-results that contain one. If you cannot truthfully confirm a file end-to-end, record a `tool-failure` in the errors sidecar instead of fabricating Findings.
 - Do not skip a file because its name suggests its content is already familiar from a prior run. Each file is canonical for the current run only.

 ## Worker Output Structure
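The sidecar naming rule in this hunk can be sketched in Python. This is a minimal illustration, not okstra's own code: `audit_sidecar_path` and `reading_confirmation_lines` are hypothetical helper names, and the sketch assumes the main worker-results file is named `<worker>-worker-<task-type>-<seq>.md` as quoted in the hunk.

```python
from pathlib import Path


def audit_sidecar_path(main_result: Path, worker: str = "claude") -> Path:
    """Map a main worker-results file to its sibling audit sidecar.

    Hypothetical helper: applies the `<worker>-worker-` to
    `<worker>-worker-audit-` substitution described in the contract.
    """
    prefix = f"{worker}-worker-"
    if not main_result.name.startswith(prefix):
        raise ValueError(f"unexpected worker-results name: {main_result.name}")
    rest = main_result.name[len(prefix):]
    # with_name keeps the parent directory, so the sidecar stays a sibling.
    return main_result.with_name(f"{prefix}audit-{rest}")


def reading_confirmation_lines(line_counts: dict[str, int]) -> list[str]:
    """Build one confirmation line per input file, in the quoted format."""
    return [
        f"- Read {name} end-to-end ({count} lines)."
        for name, count in line_counts.items()
    ]
```

A caller would write the sidecar heading followed by these lines to the returned path, and keep the main worker-results body starting at section 1.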
@@ -67,7 +67,6 @@ When returning results, start the file with a YAML frontmatter block, then organ

 **Frontmatter (mandatory)** — set `workerId: "claude"`. Copy `id`, `aliases`, `taskType`, `task-id`, `task-group`, `project-id`, `date` verbatim from the input files (`analysis-material.md` is canonical; if it lacks any field, record a `tool-failure` and stop). Full schema and a concrete example live in the `okstra-team-contract` skill's "Result Frontmatter" subsection.

-0. **Reading Confirmation** - one short line per input file confirming end-to-end reading (e.g. `- Read task-brief.md end-to-end (147 lines).`). If any file was skipped, record a `tool-failure` and do NOT produce sections 1–5.
 1. **Findings** - what you identified
 2. **Missing Information or Assumptions** - gaps in the analysis
 3. **Safe or Reasonable Areas** - parts that look correct
@@ -76,6 +75,8 @@ When returning results, start the file with a YAML frontmatter block, then organ

 Include file paths and line numbers when discussing code evidence.

+**Item IDs (mandatory).** Every row in sections 1–5 (and any optional section 6) MUST carry a worker-internal item ID unique within this file. Use the leading column for table-form items (`F-001`, `M-001`, `S-001`, `U-001`, `R-001` per section) or a `[<ID>]` prefix for bullet/numbered items. The ID shape is your choice but it MUST appear — the lead's §1.1 / §1.2 / §3.1 synthesis preserves these IDs in its `Source items (worker:item)` column to keep cross-worker traceability intact. See `prompts/profiles/_common-contract.md` "Cross-worker traceability" SSOT.
+
 **Ticket tagging:** For runs whose task type is `requirements-discovery`, `error-analysis`, `implementation-planning`, or `implementation`, every item in sections 1–5 MUST carry a ticket identifier. Use the `Ticket ID` column in table-form items and the `[TICKETID: <id>]` prefix in bullet/numbered items. Fill priority: `Issue / Ticket` from the input → `Task ID` (no prefix, e.g. `8852`) → `unknown`. Multiple tickets are comma-separated. Full rules live in the `okstra-team-contract` skill's Ticket Tagging section.

 This contract mirrors the `okstra-team-contract` skill's Worker Output Contract — that skill is the authoritative source if the two ever diverge.
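The item-ID rule added in this hunk (a leading table column or a `[<ID>]` prefix, unique within one worker file) could be checked roughly as follows. This is a sketch under stated assumptions, not the shipped validator: the function names and both regular expressions are hypothetical, and table header rows are assumed to use `ID` as their first cell.

```python
import re
from collections import Counter

# Bullet or numbered item with a leading "[<ID>]" prefix. The colon exclusion
# keeps "[TICKETID: 8852]" tags from being mistaken for item IDs.
BULLET_ID = re.compile(r"^\s*(?:[-*]|\d+\.)\s+\[([^\]:]+)\]")
# First cell of a Markdown table row.
TABLE_ID = re.compile(r"^\|\s*([^|]+?)\s*\|")


def collect_item_ids(markdown: str) -> list[str]:
    """Return item IDs in document order (hypothetical helper)."""
    ids = []
    for line in markdown.splitlines():
        match = BULLET_ID.match(line)
        if match:
            ids.append(match.group(1).strip())
            continue
        match = TABLE_ID.match(line)
        if match:
            cell = match.group(1).strip()
            # Skip header rows, separator rows, and empty leading cells.
            if cell.upper() == "ID" or set(cell) <= set("-: "):
                continue
            ids.append(cell)
    return ids


def duplicate_ids(ids: list[str]) -> list[str]:
    """IDs that appear more than once, sorted for stable reporting."""
    return sorted(item for item, count in Counter(ids).items() if count > 1)
```

Because the contract leaves the ID shape to each worker, the sketch checks only presence and uniqueness, never the prefix.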
package/runtime/agents/workers/codex-worker.md
CHANGED
@@ -125,8 +125,8 @@ This wrapper does NOT invoke MCP tools directly. MCP availability inside the Cod
 Before producing any output, you MUST ensure the underlying Codex CLI run reads every input file enumerated in the `[Required reading]` block of the lead's prompt from the very first character to the very last character. This includes the task brief, analysis profile, analysis material (if present), reference expectations, the carry-in clarification response (if present), and the final report template.

 - The lead's prompt body, which you persist verbatim and feed into Codex via stdin, already contains the explicit list of files and the end-to-end reading rule. Do not strip or summarize that block before passing it to the CLI.
-- For the carry-in clarification response, the CLI must walk every row of `## 5. Clarification Items` (`C-001`, `C-002`, ...) in full, including rows whose `User input` cell is blank — a blank `User input` with `Status=open` is itself a signal you must surface. The fact that the prior run's final report and the upcoming output share section 5 structure is NOT a license to skim.
-- The
+- For the carry-in clarification response, the CLI must walk every row of `## 5. Clarification Items` (`C-001`, `C-002`, ...) in full, including rows whose `User input` cell is blank — a blank `User input` with `Status=open` is itself a signal you must surface. The fact that the prior run's final report and the upcoming output share section 5 structure is NOT a license to skim.
+- The wrapper writes a Reading Confirmation block to the **audit sidecar** at `runs/<task-type>/worker-results/codex-worker-audit-<task-type>-<seq>.md` (sibling to the main worker-results file). The sidecar's body begins with `# Codex Worker Audit — <task-key>` followed by one short line per input file confirming end-to-end reading (e.g. `- Read task-brief.md end-to-end (147 lines).`). The main Codex output MUST NOT contain a `## 0. Reading Confirmation` heading — the validator now fails worker-results that contain one. If any file was skipped, record a `tool-failure` in the errors sidecar instead of fabricating Findings.

 ## Worker Output Structure

@@ -134,7 +134,6 @@ When returning results, start the file with a YAML frontmatter block, then organ

 **Frontmatter (mandatory)** — set `workerId: "codex"`. Copy `id`, `aliases`, `taskType`, `task-id`, `task-group`, `project-id`, `date` verbatim from the input files (`analysis-material.md` is canonical; if it lacks any field, record a `tool-failure` and stop). Full schema and a concrete example live in the `okstra-team-contract` skill's "Result Frontmatter" subsection.

-0. **Reading Confirmation** - one short line per input file confirming end-to-end reading (e.g. `- Read task-brief.md end-to-end (147 lines).`). If any file was skipped, record a `tool-failure` and do NOT produce sections 1–5.
 1. **Findings** - what Codex identified
 2. **Missing Information or Assumptions** - gaps in the analysis
 3. **Safe or Reasonable Areas** - parts that look correct
@@ -143,6 +142,8 @@ When returning results, start the file with a YAML frontmatter block, then organ

 Include file paths and line numbers when discussing code evidence.

+**Item IDs (mandatory).** Every row in sections 1–5 (and any optional section 6) MUST carry a worker-internal item ID unique within this file. Codex tends to use hierarchical numbering (`1.1`, `1.2`, `1.3`, ...); that shape is fine — keep what's natural. What matters is that each item is addressable. The lead's §1.1 / §1.2 / §3.1 synthesis preserves these IDs as `codex:<your-id>` entries in its `Source items (worker:item)` column. See `prompts/profiles/_common-contract.md` "Cross-worker traceability" SSOT.
+
 **Ticket tagging:** For runs whose task type is `requirements-discovery`, `error-analysis`, `implementation-planning`, or `implementation`, every item in sections 1–5 MUST carry a ticket identifier. Use the `Ticket ID` column in table-form items and the `[TICKETID: <id>]` prefix in bullet/numbered items. Fill priority: `Issue / Ticket` from the input → `Task ID` (no prefix, e.g. `8852`) → `unknown`. Multiple tickets are comma-separated. Full rules live in the `okstra-team-contract` skill's Ticket Tagging section.

 This contract mirrors the `okstra-team-contract` skill's Worker Output Contract — that skill is the authoritative source if the two ever diverge.
package/runtime/agents/workers/gemini-worker.md
CHANGED
@@ -125,8 +125,8 @@ This wrapper does NOT invoke MCP tools directly. MCP availability inside the Gem
 Before producing any output, you MUST ensure the underlying Gemini CLI run reads every input file enumerated in the `[Required reading]` block of the lead's prompt from the very first character to the very last character. This includes the task brief, analysis profile, analysis material (if present), reference expectations, the carry-in clarification response (if present), and the final report template.

 - The lead's prompt body, which you persist verbatim and feed into Gemini via stdin, already contains the explicit list of files and the end-to-end reading rule. Do not strip or summarize that block before passing it to the CLI.
-- For the carry-in clarification response, the CLI must walk every row of `## 5. Clarification Items` (`C-001`, `C-002`, ...) in full, including rows whose `User input` cell is blank — a blank `User input` with `Status=open` is itself a signal you must surface. The structural similarity between the prior final report and the upcoming output is the most common reason this step gets skipped — do not repeat that.
-- The
+- For the carry-in clarification response, the CLI must walk every row of `## 5. Clarification Items` (`C-001`, `C-002`, ...) in full, including rows whose `User input` cell is blank — a blank `User input` with `Status=open` is itself a signal you must surface. The structural similarity between the prior final report and the upcoming output is the most common reason this step gets skipped — do not repeat that.
+- The wrapper writes a Reading Confirmation block to the **audit sidecar** at `runs/<task-type>/worker-results/gemini-worker-audit-<task-type>-<seq>.md` (sibling to the main worker-results file). The sidecar's body begins with `# Gemini Worker Audit — <task-key>` followed by one short line per input file confirming end-to-end reading (e.g. `- Read task-brief.md end-to-end (147 lines).`). The main Gemini output MUST NOT contain a `## 0. Reading Confirmation` heading — the validator now fails worker-results that contain one. If any file was skipped, record a `tool-failure` in the errors sidecar instead of fabricating Findings.

 ## Worker Output Structure

@@ -134,7 +134,6 @@ When returning results, start the file with a YAML frontmatter block, then organ

 **Frontmatter (mandatory)** — set `workerId: "gemini"`. Copy `id`, `aliases`, `taskType`, `task-id`, `task-group`, `project-id`, `date` verbatim from the input files (`analysis-material.md` is canonical; if it lacks any field, record a `tool-failure` and stop). Full schema and a concrete example live in the `okstra-team-contract` skill's "Result Frontmatter" subsection.

-0. **Reading Confirmation** - one short line per input file confirming end-to-end reading (e.g. `- Read task-brief.md end-to-end (147 lines).`). If any file was skipped, record a `tool-failure` and do NOT produce sections 1–5.
 1. **Findings** - what Gemini identified
 2. **Missing Information or Assumptions** - gaps in the analysis
 3. **Safe or Reasonable Areas** - parts that look correct
@@ -143,6 +142,8 @@ When returning results, start the file with a YAML frontmatter block, then organ

 Include file paths and line numbers when discussing code evidence.

+**Item IDs (mandatory).** Every row in sections 1–5 (and any optional section 6) MUST carry a worker-internal item ID unique within this file. Gemini may use `F-1`, `F-2`, ... or numbered hierarchical IDs — either is fine. What matters is that each item is addressable. The lead's §1.1 / §1.2 / §3.1 synthesis preserves these IDs as `gemini:<your-id>` entries in its `Source items (worker:item)` column. See `prompts/profiles/_common-contract.md` "Cross-worker traceability" SSOT.
+
 **Ticket tagging:** For runs whose task type is `requirements-discovery`, `error-analysis`, `implementation-planning`, or `implementation`, every item in sections 1–5 MUST carry a ticket identifier. Use the `Ticket ID` column in table-form items and the `[TICKETID: <id>]` prefix in bullet/numbered items. Fill priority: `Issue / Ticket` from the input → `Task ID` (no prefix, e.g. `8852`) → `unknown`. Multiple tickets are comma-separated. Full rules live in the `okstra-team-contract` skill's Ticket Tagging section.

 This contract mirrors the `okstra-team-contract` skill's Worker Output Contract — that skill is the authoritative source if the two ever diverge.
package/runtime/agents/workers/report-writer-worker.md
CHANGED
@@ -46,9 +46,9 @@ If you find yourself thinking "I'll just return the report inline and let lead s
 Before writing the final report, you MUST read every input file enumerated in the `[Required reading]` block of the lead's prompt from the very first character to the very last character. This always includes `final-report-template.md` and every analysis worker's result file under `worker-results/`, plus the convergence output under `state/convergence-<task-type>-<seq>.json` (if present).

 - Use a single `Read` call per file with no `offset` and no `limit`. If a file is too large for one read, page through it with explicit `offset` / `limit` calls covering the full file.
-- For the carry-in `clarification-response.md` (if present), walk every row of `## 5. Clarification Items` (`C-001`, `C-002`, ...) including rows whose `User input` cell is blank — a blank cell with `Status=open` is itself a signal you must surface in
+- For the carry-in `clarification-response.md` (if present), walk every row of `## 5. Clarification Items` (`C-001`, `C-002`, ...) including rows whose `User input` cell is blank — a blank cell with `Status=open` is itself a signal you must surface in the conditional `## 0. Clarification Response Carried In From Previous Run` section (the template's `RENDER_IF` guard activates it when the carry-in path is non-empty). The fact that the file you write has a structurally similar section 5 is NOT an excuse to skim. When no carry-in path was provided, OMIT the `## 0.` heading entirely — do NOT write an empty-state stub.
 - Open every analysis-worker result file under `worker-results/` end-to-end. Do not summarize them from convergence output alone — convergence captures classifications, not full evidence.
--
+- Write a Reading Confirmation block to your **audit sidecar** at `runs/<task-type>/worker-results/report-writer-worker-audit-<task-type>-<seq>.md` (sibling to the main worker-results file). The sidecar's body begins with `# Report Writer Worker Audit — <task-key>` followed by one short line per input file confirming end-to-end reading. The main final-report and the main worker-results file MUST NOT contain a `## 0. Reading Confirmation` heading — the validator now fails reports that contain one. If you cannot truthfully confirm a file end-to-end, record a `tool-failure` in the errors sidecar instead of fabricating the report.
 - When the convergence-state file is present, read it fully and reproduce the `roundHistory[]` array, `round2SkippedReason`, and `finalClassificationCounts` in the final report's Section 1 Round History sub-table. Do not derive these values from worker results alone — they live in `state/convergence-<task-type>-<seq>.json`.

 ## Authoring Contract
@@ -58,6 +58,11 @@ The final-report file MUST follow `instruction-set/final-report-template.md` if
 Hard rules:

 - The file's `Author:` header line is `Report writer worker` (your role) — NOT `Claude lead`.
+- **Source items (worker:item) preservation.** When synthesising `## 1.1 Consensus` / `## 1.2 Differences` / `## 3.1 Primary Evidence` rows from worker outputs, the `Source items` / `Supporting workers` / `Workers (position)` / `Source` column MUST list each contributing worker's item ID as `worker:item-id` (e.g. `claude:F-001, codex:1.1, gemini:F-3`). Bare worker-name lists (e.g. `claude, codex, gemini`) are deprecated — they break traceability back to the original worker-results files. See `prompts/profiles/_common-contract.md` "Cross-worker traceability" SSOT.
+- **Verdict Card (top)** is mandatory in every final-report. Its `Verdict Token` / `Direction` / `Next Step` cells MUST byte-match the corresponding cells in `## 2. Final Verdict` and the first item of `## 6. Recommended Next Steps`. The validator treats the card as a non-authoritative index — divergence is `contract-violated`.
+- **No deprecated sections.** Do NOT emit `4.5.8 User Approval Request` (the body stub is deleted; the top-of-report Approval block is the only one), `4.5.9 Open Questions`, `5.1 추가 자료 요청`, or `5.2 사용자 확인 질문`. The validator fails reports that contain any of these headings.
+- **Conditional Section 0.** Render `## 0. Clarification Response Carried In From Previous Run` ONLY when the carry-in path is non-empty. Never write an empty-state stub (`"No prior clarification response was provided."`). The validator fails empty Section 0.
+- **Reading Confirmation** lives in the audit sidecar (`runs/<task-type>/worker-results/report-writer-worker-audit-<task-type>-<seq>.md`), never in the final-report or main worker-results file.
 - Include all four convergence categories (Full Consensus, Partial Consensus, Contested, Worker-Unique). Do not omit Contested or Worker-Unique findings.
 - Include a Round History sub-table in Section 1 (one row per executed round) and a `round2SkippedReason` line below it. When convergence is disabled, omit both. The values are quoted verbatim from `state/convergence-<task-type>-<seq>.json` — do not recompute.
 - Treat `verification-error` votes as their own verdict. They are listed in vote summaries as `verification-error`, not folded into AGREE/DISAGREE counts.
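Two of the hard rules added in this hunk, the deprecated-heading ban and the Verdict Card byte-match, lend themselves to a short check. The sketch below is illustrative only and is not taken from `validate-run.py`: the function names are hypothetical, and it assumes the card and the authoritative sections have already been parsed into plain dictionaries keyed by cell name.

```python
# Heading texts quoted from the "No deprecated sections" rule.
DEPRECATED_HEADINGS = (
    "4.5.8 User Approval Request",
    "4.5.9 Open Questions",
    "5.1 추가 자료 요청",
    "5.2 사용자 확인 질문",
)
CARD_CELLS = ("Verdict Token", "Direction", "Next Step")


def deprecated_headings_in(report: str) -> list[str]:
    """List deprecated headings found in a report, in order of appearance."""
    found = []
    for line in report.splitlines():
        if not line.startswith("#"):
            continue
        title = line.lstrip("#").strip()
        for heading in DEPRECATED_HEADINGS:
            if title.startswith(heading) and heading not in found:
                found.append(heading)
    return found


def verdict_card_mismatches(card: dict, authoritative: dict) -> list[str]:
    """Return the card cells whose bytes differ from the authoritative value.

    Comparing encoded bytes means trailing spaces or look-alike characters
    count as divergence, which is what "byte-match" implies.
    """
    return [
        cell
        for cell in CARD_CELLS
        if card.get(cell, "").encode("utf-8")
        != authoritative.get(cell, "").encode("utf-8")
    ]
```

Any non-empty result from either function would correspond to a failing report under the rules quoted above.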
package/runtime/prompts/launch.template.md
CHANGED
@@ -85,4 +85,4 @@ Invoke the `okstra` skill now. Read the manifests below for all task metadata, p

 - Source path: `{{CLARIFICATION_RESPONSE_RELATIVE_PATH}}`
 - If the source path above is empty, no prior clarification response was attached to this run.
-- If the source path is set, a copy is staged at `{{INSTRUCTION_SET_RELATIVE_PATH}}/clarification-response.md`. Read it before running workers; reconcile each `C-*` row in section 5 (`## 5. Clarification Items`) of the prior report against new evidence and record the outcome in
+- If the source path is set, a copy is staged at `{{INSTRUCTION_SET_RELATIVE_PATH}}/clarification-response.md`. Read it before running workers; reconcile each `C-*` row in section 5 (`## 5. Clarification Items`) of the prior report against new evidence and record the outcome in the conditional `## 0. Clarification Response Carried In From Previous Run` section of this run's final report (render that heading only when carry-in is non-empty — the validator fails empty Section 0 stubs).
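The conditional rendering rule for Section 0 amounts to a guard on the carry-in path. The following sketch shows the idea; `section0_lines` is a hypothetical helper and does not reproduce how the template's `RENDER_IF` guard is implemented.

```python
SECTION0_HEADING = "## 0. Clarification Response Carried In From Previous Run"


def section0_lines(carry_in_path: str, reconciled_rows: list[str]) -> list[str]:
    """Return the Section 0 block, or nothing when there is no carry-in.

    An empty path yields an empty list rather than an empty-state stub,
    matching the rule that the heading is omitted entirely.
    """
    if not carry_in_path.strip():
        return []
    return [SECTION0_HEADING, "", *reconciled_rows]
```

Returning an empty list instead of a placeholder sentence keeps the "no carry-in" case from ever emitting the heading.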
package/runtime/prompts/profiles/_common-contract.md
CHANGED
@@ -43,7 +43,7 @@ profile document.
 - `complete` → proceed normally.
 - `partial` → proceed; treat still-unmarked `intent-check:` / `conversion-block:` rows as the `skipped` branch.
 - `skipped` → do NOT silently infer the missing answers. Promote each unmarked `intent-check:` / `conversion-block:` row into this run's `## 5. Clarification Items` as `Kind=decision`. Use `Blocks=approval` in `implementation-planning`, where the row gates the User Approval Request; otherwise use `Blocks=next-phase`. The recommended answer is drawn from the brief's matching content and clearly labelled `보고자 직접 확인 권장`.
-- `pending` (or field missing) → ABORT analysis; write
+- `pending` (or field missing) → ABORT analysis; render the Verdict Card with `Verdict Token = blocked` + `Direction = hold` and write a single `## Reporter Confirmation Required` block (no leading number) summarising which rows are pending. The `## 5. Clarification Items` table carries one row per pending item with `Blocks=approval` in `implementation-planning`, otherwise `Blocks=next-phase`. The operator must rerun `okstra-brief` Step 6.5. Do NOT emit `## 0.` for this case — Section 0 is reserved for clarification-response carry-in only.
 `[CONFIRMED <YYYY-MM-DD> → RC-N]` markers on `Open Questions` rows are the per-row signal that the reporter has answered; their answers live verbatim under `## Reporter Confirmations` in the brief.
 - `Source Material` is reporter-verbatim. Do NOT paraphrase, summarize, reorder, or restructure it. Quote it directly when needed.
 - `Augmentation` entries carry one of four labels — `evidence-link`, `format-conversion`, `terminology-mapping`, `intent-inference`. Treat them as follows:
@@ -62,10 +62,18 @@ profile document.
 - **Canonical column schema (SSOT — must match `templates/reports/final-report.template.md` §5.1 exactly):** every `## 5. Clarification Items` table has exactly these 8 columns, in this order:
 `| ID | Ticket ID | Kind | Statement | Expected form | Blocks | Status | User input |`.
 Profile-specific addenda may tighten cell content but MUST NOT add, remove, rename, or reorder columns. The `ID` cell uses `C-NNN` (3-digit zero-padded), the `Status` cell ∈ `{open, answered, resolved, obsolete}`, and the `Kind` / `Blocks` legal values are listed below.
-- section 5 is a **single unified table** per `final-report-template.md`. Every clarification item — whether the user must attach a file, choose between options, or supply a single number/path — is one row of that table. Do not split it into sub-sections, do not create a parallel table elsewhere in the report, and do not duplicate the same item into
+- section 5 is a **single unified table** per `final-report-template.md`. Every clarification item — whether the user must attach a file, choose between options, or supply a single number/path — is one row of that table. Do not split it into sub-sections (`5.1 추가 자료 요청` / `5.2 사용자 확인 질문` / `4.5.9 Open Questions` are removed and the validator fails reports that reintroduce them), do not create a parallel table elsewhere in the report, and do not duplicate the same item into the top-of-report `User Approval Request (사용자 승인 게이트)` block or any other section.
 - each row's `Kind` column picks one of `{material, decision, data-point}`: `material` for files / snapshots / logs / screenshots the user must attach (the `User input` cell will hold a path or URL); `decision` for choices and yes/no confirmations only the user can make; `data-point` for a single number, ID, date, or short string the user can answer inline. Items that mix "yes/no + file path if yes" are one row of `Kind=material` with the combined expectation written into `Expected form`.
 - each row's `Blocks` column picks one of `{approval, next-phase, none}`. `approval` is reserved for items that gate an approval action, especially the `implementation-planning` User Approval Request; outside `implementation-planning`, unresolved brief reporter-confirmation rows use `next-phase` instead. `next-phase` blocks the next run from starting cleanly. `none` is informational/audit-only.
 - write every entry in full, descriptive sentences that a non-developer can act on without further context. Avoid abbreviations and internal jargon. The `Statement` cell must state *what* is needed, *why* the answer / attachment changes the next step, and (for `material`) *where* the user can find it and *where* to place it. The `Expected form` cell must state the shape of the answer (예/아니오, 보기 중 하나, 숫자/날짜, 파일 경로, 짧은 서술 등); supply concrete option choices when applicable.
 - the same `final-report.md` file is the canonical artifact carried into the next run; the user appends answers inline before rerunning. The preferred turn-around is `scripts/okstra.sh --resume-clarification --task-key <project-id>:<task-group>:<task-id>` (opens the latest report in `$EDITOR`, then auto-reruns the same phase with `--clarification-response` carry-in). The lower-level form `--clarification-response <path>` remains available for scripted runs.
-- if a clarification response was carried in for this run, walk every `C-*` row of the prior report's `## 5. Clarification Items` table
-
+- if a clarification response was carried in for this run, render the conditional `## 0. Clarification Response Carried In From Previous Run` section (the template's `RENDER_IF` guard activates it), walk every `C-*` row of the prior report's `## 5. Clarification Items` table, reconcile each one against new evidence, and update its `Status` to `resolved` or `obsolete` before issuing the next decision/verdict. When no carry-in path was provided, omit the `## 0.` heading entirely — the validator fails reports that emit an empty Section 0 stub (e.g. "No prior clarification response was provided for this run.").
+- Verdict Card (shared — applies to every final-report regardless of profile):
+- The top-of-report `## Verdict Card` block is mandatory in every final-report. Its `Verdict Token`, `Direction`, and `Next Step` cells MUST byte-match the corresponding cells in `## 2. Final Verdict` and the first item of `## 6. Recommended Next Steps`. The validator treats the card as a non-authoritative index — when card values diverge from the authoritative sections, the run is `contract-violated`.
+- Cross-worker traceability (shared — applies to every analysis worker output and to the lead's `## 1.` / `## 3.` tables in the final-report):
+- **Worker-side item IDs (free-form but unique within the worker).** Every row item in sections 1–5 (and any optional section 6) of an analysis worker's output MUST carry an item ID that is unique within that one worker's result file. The ID convention is the worker's choice — `F-001` / `F-002` per the suggested schema, `1.1` / `1.2` / `1.3` as Codex tends to use, or any other shape — but it MUST appear as the leading column of the row (for table-form items) or as a `[<ID>]` prefix (for bullet/numbered items). Workers that emit findings without IDs make cross-worker reconciliation impossible.
+- **Lead-side ID assignment + source preservation.** When the lead (or `report-writer-worker`) synthesises `## 1.1 Consensus` / `## 1.2 Differences` / `## 3.1 Primary Evidence` rows from worker outputs, the lead assigns a fresh `C-NNN` / `D-NNN` / `E-NNN` row ID. The `Source items` column (or, where the template still calls it `Supporting workers` / `Workers (position)` / `Source`, that same column) MUST list every contributing worker:item pair (e.g. `claude:F-001, codex:1.1, gemini:F-3`) so a reviewer can trace the synthesised row back to each worker's original wording without re-reading every worker-results file. Bare worker names without item IDs (e.g. `claude, codex, gemini`) are deprecated for these tables; the validator does not yet fail on them but the readability pass treats it as a contract violation.
+- **Why this matters.** A real run had `claude=F-1..F-11`, `codex=1.1..1.8`, `gemini=F-3..F-9` — three incompatible ID schemes. When the lead synthesised `C-1..C-8`, the link from `C-3` back to "which sentence in which worker file" was lost. Source-item preservation restores that link without forcing every worker to adopt a single ID prefix, which would over-constrain worker output style.
+- Audit sidecar (shared — applies to every analysis-worker output and every final-report):
+- Reading Confirmation lines (one short line per input file confirming end-to-end reading) live in the **worker audit sidecar** at `runs/<task-type>/worker-results/<worker>-audit-<task-type>-<seq>.md`, NOT in the worker's main worker-results file. The worker-results body starts at section 1 (Findings). The validator fails worker-results files that contain a `## 0. Reading Confirmation` heading.
+- The audit sidecar carries any other meta the worker wants to log (tool-call counts, MCP query summaries, timing notes). The lead's final-report does NOT duplicate this content — it is consumed by the validator and by post-run audit tooling, not by end-user readers.
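The `worker:item` convention for the `Source items` column can be illustrated with a small parser. This is a sketch of how a readability pass might separate traceable entries from deprecated bare worker names; `parse_source_items` is a hypothetical name, and the contract itself notes the validator does not yet fail on bare names.

```python
def parse_source_items(cell: str) -> tuple[list[tuple[str, str]], list[str]]:
    """Split a `Source items` cell into (worker, item) pairs and bare entries.

    Entries such as "claude:F-001" become pairs; entries without an item ID,
    such as "claude", are returned separately so they can be flagged.
    """
    pairs: list[tuple[str, str]] = []
    bare: list[str] = []
    for entry in (part.strip() for part in cell.split(",")):
        if not entry:
            continue
        # Split on the first colon only; worker names are assumed colon-free.
        worker, sep, item = entry.partition(":")
        if sep and worker.strip() and item.strip():
            pairs.append((worker.strip(), item.strip()))
        else:
            bare.append(entry)
    return pairs, bare
```

Keeping the pairs as plain tuples lets a reviewer look up each item in the matching worker-results file without imposing one ID scheme on every worker.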
package/runtime/python/okstra_token_usage/cli.py
CHANGED
@@ -6,7 +6,7 @@ import json
 import sys
 from pathlib import Path
 from .collect import collect
-from .report import substitute_final_report
+from .report import SubstituteRefusedError, substitute_final_report


 def main() -> int:
@@ -60,7 +60,14 @@ def main() -> int:
|
|
|
60
60
|
print(f"sessions={s.get('sessionsFound', 0)} team={s.get('teamName', '')}", file=sys.stderr)
|
|
61
61
|
|
|
62
62
|
if args.substitute_final_report is not None:
|
|
63
|
-
|
|
63
|
+
try:
|
|
64
|
+
replaced = substitute_final_report(args.substitute_final_report, updated)
|
|
65
|
+
except SubstituteRefusedError as exc:
|
|
66
|
+
print(
|
|
67
|
+
f"final-report substitution REFUSED: {exc}",
|
|
68
|
+
file=sys.stderr,
|
|
69
|
+
)
|
|
70
|
+
return 2
|
|
64
71
|
if replaced < 0:
|
|
65
72
|
print(
|
|
66
73
|
f"final-report substitution skipped: file not found at {args.substitute_final_report}",
|
|
@@ -18,19 +18,48 @@ def _format_usd(v) -> str:
|
|
|
18
18
|
return "$0.00"
|
|
19
19
|
|
|
20
20
|
|
|
21
|
+
class SubstituteRefusedError(RuntimeError):
|
|
22
|
+
"""Raised when substitution would write a zero-only Token Usage Summary.
|
|
23
|
+
|
|
24
|
+
Shipping `0` / `$0.00` in the Lead / Worker / Grand rows is the
|
|
25
|
+
observed silent-failure mode where the collector ran but every
|
|
26
|
+
session jsonl was empty (or the writer fabricated zeros). The
|
|
27
|
+
validator catches it post-hoc, but raising here at the substitution
|
|
28
|
+
boundary surfaces the failure at the exact step where it can still
|
|
29
|
+
be retried with a re-collection.
|
|
30
|
+
"""
|
|
31
|
+
|
|
32
|
+
|
|
21
33
|
def substitute_final_report(report_path: Path, state: dict) -> int:
|
|
22
34
|
"""Replace token-usage placeholders in the final report file with concrete
|
|
23
35
|
values from the freshly computed usageSummary.
|
|
24
36
|
|
|
25
37
|
Returns the number of placeholder occurrences replaced. If the report file
|
|
26
|
-
does not exist, returns -1 without raising.
|
|
27
|
-
|
|
28
|
-
|
|
38
|
+
does not exist, returns -1 without raising.
|
|
39
|
+
|
|
40
|
+
Raises ``SubstituteRefusedError`` when ``usageSummary.grandTotalTokens``
|
|
41
|
+
is zero — substituting zeros into the report bakes in the most common
|
|
42
|
+
silent failure mode (collector ran but found nothing). Callers that
|
|
43
|
+
want to suppress the refusal (e.g. unit-test fixtures) can pass a
|
|
44
|
+
summary with ``grandTotalTokens`` > 0 or remove the summary entirely
|
|
45
|
+
so the zero check is bypassed.
|
|
29
46
|
"""
|
|
30
47
|
if not report_path.is_file():
|
|
31
48
|
return -1
|
|
32
49
|
|
|
33
50
|
summary = state.get("usageSummary") or {}
|
|
51
|
+
grand_total = summary.get("grandTotalTokens", 0)
|
|
52
|
+
if isinstance(grand_total, (int, float)) and grand_total == 0 and summary:
|
|
53
|
+
raise SubstituteRefusedError(
|
|
54
|
+
"Refusing to substitute zero-only usageSummary into the final "
|
|
55
|
+
f"report at {report_path}. grandTotalTokens=0 means the "
|
|
56
|
+
"collector ran but every session jsonl was empty (or absent). "
|
|
57
|
+
"Re-run `python3 scripts/okstra-token-usage.py <team-state> "
|
|
58
|
+
"--write --summary --substitute-final-report <report-path>` "
|
|
59
|
+
"after locating the missing session jsonls. To intentionally "
|
|
60
|
+
"ship zeros (test fixtures only), omit `usageSummary` from the "
|
|
61
|
+
"team-state JSON before calling substitute_final_report."
|
|
62
|
+
)
|
|
34
63
|
cost = summary.get("estimatedCostUsd") or {}
|
|
35
64
|
lead_cost = cost.get("lead") or 0
|
|
36
65
|
worker_cost = cost.get("claudeWorkers") or 0
|
|
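The guard added in this hunk has a few edge cases that are easy to misread, so here is a small self-contained sketch of its condition. It copies the `isinstance(...) and grand_total == 0 and summary` test from the diff; the helper names `check_usage_summary` and `exit_code_for` are hypothetical, and returning `0` on the non-refused path is an assumption (the diff only shows that the CLI returns `2` on refusal).

```python
class SubstituteRefusedError(RuntimeError):
    """Stand-in for the exception class added in report.py."""


def check_usage_summary(state: dict) -> dict:
    """Apply the same zero-only condition as the diff, without touching files."""
    summary = state.get("usageSummary") or {}
    grand_total = summary.get("grandTotalTokens", 0)
    # Refuse only when a summary exists AND its grand total is numerically zero.
    if isinstance(grand_total, (int, float)) and grand_total == 0 and summary:
        raise SubstituteRefusedError("zero-only usageSummary")
    return summary


def exit_code_for(state: dict) -> int:
    """Mirror the CLI mapping: refusal becomes exit code 2 (0 is assumed otherwise)."""
    try:
        check_usage_summary(state)
    except SubstituteRefusedError:
        return 2
    return 0
```

Under this condition, a summary that exists but lacks `grandTotalTokens` is refused as well (the default is `0`), whereas a state with no `usageSummary` at all passes the guard, which is the escape hatch the error message describes for test fixtures.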
@@ -73,7 +73,7 @@ Read the worker result files generated in Phase 4/5 and extract individual findi
|
|
|
73
73
|
- For bullet/numbered findings, parse `[TICKETID: <id>]` from the item title.
|
|
74
74
|
- Items with multiple tickets (e.g. `TICKET-123, TICKET-456`) expand to a set of ticket keys.
|
|
75
75
|
- Items tagged `unknown` keep the literal `unknown` as their ticket key.
|
|
76
|
-
2. For each finding, record the summary, evidence (file path, line number, basis), the worker who identified it, and the parsed ticket set.
|
|
76
|
+
2. For each finding, record the summary, evidence (file path, line number, basis), the worker who identified it, **the worker-internal item ID assigned by that worker** (e.g. `F-001`, `1.1`, `F-3` — see `prompts/profiles/_common-contract.md` "Cross-worker traceability" SSOT), and the parsed ticket set. The item ID is persisted on the finding record as `findings[].discoveredBy.<worker>.itemId` and on each cross-worker confirmation as `findings[].sourceItems[]` (one entry per contributing `<worker>:<item-id>` pair). The final-report's `## 1.1 Consensus` / `## 1.2 Differences` / `## 3.1 Primary Evidence` tables read this list verbatim into their `Source items` columns — without this, the synthesised `C-NNN` row has no traceable link back to the original worker wording.
|
|
77
77
|
3. Claude Lead groups findings based on semantic similarity AND ticket-set equality:
|
|
78
78
|
- Same semantics + same ticket set across 2+ workers → immediately reach `full consensus`.
|
|
79
79
|
- Same semantics but disjoint ticket sets → keep as separate groups (do NOT over-merge across tickets).
|
|
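As a rough illustration of how the persisted item IDs in step 2 could feed the `Source items` column, consider the sketch below. The field names `discoveredBy.<worker>.itemId` and `sourceItems` come from the text; storing each `sourceItems` entry as a plain `"<worker>:<item-id>"` string, the helper names, and the three-column row layout are assumptions for illustration only.

```python
from __future__ import annotations


def source_items_from(discovered_by: dict[str, dict]) -> list[str]:
    """Build "<worker>:<item-id>" entries from a finding's discoveredBy map."""
    return [f"{worker}:{meta['itemId']}" for worker, meta in discovered_by.items()]


def render_row(row_id: str, summary: str, source_items: list[str]) -> str:
    """Render one synthesised table row with the list copied verbatim."""
    return f"| {row_id} | {summary} | {', '.join(source_items)} |"


# Hypothetical finding record; the summary text is made up.
finding = {
    "summary": "Cache key omits tenant ID",
    "discoveredBy": {
        "claude": {"itemId": "F-001"},
        "codex": {"itemId": "1.1"},
        "gemini": {"itemId": "F-3"},
    },
}
finding["sourceItems"] = source_items_from(finding["discoveredBy"])
row = render_row("C-003", finding["summary"], finding["sourceItems"])
```

Because the list is joined without rewriting the worker-internal IDs, each worker keeps its own numbering scheme and the `C-NNN` row still points back at the original items.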
@@ -521,7 +521,7 @@ Plan-body verification only supports **lightweight mode** (defined in §"Verific
|
|
|
521
521
|
- all dispatches non-result → `aborted-non-result`
|
|
522
522
|
- any `partial-consensus` / `dissent-isolated` present, no `majority-disagree` → `passed-with-dissent`
|
|
523
523
|
- all items `full-consensus` → `passed`
|
|
524
|
-
6. Lead writes `runs/<task-type>/state/plan-body-verification-<task-type>-<seq>.json` (schema below) and populates `### 4.5.9 Plan Body Verification` in the final report (template at `templates/reports/final-report.template.md`).
|
|
524
|
+
6. Lead writes `runs/<task-type>/state/plan-body-verification-<task-type>-<seq>.json` (schema below) and populates `### 4.5.9 Plan Body Verification` in the final report (template at `templates/reports/final-report.template.md`). The §4.5.9 body in the template is split into two tables: a narrow `#### Verdict summary` (`Plan item / Ticket ID / Section / Classification` only; one row per plan item) and a tall `#### Verdict details` (`Plan item / Worker / Verdict / Breakage kind / Note`; one row per plan-item × worker pair). The older wide `| Plan item | <worker1> | <worker2> | … | Classification |` matrix is removed because it scaled horizontally with the worker count and lost readability past 3 workers. Lead MUST emit both tables (the validator's `Plan Body Verification` + `Gate result:` substring checks still pass with either layout, but reviewers depend on the split form).
|
|
525
525
|
7. For every `majority-disagree` item, lead adds a row to `## 5. Clarification Items` with:
|
|
526
526
|
- new `C-<N>` ID (numbering continues from any existing rows)
|
|
527
527
|
- `Statement` summarising the disagreement and the worker breakage `<kind>`
|
|
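The split layout for §4.5.9 can be sketched as two renders over the same data. The column headers below are taken from the step-6 text; the input field names (`planItem`, `ticketId`, `verdicts`, and so on) and the sample values are hypothetical, since the JSON schema is not visible in this hunk.

```python
SUMMARY_HEADERS = ["Plan item", "Ticket ID", "Section", "Classification"]
DETAIL_HEADERS = ["Plan item", "Worker", "Verdict", "Breakage kind", "Note"]


def render_table(headers, rows):
    """Render a pipe-delimited Markdown table from a header list and row lists."""
    lines = [
        "| " + " | ".join(headers) + " |",
        "| " + " | ".join("---" for _ in headers) + " |",
    ]
    lines.extend("| " + " | ".join(row) + " |" for row in rows)
    return "\n".join(lines)


def render_plan_body_verification(items):
    """Emit the narrow summary table, then the tall per-worker details table."""
    summary_rows = [
        [i["planItem"], i["ticketId"], i["section"], i["classification"]]
        for i in items
    ]
    detail_rows = [
        [i["planItem"], v["worker"], v["verdict"], v["breakageKind"], v["note"]]
        for i in items
        for v in i["verdicts"]
    ]
    return "\n\n".join([
        "#### Verdict summary",
        render_table(SUMMARY_HEADERS, summary_rows),
        "#### Verdict details",
        render_table(DETAIL_HEADERS, detail_rows),
    ])


# Placeholder data: verdict and breakage-kind values are invented for the example.
ITEMS = [{
    "planItem": "P-1", "ticketId": "TICKET-123", "section": "4.5.2",
    "classification": "full-consensus",
    "verdicts": [
        {"worker": "claude", "verdict": "agree", "breakageKind": "-", "note": "-"},
        {"worker": "codex", "verdict": "agree", "breakageKind": "-", "note": "-"},
    ],
}]
rendered = render_plan_body_verification(ITEMS)
```

The details table grows by one row per worker instead of one column per worker, so adding a fourth or fifth worker lengthens the report without widening it.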
@@ -185,8 +185,8 @@ When the run's `task-type` is `implementation-planning`, the final report MUST c
|
|
|
185
185
|
| 5 | `Dependency` | `### Dependency / Migration Risk (의존성·마이그레이션 위험)` |
|
|
186
186
|
| 6 | `Validation Checklist` | `### Validation Checklist (검증 체크리스트)` |
|
|
187
187
|
| 7 | `Rollback` | `### Rollback Strategy (롤백 전략)` |
|
|
188
|
-
| 8 | `User Approval Request` |
|
|
189
|
-
| 9 | `Plan Body Verification` + `Gate result:` | `### Plan Body Verification (계획 본문 검증)` containing a `Gate result:` line — copy `okstra-final-report.template.md §4.5.9` verbatim. Validator
|
|
188
|
+
| 8 | `User Approval Request` | Satisfied by the top-of-report `## User Approval Request (사용자 승인 게이트)` block. Do NOT recreate a `### 4.5.8 User Approval Request` body stub — the validator now fails reports that contain one. |
|
|
189
|
+
| 9 | `Plan Body Verification` + `Gate result:` | `### Plan Body Verification (계획 본문 검증)` containing a `Gate result:` line — copy `okstra-final-report.template.md §4.5.9` verbatim. Validator checks both substrings. |
|
|
190
190
|
|
|
191
191
|
The Korean translation in parentheses is optional but the English keyword is mandatory. The body of each section is written in Korean per the writing rules below. For non-`implementation-planning` runs, omit this entire block — these headings are NOT validator-checked for other task-types.
|
|
192
192
|
|
|
@@ -232,12 +232,14 @@ Skipping this file because "the real report is in `reports/`" is wrong. Both fil
|
|
|
232
232
|
|
|
233
233
|
Section numbering follows `templates/reports/final-report.template.md` exactly — that file is the single source of truth. Below is a one-line summary of each section's writer obligation; consult the template for full body structure.
|
|
234
234
|
|
|
235
|
-
|
|
235
|
+
**Verdict Card (top-of-report, mandatory).** Render `## Verdict Card` between the report header and the (conditional) Approval block. Its `Verdict Token` / `Direction` / `Next Step` cells MUST byte-match the corresponding cells in `## 2. Final Verdict` and the first item of `## 6.`. Divergence is `contract-violated`.
|
|
236
|
+
|
|
237
|
+
0. **Clarification Response Carried In** — render this `## 0.` heading ONLY when `{{CLARIFICATION_RESPONSE_RELATIVE_PATH}}` is non-empty. Walk every `C-*` row of the prior report's `## 5. Clarification Items` table, reconcile against new evidence, and record the outcome (`resolved` / `obsolete`) with citation before drafting the verdict. When no carry-in path was provided, OMIT the `## 0.` heading entirely — the validator fails an empty Section 0 stub.
|
|
236
238
|
1. **Cross Verification Results** — 4 categories (Full / Partial / Contested / Worker-Unique) when convergence is enabled, per `okstra-convergence`. Prepend the Round History sub-table (columns: `Round | inputQueueSize | resolvedCount | carriedForwardCount | dispatches | skippedWorkers`) plus a `round2SkippedReason: <value>` note, pulled verbatim from `convergence-<task-type>-<seq>.json`. Empty contested list renders as `- 합의 미달 항목 없음.`. Convergence-disabled runs use the legacy Consensus/Differences format and omit the round table.
|
|
237
239
|
2. **Final Verdict** — `Direction` ∈ `continue-investigation` / `begin-implementation` / `approve` / `reject` / `hold`. **Verdict Token** is `not-applicable` for every task-type except `final-verification` — see "Final-verification verdict token contract" below for that case.
|
|
238
240
|
3. **Evidence and Detailed Analysis** — primary evidence rows (file path, line, snippet); secondary evidence / alternate interpretations. If `reference-expectations.md` lists explicit expected values, record match/gap per row.
|
|
239
241
|
4. **Missing Information and Risks** — uncertain / "I don't know" items. `implementation-planning` adds §4.5 (see heading contract below); `release-handoff` adds §4.6.
|
|
240
|
-
5. **Clarification Items** — single unified `C-*` table; column schema, ID convention, and rerun behaviour are owned by `_common-contract.md §Clarification request policy` (8-column SSOT).
|
|
242
|
+
5. **Clarification Items** — single unified `C-*` table; column schema, ID convention, and rerun behaviour are owned by `_common-contract.md §Clarification request policy` (8-column SSOT). The deprecated `4.5.9 Open Questions` / `5.1 추가 자료 요청` / `5.2 사용자 확인 질문` sub-sections are removed; the validator fails reports that reintroduce them.
|
|
241
243
|
6. **Recommended Next Steps** — prioritized actions. After Phase 7's follow-up spawner runs, append a row per newly created task-key (see "Phase 6 → Phase 7 execution sequence" above).
|
|
242
244
|
7. **Follow-up Tasks** — auto-spawn-eligible table. Each row drives `okstra-spawn-followups.py`; see template §7 for the row schema.
|
|
243
245
|
|
|
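The byte-match requirement on the Verdict Card can be illustrated with a small comparison helper. This is a sketch under assumptions: the function name and the dict-shaped inputs are hypothetical, and it reads the rule as "`Verdict Token` and `Direction` come from `## 2. Final Verdict`, `Next Step` comes from the first item of `## 6.`", which is one plausible interpretation of the sentence above.

```python
from __future__ import annotations


def verdict_card_mismatches(
    card: dict[str, str],
    final_verdict: dict[str, str],
    first_next_step: str,
) -> list[str]:
    """Return the Verdict Card cell names whose bytes differ from their source cell."""
    expected = {
        "Verdict Token": final_verdict["Verdict Token"],
        "Direction": final_verdict["Direction"],
        "Next Step": first_next_step,
    }
    return [
        name
        for name, value in expected.items()
        # Compare encoded bytes so whitespace or case differences count as divergence.
        if card.get(name, "").encode("utf-8") != value.encode("utf-8")
    ]
```

For example, a card whose `Next Step` cell carries one trailing space would be reported as `["Next Step"]`, which under the rule above is enough to count as `contract-violated`.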
@@ -132,19 +132,23 @@ Reading rules:
|
|
|
132
132
|
large for one read; if you must page, you MUST cover the entire file
|
|
133
133
|
before moving on, and you MUST state the page boundaries you used in your
|
|
134
134
|
Findings section.
|
|
135
|
-
- For the carry-in clarification response, read
|
|
136
|
-
|
|
135
|
+
- For the carry-in clarification response, read the conditional
|
|
136
|
+
`## 0. Clarification Response Carried In From Previous Run` section
|
|
137
|
+
(rendered only when carry-in is non-empty) and every row of
|
|
138
|
+
`## 5. Clarification Items` (`C-001`, `C-002`, ...) in full,
|
|
137
139
|
including rows whose `User input` cell is blank. The fact that you
|
|
138
140
|
will write your output into a file with a structurally similar
|
|
139
141
|
section 5 is NOT an excuse to skim — the prior `C-*` rows carry
|
|
140
|
-
context you cannot reconstruct from the new run alone.
|
|
141
|
-
|
|
142
|
-
|
|
143
|
-
|
|
144
|
-
|
|
145
|
-
|
|
146
|
-
|
|
147
|
-
|
|
142
|
+
context you cannot reconstruct from the new run alone.
|
|
143
|
+
- Write the Reading Confirmation block to your **audit sidecar** at
|
|
144
|
+
`runs/<task-type>/worker-results/<worker>-audit-<task-type>-<seq>.md`
|
|
145
|
+
(sibling to the main worker-results file). One short line per input
|
|
146
|
+
file confirming end-to-end reading, e.g. "Read task-brief.md
|
|
147
|
+
end-to-end (147 lines)." Do NOT include a `## 0. Reading Confirmation`
|
|
148
|
+
heading in the main worker-results file — the validator now fails
|
|
149
|
+
worker-results that contain one. If you cannot truthfully confirm a
|
|
150
|
+
file end-to-end, record a `tool-failure` in the errors sidecar
|
|
151
|
+
instead of fabricating Findings.
|
|
148
152
|
- Do not collapse multiple input files into a single mental summary before
|
|
149
153
|
reading them all individually. Each file has its own canonical role
|
|
150
154
|
(brief = the user's request, profile = the lead's rules for this phase,
|