okstra 0.2.0 → 0.4.0

@@ -0,0 +1,402 @@
1
+ ---
2
+ name: okstra-team-contract
3
+ description: Use when okstra is in Phases 2–5 and needs team operating rules for the Claude lead + worker structure, or when verifying worker roster and model assignments.
4
+ user-invocable: false
5
+ ---
6
+
7
+ # OKSTRA Team Contract
8
+ ## When to Use
9
+
10
+ - During okstra Skill Phases 2–5 (prompt preparation and execution)
11
+ - When verifying worker team composition and operational rules
12
+ - When applying model assignment rules
13
+
14
+ ## Team Structure
15
+
16
+ Every okstra task runs with a `Claude lead` plus the required worker team.
17
+
18
+ ### Role Definitions
19
+
20
+ | Role | Responsibilities | Default Model | subagent_type | Notes |
21
+ |------|------|-----------|---------------|------|
22
+ | Claude lead | orchestration + convergence supervision + final-report review/approval | opus | -- | Does NOT author the final-report file when `Report writer worker` is in the roster |
23
+ | Claude worker | Inference quality, hidden assumptions, execution risks | sonnet | claude-worker | `agents/claude-worker.md` |
24
+ | Codex worker | Implementation feasibility, code paths, edge cases | gpt-5.5 | codex-worker | `agents/codex-worker.md` |
25
+ | Gemini worker | Requirement interpretation, consistency, safety, alternatives | auto | gemini-worker | `agents/gemini-worker.md` |
26
+ | Report writer worker | **Authors** the final-report file in Phase 6. NOT an analysis worker. | opus | report-writer-worker | `agents/report-writer-worker.md`. Excluded from Phase 4/5 and convergence |
27
+
28
+ ### Model Assignment Rules
29
+
30
+ 1. If `resultContract.requiredWorkerRoles` in `task-manifest.json` and the lead model metadata carry an explicit assignment, that assignment is canonical.
31
+ 2. If there is no explicit assignment, use the default model above.
32
+ 3. If `modelExecutionValue` differs from `model`, use `modelExecutionValue` during execution.
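A minimal sketch of how a lead could apply these three rules in order. It assumes each `requiredWorkerRoles` entry is a dict that may carry `model` and `modelExecutionValue` keys; the entry shape and the helper name `resolve_execution_model` are hypothetical:

```python
from typing import Optional

# Default models from the Role Definitions table (rule 2 fallback).
DEFAULT_MODELS = {
    "Claude lead": "opus",
    "Claude worker": "sonnet",
    "Codex worker": "gpt-5.5",
    "Gemini worker": "auto",
    "Report writer worker": "opus",
}


def resolve_execution_model(role: str, assignment: Optional[dict] = None) -> str:
    """Return the model string to use when actually running `role`.

    `assignment` is a hypothetical requiredWorkerRoles entry; None (or an
    empty dict) means the manifest has no explicit assignment for this role.
    """
    if assignment and assignment.get("model"):
        # Rule 1: the explicit assignment is canonical.
        # Rule 3: when modelExecutionValue is present (and so may differ
        # from model), it is the value used during execution.
        return assignment.get("modelExecutionValue") or assignment["model"]
    # Rule 2: no explicit assignment, so fall back to the table default.
    return DEFAULT_MODELS[role]
```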
33
+
34
+ ### Dynamic Worker Role Determination
35
+
36
+ Only workers selected from `recommendedWorkers` in `task-manifest.json` and `resultContract.requiredWorkerRoles` become required roles.
37
+
38
+ - If one worker is selected: "`<role>` is the required worker role for this run."
39
+ - If two or more workers are selected: "`<role1>`, `<role2>`, and `<role3>` are required worker roles."
40
+ - If Gemini is selected: "`Gemini worker` must be attempted in this workflow."
41
+ - If Gemini is not selected: "`Gemini worker` is not selected for this run, so it does not need to be attempted."
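One way to render these sentences from the selected roster, sketched in Python. The function name is hypothetical, and the two-role phrasing (`A and B`) is an assumption, since the templates above only show the one-role and three-role forms:

```python
def required_roles_sentences(selected: list[str]) -> list[str]:
    """Render the roster sentences for a run from the selected role names."""
    if not selected:
        raise ValueError("at least one worker role must be selected")
    quoted = [f"`{role}`" for role in selected]
    if len(quoted) == 1:
        roster = f"{quoted[0]} is the required worker role for this run."
    elif len(quoted) == 2:
        # Assumed phrasing; the templates only show the 1- and 3-role forms.
        roster = f"{quoted[0]} and {quoted[1]} are required worker roles."
    else:
        roster = f"{', '.join(quoted[:-1])}, and {quoted[-1]} are required worker roles."
    if "Gemini worker" in selected:
        gemini = "`Gemini worker` must be attempted in this workflow."
    else:
        gemini = ("`Gemini worker` is not selected for this run, "
                  "so it does not need to be attempted.")
    return [roster, gemini]
```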
42
+
43
+ ## Operating Rules
44
+
45
+ 1. `Claude lead` is responsible for orchestration, convergence supervision, and final-report review/approval. It never overrides worker analysis results, and it never authors the final-report file when `Report writer worker` is in the roster.
46
+ 2. `Report writer worker` is NOT an analysis worker. It is excluded from Phase 4/5 (initial analysis) and Phase 5.5 (convergence re-verification). It is spawned only in Phase 6 and is the **author** of the final-report file at `runs/<task-type>/reports/final-report-<task-type>-<seq>.md`.
47
+ 3. When `Report writer worker` is in the roster, Lead MUST dispatch it in Phase 6. The only legal lead-authored fallback is when a dispatch was attempted and recorded a terminal status of `error` / `timeout` / `not-run` with a concrete logged reason. Speculative reasons such as "session resume constraint" or "team is no longer alive" are NOT valid — Lead can always dispatch a fresh subagent (omit `team_name` if the team is gone).
48
+ 4. Each role keeps the model assigned to it in `resultContract.requiredWorkerRoles` (in `task-manifest.json`) and the lead model metadata.
49
+ 5. Required roles must not be replaced by unnamed generic parallel workers.
50
+ 6. Before dispatching any required worker, persist the exact worker prompt to the assigned current-run prompt history path under `runs/<task-type>/prompts/`.
51
+ 7. Before the final decision, collect results or explicit terminal statuses for each required worker role.
52
+ 8. If a worker was attempted and ended with status `completed`, `timeout`, or `error`, its prompt history file must actually exist.
53
+ 9. If a worker result is `completed`, the corresponding worker result file must actually exist.
54
+ 10. Treat `team-state` as the canonical source; if it differs from the report's role label, follow `team-state`.
55
+ 11. A validator success status is required before the final completion determination.
56
+
57
+ ## Worker Prompt Composition
58
+
59
+ Every worker prompt MUST start with the following anchor headers, in this exact order, before any other content:
60
+
61
+ 1. `**Project Root:** <absolute-path>` — absolute target project root (from `{{PROJECT_ROOT}}` in the lead's prompt). Required so the worker can self-anchor without relying on inherited cwd.
62
+ 2. `**Prompt History Path:** <project-relative-path>`
63
+ 3. `**Result Path:** <project-relative-path>`
64
+ 4. `Assigned worker prompt history path: <absolute-path>` — same as the prompt-history path but resolved against `Project Root`. Codex/Gemini wrapper subagents extract this exact line.
65
+
66
+ The body must include: role name, task type, task key, required bundle paths, assigned model, output contract, evidence handling rules, and any relevant config/deployment expectations from `reference-expectations.md`.
67
+
68
+ When a worker reads any project-relative path from the prompt, it MUST resolve it against `Project Root` (e.g. `<Project Root>/<Result Path>`) — never use bare relative paths that depend on cwd.
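A sketch of the anchor-header composition and the worker-side resolution rule. It assumes POSIX-style paths; `compose_prompt_header` and `resolve_from_root` are illustrative names, not existing okstra helpers:

```python
from pathlib import PurePosixPath


def resolve_from_root(project_root: str, project_relative: str) -> str:
    """Resolve a project-relative path against Project Root, never against cwd."""
    root = PurePosixPath(project_root)
    rel = PurePosixPath(project_relative)
    if not root.is_absolute():
        raise ValueError(f"Project Root must be absolute: {project_root!r}")
    if rel.is_absolute():
        raise ValueError(f"expected a project-relative path: {project_relative!r}")
    return str(root / rel)


def compose_prompt_header(project_root: str, prompt_history_path: str,
                          result_path: str) -> str:
    """Return the four anchor lines every worker prompt must start with, in order."""
    return "\n".join([
        f"**Project Root:** {project_root}",
        f"**Prompt History Path:** {prompt_history_path}",
        f"**Result Path:** {result_path}",
        # Codex/Gemini wrappers extract this exact line, so keep its wording.
        "Assigned worker prompt history path: "
        + resolve_from_root(project_root, prompt_history_path),
    ])
```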
69
+
70
+ If the task brief contains an `## Available MCP Servers` section, copy that section verbatim into every analysis worker's prompt (and into the report-writer prompt when it is dispatched in Phase 6). Codex/Gemini workers run external CLIs whose MCP availability is governed by their own CLI configs; still forward the section so they can record `MCP not available in this CLI` cleanly when their CLI lacks the matching server.
71
+
72
+ Before dispatching any required worker, lead persists the exact worker prompt to the assigned current-run prompt history path under `runs/<task-type>/prompts/`. Do not use `/tmp/*prompt*.txt` as the canonical artifact path.
73
+
74
+ Do not send identical, undifferentiated prompts to every worker unless that is clearly the best option. The role-specific emphasis in Phase 2 of the `okstra` skill is the canonical guidance.
75
+
76
+ ### Required reading clause (analysis workers + report-writer worker)
77
+
78
+ Inject the following clause verbatim into every analysis worker's prompt and into the report-writer worker's prompt. Workers tend to skim long input documents — especially when the carry-in clarification response is structurally similar to the file they will write into next. This clause is the single biggest lever against that failure mode; do not paraphrase it, do not move it lower in the prompt, and do not omit any input file path.
79
+
80
+ Replace placeholder file paths in the `[Required reading]` block with the actual project-relative paths derived from the run's `instruction-set/` and (if applicable) the carry-in clarification response.
81
+
82
+ **Audience-scoped enumeration (BLOCKING — performance optimization):**
83
+
84
+ Different recipients need different files. Do NOT include `final-report-template.md` in analysis worker prompts: analysis workers produce findings (not the final report), and forcing them to read the template inflates token usage without changing finding quality.
85
+
86
+ | Recipient | Files included in `[Required reading]` |
87
+ |---|---|
88
+ | Claude / Codex / Gemini analysis workers | task-brief, analysis-profile, analysis-material (if present), reference-expectations, clarification-response (if carry-in) |
89
+ | Report writer worker (Phase 6) | all of the above **plus** `final-report-template.md` |
90
+ | Reverify dispatches (Phase 5.5, lightweight mode) | **do NOT inject the `[Required reading]` clause at all** — see [okstra-convergence](../okstra-convergence/SKILL.md) "Reverify prompt: required-reading suppression". |
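The table can be read as a small selection function. This sketch assumes three recipient labels (`analysis`, `report-writer`, `reverify`) and a caller that already knows whether `analysis-material.md` and a carry-in exist; all names are hypothetical. The order follows the numbered list in the `[Required reading]` block:

```python
from typing import Optional


def required_reading_files(recipient: str, instruction_set: str,
                           has_material: bool, has_carry_in: bool) -> Optional[list[str]]:
    """Return the [Required reading] file list, or None if no clause is injected."""
    if recipient == "reverify":
        return None  # Phase 5.5 lightweight mode: suppress the clause entirely
    names = ["task-brief.md", "analysis-profile.md"]
    if has_material:
        names.append("analysis-material.md")
    names.append("reference-expectations.md")
    if has_carry_in:
        names.append("clarification-response.md")
    if recipient == "report-writer":
        names.append("final-report-template.md")  # report writer only
    elif recipient != "analysis":
        raise ValueError(f"unknown recipient: {recipient!r}")
    return [f"{instruction_set}/{name}" for name in names]
```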
91
+
92
+ **Asymmetry between claude-worker and codex/gemini-worker prompts (NOT a bug):**
93
+
94
+ The dispatch prompt the lead constructs for `claude-worker` is intentionally shorter than for `codex-worker` / `gemini-worker`. Do NOT "fix" this by re-injecting `[Required reading]` / `[Error reporting]` / `[Output Contract]` blocks into the claude-worker prompt:
95
+
96
+ - `claude-worker` is an in-process Claude subagent. The Agent SDK auto-loads `agents/claude-worker.md`, which already contains `## Required Reading Before Any Analysis`, `## Worker Output Structure`, and `## Error reporting`. Re-injecting them is redundant and wastes tokens.
97
+ - `codex-worker` / `gemini-worker` shell out to a CLI. The CLI never sees the agent definition file — it only sees the prompt body passed via stdin. Therefore the lead MUST inject those three blocks into the dispatch prompt for those two workers.
98
+
99
+ Worker definition file size (claude-worker ~106 lines vs codex/gemini ~175–179 lines) is NOT evidence of incompleteness — it reflects the in-process vs CLI-wrapper distinction.
100
+
101
+ ```
102
+ [Required reading]
103
+ You are required to read every input file listed below from the very first
104
+ character to the very last character before you produce any analysis output.
105
+ Skimming, partial reads, jumping to a single section, or relying on prior
106
+ knowledge of a similar file's structure is not acceptable. Each file may
107
+ contain decisive context that is not surfaced in its summary or first page.
108
+
109
+ Required files for this run (read in this order, end-to-end, no exceptions):
110
+
111
+ 1. <Project Root>/<instruction-set>/task-brief.md
112
+ 2. <Project Root>/<instruction-set>/analysis-profile.md
113
+ 3. <Project Root>/<instruction-set>/analysis-material.md (only if present)
114
+ 4. <Project Root>/<instruction-set>/reference-expectations.md
115
+ 5. <Project Root>/<instruction-set>/clarification-response.md (only if a carry-in was provided for this run)
116
+ 6. <Project Root>/<instruction-set>/final-report-template.md (REPORT WRITER ONLY — omit for analysis workers and reverify dispatches)
117
+
118
+ Reading rules:
119
+
120
+ - Use a single Read tool call per file with no offset and no limit. Do not
121
+ page through the file with offset/limit unless the file is genuinely too
122
+ large for one read; if you must page, you MUST cover the entire file
123
+ before moving on, and you MUST state the page boundaries you used in your
124
+ Findings section.
125
+ - For the carry-in clarification response, read sub-section 0, sub-section
126
+ 5.1 (`A1`, `A2`, ... — material requests), and sub-section 5.2 (`Q1`,
127
+ `Q2`, ... — user questions) in full, including every row of every table,
128
+ even if the answer column appears blank. The fact that you will write
129
+ your output into a file with a structurally similar Section 5 is NOT an
130
+ excuse to skim — the prior `A*` and `Q*` rows carry context you cannot
131
+ reconstruct from the new run alone.
132
+ - Before writing any Findings, state in one sentence per file that you
133
+ read it end-to-end. Example: "Read task-brief.md end-to-end (147 lines)."
134
+ If you cannot truthfully say this for a file, do not produce Findings —
135
+ record a `tool-failure` in the errors sidecar instead.
136
+ - Do not collapse multiple input files into a single mental summary before
137
+ reading them all individually. Each file has its own canonical role
138
+ (brief = the user's request, profile = the lead's rules for this phase,
139
+ reference-expectations = ground-truth config/deployment values,
140
+ clarification-response = prior run's open questions and the user's
141
+ answers, final-report-template = the structure your eventual writeup
142
+ must conform to). Conflating them loses signal.
143
+ ```
144
+
145
+ ### Error reporting clause (analysis workers)
146
+
147
+ Inject the following clause verbatim into every analysis worker's prompt
148
+ (Claude / Codex / Gemini). All three workers' subagent definitions already
149
+ include the same contract — the prompt clause exists as a redundant safety
150
+ net so any worker dispatched without its custom definition still receives
151
+ identical instructions.
152
+
153
+ ```
154
+ [Error reporting]
155
+ If any tool call you make (Bash, Read, Edit, MCP, etc.) returns a non-zero
156
+ exit code, raises an exception, or otherwise fails its intended effect,
157
+ append a single entry to your worker errors sidecar at:
158
+
159
+ runs/<task-type>/worker-results/<role-slug>-errors-<task-type>-<seq>.json
160
+
161
+ Schema (create the file with {"schemaVersion": 1, "errors": []} if absent):
162
+
163
+ {
164
+ "ts": "<ISO 8601 UTC>",
165
+ "phase": "<current okstra phase>",
166
+ "errorType": "tool-failure",
167
+ "command": "<failed command/tool signature>",
168
+ "commandKind": "bash | tool:Read | tool:Edit | mcp | ...",
169
+ "exitCode": <int or null>,
170
+ "durationMs": <int or null>,
171
+ "message": "<one-line human summary>",
172
+ "stderrExcerpt": "<first ~2KB of stderr, or null>",
173
+ "context": { ... or null }
174
+ }
175
+
176
+ Do NOT include source / recordedAt / agent / agentRole / model / taskKey —
177
+ Lead will fill those in. Do NOT use errorType values other than
178
+ "tool-failure" in the sidecar. Continue your task after recording; do not
179
+ abort unless the failure makes the task impossible.
180
+ ```
181
+
182
+ The substituted `<role-slug>` is `claude-worker`, `codex-worker`, or
183
+ `gemini-worker` matching the receiving role.
184
+
185
+ ## Terminal Statuses
186
+
187
+ Terminal statuses that can be recorded for a worker:
188
+
189
+ | Status | Meaning |
190
+ |--------|------|
191
+ | `completed` | Normal completion; prompt history file and result file must exist |
192
+ | `timeout` | Timeout, reason recorded; prompt history file must exist |
193
+ | `error` | Execution error, reason recorded; prompt history file must exist |
194
+ | `not-run` | Not executed, reason recorded |
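The existence rules attached to each status (and operating rules 7–9) lend themselves to a mechanical check before the final decision. A minimal sketch, assuming project-relative paths resolved against the project root; the function name and argument list are hypothetical, and this is not the okstra validator itself:

```python
from pathlib import Path
from typing import Optional

NEEDS_PROMPT_HISTORY = {"completed", "timeout", "error"}
TERMINAL_STATUSES = NEEDS_PROMPT_HISTORY | {"not-run"}


def _exists(root: Path, rel: Optional[str]) -> bool:
    return bool(rel) and (root / rel).is_file()


def check_worker_artifacts(project_root: str, status: str, reason: Optional[str],
                           prompt_path: Optional[str],
                           result_path: Optional[str]) -> list[str]:
    """Return contract problems for one worker; an empty list means it passes."""
    root = Path(project_root)
    if status not in TERMINAL_STATUSES:
        return [f"non-terminal status: {status!r}"]
    problems = []
    if status in NEEDS_PROMPT_HISTORY and not _exists(root, prompt_path):
        problems.append("prompt history file missing")
    if status == "completed" and not _exists(root, result_path):
        problems.append("result file missing")
    if status != "completed" and not reason:
        problems.append("reason not recorded")
    return problems
```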
195
+
196
+ ## Worker Output Contract
197
+
198
+ **Authoritative source.** If other documents (SKILL.md, worker agent definitions) disagree with this section, this section wins.
199
+
200
+ A successful worker result must include the following sections in this exact order:
201
+
202
+ 0. **Reading Confirmation**: one short line per input file (`task-brief.md`, `analysis-profile.md`, `analysis-material.md` if present, `reference-expectations.md`, `clarification-response.md` if a carry-in was provided, and `final-report-template.md` for the report writer only) stating that the worker read it end-to-end. Each line takes the form `- Read <file-name> end-to-end (<line-count> lines).` If a file was skipped or only partially read, the worker MUST NOT produce sections 1–5; instead it records a `tool-failure` in the errors sidecar and stops. This section exists specifically to counteract the common failure mode where workers skim long inputs because they share structure with the file the run will eventually write into.
203
+ 1. Findings
204
+ 2. Missing Information or Assumptions
205
+ 3. Safe or Reasonable Areas
206
+ 4. Uncertain Points
207
+ 5. Recommended Next Actions
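Because section 0 has a fixed line format, a lead or validator could check it with a regular expression. A sketch under that assumption; `missing_confirmations` is a hypothetical helper, and the caller supplies the file names expected for that recipient:

```python
import re

# Matches "- Read <file-name> end-to-end (<line-count> lines)."
CONFIRMATION_LINE = re.compile(
    r"^- Read (?P<name>\S+) end-to-end \((?P<lines>\d+) lines\)\.$"
)


def missing_confirmations(section_text: str, expected_files: list[str]) -> list[str]:
    """Return the expected files that have no well-formed confirmation line."""
    confirmed = set()
    for line in section_text.splitlines():
        match = CONFIRMATION_LINE.match(line.strip())
        if match:
            confirmed.add(match.group("name"))
    return [name for name in expected_files if name not in confirmed]
```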
208
+
209
+ Code evidence must include file paths and line numbers.
210
+
211
+ ### Optional errors sidecar (worker-reported)
212
+
213
+ A worker MAY produce an errors sidecar file at:
214
+
215
+ ```
216
+ runs/<task-type>/worker-results/<role-slug>-errors-<task-type>-<seq>.json
217
+ ```
218
+
219
+ This sidecar collects tool failures observed inside the worker's session
220
+ (non-zero Bash exits, MCP errors, tool exceptions). It is optional — its
221
+ absence does not invalidate a worker result.
222
+
223
+ Schema:
224
+
225
+ ```json
226
+ {
227
+ "schemaVersion": 1,
228
+ "errors": [
229
+ {
230
+ "ts": "<ISO 8601 string>",
231
+ "phase": "<okstra phase 1..7>",
232
+ "errorType": "tool-failure",
233
+ "command": "<failed command or tool signature>",
234
+ "commandKind": "bash | mcp | tool:Read | tool:Edit | ...",
235
+ "exitCode": <int|null>,
236
+ "durationMs": <int|null>,
237
+ "message": "<one-line human summary>",
238
+ "stderrExcerpt": "<first ~2KB of stderr or null>",
239
+ "context": { "<freeform>": "..." }
240
+ }
241
+ ]
242
+ }
243
+ ```
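A minimal Python sketch of a worker appending one entry under this schema, creating the file when it is absent. It assumes the sidecar path has already been substituted, treats "first ~2KB" as the first 2048 characters, and uses a hypothetical helper name:

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def append_tool_failure(sidecar: Path, phase: str, command: str, command_kind: str,
                        message: str, exit_code: Optional[int] = None,
                        duration_ms: Optional[int] = None,
                        stderr: Optional[str] = None,
                        context: Optional[dict] = None) -> dict:
    """Append one tool-failure entry to a worker errors sidecar and return it."""
    entry = {
        "ts": datetime.now(timezone.utc).isoformat().replace("+00:00", "Z"),
        "phase": phase,
        "errorType": "tool-failure",  # the only errorType allowed in a sidecar
        "command": command,
        "commandKind": command_kind,
        "exitCode": exit_code,
        "durationMs": duration_ms,
        "message": message,
        "stderrExcerpt": stderr[:2048] if stderr else None,
        "context": context,
        # source/recordedAt/agent/agentRole/model/taskKey are deliberately
        # absent: Claude lead fills them in when dumping the sidecar.
    }
    if sidecar.exists():
        data = json.loads(sidecar.read_text(encoding="utf-8"))
    else:
        data = {"schemaVersion": 1, "errors": []}
    data["errors"].append(entry)
    sidecar.parent.mkdir(parents=True, exist_ok=True)
    sidecar.write_text(json.dumps(data, indent=2, ensure_ascii=False), encoding="utf-8")
    return entry
```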
244
+
245
+ Workers MUST omit `source` / `recordedAt` / `agent` / `agentRole` / `model` /
246
+ `taskKey`. Claude lead fills those in when dumping the sidecar to
247
+ `runs/<task-type>/logs/errors-<task-type>-<seq>.jsonl` via
248
+ `scripts/okstra-error-log.py append-from-worker`.
249
+
250
+ Workers MUST use only `errorType: "tool-failure"` in the **sidecar file**.
251
+
252
+ - `cli-failure` events are recorded by the wrapper subagent itself (Codex / Gemini), but **directly to the run-level error log** via `okstra-error-log.py append-observed --error-type cli-failure ...` — NOT via the sidecar. The sidecar is an in-process tool-failure channel only.
253
+ - `contract-violation` events (C) are recorded by Lead via `okstra-error-log.py append-observed --error-type contract-violation ...` after inspecting worker outputs.
254
+ - Lead's responsibility regarding the sidecar is to dump it to the run-level error log via `okstra-error-log.py append-from-worker` after each worker terminates; Lead does not write into the sidecar.
255
+
256
+ ## Convergence Phase Rules
257
+
258
+ 1. Re-verification uses the same worker roles and model assignments as the initial run.
259
+ 2. Re-verification workers follow a constrained response format (verdict + brief explanation).
260
+ 3. Workers cannot vote on their own findings (only verify other workers’ work).
261
+ 4. The `Report writer worker` does not participate in re-verification voting; its only responsibility is authoring the final report.
262
+ 5. The Claude lead determines the semantic equivalence of findings (this is not delegated to workers).
263
+ 6. Batch processing is performed with one spawn per worker per round (not one spawn per finding).
264
+ 7. These rules do not apply if Convergence is disabled.
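Rules 3, 4, and 6 together determine who verifies what in a round. The sketch below assumes each finding records its author role and that every other eligible worker verifies it; both the data shapes and the helper name are illustrative:

```python
from collections import defaultdict

REPORT_WRITER = "Report writer worker"


def plan_reverification_round(findings: list[tuple[str, str]],
                              roles: list[str]) -> dict[str, list[str]]:
    """Map each verifying role to the batch of finding IDs it checks this round.

    findings: (finding_id, author_role) pairs.
    roles: worker roles from the initial run (rule 1: same roles and models).
    """
    verifiers = [role for role in roles if role != REPORT_WRITER]  # rule 4
    batches: dict[str, list[str]] = defaultdict(list)
    for finding_id, author in findings:
        for verifier in verifiers:
            if verifier != author:  # rule 3: never vote on your own finding
                batches[verifier].append(finding_id)
    # Rule 6: one spawn per verifier per round, each carrying its whole batch.
    return dict(batches)
```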
265
+
266
+ ## Re-verification Terminal Statuses
267
+
268
+ | Status | Meaning |
269
+ |--------|------|
270
+ | `verification-completed` | Re-verification vote completed |
271
+ | `verification-timeout` | Re-verification timeout |
272
+ | `verification-error` | Re-verification error |
273
+
274
+ ## Worker Result Header Standard
275
+
276
+ Every worker result file under `worker-results/` must begin with a standardized header:
277
+
278
+ ```markdown
279
+ # <Role> Analysis — <task-key>
280
+
281
+ **Task:** <task-type>
282
+ **Date:** <YYYY-MM-DD>
283
+ **Model:** <Role>, <AI model>
284
+ ```
285
+
286
+ Examples (these also include an optional `**Target:**` line):
287
+
288
+ ```markdown
289
+ # Claude Worker Analysis — jobs:tasks:8852
290
+
291
+ **Task:** error-analysis
292
+ **Target:** server/auth.ts
293
+ **Date:** 2026-04-06
294
+ **Model:** Claude worker, sonnet
295
+ ```
296
+
297
+ ```markdown
298
+ # Codex Worker Analysis — jobs:tasks:8852
299
+
300
+ **Task:** error-analysis
301
+ **Target:** server/auth.ts
302
+ **Date:** 2026-04-06
303
+ **Model:** Codex worker, <codex-model-id>
304
+ ```
305
+
306
+ ```markdown
307
+ # Report Writer Worker Analysis — jobs:tasks:8852
308
+
309
+ **Task:** error-analysis
310
+ **Target:** server/auth.ts
311
+ **Date:** 2026-04-06
312
+ **Model:** Report writer worker, opus
313
+ ```
314
+
315
+ Use the actual model identifier recorded in team-state (never invent a model ID — read it from `resultContract.requiredWorkerRoles[*].modelExecutionValue` or the tool response metadata).
316
+
317
+ The header is followed by the standard worker output contract sections (Reading Confirmation, Findings, Missing Information or Assumptions, etc.).
318
+
319
+ ## Usage Tracking
320
+
321
+ Token usage is collected from agent session transcripts after the run, NOT from any in-band Agent-tool response. Neither the Agent tool nor the running session exposes token counts to the lead in real time, so any "estimate" is unreliable. Use the script.
322
+
323
+ ### How to Collect
324
+
325
+ At the **start of Phase 7** (persistence), run the helper script with the path to this run's `team-state.json`:
326
+
327
+ ```bash
328
+ python3 scripts/okstra-token-usage.py \
329
+ <runDirectoryPath>/state/team-state-<task-type>-<seq>.json \
330
+ --write --summary
331
+ ```
332
+
333
+ (Use the absolute path to `scripts/okstra-token-usage.py` — it lives in `Okstra/scripts/`.)
334
+
335
+ The script reads:
336
+ - `~/.claude/projects/<encoded-cwd>/<sessionId>.jsonl` for the lead and every Claude-side worker (Claude worker, Report writer worker, plus the Claude wrappers around Codex/Gemini workers). Sessions are discovered by `teamName: okstra-<task-id>`, lead is identified by `lead.sessionId`, and other workers are identified by `agentName` (e.g. `claude-worker`, `codex-worker`, `gemini-worker`, `report-writer`).
337
+ - `~/.codex/sessions/Y/M/D/rollout-*.jsonl` for the underlying Codex CLI session (matched by `cwd` and timestamp window of the wrapper subagent). Last `event_msg.token_count.total_token_usage.total_tokens` is the session total.
338
+ - `~/.gemini/tmp/<project>/chats/session-*.json` for the underlying Gemini CLI session. Sum of per-message `tokens.total`.
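The two CLI-side totals can be sketched as follows. The exact nesting of Codex rollout records and Gemini chat files is not spelled out here, so rather than hard-code it, this sketch searches each parsed record for a `total_token_usage.total_tokens` pair (Codex, keeping the last one seen) or for `tokens.total` values (Gemini, summed). Treat it as an approximation of `okstra-token-usage.py`, not its actual implementation:

```python
import json
from pathlib import Path
from typing import Any, Iterator, Optional


def _walk(node: Any) -> Iterator[dict]:
    """Yield every dict nested anywhere inside a parsed JSON value."""
    if isinstance(node, dict):
        yield node
        for value in node.values():
            yield from _walk(value)
    elif isinstance(node, list):
        for item in node:
            yield from _walk(item)


def codex_session_total(rollout: Path) -> Optional[int]:
    """Return the last total_token_usage.total_tokens seen in a Codex rollout JSONL."""
    last = None
    for raw in rollout.read_text(encoding="utf-8").splitlines():
        if not raw.strip():
            continue
        for node in _walk(json.loads(raw)):
            usage = node.get("total_token_usage")
            if isinstance(usage, dict) and isinstance(usage.get("total_tokens"), int):
                last = usage["total_tokens"]
    return last


def gemini_session_total(session: Path) -> int:
    """Sum every per-message tokens.total in a Gemini chat session JSON."""
    total = 0
    for node in _walk(json.loads(session.read_text(encoding="utf-8"))):
        tokens = node.get("tokens")
        if isinstance(tokens, dict) and isinstance(tokens.get("total"), int):
            total += tokens["total"]
    return total
```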
339
+
340
+ ### Resulting team-state shape
341
+
342
+ ```json
343
+ {
344
+ "leadUsage": {
345
+ "totalTokens": 10479327,
346
+ "inputTokens": 18,
347
+ "outputTokens": 24360,
348
+ "cacheCreationTokens": 612000,
349
+ "cacheReadTokens": 9842949,
350
+ "toolUses": 47,
351
+ "durationMs": 7253000,
352
+ "source": "claude-jsonl",
353
+ "sessionId": "<lead-session-id>",
354
+ "collectedAt": "<utc>"
355
+ },
356
+ "workers": [
357
+ {
358
+ "workerId": "codex",
359
+ "usage": {
360
+ "totalTokens": 2274011,
361
+ "source": "claude-jsonl",
362
+ "sessionId": "<wrapper-session-id>",
363
+ "cliSessionPath": "/Users/.../.codex/sessions/.../rollout-*.jsonl",
364
+ "cliTotalTokens": 5261833,
365
+ "cliModel": "gpt-5.5"
366
+ }
367
+ }
368
+ ],
369
+ "usageSummary": {
370
+ "leadTotalTokens": 10479327,
371
+ "workerTotalTokens": 7988699,
372
+ "grandTotalTokens": 18468026,
373
+ "sessionsFound": 5,
374
+ "teamName": "okstra-DEV-9045"
375
+ }
376
+ }
377
+ ```
378
+
379
+ ### Notes
380
+
381
+ - `totalTokens` in Claude usage blocks is the sum of input + output + cache_creation + cache_read tokens (the cache figures dominate in long sessions).
382
+ - For Codex/Gemini workers, `usage.totalTokens` reflects the **Claude wrapper subagent** spend (the Claude tokens consumed by the codex-worker / gemini-worker subagent itself). The optional `cliTotalTokens` is the underlying CLI's own tokens. They are NOT additive in any meaningful way (different providers, different prices) — keep them separate.
383
+ - If a worker has status `not-run`, the script records an "unavailable" block.
384
+ - If the lead jsonl is missing (e.g. if `lead.sessionId` was never persisted), the script records an "unavailable" block with the searched path. Always populate `lead.sessionId` in team-state at Phase 3 — the okstra.sh launcher already passes it as `CLAUDE_SESSION_ID`.
385
+ - Convergence re-verification agents are dispatched as fresh subagents under the same `teamName` and will be discovered automatically. If you want them split out, set a distinct `agentName` on dispatch and post-process.
386
+
387
+ ### Lead Duration
388
+
389
+ `durationMs` for the lead is computed from the first to last timestamp in the lead jsonl. If you need wall-clock from run-start to report-completion instead, override `leadUsage.durationMs` after running the script.
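A sketch of the lead-side numbers (the `totalTokens` sum from the notes above and this duration). It assumes Claude transcript lines carry an ISO 8601 `timestamp` and, for assistant turns, a `message.usage` object with `input_tokens`, `output_tokens`, `cache_creation_input_tokens`, and `cache_read_input_tokens`. Those field names are an assumption about the transcript format, and the sketch does no deduplication of repeated lines:

```python
import json
from datetime import datetime
from pathlib import Path

USAGE_FIELDS = ("input_tokens", "output_tokens",
                "cache_creation_input_tokens", "cache_read_input_tokens")


def _parse_ts(value: str) -> datetime:
    # fromisoformat() only accepts a trailing "Z" from Python 3.11 on.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def summarize_claude_jsonl(path: Path) -> dict:
    """Return totalTokens (input + output + cache) and earliest-to-latest durationMs."""
    timestamps = []
    total_tokens = 0
    for raw in path.read_text(encoding="utf-8").splitlines():
        if not raw.strip():
            continue
        record = json.loads(raw)
        if record.get("timestamp"):
            timestamps.append(_parse_ts(record["timestamp"]))
        message = record.get("message")
        usage = message.get("usage") if isinstance(message, dict) else None
        if isinstance(usage, dict):
            total_tokens += sum(usage.get(field) or 0 for field in USAGE_FIELDS)
    duration_ms = None
    if timestamps:
        duration_ms = round((max(timestamps) - min(timestamps)).total_seconds() * 1000)
    return {"totalTokens": total_tokens, "durationMs": duration_ms}
```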
390
+
391
+ ## Team State Persistence
392
+
393
+ Information to be recorded in the team-state JSON file:
394
+ - Current status of each worker role
395
+ - Start/end times for each worker
396
+ - Prompt history path for each worker
397
+ - Path to the result file for each worker
398
+ - Usage metadata for each worker (totalTokens, toolUses, durationMs)
399
+ - Lead usage metadata (totalTokens, toolUses, durationMs) under `leadUsage`
400
+ - Current status of the entire run
401
+ - Path to the run-level error log file (`runs/<task-type>/logs/errors-<task-type>-<seq>.jsonl`) under `errorsLogPath`
402
+ - Per-worker errors sidecar path under `workers[].errorsSidecarPath`
@@ -0,0 +1,138 @@
1
+ ---
2
+ name: okstra-time-summary
3
+ description: Use when the user asks how long an okstra task took, time spent per task type, per-worker elapsed time, or for a duration/runtime breakdown of a specific task-id. Trigger words include "작업 시간" (work time), "소요 시간" (time taken), "time summary", "duration", "elapsed", "얼마나 걸렸" (how long did it take), "시간 분석" (time analysis).
4
+ ---
5
+
6
+ # OKSTRA Time Summary
7
+
8
+ Aggregate elapsed work time for a given task, grouped by **task type** and broken down by **worker** (lead, intake, claude-worker, codex-worker, gemini-worker, report-writer).
9
+
10
+ ## When to Use
11
+
12
+ - The user provides a `task-id` (or `task-key`) and asks how long the task took.
13
+ - The user wants to see time spent per phase / task type for a single task.
14
+ - The user wants a per-worker time breakdown for a task's runs.
15
+
16
+ ## Data Sources
17
+
18
+ Two sources, both already collected by `okstra`:
19
+
20
+ 1. `.project-docs/okstra/tasks/<task-group>/<task-id>/history/timeline.json`
21
+ — lists every run with `runTimestamp`, `taskType`, `status`, `teamStatePath`.
22
+ 2. Each run's `.../runs/<task-type>/state/team-state-<suffix>.json`
23
+ — populated by `scripts/okstra-token-usage.py` at Phase 7. Contains:
24
+ - `leadUsage.{startedAt, endedAt, durationMs}`
25
+ - `workers[].{workerId, agent, usage.{startedAt, endedAt, durationMs}}`
26
+
27
+ If a run never reached Phase 7, its `team-state` will not have `durationMs` filled in. Mark such runs as `unavailable` rather than guessing.
28
+
29
+ ## Step 0: Verify okstra runtime + project setup
30
+
31
+ ```bash
32
+ npx -y okstra@latest ensure-installed >/dev/null 2>&1 || {
33
+ echo "FAIL: okstra not installed; tell the user to run: npx okstra@latest install" >&2
34
+ exit 1
35
+ }
36
+ eval "$(npx -y okstra@latest paths --shell)"
37
+ export PYTHONPATH="$OKSTRA_PYTHONPATH"
38
+ OKSTRA_PROJECT_INFO="$(npx -y okstra@latest check-project --json)" || {
39
+ echo "FAIL: this project has no okstra setup. Tell the user to run /okstra-setup first." >&2
40
+ echo "$OKSTRA_PROJECT_INFO" >&2
41
+ exit 1
42
+ }
43
+ ```
44
+
45
+ `$OKSTRA_PROJECT_INFO` (JSON `{ok, projectRoot, projectJsonPath, projectId}`)
46
+ gives `projectRoot` for locating `.project-docs/okstra/discovery/task-catalog.json`.
47
+
48
+ ## Step 1: Resolve task-id → timeline path
49
+
50
+ 1. If the user gave a full `task-key` (`<project-id>:<task-group>:<task-id>`), use it directly.
51
+ 2. Otherwise read `.project-docs/okstra/discovery/task-catalog.json` and find the entry whose `taskId` matches.
52
+ 3. If multiple entries match, list candidates (`taskKey`, `taskType`, `updatedAt`) and ask the user to pick.
53
+ 4. From the chosen entry, read `historyTimelinePath`.
54
+
55
+ If `task-catalog.json` is missing, respond: "No okstra history found. Run `scripts/okstra.sh` first."
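A sketch of the lookup, assuming the catalog has already been loaded into a list of entry dicts carrying `taskKey`, `taskId`, `taskType`, `updatedAt`, and `historyTimelinePath` (the top-level layout of `task-catalog.json` is not specified here). `pick_catalog_entry` and `AmbiguousTask` are hypothetical names:

```python
class AmbiguousTask(Exception):
    """Raised when several catalog entries match; carries the candidate rows."""

    def __init__(self, candidates: list[tuple]):
        super().__init__(f"{len(candidates)} catalog entries match; ask the user to pick")
        self.candidates = candidates


def pick_catalog_entry(entries: list[dict], task_ref: str) -> dict:
    """Return the single catalog entry for a task-key or a bare task-id."""
    if len(task_ref.split(":")) == 3:
        # Full <project-id>:<task-group>:<task-id> key: match it directly.
        matches = [e for e in entries if e.get("taskKey") == task_ref]
    else:
        matches = [e for e in entries if e.get("taskId") == task_ref]
    if not matches:
        raise LookupError(f"no catalog entry for {task_ref!r}")
    if len(matches) > 1:
        # Candidate rows (taskKey, taskType, updatedAt) to show the user.
        raise AmbiguousTask([(e.get("taskKey"), e.get("taskType"), e.get("updatedAt"))
                             for e in matches])
    return matches[0]
```

The caller then reads `historyTimelinePath` from the returned entry.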
56
+
57
+ ## Step 2: Walk runs and collect durations
58
+
59
+ For each entry in `timeline.json`'s `runs` array:
60
+
61
+ 1. Read the `team-state` file at `teamStatePath` (relative to the project root).
62
+ 2. Extract:
63
+ - `taskType` from the timeline entry (authoritative).
64
+ - `leadUsage.durationMs` and `leadUsage.{startedAt,endedAt}`.
65
+ - For each `worker` in `workers[]`: `workerId`, `agent`, `usage.durationMs`.
66
+ 3. If the team-state file is missing, or all `durationMs` values are 0/absent, record the run under `unavailable` with its `runTimestamp` and `taskType`.
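A minimal sketch of step 2 for one timeline entry. It keys worker durations by `(workerId, agent)` so that agent changes can be detected later, and it does not model intake time, whose location in team-state is not described here. The function name is hypothetical:

```python
import json
from pathlib import Path


def collect_run(project_root: Path, run: dict) -> dict:
    """Step 2 for one timeline entry: durations, or an `unavailable` reason."""
    base = {"runTimestamp": run.get("runTimestamp"), "taskType": run.get("taskType")}
    rel = run.get("teamStatePath")
    if not rel or not (project_root / rel).is_file():
        return {**base, "unavailable": "team-state missing"}
    state = json.loads((project_root / rel).read_text(encoding="utf-8"))
    lead_ms = (state.get("leadUsage") or {}).get("durationMs") or 0
    workers_ms: dict = {}
    for worker in state.get("workers") or []:
        key = (worker.get("workerId"), worker.get("agent"))
        ms = (worker.get("usage") or {}).get("durationMs") or 0
        if ms:
            workers_ms[key] = workers_ms.get(key, 0) + ms
    if not lead_ms and not workers_ms:
        # Every durationMs is 0 or absent, which points to Phase 7 never running.
        return {**base, "unavailable": "Phase 7 not reached"}
    return {**base, "leadMs": lead_ms, "workersMs": workers_ms}
```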
67
+
68
+ ## Step 3: Aggregate
69
+
70
+ Build two tables:
71
+
72
+ ### A. Per task-type summary
73
+
74
+ For each distinct `taskType` across runs:
75
+
76
+ | Column | Computation |
77
+ |--------|-------------|
78
+ | `Runs` | count of runs of that task type that contributed any duration |
79
+ | `Total` | sum of lead + intake (if recorded) + all workers across those runs |
80
+ | `Lead` | sum of `leadUsage.durationMs` |
81
+ | `Workers` | sum of all `workers[].usage.durationMs` |
82
+
83
+ Add a final `Grand total` row.
84
+
85
+ ### B. Per worker breakdown (per task type)
86
+
87
+ For each task type, list one row per `workerId` actually present, plus `lead` and (if non-zero) `intake`. Aggregate `durationMs` across all runs of that task type.
88
+
89
+ | Worker | Runs | Total | Avg/run |
90
+ |--------|------|-------|---------|
91
+
92
+ Use the `workerId` from team-state (e.g. `claude`, `codex`, `gemini`, `report-writer`). When the same `workerId` ran with different `agent` values across runs, append the agent in parentheses (`claude (claude)`, `codex (codex)`).
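Both tables can be built in one pass. This sketch assumes available runs look like `{"taskType": str, "leadMs": int, "workersMs": {(workerId, agent): int}}` and that unavailable runs carry an `unavailable` key; it relies on Python dicts preserving insertion order to keep task types in first-appearance order, and `Avg/run` is integer division in milliseconds. Intake is omitted, and all names are hypothetical:

```python
def aggregate(runs: list[dict]) -> tuple[dict, dict, dict]:
    """Return (table A rows by task type, grand-total row, table B rows)."""
    table_a: dict[str, dict[str, int]] = {}
    worker_stats: dict[str, dict[tuple, list[int]]] = {}
    for run in runs:  # dict insertion order == first appearance in the timeline
        if "unavailable" in run:
            continue  # never sum unavailable runs into the totals
        task_type = run["taskType"]
        row = table_a.setdefault(task_type, {"runs": 0, "lead": 0, "workers": 0})
        row["runs"] += 1
        row["lead"] += run["leadMs"]
        row["workers"] += sum(run["workersMs"].values())
        stats = worker_stats.setdefault(task_type, {})
        for key, ms in [(("lead", None), run["leadMs"]), *run["workersMs"].items()]:
            if ms:
                runs_total = stats.setdefault(key, [0, 0])  # [runs, totalMs]
                runs_total[0] += 1
                runs_total[1] += ms
    for row in table_a.values():
        row["total"] = row["lead"] + row["workers"]
    grand = {k: sum(r[k] for r in table_a.values()) for k in ("runs", "lead", "workers", "total")}
    table_b: dict[str, list[dict]] = {}
    for task_type, stats in worker_stats.items():
        ids = [worker_id for worker_id, _ in stats]
        table_b[task_type] = [
            {
                # Append the agent only when one workerId ran under several agents.
                "worker": worker_id if ids.count(worker_id) == 1 else f"{worker_id} ({agent})",
                "runs": n,
                "totalMs": total,
                "avgMs": total // n,
            }
            for (worker_id, agent), (n, total) in stats.items()
        ]
    return table_a, grand, table_b
```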
93
+
94
+ ## Step 4: Format output
95
+
96
+ - Convert `durationMs` to `HH:MM:SS` (zero-pad). Example: `7384000ms` → `02:03:04`.
97
+ - Sort task types by their order of first appearance in the timeline (chronological, not alphabetical).
98
+ - If any runs were `unavailable`, append a final note listing them with reason (`team-state missing`, `Phase 7 not reached`, etc.).
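The conversion is plain integer arithmetic. This sketch truncates sub-second remainders (an assumption, since rounding is not specified) and renders missing values as `--`; the function name is hypothetical:

```python
from typing import Optional


def format_hms(duration_ms: Optional[int]) -> str:
    """Render milliseconds as zero-padded HH:MM:SS, or '--' when missing."""
    if duration_ms is None:
        return "--"
    total_seconds = duration_ms // 1000  # drop sub-second remainder
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"
```

`format_hms(7384000)` returns `02:03:04`, matching the example above.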
99
+
100
+ ### Output template
101
+
102
+ ```markdown
103
+ ## Time summary — <task-key>
104
+
105
+ ### By task type
106
+
107
+ | Task type | Runs | Total | Lead | Intake | Workers |
108
+ |------------------------|------|-----------|----------|----------|----------|
109
+ | requirements-discovery | 2 | 00:34:12 | 00:12:08 | 00:01:00 | 00:21:04 |
110
+ | error-analysis | 1 | 00:18:45 | 00:08:11 | -- | 00:10:34 |
111
+ | implementation | 3 | 02:11:09 | 00:45:30 | -- | 01:25:39 |
112
+ | **Grand total** | 6 | **03:04:06** | 01:05:49 | 00:01:00 | 01:57:17 |
113
+
114
+ ### Per worker — requirements-discovery
115
+
116
+ | Worker | Runs | Total | Avg/run |
117
+ |----------------|------|----------|----------|
118
+ | lead | 2 | 00:12:08 | 00:06:04 |
119
+ | intake | 1 | 00:01:00 | 00:01:00 |
120
+ | claude | 2 | 00:09:12 | 00:04:36 |
121
+ | codex | 2 | 00:07:40 | 00:03:50 |
122
+ | gemini | 2 | 00:03:12 | 00:01:36 |
123
+ | report-writer | 2 | 00:01:00 | 00:00:30 |
124
+
125
+ ### Per worker — error-analysis
126
+ ...
127
+
128
+ > Unavailable: 1 run (implementation / 2026-04-30_03-03-48) — team-state has no durationMs (Phase 7 not reached)
129
+ ```
130
+
131
+ If the `Intake` column is all zero across every task type, omit that column entirely.
132
+
133
+ ## Output Rules
134
+
135
+ - Always render durations as `HH:MM:SS`; never raw milliseconds.
136
+ - Never invent or estimate `durationMs`. Missing → `--`.
137
+ - Never sum across `unavailable` runs into the totals — those are reported only in the trailing note.
138
+ - Show the resolved `<task-key>` in the heading so the user can confirm disambiguation.