okstra 0.25.1 → 0.27.0

Files changed (48)
  1. package/README.kr.md +16 -0
  2. package/README.md +16 -0
  3. package/docs/kr/architecture.md +3 -7
  4. package/docs/kr/cli.md +47 -4
  5. package/docs/kr/performance-improvement-plan-v2.md +23 -0
  6. package/docs/kr/performance-improvement-plan.md +22 -0
  7. package/docs/superpowers/specs/2026-05-15-implementation-plan-verification-design.md +254 -0
  8. package/package.json +1 -1
  9. package/runtime/BUILD.json +2 -2
  10. package/runtime/agents/SKILL.md +30 -2
  11. package/runtime/bin/okstra.sh +1 -1
  12. package/runtime/prompts/profiles/_common-contract.md +30 -1
  13. package/runtime/prompts/profiles/error-analysis.md +12 -0
  14. package/runtime/prompts/profiles/implementation-planning.md +23 -0
  15. package/runtime/prompts/profiles/requirements-discovery.md +20 -0
  16. package/runtime/python/lib/okstra/cli.sh +8 -7
  17. package/runtime/python/lib/okstra/globals.sh +3 -1
  18. package/runtime/python/lib/okstra/usage.sh +8 -4
  19. package/runtime/python/okstra_ctl/render.py +35 -0
  20. package/runtime/python/okstra_ctl/run.py +27 -6
  21. package/runtime/python/okstra_ctl/run_context.py +1 -1
  22. package/runtime/python/okstra_ctl/wizard.py +259 -10
  23. package/runtime/python/okstra_token_usage/blocks.py +5 -1
  24. package/runtime/python/okstra_token_usage/claude.py +16 -1
  25. package/runtime/python/okstra_token_usage/collect.py +17 -3
  26. package/runtime/python/okstra_token_usage/pricing.py +159 -24
  27. package/runtime/skills/okstra-brief/SKILL.md +532 -65
  28. package/runtime/skills/okstra-context-loader/SKILL.md +25 -11
  29. package/runtime/skills/okstra-convergence/SKILL.md +235 -8
  30. package/runtime/skills/okstra-history/SKILL.md +68 -37
  31. package/runtime/skills/okstra-logs/SKILL.md +26 -4
  32. package/runtime/skills/okstra-report-finder/SKILL.md +49 -22
  33. package/runtime/skills/okstra-report-writer/SKILL.md +59 -64
  34. package/runtime/skills/okstra-run/SKILL.md +53 -39
  35. package/runtime/skills/okstra-schedule/SKILL.md +51 -20
  36. package/runtime/skills/okstra-setup/SKILL.md +31 -12
  37. package/runtime/skills/okstra-status/SKILL.md +20 -8
  38. package/runtime/skills/okstra-team-contract/SKILL.md +27 -15
  39. package/runtime/skills/okstra-time-summary/SKILL.md +53 -16
  40. package/runtime/templates/reports/final-report.template.md +34 -0
  41. package/runtime/templates/reports/settings.template.json +7 -4
  42. package/runtime/validators/lib/fixtures.sh +10 -2
  43. package/runtime/validators/lib/validate-assets.sh +50 -24
  44. package/runtime/validators/validate-brief.py +385 -0
  45. package/runtime/validators/validate-brief.sh +35 -0
  46. package/runtime/validators/validate-run.py +71 -0
  47. package/runtime/validators/validate-workflow.sh +7 -33
  48. package/src/wizard.mjs +21 -5
@@ -12,7 +12,9 @@ user-invocable: false
12
12
  - When the user needs to know the okstra task bundle path
13
13
  - When you need to derive all artifact paths based on `task-manifest.json`
14
14
 
15
- ## Step 1: Locating the Task Bundle
15
+ ## Step 1: Resolve the Task Bundle Path
16
+
17
+ (Resolve which task-root path to use; Step 2 opens `task-manifest.json` at that path.)
16
18
 
17
19
  ### Default Location Rules
18
20
 
@@ -28,11 +30,12 @@ user-invocable: false
28
30
  3. If the user attempts to find a task based on `task-group` + `task-id` or `task-id`, `.project-docs/okstra/discovery/task-catalog.json` is read to find candidates.
29
31
  4. If multiple candidates are found based on `task-id` alone, the situation is ambiguous, so `task-group` or the full `taskKey` is required.
30
32
  5. If the user has not provided an explicit task key/path, first read `.project-docs/okstra/discovery/latest-task.json` using the current-task convenience pointer.
31
- 6. Even if the latest-task pointer is missing or corrupted, but the task catalog exists, first check the list of prepared task bundles based on the catalog. Do not use the legacy `CLAUDE.md`, project guide, or task scan fallback.
33
+ 6. If the latest-task pointer is missing or corrupted but the task catalog exists, list candidates from the catalog. Do not use the legacy `CLAUDE.md`, project guide, or task scan fallback.
34
+ 7. If **neither** `latest-task.json` **nor** `task-catalog.json` exists, ABORT Phase 1 with `OKSTRA_CONTEXT_NOT_INITIALIZED`. Suggest the user run `/okstra-setup` and `/okstra-brief` to bootstrap the project. Do NOT crawl `.project-docs/okstra/tasks/` directly — discovery pointers are the only supported entry path.
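
Rules 5 to 7 can be sketched as follows. This is a minimal illustration, not the okstra implementation: the function and exception names are hypothetical, the JSON shapes of both pointer files are not specified by this diff, and the handling of a corrupted `latest-task.json` with no catalog present is an assumption (the rules above do not cover that case).

```python
import json
from pathlib import Path

# Relative location of the discovery pointers, as named in the rules above.
DISCOVERY_DIR = Path(".project-docs/okstra/discovery")


class ContextNotInitialized(RuntimeError):
    """Hypothetical exception carrying the OKSTRA_CONTEXT_NOT_INITIALIZED code."""


def load_json_or_none(path: Path):
    """Return parsed JSON, or None when the file is missing or corrupted."""
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        return None


def resolve_without_explicit_key(project_root: Path):
    """Apply rules 5-7 when the user supplied no task key or path."""
    discovery = project_root / DISCOVERY_DIR
    latest_path = discovery / "latest-task.json"
    catalog_path = discovery / "task-catalog.json"

    # Rule 7: neither pointer exists, so abort instead of crawling tasks/.
    if not latest_path.exists() and not catalog_path.exists():
        raise ContextNotInitialized("OKSTRA_CONTEXT_NOT_INITIALIZED")

    # Rule 5: prefer the current-task convenience pointer.
    latest = load_json_or_none(latest_path)
    if latest is not None:
        return ("latest-task", latest)

    # Rule 6: pointer missing or corrupted, so list candidates from the catalog.
    catalog = load_json_or_none(catalog_path)
    if catalog is not None:
        return ("catalog-candidates", catalog)

    # Assumption: no usable pointer at all is treated like rule 7.
    raise ContextNotInitialized("OKSTRA_CONTEXT_NOT_INITIALIZED")
```

The sketch never reads `.project-docs/okstra/tasks/`, which mirrors the "discovery pointers are the only supported entry path" constraint.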
32
35
 
33
- ## Step 2: Read task-manifest.json
36
+ ## Step 2: Open and Parse task-manifest.json
34
37
 
35
- task-manifest.json is the canonical metadata source. The following fields must be extracted from this file:
38
+ `task-manifest.json` (found at the task-root resolved in Step 1) is the canonical metadata source. Extract the following fields:
36
39
 
37
40
  | Field | Description |
38
41
  |------|------|
@@ -53,7 +56,7 @@ task-manifest.json is the canonical metadata source. The following fields must b
53
56
  | `workflow.awaitingApproval` | Approval wait marker |
54
57
  | `workflow.routingStatus` | Routing decision status |
55
58
  | `workflow.lastSafeCheckpoint` | Safe resume checkpoint metadata |
56
- | `instructionSetPath` | Instruction-set path |
59
+ | `instructionSetPath` | Path to the `instruction-set/` **directory** containing `analysis-profile.md`, `analysis-material.md`, `reference-expectations.md`, `task-brief.md`, `final-report-template.md` (see Step 4). Not a single-file path. |
57
60
  | `referenceExpectationsPath` | config/deployment expectation artifact path |
58
61
  | `latestRunPath` | latest run path |
59
62
  | `latestRunStatus` | latest run status |
@@ -63,7 +66,7 @@ task-manifest.json is the canonical metadata source. The following fields must b
63
66
  | `historyTimelinePath` | timeline path |
64
67
  | `resultContract` | team contract and expected artifact metadata |
65
68
  | `resultContract.requiredWorkerRoles[*].promptPath` | worker prompt history path by role |
66
- | `convergence` | convergence loop settings (enabled, maxRounds, verificationMode). The `maxRounds` default is phase-aware: `1` for `requirements-discovery`, `2` otherwise. See [okstra-convergence](../okstra-convergence/SKILL.md) for details. |
69
+ | `convergence` | convergence loop settings (`enabled`, `maxRounds`, `verificationMode`). See [okstra-convergence](../okstra-convergence/SKILL.md) for the authoritative defaults — do not re-document the `maxRounds` value here. |
67
70
 
68
71
  ## Step 3: Directory Structure Rules
69
72
 
@@ -121,12 +124,21 @@ After verifying `task-manifest.json`, read the instruction set in the following
121
124
  4. `instruction-set/task-brief.md` (task brief)
122
125
  5. `instruction-set/final-report-template.md` (report template)
123
126
 
127
+ ### Brief Reporter-Confirmation Precondition (BLOCKING)
128
+
129
+ After reading `task-brief.md`, extract the frontmatter `reporter-confirmations` field (`complete | partial | pending | skipped`). This precondition is shared across every consuming phase — see `prompts/profiles/_common-contract.md` "Brief consumption" block for the authoritative handling matrix.
130
+
131
+ - `complete` or `partial` → proceed to Step 5 and hand off to `okstra-team-contract`.
132
+ - `skipped` → proceed, but flag the unmarked `intent-check:` / `conversion-block:` rows for promotion by the phase profile.
133
+ - `pending` (or field missing) → emit `REPORTER_CONFIRMATION_PENDING` and STOP. Do not invoke `okstra-team-contract` or any analyser. The operator must rerun `okstra-brief` Step 6.5 before Phase 2 can start.
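
The three branches above amount to a small decision function. The sketch below is illustrative only: the function name and the return labels `proceed` and `proceed-flag-unmarked-rows` are hypothetical, and treating an unrecognised value like `pending` is an assumption, since the text only names the four enum values and the missing-field case.

```python
from typing import Optional

PROCEED = "proceed"
PROCEED_FLAG_UNMARKED = "proceed-flag-unmarked-rows"
STOP = "REPORTER_CONFIRMATION_PENDING"


def reporter_confirmation_gate(value: Optional[str]) -> str:
    """Map the frontmatter `reporter-confirmations` value to a gate decision."""
    if value in ("complete", "partial"):
        # Continue to Step 5 and hand off to okstra-team-contract.
        return PROCEED
    if value == "skipped":
        # Continue, but the phase profile must promote unmarked
        # intent-check / conversion-block rows.
        return PROCEED_FLAG_UNMARKED
    # `pending`, a missing field (None), or (assumption) any unknown value.
    return STOP
```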
134
+
124
135
  ## Step 5: Read Run Manifest and Team State
125
136
 
126
- 1. Current run manifest: The latest run manifest pointed to by the discovery pointer or task-manifest
127
- 2. Current team state: The latest team-state pointed to by the discovery pointer or task-manifest
128
- 3. Extract the worker prompt directory path and per-worker prompt history paths from the current run manifest and team-state
129
- 4. If an existing run report is available, use it solely as historical context.
137
+ 1. Identify the active run by reading `runDateTimeSegment` from the latest `runs/<task-type>/manifests/run-manifest-*.json` (mtime order). That segment is the shared run identifier across all category subdirectories (`state/`, `prompts/`, `reports/`, `status/`, `sessions/`, `worker-results/`).
138
+ 2. Resolve sibling artifacts for this run by matching the same `runDateTimeSegment`. Do NOT re-scan `<seq>` counters per category; they may diverge if an earlier run only wrote some categories.
139
+ 3. Current team state: the team-state file whose `runDateTimeSegment` matches the active run manifest.
140
+ 4. Extract the worker prompt directory path and per-worker prompt history paths from the current run manifest and team-state.
141
+ 5. If an existing run report is available, use it solely as historical context.
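
Item 1 of the new Step 5 could look roughly like the sketch below. The helper name is hypothetical, and the only manifest field it reads is `runDateTimeSegment`; nothing else about the run-manifest schema is assumed.

```python
import json
from pathlib import Path
from typing import Optional


def active_run_segment(task_root: Path, task_type: str) -> Optional[str]:
    """Return `runDateTimeSegment` from the newest run manifest (mtime order)."""
    manifest_dir = task_root / "runs" / task_type / "manifests"
    if not manifest_dir.is_dir():
        return None
    manifests = list(manifest_dir.glob("run-manifest-*.json"))
    if not manifests:
        return None
    # Pick by modification time, not by the <seq> counter in the filename.
    latest = max(manifests, key=lambda p: p.stat().st_mtime)
    data = json.loads(latest.read_text(encoding="utf-8"))
    return data["runDateTimeSegment"]
```

Sibling artifacts in the other category subdirectories would then be selected by comparing their own `runDateTimeSegment` to this value rather than by filename.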
130
142
 
131
143
  ## Output
132
144
 
@@ -137,4 +149,6 @@ Information to be obtained after executing this skill:
137
149
  - Reference list of config files/deployment manifests and task-level expected values
138
150
  - Current run status and presence of existing worker results
139
151
  - Current run prompt history contract for attempted workers
140
- - Resume command path: `runs/<task-type>/sessions/claude-resume-<task-type>-<seq>.sh`
152
+ - Candidate `teamName` for Phase 3 hand-off: `okstra-<task-key>` (with task-key slugified per Step 1's slug rule)
153
+ - Current Claude `lead.sessionId` (the in-flight Claude Code session) — required by `okstra-team-contract` when registering the lead in `team-state.json`
154
+ - Resume command path: from `task-manifest.json` → `latestResumeCommandPath` (fallback: latest `runs/<task-type>/sessions/claude-resume-*.sh` by mtime). Never reconstruct the filename — the `<seq>` counter is category-local and may diverge from `manifests/`.
@@ -6,6 +6,23 @@ user-invocable: false
6
6
 
7
7
  # OKSTRA Convergence
8
8
 
9
+ ## Index
10
+
11
+ - [Scope and Terminology (BLOCKING)](#scope-and-terminology-blocking)
12
+ - [When to Use](#when-to-use)
13
+ - [Configuration](#configuration)
14
+ - [Finding Category](#finding-category)
15
+ - [Convergence Algorithm](#convergence-algorithm)
16
+ - [Round 0: Parse worker results](#round-0-parse-worker-results)
17
+ - [Round 1-N: Re-verification Loop (queue-pruned)](#round-1-n-re-verification-loop-queue-pruned)
18
+ - [Convergence Test](#convergence-test)
19
+ - [Verification Mode](#verification-mode)
20
+ - [Re-verification Agent Dispatch](#re-verification-agent-dispatch)
21
+ - [Convergence State Artifact](#convergence-state-artifact)
22
+ - [Output](#output)
23
+ - [Convergence Disabled](#convergence-disabled)
24
+ - [Plan-body verification mode (implementation-planning only)](#plan-body-verification-mode-implementation-planning-only)
25
+
9
26
  ## Scope and Terminology (BLOCKING)
10
27
 
11
28
  This skill governs **Phase 5.5 (Convergence loop)** — a *lead operating phase* inside a single okstra run, not a task-type lifecycle phase. The 6 task-type lifecycle phases (`requirements-discovery` → `error-analysis` → `implementation-planning` → `implementation` → `final-verification` → `release-handoff`, see [okstra/SKILL.md](../../SKILL.md) "Lifecycle Phase Boundaries") are unchanged by this skill. The lead operating phases (Phase 1 Intake → Phase 7 Persist, see [okstra/SKILL.md](../../SKILL.md) "Quick Reference") describe how the lead drives a *single* task-type run.
@@ -30,6 +47,8 @@ Configure this in the `convergence` block of `task-manifest.json`. If the block
30
47
  | `maxRounds` | phase-aware: `1` for `requirements-discovery`, `2` otherwise (range 1–3) | Maximum number of re-verification rounds. Discovery's routing/missing-input outputs gain little from a second round; other phases (especially `error-analysis`) keep `2`. Lead resolves the effective value when the manifest omits the key and records it in `config.maxRounds` of the convergence state artifact. |
31
48
  | `verificationMode` | `"lightweight"` | `"lightweight"` or `"full-reanalysis"` |
32
49
 
50
+ **Auto-disable rule (BLOCKING).** Convergence requires ≥2 analyser workers to produce a meaningful consensus tally. When the active profile's `Required workers:` block (see `prompts/profiles/*.md`) resolves to fewer than 2 analyser workers — e.g. `release-handoff` (zero analyser workers, lead-only) — the lead MUST treat `convergence.enabled` as `false` for that run regardless of manifest configuration, skip Phase 5.5 and the plan-body verification round, and record `finalState: "converged"` with `totalRounds: 0` and an explanatory note in `config` (e.g. `"autoDisabled": "fewer-than-two-analysers"`). The plan-body round inherits the same rule via its `gating=false` advisory path.
51
+
33
52
  ## Finding Category
34
53
 
35
54
  | Category | Definition | Included in Report |
@@ -37,10 +56,12 @@ Configure this in the `convergence` block of `task-manifest.json`. If the block
37
56
  | `full-consensus` | All participating workers agree | Required |
38
57
  | `partial-consensus` | Majority of workers agree; dissenting opinions are recorded | Required |
39
58
  | `contested` | Final classification only. Assigned to a finding that remains in the verification queue after the **last executed round** completes (round index = `effectiveMaxRounds`). Each worker's position across all executed rounds is recorded. NEVER used as an intermediate label. | Required |
40
- | `worker-unique` | Only the discoverer confirms; others oppose or remain unverified | Required |
59
+ | `worker-unique` | Only the discoverer confirms and ALL other non-error votes are `DISAGREE`. `verification-error` votes are excluded from the tally per §"Worker failure handling in reverify"; a finding where every non-discoverer vote is `verification-error` is carried forward, never classified `worker-unique`. | Required |
41
60
 
42
61
  ## Convergence Algorithm
43
62
 
63
+ **Majority definition (BLOCKING).** "Majority" means *strictly greater than half* of the non-error votes for that finding (`verification-error` votes are excluded from both numerator and denominator). Ties — including the 1-AGREE / 1-DISAGREE case in a two-analyser roster — are NOT a majority: in intermediate rounds the finding is **carried forward**; in the final executed round the finding is classified `contested`. This rule applies identically to the plan-body verification round (§"Plan-body verification mode") where the same verdict tokens are reused.
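
The arithmetic of the majority rule can be sketched as below. The helper name is hypothetical and the lower-case vote tokens follow the persisted `verdict` enum; the sketch covers only the strict-majority test, not how `supplement` votes are counted or how a finding is finally classified.

```python
from typing import Mapping


def has_majority(votes: Mapping[str, str], verdict: str = "agree") -> bool:
    """True when `verdict` holds strictly more than half of the non-error votes."""
    # verification-error votes leave both the numerator and the denominator.
    counted = [v for v in votes.values() if v != "verification-error"]
    matching = sum(1 for v in counted if v == verdict)
    # Strict inequality: a tie (for example 1 agree / 1 disagree) is not a majority.
    return matching * 2 > len(counted)
```

With a two-analyser roster, `{"claude": "agree", "codex": "disagree"}` yields `False` for both `agree` and `disagree`, which is the tie case that is carried forward in intermediate rounds and classified `contested` in the final executed round.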
64
+
44
65
  ### Round 0: Parse worker results
45
66
 
46
67
  Read the worker result files generated in Phase 4/5 and extract individual findings.
@@ -132,7 +153,8 @@ The lead MUST construct the per-worker reverify prompt body from `items_for_W` o
132
153
  |---|---|---|
133
154
  | `effectiveMaxRounds >= 2` | true | `"max-rounds-1"` |
134
155
  | `len(queue) > 0` after round 1 | true | `"queue-empty"` |
135
- | At least one round-1 reverify dispatch terminated as `completed` | true | `"all-reverify-non-result"` |
156
+
157
+ The third gate condition — "all reverify dispatches terminated as non-result" — is handled inline by the WHILE-loop body (see the `BREAK` on `all dispatches ... terminal non-result` and §"Worker failure handling in reverify" rule 4) which records `round2SkippedReason = "all-reverify-non-result"` and aborts before the predicate is re-evaluated. It is therefore not duplicated as a gate row here.
136
158
 
137
159
  When all conditions hold the predicate returns `true` and `round2SkippedReason` is set to `"not-skipped"`. The field is mandatory on every convergence state artifact — write `"not-skipped"` rather than omitting the key.
138
160
 
@@ -152,11 +174,7 @@ The final classifier (`FOR each finding F still in queue` block) treats `verific
152
174
 
153
175
  ### Convergence Test
154
176
 
155
- - If the verification queue is empty at the end of any round → Convergence complete (`finalState: "converged"`), remaining rounds are not executed
156
- - Upon completing the **last executed round** (where round index == `effectiveMaxRounds`, OR where Round 2 was suppressed per the Round 2 gate below) → Apply final classification to remaining queue items:
157
- - Majority agreement across executed rounds → `partial-consensus`
158
- - Otherwise → `contested`
159
- - The final classification step never runs while the queue is still being re-verified — confirmed items always exit the queue first.
177
+ The exit conditions and final-classification rules are defined by the §"Convergence Algorithm" pseudocode (the `WHILE` exit, the post-loop `FOR each finding F still in queue` block, and the `finalState` mapping in §"Convergence State Artifact"). This is the single source — no separate prose copy is maintained here to prevent drift.
160
178
 
161
179
  ## Verification Mode
162
180
 
@@ -383,9 +401,11 @@ Schema rules:
383
401
  - `schemaVersion`: literal string `"1.1"` for new runs. Readers MUST accept `"1.0"` for historical artifacts and treat any missing v1.1 field as `null`.
384
402
  - `config.effectiveMaxRounds`: the integer the lead actually used after resolving the phase-aware default (`1` for `requirements-discovery`, `2` otherwise). MUST equal `config.maxRounds` when the manifest explicitly set it.
385
403
  - `findings[].ticketIds`: array of ticket keys from Phase 4 grouping (parsed per the Round 0 step 5 rule). MAY be empty when the discovering worker tagged the finding `unknown`.
404
+ - `findings[].rounds[].votes.<worker>.verdict`: enum, one of `agree | disagree | supplement | verification-error`. Lower-case tokens; map upper-case AGREE/DISAGREE/SUPPLEMENT verdicts emitted by workers to their lower-case form before persisting. `verification-error` is reserved for terminal non-result dispatches (§"Worker failure handling in reverify").
405
+ - `findings[].classification`: enum, one of `full-consensus | partial-consensus | worker-unique | contested`. No other value is permitted in v1.1.
386
406
  - `roundHistory[].inputQueueSize`: queue size at the start of this round.
387
407
  - `roundHistory[].resolvedCount`: number of findings that exited the queue this round (sum of full+partial+worker-unique classifications produced this round).
388
- - `roundHistory[].carriedForwardCount`: queue size at the END of this round (must equal `inputQueueSize - resolvedCount` when there are no in-round queue insertions; in-round insertions are forbidden).
408
+ - `roundHistory[].carriedForwardCount`: queue size at the END of this round; this is the single definition. In-round insertions into the queue are forbidden, so this always equals `inputQueueSize - resolvedCount`. The pseudocode's per-item `carriedForwardCount += 1` accumulator is a counting convenience that lands on the same value; persist the post-round queue length, not the loop accumulator, if the two ever diverge.
389
409
  - `roundHistory[].dispatches[]`: one entry per worker that was actually dispatched in this round. Each entry is `{worker, status, durationMs}`. `status ∈ {completed, timeout, error, not-run}`. `durationMs` is integer milliseconds and is always present, even for terminal-non-result dispatches (use the elapsed time before the wrapper gave up).
390
410
  - `roundHistory[].skippedWorkers[]`: per-worker `{worker, reason}` for workers with no items to verify OR with a non-result dispatch.
391
411
  - `roundHistory[].verificationsRequested|verificationsCompleted|newConsensus|remainingInQueue|earlyExit`: legacy v1.0 aliases. New runs SHOULD populate them so existing parsers keep working: `verificationsRequested == len(dispatches)`, `verificationsCompleted == len(d for d in dispatches if d.status == "completed")`, `newConsensus == resolvedCount`, `remainingInQueue == carriedForwardCount`, `earlyExit == (round < effectiveMaxRounds AND carriedForwardCount == 0)`.
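
The legacy v1.0 aliases listed above are all derivable from the v1.1 fields. A minimal sketch, assuming a round record is a plain dict with the `dispatches`, `resolvedCount`, and `carriedForwardCount` keys described in the schema rules; the function name is hypothetical.

```python
from typing import Any, Dict


def legacy_aliases(round_index: int, effective_max_rounds: int,
                   record: Dict[str, Any]) -> Dict[str, Any]:
    """Derive the v1.0 alias fields for one roundHistory entry."""
    dispatches = record["dispatches"]
    carried = record["carriedForwardCount"]
    return {
        "verificationsRequested": len(dispatches),
        # Count of dispatches whose status is "completed".
        "verificationsCompleted": sum(
            1 for d in dispatches if d["status"] == "completed"
        ),
        "newConsensus": record["resolvedCount"],
        "remainingInQueue": carried,
        "earlyExit": round_index < effective_max_rounds and carried == 0,
    }
```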
@@ -407,3 +427,210 @@ Information to be passed to Phase 6 after executing this skill:
407
427
  ## Convergence Disabled
408
428
 
409
429
  If `convergence.enabled: false`, this skill is skipped. Phase 6 operates using the existing consensus/divergence method.
430
+
431
+ ## Plan-body verification mode (implementation-planning only)
432
+
433
+ This section defines a **second, independent** convergence round that fires only for `task-type = implementation-planning`. The round verifies the *consolidated plan* that the report-writer worker has authored, not the worker findings that were already reconciled earlier.
434
+
435
+ ### Lifecycle position (BLOCKING)
436
+
437
+ Plan-body verification runs **after** finding convergence and **after** the report-writer draft is written. Sequence inside a single implementation-planning run:
438
+
439
+ ```
440
+ Phase 4 workers produce independent analyses (Findings F-001…)
441
+ → Phase 5.5 FINDING convergence (this skill, sections "Convergence Algorithm" through "Convergence State Artifact")
442
+ → Phase 6 report-writer authors final-report draft (consolidated Option Candidates / Stepwise Execution Order / Dependency / Validation Checklist / Rollback)
443
+ → PLAN-BODY VERIFICATION ROUND ← new — described below
444
+ → User Approval gate (top-of-report `- [ ] Approved` marker is rendered only when this round's Gate result is `passed` or `passed-with-dissent`)
445
+ → implementation phase (separate run)
446
+ ```
447
+
448
+ Plan-body verification MUST NOT replace, precede, or be conflated with the Phase 5.5 finding convergence above. They are two distinct rounds with different inputs (findings vs. consolidated plan body), different ID schemes (`F-*` vs. `P-*`), and different state files.
449
+
450
+ ### MUTUAL EXCLUSION (BLOCKING)
451
+
452
+ The finding queue (Phase 5.5) and the plan-item queue (this section) are **disjoint**:
453
+
454
+ - A finding-convergence reverify prompt MUST NOT contain any `P-*` item.
455
+ - A plan-body verification prompt MUST NOT contain any `F-*` finding.
456
+ - The two rounds write to **different state files**: `runs/<task-type>/state/convergence-<task-type>-<seq>.json` (findings, see §"Convergence State Artifact") vs. `runs/<task-type>/state/plan-body-verification-<task-type>-<seq>.json` (plan items, see §"`plan-body-verification.json` schema").
457
+ - Aggregation logic (verdict counting, classification) MUST NOT carry votes from one queue into the other.
458
+
459
+ Mixing the two queues — for example, parsing a Phase 6 draft's Stepwise Execution Order step as if it were an `F-*` finding — is a contract violation. Future Claude reading this skill: if you find yourself tempted to "just reuse the finding queue for plan items, they're similar enough", stop. They are not similar enough; the verdict semantics differ (see §"Plan-body verdict semantics" below).
460
+
461
+ ### Configuration
462
+
463
+ Plan-body verification is configured under `convergence.planBodyVerification` in `task-manifest.json`:
464
+
465
+ | Setting | Default | Description |
466
+ |---------|---------|-------------|
467
+ | `enabled` | `true` | If `false`, the round is skipped and the top-of-report Approval marker is rendered unconditionally (legacy behaviour). |
468
+ | `maxRounds` | `1` | Hard upper bound. Plan-body verification is consistency / completeness checking, not fact checking — additional rounds rarely help. Range 1–3. |
469
+ | `gating` | `true` | If `true` (default), `majority-disagree` blocks the Approval marker. If `false`, the round is advisory-only and the marker always renders. |
470
+
471
+ Default values are emitted into the manifest by `scripts/okstra_ctl/render.py` (`_build_convergence_block`). The ctx knob `OKSTRA_PLAN_VERIFICATION=false` flips `planBodyVerification.enabled` to false.
472
+
473
+ ### Plan-item extraction (Round 0 equivalent)
474
+
475
+ From the report-writer's draft of `## 4.5 Implementation Plan Deliverables`, lead extracts plan items with the following prefixes (see also `templates/reports/final-report.template.md` §4.5.9):
476
+
477
+ | Prefix | Source sub-section | One row per |
478
+ |--------|--------------------|-------------|
479
+ | `P-Opt-<N>` | `4.5.1 Option Candidates` | one Option (its File Structure list + interfaces + blast radius) |
480
+ | `P-Step-<N>` | `4.5.4 Stepwise Execution Order` | one step (path + command + success signal) |
481
+ | `P-Dep-<N>` | `4.5.5 Dependency / Migration Risk` | one dependency row |
482
+ | `P-Val-<N>` | `4.5.6 Validation Checklist` | one checklist item |
483
+ | `P-Rb-<N>` | `4.5.7 Rollback Strategy` | one rollback path |
484
+
485
+ `4.5.2 Trade-off Matrix` and `4.5.3 Recommended Option` are NOT extracted as standalone plan items — the trade-off matrix is evaluated implicitly through each option's `P-Opt-*` verification, and the recommended option is one of those `P-Opt-*` rows.
486
+
487
+ Each plan item inherits the `[TICKETID: ...]` tag of its source section (per the standard ticket-tagging contract).
488
+
489
+ ### Plan-body verdict semantics
490
+
491
+ The verdict tokens `AGREE` / `DISAGREE` / `SUPPLEMENT` are reused, but their meaning is plan-specific:
492
+
493
+ - **AGREE**: the item is executable as written *and* internally consistent with other items in the plan.
494
+ - **DISAGREE(<kind>)**: the item is broken. `<kind>` MUST be one of:
495
+ - `a` — referenced file path / symbol mismatches another step or option's File Structure list
496
+ - `b` — command is not executable or is ambiguous
497
+ - `c` — validation signal is not observable
498
+ - `d` — rollback violates commit / dependency order
499
+ - `e` — item contradicts the trade-off matrix
500
+ - **SUPPLEMENT**: the item is sound but is missing a dependency / edge case / precondition.
501
+
502
+ Worker non-result handling (`timeout`, `error`, no result file, wrapper `cli-failure`) is identical to finding convergence: do NOT aggregate as DISAGREE, record `contract-violation`, and apply the round-level abort rule below.
503
+
504
+ ### Mode constraint
505
+
506
+ Plan-body verification only supports **lightweight mode** (defined in §"Verification Mode" above). `full-reanalysis` is not meaningful here because the "original source materials" for a plan item are the worker's own analysis plus the lead-mediated synthesis — there is no independent ground truth to re-read. The manifest's top-level `verificationMode` is ignored for this round; lightweight is always used.
507
+
508
+ ### Round protocol (single round at default `maxRounds=1`)
509
+
510
+ 1. Lead parses the report-writer draft and extracts the `P-*` plan items.
511
+ 2. For each analyser worker in the roster (`claude`, `codex`, and `gemini` if opted in), lead constructs a reverify prompt using the template in §"Plan-body reverify prompt" below.
512
+ 3. Dispatch uses the same wrapper infrastructure as finding convergence. The `--role-slug` is `<role>-plan-verify-r<N>`. Result file path: `runs/<task-type>/worker-results/<role-slug>-plan-verify-r<N>-implementation-planning-<seq>.md`.
513
+ 4. After all dispatches return, lead aggregates verdicts per `P-*` item across workers and classifies each:
514
+ - `full-consensus` — all participating analysers `AGREE` (SUPPLEMENT counts as agree on the item itself).
515
+ - `partial-consensus` — majority `AGREE`, dissenting `DISAGREE` recorded.
516
+ - `dissent-isolated` — only one worker `DISAGREE`s, others `AGREE` — treat as `partial-consensus` for gate purposes; record dissent. (Distinct from finding-convergence `worker-unique`, which means the *opposite*: only one worker AGREEs. Plan-body classifications use this dedicated label to avoid the collision.)
517
+ - `majority-disagree` — majority of analysers `DISAGREE` on this item. This is the only classification that **blocks the Approval marker**.
518
+ - `contested`: only meaningful when `maxRounds > 1`; at default `maxRounds=1`, fold any unresolved item into `partial-consensus`.
519
+ 5. Gate result resolution:
520
+ - any `majority-disagree` item present AND `gating=true` → `blocked-by-disagreement`
521
+ - all dispatches non-result → `aborted-non-result`
522
+ - any `partial-consensus` / `dissent-isolated` present, no `majority-disagree` → `passed-with-dissent`
523
+ - all items `full-consensus` → `passed`
524
+ 6. Lead writes `runs/<task-type>/state/plan-body-verification-<task-type>-<seq>.json` (schema below) and populates `### 4.5.9 Plan Body Verification` in the final report (template at `templates/reports/final-report.template.md`).
525
+ 7. For every `majority-disagree` item, lead adds a row to `## 5. Clarification Items` with:
526
+ - new `C-<N>` ID (numbering continues from any existing rows)
527
+ - `Statement` summarising the disagreement and the worker breakage `<kind>`
528
+ - `Kind` chosen per the standard policy (usually `decision` for option-level conflicts, `data-point` for path/symbol mismatches)
529
+ - `Blocks=approval`
530
+ - the §4.5.9 verdict table's `Classification` column for that row reads `majority-disagree → C-<N>` (1:1 ID match — orphan on either side is a contract violation per `prompts/profiles/implementation-planning.md` self-review step 6).
531
+ 8. The top-of-report `- [ ] Approved` marker line is rendered if and only if the Gate result is `passed` or `passed-with-dissent`. `validators/validate-run.py` `validate_phase_boundary` enforces this correspondence; manually adding the marker line when the gate did not pass is a contract violation.
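
Steps 5 and 8 of the protocol above can be sketched as two small functions. The names are hypothetical. Two behaviours are assumptions the text does not spell out: a `majority-disagree` item under `gating=false` and a `contested` item (only possible when `maxRounds > 1`) both resolve to `passed-with-dissent` here.

```python
from typing import Sequence


def resolve_gate_result(classifications: Sequence[str],
                        dispatch_statuses: Sequence[str],
                        gating: bool = True) -> str:
    """Resolve the plan-body gate result from per-item classifications."""
    if gating and "majority-disagree" in classifications:
        return "blocked-by-disagreement"
    # Every dispatch ended as a terminal non-result (timeout, error, ...).
    if dispatch_statuses and all(s != "completed" for s in dispatch_statuses):
        return "aborted-non-result"
    if all(c == "full-consensus" for c in classifications):
        return "passed"
    # partial-consensus / dissent-isolated present, plus the assumed cases above.
    return "passed-with-dissent"


def approval_marker_rendered(gate_result: str) -> bool:
    """Step 8: the `- [ ] Approved` line renders only for passing gate results."""
    return gate_result in ("passed", "passed-with-dissent")
```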
532
+
533
+ ### `plan-body-verification-<task-type>-<seq>.json` schema
534
+
535
+ ```json
536
+ {
537
+ "schemaVersion": "1.0",
538
+ "phase": "implementation-planning",
539
+ "round": 1,
540
+ "effectiveMaxRounds": 1,
541
+ "gating": true,
542
+ "verificationMode": "lightweight",
543
+ "gateResult": "passed | passed-with-dissent | blocked-by-disagreement | aborted-non-result",
544
+ "planItems": [
545
+ {
546
+ "id": "P-Opt-1",
547
+ "sourceSection": "4.5.1",
548
+ "ticketId": "<id-or-unknown>",
549
+ "votes": {"claude-worker": "AGREE", "codex-worker": "AGREE"},
550
+ "classification": "full-consensus",
551
+ "clarificationId": null
552
+ },
553
+ {
554
+ "id": "P-Step-3",
555
+ "sourceSection": "4.5.4",
556
+ "ticketId": "TICKET-123",
557
+ "votes": {"claude-worker": "DISAGREE(a)", "codex-worker": "DISAGREE(a)"},
558
+ "classification": "majority-disagree",
559
+ "clarificationId": "C-7"
560
+ }
561
+ ],
562
+ "dispatches": [
563
+ {"role": "claude-worker", "resultPath": "...", "terminalStatus": "completed"}
564
+ ]
565
+ }
566
+ ```
567
+
568
+ `dispatches[].terminalStatus` mirrors finding convergence (`completed | timeout | error | not-run | cli-failure`).
569
+
570
+ `planItems[].classification` enum: `full-consensus | partial-consensus | dissent-isolated | majority-disagree | contested`. `contested` only appears when `maxRounds > 1`; at default `maxRounds=1` any otherwise-unresolved item folds into `partial-consensus` per the round protocol above.
571
+
572
+ `planItems[].votes.<worker>` is the verbatim verdict token emitted by the worker — `AGREE | DISAGREE(<a|b|c|d|e>) | SUPPLEMENT` — or `verification-error` for terminal non-result dispatches. The `DISAGREE` token retains its `<kind>` suffix so the breakage class is recoverable from the state file alone.
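
A reader of the state file could recover the breakage class from a vote token roughly as follows. This is a sketch with a hypothetical helper name; rejecting a bare `DISAGREE` without a `<kind>` suffix is an assumption based on the "MUST be one of" wording in the verdict semantics.

```python
import re
from typing import Optional, Tuple

# AGREE | DISAGREE(<a|b|c|d|e>) | SUPPLEMENT | verification-error
_VOTE_RE = re.compile(r"^(AGREE|SUPPLEMENT|verification-error|DISAGREE\(([a-e])\))$")


def parse_plan_vote(token: str) -> Tuple[str, Optional[str]]:
    """Split a persisted vote token into (verdict, kind)."""
    match = _VOTE_RE.match(token.strip())
    if match is None:
        raise ValueError(f"unrecognised plan-body vote token: {token!r}")
    kind = match.group(2)
    if kind is not None:
        return ("DISAGREE", kind)
    return (match.group(1), None)
```

For example, `parse_plan_vote("DISAGREE(a)")` returns `("DISAGREE", "a")`, the path/symbol mismatch class.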
573
+
574
+ ### Plan-body reverify prompt
575
+
576
+ Required prompt anchor headers are identical to finding convergence (see §"Required reverify-prompt anchor headers"). The prompt body changes from F-* listing to P-* listing:
577
+
578
+ ```
579
+ You are <worker-role> performing plan-body verification for <task-key> (round 1).
580
+
581
+ ## Instructions
582
+
583
+ Review the following items extracted from the consolidated implementation plan
584
+ authored after your initial analysis. For EACH item, respond with exactly one
585
+ verdict:
586
+
587
+ - **AGREE**: The item is executable as written and internally consistent with
588
+ other items in the plan.
589
+ - **DISAGREE(<kind>)**: The item is broken. Cite which kind:
590
+ (a) referenced file path / symbol mismatches another step or option,
591
+ (b) command is not executable or is ambiguous,
592
+ (c) validation signal is not observable,
593
+ (d) rollback violates commit / dependency order,
594
+ (e) item contradicts the trade-off matrix.
595
+ - **SUPPLEMENT**: The item is sound but a dependency / edge case / precondition
596
+ is missing.
597
+
598
+ Do NOT re-analyze the original requirements. Judge solely from plan internal
599
+ consistency and stated commands / paths. Do NOT inspect the original task brief
600
+ or worker analyses for this round.
601
+
602
+ ## Plan items to verify
603
+
604
+ ### P-Step-3 [TICKETID: <id>]: <one-line summary>
605
+ **From section**: 4.5.4 Stepwise Execution Order
606
+ **Original text**:
607
+ > <verbatim quote of the step>
608
+
609
+ **Check**:
610
+ - Are referenced file paths consistent with the option's File Structure list?
611
+ - Is the named command executable as written?
612
+ - Does the success criterion produce an observable signal?
613
+
614
+ ### P-Opt-2 [TICKETID: <id>]: <one-line summary>
615
+ ...
616
+
617
+ ## Response format
618
+
619
+ ### P-Step-3
620
+ **Verdict**: AGREE | DISAGREE(<a|b|c|d|e>) | SUPPLEMENT
621
+ **Explanation**: <2-3 sentences>
622
+
623
+ ### P-Opt-2
624
+ ...
625
+ ```
626
+
627
+ The "Reverify prompt: required-reading suppression (BLOCKING)" rule (lightweight mode does NOT inject a `[Required reading]` clause) applies here as well.
628
+
629
+ ### Worker non-result handling in plan-body round (BLOCKING)
630
+
631
+ Mirrors finding convergence (§"Worker failure handling in reverify"). Concretely:
632
+
633
+ - A dispatch that returns terminal non-result MUST NOT be aggregated as `DISAGREE`.
634
+ - If at least one dispatch was issued AND **all** plan-body dispatches return non-result, the Gate result is `aborted-non-result`. Record one `contract-violation` event per non-result dispatch.
635
+ - The Approval marker is NOT rendered when the gate is `aborted-non-result`. A single row is added to `## 5. Clarification Items` with `Statement="plan-body verification could not run — all workers returned non-result"`, `Kind=decision`, `Blocks=approval`, allowing the user to either retry the phase or override by manually approving the plan (via `--approve` on the resume command).
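
The non-result rules above can be sketched as a small check over the `dispatches` array. This is an illustration under stated assumptions, not okstra's implementation: it treats every `terminalStatus` other than `completed` as a terminal non-result, and both the function name and the event dictionary shape are hypothetical.

```python
from typing import Optional

# Assumption: all terminalStatus values except "completed" count as non-result.
NON_RESULT_STATUSES = {"timeout", "error", "not-run", "cli-failure"}


def non_result_gate(dispatches: list[dict]) -> tuple[Optional[str], list[dict]]:
    """Return ("aborted-non-result", events) when every issued dispatch failed.

    Returns (None, []) otherwise, so the caller continues with normal vote
    aggregation. Non-result dispatches are never converted into DISAGREE votes.
    """
    failed = [d for d in dispatches if d.get("terminalStatus") in NON_RESULT_STATUSES]
    if dispatches and len(failed) == len(dispatches):
        # One contract-violation event per non-result dispatch (hypothetical shape).
        events = [{"event": "contract-violation", "role": d.get("role")} for d in failed]
        return "aborted-non-result", events
    return None, []
```

An empty `dispatches` list does not abort the gate in this sketch, matching the "at least one dispatch was issued" condition.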
636
+
@@ -7,9 +7,22 @@ description: Use when the user asks to list past okstra runs, check execution hi
7
7
 
8
8
  ## When to Use
9
9
 
10
- - When a user views the history of past okstra executions
11
- - When re-running or resuming a previous execution
12
- - When checking the execution status of each task
10
+ - List past okstra task executions (one row per task, with latest-run summary).
11
+ - Drill into the per-run history of a specific task.
12
+ - Build a new run from an old run's parameters (re-run), or continue an in-flight one (resume).
13
+
14
+ This skill is for **listing / re-dispatching**. For reading a single final report by task-key, use `okstra-report-finder` instead.
15
+
16
+ ## Re-run vs Resume — decide upfront
17
+
18
+ Before invoking Step 3 or Step 4, classify the user's intent. The two paths are NOT interchangeable.
19
+
20
+ | Intent | Trigger phrases | Use |
21
+ |---|---|---|
22
+ | **Re-run** — start a fresh run (new run-seq, new manifest, new report) reusing an old run's parameters | "re-run", "다시 실행", "another pass", "rerun with same brief" | Step 3 |
23
+ | **Resume** — continue an interrupted Claude session for an existing run, no new run-seq | "resume", "continue", "이어서", "session ended" | Step 4 |
24
+
25
+ If the user's intent is ambiguous, ask. Defaulting to the wrong path either wastes a fresh run-seq or silently abandons a recoverable session.
13
26
 
14
27
  ## Step 0: Verify okstra runtime + project setup
15
28
 
@@ -38,31 +51,45 @@ use `projectRoot` to locate `.project-docs/okstra/discovery/task-catalog.json`.
38
51
  ## Step 1: Read the Task Catalog
39
52
 
40
53
  1. Read `.project-docs/okstra/discovery/task-catalog.json`.
41
- 2. Sort the `tasks` array in reverse order by `updatedAt` and display it.
42
- 3. Extract the following fields from each task:
54
+ 2. Apply filters from user input (all optional, AND-combined):
55
+ - `--task-type <type>` → keep entries whose `taskType` matches.
56
+ - `--latest-run-status <status>` → keep entries whose `latestRunStatus` matches (e.g. `completed`, `contract-violated`, `error`).
57
+ - `--task-group <group>` → keep entries whose `taskGroup` matches.
58
+ 3. Sort the surviving `tasks` array by `updatedAt` descending.
59
+ 4. Page: default `--limit 20`. After printing the table, if rows were truncated, add `... <N> more (pass --limit <N> to see all)`.
60
+ 5. Extract the following fields from each task:
43
61
 
44
62
  | Field | Description |
45
63
  |------|------|
46
- | `taskKey` | Task identifier (`<project-id>:<task-group>:<task-id>`) |
47
- | `taskType` | Analysis type |
48
- | `currentStatus` | Task-level status |
49
- | `latestRunStatus` | Latest run status |
50
- | `updatedAt` | Last update time |
64
+ | `taskKey` | Task identifier — always 3 colon-separated segments: `<project-id>:<task-group>:<task-id>` (see `parse_task_key` in `okstra_project/state.py`). |
65
+ | `taskType` | `requirements-discovery` / `error-analysis` / `implementation-planning` / `implementation` / `final-verification` / `release-handoff` |
66
+ | `currentStatus` | Task lifecycle status written by the contract validator. Values: `todo` (seeded by spawn-followups), `completed`, `contract-violated`. Empty string = validator has not yet run. NOT the same as the user-managed `workStatus` (managed by `okstra-status`). |
67
+ | `latestRunStatus` | Status of the most recent run (`completed`, `contract-violated`, `error`, ...) |
68
+ | `latestRunManifestPath` | Run-manifest path of the most recent run — feed this into Step 3 to re-run from the latest parameters |
69
+ | `updatedAt` | Last update time (ISO 8601) |
51
70
  | `latestReportPath` | Latest report path |
52
- | `latestResumeCommandPath` | Resume command path |
71
+ | `latestResumeCommandPath` | Resume command path (Step 4) |
72
+ | `historyTimelinePath` | `<task-root>/history/timeline.json` (Step 2 reads from here) |
53
73
 
54
- 4. Output format:
74
+ 6. Output format:
55
75
 
56
76
  ```markdown
57
77
  ## okstra Task History — <project-id>
58
78
 
59
- | # | Task Key | Type | Status | Last Run | Report |
60
- |---|----------|------|--------|----------|--------|
61
- | 1 | proj:group:id | error-analysis | completed | 2026-04-05 22:59 | .project-docs/.../final-report-*.md |
62
- | 2 | proj:group:id2 | final-verification | prepared | 2026-04-04 15:30 | -- |
79
+ | # | Task Key | Type | currentStatus | latestRunStatus | Last Run | Report |
80
+ |---|----------|------|---------------|------------------|----------|--------|
81
+ | 1 | proj:group:id | error-analysis | completed | completed | 2026-04-05 22:59 | .project-docs/.../final-report-*.md |
82
+ | 2 | proj:group:id2 | final-verification | todo | error | 2026-04-04 15:30 | -- |
63
83
  ```
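
The filter, sort, and paging steps of Step 1 can be sketched as follows. This is a hedged illustration: `select_tasks` is a hypothetical name, and the sort assumes `updatedAt` values are ISO 8601 strings in a uniform format so that string order matches time order.

```python
def select_tasks(tasks, task_type=None, latest_run_status=None, task_group=None, limit=20):
    """Apply the optional AND-combined filters, sort newest first, and page.

    Returns (visible_rows, hidden_count) so the caller can print the
    "... <N> more" hint when rows were truncated.
    """
    filters = {
        "taskType": task_type,
        "latestRunStatus": latest_run_status,
        "taskGroup": task_group,
    }
    # A filter set to None is treated as "not supplied" and matches everything.
    kept = [
        t for t in tasks
        if all(want is None or t.get(field) == want for field, want in filters.items())
    ]
    kept.sort(key=lambda t: t.get("updatedAt", ""), reverse=True)
    return kept[:limit], max(0, len(kept) - limit)
```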
64
84
 
65
- 5. If `task-catalog.json` is missing, it responds with "There is no okstra execution history. Please run okstra.sh first."
85
+ ### Catalog absent fallback
86
+
87
+ If `.project-docs/okstra/discovery/task-catalog.json` does not exist, do NOT bail out. The catalog is a derived index — manifests on disk are the source of truth.
88
+
89
+ 1. Glob `<projectRoot>/.project-docs/okstra/tasks/*/*/task-manifest.json`.
90
+ 2. For each manifest, read `taskKey`, `taskGroup`, `taskType`, `currentStatus`, `latestRunStatus`, `updatedAt`, `latestReportPath`, `latestResumeCommandPath`, `latestRunManifestPath`, `historyTimelinePath`.
91
+ 3. Apply the same filters/sort/limit and print the same table, prefixed with: `note: task-catalog.json missing; reconstructed from task manifests on disk.`
92
+ 4. Only if the glob yields zero manifests: respond `There is no okstra execution history yet.`
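
A minimal sketch of the fallback above, assuming the catalog keeps its entries under a top-level `tasks` array as Step 1 describes. The function name `load_task_entries` and the returned `reconstructed` flag are hypothetical.

```python
import json
from pathlib import Path


def load_task_entries(project_root):
    """Return (entries, reconstructed).

    Reads the task catalog when it exists; otherwise rebuilds the entry list
    from the task manifests on disk, which are the source of truth.
    """
    root = Path(project_root) / ".project-docs" / "okstra"
    catalog = root / "discovery" / "task-catalog.json"
    if catalog.exists():
        data = json.loads(catalog.read_text(encoding="utf-8"))
        return data.get("tasks", []), False
    # Same pattern as the glob in step 1 of the fallback.
    manifests = sorted(root.glob("tasks/*/*/task-manifest.json"))
    entries = [json.loads(p.read_text(encoding="utf-8")) for p in manifests]
    return entries, True
```

When `reconstructed` is true and `entries` is non-empty, the caller would prefix the table with the `note: task-catalog.json missing; ...` line; an empty list corresponds to the "no execution history yet" response.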
66
93
 
67
94
  ## Step 2: Run History by Task
68
95
 
@@ -78,9 +105,10 @@ When a user selects a specific task or requests detailed history:
78
105
  | `runDateTimeSegment` | YYYY-MM-DD_HH-MM-SS |
79
106
  | `taskType` | `--task-type` argument value |
80
107
  | `status` | Run status |
81
- | `reportPath` | Report Path |
82
- | `resumeCommandPath` | resume Command path |
83
- | `relatedTasks` | List of related tasks |
108
+ | `runManifestPath` | This run's `run-manifest-*.json` — feed into Step 3 to re-run from this specific run's parameters |
109
+ | `reportPath` | Final report path |
110
+ | `resumeCommandPath` | Path to the command that resumes the Claude session for this run (Step 4) |
111
+ | `relatedTasks` | List of related task-keys |
84
112
 
85
113
  4. Output format:
86
114
 
@@ -93,23 +121,26 @@ When a user selects a specific task or requests detailed history:
93
121
  | 2 | 2026-04-04 15:30 | error-analysis | error | -- |
94
122
  ```
95
123
 
96
- ## Step 3: Create a re-execution command
124
+ ## Step 3: Re-run (build a NEW run from an old run's parameters)
97
125
 
98
- To re-run a specific run:
126
+ This builds a **fresh run** — new run-seq, new manifest, new report — using the parameters captured in a previous `run-manifest-*.json`. It does NOT touch the old run's artifacts; use Step 4 if the user wants to continue an interrupted session instead.
99
127
 
100
- 1. Read the run-manifest JSON from the `runManifestPath` of that run.
101
- 2. Extract the required arguments:
128
+ 1. Pick the source run-manifest: the `runManifestPath` from a Step 2 timeline entry, or the task's `latestRunManifestPath` from Step 1.
129
+ 2. Read the run-manifest JSON and extract required arguments:
102
130
  - `projectId` → `--project-id`
103
131
  - `taskGroup` → `--task-group`
104
132
  - `taskId` → `--task-id`
105
133
  - `taskType` → `--task-type`
106
134
  - `taskBriefPath` → `--task-brief`
107
- 3. Extract the optional arguments:
135
+ 3. Extract optional arguments (include only when present in the source manifest):
108
136
  - `recommendedWorkers` → `--workers` (comma-separated)
109
- - `relatedTasks` → `--related-tasks` (if present)
110
- - model overrides → `--claude-model`, `--codex-model`, `--gemini-model` (if different from default)
111
- - for `taskType: implementation`: `teamContract.executor.provider` → `--executor <claude|codex|gemini>` (if different from `claude`)
112
- 4. Display the assembled command:
137
+ - `relatedTasks` → `--related-tasks`
138
+ - model overrides → `--claude-model`, `--codex-model`, `--gemini-model` (when different from default)
139
+ - for `taskType: implementation`: `teamContract.executor.provider` → `--executor <claude|codex|gemini>` (when different from `claude`)
140
+ 4. **`taskType: implementation` only — resolve `--base-ref`:** the base ref is NOT stored in the run-manifest; it lives in the worktree registry at `~/.okstra/worktrees/registry.json` against the registered branch. Before assembling the command:
141
+ - If a worktree for this task-key is already registered, the existing branch and base are reused; omit `--base-ref` unless the user explicitly wants a different starting point.
142
+ - If no worktree is registered (e.g. it was cleaned up), `--base-ref` is mandatory. Ask the user for the ref to branch from (e.g. `main`, a commit SHA, a tag) before running.
143
+ 5. Display the assembled command:
113
144
 
114
145
  ```bash
115
146
  okstra.sh \
@@ -121,23 +152,23 @@ okstra.sh \
121
152
  --workers <worker-list>
122
153
  ```
123
154
 
124
- 5. Once the user confirms, execute it using the Bash tool.
155
+ 6. Once the user confirms, execute it using the Bash tool.
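
The mapping in Step 3 can be sketched as a function that turns a run-manifest dictionary into an argument list. This is an illustration only: `build_rerun_argv` is a hypothetical helper, `recommendedWorkers` is assumed to be a list of strings, and `relatedTasks` and model overrides are left out because their on-disk shape is not described here.

```python
import shlex

# Manifest field -> CLI flag, as listed under "required arguments" in Step 3.
REQUIRED_FLAGS = {
    "projectId": "--project-id",
    "taskGroup": "--task-group",
    "taskId": "--task-id",
    "taskType": "--task-type",
    "taskBriefPath": "--task-brief",
}


def build_rerun_argv(manifest, base_ref=None):
    """Assemble the okstra.sh argument list for a fresh run from an old manifest."""
    argv = ["okstra.sh"]
    for field, flag in REQUIRED_FLAGS.items():
        argv += [flag, str(manifest[field])]  # KeyError if a required field is missing
    workers = manifest.get("recommendedWorkers")
    if workers:  # assumed: list of worker names, joined comma-separated
        argv += ["--workers", ",".join(workers)]
    if manifest["taskType"] == "implementation":
        provider = manifest.get("teamContract", {}).get("executor", {}).get("provider")
        if provider and provider != "claude":
            argv += ["--executor", provider]
        if base_ref:  # resolved by the caller; not stored in the run-manifest
            argv += ["--base-ref", base_ref]
    return argv


def display_command(argv):
    """Render the argv as a shell-quoted string to show the user before running."""
    return shlex.join(argv)
```

Keeping the command as a list until display time avoids quoting mistakes when a brief path contains spaces.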
125
156
 
126
- ## Step 4: Resume
157
+ ## Step 4: Resume (continue an interrupted run)
127
158
 
128
- To resume a paused session:
159
+ This continues an existing Claude session for a run that did not finish. It does NOT create a new run-seq — for a fresh dispatch, use Step 3.
129
160
 
130
- 1. Check `latestResumeCommandPath` in the task catalog or timeline.
131
- 2. Verify that the resume script file actually exists.
132
- 3. If it exists, display the execution command:
161
+ 1. Read `latestResumeCommandPath` from the task catalog (Step 1) — or `resumeCommandPath` from a specific timeline entry (Step 2).
162
+ 2. Verify the file exists on disk.
163
+ 3. If present:
133
164
  ```bash
134
165
  bash <resume-command-path>
135
166
  ```
136
- 4. If it does not exist, display the message: "No resume script found. Please run it again."
167
+ 4. If absent, report: `No resume script available for this run. Use Step 3 to start a fresh run instead.`
137
168
 
138
169
  ## Output Rules
139
170
 
140
171
  - Display concisely in a table format
141
172
  - Dates in `YYYY-MM-DD HH:MM` format
142
- - Display status as-is (`completed`, `prepared`, `error`, `not-run`, etc.)
173
+ - Display status fields as-is from disk (`completed`, `contract-violated`, `todo`, `error`, empty, ...). Do not normalize or remap.
143
174
  - Display `--` if no report is available