npm - okstra - Versions diffs - 0.25.1 → 0.27.0 - Mend

okstra 0.25.1 → 0.27.0


Files changed (48)

package/README.kr.md +16 -0
package/README.md +16 -0
package/docs/kr/architecture.md +3 -7
package/docs/kr/cli.md +47 -4
package/docs/kr/performance-improvement-plan-v2.md +23 -0
package/docs/kr/performance-improvement-plan.md +22 -0
package/docs/superpowers/specs/2026-05-15-implementation-plan-verification-design.md +254 -0
package/package.json +1 -1
package/runtime/BUILD.json +2 -2
package/runtime/agents/SKILL.md +30 -2
package/runtime/bin/okstra.sh +1 -1
package/runtime/prompts/profiles/_common-contract.md +30 -1
package/runtime/prompts/profiles/error-analysis.md +12 -0
package/runtime/prompts/profiles/implementation-planning.md +23 -0
package/runtime/prompts/profiles/requirements-discovery.md +20 -0
package/runtime/python/lib/okstra/cli.sh +8 -7
package/runtime/python/lib/okstra/globals.sh +3 -1
package/runtime/python/lib/okstra/usage.sh +8 -4
package/runtime/python/okstra_ctl/render.py +35 -0
package/runtime/python/okstra_ctl/run.py +27 -6
package/runtime/python/okstra_ctl/run_context.py +1 -1
package/runtime/python/okstra_ctl/wizard.py +259 -10
package/runtime/python/okstra_token_usage/blocks.py +5 -1
package/runtime/python/okstra_token_usage/claude.py +16 -1
package/runtime/python/okstra_token_usage/collect.py +17 -3
package/runtime/python/okstra_token_usage/pricing.py +159 -24
package/runtime/skills/okstra-brief/SKILL.md +532 -65
package/runtime/skills/okstra-context-loader/SKILL.md +25 -11
package/runtime/skills/okstra-convergence/SKILL.md +235 -8
package/runtime/skills/okstra-history/SKILL.md +68 -37
package/runtime/skills/okstra-logs/SKILL.md +26 -4
package/runtime/skills/okstra-report-finder/SKILL.md +49 -22
package/runtime/skills/okstra-report-writer/SKILL.md +59 -64
package/runtime/skills/okstra-run/SKILL.md +53 -39
package/runtime/skills/okstra-schedule/SKILL.md +51 -20
package/runtime/skills/okstra-setup/SKILL.md +31 -12
package/runtime/skills/okstra-status/SKILL.md +20 -8
package/runtime/skills/okstra-team-contract/SKILL.md +27 -15
package/runtime/skills/okstra-time-summary/SKILL.md +53 -16
package/runtime/templates/reports/final-report.template.md +34 -0
package/runtime/templates/reports/settings.template.json +7 -4
package/runtime/validators/lib/fixtures.sh +10 -2
package/runtime/validators/lib/validate-assets.sh +50 -24
package/runtime/validators/validate-brief.py +385 -0
package/runtime/validators/validate-brief.sh +35 -0
package/runtime/validators/validate-run.py +71 -0
package/runtime/validators/validate-workflow.sh +7 -33
package/src/wizard.mjs +21 -5

package/runtime/skills/okstra-context-loader/SKILL.md

@@ -12,7 +12,9 @@ user-invocable: false
 - When the user needs to know the okstra task bundle path
 - When you need to derive all artifact paths based on `task-manifest.json`
-## Step 1: Locating the Task Bundle
+## Step 1: Resolve the Task Bundle Path
+(Resolve which task-root path to use; Step 2 opens `task-manifest.json` at that path.)
 ### Default Location Rules
@@ -28,11 +30,12 @@ user-invocable: false
 3. If the user attempts to find a task based on `task-group` + `task-id` or `task-id`, `.project-docs/okstra/discovery/task-catalog.json` is read to find candidates.
 4. If multiple candidates are found based on `task-id` alone, the situation is ambiguous, so `task-group` or the full `taskKey` is required.
 5. If the user has not provided an explicit task key/path, first read `.project-docs/okstra/discovery/latest-task.json` using the current-task convenience pointer.
-6. Even if the latest-task pointer is missing or corrupted, but the task catalog exists, first check the list of prepared task bundles based on the catalog. Do not use the legacy `CLAUDE.md`, project guide, or task scan fallback.
+6. If the latest-task pointer is missing or corrupted but the task catalog exists, list candidates from the catalog. Do not use the legacy `CLAUDE.md`, project guide, or task scan fallback.
+7. If **neither** `latest-task.json` **nor** `task-catalog.json` exists, ABORT Phase 1 with `OKSTRA_CONTEXT_NOT_INITIALIZED`. Suggest the user run `/okstra-setup` and `/okstra-brief` to bootstrap the project. Do NOT crawl `.project-docs/okstra/tasks/` directly — discovery pointers are the only supported entry path.
-## Step 2: Read task-manifest.json
+## Step 2: Open and Parse task-manifest.json
-task-manifest.json is the canonical metadata source. The following fields must be extracted from this file:
+`task-manifest.json` (found at the task-root resolved in Step 1) is the canonical metadata source. Extract the following fields:
 | Field | Description |
 |------|------|
@@ -53,7 +56,7 @@ task-manifest.json is the canonical metadata source. The following fields must b
 | `workflow.awaitingApproval` | Approval wait marker |
 | `workflow.routingStatus` | Routing decision status |
 | `workflow.lastSafeCheckpoint` | Safe resume checkpoint metadata |
-| `instructionSetPath` | Instruction-set path |
+| `instructionSetPath` | Path to the `instruction-set/` **directory** containing `analysis-profile.md`, `analysis-material.md`, `reference-expectations.md`, `task-brief.md`, `final-report-template.md` (see Step 4). Not a single-file path. |
 | `referenceExpectationsPath` | config/deployment expectation artifact path |
 | `latestRunPath` | latest run path |
 | `latestRunStatus` | latest run status |
@@ -63,7 +66,7 @@ task-manifest.json is the canonical metadata source. The following fields must b
 | `historyTimelinePath` | timeline path |
 | `resultContract` | team contract and expected artifact metadata |
 | `resultContract.requiredWorkerRoles[*].promptPath` | worker prompt history path by role |
-| `convergence` | convergence loop settings (enabled, maxRounds, verificationMode). The `maxRounds` default is phase-aware: `1` for `requirements-discovery`, `2` otherwise. See [okstra-convergence](../okstra-convergence/SKILL.md) for details |
+| `convergence` | convergence loop settings (`enabled`, `maxRounds`, `verificationMode`). See [okstra-convergence](../okstra-convergence/SKILL.md) for the authoritative defaults — do not re-document the `maxRounds` value here. |
 ## Step 3: Directory Structure Rules
@@ -121,12 +124,21 @@ After verifying `task-manifest.json`, read the instruction set in the following
 4. `instruction-set/task-brief.md` (task brief)
 5. `instruction-set/final-report-template.md` (report template)
+### Brief Reporter-Confirmation Precondition (BLOCKING)
+After reading `task-brief.md`, extract the frontmatter `reporter-confirmations` field (`complete | partial | pending | skipped`). This precondition is shared across every consuming phase — see `prompts/profiles/_common-contract.md` "Brief consumption" block for the authoritative handling matrix.
+- `complete` or `partial` → proceed to Step 5 and hand off to `okstra-team-contract`.
+- `skipped` → proceed, but flag the unmarked `intent-check:` / `conversion-block:` rows for promotion by the phase profile.
+- `pending` (or field missing) → emit `REPORTER_CONFIRMATION_PENDING` and STOP. Do not invoke `okstra-team-contract` or any analyser. The operator must rerun `okstra-brief` Step 6.5 before Phase 2 can start.
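The reporter-confirmation precondition added above amounts to a small gate on one frontmatter field. The following is a minimal sketch of that gate, not the package's implementation: the `BriefGate` enum and the function name are hypothetical, and treating an unrecognised value the same as `pending` is an assumption the hunk does not state.

```python
from enum import Enum


class BriefGate(Enum):
    """Hypothetical outcome labels for the brief precondition."""
    PROCEED = "proceed"
    PROCEED_FLAG_UNMARKED = "proceed-flag-unmarked"
    STOP = "REPORTER_CONFIRMATION_PENDING"


def gate_on_reporter_confirmations(frontmatter: dict) -> BriefGate:
    """Map the `reporter-confirmations` frontmatter value to a gate outcome."""
    value = frontmatter.get("reporter-confirmations")
    if value in ("complete", "partial"):
        return BriefGate.PROCEED
    if value == "skipped":
        # Proceed, but the phase profile must promote unmarked rows.
        return BriefGate.PROCEED_FLAG_UNMARKED
    # `pending`, a missing field, or (assumption) any unknown value: stop.
    return BriefGate.STOP
```

A caller would stop before handing off to `okstra-team-contract` whenever the result is `BriefGate.STOP`.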
 ## Step 5: Read Run Manifest and Team State
-1. Current run manifest: The latest run manifest pointed to by the discovery pointer or task-manifest
-2. Current team state: The latest team-state pointed to by the discovery pointer or task-manifest
-3. Extract the worker prompt directory path and per-worker prompt history paths from the current run manifest and team-state
-4. If an existing run report is available, use it solely as historical context.
+1. Identify the active run by reading `runDateTimeSegment` from the latest `runs/<task-type>/manifests/run-manifest-*.json` (mtime order). That segment is the shared run identifier across all category subdirectories (`state/`, `prompts/`, `reports/`, `status/`, `sessions/`, `worker-results/`).
+2. Resolve sibling artifacts for this run by matching the same `runDateTimeSegment`. Do NOT re-scan `<seq>` counters per category — they may diverge if an earlier run only wrote some categories.
+3. Current team state: the team-state file whose `runDateTimeSegment` matches the active run manifest.
+4. Extract the worker prompt directory path and per-worker prompt history paths from the current run manifest and team-state.
+5. If an existing run report is available, use it solely as historical context.
 ## Output
@@ -137,4 +149,6 @@ Information to be obtained after executing this skill:
 - Reference list of config files/deployment manifests and task-level expected values
 - Current run status and presence of existing worker results
 - Current run prompt history contract for attempted workers
-- Resume command path: `runs/<task-type>/sessions/claude-resume-<task-type>-<seq>.sh`
+- Candidate `teamName` for Phase 3 hand-off: `okstra-<task-key>` (with task-key slugified per Step 1's slug rule)
+- Current Claude `lead.sessionId` (the in-flight Claude Code session) — required by `okstra-team-contract` when registering the lead in `team-state.json`
+- Resume command path: from `task-manifest.json` → `latestResumeCommandPath` (fallback: latest `runs/<task-type>/sessions/claude-resume-*.sh` by mtime). Never reconstruct the filename — the `<seq>` counter is category-local and may diverge from `manifests/`.
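The new resume-path rule (prefer `latestResumeCommandPath`, fall back to the newest `claude-resume-*.sh` by mtime, never rebuild the filename from `<seq>`) can be sketched as below. This is an illustration only: the function name and its parameters are hypothetical, and returning the recorded path unmodified (rather than resolving it against the task root) is an assumption.

```python
from pathlib import Path
from typing import Optional


def resolve_resume_command(manifest: dict, task_root: Path, task_type: str) -> Optional[Path]:
    """Return the resume script path without reconstructing any filename."""
    recorded = manifest.get("latestResumeCommandPath")
    if recorded:
        # Assumption: the recorded value is used exactly as written.
        return Path(recorded)
    sessions = task_root / "runs" / task_type / "sessions"
    if not sessions.is_dir():
        return None
    # Fallback: newest script by modification time, not by <seq> counter.
    scripts = sorted(sessions.glob("claude-resume-*.sh"), key=lambda p: p.stat().st_mtime)
    return scripts[-1] if scripts else None
```

Sorting by mtime rather than parsing the sequence number mirrors the hunk's warning that the `<seq>` counter is category-local.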

package/runtime/skills/okstra-convergence/SKILL.md CHANGED Viewed

@@ -6,6 +6,23 @@ user-invocable: false
 # OKSTRA Convergence
+## Index
+- [Scope and Terminology (BLOCKING)](#scope-and-terminology-blocking)
+- [When to Use](#when-to-use)
+- [Configuration](#configuration)
+- [Finding Category](#finding-category)
+- [Convergence Algorithm](#convergence-algorithm)
+  - [Round 0: Parse worker results](#round-0-parse-worker-results)
+  - [Round 1-N: Re-verification Loop (queue-pruned)](#round-1-n-re-verification-loop-queue-pruned)
+  - [Convergence Test](#convergence-test)
+- [Verification Mode](#verification-mode)
+- [Re-verification Agent Dispatch](#re-verification-agent-dispatch)
+- [Convergence State Artifact](#convergence-state-artifact)
+- [Output](#output)
+- [Convergence Disabled](#convergence-disabled)
+- [Plan-body verification mode (implementation-planning only)](#plan-body-verification-mode-implementation-planning-only)
 ## Scope and Terminology (BLOCKING)
 This skill governs **Phase 5.5 (Convergence loop)** — a *lead operating phase* inside a single okstra run, not a task-type lifecycle phase. The 6 task-type lifecycle phases (`requirements-discovery` → `error-analysis` → `implementation-planning` → `implementation` → `final-verification` → `release-handoff`, see [okstra/SKILL.md](../../SKILL.md) "Lifecycle Phase Boundaries") are unchanged by this skill. The lead operating phases (Phase 1 Intake → Phase 7 Persist, see [okstra/SKILL.md](../../SKILL.md) "Quick Reference") describe how the lead drives a *single* task-type run.
@@ -30,6 +47,8 @@ Configure this in the `convergence` block of `task-manifest.json`. If the block
 | `maxRounds` | phase-aware: `1` for `requirements-discovery`, `2` otherwise (range 1–3) | Maximum number of re-verification rounds. Discovery's routing/missing-input outputs gain little from a second round; other phases (especially `error-analysis`) keep `2`. Lead resolves the effective value when the manifest omits the key and records it in `config.maxRounds` of the convergence state artifact. |
 | `verificationMode` | `"lightweight"` | `"lightweight"` or `"full-reanalysis"` |
+**Auto-disable rule (BLOCKING).** Convergence requires ≥2 analyser workers to produce a meaningful consensus tally. When the active profile's `Required workers:` block (see `prompts/profiles/*.md`) resolves to fewer than 2 analyser workers — e.g. `release-handoff` (zero analyser workers, lead-only) — the lead MUST treat `convergence.enabled` as `false` for that run regardless of manifest configuration, skip Phases 5.5 and the plan-body verification round, and record `finalState: "converged"` with `totalRounds: 0` and an explanatory note in `config` (e.g. `"autoDisabled": "fewer-than-two-analysers"`). The plan-body round inherits the same rule via its `gating=false` advisory path.
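Taken together, the phase-aware `maxRounds` default and the auto-disable rule could be resolved roughly as follows. This is a sketch under stated assumptions: both function names are hypothetical, the configured `enabled` value is passed in explicitly because its default is not visible in this hunk, and raising on an out-of-range `maxRounds` is an assumption (the text only gives the range 1 to 3).

```python
from typing import Optional, Tuple


def effective_max_rounds(task_type: str, configured: Optional[int]) -> int:
    """Resolve maxRounds: explicit manifest value, else the phase-aware default."""
    if configured is None:
        return 1 if task_type == "requirements-discovery" else 2
    if not 1 <= configured <= 3:
        # Assumption: out-of-range values are rejected rather than clamped.
        raise ValueError(f"maxRounds must be within 1..3, got {configured}")
    return configured


def convergence_enabled(configured_enabled: bool, analyser_count: int) -> Tuple[bool, Optional[str]]:
    """Apply the auto-disable rule; returns (enabled, autoDisabled note)."""
    if analyser_count < 2:
        # Fewer than two analysers cannot produce a meaningful tally.
        return False, "fewer-than-two-analysers"
    return configured_enabled, None
```

For a lead-only profile such as `release-handoff` (zero analyser workers), `convergence_enabled(True, 0)` yields `(False, "fewer-than-two-analysers")` regardless of the manifest.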
 ## Finding Category
 | Category | Definition | Included in Report |
@@ -37,10 +56,12 @@ Configure this in the `convergence` block of `task-manifest.json`. If the block
 | `full-consensus` | All participating workers agree | Required |
 | `partial-consensus` | Majority of workers agree; dissenting opinions are recorded | Required |
 | `contested` | Final classification only. Assigned to a finding that remains in the verification queue after the **last executed round** completes (round index = `effectiveMaxRounds`). Each worker's position across all executed rounds is recorded. NEVER used as an intermediate label. | Required |
-| `worker-unique` | Only the discoverer confirms; others oppose or remain unverified | Required |
+| `worker-unique` | Only the discoverer confirms and ALL other non-error votes are `DISAGREE`. `verification-error` votes are excluded from the tally per §"Worker failure handling in reverify"; a finding where every non-discoverer vote is `verification-error` is carried forward, never classified `worker-unique`. | Required |
 ## Convergence Algorithm
+**Majority definition (BLOCKING).** "Majority" means *strictly greater than half* of the non-error votes for that finding (`verification-error` votes are excluded from both numerator and denominator). Ties — including the 1-AGREE / 1-DISAGREE case in a two-analyser roster — are NOT a majority: in intermediate rounds the finding is **carried forward**; in the final executed round the finding is classified `contested`. This rule applies identically to the plan-body verification round (§"Plan-body verification mode") where the same verdict tokens are reused.
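The strict-majority arithmetic in the definition above is easy to get wrong at the tie boundary, so here is a minimal sketch of it. The function name is hypothetical, and the sketch only covers the counting rule; the carry-forward versus `contested` decision depends on the round and is left to the caller.

```python
from typing import Iterable


def has_majority(votes: Iterable[str], verdict: str) -> bool:
    """True when `verdict` holds strictly more than half of the non-error votes."""
    # verification-error votes leave both numerator and denominator.
    counted = [v for v in votes if v != "verification-error"]
    if not counted:
        return False
    matching = sum(1 for v in counted if v == verdict)
    # Strictly greater than half: a tie is never a majority.
    return matching * 2 > len(counted)
```

With a two-analyser roster, `has_majority(["agree", "disagree"], "agree")` is `False`, which is the 1-AGREE / 1-DISAGREE tie the definition calls out.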
 ### Round 0: Parse worker results
 Read the worker result files generated in Phase 4/5 and extract individual findings.
@@ -132,7 +153,8 @@ The lead MUST construct the per-worker reverify prompt body from `items_for_W` o
 |---|---|---|
 | `effectiveMaxRounds >= 2` | true | `"max-rounds-1"` |
 | `len(queue) > 0` after round 1 | true | `"queue-empty"` |
-| At least one round-1 reverify dispatch terminated as `completed` | true | `"all-reverify-non-result"` |
+The third gate condition — "all reverify dispatches terminated as non-result" — is handled inline by the WHILE-loop body (see the `BREAK` on `all dispatches ... terminal non-result` and §"Worker failure handling in reverify" rule 4) which records `round2SkippedReason = "all-reverify-non-result"` and aborts before the predicate is re-evaluated. It is therefore not duplicated as a gate row here.
 When all conditions hold the predicate returns `true` and `round2SkippedReason` is set to `"not-skipped"`. The field is mandatory on every convergence state artifact — write `"not-skipped"` rather than omitting the key.
@@ -152,11 +174,7 @@ The final classifier (`FOR each finding F still in queue` block) treats `verific
 ### Convergence Test
-- If the verification queue is empty at the end of any round → Convergence complete (`finalState: "converged"`), remaining rounds are not executed
-- Upon completing the **last executed round** (where round index == `effectiveMaxRounds`, OR where Round 2 was suppressed per the Round 2 gate below) → Apply final classification to remaining queue items:
-  - Majority agreement across executed rounds → `partial-consensus`
-  - Otherwise → `contested`
-- The final classification step never runs while the queue is still being re-verified — confirmed items always exit the queue first.
+The exit conditions and final-classification rules are defined by the §"Convergence Algorithm" pseudocode (the `WHILE` exit, the post-loop `FOR each finding F still in queue` block, and the `finalState` mapping in §"Convergence State Artifact"). This is the single source — no separate prose copy is maintained here to prevent drift.
 ## Verification Mode
@@ -383,9 +401,11 @@ Schema rules:
 - `schemaVersion`: literal string `"1.1"` for new runs. Readers MUST accept `"1.0"` for historical artifacts and treat any missing v1.1 field as `null`.
 - `config.effectiveMaxRounds`: the integer the lead actually used after resolving the phase-aware default (`1` for `requirements-discovery`, `2` otherwise). MUST equal `config.maxRounds` when the manifest explicitly set it.
 - `findings[].ticketIds`: array of ticket keys from Phase 4 grouping (parsed per the Round 0 step 5 rule). MAY be empty when the discovering worker tagged the finding `unknown`.
+- `findings[].rounds[].votes.<worker>.verdict`: enum, one of `agree | disagree | supplement | verification-error`. Lower-case tokens; map upper-case AGREE/DISAGREE/SUPPLEMENT verdicts emitted by workers to their lower-case form before persisting. `verification-error` is reserved for terminal non-result dispatches (§"Worker failure handling in reverify").
+- `findings[].classification`: enum, one of `full-consensus | partial-consensus | worker-unique | contested`. No other value is permitted in v1.1.
 - `roundHistory[].inputQueueSize`: queue size at the start of this round.
 - `roundHistory[].resolvedCount`: number of findings that exited the queue this round (sum of full+partial+worker-unique classifications produced this round).
-- `roundHistory[].carriedForwardCount`: queue size at the END of this round (must equal `inputQueueSize - resolvedCount` when there are no in-round queue insertions; in-round insertions are forbidden).
+- `roundHistory[].carriedForwardCount`: queue size at the END of this round — the single definition. In-round insertions into the queue are forbidden, so this always equals `inputQueueSize - resolvedCount`. The pseudocode's per-item `carriedForwardCount += 1` accumulator is a counting convenience that lands on the same value; persist the post-round queue length, not the loop accumulator, if the two ever diverge.
 - `roundHistory[].dispatches[]`: one entry per worker that was actually dispatched in this round. Each entry is `{worker, status, durationMs}`. `status ∈ {completed, timeout, error, not-run}`. `durationMs` is integer milliseconds and is always present, even for terminal-non-result dispatches (use the elapsed time before the wrapper gave up).
 - `roundHistory[].skippedWorkers[]`: per-worker `{worker, reason}` for workers with no items to verify OR with a non-result dispatch.
 - `roundHistory[].verificationsRequested|verificationsCompleted|newConsensus|remainingInQueue|earlyExit`: legacy v1.0 aliases. New runs SHOULD populate them so existing parsers keep working: `verificationsRequested == len(dispatches)`, `verificationsCompleted == len(d for d in dispatches if d.status == "completed")`, `newConsensus == resolvedCount`, `remainingInQueue == carriedForwardCount`, `earlyExit == (round < effectiveMaxRounds AND carriedForwardCount == 0)`.
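The `roundHistory` counters and legacy v1.0 aliases listed above are all derivable from a handful of inputs. The sketch below shows one way to compute them; the function name and argument names are hypothetical, while the formulas follow the schema rules as written.

```python
from typing import Dict, List


def round_history_counters(round_index: int, effective_max_rounds: int,
                           dispatches: List[Dict], input_queue_size: int,
                           resolved_count: int) -> Dict:
    """Derive carriedForwardCount and the legacy v1.0 alias fields for one round."""
    # In-round queue insertions are forbidden, so this identity always holds.
    carried_forward = input_queue_size - resolved_count
    completed = sum(1 for d in dispatches if d["status"] == "completed")
    return {
        "inputQueueSize": input_queue_size,
        "resolvedCount": resolved_count,
        "carriedForwardCount": carried_forward,
        "verificationsRequested": len(dispatches),
        "verificationsCompleted": completed,
        "newConsensus": resolved_count,
        "remainingInQueue": carried_forward,
        "earlyExit": round_index < effective_max_rounds and carried_forward == 0,
    }
```

Deriving the aliases in one place keeps `remainingInQueue` and `carriedForwardCount` from drifting apart in the persisted artifact.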
@@ -407,3 +427,210 @@ Information to be passed to Phase 6 after executing this skill:
 ## Convergence Disabled
 If `convergence.enabled: false`, this skill is skipped. Phase 6 operates using the existing consensus/divergence method.
+## Plan-body verification mode (implementation-planning only)
+This section defines a **second, independent** convergence round that fires only for `task-type = implementation-planning`. The round verifies the *consolidated plan* that the report-writer worker has authored, not the worker findings that were already reconciled earlier.
+### Lifecycle position (BLOCKING)
+Plan-body verification runs **after** finding convergence and **after** the report-writer draft is written. Sequence inside a single implementation-planning run:
+```
+Phase 4   workers produce independent analyses (Findings F-001…)
+  → Phase 5.5   FINDING convergence (this skill, sections "Convergence Algorithm" through "Convergence State Artifact")
+  → Phase 6   report-writer authors final-report draft (consolidated Option Candidates / Stepwise Execution Order / Dependency / Validation Checklist / Rollback)
+  → PLAN-BODY VERIFICATION ROUND ← new — described below
+  → User Approval gate (top-of-report `- [ ] Approved` marker is rendered only when this round's Gate result is `passed` or `passed-with-dissent`)
+  → implementation phase (separate run)
+```
+Plan-body verification MUST NOT replace, precede, or be conflated with the Phase 5.5 finding convergence above. They are two distinct rounds with different inputs (findings vs. consolidated plan body), different ID schemes (`F-*` vs. `P-*`), and different state files.
+### MUTUAL EXCLUSION (BLOCKING)
+The finding queue (Phase 5.5) and the plan-item queue (this section) are **disjoint**:
+- A finding-convergence reverify prompt MUST NOT contain any `P-*` item.
+- A plan-body verification prompt MUST NOT contain any `F-*` finding.
+- The two rounds write to **different state files**: `runs/<task-type>/state/convergence-<task-type>-<seq>.json` (findings, see §"Convergence State Artifact") vs. `runs/<task-type>/state/plan-body-verification-<task-type>-<seq>.json` (plan items, see §"`plan-body-verification.json` schema").
+- Aggregation logic (verdict counting, classification) MUST NOT carry votes from one queue into the other.
+Mixing the two queues — for example, parsing a Phase 6 draft's Stepwise Execution Order step as if it were an `F-*` finding — is a contract violation. Future Claude reading this skill: if you find yourself tempted to "just reuse the finding queue for plan items, they're similar enough", stop. They are not similar enough; the verdict semantics differ (see §"Plan-body verdict semantics" below).
+### Configuration
+Plan-body verification is configured under `convergence.planBodyVerification` in `task-manifest.json`:
+| Setting | Default | Description |
+|---------|---------|-------------|
+| `enabled` | `true` | If `false`, the round is skipped and the top-of-report Approval marker is rendered unconditionally (legacy behaviour). |
+| `maxRounds` | `1` | Hard upper bound. Plan-body verification is consistency / completeness checking, not fact checking — additional rounds rarely help. Range 1–3. |
+| `gating` | `true` | If `true` (default), `majority-disagree` blocks the Approval marker. If `false`, the round is advisory-only and the marker always renders. |
+Default values are emitted into the manifest by `scripts/okstra_ctl/render.py` (`_build_convergence_block`). The ctx knob `OKSTRA_PLAN_VERIFICATION=false` flips `planBodyVerification.enabled` to false.
+### Plan-item extraction (Round 0 equivalent)
+From the report-writer's draft of `## 4.5 Implementation Plan Deliverables`, lead extracts plan items with the following prefixes (see also `templates/reports/final-report.template.md` §4.5.9):
+| Prefix | Source sub-section | One row per |
+|--------|--------------------|-------------|
+| `P-Opt-<N>` | `4.5.1 Option Candidates` | one Option (its File Structure list + interfaces + blast radius) |
+| `P-Step-<N>` | `4.5.4 Stepwise Execution Order` | one step (path + command + success signal) |
+| `P-Dep-<N>` | `4.5.5 Dependency / Migration Risk` | one dependency row |
+| `P-Val-<N>` | `4.5.6 Validation Checklist` | one checklist item |
+| `P-Rb-<N>` | `4.5.7 Rollback Strategy` | one rollback path |
+`4.5.2 Trade-off Matrix` and `4.5.3 Recommended Option` are NOT extracted as standalone plan items — the trade-off matrix is evaluated implicitly through each option's `P-Opt-*` verification, and the recommended option is one of those `P-Opt-*` rows.
+Each plan item inherits the `[TICKETID: ...]` tag of its source section (per the standard ticket-tagging contract).
+### Plan-body verdict semantics
+The verdict tokens `AGREE` / `DISAGREE` / `SUPPLEMENT` are reused, but their meaning is plan-specific:
+- **AGREE**: the item is executable as written *and* internally consistent with other items in the plan.
+- **DISAGREE(<kind>)**: the item is broken. `<kind>` MUST be one of:
+  - `a` — referenced file path / symbol mismatches another step or option's File Structure list
+  - `b` — command is not executable or is ambiguous
+  - `c` — validation signal is not observable
+  - `d` — rollback violates commit / dependency order
+  - `e` — item contradicts the trade-off matrix
+- **SUPPLEMENT**: the item is sound but is missing a dependency / edge case / precondition.
+Worker non-result handling (`timeout`, `error`, no result file, wrapper `cli-failure`) is identical to finding convergence: do NOT aggregate as DISAGREE, record `contract-violation`, and apply the round-level abort rule below.
+### Mode constraint
+Plan-body verification only supports **lightweight mode** (defined in §"Verification Mode" above). `full-reanalysis` is not meaningful here because the "original source materials" for a plan item are the worker's own analysis plus the lead-mediated synthesis — there is no independent ground truth to re-read. The manifest's top-level `verificationMode` is ignored for this round; lightweight is always used.
+### Round protocol (single round at default `maxRounds=1`)
+1. Lead parses the report-writer draft and extracts the `P-*` plan items.
+2. For each analyser worker in the roster (`claude`, `codex`, and `gemini` if opted in), lead constructs a reverify prompt using the template in §"Plan-body reverify prompt" below.
+3. Dispatch uses the same wrapper infrastructure as finding convergence. The `--role-slug` is `<role>-plan-verify-r<N>`. Result file path: `runs/<task-type>/worker-results/<role-slug>-plan-verify-r<N>-implementation-planning-<seq>.md`.
+4. After all dispatches return, lead aggregates verdicts per `P-*` item across workers and classifies each:
+   - `full-consensus` — all participating analysers `AGREE` (SUPPLEMENT counts as agree on the item itself).
+   - `partial-consensus` — majority `AGREE`, dissenting `DISAGREE` recorded.
+   - `dissent-isolated` — only one worker `DISAGREE`s, others `AGREE` — treat as `partial-consensus` for gate purposes; record dissent. (Distinct from finding-convergence `worker-unique`, which means the *opposite*: only one worker AGREEs. Plan-body classifications use this dedicated label to avoid the collision.)
+   - `majority-disagree` — majority of analysers `DISAGREE` on this item. This is the only classification that **blocks the Approval marker**.
+   - `contested` only meaningful when `maxRounds > 1`; at default `maxRounds=1`, fold any unresolved item into `partial-consensus`.
+5. Gate result resolution:
+   - any `majority-disagree` item present AND `gating=true` → `blocked-by-disagreement`
+   - all dispatches non-result → `aborted-non-result`
+   - any `partial-consensus` / `dissent-isolated` present, no `majority-disagree` → `passed-with-dissent`
+   - all items `full-consensus` → `passed`
+6. Lead writes `runs/<task-type>/state/plan-body-verification-<task-type>-<seq>.json` (schema below) and populates `### 4.5.9 Plan Body Verification` in the final report (template at `templates/reports/final-report.template.md`).
+7. For every `majority-disagree` item, lead adds a row to `## 5. Clarification Items` with:
+   - new `C-<N>` ID (numbering continues from any existing rows)
+   - `Statement` summarising the disagreement and the worker breakage `<kind>`
+   - `Kind` chosen per the standard policy (usually `decision` for option-level conflicts, `data-point` for path/symbol mismatches)
+   - `Blocks=approval`
+   - the §4.5.9 verdict table's `Classification` column for that row reads `majority-disagree → C-<N>` (1:1 ID match — orphan on either side is a contract violation per `prompts/profiles/implementation-planning.md` self-review step 6).
+8. The top-of-report `- [ ] Approved` marker line is rendered if and only if the Gate result is `passed` or `passed-with-dissent`. `validators/validate-run.py` `validate_phase_boundary` enforces this correspondence; manually adding the marker line when the gate did not pass is a contract violation.
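Steps 4, 5, and 8 of the round protocol can be read as two pure functions plus one membership check. The sketch below is one possible reading at the default `maxRounds=1`, not the package's code. Function names are hypothetical, and two points are assumptions the text leaves open: a tie (for example 1 `AGREE` / 1 `DISAGREE`) is folded into `partial-consensus` rather than `dissent-isolated`, and with `gating=false` a `majority-disagree` item yields `passed-with-dissent`.

```python
from typing import Dict, List

NON_RESULT = "verification-error"


def classify_plan_item(votes: Dict[str, str]) -> str:
    """Classify one P-* item from per-worker verdict tokens (maxRounds=1)."""
    counted = [v for v in votes.values() if v != NON_RESULT]
    # SUPPLEMENT counts as agreement on the item itself.
    agree = sum(1 for v in counted if v in ("AGREE", "SUPPLEMENT"))
    disagree = sum(1 for v in counted if v.startswith("DISAGREE"))
    if counted and disagree == 0:
        return "full-consensus"
    if disagree * 2 > len(counted):
        return "majority-disagree"
    if disagree == 1 and agree * 2 > len(counted):
        return "dissent-isolated"
    # Assumption: ties and otherwise-unresolved items fold here.
    return "partial-consensus"


def gate_result(classifications: List[str], dispatch_statuses: List[str],
                gating: bool = True) -> str:
    """Resolve the gate in the order given by step 5."""
    if gating and "majority-disagree" in classifications:
        return "blocked-by-disagreement"
    if dispatch_statuses and all(s != "completed" for s in dispatch_statuses):
        return "aborted-non-result"
    if all(c == "full-consensus" for c in classifications):
        return "passed"
    return "passed-with-dissent"


def approval_marker_rendered(gate: str) -> bool:
    """Step 8: the marker renders only for the two passing gate results."""
    return gate in ("passed", "passed-with-dissent")
```

Keeping classification and gating separate makes it straightforward to attach a `C-<N>` clarification row to each `majority-disagree` item before the gate is evaluated.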
+### `plan-body-verification-<task-type>-<seq>.json` schema
+```json
+{
+  "schemaVersion": "1.0",
+  "phase": "implementation-planning",
+  "round": 1,
+  "effectiveMaxRounds": 1,
+  "gating": true,
+  "verificationMode": "lightweight",
+  "gateResult": "passed | passed-with-dissent | blocked-by-disagreement | aborted-non-result",
+  "planItems": [
+    {
+      "id": "P-Opt-1",
+      "sourceSection": "4.5.1",
+      "ticketId": "<id-or-unknown>",
+      "votes": {"claude-worker": "AGREE", "codex-worker": "AGREE"},
+      "classification": "full-consensus",
+      "clarificationId": null
+    },
+    {
+      "id": "P-Step-3",
+      "sourceSection": "4.5.4",
+      "ticketId": "TICKET-123",
+      "votes": {"claude-worker": "DISAGREE(a)", "codex-worker": "DISAGREE(a)"},
+      "classification": "majority-disagree",
+      "clarificationId": "C-7"
+    }
+  ],
+  "dispatches": [
+    {"role": "claude-worker", "resultPath": "...", "terminalStatus": "completed"}
+  ]
+}
+```
+`dispatches[].terminalStatus` mirrors finding convergence (`completed | timeout | error | not-run | cli-failure`).
+`planItems[].classification` enum: `full-consensus | partial-consensus | dissent-isolated | majority-disagree | contested`. `contested` only appears when `maxRounds > 1`; at default `maxRounds=1` any otherwise-unresolved item folds into `partial-consensus` per the round protocol above.
+`planItems[].votes.<worker>` is the verbatim verdict token emitted by the worker — `AGREE | DISAGREE(<a|b|c|d|e>) | SUPPLEMENT` — or `verification-error` for terminal non-result dispatches. The `DISAGREE` token retains its `<kind>` suffix so the breakage class is recoverable from the state file alone.
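Because `DISAGREE` keeps its `<kind>` suffix in the state file, a reader needs to split the token back into verdict and kind. A minimal parsing sketch follows; the function name is hypothetical, and rejecting a bare `DISAGREE` without a kind is an assumption based on the text saying `<kind>` MUST be one of `a` through `e`.

```python
import re
from typing import Optional, Tuple

# Accepts exactly the tokens listed in the schema note above.
_VOTE = re.compile(r"^(AGREE|SUPPLEMENT|verification-error|DISAGREE\(([a-e])\))$")


def parse_plan_vote(token: str) -> Tuple[str, Optional[str]]:
    """Split a persisted vote token into (verdict, breakage kind or None)."""
    match = _VOTE.match(token.strip())
    if match is None:
        raise ValueError(f"unrecognised plan-body vote token: {token!r}")
    if match.group(2) is not None:
        return "DISAGREE", match.group(2)
    return match.group(1), None
```

For example, `parse_plan_vote("DISAGREE(a)")` separates the verdict from the path/symbol-mismatch kind recorded in the sample `P-Step-3` item.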
+### Plan-body reverify prompt
+Required prompt anchor headers are identical to finding convergence (see §"Required reverify-prompt anchor headers"). The prompt body changes from F-* listing to P-* listing:
+```
+You are <worker-role> performing plan-body verification for <task-key> (round 1).
+## Instructions
+Review the following items extracted from the consolidated implementation plan
+authored after your initial analysis. For EACH item, respond with exactly one
+verdict:
+- **AGREE**: The item is executable as written and internally consistent with
+  other items in the plan.
+- **DISAGREE(<kind>)**: The item is broken. Cite which kind:
+  (a) referenced file path / symbol mismatches another step or option,
+  (b) command is not executable or is ambiguous,
+  (c) validation signal is not observable,
+  (d) rollback violates commit / dependency order,
+  (e) item contradicts the trade-off matrix.
+- **SUPPLEMENT**: The item is sound but a dependency / edge case / precondition
+  is missing.
+Do NOT re-analyze the original requirements. Judge solely from plan internal
+consistency and stated commands / paths. Do NOT inspect the original task brief
+or worker analyses for this round.
+## Plan items to verify
+### P-Step-3 [TICKETID: <id>]: <one-line summary>
+**From section**: 4.5.4 Stepwise Execution Order
+**Original text**:
+> <verbatim quote of the step>
+**Check**:
+ - Are referenced file paths consistent with the option's File Structure list?
+ - Is the named command executable as written?
+ - Does the success criterion produce an observable signal?
+### P-Opt-2 [TICKETID: <id>]: <one-line summary>
+...
+## Response format
+### P-Step-3
+**Verdict**: AGREE | DISAGREE(<a|b|c|d|e>) | SUPPLEMENT
+**Explanation**: <2-3 sentences>
+### P-Opt-2
+...
+```
+The "Reverify prompt: required-reading suppression (BLOCKING)" rule (lightweight mode does NOT inject a `[Required reading]` clause) applies here as well.
+### Worker non-result handling in plan-body round (BLOCKING)
+Mirrors finding convergence (§"Worker failure handling in reverify"). Concretely:
+- A dispatch that returns terminal non-result MUST NOT be aggregated as `DISAGREE`.
+- If at least one dispatch was issued AND **all** plan-body dispatches return non-result, the Gate result is `aborted-non-result`. Record one `contract-violation` event per non-result dispatch.
+- The Approval marker is NOT rendered when the gate is `aborted-non-result`. Instead, add a single row to `## 5. Clarification Items` with `Statement="plan-body verification could not run — all workers returned non-result"`, `Kind=decision`, `Blocks=approval`. This lets the user either retry the phase or override by manually approving the plan (via `--approve` on the resume command).
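The non-result rules above can be sketched roughly as follows. The dispatch record shape (`worker`, `non_result`, `verdicts`) and the returned dictionary are hypothetical. Only the `aborted-non-result` value, the `contract-violation` event kind, and the clarification row fields come from the text.

```python
ABORT_STATEMENT = "plan-body verification could not run — all workers returned non-result"


def plan_body_gate(dispatches):
    """Classify a plan-body round.

    `dispatches` is a list of dicts shaped like
    {"worker": str, "non_result": bool, "verdicts": dict} (hypothetical shape).
    """
    non_results = [d for d in dispatches if d["non_result"]]
    usable = [d for d in dispatches if not d["non_result"]]

    # At least one dispatch was issued AND all of them returned non-result.
    if dispatches and not usable:
        return {
            "gate": "aborted-non-result",
            # One contract-violation event per non-result dispatch.
            "events": [
                {"event": "contract-violation", "worker": d["worker"]}
                for d in non_results
            ],
            "render_approval_marker": False,
            "clarification_row": {
                "Statement": ABORT_STATEMENT,
                "Kind": "decision",
                "Blocks": "approval",
            },
            "verdicts": [],
        }

    # A non-result is never counted as DISAGREE: it is dropped from aggregation.
    # The gate value for this branch is decided by rules not shown in this sketch.
    return {
        "gate": None,
        "events": [],
        "render_approval_marker": None,
        "clarification_row": None,
        "verdicts": [d["verdicts"] for d in usable],
    }
```

How partial failures are logged is left open here, since the text only states the event rule for the all-non-result case.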

package/runtime/skills/okstra-history/SKILL.md

@@ -7,9 +7,22 @@ description: Use when the user asks to list past okstra runs, check execution hi
 ## When to Use
-- When a user views the history of past okstra executions
-- When re-running or resuming a previous execution
-- When checking the execution status of each task
+- List past okstra task executions (one row per task, with latest-run summary).
+- Drill into the per-run history of a specific task.
+- Build a new run from an old run's parameters (re-run), or continue an in-flight one (resume).
+This skill is for **listing / re-dispatching**. For reading a single final report by task-key, use `okstra-report-finder` instead.
+## Re-run vs Resume — decide upfront
+Before invoking Step 3 or Step 4, classify the user's intent. The two paths are NOT interchangeable.
+| Intent | Trigger phrases | Use |
+|---|---|---|
+| **Re-run** — start a fresh run (new run-seq, new manifest, new report) reusing an old run's parameters | "re-run", "다시 실행", "another pass", "rerun with same brief" | Step 3 |
+| **Resume** — continue an interrupted Claude session for an existing run, no new run-seq | "resume", "continue", "이어서", "session ended" | Step 4 |
+If the user is ambiguous, ask. Defaulting to the wrong one either wastes a fresh run-seq or silently abandons a recoverable session.
 ## Step 0: Verify okstra runtime + project setup
@@ -38,31 +51,45 @@ use `projectRoot` to locate `.project-docs/okstra/discovery/task-catalog.json`.
 ## Step 1: Read the Task Catalog
 1. Read `.project-docs/okstra/discovery/task-catalog.json`.
-2. Sort the `tasks` array in reverse order by `updatedAt` and display it.
-3. Extract the following fields from each task:
+2. Apply filters from user input (all optional, AND-combined):
+   - `--task-type <type>` → keep entries whose `taskType` matches.
+   - `--latest-run-status <status>` → keep entries whose `latestRunStatus` matches (e.g. `completed`, `contract-violated`, `error`).
+   - `--task-group <group>` → keep entries whose `taskGroup` matches.
+3. Sort the surviving `tasks` array by `updatedAt` descending.
+4. Page: default `--limit 20`. After printing the table, if rows were truncated, add `... <N> more (pass --limit <N> to see all)`.
+5. Extract the following fields from each task:
 | Field | Description |
 |------|------|
-| `taskKey` | Task identifier (`<project-id>:<task-group>:<task-id>`) |
-| `taskType` | Analysis type |
-| `currentStatus` | Task-level status |
-| `latestRunStatus` | Latest run status |
-| `updatedAt` | Last update time |
+| `taskKey` | Task identifier — always 3 colon-separated segments: `<project-id>:<task-group>:<task-id>` (see `parse_task_key` in `okstra_project/state.py`). |
+| `taskType` | `requirements-discovery` / `error-analysis` / `implementation-planning` / `implementation` / `final-verification` / `release-handoff` |
+| `currentStatus` | Task lifecycle status written by the contract validator. Values: `todo` (seeded by spawn-followups), `completed`, `contract-violated`. Empty string = validator has not yet run. NOT the same as the user-managed `workStatus` (managed by `okstra-status`). |
+| `latestRunStatus` | Status of the most recent run (`completed`, `contract-violated`, `error`, ...) |
+| `latestRunManifestPath` | Run-manifest path of the most recent run — feed this into Step 3 to re-run from the latest parameters |
+| `updatedAt` | Last update time (ISO 8601) |
 | `latestReportPath` | Latest report path |
-| `latestResumeCommandPath` | Resume command path |
+| `latestResumeCommandPath` | Resume command path (Step 4) |
+| `historyTimelinePath` | `<task-root>/history/timeline.json` (Step 2 reads from here) |
-4. Output format:
+6. Output format:
 ```markdown
 ## okstra Task History — <project-id>
-| # | Task Key | Type | Status | Last Run | Report |
-|---|----------|------|--------|----------|--------|
-| 1 | proj:group:id | error-analysis | completed | 2026-04-05 22:59 | .project-docs/.../final-report-*.md |
-| 2 | proj:group:id2 | final-verification | prepared | 2026-04-04 15:30 | -- |
+| # | Task Key | Type | currentStatus | latestRunStatus | Last Run | Report |
+|---|----------|------|---------------|------------------|----------|--------|
+| 1 | proj:group:id | error-analysis | completed | completed | 2026-04-05 22:59 | .project-docs/.../final-report-*.md |
+| 2 | proj:group:id2 | final-verification | todo | error | 2026-04-04 15:30 | -- |
 ```
-5. If `task-catalog.json` is missing, respond with "There is no okstra execution history. Please run okstra.sh first."
+### Catalog absent — fallback
+If `.project-docs/okstra/discovery/task-catalog.json` does not exist, do NOT bail out. The catalog is a derived index — manifests on disk are the source of truth.
+1. Glob `<projectRoot>/.project-docs/okstra/tasks/*/*/task-manifest.json`.
+2. For each manifest, read `taskKey`, `taskGroup`, `taskType`, `currentStatus`, `latestRunStatus`, `updatedAt`, `latestReportPath`, `latestResumeCommandPath`, `latestRunManifestPath`, `historyTimelinePath`.
+3. Apply the same filters/sort/limit and print the same table, prefixed with: `note: task-catalog.json missing; reconstructed from task manifests on disk.`
+4. Only if the glob yields zero manifests: respond `There is no okstra execution history yet.`
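Step 1 and its fallback could be sketched as below. This is an illustration under assumptions: the helper names are invented, the catalog is assumed to hold a top-level `tasks` array, and `updatedAt` values are assumed to be uniformly formatted ISO 8601 strings so that string ordering matches time ordering.

```python
import json
from pathlib import Path

CATALOG = Path(".project-docs/okstra/discovery/task-catalog.json")
MANIFEST_GLOB = ".project-docs/okstra/tasks/*/*/task-manifest.json"


def load_tasks(project_root):
    """Return (tasks, reconstructed). Falls back to manifests when the catalog is absent."""
    root = Path(project_root)
    catalog = root / CATALOG
    if catalog.exists():
        return json.loads(catalog.read_text(encoding="utf-8"))["tasks"], False
    manifests = sorted(root.glob(MANIFEST_GLOB))
    return [json.loads(p.read_text(encoding="utf-8")) for p in manifests], True


def select_tasks(tasks, task_type=None, latest_run_status=None, task_group=None, limit=20):
    """AND-combine the optional filters, sort by updatedAt descending, then page."""
    wanted = {
        "taskType": task_type,
        "latestRunStatus": latest_run_status,
        "taskGroup": task_group,
    }
    kept = [
        t for t in tasks
        if all(value is None or t.get(key) == value for key, value in wanted.items())
    ]
    kept.sort(key=lambda t: t.get("updatedAt", ""), reverse=True)
    # Second value is how many rows the limit cut off (drives the "... N more" hint).
    return kept[:limit], max(0, len(kept) - limit)
```

When `load_tasks` reports `reconstructed=True` and the task list is empty, that corresponds to the "no execution history yet" case. A non-empty reconstructed list gets the `note:` prefix before the table.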
 ## Step 2: Run History by Task
@@ -78,9 +105,10 @@ When a user selects a specific task or requests detailed history:
 | `runDateTimeSegment` | YYYY-MM-DD_HH-MM-SS |
 | `taskType` | `--task-type` argument value |
 | `status` | Run status |
-| `reportPath` | Report Path |
-| `resumeCommandPath` | resume Command path |
-| `relatedTasks` | List of related tasks |
+| `runManifestPath` | This run's `run-manifest-*.json` — feed into Step 3 to re-run from this specific run's parameters |
+| `reportPath` | Final report path |
+| `resumeCommandPath` | Resume Claude session for this run (Step 4) |
+| `relatedTasks` | List of related task-keys |
 4. Output format:
@@ -93,23 +121,26 @@ When a user selects a specific task or requests detailed history:
 | 2 | 2026-04-04 15:30 | error-analysis | error | -- |
 ```
-## Step 3: Create a re-execution command
+## Step 3: Re-run (build a NEW run from an old run's parameters)
-To re-run a specific run:
+This builds a **fresh run** — new run-seq, new manifest, new report — using the parameters captured in a previous `run-manifest-*.json`. It does NOT touch the old run's artifacts; use Step 4 if the user wants to continue an interrupted session instead.
-1. Read the run-manifest JSON from the `runManifestPath` of that run.
-2. Extract the required arguments:
+1. Pick the source run-manifest: the `runManifestPath` from a Step 2 timeline entry, or the task's `latestRunManifestPath` from Step 1.
+2. Read the run-manifest JSON and extract required arguments:
    - `projectId` → `--project-id`
    - `taskGroup` → `--task-group`
    - `taskId` → `--task-id`
    - `taskType` → `--task-type`
    - `taskBriefPath` → `--task-brief`
-3. Extract the optional arguments:
+3. Extract optional arguments (include only when present in the source manifest):
    - `recommendedWorkers` → `--workers` (comma-separated)
-   - `relatedTasks` → `--related-tasks` (if present)
-   - model overrides → `--claude-model`, `--codex-model`, `--gemini-model` (if different from default)
-   - for `taskType: implementation`: `teamContract.executor.provider` → `--executor <claude|codex|gemini>` (if different from `claude`)
-4. Display the assembled command:
+   - `relatedTasks` → `--related-tasks`
+   - model overrides → `--claude-model`, `--codex-model`, `--gemini-model` (when different from default)
+   - for `taskType: implementation`: `teamContract.executor.provider` → `--executor <claude|codex|gemini>` (when different from `claude`)
+4. **`taskType: implementation` only — resolve `--base-ref`:** the base ref is NOT stored in the run-manifest; it lives in the worktree registry at `~/.okstra/worktrees/registry.json` against the registered branch. Before assembling the command:
+   - If a worktree for this task-key is already registered, the existing branch & base are reused — omit `--base-ref` unless the user explicitly wants a different starting point.
+   - If no worktree is registered (e.g. it was cleaned up), `--base-ref` is mandatory. Ask the user for the ref to branch from (e.g. `main`, a commit SHA, a tag) before running.
+5. Display the assembled command:
 ```bash
 okstra.sh \
@@ -121,23 +152,23 @@ okstra.sh \
   --workers <worker-list>
 ```
-5. Once the user confirms, execute it using the Bash tool.
+6. Once the user confirms, execute it using the Bash tool.
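As a rough sketch of Step 3, the command can be assembled from a parsed run-manifest like this. The function names are hypothetical, and `recommendedWorkers` is assumed to be a list of strings. `--related-tasks` and the model overrides are left out because their value format is not stated here.

```python
import shlex

# Manifest key -> CLI flag, as listed under the required arguments.
REQUIRED_FLAGS = [
    ("projectId", "--project-id"),
    ("taskGroup", "--task-group"),
    ("taskId", "--task-id"),
    ("taskType", "--task-type"),
    ("taskBriefPath", "--task-brief"),
]


def build_rerun_argv(manifest, base_ref=None):
    """Build the okstra.sh argument list for a fresh run from an old run-manifest."""
    argv = ["okstra.sh"]
    for key, flag in REQUIRED_FLAGS:
        # A KeyError here surfaces a manifest that lacks a required field.
        argv += [flag, str(manifest[key])]

    workers = manifest.get("recommendedWorkers")
    if workers:
        argv += ["--workers", ",".join(workers)]

    if manifest["taskType"] == "implementation":
        provider = manifest.get("teamContract", {}).get("executor", {}).get("provider")
        if provider and provider != "claude":
            argv += ["--executor", provider]
        # base_ref comes from the user or the worktree registry, never the manifest.
        if base_ref:
            argv += ["--base-ref", base_ref]
    return argv


def display_command(argv):
    """Shell-quoted single-line form to show the user before confirmation."""
    return shlex.join(argv)
```

Keeping the command as an argument list until display time avoids quoting mistakes if a brief path contains spaces.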
-## Step 4: Resume
+## Step 4: Resume (continue an interrupted run)
-To resume a paused session:
+This continues an existing Claude session for a run that did not finish. It does NOT create a new run-seq — for a fresh dispatch, use Step 3.
-1. Check `latestResumeCommandPath` in the task catalog or timeline.
-2. Verify that the resume script file actually exists.
-3. If it exists, display the execution command:
+1. Read `latestResumeCommandPath` from the task catalog (Step 1) — or `resumeCommandPath` from a specific timeline entry (Step 2).
+2. Verify the file exists on disk.
+3. If present:
    ```bash
    bash <resume-command-path>
    ```
-4. If it does not exist, display the message: "No resume script found. Please run it again."
+4. If absent, report: `No resume script available for this run. Use Step 3 to start a fresh run instead.`
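The resume check is small enough to sketch in a few lines. The function name is hypothetical, and the path is assumed to be already resolved against the project root.

```python
from pathlib import Path

NO_RESUME_MESSAGE = (
    "No resume script available for this run. "
    "Use Step 3 to start a fresh run instead."
)


def resume_invocation(resume_command_path):
    """Return the argv that resumes the run, or None when no script is on disk."""
    if resume_command_path and Path(resume_command_path).is_file():
        return ["bash", str(resume_command_path)]
    return None
```

A `None` result maps to reporting `NO_RESUME_MESSAGE` rather than silently falling through to a re-run.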
 ## Output Rules
 - Display concisely in a table format
 - Dates in `YYYY-MM-DD HH:MM` format
-- Display status as-is (`completed`, `prepared`, `error`, `not-run`, etc.)
+- Display status fields as-is from disk (`completed`, `contract-violated`, `todo`, `error`, empty, ...). Do not normalize or remap.
 - Display `--` if no report is available
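A small sketch of the cell-formatting rules, with hypothetical helper names. It assumes timestamps are ISO 8601 strings and prints them in their recorded offset without converting time zones.

```python
from datetime import datetime


def format_last_run(updated_at):
    """Render an ISO 8601 timestamp as YYYY-MM-DD HH:MM."""
    # datetime.fromisoformat rejects a trailing "Z" before Python 3.11.
    if updated_at.endswith("Z"):
        updated_at = updated_at[:-1] + "+00:00"
    return datetime.fromisoformat(updated_at).strftime("%Y-%m-%d %H:%M")


def report_cell(report_path):
    """Show `--` when no report is available."""
    return report_path if report_path else "--"


def status_cell(status):
    """Status fields are shown exactly as stored, including the empty string."""
    return status
```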