
okstra 0.45.1 → 0.46.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (14)

package/runtime/prompts/profiles/_implementation-executor.md CHANGED

@@ -27,10 +27,9 @@ until Phase 5 ends, then drop from active context for Phase 6/7.
   - Doc-only / config-only / pure-rename steps that have no observable runtime behaviour are exempt from the failing-test requirement, but the executor MUST cite the exemption per step in the final report (`TDD exemption: <reason>`).
   - When the touched area has no existing test harness, the executor MUST stand up the minimum harness needed to host one regression test for this run rather than skipping TDD entirely. Record the harness-bootstrap step as an `Out-of-plan edit` if it is not in the plan.
 - **DB / IO / SQL changes require real execution — mock-only is NOT validation evidence:** when this run's diff touches DB/IO/SQL (ORM / query-builder code — sequelize / typeorm / prisma / knex / raw SQL — `*.repository.*`, model/entity files, `migrations/**`, `*.sql`, or any changed query string), a mocked unit test cannot observe the SQL the query builder actually emits — a mocked suite once passed while `count({ col: 'FontFamily.fontFamily' })` threw `Unknown column` on the real DB. The executor MUST run the change against a real (or faithful-replica) datastore — the `db-test` validation step (plan `validation` db step, else `project.json.qaCommands.db-test`), targeting a **local / replica** DB — and cite its exact command + exit code in the final report's `Validation evidence`. If no real DB / `db-test` command is reachable, do NOT claim the change verified: label the DB portion `정적 분석상 …, 미검증(실행 안 함)` (Korean: "by static analysis …, unverified (not executed)") in the report, surface it in the routing recommendation, and never downplay the real run as "too heavy". `git push` stays forbidden (universal list); the unverified DB state is carried forward so `final-verification` cannot accept it and `release-handoff` cannot push.
-- re-read the approved plan end-to-end and parse the `## 4.5 Stage Map`. Determine **start stage**:
-  - if `--stage <N>` is supplied, use N. Otherwise auto = the lowest stage number whose `depends-on` are all recorded as `status:done` in `runs/<plan-key>/consumers.jsonl` AND that itself has no `status:done` row. Multiple stages may match — two parallel `implementation` runs may pick different ones and proceed concurrently.
-  - load every `runs/<plan-key>/carry/stage-<i>.json` for `i ∈ depends-on(start_stage)` and inject them into the executor's working context as "runtime carry-in". For `depends-on (none)` stages, no sidecar load — task-brief only.
-  - extract the **start stage's** file list, step order, Stage Validation commands, Stage Exit Contract, and rollback path. These — not the whole plan — are the authoritative scope for this run.
+- re-read the approved plan end-to-end and parse the `## 4.5 Stage Map`. Read the **Stage batch** injected in the launch prompt (`Stage batch for this implementation run`): it lists the stage numbers this run owns, ascending. The runtime already selected and reserved this batch — do NOT recompute the start stage from `consumers.jsonl`.
+  - for each stage in the batch, load every `runs/<plan-key>/carry/stage-<i>.json` for `i ∈ depends-on(stage)` and inject them into the executor's working context as "runtime carry-in". For `depends-on (none)` stages, no sidecar load — task-brief only.
+  - the batch's stages are mutually independent (each one's `depends-on` are all already `status:done`, never another batch member), so execute them in ascending order; each stage's file list, step order, Stage Validation commands, Stage Exit Contract, and rollback path are the authoritative scope for that stage.
 - inspect the current state of every file the plan names; if any file has changed materially since the plan was written, stop and route to a new `implementation-planning` run instead of editing speculatively
 - "materially changed" means: the function, class, section, or behaviour the plan targets has been edited, renamed, moved, removed, or otherwise altered in a way that invalidates the plan's reasoning. Cosmetic edits (whitespace, comment-only changes, unrelated function modifications elsewhere in the same file) do NOT trigger a re-plan; cite the diff (`git log --oneline <plan-created-at>..HEAD -- <file>`) in the final report and proceed.
 - distinguish the two file-scope rules (they are not in conflict):
@@ -38,15 +37,14 @@ until Phase 5 ends, then drop from active context for Phase 6/7.
   - **out-of-plan rule** (Allowed actions section below): if a step *requires touching a file NOT in the plan list*, that is permitted with `Out-of-plan edits` justification. This handles honest scope discovery during execution.
 - confirm the test/build commands referenced in the plan still exist and run from a clean state
-## Stage execution contract (this run owns exactly one stage of the plan)
+## Stage execution contract (this run owns the injected stage batch)
-- **Sidecar evidence writer (BLOCKING).** When the start stage's Stage Validation `post` commands all succeed, the Executor MUST emit a JSON object matching the schema in `docs/superpowers/specs/2026-05-20-implementation-planning-multi-stage-design.md` §3.2 and the lead MUST persist it to `runs/<impl-task-key>/carry/stage-<N>.json`. The file MUST NOT exist before the run starts (overwrite is refused — see `--force-stage` non-goal).
-- **Reverse link (BLOCKING).** Before the first Edit/Write, append a `status:"started"` row to `runs/<plan-task-key>/consumers.jsonl` (lock via the okstra runtime). On stage completion, append a `status:"done"` row with `carry_path` populated.
-- **One-PR-per-stage.** This run creates exactly one PR titled `Stage <N>: <stage title>`. The PR body MUST include:
-  - `## Stage` — number and title (from Stage Map row).
-  - `## Carry-In summary` — depends-on list + cited identifiers/SHAs from each loaded sidecar (omit when depends-on is empty).
-  - `## Next stage` — next stage number/title or `(last stage)`.
-  Stage PRs link back to each other in their bodies (`Previous: #<n>, Next: #<m>` lines) so a reviewer can navigate the chain.
+- **Sidecar evidence writer (BLOCKING, per stage).** For each stage in the batch, when that stage's Stage Validation `post` commands all succeed, the Executor MUST emit a JSON object matching the schema in `docs/superpowers/specs/2026-05-20-implementation-planning-multi-stage-design.md` §3.2 and the lead MUST persist it to `runs/<impl-task-key>/carry/stage-<N>.json`. Each file MUST NOT exist before the run starts (overwrite is refused — see `--force-stage` non-goal).
+- **Reverse link (BLOCKING, per stage).** The runtime already appended a `status:"started"` row per batch stage before this run began. On each stage's completion, append a `status:"done"` row with `carry_path` populated for that stage number.
+- **One-PR-per-run.** This run creates exactly one PR titled `Stages <first>–<last>: <run summary>` (or `Stage <N>: <title>` when the batch is a single stage). The PR body MUST include:
+  - `## Stage <N>` — one section per batched stage: number, title (from Stage Map row), touched files, and validation result.
+  - `## Carry-In summary` — per stage, depends-on list + cited identifiers/SHAs from each loaded sidecar (omit when depends-on is empty).
+  - `## Previous run` / `## Next run` — links so a reviewer can navigate the run chain.
 ## Allowed actions during the run

package/runtime/prompts/profiles/implementation-planning.md CHANGED

@@ -65,7 +65,8 @@
     - `### Stepwise Execution Order` — bite-sized table with `step | action | files | command | expected`. **Effective row count ≤ 6** (excluding header / divider / blank). Each step is one action completable in 2–5 minutes; for code steps include actual code or diff sketch; prefer TDD ordering (failing test → implementation → green → commit).
     - `### Stage Exit Contract` — predicted added/modified files, newly exposed identifiers/types/endpoints, downstream-usable resources.
     - `### Stage Validation` — pre / mid / post exact commands or observable outcomes for this stage only.
-  - **Parallelisation-first rule (1st-class):** the writer MUST prefer the partition that maximises the number of `depends-on (none)` stages. Given two partitions with equal total step count, the one with fewer `depends-on` edges wins. Conservative `let's serialise to be safe` groupings are forbidden — each `depends-on` link is justified by a concrete data/contract dependency, not a vague risk concern.
+  - **Cohesion-first partition rule (1st-class):** the grouping anchor is **shared file/module proximity** — steps touching the same file/directory/module go in the same stage so the diff, PR, and rollback unit are semantically cohesive. A stage is split ONLY when (a) a real `depends-on` data/contract dependency exists, (b) effective steps would exceed 6, or (c) the file sets are disjoint (unrelated work touching no shared file is not crammed together). Maximising the number of parallel stages is NOT a reason to split — parallelism is an emergent property of independent stages, never a partitioning goal.
+  - **Parallel-safety invariant (BLOCKING):** any two stages that are both `depends-on (none)` MUST predict disjoint file sets in their `Stage Exit Contract`. Two parallel `implementation` runs would otherwise edit the same file concurrently. Work touching a shared file must either go in one stage or be ordered with `depends-on`. Enforced by `validators/validate-implementation-plan-stages.py` check S9.
   - **Stage exit contract is the carry surface:** keep it as narrow as possible. Wider surface = more downstream coupling.
   - dependency / migration risk assessment (ordering constraints, data backfills, feature-flag prerequisites, repo-internal sequencing)
   - validation checklist (pre / mid / post) — each item is an exact command or observable outcome
@@ -93,4 +94,4 @@
   4. **Ambiguity check** — any requirement that could be read two ways must be made explicit or moved to the `## 5. Clarification Items` table as a `Blocks=approval` row.
   5. **Scope check** — if the recommended plan now spans multiple independent subsystems, recommend splitting into separate planning runs rather than shipping an oversized plan.
   6. **Plan-body verification reconciliation (BLOCKING for implementation-planning).** Inspect the `### 4.5.9 Plan Body Verification` verdict table. For every plan-item row classified as `majority-disagree → C-<N>`, the corresponding `C-<N>` row MUST exist in `## 5. Clarification Items` with `Kind` chosen per the standard policy and `Blocks=approval`. Do NOT create a parallel `### 4.5.x Open Questions` block — the unified table is the single home. Conversely, the `Classification` column's `C-<N>` reference and the `## 5. Clarification Items` `ID` column MUST match 1:1; an orphan on either side is a contract violation. For `partial-consensus` and `worker-unique` plan-items, the dissenting opinion lives in §4.5.9 `Dissent log` and is NOT promoted to §5.
-  7. **Stage Map self-check** — for every stage, count the effective rows of its `Stepwise Execution Order` table by hand; reject the draft if any stage exceeds 6. Walk the `depends-on` graph and confirm it is a DAG (no cycle, no self-reference). For each `depends-on` link, ask "can this be removed by re-partitioning?" — if yes, re-partition and re-count.
+  7. **Stage Map self-check** — for every stage, count the effective rows of its `Stepwise Execution Order` table by hand; reject the draft if any stage exceeds 6. Walk the `depends-on` graph and confirm it is a DAG (no cycle, no self-reference). For each `depends-on` link, confirm it encodes a real data/contract dependency — do NOT add links to serialise unrelated work, and do NOT split a stage merely to create more parallel stages. **Parallel-safety:** for every pair of `depends-on (none)` stages, confirm their `Stage Exit Contract` predicted file sets are disjoint; if they share a file, merge them or add a `depends-on` link (validator S9 rejects overlap).
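
The Stage Map self-check in item 7 (step limit, DAG walk, parallel-safety) can be sketched in a few lines. This is a minimal illustration and not okstra's validator: the `Stage` dataclass, the `is_dag` and `self_check` helpers, and the sample file paths are all hypothetical names introduced here.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class Stage:
    # Hypothetical stand-in for one Stage Map row.
    number: int
    depends_on: list = field(default_factory=list)
    files: set = field(default_factory=set)  # predicted Stage Exit Contract files
    steps: int = 0                            # effective Stepwise Execution Order rows


def is_dag(stages):
    """Kahn-style peel: False on a cycle, self-reference, or unknown dependency."""
    remaining = {s.number: set(s.depends_on) for s in stages}
    while remaining:
        ready = [n for n, deps in remaining.items() if not deps]
        if not ready:
            return False  # every remaining stage still waits on something
        for n in ready:
            del remaining[n]
        for deps in remaining.values():
            deps.difference_update(ready)
    return True


def self_check(stages):
    problems = []
    for s in stages:
        if s.steps > 6:
            problems.append(f"stage {s.number}: {s.steps} steps exceeds 6")
    if not is_dag(stages):
        problems.append("depends-on graph is not a DAG")
    # Parallel-safety: every pair of depends-on (none) stages needs disjoint files.
    roots = [s for s in stages if not s.depends_on]
    for a, b in combinations(roots, 2):
        shared = a.files & b.files
        if shared:
            problems.append(f"stages {a.number} and {b.number} share {sorted(shared)}")
    return problems


plan = [
    Stage(1, [], {"src/api/users.py"}, 4),
    Stage(2, [], {"src/api/users.py", "src/ui/form.tsx"}, 3),
    Stage(3, [1], {"src/api/orders.py"}, 6),
]
print(self_check(plan))
```

Here stages 1 and 2 are both `depends-on (none)` and both predict `src/api/users.py`, so the sketch reports the pair. Under the rule above, the fix is to merge them or add a `depends-on` link.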

package/runtime/prompts/profiles/implementation.md CHANGED

@@ -1,6 +1,7 @@
 # Implementation Profile
 - Purpose: realise the approved `implementation-planning` deliverable as actual source changes, with cross-model verification, while keeping the run reversible
+- **Run-level fixed cost:** the verifier set, Phase 5.5 convergence, and the Phase 6 report-writer run exactly once per run, over the combined diff of all stages in this run's batch — never once per stage.
 - Required workers:
   - claude
   - codex

package/runtime/python/okstra_ctl/render.py CHANGED

@@ -1514,11 +1514,11 @@ def inject_lead_prompt_computed_tokens(ctx: dict) -> None:
 def apply_lead_prompt_defaults(ctx: dict) -> None:
     """Apply default values for optional lead-prompt ctx fields.
-    Sets four optional tokens that the lead prompt template references but
+    Sets the optional tokens that the lead prompt template references but
     which callers may legitimately leave unset (e.g., no validation has run
-    yet, no related tasks were declared). Caller-supplied values are
-    preserved via `setdefault` / `if-not-in` semantics — this function only
-    fills gaps, never overwrites.
+    yet, no related tasks were declared, the run is not an implementation
+    batch). Caller-supplied values are preserved via `setdefault` / `if-not-in`
+    semantics — this function only fills gaps, never overwrites.
     Companion to `inject_lead_prompt_computed_tokens` (which always
     overwrites with deterministically-derived values). The two functions
@@ -1528,6 +1528,9 @@ def apply_lead_prompt_defaults(ctx: dict) -> None:
     ctx.setdefault("VALIDATION_STATUS", "not-run")
     ctx.setdefault("RELATED_TASKS_BULLETS", "- None recorded")
     ctx.setdefault("RELATED_TASKS_INLINE", "None")
+    # Empty for non-implementation runs; the implementation prepare path
+    # overwrites it with the resolved stage-batch directive.
+    ctx.setdefault("STAGE_BATCH_DIRECTIVE", "")
     ctx.setdefault(
         "WORKER_PROMPT_PREAMBLE_PATH",
         str(Path.home() / ".okstra" / "templates" / "worker-prompt-preamble.md"),
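
The gap-filling behaviour the docstring describes follows from `dict.setdefault`, which writes a key only when it is absent. The sketch below is a reduced, hypothetical version of the defaults function (`apply_defaults` and the sample directive string are assumptions, not okstra code) showing that an empty `STAGE_BATCH_DIRECTIVE` is filled for a non-implementation run, while a value already set by the caller survives.

```python
def apply_defaults(ctx: dict) -> None:
    # Hypothetical reduced version: fills only keys the caller left unset.
    ctx.setdefault("VALIDATION_STATUS", "not-run")
    ctx.setdefault("STAGE_BATCH_DIRECTIVE", "")


# Non-implementation run: nothing set the directive, so it defaults to "".
non_impl = {}
apply_defaults(non_impl)

# Implementation run: a directive is already present and must not be replaced.
impl = {"STAGE_BATCH_DIRECTIVE": "- **Stage batch for this implementation run:** `1,2`"}
apply_defaults(impl)

print(repr(non_impl["STAGE_BATCH_DIRECTIVE"]))
print(impl["STAGE_BATCH_DIRECTIVE"])
```

An empty-string default presumably lets the prompt template reference the token unconditionally without rendering anything for other task types.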

package/runtime/python/okstra_ctl/run.py CHANGED

@@ -208,42 +208,58 @@ def _validate_stage_structure(plan_path: str) -> None:
         )
-def _resolve_effective_stage(
+RUN_STEP_BUDGET = 8
+def _resolve_effective_stages(
     stages: list,
     done_stages: set,
     requested: str,
-) -> int:
-    """Return the stage number to execute.
+    budget: int = RUN_STEP_BUDGET,
+) -> list:
+    """Return the ordered list of stage numbers this run executes.
+    `requested` is "auto" or a decimal string. For "auto" the run batches all
+    ready stages (depends-on all done, itself not done) in stage-number order up
+    to `budget` effective steps — but always at least one. A numeric request is a
+    single forced stage. Raises PrepareError on rejection cases."""
+    if requested != "auto":
+        try:
+            n = int(requested)
+        except ValueError:
+            raise PrepareError(
+                f"--stage must be 'auto' or an integer, got {requested!r}"
+            )
+        target = next((s for s in stages if s["stage_number"] == n), None)
+        if target is None:
+            raise PrepareError(
+                f"--stage {n} not in Stage Map "
+                f"(have {[s['stage_number'] for s in stages]})"
+            )
+        if n in done_stages:
+            raise PrepareError(
+                f"--stage {n} already completed (consumers.jsonl status:done exists)"
+            )
+        return [n]
-    `requested` is either "auto" or a decimal string.
-    Raises PrepareError on all rejection cases.
-    """
-    if requested == "auto":
-        for s in stages:
-            if s["stage_number"] in done_stages:
-                continue
-            if all(d in done_stages for d in s["depends_on"]):
-                return s["stage_number"]
+    ready = [
+        s for s in stages
+        if s["stage_number"] not in done_stages
+        and all(d in done_stages for d in s["depends_on"])
+    ]
+    if not ready:
         raise PrepareError(
             "no stage is ready: every remaining stage has unsatisfied depends-on"
         )
-    try:
-        n = int(requested)
-    except ValueError:
-        raise PrepareError(
-            f"--stage must be 'auto' or an integer, got {requested!r}"
-        )
-    target = next((s for s in stages if s["stage_number"] == n), None)
-    if target is None:
-        raise PrepareError(
-            f"--stage {n} not in Stage Map "
-            f"(have {[s['stage_number'] for s in stages]})"
-        )
-    if n in done_stages:
-        raise PrepareError(
-            f"--stage {n} already completed (consumers.jsonl status:done exists)"
-        )
-    return n
+    batch: list = []
+    total = 0
+    for s in ready:
+        sc = s.get("step_count", 0) or 0
+        if batch and total + sc > budget:
+            break
+        batch.append(s["stage_number"])
+        total += sc
+    return batch
 def _parse_stage_map_into_ctx(plan_path: str) -> list:
@@ -842,31 +858,42 @@ def prepare_task_bundle(inp: PrepareInputs) -> PrepareOutputs:
     })
     if inp.task_type == "implementation":
         ctx["parsed_stage_map"] = ctx_stage_map
-        # Resolve effective stage and append `started` row to consumers.jsonl
+        # Resolve the ready-set batch and append a `started` row per batched stage.
         from .consumers import read_consumers, append_consumer
         import datetime as _dt
         plan_run_root = Path(inp.approved_plan_path).resolve().parents[1]
         consumed = read_consumers(plan_run_root)
         done_stages = {r["stage"] for r in consumed if r.get("status") == "done"}
-        effective = _resolve_effective_stage(
+        effective = _resolve_effective_stages(
             ctx["parsed_stage_map"], done_stages, inp.stage
         )
-        ctx["effective_stage"] = effective
-        inp.stage = str(effective)
-        print(f"selected stage: {inp.stage}", file=sys.stdout)
+        ctx["effective_stages"] = effective
+        csv = ",".join(str(n) for n in effective)
+        ctx["EFFECTIVE_STAGES"] = csv
+        ctx["STAGE_BATCH_DIRECTIVE"] = (
+            f"- **Stage batch for this implementation run:** `{csv}` "
+            "(comma-separated stage numbers, ascending). Execute exactly these "
+            "Stage Map stages in this order — this is the authoritative scope. "
+            "Do NOT recompute the start stage from `consumers.jsonl`; the runtime "
+            "already selected and reserved this batch."
+        )
+        inp.stage = csv
+        print(f"selected stages: {csv}", file=sys.stdout)
         head_proc = _subprocess.run(
             ["git", "rev-parse", "HEAD"],
             cwd=inp.project_root, capture_output=True, text=True,
         )
         head_sha = head_proc.stdout.strip() if head_proc.returncode == 0 else ""
-        append_consumer(
-            plan_run_root,
-            impl_task_key=ctx["TASK_KEY"],
-            stage=effective,
-            status="started",
-            started_at=_dt.datetime.now(_dt.timezone.utc).isoformat(),
-            head_commit=head_sha,
-        )
+        now = _dt.datetime.now(_dt.timezone.utc).isoformat()
+        for stage_n in effective:
+            append_consumer(
+                plan_run_root,
+                impl_task_key=ctx["TASK_KEY"],
+                stage=stage_n,
+                status="started",
+                started_at=now,
+                head_commit=head_sha,
+            )
     # ---- prepare directories + cleanup ----
     _ensure_task_directories(ctx)
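
To make the new `auto` batching easier to follow, here is a standalone sketch of the ready-set selection in `_resolve_effective_stages`. It mirrors the loop from the diff, but `pick_batch` is a hypothetical name, it raises `ValueError` instead of `PrepareError`, and the sample Stage Map is invented for illustration.

```python
RUN_STEP_BUDGET = 8  # same value as in the diff above


def pick_batch(stages, done, budget=RUN_STEP_BUDGET):
    """Simplified mirror of the "auto" branch (hypothetical helper)."""
    ready = [
        s for s in stages
        if s["stage_number"] not in done
        and all(d in done for d in s["depends_on"])
    ]
    if not ready:
        raise ValueError("no stage is ready")
    batch, total = [], 0
    for s in ready:
        steps = s.get("step_count", 0) or 0
        # The first ready stage is always taken; afterwards the loop stops at
        # the first stage that would push the total past the budget.
        if batch and total + steps > budget:
            break
        batch.append(s["stage_number"])
        total += steps
    return batch


# Invented Stage Map rows, in stage-number order.
stages = [
    {"stage_number": 1, "depends_on": [], "step_count": 5},
    {"stage_number": 2, "depends_on": [], "step_count": 4},
    {"stage_number": 3, "depends_on": [], "step_count": 3},
    {"stage_number": 4, "depends_on": [1], "step_count": 2},
]

print(pick_batch(stages, done=set()))      # [1]
print(pick_batch(stages, done={1}))        # [2, 3]
print(pick_batch(stages, done={1, 2, 3}))  # [4]
```

Because the loop uses `break` rather than `continue`, the first call returns only stage 1: stage 2 would overflow the budget (5 + 4 > 8), and stage 3 is not considered even though 5 + 3 would fit. A single stage larger than the budget is still returned on its own, which matches the docstring's "always at least one".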

package/runtime/validators/validate-implementation-plan-stages.py CHANGED

@@ -1,5 +1,5 @@
 #!/usr/bin/env python3
-"""S1–S8 checks for the Stage Map structure of an approved
+"""S1–S9 checks for the Stage Map structure of an approved
 implementation-planning final-report.md. Run from prepare_task_bundle
 of `implementation` task or standalone."""
@@ -23,6 +23,11 @@ REQUIRED_SUBSECTIONS = (
     "Stage Validation",
 )
+EXIT_CONTRACT_HEADING = re.compile(r"^###\s+Stage Exit Contract\b", re.M)
+# best-effort path token: only slash-containing paths count as files, so
+# endpoints (`/bar`), env vars (`BAZ_MODE`), and extensionless tokens are skipped.
+PATH_TOKEN = re.compile(r"(?:[\w.@-]+/)+[\w.@-]+")
 @dataclass
 class StageMeta:
@@ -35,7 +40,7 @@ class StageMeta:
 @dataclass
 class ValidationError:
-    code: str   # S1..S8
+    code: str   # S1..S9
     stage: int  # 0 = global
     message: str
@@ -85,6 +90,20 @@ def _parse_stage_map(text: str) -> Tuple[List[StageMeta], List[ValidationError]]
     return rows, errors
+def _slice_stage_section(text: str, stage_number: int) -> str:
+    """Return the body of `## 4.5.<n> Stage <n>:` up to the next stage heading."""
+    start_m = re.search(
+        rf"^##\s+4\.5\.{stage_number}\s+Stage\s+{stage_number}\s*:", text, re.M
+    )
+    if not start_m:
+        return ""
+    start = start_m.end()
+    nxt = re.search(
+        rf"^##\s+4\.5\.{stage_number + 1}\s+Stage\s+", text[start:], re.M
+    )
+    return text[start: start + nxt.start()] if nxt else text[start:]
 def _count_effective_steps(section: str) -> int:
     m = re.search(r"^###\s+Stepwise Execution Order\b", section, re.M)
     if not m:
@@ -114,19 +133,13 @@ def _count_effective_steps(section: str) -> int:
 def _check_each_stage_section(text: str, stages: List[StageMeta]) -> List[ValidationError]:
     errs: List[ValidationError] = []
     for s in stages:
-        pattern = rf"^##\s+4\.5\.{s.stage_number}\s+Stage\s+{s.stage_number}\s*:"
-        start_m = re.search(pattern, text, re.M)
-        if not start_m:
+        if not re.search(
+            rf"^##\s+4\.5\.{s.stage_number}\s+Stage\s+{s.stage_number}\s*:", text, re.M
+        ):
             errs.append(ValidationError("S3", s.stage_number,
                 f"stage section '## 4.5.{s.stage_number} Stage {s.stage_number}:' missing"))
             continue
-        # Slice the stage's section body
-        start = start_m.end()
-        nxt = re.search(
-            rf"^##\s+4\.5\.{s.stage_number + 1}\s+Stage\s+",
-            text[start:], re.M,
-        )
-        section = text[start: start + nxt.start()] if nxt else text[start:]
+        section = _slice_stage_section(text, s.stage_number)
         for sub in REQUIRED_SUBSECTIONS:
             if not re.search(rf"^###\s+{re.escape(sub)}\b", section, re.M):
@@ -181,8 +194,42 @@ def _check_depends_on(stages: List[StageMeta]) -> List[ValidationError]:
     return errs
+def _extract_exit_contract_files(section: str) -> set:
+    m = EXIT_CONTRACT_HEADING.search(section)
+    if not m:
+        return set()
+    body = section[m.end():]
+    nxt = re.search(r"^###\s+\w", body, re.M)
+    if nxt:
+        body = body[: nxt.start()]
+    return set(PATH_TOKEN.findall(body))
+def _check_parallel_safety(text: str, stages: List[StageMeta]) -> List[ValidationError]:
+    """S9: two `depends-on (none)` stages must not predict the same file —
+    otherwise two parallel implementation runs would edit it concurrently."""
+    files = {
+        s.stage_number: _extract_exit_contract_files(
+            _slice_stage_section(text, s.stage_number)
+        )
+        for s in stages
+        if not s.depends_on
+    }
+    errs: List[ValidationError] = []
+    nums = sorted(files)
+    for i in range(len(nums)):
+        for j in range(i + 1, len(nums)):
+            a, b = nums[i], nums[j]
+            shared = files[a] & files[b]
+            if shared:
+                errs.append(ValidationError("S9", 0,
+                    f"parallel stages {a} and {b} share predicted file(s): "
+                    f"{', '.join(sorted(shared))}"))
+    return errs
 def collect_validation_errors(text: str) -> List[ValidationError]:
-    """All S1–S8 checks against the report text; empty list means valid.
+    """All S1–S9 checks against the report text; empty list means valid.
     S1 (missing `## 4.5 Stage Map` heading) makes the rest unparseable, so it
     short-circuits. Shared by `main()` (CLI / implementation entry) and the
@@ -198,6 +245,7 @@ def collect_validation_errors(text: str) -> List[ValidationError]:
     if stages:
         errors.extend(_check_each_stage_section(text, stages))
         errors.extend(_check_depends_on(stages))
+        errors.extend(_check_parallel_safety(text, stages))
     return errors
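
The S9 check can be tried in isolation with the `PATH_TOKEN` pattern from the diff. The sketch below is an illustration only: the exit-contract bodies are hypothetical, and the pairwise comparison is a condensed form of `_check_parallel_safety` that skips the Stage Map parsing.

```python
import re
from itertools import combinations

# Same pattern as PATH_TOKEN in the diff above.
PATH_TOKEN = re.compile(r"(?:[\w.@-]+/)+[\w.@-]+")

# Hypothetical Stage Exit Contract bodies for three depends-on (none) stages.
contracts = {
    1: "- modified: `src/fonts/repo.ts`\n- endpoint: `/fonts`\n- env: `FONT_MODE`",
    2: "- added: `src/fonts/repo.ts`, `test/fonts/repo.test.ts`",
    3: "- added: `docs/fonts.md`",
}

# Slash-containing tokens are treated as predicted files.
files = {n: set(PATH_TOKEN.findall(body)) for n, body in contracts.items()}

# Report every pair of stages whose predicted file sets intersect.
overlaps = {
    (a, b): sorted(files[a] & files[b])
    for a, b in combinations(sorted(files), 2)
    if files[a] & files[b]
}

print(files[1])
print(overlaps)
```

The single-segment endpoint `/fonts` and the env var `FONT_MODE` are skipped, so only stages 1 and 2 collide on `src/fonts/repo.ts`. One caveat of the best-effort pattern: a multi-segment endpoint such as `/api/users` still matches as `api/users`, so it would be counted as a file.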