npm - okstra - Versions diffs - 0.41.0 → 0.42.0 - Mend

okstra 0.41.0 → 0.42.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (6) hide show

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "okstra",
-  "version": "0.41.0",
+  "version": "0.42.0",
   "description": "Multi-agent cross-verification orchestrator runtime + Claude Code skills.",
   "license": "MIT",
   "author": "devonshin",

package/runtime/BUILD.json CHANGED Viewed

@@ -1,5 +1,5 @@
 {
-  "package": "0.41.0",
-  "builtAt": "2026-06-02T15:52:15.071Z",
+  "package": "0.42.0",
+  "builtAt": "2026-06-03T11:08:34.120Z",
   "repoRoot": "/home/runner/work/okstra/okstra"
 }

package/runtime/prompts/profiles/_implementation-executor.md CHANGED Viewed

@@ -26,6 +26,7 @@ until Phase 5 ends, then drop from active context for Phase 6/7.
   - Order of operations per plan step: (1) write/extend the test that captures the step's acceptance criterion and confirm it fails for the right reason, (2) commit the failing test (`test(<scope>): ...`), (3) implement the minimum change to make it pass, (4) commit the implementation (`feat|fix(<scope>): ...`), (5) refactor without changing behaviour and commit separately if any cleanup is made (`refactor(<scope>): ...`). The failing-then-passing transition between steps (2) and (4) is the `TDD evidence` required by the final report.
   - Doc-only / config-only / pure-rename steps that have no observable runtime behaviour are exempt from the failing-test requirement, but the executor MUST cite the exemption per step in the final report (`TDD exemption: <reason>`).
   - When the touched area has no existing test harness, the executor MUST stand up the minimum harness needed to host one regression test for this run rather than skipping TDD entirely. Record the harness-bootstrap step as an `Out-of-plan edit` if it is not in the plan.
+- **DB / IO / SQL changes require real execution — mock-only is NOT validation evidence:** when this run's diff touches DB/IO/SQL (ORM / query-builder code — sequelize / typeorm / prisma / knex / raw SQL — `*.repository.*`, model/entity files, `migrations/**`, `*.sql`, or any changed query string), a mocked unit test cannot observe the SQL the query builder actually emits — a mocked suite once passed while `count({ col: 'FontFamily.fontFamily' })` threw `Unknown column` on the real DB. The executor MUST run the change against a real (or faithful-replica) datastore — the `db-test` validation step (plan `validation` db step, else `project.json.qaCommands.db-test`), targeting a **local / replica** DB — and cite its exact command + exit code in the final report's `Validation evidence`. If no real DB / `db-test` command is reachable, do NOT claim the change verified: label the DB portion `정적 분석상 …, 미검증(실행 안 함)` in the report, surface it in the routing recommendation, and never downplay the real run as "too heavy". `git push` stays forbidden (universal list); the unverified DB state is carried forward so `final-verification` cannot accept it and `release-handoff` cannot push.
 - re-read the approved plan end-to-end and parse the `## 4.5 Stage Map`. Determine **start stage**:
   - if `--stage <N>` is supplied, use N. Otherwise auto = the lowest stage number whose `depends-on` are all recorded as `status:done` in `runs/<plan-key>/consumers.jsonl` AND that itself has no `status:done` row. Multiple stages may match — two parallel `implementation` runs may pick different ones and proceed concurrently.
   - load every `runs/<plan-key>/carry/stage-<i>.json` for `i ∈ depends-on(start_stage)` and inject them into the executor's working context as "runtime carry-in". For `depends-on (none)` stages, no sidecar load — task-brief only.

package/runtime/prompts/profiles/_implementation-verifier.md CHANGED Viewed

@@ -30,7 +30,8 @@ Verifier obtains the QA command set from exactly two declared sources, in order
        "lint":      [{ "label": "cargo clippy", "cmd": "cargo clippy --all-targets -- -D warnings", "language": "rust" }],
        "format":    [{ "label": "cargo fmt",    "cmd": "cargo fmt --check",                          "language": "rust" }],
        "typecheck": [{ "label": "tsc",          "cmd": "pnpm exec tsc --noEmit",                     "language": "ts"   }],
-       "test":      [{ "label": "cargo test",   "cmd": "cargo test --workspace --locked",            "language": "rust" }]
+       "test":      [{ "label": "cargo test",   "cmd": "cargo test --workspace --locked",            "language": "rust" }],
+       "db-test":   [{ "label": "db integ",     "cmd": "pnpm test:db",                               "language": "ts"   }]
      }
    }
    ```
@@ -42,7 +43,7 @@ Tier 1 commands run verbatim first. Then every Tier 2 entry runs once. Each comm
 ### Missing-tier handling
-If a tier is empty or absent, verifier records the single line `qa-command not configured: <category>` per missing category (`lint` / `format` / `typecheck` / `test`) in the worker result and proceeds — silent omission is a contract violation. Verifier MUST NOT auto-detect or invent a command in this case; the user/operator must declare it in `project.json.qaCommands` or in the plan.
+If a tier is empty or absent, verifier records the single line `qa-command not configured: <category>` per missing category (`lint` / `format` / `typecheck` / `test`; and `db-test` **only when the diff touches DB/IO/SQL**, where a missing `db-test` is escalated to a blocking finding per the DB real-execution gate below — not a passive note) in the worker result and proceeds — silent omission is a contract violation. Verifier MUST NOT auto-detect or invent a command in this case; the user/operator must declare it in `project.json.qaCommands` or in the plan.
 ### `cmd` field deny-list (Tier 2 validation)
@@ -74,6 +75,16 @@ Re-running commands proves the diff *builds and passes*; it does NOT prove the d
 - **Advisory findings (recorded as recommendations; verdict MAY still PASS):** function >50 effective lines, a single body mixing read+write stages, weak readability, a missing-but-non-critical outcome assertion. These land in the verifier result as `should-fix` / `nit` recommendations, not as a `FAIL`.
 - **Output.** Every finding — blocking or advisory — is a structured item in the verifier's worker result (`path:line`, rule, severity, suggested fix) so it carries into Phase 5.5 convergence and the final report. A blocking hit sets the verifier verdict to `FAIL` with the rule cited, using the same verdict machinery as the Discrepancy rule above. `Claude lead` MUST NOT silently downgrade a cited blocking finding to advisory during synthesis; an override requires a concrete cited reason, exactly as for the Discrepancy rule.
+### DB / IO / SQL change — real-execution gate (mock-only acceptance forbidden)
+A mocked unit test cannot observe the SQL a query builder actually emits — `count({ col: 'FontFamily.fontFamily' })` passes a mocked suite yet throws `Unknown column` on a real database. For this class of change a green mock-only suite is therefore NOT evidence; only a run against a real (or faithful-replica) datastore is. This gate is the verifier's enforcement of that rule.
+- **Trigger.** Fires when `git diff <base>...HEAD` touches DB/IO/SQL: ORM / query-builder code (sequelize / typeorm / prisma / knex / raw SQL), `*.repository.*`, model/entity files, `migrations/**`, `*.sql`, or any changed query string.
+- **Requirement when fired.** The verifier MUST reproduce a real-DB execution: run the `db-test` tier (Tier 1 = plan `validation` db step; else Tier 2 = `project.json.qaCommands.db-test`) against a **local / replica** datastore (same engine + schema — never shared / staging / prod, consistent with the verifier forbidden-actions list) and record its exact command + exit code. A mock, an in-memory shim that does not parse real SQL, or static reasoning does NOT satisfy this.
+- **No `db-test` command available → blocking, not a passive skip.** If neither tier declares a `db-test` command, the verifier records the blocking finding `db-test not configured — DB change unverified (mock-only)` and sets the verdict to `FAIL`; it MUST NOT emit only the passive `qa-command not configured` note and pass. Recommended fix: declare a `db-test` command in `project.json.qaCommands` or the plan's validation set.
+- **Mock-only evidence → unverified.** If the diff's only DB coverage is mocked, the verifier labels the DB portion `정적 분석상 …, 미검증(실행 안 함)` (never `검증됨`), records it as a blocking finding, and sets `FAIL`. Never downplay the real run as "too heavy / static proof suffices".
+- **Surface it at every layer.** The finding is copied verbatim into the verifier result and MUST survive into the final report's `## 1.` and Verdict Card, so the user sees the DB-unverified state continuously — it is the load-bearing reason a downstream `final-verification` cannot reach `accepted` and `release-handoff` cannot push.
 ## All-verifier-failure policy
 If every verifier present in the resolved roster (`Claude verifier`, `Codex verifier`, and `Gemini verifier` when opted in) ends with a non-result terminal status (`timeout`, `error`, `not-run`) — i.e. zero independent verdicts were produced — the run MUST end with status `blocked` and route to a follow-up `error-analysis` run. `Claude lead` MUST NOT substitute its own verdict in place of the missing verifier outputs; synthesis requires at least one independent verifier's verdict. If one or more verifiers fail but at least one returns a verdict, the run proceeds with the surviving verdict(s) and the final report MUST explicitly notate which verifiers were unavailable, with the captured error / timeout evidence per failed verifier.

package/runtime/prompts/profiles/final-verification.md CHANGED Viewed

@@ -14,6 +14,7 @@
     - delivered artifacts match recorded expected values in `reference-expectations` (config files, deployment manifests, other recorded expected states); when reference-expectations are absent, record it as missing information rather than assuming a match
     - test & validation suite pass status — independently re-run the read-only two-tier command set (Tier 1 = brief/approved-plan `validation`, Tier 2 = `project.json` `qaCommands`) and confirm each passes on the verified head, citing exact command + exit code
     - test correctness — delivered tests actually assert the intended behaviour: no gutted/weakened assertions, no tautological or always-passing tests, no tests exercising only mocks; new behaviour has matching coverage
+    - DB / IO / SQL real-execution evidence — when the diff touches DB/IO/SQL (ORM / query-builder, `*.repository.*`, model / `migrations/**` / `*.sql`, or changed query strings), Validation Evidence MUST cite a real (or faithful-replica) DB execution — the `db-test` command + exit code — not a mock-only suite, because a mocked suite cannot observe the SQL actually emitted (`count({ col: 'FontFamily.fontFamily' })` passed mocks yet threw `Unknown column` on the real DB). A DB-touching change whose only evidence is mocked, or for which no `db-test` ran, is an **Acceptance Blocker** (`major`+): record it, and since `accepted` requires zero blockers the verdict becomes `conditional-accept` / `blocked`. This is the gate that stops an unverified DB change from reaching `release-handoff` and being pushed.
     - no new defects introduced — the diff does not break previously-working behaviour and adds no new bug (logic/off-by-one, null/empty handling, resource leaks, broken error paths)
     - scope conformance — the delivered diff stays within the approved plan's scope; flag out-of-scope edits, unrelated file changes, leftover debug/commented-out code, and unintended deletions
   - Residual-tracked — note as Residual Risk unless severe enough to block:

package/runtime/python/okstra_ctl/qa_commands.py CHANGED Viewed

@@ -20,7 +20,11 @@ import re
 from typing import Iterable
 # 카테고리 화이트리스트. 알 수 없는 카테고리는 오타 가능성이 높으므로 거부.
-ALLOWED_CATEGORIES: tuple[str, ...] = ("lint", "format", "typecheck", "test")
+# `db-test` 는 DB/IO/SQL 변경의 실제 DB(또는 충실한 복제) 실행 테스트 전용 카테고리 —
+# mocked 단위테스트로는 query builder 가 실제로 emit 하는 SQL 을 관측할 수 없으므로
+# `test` 와 분리한다. implementation verifier / final-verification 의 DB 실제실행 게이트가
+# diff 가 DB 를 건드릴 때 이 카테고리(또는 plan validation 의 db 스텝)를 요구한다.
+ALLOWED_CATEGORIES: tuple[str, ...] = ("lint", "format", "typecheck", "test", "db-test")
 # Mutation 을 유발하거나 lockfile 을 갱신하는 토큰. 각 토큰은 `cmd` 문자열을
 # 공백으로 단순 분해한 결과 또는 부분 일치 패턴(prefix/suffix sensitive) 로 검출한다.