npm - okstra - Versions diffs - 0.49.0 → 0.51.0 - Mend

okstra 0.49.0 → 0.51.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (86) hide show

package/README.kr.md +8 -7
package/README.md +8 -7
package/bin/okstra +2 -0
package/docs/kr/architecture.md +23 -24
package/docs/kr/cli.md +6 -6
package/docs/project-structure-overview.md +13 -9
package/docs/superpowers/plans/2026-06-05-wizard-batch-prompts.md +559 -0
package/docs/superpowers/specs/2026-06-05-wizard-batch-prompts-design.md +121 -0
package/docs/task-process/error-analysis.md +1 -1
package/docs/task-process/final-verification.md +1 -1
package/docs/task-process/release-handoff.md +1 -1
package/docs/task-process/requirements-discovery.md +1 -1
package/package.json +1 -1
package/runtime/BUILD.json +2 -2
package/runtime/agents/SKILL.md +18 -14
package/runtime/agents/workers/claude-worker.md +4 -4
package/runtime/agents/workers/codex-worker.md +3 -3
package/runtime/agents/workers/gemini-worker.md +3 -3
package/runtime/agents/workers/report-writer-worker.md +3 -3
package/runtime/bin/lib/okstra/cli.sh +8 -1
package/runtime/bin/lib/okstra/globals.sh +3 -0
package/runtime/bin/lib/okstra/interactive.sh +14 -12
package/runtime/bin/lib/okstra/usage.sh +6 -0
package/runtime/bin/okstra-render-report-views.py +1 -1
package/runtime/bin/okstra-team-reconcile.sh +28 -0
package/runtime/bin/okstra.sh +2 -0
package/runtime/prompts/launch.template.md +4 -2
package/runtime/prompts/profiles/_common-contract.md +15 -15
package/runtime/prompts/profiles/_implementation-deliverable.md +1 -1
package/runtime/prompts/profiles/_implementation-executor.md +3 -3
package/runtime/prompts/profiles/_implementation-verifier.md +2 -2
package/runtime/prompts/profiles/error-analysis.md +1 -1
package/runtime/prompts/profiles/final-verification.md +2 -2
package/runtime/prompts/profiles/implementation-planning.md +10 -9
package/runtime/prompts/profiles/implementation.md +1 -1
package/runtime/prompts/profiles/improvement-discovery.md +5 -5
package/runtime/prompts/profiles/release-handoff.md +2 -2
package/runtime/prompts/profiles/requirements-discovery.md +2 -2
package/runtime/python/okstra_ctl/analysis_packet.py +259 -0
package/runtime/python/okstra_ctl/clarification_items.py +11 -11
package/runtime/python/okstra_ctl/context_cost.py +308 -0
package/runtime/python/okstra_ctl/migrate.py +2 -12
package/runtime/python/okstra_ctl/paths.py +22 -0
package/runtime/python/okstra_ctl/render.py +285 -126
package/runtime/python/okstra_ctl/render_final_report.py +32 -1
package/runtime/python/okstra_ctl/report_views.py +12 -12
package/runtime/python/okstra_ctl/run.py +510 -248
package/runtime/python/okstra_ctl/sequence.py +2 -5
package/runtime/python/okstra_ctl/team_reconcile.py +131 -0
package/runtime/python/okstra_ctl/wizard.py +219 -136
package/runtime/python/okstra_ctl/workflow.py +1 -1
package/runtime/python/okstra_ctl/worktree.py +13 -5
package/runtime/schemas/final-report-v1.0.schema.json +4 -0
package/runtime/skills/okstra-brief/SKILL.md +1 -1
package/runtime/skills/okstra-coding-preflight/SKILL.md +69 -0
package/runtime/skills/okstra-coding-preflight/architecture/hexagonal.md +116 -0
package/runtime/skills/okstra-coding-preflight/clean-code.md +254 -0
package/runtime/skills/okstra-coding-preflight/languages/java.md +64 -0
package/runtime/skills/okstra-coding-preflight/languages/javascript-typescript.md +87 -0
package/runtime/skills/okstra-coding-preflight/languages/kotlin.md +69 -0
package/runtime/skills/okstra-coding-preflight/languages/nodejs.md +66 -0
package/runtime/skills/okstra-coding-preflight/languages/python.md +179 -0
package/runtime/skills/okstra-coding-preflight/languages/rust.md +105 -0
package/runtime/skills/okstra-coding-preflight/languages/sql.md +68 -0
package/runtime/skills/okstra-context-loader/SKILL.md +12 -6
package/runtime/skills/okstra-convergence/SKILL.md +8 -8
package/runtime/skills/okstra-inspect/SKILL.md +100 -1
package/runtime/skills/okstra-report-writer/SKILL.md +27 -23
package/runtime/skills/okstra-run/SKILL.md +3 -1
package/runtime/skills/okstra-team-contract/SKILL.md +8 -5
package/runtime/templates/reports/final-report.template.md +188 -187
package/runtime/templates/reports/i18n/en.json +4 -4
package/runtime/templates/reports/i18n/ko.json +4 -4
package/runtime/templates/reports/implementation-planning-input.template.md +1 -1
package/runtime/templates/reports/release-handoff-input.template.md +1 -1
package/runtime/templates/reports/user-response.template.md +1 -1
package/runtime/templates/worker-prompt-preamble.md +4 -4
package/runtime/validators/lib/fixtures.sh +2 -2
package/runtime/validators/validate-implementation-plan-stages.py +9 -9
package/runtime/validators/validate-report-views.py +10 -10
package/runtime/validators/validate-run.py +36 -36
package/runtime/validators/validate_improvement_report.py +8 -8
package/src/_python-helper.mjs +3 -3
package/src/context-cost.mjs +27 -0
package/src/install.mjs +1 -0
package/src/uninstall.mjs +1 -0

package/runtime/skills/okstra-coding-preflight/languages/kotlin.md ADDED Viewed

@@ -0,0 +1,69 @@
+# Kotlin Conventions
+## Style guide
+Kotlin official coding conventions (kotlinlang.org/docs/coding-conventions.html). In IntelliJ / Android Studio enable via *Settings → Editor → Code Style → Kotlin → Set from… → Kotlin style guide*.
+## Naming
+| Element | Convention | Example |
+|---|---|---|
+| Class / object / interface | UpperCamelCase | `DeclarationProcessor` |
+| Function / property / parameter | lowerCamelCase | `processDeclarations`, `isReady` |
+| Constant (`const val` or top-level immutable) | SCREAMING_SNAKE_CASE | `MAX_COUNT`, `USER_NAME_FIELD` |
+| Backing property | leading underscore | `_elementList` (private), `elementList` (public) |
+| Package | lowercase, no underscores | `com.example.feature` |
+| Test function | backticked sentence | `` `parses empty input` `` |
+## Formatting
+- Indent: **4 spaces**.
+- Opening brace at end of line, closing brace on its own line.
+- Lambdas: spaces around braces and arrow — `list.filter { it > 10 }`, `map.forEach { (k, v) -> println("$k=$v") }`.
+- Trailing comma in multi-line lists / parameters / `when` arms (since Kotlin 1.4).
+- Function expression body when the body is a single expression: `fun double(x: Int) = x * 2`.
+## Required practices
+- Prefer `val` over `var`. Mutability is opt-in.
+- Default and named arguments replace constructor / function overloads.
+- Prefer the standard library: `filter`, `map`, `groupBy`, `associateBy`, `chunked`, `windowed` over manual loops.
+- String templates: `"Hello, $name"`, never `"Hello, " + name`.
+- Extension functions are encouraged. Keep them `internal` or file-private unless they belong to a public API.
+- `data class` for value-holders. Do not attach behaviour to them.
+- Sealed classes / sealed interfaces for closed hierarchies — exhaustive `when` catches new cases at compile time.
+- Null safety: never chain `!!`. Use `?.`, `?:`, `let`, `requireNotNull(x) { "why" }`, `checkNotNull(x)`.
+- Use `inline` for higher-order functions that are called frequently and pass lambdas, **not** as a default optimization.
+- Use `object` for true singletons and `companion object` only when JVM-visible static members are needed.
+## Coroutines
+- Pick **one** coroutine framework per module. Do not mix coroutines with `CompletableFuture` chains in the same flow.
+- Never `runBlocking { ... }` inside another coroutine.
+- Pass `CoroutineScope` explicitly; do not use `GlobalScope`.
+- Cancel scopes deterministically on lifecycle teardown.
+- Use `withContext(Dispatchers.IO)` only for blocking I/O — most application code stays on the default dispatcher.
+## Tests
+- Framework: **JUnit 5 + kotlin.test**, or **Kotest**. Match the project.
+- Backticked function names are idiomatic for tests: `` fun \`returns empty list when input is empty\`() `` .
+- Use `kotlinx-coroutines-test`'s `runTest` for suspend tests.
+- Use **MockK** over Mockito for Kotlin (handles `final` classes, coroutines, and extension functions correctly).
+- Prefer property-based tests (Kotest's `forAll`) for pure transformations.
+- Assertion libraries: `kotest-assertions` (`shouldBe`, `shouldThrow`) or AssertJ.
+### Self-mock signals to refuse (rule from `clean-code.md` → Testing discipline)
+- `spyk(sut)` followed by `every { sut.someMethod() } returns ...` — the SUT's own method is replaced; the test verifies wiring, not behavior.
+- `coEvery { sut.suspendMethod() } returns ...` for the suspend equivalent.
+- `mockkObject(SomeSingleton)` / `mockkStatic(...)` when `SomeSingleton` *is* the unit under test.
+- Reflection or `internal` visibility hacks (`@VisibleForTesting`, `callPrivateFunc` extensions) used to assert on private helpers — that's the implementation-coupling smell, not just a self-mock.
+What's fine: `mockk<Collaborator>()` for injected dependencies, `coEvery {}` on those collaborators, and `verify {}` on outcome boundaries (mailer, gateway, repository write).
+## Formatter / linter to run
+- **ktlint** and / or **detekt**.
+- `./gradlew ktlintFormat detekt` if the plugins are configured.
+- Run before commit.

package/runtime/skills/okstra-coding-preflight/languages/nodejs.md ADDED Viewed

@@ -0,0 +1,66 @@
+# Node.js Conventions (server-side)
+Load this **in addition to** [javascript-typescript.md](javascript-typescript.md). JS / TS rules still apply; this file adds the server-specific layer.
+## Architecture
+- Separate layers: **HTTP / transport** → **business / domain** → **data access**. No layer skips downward; no layer reaches upward.
+- Request handlers are thin: parse input → call a service → format response.
+- Business logic is framework-agnostic. Do **not** import Express / Fastify / Nest types in service or domain modules.
+- One file, one concern. A 1000-line `index.ts` is a structural bug.
+- Dependency injection (constructor or factory) over module-level singletons. Singletons are not testable.
+## Error handling
+- Distinguish **operational errors** (network, validation, expected business state) from **programmer errors** (bugs, broken invariants). Recover from operational, crash on programmer.
+- Use `async` / `await` with `try` / `catch` at the layer that knows what to do. Catching just to rethrow with no extra value is a smell.
+- Centralise error handling in middleware. Every route returns through it.
+- Always handle event-emitter `'error'` events — an unhandled emitter error crashes the process.
+- Throw subclasses of `Error` (`class ValidationError extends Error {}`) so handlers can `instanceof` them.
+- Process-level: `process.on('unhandledRejection', ...)` and `process.on('uncaughtException', ...)` should log and exit gracefully. Do not swallow.
+## Async discipline
+- Never mix callbacks with promises in the same flow. Use `util.promisify` at the boundary, once.
+- Prefer `for await...of` over manual cursor / stream loops.
+- `Promise.all` for independent parallel work, `Promise.allSettled` when one failure must not cancel the rest.
+- Set explicit timeouts on **every** outgoing HTTP / DB call (no infinite waits, no defaults).
+- Use `AbortController` to cancel in-flight work when the upstream caller goes away.
+## Security baseline
+- All configuration in environment variables (loaded via `dotenv` in dev only). Validate them at startup with `zod`, `envalid`, or similar — fail fast on missing / invalid values.
+- Never log secrets, auth-endpoint request bodies, full JWTs, or PII.
+- HTTPS only at the edge. Behind a TLS-terminating proxy is fine; never speak plain HTTP across the public internet.
+- Use `helmet` for security headers, `cors` with an explicit allowlist (no `*` in production).
+- Validate every untrusted input (`zod`, `joi`, `valibot`). Validation belongs at the boundary, not inside services.
+- Rate-limit public endpoints (`express-rate-limit` or upstream proxy).
+- Run `npm audit` / `pnpm audit` regularly. Patch promptly.
+## Performance
+- **Streams** (`fs.createReadStream`, `pipeline`) for large payloads — never load multi-MB files fully into memory.
+- Multi-core: `cluster`, PM2, or a process manager. Pure async will not save you from a single-thread CPU bottleneck.
+- Keep request handlers off the event loop. Offload heavy CPU work to a `worker_thread` or a queue.
+- Connection pool for every DB client. `pg`, `mysql2`, and `mongoose` default to a pool — do **not** create a client per request.
+- Cache idempotent reads at an explicit layer (Redis, in-memory LRU). Do not bolt caching into a service method.
+- Use `Buffer`/typed arrays for binary data, not strings.
+## Logging
+- Structured logger (`pino`, `winston`). Emit **JSON**, not human strings.
+- Every request: request id, method, path, status, latency. Propagate the request id through downstream calls via `AsyncLocalStorage`.
+- Levels: `error` (pageable), `warn` (suspicious), `info` (business event), `debug` (off in production).
+- Never `console.log` in production code. `console.log` exists for ad-hoc REPL only.
+## Tests
+- Unit-test services and domain functions with zero Node-runtime dependencies — no `fs`, no `net`, no real DB.
+- Integration-test the HTTP layer with `supertest` against the real Express / Fastify app instance.
+- `testcontainers` for DBs when the production DB has features (locks, full-text, JSONB, partitioning) you depend on. In-memory fakes lie about behaviour.
+- Test the error paths. Coverage of the happy path only is not coverage.
+## Formatter / linter / tools
+- Same as JS / TS: `prettier`, `eslint`, `tsc`.
+- `npm run lint && npm test` (or the project's equivalent) must pass before any commit.

package/runtime/skills/okstra-coding-preflight/languages/python.md ADDED Viewed

@@ -0,0 +1,179 @@
+# Python Conventions
+If a project-level Python style skill or repo-local `CONTRIBUTING.md` / `pyproject.toml` config exists, **it wins**. This file is the fallback.
+## Core principles
+- Write simple, explicit, readable Python. Prefer boring, maintainable code over clever abstractions.
+- Optimize for correctness first, then clarity, then performance.
+- Keep functions small and single-responsibility. Prefer composition over inheritance.
+- Type-hint all public functions, class methods, and non-trivial internal functions.
+- Use dataclasses or Pydantic models for structured data — never loose `dict`s for domain data.
+- Make side effects explicit. Fail fast with clear errors. No hidden global state.
+- Code must be testable without network, filesystem, database, or time dependencies.
+## Python version
+Target **Python 3.11+** unless told otherwise. Use modern syntax:
+- `str | None`, not `Optional[str]`
+- `list[str]` / `dict[str, int]`, not `List` / `Dict`
+- `match` only when it improves clarity
+- `pathlib.Path`, not string paths
+## Style guide
+PEP 8, enforced by `ruff format` (or `black`).
+- Indent: **4 spaces**. Line length: **88** (ruff/black default) unless the project sets otherwise.
+- `snake_case` functions/variables, `PascalCase` classes, `SCREAMING_SNAKE_CASE` constants, leading `_` for private.
+## Function design
+- One thing per function. Avoid bodies over ~40 lines without a strong reason.
+- Explicit parameters over reading globals. Return values instead of mutating arguments.
+- Avoid behavior-changing boolean flags — split into separate functions.
+- Do not hide I/O inside functions that look pure. Do not catch broad exceptions unless re-raising with useful context.
+Bad:
+```python
+def process(data, save=True, notify=False):
+    ...
+```
+Better:
+```python
+def build_invoice(data: InvoiceInput) -> Invoice: ...
+def save_invoice(invoice: Invoice) -> None: ...
+def notify_invoice_created(invoice: Invoice) -> None: ...
+```
+## Typing
+- Type hints everywhere meaningful. Avoid `Any` unless unavoidable; never silence a type error without a one-line why.
+- No untyped dicts for domain data — use `dataclass`, `Enum`, `TypedDict`, or Pydantic.
+- `Protocol` for dependency inversion.
+Bad:
+```python
+def create_user(data: dict): ...
+```
+Better:
+```python
+@dataclass(frozen=True)
+class CreateUserCommand:
+    email: str
+    name: str
+def create_user(command: CreateUserCommand) -> User: ...
+```
+## Error handling
+- Specific exception types with useful context. Never swallow exceptions silently.
+- Do not use exceptions for normal control flow.
+- Convert infrastructure errors into application/domain errors at boundaries.
+- Never expose internal stack traces or secrets to users.
+Bad:
+```python
+try:
+    send_email(user)
+except Exception:
+    pass
+```
+Better:
+```python
+try:
+    send_email(user)
+except EmailProviderError as exc:
+    raise NotificationFailedError(f"Failed to notify user_id={user.id}") from exc
+```
+## Async
+- Use async only for real I/O concurrency. Never call sync-blocking I/O inside an async function.
+- Always set timeouts on external calls. No fire-and-forget tasks without explicit lifecycle + error handling.
+Bad:
+```python
+async def fetch():
+    requests.get(url)          # blocking call inside async
+```
+Better:
+```python
+async def fetch(client: httpx.AsyncClient, url: str) -> Response:
+    return await client.get(url, timeout=10)
+```
+## Project / module layout
+Separate domain logic from infrastructure; keep business rules independent of frameworks, DBs, queues, and external APIs.
+- `domain/` — entities, value objects, pure business logic
+- `application/` — use cases, orchestration, commands, queries
+- `infrastructure/` — DB, external APIs, filesystem, queues
+- `interfaces/` — HTTP, CLI, workers, event handlers
+- `tests/` — unit, integration, e2e
+When the project is ports-and-adapters, also read [../architecture/hexagonal.md](../architecture/hexagonal.md).
+## Security
+- Never hardcode secrets — read from env vars or a secret manager.
+- Parameterized SQL only; never build SQL via string interpolation / f-strings.
+- Validate all external input. Treat file paths, URLs, headers, and serialized input as untrusted.
+- No unsafe deserialization (arbitrary `pickle.load`).
+## Database
+- Explicit, reviewable SQL. Avoid N+1 queries. Wrap multi-step writes in transactions.
+- Use repository interfaces (`Protocol`) when DB access should be decoupled from domain logic.
+- Keep migrations backward-compatible when possible.
+## Logging
+- Structured logging — never `print()` for application logs.
+- Include IDs/context (request, user, job, entity). Never log secrets, tokens, passwords, cookies, API keys, or sensitive PII.
+## Tests
+- Unit-test domain logic; integration-test DB, external-API wrappers, and framework wiring.
+- Deterministic tests — no `sleep`, no execution-order dependence, no real external services unless marked integration/e2e.
+- Mock only at boundaries. Use factories/builders for test data. Test behavior, not implementation.
+- Every bug fix ships a regression test when feasible.
+### Self-mock signals to refuse (rule from `clean-code.md` → Testing discipline)
+- `unittest.mock.patch`-ing a method **on the class under test**, then asserting that method was called. Mock the collaborator it depends on, not the unit it *is*.
+- Splitting out a helper *only so* the test can patch it, then asserting `helper.assert_called_once()` as the real check — that proves the SUT calls the helper, not that the behavior works.
+- Reaching into privates (`obj._internal`) to assert on internal state instead of observable outcomes.
+- A `MagicMock` standing in for the SUT itself with a `return_value` that mirrors the very thing under test.
+What's fine: mocking injected dependencies (`Mock(spec=UserRepository)`, `httpx.MockTransport`), asserting on return values / raised exceptions / emitted events / boundary calls.
+## Required tooling
+Run before declaring a Python change complete:
+- `ruff check .` — lint
+- `ruff format --check .` — formatting (or `black --check .`)
+- `mypy .` or `pyright` — type check (if configured)
+- `pytest` — tests
+Dependency/project management: `uv`, Poetry, or `pip-tools`. Pin dependencies in application projects; reach for the standard library before adding third-party deps, and avoid importing heavy deps at module-import time when unnecessary.
+## Forbidden anti-patterns
+God classes/functions, hidden global mutable state, circular imports, catch-all `except Exception` without re-raise, silent failure, untyped domain dicts, business logic in controllers/routes or coupled to ORM models, hardcoded secrets, string-formatted SQL, unbounded retries, missing timeouts on external calls, fire-and-forget async without error handling, order-dependent tests, abstractions before two concrete use cases, magic constants without names, copy-pasted logic, comments that restate obvious code, leftover `print()` debugging.

package/runtime/skills/okstra-coding-preflight/languages/rust.md ADDED Viewed

@@ -0,0 +1,105 @@
+# Rust Conventions
+If a project-level `rust-guidelines` skill or repo-local `CONTRIBUTING.md` exists, **it wins**. This file is the fallback.
+## Style guide
+Official Rust Style Guide (rustwiki.org/en/style-guide/), enforced by `rustfmt`. Set the edition in `rustfmt.toml`:
+```toml
+style_edition = "2024"
+```
+## Naming (RFC 430)
+| Element | Convention | Example |
+|---|---|---|
+| Crate / module | `snake_case` | `my_crate`, `parser` |
+| Type / trait / enum variant | `UpperCamelCase` | `HttpClient`, `Error::Timeout` |
+| Function / method / variable | `snake_case` | `send_request`, `retry_count` |
+| Const / static | `SCREAMING_SNAKE_CASE` | `MAX_RETRIES` |
+| Lifetime | short lowercase | `'a`, `'src` |
+| Type parameter | single uppercase | `T`, `E` |
+## Formatting
+- Indent: **4 spaces**.
+- Max line length: **100** (`rustfmt` default).
+- Always run `cargo fmt` before commit.
+## Ownership & borrowing
+- Prefer borrowing (`&T`) over cloning. `.clone()` is a code smell unless you can name the reason in one sentence.
+- Return owned types from constructors and "convert" methods; accept borrowed slices (`&str`, `&[T]`) as parameters.
+- `&mut` only when you must mutate. Two `&` borrows are usually better than one `&mut`.
+- `Cow<'_, str>` when a function sometimes allocates and sometimes does not.
+## Error handling
+- **Library** crates: define a typed error with `thiserror`. One `Error` enum per crate, one variant per failure mode.
+- **Application** crates: `anyhow::Result<T>` at the top level is fine; convert to typed errors at module boundaries.
+- **No `unwrap()` / `expect()` in production paths.** Acceptable in:
+  - Tests.
+  - `main` for setup that genuinely cannot fail.
+  - `expect("invariant: ...")` when the message documents the invariant.
+- Use the `?` operator. Do not write `match`-on-`Result` ladders.
+## Idioms
+- Iterators (`map`, `filter`, `collect`, `fold`, `find`) over indexed loops.
+- `if let` / `let else` over `match` with a single arm.
+- `#[derive(Debug, Clone, ...)]` aggressively. Add `PartialEq` / `Eq` / `Hash` when the type lands in a collection.
+- Newtype small primitives that carry meaning: `struct UserId(u64);`, not raw `u64`.
+- `mut` and `pub` only when needed. Start private, widen on demand.
+- Use `From` / `TryFrom` for conversions, not `parse_x_from_y` helpers.
+- Pattern-match deeply (`if let Some(User { id, .. }) = user`) rather than chained `.unwrap().unwrap()`.
+## Async / Tokio
+- Pick **one** runtime per binary (`tokio` is standard). Never mix `tokio` and `async-std`.
+- Never `block_on` inside an async function.
+- Spawn tasks deliberately; remember `JoinHandle`s and `await` them.
+- `tokio::select!` for races; never poll futures by hand.
+- Long CPU work goes in `tokio::task::spawn_blocking`.
+- Cancellation: use `CancellationToken` (tokio-util) for cooperative shutdown.
+## Unsafe
+- Every `unsafe` block needs a comment documenting **every** invariant the caller must uphold.
+- New `unsafe` requires a second pair of eyes on review.
+- Default to safe abstractions (`bytes`, `parking_lot`, `crossbeam`) before reaching for `unsafe`.
+## Module layout
+- Public API at the crate root via `pub use`; implementation in private modules.
+- One concept per file. Split a module before it passes ~500 lines.
+- Unit tests in `mod tests { use super::*; ... }` at the bottom of the file.
+- Integration tests in the `tests/` directory.
+## Tests
+- Unit: `#[test]` inside `mod tests`. Use `assert_eq!`, `assert!`, `assert_matches!`.
+- Async: `#[tokio::test]` or `#[test]` with an explicit `tokio::runtime::Runtime` when you need control.
+- Property: `proptest` or `quickcheck` for pure transformations.
+- **Doc tests**: runnable examples in doc comments are tests — keep them green.
+- Use `pretty_assertions::assert_eq!` for readable diffs on large structs.
+### Self-mock signals to refuse (rule from `clean-code.md` → Testing discipline)
+Rust's trait-based DI makes self-mocking rarer than in JVM/JS, but it still happens:
+- Using `mockall::mock!` (or `automock`) to generate a mock of the **same struct under test**, then asserting on its own methods. The unit under test must be the real impl; mock the trait it depends on, not the trait it *is*.
+- Splitting a behavior into a helper trait *only so* the test can stub it, then expecting `expect_helper().returning(...)` to be the real assertion. The test now proves the SUT calls the helper, not that the behavior works.
+- Reaching into privates via `pub(crate)` widening, `#[cfg(test)] pub` shortcuts, or test-only modules that expose internal state to assert on.
+- A test that constructs the SUT, replaces one of its trait-object dependencies with a mock whose `returning(...)` mirrors the very thing being tested.
+What's fine: `mockall` mocks of injected trait dependencies (`MockHttpClient`, `MockUserRepository`), `unwrap()` in tests for setup that genuinely cannot fail, `assert_eq!` on returned values, `assert_matches!` on returned `Result` / `Option`.
+## Required tooling
+Run before any commit:
+- `cargo fmt --all` — formatting.
+- `cargo clippy --all-targets --all-features -- -D warnings` — lint (warnings are errors).
+- `cargo check --all-targets` — fast type check.
+- `cargo test --all` — tests.

package/runtime/skills/okstra-coding-preflight/languages/sql.md ADDED Viewed

@@ -0,0 +1,68 @@
+# SQL Conventions
+Defaults below target PostgreSQL. MySQL-specific notes are marked.
+## Formatting / style
+- Keywords **lowercase**: `select`, `from`, `where`, `join`. (Some teams prefer uppercase — match the file you are editing; never mix in one statement.)
+- All identifiers `snake_case`: tables, columns, indexes, constraints.
+- Always use explicit `as` for column aliases: `select count(*) as user_count`.
+- Always state the join type: `inner join`, `left join`, `cross join`. **Never** a bare `join`.
+- One column per line in `select`; one table or join per line in `from`.
+- Prefer **CTEs** (`with foo as (...)`) over deeply nested subqueries.
+- Dates / timestamps in ISO 8601: `'2026-05-17'`, `'2026-05-17T12:00:00Z'`.
+## Schema design
+- Every table has an `id` primary key.
+  - PostgreSQL: `id bigint generated always as identity primary key`.
+  - MySQL: `id bigint unsigned not null auto_increment primary key`.
+- Foreign keys named `<referenced_table>_id`: `user_id`, `order_id`.
+- Singular table names (`user`, `order`, `invoice_line`) unless the project already uses plural — match, do not mix.
+- `comment on table ... is '...'` for every table; `comment on column ...` for non-obvious columns. (MySQL: `comment '...'` inline.)
+- Timestamps: `created_at`, `updated_at`, both `timestamptz` in Postgres. Keep timezone discipline explicit — store UTC, render in the user's zone at the edge.
+- Soft delete: `deleted_at timestamptz null`. Index it if you query on it. Add a partial index `where deleted_at is null` for hot reads.
+- Use `not null` aggressively. `null` should mean "unknown", not "default".
+- Use enum or `check` constraints over free-text status columns.
+- Composite indexes: column order matches your `where` / `order by` patterns; left-most prefix wins.
+## Migrations
+- One change per migration file. Name: `<timestamp>_<verb>_<noun>.sql` (e.g. `20260517_120000_add_user_deleted_at.sql`).
+- Migrations are **forward-only** in production. Write a "down" migration only if your tool requires it and you genuinely intend to run it.
+- Splitting a destructive change:
+  1. Add nullable column.
+  2. Backfill (separate deploy).
+  3. Add `not null` / index `concurrently`.
+  4. Drop the old column.
+- `create index concurrently` in Postgres for hot tables — avoids the write lock.
+- Application-deploy migrations **never** destroy data. Destructive operations are a separate, explicit step gated on backups.
+## Query practices
+- Always filter on indexed columns for OLTP queries. Verify with `explain` / `explain analyze`.
+- Never `select *` in application code (migrations and ad-hoc inspection are fine).
+- Parameterised queries from the application layer. String concatenation is SQL injection.
+- `limit` every query that could return an unbounded set.
+- `returning *` (Postgres) on `insert` / `update` / `delete` instead of a follow-up `select`.
+- For pagination, prefer keyset (`where id > $last_seen`) over offset on large tables.
+## MySQL-specific
+- Engine: **InnoDB** (default). Charset: `utf8mb4`, collation `utf8mb4_0900_ai_ci` (8.0+).
+- Quote identifiers with backticks when they collide with reserved words.
+- `select ... for update` only inside an explicit transaction.
+- `on duplicate key update` for upserts; in Postgres use `on conflict (...) do update`.
+## Tests
+- Run query tests against a **real** database via `testcontainers` or a disposable schema. In-memory SQLite is not equivalent to Postgres / MySQL — different SQL dialect, different isolation, different `null` semantics.
+- Migrations themselves must be tested: apply from an empty schema in CI; assert the resulting structure.
+- Seed test data via factory functions (one per table), not raw `insert` statements copy-pasted into each test.
+- Verify performance assumptions with `explain` in CI for queries flagged as hot.
+## Tools
+- Formatting: `pg_format`, `sqlfluff`, `sql-formatter`.
+- Linting / safety: `squawk` (Postgres migration linter — catches missing `concurrently`, locking foot-guns, etc.).
+- Schema diff: `migra` (Postgres), `mysqldiff` (MySQL).

package/runtime/skills/okstra-context-loader/SKILL.md CHANGED Viewed

@@ -56,7 +56,7 @@ user-invocable: false
 | `workflow.awaitingApproval` | Approval wait marker |
 | `workflow.routingStatus` | Routing decision status |
 | `workflow.lastSafeCheckpoint` | Safe resume checkpoint metadata |
-| `instructionSetPath` | Path to the `instruction-set/` **directory** containing `analysis-profile.md`, `analysis-material.md`, `reference-expectations.md`, `task-brief.md`, `final-report-template.md` (see Step 4). Not a single-file path. |
+| `instructionSetPath` | Path to the `instruction-set/` **directory** containing `analysis-packet.md`, `analysis-profile.md`, `analysis-material.md`, `reference-expectations.md`, `task-brief.md`, `final-report-template.md` (see Step 4). Not a single-file path. |
 | `referenceExpectationsPath` | config/deployment expectation artifact path |
 | `latestRunPath` | latest run path |
 | `latestRunStatus` | latest run status |
@@ -78,6 +78,7 @@ After identifying the task root in `task-manifest.json`, derive all paths accord
 ├── task-index.md               (human-readable summary, non-canonical)
 ├── instruction-set/
 │   ├── analysis-profile.md     (analysis guide by task type)
+│   ├── analysis-packet.md      (primary compact input for analysis workers)
 │   ├── analysis-material.md    (analysis materials)
 │   ├── reference-expectations.md (config/deployment expected values)
 │   ├── task-brief.md          (task brief)
@@ -116,13 +117,18 @@ After identifying the task root in `task-manifest.json`, derive all paths accord
 ## Step 4: Instruction Set Reading Order
-After verifying `task-manifest.json`, read the instruction set in the following order:
+After verifying `task-manifest.json`, read only the compact intake files needed for the current action. Do not bulk-read the whole instruction-set directory.
 1. `instruction-set/analysis-profile.md` (analysis guide by task type)
-2. `instruction-set/analysis-material.md` (analysis materials, if available)
-3. `instruction-set/reference-expectations.md` (config/deployment expected values)
-4. `instruction-set/task-brief.md` (task brief)
-5. `instruction-set/final-report-template.md` (report template)
+2. `instruction-set/analysis-packet.md` (primary compact input for analysis workers)
+3. `runs/<task-type>/state/active-run-context-<task-type>-<seq>.json` if present (compact current-run path/worker snapshot)
+Read source files lazily:
+- `instruction-set/task-brief.md` only for reporter-confirmation checks, source verification, or report-writer synthesis.
+- `instruction-set/analysis-material.md` only when packet content is insufficient or a source citation needs verification.
+- `instruction-set/reference-expectations.md` for report-writer synthesis or when packet expectation extract is insufficient.
+- `instruction-set/final-report-template.md` only for report-writer authoring.
 ### Brief Reporter-Confirmation Precondition (BLOCKING)

package/runtime/skills/okstra-convergence/SKILL.md CHANGED Viewed

@@ -77,7 +77,7 @@ Read the worker result files generated in Phase 4/5 and extract individual findi
    - For bullet/numbered findings, parse `[TICKETID: <id>]` from the item title.
    - Items with multiple tickets (e.g. `TICKET-123, TICKET-456`) expand to a set of ticket keys.
    - Items tagged `unknown` keep the literal `unknown` as their ticket key.
-2. For each finding, record the summary, evidence (file path, line number, basis), the worker who identified it, **the worker-internal item ID assigned by that worker** (e.g. `F-001`, `1.1`, `F-3` — see `prompts/profiles/_common-contract.md` "Cross-worker traceability" SSOT), and the parsed ticket set. The item ID is persisted on the finding record as `findings[].discoveredBy.<worker>.itemId` and on each cross-worker confirmation as `findings[].sourceItems[]` (one entry per contributing `<worker>:<item-id>` pair). The final-report's `## 1.1 Consensus` / `## 1.2 Differences` / `## 3.1 Primary Evidence` tables read this list verbatim into their `Source items` columns — without this, the synthesised `C-NNN` row has no traceable link back to the original worker wording.
+2. For each finding, record the summary, evidence (file path, line number, basis), the worker who identified it, **the worker-internal item ID assigned by that worker** (e.g. `F-001`, `1.1`, `F-3` — see `prompts/profiles/_common-contract.md` "Cross-worker traceability" SSOT), and the parsed ticket set. The item ID is persisted on the finding record as `findings[].discoveredBy.<worker>.itemId` and on each cross-worker confirmation as `findings[].sourceItems[]` (one entry per contributing `<worker>:<item-id>` pair). The final-report's `## 6.1 Consensus` / `## 6.2 Differences` / `## 2.1 Primary Evidence` tables read this list verbatim into their `Source items` columns — without this, the synthesised `C-NNN` row has no traceable link back to the original worker wording.
 3. Claude Lead groups findings based on semantic similarity AND ticket-set equality:
   - Same semantics + same ticket set across 2+ workers → immediately reach `full consensus`.
   - Same semantics but disjoint ticket sets → keep as separate groups (do NOT over-merge across tickets).
@@ -561,12 +561,12 @@ existing Acceptance Blocker. If you find none, say so explicitly.
 ### Verification — confirm-or-downgrade (BLOCKING)
 Each candidate blocker is verified by the Phase 4 analysers (excluding the critic). Do NOT use the adversarial finding classifier's "uncertain → reject" rule here.
-- **Confirmed** (an analyser reproduces it or cites supporting evidence) → promote to a `## 4 Acceptance Blockers` row (keep severity + recommended follow-up phase).
+- **Confirmed** (an analyser reproduces it or cites supporting evidence) → promote to a `## 5.8 Acceptance Blockers` row (keep severity + recommended follow-up phase).
 - **Not confirmed** (cannot reproduce, or evidence is weak) → **downgrade to a Residual Risk row — never drop it.** Record the escalation trigger so the user can re-judge a high-severity-but-unconfirmed candidate.
 ### Verdict impact
-Promoted blockers enter `## 4 Acceptance Blockers`; since `accepted` requires zero blockers, the verdict moves to `conditional-accept` / `blocked` automatically. The existing verdict↔blocker consistency validator (`validators/validate-run.py` `_validate_final_verification_consistency`) enforces this unchanged — no new enum or validator.
+Promoted blockers enter `## 5.8 Acceptance Blockers`; since `accepted` requires zero blockers, the verdict moves to `conditional-accept` / `blocked` automatically. The existing verdict↔blocker consistency validator (`validators/validate-run.py` `_validate_final_verification_consistency`) enforces this unchanged — no new enum or validator.
 ### State
@@ -630,7 +630,7 @@ Default values are emitted into the manifest by `scripts/okstra_ctl/render.py` (
 ### Plan-item extraction (Round 0 equivalent)
-From the report-writer's draft of `## 4.5 Implementation Plan Deliverables`, lead extracts plan items with the following prefixes (see also `templates/reports/final-report.template.md` §4.5.9):
+From the report-writer's draft of `## 5.5 Implementation Plan Deliverables`, lead extracts plan items with the following prefixes (see also `templates/reports/final-report.template.md` §5.5.9):
 | Prefix | Source sub-section | One row per |
 |--------|--------------------|-------------|
@@ -689,13 +689,13 @@ Plan-body verification stays **lightweight** even under this posture — the `ve
    - all dispatches non-result → `aborted-non-result`
    - any `partial-consensus` / `dissent-isolated` present, no `majority-disagree` → `passed-with-dissent`
    - all items `full-consensus` → `passed`
-6. Lead writes `runs/<task-type>/state/plan-body-verification-<task-type>-<seq>.json` (schema below) and populates `### 4.5.9 Plan Body Verification` in the final report (template at `templates/reports/final-report.template.md`). The §4.5.9 body uses a single `#### Verdict details` table (`Plan item / Worker / Verdict / Breakage kind / Note` — one row per plan-item × worker pair). The older wide `| Plan item | <worker1> | <worker2> | … | Classification |` matrix and the former narrow `#### Verdict summary` card are both removed — the matrix scaled horizontally with the worker count, and the summary only restated per-item classifications already derivable from the details table. The validator's `Plan Body Verification` + `Gate result:` substring checks gate this section.
-7. For every `majority-disagree` item, lead adds a row to `## 5. Clarification Items` with:
+6. Lead writes `runs/<task-type>/state/plan-body-verification-<task-type>-<seq>.json` (schema below) and populates `### 5.5.9 Plan Body Verification` in the final report (template at `templates/reports/final-report.template.md`). The §5.5.9 body uses a single `#### Verdict details` table (`Plan item / Worker / Verdict / Breakage kind / Note` — one row per plan-item × worker pair). The older wide `| Plan item | <worker1> | <worker2> | … | Classification |` matrix and the former narrow `#### Verdict summary` card are both removed — the matrix scaled horizontally with the worker count, and the summary only restated per-item classifications already derivable from the details table. The validator's `Plan Body Verification` + `Gate result:` substring checks gate this section.
+7. For every `majority-disagree` item, lead adds a row to `## 1. Clarification Items` with:
    - new `C-<N>` ID (numbering continues from any existing rows)
    - `Statement` summarising the disagreement and the worker breakage `<kind>`
    - `Kind` chosen per the standard policy (usually `decision` for option-level conflicts, `data-point` for path/symbol mismatches)
    - `Blocks=approval`
-   - the §4.5.9 verdict table's `Classification` column for that row reads `majority-disagree → C-<N>` (1:1 ID match — orphan on either side is a contract violation per `prompts/profiles/implementation-planning.md` self-review step 6).
+   - the §5.5.9 verdict table's `Classification` column for that row reads `majority-disagree → C-<N>` (1:1 ID match — orphan on either side is a contract violation per `prompts/profiles/implementation-planning.md` self-review step 6).
 8. The top-of-report `- [ ] Approved` marker line is rendered if and only if the Gate result is `passed` or `passed-with-dissent`. `validators/validate-run.py` `validate_phase_boundary` enforces this correspondence; manually adding the marker line when the gate did not pass is a contract violation.
 ### `plan-body-verification-<task-type>-<seq>.json` schema
@@ -802,4 +802,4 @@ Mirrors finding convergence (§"Worker failure handling in reverify"). Concretel
 - A dispatch that returns terminal non-result MUST NOT be aggregated as `DISAGREE`.
 - If at least one dispatch was issued AND **all** plan-body dispatches return non-result, the Gate result is `aborted-non-result`. Record one `contract-violation` event per non-result dispatch.
-- When the gate is `aborted-non-result`, report-writer MUST keep the frontmatter `approved: false` (publishing `approved: true` under this gate result is a validator failure). A single row is added to `## 5. Clarification Items` with `Statement="plan-body verification could not run — all workers returned non-result"`, `Kind=decision`, `Blocks=approval`, allowing the user to either retry the phase or override by manually flipping the frontmatter to `approved: true` (or running `--approve` on the resume command).
+- When the gate is `aborted-non-result`, report-writer MUST keep the frontmatter `approved: false` (publishing `approved: true` under this gate result is a validator failure). A single row is added to `## 1. Clarification Items` with `Statement="plan-body verification could not run — all workers returned non-result"`, `Kind=decision`, `Blocks=approval`, allowing the user to either retry the phase or override by manually flipping the frontmatter to `approved: true` (or running `--approve` on the resume command).