okstra 0.71.1 → 0.71.2
- package/package.json +1 -1
- package/runtime/BUILD.json +2 -2
- package/runtime/agents/SKILL.md +1 -1
- package/runtime/prompts/profiles/_implementation-executor.md +1 -1
- package/runtime/prompts/profiles/_implementation-verifier.md +1 -1
- package/runtime/prompts/profiles/implementation-planning.md +2 -2
- package/runtime/python/okstra_ctl/conformance.py +5 -0
- package/runtime/python/okstra_ctl/render.py +17 -1
- package/runtime/python/okstra_token_usage/claude.py +64 -8
- package/runtime/python/okstra_token_usage/collect.py +30 -1
- package/runtime/validators/validate-run.py +40 -4
- package/runtime/validators/validate_session_conformance.py +7 -1
package/package.json
CHANGED
package/runtime/BUILD.json
CHANGED
package/runtime/agents/SKILL.md
CHANGED
|
@@ -89,7 +89,7 @@ Required checkpoints:
|
|
|
89
89
|
- `PROGRESS: phase-1-intake reading task bundle` — at the start of Phase 1, before issuing parallel Read calls.
|
|
90
90
|
- `PROGRESS: phase-1-intake complete` — after all intake reads return.
|
|
91
91
|
- `PROGRESS: phase-2-prompts preparing <N> worker prompts` — at the start of Phase 2, before any `Write` to the assigned prompt paths.
|
|
92
|
-
- `PROGRESS: phase-3-team-create attempting TeamCreate` — immediately before the `TeamCreate` call.
|
|
92
|
+
- `PROGRESS: phase-3-team-create attempting TeamCreate` — immediately before the `TeamCreate` call. When the launch prompt's "Concurrent-run: no-team background" gate forbids TeamCreate, emit `PROGRESS: phase-3-team-create skipped (concurrent-run)` instead, immediately after recording `teamCreate: { attempted: false, status: "skipped", reason: "concurrent-run" }` in team-state — the checkpoint line itself is still required.
|
|
93
93
|
- `PROGRESS: phase-4-dispatch worker=<role> model=<model>` — once per worker, immediately before the `Agent` / wrapper call.
|
|
94
94
|
- `PROGRESS: phase-5-poll pending=<n> done=<m>` — emitted on each wakeup while the pending set is non-empty.
|
|
95
95
|
- `PROGRESS: phase-5-collect worker=<role> status=<terminal-status>` — once per worker, immediately after the result file is verified.
|
|
package/runtime/prompts/profiles/_implementation-executor.md
CHANGED
@@ -30,7 +30,7 @@ until Phase 5 ends, then drop from active context for Phase 6/7.
|
|
|
30
30
|
- Doc-only / config-only / pure-rename steps that have no observable runtime behaviour are exempt from the failing-test requirement, but the executor MUST cite the exemption per step in the final report (`TDD exemption: <reason>`).
|
|
31
31
|
- When the touched area has no existing test harness, the executor MUST stand up the minimum harness needed to host one regression test for this run rather than skipping TDD entirely. Record the harness-bootstrap step as an `Out-of-plan edit` if it is not in the plan.
|
|
32
32
|
- **DB / IO / SQL changes require real execution — mock-only is NOT validation evidence:** when this run's diff touches DB/IO/SQL (ORM / query-builder code — sequelize / typeorm / prisma / knex / raw SQL — `*.repository.*`, model/entity files, `migrations/**`, `*.sql`, or any changed query string), a mocked unit test cannot observe the SQL the query builder actually emits — a mocked suite once passed while `count({ col: 'FontFamily.fontFamily' })` threw `Unknown column` on the real DB. The executor MUST run the change against a real (or faithful-replica) datastore — the `db-test` validation step (plan `validation` db step, else `project.json.qaCommands.db-test`), targeting a **local / replica** DB — and cite its exact command + exit code in the final report's `Validation evidence`. If no real DB / `db-test` command is reachable, do NOT claim the change verified: label the DB portion `정적 분석상 …, 미검증(실행 안 함)` in the report, surface it in the routing recommendation, and never downplay the real run as "too heavy". `git push` stays forbidden (universal list); the unverified DB state is carried forward so `final-verification` cannot accept it and `release-handoff` cannot push.
|
|
33
|
-
- **Real-IO test isolation (BLOCKING).** A test that exercises a **real** datastore, HTTP endpoint, external service, message queue, or filesystem — a live DB connection / DSN, a real `fetch` / `axios` / `http` request, an actual S3 / queue client, anything the project's normal CI test suite cannot run because that backend is absent — MUST be written under the task's qa directory `<task_root>/qa/` (the `
|
|
33
|
+
- **Real-IO test isolation (BLOCKING).** A test that exercises a **real** datastore, HTTP endpoint, external service, message queue, or filesystem — a live DB connection / DSN, a real `fetch` / `axios` / `http` request, an actual S3 / queue client, anything the project's normal CI test suite cannot run because that backend is absent — MUST be written under the task's qa scripts directory `<task_root>/qa/scripts/` (`<TASK_QA_PATH>/scripts`; the `qa/` root itself holds only data sidecars — the Tier 3 conformance manifest and `result-*.json`). It MUST NOT be written into the project source test tree — `src/**`, `test/**`, `tests/**`, `**/__test__/**`, `**/__tests__/**`, `*.spec.*`, `*.test.*`, or anywhere the project's lint/test globs collect. Two reasons: (a) the project's CI / normal suite has no real DB or network, so a real-IO test placed in source silently breaks the pipeline; (b) it is an okstra verification artifact, and the artifact-home rule confines okstra outputs to `.okstra/`. **The dividing line is the IO, not the intent:** a unit test that stubs/spies only *injected collaborators* (mock — no real socket, no real DB handle) is a TDD red-green artifact and stays in source; the moment a test opens a real connection or makes a real network call it belongs in qa. A stage's real-IO requirement check is a Tier 3 conformance script under `<task_root>/qa/scripts/` (declared via the implementation-planning conformance entry) — never smuggle real IO into a `*.spec.*` in source to make it run "as a unit test". The `db-test` real-execution gate above is satisfied by the conformance/db-test path against the replica, NOT by adding a live-DB `*.spec.*` to the project suite. 
**Author qa specs with the project's own test framework — never hand-roll `describe`/`it`/`expect`.** When the project ships a test runner as a devDependency (jest / vitest / pytest …), the qa spec uses it, invoked with the project config plus a discovery override pointing at the qa scripts dir (jest: `npx jest --config <project jest config> --roots <task_root>/qa/scripts --runInBand <spec-name>`) — the project config keeps module aliases resolving while the default sweep never collects the file; never widen the project's own test config to include qa paths. For TypeScript qa specs also write `<task_root>/qa/scripts/tsconfig.json` (`extends` the project tsconfig, adds the runner's `types` entry, `"include": ["**/*.ts"]`) so editors resolve path aliases and test globals — it is a qa artifact like the rest (untracked). **These qa artifacts stay untracked — never commit them.** `.okstra/**` is gitignored (the artifact-home rule); conformance scripts and their results are *executed* and recorded in the carry sidecar / verifier result, never written into git history. A committed `.okstra/qa` file is a stage-branch defect that leaks okstra internals into the eventual PR (see the `git add` rules below).
|
|
34
34
|
- re-read the approved plan end-to-end and parse the `## 5.5 Stage Map`. Read the **Stage** injected in the launch prompt (`Stage for this implementation run`): the single stage number this run owns. The runtime already selected and reserved this stage (one run = one stage) — do NOT recompute the start stage from `consumers.jsonl`.
|
|
35
35
|
- load every `runs/<plan-key>/carry/stage-<i>.json` for `i ∈ depends-on(this stage)` and inject them into the executor's working context as "runtime carry-in". For a `depends-on (none)` stage, no sidecar load — task-brief only.
|
|
36
36
|
- this stage's `depends-on` are all already `status:done`. Its file list, step order, Stage Validation commands, Stage Exit Contract, and rollback path are the authoritative scope.
|
|
package/runtime/prompts/profiles/_implementation-verifier.md
CHANGED
@@ -97,7 +97,7 @@ Re-running commands proves the diff *builds and passes*; it does NOT prove the d
|
|
|
97
97
|
- **Untruthful name:** a read-named function (`get*` / `find*` / `load*`) that writes/inserts/mutates; an adapter or repository name encoding the caller's use-case (`*ForInit`) or hiding a domain rule (`findValid*` / `findActive*`).
|
|
98
98
|
- **Hexagonal (only when the overlay is loaded):** business logic inside a port body; an adapter method that is not pure I/O (post-fetch JS filtering on domain state, domain-rule evaluation); a domain object declared outside the `domain/` boundary.
|
|
99
99
|
- **okstra artifact committed to the branch:** any path in the `git diff --name-only <base>...HEAD` enumeration that lives under `.okstra/` (or `.project-docs/` when the legacy symlink is present). `.okstra/**` is gitignored, so a committed okstra file means the executor force-staged it (`git add -f`) — leaking verification artifacts (qa scripts, conformance results) into the eventual PR. Cite the path; recommend `git rm --cached <path>` to untrack it while keeping the file on disk. Conformance/qa evidence belongs in the carry sidecar / verifier result, never in git history.
|
|
100
|
-
- **Real-IO test in source tree:** a changed/added test under the project source test tree — `src/**`, `test/**`, `tests/**`, `**/__test__/**`, `**/__tests__/**`, `*.spec.*`, `*.test.*` — that opens a **real** DB connection / DSN, makes a real `fetch` / `axios` / `http` request, or otherwise hits real external IO without mocking the injected collaborator (a live handle, not a stub/spy). Real-IO tests MUST live under `<task_root>/qa/` per the executor's *Real-IO test isolation* rule — a live-IO test in source silently breaks the project's CI suite and violates the artifact-home rule. Cite the test file + the real-IO line; recommend moving it to `<task_root>/qa/` (or declaring it as a Tier 3 conformance script). Mock-only unit tests in source are NOT a hit.
|
|
100
|
+
- **Real-IO test in source tree:** a changed/added test under the project source test tree — `src/**`, `test/**`, `tests/**`, `**/__test__/**`, `**/__tests__/**`, `*.spec.*`, `*.test.*` — that opens a **real** DB connection / DSN, makes a real `fetch` / `axios` / `http` request, or otherwise hits real external IO without mocking the injected collaborator (a live handle, not a stub/spy). Real-IO tests MUST live under `<task_root>/qa/scripts/` per the executor's *Real-IO test isolation* rule — a live-IO test in source silently breaks the project's CI suite and violates the artifact-home rule. Cite the test file + the real-IO line; recommend moving it to `<task_root>/qa/scripts/` (or declaring it as a Tier 3 conformance script). Mock-only unit tests in source are NOT a hit.
|
|
101
101
|
- **Advisory findings (recorded as recommendations; verdict MAY still PASS):** function >50 effective lines, a single body mixing read+write stages, weak readability, a missing-but-non-critical outcome assertion, newly orphaned private/public code that is safe to remove but not on a critical path, or weak-but-not-misleading names. These land in the verifier result as `should-fix` / `nit` recommendations, not as a `FAIL`.
|
|
102
102
|
- **Output.** Every finding — blocking or advisory — is a structured item in the verifier's worker result (`path:line`, rule, severity, suggested fix) so it carries into Phase 5.5 convergence and the final report. A blocking hit sets the verifier verdict to `FAIL` with the rule cited, using the same verdict machinery as the Discrepancy rule above. `Claude lead` MUST NOT silently downgrade a cited blocking finding to advisory during synthesis; an override requires a concrete cited reason, exactly as for the Discrepancy rule.
|
|
103
103
|
|
|
package/runtime/prompts/profiles/implementation-planning.md
CHANGED
@@ -72,9 +72,9 @@
|
|
|
72
72
|
- `### Carry-In` — for `depends-on (none)`: task-brief only. Otherwise: each depended-on stage's static exit contract + runtime sidecar path `runs/<impl-key>/carry/stage-<i>.json` placeholder.
|
|
73
73
|
- `### Stepwise Execution Order` — bite-sized table with `step | action | files | command | expected`. **Effective row count ≤ 6** (excluding header / divider / blank). Each step is one action completable in 2–5 minutes; for code steps include actual code or diff sketch. **TDD ordering is MUST, not a preference:** the **first** effective step's `action` cell MUST start with the literal `RED:` and describe the failing test that captures this stage's `Acceptance` (`expected` = FAIL); at least one later `action` cell MUST start with the literal `GREEN:` and describe the minimal implementation that makes it pass (`expected` = PASS); an optional refactor step starts with `REFACTOR:`. **Exemption:** doc-only / config-only / pure-rename stages with no observable runtime behaviour may omit RED/GREEN by declaring one line `TDD exemption: <reason>` in the stage section (mirrors the executor's per-step exemption in `_implementation-executor.md`). Validator S10c enforces RED-first + GREEN, or the exemption line.
|
|
74
74
|
- **Per-stage conformance declaration (mandatory one line, in the stage section — same placement freedom as `TDD exemption:`):** the stage MUST carry exactly one of:
|
|
75
|
-
- `Conformance tests: stage-<N> — <task_root>/qa/stage-<N>.<ext> (requires=[db|io|http|external,...])` — a Tier3 verification script that proves this stage's upstream requirements (brief / requirements-discovery / error-analysis / improvement-discovery → this stage's `Acceptance`) hold against **real** DB rows, real endpoints, or the real external API — NOT mocks. When you emit this line you MUST also (a) write the script to `<task_root>/qa/stage-<N>.<ext>` and (b) add a matching entry to `<task_root>/qa/conformance-manifest.json` with fields `stageKey` (= `<task-id>-stage-<N>`), `script`, `runCommand`, `requirementIds`, `requires` (subset of `{db, io, http, external}`), `passContract`, `exemption: null`, `waiver: null`. The script's standard interface: a `main` that exits `0`=PASS / non-zero=FAIL, and whose stdout ends with `QA-RESULT: PASS|FAIL` followed by one `REQ <id>: PASS|FAIL: <근거>` line per requirement.
|
|
75
|
+
- `Conformance tests: stage-<N> — <task_root>/qa/scripts/stage-<N>.<ext> (requires=[db|io|http|external,...])` — a Tier3 verification script that proves this stage's upstream requirements (brief / requirements-discovery / error-analysis / improvement-discovery → this stage's `Acceptance`) hold against **real** DB rows, real endpoints, or the real external API — NOT mocks. When you emit this line you MUST also (a) write the script to `<task_root>/qa/scripts/stage-<N>.<ext>` and (b) add a matching entry to `<task_root>/qa/conformance-manifest.json` with fields `stageKey` (= `<task-id>-stage-<N>`), `script`, `runCommand`, `requirementIds`, `requires` (subset of `{db, io, http, external}`), `passContract`, `exemption: null`, `waiver: null`. The script's standard interface: a `main` that exits `0`=PASS / non-zero=FAIL, and whose stdout ends with `QA-RESULT: PASS|FAIL` followed by one `REQ <id>: PASS|FAIL: <근거>` line per requirement. When the verification body is a test spec, author it with the project's own test framework (devDependency) invoked via a discovery override at `<task_root>/qa/scripts/` (jest: `--config <project config> --roots <task_root>/qa/scripts`) — never hand-roll `describe`/`expect` and never widen the project's own test config; for TypeScript specs also write `<task_root>/qa/scripts/tsconfig.json` extending the project tsconfig with the runner's `types` entry so editors resolve the file.
|
|
76
76
|
- `Conformance exemption: <reason>` — only for stages that touch no db/io/http/external surface, or where unit tests fully cover the increment. (If the eventual `implementation` diff actually touches one of those surfaces, `validate-run.py`'s diff-surface cross-check is BLOCKING — an exemption cannot hide a real db/io/http/external change.)
|
|
77
|
-
The manifest lives at the **task level** (`<task_root>/qa/`, path token `TASK_QA_PATH`) and is shared across planning → implementation → final-verification. This declaration is enforced at three layers: `validators/validate-implementation-plan-stages.py` check **S11** forces every stage to carry one of the two lines; the manifest JSON structure is enforced by `validate_conformance_manifest` (run / validate-run); and the result gate (each script's `QA-RESULT`) is enforced by the verifier Tier3 + validate-run.
|
|
77
|
+
The manifest lives at the **task level** (`<task_root>/qa/`, path token `TASK_QA_PATH`) and is shared across planning → implementation → final-verification. Layout split: executable scripts (conformance + any real-IO test) live under `<task_root>/qa/scripts/`; data sidecars (`conformance-manifest.json`, `result-*.json`) stay at the `qa/` root. This declaration is enforced at three layers: `validators/validate-implementation-plan-stages.py` check **S11** forces every stage to carry one of the two lines; the manifest JSON structure — including each entry's `script` living under `qa/scripts/` — is enforced by `validate_conformance_manifest` (run / validate-run); and the result gate (each script's `QA-RESULT`) is enforced by the verifier Tier3 + validate-run.
|
|
78
78
|
- `### Stage Exit Contract` — predicted added/modified files, newly exposed identifiers/types/endpoints, downstream-usable resources.
|
|
79
79
|
- `### Stage Validation` — pre / mid / post exact commands or observable outcomes for this stage only.
|
|
80
80
|
- **Vertical-slice-first partition rule (1st-class):** the grouping anchor is a **thin end-to-end vertical slice** — one stage delivers a single user-observable increment, crossing whatever layers are needed (data → service → API → UI) to make that one increment work. File/module proximity is demoted to the **intra-slice grouping rule**: within a slice, keep steps touching the same file/directory/module together so the diff, PR, and rollback unit stay cohesive. **Horizontal layer-splitting is forbidden** — never carve "the DB layer" into one stage and "the service layer" into the next; that produces stages that ship no standalone user value. A stage is split ONLY when (a) a real `depends-on` data/contract dependency exists, (b) effective steps would exceed 6, or (c) it is a distinct vertical slice (a different user-value increment). Maximising the number of parallel stages is NOT a reason to split — parallelism is an emergent property of independent stages, never a partitioning goal.
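The conformance contract described in this profile (a manifest entry plus a script whose stdout ends with `QA-RESULT` and one `REQ` line per requirement) could be sketched roughly as follows. This is an illustration under stated assumptions, not okstra's implementation: the task id `demo-task`, the requirement ids, the `check_r1` helper, the relative script path, and the `passContract` wording are all hypothetical. Only the manifest field names and the output line shapes come from the text above.

```python
# Hypothetical manifest entry; the field names follow the planning profile text,
# every value is invented for illustration.
MANIFEST_ENTRY = {
    "stageKey": "demo-task-stage-1",
    "script": "qa/scripts/stage-1.py",
    "runCommand": "python qa/scripts/stage-1.py",
    "requirementIds": ["R1"],
    "requires": ["db"],
    "passContract": "exit 0 and QA-RESULT: PASS",
    "exemption": None,
    "waiver": None,
}


def check_r1() -> tuple[bool, str]:
    """Placeholder check; a real script would query the replica DB here."""
    return True, "placeholder check, no real IO performed"


def render_report(results: dict[str, tuple[bool, str]]) -> tuple[int, str]:
    """Build the exit code and the trailing stdout lines of the standard interface."""
    overall = all(ok for ok, _ in results.values())
    lines = [f"QA-RESULT: {'PASS' if overall else 'FAIL'}"]
    for req_id, (ok, evidence) in results.items():
        lines.append(f"REQ {req_id}: {'PASS' if ok else 'FAIL'}: {evidence}")
    return (0 if overall else 1), "\n".join(lines)


def main() -> int:
    """Run every requirement check, print the report, return the exit code."""
    code, report = render_report({"R1": check_r1()})
    print(report)
    return code
```

A real script would end by passing the return value of `main()` to `sys.exit`, so that the process exit code and the `QA-RESULT` line always agree.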
|
|
package/runtime/python/okstra_ctl/conformance.py
CHANGED
@@ -66,6 +66,11 @@ def _check_entry(entry: object, idx: int, errors: list[str]) -> None:
|
|
|
66
66
|
return
|
|
67
67
|
_check_nonempty_str(entry.get("stageKey"), f"{path}.stageKey", errors)
|
|
68
68
|
_check_nonempty_str(entry.get("script"), f"{path}.script", errors)
|
|
69
|
+
script = entry.get("script")
|
|
70
|
+
# Executable scripts are contractually isolated under qa/scripts/ (implementation-planning §conformance);
|
|
71
|
+
# the qa/ root is reserved for the data sidecars (manifest and result-*.json).
|
|
72
|
+
if isinstance(script, str) and script.strip() and "qa/scripts/" not in script:
|
|
73
|
+
errors.append(f"{path}.script must live under the task qa scripts dir (qa/scripts/), got {script!r}")
|
|
69
74
|
_check_nonempty_str(entry.get("runCommand"), f"{path}.runCommand", errors)
|
|
70
75
|
_check_nonempty_str(entry.get("passContract"), f"{path}.passContract", errors)
|
|
71
76
|
req_ids = entry.get("requirementIds")
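To see how the added location check behaves, the condition can be lifted into a standalone function. This is a sketch that copies the condition and message from the hunk above; the function name `check_script_location` and the `entries[i]` path labels are assumptions made for the example.

```python
def check_script_location(script: object, path: str, errors: list[str]) -> None:
    """Mirror of the added check: flag scripts that are not under qa/scripts/."""
    if isinstance(script, str) and script.strip() and "qa/scripts/" not in script:
        errors.append(
            f"{path}.script must live under the task qa scripts dir (qa/scripts/), got {script!r}"
        )


errors: list[str] = []
# Old layout (script directly under qa/): rejected.
check_script_location("qa/stage-1.py", "entries[0]", errors)
# New layout: accepted.
check_script_location("qa/scripts/stage-1.py", "entries[1]", errors)
# Empty value: skipped here, because the non-empty check reports it separately.
check_script_location("", "entries[2]", errors)
```

Note that the condition is plain substring containment, so any path containing `qa/scripts/` anywhere passes; it does not verify that the path sits under a particular task root.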
|
|
package/runtime/python/okstra_ctl/render.py
CHANGED
@@ -1127,6 +1127,14 @@ def render_run_manifest(run_manifest_path: str, ctx: dict) -> None:
|
|
|
1127
1127
|
if isinstance(task_manifest.get("workflow"), dict)
|
|
1128
1128
|
else {}
|
|
1129
1129
|
)
|
|
1130
|
+
# Persistent anchor for the concurrent-run fact detected by prepare. The validator
|
|
1131
|
+
# accepts the no-team (teamCreate skipped) path as legal only when this prepare-side
|
|
1132
|
+
# record exists; the lead's self-declaration in team-state alone does not open it.
|
|
1133
|
+
concurrent_run_stages = [
|
|
1134
|
+
int(s)
|
|
1135
|
+
for s in str(ctx.get("CONCURRENT_RUN_STAGES", "") or "").split(",")
|
|
1136
|
+
if s.strip().isdigit()
|
|
1137
|
+
]
|
|
1130
1138
|
payload = {
|
|
1131
1139
|
"schemaVersion": "1.0",
|
|
1132
1140
|
"okstraVersion": ctx.get("OKSTRA_VERSION", ""),
|
|
@@ -1173,6 +1181,10 @@ def render_run_manifest(run_manifest_path: str, ctx: dict) -> None:
|
|
|
1173
1181
|
"validatorScriptPath": ctx.get("RUN_VALIDATOR_RELATIVE_PATH", ""),
|
|
1174
1182
|
"claudeSessionId": ctx.get("CLAUDE_SESSION_ID", ""),
|
|
1175
1183
|
"resumeCommandPath": ctx.get("CLAUDE_RESUME_COMMAND_RELATIVE_PATH", ""),
|
|
1184
|
+
"concurrentRun": {
|
|
1185
|
+
"detected": bool(concurrent_run_stages),
|
|
1186
|
+
"activeStages": concurrent_run_stages,
|
|
1187
|
+
},
|
|
1176
1188
|
"workflowSnapshot": {
|
|
1177
1189
|
"phaseSequence": workflow.get("phaseSequence", []),
|
|
1178
1190
|
"currentPhase": workflow.get(
|
|
@@ -1674,7 +1686,11 @@ def inject_lead_prompt_computed_tokens(ctx: dict) -> None:
|
|
|
1674
1686
|
"2. Before any dispatch, record in team-state:\n"
|
|
1675
1687
|
' `teamCreate: { attempted: false, status: "skipped",'
|
|
1676
1688
|
f' reason: "concurrent-run", concurrentStages: [{concurrent_stages}] }}`.\n'
|
|
1677
|
-
"3.
|
|
1689
|
+
"3. Immediately after recording it, emit the checkpoint line\n"
|
|
1690
|
+
" `PROGRESS: phase-3-team-create skipped (concurrent-run)` — the\n"
|
|
1691
|
+
" phase-3 checkpoint is still required in no-team mode and the\n"
|
|
1692
|
+
" session-conformance validator fails the run without it.\n"
|
|
1693
|
+
"4. Dispatch every worker with `run_in_background: true` and NO\n"
|
|
1678
1694
|
" `team_name` (the Phase 5 fallback). Worker completion is detected by\n"
|
|
1679
1695
|
" result-file polling, so analysis output is equivalent — only the\n"
|
|
1680
1696
|
" Teams split-pane view is lost."
|
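The `concurrentRun` block added to the run manifest is derived from a comma-separated `CONCURRENT_RUN_STAGES` context value. A minimal sketch of that derivation, with the parsing expression taken from the hunk above and the two function names invented for the example:

```python
def parse_concurrent_run_stages(ctx: dict) -> list[int]:
    """Parse a comma-separated stage list, silently dropping non-numeric items."""
    raw = str(ctx.get("CONCURRENT_RUN_STAGES", "") or "")
    return [int(s) for s in raw.split(",") if s.strip().isdigit()]


def concurrent_run_block(ctx: dict) -> dict:
    """Build the manifest block; `detected` is true when any stage was parsed."""
    stages = parse_concurrent_run_stages(ctx)
    return {"detected": bool(stages), "activeStages": stages}
```

Because empty items, whitespace-only items, and non-digit items are all dropped, an unset or malformed context value yields `detected: False` rather than raising.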
|
package/runtime/python/okstra_token_usage/claude.py
CHANGED
@@ -243,6 +243,19 @@ def _needle_scan(jsonl_path: Path, entry: dict, needle_lower: str) -> bool:
|
|
|
243
243
|
return False
|
|
244
244
|
|
|
245
245
|
|
|
246
|
+
def _cached_needle_scan(jsonl_path: Path, cache: dict, needle_lower: str) -> bool:
|
|
247
|
+
"""Run `_needle_scan` while maintaining the per-needle cursor in `cache['needles']`.
|
|
248
|
+
Keeps at most MAX_NEEDLES needles per file, evicting the oldest first."""
|
|
249
|
+
needles = cache.setdefault("needles", {})
|
|
250
|
+
entry = needles.get(needle_lower)
|
|
251
|
+
if entry is None:
|
|
252
|
+
entry = {"offset": 0, "found": False}
|
|
253
|
+
while len(needles) >= MAX_NEEDLES:
|
|
254
|
+
needles.pop(next(iter(needles)))
|
|
255
|
+
needles[needle_lower] = entry
|
|
256
|
+
return _needle_scan(jsonl_path, entry, needle_lower)
|
|
257
|
+
|
|
258
|
+
|
|
246
259
|
def find_claude_team_sessions(
|
|
247
260
|
cwd: Path,
|
|
248
261
|
team_name: str,
|
|
@@ -276,14 +289,7 @@ def find_claude_team_sessions(
|
|
|
276
289
|
for p in proj_dir.glob("*.jsonl"):
|
|
277
290
|
if incremental:
|
|
278
291
|
cache = load_cache(p)
|
|
279
|
-
|
|
280
|
-
entry = needles.get(needle_lower)
|
|
281
|
-
if entry is None:
|
|
282
|
-
entry = {"offset": 0, "found": False}
|
|
283
|
-
while len(needles) >= MAX_NEEDLES:
|
|
284
|
-
needles.pop(next(iter(needles)))
|
|
285
|
-
needles[needle_lower] = entry
|
|
286
|
-
if _needle_scan(p, entry, needle_lower):
|
|
292
|
+
if _cached_needle_scan(p, cache, needle_lower):
|
|
287
293
|
out[p.stem] = p
|
|
288
294
|
save_cache(p, cache)
|
|
289
295
|
else:
|
|
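The cursor bookkeeping that `_cached_needle_scan` factors out can be exercised in isolation. The sketch below reproduces only the cache part (no file scanning). The helper name `get_cursor` is hypothetical, and `MAX_NEEDLES = 2` is an assumed value chosen to make eviction visible; the real constant is not shown in this diff.

```python
MAX_NEEDLES = 2  # assumed value for the demo; the real constant is not shown


def get_cursor(cache: dict, needle: str) -> dict:
    """Return the cursor entry for `needle`, creating it and evicting if needed."""
    needles = cache.setdefault("needles", {})
    entry = needles.get(needle)
    if entry is None:
        entry = {"offset": 0, "found": False}
        # dicts preserve insertion order, so next(iter(...)) is the oldest key
        while len(needles) >= MAX_NEEDLES:
            needles.pop(next(iter(needles)))
        needles[needle] = entry
    return entry
```

One consequence of this design is that eviction follows insertion order rather than recency of use: looking up an existing needle does not move it to the back, so a frequently used needle is still evicted once enough newer needles arrive.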
@@ -294,3 +300,53 @@ def find_claude_team_sessions(
|
|
|
294
300
|
if direct.is_file():
|
|
295
301
|
out.setdefault(lead_sid, direct)
|
|
296
302
|
return out
|
|
303
|
+
|
|
304
|
+
|
|
305
|
+
_TEAM_TAG_NEEDLE = '"teamname":"'
|
|
306
|
+
|
|
307
|
+
|
|
308
|
+
def find_claude_agent_sessions(
|
|
309
|
+
cwd: Path,
|
|
310
|
+
agent_prefixes: list[str],
|
|
311
|
+
projects_root: Path | None = None,
|
|
312
|
+
*,
|
|
313
|
+
incremental: bool = False,
|
|
314
|
+
) -> dict[str, Path]:
|
|
315
|
+
"""Map sessionId -> jsonl path for non-team sessions whose recorded
|
|
316
|
+
`agentName` matches one of ``agent_prefixes``.
|
|
317
|
+
|
|
318
|
+
Worker sessions of a no-team run (teamCreate `skipped` (concurrent-run) or the
|
|
319
|
+
`error` fallback) carry no team tag, so the teamName needle cannot find them. Instead,
|
|
320
|
+
search with the `agentName` that the harness records from the dispatch `name` argument,
|
|
321
|
+
but exclude team-tagged sessions: those are workers of a concurrent team-mode run.
|
|
322
|
+
|
|
323
|
+
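Sessions of other runs that use the same agentName can also match, so the caller must
|
|
324
|
+
scope totals to the run window and discard sessions that have no in-window
|
|
325
|
+
events. When two no-team runs with overlapping windows dispatch the same role,
|
|
326
|
+
the cross-attribution cannot be separated structurally; it can only be traced through
|
|
327
|
+
the sessionId list in the usage block.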
|
|
328
|
+
"""
|
|
329
|
+
proj_dir = claude_project_dir(cwd, projects_root)
|
|
330
|
+
out: dict[str, Path] = {}
|
|
331
|
+
if not proj_dir.is_dir():
|
|
332
|
+
return out
|
|
333
|
+
agent_needles = [f'"agentname":"{p.lower()}' for p in agent_prefixes if p]
|
|
334
|
+
if not agent_needles:
|
|
335
|
+
return out
|
|
336
|
+
for p in proj_dir.glob("*.jsonl"):
|
|
337
|
+
if incremental:
|
|
338
|
+
cache = load_cache(p)
|
|
339
|
+
matched = any(_cached_needle_scan(p, cache, n) for n in agent_needles)
|
|
340
|
+
team_tagged = matched and _cached_needle_scan(p, cache, _TEAM_TAG_NEEDLE)
|
|
341
|
+
save_cache(p, cache)
|
|
342
|
+
else:
|
|
343
|
+
matched = any(
|
|
344
|
+
_needle_scan(p, {"offset": 0, "found": False}, n)
|
|
345
|
+
for n in agent_needles
|
|
346
|
+
)
|
|
347
|
+
team_tagged = matched and _needle_scan(
|
|
348
|
+
p, {"offset": 0, "found": False}, _TEAM_TAG_NEEDLE
|
|
349
|
+
)
|
|
350
|
+
if matched and not team_tagged:
|
|
351
|
+
out[p.stem] = p
|
|
352
|
+
return out
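The selection rule in `find_claude_agent_sessions` (match an agent-name needle, then exclude anything that also carries a team tag) can be illustrated on in-memory text. This sketch assumes that `_needle_scan` performs a case-insensitive substring search over the raw JSONL text, which the `needle_lower` parameter suggests but this diff does not show. The function name and the sample records are hypothetical.

```python
TEAM_TAG_NEEDLE = '"teamname":"'


def is_no_team_agent_session(jsonl_text: str, agent_prefixes: list[str]) -> bool:
    """True when the text names a matching agent and carries no team tag."""
    haystack = jsonl_text.lower()
    # Empty prefixes are skipped, as in the original needle construction.
    needles = [f'"agentname":"{p.lower()}' for p in agent_prefixes if p]
    matched = any(n in haystack for n in needles)
    return matched and TEAM_TAG_NEEDLE not in haystack


# Hypothetical session records in compact JSON (no space after the colon).
worker = '{"agentName":"analysis-worker-1","sessionId":"s1"}'
team_worker = '{"agentName":"analysis-worker-1","teamName":"t"}'
```

Since the needles are built without whitespace after the colon, the approach relies on the session files being written as compact JSON.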
|
|
package/runtime/python/okstra_token_usage/collect.py
CHANGED
@@ -7,7 +7,11 @@ from pathlib import Path
|
|
|
7
7
|
from okstra_project.dirs import OKSTRA_RELATIVE
|
|
8
8
|
|
|
9
9
|
from .blocks import na_block, usage_block
|
|
10
|
-
from .claude import
|
|
10
|
+
from .claude import (
|
|
11
|
+
claude_session_totals,
|
|
12
|
+
find_claude_agent_sessions,
|
|
13
|
+
find_claude_team_sessions,
|
|
14
|
+
)
|
|
11
15
|
from .codex import codex_session_total, find_codex_session
|
|
12
16
|
from .gemini import find_gemini_session, gemini_session_total
|
|
13
17
|
from .paths import claude_project_dir, utc_now
|
|
@@ -197,6 +201,31 @@ def collect(team_state_path: Path, project_root: Path | None = None, *,
|
|
|
197
201
|
unattributed_sessions.append(sid)
|
|
198
202
|
unattributed_totals.append(totals)
|
|
199
203
|
|
|
204
|
+
# no-team run (teamCreate skipped/concurrent-run, or error fallback) —
|
|
205
|
+
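# worker sessions carry no team tag, so the needle search above comes up empty; supplement
|
|
206
|
+
# it with agentName-based discovery. Sessions outside the run window (no in-window events,
|
|
207
|
+
# i.e. no startedAt) belong to another run with the same agentName, so they are discarded.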
|
|
208
|
+
team_create_status = str((state.get("teamCreate") or {}).get("status", "")).strip()
|
|
209
|
+
if team_create_status in ("skipped", "error"):
|
|
210
|
+
worker_prefix_pool = [
|
|
211
|
+
prefix
|
|
212
|
+
for w in state.get("workers", [])
|
|
213
|
+
for prefix in match_prefixes(w.get("workerId") or "")
|
|
214
|
+
]
|
|
215
|
+
agent_sessions = find_claude_agent_sessions(
|
|
216
|
+
cwd, worker_prefix_pool, incremental=incremental
|
|
217
|
+
)
|
|
218
|
+
for sid, path in agent_sessions.items():
|
|
219
|
+
if sid == lead_sid or sid in claude_sessions:
|
|
220
|
+
continue
|
|
221
|
+
totals = claude_session_totals(path, since=run_since, until=run_until,
|
|
222
|
+
incremental=incremental)
|
|
223
|
+
if not totals.get("startedAt"):
|
|
224
|
+
continue
|
|
225
|
+
agent = totals.get("agentName")
|
|
226
|
+
if agent:
|
|
227
|
+
by_agent.setdefault(agent, []).append((sid, path, totals))
|
|
228
|
+
|
|
200
229
|
# Lead.
|
|
201
230
|
if lead_path is not None:
|
|
202
231
|
totals = claude_session_totals(lead_path, since=run_since, until=run_until,
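The filtering that the no-team branch applies to discovered sessions can be sketched without touching the filesystem. Here the per-session totals are passed in as plain dicts; the function name `group_agent_sessions` and the sample data are assumptions, while the three skip conditions follow the hunk above.

```python
def group_agent_sessions(
    agent_totals: dict[str, dict],
    lead_sid: str,
    known_sessions: set[str],
) -> dict[str, list[tuple[str, dict]]]:
    """Group window-scoped session totals by agent name."""
    by_agent: dict[str, list[tuple[str, dict]]] = {}
    for sid, totals in agent_totals.items():
        # Skip the lead and anything the team-tag search already attributed.
        if sid == lead_sid or sid in known_sessions:
            continue
        # No startedAt means no event fell inside the run window: another run's session.
        if not totals.get("startedAt"):
            continue
        agent = totals.get("agentName")
        if agent:
            by_agent.setdefault(agent, []).append((sid, totals))
    return by_agent


sample = {
    "lead": {"startedAt": "t0", "agentName": "lead"},
    "s1": {"startedAt": "t1", "agentName": "worker-a"},
    "s2": {"agentName": "worker-a"},  # outside the window
    "s3": {"startedAt": "t2"},        # no agent name recorded
}
grouped = group_agent_sessions(sample, lead_sid="lead", known_sessions=set())
```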
|
|
package/runtime/validators/validate-run.py
CHANGED
@@ -334,8 +334,29 @@ def effective_run_task_type(run_manifest: dict, task_manifest: dict) -> str:
|
|
|
334
334
|
).strip()
|
|
335
335
|
|
|
336
336
|
|
|
337
|
+
def _is_legal_concurrent_run_skip(
|
|
338
|
+
team_create: object, concurrent_run_authorized: bool
|
|
339
|
+
) -> bool:
|
|
340
|
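+
"""Accept the teamCreate-skipped shape directed by the render gate
|
|
341
|
+
("Concurrent-run: no-team background") as a legal terminal state only for runs
|
|
342
|
+
where prepare recorded the concurrent run in the run-manifest. The lead's self-declaration
|
|
343
|
+
in team-state alone (without the anchor) does not open it; the prepare-side fact, not the declaration, is what is enforced."""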
|
|
344
|
+
if not concurrent_run_authorized or not isinstance(team_create, dict):
|
|
345
|
+
return False
|
|
346
|
+
return (
|
|
347
|
+
team_create.get("attempted") is False
|
|
348
|
+
and str(team_create.get("status", "")).strip() == "skipped"
|
|
349
|
+
and str(team_create.get("reason", "")).strip() == "concurrent-run"
|
|
350
|
+
)
|
|
351
|
+
|
|
352
|
+
|
|
337
353
|
def validate_team_state(
|
|
338
|
-
team_state: dict,
|
|
354
|
+
team_state: dict,
|
|
355
|
+
project_root: Path,
|
|
356
|
+
contract: dict,
|
|
357
|
+
failures: list[str],
|
|
358
|
+
*,
|
|
359
|
+
concurrent_run_authorized: bool = False,
|
|
339
360
|
) -> None:
|
|
340
361
|
artifacts = team_state.get("artifacts")
|
|
341
362
|
if not isinstance(artifacts, dict):
|
|
@@ -377,12 +398,18 @@ def validate_team_state(
|
|
|
377
398
|
)
|
|
378
399
|
if any_dispatched:
|
|
379
400
|
team_create = team_state.get("teamCreate")
|
|
380
|
-
if
|
|
401
|
+
if _is_legal_concurrent_run_skip(team_create, concurrent_run_authorized):
|
|
402
|
+
pass
|
|
403
|
+
elif not isinstance(team_create, dict) or not team_create.get("attempted"):
|
|
381
404
|
failures.append(
|
|
382
405
|
"team-state.teamCreate.attempted must be true once any worker has "
|
|
383
406
|
"been dispatched (status in completed/timeout/error/in-progress). "
|
|
384
407
|
"Phase 3 (TeamCreate) was skipped — workers ran in-process without "
|
|
385
|
-
"the Teams split-pane surface. See agents/SKILL.md Phase 3."
|
|
408
|
+
"the Teams split-pane surface. See agents/SKILL.md Phase 3. "
|
|
409
|
+
"(The no-team concurrent-run path is legal ONLY when the "
|
|
410
|
+
"run-manifest carries prepare-recorded `concurrentRun.detected: "
|
|
411
|
+
'true` AND team-state records `teamCreate: { attempted: false, '
|
|
412
|
+
'status: "skipped", reason: "concurrent-run" }`.)'
|
|
386
413
|
)
|
|
387
414
|
else:
|
|
388
415
|
tc_status = str(team_create.get("status", "")).strip()
|
|
@@ -2114,7 +2141,16 @@ def main() -> int:
|
|
|
2114
2141
|
if autofix_state == "accuracy-failed":
|
|
2115
2142
|
failures.extend(autofix_messages)
|
|
2116
2143
|
contract = extract_contract(run_manifest, task_manifest, failures)
|
|
2117
|
-
|
|
2144
|
+
concurrent_run_authorized = bool(
|
|
2145
|
+
(run_manifest.get("concurrentRun") or {}).get("detected")
|
|
2146
|
+
)
|
|
2147
|
+
validate_team_state(
|
|
2148
|
+
team_state,
|
|
2149
|
+
project_root,
|
|
2150
|
+
contract,
|
|
2151
|
+
failures,
|
|
2152
|
+
concurrent_run_authorized=concurrent_run_authorized,
|
|
2153
|
+
)
|
|
2118
2154
|
# Schema validation runs BEFORE markdown substring checks: if the
|
|
2119
2155
|
# data.json is well-formed, the rendered markdown is guaranteed to
|
|
2120
2156
|
# contain every required section. Substring checks below are a
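The two-key gate introduced in this file (a prepare-recorded manifest flag plus an exact team-state shape) is easiest to understand from a few concrete inputs. The function body below is copied from the hunk above so the example runs standalone; the public-style name, the `authorized` helper, and the sample manifests are assumptions for illustration.

```python
def is_legal_concurrent_run_skip(team_create: object, concurrent_run_authorized: bool) -> bool:
    """Both the manifest anchor and the exact skipped shape must be present."""
    if not concurrent_run_authorized or not isinstance(team_create, dict):
        return False
    return (
        team_create.get("attempted") is False
        and str(team_create.get("status", "")).strip() == "skipped"
        and str(team_create.get("reason", "")).strip() == "concurrent-run"
    )


def authorized(run_manifest: dict) -> bool:
    """Derive the authorization flag the same way main() does."""
    return bool((run_manifest.get("concurrentRun") or {}).get("detected"))


skip = {"attempted": False, "status": "skipped", "reason": "concurrent-run"}
with_anchor = {"concurrentRun": {"detected": True, "activeStages": [2]}}
without_anchor = {}
```

Note that `attempted` must be the literal `False`: a team-state that simply omits the key is rejected, because `None is False` does not hold.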
|
|
package/runtime/validators/validate_session_conformance.py
CHANGED
@@ -271,7 +271,13 @@ def _check_progress_checkpoints(
|
|
|
271
271
|
str(w.get("status", "")).strip() in _DISPATCHED_STATUSES for w in workers
|
|
272
272
|
)
|
|
273
273
|
require("phase-2-prompts", bool(workers), "before any Write to assigned prompt paths")
|
|
274
|
-
require(
|
|
274
|
+
require(
|
|
275
|
+
"phase-3-team-create",
|
|
276
|
+
any_dispatched,
|
|
277
|
+
"immediately before the TeamCreate call — or, in the authorized no-team "
|
|
278
|
+
"concurrent-run path, the `phase-3-team-create skipped (concurrent-run)` "
|
|
279
|
+
"variant emitted right after recording teamCreate skipped in team-state",
|
|
280
|
+
)
|
|
275
281
|
_check_worker_checkpoint_lines(by_phase, analysis_workers, errors)
|
|
276
282
|
require(
|
|
277
283
|
"phase-5.5-convergence",
|
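Because both the regular and the no-team variants share the `phase-3-team-create` phase token, a checkpoint scan keyed on that token accepts either line. The sketch below shows one way such a scan could work. How the real validator parses transcripts is not shown in this diff, so the regular expression, the `phases_seen` name, and the sample transcript are all assumptions.

```python
import re

# Hypothetical pattern: "PROGRESS: <phase-token> <free-form detail>".
PROGRESS_RE = re.compile(r"^PROGRESS:\s+(\S+)\s*(.*)$")


def phases_seen(transcript: str) -> dict[str, list[str]]:
    """Collect the detail text of every PROGRESS line, keyed by phase token."""
    by_phase: dict[str, list[str]] = {}
    for line in transcript.splitlines():
        match = PROGRESS_RE.match(line.strip())
        if match:
            by_phase.setdefault(match.group(1), []).append(match.group(2))
    return by_phase


transcript = "\n".join([
    "PROGRESS: phase-2-prompts preparing 2 worker prompts",
    "PROGRESS: phase-3-team-create skipped (concurrent-run)",
    "PROGRESS: phase-4-dispatch worker=analysis model=example-model",
])
seen = phases_seen(transcript)
```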