clean-room-skill 0.1.15 → 0.2.0
- package/.claude-plugin/marketplace.json +1 -1
- package/.claude-plugin/plugin.json +1 -1
- package/.codex-plugin/plugin.json +1 -1
- package/docs/ARCHITECTURE.md +5 -1
- package/docs/HOOKS.md +2 -0
- package/docs/REFERENCE.md +2 -0
- package/hooks/validate-json-schema.py +499 -0
- package/package.json +1 -1
- package/plugin.json +1 -1
package/docs/ARCHITECTURE.md
CHANGED
@@ -179,6 +179,7 @@ The architecture delegates work across six distinct custom role agents to enforc
 * Influences Agent 2, Agent 3, and Agent 4 only through durable sanitized artifacts, never direct chat, progress feedback, implementation hints, or priority changes.
 * Performs final verification of clean specification and implementation coverage against the source scope.
 * Blocks handoff or coverage completion when high-priority contaminated discovery leads remain unresolved.
+* Treats completion as deny-by-default unless durable canonical artifacts prove the clean behavior gate.
 * Writes the inner-loop `clean-room-result.json` only after contaminated-side coverage verification.
 * Consumes Agent 3 reports only after Agent 3 reaches a terminal state, and consumes Agent 4 reports only after the configured polish review reaches a terminal state, then sends only abstract delta tickets into a fresh clean artifact cycle.
 
@@ -250,6 +251,8 @@ The outer loop owns spec development: scope, behavior specs, acceptance criteria
 
 Agent 3's terminal report is not enough to return. If configured, Agent 4 must produce a passing `polish-report.json`. Agent 0 must then consume the terminal clean reports, verify contaminated-side coverage, and write `clean-room-result.json`.
 
+Completion is deny-by-default. `task-manifest.json`, `coverage-ledger.json`, and `clean-room-result*.json` writes that claim completion must be backed by durable canonical clean artifacts: a matching clean behavior spec, implementation plan mappings, a terminal implementation report, a passed QC report, valid evidence references, and required public-surface mappings. Synthetic or manual completion summaries are not completion evidence.
+
 `clean-room-skill run` is the executable v1 inner-loop runner. It requires preflight refs, the required handoff sequence, unattended `controller_policy`, schema-valid `loop_context`, and either a user-supplied agent command adapter or the built-in Claude Code agent runtime. It does not automate outer spec development. The runner:
 
 * Locks the contaminated artifact root with `.clean-room-run.lock`.
@@ -261,6 +264,7 @@ Agent 3's terminal report is not enough to return. If configured, Agent 4 must p
 * Supports the optional `clean-polish-review` phase between `clean-implement-qc` and `contaminated-coverage-verify`.
 * Validates schema, leakage, and handoff integrity before advancing state.
 * Rejects `covered` coverage-ledger units that still have unresolved high-priority `discovery_leads`.
+* Rejects completion claims that lack canonical clean specs, plans, terminal reports, QC, evidence, or public-surface coverage mappings.
 * Records controller memory in contaminated-side `controller-run-ledger.json`.
 * Writes `clean-room-result.json` before returning to the outer spec loop.
 
@@ -302,7 +306,7 @@ Post-write hook failures are deny-by-default and redacted. If an artifact disapp
 * [deny-clean-source-read.py](../hooks/deny-clean-source-read.py): Enforces that clean roles and Agent 1.5 cannot read source or visual roots or unapproved paths; clean roles may read implementation roots, and source-denied roles are denied direct `preflight-goal.json` reads. Agent 1.5 is also denied clean roots, implementation roots, and direct `source-index.json` or `visual-index.json` reads.
 * [deny-contaminated-clean-write.py](../hooks/deny-contaminated-clean-write.py): Enforces role write roots. Agent 2 writes clean artifacts only, Agent 3 writes implementation files and clean reports, Agent 4 writes clean polish reports and implementation-root polish changes, contaminated roles write only to `CLEAN_ROOM_CONTAMINATED_ARTIFACT_ROOTS`, and clean-room artifact JSON files are denied under `CLEAN_ROOM_IMPLEMENTATION_ROOTS`.
 * [check-artifact-leakage.py](../hooks/check-artifact-leakage.py): Scans clean artifacts and Agent 1.5 staged contaminated artifacts for high-risk leakage markers, source-like identifiers, and private identifier denylist terms. The private identifier denylist (loaded via `CLEAN_ROOM_PRIVATE_IDENTIFIER_DENYLIST`) is subject to hard limits to protect hook execution performance: a maximum of 1,000,000 bytes per file, 20,000 total terms, and 512 characters per individual term.
-* [validate-json-schema.py](../hooks/validate-json-schema.py): Verifies JSON syntax and structural conformance against schemas under `CLEAN_ROOM_SCHEMA_DIR`, including controller-side `preflight-goal.schema.json` and `init-config.schema.json`. Under clean roots, any unrecognized JSON files that do not conform to canonical schemas will trigger a failure unless they are explicitly registered in the path-separated `CLEAN_ROOM_AUXILIARY_JSON_ALLOWLIST` environment variable.
+* [validate-json-schema.py](../hooks/validate-json-schema.py): Verifies JSON syntax and structural conformance against schemas under `CLEAN_ROOM_SCHEMA_DIR`, including controller-side `preflight-goal.schema.json` and `init-config.schema.json`. Under clean roots, any unrecognized JSON files that do not conform to canonical schemas will trigger a failure unless they are explicitly registered in the path-separated `CLEAN_ROOM_AUXILIARY_JSON_ALLOWLIST` environment variable. Post-write validation also rejects completion claims in `task-manifest.json`, `coverage-ledger.json`, and `clean-room-result*.json` unless durable canonical completion artifacts prove the gate.
 * [validate-handoff-package.py](../hooks/validate-handoff-package.py): Verifies that handoff packages stay within clean roots, do not reference contaminated paths, `task-manifest.json`, `preflight-goal.json`, `source-index.json`, or `visual-index.json`, and match declared `sha256` checksums.
 
 For detailed guidelines on the clean-room process, refer to:
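Note on the ARCHITECTURE.md changes above: the deny-by-default rule can be illustrated with a small predicate. This is a minimal sketch rather than the hook's implementation; `CompletionEvidence`, its field names, `missing_proofs`, and `completion_allowed` are hypothetical names chosen for illustration, mapped onto the six proofs the added paragraph lists.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompletionEvidence:
    """Hypothetical summary of the six proofs named in the docs.

    Every field defaults to False, so an empty record denies completion.
    """

    has_behavior_spec: bool = False
    has_plan_mapping: bool = False
    has_terminal_report: bool = False
    has_passed_qc: bool = False
    has_valid_evidence_refs: bool = False
    has_public_surface_mappings: bool = False


def missing_proofs(evidence: CompletionEvidence) -> list[str]:
    """List the proofs that are still absent, in declaration order."""
    return [name for name, present in vars(evidence).items() if not present]


def completion_allowed(evidence: CompletionEvidence) -> bool:
    """Allow completion only when no proof is missing."""
    return not missing_proofs(evidence)
```

Under this sketch a manual summary with no backing artifacts maps to `CompletionEvidence()`, which is denied because nothing has been proven.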
package/docs/HOOKS.md
CHANGED
@@ -171,6 +171,8 @@ Post-write JSON artifact validator.
 - Implements the lightweight schema keywords used by bundled schemas, including object and array constraints, required fields, enum/const, patterns, `format: date-time`, `$ref`, `allOf`, `anyOf`, `oneOf`, and `if`/`then`/`else`.
 - Adds clean-run-context path checks so clean artifact paths stay relative, do not use `~`, do not contain `..`, and do not resolve into source or contaminated roots.
 - Requires task manifest handoff stages to match the expected clean-room sequence when validating the task manifest schema.
+- Performs semantic completion validation for `task-manifest.json`, `coverage-ledger.json`, and `clean-room-result*.json` post-write payloads.
+- Rejects completion claims unless canonical durable artifacts prove the gate: matching clean behavior specs, implementation-plan work item mappings, terminal implementation reports, passed QC reports, valid evidence references, and required public-surface coverage mappings.
 
 ### `validate-handoff-package.py`
 
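Note on the HOOKS.md changes above: one of the listed proofs is valid evidence references. The sketch below shows one way such a check could work, assuming refs use the `evidence-ledger:` prefix and ledger entries carry an `evidence_id` field, as in the hook code later in this diff. `unresolved_evidence_refs` is a hypothetical helper, not a function from the package.

```python
EVIDENCE_PREFIX = "evidence-ledger:"


def unresolved_evidence_refs(refs: list[str], ledger_entries: list[dict]) -> list[str]:
    """Return prefixed refs that match no entry in the evidence ledger.

    Assumption: each ledger entry is a dict with a string `evidence_id`.
    Refs without the prefix are ignored here, which is a simplification.
    """
    known_ids = {
        entry["evidence_id"]
        for entry in ledger_entries
        if isinstance(entry.get("evidence_id"), str)
    }
    return [
        ref
        for ref in refs
        if ref.startswith(EVIDENCE_PREFIX)
        and ref.removeprefix(EVIDENCE_PREFIX) not in known_ids
    ]
```

A completion claim would be rejected whenever this list is non-empty. `str.removeprefix` requires Python 3.9 or newer.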
package/docs/REFERENCE.md
CHANGED
@@ -238,6 +238,8 @@ Unattended code-development manifests must include exactly one `unit_kind: "foun
 
 `coverage-ledger.json` may record contaminated-only `source_units[].discovery_leads` for authorized related surfaces that were detected but not analyzed in the assigned unit. The runner rejects a `covered` unit while any high-priority discovery lead remains open or deferred. It does not add follow-up units or expand `loop_context.approved_scope_refs`; Agent 0 must return an abstract delta, mark coverage partial or blocked, or pause for attended approval.
 
+Completion is valid only when canonical durable artifacts prove the gate. A completed behavior unit or `spec-slice-complete` result must have a matching clean behavior spec, implementation-plan work item mapping, terminal implementation report, passed QC report, valid contaminated evidence refs, and required public-surface mappings across the behavior spec, coverage ledger, implementation plan, and terminal report. Manual summaries or synthetic result files are not completion evidence.
+
 Minimal agent command adapter shape for advisory or disabled context management:
 
 ```json
package/hooks/validate-json-schema.py
CHANGED
@@ -71,6 +71,8 @@ TASK_MANIFEST_HANDOFF_SEQUENCE_WITH_POLISH = [
     "clean-polish-review",
     TASK_MANIFEST_HANDOFF_SEQUENCE[-1],
 ]
+PUBLIC_SURFACE_COMPLETION_LEVELS = {"exact-public-contract", "behavior-compatible"}
+MAX_COMPLETION_ARTIFACT_SCAN = 500
 MAX_REPORTED_ERRORS = 20
 MAX_VALIDATION_ERRORS = MAX_REPORTED_ERRORS + 1
 REPAIR_HINT = "Fix or update the JSON artifact to satisfy the reported schema errors, then write it again."
@@ -318,6 +320,500 @@ def task_manifest_handoff_sequence_errors(data: dict[str, Any]) -> list[str]:
     return []
 
 
+def completion_guard_enabled(payload: Any) -> bool:
+    if not isinstance(payload, dict):
+        return False
+    tool = payload.get("tool_name") or payload.get("tool")
+    if not isinstance(tool, str):
+        return False
+    return tool.lower() in {"write", "edit", "multiedit", "notebookedit", "apply_patch"}
+
+
+def read_json_artifact(path: Path, label: str) -> tuple[dict[str, Any] | None, str | None]:
+    text, read_error = read_artifact_text(path, label)
+    if read_error:
+        return None, read_error
+    try:
+        data = json.loads(text)
+    except json.JSONDecodeError as exc:
+        return None, f"{label} JSON parse failed for {describe_path(path)}: {redact_text(exc)}"
+    if not isinstance(data, dict):
+        return None, f"{label} must be a JSON object: {describe_path(path)}"
+    return data, None
+
+
+def relative_ref_candidates(ref: str) -> list[str]:
+    refs = [ref]
+    for prefix in ("clean/", "contaminated/"):
+        if ref.startswith(prefix):
+            refs.append(ref.removeprefix(prefix))
+    return refs
+
+
+def find_json_by_ref(ref: Any, roots: list[Path], label: str) -> tuple[dict[str, Any] | None, str | None]:
+    if not isinstance(ref, str) or not ref:
+        return None, f"{label} ref is missing"
+    try:
+        raw = Path(ref).expanduser()
+    except OSError as exc:
+        return None, f"{label} ref is invalid: {redact_text(exc)}"
+    candidates: list[Path] = []
+    if raw.is_absolute():
+        try:
+            candidates.append(raw.resolve())
+        except OSError as exc:
+            return None, f"{label} ref is invalid: {redact_text(exc)}"
+    else:
+        for root in roots:
+            for candidate_ref in relative_ref_candidates(ref):
+                try:
+                    candidates.append((root / candidate_ref).resolve())
+                except OSError as exc:
+                    return None, f"{label} ref is invalid: {redact_text(exc)}"
+    for candidate in candidates:
+        try:
+            if candidate.is_file():
+                return read_json_artifact(candidate, label)
+        except OSError as exc:
+            return None, f"{label} could not stat {describe_path(candidate)}: {redact_text(exc)}"
+    return None, f"{label} does not exist: {ref}"
+
+
+def scan_json_artifacts(roots: list[Path], wanted_kind: str) -> list[tuple[Path, dict[str, Any]]]:
+    matches: list[tuple[Path, dict[str, Any]]] = []
+    scanned = 0
+    for root in roots:
+        try:
+            for candidate in root.rglob("*.json"):
+                scanned += 1
+                if scanned > MAX_COMPLETION_ARTIFACT_SCAN:
+                    return matches
+                try:
+                    if not candidate.is_file():
+                        continue
+                except OSError:
+                    continue
+                data, error = read_json_artifact(candidate, f"{wanted_kind} artifact")
+                if error or not data:
+                    continue
+                if artifact_kind(candidate, data) == wanted_kind:
+                    matches.append((candidate, data))
+        except OSError:
+            continue
+    return matches
+
+
+def first_json_artifact(roots: list[Path], name: str, label: str) -> tuple[dict[str, Any] | None, str | None]:
+    for root in roots:
+        candidate = root / name
+        try:
+            if candidate.is_file():
+                return read_json_artifact(candidate, label)
+        except OSError as exc:
+            return None, f"{label} could not stat {describe_path(candidate)}: {redact_text(exc)}"
+    return None, None
+
+
+def unit_ref_values(unit_id: str) -> set[str]:
+    return {unit_id, f"unit:{unit_id}", f"task-manifest:{unit_id}", f"behavior-spec:{unit_id}"}
+
+
+def evidence_id_from_ref(ref: Any) -> str | None:
+    prefix = "evidence-ledger:"
+    if isinstance(ref, str) and ref.startswith(prefix):
+        return ref.removeprefix(prefix)
+    return None
+
+
+def evidence_entry_map(evidence_ledger: dict[str, Any] | None) -> dict[str, dict[str, Any]]:
+    entries: dict[str, dict[str, Any]] = {}
+    if not isinstance(evidence_ledger, dict):
+        return entries
+    for entry in evidence_ledger.get("entries") or []:
+        if isinstance(entry, dict) and isinstance(entry.get("evidence_id"), str):
+            entries[entry["evidence_id"]] = entry
+    return entries
+
+
+def public_surface_ref(spec: dict[str, Any], item: dict[str, Any]) -> str:
+    return f"public_surface:{spec.get('spec_id')}:{item.get('kind')}:{item.get('name')}"
+
+
+def required_public_surface_obligations(spec: dict[str, Any]) -> list[str]:
+    if spec.get("compatibility_level") not in PUBLIC_SURFACE_COMPLETION_LEVELS:
+        return []
+    obligations: list[str] = []
+    for item in spec.get("public_surface") or []:
+        if isinstance(item, dict) and isinstance(item.get("name"), str) and isinstance(item.get("kind"), str):
+            obligations.append(public_surface_ref(spec, item))
+    return obligations
+
+
+def behavior_spec_test_coverage_refs(spec: dict[str, Any]) -> set[str]:
+    refs: set[str] = set()
+    for scenario in spec.get("test_scenarios") or []:
+        if not isinstance(scenario, dict):
+            continue
+        for ref in scenario.get("coverage") or []:
+            if isinstance(ref, str):
+                refs.add(ref)
+    return refs
+
+
+def matching_behavior_specs(
+    specs: list[tuple[Path, dict[str, Any]]],
+    unit_id: str,
+    spec_slice_ref: str | None = None,
+) -> list[tuple[Path, dict[str, Any]]]:
+    matches: list[tuple[Path, dict[str, Any]]] = []
+    accepted_refs = unit_ref_values(unit_id)
+    if spec_slice_ref:
+        accepted_refs.add(spec_slice_ref)
+    for spec_path, spec in specs:
+        source_refs = spec.get("source_unit_refs") if isinstance(spec.get("source_unit_refs"), list) else []
+        spec_refs = {
+            ref
+            for ref in [spec.get("spec_id"), spec.get("unit_id"), *source_refs]
+            if isinstance(ref, str)
+        }
+        if spec_refs & accepted_refs or spec.get("unit_id") == unit_id or unit_id in source_refs:
+            matches.append((spec_path, spec))
+    return matches
+
+
+def unit_id_from_spec_slice_ref(spec_slice_ref: Any, specs: list[tuple[Path, dict[str, Any]]]) -> str | None:
+    if not isinstance(spec_slice_ref, str) or not spec_slice_ref:
+        return None
+    for spec_ref, unit_id_prefix in (("unit:", "unit:"), ("task-manifest:", "task-manifest:"), ("behavior-spec:", "behavior-spec:")):
+        if spec_slice_ref.startswith(spec_ref):
+            return spec_slice_ref.removeprefix(unit_id_prefix)
+    if spec_slice_ref.startswith("unit-"):
+        return spec_slice_ref
+    for _spec_path, spec in specs:
+        if spec.get("spec_id") == spec_slice_ref:
+            return spec.get("unit_id") if isinstance(spec.get("unit_id"), str) else None
+    return None
+
+
+def plan_work_items_by_public_ref(plans: list[tuple[Path, dict[str, Any]]]) -> dict[str, list[str]]:
+    refs: dict[str, list[str]] = {}
+    for _plan_path, plan in plans:
+        for work_item in plan.get("work_items") or []:
+            if not isinstance(work_item, dict) or not isinstance(work_item.get("work_item_id"), str):
+                continue
+            for ref in work_item.get("public_contract_refs") or []:
+                if isinstance(ref, str):
+                    refs.setdefault(ref, []).append(work_item["work_item_id"])
+    return refs
+
+
+def plan_work_items_for_specs(plans: list[tuple[Path, dict[str, Any]]], specs: list[dict[str, Any]]) -> set[str]:
+    spec_ids = {spec.get("spec_id") for spec in specs if isinstance(spec.get("spec_id"), str)}
+    work_items: set[str] = set()
+    for _plan_path, plan in plans:
+        for work_item in plan.get("work_items") or []:
+            if not isinstance(work_item, dict) or not isinstance(work_item.get("work_item_id"), str):
+                continue
+            refs = {ref for ref in work_item.get("spec_ids") or [] if isinstance(ref, str)}
+            if refs & spec_ids:
+                work_items.add(work_item["work_item_id"])
+    return work_items
+
+
+def completed_work_items(reports: list[tuple[Path, dict[str, Any]]]) -> set[str]:
+    completed: set[str] = set()
+    for _report_path, report in reports:
+        for work_item_id in report.get("completed_work_items") or []:
+            if isinstance(work_item_id, str):
+                completed.add(work_item_id)
+    return completed
+
+
+def terminal_implementation_reports(reports: list[tuple[Path, dict[str, Any]]]) -> list[tuple[Path, dict[str, Any]]]:
+    terminal: list[tuple[Path, dict[str, Any]]] = []
+    for report_path, report in reports:
+        if (
+            report.get("implementation_status") == "complete"
+            and report.get("final_status") == "complete"
+            and isinstance(report.get("agent0_reporting"), dict)
+            and report["agent0_reporting"].get("report_state") == "terminal-report"
+        ):
+            terminal.append((report_path, report))
+    return terminal
+
+
+def passed_qc_reports(qc_reports: list[tuple[Path, dict[str, Any]]]) -> list[tuple[Path, dict[str, Any]]]:
+    passed: list[tuple[Path, dict[str, Any]]] = []
+    for report_path, report in qc_reports:
+        if (
+            report.get("final_status") in {"passed", "passed-with-gaps"}
+            and report.get("coverage_status") == "complete"
+            and report.get("schema_status") == "passed"
+            and report.get("leakage_status") == "passed"
+            and report.get("required_rerun") is False
+        ):
+            passed.append((report_path, report))
+    return passed
+
+
+def source_unit_for_unit(coverage_ledger: dict[str, Any] | None, unit_id: str) -> dict[str, Any] | None:
+    if not isinstance(coverage_ledger, dict):
+        return None
+    for source_unit in coverage_ledger.get("source_units") or []:
+        if isinstance(source_unit, dict) and source_unit.get("unit_id") == unit_id:
+            return source_unit
+    return None
+
+
+def validate_evidence_refs(
+    errors: list[str],
+    unit_id: str,
+    refs: Any,
+    evidence_ledger: dict[str, Any] | None,
+    label: str,
+) -> None:
+    if not isinstance(refs, list) or not refs:
+        add_error(errors, f"{label} has no evidence_refs: {unit_id}")
+        return
+    entries = evidence_entry_map(evidence_ledger)
+    if not entries:
+        add_error(errors, f"{label} references evidence but evidence-ledger.json is missing or empty: {unit_id}")
+        return
+    for ref in refs:
+        evidence_id = evidence_id_from_ref(ref)
+        if not evidence_id:
+            continue
+        entry = entries.get(evidence_id)
+        if not entry:
+            add_error(errors, f"{label} references missing evidence-ledger item: {ref}")
+            continue
+        source_ref = entry.get("source_unit_ref")
+        if isinstance(source_ref, str) and source_ref not in unit_ref_values(unit_id):
+            add_error(errors, f"{label} evidence ref points at a different source unit: {ref}")
+
+
+def manifest_behavior_unit_ids(manifest: dict[str, Any] | None) -> set[str]:
+    ids: set[str] = set()
+    if not isinstance(manifest, dict):
+        return ids
+    for unit in manifest.get("units") or []:
+        if isinstance(unit, dict) and unit.get("unit_kind") == "behavior" and isinstance(unit.get("unit_id"), str):
+            ids.add(unit["unit_id"])
+    return ids
+
+
+def manifest_completion_behavior_unit_ids(manifest: dict[str, Any] | None) -> set[str]:
+    ids: set[str] = set()
+    if not isinstance(manifest, dict):
+        return ids
+    for unit in manifest.get("units") or []:
+        if (
+            isinstance(unit, dict)
+            and unit.get("unit_kind") == "behavior"
+            and unit.get("status") != "out-of-scope"
+            and isinstance(unit.get("unit_id"), str)
+        ):
+            ids.add(unit["unit_id"])
+    return ids
+
+
+def behavior_unit_is_in_scope(unit_id: str, manifest: dict[str, Any] | None, specs: list[tuple[Path, dict[str, Any]]]) -> bool:
+    behavior_ids = manifest_behavior_unit_ids(manifest)
+    if behavior_ids:
+        return unit_id in behavior_ids
+    if matching_behavior_specs(specs, unit_id):
+        return True
+    return unit_id != "unit-foundation"
+
+
+def completion_context(path: Path, kind: str, data: dict[str, Any]) -> dict[str, Any]:
+    clean_roots = env_roots("CLEAN_ROOM_CLEAN_ROOTS")
+    contaminated_roots = env_roots("CLEAN_ROOM_CONTAMINATED_ARTIFACT_ROOTS")
+    manifest = data if kind == "task-manifest" else first_json_artifact(contaminated_roots, "task-manifest.json", "task-manifest")[0]
+    coverage = data if kind == "coverage-ledger" else first_json_artifact(contaminated_roots, "coverage-ledger.json", "coverage-ledger")[0]
+    evidence = first_json_artifact(contaminated_roots, "evidence-ledger.json", "evidence-ledger")[0]
+    specs = scan_json_artifacts(clean_roots, "behavior-spec")
+    plans = scan_json_artifacts(clean_roots, "implementation-plan")
+    reports = scan_json_artifacts(clean_roots, "implementation-report")
+    qcs = scan_json_artifacts(clean_roots, "qc-report")
+    if kind == "clean-room-result" and data.get("result") == "spec-slice-complete":
+        report, report_error = find_json_by_ref(data.get("terminal_report_ref"), clean_roots, "clean-room-result terminal_report_ref")
+        qc, qc_error = find_json_by_ref(data.get("qc_report_ref"), clean_roots, "clean-room-result qc_report_ref")
+        if report:
+            reports = [(path, report)]
+        if qc:
+            qcs = [(path, qc)]
+        return {
+            "clean_roots": clean_roots,
+            "contaminated_roots": contaminated_roots,
+            "manifest": manifest,
+            "coverage": coverage,
+            "evidence": evidence,
+            "specs": specs,
+            "plans": plans,
+            "reports": reports,
+            "qcs": qcs,
+            "report_error": report_error if not report else None,
+            "qc_error": qc_error if not qc else None,
+        }
+    return {
+        "clean_roots": clean_roots,
+        "contaminated_roots": contaminated_roots,
+        "manifest": manifest,
+        "coverage": coverage,
+        "evidence": evidence,
+        "specs": specs,
+        "plans": plans,
+        "reports": reports,
+        "qcs": qcs,
+        "report_error": None,
+        "qc_error": None,
+    }
+
+
+def validate_behavior_unit_completion(
+    errors: list[str],
+    unit_id: str,
+    context: dict[str, Any],
+    spec_slice_ref: str | None = None,
+) -> None:
+    specs = matching_behavior_specs(context["specs"], unit_id, spec_slice_ref)
+    if not specs:
+        add_error(errors, f"completion claim has no clean behavior spec: {unit_id}")
+        return
+    spec_data = [spec for _spec_path, spec in specs]
+    terminal_reports = terminal_implementation_reports(context["reports"])
+    if not terminal_reports:
+        add_error(errors, f"completion claim has no terminal implementation report: {unit_id}")
+    passed_qcs = passed_qc_reports(context["qcs"])
+    if not passed_qcs:
+        add_error(errors, f"completion claim has no passed QC report: {unit_id}")
+    work_item_ids = plan_work_items_for_specs(context["plans"], spec_data)
+    if not work_item_ids:
+        add_error(errors, f"completion claim has no implementation-plan work item for clean behavior spec: {unit_id}")
+    elif not (work_item_ids & completed_work_items(terminal_reports)):
+        add_error(errors, f"completion claim has no completed implementation work item for clean behavior spec: {unit_id}")
+
+    source_unit = source_unit_for_unit(context["coverage"], unit_id)
+    if not source_unit or source_unit.get("coverage_state") != "covered":
+        add_error(errors, f"completion claim has no covered coverage-ledger source unit: {unit_id}")
+    else:
+        validate_evidence_refs(errors, unit_id, source_unit.get("evidence_refs"), context["evidence"], "coverage-ledger source unit")
+
+    public_coverage_by_ref = {
+        item.get("ref"): item
+        for item in (source_unit or {}).get("public_surface_coverage") or []
+        if isinstance(item, dict) and isinstance(item.get("ref"), str)
+    }
+    plan_refs = plan_work_items_by_public_ref(context["plans"])
+    completed = completed_work_items(terminal_reports)
+    for spec_path, spec in specs:
+        coverage_refs = behavior_spec_test_coverage_refs(spec)
+        for obligation in required_public_surface_obligations(spec):
+            if obligation not in coverage_refs:
+                add_error(errors, f"public_surface obligation missing from behavior spec test coverage: {obligation} ({describe_path(spec_path)})")
+            coverage = public_coverage_by_ref.get(obligation)
+            if not coverage:
+                add_error(errors, f"coverage-ledger missing public_surface_coverage for: {obligation}")
+                continue
+            if coverage.get("status") != "covered":
+                add_error(errors, f"coverage-ledger public_surface_coverage is not covered: {obligation}")
+            validate_evidence_refs(errors, unit_id, coverage.get("evidence_refs"), context["evidence"], "coverage-ledger public_surface_coverage")
+            mapped_items = set(plan_refs.get(obligation) or [])
+            if not mapped_items:
+                add_error(errors, f"public_surface obligation missing from implementation plan: {obligation}")
+            elif not (mapped_items & completed):
+                add_error(errors, f"public_surface obligation work item is not complete: {obligation}")
+            if error_limit_reached(errors):
+                return
+
+
731
|
+
def completion_guard_errors(path: Path, kind: str, data: dict[str, Any]) -> list[str]:
|
|
732
|
+
if kind not in {"task-manifest", "coverage-ledger", "clean-room-result"}:
|
|
733
|
+
return []
|
|
734
|
+
if not env_roots("CLEAN_ROOM_CLEAN_ROOTS") or not env_roots("CLEAN_ROOM_CONTAMINATED_ARTIFACT_ROOTS"):
|
|
735
|
+
return []
|
|
736
|
+
errors: list[str] = []
|
|
737
|
+
context = completion_context(path, kind, data)
|
|
738
|
+
if context.get("report_error"):
|
|
739
|
+
add_error(errors, context["report_error"])
|
|
740
|
+
if context.get("qc_error"):
|
|
741
|
+
add_error(errors, context["qc_error"])
|
|
742
|
+
|
|
743
|
+
if kind == "task-manifest":
|
|
744
|
+
manifest_complete = isinstance(data.get("implementation_status"), dict) and data["implementation_status"].get("state") == "complete"
|
|
745
|
+
completed_behavior_ids: set[str] = set()
|
|
746
|
+
for unit in data.get("units") or []:
|
|
747
|
+
if not isinstance(unit, dict):
|
|
748
|
+
continue
|
|
749
|
+
if unit.get("unit_kind") != "behavior" or not isinstance(unit.get("unit_id"), str):
|
|
750
|
+
continue
|
|
751
|
+
unit_id = unit["unit_id"]
|
|
752
|
+
if unit.get("status") == "complete":
|
|
753
|
+
completed_behavior_ids.add(unit_id)
|
|
754
|
+
validate_behavior_unit_completion(errors, unit_id, context)
|
|
755
|
+
elif manifest_complete and unit.get("status") != "out-of-scope":
|
|
756
|
+
add_error(errors, f"task-manifest implementation_status complete but behavior unit is not complete: {unit_id}")
|
|
757
|
+
if error_limit_reached(errors):
|
|
758
|
+
break
|
|
759
|
+
if manifest_complete and not completed_behavior_ids:
|
|
760
|
+
add_error(errors, "task-manifest implementation_status complete has no completed behavior units")
|
|
+    elif kind == "coverage-ledger":
+        if data.get("coverage_status") != "complete":
+            return errors
+        if not isinstance(context["manifest"], dict):
+            add_error(errors, "coverage-ledger completion has no task-manifest.json")
+        if not data.get("behavior_spec_refs"):
+            add_error(errors, "coverage-ledger completion has no behavior_spec_refs")
+        required_behavior_ids = manifest_completion_behavior_unit_ids(context["manifest"])
+        if isinstance(context["manifest"], dict) and not required_behavior_ids:
+            add_error(errors, "coverage-ledger completion has no behavior units to complete")
+        covered_behavior_ids: set[str] = set()
+        behavior_spec_refs = {ref for ref in data.get("behavior_spec_refs") or [] if isinstance(ref, str)}
+        for source_unit in data.get("source_units") or []:
+            if not isinstance(source_unit, dict):
+                continue
+            unit_id = source_unit.get("unit_id")
+            if not isinstance(unit_id, str):
+                continue
+            if unit_id in required_behavior_ids and source_unit.get("coverage_state") != "covered":
+                add_error(errors, f"coverage-ledger completion does not cover behavior unit: {unit_id}")
+            if source_unit.get("coverage_state") != "covered":
+                continue
+            validate_evidence_refs(errors, unit_id, source_unit.get("evidence_refs"), context["evidence"], "coverage-ledger source unit")
+            is_behavior_completion = (
+                unit_id in required_behavior_ids
+                if required_behavior_ids
+                else behavior_unit_is_in_scope(unit_id, context["manifest"], context["specs"])
+            )
+            if is_behavior_completion:
+                covered_behavior_ids.add(unit_id)
+                validate_behavior_unit_completion(errors, unit_id, context)
+                for _spec_path, spec in matching_behavior_specs(context["specs"], unit_id):
+                    spec_id = spec.get("spec_id")
+                    if isinstance(spec_id, str) and spec_id not in behavior_spec_refs:
+                        add_error(errors, f"coverage-ledger completion missing behavior_spec_refs entry: {spec_id}")
+            if error_limit_reached(errors):
+                break
+        for unit_id in sorted(required_behavior_ids - covered_behavior_ids):
+            add_error(errors, f"coverage-ledger completion does not cover behavior unit: {unit_id}")
+            if error_limit_reached(errors):
+                break
+    elif kind == "clean-room-result" and data.get("result") == "spec-slice-complete":
+        if not isinstance(context["manifest"], dict):
+            add_error(errors, "clean-room-result completion has no task-manifest.json")
+        if not isinstance(context["coverage"], dict):
+            add_error(errors, "clean-room-result completion has no coverage-ledger.json")
+        if data.get("coverage_state") != "complete":
+            add_error(errors, "clean-room-result spec-slice-complete must have coverage_state complete")
+        unit_id = unit_id_from_spec_slice_ref(data.get("spec_slice_ref"), context["specs"])
+        if not unit_id:
+            add_error(errors, "clean-room-result spec_slice_ref does not resolve to a behavior unit")
+        elif behavior_unit_is_in_scope(unit_id, context["manifest"], context["specs"]):
+            validate_behavior_unit_completion(errors, unit_id, context, data.get("spec_slice_ref"))
+    return errors
+
+
 def is_clean_room_task_manifest_schema(schema: dict[str, Any]) -> bool:
     properties = schema.get("properties")
     return isinstance(properties, dict) and "handoff_sequence" in properties and "agent_pipeline" in properties
@@ -525,6 +1021,7 @@ def main() -> int:
         for error in path_errors:
             print(f"clean-room schema check failed: {redact_text(error)}", file=sys.stderr)
         return 1
+    run_completion_guard = completion_guard_enabled(payload)
     for path in paths:
         if path.suffix.lower() != ".json" or not path.is_file():
             continue
@@ -587,6 +1084,8 @@ def main() -> int:
             extend_errors(errors, role_session_brief_path_errors(data))
         if kind == "task-manifest" and is_clean_room_task_manifest_schema(schema):
             extend_errors(errors, task_manifest_handoff_sequence_errors(data))
+        if run_completion_guard:
+            extend_errors(errors, completion_guard_errors(path, kind, data))
         if errors:
             print(f"clean-room schema check failed for {describe_path(path)}:", file=sys.stderr)
             for error in errors[:MAX_REPORTED_ERRORS]:
package/package.json
CHANGED