npm - @hallucination-studio/harness-engine - Versions diffs - 1.0.0-beta.10.9ff10d9 → 1.0.0-beta.11.2a4849a - Mend

@hallucination-studio/harness-engine 1.0.0-beta.10.9ff10d9 → 1.0.0-beta.11.2a4849a

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (8)

package/README.md +34 -1
package/package.json +1 -1
package/skills/harness-engine/SKILL.md +4 -2
package/skills/harness-engine/evals/cases.json +4 -0
package/skills/harness-engine/evals/run_evals.py +61 -0
package/skills/harness-engine/references/template-policy.md +3 -0
package/skills/harness-engine/references/workflow.md +1 -0
package/skills/harness-engine/scripts/manage_harness.py +164 -2

package/README.md CHANGED

@@ -18,6 +18,7 @@ ask for missing high-impact facts, create the harness files, and keep future wor
 - Creates execution-plan folders for active and completed plans.
 - Adds SOPs for architecture setup, knowledge capture, local observability, and UI validation.
 - Reconciles managed harnesses through the same `init` flow, refreshing managed files and backfilling newly introduced managed files while preserving unmanaged docs.
+- Provides `clean` to remove transient harness runtime state, add `.gitignore` entries, and untrack already committed harness runtime files so a follow-up commit and push delete them from the remote.
 - Enforces a local harness check without assuming the user's project has CI.
 - Previews and optionally removes stale unreferenced generated evidence under `docs/generated/`.
 - Supports durable knowledge closure with stable knowledge IDs and evidence text, so permanent docs can use natural wording instead of duplicated checklist strings.
@@ -161,12 +162,44 @@ python3 .codex/skills/harness-engine/scripts/manage_harness.py workstream-upsert
 python3 .codex/skills/harness-engine/scripts/manage_harness.py check --repo .
 python3 .codex/skills/harness-engine/scripts/manage_harness.py evidence-prune --repo . --older-than-days 14
 python3 .codex/skills/harness-engine/scripts/manage_harness.py evidence-prune --repo . --older-than-days 14 --apply
+python3 .codex/skills/harness-engine/scripts/manage_harness.py clean --repo .
+python3 .codex/skills/harness-engine/scripts/manage_harness.py clean --repo . --apply
 ```
 The quality gate is intentionally local and repository-owned. It does not require the user's
 project to have CI. `plan-close` refuses to move a plan to `completed` unless `quality-score`
 has passed, and `check` reports active plans whose quality gate is missing or failing.
+## Version Control Policy
+Commit harness docs that carry durable repository knowledge: `AGENTS.md`, `ARCHITECTURE.md`,
+`docs/PLANS.md`, `docs/QUALITY_SCORE.md`, `docs/RELIABILITY.md`, `docs/SECURITY.md`,
+`docs/FRONTEND.md`, `docs/sops/`, `docs/product-specs/`, `docs/design-docs/`,
+`docs/references/`, and intentional execution-plan state.
+Do not commit local skill installs or generated evidence by default. `clean --apply` adds these ignores:
+```gitignore
+# harness-engine transient files
+.codex/skills/
+docs/generated/
+# end harness-engine transient files
+```
+If those files were already committed or pushed, run:
+```bash
+python3 .codex/skills/harness-engine/scripts/manage_harness.py clean --repo .
+python3 .codex/skills/harness-engine/scripts/manage_harness.py clean --repo . --apply
+git status --short
+git diff --cached --stat
+git commit -m "Remove harness runtime artifacts from git"
+git push
+```
+`clean --apply` deletes local generated evidence and stale task snapshots, then runs
+`git rm --cached` to stage removal of tracked harness runtime files; the follow-up commit and push delete them from the remote.
 For multi-phase work, `Phase Continuity` and `docs/exec-plans/workstreams.md` form the recovery
 ledger. A plan like `Local Workbench Phase 1` can close only after it records whether the workstream
 continues, pauses, completes, or stops, and where the next agent should resume.
@@ -252,7 +285,7 @@ These scores describe the current implementation, not an external guarantee.
 | Knowledge, quality, and workstream closure loop | 9.1 / 10 | Stable knowledge IDs plus exact destination evidence reduce noisy doc duplication, `quality-score` rejects missing evidence notes, defects block closure until resolved, and workstreams make phased work recoverable. Future work could move plan state into structured sidecar metadata instead of Markdown parsing. |
 | CLI installer | 8 / 10 | Simple local/global/custom install modes, force replacement, and path discovery. It is intentionally minimal and does not manage Codex runtime configuration. |
 | Generated harness docs | 8.4 / 10 | Covers architecture, plans, reliability, security, frontend policy, issue workflows, references, generated artifacts, and SOPs. The docs now front-load exact knowledge evidence, per-dimension quality notes, and plan placeholder cleanup, but templates still require Codex to tighten project-specific language after generation. |
-| Evaluation coverage | 9 / 10 | `npm test` runs 12 structured eval cases covering empty-repo init, frontend analysis, init reconciliation, issue workflow coverage, closed-loop plan behavior, phase continuity, path canonicalization, defect recovery, required quality-score notes, exact knowledge evidence, generated-evidence cleanup, eval report shape, and user-owned doc preservation. A fully automated Codex child-agent E2E would raise this further. |
+| Evaluation coverage | 9 / 10 | `npm test` runs 13 structured eval cases covering empty-repo init, frontend analysis, init reconciliation, clean command behavior for local runtime state and already tracked artifacts, issue workflow coverage, closed-loop plan behavior, phase continuity, path canonicalization, defect recovery, required quality-score notes, exact knowledge evidence, generated-evidence cleanup, eval report shape, and user-owned doc preservation. A fully automated Codex child-agent E2E would raise this further. |
 | Release automation | 8 / 10 | Supports stable release, beta on every main commit, nightly, manual dry-run, artifacts, provenance, and token fallback. npm first-publish/trusted-publishing setup still requires external configuration. |
 | User-project safety | 8.8 / 10 | The skill avoids adding CI to target projects by default, preserves unmanaged files unless forced, and requires evidence-backed closure for defects and durable knowledge. More destructive-change simulation in evals would improve this score. |
 | Overall | 9 / 10 | The skill is now strong enough for regular use: self evals pass across the structured suite, real acceptance covered initial scaffold plus frontend and backend issue workflows, and the main failure modes found during acceptance are now documented and eval-covered. Remaining leverage is automated child-agent E2E coverage and structured plan metadata. |
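
The dry-run-then-apply flow described in this README change produces a JSON report. As a rough sketch only, a caller could summarize that report as shown here. The field names (`mode`, `tracked_candidates`, `local_candidates`, `removed_from_index`) are taken from `command_clean` in this diff; the helper `summarize_clean_report` and the sample report are hypothetical and not part of the package.

```python
import json


def summarize_clean_report(report: dict) -> str:
    """Return a one-line summary of a `clean` JSON report (hypothetical helper)."""
    mode = report.get("mode", "unknown")
    tracked = report.get("tracked_candidates", [])
    local = report.get("local_candidates", [])
    if mode == "dry-run":
        # A dry run only lists candidates; files and the git index are untouched.
        return f"dry-run: {len(tracked)} tracked and {len(local)} local candidates; nothing changed"
    removed = report.get("removed_from_index", [])
    return f"apply: {len(removed)} paths staged for removal; commit and push to finish"


# Hypothetical report text, shaped like the fields the command writes.
sample = json.loads(
    '{"mode": "dry-run",'
    ' "tracked_candidates": ["docs/generated/a.png"],'
    ' "local_candidates": ["docs/generated/a.png"]}'
)
print(summarize_clean_report(sample))
```

A summary like this is only a convenience; the staged changes still need review with `git status --short` and `git diff --cached --stat` before committing.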

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@hallucination-studio/harness-engine",
-  "version": "1.0.0-beta.10.9ff10d9",
+  "version": "1.0.0-beta.11.2a4849a",
   "description": "Install the harness-engine Codex skill for initializing and reconciling advanced repository harness docs.",
   "repository": {
     "type": "git",

package/skills/harness-engine/SKILL.md CHANGED

@@ -28,7 +28,8 @@ Run the packaged script to inspect the target repository before editing files. U
 17. Close the plan with `python3 scripts/manage_harness.py plan-close --repo <target-repo> --plan <plan-file> --summary "<summary>"`.
 18. Before handoff, run `python3 .codex/skills/harness-engine/scripts/manage_harness.py check --repo <target-repo>` from an installed target repository.
 19. To review stale generated evidence, run `python3 scripts/manage_harness.py evidence-prune --repo <target-repo>` first; it is dry-run by default. Add `--apply` only after checking the candidate list.
-20. After changing this skill, run `python3 evals/run_evals.py` and iterate until it passes.
+20. To clean transient harness runtime files or remove already committed runtime files from the remote, run `python3 scripts/manage_harness.py clean --repo <target-repo>` first; it is dry-run by default. Add `--apply` to clean local runtime state, update `.gitignore`, and stage `git rm --cached` removals, then commit and push.
+21. After changing this skill, run `python3 evals/run_evals.py` and iterate until it passes.
 ## Reading Order
@@ -63,6 +64,7 @@ Run the packaged script to inspect the target repository before editing files. U
 - Use `plan-close` as the final guardrail so plan state, quality score, and durable docs stay synchronized.
 - Use `check` as the local handoff guardrail for user repositories.
 - Use `evidence-prune` as a cleanup preview for old unreferenced files under `docs/generated/`; it never deletes unless `--apply` is present.
+- Use `clean` when `.codex/skills/`, `docs/generated/`, or stale `docs/exec-plans/active|completed/` files need cleanup or were already committed. It never changes files or the git index unless `--apply` is present.
 - Run `python3 evals/run_evals.py` after skill changes, read the structured report, and treat per-case failures as iteration input.
 - Do not add CI to user repositories unless the human explicitly asks for it.
@@ -72,7 +74,7 @@ Run the packaged script to inspect the target repository before editing files. U
 - Keep durable knowledge in repo docs, not in chat-only explanations.
 - Keep plans under `docs/exec-plans/active/` and move finished plans to `docs/exec-plans/completed/`.
 - Keep resumable workstreams in `docs/exec-plans/workstreams.md`.
-- Keep generated material under `docs/generated/`.
+- Keep generated evidence under `docs/generated/`; it is local runtime output and is ignored by git unless the human intentionally promotes a specific artifact into tracked docs.
 - Keep external, model-friendly references under `docs/references/`.
 - Keep SOPs explicit and task-triggered so the next agent can follow the same path mechanically.
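
The guarantee that `clean` never changes files unless `--apply` is present can be illustrated with a minimal preview sketch. It assumes the same three runtime directories listed in `CLEAN_INIT_DIRS`; the function `preview_local_candidates` and the temporary repository layout are hypothetical, used only to show that listing candidates leaves them on disk.

```python
import tempfile
from pathlib import Path

# Directories mirrored from CLEAN_INIT_DIRS in this diff.
RUNTIME_DIRS = ["docs/generated", "docs/exec-plans/active", "docs/exec-plans/completed"]


def preview_local_candidates(repo: Path, dirs=RUNTIME_DIRS) -> list[str]:
    """List files an apply run would delete, without touching them (hypothetical helper)."""
    found = []
    for relative_dir in dirs:
        root = repo / relative_dir
        if not root.exists():
            continue
        for path in sorted(root.rglob("*")):
            if path.is_file() or path.is_symlink():
                found.append(path.relative_to(repo).as_posix())
    return found


with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)
    evidence = repo / "docs/generated/report.json"
    evidence.parent.mkdir(parents=True)
    evidence.write_text("{}\n")
    candidates = preview_local_candidates(repo)
    # The preview only reads the tree, so the file is still present.
    still_there = evidence.exists()

print(candidates, still_there)
```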

package/skills/harness-engine/evals/cases.json CHANGED

@@ -11,6 +11,10 @@
     "id": "init-reconciles-existing-harness",
     "description": "Init should reconcile an existing harness by refreshing managed files and adding newly introduced managed files."
   },
+  {
+    "id": "clean-removes-runtime-state-and-untracks-artifacts",
+    "description": "Clean should dry-run by default, remove local harness runtime state, update .gitignore, and stage removal of tracked harness runtime artifacts when applied."
+  },
   {
     "id": "closed-loop-plan",
     "description": "Execution plans should refuse to close until durable knowledge is written back."

package/skills/harness-engine/evals/run_evals.py CHANGED

@@ -197,6 +197,66 @@ def test_init_reconciles_existing_harness(tmp_root):
     assert_exists(repo, "docs/sops/evidence-first-eval-loop.md")
+def test_clean_removes_runtime_state_and_untracks_artifacts(tmp_root):
+    repo = tmp_root / "clean-repo"
+    repo.mkdir()
+    subprocess.run(["git", "init"], cwd=repo, text=True, capture_output=True, check=True)
+    tracked_files = [
+        ".codex/skills/harness-engine/SKILL.md",
+        "docs/generated/canvas-polish-desktop-final.png",
+        "docs/generated/harness-analysis.json",
+        "docs/exec-plans/active/2026-06-11-old-task.md",
+        "docs/exec-plans/completed/2026-06-11-old-task.md",
+    ]
+    for relative_path in tracked_files:
+        path = repo / relative_path
+        path.parent.mkdir(parents=True, exist_ok=True)
+        path.write_text("tracked runtime artifact\n")
+    subprocess.run(["git", "add", *tracked_files], cwd=repo, text=True, capture_output=True, check=True)
+    subprocess.run(
+        ["git", "commit", "-m", "track runtime artifacts"],
+        cwd=repo,
+        text=True,
+        capture_output=True,
+        check=True,
+    )
+    dry_run = run_manager("clean", "--repo", str(repo))
+    if dry_run["mode"] != "dry-run" or dry_run["tracked_candidate_count"] != len(tracked_files):
+        raise AssertionError("clean should dry-run tracked runtime artifact candidates")
+    if set(dry_run["tracked_candidates"]) != set(tracked_files):
+        raise AssertionError("clean tracked candidates should match tracked harness runtime artifacts")
+    if "docs/generated/canvas-polish-desktop-final.png" not in set(dry_run["local_candidates"]):
+        raise AssertionError("clean should preview local generated evidence cleanup")
+    for relative_path in tracked_files:
+        if not (repo / relative_path).exists():
+            raise AssertionError("clean dry-run should not delete local files")
+    applied = run_manager("clean", "--repo", str(repo), "--apply")
+    if applied["mode"] != "apply" or set(applied["removed_from_index"]) != set(tracked_files):
+        raise AssertionError("clean --apply should remove candidates from the git index")
+    assert_contains(repo, ".gitignore", ".codex/skills/")
+    assert_contains(repo, ".gitignore", "docs/generated/")
+    status = subprocess.run(
+        ["git", "status", "--short"],
+        cwd=repo,
+        text=True,
+        capture_output=True,
+        check=True,
+    ).stdout
+    for relative_path in tracked_files:
+        if f"D  {relative_path}" not in status:
+            raise AssertionError(f"clean should stage index deletion for {relative_path}")
+    for relative_path in tracked_files:
+        if relative_path.startswith(".codex/skills/"):
+            if not (repo / relative_path).exists():
+                raise AssertionError(f"clean should keep local skill install file for {relative_path}")
+        elif (repo / relative_path).exists():
+            raise AssertionError(f"clean should delete local runtime file for {relative_path}")
+    if "A  .gitignore" not in status:
+        raise AssertionError("clean should stage the new .gitignore block")
 def test_closed_loop_plan(tmp_root):
     repo = tmp_root / "loop-repo"
     repo.mkdir()
@@ -1091,6 +1151,7 @@ EVALS = [
     ("empty-repo-init", test_empty_repo_init),
     ("frontend-analysis", test_frontend_analysis),
     ("init-reconciles-existing-harness", test_init_reconciles_existing_harness),
+    ("clean-removes-runtime-state-and-untracks-artifacts", test_clean_removes_runtime_state_and_untracks_artifacts),
     ("closed-loop-plan", test_closed_loop_plan),
     ("phase-continuity-workstream", test_phase_continuity_workstream),
     ("plan-path-canonicalization", test_plan_path_canonicalization),
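
The new eval detects staged deletions with a substring match on `git status --short` output. A slightly stricter alternative is sketched below, assuming each line follows the short format of two status columns, a space, and the path. The helper `staged_deletions` and the sample output are hypothetical, and paths that git quotes or reports as renames are not handled.

```python
def staged_deletions(status_output: str) -> set[str]:
    """Collect paths whose index status (first column) is 'D' (hypothetical helper)."""
    deleted = set()
    for line in status_output.splitlines():
        # Short format: column 0 is the index status, column 1 the worktree status.
        if len(line) > 3 and line[0] == "D":
            deleted.add(line[3:])
    return deleted


# Hypothetical `git status --short` output for illustration.
sample = (
    "D  docs/generated/harness-analysis.json\n"
    "A  .gitignore\n"
    "?? notes.txt\n"
    " D unstaged.txt\n"
)
print(sorted(staged_deletions(sample)))
```

Checking the first column separately keeps an unstaged deletion (` D`) from being counted as staged.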

package/skills/harness-engine/references/template-policy.md CHANGED

@@ -7,6 +7,9 @@ Every generated file starts with a managed marker:
 Init behavior:
 - `init`: create missing files for new repositories; when an existing managed harness is detected, refresh managed files and create missing files while preserving unmanaged files
+- `clean` removes transient runtime state under `docs/generated/`, `docs/exec-plans/active/`, and `docs/exec-plans/completed/`
+- `clean` maintains `.gitignore` entries for `.codex/skills/` and `docs/generated/`
+- `clean` previews or stages removal of already tracked harness runtime files from the git index; it requires `--apply` before running `git rm --cached`
 Use `init` as the normal workspace command so creation and reconciliation share one path. Use `--force` only when the human explicitly accepts overwriting.
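
The policy relies on a marker at the start of each generated file. The sketch below shows one way to classify a file's text by marker. The marker strings are copied from `manage_harness.py` in this diff, while `classify_ownership` is a hypothetical helper rather than the package's own function.

```python
MANAGED_MARKER = "<!-- harness-engine:managed -->"
OBSOLETE_MANAGED_MARKERS = [
    "<!-- harness-repo-bootstrap:managed -->",
    "<!-- harness-init:managed -->",
]


def classify_ownership(text: str) -> str:
    """Classify file text by its leading marker (hypothetical helper)."""
    if text.startswith(MANAGED_MARKER):
        return "managed"
    # str.startswith accepts a tuple of candidate prefixes.
    if text.startswith(tuple(OBSOLETE_MANAGED_MARKERS)):
        return "obsolete-managed"
    return "unmanaged"


print(classify_ownership("<!-- harness-init:managed -->\n# Plans\n"))
print(classify_ownership("# My own notes\n"))
```

In the diff, both the current and the obsolete markers count as harness-owned, so either of the first two results would make a file eligible for refresh during `init`.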

package/skills/harness-engine/references/workflow.md CHANGED

@@ -48,6 +48,7 @@ After the scaffold exists:
 - run `quality-score` after implementation and validation, with evidence notes for every dimension
 - if `quality-score` fails, implement the `## Rework Required` items and score again
 - use `phase-set` and `workstream-upsert` when a plan belongs to phased or resumable work
+- use `clean` when harness runtime files need cleanup or were already committed; review dry-run output first, then apply, commit, and push the staged removals
 - use `plan-close` to verify no durable knowledge is left stranded in the active plan
 - before `plan-close`, replace generic plan placeholders with task-specific scope, constraints, steps, validation, and completion notes; delete unused ad hoc durable-knowledge TODOs
 - run `.codex/skills/harness-engine/scripts/manage_harness.py check --repo <target-repo>` before handoff

package/skills/harness-engine/scripts/manage_harness.py CHANGED

@@ -5,13 +5,35 @@ import hashlib
 import json
 import os
 import re
+import subprocess
 import time
 from datetime import UTC, datetime
 from pathlib import Path
 MANAGED_MARKER = "<!-- harness-engine:managed -->"
+OBSOLETE_MANAGED_MARKERS = [
+    "<!-- harness-repo-bootstrap:managed -->",
+    "<!-- harness-init:managed -->",
+]
 DEFAULT_KNOWLEDGE_PLACEHOLDER = "- [ ] Add durable facts here as they emerge -> <destination-doc>"
 DEFAULT_DEFECT_PLACEHOLDER = "None."
+GITIGNORE_BLOCK_START = "# harness-engine transient files"
+GITIGNORE_BLOCK_END = "# end harness-engine transient files"
+GITIGNORE_ENTRIES = [
+    ".codex/skills/",
+    "docs/generated/",
+]
+CLEAN_INIT_DIRS = [
+    "docs/generated",
+    "docs/exec-plans/active",
+    "docs/exec-plans/completed",
+]
+GIT_CLEAN_PATHS = [
+    ".codex/skills",
+    "docs/generated",
+    "docs/exec-plans/active",
+    "docs/exec-plans/completed",
+]
 PLAN_TEMPLATE = """# Execution Plan: {title}
 ## Goal
@@ -673,7 +695,7 @@ def detect_existing_managed_files(repo):
         path = repo / relative_path
         if path.exists():
             try:
-                if is_managed_text(path.read_text()):
+                if is_harness_owned_text(path.read_text()):
                     managed.append(relative_path)
             except UnicodeDecodeError:
                 continue
@@ -738,10 +760,102 @@ def ensure_parent(path):
     path.parent.mkdir(parents=True, exist_ok=True)
+def ensure_gitignore(repo):
+    path = repo / ".gitignore"
+    existing = path.read_text() if path.exists() else ""
+    block_lines = [GITIGNORE_BLOCK_START, *GITIGNORE_ENTRIES, GITIGNORE_BLOCK_END]
+    block = "\n".join(block_lines)
+    pattern = re.compile(
+        rf"(^|\n){re.escape(GITIGNORE_BLOCK_START)}\n.*?\n{re.escape(GITIGNORE_BLOCK_END)}(?=\n|$)",
+        flags=re.DOTALL,
+    )
+    if pattern.search(existing):
+        updated = pattern.sub(lambda match: match.group(1) + block, existing)
+    else:
+        prefix = existing.rstrip()
+        updated = f"{prefix}\n\n{block}" if prefix else block
+    updated = updated.rstrip() + "\n"
+    changed = updated != existing
+    if changed:
+        path.write_text(updated)
+    return {
+        "path": ".gitignore",
+        "updated": changed,
+        "entries": GITIGNORE_ENTRIES,
+    }
+def clean_init_state(repo):
+    cleaned = []
+    for relative_dir in CLEAN_INIT_DIRS:
+        root = repo / relative_dir
+        if not root.exists():
+            continue
+        for path in sorted(root.rglob("*"), reverse=True):
+            if path.is_file() or path.is_symlink():
+                cleaned.append(str(path.relative_to(repo)))
+                path.unlink()
+            elif path.is_dir():
+                try:
+                    path.rmdir()
+                except OSError:
+                    pass
+    return cleaned
+def git_tracked_harness_runtime_files(repo, roots=None):
+    roots = roots or GIT_CLEAN_PATHS
+    result = subprocess.run(
+        ["git", "-C", str(repo), "ls-files", *roots],
+        text=True,
+        capture_output=True,
+        check=False,
+    )
+    if result.returncode != 0:
+        raise RuntimeError(result.stderr.strip() or "git ls-files failed")
+    return [line for line in result.stdout.splitlines() if line.strip()]
+def git_untrack_files(repo, paths):
+    if not paths:
+        return []
+    result = subprocess.run(
+        ["git", "-C", str(repo), "rm", "-r", "--cached", "--", *paths],
+        text=True,
+        capture_output=True,
+        check=False,
+    )
+    if result.returncode != 0:
+        raise RuntimeError(result.stderr.strip() or "git rm --cached failed")
+    return paths
+def git_add_paths(repo, paths):
+    if not paths:
+        return []
+    result = subprocess.run(
+        ["git", "-C", str(repo), "add", "--", *paths],
+        text=True,
+        capture_output=True,
+        check=False,
+    )
+    if result.returncode != 0:
+        raise RuntimeError(result.stderr.strip() or "git add failed")
+    return paths
 def is_managed_text(text):
     return text.startswith(MANAGED_MARKER)
+def is_obsolete_managed_text(text):
+    return any(text.startswith(marker) for marker in OBSOLETE_MANAGED_MARKERS)
+def is_harness_owned_text(text):
+    return is_managed_text(text) or is_obsolete_managed_text(text)
 def slugify(value):
     normalized = re.sub(r"[^a-z0-9]+", "-", value.strip().lower()).strip("-")
     return normalized or "task"
@@ -1580,7 +1694,7 @@ def should_write(path, refresh_managed, force):
     if force:
         return True
     try:
-        is_managed = is_managed_text(path.read_text())
+        is_managed = is_harness_owned_text(path.read_text())
     except UnicodeDecodeError:
         return False
     if refresh_managed and is_managed:
@@ -2218,6 +2332,48 @@ def command_evidence_prune(args):
     write_json(args.output, result)
+def command_clean(args):
+    repo = Path(args.repo).resolve()
+    candidates = git_tracked_harness_runtime_files(repo)
+    local_clean_candidates = []
+    for relative_dir in CLEAN_INIT_DIRS:
+        root = repo / relative_dir
+        if not root.exists():
+            continue
+        for path in sorted(root.rglob("*")):
+            if path.is_file() or path.is_symlink():
+                local_clean_candidates.append(str(path.relative_to(repo)))
+    gitignore = None
+    removed_from_index = []
+    cleaned = []
+    if args.apply:
+        gitignore = ensure_gitignore(repo)
+        cleaned = clean_init_state(repo)
+        removed_from_index = git_untrack_files(repo, candidates)
+        if gitignore["updated"]:
+            git_add_paths(repo, [gitignore["path"]])
+    result = {
+        "repo": str(repo),
+        "mode": "apply" if args.apply else "dry-run",
+        "tracked_candidate_count": len(candidates),
+        "tracked_candidates": candidates,
+        "local_candidate_count": len(local_clean_candidates),
+        "local_candidates": local_clean_candidates,
+        "gitignore": gitignore,
+        "removed_from_index": removed_from_index,
+        "cleaned": cleaned,
+        "next_steps": (
+            [
+                "Review staged changes with `git status --short` and `git diff --cached --stat`.",
+                "Commit and push to remove these files from the remote repository.",
+            ]
+            if args.apply
+            else ["Re-run with `--apply` to clean local harness runtime files, update .gitignore, and stage git index removals."]
+        ),
+    }
+    write_json(args.output, result)
 def build_parser():
     parser = argparse.ArgumentParser(description="Manage the harness repo scaffold.")
     subparsers = parser.add_subparsers(dest="command", required=True)
@@ -2361,6 +2517,12 @@ def build_parser():
     evidence_prune.add_argument("--output")
     evidence_prune.set_defaults(func=command_evidence_prune)
+    clean = subparsers.add_parser("clean")
+    clean.add_argument("--repo", required=True)
+    clean.add_argument("--apply", action="store_true")
+    clean.add_argument("--output")
+    clean.set_defaults(func=command_clean)
     return parser
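
The `.gitignore` handling in `ensure_gitignore` is meant to be safe to repeat. As a minimal sketch of that idea, the function below applies the same regular expression to a plain string instead of a file, so the idempotence can be checked directly. The name `upsert_block` and the sample inputs are hypothetical; the block markers and entries are copied from the diff.

```python
import re

BLOCK_START = "# harness-engine transient files"
BLOCK_END = "# end harness-engine transient files"
ENTRIES = [".codex/skills/", "docs/generated/"]


def upsert_block(existing: str) -> str:
    """Insert or refresh the managed ignore block in .gitignore text (hypothetical helper)."""
    block = "\n".join([BLOCK_START, *ENTRIES, BLOCK_END])
    pattern = re.compile(
        rf"(^|\n){re.escape(BLOCK_START)}\n.*?\n{re.escape(BLOCK_END)}(?=\n|$)",
        flags=re.DOTALL,
    )
    if pattern.search(existing):
        # Replace the old block in place, keeping the newline that preceded it.
        updated = pattern.sub(lambda match: match.group(1) + block, existing)
    else:
        prefix = existing.rstrip()
        updated = f"{prefix}\n\n{block}" if prefix else block
    return updated.rstrip() + "\n"


first = upsert_block("node_modules/\n")
second = upsert_block(first)
print(first == second)

# A stale entry inside an existing block is dropped when the block is refreshed.
refreshed = upsert_block(f"{BLOCK_START}\nold/\n{BLOCK_END}\n")
print("old/" in refreshed)
```

Because a second pass leaves the text unchanged, a repeated `clean --apply` would have no new `.gitignore` change to stage.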