
@hallucination-studio/harness-engine 1.0.0-beta.12.d308768 → 1.0.0-beta.14.a797755


Files changed (15)

package/.codex-plugin/plugin.json +6 -0
package/README.md +97 -43
package/bin/install.js +57 -18
package/package.json +2 -1
package/skills/harness-engine/SKILL.md +42 -29
package/skills/harness-engine/evals/cases.json +45 -5
package/skills/harness-engine/evals/run_evals.py +888 -90
package/skills/harness-engine/references/evaluation-loop.md +3 -3
package/skills/harness-engine/references/evidence-first-evals.md +13 -6
package/skills/harness-engine/references/exec-plans.md +21 -13
package/skills/harness-engine/references/file-map.md +2 -2
package/skills/harness-engine/references/sop-index.md +2 -1
package/skills/harness-engine/references/template-policy.md +3 -2
package/skills/harness-engine/references/workflow.md +12 -6
package/skills/harness-engine/scripts/manage_harness.py +1497 -223

package/.codex-plugin/plugin.json ADDED

@@ -0,0 +1,6 @@
+{
+  "name": "harness-engine",
+  "version": "1.0.0",
+  "description": "Repository harness skill for Codex with optional frontend design-doc templates.",
+  "skills": "./skills/"
+}

package/README.md CHANGED

@@ -14,16 +14,18 @@ ask for missing high-impact facts, create the harness files, and keep future wor
 - Installs the `harness-engine` Codex skill locally, globally, or into a custom skills directory.
 - Provides a repository analyzer that detects language, package manager, frontend signals, existing harness files, missing execution-plan state, and missing SOPs.
-- Generates a short routing-style `AGENTS.md` plus durable system-of-record docs such as `ARCHITECTURE.md`, `docs/RELIABILITY.md`, `docs/SECURITY.md`, `docs/QUALITY_SCORE.md`, and `docs/FRONTEND.md`.
-- Creates execution-plan folders for active and completed plans.
+- Generates a short routing-style `AGENTS.md` plus durable system-of-record docs such as `ARCHITECTURE.md`, `docs/RELIABILITY.md`, `docs/SECURITY.md`, and `docs/QUALITY_SCORE.md`.
+- Generates `docs/FRONTEND.md`, `docs/DESIGN.md`, and `docs/design-docs/` only when a frontend surface is detected.
+- Creates version-controlled execution-plan folders for active and completed plans.
 - Adds SOPs for architecture setup, knowledge capture, local observability, and UI validation.
 - Reconciles managed harnesses through the same `init` flow, refreshing managed files and backfilling newly introduced managed files while preserving unmanaged docs.
-- Provides `clean` to remove transient harness runtime state, add `.gitignore` entries, and untrack already committed harness runtime files so a follow-up commit deletes them from the remote.
+- Provides `clean` to remove local skill installs and generated evidence, add `.gitignore` entries, and untrack already committed runtime artifacts so a follow-up commit deletes them from the remote.
 - Enforces a local harness check without assuming the user's project has CI.
 - Previews and optionally removes stale unreferenced generated evidence under `docs/generated/`.
 - Supports durable knowledge closure with stable knowledge IDs and evidence text, so permanent docs can use natural wording instead of duplicated checklist strings.
-- Enforces a local quality gate for execution plans; failed scores write `## Rework Required` into the plan and block `plan-close`.
+- Enforces structured execution-plan state: `acceptance-set` creates a pre-implementation Acceptance Contract, `quality-score` records post-implementation evidence against that contract, and stale or failed scores block `plan-close`.
 - Tracks resumable workstreams so interrupted features, refactors, reliability work, and cleanup efforts can be recovered from repo state instead of chat history.
+- For frontend projects, asks for the desired visual style and initializes a repository-owned visual specification based on the local DESIGN.md format pattern: YAML design tokens plus markdown rationale.
 ## Why It Exists
@@ -65,22 +67,38 @@ Install into a custom skills directory:
 npx @hallucination-studio/harness-engine install --path /path/to/skills
 ```
-Replace an existing installed skill:
+Replace an existing installed plugin bundle:
 ```bash
 npx @hallucination-studio/harness-engine install --local --force
 ```
-Show where the skill would be installed:
+Show where the plugin bundle would be installed:
 ```bash
 npx @hallucination-studio/harness-engine where --local
 ```
+## Frontend Design Docs
+Harness Engine has no external design runtime dependency and never calls an external design skill
+during `init`. When a target repository has no frontend, it does not generate `docs/FRONTEND.md`,
+`docs/DESIGN.md`, or `docs/design-docs/`.
+When a frontend is detected, Harness Engine creates:
+- `docs/FRONTEND.md`: project positioning, requested style direction, existing frontend code signals, frontend scope, stack notes, validation loop, and the read order for UI work.
+- `docs/DESIGN.md`: a project-owned unified visual specification using YAML design tokens plus markdown rationale, seeded from the human-confirmed style direction and existing frontend code signals. It defines semantic colors, a unified typography scale, spacing/radius tokens, component states, and rules for mapping those tokens into the project's shared style layer.
+- `docs/design-docs/`: durable design decisions and style-system notes.
+The templates are informed by the local reference checkout at `/Users/murphy/code/github/design.md`
+for document shape only. The target project owns the content and should replace starter tokens and
+prose with its concrete product style before substantial UI work.
 ## Update An Installed Skill Package
-The `npx` installer only installs or replaces the Codex skill package. To update an already
-installed skill, rerun `install` with `--force` in the same install location.
+The `npx` installer installs or replaces the Codex plugin bundle and compatibility skill entries.
+To update an already installed bundle, rerun `install` with `--force` in the same install location.
 Replace the local skill install:
@@ -113,11 +131,8 @@ The skill should analyze the workspace and run the single workspace entrypoint:
 - If a managed harness already exists, `manage_harness.py init` reconciles it by refreshing managed files and backfilling newly introduced managed files.
 - Unmanaged user files are preserved unless `--force` is explicitly used.
-The underlying command for both cases is:
-```bash
-python3 .codex/skills/harness-engine/scripts/manage_harness.py init --repo . --answers answers.json
-```
+Codex runs the underlying manager commands. Users should not need to call the Python script
+directly during normal work.
 ## Use The Skill In A Target Repo
@@ -132,33 +147,57 @@ The intended workflow is:
 1. Analyze the target repository.
 2. Ask the human only for unresolved, high-impact facts.
 3. Initialize or reconcile the harness files.
-4. Create execution plans for multi-step work.
-5. Log durable knowledge into active plans.
-6. Write the durable facts into permanent docs.
-7. Mark knowledge as written using ID plus evidence text.
-8. Score the finished work across product, UX/operator clarity, architecture, reliability, and security.
-9. If the quality gate fails, implement the generated `## Rework Required` items and score again.
-10. For phased or resumable work, update `Phase Continuity` and `docs/exec-plans/workstreams.md`.
-11. Close the execution plan only after the quality gate passes, phase continuity is recorded, and durable docs are updated.
-12. Run the local harness check before handoff.
-13. Periodically run `evidence-prune` to preview stale unreferenced generated evidence, and apply it only after reviewing the candidate list.
-The installed skill exposes the underlying script at:
+4. Create or reuse execution plans for repository-mutating work, including code, docs, configuration, tests, dependencies, build/release scripts, generated templates, runtime behavior, migrations, cleanup, and review fixes.
+5. Define the Acceptance Contract before implementation with product, UX, architecture, reliability, and security criteria.
+6. Log durable knowledge into active plans.
+7. Write the durable facts into permanent docs.
+8. Mark knowledge as written using ID plus evidence text.
+9. Score the finished work against the Acceptance Contract across product, UX/operator clarity, architecture, reliability, and security.
+10. If the Quality Result fails, implement the generated `## Rework Required` items and score again.
+11. Before closing, record a `Continuation Decision` for the plan.
+12. Close the execution plan only after the Quality Result passes against the current contract fingerprint, the continuation decision is recorded, and durable docs are updated.
+13. Run the local harness check before handoff.
+14. Periodically run `evidence-prune` to preview stale unreferenced generated evidence, and apply it only after reviewing the candidate list.
+## User Continuation UX
+Users should express the desired continuation state in natural language. Codex then runs the
+required harness commands and repairs any blocked state before handoff.
+Useful phrases:
+- "这项完成了，没有后续" / "mark this complete"
+- "这项要继续到下一阶段" / "continue this as a follow-up workstream"
+- "先暂停，等 API 定稿后恢复" / "pause until the API contract is approved"
+- "停止这个方向，记录原因" / "stop this work and record why"
+- "这个放到技术债，不进入当前 workstream" / "defer this to tech debt"
+When the user says a task should continue or pause, Codex records the workstream, next action,
+resume notes, and goal in `docs/exec-plans/workstreams.md`. When the user says it is complete,
+Codex records a complete decision and closes the plan without creating a workstream entry.
+## CLI Reference
+The installed skill exposes a manager script for Codex and for advanced debugging:
 ```bash
 python3 .codex/skills/harness-engine/scripts/manage_harness.py --help
 ```
-Common commands:
+For frontend or visual-design work, the generated harness uses `docs/FRONTEND.md` to route agents through `docs/DESIGN.md`. `docs/FRONTEND.md` defines which files are controlled by `docs/DESIGN.md`: design notes under `docs/design-docs/`, Tailwind theme files, global CSS variables, component theme modules, Storybook/theme previews, and UI implementation files that consume shared tokens or style rules. Agents should read `docs/FRONTEND.md`, then `docs/DESIGN.md`, then the relevant component, theme, or stylesheet.
+These commands are not the primary user interface. They are shown so maintainers can debug or
+inspect what Codex runs:
 ```bash
 python3 .codex/skills/harness-engine/scripts/manage_harness.py analyze --repo . --output analysis.json
 python3 .codex/skills/harness-engine/scripts/manage_harness.py sample-answers --analysis analysis.json --output answers.json
 python3 .codex/skills/harness-engine/scripts/manage_harness.py init --repo . --answers answers.json
 python3 .codex/skills/harness-engine/scripts/manage_harness.py plan-start --repo . --slug feature-name --goal "Implement the feature"
+python3 .codex/skills/harness-engine/scripts/manage_harness.py acceptance-set --repo . --plan docs/exec-plans/active/2026-06-11-feature-name.md --product "The feature satisfies the named user workflow and expected output." --ux "The user or operator can complete the workflow without ambiguous states." --architecture "The change fits the existing module boundaries and keeps plan state recoverable." --reliability "The validation commands and failure evidence are repeatable from a clean checkout." --security "The change introduces no secrets and preserves sensitive-data handling rules."
 python3 .codex/skills/harness-engine/scripts/manage_harness.py quality-score --repo . --plan docs/exec-plans/active/2026-06-11-feature-name.md --product-correctness 8 --product-note "Product assertions passed" --ux-operator-clarity 8 --ux-note "User workflow evidence passed" --architecture-maintainability 8 --architecture-note "Boundary and maintainability review passed" --reliability-observability 8 --reliability-note "Tests and smoke checks passed" --security-data-handling 8 --security-note "No new sensitive-data paths or secrets"
-python3 .codex/skills/harness-engine/scripts/manage_harness.py phase-set --repo . --plan docs/exec-plans/active/2026-06-11-feature-name.md --mode multi-phase --workstream feature-name --current-phase 1 --next-phase 2 --continuation docs/exec-plans/workstreams.md#feature-name --next-action "Create Phase 2 plan"
-python3 .codex/skills/harness-engine/scripts/manage_harness.py workstream-upsert --repo . --id feature-name --status active --current-plan docs/exec-plans/active/2026-06-11-feature-name.md --next-action "Create Phase 2 plan"
+python3 .codex/skills/harness-engine/scripts/manage_harness.py continuation-set --repo . --plan docs/exec-plans/active/2026-06-11-feature-name.md --decision complete --closure-reason "Feature is complete with no follow-up."
+python3 .codex/skills/harness-engine/scripts/manage_harness.py continuation-set --repo . --plan docs/exec-plans/active/2026-06-11-feature-name.md --decision continue --workstream feature-name --next-target docs/exec-plans/workstreams.md#feature-name --next-action "Create the next execution plan" --goal "Deliver the feature across follow-up execution plans"
 python3 .codex/skills/harness-engine/scripts/manage_harness.py check --repo .
 python3 .codex/skills/harness-engine/scripts/manage_harness.py evidence-prune --repo . --older-than-days 14
 python3 .codex/skills/harness-engine/scripts/manage_harness.py evidence-prune --repo . --older-than-days 14 --apply
@@ -166,18 +205,24 @@ python3 .codex/skills/harness-engine/scripts/manage_harness.py clean --repo .
 python3 .codex/skills/harness-engine/scripts/manage_harness.py clean --repo . --apply
 ```
-The quality gate is intentionally local and repository-owned. It does not require the user's
-project to have CI. `plan-close` refuses to move a plan to `completed` unless `quality-score`
-has passed, and `check` reports active plans whose quality gate is missing or failing.
+The quality workflow is intentionally local and repository-owned. It does not require the user's
+project to have CI. Active plans must have a ready Acceptance Contract sidecar so work is
+recoverable before implementation finishes. Completed plans must have a passing Quality Result
+scored against the current Acceptance Contract fingerprint; `plan-close` rejects stale scores,
+open defects, unresolved placeholders, and unresolved durable knowledge. Blocked `plan-close`
+commands return structured JSON with `status: "blocked"`, a stable `reason`, a user-readable
+`message`, and machine-readable `details`.
 ## Version Control Policy
 Commit harness docs that carry durable repository knowledge: `AGENTS.md`, `ARCHITECTURE.md`,
 `docs/PLANS.md`, `docs/QUALITY_SCORE.md`, `docs/RELIABILITY.md`, `docs/SECURITY.md`,
 `docs/FRONTEND.md`, `docs/sops/`, `docs/product-specs/`, `docs/design-docs/`,
-`docs/references/`, and intentional execution-plan state.
+`docs/references/`, and execution-plan state.
+Execution plans are project state. Commit active plans, completed plans, JSON sidecars, and `docs/exec-plans/workstreams.md` so another agent can recover the work from the repository.
-Do not commit local skill installs or generated evidence by default. `clean --apply` adds these ignores:
+Do not commit local skill installs or generated evidence by default. `clean --apply` adds these directory-level ignores:
 ```gitignore
 # harness-engine transient files
@@ -197,12 +242,12 @@ git commit -m "Remove harness runtime artifacts from git"
 git push
 ```
-`clean --apply` removes local generated evidence and stale task snapshots, then uses
-`git rm --cached` to stage removal of tracked harness runtime files from git and the remote.
+`clean --apply` removes local generated evidence, then uses `git rm --cached` to stage removal of tracked local skill installs and generated evidence from git and the remote. It does not remove, ignore, or untrack execution plans, JSON sidecars, or workstreams.
-For multi-phase work, `Phase Continuity` and `docs/exec-plans/workstreams.md` form the recovery
-ledger. A plan like `Local Workbench Phase 1` can close only after it records whether the workstream
-continues, pauses, completes, or stops, and where the next agent should resume.
+Every plan closes with a `Continuation Decision`: `complete`, `continue`, `pause`, `stop`, or
+`defer`. Only resumable `continue` and `pause` decisions enter `docs/exec-plans/workstreams.md`;
+one-off completed plans do not need workstream entries. Invalid `continue` or `pause` inputs fail before
+writing workstream state, and workstream goals are taken from `--goal` or the plan goal.
 ## Generated Harness Shape
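
Note on the hunk above: the `Continuation Decision` rules it adds (five decisions, only `continue` and `pause` entering `docs/exec-plans/workstreams.md`, invalid inputs failing before any write, goal falling back to the plan goal) can be sketched as below. This is an illustrative sketch, not the logic in `manage_harness.py`. The function name `planWorkstreamEntry`, the required-field list, and the `active`/`paused` status values are assumptions made for the example.

```javascript
"use strict";

// Decisions named in the README diff; only these two are described as resumable.
const DECISIONS = ["complete", "continue", "pause", "stop", "defer"];
const RESUMABLE = new Set(["continue", "pause"]);

// Hypothetical helper: returns the workstream entry to record, or null when
// the decision does not belong in workstreams.md. It throws before anything
// would be written if a resumable decision is missing a required field.
// Which fields are required is an assumption for this sketch.
function planWorkstreamEntry(input, planGoal) {
  if (!DECISIONS.includes(input.decision)) {
    throw new Error(`Unknown decision: ${input.decision}`);
  }
  if (!RESUMABLE.has(input.decision)) {
    return null; // complete, stop, and defer create no workstream entry
  }
  for (const field of ["workstream", "nextAction"]) {
    if (!input[field]) {
      throw new Error(`${input.decision} requires ${field}`);
    }
  }
  return {
    id: input.workstream,
    // Status names are assumed, not taken from the package.
    status: input.decision === "pause" ? "paused" : "active",
    nextAction: input.nextAction,
    // The goal comes from the explicit value or falls back to the plan goal.
    goal: input.goal || planGoal,
  };
}
```

Validating every field before building the entry mirrors the stated behavior that invalid `continue` or `pause` inputs fail before workstream state is written.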
@@ -268,6 +313,15 @@ Check npm package contents:
 npm run pack:check
 ```
+Before release, run:
+```bash
+npm test
+npm run smoke:install
+npm run pack:check
+git diff --check
+```
 The publish workflows expect an npm token when trusted publishing is not yet configured:
 ```text
@@ -281,14 +335,14 @@ These scores describe the current implementation, not an external guarantee.
 | Layer | Score | Notes |
 | --- | ---: | --- |
 | Product fit | 9 / 10 | Clear purpose: install a Codex skill that creates and maintains an agent-first repository harness. Real acceptance against a fresh Go backend plus browser frontend project validated generation and later issue workflows. Broader usage across more project types would still improve confidence. |
-| Skill workflow design | 9.2 / 10 | Strong progressive workflow: analyze, confirm, init/reconcile, plan, capture knowledge, validate, score with evidence notes, rework, record continuity, close. The workflow now explicitly routes user-reported product, frontend, backend, architecture, data, security, performance, and reliability issues even when the user does not invoke the skill by name. |
-| Knowledge, quality, and workstream closure loop | 9.1 / 10 | Stable knowledge IDs plus exact destination evidence reduce noisy doc duplication, `quality-score` rejects missing evidence notes, defects block closure until resolved, and workstreams make phased work recoverable. Future work could move plan state into structured sidecar metadata instead of Markdown parsing. |
+| Skill workflow design | 9.2 / 10 | Strong progressive workflow: analyze, confirm, init/reconcile, plan, capture knowledge, validate, score with evidence notes, rework, record continuity, close. The workflow now explicitly routes repository-mutating feature, bug, refactor, docs, dependency, UI, test, security, performance, and reliability work through the same lifecycle. |
+| Knowledge, quality, and workstream closure loop | 9.3 / 10 | Stable knowledge IDs plus exact destination evidence reduce noisy doc duplication. Execution plans now have JSON sidecars for Acceptance Contracts, Quality Results, defects, and knowledge state; `quality-score` rejects missing evidence notes or missing contracts, defects invalidate stale scores, and workstreams make resumable follow-up work recoverable. |
 | CLI installer | 8 / 10 | Simple local/global/custom install modes, force replacement, and path discovery. It is intentionally minimal and does not manage Codex runtime configuration. |
-| Generated harness docs | 8.4 / 10 | Covers architecture, plans, reliability, security, frontend policy, issue workflows, references, generated artifacts, and SOPs. The docs now front-load exact knowledge evidence, per-dimension quality notes, and plan placeholder cleanup, but templates still require Codex to tighten project-specific language after generation. |
-| Evaluation coverage | 9 / 10 | `npm test` runs 13 structured eval cases covering empty-repo init, frontend analysis, init reconciliation, clean command behavior for local runtime state and already tracked artifacts, issue workflow coverage, closed-loop plan behavior, phase continuity, path canonicalization, defect recovery, required quality-score notes, exact knowledge evidence, generated-evidence cleanup, eval report shape, and user-owned doc preservation. A fully automated Codex child-agent E2E would raise this further. |
+| Generated harness docs | 8.4 / 10 | Covers architecture, plans, reliability, security, frontend policy, broad task intake, issue workflows, references, generated artifacts, and SOPs. The docs now front-load exact knowledge evidence, per-dimension quality notes, default plan lifecycle, and plan placeholder cleanup, but templates still require Codex to tighten project-specific language after generation. |
+| Evaluation coverage | 9.2 / 10 | `npm test` runs 23 structured eval cases covering empty-repo init, frontend analysis, init reconciliation, clean command behavior, broad task intake, closed-loop plan behavior, continuation decisions, path canonicalization, defect recovery, required quality-score notes, exact knowledge evidence, structured sidecars, acceptance readiness, stale score rejection, generated-evidence cleanup, eval report shape, user-owned doc preservation, and frontend design control. A fully automated Codex child-agent E2E would raise this further. |
 | Release automation | 8 / 10 | Supports stable release, beta on every main commit, nightly, manual dry-run, artifacts, provenance, and token fallback. npm first-publish/trusted-publishing setup still requires external configuration. |
 | User-project safety | 8.8 / 10 | The skill avoids adding CI to target projects by default, preserves unmanaged files unless forced, and requires evidence-backed closure for defects and durable knowledge. More destructive-change simulation in evals would improve this score. |
-| Overall | 9 / 10 | The skill is now strong enough for regular use: self evals pass across the structured suite, real acceptance covered initial scaffold plus frontend and backend issue workflows, and the main failure modes found during acceptance are now documented and eval-covered. Remaining leverage is automated child-agent E2E coverage and structured plan metadata. |
+| Overall | 9.1 / 10 | The skill is now strong enough for regular use: self evals pass across the structured suite, real acceptance covered initial scaffold plus frontend and backend issue workflows, and plan lifecycle state is enforced through JSON sidecars. Remaining leverage is automated child-agent E2E coverage. |
 ## Reference
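
Note on the README diff above: it states that a blocked `plan-close` returns structured JSON with `status: "blocked"`, a stable `reason`, a user-readable `message`, and machine-readable `details`. A caller might interpret that output roughly as follows. This is a sketch under assumptions: the function name `describePlanClose` is hypothetical, and the `reason` value in the sample is a placeholder, since the diff does not list the actual reason codes.

```javascript
"use strict";

// Interpret the JSON text printed by a blocked `plan-close`.
// Field names come from the README diff; everything else is illustrative.
function describePlanClose(stdoutText) {
  let result;
  try {
    result = JSON.parse(stdoutText);
  } catch (error) {
    return { blocked: false, summary: "Output was not JSON" };
  }
  if (result === null || typeof result !== "object" || result.status !== "blocked") {
    return { blocked: false, summary: "Plan close was not blocked" };
  }
  return {
    blocked: true,
    reason: result.reason,
    summary: `${result.reason}: ${result.message}`,
    details: result.details ?? {},
  };
}

// Placeholder payload: "example-reason" is not a documented reason code.
const sample = JSON.stringify({
  status: "blocked",
  reason: "example-reason",
  message: "Quality Result is stale.",
  details: { plan: "docs/exec-plans/active/2026-06-11-feature-name.md" },
});
console.log(describePlanClose(sample).summary);
```

Branching on `reason` instead of parsing `message` keeps a caller stable if the user-facing wording changes, which appears to be the point of separating the two fields.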

package/bin/install.js CHANGED

@@ -5,8 +5,8 @@ const os = require("os");
 const path = require("path");
 const PACKAGE_ROOT = path.resolve(__dirname, "..");
-const SKILL_NAME = "harness-engine";
-const SOURCE_SKILL_DIR = path.join(PACKAGE_ROOT, "skills", SKILL_NAME);
+const BUNDLE_NAME = "harness-engine-plugin";
+const BUNDLE_ENTRIES = [".codex-plugin", "skills"];
 function printHelp() {
   console.log(`harness-engine
@@ -19,7 +19,7 @@ Options:
   --local         Install into <cwd>/.codex/skills
   --global        Install into \${CODEX_HOME:-~/.codex}/skills
   --path <dir>    Install into a custom skills directory
-  --force         Replace an existing installed skill
+  --force         Replace an existing installed bundle
   -h, --help      Show this help text
 `);
 }
@@ -85,32 +85,71 @@ function copyDir(sourceDir, targetDir) {
   for (const entry of fs.readdirSync(sourceDir, { withFileTypes: true })) {
     const sourcePath = path.join(sourceDir, entry.name);
     const targetPath = path.join(targetDir, entry.name);
-    if (entry.isDirectory()) {
+    const stat = fs.statSync(sourcePath);
+    if (stat.isDirectory()) {
       copyDir(sourcePath, targetPath);
+    } else if (entry.isSymbolicLink()) {
+      const linkTarget = fs.readlinkSync(sourcePath);
+      fs.symlinkSync(linkTarget, targetPath);
     } else {
       fs.copyFileSync(sourcePath, targetPath);
-      const stat = fs.statSync(sourcePath);
       fs.chmodSync(targetPath, stat.mode);
     }
   }
 }
-function installSkill(destinationDir, force) {
-  const skillTargetDir = path.join(destinationDir, SKILL_NAME);
-  if (!fs.existsSync(SOURCE_SKILL_DIR)) {
-    throw new Error(`Bundled skill not found: ${SOURCE_SKILL_DIR}`);
+function copyEntry(sourcePath, targetPath) {
+  const stat = fs.lstatSync(sourcePath);
+  if (stat.isDirectory()) {
+    copyDir(sourcePath, targetPath);
+  } else if (stat.isSymbolicLink()) {
+    fs.symlinkSync(fs.readlinkSync(sourcePath), targetPath);
+  } else {
+    fs.mkdirSync(path.dirname(targetPath), { recursive: true });
+    fs.copyFileSync(sourcePath, targetPath);
+    fs.chmodSync(targetPath, fs.statSync(sourcePath).mode);
   }
+}
-  if (fs.existsSync(skillTargetDir)) {
-    if (!force) {
-      throw new Error(`Skill already exists at ${skillTargetDir}. Re-run with --force to replace it.`);
+function assertBundleSources() {
+  for (const entry of BUNDLE_ENTRIES) {
+    const sourcePath = path.join(PACKAGE_ROOT, entry);
+    if (!fs.existsSync(sourcePath)) {
+      throw new Error(`Bundled plugin entry not found: ${sourcePath}`);
     }
-    fs.rmSync(skillTargetDir, { recursive: true, force: true });
+  }
+}
+function removeIfExists(targetPath, force, label) {
+  if (!fs.existsSync(targetPath)) {
+    return;
+  }
+  if (!force) {
+    throw new Error(`${label} already exists at ${targetPath}. Re-run with --force to replace it.`);
   }
+  fs.rmSync(targetPath, { recursive: true, force: true });
+}
+function installBundle(destinationDir, force) {
+  assertBundleSources();
   fs.mkdirSync(destinationDir, { recursive: true });
-  copyDir(SOURCE_SKILL_DIR, skillTargetDir);
-  return skillTargetDir;
+  const bundleTargetDir = path.join(destinationDir, BUNDLE_NAME);
+  removeIfExists(bundleTargetDir, force, "Plugin bundle");
+  fs.mkdirSync(bundleTargetDir, { recursive: true });
+  for (const entry of BUNDLE_ENTRIES) {
+    copyEntry(path.join(PACKAGE_ROOT, entry), path.join(bundleTargetDir, entry));
+  }
+  // Compatibility: older users invoke $harness-engine from a normal skills directory.
+  // Keep a top-level skill copy in place while the plugin root carries the bundle.
+  const compatTarget = path.join(destinationDir, "harness-engine");
+  removeIfExists(compatTarget, force, "Compatibility skill");
+  copyDir(path.join(PACKAGE_ROOT, "skills", "harness-engine"), compatTarget);
+  return bundleTargetDir;
 }
 function main() {
@@ -131,7 +170,7 @@ function main() {
   const destinationDir = resolveSkillsDir(args.mode, args.customPath);
   if (args.command === "where") {
-    console.log(path.join(destinationDir, SKILL_NAME));
+    console.log(path.join(destinationDir, BUNDLE_NAME));
     return;
   }
@@ -142,8 +181,8 @@ function main() {
   }
   try {
-    const installedPath = installSkill(destinationDir, args.force);
-    console.log(`Installed ${SKILL_NAME} to ${installedPath}`);
+    const installedPath = installBundle(destinationDir, args.force);
+    console.log(`Installed ${BUNDLE_NAME} plugin bundle to ${installedPath}`);
     console.log("Invoke it in Codex with $harness-engine.");
   } catch (error) {
     console.error(`Install failed: ${error.message}`);
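
Note on the `copyDir` hunk above: the new code calls `fs.statSync(sourcePath)` before checking `entry.isSymbolicLink()`. `fs.statSync` follows symbolic links, so a link that points at a directory takes the `isDirectory()` branch and is copied by recursion, and a dangling link makes `statSync` throw. If preserving links is the intent (an assumption, since the diff does not say), a variant based on `fs.lstatSync`, as `copyEntry` already uses, could look like the sketch below. The name `copyTree` is hypothetical and this is not the package's code.

```javascript
"use strict";
const fs = require("fs");
const path = require("path");

// Recursively copy sourceDir into targetDir, recreating symbolic links
// instead of following them. Hypothetical variant for illustration only.
function copyTree(sourceDir, targetDir) {
  fs.mkdirSync(targetDir, { recursive: true });
  for (const entry of fs.readdirSync(sourceDir, { withFileTypes: true })) {
    const sourcePath = path.join(sourceDir, entry.name);
    const targetPath = path.join(targetDir, entry.name);
    // lstat describes the link itself, so a dangling link does not throw.
    const stat = fs.lstatSync(sourcePath);
    if (stat.isSymbolicLink()) {
      // Check links first so a link to a directory is never recursed into.
      fs.symlinkSync(fs.readlinkSync(sourcePath), targetPath);
    } else if (stat.isDirectory()) {
      copyTree(sourcePath, targetPath);
    } else {
      fs.copyFileSync(sourcePath, targetPath);
      fs.chmodSync(targetPath, stat.mode);
    }
  }
}
```

Testing the link case before the directory case is the key ordering difference from the hunk above. On Windows, `fs.symlinkSync` may additionally need a `type` argument for directory links, which this sketch does not handle.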

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@hallucination-studio/harness-engine",
-  "version": "1.0.0-beta.12.d308768",
+  "version": "1.0.0-beta.14.a797755",
   "description": "Install the harness-engine Codex skill for initializing and reconciling advanced repository harness docs.",
   "repository": {
     "type": "git",
@@ -19,6 +19,7 @@
   },
   "files": [
     "bin",
+    ".codex-plugin/**",
     "skills/**/SKILL.md",
     "skills/**/agents/**",
     "skills/**/assets/**",

package/skills/harness-engine/SKILL.md CHANGED

@@ -1,35 +1,41 @@
 ---
 name: harness-engine
-description: Initialize or refresh an advanced harness-engineering repository shape for Codex-driven projects. Use when Codex needs to analyze a repository, ask the human to confirm high-impact product and architecture facts, and then run the harness-engine init workflow to create or reconcile AGENTS.md, architecture docs, policy docs, plan folders, reference folders, and SOP-backed starter files.
+description: Initialize, refresh, and operate an advanced harness-engineering repository lifecycle for Codex-driven projects. Use when Codex needs to create or reconcile harness docs, or when work inside a harness-managed repository will change code, docs, configuration, tests, dependencies, build/release scripts, generated templates, runtime behavior, migrations, cleanup policy, or other durable repository state.
 ---
 # Harness Engine
-Run the packaged script to inspect the target repository before editing files. Use the generated analysis to decide what to ask the human, what durable knowledge is missing from the repo, and which execution-plan and SOP files must be created or reconciled.
+Use the packaged script yourself to inspect the target repository before editing files. Do not ask the user to run the harness Python commands during normal work. Use the generated analysis to decide what to ask the human, what durable knowledge is missing from the repo, and which execution-plan and SOP files must be created or reconciled.
+In a harness-managed repository, default every repository-mutating request into the harness lifecycle. Repository-mutating work includes code, docs, configuration, tests, dependencies, build/release scripts, generated templates, runtime behavior, migrations, cleanup, and fixes from review or user feedback. The only no-plan exceptions are pure question answering, read-only investigation, showing command output, and status reporting with no file changes. If an investigation turns into editing files, enter the lifecycle before editing.
+The user-facing interface is intent, not CLI. If the user says a task is complete, should continue, should pause, should stop, or should become follow-up debt, translate that into the appropriate manager command yourself and report the outcome. Only show raw commands when the user explicitly asks for implementation details or debugging help.
 ## Workflow
 1. Run `python3 scripts/manage_harness.py analyze --repo <target-repo> --output <analysis.json>`.
 2. Read `analysis.json`.
 3. Ask the human only the unresolved, high-impact questions from `human_confirmations`.
-4. Run `python3 scripts/manage_harness.py sample-answers --analysis <analysis.json> --output <answers.json>`.
-5. Fill the placeholders in `answers.json` from the repository and the human's confirmed answers.
-6. Run `python3 scripts/manage_harness.py init --repo <target-repo> --answers <answers.json>`. This is the single workspace entrypoint: it creates a new harness when none exists, and reconciles a managed or partial harness when managed harness files are already present. Reconcile refreshes managed files, backfills newly introduced managed files, and preserves unmanaged user files. Pass `--force` only with explicit user approval.
-7. If the task is multi-step, run `python3 scripts/manage_harness.py plan-start --repo <target-repo> --slug <task-name> --goal "<goal>"`.
-8. If you learn durable facts during the work, run `python3 scripts/manage_harness.py knowledge-log --repo <target-repo> --plan <plan-file> --fact "<fact>" --destination <durable-doc>` and keep the returned `id`. Use `--fact-file <file>` when the fact contains shell-sensitive characters.
-9. Before closing the task, write those facts into their durable docs.
-10. Run `python3 scripts/manage_harness.py knowledge-mark-written --repo <target-repo> --plan <plan-file> --id <knowledge-id> --evidence "<verbatim text already in durable doc>"`; prefer `--evidence-file <file>` when evidence contains backticks, globs, quotes, pipes, or other shell-sensitive characters. Evidence must be copied from the destination doc, not summarized. Use `--append` only when the exact fact should be appended mechanically.
-11. If validation, evals, browser checks, or code review reveal a bug, immediately run `python3 scripts/manage_harness.py defect-log --repo <target-repo> --plan <plan-file> --severity <P0|P1|P2|P3> --summary "<bug>" --evidence "<failing check>"`. This forces the quality gate to fail.
-12. Fix logged defects, then run `python3 scripts/manage_harness.py defect-resolve --repo <target-repo> --plan <plan-file> --id <bug-id> --fix-evidence "<passing check or code evidence>"`.
-13. Score the finished work with `python3 scripts/manage_harness.py quality-score --repo <target-repo> --plan <plan-file> --product-correctness <0-10> --product-note "<evidence>" --ux-operator-clarity <0-10> --ux-note "<evidence>" --architecture-maintainability <0-10> --architecture-note "<evidence>" --reliability-observability <0-10> --reliability-note "<evidence>" --security-data-handling <0-10> --security-note "<evidence>"`. Every dimension needs an evidence note.
-14. If `quality-score` fails, treat `## Rework Required` in the plan as the next implementation input, fix the work, then run `quality-score` again.
-15. For phased or resumable work, run `python3 scripts/manage_harness.py phase-set --repo <target-repo> --plan <plan-file> --mode <multi-phase|paused|completed|stopped> --workstream <id> --current-phase <n> --continuation <target> --next-action "<next action>"`, then update `workstreams.md` with `workstream-upsert`.
-16. Before closing, replace generic plan placeholders with task-specific scope, constraints, steps, validation, and completion notes; leave no open durable-knowledge placeholder except the default unused line.
-17. Close the plan with `python3 scripts/manage_harness.py plan-close --repo <target-repo> --plan <plan-file> --summary "<summary>"`.
-18. Before handoff, run `python3 .codex/skills/harness-engine/scripts/manage_harness.py check --repo <target-repo>` from an installed target repository.
-19. To review stale generated evidence, run `python3 scripts/manage_harness.py evidence-prune --repo <target-repo>` first; it is dry-run by default. Add `--apply` only after checking the candidate list.
-20. To clean transient harness runtime files or remove already committed runtime files from the remote, run `python3 scripts/manage_harness.py clean --repo <target-repo>` first; it is dry-run by default. Add `--apply` to clean local runtime state, update `.gitignore`, and stage `git rm --cached` removals, then commit and push.
-21. After changing this skill, run `python3 evals/run_evals.py` and iterate until it passes.
+4. During initialization, create frontend design docs only when the analysis detects a frontend surface. Frontend repos get `docs/FRONTEND.md`, `docs/DESIGN.md`, and `docs/design-docs/`; backend-only repos do not. Ask the human for the desired visual style direction and use existing frontend style files as evidence. The generated `docs/DESIGN.md` is a project-owned visual specification that follows the DESIGN.md document shape: YAML tokens plus markdown rationale. Do not call external design-generation skills or packages during init.
+5. Run `python3 scripts/manage_harness.py sample-answers --analysis <analysis.json> --output <answers.json>`.
+6. Fill the placeholders in `answers.json` from the repository and the human's confirmed answers.
+7. Run `python3 scripts/manage_harness.py init --repo <target-repo> --answers <answers.json>`. This is the single workspace entrypoint: it creates a new harness when none exists, and reconciles a managed or partial harness when managed harness files are already present. Reconcile refreshes managed files, backfills newly introduced managed files, and preserves unmanaged user files. Pass `--force` only with explicit user approval.
+8. For any repository-mutating task, run `python3 scripts/manage_harness.py plan-start --repo <target-repo> --slug <task-name> --goal "<goal>"` unless an active plan already covers the exact work. Small changes may use a lightweight plan, but they still require acceptance, validation, quality scoring, plan close, and check.
+9. Before implementation, run `python3 scripts/manage_harness.py acceptance-set --repo <target-repo> --plan <plan-file> --product "<product criterion>" --ux "<UX criterion>" --architecture "<architecture criterion>" --reliability "<reliability criterion>" --security "<security criterion>"`. Criteria must be concrete to the task; generic templates are rejected.
+10. If you learn durable facts during the work, run `python3 scripts/manage_harness.py knowledge-log --repo <target-repo> --plan <plan-file> --fact "<fact>" --destination <durable-doc>` and keep the returned `id`. Use `--fact-file <file>` when the fact contains shell-sensitive characters.
+11. Before closing the task, write those facts into their durable docs.
+12. Run `python3 scripts/manage_harness.py knowledge-mark-written --repo <target-repo> --plan <plan-file> --id <knowledge-id> --evidence "<verbatim text already in durable doc>"`; prefer `--evidence-file <file>` when evidence contains backticks, globs, quotes, pipes, or other shell-sensitive characters. Evidence must be copied from the destination doc, not summarized. Use `--append` only when the exact fact should be appended mechanically.
+13. If validation, evals, browser checks, or code review reveal a bug, immediately run `python3 scripts/manage_harness.py defect-log --repo <target-repo> --plan <plan-file> --severity <P0|P1|P2|P3> --summary "<bug>" --evidence "<failing check>"`. This invalidates any existing quality result and makes the defect the next rework input.
+14. Fix logged defects, then run `python3 scripts/manage_harness.py defect-resolve --repo <target-repo> --plan <plan-file> --id <bug-id> --fix-evidence "<passing check or code evidence>"`.
+15. Score the finished work with `python3 scripts/manage_harness.py quality-score --repo <target-repo> --plan <plan-file> --product-correctness <0-10> --product-note "<evidence>" --ux-operator-clarity <0-10> --ux-note "<evidence>" --architecture-maintainability <0-10> --architecture-note "<evidence>" --reliability-observability <0-10> --reliability-note "<evidence>" --security-data-handling <0-10> --security-note "<evidence>"`. Every dimension needs an evidence note tied to the ready Acceptance Contract.
+16. If `quality-score` fails, treat `## Rework Required` in the plan as the next implementation input, fix the work, then run `quality-score` again.
+17. Before closing, run `python3 scripts/manage_harness.py continuation-set --repo <target-repo> --plan <plan-file> --decision <complete|continue|pause|stop|defer>`. Use `--workstream`, `--next-target`, `--next-action`, `--closure-reason`, `--resume-notes`, and `--goal` as needed; `continue` and `pause` update `workstreams.md` automatically only after required fields validate.
+18. Before closing, replace generic plan placeholders with task-specific scope, constraints, steps, validation, and completion notes; leave no open durable-knowledge placeholder except the default unused line.
+19. Close the plan with `python3 scripts/manage_harness.py plan-close --repo <target-repo> --plan <plan-file> --summary "<summary>"`.
+20. Before handoff, run `python3 .codex/skills/harness-engine/scripts/manage_harness.py check --repo <target-repo>` from an installed target repository.
+21. To review stale generated evidence, run `python3 scripts/manage_harness.py evidence-prune --repo <target-repo>` first; it is dry-run by default. Add `--apply` only after checking the candidate list.
+22. To clean transient harness runtime files or remove already committed runtime files from the remote, run `python3 scripts/manage_harness.py clean --repo <target-repo>` first; it is dry-run by default. Add `--apply` to clean local runtime state, update `.gitignore`, and stage `git rm --cached` removals, then commit and push. Clean is limited to local skill installs and generated evidence; execution plans, sidecars, and workstreams are durable project state.
+23. After changing this skill, run `python3 evals/run_evals.py` and iterate until it passes.
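The `quality-score` step in the workflow above takes five score flags and five matching note flags, which is easy to get wrong by hand. The following is a minimal sketch of how a caller might assemble that argument list, assuming the flag names quoted in the added lines. The helper names (`DimensionScore`, `build_quality_score_argv`) and the client-side checks are hypothetical and are not part of `manage_harness.py`; the real script may validate differently.

```python
from dataclasses import dataclass

# Score flag names are copied from the quality-score step above.
# The mapping keys double as the prefix of the matching note flag
# (for example "ux" -> "--ux-note").
DIMENSIONS = {
    "product": "--product-correctness",
    "ux": "--ux-operator-clarity",
    "architecture": "--architecture-maintainability",
    "reliability": "--reliability-observability",
    "security": "--security-data-handling",
}


@dataclass(frozen=True)
class DimensionScore:
    """Hypothetical container for one scored dimension."""
    score: int
    note: str


def build_quality_score_argv(repo, plan, scores):
    """Return an argv list for `quality-score`; nothing is executed here."""
    missing = [name for name in DIMENSIONS if name not in scores]
    if missing:
        raise ValueError(f"missing dimensions: {', '.join(missing)}")

    argv = [
        "python3", "scripts/manage_harness.py", "quality-score",
        "--repo", repo, "--plan", plan,
    ]
    for name, score_flag in DIMENSIONS.items():
        entry = scores[name]
        # The workflow documents a 0-10 range for every dimension.
        if not 0 <= entry.score <= 10:
            raise ValueError(f"{name}: score must be between 0 and 10")
        # The workflow says every dimension needs an evidence note.
        if not entry.note.strip():
            raise ValueError(f"{name}: evidence note is required")
        argv += [score_flag, str(entry.score), f"--{name}-note", entry.note]
    return argv
```

Passing the result to `subprocess.run(argv)` as a list avoids shell quoting problems when a note contains backticks, pipes, or quotes.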
 ## Reading Order
@@ -37,11 +43,12 @@ Run the packaged script to inspect the target repository before editing files. U
 - Read [references/file-map.md](references/file-map.md) when deciding which generated file to edit.
 - Read [references/question-catalog.md](references/question-catalog.md) when the analysis surfaces ambiguous product, security, reliability, or frontend facts.
 - Read [references/knowledge-capture.md](references/knowledge-capture.md) when you discover facts that should survive chat history.
-- Read [references/exec-plans.md](references/exec-plans.md) before planning or updating any multi-step work.
+- Read [references/exec-plans.md](references/exec-plans.md) before planning or updating any repository-mutating work.
 - Read [references/sop-index.md](references/sop-index.md) to choose the right SOP for architecture, UI validation, observability, or knowledge capture work.
 - Read [references/template-policy.md](references/template-policy.md) before overwriting existing files.
 - Read [references/evaluation-loop.md](references/evaluation-loop.md) before changing the skill, templates, scripts, or policy references.
 - Read [references/evidence-first-evals.md](references/evidence-first-evals.md) before designing evals for product correctness, frontend validation, or bug-discovery coverage.
+- Read `docs/FRONTEND.md` and `docs/DESIGN.md` when they exist for frontend, UI, product design, visual design, canvas, or interface polish work.
 ## Command Rules
@@ -51,7 +58,7 @@ Run the packaged script to inspect the target repository before editing files. U
 - Do not overwrite existing files unless the human asked for it or you pass `--force`.
 - Treat the generated files as starting points. After generation, tighten them with repository-specific details instead of leaving placeholders behind.
 - Before plan close, replace or remove task placeholders such as "Define in-scope work", "Add the first concrete step", "Describe how the work will be verified", and any ad hoc durable-knowledge TODOs.
-- Treat `docs/exec-plans/` as required state for multi-step work, not optional notes.
+- Treat `docs/exec-plans/` as required durable state for repository-mutating work, not optional notes.
 - Read `docs/exec-plans/workstreams.md` before resuming interrupted feature, refactor, reliability, security, frontend, or cleanup work.
 - Treat `docs/sops/` as mechanical operating procedures, not background reading.
 - When you answer a question using facts that are not yet in the repo but should be reusable, write them into a durable doc before finishing.
@@ -59,21 +66,27 @@ Run the packaged script to inspect the target repository before editing files. U
 - The knowledge evidence text must exist verbatim in the destination doc; if it is only a paraphrase, write the durable doc first or use a file containing exact destination text.
 - Use `defect-log` for every bug found by tests, evals, browser validation, or code review; unresolved defects must block handoff.
 - Use `defect-resolve` only after the implementation is fixed and you can cite passing validation or code evidence.
-- Use `quality-score` before `plan-close`; include `--product-note`, `--ux-note`, `--architecture-note`, `--reliability-note`, and `--security-note`; failed scores must drive rework, not handoff.
-- Use `phase-set` and `workstream-upsert` before `plan-close` for Phase 1/2/3 or any other resumable multi-plan work.
-- Use `plan-close` as the final guardrail so plan state, quality score, and durable docs stay synchronized.
-- Use `check` as the local handoff guardrail for user repositories.
+- Use `acceptance-set` before implementation and `quality-score` before `plan-close`; include `--product-note`, `--ux-note`, `--architecture-note`, `--reliability-note`, and `--security-note`; failed or stale scores must drive rework, not handoff.
+- Use `continuation-set` before every `plan-close`; choose `complete` for one-off plans, and use `continue` or `pause` for resumable multi-plan work. Invalid continuation input must fail before writing a half-valid workstream.
+- Use `plan-close` as the final guardrail so plan state, quality score, and durable docs stay synchronized. When blocked, it returns JSON with `status`, `reason`, `message`, and `details`; use that output as the next repair input.
+- Use `check` as the local handoff guardrail for user repositories. Active plans require ready Acceptance Contracts; completed plans require passing Quality Results scored against the current contract fingerprint.
 - Use `evidence-prune` as a cleanup preview for old unreferenced files under `docs/generated/`; it never deletes unless `--apply` is present.
-- Use `clean` when `.codex/skills/`, `docs/generated/`, or stale `docs/exec-plans/active|completed/` files need cleanup or were already committed. It never changes files or the git index unless `--apply` is present.
+- Use `clean` when `.codex/skills/` or `docs/generated/` files need cleanup or were already committed. It never changes files or the git index unless `--apply` is present, and it must not remove execution plans, sidecars, or workstreams.
 - Run `python3 evals/run_evals.py` after skill changes, read the structured report, and treat per-case failures as iteration input.
 - Do not add CI to user repositories unless the human explicitly asks for it.
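The rules above state that a blocked `plan-close` returns JSON with `status`, `reason`, `message`, and `details`. As a rough sketch of how that output could be turned into the next repair input, the snippet below parses such a payload and checks that the four documented keys are present. Only the key names come from the source; the function names and the sample values used in any call are hypothetical, and the actual value vocabulary is not specified in this diff.

```python
import json

# Key names taken from the plan-close rule above.
REQUIRED_KEYS = ("status", "reason", "message", "details")


def parse_plan_close_output(raw):
    """Parse plan-close JSON output and keep the four documented fields."""
    payload = json.loads(raw)
    if not isinstance(payload, dict):
        raise ValueError("plan-close output must be a JSON object")
    missing = [key for key in REQUIRED_KEYS if key not in payload]
    if missing:
        raise ValueError(f"plan-close output is missing: {', '.join(missing)}")
    return {key: payload[key] for key in REQUIRED_KEYS}


def repair_summary(raw):
    """Condense the blocked result into one line for the next repair step."""
    result = parse_plan_close_output(raw)
    return f"{result['status']} ({result['reason']}): {result['message']}"
```

A caller would feed the captured standard output of `plan-close` to `repair_summary` and treat the returned line, together with `details`, as the next work item.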
+## Frontend Design Docs
+Harness-engine has no external design runtime dependency and must not call an external design skill during init. It uses the local `/Users/murphy/code/github/design.md` checkout only as a reference for document shape.
+For frontend repositories, `docs/FRONTEND.md` records product positioning, requested style direction, existing frontend code signals, scope, stack notes, validation expectations, controlled files, and read order. `docs/DESIGN.md` records the unified visual specification with YAML tokens and markdown rationale. For backend-only repositories, these files are not generated.
 ## Output Rules
 - Keep `AGENTS.md` short and routing-oriented.
 - Keep durable knowledge in repo docs, not in chat-only explanations.
-- Keep plans under `docs/exec-plans/active/` and move finished plans to `docs/exec-plans/completed/`.
-- Keep resumable workstreams in `docs/exec-plans/workstreams.md`.
+- Keep plans under `docs/exec-plans/active/` and move finished plans to `docs/exec-plans/completed/`; plan Markdown and JSON sidecars are version-controlled project state.
+- Keep resumable workstreams in `docs/exec-plans/workstreams.md`; this is version-controlled project state.
 - Keep generated evidence under `docs/generated/`; it is local runtime output and is ignored by git unless the human intentionally promotes a specific artifact into tracked docs.
 - Keep external, model-friendly references under `docs/references/`.
 - Keep SOPs explicit and task-triggered so the next agent can follow the same path mechanically.
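The split between durable project state and local runtime output described above can be illustrated with a small path classifier. This is a sketch under stated assumptions: the directory names come from the rules in this diff, while the `classify` function and its three labels are hypothetical and do not mirror how `clean` is actually implemented.

```python
from pathlib import PurePosixPath

# Directories named in the rules above.
DURABLE_ROOTS = (PurePosixPath("docs/exec-plans"),)  # plans, sidecars, workstreams
RUNTIME_ROOTS = (
    PurePosixPath(".codex/skills"),   # local skill installs
    PurePosixPath("docs/generated"),  # generated evidence
)


def _is_under(path, root):
    """True when path equals root or sits anywhere below it."""
    return path == root or root in path.parents


def classify(path):
    """Label a repo-relative path as 'durable', 'runtime', or 'other'."""
    candidate = PurePosixPath(path)
    if any(_is_under(candidate, root) for root in DURABLE_ROOTS):
        return "durable"
    if any(_is_under(candidate, root) for root in RUNTIME_ROOTS):
        return "runtime"
    return "other"
```

Under this reading, only paths labeled `runtime` would be candidates for `clean` or `evidence-prune`; anything labeled `durable` stays version-controlled.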