npm - @hallucination-studio/harness-engine - Versions diffs - 1.0.0-beta.10.9ff10d9 - Mend

@hallucination-studio/harness-engine 1.0.0-beta.10.9ff10d9

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (19)

package/README.md +262 -0
package/bin/install.js +154 -0
package/package.json +31 -0
package/skills/harness-engine/SKILL.md +82 -0
package/skills/harness-engine/agents/openai.yaml +4 -0
package/skills/harness-engine/assets/repo-template/.keep +1 -0
package/skills/harness-engine/assets/sops/.keep +1 -0
package/skills/harness-engine/evals/cases.json +50 -0
package/skills/harness-engine/evals/run_evals.py +1188 -0
package/skills/harness-engine/references/evaluation-loop.md +24 -0
package/skills/harness-engine/references/evidence-first-evals.md +180 -0
package/skills/harness-engine/references/exec-plans.md +51 -0
package/skills/harness-engine/references/file-map.md +17 -0
package/skills/harness-engine/references/knowledge-capture.md +35 -0
package/skills/harness-engine/references/question-catalog.md +29 -0
package/skills/harness-engine/references/sop-index.md +12 -0
package/skills/harness-engine/references/template-policy.md +13 -0
package/skills/harness-engine/references/workflow.md +55 -0
package/skills/harness-engine/scripts/manage_harness.py +2374 -0

package/README.md ADDED

@@ -0,0 +1,262 @@
+# Harness Engine
+Harness Engine packages a Codex skill that bootstraps an agent-first repository harness.
+It turns the repository-shaping ideas from OpenAI's
+["Harness engineering: leveraging Codex in an agent-first world"](https://openai.com/index/harness-engineering/)
+into an installable `npx` workflow.
+The package does not install a harness into this repository. This repository builds and publishes
+the installer. Users install the bundled `harness-engine` skill into their own project or
+global Codex skill directory, then ask Codex to use that skill to analyze the target repository,
+ask for missing high-impact facts, create the harness files, and keep future work closed-loop.
+## What This Project Does
+- Installs the `harness-engine` Codex skill locally, globally, or into a custom skills directory.
+- Provides a repository analyzer that detects language, package manager, frontend signals, existing harness files, missing execution-plan state, and missing SOPs.
+- Generates a short routing-style `AGENTS.md` plus durable system-of-record docs such as `ARCHITECTURE.md`, `docs/RELIABILITY.md`, `docs/SECURITY.md`, `docs/QUALITY_SCORE.md`, and `docs/FRONTEND.md`.
+- Creates execution-plan folders for active and completed plans.
+- Adds SOPs for architecture setup, knowledge capture, local observability, and UI validation.
+- Reconciles managed harnesses through the same `init` flow, refreshing managed files and backfilling newly introduced managed files while preserving unmanaged docs.
+- Enforces a local harness check without assuming the user's project has CI.
+- Previews and optionally removes stale unreferenced generated evidence under `docs/generated/`.
+- Supports durable knowledge closure with stable knowledge IDs and evidence text, so permanent docs can use natural wording instead of duplicated checklist strings.
+- Enforces a local quality gate for execution plans; failed scores write `## Rework Required` into the plan and block `plan-close`.
+- Tracks resumable workstreams so interrupted features, refactors, reliability work, and cleanup efforts can be recovered from repo state instead of chat history.
+## Why It Exists
+The OpenAI harness engineering article argues that agent-first repositories work better when the
+repo itself becomes the system of record: a short `AGENTS.md` routes agents into deeper docs,
+execution plans live beside the code, and validation loops are mechanical rather than remembered.
+This project packages that shape as a reusable Codex skill.
+The goal is not to blindly copy a template. The skill first analyzes the target repo, then asks the
+human to confirm product, reliability, security, frontend, and quality facts that cannot be safely
+inferred from code alone.
+## Install
+The npm package is scoped as `@hallucination-studio/harness-engine`. The installed command name is
+still `harness-engine`.
+Install the latest stable release into the current repository:
+```bash
+npx @hallucination-studio/harness-engine install --local
+```
+Install the latest beta build from `main`:
+```bash
+npx @hallucination-studio/harness-engine@beta install --local
+```
+Install globally into `${CODEX_HOME:-~/.codex}/skills`:
+```bash
+npx @hallucination-studio/harness-engine install --global
+```
+Install into a custom skills directory:
+```bash
+npx @hallucination-studio/harness-engine install --path /path/to/skills
+```
+Replace an existing installed skill:
+```bash
+npx @hallucination-studio/harness-engine install --local --force
+```
+Show where the skill would be installed:
+```bash
+npx @hallucination-studio/harness-engine where --local
+```
+## Update An Installed Skill Package
+The `npx` installer only installs or replaces the Codex skill package. To update an already
+installed skill, rerun `install` with `--force` in the same install location.
+Replace the local skill install:
+```bash
+npx @hallucination-studio/harness-engine install --local --force
+```
+Replace the global skill install:
+```bash
+npx @hallucination-studio/harness-engine install --global --force
+```
+Replace a custom skill install:
+```bash
+npx @hallucination-studio/harness-engine install --path /path/to/skills --force
+```
+After the skill package is installed, the target repository workflow happens inside Codex. In the
+target workspace, invoke the skill:
+```text
+$harness-engine
+```
+The skill should analyze the workspace and run the single workspace entrypoint:
+- If the harness is not installed in that repository, `manage_harness.py init` creates it.
+- If a managed harness already exists, `manage_harness.py init` reconciles it by refreshing managed files and backfilling newly introduced managed files.
+- Unmanaged user files are preserved unless `--force` is explicitly used.
+The underlying command for both cases is:
+```bash
+python3 .codex/skills/harness-engine/scripts/manage_harness.py init --repo . --answers answers.json
+```
+## Use The Skill In A Target Repo
+After installing, open Codex in the target repository and invoke:
+```text
+$harness-engine
+```
+The intended workflow is:
+1. Analyze the target repository.
+2. Ask the human only for unresolved, high-impact facts.
+3. Initialize or reconcile the harness files.
+4. Create execution plans for multi-step work.
+5. Log durable knowledge into active plans.
+6. Write the durable facts into permanent docs.
+7. Mark knowledge as written using ID plus evidence text.
+8. Score the finished work across product, UX/operator clarity, architecture, reliability, and security.
+9. If the quality gate fails, implement the generated `## Rework Required` items and score again.
+10. For phased or resumable work, update `Phase Continuity` and `docs/exec-plans/workstreams.md`.
+11. Close the execution plan only after the quality gate passes, phase continuity is recorded, and durable docs are updated.
+12. Run the local harness check before handoff.
+13. Periodically run `evidence-prune` to preview stale unreferenced generated evidence, and apply it only after reviewing the candidate list.
+The installed skill exposes the underlying script at:
+```bash
+python3 .codex/skills/harness-engine/scripts/manage_harness.py --help
+```
+Common commands:
+```bash
+python3 .codex/skills/harness-engine/scripts/manage_harness.py analyze --repo . --output analysis.json
+python3 .codex/skills/harness-engine/scripts/manage_harness.py sample-answers --analysis analysis.json --output answers.json
+python3 .codex/skills/harness-engine/scripts/manage_harness.py init --repo . --answers answers.json
+python3 .codex/skills/harness-engine/scripts/manage_harness.py plan-start --repo . --slug feature-name --goal "Implement the feature"
+python3 .codex/skills/harness-engine/scripts/manage_harness.py quality-score --repo . --plan docs/exec-plans/active/2026-06-11-feature-name.md --product-correctness 8 --product-note "Product assertions passed" --ux-operator-clarity 8 --ux-note "User workflow evidence passed" --architecture-maintainability 8 --architecture-note "Boundary and maintainability review passed" --reliability-observability 8 --reliability-note "Tests and smoke checks passed" --security-data-handling 8 --security-note "No new sensitive-data paths or secrets"
+python3 .codex/skills/harness-engine/scripts/manage_harness.py phase-set --repo . --plan docs/exec-plans/active/2026-06-11-feature-name.md --mode multi-phase --workstream feature-name --current-phase 1 --next-phase 2 --continuation docs/exec-plans/workstreams.md#feature-name --next-action "Create Phase 2 plan"
+python3 .codex/skills/harness-engine/scripts/manage_harness.py workstream-upsert --repo . --id feature-name --status active --current-plan docs/exec-plans/active/2026-06-11-feature-name.md --next-action "Create Phase 2 plan"
+python3 .codex/skills/harness-engine/scripts/manage_harness.py check --repo .
+python3 .codex/skills/harness-engine/scripts/manage_harness.py evidence-prune --repo . --older-than-days 14
+python3 .codex/skills/harness-engine/scripts/manage_harness.py evidence-prune --repo . --older-than-days 14 --apply
+```
+The quality gate is intentionally local and repository-owned. It does not require the user's
+project to have CI. `plan-close` refuses to move a plan to `completed` unless `quality-score`
+has passed, and `check` reports active plans whose quality gate is missing or failing.
+For multi-phase work, `Phase Continuity` and `docs/exec-plans/workstreams.md` form the recovery
+ledger. A plan like `Local Workbench Phase 1` can close only after it records whether the workstream
+continues, pauses, completes, or stops, and where the next agent should resume.
+## Generated Harness Shape
+A typical initialized target repository receives:
+```text
+AGENTS.md
+ARCHITECTURE.md
+docs/
+├── DESIGN.md
+├── FRONTEND.md
+├── PLANS.md
+├── PRODUCT_SENSE.md
+├── QUALITY_SCORE.md
+├── RELIABILITY.md
+├── SECURITY.md
+├── design-docs/
+├── exec-plans/
+│   ├── active/
+│   ├── completed/
+│   ├── workstreams.md
+│   └── tech-debt-tracker.md
+├── generated/
+├── product-specs/
+├── references/
+└── sops/
+```
+`AGENTS.md` is intentionally short. It is a router, not an encyclopedia.
+## Version Channels
+- `latest`: Stable releases created from GitHub Release tags. The workflow derives the package version from the release tag and publishes to npm with the `latest` dist-tag.
+- `beta`: Every push to `main` publishes a unique prerelease version like `1.0.0-beta.<run-number>.<short-sha>` with the `beta` dist-tag. npm cannot overwrite an existing version, so the `beta` tag moves forward to the newest main build.
+- `nightly`: A scheduled daily build publishes versions like `1.0.0-nightly.<yyyymmdd>.<run-number>` with the `nightly` dist-tag.
+- Manual test builds: The release workflow can be run manually. By default it performs `npm publish --dry-run` with a generated `-test.<run-number>` version. Set `dry_run=false` to publish a test package to a non-`latest` dist-tag such as `next`.
+Examples:
+```bash
+npx @hallucination-studio/harness-engine install --local
+npx @hallucination-studio/harness-engine@beta install --local
+npx @hallucination-studio/harness-engine@nightly install --local
+```
+## Local Development
+Run the skill evaluations:
+```bash
+npm test
+```
+Smoke-test installation:
+```bash
+npm run smoke:install
+```
+Check npm package contents:
+```bash
+npm run pack:check
+```
+The publish workflows expect an npm token when trusted publishing is not yet configured:
+```text
+GitHub Actions secret: NPM_TOKEN
+```
+## Implementation Quality Score
+These scores describe the current implementation, not an external guarantee.
+| Layer | Score | Notes |
+| --- | ---: | --- |
+| Product fit | 9 / 10 | Clear purpose: install a Codex skill that creates and maintains an agent-first repository harness. Real acceptance against a fresh Go backend plus browser frontend project validated generation and later issue workflows. Broader usage across more project types would still improve confidence. |
+| Skill workflow design | 9.2 / 10 | Strong progressive workflow: analyze, confirm, init/reconcile, plan, capture knowledge, validate, score with evidence notes, rework, record continuity, close. The workflow now explicitly routes user-reported product, frontend, backend, architecture, data, security, performance, and reliability issues even when the user does not invoke the skill by name. |
+| Knowledge, quality, and workstream closure loop | 9.1 / 10 | Stable knowledge IDs plus exact destination evidence reduce noisy doc duplication, `quality-score` rejects missing evidence notes, defects block closure until resolved, and workstreams make phased work recoverable. Future work could move plan state into structured sidecar metadata instead of Markdown parsing. |
+| CLI installer | 8 / 10 | Simple local/global/custom install modes, force replacement, and path discovery. It is intentionally minimal and does not manage Codex runtime configuration. |
+| Generated harness docs | 8.4 / 10 | Covers architecture, plans, reliability, security, frontend policy, issue workflows, references, generated artifacts, and SOPs. The docs now front-load exact knowledge evidence, per-dimension quality notes, and plan placeholder cleanup, but templates still require Codex to tighten project-specific language after generation. |
+| Evaluation coverage | 9 / 10 | `npm test` runs 12 structured eval cases covering empty-repo init, frontend analysis, init reconciliation, issue workflow coverage, closed-loop plan behavior, phase continuity, path canonicalization, defect recovery, required quality-score notes, exact knowledge evidence, generated-evidence cleanup, eval report shape, and user-owned doc preservation. A fully automated Codex child-agent E2E would raise this further. |
+| Release automation | 8 / 10 | Supports stable release, beta on every main commit, nightly, manual dry-run, artifacts, provenance, and token fallback. npm first-publish/trusted-publishing setup still requires external configuration. |
+| User-project safety | 8.8 / 10 | The skill avoids adding CI to target projects by default, preserves unmanaged files unless forced, and requires evidence-backed closure for defects and durable knowledge. More destructive-change simulation in evals would improve this score. |
+| Overall | 9 / 10 | The skill is now strong enough for regular use: self evals pass across the structured suite, real acceptance covered initial scaffold plus frontend and backend issue workflows, and the main failure modes found during acceptance are now documented and eval-covered. Remaining leverage is automated child-agent E2E coverage and structured plan metadata. |
+## Reference
+- OpenAI: [Harness engineering: leveraging Codex in an agent-first world](https://openai.com/index/harness-engineering/)
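
The version formats listed under "Version Channels" can be read mechanically. The following is a minimal sketch, not part of the package: `parseChannelVersion` is a hypothetical helper, and it assumes a three-part base version followed by an optional `-<channel>.<dot-separated parts>` suffix, as in the README's examples. Mapping a plain version to `latest` follows the README's description of stable releases.

```javascript
// Hypothetical helper (not shipped in the package): split a version such as
// "1.0.0-beta.10.9ff10d9" into base version, channel, and remaining parts.
function parseChannelVersion(version) {
  const match = /^(\d+\.\d+\.\d+)(?:-([a-z]+)\.(.+))?$/.exec(version);
  if (!match) {
    throw new Error(`Unrecognized version: ${version}`);
  }
  const [, base, channel, rest] = match;
  if (!channel) {
    // No prerelease suffix: the README publishes these under the `latest` dist-tag.
    return { base, channel: "latest", parts: [] };
  }
  // beta:    parts = [run-number, short-sha]
  // nightly: parts = [yyyymmdd, run-number]
  return { base, channel, parts: rest.split(".") };
}

const parsed = parseChannelVersion("1.0.0-beta.10.9ff10d9");
console.log(parsed.channel, parsed.parts);
```

For the version shown in this diff, the sketch yields channel `beta` with run number `10` and short SHA `9ff10d9`.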

package/bin/install.js ADDED

@@ -0,0 +1,154 @@
+#!/usr/bin/env node
+const fs = require("fs");
+const os = require("os");
+const path = require("path");
+const PACKAGE_ROOT = path.resolve(__dirname, "..");
+const SKILL_NAME = "harness-engine";
+const SOURCE_SKILL_DIR = path.join(PACKAGE_ROOT, "skills", SKILL_NAME);
+function printHelp() {
+  console.log(`harness-engine
+Usage:
+  npx @hallucination-studio/harness-engine install [--local | --global | --path <dir>] [--force]
+  npx @hallucination-studio/harness-engine where [--local | --global | --path <dir>]
+Options:
+  --local         Install into <cwd>/.codex/skills
+  --global        Install into \${CODEX_HOME:-~/.codex}/skills
+  --path <dir>    Install into a custom skills directory
+  --force         Replace an existing installed skill
+  -h, --help      Show this help text
+`);
+}
+function parseArgs(argv) {
+  const result = {
+    command: "install",
+    mode: null,
+    customPath: null,
+    force: false
+  };
+  const args = [...argv];
+  if (args.length > 0 && !args[0].startsWith("-")) {
+    result.command = args.shift();
+  }
+  for (let i = 0; i < args.length; i += 1) {
+    const arg = args[i];
+    if (arg === "--local") {
+      result.mode = "local";
+    } else if (arg === "--global") {
+      result.mode = "global";
+    } else if (arg === "--path") {
+      result.mode = "custom";
+      result.customPath = args[i + 1];
+      i += 1;
+    } else if (arg === "--force") {
+      result.force = true;
+    } else if (arg === "-h" || arg === "--help") {
+      result.command = "help";
+    } else {
+      throw new Error(`Unknown argument: ${arg}`);
+    }
+  }
+  if (result.mode === "custom" && !result.customPath) {
+    throw new Error("--path requires a directory value");
+  }
+  if (!result.mode) {
+    result.mode = "local";
+  }
+  return result;
+}
+function resolveSkillsDir(mode, customPath) {
+  if (mode === "local") {
+    return path.join(process.cwd(), ".codex", "skills");
+  }
+  if (mode === "global") {
+    const codexHome = process.env.CODEX_HOME || path.join(os.homedir(), ".codex");
+    return path.join(codexHome, "skills");
+  }
+  return path.resolve(process.cwd(), customPath);
+}
+function copyDir(sourceDir, targetDir) {
+  fs.mkdirSync(targetDir, { recursive: true });
+  for (const entry of fs.readdirSync(sourceDir, { withFileTypes: true })) {
+    const sourcePath = path.join(sourceDir, entry.name);
+    const targetPath = path.join(targetDir, entry.name);
+    if (entry.isDirectory()) {
+      copyDir(sourcePath, targetPath);
+    } else {
+      fs.copyFileSync(sourcePath, targetPath);
+      const stat = fs.statSync(sourcePath);
+      fs.chmodSync(targetPath, stat.mode);
+    }
+  }
+}
+function installSkill(destinationDir, force) {
+  const skillTargetDir = path.join(destinationDir, SKILL_NAME);
+  if (!fs.existsSync(SOURCE_SKILL_DIR)) {
+    throw new Error(`Bundled skill not found: ${SOURCE_SKILL_DIR}`);
+  }
+  if (fs.existsSync(skillTargetDir)) {
+    if (!force) {
+      throw new Error(`Skill already exists at ${skillTargetDir}. Re-run with --force to replace it.`);
+    }
+    fs.rmSync(skillTargetDir, { recursive: true, force: true });
+  }
+  fs.mkdirSync(destinationDir, { recursive: true });
+  copyDir(SOURCE_SKILL_DIR, skillTargetDir);
+  return skillTargetDir;
+}
+function main() {
+  let args;
+  try {
+    args = parseArgs(process.argv.slice(2));
+  } catch (error) {
+    console.error(`Error: ${error.message}`);
+    printHelp();
+    process.exit(1);
+  }
+  if (args.command === "help") {
+    printHelp();
+    return;
+  }
+  const destinationDir = resolveSkillsDir(args.mode, args.customPath);
+  if (args.command === "where") {
+    console.log(path.join(destinationDir, SKILL_NAME));
+    return;
+  }
+  if (args.command !== "install") {
+    console.error(`Unknown command: ${args.command}`);
+    printHelp();
+    process.exit(1);
+  }
+  try {
+    const installedPath = installSkill(destinationDir, args.force);
+    console.log(`Installed ${SKILL_NAME} to ${installedPath}`);
+    console.log("Invoke it in Codex with $harness-engine.");
+  } catch (error) {
+    console.error(`Install failed: ${error.message}`);
+    process.exit(1);
+  }
+}
+main();
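
To see where each install mode lands, the resolution rules in `resolveSkillsDir` can be restated as a pure function. This is a sketch under assumptions, not the package's code: `resolveSkillsDirFor` and its `{ cwd, env, home }` argument are hypothetical, introduced so the rules can be checked without touching real process state, and `path.posix` is used only to keep the output identical across platforms.

```javascript
const path = require("path").posix;

// Hypothetical pure variant of resolveSkillsDir: cwd, env, and home are passed
// in instead of being read from process.cwd(), process.env, and os.homedir().
function resolveSkillsDirFor(mode, customPath, { cwd, env, home }) {
  if (mode === "local") {
    return path.join(cwd, ".codex", "skills");
  }
  if (mode === "global") {
    // CODEX_HOME wins when set; otherwise fall back to <home>/.codex.
    const codexHome = env.CODEX_HOME || path.join(home, ".codex");
    return path.join(codexHome, "skills");
  }
  // Custom paths are resolved relative to the working directory.
  return path.resolve(cwd, customPath);
}

const ctx = { cwd: "/work/app", env: {}, home: "/home/dev" };
console.log(resolveSkillsDirFor("local", null, ctx));            // /work/app/.codex/skills
console.log(resolveSkillsDirFor("global", null, ctx));           // /home/dev/.codex/skills
console.log(resolveSkillsDirFor("custom", "vendor/skills", ctx)); // /work/app/vendor/skills
```

In the installer itself, the skill is then placed in a `harness-engine` subdirectory of the resolved directory, which is the path the `where` command prints.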

package/package.json ADDED

@@ -0,0 +1,31 @@
+{
+  "name": "@hallucination-studio/harness-engine",
+  "version": "1.0.0-beta.10.9ff10d9",
+  "description": "Install the harness-engine Codex skill for initializing and reconciling advanced repository harness docs.",
+  "repository": {
+    "type": "git",
+    "url": "git+https://github.com/hallucination-studio/harness-engine.git"
+  },
+  "bin": {
+    "harness-engine": "bin/install.js"
+  },
+  "publishConfig": {
+    "access": "public"
+  },
+  "scripts": {
+    "test": "python3 skills/harness-engine/evals/run_evals.py",
+    "smoke:install": "node scripts/smoke_install.js",
+    "pack:check": "npm pack --dry-run"
+  },
+  "files": [
+    "bin",
+    "skills/**/SKILL.md",
+    "skills/**/agents/**",
+    "skills/**/assets/**",
+    "skills/**/evals/*.json",
+    "skills/**/evals/*.py",
+    "skills/**/references/**",
+    "skills/**/scripts/*.py"
+  ],
+  "license": "MIT"
+}

package/skills/harness-engine/SKILL.md ADDED

@@ -0,0 +1,82 @@
+---
+name: harness-engine
+description: Initialize or refresh an advanced harness-engineering repository shape for Codex-driven projects. Use when Codex needs to analyze a repository, ask the human to confirm high-impact product and architecture facts, and then run the harness-engine init workflow to create or reconcile AGENTS.md, architecture docs, policy docs, plan folders, reference folders, and SOP-backed starter files.
+---
+# Harness Engine
+Run the packaged script to inspect the target repository before editing files. Use the generated analysis to decide what to ask the human, what durable knowledge is missing from the repo, and which execution-plan and SOP files must be created or reconciled.
+## Workflow
+1. Run `python3 scripts/manage_harness.py analyze --repo <target-repo> --output <analysis.json>`.
+2. Read `analysis.json`.
+3. Ask the human only the unresolved, high-impact questions from `human_confirmations`.
+4. Run `python3 scripts/manage_harness.py sample-answers --analysis <analysis.json> --output <answers.json>`.
+5. Fill the placeholders in `answers.json` from the repository and the human's confirmed answers.
+6. Run `python3 scripts/manage_harness.py init --repo <target-repo> --answers <answers.json>`. This is the single workspace entrypoint: it creates a new harness when none exists, and reconciles a managed or partial harness when managed harness files are already present. Reconcile refreshes managed files, backfills newly introduced managed files, and preserves unmanaged user files. Pass `--force` only with explicit user approval.
+7. If the task is multi-step, run `python3 scripts/manage_harness.py plan-start --repo <target-repo> --slug <task-name> --goal "<goal>"`.
+8. If you learn durable facts during the work, run `python3 scripts/manage_harness.py knowledge-log --repo <target-repo> --plan <plan-file> --fact "<fact>" --destination <durable-doc>` and keep the returned `id`. Use `--fact-file <file>` when the fact contains shell-sensitive characters.
+9. Before closing the task, write those facts into their durable docs.
+10. Run `python3 scripts/manage_harness.py knowledge-mark-written --repo <target-repo> --plan <plan-file> --id <knowledge-id> --evidence "<verbatim text already in durable doc>"`; prefer `--evidence-file <file>` when evidence contains backticks, globs, quotes, pipes, or other shell-sensitive characters. Evidence must be copied from the destination doc, not summarized. Use `--append` only when the exact fact should be appended mechanically.
+11. If validation, evals, browser checks, or code review reveal a bug, immediately run `python3 scripts/manage_harness.py defect-log --repo <target-repo> --plan <plan-file> --severity <P0|P1|P2|P3> --summary "<bug>" --evidence "<failing check>"`. This forces the quality gate to fail.
+12. Fix logged defects, then run `python3 scripts/manage_harness.py defect-resolve --repo <target-repo> --plan <plan-file> --id <bug-id> --fix-evidence "<passing check or code evidence>"`.
+13. Score the finished work with `python3 scripts/manage_harness.py quality-score --repo <target-repo> --plan <plan-file> --product-correctness <0-10> --product-note "<evidence>" --ux-operator-clarity <0-10> --ux-note "<evidence>" --architecture-maintainability <0-10> --architecture-note "<evidence>" --reliability-observability <0-10> --reliability-note "<evidence>" --security-data-handling <0-10> --security-note "<evidence>"`. Every dimension needs an evidence note.
+14. If `quality-score` fails, treat `## Rework Required` in the plan as the next implementation input, fix the work, then run `quality-score` again.
+15. For phased or resumable work, run `python3 scripts/manage_harness.py phase-set --repo <target-repo> --plan <plan-file> --mode <multi-phase|paused|completed|stopped> --workstream <id> --current-phase <n> --continuation <target> --next-action "<next action>"`, then update `workstreams.md` with `workstream-upsert`.
+16. Before closing, replace generic plan placeholders with task-specific scope, constraints, steps, validation, and completion notes; leave no open durable-knowledge placeholder except the default unused line.
+17. Close the plan with `python3 scripts/manage_harness.py plan-close --repo <target-repo> --plan <plan-file> --summary "<summary>"`.
+18. Before handoff, run `python3 .codex/skills/harness-engine/scripts/manage_harness.py check --repo <target-repo>` from an installed target repository.
+19. To review stale generated evidence, run `python3 scripts/manage_harness.py evidence-prune --repo <target-repo>` first; it is dry-run by default. Add `--apply` only after checking the candidate list.
+20. After changing this skill, run `python3 evals/run_evals.py` and iterate until it passes.
+## Reading Order
+- Read [references/workflow.md](references/workflow.md) first for the operating model and question policy.
+- Read [references/file-map.md](references/file-map.md) when deciding which generated file to edit.
+- Read [references/question-catalog.md](references/question-catalog.md) when the analysis surfaces ambiguous product, security, reliability, or frontend facts.
+- Read [references/knowledge-capture.md](references/knowledge-capture.md) when you discover facts that should survive chat history.
+- Read [references/exec-plans.md](references/exec-plans.md) before planning or updating any multi-step work.
+- Read [references/sop-index.md](references/sop-index.md) to choose the right SOP for architecture, UI validation, observability, or knowledge capture work.
+- Read [references/template-policy.md](references/template-policy.md) before overwriting existing files.
+- Read [references/evaluation-loop.md](references/evaluation-loop.md) before changing the skill, templates, scripts, or policy references.
+- Read [references/evidence-first-evals.md](references/evidence-first-evals.md) before designing evals for product correctness, frontend validation, or bug-discovery coverage.
+## Command Rules
+- Prefer `analyze` before `init`.
+- Prefer the draft, test, evaluate, iterate loop for changes to this skill.
+- Use `init` as the workspace entrypoint for both creation and reconciliation. It refreshes managed harness files when an existing managed harness is detected and preserves unmanaged user files. Use `--force` only when the human accepts overwriting.
+- Do not overwrite existing files unless the human asked for it or you pass `--force`.
+- Treat the generated files as starting points. After generation, tighten them with repository-specific details instead of leaving placeholders behind.
+- Before plan close, replace or remove task placeholders such as "Define in-scope work", "Add the first concrete step", "Describe how the work will be verified", and any ad hoc durable-knowledge TODOs.
+- Treat `docs/exec-plans/` as required state for multi-step work, not optional notes.
+- Read `docs/exec-plans/workstreams.md` before resuming interrupted feature, refactor, reliability, security, frontend, or cleanup work.
+- Treat `docs/sops/` as mechanical operating procedures, not background reading.
+- When you answer a question using facts that are not yet in the repo but should be reusable, write them into a durable doc before finishing.
+- Prefer `knowledge-mark-written --id ... --evidence-file ...` so durable docs can use natural wording without shell quoting failures or duplicated exact fact strings.
+- The knowledge evidence text must exist verbatim in the destination doc; if it is only a paraphrase, write the durable doc first or use a file containing exact destination text.
+- Use `defect-log` for every bug found by tests, evals, browser validation, or code review; unresolved defects must block handoff.
+- Use `defect-resolve` only after the implementation is fixed and you can cite passing validation or code evidence.
+- Use `quality-score` before `plan-close`; include `--product-note`, `--ux-note`, `--architecture-note`, `--reliability-note`, and `--security-note`; failed scores must drive rework, not handoff.
+- Use `phase-set` and `workstream-upsert` before `plan-close` for Phase 1/2/3 or any other resumable multi-plan work.
+- Use `plan-close` as the final guardrail so plan state, quality score, and durable docs stay synchronized.
+- Use `check` as the local handoff guardrail for user repositories.
+- Use `evidence-prune` as a cleanup preview for old unreferenced files under `docs/generated/`; it never deletes unless `--apply` is present.
+- Run `python3 evals/run_evals.py` after skill changes, read the structured report, and treat per-case failures as iteration input.
+- Do not add CI to user repositories unless the human explicitly asks for it.
+## Output Rules
+- Keep `AGENTS.md` short and routing-oriented.
+- Keep durable knowledge in repo docs, not in chat-only explanations.
+- Keep plans under `docs/exec-plans/active/` and move finished plans to `docs/exec-plans/completed/`.
+- Keep resumable workstreams in `docs/exec-plans/workstreams.md`.
+- Keep generated material under `docs/generated/`.
+- Keep external, model-friendly references under `docs/references/`.
+- Keep SOPs explicit and task-triggered so the next agent can follow the same path mechanically.
+## Assets
+- Scaffold templates live under [assets/repo-template](assets/repo-template).
+- SOP starter docs live under [assets/sops](assets/sops).
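
Step 13 of the workflow requires a score and an evidence note for each of five dimensions. The following sketch shows one way a caller might assemble that argument list before invoking the script. The flag names are taken from SKILL.md; `buildQualityScoreArgs`, the `DIMENSIONS` table, and the shape of the `scores` object are hypothetical, and the local range check merely mirrors the documented `<0-10>` placeholder rather than the script's own validation.

```javascript
// Flag names come from the quality-score command in SKILL.md.
// The helper and its input shape are assumptions made for illustration.
const DIMENSIONS = [
  { key: "product", score: "--product-correctness", note: "--product-note" },
  { key: "ux", score: "--ux-operator-clarity", note: "--ux-note" },
  { key: "architecture", score: "--architecture-maintainability", note: "--architecture-note" },
  { key: "reliability", score: "--reliability-observability", note: "--reliability-note" },
  { key: "security", score: "--security-data-handling", note: "--security-note" }
];

function buildQualityScoreArgs(repo, plan, scores) {
  const args = ["quality-score", "--repo", repo, "--plan", plan];
  const invalid = [];
  for (const dim of DIMENSIONS) {
    const entry = scores[dim.key];
    const ok =
      entry &&
      Number.isFinite(entry.score) &&
      entry.score >= 0 &&
      entry.score <= 10 &&
      typeof entry.note === "string" &&
      entry.note.trim() !== "";
    if (!ok) {
      invalid.push(dim.key);
      continue;
    }
    args.push(dim.score, String(entry.score), dim.note, entry.note);
  }
  if (invalid.length > 0) {
    throw new Error(`Missing or invalid score/note for: ${invalid.join(", ")}`);
  }
  return args;
}
```

Passing the resulting array to something like `child_process.execFileSync("python3", [scriptPath, ...args])` keeps each note as a single argument, which sidesteps the shell-quoting problems the Command Rules warn about.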

package/skills/harness-engine/agents/openai.yaml ADDED

@@ -0,0 +1,4 @@
+interface:
+  display_name: "Harness Engine"
+  short_description: "Scaffold advanced Codex harness docs"
+  default_prompt: "Use $harness-engine to analyze this repository and scaffold or refresh its advanced harness documentation."

package/skills/harness-engine/assets/repo-template/.keep ADDED

@@ -0,0 +1 @@
+ starter files are generated by scripts/manage_harness.py

package/skills/harness-engine/assets/sops/.keep ADDED

@@ -0,0 +1 @@
+ sop starter files are generated by scripts/manage_harness.py

package/skills/harness-engine/evals/cases.json ADDED

@@ -0,0 +1,50 @@
+[
+  {
+    "id": "empty-repo-init",
+    "description": "Empty repositories should receive the full advanced harness scaffold."
+  },
+  {
+    "id": "frontend-analysis",
+    "description": "Frontend repositories should trigger frontend-specific confirmation and policy output."
+  },
+  {
+    "id": "init-reconciles-existing-harness",
+    "description": "Init should reconcile an existing harness by refreshing managed files and adding newly introduced managed files."
+  },
+  {
+    "id": "closed-loop-plan",
+    "description": "Execution plans should refuse to close until durable knowledge is written back."
+  },
+  {
+    "id": "phase-continuity-workstream",
+    "description": "Phased plans should record continuation and keep workstream references recoverable."
+  },
+  {
+    "id": "plan-path-canonicalization",
+    "description": "Plan commands should canonicalize absolute plan paths before updating workstreams."
+  },
+  {
+    "id": "defect-recovery-loop",
+    "description": "Validation or review defects should block quality gates until resolved with evidence."
+  },
+  {
+    "id": "quality-score-requires-notes",
+    "description": "Quality scoring should reject missing evidence notes and name the missing dimensions."
+  },
+  {
+    "id": "knowledge-evidence-verbatim",
+    "description": "Knowledge closure should reject paraphrased evidence and accept exact destination text."
+  },
+  {
+    "id": "evidence-prune-generated-artifacts",
+    "description": "Generated evidence cleanup should dry-run by default and remove only stale unreferenced artifacts when applied."
+  },
+  {
+    "id": "eval-report-shape",
+    "description": "Eval output should expose structured per-case scores, findings, user-facing messages, and recommended actions."
+  },
+  {
+    "id": "preserve-unmanaged-docs",
+    "description": "Existing user-owned harness files should be skipped unless explicitly forced."
+  }
+]
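
Each entry in `cases.json` is an object with an `id` and a `description`. A small structural check along the following lines could catch duplicate or empty fields before the eval runner reads the file. This is a sketch only: `validateCases` is a hypothetical helper, not something `run_evals.py` is known to do, and the sample data is the first two entries copied from the file above.

```javascript
// Hypothetical validator: every case needs a non-empty string id and
// description, and ids must be unique across the list.
function validateCases(cases) {
  const problems = [];
  const seen = new Set();
  cases.forEach((entry, index) => {
    for (const field of ["id", "description"]) {
      if (typeof entry[field] !== "string" || entry[field].trim() === "") {
        problems.push(`case ${index}: missing ${field}`);
      }
    }
    if (typeof entry.id === "string" && entry.id.trim() !== "") {
      if (seen.has(entry.id)) {
        problems.push(`case ${index}: duplicate id ${entry.id}`);
      }
      seen.add(entry.id);
    }
  });
  return problems;
}

// First two entries copied from cases.json.
const sampleCases = [
  {
    id: "empty-repo-init",
    description: "Empty repositories should receive the full advanced harness scaffold."
  },
  {
    id: "frontend-analysis",
    description: "Frontend repositories should trigger frontend-specific confirmation and policy output."
  }
];

console.log(validateCases(sampleCases)); // []
```

With the real file, the same function could be applied to `JSON.parse(fs.readFileSync(casesPath, "utf8"))`, where `casesPath` is whatever location the caller uses for `cases.json`.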