npm - @hallucination-studio/harness-engine - Versions diffs - 1.0.0-beta.11.2a4849a → 1.0.0-beta.13.cf40fab - Mend

@hallucination-studio/harness-engine 1.0.0-beta.11.2a4849a → 1.0.0-beta.13.cf40fab

Files changed (8)

package/.codex-plugin/plugin.json +6 -0
package/README.md +64 -4
package/bin/install.js +57 -18
package/package.json +2 -1
package/skills/harness-engine/SKILL.md +39 -18
package/skills/harness-engine/evals/cases.json +20 -0
package/skills/harness-engine/evals/run_evals.py +179 -0
package/skills/harness-engine/scripts/manage_harness.py +122 -5

package/.codex-plugin/plugin.json ADDED

@@ -0,0 +1,6 @@
+{
+  "name": "harness-engine",
+  "version": "1.0.0",
+  "description": "Repository harness skill for Codex with Google DESIGN.md control-plane guidance.",
+  "skills": "./skills/"
+}
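
The added manifest is small enough to check mechanically. The sketch below parses the manifest text shown above and reports missing keys; `check_manifest` and `REQUIRED_KEYS` are hypothetical names, and the required-key list is an assumption made for illustration, not a documented Codex plugin schema.

```python
import json

# Manifest text copied from the added .codex-plugin/plugin.json above.
MANIFEST_TEXT = """
{
  "name": "harness-engine",
  "version": "1.0.0",
  "description": "Repository harness skill for Codex with Google DESIGN.md control-plane guidance.",
  "skills": "./skills/"
}
"""

# Assumption: these four keys are treated as required for this illustration only.
REQUIRED_KEYS = ("name", "version", "description", "skills")


def check_manifest(text):
    """Return a list of problems found in a plugin manifest string."""
    data = json.loads(text)
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in data]
    # The package's own eval expects the skills entry to be "./skills/".
    if data.get("skills") != "./skills/":
        problems.append("skills should point at ./skills/")
    return problems


print(check_manifest(MANIFEST_TEXT))
```

The `skills` comparison mirrors the check in `test_plugin_does_not_bundle_google_design` further down in this diff; the other keys are checked only for presence.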

package/README.md CHANGED

@@ -24,6 +24,7 @@ ask for missing high-impact facts, create the harness files, and keep future wor
 - Supports durable knowledge closure with stable knowledge IDs and evidence text, so permanent docs can use natural wording instead of duplicated checklist strings.
 - Enforces a local quality gate for execution plans; failed scores write `## Rework Required` into the plan and block `plan-close`.
 - Tracks resumable workstreams so interrupted features, refactors, reliability work, and cleanup efforts can be recovered from repo state instead of chat history.
+- Generates a frontend/design control plane that tells target projects to own `docs/DESIGN.md` and validate or export it with the official `@google/design.md` package.
 ## Why It Exists
@@ -65,22 +66,51 @@ Install into a custom skills directory:
 npx @hallucination-studio/harness-engine install --path /path/to/skills
 ```
-Replace an existing installed skill:
+Replace an existing installed plugin bundle:
 ```bash
 npx @hallucination-studio/harness-engine install --local --force
 ```
-Show where the skill would be installed:
+Show where the plugin bundle would be installed:
 ```bash
 npx @hallucination-studio/harness-engine where --local
 ```
+## Target Project Dependency: Google DESIGN.md
+Harness Engine depends on the official Google DESIGN.md workflow for frontend style creation, but
+does not bundle Google source code or install Google's package for the user. Install Google
+DESIGN.md as a dev dependency in each target project that needs frontend style creation,
+validation, diffs, or token exports:
+```bash
+npm install --save-dev @google/design.md
+```
+Use Google/Stitch to create the real `docs/DESIGN.md` through one of Google's documented paths:
+- Create from a prompt in Stitch by describing the intended vibe, product, audience, and interaction feel.
+- Derive from branding in Stitch by providing a brand URL or image.
+- Write it by hand as markdown with optional YAML frontmatter.
+Then validate or export from the target repository:
+```bash
+npx @google/design.md lint docs/DESIGN.md
+npx @google/design.md export docs/DESIGN.md --format css-tailwind
+npx @google/design.md diff docs/DESIGN.md docs/DESIGN.next.md
+```
+Harness Engine's role is to generate the control plane: `docs/FRONTEND.md` tells agents to read
+`docs/DESIGN.md`, defines which project files are controlled by it, and blocks treating the
+placeholder `status: design-source-required` DESIGN.md as an approved visual style.
 ## Update An Installed Skill Package
-The `npx` installer only installs or replaces the Codex skill package. To update an already
-installed skill, rerun `install` with `--force` in the same install location.
+The `npx` installer installs or replaces the Codex plugin bundle and compatibility skill entries.
+To update an already installed bundle, rerun `install` with `--force` in the same install location.
 Replace the local skill install:
@@ -149,6 +179,27 @@ The installed skill exposes the underlying script at:
 python3 .codex/skills/harness-engine/scripts/manage_harness.py --help
 ```
+For frontend or visual-design work, the generated harness uses `docs/FRONTEND.md` to route agents through `docs/DESIGN.md`. Harness Engine does not generate style, choose themes, extract branding, or vendor Google DESIGN.md code.
+Create the real `docs/DESIGN.md` through one of Google's documented paths:
+- Create from a prompt in Stitch by describing the intended vibe, product, audience, and interaction feel.
+- Derive from branding in Stitch by providing a brand URL or image.
+- Write it by hand as markdown with optional YAML frontmatter.
+Use Google's examples as references, not vendored source: `https://github.com/google-labs-code/design.md/tree/main/examples`.
+Install the official package in the target project when the project wants DESIGN.md validation, diffs, or token exports:
+```bash
+npm install --save-dev @google/design.md
+npx @google/design.md lint docs/DESIGN.md
+npx @google/design.md export docs/DESIGN.md --format css-tailwind
+npx @google/design.md diff docs/DESIGN.md docs/DESIGN.next.md
+```
+`docs/FRONTEND.md` defines which files are controlled by `docs/DESIGN.md`: generated token exports under `docs/design-docs/` or `src/styles/`, Tailwind theme files, global CSS variables, component theme modules, Storybook/theme previews, and UI implementation files that consume those tokens. Agents should read `docs/FRONTEND.md`, then `docs/DESIGN.md`, then generated token exports before changing controlled UI files.
 Common commands:
 ```bash
@@ -268,6 +319,15 @@ Check npm package contents:
 npm run pack:check
 ```
+Before release, run:
+```bash
+npm test
+npm run smoke:install
+npm run pack:check
+git diff --check
+```
 The publish workflows expect an npm token when trusted publishing is not yet configured:
 ```text

package/bin/install.js CHANGED

@@ -5,8 +5,8 @@ const os = require("os");
 const path = require("path");
 const PACKAGE_ROOT = path.resolve(__dirname, "..");
-const SKILL_NAME = "harness-engine";
-const SOURCE_SKILL_DIR = path.join(PACKAGE_ROOT, "skills", SKILL_NAME);
+const BUNDLE_NAME = "harness-engine-plugin";
+const BUNDLE_ENTRIES = [".codex-plugin", "skills"];
 function printHelp() {
   console.log(`harness-engine
@@ -19,7 +19,7 @@ Options:
   --local         Install into <cwd>/.codex/skills
   --global        Install into \${CODEX_HOME:-~/.codex}/skills
   --path <dir>    Install into a custom skills directory
-  --force         Replace an existing installed skill
+  --force         Replace an existing installed bundle
   -h, --help      Show this help text
 `);
 }
@@ -85,32 +85,71 @@ function copyDir(sourceDir, targetDir) {
   for (const entry of fs.readdirSync(sourceDir, { withFileTypes: true })) {
     const sourcePath = path.join(sourceDir, entry.name);
     const targetPath = path.join(targetDir, entry.name);
-    if (entry.isDirectory()) {
+    const stat = fs.statSync(sourcePath);
+    if (stat.isDirectory()) {
       copyDir(sourcePath, targetPath);
+    } else if (entry.isSymbolicLink()) {
+      const linkTarget = fs.readlinkSync(sourcePath);
+      fs.symlinkSync(linkTarget, targetPath);
     } else {
       fs.copyFileSync(sourcePath, targetPath);
-      const stat = fs.statSync(sourcePath);
       fs.chmodSync(targetPath, stat.mode);
     }
   }
 }
-function installSkill(destinationDir, force) {
-  const skillTargetDir = path.join(destinationDir, SKILL_NAME);
-  if (!fs.existsSync(SOURCE_SKILL_DIR)) {
-    throw new Error(`Bundled skill not found: ${SOURCE_SKILL_DIR}`);
+function copyEntry(sourcePath, targetPath) {
+  const stat = fs.lstatSync(sourcePath);
+  if (stat.isDirectory()) {
+    copyDir(sourcePath, targetPath);
+  } else if (stat.isSymbolicLink()) {
+    fs.symlinkSync(fs.readlinkSync(sourcePath), targetPath);
+  } else {
+    fs.mkdirSync(path.dirname(targetPath), { recursive: true });
+    fs.copyFileSync(sourcePath, targetPath);
+    fs.chmodSync(targetPath, fs.statSync(sourcePath).mode);
   }
+}
-  if (fs.existsSync(skillTargetDir)) {
-    if (!force) {
-      throw new Error(`Skill already exists at ${skillTargetDir}. Re-run with --force to replace it.`);
+function assertBundleSources() {
+  for (const entry of BUNDLE_ENTRIES) {
+    const sourcePath = path.join(PACKAGE_ROOT, entry);
+    if (!fs.existsSync(sourcePath)) {
+      throw new Error(`Bundled plugin entry not found: ${sourcePath}`);
     }
-    fs.rmSync(skillTargetDir, { recursive: true, force: true });
+  }
+}
+function removeIfExists(targetPath, force, label) {
+  if (!fs.existsSync(targetPath)) {
+    return;
+  }
+  if (!force) {
+    throw new Error(`${label} already exists at ${targetPath}. Re-run with --force to replace it.`);
   }
+  fs.rmSync(targetPath, { recursive: true, force: true });
+}
+function installBundle(destinationDir, force) {
+  assertBundleSources();
   fs.mkdirSync(destinationDir, { recursive: true });
-  copyDir(SOURCE_SKILL_DIR, skillTargetDir);
-  return skillTargetDir;
+  const bundleTargetDir = path.join(destinationDir, BUNDLE_NAME);
+  removeIfExists(bundleTargetDir, force, "Plugin bundle");
+  fs.mkdirSync(bundleTargetDir, { recursive: true });
+  for (const entry of BUNDLE_ENTRIES) {
+    copyEntry(path.join(PACKAGE_ROOT, entry), path.join(bundleTargetDir, entry));
+  }
+  // Compatibility: older users invoke $harness-engine from a normal skills directory.
+  // Keep a top-level skill copy in place while the plugin root carries the bundle.
+  const compatTarget = path.join(destinationDir, "harness-engine");
+  removeIfExists(compatTarget, force, "Compatibility skill");
+  copyDir(path.join(PACKAGE_ROOT, "skills", "harness-engine"), compatTarget);
+  return bundleTargetDir;
 }
 function main() {
@@ -131,7 +170,7 @@ function main() {
   const destinationDir = resolveSkillsDir(args.mode, args.customPath);
   if (args.command === "where") {
-    console.log(path.join(destinationDir, SKILL_NAME));
+    console.log(path.join(destinationDir, BUNDLE_NAME));
     return;
   }
@@ -142,8 +181,8 @@ function main() {
   }
   try {
-    const installedPath = installSkill(destinationDir, args.force);
-    console.log(`Installed ${SKILL_NAME} to ${installedPath}`);
+    const installedPath = installBundle(destinationDir, args.force);
+    console.log(`Installed ${BUNDLE_NAME} plugin bundle to ${installedPath}`);
     console.log("Invoke it in Codex with $harness-engine.");
   } catch (error) {
     console.error(`Install failed: ${error.message}`);
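
In the `copyDir` hunk above, `fs.statSync` runs before the `entry.isSymbolicLink()` branch. `statSync` follows symbolic links, so a link that points at a directory takes the `isDirectory()` branch and is copied as a real directory, and a dangling link raises before any branch is reached. `copyEntry` calls `fs.lstatSync` and does not have that ordering issue. The sketch below shows the link-first ordering in Python, used here only to keep the examples in one language; `copy_tree` and `demo` are hypothetical names, not part of the package.

```python
import os
import shutil
import tempfile


def copy_tree(source_dir, target_dir):
    """Copy a tree, recreating symlinks instead of following them."""
    os.makedirs(target_dir, exist_ok=True)
    for name in os.listdir(source_dir):
        source_path = os.path.join(source_dir, name)
        target_path = os.path.join(target_dir, name)
        # Test for a link first: os.path.islink uses lstat and does not follow it.
        if os.path.islink(source_path):
            os.symlink(os.readlink(source_path), target_path)
        elif os.path.isdir(source_path):
            copy_tree(source_path, target_path)
        else:
            # copy2 copies the content and preserves permission bits.
            shutil.copy2(source_path, target_path)


def demo():
    """Build a tree with a directory link and a dangling link, then copy it."""
    root = tempfile.mkdtemp()
    src = os.path.join(root, "src")
    os.makedirs(os.path.join(src, "real"))
    with open(os.path.join(src, "real", "file.txt"), "w") as handle:
        handle.write("hello")
    os.symlink("real", os.path.join(src, "link-to-dir"))
    os.symlink("missing", os.path.join(src, "dangling"))
    dst = os.path.join(root, "dst")
    copy_tree(src, dst)
    return dst


dst = demo()
print(os.path.islink(os.path.join(dst, "link-to-dir")))
print(os.path.islink(os.path.join(dst, "dangling")))
```

Both links survive the copy as links, including the dangling one, because the link check never resolves the target.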

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@hallucination-studio/harness-engine",
-  "version": "1.0.0-beta.11.2a4849a",
+  "version": "1.0.0-beta.13.cf40fab",
   "description": "Install the harness-engine Codex skill for initializing and reconciling advanced repository harness docs.",
   "repository": {
     "type": "git",
@@ -19,6 +19,7 @@
   },
   "files": [
     "bin",
+    ".codex-plugin/**",
     "skills/**/SKILL.md",
     "skills/**/agents/**",
     "skills/**/assets/**",

package/skills/harness-engine/SKILL.md CHANGED

@@ -12,24 +12,25 @@ Run the packaged script to inspect the target repository before editing files. U
 1. Run `python3 scripts/manage_harness.py analyze --repo <target-repo> --output <analysis.json>`.
 2. Read `analysis.json`.
 3. Ask the human only the unresolved, high-impact questions from `human_confirmations`.
-4. Run `python3 scripts/manage_harness.py sample-answers --analysis <analysis.json> --output <answers.json>`.
-5. Fill the placeholders in `answers.json` from the repository and the human's confirmed answers.
-6. Run `python3 scripts/manage_harness.py init --repo <target-repo> --answers <answers.json>`. This is the single workspace entrypoint: it creates a new harness when none exists, and reconciles a managed or partial harness when managed harness files are already present. Reconcile refreshes managed files, backfills newly introduced managed files, and preserves unmanaged user files. Pass `--force` only with explicit user approval.
-7. If the task is multi-step, run `python3 scripts/manage_harness.py plan-start --repo <target-repo> --slug <task-name> --goal "<goal>"`.
-8. If you learn durable facts during the work, run `python3 scripts/manage_harness.py knowledge-log --repo <target-repo> --plan <plan-file> --fact "<fact>" --destination <durable-doc>` and keep the returned `id`. Use `--fact-file <file>` when the fact contains shell-sensitive characters.
-9. Before closing the task, write those facts into their durable docs.
-10. Run `python3 scripts/manage_harness.py knowledge-mark-written --repo <target-repo> --plan <plan-file> --id <knowledge-id> --evidence "<verbatim text already in durable doc>"`; prefer `--evidence-file <file>` when evidence contains backticks, globs, quotes, pipes, or other shell-sensitive characters. Evidence must be copied from the destination doc, not summarized. Use `--append` only when the exact fact should be appended mechanically.
-11. If validation, evals, browser checks, or code review reveal a bug, immediately run `python3 scripts/manage_harness.py defect-log --repo <target-repo> --plan <plan-file> --severity <P0|P1|P2|P3> --summary "<bug>" --evidence "<failing check>"`. This forces the quality gate to fail.
-12. Fix logged defects, then run `python3 scripts/manage_harness.py defect-resolve --repo <target-repo> --plan <plan-file> --id <bug-id> --fix-evidence "<passing check or code evidence>"`.
-13. Score the finished work with `python3 scripts/manage_harness.py quality-score --repo <target-repo> --plan <plan-file> --product-correctness <0-10> --product-note "<evidence>" --ux-operator-clarity <0-10> --ux-note "<evidence>" --architecture-maintainability <0-10> --architecture-note "<evidence>" --reliability-observability <0-10> --reliability-note "<evidence>" --security-data-handling <0-10> --security-note "<evidence>"`. Every dimension needs an evidence note.
-14. If `quality-score` fails, treat `## Rework Required` in the plan as the next implementation input, fix the work, then run `quality-score` again.
-15. For phased or resumable work, run `python3 scripts/manage_harness.py phase-set --repo <target-repo> --plan <plan-file> --mode <multi-phase|paused|completed|stopped> --workstream <id> --current-phase <n> --continuation <target> --next-action "<next action>"`, then update `workstreams.md` with `workstream-upsert`.
-16. Before closing, replace generic plan placeholders with task-specific scope, constraints, steps, validation, and completion notes; leave no open durable-knowledge placeholder except the default unused line.
-17. Close the plan with `python3 scripts/manage_harness.py plan-close --repo <target-repo> --plan <plan-file> --summary "<summary>"`.
-18. Before handoff, run `python3 .codex/skills/harness-engine/scripts/manage_harness.py check --repo <target-repo>` from an installed target repository.
-19. To review stale generated evidence, run `python3 scripts/manage_harness.py evidence-prune --repo <target-repo>` first; it is dry-run by default. Add `--apply` only after checking the candidate list.
-20. To clean transient harness runtime files or remove already committed runtime files from the remote, run `python3 scripts/manage_harness.py clean --repo <target-repo>` first; it is dry-run by default. Add `--apply` to clean local runtime state, update `.gitignore`, and stage `git rm --cached` removals, then commit and push.
-21. After changing this skill, run `python3 evals/run_evals.py` and iterate until it passes.
+4. During initialization, create `docs/DESIGN.md`, `docs/FRONTEND.md`, and `docs/design-docs/style-options.md` as the target repository's design control plane. The target project owns `docs/DESIGN.md` and must create the real design system through an official Google DESIGN.md path: prompt in Stitch, brand URL/image import in Stitch, or hand-authored markdown/YAML. Harness-engine does not generate style, choose themes, extract branding, or vendor Google DESIGN.md source.
+5. Run `python3 scripts/manage_harness.py sample-answers --analysis <analysis.json> --output <answers.json>`.
+6. Fill the placeholders in `answers.json` from the repository and the human's confirmed answers.
+7. Run `python3 scripts/manage_harness.py init --repo <target-repo> --answers <answers.json>`. This is the single workspace entrypoint: it creates a new harness when none exists, and reconciles a managed or partial harness when managed harness files are already present. Reconcile refreshes managed files, backfills newly introduced managed files, and preserves unmanaged user files. Pass `--force` only with explicit user approval.
+8. If the task is multi-step, run `python3 scripts/manage_harness.py plan-start --repo <target-repo> --slug <task-name> --goal "<goal>"`.
+9. If you learn durable facts during the work, run `python3 scripts/manage_harness.py knowledge-log --repo <target-repo> --plan <plan-file> --fact "<fact>" --destination <durable-doc>` and keep the returned `id`. Use `--fact-file <file>` when the fact contains shell-sensitive characters.
+10. Before closing the task, write those facts into their durable docs.
+11. Run `python3 scripts/manage_harness.py knowledge-mark-written --repo <target-repo> --plan <plan-file> --id <knowledge-id> --evidence "<verbatim text already in durable doc>"`; prefer `--evidence-file <file>` when evidence contains backticks, globs, quotes, pipes, or other shell-sensitive characters. Evidence must be copied from the destination doc, not summarized. Use `--append` only when the exact fact should be appended mechanically.
+12. If validation, evals, browser checks, or code review reveal a bug, immediately run `python3 scripts/manage_harness.py defect-log --repo <target-repo> --plan <plan-file> --severity <P0|P1|P2|P3> --summary "<bug>" --evidence "<failing check>"`. This forces the quality gate to fail.
+13. Fix logged defects, then run `python3 scripts/manage_harness.py defect-resolve --repo <target-repo> --plan <plan-file> --id <bug-id> --fix-evidence "<passing check or code evidence>"`.
+14. Score the finished work with `python3 scripts/manage_harness.py quality-score --repo <target-repo> --plan <plan-file> --product-correctness <0-10> --product-note "<evidence>" --ux-operator-clarity <0-10> --ux-note "<evidence>" --architecture-maintainability <0-10> --architecture-note "<evidence>" --reliability-observability <0-10> --reliability-note "<evidence>" --security-data-handling <0-10> --security-note "<evidence>"`. Every dimension needs an evidence note.
+15. If `quality-score` fails, treat `## Rework Required` in the plan as the next implementation input, fix the work, then run `quality-score` again.
+16. For phased or resumable work, run `python3 scripts/manage_harness.py phase-set --repo <target-repo> --plan <plan-file> --mode <multi-phase|paused|completed|stopped> --workstream <id> --current-phase <n> --continuation <target> --next-action "<next action>"`, then update `workstreams.md` with `workstream-upsert`.
+17. Before closing, replace generic plan placeholders with task-specific scope, constraints, steps, validation, and completion notes; leave no open durable-knowledge placeholder except the default unused line.
+18. Close the plan with `python3 scripts/manage_harness.py plan-close --repo <target-repo> --plan <plan-file> --summary "<summary>"`.
+19. Before handoff, run `python3 .codex/skills/harness-engine/scripts/manage_harness.py check --repo <target-repo>` from an installed target repository.
+20. To review stale generated evidence, run `python3 scripts/manage_harness.py evidence-prune --repo <target-repo>` first; it is dry-run by default. Add `--apply` only after checking the candidate list.
+21. To clean transient harness runtime files or remove already committed runtime files from the remote, run `python3 scripts/manage_harness.py clean --repo <target-repo>` first; it is dry-run by default. Add `--apply` to clean local runtime state, update `.gitignore`, and stage `git rm --cached` removals, then commit and push.
+22. After changing this skill, run `python3 evals/run_evals.py` and iterate until it passes.
 ## Reading Order
@@ -42,6 +43,7 @@ Run the packaged script to inspect the target repository before editing files. U
 - Read [references/template-policy.md](references/template-policy.md) before overwriting existing files.
 - Read [references/evaluation-loop.md](references/evaluation-loop.md) before changing the skill, templates, scripts, or policy references.
 - Read [references/evidence-first-evals.md](references/evidence-first-evals.md) before designing evals for product correctness, frontend validation, or bug-discovery coverage.
+- Read `docs/FRONTEND.md` and `docs/DESIGN.md` for frontend, UI, product design, visual design, canvas, or interface polish work. Use the target project's official `@google/design.md` CLI install to lint, diff, or export DESIGN.md-controlled files.
 ## Command Rules
@@ -68,6 +70,25 @@ Run the packaged script to inspect the target repository before editing files. U
 - Run `python3 evals/run_evals.py` after skill changes, read the structured report, and treat per-case failures as iteration input.
 - Do not add CI to user repositories unless the human explicitly asks for it.
+## Google DESIGN.md Integration
+Harness-engine does not vendor Google DESIGN.md source, choose themes, extract branding, or ship a local Google adapter. Target repositories should create the real `docs/DESIGN.md` through one of Google's documented paths:
+- Create from a prompt in Stitch.
+- Derive from branding in Stitch with a URL or image.
+- Write it by hand as markdown with optional YAML frontmatter.
+Use examples only as references: `https://github.com/google-labs-code/design.md/tree/main/examples`.
+After `docs/DESIGN.md` exists, target repositories should install the official package:
+```bash
+npm install --save-dev @google/design.md
+npx @google/design.md lint docs/DESIGN.md
+```
+Use `docs/FRONTEND.md` to control which project files read `docs/DESIGN.md`, which generated token exports are allowed, and how agents validate those files.
 ## Output Rules
 - Keep `AGENTS.md` short and routing-oriented.
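
The `analyze`, `sample-answers`, and `init` steps in the renumbered SKILL.md workflow above can be sketched as argument lists. The subcommands and flags are taken from the steps as quoted in this diff; `bootstrap_commands` is a hypothetical helper, and the default file names are assumptions. The sketch only builds and prints the commands, because the workflow expects a human to confirm open questions and fill in `answers.json` between steps.

```python
import shlex

# Path as written in the SKILL.md steps, relative to the skill directory.
MANAGER = "scripts/manage_harness.py"


def bootstrap_commands(repo, analysis="analysis.json", answers="answers.json"):
    """Return argv lists for the analyze, sample-answers, and init steps."""
    return [
        ["python3", MANAGER, "analyze", "--repo", repo, "--output", analysis],
        # Human confirmations happen here, before sample answers are generated.
        ["python3", MANAGER, "sample-answers", "--analysis", analysis, "--output", answers],
        # Placeholders in the answers file are filled in before init runs.
        ["python3", MANAGER, "init", "--repo", repo, "--answers", answers],
    ]


for argv in bootstrap_commands("/path/to/target-repo"):
    print(shlex.join(argv))
```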

package/skills/harness-engine/evals/cases.json CHANGED

@@ -50,5 +50,25 @@
   {
     "id": "preserve-unmanaged-docs",
     "description": "Existing user-owned harness files should be skipped unless explicitly forced."
+  },
+  {
+    "id": "official-google-design-cli-documented",
+    "description": "Generated design docs should instruct target projects to install and use the official @google/design.md CLI."
+  },
+  {
+    "id": "readme-google-design-dependency-documented",
+    "description": "README should clearly tell users that target projects must install the official @google/design.md dependency."
+  },
+  {
+    "id": "frontend-design-control-plane",
+    "description": "Generated FRONTEND.md should define which files are controlled by docs/DESIGN.md and how agents read them."
+  },
+  {
+    "id": "plugin-does-not-bundle-google-design",
+    "description": "The plugin manifest and installer should bundle harness-engine without a local Google DESIGN.md adapter or upstream source."
+  },
+  {
+    "id": "pack-excludes-google-source",
+    "description": "The npm package dry-run should exclude third-party Google DESIGN.md source while retaining harness-engine files."
   }
 ]
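
The new case ids appear to pair with eval functions in `run_evals.py` by a simple naming convention: hyphens become underscores and `test_` is prefixed (for example `pack-excludes-google-source` and `test_pack_excludes_google_source`). A minimal sketch of a consistency check under that assumption follows; `expected_test_name` and `missing_evals` are hypothetical helpers, and the package may register evals differently.

```python
import json

# A subset of the case ids added in this diff.
CASES_TEXT = """
[
  {"id": "official-google-design-cli-documented"},
  {"id": "frontend-design-control-plane"},
  {"id": "pack-excludes-google-source"}
]
"""


def expected_test_name(case_id):
    """Map a case id to the function name the convention suggests."""
    return "test_" + case_id.replace("-", "_")


def missing_evals(cases, registered):
    """Return case ids whose expected test function is not in `registered`."""
    return [case["id"] for case in cases if expected_test_name(case["id"]) not in registered]


cases = json.loads(CASES_TEXT)
# Assumption: a registry that is deliberately missing one function.
registered = {
    "test_official_google_design_cli_documented",
    "test_frontend_design_control_plane",
}
print(missing_evals(cases, registered))
```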

package/skills/harness-engine/evals/run_evals.py CHANGED

@@ -9,6 +9,7 @@ import time
 from pathlib import Path
 SKILL_DIR = Path(__file__).resolve().parents[1]
+REPO_ROOT = SKILL_DIR.parents[1]
 MANAGER = SKILL_DIR / "scripts" / "manage_harness.py"
 CASES_PATH = Path(__file__).with_name("cases.json")
@@ -136,6 +137,44 @@ def test_empty_repo_init(tmp_root):
     assert_contains(repo, "docs/FRONTEND.md", "Evidence For Meaningful UI Work")
     assert_contains(repo, "docs/FRONTEND.md", "Define and verify layout invariants")
     assert_contains(repo, "docs/FRONTEND.md", "preserve the primary task area")
+    assert_contains(repo, "docs/FRONTEND.md", "Read `docs/DESIGN.md` before implementing frontend")
+    assert_contains(repo, "docs/FRONTEND.md", "status: design-source-required")
+    assert_contains(repo, "docs/FRONTEND.md", "prompt in Stitch, brand URL/image import in Stitch, or hand-authored markdown/YAML")
+    assert_contains(repo, "docs/FRONTEND.md", "npm install --save-dev @google/design.md")
+    assert_contains(repo, "docs/FRONTEND.md", "npx @google/design.md lint docs/DESIGN.md")
+    assert_contains(repo, "docs/FRONTEND.md", "Generated design-token files must be derived from `docs/DESIGN.md`")
+    assert_contains(repo, "docs/FRONTEND.md", "Files controlled by `docs/DESIGN.md` include design token exports")
+    assert_contains(repo, "docs/FRONTEND.md", "Agents must read in this order for UI work")
+    assert_contains(repo, "docs/DESIGN.md", "version: alpha")
+    assert_contains(repo, "docs/DESIGN.md", "status: design-source-required")
+    assert_contains(repo, "docs/DESIGN.md", "## Overview")
+    assert_contains(repo, "docs/DESIGN.md", "Create the actual DESIGN.md through one of the official Google DESIGN.md paths")
+    assert_contains(repo, "docs/DESIGN.md", "Create from a prompt in Stitch")
+    assert_contains(repo, "docs/DESIGN.md", "Derive from branding in Stitch")
+    assert_contains(repo, "docs/DESIGN.md", "Write it by hand")
+    assert_contains(repo, "docs/DESIGN.md", "https://github.com/google-labs-code/design.md/tree/main/examples")
+    assert_contains(repo, "docs/DESIGN.md", "npm install --save-dev @google/design.md")
+    assert_contains(repo, "docs/DESIGN.md", "npx @google/design.md lint docs/DESIGN.md")
+    assert_contains(repo, "docs/DESIGN.md", "npx @google/design.md export docs/DESIGN.md --format <format>")
+    assert_contains(repo, "docs/DESIGN.md", "Harness Engine does not generate visual style, choose themes, derive branding, or vendor Google DESIGN.md source")
+    for heading in [
+        "## Colors",
+        "## Typography",
+        "## Layout",
+        "## Elevation & Depth",
+        "## Shapes",
+        "## Components",
+        "## Do's and Don'ts",
+    ]:
+        assert_contains(repo, "docs/DESIGN.md", heading)
+    assert_exists(repo, "docs/design-docs/style-options.md")
+    assert_contains(repo, "docs/design-docs/style-options.md", "Official Creation Paths")
+    assert_contains(repo, "docs/design-docs/style-options.md", "Create from a prompt in Stitch")
+    assert_contains(repo, "docs/design-docs/style-options.md", "Derive from branding in Stitch")
+    assert_contains(repo, "docs/design-docs/style-options.md", "Write it by hand")
+    assert_contains(repo, "docs/design-docs/style-options.md", "npm install --save-dev @google/design.md")
+    assert_contains(repo, "docs/design-docs/style-options.md", "Controlled Files")
+    assert_contains(repo, "docs/design-docs/style-options.md", "Do not hand-edit generated token exports")
     assert_contains(repo, "docs/sops/evidence-first-eval-loop.md", "Report per-case results")
     assert_contains(repo, "docs/sops/evidence-first-eval-loop.md", "Read the Issue Workflows in `AGENTS.md`")
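
The assertions above expect the generated `docs/DESIGN.md` to carry `status: design-source-required`, which the README describes as a placeholder that must not be treated as an approved visual style. A consumer could detect that placeholder before using the file. The sketch below assumes the status sits in a leading YAML frontmatter block delimited by `---` and written as plain `key: value` lines; the generated layout is not shown in this diff, and both function names are hypothetical.

```python
PLACEHOLDER_STATUS = "design-source-required"


def frontmatter_fields(text):
    """Parse simple `key: value` pairs from a leading `---` frontmatter block."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    # No closing delimiter was found: treat the file as having no frontmatter.
    return {}


def is_placeholder_design(text):
    """Return True when the frontmatter status marks the file as a placeholder."""
    return frontmatter_fields(text).get("status") == PLACEHOLDER_STATUS


# Assumption: a sample file shaped like the strings the eval asserts on.
sample = "---\nversion: alpha\nstatus: design-source-required\n---\n## Overview\n"
print(is_placeholder_design(sample))
```

This is not a YAML parser; nested values or quoted strings would need a real one.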
@@ -201,6 +240,20 @@ def test_clean_removes_runtime_state_and_untracks_artifacts(tmp_root):
     repo = tmp_root / "clean-repo"
     repo.mkdir()
     subprocess.run(["git", "init"], cwd=repo, text=True, capture_output=True, check=True)
+    subprocess.run(
+        ["git", "config", "user.email", "harness-eval@example.com"],
+        cwd=repo,
+        text=True,
+        capture_output=True,
+        check=True,
+    )
+    subprocess.run(
+        ["git", "config", "user.name", "Harness Eval"],
+        cwd=repo,
+        text=True,
+        capture_output=True,
+        check=True,
+    )
     tracked_files = [
         ".codex/skills/harness-engine/SKILL.md",
         "docs/generated/canvas-polish-desktop-final.png",
@@ -1147,6 +1200,127 @@ def test_eval_report_shape(tmp_root):
         raise AssertionError("Eval report should include a user-facing failure message")
+def test_official_google_design_cli_documented(tmp_root):
+    repo = tmp_root / "official-google-design-repo"
+    repo.mkdir()
+    answers = tmp_root / "official-google-design-answers.json"
+    write_answers(answers, project_name="official-google-design-demo")
+    run_manager("init", "--repo", str(repo), "--answers", str(answers))
+    design_text = (repo / "docs" / "DESIGN.md").read_text()
+    options_text = (repo / "docs" / "design-docs" / "style-options.md").read_text()
+    skill_text = (SKILL_DIR / "SKILL.md").read_text()
+    for text, label in [(design_text, "DESIGN.md"), (options_text, "style-options.md"), (skill_text, "SKILL.md")]:
+        for needle in [
+            "Create from a prompt in Stitch",
+            "Derive from branding",
+            "Write it by hand",
+            "npm install --save-dev @google/design.md",
+            "npx @google/design.md lint docs/DESIGN.md",
+        ]:
+            if needle not in text:
+                raise AssertionError(f"{label} should document official Google DESIGN.md CLI usage: {needle}")
+    if "$google-design-style" in skill_text:
+        raise AssertionError("harness-engine SKILL.md should not route to a local google-design-style skill")
+    if "packaged Google DESIGN.md submodule" in skill_text:
+        raise AssertionError("harness-engine SKILL.md should not claim a packaged Google submodule")
+def test_readme_google_design_dependency_documented(tmp_root):
+    readme_text = (REPO_ROOT / "README.md").read_text()
+    for needle in [
+        "## Target Project Dependency: Google DESIGN.md",
+        "does not bundle Google source code or install Google's package for the user",
+        "npm install --save-dev @google/design.md",
+        "Use Google/Stitch to create the real `docs/DESIGN.md`",
+        "npx @google/design.md lint docs/DESIGN.md",
+        "Harness Engine's role is to generate the control plane",
+    ]:
+        if needle not in readme_text:
+            raise AssertionError(f"README should document the Google DESIGN.md target dependency: {needle}")
+def test_frontend_design_control_plane(tmp_root):
+    repo = tmp_root / "frontend-design-control-repo"
+    repo.mkdir()
+    answers = tmp_root / "frontend-design-control-answers.json"
+    write_answers(answers, project_name="frontend-design-control-demo")
+    run_manager("init", "--repo", str(repo), "--answers", str(answers))
+    frontend_text = (repo / "docs" / "FRONTEND.md").read_text()
+    for needle in [
+        "Read `docs/DESIGN.md` before implementing frontend",
+        "status: design-source-required",
+        "do not treat it as an approved visual style",
+        "Treat `docs/DESIGN.md` as the source of truth",
+        "Generated design-token files must be derived from `docs/DESIGN.md`",
+        "Files controlled by `docs/DESIGN.md` include design token exports",
+        "Tailwind theme files",
+        "global CSS variables",
+        "component theme modules",
+        "Storybook/theme previews",
+        "Agents must read in this order for UI work",
+        "Do not hand-edit generated token exports",
+    ]:
+        if needle not in frontend_text:
+            raise AssertionError(f"FRONTEND.md should define design control plane: {needle}")
+def test_plugin_does_not_bundle_google_design(tmp_root):
+    manifest = REPO_ROOT / ".codex-plugin" / "plugin.json"
+    if not manifest.exists():
+        raise AssertionError("Missing plugin manifest")
+    manifest_data = json.loads(manifest.read_text())
+    if manifest_data.get("skills") != "./skills/":
+        raise AssertionError("Plugin manifest should expose ./skills/")
+    smoke = subprocess.run(
+        ["node", str(REPO_ROOT / "scripts" / "smoke_install.js")],
+        cwd=REPO_ROOT,
+        text=True,
+        capture_output=True,
+        check=False,
+    )
+    if smoke.returncode != 0:
+        raise AssertionError(smoke.stderr or smoke.stdout)
+    installed = json.loads(smoke.stdout)
+    if not Path(installed["installed"]).name == "harness-engine-plugin":
+        raise AssertionError("smoke install should report the installed plugin root")
+    if (REPO_ROOT / "skills" / "google-design-style").exists():
+        raise AssertionError("Package should not include a local google-design-style skill")
+    if (REPO_ROOT / "third_party" / "google-design-md").exists():
+        raise AssertionError("Package should not include Google DESIGN.md source")
+def test_pack_excludes_google_source(tmp_root):
+    result = subprocess.run(
+        ["npm", "pack", "--dry-run", "--json"],
+        cwd=REPO_ROOT,
+        text=True,
+        capture_output=True,
+        check=False,
+    )
+    if result.returncode != 0:
+        raise AssertionError(result.stderr or result.stdout)
+    json_start = result.stdout.rfind("\n[")
+    if json_start == -1:
+        json_start = result.stdout.find("[")
+    if json_start == -1:
+        raise AssertionError(f"npm pack did not emit JSON file data: {result.stdout}")
+    pack_data = json.loads(result.stdout[json_start:].strip())
+    files = {item["path"] for item in pack_data[0]["files"]}
+    for required_path in [
+        ".codex-plugin/plugin.json",
+        "skills/harness-engine/SKILL.md",
+    ]:
+        if required_path not in files:
+            raise AssertionError(f"npm pack should include {required_path}")
+    forbidden_prefixes = [
+        "skills/google-design-style/",
+        "third_party/google-design-md/",
+    ]
+    for file_path in files:
+        if any(file_path.startswith(prefix) for prefix in forbidden_prefixes):
+            raise AssertionError(f"npm pack should not include Google design source or adapter: {file_path}")
 EVALS = [
     ("empty-repo-init", test_empty_repo_init),
     ("frontend-analysis", test_frontend_analysis),
@@ -1161,6 +1335,11 @@ EVALS = [
     ("evidence-prune-generated-artifacts", test_evidence_prune_generated_artifacts),
     ("eval-report-shape", test_eval_report_shape),
     ("preserve-unmanaged-docs", test_preserve_unmanaged_docs),
+    ("official-google-design-cli-documented", test_official_google_design_cli_documented),
+    ("readme-google-design-dependency-documented", test_readme_google_design_dependency_documented),
+    ("frontend-design-control-plane", test_frontend_design_control_plane),
+    ("plugin-does-not-bundle-google-design", test_plugin_does_not_bundle_google_design),
+    ("pack-excludes-google-source", test_pack_excludes_google_source),
 ]
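
The JSON-locating step in `test_pack_excludes_google_source` can be read in isolation as a small helper. The sketch below is illustrative only: `extract_pack_paths` and `find_forbidden` are hypothetical names that do not appear in the package, and the sample output is invented for the demonstration. It assumes, as the test does, that other text may precede the JSON array on stdout.

```python
import json


def extract_pack_paths(stdout: str) -> set[str]:
    """Return the file paths listed in `npm pack --dry-run --json` output.

    Other text may precede the JSON array, so prefer the last line that
    starts with "[" and fall back to the first "[" anywhere in the output.
    """
    json_start = stdout.rfind("\n[")
    if json_start == -1:
        json_start = stdout.find("[")
    if json_start == -1:
        raise ValueError("no JSON array found in npm pack output")
    pack_data = json.loads(stdout[json_start:].strip())
    return {item["path"] for item in pack_data[0]["files"]}


def find_forbidden(paths: set[str], prefixes: list[str]) -> list[str]:
    """Return, sorted, every path that starts with a forbidden prefix."""
    return sorted(p for p in paths if p.startswith(tuple(prefixes)))


# Invented sample: one line of noise followed by the JSON array.
sample = 'prepack noise\n[{"files": [{"path": "skills/harness-engine/SKILL.md"}]}]\n'
paths = extract_pack_paths(sample)
```

Returning the offending paths as a list, rather than raising on the first match, makes it possible to report every forbidden file in a single failure message.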

package/skills/harness-engine/scripts/manage_harness.py CHANGED

@@ -198,17 +198,77 @@ For each issue:
 DOC_FILES = {
     "docs/DESIGN.md": """{marker}
+---
+version: alpha
+name: {project_name} Design System
+description: Placeholder for the project-owned DESIGN.md. Create the real design system through an official Google DESIGN.md creation path before UI implementation.
+status: design-source-required
+---
 # Design
-## Product Experience Bar
+## Overview
 {frontend_stack_notes}
-## Review Heuristics
+Harness Engine does not generate visual style, choose themes, derive branding, or vendor Google DESIGN.md source. This file is a control point for the real project-owned DESIGN.md.
+Create the actual DESIGN.md through one of the official Google DESIGN.md paths:
+1. Create from a prompt in Stitch: describe the intended vibe, product, audience, and interaction feel. Stitch generates the design system and summarizes it as DESIGN.md.
+2. Derive from branding in Stitch: provide a brand URL or image so Stitch can extract palette, typography, and style patterns into DESIGN.md.
+3. Write it by hand: advanced users can author markdown and optional YAML frontmatter directly.
+Chosen path for this project:
+{design_creation_path}
+After the real design system is created, install and use the official package in this target project:
+```bash
+npm install --save-dev @google/design.md
+npx @google/design.md lint docs/DESIGN.md
+```
+Use upstream examples only as references, not as vendored source:
+```text
+https://github.com/google-labs-code/design.md/tree/main/examples
+```
+## Colors
+Pending real DESIGN.md creation. Fill this from Stitch output, brand extraction, or hand-authored design decisions.
+## Typography
+Pending real DESIGN.md creation. Record font families, hierarchy, body readability, label treatment, and any tabular or technical text rules.
+## Layout
+Pending real DESIGN.md creation. Record grid, spacing rhythm, density, content grouping, responsive behavior, and workflow ergonomics.
+## Elevation & Depth
+Pending real DESIGN.md creation. Record how hierarchy is created through shadows, borders, tonal layers, transparency, or flat contrast.
+## Shapes
+Pending real DESIGN.md creation. Record shape language for buttons, cards, inputs, chips, modals, and fixed-format UI elements.
+## Components
-- Prefer intentional interaction patterns over generic defaults.
-- Keep visual and UX rationale durable in `docs/design-docs/`.
-- Validate meaningful UI work in a real browser before closing it out.
+Pending real DESIGN.md creation. Record treatment for buttons, form fields, navigation, cards or panels, tables or lists, badges, empty states, loading states, and error states.
+## Do's and Don'ts
+- Do replace this placeholder with a real Google DESIGN.md generated by Stitch, derived from branding, or written by hand.
+- Do run `npx @google/design.md lint docs/DESIGN.md` after edits.
+- Do export tokens with `npx @google/design.md export docs/DESIGN.md --format <format>` when the frontend stack consumes generated token files.
+- Do validate meaningful UI work in a real browser before closing it out.
+- Don't edit generated token exports by hand; update `docs/DESIGN.md` and regenerate them.
+- Don't treat this placeholder as an approved visual style.
+- Don't rely on harness-engine to generate, choose, or extract product taste.
 """,
     "docs/FRONTEND.md": """{marker}
 # Frontend
@@ -225,6 +285,17 @@ DOC_FILES = {
 {frontend_validation_loop}
+## Design Style Contract
+- Read `docs/DESIGN.md` before implementing frontend, UI, layout, visual-state, canvas, or interaction work.
+- If `docs/DESIGN.md` has `status: design-source-required` or pending sections, do not treat it as an approved visual style. First create the real DESIGN.md through an official Google path: prompt in Stitch, brand URL/image import in Stitch, or hand-authored markdown/YAML.
+- The project owns `docs/DESIGN.md`; maintain and validate it with the official Google package: `npm install --save-dev @google/design.md`, then `npx @google/design.md lint docs/DESIGN.md`.
+- Treat `docs/DESIGN.md` as the source of truth for UI tokens, colors, typography, spacing, radius, elevation, component treatment, and Do's and Don'ts.
+- Generated design-token files must be derived from `docs/DESIGN.md` with `npx @google/design.md export docs/DESIGN.md --format <format>`.
+- Files controlled by `docs/DESIGN.md` include design token exports under `docs/design-docs/` or `src/styles/`, Tailwind theme files, global CSS variables, component theme modules, Storybook/theme previews, and any UI implementation that consumes those tokens.
+- Agents must read in this order for UI work: `docs/FRONTEND.md`, `docs/DESIGN.md`, generated token exports, then the component or stylesheet being changed.
+- Do not hand-edit generated token exports. Update `docs/DESIGN.md`, regenerate exports with the official CLI, and cite the lint/export command in validation.
 ## Evidence For Meaningful UI Work
 - Capture desktop and mobile evidence for significant UI changes.
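
The Design Style Contract asks agents to stop when `docs/DESIGN.md` still carries `status: design-source-required` or pending sections. A minimal sketch of such a check follows. It is not part of harness-engine: the function names are hypothetical, and it assumes the frontmatter is delimited by bare `---` lines that may be preceded by a marker line, as in the generated template.

```python
PENDING_TEXT = "Pending real DESIGN.md creation"


def design_source_required(design_md: str) -> bool:
    """Return True if the frontmatter still carries the placeholder status."""
    lines = design_md.splitlines()
    try:
        start = lines.index("---")
        end = lines.index("---", start + 1)
    except ValueError:
        return False  # no complete frontmatter block found
    frontmatter = lines[start + 1:end]
    return any(line.strip() == "status: design-source-required" for line in frontmatter)


def is_placeholder(design_md: str) -> bool:
    """Return True if the file has the placeholder status or pending sections."""
    return design_source_required(design_md) or PENDING_TEXT in design_md


placeholder = (
    "<!-- marker -->\n---\nversion: alpha\n"
    "status: design-source-required\n---\n# Design\n"
)
blocked = is_placeholder(placeholder)
```

Restricting the status check to the frontmatter block avoids a false positive when the phrase appears only in body text, such as a quoted rule.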
@@ -336,6 +407,46 @@ DOC_FILES = {
 - Add one document per durable design decision.
 - Link active design decisions from plans and specs.
+""",
+    "docs/design-docs/style-options.md": """{marker}
+# Design System Control
+The project owns `docs/DESIGN.md`. Harness Engine does not generate style or choose themes.
+## Official Creation Paths
+Create the real DESIGN.md through one of the official Google DESIGN.md paths:
+1. **Create from a prompt in Stitch**: describe the intended vibe, product, audience, and interaction feel.
+2. **Derive from branding in Stitch**: provide a brand URL or image so Stitch can extract palette, typography, and style patterns.
+3. **Write it by hand**: author markdown and optional YAML frontmatter directly.
+Use upstream examples only as references:
+```text
+https://github.com/google-labs-code/design.md/tree/main/examples
+```
+After `docs/DESIGN.md` contains the real project design system, install and use the official Google DESIGN.md CLI in this target repository:
+```bash
+npm install --save-dev @google/design.md
+npx @google/design.md lint docs/DESIGN.md
+```
+## Controlled Files
+- `docs/DESIGN.md`: source of truth for design tokens and design rationale.
+- `docs/design-docs/`: design decisions, lint reports, token export notes, and generated design evidence.
+- `src/styles/`, `app/styles/`, or equivalent style directories: CSS variables, Tailwind theme exports, or framework-specific theme modules generated from `docs/DESIGN.md`.
+- Component theme files, Storybook theme previews, and UI implementation files that consume exported tokens.
+## Operating Rules
+- Read `docs/FRONTEND.md` before editing controlled files.
+- Read `docs/DESIGN.md` before changing UI implementation.
+- Do not hand-edit generated token exports; edit `docs/DESIGN.md` and rerun the official CLI export command.
+- Store lint/export outputs under `docs/generated/` or cite the command output in the active plan.
 """,
     "docs/design-docs/core-beliefs.md": """{marker}
 # Core Beliefs
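
The `{design_creation_path}` placeholder in the `docs/DESIGN.md` template is filled from the answers file, with a default when no answer is given. This diff does not show how `manage_harness.py` renders its templates, so the sketch below assumes plain `str.format` substitution. `render_doc`, the trimmed template, and the marker value are hypothetical stand-ins.

```python
# Trimmed stand-in for the docs/DESIGN.md template; only the placeholders matter here.
DESIGN_TEMPLATE = """{marker}
---
name: {project_name} Design System
status: design-source-required
---
Chosen path for this project:
{design_creation_path}
"""

# Default text taken from make_default_answers in this diff.
DEFAULT_ANSWERS = {
    "design_creation_path": (
        "Choose one before UI implementation: create from a prompt in Stitch, "
        "derive from branding URL/image in Stitch, or write docs/DESIGN.md by hand."
    ),
}


def render_doc(template: str, answers: dict[str, str]) -> str:
    """Fill a template, letting explicit answers override the defaults."""
    values = {**DEFAULT_ANSWERS, **answers}
    return template.format(**values)


# The marker value is an assumption made for this example.
base = {"marker": "<!-- managed -->", "project_name": "demo"}
with_default = render_doc(DESIGN_TEMPLATE, base)
with_choice = render_doc(DESIGN_TEMPLATE, {**base, "design_creation_path": "Stitch prompt"})
```

Note that `str.format` treats every brace pair as a placeholder, so a template containing literal braces would need them doubled under this assumption.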
@@ -568,6 +679,11 @@ QUESTION_CATALOG = [
         "prompt": "If there is a frontend, what experience bar, platforms, or UX constraints should the docs enforce?",
         "reason": "Needed for design and frontend policies.",
     },
+    {
+        "id": "design_creation_path",
+        "prompt": "How should the project create its real DESIGN.md: Stitch prompt, Stitch brand URL/image import, or hand-authored markdown/YAML?",
+        "reason": "Needed because harness-engine does not generate visual style; the project must choose an official Google DESIGN.md creation path.",
+    },
     {
         "id": "quality_focus",
         "prompt": "Which product areas or architectural layers deserve the strictest quality scoring?",
@@ -734,6 +850,7 @@ def make_default_answers(analysis):
             if has_frontend
             else "No frontend detected. Replace this if the repo includes UI work."
         ),
+        "design_creation_path": "Choose one before UI implementation: create from a prompt in Stitch, derive from branding URL/image in Stitch, or write docs/DESIGN.md by hand.",
         "quality_focus": "List the product areas and architectural layers that deserve the strictest quality bar.",
         "frontend_scope": frontend_scope,
         "frontend_validation_loop": frontend_validation_loop,