npm - @harness-engineering/cli - Versions diffs - 1.13.1 → 1.14.0 - Mend

@harness-engineering/cli 1.13.1 → 1.14.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (139)

package/dist/agents/skills/claude-code/harness-brainstorming/SKILL.md CHANGED

@@ -259,6 +259,45 @@ For each proposed approach, evaluate from each perspective:
 Converge on a recommendation that addresses all concerns before presenting the design.
+## Session State
+This skill reads and writes to the following session sections via `manage_state`:
+| Section       | Read | Write | Purpose                                                           |
+| ------------- | ---- | ----- | ----------------------------------------------------------------- |
+| terminology   | yes  | yes   | Captures domain terms discovered during brainstorming             |
+| decisions     | no   | yes   | Records design decisions made during exploration                  |
+| constraints   | yes  | no    | Reads constraints to scope brainstorming                          |
+| risks         | no   | yes   | Captures risks identified during brainstorming                    |
+| openQuestions | yes  | yes   | Adds new questions, resolves answered ones                        |
+| evidence      | no   | yes   | Cites sources for design recommendations and prior art references |
+**When to write:** After each phase transition (EXPLORE -> EVALUATE -> PRIORITIZE -> VALIDATE), append relevant entries to the appropriate sections. This ensures downstream skills (planning, execution) inherit accumulated context without re-discovery.
+**When to read:** At the start of Phase 1 (EXPLORE), read `terminology` and `constraints` from the session to inherit context from prior skills or previous brainstorming sessions on the same feature.
+## Evidence Requirements
+When this skill makes claims about existing code behavior, architecture patterns, or technical tradeoffs, it MUST cite evidence using one of:
+1. **File reference:** `file:line` format (e.g., `src/services/auth.ts:42` -- "existing JWT middleware handles token refresh")
+2. **Prior art reference:** `file` format with description (e.g., `src/utils/email.ts` -- "email utility already exists, can be reused for notifications")
+3. **Documentation reference:** `docs/path` format (e.g., `docs/changes/user-auth/proposal.md` -- "prior spec established OAuth2 as the auth standard")
+4. **Session evidence:** Write to the `evidence` session section:
+   ```json
+   manage_state({
+     action: "append_entry",
+     session: "<current-session>",
+     section: "evidence",
+     authorSkill: "harness-brainstorming",
+     content: "src/services/auth.ts:42 -- existing JWT middleware supports refresh tokens"
+   })
+   ```
+**When to cite:** During Phase 1 (EXPLORE) when referencing existing code or patterns. During Phase 3 (PRIORITIZE) when justifying tradeoffs with concrete code references. During Phase 4 (VALIDATE) when spec references existing implementation details.
+**Uncited claims:** Technical assertions without citations MUST be prefixed with `[UNVERIFIED]`. Example: `[UNVERIFIED] The current auth middleware does not support refresh tokens`. Uncited claims are flagged during review (Wave 2.2).
 ## Harness Integration
 - **`harness validate`** — Run after writing the spec to `docs/`. Verifies project health and that the new spec file is properly placed.
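
The Session State table added to harness-brainstorming above can be read as a small access matrix. The following is a minimal sketch, assuming a hypothetical `canAccess` guard and `sectionsFor` helper that are not part of the package; only the section names and yes/no values are taken from the table.

```typescript
// Section names are copied from the Session State table; the types and helpers are hypothetical.
type Section =
  | "terminology"
  | "decisions"
  | "constraints"
  | "risks"
  | "openQuestions"
  | "evidence";

interface Access {
  read: boolean;
  write: boolean;
}

// Read/write permissions for harness-brainstorming, transcribed from the table.
const brainstormingAccess: Record<Section, Access> = {
  terminology: { read: true, write: true },
  decisions: { read: false, write: true },
  constraints: { read: true, write: false },
  risks: { read: false, write: true },
  openQuestions: { read: true, write: true },
  evidence: { read: false, write: true },
};

// Returns true when the skill is allowed to use the section in the given mode.
function canAccess(section: Section, mode: keyof Access): boolean {
  return brainstormingAccess[section][mode];
}

// Lists the sections available in the given mode, in table order.
function sectionsFor(mode: keyof Access): Section[] {
  return (Object.keys(brainstormingAccess) as Section[]).filter(
    (section) => brainstormingAccess[section][mode]
  );
}
```

A caller could use `sectionsFor("read")` to decide what to load at the start of Phase 1, and `canAccess(section, "write")` before appending an entry.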

package/dist/agents/skills/claude-code/harness-code-review/SKILL.md CHANGED

@@ -138,6 +138,24 @@ Run mechanical checks to establish an exclusion boundary. Any issue caught mecha
 **Output:** A set of mechanical findings (file, line, tool, message). This set becomes the exclusion list for Phase 5.
+#### Evidence Gate (session-aware)
+When a `sessionSlug` is available (e.g., via autopilot dispatch or `--session` flag), the pipeline loads evidence entries from the session state and cross-references them with review findings:
+1. Load evidence entries: `readSessionSection(projectRoot, sessionSlug, 'evidence')`
+2. For each finding, check if any active evidence entry references the same file:line location
+3. Findings without matching evidence are tagged with `[UNVERIFIED]` prefix in their title
+4. An evidence coverage report is appended to the review output:
+   ```
+   Evidence Coverage:
+     Evidence entries: 12
+     Findings with evidence: 8/10
+     Uncited findings: 2 (flagged as [UNVERIFIED])
+     Coverage: 80%
+   ```
+When no session is available, evidence checking is skipped silently. This is not an error -- evidence checking enhances reviews but does not gate them.
 **Exit:** If any mechanical check fails (harness validate, typecheck, or tests), report the mechanical failures in Strengths/Issues/Assessment format and stop the pipeline. The code has fundamental issues that must be fixed before AI review adds value. Lint warnings and security scan findings do not stop the pipeline — they are recorded for exclusion only.
 ---
@@ -628,6 +646,32 @@ _This section is not part of the pipeline. It documents the process for respondi
 ---
+## Evidence Requirements
+When this skill produces review findings, every finding MUST include evidence citations. The `ReviewFinding.evidence` array field already exists in the finding schema -- this section defines the citation standard for populating it.
+Every review finding MUST cite evidence using one of:
+1. **File reference:** `file:line` format (e.g., `src/api/routes/users.ts:12-15` -- "direct import from db/queries.ts bypasses service layer")
+2. **Diff evidence:** Before/after code from the PR diff with file path and line numbers
+3. **Dependency chain:** Import path showing the violation (e.g., `routes/users.ts:3 imports db/queries.ts` -- "violates routes -> services -> db layer direction")
+4. **Test evidence:** Include test command and output when findings relate to missing or failing tests
+5. **Convention reference:** Cite the specific convention file and rule (e.g., `AGENTS.md:45` -- "convention requires services layer between routes and db")
+6. **Session evidence:** Write significant findings to the `evidence` session section:
+   ```json
+   manage_state({
+     action: "append_entry",
+     session: "<current-session>",
+     section: "evidence",
+     authorSkill: "harness-code-review",
+     content: "src/api/routes/users.ts:12-15 -- layer violation: direct import from db/queries.ts"
+   })
+   ```
+**When to cite:** In Phase 4 (FAN-OUT), each subagent populates the `evidence` array in every `ReviewFinding`. In Phase 5 (VALIDATE), evidence is used to verify reachability claims. In Phase 7 (OUTPUT), every issue in the review includes its file:line location and rationale backed by evidence.
+**Uncited claims:** Review findings without evidence in the `evidence` array are discarded during Phase 5 (VALIDATE). Observations that cannot be tied to specific file:line references MUST be prefixed with `[UNVERIFIED]` and downgraded to `severity: 'suggestion'`.
 ## Harness Integration
 - **`assess_project`** — Used in Phase 2 (MECHANICAL) to run `validate`, `deps`, and `docs` checks in parallel. Must pass for the pipeline to continue to AI review. Failures are Critical issues that stop the pipeline.
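
The Evidence Gate steps added earlier in this file (load evidence, match each finding by file:line, tag unmatched findings, report coverage) can be sketched as below. This is an illustration under stated assumptions, not the package's implementation: the `Finding` and `EvidenceEntry` shapes, the `active` flag, the substring matching rule, counting only active entries, and treating zero findings as 100% coverage are all hypothetical choices.

```typescript
// Hypothetical shapes; the real ReviewFinding and evidence entry types are not shown in the diff.
interface Finding {
  title: string;
  file: string;
  line: number;
}

interface EvidenceEntry {
  content: string; // e.g. "src/api/routes/users.ts:12-15 -- layer violation"
  active: boolean;
}

interface GateResult {
  findings: Finding[];
  coveragePercent: number;
  report: string;
}

// True when content mentions location and the match is not followed by another digit,
// so "a.ts:4" does not match "a.ts:42" but does match "a.ts:4-9".
function referencesLocation(content: string, location: string): boolean {
  let idx = content.indexOf(location);
  while (idx !== -1) {
    const next = content.charAt(idx + location.length);
    if (next === "" || next < "0" || next > "9") return true;
    idx = content.indexOf(location, idx + 1);
  }
  return false;
}

function applyEvidenceGate(findings: Finding[], evidence: EvidenceEntry[]): GateResult {
  const active = evidence.filter((entry) => entry.active);
  let cited = 0;
  const tagged = findings.map((finding) => {
    const location = `${finding.file}:${finding.line}`;
    if (active.some((entry) => referencesLocation(entry.content, location))) {
      cited += 1;
      return finding;
    }
    // No matching evidence: prefix the title, leave everything else untouched.
    return { ...finding, title: `[UNVERIFIED] ${finding.title}` };
  });
  const total = findings.length;
  // Assumption: an empty finding list counts as fully covered.
  const coveragePercent = total === 0 ? 100 : Math.round((cited / total) * 100);
  const report = [
    "Evidence Coverage:",
    `  Evidence entries: ${active.length}`,
    `  Findings with evidence: ${cited}/${total}`,
    `  Uncited findings: ${total - cited} (flagged as [UNVERIFIED])`,
    `  Coverage: ${coveragePercent}%`,
  ].join("\n");
  return { findings: tagged, coveragePercent, report };
}
```

Because the gate only tags findings and never removes them, it stays consistent with the statement that evidence checking enhances reviews but does not gate them.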

package/dist/agents/skills/claude-code/harness-execution/SKILL.md CHANGED

@@ -345,6 +345,50 @@ These are non-negotiable. When any condition is met, stop immediately.
 - **Three consecutive failures on the same task.** After 3 attempts, the task design is likely wrong. Stop. Report: "Task N has failed 3 times. Root cause: [analysis]. The plan may need revision."
+## Session State
+This skill reads and writes to the following session sections via `manage_state`:
+| Section       | Read | Write | Purpose                                                                               |
+| ------------- | ---- | ----- | ------------------------------------------------------------------------------------- |
+| terminology   | yes  | yes   | Reads domain terms for consistent naming; adds terms discovered during implementation |
+| decisions     | yes  | yes   | Reads planning decisions for context; records implementation decisions                |
+| constraints   | yes  | yes   | Reads constraints to respect boundaries; adds constraints discovered during coding    |
+| risks         | yes  | yes   | Reads risks for awareness; updates risk status as mitigated or realized               |
+| openQuestions | yes  | yes   | Reads questions for context; resolves questions answered by implementation            |
+| evidence      | yes  | yes   | Reads prior evidence; writes file:line citations, test outputs, and diff references   |
+**When to write:** After each task completion, append relevant entries. Evidence entries should be written for every significant technical assertion (test result, file reference, performance measurement). Mark openQuestions as resolved when implementation answers them.
+**When to read:** During Phase 1 (PREPARE), read all sections via `gather_context` with `include: ["sessions"]` to inherit full accumulated context from brainstorming and planning.
+## Evidence Requirements
+When this skill makes claims about task completion, test results, or code behavior, it MUST cite evidence using one of:
+1. **File reference:** `file:line` format (e.g., `src/services/notification-service.ts:42` -- "create method implemented with validation")
+2. **Test output:** Include the actual test command and its output:
+   ```
+   $ npx vitest run src/services/notification-service.test.ts
+   PASS  src/services/notification-service.test.ts (8 tests)
+   ```
+3. **Diff evidence:** Before/after with file path for modifications to existing files
+4. **Harness output:** Include `harness validate` output as evidence of project health
+5. **Session evidence:** Write to the `evidence` session section after each task:
+   ```json
+   manage_state({
+     action: "append_entry",
+     session: "<current-session>",
+     section: "evidence",
+     authorSkill: "harness-execution",
+     content: "src/services/notification-service.ts:42 -- create method returns Notification with all required fields"
+   })
+   ```
+**When to cite:** After every task completion in Phase 2 (EXECUTE). Every commit message claim ("added X", "fixed Y") must be backed by test output or file reference. During Phase 4 (PERSIST) when writing learnings that reference specific code behavior.
+**Uncited claims:** Technical assertions without citations MUST be prefixed with `[UNVERIFIED]`. Example: `[UNVERIFIED] The notification service handles duplicate entries`. Uncited claims are flagged during review (Wave 2.2).
 ## Harness Integration
 - **`harness validate`** — Run after every task completion. Mandatory. No task is complete without a passing validation.
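
The "Uncited claims" rule added to harness-execution above could be applied mechanically along these lines. This is a rough sketch, assuming a hypothetical `markUncited` helper and a regular expression of my own; it recognizes only the `file:line` citation form, whereas the skill text also accepts test output, diff evidence, and harness output.

```typescript
// Matches a path with a file extension followed by :line or :start-end,
// e.g. src/services/notification-service.ts:42 or src/api/routes/users.ts:12-15.
const FILE_LINE = /[\w./-]+\.\w+:\d+(?:-\d+)?/;

// Prefixes an assertion with [UNVERIFIED] unless it already carries a file:line
// citation or is already marked. Hypothetical helper, not part of the package.
function markUncited(assertion: string): string {
  if (FILE_LINE.test(assertion) || assertion.startsWith("[UNVERIFIED]")) {
    return assertion;
  }
  return `[UNVERIFIED] ${assertion}`;
}
```

Checking for an existing prefix keeps the helper idempotent, so running it twice over the same notes does not stack markers.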

package/dist/agents/skills/claude-code/harness-planning/SKILL.md CHANGED

@@ -312,6 +312,45 @@ One sentence.
 ````
+## Session State
+This skill reads and writes to the following session sections via `manage_state`:
+| Section | Read | Write | Purpose |
+|---------|------|-------|---------|
+| terminology | yes | no | Reads domain terms to use consistent language in plan |
+| decisions | yes | yes | Reads brainstorming decisions; records planning-phase decisions |
+| constraints | yes | yes | Reads existing constraints; adds constraints discovered during decomposition |
+| risks | yes | yes | Reads existing risks; adds implementation risks identified during task design |
+| openQuestions | yes | yes | Reads unresolved questions; adds new questions, resolves answered ones |
+| evidence | yes | yes | Reads prior evidence from brainstorming; writes file:line citations for task specifications |
+**When to write:** During Phase 1 (SCOPE) write newly discovered constraints and risks. During Phase 2 (DECOMPOSE) write decisions about task structure and sequencing. Mark resolved questions during Phase 4 (VALIDATE).
+**When to read:** At the start of Phase 1 (SCOPE), read all sections via `gather_context` with `include: ["sessions"]` to inherit context from brainstorming. Use terminology for consistent naming in task descriptions.
+## Evidence Requirements
+When this skill makes claims about existing code structure, file locations, or implementation patterns in task specifications, it MUST cite evidence using one of:
+1. **File reference:** `file:line` format (e.g., `src/services/index.ts:15` -- "barrel export exists, will add new export here")
+2. **Code pattern reference:** `file:line` format with pattern description (e.g., `src/services/user-service.ts:1-30` -- "existing service follows constructor injection pattern, new service will match")
+3. **Test output:** Include the command and its observed output when referencing current test state
+4. **Session evidence:** Write to the `evidence` session section:
+   ```json
+   manage_state({
+     action: "append_entry",
+     session: "<current-session>",
+     section: "evidence",
+     authorSkill: "harness-planning",
+     content: "src/services/index.ts:15 -- barrel export pattern confirmed for new service integration"
+   })
+   ```
+**When to cite:** During Phase 1 (SCOPE) when referencing existing files for observable truths. During Phase 2 (DECOMPOSE) when specifying exact file paths and code patterns in task instructions. When the file map references existing files for modification.
+**Uncited claims:** Technical assertions about existing code without citations MUST be prefixed with `[UNVERIFIED]`. Example: `[UNVERIFIED] The service barrel exports all services`. Uncited claims are flagged during review (Wave 2.2).
 ## Harness Integration
 - **`harness validate`** — Run during Phase 4 (before writing the plan) and included as a step in every task.

package/dist/agents/skills/claude-code/harness-release-readiness/SKILL.md CHANGED

@@ -111,14 +111,14 @@ Run every check below. Record each as **pass**, **warn**, or **fail**:
 | `test` script exists in root `package.json`                     | fail                |
 | `lint` script exists in root `package.json`                     | fail                |
 | `typecheck` or `tsc` script exists in root `package.json`       | fail                |
-| `assess_project` passes (harness health + lint gate)            | fail                |
+| `assess_project` passes (full harness CI gate)                  | fail                |
 For the `assess_project` check, run it with all harness-specific checks including lint:
 ```json
 assess_project({
   path: "<project-root>",
-  checks: ["validate", "deps", "docs", "lint"],
+  checks: ["validate", "deps", "docs", "lint", "perf", "security", "entropy", "arch"],
   mode: "detailed"
 })
 ```
@@ -519,7 +519,7 @@ This framing is informational — it does not block anything. It gives the team
 ## Harness Integration
-- **`assess_project`** — Used in AUDIT Phase 1 (CI/CD section) to run harness validation, dependency checks, doc coverage, and lint in a single parallel call. Also run after auto-fixes in Phase 3 to verify project health. Automatically inherits new checks added to `assess_project`.
+- **`assess_project`** — Used in AUDIT Phase 1 (CI/CD section) to run the full harness CI gate (validation, dependencies, docs, lint, performance/complexity, security, entropy, and architecture) in a single parallel call. Also run after auto-fixes in Phase 3 to verify project health. Automatically inherits new checks added to `assess_project`.
 - **Sub-skill invocations** — Phase 2 dispatches `detect-doc-drift`, `cleanup-dead-code`, `enforce-architecture`, and `diagnostics` as parallel agents. Phase 3 delegates fixes to `align-documentation` and `cleanup-dead-code`.
 - **State file** — `.harness/release-readiness.json` enables session resumption and progress tracking. This file is read at the start of each invocation and written at the end.
 - **Report file** — `release-readiness-report.md` is written to the project root. It is a snapshot, not a tracked artifact — regenerate it on each run.

package/dist/agents/skills/claude-code/harness-verification/SKILL.md CHANGED

@@ -291,6 +291,41 @@ When verifying a bug fix, apply this extended protocol:
 If step 4 passes (test does not fail without the fix), the test is not a valid regression test. It does not catch the bug. Rewrite it.
+## Evidence Requirements
+This skill is the primary evidence producer in the workflow. Every pass/fail assertion in the verification report MUST include concrete evidence. The words "should", "probably", and "seems to" are already forbidden by the Iron Law -- this section defines HOW to cite evidence.
+Every verification claim MUST use one of:
+1. **File reference:** `file:line` format with observed content (e.g., `src/services/user-service.ts:42` -- "create method validates email format before insert")
+2. **Test output:** Include the actual test command and its complete output:
+   ```
+   $ npx vitest run src/services/user-service.test.ts
+   PASS  src/services/user-service.test.ts
+     UserService
+       create (4 tests)
+       list (3 tests)
+       expiry (2 tests)
+   Tests: 9 passed, 9 total
+   ```
+3. **Harness output:** Include full `harness validate` and `harness check-deps` output
+4. **Anti-pattern scan output:** Include the actual grep/search command and results (or absence of results)
+5. **Import chain evidence:** Include the actual import statements found when verifying WIRED level
+6. **Session evidence:** Write to the `evidence` session section for each verification level:
+   ```json
+   manage_state({
+     action: "append_entry",
+     session: "<current-session>",
+     section: "evidence",
+     authorSkill: "harness-verification",
+     content: "[EXISTS:PASS] src/services/user-service.ts (189 lines) -- verified via direct file read"
+   })
+   ```
+**When to cite:** At every verification level. Level 1 (EXISTS) cites file reads. Level 2 (SUBSTANTIVE) cites specific line content. Level 3 (WIRED) cites import statements, test execution output, and harness check output. The verification report format already requires `[PASS]`/`[FAIL]` markers -- each marker must be accompanied by the evidence that produced it.
+**Uncited claims:** ANY verification assertion without direct evidence is a verification failure, not merely an uncited claim. This skill does not use `[UNVERIFIED]` -- if evidence cannot be produced, the verdict is FAIL or INCOMPLETE.
 ## Harness Integration
 - **`harness validate`** — Run in Level 3 WIRED check. Verifies project-wide health and constraint compliance.
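
The verification rule added above (no `[UNVERIFIED]`; missing evidence means FAIL or INCOMPLETE) might be encoded as follows. This is only a sketch: the `Check` shape and `toEvidenceEntry` helper are hypothetical, the diff shows just the `[EXISTS:PASS]` marker so the other `[LEVEL:VERDICT]` combinations are extrapolated, and mapping missing evidence to INCOMPLETE rather than FAIL is an assumption.

```typescript
type Level = "EXISTS" | "SUBSTANTIVE" | "WIRED";
type Verdict = "PASS" | "FAIL" | "INCOMPLETE";

// Hypothetical input shape for one verification check.
interface Check {
  level: Level;
  passed: boolean;
  subject: string; // e.g. "src/services/user-service.ts (189 lines)"
  evidence?: string; // e.g. "verified via direct file read"
}

// Builds the content string for an evidence entry. A check without evidence is
// never reported as PASS, regardless of the passed flag.
function toEvidenceEntry(check: Check): string {
  const evidence = (check.evidence || "").trim();
  const verdict: Verdict =
    evidence === "" ? "INCOMPLETE" : check.passed ? "PASS" : "FAIL";
  const detail = evidence === "" ? "no evidence produced" : evidence;
  return `[${check.level}:${verdict}] ${check.subject} -- ${detail}`;
}
```

The returned string is meant to be passed as the `content` of an evidence entry like the one shown in the diff.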

package/dist/agents/skills/claude-code/initialize-harness-project/SKILL.md CHANGED

@@ -20,6 +20,8 @@
 2. **For new projects:** Gather project context — language, framework, test runner, build tool. Ask the human if any of these are undecided. Do not assume defaults.
+2b. **For existing projects with detectable frameworks:** Run `harness init` without flags first. The command auto-detects frameworks (FastAPI, Django, Gin, Axum, Spring Boot, Next.js, React+Vite, Vue, Express, NestJS) by scanning project files. Present the detection result to the human and ask for confirmation before proceeding. If detection fails, ask the human to specify `--framework` manually.
 3. **For existing projects:** Run `harness validate` to see what is already configured and what is missing. Read `AGENTS.md` if it exists. Identify the current adoption level:
    - **Basic:** Has `AGENTS.md` and `harness.yaml` with project metadata. No layers, no skills, no dependency constraints.
    - **Intermediate:** Has layers defined, dependency constraints between layers, at least one custom skill. `harness check-deps` runs and passes.
@@ -30,11 +32,17 @@
 ### Phase 2: SCAFFOLD — Generate Project Structure
 1. **Run `harness init` with the appropriate flags:**
-   - New basic project: `harness init --level basic --framework <framework>`
-   - New intermediate project: `harness init --level intermediate --framework <framework>`
+   - New basic JS/TS project: `harness init --level basic`
+   - With framework: `harness init --level basic --framework <framework>`
+   - Non-JS language: `harness init --language <python|go|rust|java>`
+   - Non-JS with framework: `harness init --framework <fastapi|django|gin|axum|spring-boot>`
+   - Existing project (auto-detect): `harness init` (no flags -- auto-detection runs)
    - Migration to intermediate: `harness init --level intermediate --migrate`
    - Migration to advanced: `harness init --level advanced --migrate`
+   **Supported frameworks:** nextjs, react-vite, vue, express, nestjs, fastapi, django, gin, axum, spring-boot
+   **Supported languages:** typescript, python, go, rust, java
 2. **Review generated files.** `harness init` creates:
    - `harness.yaml` — Project configuration (name, stack, adoption level)
    - `.harness/` directory — State and learnings storage
@@ -93,7 +101,7 @@ This creates the `.harness/graph/` directory and populates it with the project's
 ## Harness Integration
-- **`harness init --level <level> --framework <framework>`** — Scaffold a new project at the specified adoption level.
+- **`harness init --level <level> [--framework <framework>] [--language <language>]`** — Scaffold a new project. `--framework` infers language automatically. `--language` without `--framework` gives a bare language scaffold. Running without flags on an existing project directory triggers auto-detection.
 - **`harness init --level <level> --migrate`** — Migrate an existing project to the next adoption level, preserving existing configuration.
 - **`harness persona generate`** — Generate persona definitions based on project stack and team structure.
 - **`harness validate`** — Verify the full project configuration is valid and complete.
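
The new `harness init` flag combinations described above can be summarized in a small argument builder. This is a sketch under assumptions: `buildInitArgs` and `frameworkLanguage` are hypothetical names, mapping the JS frameworks to `typescript` is my reading of the "Supported languages" list, and rejecting a mismatched `--language` is an illustrative choice rather than documented CLI behavior. Only the flag names and the framework and language values come from the diff.

```typescript
type Framework =
  | "nextjs" | "react-vite" | "vue" | "express" | "nestjs"
  | "fastapi" | "django" | "gin" | "axum" | "spring-boot";
type Language = "typescript" | "python" | "go" | "rust" | "java";
type AdoptionLevel = "basic" | "intermediate" | "advanced";

// Assumed framework-to-language mapping (the diff only says --framework infers the language).
const frameworkLanguage: Record<Framework, Language> = {
  nextjs: "typescript",
  "react-vite": "typescript",
  vue: "typescript",
  express: "typescript",
  nestjs: "typescript",
  fastapi: "python",
  django: "python",
  gin: "go",
  axum: "rust",
  "spring-boot": "java",
};

interface InitOptions {
  level?: AdoptionLevel;
  framework?: Framework;
  language?: Language;
  migrate?: boolean;
}

// Builds the argument list for `harness init`. With no options it returns just
// ["init"], which per the diff triggers auto-detection on an existing project.
function buildInitArgs(opts: InitOptions): string[] {
  const args: string[] = ["init"];
  if (opts.level) args.push("--level", opts.level);
  if (opts.framework) {
    if (opts.language && frameworkLanguage[opts.framework] !== opts.language) {
      throw new Error(`${opts.framework} does not match language ${opts.language}`);
    }
    // --framework infers the language, so --language is omitted.
    args.push("--framework", opts.framework);
  } else if (opts.language) {
    args.push("--language", opts.language);
  }
  if (opts.migrate) args.push("--migrate");
  return args;
}
```

For example, `buildInitArgs({ level: "intermediate", migrate: true })` yields the arguments for the "Migration to intermediate" case listed in Phase 2.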

package/dist/agents/skills/gemini-cli/harness-brainstorming/SKILL.md CHANGED

@@ -259,6 +259,45 @@ For each proposed approach, evaluate from each perspective:
 Converge on a recommendation that addresses all concerns before presenting the design.
+## Session State
+This skill reads and writes to the following session sections via `manage_state`:
+| Section       | Read | Write | Purpose                                                           |
+| ------------- | ---- | ----- | ----------------------------------------------------------------- |
+| terminology   | yes  | yes   | Captures domain terms discovered during brainstorming             |
+| decisions     | no   | yes   | Records design decisions made during exploration                  |
+| constraints   | yes  | no    | Reads constraints to scope brainstorming                          |
+| risks         | no   | yes   | Captures risks identified during brainstorming                    |
+| openQuestions | yes  | yes   | Adds new questions, resolves answered ones                        |
+| evidence      | no   | yes   | Cites sources for design recommendations and prior art references |
+**When to write:** After each phase transition (EXPLORE -> EVALUATE -> PRIORITIZE -> VALIDATE), append relevant entries to the appropriate sections. This ensures downstream skills (planning, execution) inherit accumulated context without re-discovery.
+**When to read:** At the start of Phase 1 (EXPLORE), read `terminology` and `constraints` from the session to inherit context from prior skills or previous brainstorming sessions on the same feature.
+## Evidence Requirements
+When this skill makes claims about existing code behavior, architecture patterns, or technical tradeoffs, it MUST cite evidence using one of:
+1. **File reference:** `file:line` format (e.g., `src/services/auth.ts:42` -- "existing JWT middleware handles token refresh")
+2. **Prior art reference:** `file` format with description (e.g., `src/utils/email.ts` -- "email utility already exists, can be reused for notifications")
+3. **Documentation reference:** `docs/path` format (e.g., `docs/changes/user-auth/proposal.md` -- "prior spec established OAuth2 as the auth standard")
+4. **Session evidence:** Write to the `evidence` session section:
+   ```json
+   manage_state({
+     action: "append_entry",
+     session: "<current-session>",
+     section: "evidence",
+     authorSkill: "harness-brainstorming",
+     content: "src/services/auth.ts:42 -- existing JWT middleware supports refresh tokens"
+   })
+   ```
+**When to cite:** During Phase 1 (EXPLORE) when referencing existing code or patterns. During Phase 3 (PRIORITIZE) when justifying tradeoffs with concrete code references. During Phase 4 (VALIDATE) when spec references existing implementation details.
+**Uncited claims:** Technical assertions without citations MUST be prefixed with `[UNVERIFIED]`. Example: `[UNVERIFIED] The current auth middleware does not support refresh tokens`. Uncited claims are flagged during review (Wave 2.2).
 ## Harness Integration
 - **`harness validate`** — Run after writing the spec to `docs/`. Verifies project health and that the new spec file is properly placed.

package/dist/agents/skills/gemini-cli/harness-code-review/SKILL.md CHANGED

@@ -138,6 +138,24 @@ Run mechanical checks to establish an exclusion boundary. Any issue caught mecha
 **Output:** A set of mechanical findings (file, line, tool, message). This set becomes the exclusion list for Phase 5.
+#### Evidence Gate (session-aware)
+When a `sessionSlug` is available (e.g., via autopilot dispatch or `--session` flag), the pipeline loads evidence entries from the session state and cross-references them with review findings:
+1. Load evidence entries: `readSessionSection(projectRoot, sessionSlug, 'evidence')`
+2. For each finding, check if any active evidence entry references the same file:line location
+3. Findings without matching evidence are tagged with `[UNVERIFIED]` prefix in their title
+4. An evidence coverage report is appended to the review output:
+   ```
+   Evidence Coverage:
+     Evidence entries: 12
+     Findings with evidence: 8/10
+     Uncited findings: 2 (flagged as [UNVERIFIED])
+     Coverage: 80%
+   ```
+When no session is available, evidence checking is skipped silently. This is not an error -- evidence checking enhances reviews but does not gate them.
 **Exit:** If any mechanical check fails (harness validate, typecheck, or tests), report the mechanical failures in Strengths/Issues/Assessment format and stop the pipeline. The code has fundamental issues that must be fixed before AI review adds value. Lint warnings and security scan findings do not stop the pipeline — they are recorded for exclusion only.
 ---
@@ -628,6 +646,32 @@ _This section is not part of the pipeline. It documents the process for respondi
 ---
+## Evidence Requirements
+When this skill produces review findings, every finding MUST include evidence citations. The `ReviewFinding.evidence` array field already exists in the finding schema -- this section defines the citation standard for populating it.
+Every review finding MUST cite evidence using one of:
+1. **File reference:** `file:line` format (e.g., `src/api/routes/users.ts:12-15` -- "direct import from db/queries.ts bypasses service layer")
+2. **Diff evidence:** Before/after code from the PR diff with file path and line numbers
+3. **Dependency chain:** Import path showing the violation (e.g., `routes/users.ts:3 imports db/queries.ts` -- "violates routes -> services -> db layer direction")
+4. **Test evidence:** Include test command and output when findings relate to missing or failing tests
+5. **Convention reference:** Cite the specific convention file and rule (e.g., `AGENTS.md:45` -- "convention requires services layer between routes and db")
+6. **Session evidence:** Write significant findings to the `evidence` session section:
+   ```json
+   manage_state({
+     action: "append_entry",
+     session: "<current-session>",
+     section: "evidence",
+     authorSkill: "harness-code-review",
+     content: "src/api/routes/users.ts:12-15 -- layer violation: direct import from db/queries.ts"
+   })
+   ```
+**When to cite:** In Phase 4 (FAN-OUT), each subagent populates the `evidence` array in every `ReviewFinding`. In Phase 5 (VALIDATE), evidence is used to verify reachability claims. In Phase 7 (OUTPUT), every issue in the review includes its file:line location and rationale backed by evidence.
+**Uncited claims:** Review findings without evidence in the `evidence` array are discarded during Phase 5 (VALIDATE). Observations that cannot be tied to specific file:line references MUST be prefixed with `[UNVERIFIED]` and downgraded to `severity: 'suggestion'`.
 ## Harness Integration
 - **`assess_project`** — Used in Phase 2 (MECHANICAL) to run `validate`, `deps`, and `docs` checks in parallel. Must pass for the pipeline to continue to AI review. Failures are Critical issues that stop the pipeline.

package/dist/agents/skills/gemini-cli/harness-execution/SKILL.md CHANGED

@@ -345,6 +345,50 @@ These are non-negotiable. When any condition is met, stop immediately.
 - **Three consecutive failures on the same task.** After 3 attempts, the task design is likely wrong. Stop. Report: "Task N has failed 3 times. Root cause: [analysis]. The plan may need revision."
+## Session State
+This skill reads and writes to the following session sections via `manage_state`:
+| Section       | Read | Write | Purpose                                                                               |
+| ------------- | ---- | ----- | ------------------------------------------------------------------------------------- |
+| terminology   | yes  | yes   | Reads domain terms for consistent naming; adds terms discovered during implementation |
+| decisions     | yes  | yes   | Reads planning decisions for context; records implementation decisions                |
+| constraints   | yes  | yes   | Reads constraints to respect boundaries; adds constraints discovered during coding    |
+| risks         | yes  | yes   | Reads risks for awareness; updates risk status as mitigated or realized               |
+| openQuestions | yes  | yes   | Reads questions for context; resolves questions answered by implementation            |
+| evidence      | yes  | yes   | Reads prior evidence; writes file:line citations, test outputs, and diff references   |
+**When to write:** After each task completion, append relevant entries. Evidence entries should be written for every significant technical assertion (test result, file reference, performance measurement). Mark openQuestions as resolved when implementation answers them.
+**When to read:** During Phase 1 (PREPARE), read all sections via `gather_context` with `include: ["sessions"]` to inherit full accumulated context from brainstorming and planning.
+## Evidence Requirements
+When this skill makes claims about task completion, test results, or code behavior, it MUST cite evidence using one of:
+1. **File reference:** `file:line` format (e.g., `src/services/notification-service.ts:42` -- "create method implemented with validation")
+2. **Test output:** Include the actual test command and its output:
+   ```
+   $ npx vitest run src/services/notification-service.test.ts
+   PASS  src/services/notification-service.test.ts (8 tests)
+   ```
+3. **Diff evidence:** Before/after with file path for modifications to existing files
+4. **Harness output:** Include `harness validate` output as evidence of project health
+5. **Session evidence:** Write to the `evidence` session section after each task:
+   ```json
+   manage_state({
+     action: "append_entry",
+     session: "<current-session>",
+     section: "evidence",
+     authorSkill: "harness-execution",
+     content: "src/services/notification-service.ts:42 -- create method returns Notification with all required fields"
+   })
+   ```
+**When to cite:** After every task completion in Phase 2 (EXECUTE). Every commit message claim ("added X", "fixed Y") must be backed by test output or file reference. During Phase 4 (PERSIST) when writing learnings that reference specific code behavior.
+**Uncited claims:** Technical assertions without citations MUST be prefixed with `[UNVERIFIED]`. Example: `[UNVERIFIED] The notification service handles duplicate entries`. Uncited claims are flagged during review (Wave 2.2).
 ## Harness Integration
 - **`harness validate`** — Run after every task completion. Mandatory. No task is complete without a passing validation.
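
Reviewer note: the citation rule added in this hunk (a `file:line` reference, otherwise an `[UNVERIFIED]` prefix) can be illustrated with a small check. This is a minimal sketch only. `tagClaim` and the `CITATION` pattern are hypothetical names invented for illustration and are not part of the harness CLI or of `manage_state`.

```typescript
// Hypothetical helper, not part of the harness CLI: prefixes claims that carry
// no file:line citation with [UNVERIFIED], as the added text requires.
// Matches e.g. src/services/index.ts:15 or src/services/user-service.ts:1-30.
const CITATION = /\S+\.\w+:\d+(-\d+)?/;

function tagClaim(claim: string): string {
  // Leave cited claims and already-tagged claims untouched.
  if (CITATION.test(claim) || claim.startsWith("[UNVERIFIED]")) {
    return claim;
  }
  return `[UNVERIFIED] ${claim}`;
}

const cited = tagClaim(
  "src/services/notification-service.ts:42 -- create method implemented with validation"
);
const uncited = tagClaim("The notification service handles duplicate entries");

console.log(cited);
console.log(uncited);
```

The pattern only recognises the file reference form. Test output, diff evidence, and harness output would need their own checks.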

package/dist/agents/skills/gemini-cli/harness-planning/SKILL.md CHANGED

@@ -312,6 +312,45 @@ One sentence.
 ````
+## Session State
+This skill reads and writes to the following session sections via `manage_state`:
+| Section | Read | Write | Purpose |
+|---------|------|-------|---------|
+| terminology | yes | no | Reads domain terms to use consistent language in plan |
+| decisions | yes | yes | Reads brainstorming decisions; records planning-phase decisions |
+| constraints | yes | yes | Reads existing constraints; adds constraints discovered during decomposition |
+| risks | yes | yes | Reads existing risks; adds implementation risks identified during task design |
+| openQuestions | yes | yes | Reads unresolved questions; adds new questions, resolves answered ones |
+| evidence | yes | yes | Reads prior evidence from brainstorming; writes file:line citations for task specifications |
+**When to write:** During Phase 1 (SCOPE) write newly discovered constraints and risks. During Phase 2 (DECOMPOSE) write decisions about task structure and sequencing. Mark resolved questions during Phase 4 (VALIDATE).
+**When to read:** At the start of Phase 1 (SCOPE), read all sections via `gather_context` with `include: ["sessions"]` to inherit context from brainstorming. Use terminology for consistent naming in task descriptions.
+## Evidence Requirements
+When this skill makes claims about existing code structure, file locations, or implementation patterns in task specifications, it MUST cite evidence using one of:
+1. **File reference:** `file:line` format (e.g., `src/services/index.ts:15` -- "barrel export exists, will add new export here")
+2. **Code pattern reference:** `file:line` format with pattern description (e.g., `src/services/user-service.ts:1-30` -- "existing service follows constructor injection pattern, new service will match")
+3. **Test output:** Include the command and its observed output when referencing current test state
+4. **Session evidence:** Write to the `evidence` session section:
+   ```json
+   manage_state({
+     action: "append_entry",
+     session: "<current-session>",
+     section: "evidence",
+     authorSkill: "harness-planning",
+     content: "src/services/index.ts:15 -- barrel export pattern confirmed for new service integration"
+   })
+   ```
+**When to cite:** During Phase 1 (SCOPE) when referencing existing files for observable truths. During Phase 2 (DECOMPOSE) when specifying exact file paths and code patterns in task instructions. When the file map references existing files for modification.
+**Uncited claims:** Technical assertions about existing code without citations MUST be prefixed with `[UNVERIFIED]`. Example: `[UNVERIFIED] The service barrel exports all services`. Uncited claims are flagged during review (Wave 2.2).
 ## Harness Integration
 - **`harness validate`** — Run during Phase 4 (before writing the plan) and included as a step in every task.
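
Reviewer note: the Session State table added for the planning skill can be read as a per-section permission map. The sketch below restates that table as data, assuming nothing beyond the yes/no values in the hunk. The `Section`, `Access`, `planningAccess`, and `canWrite` names are hypothetical and do not come from the package.

```typescript
// Illustrative only: the planning skill's Session State table expressed as data.
// These type and function names are hypothetical, not part of manage_state.
type Section =
  | "terminology"
  | "decisions"
  | "constraints"
  | "risks"
  | "openQuestions"
  | "evidence";

interface Access {
  read: boolean;
  write: boolean;
}

const planningAccess: Record<Section, Access> = {
  terminology: { read: true, write: false }, // read-only for planning
  decisions: { read: true, write: true },
  constraints: { read: true, write: true },
  risks: { read: true, write: true },
  openQuestions: { read: true, write: true },
  evidence: { read: true, write: true },
};

function canWrite(section: Section): boolean {
  return planningAccess[section].write;
}

const writableSections = (Object.keys(planningAccess) as Section[]).filter(canWrite);
console.log(writableSections.join(", "));
```

A guard like `canWrite` is one way a caller could refuse an `append_entry` to `terminology` from the planning skill. Whether the tool itself enforces this is not stated in the diff.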

package/dist/agents/skills/gemini-cli/harness-release-readiness/SKILL.md CHANGED

@@ -111,14 +111,14 @@ Run every check below. Record each as **pass**, **warn**, or **fail**:
 | `test` script exists in root `package.json`                     | fail                |
 | `lint` script exists in root `package.json`                     | fail                |
 | `typecheck` or `tsc` script exists in root `package.json`       | fail                |
-| `assess_project` passes (harness health + lint gate)            | fail                |
+| `assess_project` passes (full harness CI gate)                  | fail                |
 For the `assess_project` check, run it with all harness-specific checks including lint:
 ```json
 assess_project({
   path: "<project-root>",
-  checks: ["validate", "deps", "docs", "lint"],
+  checks: ["validate", "deps", "docs", "lint", "perf", "security", "entropy", "arch"],
   mode: "detailed"
 })
 ```
@@ -519,7 +519,7 @@ This framing is informational — it does not block anything. It gives the team
 ## Harness Integration
-- **`assess_project`** — Used in AUDIT Phase 1 (CI/CD section) to run harness validation, dependency checks, doc coverage, and lint in a single parallel call. Also run after auto-fixes in Phase 3 to verify project health. Automatically inherits new checks added to `assess_project`.
+- **`assess_project`** — Used in AUDIT Phase 1 (CI/CD section) to run the full harness CI gate (validation, dependencies, docs, lint, performance/complexity, security, entropy, and architecture) in a single parallel call. Also run after auto-fixes in Phase 3 to verify project health. Automatically inherits new checks added to `assess_project`.
 - **Sub-skill invocations** — Phase 2 dispatches `detect-doc-drift`, `cleanup-dead-code`, `enforce-architecture`, and `diagnostics` as parallel agents. Phase 3 delegates fixes to `align-documentation` and `cleanup-dead-code`.
 - **State file** — `.harness/release-readiness.json` enables session resumption and progress tracking. This file is read at the start of each invocation and written at the end.
 - **Report file** — `release-readiness-report.md` is written to the project root. It is a snapshot, not a tracked artifact — regenerate it on each run.
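
Reviewer note: the change to the `checks` argument is easier to see as a set difference. The two arrays below are copied from the removed and added lines of the hunk. The variable names are illustrative, and the snippet does not call `assess_project`.

```typescript
// The two arrays are copied verbatim from the removed (-) and added (+) lines.
const oldChecks = ["validate", "deps", "docs", "lint"];
const newChecks = ["validate", "deps", "docs", "lint", "perf", "security", "entropy", "arch"];

// Checks present only in 1.14.0, and checks dropped since 1.13.1.
const addedChecks = newChecks.filter((check) => !oldChecks.includes(check));
const removedChecks = oldChecks.filter((check) => !newChecks.includes(check));

console.log("added:", addedChecks.join(", "));
console.log("removed:", removedChecks.length === 0 ? "(none)" : removedChecks.join(", "));
```

The four added entries line up with the "performance/complexity, security, entropy, and architecture" wording in the updated Harness Integration bullet.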

package/dist/agents/skills/gemini-cli/harness-verification/SKILL.md CHANGED

@@ -291,6 +291,41 @@ When verifying a bug fix, apply this extended protocol:
 If step 4 passes (test does not fail without the fix), the test is not a valid regression test. It does not catch the bug. Rewrite it.
+## Evidence Requirements
+This skill is the primary evidence producer in the workflow. Every pass/fail assertion in the verification report MUST include concrete evidence. The words "should", "probably", and "seems to" are already forbidden by the Iron Law -- this section defines HOW to cite evidence.
+Every verification claim MUST use one of:
+1. **File reference:** `file:line` format with observed content (e.g., `src/services/user-service.ts:42` -- "create method validates email format before insert")
+2. **Test output:** Include the actual test command and its complete output:
+   ```
+   $ npx vitest run src/services/user-service.test.ts
+   PASS  src/services/user-service.test.ts
+     UserService
+       create (4 tests)
+       list (3 tests)
+       expiry (2 tests)
+   Tests: 9 passed, 9 total
+   ```
+3. **Harness output:** Include full `harness validate` and `harness check-deps` output
+4. **Anti-pattern scan output:** Include the actual grep/search command and results (or absence of results)
+5. **Import chain evidence:** Include the actual import statements found when verifying WIRED level
+6. **Session evidence:** Write to the `evidence` session section for each verification level:
+   ```json
+   manage_state({
+     action: "append_entry",
+     session: "<current-session>",
+     section: "evidence",
+     authorSkill: "harness-verification",
+     content: "[EXISTS:PASS] src/services/user-service.ts (189 lines) -- verified via direct file read"
+   })
+   ```
+**When to cite:** At every verification level. Level 1 (EXISTS) cites file reads. Level 2 (SUBSTANTIVE) cites specific line content. Level 3 (WIRED) cites import statements, test execution output, and harness check output. The verification report format already requires `[PASS]`/`[FAIL]` markers -- each marker must be accompanied by the evidence that produced it.
+**Uncited claims:** ANY verification assertion without direct evidence is a verification failure, not merely an uncited claim. This skill does not use `[UNVERIFIED]` -- if evidence cannot be produced, the verdict is FAIL or INCOMPLETE.
 ## Harness Integration
 - **`harness validate`** — Run in Level 3 WIRED check. Verifies project-wide health and constraint compliance.
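
Reviewer note: the `content` string in the added `manage_state` example follows a `[LEVEL:VERDICT] subject -- how it was verified` shape. The sketch below builds such strings. Only the `[EXISTS:PASS]` form appears in the diff, so the other level and verdict combinations, the function names, and the choice of INCOMPLETE (rather than FAIL) for missing evidence are assumptions made for illustration.

```typescript
// Hypothetical formatter: only "[EXISTS:PASS] ..." is shown in the source.
// Other combinations are extrapolated from the level and verdict names in the text.
type Level = "EXISTS" | "SUBSTANTIVE" | "WIRED";
type Verdict = "PASS" | "FAIL" | "INCOMPLETE";

// Assumption: with no evidence, a claimed verdict is downgraded to INCOMPLETE.
// The source allows either FAIL or INCOMPLETE in that case.
function resolveVerdict(claimed: Verdict, evidence: string): Verdict {
  return evidence.trim() === "" ? "INCOMPLETE" : claimed;
}

function evidenceContent(level: Level, claimed: Verdict, subject: string, evidence: string): string {
  const verdict = resolveVerdict(claimed, evidence);
  const detail = evidence.trim() === "" ? "no evidence produced" : evidence;
  return `[${level}:${verdict}] ${subject} -- ${detail}`;
}

const existsEntry = evidenceContent(
  "EXISTS",
  "PASS",
  "src/services/user-service.ts (189 lines)",
  "verified via direct file read"
);
const missingEntry = evidenceContent("WIRED", "PASS", "src/services/user-service.ts", "");

console.log(existsEntry);
console.log(missingEntry);
```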

package/dist/agents/skills/gemini-cli/initialize-harness-project/SKILL.md CHANGED

@@ -20,6 +20,8 @@
 2. **For new projects:** Gather project context — language, framework, test runner, build tool. Ask the human if any of these are undecided. Do not assume defaults.
+2b. **For existing projects with detectable frameworks:** Run `harness init` without flags first. The command auto-detects frameworks (FastAPI, Django, Gin, Axum, Spring Boot, Next.js, React+Vite, Vue, Express, NestJS) by scanning project files. Present the detection result to the human and ask for confirmation before proceeding. If detection fails, ask the human to specify `--framework` manually.
 3. **For existing projects:** Run `harness validate` to see what is already configured and what is missing. Read `AGENTS.md` if it exists. Identify the current adoption level:
    - **Basic:** Has `AGENTS.md` and `harness.yaml` with project metadata. No layers, no skills, no dependency constraints.
    - **Intermediate:** Has layers defined, dependency constraints between layers, at least one custom skill. `harness check-deps` runs and passes.
@@ -30,11 +32,17 @@
 ### Phase 2: SCAFFOLD — Generate Project Structure
 1. **Run `harness init` with the appropriate flags:**
-   - New basic project: `harness init --level basic --framework <framework>`
-   - New intermediate project: `harness init --level intermediate --framework <framework>`
+   - New basic JS/TS project: `harness init --level basic`
+   - With framework: `harness init --level basic --framework <framework>`
+   - Non-JS language: `harness init --language <python|go|rust|java>`
+   - Non-JS with framework: `harness init --framework <fastapi|django|gin|axum|spring-boot>`
+   - Existing project (auto-detect): `harness init` (no flags -- auto-detection runs)
    - Migration to intermediate: `harness init --level intermediate --migrate`
    - Migration to advanced: `harness init --level advanced --migrate`
+   **Supported frameworks:** nextjs, react-vite, vue, express, nestjs, fastapi, django, gin, axum, spring-boot
+   **Supported languages:** typescript, python, go, rust, java
 2. **Review generated files.** `harness init` creates:
    - `harness.yaml` — Project configuration (name, stack, adoption level)
    - `.harness/` directory — State and learnings storage
@@ -93,7 +101,7 @@ This creates the `.harness/graph/` directory and populates it with the project's
 ## Harness Integration
-- **`harness init --level <level> --framework <framework>`** — Scaffold a new project at the specified adoption level.
+- **`harness init --level <level> [--framework <framework>] [--language <language>]`** — Scaffold a new project. `--framework` infers language automatically. `--language` without `--framework` gives a bare language scaffold. Running without flags on an existing project directory triggers auto-detection.
 - **`harness init --level <level> --migrate`** — Migrate an existing project to the next adoption level, preserving existing configuration.
 - **`harness persona generate`** — Generate persona definitions based on project stack and team structure.
 - **`harness validate`** — Verify the full project configuration is valid and complete.
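
Reviewer note: the new `harness init` invocations listed in Phase 2 can be summarised as a small command builder. This is a sketch under stated assumptions. `initCommand` and `InitOptions` are hypothetical names, the helper only joins flags that appear in the diff, and it does not run the CLI. Dropping `--language` when `--framework` is given is an assumption based on the statement that `--framework` infers the language.

```typescript
// Hypothetical helper that assembles the documented `harness init` command lines.
// It only concatenates flags named in the diff; it does not execute anything.
interface InitOptions {
  level?: "basic" | "intermediate" | "advanced";
  framework?: string;
  language?: string;
  migrate?: boolean;
}

function initCommand(opts: InitOptions = {}): string {
  const parts = ["harness", "init"];
  if (opts.level) parts.push("--level", opts.level);
  if (opts.framework) parts.push("--framework", opts.framework);
  // Assumption: since --framework infers the language, --language is only
  // emitted for a bare language scaffold.
  if (opts.language && !opts.framework) parts.push("--language", opts.language);
  if (opts.migrate) parts.push("--migrate");
  return parts.join(" ");
}

console.log(initCommand()); // existing project, auto-detection
console.log(initCommand({ level: "basic", framework: "nextjs" }));
console.log(initCommand({ language: "python" }));
console.log(initCommand({ level: "intermediate", migrate: true }));
```

Calling `initCommand()` with no options yields the flagless form that the diff describes as triggering auto-detection on an existing project.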