@harness-engineering/cli (npm) 1.6.2 → 1.8.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (180)

package/dist/agents/skills/gemini-cli/harness-knowledge-mapper/SKILL.md CHANGED

@@ -13,8 +13,35 @@
 ## Prerequisites
-A knowledge graph must exist at `.harness/graph/`. Run `harness scan` if no graph is available.
-If the graph exists but code has changed since the last scan, re-run `harness scan` first — stale graph data leads to inaccurate results.
+A knowledge graph at `.harness/graph/` enables full analysis. If no graph exists,
+the skill uses static analysis fallbacks (see Graph Availability section).
+Run `harness scan` to enable graph-enhanced analysis.
+### Graph Availability
+Before starting, check if `.harness/graph/graph.json` exists.
+**If graph exists:** Check staleness — compare `.harness/graph/metadata.json`
+scanTimestamp against `git log -1 --format=%ct` (latest commit timestamp).
+If graph is more than 10 commits behind (`git log --oneline <scanTimestamp>..HEAD | wc -l`),
+run `harness scan` to refresh before proceeding. (Staleness sensitivity: **Medium**)
+**If graph exists and is fresh (or refreshed):** Use graph tools as primary strategy.
+**If no graph exists:** Output "Running without graph (run `harness scan` to
+enable full analysis)" and use fallback strategies for all subsequent steps.
+### Pipeline Context (when orchestrated)
+When invoked by `harness-docs-pipeline`, check for a `pipeline` field in `.harness/handoff.json`:
+- If `pipeline` field exists: read `DocPipelineContext` from it
+  - If `pipeline.bootstrapped === true`, this is a bootstrap invocation — generate full AGENTS.md without confirmation prompt
+  - Write any generated documentation back as `DocFix[]` to `pipeline.fillsApplied`
+  - This enables the orchestrator to track what was generated and verify it
+- If `pipeline` field does not exist: behave exactly as today (standalone mode)
+No changes to the skill's interface or output format — the pipeline field is purely additive.
 ## Process
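The staleness rule added in the hunk above (refresh when the graph is more than 10 commits behind) can be sketched as a small decision function. This is a minimal illustration, not the skill's implementation: `chooseStrategy`, `commitsSinceScan`, and the strategy names are hypothetical, and `scanTimestamp` is assumed to be a Unix timestamp in seconds, comparable to the values printed by `git log --format=%ct`. Note that a `git log A..B` range expects revisions rather than timestamps, so this sketch counts commits by comparing timestamps instead.

```typescript
// Hypothetical helper names; none of these are part of the harness CLI.
const STALE_COMMIT_THRESHOLD = 10; // "more than 10 commits behind" per the skill text

type GraphStrategy = "graph" | "refresh-then-graph" | "fallback";

// Count commits whose Unix timestamp (seconds) is later than the scan timestamp.
function commitsSinceScan(scanTimestamp: number, commitTimestamps: number[]): number {
  return commitTimestamps.filter((t) => t > scanTimestamp).length;
}

// Decide which strategy the skill would use under the stated assumptions.
function chooseStrategy(
  graphExists: boolean,
  scanTimestamp: number,
  commitTimestamps: number[]
): GraphStrategy {
  if (!graphExists) return "fallback"; // no graph: static analysis fallbacks
  const behind = commitsSinceScan(scanTimestamp, commitTimestamps);
  return behind > STALE_COMMIT_THRESHOLD ? "refresh-then-graph" : "graph";
}
```

With a scan at timestamp 1000 and eleven later commits, `chooseStrategy(true, 1000, ...)` returns `"refresh-then-graph"`; with exactly ten it returns `"graph"`.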
@@ -40,6 +67,20 @@ If the graph exists but code has changed since the last scan, re-run `harness sc
    get_relationships(nodeId=<module>, direction="outbound", depth=1)
    ```
+#### Fallback (without graph)
+When no graph is available, use directory structure and file analysis:
+1. **Module hierarchy from directories**: Use the directory structure as the module hierarchy — each directory represents a module. Glob for all source files to build the tree.
+2. **Entry points**: Check `package.json` for `main` and `exports` fields. Glob for `src/index.*` and `index.*` patterns. These are the entry points.
+3. **Source file inventory**: Glob for all source files (`**/*.ts`, `**/*.tsx`, `**/*.js`, `**/*.jsx`, etc.).
+4. **Documentation inventory**: Glob for all doc files (`**/*.md`, `docs/**/*`).
+5. **Undocumented module detection**: Diff the source directory set against the doc directory set. Source directories with no corresponding docs (no README.md, no matching doc file) are undocumented.
+6. **Existing knowledge map**: Read existing AGENTS.md if present for current knowledge map state.
+7. **Dependency flow (approximate)**: Parse import statements in each module's files to determine which modules depend on which others.
+> Fallback completeness: ~50% — no semantic grouping; modules grouped by directory only; no cross-cutting concern detection.
 ### Phase 2: GENERATE — Build Knowledge Map
 Generate markdown sections following AGENTS.md conventions:
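Step 5 of the fallback added in the hunk above (diff the source directory set against the doc directory set) might look like the sketch below. It makes a simplifying assumption that is not stated in the skill text: a directory counts as documented only when a doc file such as `README.md` sits directly inside it. The function names are hypothetical, and the file lists are assumed to come from whatever glob step precedes this one.

```typescript
// Hypothetical helpers; paths are assumed to be POSIX-style and relative to the project root.

// Directory part of a path ("." for top-level files).
function dirOf(path: string): string {
  const i = path.lastIndexOf("/");
  return i === -1 ? "." : path.slice(0, i);
}

// Source directories that contain no doc file directly inside them.
function findUndocumentedDirs(sourceFiles: string[], docFiles: string[]): string[] {
  const docDirs = docFiles.map(dirOf);
  const result: string[] = [];
  for (const file of sourceFiles) {
    const dir = dirOf(file);
    if (docDirs.indexOf(dir) === -1 && result.indexOf(dir) === -1) {
      result.push(dir);
    }
  }
  return result.sort();
}
```

For example, sources in `src/a` and `src/b` with a single `src/a/README.md` would report `src/b` as undocumented. A matcher that also accepts docs under a separate `docs/` tree would need a mapping rule that the skill text does not specify.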
@@ -109,7 +150,7 @@ This ensures subsequent graph queries (impact analysis, drift detection) include
 ## Harness Integration
-- **`harness scan`** — Must run before this skill to ensure graph is current.
+- **`harness scan`** — Recommended before this skill for full graph-enhanced analysis. If graph is missing, skill uses directory structure fallbacks.
 - **`harness validate`** — Run after acting on findings to verify project health.
 - **Graph tools** — This skill uses `query_graph`, `get_relationships`, and `check_docs` MCP tools.
@@ -119,7 +160,7 @@ This ensures subsequent graph queries (impact analysis, drift detection) include
 - Coverage gaps identified (undocumented modules, missing descriptions, stale references)
 - Output written to AGENTS.md (or specified path) in proper markdown format
 - Report follows the structured output format
-- All findings are backed by graph query evidence, not heuristics
+- All findings are backed by graph query evidence (with graph) or directory/file analysis (without graph)
 ## Examples
@@ -145,7 +186,7 @@ Output:
 ## Gates
-- **No generation without graph.** If no graph exists, stop and instruct to run `harness scan`.
+- **Graph preferred, fallback available.** If no graph exists, use directory structure and file analysis to build the knowledge map. Do not stop — produce the best map possible.
 - **Never overwrite without confirmation.** If AGENTS.md exists, show the diff and ask before replacing.
 ## Escalation

package/dist/agents/skills/gemini-cli/harness-knowledge-mapper/skill.yaml CHANGED

@@ -5,7 +5,7 @@ cognitive_mode: constructive-architect
 triggers:
   - manual
   - on_commit
-  - scheduled
+  - on_milestone
 platforms:
   - claude-code
   - gemini-cli

package/dist/agents/skills/gemini-cli/harness-perf/SKILL.md CHANGED

@@ -45,34 +45,62 @@ Tier 1 violations are non-negotiable blockers. If a Tier 1 violation is detected
 ---
+### Graph Availability
+Hotspot scoring and coupling analysis benefit from the knowledge graph but work without it.
+**Staleness sensitivity:** Medium -- auto-refresh if >10 commits stale. Hotspot scoring uses churn data which does not change rapidly.
+| Feature                              | With Graph                                                   | Without Graph                                                                                                                    |
+| ------------------------------------ | ------------------------------------------------------------ | -------------------------------------------------------------------------------------------------------------------------------- |
+| Hotspot scoring (churn x complexity) | `GraphComplexityAdapter` computes from graph nodes           | `git log --format="%H" -- <file>` for per-file commit count; complexity from `check-perf --structural` output; multiply manually |
+| Coupling ratio                       | `GraphCouplingAdapter` computes from graph edges             | Parse import statements, count fan-out/fan-in per file                                                                           |
+| Critical path resolution             | Graph inference (high fan-in) + `@perf-critical` annotations | `@perf-critical` annotations only; grep for decorator/comment                                                                    |
+| Transitive dep depth                 | Graph BFS depth                                              | Import chain follow, 2 levels deep                                                                                               |
+**Notice when running without graph:** "Running without graph (run `harness scan` to enable hotspot scoring and coupling analysis)"
+**Impact on tiers:** Without graph, Tier 1 hotspot detection is degraded. Hotspot scoring falls back to churn-only (no complexity multiplication). This limitation is documented in the performance report output.
+---
 ### Phase 2: BENCHMARK — Runtime Performance
 This phase runs only when `.bench.ts` files exist in the project. If none are found, skip to Phase 3.
-1. **Check for benchmark files.** Scan the project for `*.bench.ts` files. If none exist, skip this phase entirely.
+1. **Check baseline lock-in.** Before running benchmarks, verify baselines are kept in sync:
+   - List all `.bench.ts` files changed in this PR: `git diff --name-only | grep '.bench.ts'`
+   - If any `.bench.ts` files are new or modified:
+     - Check if `.harness/perf/baselines.json` is also modified in this PR
+     - If NOT modified: flag as Tier 2 warning: "Benchmark files changed but baselines not updated. Run `harness perf baselines update` and commit the result."
+     - If modified: verify the updated baselines include entries for all changed benchmarks
+   - If no `.bench.ts` files changed: skip this check
+   - This check also runs standalone via `--check-baselines` flag
+2. **Check for benchmark files.** Scan the project for `*.bench.ts` files. If none exist, skip this phase entirely.
-2. **Verify clean working tree.** Run `git status --porcelain`. If there are uncommitted changes, STOP. Benchmarks on dirty trees produce unreliable results.
+3. **Verify clean working tree.** Run `git status --porcelain`. If there are uncommitted changes, STOP. Benchmarks on dirty trees produce unreliable results.
-3. **Run benchmarks.** Execute `harness perf bench` to run all benchmark suites.
+4. **Run benchmarks.** Execute `harness perf bench` to run all benchmark suites.
-4. **Load baselines.** Read `.harness/perf/baselines.json` for previous benchmark results. If no baselines exist, treat this as a baseline-capture run.
+5. **Load baselines.** Read `.harness/perf/baselines.json` for previous benchmark results. If no baselines exist, treat this as a baseline-capture run.
-5. **Compare results against baselines** using the `RegressionDetector`:
+6. **Compare results against baselines** using the `RegressionDetector`:
    - Calculate percentage change for each benchmark
    - Apply noise margin (default: 3%) before flagging regressions
    - Distinguish between critical-path and non-critical-path benchmarks
-6. **Resolve critical paths** via `CriticalPathResolver`:
+7. **Resolve critical paths** via `CriticalPathResolver`:
    - Check `@perf-critical` annotations in source files
    - Check graph fan-in data (functions called by many consumers)
    - Functions in the critical path set have stricter thresholds
-7. **Flag regressions by tier:**
+8. **Flag regressions by tier:**
    - **Tier 1:** >5% regression on a critical path benchmark
    - **Tier 2:** >10% regression on a non-critical-path benchmark
    - **Tier 3:** >5% regression on a non-critical-path benchmark (within noise margin consideration)
-8. **If this is a baseline-capture run,** report results without regression comparison. Recommend running `harness perf baselines update` to persist.
+9. **If this is a baseline-capture run,** report results without regression comparison. Recommend running `harness perf baselines update` to persist.
 ---
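The tier rules in the hunk above (noise margin, then >5% on critical paths, >10% and >5% on non-critical paths) can be read as the classifier sketched below. This is an interpretation under stated assumptions, not the package's `RegressionDetector`: the metric is assumed to be time-like (larger is slower), the baseline is assumed to be positive, and "apply noise margin" is taken to mean that changes at or below the margin are ignored. All names are hypothetical.

```typescript
// Hypothetical sketch; not the RegressionDetector shipped in the package.
type Tier = 1 | 2 | 3 | null; // null means no regression flagged

// Percentage change relative to the baseline (assumes baseline > 0).
function percentChange(baseline: number, current: number): number {
  return ((current - baseline) / baseline) * 100;
}

function classifyRegression(
  baseline: number,
  current: number,
  isCriticalPath: boolean,
  noiseMarginPct: number = 3 // default noise margin from the skill text
): Tier {
  const change = percentChange(baseline, current);
  if (change <= noiseMarginPct) return null; // within noise (assumed interpretation)
  if (isCriticalPath) return change > 5 ? 1 : null; // Tier 1: >5% on a critical path
  if (change > 10) return 2; // Tier 2: >10% on a non-critical path
  if (change > 5) return 3; // Tier 3: >5% on a non-critical path
  return null;
}
```

Under these assumptions a benchmark going from 100 to 106 on a critical path is Tier 1, while the same change on a non-critical path is Tier 3.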
@@ -138,6 +166,7 @@ This phase runs only when `.bench.ts` files exist in the project. If none are fo
 - **`harness perf bench`** — Run benchmarks only. Requires clean working tree.
 - **`harness perf baselines show`** — View current benchmark baselines.
 - **`harness perf baselines update`** — Persist current benchmark results as new baselines.
+- **`harness perf --check-baselines`** -- Verify baseline file is updated when benchmarks change. Runs the baseline lock-in check standalone.
 - **`harness perf critical-paths`** — View the current critical path set and how it was determined.
 - **`harness validate`** — Run after enforcement to verify overall project health.
 - **`harness graph scan`** — Refresh knowledge graph for accurate hotspot scoring.
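The baseline lock-in check that `--check-baselines` runs standalone could be sketched as follows. This is a minimal illustration of the decision described in Phase 2, step 1, assuming the caller already has the list of changed paths (for example from `git diff --name-only`); the function name and result shape are hypothetical and are not taken from the package.

```typescript
// Hypothetical result shape; not part of the harness CLI.
interface LockInResult {
  status: "skip" | "warn" | "verify";
  benchmarks: string[]; // changed .bench.ts files, if any
  message?: string; // Tier 2 warning text when baselines were not updated
}

const BASELINES_PATH = ".harness/perf/baselines.json";
const BENCH_SUFFIX = ".bench.ts";

function checkBaselineLockIn(changedFiles: string[]): LockInResult {
  // Match on the literal suffix rather than a regex, so "." is not a wildcard.
  const benchmarks = changedFiles.filter(
    (f) => f.slice(-BENCH_SUFFIX.length) === BENCH_SUFFIX
  );
  if (benchmarks.length === 0) return { status: "skip", benchmarks };
  if (changedFiles.indexOf(BASELINES_PATH) === -1) {
    return {
      status: "warn",
      benchmarks,
      message:
        "Benchmark files changed but baselines not updated. " +
        "Run `harness perf baselines update` and commit the result.",
    };
  }
  // Baselines were touched: the caller still has to verify that each
  // changed benchmark has an entry in the updated baselines file.
  return { status: "verify", benchmarks };
}
```

The `"verify"` outcome only signals that the follow-up comparison is needed; the layout of `baselines.json` is not shown in this diff, so that step is left to the caller.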

package/dist/agents/skills/gemini-cli/harness-perf/skill.yaml CHANGED

@@ -21,6 +21,9 @@ cli:
     - name: path
       description: Project root path
       required: false
+    - name: check-baselines
+      description: Verify baseline file is updated when benchmarks change
+      required: false
 mcp:
   tool: run_skill
   input:

package/dist/agents/skills/gemini-cli/harness-perf-tdd/SKILL.md CHANGED

@@ -60,10 +60,23 @@ If you find yourself writing production code before both the test and the benchm
 2. **Run the test** — observe pass. If it fails, fix the implementation until it passes.
-3. **Run the benchmark** — capture initial results. This is the first measurement. Note:
-   - If a performance assertion exists in the spec, verify it passes
-   - If no assertion exists, record the result as a baseline reference
-   - Do not optimize at this stage unless the assertion fails
+3. **Run the benchmark** -- capture initial results and apply thresholds:
+   **When the spec defines a performance requirement** (e.g., "< 50ms"):
+   - Use the spec requirement as the benchmark assertion threshold
+   - Verify it passes; if not, see step 4
+   **When the spec is vague or silent on performance:**
+   - Fall back to harness-perf tier thresholds:
+     - Critical path functions (annotated `@perf-critical` or high fan-in): must not regress >5% from baseline (Tier 1)
+     - Non-critical functions: must not regress >10% from baseline (Tier 2)
+     - Structural complexity: must stay under Tier 2 thresholds (cyclomatic <=15, nesting <=4, function length <=50 lines, params <=5)
+   - These thresholds give developers concrete targets even when the spec does not specify performance requirements
+   **When no baseline exists (new code):**
+   - This run captures the initial baseline
+   - No regression comparison on first run
+   - VALIDATE phase (Phase 4) ensures the captured baseline is committed via `harness perf baselines update`
 4. **If the performance assertion fails,** you have two options:
    - The implementation approach is fundamentally wrong (e.g., O(n^2) when O(n) is needed) — revise the algorithm
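The structural limits quoted in the new step 3 (cyclomatic <=15, nesting <=4, function length <=50 lines, params <=5) lend themselves to a simple check. The sketch below assumes the four metrics have already been measured by some other tool; the interface and function names are hypothetical and do not correspond to a documented harness API.

```typescript
// Hypothetical metric shape; how the numbers are measured is out of scope here.
interface FunctionMetrics {
  cyclomatic: number;
  nesting: number;
  lengthLines: number;
  params: number;
}

// Tier 2 structural limits as quoted in the skill text.
const TIER2_LIMITS: FunctionMetrics = {
  cyclomatic: 15,
  nesting: 4,
  lengthLines: 50,
  params: 5,
};

// Return one human-readable entry per limit that is exceeded.
function structuralViolations(m: FunctionMetrics): string[] {
  const out: string[] = [];
  if (m.cyclomatic > TIER2_LIMITS.cyclomatic) out.push("cyclomatic " + m.cyclomatic + " > " + TIER2_LIMITS.cyclomatic);
  if (m.nesting > TIER2_LIMITS.nesting) out.push("nesting " + m.nesting + " > " + TIER2_LIMITS.nesting);
  if (m.lengthLines > TIER2_LIMITS.lengthLines) out.push("length " + m.lengthLines + " > " + TIER2_LIMITS.lengthLines);
  if (m.params > TIER2_LIMITS.params) out.push("params " + m.params + " > " + TIER2_LIMITS.params);
  return out;
}
```

A function sitting exactly at every limit produces no violations, since the limits are inclusive.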

package/dist/agents/skills/gemini-cli/harness-release-readiness/SKILL.md CHANGED

@@ -77,6 +77,8 @@ Run every check below. Record each as **pass**, **warn**, or **fail**:
 | `homepage` field exists                                                              | warn                |
 | `description` field exists                                                           | warn                |
 | Build succeeds: run the project's build command                                      | fail                |
+| Typecheck passes: run the project's typecheck command (e.g., `pnpm typecheck`)       | fail                |
+| Tests pass: run the project's test command (e.g., `pnpm test`)                       | fail                |
 | `pnpm pack --dry-run` produces expected files (no test files, no src if dist exists) | warn                |
 ##### Documentation (root level)
@@ -107,10 +109,24 @@ Run every check below. Record each as **pass**, **warn**, or **fail**:
 | CI workflow file exists (`.github/workflows/ci.yml` or similar) | fail                |
 | Release/publish workflow file exists                            | warn                |
 | `test` script exists in root `package.json`                     | fail                |
-| `lint` script exists in root `package.json`                     | warn                |
-| `typecheck` or `tsc` script exists in root `package.json`       | warn                |
+| `lint` script exists in root `package.json`                     | fail                |
+| `typecheck` or `tsc` script exists in root `package.json`       | fail                |
 | `harness validate` passes (project-level health check)          | fail                |
+##### i18n Coverage (conditional)
+When `i18n.enabled: true` in `harness.config.json`, run these checks:
+| Check                                                                                          | Severity if failing             |
+| ---------------------------------------------------------------------------------------------- | ------------------------------- |
+| Translation coverage meets `i18n.coverage.minimumPercent` for all target locales               | fail (strict) / warn (standard) |
+| No untranslated values (source text in target locale files) when `coverage.detectUntranslated` | warn                            |
+| All CLDR plural forms present for target locales when `coverage.requirePlurals`                | warn                            |
+| No stale translations (source changed since last translation timestamp)                        | warn                            |
+| `harness-i18n` scan passes with zero errors                                                    | fail (strict) / warn (standard) |
+If `i18n.enabled` is false or the `i18n` config block is absent, skip this section entirely and report it as "N/A" in the audit output.
 #### Comprehensive Checks (only with `--comprehensive`)
 These checks run only when `--comprehensive` is passed. They are slower and may require network access.
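The "fail (strict) / warn (standard)" severity used by the i18n coverage rows in the hunk above can be sketched as a small function. This is an illustration only: how strict versus standard mode is configured is not shown in this diff, so the `strict` flag, the `LocaleCoverage` shape, and the function name are all assumptions.

```typescript
// Hypothetical types; the real config and scan result shapes are not shown in the diff.
type Severity = "pass" | "warn" | "fail" | "n/a";

interface LocaleCoverage {
  locale: string;
  percent: number; // translated keys as a percentage of source keys
}

function i18nCoverageSeverity(
  enabled: boolean,
  coverage: LocaleCoverage[],
  minimumPercent: number, // stands in for i18n.coverage.minimumPercent
  strict: boolean // assumed flag for strict vs standard mode
): Severity {
  if (!enabled) return "n/a"; // section is skipped and reported as N/A
  for (const entry of coverage) {
    if (entry.percent < minimumPercent) {
      return strict ? "fail" : "warn";
    }
  }
  return "pass";
}
```

With a minimum of 90, a locale at 85% coverage yields `"fail"` in strict mode and `"warn"` otherwise, and a disabled i18n block yields `"n/a"` regardless of coverage.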
@@ -158,6 +174,7 @@ Packaging:  8/12 passed, 2 warnings, 2 failures
 Docs:       5/6 passed, 1 warning, 0 failures
 Hygiene:    3/5 passed, 2 warnings, 0 failures
 CI/CD:      4/5 passed, 1 warning, 0 failures
+i18n:       N/N passed, N warnings, N failures (or: skipped — i18n not enabled)
 [comprehensive] API Docs:  skipped (use --comprehensive)
 [comprehensive] Examples:  skipped (use --comprehensive)
 [comprehensive] Dep Health: skipped (use --comprehensive)
@@ -267,13 +284,15 @@ After each batch of fixes (or after each individual fix if not batching), run `h
 These require human judgment and cannot be auto-fixed. List them with guidance:
-| Finding                                                      | Guidance                                                                                                                                                               |
-| ------------------------------------------------------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `TODO`/`FIXME` in published source                           | List each location with file:line. Human must resolve or move to a tracked issue.                                                                                      |
-| README missing usage/API sections                            | Suggest section structure but do not generate content — only the author knows the API.                                                                                 |
-| CHANGELOG exists but has no entries (empty or template-only) | Suggest running `git log --oneline <last-tag>..HEAD` to generate entries. Unlike a missing file (auto-fixable above), an empty CHANGELOG needs human-authored content. |
-| CI workflow missing                                          | Provide a starter template but flag for human review before committing.                                                                                                |
-| Build failure                                                | Show the error output. Do not attempt to fix build issues automatically.                                                                                               |
+| Finding                                                      | Guidance                                                                                                                                                                                                                      |
+| ------------------------------------------------------------ | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `TODO`/`FIXME` in published source                           | List each location with file:line. Human must resolve or move to a tracked issue.                                                                                                                                             |
+| README missing usage/API sections                            | Suggest section structure but do not generate content — only the author knows the API.                                                                                                                                        |
+| CHANGELOG exists but has no entries (empty or template-only) | Suggest running `git log --oneline <last-tag>..HEAD` to generate entries. Unlike a missing file (auto-fixable above), an empty CHANGELOG needs human-authored content.                                                        |
+| CI workflow missing                                          | Provide a starter template but flag for human review before committing.                                                                                                                                                       |
+| Build failure                                                | Show the error output. Do not attempt to fix build issues automatically.                                                                                                                                                      |
+| Typecheck failure                                            | Show the error output with file:line. Common causes: orphaned files with stale imports, missing type declarations, `exactOptionalPropertyTypes` violations. Do not auto-fix — type errors often indicate structural problems. |
+| Test failure                                                 | Show the error output with failing test names. Do not attempt to fix test failures automatically — they may indicate real bugs.                                                                                               |
 #### Output
@@ -492,6 +511,7 @@ This framing is informational — it does not block anything. It gives the team
 - **Sub-skill invocations** — Phase 2 dispatches `detect-doc-drift`, `cleanup-dead-code`, `enforce-architecture`, and `diagnostics` as parallel agents. Phase 3 delegates fixes to `align-documentation` and `cleanup-dead-code`.
 - **State file** — `.harness/release-readiness.json` enables session resumption and progress tracking. This file is read at the start of each invocation and written at the end.
 - **Report file** — `release-readiness-report.md` is written to the project root. It is a snapshot, not a tracked artifact — regenerate it on each run.
+- **i18n coverage** — When `i18n.enabled: true`, Phase 1 checks translation coverage against configured thresholds. Uses `harness-i18n` scan results and `harness-i18n-workflow` coverage tracking. Blocks release in strict mode if coverage is below `i18n.coverage.minimumPercent`.
 ## Success Criteria