npm - @harness-engineering/cli - Versions diffs - 1.2.0 → 1.3.0 - Mend

@harness-engineering/cli 1.2.0 → 1.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (52)

package/dist/agents/commands/gemini-cli/harness/validate-context-engineering.toml DELETED

@@ -1,186 +0,0 @@
-# Generated by harness generate-slash-commands. Do not edit.
-description = "Validate repository context engineering practices (AGENTS.md, doc coverage, knowledge map)"
-prompt = """
-<context>
-Cognitive mode: meticulous-verifier
-Type: flexible
-</context>
-<objective>
-Validate repository context engineering practices (AGENTS.md, doc coverage, knowledge map)
-</objective>
-<execution_context>
---- SKILL.md (agents/skills/claude-code/validate-context-engineering/SKILL.md) ---
-# Validate Context Engineering
-> Validate AGENTS.md quality and evolve it as the codebase changes. Good context engineering means AI agents always have accurate, current knowledge about the project.
-## When to Use
-- After adding new files, modules, or packages to the project
-- After renaming, moving, or deleting significant files
-- After changing public APIs, architectural patterns, or conventions
-- When onboarding a new contributor (human or AI) and want to verify context is current
-- When `on_context_check` or `on_pre_commit` triggers fire
-- Periodically (weekly or per-sprint) as a hygiene check
-- NOT when making trivial changes (typo fixes, comment updates, formatting)
-- NOT during active feature development — run this between features or at cycle boundaries
-## Process
-### Phase 1: Audit — Run Automated Checks
-1. **Run `harness validate`** to check overall project health. Review any context-related warnings or errors in the output.
-2. **Run `harness check-docs`** to detect documentation gaps, broken links, and stale references. Capture the full output for analysis.
-3. **Review AGENTS.md manually.** Automated tools catch structural issues but miss semantic drift. Read each section and ask: "Is this still true?"
-### Phase 2: Detect Gaps
-Categorize findings into four types:
-1. **Undocumented files.** New source files, modules, or packages that are not mentioned in AGENTS.md or any knowledge map section. These are the highest priority — an AI agent encountering these files has no context.
-2. **Broken links.** References to files, functions, or URLs that no longer exist. These actively mislead agents.
-3. **Stale sections.** Content that was accurate when written but no longer reflects reality. Examples: renamed functions still referenced by old name, removed features still described, changed conventions not updated.
-4. **Missing context.** Sections that exist but lack critical information. A module is listed but its purpose, constraints, or relationships are not explained. An AI agent can find the file but does not understand why it exists or how to use it correctly.
-### Phase 3: Suggest Updates
-For each gap, generate a specific suggestion:
-- **Undocumented files:** Draft a new section or entry with the file path, purpose, key exports, and relationship to existing modules. Use the existing AGENTS.md style and structure.
-- **Broken links:** Identify the correct target (renamed file, moved function) or recommend removal if the target was deleted.
-- **Stale sections:** Draft replacement text that reflects current reality. Show the diff between old and new.
-- **Missing context:** Draft additional content that fills the gap. Focus on what an AI agent needs to know: purpose, constraints, relationships, and gotchas.
-### Phase 4: Apply with Approval
-1. **Present all suggestions as a grouped list.** Organize by section of AGENTS.md for easy review.
-2. **Apply approved changes.** Update AGENTS.md with the approved suggestions. Preserve existing formatting and style.
-3. **Re-run `harness check-docs`** to verify all fixes are clean. No new issues should be introduced.
-4. **Commit the update.** Use a descriptive commit message that summarizes what was updated and why.
-## What Makes Good AGENTS.md Content
-Good context engineering treats AGENTS.md as a **dynamic knowledge base**, not a static document.
-- **Accuracy over completeness.** A small, accurate AGENTS.md is better than a comprehensive but stale one. Every statement must be verifiable against the current codebase.
-- **Purpose-first descriptions.** Do not just list files. Explain WHY each module exists, what problem it solves, and what constraints apply to it.
-- **Relationship mapping.** Show how modules connect. Which modules depend on which? What are the boundaries? An agent reading one section should understand how it fits into the whole.
-- **Gotchas and constraints.** Document the non-obvious. What will break if someone does X? What patterns must be followed? What is forbidden and why?
-- **Change-friendly structure.** Organize so that adding a new module means adding one section, not updating ten places. Use consistent formatting so automated tools can parse it.
-- **Actionable guidance.** Every section should help an agent make correct decisions. "This module handles authentication" is less useful than "This module handles authentication. All auth changes must go through the AuthService class. Direct database access for auth data is forbidden — use the repository layer."
-## Harness Integration
-- **`harness validate`** — Full project health check. Reports context gaps as part of overall validation.
-- **`harness check-docs`** — Focused documentation audit. Detects broken links, missing references, stale sections, and undocumented files.
-- **`harness fix-drift`** — Auto-fix simple drift issues (broken links, renamed references). Use after manual review confirms the fixes are correct.
-## Success Criteria
-- `harness check-docs` passes with zero errors and zero warnings
-- Every source file that contains public API or architectural significance is referenced in AGENTS.md
-- All file paths and function names in AGENTS.md match the current codebase
-- All links (internal and external) resolve correctly
-- AGENTS.md sections accurately describe current module purposes, constraints, and relationships
-- A new AI agent reading AGENTS.md can navigate the codebase and make correct decisions without additional guidance
-## Examples
-### Example: New module added but not documented
-**Audit output from `harness check-docs`:**
-```
-WARNING: Undocumented file detected: src/services/notification-service.ts
-  - File contains 3 public exports: NotificationService, NotificationType, sendNotification
-  - File is imported by 4 other modules
-  - No AGENTS.md section references this file
-```
-**Suggested update:**
-```markdown
-### Notification Service (`src/services/notification-service.ts`)
-Handles all outbound notifications (email, Slack, webhook). All notification delivery
-must go through `NotificationService` — direct use of transport libraries (nodemailer,
-Slack SDK) outside this module is forbidden.
-- `NotificationType` — enum of supported notification channels
-- `sendNotification()` — primary entry point; routes to the correct transport
-- Requires `NOTIFICATION_CONFIG` environment variables to be set
-- Respects rate limits defined in `harness.config.json` under `notifications`
-```
-**Apply:** Add the section under the Services heading in AGENTS.md. Re-run `harness check-docs` to confirm the warning is resolved.
-### Example: Renamed function still referenced by old name
-**Audit output:**
-```
-ERROR: Broken reference in AGENTS.md line 47: `calculateShipping()`
-  - Function was renamed to `computeShippingCost()` in commit abc123
-  - Located in src/services/shipping.ts
-```
-**Fix:** Replace `calculateShipping()` with `computeShippingCost()` in AGENTS.md. Verify no other references to the old name exist.
-## Escalation
-- **When AGENTS.md is severely outdated (>20 issues):** Do not attempt to fix everything at once. Prioritize: broken links first, then undocumented public APIs, then stale descriptions. Batch the work across multiple commits.
-- **When you are unsure whether a section is stale:** Check git blame for the section and compare against recent changes to the referenced files. If the section has not been updated since the referenced files changed, it is likely stale.
-- **When the project has no AGENTS.md:** Escalate to the human. Creating an AGENTS.md from scratch is a significant decision about project structure and should be done intentionally, not automatically.
---- skill.yaml (agents/skills/claude-code/validate-context-engineering/skill.yaml) ---
-name: validate-context-engineering
-version: "1.0.0"
-description: Validate repository context engineering practices (AGENTS.md, doc coverage, knowledge map)
-cognitive_mode: meticulous-verifier
-triggers:
-  - manual
-  - on_pr
-  - on_commit
-platforms:
-  - claude-code
-  - gemini-cli
-tools:
-  - Bash
-  - Read
-  - Glob
-cli:
-  command: harness skill run validate-context-engineering
-  args:
-    - name: path
-      description: Project root path
-      required: false
-mcp:
-  tool: run_skill
-  input:
-    skill: validate-context-engineering
-    path: string
-type: flexible
-state:
-  persistent: false
-  files: []
-depends_on: []
-</execution_context>
-<process>
-1. Try: invoke mcp__harness__run_skill with skill: "validate-context-engineering"
-2. If MCP unavailable: follow the SKILL.md workflow provided above directly
-3. Pass through any arguments provided by the user
-</process>
-"""
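
Note on the removed skill: its "Broken links" category (references in AGENTS.md to files that no longer exist) can be illustrated with a small Python sketch. This is not how `harness check-docs` is implemented; the function name `find_broken_references` and the regular expression are assumptions made for illustration. The sketch only looks at backticked tokens that resemble relative file paths.

```python
import re
from pathlib import Path

# Assumption: a "file reference" is a backticked token that contains a slash
# and ends in a file extension. This pattern is illustrative, not from harness.
PATH_REF = re.compile(r"`([\w./-]+/[\w.-]+\.\w+)`")


def find_broken_references(agents_md_text: str, project_root: Path) -> list[tuple[int, str]]:
    """Return (line_number, path) for each referenced path missing under project_root."""
    broken = []
    for line_number, line in enumerate(agents_md_text.splitlines(), start=1):
        for ref in PATH_REF.findall(line):
            # A reference counts as broken when nothing exists at that path.
            if not (project_root / ref).exists():
                broken.append((line_number, ref))
    return broken
```

Calling `find_broken_references(Path("AGENTS.md").read_text(), Path("."))` from a project root would list candidate stale paths for manual review. Function names, URLs, and bare filenames without a directory are out of scope for this sketch.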

package/dist/agents/commands/gemini-cli/harness/verification.toml DELETED

@@ -1,334 +0,0 @@
-# Generated by harness generate-slash-commands. Do not edit.
-description = "Comprehensive harness verification of project health and compliance"
-prompt = """
-<context>
-Cognitive mode: meticulous-verifier
-Type: rigid
-</context>
-<objective>
-Comprehensive harness verification of project health and compliance
-Phases:
-- check: Run all harness validation commands
-- report: Summarize findings and violations
-- remediate: Fix any critical violations
-</objective>
-<execution_context>
---- SKILL.md (agents/skills/claude-code/harness-verification/SKILL.md) ---
-# Harness Verification
-> 3-level evidence-based verification. No completion claims without fresh evidence. "Should work" is not evidence.
-## When to Use
-- After completing any implementation task (before claiming "done")
-- After executing a plan or spec (verify all deliverables)
-- When validating work done by another agent or in a previous session
-- When resuming work after a context reset (re-verify before continuing)
-- When `on_commit` or `on_pr` triggers fire and verification is needed
-- NOT as a replacement for tests (verification checks that tests exist and pass, not that logic is correct)
-- NOT for in-progress work (verify at completion boundaries, not mid-stream)
-### Verification Tiers
-Harness uses a two-tier verification model:
-| Tier           | Skill                             | When                       | What                                               |
-| -------------- | --------------------------------- | -------------------------- | -------------------------------------------------- |
-| **Quick gate** | harness-execution (built-in)      | After every task           | test + lint + typecheck + build + harness validate |
-| **Deep audit** | harness-verification (this skill) | Milestones, PRs, on-demand | EXISTS → SUBSTANTIVE → WIRED                       |
-Use this skill (deep audit) for milestone boundaries, before creating PRs, or when the quick gate passes but something feels wrong. Do NOT invoke this skill after every individual task — that is what the quick gate handles.
-## Process
-### Iron Law
-**No completion claim may be made without fresh verification evidence collected in THIS session.**
-Cached results, remembered outcomes, and "it worked last time" are not evidence. Run the checks. Read the output. Report what you observed.
-The words "should", "probably", "seems to", and "I believe" are forbidden in verification reports. Replace with "verified: [evidence]" or "not verified: [what is missing]."
----
-### Level 1: EXISTS — The Artifact Is Present
-For every artifact that was supposed to be created or modified:
-1. **Check that the file exists on disk.** Use `ls`, `stat`, or read the file. Do not assume it exists because you wrote it — file writes can fail silently.
-2. **Check that the file has content.** An empty file is not an artifact. Read the file and confirm it has non-trivial content.
-3. **Check the file is in the right location.** Compare the actual path against the spec or plan. A file in the wrong directory is not "present."
-4. **Record the result.** For each expected artifact:
-   ```
-   [EXISTS: PASS] path/to/file.ts (247 lines)
-   [EXISTS: FAIL] path/to/missing-file.ts — file not found
-   ```
-Do not proceed to Level 2 until all Level 1 checks pass. Missing files must be created first.
----
-### Level 2: SUBSTANTIVE — Not a Stub
-For every artifact that passed Level 1:
-1. **Read the file content.** Do not skim — read it thoroughly.
-2. **Scan for anti-patterns** that indicate stub or placeholder implementations:
-   - `TODO` or `FIXME` comments (especially `TODO: implement`)
-   - `throw new Error('not implemented')`
-   - `() => {}` (empty arrow functions)
-   - `return null`, `return undefined`, `return {}` as the only logic
-   - `pass` (Python placeholder)
-   - `placeholder`, `stub`, `mock` in non-test code
-   - Functions with only a comment describing what they should do
-   - Interfaces or types defined but never implemented
-3. **Verify real implementation exists.** The file must contain actual logic that performs the described behavior. A function that only returns a hardcoded value is a stub unless that is the correct behavior.
-4. **Check for completeness against the spec.** If the spec says "handles errors X, Y, Z," verify all three are handled, not just X.
-5. **Record the result.** For each artifact:
-   ```
-   [SUBSTANTIVE: PASS] path/to/file.ts — real implementation, no stubs
-   [SUBSTANTIVE: FAIL] path/to/file.ts — contains TODO on line 34, empty handler on line 67
-   ```
-Do not proceed to Level 3 until all Level 2 checks pass. Stubs must be replaced with real implementations first.
----
-### Level 3: WIRED — Connected to the System
-For every artifact that passed Level 2:
-1. **Verify the artifact is imported/required** by at least one other file in the system (unless it is an entry point).
-2. **Verify the artifact is called/used.** An import that is never called is dead code. Trace the usage:
-   - Functions: called from at least one other function or test
-   - Components: rendered in at least one parent or route
-   - Types: used in at least one function signature or variable declaration
-   - Configuration: loaded and applied by the system
-   - Tests: executed by the test runner
-3. **Verify the artifact is tested.** There must be at least one test that exercises the artifact's behavior. Check:
-   - Test file exists
-   - Test imports or references the artifact
-   - Test makes assertions about the artifact's behavior
-   - Test actually runs (not skipped with `.skip` or `xit`)
-4. **Run the tests.** Execute the test suite and verify tests pass. Do not trust "they passed earlier" — run them now.
-5. **Run harness checks.** Execute `harness validate` and verify the artifact integrates correctly with the project's constraints.
-6. **Record the result.** For each artifact:
-   ```
-   [WIRED: PASS] path/to/file.ts — imported by 3 files, tested in file.test.ts (4 tests, all pass)
-   [WIRED: FAIL] path/to/file.ts — exported but not imported by any other file
-   ```
----
-### Anti-Pattern Scan
-Run this scan across all changed files as a final check:
-```
-Scan targets: TODO, FIXME, XXX, HACK, PLACEHOLDER, NOT_IMPLEMENTED
-Code patterns: () => {}, return null (as sole body), pass, raise NotImplementedError
-Test patterns: .skip, xit, xdescribe, @pytest.mark.skip, pending
-```
-Any match is a verification failure. Either fix it or explicitly document why it is acceptable (e.g., "TODO is tracked in issue #123 and out of scope for this task").
----
-### Gap Identification
-After running all three levels, produce a structured gap report:
-```
-## Verification Report
-### Level 1: EXISTS
-- [PASS] path/to/file-a.ts (120 lines)
-- [PASS] path/to/file-b.ts (85 lines)
-- [FAIL] path/to/file-c.ts — not found
-### Level 2: SUBSTANTIVE
-- [PASS] path/to/file-a.ts — real implementation
-- [FAIL] path/to/file-b.ts — TODO on line 22
-### Level 3: WIRED
-- [PASS] path/to/file-a.ts — imported, tested, harness passes
-- [NOT CHECKED] path/to/file-b.ts — blocked by Level 2 failure
-### Anti-Pattern Scan
-- path/to/file-b.ts:22 — TODO: implement validation
-### Gaps
-1. path/to/file-c.ts must be created
-2. path/to/file-b.ts:22 must be implemented (not stub)
-### Verdict: INCOMPLETE — 2 gaps must be resolved
-```
----
-### Regression Test Verification
-When verifying a bug fix, apply this extended protocol:
-1. **Write** the regression test that reproduces the bug
-2. **Run** the test — it must PASS (proving the fix works)
-3. **Revert** the fix (temporarily): `git stash` or comment out the fix
-4. **Run** the test — it must FAIL (proving the test actually catches the bug)
-5. **Restore** the fix: `git stash pop` or uncomment
-6. **Run** the test — it must PASS again (proving the fix is the reason)
-If step 4 passes (test does not fail without the fix), the test is not a valid regression test. It does not catch the bug. Rewrite it.
-## Harness Integration
-- **`harness validate`** — Run in Level 3 WIRED check. Verifies project-wide health and constraint compliance.
-- **`harness check-deps`** — Run in Level 3 to verify new artifacts respect dependency boundaries.
-- **`harness check-docs`** — Run to verify documentation is updated for new artifacts. Missing docs for new public APIs is a gap.
-- **Test runner** — Must be run fresh (not cached) during Level 3. Read actual output, check exit codes.
-All commands must be run fresh in the current session. Do not rely on results from a previous session or a previous run in the same session if code has changed since.
-## Success Criteria
-- Every claimed deliverable has been verified at all 3 levels
-- No anti-patterns remain in delivered code
-- Verification report uses the structured format with PASS/FAIL per artifact per level
-- All verification evidence was collected fresh in the current session
-- No forbidden language ("should", "probably", "seems to") appears in the report
-- All gaps are explicitly identified with specific remediation steps
-- Regression tests (for bug fixes) pass the 5-step revert check
-## Non-Determinism Tolerance
-For mechanical checks (tests pass, lint clean, types check), results are binary — pass or fail. No tolerance.
-For behavioral verification (did the agent follow a convention, did the output match a style guide), accept threshold-based results:
-- Run the check multiple times if needed
-- "Agent followed the constraint in 4/5 runs" = pass
-- "Agent followed the constraint in 2/5 runs" = fail — the convention is poorly written, not the agent
-If a behavioral convention fails more than 40% of the time, the convention needs rewriting. Blame the instruction, not the executor.
-## Examples
-### Example: Verifying a New Service Module
-Task: "Create UserService with create, read, update, delete operations."
-```
-## Verification Report
-### Level 1: EXISTS
-- [PASS] src/services/user-service.ts (189 lines)
-- [PASS] src/services/user-service.test.ts (245 lines)
-- [PASS] src/services/index.ts (updated — exports UserService)
-### Level 2: SUBSTANTIVE
-- [PASS] src/services/user-service.ts — all 4 CRUD methods implemented with
-         validation, error handling, and database calls
-- [PASS] src/services/user-service.test.ts — 12 tests covering happy paths,
-         error cases, and edge cases (no skipped tests)
-### Level 3: WIRED
-- [PASS] src/services/user-service.ts — imported by src/api/routes/users.ts,
-         tested in user-service.test.ts (12 tests, all pass)
-- [PASS] harness validate — passes
-- [PASS] harness check-deps — no boundary violations
-### Anti-Pattern Scan
-- No matches found
-### Gaps
-(none)
-### Verdict: COMPLETE — all artifacts verified at all levels
-```
-## Gates
-- **No completion without evidence.** You may not say "done," "complete," "finished," or "implemented" without a verification report showing PASS at all 3 levels for all deliverables.
-- **No stale evidence.** Evidence must be from the current session. "I checked earlier" is not evidence. Run it again.
-- **No forbidden language.** "Should work," "probably fine," "seems correct," and "I believe it works" are not verification statements. Replace with observed evidence or state "not verified."
-- **No skipping levels.** Level 1 before Level 2. Level 2 before Level 3. Each level depends on the previous.
-- **No satisfaction before evidence.** The natural inclination after writing code is to feel done. Resist it. Feeling done is not being done. Evidence is being done.
-## Escalation
-- **When an artifact cannot pass Level 3 (WIRED) because the system it connects to does not exist yet:** Document the gap explicitly. State what integration is missing and what must be built. Do not mark it as PASS.
-- **When anti-pattern scan finds TODOs that are intentional:** Each must be justified with a tracked issue number. "TODO: implement" with no issue reference is not acceptable. "TODO(#123): add rate limiting after infrastructure is ready" is acceptable.
-- **When tests pass but you suspect they are not testing real behavior:** Read the test assertions carefully. If tests only check "does not throw" or assert on mock return values without verifying real behavior, flag them as SUBSTANTIVE failures.
-- **When verification reveals the spec itself is incomplete:** Do not fill in the gaps yourself. Escalate to the human: "Verification found that the spec does not define behavior for [scenario]. How should this be handled?"
-- **When you cannot run harness checks:** If `harness validate` or `harness check-deps` cannot be run (missing configuration, broken tooling), this is a blocking issue. Do not skip verification — fix the tooling or escalate.
-After verification completes, append a tagged learning:
-- **YYYY-MM-DD [skill:harness-verification] [outcome:pass/fail]:** Verified [feature]. [Brief note on what was found or confirmed.]
---- skill.yaml (agents/skills/claude-code/harness-verification/skill.yaml) ---
-name: harness-verification
-version: "1.0.0"
-description: Comprehensive harness verification of project health and compliance
-cognitive_mode: meticulous-verifier
-triggers:
-  - manual
-  - on_pr
-  - on_commit
-platforms:
-  - claude-code
-  - gemini-cli
-tools:
-  - Bash
-  - Read
-  - Glob
-cli:
-  command: harness skill run harness-verification
-  args:
-    - name: path
-      description: Project root path
-      required: false
-mcp:
-  tool: run_skill
-  input:
-    skill: harness-verification
-    path: string
-type: rigid
-phases:
-  - name: check
-    description: Run all harness validation commands
-    required: true
-  - name: report
-    description: Summarize findings and violations
-    required: true
-  - name: remediate
-    description: Fix any critical violations
-    required: true
-state:
-  persistent: false
-  files: []
-depends_on: []
-</execution_context>
-<process>
-1. Try: invoke mcp__harness__run_skill with skill: "harness-verification"
-2. If MCP unavailable: follow the SKILL.md workflow provided above directly
-3. Pass through any arguments provided by the user
-</process>
-"""
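
Note on the removed skill: the "Anti-Pattern Scan" it describes can be sketched in a few lines of Python. The marker names come from the scan block in the deleted SKILL.md, and the exemption for `TODO(#123)` follows its Escalation section. The regular expressions and the function name `scan` are assumptions, and the sketch covers only the marker and skipped-test patterns, not the code patterns such as empty arrow functions.

```python
import re

# Marker names copied from the "Anti-Pattern Scan" block in the deleted SKILL.md.
MARKERS = ("TODO", "FIXME", "XXX", "HACK", "PLACEHOLDER", "NOT_IMPLEMENTED")
MARKER_RE = re.compile(r"\b(?:" + "|".join(MARKERS) + r")\b")

# Skipped-test patterns from the same block; the regexes are assumptions.
SKIP_RE = re.compile(r"\.skip\b|\bxit\(|\bxdescribe\(|@pytest\.mark\.skip")

# A TODO that cites an issue number, e.g. "TODO(#123): ...", is treated as tracked.
TRACKED_RE = re.compile(r"\bTODO\(#\d+\)")


def scan(source: str) -> list[tuple[int, str]]:
    """Return (line_number, stripped_line) for each line that matches a scan target."""
    findings = []
    for line_number, line in enumerate(source.splitlines(), start=1):
        untracked_marker = MARKER_RE.search(line) and not TRACKED_RE.search(line)
        if untracked_marker or SKIP_RE.search(line):
            findings.append((line_number, line.strip()))
    return findings
```

An empty result from `scan` would correspond to the "No matches found" line in the example report. Any finding would still need the manual fix-or-justify decision the skill requires.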