codeharness 0.17.6 → 0.18.0

package/package.json CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "codeharness",
-  "version": "0.17.6",
+  "version": "0.18.0",
   "type": "module",
   "description": "CLI for codeharness — makes autonomous coding agents produce software that actually works",
   "bin": {
@@ -9,6 +9,7 @@
   "files": [
     "dist",
     "bin",
+    "patches",
     "templates/Dockerfile.verify",
     "ralph/**/*.sh",
     "ralph/AGENTS.md"
@@ -0,0 +1,34 @@
+## Codeharness Development Enforcement
+
+### Architecture Awareness
+
+Before writing code, read the relevant `AGENTS.md` file for the module being changed. Understand:
+- Build commands and test runners
+- Module boundaries and conventions
+- What NOT to do (common pitfalls documented from prior incidents)
+
+### Observability
+
+After running tests, verify telemetry is flowing:
+- Query VictoriaLogs to confirm log events from test runs
+- If observability is configured, traces should be visible for CLI operations
+
+### Documentation
+
+- Update `AGENTS.md` for all changed modules
+- Keep exec-plan current with implementation state
+
+### Testing
+
+- All tests pass before moving to review
+- Coverage gate: 100% of new/changed code
+- Run `npm test` / `pytest` and verify no regressions
+
+### Black-Box Thinking
+
+Write code that can be verified from the outside. Ask yourself:
+- Can a user exercise this feature from the CLI alone?
+- Is the behavior documented in README.md?
+- Would a verifier with NO source access be able to tell if this works?
+
+If the answer is "no", the feature has a testability gap — fix the CLI/docs, not the verification process.
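The "query VictoriaLogs" check above can be sketched as a one-liner against the VictoriaLogs LogsQL HTTP endpoint. The endpoint path and port 9428 are the stock VictoriaLogs defaults; the query filter is an assumption and must match however your test runs label their logs.

```shell
# Sketch only: default VictoriaLogs endpoint; the LogsQL filter below is a
# placeholder and should match your deployment's log labels.
resp=$(curl -s --max-time 5 http://localhost:9428/select/logsql/query \
         --data-urlencode 'query=_time:15m "test run"' \
       || echo "VictoriaLogs not reachable")
echo "$resp"
```

A non-empty response (or the fallback message) tells you whether the telemetry path is worth investigating before trusting the test run.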
@@ -0,0 +1,34 @@
+## Codeharness Retrospective Quality Metrics
+
+### Verification Effectiveness
+
+- How many ACs were caught by black-box verification vs slipped through?
+- Were there false positives (proof said PASS but the feature was broken)?
+- Were there false negatives (proof said FAIL but the feature actually works)?
+- Time spent on verification — is it proportional to value?
+
+### Verification Pipeline Health
+
+- Did the verifier hang on permissions? (check for `--allowedTools` issues)
+- Did stories get stuck in verify→dev loops? (check the `attempts` counter)
+- Were stories incorrectly flagged as `integration-required`?
+- Did the verify parser correctly detect `[FAIL]` verdicts?
+
+### Documentation Health
+
+- `AGENTS.md` accuracy for changed modules
+- Exec-plan completeness for active stories
+- Stale documentation identified and cleaned up
+
+### Test Quality
+
+- Coverage trend (improving, stable, declining)
+- Any flaky tests introduced?
+- Integration test coverage for cross-module interactions
+
+### Action Items
+
+Every retro MUST produce concrete action items with:
+- A clear description of what to fix
+- Why it matters (what fails without this fix)
+- Owner and priority
@@ -0,0 +1,23 @@
+## Codeharness Review Gates
+
+### Verification Proof
+
+- Proof document exists at `verification/<story-key>-proof.md`
+- Proof passes `codeharness verify --story <id>` (the parser checks for FAIL/ESCALATE verdicts)
+- Every AC has functional evidence — reading docs alone is not evidence
+- No fabricated output — all evidence must come from actual command execution
+
+### Proof Quality Checks
+
+The proof must pass black-box enforcement:
+- Commands run via `docker exec` (not direct host access)
+- Less than 50% of evidence commands are `grep` against `src/`
+- Each AC section has at least one `docker exec`, `docker ps`/`logs`, or observability query
+- `[FAIL]` verdicts outside code blocks cause the proof to fail
+- `[ESCALATE]` is acceptable only when all automated approaches are exhausted
+
+### Code Quality
+
+- Coverage delta reported (before vs after)
+- No coverage regression in changed files
+- `AGENTS.md` is current for all changed modules
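The "`[FAIL]` outside code blocks" rule can be illustrated with a small stand-in parser. This is a sketch of the stated rule only, not the actual codeharness implementation; the function name and demo file are invented.

```shell
# Illustrative parser for one rule: a [FAIL] verdict in prose fails the
# proof, while [FAIL] inside a fenced code block (captured output) does not.
fence=$(printf '\140\140\140')   # three backticks, built without literal quoting issues

check_proof() {
  awk -v f="$fence" '
    index($0, f) == 1 { in_code = !in_code; next }   # toggle fence state
    !in_code && /\[FAIL\]/ { bad = 1 }               # verdict in prose
    END { exit bad }
  ' "$1"
}

# Demo proof: one [FAIL] safely inside a fence, one real verdict outside it.
cat > /tmp/proof-demo.md <<EOF
## AC 1: output check
$fence
[FAIL] inside captured output, ignored by the rule
$fence
[FAIL] real verdict in prose
EOF

check_proof /tmp/proof-demo.md && echo "proof passes" || echo "proof fails"
```

Running it prints `proof fails`, because the second `[FAIL]` sits outside any fence.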
@@ -0,0 +1,21 @@
+## Codeharness Sprint Planning Integration
+
+### Pre-Planning Checks
+
+Before selecting stories, verify:
+1. All prior retrospective action items are reviewed (`_bmad-output/implementation-artifacts/session-retro-*.md`)
+2. Unresolved action items are surfaced — do not start new work while critical fixes are pending
+3. `ralph/.story_retries` is reviewed — stories with high attempt counts may need architectural changes, not more retries
+
+### Backlog Sources
+
+Import from all sources before triage:
+- `codeharness retro-import --epic N` for retrospective findings
+- `codeharness github-import` for labeled GitHub issues
+- `bd ready` to display combined backlog
+
+### Story Readiness
+
+- Each story has clear, testable acceptance criteria
+- ACs are written so they CAN be verified from CLI + Docker (avoid writing untestable ACs)
+- Dependencies between stories are explicit
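The retry review in the pre-planning checks can be sketched as a one-line filter. The format of `ralph/.story_retries` is not documented here; this assumes "story-key count" pairs, one per line, and a demo file stands in for the real one.

```shell
# Hypothetical sketch: flag high-retry stories before planning.
# Assumed file format: "story-key count" per line; the real format may differ.
retries=/tmp/story_retries.demo
printf '%s\n' 'auth-login 1' 'billing-sync 4' > "$retries"

# Stories at 3+ attempts get flagged for architectural review, not another retry.
awk '$2 >= 3 { print "needs architectural review:", $1 }' "$retries"
```

The threshold of 3 is illustrative; the point is to surface these stories before new work is selected.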
@@ -0,0 +1,25 @@
+## Verification Requirements
+
+Every story must produce a **black-box proof** — evidence that the feature works from the user's perspective, NOT from reading source code.
+
+### Proof Standard
+
+- Proof document at `verification/<story-key>-proof.md`
+- Each AC gets a `## AC N:` section with `docker exec` commands and captured output
+- Evidence must come from running the installed CLI/tool, not from grepping source
+- `[FAIL]` = AC failed, with evidence showing what went wrong
+- `[ESCALATE]` = AC genuinely cannot be automated (last resort — try everything first)
+
+### Verification Tags
+
+For each AC, append a tag indicating the verification approach:
+- `<!-- verification: cli-verifiable -->` — default. Can be verified via CLI commands in a Docker container.
+- `<!-- verification: integration-required -->` — requires external systems not available in the test environment (e.g., paid third-party APIs, physical hardware). This is rare — most things, including workflows, agent sessions, and multi-step processes, CAN be verified in Docker.
+
+**Do not over-tag.** Workflows, sprint planning, user sessions, slash commands, and agent behavior are all verifiable via `docker exec ... claude --print`. Only tag `integration-required` when there is genuinely no automated path.
+
+### Testing Requirements
+
+- Unit tests for all new/changed code
+- Coverage target: 100% of new/changed lines
+- No skipped tests without justification
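As an illustration of the proof standard above, a minimal AC section might take the following shape. The story key, container name, and command output are all invented for the example; captured output is shown as an indented block.

```markdown
## AC 1: `verify` reports a verdict for the story

    $ docker exec ch-verify codeharness verify --story demo-1
    [PASS] verification/demo-1-proof.md: no FAIL/ESCALATE verdicts

[PASS] — verdict obtained from the installed CLI, with no source access.

<!-- verification: cli-verifiable -->
```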