codeharness 0.17.6 → 0.18.0
- package/dist/index.js +342 -393
- package/package.json +2 -1
- package/patches/dev-enforcement.md +34 -0
- package/patches/retro-enforcement.md +34 -0
- package/patches/review-enforcement.md +23 -0
- package/patches/sprint-planning.md +21 -0
- package/patches/story-verification.md +25 -0
package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "codeharness",
-  "version": "0.
+  "version": "0.18.0",
   "type": "module",
   "description": "CLI for codeharness — makes autonomous coding agents produce software that actually works",
   "bin": {
@@ -9,6 +9,7 @@
   "files": [
     "dist",
     "bin",
+    "patches",
     "templates/Dockerfile.verify",
     "ralph/**/*.sh",
     "ralph/AGENTS.md"
package/patches/dev-enforcement.md
@@ -0,0 +1,34 @@
+## Codeharness Development Enforcement
+
+### Architecture Awareness
+
+Before writing code, read the relevant `AGENTS.md` file for the module being changed. Understand:
+- Build commands and test runners
+- Module boundaries and conventions
+- What NOT to do (common pitfalls documented from prior incidents)
+
+### Observability
+
+After running tests, verify telemetry is flowing:
+- Query VictoriaLogs to confirm log events from test runs
+- If observability is configured, traces should be visible for CLI operations
+
+### Documentation
+
+- Update `AGENTS.md` for all changed modules
+- Keep exec-plan current with implementation state
+
+### Testing
+
+- All tests pass before moving to review
+- Coverage gate: 100% of new/changed code
+- Run `npm test` / `pytest` and verify no regressions
+
+### Black-Box Thinking
+
+Write code that can be verified from the outside. Ask yourself:
+- Can a user exercise this feature from the CLI alone?
+- Is the behavior documented in README.md?
+- Would a verifier with NO source access be able to tell if this works?
+
+If the answer is "no", the feature has a testability gap — fix the CLI/docs, not the verification process.
package/patches/retro-enforcement.md
@@ -0,0 +1,34 @@
+## Codeharness Retrospective Quality Metrics
+
+### Verification Effectiveness
+
+- How many ACs were caught by black-box verification vs slipped through?
+- Were there false positives (proof said PASS but feature was broken)?
+- Were there false negatives (proof said FAIL but feature actually works)?
+- Time spent on verification — is it proportional to value?
+
+### Verification Pipeline Health
+
+- Did the verifier hang on permissions? (check for `--allowedTools` issues)
+- Did stories get stuck in verify→dev loops? (check `attempts` counter)
+- Were stories incorrectly flagged as `integration-required`?
+- Did the verify parser correctly detect `[FAIL]` verdicts?
+
+### Documentation Health
+
+- AGENTS.md accuracy for changed modules
+- Exec-plans completeness for active stories
+- Stale documentation identified and cleaned up
+
+### Test Quality
+
+- Coverage trend (improving, stable, declining)
+- Any flaky tests introduced?
+- Integration test coverage for cross-module interactions
+
+### Action Items
+
+Every retro MUST produce concrete action items with:
+- Clear description of what to fix
+- Why it matters (what fails without this fix)
+- Owner and priority
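The false-positive and false-negative questions in the retrospective metrics above can be answered with a simple tally. The following is a minimal sketch, not part of the package: the function name `verification_outcomes` and the input shape (pairs of proof verdict and whether the feature actually worked) are assumptions made for illustration.

```python
from collections import Counter

def verification_outcomes(results):
    """Tally proof verdicts against what actually happened.

    `results` is an iterable of (proof_verdict, actually_works) pairs.
    This input shape is hypothetical, chosen only for this sketch.
    """
    tally = Counter()
    for verdict, works in results:
        if verdict not in ("PASS", "FAIL"):
            tally["unclassified"] += 1    # e.g. ESCALATE
        elif verdict == "PASS" and not works:
            tally["false_positive"] += 1  # proof said PASS but feature was broken
        elif verdict == "FAIL" and works:
            tally["false_negative"] += 1  # proof said FAIL but feature works
        else:
            tally["correct"] += 1
    return tally
```

Comparing these counts across retrospectives gives a rough view of whether verification effort is paying off.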
package/patches/review-enforcement.md
@@ -0,0 +1,23 @@
+## Codeharness Review Gates
+
+### Verification Proof
+
+- Proof document exists at `verification/<story-key>-proof.md`
+- Proof passes `codeharness verify --story <id>` (parser checks for FAIL/ESCALATE verdicts)
+- Every AC has functional evidence — reading docs alone is not evidence
+- No fabricated output — all evidence must be from actual command execution
+
+### Proof Quality Checks
+
+The proof must pass black-box enforcement:
+- Commands run via `docker exec` (not direct host access)
+- Less than 50% of evidence commands are `grep` against `src/`
+- Each AC section has at least one `docker exec`, `docker ps/logs`, or observability query
+- `[FAIL]` verdicts outside code blocks cause the proof to fail
+- `[ESCALATE]` is acceptable only when all automated approaches are exhausted
+
+### Code Quality
+
+- Coverage delta reported (before vs after)
+- No coverage regression in changed files
+- AGENTS.md is current for all changed modules
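Two of the proof quality checks above are mechanical enough to sketch: `[FAIL]` verdicts only count outside code blocks, and `grep` against `src/` must stay under half of the evidence commands. The example below is an illustration under stated assumptions, not the parser shipped in `dist/index.js`. The names `split_proof` and `check_proof` are hypothetical, and so is the convention that evidence commands are code-block lines starting with `$ `.

```python
FENCE = "`" * 3  # Markdown code-fence marker, built this way to avoid a literal fence

def split_proof(text):
    """Split a proof into (prose_lines, code_lines), toggling on fence lines."""
    prose, code, in_code = [], [], False
    for line in text.splitlines():
        if line.lstrip().startswith(FENCE):
            in_code = not in_code
            continue
        (code if in_code else prose).append(line)
    return prose, code

def check_proof(text):
    """Return a list of problems; an empty list means this sketch found none."""
    prose, code = split_proof(text)
    problems = []
    # A [FAIL] inside captured command output is ignored; one in prose is a verdict.
    if any("[FAIL]" in line for line in prose):
        problems.append("[FAIL] verdict outside a code block")
    # Assumption: evidence commands are code-block lines starting with "$ ".
    commands = [line.strip()[2:] for line in code if line.strip().startswith("$ ")]
    grep_src = [c for c in commands if c.startswith("grep") and "src/" in c]
    if commands and 2 * len(grep_src) >= len(commands):
        problems.append("50% or more of evidence commands are grep against src/")
    return problems
```

Separating prose from fenced content first keeps the verdict check from tripping on command output that happens to contain the string `[FAIL]`.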
package/patches/sprint-planning.md
@@ -0,0 +1,21 @@
+## Codeharness Sprint Planning Integration
+
+### Pre-Planning Checks
+
+Before selecting stories, verify:
+1. All prior retrospective action items are reviewed (`_bmad-output/implementation-artifacts/session-retro-*.md`)
+2. Unresolved action items are surfaced — do not start new work while critical fixes are pending
+3. `ralph/.story_retries` is reviewed — stories with high attempt counts may need architectural changes, not more retries
+
+### Backlog Sources
+
+Import from all sources before triage:
+- `codeharness retro-import --epic N` for retrospective findings
+- `codeharness github-import` for labeled GitHub issues
+- `bd ready` to display combined backlog
+
+### Story Readiness
+
+- Each story has clear, testable acceptance criteria
+- ACs are written so they CAN be verified from CLI + Docker (avoid writing untestable ACs)
+- Dependencies between stories are explicit
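The pre-planning check on `ralph/.story_retries` above amounts to flagging stories whose attempt count is high. The diff does not show that file's format, so the sketch below assumes one `story-key count` pair per line. That layout, the function name `stories_needing_rethink`, and the threshold of 3 are all hypothetical.

```python
def stories_needing_rethink(retries_text, threshold=3):
    """Return (story_key, attempts) pairs at or above `threshold`, highest first.

    Assumes each line looks like "<story-key> <count>"; the real file
    format is not shown in the diff, so treat this layout as a guess.
    """
    flagged = []
    for raw in retries_text.splitlines():
        parts = raw.split()
        if len(parts) != 2 or not parts[1].isdigit():
            continue  # skip blank or unrecognised lines
        key, attempts = parts[0], int(parts[1])
        if attempts >= threshold:
            flagged.append((key, attempts))
    return sorted(flagged, key=lambda item: item[1], reverse=True)
```

Stories returned here are candidates for an architectural look rather than another retry.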
package/patches/story-verification.md
@@ -0,0 +1,25 @@
+## Verification Requirements
+
+Every story must produce a **black-box proof** — evidence that the feature works from the user's perspective, NOT from reading source code.
+
+### Proof Standard
+
+- Proof document at `verification/<story-key>-proof.md`
+- Each AC gets a `## AC N:` section with `docker exec` commands and captured output
+- Evidence must come from running the installed CLI/tool, not from grepping source
+- `[FAIL]` = AC failed with evidence showing what went wrong
+- `[ESCALATE]` = AC genuinely cannot be automated (last resort — try everything first)
+
+### Verification Tags
+
+For each AC, append a tag indicating verification approach:
+- `<!-- verification: cli-verifiable -->` — default. Can be verified via CLI commands in a Docker container.
+- `<!-- verification: integration-required -->` — requires external systems not available in the test environment (e.g., paid third-party APIs, physical hardware). This is rare — most things including workflows, agent sessions, and multi-step processes CAN be verified in Docker.
+
+**Do not over-tag.** Workflows, sprint planning, user sessions, slash commands, and agent behavior are all verifiable via `docker exec ... claude --print`. Only tag `integration-required` when there is genuinely no automated path.
+
+### Testing Requirements
+
+- Unit tests for all new/changed code
+- Coverage target: 100% of new/changed lines
+- No skipped tests without justification
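The verification tags above are HTML comments appended to an AC, with `cli-verifiable` as the default when no tag is present. A minimal sketch of reading one might look like the following. The function name `verification_tag` and the choice to raise on unknown tags are assumptions for illustration, not behaviour confirmed by the package.

```python
import re

KNOWN_TAGS = ("cli-verifiable", "integration-required")
TAG = re.compile(r"<!--\s*verification:\s*([\w-]+)\s*-->")

def verification_tag(ac_text):
    """Return the verification tag appended to one AC's text."""
    match = TAG.search(ac_text)
    if match is None:
        return "cli-verifiable"  # documented default when no tag is present
    tag = match.group(1)
    if tag not in KNOWN_TAGS:
        # Rejecting unknown tags is a choice made in this sketch only.
        raise ValueError(f"unknown verification tag: {tag}")
    return tag
```

Counting how many ACs come back as `integration-required` is one way to spot the over-tagging the patch warns about.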