npm - opencode-skills-collection - Versions diffs - 3.0.46 → 3.0.48 - Mend

opencode-skills-collection 3.0.46 → 3.0.48

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (78)

package/bundled-skills/ecl-harness-engineer/references/audit-templates.md ADDED

@@ -0,0 +1,649 @@
+# Audit Templates
+Templates for auditing and improving existing harness infrastructure.
+Advanced profile note: eval and observability sections in this reference apply only when the
+project explicitly enables advanced agent-platform capabilities. Core ECL harness audits should
+not fail or lose score just because `harness/eval`, `harness/trace`, `harness/memory`,
+`harness/checkpoints`, or `harness/metrics` are absent.
+## Audit Checklist
+### Documentation Audit (25%)
+| Item | Check | Score |
+|------|-------|-------|
+| AGENTS.md exists | `test -f AGENTS.md` | 0/10 |
+| AGENTS.md is ~100 lines (not monolithic) | `wc -l AGENTS.md` should be 80-120 | 0/10 |
+| docs/ARCHITECTURE.md exists | `test -f docs/ARCHITECTURE.md` | 0/10 |
+| Architecture matches reality | Compare layer hierarchy to `go list ./...` | 0/20 |
+| docs/DEVELOPMENT.md exists | `test -f docs/DEVELOPMENT.md` | 0/10 |
+| Build commands in DEVELOPMENT.md work | Run them and check | 0/10 |
+| docs/QUALITY.md exists | `test -f docs/QUALITY.md` | 0/10 |
+| Design docs cover major components | Check docs/design-docs/ | 0/10 |
+| Reference docs are complete | Check docs/references/ | 0/10 |
+**Total: /100 → Scale to 25%**
+### Linter Audit (20%)
+| Item | Check | Score |
+|------|-------|-------|
+| scripts/lint-deps.go exists | `test -f scripts/lint-deps.go` | 0/15 |
+| Layer map covers all packages | Compare to `go list ./...` | 0/20 |
+| Introducing violation fails lint | Add bad import, run lint | 0/15 |
+| scripts/lint-quality.go exists | `test -f scripts/lint-quality.go` | 0/15 |
+| Quality rules match QUALITY.md | Compare documented rules to linter | 0/10 |
+| Makefile has lint-arch target | `grep lint-arch Makefile` | 0/10 |
+| `make lint-arch` passes | Run it | 0/15 |
+**Total: /100 → Scale to 20%**
+### Observability Audit (15%)
+| Item | Check | Score |
+|------|-------|-------|
+| harness/trace/ exists | `test -d harness/trace` | 0/25 |
+| Trace format covers all tool types | Check ToolTrace struct | 0/25 |
+| harness/selftest/ exists | `test -d harness/selftest` | 0/25 |
+| Observability hook registered | Check hook wiring | 0/25 |
+**Total: /100 → Scale to 15%**
+### Eval Audit (20%)
+| Item | Check | Score |
+|------|-------|-------|
+| harness/eval/framework.go exists | `test -f harness/eval/framework.go` | 0/10 |
+| harness/eval/runner.go exists | `test -f harness/eval/runner.go` | 0/10 |
+| harness/eval/scorer.go exists | `test -f harness/eval/scorer.go` | 0/10 |
+| harness/eval/reporter.go exists | `test -f harness/eval/reporter.go` | 0/10 |
+| file_ops/ has 5+ tasks | Count JSON files | 0/10 |
+| code_gen/ has 5+ tasks | Count JSON files | 0/10 |
+| debugging/ has 5+ tasks | Count JSON files | 0/10 |
+| refactoring/ has 5+ tasks | Count JSON files | 0/10 |
+| Tasks cover new features | Manual review | 0/10 |
+| All tasks still work | Run evals | 0/10 |
+**Total: /100 → Scale to 20%**
+### Quality Automation Audit (10%)
+| Item | Check | Score |
+|------|-------|-------|
+| harness/quality/score.go exists | `test -f harness/quality/score.go` | 0/25 |
+| Quality score calculation works | Run it | 0/25 |
+| harness/cleanup/tasks.go exists | `test -f harness/cleanup/tasks.go` | 0/25 |
+| Cleanup tasks find real issues | Run dry-run | 0/25 |
+**Total: /100 → Scale to 10%**
+### Integration Audit (10%)
+| Item | Check | Score |
+|------|-------|-------|
+| `go build ./...` passes | Run it | 0/40 |
+| `make lint-arch` passes | Run it | 0/30 |
+| CI runs harness checks | Check CI config | 0/30 |
+**Total: /100 → Scale to 10%**
+---
+## Scoring Rubric
+### How to Score Each Item
+- **Binary items** (exists/doesn't): 0 or full points
+- **Quality items** (matches reality): Partial credit based on accuracy
+  - 100%: Exact match
+  - 75%: Minor discrepancies (1-2 items)
+  - 50%: Moderate discrepancies (3-5 items)
+  - 25%: Major discrepancies but structure is right
+  - 0%: Completely wrong or missing
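The partial-credit tiers above can be sketched as a small helper. This is a minimal illustration, not part of the package: `partialCredit` and its parameters are hypothetical names, and treating more than five discrepancies as "major" is an assumption, since the rubric gives no count for that tier.

```go
package main

import "fmt"

// partialCredit returns the points awarded for a "quality" item.
// The 0/25/50/75/100% tiers come from the rubric; treating more than
// five discrepancies as "major" is an assumption made for this sketch.
func partialCredit(maxPoints float64, discrepancies int, structureOK bool) float64 {
	var fraction float64
	switch {
	case !structureOK:
		fraction = 0 // completely wrong or missing
	case discrepancies == 0:
		fraction = 1.00 // exact match
	case discrepancies <= 2:
		fraction = 0.75 // minor discrepancies (1-2 items)
	case discrepancies <= 5:
		fraction = 0.50 // moderate discrepancies (3-5 items)
	default:
		fraction = 0.25 // major discrepancies, structure still right
	}
	return maxPoints * fraction
}

func main() {
	// "Architecture matches reality" is worth 20 points in the checklist.
	fmt.Println("exact match:", partialCredit(20, 0, true))
	fmt.Println("minor:      ", partialCredit(20, 2, true))
	fmt.Println("moderate:   ", partialCredit(20, 4, true))
	fmt.Println("major:      ", partialCredit(20, 9, true))
	fmt.Println("missing:    ", partialCredit(20, 0, false))
}
```

Binary items need no helper: they score either 0 or the full points.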
+### Calculating Overall Score
+```
+Overall = (Doc × 0.25) + (Linter × 0.20) + (Obs × 0.15) + (Eval × 0.20) + (Quality × 0.10) + (Integration × 0.10)
+```
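As a worked example of the formula, the sketch below combines six per-dimension percentages into one score. The `overallScore` helper and the sample percentages are hypothetical. Skipping absent dimensions and renormalizing the remaining weights is only one possible reading of the advanced profile note; the rubric itself does not say how absent dimensions are weighted.

```go
package main

import "fmt"

// weights mirrors the formula above.
var weights = map[string]float64{
	"Documentation": 0.25,
	"Linters":       0.20,
	"Observability": 0.15,
	"Evals":         0.20,
	"Quality":       0.10,
	"Integration":   0.10,
}

// overallScore combines per-dimension percentages (0-100) into one score.
// Dimensions absent from percents are skipped and the remaining weights are
// renormalized. That renormalization is an assumption of this sketch, not
// something the rubric specifies.
func overallScore(percents map[string]float64) float64 {
	var sum, weightSum float64
	for dim, pct := range percents {
		w, ok := weights[dim]
		if !ok {
			continue // unknown dimension: ignore
		}
		sum += pct * w
		weightSum += w
	}
	if weightSum == 0 {
		return 0
	}
	return sum / weightSum
}

func main() {
	// Sample percentages, invented for illustration.
	full := map[string]float64{
		"Documentation": 80, "Linters": 60, "Observability": 50,
		"Evals": 40, "Quality": 70, "Integration": 100,
	}
	// 20 + 12 + 7.5 + 8 + 7 + 10 = 64.5
	fmt.Printf("all six dimensions: %.1f%%\n", overallScore(full))

	// Core profile: eval and observability are not enabled.
	core := map[string]float64{
		"Documentation": 80, "Linters": 60, "Quality": 70, "Integration": 100,
	}
	fmt.Printf("core profile only:  %.1f%%\n", overallScore(core))
}
```

With all six dimensions present the weights sum to 1, so the result matches the formula as written.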
+### Score Interpretation
+| Score | Status | Action |
+|-------|--------|--------|
+| 0-20% | Critical | Use Create Mode — build from scratch |
+| 21-40% | Poor | Major gaps — extensive improvement needed |
+| 41-60% | Fair | Multiple gaps — targeted improvement |
+| 61-80% | Good | Minor gaps — polish and expand |
+| 81-100% | Excellent | Maintenance mode — keep current |
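The interpretation table can be expressed as a lookup. This is a sketch only: `statusFor` is a hypothetical name, and because the table lists whole-number bands, treating each upper bound as inclusive for fractional scores (so 20.5 counts as "Poor") is an assumption.

```go
package main

import "fmt"

// statusFor maps an overall score (0-100) to the status in the table above.
// Upper bounds are treated as inclusive; how fractional scores between two
// bands are classified is an assumption of this sketch.
func statusFor(score float64) string {
	switch {
	case score <= 20:
		return "Critical"
	case score <= 40:
		return "Poor"
	case score <= 60:
		return "Fair"
	case score <= 80:
		return "Good"
	default:
		return "Excellent"
	}
}

func main() {
	for _, s := range []float64{12, 35, 58.5, 64.5, 92} {
		fmt.Printf("%5.1f%% -> %s\n", s, statusFor(s))
	}
}
```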
+---
+## Gap Analysis Templates
+### Documentation Drift Report
+````markdown
+## Documentation Drift Analysis
+### ARCHITECTURE.md Layer Hierarchy
+**Documented Layers:**
+```
+[Copy from ARCHITECTURE.md]
+```
+**Actual Package Structure:**
+```bash
+go list ./... | grep -v vendor
+```
+**Discrepancies:**
+| Documented | Actual | Issue |
+|------------|--------|-------|
+| core/types | core/types | ✓ Match |
+| core/agent | core/agent | ✓ Match |
+| - | core/newpkg | Missing from docs |
+### Tool Catalog
+**Documented Tools:** [count]
+**Actual Tools:** [count]
+**Missing from docs:**
+- ToolA (added in commit abc123)
+- ToolB (added in commit def456)
+### Error Codes
+**Documented Codes:** [count]
+**Actual Codes:** [count]
+**Missing from docs:**
+- 300105 NotFoundError (added in PR #123)
+````
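The "Discrepancies" table in the drift report is a two-way set difference between the documented and actual package lists. A minimal sketch, assuming both lists have already been collected as plain strings (for example from ARCHITECTURE.md and from `go list ./...`); the `drift` helper and the `core/legacy` entry are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// drift compares documented package names with the actual ones and returns
// the packages missing from the docs and the documented packages that no
// longer exist. Both results are sorted for stable output.
func drift(documented, actual []string) (missingFromDocs, staleInDocs []string) {
	doc := make(map[string]bool, len(documented))
	for _, p := range documented {
		doc[p] = true
	}
	act := make(map[string]bool, len(actual))
	for _, p := range actual {
		if act[p] {
			continue // ignore duplicate entries
		}
		act[p] = true
		if !doc[p] {
			missingFromDocs = append(missingFromDocs, p)
		}
	}
	for p := range doc {
		if !act[p] {
			staleInDocs = append(staleInDocs, p)
		}
	}
	sort.Strings(missingFromDocs)
	sort.Strings(staleInDocs)
	return missingFromDocs, staleInDocs
}

func main() {
	documented := []string{"core/types", "core/agent", "core/legacy"}
	actual := []string{"core/types", "core/agent", "core/newpkg"}

	missing, stale := drift(documented, actual)
	fmt.Println("missing from docs:", missing)
	fmt.Println("stale in docs:    ", stale)
}
```

The same comparison applies to the tool catalog and error code sections, with tool names or codes in place of package paths.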
+### Linter Gap Report
+```markdown
+## Linter Gap Analysis
+### Layer Map Coverage
+**Packages in layer map:** [count]
+**Packages in codebase:** [count]
+**Missing from layer map:**
+| Package | Suggested Layer | Reason |
+|---------|-----------------|--------|
+| core/newpkg | Layer 2 | Depends only on core/types |
+| api/v2 | Layer 4 | New API version |
+### Violation Test Results
+| Test | Expected | Actual | Status |
+|------|----------|--------|--------|
+| Bad import in core/types | Fail | Fail | ✓ Pass |
+| Bad import in core/agent | Fail | Fail | ✓ Pass |
+| Bad import in api/v2 | Fail | Pass | ✗ Gap |
+### Quality Rules Coverage
+**Rules in QUALITY.md:** [count]
+**Rules in lint-quality.go:** [count]
+**Missing enforcement:**
+- Rule 5: "No hardcoded timeouts" — not checked by linter
+```
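The violation tests in the report assume a linter that rejects imports crossing layers in the wrong direction. The sketch below shows one way such a check could look. It is not the package's `scripts/lint-deps.go`: the `layerOf` map, the layer numbers for `core/types` and `core/agent`, and the rule that a package may import only the same or a lower layer are all assumptions.

```go
package main

import "fmt"

// layerOf is a hypothetical layer map; lower numbers are lower layers.
var layerOf = map[string]int{
	"core/types": 1,
	"core/agent": 2,
	"api/v2":     4,
}

// checkImport returns an error when either package is absent from the layer
// map or when importer depends on a package in a higher layer. The
// "same or lower layer only" rule is an assumption of this sketch.
func checkImport(importer, imported string) error {
	from, ok := layerOf[importer]
	if !ok {
		return fmt.Errorf("%s is missing from the layer map", importer)
	}
	to, ok := layerOf[imported]
	if !ok {
		return fmt.Errorf("%s is missing from the layer map", imported)
	}
	if to > from {
		return fmt.Errorf("%s (layer %d) must not import %s (layer %d)",
			importer, from, imported, to)
	}
	return nil
}

func main() {
	checks := [][2]string{
		{"api/v2", "core/types"},     // allowed: higher layer imports lower
		{"core/types", "api/v2"},     // violation: lower layer imports higher
		{"core/newpkg", "core/types"}, // gap: package not in the layer map
	}
	for _, c := range checks {
		if err := checkImport(c[0], c[1]); err != nil {
			fmt.Println("FAIL:", err)
		} else {
			fmt.Printf("ok:   %s -> %s\n", c[0], c[1])
		}
	}
}
```

Reporting unmapped packages as failures, rather than skipping them, is what surfaces the "Missing from layer map" rows in the report.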
+### Eval Coverage Report
+```markdown
+## Eval Coverage Analysis
+### Tasks per Category
+| Category | Count | Target | Status |
+|----------|-------|--------|--------|
+| file_ops | 3 | 5+ | ✗ Below target |
+| code_gen | 2 | 5+ | ✗ Below target |
+| debugging | 5 | 5+ | ✓ Meets target |
+| refactoring | 4 | 5+ | ✗ Below target |
+### Feature Coverage
+| Feature | Has Eval | Priority |
+|---------|----------|----------|
+| File write | ✓ | - |
+| File read | ✓ | - |
+| JSON parsing | ✗ | P1 |
+| Error handling | ✓ | - |
+| New auth module | ✗ | P0 |
+### Task Health
+| Task ID | Status | Issue |
+|---------|--------|-------|
+| file_ops_001 | ✓ Works | - |
+| code_gen_001 | ✗ Broken | Uses removed API |
+| debug_001 | ✓ Works | - |
+```
+---
+## Improvement Plan Template
+```markdown
+## Harness Improvement Plan
+**Project:** [Name]
+**Audit Date:** YYYY-MM-DD
+**Audit Score:** XX%
+**Target Score:** 80%+
+### Priority Gaps
+#### P0 — Critical (Fix Immediately)
+1. [Gap description]
+   - Impact: [Why this matters]
+   - Fix: [Specific action]
+   - Effort: [Hours estimate]
+#### P1 — High (Fix This Sprint)
+1. [Gap description]
+   - Impact: [Why this matters]
+   - Fix: [Specific action]
+   - Effort: [Hours estimate]
+#### P2 — Medium (Fix Next Sprint)
+1. [Gap description]
+   - Impact: [Why this matters]
+   - Fix: [Specific action]
+   - Effort: [Hours estimate]
+#### P3 — Low (Backlog)
+1. [Gap description]
+   - Impact: [Why this matters]
+   - Fix: [Specific action]
+   - Effort: [Hours estimate]
+### Improvement Timeline
+| Week | Focus | Expected Score |
+|------|-------|----------------|
+| 1 | P0 gaps | 45% → 55% |
+| 2 | P1 gaps | 55% → 70% |
+| 3 | P2 gaps | 70% → 80% |
+| 4 | P3 gaps + polish | 80% → 85% |
+### Success Metrics
+- [ ] Audit score ≥ 80%
+- [ ] No P0 or P1 gaps remaining
+- [ ] `make lint-arch` passes
+- [ ] All eval categories have 5+ tasks
+- [ ] Quality score trend is positive
+```
+---
+## Before/After Comparison Template
+```markdown
+## Improvement Results
+**Project:** [Name]
+**Improvement Period:** YYYY-MM-DD to YYYY-MM-DD
+### Score Comparison
+| Dimension | Before | After | Delta |
+|-----------|--------|-------|-------|
+| Documentation | XX% | XX% | +XX% |
+| Linters | XX% | XX% | +XX% |
+| Observability | XX% | XX% | +XX% |
+| Evals | XX% | XX% | +XX% |
+| Quality | XX% | XX% | +XX% |
+| Integration | XX% | XX% | +XX% |
+| **Overall** | **XX%** | **XX%** | **+XX%** |
+### Changes Made
+#### Documentation
+- Updated ARCHITECTURE.md with [changes]
+- Created design doc for [component]
+- Added [N] entries to tool catalog
+#### Linters
+- Added [N] packages to layer map
+- Created new linter for [pattern]
+- Fixed [N] false positives
+#### Evals
+- Added [N] new eval tasks
+- Removed [N] obsolete tasks
+- Updated [N] broken tasks
+#### Quality
+- Added cleanup task for [pattern]
+- Updated quality score weights
+- Fixed [N] golden principle violations
+### Remaining Gaps
+[List any P2/P3 items not yet addressed]
+### Recommendations
+[Next steps for maintaining/improving harness]
+```
+---
+## Automated Audit Script
+```go
+// scripts/audit-harness.go
+//
+// Automated harness audit. Run: go run scripts/audit-harness.go
+//
+// Outputs JSON with scores per dimension.
+package main
+import (
+	"encoding/json"
+	"fmt"
+	"os"
+	"path/filepath"
+)
+type AuditResult struct {
+	Dimension string      `json:"dimension"`
+	Score     float64     `json:"score"`
+	MaxScore  float64     `json:"max_score"`
+	Percent   float64     `json:"percent"`
+	Items     []AuditItem `json:"items"`
+}
+type AuditItem struct {
+	Name  string  `json:"name"`
+	Score float64 `json:"score"`
+	Max   float64 `json:"max"`
+	Notes string  `json:"notes,omitempty"`
+}
+func main() {
+	results := []AuditResult{
+		auditDocumentation(),
+		auditLinters(),
+		auditObservability(),
+		auditEvals(),
+		auditQuality(),
+		auditIntegration(),
+	}
+	// Calculate overall
+	weights := map[string]float64{
+		"Documentation": 0.25,
+		"Linters": 0.20,
+		"Observability": 0.15,
+		"Evals": 0.20,
+		"Quality": 0.10,
+		"Integration": 0.10,
+	}
+	var overall float64
+	for _, r := range results {
+		overall += r.Percent * weights[r.Dimension]
+	}
+	// Output
+	output := map[string]interface{}{
+		"results": results,
+		"overall": overall,
+	}
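+	data, err := json.MarshalIndent(output, "", "  ")
+	if err != nil {
+		// Report encoding failures instead of printing an empty result.
+		fmt.Fprintln(os.Stderr, "audit: cannot encode results:", err)
+		os.Exit(1)
+	}
+	fmt.Println(string(data))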
+}
+func auditDocumentation() AuditResult {
+	r := AuditResult{Dimension: "Documentation", MaxScore: 100}
+	// Check files exist
+	files := map[string]float64{
+		"AGENTS.md": 10,
+		"docs/ARCHITECTURE.md": 10,
+		"docs/DEVELOPMENT.md": 10,
+		"docs/QUALITY.md": 10,
+	}
+	for file, points := range files {
+		if _, err := os.Stat(file); err == nil {
+			r.Score += points
+			r.Items = append(r.Items, AuditItem{Name: file, Score: points, Max: points})
+		} else {
+			r.Items = append(r.Items, AuditItem{Name: file, Score: 0, Max: points, Notes: "missing"})
+		}
+	}
+	// Check docs/design-docs/ has files (not just the index)
+	if matches, _ := filepath.Glob("docs/design-docs/*.md"); len(matches) > 0 {
+		// Exclude index.md from count
+		actualDocs := 0
+		for _, m := range matches {
+			if !strings.HasSuffix(m, "index.md") {
+				actualDocs++
+			}
+		}
+		score := min(float64(actualDocs)*5, 20)
+		r.Score += score
+		r.Items = append(r.Items, AuditItem{Name: "docs/design-docs/", Score: score, Max: 20, Notes: fmt.Sprintf("%d design docs (excluding index)", actualDocs)})
+	} else {
+		r.Items = append(r.Items, AuditItem{Name: "docs/design-docs/", Score: 0, Max: 20, Notes: "empty or missing"})
+	}
+	// Check docs/references/ has files
+	if matches, _ := filepath.Glob("docs/references/*.md"); len(matches) > 0 {
+		score := min(float64(len(matches))*5, 20)
+		r.Score += score
+		r.Items = append(r.Items, AuditItem{Name: "docs/references/", Score: score, Max: 20, Notes: fmt.Sprintf("%d files", len(matches))})
+	} else {
+		r.Items = append(r.Items, AuditItem{Name: "docs/references/", Score: 0, Max: 20, Notes: "empty or missing"})
+	}
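+	// Remaining 20 points for AGENTS.md line count (80-120 lines, as in the checklist)
+	if data, err := os.ReadFile("AGENTS.md"); err == nil {
+		// Count newline characters so the result matches `wc -l AGENTS.md`.
+		lines := strings.Count(string(data), "\n")
+		if lines >= 80 && lines <= 120 {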
+			r.Score += 20
+			r.Items = append(r.Items, AuditItem{Name: "AGENTS.md size", Score: 20, Max: 20, Notes: fmt.Sprintf("%d lines", lines)})
+		} else if lines < 80 {
+			r.Items = append(r.Items, AuditItem{Name: "AGENTS.md size", Score: 10, Max: 20, Notes: fmt.Sprintf("%d lines (too short)", lines)})
+			r.Score += 10
+		} else {
+			r.Items = append(r.Items, AuditItem{Name: "AGENTS.md size", Score: 5, Max: 20, Notes: fmt.Sprintf("%d lines (too long, should be map)", lines)})
+			r.Score += 5
+		}
+	}
+	r.Percent = (r.Score / r.MaxScore) * 100
+	return r
+}
+func auditLinters() AuditResult {
+	r := AuditResult{Dimension: "Linters", MaxScore: 100}
+	linters := []string{"scripts/lint-deps.go", "scripts/lint-quality.go"}
+	for _, l := range linters {
+		if _, err := os.Stat(l); err == nil {
+			r.Score += 25
+			r.Items = append(r.Items, AuditItem{Name: l, Score: 25, Max: 25})
+		} else {
+			r.Items = append(r.Items, AuditItem{Name: l, Score: 0, Max: 25, Notes: "missing"})
+		}
+	}
+	// Check Makefile
+	if data, err := os.ReadFile("Makefile"); err == nil {
+		if strings.Contains(string(data), "lint-arch") {
+			r.Score += 25
+			r.Items = append(r.Items, AuditItem{Name: "Makefile lint-arch", Score: 25, Max: 25})
+		} else {
+			r.Items = append(r.Items, AuditItem{Name: "Makefile lint-arch", Score: 0, Max: 25, Notes: "target missing"})
+		}
+	}
+	// Remaining 25 for additional linters
+	if matches, _ := filepath.Glob("scripts/lint-*.go"); len(matches) > 2 {
+		r.Score += 25
+		r.Items = append(r.Items, AuditItem{Name: "additional linters", Score: 25, Max: 25, Notes: fmt.Sprintf("%d total", len(matches))})
+	} else {
+		r.Items = append(r.Items, AuditItem{Name: "additional linters", Score: 0, Max: 25, Notes: "only core linters"})
+	}
+	r.Percent = (r.Score / r.MaxScore) * 100
+	return r
+}
+func auditObservability() AuditResult {
+	r := AuditResult{Dimension: "Observability", MaxScore: 100}
+	dirs := map[string]float64{
+		"harness/trace": 35,
+		"harness/selftest": 35,
+	}
+	for dir, points := range dirs {
+		if info, err := os.Stat(dir); err == nil && info.IsDir() {
+			r.Score += points
+			r.Items = append(r.Items, AuditItem{Name: dir, Score: points, Max: points})
+		} else {
+			r.Items = append(r.Items, AuditItem{Name: dir, Score: 0, Max: points, Notes: "missing"})
+		}
+	}
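+	// Check for observability hook. filepath.Glob has no recursive "**"
+	// pattern, so walk the tree and match each file's base name instead.
+	hookFound := false
+	_ = filepath.Walk(".", func(_ string, info os.FileInfo, err error) error {
+		if err != nil || info.IsDir() {
+			return nil
+		}
+		if ok, _ := filepath.Match("observability*.go", info.Name()); ok {
+			hookFound = true
+		}
+		return nil
+	})
+	if hookFound {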
+		r.Score += 30
+		r.Items = append(r.Items, AuditItem{Name: "observability hook", Score: 30, Max: 30})
+	} else {
+		r.Items = append(r.Items, AuditItem{Name: "observability hook", Score: 0, Max: 30, Notes: "not found"})
+	}
+	r.Percent = (r.Score / r.MaxScore) * 100
+	return r
+}
+func auditEvals() AuditResult {
+	r := AuditResult{Dimension: "Evals", MaxScore: 100}
+	// Framework files (40 points)
+	files := []string{
+		"harness/eval/framework.go",
+		"harness/eval/runner.go",
+		"harness/eval/scorer.go",
+		"harness/eval/reporter.go",
+	}
+	for _, f := range files {
+		if _, err := os.Stat(f); err == nil {
+			r.Score += 10
+			r.Items = append(r.Items, AuditItem{Name: f, Score: 10, Max: 10})
+		} else {
+			r.Items = append(r.Items, AuditItem{Name: f, Score: 0, Max: 10, Notes: "missing"})
+		}
+	}
+	// Dataset categories (60 points, 15 each)
+	categories := []string{"file_ops", "code_gen", "debugging", "refactoring"}
+	for _, cat := range categories {
+		pattern := fmt.Sprintf("harness/eval/datasets/%s/*.json", cat)
+		matches, _ := filepath.Glob(pattern)
+		if len(matches) >= 5 {
+			r.Score += 15
+			r.Items = append(r.Items, AuditItem{Name: cat, Score: 15, Max: 15, Notes: fmt.Sprintf("%d tasks", len(matches))})
+		} else if len(matches) > 0 {
+			score := float64(len(matches)) * 3
+			r.Score += score
+			r.Items = append(r.Items, AuditItem{Name: cat, Score: score, Max: 15, Notes: fmt.Sprintf("%d tasks (need 5+)", len(matches))})
+		} else {
+			r.Items = append(r.Items, AuditItem{Name: cat, Score: 0, Max: 15, Notes: "no tasks"})
+		}
+	}
+	r.Percent = (r.Score / r.MaxScore) * 100
+	return r
+}
+func auditQuality() AuditResult {
+	r := AuditResult{Dimension: "Quality", MaxScore: 100}
+	items := map[string]float64{
+		"harness/quality/score.go": 35,
+		"harness/cleanup/tasks.go": 35,
+		"docs/QUALITY.md": 30,
+	}
+	for item, points := range items {
+		if _, err := os.Stat(item); err == nil {
+			r.Score += points
+			r.Items = append(r.Items, AuditItem{Name: item, Score: points, Max: points})
+		} else {
+			r.Items = append(r.Items, AuditItem{Name: item, Score: 0, Max: points, Notes: "missing"})
+		}
+	}
+	r.Percent = (r.Score / r.MaxScore) * 100
+	return r
+}
+func auditIntegration() AuditResult {
+	r := AuditResult{Dimension: "Integration", MaxScore: 100}
+	// Check go.mod exists (a proxy only; this does not run `go build ./...`)
+	if _, err := os.Stat("go.mod"); err == nil {
+		r.Score += 40
+		r.Items = append(r.Items, AuditItem{Name: "go.mod", Score: 40, Max: 40})
+	} else {
+		r.Items = append(r.Items, AuditItem{Name: "go.mod", Score: 0, Max: 40, Notes: "missing"})
+	}
+	// Check Makefile exists
+	if _, err := os.Stat("Makefile"); err == nil {
+		r.Score += 30
+		r.Items = append(r.Items, AuditItem{Name: "Makefile", Score: 30, Max: 30})
+	} else {
+		r.Items = append(r.Items, AuditItem{Name: "Makefile", Score: 0, Max: 30, Notes: "missing"})
+	}
+	// Check for CI config
+	ciConfigs := []string{".github/workflows", ".gitlab-ci.yml", "Jenkinsfile", ".circleci"}
+	found := false
+	for _, ci := range ciConfigs {
+		if _, err := os.Stat(ci); err == nil {
+			found = true
+			r.Score += 30
+			r.Items = append(r.Items, AuditItem{Name: "CI config", Score: 30, Max: 30, Notes: ci})
+			break
+		}
+	}
+	if !found {
+		r.Items = append(r.Items, AuditItem{Name: "CI config", Score: 0, Max: 30, Notes: "not found"})
+	}
+	r.Percent = (r.Score / r.MaxScore) * 100
+	return r
+}
+func min(a, b float64) float64 {
+	if a < b {
+		return a
+	}
+	return b
+}
+import "strings"
+```
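+Note: The script above does not compile as written. The trailing `import "strings"` is deliberately misplaced, and Go requires all imports to precede other declarations. Move `"strings"` into the import block at the top before running the script.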