@uzysjung/agent-harness 26.83.0

Files changed (212)

package/templates/skills/eval-harness/SKILL.md ADDED

@@ -0,0 +1,279 @@
+---
+name: eval-harness
+description: Formal evaluation framework for Claude Code sessions implementing eval-driven development (EDD) principles
+origin: ECC
+tools: Read, Write, Edit, Bash, Grep, Glob
+---
+# Eval Harness Skill
+A formal evaluation framework for Claude Code sessions, implementing eval-driven development (EDD) principles.
+## When to Activate
+- Setting up eval-driven development (EDD) for AI-assisted workflows
+- Defining pass/fail criteria for Claude Code task completion
+- Measuring agent reliability with pass@k metrics
+- Creating regression test suites for prompt or agent changes
+- Benchmarking agent performance across model versions
+## Philosophy
+Eval-Driven Development treats evals as the "unit tests of AI development":
+- Define expected behavior BEFORE implementation
+- Run evals continuously during development
+- Track regressions with each change
+- Use pass@k metrics for reliability measurement
+## Eval Types
+### Capability Evals
+Test if Claude can do something it couldn't before:
+```markdown
+[CAPABILITY EVAL: feature-name]
+Task: Description of what Claude should accomplish
+Success Criteria:
+  - [ ] Criterion 1
+  - [ ] Criterion 2
+  - [ ] Criterion 3
+Expected Output: Description of expected result
+```
+### Regression Evals
+Ensure changes don't break existing functionality:
+```markdown
+[REGRESSION EVAL: feature-name]
+Baseline: SHA or checkpoint name
+Tests:
+  - existing-test-1: PASS/FAIL
+  - existing-test-2: PASS/FAIL
+  - existing-test-3: PASS/FAIL
+Result: X/Y passed (previously Y/Y)
+```
+## Grader Types
+### 1. Code-Based Grader
+Deterministic checks using code:
+```bash
+# Check if file contains expected pattern
+grep -q "export function handleAuth" src/auth.ts && echo "PASS" || echo "FAIL"
+# Check if tests pass
+npm test -- --testPathPattern="auth" && echo "PASS" || echo "FAIL"
+# Check if build succeeds
+npm run build && echo "PASS" || echo "FAIL"
+```
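The shell one-liners above all map an exit status to PASS/FAIL. A minimal Python sketch of the same idea follows; it is not part of the packaged skill, `grade` is a hypothetical helper, and the two stand-in commands exist only so the example runs anywhere (in practice they would be the `grep`, `npm test`, or `npm run build` invocations).

```python
import subprocess
import sys
from typing import List


def grade(command: List[str]) -> str:
    """Run a command and map its exit status to PASS (0) or FAIL (non-zero)."""
    completed = subprocess.run(command, capture_output=True, text=True)
    return "PASS" if completed.returncode == 0 else "FAIL"


# Stand-in commands (assumption): replace with the real checks for a project.
checks = {
    "always-succeeds": [sys.executable, "-c", "raise SystemExit(0)"],
    "always-fails": [sys.executable, "-c", "raise SystemExit(1)"],
}

for name, command in checks.items():
    print(f"{name}: {grade(command)}")
```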
+### 2. Model-Based Grader
+Use Claude to evaluate open-ended outputs:
+```markdown
+[MODEL GRADER PROMPT]
+Evaluate the following code change:
+1. Does it solve the stated problem?
+2. Is it well-structured?
+3. Are edge cases handled?
+4. Is error handling appropriate?
+Score: 1-5 (1=poor, 5=excellent)
+Reasoning: [explanation]
+```
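If the grader replies in the `Score:` / `Reasoning:` shape of the prompt above, the score can be extracted mechanically. The sketch below is one possible approach, not the skill's own: `parse_score`, `verdict`, and the threshold of 4 are assumptions, as is the choice to fall back to human review when no score is found.

```python
import re
from typing import Optional

# Matches a line that is exactly "Score: N" with N in 1..5.
SCORE = re.compile(r"^Score:\s*([1-5])\s*$", re.MULTILINE)


def parse_score(grader_output: str) -> Optional[int]:
    """Return the 1-5 score from the grader's reply, or None if absent."""
    match = SCORE.search(grader_output)
    return int(match.group(1)) if match else None


def verdict(grader_output: str, threshold: int = 4) -> str:
    """Turn a score into PASS/FAIL; the threshold is a hypothetical default."""
    score = parse_score(grader_output)
    if score is None:
        return "HUMAN REVIEW REQUIRED"
    return "PASS" if score >= threshold else "FAIL"


print(verdict("Score: 4\nReasoning: handles the edge cases"))
print(verdict("Score: 2\nReasoning: no error handling"))
```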
+### 3. Human Grader
+Flag for manual review:
+```markdown
+[HUMAN REVIEW REQUIRED]
+Change: Description of what changed
+Reason: Why human review is needed
+Risk Level: LOW/MEDIUM/HIGH
+```
+## Metrics
+### pass@k
+"At least one success in k attempts"
+- pass@1: First attempt success rate
+- pass@3: Success within 3 attempts
+- Typical target: pass@3 > 90%
+### pass^k
+"All k trials succeed"
+- Higher bar for reliability
+- pass^3: 3 consecutive successes
+- Use for critical paths
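The two metrics above can be computed from recorded trial outcomes. This is a minimal sketch under the definitions given here ("at least one success in k attempts" and "all k trials succeed"); the function names and the sample data are hypothetical, and the outcomes are chosen to match the sample report later in this skill.

```python
from typing import Callable, Dict, List


def pass_at_k(trials: List[bool], k: int) -> bool:
    """True if at least one of the first k trials succeeded."""
    return any(trials[:k])


def pass_hat_k(trials: List[bool], k: int) -> bool:
    """True only if at least k trials were run and the first k all succeeded."""
    return len(trials) >= k and all(trials[:k])


def rate(results: Dict[str, List[bool]], k: int,
         metric: Callable[[List[bool], int], bool]) -> float:
    """Fraction of evals that satisfy the given metric."""
    return sum(metric(trials, k) for trials in results.values()) / len(results)


# Hypothetical outcomes: validate-email needed a second attempt.
results = {
    "create-user": [True],
    "validate-email": [False, True],
    "hash-password": [True],
}

print(f"pass@1: {rate(results, 1, pass_at_k):.0%}")  # pass@1: 67%
print(f"pass@3: {rate(results, 3, pass_at_k):.0%}")  # pass@3: 100%
```

Note that `pass_hat_k` deliberately returns False when fewer than k trials were recorded, so an eval cannot claim pass^3 after a single run.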
+## Eval Workflow
+### 1. Define (Before Coding)
+```markdown
+## EVAL DEFINITION: feature-xyz
+### Capability Evals
+1. Can create new user account
+2. Can validate email format
+3. Can hash password securely
+### Regression Evals
+1. Existing login still works
+2. Session management unchanged
+3. Logout flow intact
+### Success Metrics
+- pass@3 > 90% for capability evals
+- pass^3 = 100% for regression evals
+```
+### 2. Implement
+Write code to pass the defined evals.
+### 3. Evaluate
+```bash
+# Run capability evals
+# (placeholder: run each capability eval and record PASS/FAIL)
+# Run regression evals
+npm test -- --testPathPattern="existing"
+# Generate report
+```
+### 4. Report
+```markdown
+EVAL REPORT: feature-xyz
+========================
+Capability Evals:
+  create-user:     PASS (pass@1)
+  validate-email:  PASS (pass@2)
+  hash-password:   PASS (pass@1)
+  Overall:         3/3 passed
+Regression Evals:
+  login-flow:      PASS
+  session-mgmt:    PASS
+  logout-flow:     PASS
+  Overall:         3/3 passed
+Metrics:
+  pass@1: 67% (2/3)
+  pass@3: 100% (3/3)
+Status: READY FOR REVIEW
+```
+## Integration Patterns
+### Pre-Implementation
+```
+/eval define feature-name
+```
+Creates eval definition file at `.claude/evals/feature-name.md`
+### During Implementation
+```
+/eval check feature-name
+```
+Runs current evals and reports status
+### Post-Implementation
+```
+/eval report feature-name
+```
+Generates full eval report
+## Eval Storage (.md + .log Pair Format)
+Each eval is stored as a pair: **`<topic>.md` (design) + `<topic>.log` (run results)**. The pair is mandatory; a `.md` file on its own cannot be reproduced.
+```
+.claude/
+  evals/
+    feature-xyz.md        # Eval definition (three required sections: Capability/Regression/Test)
+    feature-xyz.log       # Eval run history (run time, grader, pass/fail)
+    session-YYYYMMDD.md   # Per-session retrospective + next backlog
+    session-YYYYMMDD.log  # Grader output for the same session
+    baseline.json         # Regression baselines (optional)
+```
+> Generalized from the `.claude/evals/*.{md,log}` layout of the Vantage project.
+### Required Sections in the .md File (3)
+```markdown
+# Eval: <topic>
+## Capability
+[New capability: what Claude/the agent can now do]
+- AC: [measurable criterion]
+- Grader: code-based / model-based / human
+## Regression
+[Protection of existing functionality: the baseline that must not break with this change]
+- Baseline: <SHA or checkpoint>
+- Tests: [list]
+## Test
+[Run procedure: anyone who re-runs it must get the same result]
+- Setup: [preconditions]
+- Run: `bash run-eval.sh <topic>` or an explicit command
+- Expected: [expected output]
+```
+### .log File Format
+Append an entry for every run; entries accumulate in chronological order.
+```
+=== 2026-04-19 14:32 (run #1) ===
+Capability: 3/3 PASS (pass@1)
+Regression: 5/5 PASS (pass^3)
+Status: SHIP READY
+=== 2026-04-20 09:15 (run #2 — after refactor) ===
+Capability: 3/3 PASS
+Regression: 4/5 PASS (login-flow regressed at SHA abc123)
+Status: BLOCKED — fix login-flow first
+```
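Appending one entry per run in the layout shown above could look like the following. This is a sketch only: `append_run` and its parameters are hypothetical, and the skill does not prescribe a writer. The demo writes to a temporary directory rather than `.claude/evals/`.

```python
import tempfile
from datetime import datetime
from pathlib import Path


def append_run(log_path: Path, run_number: int, capability: str,
               regression: str, status: str, when: datetime) -> None:
    """Append one run entry; earlier entries are never rewritten."""
    entry = (
        f"=== {when:%Y-%m-%d %H:%M} (run #{run_number}) ===\n"
        f"Capability: {capability}\n"
        f"Regression: {regression}\n"
        f"Status: {status}\n"
    )
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(entry)


with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "feature-xyz.log"
    append_run(log, 1, "3/3 PASS (pass@1)", "5/5 PASS (pass^3)",
               "SHIP READY", datetime(2026, 4, 19, 14, 32))
    append_run(log, 2, "3/3 PASS", "4/5 PASS", "BLOCKED",
               datetime(2026, 4, 20, 9, 15))
    print(log.read_text(encoding="utf-8"))
```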
+## Best Practices
+1. **Define evals BEFORE coding** - Forces clear thinking about success criteria
+2. **Run evals frequently** - Catch regressions early
+3. **Track pass@k over time** - Monitor reliability trends
+4. **Use code graders when possible** - Deterministic > probabilistic
+5. **Human review for security** - Never fully automate security checks
+6. **Keep evals fast** - Slow evals don't get run
+7. **Version evals with code** - Evals are first-class artifacts
+## Example: Adding Authentication
+```markdown
+## EVAL: add-authentication
+### Phase 1: Define (10 min)
+Capability Evals:
+- [ ] User can register with email/password
+- [ ] User can login with valid credentials
+- [ ] Invalid credentials rejected with proper error
+- [ ] Sessions persist across page reloads
+- [ ] Logout clears session
+Regression Evals:
+- [ ] Public routes still accessible
+- [ ] API responses unchanged
+- [ ] Database schema compatible
+### Phase 2: Implement (varies)
+[Write code]
+### Phase 3: Evaluate
+Run: /eval check add-authentication
+### Phase 4: Report
+EVAL REPORT: add-authentication
+==============================
+Capability: 5/5 passed (pass@3: 100%)
+Regression: 3/3 passed (pass^3: 100%)
+Status: SHIP IT
+```

package/templates/skills/eval-harness/agents/openai.yaml ADDED

@@ -0,0 +1,7 @@
+interface:
+  display_name: "Eval Harness"
+  short_description: "Eval-driven development with pass/fail criteria"
+  brand_color: "#EC4899"
+  default_prompt: "Set up eval-driven development with pass/fail criteria"
+policy:
+  allow_implicit_invocation: true

package/templates/skills/gh-issue-workflow/ISSUE.template.md ADDED

@@ -0,0 +1,58 @@
+<!--
+GitHub Issue body template (gh-issue-workflow skill v26.34.0)
+- Not all 5 sections have to be filled in, but delete any line left empty (do not leave placeholders).
+- BDD mapping: Preconditions (Given) → Scope (When) → AC (Then).
+- The Decision state determines whether work may start (OPEN = work blocked, "YYYY-MM-DD confirmed" = work allowed).
+- Labels (3 axes, recommended):
+  - type: bug | feature | refactor | docs | infra
+  - status: decision-pending (Decision OPEN) | ready (confirmed) | in-progress (PR open) | blocked (preconditions unmet)
+  - priority: P0 | P1 | P2 (optional)
+- GitHub Project integration (optional): if `github_project: <URL>` is set in docs/SPEC.md, the issue is added automatically.
+-->
+## Background
+[Why this work is needed, in 1-3 sentences: the symptom the user found, the target state, the business context.]
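+## Preconditions (Given)
+[Conditions that must hold before this work starts: other issues, external dependencies, decision outcomes, infrastructure state. Work is blocked while any is unmet.]
+- [ ] [Precondition 1, e.g. Issue #N completed]
+- [ ] [Precondition 2, e.g. Stripe account issued]
+- [ ] [Precondition 3, e.g. DB schema v3 migration]
+If a precondition is unmet → state the blocking reason and record who will satisfy it and in what order.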
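+## Decision (OPEN | YYYY-MM-DD confirmed)
+[Current decision state. `OPEN` = waiting for the user's decision, `YYYY-MM-DD confirmed` = decision made.]
+- Option A: [description]
+- Option B: [description]
+- **Choice (once confirmed)**: [chosen option + rationale]
+While the Decision is OPEN, no work may proceed on this issue. The AI agent waits for the user's decision.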
+## Scope / Acceptance Criteria (When → Then)
+[Scope of the change + measurable completion criteria.]
+- [ ] [AC 1, e.g. the `/admin/activity-logs` page works for page 11 and beyond (When the user clicks 11 → Then page 11 data is shown)]
+- [ ] [AC 2, e.g. design system tokens are used (When the page renders → Then colors/spacing match the design system)]
+- [ ] [AC 3]
+Each AC must be verifiable, with a clear pass/fail.
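+## Follow-up Work (Next)
+[Work that branches off once this issue is done: new issue numbers or a tentative description.]
+- [ ] [Follow-up 1, e.g. split out as Issue #N]
+- [ ] [Follow-up 2]
+If there is no follow-up work, delete this whole section.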
+---
+<!-- Add `Closes #<this-issue-number>` to the PR body so that this issue closes automatically when the PR is merged -->

package/templates/skills/gh-issue-workflow/SKILL.md ADDED

@@ -0,0 +1,184 @@
+---
+name: gh-issue-workflow
+description: "Treats GitHub Issues as the async backlog + decision channel between user and AI agent. Use when a non-blocking todo / bug / decision needs to persist beyond the chat session. Enforces 5-section body template (Background / Given / Decision / AC / Next) so issues become reusable agent context, not just sticky notes."
+---
+# GitHub Issue Workflow
+## Purpose
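+GitHub Issues fill the gap between chat (volatile) and plan.md (static). In a solo user + AI agent collaboration:
+- Bug/feature requests the user discovers → backlogged as issues (without interrupting the chat)
+- Forks that need a decision → options laid out in the issue body → the user decides asynchronously → the AI agent fetches the issue and does the work
+- A permanent, searchable record of every decision (using `#N` cross-links, labels, milestones)
+Generalized from the actual operating pattern of the dyld-vantage project (`#52~#55`). Optimized for the solo scenario (no team assignment or reviewer automation).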
+## When to Invoke
+| Trigger | Action |
+|--------|------|
+| `/uzys:spec` starts + GitHub remote exists | Suggest once: "create an epic issue?" (optional) |
+| `/uzys:plan` starts | Fetch the list of OPEN issues → decide priorities, then move them into todo.md |
+| User finds a new bug/req during `/uzys:build` | Suggest: "backlog it as an issue?" |
+| `/uzys:build` commit | `Refs #N` in the message (records work in progress) |
+| `/uzys:ship` PR creation | `Closes #N` in the body (auto-close) |
+| A decision fork appears | Register it in the issue body as `Decision (OPEN)` → wait for the user |
+## Pre-conditions
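+- The project has a GitHub remote (check with `git remote -v`)
+- `gh` CLI installed + authenticated (check with `gh auth status`). Prefer MCP `mcp__github__*` when available.
+- Active only when `docs/SPEC.md` contains the line `issue_tracking: enabled` (opt-in). Disabled by default.
+If these conditions are not met, this skill is skipped automatically, without raising an error.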
+## Process
+### 1. Enforce the 5 Sections of ISSUE.template.md
+When creating a new issue, fill in the body from `ISSUE.template.md` in this skill's directory.
+```
+## Background            - Why
+## Preconditions (Given) - dependencies/conditions before starting
+## Decision              - OPEN | YYYY-MM-DD confirmed
+## Scope / AC (When → Then)
+## Follow-up Work        - Next
+```
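+Delete empty sections entirely (no placeholders). BDD mapping: Preconditions (Given) → Scope (When) → AC (Then).
+### 2. Decide Whether Work May Start from the Decision State
+| State | Meaning | AI agent action |
+|------|------|--------------|
+| **OPEN** | Waiting for the user's decision | Work on this issue is blocked. Handle other issues first, or ask the user to decide |
+| **YYYY-MM-DD confirmed** | Decision made | Work may proceed. Close once the AC are met |
+If the Decision has not yet reached a confirmed date → ask the user for a decision once (Escalation Gate) → proceed only after a response.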
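+### 3. Preconditions (Given) Check
+Before starting work, confirm that every precondition is met:
+- Are all checkboxes ticked `[x]`?
+- Unmet items → report the blocking reason + who is responsible for resolving it
+If a precondition depends on another issue being completed → confirm that issue is closed before proceeding.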
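The checkbox check in this step can be done mechanically on an issue body. The sketch below is an assumption about how one might do it, not the skill's implementation: `unmet_preconditions` is a hypothetical helper, and it locates the section by the `(Given)` marker in the heading, which is present in the template.

```python
import re
from typing import List

# A Markdown task-list item: "- [ ] text" or "- [x] text".
CHECKBOX = re.compile(r"^\s*- \[([ xX])\] (.+)$")


def unmet_preconditions(issue_body: str) -> List[str]:
    """Return the text of every unchecked item under the (Given) heading."""
    unmet: List[str] = []
    in_given = False
    for line in issue_body.splitlines():
        if line.startswith("## "):
            # Entering a new section; only the (Given) one is inspected.
            in_given = "(Given)" in line
            continue
        match = CHECKBOX.match(line)
        if in_given and match and match.group(1) == " ":
            unmet.append(match.group(2).strip())
    return unmet


# Hypothetical issue body for illustration.
body = """## Preconditions (Given)
- [x] Issue #52 completed
- [ ] Stripe account issued
## Scope / AC
- [ ] Apply blur_gate
"""
print(unmet_preconditions(body))  # ['Stripe account issued']
```

Unchecked boxes under other headings (such as the AC list) are ignored, since those are expected to be open while work is pending.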
+### 4. Label Scheme (Auto-Toggle Guide)
+**3-axis label scheme**: attaching one label from each axis is recommended:
+| Axis | Label | When to attach |
+|----|-------|---------|
+| **type** | `bug` / `feature` / `refactor` / `docs` / `infra` | Once, at issue creation |
+| **status** | `decision-pending` / `ready` / `in-progress` / `blocked` | Toggled as the Decision / preconditions change |
+| **priority** | `P0` / `P1` / `P2` (optional) | User's decision |
+**Status auto-toggle rules** (the skill suggests these to the user):
+```
+Decision: OPEN                 → decision-pending
+Decision: YYYY-MM-DD confirmed → ready (remove decision-pending)
+Preconditions unchecked        → blocked (remove ready)
+PR open                        → in-progress
+PR merged                      → auto-close (regardless of label)
+```
+Labels are **not enforced at the hook level**; this is skill guidance. They are applied explicitly by the user or by PR automation. Use `gh issue edit <N> --add-label <name>` / `--remove-label <name>`.
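The toggle rules above can be read as a small decision function. The following sketch is one interpretation: the rules do not state a precedence when several apply at once, so the order used here (PR open, then OPEN decision, then unmet preconditions, then ready) is an assumption, as are the function name and parameters.

```python
import re

# A confirmed decision starts with an ISO date, e.g. "2026-04-22 confirmed".
CONFIRMED_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}\b")


def status_label(decision: str, preconditions_met: bool, pr_open: bool) -> str:
    """Suggest the status-axis label for an issue (assumed precedence)."""
    state = decision.strip()
    if pr_open:
        return "in-progress"
    if state.upper() == "OPEN":
        return "decision-pending"
    if not CONFIRMED_DATE.match(state):
        raise ValueError(f"unrecognized decision state: {decision!r}")
    if not preconditions_met:
        return "blocked"
    return "ready"


print(status_label("OPEN", preconditions_met=True, pr_open=False))
print(status_label("2026-04-22 confirmed", preconditions_met=False, pr_open=False))
print(status_label("2026-04-22 confirmed", preconditions_met=True, pr_open=False))
```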
+### 5. GitHub Projects (V2) Integration (optional, opt-in)
+When using a GitHub Projects board as a kanban-style backlog:
+**Pre-condition**:
+- `github_project: <URL>` is set in `docs/SPEC.md` (e.g. `https://github.com/users/uzysjung/projects/3`)
+- The user has created the Project in advance and defined a status field (Backlog / Ready / In Progress / Done)
+**Automatic behavior**:
+- On new issue creation → call `gh project item-add <number> --owner <owner> --url <issue-url>`
+- On status change → update the status field:
+  - `decision-pending` → Project status `Backlog`
+  - `ready` → `Ready`
+  - PR open → `In Progress`
+  - merged + close → `Done`
+**Projects that do not use a Project board** → skip this section (use issue labels only).
+**Out of scope**:
+- iteration field / automatic sprint allocation (over-engineering for the solo scenario)
+- Syncing multiple Project boards (1 SPEC = 1 Project recommended)
+This section does not mandate GitHub Projects; it is left to the user's preference.
+### 6. Combining with `/uzys:auto`
+At the start of a `/uzys:auto` cycle, run the following sequence:
+```
+1. gh issue list --state open --json number,title,labels,body
+   → the open issue list becomes the backlog candidates
+2. grep each issue body for the pattern "Decision (YYYY-MM-DD confirmed)"
+   → only confirmed issues are workable candidates
+3. Exclude issues with unmet preconditions
+4. Sort by priority (label P0 > P1 > P2 > unlabeled)
+5. Move the top 1-3 into docs/todo.md + enter the Plan stage
+```
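Steps 2 to 5 of the sequence above amount to a filter, a sort, and a slice over the parsed JSON. The sketch below assumes the list has already been loaded (for example with `json.loads` on the command's output) and that each label is an object with a `name` field. The helper names, the tie-break by issue number, and the sample issues are all hypothetical. The heading pattern accepts both the English and the original Korean form of the Decision heading.

```python
import re
from typing import Any, Dict, List

Issue = Dict[str, Any]

CONFIRMED = re.compile(
    r"^## (?:Decision|방향성) \(\d{4}-\d{2}-\d{2} (?:confirmed|확정)\)",
    re.MULTILINE,
)
GIVEN = re.compile(
    r"^## [^\n]*\(Given\)[^\n]*\n(.*?)(?=^## |\Z)", re.MULTILINE | re.DOTALL
)
UNCHECKED = re.compile(r"^\s*- \[ \] ", re.MULTILINE)
PRIORITY = {"P0": 0, "P1": 1, "P2": 2}


def is_workable(issue: Issue) -> bool:
    """Confirmed decision and no unchecked box in the (Given) section."""
    body = issue.get("body") or ""
    given = GIVEN.search(body)
    blocked = bool(given and UNCHECKED.search(given.group(1)))
    return bool(CONFIRMED.search(body)) and not blocked


def priority(issue: Issue) -> int:
    """Rank P0 < P1 < P2 < unlabeled."""
    names = {label["name"] for label in issue.get("labels", [])}
    return min((r for n, r in PRIORITY.items() if n in names), default=len(PRIORITY))


def pick_backlog(issues: List[Issue], limit: int = 3) -> List[Issue]:
    workable = [issue for issue in issues if is_workable(issue)]
    return sorted(workable, key=lambda i: (priority(i), i["number"]))[:limit]


# Hypothetical issues in the shape of the --json output.
issues = [
    {"number": 53, "title": "Per-page blur", "labels": [{"name": "P1"}],
     "body": "## Preconditions (Given)\n- [x] Issue #52 completed\n"
             "## Decision (2026-04-22 confirmed)\n## Scope / AC\n- [ ] Apply blur_gate\n"},
    {"number": 54, "title": "Pricing copy", "labels": [{"name": "P0"}],
     "body": "## Decision (OPEN)\n- Option A\n- Option B\n"},
    {"number": 55, "title": "Billing", "labels": [{"name": "P0"}],
     "body": "## Preconditions (Given)\n- [ ] Stripe account issued\n"
             "## Decision (2026-04-20 confirmed)\n"},
    {"number": 56, "title": "CTA design", "labels": [],
     "body": "## Decision (2026-04-21 confirmed)\n## Scope / AC\n- [ ] CTA design\n"},
    {"number": 57, "title": "Rate limit", "labels": [{"name": "feature"}, {"name": "P0"}],
     "body": "## Decision (2026-04-23 confirmed)\n"},
]
print([issue["number"] for issue in pick_backlog(issues)])  # [57, 53, 56]
```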
+### 7. Commit / PR Conventions
+| When | Message convention |
+|------|-------------|
+| Progress commit during Build | `<type>: ... (refs #N)` |
+| Ship PR body | `Closes #N` or `Fixes #N` (auto-close) |
+| Partial progress (no close) | `Refs #N` |
+| When creating a follow-up issue | Cross-link `#M` in the "Follow-up Work" section of the original issue body |
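A PR body can be checked for the auto-close convention in the table above before it is submitted. The sketch below is a simplified, hypothetical check: it recognizes the close/fix/resolve keyword family followed by `#N` in the same repository and does not cover other forms GitHub may accept (such as cross-repository references). `Refs #N` is deliberately not treated as closing.

```python
import re
from typing import List

# Closing keywords followed by an issue reference, case-insensitive.
CLOSING = re.compile(
    r"\b(?:close[sd]?|fix(?:e[sd])?|resolve[sd]?)\s+#(\d+)", re.IGNORECASE
)


def issues_closed_by(pr_body: str) -> List[int]:
    """Return the issue numbers this PR body would auto-close."""
    return [int(number) for number in CLOSING.findall(pr_body)]


print(issues_closed_by("Adds blur gate.\n\nCloses #53, refs #52"))  # [53]
print(issues_closed_by("Partial work. Refs #53"))                   # []
```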
+## Output
+- GitHub Issue created/updated (5-section body)
+- `docs/todo.md`: tasks moved over from the issue list
+- Issue numbers included automatically in commit/PR messages
+## Anti-Patterns
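+- **Issue body is a single line ("login broken")**: the 5 sections are mandatory. Fill in at least Background + AC.
+- **Decision state not stated**: work cannot start if it is unclear whether the issue is OPEN or confirmed.
+- **Proceeding while ignoring preconditions**: do not start work while a dependency issue is unresolved.
+- **Omitting `Closes #N` from the PR**: manual closing is easy to forget. Enforce the convention.
+- **Attaching every label to every issue**: noise. Keep to the core classification.
+- **Adopting team features (automatic assignees, automatic code-owner review)**: outside this skill's scope. Team use needs a separate workflow.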
+## Boundary
+- Project without a GitHub remote → skill disabled automatically
+- No `issue_tracking: enabled` in `docs/SPEC.md` → disabled automatically (opt-in)
+- No access to a private repo → fetch fails → report to the user
+## Examples
+### Actual dyld-vantage Pattern (reference)
+```markdown
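+## Background
+Issue #52 completed the Feature Flag reorganization (16→18) and built the Blur gate infrastructure.
+This issue is the follow-up work: per-page blur + API quantity limits.
+## Preconditions (Given)
+- [x] Issue #52 completed (Feature Flag infrastructure)
+- [x] Blur gate component available
+## Decision (2026-04-22 confirmed)
+- Keep menu access (sidebar fully visible, no 401/403)
+- Page-level blur: exactly one blur_gate inside the outer max-w-* mx-auto
+- No per-block blur (low value relative to the complexity)
+## Scope / AC
+- [ ] Apply blur_gate to every _content.html (When a Free user visits → Then blur is shown)
+- [ ] API quantity-limit middleware (When a request falls short of the tier → Then 403)
+## Follow-up Work
+- [ ] Split out as Issue #56: Pricing page CTA design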
+```

package/templates/skills/investor-materials/SKILL.md ADDED

@@ -0,0 +1,96 @@
+---
+name: investor-materials
+description: Create and update pitch decks, one-pagers, investor memos, accelerator applications, financial models, and fundraising materials. Use when the user needs investor-facing documents, projections, use-of-funds tables, milestone plans, or materials that must stay internally consistent across multiple fundraising assets.
+origin: ECC
+---
+# Investor Materials
+Build investor-facing materials that are consistent, credible, and easy to defend.
+## When to Activate
+- creating or revising a pitch deck
+- writing an investor memo or one-pager
+- building a financial model, milestone plan, or use-of-funds table
+- answering accelerator or incubator application questions
+- aligning multiple fundraising docs around one source of truth
+## Golden Rule
+All investor materials must agree with each other.
+Create or confirm a single source of truth before writing:
+- traction metrics
+- pricing and revenue assumptions
+- raise size and instrument
+- use of funds
+- team bios and titles
+- milestones and timelines
+If conflicting numbers appear, stop and resolve them before drafting.
+## Core Workflow
+1. inventory the canonical facts
+2. identify missing assumptions
+3. choose the asset type
+4. draft the asset with explicit logic
+5. cross-check every number against the source of truth
+## Asset Guidance
+### Pitch Deck
+Recommended flow:
+1. company + wedge
+2. problem
+3. solution
+4. product / demo
+5. market
+6. business model
+7. traction
+8. team
+9. competition / differentiation
+10. ask
+11. use of funds / milestones
+12. appendix
+If the user wants a web-native deck, pair this skill with `frontend-slides`.
+### One-Pager / Memo
+- state what the company does in one clean sentence
+- show why now
+- include traction and proof points early
+- make the ask precise
+- keep claims easy to verify
+### Financial Model
+Include:
+- explicit assumptions
+- bear / base / bull cases when useful
+- clean layer-by-layer revenue logic
+- milestone-linked spending
+- sensitivity analysis where the decision hinges on assumptions
+### Accelerator Applications
+- answer the exact question asked
+- prioritize traction, insight, and team advantage
+- avoid puffery
+- keep internal metrics consistent with the deck and model
+## Red Flags to Avoid
+- unverifiable claims
+- fuzzy market sizing without assumptions
+- inconsistent team roles or titles
+- revenue math that does not sum cleanly
+- inflated certainty where assumptions are fragile
+## Quality Gate
+Before delivering:
+- every number matches the current source of truth
+- use of funds and revenue layers sum correctly
+- assumptions are visible, not buried
+- the story is clear without hype language
+- the final asset is defensible in a partner meeting
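The "sum correctly" item in the quality gate above is easy to check mechanically. The sketch below is illustrative only: the function name is hypothetical and the figures are invented placeholders in whole currency units, not data from any real raise.

```python
from typing import Dict


def use_of_funds_gap(raise_size: int, use_of_funds: Dict[str, int]) -> int:
    """Raise size minus allocated funds; 0 means the table sums cleanly."""
    return raise_size - sum(use_of_funds.values())


# Invented placeholder figures, for illustration only.
raise_size = 2_000_000
use_of_funds = {
    "engineering": 1_100_000,
    "go-to-market": 600_000,
    "operations": 250_000,
}

gap = use_of_funds_gap(raise_size, use_of_funds)
print(f"gap: {gap:,}")  # gap: 50,000
```

A non-zero gap means the use-of-funds table and the stated raise size disagree, which is the kind of conflict the Golden Rule says to resolve before drafting.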