@uzysjung/agent-harness 26.83.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (212)
  1. package/LICENSE +21 -0
  2. package/README.ko.md +279 -0
  3. package/README.md +306 -0
  4. package/dist/chunk-SDVAM5JZ.js +775 -0
  5. package/dist/chunk-SDVAM5JZ.js.map +1 -0
  6. package/dist/index.js +5412 -0
  7. package/dist/index.js.map +1 -0
  8. package/dist/trust-tier-drift.js +67 -0
  9. package/dist/trust-tier-drift.js.map +1 -0
  10. package/package.json +53 -0
  11. package/scripts/prune-ecc.sh +310 -0
  12. package/templates/CLAUDE.md +86 -0
  13. package/templates/agents/build-error-resolver.md +114 -0
  14. package/templates/agents/code-reviewer.md +237 -0
  15. package/templates/agents/data-analyst.md +69 -0
  16. package/templates/agents/plan-checker.md +118 -0
  17. package/templates/agents/reviewer.md +128 -0
  18. package/templates/agents/security-reviewer.md +108 -0
  19. package/templates/agents/silent-failure-hunter.md +50 -0
  20. package/templates/agents/strategist.md +86 -0
  21. package/templates/antigravity/AGENTS.md.template +58 -0
  22. package/templates/codex/AGENTS.md.template +94 -0
  23. package/templates/codex/README.md +69 -0
  24. package/templates/codex/config.toml.template +108 -0
  25. package/templates/codex/hooks/README.md +40 -0
  26. package/templates/codex/hooks/gate-check.sh +7 -0
  27. package/templates/codex/hooks/hito-counter.sh +7 -0
  28. package/templates/codex/hooks/session-start.sh +7 -0
  29. package/templates/codex/hooks/uncommitted-check.sh +7 -0
  30. package/templates/codex/skills/uzys-build/SKILL.md +24 -0
  31. package/templates/codex/skills/uzys-plan/SKILL.md +24 -0
  32. package/templates/codex/skills/uzys-review/SKILL.md +24 -0
  33. package/templates/codex/skills/uzys-ship/SKILL.md +24 -0
  34. package/templates/codex/skills/uzys-spec/SKILL.md +28 -0
  35. package/templates/codex/skills/uzys-test/SKILL.md +24 -0
  36. package/templates/commands/ecc/checkpoint.md +32 -0
  37. package/templates/commands/ecc/e2e.md +105 -0
  38. package/templates/commands/ecc/eval.md +88 -0
  39. package/templates/commands/ecc/evolve.md +7 -0
  40. package/templates/commands/ecc/harness-audit.md +73 -0
  41. package/templates/commands/ecc/instinct-status.md +8 -0
  42. package/templates/commands/ecc/promote.md +10 -0
  43. package/templates/commands/ecc/security-scan.md +10 -0
  44. package/templates/commands/uzys/auto.md +190 -0
  45. package/templates/commands/uzys/build.md +42 -0
  46. package/templates/commands/uzys/plan.md +55 -0
  47. package/templates/commands/uzys/review.md +44 -0
  48. package/templates/commands/uzys/ship.md +49 -0
  49. package/templates/commands/uzys/spec.md +93 -0
  50. package/templates/commands/uzys/test.md +58 -0
  51. package/templates/docs/PLAN.template.md +102 -0
  52. package/templates/hooks/agentshield-gate.sh +101 -0
  53. package/templates/hooks/checkpoint-snapshot.sh +115 -0
  54. package/templates/hooks/gate-check.sh +138 -0
  55. package/templates/hooks/hito-counter.sh +26 -0
  56. package/templates/hooks/karpathy-gate.sh +59 -0
  57. package/templates/hooks/mcp-pre-exec.sh +104 -0
  58. package/templates/hooks/protect-files.sh +41 -0
  59. package/templates/hooks/session-start.sh +40 -0
  60. package/templates/hooks/spec-drift-check.sh +86 -0
  61. package/templates/mcp-allowlist.example +24 -0
  62. package/templates/mcp.json +20 -0
  63. package/templates/opencode/.opencode/commands/uzys-build.md +22 -0
  64. package/templates/opencode/.opencode/commands/uzys-plan.md +22 -0
  65. package/templates/opencode/.opencode/commands/uzys-review.md +22 -0
  66. package/templates/opencode/.opencode/commands/uzys-ship.md +22 -0
  67. package/templates/opencode/.opencode/commands/uzys-spec.md +28 -0
  68. package/templates/opencode/.opencode/commands/uzys-test.md +22 -0
  69. package/templates/opencode/.opencode/plugins/uzys-harness.ts +146 -0
  70. package/templates/opencode/AGENTS.md.template +98 -0
  71. package/templates/opencode/README.md +34 -0
  72. package/templates/opencode/opencode.json.template +42 -0
  73. package/templates/project-claude/_base.md +23 -0
  74. package/templates/project-claude/fragments/csr-fastapi/active-rules.md +13 -0
  75. package/templates/project-claude/fragments/csr-fastapi/agents.md +5 -0
  76. package/templates/project-claude/fragments/csr-fastapi/boundaries.md +18 -0
  77. package/templates/project-claude/fragments/csr-fastapi/commands.md +6 -0
  78. package/templates/project-claude/fragments/csr-fastapi/plugins.md +2 -0
  79. package/templates/project-claude/fragments/csr-fastapi/skills.md +5 -0
  80. package/templates/project-claude/fragments/csr-fastapi/stack.md +6 -0
  81. package/templates/project-claude/fragments/csr-fastapi/tagline.md +1 -0
  82. package/templates/project-claude/fragments/csr-fastapi/workflow.md +8 -0
  83. package/templates/project-claude/fragments/csr-fastify/active-rules.md +13 -0
  84. package/templates/project-claude/fragments/csr-fastify/agents.md +5 -0
  85. package/templates/project-claude/fragments/csr-fastify/boundaries.md +18 -0
  86. package/templates/project-claude/fragments/csr-fastify/commands.md +6 -0
  87. package/templates/project-claude/fragments/csr-fastify/plugins.md +2 -0
  88. package/templates/project-claude/fragments/csr-fastify/skills.md +5 -0
  89. package/templates/project-claude/fragments/csr-fastify/stack.md +6 -0
  90. package/templates/project-claude/fragments/csr-fastify/tagline.md +1 -0
  91. package/templates/project-claude/fragments/csr-fastify/workflow.md +8 -0
  92. package/templates/project-claude/fragments/csr-supabase/active-rules.md +12 -0
  93. package/templates/project-claude/fragments/csr-supabase/agents.md +5 -0
  94. package/templates/project-claude/fragments/csr-supabase/boundaries.md +19 -0
  95. package/templates/project-claude/fragments/csr-supabase/commands.md +6 -0
  96. package/templates/project-claude/fragments/csr-supabase/plugins.md +4 -0
  97. package/templates/project-claude/fragments/csr-supabase/skills.md +7 -0
  98. package/templates/project-claude/fragments/csr-supabase/stack.md +6 -0
  99. package/templates/project-claude/fragments/csr-supabase/supabase-auth.md +21 -0
  100. package/templates/project-claude/fragments/csr-supabase/tagline.md +1 -0
  101. package/templates/project-claude/fragments/csr-supabase/workflow.md +8 -0
  102. package/templates/project-claude/fragments/data/active-rules.md +10 -0
  103. package/templates/project-claude/fragments/data/agents.md +6 -0
  104. package/templates/project-claude/fragments/data/boundaries.md +20 -0
  105. package/templates/project-claude/fragments/data/commands.md +6 -0
  106. package/templates/project-claude/fragments/data/plugins.md +2 -0
  107. package/templates/project-claude/fragments/data/skills.md +3 -0
  108. package/templates/project-claude/fragments/data/stack.md +7 -0
  109. package/templates/project-claude/fragments/data/tagline.md +1 -0
  110. package/templates/project-claude/fragments/data/workflow.md +9 -0
  111. package/templates/project-claude/fragments/executive/active-rules.md +6 -0
  112. package/templates/project-claude/fragments/executive/agents.md +6 -0
  113. package/templates/project-claude/fragments/executive/boundaries.md +17 -0
  114. package/templates/project-claude/fragments/executive/commands.md +11 -0
  115. package/templates/project-claude/fragments/executive/plugins.md +1 -0
  116. package/templates/project-claude/fragments/executive/skills.md +7 -0
  117. package/templates/project-claude/fragments/executive/stack.md +4 -0
  118. package/templates/project-claude/fragments/executive/tagline.md +1 -0
  119. package/templates/project-claude/fragments/executive/workflow.md +10 -0
  120. package/templates/project-claude/fragments/growth-marketing/active-rules.md +7 -0
  121. package/templates/project-claude/fragments/growth-marketing/agents.md +6 -0
  122. package/templates/project-claude/fragments/growth-marketing/boundaries.md +17 -0
  123. package/templates/project-claude/fragments/growth-marketing/commands.md +11 -0
  124. package/templates/project-claude/fragments/growth-marketing/plugins.md +9 -0
  125. package/templates/project-claude/fragments/growth-marketing/skills.md +8 -0
  126. package/templates/project-claude/fragments/growth-marketing/stack.md +7 -0
  127. package/templates/project-claude/fragments/growth-marketing/tagline.md +1 -0
  128. package/templates/project-claude/fragments/growth-marketing/workflow.md +11 -0
  129. package/templates/project-claude/fragments/project-management/active-rules.md +7 -0
  130. package/templates/project-claude/fragments/project-management/agents.md +6 -0
  131. package/templates/project-claude/fragments/project-management/boundaries.md +16 -0
  132. package/templates/project-claude/fragments/project-management/commands.md +10 -0
  133. package/templates/project-claude/fragments/project-management/plugins.md +6 -0
  134. package/templates/project-claude/fragments/project-management/skills.md +5 -0
  135. package/templates/project-claude/fragments/project-management/stack.md +4 -0
  136. package/templates/project-claude/fragments/project-management/tagline.md +1 -0
  137. package/templates/project-claude/fragments/project-management/workflow.md +12 -0
  138. package/templates/project-claude/fragments/ssr-htmx/active-rules.md +11 -0
  139. package/templates/project-claude/fragments/ssr-htmx/agents.md +5 -0
  140. package/templates/project-claude/fragments/ssr-htmx/boundaries.md +20 -0
  141. package/templates/project-claude/fragments/ssr-htmx/commands.md +6 -0
  142. package/templates/project-claude/fragments/ssr-htmx/plugins.md +2 -0
  143. package/templates/project-claude/fragments/ssr-htmx/skills.md +3 -0
  144. package/templates/project-claude/fragments/ssr-htmx/stack.md +6 -0
  145. package/templates/project-claude/fragments/ssr-htmx/tagline.md +1 -0
  146. package/templates/project-claude/fragments/ssr-htmx/workflow.md +8 -0
  147. package/templates/project-claude/fragments/ssr-nextjs/active-rules.md +12 -0
  148. package/templates/project-claude/fragments/ssr-nextjs/agents.md +5 -0
  149. package/templates/project-claude/fragments/ssr-nextjs/boundaries.md +20 -0
  150. package/templates/project-claude/fragments/ssr-nextjs/commands.md +6 -0
  151. package/templates/project-claude/fragments/ssr-nextjs/plugins.md +2 -0
  152. package/templates/project-claude/fragments/ssr-nextjs/skills.md +5 -0
  153. package/templates/project-claude/fragments/ssr-nextjs/stack.md +5 -0
  154. package/templates/project-claude/fragments/ssr-nextjs/tagline.md +1 -0
  155. package/templates/project-claude/fragments/ssr-nextjs/workflow.md +8 -0
  156. package/templates/project-claude/fragments/tooling/active-rules.md +11 -0
  157. package/templates/project-claude/fragments/tooling/agents.md +5 -0
  158. package/templates/project-claude/fragments/tooling/boundaries.md +17 -0
  159. package/templates/project-claude/fragments/tooling/commands.md +4 -0
  160. package/templates/project-claude/fragments/tooling/skills.md +4 -0
  161. package/templates/project-claude/fragments/tooling/stack.md +5 -0
  162. package/templates/project-claude/fragments/tooling/tagline.md +1 -0
  163. package/templates/project-claude/fragments/tooling/workflow.md +5 -0
  164. package/templates/rules/api-contract.md +33 -0
  165. package/templates/rules/change-management.md +80 -0
  166. package/templates/rules/cli-development.md +39 -0
  167. package/templates/rules/code-style.md +23 -0
  168. package/templates/rules/data-analysis.md +61 -0
  169. package/templates/rules/database.md +29 -0
  170. package/templates/rules/design-workflow.md +17 -0
  171. package/templates/rules/error-handling.md +23 -0
  172. package/templates/rules/gates-taxonomy.md +21 -0
  173. package/templates/rules/git-policy.md +102 -0
  174. package/templates/rules/htmx.md +42 -0
  175. package/templates/rules/nextjs.md +35 -0
  176. package/templates/rules/playwright-launch.md +66 -0
  177. package/templates/rules/pyside6.md +59 -0
  178. package/templates/rules/shadcn.md +33 -0
  179. package/templates/rules/ship-checklist.md +24 -0
  180. package/templates/rules/tauri.md +40 -0
  181. package/templates/rules/test-policy.md +62 -0
  182. package/templates/settings.json +71 -0
  183. package/templates/skills/agent-introspection-debugging/SKILL.md +153 -0
  184. package/templates/skills/continuous-learning-v2/SKILL.md +365 -0
  185. package/templates/skills/continuous-learning-v2/config.json +8 -0
  186. package/templates/skills/continuous-learning-v2/hooks/observe.sh +428 -0
  187. package/templates/skills/continuous-learning-v2/scripts/detect-project.sh +228 -0
  188. package/templates/skills/continuous-learning-v2/scripts/instinct-cli.py +1426 -0
  189. package/templates/skills/deep-research/SKILL.md +155 -0
  190. package/templates/skills/deep-research/agents/openai.yaml +7 -0
  191. package/templates/skills/e2e-testing/SKILL.md +326 -0
  192. package/templates/skills/e2e-testing/agents/openai.yaml +7 -0
  193. package/templates/skills/eval-harness/SKILL.md +279 -0
  194. package/templates/skills/eval-harness/agents/openai.yaml +7 -0
  195. package/templates/skills/gh-issue-workflow/ISSUE.template.md +58 -0
  196. package/templates/skills/gh-issue-workflow/SKILL.md +184 -0
  197. package/templates/skills/investor-materials/SKILL.md +96 -0
  198. package/templates/skills/investor-outreach/SKILL.md +91 -0
  199. package/templates/skills/market-research/SKILL.md +75 -0
  200. package/templates/skills/market-research/agents/openai.yaml +7 -0
  201. package/templates/skills/nextjs-turbopack/SKILL.md +44 -0
  202. package/templates/skills/north-star/NORTH_STAR.template.md +114 -0
  203. package/templates/skills/north-star/SKILL.md +103 -0
  204. package/templates/skills/python-patterns/SKILL.md +750 -0
  205. package/templates/skills/python-testing/SKILL.md +816 -0
  206. package/templates/skills/spec-scaling/SKILL.md +89 -0
  207. package/templates/skills/strategic-compact/SKILL.md +131 -0
  208. package/templates/skills/strategic-compact/suggest-compact.sh +54 -0
  209. package/templates/skills/ui-visual-review/SKILL.md +154 -0
  210. package/templates/skills/verification-loop/SKILL.md +126 -0
  211. package/templates/skills/verification-loop/agents/openai.yaml +7 -0
  212. package/templates/track-mcp-map.tsv +15 -0
@@ -0,0 +1,279 @@
1
+ ---
2
+ name: eval-harness
3
+ description: Formal evaluation framework for Claude Code sessions implementing eval-driven development (EDD) principles
4
+ origin: ECC
5
+ tools: Read, Write, Edit, Bash, Grep, Glob
6
+ ---
7
+
8
+ # Eval Harness Skill
9
+
10
+ A formal evaluation framework for Claude Code sessions, implementing eval-driven development (EDD) principles.
11
+
12
+ ## When to Activate
13
+
14
+ - Setting up eval-driven development (EDD) for AI-assisted workflows
15
+ - Defining pass/fail criteria for Claude Code task completion
16
+ - Measuring agent reliability with pass@k metrics
17
+ - Creating regression test suites for prompt or agent changes
18
+ - Benchmarking agent performance across model versions
19
+
20
+ ## Philosophy
21
+
22
+ Eval-Driven Development treats evals as the "unit tests of AI development":
23
+ - Define expected behavior BEFORE implementation
24
+ - Run evals continuously during development
25
+ - Track regressions with each change
26
+ - Use pass@k metrics for reliability measurement
27
+
28
+ ## Eval Types
29
+
30
+ ### Capability Evals
31
+ Test if Claude can do something it couldn't before:
32
+ ```markdown
33
+ [CAPABILITY EVAL: feature-name]
34
+ Task: Description of what Claude should accomplish
35
+ Success Criteria:
36
+ - [ ] Criterion 1
37
+ - [ ] Criterion 2
38
+ - [ ] Criterion 3
39
+ Expected Output: Description of expected result
40
+ ```
41
+
42
+ ### Regression Evals
43
+ Ensure changes don't break existing functionality:
44
+ ```markdown
45
+ [REGRESSION EVAL: feature-name]
46
+ Baseline: SHA or checkpoint name
47
+ Tests:
48
+ - existing-test-1: PASS/FAIL
49
+ - existing-test-2: PASS/FAIL
50
+ - existing-test-3: PASS/FAIL
51
+ Result: X/Y passed (previously Y/Y)
52
+ ```
53
+
54
+ ## Grader Types
55
+
56
+ ### 1. Code-Based Grader
57
+ Deterministic checks using code:
58
+ ```bash
59
+ # Check if file contains expected pattern
60
+ grep -q "export function handleAuth" src/auth.ts && echo "PASS" || echo "FAIL"
61
+
62
+ # Check if tests pass
63
+ npm test -- --testPathPattern="auth" && echo "PASS" || echo "FAIL"
64
+
65
+ # Check if build succeeds
66
+ npm run build && echo "PASS" || echo "FAIL"
67
+ ```
68
+
69
+ ### 2. Model-Based Grader
70
+ Use Claude to evaluate open-ended outputs:
71
+ ```markdown
72
+ [MODEL GRADER PROMPT]
73
+ Evaluate the following code change:
74
+ 1. Does it solve the stated problem?
75
+ 2. Is it well-structured?
76
+ 3. Are edge cases handled?
77
+ 4. Is error handling appropriate?
78
+
79
+ Score: 1-5 (1=poor, 5=excellent)
80
+ Reasoning: [explanation]
81
+ ```
82
+
83
+ ### 3. Human Grader
84
+ Flag for manual review:
85
+ ```markdown
86
+ [HUMAN REVIEW REQUIRED]
87
+ Change: Description of what changed
88
+ Reason: Why human review is needed
89
+ Risk Level: LOW/MEDIUM/HIGH
90
+ ```
91
+
92
+ ## Metrics
93
+
94
+ ### pass@k
95
+ "At least one success in k attempts"
96
+ - pass@1: First attempt success rate
97
+ - pass@3: Success within 3 attempts
98
+ - Typical target: pass@3 > 90%
99
+
100
+ ### pass^k
101
+ "All k trials succeed"
102
+ - Higher bar for reliability
103
+ - pass^3: 3 consecutive successes
104
+ - Use for critical paths
105
+
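The two metrics above can be made concrete with a small sketch. It assumes each eval's trials are recorded as an ordered list of booleans; the function names and the sample outcomes are hypothetical and are not part of the skill.

```python
from typing import Callable, Sequence


def pass_at_k(trials: Sequence[bool], k: int) -> bool:
    """pass@k: at least one of the first k attempts succeeded."""
    return any(trials[:k])


def pass_hat_k(trials: Sequence[bool], k: int) -> bool:
    """pass^k: at least k attempts were made and the first k all succeeded."""
    return len(trials) >= k and all(trials[:k])


def rate(results: dict[str, Sequence[bool]],
         metric: Callable[[Sequence[bool], int], bool], k: int) -> float:
    """Fraction of evals for which the given metric holds."""
    return sum(metric(trials, k) for trials in results.values()) / len(results)


# Hypothetical outcomes per eval, in attempt order (assumption for illustration).
results = {
    "create-user": [True],
    "validate-email": [False, True],
    "hash-password": [True],
}

print(f"pass@1: {rate(results, pass_at_k, 1):.0%}")  # pass@1: 67%
print(f"pass@3: {rate(results, pass_at_k, 3):.0%}")  # pass@3: 100%
```

Note that pass^k is the stricter of the two: a single failure among the k trials fails the eval, which is why it suits regression checks on critical paths.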
106
+ ## Eval Workflow
107
+
108
+ ### 1. Define (Before Coding)
109
+ ```markdown
110
+ ## EVAL DEFINITION: feature-xyz
111
+
112
+ ### Capability Evals
113
+ 1. Can create new user account
114
+ 2. Can validate email format
115
+ 3. Can hash password securely
116
+
117
+ ### Regression Evals
118
+ 1. Existing login still works
119
+ 2. Session management unchanged
120
+ 3. Logout flow intact
121
+
122
+ ### Success Metrics
123
+ - pass@3 > 90% for capability evals
124
+ - pass^3 = 100% for regression evals
125
+ ```
126
+
127
+ ### 2. Implement
128
+ Write code to pass the defined evals.
129
+
130
+ ### 3. Evaluate
131
+ ```bash
132
+ # Run capability evals
133
+ # Run each capability eval and record PASS/FAIL (placeholder: substitute the real commands)
134
+
135
+ # Run regression evals
136
+ npm test -- --testPathPattern="existing"
137
+
138
+ # Generate report
139
+ ```
140
+
141
+ ### 4. Report
142
+ ```markdown
143
+ EVAL REPORT: feature-xyz
144
+ ========================
145
+
146
+ Capability Evals:
147
+ create-user: PASS (pass@1)
148
+ validate-email: PASS (pass@2)
149
+ hash-password: PASS (pass@1)
150
+ Overall: 3/3 passed
151
+
152
+ Regression Evals:
153
+ login-flow: PASS
154
+ session-mgmt: PASS
155
+ logout-flow: PASS
156
+ Overall: 3/3 passed
157
+
158
+ Metrics:
159
+ pass@1: 67% (2/3)
160
+ pass@3: 100% (3/3)
161
+
162
+ Status: READY FOR REVIEW
163
+ ```
164
+
165
+ ## Integration Patterns
166
+
167
+ ### Pre-Implementation
168
+ ```
169
+ /eval define feature-name
170
+ ```
171
+ Creates eval definition file at `.claude/evals/feature-name.md`
172
+
173
+ ### During Implementation
174
+ ```
175
+ /eval check feature-name
176
+ ```
177
+ Runs current evals and reports status
178
+
179
+ ### Post-Implementation
180
+ ```
181
+ /eval report feature-name
182
+ ```
183
+ Generates full eval report
184
+
185
+ ## Eval Storage (.md + .log Pair Format)
186
+
187
+ Each eval is stored as a pair: **`<topic>.md` (design) + `<topic>.log` (run results)**. The pair is mandatory; a standalone .md cannot be reproduced.
188
+
189
+ ```
190
+ .claude/
191
+   evals/
192
+     feature-xyz.md # Eval definition (all 3 sections required: Capability/Regression/Test)
193
+     feature-xyz.log # Eval run history (run time, grader, pass/fail)
194
+     session-YYYYMMDD.md # Per-session retrospective + next backlog
195
+     session-YYYYMMDD.log # Grader output for the same session
196
+     baseline.json # Regression baselines (optional)
197
+ ```
198
+
199
+ > Generalized from the Vantage project's `.claude/evals/*.{md,log}` layout.
200
+
201
+ ### Required .md Sections (3)
202
+
203
+ ```markdown
204
+ # Eval: <topic>
205
+
206
+ ## Capability
207
+ [New capability: what Claude/the agent can now do]
208
+ - AC: [measurable criteria]
209
+ - Grader: code-based / model-based / human
210
+
211
+ ## Regression
212
+ [Protect existing functionality: the baseline that changes must not break]
213
+ - Baseline: <SHA or checkpoint>
214
+ - Tests: [list]
215
+
216
+ ## Test
217
+ [Run procedure: anyone who reruns it must get the same result]
218
+ - Setup: [prerequisites]
219
+ - Run: `bash run-eval.sh <topic>` or an explicit command
220
+ - Expected: [expected output]
221
+ ```
222
+
223
+ ### .log File Format
224
+
225
+ Append one entry per run; entries accumulate in chronological order.
226
+
227
+ ```
228
+ === 2026-04-19 14:32 (run #1) ===
229
+ Capability: 3/3 PASS (pass@1)
230
+ Regression: 5/5 PASS (pass^3)
231
+ Status: SHIP READY
232
+
233
+ === 2026-04-20 09:15 (run #2 — after refactor) ===
234
+ Capability: 3/3 PASS
235
+ Regression: 4/5 PASS (login-flow regressed at SHA abc123)
236
+ Status: BLOCKED — fix login-flow first
237
+ ```
238
+
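As a rough illustration of the append-only log, the sketch below renders one run entry in the layout shown above and appends it to the file. The helper names `format_run_entry` and `append_run` are hypothetical, and the skill itself does not prescribe any tooling for writing the log.

```python
from datetime import datetime
from pathlib import Path


def format_run_entry(run_no: int, when: datetime, capability: tuple[int, int],
                     regression: tuple[int, int], status: str) -> str:
    """Render one run entry: header line, two score lines, one status line."""
    cap_pass, cap_total = capability
    reg_pass, reg_total = regression
    stamp = when.strftime("%Y-%m-%d %H:%M")
    return (
        f"=== {stamp} (run #{run_no}) ===\n"
        f"Capability: {cap_pass}/{cap_total} PASS\n"
        f"Regression: {reg_pass}/{reg_total} PASS\n"
        f"Status: {status}\n"
    )


def append_run(log_path: Path, entry: str) -> None:
    """Open in append mode so earlier runs are never overwritten."""
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(entry + "\n")


if __name__ == "__main__":
    # Prints the entry only; call append_run(Path(".claude/evals/<topic>.log"), entry) to persist it.
    print(format_run_entry(1, datetime(2026, 4, 19, 14, 32), (3, 3), (5, 5), "SHIP READY"))
```

Opening the file in append mode is the design point here: it keeps the history reproducible because no run can erase an earlier one.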
239
+ ## Best Practices
240
+
241
+ 1. **Define evals BEFORE coding** - Forces clear thinking about success criteria
242
+ 2. **Run evals frequently** - Catch regressions early
243
+ 3. **Track pass@k over time** - Monitor reliability trends
244
+ 4. **Use code graders when possible** - Deterministic > probabilistic
245
+ 5. **Human review for security** - Never fully automate security checks
246
+ 6. **Keep evals fast** - Slow evals don't get run
247
+ 7. **Version evals with code** - Evals are first-class artifacts
248
+
249
+ ## Example: Adding Authentication
250
+
251
+ ```markdown
252
+ ## EVAL: add-authentication
253
+
254
+ ### Phase 1: Define (10 min)
255
+ Capability Evals:
256
+ - [ ] User can register with email/password
257
+ - [ ] User can login with valid credentials
258
+ - [ ] Invalid credentials rejected with proper error
259
+ - [ ] Sessions persist across page reloads
260
+ - [ ] Logout clears session
261
+
262
+ Regression Evals:
263
+ - [ ] Public routes still accessible
264
+ - [ ] API responses unchanged
265
+ - [ ] Database schema compatible
266
+
267
+ ### Phase 2: Implement (varies)
268
+ [Write code]
269
+
270
+ ### Phase 3: Evaluate
271
+ Run: /eval check add-authentication
272
+
273
+ ### Phase 4: Report
274
+ EVAL REPORT: add-authentication
275
+ ==============================
276
+ Capability: 5/5 passed (pass@3: 100%)
277
+ Regression: 3/3 passed (pass^3: 100%)
278
+ Status: SHIP IT
279
+ ```
@@ -0,0 +1,7 @@
1
+ interface:
2
+ display_name: "Eval Harness"
3
+ short_description: "Eval-driven development with pass/fail criteria"
4
+ brand_color: "#EC4899"
5
+ default_prompt: "Set up eval-driven development with pass/fail criteria"
6
+ policy:
7
+ allow_implicit_invocation: true
@@ -0,0 +1,58 @@
1
+ <!--
2
+ GitHub Issue body template (gh-issue-workflow skill v26.34.0)
3
+ - Not all 5 sections need to be filled in, but if one is empty, delete it (leave no placeholders).
4
+ - BDD mapping: Preconditions (Given) → Scope (When) → AC (Then).
5
+ - The Decision status determines whether work may start (OPEN = work blocked, "YYYY-MM-DD confirmed" = work allowed).
6
+ - Labels (3 axes, recommended):
7
+ - type: bug | feature | refactor | docs | infra
8
+ - status: decision-pending (Decision OPEN) | ready (confirmed) | in-progress (PR open) | blocked (preconditions unmet)
9
+ - priority: P0 | P1 | P2 (optional)
10
+ - GitHub Project integration (optional): if docs/SPEC.md specifies `github_project: <URL>`, the issue is added automatically.
11
+ -->
12
+
13
+ ## Background
14
+
15
+ [Why this work is needed. 1-3 sentences. Symptoms the user observed, the target state, business context.]
16
+
17
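+ ## Preconditions (Given)
18
+
19
+ [Conditions that must hold before this work starts: other issues / external dependencies / decision outcomes / infrastructure state. Work is blocked while any is unmet.]
20
+
21
+ - [ ] [Precondition 1, e.g. Issue #N completed]
22
+ - [ ] [Precondition 2, e.g. Stripe account issued]
23
+ - [ ] [Precondition 3, e.g. DB schema v3 migration]
24
+
25
+ If a precondition is unmet → state the blocking reason and record who will resolve it and in what order.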
26
+
27
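+ ## Decision (OPEN | YYYY-MM-DD confirmed)
28
+
29
+ [Current decision status. `OPEN` = awaiting the user's decision, `YYYY-MM-DD confirmed` = decision made.]
30
+
31
+ - Option A: [description]
32
+ - Option B: [description]
33
+ - **Choice (once confirmed)**: [chosen option + rationale]
34
+
35
+ While the Decision is OPEN, no work may proceed on this issue. The AI agent waits for the user's decision.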
36
+
37
+ ## Scope / Acceptance Criteria (When → Then)
38
+
39
+ [Scope of the change + measurable completion criteria.]
40
+
41
+ - [ ] [AC 1, e.g. the `/admin/activity-logs` page works for page 11 and beyond (When the user clicks 11 → Then page 11 data is shown)]
42
+ - [ ] [AC 2, e.g. design system tokens are used (When the page renders → Then colors/spacing match the design system)]
43
+ - [ ] [AC 3]
44
+
45
+ Each AC must be verifiable, with an unambiguous pass/fail.
46
+
47
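+ ## Follow-ups (Next)
48
+
49
+ [Work that branches off once this issue is done. New issue numbers or tentative descriptions.]
50
+
51
+ - [ ] [Follow-up 1, e.g. split out as Issue #N]
52
+ - [ ] [Follow-up 2]
53
+
54
+ If there are no follow-ups, delete this entire section.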
55
+
56
+ ---
57
+
58
+ <!-- Add `Closes #<this-issue-number>` to the PR body so that this issue closes automatically when the PR is merged -->
@@ -0,0 +1,184 @@
1
+ ---
2
+ name: gh-issue-workflow
3
+ description: "Treats GitHub Issues as the async backlog + decision channel between user and AI agent. Use when a non-blocking todo / bug / decision needs to persist beyond the chat session. Enforces 5-section body template (Background / Given / Decision / AC / Next) so issues become reusable agent context, not just sticky notes."
4
+ ---
5
+
6
+ # GitHub Issue Workflow
7
+
8
+ ## Purpose
9
+
10
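+ GitHub Issues fill the gap between chat (ephemeral) and plan.md (static). In a solo user + AI agent collaboration:
11
+
12
+ - Bugs and feature requests the user finds → backlogged as issues (without interrupting the chat)
13
+ - Forks that need a decision → options laid out in the issue body → the user decides asynchronously → the AI agent fetches the issue and works on it
14
+ - A permanent, searchable record of every decision (using `#N` cross-links, labels, and milestones)
15
+
16
+ Generalized from the pattern actually used in the dyld-vantage project (`#52~#55`). Optimized for the solo scenario (no team assignment or reviewer automation).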
17
+
18
+ ## When to Invoke
19
+
20
+ | Trigger | Action |
21
+ |--------|------|
22
+ | `/uzys:spec` starts + a GitHub remote exists | Suggest once: "Create an epic issue?" (optional) |
23
+ | `/uzys:plan` starts | Fetch the list of open issues → prioritize, then move them to todo.md |
24
+ | User finds a new bug/request during `/uzys:build` | Suggest: "Backlog it as an issue?" |
25
+ | `/uzys:build` commit | Put `Refs #N` in the message (records progress) |
26
+ | `/uzys:ship` PR creation | Put `Closes #N` in the body (auto-close) |
27
+ | A decision fork appears | Record it in the issue body as `Decision (OPEN)` → wait for the user |
28
+
29
+ ## Pre-conditions
30
+
31
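+ - The project has a GitHub remote (check with `git remote -v`)
32
+ - The `gh` CLI is installed and authenticated (check with `gh auth status`). Prefer the MCP `mcp__github__*` tools when they are available.
33
+ - Active only when `docs/SPEC.md` contains the line `issue_tracking: enabled` (opt-in). Disabled by default.
34
+
35
+ If any condition is unmet, this skill is skipped automatically, without raising an error.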
36
+
37
+ ## Process
38
+
39
+ ### 1. Enforce the 5 Sections of ISSUE.template.md
40
+
41
+ When creating a new issue, fill the body from `ISSUE.template.md` in this skill's directory.
42
+
43
+ ```
44
+ ## Background (Why)
45
+ ## Preconditions (Given): dependencies/conditions before starting
46
+ ## Decision: OPEN | YYYY-MM-DD confirmed
47
+ ## Scope / AC (When → Then)
48
+ ## Follow-ups (Next)
49
+ ```
50
+
51
+ Delete empty sections entirely (no placeholders). BDD mapping: Preconditions (Given) → Scope (When) → AC (Then).
52
+
53
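+ ### 2. Use the Decision Status to Determine Whether Work May Start
54
+
55
+ | Status | Meaning | AI agent action |
56
+ |------|------|--------------|
57
+ | **OPEN** | Awaiting the user's decision | Work on this issue is blocked. Handle other issues first or ask the user to decide |
58
+ | **YYYY-MM-DD confirmed** | Decision made | Work may start. Close once the AC are met |
59
+
60
+ If the Decision has no confirmed date yet → ask the user once for a decision (Escalation Gate) → proceed only after a response.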
61
+
62
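+ ### 3. Check Preconditions (Given)
63
+
64
+ Before starting work, confirm that every precondition is met:
65
+ - Are all checkboxes ticked (`[x]`)?
66
+ - Unmet items → report the blocking reason and who is responsible for resolving it
67
+
68
+ If a precondition depends on another issue being completed → confirm that issue is closed before proceeding.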
69
+
70
+ ### 4. Label Scheme (Auto-Toggle Guide)
71
+
72
+ **3-axis label scheme** (one label per axis recommended):
73
+
74
+ | Axis | Label | When to attach |
75
+ |----|-------|---------|
76
+ | **type** | `bug` / `feature` / `refactor` / `docs` / `infra` | Once, when the issue is created |
77
+ | **status** | `decision-pending` / `ready` / `in-progress` / `blocked` | Toggled as the Decision and preconditions change |
78
+ | **priority** | `P0` / `P1` / `P2` (optional) | Decided by the user |
79
+
80
+ **Status auto-toggle rules** (suggested to the user by the skill):
81
+
82
+ ```
83
+ Decision: OPEN → decision-pending
84
+ Decision: YYYY-MM-DD confirmed → ready (remove decision-pending)
85
+ Precondition checkboxes incomplete → blocked (remove ready)
86
+ PR open → in-progress
87
+ PR merged → auto-close (regardless of labels)
88
+ ```
89
+
90
+ Labels are **not enforced at the hook level**; they are skill guidance. Apply them explicitly, either by hand or through PR automation, using `gh issue edit <N> --add-label <name>` / `--remove-label <name>`.
91
+
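The toggle rules above could be sketched as a pure function that picks the target status label and builds the matching `gh issue edit` arguments. The function names are hypothetical, and the precedence between rules (PR open first, then an undecided Decision, then unmet preconditions) is an assumption, since the skill lists the rules without ordering them.

```python
STATUS_LABELS = ("decision-pending", "ready", "in-progress", "blocked")


def target_status(decision_confirmed: bool, preconditions_met: bool, pr_open: bool) -> str:
    """Pick one status label. The rule ordering here is an assumption."""
    if pr_open:
        return "in-progress"
    if not decision_confirmed:
        return "decision-pending"
    if not preconditions_met:
        return "blocked"
    return "ready"


def label_edit_args(issue_number: int, current_labels: list[str], target: str) -> list[str]:
    """Build the argv for `gh issue edit`; this sketch does not execute it."""
    args = ["gh", "issue", "edit", str(issue_number)]
    if target not in current_labels:
        args += ["--add-label", target]
    for stale in STATUS_LABELS:
        if stale != target and stale in current_labels:
            args += ["--remove-label", stale]
    return args


if __name__ == "__main__":
    # Hypothetical issue #12 whose Decision has just been confirmed.
    status = target_status(decision_confirmed=True, preconditions_met=True, pr_open=False)
    print(" ".join(label_edit_args(12, ["bug", "decision-pending"], status)))
```

Because the skill treats labels as guidance rather than enforcement, returning the command instead of running it leaves the final step to the user or to PR automation.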
92
+ ### 5. GitHub Projects (V2) Integration (Optional, Opt-In)
93
+
94
+ When using a GitHub Projects board as a kanban-style backlog:
95
+
96
+ **Pre-condition**:
97
+ - `docs/SPEC.md` specifies `github_project: <URL>` (e.g. `https://github.com/users/uzysjung/projects/3`)
98
+ - The user has created the Project in advance and defined its status field (Backlog / Ready / In Progress / Done)
99
+
100
+ **Automatic behavior**:
101
+ - On new issue creation → call `gh project item-add <number> --owner <owner> --url <issue-url>`
102
+ - On status change → update the status field:
103
+ - `decision-pending` → Project status `Backlog`
104
+ - `ready` → `Ready`
105
+ - PR open → `In Progress`
106
+ - merged + close → `Done`
107
+
108
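+ **Projects that do not use a Project board** → skip this section (use issue labels only).
109
+
110
+ **Out of scope**:
111
+ - Iteration fields / automatic sprint allocation (over-engineering for the solo scenario)
112
+ - Syncing multiple Project boards (1 SPEC = 1 Project recommended)
113
+
114
+ This section does not mandate GitHub Projects; it is left to the user's preference.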
115
+
116
+ ### 6. Combining with `/uzys:auto`
117
+
118
+ At the start of a `/uzys:auto` cycle, run the following sequence:
119
+
120
+ ```
121
+ 1. gh issue list --state open --json number,title,labels,body
122
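+ → open issues become backlog candidates
123
+ 2. grep each issue body for the "Decision (YYYY-MM-DD confirmed)" pattern
124
+ → only confirmed issues are workable candidates
125
+ 3. Exclude issues with unmet preconditions
126
+ 4. Sort by priority (label P0 > P1 > P2 > unlabeled)
127
+ 5. Move the top 1-3 into docs/todo.md and enter the Plan phase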
128
+ ```
129
+
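Steps 2 to 4 of the sequence above might look like the following when applied to the parsed output of `gh issue list --state open --json number,title,labels,body`. This is a sketch under stated assumptions: the helper names are hypothetical, the heading regex accepts both the template's Korean wording and an English rendering, and unmet preconditions are detected as unticked checkboxes inside the "(Given)" section.

```python
import re

# Matches a confirmed Decision heading, e.g. "## Decision (2026-04-22 confirmed)".
CONFIRMED = re.compile(
    r"^## (?:방향성|Decision) \(\d{4}-\d{2}-\d{2} (?:확정|confirmed)\)", re.MULTILINE
)
# Captures the "(Given)" section body up to the next "## " heading or end of text.
GIVEN = re.compile(r"^## [^\n]*\(Given\)[^\n]*\n(.*?)(?=^## |\Z)", re.MULTILINE | re.DOTALL)
PRIORITY_RANK = {"P0": 0, "P1": 1, "P2": 2}


def has_unmet_preconditions(body: str) -> bool:
    """True if the Given section still contains an unticked checkbox."""
    section = GIVEN.search(body)
    return bool(section) and "- [ ]" in section.group(1)


def priority_rank(issue: dict) -> int:
    """P0 sorts first; issues without a priority label sort last."""
    names = [label["name"] for label in issue.get("labels", [])]
    return min((PRIORITY_RANK[n] for n in names if n in PRIORITY_RANK), default=3)


def pick_candidates(issues: list[dict], limit: int = 3) -> list[dict]:
    """Keep confirmed issues with met preconditions, sorted by priority."""
    workable = [
        issue for issue in issues
        if CONFIRMED.search(issue.get("body") or "")
        and not has_unmet_preconditions(issue.get("body") or "")
    ]
    return sorted(workable, key=priority_rank)[:limit]


# Hypothetical sample data shaped like the gh JSON output.
sample = [
    {"number": 1, "labels": [{"name": "P1"}],
     "body": "## Preconditions (Given)\n- [x] done\n\n## Decision (2026-04-22 confirmed)\n"},
    {"number": 2, "labels": [{"name": "P0"}], "body": "## Decision (OPEN)\n"},
    {"number": 3, "labels": [],
     "body": "## Preconditions (Given)\n- [ ] Issue #52\n\n## Decision (2026-04-22 confirmed)\n"},
    {"number": 4, "labels": [{"name": "P0"}], "body": "## Decision (2026-04-21 confirmed)\n"},
]
print([issue["number"] for issue in pick_candidates(sample)])  # [4, 1]
```

Restricting the checkbox scan to the Given section matters because the AC section also uses `- [ ]` items, which are expected to be unticked when work begins.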
130
+ ### 7. Commit / PR Conventions
131
+
132
+ | When | Message convention |
133
+ |------|-------------|
134
+ | Progress commit during Build | `<type>: ... (refs #N)` |
135
+ | Ship PR body | `Closes #N` or `Fixes #N` (auto-close) |
136
+ | Partial progress (does not close) | `Refs #N` |
137
+ | When a follow-up issue is created | Cross-link `#M` in the "Follow-ups" section of the original issue body |
138
+
139
+ ## Output
140
+
141
+ - GitHub Issue created/updated (5-section body)
142
+ - `docs/todo.md`: tasks moved over from the issue list
143
+ - Issue numbers automatically included in commit/PR messages
144
+
145
+ ## Anti-Patterns
146
+
147
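+ - **A one-line issue body ("login is broken")**: the 5 sections are mandatory. Fill in at least Background + AC.
148
+ - **Decision not stated**: work cannot start if it is unclear whether the issue is OPEN or confirmed.
149
+ - **Proceeding while ignoring preconditions**: do not start work while a dependency issue is unresolved.
150
+ - **Omitting `Closes #N` from the PR**: manual closes are easy to forget. Enforce the convention.
151
+ - **Attaching every label to every issue**: noise. Keep only the core classification.
152
+ - **Introducing team features (auto-assignee, automatic code-owner review)**: outside this skill's scope. Team use needs a separate workflow.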
153
+
154
+ ## Boundary
155
+
156
+ - Project without a GitHub remote → skill automatically disabled
157
+ - No `issue_tracking: enabled` in `docs/SPEC.md` → automatically disabled (opt-in)
158
+ - No access to a private repo → fetch fails → report to the user
159
+
160
+ ## Examples
161
+
162
+ ### Actual dyld-vantage Pattern (for Reference)
163
+
164
+ ```markdown
165
+ ## Background
166
+ Issue #52 completed the Feature Flag reorganization (16→18) and the Blur gate infrastructure.
167
+ This issue is the follow-up: per-page blur plus API quota limits.
168
+
169
+ ## Preconditions (Given)
170
+ - [x] Issue #52 completed (Feature Flag infrastructure)
171
+ - [x] Blur gate component available
172
+
173
+ ## Decision (2026-04-22 confirmed)
174
+ - Keep menu access (sidebar fully visible, no 401/403)
175
+ - Page-level blur: exactly one blur_gate inside the outer max-w-* mx-auto
176
+ - No per-block blur (low value for the added complexity)
177
+
178
+ ## Scope / AC
179
+ - [ ] Apply blur_gate to every _content.html (When a Free user visits → Then the blur is shown)
180
+ - [ ] API quota-limit middleware (When a request comes from an insufficient tier → Then 403)
181
+
182
+ ## Follow-ups
183
+ - [ ] Split out as Issue #56: Pricing page CTA design
184
+ ```
@@ -0,0 +1,96 @@
1
+ ---
2
+ name: investor-materials
3
+ description: Create and update pitch decks, one-pagers, investor memos, accelerator applications, financial models, and fundraising materials. Use when the user needs investor-facing documents, projections, use-of-funds tables, milestone plans, or materials that must stay internally consistent across multiple fundraising assets.
4
+ origin: ECC
5
+ ---
6
+
7
+ # Investor Materials
8
+
9
+ Build investor-facing materials that are consistent, credible, and easy to defend.
10
+
11
+ ## When to Activate
12
+
13
+ - creating or revising a pitch deck
14
+ - writing an investor memo or one-pager
15
+ - building a financial model, milestone plan, or use-of-funds table
16
+ - answering accelerator or incubator application questions
17
+ - aligning multiple fundraising docs around one source of truth
18
+
19
+ ## Golden Rule
20
+
21
+ All investor materials must agree with each other.
22
+
23
+ Create or confirm a single source of truth before writing:
24
+ - traction metrics
25
+ - pricing and revenue assumptions
26
+ - raise size and instrument
27
+ - use of funds
28
+ - team bios and titles
29
+ - milestones and timelines
30
+
31
+ If conflicting numbers appear, stop and resolve them before drafting.
32
+
33
+ ## Core Workflow
34
+
35
+ 1. inventory the canonical facts
36
+ 2. identify missing assumptions
37
+ 3. choose the asset type
38
+ 4. draft the asset with explicit logic
39
+ 5. cross-check every number against the source of truth
40
+
41
+ ## Asset Guidance
42
+
43
+ ### Pitch Deck
44
+ Recommended flow:
45
+ 1. company + wedge
46
+ 2. problem
47
+ 3. solution
48
+ 4. product / demo
49
+ 5. market
50
+ 6. business model
51
+ 7. traction
52
+ 8. team
53
+ 9. competition / differentiation
54
+ 10. ask
55
+ 11. use of funds / milestones
56
+ 12. appendix
57
+
58
+ If the user wants a web-native deck, pair this skill with `frontend-slides`.
59
+
60
+ ### One-Pager / Memo
61
+ - state what the company does in one clean sentence
62
+ - show why now
63
+ - include traction and proof points early
64
+ - make the ask precise
65
+ - keep claims easy to verify
66
+
67
+ ### Financial Model
68
+ Include:
69
+ - explicit assumptions
70
+ - bear / base / bull cases when useful
71
+ - clean layer-by-layer revenue logic
72
+ - milestone-linked spending
73
+ - sensitivity analysis where the decision hinges on assumptions
74
+
75
+ ### Accelerator Applications
76
+ - answer the exact question asked
77
+ - prioritize traction, insight, and team advantage
78
+ - avoid puffery
79
+ - keep internal metrics consistent with the deck and model
80
+
81
+ ## Red Flags to Avoid
82
+
83
+ - unverifiable claims
84
+ - fuzzy market sizing without assumptions
85
+ - inconsistent team roles or titles
86
+ - revenue math that does not sum cleanly
87
+ - inflated certainty where assumptions are fragile
88
+
89
+ ## Quality Gate
90
+
91
+ Before delivering:
92
+ - every number matches the current source of truth
93
+ - use of funds and revenue layers sum correctly
94
+ - assumptions are visible, not buried
95
+ - the story is clear without hype language
96
+ - the final asset is defensible in a partner meeting
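The first two quality-gate checks are mechanical and could be sketched as below. All names and figures are hypothetical placeholders chosen for illustration; they do not come from the skill or from any real raise.

```python
def sums_match(line_items: dict[str, int], stated_total: int) -> bool:
    """True if the line items add up exactly to the stated total."""
    return sum(line_items.values()) == stated_total


def mismatches(source_of_truth: dict[str, object], asset: dict[str, object]) -> dict[str, tuple]:
    """Return every shared key whose value in the asset differs from the source of truth."""
    return {
        key: (source_of_truth[key], value)
        for key, value in asset.items()
        if key in source_of_truth and source_of_truth[key] != value
    }


# Placeholder figures (assumptions for illustration only).
source = {"raise_size": 1_000_000, "instrument": "SAFE"}
use_of_funds = {"engineering": 600_000, "go_to_market": 300_000, "operations": 100_000}
deck = {"raise_size": 1_200_000, "instrument": "SAFE"}

print(sums_match(use_of_funds, source["raise_size"]))  # True
print(mismatches(source, deck))  # {'raise_size': (1000000, 1200000)}
```

A non-empty result from `mismatches` corresponds to the Golden Rule's instruction to stop and resolve conflicting numbers before drafting.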