@cubis/foundry 0.3.71 → 0.3.73

Files changed (276)
  1. package/CHANGELOG.md +23 -2
  2. package/dist/cli/core.js +9 -22
  3. package/dist/cli/core.js.map +1 -1
  4. package/package.json +1 -1
  5. package/src/cli/core.ts +13 -22
  6. package/workflows/powers/accessibility/POWER.md +83 -94
  7. package/workflows/powers/accessibility/SKILL.md +82 -94
  8. package/workflows/powers/agent-design/POWER.md +201 -0
  9. package/workflows/powers/agent-design/SKILL.md +198 -0
  10. package/workflows/powers/agent-design/references/clarification-patterns.md +153 -0
  11. package/workflows/powers/agent-design/references/skill-testing.md +164 -0
  12. package/workflows/powers/agent-design/references/workflow-patterns.md +226 -0
  13. package/workflows/powers/agentic-eval/POWER.md +62 -0
  14. package/workflows/powers/agentic-eval/SKILL.md +59 -0
  15. package/workflows/powers/agentic-eval/references/rubric-and-regression-checklist.md +11 -0
  16. package/workflows/powers/api-designer/POWER.md +43 -71
  17. package/workflows/powers/api-designer/SKILL.md +43 -71
  18. package/workflows/powers/api-patterns/POWER.md +42 -56
  19. package/workflows/powers/api-patterns/SKILL.md +42 -57
  20. package/workflows/powers/architecture-designer/POWER.md +43 -60
  21. package/workflows/powers/architecture-designer/SKILL.md +43 -60
  22. package/workflows/powers/ask-questions-if-underspecified/POWER.md +51 -3
  23. package/workflows/powers/auth-architect/POWER.md +69 -0
  24. package/workflows/powers/auth-architect/SKILL.md +66 -0
  25. package/workflows/powers/auth-architect/references/session-token-policy-checklist.md +45 -0
  26. package/workflows/powers/behavioral-modes/POWER.md +100 -9
  27. package/workflows/powers/c-pro/POWER.md +105 -0
  28. package/workflows/powers/c-pro/SKILL.md +102 -0
  29. package/workflows/powers/c-pro/references/build-systems-and-toolchains.md +148 -0
  30. package/workflows/powers/c-pro/references/common-ub-and-portability.md +166 -0
  31. package/workflows/powers/c-pro/references/debugging-with-sanitizers.md +205 -0
  32. package/workflows/powers/c-pro/references/memory-safety-and-build-checklist.md +60 -0
  33. package/workflows/powers/c-pro/references/posix-and-platform-apis.md +244 -0
  34. package/workflows/powers/changelog-generator/POWER.md +127 -63
  35. package/workflows/powers/changelog-generator/SKILL.md +126 -63
  36. package/workflows/powers/ci-cd-pipelines/POWER.md +156 -0
  37. package/workflows/powers/ci-cd-pipelines/SKILL.md +153 -0
  38. package/workflows/powers/ci-cd-pipelines/references/github-actions-patterns.md +160 -0
  39. package/workflows/powers/ci-cd-pipelines/references/pipeline-security-checklist.md +57 -0
  40. package/workflows/powers/cli-developer/POWER.md +152 -95
  41. package/workflows/powers/cli-developer/SKILL.md +152 -95
  42. package/workflows/powers/cpp-pro/POWER.md +111 -0
  43. package/workflows/powers/cpp-pro/SKILL.md +108 -0
  44. package/workflows/powers/cpp-pro/references/concurrency-primitives.md +266 -0
  45. package/workflows/powers/cpp-pro/references/move-semantics-and-value-types.md +149 -0
  46. package/workflows/powers/cpp-pro/references/performance-and-profiling.md +191 -0
  47. package/workflows/powers/cpp-pro/references/raii-and-modern-cpp-checklist.md +87 -0
  48. package/workflows/powers/cpp-pro/references/template-and-concepts-patterns.md +205 -0
  49. package/workflows/powers/csharp-pro/POWER.md +47 -22
  50. package/workflows/powers/csharp-pro/SKILL.md +47 -22
  51. package/workflows/powers/dart-pro/POWER.md +68 -0
  52. package/workflows/powers/dart-pro/SKILL.md +65 -0
  53. package/workflows/powers/dart-pro/references/isolate-and-concurrency.md +180 -0
  54. package/workflows/powers/dart-pro/references/null-safety-and-async-patterns.md +133 -0
  55. package/workflows/powers/dart-pro/references/package-structure-and-linting.md +193 -0
  56. package/workflows/powers/dart-pro/references/sealed-records-patterns.md +173 -0
  57. package/workflows/powers/dart-pro/references/testing-and-mocking.md +235 -0
  58. package/workflows/powers/database-design/POWER.md +47 -33
  59. package/workflows/powers/database-design/SKILL.md +47 -33
  60. package/workflows/powers/database-optimizer/POWER.md +43 -64
  61. package/workflows/powers/database-optimizer/SKILL.md +43 -64
  62. package/workflows/powers/database-skills/POWER.md +59 -93
  63. package/workflows/powers/database-skills/SKILL.md +59 -93
  64. package/workflows/powers/debugging-strategies/POWER.md +69 -0
  65. package/workflows/powers/debugging-strategies/SKILL.md +66 -0
  66. package/workflows/powers/debugging-strategies/references/reproduce-isolate-verify-checklist.md +42 -0
  67. package/workflows/powers/deep-research/POWER.md +67 -0
  68. package/workflows/powers/deep-research/SKILL.md +64 -0
  69. package/workflows/powers/deep-research/references/multi-round-research-loop.md +80 -0
  70. package/workflows/powers/design-system-builder/POWER.md +130 -116
  71. package/workflows/powers/design-system-builder/SKILL.md +130 -116
  72. package/workflows/powers/devops-engineer/POWER.md +120 -57
  73. package/workflows/powers/devops-engineer/SKILL.md +120 -57
  74. package/workflows/powers/docker-kubernetes/POWER.md +94 -0
  75. package/workflows/powers/docker-kubernetes/SKILL.md +91 -0
  76. package/workflows/powers/docker-kubernetes/references/dockerfile-optimization-checklist.md +35 -0
  77. package/workflows/powers/docker-kubernetes/references/kubernetes-deployment-patterns.md +59 -0
  78. package/workflows/powers/documentation-templates/POWER.md +158 -127
  79. package/workflows/powers/documentation-templates/SKILL.md +158 -127
  80. package/workflows/powers/drizzle-expert/POWER.md +66 -0
  81. package/workflows/powers/drizzle-expert/SKILL.md +63 -0
  82. package/workflows/powers/drizzle-expert/references/runtime-pairing-matrix.md +16 -0
  83. package/workflows/powers/drizzle-expert/references/schema-and-migration-playbook.md +18 -0
  84. package/workflows/powers/error-ux-observability/POWER.md +144 -131
  85. package/workflows/powers/error-ux-observability/SKILL.md +143 -131
  86. package/workflows/powers/fastapi-expert/POWER.md +46 -60
  87. package/workflows/powers/fastapi-expert/SKILL.md +46 -60
  88. package/workflows/powers/firebase/POWER.md +65 -0
  89. package/workflows/powers/firebase/SKILL.md +62 -0
  90. package/workflows/powers/firebase/references/platform-routing.md +16 -0
  91. package/workflows/powers/firebase/references/rules-and-indexes-checklist.md +11 -0
  92. package/workflows/powers/flutter-design-system/POWER.md +63 -0
  93. package/workflows/powers/flutter-design-system/SKILL.md +60 -0
  94. package/workflows/powers/flutter-design-system/references/shared-widgets.md +29 -0
  95. package/workflows/powers/flutter-design-system/references/tokens-and-theme.md +34 -0
  96. package/workflows/powers/flutter-drift/POWER.md +65 -0
  97. package/workflows/powers/flutter-drift/SKILL.md +62 -0
  98. package/workflows/powers/flutter-drift/references/migrations.md +22 -0
  99. package/workflows/powers/flutter-drift/references/query-patterns.md +26 -0
  100. package/workflows/powers/flutter-feature/POWER.md +65 -0
  101. package/workflows/powers/flutter-feature/SKILL.md +62 -0
  102. package/workflows/powers/flutter-feature/references/architecture-rules.md +85 -0
  103. package/workflows/powers/flutter-feature/references/composite-provider.md +58 -0
  104. package/workflows/powers/flutter-feature/references/outbox-pattern.md +87 -0
  105. package/workflows/powers/flutter-feature/references/testing-patterns.md +218 -0
  106. package/workflows/powers/flutter-go-router/POWER.md +64 -0
  107. package/workflows/powers/flutter-go-router/SKILL.md +61 -0
  108. package/workflows/powers/flutter-go-router/references/guards-and-deeplinks.md +20 -0
  109. package/workflows/powers/flutter-go-router/references/typed-routes.md +27 -0
  110. package/workflows/powers/flutter-offline-sync/POWER.md +62 -0
  111. package/workflows/powers/flutter-offline-sync/SKILL.md +59 -0
  112. package/workflows/powers/flutter-offline-sync/references/outbox-full.md +44 -0
  113. package/workflows/powers/flutter-repository/POWER.md +64 -0
  114. package/workflows/powers/flutter-repository/SKILL.md +61 -0
  115. package/workflows/powers/flutter-repository/references/drift-patterns.md +21 -0
  116. package/workflows/powers/flutter-repository/references/retrofit-patterns.md +20 -0
  117. package/workflows/powers/flutter-riverpod/POWER.md +70 -0
  118. package/workflows/powers/flutter-riverpod/SKILL.md +67 -0
  119. package/workflows/powers/flutter-riverpod/references/async-and-mutations.md +19 -0
  120. package/workflows/powers/flutter-riverpod/references/async-lifecycle.md +19 -0
  121. package/workflows/powers/flutter-riverpod/references/provider-selection.md +20 -0
  122. package/workflows/powers/flutter-riverpod/references/testing.md +21 -0
  123. package/workflows/powers/flutter-riverpod/references/version-matrix.md +24 -0
  124. package/workflows/powers/flutter-state-machine/POWER.md +62 -0
  125. package/workflows/powers/flutter-state-machine/SKILL.md +59 -0
  126. package/workflows/powers/flutter-state-machine/references/app-state-contract.md +23 -0
  127. package/workflows/powers/flutter-state-machine/references/ui-rendering.md +14 -0
  128. package/workflows/powers/flutter-testing/POWER.md +64 -0
  129. package/workflows/powers/flutter-testing/SKILL.md +61 -0
  130. package/workflows/powers/flutter-testing/references/offline-sync-tests.md +16 -0
  131. package/workflows/powers/flutter-testing/references/test-layers.md +33 -0
  132. package/workflows/powers/frontend-code-review/POWER.md +137 -0
  133. package/workflows/powers/frontend-code-review/SKILL.md +134 -0
  134. package/workflows/powers/frontend-code-review/references/common-antipatterns.md +86 -0
  135. package/workflows/powers/frontend-code-review/references/performance-budgets.md +56 -0
  136. package/workflows/powers/frontend-code-review/references/review-checklists.md +47 -0
  137. package/workflows/powers/frontend-design/POWER.md +163 -362
  138. package/workflows/powers/frontend-design/SKILL.md +163 -362
  139. package/workflows/powers/game-development/POWER.md +57 -140
  140. package/workflows/powers/game-development/SKILL.md +57 -140
  141. package/workflows/powers/geo-fundamentals/POWER.md +64 -126
  142. package/workflows/powers/geo-fundamentals/SKILL.md +64 -127
  143. package/workflows/powers/git-workflow/POWER.md +135 -0
  144. package/workflows/powers/git-workflow/SKILL.md +132 -0
  145. package/workflows/powers/git-workflow/references/pr-review-checklist.md +63 -0
  146. package/workflows/powers/golang-pro/POWER.md +46 -35
  147. package/workflows/powers/golang-pro/SKILL.md +46 -35
  148. package/workflows/powers/graphql-architect/POWER.md +44 -62
  149. package/workflows/powers/graphql-architect/SKILL.md +44 -62
  150. package/workflows/powers/i18n-localization/POWER.md +118 -103
  151. package/workflows/powers/i18n-localization/SKILL.md +118 -103
  152. package/workflows/powers/java-pro/POWER.md +47 -22
  153. package/workflows/powers/java-pro/SKILL.md +47 -22
  154. package/workflows/powers/javascript-pro/POWER.md +47 -34
  155. package/workflows/powers/javascript-pro/SKILL.md +47 -34
  156. package/workflows/powers/kotlin-pro/POWER.md +46 -23
  157. package/workflows/powers/kotlin-pro/SKILL.md +46 -23
  158. package/workflows/powers/legacy-modernizer/POWER.md +43 -60
  159. package/workflows/powers/legacy-modernizer/SKILL.md +43 -60
  160. package/workflows/powers/mcp-builder/POWER.md +65 -0
  161. package/workflows/powers/mcp-builder/SKILL.md +62 -0
  162. package/workflows/powers/mcp-builder/references/testing-and-evals.md +17 -0
  163. package/workflows/powers/mcp-builder/references/transport-and-tool-design.md +17 -0
  164. package/workflows/powers/microservices-architect/POWER.md +43 -70
  165. package/workflows/powers/microservices-architect/SKILL.md +43 -70
  166. package/workflows/powers/mobile-design/POWER.md +110 -345
  167. package/workflows/powers/mobile-design/SKILL.md +110 -345
  168. package/workflows/powers/mongodb/POWER.md +67 -0
  169. package/workflows/powers/mongodb/SKILL.md +64 -0
  170. package/workflows/powers/mongodb/references/mongodb-checklist.md +20 -0
  171. package/workflows/powers/mysql/POWER.md +67 -0
  172. package/workflows/powers/mysql/SKILL.md +64 -0
  173. package/workflows/powers/mysql/references/mysql-checklist.md +20 -0
  174. package/workflows/powers/neki/POWER.md +67 -0
  175. package/workflows/powers/neki/SKILL.md +64 -0
  176. package/workflows/powers/neki/references/neki-checklist.md +18 -0
  177. package/workflows/powers/nestjs-expert/POWER.md +45 -91
  178. package/workflows/powers/nestjs-expert/SKILL.md +45 -91
  179. package/workflows/powers/nextjs-developer/POWER.md +51 -44
  180. package/workflows/powers/nextjs-developer/SKILL.md +51 -44
  181. package/workflows/powers/nodejs-best-practices/POWER.md +48 -29
  182. package/workflows/powers/nodejs-best-practices/SKILL.md +48 -29
  183. package/workflows/powers/observability/POWER.md +109 -0
  184. package/workflows/powers/observability/SKILL.md +106 -0
  185. package/workflows/powers/observability/references/alerting-and-slo-checklist.md +87 -0
  186. package/workflows/powers/observability/references/opentelemetry-setup-guide.md +121 -0
  187. package/workflows/powers/openai-docs/POWER.md +61 -0
  188. package/workflows/powers/openai-docs/SKILL.md +58 -0
  189. package/workflows/powers/openai-docs/references/official-source-playbook.md +10 -0
  190. package/workflows/powers/performance-profiling/POWER.md +61 -114
  191. package/workflows/powers/performance-profiling/SKILL.md +61 -114
  192. package/workflows/powers/php-pro/POWER.md +116 -0
  193. package/workflows/powers/php-pro/SKILL.md +113 -0
  194. package/workflows/powers/php-pro/references/architecture-and-di.md +239 -0
  195. package/workflows/powers/php-pro/references/modern-php-features.md +189 -0
  196. package/workflows/powers/php-pro/references/performance-and-deployment.md +197 -0
  197. package/workflows/powers/php-pro/references/php84-strict-typing-checklist.md +161 -0
  198. package/workflows/powers/php-pro/references/testing-and-static-analysis.md +235 -0
  199. package/workflows/powers/playwright-e2e/POWER.md +85 -0
  200. package/workflows/powers/playwright-e2e/SKILL.md +82 -0
  201. package/workflows/powers/playwright-e2e/references/locator-trace-flake-checklist.md +80 -0
  202. package/workflows/powers/postgres/POWER.md +67 -0
  203. package/workflows/powers/postgres/SKILL.md +64 -0
  204. package/workflows/powers/postgres/references/postgres-checklist.md +20 -0
  205. package/workflows/powers/prompt-engineer/POWER.md +47 -30
  206. package/workflows/powers/prompt-engineer/SKILL.md +47 -30
  207. package/workflows/powers/python-pro/POWER.md +47 -36
  208. package/workflows/powers/python-pro/SKILL.md +47 -36
  209. package/workflows/powers/react-best-practices/POWER.md +56 -33
  210. package/workflows/powers/react-best-practices/SKILL.md +56 -33
  211. package/workflows/powers/react-expert/POWER.md +47 -37
  212. package/workflows/powers/react-expert/SKILL.md +47 -37
  213. package/workflows/powers/redis/POWER.md +67 -0
  214. package/workflows/powers/redis/SKILL.md +64 -0
  215. package/workflows/powers/redis/references/redis-checklist.md +19 -0
  216. package/workflows/powers/ruby-pro/POWER.md +118 -0
  217. package/workflows/powers/ruby-pro/SKILL.md +115 -0
  218. package/workflows/powers/ruby-pro/references/modern-ruby-features.md +189 -0
  219. package/workflows/powers/ruby-pro/references/object-design-patterns.md +220 -0
  220. package/workflows/powers/ruby-pro/references/performance-and-profiling.md +224 -0
  221. package/workflows/powers/ruby-pro/references/ruby-concurrency-and-testing.md +190 -0
  222. package/workflows/powers/ruby-pro/references/testing-and-rspec.md +236 -0
  223. package/workflows/powers/rust-pro/POWER.md +45 -31
  224. package/workflows/powers/rust-pro/SKILL.md +45 -31
  225. package/workflows/powers/security-engineer/POWER.md +129 -0
  226. package/workflows/powers/security-engineer/SKILL.md +126 -0
  227. package/workflows/powers/seo-fundamentals/POWER.md +59 -102
  228. package/workflows/powers/seo-fundamentals/SKILL.md +59 -102
  229. package/workflows/powers/serverless-patterns/POWER.md +171 -0
  230. package/workflows/powers/serverless-patterns/SKILL.md +168 -0
  231. package/workflows/powers/skill-creator/POWER.md +90 -0
  232. package/workflows/powers/skill-creator/SKILL.md +87 -0
  233. package/workflows/powers/skill-creator/references/platform-formats.md +181 -0
  234. package/workflows/powers/skill-creator/references/schemas.md +430 -0
  235. package/workflows/powers/spec-miner/POWER.md +49 -57
  236. package/workflows/powers/spec-miner/SKILL.md +49 -57
  237. package/workflows/powers/sqlite/POWER.md +67 -0
  238. package/workflows/powers/sqlite/SKILL.md +64 -0
  239. package/workflows/powers/sqlite/references/sqlite-checklist.md +19 -0
  240. package/workflows/powers/sre-engineer/POWER.md +123 -64
  241. package/workflows/powers/sre-engineer/SKILL.md +123 -64
  242. package/workflows/powers/static-analysis/POWER.md +121 -77
  243. package/workflows/powers/static-analysis/SKILL.md +121 -77
  244. package/workflows/powers/stripe-best-practices/POWER.md +140 -17
  245. package/workflows/powers/stripe-best-practices/SKILL.md +139 -17
  246. package/workflows/powers/supabase/POWER.md +67 -0
  247. package/workflows/powers/supabase/SKILL.md +64 -0
  248. package/workflows/powers/supabase/references/supabase-checklist.md +19 -0
  249. package/workflows/powers/swift-pro/POWER.md +118 -0
  250. package/workflows/powers/swift-pro/SKILL.md +115 -0
  251. package/workflows/powers/swift-pro/references/concurrency-patterns.md +165 -0
  252. package/workflows/powers/swift-pro/references/protocol-and-generics.md +172 -0
  253. package/workflows/powers/swift-pro/references/sendable-and-isolation.md +116 -0
  254. package/workflows/powers/swift-pro/references/swift-concurrency-and-protocols.md +260 -0
  255. package/workflows/powers/swift-pro/references/testing-and-packages.md +192 -0
  256. package/workflows/powers/tailwind-patterns/POWER.md +71 -240
  257. package/workflows/powers/tailwind-patterns/SKILL.md +71 -240
  258. package/workflows/powers/testing-patterns/POWER.md +155 -10
  259. package/workflows/powers/testing-patterns/SKILL.md +155 -10
  260. package/workflows/powers/typescript-pro/POWER.md +47 -38
  261. package/workflows/powers/typescript-pro/SKILL.md +47 -38
  262. package/workflows/powers/vitess/POWER.md +67 -0
  263. package/workflows/powers/vitess/SKILL.md +64 -0
  264. package/workflows/powers/vitess/references/vitess-checklist.md +19 -0
  265. package/workflows/powers/vulnerability-scanner/POWER.md +146 -10
  266. package/workflows/powers/vulnerability-scanner/SKILL.md +146 -10
  267. package/workflows/powers/web-perf/POWER.md +43 -170
  268. package/workflows/powers/web-perf/SKILL.md +43 -170
  269. package/workflows/powers/webapp-testing/POWER.md +43 -164
  270. package/workflows/powers/webapp-testing/SKILL.md +43 -164
  271. package/workflows/workflows/agent-environment-setup/platforms/antigravity/rules/GEMINI.md +65 -42
  272. package/workflows/workflows/agent-environment-setup/platforms/claude/rules/CLAUDE.md +8 -6
  273. package/workflows/workflows/agent-environment-setup/platforms/codex/rules/AGENTS.md +65 -41
  274. package/workflows/workflows/agent-environment-setup/platforms/copilot/rules/copilot-instructions.md +8 -6
  275. package/workflows/workflows/agent-environment-setup/shared/rules/STEERING.md +9 -8
  276. package/workflows/workflows/agent-environment-setup/shared/rules/overrides/codex.md +1 -1
@@ -0,0 +1,430 @@
1
+ # JSON Schemas
2
+
3
+ This document defines the JSON schemas used by skill-creator.
4
+
5
+ ---
6
+
7
+ ## evals.json
8
+
9
+ Defines the evals for a skill. Located at `evals/evals.json` within the skill directory.
10
+
11
+ ```json
12
+ {
13
+ "skill_name": "example-skill",
14
+ "evals": [
15
+ {
16
+ "id": 1,
17
+ "prompt": "User's example prompt",
18
+ "expected_output": "Description of expected result",
19
+ "files": ["evals/files/sample1.pdf"],
20
+ "expectations": [
21
+ "The output includes X",
22
+ "The skill used script Y"
23
+ ]
24
+ }
25
+ ]
26
+ }
27
+ ```
28
+
29
+ **Fields:**
30
+ - `skill_name`: Name matching the skill's frontmatter
31
+ - `evals[].id`: Unique integer identifier
32
+ - `evals[].prompt`: The task to execute
33
+ - `evals[].expected_output`: Human-readable description of success
34
+ - `evals[].files`: Optional list of input file paths (relative to skill root)
35
+ - `evals[].expectations`: List of verifiable statements
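As an illustration only (not part of the published file), the field list above can be turned into a small structural check. This is a minimal sketch: the function name `validate_evals` is hypothetical, and treating every field except `files` as required is an assumption drawn from the fact that only `files` is marked optional.

```python
def validate_evals(doc):
    """Return a list of problems found in an evals.json document (empty if valid)."""
    problems = []
    if not isinstance(doc.get("skill_name"), str) or not doc["skill_name"]:
        problems.append("skill_name must be a non-empty string")
    seen_ids = set()
    for index, item in enumerate(doc.get("evals", [])):
        eval_id = item.get("id")
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(eval_id, bool) or not isinstance(eval_id, int):
            problems.append(f"evals[{index}].id must be an integer")
        elif eval_id in seen_ids:
            problems.append(f"evals[{index}].id {eval_id} is duplicated")
        else:
            seen_ids.add(eval_id)
        for key in ("prompt", "expected_output"):
            if not isinstance(item.get(key), str):
                problems.append(f"evals[{index}].{key} must be a string")
        if not isinstance(item.get("expectations"), list):
            problems.append(f"evals[{index}].expectations must be a list")
        # `files` is the only field the schema marks as optional.
        if "files" in item and not isinstance(item["files"], list):
            problems.append(f"evals[{index}].files must be a list when present")
    return problems


example = {
    "skill_name": "example-skill",
    "evals": [
        {
            "id": 1,
            "prompt": "User's example prompt",
            "expected_output": "Description of expected result",
            "files": ["evals/files/sample1.pdf"],
            "expectations": ["The output includes X", "The skill used script Y"],
        }
    ],
}
# Reusing the same eval twice produces a duplicate id.
duplicated = {"skill_name": "example-skill", "evals": [example["evals"][0]] * 2}

print(validate_evals(example))
print(validate_evals(duplicated))
```

A check like this only covers shape; it cannot tell whether the expectations themselves are verifiable.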
36
+
37
+ ---
38
+
39
+ ## history.json
40
+
41
+ Tracks version progression in Improve mode. Located at workspace root.
42
+
43
+ ```json
44
+ {
45
+ "started_at": "2026-01-15T10:30:00Z",
46
+ "skill_name": "pdf",
47
+ "current_best": "v2",
48
+ "iterations": [
49
+ {
50
+ "version": "v0",
51
+ "parent": null,
52
+ "expectation_pass_rate": 0.65,
53
+ "grading_result": "baseline",
54
+ "is_current_best": false
55
+ },
56
+ {
57
+ "version": "v1",
58
+ "parent": "v0",
59
+ "expectation_pass_rate": 0.75,
60
+ "grading_result": "won",
61
+ "is_current_best": false
62
+ },
63
+ {
64
+ "version": "v2",
65
+ "parent": "v1",
66
+ "expectation_pass_rate": 0.85,
67
+ "grading_result": "won",
68
+ "is_current_best": true
69
+ }
70
+ ]
71
+ }
72
+ ```
73
+
74
+ **Fields:**
75
+ - `started_at`: ISO timestamp of when improvement started
76
+ - `skill_name`: Name of the skill being improved
77
+ - `current_best`: Version identifier of the best performer
78
+ - `iterations[].version`: Version identifier (v0, v1, ...)
79
+ - `iterations[].parent`: Parent version this was derived from
80
+ - `iterations[].expectation_pass_rate`: Pass rate from grading
81
+ - `iterations[].grading_result`: "baseline", "won", "lost", or "tie"
82
+ - `iterations[].is_current_best`: Whether this is the current best version
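As an illustration only (not part of the published file), the invariants implied by these fields can be checked in a few lines. This is a sketch under assumptions: `check_history` is a hypothetical helper, and the rule that exactly one iteration carries `is_current_best: true` and matches `current_best` is inferred from the example rather than stated by the schema.

```python
ALLOWED_RESULTS = {"baseline", "won", "lost", "tie"}


def check_history(history):
    """Return a list of consistency problems in a history.json document."""
    problems = []
    iterations = history.get("iterations", [])
    versions = [it["version"] for it in iterations]

    # Assumption: exactly one iteration is flagged and it matches current_best.
    flagged = [it["version"] for it in iterations if it.get("is_current_best")]
    if flagged != [history.get("current_best")]:
        problems.append(f"current_best is {history.get('current_best')!r} but flagged versions are {flagged}")

    for it in iterations:
        parent = it.get("parent")
        if parent is not None and parent not in versions:
            problems.append(f"{it['version']}: unknown parent {parent!r}")
        if it.get("grading_result") not in ALLOWED_RESULTS:
            problems.append(f"{it['version']}: invalid grading_result {it.get('grading_result')!r}")
    return problems


history = {
    "started_at": "2026-01-15T10:30:00Z",
    "skill_name": "pdf",
    "current_best": "v2",
    "iterations": [
        {"version": "v0", "parent": None, "expectation_pass_rate": 0.65,
         "grading_result": "baseline", "is_current_best": False},
        {"version": "v1", "parent": "v0", "expectation_pass_rate": 0.75,
         "grading_result": "won", "is_current_best": False},
        {"version": "v2", "parent": "v1", "expectation_pass_rate": 0.85,
         "grading_result": "won", "is_current_best": True},
    ],
}

print(check_history(history))
```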
83
+
84
+ ---
85
+
86
+ ## grading.json
87
+
88
+ Output from the grader agent. Located at `<run-dir>/grading.json`.
89
+
90
+ ```json
91
+ {
92
+ "expectations": [
93
+ {
94
+ "text": "The output includes the name 'John Smith'",
95
+ "passed": true,
96
+ "evidence": "Found in transcript Step 3: 'Extracted names: John Smith, Sarah Johnson'"
97
+ },
98
+ {
99
+ "text": "The spreadsheet has a SUM formula in cell B10",
100
+ "passed": false,
101
+ "evidence": "No spreadsheet was created. The output was a text file."
102
+ }
103
+ ],
104
+ "summary": {
105
+ "passed": 2,
106
+ "failed": 1,
107
+ "total": 3,
108
+ "pass_rate": 0.67
109
+ },
110
+ "execution_metrics": {
111
+ "tool_calls": {
112
+ "Read": 5,
113
+ "Write": 2,
114
+ "Bash": 8
115
+ },
116
+ "total_tool_calls": 15,
117
+ "total_steps": 6,
118
+ "errors_encountered": 0,
119
+ "output_chars": 12450,
120
+ "transcript_chars": 3200
121
+ },
122
+ "timing": {
123
+ "executor_duration_seconds": 165.0,
124
+ "grader_duration_seconds": 26.0,
125
+ "total_duration_seconds": 191.0
126
+ },
127
+ "claims": [
128
+ {
129
+ "claim": "The form has 12 fillable fields",
130
+ "type": "factual",
131
+ "verified": true,
132
+ "evidence": "Counted 12 fields in field_info.json"
133
+ }
134
+ ],
135
+ "user_notes_summary": {
136
+ "uncertainties": ["Used 2023 data, may be stale"],
137
+ "needs_review": [],
138
+ "workarounds": ["Fell back to text overlay for non-fillable fields"]
139
+ },
140
+ "eval_feedback": {
141
+ "suggestions": [
142
+ {
143
+ "assertion": "The output includes the name 'John Smith'",
144
+ "reason": "A hallucinated document that mentions the name would also pass"
145
+ }
146
+ ],
147
+ "overall": "Assertions check presence but not correctness."
148
+ }
149
+ }
150
+ ```
151
+
152
+ **Fields:**
153
+ - `expectations[]`: Graded expectations with evidence
154
+ - `summary`: Aggregate pass/fail counts
155
+ - `execution_metrics`: Tool usage and output size (from executor's metrics.json)
156
+ - `timing`: Wall clock timing (from timing.json)
157
+ - `claims`: Extracted and verified claims from the output
158
+ - `user_notes_summary`: Issues flagged by the executor
159
+ - `eval_feedback`: (optional) Improvement suggestions for the evals, only present when the grader identifies issues worth raising
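As an illustration only (not part of the published file), the `summary` object can be derived from the graded `expectations[]` list. The helper name `summarize_expectations` is hypothetical, and rounding `pass_rate` to two decimals is an assumption based on the example value `0.67`.

```python
def summarize_expectations(expectations):
    """Build the grading.json `summary` object from graded expectations."""
    total = len(expectations)
    passed = sum(1 for item in expectations if item["passed"])
    return {
        "passed": passed,
        "failed": total - passed,
        "total": total,
        # Assumption: two-decimal rounding, as in the example (2 / 3 -> 0.67).
        "pass_rate": round(passed / total, 2) if total else 0.0,
    }


graded = [
    {"text": "first expectation", "passed": True, "evidence": "..."},
    {"text": "second expectation", "passed": True, "evidence": "..."},
    {"text": "third expectation", "passed": False, "evidence": "..."},
]

print(summarize_expectations(graded))
```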
160
+
161
+ ---
162
+
163
+ ## metrics.json
164
+
165
+ Output from the executor agent. Located at `<run-dir>/outputs/metrics.json`.
166
+
167
+ ```json
168
+ {
169
+ "tool_calls": {
170
+ "Read": 5,
171
+ "Write": 2,
172
+ "Bash": 8,
173
+ "Edit": 1,
174
+ "Glob": 2,
175
+ "Grep": 0
176
+ },
177
+ "total_tool_calls": 18,
178
+ "total_steps": 6,
179
+ "files_created": ["filled_form.pdf", "field_values.json"],
180
+ "errors_encountered": 0,
181
+ "output_chars": 12450,
182
+ "transcript_chars": 3200
183
+ }
184
+ ```
185
+
186
+ **Fields:**
187
+ - `tool_calls`: Count per tool type
188
+ - `total_tool_calls`: Sum of all tool calls
189
+ - `total_steps`: Number of major execution steps
190
+ - `files_created`: List of output files created
191
+ - `errors_encountered`: Number of errors during execution
192
+ - `output_chars`: Total character count of output files
193
+ - `transcript_chars`: Character count of transcript
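As an illustration only (not part of the published file), `total_tool_calls` is described as the sum of the per-tool counts, which is easy to verify. The helper name `tool_call_total_matches` is hypothetical.

```python
def tool_call_total_matches(metrics):
    """Check that total_tool_calls equals the sum of the per-tool counts."""
    return metrics["total_tool_calls"] == sum(metrics["tool_calls"].values())


metrics = {
    "tool_calls": {"Read": 5, "Write": 2, "Bash": 8, "Edit": 1, "Glob": 2, "Grep": 0},
    "total_tool_calls": 18,
    "total_steps": 6,
    "files_created": ["filled_form.pdf", "field_values.json"],
    "errors_encountered": 0,
    "output_chars": 12450,
    "transcript_chars": 3200,
}

print(tool_call_total_matches(metrics))
```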
194
+
195
+ ---
196
+
197
+ ## timing.json
198
+
199
+ Wall clock timing for a run. Located at `<run-dir>/timing.json`.
200
+
201
+ **How to capture:** When a subagent task completes, the task notification includes `total_tokens` and `duration_ms`. Save these immediately — they are not persisted anywhere else and cannot be recovered after the fact.
202
+
203
+ ```json
204
+ {
205
+ "total_tokens": 84852,
206
+ "duration_ms": 23332,
207
+ "total_duration_seconds": 23.3,
208
+ "executor_start": "2026-01-15T10:30:00Z",
209
+ "executor_end": "2026-01-15T10:32:45Z",
210
+ "executor_duration_seconds": 165.0,
211
+ "grader_start": "2026-01-15T10:32:46Z",
212
+ "grader_end": "2026-01-15T10:33:12Z",
213
+ "grader_duration_seconds": 26.0
214
+ }
215
+ ```
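As an illustration only (not part of the published file), the duration fields can be derived from the timestamps and from `duration_ms`. This is a sketch: the helper names are hypothetical, and one-decimal rounding is an assumption taken from the example values. In the example, `total_duration_seconds` (23.3) corresponds to `duration_ms / 1000`, not to the sum of the executor and grader durations.

```python
from datetime import datetime


def seconds_between(start_iso, end_iso):
    """Elapsed seconds between two ISO 8601 timestamps with a trailing Z."""
    # Replace the Z suffix so fromisoformat also works on older Python versions.
    start = datetime.fromisoformat(start_iso.replace("Z", "+00:00"))
    end = datetime.fromisoformat(end_iso.replace("Z", "+00:00"))
    return (end - start).total_seconds()


def ms_to_seconds(duration_ms):
    """Convert the notification's duration_ms to seconds, one decimal place."""
    return round(duration_ms / 1000, 1)


executor_seconds = seconds_between("2026-01-15T10:30:00Z", "2026-01-15T10:32:45Z")
grader_seconds = seconds_between("2026-01-15T10:32:46Z", "2026-01-15T10:33:12Z")

print(executor_seconds, grader_seconds, ms_to_seconds(23332))
```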
216
+
217
+ ---
218
+
219
+ ## benchmark.json
220
+
221
+ Output from Benchmark mode. Located at `benchmarks/<timestamp>/benchmark.json`.
222
+
223
+ ```json
224
+ {
225
+ "metadata": {
226
+ "skill_name": "pdf",
227
+ "skill_path": "/path/to/pdf",
228
+ "executor_model": "claude-sonnet-4-20250514",
229
+ "analyzer_model": "most-capable-model",
230
+ "timestamp": "2026-01-15T10:30:00Z",
231
+ "evals_run": [1, 2, 3],
232
+ "runs_per_configuration": 3
233
+ },
234
+
235
+ "runs": [
236
+ {
237
+ "eval_id": 1,
238
+ "eval_name": "Ocean",
239
+ "configuration": "with_skill",
240
+ "run_number": 1,
241
+ "result": {
242
+ "pass_rate": 0.85,
243
+ "passed": 6,
244
+ "failed": 1,
245
+ "total": 7,
246
+ "time_seconds": 42.5,
247
+ "tokens": 3800,
248
+ "tool_calls": 18,
249
+ "errors": 0
250
+ },
251
+ "expectations": [
252
+ {"text": "...", "passed": true, "evidence": "..."}
253
+ ],
254
+ "notes": [
255
+ "Used 2023 data, may be stale",
256
+ "Fell back to text overlay for non-fillable fields"
257
+ ]
258
+ }
259
+ ],
260
+
261
+ "run_summary": {
262
+ "with_skill": {
263
+ "pass_rate": {"mean": 0.85, "stddev": 0.05, "min": 0.80, "max": 0.90},
264
+ "time_seconds": {"mean": 45.0, "stddev": 12.0, "min": 32.0, "max": 58.0},
265
+ "tokens": {"mean": 3800, "stddev": 400, "min": 3200, "max": 4100}
266
+ },
267
+ "without_skill": {
268
+ "pass_rate": {"mean": 0.35, "stddev": 0.08, "min": 0.28, "max": 0.45},
269
+ "time_seconds": {"mean": 32.0, "stddev": 8.0, "min": 24.0, "max": 42.0},
270
+ "tokens": {"mean": 2100, "stddev": 300, "min": 1800, "max": 2500}
271
+ },
272
+ "delta": {
273
+ "pass_rate": "+0.50",
274
+ "time_seconds": "+13.0",
275
+ "tokens": "+1700"
276
+ }
277
+ },
278
+
279
+ "notes": [
280
+ "Assertion 'Output is a PDF file' passes 100% in both configurations - may not differentiate skill value",
281
+ "Eval 3 shows high variance (50% ± 40%) - may be flaky or model-dependent",
282
+ "Without-skill runs consistently fail on table extraction expectations",
283
+ "Skill adds 13s average execution time but improves pass rate by 50%"
284
+ ]
285
+ }
286
+ ```
287
+
288
+ **Fields:**
289
+ - `metadata`: Information about the benchmark run
290
+   - `skill_name`: Name of the skill
291
+   - `timestamp`: When the benchmark was run
292
+   - `evals_run`: List of eval names or IDs
293
+   - `runs_per_configuration`: Number of runs per config (e.g. 3)
294
+ - `runs[]`: Individual run results
295
+   - `eval_id`: Numeric eval identifier
296
+   - `eval_name`: Human-readable eval name (used as section header in the viewer)
297
+   - `configuration`: Must be `"with_skill"` or `"without_skill"` (the viewer uses this exact string for grouping and color coding)
298
+   - `run_number`: Integer run number (1, 2, 3...)
299
+   - `result`: Nested object with `pass_rate`, `passed`, `failed`, `total`, `time_seconds`, `tokens`, `tool_calls`, `errors`
300
+ - `run_summary`: Statistical aggregates per configuration
301
+   - `with_skill` / `without_skill`: Each contains `pass_rate`, `time_seconds`, `tokens` objects with `mean`, `stddev`, `min`, and `max` fields
302
+   - `delta`: Difference strings like `"+0.50"`, `"+13.0"`, `"+1700"`
303
+ - `notes`: Freeform observations from the analyzer
304
+
305
+ **Important:** The viewer reads these field names exactly. Using `config` instead of `configuration`, or putting `pass_rate` at the top level of a run instead of nested under `result`, will cause the viewer to show empty/zero values. Always reference this schema when generating benchmark.json manually.
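As an illustration only (not part of the published file), `run_summary` can be computed from `runs[]` while keeping the exact field names the viewer expects. This is a sketch under assumptions: `summarize_runs` and `aggregate` are hypothetical names, the sample standard deviation is used because the schema does not say which variant applies, and the delta formats are inferred from the example strings.

```python
from statistics import mean, stdev

CONFIGURATIONS = ("with_skill", "without_skill")
# Assumption: decimal places per metric, inferred from "+0.50", "+13.0", "+1700".
DELTA_FORMATS = {"pass_rate": "+.2f", "time_seconds": "+.1f", "tokens": "+.0f"}


def aggregate(values):
    """Mean, sample stddev, min and max for one metric."""
    return {
        "mean": round(mean(values), 2),
        "stddev": round(stdev(values), 2) if len(values) > 1 else 0.0,
        "min": min(values),
        "max": max(values),
    }


def summarize_runs(runs):
    """Build run_summary; metrics are read from the nested `result` object."""
    summary = {}
    for config in CONFIGURATIONS:
        results = [run["result"] for run in runs if run["configuration"] == config]
        summary[config] = {
            metric: aggregate([result[metric] for result in results])
            for metric in DELTA_FORMATS
        }
    summary["delta"] = {
        metric: format(
            summary["with_skill"][metric]["mean"] - summary["without_skill"][metric]["mean"],
            spec,
        )
        for metric, spec in DELTA_FORMATS.items()
    }
    return summary


def make_run(configuration, run_number, pass_rate, time_seconds, tokens):
    """Hypothetical helper that builds a minimal run entry for the example."""
    return {
        "eval_id": 1,
        "configuration": configuration,
        "run_number": run_number,
        "result": {"pass_rate": pass_rate, "time_seconds": time_seconds, "tokens": tokens},
    }


runs = [
    make_run("with_skill", 1, 0.8, 40.0, 3600),
    make_run("with_skill", 2, 0.9, 50.0, 4000),
    make_run("without_skill", 1, 0.3, 30.0, 2000),
    make_run("without_skill", 2, 0.4, 34.0, 2200),
]

summary = summarize_runs(runs)
print(summary["delta"])
```

The sketch assumes at least one run exists for each configuration; `mean` raises an error on an empty list.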
306
+
307
+ ---
308
+
309
+ ## comparison.json
310
+
311
+ Output from blind comparator. Located at `<grading-dir>/comparison-N.json`.
312
+
313
+ ```json
314
+ {
315
+ "winner": "A",
316
+ "reasoning": "Output A provides a complete solution with proper formatting and all required fields. Output B is missing the date field and has formatting inconsistencies.",
317
+ "rubric": {
318
+ "A": {
319
+ "content": {
320
+ "correctness": 5,
321
+ "completeness": 5,
322
+ "accuracy": 4
323
+ },
324
+ "structure": {
325
+ "organization": 4,
326
+ "formatting": 5,
327
+ "usability": 4
328
+ },
329
+ "content_score": 4.7,
330
+ "structure_score": 4.3,
331
+ "overall_score": 9.0
332
+ },
333
+ "B": {
334
+ "content": {
335
+ "correctness": 3,
336
+ "completeness": 2,
337
+ "accuracy": 3
338
+ },
339
+ "structure": {
340
+ "organization": 3,
341
+ "formatting": 2,
342
+ "usability": 3
343
+ },
344
+ "content_score": 2.7,
345
+ "structure_score": 2.7,
346
+ "overall_score": 5.4
347
+ }
348
+ },
349
+ "output_quality": {
350
+ "A": {
351
+ "score": 9,
352
+ "strengths": ["Complete solution", "Well-formatted", "All fields present"],
353
+ "weaknesses": ["Minor style inconsistency in header"]
354
+ },
355
+ "B": {
356
+ "score": 5,
357
+ "strengths": ["Readable output", "Correct basic structure"],
358
+ "weaknesses": ["Missing date field", "Formatting inconsistencies", "Partial data extraction"]
359
+ }
360
+ },
361
+ "expectation_results": {
362
+ "A": {
363
+ "passed": 4,
364
+ "total": 5,
365
+ "pass_rate": 0.80,
366
+ "details": [
367
+ {"text": "Output includes name", "passed": true}
368
+ ]
369
+ },
370
+ "B": {
371
+ "passed": 3,
372
+ "total": 5,
373
+ "pass_rate": 0.60,
374
+ "details": [
375
+ {"text": "Output includes name", "passed": true}
376
+ ]
377
+ }
378
+ }
379
+ }
380
+ ```
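As an illustration only (not part of the published file), the rubric numbers in the example are consistent with one simple rule: each category score is the mean of its criteria rounded to one decimal, and `overall_score` is the sum of the two category scores. The schema does not state this formula, so treat it as an assumption inferred from the example values; `rubric_scores` is a hypothetical name.

```python
from statistics import mean


def rubric_scores(entry):
    """Derive category and overall scores from per-criterion ratings."""
    # Assumption: category score = mean of its criteria, one decimal place.
    content = round(mean(list(entry["content"].values())), 1)
    structure = round(mean(list(entry["structure"].values())), 1)
    return {
        "content_score": content,
        "structure_score": structure,
        # Assumption: overall score = content score + structure score.
        "overall_score": round(content + structure, 1),
    }


rubric_a = {
    "content": {"correctness": 5, "completeness": 5, "accuracy": 4},
    "structure": {"organization": 4, "formatting": 5, "usability": 4},
}
rubric_b = {
    "content": {"correctness": 3, "completeness": 2, "accuracy": 3},
    "structure": {"organization": 3, "formatting": 2, "usability": 3},
}

print(rubric_scores(rubric_a))
print(rubric_scores(rubric_b))
```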
+
+ ---
+
+ ## analysis.json
+
+ Output from post-hoc analyzer. Located at `<grading-dir>/analysis.json`.
+
+ ```json
+ {
+   "comparison_summary": {
+     "winner": "A",
+     "winner_skill": "path/to/winner/skill",
+     "loser_skill": "path/to/loser/skill",
+     "comparator_reasoning": "Brief summary of why comparator chose winner"
+   },
+   "winner_strengths": [
+     "Clear step-by-step instructions for handling multi-page documents",
+     "Included validation script that caught formatting errors"
+   ],
+   "loser_weaknesses": [
+     "Vague instruction 'process the document appropriately' led to inconsistent behavior",
+     "No script for validation, agent had to improvise"
+   ],
+   "instruction_following": {
+     "winner": {
+       "score": 9,
+       "issues": ["Minor: skipped optional logging step"]
+     },
+     "loser": {
+       "score": 6,
+       "issues": [
+         "Did not use the skill's formatting template",
+         "Invented own approach instead of following step 3"
+       ]
+     }
+   },
+   "improvement_suggestions": [
+     {
+       "priority": "high",
+       "category": "instructions",
+       "suggestion": "Replace 'process the document appropriately' with explicit steps",
+       "expected_impact": "Would eliminate ambiguity that caused inconsistent behavior"
+     }
+   ],
+   "transcript_insights": {
+     "winner_execution_pattern": "Read skill -> Followed 5-step process -> Used validation script",
+     "loser_execution_pattern": "Read skill -> Unclear on approach -> Tried 3 different methods"
+   }
+ }
+ ```
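A consumer of `analysis.json` will often want the suggestions ordered by priority and a quick measure of how far apart the two skills scored. The Python sketch below shows one way to do that. It assumes a priority vocabulary of `high`, `medium`, `low` (only `high` appears in the example above), and the function names `ranked_suggestions` and `score_gap` are hypothetical.

```python
import json

# Assumed priority vocabulary; only "high" is shown in the example schema.
PRIORITY_ORDER = {"high": 0, "medium": 1, "low": 2}


def ranked_suggestions(analysis: dict) -> list:
    """Return improvement suggestions, highest priority first.

    Suggestions with an unknown or missing priority sort last.
    """
    suggestions = analysis.get("improvement_suggestions", [])
    return sorted(
        suggestions,
        key=lambda s: PRIORITY_ORDER.get(s.get("priority"), len(PRIORITY_ORDER)),
    )


def score_gap(analysis: dict) -> int:
    """Winner's instruction-following score minus the loser's."""
    following = analysis["instruction_following"]
    return following["winner"]["score"] - following["loser"]["score"]


# Trimmed-down document using the field names from the example above.
sample = json.loads("""
{
  "instruction_following": {
    "winner": {"score": 9, "issues": []},
    "loser": {"score": 6, "issues": []}
  },
  "improvement_suggestions": [
    {"priority": "low", "category": "examples", "suggestion": "Add one more example"},
    {"priority": "high", "category": "instructions", "suggestion": "Replace vague step with explicit steps"}
  ]
}
""")

for item in ranked_suggestions(sample):
    print(item["priority"], "-", item["suggestion"])
print("score gap:", score_gap(sample))  # score gap: 3
```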
@@ -2,84 +2,76 @@
  ---
  inclusion: manual
  name: spec-miner
- description: Use when understanding legacy or undocumented systems, creating documentation for existing code, or extracting specifications from implementations. Invoke for legacy analysis, code archaeology, undocumented features.
+ description: "Use when reverse-engineering legacy or undocumented systems into structured specifications with code-grounded evidence and EARS-format requirements."
  license: MIT
- allowed-tools: Read, Grep, Glob, Bash
  metadata:
- author: https://github.com/Jeffallan
- version: "1.0.0"
- domain: workflow
- triggers: reverse engineer, legacy code, code analysis, undocumented, understand codebase, existing system
- role: specialist
- scope: review
- output-format: document
- related-skills: feature-forge, fullstack-guardian, architecture-designer
+ author: cubis-foundry
+ version: "1.0"
+ compatibility: Claude Code, Codex, GitHub Copilot
  ---

  # Spec Miner

- Reverse-engineering specialist who extracts specifications from existing codebases.
+ ## Purpose

- ## Role Definition
+ Use when reverse-engineering legacy or undocumented systems into structured specifications with code-grounded evidence and EARS-format requirements.

- You are a senior software archaeologist with 10+ years of experience. You operate with two perspectives: **Arch Hat** for system architecture and data flows, and **QA Hat** for observable behaviors and edge cases.
+ ## When to Use

- ## When to Use This Skill
+ - Understanding legacy or undocumented systems by extracting behavior from code.
+ - Creating documentation for existing codebases that lack specifications.
+ - Onboarding onto unfamiliar projects by mapping structure, data flows, and business logic.
+ - Planning enhancements or migrations that require understanding current behavior first.
+ - Extracting implicit requirements from implementations for formal specification.

- - Understanding legacy or undocumented systems
- - Creating documentation for existing code
- - Onboarding to a new codebase
- - Planning enhancements to existing features
- - Extracting requirements from implementation
+ ## Instructions

- ## Core Workflow
+ 1. Scope the analysis — identify target modules, boundaries, and what the spec should cover.
+ 2. Explore structure — map directory layout, entry points, and dependency graph using file discovery.
+ 3. Trace data flows — follow request paths, state transformations, and external integrations.
+ 4. Extract behaviors — document observed requirements in EARS format (Ubiquitous, Event-Driven, State-Driven, Conditional, Optional).
+ 5. Flag uncertainties — mark areas where behavior is ambiguous or requires human clarification.
+ 6. Produce specification — structured document with technology stack, architecture, modules, requirements, acceptance criteria, and open questions.

- 1. **Scope** - Identify analysis boundaries (full system or specific feature)
- 2. **Explore** - Map structure using Glob, Grep, Read tools
- 3. **Trace** - Follow data flows and request paths
- 4. **Document** - Write observed requirements in EARS format
- 5. **Flag** - Mark areas needing clarification
+ ### Baseline standards

- ## Reference Guide
+ - Ground every finding in code evidence with file paths and line references.
+ - Distinguish facts (observed in code) from inferences (reasonable assumptions).
+ - Operate with dual mindset: Architecture Hat (structure, data flows) and QA Hat (behaviors, edge cases).
+ - Document security, authentication, and error handling patterns explicitly.
+ - Include external integrations, configuration, and environment dependencies.

- Load detailed guidance based on context:
+ ### Constraints

- | Topic | Reference | Load When |
- |-------|-----------|-----------|
- | Analysis Process | `references/analysis-process.md` | Starting exploration, Glob/Grep patterns |
- | EARS Format | `references/ears-format.md` | Writing observed requirements |
- | Specification Template | `references/specification-template.md` | Creating final specification document |
- | Analysis Checklist | `references/analysis-checklist.md` | Ensuring thorough analysis |
+ - Never assume behavior without code evidence.
+ - Never skip security or error handling paths during analysis.
+ - Never generate a specification without thorough codebase exploration.
+ - Always include code locations for every documented behavior.
+ - Always mark uncertainties and questions separately from confirmed findings.

- ## Constraints
+ ## Output Format

- ### MUST DO
- - Ground all observations in actual code evidence
- - Use Read, Grep, Glob extensively to explore
- - Distinguish between observed facts and inferences
- - Document uncertainties in dedicated section
- - Include code locations for each observation
+ Save as `specs/{project_name}_reverse_spec.md`:

- ### MUST NOT DO
- - Make assumptions without code evidence
- - Skip security pattern analysis
- - Ignore error handling patterns
- - Generate spec without thorough exploration
+ 1. Technology stack and architecture overview
+ 2. Module and directory structure
+ 3. Observed requirements in EARS format
+ 4. Non-functional observations (performance, security, scalability)
+ 5. Inferred acceptance criteria
+ 6. Uncertainties and questions
+ 7. Recommendations for improvement

- ## Output Templates
+ ## References

- Save specification as: `specs/{project_name}_reverse_spec.md`
+ No additional reference files.

- Include:
- 1. Technology stack and architecture
- 2. Module/directory structure
- 3. Observed requirements (EARS format)
- 4. Non-functional observations
- 5. Inferred acceptance criteria
- 6. Uncertainties and questions
- 7. Recommendations
+ ## Scripts
+
+ No helper scripts are required for this skill right now.

- ## Knowledge Reference
+ ## Examples

- Code archaeology, static analysis, design patterns, architectural patterns, EARS syntax, API documentation inference
+ - "Reverse-engineer the authentication flow in this legacy codebase"
+ - "Create a specification document for this undocumented API service"
+ - "Map the data flows and business logic in this module for onboarding"
  ````
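The Instructions in the new spec-miner file name five EARS categories (Ubiquitous, Event-Driven, State-Driven, Conditional, Optional) and require a code location for every documented behavior. The Python sketch below shows one way to render such a requirement with its evidence attached. The sentence templates follow the commonly published EARS keywords (`When`, `While`, `If ... then`, `Where`); mapping the skill's "Conditional" category to the `If ... then` form is an assumption, and the class name, field names, and the file path in the usage example are hypothetical.

```python
from dataclasses import dataclass

# EARS sentence templates keyed by the category names used in the skill.
# Assumption: "conditional" corresponds to the EARS "If ..., then ..." form.
TEMPLATES = {
    "ubiquitous": "The {system} shall {response}.",
    "event-driven": "When {condition}, the {system} shall {response}.",
    "state-driven": "While {condition}, the {system} shall {response}.",
    "conditional": "If {condition}, then the {system} shall {response}.",
    "optional": "Where {condition}, the {system} shall {response}.",
}


@dataclass
class ObservedRequirement:
    pattern: str         # one of the TEMPLATES keys
    system: str          # component the behavior was observed in
    response: str        # what the system does
    evidence: str        # "path:line" reference backing the observation
    condition: str = ""  # trigger, state, or feature; unused for ubiquitous

    def render(self) -> str:
        """Format the requirement as an EARS sentence with its code evidence."""
        if self.pattern not in TEMPLATES:
            raise ValueError(f"unknown EARS pattern: {self.pattern}")
        if self.pattern != "ubiquitous" and not self.condition:
            raise ValueError(f"pattern '{self.pattern}' needs a condition")
        if not self.evidence:
            raise ValueError("every requirement needs a code location")
        sentence = TEMPLATES[self.pattern].format(
            system=self.system, response=self.response, condition=self.condition
        )
        return f"{sentence} (evidence: {self.evidence})"


# Hypothetical observation; the path and line number are illustrative only.
req = ObservedRequirement(
    pattern="event-driven",
    system="auth service",
    response="lock the account",
    evidence="src/auth/login.py:42",
    condition="a user fails login five times",
)
print(req.render())
# When a user fails login five times, the auth service shall lock the account. (evidence: src/auth/login.py:42)
```

Rejecting a requirement that has no evidence string mirrors the skill's constraint that behavior is never documented without a code location.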