ancoder-skill-cli 0.6.4 → 0.7.1
package/README.md
CHANGED
@@ -84,6 +84,121 @@ my-skill/
Users who `npm install -g ancoder-skill-cli` get a fully bundled package. No extra binary download is required during install.

## Test-Driven Skill Development (100:10:1 Architecture)

skill-cli adopts a test-driven approach to skill development, inspired by [oh-my-claudecode](https://github.com/Yeachan-Heo/oh-my-claudecode)'s multi-agent orchestration patterns. The core principle: **invest the majority of compute in building robust test skills, not the skill itself.**

### Time Allocation: 100:10:1

When creating a skill for a task, the system simultaneously creates a **main skill** and a **test skill**:

| Phase | Time Share | Purpose |
|-------|-----------|---------|
| Test skill development | 90% (100 units) | Build an automated evaluator that compares expected vs actual output, locating specific differences |
| Main skill development | 9% (10 units) | Implement the actual skill, guided by test skill feedback |
| Execution & verification | 1% (1 unit) | Final end-to-end smoke test |
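
The percentages in the table are rounded forms of the 100:10:1 ratio. As a quick check, the exact shares can be computed as below. This is only an illustrative sketch: `RATIO`, `time_shares`, and `split_budget` are hypothetical names, not part of skill-cli.

```python
from fractions import Fraction

# Ratio from the table above: test skill : main skill : verification.
RATIO = {"test_skill": 100, "main_skill": 10, "verification": 1}


def time_shares(ratio):
    """Return each phase's exact share of the total as a Fraction."""
    total = sum(ratio.values())
    return {phase: Fraction(units, total) for phase, units in ratio.items()}


def split_budget(total_minutes, ratio):
    """Split a total budget in minutes across phases (hypothetical helper)."""
    shares = time_shares(ratio)
    return {phase: float(share * total_minutes) for phase, share in shares.items()}


shares = time_shares(RATIO)
for phase, share in shares.items():
    print(f"{phase}: {float(share):.1%}")
# test_skill: 90.1%
# main_skill: 9.0%
# verification: 0.9%
```

With a 111-minute budget, `split_budget(111, RATIO)` yields 100, 10, and 1 minutes, matching the unit counts in the table.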

### Architecture

```text
Phase 1: Test Skill Development (90% compute)
  generate structured acceptance criteria
    -> N planners generate test strategies in parallel
    -> critic reviews + eliminates weak strategies
    -> N executors implement test skills in parallel
    -> golden test evaluation (tournament selection)
    -> repeat until precision threshold met
    -> best test skill selected

Phase 2: Main Skill Development (9% compute)
  generate main skill
    -> test skill verifies (independent executor)
    -> structured diff feedback injected into next prompt
    -> repeat until test skill passes
    -> main skill complete

Phase 3: Final Verification (1% compute)
  end-to-end smoke test
```
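
The Phase 1 control flow can be sketched roughly as follows. This is a minimal sketch, not skill-cli's implementation: the four callables (`plan`, `critique`, `implement`, `score`) are hypothetical stand-ins for the planner, critic, executor, and golden-test stages. The `max_rounds` cap is an added assumption so that the loop always terminates, and the executors run sequentially here rather than in parallel.

```python
from typing import Callable, List, Tuple


def develop_test_skill(
    plan: Callable[[int], List[str]],            # hypothetical: returns up to N strategies
    critique: Callable[[List[str]], List[str]],  # hypothetical: drops weak strategies
    implement: Callable[[str], str],             # hypothetical: strategy -> test skill id
    score: Callable[[str], float],               # hypothetical: golden-test precision in [0, 1]
    n: int = 4,
    threshold: float = 0.95,
    max_rounds: int = 10,                        # assumption: safety cap, not in the README
) -> Tuple[str, float]:
    """Repeat plan -> critique -> implement -> score until the threshold is met."""
    best_skill, best_score = "", -1.0
    for _ in range(max_rounds):
        strategies = critique(plan(n))
        # Sequential stand-in for the N parallel executors.
        candidates = [implement(strategy) for strategy in strategies]
        for candidate in candidates:
            candidate_score = score(candidate)
            if candidate_score > best_score:
                best_skill, best_score = candidate, candidate_score
        if best_score >= threshold:
            break
    return best_skill, best_score


# Deterministic stubs, for illustration only.
demo_scores = {"skill:exact-match": 0.62, "skill:table-aware": 0.97}
best = develop_test_skill(
    plan=lambda n: ["exact-match", "table-aware", "too-vague"][:n],
    critique=lambda strategies: [s for s in strategies if s != "too-vague"],
    implement=lambda strategy: f"skill:{strategy}",
    score=lambda skill: demo_scores[skill],
    n=3,
)
print(best)  # ('skill:table-aware', 0.97)
```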

### Key Design Principles

**1. Separation of Author and Reviewer**

The agent that generates the main skill and the agent that runs the test skill operate in separate contexts. This prevents self-approval bias. The verify phase spawns an independent executor to run the test skill, ensuring honest evaluation (borrowed from OMC's verifier lane pattern).

**2. Structured Diff Feedback**

Test skills output structured diff reports instead of simple pass/fail:

```yaml
diffs:
  - location: "page 3, paragraph 2"
    type: "content_loss"
    severity: "critical"
    expected: "table with 3 columns and 5 rows"
    actual: "table missing entirely"
  - location: "page 5, heading"
    type: "format_drift"
    severity: "warning"
    expected: "## Second-level heading"
    actual: "### Third-level heading"
```

This structured feedback is injected back into the main skill's improvement loop, enabling targeted fixes rather than blind retries.
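
One way such a report might be turned into prompt feedback is sketched below. The field names mirror the YAML example above. The `Diff` class, the `render_feedback` helper, the severity ordering, and the prompt wording are all assumptions made for illustration, not skill-cli's actual format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Diff:
    """One entry of a structured diff report (fields as in the YAML example)."""
    location: str
    type: str
    severity: str
    expected: str
    actual: str


# Assumption: only the two severities shown in the example; unknown ones sort last.
SEVERITY_ORDER = {"critical": 0, "warning": 1}


def render_feedback(diffs: List[Diff]) -> str:
    """Render diffs as a bullet list, most severe first (hypothetical wording)."""
    ordered = sorted(
        diffs, key=lambda d: SEVERITY_ORDER.get(d.severity, len(SEVERITY_ORDER))
    )
    lines = ["Fix the following differences, most severe first:"]
    for d in ordered:
        lines.append(
            f"- [{d.severity}] {d.type} at {d.location}: "
            f"expected {d.expected!r}, got {d.actual!r}"
        )
    return "\n".join(lines)


feedback = render_feedback([
    Diff("page 5, heading", "format_drift", "warning",
         "## Second-level heading", "### Third-level heading"),
    Diff("page 3, paragraph 2", "content_loss", "critical",
         "table with 3 columns and 5 rows", "table missing entirely"),
])
print(feedback)
```

Because each entry carries a location and an expected/actual pair, the next prompt can name the exact spot to fix instead of only reporting that the run failed.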

**3. QA Cycling with Early Exit**

Borrowed from OMC's UltraQA pattern:
- Test skill finds issues -> structured diagnosis -> main skill fixes -> retest -> loop
- Same error appearing 3 times triggers early exit (avoids infinite compute burn)
- Maximum 5 QA cycles per iteration
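
The two stopping rules above can be sketched as a small loop. This is a sketch under stated assumptions: `run_test` and `fix` are hypothetical callables, and an error is identified by a plain string key (for example type plus location), since the README does not define how "same error" is detected.

```python
from collections import Counter
from typing import Callable, List, Tuple

MAX_CYCLES = 5        # "Maximum 5 QA cycles per iteration"
SAME_ERROR_LIMIT = 3  # "Same error appearing 3 times triggers early exit"


def qa_cycle(
    run_test: Callable[[], List[str]],   # hypothetical: returns error keys, empty if passing
    fix: Callable[[List[str]], None],    # hypothetical: asks the main skill to fix them
) -> Tuple[str, int]:
    """Return (outcome, cycles_used); outcome is passed, early_exit, or max_cycles."""
    seen = Counter()
    for cycle in range(1, MAX_CYCLES + 1):
        errors = run_test()
        if not errors:
            return "passed", cycle
        # Count each distinct error once per cycle.
        seen.update(set(errors))
        if max(seen.values()) >= SAME_ERROR_LIMIT:
            return "early_exit", cycle
        if cycle < MAX_CYCLES:
            fix(errors)  # no point fixing after the last allowed retest
    return "max_cycles", MAX_CYCLES


# An error that never gets fixed stops the loop on its third appearance.
stubborn = qa_cycle(lambda: ["content_loss@page 3"], lambda errors: None)
print(stubborn)  # ('early_exit', 3)

# Errors that are fixed one by one let the loop finish normally.
pending = [["content_loss@page 3"], ["format_drift@page 5"], []]
resolved = qa_cycle(lambda: pending.pop(0), lambda errors: None)
print(resolved)  # ('passed', 3)
```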

**4. Tournament Selection for Test Skills**

During the 90% test skill development phase, multiple test strategies are generated in parallel and evaluated against golden tests (known-correct input/output pairs). The strategy with the highest detection precision wins, similar to OMC's self-improve tournament selection.
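
The README does not spell out how detection precision is computed. The sketch below uses one plausible reading: each golden case lists the defects it is known to contain (an empty set for a known-correct pair), and precision is the fraction of reported defects that are real. All names here are hypothetical. A strategy that reports nothing is scored 0.0 so that it cannot win by staying silent, which is also an assumption.

```python
from typing import Dict, List, Set

GoldenCase = Set[str]  # hypothetical: defect labels a golden case is known to contain


def detection_precision(reported: List[Set[str]], golden: List[GoldenCase]) -> float:
    """True positives divided by all reported defects, summed over cases."""
    true_positives = sum(len(r & g) for r, g in zip(reported, golden))
    total_reported = sum(len(r) for r in reported)
    return true_positives / total_reported if total_reported else 0.0


def select_winner(reports: Dict[str, List[Set[str]]], golden: List[GoldenCase]) -> str:
    """Pick the strategy whose reports have the highest precision."""
    return max(reports, key=lambda name: detection_precision(reports[name], golden))


golden = [{"content_loss"}, set(), {"format_drift"}]
reports = {
    # Reports exactly the known defects: 2 of 2 correct.
    "strict": [{"content_loss"}, set(), {"format_drift"}],
    # Adds two false alarms: 2 of 4 correct.
    "noisy": [{"content_loss", "encoding"}, {"format_drift"}, {"format_drift"}],
}
winner = select_winner(reports, golden)
print(winner)  # strict
```

Precision alone rewards cautious strategies, so a real evaluator would probably also track recall. The README mentions only precision, which is why the sketch stops there.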

**5. PRD-Driven Acceptance Criteria**

Test skills define concrete, testable acceptance criteria (not vague "implementation is complete"):

```text
Bad:  "PDF conversion works correctly"
Good: "All tables with merged cells are preserved as HTML <table> blocks
       with correct colspan/rowspan attributes"
```
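
The "Good" criterion is testable because it can be written as a mechanical check. A minimal sketch using only Python's standard `html.parser` module follows. `SpanCollector` and `merged_cells_preserved` are hypothetical names, and the expected spans are assumed to come from the source PDF by some other means.

```python
from html.parser import HTMLParser
from typing import Iterable, List, Tuple


class SpanCollector(HTMLParser):
    """Collect (colspan, rowspan) for every <td>/<th> cell, in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.spans: List[Tuple[int, int]] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("td", "th"):
            attr = dict(attrs)
            # A missing or empty attribute means a span of 1.
            self.spans.append(
                (int(attr.get("colspan") or 1), int(attr.get("rowspan") or 1))
            )


def merged_cells_preserved(markdown: str, expected: Iterable[Tuple[int, int]]) -> bool:
    """True if the merged cells in the output match the expected spans exactly."""
    collector = SpanCollector()
    collector.feed(markdown)
    merged = [span for span in collector.spans if span != (1, 1)]
    return merged == list(expected)


good = (
    "# Report\n\n"
    '<table><tr><td colspan="2">Total</td></tr>'
    "<tr><td>a</td><td>b</td></tr></table>\n"
)
bad = "# Report\n\n| Total |\n|---|\n| a | b |\n"

print(merged_cells_preserved(good, [(2, 1)]))  # True
print(merged_cells_preserved(bad, [(2, 1)]))   # False
```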

### Example: PDF-to-Markdown Skill

For a PDF-to-Markdown conversion skill:

- **Test skill** (100 min): Compares original PDF content with generated Markdown, detecting content loss (missing paragraphs, tables, images), format drift (heading levels, list styles), and encoding issues. Outputs structured diffs with page/paragraph-level location info.
- **Main skill** (10 min): Implements PDF parsing and Markdown generation, iteratively improved by test skill feedback.
- **Verification** (1 min): End-to-end smoke test on fixture PDFs.

### `skill_eval` Check Type

The verify system supports a `skill_eval` check type that invokes a test skill as a verification oracle:

```yaml
checks:
  - id: quality-check
    type: skill_eval
    skill: pdf-to-md-test
    config:
      threshold: 0.95
      output_format: structured_diff
```
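
How the verify system interprets this check is not documented here. The sketch below assumes the test skill returns a score in [0, 1] and that the check passes when the score reaches `threshold`. The `invoke_skill` callable and the report shape (`score`, `diffs`) are hypothetical. The check is written as a Python dict that mirrors the YAML, to avoid depending on a YAML parser.

```python
from typing import Any, Callable, Dict

# Mirrors the YAML check above.
CHECK = {
    "id": "quality-check",
    "type": "skill_eval",
    "skill": "pdf-to-md-test",
    "config": {"threshold": 0.95, "output_format": "structured_diff"},
}


def run_skill_eval(
    check: Dict[str, Any],
    invoke_skill: Callable[[str, str], Dict[str, Any]],  # hypothetical test-skill runner
) -> Dict[str, Any]:
    """Run one skill_eval check; assumes pass means score >= threshold."""
    if check["type"] != "skill_eval":
        raise ValueError(f"unsupported check type: {check['type']}")
    config = check["config"]
    report = invoke_skill(check["skill"], config["output_format"])
    return {
        "id": check["id"],
        "passed": report["score"] >= config["threshold"],
        "diffs": report["diffs"],
    }


def fake_invoke(skill: str, output_format: str) -> Dict[str, Any]:
    """Stub test skill for illustration: fixed score and one diff."""
    return {
        "score": 0.91,
        "diffs": [{"location": "page 3, paragraph 2", "type": "content_loss"}],
    }


result = run_skill_eval(CHECK, fake_invoke)
print(result["passed"])  # False, since 0.91 is below the 0.95 threshold
```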

### Verify Phase: Independent Executor

During the loop's verify phase, a separate Claude executor is spawned to run the test skill. This executor:
- Has no shared context with the main skill's executor
- Produces an objective evaluation report
- Returns structured diff feedback that feeds into the next iteration

This mirrors OMC's principle: "Keep authoring and review as separate passes."

## License

MIT

Binary file
Binary file
Binary file
Binary file
Binary file
package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
 "name": "ancoder-skill-cli",
-"version": "0.
+"version": "0.7.1",
 "description": "CLI for managing everything-claude-code (ECC) components — agents, skills, commands, rules, hooks, MCP configs. Single binary, all assets embedded.",
 "bin": {
 "skill-cli": "bin/skill-cli.js"