prizmkit 1.0.25 → 1.0.28
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/bundled/VERSION.json +3 -3
- package/bundled/adapters/claude/command-adapter.js +4 -5
- package/bundled/dev-pipeline/README.md +90 -0
- package/bundled/dev-pipeline/retry-bug.sh +7 -0
- package/bundled/dev-pipeline/retry-feature.sh +7 -0
- package/bundled/dev-pipeline/run-bugfix.sh +7 -0
- package/bundled/dev-pipeline/run.sh +71 -0
- package/bundled/dev-pipeline/scripts/init-pipeline.py +13 -2
- package/bundled/dev-pipeline/scripts/update-feature-status.py +24 -9
- package/bundled/skills/_metadata.json +1 -9
- package/bundled/skills/dev-pipeline-launcher/SKILL.md +30 -5
- package/bundled/skills/feature-workflow/SKILL.md +9 -2
- package/package.json +1 -1
- package/src/index.js +0 -1
- package/src/scaffold.js +18 -3
- package/bundled/skills/refactor-skill/SKILL.md +0 -371
package/bundled/adapters/claude/command-adapter.js
CHANGED
````diff
@@ -64,15 +64,14 @@ export function convertSkillToCommand(skillContent, skillName) {
     description: desc,
   };
 
-  // Rewrite ${SKILL_DIR} references to
+  // Rewrite ${SKILL_DIR} references to .claude/command-assets/<name>
+  // Assets are stored outside .claude/commands/ to prevent Claude Code from
+  // registering asset .md files as slash commands (e.g. /skillName:assets:file).
   let convertedBody = body;
 
-  // Replace ${SKILL_DIR} with a Claude Code compatible path
-  // In Claude Code, commands in a directory can reference sibling files
-  // Use project-root-relative paths as fallback
   convertedBody = convertedBody.replace(
     /\$\{SKILL_DIR\}/g,
-    `.claude/
+    `.claude/command-assets/${skillName}`
   );
 
   // Replace "invoke the X skill" or "prizmkit.X" patterns with /X slash command
````
package/bundled/dev-pipeline/README.md
CHANGED

````diff
@@ -33,6 +33,7 @@ python3 dev-pipeline/scripts/init-pipeline.py \
 |---------|-------------|
 | `./run.sh run [feature-list.json] [options]` | Start or resume the pipeline. Processes features sequentially by dependency order. |
 | `./run.sh status [feature-list.json]` | Display current pipeline status: completed, pending, blocked, failed features. |
+| `./run.sh test-cli` | Test AI CLI detection: show detected CLI, version, platform, and query the AI model identity. |
 | `./run.sh reset` | Clear all runtime state in `state/`. Pipeline starts fresh on next `run`. |
 | `./run.sh help` | Show usage help. |
 | `./retry-feature.sh <feature-id> [feature-list.json]` | Retry a single failed feature. Runs one session then exits. |
@@ -97,7 +98,9 @@ What is always reset (with or without `--clean`):
 | `MAX_RETRIES` | `3` | Maximum retry attempts per feature before marking as failed. |
 | `SESSION_TIMEOUT` | `0` (no limit) | Timeout in seconds per AI CLI session. 0 = no timeout. |
 | `AI_CLI` | auto-detect | AI CLI command name. Auto-detects `cbc` or `claude`. Set to override. |
+| `MODEL` | (none) | AI model ID for the session. Passed as `--model` to the CLI. See [Model Selection](#model-selection). |
 | `CODEBUDDY_CLI` | (deprecated) | Legacy alias for `AI_CLI`. Prefer `AI_CLI`. |
+| `VERBOSE` | `0` | Set to `1` to enable `--verbose` on AI CLI (shows subagent output). |
 | `HEARTBEAT_INTERVAL` | `30` | Seconds between heartbeat log output while a session is running. |
 | `HEARTBEAT_STALE_THRESHOLD` | `600` | Seconds before a session is considered stale/stuck. |
 | `LOG_CLEANUP_ENABLED` | `1` | Run log cleanup before pipeline execution (`1`=enabled, `0`=disabled). |
@@ -116,6 +119,93 @@ SESSION_TIMEOUT=7200 ./dev-pipeline/run.sh run feature-list.json
 LOG_RETENTION_DAYS=7 LOG_MAX_TOTAL_MB=512 ./dev-pipeline/run.sh run feature-list.json
 ```
 
+### AI CLI Configuration
+
+The pipeline auto-detects which AI CLI to use. Detection priority:
+
+1. `AI_CLI` environment variable (highest)
+2. `.prizmkit/config.json` → `ai_cli` field
+3. `CODEBUDDY_CLI` environment variable (legacy)
+4. Auto-detect: `cbc` in PATH → `claude` in PATH (lowest)
+
+To permanently configure a project to use a specific CLI, create `.prizmkit/config.json`:
+
+```json
+{
+  "ai_cli": "claude-internal",
+  "platform": "claude"
+}
+```
+
+Or override per-invocation:
+
+```bash
+AI_CLI=claude-internal ./dev-pipeline/run.sh run feature-list.json
+```
+
+### Model Selection
+
+Use the `MODEL` environment variable to specify which AI model to use. The value is passed as `--model <id>` to the CLI.
+
+```bash
+# Run pipeline with Sonnet (faster, cheaper)
+MODEL=claude-sonnet-4.6 ./dev-pipeline/run.sh run feature-list.json
+
+# Run pipeline with Opus (most capable)
+MODEL=claude-opus-4.6 ./dev-pipeline/run.sh run feature-list.json
+
+# Retry a feature with a specific model
+MODEL=claude-opus-4.6 ./dev-pipeline/retry-feature.sh F-007
+
+# Test which model the CLI is using
+MODEL=claude-sonnet-4.6 ./dev-pipeline/run.sh test-cli
+```
+
+Common model IDs (for `cbc`):
+
+| Model ID | Description |
+|----------|-------------|
+| `claude-opus-4.6` | Most capable, slower, higher cost |
+| `claude-sonnet-4.6` | Balanced speed/capability (recommended for pipeline) |
+| `claude-haiku-4.5` | Fastest, cheapest, less capable |
+
+> **Note**: `--model` support depends on the CLI. `cbc` fully supports it. `claude-internal` does not support `--model` in headless mode (only interactive `/model` command). If `MODEL` is set but the CLI doesn't support it, the flag is silently ignored.
+
+### Testing AI CLI (`test-cli`)
+
+Use `test-cli` to verify which CLI, version, and model the pipeline will use:
+
+```bash
+# Basic test — uses auto-detected CLI and default model
+./dev-pipeline/run.sh test-cli
+
+# Test with a specific model
+MODEL=claude-sonnet-4.6 ./dev-pipeline/run.sh test-cli
+
+# Test with a specific CLI
+AI_CLI=cbc ./dev-pipeline/run.sh test-cli
+```
+
+Example output:
+
+```
+============================================
+  Dev-Pipeline AI CLI Test
+============================================
+
+  Detected CLI:     cbc
+  Platform:         codebuddy
+  CLI Version:      2.62.1
+
+  Querying AI model (headless mode)...
+
+  AI Response: I'm CodeBuddy, running Claude Opus 4.6
+
+============================================
+```
+
+The test sends a one-line prompt asking the AI to identify itself, with a 30-second timeout. If the CLI requires authentication or is unavailable, it shows a fallback message.
+
 ## How It Works
 
 ### Execution Flow
````
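The four-step detection priority added to the README above can be sketched as a small shell function. This is an illustrative helper only, not the package's actual detection code (which lives in `lib/common.sh`); the function name and the `sed`-based JSON field extraction are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the documented priority chain; the real logic
# lives in lib/common.sh and may differ. The sed call is a simplified
# stand-in for proper JSON parsing of .prizmkit/config.json.
detect_ai_cli() {
  # 1. AI_CLI environment variable wins outright
  if [[ -n "${AI_CLI:-}" ]]; then echo "$AI_CLI"; return; fi
  # 2. .prizmkit/config.json -> "ai_cli" field
  if [[ -f .prizmkit/config.json ]]; then
    local from_cfg
    from_cfg=$(sed -n 's/.*"ai_cli"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' \
      .prizmkit/config.json | head -1)
    [[ -n "$from_cfg" ]] && { echo "$from_cfg"; return; }
  fi
  # 3. Legacy CODEBUDDY_CLI alias
  if [[ -n "${CODEBUDDY_CLI:-}" ]]; then echo "$CODEBUDDY_CLI"; return; fi
  # 4. PATH scan: cbc first, then claude
  if command -v cbc >/dev/null 2>&1; then echo "cbc"; return; fi
  if command -v claude >/dev/null 2>&1; then echo "claude"; return; fi
  return 1
}
```

Each step short-circuits, so a higher-priority source always shadows a lower one.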
````diff
@@ -227,6 +227,11 @@ if [[ "$USE_STREAM_JSON" == "true" ]]; then
 fi
 
 # Spawn AI CLI session
+MODEL_FLAG=""
+if [[ -n "${MODEL:-}" ]]; then
+  MODEL_FLAG="--model $MODEL"
+fi
+
 case "$CLI_CMD" in
   *claude*)
     "$CLI_CMD" \
@@ -234,6 +239,7 @@ case "$CLI_CMD" in
       -p "$(cat "$BOOTSTRAP_PROMPT")" \
      --yes \
      $STREAM_JSON_FLAG \
+      $MODEL_FLAG \
      > "$SESSION_LOG" 2>&1 &
    ;;
  *)
@@ -241,6 +247,7 @@ case "$CLI_CMD" in
      --print \
      -y \
      $STREAM_JSON_FLAG \
+      $MODEL_FLAG \
      < "$BOOTSTRAP_PROMPT" \
      > "$SESSION_LOG" 2>&1 &
    ;;
````
````diff
@@ -228,6 +228,11 @@ python3 "$SCRIPTS_DIR/update-feature-status.py" \
   --action start >/dev/null 2>&1 || true
 
 # Spawn AI CLI session
+MODEL_FLAG=""
+if [[ -n "${MODEL:-}" ]]; then
+  MODEL_FLAG="--model $MODEL"
+fi
+
 case "$CLI_CMD" in
   *claude*)
     "$CLI_CMD" \
@@ -235,6 +240,7 @@ case "$CLI_CMD" in
       -p "$(cat "$BOOTSTRAP_PROMPT")" \
      --yes \
      $STREAM_JSON_FLAG \
+      $MODEL_FLAG \
      > "$SESSION_LOG" 2>&1 &
    ;;
  *)
@@ -242,6 +248,7 @@ case "$CLI_CMD" in
      --print \
      -y \
      $STREAM_JSON_FLAG \
+      $MODEL_FLAG \
      < "$BOOTSTRAP_PROMPT" \
      > "$SESSION_LOG" 2>&1 &
    ;;
````
````diff
@@ -79,6 +79,11 @@ spawn_and_wait_session() {
     stream_json_flag="--output-format stream-json"
   fi
 
+  local model_flag=""
+  if [[ -n "${MODEL:-}" ]]; then
+    model_flag="--model $MODEL"
+  fi
+
   case "$CLI_CMD" in
     *claude*)
       "$CLI_CMD" \
@@ -87,6 +92,7 @@ spawn_and_wait_session() {
         --yes \
        $verbose_flag \
        $stream_json_flag \
+        $model_flag \
        > "$session_log" 2>&1 &
      ;;
    *)
@@ -95,6 +101,7 @@ spawn_and_wait_session() {
        -y \
        $verbose_flag \
        $stream_json_flag \
+        $model_flag \
        < "$bootstrap_prompt" \
        > "$session_log" 2>&1 &
      ;;
````
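The unquoted `$MODEL_FLAG` / `$model_flag` expansions above rely on word splitting to turn `--model <id>` into two arguments, which is safe because model IDs contain no spaces. A defensive alternative (a sketch under that assumption, not the shipped code) collects optional flags in a bash array so that values with spaces would also survive:

```shell
#!/usr/bin/env bash
# Sketch only: build optional CLI flags in an array. Model IDs have no
# spaces, so the string-based MODEL_FLAG approach works equally well;
# the array form just removes the word-splitting caveat entirely.
build_cli_args() {
  local -a args=()
  if [[ -n "${MODEL:-}" ]]; then
    args+=(--model "$MODEL")
  fi
  if [[ "${VERBOSE:-0}" == "1" ]]; then
    args+=(--verbose)
  fi
  # One argument per line, exactly as the CLI would receive them
  printf '%s\n' "${args[@]}"
}

MODEL="claude-sonnet-4.6" build_cli_args
```

The array would then be spliced into the spawn command as `"${args[@]}"` instead of an unquoted string.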
package/bundled/dev-pipeline/run.sh
CHANGED

````diff
@@ -20,6 +20,7 @@ set -euo pipefail
 #   AI_CLI                     AI CLI command name (override; also readable from .prizmkit/config.json)
 #   CODEBUDDY_CLI              Legacy alias for AI_CLI (deprecated, use AI_CLI instead)
 #   PRIZMKIT_PLATFORM          Force platform: 'codebuddy' or 'claude' (auto-detected)
+#   MODEL                      AI model to use (e.g. claude-opus-4.6, claude-sonnet-4.6, claude-haiku-4.5)
 #   VERBOSE                    Set to 1 to enable --verbose on AI CLI (shows subagent output)
 #   HEARTBEAT_INTERVAL         Heartbeat log interval in seconds (default: 30)
 #   HEARTBEAT_STALE_THRESHOLD  Heartbeat stale threshold in seconds (default: 600)
@@ -41,6 +42,7 @@ LOG_CLEANUP_ENABLED=${LOG_CLEANUP_ENABLED:-1}
 LOG_RETENTION_DAYS=${LOG_RETENTION_DAYS:-14}
 LOG_MAX_TOTAL_MB=${LOG_MAX_TOTAL_MB:-1024}
 VERBOSE=${VERBOSE:-0}
+MODEL=${MODEL:-""}
 
 # Source shared common helpers (CLI/platform detection + logs + deps)
 source "$SCRIPT_DIR/lib/common.sh"
@@ -91,6 +93,11 @@ spawn_and_wait_session() {
     stream_json_flag="--output-format stream-json"
   fi
 
+  local model_flag=""
+  if [[ -n "$MODEL" ]]; then
+    model_flag="--model $MODEL"
+  fi
+
   case "$CLI_CMD" in
     *claude*)
       # Claude Code: prompt via -p argument, --yes for auto-accept
@@ -100,6 +107,7 @@ spawn_and_wait_session() {
         --yes \
        $verbose_flag \
        $stream_json_flag \
+        $model_flag \
        > "$session_log" 2>&1 &
      ;;
    *)
@@ -109,6 +117,7 @@ spawn_and_wait_session() {
        -y \
        $verbose_flag \
        $stream_json_flag \
+        $model_flag \
        < "$bootstrap_prompt" \
        > "$session_log" 2>&1 &
      ;;
@@ -790,6 +799,7 @@ show_help() {
   echo "  run [feature-list.json]        Run all features sequentially"
   echo "  run <feature-id> [options]     Run a single feature"
   echo "  status [feature-list.json]     Show pipeline status"
+  echo "  test-cli                       Test AI CLI: show detected CLI, version, and model"
   echo "  reset                          Clear all state and start fresh"
   echo "  help                           Show this help message"
   echo ""
@@ -805,6 +815,7 @@ show_help() {
   echo "  MAX_RETRIES                Max retries per feature (default: 3)"
   echo "  SESSION_TIMEOUT            Session timeout in seconds (default: 0 = no limit)"
   echo "  AI_CLI                     AI CLI command name (auto-detected: cbc or claude)"
+  echo "  MODEL                      AI model ID (e.g. claude-opus-4.6, claude-sonnet-4.6, claude-haiku-4.5)"
   echo "  HEARTBEAT_INTERVAL         Heartbeat log interval in seconds (default: 30)"
   echo "  HEARTBEAT_STALE_THRESHOLD  Heartbeat stale threshold in seconds (default: 600)"
   echo "  LOG_CLEANUP_ENABLED        Run log cleanup before execution (default: 1)"
@@ -820,6 +831,8 @@ show_help() {
   echo "  ./run.sh run F-007 --clean --mode standard   # Clean + run standard"
   echo "  ./run.sh status                              # Show pipeline status"
   echo "  MAX_RETRIES=5 SESSION_TIMEOUT=7200 ./run.sh run   # Custom config"
+  echo "  MODEL=claude-sonnet-4.6 ./run.sh run         # Use Sonnet model"
+  echo "  MODEL=claude-haiku-4.5 ./run.sh test-cli     # Test with Haiku"
 }
 
 case "${1:-run}" in
@@ -843,6 +856,64 @@ case "${1:-run}" in
       --state-dir "$STATE_DIR" \
       --action status
     ;;
+  test-cli)
+    echo ""
+    echo "============================================"
+    echo "  Dev-Pipeline AI CLI Test"
+    echo "============================================"
+    echo ""
+    echo "  Detected CLI:     $CLI_CMD"
+    echo "  Platform:         $PLATFORM"
+    if [[ -n "$MODEL" ]]; then
+      echo "  Requested Model:  $MODEL"
+    fi
+
+    # Get CLI version (first line only)
+    cli_version=$("$CLI_CMD" -v 2>&1 | head -1 || echo "unknown")
+    echo "  CLI Version:      $cli_version"
+    echo ""
+    echo "  Querying AI model (headless mode)..."
+
+    test_prompt="What AI assistant/platform are you and what model are you running? Reply in one line, e.g. \"I'm Claude Code, Claude Opus x.x\". No extra text."
+
+    local_model_flag=""
+    if [[ -n "$MODEL" ]]; then
+      local_model_flag="--model $MODEL"
+    fi
+
+    # Run headless query with 30s timeout (background + kill pattern for macOS)
+    tmpfile=$(mktemp)
+    (
+      unset CLAUDECODE
+      case "$CLI_CMD" in
+        *claude*)
+          "$CLI_CMD" --print -p "$test_prompt" --dangerously-skip-permissions --no-session-persistence $local_model_flag > "$tmpfile" 2>/dev/null
+          ;;
+        *)
+          echo "$test_prompt" | "$CLI_CMD" --print -y $local_model_flag > "$tmpfile" 2>/dev/null
+          ;;
+      esac
+    ) &
+    query_pid=$!
+    ( sleep 30 && kill "$query_pid" 2>/dev/null ) &
+    timer_pid=$!
+    wait "$query_pid" 2>/dev/null
+    kill "$timer_pid" 2>/dev/null
+    wait "$timer_pid" 2>/dev/null || true
+
+    model_reply=$(cat "$tmpfile" 2>/dev/null | head -3)
+    rm -f "$tmpfile"
+
+    if [[ -z "$model_reply" ]]; then
+      model_reply="(no response — CLI may require auth or is unavailable)"
+    fi
+
+    echo ""
+    echo "  AI Response: $model_reply"
+    echo ""
+    echo "============================================"
+    echo ""
+    ;;
   reset)
     log_warn "Resetting pipeline state..."
     rm -rf "$STATE_DIR"
````
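The background-plus-killer-timer dance in `test-cli` exists because macOS ships no `timeout(1)`. The pattern generalizes to any command; here is a minimal standalone sketch of it (the function name and wrapper are assumptions, not part of the package):

```shell
#!/usr/bin/env bash
# Portable timeout sketch: run "$@" in the background, arm a killer
# subshell, reap the command, then disarm and reap the timer. Works on
# macOS, which lacks GNU timeout(1).
run_with_timeout() {
  local secs="$1"; shift
  "$@" &
  local cmd_pid=$!
  # Timer output is discarded so callers can capture the command's stdout
  ( sleep "$secs" && kill "$cmd_pid" 2>/dev/null ) >/dev/null 2>&1 &
  local timer_pid=$!
  wait "$cmd_pid" 2>/dev/null
  local rc=$?
  kill "$timer_pid" 2>/dev/null
  wait "$timer_pid" 2>/dev/null || true
  return "$rc"
}

run_with_timeout 5 echo "finished well before the limit"
```

If the command is killed by the timer, `wait` reports the signal exit status (128+SIGTERM), which the function passes through to the caller.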
package/bundled/dev-pipeline/scripts/init-pipeline.py
CHANGED

````diff
@@ -19,6 +19,7 @@ from datetime import datetime, timezone
 
 EXPECTED_SCHEMA = "dev-pipeline-feature-list-v1"
 FEATURE_ID_PATTERN = re.compile(r"^F-\d{3}$")
+TERMINAL_STATUSES = {"completed", "failed", "skipped"}
 
 REQUIRED_FEATURE_FIELDS = [
     "id",
@@ -234,6 +235,12 @@ def create_state_directory(state_dir, feature_list_path, features):
     os.makedirs(abs_state_dir, exist_ok=True)
     os.makedirs(features_dir, exist_ok=True)
 
+    # Count features already in terminal status at init time
+    completed_count = sum(
+        1 for f in features
+        if isinstance(f, dict) and f.get("status") in TERMINAL_STATUSES
+    )
+
     # Write pipeline.json
     pipeline_state = {
         "run_id": run_id,
@@ -241,7 +248,7 @@ def create_state_directory(state_dir, feature_list_path, features):
         "feature_list_path": abs_feature_list_path,
         "created_at": now,
         "total_features": len(features),
-        "completed_features":
+        "completed_features": completed_count,
     }
     pipeline_path = os.path.join(abs_state_dir, "pipeline.json")
     with open(pipeline_path, "w", encoding="utf-8") as f:
@@ -260,9 +267,13 @@ def create_state_directory(state_dir, feature_list_path, features):
     sessions_dir = os.path.join(feature_dir, "sessions")
     os.makedirs(sessions_dir, exist_ok=True)
 
+    # Respect existing terminal status from feature-list.json
+    fl_status = feature.get("status", "pending")
+    init_status = fl_status if fl_status in TERMINAL_STATUSES else "pending"
+
     feature_status = {
         "feature_id": fid,
-        "status":
+        "status": init_status,
         "retry_count": 0,
         "max_retries": 3,
         "sessions": [],
````
package/bundled/dev-pipeline/scripts/update-feature-status.py
CHANGED

````diff
@@ -109,10 +109,15 @@ def now_iso():
     return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
 
 
-def load_feature_status(state_dir, feature_id):
+def load_feature_status(state_dir, feature_id, feature_list_status=None):
     """Load the status.json for a feature.
 
     If the file does not exist, return a default pending status.
+
+    If feature_list_status is a terminal status (completed, failed, skipped),
+    it overrides the status field from status.json. This makes feature-list.json
+    the single source of truth for terminal statuses, while all other fields
+    (retry_count, sessions, etc.) still come from status.json.
     """
     status_path = os.path.join(
         state_dir, "features", feature_id, "status.json"
@@ -134,7 +139,7 @@ def load_feature_status(state_dir, feature_id):
     if err:
         # If we can't read it, treat as pending
         now = now_iso()
-
+        data = {
             "feature_id": feature_id,
             "status": "pending",
             "retry_count": 0,
@@ -145,6 +150,9 @@ def load_feature_status(state_dir, feature_id):
             "created_at": now,
             "updated_at": now,
         }
+        # feature-list.json wins for terminal statuses
+        if feature_list_status in TERMINAL_STATUSES:
+            data["status"] = feature_list_status
         return data
 
 
@@ -303,7 +311,7 @@ def action_get_next(feature_list_data, state_dir):
         fid = feature.get("id")
         if not fid:
             continue
-        fs = load_feature_status(state_dir, fid)
+        fs = load_feature_status(state_dir, fid, feature.get("status"))
         status_map[fid] = fs.get("status", "pending")
         status_data_map[fid] = fs
 
@@ -574,7 +582,7 @@ def _format_duration(seconds):
     return "{}h{}m".format(h, m)
 
 
-def _estimate_remaining_time(features, state_dir, counts):
+def _estimate_remaining_time(features, state_dir, counts, feature_list_data=None):
     """Estimate remaining time from completed features' historical durations, weighted by complexity.
 
     Strategy:
@@ -588,6 +596,13 @@ def _estimate_remaining_time(features, state_dir, counts):
     # complexity weights (used for estimation when no historical data exists)
     COMPLEXITY_WEIGHT = {"low": 1.0, "medium": 2.0, "high": 4.0}
 
+    # Build feature-list status map for terminal status override
+    fl_status_map = {}
+    if feature_list_data:
+        for f in feature_list_data.get("features", []):
+            if isinstance(f, dict) and f.get("id"):
+                fl_status_map[f["id"]] = f.get("status")
+
     # Collect completed feature durations grouped by complexity
     duration_by_complexity = {}  # complexity -> [duration_seconds]
     feature_complexity_map = {}  # feature_id -> complexity
@@ -608,7 +623,7 @@ def _estimate_remaining_time(features, state_dir, counts):
         fid = feature.get("id")
         if not fid:
             continue
-        fs = load_feature_status(state_dir, fid)
+        fs = load_feature_status(state_dir, fid, fl_status_map.get(fid))
         if fs.get("status") != "completed":
             continue
         duration = _calc_feature_duration(state_dir, fid)
@@ -638,7 +653,7 @@ def _estimate_remaining_time(features, state_dir, counts):
         fid = feature.get("id")
         if not fid:
             continue
-        fs = load_feature_status(state_dir, fid)
+        fs = load_feature_status(state_dir, fid, fl_status_map.get(fid))
         fstatus = fs.get("status", "pending")
         if fstatus in TERMINAL_STATUSES:
             continue
@@ -694,7 +709,7 @@ def action_status(feature_list_data, state_dir):
         fid = feature.get("id")
         if not fid:
             continue
-        fs = load_feature_status(state_dir, fid)
+        fs = load_feature_status(state_dir, fid, feature.get("status"))
         status_map[fid] = fs.get("status", "pending")
 
     for feature in features:
@@ -705,7 +720,7 @@ def action_status(feature_list_data, state_dir):
         if not fid:
             continue
 
-        fs = load_feature_status(state_dir, fid)
+        fs = load_feature_status(state_dir, fid, feature.get("status"))
         fstatus = fs.get("status", "pending")
         retry_count = fs.get("retry_count", 0)
         max_retries_val = fs.get("max_retries", 3)
@@ -797,7 +812,7 @@ def action_status(feature_list_data, state_dir):
 
     # Estimate remaining time
     est_remaining, confidence = _estimate_remaining_time(
-        features, state_dir, counts
+        features, state_dir, counts, feature_list_data
     )
 
     summary_line = "Total: {} features | Completed: {} | In Progress: {}".format(
````
package/bundled/skills/_metadata.json
CHANGED

````diff
@@ -1,5 +1,5 @@
 {
-  "version": "1.0.25",
+  "version": "1.0.28",
   "skills": {
     "prizm-kit": {
       "description": "Full-lifecycle dev toolkit. Covers spec-driven development, Prizm context docs, code quality, debugging, deployment, and knowledge management.",
@@ -197,13 +197,6 @@
       "hasAssets": false,
       "hasScripts": false
     },
-    "refactor-skill": {
-      "description": "Intelligent refactor review for existing skills with in-place vs v2 optimization and mandatory eval+graphical review.",
-      "tier": "companion",
-      "category": "Custom-skill",
-      "hasAssets": false,
-      "hasScripts": false
-    },
     "bug-planner": {
       "description": "Interactive bug planning that produces bug-fix-list.json. Supports stack traces, user reports, failed tests, log patterns, monitoring alerts.",
       "tier": "companion",
@@ -255,7 +248,6 @@
     "prizmkit-retrospective",
     "feature-workflow",
     "refactor-workflow",
-    "refactor-skill",
     "app-planner",
     "bug-planner",
     "dev-pipeline-launcher",
````
package/bundled/skills/dev-pipeline-launcher/SKILL.md
CHANGED

````diff
@@ -99,9 +99,34 @@ Detect user intent from their message, then follow the corresponding workflow:
        --action status 2>/dev/null
    ```
 
-4. **Ask
+4. **Ask execution mode**: Present the user with a choice before launching:
+   - **(1) Background daemon (recommended)**: Pipeline runs fully detached via `launch-daemon.sh`. Survives session closure.
+   - **(2) Foreground in session**: Pipeline runs in the current session via `run.sh run`. Visible output but will stop if session times out.
+   - **(3) Manual — show commands**: Display the exact commands the user can run themselves. No execution.
 
-
+   Default to option 1 if user says "just run it" or doesn't specify.
+
+   **If option 2 (foreground)**:
+   ```bash
+   dev-pipeline/run.sh run feature-list.json
+   ```
+   Note: This will block the session. Warn user about timeout risk.
+
+   **If option 3 (manual)**: Print commands and stop. Do not execute anything.
+   ```
+   # To run in background (recommended):
+   dev-pipeline/launch-daemon.sh start feature-list.json
+
+   # To run in foreground:
+   dev-pipeline/run.sh run feature-list.json
+
+   # To check status:
+   dev-pipeline/launch-daemon.sh status
+   ```
+
+5. **Ask user to confirm**: "Ready to launch the pipeline? It will process N features in the background."
+
+6. **Launch**:
    ```bash
    dev-pipeline/launch-daemon.sh start feature-list.json
    ```
@@ -110,18 +135,18 @@ Detect user intent from their message, then follow the corresponding workflow:
    dev-pipeline/launch-daemon.sh start feature-list.json --env "SESSION_TIMEOUT=7200 MAX_RETRIES=5"
    ```
 
-
+7. **Verify launch**:
    ```bash
    dev-pipeline/launch-daemon.sh status
    ```
 
-
+8. **Start log monitoring** -- Use the Bash tool with `run_in_background: true`:
    ```bash
    tail -f dev-pipeline/state/pipeline-daemon.log
    ```
    This runs in background so you can continue interacting with the user.
 
-
+9. **Report to user**:
    - Pipeline PID
    - Log file location
    - "You can ask me 'pipeline status' or 'show logs' at any time"
````
package/bundled/skills/feature-workflow/SKILL.md
CHANGED

````diff
@@ -144,12 +144,19 @@ Add new features to an existing project (incremental mode).
    Proceed? (Y/n)
    ```
 
-2. **
+2. **Ask execution mode**: Before invoking the launcher, present the choice:
+   - **(1) Background daemon (recommended)**: Runs detached, survives session closure.
+   - **(2) Foreground in session**: Runs in current session with visible output. Stops if session times out.
+   - **(3) Manual — show commands**: Display commands only, no execution.
+
+   Pass the chosen mode to `dev-pipeline-launcher`.
+
+3. **Invoke `dev-pipeline-launcher` skill**:
    - The launcher handles all prerequisites checks
    - Starts `launch-daemon.sh` in background
    - Returns PID and log file location
 
-
+4. **Verify launch success**:
    - Confirm pipeline is running
    - Record PID and log path for Phase 3
 
````
package/package.json
CHANGED
package/src/index.js
CHANGED
````diff
@@ -45,7 +45,6 @@ export async function runScaffold(directory, options) {
   const cliStatus = [
     detected.cbc ? chalk.green('cbc ✓') : chalk.gray('cbc ✗'),
     detected.claude ? chalk.green('claude ✓') : chalk.gray('claude ✗'),
-    detected.claudeInternal ? chalk.green('claude-internal ✓') : chalk.gray('claude-internal ✗'),
   ].join(' ');
   console.log(`  检测到的 CLI 工具: ${cliStatus}`);
   console.log(`  目标目录: ${projectRoot}`);
````
package/src/scaffold.js
CHANGED
````diff
@@ -177,7 +177,9 @@ async function installSkills(platform, skills, projectRoot, dryRun) {
       await fs.writeFile(path.join(commandsDir, `${skillName}.md`), converted);
 
       if (hasAssets || hasScripts) {
-
+        // Place assets/scripts outside .claude/commands/ to prevent Claude Code
+        // from registering them as slash commands (e.g. /skillName:assets:file).
+        const assetTargetDir = path.join(projectRoot, '.claude', 'command-assets', skillName);
         await fs.ensureDir(assetTargetDir);
         for (const subdir of ['assets', 'scripts']) {
           const srcSubdir = path.join(corePath, subdir);
@@ -708,11 +710,24 @@ export async function scaffold(config) {
     }
   }
 
-  // 10.
+  // 10. Clean up stray .codebuddy/ directory left by third-party tools (e.g. npx skills)
+  // when installing for Claude Code only. CodeBuddy files should never appear in a
+  // claude-only install.
+  if (!dryRun && !platforms.includes('codebuddy')) {
+    const strayCbDir = path.join(projectRoot, '.codebuddy');
+    if (await fs.pathExists(strayCbDir)) {
+      const entries = await fs.readdir(strayCbDir);
+      if (entries.length === 0) {
+        await fs.remove(strayCbDir);
+      }
+    }
+  }
+
+  // 11. Git pre-commit hook
   console.log(chalk.blue('  Git Hook:'));
   await installGitHook(projectRoot, dryRun);
 
-  //
+  // 12. PrizmKit scripts (always installed)
   console.log(chalk.blue('  PrizmKit 脚本:'));
   await installPrizmkitScripts(projectRoot, dryRun);
 
````
@@ -1,371 +0,0 @@
---
name: "refactor-skill"
tier: companion
description: "Intelligent refactor review for existing skills: evaluates quality, proposes an in-place upgrade, and enforces eval + graphical review after changes, following the newest skill-creator standard. (project)"
---

# Refactor Skill

Specialized workflow for reviewing and upgrading existing skills with measurable quality gates.

## When to Use

Use this skill when the user says:
- "重构这个 skill", "优化技能设计", "评审并升级技能"
- "review this skill and improve it"
- "keep the same skill but improve quality"
- "原地升级这个 skill" / "in-place upgrade this skill"

Do NOT use it when the user only wants to run a pipeline immediately without changing the skill design.

## Core Goals

1. Review the current skill comprehensively and find concrete improvement points.
2. **Default and preferred mode: in-place upgrade** of the existing skill.
3. A new-version fork (e.g., `-v2`) is **exception-only** and must be explicitly requested by the user.
4. After any modification, you **must** run the standardized evaluation and graphical review.

## Context Readiness Gate (Mandatory)

Before any refactor action, verify whether the conversation context already contains:
- target skill name/path
- current project/workspace path
- refactor objective and constraints (quality, speed, compatibility)
- whether the user explicitly requests a new-version fork (default is in-place upgrade)

If any item is missing, do not block; gather context proactively:
1. Read `/core/skills/_metadata.json` to locate the target skill and related neighbors.
2. Read the target `SKILL.md` plus key assets/scripts under that skill.
3. Check recent evaluation artifacts under `/.codebuddy/skill-evals/` if present.
4. Ask only the minimum unresolved question(s).

## Review Dimensions (Mandatory Rubric)

Assess and score each dimension (1-5):

1. **功能性 (Functionality)**
   - Trigger clarity and routing correctness
   - Workflow completeness and error recovery
   - Output contract correctness (schema/format compatibility)

2. **效率性 (Efficiency)**
   - Unnecessary steps, token/time overhead
   - Reusability of scripts/assets
   - Fast-path design and fallback strategy

3. **可维护性 (Maintainability)**
   - Instruction structure/readability
   - Coupling to environment and path robustness
   - Testability and observability (artifacts, checkpoints)

Output a concise review summary with:
- strengths
- prioritized issues (P0/P1/P2)
- expected impact for each fix

## Optimization Strategy Selection

After the review, apply an **in-place upgrade** by default.

### Default Mode — In-Place Upgrade (Required Unless Explicitly Overridden)
Use in almost all cases:
- skill naming and contract remain stable
- change scope is moderate or large but compatible
- backward compatibility is required

Actions:
1. edit existing skill files in place
2. preserve skill name/frontmatter compatibility
3. keep migration notes minimal

### Exception Mode — New Version via `skill-creator` (Explicit User Request Only)
Only use this when the user clearly asks to fork a new version (e.g., `create <skill>-v2`).

Additional required checks before using exception mode:
- the user confirms they need side-by-side old/new variants
- the user accepts the added maintenance cost of two versions

Actions:
1. copy the current skill as a baseline snapshot
2. create `<skill-name>-v2` and apply the redesign
3. run the evaluation against the old-version baseline

## Mandatory Post-Change Validation, Review, and Optimization Loop

Run this full loop after **every** refactor (default: in-place; exception: new-version fork). Do not skip it.

### Step 0: Freeze Refactor Scope (Input Gate)
Capture and freeze:
- target skill path
- iteration id (`iteration-N`)
- baseline type (`old-snapshot` preferred, fallback `without_skill`)
- optimization goal for this round (quality, token, latency, or compatibility)

Expected output:
- one-line run plan: `skill + baseline + iteration + goal`

### Step 1: Structural Validation (Pre-Eval)
Validate the skill structure/frontmatter and required files first.

Expected output:
- validation pass/fail result
- blocking fix list if failed

### Step 2: Execute Standardized Eval Runs (Mandatory)
Create an iteration workspace:
- `/.codebuddy/skill-evals/<skill-name>-workspace/iteration-N/`

Run both configurations for the same eval set in the same iteration:
- `with_skill` (updated skill)
- `baseline` (old snapshot or `without_skill`)

Use a multi-run strategy:
- default: 3 runs (fast feedback)
- release gate: 5 runs (stability check)

Required artifacts per run:
- `outputs/`
- `timing.json`
- `grading.json`
- `eval_metadata.json` (per eval directory)

Expected output:
- complete run tree with paired `with_skill` vs `baseline` runs
- no missing required artifact files

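For orientation, an iteration workspace that satisfies the artifact checklist above might look like the following. The run directory names are illustrative; the exact layout is defined by the eval scripts, not by this skill:

```
.codebuddy/skill-evals/<skill-name>-workspace/
└── iteration-1/
    ├── with_skill/
    │   ├── run-1/
    │   │   ├── outputs/
    │   │   ├── timing.json
    │   │   └── grading.json
    │   ├── run-2/
    │   └── run-3/
    ├── baseline/
    │   ├── run-1/
    │   ├── run-2/
    │   └── run-3/
    └── eval_metadata.json
```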
### Step 3: Score, Aggregate, and Build Benchmark
Run grading and aggregation using the standardized scripts. Keep metrics comparable across iterations.

Required outputs:
- `benchmark.json`
- `benchmark.md`

Required metrics:
- pass rate
- duration
- token usage
- with_skill vs baseline delta
- variance (`stddev`) for stability judgment

Expected output:
- benchmark summary with a clear win/lose/neutral conclusion per metric

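For illustration only, a `benchmark.json` carrying the metrics above might be shaped like this. The field names and values are invented, not the official schema produced by the aggregation scripts:

```json
{
  "iteration": "iteration-2",
  "runs_per_config": 3,
  "metrics": {
    "pass_rate":  { "with_skill": 0.92, "baseline": 0.75, "delta": 0.17, "stddev": 0.04 },
    "duration_s": { "with_skill": 184, "baseline": 233, "delta": -49 },
    "tokens":     { "with_skill": 41200, "baseline": 50800, "delta": -9600 }
  },
  "verdict": { "pass_rate": "win", "duration_s": "win", "tokens": "win" }
}
```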
### Step 4: Graphical Review (Mandatory via `generate_review`)
Generate the review UI using the official `generate_review.py` (no custom viewer).

Preferred modes:
1. server mode for interactive inspection
2. static HTML mode (`--static`) as a headless fallback

Expected output:
- review entry recorded (URL or HTML path)
- quick notes on representative good/bad runs, linked to evidence

### Step 5: Analyze Results and Derive Optimization Actions
Translate benchmark + viewer evidence into prioritized actions:
- **P0**: contract/validation breakages
- **P1**: quality instability or high variance
- **P2**: token/time inefficiencies

For each action define:
- root cause hypothesis
- exact file/section to modify
- expected metric impact
- rollback condition

Expected output:
- actionable optimization list (not generic advice)

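A single entry of that action list, written against the four required fields (all values invented for illustration):

```
[P1] Reduce pass-rate variance across runs
- root cause hypothesis: the trigger section mixes two intents, so runs fork early
- file/section to modify: SKILL.md → "When to Use"
- expected metric impact: pass-rate stddev 0.11 → ≤ 0.05
- rollback condition: mean pass rate drops below the previous iteration's mean
```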
### Step 6: Implement Targeted Improvements
Apply only the actions selected for this iteration.
Avoid mixing in unrelated changes, to keep causal attribution clear.

Expected output:
- focused diff scoped to the chosen actions

### Step 7: Re-Run Evaluation and Compare Iterations
Re-run Steps 2–4 on the updated skill and compare against the previous iteration.

Decision rule:
- if goals are met and there is no regression: accept the iteration
- if partial improvement: keep the gains, open the next iteration with narrowed scope
- if regression: roll back, or revise the hypothesis and repeat

Expected output:
- iteration verdict (`accepted` / `needs-next-iteration` / `rollback`)
- before/after comparison table

### Step 8: Close the Loop (Mandatory Delivery)
Return:
1. what changed
2. measured impact (pass/time/tokens/variance deltas)
3. viewer entry
4. remaining risks
5. next iteration plan (if needed)

This closes the loop from **test review → evidence analysis → skill optimization → re-validation**.

### Standard Command Blueprint (Project-level)
Use the one-command review pipeline with the optional grader hook:

```bash
npm run skill:review -- \
  --workspace /abs/.codebuddy/skill-evals/<skill-name>-workspace \
  --iteration iteration-N \
  --skill-name <skill-name> \
  --skill-path /abs/core/skills/<skill-name> \
  --runs 3 \
  --grader-cmd "python3 /abs/scripts/skill-evals/grade-eval-runs.py --workspace {workspace} --iteration {iteration} --validator /abs/core/skills/<skill-name>/scripts/validate-and-generate.py --baseline-input /abs/.codebuddy/skill-evals/<skill-name>-workspace/inputs/feature-list-existing.json"
```

Minimum expected deliverables per iteration:
- `<workspace>/<iteration>/benchmark.json`
- `<workspace>/<iteration>/benchmark.md`
- `<workspace>/<iteration>/review.html`
- optimization action list with priority and owner

## Execution Notes for `skill-creator` Integration

When available, follow the latest `skill-creator` evaluation/viewer workflow as the source of truth:
- parallelized run spawning (with_skill + baseline)
- assertion-based grading format compatibility
- benchmark aggregation via the official script
- viewer generation via the official script

## Output Contract of This Skill

After completion, return:
1. selected mode (`in-place` by default, or `new-version` if explicitly requested) and why
2. files changed/created
3. review rubric scores before vs after
4. benchmark summary (pass/time/tokens delta)
5. graphical review entry (URL or static HTML path)
6. remaining risks and next-iteration suggestions

## Error Handling

- Missing target skill path: auto-discover under `/core/skills/`, then confirm.
- Missing baseline snapshot: create one before making modifications.
- Eval incomplete: mark the status as blocked and list the missing artifacts.
- Viewer runtime incompatibility: switch to `--static` mode and continue.

## Skill Registry Modification Guide

When adding or removing skills from the framework, follow this reference checklist.

### Adding a New Skill

**Step 1: Create the skill definition**
```
core/skills/<skill-name>/SKILL.md    # Required: skill definition with frontmatter
core/skills/<skill-name>/assets/     # Optional: templates, configs, etc.
core/skills/<skill-name>/scripts/    # Optional: executable scripts
```

**Step 2: Register it in the metadata**

Edit `core/skills/_metadata.json`:
```json
{
  "skills": {
    "<skill-name>": {
      "description": "Brief description of the skill",
      "tier": "1",          // "foundation", "1", "2", or "companion"
      "category": "core",   // "core", "quality", "devops", "debugging", "documentation", "pipeline"
      "hasAssets": false,   // true if an assets/ directory exists
      "hasScripts": false   // true if a scripts/ directory exists
    }
  }
}
```

**Step 3: (Optional) Add it to a suite**

If the skill belongs to the `core` or `minimal` suite, add it to the `suites` section of `_metadata.json`:
```json
{
  "suites": {
    "core": {
      "skills": ["<skill-name>", ...]
    }
  }
}
```

**Step 4: Regenerate derived artifacts**
```bash
# Update the bundled directory for the npm package
node scripts/bundle.js
```

**Step 5: Validate**
```bash
npm test
# or
node tests/validate-all.js
```

### Removing a Skill

**Step 1: Delete the skill directory**
```bash
rm -rf core/skills/<skill-name>/
```

**Step 2: Remove it from the metadata**

Edit `core/skills/_metadata.json`:
- Remove the entry from the `skills` object
- Remove it from any `suites` that reference it

**Step 3: Regenerate derived artifacts**
```bash
node scripts/bundle.js
```

**Step 4: Validate**
```bash
npm test
```

### Modification Checklist Summary

| File | Action | Required |
|------|--------|----------|
| `core/skills/<name>/SKILL.md` | Create/Delete | **Always** |
| `core/skills/_metadata.json` → `skills` | Add/Remove entry | **Always** |
| `core/skills/_metadata.json` → `suites` | Add/Remove from suite | If it belongs to a suite |
| `core/skills/<name>/assets/` | Create/Delete | If it has resources |
| `core/skills/<name>/scripts/` | Create/Delete | If it has scripts |
| `create-prizmkit/bundled/` | Regenerate via script | Auto |

### Documents That May Need Number Updates

**Recommendation: avoid hardcoding skill counts.** Use relative descriptions instead:
- ✅ "All skills" instead of "34 skills"
- ✅ "Core Tier 1 skills" instead of "Core Tier 1 skills (17 skills)"
- ✅ "symlink (skills)" instead of "symlink (35 skills)"

If you must include counts, maintain them in one place (`_metadata.json`) and update all references together.

Files that currently contain hardcoded skill counts:

| File | Current Pattern | Suggested Fix |
|------|-----------------|---------------|
| `README.md` | "**N Skills** covering..." | Remove the number, or use "Skills" |
| `CODEBUDDY.md` | "**N Skills** — ..." | Remove the number |
| `PK-Construct-Guide.md` | "N skills — 每个 skill..." | Remove the number |
| `PK-Evolving-User-Guide.md` | "symlink (N skills)" | Use "symlink (skills)" |
| `core/skills/prizm-kit/SKILL.md` | "## Skill Inventory (N skills)" | Use "## Skill Inventory" |
| `core/skills/_metadata.json` | `"description": "All N skills"` | Use `"description": "All skills"` |

To find hardcoded numbers:
```bash
grep -rn "[0-9]\+ skills\|[0-9]\+ Skills" --include="*.md" --include="*.json" .
```

## Path Rules

- Prefer absolute paths in execution commands.
- Keep path references portable in instructions when possible (e.g., `${SKILL_DIR}` for intra-skill files).
- Never delete the `.codebuddy` directory.