prizmkit 1.0.26 → 1.0.28

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -1,5 +1,5 @@
  {
- "frameworkVersion": "1.0.26",
- "bundledAt": "2026-03-15T17:57:13.772Z",
- "bundledFrom": "896326f"
+ "frameworkVersion": "1.0.28",
+ "bundledAt": "2026-03-17T06:14:03.182Z",
+ "bundledFrom": "780f717"
  }
@@ -33,6 +33,7 @@ python3 dev-pipeline/scripts/init-pipeline.py \
  |---------|-------------|
  | `./run.sh run [feature-list.json] [options]` | Start or resume the pipeline. Processes features sequentially by dependency order. |
  | `./run.sh status [feature-list.json]` | Display current pipeline status: completed, pending, blocked, failed features. |
+ | `./run.sh test-cli` | Test AI CLI detection: show detected CLI, version, platform, and query the AI model identity. |
  | `./run.sh reset` | Clear all runtime state in `state/`. Pipeline starts fresh on next `run`. |
  | `./run.sh help` | Show usage help. |
  | `./retry-feature.sh <feature-id> [feature-list.json]` | Retry a single failed feature. Runs one session then exits. |
@@ -97,7 +98,9 @@ What is always reset (with or without `--clean`):
  | `MAX_RETRIES` | `3` | Maximum retry attempts per feature before marking as failed. |
  | `SESSION_TIMEOUT` | `0` (no limit) | Timeout in seconds per AI CLI session. 0 = no timeout. |
  | `AI_CLI` | auto-detect | AI CLI command name. Auto-detects `cbc` or `claude`. Set to override. |
+ | `MODEL` | (none) | AI model ID for the session. Passed as `--model` to the CLI. See [Model Selection](#model-selection). |
  | `CODEBUDDY_CLI` | (deprecated) | Legacy alias for `AI_CLI`. Prefer `AI_CLI`. |
+ | `VERBOSE` | `0` | Set to `1` to enable `--verbose` on AI CLI (shows subagent output). |
  | `HEARTBEAT_INTERVAL` | `30` | Seconds between heartbeat log output while a session is running. |
  | `HEARTBEAT_STALE_THRESHOLD` | `600` | Seconds before a session is considered stale/stuck. |
  | `LOG_CLEANUP_ENABLED` | `1` | Run log cleanup before pipeline execution (`1`=enabled, `0`=disabled). |
@@ -116,6 +119,93 @@ SESSION_TIMEOUT=7200 ./dev-pipeline/run.sh run feature-list.json
  LOG_RETENTION_DAYS=7 LOG_MAX_TOTAL_MB=512 ./dev-pipeline/run.sh run feature-list.json
  ```

+ ### AI CLI Configuration
+
+ The pipeline auto-detects which AI CLI to use. Detection priority:
+
+ 1. `AI_CLI` environment variable (highest)
+ 2. `.prizmkit/config.json` → `ai_cli` field
+ 3. `CODEBUDDY_CLI` environment variable (legacy)
+ 4. Auto-detect: `cbc` in PATH → `claude` in PATH (lowest)
+
+ To permanently configure a project to use a specific CLI, create `.prizmkit/config.json`:
+
+ ```json
+ {
+ "ai_cli": "claude-internal",
+ "platform": "claude"
+ }
+ ```
+
+ Or override per-invocation:
+
+ ```bash
+ AI_CLI=claude-internal ./dev-pipeline/run.sh run feature-list.json
+ ```
+
+ ### Model Selection
+
+ Use the `MODEL` environment variable to specify which AI model to use. The value is passed as `--model <id>` to the CLI.
+
+ ```bash
+ # Run pipeline with Sonnet (faster, cheaper)
+ MODEL=claude-sonnet-4.6 ./dev-pipeline/run.sh run feature-list.json
+
+ # Run pipeline with Opus (most capable)
+ MODEL=claude-opus-4.6 ./dev-pipeline/run.sh run feature-list.json
+
+ # Retry a feature with a specific model
+ MODEL=claude-opus-4.6 ./dev-pipeline/retry-feature.sh F-007
+
+ # Test which model the CLI is using
+ MODEL=claude-sonnet-4.6 ./dev-pipeline/run.sh test-cli
+ ```
+
+ Common model IDs (for `cbc`):
+
+ | Model ID | Description |
+ |----------|-------------|
+ | `claude-opus-4.6` | Most capable, slower, higher cost |
+ | `claude-sonnet-4.6` | Balanced speed/capability (recommended for pipeline) |
+ | `claude-haiku-4.5` | Fastest, cheapest, less capable |
+
+ > **Note**: `--model` support depends on the CLI. `cbc` fully supports it. `claude-internal` does not support `--model` in headless mode (only the interactive `/model` command). If `MODEL` is set but the CLI doesn't support it, the flag is silently ignored.
+
+ ### Testing AI CLI (`test-cli`)
+
+ Use `test-cli` to verify which CLI, version, and model the pipeline will use:
+
+ ```bash
+ # Basic test — uses auto-detected CLI and default model
+ ./dev-pipeline/run.sh test-cli
+
+ # Test with a specific model
+ MODEL=claude-sonnet-4.6 ./dev-pipeline/run.sh test-cli
+
+ # Test with a specific CLI
+ AI_CLI=cbc ./dev-pipeline/run.sh test-cli
+ ```
+
+ Example output:
+
+ ```
+ ============================================
+ Dev-Pipeline AI CLI Test
+ ============================================
+
+ Detected CLI: cbc
+ Platform: codebuddy
+ CLI Version: 2.62.1
+
+ Querying AI model (headless mode)...
+
+ AI Response: I'm CodeBuddy, running Claude Opus 4.6
+
+ ============================================
+ ```
+
+ The test sends a one-line prompt asking the AI to identify itself, with a 30-second timeout. If the CLI requires authentication or is unavailable, it shows a fallback message.
+
  ## How It Works

  ### Execution Flow
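The four-step detection order described in the added "AI CLI Configuration" section can be sketched in Python. This is an editor's illustration, not the package's implementation (which lives in shell, in the sourced common helpers); the function name `detect_ai_cli` and the config handling details are invented for the example.

```python
import json
import os
import shutil


def detect_ai_cli(config_path=".prizmkit/config.json"):
    """Resolve the AI CLI name following the documented priority order."""
    # 1. AI_CLI environment variable (highest)
    if os.environ.get("AI_CLI"):
        return os.environ["AI_CLI"]
    # 2. .prizmkit/config.json -> ai_cli field
    if os.path.exists(config_path):
        with open(config_path, encoding="utf-8") as f:
            cli = json.load(f).get("ai_cli")
        if cli:
            return cli
    # 3. CODEBUDDY_CLI environment variable (legacy alias)
    if os.environ.get("CODEBUDDY_CLI"):
        return os.environ["CODEBUDDY_CLI"]
    # 4. Auto-detect: cbc first, then claude (lowest)
    for candidate in ("cbc", "claude"):
        if shutil.which(candidate):
            return candidate
    return None
```

Because each rule returns immediately, a higher-priority source always wins even when lower-priority ones are also set.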
@@ -227,6 +227,11 @@ if [[ "$USE_STREAM_JSON" == "true" ]]; then
  fi

  # Spawn AI CLI session
+ MODEL_FLAG=""
+ if [[ -n "${MODEL:-}" ]]; then
+ MODEL_FLAG="--model $MODEL"
+ fi
+
  case "$CLI_CMD" in
  *claude*)
  "$CLI_CMD" \
@@ -234,6 +239,7 @@ case "$CLI_CMD" in
  -p "$(cat "$BOOTSTRAP_PROMPT")" \
  --yes \
  $STREAM_JSON_FLAG \
+ $MODEL_FLAG \
  > "$SESSION_LOG" 2>&1 &
  ;;
  *)
@@ -241,6 +247,7 @@ case "$CLI_CMD" in
  --print \
  -y \
  $STREAM_JSON_FLAG \
+ $MODEL_FLAG \
  < "$BOOTSTRAP_PROMPT" \
  > "$SESSION_LOG" 2>&1 &
  ;;
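The `MODEL_FLAG` pattern in the hunks above deliberately leaves `$MODEL_FLAG` unquoted so that `--model` and the ID split into two argv words. A Python rendering of the same optional-flag logic makes that explicit (this is an editor's sketch mirroring the diff, not project code; `build_session_argv` is an invented name):

```python
def build_session_argv(cli_cmd, model=None, stream_json=False):
    """Build the CLI argv, appending --model <id> only when a model is set."""
    argv = [cli_cmd, "--print", "-y"]
    if stream_json:
        argv += ["--output-format", "stream-json"]
    if model:
        # Mirrors: [[ -n "${MODEL:-}" ]] && MODEL_FLAG="--model $MODEL"
        argv += ["--model", model]
    return argv
```

An argv list sidesteps the word-splitting the shell version relies on, which would misbehave if a model ID ever contained whitespace.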
@@ -228,6 +228,11 @@ python3 "$SCRIPTS_DIR/update-feature-status.py" \
  --action start >/dev/null 2>&1 || true

  # Spawn AI CLI session
+ MODEL_FLAG=""
+ if [[ -n "${MODEL:-}" ]]; then
+ MODEL_FLAG="--model $MODEL"
+ fi
+
  case "$CLI_CMD" in
  *claude*)
  "$CLI_CMD" \
@@ -235,6 +240,7 @@ case "$CLI_CMD" in
  -p "$(cat "$BOOTSTRAP_PROMPT")" \
  --yes \
  $STREAM_JSON_FLAG \
+ $MODEL_FLAG \
  > "$SESSION_LOG" 2>&1 &
  ;;
  *)
@@ -242,6 +248,7 @@ case "$CLI_CMD" in
  --print \
  -y \
  $STREAM_JSON_FLAG \
+ $MODEL_FLAG \
  < "$BOOTSTRAP_PROMPT" \
  > "$SESSION_LOG" 2>&1 &
  ;;
@@ -79,6 +79,11 @@ spawn_and_wait_session() {
  stream_json_flag="--output-format stream-json"
  fi

+ local model_flag=""
+ if [[ -n "${MODEL:-}" ]]; then
+ model_flag="--model $MODEL"
+ fi
+
  case "$CLI_CMD" in
  *claude*)
  "$CLI_CMD" \
@@ -87,6 +92,7 @@ spawn_and_wait_session() {
  --yes \
  $verbose_flag \
  $stream_json_flag \
+ $model_flag \
  > "$session_log" 2>&1 &
  ;;
  *)
@@ -95,6 +101,7 @@ spawn_and_wait_session() {
  -y \
  $verbose_flag \
  $stream_json_flag \
+ $model_flag \
  < "$bootstrap_prompt" \
  > "$session_log" 2>&1 &
  ;;
@@ -20,6 +20,7 @@ set -euo pipefail
  # AI_CLI AI CLI command name (override; also readable from .prizmkit/config.json)
  # CODEBUDDY_CLI Legacy alias for AI_CLI (deprecated, use AI_CLI instead)
  # PRIZMKIT_PLATFORM Force platform: 'codebuddy' or 'claude' (auto-detected)
+ # MODEL AI model to use (e.g. claude-opus-4.6, claude-sonnet-4.6, claude-haiku-4.5)
  # VERBOSE Set to 1 to enable --verbose on AI CLI (shows subagent output)
  # HEARTBEAT_INTERVAL Heartbeat log interval in seconds (default: 30)
  # HEARTBEAT_STALE_THRESHOLD Heartbeat stale threshold in seconds (default: 600)
@@ -41,6 +42,7 @@ LOG_CLEANUP_ENABLED=${LOG_CLEANUP_ENABLED:-1}
  LOG_RETENTION_DAYS=${LOG_RETENTION_DAYS:-14}
  LOG_MAX_TOTAL_MB=${LOG_MAX_TOTAL_MB:-1024}
  VERBOSE=${VERBOSE:-0}
+ MODEL=${MODEL:-""}

  # Source shared common helpers (CLI/platform detection + logs + deps)
  source "$SCRIPT_DIR/lib/common.sh"
@@ -91,6 +93,11 @@ spawn_and_wait_session() {
  stream_json_flag="--output-format stream-json"
  fi

+ local model_flag=""
+ if [[ -n "$MODEL" ]]; then
+ model_flag="--model $MODEL"
+ fi
+
  case "$CLI_CMD" in
  *claude*)
  # Claude Code: prompt via -p argument, --yes for auto-accept
@@ -100,6 +107,7 @@ spawn_and_wait_session() {
  --yes \
  $verbose_flag \
  $stream_json_flag \
+ $model_flag \
  > "$session_log" 2>&1 &
  ;;
  *)
@@ -109,6 +117,7 @@ spawn_and_wait_session() {
  -y \
  $verbose_flag \
  $stream_json_flag \
+ $model_flag \
  < "$bootstrap_prompt" \
  > "$session_log" 2>&1 &
  ;;
@@ -790,6 +799,7 @@ show_help() {
  echo " run [feature-list.json] Run all features sequentially"
  echo " run <feature-id> [options] Run a single feature"
  echo " status [feature-list.json] Show pipeline status"
+ echo " test-cli Test AI CLI: show detected CLI, version, and model"
  echo " reset Clear all state and start fresh"
  echo " help Show this help message"
  echo ""
@@ -805,6 +815,7 @@ show_help() {
  echo " MAX_RETRIES Max retries per feature (default: 3)"
  echo " SESSION_TIMEOUT Session timeout in seconds (default: 0 = no limit)"
  echo " AI_CLI AI CLI command name (auto-detected: cbc or claude)"
+ echo " MODEL AI model ID (e.g. claude-opus-4.6, claude-sonnet-4.6, claude-haiku-4.5)"
  echo " HEARTBEAT_INTERVAL Heartbeat log interval in seconds (default: 30)"
  echo " HEARTBEAT_STALE_THRESHOLD Heartbeat stale threshold in seconds (default: 600)"
  echo " LOG_CLEANUP_ENABLED Run log cleanup before execution (default: 1)"
@@ -820,6 +831,8 @@ show_help() {
  echo " ./run.sh run F-007 --clean --mode standard # Clean + run standard"
  echo " ./run.sh status # Show pipeline status"
  echo " MAX_RETRIES=5 SESSION_TIMEOUT=7200 ./run.sh run # Custom config"
+ echo " MODEL=claude-sonnet-4.6 ./run.sh run # Use Sonnet model"
+ echo " MODEL=claude-haiku-4.5 ./run.sh test-cli # Test with Haiku"
  }

  case "${1:-run}" in
@@ -843,6 +856,64 @@ case "${1:-run}" in
  --state-dir "$STATE_DIR" \
  --action status
  ;;
+ test-cli)
+ echo ""
+ echo "============================================"
+ echo " Dev-Pipeline AI CLI Test"
+ echo "============================================"
+ echo ""
+ echo " Detected CLI: $CLI_CMD"
+ echo " Platform: $PLATFORM"
+ if [[ -n "$MODEL" ]]; then
+ echo " Requested Model: $MODEL"
+ fi
+
+ # Get CLI version (first line only)
+ cli_version=$("$CLI_CMD" -v 2>&1 | head -1 || echo "unknown")
+ echo " CLI Version: $cli_version"
+ echo ""
+ echo " Querying AI model (headless mode)..."
+
+ test_prompt="What AI assistant/platform are you and what model are you running? Reply in one line, e.g. \"I'm Claude Code, Claude Opus x.x\". No extra text."
+
+ local_model_flag=""
+ if [[ -n "$MODEL" ]]; then
+ local_model_flag="--model $MODEL"
+ fi
+
+ # Run headless query with 30s timeout (background + kill pattern for macOS)
+ tmpfile=$(mktemp)
+ (
+ unset CLAUDECODE
+ case "$CLI_CMD" in
+ *claude*)
+ "$CLI_CMD" --print -p "$test_prompt" --dangerously-skip-permissions --no-session-persistence $local_model_flag > "$tmpfile" 2>/dev/null
+ ;;
+ *)
+ echo "$test_prompt" | "$CLI_CMD" --print -y $local_model_flag > "$tmpfile" 2>/dev/null
+ ;;
+ esac
+ ) &
+ query_pid=$!
+ ( sleep 30 && kill "$query_pid" 2>/dev/null ) &
+ timer_pid=$!
+ wait "$query_pid" 2>/dev/null
+ kill "$timer_pid" 2>/dev/null
+ wait "$timer_pid" 2>/dev/null || true
+
+ model_reply=$(cat "$tmpfile" 2>/dev/null | head -3)
+ rm -f "$tmpfile"
+
+ if [[ -z "$model_reply" ]]; then
+ model_reply="(no response — CLI may require auth or is unavailable)"
+ fi
+
+ echo ""
+ echo " AI Response: $model_reply"
+ echo ""
+ echo "============================================"
+ echo ""
+ ;;
  reset)
  log_warn "Resetting pipeline state..."
  rm -rf "$STATE_DIR"
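The background-plus-killer-timer pattern in the `test-cli` block above exists because macOS does not ship GNU `timeout(1)`. For comparison, Python's `subprocess` module expresses the same bounded query in one call. This is an editor's illustrative equivalent, not part of the package; the argv passed in is whatever command line the caller has assembled:

```python
import subprocess


def query_model_identity(argv, prompt, timeout=30):
    """Run a CLI headlessly with the prompt on stdin, giving up after `timeout` seconds."""
    try:
        result = subprocess.run(
            argv, input=prompt, capture_output=True, text=True, timeout=timeout
        )
        reply = result.stdout.strip()
    except (subprocess.TimeoutExpired, FileNotFoundError):
        # Timed out or the CLI binary is missing: fall through to the fallback message
        reply = ""
    return reply or "(no response — CLI may require auth or is unavailable)"
```

`subprocess.run(..., timeout=...)` kills the child and raises `TimeoutExpired`, which replaces the shell's manual `query_pid`/`timer_pid` bookkeeping.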
@@ -19,6 +19,7 @@ from datetime import datetime, timezone

  EXPECTED_SCHEMA = "dev-pipeline-feature-list-v1"
  FEATURE_ID_PATTERN = re.compile(r"^F-\d{3}$")
+ TERMINAL_STATUSES = {"completed", "failed", "skipped"}

  REQUIRED_FEATURE_FIELDS = [
  "id",
@@ -234,6 +235,12 @@ def create_state_directory(state_dir, feature_list_path, features):
  os.makedirs(abs_state_dir, exist_ok=True)
  os.makedirs(features_dir, exist_ok=True)

+ # Count features already in terminal status at init time
+ completed_count = sum(
+ 1 for f in features
+ if isinstance(f, dict) and f.get("status") in TERMINAL_STATUSES
+ )
+
  # Write pipeline.json
  pipeline_state = {
  "run_id": run_id,
@@ -241,7 +248,7 @@ def create_state_directory(state_dir, feature_list_path, features):
  "feature_list_path": abs_feature_list_path,
  "created_at": now,
  "total_features": len(features),
- "completed_features": 0,
+ "completed_features": completed_count,
  }
  pipeline_path = os.path.join(abs_state_dir, "pipeline.json")
  with open(pipeline_path, "w", encoding="utf-8") as f:
@@ -260,9 +267,13 @@ def create_state_directory(state_dir, feature_list_path, features):
  sessions_dir = os.path.join(feature_dir, "sessions")
  os.makedirs(sessions_dir, exist_ok=True)

+ # Respect existing terminal status from feature-list.json
+ fl_status = feature.get("status", "pending")
+ init_status = fl_status if fl_status in TERMINAL_STATUSES else "pending"
+
  feature_status = {
  "feature_id": fid,
- "status": "pending",
+ "status": init_status,
  "retry_count": 0,
  "max_retries": 3,
  "sessions": [],
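The `completed_count` logic added above can be exercised in isolation. A minimal sketch of the same counting rule, with the `TERMINAL_STATUSES` constant copied from the diff and the wrapper function name invented for the example:

```python
TERMINAL_STATUSES = {"completed", "failed", "skipped"}


def count_terminal(features):
    """Count features whose feature-list status is already terminal.

    Non-dict entries are skipped, matching the isinstance guard in the diff.
    """
    return sum(
        1 for f in features
        if isinstance(f, dict) and f.get("status") in TERMINAL_STATUSES
    )
```

Seeding `completed_features` this way keeps `pipeline.json` accurate when a feature list is re-initialized with some features already done.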
@@ -109,10 +109,15 @@ def now_iso():
  return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


- def load_feature_status(state_dir, feature_id):
+ def load_feature_status(state_dir, feature_id, feature_list_status=None):
  """Load the status.json for a feature.

  If the file does not exist, return a default pending status.
+
+ If feature_list_status is a terminal status (completed, failed, skipped),
+ it overrides the status field from status.json. This makes feature-list.json
+ the single source of truth for terminal statuses, while all other fields
+ (retry_count, sessions, etc.) still come from status.json.
  """
  status_path = os.path.join(
  state_dir, "features", feature_id, "status.json"
@@ -134,7 +139,7 @@ def load_feature_status(state_dir, feature_id):
  if err:
  # If we can't read it, treat as pending
  now = now_iso()
- return {
+ data = {
  "feature_id": feature_id,
  "status": "pending",
  "retry_count": 0,
@@ -145,6 +150,9 @@ def load_feature_status(state_dir, feature_id):
  "created_at": now,
  "updated_at": now,
  }
+ # feature-list.json wins for terminal statuses
+ if feature_list_status in TERMINAL_STATUSES:
+ data["status"] = feature_list_status
  return data


@@ -303,7 +311,7 @@ def action_get_next(feature_list_data, state_dir):
  fid = feature.get("id")
  if not fid:
  continue
- fs = load_feature_status(state_dir, fid)
+ fs = load_feature_status(state_dir, fid, feature.get("status"))
  status_map[fid] = fs.get("status", "pending")
  status_data_map[fid] = fs

@@ -574,7 +582,7 @@ def _format_duration(seconds):
  return "{}h{}m".format(h, m)


- def _estimate_remaining_time(features, state_dir, counts):
+ def _estimate_remaining_time(features, state_dir, counts, feature_list_data=None):
  """Estimate remaining time from the historical durations of completed features, weighted by complexity.

  Strategy:
@@ -588,6 +596,13 @@ def _estimate_remaining_time(features, state_dir, counts):
  # Complexity weights (used for estimation when no historical data exists)
  COMPLEXITY_WEIGHT = {"low": 1.0, "medium": 2.0, "high": 4.0}

+ # Build feature-list status map for terminal status override
+ fl_status_map = {}
+ if feature_list_data:
+ for f in feature_list_data.get("features", []):
+ if isinstance(f, dict) and f.get("id"):
+ fl_status_map[f["id"]] = f.get("status")
+
  # Collect durations of completed features, grouped by complexity
  duration_by_complexity = {} # complexity -> [duration_seconds]
  feature_complexity_map = {} # feature_id -> complexity
@@ -608,7 +623,7 @@ def _estimate_remaining_time(features, state_dir, counts):
  fid = feature.get("id")
  if not fid:
  continue
- fs = load_feature_status(state_dir, fid)
+ fs = load_feature_status(state_dir, fid, fl_status_map.get(fid))
  if fs.get("status") != "completed":
  continue
  duration = _calc_feature_duration(state_dir, fid)
@@ -638,7 +653,7 @@ def _estimate_remaining_time(features, state_dir, counts):
  fid = feature.get("id")
  if not fid:
  continue
- fs = load_feature_status(state_dir, fid)
+ fs = load_feature_status(state_dir, fid, fl_status_map.get(fid))
  fstatus = fs.get("status", "pending")
  if fstatus in TERMINAL_STATUSES:
  continue
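The docstring above describes a complexity-weighted remaining-time estimate: use per-complexity historical averages where they exist, otherwise scale an overall average by the `COMPLEXITY_WEIGHT` factors. The diff only shows fragments of that function, so the following is one plausible reading by the editor, not the package's code; `estimate_remaining` and its argument shapes are invented, while `COMPLEXITY_WEIGHT` is copied from the diff:

```python
COMPLEXITY_WEIGHT = {"low": 1.0, "medium": 2.0, "high": 4.0}


def estimate_remaining(completed_durations, remaining_complexities):
    """Estimate remaining seconds from (complexity, seconds) history pairs.

    Per-complexity averages are preferred; a complexity with no history
    falls back to the overall average scaled by its weight.
    """
    by_complexity = {}
    for complexity, seconds in completed_durations:
        by_complexity.setdefault(complexity, []).append(seconds)
    averages = {c: sum(v) / len(v) for c, v in by_complexity.items()}
    if not averages:
        return None  # no history at all: cannot estimate
    overall = sum(s for _, s in completed_durations) / len(completed_durations)
    total = 0.0
    for complexity in remaining_complexities:
        if complexity in averages:
            total += averages[complexity]
        else:
            total += overall * COMPLEXITY_WEIGHT.get(complexity, 1.0)
    return total
```

With two completed "low" features at 100 s and 200 s, a remaining "low" feature is estimated at the 150 s average, while a remaining "high" feature with no history gets 150 s scaled by its 4.0 weight.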
@@ -694,7 +709,7 @@ def action_status(feature_list_data, state_dir):
  fid = feature.get("id")
  if not fid:
  continue
- fs = load_feature_status(state_dir, fid)
+ fs = load_feature_status(state_dir, fid, feature.get("status"))
  status_map[fid] = fs.get("status", "pending")

  for feature in features:
@@ -705,7 +720,7 @@ def action_status(feature_list_data, state_dir):
  if not fid:
  continue

- fs = load_feature_status(state_dir, fid)
+ fs = load_feature_status(state_dir, fid, feature.get("status"))
  fstatus = fs.get("status", "pending")
  retry_count = fs.get("retry_count", 0)
  max_retries_val = fs.get("max_retries", 3)
@@ -797,7 +812,7 @@ def action_status(feature_list_data, state_dir):

  # Estimate remaining time
  est_remaining, confidence = _estimate_remaining_time(
- features, state_dir, counts
+ features, state_dir, counts, feature_list_data
  )

  summary_line = "Total: {} features | Completed: {} | In Progress: {}".format(
@@ -1,5 +1,5 @@
  {
- "version": "1.0.26",
+ "version": "1.0.28",
  "skills": {
  "prizm-kit": {
  "description": "Full-lifecycle dev toolkit. Covers spec-driven development, Prizm context docs, code quality, debugging, deployment, and knowledge management.",
@@ -197,13 +197,6 @@
  "hasAssets": false,
  "hasScripts": false
  },
- "refactor-skill": {
- "description": "Intelligent refactor review for existing skills with in-place vs v2 optimization and mandatory eval+graphical review.",
- "tier": "companion",
- "category": "Custom-skill",
- "hasAssets": false,
- "hasScripts": false
- },
  "bug-planner": {
  "description": "Interactive bug planning that produces bug-fix-list.json. Supports stack traces, user reports, failed tests, log patterns, monitoring alerts.",
  "tier": "companion",
@@ -255,7 +248,6 @@
  "prizmkit-retrospective",
  "feature-workflow",
  "refactor-workflow",
- "refactor-skill",
  "app-planner",
  "bug-planner",
  "dev-pipeline-launcher",
@@ -99,9 +99,34 @@ Detect user intent from their message, then follow the corresponding workflow:
  --action status 2>/dev/null
  ```

- 4. **Ask user to confirm**: "Ready to launch the pipeline? It will process N features in the background."
+ 4. **Ask execution mode**: Present the user with a choice before launching:
+ - **(1) Background daemon (recommended)**: Pipeline runs fully detached via `launch-daemon.sh`. Survives session closure.
+ - **(2) Foreground in session**: Pipeline runs in the current session via `run.sh run`. Visible output but will stop if session times out.
+ - **(3) Manual — show commands**: Display the exact commands the user can run themselves. No execution.

- 5. **Launch**:
+ Default to option 1 if user says "just run it" or doesn't specify.
+
+ **If option 2 (foreground)**:
+ ```bash
+ dev-pipeline/run.sh run feature-list.json
+ ```
+ Note: This will block the session. Warn user about timeout risk.
+
+ **If option 3 (manual)**: Print commands and stop. Do not execute anything.
+ ```
+ # To run in background (recommended):
+ dev-pipeline/launch-daemon.sh start feature-list.json
+
+ # To run in foreground:
+ dev-pipeline/run.sh run feature-list.json
+
+ # To check status:
+ dev-pipeline/launch-daemon.sh status
+ ```
+
+ 5. **Ask user to confirm**: "Ready to launch the pipeline? It will process N features in the background."
+
+ 6. **Launch**:
  ```bash
  dev-pipeline/launch-daemon.sh start feature-list.json
  ```
@@ -110,18 +135,18 @@ Detect user intent from their message, then follow the corresponding workflow:
  dev-pipeline/launch-daemon.sh start feature-list.json --env "SESSION_TIMEOUT=7200 MAX_RETRIES=5"
  ```

- 6. **Verify launch**:
+ 7. **Verify launch**:
  ```bash
  dev-pipeline/launch-daemon.sh status
  ```

- 7. **Start log monitoring** -- Use the Bash tool with `run_in_background: true`:
+ 8. **Start log monitoring** -- Use the Bash tool with `run_in_background: true`:
  ```bash
  tail -f dev-pipeline/state/pipeline-daemon.log
  ```
  This runs in background so you can continue interacting with the user.

- 8. **Report to user**:
+ 9. **Report to user**:
  - Pipeline PID
  - Log file location
  - "You can ask me 'pipeline status' or 'show logs' at any time"
@@ -144,12 +144,19 @@ Add new features to an existing project (incremental mode).
  Proceed? (Y/n)
  ```

- 2. **Invoke `dev-pipeline-launcher` skill**:
+ 2. **Ask execution mode**: Before invoking the launcher, present the choice:
+ - **(1) Background daemon (recommended)**: Runs detached, survives session closure.
+ - **(2) Foreground in session**: Runs in current session with visible output. Stops if session times out.
+ - **(3) Manual — show commands**: Display commands only, no execution.
+
+ Pass the chosen mode to `dev-pipeline-launcher`.
+
+ 3. **Invoke `dev-pipeline-launcher` skill**:
  - The launcher handles all prerequisite checks
  - Starts `launch-daemon.sh` in background
  - Returns PID and log file location

- 3. **Verify launch success**:
+ 4. **Verify launch success**:
  - Confirm pipeline is running
  - Record PID and log path for Phase 3

package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "prizmkit",
- "version": "1.0.26",
+ "version": "1.0.28",
  "description": "Create a new PrizmKit-powered project with clean initialization — no framework dev files, just what you need.",
  "type": "module",
  "bin": {
package/src/index.js CHANGED
@@ -45,7 +45,6 @@ export async function runScaffold(directory, options) {
  const cliStatus = [
  detected.cbc ? chalk.green('cbc ✓') : chalk.gray('cbc ✗'),
  detected.claude ? chalk.green('claude ✓') : chalk.gray('claude ✗'),
- detected.claudeInternal ? chalk.green('claude-internal ✓') : chalk.gray('claude-internal ✗'),
  ].join(' ');
  console.log(` Detected CLI tools: ${cliStatus}`);
  console.log(` Target directory: ${projectRoot}`);
@@ -1,371 +0,0 @@
1
- ---
2
- name: "refactor-skill"
3
- tier: companion
4
- description: "Intelligent refactor review for existing skills: evaluates quality, proposes in-place upgrade, and enforces eval + graphical review after changes. (project) by using the newest skill-creator standard"
5
- ---
6
-
7
- # Refactor Skill
8
-
9
- Specialized workflow for reviewing and upgrading existing skills with measurable quality gates.
10
-
11
- ## When to Use
12
-
13
- Use this skill when user says:
14
- - "重构这个 skill", "优化技能设计", "评审并升级技能"
15
- - "review this skill and improve it"
16
- - "keep the same skill but improve quality"
17
- - "原地升级这个 skill" / "in-place upgrade this skill"
18
-
19
- Do NOT use when user only wants to run a pipeline immediately without changing skill design.
20
-
21
- ## Core Goals
22
-
23
- 1. Review current skill comprehensively and find concrete improvement points.
24
- 2. **Default and preferred mode: in-place upgrade** of the existing skill.
25
- 3. New-version fork (e.g., `-v2`) is **exception-only** and must be explicitly requested by user.
26
- 4. After any modification, **must** run standardized evaluation and graphical review.
27
-
28
- ## Context Readiness Gate (Mandatory)
29
-
30
- Before any refactor action, verify whether conversation context already contains:
31
- - target skill name/path
32
- - current project/workspace path
33
- - refactor objective and constraints (quality, speed, compatibility)
34
- - whether user explicitly requests a new-version fork (default is in-place upgrade)
35
-
36
- If any item is missing, do not block; gather context proactively:
37
- 1. Read `/core/skills/_metadata.json` to locate target skill and related neighbors.
38
- 2. Read target `SKILL.md` plus key assets/scripts under that skill.
39
- 3. Check recent evaluation artifacts under `/.codebuddy/skill-evals/` if present.
40
- 4. Ask only the minimum unresolved question(s).
41
-
42
- ## Review Dimensions (Mandatory Rubric)
43
-
44
- Assess and score each dimension (1-5):
45
-
46
- 1. **功能性 (Functionality)**
47
- - Trigger clarity and routing correctness
48
- - Workflow completeness and error recovery
49
- - Output contract correctness (schema/format compatibility)
50
-
51
- 2. **效率性 (Efficiency)**
52
- - Unnecessary steps, token/time overhead
53
- - Reusability of scripts/assets
54
- - Fast-path design and fallback strategy
55
-
56
- 3. **可维护性 (Maintainability)**
57
- - Instruction structure/readability
58
- - Coupling to environment and path robustness
59
- - Testability and observability (artifacts, checkpoints)
60
-
61
- Output a concise review summary with:
62
- - strengths
63
- - prioritized issues (P0/P1/P2)
64
- - expected impact for each fix
65
-
66
- ## Optimization Strategy Selection
67
-
68
- After review, apply **in-place upgrade** by default.
69
-
70
- ### Default Mode — In-Place Upgrade (Required Unless Explicitly Overridden)
71
- Use in almost all cases:
72
- - skill naming and contract remain stable
73
- - change scope is moderate or large but compatible
74
- - backward compatibility is required
75
-
76
- Actions:
77
- 1. edit existing skill files in place
78
- 2. preserve skill name/frontmatter compatibility
79
- 3. keep migration notes minimal
80
-
81
- ### Exception Mode — New Version via `skill-creator` (Explicit User Request Only)
82
- Only use when user clearly asks to fork a new version (e.g., `create <skill>-v2`).
83
-
84
- Additional required checks before using exception mode:
85
- - user confirms they need side-by-side old/new variants
86
- - user accepts added maintenance cost for two versions
87
-
88
- Actions:
89
- 1. copy current skill as baseline snapshot
90
- 2. create `<skill-name>-v2` and apply redesign
91
- 3. run evaluation against old version baseline
92
-
93
- ## Mandatory Post-Change Validation, Review, and Optimization Loop
94
-
95
- Run this full loop after **every** refactor (default: in-place; exception: new-version fork). Do not skip.
96
-
97
- ### Step 0: Freeze Refactor Scope (Input Gate)
98
- Capture and freeze:
99
- - target skill path
100
- - iteration id (`iteration-N`)
101
- - baseline type (`old-snapshot` preferred, fallback `without_skill`)
102
- - optimization goal for this round (quality, token, latency, or compatibility)
103
-
104
- Expected output:
105
- - one-line run plan: `skill + baseline + iteration + goal`
106
-
107
- ### Step 1: Structural Validation (Pre-Eval)
108
- Validate skill structure/frontmatter and required files first.
109
-
110
- Expected output:
111
- - validation pass/fail result
112
- - blocking fix list if failed
113
-
114
- ### Step 2: Execute Standardized Eval Runs (Mandatory)
115
- Create iteration workspace:
116
- - `/.codebuddy/skill-evals/<skill-name>-workspace/iteration-N/`
117
-
118
- Run both configurations for the same eval set in the same iteration:
119
- - `with_skill` (updated skill)
120
- - `baseline` (old snapshot or `without_skill`)
121
-
122
- Use multi-run strategy:
123
- - default: 3 runs (fast feedback)
124
- - release gate: 5 runs (stability check)
125
-
126
- Required artifacts per run:
127
- - `outputs/`
128
- - `timing.json`
129
- - `grading.json`
130
- - `eval_metadata.json` (per eval directory)
131
-
132
- Expected output:
133
- - complete run tree with paired `with_skill` vs `baseline` runs
134
- - no missing required artifact files
135
-
136
- ### Step 3: Score, Aggregate, and Build Benchmark
137
- Run grading and aggregation using standardized scripts. Keep metrics comparable across iterations.
138
-
139
- Required outputs:
140
- - `benchmark.json`
141
- - `benchmark.md`
142
-
143
- Required metrics:
144
- - pass rate
145
- - duration
146
- - token usage
147
- - with_skill vs baseline delta
148
- - variance (`stddev`) for stability judgment
149
-
150
- Expected output:
151
- - benchmark summary with clear win/lose/neutral conclusion per metric
152
-
153
- ### Step 4: Graphical Review (Mandatory via `generate_review`)
154
- Generate review UI using official `generate_review.py` (no custom viewer).
155
-
156
- Preferred modes:
157
- 1. server mode for interactive inspection
158
- 2. static HTML mode (`--static`) for headless fallback
159
-
160
- Expected output:
161
- - review entry recorded (URL or HTML path)
162
- - quick notes on representative good/bad runs linked to evidence
163
-
164
### Step 5: Analyze Results and Derive Optimization Actions
Translate benchmark and viewer evidence into prioritized actions:
- **P0**: contract/validation breakages
- **P1**: quality instability or high variance
- **P2**: token/time inefficiencies

For each action, define:
- root cause hypothesis
- exact file/section to modify
- expected metric impact
- rollback condition

Expected output:
- actionable optimization list (not generic advice)

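A single action entry might look like the following. This is a sketch, not a framework schema; the field names simply mirror the four attributes required above:

```json
{
  "priority": "P1",
  "action": "Constrain output field names in the skill's output contract",
  "root_cause_hypothesis": "Prompt permits free-form keys, causing grading variance",
  "target": "core/skills/<skill-name>/SKILL.md (Output Contract section)",
  "expected_impact": "pass-rate stddev reduced by half over 3 runs",
  "rollback_condition": "pass rate regresses on any eval in the re-run"
}
```
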
### Step 6: Implement Targeted Improvements
Apply only the actions selected for this iteration. Avoid mixing in unrelated changes, so that causal attribution stays clear.

Expected output:
- focused diff scoped to the chosen actions

### Step 7: Re-Run Evaluation and Compare Iterations
Re-run Steps 2–4 on the updated skill and compare against the previous iteration.

Decision rule:
- if goals are met with no regression: accept the iteration
- if partially improved: keep the gains and open the next iteration with a narrowed scope
- if regressed: roll back, or revise the hypothesis and repeat

Expected output:
- iteration verdict (`accepted` / `needs-next-iteration` / `rollback`)
- before/after comparison table

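The decision rule reduces to a small function. A sketch; how `goals_met` and `any_regression` are computed from the benchmark deltas is a project-specific assumption:

```python
def iteration_verdict(goals_met: bool, any_regression: bool) -> str:
    """Map Step 7 comparison outcomes onto an iteration verdict."""
    if any_regression:
        # regression: roll back (or revise the hypothesis and repeat)
        return "rollback"
    if goals_met:
        # goals met with no regression: accept the iteration
        return "accepted"
    # partial improvement: keep the gains, reopen with a narrowed scope
    return "needs-next-iteration"
```
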
### Step 8: Close the Loop (Mandatory Delivery)
Return:
1. what changed
2. measured impact (pass/time/tokens/variance deltas)
3. viewer entry
4. remaining risks
5. next iteration plan (if needed)

This closes the loop from **test review → evidence analysis → skill optimization → re-validation**.

### Standard Command Blueprint (Project-level)
Use the one-command review pipeline, with an optional grader hook:

```bash
npm run skill:review -- \
  --workspace /abs/.codebuddy/skill-evals/<skill-name>-workspace \
  --iteration iteration-N \
  --skill-name <skill-name> \
  --skill-path /abs/core/skills/<skill-name> \
  --runs 3 \
  --grader-cmd "python3 /abs/scripts/skill-evals/grade-eval-runs.py --workspace {workspace} --iteration {iteration} --validator /abs/core/skills/<skill-name>/scripts/validate-and-generate.py --baseline-input /abs/.codebuddy/skill-evals/<skill-name>-workspace/inputs/feature-list-existing.json"
```

Minimum expected deliverables per iteration:
- `<workspace>/<iteration>/benchmark.json`
- `<workspace>/<iteration>/benchmark.md`
- `<workspace>/<iteration>/review.html`
- optimization action list with priority and owner

## Execution Notes for `skill-creator` Integration

When available, follow the latest `skill-creator` evaluation/viewer workflow as the source of truth:
- parallelized run spawning (`with_skill` + `baseline`)
- assertion-based grading format compatibility
- benchmark aggregation via the official script
- viewer generation via the official script

## Output Contract of This Skill

After completion, return:
1. selected mode (`in-place` by default, or `new-version` if explicitly requested) and why
2. files changed/created
3. review rubric scores, before vs after
4. benchmark summary (pass/time/tokens delta)
5. graphical review entry (URL or static HTML path)
6. remaining risks and next-iteration suggestions

## Error Handling

- Missing target skill path: auto-discover under `/core/skills/`, then confirm.
- Missing baseline snapshot: create one before making modifications.
- Incomplete eval: mark the status as blocked and list the missing artifacts.
- Viewer runtime incompatibility: switch to `--static` mode and continue.

## Skill Registry Modification Guide

When adding or removing skills from the framework, follow this reference checklist.

### Adding a New Skill

**Step 1: Create skill definition**
```
core/skills/<skill-name>/SKILL.md    # Required: skill definition with frontmatter
core/skills/<skill-name>/assets/     # Optional: templates, configs, etc.
core/skills/<skill-name>/scripts/    # Optional: executable scripts
```
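A minimal `SKILL.md` might look like the following. This is a sketch under assumptions: the exact frontmatter fields the framework requires are enforced by its validation tests, so treat `name` and `description` as placeholders to verify against those tests:

```markdown
---
name: <skill-name>
description: Brief description of when and how to use this skill
---

# <skill-name>

Instructions the agent follows when this skill is activated.
```
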

**Step 2: Register in metadata**

Edit `core/skills/_metadata.json`:
```jsonc
{
  "skills": {
    "<skill-name>": {
      "description": "Brief description of the skill",
      "tier": "1",          // "foundation", "1", "2", or "companion"
      "category": "core",   // "core", "quality", "devops", "debugging", "documentation", "pipeline"
      "hasAssets": false,   // true if assets/ directory exists
      "hasScripts": false   // true if scripts/ directory exists
    }
  }
}
```

**Step 3: (Optional) Add to a suite**

If the skill belongs to the `core` or `minimal` suite, add it to the `suites` section in `_metadata.json`:
```json
{
  "suites": {
    "core": {
      "skills": ["<skill-name>", ...]
    }
  }
}
```

**Step 4: Regenerate derived artifacts**
```bash
# Update bundled directory for npm package
node scripts/bundle.js
```

**Step 5: Validate**
```bash
npm test
# or
node tests/validate-all.js
```

### Removing a Skill

**Step 1: Delete the skill directory**
```bash
rm -rf core/skills/<skill-name>/
```

**Step 2: Remove from metadata**

Edit `core/skills/_metadata.json`:
- Remove the entry from the `skills` object
- Remove it from any `suites` that reference it

**Step 3: Regenerate derived artifacts**
```bash
node scripts/bundle.js
```

**Step 4: Validate**
```bash
npm test
```

### Modification Checklist Summary

| File | Action | Required |
|------|--------|----------|
| `core/skills/<name>/SKILL.md` | Create/Delete | **Always** |
| `core/skills/_metadata.json` → `skills` | Add/Remove entry | **Always** |
| `core/skills/_metadata.json` → `suites` | Add/Remove from suite | If the skill belongs to a suite |
| `core/skills/<name>/assets/` | Create/Delete | If the skill has resources |
| `core/skills/<name>/scripts/` | Create/Delete | If the skill has scripts |
| `create-prizmkit/bundled/` | Regenerate via script | Automatic |

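The checklist lends itself to a quick consistency check. A sketch of a hypothetical helper (not one of the framework's bundled validators) that cross-checks `_metadata.json` against the on-disk skill directories:

```python
import json
from pathlib import Path

def registry_problems(skills_root: Path) -> list[str]:
    """Cross-check _metadata.json entries against on-disk skill directories."""
    meta = json.loads((skills_root / "_metadata.json").read_text())
    problems = []
    for name, info in meta["skills"].items():
        skill_dir = skills_root / name
        if not (skill_dir / "SKILL.md").is_file():
            problems.append(f"{name}: missing SKILL.md")
        if info.get("hasAssets", False) != (skill_dir / "assets").is_dir():
            problems.append(f"{name}: hasAssets flag does not match disk")
        if info.get("hasScripts", False) != (skill_dir / "scripts").is_dir():
            problems.append(f"{name}: hasScripts flag does not match disk")
    # Every suite member must exist in the skills object.
    for suite, spec in meta.get("suites", {}).items():
        for member in spec.get("skills", []):
            if member not in meta["skills"]:
                problems.append(f"suite {suite}: unknown skill {member}")
    return problems
```

An empty return value means the registry and the directory tree agree; anything else is a checklist step that was skipped.
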

### Documents That May Need Number Updates

**Recommendation: avoid hardcoding skill counts.** Use relative descriptions instead:
- ✅ "All skills" instead of "34 skills"
- ✅ "Core Tier 1 skills" instead of "Core Tier 1 skills (17 skills)"
- ✅ "symlink (skills)" instead of "symlink (35 skills)"

If you must include counts, maintain them in one place (`_metadata.json`) and update all references together.

Files that currently contain hardcoded skill counts:

| File | Current Pattern | Suggested Fix |
|------|-----------------|---------------|
| `README.md` | "**N Skills** covering..." | Remove the number, or use "Skills" |
| `CODEBUDDY.md` | "**N Skills** — ..." | Remove the number |
| `PK-Construct-Guide.md` | "N skills — each skill..." | Remove the number |
| `PK-Evolving-User-Guide.md` | "symlink (N skills)" | Use "symlink (skills)" |
| `core/skills/prizm-kit/SKILL.md` | "## Skill Inventory (N skills)" | Use "## Skill Inventory" |
| `core/skills/_metadata.json` | `"description": "All N skills"` | Use `"description": "All skills"` |

To find hardcoded numbers:
```bash
grep -rn "[0-9]\+ skills\|[0-9]\+ Skills" --include="*.md" --include="*.json" .
```

## Path Rules

- Prefer absolute paths in execution commands.
- Keep path references in instructions portable where possible (e.g., `${SKILL_DIR}` for intra-skill files).
- Never delete the `.codebuddy` directory.