@kennethsolomon/shipkit 1.0.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +321 -0
- package/bin/shipkit.js +146 -0
- package/commands/sk/brainstorm.md +63 -0
- package/commands/sk/branch.md +35 -0
- package/commands/sk/config.md +96 -0
- package/commands/sk/execute-plan.md +85 -0
- package/commands/sk/features.md +238 -0
- package/commands/sk/finish-feature.md +154 -0
- package/commands/sk/help.md +103 -0
- package/commands/sk/hotfix.md +61 -0
- package/commands/sk/plan.md +30 -0
- package/commands/sk/release.md +72 -0
- package/commands/sk/security-check.md +188 -0
- package/commands/sk/set-profile.md +71 -0
- package/commands/sk/status.md +25 -0
- package/commands/sk/update-task.md +35 -0
- package/commands/sk/write-plan.md +72 -0
- package/package.json +23 -0
- package/skills/sk:accessibility/LICENSE.txt +177 -0
- package/skills/sk:accessibility/SKILL.md +150 -0
- package/skills/sk:api-design/LICENSE.txt +177 -0
- package/skills/sk:api-design/SKILL.md +158 -0
- package/skills/sk:brainstorming/SKILL.md +124 -0
- package/skills/sk:debug/SKILL.md +252 -0
- package/skills/sk:debug/debug_conductor.py +177 -0
- package/skills/sk:debug/lib/__init__.py +1 -0
- package/skills/sk:debug/lib/bug_gatherer.py +55 -0
- package/skills/sk:debug/lib/context_reader.py +139 -0
- package/skills/sk:debug/lib/findings_writer.py +76 -0
- package/skills/sk:debug/lib/lessons_writer.py +165 -0
- package/skills/sk:debug/lib/step_runner.py +326 -0
- package/skills/sk:features/SKILL.md +238 -0
- package/skills/sk:frontend-design/LICENSE.txt +177 -0
- package/skills/sk:frontend-design/SKILL.md +191 -0
- package/skills/sk:laravel-init/SKILL.md +37 -0
- package/skills/sk:laravel-new/SKILL.md +68 -0
- package/skills/sk:lint/SKILL.md +113 -0
- package/skills/sk:perf/LICENSE.txt +177 -0
- package/skills/sk:perf/SKILL.md +188 -0
- package/skills/sk:release/SKILL.md +113 -0
- package/skills/sk:release/references/android-checklist.md +269 -0
- package/skills/sk:release/references/ios-checklist.md +339 -0
- package/skills/sk:release/release.sh +378 -0
- package/skills/sk:review/SKILL.md +346 -0
- package/skills/sk:review/references/security-checklist.md +223 -0
- package/skills/sk:schema-migrate/SKILL.md +125 -0
- package/skills/sk:schema-migrate/orms/drizzle.md +546 -0
- package/skills/sk:schema-migrate/orms/laravel.md +367 -0
- package/skills/sk:schema-migrate/orms/prisma.md +357 -0
- package/skills/sk:schema-migrate/orms/rails.md +351 -0
- package/skills/sk:schema-migrate/orms/sqlalchemy.md +385 -0
- package/skills/sk:schema-migrate/references/detection.md +110 -0
- package/skills/sk:setup-claude/SKILL.md +365 -0
- package/skills/sk:setup-claude/references/detection.md +6 -0
- package/skills/sk:setup-claude/references/templates.md +11 -0
- package/skills/sk:setup-claude/scripts/apply_setup_claude.py +443 -0
- package/skills/sk:setup-claude/scripts/detect_arch_changes.py +437 -0
- package/skills/sk:setup-claude/templates/.claude/docs/arch-changelog-guide.md.template +6 -0
- package/skills/sk:setup-claude/templates/.claude/docs/changelog-guide.md.template +12 -0
- package/skills/sk:setup-claude/templates/CHANGELOG.md.template +21 -0
- package/skills/sk:setup-claude/templates/CLAUDE.md.template +299 -0
- package/skills/sk:setup-claude/templates/arch-changelog-guide.md.template +3 -0
- package/skills/sk:setup-claude/templates/changelog-guide.md.template +3 -0
- package/skills/sk:setup-claude/templates/commands/brainstorm.md.template +74 -0
- package/skills/sk:setup-claude/templates/commands/execute-plan.md.template +57 -0
- package/skills/sk:setup-claude/templates/commands/features.md.template +238 -0
- package/skills/sk:setup-claude/templates/commands/finish-feature.md.template +155 -0
- package/skills/sk:setup-claude/templates/commands/plan.md.template +30 -0
- package/skills/sk:setup-claude/templates/commands/re-setup.md.template +38 -0
- package/skills/sk:setup-claude/templates/commands/release.md.template +74 -0
- package/skills/sk:setup-claude/templates/commands/security-check.md.template +172 -0
- package/skills/sk:setup-claude/templates/commands/status.md.template +17 -0
- package/skills/sk:setup-claude/templates/commands/write-plan.md.template +34 -0
- package/skills/sk:setup-claude/templates/finish-feature.md.template +3 -0
- package/skills/sk:setup-claude/templates/plan.md.template +3 -0
- package/skills/sk:setup-claude/templates/status.md.template +3 -0
- package/skills/sk:setup-claude/templates/tasks/findings.md.template +19 -0
- package/skills/sk:setup-claude/templates/tasks/lessons.md.template +26 -0
- package/skills/sk:setup-claude/templates/tasks/progress.md.template +20 -0
- package/skills/sk:setup-claude/templates/tasks/security-findings.md.template +5 -0
- package/skills/sk:setup-claude/templates/tasks/todo.md.template +26 -0
- package/skills/sk:setup-claude/templates/tasks/workflow-status.md.template +31 -0
- package/skills/sk:setup-claude/templates/tasks-findings.md.template +3 -0
- package/skills/sk:setup-claude/templates/tasks-lessons.md.template +3 -0
- package/skills/sk:setup-claude/templates/tasks-progress.md.template +3 -0
- package/skills/sk:setup-claude/templates/tasks-todo.md.template +3 -0
- package/skills/sk:setup-claude/tests/test_apply_setup_claude.py +193 -0
- package/skills/sk:setup-optimizer/SKILL.md +184 -0
- package/skills/sk:setup-optimizer/lib/__init__.py +24 -0
- package/skills/sk:setup-optimizer/lib/detect.py +205 -0
- package/skills/sk:setup-optimizer/lib/discover.py +221 -0
- package/skills/sk:setup-optimizer/lib/enrich.py +163 -0
- package/skills/sk:setup-optimizer/lib/merge.py +277 -0
- package/skills/sk:setup-optimizer/lib/sidecar.py +129 -0
- package/skills/sk:setup-optimizer/optimize_claude.py +174 -0
- package/skills/sk:setup-optimizer/templates/CLAUDE.md.template +105 -0
- package/skills/sk:skill-creator/LICENSE.txt +202 -0
- package/skills/sk:skill-creator/SKILL.md +479 -0
- package/skills/sk:skill-creator/agents/analyzer.md +274 -0
- package/skills/sk:skill-creator/agents/comparator.md +202 -0
- package/skills/sk:skill-creator/agents/grader.md +223 -0
- package/skills/sk:skill-creator/assets/eval_review.html +146 -0
- package/skills/sk:skill-creator/eval-viewer/generate_review.py +471 -0
- package/skills/sk:skill-creator/eval-viewer/viewer.html +1325 -0
- package/skills/sk:skill-creator/references/schemas.md +430 -0
- package/skills/sk:skill-creator/scripts/aggregate_benchmark.py +401 -0
- package/skills/sk:skill-creator/scripts/generate_report.py +326 -0
- package/skills/sk:skill-creator/scripts/improve_description.py +248 -0
- package/skills/sk:skill-creator/scripts/package_skill.py +136 -0
- package/skills/sk:skill-creator/scripts/quick_validate.py +103 -0
- package/skills/sk:skill-creator/scripts/run_eval.py +310 -0
- package/skills/sk:skill-creator/scripts/run_loop.py +332 -0
- package/skills/sk:skill-creator/scripts/utils.py +47 -0
- package/skills/sk:smart-commit/SKILL.md +175 -0
- package/skills/sk:test/SKILL.md +171 -0
- package/skills/sk:write-tests/SKILL.md +195 -0
- package/skills/sk:write-tests/references/patterns.md +209 -0
@@ -0,0 +1,332 @@
+#!/usr/bin/env python3
+"""Run the eval + improve loop until all pass or max iterations reached.
+
+Combines run_eval.py and improve_description.py in a loop, tracking history
+and returning the best description found. Supports train/test split to prevent
+overfitting.
+"""
+
+import argparse
+import json
+import random
+import sys
+import tempfile
+import time
+import webbrowser
+from pathlib import Path
+
+import anthropic
+
+from scripts.generate_report import generate_html
+from scripts.improve_description import improve_description
+from scripts.run_eval import find_project_root, run_eval
+from scripts.utils import parse_skill_md
+
+
+def split_eval_set(eval_set: list[dict], holdout: float, seed: int = 42) -> tuple[list[dict], list[dict]]:
+    """Split eval set into train and test sets, stratified by should_trigger."""
+    random.seed(seed)
+
+    # Separate by should_trigger
+    trigger = [e for e in eval_set if e["should_trigger"]]
+    no_trigger = [e for e in eval_set if not e["should_trigger"]]
+
+    # Shuffle each group
+    random.shuffle(trigger)
+    random.shuffle(no_trigger)
+
+    # Calculate split points
+    n_trigger_test = max(1, int(len(trigger) * holdout))
+    n_no_trigger_test = max(1, int(len(no_trigger) * holdout))
+
+    # Split
+    test_set = trigger[:n_trigger_test] + no_trigger[:n_no_trigger_test]
+    train_set = trigger[n_trigger_test:] + no_trigger[n_no_trigger_test:]
+
+    return train_set, test_set
+
+
+def run_loop(
+    eval_set: list[dict],
+    skill_path: Path,
+    description_override: str | None,
+    num_workers: int,
+    timeout: int,
+    max_iterations: int,
+    runs_per_query: int,
+    trigger_threshold: float,
+    holdout: float,
+    model: str,
+    verbose: bool,
+    live_report_path: Path | None = None,
+    log_dir: Path | None = None,
+) -> dict:
+    """Run the eval + improvement loop."""
+    project_root = find_project_root()
+    name, original_description, content = parse_skill_md(skill_path)
+    current_description = description_override or original_description
+
+    # Split into train/test if holdout > 0
+    if holdout > 0:
+        train_set, test_set = split_eval_set(eval_set, holdout)
+        if verbose:
+            print(f"Split: {len(train_set)} train, {len(test_set)} test (holdout={holdout})", file=sys.stderr)
+    else:
+        train_set = eval_set
+        test_set = []
+
+    client = anthropic.Anthropic()
+    history = []
+    exit_reason = "unknown"
+
+    for iteration in range(1, max_iterations + 1):
+        if verbose:
+            print(f"\n{'='*60}", file=sys.stderr)
+            print(f"Iteration {iteration}/{max_iterations}", file=sys.stderr)
+            print(f"Description: {current_description}", file=sys.stderr)
+            print(f"{'='*60}", file=sys.stderr)
+
+        # Evaluate train + test together in one batch for parallelism
+        all_queries = train_set + test_set
+        t0 = time.time()
+        all_results = run_eval(
+            eval_set=all_queries,
+            skill_name=name,
+            description=current_description,
+            num_workers=num_workers,
+            timeout=timeout,
+            project_root=project_root,
+            runs_per_query=runs_per_query,
+            trigger_threshold=trigger_threshold,
+            model=model,
+        )
+        eval_elapsed = time.time() - t0
+
+        # Split results back into train/test by matching queries
+        train_queries_set = {q["query"] for q in train_set}
+        train_result_list = [r for r in all_results["results"] if r["query"] in train_queries_set]
+        test_result_list = [r for r in all_results["results"] if r["query"] not in train_queries_set]
+
+        train_passed = sum(1 for r in train_result_list if r["pass"])
+        train_total = len(train_result_list)
+        train_summary = {"passed": train_passed, "failed": train_total - train_passed, "total": train_total}
+        train_results = {"results": train_result_list, "summary": train_summary}
+
+        if test_set:
+            test_passed = sum(1 for r in test_result_list if r["pass"])
+            test_total = len(test_result_list)
+            test_summary = {"passed": test_passed, "failed": test_total - test_passed, "total": test_total}
+            test_results = {"results": test_result_list, "summary": test_summary}
+        else:
+            test_results = None
+            test_summary = None
+
+        history.append({
+            "iteration": iteration,
+            "description": current_description,
+            "train_passed": train_summary["passed"],
+            "train_failed": train_summary["failed"],
+            "train_total": train_summary["total"],
+            "train_results": train_results["results"],
+            "test_passed": test_summary["passed"] if test_summary else None,
+            "test_failed": test_summary["failed"] if test_summary else None,
+            "test_total": test_summary["total"] if test_summary else None,
+            "test_results": test_results["results"] if test_results else None,
+            # For backward compat with report generator
+            "passed": train_summary["passed"],
+            "failed": train_summary["failed"],
+            "total": train_summary["total"],
+            "results": train_results["results"],
+        })
+
+        # Write live report if path provided
+        if live_report_path:
+            partial_output = {
+                "original_description": original_description,
+                "best_description": current_description,
+                "best_score": "in progress",
+                "iterations_run": len(history),
+                "holdout": holdout,
+                "train_size": len(train_set),
+                "test_size": len(test_set),
+                "history": history,
+            }
+            live_report_path.write_text(generate_html(partial_output, auto_refresh=True, skill_name=name))
+
+        if verbose:
+            def print_eval_stats(label, results, elapsed):
+                pos = [r for r in results if r["should_trigger"]]
+                neg = [r for r in results if not r["should_trigger"]]
+                tp = sum(r["triggers"] for r in pos)
+                pos_runs = sum(r["runs"] for r in pos)
+                fn = pos_runs - tp
+                fp = sum(r["triggers"] for r in neg)
+                neg_runs = sum(r["runs"] for r in neg)
+                tn = neg_runs - fp
+                total = tp + tn + fp + fn
+                precision = tp / (tp + fp) if (tp + fp) > 0 else 1.0
+                recall = tp / (tp + fn) if (tp + fn) > 0 else 1.0
+                accuracy = (tp + tn) / total if total > 0 else 0.0
+                print(f"{label}: {tp+tn}/{total} correct, precision={precision:.0%} recall={recall:.0%} accuracy={accuracy:.0%} ({elapsed:.1f}s)", file=sys.stderr)
+                for r in results:
+                    status = "PASS" if r["pass"] else "FAIL"
+                    rate_str = f"{r['triggers']}/{r['runs']}"
+                    print(f" [{status}] rate={rate_str} expected={r['should_trigger']}: {r['query'][:60]}", file=sys.stderr)
+
+            print_eval_stats("Train", train_results["results"], eval_elapsed)
+            if test_summary:
+                print_eval_stats("Test ", test_results["results"], 0)
+
+        if train_summary["failed"] == 0:
+            exit_reason = f"all_passed (iteration {iteration})"
+            if verbose:
+                print(f"\nAll train queries passed on iteration {iteration}!", file=sys.stderr)
+            break
+
+        if iteration == max_iterations:
+            exit_reason = f"max_iterations ({max_iterations})"
+            if verbose:
+                print(f"\nMax iterations reached ({max_iterations}).", file=sys.stderr)
+            break
+
+        # Improve the description based on train results
+        if verbose:
+            print(f"\nImproving description...", file=sys.stderr)
+
+        t0 = time.time()
+        # Strip test scores from history so improvement model can't see them
+        blinded_history = [
+            {k: v for k, v in h.items() if not k.startswith("test_")}
+            for h in history
+        ]
+        new_description = improve_description(
+            client=client,
+            skill_name=name,
+            skill_content=content,
+            current_description=current_description,
+            eval_results=train_results,
+            history=blinded_history,
+            model=model,
+            log_dir=log_dir,
+            iteration=iteration,
+        )
+        improve_elapsed = time.time() - t0
+
+        if verbose:
+            print(f"Proposed ({improve_elapsed:.1f}s): {new_description}", file=sys.stderr)
+
+        current_description = new_description
+
+    # Find the best iteration by TEST score (or train if no test set)
+    if test_set:
+        best = max(history, key=lambda h: h["test_passed"] or 0)
+        best_score = f"{best['test_passed']}/{best['test_total']}"
+    else:
+        best = max(history, key=lambda h: h["train_passed"])
+        best_score = f"{best['train_passed']}/{best['train_total']}"
+
+    if verbose:
+        print(f"\nExit reason: {exit_reason}", file=sys.stderr)
+        print(f"Best score: {best_score} (iteration {best['iteration']})", file=sys.stderr)
+
+    return {
+        "exit_reason": exit_reason,
+        "original_description": original_description,
+        "best_description": best["description"],
+        "best_score": best_score,
+        "best_train_score": f"{best['train_passed']}/{best['train_total']}",
+        "best_test_score": f"{best['test_passed']}/{best['test_total']}" if test_set else None,
+        "final_description": current_description,
+        "iterations_run": len(history),
+        "holdout": holdout,
+        "train_size": len(train_set),
+        "test_size": len(test_set),
+        "history": history,
+    }
+
+
+def main():
+    parser = argparse.ArgumentParser(description="Run eval + improve loop")
+    parser.add_argument("--eval-set", required=True, help="Path to eval set JSON file")
+    parser.add_argument("--skill-path", required=True, help="Path to skill directory")
+    parser.add_argument("--description", default=None, help="Override starting description")
+    parser.add_argument("--num-workers", type=int, default=10, help="Number of parallel workers")
+    parser.add_argument("--timeout", type=int, default=30, help="Timeout per query in seconds")
+    parser.add_argument("--max-iterations", type=int, default=5, help="Max improvement iterations")
+    parser.add_argument("--runs-per-query", type=int, default=3, help="Number of runs per query")
+    parser.add_argument("--trigger-threshold", type=float, default=0.5, help="Trigger rate threshold")
+    parser.add_argument("--holdout", type=float, default=0.4, help="Fraction of eval set to hold out for testing (0 to disable)")
+    parser.add_argument("--model", required=True, help="Model for improvement")
+    parser.add_argument("--verbose", action="store_true", help="Print progress to stderr")
+    parser.add_argument("--report", default="auto", help="Generate HTML report at this path (default: 'auto' for temp file, 'none' to disable)")
+    parser.add_argument("--results-dir", default=None, help="Save all outputs (results.json, report.html, log.txt) to a timestamped subdirectory here")
+    args = parser.parse_args()
+
+    eval_set = json.loads(Path(args.eval_set).read_text())
+    skill_path = Path(args.skill_path)
+
+    if not (skill_path / "SKILL.md").exists():
+        print(f"Error: No SKILL.md found at {skill_path}", file=sys.stderr)
+        sys.exit(1)
+
+    name, _, _ = parse_skill_md(skill_path)
+
+    # Set up live report path
+    if args.report != "none":
+        if args.report == "auto":
+            timestamp = time.strftime("%Y%m%d_%H%M%S")
+            live_report_path = Path(tempfile.gettempdir()) / f"skill_description_report_{skill_path.name}_{timestamp}.html"
+        else:
+            live_report_path = Path(args.report)
+        # Open the report immediately so the user can watch
+        live_report_path.write_text("<html><body><h1>Starting optimization loop...</h1><meta http-equiv='refresh' content='5'></body></html>")
+        webbrowser.open(str(live_report_path))
+    else:
+        live_report_path = None
+
+    # Determine output directory (create before run_loop so logs can be written)
+    if args.results_dir:
+        timestamp = time.strftime("%Y-%m-%d_%H%M%S")
+        results_dir = Path(args.results_dir) / timestamp
+        results_dir.mkdir(parents=True, exist_ok=True)
+    else:
+        results_dir = None
+
+    log_dir = results_dir / "logs" if results_dir else None
+
+    output = run_loop(
+        eval_set=eval_set,
+        skill_path=skill_path,
+        description_override=args.description,
+        num_workers=args.num_workers,
+        timeout=args.timeout,
+        max_iterations=args.max_iterations,
+        runs_per_query=args.runs_per_query,
+        trigger_threshold=args.trigger_threshold,
+        holdout=args.holdout,
+        model=args.model,
+        verbose=args.verbose,
+        live_report_path=live_report_path,
+        log_dir=log_dir,
+    )
+
+    # Save JSON output
+    json_output = json.dumps(output, indent=2)
+    print(json_output)
+    if results_dir:
+        (results_dir / "results.json").write_text(json_output)
+
+    # Write final HTML report (without auto-refresh)
+    if live_report_path:
+        live_report_path.write_text(generate_html(output, auto_refresh=False, skill_name=name))
+        print(f"\nReport: {live_report_path}", file=sys.stderr)
+
+    if results_dir and live_report_path:
+        (results_dir / "report.html").write_text(generate_html(output, auto_refresh=False, skill_name=name))
+
+    if results_dir:
+        print(f"Results saved to: {results_dir}", file=sys.stderr)
+
+
+if __name__ == "__main__":
+    main()
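The `split_eval_set` helper above stratifies by `should_trigger` before splitting, so both classes appear in train and test. A self-contained sketch (the helper reproduced here verbatim so the snippet runs on its own) shows the resulting sizes for a 10-query eval set at the default `--holdout 0.4`:

```python
import random

def split_eval_set(eval_set, holdout, seed=42):
    # Copy of the packaged helper, included so this example is standalone.
    random.seed(seed)
    trigger = [e for e in eval_set if e["should_trigger"]]
    no_trigger = [e for e in eval_set if not e["should_trigger"]]
    random.shuffle(trigger)
    random.shuffle(no_trigger)
    # max(1, ...) guarantees at least one test query per class
    n_trigger_test = max(1, int(len(trigger) * holdout))
    n_no_trigger_test = max(1, int(len(no_trigger) * holdout))
    test_set = trigger[:n_trigger_test] + no_trigger[:n_no_trigger_test]
    train_set = trigger[n_trigger_test:] + no_trigger[n_no_trigger_test:]
    return train_set, test_set

# 5 positive + 5 negative queries
evals = [{"query": f"q{i}", "should_trigger": i % 2 == 0} for i in range(10)]
train, test = split_eval_set(evals, holdout=0.4)
print(len(train), len(test))  # 6 4 — two test queries drawn from each class
```

Note the floor of one test query per class: even a tiny eval set with `holdout > 0` always reserves something to score against.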
@@ -0,0 +1,47 @@
+"""Shared utilities for skill-creator scripts."""
+
+from pathlib import Path
+
+
+
+def parse_skill_md(skill_path: Path) -> tuple[str, str, str]:
+    """Parse a SKILL.md file, returning (name, description, full_content)."""
+    content = (skill_path / "SKILL.md").read_text()
+    lines = content.split("\n")
+
+    if lines[0].strip() != "---":
+        raise ValueError("SKILL.md missing frontmatter (no opening ---)")
+
+    end_idx = None
+    for i, line in enumerate(lines[1:], start=1):
+        if line.strip() == "---":
+            end_idx = i
+            break
+
+    if end_idx is None:
+        raise ValueError("SKILL.md missing frontmatter (no closing ---)")
+
+    name = ""
+    description = ""
+    frontmatter_lines = lines[1:end_idx]
+    i = 0
+    while i < len(frontmatter_lines):
+        line = frontmatter_lines[i]
+        if line.startswith("name:"):
+            name = line[len("name:"):].strip().strip('"').strip("'")
+        elif line.startswith("description:"):
+            value = line[len("description:"):].strip()
+            # Handle YAML multiline indicators (>, |, >-, |-)
+            if value in (">", "|", ">-", "|-"):
+                continuation_lines: list[str] = []
+                i += 1
+                while i < len(frontmatter_lines) and (frontmatter_lines[i].startswith(" ") or frontmatter_lines[i].startswith("\t")):
+                    continuation_lines.append(frontmatter_lines[i].strip())
+                    i += 1
+                description = " ".join(continuation_lines)
+                continue
+            else:
+                description = value.strip('"').strip("'")
+        i += 1
+
+    return name, description, content
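The trickiest branch in `parse_skill_md` is the YAML block-scalar handling: when `description:` is followed by `>`, `|`, `>-`, or `|-`, the value continues on the indented lines below and gets joined with spaces. A condensed, self-contained rerun of that logic (illustrative only, not an import of the packaged module) against a frontmatter using `>-`:

```python
import tempfile
from pathlib import Path

# Write a minimal SKILL.md with a folded multiline description.
skill_dir = Path(tempfile.mkdtemp())
(skill_dir / "SKILL.md").write_text(
    "---\n"
    "name: sk:demo\n"
    "description: >-\n"
    "  Analyze staged changes and\n"
    "  generate commit messages.\n"
    "---\n"
    "# Demo\n"
)

lines = (skill_dir / "SKILL.md").read_text().split("\n")
# Frontmatter is everything between the opening and closing "---".
end_idx = next(i for i, l in enumerate(lines[1:], start=1) if l.strip() == "---")
fm = lines[1:end_idx]

name, description = "", ""
i = 0
while i < len(fm):
    line = fm[i]
    if line.startswith("name:"):
        name = line[len("name:"):].strip().strip('"').strip("'")
    elif line.startswith("description:"):
        value = line[len("description:"):].strip()
        if value in (">", "|", ">-", "|-"):
            # Block scalar: consume indented continuation lines, join with spaces.
            cont = []
            i += 1
            while i < len(fm) and fm[i][:1] in (" ", "\t"):
                cont.append(fm[i].strip())
                i += 1
            description = " ".join(cont)
            continue
        description = value.strip('"').strip("'")
    i += 1

print(name)         # sk:demo
print(description)  # Analyze staged changes and generate commit messages.
```

Note this is a line-oriented approximation, not a full YAML parser: it joins with spaces regardless of the folded-vs-literal distinction, which matches the packaged code's behavior.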
@@ -0,0 +1,175 @@
+---
+name: sk:smart-commit
+description: "Analyze staged changes, auto-detect commit type, and generate conventional commit messages with approval workflow."
+---
+
+# Smart Conventional Commits
+
+## Overview
+
+Analyze staged git changes, auto-classify the commit type, detect scope from file paths, and generate a conventional commit message. Present for approval before committing.
+
+## Safety Contract
+
+- **Never** use `--no-verify` — always run pre-commit hooks
+- **Never** auto-commit — always present the message for approval first
+- **Never** force-push or amend without explicit user request
+- **Warn immediately** if on `main` or `master` branch
+
+## Allowed Tools
+
+Bash, Read
+
+## Steps
+
+You MUST complete these steps in order:
+
+### 1. Read Progress Context (Optional)
+
+If `tasks/progress.md` exists, read the most recent Work Log entry. Use this to
+understand *why* the staged changes were made — include a concise rationale in
+the commit body when the reason isn't obvious from the diff alone.
+
+### 2. Check Branch Safety
+
+```bash
+git branch --show-current
+```
+
+If the current branch is `main` or `master`, warn the user:
+
+> **Warning:** You are on `main`. Commits should typically go on feature branches. Continue anyway?
+
+Wait for confirmation before proceeding. If denied, stop.
+
+### 3. Analyze Working Tree
+
+```bash
+git status --short
+git diff --staged --stat
+```
+
+Report what's staged vs unstaged.
+
+### 4. Handle Nothing Staged
+
+If nothing is staged (`git diff --staged` is empty):
+
+- Show unstaged changes grouped by directory
+- Suggest logical groupings (e.g., "all files in `src/auth/`" or "all test files")
+- Ask the user what to stage
+- Stage the selected files with `git add`
+- If the user wants to stage everything, use `git add` with specific file paths (never `git add -A`)
+
+### 5. Read Full Diff
+
+```bash
+git diff --staged
+```
+
+Read the full diff to understand the nature of the changes.
+
+### 6. Auto-Classify Commit Type
+
+Classify based on what changed:
+
+| Type | When |
+|------|------|
+| `feat` | New functionality, new files with business logic, new API endpoints |
+| `fix` | Bug fixes, error corrections, fixing broken behavior |
+| `refactor` | Code restructuring without behavior change |
+| `test` | Adding or updating tests only |
+| `docs` | Documentation changes only (README, comments, JSDoc) |
+| `style` | Formatting, whitespace, semicolons — no logic change |
+| `perf` | Performance improvements |
+| `chore` | Dependencies, config, tooling, CI — no production code |
+| `ci` | CI/CD pipeline changes only |
+| `build` | Build system or external dependency changes |
+
+If changes span multiple types, use the most significant one (feat > fix > refactor > others).
+
+### 7. Detect Scope
+
+Determine scope from file paths:
+
+- Single directory: use directory name (e.g., `auth`, `api`, `components`)
+- Single file type: use the domain (e.g., `config`, `deps`)
+- Cross-cutting: omit scope
+
+### 8. Generate Commit Message
+
+Format: `type(scope): description`
+
+Rules:
+- Imperative mood ("add", "fix", "update" — not "added", "fixes", "updated")
+- Under 72 characters for the subject line
+- Lowercase first word after colon
+- No period at end
+- If the change is complex, add a body separated by blank line with bullet points
+
+Example:
+```
+feat(auth): add JWT refresh token rotation
+
+- Store refresh tokens in httpOnly cookies
+- Add 7-day expiry with sliding window
+- Invalidate old tokens on rotation
+```
+
+### 9. Present for Approval
+
+Show the generated message and ask the user to choose:
+
+1. **Commit** — execute as-is
+2. **Edit** — let the user modify the message
+3. **Split** — help break into multiple smaller commits
+4. **Cancel** — abort
+
+### 10. Execute Commit
+
+Use a heredoc to preserve formatting:
+
+```bash
+git commit -m "$(cat <<'EOF'
+type(scope): description
+
+Optional body here.
+EOF
+)"
+```
+
+After committing, show the result:
+
+```bash
+git log -1 --oneline
+```
+
+### 11. Continue?
+
+Check if there are remaining unstaged changes:
+
+```bash
+git status --short
+```
+
+If changes remain, ask: "There are more changes. Stage and commit another batch?"
+
+If yes, go back to step 4. If no, done.
+
+---
+
+## Model Routing
+
+Read `.shipkit/sk:config.json` from the project root if it exists.
+
+- If `model_overrides["sk:smart-commit"]` is set, use that model — it takes precedence.
+- Otherwise use the `profile` field. Default: `balanced`.
+
+| Profile | Model |
+|---------|-------|
+| `full-sail` | haiku |
+| `quality` | haiku |
+| `balanced` | haiku |
+| `budget` | haiku |
+
+> `opus` = inherit (uses the current session model). When spawning sub-agents via the Agent tool, pass `model: "<resolved-model>"`.