claude-turing 4.6.0 → 4.8.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.claude-plugin/plugin.json +2 -2
- package/README.md +1 -1
- package/commands/ablate.md +0 -1
- package/commands/annotate.md +0 -1
- package/commands/archive.md +0 -1
- package/commands/audit.md +0 -1
- package/commands/baseline.md +0 -1
- package/commands/brief.md +0 -1
- package/commands/budget.md +0 -1
- package/commands/calibrate.md +0 -1
- package/commands/card.md +0 -1
- package/commands/changelog.md +0 -1
- package/commands/checkpoint.md +0 -1
- package/commands/cite.md +0 -1
- package/commands/compare.md +0 -1
- package/commands/counterfactual.md +0 -1
- package/commands/curriculum.md +0 -1
- package/commands/design.md +0 -1
- package/commands/diagnose.md +0 -1
- package/commands/diff.md +0 -1
- package/commands/distill.md +0 -1
- package/commands/doctor.md +0 -1
- package/commands/ensemble.md +0 -1
- package/commands/explore.md +0 -1
- package/commands/export.md +0 -1
- package/commands/feature.md +0 -1
- package/commands/flashback.md +0 -1
- package/commands/fork.md +0 -1
- package/commands/frontier.md +0 -1
- package/commands/init.md +0 -1
- package/commands/leak.md +0 -1
- package/commands/lit.md +0 -1
- package/commands/logbook.md +0 -1
- package/commands/merge.md +0 -1
- package/commands/mode.md +0 -1
- package/commands/onboard.md +0 -1
- package/commands/paper.md +0 -1
- package/commands/plan.md +0 -1
- package/commands/poster.md +0 -1
- package/commands/postmortem.md +0 -1
- package/commands/preflight.md +0 -1
- package/commands/present.md +0 -1
- package/commands/profile.md +0 -1
- package/commands/prune.md +0 -1
- package/commands/quantize.md +0 -1
- package/commands/queue.md +0 -1
- package/commands/registry.md +0 -1
- package/commands/regress.md +0 -1
- package/commands/replay.md +0 -1
- package/commands/report.md +0 -1
- package/commands/reproduce.md +0 -1
- package/commands/retry.md +0 -1
- package/commands/review.md +0 -1
- package/commands/sanity.md +0 -1
- package/commands/scale.md +0 -1
- package/commands/search.md +0 -1
- package/commands/seed.md +0 -1
- package/commands/sensitivity.md +0 -1
- package/commands/share.md +0 -1
- package/commands/simulate.md +0 -1
- package/commands/status.md +0 -1
- package/commands/stitch.md +0 -1
- package/commands/suggest.md +0 -1
- package/commands/surgery.md +0 -1
- package/commands/sweep.md +0 -1
- package/commands/template.md +0 -1
- package/commands/train.md +0 -1
- package/commands/transfer.md +0 -1
- package/commands/trend.md +0 -1
- package/commands/try.md +0 -1
- package/commands/turing.md +3 -3
- package/commands/update.md +0 -1
- package/commands/validate.md +0 -1
- package/commands/warm.md +0 -1
- package/commands/watch.md +0 -1
- package/commands/whatif.md +0 -1
- package/commands/xray.md +0 -1
- package/config/commands.yaml +74 -74
- package/package.json +10 -3
- package/skills/turing/SKILL.md +180 -0
- package/skills/turing/ablate/SKILL.md +46 -0
- package/skills/turing/annotate/SKILL.md +22 -0
- package/skills/turing/archive/SKILL.md +22 -0
- package/skills/turing/audit/SKILL.md +55 -0
- package/skills/turing/baseline/SKILL.md +44 -0
- package/skills/turing/brief/SKILL.md +94 -0
- package/skills/turing/budget/SKILL.md +51 -0
- package/skills/turing/calibrate/SKILL.md +46 -0
- package/skills/turing/card/SKILL.md +35 -0
- package/skills/turing/changelog/SKILL.md +21 -0
- package/skills/turing/checkpoint/SKILL.md +46 -0
- package/skills/turing/cite/SKILL.md +22 -0
- package/skills/turing/compare/SKILL.md +23 -0
- package/skills/turing/counterfactual/SKILL.md +26 -0
- package/skills/turing/curriculum/SKILL.md +42 -0
- package/skills/turing/design/SKILL.md +96 -0
- package/skills/turing/diagnose/SKILL.md +51 -0
- package/skills/turing/diff/SKILL.md +47 -0
- package/skills/turing/distill/SKILL.md +55 -0
- package/skills/turing/doctor/SKILL.md +30 -0
- package/skills/turing/ensemble/SKILL.md +53 -0
- package/skills/turing/explore/SKILL.md +106 -0
- package/skills/turing/export/SKILL.md +47 -0
- package/skills/turing/feature/SKILL.md +41 -0
- package/skills/turing/flashback/SKILL.md +21 -0
- package/skills/turing/fork/SKILL.md +39 -0
- package/skills/turing/frontier/SKILL.md +44 -0
- package/skills/turing/init/SKILL.md +153 -0
- package/skills/turing/leak/SKILL.md +46 -0
- package/skills/turing/lit/SKILL.md +46 -0
- package/skills/turing/logbook/SKILL.md +50 -0
- package/skills/turing/merge/SKILL.md +23 -0
- package/skills/turing/mode/SKILL.md +42 -0
- package/skills/turing/onboard/SKILL.md +19 -0
- package/skills/turing/paper/SKILL.md +43 -0
- package/skills/turing/plan/SKILL.md +26 -0
- package/skills/turing/poster/SKILL.md +88 -0
- package/skills/turing/postmortem/SKILL.md +27 -0
- package/skills/turing/preflight/SKILL.md +74 -0
- package/skills/turing/present/SKILL.md +22 -0
- package/skills/turing/profile/SKILL.md +42 -0
- package/skills/turing/prune/SKILL.md +25 -0
- package/skills/turing/quantize/SKILL.md +23 -0
- package/skills/turing/queue/SKILL.md +47 -0
- package/skills/turing/registry/SKILL.md +30 -0
- package/skills/turing/regress/SKILL.md +52 -0
- package/skills/turing/replay/SKILL.md +22 -0
- package/skills/turing/report/SKILL.md +96 -0
- package/skills/turing/reproduce/SKILL.md +47 -0
- package/skills/turing/retry/SKILL.md +40 -0
- package/skills/turing/review/SKILL.md +19 -0
- package/skills/turing/rules/loop-protocol.md +91 -0
- package/skills/turing/sanity/SKILL.md +47 -0
- package/skills/turing/scale/SKILL.md +54 -0
- package/skills/turing/search/SKILL.md +21 -0
- package/skills/turing/seed/SKILL.md +46 -0
- package/skills/turing/sensitivity/SKILL.md +40 -0
- package/skills/turing/share/SKILL.md +19 -0
- package/skills/turing/simulate/SKILL.md +27 -0
- package/skills/turing/status/SKILL.md +23 -0
- package/skills/turing/stitch/SKILL.md +48 -0
- package/skills/turing/suggest/SKILL.md +158 -0
- package/skills/turing/surgery/SKILL.md +26 -0
- package/skills/turing/sweep/SKILL.md +44 -0
- package/skills/turing/template/SKILL.md +21 -0
- package/skills/turing/train/SKILL.md +74 -0
- package/skills/turing/transfer/SKILL.md +53 -0
- package/skills/turing/trend/SKILL.md +20 -0
- package/skills/turing/try/SKILL.md +62 -0
- package/skills/turing/update/SKILL.md +26 -0
- package/skills/turing/validate/SKILL.md +33 -0
- package/skills/turing/warm/SKILL.md +52 -0
- package/skills/turing/watch/SKILL.md +59 -0
- package/skills/turing/whatif/SKILL.md +30 -0
- package/skills/turing/xray/SKILL.md +42 -0
- package/src/command-registry.js +21 -0
- package/src/install.js +4 -3
- package/src/sync-commands-layout.js +149 -0
- package/src/sync-skills-layout.js +20 -0
- package/templates/__pycache__/evaluate.cpython-312.pyc +0 -0
- package/templates/__pycache__/evaluate.cpython-314.pyc +0 -0
- package/templates/__pycache__/prepare.cpython-312.pyc +0 -0
- package/templates/__pycache__/prepare.cpython-314.pyc +0 -0
- package/templates/features/__pycache__/__init__.cpython-312.pyc +0 -0
- package/templates/features/__pycache__/__init__.cpython-314.pyc +0 -0
- package/templates/features/__pycache__/featurizers.cpython-312.pyc +0 -0
- package/templates/features/__pycache__/featurizers.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/__init__.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/__init__.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/ablation_study.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/ablation_study.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/architecture_surgery.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/architecture_surgery.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/budget_manager.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/budget_manager.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/build_ensemble.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/build_ensemble.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/calibration.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/calibration.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/check_convergence.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/check_convergence.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/checkpoint_manager.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/checkpoint_manager.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/citation_manager.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/citation_manager.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/cost_frontier.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/cost_frontier.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/counterfactual_explanation.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/counterfactual_explanation.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/critique_hypothesis.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/critique_hypothesis.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/curriculum_optimizer.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/curriculum_optimizer.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/diagnose_errors.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/diagnose_errors.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/draft_paper_sections.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/draft_paper_sections.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/equivalence_checker.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/equivalence_checker.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/experiment_annotations.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/experiment_annotations.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/experiment_archive.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/experiment_archive.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/experiment_diff.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/experiment_diff.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/experiment_index.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/experiment_index.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/experiment_queue.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/experiment_queue.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/experiment_replay.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/experiment_replay.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/experiment_search.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/experiment_search.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/experiment_simulator.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/experiment_simulator.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/experiment_templates.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/experiment_templates.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/export_card.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/export_card.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/export_formats.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/export_formats.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/failure_postmortem.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/failure_postmortem.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/feature_intelligence.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/feature_intelligence.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/fork_experiment.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/fork_experiment.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/generate_baselines.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/generate_baselines.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/generate_brief.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/generate_brief.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/generate_changelog.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/generate_changelog.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/generate_figures.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/generate_figures.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/generate_logbook.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/generate_logbook.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/generate_model_card.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/generate_model_card.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/generate_onboarding.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/generate_onboarding.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/harness_doctor.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/harness_doctor.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/incremental_update.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/incremental_update.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/knowledge_transfer.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/knowledge_transfer.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/latency_benchmark.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/latency_benchmark.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/leakage_detector.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/leakage_detector.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/literature_search.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/literature_search.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/log_experiment.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/log_experiment.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/manage_hypotheses.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/manage_hypotheses.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/methodology_audit.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/methodology_audit.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/model_distiller.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/model_distiller.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/model_lifecycle.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/model_lifecycle.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/model_merger.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/model_merger.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/model_pruning.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/model_pruning.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/model_quantization.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/model_quantization.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/model_xray.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/model_xray.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/novelty_guard.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/novelty_guard.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/package_experiments.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/package_experiments.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/pareto_frontier.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/pareto_frontier.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/parse_metrics.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/parse_metrics.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/pipeline_manager.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/pipeline_manager.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/profile_training.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/profile_training.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/regression_gate.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/regression_gate.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/reproduce_experiment.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/reproduce_experiment.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/research_planner.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/research_planner.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/sanity_checks.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/sanity_checks.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/scaffold.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/scaffold.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/scaling_estimator.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/scaling_estimator.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/seed_runner.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/seed_runner.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/sensitivity_analysis.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/sensitivity_analysis.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/session_flashback.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/session_flashback.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/show_experiment_tree.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/show_experiment_tree.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/show_families.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/show_families.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/simulate_review.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/simulate_review.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/smart_retry.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/smart_retry.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/statistical_compare.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/statistical_compare.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/suggest_next.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/suggest_next.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/sweep.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/sweep.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/synthesize_decision.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/synthesize_decision.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/training_monitor.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/training_monitor.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/treequest_suggest.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/treequest_suggest.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/trend_analysis.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/trend_analysis.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/turing_io.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/turing_io.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/update_state.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/update_state.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/verify_placeholders.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/verify_placeholders.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/warm_start.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/warm_start.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/whatif_engine.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/whatif_engine.cpython-314.pyc +0 -0
package/skills/turing/checkpoint/SKILL.md
@@ -0,0 +1,46 @@
+---
+name: checkpoint
+description: Smart checkpoint management — list, prune (Pareto-based), average top-K, resume from any point, disk usage stats.
+argument-hint: "<list|prune|average|resume|stats> [exp-id] [--top 3] [--dry-run]"
+allowed-tools: Read, Bash(*), Grep, Glob
+---
+
+Manage model checkpoints intelligently using Pareto dominance.
+
+## Steps
+
+1. **Activate environment:**
+   ```bash
+   source .venv/bin/activate
+   ```
+
+2. **Parse arguments from `$ARGUMENTS`:**
+   - First word is the action: `list`, `prune`, `average`, `resume`, `stats`
+   - `resume` requires an experiment ID as second argument
+   - `--top 3` sets the number of checkpoints for averaging
+   - `--dry-run` previews pruning without deleting
+
+3. **Run checkpoint manager:**
+   ```bash
+   python scripts/checkpoint_manager.py $ARGUMENTS
+   ```
+
+4. **Report results by action:**
+   - **list:** Table of all checkpoints with metrics, size, and Pareto status
+   - **prune:** Removes dominated checkpoints, reports space saved
+   - **average:** Lists top-K checkpoints for weight averaging
+   - **resume:** Locates checkpoint for a specific experiment
+   - **stats:** Disk usage summary by total, average, and model type
+
+5. **Saved output:** report written to `experiments/checkpoints/checkpoint-report.yaml`
+
+## Examples
+
+```
+/turing:checkpoint list               # Show all checkpoints
+/turing:checkpoint stats              # Disk usage summary
+/turing:checkpoint prune --dry-run    # Preview what would be pruned
+/turing:checkpoint prune              # Remove dominated checkpoints
+/turing:checkpoint average --top 5    # Top 5 for averaging
+/turing:checkpoint resume exp-042     # Resume from checkpoint
+```
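The Pareto-dominance rule the checkpoint hunk describes can be sketched as follows. The checkpoint record fields (`accuracy`, `size_mb`) and the keep/drop logic are illustrative assumptions, not the plugin's actual `checkpoint_manager.py` implementation:

```python
# Sketch of Pareto-based checkpoint pruning: a checkpoint is dropped only if
# some other checkpoint is at least as good on every objective and strictly
# better on at least one. Metric names here are hypothetical.

def dominates(a, b, metrics=("accuracy",)):
    """True if checkpoint `a` Pareto-dominates `b`: no worse on every metric
    (and on size, where smaller is better), strictly better somewhere."""
    no_worse = all(a[m] >= b[m] for m in metrics) and a["size_mb"] <= b["size_mb"]
    better = any(a[m] > b[m] for m in metrics) or a["size_mb"] < b["size_mb"]
    return no_worse and better

def prune(checkpoints, metrics=("accuracy",)):
    """Keep only checkpoints that no other checkpoint dominates."""
    return [c for c in checkpoints
            if not any(dominates(o, c, metrics) for o in checkpoints if o is not c)]

ckpts = [
    {"id": "epoch-3", "accuracy": 0.81, "size_mb": 120},  # dominated by epoch-7
    {"id": "epoch-7", "accuracy": 0.84, "size_mb": 120},  # best accuracy, kept
    {"id": "epoch-9", "accuracy": 0.83, "size_mb": 60},   # smaller on disk, kept
]
kept = prune(ckpts)  # epoch-3 is removed
```

This is why `prune` can reclaim disk space without discarding any checkpoint that is best at *something* — a smaller-but-slightly-worse checkpoint survives alongside the most accurate one.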
package/skills/turing/cite/SKILL.md
@@ -0,0 +1,22 @@
+---
+name: cite
+description: Citation & attribution manager — track papers, datasets, methods. Audit for missing citations, generate BibTeX.
+argument-hint: "<add|list|check|bib> [--key Chen2016 --title XGBoost --url ...]"
+allowed-tools: Read, Bash(*), Grep, Glob
+---
+
+Track which papers and methods influenced each experiment. Catch missing citations before submission.
+
+## Steps
+1. **Activate environment:** `source .venv/bin/activate`
+2. **Run:** `python scripts/citation_manager.py $ARGUMENTS`
+3. **Operations:** add (associate citation with experiment), list (group by type), check (audit missing), bib (BibTeX)
+4. **Stored in:** `experiments/citations.yaml`
+
+## Examples
+```
+/turing:cite add exp-042 --key Chen2016 --title "XGBoost" --type method --url "https://arxiv.org/abs/1603.02754"
+/turing:cite list
+/turing:cite check    # Audit for missing citations
+/turing:cite bib      # Generate BibTeX
+```
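The `bib` operation above can be sketched as a straightforward render of the tracked records into BibTeX. The record fields mirror the `/turing:cite add` example; the `@misc` entry shape is an assumption, not necessarily what `citation_manager.py` emits:

```python
# Hypothetical sketch: render one tracked citation record as a BibTeX entry.
# Uses @misc with a \url field, a common fallback when only a key, title,
# and URL are stored.

def to_bibtex(citation):
    """Render a citation record ({key, title, url}) as a @misc BibTeX entry."""
    return (
        f"@misc{{{citation['key']},\n"
        f"  title = {{{citation['title']}}},\n"
        f"  howpublished = {{\\url{{{citation['url']}}}}}\n"
        f"}}"
    )

entry = to_bibtex({
    "key": "Chen2016",
    "title": "XGBoost",
    "url": "https://arxiv.org/abs/1603.02754",
})
```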
package/skills/turing/compare/SKILL.md
@@ -0,0 +1,23 @@
+---
+name: compare
+description: Compare two ML experiment runs side-by-side — metrics, configuration deltas, and a verdict on which approach is more promising.
+argument-hint: "<exp-id-1> <exp-id-2>"
+allowed-tools: Read, Bash(*), Grep, Glob
+---
+
+Compare two ML experiment runs side-by-side to understand what changed and why one performed better.
+
+## Steps
+
+1. **Run comparison:**
+   ```bash
+   source .venv/bin/activate && python scripts/compare_runs.py $0 $1
+   ```
+
+2. **Analyze the delta:**
+   - **Metric differences:** all configured metrics for both runs
+   - **Configuration delta:** what changed (model type, hyperparameters, features)
+   - **Causal analysis:** which changes likely caused the metric difference
+   - **Verdict:** which approach is more promising for future experiments
+
+3. **If either ID is missing:** report the error and suggest `/turing:status` to see available experiment IDs.
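The "configuration delta" step in the compare hunk reduces to diffing two config mappings. A minimal sketch, assuming flat config dicts and hypothetical key names (`compare_runs.py` may diff nested YAML instead):

```python
# Sketch of a configuration delta: report every key whose value differs
# between two experiment configs, as {key: (old_value, new_value)}.

def config_delta(cfg_a, cfg_b):
    """Return changed keys between two flat config dicts."""
    keys = set(cfg_a) | set(cfg_b)
    return {k: (cfg_a.get(k), cfg_b.get(k))
            for k in sorted(keys) if cfg_a.get(k) != cfg_b.get(k)}

delta = config_delta(
    {"model": "lightgbm", "learning_rate": 0.1, "num_leaves": 31},
    {"model": "lightgbm", "learning_rate": 0.05, "num_leaves": 63},
)
# Unchanged keys (here "model") are omitted, so the causal-analysis step
# only has to reason about what actually moved.
```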
package/skills/turing/counterfactual/SKILL.md
@@ -0,0 +1,26 @@
+---
+name: counterfactual
+description: Input-level counterfactual explanations — find the smallest input change to flip a prediction.
+argument-hint: "<exp-id> --sample <index> [--target <class>]"
+allowed-tools: Read, Bash(*), Grep, Glob
+---
+
+What would need to change to flip this prediction? Minimum-change counterfactual for individual predictions.
+
+## Steps
+1. `source .venv/bin/activate`
+2. `python scripts/counterfactual_explanation.py $ARGUMENTS`
+3. **Saved:** `experiments/counterfactuals/`
+
+## Methods
+- **Greedy perturbation:** change one feature at a time, find minimum flip
+- **Prototype-based:** find nearest training sample from target class
+- Both methods run and the best (smallest distance) is selected
+
+## Examples
+```
+/turing:counterfactual exp-042 --sample 1247
+/turing:counterfactual exp-042 --sample 1247 --target 0
+/turing:counterfactual exp-042 --batch-misclassified
+/turing:counterfactual exp-042 --sample 500 --json
+```
package/skills/turing/curriculum/SKILL.md
@@ -0,0 +1,42 @@
+---
+name: curriculum
+description: Training curriculum optimization — order data by difficulty, compare easy-to-hard vs hard-to-easy vs self-paced strategies.
+argument-hint: "[exp-id] [--strategies easy-to-hard,random]"
+allowed-tools: Read, Bash(*), Grep, Glob
+---
+
+Does the order your model sees data matter? Find out systematically.
+
+## Steps
+
+1. **Activate environment:**
+   ```bash
+   source .venv/bin/activate
+   ```
+
+2. **Parse arguments from `$ARGUMENTS`:**
+   - Optional experiment ID
+   - `--strategies "easy_to_hard,hard_to_easy,self_paced,random"` — strategies to test
+   - `--json` — raw JSON output
+
+3. **Run curriculum analysis:**
+   ```bash
+   python scripts/curriculum_optimizer.py $ARGUMENTS
+   ```
+
+4. **Strategies tested:**
+   - **Random:** standard shuffling (control)
+   - **Easy-to-hard:** classic curriculum learning
+   - **Hard-to-easy:** anti-curriculum
+   - **Self-paced:** start easy, gradually include harder samples
+
+5. **Report includes:** strategy comparison table with metric, convergence epoch, and speedup vs random; impossible sample detection (likely mislabeled)
+
+6. **Saved output:** report in `experiments/curriculum/<exp-id>-curriculum.yaml`
+
+## Examples
+
+```
+/turing:curriculum exp-042                          # All strategies
+/turing:curriculum --strategies easy_to_hard,random # Specific strategies
+```
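The ordering strategies in the curriculum hunk can be sketched given a per-sample difficulty score (for instance, the loss of a quick baseline model). The function name and the difficulty source are assumptions for illustration, not the plugin's `curriculum_optimizer.py` internals:

```python
# Sketch of curriculum ordering: given difficulty scores, emit the order in
# which each strategy would present samples to the model.
import random

def order_samples(indices, difficulty, strategy, seed=0):
    """Return sample indices in presentation order for one strategy."""
    if strategy == "random":            # control: plain shuffle
        rng = random.Random(seed)
        out = list(indices)
        rng.shuffle(out)
        return out
    if strategy == "easy_to_hard":      # classic curriculum
        return sorted(indices, key=lambda i: difficulty[i])
    if strategy == "hard_to_easy":      # anti-curriculum
        return sorted(indices, key=lambda i: difficulty[i], reverse=True)
    raise ValueError(f"unknown strategy: {strategy}")

difficulty = {0: 0.9, 1: 0.1, 2: 0.5}   # higher = harder
easy_first = order_samples([0, 1, 2], difficulty, "easy_to_hard")  # [1, 2, 0]
```

Self-paced ordering is the one strategy this sketch omits: it re-thresholds difficulty each epoch rather than fixing a single permutation up front.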
package/skills/turing/design/SKILL.md
@@ -0,0 +1,96 @@
+---
+name: design
+description: Generate a structured experiment design for a hypothesis. Reads experiment history, searches literature for methodology, produces a scored design document at experiments/designs/.
+argument-hint: "<hypothesis-id or description>"
+allowed-tools: Read, Write, Bash(python scripts/*:*, source .venv/bin/activate:*, mkdir:*), Grep, Glob, WebSearch, WebFetch
+---
+
+Front-load the thinking before the coding. Given a hypothesis, produce a structured experiment design grounded in methodology from the literature.
+
+## Steps
+
+### 1. Load Context
+
+If `$ARGUMENTS` matches `hyp-NNN`, load the hypothesis:
+```bash
+source .venv/bin/activate && python scripts/manage_hypotheses.py show $ARGUMENTS
+```
+
+If freeform text, use it directly as the hypothesis description.
+
+Read the current config and experiment state:
+```bash
+cat config.yaml
+```
+```bash
+source .venv/bin/activate && python scripts/show_metrics.py --last 10 2>/dev/null || echo "No experiments yet"
+```
+```bash
+cat experiment_state.yaml 2>/dev/null || echo "No experiment state yet"
+```
+
+### 2. Search for Methodology
+
+Use `WebSearch` to find 2-3 papers or articles describing how to implement the proposed change effectively. Target:
+- The specific technique in the hypothesis (e.g., "LightGBM dart boosting implementation best practices")
+- Common pitfalls for this type of change
+- Benchmark results showing expected improvement range
+
+Use `WebFetch` on the most relevant results to extract specific methodology details: hyperparameter recommendations, training procedures, evaluation approaches.
+
+### 3. Write the Design Document
+
+Create `experiments/designs/<hyp-id>-design.md` (or `experiments/designs/adhoc-<date>-design.md` for freeform hypotheses):
+
+```bash
+mkdir -p experiments/designs
+```
+
+Write with this structure:
+
+```markdown
+# Experiment Design: <hypothesis summary>
+
+## Hypothesis
+<full description>
+
+## Objective
+<what we're testing, stated as a falsifiable claim>
+
+## Method
+<specific changes, grounded in literature findings>
+
+## Literature Support
+- <source 1>: <what it says about this approach>
+- <source 2>: <relevant finding>
+
+## Implementation Plan
+### Changes to train.py
+<concrete code changes needed>
+
+### Changes to config.yaml (if any)
+<hyperparameter values to set, with rationale from literature>
+
+## Expected Outcome
+- **Success:** <metric > threshold, specific number>
+- **Failure:** <what would disprove the hypothesis>
+
+## Risks
+<specific pitfalls from literature, not generic "might not work">
+
+## Estimated Runs
+<how many iterations>
+```
+
+### 4. Self-Critique
+
+Review the design:
+- Is the implementation plan specific enough for the researcher agent to execute without ambiguity?
+- Does the expected outcome have a concrete metric threshold?
+- Are risks actionable?
+
+Score each dimension 1-10 (feasibility, novelty, clarity). If any < 7, revise that section. Max 2 revision rounds.
+
+### 5. Report
+
+Display the design summary with scores and file location. The researcher agent can read the design during `/turing:train`.
@@ -0,0 +1,51 @@

---
name: diagnose
description: Error analysis — cluster failure cases, identify systematic failure modes, and suggest targeted fixes with auto-queued hypotheses.
argument-hint: "[exp-id] [--auto-queue] [--top 5]"
allowed-tools: Read, Bash(*), Grep, Glob
---

Analyze where and why the model fails, beyond aggregate metrics.

## Steps

1. **Activate environment:**
   ```bash
   source .venv/bin/activate
   ```

2. **Generate predictions if needed:**
   Check if `experiments/predictions/exp-NNN-preds.yaml` exists. If not, run:
   ```bash
   python train.py --predict-only --output experiments/predictions/
   ```
   The predictions file must contain `y_true`, `y_pred`, `task_type`, and optionally `features`.

3. **Parse arguments from `$ARGUMENTS`:**
   - First argument can be an experiment ID (e.g., `exp-042`); defaults to the best experiment
   - `--auto-queue` queues hypotheses derived from failure modes into `hypotheses.yaml`
   - `--top 5` limits output to the top N failure modes (default 5)

4. **Run error analysis:**
   ```bash
   python scripts/diagnose_errors.py $ARGUMENTS
   ```

5. **Report results:**
   - **Classification:** confusion matrix, most-confused pairs, per-class P/R/F1, low-recall classes
   - **Regression:** residual stats, P90/P95 errors, feature-range bias, systematic bias
   - **Failure modes:** ranked by impact, with suggested fixes
   - **Auto-hypotheses:** if `--auto-queue`, shows queued hypotheses targeting weaknesses

6. **Saved output:** report written to `experiments/diagnoses/exp-NNN-diagnosis.yaml`

7. **If no predictions file exists:** instruct the user to run the model on the validation set first.

## Examples

```
/turing:diagnose                # Analyze best experiment
/turing:diagnose exp-042        # Specific experiment
/turing:diagnose --auto-queue   # Queue fix hypotheses
/turing:diagnose --top 10       # Top 10 failure modes
```

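The predictions-file contract from step 2 is easy to check before running the analysis. A minimal validation sketch, not the actual `diagnose_errors.py` logic; the accepted `task_type` values are an assumption.

```python
REQUIRED = {"y_true", "y_pred", "task_type"}

def validate_predictions(doc):
    """Return a list of problems with a loaded predictions document (empty = OK)."""
    problems = [f"missing field: {k}" for k in sorted(REQUIRED - doc.keys())]
    # Length check only makes sense once both arrays are present.
    if not problems and len(doc["y_true"]) != len(doc["y_pred"]):
        problems.append("y_true and y_pred length mismatch")
    if "task_type" in doc and doc["task_type"] not in ("classification", "regression"):
        problems.append("task_type must be 'classification' or 'regression'")
    return problems
```

Running this against the loaded YAML before step 4 gives a clearer error than a mid-analysis crash.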
@@ -0,0 +1,47 @@

---
name: diff
description: Deep experiment comparison — config diffs, metric significance, per-class regressions, training curve divergence, feature importance shifts.
argument-hint: "<exp-a> <exp-b> [--code] [--json]"
allowed-tools: Read, Bash(*), Grep, Glob
---

Deep diagnostic comparison of two experiments. Goes beyond "which metric is higher" to show where, when, and why two experiments diverge.

## Steps

1. **Activate environment:**
   ```bash
   source .venv/bin/activate
   ```

2. **Parse arguments from `$ARGUMENTS`:**
   - First two arguments are experiment IDs (required), e.g. `exp-042 exp-053`
   - `--code` includes a git diff of train.py between the two experiments' commits
   - `--json` outputs raw JSON instead of markdown

3. **Run deep comparison:**
   ```bash
   python scripts/experiment_diff.py $ARGUMENTS
   ```

4. **Report results — the diff includes:**
   - **Config diff:** which hyperparameters changed, with magnitude (e.g., `max_depth: 6 → 8 (+33%)`)
   - **Metric diff:** all metrics with deltas and statistical significance (if seed studies exist)
   - **Per-class diff:** which classes improved/regressed — flags regressions hidden by aggregate improvement
   - **Training curve divergence:** the epoch where the two experiments' loss/metric curves separate
   - **Feature importance shifts:** which features gained/lost importance
   - **Code diff (`--code`):** git diff of train.py between the two commits

5. **Saved output:** report written to `experiments/diffs/<exp-a>-vs-<exp-b>.yaml`

6. **If an experiment ID is not found:** list available experiment IDs from `experiments/log.jsonl`

7. **If no training pipeline exists:** suggest `/turing:init` first.

## Examples

```
/turing:diff exp-042 exp-053          # Full diagnostic comparison
/turing:diff exp-042 exp-053 --code   # Include train.py code changes
/turing:diff exp-001 exp-010 --json   # Raw JSON output
```

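The config-diff-with-magnitude rendering from step 4 can be sketched directly: compare two hyperparameter dicts and attach a percentage change for numeric values. An illustrative sketch, not the `experiment_diff.py` implementation.

```python
def config_diff(a, b):
    """Hyperparameters that differ between two configs, with magnitude when numeric."""
    out = {}
    for key in sorted(set(a) | set(b)):
        va, vb = a.get(key), b.get(key)
        if va == vb:
            continue  # unchanged hyperparameters are omitted
        if isinstance(va, (int, float)) and isinstance(vb, (int, float)) and va != 0:
            pct = (vb - va) / abs(va) * 100
            out[key] = f"{va} → {vb} ({pct:+.0f}%)"
        else:
            out[key] = f"{va} → {vb}"
    return out
```

Non-numeric changes (e.g. a booster swap) fall through to the plain `old → new` form.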
@@ -0,0 +1,55 @@

---
name: distill
description: Model compression via distillation — train a smaller student model to match a larger teacher's predictions.
argument-hint: "<teacher-exp-id> [--compression 4] [--method soft_labels]"
allowed-tools: Read, Bash(*), Grep, Glob
---

Compress a large model into a smaller, faster one for production. Measures the accuracy/size/latency tradeoff.

## Steps

1. **Activate environment:**
   ```bash
   source .venv/bin/activate
   ```

2. **Parse arguments from `$ARGUMENTS`:**
   - First argument is the teacher experiment ID (required)
   - `--compression 4` — compression ratio (default: 4x)
   - `--method soft_labels|feature_matching|dataset_distillation` — distillation method
   - `--target-latency 5` — auto-adjust compression to meet a latency target (ms)
   - `--json` — raw JSON output

3. **Run distillation planner:**
   ```bash
   python scripts/model_distiller.py $ARGUMENTS
   ```

4. **Report includes:**
   - Teacher model metrics
   - Auto-selected student architecture (fewer trees/layers/width)
   - Estimated size reduction and latency improvement
   - Distillation configuration (temperature, alpha, loss function)
   - Verdict: EXCELLENT / ACCEPTABLE / MARGINAL / TOO MUCH LOSS

5. **Student selection by model type:**
   - **Tree models:** fewer estimators, shallower depth
   - **Neural networks:** fewer layers, narrower hidden dims
   - **scikit-learn:** simpler model family (RandomForest → DecisionTree)

6. **Distillation methods:**
   - **soft_labels:** train on the teacher's probability outputs with temperature scaling
   - **feature_matching:** align intermediate representations (neural only)
   - **dataset_distillation:** train on teacher-labeled synthetic data

7. **Saved output:** report written to `experiments/distillations/distill-<exp-id>.yaml`

## Examples

```
/turing:distill exp-042                            # 4x compression, soft labels
/turing:distill exp-042 --compression 8            # Aggressive compression
/turing:distill exp-042 --method feature_matching  # Neural feature alignment
/turing:distill exp-042 --target-latency 5         # Meet 5ms latency target
```

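The two knobs named in the report (temperature, alpha) combine as in standard soft-label distillation: temperature softens the teacher's distribution, and alpha blends the hard-label loss with the soft-target loss. A minimal pure-Python sketch of that math; the default values here are illustrative, not what `model_distiller.py` necessarily uses.

```python
import math

def soften(logits, temperature=4.0):
    """Teacher logits -> softened probabilities; higher T gives softer targets."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distill_loss(student_logp, teacher_probs, true_idx, alpha=0.5):
    """Blend hard-label cross-entropy with cross-entropy against soft targets."""
    hard = -student_logp[true_idx]
    soft = -sum(p * lp for p, lp in zip(teacher_probs, student_logp))
    return alpha * hard + (1 - alpha) * soft
```

With `alpha=1.0` this degenerates to ordinary supervised training; lowering alpha shifts weight onto matching the teacher.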
@@ -0,0 +1,30 @@

---
name: doctor
description: Harness self-diagnosis — check environment, project, resources, and git state. Auto-fix common issues.
argument-hint: "[--fix] [--verbose] [--json]"
allowed-tools: Read, Bash(*), Grep, Glob
---

Is Turing healthy? Check everything and get a score.

## Steps
1. `source .venv/bin/activate`
2. `python scripts/harness_doctor.py $ARGUMENTS`
3. **Saved:** `experiments/doctor/`

## Checks
- **Environment:** Python version, venv status
- **Dependencies:** all required packages importable
- **Config:** config.yaml valid with required fields
- **Experiment log:** JSONL integrity, corrupt-line detection
- **Scripts:** train.py, prepare.py, evaluate.py exist and parse
- **Disk space:** warn if <1GB free
- **Git state:** uncommitted changes to critical files
- **Claude hooks:** `.claude/settings.local.json` hook group schema; `--fix` migrates legacy bare command hooks

## Examples
```
/turing:doctor
/turing:doctor --fix
/turing:doctor --verbose --json
```

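The "JSONL integrity, corrupt-line detection" check boils down to parsing the log line by line and recording which lines fail. A small sketch of that idea, not the `harness_doctor.py` source.

```python
import json

def check_jsonl(text):
    """Return (valid_records, corrupt_line_numbers) for a JSONL experiment log."""
    records, corrupt = [], []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines are harmless
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            corrupt.append(lineno)  # report 1-based line numbers
    return records, corrupt
```

Reporting line numbers rather than just a pass/fail lets `--fix` (or a human) repair the log without discarding valid entries.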
@@ -0,0 +1,53 @@

---
name: ensemble
description: Automated ensemble construction — combines top-K models via voting, stacking, and blending for zero-cost improvement.
argument-hint: "[--top-k 5] [--methods voting,stacking,blending]"
allowed-tools: Read, Bash(*), Grep, Glob
---

Build ensembles from your best experiments automatically. Often yields a 1-3% improvement with zero additional training.

## Steps

1. **Activate environment:**
   ```bash
   source .venv/bin/activate
   ```

2. **Parse arguments from `$ARGUMENTS`:**
   - `--top-k 5` — number of top models to include (default: 5)
   - `--methods voting,stacking,blending` — ensemble methods to try
   - `--predictions-dir experiments/predictions` — directory with saved predictions
   - `--json` — raw JSON output

3. **Run ensemble construction:**
   ```bash
   python scripts/build_ensemble.py $ARGUMENTS
   ```

4. **Report results:**
   - Table of all ensemble methods tried, with metric deltas vs the best single model
   - Best ensemble method highlighted with the improvement amount
   - Diversity analysis: prediction correlation matrix, diversity assessment
   - Base model summary: which experiments were combined

5. **Ensemble methods:**
   - **Voting:** majority vote (classification) or mean (regression)
   - **Weighted voting:** weights proportional to individual model performance
   - **Stacking:** cross-validated meta-learner (ridge/logistic) on out-of-fold predictions
   - **Blending:** holdout-based meta-learner (simpler, less data-efficient)

6. **Prerequisites:** experiments must have saved predictions in `experiments/predictions/`. Each experiment needs `<exp-id>-predictions.npy` and a shared `labels.npy`.

7. **If no predictions exist:** suggest saving predictions during training by adding prediction logging to `evaluate.py`.

8. **Saved output:** report written to `experiments/ensembles/ensemble-*.yaml`

## Examples

```
/turing:ensemble                            # Default: top-5, all methods
/turing:ensemble --top-k 3                  # Top-3 models only
/turing:ensemble --methods voting,stacking  # Specific methods
/turing:ensemble --json                     # Machine-readable output
```

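The weighted-voting method from step 5 is simple enough to show inline: each model's predicted label gets a vote proportional to that model's performance weight. A pure-Python sketch for classification; `build_ensemble.py` presumably operates on the saved `.npy` arrays instead.

```python
def weighted_vote(pred_rows, weights):
    """Weighted majority vote.

    pred_rows[i] is model i's predicted labels for all samples;
    weights[i] is that model's vote weight (e.g. its validation score).
    """
    n_samples = len(pred_rows[0])
    out = []
    for j in range(n_samples):
        tally = {}
        for preds, w in zip(pred_rows, weights):
            tally[preds[j]] = tally.get(preds[j], 0.0) + w
        out.append(max(tally, key=tally.get))  # label with the largest weighted tally
    return out
```

With equal weights this reduces to plain majority voting; for regression the analogue is a weighted mean of the predictions.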
@@ -0,0 +1,106 @@

---
name: explore
description: Tree-search-guided hypothesis exploration using AB-MCTS. Explores the space of experiment ideas as a search tree, scored by the critique engine. Discovers non-obvious refinement chains that linear suggestion cannot find.
argument-hint: "[ml/project] [--iterations N] [--top N] [--strategy abmcts-a|abmcts-m|greedy]"
allowed-tools: Read, Write, Bash(python scripts/*:*, source .venv/bin/activate:*), Grep, Glob
---

Explore the hypothesis space using tree search. Instead of suggesting independent ideas, this builds and searches a tree of refinement chains — each node is a hypothesis scored by novelty, feasibility, and expected impact.

## Project Detection

0. **Detect project directory:**
   - If `$ARGUMENTS` contains a path (e.g., `ml/coding`), use that as the project directory
   - Else if cwd contains `config.yaml` and `train.py`, use cwd
   - Else search for `ml/*/` subdirectories containing `config.yaml`
     - If exactly one is found, use it
     - If multiple are found, list them and ask the user which to target
   - All subsequent commands run from the detected project directory

## Parse Options

Extract from `$ARGUMENTS`:
- `--iterations N` — search depth (default: 30)
- `--top N` — number of results to return (default: 5)
- `--strategy` — algorithm choice: `abmcts-a` (default), `abmcts-m` (Bayesian), or `greedy` (no TreeQuest needed)
- `--seeds-only` — just show generated seeds without running the search
- `--json` — output as JSON for programmatic use

## Steps

### 1. Assess Current State

```bash
source .venv/bin/activate && python scripts/show_metrics.py --last 10 2>/dev/null || echo "No experiments yet"
```

Read `config.yaml` to understand the current model and metric.

### 2. Run Tree Search

```bash
source .venv/bin/activate && python scripts/treequest_suggest.py \
  --log experiments/log.jsonl \
  --config config.yaml \
  --top <N> \
  --iterations <N> \
  --strategy <strategy>
```

The script will:
- Generate seed hypotheses from the config and experiment history
- Run AB-MCTS (or the greedy fallback) over the hypothesis tree
- Score each node using the critique engine
- Return the top-K ranked, deduplicated hypotheses

### 3. Queue Best Hypotheses

For each result, add to the hypothesis queue:

```bash
source .venv/bin/activate && python scripts/manage_hypotheses.py add "<description>" \
  --priority medium --source treequest
```

### 4. Show Results

Display the search output and confirm queuing:

```
TreeQuest Hypothesis Exploration (AB-MCTS-A)
============================================
Nodes explored: 35
Top 5 hypotheses by critique score:

1. [PROCEED] (score: 7.8/10)
   Switch to LightGBM with dart boosting; additionally add polynomial features
   Novelty: 8  Feasibility: 9  Impact: 7
   -> Queued as hyp-NNN

2. [PROCEED] (score: 7.2/10)
   Use low learning rate (0.01) with 2000 estimators; additionally add L2 regularization
   Novelty: 7  Feasibility: 8  Impact: 7
   Depth: 1 (refined from parent)
   -> Queued as hyp-NNN

...

Queued N hypotheses. Run /turing:train to test them.
```

## How It Differs From /turing:suggest

| | `/turing:suggest` | `/turing:explore` |
|---|---|---|
| **Source** | Web literature search | Tree search over critique scores |
| **Strategy** | Independent suggestions | Refinement chains (parent -> child) |
| **Requires internet** | Yes | No |
| **Discovers** | What papers recommend | What combinations score well |
| **Best for** | Early-stage exploration | Mid-experiment optimization |

## Integration

- Results feed into `hypotheses.yaml` — the next `/turing:train` picks them up
- `/turing:brief` shows queued treequest-sourced hypotheses
- `/turing:suggest --strategy treequest` is an alias for this command
- A human can override priority: `/turing:try` always takes precedence

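The `greedy` strategy is the simplest of the three: always refine the current best-scoring node, which naturally produces the parent -> child refinement chains shown in the sample output. A toy sketch of that loop under stated assumptions; the real `treequest_suggest.py` scores nodes with the critique engine, and `refine`/`score` here are placeholder callables.

```python
def greedy_search(seeds, refine, score, iterations=30):
    """Greedy fallback: repeatedly refine the current best-scoring hypothesis,
    growing a refinement chain instead of independent suggestions."""
    tree = [(score(s), s) for s in seeds]
    for _ in range(iterations):
        tree.sort(reverse=True)        # best-scoring node first
        _, best = tree[0]
        child = refine(best)           # expand the best node into a child
        tree.append((score(child), child))
    return [h for _, h in sorted(tree, reverse=True)]
```

AB-MCTS differs by balancing exploitation of high scores against exploration of less-visited branches, so it can escape a locally attractive but dead-end chain that this greedy loop would keep refining.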
@@ -0,0 +1,47 @@

---
name: export
description: Export model to production format with equivalence verification, latency benchmarking, and deployment model card.
argument-hint: "[exp-id] [--format joblib|xgboost_json|onnx|torchscript|tflite]"
allowed-tools: Read, Bash(*), Grep, Glob
---

Export a trained model to a production-ready format.

## Steps

1. **Activate environment:**
   ```bash
   source .venv/bin/activate
   ```

2. **Parse arguments from `$ARGUMENTS`:**
   - First argument can be an experiment ID (e.g., `exp-042`); defaults to the best experiment
   - `--format joblib|xgboost_json|onnx|torchscript|tflite` specifies the export format (auto-detected if omitted)
   - `--skip-equivalence` skips the inference equivalence check
   - `--skip-latency` skips the latency benchmark
   - `--samples 100` sets the test sample count

3. **Run export pipeline:**
   ```bash
   python scripts/export_model.py $ARGUMENTS
   ```

4. **Report results:**
   - **Export:** format, file size, output path, dependencies
   - **Equivalence:** verdict (equivalent/approximately_equivalent/divergent), max delta
   - **Latency:** p50/p95/p99 ms, speedup vs original
   - **Model Card:** metrics, seed study, equivalence, latency, dependencies

5. **Output:** exported model + model_card.yaml written to `exports/exp-NNN/`

6. **If model file not found:** suggest checking the `models/best/` directory.

## Examples

```
/turing:export                                    # Best experiment, default format
/turing:export exp-042                            # Specific experiment
/turing:export --format xgboost_json              # Native XGBoost JSON
/turing:export --format onnx                      # ONNX format
/turing:export --skip-equivalence --skip-latency  # Fast export
```

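The three-way equivalence verdict from step 4 can be sketched as a tolerance check on paired model outputs: exact within an absolute tolerance is "equivalent", within a relative tolerance is "approximately_equivalent", anything else is "divergent". The tolerance values below are illustrative assumptions, not what `export_model.py` uses.

```python
def equivalence_verdict(original, exported, atol=1e-6, rtol=1e-4):
    """Compare original vs exported model outputs on the same samples.

    Returns (verdict, max_delta) matching the report's three-way classification.
    """
    max_delta = max(abs(a - b) for a, b in zip(original, exported))
    if max_delta <= atol:
        return "equivalent", max_delta
    # Relative check tolerates small format-conversion drift (e.g. float64 -> float32).
    if all(abs(a - b) <= atol + rtol * abs(a) for a, b in zip(original, exported)):
        return "approximately_equivalent", max_delta
    return "divergent", max_delta
```

A "divergent" verdict usually means the conversion changed semantics (operator support, quantization), not just precision, and the export should not ship.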