claude-turing 4.1.0 → 4.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -1,7 +1,7 @@
  {
  "name": "turing",
- "version": "4.1.0",
- "description": "Autonomous ML research harness — the autoresearch loop as a formal protocol. 66 commands, 2 specialized agents, collaboration (onboard + share + review), research communication (cite + present + changelog), experiment archaeology (trend + flashback + archive + annotate + search + template + replay), model surgery (prune + quantize + merge + surgery), feature & training intelligence, model debugging, pre-training intelligence, meta-intelligence, scaling & efficiency, model composition, deep analysis, experiment orchestration, literature + paper, model export, profiling, checkpoints, experiment intelligence, statistical rigor, tree-search, cost-performance, model cards, hypothesis database, novelty guard, anti-cheating, taste-leverage loop. Inspired by Karpathy's autoresearch and the scientific method itself.",
+ "version": "4.3.0",
+ "description": "Autonomous ML research harness — the autoresearch loop as a formal protocol. 71 commands, 2 specialized agents, model lifecycle (update + registry), what-if analysis (whatif + counterfactual + simulate), collaboration (onboard + share + review), research communication (cite + present + changelog), experiment archaeology (trend + flashback + archive + annotate + search + template + replay), model surgery (prune + quantize + merge + surgery), feature & training intelligence, model debugging, pre-training intelligence, meta-intelligence, scaling & efficiency, model composition, deep analysis, experiment orchestration, literature + paper, model export, profiling, checkpoints, experiment intelligence, statistical rigor, tree-search, cost-performance, model cards, hypothesis database, novelty guard, anti-cheating, taste-leverage loop. Inspired by Karpathy's autoresearch and the scientific method itself.",
  "author": {
  "name": "pragnition"
  },
package/README.md CHANGED
@@ -377,6 +377,11 @@ The index (`hypotheses.yaml`) is the lightweight queue. The detail files (`hypot
  | `/turing:onboard [--audience]` | Project onboarding — walkthrough for new collaborators |
  | `/turing:share <exp-ids...>` | Experiment packaging — portable archive with manifest |
  | `/turing:review [--venue]` | Peer review simulation — weaknesses, fix commands, score |
+ | `/turing:whatif "<question>"` | What-if analysis — answer hypotheticals from existing experiment data |
+ | `/turing:counterfactual <exp-id>` | Counterfactual explanations — minimum input change to flip a prediction |
+ | `/turing:simulate [--configs]` | Experiment outcome prediction — pre-filter configs, save budget |
+ | `/turing:update <exp-id>` | Incremental model update — add new data without full retraining |
+ | `/turing:registry [action]` | Model registry — track lifecycle from candidate to production with gates |
 
  And for fully hands-off operation:
 
@@ -561,11 +566,11 @@ Each project gets independent config, data, experiments, models, and agent memor
 
  ## Architecture of Turing Itself
 
- 66 commands, 2 agents, 10 config files, 85 template scripts, model registry, artifact contract, cost-performance frontier, model cards, tree-search exploration, statistical rigor, experiment intelligence, performance profiling, smart checkpoints, production model export, literature integration, paper section drafting, experiment orchestration (queue + retry + fork), deep analysis (diff + watch + regress), model composition (ensemble + stitch + warm), scaling & efficiency (scale + budget + distill), meta-intelligence (transfer + audit), pre-training intelligence (sanity + baseline + leak), model debugging (xray + sensitivity + calibrate), feature & training intelligence (feature + curriculum), model surgery (prune + quantize + merge + surgery), experiment archaeology (trend + flashback + archive + annotate + search + template + replay), research communication (cite + present + changelog), collaboration (onboard + share + review), 16 ADRs. See [docs/ARCHITECTURE.md](docs/ARCHITECTURE.md) for the full codemap.
+ 71 commands, 2 agents, 10 config files, 90 template scripts, model registry, artifact contract, cost-performance frontier, model cards, tree-search exploration, statistical rigor, experiment intelligence, performance profiling, smart checkpoints, production model export, literature integration, paper section drafting, experiment orchestration (queue + retry + fork), deep analysis (diff + watch + regress), model composition (ensemble + stitch + warm), scaling & efficiency (scale + budget + distill), meta-intelligence (transfer + audit), pre-training intelligence (sanity + baseline + leak), model debugging (xray + sensitivity + calibrate), feature & training intelligence (feature + curriculum), model surgery (prune + quantize + merge + surgery), experiment archaeology (trend + flashback + archive + annotate + search + template + replay), research communication (cite + present + changelog), collaboration (onboard + share + review), what-if analysis (whatif + counterfactual + simulate), model lifecycle (update + registry), 16 ADRs. See [docs/ARCHITECTURE.md](docs/ARCHITECTURE.md) for the full codemap.
 
  ```
  turing/
- ├── commands/ 62 skill files (core + taste-leverage + reporting + exploration + statistical rigor + experiment intelligence + performance + deployment + research workflow + orchestration + deep analysis + model composition + scaling & efficiency + meta-intelligence + pre-training intelligence + model debugging + feature & training intelligence + model surgery + experiment archaeology + research communication)
+ ├── commands/ 67 skill files (core + taste-leverage + reporting + exploration + statistical rigor + experiment intelligence + performance + deployment + research workflow + orchestration + deep analysis + model composition + scaling & efficiency + meta-intelligence + pre-training intelligence + model debugging + feature & training intelligence + model surgery + experiment archaeology + research communication + what-if analysis + model lifecycle)
  ├── agents/ 2 agents (researcher: read/write, evaluator: read-only)
  ├── config/ 8 files (lifecycle, taxonomy, archetypes, novelty aliases)
  ├── templates/ Scaffolded into user projects by /turing:init
@@ -0,0 +1,27 @@
+ ---
+ name: counterfactual
+ description: Input-level counterfactual explanations — find the smallest input change to flip a prediction.
+ disable-model-invocation: true
+ argument-hint: "<exp-id> --sample <index> [--target <class>]"
+ allowed-tools: Read, Bash(*), Grep, Glob
+ ---
+
+ What would need to change to flip this prediction? Minimum-change counterfactual for individual predictions.
+
+ ## Steps
+ 1. `source .venv/bin/activate`
+ 2. `python scripts/counterfactual_explanation.py $ARGUMENTS`
+ 3. **Saved:** `experiments/counterfactuals/`
+
+ ## Methods
+ - **Greedy perturbation:** change one feature at a time, find minimum flip
+ - **Prototype-based:** find nearest training sample from target class
+ - Both methods run and the best (smallest distance) is selected
+
+ ## Examples
+ ```
+ /turing:counterfactual exp-042 --sample 1247
+ /turing:counterfactual exp-042 --sample 1247 --target 0
+ /turing:counterfactual exp-042 --batch-misclassified
+ /turing:counterfactual exp-042 --sample 500 --json
+ ```
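The greedy-perturbation method this new skill describes can be sketched roughly as below. This is an illustrative sketch only — the function name, the `feature_grid` candidate-value interface, and the scikit-learn-style `predict`/`predict_proba` model API are assumptions, not the actual contents of `scripts/counterfactual_explanation.py`.

```python
import numpy as np

def greedy_counterfactual(model, x, target, feature_grid, max_changes=5):
    """Greedily change one feature at a time until the prediction flips.

    feature_grid: {feature_index: iterable of candidate values}.
    Among candidates that flip the prediction, prefer the smallest L2 move;
    otherwise take the step that most raises the target-class probability.
    Returns (counterfactual, changed_features) or (None, []) if no flip found.
    """
    cf = x.astype(float).copy()
    changed = []
    for _ in range(max_changes):
        if model.predict(cf.reshape(1, -1))[0] == target:
            return cf, changed
        flips, best_prob = [], None
        for j, values in feature_grid.items():
            if j in changed:
                continue
            for v in values:
                trial = cf.copy()
                trial[j] = v
                dist = float(np.linalg.norm(trial - x))
                if model.predict(trial.reshape(1, -1))[0] == target:
                    flips.append((dist, j, v))  # candidate that already flips
                else:
                    p = model.predict_proba(trial.reshape(1, -1))[0][target]
                    if best_prob is None or p > best_prob[0]:
                        best_prob = (p, j, v)   # best non-flipping step
        if flips:
            _, j, v = min(flips)
        elif best_prob is not None:
            _, j, v = best_prob
        else:
            break
        cf[j] = v
        changed.append(j)
    if model.predict(cf.reshape(1, -1))[0] == target:
        return cf, changed
    return None, []
```

The prototype-based fallback mentioned above would instead return the nearest training sample of the target class; picking whichever result has the smaller distance matches the "best of both" selection described.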
@@ -0,0 +1,31 @@
+ ---
+ name: registry
+ description: Model registry — track, promote, and govern the model lifecycle from candidate to production.
+ disable-model-invocation: true
+ argument-hint: "[list|register|promote|demote|archive|history] [exp-id] [stage]"
+ allowed-tools: Read, Bash(*), Grep, Glob
+ ---
+
+ Track which model is production, staging, candidate, or archived. Promotion requires passing gates.
+
+ ## Steps
+ 1. `source .venv/bin/activate`
+ 2. `python scripts/model_lifecycle.py $ARGUMENTS`
+ 3. **Registry:** `experiments/registry.yaml`
+
+ ## Promotion gates
+ - **candidate → staging:** regression check + seed study must PASS
+ - **staging → production:** audit + calibration check must PASS
+ - Use `--force` to skip gate checks
+
+ ## Examples
+ ```
+ /turing:registry list
+ /turing:registry register exp-095 --version v4.1
+ /turing:registry promote exp-089 staging
+ /turing:registry promote exp-089 production --force
+ /turing:registry demote exp-078 staging --reason "latency regression"
+ /turing:registry archive exp-042 --reason "superseded by v4"
+ /turing:registry history
+ /turing:registry history exp-089
+ ```
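The promotion gates above reduce to a small lookup-and-check. A minimal sketch, assuming an in-memory dict mirroring `experiments/registry.yaml`; the gate names and `promote` signature are illustrative, not the actual `scripts/model_lifecycle.py` API.

```python
# Gate checks required for each stage transition (from the skill's gate table).
GATES = {
    ("candidate", "staging"): ["regression_check", "seed_study"],
    ("staging", "production"): ["audit", "calibration_check"],
}

def promote(registry, exp_id, to_stage, check_results, force=False):
    """Move exp_id to to_stage if its gate checks PASSed (or force=True).

    registry: {exp_id: {"stage": str, ...}} — mirrors registry.yaml.
    check_results: {gate_name: "PASS" | "FAIL"}.
    """
    entry = registry[exp_id]
    required = GATES.get((entry["stage"], to_stage))
    if required is None:
        raise ValueError(f"no promotion path {entry['stage']} -> {to_stage}")
    failed = [g for g in required if check_results.get(g) != "PASS"]
    if failed and not force:
        raise ValueError(f"gates failed: {failed}")
    entry["stage"] = to_stage
    # Record the transition, noting whether gates were bypassed with --force.
    entry.setdefault("history", []).append({"to": to_stage, "forced": bool(failed)})
    return entry
```

`--force` maps to `force=True`, which records the promotion as forced rather than silently dropping the failed gates.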
@@ -0,0 +1,28 @@
+ ---
+ name: simulate
+ description: Experiment outcome prediction — predict which configs will beat the current best before running them.
+ disable-model-invocation: true
+ argument-hint: "[--configs configs.yaml] [--top-k 5] [--threshold 0.001]"
+ allowed-tools: Read, Bash(*), Grep, Glob
+ ---
+
+ Predict outcomes before spending compute. Ranks proposed configs and recommends which to run vs skip.
+
+ ## Steps
+ 1. `source .venv/bin/activate`
+ 2. `python scripts/experiment_simulator.py $ARGUMENTS`
+ 3. **Saved:** `experiments/simulations/`
+
+ ## How it works
+ - Builds a surrogate model from experiment history (weighted k-NN)
+ - Predicts metric for each proposed config
+ - Applies novelty penalty for configs far from training distribution
+ - Ranks and filters: only recommend configs predicted to improve
+
+ ## Examples
+ ```
+ /turing:simulate --configs sweep_configs.yaml
+ /turing:simulate --configs candidates.yaml --top-k 3
+ /turing:simulate --configs proposals.yaml --threshold 0.005
+ /turing:simulate --configs sweep.yaml --json
+ ```
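The weighted k-NN surrogate described in "How it works" can be sketched as follows. Function and variable names are illustrative assumptions, not the actual `scripts/experiment_simulator.py` interface; configs are assumed to be encoded as numeric vectors.

```python
import numpy as np

def knn_surrogate(history_X, history_y, query, k=5, eps=1e-9):
    """Distance-weighted k-NN prediction over past experiment configs.

    history_X: (n, d) array of past config vectors; history_y: their metrics.
    Returns (predicted_metric, novelty), where novelty is the mean distance
    to the k nearest neighbours — usable as the penalty signal for configs
    far from the training distribution.
    """
    d = np.linalg.norm(history_X - query, axis=1)
    idx = np.argsort(d)[:k]
    w = 1.0 / (d[idx] + eps)           # closer experiments weigh more
    pred = float(np.sum(w * history_y[idx]) / np.sum(w))
    novelty = float(np.mean(d[idx]))
    return pred, novelty
```

Ranking proposed configs by `pred` (minus some multiple of `novelty`) and keeping only those predicted to beat the current best by `--threshold` matches the rank-and-filter step above.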
@@ -75,6 +75,11 @@ You are the Turing ML research router. Detect the user's intent and route to the
  | "search", "find experiment", "query experiments", "which experiments" | `/turing:search` | Query |
  | "template", "recipe", "save config", "reusable config", "starting point" | `/turing:template` | Manage |
  | "replay", "re-run", "revisit", "retry old", "would it work now" | `/turing:replay` | Validate |
+ | "what if", "what-if", "hypothetical", "estimate impact", "would it help" | `/turing:whatif` | Analyze |
+ | "counterfactual", "flip prediction", "why this prediction", "minimum change", "explanation" | `/turing:counterfactual` | Explain |
+ | "simulate", "predict outcome", "pre-filter", "which configs will work", "forecast" | `/turing:simulate` | Predict |
+ | "update", "incremental", "new data", "add data", "fine-tune existing", "partial update" | `/turing:update` | Update |
+ | "registry", "promote", "demote", "staging", "production", "which model is deployed", "model lifecycle" | `/turing:registry` | Govern |
 
  ## Sub-commands
 
@@ -146,6 +151,11 @@ You are the Turing ML research router. Detect the user's intent and route to the
  | `/turing:onboard [--audience]` | Project onboarding: full walkthrough for new collaborators | (inline) |
  | `/turing:share <exp-ids...>` | Experiment packaging: portable archive with manifest and README | (inline) |
  | `/turing:review [--venue]` | Peer review simulation: weaknesses, questions, fix commands, score | (inline) |
+ | `/turing:whatif "<question>"` | What-if analysis: route hypotheticals to existing estimators (scaling, ablation, sensitivity, ensemble, pruning) | (inline) |
+ | `/turing:counterfactual <exp-id> --sample <index>` | Input-level counterfactual explanations: minimum input change to flip a prediction | (inline) |
+ | `/turing:simulate [--configs] [--top-k]` | Experiment outcome prediction: pre-filter configs using surrogate model, save budget | (inline) |
+ | `/turing:update <exp-id> --new-data <path>` | Incremental model update: add new data without full retraining, forgetting detection | (inline) |
+ | `/turing:registry [list\|register\|promote\|demote\|history]` | Model registry: stage lifecycle (candidate → staging → production) with promotion gates | (inline) |
 
  ## Proactive Detection
 
@@ -0,0 +1,27 @@
+ ---
+ name: update
+ description: Incremental model update — add new data without full retraining, with forgetting detection.
+ disable-model-invocation: true
+ argument-hint: "<exp-id> --new-data <path> [--replay-ratio 0.1] [--tolerance 0.005]"
+ allowed-tools: Read, Bash(*), Grep, Glob
+ ---
+
+ Add new data to an existing model without starting from scratch. Detects catastrophic forgetting.
+
+ ## Steps
+ 1. `source .venv/bin/activate`
+ 2. `python scripts/incremental_update.py $ARGUMENTS`
+ 3. **Saved:** `experiments/updates/`
+
+ ## Model-specific strategies
+ - **XGBoost/LightGBM:** continued boosting with additional rounds
+ - **Neural networks:** fine-tune with reduced LR + replay buffer from old data
+ - **scikit-learn:** partial_fit() or warm_start=True
+
+ ## Examples
+ ```
+ /turing:update exp-089 --new-data data/new_batch.csv
+ /turing:update exp-089 --new-data data/new.csv --replay-ratio 0.2
+ /turing:update exp-089 --new-data data/new.csv --tolerance 0.01
+ /turing:update exp-089 --new-data data/new.csv --json
+ ```
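The replay-buffer strategy plus forgetting check can be sketched as below, against any model exposing scikit-learn-style `partial_fit(X, y)` and `score(X, y)`. The function name, `replay_ratio`/`tolerance` semantics, and return shape are assumptions mirroring the flags above, not the actual `scripts/incremental_update.py`.

```python
import numpy as np

def incremental_update(model, X_new, y_new, X_old, y_old, X_holdout, y_holdout,
                       replay_ratio=0.1, tolerance=0.005, seed=0):
    """Fine-tune `model` on new data mixed with a replay sample of old data,
    then compare held-out scores before/after to flag catastrophic forgetting.
    """
    rng = np.random.default_rng(seed)
    score_before = model.score(X_holdout, y_holdout)
    # Replay buffer: mix in a fraction of old data (at least one sample).
    n_replay = min(int(replay_ratio * len(X_new)) or 1, len(X_old))
    idx = rng.choice(len(X_old), size=n_replay, replace=False)
    X_mix = np.vstack([X_new, X_old[idx]])
    y_mix = np.concatenate([y_new, y_old[idx]])
    model.partial_fit(X_mix, y_mix)
    score_after = model.score(X_holdout, y_holdout)
    return {
        "score_before": float(score_before),
        "score_after": float(score_after),
        "forgetting": (score_before - score_after) > tolerance,
    }
```

For boosted trees the same shape applies with continued boosting rounds in place of `partial_fit`; for neural networks, a reduced learning rate on the mixed batch.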
@@ -0,0 +1,31 @@
+ ---
+ name: whatif
+ description: What-if analysis — answer hypotheticals from existing experiment data without running new experiments.
+ disable-model-invocation: true
+ argument-hint: "\"<question>\" [--json]"
+ allowed-tools: Read, Bash(*), Grep, Glob
+ ---
+
+ Answer "what if?" questions using existing experiment data. Routes to the right estimator automatically.
+
+ ## Steps
+ 1. `source .venv/bin/activate`
+ 2. `python scripts/whatif_engine.py $ARGUMENTS`
+ 3. **Saved:** `experiments/whatif/`
+
+ ## Supported question types
+ - **Data scaling:** "what if I had 2x more data" → scaling law extrapolation
+ - **Ablation:** "what if I removed class 3" → ablation study data
+ - **Pipeline stitch:** "what if I combined exp-031 with exp-042" → stitch estimation
+ - **Hyperparameters:** "what if learning_rate was 0.01" → sensitivity interpolation
+ - **Ensemble:** "what if I ensembled the top models" → correlation analysis
+ - **Pruning:** "what if I pruned to 50% sparsity" → pruning sweep interpolation
+ - **Budget:** "what if I spent my budget on X vs Y" → budget allocation
+
+ ## Examples
+ ```
+ /turing:whatif "what if I had 2x more data"
+ /turing:whatif "what if I removed class 3"
+ /turing:whatif "what if I combined exp-031 with exp-042"
+ /turing:whatif "what if learning_rate was 0.01" --json
+ ```
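The automatic routing step could be as simple as first-match keyword rules over the question. A hypothetical sketch covering the question types listed above — the patterns and route names are illustrative, not what `scripts/whatif_engine.py` actually does.

```python
import re

# First-match routing rules: (pattern, estimator). Order matters — more
# specific phrasings are checked before the generic hyperparameter rule.
ROUTES = [
    (r"\b\d+x more data|more data\b", "scaling"),
    (r"\bremoved?\b|\bdropp?ed\b|\bwithout\b", "ablation"),
    (r"\bcombined?\b|\bstitch", "stitch"),
    (r"\bensembl", "ensemble"),
    (r"\bprun", "pruning"),
    (r"\bbudget\b", "budget"),
    (r"\b[a-z_]+\s*(was|=)\s*[\d.]+", "hyperparameter"),
]

def route_question(question):
    """Return the estimator name for a what-if question, or 'unknown'."""
    q = question.lower()
    for pattern, estimator in ROUTES:
        if re.search(pattern, q):
            return estimator
    return "unknown"
```

A real router would likely also extract arguments (the experiment IDs, the hyperparameter value, the sparsity level) for the chosen estimator.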
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "claude-turing",
- "version": "4.1.0",
+ "version": "4.3.0",
  "type": "module",
  "description": "Autonomous ML research harness for Claude Code. The autoresearch loop as a formal protocol — iteratively trains, evaluates, and improves ML models with structured experiment tracking, convergence detection, immutable evaluation infrastructure, and safety guardrails.",
  "bin": {
package/src/install.js CHANGED
@@ -36,6 +36,8 @@ const SUB_COMMANDS = [
  "trend", "flashback", "archive", "annotate", "search", "template", "replay",
  "cite", "present", "changelog",
  "onboard", "share", "review",
+ "whatif", "counterfactual", "simulate",
+ "update", "registry",
  ];
 
  export async function install(opts = {}) {
package/src/verify.js CHANGED
@@ -80,6 +80,11 @@ const EXPECTED_COMMANDS = [
  "onboard/SKILL.md",
  "share/SKILL.md",
  "review/SKILL.md",
+ "whatif/SKILL.md",
+ "counterfactual/SKILL.md",
+ "simulate/SKILL.md",
+ "update/SKILL.md",
+ "registry/SKILL.md",
  ];
 
  const EXPECTED_AGENTS = ["ml-researcher.md", "ml-evaluator.md"];