claude-turing 4.3.0 → 4.4.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -1,7 +1,7 @@
  {
  "name": "turing",
- "version": "4.3.0",
- "description": "Autonomous ML research harness — the autoresearch loop as a formal protocol. 71 commands, 2 specialized agents, model lifecycle (update + registry), what-if analysis (whatif + counterfactual + simulate), collaboration (onboard + share + review), research communication (cite + present + changelog), experiment archaeology (trend + flashback + archive + annotate + search + template + replay), model surgery (prune + quantize + merge + surgery), feature & training intelligence, model debugging, pre-training intelligence, meta-intelligence, scaling & efficiency, model composition, deep analysis, experiment orchestration, literature + paper, model export, profiling, checkpoints, experiment intelligence, statistical rigor, tree-search, cost-performance, model cards, hypothesis database, novelty guard, anti-cheating, taste-leverage loop. Inspired by Karpathy's autoresearch and the scientific method itself.",
+ "version": "4.4.0",
+ "description": "Autonomous ML research harness — the autoresearch loop as a formal protocol. 74 commands, 2 specialized agents, operational intelligence (postmortem + doctor + plan), model lifecycle (update + registry), what-if analysis (whatif + counterfactual + simulate), collaboration (onboard + share + review), research communication (cite + present + changelog), experiment archaeology (trend + flashback + archive + annotate + search + template + replay), model surgery (prune + quantize + merge + surgery), feature & training intelligence, model debugging, pre-training intelligence, meta-intelligence, scaling & efficiency, model composition, deep analysis, experiment orchestration, literature + paper, model export, profiling, checkpoints, experiment intelligence, statistical rigor, tree-search, cost-performance, model cards, hypothesis database, novelty guard, anti-cheating, taste-leverage loop. Inspired by Karpathy's autoresearch and the scientific method itself.",
  "author": {
  "name": "pragnition"
  },
package/README.md CHANGED
@@ -382,6 +382,9 @@ The index (`hypotheses.yaml`) is the lightweight queue. The detail files (`hypot
  | `/turing:simulate [--configs]` | Experiment outcome prediction — pre-filter configs, save budget |
  | `/turing:update <exp-id>` | Incremental model update — add new data without full retraining |
  | `/turing:registry [action]` | Model registry — track lifecycle from candidate to production with gates |
+ | `/turing:postmortem` | Failure postmortem — diagnose why experiments stopped improving |
+ | `/turing:doctor [--fix]` | Harness self-diagnosis — check environment, project, resources |
+ | `/turing:plan [--budget N]` | Research planning — strategic experiment campaign by ROI |
 
  And for fully hands-off operation:
 
@@ -566,11 +569,11 @@ Each project gets independent config, data, experiments, models, and agent memor
 
  ## Architecture of Turing Itself
 
- 71 commands, 2 agents, 10 config files, 90 template scripts, model registry, artifact contract, cost-performance frontier, model cards, tree-search exploration, statistical rigor, experiment intelligence, performance profiling, smart checkpoints, production model export, literature integration, paper section drafting, experiment orchestration (queue + retry + fork), deep analysis (diff + watch + regress), model composition (ensemble + stitch + warm), scaling & efficiency (scale + budget + distill), meta-intelligence (transfer + audit), pre-training intelligence (sanity + baseline + leak), model debugging (xray + sensitivity + calibrate), feature & training intelligence (feature + curriculum), model surgery (prune + quantize + merge + surgery), experiment archaeology (trend + flashback + archive + annotate + search + template + replay), research communication (cite + present + changelog), collaboration (onboard + share + review), what-if analysis (whatif + counterfactual + simulate), model lifecycle (update + registry), 16 ADRs. See [docs/ARCHITECTURE.md](docs/ARCHITECTURE.md) for the full codemap.
+ 74 commands, 2 agents, 10 config files, 93 template scripts, model registry, artifact contract, cost-performance frontier, model cards, tree-search exploration, statistical rigor, experiment intelligence, performance profiling, smart checkpoints, production model export, literature integration, paper section drafting, experiment orchestration (queue + retry + fork), deep analysis (diff + watch + regress), model composition (ensemble + stitch + warm), scaling & efficiency (scale + budget + distill), meta-intelligence (transfer + audit), pre-training intelligence (sanity + baseline + leak), model debugging (xray + sensitivity + calibrate), feature & training intelligence (feature + curriculum), model surgery (prune + quantize + merge + surgery), experiment archaeology (trend + flashback + archive + annotate + search + template + replay), research communication (cite + present + changelog), collaboration (onboard + share + review), what-if analysis (whatif + counterfactual + simulate), model lifecycle (update + registry), operational intelligence (postmortem + doctor + plan), 16 ADRs. See [docs/ARCHITECTURE.md](docs/ARCHITECTURE.md) for the full codemap.
 
  ```
  turing/
- ├── commands/ 67 skill files (core + taste-leverage + reporting + exploration + statistical rigor + experiment intelligence + performance + deployment + research workflow + orchestration + deep analysis + model composition + scaling & efficiency + meta-intelligence + pre-training intelligence + model debugging + feature & training intelligence + model surgery + experiment archaeology + research communication + what-if analysis + model lifecycle)
+ ├── commands/ 70 skill files (core + taste-leverage + reporting + exploration + statistical rigor + experiment intelligence + performance + deployment + research workflow + orchestration + deep analysis + model composition + scaling & efficiency + meta-intelligence + pre-training intelligence + model debugging + feature & training intelligence + model surgery + experiment archaeology + research communication + what-if analysis + model lifecycle + operational intelligence)
  ├── agents/ 2 agents (researcher: read/write, evaluator: read-only)
  ├── config/ 8 files (lifecycle, taxonomy, archetypes, novelty aliases)
  ├── templates/ Scaffolded into user projects by /turing:init
@@ -0,0 +1,30 @@
+ ---
+ name: doctor
+ description: Harness self-diagnosis — check environment, project, resources, and git state. Auto-fix common issues.
+ disable-model-invocation: true
+ argument-hint: "[--fix] [--verbose]"
+ allowed-tools: Read, Bash(*), Grep, Glob
+ ---
+
+ Is Turing healthy? Check everything and get a score.
+
+ ## Steps
+ 1. `source .venv/bin/activate`
+ 2. `python scripts/harness_doctor.py $ARGUMENTS`
+ 3. **Saved:** `experiments/doctor/`
+
+ ## Checks
+ - **Environment:** Python version, venv status
+ - **Dependencies:** all required packages importable
+ - **Config:** config.yaml valid with required fields
+ - **Experiment log:** JSONL integrity, corrupt line detection
+ - **Scripts:** train.py, prepare.py, evaluate.py exist and parse
+ - **Disk space:** warn if <1GB free
+ - **Git state:** uncommitted changes to critical files
+
+ ## Examples
+ ```
+ /turing:doctor
+ /turing:doctor --fix
+ /turing:doctor --verbose --json
+ ```
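The `scripts/harness_doctor.py` implementation itself is not part of this diff, but the checks the skill lists can be sketched in a few lines. Everything below — function names, the pass/fail scoring, the thresholds — is illustrative, not the package's actual code:

```python
import json
import shutil
import sys
from pathlib import Path

def check_python(min_version=(3, 9)):
    """Environment check: interpreter version meets an assumed minimum."""
    return sys.version_info[:2] >= min_version

def check_log_integrity(log_path):
    """Experiment-log check: every non-empty line of the JSONL file parses."""
    path = Path(log_path)
    if not path.exists():
        return True  # nothing to validate yet
    for line in path.read_text().splitlines():
        if not line.strip():
            continue
        try:
            json.loads(line)
        except json.JSONDecodeError:
            return False  # corrupt line detected
    return True

def check_disk_space(min_free_gb=1):
    """Disk check: fail when less than min_free_gb remains (the README's <1GB warning)."""
    return shutil.disk_usage(".").free >= min_free_gb * 1024**3

def health_score(checks):
    """Aggregate named pass/fail checks into a 0-100 score."""
    results = {name: fn() for name, fn in checks.items()}
    score = 100 * sum(results.values()) // len(results)
    return score, results
```

A `--fix` mode would presumably re-run each failing check after attempting a remedy; that behavior is not shown in the diff.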
@@ -0,0 +1,27 @@
+ ---
+ name: plan
+ description: Research planning assistant — design a strategic experiment campaign with budget-aware ROI allocation.
+ disable-model-invocation: true
+ argument-hint: "[--budget 20] [--goal \"maximize F1 for production\"]"
+ allowed-tools: Read, Bash(*), Grep, Glob
+ ---
+
+ Design the next N experiments strategically, not randomly. Allocates budget by expected ROI.
+
+ ## Steps
+ 1. `source .venv/bin/activate`
+ 2. `python scripts/research_planner.py $ARGUMENTS`
+ 3. **Saved:** `experiments/plans/`
+
+ ## How it works
+ - Analyzes experiment history to compute per-family ROI
+ - Adjusts strategy priorities based on project state and goal
+ - Allocates budget across: feature engineering, model search, ensemble, calibration, verification
+ - Generates phased plan with specific experiment descriptions
+
+ ## Examples
+ ```
+ /turing:plan --budget 20
+ /turing:plan --budget 10 --goal "maximize F1 for production deployment"
+ /turing:plan --budget 30 --json
+ ```
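`scripts/research_planner.py` is likewise absent from this diff. As a minimal sketch of the "per-family ROI, budget-aware allocation" idea described above — with an assumed history schema (`family` / `delta` keys) that may not match the real experiment log:

```python
def family_roi(history):
    """Per-family ROI: mean metric improvement per experiment in that family.

    `history` is a list of dicts with "family" and "delta" keys — a stand-in
    for the real experiment-log schema, which is not shown in the diff.
    """
    totals = {}
    for exp in history:
        gain, count = totals.get(exp["family"], (0.0, 0))
        totals[exp["family"]] = (gain + exp["delta"], count + 1)
    return {fam: gain / count for fam, (gain, count) in totals.items()}

def allocate_budget(budget, roi):
    """Split an experiment budget across families proportionally to positive ROI."""
    positive = {f: r for f, r in roi.items() if r > 0}
    if not positive:  # nothing is working: spread the budget evenly
        return {f: budget // len(roi) for f in roi}
    total = sum(positive.values())
    alloc = {f: int(budget * r / total) for f, r in positive.items()}
    # hand any rounding leftover to the highest-ROI family
    alloc[max(positive, key=positive.get)] += budget - sum(alloc.values())
    return alloc
```

The real planner also reweights by project state and the `--goal` string before allocating; that step is omitted here.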
@@ -0,0 +1,28 @@
+ ---
+ name: postmortem
+ description: Failure postmortem — diagnose why experiments stopped improving and get actionable next steps.
+ disable-model-invocation: true
+ argument-hint: "[--window 10] [--auto-trigger 5]"
+ allowed-tools: Read, Bash(*), Grep, Glob
+ ---
+
+ When experiments stop improving, find out why. Diagnoses search space exhaustion, config errors, data issues, metric ceilings, and noise floors.
+
+ ## Steps
+ 1. `source .venv/bin/activate`
+ 2. `python scripts/failure_postmortem.py $ARGUMENTS`
+ 3. **Saved:** `experiments/postmortems/`
+
+ ## Diagnosis categories
+ - **Search space exhaustion:** micro-tuning params that don't matter
+ - **Systematic config error:** all experiments share a bad common config
+ - **Data issue:** all model types fail similarly
+ - **Metric ceiling:** near theoretical maximum
+ - **Noise floor:** improvements within seed variance
+
+ ## Examples
+ ```
+ /turing:postmortem
+ /turing:postmortem --window 15
+ /turing:postmortem --json
+ ```
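The noise-floor category above — improvements that sit within seed-to-seed variance — can be illustrated with a small heuristic. This is a hypothetical stand-in for whatever `scripts/failure_postmortem.py` actually computes; the two-sigma threshold and the `seed_std` input are assumptions:

```python
def noise_floor_verdict(recent_scores, seed_std, window=10):
    """Flag a noise floor: the total gain over the recent window is no
    larger than twice the metric's seed-to-seed standard deviation,
    so apparent "improvements" are indistinguishable from run noise.

    seed_std: std of the metric across repeated runs of one config
    (an assumed input; the real script's inputs are not in the diff).
    """
    scores = recent_scores[-window:]
    if len(scores) < 2:
        return False  # not enough history to judge
    best_delta = max(scores) - scores[0]
    return best_delta <= 2 * seed_std
```

In a real postmortem this verdict would be weighed against the other categories (config error, data issue, ceiling) before recommending next steps.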
@@ -80,6 +80,9 @@ You are the Turing ML research router. Detect the user's intent and route to the
  | "simulate", "predict outcome", "pre-filter", "which configs will work", "forecast" | `/turing:simulate` | Predict |
  | "update", "incremental", "new data", "add data", "fine-tune existing", "partial update" | `/turing:update` | Update |
  | "registry", "promote", "demote", "staging", "production", "which model is deployed", "model lifecycle" | `/turing:registry` | Govern |
+ | "postmortem", "why failing", "failure streak", "why no improvement", "what went wrong" | `/turing:postmortem` | Diagnose |
+ | "doctor", "health check", "is it broken", "diagnose harness", "self-check" | `/turing:doctor` | Check |
+ | "plan", "research plan", "campaign", "what next", "allocate budget", "strategic plan" | `/turing:plan` | Plan |
 
  ## Sub-commands
 
@@ -156,6 +159,9 @@ You are the Turing ML research router. Detect the user's intent and route to the
  | `/turing:simulate [--configs] [--top-k]` | Experiment outcome prediction: pre-filter configs using surrogate model, save budget | (inline) |
  | `/turing:update <exp-id> --new-data <path>` | Incremental model update: add new data without full retraining, forgetting detection | (inline) |
  | `/turing:registry [list\|register\|promote\|demote\|history]` | Model registry: stage lifecycle (candidate → staging → production) with promotion gates | (inline) |
+ | `/turing:postmortem [--window N]` | Failure postmortem: diagnose why experiments stopped improving (exhaustion, config error, data issue, ceiling, noise) | (inline) |
+ | `/turing:doctor [--fix]` | Harness self-diagnosis: environment, dependencies, config, log integrity, scripts, disk, git state | (inline) |
+ | `/turing:plan [--budget N] [--goal]` | Research planning assistant: strategic campaign design with budget-aware ROI allocation | (inline) |
 
  ## Proactive Detection
 
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "claude-turing",
- "version": "4.3.0",
+ "version": "4.4.0",
  "type": "module",
  "description": "Autonomous ML research harness for Claude Code. The autoresearch loop as a formal protocol — iteratively trains, evaluates, and improves ML models with structured experiment tracking, convergence detection, immutable evaluation infrastructure, and safety guardrails.",
  "bin": {
package/src/install.js CHANGED
@@ -38,6 +38,7 @@ const SUB_COMMANDS = [
  "onboard", "share", "review",
  "whatif", "counterfactual", "simulate",
  "update", "registry",
+ "postmortem", "doctor", "plan",
  ];
 
  export async function install(opts = {}) {
package/src/verify.js CHANGED
@@ -85,6 +85,9 @@ const EXPECTED_COMMANDS = [
  "simulate/SKILL.md",
  "update/SKILL.md",
  "registry/SKILL.md",
+ "postmortem/SKILL.md",
+ "doctor/SKILL.md",
+ "plan/SKILL.md",
  ];
 
  const EXPECTED_AGENTS = ["ml-researcher.md", "ml-evaluator.md"];