claude-turing 4.3.0 → 4.4.0
- package/.claude-plugin/plugin.json +2 -2
- package/README.md +5 -2
- package/commands/doctor.md +30 -0
- package/commands/plan.md +27 -0
- package/commands/postmortem.md +28 -0
- package/commands/turing.md +6 -0
- package/package.json +1 -1
- package/src/install.js +1 -0
- package/src/verify.js +3 -0
- package/templates/scripts/__pycache__/failure_postmortem.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/generate_brief.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/harness_doctor.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/research_planner.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/scaffold.cpython-314.pyc +0 -0
- package/templates/scripts/failure_postmortem.py +510 -0
- package/templates/scripts/generate_brief.py +61 -0
- package/templates/scripts/harness_doctor.py +466 -0
- package/templates/scripts/research_planner.py +470 -0
- package/templates/scripts/scaffold.py +6 -0
package/.claude-plugin/plugin.json
CHANGED
@@ -1,7 +1,7 @@
 {
 "name": "turing",
-"version": "4.
-"description": "Autonomous ML research harness — the autoresearch loop as a formal protocol.
+"version": "4.4.0",
+"description": "Autonomous ML research harness — the autoresearch loop as a formal protocol. 74 commands, 2 specialized agents, operational intelligence (postmortem + doctor + plan), model lifecycle (update + registry), what-if analysis (whatif + counterfactual + simulate), collaboration (onboard + share + review), research communication (cite + present + changelog), experiment archaeology (trend + flashback + archive + annotate + search + template + replay), model surgery (prune + quantize + merge + surgery), feature & training intelligence, model debugging, pre-training intelligence, meta-intelligence, scaling & efficiency, model composition, deep analysis, experiment orchestration, literature + paper, model export, profiling, checkpoints, experiment intelligence, statistical rigor, tree-search, cost-performance, model cards, hypothesis database, novelty guard, anti-cheating, taste-leverage loop. Inspired by Karpathy's autoresearch and the scientific method itself.",
 "author": {
 "name": "pragnition"
 },
package/README.md
CHANGED
@@ -382,6 +382,9 @@ The index (`hypotheses.yaml`) is the lightweight queue. The detail files (`hypot
 | `/turing:simulate [--configs]` | Experiment outcome prediction — pre-filter configs, save budget |
 | `/turing:update <exp-id>` | Incremental model update — add new data without full retraining |
 | `/turing:registry [action]` | Model registry — track lifecycle from candidate to production with gates |
+| `/turing:postmortem` | Failure postmortem — diagnose why experiments stopped improving |
+| `/turing:doctor [--fix]` | Harness self-diagnosis — check environment, project, resources |
+| `/turing:plan [--budget N]` | Research planning — strategic experiment campaign by ROI |
 
 And for fully hands-off operation:
 
@@ -566,11 +569,11 @@ Each project gets independent config, data, experiments, models, and agent memor
 
 ## Architecture of Turing Itself
 
-
+74 commands, 2 agents, 10 config files, 93 template scripts, model registry, artifact contract, cost-performance frontier, model cards, tree-search exploration, statistical rigor, experiment intelligence, performance profiling, smart checkpoints, production model export, literature integration, paper section drafting, experiment orchestration (queue + retry + fork), deep analysis (diff + watch + regress), model composition (ensemble + stitch + warm), scaling & efficiency (scale + budget + distill), meta-intelligence (transfer + audit), pre-training intelligence (sanity + baseline + leak), model debugging (xray + sensitivity + calibrate), feature & training intelligence (feature + curriculum), model surgery (prune + quantize + merge + surgery), experiment archaeology (trend + flashback + archive + annotate + search + template + replay), research communication (cite + present + changelog), collaboration (onboard + share + review), what-if analysis (whatif + counterfactual + simulate), model lifecycle (update + registry), operational intelligence (postmortem + doctor + plan), 16 ADRs. See [docs/ARCHITECTURE.md](docs/ARCHITECTURE.md) for the full codemap.
 
 ```
 turing/
-├── commands/
+├── commands/ 70 skill files (core + taste-leverage + reporting + exploration + statistical rigor + experiment intelligence + performance + deployment + research workflow + orchestration + deep analysis + model composition + scaling & efficiency + meta-intelligence + pre-training intelligence + model debugging + feature & training intelligence + model surgery + experiment archaeology + research communication + what-if analysis + model lifecycle + operational intelligence)
 ├── agents/ 2 agents (researcher: read/write, evaluator: read-only)
 ├── config/ 8 files (lifecycle, taxonomy, archetypes, novelty aliases)
 ├── templates/ Scaffolded into user projects by /turing:init
package/commands/doctor.md
ADDED
@@ -0,0 +1,30 @@
+---
+name: doctor
+description: Harness self-diagnosis — check environment, project, resources, and git state. Auto-fix common issues.
+disable-model-invocation: true
+argument-hint: "[--fix] [--verbose]"
+allowed-tools: Read, Bash(*), Grep, Glob
+---
+
+Is Turing healthy? Check everything and get a score.
+
+## Steps
+1. `source .venv/bin/activate`
+2. `python scripts/harness_doctor.py $ARGUMENTS`
+3. **Saved:** `experiments/doctor/`
+
+## Checks
+- **Environment:** Python version, venv status
+- **Dependencies:** all required packages importable
+- **Config:** config.yaml valid with required fields
+- **Experiment log:** JSONL integrity, corrupt line detection
+- **Scripts:** train.py, prepare.py, evaluate.py exist and parse
+- **Disk space:** warn if <1GB free
+- **Git state:** uncommitted changes to critical files
+
+## Examples
+```
+/turing:doctor
+/turing:doctor --fix
+/turing:doctor --verbose --json
+```
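The checks listed in `commands/doctor.md` are carried out by `harness_doctor.py`, whose source is not shown in this diff. As a rough illustration only, two of them (corrupt-line detection in a JSONL experiment log and the low-disk warning) could be sketched as below. The function names, the reading of "1GB" as 1 GiB, and the decision to tolerate blank lines are all assumptions made for this sketch, not details taken from the package.

```python
import json
import shutil


def find_corrupt_jsonl_lines(text: str) -> list[int]:
    """Return 1-based line numbers that do not parse as a JSON object.

    Hypothetical helper: the real script may define corruption differently.
    """
    corrupt = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # assumption: blank lines are tolerated
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            corrupt.append(lineno)
            continue
        if not isinstance(record, dict):
            # valid JSON, but not an experiment record
            corrupt.append(lineno)
    return corrupt


def low_disk_space(path: str = ".", min_free_bytes: int = 1024**3) -> bool:
    """True when free space under `path` is below the threshold (1 GiB assumed)."""
    return shutil.disk_usage(path).free < min_free_bytes
```

A caller would read the experiment log, pass its text to `find_corrupt_jsonl_lines`, and report the returned line numbers; `low_disk_space(".")` yields the boolean behind the disk warning.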
package/commands/plan.md
ADDED
@@ -0,0 +1,27 @@
+---
+name: plan
+description: Research planning assistant — design a strategic experiment campaign with budget-aware ROI allocation.
+disable-model-invocation: true
+argument-hint: "[--budget 20] [--goal \"maximize F1 for production\"]"
+allowed-tools: Read, Bash(*), Grep, Glob
+---
+
+Design the next N experiments strategically, not randomly. Allocates budget by expected ROI.
+
+## Steps
+1. `source .venv/bin/activate`
+2. `python scripts/research_planner.py $ARGUMENTS`
+3. **Saved:** `experiments/plans/`
+
+## How it works
+- Analyzes experiment history to compute per-family ROI
+- Adjusts strategy priorities based on project state and goal
+- Allocates budget across: feature engineering, model search, ensemble, calibration, verification
+- Generates phased plan with specific experiment descriptions
+
+## Examples
+```
+/turing:plan --budget 20
+/turing:plan --budget 10 --goal "maximize F1 for production deployment"
+/turing:plan --budget 30 --json
+```
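The "How it works" list in `commands/plan.md` describes computing per-family ROI and splitting a budget accordingly, but `research_planner.py` itself is not shown in this diff. One plausible reading, offered purely as a sketch: treat ROI as mean metric gain per experiment in a family, then divide the budget in proportion to positive ROI using largest-remainder rounding. Both function names, the ROI definition, and the rounding rule are assumptions.

```python
from collections import defaultdict


def family_roi(history):
    """Mean metric gain per experiment, keyed by family.

    `history` is an iterable of (family, gain) pairs (hypothetical format).
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for family, gain in history:
        totals[family] += gain
        counts[family] += 1
    return {family: totals[family] / counts[family] for family in totals}


def allocate_budget(roi, budget):
    """Split `budget` experiments in proportion to positive ROI."""
    weights = {family: max(value, 0.0) for family, value in roi.items()}
    total = sum(weights.values())
    if total == 0:
        # assumption: when no family has paid off, spread the budget evenly
        weights = {family: 1.0 for family in roi}
        total = float(len(weights))
    shares = {family: budget * w / total for family, w in weights.items()}
    plan = {family: int(share) for family, share in shares.items()}
    # hand out the experiments lost to rounding, largest remainder first
    leftover = budget - sum(plan.values())
    by_remainder = sorted(shares, key=lambda f: shares[f] - plan[f], reverse=True)
    for family in by_remainder[:leftover]:
        plan[family] += 1
    return plan
```

With a history where feature engineering averaged twice the gain of model search and ensembling lost ground, a budget of 20 splits 13 / 7 / 0 under this scheme.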
package/commands/postmortem.md
ADDED
@@ -0,0 +1,28 @@
+---
+name: postmortem
+description: Failure postmortem — diagnose why experiments stopped improving and get actionable next steps.
+disable-model-invocation: true
+argument-hint: "[--window 10] [--auto-trigger 5]"
+allowed-tools: Read, Bash(*), Grep, Glob
+---
+
+When experiments stop improving, find out why. Diagnoses search space exhaustion, config errors, data issues, metric ceilings, and noise floors.
+
+## Steps
+1. `source .venv/bin/activate`
+2. `python scripts/failure_postmortem.py $ARGUMENTS`
+3. **Saved:** `experiments/postmortems/`
+
+## Diagnosis categories
+- **Search space exhaustion:** micro-tuning params that don't matter
+- **Systematic config error:** all experiments share a bad common config
+- **Data issue:** all model types fail similarly
+- **Metric ceiling:** near theoretical maximum
+- **Noise floor:** improvements within seed variance
+
+## Examples
+```
+/turing:postmortem
+/turing:postmortem --window 15
+/turing:postmortem --json
+```
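Of the diagnosis categories in `commands/postmortem.md`, "noise floor" has the most direct statistical reading: recent improvements are no larger than the spread between runs that differ only by seed. The sketch below is one hypothetical way to test that. It is not taken from `failure_postmortem.py`, which this diff does not show, and the function name and the `k = 2` multiplier are assumptions.

```python
from statistics import stdev


def within_noise_floor(recent_gains, seed_scores, k: float = 2.0) -> bool:
    """True when every recent gain is within k sample standard deviations
    of the seed-to-seed spread.

    recent_gains: metric deltas over the inspected window of experiments.
    seed_scores:  metric values from identical runs differing only by seed.
    """
    if len(seed_scores) < 2:
        raise ValueError("need at least two seed runs to estimate variance")
    noise = k * stdev(seed_scores)
    # an empty window gives no evidence, so it is not reported as a noise floor
    return bool(recent_gains) and all(abs(g) <= noise for g in recent_gains)
```

For seed scores of 0.80, 0.82, 0.81 and 0.79 the sample standard deviation is about 0.013, so gains of a few thousandths would be flagged as noise while a gain of 0.05 would not.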
package/commands/turing.md
CHANGED
@@ -80,6 +80,9 @@ You are the Turing ML research router. Detect the user's intent and route to the
 | "simulate", "predict outcome", "pre-filter", "which configs will work", "forecast" | `/turing:simulate` | Predict |
 | "update", "incremental", "new data", "add data", "fine-tune existing", "partial update" | `/turing:update` | Update |
 | "registry", "promote", "demote", "staging", "production", "which model is deployed", "model lifecycle" | `/turing:registry` | Govern |
+| "postmortem", "why failing", "failure streak", "why no improvement", "what went wrong" | `/turing:postmortem` | Diagnose |
+| "doctor", "health check", "is it broken", "diagnose harness", "self-check" | `/turing:doctor` | Check |
+| "plan", "research plan", "campaign", "what next", "allocate budget", "strategic plan" | `/turing:plan` | Plan |
 
 ## Sub-commands
 
@@ -156,6 +159,9 @@ You are the Turing ML research router. Detect the user's intent and route to the
 | `/turing:simulate [--configs] [--top-k]` | Experiment outcome prediction: pre-filter configs using surrogate model, save budget | (inline) |
 | `/turing:update <exp-id> --new-data <path>` | Incremental model update: add new data without full retraining, forgetting detection | (inline) |
 | `/turing:registry [list\|register\|promote\|demote\|history]` | Model registry: stage lifecycle (candidate → staging → production) with promotion gates | (inline) |
+| `/turing:postmortem [--window N]` | Failure postmortem: diagnose why experiments stopped improving (exhaustion, config error, data issue, ceiling, noise) | (inline) |
+| `/turing:doctor [--fix]` | Harness self-diagnosis: environment, dependencies, config, log integrity, scripts, disk, git state | (inline) |
+| `/turing:plan [--budget N] [--goal]` | Research planning assistant: strategic campaign design with budget-aware ROI allocation | (inline) |
 
 ## Proactive Detection
 
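The routing rows added to `commands/turing.md` are instructions for a prompt-driven router, not executable code. To make their intent concrete, the three new rows can be pictured as a phrase lookup, as in the sketch below. The `ROUTES` table copies the phrases from the diff; the `route` function, its whole-word matching, and its longest-phrase tie-break are illustrative assumptions.

```python
import re

# Trigger phrases copied from the three rows added in this release.
ROUTES = {
    "/turing:postmortem": [
        "postmortem", "why failing", "failure streak",
        "why no improvement", "what went wrong",
    ],
    "/turing:doctor": [
        "doctor", "health check", "is it broken",
        "diagnose harness", "self-check",
    ],
    "/turing:plan": [
        "plan", "research plan", "campaign",
        "what next", "allocate budget", "strategic plan",
    ],
}


def route(message: str):
    """Return the command whose longest trigger phrase appears in `message`,
    or None when nothing matches (hypothetical matching rule)."""
    text = message.lower()
    best = None  # (phrase length, command)
    for command, phrases in ROUTES.items():
        for phrase in phrases:
            # whole-word match so that "explain" does not trigger "plan"
            pattern = r"\b" + re.escape(phrase) + r"\b"
            if re.search(pattern, text) and (best is None or len(phrase) > best[0]):
                best = (len(phrase), command)
    return best[1] if best else None
```

A language-model router can weigh context that a lookup like this cannot, so the sketch covers only the literal phrase matches.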
package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
 "name": "claude-turing",
-"version": "4.
+"version": "4.4.0",
 "type": "module",
 "description": "Autonomous ML research harness for Claude Code. The autoresearch loop as a formal protocol — iteratively trains, evaluates, and improves ML models with structured experiment tracking, convergence detection, immutable evaluation infrastructure, and safety guardrails.",
 "bin": {
package/src/install.js
CHANGED
package/src/verify.js
CHANGED
Binary file
Binary file
Binary file
Binary file
Binary file