claude-turing 1.0.0 → 1.0.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.claude-plugin/plugin.json +1 -1
- package/README.md +21 -7
- package/commands/brief.md +13 -1
- package/commands/init.md +13 -0
- package/commands/train.md +16 -7
- package/commands/turing.md +2 -2
- package/package.json +1 -1
- package/templates/model_contract.md +49 -0
- package/templates/model_registry.yaml +69 -0
- package/templates/program.md +2 -0
- package/templates/scripts/__pycache__/scaffold.cpython-314.pyc +0 -0
- package/templates/scripts/scaffold.py +2 -0
package/.claude-plugin/plugin.json
CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "turing",
-  "version": "1.0.0",
+  "version": "1.0.1",
   "description": "Autonomous ML research harness — the autoresearch loop as a formal protocol. 14 commands, 2 specialized agents, structured experiment lifecycle with convergence detection, immutable evaluation infrastructure, novelty guard, decision synthesis, hypothesis database, and safety guardrails that separate the hypothesis space from the measurement apparatus. Inspired by Karpathy's autoresearch and the scientific method itself.",
   "author": {
     "name": "pragnition"
package/README.md
CHANGED

@@ -300,8 +300,8 @@ The index (`hypotheses.yaml`) is the lightweight queue. The detail files (`hypot
 
 | Command | What it does |
 |---------|-------------|
-| `/turing:init [--plan]` | Scaffold a new ML project. `--plan` generates a literature-grounded research plan. |
-| `/turing:train [N]` | Run the
+| `/turing:init [--plan]` | Scaffold a new ML project. `--plan` generates a literature-grounded research plan. Supports multiple projects in subdirectories. |
+| `/turing:train [ml/project] [N]` | Run the experiment loop. Auto-detects project from cwd or explicit path. |
 | `/turing:sweep` | Systematic hyperparameter sweep via cartesian product |
 | `/turing:status` | Quick experiment status — best model, convergence state |
 | `/turing:compare <a> <b>` | Side-by-side experiment comparison with causal analysis |

@@ -404,15 +404,27 @@ claude plugin add /path/to/turing
 ### Quick Start
 
 ```bash
-/turing:init
-/turing:train
-/turing:brief
-/turing:try "idea"
+/turing:init          # Scaffold project (answer 3 prompts)
+/turing:train         # Run experiment loop
+/turing:brief         # Read what happened
+/turing:try "idea"    # Inject your taste
 ```
 
+### Multiple Projects
+
+```bash
+/turing:init                      # Scaffold ml/sentiment
+/turing:init                      # Scaffold ml/churn
+/turing:train ml/sentiment        # Train in specific project
+/turing:brief ml/churn            # Brief for specific project
+cd ml/sentiment && /turing:train  # Auto-detects from cwd
+```
+
+Each project gets independent config, data, experiments, models, and agent memory.
+
 ## Architecture of Turing Itself
 
-15 commands, 2 agents, 8 config files, 25 template scripts, 338 tests, 16 ADRs. See [docs/ARCHITECTURE.md](docs/ARCHITECTURE.md) for the full codemap.
+15 commands, 2 agents, 8 config files, 25 template scripts, model registry, artifact contract, 338 tests, 16 ADRs. See [docs/ARCHITECTURE.md](docs/ARCHITECTURE.md) for the full codemap.
 
 ```
 turing/

@@ -423,6 +435,8 @@ turing/
 │   ├── prepare.py            Data loading (HIDDEN from agent)
 │   ├── evaluate.py           Evaluation harness (HIDDEN from agent)
 │   ├── train.py              Training code (AGENT-EDITABLE)
+│   ├── model_contract.md     Artifact schema for production consumers
+│   ├── model_registry.yaml   Available model architectures + hyperparams
 │   └── scripts/              25 Python scripts (core loop + analysis + infra)
 ├── tests/                    338 tests (unit + integration + anti-pattern + manifest)
 ├── src/                      5 JS installer files (npm deployment)
package/commands/brief.md
CHANGED

@@ -2,12 +2,24 @@
 name: brief
 description: Generate a structured research intelligence report from experiment history — what's been learned, what's promising, what's exhausted, and what the human should consider next. Use --deep for literature-grounded suggestions.
 disable-model-invocation: true
-argument-hint: "[--deep]"
+argument-hint: "[ml/project] [--deep]"
 allowed-tools: Read, Bash(python scripts/*:*, source .venv/bin/activate:*), Grep, Glob, WebSearch, WebFetch
 ---
 
 Generate a research briefing that a human can read in 2 minutes and immediately decide what to inject next.
 
+## Project Detection
+
+Before generating the briefing, detect which project to report on:
+
+0. **Detect project directory:**
+   - If `$ARGUMENTS` contains a path (e.g., `ml/coding`), use that as the project directory
+   - Else if cwd contains `config.yaml` and `train.py`, use cwd
+   - Else search for `ml/*/` subdirectories containing `config.yaml`
+     - If exactly one found, use it
+     - If multiple found, list them and ask the user which to report on
+   - All subsequent commands run from the detected project directory
+
 ## Steps
 
 1. **Generate the briefing:**
package/commands/init.md
CHANGED

@@ -121,3 +121,16 @@ If `$ARGUMENTS` contains `--plan`, generate a research plan AFTER scaffolding. T
 ### Integration
 
 The agent's `program.md` OBSERVE step reads `RESEARCH_PLAN.md` (if it exists) for strategic direction. The plan is advisory — the agent can deviate but should note why in `experiment_state.yaml`.
+
+## Multiple Projects
+
+You can scaffold multiple ML projects in the same repository:
+
+```bash
+/turing:init   # First project: prompts for ml_dir (e.g., ml/sentiment)
+/turing:init   # Second project: prompts for ml_dir (e.g., ml/churn)
+```
+
+Each project gets its own directory with independent config, data, experiments, and models. `/turing:train ml/sentiment` or `/turing:train ml/churn` targets a specific project. If you `cd ml/sentiment` first, `/turing:train` auto-detects from cwd.
+
+Agent memory is scoped per project: `.claude/agent-memory/ml-researcher-{project_name}/MEMORY.md`
package/commands/train.md
CHANGED

@@ -12,16 +12,25 @@ Read `program.md` in the ML project directory for the complete protocol. Follow
 
 ## Arguments
 
-`$ARGUMENTS` —
+`$ARGUMENTS` — accepts a project path (e.g., `ml/coding`), a number for max_iterations, or both (e.g., `ml/coding 10`). If no number, run until convergence (as defined in `config.yaml` convergence settings).
 
 ## Bootstrap Sequence
 
-0. **
-
-
-
-
-
+0. **Detect project directory:**
+   - If `$ARGUMENTS` contains a path (e.g., `ml/coding`), use that as the project directory
+   - Else if cwd contains `config.yaml` and `train.py`, use cwd
+   - Else search for `ml/*/` subdirectories containing `config.yaml`
+     - If exactly one found, use it
+     - If multiple found, list them and ask the user which to target
+   - All subsequent commands run from the detected project directory
+   - Memory path: `.claude/agent-memory/ml-researcher-{project_name}/MEMORY.md`
+
+1. **Restore memory:** Read `.claude/agent-memory/ml-researcher-{project_name}/MEMORY.md` for prior observations and best results.
+2. **Read protocol:** Read `program.md` completely — it defines the experiment loop, constraints, and output format.
+3. **Bootstrap data:** Check for training data at `config.yaml` → `data.source`. If no splits exist, run `python prepare.py`.
+4. **Bootstrap venv:** `test -d .venv || (python3 -m venv .venv && source .venv/bin/activate && pip install -r requirements.txt)`
+5. **Assess state:** `source .venv/bin/activate && python scripts/show_metrics.py --last 5`
+6. **Begin the loop** from program.md.
 
 ## The Loop
 
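The project-detection heuristic added to `train.md` (and mirrored in `brief.md`) can be sketched as a small Python helper. This is a hypothetical illustration of the rules, not a script shipped in the package:

```python
from pathlib import Path

def detect_project_dir(arguments: str) -> Path:
    """Resolve the target project per the bootstrap rules in train.md."""
    # 1. An explicit path in $ARGUMENTS wins, e.g. "ml/coding 10".
    for token in arguments.split():
        if "/" in token and (Path(token) / "config.yaml").exists():
            return Path(token)
    # 2. The cwd already looks like a scaffolded project.
    cwd = Path.cwd()
    if (cwd / "config.yaml").exists() and (cwd / "train.py").exists():
        return cwd
    # 3. Otherwise search ml/*/ for candidate projects.
    candidates = sorted(p.parent for p in Path("ml").glob("*/config.yaml"))
    if len(candidates) == 1:
        return candidates[0]
    # Zero or multiple candidates: the command lists them and asks the user.
    raise LookupError(f"ambiguous project; candidates: {candidates}")
```

With one scaffolded project under `ml/` the lookup is unambiguous; with several, the command falls back to asking, matching the bootstrap steps above.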
package/commands/turing.md
CHANGED

@@ -9,7 +9,7 @@ You are the Turing ML research router. Detect the user's intent and route to the
 
 | User says... | Route to | Lifecycle phase |
 |---|---|---|
-| "train", "run experiments", "autoresearch", "improve the model", "start training" | `/turing:train` | Execute |
+| "train", "train ml/coding", "train ml/claims", "run experiments", "run experiments in ml/X", "autoresearch", "improve the model", "start training" | `/turing:train` | Execute |
 | "status", "how's training", "experiment results", "current metrics" | `/turing:status` | Observe |
 | "compare", "diff runs", "which is better" | `/turing:compare` | Analyze |
 | "sweep", "grid search", "hyperparameter search", "tune" | `/turing:sweep` | Explore |

@@ -29,7 +29,7 @@ You are the Turing ML research router. Detect the user's intent and route to the
 
 | Command | Purpose | Agent |
 |---|---|---|
-| `/turing:train [N]` | Run the autonomous experiment loop | @ml-researcher |
+| `/turing:train [ml/project] [N]` | Run the autonomous experiment loop (auto-detects project from path or cwd) | @ml-researcher |
 | `/turing:status` | Show experiment status, best model, convergence | @ml-evaluator |
 | `/turing:compare <a> <b>` | Side-by-side experiment comparison | @ml-evaluator |
 | `/turing:sweep` | Generate and run hyperparameter sweep | @ml-researcher |
package/package.json
CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "claude-turing",
-  "version": "1.0.0",
+  "version": "1.0.1",
   "type": "module",
   "description": "Autonomous ML research harness for Claude Code. The autoresearch loop as a formal protocol — iteratively trains, evaluates, and improves ML models with structured experiment tracking, convergence detection, immutable evaluation infrastructure, and safety guardrails.",
   "bin": {
package/templates/model_contract.md
ADDED

@@ -0,0 +1,49 @@
+# Model Artifact Contract
+
+Version: 1
+Last updated: {{PROJECT_NAME}} initial scaffold
+
+## Bundle Format
+
+The trained model is saved as a joblib bundle at `models/best/model.joblib` containing:
+
+```python
+{
+    "model": <fitted model object>,
+    "featurizer": <fitted CompositeFeaturizer>,
+    "config": <dict of training config>,
+    "contract_version": 1
+}
+```
+
+## Metadata
+
+`models/best/metadata.json` contains:
+
+```json
+{
+    "contract_version": 1,
+    "model_type": "xgboost",
+    "experiment_id": "exp-001",
+    "metrics": {"{{TARGET_METRIC}}": 0.0},
+    "feature_names": [],
+    "created_at": "ISO-8601"
+}
+```
+
+## Consumer Contract
+
+Any service loading this model expects:
+- `bundle["model"]` has a `.predict()` method accepting a feature matrix
+- `bundle["featurizer"]` has a `.transform(df)` method returning a DataFrame
+- `bundle.get("contract_version", 0)` must equal 1
+
+If `contract_version` doesn't match, the consumer should log a warning and fall back to a default/rules-based approach.
+
+## Breaking Changes
+
+Increment `contract_version` when changing:
+- Feature schema (different featurizer output shape)
+- Label encoding (different label_map)
+- Bundle key names
+- Model input/output format
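The consumer contract in the new `model_contract.md` can be exercised with a short loader sketch. Only the bundle key names and the `contract_version` check come from the contract itself; the fallback caller and `rules_based_predict` are hypothetical:

```python
import logging

EXPECTED_CONTRACT_VERSION = 1

def contract_ok(bundle: dict) -> bool:
    """Check the bundle against the version this consumer was built for."""
    return bundle.get("contract_version", 0) == EXPECTED_CONTRACT_VERSION

def load_model_bundle(path: str = "models/best/model.joblib"):
    """Load the joblib bundle; return None on mismatch so the caller can
    fall back to a default/rules-based approach, as the contract requires."""
    import joblib  # imported lazily; the version check itself needs no deps
    bundle = joblib.load(path)
    if not contract_ok(bundle):
        logging.warning("contract_version %r != %r; falling back",
                        bundle.get("contract_version", 0),
                        EXPECTED_CONTRACT_VERSION)
        return None
    return bundle

# Hypothetical consumer:
#   bundle = load_model_bundle()
#   if bundle is None:
#       preds = rules_based_predict(df)  # rules-based fallback path
#   else:
#       feats = bundle["featurizer"].transform(df)
#       preds = bundle["model"].predict(feats)
```

Returning `None` instead of raising keeps the degradation decision with the caller, which is what the "warn and fall back" clause asks for.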
package/templates/model_registry.yaml
ADDED

@@ -0,0 +1,69 @@
+# Model Registry for {{PROJECT_NAME}}
+#
+# Catalog of available model architectures. The agent reads this
+# during /turing:suggest and archetype:model_comparison to know
+# what models are available and how to configure them.
+#
+# Add your domain-specific models here.
+
+models:
+  xgboost:
+    name: "XGBoost Classifier"
+    family: "gradient_boosting"
+    task: "{{TARGET_METRIC}}"
+    notes: "Default model. Good for tabular data with mixed feature types."
+    paper: "https://arxiv.org/abs/1603.02754"
+    default_hyperparams:
+      n_estimators: 100
+      max_depth: 4
+      learning_rate: 0.1
+      objective: "multi:softmax"
+
+  lightgbm:
+    name: "LightGBM Classifier"
+    family: "gradient_boosting"
+    task: "{{TARGET_METRIC}}"
+    notes: "Often faster than XGBoost. Leaf-wise growth. Try dart boosting for regularization."
+    paper: "https://papers.nips.cc/paper/2017/hash/6449f44a102fde848669bdd9eb6b76fa-Abstract.html"
+    default_hyperparams:
+      n_estimators: 100
+      max_depth: -1
+      learning_rate: 0.1
+      num_leaves: 31
+
+  random_forest:
+    name: "Random Forest Classifier"
+    family: "ensemble"
+    task: "{{TARGET_METRIC}}"
+    notes: "Bagging ensemble. Good baseline. Less prone to overfitting than single trees."
+    default_hyperparams:
+      n_estimators: 100
+      max_depth: null
+      min_samples_split: 2
+
+  logistic_regression:
+    name: "Logistic Regression"
+    family: "linear"
+    task: "{{TARGET_METRIC}}"
+    notes: "Simple linear baseline. Always try this first — if it works well, your features are strong."
+    default_hyperparams:
+      C: 1.0
+      max_iter: 1000
+
+  mlp:
+    name: "Multi-Layer Perceptron"
+    family: "neural_network"
+    task: "{{TARGET_METRIC}}"
+    notes: "Simple neural network. Try when samples > 2000. Needs feature scaling."
+    default_hyperparams:
+      hidden_layer_sizes: [100, 50]
+      learning_rate_init: 0.001
+      max_iter: 200
+
+# Add your domain-specific models below:
+# e.g., for NLP tasks:
+# bert-base:
+#   name: "BERT Base"
+#   family: "transformer"
+#   hf_id: "bert-base-uncased"
+#   ...
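A sketch of how agent-side code might consume the registry during model selection. It assumes PyYAML is available in the project's `.venv`; `pick_family_defaults` and `load_registry` are hypothetical helpers, not part of the shipped template scripts:

```python
def pick_family_defaults(models: dict, family: str) -> dict:
    """Map model key -> default_hyperparams for every entry in a family."""
    return {
        key: spec.get("default_hyperparams", {})
        for key, spec in models.items()
        if spec.get("family") == family
    }

def load_registry(path: str = "model_registry.yaml") -> dict:
    """Parse the registry file and return its top-level `models` mapping."""
    import yaml  # PyYAML, assumed installed alongside the training deps
    with open(path) as f:
        return yaml.safe_load(f)["models"]

# e.g. pick_family_defaults(load_registry(), "gradient_boosting")
# would cover both the xgboost and lightgbm entries above.
```

Grouping by `family` is what lets the model_comparison archetype sweep related architectures with their registry defaults before tuning.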
package/templates/program.md
CHANGED

@@ -71,6 +71,8 @@ The autoresearch experiment loop. Each iteration is one experiment — one hypot
 cat RESEARCH_PLAN.md 2>/dev/null || true
 ```
 
+Read `model_registry.yaml` to know what model architectures are available, their default hyperparameters, and family groupings. Use this to inform model selection during suggest and model_comparison archetypes.
+
 If `RESEARCH_PLAN.md` exists, use it for strategic direction (which model families to explore, in what order, what budget). The plan is advisory — deviate if evidence warrants, but note why.
 
 For the most recent discarded experiments, read the actual git diff to understand what was tried and failed — do NOT rely on your own memory of what you changed:
package/templates/scripts/__pycache__/scaffold.cpython-314.pyc
ADDED

Binary file