synapse-sdk 2025.9.5__py3-none-any.whl → 2025.10.6__py3-none-any.whl
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- synapse_sdk/clients/base.py +129 -9
- synapse_sdk/devtools/docs/docs/api/clients/base.md +230 -8
- synapse_sdk/devtools/docs/docs/api/plugins/models.md +58 -3
- synapse_sdk/devtools/docs/docs/plugins/categories/neural-net-plugins/train-action-overview.md +663 -0
- synapse_sdk/devtools/docs/docs/plugins/categories/pre-annotation-plugins/pre-annotation-plugin-overview.md +198 -0
- synapse_sdk/devtools/docs/docs/plugins/categories/pre-annotation-plugins/to-task-action-development.md +1645 -0
- synapse_sdk/devtools/docs/docs/plugins/categories/pre-annotation-plugins/to-task-overview.md +717 -0
- synapse_sdk/devtools/docs/docs/plugins/categories/pre-annotation-plugins/to-task-template-development.md +1380 -0
- synapse_sdk/devtools/docs/docs/plugins/categories/upload-plugins/upload-plugin-action.md +934 -0
- synapse_sdk/devtools/docs/docs/plugins/categories/upload-plugins/upload-plugin-overview.md +585 -0
- synapse_sdk/devtools/docs/docs/plugins/categories/upload-plugins/upload-plugin-template.md +715 -0
- synapse_sdk/devtools/docs/docs/plugins/export-plugins.md +39 -0
- synapse_sdk/devtools/docs/docs/plugins/plugins.md +12 -5
- synapse_sdk/devtools/docs/i18n/ko/docusaurus-plugin-content-docs/current/api/clients/base.md +230 -8
- synapse_sdk/devtools/docs/i18n/ko/docusaurus-plugin-content-docs/current/api/plugins/models.md +114 -0
- synapse_sdk/devtools/docs/i18n/ko/docusaurus-plugin-content-docs/current/plugins/categories/neural-net-plugins/train-action-overview.md +621 -0
- synapse_sdk/devtools/docs/i18n/ko/docusaurus-plugin-content-docs/current/plugins/categories/pre-annotation-plugins/pre-annotation-plugin-overview.md +198 -0
- synapse_sdk/devtools/docs/i18n/ko/docusaurus-plugin-content-docs/current/plugins/categories/pre-annotation-plugins/to-task-action-development.md +1645 -0
- synapse_sdk/devtools/docs/i18n/ko/docusaurus-plugin-content-docs/current/plugins/categories/pre-annotation-plugins/to-task-overview.md +717 -0
- synapse_sdk/devtools/docs/i18n/ko/docusaurus-plugin-content-docs/current/plugins/categories/pre-annotation-plugins/to-task-template-development.md +1380 -0
- synapse_sdk/devtools/docs/i18n/ko/docusaurus-plugin-content-docs/current/plugins/categories/upload-plugins/upload-plugin-action.md +934 -0
- synapse_sdk/devtools/docs/i18n/ko/docusaurus-plugin-content-docs/current/plugins/categories/upload-plugins/upload-plugin-overview.md +585 -0
- synapse_sdk/devtools/docs/i18n/ko/docusaurus-plugin-content-docs/current/plugins/categories/upload-plugins/upload-plugin-template.md +715 -0
- synapse_sdk/devtools/docs/i18n/ko/docusaurus-plugin-content-docs/current/plugins/export-plugins.md +39 -0
- synapse_sdk/devtools/docs/i18n/ko/docusaurus-plugin-content-docs/current.json +16 -4
- synapse_sdk/devtools/docs/sidebars.ts +45 -1
- synapse_sdk/plugins/README.md +487 -80
- synapse_sdk/plugins/categories/base.py +1 -0
- synapse_sdk/plugins/categories/export/actions/export/action.py +8 -3
- synapse_sdk/plugins/categories/export/actions/export/utils.py +108 -8
- synapse_sdk/plugins/categories/export/templates/config.yaml +18 -0
- synapse_sdk/plugins/categories/export/templates/plugin/export.py +97 -0
- synapse_sdk/plugins/categories/neural_net/actions/train.py +592 -22
- synapse_sdk/plugins/categories/neural_net/actions/tune.py +150 -3
- synapse_sdk/plugins/categories/pre_annotation/actions/__init__.py +4 -0
- synapse_sdk/plugins/categories/pre_annotation/actions/pre_annotation/__init__.py +3 -0
- synapse_sdk/plugins/categories/pre_annotation/actions/pre_annotation/action.py +10 -0
- synapse_sdk/plugins/categories/pre_annotation/actions/to_task/__init__.py +28 -0
- synapse_sdk/plugins/categories/pre_annotation/actions/to_task/action.py +145 -0
- synapse_sdk/plugins/categories/pre_annotation/actions/to_task/enums.py +269 -0
- synapse_sdk/plugins/categories/pre_annotation/actions/to_task/exceptions.py +14 -0
- synapse_sdk/plugins/categories/pre_annotation/actions/to_task/factory.py +76 -0
- synapse_sdk/plugins/categories/pre_annotation/actions/to_task/models.py +97 -0
- synapse_sdk/plugins/categories/pre_annotation/actions/to_task/orchestrator.py +250 -0
- synapse_sdk/plugins/categories/pre_annotation/actions/to_task/run.py +64 -0
- synapse_sdk/plugins/categories/pre_annotation/actions/to_task/strategies/__init__.py +17 -0
- synapse_sdk/plugins/categories/pre_annotation/actions/to_task/strategies/annotation.py +284 -0
- synapse_sdk/plugins/categories/pre_annotation/actions/to_task/strategies/base.py +170 -0
- synapse_sdk/plugins/categories/pre_annotation/actions/to_task/strategies/extraction.py +83 -0
- synapse_sdk/plugins/categories/pre_annotation/actions/to_task/strategies/metrics.py +87 -0
- synapse_sdk/plugins/categories/pre_annotation/actions/to_task/strategies/preprocessor.py +127 -0
- synapse_sdk/plugins/categories/pre_annotation/actions/to_task/strategies/validation.py +143 -0
- synapse_sdk/plugins/categories/upload/actions/upload/__init__.py +2 -1
- synapse_sdk/plugins/categories/upload/actions/upload/action.py +8 -1
- synapse_sdk/plugins/categories/upload/actions/upload/context.py +0 -1
- synapse_sdk/plugins/categories/upload/actions/upload/models.py +134 -94
- synapse_sdk/plugins/categories/upload/actions/upload/steps/cleanup.py +2 -2
- synapse_sdk/plugins/categories/upload/actions/upload/steps/generate.py +6 -2
- synapse_sdk/plugins/categories/upload/actions/upload/steps/initialize.py +24 -9
- synapse_sdk/plugins/categories/upload/actions/upload/steps/metadata.py +130 -18
- synapse_sdk/plugins/categories/upload/actions/upload/steps/organize.py +147 -37
- synapse_sdk/plugins/categories/upload/actions/upload/steps/upload.py +10 -5
- synapse_sdk/plugins/categories/upload/actions/upload/strategies/file_discovery/flat.py +31 -6
- synapse_sdk/plugins/categories/upload/actions/upload/strategies/file_discovery/recursive.py +65 -37
- synapse_sdk/plugins/categories/upload/actions/upload/strategies/validation/default.py +17 -2
- synapse_sdk/plugins/categories/upload/templates/README.md +394 -0
- synapse_sdk/plugins/models.py +62 -0
- synapse_sdk/utils/file/download.py +261 -0
- {synapse_sdk-2025.9.5.dist-info → synapse_sdk-2025.10.6.dist-info}/METADATA +15 -2
- {synapse_sdk-2025.9.5.dist-info → synapse_sdk-2025.10.6.dist-info}/RECORD +74 -43
- synapse_sdk/devtools/docs/docs/plugins/developing-upload-template.md +0 -1463
- synapse_sdk/devtools/docs/docs/plugins/upload-plugins.md +0 -1964
- synapse_sdk/devtools/docs/i18n/ko/docusaurus-plugin-content-docs/current/plugins/developing-upload-template.md +0 -1463
- synapse_sdk/devtools/docs/i18n/ko/docusaurus-plugin-content-docs/current/plugins/upload-plugins.md +0 -2077
- {synapse_sdk-2025.9.5.dist-info → synapse_sdk-2025.10.6.dist-info}/WHEEL +0 -0
- {synapse_sdk-2025.9.5.dist-info → synapse_sdk-2025.10.6.dist-info}/entry_points.txt +0 -0
- {synapse_sdk-2025.9.5.dist-info → synapse_sdk-2025.10.6.dist-info}/licenses/LICENSE +0 -0
- {synapse_sdk-2025.9.5.dist-info → synapse_sdk-2025.10.6.dist-info}/top_level.txt +0 -0
synapse_sdk/devtools/docs/docs/plugins/categories/neural-net-plugins/train-action-overview.md

@@ -0,0 +1,663 @@

---
id: train-action-overview
title: Train Action Overview
sidebar_position: 1
---

# Train Action Overview

The Train Action provides unified functionality for both model training and hyperparameter optimization (HPO) through a single interface. It supports regular training workflows and advanced hyperparameter tuning with Ray Tune integration.

## Quick Overview

**Category:** Neural Net
**Available Actions:** `train`
**Execution Method:** Job-based execution
**Modes:** Training mode and Hyperparameter Tuning mode

## Key Features

- **Unified Interface**: Single action for both training and hyperparameter tuning
- **Flexible Hyperparameters**: No rigid structure - plugins define their own hyperparameter schema
- **Ray Tune Integration**: Advanced HPO with multiple search algorithms and schedulers
- **Automatic Trial Tracking**: Trial IDs automatically injected into logs during tuning
- **Resource Management**: Configurable CPU/GPU allocation per trial
- **Best Model Selection**: Automatic best model checkpoint selection after tuning
- **Progress Tracking**: Real-time progress updates across training/tuning phases

## Modes

### Training Mode (Default)

Standard model training with fixed hyperparameters.

```json
{
  "action": "train",
  "params": {
    "name": "my_model",
    "dataset": 123,
    "checkpoint": null,
    "is_tune": false,
    "hyperparameter": {
      "epochs": 100,
      "batch_size": 32,
      "learning_rate": 0.001,
      "optimizer": "adam"
    }
  }
}
```

### Hyperparameter Tuning Mode

Hyperparameter optimization using Ray Tune.

```json
{
  "action": "train",
  "params": {
    "name": "my_tuning_job",
    "dataset": 123,
    "checkpoint": null,
    "is_tune": true,
    "tune_hyperparameter": [
      {
        "name": "batch_size",
        "type": "choice",
        "options": [16, 32, 64]
      },
      {
        "name": "learning_rate",
        "type": "loguniform",
        "min": 0.0001,
        "max": 0.01,
        "base": 10
      },
      {
        "name": "optimizer",
        "type": "choice",
        "options": ["adam", "sgd"]
      }
    ],
    "tune_config": {
      "mode": "max",
      "metric": "accuracy",
      "num_samples": 10,
      "max_concurrent_trials": 2
    }
  }
}
```
## Configuration Parameters

### Common Parameters (Both Modes)

| Parameter    | Type          | Required | Description                            |
| ------------ | ------------- | -------- | -------------------------------------- |
| `name`       | `str`         | Yes      | Training/tuning job name               |
| `dataset`    | `int`         | Yes      | Dataset ID                             |
| `checkpoint` | `int \| None` | No       | Checkpoint ID for resuming training    |
| `is_tune`    | `bool`        | No       | Enable tuning mode (default: `false`)  |
| `num_cpus`   | `float`       | No       | CPU resources per trial (tuning only)  |
| `num_gpus`   | `float`       | No       | GPU resources per trial (tuning only)  |

### Training Mode Parameters (`is_tune=false`)

| Parameter        | Type   | Required | Description                        |
| ---------------- | ------ | -------- | ---------------------------------- |
| `hyperparameter` | `dict` | Yes      | Fixed hyperparameters for training |

**Note**: The structure of `hyperparameter` is completely flexible and defined by your plugin. Common fields include:

- `epochs`: Number of training epochs
- `batch_size`: Batch size for training
- `learning_rate`: Learning rate
- `optimizer`: Optimizer type (adam, sgd, etc.)
- Any custom fields your plugin needs (e.g., `dropout_rate`, `weight_decay`, `image_size`)

### Tuning Mode Parameters (`is_tune=true`)

| Parameter             | Type   | Required | Description                          |
| --------------------- | ------ | -------- | ------------------------------------ |
| `tune_hyperparameter` | `list` | Yes      | List of hyperparameter search spaces |
| `tune_config`         | `dict` | Yes      | Ray Tune configuration               |

## Hyperparameter Search Spaces

Define hyperparameter distributions for tuning:

### Continuous Distributions

```json
[
  {
    "name": "learning_rate",
    "type": "uniform",
    "min": 0.0001,
    "max": 0.01
  },
  {
    "name": "dropout_rate",
    "type": "loguniform",
    "min": 0.0001,
    "max": 0.1,
    "base": 10
  }
]
```

### Discrete Distributions

```json
[
  {
    "name": "batch_size",
    "type": "choice",
    "options": [16, 32, 64, 128]
  },
  {
    "name": "optimizer",
    "type": "choice",
    "options": ["adam", "sgd", "rmsprop"]
  }
]
```

### Quantized Distributions

```json
[
  {
    "name": "learning_rate",
    "type": "quniform",
    "min": 0.0001,
    "max": 0.01,
    "q": 0.0001
  }
]
```

### Supported Distribution Types

Each hyperparameter type requires specific parameters:

| Type | Required Parameters | Description | Example |
| ---- | ------------------- | ----------- | ------- |
| `uniform` | `min`, `max` | Uniform distribution between min and max | `{"name": "lr", "type": "uniform", "min": 0.0001, "max": 0.01}` |
| `quniform` | `min`, `max` | Quantized uniform distribution | `{"name": "lr", "type": "quniform", "min": 0.0001, "max": 0.01}` |
| `loguniform` | `min`, `max`, `base` | Log-uniform distribution | `{"name": "lr", "type": "loguniform", "min": 0.0001, "max": 0.01, "base": 10}` |
| `qloguniform` | `min`, `max`, `base` | Quantized log-uniform distribution | `{"name": "lr", "type": "qloguniform", "min": 0.0001, "max": 0.01, "base": 10}` |
| `randn` | `mean`, `sd` | Normal (Gaussian) distribution | `{"name": "noise", "type": "randn", "mean": 0.0, "sd": 1.0}` |
| `qrandn` | `mean`, `sd` | Quantized normal distribution | `{"name": "noise", "type": "qrandn", "mean": 0.0, "sd": 1.0}` |
| `randint` | `min`, `max` | Random integer between min and max | `{"name": "epochs", "type": "randint", "min": 5, "max": 15}` |
| `qrandint` | `min`, `max` | Quantized random integer | `{"name": "epochs", "type": "qrandint", "min": 5, "max": 15}` |
| `lograndint` | `min`, `max`, `base` | Log-random integer | `{"name": "units", "type": "lograndint", "min": 16, "max": 256, "base": 2}` |
| `qlograndint` | `min`, `max`, `base` | Quantized log-random integer | `{"name": "units", "type": "qlograndint", "min": 16, "max": 256, "base": 2}` |
| `choice` | `options` | Choose from a list of values | `{"name": "optimizer", "type": "choice", "options": ["adam", "sgd"]}` |
| `grid_search` | `options` | Grid search over all values | `{"name": "batch_size", "type": "grid_search", "options": [16, 32, 64]}` |

**Important Notes:**

- All hyperparameters must include `name` and `type` fields
- For `loguniform`, `qloguniform`, `lograndint`, `qlograndint`: the `base` parameter is required (typically 10 or 2)
- For quantized types (`quniform`, `qloguniform`, `qrandn`, `qrandint`, `qlograndint`): `q` sets the quantization step, as in the quantized example above
- For `choice` and `grid_search`: use `options` (not `values`)
- For range-based types: use `min` and `max` (not `lower` and `upper`)
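These spec dicts mirror Ray Tune's search space API one-to-one. The SDK performs the conversion internally; as a rough illustration of the correspondence, here is a minimal, hypothetical sketch (only the `ray.tune` calls are real API, and only a few of the types are shown):

```python
# Hypothetical sketch of how spec dicts could map onto Ray Tune sample spaces.
# The SDK's actual converter is not shown in this document.
from ray import tune

BUILDERS = {
    'uniform': lambda s: tune.uniform(s['min'], s['max']),
    'loguniform': lambda s: tune.loguniform(s['min'], s['max'], base=s['base']),
    'randint': lambda s: tune.randint(s['min'], s['max']),
    'choice': lambda s: tune.choice(s['options']),
    'grid_search': lambda s: tune.grid_search(s['options']),
}

def build_search_space(specs):
    """Turn a `tune_hyperparameter` list into a Ray Tune search space dict."""
    return {spec['name']: BUILDERS[spec['type']](spec) for spec in specs}

search_space = build_search_space([
    {'name': 'learning_rate', 'type': 'loguniform', 'min': 0.0001, 'max': 0.01, 'base': 10},
    {'name': 'batch_size', 'type': 'choice', 'options': [16, 32, 64]},
])
```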
## Tune Configuration

### Basic Configuration

```python
{
    "mode": "max",               # "max" or "min"
    "metric": "accuracy",        # Metric to optimize
    "num_samples": 10,           # Number of trials
    "max_concurrent_trials": 2   # Parallel trials
}
```

### With Search Algorithm

```python
{
    "mode": "max",
    "metric": "accuracy",
    "num_samples": 20,
    "max_concurrent_trials": 4,
    "search_alg": {
        "name": "hyperoptsearch",    # Search algorithm
        "points_to_evaluate": [      # Optional initial points
            {
                "learning_rate": 0.001,
                "batch_size": 32
            }
        ]
    }
}
```

### With Scheduler

```python
{
    "mode": "max",
    "metric": "accuracy",
    "num_samples": 50,
    "max_concurrent_trials": 8,
    "scheduler": {
        "name": "hyperband",  # Scheduler type
        "options": {
            "max_t": 100
        }
    }
}
```

### Supported Search Algorithms

- `basicvariantgenerator` - Random search (default)
- `bayesoptsearch` - Bayesian optimization
- `hyperoptsearch` - Tree-structured Parzen Estimator
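### Supported Schedulers

- `fifo` - First-in-first-out (default)
- `hyperband` - HyperBand scheduler

Inside the SDK these names presumably resolve to the corresponding Ray Tune classes. A hedged sketch of that mapping (the actual factory is not part of this document; the Ray Tune imports themselves are real API):

```python
# Hypothetical name-to-class mapping for the supported algorithms/schedulers.
from ray.tune.schedulers import FIFOScheduler, HyperBandScheduler
from ray.tune.search.basic_variant import BasicVariantGenerator
from ray.tune.search.bayesopt import BayesOptSearch
from ray.tune.search.hyperopt import HyperOptSearch

SEARCH_ALGS = {
    'basicvariantgenerator': BasicVariantGenerator,  # random search (default)
    'bayesoptsearch': BayesOptSearch,  # requires the bayesian-optimization package
    'hyperoptsearch': HyperOptSearch,  # requires the hyperopt package
}

SCHEDULERS = {
    'fifo': FIFOScheduler,            # default
    'hyperband': HyperBandScheduler,  # accepts e.g. max_t via "options"
}

def make_scheduler(config):
    """Build a scheduler from a `scheduler` config block like the one above."""
    return SCHEDULERS[config['name']](**config.get('options', {}))
```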
## Plugin Development

### For Training Mode

Implement the `train()` function in your plugin:

```python
def train(run, dataset, hyperparameter, checkpoint, **kwargs):
    """
    Training function for your model.

    Args:
        run: TrainRun object for logging
        dataset: Dataset object
        hyperparameter: dict with hyperparameters
        checkpoint: Optional checkpoint for resuming
    """
    # `train_one_epoch`, `image_data`, `model`, and `save_model` below are
    # plugin-specific placeholders.

    # Access hyperparameters
    epochs = hyperparameter['epochs']
    batch_size = hyperparameter['batch_size']
    learning_rate = hyperparameter['learning_rate']

    # Training loop
    for epoch in range(epochs):
        # Train one epoch
        loss, accuracy = train_one_epoch(...)

        # Log metrics
        run.log_metric('training', 'loss', loss, epoch=epoch)
        run.log_metric('training', 'accuracy', accuracy, epoch=epoch)

        # Log visualizations
        run.log_visualization('predictions', 'train', epoch, image_data)

    # Save final model
    save_model(model, '/path/to/model.pth')
```

### For Tuning Mode

Implement the `tune()` function in your plugin:

```python
def tune(hyperparameter, run, dataset, checkpoint, **kwargs):
    """
    Tuning function for hyperparameter optimization.

    Args:
        hyperparameter: dict with current trial's hyperparameters
        run: TrainRun object for logging (with is_tune=True)
        dataset: Dataset object
        checkpoint: Optional checkpoint for resuming
    """
    from pathlib import Path

    # Imported inside the function so the module is not shadowed at module
    # level by this function's own name
    from ray import tune

    # Set checkpoint output path BEFORE training
    output_path = Path('/path/to/trial/weights')
    run.checkpoint_output = str(output_path)

    # Training loop
    for epoch in range(hyperparameter['epochs']):
        loss, accuracy = train_one_epoch(...)

        # Log metrics (trial_id automatically added)
        run.log_metric('training', 'loss', loss, epoch=epoch)
        run.log_metric('training', 'accuracy', accuracy, epoch=epoch)

    # Report final results (from the last epoch) to Ray Tune
    results = {
        "accuracy": accuracy,
        "loss": loss
    }

    # IMPORTANT: Report with checkpoint
    tune.report(
        results,
        checkpoint=tune.Checkpoint.from_directory(run.checkpoint_output)
    )
```

### Parameter Order Difference

**Important**: The parameter order differs between `train()` and `tune()`:

- `train(run, dataset, hyperparameter, checkpoint, **kwargs)`
- `tune(hyperparameter, run, dataset, checkpoint, **kwargs)`

### Automatic Trial ID Logging

When `is_tune=True`, the SDK automatically injects `trial_id` into all metric and visualization logs:

```python
# Your plugin code
run.log_metric('training', 'loss', 0.5, epoch=10)

# Actual logged data (trial_id added automatically)
{
    'category': 'training',
    'key': 'loss',
    'value': 0.5,
    'metrics': {'epoch': 10},
    'trial_id': 'abc123'  # Added automatically
}
```

No plugin changes required - this happens transparently at the SDK level.
## Migration from TuneAction

The standalone `TuneAction` is now **deprecated**. Migrate to `TrainAction` with `is_tune=true`:

### Before (Deprecated)

```json
{
  "action": "tune",
  "params": {
    "name": "my_tuning_job",
    "dataset": 123,
    "hyperparameter": [...],
    "tune_config": {...}
  }
}
```

### After (Recommended)

```json
{
  "action": "train",
  "params": {
    "name": "my_tuning_job",
    "dataset": 123,
    "is_tune": true,
    "tune_hyperparameter": [...],
    "tune_config": {...}
  }
}
```

### Key Changes

1. Change `"action": "tune"` to `"action": "train"`
2. Add `"is_tune": true`
3. Rename `"hyperparameter"` to `"tune_hyperparameter"`

## Examples

### Simple Training

```json
{
  "action": "train",
  "params": {
    "name": "resnet50_training",
    "dataset": 456,
    "checkpoint": null,
    "hyperparameter": {
      "epochs": 100,
      "batch_size": 32,
      "learning_rate": 0.001,
      "optimizer": "adam",
      "weight_decay": 0.0001
    }
  }
}
```

### Resume from Checkpoint

```json
{
  "action": "train",
  "params": {
    "name": "resnet50_continued",
    "dataset": 456,
    "checkpoint": 789,
    "hyperparameter": {
      "epochs": 50,
      "batch_size": 32,
      "learning_rate": 0.0001,
      "optimizer": "adam"
    }
  }
}
```

### Hyperparameter Tuning with Grid Search

```json
{
  "action": "train",
  "params": {
    "name": "resnet50_tuning",
    "dataset": 456,
    "is_tune": true,
    "tune_hyperparameter": [
      {
        "name": "batch_size",
        "type": "grid_search",
        "options": [16, 32, 64]
      },
      {
        "name": "learning_rate",
        "type": "grid_search",
        "options": [0.001, 0.0001]
      },
      {
        "name": "optimizer",
        "type": "grid_search",
        "options": ["adam", "sgd"]
      }
    ],
    "tune_config": {
      "mode": "max",
      "metric": "validation_accuracy",
      "num_samples": 12,
      "max_concurrent_trials": 4
    }
  }
}
```

### Advanced Tuning with HyperOpt and HyperBand

```json
{
  "action": "train",
  "params": {
    "name": "resnet50_hyperopt_tuning",
    "dataset": 456,
    "is_tune": true,
    "num_cpus": 2,
    "num_gpus": 0.5,
    "tune_hyperparameter": [
      {
        "name": "batch_size",
        "type": "choice",
        "options": [16, 32, 64, 128]
      },
      {
        "name": "learning_rate",
        "type": "loguniform",
        "min": 0.00001,
        "max": 0.01,
        "base": 10
      },
      {
        "name": "weight_decay",
        "type": "loguniform",
        "min": 0.00001,
        "max": 0.001,
        "base": 10
      },
      {
        "name": "optimizer",
        "type": "choice",
        "options": ["adam", "sgd", "rmsprop"]
      }
    ],
    "tune_config": {
      "mode": "max",
      "metric": "validation_accuracy",
      "num_samples": 50,
      "max_concurrent_trials": 8,
      "search_alg": {
        "name": "hyperoptsearch"
      },
      "scheduler": {
        "name": "hyperband",
        "options": {
          "max_t": 100
        }
      }
    }
  }
}
```

## Progress Tracking

The train action tracks progress across different phases:

### Training Mode

| Category     | Proportion | Description          |
| ------------ | ---------- | -------------------- |
| `validation` | 10%        | Parameter validation |
| `training`   | 90%        | Model training       |

### Tuning Mode

| Category     | Proportion | Description                  |
| ------------ | ---------- | ---------------------------- |
| `validation` | 10%        | Parameter validation         |
| `trials`     | 90%        | Hyperparameter tuning trials |
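To make the weighting concrete: a finished validation phase plus a half-finished main phase reads as 55% overall. The SDK computes this internally; the helper below is purely illustrative:

```python
# Illustrative only: how two weighted phases combine into overall progress.
def overall_progress(validation_done, main_done):
    """Each argument is a completed fraction in [0.0, 1.0]."""
    return 0.10 * validation_done + 0.90 * main_done

print(overall_progress(1.0, 0.5))  # 0.55
```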
## Benefits

### Unified Interface

- Single action for both training and tuning
- Consistent parameter handling
- Reduced code duplication

### Flexible Hyperparameters

- No rigid structure enforced by the SDK
- Plugins define their own hyperparameter schema
- Support for custom fields without validation errors

### Advanced HPO

- Multiple search algorithms (random search, BayesOpt, HyperOpt)
- Multiple schedulers (FIFO, HyperBand)
- Automatic best model selection

### Developer Experience

- Automatic trial tracking
- Transparent logging enhancements
- Clear migration path from the deprecated TuneAction

## Best Practices

### Hyperparameter Design

- Keep hyperparameter search spaces reasonable
- Start with grid search for initial exploration
- Use Bayesian optimization (`bayesoptsearch`) or HyperOpt (`hyperoptsearch`) for efficient search
- Set an appropriate `num_samples` based on search space size

### Resource Management

- Allocate `num_cpus` and `num_gpus` based on trial resource needs
- Set `max_concurrent_trials` based on available hardware
- Monitor resource usage during tuning

### Checkpoint Management

- Always set `run.checkpoint_output` before training in tune mode
- Save checkpoints at regular intervals
- Use the best checkpoint returned by tuning (see the sketch below)
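Putting those three points together, a hedged skeleton of the pattern (`train_one_epoch` and `save_model` are plugin-specific placeholders passed in by the caller, as in the `tune()` example above; they are not SDK API):

```python
# Sketch of the recommended checkpoint flow in tune mode.
from pathlib import Path

from ray import tune

def run_trial(run, hyperparameter, train_one_epoch, save_model):
    output_path = Path('/path/to/trial/weights')
    output_path.mkdir(parents=True, exist_ok=True)
    run.checkpoint_output = str(output_path)  # always set BEFORE training

    loss = accuracy = 0.0
    for epoch in range(hyperparameter['epochs']):
        loss, accuracy = train_one_epoch(epoch)
        if epoch % 10 == 0:  # save at regular intervals, not only at the end
            save_model(output_path / f'epoch_{epoch}.pth')

    # Report once with final metrics; tuning then selects the best checkpoint
    tune.report(
        {'accuracy': accuracy, 'loss': loss},
        checkpoint=tune.Checkpoint.from_directory(run.checkpoint_output),
    )
```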
### Logging

- Log all relevant metrics for comparison
- Use consistent metric names across trials
- Include validation metrics in tune reports

## Troubleshooting

### Common Issues

#### "hyperparameter is required when is_tune=False"

Make sure to provide `hyperparameter` in training mode:

```json
{
  "is_tune": false,
  "hyperparameter": {...}
}
```

#### "tune_hyperparameter is required when is_tune=True"

Make sure to provide `tune_hyperparameter` and `tune_config` in tuning mode:

```json
{
  "is_tune": true,
  "tune_hyperparameter": [...],
  "tune_config": {...}
}
```

#### Tuning Fails Without Error

Check that your `tune()` function:

1. Sets `run.checkpoint_output` before training
2. Calls `tune.report()` with results and a checkpoint
3. Returns normally, without raising exceptions

## Next Steps

- **For Plugin Developers**: Implement `train()` and optionally `tune()` functions
- **For Users**: Start with training mode, then experiment with tuning
- **For Advanced Users**: Explore different search algorithms and schedulers

## Support and Resources

- **API Reference**: See the TrainAction class documentation
- **Examples**: Check the plugin examples repository
- **Ray Tune Documentation**: https://docs.ray.io/en/latest/tune/