ins-pricing 0.4.3__py3-none-any.whl → 0.4.5__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -1,945 +0,0 @@
1
- # BayesOpt Usage Guide (Framework + How-To)
2
-
3
- This document explains the overall framework, config fields, and recommended usage for the training/tuning/stacking pipeline under `ins_pricing/modelling/`. It is mainly for:
4
-
5
- - Batch training via JSON config using `ins_pricing/cli/BayesOpt_entry.py` (can be combined with `torchrun`)
6
- - Calling the Python API directly in notebooks/scripts via `ins_pricing.BayesOpt` or `ins_pricing.bayesopt`
7
-
8
- ---
9
-
10
- ## 1. Which file should you run?
11
-
12
- Files related to this workflow in `ins_pricing/modelling/`:
13
-
14
- - `ins_pricing/modelling/core/bayesopt/`: Core subpackage (data preprocessing, Trainer, Optuna tuning, FT embedding/self-supervised pretraining, plotting, SHAP, etc)
15
- - `ins_pricing/modelling/core/BayesOpt.py`: Compatibility entry that re-exports the new subpackage for older import paths
16
- - `ins_pricing/cli/BayesOpt_entry.py`: CLI batch entry (reads multiple CSVs from config, trains/tunes/saves/plots; supports DDP)
17
- - `ins_pricing/cli/BayesOpt_incremental.py`: Incremental training entry (append data and reuse params/models; for production incremental scenarios)
18
- - `ins_pricing/cli/utils/cli_common.py`: Shared CLI helpers (path resolution, model name generation, plotting selection)
19
- - `ins_pricing/__init__.py`: Makes `ins_pricing/` importable (e.g. `from ins_pricing import BayesOptModel` or `from ins_pricing import bayesopt`)
20
- - `ins_pricing/cli/utils/notebook_utils.py`: Notebook helpers (build and run BayesOpt_entry and watchdog commands)
21
- - `ins_pricing/cli/Pricing_Run.py`: Unified runner (notebook/script only needs a config; `runner` decides entry/incremental/DDP/watchdog)
22
- - `ins_pricing/examples/modelling/config_template.json`: Common config template (recommended to copy and edit)
23
- - `ins_pricing/examples/modelling/config_incremental_template.json`: Sample incremental training config (used by `Pricing_incremental.ipynb`)
24
- - `ins_pricing/examples/modelling/config_explain_template.json`: Explain workflow config template
25
- - `user_packages legacy/Try/config_Pricing_FT_Stack.json`: Historical "FT stacking" config example
26
- - Notebooks (demo): `ins_pricing/examples/modelling/Pricing_Run.ipynb`, `ins_pricing/examples/modelling/PricingSingle.ipynb`, `ins_pricing/examples/modelling/Explain_Run.ipynb`
27
- - Deprecated examples: see `user_packages legacy/Try/*_deprecate.ipynb`
28
-
29
- Note: `ins_pricing/examples/modelling/` is kept in the repo only; the PyPI package does not include this directory.
30
-
31
- ---
32
-
33
- ## 2. Overall framework (from data to model pipeline)
34
-
35
- ### 2.1 Typical flow for a single training job (BayesOpt_entry)
36
-
37
- Core logic in `BayesOpt_entry.py` (each dataset `model_name.csv` runs once):
38
-
39
- 1. Read `config.json`, build dataset names from `model_list x model_categories` (e.g. `od_bc`)
40
- 2. Load data from `data_dir/<model_name>.csv`
41
- 3. Split train/test with `split_strategy` (`random` / `time` / `group`)
42
- 4. Construct `BayesOptModel(train_df, test_df, ...)`
43
- 5. Run by FT role and model selection:
44
- - If `ft_role != "model"`: run FT first (tune/train/export embedding columns), then run base models (XGB/ResNet/GLM, etc)
45
- - If `ft_role == "model"`: FT itself is a prediction model and can be tuned/trained in parallel with others
46
- 6. Save models and parameter snapshots, optionally plot
47
-
48
- Extra: `BayesOpt_entry.py` / `BayesOpt_incremental.py` resolve relative paths in config as "relative to the config.json directory" (for example, if config is in `ins_pricing/examples/modelling/`, then `./Data` means `ins_pricing/examples/modelling/Data`). Currently supported path fields: `data_dir` / `output_dir` / `optuna_storage` / `gnn_graph_cache` / `best_params_files`.
49
-
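To make the path-resolution rule concrete, a minimal config fragment (field names taken from this guide; the values are hypothetical) could look like this, with every relative path resolved against the directory that contains the config file:

```json
{
  "data_dir": "./Data",
  "output_dir": "./Output",
  "optuna_storage": "./Results/optuna/bayesopt.sqlite3",
  "gnn_graph_cache": "./Results/gnn_graph.pt",
  "best_params_files": {"xgb": "./Results/od_bc_bestparams_xgb.csv"}
}
```

If this file lives in `ins_pricing/examples/modelling/`, `./Data` resolves to `ins_pricing/examples/modelling/Data`, while a URL-style `optuna_storage` (containing `://`) would be passed through unchanged.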
50
- If you want notebook runs to only change config (no code changes), use `ins_pricing/examples/modelling/Pricing_Run.ipynb` (it calls `ins_pricing/cli/Pricing_Run.py`). Add a `runner` field in config to control entry/incremental/DDP/watchdog.
51
-
52
- ### 2.2 Core components in the BayesOpt subpackage
53
-
54
- Under `ins_pricing/modelling/core/bayesopt/`:
55
-
56
- - `BayesOptConfig`: unified config (epochs, feature lists, FT role, DDP/DP, etc)
57
- - `DatasetPreprocessor`: preprocessing once in `BayesOptModel` init:
58
- - create `w_act` (weighted actual), optional `w_binary_act`
59
- - cast categorical columns to `category`
60
- - create `train_oht_data/test_oht_data` (one-hot)
61
- - create `train_oht_scl_data/test_oht_scl_data` (one-hot with standardized numeric columns)
62
- - `TrainerBase`: base trainer with `tune()` (Optuna), `train()`, `save()/load()`, and distributed Optuna sync for DDP
63
- - Trainers (`BayesOptModel.trainers`):
64
- - `GLMTrainer`: statsmodels GLM
65
- - `XGBTrainer`: xgboost
66
- - `ResNetTrainer`: PyTorch MLP/ResNet style
67
- - `FTTrainer`: FT-Transformer (supports 3 roles)
68
- - `GNNTrainer`: GNN (standalone model `gnn`, or used to generate geo tokens for FT)
69
- - `OutputManager`: unified output paths (`plot/`, `Results/`, `model/`)
70
- - `VersionManager`: save/load snapshots (`Results/versions/*_ft_best.json`, etc)
71
-
72
- ### 2.3 BayesOpt subpackage structure (read in code order)
73
-
74
- `BayesOpt` is now a subpackage (`ins_pricing/modelling/core/bayesopt/`). Recommended order:
75
-
76
- 1) **Tools and utilities**
77
-
78
- - `IOUtils / TrainingUtils / PlotUtils`: I/O, training utilities (batch size, loss functions, free_cuda), plotting helpers
79
- - `DistributedUtils`: DDP init, rank/world_size helpers
80
-
81
- 2) **TorchTrainerMixin (common components for torch tabular training)**
82
-
83
- - DataLoader: `_build_dataloader()` / `_build_val_dataloader()` (prints batch/accum/workers)
84
- - Loss: `_compute_losses()` / `_compute_weighted_loss()` (regression supports tweedie/poisson/gamma/mse/mae; classification uses BCEWithLogits)
85
- - Early stop: `_early_stop_update()`
86
-
87
- 3) **Sklearn-style model classes (core training objects)**
88
-
89
- - `ResNetSklearn`: `fit/predict/set_params`, holds `ResNetSequential`, supports DP/DDP
90
- - `FTTransformerSklearn`: `fit/predict/fit_unsupervised`, supports embedding output, DP/DDP
91
- - `GraphNeuralNetSklearn`: `fit/predict/set_params`, used for geo tokens (CPU/GPU graph build, adjacency cache)
92
-
93
- 4) **Config and preprocessing/output management**
94
-
95
- - `BayesOptConfig`: aggregated config for task, training, parallelism, FT role (built in `BayesOptModel`)
96
- - `OutputManager`: manage `plot/Results/model` under output root
97
- - `VersionManager`: write snapshots to `Results/versions/` and read latest (for best_params reuse)
98
- - `DatasetPreprocessor`: runs in `BayesOptModel.__init__`, generates data views and derived columns
99
-
100
- 5) **Trainer system (Optuna + training + cached predictions)**
101
-
102
- - `TrainerBase`: `tune()` (Optuna), `save()/load()`, distributed Optuna sync for DDP
103
- - `cross_val_generic()`: generic CV/holdout evaluation logic (trainer supplies model_builder/metric_fn/fit_predict_fn)
104
- - `_fit_predict_cache()` / `_predict_and_cache()`: after training, write predictions back to `BayesOptModel.train_data/test_data`
105
-
106
- 6) **Orchestrator BayesOptModel**
107
-
108
- - `BayesOptModel.optimize_model(model_key, max_evals)`: unified entry, responsible for:
109
- - selecting objective (e.g. self-supervised objective when `ft_role=unsupervised_embedding`)
110
- - "FT as feature" mode: export `pred_<prefix>_*` and inject into downstream features
111
- - saving snapshots (for reuse/backtracking)
112
- - `save_model/load_model`, `plot_*`, `compute_shap_*`, etc
113
-
114
- ### 2.4 Key call chain (from entry to disk)
115
-
116
- Using `BayesOpt_entry.py` as an example:
117
-
118
- 1. `BayesOpt_entry.train_from_config()` reads CSV and builds `BayesOptModel(...)`
119
- 2. `BayesOptModel.optimize_model(model_key)`
120
- 3. `TrainerBase.tune()` (if `reuse_best_params` is false or no historical params found)
121
- - calls `Trainer.cross_val()` or FT self-supervised `Trainer.cross_val_unsupervised()`
122
- - inside `cross_val_generic()`:
123
- - sample Optuna params
124
- - build model `model_builder(params)`
125
- - train and evaluate on validation via `metric_fn(...)`
126
- 4. `Trainer.train()` trains the final model with `best_params` and caches prediction columns
127
- 5. `Trainer.save()` saves model files; `BayesOptModel.optimize_model()` saves parameter snapshots
128
-
129
- **Optuna under DDP (distributed coordination)**:
130
-
131
- - Only rank0 drives Optuna sampling; trial params are broadcast to other ranks
132
- - Non-rank0 processes do not sample; they receive params and run the same objective (multi-GPU sync)
133
-
134
- ### 2.5 Data views and cached columns (used by training/plotting)
135
-
136
- `DatasetPreprocessor` creates common columns in `train_data/test_data`:
137
-
138
- - `w_act`: `target * weight`
139
- - (if `binary_resp_nme` provided) `w_binary_act`: `binary_target * weight`
140
-
141
- After training, `TrainerBase._predict_and_cache()` writes predictions back:
142
-
143
- - **Scalar prediction models**:
144
- - `pred_<prefix>` (e.g. `pred_xgb/pred_resn/pred_ft`)
145
- `w_pred_<prefix>` (e.g. `w_pred_xgb`), computed as `pred_<prefix> * weight`
146
- - **Multi-dim output (embedding)**:
147
- - `pred_<prefix>_0 .. pred_<prefix>_{k-1}` (e.g. `pred_ft_emb_0..`)
148
- - these multi-dim columns do not have `w_` weighted columns
149
-
150
- These prediction columns are used by lift/dlift/oneway plotting and downstream stacking.
151
-
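A small sketch (hypothetical data, pandas) of what these cached columns look like after training; the column names follow the conventions above:

```python
import pandas as pd

# Hypothetical test_data after XGB training plus a 2-dim FT embedding export.
test_data = pd.DataFrame({
    "target": [0.3, 1.2, 0.0],
    "weight": [1.0, 2.0, 0.5],
    "pred_xgb": [0.4, 1.0, 0.1],
})

# Derived columns as described above.
test_data["w_act"] = test_data["target"] * test_data["weight"]
test_data["w_pred_xgb"] = test_data["pred_xgb"] * test_data["weight"]

# Multi-dim embedding columns; note they get no w_ weighted counterparts.
test_data["pred_ft_emb_0"] = [0.12, -0.30, 0.05]
test_data["pred_ft_emb_1"] = [0.88, 0.41, -0.17]
```

Lift/dlift plotting then consumes `w_act` and `w_pred_xgb`, while a downstream stacked model would pick up `pred_ft_emb_0`/`pred_ft_emb_1` as extra features.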
152
- ### 2.6 Sklearn-style model classes: details and usage
153
-
154
- Below are the three sklearn-style model classes in `bayesopt` (usually created by trainers, but can be used directly).
155
-
156
- #### 2.6.1 ResNetSklearn (`class ResNetSklearn`)
157
-
158
- Purpose: train a residual MLP on one-hot/standardized tabular features (regression uses Softplus, classification outputs logits).
159
-
160
- Key parameters (common):
161
-
162
- - `input_dim`: input dimension (typically number of one-hot columns)
163
- - `hidden_dim`, `block_num`: width and number of residual blocks
164
- - `learning_rate`, `epochs`, `patience`
165
- - `use_data_parallel` / `use_ddp`
166
-
167
- Key methods:
168
-
169
- - `fit(X_train, y_train, w_train, X_val, y_val, w_val, trial=...)`
170
- `predict(X_test)`: classification applies a sigmoid; regression clips predictions to be positive
171
- - `set_params(params: dict)`: trainer writes `best_params` back to model
172
-
173
- Minimal manual example:
174
-
175
- ```python
176
- from ins_pricing.BayesOpt import ResNetSklearn
177
-
178
- # Use the one-hot standardized view from DatasetPreprocessor for X_train/X_val.
179
- resn = ResNetSklearn(model_nme="od_bc", input_dim=X_train.shape[1], task_type="regression", epochs=50)
180
- resn.set_params({"hidden_dim": 32, "block_num": 4, "learning_rate": 1e-3})
181
- resn.fit(X_train, y_train, w_train, X_val, y_val, w_val)
182
- y_pred = resn.predict(X_val)
183
- ```
184
-
185
- #### 2.6.2 FTTransformerSklearn (`class FTTransformerSklearn`)
186
-
187
- Purpose: learn Transformer representations on numeric/categorical features; supports three output modes:
188
-
189
- - supervised prediction: `predict()` returns scalar predictions
190
- - embedding output: `predict(return_embedding=True)` returns `(N, d_model)` embeddings
191
- - self-supervised masked reconstruction: `fit_unsupervised()` (used by `ft_role=unsupervised_embedding`)
192
-
193
- Key details:
194
-
195
- Numeric columns are passed through `nan_to_num` and standardized with the train mean/std in `_tensorize_split()` (reducing AMP overflow risk)
196
- - Categorical columns record train `categories` on first build; inference uses the same categories; unknown/missing maps to "unknown index" (`len(categories)`)
197
- - DDP uses `DistributedSampler`; the self-supervised head is computed inside forward to avoid DDP "ready twice" errors
198
-
199
- Key methods:
200
-
201
- - `fit(X_train, y_train, w_train, X_val, y_val, w_val, trial=..., geo_train=..., geo_val=...)`
202
- - `predict(X_test, geo_tokens=None, return_embedding=False)`
203
- - `fit_unsupervised(X_train, X_val=None, mask_prob_num=..., mask_prob_cat=..., ...) -> float`
204
-
205
- Minimal manual example (self-supervised pretrain + embeddings):
206
-
207
- ```python
208
- from ins_pricing.BayesOpt import FTTransformerSklearn
209
-
210
- ft = FTTransformerSklearn(
211
- model_nme="od_bc",
212
- num_cols=num_cols,
213
- cat_cols=cat_cols,
214
- d_model=64,
215
- n_heads=4,
216
- n_layers=4,
217
- dropout=0.1,
218
- epochs=30,
219
- use_ddp=False,
220
- )
221
-
222
- val_loss = ft.fit_unsupervised(train_df, X_val=test_df, mask_prob_num=0.2, mask_prob_cat=0.2)
223
- emb = ft.predict(test_df, return_embedding=True) # shape: (N, d_model)
224
- ```
225
-
226
- #### 2.6.3 GraphNeuralNetSklearn (`class GraphNeuralNetSklearn`)
227
-
228
- Purpose: build a graph from `geo_feature_nmes` and train a small GNN to generate geo tokens for FT.
229
-
230
- Key details:
231
-
232
- - Graph building: kNN (approx via pynndescent if available; GPU graph build with PyG when memory allows)
233
- - Adjacency cache: `graph_cache_path`
234
- - Training: full-graph training (one forward per epoch), good for moderate-size geo features
235
-
236
- Key methods:
237
-
238
- - `fit(X_train, y_train, w_train, X_val, y_val, w_val, trial=...)`
239
- `predict(X)`: regression clips predictions to be positive; classification applies a sigmoid
240
- - `set_params(params: dict)`: rebuilds the backbone after structural changes
241
-
242
- > In most stacking workflows you do not need to call it manually: when `geo_feature_nmes` is provided in config, `BayesOptModel` builds and caches geo tokens during init.
243
-
244
- ### 2.7 Mapping between Trainer and Sklearn models (who calls what)
245
-
246
- To unify tuning and final training/saving, `bayesopt` uses two layers:
247
-
248
- - **Trainer (tuning/scheduling layer)**: Optuna, CV/holdout, feature view selection, save/load, prediction caching
249
- - **Sklearn-style model (execution layer)**: only fit/predict (plus minimal helpers), no Optuna or output paths
250
-
251
- Mapping overview:
252
-
253
- - `GLMTrainer` -> statsmodels GLM (not a `*Sklearn` class; trainer builds design matrix and caches `pred_glm/w_pred_glm`)
254
- - `XGBTrainer` -> `xgb.XGBRegressor` (`enable_categorical=True`, choose `gpu_hist/hist` based on `use_gpu`)
255
- - `ResNetTrainer` -> `ResNetSklearn`
256
- - Feature view: usually `train_oht_scl_data/test_oht_scl_data` with `var_nmes` (one-hot + standardize)
257
- - Cached columns: `pred_resn/w_pred_resn`
258
- - `FTTrainer` -> `FTTransformerSklearn`
259
- - Feature view: raw `train_data/test_data` with `factor_nmes` (numeric + category columns; category columns must be declared in `cate_list`)
260
- - `ft_role=model`: cache `pred_ft/w_pred_ft`
261
- - `ft_role=embedding/unsupervised_embedding`: cache `pred_<prefix>_0..` and inject into downstream `factor_nmes`
262
- - `GraphNeuralNetSklearn`: primarily used by `BayesOptModel` to generate geo tokens (when `geo_feature_nmes` is set)
263
-
264
- ---
265
-
266
- ## 3. Three FT roles (decide whether to stack)
267
-
268
- FT role is controlled by `ft_role` (from config or CLI `--ft-role`):
269
-
270
- ### 3.1 `ft_role="model"` (FT as a prediction model)
271
-
272
- - Goal: train FT directly from `X -> y`, generate `pred_ft` / `w_pred_ft`
273
- - FT participates in lift/dlift/SHAP evaluation
274
-
275
- ### 3.2 `ft_role="embedding"` (supervised training, export embeddings only)
276
-
277
- Goal: still train with `X -> y` (embedding quality is shaped by the supervised signal)
278
- - Export pooled embedding feature columns: `pred_<ft_feature_prefix>_0..`
279
- - These columns are injected into `factor_nmes` for downstream base models (stacking)
280
- - FT itself is not evaluated as a standalone model in lift/SHAP
281
-
282
- ### 3.3 `ft_role="unsupervised_embedding"` (masked pretrain + embeddings)
283
-
284
- - Goal: do not use `y`; run masked reconstruction on inputs `X` (numeric + categorical)
285
- - Export `pred_<ft_feature_prefix>_0..` and inject to downstream features
286
- - Suitable for "representation first, base model decision" two-stage stacking
287
-
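The three roles map onto config like this; a hypothetical fragment for the unsupervised two-stage stacking setup described above:

```json
{
  "ft_role": "unsupervised_embedding",
  "ft_feature_prefix": "ft_emb",
  "stack_model_keys": ["xgb", "resn"]
}
```

With this fragment, FT pretrains via masked reconstruction first and exports `pred_ft_emb_0..`, then `xgb` and `resn` train on the embedding-augmented feature list.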
288
- ---
289
-
290
- ## 4. What does Optuna optimize?
291
-
292
- ### 4.1 Supervised models (GLM/XGB/ResNet/FT-as-model)
293
-
294
- - `TrainerBase.tune()` calls each trainer's `cross_val()` and minimizes validation metric (default direction `minimize`)
295
- - Regression loss is configurable (tweedie/poisson/gamma/mse/mae); classification uses logloss
296
-
297
- ### 4.2 FT self-supervised (`unsupervised_embedding`)
298
-
299
- When `ft_role="unsupervised_embedding"`, `BayesOptModel.optimize_model("ft")` calls:
300
-
301
- - `FTTrainer.cross_val_unsupervised()` (Optuna objective)
302
- - Objective: validation loss of masked reconstruction (smaller is better)
303
- - Numeric: MSE only on masked positions (multiplied by `num_loss_weight`)
304
- - Categorical: cross-entropy only on masked positions (multiplied by `cat_loss_weight`)
305
-
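A self-contained numpy sketch of this objective (simplified and illustrative only; the real implementation lives inside `FTTransformerSklearn.fit_unsupervised`, and `num_loss_weight`/`cat_loss_weight` are the weights named above):

```python
import numpy as np

def masked_reconstruction_loss(num_true, num_pred, num_mask,
                               cat_logits, cat_true, cat_mask,
                               num_loss_weight=1.0, cat_loss_weight=1.0):
    """MSE on masked numeric cells + cross-entropy on masked categorical cells."""
    # Numeric part: mean squared error over masked positions only.
    num_loss = ((num_pred - num_true) ** 2)[num_mask].mean() if num_mask.any() else 0.0

    # Categorical part: softmax cross-entropy over masked positions only.
    if cat_mask.any():
        logits = cat_logits[cat_mask]                        # (M, n_classes)
        logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
        log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
        cat_loss = -log_probs[np.arange(len(logits)), cat_true[cat_mask]].mean()
    else:
        cat_loss = 0.0

    return num_loss_weight * num_loss + cat_loss_weight * cat_loss
```

Smaller is better, so Optuna minimizes this validation loss directly.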
306
- Note:
307
- - `n_heads` is not searched by default; it is derived from `d_model` with divisibility guarantees (see `FTTrainer._resolve_adaptive_heads()`).
308
-
309
- ---
310
-
311
- ## 5. Output directories and files (convention)
312
-
313
- Output root comes from `output_dir` (config) or CLI `--output-dir`. Under it:
314
-
315
- - `plot/`: plots (loss curves, lift/dlift/oneway, etc)
316
- - `Results/`: params, metrics, version snapshots
317
- - `Results/<model>_bestparams_<trainer>.csv`: best params per trainer after tuning
318
- - `Results/versions/<timestamp>_<model_key>_best.json`: snapshots (best_params and config)
319
- - `model/`: model files
320
- - GLM/XGB: `pkl`
321
- - PyTorch: `pth` (ResNet usually saves state_dict; FT usually saves full object)
322
-
323
- ---
324
-
325
- ## 6. Config fields (JSON) - common
326
-
327
- Start by copying `ins_pricing/examples/modelling/config_template.json`. Examples: `ins_pricing/examples/modelling/config_template.json`, `ins_pricing/examples/modelling/config_incremental_template.json`, `user_packages legacy/Try/config_Pricing_FT_Stack.json`.
328
-
329
- ### 6.1 Path resolution rules (important)
330
-
331
- - `BayesOpt_entry.py` / `BayesOpt_incremental.py` resolve relative paths in config as "relative to the config.json directory".
332
- - Example: config in `ins_pricing/examples/modelling/` and `data_dir: "./Data"` means `ins_pricing/examples/modelling/Data`.
333
- - Fields resolved: `data_dir` / `output_dir` / `optuna_storage` / `gnn_graph_cache` / `best_params_files`.
334
- - If `optuna_storage` looks like a URL (contains `://`), it is passed to Optuna as-is; otherwise it is resolved as a file path and converted to absolute.
335
-
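The rule can be sketched as follows (a simplified re-implementation for illustration, not the actual entry-script code):

```python
from pathlib import Path

def resolve_config_path(value: str, config_dir: str) -> str:
    """Resolve a config path field relative to the config.json directory."""
    if "://" in value:  # URL-style value (e.g. sqlite:///...): pass through as-is
        return value
    p = Path(value)
    if p.is_absolute():
        return str(p)
    return str((Path(config_dir) / p).resolve())
```

For a config at `/project/conf/config.json`, `resolve_config_path("./Data", "/project/conf")` yields `/project/conf/Data`, while a `sqlite:///...` URL is left untouched.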
336
- **Data and task**
337
-
338
- - `data_dir` (str): directory of CSV files (`<model_name>.csv` per dataset)
339
- - `model_list` (list[str]) / `model_categories` (list[str]): build dataset names (cartesian product)
340
- - `target` (str): target column name
341
- - `weight` (str): weight column name
342
- - `feature_list` (list[str]): feature column names (recommended to provide explicitly; otherwise inferred in `BayesOptModel`)
343
- - `categorical_features` (list[str]): categorical column names (if empty, inferred in `BayesOptModel`)
344
- - `binary_resp_nme` (str|null, optional): binary target column (for conversion curves, etc)
345
- - `task_type` (str, optional): `"regression"` / `"classification"`, default `"regression"`
346
-
347
- **Training and split**
348
-
349
- - `prop_test` (float): train/test split ratio (entry splits train/test; trainers also do CV/holdout), typical `(0, 0.5]`, default `0.25`
350
- - `split_strategy` (str): `"random"` / `"time"` / `"group"` (applies in `BayesOpt_entry.py` and `Explain_entry.py`)
351
- - `split_time_col` (str|null): required when `split_strategy="time"` (time order for holdout)
352
- - `split_time_ascending` (bool): time sort direction, default `true`
353
- - `split_group_col` (str|null): required when `split_strategy="group"` (group holdout)
354
- - `cv_strategy` (str|null): CV strategy for Optuna folds (`"random"` / `"time"` / `"group"`); if null, defaults to `split_strategy`
355
- - `cv_time_col` (str|null): required when `cv_strategy="time"` (time order for CV)
356
- - `cv_time_ascending` (bool): time sort direction for CV, default `true`
357
- - `cv_group_col` (str|null): required when `cv_strategy="group"` (group CV)
358
- - `cv_splits` (int|null): explicit CV fold count (otherwise derived from `prop_test`)
359
- - `rand_seed` (int): random seed, default `13`
360
- - `epochs` (int): NN epochs (ResNet/FT/GNN), default `50`
361
- - `use_gpu` (bool, optional): prefer GPU (actual usage depends on `torch.cuda.is_available()`)
362
- - `resn_weight_decay` (float, optional): ResNet weight decay (L2), default `1e-4`
363
- - `final_ensemble` (bool, optional): enable k-fold model averaging during final training, default `false`
364
- - `final_ensemble_k` (int, optional): number of folds for averaging, default `3`
365
- - `final_refit` (bool, optional): enable refit after early stop with full data, default `true`
366
-
367
- Note: when `cv_strategy="time"` and a sampling cap is applied (e.g. `bo_sample_limit` or FT unsupervised `max_rows_for_ft_bo`), the subset is chosen in time order (no random sampling).
368
-
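The three holdout strategies can be sketched on a DataFrame like this (a simplified illustration; the entry script additionally handles seeding details and edge cases):

```python
import pandas as pd

def split_holdout(df, prop_test, strategy="random",
                  time_col=None, ascending=True, group_col=None, seed=13):
    """Return (train_df, test_df) according to split_strategy."""
    n_test = int(round(len(df) * prop_test))
    if strategy == "random":
        test = df.sample(n=n_test, random_state=seed)
        train = df.drop(test.index)
    elif strategy == "time":
        # Latest rows (after sorting) become the holdout.
        ordered = df.sort_values(time_col, ascending=ascending)
        train, test = ordered.iloc[:-n_test], ordered.iloc[-n_test:]
    elif strategy == "group":
        # Hold out whole groups so no group spans train and test.
        groups = df[group_col].drop_duplicates().sample(frac=prop_test, random_state=seed)
        mask = df[group_col].isin(groups)
        train, test = df[~mask], df[mask]
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return train, test
```

The same three modes apply to `cv_strategy` for Optuna folds, with `cv_time_col`/`cv_group_col` playing the roles of `time_col`/`group_col`.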
369
- **FT stacking**
370
-
371
- - `ft_role` (str): `"model"` / `"embedding"` / `"unsupervised_embedding"`
372
- - `"model"`: FT acts as prediction model and outputs `pred_ft`
373
- - `"embedding"`: FT is supervised but only exports embedding feature columns `pred_<prefix>_*`, not evaluated as final model
374
- - `"unsupervised_embedding"`: FT uses masked reconstruction pretraining, exports `pred_<prefix>_*`
375
- - `ft_feature_prefix` (str): prefix for exported features (creates `pred_<prefix>_0..`)
376
- - `ft_num_numeric_tokens` (int|null): number of numeric tokens for FT; default equals number of numeric features
377
- - `stack_model_keys` (list[str]): when `ft_role != "model"` and you want base models after FT, specify trainers to run, e.g. `["xgb","resn"]` or `["all"]`
378
-
379
- **Parallelism and DDP**
380
-
381
- - `use_resn_ddp` / `use_ft_ddp` / `use_gnn_ddp` (bool): use DDP (requires `torchrun`/`nproc_per_node>1`)
382
- - `use_resn_data_parallel` / `use_ft_data_parallel` / `use_gnn_data_parallel` (bool): allow DataParallel as fallback
383
-
384
- **Reuse historical best params (skip Optuna)**
385
-
386
- - `reuse_best_params` (bool): `true/false`
387
- - `true`: try `Results/versions/*_<model_key>_best.json` first, else fall back to `Results/<model>_bestparams_*.csv`
388
- - if not found, runs Optuna normally
389
- - `best_params_files` (dict, optional): explicit best param files, format `{"xgb":"./Results/xxx.csv","ft":"./Results/xxx.json"}`
390
- - supports `.csv/.tsv` (read first row) and `.json` (`{"best_params": {...}}` or direct dict)
391
- - if provided, reads directly and skips Optuna
392
-
393
- **Optuna resume (recommended)**
394
-
395
- - `optuna_storage` (str|null): Optuna storage (sqlite recommended)
396
- - example: `"./Results/optuna/bayesopt.sqlite3"` (resolved to absolute path)
397
- - or: `"sqlite:///E:/path/to/bayesopt.sqlite3"` (URL passed as-is)
398
- - `optuna_study_prefix` (str): study name prefix; keep fixed for resuming
399
-
400
- **XGBoost search caps (avoid very slow trials)**
401
-
402
- - `xgb_max_depth_max` (int): max depth cap, default `25`
403
- - `xgb_n_estimators_max` (int): tree count cap, default `500`
404
-
405
- **GNN and geo tokens (optional)**
406
-
407
- - `gnn_use_approx_knn` (bool): prefer approximate kNN for large samples
408
- - `gnn_approx_knn_threshold` (int): row threshold to switch to approximate kNN
409
- - `gnn_graph_cache` (str|null): adjacency/graph cache path
410
- - `gnn_max_gpu_knn_nodes` (int): force CPU kNN above this node count (avoid GPU OOM)
411
- - `gnn_knn_gpu_mem_ratio` (float): fraction of free GPU memory allowed for kNN
412
- - `gnn_knn_gpu_mem_overhead` (float): memory overhead multiplier for kNN
413
- - `geo_feature_nmes` (list[str]): raw columns for geo tokens (empty means no geo tokens)
414
- - `region_province_col` / `region_city_col` (str|null): province/city columns (for region_effect features)
415
- - `region_effect_alpha` (float): partial pooling strength (>=0)
416
-
417
- **Plotting (optional)**
418
-
419
- - `plot_curves` (bool): plot at end of run
420
- - `plot` (dict): recommended unified plot settings
421
- - `plot.enable` (bool)
422
- - `plot.n_bins` (int): bin count
423
- - `plot.oneway` (bool)
424
- - `plot.lift_models` (list[str]): model keys for lift plots (e.g. `["xgb","resn"]`), empty means all trained models
425
- - `plot.double_lift` (bool)
426
- - `plot.double_lift_pairs` (list): supports `["xgb,resn"]` or `[["xgb","resn"]]`
427
-
428
- **Standalone plotting (recommended)**
429
-
430
- `ins_pricing.plotting` provides plotting utilities decoupled from training. You can use DataFrames or arrays to compare models:
431
-
432
- - `plotting.curves`: lift/double lift/ROC/PR/KS/calibration/conversion lift
433
- - `plotting.diagnostics`: loss curve, one-way plots
434
- - `plotting.importance`: feature importance (supports SHAP summary)
435
- - `plotting.geo`: geo heatmaps/contours (with map tiles for heatmap/contour)
436
-
437
- Example (standalone):
438
-
439
- ```python
440
- from ins_pricing.plotting import curves, importance, geo
441
-
442
- # Lift / Double Lift
443
- curves.plot_lift_curve(pred, w_act, weight, n_bins=10, save_path="plot/lift.png")
444
- curves.plot_double_lift_curve(pred1, pred2, w_act, weight, n_bins=10, save_path="plot/dlift.png")
445
-
446
- # ROC / PR (multi-model comparison)
447
- curves.plot_roc_curves(y_true, {"xgb": pred_xgb, "resn": pred_resn}, save_path="plot/roc.png")
448
- curves.plot_pr_curves(y_true, {"xgb": pred_xgb, "resn": pred_resn}, save_path="plot/pr.png")
449
-
450
- # Feature importance
451
- importance.plot_feature_importance({"x1": 0.32, "x2": 0.18}, save_path="plot/importance.png")
452
-
453
- # Geo heat/contour
454
- geo.plot_geo_heatmap(df, x_col="lon", y_col="lat", value_col="loss", bins=50, save_path="plot/geo_heat.png")
455
- geo.plot_geo_contour(df, x_col="lon", y_col="lat", value_col="loss", levels=12, save_path="plot/geo_contour.png")
456
-
457
- # Map heatmap (requires contextily)
458
- geo.plot_geo_heatmap_on_map(df, lon_col="lon", lat_col="lat", value_col="loss", bins=80, save_path="plot/map_heat.png")
459
- ```
460
-
461
- Map functions use lat/lon (EPSG:4326) by default and auto-scale view to data bounds.
462
-
463
- The training flow uses this same plotting package (`plot_oneway`/`plot_lift`/`plot_dlift`/`plot_conversion_lift`/loss curves), so plots stay consistent and are maintained in one place.
464
-
465
- **Model explanation (standalone module, light + deep)**
466
-
467
- `ins_pricing.explain` provides model explanation methods decoupled from training:
468
-
469
- - Light: permutation importance (for XGB/ResNet/FT, global)
470
- - Deep: integrated gradients (for ResNet/FT, mainly numeric features)
471
- - Classic: SHAP (KernelExplainer, for GLM/XGB/ResNet/FT, requires `shap`)
472
-
473
- SHAP is optional; a hint is printed if the `shap` package is not installed.
474
-
475
- Example:
476
-
477
- ```python
478
- from ins_pricing.explain import (
479
- permutation_importance,
480
- resnet_integrated_gradients,
481
- ft_integrated_gradients,
482
- compute_shap_xgb,
483
- )
484
-
485
- # permutation importance
486
- imp = permutation_importance(
487
- predict_fn=model.predict,
488
- X=X_valid,
489
- y=y_valid,
490
- sample_weight=w_valid,
491
- metric="rmse",
492
- n_repeats=5,
493
- )
494
-
495
- # ResNet integrated gradients
496
- ig_resn = resnet_integrated_gradients(resn_model, X_valid_scl, steps=50)
497
-
498
- # FT integrated gradients (categorical fixed; numeric/geo participate)
499
- ig_ft = ft_integrated_gradients(ft_model, X_valid, geo_tokens=geo_tokens, steps=50)
500
-
501
- # SHAP for XGB (BayesOptModel as context)
502
- shap_xgb = compute_shap_xgb(model, n_background=500, n_samples=200, on_train=False)
503
- ```
504
-
505
- BayesOptModel also provides convenience wrappers:
506
-
507
- ```python
508
- model.compute_permutation_importance("resn", on_train=False, metric="rmse")
509
- model.compute_integrated_gradients_resn(on_train=False, steps=50)
510
- model.compute_integrated_gradients_ft(on_train=False, steps=50)
511
- model.compute_shap_xgb(on_train=False)
512
- model.compute_shap_glm(on_train=False)
513
- ```
514
-
515
- **Explain batch via config**
516
-
517
- Use `Explain_entry.py` with config to load trained models under `output_dir/model` and run explanations on the validation set:
518
-
519
- ```bash
520
- python ins_pricing/cli/Explain_entry.py --config-json ins_pricing/examples/modelling/config_explain_template.json
521
- ```
522
-
523
- Notebook option: `ins_pricing/examples/modelling/Explain_Run.ipynb`.
524
-
525
- **Environment variable injection (optional)**
526
-
527
- - `env`: values are set via `os.environ.setdefault()` (e.g. thread limits, CUDA debug)
528
-
529
- ### 6.2 Notebook unified run: runner field (recommended)
530
-
531
- All `Pricing_*.ipynb` are thin wrappers: they only call `Pricing_Run.run("<config.json>")`, and the run mode is controlled by config `runner`.
532
-
533
- Notebook usage (recommended):
534
-
535
- ```python
536
- from ins_pricing.cli.Pricing_Run import run
537
- run("examples/modelling/config_template.json")
538
- ```
539
-
540
- CLI usage (optional):
541
-
542
- ```bash
543
- python ins_pricing/cli/Pricing_Run.py --config-json ins_pricing/examples/modelling/config_template.json
544
- ```
545
-
546
- `runner` supports three modes:
547
-
548
- - `runner.mode="entry"`: run `BayesOpt_entry.py`
549
- - `runner.model_keys` (list[str]): `["glm","xgb","resn","ft","gnn"]` or includes `"all"`
550
- - `runner.nproc_per_node` (int): `1` (single process) or `>=2` (torchrun/DDP)
551
- - `runner.max_evals` (int): Optuna trials per model (default `50`)
552
- - `runner.plot_curves` (bool): add `--plot-curves`
553
- - `runner.ft_role` (str|null): if set, overrides config `ft_role`
554
-
555
- - `runner.mode="incremental"`: run `BayesOpt_incremental.py`
556
- - `runner.incremental_args` (list[str]): equivalent to CLI args for the incremental script
557
- - common: `--incremental-dir/--incremental-file`, `--merge-keys`, `--timestamp-col`, `--model-keys`, `--max-evals`, `--update-base-data`, `--summary-json`, etc
558
-
559
- - `runner.mode="explain"`: run `Explain_entry.py`
560
- - `runner.explain_args` (list[str]): equivalent to CLI args for the explain script
561
-
562
- watchdog (available in both modes):
563
-
564
- - `runner.use_watchdog` (bool): enable watchdog
565
- - `runner.idle_seconds` (int): seconds without output to treat as stuck
566
- - `runner.max_restarts` (int): max restarts
567
- - `runner.restart_delay_seconds` (int): delay between restarts
568
-
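Putting the fields above together, a hypothetical `runner` block for a 2-GPU DDP run with the watchdog enabled:

```json
{
  "runner": {
    "mode": "entry",
    "model_keys": ["ft", "xgb", "resn"],
    "nproc_per_node": 2,
    "max_evals": 50,
    "plot_curves": true,
    "use_watchdog": true,
    "idle_seconds": 1800,
    "max_restarts": 3,
    "restart_delay_seconds": 60
  }
}
```

With `nproc_per_node >= 2`, `Pricing_Run.py` launches `BayesOpt_entry.py` under `torchrun`; the watchdog restarts the run if no output appears for `idle_seconds`.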
569
- ---
570
-
571
- ## 7. CLI: BayesOpt_entry.py examples
572
-
573
- ### 7.0 Quick args reference (BayesOpt_entry.py)
574
-
575
- Common CLI args for `BayesOpt_entry.py` (`--config-json` is required):
576
-
577
- - `--config-json` (required, str): config path (recommend `ins_pricing/examples/modelling/xxx.json` or absolute path)
578
- - `--model-keys` (list[str]): `glm` / `xgb` / `resn` / `ft` / `gnn` / `all`
579
- - `--stack-model-keys` (list[str]): only when `ft_role != model`; same values as `--model-keys`
580
- - `--max-evals` (int): Optuna trials per dataset per model
581
- - `--plot-curves` (flag): enable plotting (also controlled by `plot_curves`/`plot.enable` in config)
582
- - `--output-dir` (str): override config `output_dir`
583
- - `--reuse-best-params` (flag): override config and reuse historical params to skip Optuna
584
-
585
- DDP/DP (override config):
586
-
587
- - `--use-resn-ddp` / `--use-ft-ddp` / `--use-gnn-ddp` (flag): force DDP for trainer
588
- - `--use-resn-dp` / `--use-ft-dp` / `--use-gnn-dp` (flag): enable DataParallel fallback
589
-
590
- GNN graph build (override config):
591
-
592
- - `--gnn-no-ann` (flag): disable approximate kNN
593
- - `--gnn-ann-threshold` (int): override `gnn_approx_knn_threshold`
594
- - `--gnn-graph-cache` (str): override `gnn_graph_cache`
595
- - `--gnn-max-gpu-nodes` (int): override `gnn_max_gpu_knn_nodes`
596
- - `--gnn-gpu-mem-ratio` (float): override `gnn_knn_gpu_mem_ratio`
597
- - `--gnn-gpu-mem-overhead` (float): override `gnn_knn_gpu_mem_overhead`
598
-
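These flags override the config keys named above; a config-side sketch (values are illustrative assumptions, not recommended defaults):

```json
{
  "gnn_approx_knn_threshold": 200000,
  "gnn_graph_cache": "./Results/gnn_graph_cache.pt",
  "gnn_max_gpu_knn_nodes": 100000,
  "gnn_knn_gpu_mem_ratio": 0.8,
  "gnn_knn_gpu_mem_overhead": 1.5
}
```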
599
- FT feature mode:
600
-
601
- - `--ft-role` (str): `model` / `embedding` / `unsupervised_embedding`
602
- - `--ft-feature-prefix` (str): feature prefix (e.g. `ft_emb`)
603
- - `--ft-as-feature` (flag): compatibility alias; if the config's `ft_role` is still the default, it is switched to `embedding`
604
-
605
- ### 7.1 Direct train/tune (single machine)
606
-
607
- ```bash
608
- python ins_pricing/cli/BayesOpt_entry.py \
609
- --config-json ins_pricing/examples/modelling/config_template.json \
610
- --model-keys xgb resn \
611
- --max-evals 50
612
- ```
613
-
614
- ### 7.2 FT stacking: self-supervised FT then base models (single machine or torchrun)
615
-
616
- If config already has `ft_role=unsupervised_embedding`, you can omit `--ft-role`.
617
-
618
- ```bash
619
- python ins_pricing/cli/BayesOpt_entry.py \
620
- --config-json "user_packages legacy/Try/config_Pricing_FT_Stack.json" \
621
- --model-keys xgb resn \
622
- --max-evals 50
623
- ```
624
-
625
- DDP (multi-GPU) example:
626
-
627
- ```bash
628
- torchrun --standalone --nproc_per_node=2 \
628
- ins_pricing/cli/BayesOpt_entry.py \
629
- --config-json "user_packages legacy/Try/config_Pricing_FT_Stack.json" \
630
- --model-keys xgb resn \
631
- --use-ft-ddp \
633
- --max-evals 50
634
- ```
635
-
636
- ### 7.3 Reuse historical best params (skip tuning)
637
-
638
- ```bash
639
- python ins_pricing/cli/BayesOpt_entry.py \
640
- --config-json "user_packages legacy/Try/config_Pricing_FT_Stack.json" \
641
- --model-keys xgb resn \
642
- --reuse-best-params
643
- ```
644
-
645
- ### 7.4 Quick args reference (BayesOpt_incremental.py)
646
-
647
- `BayesOpt_incremental.py` accepts many args; the typical combination is an incremental data source, merge/dedupe settings, and the list of models to retrain.
648
-
649
- Common args:
650
-
651
- - `--config-json` (required, str): reuse the same config (must include `data_dir/model_list/model_categories/target/weight/feature_list/categorical_features`)
652
- - `--model-names` (list[str], optional): update only certain datasets (default uses `model_list x model_categories`)
653
- - `--model-keys` (list[str]): `glm` / `xgb` / `resn` / `ft` / `gnn` / `all`
654
- - `--incremental-dir` (Path) or `--incremental-file` (Path): incremental CSV source (choose one)
655
- - `--incremental-template` (str): filename template for `--incremental-dir` (default `{model_name}_incremental.csv`)
656
- - `--merge-keys` (list[str]): primary keys for dedupe after merge
657
- - `--dedupe-keep` (str): `first` / `last`
658
- - `--timestamp-col` (str|null): timestamp column for ordering before dedupe
659
- - `--timestamp-descending` (flag): descending timestamp (default ascending)
660
- - `--max-evals` (int): trial count when re-tuning is needed
661
- - `--force-retune` (flag): force retune even if historical params exist
662
- - `--skip-retune-missing` (flag): skip if params missing (default re-tunes)
663
- - `--update-base-data` (flag): overwrite base CSV with merged data after success
664
- - `--persist-merged-dir` (Path|null): optionally save merged snapshot to a separate dir
665
- - `--summary-json` (Path|null): output summary
666
- - `--plot-curves` (flag): plot
667
- - `--dry-run` (flag): only merge and stats, no training
668
-
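As a sketch, a `runner.mode="incremental"` block wiring these args together (it assumes the runner forwards `--config-json` itself; the merge keys, timestamp column, and file paths are illustrative):

```json
{
  "runner": {
    "mode": "incremental",
    "incremental_args": [
      "--incremental-dir", "./Data/incremental",
      "--merge-keys", "policy_id",
      "--timestamp-col", "as_of_date",
      "--model-keys", "xgb",
      "--max-evals", "30",
      "--update-base-data",
      "--summary-json", "./Results/incremental_summary.json"
    ]
  }
}
```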
669
- ---
670
-
671
- ## 8. Python API: minimal runnable example (recommended to get working first)
672
-
673
- This example shows "self-supervised FT embeddings, then XGB" (only key calls shown):
674
-
675
- ```python
676
- import pandas as pd
677
- from sklearn.model_selection import train_test_split
678
-
679
- import ins_pricing.BayesOpt as ropt
680
-
681
- df = pd.read_csv("./Data/od_bc.csv")
682
- train_df, test_df = train_test_split(df, test_size=0.25, random_state=13)
683
-
684
- model = ropt.BayesOptModel(
685
- train_df=train_df,
686
- test_df=test_df,
687
- model_nme="od_bc",
688
- resp_nme="response",
689
- weight_nme="weights",
690
- factor_nmes=[...], # same as config feature_list
691
- cate_list=[...], # same as config categorical_features
692
- epochs=50,
693
- use_ft_ddp=False,
694
- ft_role="unsupervised_embedding",
695
- ft_feature_prefix="ft_emb",
696
- output_dir="./Results",
697
- )
698
-
699
- # 1) FT masked self-supervised pretrain + export embeddings + inject to factor_nmes
700
- model.optimize_model("ft", max_evals=30)
701
-
702
- # 2) Base model tune/train (uses injected pred_ft_emb_* features)
703
- model.optimize_model("xgb", max_evals=50)
704
-
705
- # 3) Save (or save one model only)
706
- model.save_model()
707
- ```
708
-
709
- For time-based splits in Python, keep chronological order and slice:
710
-
711
- ```python
712
- df = df.sort_values("as_of_date")
713
- cutoff = int(len(df) * 0.75)
714
- train_df = df.iloc[:cutoff]
715
- test_df = df.iloc[cutoff:]
716
- ```
717
-
718
- ### 8.x Tuning stuck / resume (recommended)
719
-
720
- If a trial hangs for a long time (e.g. the 17th trial runs for hours), stop the run and add Optuna persistent storage in `config.json`. The next run will resume from completed trials and keep total trials equal to `max_evals`.
721
-
722
- Some XGBoost parameter combos can be extremely slow; use the cap fields to narrow the search space.
723
-
724
- **config.json example:**
725
- ```json
726
- {
727
- "optuna_storage": "./Results/optuna/pricing.sqlite3",
728
- "optuna_study_prefix": "pricing",
729
- "xgb_max_depth_max": 12,
730
- "xgb_n_estimators_max": 300
731
- }
732
- ```
733
-
734
- **Continue training with current best params (no tuning)**
735
- - Set `"reuse_best_params": true` in `config.json`: it prefers `Results/versions/*_xgb_best.json` or `Results/<model>_bestparams_xgboost.csv` and trains directly.
736
- - Or specify `"best_params_files"` (by `model_key`) to read from files and skip Optuna:
737
-
738
- ```json
739
- {
740
- "best_params_files": {
741
- "xgb": "./Results/od_bc_bestparams_xgboost.csv",
742
- "ft": "./Results/od_bc_bestparams_fttransformer.csv"
743
- }
744
- }
745
- ```
746
-
747
- **Auto-detect hangs and restart (Watchdog)**
748
- If a trial hangs with no output for hours, use `ins_pricing/cli/watchdog_run.py` to monitor output: when stdout/stderr is idle for `idle_seconds`, it kills the `torchrun` process tree and restarts. With `optuna_storage`, restarts resume remaining trials.
749
-
750
- ```bash
751
- python ins_pricing/cli/watchdog_run.py --idle-seconds 7200 --max-restarts 50 -- \
752
- python -m torch.distributed.run --standalone --nproc_per_node=2 \
753
- ins_pricing/cli/BayesOpt_entry.py --config-json config.json --model-keys xgb resn --max-evals 50
754
- ```
755
-
756
- ---
757
-
758
- ## 9. Model usage examples (CLI and Python)
759
-
760
- Examples by model/trainer. All examples share the same data contract: the CSV must contain the columns named by `target`, `weight`, and `feature_list`, and every categorical column must be listed in `categorical_features`.
761
-
762
- > Note: `model_key` follows `BayesOpt_entry.py`: `glm` / `xgb` / `resn` / `ft` / `gnn`.
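For reference, a minimal synthetic CSV satisfying this contract can be built with plain pandas (the column names here are hypothetical; the config's `target`/`weight`/`feature_list`/`categorical_features` fields decide which columns are actually read):

```python
import pandas as pd

# Hypothetical column names; map them via the config fields.
df = pd.DataFrame({
    "claim_cost": [120.0, 0.0, 310.5],   # target
    "exposure":   [1.0, 0.5, 1.0],       # weight
    "veh_age":    [3, 10, 1],            # numeric feature (feature_list)
    "region":     ["N", "S", "N"],       # categorical feature (categorical_features)
})
df.to_csv("od_bc.csv", index=False)      # one CSV per dataset under data_dir
```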
763
-
764
- ### 9.1 GLM (`model_key="glm"`)
765
-
766
- **CLI**
767
-
768
- ```bash
769
- python ins_pricing/cli/BayesOpt_entry.py \
770
- --config-json ins_pricing/examples/modelling/config_template.json \
771
- --model-keys glm \
772
- --max-evals 50
773
- ```
774
-
775
- **Python**
776
-
777
- ```python
778
- model.optimize_model("glm", max_evals=50)
779
- model.trainers["glm"].save()
780
- ```
781
-
782
- Use case: fast, interpretable baseline and sanity check.
783
-
784
- ### 9.2 XGBoost (`model_key="xgb"`)
785
-
786
- **CLI**
787
-
788
- ```bash
789
- python ins_pricing/cli/BayesOpt_entry.py \
790
- --config-json ins_pricing/examples/modelling/config_template.json \
791
- --model-keys xgb \
792
- --max-evals 100
793
- ```
794
-
795
- **Python**
796
-
797
- ```python
798
- model.optimize_model("xgb", max_evals=100)
799
- model.trainers["xgb"].save()
800
- ```
801
-
802
- Use case: strong baseline, friendly to feature engineering/stacked features (including FT embeddings).
803
-
804
- ### 9.3 ResNet (`model_key="resn"`)
805
-
806
- ResNetTrainer is implemented in PyTorch and performs training and CV on one-hot encoded, standardized feature views (well suited to high-dimensional one-hot inputs).
807
-
808
- **CLI (single machine)**
809
-
810
- ```bash
811
- python ins_pricing/cli/BayesOpt_entry.py \
812
- --config-json ins_pricing/examples/modelling/config_template.json \
813
- --model-keys resn \
814
- --max-evals 50
815
- ```
816
-
817
- **CLI (DDP, multi-GPU)**
818
-
819
- ```bash
820
- torchrun --standalone --nproc_per_node=2 \
821
- ins_pricing/cli/BayesOpt_entry.py \
822
- --config-json ins_pricing/examples/modelling/config_template.json \
823
- --model-keys resn \
824
- --use-resn-ddp \
825
- --max-evals 50
826
- ```
827
-
828
- **Python**
829
-
830
- ```python
831
- model.optimize_model("resn", max_evals=50)
832
- model.trainers["resn"].save()
833
- ```
834
-
835
- ### 9.4 FT-Transformer: as prediction model (`ft_role="model"`)
836
-
837
- FT outputs `pred_ft` and participates in lift/SHAP (if enabled).
838
-
839
- **CLI**
840
-
841
- ```bash
842
- python ins_pricing/cli/BayesOpt_entry.py \
843
- --config-json ins_pricing/examples/modelling/config_template.json \
844
- --model-keys ft \
845
- --ft-role model \
846
- --max-evals 50
847
- ```
848
-
849
- **Python**
850
-
851
- ```python
852
- model.config.ft_role = "model"
853
- model.optimize_model("ft", max_evals=50)
854
- ```
855
-
856
- ### 9.5 FT-Transformer: supervised but export embeddings only (`ft_role="embedding"`)
857
-
858
- FT is not evaluated as a standalone model; it writes embedding features (`pred_<prefix>_0..`) and injects them into downstream features.
859
-
860
- **CLI (generate features with FT, then train base models)**
861
-
862
- ```bash
863
- python ins_pricing/cli/BayesOpt_entry.py \
864
- --config-json "user_packages legacy/Try/config_Pricing_FT_Stack.json" \
865
- --model-keys xgb resn \
866
- --ft-role embedding \
867
- --max-evals 50
868
- ```
869
-
870
- **Python**
871
-
872
- ```python
873
- model.config.ft_role = "embedding"
874
- model.config.ft_feature_prefix = "ft_emb"
875
- model.optimize_model("ft", max_evals=50) # generate pred_ft_emb_* and inject to factor_nmes
876
- model.optimize_model("xgb", max_evals=100) # train/tune with injected features
877
- ```
878
-
879
- ### 9.6 FT-Transformer: masked self-supervised pretrain + embeddings (`ft_role="unsupervised_embedding"`)
880
-
881
- This is a two-stage stacking mode: representation learning first, base model decision later. Optuna objective is validation loss of masked reconstruction (not `tw_power`).
882
-
883
- **CLI (recommended: use sample config)**
884
-
885
- ```bash
886
- python ins_pricing/cli/BayesOpt_entry.py \
887
- --config-json "user_packages legacy/Try/config_Pricing_FT_Stack.json" \
888
- --model-keys xgb resn \
889
- --max-evals 50
890
- ```
891
-
892
- **CLI (DDP, multi-GPU)**
893
-
894
- ```bash
895
- torchrun --standalone --nproc_per_node=2 \
896
- ins_pricing/cli/BayesOpt_entry.py \
897
- --config-json "user_packages legacy/Try/config_Pricing_FT_Stack.json" \
898
- --model-keys xgb resn \
899
- --use-ft-ddp \
900
- --max-evals 50
901
- ```
902
-
903
- **Python**
904
-
905
- ```python
906
- model.config.ft_role = "unsupervised_embedding"
907
- model.config.ft_feature_prefix = "ft_emb"
908
- model.optimize_model("ft", max_evals=50) # self-supervised pretrain + export pred_ft_emb_*
909
- model.optimize_model("xgb", max_evals=100)
910
- model.optimize_model("resn", max_evals=50)
911
- ```
912
-
913
- ### 9.7 GNN (`model_key="gnn"`) and geo tokens
914
-
915
- GNN can run as a standalone model with Optuna tuning/training: it trains on one-hot/standardized features and writes `pred_gnn` / `w_pred_gnn` to `train_data/test_data`.
916
-
917
- **CLI**
918
-
919
- ```bash
920
- python ins_pricing/cli/BayesOpt_entry.py \
921
- --config-json ins_pricing/examples/modelling/config_template.json \
922
- --model-keys gnn \
923
- --max-evals 50
924
- ```
925
-
926
- GNN can also generate geo tokens: when config includes `geo_feature_nmes`, it trains a geo encoder to produce `geo_token_*` and injects those tokens into FT.
927
-
928
- Implementation: geo token generation is handled by `GNNTrainer.prepare_geo_tokens()`. Tokens are stored in `BayesOptModel.train_geo_tokens/test_geo_tokens` and used as FT inputs during training/prediction.
929
-
930
- ---
931
-
932
- ## 10. FAQ (quick checks)
933
-
934
- ### 10.1 torchrun OMP_NUM_THREADS warning
935
-
936
- This is a common torchrun message: torchrun sets OMP_NUM_THREADS=1 per process by default to avoid CPU oversubscription. You can override it via the config's `env` field.
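For instance, assuming the config's `env` field is a flat mapping of environment variables applied to worker processes (an assumption about the schema):

```json
{
  "env": { "OMP_NUM_THREADS": "8" }
}
```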
937
-
938
- ### 10.2 Optuna loss shows inf
939
-
940
- This usually means NaN/inf appeared during training or validation (numeric overflow, data issues, etc.). Check:
941
-
942
- - data ranges and NaNs (use `nan_to_num`, scaling)
943
- - learning rate and AMP (reduce LR or disable AMP)
944
- - gradient clipping (already enabled for torch models)
945
- - unstable configs (cap XGBoost depth/estimators)
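As a quick pre-flight check before training (plain pandas/numpy, not part of the package), you can scan the relevant columns for NaN/inf counts and suspicious value ranges:

```python
import numpy as np
import pandas as pd

def preflight(df: pd.DataFrame, cols) -> pd.DataFrame:
    """Summarise NaN/inf counts and value ranges for the given columns."""
    num = df[cols].apply(pd.to_numeric, errors="coerce")
    return pd.DataFrame({
        "n_nan": num.isna().sum(),   # missing values per column
        "n_inf": np.isinf(num).sum(),  # infinities per column
        "min": num.min(),
        "max": num.max(),
    })

demo = pd.DataFrame({"x": [1.0, np.inf, None], "y": [0.1, 0.2, 0.3]})
report = preflight(demo, ["x", "y"])
print(report)
```

Columns flagged here are good candidates for `nan_to_num`, rescaling, or exclusion before retrying the study.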