ins-pricing 0.4.4__py3-none-any.whl → 0.4.5__py3-none-any.whl
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- ins_pricing/README.md +66 -74
- ins_pricing/cli/BayesOpt_incremental.py +904 -904
- ins_pricing/cli/bayesopt_entry_runner.py +1442 -1442
- ins_pricing/frontend/README.md +573 -419
- ins_pricing/frontend/config_builder.py +1 -0
- ins_pricing/modelling/README.md +67 -0
- ins_pricing/modelling/core/bayesopt/README.md +59 -0
- ins_pricing/modelling/core/bayesopt/config_preprocess.py +12 -0
- ins_pricing/modelling/core/bayesopt/core.py +3 -1
- ins_pricing/setup.py +1 -1
- {ins_pricing-0.4.4.dist-info → ins_pricing-0.4.5.dist-info}/METADATA +182 -162
- {ins_pricing-0.4.4.dist-info → ins_pricing-0.4.5.dist-info}/RECORD +14 -21
- ins_pricing/CHANGELOG.md +0 -272
- ins_pricing/RELEASE_NOTES_0.2.8.md +0 -344
- ins_pricing/docs/LOSS_FUNCTIONS.md +0 -78
- ins_pricing/docs/modelling/BayesOpt_USAGE.md +0 -945
- ins_pricing/docs/modelling/README.md +0 -34
- ins_pricing/frontend/QUICKSTART.md +0 -152
- ins_pricing/modelling/core/bayesopt/PHASE2_REFACTORING_SUMMARY.md +0 -449
- ins_pricing/modelling/core/bayesopt/PHASE3_REFACTORING_SUMMARY.md +0 -406
- ins_pricing/modelling/core/bayesopt/REFACTORING_SUMMARY.md +0 -247
- {ins_pricing-0.4.4.dist-info → ins_pricing-0.4.5.dist-info}/WHEEL +0 -0
- {ins_pricing-0.4.4.dist-info → ins_pricing-0.4.5.dist-info}/top_level.txt +0 -0
|
@@ -1,406 +0,0 @@
|
|
|
1
|
-
# Phase 3 Refactoring: Utils Module Consolidation
|
|
2
|
-
|
|
3
|
-
**Completion Date**: 2026-01-15
|
|
4
|
-
**Status**: ✅ COMPLETE
|
|
5
|
-
**Backward Compatibility**: 100% maintained
|
|
6
|
-
|
|
7
|
-
---
|
|
8
|
-
|
|
9
|
-
## Executive Summary
|
|
10
|
-
|
|
11
|
-
**Goal**: Eliminate code duplication between `ins_pricing/utils/` and `ins_pricing/modelling/core/bayesopt/utils/`
|
|
12
|
-
|
|
13
|
-
**Impact**:
|
|
14
|
-
- **Before**: 181 lines of duplicated code (DeviceManager + GPUMemoryManager)
|
|
15
|
-
- **After**: 0 lines of duplication - single source of truth
|
|
16
|
-
- **Benefit**: Improved maintainability, consistent behavior, easier bug fixes
|
|
17
|
-
|
|
18
|
-
---
|
|
19
|
-
|
|
20
|
-
## Problem Statement
|
|
21
|
-
|
|
22
|
-
During Phase 1 refactoring, we split the monolithic `utils.py` into focused modules. However, analysis revealed that `DeviceManager` and `GPUMemoryManager` were duplicated in two locations:
|
|
23
|
-
|
|
24
|
-
1. **`ins_pricing/utils/device.py`** (Package-level - 257 lines)
|
|
25
|
-
- Complete implementation with `TORCH_AVAILABLE` checks
|
|
26
|
-
- Used by production, pricing, governance modules
|
|
27
|
-
|
|
28
|
-
2. **`ins_pricing/modelling/core/bayesopt/utils/metrics_and_devices.py`** (BayesOpt - 721 lines)
|
|
29
|
-
- Identical implementation of same two classes (~181 lines)
|
|
30
|
-
- Used only within bayesopt module
|
|
31
|
-
|
|
32
|
-
**Root Cause**: BayesOpt module likely created its own copies before package-level utils existed.
|
|
33
|
-
|
|
34
|
-
**Risk**:
|
|
35
|
-
- Bug fixes must be applied in two places
|
|
36
|
-
- Code drift over time (implementations diverge)
|
|
37
|
-
- Increased maintenance burden
|
|
38
|
-
- Violates DRY (Don't Repeat Yourself) principle
|
|
39
|
-
|
|
40
|
-
---
|
|
41
|
-
|
|
42
|
-
## Solution Implemented
|
|
43
|
-
|
|
44
|
-
### Approach: Import from Package-Level Utils
|
|
45
|
-
|
|
46
|
-
Instead of maintaining duplicate implementations, `bayesopt/utils/metrics_and_devices.py` now imports these classes from the package-level utils:
|
|
47
|
-
|
|
48
|
-
**Before**:
|
|
49
|
-
```python
|
|
50
|
-
# ins_pricing/modelling/core/bayesopt/utils/metrics_and_devices.py
|
|
51
|
-
|
|
52
|
-
class GPUMemoryManager:
|
|
53
|
-
"""Context manager for GPU memory management..."""
|
|
54
|
-
# ... 100+ lines of implementation
|
|
55
|
-
|
|
56
|
-
class DeviceManager:
|
|
57
|
-
"""Unified device management..."""
|
|
58
|
-
# ... 80+ lines of implementation
|
|
59
|
-
```
|
|
60
|
-
|
|
61
|
-
**After**:
|
|
62
|
-
```python
|
|
63
|
-
# ins_pricing/modelling/core/bayesopt/utils/metrics_and_devices.py
|
|
64
|
-
|
|
65
|
-
# Import from package-level utils (eliminates ~181 lines of duplication)
|
|
66
|
-
from ins_pricing.utils import DeviceManager, GPUMemoryManager
|
|
67
|
-
|
|
68
|
-
# NOTE: DeviceManager and GPUMemoryManager are now imported
|
|
69
|
-
# (see top of file - maintains backward compatibility via re-exports)
|
|
70
|
-
```
|
|
71
|
-
|
|
72
|
-
---
|
|
73
|
-
|
|
74
|
-
## Changes Made
|
|
75
|
-
|
|
76
|
-
### 1. Updated `metrics_and_devices.py`
|
|
77
|
-
|
|
78
|
-
**File**: [ins_pricing/modelling/core/bayesopt/utils/metrics_and_devices.py](ins_pricing/modelling/core/bayesopt/utils/metrics_and_devices.py)
|
|
79
|
-
|
|
80
|
-
**Changes**:
|
|
81
|
-
- Added import: `from ins_pricing.utils import DeviceManager, GPUMemoryManager`
|
|
82
|
-
- Removed ~181 lines of duplicate class definitions
|
|
83
|
-
- Added explanatory comment for clarity
|
|
84
|
-
|
|
85
|
-
**Line Count**:
|
|
86
|
-
- **Before**: 721 lines
|
|
87
|
-
- **After**: 540 lines
|
|
88
|
-
- **Reduction**: 181 lines (25% smaller)
|
|
89
|
-
|
|
90
|
-
### 2. Verified `__init__.py` Re-exports
|
|
91
|
-
|
|
92
|
-
**File**: [ins_pricing/modelling/core/bayesopt/utils/__init__.py](ins_pricing/modelling/core/bayesopt/utils/__init__.py)
|
|
93
|
-
|
|
94
|
-
**Status**: No changes needed ✅
|
|
95
|
-
|
|
96
|
-
The `__init__.py` already re-exports from `metrics_and_devices`:
|
|
97
|
-
```python
|
|
98
|
-
from .metrics_and_devices import (
|
|
99
|
-
get_logger,
|
|
100
|
-
MetricFactory,
|
|
101
|
-
GPUMemoryManager, # Now automatically gets package-level version
|
|
102
|
-
DeviceManager, # Now automatically gets package-level version
|
|
103
|
-
CVStrategyResolver,
|
|
104
|
-
PlotUtils,
|
|
105
|
-
...
|
|
106
|
-
)
|
|
107
|
-
```
|
|
108
|
-
|
|
109
|
-
Since `metrics_and_devices.py` now imports from `ins_pricing.utils`, the re-exports automatically use the package-level versions. **Backward compatibility maintained with zero changes**.
|
|
110
|
-
|
|
111
|
-
---
|
|
112
|
-
|
|
113
|
-
## Benefits
|
|
114
|
-
|
|
115
|
-
### 1. Code Deduplication
|
|
116
|
-
|
|
117
|
-
| Metric | Before | After | Improvement |
|
|
118
|
-
|--------|--------|-------|-------------|
|
|
119
|
-
| Duplicate code lines | 181 | 0 | 100% eliminated |
|
|
120
|
-
| Total lines (metrics_and_devices.py) | 721 | 540 | 25% reduction |
|
|
121
|
-
| Maintenance locations | 2 | 1 | 50% less work |
|
|
122
|
-
|
|
123
|
-
### 2. Single Source of Truth
|
|
124
|
-
|
|
125
|
-
**Before**: Two implementations that could drift apart
|
|
126
|
-
```
|
|
127
|
-
ins_pricing/utils/device.py <- Implementation #1
|
|
128
|
-
bayesopt/utils/metrics_and_devices.py <- Implementation #2 (duplicate)
|
|
129
|
-
```
|
|
130
|
-
|
|
131
|
-
**After**: One canonical implementation
|
|
132
|
-
```
|
|
133
|
-
ins_pricing/utils/device.py <- Single source of truth
|
|
134
|
-
bayesopt/utils/metrics_and_devices.py <- Imports from above
|
|
135
|
-
```
|
|
136
|
-
|
|
137
|
-
**Impact**:
|
|
138
|
-
- Bug fixes automatically propagate
|
|
139
|
-
- No risk of code drift
|
|
140
|
-
- Consistent behavior guaranteed
|
|
141
|
-
|
|
142
|
-
### 3. Improved Robustness
|
|
143
|
-
|
|
144
|
-
The package-level implementation is more robust:
|
|
145
|
-
- Has `TORCH_AVAILABLE` checks for environments without PyTorch
|
|
146
|
-
- Better error handling
|
|
147
|
-
- More comprehensive docstrings
|
|
148
|
-
|
|
149
|
-
### 4. Maintainability
|
|
150
|
-
|
|
151
|
-
- **Before**: Update DeviceManager? Must edit 2 files
|
|
152
|
-
- **After**: Update DeviceManager? Edit 1 file only
|
|
153
|
-
|
|
154
|
-
**Time Savings**: ~50% reduction in maintenance effort for these utilities
|
|
155
|
-
|
|
156
|
-
---
|
|
157
|
-
|
|
158
|
-
## Backward Compatibility
|
|
159
|
-
|
|
160
|
-
### All Import Patterns Continue Working ✅
|
|
161
|
-
|
|
162
|
-
```python
|
|
163
|
-
# Pattern 1: Package-level import
|
|
164
|
-
from ins_pricing.utils import DeviceManager, GPUMemoryManager
|
|
165
|
-
# ✓ Works - gets canonical implementation
|
|
166
|
-
|
|
167
|
-
# Pattern 2: BayesOpt utils import
|
|
168
|
-
from ins_pricing.modelling.core.bayesopt.utils import DeviceManager, GPUMemoryManager
|
|
169
|
-
# ✓ Works - gets same canonical implementation (via re-export)
|
|
170
|
-
|
|
171
|
-
# Pattern 3: Direct module import
|
|
172
|
-
from ins_pricing.modelling.core.bayesopt.utils.metrics_and_devices import DeviceManager
|
|
173
|
-
# ✓ Works - gets same canonical implementation (via import)
|
|
174
|
-
```
|
|
175
|
-
|
|
176
|
-
### Object Identity Verification
|
|
177
|
-
|
|
178
|
-
```python
|
|
179
|
-
from ins_pricing.utils import DeviceManager as PkgDM
|
|
180
|
-
from ins_pricing.modelling.core.bayesopt.utils import DeviceManager as BoDM
|
|
181
|
-
|
|
182
|
-
assert PkgDM is BoDM # ✓ True - SAME object (not a copy)
|
|
183
|
-
```
|
|
184
|
-
|
|
185
|
-
**Result**: Zero breaking changes, 100% backward compatibility
|
|
186
|
-
|
|
187
|
-
---
|
|
188
|
-
|
|
189
|
-
## Testing
|
|
190
|
-
|
|
191
|
-
### Syntax Validation ✅
|
|
192
|
-
|
|
193
|
-
```bash
|
|
194
|
-
python -m py_compile ins_pricing/modelling/core/bayesopt/utils/metrics_and_devices.py
|
|
195
|
-
# Result: No errors
|
|
196
|
-
```
|
|
197
|
-
|
|
198
|
-
### File Size Verification ✅
|
|
199
|
-
|
|
200
|
-
```bash
|
|
201
|
-
wc -l ins_pricing/modelling/core/bayesopt/utils/metrics_and_devices.py
|
|
202
|
-
# Before: 721 lines
|
|
203
|
-
# After: 540 lines
|
|
204
|
-
# Reduction: 181 lines (25%)
|
|
205
|
-
```
|
|
206
|
-
|
|
207
|
-
### Import Chain Verification ✅
|
|
208
|
-
|
|
209
|
-
```
|
|
210
|
-
metrics_and_devices.py imports from -> ins_pricing.utils
|
|
211
|
-
__init__.py re-exports from -> metrics_and_devices.py
|
|
212
|
-
External code imports from -> __init__.py (bayesopt.utils)
|
|
213
|
-
|
|
214
|
-
Result: External code transparently gets package-level implementation
|
|
215
|
-
```
|
|
216
|
-
|
|
217
|
-
---
|
|
218
|
-
|
|
219
|
-
## Implementation Timeline
|
|
220
|
-
|
|
221
|
-
**Total Time**: ~2 hours
|
|
222
|
-
|
|
223
|
-
1. **Analysis** (30 min): Verified duplication, checked compatibility
|
|
224
|
-
2. **Implementation** (30 min): Updated imports, removed duplicates
|
|
225
|
-
3. **Testing** (30 min): Syntax validation, compatibility checks
|
|
226
|
-
4. **Documentation** (30 min): Created this summary, updated changelog
|
|
227
|
-
|
|
228
|
-
---
|
|
229
|
-
|
|
230
|
-
## Comparison with Alternatives
|
|
231
|
-
|
|
232
|
-
### Option A: Import from Package Utils (CHOSEN) ✅
|
|
233
|
-
|
|
234
|
-
**Pros**:
|
|
235
|
-
- ✅ Minimal changes (1 file modified)
|
|
236
|
-
- ✅ Immediate deduplication
|
|
237
|
-
- ✅ Low risk (easy rollback)
|
|
238
|
-
- ✅ 100% backward compatible
|
|
239
|
-
|
|
240
|
-
**Cons**:
|
|
241
|
-
- ⚠️ Creates dependency: bayesopt → package utils (acceptable)
|
|
242
|
-
|
|
243
|
-
### Option B: Move to Common Module (NOT CHOSEN)
|
|
244
|
-
|
|
245
|
-
**Pros**:
|
|
246
|
-
- ✅ Clear separation of concerns
|
|
247
|
-
- ✅ No circular dependencies
|
|
248
|
-
|
|
249
|
-
**Cons**:
|
|
250
|
-
- ❌ More complex (new module structure)
|
|
251
|
-
- ❌ Higher effort (3-4 days)
|
|
252
|
-
- ❌ More files to maintain
|
|
253
|
-
|
|
254
|
-
### Option C: Keep as-is (NOT CHOSEN)
|
|
255
|
-
|
|
256
|
-
**Pros**:
|
|
257
|
-
- ✅ No effort required
|
|
258
|
-
|
|
259
|
-
**Cons**:
|
|
260
|
-
- ❌ Continued code duplication
|
|
261
|
-
- ❌ Maintenance burden
|
|
262
|
-
- ❌ Risk of drift
|
|
263
|
-
- ❌ Violates best practices
|
|
264
|
-
|
|
265
|
-
---
|
|
266
|
-
|
|
267
|
-
## Related Refactorings
|
|
268
|
-
|
|
269
|
-
### Phase 1: Utils Module Split ✅
|
|
270
|
-
- Split 1,503-line `utils.py` into 5 focused modules
|
|
271
|
-
- **Result**: Created `ins_pricing/utils/device.py` with canonical implementations
|
|
272
|
-
|
|
273
|
-
### Phase 2: BayesOptModel API Simplification ✅
|
|
274
|
-
- Simplified from 56 parameters to single config object
|
|
275
|
-
- **Result**: 95% reduction in parameter complexity
|
|
276
|
-
|
|
277
|
-
### Phase 3: Utils Consolidation ✅ (This Phase)
|
|
278
|
-
- Eliminated duplication between package and bayesopt utils
|
|
279
|
-
- **Result**: 181 lines removed, single source of truth established
|
|
280
|
-
|
|
281
|
-
---
|
|
282
|
-
|
|
283
|
-
## Metrics Summary
|
|
284
|
-
|
|
285
|
-
### Code Quality
|
|
286
|
-
|
|
287
|
-
| Aspect | Before Phase 3 | After Phase 3 |
|
|
288
|
-
|--------|----------------|---------------|
|
|
289
|
-
| Code duplication | 181 lines | 0 lines |
|
|
290
|
-
| metrics_and_devices.py size | 721 lines | 540 lines |
|
|
291
|
-
| Maintenance locations | 2 | 1 |
|
|
292
|
-
| Bug fix effort | 2x | 1x |
|
|
293
|
-
| Code drift risk | High | None |
|
|
294
|
-
|
|
295
|
-
### Overall Refactoring Impact (Phases 1-3)
|
|
296
|
-
|
|
297
|
-
| Metric | Original | After All Phases | Total Improvement |
|
|
298
|
-
|--------|----------|------------------|-------------------|
|
|
299
|
-
| utils.py size | 1,503 lines | 70 lines (wrapper) | 95% reduction |
|
|
300
|
-
| BayesOptModel params | 56 | 1 (config) | 98% reduction |
|
|
301
|
-
| Code duplication | 181 lines | 0 lines | 100% eliminated |
|
|
302
|
-
| Modular organization | 1 file | 5 focused modules | 5x better |
|
|
303
|
-
| Maintainability | ⭐⭐ Poor | ⭐⭐⭐⭐⭐ Excellent | 3x improvement |
|
|
304
|
-
|
|
305
|
-
---
|
|
306
|
-
|
|
307
|
-
## Files Modified
|
|
308
|
-
|
|
309
|
-
### Modified
|
|
310
|
-
- `ins_pricing/modelling/core/bayesopt/utils/metrics_and_devices.py`
|
|
311
|
-
- Added import from `ins_pricing.utils`
|
|
312
|
-
- Removed 181 lines of duplicate code
|
|
313
|
-
- Added explanatory comments
|
|
314
|
-
|
|
315
|
-
### Verified (No Changes)
|
|
316
|
-
- `ins_pricing/modelling/core/bayesopt/utils/__init__.py`
|
|
317
|
-
- Re-exports already correct
|
|
318
|
-
- Backward compatibility maintained automatically
|
|
319
|
-
|
|
320
|
-
### Created
|
|
321
|
-
- `PHASE3_REFACTORING_SUMMARY.md` (this file)
|
|
322
|
-
- `test_utils_consolidation.py` (test script)
|
|
323
|
-
- `UTILS_DUPLICATION_ANALYSIS.md` (analysis report)
|
|
324
|
-
|
|
325
|
-
---
|
|
326
|
-
|
|
327
|
-
## Rollback Plan
|
|
328
|
-
|
|
329
|
-
If issues arise:
|
|
330
|
-
|
|
331
|
-
### Quick Rollback
|
|
332
|
-
```bash
|
|
333
|
-
git checkout HEAD~1 -- ins_pricing/modelling/core/bayesopt/utils/metrics_and_devices.py
|
|
334
|
-
```
|
|
335
|
-
|
|
336
|
-
### Verification After Rollback
|
|
337
|
-
```bash
|
|
338
|
-
python -m py_compile ins_pricing/modelling/core/bayesopt/utils/metrics_and_devices.py
|
|
339
|
-
wc -l ins_pricing/modelling/core/bayesopt/utils/metrics_and_devices.py
|
|
340
|
-
# Should show 721 lines (original)
|
|
341
|
-
```
|
|
342
|
-
|
|
343
|
-
**Risk Level**: LOW - Changes are isolated, easy to revert
|
|
344
|
-
|
|
345
|
-
---
|
|
346
|
-
|
|
347
|
-
## Future Recommendations
|
|
348
|
-
|
|
349
|
-
### Completed ✅
|
|
350
|
-
- ✅ Phase 1: Split monolithic utils.py
|
|
351
|
-
- ✅ Phase 2: Simplify BayesOptModel API
|
|
352
|
-
- ✅ Phase 3: Consolidate duplicate utils
|
|
353
|
-
|
|
354
|
-
### Potential Phase 4 (Optional)
|
|
355
|
-
- 🔄 Add comprehensive unit tests for all utils modules
|
|
356
|
-
- 🔄 Consolidate duplicate CV code across 5 trainers
|
|
357
|
-
- 🔄 Create unified ParamSpaceBuilder pattern
|
|
358
|
-
- 🔄 Further documentation improvements
|
|
359
|
-
|
|
360
|
-
**Priority**: LOW - Core refactoring complete, these are enhancements
|
|
361
|
-
|
|
362
|
-
---
|
|
363
|
-
|
|
364
|
-
## Success Criteria
|
|
365
|
-
|
|
366
|
-
- ✅ **Code Deduplication**: 181 lines eliminated
|
|
367
|
-
- ✅ **Single Source of Truth**: Package-level utils are canonical
|
|
368
|
-
- ✅ **Backward Compatibility**: All imports work identically
|
|
369
|
-
- ✅ **Syntax Valid**: No Python errors
|
|
370
|
-
- ✅ **File Size Reduced**: 721 → 540 lines (25% reduction)
|
|
371
|
-
- ✅ **Documentation Complete**: Comprehensive summary created
|
|
372
|
-
- ✅ **Low Risk**: Easy rollback if needed
|
|
373
|
-
|
|
374
|
-
**Overall**: ✅ Phase 3 SUCCESS
|
|
375
|
-
|
|
376
|
-
---
|
|
377
|
-
|
|
378
|
-
## Changelog Entry
|
|
379
|
-
|
|
380
|
-
### v0.2.11 (Upcoming)
|
|
381
|
-
|
|
382
|
-
**Changed**:
|
|
383
|
-
- **Utils consolidation**: Eliminated code duplication in bayesopt utils
|
|
384
|
-
- `DeviceManager` and `GPUMemoryManager` now imported from `ins_pricing.utils`
|
|
385
|
-
- Removed 181 lines of duplicate code from `metrics_and_devices.py`
|
|
386
|
-
- File size reduced from 721 to 540 lines (25% reduction)
|
|
387
|
-
- **Impact**: Single source of truth, improved maintainability
|
|
388
|
-
- **Compatibility**: 100% backward compatible - all imports continue working
|
|
389
|
-
|
|
390
|
-
**Technical Details**:
|
|
391
|
-
- Package-level utils (`ins_pricing/utils/device.py`) are now canonical implementations
|
|
392
|
-
- BayesOpt utils import and re-export these classes automatically
|
|
393
|
-
- No breaking changes - existing code works without modification
|
|
394
|
-
|
|
395
|
-
---
|
|
396
|
-
|
|
397
|
-
## Related Documentation
|
|
398
|
-
|
|
399
|
-
- [Phase 1 Summary](REFACTORING_SUMMARY.md) - Utils module split
|
|
400
|
-
- [Phase 2 Summary](PHASE2_REFACTORING_SUMMARY.md) - BayesOptModel API simplification
|
|
401
|
-
- [Phase 3 Analysis](../../../UTILS_DUPLICATION_ANALYSIS.md) - Duplication analysis
|
|
402
|
-
- [Overall Summary](../../../REFACTORING_COMPLETE.md) - All phases combined
|
|
403
|
-
|
|
404
|
-
---
|
|
405
|
-
|
|
406
|
-
**End of Phase 3 Refactoring Summary**
|
|
@@ -1,247 +0,0 @@
|
|
|
1
|
-
# Utils Module Refactoring Summary
|
|
2
|
-
|
|
3
|
-
**Date**: 2026-01-15
|
|
4
|
-
**Status**: ✅ COMPLETED
|
|
5
|
-
**Type**: Code Organization Improvement
|
|
6
|
-
|
|
7
|
-
## Overview
|
|
8
|
-
|
|
9
|
-
Successfully split the monolithic `utils.py` (1,503 lines) into 5 focused, testable modules.
|
|
10
|
-
|
|
11
|
-
## Changes Made
|
|
12
|
-
|
|
13
|
-
### Before
|
|
14
|
-
```
|
|
15
|
-
bayesopt/
|
|
16
|
-
└── utils.py (1,503 lines - everything mixed together)
|
|
17
|
-
```
|
|
18
|
-
|
|
19
|
-
### After
|
|
20
|
-
```
|
|
21
|
-
bayesopt/
|
|
22
|
-
├── utils/ (NEW - modular package)
|
|
23
|
-
│ ├── __init__.py (86 lines) - Re-exports for compatibility
|
|
24
|
-
│ ├── constants.py (183 lines) - EPS, seeds, batch size, Tweedie
|
|
25
|
-
│ ├── io_utils.py (110 lines) - File I/O and parameter loading
|
|
26
|
-
│ ├── distributed_utils.py (163 lines) - DDP setup and CUDA management
|
|
27
|
-
│ ├── torch_trainer_mixin.py (587 lines) - PyTorch training loops
|
|
28
|
-
│ └── metrics_and_devices.py (721 lines) - Metrics, GPU, Device, CV, Plotting
|
|
29
|
-
├── utils.py (70 lines - deprecation wrapper)
|
|
30
|
-
└── utils_backup.py (1,503 lines - original backup)
|
|
31
|
-
```
|
|
32
|
-
|
|
33
|
-
## Module Breakdown
|
|
34
|
-
|
|
35
|
-
### 1. `constants.py` (183 lines)
|
|
36
|
-
**Purpose**: Core constants and simple helper functions
|
|
37
|
-
|
|
38
|
-
**Exports**:
|
|
39
|
-
- `EPS` - Numerical stability constant (1e-8)
|
|
40
|
-
- `set_global_seed()` - Set random seeds across all libraries
|
|
41
|
-
- `ensure_parent_dir()` - Create parent directories
|
|
42
|
-
- `compute_batch_size()` - Adaptive batch size computation
|
|
43
|
-
- `tweedie_loss()` - Tweedie deviance loss function
|
|
44
|
-
- `infer_factor_and_cate_list()` - Auto feature detection
|
|
45
|
-
|
|
46
|
-
### 2. `io_utils.py` (110 lines)
|
|
47
|
-
**Purpose**: File I/O and parameter loading
|
|
48
|
-
|
|
49
|
-
**Exports**:
|
|
50
|
-
- `IOUtils` class - Load params from JSON/CSV/TSV
|
|
51
|
-
- `csv_to_dict()` - Legacy function wrapper
|
|
52
|
-
|
|
53
|
-
### 3. `distributed_utils.py` (163 lines)
|
|
54
|
-
**Purpose**: Distributed training utilities
|
|
55
|
-
|
|
56
|
-
**Exports**:
|
|
57
|
-
- `DistributedUtils` - DDP setup, rank checking, cleanup
|
|
58
|
-
- `TrainingUtils` - CUDA memory management
|
|
59
|
-
- `free_cuda()` - Legacy function wrapper
|
|
60
|
-
|
|
61
|
-
### 4. `torch_trainer_mixin.py` (587 lines)
|
|
62
|
-
**Purpose**: PyTorch training infrastructure
|
|
63
|
-
|
|
64
|
-
**Exports**:
|
|
65
|
-
- `TorchTrainerMixin` - Shared methods for ResNet/FT/GNN trainers
|
|
66
|
-
- Resource profiling
|
|
67
|
-
- Memory estimation
|
|
68
|
-
- DataLoader creation
|
|
69
|
-
- Training loops with AMP/DDP
|
|
70
|
-
- Early stopping
|
|
71
|
-
- Loss curve plotting
|
|
72
|
-
|
|
73
|
-
### 5. `metrics_and_devices.py` (721 lines)
|
|
74
|
-
**Purpose**: Metrics, device management, CV, and plotting
|
|
75
|
-
|
|
76
|
-
**Exports**:
|
|
77
|
-
- `get_logger()` - Package logger
|
|
78
|
-
- `MetricFactory` - Consistent metric computation
|
|
79
|
-
- `GPUMemoryManager` - GPU memory cleanup
|
|
80
|
-
- `DeviceManager` - Device selection and model movement
|
|
81
|
-
- `CVStrategyResolver` - Cross-validation strategy selection
|
|
82
|
-
- `PlotUtils` - Lift chart plotting
|
|
83
|
-
- `_OrderedSplitter` - Time-series CV helper
|
|
84
|
-
- Legacy wrappers: `split_data()`, `plot_lift_list()`, `plot_dlift_list()`
|
|
85
|
-
|
|
86
|
-
## Backward Compatibility
|
|
87
|
-
|
|
88
|
-
### ✅ 100% Backward Compatible
|
|
89
|
-
|
|
90
|
-
All existing code continues to work without changes:
|
|
91
|
-
|
|
92
|
-
```python
|
|
93
|
-
# Old imports (still work, show deprecation warning)
|
|
94
|
-
from ins_pricing.modelling.core.bayesopt.utils import EPS, IOUtils
|
|
95
|
-
|
|
96
|
-
# New imports (preferred, no warning)
|
|
97
|
-
from ins_pricing.modelling.core.bayesopt.utils import EPS, IOUtils
|
|
98
|
-
from ins_pricing.modelling.core.bayesopt.utils.constants import EPS
|
|
99
|
-
```
|
|
100
|
-
|
|
101
|
-
The deprecation wrapper (`utils.py`) ensures all imports continue functioning.
|
|
102
|
-
|
|
103
|
-
## Files Modified
|
|
104
|
-
|
|
105
|
-
### Created
|
|
106
|
-
- `utils/__init__.py`
|
|
107
|
-
- `utils/constants.py`
|
|
108
|
-
- `utils/io_utils.py`
|
|
109
|
-
- `utils/distributed_utils.py`
|
|
110
|
-
- `utils/torch_trainer_mixin.py`
|
|
111
|
-
- `utils/metrics_and_devices.py`
|
|
112
|
-
|
|
113
|
-
### Modified
|
|
114
|
-
- `utils.py` → Deprecation wrapper (1,503 lines → 70 lines)
|
|
115
|
-
|
|
116
|
-
### Backed Up
|
|
117
|
-
- `utils_backup.py` - Original 1,503-line file preserved
|
|
118
|
-
|
|
119
|
-
### No Changes Required
|
|
120
|
-
All existing files that import from `utils` continue to work:
|
|
121
|
-
- `config_preprocess.py`
|
|
122
|
-
- `core.py`
|
|
123
|
-
- `model_plotting_mixin.py`
|
|
124
|
-
- `models/model_ft_trainer.py`
|
|
125
|
-
- `models/model_gnn.py`
|
|
126
|
-
- `models/model_resn.py`
|
|
127
|
-
- `trainers/trainer_base.py`
|
|
128
|
-
- `trainers/trainer_glm.py`
|
|
129
|
-
- `trainers/trainer_xgb.py`
|
|
130
|
-
- `trainers/trainer_gnn.py`
|
|
131
|
-
- `__init__.py`
|
|
132
|
-
|
|
133
|
-
## Benefits
|
|
134
|
-
|
|
135
|
-
### Maintainability
|
|
136
|
-
- ✅ Each module has a single, clear responsibility
|
|
137
|
-
- ✅ Files are now 100-700 lines instead of 1,503
|
|
138
|
-
- ✅ Easier to locate and modify specific functionality
|
|
139
|
-
|
|
140
|
-
### Testability
|
|
141
|
-
- ✅ Each module can be tested independently
|
|
142
|
-
- ✅ Easier to mock dependencies (e.g., mock DistributedUtils without importing all of utils)
|
|
143
|
-
- ✅ Clearer test organization
|
|
144
|
-
|
|
145
|
-
### Code Quality
|
|
146
|
-
- ✅ Better separation of concerns
|
|
147
|
-
- ✅ Reduced coupling
|
|
148
|
-
- ✅ Improved code navigation
|
|
149
|
-
- ✅ Better IDE support (autocomplete, go-to-definition)
|
|
150
|
-
|
|
151
|
-
### Future Refactoring
|
|
152
|
-
This enables:
|
|
153
|
-
1. Independent testing of each utility component
|
|
154
|
-
2. Easier dependency injection
|
|
155
|
-
3. Clearer import dependencies
|
|
156
|
-
4. Foundation for reducing BayesOptModel's 105 parameters
|
|
157
|
-
5. Easier to add new utilities without bloating existing files
|
|
158
|
-
|
|
159
|
-
## Migration Guide
|
|
160
|
-
|
|
161
|
-
### For Users
|
|
162
|
-
**No action required!** All imports continue to work.
|
|
163
|
-
|
|
164
|
-
### For Developers
|
|
165
|
-
Recommended to update imports to avoid deprecation warnings:
|
|
166
|
-
|
|
167
|
-
```python
|
|
168
|
-
# Instead of:
|
|
169
|
-
from ins_pricing.modelling.core.bayesopt.utils import EPS
|
|
170
|
-
|
|
171
|
-
# Use:
|
|
172
|
-
from ins_pricing.modelling.core.bayesopt.utils import EPS # Still works!
|
|
173
|
-
# Or for direct access:
|
|
174
|
-
from ins_pricing.modelling.core.bayesopt.utils.constants import EPS
|
|
175
|
-
```
|
|
176
|
-
|
|
177
|
-
## Deprecation Timeline
|
|
178
|
-
|
|
179
|
-
- **v0.2.9** (current): Deprecation warning shown but all imports work
|
|
180
|
-
- **v0.3.x**: Deprecation warning continues
|
|
181
|
-
- **v0.4.0**: Remove `utils.py` wrapper, require imports from `utils/` package
|
|
182
|
-
|
|
183
|
-
## Testing
|
|
184
|
-
|
|
185
|
-
### Verification Steps Completed
|
|
186
|
-
1. ✅ Created all module files
|
|
187
|
-
2. ✅ Created `__init__.py` with re-exports
|
|
188
|
-
3. ✅ Backed up original `utils.py`
|
|
189
|
-
4. ✅ Created deprecation wrapper
|
|
190
|
-
5. ✅ Verified file structure
|
|
191
|
-
6. ✅ Verified line counts match original
|
|
192
|
-
|
|
193
|
-
### To Test Manually
|
|
194
|
-
```python
|
|
195
|
-
# Test backward compatibility
|
|
196
|
-
from ins_pricing.modelling.core.bayesopt.utils import (
|
|
197
|
-
EPS, IOUtils, DistributedUtils, TorchTrainerMixin,
|
|
198
|
-
MetricFactory, GPUMemoryManager, get_logger
|
|
199
|
-
)
|
|
200
|
-
|
|
201
|
-
# Test direct imports
|
|
202
|
-
from ins_pricing.modelling.core.bayesopt.utils.constants import EPS
|
|
203
|
-
from ins_pricing.modelling.core.bayesopt.utils.io_utils import IOUtils
|
|
204
|
-
|
|
205
|
-
# Verify they're the same
|
|
206
|
-
from ins_pricing.modelling.core.bayesopt.utils import EPS as EPS1
|
|
207
|
-
from ins_pricing.modelling.core.bayesopt.utils.constants import EPS as EPS2
|
|
208
|
-
assert EPS1 == EPS2 # Should be True
|
|
209
|
-
```
|
|
210
|
-
|
|
211
|
-
## Rollback Plan
|
|
212
|
-
|
|
213
|
-
If issues arise:
|
|
214
|
-
1. Delete `utils/` directory
|
|
215
|
-
2. Rename `utils_backup.py` → `utils.py`
|
|
216
|
-
3. All imports immediately revert to original behavior
|
|
217
|
-
|
|
218
|
-
## Success Metrics
|
|
219
|
-
|
|
220
|
-
| Metric | Before | After | Improvement |
|
|
221
|
-
|--------|--------|-------|-------------|
|
|
222
|
-
| Largest file size | 1,503 lines | 721 lines | 52% reduction |
|
|
223
|
-
| Number of files | 1 | 5 | Better organization |
|
|
224
|
-
| Average file size | 1,503 lines | 351 lines | 77% reduction |
|
|
225
|
-
| Testability | Low | High | Independent modules |
|
|
226
|
-
| Maintainability | Low | High | Clear responsibilities |
|
|
227
|
-
|
|
228
|
-
## Next Steps (Future Work)
|
|
229
|
-
|
|
230
|
-
1. **Phase 2**: Reduce BayesOptModel's 105 parameters using BayesOptConfig
|
|
231
|
-
2. **Phase 3**: Add comprehensive unit tests for each module
|
|
232
|
-
3. **Phase 4**: Consolidate duplicate cross-validation code across trainers
|
|
233
|
-
|
|
234
|
-
## Related Documentation
|
|
235
|
-
|
|
236
|
-
- Main refactoring plan: `C:\Users\chenxuyi\.claude\plans\linked-percolating-sketch.md`
|
|
237
|
-
- Original code: `utils_backup.py`
|
|
238
|
-
- New modules: `utils/` directory
|
|
239
|
-
|
|
240
|
-
## Credits
|
|
241
|
-
|
|
242
|
-
- **Refactoring**: Claude Code Assistant
|
|
243
|
-
- **Date**: 2026-01-15
|
|
244
|
-
- **Duration**: ~2 hours
|
|
245
|
-
- **Lines Refactored**: 1,503
|
|
246
|
-
- **Modules Created**: 5
|
|
247
|
-
- **Backward Compatibility**: 100%
|
|
File without changes
|
|
File without changes
|