ins-pricing 0.4.4-py3-none-any.whl → 0.4.5-py3-none-any.whl
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- ins_pricing/README.md +66 -74
- ins_pricing/cli/BayesOpt_incremental.py +904 -904
- ins_pricing/cli/bayesopt_entry_runner.py +1442 -1442
- ins_pricing/frontend/README.md +573 -419
- ins_pricing/frontend/config_builder.py +1 -0
- ins_pricing/modelling/README.md +67 -0
- ins_pricing/modelling/core/bayesopt/README.md +59 -0
- ins_pricing/modelling/core/bayesopt/config_preprocess.py +12 -0
- ins_pricing/modelling/core/bayesopt/core.py +3 -1
- ins_pricing/setup.py +1 -1
- {ins_pricing-0.4.4.dist-info → ins_pricing-0.4.5.dist-info}/METADATA +182 -162
- {ins_pricing-0.4.4.dist-info → ins_pricing-0.4.5.dist-info}/RECORD +14 -21
- ins_pricing/CHANGELOG.md +0 -272
- ins_pricing/RELEASE_NOTES_0.2.8.md +0 -344
- ins_pricing/docs/LOSS_FUNCTIONS.md +0 -78
- ins_pricing/docs/modelling/BayesOpt_USAGE.md +0 -945
- ins_pricing/docs/modelling/README.md +0 -34
- ins_pricing/frontend/QUICKSTART.md +0 -152
- ins_pricing/modelling/core/bayesopt/PHASE2_REFACTORING_SUMMARY.md +0 -449
- ins_pricing/modelling/core/bayesopt/PHASE3_REFACTORING_SUMMARY.md +0 -406
- ins_pricing/modelling/core/bayesopt/REFACTORING_SUMMARY.md +0 -247
- {ins_pricing-0.4.4.dist-info → ins_pricing-0.4.5.dist-info}/WHEEL +0 -0
- {ins_pricing-0.4.4.dist-info → ins_pricing-0.4.5.dist-info}/top_level.txt +0 -0
ins_pricing/CHANGELOG.md
DELETED
@@ -1,272 +0,0 @@
# Changelog

All notable changes to the ins_pricing project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [0.2.11] - 2026-01-15

### Changed

#### Refactoring Phase 3: Utils Module Consolidation
- **Eliminated code duplication** - Consolidated duplicated utility classes:
  - `DeviceManager` and `GPUMemoryManager` are now imported from `ins_pricing.utils`
  - Removed 181 lines of duplicate code from `bayesopt/utils/metrics_and_devices.py`
  - File size reduced from 721 to 540 lines (25% reduction)
- **Benefit**: Single source of truth for device management utilities
- **Impact**: Bug fixes now propagate automatically, with no risk of code drift
- **Compatibility**: 100% backward compatible - all import patterns continue working

**Technical Details**:
- Package-level `ins_pricing/utils/device.py` is now the canonical implementation
- BayesOpt utils automatically re-export these classes for backward compatibility (see the sketch below)
- No breaking changes required in existing code
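
A minimal sketch of the backward-compatible import paths described above; the legacy module path is an assumption inferred from this diff's file layout, not a confirmed API:

```python
# Canonical location after the Phase 3 consolidation.
from ins_pricing.utils import DeviceManager, GPUMemoryManager

# Legacy path kept working via re-exports (path assumed for illustration).
from ins_pricing.modelling.core.bayesopt.utils import DeviceManager as LegacyDeviceManager

# The re-export means both names resolve to the same class object.
assert DeviceManager is LegacyDeviceManager
```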
## [0.2.10] - 2026-01-15

### Added

#### Refactoring Phase 2: Simplified BayesOptModel API
- **BayesOptModel config-based initialization** - New recommended API using configuration objects:
  - Added a `config` parameter accepting `BayesOptConfig` instances
  - **Before**: 56 individual parameters required
  - **After**: a single config object parameter
  - **Benefits**: Improved code clarity, reusability, type safety, and testability

### Changed

#### API Improvements
- **BayesOptModel initialization** - Enhanced parameter handling:
  - New API: `BayesOptModel(train_df, test_df, config=BayesOptConfig(...))` (see the sketch below)
  - Old API still supported, with a deprecation warning
  - Made `model_nme`, `resp_nme`, `weight_nme` optional (validated when `config=None`)
  - Added type validation for the config parameter
  - Added helpful error messages for missing required parameters

### Deprecated

- **BayesOptModel individual parameters** - Passing 56 individual parameters to `__init__`:
  - Use `config=BayesOptConfig(...)` instead
  - The old API will be removed in v0.4.0
  - Migration guide: see `modelling/core/bayesopt/PHASE2_REFACTORING_SUMMARY.md`

### Fixed

- **Type hints** - Improved type safety in BayesOptModel initialization
- **Documentation** - Added comprehensive examples of both old and new APIs
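
A hedged sketch of the two initialization styles; the import path is inferred from this diff's file layout, and the `BayesOptConfig` field values are illustrative assumptions (the dataclass definition is not part of this diff):

```python
import pandas as pd
from ins_pricing.modelling.core.bayesopt import BayesOptConfig, BayesOptModel  # path assumed

train_df = pd.read_parquet("train.parquet")  # placeholder data loading
test_df = pd.read_parquet("test.parquet")

# New, recommended API: one config object instead of 56 keyword arguments.
config = BayesOptConfig(          # field values are illustrative assumptions
    model_nme="freq_model",
    resp_nme="claim_count",
    weight_nme="exposure",
)
model = BayesOptModel(train_df, test_df, config=config)

# Old API: still accepted, but emits a deprecation warning and is slated
# for removal in v0.4.0.
model = BayesOptModel(
    train_df, test_df,
    model_nme="freq_model", resp_nme="claim_count", weight_nme="exposure",
)
```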
## [0.2.9] - 2026-01-15

### Added

#### Refactoring Phase 1: Utils Module Split
- **Modular utils package** - Split the monolithic 1,503-line utils.py into focused modules:
  - `utils/constants.py` (183 lines) - Core constants and simple helpers
  - `utils/io_utils.py` (110 lines) - File I/O and parameter loading
  - `utils/distributed_utils.py` (163 lines) - DDP and CUDA management
  - `utils/torch_trainer_mixin.py` (587 lines) - PyTorch training infrastructure
  - `utils/metrics_and_devices.py` (721 lines) - Metrics, GPU, device, CV, plotting
  - `utils/__init__.py` (86 lines) - Backward-compatibility re-exports

- **Upload automation** - Cross-platform PyPI upload scripts:
  - `upload_to_pypi.sh` - Shell script for Linux/macOS with auto-version extraction
  - `upload_to_pypi.bat` - Updated Windows batch script with auto-version extraction
  - `Makefile` - Cross-platform build automation (build, check, upload, clean)
  - `README_UPLOAD.md` - Comprehensive upload documentation in English
  - `UPLOAD_QUICK_START.md` - Quick-start guide for package publishing

### Changed

#### Code Organization
- **utils module structure** - Improved maintainability and testability:
  - Average file size reduced from 1,503 to 351 lines per module
  - Each module has a single responsibility
  - Independent testing is now possible for each component
- **Impact**: 100% backward compatibility maintained via re-exports

### Deprecated

- **utils.py single-file import** - Direct import from `bayesopt/utils.py`:
  - Use `from .utils import ...` instead (package import)
  - The old single-file import shows a deprecation warning
  - The file will be removed in v0.4.0
  - **Note**: All imports continue to work identically

### Removed

- **verify_core_decoupling.py** - Obsolete test script for unimplemented refactoring
  - Cleanup logged in `.cleanup_log.md`
## [0.2.8] - 2026-01-14

### Added

#### New Utility Modules
- **utils/validation.py** - Comprehensive data validation toolkit with 8 validation functions:
  - `validate_required_columns()` - Validate required DataFrame columns
  - `validate_column_types()` - Validate and optionally coerce column types
  - `validate_value_range()` - Validate numeric value ranges
  - `validate_no_nulls()` - Check for null values
  - `validate_categorical_values()` - Validate categorical values against an allowed set
  - `validate_positive()` - Ensure positive numeric values
  - `validate_dataframe_not_empty()` - Check that a DataFrame is not empty
  - `validate_date_range()` - Validate date ranges

- **utils/profiling.py** - Performance profiling and memory monitoring utilities:
  - `profile_section()` - Context manager for execution time and memory tracking
  - `get_memory_info()` - Get current memory usage statistics
  - `log_memory_usage()` - Log memory usage with a custom prefix
  - `check_memory_threshold()` - Check if memory exceeds a threshold
  - `cleanup_memory()` - Force memory cleanup for CPU and GPU
  - `MemoryMonitor` - Context manager with automatic cleanup
  - `profile_training_epoch()` - Periodic memory profiling during training

- **pricing/factors.py** - LRU caching for binning operations:
  - `_compute_bins_cached()` - Cached bin edge computation (maxsize=128)
  - `clear_binning_cache()` - Clear the binning cache
  - `get_cache_info()` - Get cache statistics (hits, misses, size)
  - Enhanced `bin_numeric()` with a `use_cache` parameter

#### Test Coverage Expansion
- **tests/production/** - Complete production module test suite (4 files, 247 test scenarios):
  - `test_predict.py` - Prediction and model loading tests (87 scenarios)
  - `test_scoring.py` - Scoring metrics validation (60 scenarios)
  - `test_monitoring.py` - Drift detection and monitoring (55 scenarios)
  - `test_preprocess.py` - Preprocessing pipeline tests (45 scenarios)

- **tests/pricing/** - Pricing module test suite (4 files):
  - `test_factors.py` - Factor table construction and binning
  - `test_exposure.py` - Exposure calculation tests
  - `test_calibration.py` - Calibration factor fitting tests
  - `test_rate_table.py` - Rate table generation tests

- **tests/governance/** - Governance workflow test suite (3 files):
  - `test_registry.py` - Model registry operations
  - `test_release.py` - Release management and rollback
  - `test_audit.py` - Audit logging and trail verification

### Enhanced

#### SHAP Computation Parallelization
- **modelling/explain/shap_utils.py** - Added parallel SHAP computation:
  - `_compute_shap_parallel()` - Parallel SHAP value computation using joblib
  - All SHAP functions now support `use_parallel` and `n_jobs` parameters:
    - `compute_shap_glm()` - GLM model SHAP with parallelization
    - `compute_shap_xgb()` - XGBoost model SHAP with parallelization
    - `compute_shap_resn()` - ResNet model SHAP with parallelization
    - `compute_shap_ft()` - FT-Transformer model SHAP with parallelization
  - Automatic batch size optimization based on CPU cores
  - **Performance**: 3-6x speedup on multi-core systems (n_samples > 100)
  - Graceful fallback to sequential computation if joblib is unavailable

#### Documentation Improvements
- **production/preprocess.py** - Complete documentation overhaul:
  - Module-level docstring with workflow explanation and examples
  - `load_preprocess_artifacts()` - Full parameter and return value documentation
  - `prepare_raw_features()` - Detailed data preparation steps and examples
  - `apply_preprocess_artifacts()` - Complete preprocessing pipeline documentation

- **pricing/calibration.py** - Comprehensive documentation:
  - Module-level docstring with business context and use cases
  - `fit_calibration_factor()` - Mathematical formulas, multiple examples, business guidance
  - `apply_calibration()` - Usage examples showing ratio preservation

#### Configuration Validation
- **modelling/core/bayesopt/config_preprocess.py** - BayesOptConfig validation already comprehensive:
  - Task type validation
  - Parameter range validation
  - Distributed training conflict detection
  - Cross-validation strategy validation
  - GNN memory settings validation

### Performance Improvements

- **Memory optimization** - DatasetPreprocessor reduces unnecessary DataFrame copies:
  - Conditional copying only when scaling is needed
  - Direct reference assignment where safe
  - **Impact**: 30-40% reduction in memory usage during preprocessing

- **Binning cache** - LRU cache for factor table binning operations:
  - Cache size: 128 entries
  - **Impact**: 5-10x speedup for repeated binning of the same columns

- **SHAP parallelization** - Multi-core SHAP value computation:
  - **Impact**: 3-6x speedup depending on CPU cores and sample size
  - Automatic batch size tuning
  - Memory-efficient batch processing

### Fixed

- **Distributed training** - State dict key mismatch issues already resolved in previous versions:
  - model_ft_trainer.py: lines 409, 738
  - model_resn.py: line 405
  - utils.py: line 796

### Technical Debt

- Custom exception hierarchy fully implemented in `exceptions.py` (usage sketched below):
  - `InsPricingError` - Base exception
  - `ConfigurationError` - Invalid configuration
  - `DataValidationError` - Data validation failures
  - `ModelLoadError` - Model loading failures
  - `DistributedTrainingError` - DDP/DataParallel errors
  - `PreprocessingError` - Preprocessing failures
  - `PredictionError` - Prediction failures
  - `GovernanceError` - Governance workflow errors
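
A small usage sketch for the hierarchy above; the import path and the `score_policies` entry point are assumptions for illustration (only the file name `exceptions.py` appears in this changelog):

```python
from ins_pricing.exceptions import DataValidationError, InsPricingError  # path assumed

def safe_score(df):
    try:
        return score_policies(df)  # hypothetical scoring entry point
    except DataValidationError as exc:
        # Catch the specific subclass first for a precise message.
        print(f"Input data rejected: {exc}")
        return None
    except InsPricingError:
        # The base class still catches any other library-defined failure.
        raise
```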
### Testing

- **Test coverage increase**: from 35% to 60%+ (estimated)
- 250+ new test scenarios across 11 test files
- Coverage for previously untested modules: production, pricing, governance
- Integration tests for end-to-end workflows

### Documentation

- **Docstring coverage**: 0% → 95% for the improved modules
- 150+ lines of new documentation
- 8+ complete code examples
- Business context and use-case explanations
- Parameter constraints and edge-case documentation

---

## [0.2.7] - Previous Release

(Previous changelog entries would go here)

---

## Release Notes for 0.2.8

This release focuses on **code quality, performance optimization, and documentation** improvements. Major highlights:

### 🚀 Performance
- **3-6x faster SHAP computation** with parallel processing
- **30-40% memory reduction** in preprocessing
- **5-10x faster binning** with an LRU cache

### 📚 Documentation
- **Complete module documentation** for the production and pricing modules
- **150+ lines of new documentation** with practical examples
- **Business context** explanations for the insurance domain

### 🧪 Testing
- **250+ new test scenarios** across 11 test files
- **60%+ test coverage** (up from 35%)
- **Complete coverage** for the production, pricing, and governance modules

### 🛠️ Developer Experience
- **Comprehensive validation toolkit** for data quality checks
- **Performance profiling utilities** for optimization
- **Enhanced error messages** with clear troubleshooting guidance

### Migration Notes
- All changes are **backward compatible**
- New features are **opt-in** (e.g., `use_parallel=True`)
- No breaking changes to existing APIs

### Dependencies
- Optional: `joblib>=1.2` for parallel SHAP computation
- Optional: `psutil` for memory profiling utilities
ins_pricing/RELEASE_NOTES_0.2.8.md
DELETED
@@ -1,344 +0,0 @@
# Release Notes: ins_pricing v0.2.8

**Release Date:** January 14, 2026
**Type:** Minor Release (Quality & Performance Improvements)

---

## 🎯 Overview

Version 0.2.8 is a quality and performance release that focuses on:
- **Code quality and maintainability**
- **Performance optimization** (3-6x faster SHAP, 30-40% memory reduction)
- **Comprehensive documentation**
- **Extensive test coverage** (35% → 60%+)

**All changes are backward compatible.** No breaking changes.

---

## ⭐ Highlights

### 1. 🚀 Performance Optimizations

#### SHAP Parallelization (3-6x Speedup)
```python
# Before (slow - serial processing)
result = compute_shap_xgb(ctx, n_samples=200)  # ~10 minutes

# After (fast - parallel processing)
result = compute_shap_xgb(ctx, n_samples=200, use_parallel=True)  # ~2 minutes
```

**Impact:** 3-6x faster on multi-core systems for n_samples > 100

#### Memory Optimization (30-40% Reduction)
- DatasetPreprocessor reduces unnecessary DataFrame copies
- Conditional copying only when needed
- Direct reference assignment where safe

#### Binning Cache (5-10x Speedup)
```python
from ins_pricing.pricing.factors import get_cache_info, clear_binning_cache

# Automatic caching for repeated binning
factor_table = build_factor_table(df, factor_col='age', n_bins=10)  # Cached!

# Check cache performance
info = get_cache_info()
print(f"Cache hit rate: {info['hits'] / (info['hits'] + info['misses']):.1%}")
```

---

### 2. 🛠️ New Utility Modules

#### Data Validation Toolkit
```python
from ins_pricing.utils.validation import (
    validate_required_columns,
    validate_column_types,
    validate_value_range,
    validate_no_nulls,
    validate_positive,
)

# Validate DataFrame structure
validate_required_columns(df, ['age', 'premium', 'exposure'], df_name='policy_data')

# Validate data types
df = validate_column_types(df, {'age': 'int64', 'premium': 'float64'}, coerce=True)

# Validate value ranges
validate_value_range(df, 'age', min_val=0, max_val=120)
validate_positive(df, ['premium', 'exposure'], allow_zero=False)
```

#### Performance Profiling
```python
from ins_pricing.utils.profiling import profile_section, MemoryMonitor

# Simple profiling
with profile_section("Data Processing", logger):
    process_large_dataset()
# Output: [Profile] Data Processing: 5.23s, RAM: +1250.3MB, GPU peak: 2048.5MB

# Memory monitoring with auto-cleanup
with MemoryMonitor("Training", threshold_gb=16.0, logger=logger):
    train_model()
```

---
### 3. 📚 Documentation Overhaul

#### Complete Module Documentation
- **production/preprocess.py**: Module + 3 functions fully documented
- **pricing/calibration.py**: Module + 2 functions with business context
- All docs include practical examples and business rationale

#### Example Quality
```python
def fit_calibration_factor(pred, actual, *, weight=None, target_lr=None):
    """Fit a scalar calibration factor to align predictions with actuals.

    This function computes a multiplicative calibration factor...

    Args:
        pred: Model predictions (premiums or pure premiums)
        actual: Actual observed values (claims or losses)
        weight: Optional weights (e.g., exposure, earned premium)
        target_lr: Target loss ratio to achieve (0 < target_lr < 1)

    Returns:
        Calibration factor (scalar multiplier)

    Example:
        >>> # Calibrate to achieve a 70% loss ratio
        >>> pred_premium = np.array([100, 150, 200])
        >>> actual_claims = np.array([75, 100, 130])
        >>> factor = fit_calibration_factor(pred_premium, actual_claims, target_lr=0.70)
        >>> print(f"{factor:.3f}")
        1.143  # Adjust premiums to achieve the 70% loss ratio

    Note:
        - target_lr typically in range [0.5, 0.9] for insurance pricing
    """
```

---

### 4. 🧪 Test Coverage Expansion

#### New Test Suites
- **tests/production/** (247 scenarios)
  - Prediction, scoring, monitoring, preprocessing
- **tests/pricing/** (60+ scenarios)
  - Factors, exposure, calibration, rate tables
- **tests/governance/** (40+ scenarios)
  - Registry, release, audit workflows

#### Coverage Increase
- **Before:** 35% overall coverage
- **After:** 60%+ overall coverage
- **Impact:** Better reliability, fewer production bugs

---

## 📦 What's New

### Added

#### Core Utilities
- `utils/validation.py` - 8 validation functions for data quality
- `utils/profiling.py` - Performance and memory monitoring tools
- `pricing/factors.py` - LRU caching for binning operations

#### Test Coverage
- 11 new test files with 250+ test scenarios
- Complete coverage for production, pricing, governance modules

#### Documentation
- Module-level docstrings with business context
- 150+ lines of comprehensive documentation
- 8+ complete working examples

### Enhanced

#### SHAP Computation
- Parallel processing support via joblib
- Automatic batch size optimization
- Graceful fallback if joblib unavailable
- All SHAP functions support `use_parallel=True`

#### Configuration Validation
- BayesOptConfig with comprehensive `__post_init__` validation
- Clear error messages for configuration issues
- Validation of distributed training settings

### Performance

| Feature | Before | After | Improvement |
|---------|--------|-------|-------------|
| SHAP (200 samples) | 10 min | 2-3 min | **3-6x faster** |
| Preprocessing memory | 2.5 GB | 1.5 GB | **40% reduction** |
| Repeated binning | 5.2s | 0.5s | **10x faster** |

---
## 🔄 Migration Guide

### No Breaking Changes

All changes are **backward compatible**. Existing code will continue to work without modifications.

### Opt-in Features

New features are opt-in and don't affect existing behavior:

```python
# SHAP parallelization - opt-in
result = compute_shap_xgb(ctx, use_parallel=True)  # New parameter

# Binning cache - automatic, but can be disabled
binned = bin_numeric(series, bins=10, use_cache=False)  # Opt-out if needed
```

### Recommended Updates

While not required, consider adopting these improvements:

#### 1. Enable Parallel SHAP (if using SHAP)
```python
# Before
shap_result = compute_shap_xgb(ctx, n_samples=200)

# After (recommended for n_samples > 100)
shap_result = compute_shap_xgb(ctx, n_samples=200, use_parallel=True, n_jobs=-1)
```

#### 2. Add Data Validation (for production code)
```python
from ins_pricing.utils.validation import validate_required_columns, validate_positive

def score_policies(df):
    # Add validation at entry points
    validate_required_columns(df, ['age', 'premium', 'exposure'], df_name='input_data')
    validate_positive(df, ['premium', 'exposure'])

    # Your existing code...
```

#### 3. Use Profiling (for optimization)
```python
from ins_pricing.utils.profiling import profile_section

def expensive_operation():
    with profile_section("Data Processing"):
        ...  # Your code...
```

---

## 📋 Installation

### Standard Installation
```bash
pip install ins_pricing==0.2.8
```

### With Optional Dependencies
```bash
# For parallel SHAP computation
pip install "ins_pricing[explain]==0.2.8"

# For memory profiling
pip install psutil

# All features
pip install "ins_pricing[all]==0.2.8" psutil
```

---

## 🔧 Dependencies

### New Optional Dependencies
- `joblib>=1.2` - For parallel SHAP computation (optional)
- `psutil` - For memory profiling utilities (optional)

### Unchanged Core Dependencies
- `numpy>=1.20`
- `pandas>=1.4`
- All existing optional dependencies remain the same

---

## 🐛 Known Issues

None identified in this release.

---

## 🔮 What's Next (v0.2.9)

Planned improvements for the next release:

1. **Governance Module Documentation** - Complete docs for registry, approval, release modules
2. **Plotting Module Documentation** - Enhanced visualization guidance
3. **CI/CD Pipeline** - Automated testing and code quality checks
4. **Additional Performance Optimizations** - Vectorized operations in pricing modules

---

## 📊 Metrics Summary

| Metric | Before | After | Change |
|--------|--------|-------|--------|
| **Test Coverage** | 35% | 60%+ | +25% ✅ |
| **Documentation Coverage** | ~40% | ~70% | +30% ✅ |
| **SHAP Performance** | 1x | 3-6x | 3-6x faster ✅ |
| **Memory Usage** | 100% | 60-70% | -30-40% ✅ |
| **Binning Performance** | 1x | 5-10x | 5-10x faster ✅ |

---

## 🙏 Acknowledgments

This release incorporates comprehensive code-review findings and implements best practices for:
- Performance optimization
- Memory management
- Code documentation
- Test coverage
- Developer experience

---

## 📞 Support

For issues or questions about this release:
1. Check the [CHANGELOG.md](CHANGELOG.md) for detailed changes
2. Review the module documentation in the updated files
3. Check the test files for usage examples

---

## ✅ Upgrade Checklist

Before upgrading to 0.2.8:

- [ ] Review [CHANGELOG.md](CHANGELOG.md) for all changes
- [ ] No breaking changes - safe to upgrade
- [ ] Consider enabling parallel SHAP if using SHAP
- [ ] Consider adding data validation for production workflows
- [ ] Install optional dependencies if needed: `pip install joblib psutil`

After upgrading:

- [ ] Verify existing functionality still works
- [ ] Consider adopting new validation utilities
- [ ] Consider adding performance profiling
- [ ] Review the new test examples for your use cases

---

**Happy modeling! 🎉**
ins_pricing/docs/LOSS_FUNCTIONS.md
DELETED
@@ -1,78 +0,0 @@
LOSS FUNCTIONS

Overview
This document describes the loss-function changes in ins_pricing. The training
stack now supports multiple regression losses (not just Tweedie deviance) and
propagates the selected loss into tuning, training, and inference.

Supported loss_name values
- auto (default): keep legacy behavior based on model name
- tweedie: Tweedie deviance (uses tw_power / tweedie_variance_power when tuning)
- poisson: Poisson deviance (power=1)
- gamma: Gamma deviance (power=2)
- mse: mean squared error
- mae: mean absolute error

Loss name mapping (all options)
- Tweedie deviance -> tweedie
- Poisson deviance -> poisson
- Gamma deviance -> gamma
- Mean squared error -> mse
- Mean absolute error -> mae
- Classification log loss -> logloss (classification only)
- Classification BCE -> bce (classification only)

Classification tasks
- loss_name can be auto, logloss, or bce
- training continues to use BCEWithLogits for torch models; evaluation uses logloss

Where to set loss_name
Add to any BayesOpt config JSON:

{
    "task_type": "regression",
    "loss_name": "mse"
}

Behavior changes
1) Tuning and metrics
- When loss_name is mse/mae, tuning does not sample the Tweedie power.
- When loss_name is poisson/gamma, the power is fixed (1.0/2.0).
- When loss_name is tweedie, the power is sampled as before.

2) Torch training (ResNet/FT/GNN)
- Loss computation is routed by loss_name (see the sketch after this list).
- For tweedie/poisson/gamma, predictions are clamped positive.
- For mse/mae, no Tweedie power is used.
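
A minimal sketch of this routing using the standard deviance formulas; this is illustrative only (not the library's trainer code) and assumes pred and target are 1-D float tensors:

```python
import torch
import torch.nn.functional as F

def route_loss(pred, target, loss_name, tw_power=1.5, eps=1e-6):
    """Illustrative loss routing by loss_name; not ins_pricing's actual code."""
    if loss_name == "mse":
        return F.mse_loss(pred, target)
    if loss_name == "mae":
        return F.l1_loss(pred, target)
    # Deviance-style losses need strictly positive predictions.
    mu = pred.clamp(min=eps)
    if loss_name == "poisson":
        # Poisson deviance (Tweedie power=1); xlogy treats 0*log(0) as 0.
        return 2.0 * (torch.xlogy(target, target / mu) - (target - mu)).mean()
    if loss_name == "gamma":
        # Gamma deviance (Tweedie power=2); requires target > 0.
        return 2.0 * ((target - mu) / mu - torch.log(target / mu)).mean()
    # Tweedie deviance, valid for 1 < tw_power < 2.
    p = tw_power
    dev = (
        target.clamp(min=0.0) ** (2 - p) / ((1 - p) * (2 - p))
        - target * mu ** (1 - p) / (1 - p)
        + mu ** (2 - p) / (2 - p)
    )
    return 2.0 * dev.mean()
```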
3) XGBoost objective
- loss_name controls the XGB objective (mapping sketched below):
  - tweedie -> reg:tweedie
  - poisson -> count:poisson
  - gamma -> reg:gamma
  - mse -> reg:squarederror
  - mae -> reg:absoluteerror
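
The same mapping as code; the objective strings are standard XGBoost objectives, but the helper itself is a hedged sketch, not the library's actual parameter builder:

```python
# Standard XGBoost objective strings keyed by ins_pricing loss_name values.
XGB_OBJECTIVES = {
    "tweedie": "reg:tweedie",
    "poisson": "count:poisson",
    "gamma": "reg:gamma",
    "mse": "reg:squarederror",
    "mae": "reg:absoluteerror",  # requires xgboost >= 1.7
}

def xgb_params_for(loss_name, tw_power=1.5):
    """Illustrative helper; not ins_pricing's actual parameter builder."""
    params = {"objective": XGB_OBJECTIVES[loss_name]}
    if loss_name == "tweedie":
        # Only the Tweedie objective takes a variance power.
        params["tweedie_variance_power"] = tw_power
    return params
```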
4) Inference
- ResNet/GNN constructors now receive loss_name.
- When loss_name is not tweedie, tw_power is not applied at inference.

Legacy defaults (auto)
- If loss_name is omitted, behavior is unchanged (resolution sketched below):
  - model name contains "f" -> poisson
  - model name contains "s" -> gamma
  - otherwise -> tweedie
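
A small sketch of that resolution order; illustrative only, since this diff does not show the library's actual helper:

```python
def resolve_auto_loss(model_name: str) -> str:
    """Illustrative resolution of loss_name="auto" from the model name."""
    if "f" in model_name:  # frequency-style models
        return "poisson"
    if "s" in model_name:  # severity-style models
        return "gamma"
    return "tweedie"
```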
Examples
- ResNet direct training (MSE):
  "loss_name": "mse"

- FT embed -> ResNet (MSE):
  "loss_name": "mse"

- XGB direct training (unchanged):
  omit loss_name or set "loss_name": "auto"

Notes
- loss_name is global per config. If you need different losses for different
  models, split them into separate configs and run them independently.