optixcel 1.0.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
optixcel-1.0.0/LICENSE ADDED
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2024 Muhammad Wajdan Jamal

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
@@ -0,0 +1,427 @@
# 🧠 OPTICAL MIND - Complete Research System Documentation

## Overview

OpticalMind is an intelligent, self-analyzing machine learning system for predicting the optical properties of perovskites. It combines modern ML with comprehensive diagnostics and explainability.

---

## Core Components

### 1. Data Preprocessor
- ✅ Automatic data structure analysis
- ✅ Missing value handling (statistical vs domain-aware)
- ✅ Robust normalization (RobustScaler)
- ✅ Constant feature detection

**Key Features:**
```python
preprocessor = DataPreprocessor(verbose=True)
analysis = preprocessor.analyze_data(X)
X_clean, y_clean = preprocessor.handle_missing_values(X, y)
X_norm = preprocessor.normalize(X_clean, fit=True)
```
### 2. Diagnostics Engine
- ✅ Statistical outlier detection (Z-score + IQR)
- ✅ Feature inconsistency analysis
- ✅ Overfitting detection (train vs validation gap)
- ✅ Feature correlation analysis

**Key Features:**
```python
diagnostics = DiagnosticsEngine(verbose=True)
outliers = diagnostics.detect_statistical_outliers(X, y, threshold=3.0)
feature_issues = diagnostics.detect_feature_inconsistencies(X)
overfit_analysis = diagnostics.detect_overfitting(model, X_train, y_train, X_val, y_val)
correlations = diagnostics.analyze_correlations(X)
```
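The Z-score + IQR combination can be sketched in a few lines. This is an illustrative reimplementation, not the package's actual `detect_statistical_outliers` internals:

```python
import numpy as np

def detect_outliers(y, z_threshold=3.0, iqr_factor=1.5):
    """Flag indices failing either the Z-score test or the IQR fence test."""
    y = np.asarray(y, dtype=float)
    z = np.abs((y - y.mean()) / y.std())        # standard scores
    q1, q3 = np.percentile(y, [25, 75])
    lo = q1 - iqr_factor * (q3 - q1)            # lower Tukey fence
    hi = q3 + iqr_factor * (q3 - q1)            # upper Tukey fence
    return np.where((z > z_threshold) | (y < lo) | (y > hi))[0]

y = np.array([1.0, 1.1, 0.9, 1.0, 1.05, 10.0])
print(detect_outliers(y))  # the IQR fence flags index 5
```

Combining the two rules matters: in this tiny sample the extreme value inflates the standard deviation so much that its Z-score stays below 3, while the IQR fence still catches it.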
### 3. Feature Engineer
- ✅ Intelligent feature selection (SelectKBest)
- ✅ Polynomial feature generation
- ✅ Feature interaction creation

**Key Features:**
```python
engineer = FeatureEngineer(verbose=True)
X_selected = engineer.select_features(X, y, n_features=50)
X_engineered = engineer.create_polynomial_features(X, degree=2)
```
### 4. Comprehensive Evaluator
Calculates 8+ evaluation metrics:
- **R²**: Coefficient of determination
- **NSE**: Nash-Sutcliffe Efficiency
- **RMSE**: Root Mean Squared Error
- **MAE**: Mean Absolute Error
- **MAPE**: Mean Absolute Percentage Error
- **VAR%**: Variance Explained (%)
- **PI**: Performance Index
- **a10/a20**: % of predictions within 10%/20% relative error

**Usage:**
```python
evaluator = ComprehensiveEvaluator()
metrics = evaluator.calculate_metrics(y_true, y_pred)
formatted = evaluator.format_metrics(metrics, phase="Test")
print(formatted)
```
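The two less common metrics above can be computed from their standard definitions (NSE as 1 − SSE/SST about the observed mean; aN as the share of predictions within N% relative error). This is a reference sketch, not the package's exact `calculate_metrics` code; PI has no single standard formula, so it is omitted:

```python
import numpy as np

def nse(y_true, y_pred):
    """Nash-Sutcliffe Efficiency: 1 - SSE / variance about the observed mean."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    sse = np.sum((y_true - y_pred) ** 2)
    sst = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - sse / sst

def a_within(y_true, y_pred, pct):
    """Percentage of predictions with relative error of at most `pct` percent."""
    rel_err = np.abs((np.asarray(y_pred) - np.asarray(y_true)) / np.asarray(y_true))
    return 100.0 * np.mean(rel_err <= pct / 100.0)

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.05, 1.9, 3.5, 4.0])
print(round(nse(y_true, y_pred), 4))  # 0.9475
print(a_within(y_true, y_pred, 10))   # 75.0 (this is a10)
```

Note that for a single prediction vector, NSE and R² computed this way coincide, which is why the two rows match in the results tables below.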
### 5. Explainability Engine
- ✅ Feature importance calculation
- ✅ SHAP-ready architecture
- ✅ Prediction explanation generation

**Usage:**
```python
explainer = ExplainabilityEngine(verbose=True)
importance = explainer.calculate_feature_importance(model, X)
explanation = explainer.explain_prediction(features, feature_names)
```

---
## Main OpticalMind Class

### Initialization
```python
from optical_mind import OpticalMind

mind = OpticalMind(
    verbose=True,      # Print all diagnostics
    n_features=50,     # Select top 50 features
    random_state=42    # For reproducibility
)
```

### Training with Full Diagnostics
```python
report = mind.fit(
    X,                    # Input features
    y,                    # Target values
    test_size=0.2,        # 20% test split
    validation_size=0.1   # 10% of training for validation
)
```
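The `test_size`/`validation_size` semantics (the validation set is carved out of the training portion, not the whole dataset) can be reproduced with a two-stage split. A minimal NumPy sketch under that interpretation; the class's internal splitting logic may differ:

```python
import numpy as np

def three_way_split(X, y, test_size=0.2, validation_size=0.1, random_state=42):
    """Shuffle once, carve off the test set, then take `validation_size`
    of the remaining training rows as the validation set."""
    rng = np.random.default_rng(random_state)
    idx = rng.permutation(len(X))
    n_test = int(len(X) * test_size)
    test_idx, rest = idx[:n_test], idx[n_test:]
    n_val = int(len(rest) * validation_size)
    val_idx, train_idx = rest[:n_val], rest[n_val:]
    return (X[train_idx], y[train_idx]), (X[val_idx], y[val_idx]), (X[test_idx], y[test_idx])

X, y = np.arange(100).reshape(100, 1), np.arange(100)
(X_tr, y_tr), (X_val, y_val), (X_te, y_te) = three_way_split(X, y)
print(len(X_tr), len(X_val), len(X_te))  # 72 8 20
```

So with the defaults, 100 samples become 20 test, 8 validation (10% of the remaining 80), and 72 training rows.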
**Training Phases (Automatic):**
1. Data analysis and characterization
2. Preprocessing and normalization
3. Comprehensive diagnostics
4. Feature engineering and selection
5. Ensemble model training (XGBoost + Random Forest)
6. Overfitting detection
7. Evaluation with 8+ metrics
8. Feature importance calculation
9. Complete diagnostics summary
### Prediction
```python
# Basic prediction
predictions = mind.predict(X_test)

# Prediction with uncertainty
predictions, uncertainties = mind.predict(
    X_test,
    return_uncertainty=True
)
```
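One plausible mechanism for `return_uncertainty=True` with an XGBoost + Random Forest ensemble is to average the members' predictions and report their disagreement as the uncertainty. A sketch with stand-in prediction arrays; the package's actual combination rule may differ:

```python
import numpy as np

def ensemble_predict(member_preds):
    """Point estimate = mean across members; uncertainty = their disagreement."""
    stacked = np.vstack(member_preds)      # shape (n_models, n_samples)
    return stacked.mean(axis=0), stacked.std(axis=0)

xgb_pred = np.array([0.10, 0.50, 0.90])    # stand-in XGBoost predictions
rf_pred = np.array([0.12, 0.46, 0.90])     # stand-in Random Forest predictions
pred, unc = ensemble_predict([xgb_pred, rf_pred])
print(pred, unc)
```

Where both models agree (the third sample), the reported uncertainty drops to zero; disagreement widens it.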
130
+
131
+ ### Getting Results
132
+ ```python
133
+ # Diagnostics summary
134
+ summary = mind.get_diagnostics_summary()
135
+ print(summary)
136
+
137
+ # Full training report
138
+ report = mind.training_report
139
+ print(f"Test R²: {report['test_metrics']['r2']:.4f}")
140
+
141
+ # Save model
142
+ mind.save('optical_mind_model.pkl')
143
+
144
+ # Load model
145
+ loaded_mind = OpticalMind.load('optical_mind_model.pkl')
146
+ ```
147
+
148
+ ---
149
+
## Complete Example

```python
import pandas as pd
from optical_mind import OpticalMind

# Load data
df = pd.read_csv('final_170K_complete_optical.csv')

# Prepare
X = df.drop(columns=['target']).values
y = df['target'].values

# Create intelligent system
mind = OpticalMind(verbose=True, n_features=50)

# Train with complete diagnostics
report = mind.fit(X, y)

# Make predictions
predictions = mind.predict(X[:100])
predictions_unc, uncertainties = mind.predict(X[:100], return_uncertainty=True)

# Get summary
print(mind.get_diagnostics_summary())

# Save for later use
mind.save('perovskite_predictor.pkl')
```

---
## Diagnostics Output Explanation

### Data Analysis Phase
Shows:
- Number of samples and features
- Memory usage
- Constant and near-constant features

### Diagnostics Phase
Shows:
- **Outliers**: Number detected (% of total)
- **Feature Issues**:
  - Zero variance features
  - Highly skewed features
  - High kurtosis features
- **Correlation Analysis**: Redundant feature pairs

### Overfitting Analysis
- Train R² vs Validation R²
- Gap between them
- Severity assessment: none / mild / moderate / severe
- Recommendations if detected
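A gap-based severity assessment could look like the following. The thresholds here are illustrative guesses, not the package's actual cutoffs:

```python
def overfit_severity(train_r2, val_r2):
    """Bucket the train/validation R² gap. Thresholds are illustrative only."""
    gap = train_r2 - val_r2
    if gap < 0.02:
        label = "none"
    elif gap < 0.05:
        label = "mild"
    elif gap < 0.10:
        label = "moderate"
    else:
        label = "severe"
    return gap, label

gap, label = overfit_severity(0.9970, 0.9971)
print(f"gap={gap:+.4f} severity={label}")  # a negative gap counts as no overfitting
```

A negative gap (validation R² above train R², as in the test run reported below) simply means no overfitting is detected.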
### Performance Metrics
All 8+ metrics for Train/Validation/Test:
```
R² Score ................... 0.997019 (best possible: 1.0)
NSE ........................ 0.997019 (1.0 = perfect)
RMSE ....................... 0.000012 (lower is better)
MAE ........................ 0.000009 (lower is better)
Variance Explained (%) ..... 92.41%   (higher is better)
PI ......................... 0.935133 (0-1, higher is better)
a10 (err ≤ 10%) ............ 60.25%   (% within 10% error)
a20 (err ≤ 20%) ............ 64.30%   (% within 20% error)
```

---
## Architecture Diagram

```
INPUT DATA
    ↓
[Data Preprocessor]
    ├─ Analyze structure
    ├─ Handle missing values
    ├─ Normalize
    └─ Remove problematic rows
    ↓
[Diagnostics Engine]
    ├─ Detect outliers
    ├─ Find feature inconsistencies
    ├─ Analyze correlations
    └─ Generate diagnostics report
    ↓
[Feature Engineering]
    ├─ Select top K features
    ├─ Create interactions (optional)
    └─ Generate polynomial features (optional)
    ↓
[Model Training - ENSEMBLE]
    ├─ XGBoost (gradient boosting)
    ├─ Random Forest (tree-based)
    └─ Combined predictions
    ↓
[Evaluation]
    ├─ Calculate 8+ metrics
    ├─ Detect overfitting
    └─ Generate performance report
    ↓
[Explainability]
    ├─ Feature importance
    ├─ SHAP analysis (ready)
    └─ Prediction explanations
    ↓
OUTPUT: Predictions + Diagnostics + Explanations
```

---
## File Structure

```
optical_mind_core.py     # Core modules (preprocessing, diagnostics, etc.)
optical_mind.py          # Main OpticalMind class
optical_mind_demo.py     # Demo script with full example
OPTICAL_MIND_README.md   # This documentation
```

---
## Key Results from Test Run (10K samples)

| Metric   | Train    | Validation | Test     |
|----------|----------|------------|----------|
| **R²**   | 0.9970   | 0.9971     | 0.9970   |
| **NSE**  | 0.9970   | 0.9971     | 0.9970   |
| **RMSE** | 0.000012 | 0.000012   | 0.000012 |
| **MAE**  | 0.000009 | 0.000009   | 0.000009 |
| **a10**  | 62.26%   | 58.50%     | 60.25%   |
| **a20**  | 66.24%   | 62.75%     | 64.30%   |

**Diagnostics Summary:**
- Outliers detected: 1,145 (15.9%)
- Feature issues: 36 total
- Redundant pairs: 232
- Overfitting: **NONE** (gap = -0.0006)
- Top feature importance: 0.248

---
## Advanced Usage

### Custom Hyperparameters
```python
mind = OpticalMind(
    verbose=True,
    n_features=75,     # Use more features
    random_state=123
)
```

### Accessing Raw Results
```python
# Training metrics
train_metrics = mind.training_report['train_metrics']
val_metrics = mind.training_report['val_metrics']
test_metrics = mind.training_report['test_metrics']

# Overfitting analysis
overfit = mind.training_report['overfit_analysis']
print(f"Overfitting severity: {overfit['severity']}")

# Feature importance
feature_imp = mind.training_report['feature_importance']
```

### Diagnostics Details
```python
# Raw diagnostics
outliers = mind.diagnostics_report['outliers']
feature_issues = mind.diagnostics_report['feature_issues']
correlations = mind.diagnostics_report['correlations']

# Process as needed
outlier_indices = outliers['indices']
redundant_pairs = correlations['redundant_pairs']
```

---
## Requirements

```
numpy >= 1.20
pandas >= 1.2
scikit-learn >= 0.24
xgboost >= 1.5
scipy >= 1.6
joblib >= 1.0
```

Install with:
```bash
pip install numpy pandas scikit-learn xgboost scipy joblib
```

---
## Performance Tips

1. **Large Datasets**: Use sampling for faster iteration
   ```python
   sample_size = 50000
   X_sample = X[:sample_size]
   y_sample = y[:sample_size]
   mind.fit(X_sample, y_sample)
   ```

2. **Fewer Features**: Reduces training time
   ```python
   mind = OpticalMind(n_features=30)  # vs. the default 50
   ```

3. **Parallel Processing**: Automatically uses all CPUs
   - No configuration needed

---
## Troubleshooting

**Issue**: Memory error with the full dataset
- **Solution**: Train on a sample or reduce `n_features`

**Issue**: Very high MAPE but low RMSE
- **Solution**: The targets contain values close to zero, which inflates relative error; MAPE is unreliable there, so prefer RMSE/MAE
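The MAPE caveat is easy to demonstrate: when targets hug zero, a tiny absolute error becomes a huge relative one. A small illustration with made-up numbers:

```python
import numpy as np

# Two near-zero targets plus one normal-scale value
y_true = np.array([0.001, 0.002, 1.000])
y_pred = np.array([0.002, 0.001, 1.010])

rmse = np.sqrt(np.mean((y_pred - y_true) ** 2))             # tiny absolute error
mape = 100.0 * np.mean(np.abs((y_pred - y_true) / y_true))  # huge relative error
print(f"RMSE={rmse:.5f}  MAPE={mape:.1f}%")
```

Here RMSE is under 0.01 while MAPE exceeds 50%, driven entirely by the two near-zero targets.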
**Issue**: Overfitting detected
- **Solution**: The system will recommend:
  - Use less complex models
  - Increase regularization
  - Add more training data

**Issue**: Poor predictions
- **Solution**: Check diagnostics:
  ```python
  print(mind.get_diagnostics_summary())
  ```
  - High outliers? → Review data quality
  - Many feature issues? → Domain knowledge needed
  - High redundancy? → Correlations too strong

---
## License

MIT License - free for research and commercial use.

---

## Citation

If you use OpticalMind in your research, please cite:

```
OpticalMind: Intelligent ML System for Optical Property Prediction
Author: Muhammad Wajdan Jamal
Version: 1.0.0
Year: 2026
```

---

## Support & Feedback

For issues, suggestions, or improvements:
1. Check this documentation
2. Review the demo script
3. Check the diagnostic output
4. Iterate based on the recommendations

---

**Last Updated:** April 5, 2026
**Status:** ✅ Production Ready
**Quality:** Research-Grade