oikan 0.0.3.2__tar.gz → 0.0.3.4__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: oikan
-Version: 0.0.3.2
+Version: 0.0.3.4
 Summary: OIKAN: Neuro-Symbolic ML for Scientific Discovery
 Author: Arman Zhalgasbayev
 License: MIT
@@ -14,6 +14,7 @@ Requires-Dist: torch
 Requires-Dist: numpy
 Requires-Dist: scikit-learn
 Requires-Dist: tqdm
+Requires-Dist: sympy
 Dynamic: license-file
 
 <!-- logo in the center -->
@@ -57,7 +58,7 @@ OIKAN implements a modern interpretation of the Kolmogorov-Arnold Representation
 
 2. **Neural Implementation**: OIKAN uses a specialized architecture combining:
 - Feature transformation layers with interpretable basis functions
-- Symbolic regression for formula extraction
+- Symbolic regression for formula extraction (ElasticNet-based)
 - Automatic pruning of insignificant terms
 
 ```python
@@ -76,15 +77,19 @@ OIKAN implements a modern interpretation of the Kolmogorov-Arnold Representation
 SYMBOLIC_FUNCTIONS = {
     'linear': 'x', # Direct relationships
     'quadratic': 'x^2', # Non-linear patterns
+    'cubic': 'x^3', # Higher-order relationships
     'interaction': 'x_i x_j', # Feature interactions
-    'higher_order': 'x^n' # Polynomial terms
+    'higher_order': 'x^n', # Polynomial terms
+    'trigonometric': 'sin(x)', # Trigonometric functions
+    'exponential': 'exp(x)', # Exponential growth
+    'logarithmic': 'log(x)' # Logarithmic relationships
 }
 ```
 
 4. **Formula Extraction Process**:
 - Train neural network on raw data
 - Generate augmented samples for better coverage
-- Perform L1-regularized symbolic regression
+- Perform L1-regularized symbolic regression (alpha)
 - Prune terms with coefficients below threshold
 - Export human-readable mathematical expressions
 
@@ -114,13 +119,14 @@ model = OIKANRegressor(
     hidden_sizes=[32, 32], # Hidden layer sizes
     activation='relu', # Activation function (other options: 'tanh', 'leaky_relu', 'elu', 'swish', 'gelu')
     augmentation_factor=5, # Augmentation factor for data generation
-    polynomial_degree=2, # Degree of polynomial basis functions
-    alpha=0.1, # L1 regularization strength
+    alpha=0.1, # L1 regularization strength (Symbolic regression)
     sigma=0.1, # Standard deviation of Gaussian noise for data augmentation
+    top_k=5, # Number of top features to select (Symbolic regression)
     epochs=100, # Number of training epochs
     lr=0.001, # Learning rate
     batch_size=32, # Batch size for training
-    verbose=True # Verbose output during training
+    verbose=True, # Verbose output during training
+    evaluate_nn=True # Validate neural network performance before full process
 )
 
 # Fit the model
@@ -134,7 +140,7 @@ mse = mean_squared_error(y_test, y_pred)
 print("Mean Squared Error:", mse)
 
 # Get symbolic formula
-formula = model.get_formula()
+formula = model.get_formula() # default: type='original' -> returns all formula without pruning | other options: 'sympied' -> simplified formula using sympy; 'latex' -> LaTeX format
 print("Symbolic Formula:", formula)
 
 # Get feature importances
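With `type='original'`, the `get_formula` implementation in this release renders each term as `coefficient*basis_function` with six decimals and joins the terms with `" + "`. It skips coefficients that are exactly zero and falls back to `"0"` when the sum is empty. Classifier formulas get a `Class <label>: ` prefix. The near-zero cutoff (`|coef| > 1e-6`) is applied earlier, when the symbolic model is built. The following is a minimal standard-library sketch of that string formatting. `format_terms` and `format_class_formulas` are hypothetical helper names and are not part of the OIKAN API.

```python
def format_terms(coefficients, basis_functions):
    """Join non-zero terms as 'coef*name' with 6 decimals, mirroring type='original'."""
    terms = [f"{c:.6f}*{name}" for c, name in zip(coefficients, basis_functions) if c != 0]
    return " + ".join(terms) if terms else "0"

def format_class_formulas(coefficients_list, basis_functions, classes):
    """One 'Class <label>: ...' string per class, sharing the same basis functions."""
    return [f"Class {label}: {format_terms(coefs, basis_functions)}"
            for label, coefs in zip(classes, coefficients_list)]

print(format_terms([1.5, 0.0, -0.25], ['1', 'x0', 'x0 x1']))
# 1.500000*1 + -0.250000*x0 x1
```

Negative coefficients therefore appear as `+ -0.250000*...`. The `'sympied'` and `'latex'` options delegate to the `sympify_formula` and `get_latex_formula` helpers in `utils` instead.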
@@ -162,13 +168,14 @@ model = OIKANClassifier(
     hidden_sizes=[32, 32], # Hidden layer sizes
     activation='relu', # Activation function (other options: 'tanh', 'leaky_relu', 'elu', 'swish', 'gelu')
     augmentation_factor=10, # Augmentation factor for data generation
-    polynomial_degree=2, # Degree of polynomial basis functions
-    alpha=0.1, # L1 regularization strength
+    alpha=0.1, # L1 regularization strength (Symbolic regression)
     sigma=0.1, # Standard deviation of Gaussian noise for data augmentation
+    top_k=5, # Number of top features to select (Symbolic regression)
     epochs=100, # # Number of training epochs
     lr=0.001, # Learning rate
     batch_size=32, # Batch size for training
-    verbose=True # Verbose output during training
+    verbose=True, # Verbose output during training
+    evaluate_nn=True # Validate neural network performance before full process
 )
 
 # Fit the model
@@ -182,7 +189,7 @@ accuracy = model.score(X_test, y_test)
 print("Accuracy:", accuracy)
 
 # Get symbolic formulas for each class
-formulas = model.get_formula()
+formulas = model.get_formula() # default: type='original' -> returns all formula without pruning | other options: 'sympied' -> simplified formula using sympy; 'latex' -> LaTeX format
 for i, formula in enumerate(formulas):
     print(f"Class {i} Formula:", formula)
 
@@ -204,6 +211,60 @@ loaded_model.load("outputs/model.json")
 
 ![OIKAN v0.0.3(1) Architecture](https://raw.githubusercontent.com/silvermete0r/oikan/main/docs/media/oikan-v0.0.3(1)-architecture-oop.png)
 
+## OIKAN Symbolic Model Compilers
+
+OIKAN provides a set of symbolic model compilers to convert the symbolic formulas generated by the OIKAN model into different programming languages.
+
+*Currently, we support: `Python`, `C++`, `C`, `JavaScript`, `Rust`, and `Go`. This allows users to easily integrate the generated formulas into their applications or systems.*
+
+All compilers: [model_compilers/](model_compilers)
+
+### Example of Python Compiler
+
+1. Regression Model:
+```python
+import numpy as np
+import json
+
+def predict(X, symbolic_model):
+    X = np.asarray(X)
+    X_transformed = evaluate_basis_functions(X, symbolic_model['basis_functions'],
+                                             symbolic_model['n_features'])
+    return np.dot(X_transformed, symbolic_model['coefficients'])
+
+if __name__ == "__main__":
+    with open('outputs/california_housing_model.json', 'r') as f:
+        symbolic_model = json.load(f)
+    X = np.random.rand(10, symbolic_model['n_features'])
+    y_pred = predict(X, symbolic_model)
+    print(y_pred)
+```
+
+2. Classification Model:
+```python
+import numpy as np
+import json
+
+def predict(X, symbolic_model):
+    X = np.asarray(X)
+    X_transformed = evaluate_basis_functions(X, symbolic_model['basis_functions'],
+                                             symbolic_model['n_features'])
+    logits = np.dot(X_transformed, np.array(symbolic_model['coefficients_list']).T)
+    probabilities = np.exp(logits) / np.sum(np.exp(logits), axis=1, keepdims=True)
+    return np.argmax(probabilities, axis=1)
+
+if __name__ == "__main__":
+    with open('outputs/iris_model.json', 'r') as f:
+        symbolic_model = json.load(f)
+    X = np.array([[5.1, 3.5, 1.4, 0.2],
+                  [7.0, 3.2, 4.7, 1.4],
+                  [6.3, 3.3, 6.0, 2.5]])
+    y_pred = predict(X, symbolic_model)
+    print(y_pred)
+```
+
+
+
 ## Contributing
 
 We welcome contributions! Key areas of interest:
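The README's compiler snippets call `evaluate_basis_functions` without showing where it comes from. The classification snippet also exponentiates raw logits, which can overflow for large values. The following is a minimal dependency-free sketch of the same prediction logic. It assumes the JSON layout written by `save()` in this release: `n_features`, `basis_functions`, and either `coefficients` or `coefficients_list` (plus an optional `classes`). The names `eval_basis`, `predict_regression` and `predict_class` are hypothetical. The sketch handles only the basis-function name patterns that appear in this diff.

```python
import math

def eval_basis(name, x):
    """Evaluate one serialized basis-function name on a single sample x."""
    if name == '1':
        return 1.0
    for prefix, fn in (('log1p_x', lambda v: math.log1p(abs(v))),
                       ('exp_x', lambda v: math.exp(min(max(v, -10.0), 10.0))),
                       ('sin_x', math.sin)):
        if name.startswith(prefix):
            return fn(x[int(name[len(prefix):])])
    # Polynomial terms: 'x0', 'x0^2', or products such as 'x0 x1'
    value = 1.0
    for factor in name.split(' '):
        var, _, power = factor.partition('^')
        value *= x[int(var[1:])] ** (int(power) if power else 1)
    return value

def predict_regression(model, x):
    feats = [eval_basis(name, x) for name in model['basis_functions']]
    return sum(c * f for c, f in zip(model['coefficients'], feats))

def predict_class(model, x):
    """Return (label, probabilities) using a max-shifted softmax."""
    feats = [eval_basis(name, x) for name in model['basis_functions']]
    logits = [sum(c * f for c, f in zip(coefs, feats))
              for coefs in model['coefficients_list']]
    shift = max(logits)  # subtracting the max keeps exp() from overflowing
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    labels = model.get('classes', list(range(len(probs))))
    return labels[best], probs

reg = {'n_features': 2, 'basis_functions': ['1', 'x0', 'x0 x1', 'sin_x1'],
       'coefficients': [0.5, 2.0, -1.0, 0.0]}
print(predict_regression(reg, [1.0, 3.0]))  # -0.5
```

Softmax is monotonic, so the predicted label equals the argmax of the raw logits. The probabilities matter only if you want confidence scores.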
@@ -39,7 +39,7 @@ OIKAN implements a modern interpretation of the Kolmogorov-Arnold Representation
 
 2. **Neural Implementation**: OIKAN uses a specialized architecture combining:
 - Feature transformation layers with interpretable basis functions
-- Symbolic regression for formula extraction
+- Symbolic regression for formula extraction (ElasticNet-based)
 - Automatic pruning of insignificant terms
 
 ```python
@@ -58,15 +58,19 @@ OIKAN implements a modern interpretation of the Kolmogorov-Arnold Representation
 SYMBOLIC_FUNCTIONS = {
     'linear': 'x', # Direct relationships
     'quadratic': 'x^2', # Non-linear patterns
+    'cubic': 'x^3', # Higher-order relationships
     'interaction': 'x_i x_j', # Feature interactions
-    'higher_order': 'x^n' # Polynomial terms
+    'higher_order': 'x^n', # Polynomial terms
+    'trigonometric': 'sin(x)', # Trigonometric functions
+    'exponential': 'exp(x)', # Exponential growth
+    'logarithmic': 'log(x)' # Logarithmic relationships
 }
 ```
 
 4. **Formula Extraction Process**:
 - Train neural network on raw data
 - Generate augmented samples for better coverage
-- Perform L1-regularized symbolic regression
+- Perform L1-regularized symbolic regression (alpha)
 - Prune terms with coefficients below threshold
 - Export human-readable mathematical expressions
 
@@ -96,13 +100,14 @@ model = OIKANRegressor(
     hidden_sizes=[32, 32], # Hidden layer sizes
     activation='relu', # Activation function (other options: 'tanh', 'leaky_relu', 'elu', 'swish', 'gelu')
     augmentation_factor=5, # Augmentation factor for data generation
-    polynomial_degree=2, # Degree of polynomial basis functions
-    alpha=0.1, # L1 regularization strength
+    alpha=0.1, # L1 regularization strength (Symbolic regression)
     sigma=0.1, # Standard deviation of Gaussian noise for data augmentation
+    top_k=5, # Number of top features to select (Symbolic regression)
     epochs=100, # Number of training epochs
     lr=0.001, # Learning rate
     batch_size=32, # Batch size for training
-    verbose=True # Verbose output during training
+    verbose=True, # Verbose output during training
+    evaluate_nn=True # Validate neural network performance before full process
 )
 
 # Fit the model
@@ -116,7 +121,7 @@ mse = mean_squared_error(y_test, y_pred)
 print("Mean Squared Error:", mse)
 
 # Get symbolic formula
-formula = model.get_formula()
+formula = model.get_formula() # default: type='original' -> returns all formula without pruning | other options: 'sympied' -> simplified formula using sympy; 'latex' -> LaTeX format
 print("Symbolic Formula:", formula)
 
 # Get feature importances
@@ -144,13 +149,14 @@ model = OIKANClassifier(
     hidden_sizes=[32, 32], # Hidden layer sizes
     activation='relu', # Activation function (other options: 'tanh', 'leaky_relu', 'elu', 'swish', 'gelu')
     augmentation_factor=10, # Augmentation factor for data generation
-    polynomial_degree=2, # Degree of polynomial basis functions
-    alpha=0.1, # L1 regularization strength
+    alpha=0.1, # L1 regularization strength (Symbolic regression)
     sigma=0.1, # Standard deviation of Gaussian noise for data augmentation
+    top_k=5, # Number of top features to select (Symbolic regression)
     epochs=100, # # Number of training epochs
     lr=0.001, # Learning rate
     batch_size=32, # Batch size for training
-    verbose=True # Verbose output during training
+    verbose=True, # Verbose output during training
+    evaluate_nn=True # Validate neural network performance before full process
 )
 
 # Fit the model
@@ -164,7 +170,7 @@ accuracy = model.score(X_test, y_test)
 print("Accuracy:", accuracy)
 
 # Get symbolic formulas for each class
-formulas = model.get_formula()
+formulas = model.get_formula() # default: type='original' -> returns all formula without pruning | other options: 'sympied' -> simplified formula using sympy; 'latex' -> LaTeX format
 for i, formula in enumerate(formulas):
     print(f"Class {i} Formula:", formula)
 
@@ -186,6 +192,60 @@ loaded_model.load("outputs/model.json")
 
 ![OIKAN v0.0.3(1) Architecture](https://raw.githubusercontent.com/silvermete0r/oikan/main/docs/media/oikan-v0.0.3(1)-architecture-oop.png)
 
+## OIKAN Symbolic Model Compilers
+
+OIKAN provides a set of symbolic model compilers to convert the symbolic formulas generated by the OIKAN model into different programming languages.
+
+*Currently, we support: `Python`, `C++`, `C`, `JavaScript`, `Rust`, and `Go`. This allows users to easily integrate the generated formulas into their applications or systems.*
+
+All compilers: [model_compilers/](model_compilers)
+
+### Example of Python Compiler
+
+1. Regression Model:
+```python
+import numpy as np
+import json
+
+def predict(X, symbolic_model):
+    X = np.asarray(X)
+    X_transformed = evaluate_basis_functions(X, symbolic_model['basis_functions'],
+                                             symbolic_model['n_features'])
+    return np.dot(X_transformed, symbolic_model['coefficients'])
+
+if __name__ == "__main__":
+    with open('outputs/california_housing_model.json', 'r') as f:
+        symbolic_model = json.load(f)
+    X = np.random.rand(10, symbolic_model['n_features'])
+    y_pred = predict(X, symbolic_model)
+    print(y_pred)
+```
+
+2. Classification Model:
+```python
+import numpy as np
+import json
+
+def predict(X, symbolic_model):
+    X = np.asarray(X)
+    X_transformed = evaluate_basis_functions(X, symbolic_model['basis_functions'],
+                                             symbolic_model['n_features'])
+    logits = np.dot(X_transformed, np.array(symbolic_model['coefficients_list']).T)
+    probabilities = np.exp(logits) / np.sum(np.exp(logits), axis=1, keepdims=True)
+    return np.argmax(probabilities, axis=1)
+
+if __name__ == "__main__":
+    with open('outputs/iris_model.json', 'r') as f:
+        symbolic_model = json.load(f)
+    X = np.array([[5.1, 3.5, 1.4, 0.2],
+                  [7.0, 3.2, 4.7, 1.4],
+                  [6.3, 3.3, 6.0, 2.5]])
+    y_pred = predict(X, symbolic_model)
+    print(y_pred)
+```
+
+
+
 ## Contributing
 
 We welcome contributions! Key areas of interest:
@@ -0,0 +1,31 @@
+class OIKANError(Exception):
+    """Base exception for OIKAN library."""
+    pass
+
+class ModelNotFittedError(OIKANError):
+    """Raised when a method requires a fitted model."""
+    pass
+
+class InvalidParameterError(OIKANError):
+    """Raised when an invalid parameter value is provided."""
+    pass
+
+class DataDimensionError(OIKANError):
+    """Raised when input data has incorrect dimensions."""
+    pass
+
+class NumericalInstabilityError(OIKANError):
+    """Raised when numerical computations become unstable."""
+    pass
+
+class FeatureExtractionError(OIKANError):
+    """Raised when feature extraction or transformation fails."""
+    pass
+
+class ModelSerializationError(OIKANError):
+    """Raised when model saving/loading operations fail."""
+    pass
+
+class ConvergenceError(OIKANError):
+    """Raised when the model fails to converge during training."""
+    pass
@@ -3,13 +3,14 @@ import torch
 import torch.nn as nn
 import torch.optim as optim
 from sklearn.preprocessing import PolynomialFeatures
-from sklearn.linear_model import Lasso
+from sklearn.linear_model import ElasticNet
 from abc import ABC, abstractmethod
 import json
 from .neural import TabularNet
-from .utils import evaluate_basis_functions, get_features_involved
+from .utils import evaluate_basis_functions, get_features_involved, sympify_formula, get_latex_formula
 from sklearn.model_selection import train_test_split
 from sklearn.metrics import r2_score, accuracy_score
+from .exceptions import *
 import sys
 
 class OIKAN(ABC):
@@ -24,12 +25,12 @@ class OIKAN(ABC):
         Activation function for the neural network ('relu', 'tanh', 'leaky_relu', 'elu', 'swish', 'gelu').
     augmentation_factor : int, optional (default=10)
         Number of augmented samples per original sample.
-    polynomial_degree : int, optional (default=2)
-        Maximum degree of polynomial features for symbolic regression.
     alpha : float, optional (default=0.1)
         L1 regularization strength for Lasso in symbolic regression.
     sigma : float, optional (default=0.1)
         Standard deviation of Gaussian noise for data augmentation.
+    top_k : int, optional (default=5)
+        Number of top features to select in hierarchical symbolic regression.
     epochs : int, optional (default=100)
         Number of epochs for neural network training.
     lr : float, optional (default=0.001)
@@ -42,12 +43,30 @@ class OIKAN(ABC):
         Whether to evaluate neural network performance before full training.
     """
     def __init__(self, hidden_sizes=[64, 64], activation='relu', augmentation_factor=10,
-                 polynomial_degree=2, alpha=0.1, sigma=0.1, epochs=100, lr=0.001, batch_size=32,
-                 verbose=False, evaluate_nn=False):
+                 alpha=0.1, sigma=0.1, epochs=100, lr=0.001, batch_size=32,
+                 verbose=False, evaluate_nn=False, top_k=5):
+        if not isinstance(hidden_sizes, list) or not all(isinstance(x, int) and x > 0 for x in hidden_sizes):
+            raise InvalidParameterError("hidden_sizes must be a list of positive integers")
+        if activation not in ['relu', 'tanh', 'leaky_relu', 'elu', 'swish', 'gelu']:
+            raise InvalidParameterError(f"Unsupported activation function: {activation}")
+        if not isinstance(augmentation_factor, int) or augmentation_factor < 1:
+            raise InvalidParameterError("augmentation_factor must be a positive integer")
+        if not isinstance(top_k, int) or top_k < 1:
+            raise InvalidParameterError("top_k must be a positive integer")
+        if not 0 < lr < 1:
+            raise InvalidParameterError("Learning rate must be between 0 and 1")
+        if not isinstance(batch_size, int) or batch_size < 1:
+            raise InvalidParameterError("batch_size must be a positive integer")
+        if not isinstance(epochs, int) or epochs < 1:
+            raise InvalidParameterError("epochs must be a positive integer")
+        if not 0 <= alpha <= 1:
+            raise InvalidParameterError("alpha must be between 0 and 1")
+        if sigma <= 0:
+            raise InvalidParameterError("sigma must be positive")
+
         self.hidden_sizes = hidden_sizes
         self.activation = activation
         self.augmentation_factor = augmentation_factor
-        self.polynomial_degree = polynomial_degree
         self.alpha = alpha
         self.sigma = sigma
         self.epochs = epochs
@@ -55,6 +74,7 @@ class OIKAN(ABC):
         self.batch_size = batch_size
         self.verbose = verbose
         self.evaluate_nn = evaluate_nn
+        self.top_k = top_k
         self.neural_net = None
         self.symbolic_model = None
         self.evaluation_done = False
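Every new exception subclasses `OIKANError`, so callers can catch the whole family with a single clause. The following is a minimal sketch that runs without OIKAN installed. The two classes are local stand-ins that mirror the hierarchy in this diff. `check_params` and `try_config` are hypothetical helpers that reproduce three of the constructor's checks (`top_k`, `lr`, `alpha`). The signature also changed: `polynomial_degree` was removed and there is no `**kwargs`. Code written for 0.0.3.2 that still passes it therefore fails with a `TypeError` before any of these checks run.

```python
class OIKANError(Exception):
    """Local stand-in for the library's base exception."""

class InvalidParameterError(OIKANError):
    """Local stand-in for the library's validation error."""

def check_params(top_k=5, lr=0.001, alpha=0.1):
    # Same conditions the new constructor applies to these three parameters
    if not isinstance(top_k, int) or top_k < 1:
        raise InvalidParameterError("top_k must be a positive integer")
    if not 0 < lr < 1:
        raise InvalidParameterError("Learning rate must be between 0 and 1")
    if not 0 <= alpha <= 1:
        raise InvalidParameterError("alpha must be between 0 and 1")

def try_config(**params):
    """Return None for a valid configuration, otherwise the error message."""
    try:
        check_params(**params)
    except OIKANError as err:  # one clause catches every OIKAN subclass
        return str(err)
    return None

print(try_config(top_k=3))  # None
print(try_config(lr=1.5))   # Learning rate must be between 0 and 1
```

The `alpha` check is narrower than scikit-learn requires: `ElasticNet` itself accepts any non-negative `alpha`.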
@@ -67,23 +87,53 @@ class OIKAN(ABC):
     def predict(self, X):
         pass
 
-    def get_formula(self):
-        """Returns the symbolic formula(s) as a string (regression) or list of strings (classification)."""
+    def get_formula(self, type='original'):
+        """
+        Returns the symbolic formula(s) as a string (regression) or list of strings (classification).
+
+        Parameter:
+        --------
+        type : str, optional (default='original') other options: 'sympied', 'latex'
+            'original' returns the original formula with coefficients, 'sympied' returns sympy simplified formula.
+        """
+        if type.lower() not in ['original', 'sympied', 'latex']:
+            raise InvalidParameterError("Invalid type. Choose 'original', 'sympied', 'latex'.")
         if self.symbolic_model is None:
             raise ValueError("Model not fitted yet.")
         basis_functions = self.symbolic_model['basis_functions']
-        if 'coefficients' in self.symbolic_model:
-            coefficients = self.symbolic_model['coefficients']
-            formula = " + ".join([f"{coefficients[i]:.3f}*{basis_functions[i]}"
-                                  for i in range(len(coefficients)) if coefficients[i] != 0])
-            return formula if formula else "0"
+        if type.lower() == 'original':
+            if 'coefficients' in self.symbolic_model:
+                coefficients = self.symbolic_model['coefficients']
+                formula = " + ".join([f"{coefficients[i]:.6f}*{basis_functions[i]}"
+                                      for i in range(len(coefficients)) if coefficients[i] != 0])
+                return formula if formula else "0"
+            else:
+                formulas = []
+                for c, coef in enumerate(self.symbolic_model['coefficients_list']):
+                    formula = " + ".join([f"{coef[i]:.6f}*{basis_functions[i]}"
+                                          for i in range(len(coef)) if coef[i] != 0])
+                    formulas.append(f"Class {self.classes_[c]}: {formula if formula else '0'}")
+                return formulas
+        elif type.lower() == 'sympied':
+            if 'coefficients' in self.symbolic_model:
+                formula = sympify_formula(self.symbolic_model['basis_functions'], self.symbolic_model['coefficients'], self.symbolic_model['n_features'])
+                return formula
+            else:
+                formulas = []
+                for c, coef in enumerate(self.symbolic_model['coefficients_list']):
+                    formula = sympify_formula(self.symbolic_model['basis_functions'], coef, self.symbolic_model['n_features'])
+                    formulas.append(f"Class {self.classes_[c]}: {formula}")
+                return formulas
         else:
-            formulas = []
-            for c, coef in enumerate(self.symbolic_model['coefficients_list']):
-                formula = " + ".join([f"{coef[i]:.3f}*{basis_functions[i]}"
-                                      for i in range(len(coef)) if coef[i] != 0])
-                formulas.append(f"Class {self.classes_[c]}: {formula if formula else '0'}")
-            return formulas
+            if 'coefficients' in self.symbolic_model:
+                formula = get_latex_formula(self.symbolic_model['basis_functions'], self.symbolic_model['coefficients'], self.symbolic_model['n_features'])
+                return formula
+            else:
+                formulas = []
+                for c, coef in enumerate(self.symbolic_model['coefficients_list']):
+                    formula = get_latex_formula(self.symbolic_model['basis_functions'], coef, self.symbolic_model['n_features'])
+                    formulas.append(f"Class {self.classes_[c]}: {formula}")
+                return formulas
 
     def feature_importances(self):
         """
@@ -129,27 +179,32 @@ class OIKAN(ABC):
             File path to save the model. Should end with .json
         """
         if self.symbolic_model is None:
-            raise ValueError("Model not fitted yet.")
-
+            raise ModelNotFittedError("Model must be fitted before saving")
+
         if not path.endswith('.json'):
             path = path + '.json'
-
-        # Convert numpy arrays and other non-serializable types to lists
-        model_data = {
-            'n_features': self.symbolic_model['n_features'],
-            'degree': self.symbolic_model['degree'],
-            'basis_functions': self.symbolic_model['basis_functions']
-        }
 
-        if 'coefficients' in self.symbolic_model:
-            model_data['coefficients'] = self.symbolic_model['coefficients']
-        else:
-            model_data['coefficients_list'] = [coef for coef in self.symbolic_model['coefficients_list']]
-            if hasattr(self, 'classes_'):
-                model_data['classes'] = self.classes_.tolist()
+        try:
+            # Convert numpy arrays and other non-serializable types to lists
+            model_data = {
+                'n_features': self.symbolic_model['n_features'],
+                'basis_functions': self.symbolic_model['basis_functions']
+            }
+
+            if 'coefficients' in self.symbolic_model:
+                model_data['coefficients'] = self.symbolic_model['coefficients']
+            else:
+                model_data['coefficients_list'] = [coef for coef in self.symbolic_model['coefficients_list']]
+                if hasattr(self, 'classes_'):
+                    model_data['classes'] = self.classes_.tolist()
+
+            with open(path, 'w') as f:
+                json.dump(model_data, f, indent=2)
+        except Exception as e:
+            raise ModelSerializationError(f"Failed to save model: {str(e)}")
 
-        with open(path, 'w') as f:
-            json.dump(model_data, f, indent=2)
+        if self.verbose:
+            print(f"Model saved to {path}")
 
     def load(self, path):
         """
@@ -162,22 +217,27 @@ class OIKAN(ABC):
         """
         if not path.endswith('.json'):
             path = path + '.json'
+
+        try:
+            with open(path, 'r') as f:
+                model_data = json.load(f)
+
+            self.symbolic_model = {
+                'n_features': model_data['n_features'],
+                'basis_functions': model_data['basis_functions']
+            }
 
-        with open(path, 'r') as f:
-            model_data = json.load(f)
-
-        self.symbolic_model = {
-            'n_features': model_data['n_features'],
-            'degree': model_data['degree'],
-            'basis_functions': model_data['basis_functions']
-        }
+            if 'coefficients' in model_data:
+                self.symbolic_model['coefficients'] = model_data['coefficients']
+            else:
+                self.symbolic_model['coefficients_list'] = model_data['coefficients_list']
+                if 'classes' in model_data:
+                    self.classes_ = np.array(model_data['classes'])
+        except Exception as e:
+            raise ModelSerializationError(f"Failed to load model: {str(e)}")
 
-        if 'coefficients' in model_data:
-            self.symbolic_model['coefficients'] = model_data['coefficients']
-        else:
-            self.symbolic_model['coefficients_list'] = model_data['coefficients_list']
-            if 'classes' in model_data:
-                self.classes_ = np.array(model_data['classes'])
+        if self.verbose:
+            print(f"Model loaded from {path}")
 
     def _evaluate_neural_net(self, X, y, output_size, loss_fn):
         """Evaluates neural network performance on train-test split."""
@@ -185,7 +245,6 @@ class OIKAN(ABC):
 
         input_size = X.shape[1]
         self.neural_net = TabularNet(input_size, self.hidden_sizes, output_size, self.activation)
-        optimizer = optim.Adam(self.neural_net.parameters(), lr=self.lr)
 
         # Train on the training set
         self._train_neural_net(X_train, y_train, output_size, loss_fn)
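In this release, `save()` appends `.json` when the suffix is missing and wraps failures in `ModelSerializationError`. Neither `save()` nor `load()` touches the `degree` key any more. The stored payload is just `n_features` and `basis_functions`, plus either `coefficients` or `coefficients_list` and optionally `classes`. Files written by 0.0.3.2 still carry a `degree` key, which the new `load()` simply ignores. The following is a minimal standard-library round-trip sketch of that layout. `save_payload`, `load_payload` and the local `SerializationError` class are hypothetical stand-ins, not the library's API.

```python
import json
import os
import tempfile

class SerializationError(Exception):
    """Local stand-in for ModelSerializationError."""

def save_payload(symbolic_model, path):
    if not path.endswith('.json'):
        path = path + '.json'
    payload = {'n_features': symbolic_model['n_features'],
               'basis_functions': symbolic_model['basis_functions']}
    if 'coefficients' in symbolic_model:
        payload['coefficients'] = symbolic_model['coefficients']
    else:
        payload['coefficients_list'] = symbolic_model['coefficients_list']
        if 'classes' in symbolic_model:
            payload['classes'] = symbolic_model['classes']
    try:
        with open(path, 'w') as f:
            json.dump(payload, f, indent=2)
    except (OSError, TypeError) as e:  # TypeError: value not JSON-serializable
        raise SerializationError(f"Failed to save model: {e}") from e
    return path

def load_payload(path):
    if not path.endswith('.json'):
        path = path + '.json'
    try:
        with open(path) as f:
            return json.load(f)
    except (OSError, json.JSONDecodeError) as e:
        raise SerializationError(f"Failed to load model: {e}") from e

with tempfile.TemporaryDirectory() as tmp:
    written = save_payload({'n_features': 2, 'basis_functions': ['1', 'x0'],
                            'coefficients': [0.5, 2.0]}, os.path.join(tmp, 'model'))
    print(os.path.basename(written))               # model.json
    print(load_payload(written)['coefficients'])   # [0.5, 2.0]
```

Unlike the diff, which catches `Exception`, this sketch narrows the handled errors. It also chains the original cause with `from e`, so the traceback still shows why the operation failed.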
@@ -253,7 +312,6 @@ class OIKAN(ABC):
 
     def _generate_augmented_data(self, X):
         """Generates augmented data by adding Gaussian noise."""
-        n_samples = X.shape[0]
         X_aug = []
         for _ in range(self.augmentation_factor):
             noise = np.random.normal(0, self.sigma, X.shape)
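`_generate_augmented_data` draws zero-mean Gaussian noise with standard deviation `sigma` once per `augmentation_factor` iteration. The removed `n_samples` variable was unused. The lines that use `noise` are not shown in this hunk. Judging from the surrounding context, one noisy copy of `X` is appended per iteration and then stacked with `np.vstack`, giving `augmentation_factor * n_samples` rows. The following is a list-based sketch under that assumption, with `random.gauss` standing in for `np.random.normal`. `augment` is a hypothetical name.

```python
import random

def augment(X, augmentation_factor=10, sigma=0.1, seed=None):
    """Stack augmentation_factor noisy copies of X (a list of rows)."""
    rng = random.Random(seed)
    X_aug = []
    for _ in range(augmentation_factor):
        # One full copy of X per iteration, each value perturbed by N(0, sigma^2)
        for row in X:
            X_aug.append([v + rng.gauss(0.0, sigma) for v in row])
    return X_aug

X = [[0.0, 1.0], [2.0, 3.0]]
A = augment(X, augmentation_factor=3, sigma=0.1, seed=0)
print(len(A))  # 6
```

`sigma` is applied in raw feature units. Features on very different scales are therefore perturbed unevenly unless they are standardized first.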
@@ -262,37 +320,105 @@ class OIKAN(ABC):
         return np.vstack(X_aug)
 
     def _perform_symbolic_regression(self, X, y):
-        """Performs symbolic regression using polynomial features and Lasso."""
-        poly = PolynomialFeatures(degree=self.polynomial_degree, include_bias=True)
-        X_poly = poly.fit_transform(X)
-        model = Lasso(alpha=self.alpha, fit_intercept=False)
-        model.fit(X_poly, y)
+        """
+        Performs hierarchical symbolic regression using a two-stage approach.
+
+        Parameters:
+        -----------
+        X : array-like of shape (n_samples, n_features)
+            Input data.
+        y : array-like of shape (n_samples,) or (n_samples, n_classes)
+            Target values or logits.
+        """
+        n_features = X.shape[1]
+        self.top_k = min(self.top_k, n_features)
+
+        if self.top_k < 1:
+            raise InvalidParameterError("top_k must be at least 1")
+
+        if np.any(np.isnan(X)) or np.any(np.isnan(y)):
+            raise NumericalInstabilityError("Input data contains NaN values")
+
+        if np.any(np.isinf(X)) or np.any(np.isinf(y)):
+            raise NumericalInstabilityError("Input data contains infinite values")
+
+        # Stage 1: Coarse Model
+        coarse_degree = 2 # Fixed low degree for coarse model
+        poly_coarse = PolynomialFeatures(degree=coarse_degree, include_bias=True)
+        X_poly_coarse = poly_coarse.fit_transform(X)
+        model_coarse = ElasticNet(alpha=self.alpha, fit_intercept=False)
+        model_coarse.fit(X_poly_coarse, y)
+
+        # Compute feature importances for original features
+        basis_functions_coarse = poly_coarse.get_feature_names_out()
+        if len(y.shape) == 1 or y.shape[1] == 1:
+            coef_coarse = model_coarse.coef_.flatten()
+        else:
+            coef_coarse = np.sum(np.abs(model_coarse.coef_), axis=0)
+
+        importances = np.zeros(X.shape[1])
+        for i, func in enumerate(basis_functions_coarse):
+            features_involved = get_features_involved(func)
+            for idx in features_involved:
+                importances[idx] += np.abs(coef_coarse[i])
+
+        if np.all(importances == 0):
+            raise FeatureExtractionError("Failed to compute feature importances - all values are zero")
+
+        # Select top K features
+        top_k_indices = np.argsort(importances)[::-1][:self.top_k]
+
+        # Stage 2: Refined Model
+        # ~ generate additional non-linear features for top K features
+        additional_features = []
+        additional_names = []
+        for i in top_k_indices:
+            # Higher-degree polynomial
+            additional_features.append(X[:, i]**3)
+            additional_names.append(f'x{i}^3')
+            # Non-linear transformations
+            additional_features.append(np.log1p(np.abs(X[:, i])))
+            additional_names.append(f'log1p_x{i}')
+            additional_features.append(np.exp(np.clip(X[:, i], -10, 10)))
+            additional_names.append(f'exp_x{i}')
+            additional_features.append(np.sin(X[:, i]))
+            additional_names.append(f'sin_x{i}')
+
+        # Combine features
+        X_additional = np.column_stack(additional_features)
+        X_refined = np.hstack([X_poly_coarse, X_additional])
+        basis_functions_refined = list(basis_functions_coarse) + additional_names
+
+        # Fit refined model
+        model_refined = ElasticNet(alpha=self.alpha, fit_intercept=False)
+        model_refined.fit(X_refined, y)
+
+        # Store symbolic model
         if len(y.shape) == 1 or y.shape[1] == 1:
-            coef = model.coef_.flatten()
-            selected_indices = np.where(np.abs(coef) > 1e-6)[0]
+            # Regression
+            coef_refined = model_refined.coef_.flatten()
+            selected_indices = np.where(np.abs(coef_refined) > 1e-6)[0]
             self.symbolic_model = {
                 'n_features': X.shape[1],
-                'degree': self.polynomial_degree,
-                'basis_functions': poly.get_feature_names_out()[selected_indices].tolist(),
-                'coefficients': coef[selected_indices].tolist()
+                'basis_functions': [basis_functions_refined[i] for i in selected_indices],
+                'coefficients': coef_refined[selected_indices].tolist()
             }
         else:
+            # Classification
             coefficients_list = []
-            # Note: Using the same basis functions across classes for simplicity
             selected_indices = set()
             for c in range(y.shape[1]):
-                coef = model.coef_[c]
+                coef = model_refined.coef_[c]
                 indices = np.where(np.abs(coef) > 1e-6)[0]
                 selected_indices.update(indices)
             selected_indices = list(selected_indices)
-            basis_functions = poly.get_feature_names_out()[selected_indices].tolist()
+            basis_functions = [basis_functions_refined[i] for i in selected_indices]
             for c in range(y.shape[1]):
-                coef = model.coef_[c]
+                coef = model_refined.coef_[c]
                 coef_selected = coef[selected_indices].tolist()
                 coefficients_list.append(coef_selected)
             self.symbolic_model = {
                 'n_features': X.shape[1],
-                'degree': self.polynomial_degree,
                 'basis_functions': basis_functions,
                 'coefficients_list': coefficients_list
             }
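Stage 1 of the new `_perform_symbolic_regression` derives a per-feature importance from the coarse degree-2 fit. It credits the `|coef|` of every basis function to each feature that term involves. It then keeps the `top_k` highest-scoring features, which receive the extra `x^3`, `log1p`, `exp` and `sin` terms. After the refined fit, terms with `|coef| <= 1e-6` are dropped. The `alpha` docstring still mentions Lasso, but both stages use scikit-learn's `ElasticNet`. Its default `l1_ratio=0.5` mixes L1 and L2 penalties. The following standard-library sketch covers the importance, selection and pruning logic on already-fitted coefficients; fitting is left to scikit-learn. `features_in`, `importances_from_terms`, `top_k_features`, `refined_names` and `prune` are hypothetical names. Tie-breaking between equal importances may differ from `np.argsort`.

```python
def features_in(name):
    """Feature indices used by a basis-function name such as 'x0^2', 'x0 x1', 'log1p_x3'."""
    if name == '1':
        return set()
    if '_x' in name:  # 'log1p_x3', 'exp_x0', 'sin_x2'
        return {int(name.split('_x')[-1])}
    return {int(factor.split('^')[0][1:]) for factor in name.split(' ')}

def importances_from_terms(basis_functions, coefs, n_features):
    """Credit |coef| of each term to every feature the term involves."""
    importances = [0.0] * n_features
    for name, c in zip(basis_functions, coefs):
        for idx in features_in(name):
            importances[idx] += abs(c)
    return importances

def top_k_features(importances, k):
    """Indices of the k most important features (k capped at n_features)."""
    k = min(k, len(importances))
    order = sorted(range(len(importances)), key=lambda i: importances[i], reverse=True)
    return order[:k]

def refined_names(top):
    """Names of the extra stage-2 terms added for each selected feature."""
    names = []
    for i in top:
        names += [f'x{i}^3', f'log1p_x{i}', f'exp_x{i}', f'sin_x{i}']
    return names

def prune(basis_functions, coefs, tol=1e-6):
    """Keep only terms whose |coef| exceeds tol, as the stored model does."""
    kept = [(n, c) for n, c in zip(basis_functions, coefs) if abs(c) > tol]
    return [n for n, _ in kept], [c for _, c in kept]

coarse_names = ['1', 'x0', 'x1', 'x0^2', 'x0 x1', 'x1^2']
coarse_coefs = [0.3, 0.5, 0.0, -0.2, 0.1, 0.0]
imp = importances_from_terms(coarse_names, coarse_coefs, n_features=2)
top = top_k_features(imp, k=1)
print(top, refined_names(top))  # [0] ['x0^3', 'log1p_x0', 'exp_x0', 'sin_x0']
```

For a classifier, the diff first sums `|coef|` over classes (`np.sum(np.abs(model_coarse.coef_), axis=0)`). That summed vector is the one credited to features.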
@@ -0,0 +1,256 @@
+import numpy as np
+import sympy as sp
+import json
+from functools import lru_cache
+
+def evaluate_basis_functions(X, basis_functions, n_features):
+    """
+    Evaluates basis functions on the input data.
+
+    Parameters:
+    -----------
+    X : array-like of shape (n_samples, n_features)
+        Input data.
+    basis_functions : list
+        List of basis function strings (e.g., '1', 'x0', 'x0^2', 'x0 x1', 'log1p_x0').
+    n_features : int
+        Number of input features.
+
+    Returns:
+    --------
+    X_transformed : ndarray of shape (n_samples, n_basis_functions)
+        Transformed data matrix.
+    """
+    X_transformed = np.zeros((X.shape[0], len(basis_functions)))
+    for i, func in enumerate(basis_functions):
+        if func == '1':
+            X_transformed[:, i] = 1
+        elif func.startswith('log1p_x'):
+            idx = int(func.split('_')[1][1:])
+            X_transformed[:, i] = np.log1p(np.abs(X[:, idx]))
+        elif func.startswith('exp_x'):
+            idx = int(func.split('_')[1][1:])
+            X_transformed[:, i] = np.exp(np.clip(X[:, idx], -10, 10))
+        elif func.startswith('sin_x'):
+            idx = int(func.split('_')[1][1:])
+            X_transformed[:, i] = np.sin(X[:, idx])
+        elif '^' in func:
+            var, power = func.split('^')
+            idx = int(var[1:])
+            X_transformed[:, i] = X[:, idx] ** int(power)
+        elif ' ' in func:
+            vars = func.split(' ')
+            result = np.ones(X.shape[0])
+            for var in vars:
+                idx = int(var[1:])
+                result *= X[:, idx]
+            X_transformed[:, i] = result
+        else:
+            idx = int(func[1:])
+            X_transformed[:, i] = X[:, idx]
+    return X_transformed
+
+def get_features_involved(basis_function):
+    """
+    Extracts the feature indices involved in a basis function string.
+
+    Parameters:
+    -----------
+    basis_function : str
+        String representation of the basis function, e.g., 'x0', 'x0^2', 'x0 x1', 'log1p_x0'.
+
+    Returns:
+    --------
+    set : Set of feature indices involved.
+    """
+    if basis_function == '1':
+        return set()
+    features = set()
+    if '_' in basis_function: # Handle non-linear functions like 'log1p_x0'
+        parts = basis_function.split('_')
+        if len(parts) == 2 and parts[1].startswith('x'):
+            idx = int(parts[1][1:])
+            features.add(idx)
+    elif '^' in basis_function: # Handle powers, e.g., 'x0^2'
+        var = basis_function.split('^')[0]
+        idx = int(var[1:])
+        features.add(idx)
+    elif ' ' in basis_function: # Handle interactions, e.g., 'x0 x1'
+        for part in basis_function.split():
+            idx = int(part[1:])
+            features.add(idx)
+    elif basis_function.startswith('x'):
+        idx = int(basis_function[1:])
+        features.add(idx)
+    return features
86
+
87
+ @lru_cache(maxsize=1000)
88
+ def _cached_sympify_formula(basis_functions_tuple, coefficients_tuple, n_features, threshold):
89
+ """
90
+ Internal function to perform SymPy formula simplification with caching.
91
+
92
+ Parameters:
93
+ -----------
94
+ basis_functions_tuple : tuple
95
+ Tuple of basis function strings.
96
+ coefficients_tuple : tuple
97
+ Tuple of coefficients.
98
+ n_features : int
99
+ Number of input features.
100
+ threshold : float
101
+ Coefficients with absolute value below this are excluded.
102
+
103
+ Returns:
104
+ --------
105
+ str
106
+ Simplified formula as a string, or '0' if empty.
107
+ """
108
+ # Convert tuples back to lists
109
+ basis_functions = list(basis_functions_tuple)
110
+ coefficients = list(coefficients_tuple)
111
+
112
+ # Define symbolic variables
113
+ x = sp.symbols(f'x0:{n_features}')
114
+ expr = 0
115
+
116
+ # Build the expression
117
+ for coef, func in zip(coefficients, basis_functions):
118
+ if abs(coef) < threshold:
119
+ continue # Skip negligible coefficients
120
+ if func == '1':
121
+ term = coef
122
+ elif func.startswith('log1p_x'):
123
+ idx = int(func.split('_')[1][1:])
124
+ term = coef * sp.log(1 + sp.Abs(x[idx]))
125
+ elif func.startswith('exp_x'):
126
+ idx = int(func.split('_')[1][1:])
127
+ term = coef * sp.exp(x[idx])
128
+ elif func.startswith('sin_x'):
129
+ idx = int(func.split('_')[1][1:])
130
+ term = coef * sp.sin(x[idx])
131
+ elif '^' in func:
132
+ var, power = func.split('^')
133
+ idx = int(var[1:])
134
+ term = coef * x[idx]**int(power)
135
+ elif ' ' in func:
136
+ vars = func.split(' ')
137
+ term = coef
138
+ for var in vars:
139
+ idx = int(var[1:])
140
+ term *= x[idx]
141
+ else:
142
+ idx = int(func[1:])
143
+ term = coef * x[idx]
144
+ expr += term
145
+
146
+ # Simplify the expression
147
+ simplified_expr = sp.simplify(expr)
148
+
149
+ # Convert to string with rounded coefficients
150
+ def format_term(term):
151
+ if term.is_Mul:
152
+ coeff = 1
153
+ factors = []
154
+ for factor in term.args:
155
+ if factor.is_Number:
156
+ coeff *= float(factor)
157
+ else:
158
+ factors.append(str(factor))
159
+ if abs(coeff) < threshold:
160
+ return None
161
+ return f"{coeff:.5f}*{'*'.join(factors)}" if factors else f"{coeff:.5f}"
162
+ elif term.is_Add:
163
+ return None # Handle in recursion
164
+ elif term.is_Number:
165
+ return f"{float(term):.5f}" if abs(float(term)) >= threshold else None
166
+ else:
167
+ return f"{1.0:.5f}*{term}" if abs(1.0) >= threshold else None
168
+
169
+ terms = []
170
+ if simplified_expr.is_Add:
171
+ for term in simplified_expr.args:
172
+ formatted = format_term(term)
173
+ if formatted:
174
+ terms.append(formatted)
175
+ else:
176
+ formatted = format_term(simplified_expr)
177
+ if formatted:
178
+ terms.append(formatted)
179
+
180
+ formula = " + ".join(terms).replace("+ -", "- ")
181
+ return formula if formula else "0"
182
+
183
+ def sympify_formula(basis_functions, coefficients, n_features, threshold=0.00005):
184
+ """
185
+ Simplifies a symbolic formula using SymPy with caching.
186
+
187
+ Parameters:
188
+ -----------
189
+ basis_functions : list
190
+ List of basis function strings (e.g., 'x0', 'x0^2', 'x0 x1', 'exp_x0').
191
+ coefficients : list
192
+ List of coefficients corresponding to each basis function.
193
+ n_features : int
194
+ Number of input features.
195
+ threshold : float, optional (default=0.00005)
196
+ Coefficients with absolute value below this are excluded.
197
+
198
+ Returns:
199
+ --------
200
+ str
201
+ Simplified formula as a string, or '0' if empty.
202
+ """
203
+ # Convert inputs to hashable types
204
+ basis_functions_tuple = tuple(basis_functions)
205
+ coefficients_tuple = tuple(coefficients)
206
+
207
+ # Call cached function
208
+ return _cached_sympify_formula(basis_functions_tuple, coefficients_tuple, n_features, threshold)
209
+
210
+ @lru_cache(maxsize=1000)
211
+ def _cached_get_latex_formula(formula):
212
+ """
213
+ Internal function to convert a simplified formula to LaTeX with caching.
214
+
215
+ Parameters:
216
+ -----------
217
+ formula : str
218
+ Simplified formula string.
219
+
220
+ Returns:
221
+ --------
222
+ str
223
+ LaTeX formula as a string.
224
+ """
225
+ return sp.latex(sp.sympify(formula))
226
+
227
+ def get_latex_formula(basis_functions, coefficients, n_features, threshold=0.00005):
228
+ """
229
+ Generates a LaTeX formula from the basis functions and coefficients with caching.
230
+
231
+ Parameters:
232
+ -----------
233
+ basis_functions : list
234
+ List of basis function strings (e.g., 'x0', 'x0^2', 'x0 x1', 'exp_x0').
235
+ coefficients : list
236
+ List of coefficients corresponding to each basis function.
237
+ n_features : int
238
+ Number of input features.
239
+ threshold : float, optional (default=0.00005)
240
+ Coefficients with absolute value below this are excluded.
241
+
242
+ Returns:
243
+ --------
244
+ str
245
+ LaTeX formula as a string, or '0' if empty.
246
+ """
247
+ # Get simplified formula (cached)
248
+ formula = sympify_formula(basis_functions, coefficients, n_features, threshold)
249
+ # Convert to LaTeX (cached)
250
+ return _cached_get_latex_formula(formula)
251
+
252
+ if __name__ == "__main__":
253
+ with open('outputs/california_housing_model.json', 'r') as f:
254
+ model = json.load(f)
255
+ print('Sympified formula:', sympify_formula(model['basis_functions'], model['coefficients'], model['n_features']))
256
+ print('LaTeX formula:', get_latex_formula(model['basis_functions'], model['coefficients'], model['n_features']))
@@ -1,6 +1,6 @@
1
1
  Metadata-Version: 2.4
2
2
  Name: oikan
3
- Version: 0.0.3.2
3
+ Version: 0.0.3.4
4
4
  Summary: OIKAN: Neuro-Symbolic ML for Scientific Discovery
5
5
  Author: Arman Zhalgasbayev
6
6
  License: MIT
@@ -14,6 +14,7 @@ Requires-Dist: torch
14
14
  Requires-Dist: numpy
15
15
  Requires-Dist: scikit-learn
16
16
  Requires-Dist: tqdm
17
+ Requires-Dist: sympy
17
18
  Dynamic: license-file
18
19
 
19
20
  <!-- logo in the center -->
@@ -57,7 +58,7 @@ OIKAN implements a modern interpretation of the Kolmogorov-Arnold Representation
57
58
 
58
59
  2. **Neural Implementation**: OIKAN uses a specialized architecture combining:
59
60
  - Feature transformation layers with interpretable basis functions
60
- - Symbolic regression for formula extraction
61
+ - Symbolic regression for formula extraction (ElasticNet-based)
61
62
  - Automatic pruning of insignificant terms
62
63
 
63
64
  ```python
@@ -76,15 +77,19 @@ OIKAN implements a modern interpretation of the Kolmogorov-Arnold Representation
76
77
  SYMBOLIC_FUNCTIONS = {
77
78
  'linear': 'x', # Direct relationships
78
79
  'quadratic': 'x^2', # Non-linear patterns
80
+ 'cubic': 'x^3', # Higher-order relationships
79
81
  'interaction': 'x_i x_j', # Feature interactions
80
- 'higher_order': 'x^n' # Polynomial terms
82
+ 'higher_order': 'x^n', # Polynomial terms
83
+ 'trigonometric': 'sin(x)', # Trigonometric functions
84
+ 'exponential': 'exp(x)', # Exponential growth
85
+ 'logarithmic': 'log(x)' # Logarithmic relationships
81
86
  }
82
87
  ```
83
88
 
84
89
  4. **Formula Extraction Process**:
85
90
  - Train neural network on raw data
86
91
  - Generate augmented samples for better coverage
87
- - Perform L1-regularized symbolic regression
92
+ - Perform L1-regularized symbolic regression (strength set by `alpha`)
88
93
  - Prune terms with coefficients below threshold
89
94
  - Export human-readable mathematical expressions
90
95
 
@@ -114,13 +119,14 @@ model = OIKANRegressor(
114
119
  hidden_sizes=[32, 32], # Hidden layer sizes
115
120
  activation='relu', # Activation function (other options: 'tanh', 'leaky_relu', 'elu', 'swish', 'gelu')
116
121
  augmentation_factor=5, # Augmentation factor for data generation
117
- polynomial_degree=2, # Degree of polynomial basis functions
118
- alpha=0.1, # L1 regularization strength
122
+ alpha=0.1, # L1 regularization strength (Symbolic regression)
119
123
  sigma=0.1, # Standard deviation of Gaussian noise for data augmentation
124
+ top_k=5, # Number of top features to select (Symbolic regression)
120
125
  epochs=100, # Number of training epochs
121
126
  lr=0.001, # Learning rate
122
127
  batch_size=32, # Batch size for training
123
- verbose=True # Verbose output during training
128
+ verbose=True, # Verbose output during training
129
+ evaluate_nn=True # Validate neural network performance before running the full pipeline
124
130
  )
125
131
 
126
132
  # Fit the model
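The formula-extraction steps in the README map onto the constructor arguments above. First a network is trained, then the data is augmented `augmentation_factor` times with Gaussian noise of scale `sigma`. A sparse symbolic model is then fitted to the network's predictions, and small coefficients are pruned.

The NumPy sketch below runs these steps end to end under simplifying assumptions, so it is not OIKAN's implementation:
- A known function (`teacher`) stands in for the trained network.
- The basis is a fixed degree-2 polynomial basis.
- Ordinary least squares (`np.linalg.lstsq`) replaces the ElasticNet/L1 fit controlled by `alpha`, so pruning here relies only on the coefficient threshold.

```python
import numpy as np

def teacher(X):
    # Stand-in for the trained neural network's predictions (assumption).
    return 3.0 * X[:, 0] + 0.5 * X[:, 1] ** 2

def augment(X, augmentation_factor, sigma, rng):
    # Jitter each original sample `augmentation_factor` times with N(0, sigma^2) noise.
    reps = np.repeat(X, augmentation_factor, axis=0)
    return np.vstack([X, reps + rng.normal(0.0, sigma, size=reps.shape)])

def degree2_basis(X):
    # Names follow the 'x0', 'x0^2', 'x0 x1' convention of OIKAN's basis strings.
    x0, x1 = X[:, 0], X[:, 1]
    names = ['1', 'x0', 'x1', 'x0^2', 'x0 x1', 'x1^2']
    cols = [np.ones_like(x0), x0, x1, x0 ** 2, x0 * x1, x1 ** 2]
    return names, np.column_stack(cols)

rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, size=(50, 2))
X_aug = augment(X, augmentation_factor=5, sigma=0.1, rng=rng)
y_aug = teacher(X_aug)  # label augmented points with the teacher

names, Phi = degree2_basis(X_aug)
coef, *_ = np.linalg.lstsq(Phi, y_aug, rcond=None)  # OLS in place of ElasticNet

threshold = 1e-6  # prune negligible terms
formula = {n: round(float(c), 6) for n, c in zip(names, coef) if abs(c) > threshold}
print(formula)  # {'x0': 3.0, 'x1^2': 0.5}
```

Because the stand-in teacher lies exactly in the span of the basis, the fit recovers it exactly. With a real network, the residual error and the L1 penalty decide which terms survive.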
@@ -134,7 +140,7 @@ mse = mean_squared_error(y_test, y_pred)
134
140
  print("Mean Squared Error:", mse)
135
141
 
136
142
  # Get symbolic formula
137
- formula = model.get_formula()
143
+ formula = model.get_formula() # default: type='original' -> returns the full formula without pruning | other options: 'sympied' -> formula simplified with SymPy; 'latex' -> LaTeX format
138
144
  print("Symbolic Formula:", formula)
139
145
 
140
146
  # Get feature importances
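`get_formula` returns the formula as text. The string assembly at the end of `_cached_sympify_formula` works in three steps:
- drop terms below `threshold`;
- print coefficients to five decimals;
- join terms with " + ", then rewrite "+ -" as "- ".

The same assembly can be reproduced without SymPy for a quick look at raw coefficients. `format_formula` is a hypothetical helper, and unlike `sp.simplify` it does not merge or rewrite terms. For a classifier, it can be applied to each row of `coefficients_list`.

```python
def format_formula(basis_functions, coefficients, threshold=0.00005):
    """Hypothetical helper: render coefficient/basis pairs as a formula string."""
    terms = []
    for coef, func in zip(coefficients, basis_functions):
        if abs(coef) < threshold:
            continue  # prune negligible terms
        # The constant basis '1' is printed as a bare number.
        terms.append(f"{coef:.5f}" if func == '1' else f"{coef:.5f}*{func}")
    formula = " + ".join(terms).replace("+ -", "- ")
    return formula if formula else "0"

print(format_formula(['1', 'x0', 'x0 x1', 'log1p_x2'], [1.5, -0.25, 1e-7, 2.0]))
# 1.50000 - 0.25000*x0 + 2.00000*log1p_x2

# One formula per class, as in the classification example.
coefficients_list = [[0.5, 1.2], [-0.3, 0.0]]
print([format_formula(['1', 'x0^2'], row) for row in coefficients_list])
# ['0.50000 + 1.20000*x0^2', '-0.30000']
```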
@@ -162,13 +168,14 @@ model = OIKANClassifier(
162
168
  hidden_sizes=[32, 32], # Hidden layer sizes
163
169
  activation='relu', # Activation function (other options: 'tanh', 'leaky_relu', 'elu', 'swish', 'gelu')
164
170
  augmentation_factor=10, # Augmentation factor for data generation
165
- polynomial_degree=2, # Degree of polynomial basis functions
166
- alpha=0.1, # L1 regularization strength
171
+ alpha=0.1, # L1 regularization strength (Symbolic regression)
167
172
  sigma=0.1, # Standard deviation of Gaussian noise for data augmentation
173
+ top_k=5, # Number of top features to select (Symbolic regression)
168
174
  epochs=100, # Number of training epochs
169
175
  lr=0.001, # Learning rate
170
176
  batch_size=32, # Batch size for training
171
- verbose=True # Verbose output during training
177
+ verbose=True, # Verbose output during training
178
+ evaluate_nn=True # Validate neural network performance before running the full pipeline
172
179
  )
173
180
 
174
181
  # Fit the model
@@ -182,7 +189,7 @@ accuracy = model.score(X_test, y_test)
182
189
  print("Accuracy:", accuracy)
183
190
 
184
191
  # Get symbolic formulas for each class
185
- formulas = model.get_formula()
192
+ formulas = model.get_formula() # default: type='original' -> returns the full formulas without pruning | other options: 'sympied' -> formulas simplified with SymPy; 'latex' -> LaTeX format
186
193
  for i, formula in enumerate(formulas):
187
194
  print(f"Class {i} Formula:", formula)
188
195
 
@@ -204,6 +211,60 @@ loaded_model.load("outputs/model.json")
204
211
 
205
212
  ![OIKAN v0.0.3(1) Architecture](https://raw.githubusercontent.com/silvermete0r/oikan/main/docs/media/oikan-v0.0.3(1)-architecture-oop.png)
206
213
 
214
+ ## OIKAN Symbolic Model Compilers
215
+
216
+ OIKAN provides symbolic model compilers that translate the symbolic formulas produced by an OIKAN model into other programming languages.
217
+
218
+ *Currently supported: `Python`, `C++`, `C`, `JavaScript`, `Rust`, and `Go`. This lets users integrate the generated formulas directly into their applications or systems.*
219
+
220
+ All compilers: [model_compilers/](model_compilers)
221
+
222
+ ### Example of Python Compiler
223
+
224
+ 1. Regression Model:
225
+ ```python
226
+ import numpy as np
227
+ import json
228
+
229
+ def predict(X, symbolic_model):
230
+ X = np.asarray(X)
231
+ X_transformed = evaluate_basis_functions(X, symbolic_model['basis_functions'],
232
+ symbolic_model['n_features'])
233
+ return np.dot(X_transformed, symbolic_model['coefficients'])
234
+
235
+ if __name__ == "__main__":
236
+ with open('outputs/california_housing_model.json', 'r') as f:
237
+ symbolic_model = json.load(f)
238
+ X = np.random.rand(10, symbolic_model['n_features'])
239
+ y_pred = predict(X, symbolic_model)
240
+ print(y_pred)
241
+ ```
242
+
243
+ 2. Classification Model:
244
+ ```python
245
+ import numpy as np
246
+ import json
247
+
248
+ def predict(X, symbolic_model):
249
+ X = np.asarray(X)
250
+ X_transformed = evaluate_basis_functions(X, symbolic_model['basis_functions'],
251
+ symbolic_model['n_features'])
252
+ logits = np.dot(X_transformed, np.array(symbolic_model['coefficients_list']).T)
253
+ probabilities = np.exp(logits) / np.sum(np.exp(logits), axis=1, keepdims=True)
254
+ return np.argmax(probabilities, axis=1)
255
+
256
+ if __name__ == "__main__":
257
+ with open('outputs/iris_model.json', 'r') as f:
258
+ symbolic_model = json.load(f)
259
+ X = np.array([[5.1, 3.5, 1.4, 0.2],
260
+ [7.0, 3.2, 4.7, 1.4],
261
+ [6.3, 3.3, 6.0, 2.5]])
262
+ y_pred = predict(X, symbolic_model)
263
+ print(y_pred)
264
+ ```
265
+
266
+
267
+
207
268
  ## Contributing
208
269
 
209
270
  We welcome contributions! Key areas of interest:
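Two notes on the Python compiler examples in the README hunk above. First, both call `evaluate_basis_functions` without importing it; it has to be brought into scope from the utility module added in this release.

Second, the classification example computes `np.exp(logits)` directly. In float64 that overflows to `inf` once a logit exceeds about 709, and the resulting `inf / inf` gives `nan` probabilities. Subtracting each row's maximum before exponentiating leaves the softmax unchanged and avoids the overflow. Because softmax is monotonic, `argmax` over the logits already gives the predicted class.

The sketch below applies both fixes and assumes the same `coefficients_list` layout (one row per class). `X_transformed` is taken as given, and `predict_from_basis` is a hypothetical name.

```python
import numpy as np

def softmax(logits):
    # Shift by the row-wise max: exp() never sees a large positive argument,
    # and the shift cancels in the normalisation.
    shifted = logits - np.max(logits, axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=1, keepdims=True)

def predict_from_basis(X_transformed, coefficients_list):
    """X_transformed: (n_samples, n_basis); coefficients_list: (n_classes, n_basis)."""
    logits = np.asarray(X_transformed) @ np.asarray(coefficients_list).T
    proba = softmax(logits)
    return np.argmax(logits, axis=1), proba  # argmax(logits) == argmax(proba)

X_transformed = np.array([[1.0, 800.0],
                          [1.0, -2.0]])
coefficients_list = [[0.0, 1.0],
                     [1.0, 0.0]]
labels, proba = predict_from_basis(X_transformed, coefficients_list)
print(labels)                    # [0 1]
print(np.isfinite(proba).all())  # True
```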
@@ -2,3 +2,4 @@ torch
2
2
  numpy
3
3
  scikit-learn
4
4
  tqdm
5
+ sympy
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
4
4
 
5
5
  [project]
6
6
  name = "oikan"
7
- version = "0.0.3.2"
7
+ version = "0.0.3.4"
8
8
  description = "OIKAN: Neuro-Symbolic ML for Scientific Discovery"
9
9
  readme = "README.md"
10
10
  authors = [{name = "Arman Zhalgasbayev"}]
@@ -12,7 +12,8 @@ dependencies = [
12
12
  "torch",
13
13
  "numpy",
14
14
  "scikit-learn",
15
- "tqdm"
15
+ "tqdm",
16
+ "sympy"
16
17
  ]
17
18
  requires-python = ">=3.7"
18
19
  license = {text = "MIT"}
@@ -7,6 +7,7 @@ setup(
7
7
  "torch",
8
8
  "numpy",
9
9
  "scikit-learn",
10
- "tqdm"
10
+ "tqdm",
11
+ "sympy"
11
12
  ]
12
13
  )
@@ -1,7 +0,0 @@
1
- class OIKANError(Exception):
2
- """Base exception for OIKAN library."""
3
- pass
4
-
5
- class ModelNotFittedError(OIKANError):
6
- """Raised when a method requires a fitted model."""
7
- pass
@@ -1,63 +0,0 @@
1
- import numpy as np
2
-
3
- def evaluate_basis_functions(X, basis_functions, n_features):
4
- """
5
- Evaluates basis functions on the input data.
6
-
7
- Parameters:
8
- -----------
9
- X : array-like of shape (n_samples, n_features)
10
- Input data.
11
- basis_functions : list
12
- List of basis function strings (e.g., '1', 'x0', 'x0^2', 'x0 x1').
13
- n_features : int
14
- Number of input features.
15
-
16
- Returns:
17
- --------
18
- X_transformed : ndarray of shape (n_samples, n_basis_functions)
19
- Transformed data matrix.
20
- """
21
- X_transformed = np.zeros((X.shape[0], len(basis_functions)))
22
- for i, func in enumerate(basis_functions):
23
- if func == '1':
24
- X_transformed[:, i] = 1
25
- elif '^' in func:
26
- var, power = func.split('^')
27
- idx = int(var[1:])
28
- X_transformed[:, i] = X[:, idx] ** int(power)
29
- elif ' ' in func:
30
- var1, var2 = func.split(' ')
31
- idx1 = int(var1[1:])
32
- idx2 = int(var2[1:])
33
- X_transformed[:, i] = X[:, idx1] * X[:, idx2]
34
- else:
35
- idx = int(func[1:])
36
- X_transformed[:, i] = X[:, idx]
37
- return X_transformed
38
-
39
- def get_features_involved(basis_function):
40
- """
41
- Extracts the feature indices involved in a basis function string.
42
-
43
- Parameters:
44
- -----------
45
- basis_function : str
46
- String representation of the basis function, e.g., 'x0', 'x0^2', 'x0 x1'.
47
-
48
- Returns:
49
- --------
50
- set : Set of feature indices involved.
51
- """
52
- if basis_function == '1': # Constant term involves no features
53
- return set()
54
- features = set()
55
- for part in basis_function.split(): # Split by space for interaction terms
56
- if part.startswith('x'):
57
- if '^' in part: # Handle powers, e.g., 'x0^2'
58
- var = part.split('^')[0] # Take 'x0'
59
- else:
60
- var = part # Take 'x0' as is
61
- idx = int(var[1:]) # Extract index, e.g., 0
62
- features.add(idx)
63
- return features
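The removed `evaluate_basis_functions` handled only two-factor interactions (`var1, var2 = func.split(' ')`). The new version multiplies any number of plain factors. However, both versions send every string containing '^' to the power branch. A mixed monomial such as 'x0^2 x1', the kind `PolynomialFeatures` names at degree 3, would therefore fail at `int('2 x1')` if it ever reached them. The new `get_features_involved` would also report only feature 0 for such a term.

The removed `get_features_involved` avoided this by splitting on spaces first and then stripping any power. The sketch below applies that order to evaluation as well. It covers only the monomial forms shown ('1', 'xN', 'xN^k', and products of these); the 'log1p_', 'exp_' and 'sin_' prefixes are left out. Both function names are hypothetical.

```python
import numpy as np

def evaluate_monomial(X, term):
    """Evaluate a monomial basis string such as '1', 'x0', 'x0^2' or 'x0^2 x1'."""
    X = np.asarray(X, dtype=float)
    result = np.ones(X.shape[0])
    if term == '1':
        return result
    for factor in term.split(' '):             # split the product first ...
        var, _, power = factor.partition('^')  # ... then any power on each factor
        idx = int(var[1:])                     # 'x12' -> 12
        result *= X[:, idx] ** (int(power) if power else 1)
    return result

def features_in_monomial(term):
    # Same space-then-power order, so every factor's feature index is found.
    if term == '1':
        return set()
    return {int(f.partition('^')[0][1:]) for f in term.split(' ')}

X = np.array([[2.0, 3.0],
              [1.0, -1.0]])
print(evaluate_monomial(X, 'x0^2 x1'))  # [12. -1.]
print(features_in_monomial('x0^2 x1'))  # {0, 1}
```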
File without changes
File without changes
File without changes
File without changes