PyPI - oikan - Versions diffs - 0.0.3.1.tar.gz → 0.0.3.3.tar.gz - Mend

oikan 0.0.3.1.tar.gz → 0.0.3.3.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (19)

{oikan-0.0.3.1 → oikan-0.0.3.3}/PKG-INFO +18 -10
{oikan-0.0.3.1 → oikan-0.0.3.3}/README.md +17 -9
oikan-0.0.3.3/oikan/exceptions.py +31 -0
{oikan-0.0.3.1 → oikan-0.0.3.3}/oikan/model.py +217 -53
oikan-0.0.3.3/oikan/utils.py +82 -0
{oikan-0.0.3.1 → oikan-0.0.3.3}/oikan.egg-info/PKG-INFO +18 -10
{oikan-0.0.3.1 → oikan-0.0.3.3}/oikan.egg-info/SOURCES.txt +0 -1
{oikan-0.0.3.1 → oikan-0.0.3.3}/pyproject.toml +1 -1
oikan-0.0.3.1/oikan/exceptions.py +0 -7
oikan-0.0.3.1/oikan/symbolic.py +0 -55
oikan-0.0.3.1/oikan/utils.py +0 -63
{oikan-0.0.3.1 → oikan-0.0.3.3}/LICENSE +0 -0
{oikan-0.0.3.1 → oikan-0.0.3.3}/oikan/__init__.py +0 -0
{oikan-0.0.3.1 → oikan-0.0.3.3}/oikan/neural.py +0 -0
{oikan-0.0.3.1 → oikan-0.0.3.3}/oikan.egg-info/dependency_links.txt +0 -0
{oikan-0.0.3.1 → oikan-0.0.3.3}/oikan.egg-info/requires.txt +0 -0
{oikan-0.0.3.1 → oikan-0.0.3.3}/oikan.egg-info/top_level.txt +0 -0
{oikan-0.0.3.1 → oikan-0.0.3.3}/setup.cfg +0 -0
{oikan-0.0.3.1 → oikan-0.0.3.3}/setup.py +0 -0

{oikan-0.0.3.1 → oikan-0.0.3.3}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: oikan
-Version: 0.0.3.1
+Version: 0.0.3.3
 Summary: OIKAN: Neuro-Symbolic ML for Scientific Discovery
 Author: Arman Zhalgasbayev
 License: MIT
@@ -57,7 +57,7 @@ OIKAN implements a modern interpretation of the Kolmogorov-Arnold Representation
 2. **Neural Implementation**: OIKAN uses a specialized architecture combining:
    - Feature transformation layers with interpretable basis functions
-   - Symbolic regression for formula extraction
+   - Symbolic regression for formula extraction (ElasticNet-based)
    - Automatic pruning of insignificant terms
    ```python
@@ -76,15 +76,19 @@ OIKAN implements a modern interpretation of the Kolmogorov-Arnold Representation
    SYMBOLIC_FUNCTIONS = {
        'linear': 'x',           # Direct relationships
        'quadratic': 'x^2',      # Non-linear patterns
+       'cubic': 'x^3',         # Higher-order relationships
        'interaction': 'x_i x_j', # Feature interactions
-       'higher_order': 'x^n'    # Polynomial terms
+       'higher_order': 'x^n',    # Polynomial terms
+       'trigonometric': 'sin(x)', # Trigonometric functions
+       'exponential': 'exp(x)',  # Exponential growth
+       'logarithmic': 'log(x)'  # Logarithmic relationships
    }
    ```
 4. **Formula Extraction Process**:
    - Train neural network on raw data
    - Generate augmented samples for better coverage
-   - Perform L1-regularized symbolic regression
+   - Perform L1-regularized symbolic regression (alpha)
    - Prune terms with coefficients below threshold
    - Export human-readable mathematical expressions
@@ -115,12 +119,14 @@ model = OIKANRegressor(
     activation='relu', # Activation function (other options: 'tanh', 'leaky_relu', 'elu', 'swish', 'gelu')
     augmentation_factor=5, # Augmentation factor for data generation
     polynomial_degree=2, # Degree of polynomial basis functions
-    alpha=0.1, # L1 regularization strength
+    alpha=0.1, # L1 regularization strength (Symbolic regression)
     sigma=0.1, # Standard deviation of Gaussian noise for data augmentation
+    top_k=5, # Number of top features to select (Symbolic regression)
     epochs=100, # Number of training epochs
     lr=0.001, # Learning rate
     batch_size=32, # Batch size for training
-    verbose=True # Verbose output during training
+    verbose=True, # Verbose output during training
+    evaluate_nn=True # Validate neural network performance before full process
 )
 # Fit the model
@@ -163,12 +169,14 @@ model = OIKANClassifier(
     activation='relu', # Activation function (other options: 'tanh', 'leaky_relu', 'elu', 'swish', 'gelu')
     augmentation_factor=10, # Augmentation factor for data generation
     polynomial_degree=2, # Degree of polynomial basis functions
-    alpha=0.1, # L1 regularization strength
+    alpha=0.1, # L1 regularization strength (Symbolic regression)
     sigma=0.1, # Standard deviation of Gaussian noise for data augmentation
+    top_k=5, # Number of top features to select (Symbolic regression)
     epochs=100, # # Number of training epochs
     lr=0.001, # Learning rate
     batch_size=32, # Batch size for training
-    verbose=True # Verbose output during training
+    verbose=True, # Verbose output during training
+    evaluate_nn=True # Validate neural network performance before full process
 )
 # Fit the model
@@ -202,7 +210,7 @@ loaded_model.load("outputs/model.json")
 ### Architecture Diagram
-*Will be updated soon..*
+![OIKAN v0.0.3(1) Architecture](https://raw.githubusercontent.com/silvermete0r/oikan/main/docs/media/oikan-v0.0.3(1)-architecture-oop.png)
 ## Contributing
@@ -222,7 +230,7 @@ If you use OIKAN in your research, please cite:
 ```bibtex
 @software{oikan2025,
-  title = {OIKAN: Optimized Interpretable Kolmogorov-Arnold Networks},
+  title = {OIKAN: Neuro-Symbolic ML for Scientific Discovery},
   author = {Zhalgasbayev, Arman},
   year = {2025},
   url = {https://github.com/silvermete0r/OIKAN}

{oikan-0.0.3.1 → oikan-0.0.3.3}/README.md RENAMED

@@ -39,7 +39,7 @@ OIKAN implements a modern interpretation of the Kolmogorov-Arnold Representation
 2. **Neural Implementation**: OIKAN uses a specialized architecture combining:
    - Feature transformation layers with interpretable basis functions
-   - Symbolic regression for formula extraction
+   - Symbolic regression for formula extraction (ElasticNet-based)
    - Automatic pruning of insignificant terms
    ```python
@@ -58,15 +58,19 @@ OIKAN implements a modern interpretation of the Kolmogorov-Arnold Representation
    SYMBOLIC_FUNCTIONS = {
        'linear': 'x',           # Direct relationships
        'quadratic': 'x^2',      # Non-linear patterns
+       'cubic': 'x^3',         # Higher-order relationships
        'interaction': 'x_i x_j', # Feature interactions
-       'higher_order': 'x^n'    # Polynomial terms
+       'higher_order': 'x^n',    # Polynomial terms
+       'trigonometric': 'sin(x)', # Trigonometric functions
+       'exponential': 'exp(x)',  # Exponential growth
+       'logarithmic': 'log(x)'  # Logarithmic relationships
    }
    ```
 4. **Formula Extraction Process**:
    - Train neural network on raw data
    - Generate augmented samples for better coverage
-   - Perform L1-regularized symbolic regression
+   - Perform L1-regularized symbolic regression (alpha)
    - Prune terms with coefficients below threshold
    - Export human-readable mathematical expressions
@@ -97,12 +101,14 @@ model = OIKANRegressor(
     activation='relu', # Activation function (other options: 'tanh', 'leaky_relu', 'elu', 'swish', 'gelu')
     augmentation_factor=5, # Augmentation factor for data generation
     polynomial_degree=2, # Degree of polynomial basis functions
-    alpha=0.1, # L1 regularization strength
+    alpha=0.1, # L1 regularization strength (Symbolic regression)
     sigma=0.1, # Standard deviation of Gaussian noise for data augmentation
+    top_k=5, # Number of top features to select (Symbolic regression)
     epochs=100, # Number of training epochs
     lr=0.001, # Learning rate
     batch_size=32, # Batch size for training
-    verbose=True # Verbose output during training
+    verbose=True, # Verbose output during training
+    evaluate_nn=True # Validate neural network performance before full process
 )
 # Fit the model
@@ -145,12 +151,14 @@ model = OIKANClassifier(
     activation='relu', # Activation function (other options: 'tanh', 'leaky_relu', 'elu', 'swish', 'gelu')
     augmentation_factor=10, # Augmentation factor for data generation
     polynomial_degree=2, # Degree of polynomial basis functions
-    alpha=0.1, # L1 regularization strength
+    alpha=0.1, # L1 regularization strength (Symbolic regression)
     sigma=0.1, # Standard deviation of Gaussian noise for data augmentation
+    top_k=5, # Number of top features to select (Symbolic regression)
     epochs=100, # # Number of training epochs
     lr=0.001, # Learning rate
     batch_size=32, # Batch size for training
-    verbose=True # Verbose output during training
+    verbose=True, # Verbose output during training
+    evaluate_nn=True # Validate neural network performance before full process
 )
 # Fit the model
@@ -184,7 +192,7 @@ loaded_model.load("outputs/model.json")
 ### Architecture Diagram
-*Will be updated soon..*
+![OIKAN v0.0.3(1) Architecture](https://raw.githubusercontent.com/silvermete0r/oikan/main/docs/media/oikan-v0.0.3(1)-architecture-oop.png)
 ## Contributing
@@ -204,7 +212,7 @@ If you use OIKAN in your research, please cite:
 ```bibtex
 @software{oikan2025,
-  title = {OIKAN: Optimized Interpretable Kolmogorov-Arnold Networks},
+  title = {OIKAN: Neuro-Symbolic ML for Scientific Discovery},
   author = {Zhalgasbayev, Arman},
   year = {2025},
   url = {https://github.com/silvermete0r/OIKAN}
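The README lists the new families as `sin(x)`, `exp(x)` and `log(x)`, but the stage-2 features built in `model.py` are guarded variants: `log1p(|x|)`, so negative inputs are valid, and `exp` of the input clipped to [-10, 10] to avoid overflow, applied only to the top-k features. A sketch of that mapping, using a hypothetical `TRANSFORMS` table and `expand_features` helper that reproduce the `x{i}^3`, `log1p_x{i}`, `exp_x{i}` and `sin_x{i}` names used in the diff:

```python
import numpy as np

# Guarded transforms as they appear in the 0.0.3.3 refined stage.
# The table layout and helper name are illustrative, not OIKAN's API.
TRANSFORMS = {
    "{}^3": lambda x: x ** 3,
    "log1p_{}": lambda x: np.log1p(np.abs(x)),        # defined for negative x too
    "exp_{}": lambda x: np.exp(np.clip(x, -10, 10)),   # clipping avoids overflow
    "sin_{}": np.sin,
}

def expand_features(X, indices):
    """Return (columns, names) for every transform applied to each selected feature."""
    cols, names = [], []
    for i in indices:
        for template, fn in TRANSFORMS.items():
            cols.append(fn(X[:, i]))
            names.append(template.format(f"x{i}"))
    return np.column_stack(cols), names

X = np.array([[-2.0, 0.5], [100.0, -1.0]])
cols, names = expand_features(X, [0])
print(names)  # ['x0^3', 'log1p_x0', 'exp_x0', 'sin_x0']
```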

oikan-0.0.3.3/oikan/exceptions.py ADDED

@@ -0,0 +1,31 @@
+class OIKANError(Exception):
+    """Base exception for OIKAN library."""
+    pass
+class ModelNotFittedError(OIKANError):
+    """Raised when a method requires a fitted model."""
+    pass
+class InvalidParameterError(OIKANError):
+    """Raised when an invalid parameter value is provided."""
+    pass
+class DataDimensionError(OIKANError):
+    """Raised when input data has incorrect dimensions."""
+    pass
+class NumericalInstabilityError(OIKANError):
+    """Raised when numerical computations become unstable."""
+    pass
+class FeatureExtractionError(OIKANError):
+    """Raised when feature extraction or transformation fails."""
+    pass
+class ModelSerializationError(OIKANError):
+    """Raised when model saving/loading operations fail."""
+    pass
+class ConvergenceError(OIKANError):
+    """Raised when the model fails to converge during training."""
+    pass
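Every exception added in `oikan/exceptions.py` subclasses `OIKANError`, so one `except OIKANError` clause covers parameter validation, serialization and the other library-specific failures. The sketch mirrors three of the classes so it runs without the package installed, and `check_params`/`try_params` are hypothetical helpers that reproduce a few of the constructor checks added to `model.py`. In this release `get_formula` still raises a plain `ValueError` when the model is unfitted, so that path is not covered by an `OIKANError` handler.

```python
# Mirrors a subset of oikan/exceptions.py (0.0.3.3) so this sketch runs standalone;
# with the package installed these would be imported from oikan.exceptions instead.
class OIKANError(Exception):
    """Base exception for OIKAN library."""

class InvalidParameterError(OIKANError):
    """Raised when an invalid parameter value is provided."""

class ModelNotFittedError(OIKANError):
    """Raised when a method requires a fitted model."""

def check_params(top_k=5, lr=0.001, alpha=0.1, sigma=0.1):
    # Hypothetical helper reproducing a few of the constructor checks in model.py
    if not isinstance(top_k, int) or top_k < 1:
        raise InvalidParameterError("top_k must be a positive integer")
    if not 0 < lr < 1:
        raise InvalidParameterError("Learning rate must be between 0 and 1")
    if not 0 <= alpha <= 1:
        raise InvalidParameterError("alpha must be between 0 and 1")
    if sigma <= 0:
        raise InvalidParameterError("sigma must be positive")

def try_params(**kwargs):
    """Return None if the parameters pass, else 'ErrorClass: message'."""
    try:
        check_params(**kwargs)
    except OIKANError as err:  # one handler covers every OIKAN-specific subclass
        return f"{type(err).__name__}: {err}"
    return None

print(try_params(top_k=0))  # InvalidParameterError: top_k must be a positive integer
print(try_params(lr=0.01))  # None
```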

{oikan-0.0.3.1 → oikan-0.0.3.3}/oikan/model.py RENAMED

@@ -3,11 +3,15 @@ import torch
 import torch.nn as nn
 import torch.optim as optim
 from sklearn.preprocessing import PolynomialFeatures
-from sklearn.linear_model import Lasso
+from sklearn.linear_model import ElasticNet
 from abc import ABC, abstractmethod
 import json
 from .neural import TabularNet
 from .utils import evaluate_basis_functions, get_features_involved
+from sklearn.model_selection import train_test_split
+from sklearn.metrics import r2_score, accuracy_score
+from .exceptions import *
+import sys
 class OIKAN(ABC):
     """
@@ -18,7 +22,7 @@ class OIKAN(ABC):
     hidden_sizes : list, optional (default=[64, 64])
         List of hidden layer sizes for the neural network.
     activation : str, optional (default='relu')
-        Activation function for the neural network ('relu' or 'tanh').
+        Activation function for the neural network ('relu', 'tanh', 'leaky_relu', 'elu', 'swish', 'gelu').
     augmentation_factor : int, optional (default=10)
         Number of augmented samples per original sample.
     polynomial_degree : int, optional (default=2)
@@ -27,6 +31,8 @@ class OIKAN(ABC):
         L1 regularization strength for Lasso in symbolic regression.
     sigma : float, optional (default=0.1)
         Standard deviation of Gaussian noise for data augmentation.
+    top_k : int, optional (default=5)
+        Number of top features to select in hierarchical symbolic regression.
     epochs : int, optional (default=100)
         Number of epochs for neural network training.
     lr : float, optional (default=0.001)
@@ -35,10 +41,33 @@ class OIKAN(ABC):
         Batch size for neural network training.
     verbose : bool, optional (default=False)
         Whether to display training progress.
+    evaluate_nn : bool, optional (default=False)
+        Whether to evaluate neural network performance before full training.
     """
     def __init__(self, hidden_sizes=[64, 64], activation='relu', augmentation_factor=10,
                  polynomial_degree=2, alpha=0.1, sigma=0.1, epochs=100, lr=0.001, batch_size=32,
-                 verbose=False):
+                 verbose=False, evaluate_nn=False, top_k=5):
+        if not isinstance(hidden_sizes, list) or not all(isinstance(x, int) and x > 0 for x in hidden_sizes):
+            raise InvalidParameterError("hidden_sizes must be a list of positive integers")
+        if activation not in ['relu', 'tanh', 'leaky_relu', 'elu', 'swish', 'gelu']:
+            raise InvalidParameterError(f"Unsupported activation function: {activation}")
+        if not isinstance(augmentation_factor, int) or augmentation_factor < 1:
+            raise InvalidParameterError("augmentation_factor must be a positive integer")
+        if not isinstance(polynomial_degree, int) or polynomial_degree < 1:
+            raise InvalidParameterError("polynomial_degree must be a positive integer")
+        if not isinstance(top_k, int) or top_k < 1:
+            raise InvalidParameterError("top_k must be a positive integer")
+        if not 0 < lr < 1:
+            raise InvalidParameterError("Learning rate must be between 0 and 1")
+        if not isinstance(batch_size, int) or batch_size < 1:
+            raise InvalidParameterError("batch_size must be a positive integer")
+        if not isinstance(epochs, int) or epochs < 1:
+            raise InvalidParameterError("epochs must be a positive integer")
+        if not 0 <= alpha <= 1:
+            raise InvalidParameterError("alpha must be between 0 and 1")
+        if sigma <= 0:
+            raise InvalidParameterError("sigma must be positive")
         self.hidden_sizes = hidden_sizes
         self.activation = activation
         self.augmentation_factor = augmentation_factor
@@ -49,8 +78,11 @@ class OIKAN(ABC):
         self.lr = lr
         self.batch_size = batch_size
         self.verbose = verbose
+        self.evaluate_nn = evaluate_nn
+        self.top_k = top_k
         self.neural_net = None
         self.symbolic_model = None
+        self.evaluation_done = False
     @abstractmethod
     def fit(self, X, y):
@@ -61,19 +93,19 @@ class OIKAN(ABC):
         pass
     def get_formula(self):
-        """Returns the symbolic formula(s) as a string or list of strings."""
+        """Returns the symbolic formula(s) as a string (regression) or list of strings (classification)."""
         if self.symbolic_model is None:
             raise ValueError("Model not fitted yet.")
         basis_functions = self.symbolic_model['basis_functions']
         if 'coefficients' in self.symbolic_model:
             coefficients = self.symbolic_model['coefficients']
-            formula = " + ".join([f"{coefficients[i]:.3f}*{basis_functions[i]}"
+            formula = " + ".join([f"{coefficients[i]:.5f}*{basis_functions[i]}"
                                 for i in range(len(coefficients)) if coefficients[i] != 0])
             return formula if formula else "0"
         else:
             formulas = []
             for c, coef in enumerate(self.symbolic_model['coefficients_list']):
-                formula = " + ".join([f"{coef[i]:.3f}*{basis_functions[i]}"
+                formula = " + ".join([f"{coef[i]:.5f}*{basis_functions[i]}"
                                     for i in range(len(coef)) if coef[i] != 0])
                 formulas.append(f"Class {self.classes_[c]}: {formula if formula else '0'}")
             return formulas
@@ -122,27 +154,33 @@ class OIKAN(ABC):
             File path to save the model. Should end with .json
         """
         if self.symbolic_model is None:
-            raise ValueError("Model not fitted yet.")
+            raise ModelNotFittedError("Model must be fitted before saving")
         if not path.endswith('.json'):
             path = path + '.json'
-        # Convert numpy arrays and other non-serializable types to lists
-        model_data = {
-            'n_features': self.symbolic_model['n_features'],
-            'degree': self.symbolic_model['degree'],
-            'basis_functions': self.symbolic_model['basis_functions']
-        }
-        if 'coefficients' in self.symbolic_model:
-            model_data['coefficients'] = self.symbolic_model['coefficients']
-        else:
-            model_data['coefficients_list'] = [coef for coef in self.symbolic_model['coefficients_list']]
-            if hasattr(self, 'classes_'):
-                model_data['classes'] = self.classes_.tolist()
+        try:
+            # Convert numpy arrays and other non-serializable types to lists
+            model_data = {
+                'n_features': self.symbolic_model['n_features'],
+                'degree': self.symbolic_model['degree'],
+                'basis_functions': self.symbolic_model['basis_functions']
+            }
+            if 'coefficients' in self.symbolic_model:
+                model_data['coefficients'] = self.symbolic_model['coefficients']
+            else:
+                model_data['coefficients_list'] = [coef for coef in self.symbolic_model['coefficients_list']]
+                if hasattr(self, 'classes_'):
+                    model_data['classes'] = self.classes_.tolist()
+            with open(path, 'w') as f:
+                json.dump(model_data, f, indent=2)
+        except Exception as e:
+            raise ModelSerializationError(f"Failed to save model: {str(e)}")
-        with open(path, 'w') as f:
-            json.dump(model_data, f, indent=2)
+        if self.verbose:
+            print(f"Model saved to {path}")
     def load(self, path):
         """
@@ -155,27 +193,76 @@ class OIKAN(ABC):
         """
         if not path.endswith('.json'):
             path = path + '.json'
+        try:
+            with open(path, 'r') as f:
+                model_data = json.load(f)
+            self.symbolic_model = {
+                'n_features': model_data['n_features'],
+                'degree': model_data['degree'],
+                'basis_functions': model_data['basis_functions']
+            }
-        with open(path, 'r') as f:
-            model_data = json.load(f)
-        self.symbolic_model = {
-            'n_features': model_data['n_features'],
-            'degree': model_data['degree'],
-            'basis_functions': model_data['basis_functions']
-        }
+            if 'coefficients' in model_data:
+                self.symbolic_model['coefficients'] = model_data['coefficients']
+            else:
+                self.symbolic_model['coefficients_list'] = model_data['coefficients_list']
+                if 'classes' in model_data:
+                    self.classes_ = np.array(model_data['classes'])
+        except Exception as e:
+            raise ModelSerializationError(f"Failed to load model: {str(e)}")
-        if 'coefficients' in model_data:
-            self.symbolic_model['coefficients'] = model_data['coefficients']
-        else:
-            self.symbolic_model['coefficients_list'] = model_data['coefficients_list']
-            if 'classes' in model_data:
-                self.classes_ = np.array(model_data['classes'])
+        if self.verbose:
+            print(f"Model loaded from {path}")
+    def _evaluate_neural_net(self, X, y, output_size, loss_fn):
+        """Evaluates neural network performance on train-test split."""
+        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
+        input_size = X.shape[1]
+        self.neural_net = TabularNet(input_size, self.hidden_sizes, output_size, self.activation)
+        optimizer = optim.Adam(self.neural_net.parameters(), lr=self.lr)
+        # Train on the training set
+        self._train_neural_net(X_train, y_train, output_size, loss_fn)
+        # Evaluate on test set
+        self.neural_net.eval()
+        with torch.no_grad():
+            y_pred = self.neural_net(torch.tensor(X_test, dtype=torch.float32))
+            if output_size == 1:  # Regression
+                y_pred = y_pred.numpy()
+                score = r2_score(y_test, y_pred)
+                metric_name = "R² Score"
+            else:  # Classification
+                y_pred = torch.argmax(y_pred, dim=1).numpy()
+                y_test = torch.argmax(y_test, dim=1).numpy()
+                score = accuracy_score(y_test, y_pred)
+                metric_name = "Accuracy"
+        print(f"\nNeural Network Evaluation:")
+        print(f"Train size: {len(X_train)}, Test size: {len(X_test)}")
+        print(f"{metric_name}: {score:.4f}")
+        # Ask user for confirmation
+        response = input("\nProceed with full training and symbolic regression? [Y/n]: ").lower()
+        if response not in ['y', 'yes']:
+            sys.exit("Training cancelled by user.")
+        # Retrain on full dataset
+        self._train_neural_net(X, y, output_size, loss_fn)
     def _train_neural_net(self, X, y, output_size, loss_fn):
         """Trains the neural network on the input data."""
+        if self.evaluate_nn and not self.evaluation_done:
+            self.evaluation_done = True
+            self._evaluate_neural_net(X, y, output_size, loss_fn)
+            return
         input_size = X.shape[1]
-        self.neural_net = TabularNet(input_size, self.hidden_sizes, output_size, self.activation)
+        if self.neural_net is None:
+            self.neural_net = TabularNet(input_size, self.hidden_sizes, output_size, self.activation)
         optimizer = optim.Adam(self.neural_net.parameters(), lr=self.lr)
         dataset = torch.utils.data.TensorDataset(torch.tensor(X, dtype=torch.float32),
                                                torch.tensor(y, dtype=torch.float32))
@@ -203,7 +290,6 @@ class OIKAN(ABC):
     def _generate_augmented_data(self, X):
         """Generates augmented data by adding Gaussian noise."""
-        n_samples = X.shape[0]
         X_aug = []
         for _ in range(self.augmentation_factor):
             noise = np.random.normal(0, self.sigma, X.shape)
@@ -212,32 +298,102 @@ class OIKAN(ABC):
         return np.vstack(X_aug)
     def _perform_symbolic_regression(self, X, y):
-        """Performs symbolic regression using polynomial features and Lasso."""
-        poly = PolynomialFeatures(degree=self.polynomial_degree, include_bias=True)
-        X_poly = poly.fit_transform(X)
-        model = Lasso(alpha=self.alpha, fit_intercept=False)
-        model.fit(X_poly, y)
+        """
+        Performs hierarchical symbolic regression using a two-stage approach.
+        Parameters:
+        -----------
+        X : array-like of shape (n_samples, n_features)
+            Input data.
+        y : array-like of shape (n_samples,) or (n_samples, n_classes)
+            Target values or logits.
+        """
+        n_features = X.shape[1]
+        self.top_k = min(self.top_k, n_features)
+        if self.top_k < 1:
+            raise InvalidParameterError("top_k must be at least 1")
+        if np.any(np.isnan(X)) or np.any(np.isnan(y)):
+            raise NumericalInstabilityError("Input data contains NaN values")
+        if np.any(np.isinf(X)) or np.any(np.isinf(y)):
+            raise NumericalInstabilityError("Input data contains infinite values")
+        # Stage 1: Coarse Model
+        coarse_degree = 2  # Fixed low degree for coarse model
+        poly_coarse = PolynomialFeatures(degree=coarse_degree, include_bias=True)
+        X_poly_coarse = poly_coarse.fit_transform(X)
+        model_coarse = ElasticNet(alpha=self.alpha, fit_intercept=False)
+        model_coarse.fit(X_poly_coarse, y)
+        # Compute feature importances for original features
+        basis_functions_coarse = poly_coarse.get_feature_names_out()
         if len(y.shape) == 1 or y.shape[1] == 1:
-            coef = model.coef_.flatten()
-            selected_indices = np.where(np.abs(coef) > 1e-6)[0]
+            coef_coarse = model_coarse.coef_.flatten()
+        else:
+            coef_coarse = np.sum(np.abs(model_coarse.coef_), axis=0)
+        importances = np.zeros(X.shape[1])
+        for i, func in enumerate(basis_functions_coarse):
+            features_involved = get_features_involved(func)
+            for idx in features_involved:
+                importances[idx] += np.abs(coef_coarse[i])
+        if np.all(importances == 0):
+            raise FeatureExtractionError("Failed to compute feature importances - all values are zero")
+        # Select top K features
+        top_k_indices = np.argsort(importances)[::-1][:self.top_k]
+        # Stage 2: Refined Model
+        # ~ generate additional non-linear features for top K features
+        additional_features = []
+        additional_names = []
+        for i in top_k_indices:
+            # Higher-degree polynomial
+            additional_features.append(X[:, i]**3)
+            additional_names.append(f'x{i}^3')
+            # Non-linear transformations
+            additional_features.append(np.log1p(np.abs(X[:, i])))
+            additional_names.append(f'log1p_x{i}')
+            additional_features.append(np.exp(np.clip(X[:, i], -10, 10)))
+            additional_names.append(f'exp_x{i}')
+            additional_features.append(np.sin(X[:, i]))
+            additional_names.append(f'sin_x{i}')
+        # Combine features
+        X_additional = np.column_stack(additional_features)
+        X_refined = np.hstack([X_poly_coarse, X_additional])
+        basis_functions_refined = list(basis_functions_coarse) + additional_names
+        # Fit refined model
+        model_refined = ElasticNet(alpha=self.alpha, fit_intercept=False)
+        model_refined.fit(X_refined, y)
+        # Store symbolic model
+        if len(y.shape) == 1 or y.shape[1] == 1:
+            # Regression
+            coef_refined = model_refined.coef_.flatten()
+            selected_indices = np.where(np.abs(coef_refined) > 1e-6)[0]
             self.symbolic_model = {
                 'n_features': X.shape[1],
-                'degree': self.polynomial_degree,
-                'basis_functions': poly.get_feature_names_out()[selected_indices].tolist(),
-                'coefficients': coef[selected_indices].tolist()
+                'degree': self.polynomial_degree,
+                'basis_functions': [basis_functions_refined[i] for i in selected_indices],
+                'coefficients': coef_refined[selected_indices].tolist()
             }
         else:
+            # Classification
             coefficients_list = []
-            # Note: Using the same basis functions across classes for simplicity
             selected_indices = set()
             for c in range(y.shape[1]):
-                coef = model.coef_[c]
+                coef = model_refined.coef_[c]
                 indices = np.where(np.abs(coef) > 1e-6)[0]
                 selected_indices.update(indices)
             selected_indices = list(selected_indices)
-            basis_functions = poly.get_feature_names_out()[selected_indices].tolist()
+            basis_functions = [basis_functions_refined[i] for i in selected_indices]
             for c in range(y.shape[1]):
-                coef = model.coef_[c]
+                coef = model_refined.coef_[c]
                 coef_selected = coef[selected_indices].tolist()
                 coefficients_list.append(coef_selected)
             self.symbolic_model = {
@@ -263,10 +419,14 @@ class OIKANRegressor(OIKAN):
         X = np.asarray(X)
         y = np.asarray(y).reshape(-1, 1)
         self._train_neural_net(X, y, output_size=1, loss_fn=nn.MSELoss())
+        if self.verbose:
+            print(f"Original data: features shape: {X.shape} | target shape: {y.shape}")
         X_aug = self._generate_augmented_data(X)
         self.neural_net.eval()
         with torch.no_grad():
             y_aug = self.neural_net(torch.tensor(X_aug, dtype=torch.float32)).detach().numpy()
+        if self.verbose:
+            print(f"Augmented data: features shape: {X_aug.shape} | target shape: {y_aug.shape}")
         self._perform_symbolic_regression(X_aug, y_aug)
     def predict(self, X):
@@ -311,10 +471,14 @@ class OIKANClassifier(OIKAN):
         n_classes = len(self.classes_)
         y_onehot = nn.functional.one_hot(torch.tensor(y_encoded), num_classes=n_classes).float()
         self._train_neural_net(X, y_onehot, output_size=n_classes, loss_fn=nn.CrossEntropyLoss())
+        if self.verbose:
+            print(f"Original data: features shape: {X.shape} | target shape: {y.shape}")
         X_aug = self._generate_augmented_data(X)
         self.neural_net.eval()
         with torch.no_grad():
             logits_aug = self.neural_net(torch.tensor(X_aug, dtype=torch.float32)).detach().numpy()
+        if self.verbose:
+            print(f"Augmented data: features shape: {X_aug.shape} | target shape: {logits_aug.shape}")
         self._perform_symbolic_regression(X_aug, logits_aug)
     def predict(self, X):

oikan-0.0.3.3/oikan/utils.py ADDED

@@ -0,0 +1,82 @@
+import numpy as np
+def evaluate_basis_functions(X, basis_functions, n_features):
+    """
+    Evaluates basis functions on the input data.
+    Parameters:
+    -----------
+    X : array-like of shape (n_samples, n_features)
+        Input data.
+    basis_functions : list
+        List of basis function strings (e.g., '1', 'x0', 'x0^2', 'x0 x1', 'log1p_x0').
+    n_features : int
+        Number of input features.
+    Returns:
+    --------
+    X_transformed : ndarray of shape (n_samples, n_basis_functions)
+        Transformed data matrix.
+    """
+    X_transformed = np.zeros((X.shape[0], len(basis_functions)))
+    for i, func in enumerate(basis_functions):
+        if func == '1':
+            X_transformed[:, i] = 1
+        elif func.startswith('log1p_x'):
+            idx = int(func.split('_')[1][1:])
+            X_transformed[:, i] = np.log1p(np.abs(X[:, idx]))
+        elif func.startswith('exp_x'):
+            idx = int(func.split('_')[1][1:])
+            X_transformed[:, i] = np.exp(np.clip(X[:, idx], -10, 10))
+        elif func.startswith('sin_x'):
+            idx = int(func.split('_')[1][1:])
+            X_transformed[:, i] = np.sin(X[:, idx])
+        elif '^' in func:
+            var, power = func.split('^')
+            idx = int(var[1:])
+            X_transformed[:, i] = X[:, idx] ** int(power)
+        elif ' ' in func:
+            vars = func.split(' ')
+            result = np.ones(X.shape[0])
+            for var in vars:
+                idx = int(var[1:])
+                result *= X[:, idx]
+            X_transformed[:, i] = result
+        else:
+            idx = int(func[1:])
+            X_transformed[:, i] = X[:, idx]
+    return X_transformed
+def get_features_involved(basis_function):
+    """
+    Extracts the feature indices involved in a basis function string.
+    Parameters:
+    -----------
+    basis_function : str
+        String representation of the basis function, e.g., 'x0', 'x0^2', 'x0 x1', 'log1p_x0'.
+    Returns:
+    --------
+    set : Set of feature indices involved.
+    """
+    if basis_function == '1':
+        return set()
+    features = set()
+    if '_' in basis_function:  # Handle non-linear functions like 'log1p_x0'
+        parts = basis_function.split('_')
+        if len(parts) == 2 and parts[1].startswith('x'):
+            idx = int(parts[1][1:])
+            features.add(idx)
+    elif '^' in basis_function:  # Handle powers, e.g., 'x0^2'
+        var = basis_function.split('^')[0]
+        idx = int(var[1:])
+        features.add(idx)
+    elif ' ' in basis_function:  # Handle interactions, e.g., 'x0 x1'
+        for part in basis_function.split():
+            idx = int(part[1:])
+            features.add(idx)
+    elif basis_function.startswith('x'):
+        idx = int(basis_function[1:])
+        features.add(idx)
+    return features
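The new `evaluate_basis_functions` tests for a transform prefix first, then for `^`, then for spaces, and applies the first branch that matches. Suppose a name combines a power with a product, such as `'x0^2 x1'`, which scikit-learn's `PolynomialFeatures.get_feature_names_out` produces once the degree is 3 or higher. If such a name reaches that function, it enters the `^` branch and `int('2 x1')` raises `ValueError`. The sketch below is a hypothetical alternative, not OIKAN's code. It splits a name into space-separated factors first, then parses each factor as either a unary transform or `x<i>` with an optional `^<p>` exponent. The unary forms use the `log1p_`, `exp_`, and `sin_` prefixes from the code above. The helper name `evaluate_terms` is an assumption.

```python
import numpy as np

# Unary transforms keyed by the prefix used in names such as 'log1p_x0'.
# The abs() inside log1p and the exp clip to [-10, 10] mirror the branches above.
UNARY = {
    "log1p": lambda v: np.log1p(np.abs(v)),
    "exp": lambda v: np.exp(np.clip(v, -10, 10)),
    "sin": np.sin,
}


def _eval_factor(X, factor):
    """Evaluate one factor: 'x3', 'x3^2', or a unary form such as 'sin_x3'."""
    if "_" in factor:
        prefix, var = factor.split("_", 1)
        return UNARY[prefix](X[:, int(var[1:])])
    var, _, power = factor.partition("^")  # 'x3' -> ('x3', '', '')
    return X[:, int(var[1:])] ** (int(power) if power else 1)


def evaluate_terms(X, basis_functions):
    """Hypothetical token-based evaluator returning (n_samples, n_basis)."""
    X = np.asarray(X, dtype=float)
    out = np.ones((X.shape[0], len(basis_functions)))
    for i, name in enumerate(basis_functions):
        if name == "1":
            continue  # the constant column stays at 1
        for factor in name.split():  # a term is a product of its factors
            out[:, i] *= _eval_factor(X, factor)
    return out


X = np.array([[1.0, 2.0], [3.0, 0.5]])
print(evaluate_terms(X, ["1", "x0^2 x1", "x0 x1"]))
# -> [[1.  2.  2. ]
#     [1.  4.5 1.5]]
```

Because the code splits on whitespace before it parses `^`, pairwise interactions, higher-order products, and mixed power-and-product terms all go through the same path.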

{oikan-0.0.3.1 → oikan-0.0.3.3}/oikan.egg-info/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: oikan
-Version: 0.0.3.1
+Version: 0.0.3.3
 Summary: OIKAN: Neuro-Symbolic ML for Scientific Discovery
 Author: Arman Zhalgasbayev
 License: MIT
@@ -57,7 +57,7 @@ OIKAN implements a modern interpretation of the Kolmogorov-Arnold Representation
 2. **Neural Implementation**: OIKAN uses a specialized architecture combining:
    - Feature transformation layers with interpretable basis functions
-   - Symbolic regression for formula extraction
+   - Symbolic regression for formula extraction (ElasticNet-based)
    - Automatic pruning of insignificant terms
    ```python
@@ -76,15 +76,19 @@ OIKAN implements a modern interpretation of the Kolmogorov-Arnold Representation
    SYMBOLIC_FUNCTIONS = {
        'linear': 'x',           # Direct relationships
        'quadratic': 'x^2',      # Non-linear patterns
+       'cubic': 'x^3',         # Higher-order relationships
        'interaction': 'x_i x_j', # Feature interactions
-       'higher_order': 'x^n'    # Polynomial terms
+       'higher_order': 'x^n',    # Polynomial terms
+       'trigonometric': 'sin(x)', # Trigonometric functions
+       'exponential': 'exp(x)',  # Exponential growth
+       'logarithmic': 'log(x)'  # Logarithmic relationships
    }
    ```
 4. **Formula Extraction Process**:
    - Train neural network on raw data
    - Generate augmented samples for better coverage
-   - Perform L1-regularized symbolic regression
+   - Perform L1-regularized symbolic regression (alpha)
    - Prune terms with coefficients below threshold
    - Export human-readable mathematical expressions
@@ -115,12 +119,14 @@ model = OIKANRegressor(
     activation='relu', # Activation function (other options: 'tanh', 'leaky_relu', 'elu', 'swish', 'gelu')
     augmentation_factor=5, # Augmentation factor for data generation
     polynomial_degree=2, # Degree of polynomial basis functions
-    alpha=0.1, # L1 regularization strength
+    alpha=0.1, # L1 regularization strength (Symbolic regression)
     sigma=0.1, # Standard deviation of Gaussian noise for data augmentation
+    top_k=5, # Number of top features to select (Symbolic regression)
     epochs=100, # Number of training epochs
     lr=0.001, # Learning rate
     batch_size=32, # Batch size for training
-    verbose=True # Verbose output during training
+    verbose=True, # Verbose output during training
+    evaluate_nn=True # Validate neural network performance before full process
 )
 # Fit the model
@@ -163,12 +169,14 @@ model = OIKANClassifier(
     activation='relu', # Activation function (other options: 'tanh', 'leaky_relu', 'elu', 'swish', 'gelu')
     augmentation_factor=10, # Augmentation factor for data generation
     polynomial_degree=2, # Degree of polynomial basis functions
-    alpha=0.1, # L1 regularization strength
+    alpha=0.1, # L1 regularization strength (Symbolic regression)
     sigma=0.1, # Standard deviation of Gaussian noise for data augmentation
+    top_k=5, # Number of top features to select (Symbolic regression)
     epochs=100, # # Number of training epochs
     lr=0.001, # Learning rate
     batch_size=32, # Batch size for training
-    verbose=True # Verbose output during training
+    verbose=True, # Verbose output during training
+    evaluate_nn=True # Validate neural network performance before full process
 )
 # Fit the model
@@ -202,7 +210,7 @@ loaded_model.load("outputs/model.json")
 ### Architecture Diagram
-*Will be updated soon..*
+![OIKAN v0.0.3(1) Architecture](https://raw.githubusercontent.com/silvermete0r/oikan/main/docs/media/oikan-v0.0.3(1)-architecture-oop.png)
 ## Contributing
@@ -222,7 +230,7 @@ If you use OIKAN in your research, please cite:
 ```bibtex
 @software{oikan2025,
-  title = {OIKAN: Optimized Interpretable Kolmogorov-Arnold Networks},
+  title = {OIKAN: Neuro-Symbolic ML for Scientific Discovery},
   author = {Zhalgasbayev, Arman},
   year = {2025},
   url = {https://github.com/silvermete0r/OIKAN}
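The README lists the extraction steps: train a network, generate augmented samples, fit a sparse regression over a basis library (now described as ElasticNet-based), prune small coefficients, and export a formula. Below is a minimal sketch of the steps after training. It rests on several assumptions, and none of it is OIKAN's implementation:

- A known function `teacher` stands in for the trained network.
- Augmentation makes `augmentation_factor` copies of the inputs jittered with Gaussian noise of standard deviation `sigma`.
- `build_library` is a hypothetical hand-built basis using the `x0`, `x0^2`, `x0 x1`, `log1p_x0` naming from `oikan/utils.py`.
- scikit-learn's `ElasticNet` is the regressor.
- `threshold` is a hypothetical pruning cutoff.

```python
import numpy as np
from sklearn.linear_model import ElasticNet


def build_library(X):
    """Hypothetical basis library: linear, square, log1p and pairwise terms."""
    n = X.shape[1]
    cols, names = [], []
    for i in range(n):
        cols += [X[:, i], X[:, i] ** 2, np.log1p(np.abs(X[:, i]))]
        names += [f"x{i}", f"x{i}^2", f"log1p_x{i}"]
    for i in range(n):
        for j in range(i + 1, n):
            cols.append(X[:, i] * X[:, j])
            names.append(f"x{i} x{j}")
    return np.column_stack(cols), names


def extract_formula(teacher, X, augmentation_factor=5, sigma=0.1,
                    alpha=1e-3, l1_ratio=0.9, threshold=1e-2, seed=0):
    rng = np.random.default_rng(seed)
    # Augmentation: jittered copies of the inputs for better coverage.
    X_aug = np.vstack([X + rng.normal(0.0, sigma, X.shape)
                       for _ in range(augmentation_factor)])
    y_aug = teacher(X_aug)  # label augmented points with the surrogate model
    # Sparse regression over the basis library.
    Phi, names = build_library(X_aug)
    reg = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, max_iter=50_000)
    reg.fit(Phi, y_aug)
    # Pruning: drop terms whose coefficient magnitude is below threshold.
    terms = {nm: c for nm, c in zip(names, reg.coef_) if abs(c) >= threshold}
    return terms, float(reg.intercept_)


def format_formula(terms, intercept):
    """Export: render the surviving terms as a readable expression."""
    parts = [f"{c:+.3f}*{nm}" for nm, c in terms.items()]
    return f"{intercept:.3f} " + " ".join(parts)


def teacher(Z):
    # Hypothetical stand-in for the trained network's predictions.
    return 1.5 * Z[:, 0] ** 2 - 2.0 * Z[:, 0] * Z[:, 1]


X = np.random.default_rng(1).uniform(-2, 2, size=(200, 2))
terms, b = extract_formula(teacher, X)
print(format_formula(terms, b))
```

The L1 share of the penalty (`l1_ratio`) zeroes most library columns, and the small L2 share keeps the fit stable when columns are correlated, such as `x0^2` and `log1p_x0`.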

{oikan-0.0.3.1 → oikan-0.0.3.3}/oikan.egg-info/SOURCES.txt RENAMED

@@ -6,7 +6,6 @@ oikan/__init__.py
 oikan/exceptions.py
 oikan/model.py
 oikan/neural.py
-oikan/symbolic.py
 oikan/utils.py
 oikan.egg-info/PKG-INFO
 oikan.egg-info/SOURCES.txt

{oikan-0.0.3.1 → oikan-0.0.3.3}/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 [project]
 name = "oikan"
-version = "0.0.3.1"
+version = "0.0.3.3"
 description = "OIKAN: Neuro-Symbolic ML for Scientific Discovery"
 readme = "README.md"
 authors = [{name = "Arman Zhalgasbayev"}]

oikan-0.0.3.1/oikan/exceptions.py DELETED

@@ -1,7 +0,0 @@
-class OIKANError(Exception):
-    """Base exception for OIKAN library."""
-    pass
-class ModelNotFittedError(OIKANError):
-    """Raised when a method requires a fitted model."""
-    pass
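The deleted 0.0.3.1 module defined a two-level hierarchy. `ModelNotFittedError` subclasses `OIKANError`, so a caller can catch the specific case, or any library error through the base class. The 31-line replacement in 0.0.3.3 is not shown in this diff. The sketch below therefore only illustrates the pattern with the two classes above, and `TinyModel` is a hypothetical stand-in, not an OIKAN class.

```python
class OIKANError(Exception):
    """Base exception for OIKAN library."""


class ModelNotFittedError(OIKANError):
    """Raised when a method requires a fitted model."""


class TinyModel:
    """Hypothetical model that guards predict() behind fit()."""

    def __init__(self):
        self.coef_ = None

    def fit(self, xs, ys):
        # Least-squares slope through the origin; just enough state to be "fitted".
        self.coef_ = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
        return self

    def predict(self, xs):
        if self.coef_ is None:
            raise ModelNotFittedError("Call fit() before predict().")
        return [self.coef_ * x for x in xs]


try:
    TinyModel().predict([1.0])
except OIKANError as err:  # the base class also catches ModelNotFittedError
    print(type(err).__name__, err)  # ModelNotFittedError Call fit() before predict().
```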

oikan-0.0.3.1/oikan/symbolic.py DELETED

@@ -1,55 +0,0 @@
-import numpy as np
-from sklearn.preprocessing import PolynomialFeatures
-from sklearn.linear_model import Lasso
-def symbolic_regression(X, y, degree=2, alpha=0.1):
-    """
-    Performs symbolic regression on the input data.
-    Parameters:
-    -----------
-    X : array-like of shape (n_samples, n_features)
-        Input data.
-    y : array-like of shape (n_samples,) or (n_samples, n_targets)
-        Target values.
-    degree : int, optional (default=2)
-        Maximum polynomial degree.
-    alpha : float, optional (default=0.1)
-        L1 regularization strength.
-    Returns:
-    --------
-    dict : Contains 'basis_functions', 'coefficients' (or 'coefficients_list'), 'n_features', 'degree'
-    """
-    poly = PolynomialFeatures(degree=degree, include_bias=True)
-    X_poly = poly.fit_transform(X)
-    model = Lasso(alpha=alpha, fit_intercept=False)
-    model.fit(X_poly, y)
-    if len(y.shape) == 1 or y.shape[1] == 1:
-        coef = model.coef_.flatten()
-        selected_indices = np.where(np.abs(coef) > 1e-6)[0]
-        return {
-            'n_features': X.shape[1],
-            'degree': degree,
-            'basis_functions': poly.get_feature_names_out()[selected_indices].tolist(),
-            'coefficients': coef[selected_indices].tolist()
-        }
-    else:
-        coefficients_list = []
-        selected_indices = set()
-        for c in range(y.shape[1]):
-            coef = model.coef_[c]
-            indices = np.where(np.abs(coef) > 1e-6)[0]
-            selected_indices.update(indices)
-        selected_indices = list(selected_indices)
-        basis_functions = poly.get_feature_names_out()[selected_indices].tolist()
-        for c in range(y.shape[1]):
-            coef = model.coef_[c]
-            coef_selected = coef[selected_indices].tolist()
-            coefficients_list.append(coef_selected)
-        return {
-            'n_features': X.shape[1],
-            'degree': degree,
-            'basis_functions': basis_functions,
-            'coefficients_list': coefficients_list
-        }

oikan-0.0.3.1/oikan/utils.py DELETED

@@ -1,63 +0,0 @@
-import numpy as np
-def evaluate_basis_functions(X, basis_functions, n_features):
-    """
-    Evaluates basis functions on the input data.
-    Parameters:
-    -----------
-    X : array-like of shape (n_samples, n_features)
-        Input data.
-    basis_functions : list
-        List of basis function strings (e.g., '1', 'x0', 'x0^2', 'x0 x1').
-    n_features : int
-        Number of input features.
-    Returns:
-    --------
-    X_transformed : ndarray of shape (n_samples, n_basis_functions)
-        Transformed data matrix.
-    """
-    X_transformed = np.zeros((X.shape[0], len(basis_functions)))
-    for i, func in enumerate(basis_functions):
-        if func == '1':
-            X_transformed[:, i] = 1
-        elif '^' in func:
-            var, power = func.split('^')
-            idx = int(var[1:])
-            X_transformed[:, i] = X[:, idx] ** int(power)
-        elif ' ' in func:
-            var1, var2 = func.split(' ')
-            idx1 = int(var1[1:])
-            idx2 = int(var2[1:])
-            X_transformed[:, i] = X[:, idx1] * X[:, idx2]
-        else:
-            idx = int(func[1:])
-            X_transformed[:, i] = X[:, idx]
-    return X_transformed
-def get_features_involved(basis_function):
-    """
-    Extracts the feature indices involved in a basis function string.
-    Parameters:
-    -----------
-    basis_function : str
-        String representation of the basis function, e.g., 'x0', 'x0^2', 'x0 x1'.
-    Returns:
-    --------
-    set : Set of feature indices involved.
-    """
-    if basis_function == '1':  # Constant term involves no features
-        return set()
-    features = set()
-    for part in basis_function.split():  # Split by space for interaction terms
-        if part.startswith('x'):
-            if '^' in part:  # Handle powers, e.g., 'x0^2'
-                var = part.split('^')[0]  # Take 'x0'
-            else:
-                var = part  # Take 'x0' as is
-            idx = int(var[1:])  # Extract index, e.g., 0
-            features.add(idx)
-    return features
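The two versions of `get_features_involved` cover different name shapes.

- **Deleted version:** it splits on whitespace first and then strips a `^` exponent from each token, so `'x0^2 x1'` yields `{0, 1}`. A prefixed name such as `'log1p_x0'` fails its `startswith('x')` check and yields an empty set.
- **New version:** it handles the prefixes but checks `^` before spaces, so `'x0^2 x1'` enters the `'^'` branch and returns only `{0}`.

A regular-expression sketch that collects every `x<digits>` token covers both shapes. It assumes variables are always named `x` followed by an index, as in every example above, and the helper name `features_involved` is hypothetical.

```python
import re

# 'x' followed by one or more digits; the 'x' in 'exp' is skipped because 'p' follows it.
_VAR = re.compile(r"x(\d+)")


def features_involved(basis_function: str) -> set[int]:
    """Hypothetical helper: return every feature index named in a basis function."""
    return {int(idx) for idx in _VAR.findall(basis_function)}


print(features_involved("1"))         # set()
print(features_involved("x0^2 x1"))   # {0, 1}
print(features_involved("exp_x3"))    # {3}
```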