metacountregressor 0.1.71__tar.gz → 0.1.86__tar.gz

Files changed (26)
  1. {metacountregressor-0.1.71 → metacountregressor-0.1.86}/PKG-INFO +78 -20
  2. {metacountregressor-0.1.71 → metacountregressor-0.1.86}/README.rst +97 -23
  3. metacountregressor-0.1.86/metacountregressor/data_split_helper.py +90 -0
  4. {metacountregressor-0.1.71 → metacountregressor-0.1.86}/metacountregressor/helperprocess.py +115 -0
  5. {metacountregressor-0.1.71 → metacountregressor-0.1.86}/metacountregressor/main.py +51 -72
  6. {metacountregressor-0.1.71 → metacountregressor-0.1.86}/metacountregressor/metaheuristics.py +25 -24
  7. {metacountregressor-0.1.71 → metacountregressor-0.1.86}/metacountregressor/solution.py +281 -694
  8. {metacountregressor-0.1.71 → metacountregressor-0.1.86}/metacountregressor.egg-info/PKG-INFO +78 -20
  9. {metacountregressor-0.1.71 → metacountregressor-0.1.86}/metacountregressor.egg-info/SOURCES.txt +1 -0
  10. {metacountregressor-0.1.71 → metacountregressor-0.1.86}/metacountregressor.egg-info/requires.txt +1 -0
  11. {metacountregressor-0.1.71 → metacountregressor-0.1.86}/setup.py +27 -4
  12. {metacountregressor-0.1.71 → metacountregressor-0.1.86}/LICENSE.txt +0 -0
  13. {metacountregressor-0.1.71 → metacountregressor-0.1.86}/metacountregressor/__init__.py +0 -0
  14. {metacountregressor-0.1.71 → metacountregressor-0.1.86}/metacountregressor/_device_cust.py +0 -0
  15. {metacountregressor-0.1.71 → metacountregressor-0.1.86}/metacountregressor/halton.py +0 -0
  16. {metacountregressor-0.1.71 → metacountregressor-0.1.86}/metacountregressor/main_old.py +0 -0
  17. {metacountregressor-0.1.71 → metacountregressor-0.1.86}/metacountregressor/pareto_file.py +0 -0
  18. {metacountregressor-0.1.71 → metacountregressor-0.1.86}/metacountregressor/pareto_logger__plot.py +0 -0
  19. {metacountregressor-0.1.71 → metacountregressor-0.1.86}/metacountregressor/setup.py +0 -0
  20. {metacountregressor-0.1.71 → metacountregressor-0.1.86}/metacountregressor/single_objective_finder.py +0 -0
  21. {metacountregressor-0.1.71 → metacountregressor-0.1.86}/metacountregressor/test_generated_paper2.py +0 -0
  22. {metacountregressor-0.1.71 → metacountregressor-0.1.86}/metacountregressor.egg-info/dependency_links.txt +0 -0
  23. {metacountregressor-0.1.71 → metacountregressor-0.1.86}/metacountregressor.egg-info/not-zip-safe +0 -0
  24. {metacountregressor-0.1.71 → metacountregressor-0.1.86}/metacountregressor.egg-info/top_level.txt +0 -0
  25. {metacountregressor-0.1.71 → metacountregressor-0.1.86}/setup.cfg +0 -0
  26. {metacountregressor-0.1.71 → metacountregressor-0.1.86}/tests/test.py +0 -0
@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: metacountregressor
-Version: 0.1.71
+Version: 0.1.86
 Summary: Extensions for a Python package for estimation of count models.
 Home-page: https://github.com/zahern/CountDataEstimation
 Author: Zeke Ahern
@@ -11,12 +11,18 @@ Description-Content-Type: text/markdown
 License-File: LICENSE.txt
 Requires-Dist: numpy>=1.13.1
 Requires-Dist: scipy>=1.0.0
+Requires-Dist: requests
 
 <div style="display: flex; align-items: center;">
-<img src="https://github.com/zahern/data/raw/main/m.png" alt="My Image" style="width: 200px; margin-right: 20px;">
+<img src="https://github.com/zahern/data/raw/main/m.png" alt="My Image" style="width: 100px; margin-right: 20px;">
 <p><span style="font-size: 60px;"><strong>MetaCountRegressor</strong></span></p>
 </div>
 
+# Tutorial also available as a Jupyter notebook
+[Download Example Notebook](https://github.com/zahern/CountDataEstimation/blob/main/Tutorial.ipynb)
+
+The tutorial provides more extensive examples of how to run the code and perform experiments. Further documentation is currently in development.
+
 ##### Quick Setup
 The code below demonstrates how to set up automatic optimization assisted by the harmony search algorithm. References to Differential Evolution and Simulated Annealing are also included (change accordingly).
 
@@ -35,8 +41,15 @@ from metacountregressor.solution import ObjectiveFunction
 from metacountregressor.metaheuristics import (harmony_search,
                                                differential_evolution,
                                                simulated_annealing)
+
+
 ```
 
+    loaded standard packages
+    loaded helper
+    testing
+
+
 
 #### Basic setup.
 The initial setup involves reading in the data and selecting an optimization algorithm. As the runtime progresses, new solutions will be continually evaluated. Finally, at the end of the runtime, the best solution will be identified and printed out. In the case of multiple objectives, all of the best solutions that belong to the Pareto frontier will be printed out.
@@ -53,7 +66,7 @@ X = df.drop(columns=['FREQ', 'ID', 'AADT'])
 
 #some example arguments; these are the defaults, so the following line is just for clarity. See the later arguments section for details.
 arguments = {'algorithm': 'hs', 'test_percentage': 0.15, 'test_complexity': 6, 'instance_number':1,
-             'val_percentage':0.15, 'obj_1': 'bic', '_obj_2': 'RMSE_TEST', "MAX_TIME": 6}
+             'val_percentage':0.15, 'obj_1': 'bic', '_obj_2': 'RMSE_TEST', "_max_time": 6}
 # Fit the model with metacountregressor
 obj_fun = ObjectiveFunction(X, y, **arguments)
 #replace with other metaheuristics if desired
@@ -71,7 +84,7 @@ Note: Please consider the main arguments to change.
 - `val_percentage`: This parameter represents the percentage of data used to validate the model. The value 0.15 corresponds to 15% of the data.
 - `test_complexity`: This parameter defines the complexity level for testing. The value 6 tests all complexities. Alternatively, you can provide a list of numbers to consider different complexities. The complexities are further explained later in this document.
 - `instance_number`: This parameter is used to give a name to the outputs.
-- `obj_1`: This parameter has multiple choices for obj_1, such as 'bic', 'aic', and 'hqic'. Only one choice should be defined as a string value.
+- `_obj_1`: This parameter has multiple choices for obj_1, such as 'bic', 'aic', and 'hqic'. Only one choice should be defined as a string value.
 - `_obj_2`: This parameter has multiple choices for objective 2, such as 'RMSE_TEST', 'MSE_TEST', and 'MAE_TEST'.
 - `_max_time`: This parameter specifies the maximum number of seconds for the total estimation before stopping.
 - `distribution`: This parameter is a list of distributions to consider. Select all of the options you want and put them into a list of valid options if you want to consider the distribution type when modelling with random parameters. The valid options include: 'Normal', 'LnNormal', 'Triangular', and 'Uniform'. (See the sketch after this hunk.)
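For illustration, here is a minimal sketch exercising the arguments documented above. The values are illustrative only, not recommendations; `X` and `y` are assumed to be defined as in the earlier setup code.

```python
# Hypothetical settings chosen purely to illustrate each documented argument.
arguments = {
    'algorithm': 'hs',                      # harmony search; 'de' and 'sa' are the documented alternatives
    'test_percentage': 0.2,                 # 20% of the data held out for testing
    'val_percentage': 0.1,                  # 10% held out for validation
    'test_complexity': [0, 1, 2],           # restrict the search to lower-complexity model forms
    'instance_number': 2,                   # names the output files
    '_obj_1': 'aic',                        # objective 1: a single string
    '_obj_2': 'MAE_TEST',                   # objective 2: an out-of-sample error measure
    '_max_time': 600,                       # stop estimation after 600 seconds
    'distribution': ['Normal', 'Uniform'],  # candidate random-parameter distributions
}
obj_fun = ObjectiveFunction(X, y, **arguments)
```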
@@ -80,7 +93,7 @@ Note: Please consider the main arguments to change.
 
 
 
-### An Example of changing the arguments.
+### Example of changing the arguments:
 Modify the arguments according to your preferences using the commented code as a guide.
 
 
@@ -108,16 +121,18 @@ Listed below is an example of how to specify an initial solution within the framework.
 
 
 ```python
-#Model Decisions, Specify for Intial Optimization
+#Model Decisions: specify the initial solution that will be optimised.
 manual_fit_spec = {
     'fixed_terms': ['SINGLE', 'LENGTH'],
     'rdm_terms': ['AADT:normal'],
-    'rdm_cor_terms': ['GRADEBR:uniform', 'CURVES:triangular'],
+    'rdm_cor_terms': ['GRADEBR:normal', 'CURVES:normal'],
     'grouped_terms': [],
     'hetro_in_means': ['ACCESS:normal', 'MINRAD:normal'],
     'transformations': ['no', 'no', 'log', 'no', 'no', 'no', 'no'],
-    'dispersion': 1
+    'dispersion': 0
 }
+
+
 #Search Arguments
 arguments = {
     'algorithm': 'hs',
@@ -129,7 +144,47 @@ arguments = {
 obj_fun = ObjectiveFunction(X, y, **arguments)
 ```
 
-simarly to return the results feed the objective function into a metaheuristic solution algorithm. An example of this is provided below:
+    Setup Complete...
+    Benchmaking test with Seed 42
+    --------------------------------------------------------------------------------
+    Log-Likelihood: -1339.1862434675106
+    --------------------------------------------------------------------------------
+    bic: 2732.31
+    --------------------------------------------------------------------------------
+    MSE: 650856.32
+    +--------------------------+--------+-------+----------+----------+------------+
+    | Effect                   | $\tau$ | Coeff | Std. Err | z-values | Prob |z|>Z |
+    +==========================+========+=======+==========+==========+============+
+    | LENGTH                   | no     | -0.15 | 0.01     | -12.98   | 0.00***    |
+    +--------------------------+--------+-------+----------+----------+------------+
+    | SINGLE                   | no     | -2.46 | 0.04     | -50.00   | 0.00***    |
+    +--------------------------+--------+-------+----------+----------+------------+
+    | GRADEBR                  | log    | 4.23  | 0.10     | 42.17    | 0.00***    |
+    +--------------------------+--------+-------+----------+----------+------------+
+    | CURVES                   | no     | 0.51  | 0.01     | 34.78    | 0.00***    |
+    +--------------------------+--------+-------+----------+----------+------------+
+    | Chol: GRADEBR (Std.      |        | 2.21  | 0.00     | 50.00    | 0.00***    |
+    | Dev. normal) )           |        |       |          |          |            |
+    +--------------------------+--------+-------+----------+----------+------------+
+    | Chol: CURVES (Std. Dev.  |        | -0.51 | 0.00     | -50.00   | 0.00***    |
+    | normal) )                |        |       |          |          |            |
+    +--------------------------+--------+-------+----------+----------+------------+
+    | Chol: CURVES (Std. Dev.  | no     | 0.55  | 0.00     | 50.00    | 0.00***    |
+    | normal) . GRADEBR (Std.  |        |       |          |          |            |
+    | Dev. normal )            |        |       |          |          |            |
+    +--------------------------+--------+-------+----------+----------+------------+
+    | main: MINRAD: hetro      | no     | -0.00 | 0.00     | -44.36   | 0.00***    |
+    | group 0                  |        |       |          |          |            |
+    +--------------------------+--------+-------+----------+----------+------------+
+    | ACCESS: hetro group 0    |        | 0.68  | 0.09     | 7.68     | 0.00***    |
+    +--------------------------+--------+-------+----------+----------+------------+
+    | main: MINRAD: hetro      |        | -0.00 | 0.00     | -44.86   | 0.00***    |
+    | group 0:normal:sd hetro  |        |       |          |          |            |
+    | group 0                  |        |       |          |          |            |
+    +--------------------------+--------+-------+----------+----------+------------+
+
+
+Similarly, to return the results, feed the objective function into a metaheuristic solution algorithm. An example of this is provided below:
 
 
 ```python
@@ -137,7 +192,7 @@ results = harmony_search(obj_fun)
 print(results)
 ```
 
-## Notes:
+# Notes:
 ### Capabilities of the software include:
 * Handling of Panel Data
 * Support for Data Transformations
@@ -155,11 +210,11 @@ Capability to handle heterogeneity in the means of the random parameters
 * Customization of Hyper-parameters to solve problems tailored to your dataset
 * Out-of-the-box optimization capability using default metaheuristics
 
-### Intreting the output of the model:
+### Interpreting the output of the model:
 A regression table is produced. The following text elements are explained:
 - Std. Dev.: This column appears for effects that are related to random parameters and displays the assumed distribution next to it.
 - Chol: This term refers to a Cholesky decomposition element, showing the correlation between two random parameters. The combination of the Cholesky element with itself is equivalent to a normal random parameter.
-- hetro group #: This term represents the heterogeneity group number, which refers all of the contributing factors that share hetrogentiy in the means to each other under the same numbered value.
+- hetro group: This term represents the heterogeneity group number, which refers to all of the contributing factors that share heterogeneity in the means with each other under the same numbered value.
 - $\tau$: This column displays the type of transformation that was applied to the specific contributing factor in the data.
 
 
@@ -211,10 +266,10 @@ The following list describes the arguments available in this function. By default,
 
 8. **`_max_time`**: This argument adds a termination time to the algorithm, in seconds. Note that the clock only starts after the initial population of solutions has been generated.
 
-# Example
+## Example: Assistance by Harmony Search
 
 
-Let's start by fitting very simple models, use those model sto help and define the objectives, then perform more of an extensive search on the variables that are identified more commonly
+Let's begin by fitting very simple models and use the structure of these models to define our objectives. Then, we can conduct a more extensive search on the variables that are more frequently identified. For instance, in the case below, the complexity is level 3, indicating that we will consider, at most, randomly correlated parameters. This approach is useful for initially identifying a suitable set of contributing factors for our search.
 
 
 
@@ -241,27 +296,30 @@ arguments = {
     '_max_time': 10000
 }
 obj_fun = ObjectiveFunction(X, y, **arguments)
-
 results = harmony_search(obj_fun)
 print(results)
 ```
 
+## Paper
+
+This tutorial accompanies our latest paper. A link to the current paper can be found here: [MetaCountRegressor](https://www.overleaf.com/read/mszwpwzcxsng#c5eb0c)
+
 ## Contact
 If you have any questions, ideas to improve MetaCountRegressor, or want to report a bug, just open a new issue in the [GitHub repository](https://github.com/zahern/CountDataEstimation).
 
 ## Citing MetaCountRegressor
 Please cite MetaCountRegressor as follows:
 
-Ahern, Z., Corry P., Paz A. (2023). MetaCountRegressor [Computer software]. [https://pypi.org/project/metacounregressor/](https://pypi.org/project/metacounregressor/)
+Ahern, Z., Corry P., Paz A. (2024). MetaCountRegressor [Computer software]. [https://pypi.org/project/metacounregressor/](https://pypi.org/project/metacounregressor/)
 
 Or using BibTeX as follows:
 
 ```bibtex
-@misc{Ahern2023,
-  author = {Zeke Ahern and Paul Corry and Alexander Paz},
+@misc{Ahern2024Meta,
+  author = {Zeke Ahern and Paul Corry and Alexander Paz},
   journal = {PyPi},
   title = {metacountregressor · PyPI},
-  url = {https://pypi.org/project/metacountregressor/0.1.47/},
-  year = {2023},
+  url = {https://pypi.org/project/metacountregressor/0.1.80/},
+  year = {2024},
 }
 
@@ -2,9 +2,18 @@
 
 ::
 
-    <img src="https://github.com/zahern/data/raw/main/m.png" alt="My Image" style="width: 200px; margin-right: 20px;">
+    <img src="https://github.com/zahern/data/raw/main/m.png" alt="My Image" style="width: 100px; margin-right: 20px;">
     <p><span style="font-size: 60px;"><strong>MetaCountRegressor</strong></span></p>
 
+Tutorial also available as a Jupyter notebook
+=============================================
+
+`Download Example
+Notebook <https://github.com/zahern/CountDataEstimation/blob/main/README.ipynb>`__
+
+The tutorial provides more extensive examples of how to run the code and
+perform experiments. Further documentation is currently in development.
+
 Quick Setup
 '''''''''''
 
@@ -28,6 +37,16 @@ Install ``metacountregressor`` using pip as follows:
     from metacountregressor.metaheuristics import (harmony_search,
                                                    differential_evolution,
                                                    simulated_annealing)
+
+
+
+
+.. parsed-literal::
+
+    loaded standard packages
+    loaded helper
+    testing
+
 
 Basic setup.
 ^^^^^^^^^^^^
@@ -52,7 +71,7 @@ the Pareto frontier.
 
     #some example arguments; these are the defaults, so the following line is just for clarity. See the later arguments section for details.
     arguments = {'algorithm': 'hs', 'test_percentage': 0.15, 'test_complexity': 6, 'instance_number':1,
-                 'val_percentage':0.15, 'obj_1': 'bic', '_obj_2': 'RMSE_TEST', "MAX_TIME": 6}
+                 'val_percentage':0.15, 'obj_1': 'bic', '_obj_2': 'RMSE_TEST', "_max_time": 6}
     # Fit the model with metacountregressor
     obj_fun = ObjectiveFunction(X, y, **arguments)
     #replace with other metaheuristics if desired
@@ -80,7 +99,7 @@ Note: Please consider the main arguments to change.
   complexities are further explained later in this document.
 - ``instance_number``: This parameter is used to give a name to the
   outputs.
-- ``obj_1``: This parameter has multiple choices for obj_1, such as
+- ``_obj_1``: This parameter has multiple choices for obj_1, such as
   'bic', 'aic', and 'hqic'. Only one choice should be defined as a
   string value.
 - ``_obj_2``: This parameter has multiple choices for objective 2, such
@@ -103,8 +122,8 @@ Note: Please consider the main arguments to change.
   valid options include: 'Normal', 'LnNormal', 'Triangular', and
   'Uniform'.
 
-An Example of changing the arguments.
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Example of changing the arguments:
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 Modify the arguments according to your preferences using the commented
 code as a guide.
@@ -139,16 +158,18 @@ modeling components may completely replace the initial solution.
 
 .. code:: ipython3
 
-    #Model Decisions, Specify for Intial Optimization
+    #Model Decisions: specify the initial solution that will be optimised.
     manual_fit_spec = {
         'fixed_terms': ['SINGLE', 'LENGTH'],
         'rdm_terms': ['AADT:normal'],
-        'rdm_cor_terms': ['GRADEBR:uniform', 'CURVES:triangular'],
+        'rdm_cor_terms': ['GRADEBR:normal', 'CURVES:normal'],
         'grouped_terms': [],
         'hetro_in_means': ['ACCESS:normal', 'MINRAD:normal'],
         'transformations': ['no', 'no', 'log', 'no', 'no', 'no', 'no'],
-        'dispersion': 1
+        'dispersion': 0
     }
+
+
     #Search Arguments
     arguments = {
         'algorithm': 'hs',
@@ -159,7 +180,50 @@ modeling components may completely replace the initial solution.
     }
     obj_fun = ObjectiveFunction(X, y, **arguments)
 
-simarly to return the results feed the objective function into a
+
+.. parsed-literal::
+
+    Setup Complete...
+    Benchmaking test with Seed 42
+    --------------------------------------------------------------------------------
+    Log-Likelihood: -1339.1862434675106
+    --------------------------------------------------------------------------------
+    bic: 2732.31
+    --------------------------------------------------------------------------------
+    MSE: 650856.32
+    +--------------------------+--------+-------+----------+----------+------------+
+    | Effect                   | $\tau$ | Coeff | Std. Err | z-values | Prob |z|>Z |
+    +==========================+========+=======+==========+==========+============+
+    | LENGTH                   | no     | -0.15 | 0.01     | -12.98   | 0.00***    |
+    +--------------------------+--------+-------+----------+----------+------------+
+    | SINGLE                   | no     | -2.46 | 0.04     | -50.00   | 0.00***    |
+    +--------------------------+--------+-------+----------+----------+------------+
+    | GRADEBR                  | log    | 4.23  | 0.10     | 42.17    | 0.00***    |
+    +--------------------------+--------+-------+----------+----------+------------+
+    | CURVES                   | no     | 0.51  | 0.01     | 34.78    | 0.00***    |
+    +--------------------------+--------+-------+----------+----------+------------+
+    | Chol: GRADEBR (Std.      |        | 2.21  | 0.00     | 50.00    | 0.00***    |
+    | Dev. normal) )           |        |       |          |          |            |
+    +--------------------------+--------+-------+----------+----------+------------+
+    | Chol: CURVES (Std. Dev.  |        | -0.51 | 0.00     | -50.00   | 0.00***    |
+    | normal) )                |        |       |          |          |            |
+    +--------------------------+--------+-------+----------+----------+------------+
+    | Chol: CURVES (Std. Dev.  | no     | 0.55  | 0.00     | 50.00    | 0.00***    |
+    | normal) . GRADEBR (Std.  |        |       |          |          |            |
+    | Dev. normal )            |        |       |          |          |            |
+    +--------------------------+--------+-------+----------+----------+------------+
+    | main: MINRAD: hetro      | no     | -0.00 | 0.00     | -44.36   | 0.00***    |
+    | group 0                  |        |       |          |          |            |
+    +--------------------------+--------+-------+----------+----------+------------+
+    | ACCESS: hetro group 0    |        | 0.68  | 0.09     | 7.68     | 0.00***    |
+    +--------------------------+--------+-------+----------+----------+------------+
+    | main: MINRAD: hetro      |        | -0.00 | 0.00     | -44.86   | 0.00***    |
+    | group 0:normal:sd hetro  |        |       |          |          |            |
+    | group 0                  |        |       |          |          |            |
+    +--------------------------+--------+-------+----------+----------+------------+
+
+
+Similarly, to return the results, feed the objective function into a
 metaheuristic solution algorithm. An example of this is provided below:
 
 .. code:: ipython3
@@ -168,7 +232,7 @@ metaheuristic solution algorithm. An example of this is provided below:
     print(results)
 
 Notes:
-------
+======
 
 Capabilities of the software include:
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -199,8 +263,8 @@ Capabilities of the software include:
   dataset
 - Out-of-the-box optimization capability using default metaheuristics
 
-Intreting the output of the model:
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Interpreting the output of the model:
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 A regression table is produced. The following text elements are
 explained: - Std. Dev.: This column appears for effects that are related
@@ -208,7 +272,7 @@ to random parameters and displays the assumed distribution
 next to it - Chol: This term refers to a Cholesky decomposition element,
 showing the correlation between two random parameters. The combination
 of the Cholesky element with itself is equivalent to a normal random
-parameter. - hetro group #: This term represents the heterogeneity group
+parameter. - hetro group: This term represents the heterogeneity group
 number, which refers to all of the contributing factors that share
 heterogeneity in the means with each other under the same numbered value. -
 :math:`\tau`: This column displays the type of transformation that was
@@ -299,12 +363,16 @@ considered. Example code will be provided later in this guide.
 dependent on the time after the initial population of solutions is
 generated.
 
-Example
-=======
+Example: Assistance by Harmony Search
+-------------------------------------
 
-Let's start by fitting very simple models, use those model sto help
-and define the objectives, then perform more of an extensive search on the
-variables that are identified more commonly
+Let's begin by fitting very simple models and use the structure of these
+models to define our objectives. Then, we can conduct a more extensive
+search on the variables that are more frequently identified. For
+instance, in the case below, the complexity is level 3, indicating that
+we will consider, at most, randomly correlated parameters. This approach
+is useful for initially identifying a suitable set of contributing
+factors for our search.
 
 .. code:: ipython3
 
@@ -330,10 +398,16 @@ variables that are identified more commonly
        '_max_time': 10000
     }
     obj_fun = ObjectiveFunction(X, y, **arguments)
-
     results = harmony_search(obj_fun)
     print(results)
 
+Paper
+-----
+
+This tutorial accompanies our latest paper. A link to the
+current paper can be found here:
+`MetaCountRegressor <https://www.overleaf.com/read/mszwpwzcxsng#c5eb0c>`__
+
 Contact
 -------
@@ -346,12 +420,12 @@ Citing MetaCountRegressor
 
 Please cite MetaCountRegressor as follows:
 
-Ahern, Z., Corry P., Paz A. (2023). MetaCountRegressor [Computer
+Ahern, Z., Corry P., Paz A. (2024). MetaCountRegressor [Computer
 software]. https://pypi.org/project/metacounregressor/
 
 Or using BibTeX as follows:
 
-\```bibtex @misc{Ahern2023, author = {Zeke Ahern and Paul Corry and
+\```bibtex @misc{Ahern2024Meta, author = {Zeke Ahern and Paul Corry and
 Alexander Paz}, journal = {PyPi}, title = {metacountregressor · PyPI},
-url = {https://pypi.org/project/metacountregressor/0.1.47/}, year =
-{2023}, }
+url = {https://pypi.org/project/metacountregressor/0.1.80/}, year =
+{2024}, }
@@ -0,0 +1,90 @@
+import numpy as np
+import pandas as pd
+
+
+class DataProcessor:
+    """Splits x/y data into training and test sets, optionally by panel ID."""
+
+    def __init__(self, x_data, y_data, kwargs):
+        self._obj_1 = kwargs.get('_obj_1')
+        self._obj_2 = kwargs.get('_obj_2')
+        self.test_percentage = float(kwargs.get('test_percentage', 0))
+        self.val_percentage = float(kwargs.get('val_percentage', 0))
+        self.is_multi = self.test_percentage != 0
+        self._x_data = x_data
+        self._y_data = y_data
+        self._process_data(kwargs)
+
+    def _process_data(self, kwargs):
+        # Hold out test data only when an out-of-sample objective is requested.
+        if self._obj_1 == 'MAE' or self._obj_2 in ["MAE", 'RMSE', 'MSE', 'RMSE_IN', 'RMSE_TEST']:
+            self._handle_special_conditions(kwargs)
+        else:
+            self._standard_data_partition()
+
+        self._characteristics_names = list(self._x_data.columns)
+        self._max_group_all_means = 1
+        self._exclude_this_test = [4]
+
+    def _handle_special_conditions(self, kwargs):
+        if 'panels' in kwargs:
+            self._process_panels_data(kwargs)
+        else:
+            self._standard_data_partition()
+
+    def _process_panels_data(self, kwargs):
+        group_key = kwargs['group']
+        panels_key = kwargs['panels']
+
+        # Encode groups as integer codes and re-rank panel IDs densely from 1.
+        self._x_data[group_key] = self._x_data[group_key].astype('category').cat.codes
+        try:
+            self._x_data[panels_key] = self._x_data[panels_key].rank(method='dense').astype(int)
+            self._x_data[panels_key] -= self._x_data[panels_key].min() - 1
+        except KeyError:
+            pass
+
+        # Split by panel ID so observations from one panel never straddle the split.
+        unique_ids = np.unique(self._x_data[panels_key])
+        training_size = int((1 - self.test_percentage - self.val_percentage) * len(unique_ids))
+        training_ids = np.random.choice(unique_ids, training_size, replace=False)
+
+        train_idx = self._x_data.index[self._x_data[panels_key].isin(training_ids)]
+        test_idx = self._x_data.index[~self._x_data[panels_key].isin(training_ids)]
+
+        self._create_datasets(train_idx, test_idx)
+
+    def _standard_data_partition(self):
+        # Plain row-wise split when no panel structure is supplied.
+        total_samples = len(self._x_data)
+        training_size = int((1 - self.test_percentage - self.val_percentage) * total_samples)
+        training_indices = np.random.choice(total_samples, training_size, replace=False)
+
+        train_idx = np.sort(training_indices)
+        test_idx = np.setdiff1d(np.arange(total_samples), training_indices)
+
+        self._create_datasets(train_idx, test_idx)
+
+    def _create_datasets(self, train_idx, test_idx):
+        self.df_train = self._x_data.loc[train_idx, :]
+        self.df_test = self._x_data.loc[test_idx, :]
+        self.y_train = self._y_data.loc[train_idx, :]
+        self.y_test = self._y_data.loc[test_idx, :]
+
+        self._x_data_test = self.df_test.copy()
+        self._y_data_test = self.y_test.astype('float').copy()
+        self._x_data = self.df_train.copy()
+        self._y_data = self.y_train.astype('float').copy()
+
+        # Handle different shapes
+        if self._x_data.ndim == 2:  # typical DataFrame
+            self._samples, self._characteristics = self._x_data.shape
+            self._panels = None
+        elif self._x_data.ndim == 3:  # 3D structure, e.g., a panel-shaped array
+            self._samples, self._panels, self._characteristics = self._x_data.shape
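For context, here is a minimal usage sketch of the new `DataProcessor`. The toy frame and column names are hypothetical; the class, its attributes, and the expected keyword dictionary are taken from the code above. Note that the dictionary is passed as a plain third positional argument, not as `**kwargs`.

```python
import numpy as np
import pandas as pd

from metacountregressor.data_split_helper import DataProcessor

# Toy count data; the column names are hypothetical.
rng = np.random.default_rng(42)
X = pd.DataFrame({'AADT': rng.integers(100, 10000, size=200),
                  'LENGTH': rng.random(200)})
y = pd.DataFrame({'FREQ': rng.poisson(3.0, size=200)})

# An out-of-sample objective such as 'RMSE_TEST' triggers the held-out split;
# without a 'panels' key the standard row-wise partition is used.
kwargs = {'_obj_1': 'bic', '_obj_2': 'RMSE_TEST',
          'test_percentage': 0.15, 'val_percentage': 0.15}
proc = DataProcessor(X, y, kwargs)
print(proc.df_train.shape, proc.df_test.shape)  # roughly a 70% / 30% row split
```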
@@ -5,6 +5,121 @@ import matplotlib.pyplot as plt
 
 plt.style.use('https://github.com/dhaitz/matplotlib-stylesheets/raw/master/pitayasmoothie-dark.mplstyle')
 
+## Select the best features based on univariate F-scores (SelectKBest).
+def select_features(X_train, y_train, n_f=16):
+    try:
+        from sklearn.feature_selection import SelectKBest
+        from sklearn.feature_selection import f_regression
+        feature_names = X_train.columns
+        # configure to select the n_f highest-scoring features
+        fs = SelectKBest(score_func=f_regression, k=n_f)
+
+        # learn relationship from training data
+        fs.fit(X_train, y_train)
+
+        mask = fs.get_support()  # Boolean array of selected features
+        selected_features = [feature for keep, feature in zip(mask, feature_names) if keep]
+        X_train = X_train[selected_features]
+    except ImportError:
+        print('import error, not performing feature selection')
+        fs = X_train.columns  # fall back to returning the column names unchanged
+
+    return X_train, fs
+
+
+# Cuts off correlated data.
+def findCorrelation(corr, cutoff=0.9, exact=None):
+    """
+    This function is the Python implementation of the R function
+    `findCorrelation()`.
+
+    Relies on numpy and pandas, so must have them pre-installed.
+
+    It searches through a correlation matrix and returns a list of column names
+    to remove to reduce pairwise correlations.
+
+    For the documentation of the R function, see
+    https://www.rdocumentation.org/packages/caret/topics/findCorrelation
+    and for the source code of `findCorrelation()`, see
+    https://github.com/topepo/caret/blob/master/pkg/caret/R/findCorrelation.R
+
+    -----------------------------------------------------------------------------
+    Parameters:
+    -----------
+    corr: pandas dataframe.
+        A correlation matrix as a pandas dataframe.
+    cutoff: float, default: 0.9.
+        A numeric value for the pairwise absolute correlation cutoff
+    exact: bool, default: None
+        A boolean value that determines whether the average correlations are
+        recomputed at each step
+    -----------------------------------------------------------------------------
+    Returns:
+    --------
+    list of column names
+    -----------------------------------------------------------------------------
+    Example:
+    --------
+    R1 = pd.DataFrame({
+        'x1': [1.0, 0.86, 0.56, 0.32, 0.85],
+        'x2': [0.86, 1.0, 0.01, 0.74, 0.32],
+        'x3': [0.56, 0.01, 1.0, 0.65, 0.91],
+        'x4': [0.32, 0.74, 0.65, 1.0, 0.36],
+        'x5': [0.85, 0.32, 0.91, 0.36, 1.0]
+    }, index=['x1', 'x2', 'x3', 'x4', 'x5'])
+
+    findCorrelation(R1, cutoff=0.6, exact=False)  # ['x4', 'x5', 'x1', 'x3']
+    findCorrelation(R1, cutoff=0.6, exact=True)   # ['x1', 'x5', 'x4']
+    """
+
+    def _findCorrelation_fast(corr, avg, cutoff):
+        # Index pairs in the upper triangle whose correlation exceeds the cutoff.
+        combsAboveCutoff = corr.where(lambda x: (np.tril(x) == 0) & (x > cutoff)).stack().index
+
+        rowsToCheck = combsAboveCutoff.get_level_values(0)
+        colsToCheck = combsAboveCutoff.get_level_values(1)
+
+        # From each offending pair, drop the member with the larger average correlation.
+        msk = avg[colsToCheck] > avg[rowsToCheck].values
+        deletecol = pd.unique(np.r_[colsToCheck[msk], rowsToCheck[~msk]]).tolist()
+
+        return deletecol
+
+    def _findCorrelation_exact(corr, avg, cutoff):
+        # Reorder rows/columns by decreasing average correlation.
+        x = corr.loc[(*[avg.sort_values(ascending=False).index] * 2,)]
+
+        if (x.dtypes.values[:, None] == ['int64', 'int32', 'int16', 'int8']).any():
+            x = x.astype(float)
+
+        # Blank the diagonal so self-correlations are ignored.
+        x.values[(*[np.arange(len(x))] * 2,)] = np.nan
+
+        deletecol = []
+        for ix, i in enumerate(x.columns[:-1]):
+            for j in x.columns[ix + 1:]:
+                if x.loc[i, j] > cutoff:
+                    if x[i].mean() > x[j].mean():
+                        deletecol.append(i)
+                        x.loc[i] = x[i] = np.nan
+                    else:
+                        deletecol.append(j)
+                        x.loc[j] = x[j] = np.nan
+
+        return deletecol
+
+    # Dispatch: the exact variant recomputes averages as columns are removed.
+    avg = corr.abs().mean()
+    if exact or (exact is None and corr.shape[0] < 100):
+        return _findCorrelation_exact(corr.abs(), avg, cutoff)
+    return _findCorrelation_fast(corr.abs(), avg, cutoff)
+
+
+def clean_data_types(df):
+    """Coerce object columns to numeric (non-convertible values become NaN)."""
+    for col in df.columns:
+        if df[col].dtype == 'object':
+            # Attempt to convert the column to numeric type
+            df[col] = pd.to_numeric(df[col], errors='coerce')
+    return df
+
 
 def drop_correlations(x_df, percentage=0.85):
     cor_matrix = x_df.corr().abs()
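Taken together, the helpers added to `helperprocess.py` support a small preprocessing pipeline. A usage sketch under the corrected versions above (the toy data and column names are hypothetical; the functions are the module's own):

```python
import pandas as pd
from metacountregressor import helperprocess

# Toy design matrix: a stringly-typed column plus a near-duplicate column.
df = pd.DataFrame({'AADT': ['100', '250', '400', '800', '950', '300'],
                   'LENGTH': [1.0, 2.5, 0.7, 3.1, 2.2, 1.4]})
df['LENGTH_COPY'] = df['LENGTH'] * 0.99          # perfectly correlated with LENGTH
y = pd.Series([0, 2, 1, 5, 4, 1])

df = helperprocess.clean_data_types(df)          # object column -> numeric
to_drop = helperprocess.findCorrelation(df.corr(), cutoff=0.9)
df = df.drop(columns=to_drop)                    # drop one of each correlated pair
X, fs = helperprocess.select_features(df, y, n_f=2)
print(X.columns.tolist())
```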