PyPI - metacountregressor - Versions diffs - 0.1.91__tar.gz → 0.1.101__tar.gz - Mend


This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (27)

{metacountregressor-0.1.91 → metacountregressor-0.1.101}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: metacountregressor
-Version: 0.1.91
+Version: 0.1.101
 Summary: Extensions for a Python package for estimation of count models.
 Home-page: https://github.com/zahern/CountDataEstimation
 Author: Zeke Ahern
@@ -274,6 +274,8 @@ Let's begin by fitting very simple models and use the structure of these models
 ```python
+'''Setup Data'''
 df = pd.read_csv(
 "https://raw.githubusercontent.com/zahern/data/main/Ex-16-3.csv")
 X = df
@@ -281,25 +283,158 @@ y = df['FREQ']  # Frequency of crashes
 X['Offset'] = np.log(df['AADT']) # Explicitly define the offset; no offset is used otherwise
 # Drop Y, the raw offset variable (AADT) and ID, as there are no panels
 X = df.drop(columns=['FREQ', 'ID', 'AADT'])
+'''Arguments for Solution'''
 arguments = {
-        'algorithm': 'hs', #alternatively input 'de', or 'sa'
-        'is_multi': 1,
+        'is_multi': 1, # whether two objectives are considered
         'test_percentage': 0.2, # used in multi-objective optimisation only. Saves 20% of data for testing.
        'val_percentage': 0.2, # Saves 20% of data for validation.
         'test_complexity': 3, # For Very simple Models
         'obj_1': 'BIC', '_obj_2': 'RMSE_TEST',
-        'instance_number': 'name', # used for creeating a named folder where your models are saved into from the directory
+        'instance_number': 'hs_run', # used for creating a named folder, in the working directory, where your models are saved
         'distribution': ['Normal'],
-        'Model': [0],  # or equivalently ['POS', 'NB']
+        'Model': [0, 1],  # or equivalently ['POS', 'NB']
         'transformations': ['no', 'sqrt', 'archsinh'],
         '_max_time': 10000
-    }
+}
+'''Arguments for the solution algorithm'''
+argument_hs = {
+    '_hms': 20, # harmony memory size
+    '_mpai': 1, # adjustment index
+    '_par': 0.3, # pitch adjustment rate
+    '_hmcr': .5 # harmony memory considering rate
+}
 obj_fun = ObjectiveFunction(X, y, **arguments)
-results = harmony_search(obj_fun)
+results = harmony_search(obj_fun, None, argument_hs)
 print(results)
 ```
+## Example: Using Differential Evolution and Simulated Annealing
+Similar to the above example, we only need to change the hyperparameters; the obj_fun can remain the same.
+```python
+argument_de = {'_AI': 2,
+            '_crossover_perc': .2,
+            '_max_iter': 1000,
+            '_pop_size': 25
+}
+de_results = differential_evolution(obj_fun, None, **argument_de)
+print(de_results)
+args_sa = {'alpha': .99,
+        'STEPS_PER_TEMP': 10,
+        'INTL_ACPT': 0.5,
+        '_crossover_perc': .3,
+        'MAX_ITERATIONS': 1000,
+        '_num_intl_slns': 25,
+}
+sa_results = simulated_annealing(obj_fun, None, **args_sa)
+print(sa_results)
+```
+## Comparing to statsmodels
+The following example illustrates how the output compares to well-known packages, including Statsmodels.
+```python
+# Load modules and data
+import statsmodels.api as sm
+from metacountregressor.solution import ObjectiveFunction
+data = sm.datasets.sunspots.load_pandas().data
+data_exog = sm.add_constant(data['YEAR'])
+data_endog = data['SUNACTIVITY']
+# Fit a negative binomial (NB2) model with the default log link.
+nb_model = sm.NegativeBinomial(data_endog, data_exog)
+nb_results = nb_model.fit()
+print(nb_results.summary())
+# Now let's compare this to MetaCountRegressor
+# Model decisions
+manual_fit_spec = {
+    'fixed_terms': ['const','YEAR'],
+    'rdm_terms': [],
+    'rdm_cor_terms': [],
+    'grouped_terms': [],
+    'hetro_in_means': [],
+    'transformations': ['no', 'no'],
+    'dispersion': 1 #Negative Binomial
+}
+#Arguments
+arguments = {
+    'algorithm': 'hs',
+    'test_percentage': 0,
+    'test_complexity': 6,
+    'instance_number': 'name',
+    'Manual_Fit': manual_fit_spec
+}
+obj_fun = ObjectiveFunction(data_exog, data_endog, **arguments)
+```
+    Optimization terminated successfully.
+             Current function value: 4.877748
+             Iterations: 22
+             Function evaluations: 71
+             Gradient evaluations: 70
+                         NegativeBinomial Regression Results
+    ==============================================================================
+    Dep. Variable:            SUNACTIVITY   No. Observations:                  309
+    Model:               NegativeBinomial   Df Residuals:                      307
+    Method:                           MLE   Df Model:                            1
+    Date:                Tue, 13 Aug 2024   Pseudo R-squ.:                0.004087
+    Time:                        14:13:22   Log-Likelihood:                -1507.2
+    converged:                       True   LL-Null:                       -1513.4
+    Covariance Type:            nonrobust   LLR p-value:                 0.0004363
+    ==============================================================================
+                     coef    std err          z      P>|z|      [0.025      0.975]
+    ------------------------------------------------------------------------------
+    const          0.2913      1.017      0.287      0.774      -1.701       2.284
+    YEAR           0.0019      0.001      3.546      0.000       0.001       0.003
+    alpha          0.7339      0.057     12.910      0.000       0.622       0.845
+    ==============================================================================
+    0.1.88
+    Setup Complete...
+    Benchmaking test with Seed 42
+    1
+    --------------------------------------------------------------------------------
+    Log-Likelihood:  -1509.0683662284273
+    --------------------------------------------------------------------------------
+    bic: 3035.84
+    --------------------------------------------------------------------------------
+    MSE: 10000000.00
+    +--------+--------+-------+----------+----------+------------+
+    | Effect | $\tau$ | Coeff | Std. Err | z-values | Prob |z|>Z |
+    +========+========+=======+==========+==========+============+
+    | const  | no     | 0.10  |   0.25   |   0.39   | 0.70       |
+    +--------+--------+-------+----------+----------+------------+
+    | YEAR   | no     | 0.00  |   0.00   |  20.39   | 0.00***    |
+    +--------+--------+-------+----------+----------+------------+
+    | nb     |        | 1.33  |   0.00   |  50.00   | 0.00***    |
+    +--------+--------+-------+----------+----------+------------+
 ## Paper
 The following tutorial accompanies our latest paper. A link to the current paper can be found here: [MetaCountRegressor](https://www.overleaf.com/read/mszwpwzcxsng#c5eb0c)
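The tutorial's `argument_hs` dictionary sets the usual harmony search controls which, following standard harmony search naming, appear to be the harmony memory size (`_hms`), the memory considering rate (`_hmcr`) and the pitch adjustment rate (`_par`). To make those numbers concrete, here is a generic, textbook harmony search over discrete choices. It is a minimal sketch, not metacountregressor's `harmony_search`: the names `improvise` and `harmony_search_sketch`, the toy objective, and the "step to a neighbouring option" pitch adjustment are illustrative assumptions, and `_mpai` is not modelled.

```python
import random


def improvise(memory, options, hmcr, par, rng):
    """Build one new candidate from harmony memory (generic harmony search step).

    memory  -- stored candidates; each is a list of option indices
    options -- number of discrete options for each decision variable
    """
    new = []
    for j, n_opts in enumerate(options):
        if rng.random() < hmcr:
            # Memory consideration: reuse this variable's value from a stored harmony.
            value = rng.choice(memory)[j]
            if rng.random() < par:
                # Pitch adjustment: step to a neighbouring option, clipped to the valid range.
                value = min(max(value + rng.choice((-1, 1)), 0), n_opts - 1)
        else:
            # Random selection: draw any option uniformly.
            value = rng.randrange(n_opts)
        new.append(value)
    return new


def harmony_search_sketch(objective, options, hms=20, hmcr=0.5, par=0.3, iters=500, seed=0):
    """Minimise `objective` over discrete choices; returns (best_candidate, best_score)."""
    rng = random.Random(seed)
    memory = [[rng.randrange(n) for n in options] for _ in range(hms)]
    scores = [objective(h) for h in memory]
    for _ in range(iters):
        cand = improvise(memory, options, hmcr, par, rng)
        score = objective(cand)
        worst = max(range(hms), key=scores.__getitem__)
        if score < scores[worst]:
            # Keep the new harmony only if it beats the worst one in memory.
            memory[worst], scores[worst] = cand, score
    best = min(range(hms), key=scores.__getitem__)
    return memory[best], scores[best]


# Toy problem: recover a target configuration of four discrete decisions.
target = [2, 0, 1, 3]
options = [4, 3, 3, 4]
best, score = harmony_search_sketch(lambda h: sum(abs(a - b) for a, b in zip(h, target)), options)
print(best, score)
```

A higher `hmcr` makes new candidates lean more on what is already in memory, while `par` controls how often those reused values are nudged; in a model search the objective would be something like the BIC of the candidate specification.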

{metacountregressor-0.1.91 → metacountregressor-0.1.101}/README.rst RENAMED

@@ -9,7 +9,7 @@ Tutorial also available as a jupyter notebook
 =============================================
 `Download Example
-Notebook <https://github.com/zahern/CountDataEstimation/blob/main/README.ipynb>`__
+Notebook <https://github.com/zahern/CountDataEstimation/blob/main/Tutorial.ipynb>`__
 The tutorial provides more extensive examples on how to run the code and
 perform experiments. Further documentation is currently in development.
@@ -376,6 +376,8 @@ factors for our search.
 .. code:: ipython3
+    '''Setup Data'''
     df = pd.read_csv(
     "https://raw.githubusercontent.com/zahern/data/main/Ex-16-3.csv")
     X = df
@@ -383,24 +385,164 @@ factors for our search.
    X['Offset'] = np.log(df['AADT']) # Explicitly define the offset; no offset is used otherwise
    # Drop Y, the raw offset variable (AADT) and ID, as there are no panels
     X = df.drop(columns=['FREQ', 'ID', 'AADT'])
+    '''Arguments for Solution'''
     arguments = {
-            'algorithm': 'hs', #alternatively input 'de', or 'sa'
-            'is_multi': 1,
+            'is_multi': 1, # whether two objectives are considered
             'test_percentage': 0.2, # used in multi-objective optimisation only. Saves 20% of data for testing.
            'val_percentage': 0.2, # Saves 20% of data for validation.
             'test_complexity': 3, # For Very simple Models
             'obj_1': 'BIC', '_obj_2': 'RMSE_TEST',
-            'instance_number': 'name', # used for creeating a named folder where your models are saved into from the directory
+            'instance_number': 'hs_run', # used for creating a named folder, in the working directory, where your models are saved
             'distribution': ['Normal'],
-            'Model': [0],  # or equivalently ['POS', 'NB']
+            'Model': [0, 1],  # or equivalently ['POS', 'NB']
             'transformations': ['no', 'sqrt', 'archsinh'],
             '_max_time': 10000
-        }
+    }
+    '''Arguments for the solution algorithm'''
+    argument_hs = {
+        '_hms': 20, # harmony memory size
+        '_mpai': 1, # adjustment index
+        '_par': 0.3, # pitch adjustment rate
+        '_hmcr': .5 # harmony memory considering rate
+    }
     obj_fun = ObjectiveFunction(X, y, **arguments)
-    results = harmony_search(obj_fun)
+    results = harmony_search(obj_fun, None, argument_hs)
     print(results)
+Example: Using Differential Evolution and Simulated Annealing
+-----------------------------------------------------------------------
+Similar to the above example, we only need to change the
+hyperparameters; the obj_fun can remain the same.
+.. code:: ipython3
+    argument_de = {'_AI': 2,
+                '_crossover_perc': .2,
+                '_max_iter': 1000,
+                '_pop_size': 25
+    }
+    de_results = differential_evolution(obj_fun, None, **argument_de)
+    print(de_results)
+    args_sa = {'alpha': .99,
+            'STEPS_PER_TEMP': 10,
+            'INTL_ACPT': 0.5,
+            '_crossover_perc': .3,
+            'MAX_ITERATIONS': 1000,
+            '_num_intl_slns': 25,
+    }
+    sa_results = simulated_annealing(obj_fun, None, **args_sa)
+    print(sa_results)
+Comparing to statsmodels
+------------------------
+The following example illustrates how the output compares to well-known
+packages, including Statsmodels.
+.. code:: ipython3
+    # Load modules and data
+    import statsmodels.api as sm
+    from metacountregressor.solution import ObjectiveFunction
+    data = sm.datasets.sunspots.load_pandas().data
+    data_exog = sm.add_constant(data['YEAR'])
+    data_endog = data['SUNACTIVITY']
+    # Fit a negative binomial (NB2) model with the default log link.
+    nb_model = sm.NegativeBinomial(data_endog, data_exog)
+    nb_results = nb_model.fit()
+    print(nb_results.summary())
+    # Now let's compare this to MetaCountRegressor
+    # Model decisions
+    manual_fit_spec = {
+        'fixed_terms': ['const','YEAR'],
+        'rdm_terms': [],
+        'rdm_cor_terms': [],
+        'grouped_terms': [],
+        'hetro_in_means': [],
+        'transformations': ['no', 'no'],
+        'dispersion': 1 #Negative Binomial
+    }
+    #Arguments
+    arguments = {
+        'algorithm': 'hs',
+        'test_percentage': 0,
+        'test_complexity': 6,
+        'instance_number': 'name',
+        'Manual_Fit': manual_fit_spec
+    }
+    obj_fun = ObjectiveFunction(data_exog, data_endog, **arguments)
+.. parsed-literal::
+    Optimization terminated successfully.
+             Current function value: 4.877748
+             Iterations: 22
+             Function evaluations: 71
+             Gradient evaluations: 70
+                         NegativeBinomial Regression Results
+    ==============================================================================
+    Dep. Variable:            SUNACTIVITY   No. Observations:                  309
+    Model:               NegativeBinomial   Df Residuals:                      307
+    Method:                           MLE   Df Model:                            1
+    Date:                Tue, 13 Aug 2024   Pseudo R-squ.:                0.004087
+    Time:                        14:13:22   Log-Likelihood:                -1507.2
+    converged:                       True   LL-Null:                       -1513.4
+    Covariance Type:            nonrobust   LLR p-value:                 0.0004363
+    ==============================================================================
+                     coef    std err          z      P>|z|      [0.025      0.975]
+    ------------------------------------------------------------------------------
+    const          0.2913      1.017      0.287      0.774      -1.701       2.284
+    YEAR           0.0019      0.001      3.546      0.000       0.001       0.003
+    alpha          0.7339      0.057     12.910      0.000       0.622       0.845
+    ==============================================================================
+    0.1.88
+    Setup Complete...
+    Benchmaking test with Seed 42
+    1
+    --------------------------------------------------------------------------------
+    Log-Likelihood:  -1509.0683662284273
+    --------------------------------------------------------------------------------
+    bic: 3035.84
+    --------------------------------------------------------------------------------
+    MSE: 10000000.00
+    +--------+--------+-------+----------+----------+------------+
+    | Effect | $\tau$ | Coeff | Std. Err | z-values | Prob |z|>Z |
+    +========+========+=======+==========+==========+============+
+    | const  | no     | 0.10  |   0.25   |   0.39   | 0.70       |
+    +--------+--------+-------+----------+----------+------------+
+    | YEAR   | no     | 0.00  |   0.00   |  20.39   | 0.00***    |
+    +--------+--------+-------+----------+----------+------------+
+    | nb     |        | 1.33  |   0.00   |  50.00   | 0.00***    |
+    +--------+--------+-------+----------+----------+------------+
 Paper
 -----
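Both outputs in the statsmodels comparison report a log-likelihood, and the metacountregressor side also reports a BIC (the `obj_1` used in the tutorial). As a rough aid for reading them side by side, here is a standalone NB2 log-likelihood, the form with Var(y) = mu + alpha * mu^2 that statsmodels' `NegativeBinomial` uses by default, plus the standard BIC formula. This is a sketch for checking numbers by hand under a log-link assumption; it is not the code either package runs, and the two packages may parameterise the dispersion differently, so the reported `alpha` and `nb` values need not coincide.

```python
import math


def nb2_loglike(y, X, beta, alpha):
    """NB2 log-likelihood with a log link.

    mu_i = exp(x_i . beta) and Var(y_i) = mu_i + alpha * mu_i**2.
    """
    r = 1.0 / alpha  # NB "size" parameter
    total = 0.0
    for yi, xi in zip(y, X):
        mu = math.exp(sum(b * x for b, x in zip(beta, xi)))
        total += (math.lgamma(yi + r) - math.lgamma(r) - math.lgamma(yi + 1)
                  + r * math.log(r / (r + mu)) + yi * math.log(mu / (r + mu)))
    return total


def bic(loglike, n_params, n_obs):
    """Bayesian information criterion, k * ln(n) - 2 * ln(L); lower is better."""
    return n_params * math.log(n_obs) - 2.0 * loglike
```

Evaluating `nb2_loglike` on the sunspot data at the unrounded `const`, `YEAR` and `alpha` estimates from the statsmodels fit should land close to its reported Log-Likelihood; the rounded coefficients printed in the summary are not precise enough for that check.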

metacountregressor-0.1.101/metacountregressor/app_main.py ADDED

@@ -0,0 +1,253 @@
+import warnings
+import argparse
+import csv
+import faulthandler
+import ast
+from typing import Any
+import cProfile
+import numpy as np
+import pandas as pd
+from pandas import DataFrame
+from pandas.io.parsers import TextFileReader
+import helperprocess
+from metaheuristics import (differential_evolution,
+                            harmony_search,
+                            simulated_annealing)
+from solution import ObjectiveFunction
+warnings.simplefilter("ignore")
+faulthandler.enable()
+def convert_df_columns_to_binary_and_wide(df):
+    columns = list(df.columns)
+    df = pd.get_dummies(df, columns=columns, drop_first=True)
+    return df
+def process_arguments():
+    '''
+    TRYING TO TURN THE CSV FILES INTO RELEVANT ARGS
+    '''
+    try:
+        data_characteristic = pd.read_csv('problem_data.csv')
+        analyst_d = pd.read_csv('decisions.csv')
+        hyper = pd.read_csv('setup_hyper.csv')
+    except Exception as e:
+        print(e)
+        print('Files Have Not Been Set Up Yet..')
+        print('Run the App')
+        exit()
+    new_data = {'data': data_characteristic,
+                'analyst':analyst_d,
+                'hyper': hyper}
+    return new_data
+def main(args, **kwargs):
+    '''METACOUNT REGRESSOR TESTING ENVIRONMENT'''
+    print('the args is:', args)
+    print('the kwargs is', kwargs)
+    # remove junk files if specified
+    helperprocess.remove_files(args.get('removeFiles', True))
+    # do we want to run a test
+    data_info = process_arguments()
+    data_info['hyper']
+    data_info['analyst']
+    data_info['data']['Y']
+    #data_info['data']['Group'][0]
+    #data_info['data']['Panel'][0]
+    args['decisions'] = data_info['analyst']
+    if not np.isnan(data_info['data']['Grouped'][0]):
+        args['group'] = data_info['data']['Grouped'][0]
+        args['ID'] = data_info['data']['Grouped'][0]
+    if not np.isnan(data_info['data']['Panel'][0]):
+        args['panels'] = data_info['data']['Panel'][0]
+    df = pd.read_csv(str(data_info['data']['Problem'][0]))
+    x_df = df.drop(columns=[data_info['data']['Y'][0]])
+    y_df = df[[data_info['data']['Y'][0]]]
+    y_df.rename(columns={data_info['data']['Y'][0]: "Y"}, inplace=True)
+    manual_fit_spec = None #TODO add in manual fit
+    if args['Keep_Fit'] == str(2) or args['Keep_Fit'] == 2:
+        if manual_fit_spec is None:
+            args['Manual_Fit'] = None
+        else:
+            print('fitting manually')
+            args['Manual_Fit'] = manual_fit_spec
+    if args['problem_number'] == str(8) or args['problem_number'] == 8:
+        print('Maine County Dataset.')
+        args['group'] = 'county'
+        args['panels'] = 'element_ID'
+        args['ID'] = 'element_ID'
+        args['_max_characteristics'] = 55
+    elif args['problem_number'] == str(9) or args['problem_number'] == 9:
+        args['group'] = 'group'
+        args['panels'] = 'ind_id'
+        args['ID'] = 'ind_id'
+    args['complexity_level'] = args.get('complexity_level', 6)
+    # Initialize AnalystSpecs to None if not manually provided
+    args['AnalystSpecs'] = args.get('AnalystSpecs', None)
+    if args['algorithm'] == 'sa':
+        args_hyperparameters = {'alpha': float(args['temp_scale']),
+                                'STEPS_PER_TEMP': int(args['steps']),
+                                'INTL_ACPT': 0.5,
+                                '_crossover_perc': args['crossover'],
+                                'MAX_ITERATIONS': int(args['_max_imp']),
+                                '_num_intl_slns': 25,
+                                'Manual_Fit': args['Manual_Fit'],
+                                'MP': int(args['MP'])}
+        helperprocess.entries_to_remove(('crossover', '_max_imp', '_hms', '_hmcr', '_par'), args)
+        print(args)
+        obj_fun = ObjectiveFunction(x_df, y_df, **args)
+        results = simulated_annealing(obj_fun, None, **args_hyperparameters)
+        helperprocess.results_printer(results, args['algorithm'], int(args['is_multi']))
+        if args['dual_complexities']:
+            args['complexity_level'] = args['secondary_complexity']
+            obj_fun = ObjectiveFunction(x_df, y_df, **args)
+            results = simulated_annealing(obj_fun, None, **args_hyperparameters)
+            helperprocess.results_printer(results, args['algorithm'], int(args['is_multi']))
+    elif args['algorithm'] == 'hs':
+        args['_mpai'] = 1
+        obj_fun = ObjectiveFunction(x_df, y_df, **args)
+        args_hyperparameters = {
+            'Manual_Fit': args['Manual_Fit'],
+            'MP': int(args['MP'])
+        }
+        results = harmony_search(obj_fun, None, **args_hyperparameters)
+        helperprocess.results_printer(results, args['algorithm'], int(args['is_multi']))
+        if args.get('dual_complexities', 0):
+            args['complexity_level'] = args['secondary_complexity']
+            obj_fun = ObjectiveFunction(x_df, y_df, **args)
+            results = harmony_search(obj_fun, None, **args_hyperparameters)
+            helperprocess.results_printer(results, args['algorithm'], int(args['is_multi']))
+    elif args['algorithm'] == 'de':
+        # force variables
+        args['must_include'] = args.get('force', [])
+        args_hyperparameters = {'_AI': args.get('_AI', 2),
+                                '_crossover_perc': float(args['crossover']),
+                                '_max_iter': int(args['_max_imp'])
+            , '_pop_size': int(args['_hms']), 'instance_number': int(args['line'])
+            , 'Manual_Fit': args['Manual_Fit'],
+                                'MP': int(args['MP'])
+                                }
+        args_hyperparameters = dict(args_hyperparameters)
+        helperprocess.entries_to_remove(('crossover', '_max_imp', '_hms', '_hmcr', '_par'), args)
+        obj_fun = ObjectiveFunction(x_df, y_df, **args)
+        results = differential_evolution(obj_fun, None, **args_hyperparameters)
+        helperprocess.results_printer(results, args['algorithm'], int(args['is_multi']))
+        if args['dual_complexities']:
+            args['complexity_level'] = args['secondary_complexity']
+            obj_fun = ObjectiveFunction(x_df, y_df, **args)
+            results = differential_evolution(obj_fun, None, **args_hyperparameters)
+            helperprocess.results_printer(results, args['algorithm'], int(args['is_multi'])) #TODO FIX This
+if __name__ == '__main__':
+    """Loading in command line args.  """
+    alg_parser = argparse.ArgumentParser(prog='algorithm', epilog='algorithm specific arguments')
+    alg_parser.add_argument('-AI', default=2, help='adjustment index. For the allowable movement of the algorithm')
+    alg_parser.print_help()
+    parser = argparse.ArgumentParser(prog='main',
+                                     epilog=main.__doc__,
+                                     formatter_class=argparse.RawDescriptionHelpFormatter, conflict_handler='resolve')
+    parser.add_argument('-line', type=int, default=1,
+                        help='line to read in csv to pass in argument')
+    if vars(parser.parse_args())['line'] is not None:
+        reader = csv.DictReader(open('set_data.csv', 'r'))
+        args = list()
+        line_number_obs = 0
+        for dictionary in reader:  # TODO find a way to handle multiple args
+            args = dictionary
+            if line_number_obs == int(vars(parser.parse_args())['line']):
+                break
+            line_number_obs += 1
+        args = dict(args)
+        for key, value in args.items():
+            try:
+                # Attempt to parse the string value to a Python literal if value is a string.
+                if isinstance(value, str):
+                    value = ast.literal_eval(value)
+            except (ValueError, SyntaxError):
+                # If there's a parsing error, value remains as the original string.
+                pass
+            # Add the argument to the parser with the potentially updated value.
+            parser.add_argument(f'-{key}', default=value)
+        for i, action in enumerate(parser._optionals._actions):
+            if "-algorithm" in action.option_strings:
+                parser._optionals._actions[i].help = "optimization algorithm"
+        override = True
+        if override:
+            print('todo turn off, in testing phase')
+            parser.add_argument('-problem_number', default='10')
+            print('did it make it')
+        if 'algorithm' not in args:
+            parser.add_argument('-algorithm', type=str, default='hs',
+                                help='optimization algorithm')
+        elif 'Manual_Fit' not in args:
+            parser.add_argument('-Manual_Fit', action='store_false', default=None,
+                                help='To fit a model manually if desired.')
+        parser.add_argument('-seperate_out_factors', action='store_false', default=False,
+                            help='Whether to split potentially categorical data into binary'
+                                 ' columns for processing')
+        parser.add_argument('-supply_csv', type=str, help='name of the csv to use, including its full path')
+    else:  # DIDN'T SPECIFY LINES, TRY EACH ONE MANUALLY
+        parser.add_argument('-com', type=str, default='MetaCode',
+                            help='line to read csv')
+    # Check the args
+    parser.print_help()
+    args = vars(parser.parse_args())
+    print(type(args))
+    # TODO add in chi 2 and df in estimation and compare degrees of freedom this needs to be done in solution
+    # Print the args.
+    profiler = cProfile.Profile()
+    profiler.runcall(main,args)
+    profiler.print_stats(sort='time')
+    #TOO MAX_TIME
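`app_main.py` builds its command-line defaults from one row of `set_data.csv`, running each cell through `ast.literal_eval` so that numbers and lists arrive as Python objects. The sketch below isolates that step. `parse_row` and `read_line` are hypothetical helper names, and unlike the loop in `app_main.py` (which silently keeps the last row if the file is shorter than `-line`), this version returns `None` when the requested row does not exist. Rows are counted from 0 after the header, as in `app_main.py`, so the default `-line 1` selects the second data row.

```python
import ast
import csv
import io


def parse_row(row):
    """Turn CSV cell strings into Python literals where possible (ints, floats, lists, ...)."""
    parsed = {}
    for key, value in row.items():
        try:
            parsed[key] = ast.literal_eval(value) if isinstance(value, str) else value
        except (ValueError, SyntaxError):
            # Plain words such as hs, and empty cells, are kept as the original string.
            parsed[key] = value
    return parsed


def read_line(csv_file, line):
    """Return the parsed data row at 0-based position `line`, or None if there is no such row."""
    for i, row in enumerate(csv.DictReader(csv_file)):
        if i == line:
            return parse_row(row)
    return None


sample = "algorithm,_hms,crossover,force\nhs,20,0.2,\"['a', 'b']\"\nde,25,0.3,[]\n"
print(read_line(io.StringIO(sample), 1))
```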

{metacountregressor-0.1.91 → metacountregressor-0.1.101}/metacountregressor/main.py RENAMED

@@ -28,63 +28,75 @@ def convert_df_columns_to_binary_and_wide(df):
     return df
+def process_arguments():
+    '''
+    TRYING TO TURN THE CSV FILES INTO RELEVANT ARGS
+    '''
+    data_characteristic = pd.read_csv('problem_data.csv')
+    analyst_d = pd.read_csv('decisions.csv')
+    hyper = pd.read_csv('setup_hyper.csv')
+    new_data = {'data': data_characteristic,
+                'analyst':analyst_d,
+                'hyper': hyper}
+    return new_data
 def main(args, **kwargs):
     '''METACOUNT REGRESSOR TESTING ENVIRONMENT'''
-    import statsmodels.api as sm
-    data = sm.datasets.sunspots.load_pandas().data
-    # print(data.exog)
-    data_exog = data['YEAR']
-    data_exog = sm.add_constant(data_exog)
-    data_endog = data['SUNACTIVITY']
-    # Instantiate a gamma family model with the default link function.
-    import numpy as np
-    gamma_model = sm.NegativeBinomial(data_endog, data_exog)
-    gamma_results = gamma_model.fit()
-    print(gamma_results.summary())
+    '''
+    TESTING_ENV = False
+    if TESTING_ENV:
-    # NOW LET's COMPARE THIS TO METACOUNT REGRESSOR
-    import metacountregressor
-    from importlib.metadata import version
-    print(version('metacountregressor'))
-    import pandas as pd
-    import numpy as np
-    from metacountregressor.solution import ObjectiveFunction
-    from metacountregressor.metaheuristics import (harmony_search,
-                                                   differential_evolution,
-                                                   simulated_annealing)
+        import statsmodels.api as sm
-    # Model Decisions,
-    manual_fit_spec = {
+        data = sm.datasets.sunspots.load_pandas().data
+        # print(data.exog)
+        data_exog = data['YEAR']
+        data_exog = sm.add_constant(data_exog)
+        data_endog = data['SUNACTIVITY']
-        'fixed_terms': ['const', 'YEAR'],
-        'rdm_terms': [],
-        'rdm_cor_terms': [],
-        'grouped_terms': [],
-        'hetro_in_means': [],
-        'transformations': ['no', 'no'],
-        'dispersion': 1  # Negative Binomial
-    }
-    # Arguments
-    arguments = {
-        'algorithm': 'hs',
-        'test_percentage': 0,
-        'test_complexity': 6,
-        'instance_number': 'name',
-        'Manual_Fit': manual_fit_spec
-    }
-    obj_fun = ObjectiveFunction(data_exog, data_endog, **arguments)
-    #exit()
+        # Instantiate a gamma family model with the default link function.
+        import numpy as np
+        gamma_model = sm.NegativeBinomial(data_endog, data_exog)
+        gamma_results = gamma_model.fit()
+        print(gamma_results.summary())
+        # NOW LET's COMPARE THIS TO METACOUNT REGRESSOR
+        import metacountregressor
+        from importlib.metadata import version
+        print(version('metacountregressor'))
+        import pandas as pd
+        import numpy as np
+        from metacountregressor.solution import ObjectiveFunction
+        from metacountregressor.metaheuristics import (harmony_search,
+                                                       differential_evolution,
+                                                       simulated_annealing)
+        # Model Decisions,
+        manual_fit_spec = {
+            'fixed_terms': ['const', 'YEAR'],
+            'rdm_terms': [],
+            'rdm_cor_terms': [],
+            'grouped_terms': [],
+            'hetro_in_means': [],
+            'transformations': ['no', 'no'],
+            'dispersion': 1  # Negative Binomial
+        }
+        # Arguments
+        arguments = {
+            'algorithm': 'hs',
+            'test_percentage': 0,
+            'test_complexity': 6,
+            'instance_number': 'name',
+            'Manual_Fit': manual_fit_spec
+        }
+        obj_fun = ObjectiveFunction(data_exog, data_endog, **arguments)
+    '''
     print('the args is:', args)
@@ -275,7 +287,25 @@ def main(args, **kwargs):
         x_df = helperprocess.interactions(x_df, keep)
     else:  # the dataset has been selected in the program as something else
-        print('TODO add in dataset')
+        data_info = process_arguments()
+        data_info['hyper']
+        data_info['analyst']
+        data_info['data']['Y']
+        #data_info['data']['Group'][0]
+        #data_info['data']['Panel'][0]
+        args['decisions'] = data_info['analyst']
+        if not np.isnan(data_info['data']['Grouped'][0]):
+            args['group'] = data_info['data']['Grouped'][0]
+            args['ID'] = data_info['data']['Grouped'][0]
+        if not np.isnan(data_info['data']['Panel'][0]):
+            args['panels'] = data_info['data']['Panel'][0]
+        df = pd.read_csv(str(data_info['data']['Problem'][0]))
+        x_df = df.drop(columns=[data_info['data']['Y'][0]])
+        y_df = df[[data_info['data']['Y'][0]]]
+        y_df.rename(columns={data_info['data']['Y'][0]: "Y"}, inplace=True)
+        print('test') #FIXME
     if args['Keep_Fit'] == str(2) or args['Keep_Fit'] == 2:
         if manual_fit_spec is None:
@@ -294,6 +324,8 @@ def main(args, **kwargs):
         args['panels'] = 'ind_id'
         args['ID'] = 'ind_id'
     args['complexity_level'] = args.get('complexity_level', 6)
@@ -380,7 +412,7 @@ if __name__ == '__main__':
                                      epilog=main.__doc__,
                                      formatter_class=argparse.RawDescriptionHelpFormatter, conflict_handler='resolve')
-    parser.add_argument('-line', type=int, default=44,
+    parser.add_argument('-line', type=int, default=1,
                         help='line to read in csv to pass in argument')
     if vars(parser.parse_args())['line'] is not None:
@@ -413,7 +445,7 @@ if __name__ == '__main__':
         override = True
         if override:
             print('todo turn off, in testing phase')
-            parser.add_argument('-problem_number', default='4')
+            parser.add_argument('-problem_number', default='10')
             print('did it make it')
         if 'algorithm' not in args:
             parser.add_argument('-algorithm', type=str, default='hs',
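In the `else` branch added to `main.py` (and in `app_main.py`), the dataset path, response column and optional grouping and panel columns come from the first row of `problem_data.csv`. Below is a hedged sketch of that flow with two small changes: it checks cells with `pd.notna` instead of `np.isnan`, because `np.isnan` raises `TypeError` when the cell holds a column name such as `'county'`, and it returns a renamed copy of the response instead of renaming in place. `load_problem`, `crashes.csv` and the in-memory CSVs are illustrative; the column names `Problem`, `Y`, `Grouped` and `Panel`, and the mapping of `Grouped` to both `group` and `ID`, are taken from the diff. The same string-versus-NaN concern applies to the `np.isnan(kwargs.get('panels'))` guard added to `solution.py`.

```python
import io

import pandas as pd


def load_problem(problem_csv, data_csv):
    """Split a dataset into X and y using the first row of a problem description.

    problem_csv -- file path or file-like with columns Problem, Y, Grouped, Panel
    data_csv    -- the dataset itself (in the package this is the path in `Problem`)
    """
    info = pd.read_csv(problem_csv)
    target = info['Y'].iloc[0]
    df = pd.read_csv(data_csv)
    x_df = df.drop(columns=[target])
    y_df = df[[target]].rename(columns={target: 'Y'})  # renamed copy, no inplace
    extra = {}
    # pd.notna is False for an empty (NaN) cell and True for a column-name string,
    # whereas np.isnan would raise TypeError on the string.
    if pd.notna(info['Grouped'].iloc[0]):
        extra['group'] = extra['ID'] = info['Grouped'].iloc[0]
    if pd.notna(info['Panel'].iloc[0]):
        extra['panels'] = info['Panel'].iloc[0]
    return x_df, y_df, extra


problem = "Problem,Y,Grouped,Panel\ncrashes.csv,FREQ,county,element_ID\n"
data = "FREQ,AADT,LENGTH\n1,100,2.5\n0,250,1.0\n"
x_df, y_df, extra = load_problem(io.StringIO(problem), io.StringIO(data))
print(list(x_df.columns), list(y_df.columns), extra)
```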

{metacountregressor-0.1.91 → metacountregressor-0.1.101}/metacountregressor/metaheuristics.py RENAMED

@@ -20,8 +20,8 @@ try:
     from .solution import ObjectiveFunction
 except:
     print('Exception relative import')
-    from metacountregressor.pareto_file import Pareto, Solution
-    from metacountregressor.solution import ObjectiveFunction
+    from pareto_file import Pareto, Solution
+    from solution import ObjectiveFunction
 HarmonySearchResults = namedtuple('HarmonySearchResults',
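In 0.1.101 the fallback imports in `metaheuristics.py` and `solution.py` switched from package-qualified names (`metacountregressor.pareto_file`) to flat ones (`pareto_file`). Flat names resolve only when the package directory itself is on `sys.path`, for example when `app_main.py` is run directly as a script. A sketch of a fallback that tries several candidates in order, using only `importlib`; `import_first` is a hypothetical helper, not part of the package.

```python
import importlib


def import_first(*module_names):
    """Import and return the first module in `module_names` that can be imported."""
    errors = []
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError as exc:  # also covers ModuleNotFoundError
            errors.append(f"{name}: {exc}")
    raise ImportError("no candidate module could be imported; " + "; ".join(errors))
```

Used as, say, `pareto_file = import_first('metacountregressor.pareto_file', 'pareto_file')`, this would work both for an installed package and for a script run from inside the package folder. Catching `ImportError` specifically, rather than the bare `except:` in `metaheuristics.py`, also lets genuine errors raised inside an imported module surface instead of being masked.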

{metacountregressor-0.1.91 → metacountregressor-0.1.101}/metacountregressor/solution.py RENAMED

@@ -38,8 +38,8 @@ try:
     from .pareto_file import Pareto, Solution
     from .data_split_helper import DataProcessor
 except ImportError:
-    from metacountregressor._device_cust import device as dev
-    from metacountregressor.pareto_file import Pareto, Solution
+    from _device_cust import device as dev
+    from pareto_file import Pareto, Solution
     from data_split_helper import DataProcessor
@@ -232,7 +232,7 @@ class ObjectiveFunction(object):
             if self.test_percentage == 0:
                 self.is_multi = False
-            if 'panels' in kwargs:
+            if 'panels' in kwargs and not np.isnan(kwargs.get('panels')):
                 self.group_names = np.asarray(x_data[kwargs['group']].astype('category').cat._parent.dtype.categories)
                 x_data[kwargs['group']] = x_data[kwargs['group']].astype(
@@ -279,7 +279,7 @@ class ObjectiveFunction(object):
         exclude_this_test = [4]
-        if 'panels' in kwargs:
+        if 'panels' in kwargs and not np.isnan(kwargs.get('panels')):
             self.panels = np.asarray(df_train[kwargs['panels']])
             self.panels_test = np.asarray(df_test[kwargs['panels']])
             self.ids = np.asarray(
@@ -411,9 +411,10 @@ class ObjectiveFunction(object):
         # self._distribution = ['triangular', 'uniform', 'normal', 'ln_normal', 'tn_normal', 'lindley']
-        self._distribution = kwargs.get('_distributions', ['triangular', 'uniform', 'normal', 'lm_normal', 'tn_normal'])
+        self._distribution = kwargs.get('_distributions', ['triangular', 'uniform', 'normal', 'ln_normal', 'tn_normal'])
         if self.G is not None:
+            #TODO need to handle this for groups
             self._distribution = ["trad| " + item for item in self._distribution
                                   ] + ["grpd| " + item for item in self._distribution]
@@ -425,10 +426,15 @@ class ObjectiveFunction(object):
         self.significant = 0
         # define the states of our explanatory variables
         self._discrete_values = self.define_alphas(self.complexity_level, exclude_this_test,
-                                                   kwargs.get('must_include', []))
+                                                   kwargs.get('must_include', []), extra = kwargs.get('decisions', None))
         self._discrete_values = self._discrete_values + \
-                                [[x for x in self._distribution]] * self._characteristics
+                                self.define_distributions_analyst(extra=kwargs.get('decisions', None))
         if 'model_types' in kwargs:
             model_types = kwargs['model_types']
@@ -436,7 +442,7 @@ class ObjectiveFunction(object):
             model_types = [[0, 1]]  # add 2 for Generalized Poisson
         self._discrete_values = self._discrete_values + self.define_poissible_transforms(
-            self._transformations) + model_types
+            self._transformations, kwargs.get('decisions',None)) + model_types
         self._model_type_codes = ['p', 'nb',
                                   'gp', "pl", ["nb-theta", 'nb-dis']]
@@ -787,14 +793,60 @@ class ObjectiveFunction(object):
         par = np.nan_to_num(par)
         return par
-    def define_alphas(self, complexity_level=4, exclude=[], include=[]):
+    def rename_distro(self, distro):
+        # Mapping dictionary
+        mapping = {
+            'Normal': 'normal',
+            'Triangular': 'triangular',
+            'Uniform': 'uniform',
+            'Log-Normal': 'ln_normal',
+            'Trunc-Normal': 'tn_normal'
+        }
+        # Use list comprehension with the mapping
+        new_distro = [mapping.get(i, i) for i in distro]
+        return  new_distro
+    def define_distributions_analyst(self, extra = None):
+        if extra is not None:
+            set_alpha = []
+            for col in self._characteristics_names:
+                if col in extra[('Column')].values:
+                    matched_index = extra[('Column')].index[extra[('Column')] == col].tolist()
+                    distro = ast.literal_eval(extra.iloc[matched_index, 7].values.tolist()[0])
+                    distro = self.rename_distro(distro)
+                    set_alpha = set_alpha+[distro]
+            return set_alpha
+        return  [[x for x in self._distribution]] * self._characteristics
+    def define_alphas(self, complexity_level=4, exclude=[], include=[], extra = None):
         'complexity level'
         '''
         2 is feature selection,
-        3 is random paramaters
-        4 is correlated random paramaters
+        3 is random parameters
+        4 is correlated random parameters
+        extra is the decisions table defined by the Meta APP
         '''
         set_alpha = []
+        if extra is not None:
+            for col in self._characteristics_names:
+                if col == 'const' or col == 'Constant' or col == 'constant':  # no random parameters for const
+                    set_alpha = set_alpha + [[1]]
+                elif col == 'Offset':
+                    set_alpha = set_alpha + [[1]]
+                elif col in extra[('Column')].values:
+                    matched_index = extra[('Column')].index[extra[('Column')] == col].tolist()
+                    check = list(itertools.chain(*extra.iloc[matched_index, 1:7].values))
+                    set_alpha = set_alpha + [[x for x in range(len(check)) if check[x] == True]]
+            return set_alpha
         for col in self._characteristics_names:
             if col == 'const' or col == 'Constant' or col == 'constant':  # no random parameters for const
                 set_alpha = set_alpha + [[1]]
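The new `extra` branches read the analyst decisions table passed as the `decisions` keyword argument. `define_alphas` looks up each explanatory variable in the `'Column'` column and keeps the positions of the `True` flags in columns 1 to 6 as its allowed states, while `define_distributions_analyst` parses the string in column 7 with `ast.literal_eval` and maps display names to internal names through `rename_distro`. The sketch below mirrors that logic on a toy table so the expected layout is visible. The flag column names, the example rows, and the helpers `alpha_states` and `analyst_distributions` are hypothetical; only the positional layout (`'Column'` first, six flags, then a distribution list) is taken from the diff.

```python
import ast

import pandas as pd

# Display names used in the decisions table -> internal distribution names
# (same mapping as rename_distro in the diff).
DISTRO_MAP = {
    'Normal': 'normal',
    'Triangular': 'triangular',
    'Uniform': 'uniform',
    'Log-Normal': 'ln_normal',
    'Trunc-Normal': 'tn_normal',
}


def alpha_states(decisions, names):
    """Hypothetical mirror of define_alphas(extra=...): allowed state indices per variable."""
    states = []
    for col in names:
        if col in ('const', 'Constant', 'constant', 'Offset'):
            states.append([1])  # constant and offset are always kept as fixed terms
        elif col in decisions['Column'].values:
            row = decisions[decisions['Column'] == col].iloc[0]
            flags = row.iloc[1:7]  # the six boolean flag columns, selected by position
            states.append([i for i, flag in enumerate(flags) if flag])
    return states


def analyst_distributions(decisions, names):
    """Hypothetical mirror of define_distributions_analyst(extra=...)."""
    distros = []
    for col in names:
        if col in decisions['Column'].values:
            row = decisions[decisions['Column'] == col].iloc[0]
            listed = ast.literal_eval(row.iloc[7])  # e.g. "['Normal', 'Log-Normal']"
            distros.append([DISTRO_MAP.get(d, d) for d in listed])
    return distros


# Toy decisions table: 'Column', six placeholder flag columns, then distributions.
decisions = pd.DataFrame({
    'Column': ['YEAR', 'AADT_log'],
    'flag_0': [True, True],
    'flag_1': [True, False],
    'flag_2': [False, False],
    'flag_3': [True, False],
    'flag_4': [False, False],
    'flag_5': [False, False],
    'Distributions': ["['Normal', 'Log-Normal']", "['Triangular']"],
})
names = ['const', 'YEAR', 'AADT_log', 'Offset']
print(alpha_states(decisions, names))           # [[1], [0, 1, 3], [0], [1]]
print(analyst_distributions(decisions, names))  # [['normal', 'ln_normal'], ['triangular']]
```

Selecting the row with a boolean mask and then indexing it positionally avoids mixing index labels with positions; the diff passes labels from `.index` into `.iloc`, which is only equivalent when the decisions table has a default `RangeIndex`. As in the diff, the two lists can differ in length, because `const` and `Offset` only receive an entry in the first one.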
@@ -1238,7 +1290,7 @@ class ObjectiveFunction(object):
         with open(filename, 'w') as file:
             file.write(content)
-    def define_poissible_transforms(self, transforms) -> list:
+    def define_poissible_transforms(self, transforms, extra= None) -> list:
         transform_set = []
         if not isinstance(self._x_data, pd.DataFrame):
             x_data = self._x_data.reshape(self.N * self.P, -1).copy()
@@ -2488,7 +2540,7 @@ class ObjectiveFunction(object):
         random.seed(seed)
     def set_random_seed(self):
-        print('Imbdedding Seed', self._random_seed)
+        print('Imbedding Seed', self._random_seed)
         np.random.seed(self._random_seed)
         random.seed(self._random_seed)
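The `panels` guard added in the `ObjectiveFunction` hunks above calls `np.isnan(kwargs.get('panels'))`. `np.isnan` only accepts numeric input, so when `panels` holds a column name such as `'ID'` (as `df_train[kwargs['panels']]` implies), the check raises `TypeError` instead of evaluating to `False`. A minimal sketch of a guard that treats a missing key, `None`, or NaN as "no panels" and accepts any other value follows; `panels_requested` is a hypothetical helper, not part of metacountregressor.

```python
import math

import numpy as np


def panels_requested(kwargs):
    """Hypothetical helper: True only when 'panels' holds a usable column name."""
    value = kwargs.get('panels')
    if value is None:
        return False
    # np.nan and np.float64 values are Python floats, so NaN is caught here
    # without ever calling np.isnan on a string.
    if isinstance(value, float) and math.isnan(value):
        return False
    return True


# The guard as written in the diff fails on a string column name.
try:
    np.isnan('ID')
except TypeError as exc:
    print('np.isnan rejects a column name:', exc)

print(panels_requested({}))                  # False
print(panels_requested({'panels': np.nan}))  # False
print(panels_requested({'panels': 'ID'}))    # True
```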

{metacountregressor-0.1.91 → metacountregressor-0.1.101}/metacountregressor.egg-info/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: metacountregressor
-Version: 0.1.91
+Version: 0.1.101
 Summary: Extensions for a Python package for estimation of count models.
 Home-page: https://github.com/zahern/CountDataEstimation
 Author: Zeke Ahern
@@ -274,6 +274,8 @@ Let's begin by fitting very simple models and use the structure of these models
 ```python
+'''Setup Data'''
 df = pd.read_csv(
 "https://raw.githubusercontent.com/zahern/data/main/Ex-16-3.csv")
 X = df
@@ -281,25 +283,158 @@ y = df['FREQ']  # Frequency of crashes
 X['Offset'] = np.log(df['AADT']) # Explicitly define how to offset the data; no offset otherwise
 # Drop Y, the selected offset term and ID, as there are no panels
 X = df.drop(columns=['FREQ', 'ID', 'AADT'])
+'''Arguments for Solution'''
 arguments = {
-        'algorithm': 'hs', #alternatively input 'de', or 'sa'
-        'is_multi': 1,
+        'is_multi': 1, # whether two objectives are considered
         'test_percentage': 0.2, # used in multi-objective optimisation only. Saves 20% of data for testing.
         'val_percentage': 0.2, # Saves 20% of data for validation.
         'test_complexity': 3, # For Very simple Models
         'obj_1': 'BIC', '_obj_2': 'RMSE_TEST',
-        'instance_number': 'name', # used for creating a named folder in the current directory where your models are saved
+        'instance_number': 'hs_run', # used for creating a named folder in the current directory where your models are saved
         'distribution': ['Normal'],
-        'Model': [0],  # or equivalently ['POS', 'NB']
+        'Model': [0, 1],  # or equivalently ['POS', 'NB']
         'transformations': ['no', 'sqrt', 'archsinh'],
         '_max_time': 10000
-    }
+}
+'''Arguments for the solution algorithm'''
+argument_hs = {
+    '_hms': 20, # harmony memory size
+    '_mpai': 1, # adjustment index
+    '_par': 0.3, # pitch adjusting rate
+    '_hmcr': .5 # harmony memory considering rate
+}
 obj_fun = ObjectiveFunction(X, y, **arguments)
-results = harmony_search(obj_fun)
+results = harmony_search(obj_fun, None, argument_hs)
 print(results)
 ```
+## Example: Differential Evolution and Simulated Annealing
+Similar to the above example, we only need to change the hyperparameters; `obj_fun` can remain the same.
+```python
+argument_de = {'_AI': 2,
+            '_crossover_perc': .2,
+            '_max_iter': 1000,
+            '_pop_size': 25
+}
+de_results = differential_evolution(obj_fun, None, **argument_de)
+print(de_results)
+args_sa = {'alpha': .99,
+        'STEPS_PER_TEMP': 10,
+        'INTL_ACPT': 0.5,
+        '_crossover_perc': .3,
+        'MAX_ITERATIONS': 1000,
+        '_num_intl_slns': 25,
+}
+sa_results = simulated_annealing(obj_fun, None, **args_sa)
+print(sa_results)
+```
+## Comparing to statsmodels
+The following example illustrates how the output compares with that of well-known packages such as Statsmodels.
+```python
+# Load modules and data
+import numpy as np
+import statsmodels.api as sm
+data = sm.datasets.sunspots.load_pandas().data
+data_exog = data['YEAR']
+data_exog = sm.add_constant(data_exog)
+data_endog = data['SUNACTIVITY']
+# Fit a negative binomial model with statsmodels
+nb_model = sm.NegativeBinomial(data_endog, data_exog)
+nb_results = nb_model.fit()
+print(nb_results.summary())
+# Now compare this to MetaCountRegressor
+# Model decisions
+manual_fit_spec = {
+    'fixed_terms': ['const','YEAR'],
+    'rdm_terms': [],
+    'rdm_cor_terms': [],
+    'grouped_terms': [],
+    'hetro_in_means': [],
+    'transformations': ['no', 'no'],
+    'dispersion': 1 #Negative Binomial
+}
+#Arguments
+arguments = {
+    'algorithm': 'hs',
+    'test_percentage': 0,
+    'test_complexity': 6,
+    'instance_number': 'name',
+    'Manual_Fit': manual_fit_spec
+}
+obj_fun = ObjectiveFunction(data_exog, data_endog, **arguments)
+```
+    Optimization terminated successfully.
+             Current function value: 4.877748
+             Iterations: 22
+             Function evaluations: 71
+             Gradient evaluations: 70
+                         NegativeBinomial Regression Results
+    ==============================================================================
+    Dep. Variable:            SUNACTIVITY   No. Observations:                  309
+    Model:               NegativeBinomial   Df Residuals:                      307
+    Method:                           MLE   Df Model:                            1
+    Date:                Tue, 13 Aug 2024   Pseudo R-squ.:                0.004087
+    Time:                        14:13:22   Log-Likelihood:                -1507.2
+    converged:                       True   LL-Null:                       -1513.4
+    Covariance Type:            nonrobust   LLR p-value:                 0.0004363
+    ==============================================================================
+                     coef    std err          z      P>|z|      [0.025      0.975]
+    ------------------------------------------------------------------------------
+    const          0.2913      1.017      0.287      0.774      -1.701       2.284
+    YEAR           0.0019      0.001      3.546      0.000       0.001       0.003
+    alpha          0.7339      0.057     12.910      0.000       0.622       0.845
+    ==============================================================================
+    0.1.88
+    Setup Complete...
+    Benchmaking test with Seed 42
+    1
+    --------------------------------------------------------------------------------
+    Log-Likelihood:  -1509.0683662284273
+    --------------------------------------------------------------------------------
+    bic: 3035.84
+    --------------------------------------------------------------------------------
+    MSE: 10000000.00
+    +--------+--------+-------+----------+----------+------------+
+    | Effect | $\tau$ | Coeff | Std. Err | z-values | Prob |z|>Z |
+    +========+========+=======+==========+==========+============+
+    | const  | no     | 0.10  |   0.25   |   0.39   | 0.70       |
+    +--------+--------+-------+----------+----------+------------+
+    | YEAR   | no     | 0.00  |   0.00   |  20.39   | 0.00***    |
+    +--------+--------+-------+----------+----------+------------+
+    | nb     |        | 1.33  |   0.00   |  50.00   | 0.00***    |
+    +--------+--------+-------+----------+----------+------------+
 ## Paper
 The following tutorial accompanies our latest paper. A link to the current paper can be found here: [MetaCountRegressor](https://www.overleaf.com/read/mszwpwzcxsng#c5eb0c)
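The statsmodels summary printed in the README comparison above can be cross-checked by hand. Its `LLR p-value` comes from a likelihood-ratio test of the fitted model against the constant-only model (`LL-Null`) with `Df Model` = 1 degree of freedom, and BIC under the standard definition is -2 log L + k ln(n). The sketch below recomputes both from the printed numbers, assuming k = 3 estimated parameters (`const`, `YEAR` and the dispersion term) and n = 309 observations. metacountregressor's printed `bic` may use a different parameter count or convention, so it need not match these figures exactly.

```python
import math


def lr_test_df1(ll_model, ll_null):
    """Likelihood-ratio statistic and its chi-squared(1) p-value."""
    stat = 2.0 * (ll_model - ll_null)
    # For one degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2.0))


def bic(loglik, k, n):
    """Standard Bayesian information criterion: -2 log L + k ln(n)."""
    return -2.0 * loglik + k * math.log(n)


n_obs = 309
# Rounded log-likelihoods from the statsmodels summary, so the p-value is
# only close to the printed 0.0004363, not identical.
stat, p_value = lr_test_df1(-1507.2, -1513.4)
print(f'LR = {stat:.1f}, p = {p_value:.6f}')

print(f'BIC (statsmodels fit, k=3): {bic(-1507.2, 3, n_obs):.2f}')
print(f'BIC (metacountregressor fit, k=3): {bic(-1509.0683662284273, 3, n_obs):.2f}')
```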

{metacountregressor-0.1.91 → metacountregressor-0.1.101}/metacountregressor.egg-info/SOURCES.txt RENAMED

@@ -4,6 +4,7 @@ setup.cfg
 setup.py
 metacountregressor/__init__.py
 metacountregressor/_device_cust.py
+metacountregressor/app_main.py
 metacountregressor/data_split_helper.py
 metacountregressor/halton.py
 metacountregressor/helperprocess.py