PyPI - data-manipulation-utilities - Versions diffs - 0.2.4__tar.gz → 0.2.6__tar.gz - Mend

Files changed (61)

{data_manipulation_utilities-0.2.4 → data_manipulation_utilities-0.2.6}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.2
 Name: data_manipulation_utilities
-Version: 0.2.4
+Version: 0.2.6
 Description-Content-Type: text/markdown
 Requires-Dist: logzero
 Requires-Dist: PyYAML
@@ -26,7 +26,7 @@ These are tools that can be used for different data analysis tasks.
 ## Pushing
-From the root directory of a version controlled project (i.e. a directory with the `.git` subdirectory)
+From the root directory of a version controlled project (i.e. a directory with the `.git` subdirectory)
 using a `pyproject.toml` file, run:
 ```bash
@@ -36,10 +36,10 @@ publish
 such that:
 1. The `pyproject.toml` file is checked and the version of the project is extracted.
-1. If a tag named as the version exists move to the steps below.
+1. If a tag named as the version exists move to the steps below.
 1. If it does not, make a new tag with the name as the version
-Then, for each remote it pushes the tags and the commits.
+Then, for each remote it pushes the tags and the commits.
 *Why?*
@@ -137,7 +137,17 @@ pdf   = mod.get_pdf()
 ```
 where the model is a sum of three `CrystallBall` PDFs, one with a right tail and two with a left tail.
-The `mu` and `sg` parameters are shared.
+The `mu` and `sg` parameters are shared. The elementary components that can be plugged are:
+```
+exp: Exponential
+pol1: Polynomial of degree 1
+pol2: Polynomial of degree 2
+cbr : CrystallBall with right tail
+cbl : CrystallBall with left tail
+gauss : Gaussian
+dscb : Double sided CrystallBall
+```
 ### Printing PDFs
@@ -299,7 +309,7 @@ this will:
 - Try fitting at most 10 times
 - After each fit, calculate the goodness of fit (in this case the p-value)
 - Stop when the number of tries has been exhausted or the p-value reached is higher than `0.05`
-- If the fit has not succeeded because of convergence, validity or goodness of fit issues,
+- If the fit has not succeeded because of convergence, validity or goodness of fit issues,
 randomize the parameters and try again.
 - If the desired goodness of fit has not been achieved, pick the best result.
 - Return the `FitResult` object and set the PDF to the final fit result.
@@ -337,11 +347,11 @@ bkg = zfit.pdf.Exponential(obs=obs, lam=lm)
 nbk = zfit.Parameter('nbk', 1000, 0, 10000)
 ebkg= bkg.create_extended(nbk, name='expo')
-# Add them
+# Add them
 pdf = zfit.pdf.SumPDF([ebkg, esig])
 sam = pdf.create_sampler()
-# Plot them
+# Plot them
 obj   = ZFitPlotter(data=sam, model=pdf)
 d_leg = {'gauss': 'New Gauss'}
 obj.plot(nbins=50, d_leg=d_leg, stacked=True, plot_range=(0, 10), ext_text='Extra text here')
@@ -353,7 +363,7 @@ obj.axs[1].plot([0, 10], [0, 0], linestyle='--', color='black')
 this class supports:
 - Handling title, legend, plots size.
-- Adding pulls.
+- Adding pulls.
 - Stacking and overlaying of PDFs.
 - Blinding.
@@ -417,7 +427,7 @@ rdf_bkg = _get_rdf(kind='bkg')
 cfg     = _get_config()
 obj= TrainMva(sig=rdf_sig, bkg=rdf_bkg, cfg=cfg)
-obj.run()
+obj.run(skip_fit=False) # by default it will be false, if true, it will only make plots of features
 ```
 where the settings for the training go in a config dictionary, which when written to YAML looks like:
@@ -434,7 +444,7 @@ dataset:
     nan:
         x : 0
         y : 0
-        z : -999
+        z : -999
 training :
     nfold    : 10
     features : [x, y, z]
@@ -497,7 +507,7 @@ When training on real data, several things might go wrong and the code will try
 will end up in different folds. The tool checks for wether a model is evaluated for an entry that was used for training and raise an exception. Thus, repeated
 entries will be removed before training.
-- **NaNs**: Entries with NaNs will break the training with the scikit `GradientBoostClassifier` base class. Thus, we:
+- **NaNs**: Entries with NaNs will break the training with the scikit `GradientBoostClassifier` base class. Thus, we:
     - Can use the `nan` section shown above to replace `NaN` values with something else
     - For whatever remains we remove the entries from the training.
@@ -539,7 +549,7 @@ When evaluating the model with real data, problems might occur, we deal with the
     ```python
     model.cfg
     ```
-    - For whatever entries that are still NaN, they will be _patched_  with zeros and evaluated. However, before returning, the probabilities will be
+    - For whatever features that are still NaN, they will be _patched_  with zeros when evaluated. However, the returned probabilities will be
 saved as -1. I.e. entries with NaNs will have probabilities of -1.
 # Pandas dataframes
@@ -674,6 +684,9 @@ ptr.run()
 where the config dictionary `cfg_dat` in YAML would look like:
 ```yaml
+general:
+    # This will set the figure size
+    size : [20, 10]
 selection:
     #Will do at most 50K random entries. Will only happen if the dataset has more than 50K entries
     max_ran_entries : 50000
@@ -703,6 +716,16 @@ plots:
         yscale     : 'linear'
         labels     : ['x + y', 'Entries']
         normalized : true #This should normalize to the area
+# Some vertical dashed lines are drawn by default
+# If you see them, you can turn them off with this
+style:
+  skip_lines : true
+  # This can pass arguments to legend making function `plt.legend()` in matplotlib
+  legend:
+    # The line below would place the legend outside the figure to avoid ovelaps with the histogram
+    bbox_to_anchor : [1.2, 1]
+stats:
+  nentries : '{:.2e}' # This will add number of entries in legend box
 ```
 it's up to the user to build this dictionary and load it.
@@ -724,14 +747,19 @@ The config would look like:
 ```yaml
 saving:
     plt_dir : tests/plotting/2d
+selection:
+  cuts:
+    xlow : x > -1.5
 general:
     size : [20, 10]
 plots_2d:
     # Column x and y
     # Name of column where weights are, null for not weights
     # Name of output plot, e.g. xy_x.png
-    - [x, y, weights, 'xy_w']
-    - [x, y,    null, 'xy_r']
+    # Book signaling to use log scale for z axis
+    - [x, y, weights, 'xy_w', false]
+    - [x, y,    null, 'xy_r', false]
+    - [x, y,    null, 'xy_l',  true]
 axes:
     x :
         binning : [-5.0, 8.0, 40]
@@ -823,7 +851,7 @@ Directory/Treename
     B_ENDVERTEX_CHI2DOF           Double_t
 ```
-## Comparing ROOT files
+## Comparing ROOT files
 Given two ROOT files the command below:
@@ -885,7 +913,7 @@ last_file = get_latest_file(dir_path = file_dir, wc='name_*.txt')
 # of directories in `dir_path`, e.g.:
 oversion=get_last_version(dir_path=dir_path, version_only=True)  # This will return only the version, e.g. v3.2
-oversion=get_last_version(dir_path=dir_path, version_only=False) # This will return full path, e.g. /a/b/c/v3.2
+oversion=get_last_version(dir_path=dir_path, version_only=False) # This will return full path, e.g. /a/b/c/v3.2
 ```
 The function above should work for numeric (e.g. `v1.2`) and non-numeric (e.g. `va`, `vb`) versions.

{data_manipulation_utilities-0.2.4 → data_manipulation_utilities-0.2.6}/README.md RENAMED

@@ -6,7 +6,7 @@ These are tools that can be used for different data analysis tasks.
 ## Pushing
-From the root directory of a version controlled project (i.e. a directory with the `.git` subdirectory)
+From the root directory of a version controlled project (i.e. a directory with the `.git` subdirectory)
 using a `pyproject.toml` file, run:
 ```bash
@@ -16,10 +16,10 @@ publish
 such that:
 1. The `pyproject.toml` file is checked and the version of the project is extracted.
-1. If a tag named as the version exists move to the steps below.
+1. If a tag named as the version exists move to the steps below.
 1. If it does not, make a new tag with the name as the version
-Then, for each remote it pushes the tags and the commits.
+Then, for each remote it pushes the tags and the commits.
 *Why?*
@@ -117,7 +117,17 @@ pdf   = mod.get_pdf()
 ```
 where the model is a sum of three `CrystallBall` PDFs, one with a right tail and two with a left tail.
-The `mu` and `sg` parameters are shared.
+The `mu` and `sg` parameters are shared. The elementary components that can be plugged are:
+```
+exp: Exponential
+pol1: Polynomial of degree 1
+pol2: Polynomial of degree 2
+cbr : CrystallBall with right tail
+cbl : CrystallBall with left tail
+gauss : Gaussian
+dscb : Double sided CrystallBall
+```
 ### Printing PDFs
@@ -279,7 +289,7 @@ this will:
 - Try fitting at most 10 times
 - After each fit, calculate the goodness of fit (in this case the p-value)
 - Stop when the number of tries has been exhausted or the p-value reached is higher than `0.05`
-- If the fit has not succeeded because of convergence, validity or goodness of fit issues,
+- If the fit has not succeeded because of convergence, validity or goodness of fit issues,
 randomize the parameters and try again.
 - If the desired goodness of fit has not been achieved, pick the best result.
 - Return the `FitResult` object and set the PDF to the final fit result.
@@ -317,11 +327,11 @@ bkg = zfit.pdf.Exponential(obs=obs, lam=lm)
 nbk = zfit.Parameter('nbk', 1000, 0, 10000)
 ebkg= bkg.create_extended(nbk, name='expo')
-# Add them
+# Add them
 pdf = zfit.pdf.SumPDF([ebkg, esig])
 sam = pdf.create_sampler()
-# Plot them
+# Plot them
 obj   = ZFitPlotter(data=sam, model=pdf)
 d_leg = {'gauss': 'New Gauss'}
 obj.plot(nbins=50, d_leg=d_leg, stacked=True, plot_range=(0, 10), ext_text='Extra text here')
@@ -333,7 +343,7 @@ obj.axs[1].plot([0, 10], [0, 0], linestyle='--', color='black')
 this class supports:
 - Handling title, legend, plots size.
-- Adding pulls.
+- Adding pulls.
 - Stacking and overlaying of PDFs.
 - Blinding.
@@ -397,7 +407,7 @@ rdf_bkg = _get_rdf(kind='bkg')
 cfg     = _get_config()
 obj= TrainMva(sig=rdf_sig, bkg=rdf_bkg, cfg=cfg)
-obj.run()
+obj.run(skip_fit=False) # by default it will be false, if true, it will only make plots of features
 ```
 where the settings for the training go in a config dictionary, which when written to YAML looks like:
@@ -414,7 +424,7 @@ dataset:
     nan:
         x : 0
         y : 0
-        z : -999
+        z : -999
 training :
     nfold    : 10
     features : [x, y, z]
@@ -477,7 +487,7 @@ When training on real data, several things might go wrong and the code will try
 will end up in different folds. The tool checks for wether a model is evaluated for an entry that was used for training and raise an exception. Thus, repeated
 entries will be removed before training.
-- **NaNs**: Entries with NaNs will break the training with the scikit `GradientBoostClassifier` base class. Thus, we:
+- **NaNs**: Entries with NaNs will break the training with the scikit `GradientBoostClassifier` base class. Thus, we:
     - Can use the `nan` section shown above to replace `NaN` values with something else
     - For whatever remains we remove the entries from the training.
@@ -519,7 +529,7 @@ When evaluating the model with real data, problems might occur, we deal with the
     ```python
     model.cfg
     ```
-    - For whatever entries that are still NaN, they will be _patched_  with zeros and evaluated. However, before returning, the probabilities will be
+    - For whatever features that are still NaN, they will be _patched_  with zeros when evaluated. However, the returned probabilities will be
 saved as -1. I.e. entries with NaNs will have probabilities of -1.
 # Pandas dataframes
@@ -654,6 +664,9 @@ ptr.run()
 where the config dictionary `cfg_dat` in YAML would look like:
 ```yaml
+general:
+    # This will set the figure size
+    size : [20, 10]
 selection:
     #Will do at most 50K random entries. Will only happen if the dataset has more than 50K entries
     max_ran_entries : 50000
@@ -683,6 +696,16 @@ plots:
         yscale     : 'linear'
         labels     : ['x + y', 'Entries']
         normalized : true #This should normalize to the area
+# Some vertical dashed lines are drawn by default
+# If you see them, you can turn them off with this
+style:
+  skip_lines : true
+  # This can pass arguments to legend making function `plt.legend()` in matplotlib
+  legend:
+    # The line below would place the legend outside the figure to avoid ovelaps with the histogram
+    bbox_to_anchor : [1.2, 1]
+stats:
+  nentries : '{:.2e}' # This will add number of entries in legend box
 ```
 it's up to the user to build this dictionary and load it.
@@ -704,14 +727,19 @@ The config would look like:
 ```yaml
 saving:
     plt_dir : tests/plotting/2d
+selection:
+  cuts:
+    xlow : x > -1.5
 general:
     size : [20, 10]
 plots_2d:
     # Column x and y
     # Name of column where weights are, null for not weights
     # Name of output plot, e.g. xy_x.png
-    - [x, y, weights, 'xy_w']
-    - [x, y,    null, 'xy_r']
+    # Book signaling to use log scale for z axis
+    - [x, y, weights, 'xy_w', false]
+    - [x, y,    null, 'xy_r', false]
+    - [x, y,    null, 'xy_l',  true]
 axes:
     x :
         binning : [-5.0, 8.0, 40]
@@ -803,7 +831,7 @@ Directory/Treename
     B_ENDVERTEX_CHI2DOF           Double_t
 ```
-## Comparing ROOT files
+## Comparing ROOT files
 Given two ROOT files the command below:
@@ -865,7 +893,7 @@ last_file = get_latest_file(dir_path = file_dir, wc='name_*.txt')
 # of directories in `dir_path`, e.g.:
 oversion=get_last_version(dir_path=dir_path, version_only=True)  # This will return only the version, e.g. v3.2
-oversion=get_last_version(dir_path=dir_path, version_only=False) # This will return full path, e.g. /a/b/c/v3.2
+oversion=get_last_version(dir_path=dir_path, version_only=False) # This will return full path, e.g. /a/b/c/v3.2
 ```
 The function above should work for numeric (e.g. `v1.2`) and non-numeric (e.g. `va`, `vb`) versions.

{data_manipulation_utilities-0.2.4 → data_manipulation_utilities-0.2.6}/pyproject.toml RENAMED

@@ -1,6 +1,6 @@
 [project]
 name        = 'data_manipulation_utilities'
-version     = '0.2.4'
+version     = '0.2.6'
 readme      = 'README.md'
 dependencies= [
 'logzero',

{data_manipulation_utilities-0.2.4 → data_manipulation_utilities-0.2.6}/src/data_manipulation_utilities.egg-info/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.2
 Name: data_manipulation_utilities
-Version: 0.2.4
+Version: 0.2.6
 Description-Content-Type: text/markdown
 Requires-Dist: logzero
 Requires-Dist: PyYAML
@@ -26,7 +26,7 @@ These are tools that can be used for different data analysis tasks.
 ## Pushing
-From the root directory of a version controlled project (i.e. a directory with the `.git` subdirectory)
+From the root directory of a version controlled project (i.e. a directory with the `.git` subdirectory)
 using a `pyproject.toml` file, run:
 ```bash
@@ -36,10 +36,10 @@ publish
 such that:
 1. The `pyproject.toml` file is checked and the version of the project is extracted.
-1. If a tag named as the version exists move to the steps below.
+1. If a tag named as the version exists move to the steps below.
 1. If it does not, make a new tag with the name as the version
-Then, for each remote it pushes the tags and the commits.
+Then, for each remote it pushes the tags and the commits.
 *Why?*
@@ -137,7 +137,17 @@ pdf   = mod.get_pdf()
 ```
 where the model is a sum of three `CrystallBall` PDFs, one with a right tail and two with a left tail.
-The `mu` and `sg` parameters are shared.
+The `mu` and `sg` parameters are shared. The elementary components that can be plugged are:
+```
+exp: Exponential
+pol1: Polynomial of degree 1
+pol2: Polynomial of degree 2
+cbr : CrystallBall with right tail
+cbl : CrystallBall with left tail
+gauss : Gaussian
+dscb : Double sided CrystallBall
+```
 ### Printing PDFs
@@ -299,7 +309,7 @@ this will:
 - Try fitting at most 10 times
 - After each fit, calculate the goodness of fit (in this case the p-value)
 - Stop when the number of tries has been exhausted or the p-value reached is higher than `0.05`
-- If the fit has not succeeded because of convergence, validity or goodness of fit issues,
+- If the fit has not succeeded because of convergence, validity or goodness of fit issues,
 randomize the parameters and try again.
 - If the desired goodness of fit has not been achieved, pick the best result.
 - Return the `FitResult` object and set the PDF to the final fit result.
@@ -337,11 +347,11 @@ bkg = zfit.pdf.Exponential(obs=obs, lam=lm)
 nbk = zfit.Parameter('nbk', 1000, 0, 10000)
 ebkg= bkg.create_extended(nbk, name='expo')
-# Add them
+# Add them
 pdf = zfit.pdf.SumPDF([ebkg, esig])
 sam = pdf.create_sampler()
-# Plot them
+# Plot them
 obj   = ZFitPlotter(data=sam, model=pdf)
 d_leg = {'gauss': 'New Gauss'}
 obj.plot(nbins=50, d_leg=d_leg, stacked=True, plot_range=(0, 10), ext_text='Extra text here')
@@ -353,7 +363,7 @@ obj.axs[1].plot([0, 10], [0, 0], linestyle='--', color='black')
 this class supports:
 - Handling title, legend, plots size.
-- Adding pulls.
+- Adding pulls.
 - Stacking and overlaying of PDFs.
 - Blinding.
@@ -417,7 +427,7 @@ rdf_bkg = _get_rdf(kind='bkg')
 cfg     = _get_config()
 obj= TrainMva(sig=rdf_sig, bkg=rdf_bkg, cfg=cfg)
-obj.run()
+obj.run(skip_fit=False) # by default it will be false, if true, it will only make plots of features
 ```
 where the settings for the training go in a config dictionary, which when written to YAML looks like:
@@ -434,7 +444,7 @@ dataset:
     nan:
         x : 0
         y : 0
-        z : -999
+        z : -999
 training :
     nfold    : 10
     features : [x, y, z]
@@ -497,7 +507,7 @@ When training on real data, several things might go wrong and the code will try
 will end up in different folds. The tool checks for wether a model is evaluated for an entry that was used for training and raise an exception. Thus, repeated
 entries will be removed before training.
-- **NaNs**: Entries with NaNs will break the training with the scikit `GradientBoostClassifier` base class. Thus, we:
+- **NaNs**: Entries with NaNs will break the training with the scikit `GradientBoostClassifier` base class. Thus, we:
     - Can use the `nan` section shown above to replace `NaN` values with something else
     - For whatever remains we remove the entries from the training.
@@ -539,7 +549,7 @@ When evaluating the model with real data, problems might occur, we deal with the
     ```python
     model.cfg
     ```
-    - For whatever entries that are still NaN, they will be _patched_  with zeros and evaluated. However, before returning, the probabilities will be
+    - For whatever features that are still NaN, they will be _patched_  with zeros when evaluated. However, the returned probabilities will be
 saved as -1. I.e. entries with NaNs will have probabilities of -1.
 # Pandas dataframes
@@ -674,6 +684,9 @@ ptr.run()
 where the config dictionary `cfg_dat` in YAML would look like:
 ```yaml
+general:
+    # This will set the figure size
+    size : [20, 10]
 selection:
     #Will do at most 50K random entries. Will only happen if the dataset has more than 50K entries
     max_ran_entries : 50000
@@ -703,6 +716,16 @@ plots:
         yscale     : 'linear'
         labels     : ['x + y', 'Entries']
         normalized : true #This should normalize to the area
+# Some vertical dashed lines are drawn by default
+# If you see them, you can turn them off with this
+style:
+  skip_lines : true
+  # This can pass arguments to legend making function `plt.legend()` in matplotlib
+  legend:
+    # The line below would place the legend outside the figure to avoid ovelaps with the histogram
+    bbox_to_anchor : [1.2, 1]
+stats:
+  nentries : '{:.2e}' # This will add number of entries in legend box
 ```
 it's up to the user to build this dictionary and load it.
@@ -724,14 +747,19 @@ The config would look like:
 ```yaml
 saving:
     plt_dir : tests/plotting/2d
+selection:
+  cuts:
+    xlow : x > -1.5
 general:
     size : [20, 10]
 plots_2d:
     # Column x and y
     # Name of column where weights are, null for not weights
     # Name of output plot, e.g. xy_x.png
-    - [x, y, weights, 'xy_w']
-    - [x, y,    null, 'xy_r']
+    # Book signaling to use log scale for z axis
+    - [x, y, weights, 'xy_w', false]
+    - [x, y,    null, 'xy_r', false]
+    - [x, y,    null, 'xy_l',  true]
 axes:
     x :
         binning : [-5.0, 8.0, 40]
@@ -823,7 +851,7 @@ Directory/Treename
     B_ENDVERTEX_CHI2DOF           Double_t
 ```
-## Comparing ROOT files
+## Comparing ROOT files
 Given two ROOT files the command below:
@@ -885,7 +913,7 @@ last_file = get_latest_file(dir_path = file_dir, wc='name_*.txt')
 # of directories in `dir_path`, e.g.:
 oversion=get_last_version(dir_path=dir_path, version_only=True)  # This will return only the version, e.g. v3.2
-oversion=get_last_version(dir_path=dir_path, version_only=False) # This will return full path, e.g. /a/b/c/v3.2
+oversion=get_last_version(dir_path=dir_path, version_only=False) # This will return full path, e.g. /a/b/c/v3.2
 ```
 The function above should work for numeric (e.g. `v1.2`) and non-numeric (e.g. `va`, `vb`) versions.

{data_manipulation_utilities-0.2.4 → data_manipulation_utilities-0.2.6}/src/data_manipulation_utilities.egg-info/SOURCES.txt RENAMED

@@ -38,10 +38,12 @@ src/dmu_data/ml/tests/train_mva.yaml
 src/dmu_data/plotting/tests/2d.yaml
 src/dmu_data/plotting/tests/fig_size.yaml
 src/dmu_data/plotting/tests/high_stat.yaml
+src/dmu_data/plotting/tests/legend.yaml
 src/dmu_data/plotting/tests/name.yaml
 src/dmu_data/plotting/tests/no_bounds.yaml
 src/dmu_data/plotting/tests/normalized.yaml
 src/dmu_data/plotting/tests/simple.yaml
+src/dmu_data/plotting/tests/stats.yaml
 src/dmu_data/plotting/tests/title.yaml
 src/dmu_data/plotting/tests/weights.yaml
 src/dmu_data/text/transform.toml

{data_manipulation_utilities-0.2.4 → data_manipulation_utilities-0.2.6}/src/dmu/ml/cv_classifier.py RENAMED

@@ -1,15 +1,15 @@
 '''
 Module holding cv_classifier class
 '''
+import os
 from typing                  import Union
 from sklearn.ensemble        import GradientBoostingClassifier
+import yaml
 from dmu.logging.log_store import LogStore
 import dmu.ml.utilities    as ut
 log = LogStore.add_logger('dmu:ml:CVClassifier')
 # ---------------------------------------
 class CVSameData(Exception):
     '''
@@ -61,6 +61,20 @@ class CVClassifier(GradientBoostingClassifier):
         return self._cfg
     # ----------------------------------
+    def save_cfg(self, path : str):
+        '''
+        Will save configuration used to train this classifier to YAML
+        path: Path to YAML file
+        '''
+        dir_name = os.path.dirname(path)
+        os.makedirs(dir_name, exist_ok=True)
+        with open(path, 'w', encoding='utf-8') as ofile:
+            yaml.safe_dump(self._cfg, ofile, indent=2)
+        log.info(f'Saved config to: {path}')
+    # ----------------------------------
     def __str__(self):
         nhash = len(self._s_hash)
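
Note on the `save_cfg` method added in this file: `os.path.dirname(path)` returns an empty string when `path` has no directory component (e.g. `cfg.yaml`), and `os.makedirs('')` raises `FileNotFoundError`. The snippet below is a minimal sketch of one possible guard, not code from the package; `ensure_parent_dir` is a hypothetical helper name introduced only for illustration.

```python
import os
import tempfile

def ensure_parent_dir(path: str) -> None:
    '''
    Create the parent directory of `path`, if the path has one.
    Hypothetical helper, not part of data_manipulation_utilities.
    '''
    dir_name = os.path.dirname(path)
    # For a bare file name dir_name is '', and os.makedirs('') would raise
    if dir_name:
        os.makedirs(dir_name, exist_ok=True)

# A nested path: the missing directories are created
with tempfile.TemporaryDirectory() as tmp_dir:
    nested = os.path.join(tmp_dir, 'a', 'b', 'cfg.yaml')
    ensure_parent_dir(nested)
    nested_dir_exists = os.path.isdir(os.path.dirname(nested))

# A bare file name: nothing to create, and no exception is raised
ensure_parent_dir('cfg.yaml')
```

With such a guard, calling the method with a file name relative to the current directory would skip the directory creation instead of failing.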

{data_manipulation_utilities-0.2.4 → data_manipulation_utilities-0.2.6}/src/dmu/ml/cv_predict.py RENAMED

@@ -73,11 +73,11 @@ class CVPredict:
             log.debug('Not doing any NaN replacement')
             return df
-        log.debug(60 * '-')
+        log.info(60 * '-')
         log.info('Doing NaN replacements')
-        log.debug(60 * '-')
+        log.info(60 * '-')
         for var, val in self._d_nan_rep.items():
-            log.debug(f'{var:<20}{"--->":20}{val:<20.3f}')
+            log.info(f'{var:<20}{"--->":20}{val:<20.3f}')
             df[var] = df[var].fillna(val)
         return df
@@ -155,7 +155,7 @@ class CVPredict:
         ndif = len(s_dif_hash)
         ndat = len(s_dat_hash)
         nmod = len(s_mod_hash)
-        log.debug(f'{ndif:<20}{"=":10}{ndat:<20}{"-":10}{nmod:<20}')
+        log.debug(f'{ndif:<10}{"=":5}{ndat:<10}{"-":5}{nmod:<10}')
         df_ft_group= df_ft.loc[df_ft.index.isin(s_dif_hash)]
@@ -173,7 +173,7 @@ class CVPredict:
             return arr_prb
         nentries = len(self._arr_patch)
-        log.warning(f'Patching {nentries} probabilities')
+        log.warning(f'Patching {nentries} probabilities with -1')
         arr_prb[self._arr_patch] = -1
         return arr_prb
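
The NaN handling touched by this file (patch the remaining NaN features with zeros, evaluate, then report a probability of -1 for those entries) can be illustrated with a small standalone sketch. It uses plain Python lists rather than the arrays and dataframes used in `CVPredict`, and `patch_and_flag` and `toy_model` are hypothetical names, not the package's API.

```python
import math

def patch_and_flag(rows, predict):
    '''
    rows    : list of feature lists, possibly containing NaN
    predict : hypothetical callable mapping one clean feature list to a probability
    '''
    probs = []
    for row in rows:
        has_nan = any(math.isnan(val) for val in row)
        # Patch NaNs with zero so that the model can be evaluated at all
        clean = [0.0 if math.isnan(val) else val for val in row]
        prob = predict(clean)
        # The prediction of a patched entry is not meaningful, flag it with -1
        probs.append(-1.0 if has_nan else prob)

    return probs

# Toy stand-in for a trained classifier, always returns 0.5
toy_model = lambda feats: 0.5
probs = patch_and_flag([[1.0, 2.0], [float('nan'), 3.0]], toy_model)
```

Here the second entry contains a NaN, so its probability is reported as -1 while the first entry keeps the model output.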
