PyPI - data-manipulation-utilities - Versions diffs - 0.1.6__py3-none-any.whl → 0.1.9__py3-none-any.whl - Mend

data-manipulation-utilities 0.1.6__py3-none-any.whl → 0.1.9__py3-none-any.whl


Files changed (15)

{data_manipulation_utilities-0.1.6.dist-info → data_manipulation_utilities-0.1.9.dist-info}/METADATA RENAMED

@@ -1,6 +1,6 @@
-Metadata-Version: 2.1
+Metadata-Version: 2.2
 Name: data_manipulation_utilities
-Version: 0.1.6
+Version: 0.1.9
 Description-Content-Type: text/markdown
 Requires-Dist: logzero
 Requires-Dist: PyYAML
@@ -41,7 +41,7 @@ such that:
 Then, for each remote it pushes the tags and the commits.
-*Why?*
+*Why?*
 1. Tags should be named as the project's version
 1. As soon as a new version is created, that version needs to be tagged.
@@ -121,6 +121,24 @@ samples:
 ## PDFs
+### Model building
+In order to do complex fits, one often needs PDFs with many parameters, which need to be added.
+In these PDFs certain parameters (e.g. $\mu$ or $\sigma$) need to be shared. This project provides
+`ModelFactory`, which can do this as shown below:
+```python
+from dmu.stats.model_factory import ModelFactory
+l_pdf = ['cbr'] + 2 * ['cbl']
+l_shr = ['mu', 'sg']
+mod   = ModelFactory(obs = Data.obs, l_pdf = l_pdf, l_shared=l_shr)
+pdf   = mod.get_pdf()
+```
+where the model is a sum of three `CrystallBall` PDFs, one with a right tail and two with a left tail.
+The `mu` and `sg` parameters are shared.
 ### Printing PDFs
 One can print a zfit PDF by doing:
@@ -231,6 +249,87 @@ likelihood :
     nbins : 100 #If specified, will do binned likelihood fit instead of unbinned
 ```
+## Minimizers
+These are alternative implementations of the minimizers in zfit meant to be used for special types of fits.
+### Anealing minimizer
+This minimizer is meant to be used for fits to models with many parameters, where multiple minima are expected in the
+likelihood. The minimizer use is illustrated in:
+```python
+from dmu.stats.minimizers  import AnealingMinimizer
+nll       = _get_nll()
+minimizer = AnealingMinimizer(ntries=10, pvalue=0.05)
+res       = minimizer.minimize(nll)
+```
+this will:
+- Take the `NLL` object.
+- Try fitting at most 10 times
+- After each fit, calculate the goodness of fit (in this case the p-value)
+- Stop when the number of tries has been exhausted or the p-value reached is higher than `0.05`
+- If the fit has not succeeded because of convergence, validity or goodness of fit issues,
+randomize the parameters and try again.
+- If the desired goodness of fit has not been achieved, pick the best result.
+- Return the `FitResult` object and set the PDF to the final fit result.
+The $\chi^2/Ndof$ can also be used as in:
+```python
+from dmu.stats.minimizers  import AnealingMinimizer
+nll       = _get_nll()
+minimizer = AnealingMinimizer(ntries=10, chi2ndof=1.00)
+res       = minimizer.minimize(nll)
+```
+## Fit plotting
+The class `ZFitPlotter` can be used to plot fits done with zfit. For a complete set of examples of how to use
+this class check the [tests](tests/stats/test_fit_plotter.py). A simple example of its usage is below:
+```python
+from dmu.stats.zfit_plotter import ZFitPlotter
+obs = zfit.Space('m', limits=(0, 10))
+# Create signal PDF
+mu  = zfit.Parameter("mu", 5.0,  0, 10)
+sg  = zfit.Parameter("sg", 0.5,  0,  5)
+sig = zfit.pdf.Gauss(obs=obs, mu=mu, sigma=sg)
+nsg = zfit.Parameter('nsg', 1000, 0, 10000)
+esig= sig.create_extended(nsg, name='gauss')
+# Create background PDF
+lm  = zfit.Parameter('lm', -0.1, -1, 0)
+bkg = zfit.pdf.Exponential(obs=obs, lam=lm)
+nbk = zfit.Parameter('nbk', 1000, 0, 10000)
+ebkg= bkg.create_extended(nbk, name='expo')
+# Add them
+pdf = zfit.pdf.SumPDF([ebkg, esig])
+sam = pdf.create_sampler()
+# Plot them
+obj   = ZFitPlotter(data=sam, model=pdf)
+d_leg = {'gauss': 'New Gauss'}
+obj.plot(nbins=50, d_leg=d_leg, stacked=True, plot_range=(0, 10), ext_text='Extra text here')
+# add a line to pull hist
+obj.axs[1].plot([0, 10], [0, 0], linestyle='--', color='black')
+```
+this class supports:
+- Handling title, legend, plots size.
+- Adding pulls.
+- Stacking and overlaying of PDFs.
+- Blinding.
 ## Arrays
 ### Scaling by non-integer
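
Note on the "Anealing minimizer" README hunk above: the retry procedure listed there can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the package's implementation. `minimize_with_retries` and `fit_once` are hypothetical names; `fit_once` stands in for "randomize the parameters, run one fit, compute the p-value" and returns a `(fit_is_usable, p_value)` pair.

```python
import random
from typing import Callable, Optional, Tuple


def minimize_with_retries(
    fit_once: Callable[[random.Random], Tuple[bool, float]],
    ntries: int = 10,
    pvalue: float = 0.05,
    seed: int = 0,
) -> Optional[Tuple[int, float]]:
    """Retry a fit until its p-value exceeds `pvalue` or `ntries` is exhausted.

    Returns (try_index, p_value) of the accepted attempt, or of the best
    usable attempt if the threshold was never reached, or None if no
    attempt was usable at all.
    """
    rng = random.Random(seed)  # source of the randomized starting values
    best: Optional[Tuple[int, float]] = None

    for i_try in range(ntries):
        usable, pval = fit_once(rng)
        if not usable:
            # Convergence or validity problem: try again with new values
            continue
        if best is None or pval > best[1]:
            best = (i_try, pval)
        if pval > pvalue:
            # Desired goodness of fit reached, stop early
            break

    return best


# Usage with a scripted sequence of fit outcomes (hypothetical values)
outcomes = iter([(False, 0.0), (True, 0.01), (True, 0.20), (True, 0.90)])
accepted = minimize_with_retries(lambda rng: next(outcomes), ntries=10, pvalue=0.05)
```

In the scripted example the first attempt is unusable, the second is below the threshold, and the third is accepted, so the fourth is never run.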

{data_manipulation_utilities-0.1.6.dist-info → data_manipulation_utilities-0.1.9.dist-info}/RECORD RENAMED

@@ -1,4 +1,4 @@
-data_manipulation_utilities-0.1.6.data/scripts/publish,sha256=-3K_Y2_4CfWCV50rPB8CRuhjxDu7xMGswinRwPovgLs,1976
+data_manipulation_utilities-0.1.9.data/scripts/publish,sha256=-3K_Y2_4CfWCV50rPB8CRuhjxDu7xMGswinRwPovgLs,1976
 dmu/arrays/utilities.py,sha256=PKoYyybPptA2aU-V3KLnJXBudWxTXu4x1uGdIMQ49HY,1722
 dmu/generic/utilities.py,sha256=0Xnq9t35wuebAqKxbyAiMk1ISB7IcXK4cFH25MT1fgw,1741
 dmu/logging/log_store.py,sha256=umdvjNDuV3LdezbG26b0AiyTglbvkxST19CQu9QATbA,4184
@@ -6,21 +6,25 @@ dmu/ml/cv_classifier.py,sha256=n81m7i2M6Zq96AEd9EZGwXSrbG5m9jkS5RdeXvbsAXU,3712
 dmu/ml/cv_predict.py,sha256=Bqxu-f6qquKJokFljhCzL_kiGcjLJLQFhVBD130fsyw,4893
 dmu/ml/train_mva.py,sha256=d_n-A07DFweikz5nXap4OE_Mqx8VprFT7zbxmnQAbac,9638
 dmu/ml/utilities.py,sha256=Nue7O9zi1QXgjGRPH6wnSAW9jusMQ2ZOSDJzBqJKIi0,3687
-dmu/plotting/plotter.py,sha256=laa6Kl7P-ZOIhaOFBVjOH4XQ4kPCV7wBNvLIMBnyCwM,7181
-dmu/plotting/plotter_1d.py,sha256=G-i94uzm2TjNaog1A4agAKar_G0qNdkAqIPCmzhe85Y,3660
-dmu/plotting/plotter_2d.py,sha256=SWPKns-CfpUZHgBXvwm3gceH3k2eL_mKGXQ8sWpZJB0,2919
+dmu/plotting/plotter.py,sha256=ytMxtzHEY8ZFU0ZKEBE-ROjMszXl5kHTMnQnWe173nU,7208
+dmu/plotting/plotter_1d.py,sha256=O7rTgCBlpCko1RSpj2TzcUIfx9sKoz2jAgw73Pz7Ynk,4472
+dmu/plotting/plotter_2d.py,sha256=J-gKnagoHGfJFU7HBrhDFpGYH5Rxy0_zF5l8eE_7ZHE,2944
 dmu/rdataframe/atr_mgr.py,sha256=FdhaQWVpsm4OOe1IRbm7rfrq8VenTNdORyI-lZ2Bs1M,2386
 dmu/rdataframe/utilities.py,sha256=x8r379F2-vZPYzAdMFCn_V4Kx2Tx9t9pn_QHcZ1euew,2756
 dmu/rfile/rfprinter.py,sha256=mp5jd-oCJAnuokbdmGyL9i6tK2lY72jEfROuBIZ_ums,3941
 dmu/rfile/utilities.py,sha256=XuYY7HuSBj46iSu3c60UYBHtI6KIPoJU_oofuhb-be0,945
-dmu/stats/fitter.py,sha256=LDvFNyhgO0OzXN7aH3kfHe6LzuPqdQfPcKR_IegDcaU,18204
+dmu/stats/fitter.py,sha256=vHNZ16U3apoQyeyM8evq-if49doF48sKB3q9wmA96Fw,18387
 dmu/stats/function.py,sha256=yzi_Fvp_ASsFzbWFivIf-comquy21WoeY7is6dgY0Go,9491
+dmu/stats/gof_calculator.py,sha256=4EN6OhULcztFvsAZ00rxgohJemnjtDNB5o0IBcv6kbk,4657
+dmu/stats/minimizers.py,sha256=f9cilFY9Kp9UvbSIUsKBGFzOOg7EEWZJLPod-4k-LAQ,6216
+dmu/stats/model_factory.py,sha256=LyDOf0f9I5dNUTS0MXHtSivD8aAcTLIagvMPtoXtThk,7426
 dmu/stats/utilities.py,sha256=LQy4kd3xSXqpApcWuYfZxkGQyjowaXv2Wr1c4Bj-4ys,4523
+dmu/stats/zfit_plotter.py,sha256=Xs6kisNEmNQXhYRCcjowxO6xHuyAyrfyQIFhGAR61U4,19719
 dmu/testing/utilities.py,sha256=WbMM4e9Cn3-B-12Vr64mB5qTKkV32joStlRkD-48lG0,3460
 dmu/text/transformer.py,sha256=4lrGknbAWRm0-rxbvgzOO-eR1-9bkYk61boJUEV3cQ0,6100
 dmu_data/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
 dmu_data/ml/tests/train_mva.yaml,sha256=TCniCVpXMEFxZcHa8IIqollKA7ci4OkBnRznLEkXM9o,925
-dmu_data/plotting/tests/2d.yaml,sha256=lTMNheK3DB8klp4O5QjMDwBI1A1Oh2_Wp2F2Ro9VQKM,282
+dmu_data/plotting/tests/2d.yaml,sha256=VApcAfJFbjNcjMCTBSRm2P37MQlGavMZv6msbZwLSgw,402
 dmu_data/plotting/tests/fig_size.yaml,sha256=7ROq49nwZ1A2EbPiySmu6n3G-Jq6YAOkc3d2X3YNZv0,294
 dmu_data/plotting/tests/high_stat.yaml,sha256=bLglBLCZK6ft0xMhQ5OltxE76cWsBMPMjO6GG0OkDr8,522
 dmu_data/plotting/tests/name.yaml,sha256=mkcPAVg8wBAmlSbSRQ1bcaMl4vOS6LXMtpqQeDrrtO4,312
@@ -39,8 +43,8 @@ dmu_scripts/rfile/compare_root_files.py,sha256=T8lDnQxsRNMr37x1Y7YvWD8ySHrJOWZki
 dmu_scripts/rfile/print_trees.py,sha256=Ze4Ccl_iUldl4eVEDVnYBoe4amqBT1fSBR1zN5WSztk,941
 dmu_scripts/ssh/coned.py,sha256=lhilYNHWRCGxC-jtyJ3LQ4oUgWW33B2l1tYCcyHHsR0,4858
 dmu_scripts/text/transform_text.py,sha256=9akj1LB0HAyopOvkLjNOJiptZw5XoOQLe17SlcrGMD0,1456
-data_manipulation_utilities-0.1.6.dist-info/METADATA,sha256=1ttATABwWcdqqPJM72_4s_ZQjtbFp9MzkfsprkDJTv8,19946
-data_manipulation_utilities-0.1.6.dist-info/WHEEL,sha256=PZUExdf71Ui_so67QXpySuHtCi3-J3wvF4ORK6k_S8U,91
-data_manipulation_utilities-0.1.6.dist-info/entry_points.txt,sha256=1TIZDed651KuOH-DgaN5AoBdirKmrKE_oM1b6b7zTUU,270
-data_manipulation_utilities-0.1.6.dist-info/top_level.txt,sha256=n_x5J6uWtSqy9mRImKtdA2V2NJNyU8Kn3u8DTOKJix0,25
-data_manipulation_utilities-0.1.6.dist-info/RECORD,,
+data_manipulation_utilities-0.1.9.dist-info/METADATA,sha256=sxu2cZc14f4VfDD2J3MLGmW0jRHXJBpmDspXUt1D_0k,23046
+data_manipulation_utilities-0.1.9.dist-info/WHEEL,sha256=In9FTNxeP60KnTkGw7wk6mJPYd_dQSjEZmXdBdMCI-8,91
+data_manipulation_utilities-0.1.9.dist-info/entry_points.txt,sha256=1TIZDed651KuOH-DgaN5AoBdirKmrKE_oM1b6b7zTUU,270
+data_manipulation_utilities-0.1.9.dist-info/top_level.txt,sha256=n_x5J6uWtSqy9mRImKtdA2V2NJNyU8Kn3u8DTOKJix0,25
+data_manipulation_utilities-0.1.9.dist-info/RECORD,,

{data_manipulation_utilities-0.1.6.dist-info → data_manipulation_utilities-0.1.9.dist-info}/WHEEL RENAMED

@@ -1,5 +1,5 @@
 Wheel-Version: 1.0
-Generator: setuptools (75.6.0)
+Generator: setuptools (75.8.0)
 Root-Is-Purelib: true
 Tag: py3-none-any

dmu/plotting/plotter.py CHANGED

@@ -65,7 +65,7 @@ class Plotter:
         return minx, maxx
     #-------------------------------------
-    def _preprocess_rdf(self, rdf):
+    def _preprocess_rdf(self, rdf : RDataFrame) -> RDataFrame:
         '''
         rdf (RDataFrame): ROOT dataframe

dmu/plotting/plotter_1d.py CHANGED

@@ -2,6 +2,9 @@
 Module containing plotter class
 '''
+import hist
+from hist import Hist
 import numpy
 import matplotlib.pyplot as plt
@@ -33,58 +36,75 @@ class Plotter1D(Plotter):
         return xname, yname
     #-------------------------------------
-    def _plot_var(self, var):
+    def _is_normalized(self, var : str) -> bool:
+        d_cfg     = self._d_cfg['plots'][var]
+        normalized=False
+        if 'normalized' in d_cfg:
+            normalized = d_cfg['normalized']
+        return normalized
+    #-------------------------------------
+    def _get_binning(self, var : str, d_data : dict[str, numpy.ndarray]) -> tuple[float, float, int]:
+        d_cfg  = self._d_cfg['plots'][var]
+        minx, maxx, bins = d_cfg['binning']
+        if maxx <= minx + 1e-5:
+            log.info(f'Bounds not set for {var}, will calculated them')
+            minx, maxx = self._find_bounds(d_data = d_data, qnt=minx)
+            log.info(f'Using bounds [{minx:.3e}, {maxx:.3e}]')
+        else:
+            log.debug(f'Using bounds [{minx:.3e}, {maxx:.3e}]')
+        return minx, maxx, bins
+    #-------------------------------------
+    def _plot_var(self, var : str) -> float:
         '''
         Will plot a variable from a dictionary of dataframes
         Parameters
         --------------------
         var   (str)  : name of column
+        Return
+        --------------------
+        Largest bin content among all bins and among all histograms plotted
         '''
         # pylint: disable=too-many-locals
-        d_cfg = self._d_cfg['plots'][var]
-        minx, maxx, bins = d_cfg['binning']
-        yscale           = d_cfg['yscale' ] if 'yscale' in d_cfg else 'linear'
-        xname, yname     = self._get_labels(var)
-        normalized=False
-        if 'normalized' in d_cfg:
-            normalized = d_cfg['normalized']
-        title = ''
-        if 'title'      in d_cfg:
-            title = d_cfg['title']
         d_data = {}
         for name, rdf in self._d_rdf.items():
             d_data[name] = rdf.AsNumpy([var])[var]
-        if maxx <= minx + 1e-5:
-            log.info(f'Bounds not set for {var}, will calculated them')
-            minx, maxx = self._find_bounds(d_data = d_data, qnt=minx)
-            log.info(f'Using bounds [{minx:.3e}, {maxx:.3e}]')
-        else:
-            log.debug(f'Using bounds [{minx:.3e}, {maxx:.3e}]')
+        minx, maxx, bins = self._get_binning(var, d_data)
+        d_wgt            = self._get_weights(var)
         l_bc_all = []
-        d_wgt    = self._get_weights(var)
         for name, arr_val in d_data.items():
-            arr_wgt    = d_wgt[name] if d_wgt is not None else None
-            self._print_weights(arr_wgt, var, name)
-            l_bc, _, _ = plt.hist(arr_val, weights=arr_wgt, bins=bins, range=(minx, maxx), density=normalized, histtype='step', label=name)
-            l_bc_all  += numpy.array(l_bc).tolist()
+            arr_wgt      = d_wgt[name] if d_wgt is not None else numpy.ones_like(arr_val)
+            hst          = Hist.new.Reg(bins=bins, start=minx, stop=maxx, name='x', label=name).Weight()
+            hst.fill(x=arr_val, weight=arr_wgt)
+            hst.plot(label=name)
+            l_bc_all    += hst.values().tolist()
-            plt.yscale(yscale)
-            plt.xlabel(xname)
-            plt.ylabel(yname)
+        max_y = max(l_bc_all)
+        return max_y
+    # --------------------------------------------
+    def _style_plot(self, var : str, max_y : float) -> None:
+        d_cfg  = self._d_cfg['plots'][var]
+        yscale = d_cfg['yscale' ] if 'yscale' in d_cfg else 'linear'
+        xname, yname = self._get_labels(var)
+        plt.xlabel(xname)
+        plt.ylabel(yname)
+        plt.yscale(yscale)
         if yscale == 'linear':
             plt.ylim(bottom=0)
-        max_y = max(l_bc_all)
+        title = ''
+        if 'title'      in d_cfg:
+            title = d_cfg['title']
         plt.ylim(top=1.2 * max_y)
+        plt.legend()
         plt.title(title)
     # --------------------------------------------
     def _plot_lines(self, var : str):
@@ -106,8 +126,10 @@ class Plotter1D(Plotter):
         fig_size = self._get_fig_size()
         for var in self._d_cfg['plots']:
             log.debug(f'Plotting: {var}')
             plt.figure(var, figsize=fig_size)
-            self._plot_var(var)
+            max_y = self._plot_var(var)
+            self._style_plot(var, max_y)
             self._plot_lines(var)
             self._save_plot(var)
 # --------------------------------------------
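
Note on the `_plot_var` change above: the new code fills a weighted, regularly binned histogram, defaults to unit weights when none are configured, and returns the largest bin content so that `_style_plot` can set the upper y limit to `1.2 * max_y`. The sketch below reproduces that bookkeeping in plain Python. `fill_regular` is a hypothetical helper, not part of `dmu` or `hist`. It assumes half-open bins `[start, stop)` with out-of-range values dropped, which to my understanding is how regular axes in `hist` treat under- and overflow; the removed `plt.hist` call, by contrast, counts a value equal to the upper bound in the last bin.

```python
from typing import List, Optional, Sequence


def fill_regular(
    values: Sequence[float],
    weights: Optional[Sequence[float]],
    bins: int,
    start: float,
    stop: float,
) -> List[float]:
    """Return weighted bin contents for `bins` equal-width bins on [start, stop)."""
    if weights is None:
        # Same default as the new code: every entry gets weight 1
        weights = [1.0] * len(values)

    width = (stop - start) / bins
    contents = [0.0] * bins
    for x, w in zip(values, weights):
        if not start <= x < stop:
            # Underflow or overflow: not part of the visible bin contents
            continue
        idx = min(int((x - start) / width), bins - 1)  # guard against rounding
        contents[idx] += w
    return contents


# Usage: two bins on [0, 2); the values 10.0 and -1.0 fall outside the range
contents = fill_regular([0.5, 1.5, 1.7, 10.0, -1.0], None, bins=2, start=0.0, stop=2.0)
max_y = max(contents)   # largest bin content, as returned by _plot_var
y_top = 1.2 * max_y     # upper y limit, as applied in _style_plot
```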

dmu/plotting/plotter_2d.py CHANGED

@@ -31,8 +31,8 @@ class Plotter2D(Plotter):
         if not isinstance(cfg, dict):
             raise ValueError('Config dictionary not passed')
-        self._rdf   : RDataFrame = rdf
         self._d_cfg : dict       = cfg
+        self._rdf   : RDataFrame = super()._preprocess_rdf(rdf)
         self._wgt : numpy.ndarray
     # --------------------------------------------

dmu/stats/fitter.py CHANGED

@@ -4,6 +4,7 @@ Module holding zfitter class
 import pprint
 from typing                   import Union
+from functools                import lru_cache
 import numpy
 import zfit
@@ -100,8 +101,8 @@ class Fitter:
         return data
     #------------------------------
-    def _bin_pdf(self, nbins):
-        [[min_x]], [[max_x]] = self._pdf.space.limits
+    def _bin_pdf(self):
+        nbins, min_x, max_x = self._get_binning()
         _, arr_edg = numpy.histogram(self._data_np, bins = nbins, range=(min_x, max_x))
         size = arr_edg.size
@@ -117,23 +118,29 @@ class Fitter:
         return numpy.array(l_bc)
     #------------------------------
+    def _bin_data(self):
+        nbins, min_x, max_x = self._get_binning()
+        arr_data, _ = numpy.histogram(self._data_np, bins = nbins, range=(min_x, max_x))
+        arr_data    = arr_data.astype(float)
+        return arr_data
+    #------------------------------
+    @lru_cache(maxsize=10)
     def _get_binning(self):
         min_x = numpy.min(self._data_np)
         max_x = numpy.max(self._data_np)
         nbins = self._ndof + self._get_float_pars()
+        log.debug(f'Nbins: {nbins}')
+        log.debug(f'Range: [{min_x:.3f}, {max_x:.3f}]')
         return nbins, min_x, max_x
     #------------------------------
     def _calc_gof(self):
         log.debug('Calculating GOF')
-        nbins, min_x, max_x = self._get_binning()
-        log.debug(f'Nbins: {nbins}')
-        log.debug(f'Range: [{min_x:.3f}, {max_x:.3f}]')
-        arr_data, _ = numpy.histogram(self._data_np, bins = nbins, range=(min_x, max_x))
-        arr_data    = arr_data.astype(float)
-        arr_modl    = self._bin_pdf(nbins)
+        arr_data    = self._bin_data()
+        arr_modl    = self._bin_pdf()
         norm        = numpy.sum(arr_data) / numpy.sum(arr_modl)
         arr_modl    = norm * arr_modl
         arr_res     = arr_modl - arr_data
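
Note on `@lru_cache` applied to `_get_binning` above: when `functools.lru_cache` decorates an instance method, the cache key is the instance itself, so the first result is reused on later calls even if the instance's attributes change afterwards, and the cache keeps a reference to the instance. Whether this matters for `Fitter` depends on whether its data or `ndof` can change after the first call, which the hunks shown here do not settle. The sketch below only illustrates the general behavior; `Binner` is a hypothetical class, not part of `dmu`.

```python
from functools import lru_cache
from typing import List, Tuple


class Binner:
    """Toy class whose range calculation is cached per instance."""

    def __init__(self, data: List[float]):
        self.data = list(data)
        self.calls = 0  # counts how often the body actually runs

    @lru_cache(maxsize=10)
    def get_range(self) -> Tuple[float, float]:
        self.calls += 1
        return min(self.data), max(self.data)


binner = Binner([1.0, 2.0, 3.0])
first = binner.get_range()
second = binner.get_range()        # served from the cache, body not re-run

binner.data.append(10.0)
stale = binner.get_range()         # still the first result: key is `binner`

Binner.get_range.cache_clear()     # drop all cached entries for this method
fresh = binner.get_range()         # recomputed with the new data
```

If the inputs can change, clearing the cache at the point of change, or caching on an explicit immutable key, avoids the stale result.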

dmu/stats/gof_calculator.py ADDED

@@ -0,0 +1,145 @@
+'''
+Module holding GofCalculator class
+'''
+from functools import lru_cache
+import zfit
+import numpy
+import pandas as pnd
+from scipy                  import stats
+from zfit.core.basepdf      import BasePDF   as zpdf
+from zfit.core.parameter    import Parameter as zpar
+from dmu.logging.log_store  import LogStore
+log = LogStore.add_logger('dmu:stats:gofcalculator')
+# ------------------------
+class GofCalculator:
+    '''
+    Class used to calculate goodness of fit from zfit NLL
+    '''
+    # ---------------------
+    def __init__(self, nll, ndof : int = 10):
+        self._nll     = nll
+        self._ndof    = ndof
+        self._pdf     = self._pdf_from_nll()
+        self._data_in = self._data_from_nll()
+        self._data_np = self._data_np_from_data(self._data_in)
+        self._data_zf = zfit.Data.from_numpy(obs=self._pdf.space, array=self._data_np)
+    # ---------------------
+    def _data_np_from_data(self, dat) -> numpy.ndarray:
+        if   isinstance(dat, numpy.ndarray):
+            return dat
+        if isinstance(dat, zfit.Data):
+            return zfit.run(zfit.z.unstack_x(dat))
+        if isinstance(dat, pnd.DataFrame):
+            return dat.to_numpy()
+        if isinstance(dat, pnd.Series):
+            dat    = pnd.DataFrame(dat)
+            return dat.to_numpy()
+        data_type = str(type(dat))
+        raise ValueError(f'Data is not a numpy array, zfit.Data or pandas.DataFrame, but {data_type}')
+    # ---------------------
+    def _pdf_from_nll(self) -> zpdf:
+        l_model = self._nll.model
+        if len(l_model) != 1:
+            raise ValueError('Not found one and only one model')
+        return l_model[0]
+    # ---------------------
+    def _data_from_nll(self) -> zpdf:
+        l_data = self._nll.data
+        if len(l_data) != 1:
+            raise ValueError('Not found one and only one dataset')
+        return l_data[0]
+    # ---------------------
+    def _get_float_pars(self) -> int:
+        npar     = 0
+        s_par    = self._pdf.get_params()
+        for par in s_par:
+            if par.floating:
+                npar+=1
+        return npar
+    # ---------------------
+    @lru_cache(maxsize=10)
+    def _get_binning(self) -> tuple[int, float, float]:
+        min_x = numpy.min(self._data_np)
+        max_x = numpy.max(self._data_np)
+        nbins = self._ndof + self._get_float_pars()
+        log.debug(f'Nbins: {nbins}')
+        log.debug(f'Range: [{min_x:.3f}, {max_x:.3f}]')
+        return nbins, min_x, max_x
+    # ---------------------
+    def _get_pdf_bin_contents(self) -> numpy.ndarray:
+        nbins, min_x, max_x  = self._get_binning()
+        _, arr_edg = numpy.histogram(self._data_np, bins = nbins, range=(min_x, max_x))
+        size = arr_edg.size
+        l_bc = []
+        for i_edg in range(size - 1):
+            low = arr_edg[i_edg + 0]
+            hig = arr_edg[i_edg + 1]
+            var : zpar = self._pdf.integrate(limits = [low, hig])
+            val = var.numpy()[0]
+            l_bc.append(val * self._data_np.size)
+        return numpy.array(l_bc)
+    #------------------------------
+    def _get_data_bin_contents(self) -> numpy.ndarray:
+        nbins, min_x, max_x = self._get_binning()
+        arr_data, _         = numpy.histogram(self._data_np, bins = nbins, range=(min_x, max_x))
+        arr_data            = arr_data.astype(float)
+        return arr_data
+    #------------------------------
+    @lru_cache(maxsize=30)
+    def _calculate_gof(self) -> tuple[float, int, float]:
+        log.debug('Calculating GOF')
+        arr_data    = self._get_data_bin_contents()
+        arr_modl    = self._get_pdf_bin_contents()
+        norm        = numpy.sum(arr_data) / numpy.sum(arr_modl)
+        arr_modl    = norm * arr_modl
+        arr_res     = arr_modl - arr_data
+        arr_chi2    = numpy.divide(arr_res ** 2, arr_data, out=numpy.zeros_like(arr_data), where=arr_data!=0)
+        sum_chi2    = numpy.sum(arr_chi2)
+        pvalue      = 1 - stats.chi2.cdf(sum_chi2, self._ndof)
+        pvalue      = float(pvalue)
+        log.debug(f'Chi2: {sum_chi2:.3f}')
+        log.debug(f'Ndof: {self._ndof}')
+        log.debug(f'pval: {pvalue:<.3e}')
+        return sum_chi2, self._ndof, pvalue
+    # ---------------------
+    def get_gof(self, kind : str) -> float:
+        '''
+        Returns good ness of fit of a given kind
+        kind: Type of goodness of fit, e.g. pvalue
+        '''
+        chi2, ndof, pval = self._calculate_gof()
+        if kind == 'pvalue':
+            return pval
+        if kind == 'chi2/ndof':
+            return chi2/ndof
+        raise NotImplementedError(f'Invalid goodness of fit: {kind}')
+# ------------------------
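
Note on `_calculate_gof` above: the model bin contents are rescaled to the data total, each bin contributes `(model - data)**2 / data`, empty data bins contribute zero, and the p-value is the upper tail of a $\chi^2$ distribution with `ndof` degrees of freedom. The sketch below repeats that arithmetic with the standard library only. `chi2_from_bins` and `chi2_sf_even` are hypothetical names. The package uses `scipy.stats.chi2.cdf` for the p-value; the closed form used here, $e^{-x/2}\sum_{i=0}^{k/2-1}(x/2)^i/i!$, is a substitute that is valid only for even `ndof`.

```python
import math
from typing import Sequence


def chi2_from_bins(data: Sequence[float], model: Sequence[float]) -> float:
    """Chi2 between binned data and a model rescaled to the data total."""
    norm = sum(data) / sum(model)
    chi2 = 0.0
    for n_obs, n_exp in zip(data, model):
        if n_obs == 0:
            # Empty data bins are skipped, mirroring where=arr_data!=0
            continue
        chi2 += (norm * n_exp - n_obs) ** 2 / n_obs
    return chi2


def chi2_sf_even(chi2: float, ndof: int) -> float:
    """Upper-tail probability of a chi2 distribution, even ndof only."""
    if ndof <= 0 or ndof % 2 != 0:
        raise ValueError('Closed form requires a positive, even ndof')
    half = chi2 / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, ndof // 2):
        term *= half / i   # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


# Usage with made-up bin contents; the third data bin is empty
chi2 = chi2_from_bins(data=[10, 20, 0, 10], model=[8, 16, 8, 8])
ndof = 2
chi2_per_ndof = chi2 / ndof
pvalue = chi2_sf_even(chi2, ndof)
```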
