mlquantify 0.1.20__tar.gz → 0.1.21__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- mlquantify-0.1.21/LICENSE +28 -0
- {mlquantify-0.1.20/mlquantify.egg-info → mlquantify-0.1.21}/PKG-INFO +13 -18
- {mlquantify-0.1.20 → mlquantify-0.1.21}/README.md +11 -18
- mlquantify-0.1.21/VERSION.txt +1 -0
- {mlquantify-0.1.20 → mlquantify-0.1.21}/mlquantify/__init__.py +2 -1
- {mlquantify-0.1.20 → mlquantify-0.1.21}/mlquantify/adjust_counting/__init__.py +6 -5
- {mlquantify-0.1.20 → mlquantify-0.1.21}/mlquantify/adjust_counting/_adjustment.py +208 -37
- {mlquantify-0.1.20 → mlquantify-0.1.21}/mlquantify/adjust_counting/_base.py +5 -6
- {mlquantify-0.1.20 → mlquantify-0.1.21}/mlquantify/adjust_counting/_counting.py +10 -7
- {mlquantify-0.1.20 → mlquantify-0.1.21}/mlquantify/likelihood/__init__.py +0 -2
- {mlquantify-0.1.20 → mlquantify-0.1.21}/mlquantify/likelihood/_classes.py +45 -199
- {mlquantify-0.1.20 → mlquantify-0.1.21}/mlquantify/meta/_classes.py +12 -12
- {mlquantify-0.1.20 → mlquantify-0.1.21}/mlquantify/mixture/__init__.py +2 -1
- {mlquantify-0.1.20 → mlquantify-0.1.21}/mlquantify/mixture/_classes.py +310 -15
- {mlquantify-0.1.20 → mlquantify-0.1.21}/mlquantify/model_selection/_search.py +1 -1
- {mlquantify-0.1.20 → mlquantify-0.1.21}/mlquantify/neighbors/_base.py +15 -15
- {mlquantify-0.1.20 → mlquantify-0.1.21}/mlquantify/neighbors/_classes.py +2 -2
- {mlquantify-0.1.20 → mlquantify-0.1.21}/mlquantify/neighbors/_kde.py +6 -6
- mlquantify-0.1.21/mlquantify/neural/__init__.py +1 -0
- mlquantify-0.1.21/mlquantify/neural/_base.py +0 -0
- mlquantify-0.1.21/mlquantify/neural/_classes.py +609 -0
- mlquantify-0.1.21/mlquantify/neural/_perm_invariant.py +0 -0
- mlquantify-0.1.21/mlquantify/neural/_utils.py +0 -0
- {mlquantify-0.1.20 → mlquantify-0.1.21}/mlquantify/utils/__init__.py +2 -1
- {mlquantify-0.1.20 → mlquantify-0.1.21}/mlquantify/utils/_constraints.py +2 -0
- {mlquantify-0.1.20 → mlquantify-0.1.21}/mlquantify/utils/_validation.py +9 -0
- {mlquantify-0.1.20 → mlquantify-0.1.21/mlquantify.egg-info}/PKG-INFO +13 -18
- {mlquantify-0.1.20 → mlquantify-0.1.21}/mlquantify.egg-info/SOURCES.txt +5 -1
- mlquantify-0.1.20/VERSION.txt +0 -1
- mlquantify-0.1.20/mlquantify/likelihood/_base.py +0 -147
- mlquantify-0.1.20/mlquantify/neural/__init__.py +0 -1
- {mlquantify-0.1.20 → mlquantify-0.1.21}/MANIFEST.in +0 -0
- {mlquantify-0.1.20 → mlquantify-0.1.21}/mlquantify/adjust_counting/_utils.py +0 -0
- {mlquantify-0.1.20 → mlquantify-0.1.21}/mlquantify/base.py +0 -0
- {mlquantify-0.1.20 → mlquantify-0.1.21}/mlquantify/base_aggregative.py +0 -0
- {mlquantify-0.1.20 → mlquantify-0.1.21}/mlquantify/calibration.py +0 -0
- {mlquantify-0.1.20 → mlquantify-0.1.21}/mlquantify/confidence.py +0 -0
- {mlquantify-0.1.20 → mlquantify-0.1.21}/mlquantify/meta/__init__.py +0 -0
- {mlquantify-0.1.20 → mlquantify-0.1.21}/mlquantify/metrics/__init__.py +0 -0
- {mlquantify-0.1.20 → mlquantify-0.1.21}/mlquantify/metrics/_oq.py +0 -0
- {mlquantify-0.1.20 → mlquantify-0.1.21}/mlquantify/metrics/_rq.py +0 -0
- {mlquantify-0.1.20 → mlquantify-0.1.21}/mlquantify/metrics/_slq.py +0 -0
- {mlquantify-0.1.20 → mlquantify-0.1.21}/mlquantify/mixture/_base.py +0 -0
- {mlquantify-0.1.20 → mlquantify-0.1.21}/mlquantify/mixture/_utils.py +0 -0
- {mlquantify-0.1.20 → mlquantify-0.1.21}/mlquantify/model_selection/__init__.py +0 -0
- {mlquantify-0.1.20 → mlquantify-0.1.21}/mlquantify/model_selection/_protocol.py +0 -0
- {mlquantify-0.1.20 → mlquantify-0.1.21}/mlquantify/model_selection/_split.py +0 -0
- {mlquantify-0.1.20 → mlquantify-0.1.21}/mlquantify/multiclass.py +0 -0
- {mlquantify-0.1.20 → mlquantify-0.1.21}/mlquantify/neighbors/__init__.py +0 -0
- {mlquantify-0.1.20 → mlquantify-0.1.21}/mlquantify/neighbors/_classification.py +0 -0
- {mlquantify-0.1.20 → mlquantify-0.1.21}/mlquantify/neighbors/_utils.py +0 -0
- {mlquantify-0.1.20 → mlquantify-0.1.21}/mlquantify/utils/_artificial.py +0 -0
- {mlquantify-0.1.20 → mlquantify-0.1.21}/mlquantify/utils/_context.py +0 -0
- {mlquantify-0.1.20 → mlquantify-0.1.21}/mlquantify/utils/_decorators.py +0 -0
- {mlquantify-0.1.20 → mlquantify-0.1.21}/mlquantify/utils/_exceptions.py +0 -0
- {mlquantify-0.1.20 → mlquantify-0.1.21}/mlquantify/utils/_get_scores.py +0 -0
- {mlquantify-0.1.20 → mlquantify-0.1.21}/mlquantify/utils/_load.py +0 -0
- {mlquantify-0.1.20 → mlquantify-0.1.21}/mlquantify/utils/_parallel.py +0 -0
- {mlquantify-0.1.20 → mlquantify-0.1.21}/mlquantify/utils/_random.py +0 -0
- {mlquantify-0.1.20 → mlquantify-0.1.21}/mlquantify/utils/_sampling.py +0 -0
- {mlquantify-0.1.20 → mlquantify-0.1.21}/mlquantify/utils/_tags.py +0 -0
- {mlquantify-0.1.20 → mlquantify-0.1.21}/mlquantify/utils/prevalence.py +0 -0
- {mlquantify-0.1.20 → mlquantify-0.1.21}/mlquantify.egg-info/dependency_links.txt +0 -0
- {mlquantify-0.1.20 → mlquantify-0.1.21}/mlquantify.egg-info/requires.txt +0 -0
- {mlquantify-0.1.20 → mlquantify-0.1.21}/mlquantify.egg-info/top_level.txt +0 -0
- {mlquantify-0.1.20 → mlquantify-0.1.21}/setup.cfg +0 -0
- {mlquantify-0.1.20 → mlquantify-0.1.21}/setup.py +0 -0
mlquantify-0.1.21/LICENSE (new file):

````diff
@@ -0,0 +1,28 @@
+BSD 3-Clause License
+
+Copyright (c) 2025, Luiz Fernando Luth Junior and Andre Gustavo Maletzke
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions are met:
+
+1. Redistributions of source code must retain the above copyright notice, this
+   list of conditions and the following disclaimer.
+
+2. Redistributions in binary form must reproduce the above copyright notice,
+   this list of conditions and the following disclaimer in the documentation
+   and/or other materials provided with the distribution.
+
+3. Neither the name of the copyright holder nor the names of its
+   contributors may be used to endorse or promote products derived from
+   this software without specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
+FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
+SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
````
{mlquantify-0.1.20/mlquantify.egg-info → mlquantify-0.1.21}/PKG-INFO:

````diff
@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: mlquantify
-Version: 0.1.20
+Version: 0.1.21
 Summary: Quantification Library
 Home-page: https://github.com/luizfernandolj/QuantifyML/tree/master
 Maintainer: Luiz Fernando Luth Junior
@@ -12,6 +12,7 @@ Classifier: Operating System :: Unix
 Classifier: Operating System :: MacOS :: MacOS X
 Classifier: Operating System :: Microsoft :: Windows
 Description-Content-Type: text/markdown
+License-File: LICENSE
 Requires-Dist: scikit-learn
 Requires-Dist: numpy
 Requires-Dist: scipy
@@ -26,25 +27,23 @@ Dynamic: description
 Dynamic: description-content-type
 Dynamic: home-page
 Dynamic: keywords
+Dynamic: license-file
 Dynamic: maintainer
 Dynamic: requires-dist
 Dynamic: summary
 
-
+[PyPI version badge]
+[](https://github.com/luizfernandolj/mlquantify/)
+
+
+<a href="https://luizfernandolj.github.io/mlquantify/"><img src="assets/logo_mlquantify-white.svg" alt="mlquantify logo"></a>
 <h4 align="center">A Python Package for Quantification</h4>
 
 ___
 
 **mlquantify** is a Python library for quantification, also known as supervised prevalence estimation, designed to estimate the distribution of classes within datasets. It offers a range of tools for various quantification methods, model selection tailored for quantification tasks, evaluation metrics, and protocols to assess quantification performance. Additionally, mlquantify includes popular datasets and visualization tools to help analyze and interpret results.
 
-
-
-## Latest Release
-
-- **Version 0.1.11**: Inicial beta version. For a detailed list of changes, check the [changelog](#).
-- In case you need any help, refer to the [User Guide](https://luizfernandolj.github.io/mlquantify/user_guide.html).
-- Explore the [API documentation](https://luizfernandolj.github.io/mlquantify/api/index.html) for detailed developer information.
-- See also the library in the pypi site in [pypi mlquantify](https://pypi.org/project/mlquantify/)
+Website: https://luizfernandolj.github.io/mlquantify/
 
 ___
 
@@ -112,6 +111,10 @@ print(f"Mean Absolute Error -> {mae}")
 print(f"Normalized Relative Absolute Error -> {nrae}")
 ```
 
+- In case you need any help, refer to the [User Guide](https://luizfernandolj.github.io/mlquantify/user_guide.html).
+- Explore the [API documentation](https://luizfernandolj.github.io/mlquantify/api/index.html) for detailed developer information.
+- See also the library in the pypi site in [pypi mlquantify](https://pypi.org/project/mlquantify/)
+
 ___
 
 ## Requirements
@@ -123,11 +126,3 @@ ___
 - tqdm
 - matplotlib
 - xlrd
-
-___
-
-## Documentation
-
-##### API is avaliable [here](https://luizfernandolj.github.io/mlquantify/api/)
-
-___
````
{mlquantify-0.1.20 → mlquantify-0.1.21}/README.md:

````diff
@@ -1,18 +1,15 @@
-
+[PyPI version badge]
+[](https://github.com/luizfernandolj/mlquantify/)
+
+
+<a href="https://luizfernandolj.github.io/mlquantify/"><img src="assets/logo_mlquantify-white.svg" alt="mlquantify logo"></a>
 <h4 align="center">A Python Package for Quantification</h4>
 
 ___
 
 **mlquantify** is a Python library for quantification, also known as supervised prevalence estimation, designed to estimate the distribution of classes within datasets. It offers a range of tools for various quantification methods, model selection tailored for quantification tasks, evaluation metrics, and protocols to assess quantification performance. Additionally, mlquantify includes popular datasets and visualization tools to help analyze and interpret results.
 
-
-
-## Latest Release
-
-- **Version 0.1.11**: Inicial beta version. For a detailed list of changes, check the [changelog](#).
-- In case you need any help, refer to the [User Guide](https://luizfernandolj.github.io/mlquantify/user_guide.html).
-- Explore the [API documentation](https://luizfernandolj.github.io/mlquantify/api/index.html) for detailed developer information.
-- See also the library in the pypi site in [pypi mlquantify](https://pypi.org/project/mlquantify/)
+Website: https://luizfernandolj.github.io/mlquantify/
 
 ___
 
@@ -80,6 +77,10 @@ print(f"Mean Absolute Error -> {mae}")
 print(f"Normalized Relative Absolute Error -> {nrae}")
 ```
 
+- In case you need any help, refer to the [User Guide](https://luizfernandolj.github.io/mlquantify/user_guide.html).
+- Explore the [API documentation](https://luizfernandolj.github.io/mlquantify/api/index.html) for detailed developer information.
+- See also the library in the pypi site in [pypi mlquantify](https://pypi.org/project/mlquantify/)
+
 ___
 
 ## Requirements
@@ -90,12 +91,4 @@ ___
 - joblib
 - tqdm
 - matplotlib
-- xlrd
-
-___
-
-## Documentation
-
-##### API is avaliable [here](https://luizfernandolj.github.io/mlquantify/api/)
-
-___
+- xlrd
````
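The README quick-start whose tail appears in the hunks above fits in a few lines. A minimal sketch of the classify-and-count workflow built only from classes visible in this diff (synthetic data; the top-level `mlquantify.adjust_counting` import path is assumed from the docstring examples further down):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from mlquantify.adjust_counting import CC  # classify-and-count, defined in _counting.py below

# synthetic binary data, stand-in for a real dataset
X = np.random.randn(500, 8)
y = np.random.randint(0, 2, 500)

cc = CC(learner=RandomForestClassifier())
cc.fit(X, y)                  # fits the wrapped classifier
prevalences = cc.predict(X)   # per the docstrings, a dict like {0: 0.48, 1: 0.52}
print(prevalences)
```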
mlquantify-0.1.21/VERSION.txt (new file):

````diff
@@ -0,0 +1 @@
+0.1.21
````
{mlquantify-0.1.20 → mlquantify-0.1.21}/mlquantify/adjust_counting/_adjustment.py:

````diff
@@ -1,15 +1,30 @@
+from mlquantify.utils._validation import validate_prevalences
+from mlquantify.base import BaseQuantifier
 import numpy as np
 from abc import abstractmethod
 from scipy.optimize import minimize
 import warnings
 from sklearn.metrics import confusion_matrix
 
+from mlquantify.utils._tags import (
+    PredictionRequirements,
+    Tags,
+)
 from mlquantify.adjust_counting._base import BaseAdjustCount
 from mlquantify.adjust_counting._counting import CC, PCC
+from mlquantify.utils import (
+    _fit_context,
+    validate_data,
+    validate_prevalences,
+    validate_predictions,
+    check_classes_attribute
+)
 from mlquantify.base_aggregative import (
     CrispLearnerQMixin,
     SoftLearnerQMixin,
-
+    AggregationMixin,
+    uses_soft_predictions,
+    _get_learner_function
 )
 from mlquantify.multiclass import define_binary
 from mlquantify.adjust_counting._utils import evaluate_thresholds
@@ -98,14 +113,14 @@ class ThresholdAdjustment(SoftLearnerQMixin, BaseAdjustCount):
         self.threshold = threshold
         self.strategy = strategy
 
-    def _adjust(self, predictions, train_y_scores,
+    def _adjust(self, predictions, train_y_scores, y_train):
         """Internal adjustment computation based on selected ROC threshold."""
         positive_scores = train_y_scores[:, 1]
 
-        thresholds, tprs, fprs = evaluate_thresholds(
+        thresholds, tprs, fprs = evaluate_thresholds(y_train, positive_scores)
         threshold, tpr, fpr = self.get_best_threshold(thresholds, tprs, fprs)
 
-        cc_predictions = CC(threshold=threshold).aggregate(predictions,
+        cc_predictions = CC(threshold=threshold).aggregate(predictions, y_train)
         cc_predictions = list(cc_predictions.values())[1]
 
         if tpr - fpr == 0:
````
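The `if tpr - fpr == 0:` guard above protects the standard threshold correction: the classify-and-count estimate taken at the chosen ROC operating point is rescaled by the TPR/FPR measured there. A standalone sketch of that arithmetic (plain NumPy, not the library's internal API):

```python
import numpy as np

def threshold_adjusted_count(cc, tpr, fpr):
    """Correct a classify-and-count estimate with the rates measured
    at the chosen threshold: p = (cc - fpr) / (tpr - fpr), clipped to [0, 1]."""
    if tpr - fpr == 0:
        return cc  # degenerate operating point: fall back to the raw count
    return float(np.clip((cc - fpr) / (tpr - fpr), 0.0, 1.0))

# 40% predicted positive with tpr=0.8, fpr=0.2 -> corrected prevalence 1/3
print(threshold_adjusted_count(0.4, 0.8, 0.2))
```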
_adjustment.py (continued):

````diff
@@ -200,18 +215,18 @@ class MatrixAdjustment(BaseAdjustCount):
         super().__init__(learner=learner)
         self.solver = solver
 
-    def _adjust(self, predictions, train_y_pred,
-        n_class = len(np.unique(
+    def _adjust(self, predictions, train_y_pred, y_train):
+        n_class = len(np.unique(y_train))
         self.CM = np.zeros((n_class, n_class))
 
         if self.solver == 'optim':
-            priors = np.array(list(CC().aggregate(train_y_pred,
-            self.CM = self._compute_confusion_matrix(train_y_pred,
-            prevs_estim = self._get_estimations(predictions > priors,
+            priors = np.array(list(CC().aggregate(train_y_pred, y_train).values()))
+            self.CM = self._compute_confusion_matrix(train_y_pred, y_train, priors)
+            prevs_estim = self._get_estimations(predictions > priors, y_train)
             prevalence = self._solve_optimization(prevs_estim, priors)
         else:
-            self.CM = self._compute_confusion_matrix(train_y_pred,
-            prevs_estim = self._get_estimations(predictions,
+            self.CM = self._compute_confusion_matrix(train_y_pred, y_train)
+            prevs_estim = self._get_estimations(predictions, y_train)
             prevalence = self._solve_linear(prevs_estim)
 
         return prevalence
````
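`_solve_linear` and `_solve_optimization` both recover the prevalence vector from the observed estimates and the confusion matrix estimated on training predictions. A minimal standalone sketch of the linear-system view (a hypothetical helper, not the library's method; mlquantify's `optim` path uses `scipy.optimize.minimize` with simplex constraints instead):

```python
import numpy as np

def solve_matrix_adjustment(CM, observed):
    """Solve CM @ p = observed for the true prevalences p, then project
    back onto the simplex. CM[i, j] ~ P(predicted i | true class j)."""
    p, *_ = np.linalg.lstsq(CM, observed, rcond=None)  # least squares tolerates singular CM
    p = np.clip(p, 0.0, None)
    return p / p.sum() if p.sum() > 0 else np.full_like(p, 1.0 / len(p))

CM = np.array([[0.9, 0.3],       # column j: how true class j gets predicted
               [0.1, 0.7]])
observed = np.array([0.6, 0.4])  # classify-and-count estimate on the test set
print(solve_matrix_adjustment(CM, observed))  # -> [0.5 0.5]
```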
_adjustment.py (continued):

````diff
@@ -262,17 +277,173 @@ class MatrixAdjustment(BaseAdjustCount):
         result = minimize(objective, init, constraints=constraints, bounds=bounds)
         return result.x if result.success else priors
 
-    def _get_estimations(self, predictions,
+    def _get_estimations(self, predictions, y_train):
         """Return prevalence estimates using CC (crisp) or PCC (probabilistic)."""
         if uses_soft_predictions(self):
             return np.array(list(PCC().aggregate(predictions).values()))
-        return np.array(list(CC().aggregate(predictions,
+        return np.array(list(CC().aggregate(predictions, y_train).values()))
 
     @abstractmethod
     def _compute_confusion_matrix(self, predictions, *args):
         ...
 
 
+
+@define_binary
+class CDE(SoftLearnerQMixin, AggregationMixin, BaseQuantifier):
+    r"""CDE-Iterate for binary classification prevalence estimation.
+
+    Threshold :math:`\tau` from false positive and false negative costs:
+    .. math::
+        \tau = \frac{c_{FP}}{c_{FP} + c_{FN}}
+
+    Hard classification by thresholding posterior probability :math:`p(+|x)` at :math:`\tau`:
+    .. math::
+        \hat{y}(x) = \mathbf{1}_{p(+|x) > \tau}
+
+    Prevalence estimation via classify-and-count:
+    .. math::
+        \hat{p}_U(+) = \frac{1}{N} \sum_{n=1}^N \hat{y}(x_n)
+
+    False positive cost update:
+    .. math::
+        c_{FP}^{new} = \frac{p_L(+)}{p_L(-)} \times \frac{\hat{p}_U(-)}{\hat{p}_U(+)} \times c_{FN}
+
+    Parameters
+    ----------
+    learner : estimator, optional
+        Wrapped classifier (unused).
+    tol : float, default=1e-4
+        Convergence tolerance.
+    max_iter : int, default=100
+        Max iterations.
+    init_cfp : float, default=1.0
+        Initial false positive cost.
+
+    References
+    ----------
+    .. [1] Esuli, A., Moreo, A., & Sebastiani, F. (2023). Learning to Quantify. Springer.
+    """
+
+    _parameter_constraints = {
+        "tol": [Interval(0, None, inclusive_left=False)],
+        "max_iter": [Interval(1, None, inclusive_left=True)],
+        "init_cfp": [Interval(0, None, inclusive_left=False)]
+    }
+
+    def __mlquantify_tags__(self):
+        tags = super().__mlquantify_tags__()
+        tags.prediction_requirements.requires_train_proba = False
+        return tags
+
+
+    def __init__(self, learner=None, tol=1e-4, max_iter=100, init_cfp=1.0, strategy="ovr"):
+        self.learner = learner
+        self.tol = float(tol)
+        self.max_iter = int(max_iter)
+        self.init_cfp = float(init_cfp)
+        self.strategy = strategy
+
+    @_fit_context(prefer_skip_nested_validation=True)
+    def fit(self, X, y):
+        """Fit the quantifier using the provided data and learner."""
+        X, y = validate_data(self, X, y)
+        self.classes_ = np.unique(y)
+        self.learner.fit(X, y)
+        counts = np.array([np.count_nonzero(y == _class) for _class in self.classes_])
+        self.priors = counts / len(y)
+        self.y_train = y
+
+        return self
+
+
+    def predict(self, X):
+        """Predict class prevalences for the given data."""
+        predictions = getattr(self.learner, _get_learner_function(self))(X)
+        prevalences = self.aggregate(predictions, self.y_train)
+        return prevalences
+
+
+    def aggregate(self, predictions, y_train):
+
+        self.classes_ = check_classes_attribute(self, np.unique(y_train))
+        predictions = validate_predictions(self, predictions)
+
+        if hasattr(self, 'priors'):
+            Ptr = np.asarray(self.priors, dtype=np.float64)
+        else:
+            counts = np.array([np.count_nonzero(y_train == _class) for _class in self.classes_])
+            Ptr = counts / len(y_train)
+
+        P = np.asarray(predictions, dtype=np.float64)
+
+        # ensure no zeros
+        eps = 1e-12
+        P = np.clip(P, eps, 1.0)
+
+        # training priors pL(+), pL(-)
+        # assume Ptr order matches columns of P; if Ptr sums to 1 but order unknown, user must match.
+        pL_pos = Ptr[1]
+        pL_neg = Ptr[0]
+        if pL_pos <= 0 or pL_neg <= 0:
+            # keep them positive to avoid divisions by zero
+            pL_pos = max(pL_pos, eps)
+            pL_neg = max(pL_neg, eps)
+
+        # initialize costs
+        cFN = 1.0
+        cFP = float(self.init_cfp)
+
+        prev_prev_pos = None
+        s = 0
+
+        # iterate: compute threshold from costs, classify, estimate prevalences via CC,
+        # update cFP, repeat
+        while s < self.max_iter:
+            # decision threshold tau for positive class:
+            # Derivation:
+            #   predict positive if cost_FP * p(-|x) < cost_FN * p(+|x)
+            #   => predict positive if p(+|x) / p(-|x) > cost_FP / cost_FN
+            #   since p(+|x) / p(-|x) = p(+|x) / (1 - p(+|x)):
+            #   p(+|x) > cost_FP / (cost_FP + cost_FN)
+            tau = cFP / (cFP + cFN)
+
+            # hard predictions for positive class using threshold on posterior for positive (col 1)
+            pos_probs = P[:, 1]
+            hard_pos = (pos_probs > tau).astype(float)
+
+            # classify-and-count prevalence estimate on U
+            prev_pos = hard_pos.mean()
+            prev_neg = 1.0 - prev_pos
+
+            # update cFP according to:
+            #   cFP_new = (pL_pos / pL_neg) * (pU_hat(neg) / pU_hat(pos)) * cFN
+            # guard against zero prev_pos / prev_neg
+            prev_pos_safe = max(prev_pos, eps)
+            prev_neg_safe = max(prev_neg, eps)
+
+            cFP_new = (pL_pos / pL_neg) * (prev_neg_safe / prev_pos_safe) * cFN
+
+            # check convergence on prevalences (absolute change)
+            if prev_prev_pos is not None and abs(prev_pos - prev_prev_pos) < self.tol:
+                break
+
+            # prepare next iter
+            cFP = cFP_new
+            prev_prev_pos = prev_pos
+            s += 1
+
+        # if didn't converge within max_iter we keep last estimate (lack of fisher consistency)
+        if s >= self.max_iter:
+            # optional: warning
+            # print('[warning] CDE-Iterate reached max_iter without converging')
+            pass
+
+        prevalences = np.array([prev_neg, prev_pos], dtype=np.float64)
+        prevalences = validate_prevalences(self, prevalences, self.classes_)
+        return prevalences
+
+
 class FM(SoftLearnerQMixin, MatrixAdjustment):
     r"""Friedman Method for quantification adjustment.
 
````
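Given the `fit`/`predict` API of the new `CDE` class above, usage would plausibly look like this (synthetic data; the sketch assumes `CDE` is re-exported from `mlquantify.adjust_counting`, which the `__init__.py` hunk is not shown here to confirm):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from mlquantify.adjust_counting import CDE  # export path assumed, see note above

X = np.random.randn(200, 4)
y = np.random.randint(0, 2, 200)  # CDE is binary (@define_binary)

cde = CDE(learner=LogisticRegression(), tol=1e-4, max_iter=100)
cde.fit(X, y)                 # fits the learner, stores training priors and labels
prevalences = cde.predict(X)  # iterates cost -> threshold -> CC updates to convergence
print(prevalences)
```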
_adjustment.py (continued):

````diff
@@ -337,14 +508,14 @@ class FM(SoftLearnerQMixin, MatrixAdjustment):
     def _compute_confusion_matrix(self, posteriors, y_true, priors):
         for i, _class in enumerate(self.classes_):
             indices = (y_true == _class)
-            self.CM[:, i] = self._get_estimations(posteriors[indices] > priors)
+            self.CM[:, i] = self._get_estimations(posteriors[indices] > priors, y_true[indices])
         return self.CM
 
 
-class GAC(CrispLearnerQMixin, MatrixAdjustment):
-    r"""
+class AC(CrispLearnerQMixin, MatrixAdjustment):
+    r"""Adjusted Count method.
 
-    This class implements the
+    This class implements the Adjusted Count (AC) algorithm for
     quantification adjustment as described in Firat (2016) [1]_. The method
     adjusts the estimated class prevalences by normalizing the confusion matrix
     based on prevalence estimates, providing a correction for bias caused by
@@ -374,12 +545,12 @@ class GAC(CrispLearnerQMixin, MatrixAdjustment):
     Examples
     --------
     >>> from sklearn.linear_model import LogisticRegression
-    >>> from mlquantify.adjust_counting import
+    >>> from mlquantify.adjust_counting import AC
     >>> import numpy as np
-    >>>
+    >>> ac = AC(learner=LogisticRegression())
     >>> X = np.random.randn(50, 4)
     >>> y = np.random.randint(0, 2, 50)
-    >>>
+    >>> ac.fit(X, y)
     >>> gac.predict(X)
     {0: 0.5, 1: 0.5}
 
@@ -404,11 +575,11 @@ class GAC(CrispLearnerQMixin, MatrixAdjustment):
         return self.CM
 
 
-class GPAC(SoftLearnerQMixin, MatrixAdjustment):
-    r"""Probabilistic
+class PAC(SoftLearnerQMixin, MatrixAdjustment):
+    r"""Probabilistic Adjusted Count (PAC) method.
 
-    This class implements the probabilistic extension of the
-    as presented in Firat (2016) [1]_. The
+    This class implements the probabilistic extension of the Adjusted Count method
+    as presented in Firat (2016) [1]_. The PAC method normalizes the confusion matrix by
     the estimated prevalences from posterior probabilities, enabling a probabilistic correction
     of class prevalences.
 
@@ -436,13 +607,13 @@ class GPAC(SoftLearnerQMixin, MatrixAdjustment):
     Examples
     --------
     >>> from sklearn.linear_model import LogisticRegression
-    >>> from mlquantify.adjust_counting import
+    >>> from mlquantify.adjust_counting import PAC
     >>> import numpy as np
-    >>>
+    >>> pac = PAC(learner=LogisticRegression())
     >>> X = np.random.randn(50, 4)
     >>> y = np.random.randint(0, 2, 50)
-    >>>
-    >>>
+    >>> pac.fit(X, y)
+    >>> pac.predict(X)
     {0: 0.5, 1: 0.5}
 
     References
@@ -466,8 +637,8 @@ class GPAC(SoftLearnerQMixin, MatrixAdjustment):
         return self.CM
 
 
-class ACC(ThresholdAdjustment):
-    r"""Adjusted Count (
+class TAC(ThresholdAdjustment):
+    r"""Threshold Adjusted Count (TAC) — baseline threshold correction.
 
     This method corrects the bias in class prevalence estimates caused by imperfect
     classification accuracy, by adjusting the observed positive count using estimates
@@ -501,8 +672,8 @@ class ACC(ThresholdAdjustment):
         return (self.threshold, tpr, fpr)
 
 
-class X_method(ThresholdAdjustment):
-    r"""X method — threshold where :math:`\text{TPR} + \text{FPR} = 1`.
+class TX(ThresholdAdjustment):
+    r"""Threshold X method — threshold where :math:`\text{TPR} + \text{FPR} = 1`.
 
     This method selects the classification threshold at which the sum of the true positive
     rate (TPR) and false positive rate (FPR) equals one. This threshold choice balances
@@ -526,8 +697,8 @@ class X_method(ThresholdAdjustment):
         return thresholds[idx], tprs[idx], fprs[idx]
 
 
-class
-    r"""MAX method — threshold maximizing :math:`\text{TPR} - \text{FPR}`.
+class TMAX(ThresholdAdjustment):
+    r"""Threshold MAX method — threshold maximizing :math:`\text{TPR} - \text{FPR}`.
 
     This method selects the threshold that maximizes the difference between the true positive
     rate (TPR) and the false positive rate (FPR), effectively optimizing classification
@@ -601,15 +772,15 @@ class MS(ThresholdAdjustment):
     .. [1] Forman, G. (2008). "Quantifying Counts and Costs via Classification",
        *Data Mining and Knowledge Discovery*, 17(2), 164-206.
     """
-    def _adjust(self, predictions, train_y_scores,
+    def _adjust(self, predictions, train_y_scores, y_train):
         positive_scores = train_y_scores[:, 1]
 
-        thresholds, tprs, fprs = evaluate_thresholds(
+        thresholds, tprs, fprs = evaluate_thresholds(y_train, positive_scores)
         thresholds, tprs, fprs = self.get_best_threshold(thresholds, tprs, fprs)
 
         prevs = []
         for thr, tpr, fpr in zip(thresholds, tprs, fprs):
-            cc_predictions = CC(threshold=thr).aggregate(predictions,
+            cc_predictions = CC(threshold=thr).aggregate(predictions, y_train)
             cc_predictions = list(cc_predictions.values())[1]
 
             if tpr - fpr == 0:
````
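The `MS` loop above computes the corrected estimate at every candidate threshold; per the Forman (2008) reference, Median Sweep then reports the median over thresholds. A standalone sketch of that aggregation (illustrative values, not the library's code):

```python
import numpy as np

def median_sweep(ccs, tprs, fprs):
    """Apply the threshold correction at each candidate threshold and
    return the median of the valid corrected estimates."""
    prevs = []
    for cc, tpr, fpr in zip(ccs, tprs, fprs):
        if tpr - fpr == 0:
            continue  # skip degenerate thresholds
        prevs.append(np.clip((cc - fpr) / (tpr - fpr), 0.0, 1.0))
    return float(np.median(prevs))

print(median_sweep([0.40, 0.35, 0.30], [0.9, 0.8, 0.7], [0.20, 0.10, 0.05]))
```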
{mlquantify-0.1.20 → mlquantify-0.1.21}/mlquantify/adjust_counting/_base.py:

````diff
@@ -93,14 +93,13 @@ class BaseCount(AggregationMixin, BaseQuantifier):
     def __mlquantify_tags__(self):
         tags = super().__mlquantify_tags__()
         tags.prediction_requirements.requires_train_proba = False
-        tags.prediction_requirements.requires_train_labels =
+        tags.prediction_requirements.requires_train_labels = True
         return tags
 
     @_fit_context(prefer_skip_nested_validation=True)
     def fit(self, X, y, learner_fitted=False, *args, **kwargs):
         """Fit the quantifier using the provided data and learner."""
         X, y = validate_data(self, X, y)
-        validate_y(self, y)
         self.classes_ = np.unique(y)
         if not learner_fitted:
             self.learner.fit(X, y, *args, **kwargs)
@@ -207,7 +206,6 @@ class BaseAdjustCount(AggregationMixin, BaseQuantifier):
     def fit(self, X, y, learner_fitted=False, cv=10, stratified=True, random_state=None, shuffle=True):
         """Fit the quantifier using the provided data and learner."""
         X, y = validate_data(self, X, y)
-        validate_y(self, y)
         self.classes_ = np.unique(y)
         learner_function = _get_learner_function(self)
 
@@ -236,12 +234,13 @@ class BaseAdjustCount(AggregationMixin, BaseQuantifier):
         prevalences = self.aggregate(predictions, self.train_predictions, self.train_y_values)
         return prevalences
 
-    def aggregate(self, predictions, train_predictions,
+    def aggregate(self, predictions, train_predictions, y_train):
         """Aggregate predictions and apply matrix- or rate-based bias correction."""
-        self.classes_ = check_classes_attribute(self, np.unique(
+        self.classes_ = check_classes_attribute(self, np.unique(y_train))
 
         predictions = validate_predictions(self, predictions)
+        train_predictions = validate_predictions(self, train_predictions)
 
-        prevalences = self._adjust(predictions, train_predictions,
+        prevalences = self._adjust(predictions, train_predictions, y_train)
         prevalences = validate_prevalences(self, prevalences, self.classes_)
         return prevalences
````
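With `y_train` now threaded through `aggregate`, an adjustment quantifier can be driven from precomputed predictions. A sketch of the new call pattern (hypothetical data; it assumes `AC`'s default constructor arguments suffice and that out-of-fold training predictions are acceptable inputs, which matches what `fit` itself produces via its `cv=10` cross-validation):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from mlquantify.adjust_counting import AC  # renamed from GAC in this release

X = np.random.randn(200, 4)
y = np.random.randint(0, 2, 200)
X_test = np.random.randn(100, 4)

clf = LogisticRegression().fit(X, y)
train_preds = cross_val_predict(clf, X, y, cv=10)  # out-of-fold crisp predictions

# new in 0.1.21: the training labels are passed explicitly as y_train
prevalences = AC().aggregate(clf.predict(X_test), train_preds, y)
print(prevalences)
```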
{mlquantify-0.1.20 → mlquantify-0.1.21}/mlquantify/adjust_counting/_counting.py:

````diff
@@ -75,13 +75,13 @@ class CC(CrispLearnerQMixin, BaseCount):
         super().__init__(learner=learner)
         self.threshold = threshold
 
-    def aggregate(self, predictions,
-        predictions = validate_predictions(self, predictions, self.threshold,
+    def aggregate(self, predictions, y_train=None):
+        predictions = validate_predictions(self, predictions, self.threshold, y_train)
 
-        if
-
+        if y_train is None:
+            y_train = np.unique(predictions)
 
-        self.classes_ = check_classes_attribute(self, np.unique(
+        self.classes_ = check_classes_attribute(self, np.unique(y_train))
         class_counts = np.array([np.count_nonzero(predictions == _class) for _class in self.classes_])
         prevalences = class_counts / len(predictions)
 
````
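For reference, the two counting rules these classes implement reduce to a few lines of NumPy. A standalone sketch (not the library's methods; PCC's mean-of-posteriors rule is the standard definition):

```python
import numpy as np

def classify_and_count(pred_labels, classes):
    """CC: prevalence of each class = fraction of hard predictions for it."""
    counts = np.array([np.count_nonzero(pred_labels == c) for c in classes])
    return counts / len(pred_labels)

def probabilistic_classify_and_count(posteriors):
    """PCC: prevalence vector = column-wise mean of the posterior matrix."""
    return np.asarray(posteriors, dtype=float).mean(axis=0)

print(classify_and_count(np.array([0, 1, 1, 0, 1]), classes=[0, 1]))  # [0.4 0.6]
print(probabilistic_classify_and_count([[0.7, 0.3], [0.2, 0.8]]))     # [0.45 0.55]
```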
_counting.py (continued):

````diff
@@ -134,12 +134,15 @@ class PCC(SoftLearnerQMixin, BaseCount):
     def __init__(self, learner=None):
         super().__init__(learner=learner)
 
-    def aggregate(self, predictions):
+    def aggregate(self, predictions, y_train=None):
         predictions = validate_predictions(self, predictions)
 
         # Handle categorical predictions (1D array with class labels)
         if predictions.ndim == 1 and not np.issubdtype(predictions.dtype, (np.floating, np.integer)):
-
+            if y_train is None:
+                y_values = np.unique(predictions)
+
+            self.classes_ = check_classes_attribute(self, np.unique(y_values))
             class_counts = np.array([np.count_nonzero(predictions == _class) for _class in self.classes_])
             prevalences = class_counts / len(predictions)
         else:
````